The one DB fits all approach only works when the size of the database is really ...

beoberha · on Oct 3, 2023

Exactly - vector indexes are so different than traditional RDBMS B-Tree or LSM Tree indexes that it doesn’t make sense to use the same store for both unless it’s basically a toy app.

Someone makes the example in another comment, but it’s analogous to OLTP vs OLAP

Foobar8568 · on Oct 3, 2023

I don't even want to imagine the workload on a high txn OLTP mixed with OLAP access pattern. IMHO If you can, you don't need OLAP in the first place.

trashtester · on Oct 3, 2023

My experience is that if you do the data modelling properly a well designed star schema with some aggregation tables or materialized views on top can often remove the need for dedicated OLAP software.

Now you do NOT want to run such a setup on the same hardware that you use for your transactional systems, of course. But you CAN use the same software (like Oracle), which means that you do get some reduction in tech complexity.

totetsu · on Oct 3, 2023

Are there any DB that could support both use cases while being able to partition them in such a way that the transactions etc are only kept on part of the resources they need to be. Basically two seperated DBs but sharing the same interfaces and security etc.

shri_krishna · on Oct 3, 2023

What you are talking about is possible to do in regular SQL dbs with extensions. However, when it comes to scaling traditional DBs don't have the necessary tools to do so automatically. Most extensions provide support for an underlying ANN algorithm it implements and there's that and nothing more. Everything else you'll have to hand roll yourself.

Clustering, load balancing, aggregating queries etc are quite different for a vector database in comparison to traditional OLTP databases.

It's the same as difference between OLAP vs OLTP. Both have different underlying architectural differences which make it incompatible for both to run in an integrated fashion.

For instance, in a traditional DB the index is maintained and rebuilt alongside data storage and for scaling you can separate it into read/write nodes. The write nodes typically only focus on building indexes while the read nodes for querying eventually consistent indexes (eventual consistency is achieved by broadcasting only the changed rows rather than sending entire index).

Now it's similar in vector dbs too. You can seperate the indexer from query nodes (which access eventually consistent index). However, the load is way higher than a regular db as the index is humongous/takes a long time to build and sharing the index with query nodes is also more time consuming and resource/network intensive, as you won't be sharing few rows but the entire index itself. It requires a totally different strategy to get all query nodes to be eventually consistent.

The only advantage of traditional DBs also implementing vector extensions is familiarity for the end user. If you are already familiar with postgres you wouldn't want to leave your comfort zone. However, scaling a traditional DB is different from scaling a vector DB and you'll encounter those pain points only in production and will be forced to switch to proper vector databases anyways.

samlambert · on Oct 4, 2023

We (PlanetScale) announced Vector storage and search today. If I am understanding your request it sounds like something could do. I would love to hear more if you are willing to chat? s@planetscale.com

redwood · on Oct 3, 2023

Absolutely this is par for the course for distributed engines.. just not postgres and other single node engines which a lot of people here will tell you are all you need.. what they mean is that's all you need until you either have to introduce enormously complex application tier sharding or you're moving to a scalable engine