Survey of Vector Database Management Systems
James Jie Pan, Jianguo Wang, Guoliang Li

TL;DR
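This survey reviews recent advances in vector database management systems, whose development has been driven by applications such as large language models. It highlights new techniques for storage, indexing, and query processing that address challenges such as vague semantic similarity, large vector sizes, and costly similarity comparison.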
Contribution
It provides a comprehensive overview of techniques and systems for vector data management, identifying obstacles and solutions, and discusses benchmarks and future research directions.
Findings
Multiple techniques for vector compression and partitioning are used for storage and indexing.
New operators and optimization techniques improve hybrid query processing.
Diverse VDBMS architectures exist, including native and extended systems.
Abstract
There are now over 20 commercial vector database management systems (VDBMSs), all produced within the past five years. But embedding-based retrieval has been studied for over ten years, and similarity search a staggering half century and more. Driving this shift from algorithms to systems are new data-intensive applications, notably large language models, that demand vast stores of unstructured data coupled with reliable, secure, fast, and scalable query processing capability. A variety of new data management techniques now exist for addressing these needs; however, there is no comprehensive survey to thoroughly review these techniques and systems. We start by identifying five main obstacles to vector data management, namely vagueness of semantic similarity, large size of vectors, high cost of similarity comparison, lack of natural partitioning that can be used for indexing, and…
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Semantic Web and Ontologies
