Toward Understanding Bugs in Vector Database Management Systems
Yinglin Xie, Xinyi Hou, Yanjie Zhao, Shenao Wang, Kai Chen, Haoyu Wang

TL;DR
This paper presents the first large-scale empirical analysis of software bugs in vector database management systems, revealing common fault patterns, bug symptoms, and fix strategies to improve system reliability.
Contribution
It provides a comprehensive taxonomy of bugs, fault patterns, and fix strategies specific to VDBMSs, addressing a significant research gap in system reliability.
Findings
Over half of bugs manifest as functional failures
Identified 31 recurring fault patterns unique to vector search systems
Summarized 12 common fix strategies emphasizing correct program logic
Abstract
Vector database management systems (VDBMSs) play a crucial role in facilitating semantic similarity searches over high-dimensional embeddings from diverse data sources. While VDBMSs are widely used in applications such as recommendation, retrieval-augmented generation (RAG), and multimodal search, their reliability remains underexplored. Traditional database reliability models cannot be directly applied to VDBMSs because of fundamental differences in data representation, query mechanisms, and system architecture. To address this gap, we present the first large-scale empirical study of software defects in VDBMSs. We manually analyzed 1,671 bug-fix pull requests from 15 widely used open-source VDBMSs and developed a comprehensive taxonomy of bugs based on symptoms, root causes, and developer fix strategies. Our study identifies five categories of bug symptoms, with more than half…
Taxonomy
Topics: Species Distribution and Climate Change · Advanced Malware Detection Techniques · Scientific Computing and Data Management
