Towards Robustness: A Critique of Current Vector Database Assessments
Zikai Wang, Qianxi Zhang, Baotong Lu, Qi Chen, Cheng Tan

TL;DR
This paper critiques the reliance on average recall for evaluating vector databases, introducing a new robustness metric that better captures performance variability across queries.
Contribution
It proposes Robustness-$\delta$@K, a novel metric for assessing vector database robustness, and demonstrates its effectiveness in benchmarking and guiding improvements.
Findings
Robustness-$\delta$@K reveals significant differences in index robustness.
More robust indexes improve downstream application performance.
Design factors influencing robustness are identified and analyzed.
Abstract
Vector databases are critical infrastructure in AI systems, and average recall is the dominant metric for their evaluation. Both users and researchers rely on it to choose and optimize their systems. We show that relying on average recall is problematic. It hides variability across queries, allowing systems with strong mean performance to underperform significantly on hard queries. These tail cases confuse users and can lead to failure in downstream applications such as RAG. We argue that robustness, that is, consistently achieving acceptable recall across queries, is crucial to vector database evaluation. We propose Robustness-$\delta$@K, a new metric that captures the fraction of queries with recall above a threshold $\delta$. This metric offers a deeper view of the recall distribution, helps users select a vector index that fits their application's needs, and guides the optimization of tail performance. We…
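The metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `recall_at_k` and `robustness_delta_at_k` are hypothetical, the recall values are made-up toy numbers, and "above a threshold" is assumed to mean recall of at least $\delta$.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[int], ground_truth: Set[int], k: int) -> float:
    """Fraction of the true nearest neighbors found in the top-k results."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & ground_truth) / len(ground_truth)


def robustness_delta_at_k(per_query_recall: Sequence[float], delta: float) -> float:
    """Fraction of queries whose recall@K is at least delta (assumed inclusive)."""
    if not per_query_recall:
        raise ValueError("per_query_recall must not be empty")
    hits = sum(1 for r in per_query_recall if r >= delta)
    return hits / len(per_query_recall)


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Toy per-query recall values for two hypothetical indexes.
index_a = [0.75, 0.75, 0.75, 0.75]  # uniformly acceptable
index_b = [1.0, 1.0, 1.0, 0.0]      # perfect on most queries, fails on one

# Both indexes have the same average recall of 0.75 ...
print(mean(index_a), mean(index_b))

# ... but differ once a per-query threshold (here delta = 0.5) is applied:
# index_a gives 1.0, index_b gives 0.75.
print(robustness_delta_at_k(index_a, 0.5), robustness_delta_at_k(index_b, 0.5))
```

In this toy setup the two indexes are indistinguishable by average recall, while the threshold-based fraction exposes the query on which the second index fails completely.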
