Filtered Approximate Nearest Neighbor Search in Vector Databases: System Design and Performance Analysis
Abylay Amanbayev, Brian Tsan, Tri Dang, Florin Rusu

TL;DR
This paper systematically evaluates filtering strategies in vector databases for Approximate Nearest Neighbor Search, introducing new benchmarks and metrics to guide system design and optimize performance.
Contribution
It provides a comprehensive taxonomy of filtering strategies, introduces the MoReVec dataset and GLS metric, and offers practical guidelines for system configuration.
Findings
Milvus achieves stable recall with hybrid execution.
pgvector's optimizer often chooses suboptimal plans.
Partition-based indexes outperform graph-based indexes for low-selectivity queries.
Abstract
Retrieval-Augmented Generation (RAG) applications increasingly rely on Filtered Approximate Nearest Neighbor Search (FANNS) to combine semantic retrieval with metadata constraints. While algorithmic innovations for FANNS have been proposed, there remains a lack of understanding regarding how generic filtering strategies perform within Vector Databases. In this work, we systematize the taxonomy of filtering strategies and evaluate their integration into FAISS, Milvus, and pgvector. To provide a robust benchmarking framework, we introduce a new relational dataset, MoReVec, consisting of two tables, featuring 768-dimensional text embeddings and a rich schema of metadata attributes. We further propose the Global-Local Selectivity (GLS) correlation metric to quantify the relationship between filters and query vectors. Our experiments reveal that algorithmic adaptations…
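To make the idea of a generic filtering strategy concrete, here is a minimal Python sketch of two commonly discussed ones, pre-filtering and post-filtering, together with the selectivity of a metadata predicate (the fraction of rows it keeps). It is an illustration under stated assumptions, not the paper's taxonomy or the FAISS, Milvus, or pgvector implementations: exact brute-force ranking stands in for an ANN index, and the function names, the toy data, and the `k_prime` over-fetch parameter are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical types for illustration: each row has a vector and a metadata dict.
Vector = Sequence[float]
Attrs = Dict[str, int]
Predicate = Callable[[Attrs], bool]


def selectivity(attrs: List[Attrs], predicate: Predicate) -> float:
    """Fraction of rows whose metadata satisfies the predicate."""
    return sum(1 for a in attrs if predicate(a)) / len(attrs)


def pre_filter_search(vectors: List[Vector], attrs: List[Attrs], query: Vector,
                      predicate: Predicate, k: int) -> List[int]:
    """Apply the metadata filter first, then rank only the surviving rows.

    Brute-force distance ranking stands in for an ANN index (assumption).
    """
    survivors = [i for i, a in enumerate(attrs) if predicate(a)]
    survivors.sort(key=lambda i: math.dist(vectors[i], query))
    return survivors[:k]


def post_filter_search(vectors: List[Vector], attrs: List[Attrs], query: Vector,
                       predicate: Predicate, k: int, k_prime: int) -> List[int]:
    """Fetch the k_prime nearest rows ignoring the filter, then drop non-matching rows.

    May return fewer than k ids when too few of the k_prime neighbors pass the filter.
    """
    ranked = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))
    return [i for i in ranked[:k_prime] if predicate(attrs[i])][:k]


# Toy data (hypothetical): rows near the query are old, rows farther away are recent.
vectors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
attrs = [{"year": 2001}] * 3 + [{"year": 2020}] * 3
query = (0.0, 0.0)


def recent(a: Attrs) -> bool:
    """Hypothetical metadata predicate: keep rows from 2010 onward."""
    return a["year"] >= 2010


print(selectivity(attrs, recent))                                         # 0.5
print(pre_filter_search(vectors, attrs, query, recent, k=2))              # [3, 4]
print(post_filter_search(vectors, attrs, query, recent, k=2, k_prime=3))  # []
print(post_filter_search(vectors, attrs, query, recent, k=2, k_prime=6))  # [3, 4]
```

In this toy data half the rows pass the filter, yet none of the three unfiltered nearest neighbors do, so post-filtering with `k_prime=3` returns nothing while pre-filtering returns `[3, 4]`. Over-fetching (`k_prime=6`) recovers the answer at the cost of scoring more candidates. The example only shows that the interaction between the filter and the query vector, not selectivity alone, can decide which strategy succeeds; it does not reproduce the paper's GLS metric.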
Taxonomy
Topics: Information Retrieval and Search Behavior · Data Management and Algorithms · Advanced Database Systems and Queries
