Curator: Efficient Vector Search with Low-Selectivity Filters
Yicheng Jin, Yongji Wu, Wenjun Hu, Bruce M. Maggs, Jun Yang, Xiao Zhang, Danyang Zhuo

TL;DR
Curator is a partition-based index that pairs with existing graph indexes in a dual-index architecture. It speeds up vector search under low-selectivity filters by building specialized per-label indexes that adapt to each label's distribution, cutting query latency by up to 20.9x.
Contribution
The paper proposes a partition-based index that complements graph indexes rather than replacing them, improving low-selectivity filtered ANNS performance at a small memory cost.
Findings
Reduces low-selectivity query latency by up to 20.9x
Keeps memory overhead low, with only a 4.3% increase
Supports incremental updates and complex predicates
Abstract
Embedding-based dense retrieval has become the cornerstone of many critical applications, where approximate nearest neighbor search (ANNS) queries are often combined with filters on labels such as dates and price ranges. Graph-based indexes achieve state-of-the-art performance on unfiltered ANNS but encounter connectivity breakdown on low-selectivity filtered queries, where qualifying vectors become sparse and the graph structure among them fragments. Recent research proposes specialized graph indexes that address this issue by expanding graph degree, which incurs prohibitively high construction costs. Given these inherent limitations of graph-based methods, we argue for a dual-index architecture and present Curator, a partition-based index that complements existing graph-based approaches for low-selectivity filtered ANNS. Curator builds specialized indexes for different labels within a…
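The dual-index idea in the abstract can be illustrated with a toy router. The sketch below is not Curator's algorithm: it assumes a single-label filter, uses an exact scan over a per-label id set as a stand-in for the partition-based index, and uses a post-filtered full scan as a stand-in for a graph index. The class name `ToyDualIndex`, its methods, and the `threshold` parameter are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyDualIndex:
    """Hypothetical illustration of routing a filtered query by selectivity."""

    def __init__(self):
        self.vectors = {}                  # id -> vector
        self.by_label = defaultdict(set)   # label -> ids carrying that label

    def insert(self, vid, vec, labels):
        # Incremental insert: register the vector under each of its labels.
        self.vectors[vid] = vec
        for label in labels:
            self.by_label[label].add(vid)

    def selectivity(self, label):
        # Fraction of stored vectors that pass the filter.
        if not self.vectors:
            return 0.0
        return len(self.by_label.get(label, ())) / len(self.vectors)

    def search_partition(self, query, label, k):
        # Stand-in for the partition-based path: touch only qualifying vectors.
        candidates = self.by_label.get(label, ())
        return sorted(candidates, key=lambda v: l2(self.vectors[v], query))[:k]

    def search_postfilter(self, query, label, k):
        # Stand-in for the graph path: rank everything, then apply the filter.
        allowed = self.by_label.get(label, ())
        ranked = sorted(self.vectors, key=lambda v: l2(self.vectors[v], query))
        return [v for v in ranked if v in allowed][:k]

    def search(self, query, label, k, threshold=0.1):
        # Assumed policy: send sparse filters to the partition path.
        if self.selectivity(label) < threshold:
            return "partition", self.search_partition(query, label, k)
        return "postfilter", self.search_postfilter(query, label, k)


index = ToyDualIndex()
for i in range(20):
    index.insert(i, (float(i), 0.0), {"rare"} if i == 17 else {"common"})

rare_route, rare_ids = index.search((0.0, 0.0), "rare", k=1)
common_route, common_ids = index.search((0.0, 0.0), "common", k=3)
```

Both paths here are exact, so they return the same neighbors; the point is only the routing decision. When few vectors qualify, scanning just the qualifying set avoids walking past many non-matching vectors, which is the regime the abstract describes as problematic for graph indexes.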
Taxonomy
Topics: Information Retrieval and Search Behavior · Advanced Graph Neural Networks · Graph Theory and Algorithms
