TL;DR
PipeANN-Filter is an SSD-based filtered vector search system that explores a superset of the valid vectors and verifies attributes only after finding the top-k closest results, cutting the SSD I/O spent on attribute reads and improving search latency and throughput.
Contribution
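It uses probabilistic data structures (e.g., Bloom filters) to identify a superset of the valid vectors and defers exact attribute verification until after the top-k search, which reduces SSD I/O for attribute reading during filtered vector search.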
Findings
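Reduces SSD I/O for attribute reading by exploring a superset of valid vectors and verifying attributes after the top-k search.
Improves search latency and throughput over state-of-the-art systems.
Uses probabilistic data structures such as Bloom filters to identify the candidate superset, at the cost of a small number of false-positive vector explorations.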
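The size of the false-positive cost can be estimated with the standard Bloom filter approximation. This is textbook background, not a figure from the paper; the symbols m (bits in the filter), n (inserted items), and h (hash functions) are introduced here for illustration only.

```latex
p \approx \left(1 - e^{-hn/m}\right)^{h}
```

For example, at m/n = 10 bits per item and h = 7, this gives p of roughly 0.8%, so only a small fraction of invalid vectors would pass the filter and be explored unnecessarily.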
Abstract
We propose PipeANN-Filter, an efficient filtered vector search system on SSD. Unlike existing systems that explore only valid vectors (i.e., those satisfying the attribute constraints) during search, PipeANN-Filter explores a superset of valid vectors, and performs attribute verification after getting the top-k closest result vectors. This allows PipeANN-Filter to leverage probabilistic data structures (e.g., Bloom filters) to identify the superset, trading off a small number of false-positive vector explorations for a massive reduction in SSD I/O for attribute reading. Evaluations show that PipeANN-Filter improves search latency and throughput compared to state-of-the-art systems. PipeANN-Filter is open-source at https://github.com/thustorage/PipeANN
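The explore-then-verify idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PipeANN-Filter implementation: `BloomFilter`, `filtered_search`, and the `slack` parameter are hypothetical names, a brute-force scan stands in for the on-SSD graph traversal, the in-memory `labels` list stands in for attribute data stored on SSD, and one Bloom filter per attribute value is assumed.

```python
import hashlib
import math


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                f"{seed}:{item}".encode(), digest_size=8
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def filtered_search(query, vectors, labels, bloom, wanted_label, k, slack=2):
    """Return up to k ids of the closest vectors carrying wanted_label.

    slack (an assumption of this sketch) over-fetches a few extra
    candidates so that dropped false positives do not shrink the result.
    """
    # Step 1: explore a superset of the valid vectors. Only the in-memory
    # Bloom filter is consulted, so no attribute data is read here.
    candidates = [i for i in range(len(vectors)) if i in bloom]
    # Step 2: rank the superset by distance and keep the closest ones.
    candidates.sort(key=lambda i: math.dist(query, vectors[i]))
    top = candidates[: k + slack]
    # Step 3: verify attributes exactly, only for the few survivors.
    # In a real system this is where the attribute read would happen.
    return [i for i in top if labels[i] == wanted_label][:k]


# Toy data: six 2-D vectors with alternating labels (illustrative only).
demo_vectors = [(float(i), 0.0) for i in range(6)]
demo_labels = ["a", "b", "a", "b", "a", "b"]

# Index the ids of all vectors labeled "a".
demo_bloom = BloomFilter(num_bits=1024, num_hashes=3)
for idx, lab in enumerate(demo_labels):
    if lab == "a":
        demo_bloom.add(idx)

demo_result = filtered_search(
    (0.9, 0.0), demo_vectors, demo_labels, demo_bloom, "a", k=2
)
```

Because a Bloom filter never produces false negatives, every valid vector stays in the candidate set. The price is that a few invalid vectors may be explored and then discarded in the verification step.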
