PipeANN-Filter: An Efficient Filtered Vector Search System on SSD

Hao Guo; Jiwu Shu; Youyou Lu

arXiv:2605.17992·cs.OS·May 19, 2026

TL;DR

PipeANN-Filter is an SSD-based filtered vector search system that explores a superset of the valid vectors and verifies attributes only after retrieving the top-k closest results. This reduces the SSD I/O spent on reading attributes and improves search latency and throughput.

Contribution

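It uses probabilistic data structures (e.g., Bloom filters) to identify the candidate superset during filtered vector search, accepting a small number of false-positive vector explorations in exchange for much less SSD I/O for attribute reading.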

Findings

01

Reduces SSD I/O for attribute reading by exploring a superset of the valid vectors and verifying attributes after the top-k search.

02

Improves search latency and throughput over existing systems.

03

Uses probabilistic data structures (e.g., Bloom filters) to identify the candidate superset.

Abstract

We propose PipeANN-Filter, an efficient filtered vector search system on SSD. Unlike existing systems that explore only valid vectors (i.e., those satisfying the attribute constraints) during search, PipeANN-Filter explores a superset of valid vectors, and performs attribute verification after getting the top-k closest result vectors. This allows PipeANN-Filter to leverage probabilistic data structures (e.g., Bloom filters) to identify the superset, trading off a small number of false-positive vector explorations for a massive reduction in SSD I/O for attribute reading. Evaluations show that PipeANN-Filter improves search latency and throughput compared to state-of-the-art systems. PipeANN-Filter is open-source at https://github.com/thustorage/PipeANN
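
The superset-then-verify idea in the abstract can be illustrated with a minimal sketch. This is not the PipeANN-Filter implementation: it assumes a brute-force scan in place of graph traversal on SSD, a toy in-memory Bloom filter over vector ids, and a single label per vector. All names (`BloomFilter`, `filtered_search`, `attribute_reads`) are hypothetical and chosen only for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter over integer ids (illustrative, not tuned)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                f"{seed}:{item}".encode(), digest_size=8
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # May return True for items never added (false positive),
        # but never returns False for an item that was added.
        return all(self.bits[pos] for pos in self._positions(item))


def filtered_search(query, vectors, attributes, bloom, label, k):
    """Return (ids of the k closest valid vectors, exact attribute reads).

    Assumption: `attributes[i]` stands in for an attribute read from SSD,
    so `attribute_reads` counts the lookups a real system would pay for.
    """
    # Step 1: build the superset using only the in-memory Bloom filter.
    superset = [i for i in range(len(vectors)) if bloom.might_contain(i)]
    # Step 2: rank the superset by distance to the query.
    ranked = sorted(superset, key=lambda i: math.dist(query, vectors[i]))
    # Step 3: verify exact attributes in distance order, stopping at k hits.
    results, attribute_reads = [], 0
    for i in ranked:
        attribute_reads += 1
        if attributes[i] == label:
            results.append(i)
            if len(results) == k:
                break
    return results, attribute_reads


# Demo data: 100 points on a line; every fifth point is labeled "red".
vectors = [(float(i), 0.0) for i in range(100)]
attributes = ["red" if i % 5 == 0 else "blue" for i in range(100)]

red_filter = BloomFilter()
for i, attr in enumerate(attributes):
    if attr == "red":
        red_filter.add(i)

results, attribute_reads = filtered_search(
    (12.0, 0.0), vectors, attributes, red_filter, "red", k=3
)
print(results)  # → [10, 15, 5]
print(attribute_reads)
```

Because a Bloom filter has no false negatives, every valid vector is in the superset and the final result is exact. False positives only add extra distance computations and extra verification reads, which is the trade-off the abstract describes.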

Code & Models

Repositories

thustorage/PipeANN (GitHub)