B+ANN: A Fast Billion-Scale Disk-based Nearest-Neighbor Index
Selim Furkan Tekin, Rajesh Bordawekar

TL;DR
The paper introduces B+ANN, a disk-based nearest-neighbor index that improves performance and memory efficiency over existing methods like HNSW and DiskANN, while supporting dissimilarity queries.
Contribution
It presents a novel B+ tree-based disk index that enhances cache locality, reduces memory use, and enables dissimilarity queries in large-scale vector search.
Findings
Improves recall and QPS over HNSW
Reduces cache misses by 19.23%
Decreases build time and memory usage by 24x
Abstract
Storing and processing of embedding vectors by specialized vector databases (VDBs) has become the linchpin in building modern AI pipelines. Most current VDBs employ variants of a graph-based approximate nearest-neighbor (ANN) index algorithm, HNSW, to answer semantic queries over stored vectors. In spite of its widespread use, the HNSW algorithm suffers from several issues: in-memory design and implementation, random memory accesses leading to degradation in cache behavior, limited acceleration scope due to fine-grained pairwise computations, and support of only semantic similarity queries. In this paper, we present a novel disk-based ANN index, B+ANN, to address these issues: it first partitions input data into blocks containing semantically similar items, then builds a B+ tree variant to store blocks both in-memory and on disks, and finally, enables hybrid edge- and block-based…
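The first step named in the abstract, grouping similar items into blocks and then answering a query by scanning only the most relevant blocks, can be illustrated with a small sketch. This is not the paper's algorithm: plain k-means stands in for the partitioning step (an assumption), the blocks are kept in Python lists rather than in a B+ tree on disk, and the names `partition_into_blocks`, `search`, and `probes` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Put each vector index into the block of its nearest centroid."""
    blocks = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        blocks[nearest].append(idx)
    return blocks


def partition_into_blocks(vectors, num_blocks, iters=10, seed=0):
    """Group similar vectors into blocks with plain k-means.

    Assumption: k-means is only a stand-in for the paper's partitioning.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, num_blocks)]
    dim = len(vectors[0])
    for _ in range(iters):
        blocks = assign(vectors, centroids)
        for c, members in enumerate(blocks):
            if members:  # keep the old centroid if a block is empty
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    return centroids, assign(vectors, centroids)


def search(query, vectors, centroids, blocks, k=1, probes=1):
    """Scan only the `probes` blocks whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:probes] for i in blocks[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]


# Toy data: two well-separated groups of 2-D points.
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, blocks = partition_into_blocks(vectors, num_blocks=2)
nearest = search((0.2, 0.2), vectors, centroids, blocks, k=1)
```

Because each block holds items that are close to one another, a query touches a few contiguous blocks instead of following scattered pointers, which is the locality argument the abstract makes against HNSW's random memory accesses. Raising `probes` trades speed for recall in this sketch.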
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Graph Theory and Algorithms
