PiPNN: Ultra-Scalable Graph-Based Nearest Neighbor Indexing
Tobias Rubel, Richard Wen, Laxman Dhulipala, Lars Gottesbüren, Rajesh Jayaram, Jakub Łącki

TL;DR
PiPNN introduces a scalable graph construction algorithm for Approximate Nearest Neighbor Search that sharply reduces index build time while maintaining high query performance, making it possible to index billion-scale datasets quickly.
Contribution
PiPNN's HashPrune, an online pruning algorithm that maintains sparse collections of edges within bounded memory, enables fast, scalable graph-based index construction that outperforms existing methods in build speed and scalability.
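The source describes HashPrune only as an online pruning algorithm that keeps edge collections sparse with bounded memory; it does not spell out the pruning rule. A minimal sketch of how such a bounded-memory edge pruner could work, assuming (hypothetically, not confirmed by the source) that each point keeps at most `capacity` candidate edges, buckets each candidate by a random-hyperplane hash of the direction from the point to the candidate, keeps only the closer candidate when two land in the same bucket, and evicts the farthest stored edge when the reservoir is full. `BoundedEdgeReservoir`, `direction_hash`, `random_hyperplanes`, and `capacity` are illustrative names.

```python
import math
import random


def random_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits Gaussian hyperplane normals (assumed hashing scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def direction_hash(point, candidate, planes):
    """Pack the signs of (candidate - point) . normal into an integer bucket key."""
    diff = [c - p for c, p in zip(candidate, point)]
    code = 0
    for bit, normal in enumerate(planes):
        if sum(d * n for d, n in zip(diff, normal)) >= 0.0:
            code |= 1 << bit
    return code


class BoundedEdgeReservoir:
    """Hypothetical per-point edge store holding at most `capacity` edges."""

    def __init__(self, node_id, point, capacity, planes):
        self.node_id = node_id
        self.point = point
        self.capacity = capacity
        self.planes = planes
        self.buckets = {}  # bucket key -> (distance, neighbor_id)

    def insert(self, neighbor_id, neighbor_vec):
        if neighbor_id == self.node_id:
            return  # never store a self-loop
        dist = math.dist(self.point, neighbor_vec)
        key = direction_hash(self.point, neighbor_vec, self.planes)
        current = self.buckets.get(key)
        if current is not None:
            # Two candidates in the same direction bucket: keep the closer one.
            if dist < current[0]:
                self.buckets[key] = (dist, neighbor_id)
            return
        if len(self.buckets) < self.capacity:
            self.buckets[key] = (dist, neighbor_id)
            return
        # Reservoir full: evict the farthest stored edge if the newcomer is closer.
        far_key = max(self.buckets, key=lambda k: self.buckets[k][0])
        if dist < self.buckets[far_key][0]:
            del self.buckets[far_key]
            self.buckets[key] = (dist, neighbor_id)

    def neighbors(self):
        """Stored neighbor ids, nearest first."""
        return [nid for _, nid in sorted(self.buckets.values())]


# Stream four candidate edges for point 0 into a reservoir of capacity 2.
planes = random_hyperplanes(dim=2, num_bits=3, seed=42)
res = BoundedEdgeReservoir(node_id=0, point=(0.0, 0.0), capacity=2, planes=planes)
for nid, vec in [(1, (1.0, 0.0)), (2, (2.0, 0.0)), (3, (0.0, 1.0)), (4, (-0.5, 0.0))]:
    res.insert(nid, vec)
print(res.neighbors())
```

However many candidate edges are streamed in, each point stores at most `capacity` entries, which is the bounded-memory property the source attributes to HashPrune. Candidates 1 and 2 point in the same direction, so they share a bucket and only the closer one survives. The direction-hashing rule itself is an assumption made for illustration.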
Findings
PiPNN builds indexes up to 12.9x faster than HNSW and Vamana.
Builds high-quality indexes on billion-scale datasets in under 20 minutes.
Achieves higher query throughput than previous algorithms.
Abstract
The fastest indexes for Approximate Nearest Neighbor Search today are also the slowest to build: graph-based methods like HNSW and Vamana achieve state-of-the-art query performance but have long construction times because they rely on random-access-heavy beam searches. We introduce PiPNN (Pick-in-Partitions Nearest Neighbors), an ultra-scalable graph construction algorithm that avoids the "search bottleneck" from which existing graph-based methods suffer. PiPNN's core innovation is HashPrune, a novel online pruning algorithm that dynamically maintains sparse collections of edges. HashPrune enables PiPNN to partition the dataset into overlapping sub-problems, efficiently perform bulk distance comparisons via dense matrix multiplication kernels, and stream a subset of the edges into HashPrune. HashPrune guarantees bounded memory during index construction, which permits PiPNN to build…
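The pipeline the abstract outlines (overlapping sub-problems, bulk distance comparisons via dense matrix multiplication, then a subset of edges streamed onward) can be illustrated with a small pure-Python sketch. Assumptions, none confirmed by the source: overlap comes from assigning each point to its `fanout` nearest "leader" points, pairwise squared distances inside a leaf come from the Gram-matrix identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>, and each point emits edges to its `k` nearest leaf-mates. The helpers `gram_sq_distances`, `overlapping_leaves`, and `leaf_candidate_edges` are hypothetical; a real implementation would compute the inner products with an optimized GEMM kernel rather than nested loops.

```python
import math


def gram_sq_distances(vectors):
    """All pairwise squared Euclidean distances via ||x||^2 + ||y||^2 - 2<x, y>.

    The inner-product (Gram) matrix is the part a real system would hand to a
    dense matrix-multiplication kernel; here it is a plain nested loop.
    """
    n = len(vectors)
    gram = [[sum(a * b for a, b in zip(vectors[i], vectors[j])) for j in range(n)]
            for i in range(n)]
    # Clamp at 0.0 to absorb small negative values from floating-point rounding.
    return [[max(gram[i][i] + gram[j][j] - 2.0 * gram[i][j], 0.0) for j in range(n)]
            for i in range(n)]


def overlapping_leaves(points, leader_ids, fanout):
    """Hypothetical overlapping partition: each point joins its `fanout` nearest leaders' leaves."""
    leaves = {lid: [] for lid in leader_ids}
    for pid, vec in enumerate(points):
        nearest = sorted(leader_ids, key=lambda lid: math.dist(vec, points[lid]))[:fanout]
        for lid in nearest:
            leaves[lid].append(pid)
    return list(leaves.values())


def leaf_candidate_edges(points, leaf, k):
    """Emit (src, dst, squared_distance) for each point's k nearest leaf-mates."""
    d2 = gram_sq_distances([points[pid] for pid in leaf])
    edges = []
    for i, src in enumerate(leaf):
        order = sorted((j for j in range(len(leaf)) if j != i), key=lambda j: d2[i][j])
        for j in order[:k]:
            edges.append((src, leaf[j], d2[i][j]))
    return edges


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
leaves = overlapping_leaves(points, leader_ids=[0, 2, 4], fanout=2)
edges = [e for leaf in leaves for e in leaf_candidate_edges(points, leaf, k=1)]
print(leaves)
print(edges)
```

Because leaves overlap, the same edge can be emitted by more than one leaf (here point 0 picks point 1 in two different leaves). Deduplicating and thinning such an edge stream within bounded memory is the role the source assigns to HashPrune.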
Taxonomy
Topics: Graph Theory and Algorithms · Data Management and Algorithms · Advanced Image and Video Retrieval Techniques
