Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations
Piotr Indyk, Haike Xu

TL;DR
This paper analyzes the worst-case performance of popular graph-based approximate nearest neighbor search algorithms. It proves guarantees for one variant of DiskANN and exhibits instances on which the other variants need query time linear in the dataset size.
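Graph-based methods such as HNSW, NSG and DiskANN answer a query by walking a proximity graph toward the query point, so "query time" is roughly the number of hops and distance evaluations the walk performs. The sketch below is a plain greedy walk with beam width 1, written only to make that cost model concrete. The function name `greedy_search`, the adjacency-dict representation and the stop-at-local-minimum rule are illustrative assumptions, not any library's API. The real implementations keep a candidate list (beam), and HNSW additionally searches a layered hierarchy.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_search(
    points: Sequence[Point],
    graph: Dict[int, List[int]],
    start: int,
    query: Point,
) -> Tuple[int, int]:
    """Greedy walk on a proximity graph (beam width 1; hypothetical helper).

    From `start`, repeatedly move to the out-neighbor closest to `query`;
    stop when no out-neighbor is closer than the current node.
    Returns (index of the node where the walk stops, number of hops taken).
    """
    current = start
    current_d = math.dist(points[current], query)
    hops = 0
    while True:
        best, best_d = current, current_d
        for nbr in graph[current]:
            d = math.dist(points[nbr], query)
            if d < best_d:  # strict improvement only, so the walk terminates
                best, best_d = nbr, d
        if best == current:  # local minimum reached
            return current, hops
        current, current_d = best, best_d
        hops += 1


if __name__ == "__main__":
    # Toy example: 10 points on a line, each linked only to its immediate neighbors.
    n = 10
    pts = [(float(i),) for i in range(n)]
    path_graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
    node, hops = greedy_search(pts, path_graph, start=0, query=(7.2,))
    print(node, hops)  # 7 7
```

On this toy path graph the number of hops grows linearly with the distance walked, because the graph offers no long-range shortcuts. It only illustrates the cost model. It is not the hard instance constructed in the paper.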
Contribution
The paper provides the first worst-case performance bounds for graph-based algorithms such as HNSW, NSG, and DiskANN. It identifies both the conditions that guarantee efficient queries and the instances on which efficiency breaks down.
Findings
DiskANN with slow preprocessing supports constant-factor approximation with poly-logarithmic query time on data sets of bounded intrinsic dimension.
DiskANN with fast preprocessing, HNSW, and NSG can require linear query time in the worst case on certain data instances.
On these instances, the empirical query time needed to reach reasonable accuracy grows linearly with the dataset size.
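For reference, the standard notions behind "constant approximation" and "bounded intrinsic dimension" are stated below. We assume intrinsic dimension is measured by the doubling dimension, a common formalization in this line of work. The paper's exact definitions and parameters may differ. Here $P$ is the data set, $D$ the metric, $q$ the query and $B(x, r)$ the ball of radius $r$ around $x$.

```latex
% c-approximate nearest neighbor (c >= 1): the returned point \hat{p} satisfies
D(q, \hat{p}) \;\le\; c \cdot \min_{p \in P} D(q, p)

% Doubling dimension: the smallest d such that, for every center x and radius r > 0,
% there exist points y_1, \dots, y_{2^d} with
B(x, r) \;\subseteq\; \bigcup_{i=1}^{2^{d}} B\!\left(y_i, \tfrac{r}{2}\right)
```

"Constant approximation" then means $c$ does not grow with the number of points, and "bounded intrinsic dimension" means $d$ is treated as a constant.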
Abstract
Graph-based approaches to nearest neighbor search are popular and powerful tools for handling large datasets in practice, but they have limited theoretical guarantees. We study the worst-case performance of recent graph-based approximate nearest neighbor search algorithms, such as HNSW, NSG and DiskANN. For DiskANN, we show that its "slow preprocessing" version provably supports approximate nearest neighbor search queries with constant approximation ratio and poly-logarithmic query time, on data sets with bounded "intrinsic" dimension. For the other data structure variants studied, including DiskANN with "fast preprocessing", HNSW and NSG, we present a family of instances on which the empirical query time required to achieve a "reasonable" accuracy is linear in instance size. For example, for DiskANN, we show that the query procedure can take at least steps on instances of size…
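The "slow preprocessing" variant can be pictured with DiskANN's alpha-pruning rule (RobustPrune). Our reading, which is an assumption to check against the paper, is that the slow variant runs this rule for every point with the entire dataset as its candidate set and no degree cap. The names `robust_prune` and `build_slow_graph` are hypothetical. In the DiskANN design, a pruning parameter alpha > 1 is what keeps some longer edges in the graph.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def robust_prune(
    p: int, candidates: Iterable[int], points: Sequence[Point], alpha: float
) -> List[int]:
    """Alpha-pruning rule in the style of DiskANN's RobustPrune (no degree cap assumed).

    Repeatedly keep the candidate p* closest to p, then drop every remaining
    candidate p' with alpha * D(p*, p') <= D(p, p'), i.e. p' is considered
    already "covered" by the edge p -> p*.
    """

    def d_p(c: int) -> float:
        return math.dist(points[p], points[c])

    remaining = sorted((c for c in candidates if c != p), key=d_p)
    neighbors: List[int] = []
    while remaining:
        star = remaining.pop(0)  # closest remaining candidate (list stays sorted)
        neighbors.append(star)
        remaining = [
            c for c in remaining
            if alpha * math.dist(points[star], points[c]) > d_p(c)
        ]
    return neighbors


def build_slow_graph(points: Sequence[Point], alpha: float = 1.5) -> Dict[int, List[int]]:
    """Hypothetical 'slow preprocessing': prune against the whole dataset for every point."""
    ids = range(len(points))
    return {p: robust_prune(p, ids, points, alpha) for p in ids}


if __name__ == "__main__":
    # 10 points on a line; with alpha = 1.5, point 0 keeps one short and one longer edge.
    pts = [(float(i),) for i in range(10)]
    g = build_slow_graph(pts, alpha=1.5)
    print(g[0])  # [1, 4]
```

Scanning all n points for each of the n nodes makes this sketch cubic in the worst case, which is why such a variant is impractical at scale. As we understand it, the fast variant instead prunes a much smaller candidate list gathered by a greedy search over the partially built graph.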
Taxonomy
Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Machine Learning and Algorithms
