Graph-based time-space trade-offs for approximate near neighbors
Thijs Laarhoven

TL;DR
This paper provides a rigorous asymptotic analysis of graph-based approximate nearest neighbor search, revealing conditions under which it matches hash-based methods in efficiency and exploring its scalability for large datasets.
Contribution
It introduces a formal complexity analysis of greedy graph-based near neighbor search, establishing conditions for optimal trade-offs and comparing with hash-based approaches.
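As an illustration of the kind of greedy graph search being analyzed, here is a minimal sketch rather than the paper's exact construction. It assumes a brute-force k-nearest-neighbor graph as a stand-in for the near neighbor graph and a single deterministic walk from a fixed start node; the names `build_neighbor_graph`, `greedy_walk`, and the parameter `k` are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    """Sample a point uniformly from the d-dimensional Euclidean unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_neighbor_graph(points, k):
    """Brute-force graph: connect each point to its k closest other points.

    Assumption: a k-NN graph stands in for the near neighbor graph here.
    On the unit sphere, a larger inner product means a smaller distance.
    """
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dot(p, points[j]), reverse=True)
        graph.append(others[:k])
    return graph


def greedy_walk(points, graph, query, start):
    """Step to the neighbor closest to the query until no neighbor improves."""
    current = start
    current_sim = dot(points[current], query)
    while True:
        best = max(graph[current], key=lambda j: dot(points[j], query))
        best_sim = dot(points[best], query)
        if best_sim <= current_sim:
            return current  # local optimum: no neighbor is closer to the query
        current, current_sim = best, best_sim


if __name__ == "__main__":
    rng = random.Random(0)
    points = [random_unit_vector(16, rng) for _ in range(200)]
    graph = build_neighbor_graph(points, k=10)
    query = random_unit_vector(16, rng)
    found = greedy_walk(points, graph, query, start=0)
    print("walk ended at node", found)
```

The walk always terminates because the similarity to the query strictly increases at every step. It may stop at a local optimum that is not the true nearest neighbor, which is one reason a randomized variant with multiple starting points is natural.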
Findings
Graph-based search matches hash-based trade-offs for small approximation factors and near-linear memory.
Query-time and space bounds are stated as exponents of the dataset size and depend on the approximation factor.
Scalability analyzed for datasets of size exponential in dimension.
Abstract
We take a first step towards a rigorous asymptotic analysis of graph-based approaches for finding (approximate) nearest neighbors in high-dimensional spaces, by analyzing the complexity of (randomized) greedy walks on the approximate near neighbor graph. For random data sets of size $n = 2^{o(d)}$ on the $d$-dimensional Euclidean unit sphere, using near neighbor graphs we can provably solve the approximate nearest neighbor problem with approximation factor $c > 1$ in query time $n^{\rho_q + o(1)}$ and space $n^{1 + \rho_s + o(1)}$, for arbitrary $\rho_q, \rho_s \geq 0$ satisfying \begin{align} (2c^2 - 1) \rho_q + 2 c^2 (c^2 - 1) \sqrt{\rho_s (1 - \rho_s)} \geq c^4. \end{align} Graph-based near neighbor searching is especially competitive with hash-based methods for small $c$ and near-linear memory, and in this regime the asymptotic scaling of a greedy graph-based search matches the…
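To make the trade-off condition concrete, the following sketch evaluates the inequality exactly as it is printed in the abstract. The function names are hypothetical, and restricting $\rho_s$ to $[0, 1]$ is an assumption made only so that the square root is defined.

```python
import math


def tradeoff_lhs(c, rho_q, rho_s):
    """Left-hand side of the printed inequality:
    (2c^2 - 1) rho_q + 2 c^2 (c^2 - 1) sqrt(rho_s (1 - rho_s))."""
    if not 0.0 <= rho_s <= 1.0:
        raise ValueError("rho_s must lie in [0, 1] for the square root to be real")
    c2 = c * c
    return (2 * c2 - 1) * rho_q + 2 * c2 * (c2 - 1) * math.sqrt(rho_s * (1 - rho_s))


def satisfies_tradeoff(c, rho_q, rho_s):
    """True if (rho_q, rho_s) satisfies the inequality for approximation factor c."""
    return tradeoff_lhs(c, rho_q, rho_s) >= c ** 4


def min_query_exponent(c, rho_s):
    """Smallest rho_q >= 0 allowed by the inequality for a given c and rho_s,
    obtained by solving the inequality for rho_q."""
    c2 = c * c
    root = math.sqrt(rho_s * (1 - rho_s))
    return max(0.0, (c2 * c2 - 2 * c2 * (c2 - 1) * root) / (2 * c2 - 1))


if __name__ == "__main__":
    for rho_s in (0.1, 0.25, 0.5):
        print(rho_s, min_query_exponent(1.1, rho_s))
```

Solving for $\rho_q$ gives $\rho_q \geq \bigl(c^4 - 2c^2(c^2 - 1)\sqrt{\rho_s(1 - \rho_s)}\bigr) / (2c^2 - 1)$. For example, with $c = 2$ and $\rho_s = 1/2$ this evaluates to $(16 - 12)/7 = 4/7$.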
