Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors
Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn, Erik Waingarten

TL;DR
This paper establishes tight upper and lower bounds for time-space trade-offs in approximate near neighbor search in high-dimensional Euclidean spaces, achieving sublinear query time with near-linear space for all approximation factors greater than one.
Contribution
The paper introduces a new data structure that achieves an optimal balance between space and query time for approximate near neighbor search. It also proves matching lower bounds, including the first two-probe space lower bound that is not polynomially smaller than the one-probe bound.
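To make the space/query-time balance concrete, here is a toy sketch of the asymmetric filtering idea behind Spherical Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016], which the paper's data structure builds on. It is not the paper's construction. It is data-independent, single-level, and scans every filter linearly, whereas efficient versions use structured filters. The names `ToySphericalFilters`, `insert`, `candidates`, `space` and the thresholds `eta_u`, `eta_q` are hypothetical, chosen only for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a uniformly random point on the unit sphere in R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToySphericalFilters:
    """Hypothetical toy index: random filter directions with asymmetric thresholds.

    A stored point x goes into the bucket of filter g when <x, g> >= eta_u.
    A query q probes every filter with <q, g> >= eta_q. A higher eta_u means
    fewer stored entries (less space); to keep near neighbors colliding with
    the query, eta_q must then be lower, so queries probe more buckets.
    """

    def __init__(self, dim, num_filters, eta_u, eta_q, seed=0):
        rng = random.Random(seed)
        self.filters = [random_unit_vector(dim, rng) for _ in range(num_filters)]
        self.eta_u = eta_u
        self.eta_q = eta_q
        self.buckets = [[] for _ in range(num_filters)]

    def insert(self, idx, x):
        # Update side: store the point's index in every filter it passes.
        for j, g in enumerate(self.filters):
            if dot(x, g) >= self.eta_u:
                self.buckets[j].append(idx)

    def candidates(self, q):
        # Query side: collect indices from every filter the query passes.
        found = set()
        for j, g in enumerate(self.filters):
            if dot(q, g) >= self.eta_q:
                found.update(self.buckets[j])
        return found

    def space(self):
        # Total number of stored (filter, point) entries.
        return sum(len(b) for b in self.buckets)


# Small usage example on random unit vectors.
rng = random.Random(1)
points = [random_unit_vector(16, rng) for _ in range(100)]
index = ToySphericalFilters(dim=16, num_filters=200, eta_u=0.5, eta_q=0.3, seed=2)
for i, p in enumerate(points):
    index.insert(i, p)
print(index.space(), len(index.candidates(points[0])))
```

Raising `eta_u` shrinks `space()`. Keeping near neighbors reachable then calls for a lower `eta_q`, which makes each query probe more filters. This asymmetry is, roughly, the knob that the space and query-time exponents in the paper trade against each other.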
Findings
Achieves sublinear query time with near-linear space for all c > 1
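Provides matching lower bounds: tight for the whole trade-off in a restricted model that captures all known hashing-based approaches, and unconditional cell-probe bounds for one and two probes when ρ_q = 0
Establishes a connection to locally-decodable codes to prove the two-probe lower bound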
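The first finding can be checked directly against the trade-off curve $c^2 \sqrt{\rho_q} + (c^2 - 1)\sqrt{\rho_u} = \sqrt{2c^2 - 1}$ from the paper's abstract, where space is $n^{1+\rho_u+o(1)}$ and query time is $n^{\rho_q+o(1)}$. Near-linear space corresponds to $\rho_u = 0$. The worked arithmetic below is derived from that equation rather than quoted from the paper, and it also gives the balanced point and the $\rho_q = 0$ end of the curve.

```latex
% Trade-off: c^2 \sqrt{\rho_q} + (c^2 - 1)\sqrt{\rho_u} = \sqrt{2c^2 - 1}
\begin{align*}
\rho_u = 0 &\implies \rho_q = \frac{2c^2 - 1}{c^4} = 1 - \frac{(c^2 - 1)^2}{c^4} < 1 \quad \text{for all } c > 1,\\
\rho_q = \rho_u = \rho &\implies (2c^2 - 1)\sqrt{\rho} = \sqrt{2c^2 - 1} \implies \rho = \frac{1}{2c^2 - 1},\\
\rho_q = 0 &\implies \rho_u = \frac{2c^2 - 1}{(c^2 - 1)^2}.
\end{align*}
```

For example, at $c = 2$ near-linear space gives $\rho_q = 7/16$, and the balanced setting gives $\rho = 1/7$.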
Abstract
[See the paper for the full abstract.] We show tight upper and lower bounds for time-space trade-offs for the $c$-Approximate Near Neighbor Search problem. For the $d$-dimensional Euclidean space and $n$-point datasets, we develop a data structure with space $n^{1 + \rho_u + o(1)} + O(dn)$ and query time $n^{\rho_q + o(1)} + d n^{o(1)}$ for every $\rho_u, \rho_q \geq 0$ such that: \begin{equation} c^2 \sqrt{\rho_q} + (c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}. \end{equation} This is the first data structure that achieves sublinear query time and near-linear space for every approximation factor $c > 1$, improving upon [Kapralov, PODS 2015]. The data structure is a culmination of a long line of work on the problem for all space regimes; it builds on Spherical Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and data-dependent hashing [Andoni, Indyk, Nguyen,…
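The trade-off equation in the abstract can be solved for $\rho_q$ in closed form, $\rho_q = \big((\sqrt{2c^2 - 1} - (c^2 - 1)\sqrt{\rho_u})/c^2\big)^2$, which is valid while the bracket is non-negative. The following minimal sketch evaluates it. The function name `query_exponent` is hypothetical, and the exponents ignore the $o(1)$ terms.

```python
import math


def query_exponent(c, rho_u):
    """Return rho_q on the curve c^2*sqrt(rho_q) + (c^2 - 1)*sqrt(rho_u) = sqrt(2c^2 - 1).

    Space is roughly n^(1 + rho_u) and query time roughly n^rho_q (o(1) terms
    dropped). Returns None when rho_u lies beyond the rho_q = 0 end of the curve.
    """
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    if rho_u < 0:
        raise ValueError("rho_u must be non-negative")
    c2 = c * c
    bracket = math.sqrt(2 * c2 - 1) - (c2 - 1) * math.sqrt(rho_u)
    if bracket < 0:
        return None
    return (bracket / c2) ** 2


# Sweep the space exponent for c = 2 and print the matching query exponent.
for rho_u in (0.0, 0.1, 1 / 7, 0.5):
    print(f"rho_u = {rho_u:.3f}  ->  rho_q = {query_exponent(2.0, rho_u):.4f}")
```

Passing `rho_u = 0` gives the query exponent for near-linear space. Values of `rho_u` beyond $(2c^2 - 1)/(c^2 - 1)^2$ return `None`, because no non-negative $\rho_q$ satisfies the equation there.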
