Optimal Data-Dependent Hashing for Approximate Near Neighbors
Alexandr Andoni, Ilya Razenshteyn

TL;DR
This paper introduces an optimal data-dependent hashing scheme for approximate near neighbor search that outperforms classical LSH methods in Euclidean and Hamming spaces, with improved query time and space complexity.
Contribution
It presents a new data-dependent hashing scheme that is proven to be optimal and surpasses existing LSH-based methods for all approximation factors.
Findings
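Achieves exponent $\rho = \frac{1}{2c^2-1}$ for Euclidean space and $\rho = \frac{1}{2c-1}$ for Hamming space, which governs both query time and space and is optimal for data-dependent hashing.
Outperforms classical LSH methods for all approximation factors $c > 1$.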
Provides a new technical approach by decomposing datasets into pseudo-random subsets.
Abstract
We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an $n$-point data set in a $d$-dimensional space our data structure achieves query time $O(d n^{\rho+o(1)})$ and space $O(n^{1+\rho+o(1)} + dn)$, where $\rho = \frac{1}{2c^2-1}$ for the Euclidean space and approximation $c > 1$. For the Hamming space, we obtain an exponent of $\rho = \frac{1}{2c-1}$. Our result completes the direction set forth in [AINR14] who gave a proof-of-concept that data-dependent hashing can outperform classical Locality Sensitive Hashing (LSH). In contrast to [AINR14], the new bound is not only optimal, but in fact improves over the best (optimal) LSH data structures [IM98,AI06] for all approximation factors $c > 1$. From the technical perspective, we proceed by decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.
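To make the comparison with classical LSH concrete, the following minimal sketch tabulates the exponent $\rho$ for a few approximation factors. The data-dependent exponents are the ones stated in the abstract. The classical LSH exponents used for comparison ($1/c^2$ for Euclidean space as in [AI06] and $1/c$ for Hamming space as in [IM98], both up to lower-order terms) are not spelled out in this text and are included here as an assumption about the baselines being referenced. All function names are hypothetical and chosen for illustration only.

```python
def _check(c: float) -> None:
    # The bounds are stated for approximation factors c > 1 only.
    if c <= 1.0:
        raise ValueError("approximation factor c must be > 1")


def rho_data_dependent_euclidean(c: float) -> float:
    """Exponent from the abstract: rho = 1 / (2c^2 - 1)."""
    _check(c)
    return 1.0 / (2.0 * c * c - 1.0)


def rho_data_dependent_hamming(c: float) -> float:
    """Exponent from the abstract: rho = 1 / (2c - 1)."""
    _check(c)
    return 1.0 / (2.0 * c - 1.0)


def rho_lsh_euclidean(c: float) -> float:
    """Assumed classical LSH baseline [AI06]: rho = 1 / c^2 (ignoring o(1))."""
    _check(c)
    return 1.0 / (c * c)


def rho_lsh_hamming(c: float) -> float:
    """Assumed classical LSH baseline [IM98]: rho = 1 / c."""
    _check(c)
    return 1.0 / c


def exponent_table(cs):
    """Return one row per c: (c, LSH l2, data-dep. l2, LSH Hamming, data-dep. Hamming)."""
    return [
        (
            c,
            rho_lsh_euclidean(c),
            rho_data_dependent_euclidean(c),
            rho_lsh_hamming(c),
            rho_data_dependent_hamming(c),
        )
        for c in cs
    ]


if __name__ == "__main__":
    print(f"{'c':>5} {'LSH l2':>10} {'DD l2':>10} {'LSH Ham':>10} {'DD Ham':>10}")
    for row in exponent_table([1.1, 1.5, 2.0, 3.0]):
        print(f"{row[0]:>5.2f} " + " ".join(f"{x:>10.4f}" for x in row[1:]))
```

A smaller $\rho$ means a query time closer to constant in $n$ and a space bound closer to linear. For any $c > 1$ the inequality $2c^2 - 1 > c^2$ holds (and likewise $2c - 1 > c$), so the data-dependent exponent is strictly smaller than the corresponding baseline, which matches the claim of improvement for all approximation factors.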
Videos
Locality-Sensitive Hashing and Beyond · YouTube
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
