Optimal Data-Dependent Hashing for Approximate Near Neighbors

Alexandr Andoni; Ilya Razenshteyn

arXiv:1501.01062 · cs.DS · July 17, 2015 · 33 cites

TL;DR

This paper gives an optimal data-dependent hashing scheme for approximate near neighbor search. In both Euclidean and Hamming space, it achieves smaller exponents in the query time and space bounds than the best classical LSH data structures.

Contribution

It presents a new data-dependent hashing scheme that is optimal and improves over the best existing LSH data structures for all approximation factors $c > 1$.

Findings

01

Achieves query time $O(d n^{ρ + o(1)})$ and space $O(n^{1 + ρ + o(1)} + d n)$ in Euclidean space, with the optimal exponent $ρ = \frac{1}{2c^2 - 1}$.

02

Improves over the best classical LSH data structures [IM98, AI06] for all approximation factors $c > 1$.

03

Introduces a new technical approach: decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.

Abstract

We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an $n$-point data set in a $d$-dimensional space, our data structure achieves query time $O(d n^{ρ + o(1)})$ and space $O(n^{1 + ρ + o(1)} + d n)$, where $ρ = \frac{1}{2c^2 - 1}$ for the Euclidean space and approximation $c > 1$. For the Hamming space, we obtain an exponent of $ρ = \frac{1}{2c - 1}$. Our result completes the direction set forth in [AINR14], who gave a proof-of-concept that data-dependent hashing can outperform classical Locality Sensitive Hashing (LSH). In contrast to [AINR14], the new bound is not only optimal, but in fact improves over the best (optimal) LSH data structures [IM98, AI06] for all approximation factors $c > 1$. From the technical perspective, we proceed by decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.
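
To make the exponents in the abstract concrete, the following is a minimal Python sketch that evaluates them for a few approximation factors. The function names are hypothetical and chosen only for illustration. The classical LSH exponents used for comparison, $1/c^2$ for Euclidean space [AI06] and $1/c$ for Hamming space [IM98], are not stated in the abstract and are included here as an assumption based on the standard bounds for those data structures. The sketch ignores the $o(1)$ terms.

```python
def rho_data_dependent(c: float, metric: str = "euclidean") -> float:
    """Exponent from the abstract: 1/(2c^2 - 1) Euclidean, 1/(2c - 1) Hamming."""
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    if metric == "euclidean":
        return 1.0 / (2.0 * c * c - 1.0)
    if metric == "hamming":
        return 1.0 / (2.0 * c - 1.0)
    raise ValueError(f"unknown metric: {metric!r}")


def rho_classical_lsh(c: float, metric: str = "euclidean") -> float:
    """Assumed classical LSH exponents (not given in the abstract):
    1/c^2 for Euclidean [AI06], 1/c for Hamming [IM98]."""
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    if metric == "euclidean":
        return 1.0 / (c * c)
    if metric == "hamming":
        return 1.0 / c
    raise ValueError(f"unknown metric: {metric!r}")


if __name__ == "__main__":
    # Print both exponents side by side for a few approximation factors.
    for metric in ("euclidean", "hamming"):
        for c in (1.5, 2.0, 3.0):
            new = rho_data_dependent(c, metric)
            old = rho_classical_lsh(c, metric)
            print(f"{metric:9s} c={c:.1f}  data-dependent={new:.4f}  classical={old:.4f}")
```

For example, at $c = 2$ in Euclidean space the exponent drops from $1/4$ to $1/7$, and in Hamming space from $1/2$ to $1/3$. Since the query time scales as $n$ raised to this exponent, a smaller value means fewer points examined per query.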

Videos

Locality-Sensitive Hashing and Beyond · YouTube

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications