Hypercube LSH for approximate near neighbors
Thijs Laarhoven

TL;DR
This paper provides a theoretical analysis of hypercube LSH, showing that it outperforms hyperplane LSH for high-dimensional approximate near neighbor search by deriving explicit asymptotics for its collision probabilities.
Contribution
It offers the first theoretical explanation for the improved performance of hypercube LSH over hyperplane LSH in high dimensions, including explicit asymptotics and practical guidance.
Findings
Hypercube LSH has lower collision probabilities for near-orthogonal vectors than hyperplane LSH.
The asymptotic collision probability for near-orthogonal vectors is $\left(\frac{\text{constant}}{\pi}\right)^{d+o(d)}$.
Hypercube LSH achieves a smaller exponent $\rho$ in LSH, indicating better performance in high-dimensional nearest neighbor search.
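As a hedged illustration of what the exponent $\rho$ measures, the sketch below uses the standard LSH definition $\rho = \ln(1/p_1)/\ln(1/p_2)$, where $p_1$ and $p_2$ are the collision probabilities of near and far pairs. It evaluates the exponent only for hyperplane LSH, whose per-hyperplane collision probability $1 - \theta/\pi$ goes back to Charikar. The angles $\pi/3$ and $\pi/2$ and all function names are assumptions chosen for illustration, not values taken from this page.

```python
import math


def hyperplane_collision_prob(theta, k=1):
    """Collision probability of two vectors at angle theta under k
    independent random hyperplanes: (1 - theta/pi)^k."""
    return (1.0 - theta / math.pi) ** k


def lsh_exponent(p_near, p_far):
    """Standard LSH exponent rho = ln(1/p_near) / ln(1/p_far).
    Smaller rho means better query-time scaling."""
    if not (0.0 < p_far < p_near < 1.0):
        raise ValueError("expected 0 < p_far < p_near < 1")
    return math.log(1.0 / p_near) / math.log(1.0 / p_far)


# Illustrative angles (assumptions): near pairs at pi/3, far pairs at pi/2.
theta_near = math.pi / 3
theta_far = math.pi / 2

rho = lsh_exponent(
    hyperplane_collision_prob(theta_near),
    hyperplane_collision_prob(theta_far),
)
print(f"hyperplane LSH exponent for these angles: {rho:.4f}")
```

Because both probabilities are raised to the same power `k`, the exponent does not depend on the number of hyperplanes. A scheme with a smaller $\rho$ for the same pair of angles is the sense in which one hash family is "better" than another.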
Abstract
A celebrated technique for finding near neighbors for the angular distance involves using a set of random hyperplanes to partition the space into hash regions [Charikar, STOC 2002]. Experiments later showed that using a set of orthogonal hyperplanes, thereby partitioning the space into the Voronoi regions induced by a hypercube, leads to even better results [Terasawa and Tanaka, WADS 2007]. However, no theoretical explanation for this improvement was ever given, and it remained unclear how the resulting hypercube hash method scales in high dimensions. In this work, we provide explicit asymptotics for the collision probabilities when using hypercubes to partition the space. For instance, two near-orthogonal vectors are expected to collide with probability $(\frac{1}{\pi})^{d+o(d)}$ in dimension $d$, compared to $(\frac{1}{2})^d$ when using random hyperplanes.…
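The two partitioning schemes described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the hyperplane hash takes the signs of $d$ independent Gaussian projections, and the hypercube hash applies one random orthogonal matrix and then takes coordinate signs, so each bucket is an orthant. All names (`sign_hash`, `random_rotation`, `collision_rate`) and the sampling of the orthogonal matrix via a QR decomposition are choices made here for illustration.

```python
import numpy as np


def sign_hash(x, matrix):
    """Hash x to the sign pattern of matrix @ x (one bit per row)."""
    return tuple(bool(v) for v in (matrix @ x >= 0))


def random_rotation(d, rng):
    """Random orthogonal d x d matrix from the QR decomposition of a
    Gaussian matrix; the sign fix makes the distribution uniform."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))


def pair_at_angle(d, theta):
    """Two unit vectors in R^d with angle theta between them."""
    x = np.zeros(d)
    y = np.zeros(d)
    x[0] = 1.0
    y[0], y[1] = np.cos(theta), np.sin(theta)
    return x, y


def collision_rate(x, y, scheme, trials, rng):
    """Monte Carlo estimate of the probability that x and y share a bucket."""
    d = x.shape[0]
    hits = 0
    for _ in range(trials):
        if scheme == "hyperplane":
            m = rng.standard_normal((d, d))  # d independent random normals
        else:
            m = random_rotation(d, rng)      # d mutually orthogonal normals
        hits += sign_hash(x, m) == sign_hash(y, m)
    return hits / trials


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = pair_at_angle(8, np.pi / 3)
    for scheme in ("hyperplane", "hypercube"):
        print(scheme, collision_rate(x, y, scheme, 2000, rng))
```

One design consequence is visible directly in the code: two vectors in the same orthant have coordinate-wise products that are all non-negative, so their inner product is non-negative. Exactly orthogonal vectors therefore essentially never share a hypercube bucket, which is why the asymptotic statement concerns near-orthogonal vectors.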
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression · Data Management and Algorithms
