Improved Space-Efficient Approximate Nearest Neighbor Search Using Function Inversion

Samuel McCauley

arXiv:2407.02468·cs.DS·July 3, 2024

TL;DR

This paper uses function inversion to improve the space efficiency of locality-sensitive hashing (LSH) for approximate nearest neighbor (ANN) search, reducing space requirements and improving query times for high-dimensional data.

Contribution

It presents a new method leveraging function inversion to simplify and improve LSH-based ANN data structures, particularly enhancing the ALRW structure for Euclidean distance.

Findings

01. Reduces space usage of LSH-based ANN data structures.

02. Improves query times for Euclidean ANN.

03. Shows list-of-points structures are not optimal for Euclidean or Manhattan ANN.

Abstract

Approximate nearest neighbor search (ANN) data structures have widespread applications in machine learning, computational biology, and text processing. The goal of ANN is to preprocess a set S so that, given a query q, we can find a point y whose distance from q approximates the smallest distance from q to any point in S. For most distance functions, the best-known ANN bounds for high-dimensional point sets are obtained using techniques based on locality-sensitive hashing (LSH). Unfortunately, space efficiency is a major challenge for LSH-based data structures. Classic LSH techniques require a very large amount of space, oftentimes polynomial in |S|. A long line of work has developed intricate techniques to reduce this space usage, but these techniques suffer from downsides: they must be hand tailored to each specific LSH, are often complicated, and their space reduction comes at the…
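The ANN goal stated in the abstract can be illustrated with a brute-force check. This is a minimal sketch, not the paper's data structure: it assumes the common formulation in which a returned point y is acceptable when its distance to q is at most c times the smallest distance from q to any point in S. The approximation factor `c`, the function name `is_c_approximate`, and the use of Euclidean distance are assumptions made for illustration.

```python
import math


def is_c_approximate(S, q, y, c):
    """Check whether y is a c-approximate nearest neighbor of q in S.

    Assumed definition: dist(q, y) <= c * min over x in S of dist(q, x),
    with Euclidean distance. This scans all of S, so it only verifies an
    answer; it is not an efficient ANN data structure.
    """
    best = min(math.dist(q, x) for x in S)  # exact nearest distance
    return math.dist(q, y) <= c * best


# Hypothetical toy data for illustration.
S = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
q = (1.0, 0.0)

exact_ok = is_c_approximate(S, q, (0.0, 0.0), c=2.0)   # the true nearest point
far_tight = is_c_approximate(S, q, (3.0, 4.0), c=2.0)  # too far for c = 2
far_loose = is_c_approximate(S, q, (3.0, 4.0), c=5.0)  # acceptable for c = 5
```

An ANN data structure aims to return a point passing this check without scanning all of S at query time, which is where LSH-based techniques and their space costs come in.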
