# Confirmation Sampling for Exact Nearest Neighbor Search

**Authors:** Tobias Christiani, Rasmus Pagh, Mikkel Thorup

arXiv: 1812.02603 · 2018-12-07

## TL;DR

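This paper introduces confirmation sampling, a technique for solving the exact nearest neighbor problem with locality-sensitive hashing (LSH). It gives a general reduction that turns data structures with a small, unknown success probability into one that succeeds with probability $1-\delta$, and a query algorithm for an LSH Forest with $L$ trees that matches the time bound of an LSH Forest of $\Omega(L)$ trees tuned to the query and data.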

## Contribution

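The paper presents confirmation sampling for exact nearest neighbor search in two forms: a general reduction over a sequence of data structures that each find the nearest neighbor with small, unknown probability, and a new query algorithm for the LSH Forest data structure whose running time matches that of a forest with internal parameters tuned specifically to the query and data.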

## Key findings

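- Returns the exact nearest neighbor with probability $1-\delta$ while using as few queries to the underlying data structures as possible.
- Provides a general reduction that transforms a sequence of data structures, each finding the nearest neighbor with a small, unknown probability, into a single high-probability data structure.
- Develops a new query algorithm for an LSH Forest with $L$ trees that runs within the same time bound as an LSH Forest of $\Omega(L)$ trees whose internal parameters are tuned to the query and data.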
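
As a rough illustration of why the unknown success probability matters, consider the simplified setting (an assumption made here, not a statement of the paper's analysis) in which each of $k$ independent data structures finds the nearest neighbor with the same probability $p$. The standard amplification bound is:

$$
\Pr[\text{all } k \text{ queries miss}] = (1-p)^k \le e^{-pk} \le \delta
\quad \text{whenever} \quad k \ge \frac{\ln(1/\delta)}{p}.
$$

Choosing $k$ this way requires knowing $p$ in advance. When $p$ is unknown, the number of queries has to be decided adaptively from what the queries return.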

## Abstract

Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC '98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest neighbor problems, in practice LSH data structures with suitably chosen parameters are used to solve the exact nearest neighbor problem (with some error probability). Sublinear query time is often possible in practice even for exact nearest neighbor search, intuitively because the nearest neighbor tends to be significantly closer than other data points. However, theory offers little advice on how to choose LSH parameters outside of pre-specified worst-case settings.

We introduce the technique of confirmation sampling for solving the exact nearest neighbor problem using LSH. First, we give a general reduction that transforms a sequence of data structures that each find the nearest neighbor with a small, unknown probability, into a data structure that returns the nearest neighbor with probability $1-\delta$, using as few queries as possible. Second, we present a new query algorithm for the LSH Forest data structure with $L$ trees that is able to return the exact nearest neighbor of a query point within the same time bound as an LSH Forest of $\Omega(L)$ trees with internal parameters specifically tuned to the query and data.
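
The reduction described above can be illustrated with a minimal Python sketch. It is an interpretation based only on the abstract, not the authors' algorithm. The function name `confirmation_sampling`, the callable-per-data-structure interface, and the stopping rule (stop once the current best candidate has been returned again $\lceil \ln(1/\delta) \rceil$ times) are all assumptions made for illustration; the paper's actual threshold and analysis may differ.

```python
import math
from typing import Callable, Hashable, Iterable, Optional, Tuple

Point = Hashable


def confirmation_sampling(
    query: Point,
    structures: Iterable[Callable[[Point], Optional[Point]]],
    dist: Callable[[Point, Point], float],
    delta: float,
) -> Tuple[Optional[Point], int]:
    """Query structures one by one until the best candidate is 'confirmed'.

    Each element of `structures` is a hypothetical lookup that returns a
    candidate neighbor (or None). The threshold below is an assumed
    stopping rule, not the one proven in the paper.
    """
    needed = max(1, math.ceil(math.log(1.0 / delta)))  # assumed threshold
    best: Optional[Point] = None
    best_dist = math.inf
    confirmations = 0  # times `best` was returned again after discovery
    used = 0           # number of data structures queried so far

    for lookup in structures:
        used += 1
        cand = lookup(query)
        if cand is None:
            continue
        d = dist(query, cand)
        if d < best_dist:
            # A strictly closer point replaces the candidate and resets the count.
            best, best_dist, confirmations = cand, d, 0
        elif cand == best:
            confirmations += 1
            if confirmations >= needed:
                break
    return best, used


# Toy usage: points on a line, each "data structure" returns a fixed answer.
toy_answers = [5.0, 3.0, None, 3.0, 5.0, 3.0, 3.0, 3.0, 1.0]
toy_structures = [lambda q, a=a: a for a in toy_answers]
result = confirmation_sampling(0.0, toy_structures, lambda a, b: abs(a - b), 0.05)
```

In the toy run the search stops after the candidate `3.0` has been confirmed three times, so the closer point `1.0` held by the last structure is never seen. This is the kind of error event that the $\delta$ parameter is meant to bound; in the toy input the answers are fixed rather than random, so it only demonstrates the control flow.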

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.02603/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/1812.02603/full.md

---
Source: https://tomesphere.com/paper/1812.02603