# Algorithms for Similarity Search and Pseudorandomness

**Authors:** Tobias Christiani

arXiv: 1906.09430 · 2019-06-25

## TL;DR

This paper advances algorithms for approximate near neighbor search and pseudorandom number generation, providing new frameworks, bounds, and practical algorithms with improved efficiency and theoretical guarantees.

## Contribution

It introduces new frameworks and bounds for approximate near neighbor (ANN) search using locality-sensitive hashing, and develops high-quality pseudorandom number generators and hash function families whose space usage and running time are near-optimal.
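For orientation, here is a minimal sketch of the textbook LSH approach to ANN search under cosine similarity, i.e. the kind of baseline the paper sets out to improve. It is not the paper's framework. The random-hyperplane hash, the class name `LSHIndex`, and the parameters `k` and `num_tables` are illustrative assumptions, not taken from the source.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def make_hyperplane_hash(dim, k, rng):
    """Return a function mapping a vector to k sign bits of random projections."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]

    def h(v):
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0.0 for plane in planes)

    return h


class LSHIndex:
    """Hypothetical baseline index: num_tables hash tables, k sign bits per table."""

    def __init__(self, dim, k=8, num_tables=10, seed=0):
        rng = random.Random(seed)
        self.hashes = [make_hyperplane_hash(dim, k, rng) for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def add(self, v):
        """Store v in every table and return its integer id."""
        idx = len(self.points)
        self.points.append(list(v))
        for h, table in zip(self.hashes, self.tables):
            table[h(v)].append(idx)
        return idx

    def query(self, q):
        """Return the id of the most similar colliding point, or None."""
        candidates = set()
        for h, table in zip(self.hashes, self.tables):
            candidates.update(table.get(h(q), ()))
        if not candidates:
            return None
        return max(candidates, key=lambda i: cosine(q, self.points[i]))


index = LSHIndex(dim=3)
first = index.add([1.0, 0.0, 0.0])
second = index.add([0.0, 1.0, 0.0])
nearest = index.query([2.0, 0.0, 0.0])
```

In this sketch every query evaluates `k * num_tables` sign bits, one full set per table. That per-query hashing cost is what the paper reports reducing relative to the standard framework.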

## Key findings

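- Reduced the number of locality-sensitive hash function evaluations and the word-RAM complexity of LSH-based ANN search, compared to the standard framework.
- Established tight upper and lower bounds for the space-time tradeoff of framework solutions to ANN under cosine similarity.
- Developed a deterministic algorithm that generates pseudorandom numbers of arbitrarily high quality in constant time using near-optimal space.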

## Abstract

We study the problem of approximate near neighbor (ANN) search and show the following results:

- An improved framework for solving the ANN problem using locality-sensitive hashing, reducing the number of evaluations of locality-sensitive hash functions and the word-RAM complexity compared to the standard framework.
- A framework for solving the ANN problem with space-time tradeoffs as well as tight upper and lower bounds for the space-time tradeoff of framework solutions to the ANN problem under cosine similarity.
- A novel approach to solving the ANN problem on sets along with a matching lower bound, improving the state of the art.
- A self-tuning version of the algorithm is shown through experiments to outperform existing similarity join algorithms.
- Tight lower bounds for asymmetric locality-sensitive hashing which has applications to the approximate furthest neighbor problem, orthogonal vector search, and annulus queries.
- A proof of the optimality of a well-known Boolean locality-sensitive hashing scheme.

We study the problem of efficient algorithms for producing high-quality pseudorandom numbers and obtain the following results:

- A deterministic algorithm for generating pseudorandom numbers of arbitrarily high quality in constant time using near-optimal space.
- A randomized construction of a family of hash functions that outputs pseudorandom numbers of arbitrarily high quality with space usage and running time nearly matching known cell-probe lower bounds.
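The abstract does not define "quality". One common formalization is k-wise independence, and the sketch below assumes that reading. It shows the classical polynomial hash family, not the paper's construction: a random polynomial of degree k-1 over a prime field is k-wise independent on keys below the prime, but each evaluation costs O(k) time, which is the kind of cost a constant-time generator avoids. The names `make_poly_hash` and `MERSENNE_P` are hypothetical.

```python
import random

# The Mersenne prime 2^61 - 1; keys are assumed to lie in [0, MERSENNE_P).
MERSENNE_P = (1 << 61) - 1


def make_poly_hash(k, rng):
    """Draw a random polynomial with k coefficients over the field Z_p.

    The resulting family is k-wise independent on keys in [0, MERSENNE_P),
    but evaluating it takes O(k) multiplications per key.
    """
    coeffs = [rng.randrange(MERSENNE_P) for _ in range(k)]

    def h(x):
        acc = 0
        # Horner's rule, starting from the highest-degree coefficient.
        for c in reversed(coeffs):
            acc = (acc * x + c) % MERSENNE_P
        return acc

    return h


h = make_poly_hash(4, random.Random(1))
values = [h(x) for x in range(100)]
```

Raising `k` strengthens the independence guarantee and lengthens every evaluation in step. The abstract's results concern removing that dependence while keeping space near-optimal.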

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.09430/full.md

## Figures

28 figures with captions in the complete paper: https://tomesphere.com/paper/1906.09430/full.md

## References

174 references — full list in the complete paper: https://tomesphere.com/paper/1906.09430/full.md

---
Source: https://tomesphere.com/paper/1906.09430