Sequential Hypothesis Tests for Adaptive Locality Sensitive Hashing
Aniket Chakrabarti, Srinivasan Parthasarathy

TL;DR
This paper introduces sequential hypothesis testing methods to improve the efficiency of Locality Sensitive Hashing (LSH) algorithms for high-dimensional similarity search, enabling more aggressive candidate pruning with controlled accuracy loss.
Contribution
It formulates candidate pruning in LSH as a sequential hypothesis testing problem, proposing a vanilla Sequential Probability Ratio Test (SPRT) and two novel variants, including extensions for approximate similarity computation.
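To make the idea concrete, here is a minimal sketch of a textbook Wald SPRT applied to a stream of hash comparisons. It is not the paper's exact formulation or either of its variants. It assumes a minhash-style LSH family in which each hash agrees with probability equal to the pair's similarity; the function name `sprt_prune`, the indifference half-width `eps`, and the error rates `alpha` and `beta` are hypothetical choices made for illustration.

```python
import math
from typing import Iterable, Tuple


def sprt_prune(matches: Iterable[bool], threshold: float, eps: float = 0.05,
               alpha: float = 0.05, beta: float = 0.05) -> Tuple[str, int]:
    """Wald SPRT on a stream of hash-agreement bits (illustrative sketch).

    Assumption: each bit is an independent Bernoulli draw whose success
    probability equals the similarity of the candidate pair.
    H0: similarity = threshold - eps (pair is below the threshold)
    H1: similarity = threshold + eps (pair is above the threshold)
    Returns the decision and the number of hashes consumed.
    """
    p0, p1 = threshold - eps, threshold + eps
    if not 0.0 < p0 < p1 < 1.0:
        raise ValueError("threshold +/- eps must lie strictly inside (0, 1)")

    upper = math.log((1 - beta) / alpha)   # crossing it accepts H1: keep
    lower = math.log(beta / (1 - alpha))   # crossing it accepts H0: prune
    step_match = math.log(p1 / p0)             # positive increment
    step_miss = math.log((1 - p1) / (1 - p0))  # negative increment

    llr = 0.0  # running log-likelihood ratio of H1 against H0
    n = 0
    for n, is_match in enumerate(matches, start=1):
        llr += step_match if is_match else step_miss
        if llr >= upper:
            return "keep", n
        if llr <= lower:
            return "prune", n
    return "undecided", n  # hash budget exhausted before a boundary was hit


# A clearly dissimilar pair is pruned after only a few of the 50 hashes.
decision, used = sprt_prune([False] * 50, threshold=0.7)
print(decision, used)
```

The point of the sequential form is that the number of hashes compared adapts to the pair: pairs far from the threshold cross a boundary quickly, while only borderline pairs consume the full hash budget.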
Findings
Sequential tests enable adaptive candidate pruning in LSH.
Proposed methods improve search efficiency while maintaining accuracy.
Extensions handle approximate similarity with confidence intervals.
Abstract
All pairs similarity search is the problem of finding, in a given set of data objects, all pairs of objects whose similarity exceeds a certain threshold under a given similarity measure of interest. When the number of points or the dimensionality is high, standard solutions fail to scale gracefully. Approximate solutions such as Locality Sensitive Hashing (LSH) and its Bayesian variants (BayesLSH and BayesLSHLite) alleviate the problem to some extent and provide substantial speedup over traditional index-based approaches. BayesLSH is used both for pruning the candidate space and for computing approximate similarity, whereas BayesLSHLite only prunes candidates, and similarity must then be computed exactly on the original data. Thus wherever the explicit data representation is available and exact similarity computation is not too expensive, BayesLSHLite can be used to…
