A Refined Analysis of LSH for Well-dispersed Data Points
Wenlong Mou, Liwei Wang

TL;DR
This paper provides a refined analysis of locality sensitive hashing (LSH) for well-dispersed data points. It shows improved query time bounds that exploit the dispersion of the data and low doubling dimension, with implications for practical parameter tuning.
Contribution
It introduces a new analysis framework for LSH that accounts for data dispersion and low doubling dimension, improving theoretical bounds and practical understanding.
Findings
Sharper performance bounds for well-dispersed data
First rigorous proof that LSH exploits structure in the data
Insights into parameter setting beyond worst-case analysis
Abstract
Near neighbor problems are fundamental in algorithms for high-dimensional Euclidean spaces. While classical approaches suffer from the curse of dimensionality, locality sensitive hashing (LSH) can effectively solve the a-approximate r-near neighbor problem, and has been proven to be optimal in the worst case. However, for real-world data sets, LSH can naturally benefit from well-dispersed data and low doubling dimension, leading to significantly improved performance. In this paper, we address this issue and propose a refined analysis of the running time of approximate near neighbor queries via LSH. We characterize the dispersion of data using N_b, the number of b*r-near pairs among the data points. Combined with an optimal data-oblivious LSH scheme, we get a new query time bound depending on N_b and the doubling dimension. For many natural scenarios where points are well-dispersed or lying in a…
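The dispersion measure N_b described in the abstract can be illustrated with a brute-force count. This is a minimal sketch, not the paper's procedure: the function name `count_near_pairs`, the choice to count unordered pairs, and the use of a non-strict threshold (distance at most b*r) are all assumptions made for illustration.

```python
import math
from itertools import combinations


def count_near_pairs(points, r, b):
    """Count unordered pairs of points whose Euclidean distance is at most b*r.

    Assumption: "b*r-near pair" is read as an unordered pair {p, q} with
    dist(p, q) <= b * r. The paper may use a different convention.
    """
    threshold = b * r
    return sum(
        1
        for p, q in combinations(points, 2)
        if math.dist(p, q) <= threshold
    )


# Hypothetical toy data: three points close together and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
n_b = count_near_pairs(points, r=1.0, b=1.5)
print(n_b)
```

The quadratic scan is only meant to make the definition concrete. A small N_b relative to the number of points corresponds to the well-dispersed regime the abstract refers to.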
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Automated Road and Building Extraction
