A Refined Analysis of LSH for Well-dispersed Data Points

Wenlong Mou; Liwei Wang

arXiv:1612.04571·cs.DS·December 15, 2016

TL;DR

This paper gives a refined analysis of locality sensitive hashing (LSH) for well-dispersed data points. It shows improved query time bounds that exploit the structure of the data, namely its dispersion and low doubling dimension, with implications for practical parameter tuning.

Contribution

It introduces a new analysis framework for LSH that accounts for data dispersion and low doubling dimension, improving theoretical bounds and practical understanding.

Findings

01 Sharper performance bounds for well-dispersed data

02 First rigorous proof that LSH exploits structure in the data

03 Insights into parameter setting beyond worst-case analysis

Abstract

Near neighbor problems are fundamental in algorithms for high-dimensional Euclidean spaces. While classical approaches suffer from the curse of dimensionality, locality sensitive hashing (LSH) can effectively solve the a-approximate r-near neighbor problem, and has been proven to be optimal in the worst case. However, for real-world data sets, LSH can naturally benefit from well-dispersed data and low doubling dimension, leading to significantly improved performance. In this paper, we address this issue and propose a refined analysis of the running time of approximate near neighbor queries via LSH. We characterize the dispersion of the data using N_b, the number of b*r-near pairs among the data points. Combined with an optimal data-oblivious LSH scheme, we obtain a new query time bound depending on N_b and the doubling dimension. For many natural scenarios where points are well-dispersed or lying in a…
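
To make the dispersion measure N_b concrete, the following is a minimal sketch of how it could be computed by brute force. It assumes that a "b*r-near pair" means an unordered pair of distinct data points whose Euclidean distance is at most b*r; the paper's exact convention (for example strict versus non-strict inequality, or ordered versus unordered pairs) may differ. The function name count_near_pairs and the sample points are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def count_near_pairs(points, r, b):
    """Count unordered pairs of points at Euclidean distance <= b * r.

    Assumption: this is one plausible reading of N_b from the abstract,
    not the paper's formal definition. Runs in O(n^2) time.
    """
    threshold = b * r
    return sum(
        1
        for p, q in combinations(points, 2)  # each unordered pair once
        if math.dist(p, q) <= threshold
    )


# Hypothetical toy data: three points clustered together, one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]

n_b_wide = count_near_pairs(points, r=1.0, b=1.5)   # threshold 1.5
n_b_tight = count_near_pairs(points, r=1.0, b=1.0)  # threshold 1.0
print(n_b_wide, n_b_tight)
```

On this toy set, the three clustered points contribute all of the near pairs and the distant point contributes none. A small N_b relative to the number of points is what the abstract describes as well-dispersed data. The quadratic scan is only meant to illustrate the quantity, not to suggest how it would be estimated on large data sets.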

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Automated Road and Building Extraction