Local Density Estimation in High Dimensions
Xian Wu, Moses Charikar, Vishnu Natchu

TL;DR
This paper introduces two estimators based on locality sensitive hashing, LSH Count and Multi-Probe Count, for efficiently estimating local density in high-dimensional data, that is, the number of points within a given distance of a query. It gives bounds on their space and sample complexity and validates them experimentally.
Contribution
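The paper presents two estimators that combine locality sensitive hashing with importance sampling. By sampling from multiple buckets in each hash table, they need only a small number of hash tables. The paper bounds their space requirements and sample complexity and reports experiments on a standard word embedding dataset.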
Findings
Both estimators are shown to be effective in experiments on a standard word embedding dataset
Space requirements and sample complexity are bounded for both estimators
Sampling from multiple buckets in each hash table keeps the number of hash tables small
Abstract
An important question that arises in the study of high dimensional vector representations learned from data is: given a set of vectors and a query, estimate the number of points within a specified distance threshold of the query. We develop two estimators, LSH Count and Multi-Probe Count, that use locality sensitive hashing to preprocess the data to accurately and efficiently estimate the answers to such questions via importance sampling. A key innovation is the ability to maintain a small number of hash tables via preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and sample complexity of our schemes, and demonstrate their effectiveness in experiments on a standard word embedding dataset.
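The general idea of counting near neighbors with hashing plus importance weighting can be sketched as follows. This is a simplified illustration, not the paper's LSH Count or Multi-Probe Count: it assumes random-hyperplane (SimHash) hashing and angular distance, scans the query's bucket in full rather than sampling from it, and probes only one bucket per table. The names `SimHashCounter`, `angular_distance`, `n_tables`, and `n_bits` are hypothetical. Each point found in the query's bucket and within the threshold is weighted by the inverse of its collision probability, which for `n_bits` random hyperplanes is `(1 - angle/pi) ** n_bits`, so the expected value of the estimate equals the true count.

```python
import numpy as np
from collections import defaultdict


def angular_distance(X, q):
    """Angle in radians between each row of X and the vector q (rows assumed nonzero)."""
    cos = (X @ q) / (np.linalg.norm(X, axis=1) * np.linalg.norm(q))
    return np.arccos(np.clip(cos, -1.0, 1.0))


class SimHashCounter:
    """Illustrative near-neighbor counter; an assumption-laden sketch, not the paper's scheme."""

    def __init__(self, X, n_tables=20, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.X = np.asarray(X, dtype=float)
        self.n_bits = n_bits
        dim = self.X.shape[1]
        # One independent set of random hyperplanes per hash table.
        self.planes = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]
        self.tables = []
        for P in self.planes:
            table = defaultdict(list)
            bits = (self.X @ P.T) > 0  # sign pattern of each point
            for i, row in enumerate(bits):
                table[row.tobytes()].append(i)
            self.tables.append(table)

    def estimate(self, q, theta):
        """Estimate how many points lie within angle theta (radians) of q."""
        q = np.asarray(q, dtype=float)
        total = 0.0
        for P, table in zip(self.planes, self.tables):
            key = ((P @ q) > 0).tobytes()
            idx = table.get(key, [])
            if not idx:
                continue
            ang = angular_distance(self.X[idx], q)
            near = ang <= theta
            # Probability that a point at this angle shares all n_bits with q.
            p_collide = (1.0 - ang[near] / np.pi) ** self.n_bits
            total += np.sum(1.0 / p_collide)
        return total / len(self.tables)
```

A usage sketch: build `SimHashCounter(X, n_tables=20, n_bits=8)` once over the data matrix `X`, then call `estimate(q, theta)` per query. More bits make buckets smaller and queries cheaper, but increase the variance of the inverse-probability weights, so more tables are needed for the same accuracy. According to the abstract, the paper's estimators address this trade-off by sampling from multiple buckets in each table.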
Taxonomy
Topics: Advanced Graph Neural Networks · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
