Local Density Estimation in High Dimensions

Xian Wu; Moses Charikar; Vishnu Natchu

arXiv:1809.07471·cs.DS·September 21, 2018·6 cites

Open Access

TL;DR

This paper introduces two estimators based on locality sensitive hashing, LSH Count and Multi-Probe Count, for efficiently estimating local densities in high-dimensional data, with proven bounds and experimental validation.

Contribution

It presents two novel estimators that efficiently estimate local densities in high dimensions by combining importance sampling with sampling from multiple buckets in each hash table, supported by theoretical bounds and empirical results.

Findings

01 Effective density estimation demonstrated on word embeddings

02 Bounded space and sample complexity for the proposed estimators

03 Experimental validation shows practical utility

Abstract

An important question that arises in the study of high dimensional vector representations learned from data is: given a set $D$ of vectors and a query $q$, estimate the number of points within a specified distance threshold of $q$. We develop two estimators, LSH Count and Multi-Probe Count, that use locality sensitive hashing to preprocess the data to accurately and efficiently estimate the answers to such questions via importance sampling. A key innovation is the ability to maintain a small number of hash tables via preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and sample complexity of our schemes, and demonstrate their effectiveness in experiments on a standard word embedding dataset.
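The counting problem stated in the abstract can be illustrated with a minimal sketch. This is not the paper's LSH Count or Multi-Probe Count. It assumes angular distance and a random-hyperplane hash family, inspects only the query's own bucket in each table, and reweights every in-range point found there by the inverse of its collision probability (a Horvitz-Thompson style importance weight). All function names and parameters below are hypothetical, and vectors are assumed to be non-zero.

```python
import numpy as np

def angles_to(points, q):
    """Angle in radians between each row of `points` and the query `q`."""
    cos = points @ q / (np.linalg.norm(points, axis=1) * np.linalg.norm(q))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def exact_count(data, q, max_angle):
    """Brute-force baseline: number of points within `max_angle` of `q`."""
    return int(np.sum(angles_to(data, q) <= max_angle))

def build_tables(data, n_bits, n_tables, rng):
    """Preprocessing: hash every point into `n_tables` random-hyperplane tables."""
    tables = []
    for _ in range(n_tables):
        planes = rng.standard_normal((n_bits, data.shape[1]))
        codes = data @ planes.T > 0  # one boolean code of n_bits per point
        buckets = {}
        for i, code in enumerate(codes):
            buckets.setdefault(code.tobytes(), []).append(i)
        tables.append((planes, buckets))
    return tables

def estimate_count(data, tables, q, max_angle, n_bits):
    """Average, over tables, of inverse-probability-weighted counts in q's bucket.

    Assumes max_angle < pi so that every collision probability is positive.
    """
    estimates = []
    for planes, buckets in tables:
        idx = buckets.get((planes @ q > 0).tobytes(), [])
        if not idx:
            estimates.append(0.0)
            continue
        theta = angles_to(data[idx], q)
        theta = theta[theta <= max_angle]
        # A point at angle theta shares all n_bits with q w.p. (1 - theta/pi)^n_bits.
        p_collide = (1.0 - theta / np.pi) ** n_bits
        estimates.append(float(np.sum(1.0 / p_collide)))
    return float(np.mean(estimates))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((2000, 20))
    q = rng.standard_normal(20)
    tables = build_tables(data, n_bits=4, n_tables=100, rng=rng)
    print("exact:   ", exact_count(data, q, max_angle=1.3))
    print("estimate:", estimate_count(data, tables, q, max_angle=1.3, n_bits=4))
```

Each per-table estimate is unbiased over the random hyperplanes, because a point in range lands in the query's bucket with exactly the probability it is divided by. The variance grows as `n_bits` increases and collisions with far-away in-range points become rare, which is the kind of trade-off that motivates probing more than one bucket per table.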

Taxonomy

Topics: Advanced Graph Neural Networks · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning