Randomized partition trees for exact nearest neighbor search

Sanjoy Dasgupta; Kaushik Sinha

arXiv:1302.1948 · cs.DS · February 11, 2013 · 27 cites

TL;DR

This paper analyzes three randomized k-d tree variants for exact nearest neighbor search. It relates their probability of missing the nearest neighbor to a simple potential function of the point configuration, and bounds that function for data from a doubling measure and for documents from a topic model.

Contribution

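The paper introduces a potential function that captures the difficulty of a point configuration and uses it to bound the failure probability of randomized partition schemes, for any data set and any query point. It then bounds this function for doubling measures and for topic model data.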

Findings

01 Failure probability linked to a simple potential function

02 Bounds established for doubling measure data

03 Bounds established for topic model data

Abstract

The k-d tree was one of the first spatial data structures proposed for nearest neighbor search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization and overlapping cells, have proved to be successful in practice. We analyze three such schemes. We show that the probability that they fail to find the nearest neighbor, for any data set and any query point, is directly related to a simple potential function that captures the difficulty of the point configuration. We then bound this potential function in two situations of interest: the first, when data come from a doubling measure, and the second, when the data are documents from a topic model.
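
The kind of scheme the abstract describes can be illustrated with a small sketch. The abstract does not spell out the three schemes, so the code below is only one plausible instance under stated assumptions: a tree that splits each cell along a random direction at a randomly chosen fractile, queried by descending to a single leaf and returning the closest point found there. The function names (`build_tree`, `defeatist_search`, `failure_rate`), the split range of 1/4 to 3/4, and the leaf size are all hypothetical choices made for illustration, not taken from the paper.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def build_tree(points, indices, rng, leaf_size=8):
    """Split `indices` recursively along random directions (illustrative sketch)."""
    if len(indices) <= leaf_size:
        return {"leaf": list(indices)}
    dim = len(points[0])
    # Assumption: a Gaussian random direction replaces the coordinate axis of a k-d tree.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    order = sorted(indices, key=lambda i: dot(points[i], direction))
    # Assumption: split at a random fractile in [1/4, 3/4] instead of the exact median.
    frac = rng.uniform(0.25, 0.75)
    cut = min(max(int(frac * len(order)), 1), len(order) - 1)
    threshold = dot(points[order[cut]], direction)
    return {
        "direction": direction,
        "threshold": threshold,
        "left": build_tree(points, order[:cut], rng, leaf_size),
        "right": build_tree(points, order[cut:], rng, leaf_size),
    }


def defeatist_search(tree, points, query):
    """Descend to one leaf without backtracking; return the closest index in it."""
    node = tree
    while "leaf" not in node:
        side = "left" if dot(query, node["direction"]) < node["threshold"] else "right"
        node = node[side]
    return min(node["leaf"], key=lambda i: sq_dist(points[i], query))


def exact_nn(points, query):
    """Brute-force nearest neighbor, used as ground truth."""
    return min(range(len(points)), key=lambda i: sq_dist(points[i], query))


def failure_rate(tree, points, queries):
    """Fraction of queries for which the tree misses the true nearest neighbor."""
    misses = sum(
        defeatist_search(tree, points, q) != exact_nn(points, q) for q in queries
    )
    return misses / len(queries)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(500)]
    queries = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(100)]
    tree = build_tree(data, list(range(len(data))), rng)
    print("empirical failure rate:", failure_rate(tree, data, queries))
```

The empirical failure rate measured this way is the quantity that the paper's analysis bounds in terms of the potential function. Because the search never backtracks, a miss happens exactly when some split along the query's path separates it from its true nearest neighbor.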

Taxonomy

Topics: Data Management and Algorithms · Geographic Information Systems Studies · Automated Road and Building Extraction