Randomized partition trees for exact nearest neighbor search
Sanjoy Dasgupta, Kaushik Sinha

TL;DR
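This paper analyzes three randomized partition tree schemes for exact nearest neighbor search. It bounds the probability that each scheme fails to return the true nearest neighbor in terms of a potential function measuring the difficulty of the point configuration, then bounds that potential for data from a doubling measure and for documents from a topic model.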
Contribution
It introduces a potential function to analyze the failure probability of randomized partition schemes, with bounds for doubling measures and topic model data.
Findings
Failure probability linked to a simple potential function
Bounds established for doubling measure data
Bounds established for topic model data
Abstract
The k-d tree was one of the first spatial data structures proposed for nearest neighbor search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization and overlapping cells, have proved to be successful in practice. We analyze three such schemes. We show that the probability that they fail to find the nearest neighbor, for any data set and any query point, is directly related to a simple potential function that captures the difficulty of the point configuration. We then bound this potential function in two situations of interest: the first, when data come from a doubling measure, and the second, when the data are documents from a topic model.
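To make the setting concrete, here is a minimal Python sketch of one randomized partition scheme of the kind the abstract describes, not the authors' implementation. It assumes a tree that splits at the median along a random Gaussian direction and a "defeatist" query that descends to a single leaf without backtracking. The abstract does not state the potential function, so `potential` below uses an assumed form (the average ratio of the nearest-neighbor distance to the distance of every other point). All function names are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_tree(points, leaf_size, rng):
    """Recursively split points at the median of a random projection.

    Assumption: median split along a Gaussian direction; leaf_size >= 1.
    """
    if len(points) <= leaf_size:
        return {"leaf": list(points)}
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    ordered = sorted(points, key=lambda p: dot(direction, p))
    mid = len(ordered) // 2
    return {
        "direction": direction,
        "threshold": dot(direction, ordered[mid]),
        "left": build_tree(ordered[:mid], leaf_size, rng),
        "right": build_tree(ordered[mid:], leaf_size, rng),
    }


def defeatist_search(tree, q):
    """Route q to one leaf (no backtracking) and return the closest point there."""
    node = tree
    while "leaf" not in node:
        side = "left" if dot(node["direction"], q) < node["threshold"] else "right"
        node = node[side]
    return min(node["leaf"], key=lambda p: dist(p, q))


def potential(points, q):
    """Assumed potential: (1/n) * sum over i >= 2 of d_(1) / d_(i),
    where d_(1) <= d_(2) <= ... are the sorted distances from q."""
    d = sorted(dist(p, q) for p in points)
    if d[0] == 0.0:
        return 0.0
    return sum(d[0] / di for di in d[1:]) / len(d)


def failure_rate(points, queries, trials, leaf_size, seed=0):
    """Fraction of (tree, query) pairs where the leaf misses the true nearest neighbor."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        tree = build_tree(points, leaf_size, rng)
        for q in queries:
            best = min(dist(p, q) for p in points)
            if dist(defeatist_search(tree, q), q) > best:
                failures += 1
    return failures / (trials * len(queries))
```

Calling `failure_rate` on a data set and comparing it with the average of `potential` over the same queries gives a rough empirical feel for the relationship the paper establishes formally. Configurations where many points lie almost as close to the query as its nearest neighbor have a larger potential and should be harder for the tree.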
Taxonomy
Topics: Data Management and Algorithms · Geographic Information Systems Studies · Automated Road and Building Extraction
