Analysis of approximate nearest neighbor searching with clustered point sets
Songrit Maneewongvatana, David M. Mount

TL;DR
This paper empirically compares data structures for approximate nearest neighbor searching and shows that two alternative splitting methods outperform the optimized kd-tree on clustered data and query sets.
Contribution
The paper introduces and evaluates two splitting methods, sliding-midpoint and minimum-ambiguity, and demonstrates their effectiveness on clustered data sets.
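A minimal Python sketch of a single splitting step may make the contrast with the standard kd-tree concrete. It assumes the textbook optimized kd-tree rule (cut the dimension of largest point spread at the median) and the sliding-midpoint rule as it is commonly described, for example in the documentation of the ANN library (bisect the cell's longest side; if all points fall on one side, slide the plane to the nearest point). The function names, the `(lower, upper)` box representation, and the tie handling are hypothetical choices for illustration, not the paper's code.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Box = Tuple[Sequence[float], Sequence[float]]  # (lower corner, upper corner) of a cell
Split = Tuple[int, float, List[int], List[int]]  # (dim, cut value, left indices, right indices)


def kd_median_split(points: List[Point]) -> Split:
    """Optimized kd-tree rule: cut the dimension of largest point spread at the median."""
    d = len(points[0])
    spread = [max(p[i] for p in points) - min(p[i] for p in points) for i in range(d)]
    dim = max(range(d), key=lambda i: spread[i])
    order = sorted(range(len(points)), key=lambda i: points[i][dim])
    m = len(points) // 2
    cut = points[order[m]][dim]
    # Points before the median go left; the median point and above go right.
    return dim, cut, order[:m], order[m:]


def sliding_midpoint_split(points: List[Point], box: Box) -> Split:
    """Sliding-midpoint rule: bisect the cell's longest side, sliding the cut if a side is empty.

    Assumes len(points) >= 2 and that all points lie inside box.
    """
    lo, hi = box
    d = len(lo)
    # The dimension is chosen from the cell's shape, not from the point spread.
    dim = max(range(d), key=lambda i: hi[i] - lo[i])
    cut = (lo[dim] + hi[dim]) / 2.0
    order = sorted(range(len(points)), key=lambda i: points[i][dim])
    pmin, pmax = points[order[0]][dim], points[order[-1]][dim]
    if cut <= pmin:
        # No point strictly below the midpoint: slide the plane up to the lowest
        # point and assign that single point to the left side.
        return dim, pmin, order[:1], order[1:]
    if cut > pmax:
        # Every point lies below the midpoint: slide the plane down to the highest
        # point and assign that single point to the right side.
        return dim, pmax, order[:-1], order[-1:]
    left = [i for i in order if points[i][dim] < cut]
    right = [i for i in order if points[i][dim] >= cut]
    return dim, cut, left, right
```

For three points clustered near (0.9, 0.9) in the unit square, `sliding_midpoint_split` moves the cut from x = 0.5 to x = 0.9 rather than creating an empty left cell. The median rule never creates empty cells, but because it ignores the cell's shape it can produce long, thin cells as a cluster is repeatedly subdivided.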
Findings
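The alternative splitting methods outperform the optimized kd-tree on clustered data.
The minimum-ambiguity method reduces ambiguity in the choice of nearest neighbor for query points.
The sliding-midpoint method balances bounded cell aspect ratio against avoiding empty cells.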
Abstract
We present an empirical analysis of data structures for approximate nearest neighbor searching. We compare the well-known optimized kd-tree splitting method against two alternative splitting methods. The first, called the sliding-midpoint method, attempts to balance the goals of producing subdivision cells of bounded aspect ratio while not producing any empty cells. The second, called the minimum-ambiguity method, is a query-based approach. In addition to the data points, it is also given a training set of query points for preprocessing. It employs a simple greedy algorithm to select the splitting plane that minimizes the average amount of ambiguity in the choice of the nearest neighbor for the training points. We provide an empirical analysis comparing these two methods against the optimized kd-tree construction for a number of synthetically generated data and query sets. We…
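The greedy selection step of the minimum-ambiguity method can be sketched as follows, under loudly stated assumptions. The paper's precise ambiguity measure is not given in this summary, so the sketch substitutes a hypothetical proxy: a training query counts as ambiguous for a candidate plane if its nearest-neighbor ball (centered at the query, with radius equal to its exact nearest-neighbor distance) is crossed by that plane. Candidate cuts are midpoints between consecutive distinct data coordinates, ties are broken toward a balanced split, and the names `nn_radius` and `min_ambiguity_split` are illustrative, not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def nn_radius(q: Point, data: List[Point]) -> float:
    """Distance from q to its exact nearest data point, found by brute force."""
    return min(math.dist(q, p) for p in data)


def min_ambiguity_split(data: List[Point], queries: List[Point]) -> Optional[Tuple[int, float, int]]:
    """Greedily pick the axis-aligned cut crossed by the fewest query NN balls.

    Returns (dim, cut, cost), where cost is the hypothetical ambiguity proxy
    (number of crossed balls), or None if no cut separates the data points.
    """
    radii = [nn_radius(q, data) for q in queries]
    best = None  # (cost, imbalance, dim, cut)
    for dim in range(len(data[0])):
        xs = sorted(p[dim] for p in data)
        for a, b in zip(xs, xs[1:]):
            if a == b:
                continue  # no cut lies strictly between equal coordinates
            cut = (a + b) / 2.0
            # A query is ambiguous for this cut if its NN ball straddles the plane.
            cost = sum(1 for q, r in zip(queries, radii) if abs(q[dim] - cut) < r)
            n_left = sum(1 for x in xs if x < cut)
            imbalance = abs(2 * n_left - len(xs))
            cand = (cost, imbalance, dim, cut)
            # Prefer lower cost, then a more balanced split; keep the first on full ties.
            if best is None or cand[:2] < best[:2]:
                best = cand
    if best is None:
        return None
    cost, _, dim, cut = best
    return dim, cut, cost
```

Dividing the count by the number of training queries gives an average with the same minimizer. In a full tree build, this step would be applied recursively, restricting attention to the data and training queries associated with the current cell; the brute-force radius computation and the quadratic candidate scan are chosen for clarity, not speed.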
Taxonomy
Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Automated Road and Building Extraction
