High-dimensional approximate nearest neighbor: k-d Generalized Randomized Forests
Yannis Avrithis, Ioannis Z. Emiris, and Georgios Samaras

TL;DR
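This paper introduces the generalized randomized kd forest (kgeraf), a data structure for approximate nearest neighbor search in high dimensions. It searches a set of independently constructed randomized trees simultaneously and omits backtracking. The authors report that it would be the method of choice in dimensions around 1,000 for pointsets of up to about one million points.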
Contribution
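The paper presents the kgeraf, a randomized forest structure for high-dimensional approximate nearest neighbor search. It combines new randomization techniques for building independent trees, simultaneous search over those trees without backtracking, and optimized distance computations. The authors release the software as geraf and compare it with BBD-trees, Locality Sensitive Hashing, randomized kd forests, and product quantization.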
Findings
Reported as the method of choice in dimensions around 1,000, and probably up to 10,000
Targets pointsets of up to a few hundred thousand points, or even one million
Achieves under 1 second query time on a million-image dataset
Abstract
We propose a new data structure, the generalized randomized kd forest, or kgeraf, for approximate nearest neighbor searching in high dimensions. In particular, we introduce new randomization techniques to specify a set of independently constructed trees where search is performed simultaneously, hence increasing accuracy. We omit backtracking, and we optimize distance computations, thus accelerating queries. We release public domain software geraf and we compare it to existing implementations of state-of-the-art methods including BBD-trees, Locality Sensitive Hashing, randomized kd forests, and product quantization. Experimental results indicate that our method would be the method of choice in dimensions around 1,000, and probably up to 10,000, and pointsets of cardinality up to a few hundred thousand or even one million; this range of inputs is encountered in many critical applications…
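The general idea in the abstract (several independently randomized trees, each searched by a single descent with no backtracking, with candidates merged at the end) can be illustrated with a minimal sketch. This is not the authors' geraf implementation: the split rule (a random choice among the highest-variance dimensions, split at the median), the parameter names `leaf_size` and `top_dims`, and all function names are assumptions made for illustration, and the optimized distance computations mentioned in the abstract are not modeled.

```python
import random

def variance(points, idx, d):
    """Variance of coordinate d over the points selected by idx."""
    vals = [points[i][d] for i in idx]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def build_tree(points, idx, rng, leaf_size=8, top_dims=3):
    """Build one randomized kd-tree over the point indices in idx.

    Assumed split rule: pick a dimension at random among the top_dims
    dimensions of highest variance, then split at the median.
    """
    if len(idx) <= leaf_size:
        return ("leaf", idx)
    ranked = sorted(range(len(points[0])),
                    key=lambda d: variance(points, idx, d), reverse=True)
    d = rng.choice(ranked[:top_dims])
    ordered = sorted(idx, key=lambda i: points[i][d])
    mid = len(ordered) // 2
    split = points[ordered[mid]][d]
    return ("node", d, split,
            build_tree(points, ordered[:mid], rng, leaf_size, top_dims),
            build_tree(points, ordered[mid:], rng, leaf_size, top_dims))

def descend(tree, q):
    """Follow a single root-to-leaf path for query q (no backtracking)."""
    while tree[0] == "node":
        _, d, split, left, right = tree
        tree = left if q[d] < split else right
    return tree[1]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def forest_query(forest, points, q):
    """Merge the leaf reached in every tree and return the closest candidate."""
    candidates = set()
    for tree in forest:
        candidates.update(descend(tree, q))
    return min(candidates, key=lambda i: sq_dist(points[i], q))

# Hypothetical usage: 500 random points in 16 dimensions, 4 trees.
rng = random.Random(0)
points = [[rng.random() for _ in range(16)] for _ in range(500)]
forest = [build_tree(points, list(range(len(points))), rng) for _ in range(4)]
query = [rng.random() for _ in range(16)]
approx_nn = forest_query(forest, points, query)
```

Because each tree is descended only once, the result is approximate: the true nearest neighbor is found only if it shares a leaf with the query in at least one tree, which is why adding trees tends to raise accuracy at the cost of more distance computations.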
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Data Management and Algorithms
