A Heterogeneous High Dimensional Approximate Nearest Neighbor Algorithm
Moshe Dubiner

TL;DR
This paper introduces a novel high-dimensional approximate nearest neighbor algorithm for sparse, heterogeneous data, analyzing its optimality and estimating the number of tries needed for success.
Contribution
It proposes a direct approach that avoids dimensional reduction, based on coordinate reordering and lexicographic ordering, together with a theoretical analysis of its optimality and efficiency.
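The lexicographic-ordering step can be illustrated with a minimal Python sketch. The summary does not spell this step out, so the sketch makes several assumptions. Points are binary vectors, and a coordinate order for the current try is already given. Points are then sorted by their coordinates read in that order, with a present feature (1) ranked before an absent one (0), so that points sharing high-priority features end up next to each other. The function name `lexicographic_order` is hypothetical and is not taken from the paper.

```python
from typing import List, Sequence


def lexicographic_order(points: Sequence[Sequence[int]],
                        coord_order: Sequence[int]) -> List[int]:
    """Return point indices sorted lexicographically by their coordinates
    read in `coord_order` (assumed interpretation, not the paper's code).

    A 1 (feature present) is ranked before a 0 (feature absent), so points
    sharing the highest-priority features become adjacent.
    """
    def key(idx: int):
        p = points[idx]
        # Negate each coordinate so that 1 sorts before 0 in ascending order.
        return tuple(-p[i] for i in coord_order)

    return sorted(range(len(points)), key=key)
```

For example, with `points = [[0, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]` and `coord_order = [1, 0, 2]`, points 2 and 0, which both contain feature 1, come first, and the call returns `[2, 0, 1, 3]`.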
Findings
The algorithm is shown to be optimal within a certain class of methods.
The number of tries needed for success is estimated from bucketing forest information.
The approach outperforms traditional dimensional reduction techniques in certain high-dimensional sparse settings.
Abstract
We consider the problem of finding high dimensional approximate nearest neighbors. Suppose there are d independent rare features, each with its own independent statistics. For a point x, x_{i}=0 denotes the absence of feature i and x_{i}=1 its presence. Sparsity means that usually x_{i}=0. Distance between points is a variant of the Hamming distance. Dimensional reduction converts the sparse heterogeneous problem into a lower dimensional full homogeneous problem. However, we will see that the converted problem can be much harder to solve than the original problem. Instead, we suggest a direct approach. It consists of T tries. In try t we rearrange the coordinates in decreasing order of (1-r_{t,i})\frac{p_{i,11}}{p_{i,01}+p_{i,10}} \ln\frac{1}{p_{i,1*}}, where 0<r_{t,i}<1 are uniform pseudo-random numbers and the p's are coordinate i's statistical parameters. The points are…
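The per-try coordinate ranking from the abstract can be sketched in Python. Only the priority formula comes from the abstract. The per-coordinate parameters p_{i,11}, p_{i,01}, p_{i,10} and p_{i,1*} are taken as given inputs, and how they are estimated is not shown. All of them are assumed strictly positive so that the ratio and the logarithm are defined. The names `CoordStats`, `priority`, `open_unit`, `coordinate_order` and `all_tries` are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CoordStats:
    """Hypothetical container for one coordinate's p_{i,11}, p_{i,01}, p_{i,10}, p_{i,1*}.

    All values are assumed to lie in (0, 1) so the ratio and log are defined.
    """
    p11: float
    p01: float
    p10: float
    p1star: float


def priority(s: CoordStats, r: float) -> float:
    """Priority of a coordinate: (1 - r) * p11 / (p01 + p10) * ln(1 / p1*)."""
    return (1.0 - r) * s.p11 / (s.p01 + s.p10) * math.log(1.0 / s.p1star)


def open_unit(rng: random.Random) -> float:
    """Draw r uniformly from the open interval (0, 1); random() alone may return 0.0."""
    r = rng.random()
    while r == 0.0:
        r = rng.random()
    return r


def coordinate_order(stats: Sequence[CoordStats], rng: random.Random) -> List[int]:
    """One try: draw a fresh r_{t,i} per coordinate, sort by decreasing priority."""
    keys = [priority(s, open_unit(rng)) for s in stats]
    return sorted(range(len(stats)), key=lambda i: keys[i], reverse=True)


def all_tries(stats: Sequence[CoordStats], T: int, seed: int = 0) -> List[List[int]]:
    """Coordinate orders for tries t = 1..T, reproducible from a fixed seed."""
    rng = random.Random(seed)
    return [coordinate_order(stats, rng) for _ in range(T)]
```

Each try draws fresh values r_{t,i}. Coordinates with a large ratio p_{i,11}/(p_{i,01}+p_{i,10}) and a small p_{i,1*} therefore tend to rank first. The random factor (1-r_{t,i}) lets different tries favor different coordinates.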
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Face and Expression Recognition · Machine Learning and Algorithms
