New Instability Results for High Dimensional Nearest Neighbor Search
Chris Giannella

TL;DR
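This paper studies instability in high-dimensional nearest neighbor search, where the distances from a query point to the dataset points become nearly equal. It shows that whether the distance ratio concentrates near one as the dimension d grows depends on how fast the dataset size n(d) grows with d, which bears on how useful nearest neighbor algorithms are in high dimensions.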
Contribution
The paper proves new instability results for high-dimensional nearest neighbor search that tie the behavior of distance ratios to the growth rate of the dataset size n(d) in the dimension d.
Findings
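If the dataset size n(d) grows sub-exponentially in d, the ratio of the distances from any two dataset points to the query point is less than 1+epsilon with probability tending to one as d increases.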
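If the dataset size grows fast enough exponentially, n(d) > [4(1+epsilon)]^d for large d, the probability that the distance ratio is less than 1+epsilon stays bounded away from one in the limit, for all query points except a small subset.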
Preliminary results extend the sub-exponential case to Gaussian distributions, f_d = N(mu_d, Sigma_d).
Abstract
Consider a dataset of n(d) points generated independently from R^d according to a common p.d.f. f_d with support(f_d) = [0,1]^d and sup{f_d([0,1]^d)} growing sub-exponentially in d. We prove that: (i) if n(d) grows sub-exponentially in d, then, for any query point q^d in [0,1]^d and any epsilon>0, the ratio of the distance between any two dataset points and q^d is less than 1+epsilon with probability -> 1 as d -> infinity; (ii) if n(d)>[4(1+epsilon)]^d for large d, then for all q^d in [0,1]^d (except a small subset) and any epsilon>0, the distance ratio is less than 1+epsilon with limiting probability strictly bounded away from one. Moreover, we provide preliminary results along the lines of (i) when f_d=N(mu_d,Sigma_d).
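Result (i) can be illustrated numerically. The sketch below is not from the paper: it assumes the uniform density on [0,1]^d (whose supremum is 1, so the sub-exponential bound on f_d holds trivially), a fixed dataset size n (a constant is sub-exponential in d), and the Euclidean distance. The function names distance_ratio and mean_ratio are hypothetical. It reports the ratio of the farthest to the nearest distance from a random query point, which is the worst case over all pairs of dataset points.

```python
import math
import random


def distance_ratio(n, d, rng):
    """Farthest-to-nearest Euclidean distance ratio for one random query.

    Assumption (not from the paper): data and query are uniform on [0,1]^d.
    """
    q = [rng.random() for _ in range(d)]
    nearest, farthest = math.inf, 0.0
    for _ in range(n):
        # Draw one dataset point coordinate by coordinate and measure it.
        dist = math.sqrt(sum((rng.random() - qi) ** 2 for qi in q))
        nearest = min(nearest, dist)
        farthest = max(farthest, dist)
    return farthest / nearest


def mean_ratio(n, d, trials=20, seed=0):
    """Average the ratio over several independent datasets and queries."""
    rng = random.Random(seed)
    return sum(distance_ratio(n, d, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    # With n held fixed, the averaged ratio should shrink toward 1 as d grows.
    for d in (2, 10, 100, 1000):
        print(d, round(mean_ratio(n=50, d=d), 3))
```

A simulation of this kind only suggests the trend in (i). It says nothing about regime (ii), where n(d) would have to exceed [4(1+epsilon)]^d and quickly becomes too large to sample.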
Taxonomy
Topics: Computational Geometry and Mesh Generation · Data Management and Algorithms · Point processes and geometric inequalities
