Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions
Vladimir Pestov

TL;DR
This paper establishes fundamental lower bounds on the efficiency of metric tree indexing schemes for exact similarity search in high-dimensional spaces, demonstrating the curse of dimensionality under certain mathematical assumptions.
Contribution
It provides a rigorous theoretical analysis showing that, under specified conditions, the expected search cost of hierarchical metric-tree schemes is bounded below by a quantity that grows superpolynomially in the intrinsic dimension.
Findings
Lower bound of Ω(n^{1/4}) on expected search performance, where n is the dataset size
Superpolynomial complexity in the intrinsic dimension d
Analysis based on VC dimension and Lipschitz functions
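The step from the n^{1/4} bound to superpolynomial growth in d can be spelled out in a short derivation. It relies only on the abstract's assumption that n grows superpolynomially in d; the constants k and c are notation introduced here for illustration.

```latex
% Assumption (from the abstract): n = n(d) grows superpolynomially in d,
% i.e. for every fixed k > 0, n(d) \ge d^{k} for all sufficiently large d.
\[
  n(d) \ge d^{4k} \ \text{for large } d
  \quad\Longrightarrow\quad
  n(d)^{1/4} \ge d^{k} \ \text{for large } d .
\]
% Since k is arbitrary, n^{1/4} = d^{\omega(1)}, so any cost bounded below by
% c \, n^{1/4} (with c > 0 constant) eventually exceeds every polynomial in d.
```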
Abstract
Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees. The datasets X are sampled randomly from a domain Ω, equipped with a distance, ρ, and an underlying probability distribution, μ. While performing an asymptotic analysis, we send the intrinsic dimension d of Ω to infinity, and assume that the size of a dataset, n, grows superpolynomially yet subexponentially in d. Exact similarity search refers to finding the nearest neighbour in the dataset X to a query point ω ∈ Ω, where the query points are subject to the same probability distribution μ as datapoints. Let F denote a class of all 1-Lipschitz functions on Ω that can be used as decision functions in constructing a hierarchical metric tree…
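To make the notion of a metric tree with a 1-Lipschitz decision function concrete, here is a minimal Python sketch of one common instance, a vantage-point tree. It is an illustration under stated assumptions, not the construction analysed in the paper: the decision function at each node is taken to be the distance to a pivot (which is 1-Lipschitz by the triangle inequality), the metric is Euclidean, and the names `build`, `nearest`, and `leaf_size` are hypothetical.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build(points, dist, leaf_size=4):
    """Build a vantage-point tree as nested tuples (illustrative layout)."""
    if len(points) <= leaf_size:
        return ("leaf", list(points))
    pivot, rest = points[0], points[1:]
    keyed = [(dist(pivot, p), p) for p in rest]
    # Threshold: median distance from the pivot to the remaining points.
    radius = sorted(d for d, _ in keyed)[len(keyed) // 2]
    inside = [p for d, p in keyed if d < radius]
    outside = [p for d, p in keyed if d >= radius]
    return ("node", pivot, radius,
            build(inside, dist, leaf_size), build(outside, dist, leaf_size))

def nearest(tree, q, dist):
    """Exact nearest neighbour of q.

    Returns (distance, point, number_of_distance_evaluations).
    """
    best = [math.inf, None]
    evals = 0

    def consider(p):
        nonlocal evals
        evals += 1
        d = dist(q, p)
        if d < best[0]:
            best[0], best[1] = d, p
        return d

    def visit(node):
        if node[0] == "leaf":
            for p in node[1]:
                consider(p)
            return
        _, pivot, radius, inside, outside = node
        fq = consider(pivot)  # decision function f(q) = dist(pivot, q)
        # Descend first into the side of the threshold that contains q.
        first, second = (inside, outside) if fq < radius else (outside, inside)
        visit(first)
        # Since f is 1-Lipschitz, every point p on the other side satisfies
        # dist(q, p) >= |f(q) - radius|, so that side can be skipped when
        # this gap exceeds the best distance found so far.
        if abs(fq - radius) <= best[0]:
            visit(second)

    visit(tree)
    return best[0], best[1], evals

# Usage on a small random dataset (sizes chosen arbitrarily).
random.seed(0)
data = [tuple(random.random() for _ in range(3)) for _ in range(500)]
tree = build(data, euclidean)
query = tuple(random.random() for _ in range(3))
d_best, p_best, n_evals = nearest(tree, query, euclidean)
```

The returned count of distance evaluations is one natural measure of search cost in a sketch like this. Pruning depends on the gap `abs(fq - radius)` exceeding the current best distance, and the paper's lower bound concerns how rarely such pruning can succeed as the intrinsic dimension grows.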
Taxonomy
Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression
