Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions

Vladimir Pestov

arXiv:0812.0146 · cs.DS · March 27, 2013


TL;DR

This paper establishes fundamental lower bounds on the efficiency of metric tree indexing schemes for exact similarity search in high-dimensional spaces, demonstrating the curse of dimensionality under certain mathematical assumptions.

Contribution

It provides a rigorous theoretical analysis showing that, under the stated conditions, hierarchical metric-tree indexing schemes cannot achieve better than superpolynomial (in the intrinsic dimension d) expected search time in high dimensions.

Findings

01

Lower bound of order n^{1/4} on expected search performance

02

Superpolynomial complexity in the intrinsic dimension d

03

Analysis based on VC dimension and Lipschitz functions
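A short worked check of why findings 01 and 02 fit together. It assumes the standard reading of "superpolynomially yet subexponentially in d" (log n / log d → ∞ and log n / d → 0) and uses n = e^{√d} purely as a hypothetical example of such a growth rate; it is not taken from the paper.

```latex
\begin{aligned}
&\text{superpolynomial: } \frac{\log n}{\log d} \to \infty, \qquad \text{subexponential: } \frac{\log n}{d} \to 0 \qquad (d \to \infty),\\
&\text{hence } \frac{\log n^{1/4}}{\log d} = \frac{1}{4}\,\frac{\log n}{\log d} \to \infty, \text{ i.e. } n^{1/4} \text{ is also superpolynomial in } d;\\
&\text{e.g. } n = e^{\sqrt{d}}:\quad \frac{\log n}{\log d} = \frac{\sqrt{d}}{\log d} \to \infty,\quad \frac{\log n}{d} = \frac{1}{\sqrt{d}} \to 0,\quad n^{1/4} = e^{\sqrt{d}/4}.
\end{aligned}
```

So any lower bound of order n^{1/4} on search cost grows faster than every fixed power of d.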

Abstract

Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees. The datasets $X$ are sampled randomly from a domain $\Omega$, equipped with a distance, $\rho$, and an underlying probability distribution, $\mu$. While performing an asymptotic analysis, we send the intrinsic dimension $d$ of $\Omega$ to infinity, and assume that the size of a dataset, $n$, grows superpolynomially yet subexponentially in $d$. Exact similarity search refers to finding the nearest neighbour in the dataset $X$ to a query point $\omega \in \Omega$, where the query points are subject to the same probability distribution $\mu$ as datapoints. Let $F$ denote a class of all 1-Lipschitz functions on $\Omega$ that can be used as decision functions in constructing a hierarchical metric tree…
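To make "1-Lipschitz decision function" concrete, here is a minimal sketch of one common kind of hierarchical metric tree, assuming a vantage-point-style split: each internal node uses f(x) = ρ(x, p) for a pivot datapoint p (1-Lipschitz by the triangle inequality) and a median threshold a. This is an illustration, not the paper's construction; the names `MetricTree`, `Node`, `nearest` and `leaf_size` are hypothetical. Exactness comes from the Lipschitz bound |f(x) − f(ω)| ≤ ρ(x, ω), which lets a whole subtree be skipped when it provably cannot contain a closer point. Counting distance evaluations as a cost proxy is our own choice.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(x: Point, y: Point) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


@dataclass
class Node:
    points: Optional[List[Point]] = None  # set only for leaves
    pivot: Optional[Point] = None         # decision function f(x) = rho(x, pivot)
    threshold: float = 0.0                # split value a
    inner: Optional["Node"] = None        # points with f(x) < a
    outer: Optional["Node"] = None        # points with f(x) >= a


class MetricTree:
    """Hypothetical vantage-point-style metric tree for exact 1-NN search."""

    def __init__(self, data: Sequence[Point],
                 rho: Callable[[Point, Point], float] = euclidean,
                 leaf_size: int = 4) -> None:
        self.rho = rho
        self.leaf_size = leaf_size
        self.calls = 0  # distance evaluations made by queries
        self.root = self._build(list(data))

    def _build(self, pts: List[Point]) -> Node:
        if len(pts) <= self.leaf_size:
            return Node(points=pts)
        pivot = pts[0]
        vals = [self.rho(x, pivot) for x in pts]
        a = sorted(vals)[len(vals) // 2]  # median of f over the node's points
        inner = [x for x, v in zip(pts, vals) if v < a]
        outer = [x for x, v in zip(pts, vals) if v >= a]
        if not inner or not outer:  # e.g. many duplicates: stop splitting
            return Node(points=pts)
        return Node(pivot=pivot, threshold=a,
                    inner=self._build(inner), outer=self._build(outer))

    def nearest(self, q: Point) -> Tuple[Optional[Point], float]:
        best: list = [None, math.inf]  # [best point, best distance r]
        self._search(self.root, q, best)
        return best[0], best[1]

    def _search(self, node: Node, q: Point, best: list) -> None:
        if node.points is not None:  # leaf: scan every stored point
            for x in node.points:
                self.calls += 1
                d = self.rho(x, q)
                if d < best[1]:
                    best[0], best[1] = x, d
            return
        self.calls += 1
        fq = self.rho(q, node.pivot)
        if fq < best[1]:  # the pivot is itself a datapoint
            best[0], best[1] = node.pivot, fq
        a = node.threshold
        order = (node.inner, node.outer) if fq < a else (node.outer, node.inner)
        for child in order:
            # Inner points have f(x) < a, so rho(x, q) > f(q) - a.
            if child is node.inner and fq - a >= best[1]:
                continue
            # Outer points have f(x) >= a, so rho(x, q) >= a - f(q).
            if child is node.outer and a - fq >= best[1]:
                continue
            self._search(child, q, best)


if __name__ == "__main__":
    rng = random.Random(0)
    n, queries = 2000, 50
    for d in (2, 8, 32):
        data = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
        tree = MetricTree(data)
        tree.calls = 0
        for _ in range(queries):
            q = tuple(rng.random() for _ in range(d))
            _, dist = tree.nearest(q)
            assert math.isclose(dist, min(euclidean(p, q) for p in data))
        print(f"d={d}: mean distance evaluations per query = {tree.calls / queries:.1f} (n={n})")
```

With uniform data in the unit cube one would typically see the mean number of distance evaluations climb toward n as d grows, because the pruning tests fail more often when distances concentrate. This is only an informal empirical illustration of the phenomenon the paper bounds rigorously.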


Taxonomy

Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression