Curse of Dimensionality in the Application of Pivot-based Indexes to the Similarity Search Problem

Ilya Volnyansky

arXiv:0905.2141 · cs.DS · May 14, 2009

TL;DR

This paper demonstrates that in high-dimensional spaces exhibiting measure concentration, pivot-based indexes for similarity search become asymptotically ineffective, confirming the curse of dimensionality.

Contribution

It provides an asymptotic analysis showing that pivot-based indexing schemes lose efficiency in high dimensions with measure concentration, aligning with the curse of dimensionality.

Findings

01

In high dimensions, the query cost of the index, counted in distance computations, grows linearly with dataset size.

02

Pivot-based indexes offer negligible advantage over brute-force search in high dimensions.

03

The curse of dimensionality is confirmed for measure-concentrated spaces.
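
To make "measure concentration" concrete, here is a minimal sketch, not taken from the paper, that estimates how tightly pairwise distances cluster around their mean as the dimension grows. It assumes points drawn uniformly from the unit cube $[0, 1]^{d}$ under the Euclidean metric; the function name `relative_spread` and its parameters are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics


def relative_spread(dim, pairs=300, seed=0):
    """Std/mean of Euclidean distances between random pairs in [0, 1]^dim.

    Assumption for illustration: uniform points in the unit cube.
    A small value means distances are concentrated around their mean.
    """
    rng = random.Random(seed)
    dists = []
    for _ in range(pairs):
        x = [rng.random() for _ in range(dim)]
        y = [rng.random() for _ in range(dim)]
        dists.append(math.dist(x, y))
    return statistics.pstdev(dists) / statistics.mean(dists)


if __name__ == "__main__":
    for dim in (2, 20, 200):
        print(dim, round(relative_spread(dim), 3))
```

Under these assumptions the ratio shrinks as `dim` grows: almost all pairs of points end up at nearly the same distance, which is the situation in which distance-based pruning has little to work with.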

Abstract

In this work we study the validity of the so-called curse of dimensionality for indexing of databases for similarity search. We perform an asymptotic analysis, with a test model based on a sequence of metric spaces $(\Omega_{d})$ from which we pick datasets $X_{d}$ in an i.i.d. fashion. We call the subscript $d$ the dimension of the space $\Omega_{d}$ (e.g. for $\mathbb{R}^{d}$ the dimension is just the usual one) and we allow the size of the dataset $n = n_{d}$ to be such that $d$ is superlogarithmic but subpolynomial in $n$. We study the asymptotic performance of pivot-based indexing schemes where the number of pivots is $o(n/d)$. We pick the relatively simple cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the spaces $\Omega_{d}$ exhibit the (fairly common) concentration of measure phenomenon the…
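
As a rough illustration of the setting described in the abstract, the sketch below builds one simple kind of pivot-based index (a table of precomputed point-to-pivot distances) and answers a range query while counting distance computations, matching the cost model above. It is not the paper's construction: the class `PivotIndex`, the helper `demo`, the choice of the first `k` data points as pivots, the uniform unit-cube data, and the nearest-neighbour radius are all assumptions made for this example.

```python
import math
import random


class PivotIndex:
    """Pivot table: stores d(x, p) for every data point x and pivot p."""

    def __init__(self, data, pivots, dist=math.dist):
        if not pivots:
            raise ValueError("at least one pivot is required")
        self.data = data
        self.pivots = pivots
        self.dist = dist
        # Built once ahead of time, so not charged to individual queries.
        self.table = [[dist(x, p) for p in pivots] for x in data]

    def range_query(self, q, radius):
        """Return (matches, distance computations made at query time)."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        cost = len(self.pivots)
        matches = []
        for x, row in zip(self.data, self.table):
            # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x) for any p.
            lower_bound = max(abs(a - b) for a, b in zip(q_to_pivots, row))
            if lower_bound > radius:
                continue  # x is ruled out without computing d(q, x)
            cost += 1
            if self.dist(q, x) <= radius:
                matches.append(x)
        return matches, cost


def demo(dim, n=500, k=5, seed=0):
    """Hypothetical experiment: uniform data in [0, 1]^dim, first k points as pivots."""
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(dim)] for _ in range(n)]
    index = PivotIndex(data, pivots=data[:k])
    q = [rng.random() for _ in range(dim)]
    # Illustrative radius: just large enough to reach the nearest neighbour.
    radius = min(math.dist(q, x) for x in data)
    matches, cost = index.range_query(q, radius)
    return len(matches), cost


if __name__ == "__main__":
    for dim in (2, 10, 100):
        found, cost = demo(dim)
        print(f"dim={dim}: {found} match(es), {cost} distance computations")
```

A brute-force scan in this setup costs `n` distance computations per query. Comparing the reported cost with `n` for small and large `dim` gives a feel for when the pivot filter still discards points and when it does not.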

Taxonomy

Topics: Data Management and Algorithms · Bayesian Methods and Mixture Models · Data Mining Algorithms and Applications