Curse of Dimensionality in the Application of Pivot-based Indexes to the Similarity Search Problem
Ilya Volnyansky

TL;DR
This paper demonstrates that in high-dimensional spaces exhibiting measure concentration, pivot-based indexes for similarity search become asymptotically ineffective, confirming the curse of dimensionality.
Contribution
The paper provides an asymptotic analysis showing that pivot-based indexing schemes lose their efficiency advantage in high-dimensional spaces that exhibit concentration of measure, consistent with the curse of dimensionality.
Findings
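The cost of similarity search with a pivot-based index becomes asymptotically linear in the dataset size as the dimension grows.
For large enough dimension, pivot-based indexes offer negligible advantage over a brute-force scan without an index.
The curse of dimensionality is confirmed in this setting, for spaces exhibiting concentration of measure.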
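The findings above can be illustrated with a small experiment. The following is a minimal sketch, not the paper's method: it assumes Euclidean distance, i.i.d. Gaussian data, randomly chosen pivots, and a query radius equal to the nearest-neighbour distance. All function names (`build_index`, `range_search`, `experiment`) are hypothetical. It counts only distance computations at query time, and a point is skipped when the triangle-inequality lower bound from the pivots already exceeds the radius.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(data, num_pivots, rng):
    """Pick random pivots and precompute every point's distance to each pivot."""
    pivots = rng.sample(data, num_pivots)
    table = [[euclidean(x, p) for p in pivots] for x in data]
    return pivots, table


def range_search(data, pivots, table, query, radius, tol=1e-9):
    """Range query that counts the distance computations made at query time."""
    q_to_pivots = [euclidean(query, p) for p in pivots]
    cost = len(pivots)  # one distance computation per pivot
    hits = []
    for i, x in enumerate(data):
        # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x) for every pivot p.
        lower = max(abs(qp - xp) for qp, xp in zip(q_to_pivots, table[i]))
        if lower > radius + tol:
            continue  # discarded without computing d(q, x)
        cost += 1
        if euclidean(query, x) <= radius:
            hits.append(i)
    return hits, cost


def brute_force(data, query, radius):
    """Linear scan: exactly len(data) distance computations."""
    return [i for i, x in enumerate(data) if euclidean(query, x) <= radius]


def experiment(dim, n=500, num_pivots=8, seed=0):
    """Assumed setup: Gaussian data, radius = nearest-neighbour distance."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    query = [rng.gauss(0, 1) for _ in range(dim)]
    radius = min(euclidean(query, x) for x in data)
    pivots, table = build_index(data, num_pivots, rng)
    hits, cost = range_search(data, pivots, table, query, radius)
    return hits, cost, brute_force(data, query, radius)


if __name__ == "__main__":
    n = 500
    for dim in (2, 8, 32, 128):
        hits, cost, expected = experiment(dim, n=n)
        assert hits == expected  # pivot filtering never drops a true result
        print(f"dim={dim:4d}  distance computations: {cost} (brute force: {n})")
```

Under these assumptions, the printed count can be compared with the brute-force cost of `n` distance computations. The paper's result is asymptotic, so a finite run like this is only suggestive.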
Abstract
In this work we study the validity of the so-called curse of dimensionality for indexing of databases for similarity search. We perform an asymptotic analysis, with a test model based on a sequence of metric spaces $(\Omega_d)$ from which we pick datasets $X_d$ in an i.i.d. fashion. We call the subscript $d$ the dimension of the space $\Omega_d$ (e.g. for $\mathbb{R}^d$ the dimension is just the usual one) and we allow the size of the dataset $n = n_d$ to be such that $d$ is superlogarithmic but subpolynomial in $n$. We study the asymptotic performance of pivot-based indexing schemes where the number of pivots is $o(n/d)$. We pick the relatively simple cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the spaces $\Omega_d$ exhibit the (fairly common) concentration of measure phenomenon the…
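The growth condition in the abstract can be spelled out in asymptotic notation. The following is one standard reading of "superlogarithmic but subpolynomial", writing $d$ for the dimension and $n$ for the dataset size (notation assumed here, not quoted from the paper):

```latex
\[
d = \omega(\log n) \quad\text{and}\quad d = n^{o(1)}
\qquad\Longleftrightarrow\qquad
n = 2^{o(d)} \quad\text{and}\quad n = d^{\omega(1)} .
\]
```

Read from the dataset's side, $n$ grows faster than any polynomial in $d$ but slower than any exponential in $d$.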
Taxonomy
Topics: Data Management and Algorithms · Bayesian Methods and Mixture Models · Data Mining Algorithms and Applications
