Curse of Dimensionality in Pivot-based Indexes

Ilya Volnyansky; Vladimir Pestov

arXiv:0906.0391·cs.DS·November 17, 2016

TL;DR

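This paper proves, in the framework of statistical learning, that in high dimensions no pivot-based similarity search index can essentially outperform a linear scan. This gives a theoretical validation of the curse of dimensionality under stated conditions on the dataset size, the number of pivots, and the intrinsic dimension.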

Contribution

A rigorous asymptotic analysis of pivot-based indexing schemes on datasets sampled i.i.d. from metric spaces $\Omega_{d}$, under a cost model that counts only distance computations.

Findings

01

In high dimensions, the query cost of pivot-based indexes, counted in distance computations, is asymptotically linear in the dataset size.

02

Performance depends on the intrinsic dimension and concentration of measure.

03

Within the paper's model, no pivot-based index essentially outperforms linear scan in the high-dimensional regime.
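
As a rough illustration of the concentration effect mentioned in the findings, the sketch below measures how the relative spread of pairwise distances shrinks as the dimension grows. The uniform distribution on the unit cube, the Euclidean metric, and the function name `relative_spread` are assumptions chosen for illustration; they are not the spaces $\Omega_{d}$ or the notion of intrinsic dimension used in the paper.

```python
import math
import random
import statistics


def relative_spread(d, n=100, seed=0):
    """Std/mean of pairwise Euclidean distances for n uniform points in [0, 1]^d.

    Hypothetical helper for illustration only; the sampling model is an assumption.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(d)] for _ in range(n)]
    dists = [
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return statistics.pstdev(dists) / statistics.fmean(dists)


if __name__ == "__main__":
    for d in (2, 20, 200):
        print(d, round(relative_spread(d), 3))
```

When the relative spread is small, almost all points sit at nearly the same distance from any query, so distance-based bounds have little room to separate near points from far ones.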

Abstract

We offer a theoretical validation of the curse of dimensionality in the pivot-based indexing of datasets for similarity search, by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform the linear scan. A study of the asymptotic performance of pivot-based indexing schemes is performed on a sequence of datasets modeled as samples $X_{d}$ picked in i.i.d. fashion from metric spaces $\Omega_{d}$. We allow the size of the dataset $n = n_{d}$ to be such that $d$, the "dimension", is superlogarithmic but subpolynomial in $n$. The number of pivots is allowed to grow as $o(n/d)$. We pick the least restrictive cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the intrinsic dimension of the spaces $\Omega_{d}$ in the sense of…
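
To make the cost model concrete, here is a minimal sketch of a generic pivot-table range search that counts distance computations at query time, assuming Euclidean vectors. The names `build_pivot_table` and `range_query` are hypothetical, and the sketch is one common form of pivot-based filtering, not necessarily the exact class of schemes analyzed in the paper.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_table(data, pivots, dist=euclidean):
    # Offline step: store d(x, p) for every data point x and every pivot p.
    return [[dist(x, p) for p in pivots] for x in data]


def range_query(data, pivots, table, q, r, dist=euclidean):
    """Return (points within distance r of q, number of distances computed)."""
    q_to_pivots = [dist(q, p) for p in pivots]
    cost = len(pivots)  # one computation per pivot
    hits = []
    for x, row in zip(data, table):
        # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x) for every pivot p.
        lower_bound = max(abs(qp - xp) for qp, xp in zip(q_to_pivots, row))
        if lower_bound > r:
            continue  # x is discarded without computing d(q, x)
        cost += 1
        if dist(q, x) <= r:
            hits.append(x)
    return hits, cost


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.random() for _ in range(5)] for _ in range(200)]
    pivots = data[:4]  # assumption: pivots are taken from the dataset
    table = build_pivot_table(data, pivots)
    query = [rng.random() for _ in range(5)]
    hits, cost = range_query(data, pivots, table, query, 0.5)
    print(len(hits), cost, len(data))
```

A linear scan always costs `len(data)` distance computations under this model. The pivot table helps only when the lower bound discards many points, and that is the part that degrades as distances concentrate.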
