Curse of Dimensionality in Pivot-based Indexes
Ilya Volnyansky, Vladimir Pestov

TL;DR
This paper gives a theoretical validation of the curse of dimensionality for pivot-based similarity search: in high-dimensional spaces, no pivot-based indexing scheme can essentially outperform a linear scan, under conditions on the intrinsic dimension of the underlying spaces.
Contribution
It gives a rigorous analysis, in the framework of statistical learning, of the limits of pivot-based indexes in high-dimensional settings, adding to the theoretical understanding of the curse of dimensionality.
Findings
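In high dimensions, the search cost of pivot-based indexes is asymptotically linear in the dataset size.
The result hinges on the intrinsic dimension of the underlying spaces, understood through the concentration of measure phenomenon.
In the high-dimensional regime, no pivot-based scheme essentially outperforms a linear scan.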
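The concentration of measure effect behind these findings can be observed numerically. The sketch below is a generic illustration, not the paper's construction: it assumes points drawn uniformly from the unit cube [0, 1]^dim, and the function name `relative_distance_spread` and its parameters are hypothetical. It measures how tightly distances from a random query cluster around their mean as the dimension grows.

```python
import math
import random
import statistics


def relative_distance_spread(dim: int, n_points: int = 500, seed: int = 0) -> float:
    """Coefficient of variation (std / mean) of Euclidean distances from one
    random query to n_points random points, all uniform in [0, 1]^dim.

    Hypothetical helper for illustration; the uniform-cube data is an assumption.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return statistics.pstdev(dists) / statistics.fmean(dists)


if __name__ == "__main__":
    # The ratio shrinks as dim grows: distances "concentrate" around their mean.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  std/mean = {relative_distance_spread(dim):.3f}")
```

When std/mean is small, nearly every point sits at about the same distance from the query, which leaves little room for distance-based bounds to separate near points from far ones.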
Abstract
We offer a theoretical validation of the curse of dimensionality in the pivot-based indexing of datasets for similarity search, by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform the linear scan. A study of the asymptotic performance of pivot-based indexing schemes is performed on a sequence of datasets modeled as samples picked in i.i.d. fashion from metric spaces . We allow the size of the dataset to be such that , the "dimension", is superlogarithmic but subpolynomial in . The number of pivots is allowed to grow as . We pick the least restrictive cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the intrinsic dimension of the spaces in the sense of…
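The cost model above, which charges one unit per distance calculation and ignores everything else, can be made concrete with a toy experiment. The sketch below is an assumption-laden illustration, not the authors' method: it uses a simple pivot table that prunes a point x when the triangle-inequality lower bound max_j |d(q, p_j) - d(x, p_j)| exceeds the query radius. The names `build_pivot_table`, `pivot_range_search` and `fraction_of_linear_scan`, the uniform-cube data, n = 2000 and a fixed k = 8 pivots are all hypothetical choices (the abstract lets the number of pivots grow with the dataset; here it is held fixed for simplicity).

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def build_pivot_table(data: List[Point], pivots: List[Point]) -> List[List[float]]:
    """Offline step: store d(x, p) for every data point x and every pivot p."""
    return [[math.dist(x, p) for p in pivots] for x in data]


def pivot_range_search(
    query: Point,
    radius: float,
    data: List[Point],
    pivots: List[Point],
    table: List[List[float]],
) -> Tuple[List[int], int]:
    """Return (indices within radius of query, query-time distance computations)."""
    q_to_pivots = [math.dist(query, p) for p in pivots]
    n_dist = len(pivots)  # one distance computation per pivot
    hits: List[int] = []
    for i, x in enumerate(data):
        # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x) for every pivot p.
        lower = max(abs(qp - xp) for qp, xp in zip(q_to_pivots, table[i]))
        if lower > radius:
            continue  # pruned without computing d(q, x)
        n_dist += 1
        if math.dist(query, x) <= radius:
            hits.append(i)
    return hits, n_dist


def fraction_of_linear_scan(dim: int, n: int = 2000, k: int = 8, seed: int = 1) -> float:
    """Query-time cost of the pivot search divided by n (the linear-scan cost)."""
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(dim)] for _ in range(n)]
    pivots = [[rng.random() for _ in range(dim)] for _ in range(k)]
    table = build_pivot_table(data, pivots)
    query = [rng.random() for _ in range(dim)]
    # Oracle radius = exact nearest-neighbour distance (not charged to the index).
    radius = min(math.dist(query, x) for x in data)
    _, cost = pivot_range_search(query, radius, data, pivots, table)
    return cost / n


if __name__ == "__main__":
    for dim in (2, 8, 32, 128, 256):
        ratio = fraction_of_linear_scan(dim)
        print(f"dim={dim:4d}  pivot cost / linear-scan cost = {ratio:.3f}")
```

With these settings the ratio should be far below 1 at dim=2 and roughly 1 + k/n at dim=256, where the lower bounds become too weak to prune any point. A single small-scale run like this only hints at the asymptotic statement; it is no substitute for the paper's proof.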
