An alternative proof of the vulnerability of retrieval in high intrinsic dimensionality neighborhood

Teddy Furon

arXiv:2010.00990·cs.LG·May 23, 2022

Open Access

TL;DR

This paper gives an alternative proof that nearest neighbor search is vulnerable in neighborhoods of high intrinsic dimensionality: a small relative perturbation of a dataset point can be enough to change its neighbor rank with respect to a query. The model is validated on six large-scale datasets.

Contribution

The paper derives, from simple assumptions, the statistical distribution of the relative perturbation needed to change a point's neighbor rank, and supports this model with empirical validation on six large-scale datasets.

Findings

01

Vulnerability increases with the intrinsic dimensionality of the neighborhood

02

Small perturbations can change neighbor rankings significantly

03

Model validated on six large-scale datasets, up to some outliers explained by violations of the assumptions

Abstract

This paper investigates the vulnerability of nearest neighbor search, which is a pivotal tool in data analysis and machine learning. The vulnerability is gauged as the relative amount of perturbation that an attacker needs to add to a dataset point in order to modify its neighbor rank w.r.t. a query. The statistical distribution of this quantity is derived from simple assumptions. Experiments on six large-scale datasets validate this model up to some outliers, which are explained in terms of violations of the assumptions.
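The quantity described in the abstract can be illustrated with a minimal sketch. It is not the paper's derivation and rests on several assumptions: distances are Euclidean, the attacker moves the point straight toward the query (the shortest path into the ball whose radius is the target neighbor's distance), and the perturbation is normalized by the point's original distance to the query. The names `relative_perturbation` and `median_vulnerability` are hypothetical, and the synthetic Gaussian data stands in for a real dataset.

```python
import math
import random
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def relative_perturbation(query, points, index, target_rank):
    """Relative displacement needed to bring points[index] to the distance
    of the current target_rank-th neighbor of query (rank 1 = nearest).

    Assumption: the perturbation is measured relative to the point's
    original distance to the query.
    """
    dists = sorted(euclidean(query, p) for p in points)
    d_point = euclidean(query, points[index])
    d_target = dists[target_rank - 1]
    if d_point == 0.0:
        return 0.0  # the point already coincides with the query
    # Moving straight toward the query shortens the distance by exactly
    # the norm of the displacement, so d_point - d_target is the minimum.
    return max(0.0, d_point - d_target) / d_point


def median_vulnerability(dim, n_points=200, seed=0):
    """Median relative perturbation needed to promote a point to rank 1,
    for i.i.d. standard Gaussian points and a query at the origin."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    query = [0.0] * dim
    values = [relative_perturbation(query, points, i, 1) for i in range(n_points)]
    return statistics.median(values)


if __name__ == "__main__":
    for dim in (2, 64):
        print(dim, round(median_vulnerability(dim), 3))
```

Under these assumptions, the median relative perturbation shrinks as the dimension grows, because distances to the query concentrate around a common value. This matches the qualitative claim that high-dimensional neighborhoods are easier to attack, although the synthetic setup says nothing about the paper's six datasets.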

Taxonomy

Topics: Advanced Statistical Methods and Models · Machine Learning and Algorithms · Machine Learning and Data Classification