TL;DR
This paper demonstrates that SVD can effectively denoise high-dimensional data corrupted by Gaussian noise, enabling accurate nearest neighbor search in regimes where naive methods fail.
Contribution
The paper proves that SVD denoises the data and recovers the true nearest neighbor under specific noise conditions, improving upon prior spectral methods.
Findings
SVD denoises the data when the noise level is at most O(1/k^{1/4}), where k is the dimension of the underlying subspace
Identifies a noise threshold beyond which nearest neighbor recovery is impossible
Empirical results support theoretical claims on real datasets
Abstract
We study the Nearest Neighbor Search (NNS) problem in a high-dimensional setting where data lies in a low-dimensional subspace and is corrupted by Gaussian noise. Specifically, we consider a semi-random model in which points from an unknown k-dimensional subspace of R^d (k ≪ d) are perturbed by zero-mean d-dimensional Gaussian noise with variance σ^2 per coordinate. Assuming the second-nearest neighbor is at least a factor (1 + ε) farther from the query than the nearest neighbor, and given only the noisy data, our goal is to recover the nearest neighbor in the uncorrupted data. We prove three results. First, for σ ∈ O(1/k^{1/4}), simply performing SVD denoises the data and provably recovers the correct nearest neighbor of the uncorrupted data. Second, for σ ≫ 1/k^{1/4}, the nearest neighbor in the uncorrupted data is not even…
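The SVD step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `svd_denoised_nearest_neighbor` is hypothetical, the subspace dimension `k` is assumed to be known, and the query is assumed to be projected onto the same estimated subspace as the data.

```python
import numpy as np


def svd_denoised_nearest_neighbor(noisy_points, query, k):
    """Return the index of the nearest neighbor of `query` after rank-k SVD projection.

    noisy_points: array of shape (n, d), one noisy data point per row.
    query:        array of shape (d,).
    k:            assumed (known) dimension of the underlying subspace.
    """
    # Rows of vt are right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(noisy_points, full_matrices=False)

    # The top-k right singular vectors span the estimated subspace.
    basis = vt[:k]  # shape (k, d)

    # Express the points and the query in coordinates of the estimated subspace;
    # the component orthogonal to it (mostly noise) is discarded.
    projected_points = noisy_points @ basis.T  # shape (n, k)
    projected_query = query @ basis.T          # shape (k,)

    # Brute-force nearest neighbor in the k-dimensional projection.
    distances = np.linalg.norm(projected_points - projected_query, axis=1)
    return int(np.argmin(distances))
```

To experiment with the noise regimes discussed above, one could generate points in a random k-dimensional subspace, add Gaussian noise of a chosen level to each coordinate, and compare the index returned here with the nearest neighbor computed on the uncorrupted points.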