Local Averaging Accurately Distills Manifold Structure From Noisy Data
Yihan Shen, Shiyu Wang, Arnaud Lamy, Mariam Avagyan, John Wright

TL;DR
This paper provides a theoretical analysis of local averaging for manifold learning from noisy high-dimensional data, demonstrating its accuracy even in high-noise regimes and establishing bounds on the approximation error.
Contribution
It offers the first rigorous analysis of local averaging accuracy on manifolds under high noise, with bounds applicable to practical denoising and dimensionality reduction.
Findings
Establishes a bound on the distance from the averaged points to the manifold that scales with the noise level and depends on geometric properties of the manifold.
First analysis of local averaging accuracy in high-noise regimes where noise magnitude is comparable to manifold reach.
Framework supports preprocessing for manifold-based methods in noisy data scenarios.
Abstract
High-dimensional data are ubiquitous, with examples ranging from natural images to scientific datasets, and often reside near low-dimensional manifolds. Leveraging this geometric structure is vital for downstream tasks, including signal denoising, reconstruction, and generation. However, in practice, the manifold is typically unknown and only noisy samples are available. A fundamental approach to uncovering the manifold structure is local averaging, which is a cornerstone of state-of-the-art provable methods for manifold fitting and denoising. However, to the best of our knowledge, there are no works that rigorously analyze the accuracy of local averaging in a manifold setting in high-noise regimes. In this work, we provide theoretical analyses of a two-round mini-batch local averaging method applied to noisy samples drawn from a $d$-dimensional manifold $\mathcal M \subset…
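As a rough illustration of the two-round mini-batch local averaging idea mentioned in the abstract, the following is a minimal sketch, not the authors' procedure. Everything specific in it is an assumption made for the example: the function names, the toy manifold (a unit circle embedded in a higher-dimensional space), the Gaussian noise model, the batch sizes, and the radii `r1` and `r2`. The summary above does not say how the paper chooses any of these.

```python
import numpy as np


def local_average(center, batch, radius):
    """Average the rows of `batch` lying within `radius` of `center`.

    Returns `center` unchanged when no sample falls inside the ball.
    """
    dists = np.linalg.norm(batch - center, axis=1)
    near = batch[dists <= radius]
    return near.mean(axis=0) if len(near) > 0 else center


def two_round_local_average(query, batch1, batch2, r1, r2):
    """Round 1 averages around the noisy query; round 2 re-centers on that
    estimate and averages a fresh mini-batch. The radii are illustrative
    assumptions, not values taken from the paper."""
    first = local_average(query, batch1, r1)
    return local_average(first, batch2, r2)


def noisy_circle(n, sigma, ambient_dim, rng):
    """Toy data (assumed): unit circle in R^ambient_dim plus Gaussian noise."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    pts = np.zeros((n, ambient_dim))
    pts[:, 0] = np.cos(theta)
    pts[:, 1] = np.sin(theta)
    return pts + sigma * rng.standard_normal((n, ambient_dim))


def dist_to_circle(x):
    """Euclidean distance from x to the embedded unit circle."""
    planar = np.linalg.norm(x[:2])
    return float(np.sqrt((planar - 1.0) ** 2 + np.sum(x[2:] ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # sigma * sqrt(ambient_dim) is about 0.7, comparable to the circle's radius.
    batch1 = noisy_circle(2000, 0.1, 50, rng)
    batch2 = noisy_circle(2000, 0.1, 50, rng)
    query = noisy_circle(1, 0.1, 50, rng)[0]
    denoised = two_round_local_average(query, batch1, batch2, r1=1.2, r2=0.8)
    print("distance before:", dist_to_circle(query))
    print("distance after: ", dist_to_circle(denoised))
```

In this toy setup the first radius has to exceed the typical distance between two noisy samples, which is dominated by the noise rather than by the manifold. The second round can use a smaller radius because its center is already much less noisy. Using a separate mini-batch per round keeps each round's samples independent of the center they are averaged around.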
