Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling
Boris Landa, Xiuyuan Cheng

TL;DR
This paper introduces a doubly stochastic normalization method for Gaussian kernels that enhances robustness in high-dimensional noisy data, enabling more accurate inference of manifold density, geometry, and related structures.
Contribution
The authors develop a novel doubly stochastic normalization approach that improves robustness and accuracy in manifold inference under high-dimensional, heteroskedastic noise conditions.
Findings
The doubly stochastic affinity matrix and its scaling factors concentrate around certain population forms, with finite-sample probabilistic error bounds.
The method outperforms standard kernel density estimators under heteroskedasticity.
Robust graph Laplacian normalizations better approximate manifold Laplacians in noisy data.
Abstract
The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points. Yet, they can be inaccurate under high-dimensional noise, especially if the noise magnitude varies considerably across the data, e.g., under heteroskedasticity or outliers. In this work, we investigate a more robust alternative -- the doubly stochastic normalization of the Gaussian kernel. We consider a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish that the doubly stochastic affinity matrix and its scaling factors concentrate around certain population forms, and provide corresponding finite-sample probabilistic error bounds. We then utilize these results to develop…
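As a rough illustration of the normalization the abstract describes, the sketch below scales a Gaussian kernel matrix K into a symmetric doubly stochastic matrix W = diag(d) K diag(d). It is a minimal sketch under stated assumptions, not the authors' implementation: the zeroed diagonal, the damped fixed-point update d <- sqrt(d / (K d)), the bandwidth `eps`, and the synthetic data are all illustrative choices that do not come from the text above.

```python
import numpy as np

def gaussian_kernel_zero_diag(X, eps):
    """Gaussian kernel exp(-||x_i - x_j||^2 / eps); zeroing the diagonal is an assumption."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(D2, 0.0, out=D2)  # clip small negative values caused by round-off
    K = np.exp(-D2 / eps)
    np.fill_diagonal(K, 0.0)
    return K

def doubly_stochastic_scale(K, n_iter=1000, tol=1e-10):
    """Find d > 0 such that W = diag(d) K diag(d) has unit row and column sums.

    Uses a damped fixed-point update (an assumed, commonly used scheme);
    at a fixed point d * (K @ d) = 1, which is the doubly stochastic condition.
    """
    d = np.ones(K.shape[0])
    for _ in range(n_iter):
        d = np.sqrt(d / (K @ d))
        if np.max(np.abs(d * (K @ d) - 1.0)) < tol:
            break
    W = d[:, None] * K * d[None, :]
    return W, d

# Hypothetical data: 50 points in 5 dimensions, with an arbitrary bandwidth.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
K = gaussian_kernel_zero_diag(X, eps=10.0)
W, d = doubly_stochastic_scale(K)
```

Because K is symmetric, a single vector of scaling factors `d` is enough, and W stays symmetric, so its row sums and column sums coincide. The vector `d` plays the role of the scaling factors mentioned in the abstract.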
Taxonomy
Topics: Statistical Methods and Inference · Gene expression and cancer classification · Bayesian Methods and Mixture Models
