Tangent Space and Dimension Estimation with the Wasserstein Distance
Uzu Lim, Harald Oberhauser, Vidit Nanda

TL;DR
This paper establishes rigorous bounds on the sample size needed for accurate tangent space and dimension estimation of manifolds using Local PCA, accommodating noise and non-uniform data distributions.
Contribution
It provides explicit bounds and a rigorous theoretical framework for tangent space and dimension estimation using Wasserstein distance and matrix concentration inequalities.
Findings
Explicit sample size bounds for manifold estimation
Robustness to noisy, non-uniform data distributions
Simultaneous estimation at multiple points
Abstract
Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space. We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold with high confidence. The algorithm for this estimation is Local PCA, a local version of principal component analysis. Our results accommodate noisy, non-uniform data distributions whose noise may vary across the manifold, and allow simultaneous estimation at multiple points. Crucially, all of the constants appearing in our bounds are explicitly described. The proof uses a matrix concentration inequality to estimate covariance matrices and a Wasserstein distance bound for quantifying nonlinearity of the underlying manifold and non-uniformity of the probability measure.
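The abstract names Local PCA as the estimator. Below is a minimal sketch of the basic procedure, assuming NumPy, a fixed neighborhood radius, a covariance matrix centered at the base point, and a simple eigenvalue-ratio rule for the dimension. The function name `local_pca` and its `threshold` parameter are hypothetical and do not reproduce the paper's exact estimator, radius choice, or constants. The sketch collects the samples within distance `radius` of a base point, forms their local covariance matrix, eigendecomposes it, counts the eigenvalues comparable to the largest one as the dimension, and returns the matching top eigenvectors as an orthonormal basis of the estimated tangent space.

```python
import numpy as np


def local_pca(points, base, radius, threshold=0.5):
    """Sketch of Local PCA at one base point (hypothetical helper, not the paper's code).

    points: (n, D) array of samples near the manifold.
    base: (D,) base point at which to estimate the tangent space.
    radius: neighborhood radius; only samples within this distance are used.
    threshold: hypothetical eigenvalue-ratio cutoff for the dimension estimate.
    Returns (dim, basis), where basis is a (D, dim) array with orthonormal columns.
    """
    # Displacements from the base point; keep those inside the ball B(base, radius).
    diffs = points - base
    local = diffs[np.linalg.norm(diffs, axis=1) <= radius]
    if local.shape[0] < 2:
        raise ValueError("too few samples in the neighborhood")
    # Local covariance matrix, centered at the base point (an assumption of this sketch).
    cov = local.T @ local / local.shape[0]
    # eigh returns eigenvalues in ascending order; reverse to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Dimension: number of eigenvalues at least `threshold` times the largest one.
    dim = int(np.sum(eigvals >= threshold * eigvals[0]))
    # Tangent space estimate: span of the top-`dim` eigenvectors.
    return dim, eigvecs[:, :dim]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Unit circle in the xy-plane of R^3 with small isotropic Gaussian noise.
    theta = rng.uniform(0.0, 2.0 * np.pi, 5000)
    circle = np.column_stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)])
    noisy = circle + 0.001 * rng.standard_normal(circle.shape)
    # Reuse one sample to estimate at several base points.
    for base in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])):
        dim, basis = local_pca(noisy, base, radius=0.3)
        print(base, dim, np.round(basis[:, 0], 3))
```

On a noisily sampled unit circle the sketch should report dimension 1 at each base point, with a basis vector close to the true tangent direction up to sign. Reusing one sample for several base points mirrors the simultaneous-estimation setting, but the sample-size guarantees come from the paper's analysis, not from this code.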
Taxonomy
Topics: Topological and Geometric Data Analysis · Geometric Analysis and Curvature Flows · Point processes and geometric inequalities
Methods: Principal Components Analysis
