The Shape of Data and Probability Measures
Diego Hernán Díaz Martínez, Facundo Mémoli, Washington Mio

TL;DR
This paper introduces multiscale covariance tensor fields (CTF) as a robust tool for analyzing the shape of data distributions, providing theoretical stability and practical applications in manifold clustering.
Contribution
It develops a systematic framework for multiscale CTFs, including stability, consistency, and convergence results, and demonstrates their effectiveness in shape analysis and clustering.
Findings
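CTFs are stable with respect to the Wasserstein distance between probability measures.
Empirical CTFs are consistent, come with estimates for their rate of convergence, and are robust to sampling, noise and outliers.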
The proposed clustering method is stable and effective.
Abstract
We introduce the notion of multiscale covariance tensor fields (CTF) associated with Euclidean random variables as a gateway to the shape of their distributions. Multiscale CTFs quantify variation of the data about every point in the data landscape at all spatial scales, unlike the usual covariance tensor that only quantifies global variation about the mean. Empirical forms of localized covariance previously have been used in data analysis and visualization, but we develop a framework for the systematic treatment of theoretical questions and computational models based on localized covariance. We prove strong stability theorems with respect to the Wasserstein distance between probability measures, obtain consistency results, as well as estimates for the rate of convergence of empirical CTFs. These results ensure that CTFs are robust to sampling, noise and outliers. We provide numerous…
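The idea of measuring variation about an arbitrary point at a chosen scale can be illustrated with a minimal sketch. It assumes an empirical tensor of the form (1/n) Σ_i K(x, y_i, σ) (y_i − x)(y_i − x)ᵀ with a Gaussian kernel K; the function name `empirical_ctf`, the choice of kernel, and the normalization are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def empirical_ctf(points, x, sigma):
    """Kernel-weighted covariance tensor of `points` about the point `x` at scale `sigma`.

    Hypothetical helper: the Gaussian kernel and the 1/n normalization are
    illustrative assumptions, not the paper's definition.
    """
    points = np.asarray(points, dtype=float)   # shape (n, d)
    x = np.asarray(x, dtype=float)             # shape (d,)
    diffs = points - x                         # displacements y_i - x
    sq_dists = np.sum(diffs ** 2, axis=1)      # squared distances |y_i - x|^2
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))  # Gaussian kernel (assumption)
    # Average of the weighted outer products (y_i - x)(y_i - x)^T
    return (diffs * weights[:, None]).T @ diffs / len(points)

# Two points on the horizontal axis: all variation about the origin is horizontal.
pts = np.array([[1.0, 0.0], [-1.0, 0.0]])
tensor = empirical_ctf(pts, x=[0.0, 0.0], sigma=1.0)
```

Evaluating the function over a grid of base points `x` and a range of `sigma` values yields a field of symmetric positive semidefinite matrices. Unlike the usual covariance, each matrix is centered at the chosen point `x` rather than at the global mean.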
Taxonomy
Topics: Topological and Geometric Data Analysis · Landslides and related hazards · Soil Geostatistics and Mapping
