Persistence Norms and the Datasaurus
Pawel Dlotko, Simon Rudkin

TL;DR
This paper demonstrates that persistence norms from Topological Data Analysis can distinguish datasets with identical summary statistics, highlighting their usefulness as additional data shape measures.
Contribution
It shows that persistence norms can often differentiate datasets that share the same traditional summary statistics, supporting their use as measures of data shape.
Findings
Persistence norms vary across datasets with identical summary statistics.
Multivariate distributions with the same covariance can have different persistence norms.
Persistence norms are sensitive to the distribution of point clouds.
Abstract
Topological Data Analysis (TDA) provides a toolkit for the study of the shape of high-dimensional and complex data. While operating on a space of persistence diagrams is cumbersome, persistence norms provide a simple real-valued measure of multivariate data which is seeing greater adoption within finance. A growing literature seeks links between persistence norms and the summary statistics of the data being analysed. This short note targets the demonstration of differences in the persistence norms of the Datasaurus datasets of Matejka and Fitzmaurice. We show that persistence norms can be used as additional measures that often discriminate datasets with the same collection of summary statistics. Treating each of the datasets as a point cloud, we construct the L1 and L2 persistence norms in dimensions 0 and 1. We show that multivariate distributions with identical covariance and…
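To make the construction described in the abstract concrete, here is a minimal NumPy sketch. It rests on several assumptions that are ours, not the paper's. The persistence norm is taken to be the p-norm of the finite lifetimes in a persistence diagram, evaluated at p = 1 and p = 2. Only dimension 0 of the Vietoris-Rips filtration is computed, using the fact that H0 death times equal minimum spanning tree edge lengths. Two hypothetical point clouds are forced to share the same mean and covariance by whitening, which stands in for the Datasaurus construction rather than reproducing it. Dimension 1 would need a dedicated TDA library and is not shown. The helper names `h0_lifetimes`, `persistence_norm` and `whiten` are hypothetical.

```python
import numpy as np


def h0_lifetimes(points):
    """Finite H0 lifetimes of the Vietoris-Rips filtration of a point cloud.

    Every point is born at scale 0 and components merge at the edge lengths
    of a minimum spanning tree, so the n - 1 finite deaths are exactly the
    MST edge lengths. Prim's algorithm on the dense distance matrix, O(n^2).
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # shortest link from the current tree to each point
    deaths = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        deaths.append(candidates[j])  # this merge kills one H0 class
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return np.array(deaths)  # births are 0, so lifetime == death


def persistence_norm(lifetimes, p):
    """Assumed definition: p-norm of the vector of finite lifetimes."""
    lifetimes = np.asarray(lifetimes, dtype=float)
    return float(np.sum(lifetimes ** p) ** (1.0 / p))


def whiten(points):
    """Affine map to zero mean and identity sample covariance."""
    centred = points - points.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T  # symmetric C^(-1/2)
    return centred @ inv_sqrt


# Hypothetical data: a uniform square and four tight clusters at its corners.
rng = np.random.default_rng(0)
n = 200
uniform = rng.uniform(-1.0, 1.0, size=(n, 2))
corners = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
clustered = corners[np.arange(n) % 4] + rng.normal(scale=0.05, size=(n, 2))

# After whitening both clouds share mean (0, 0) and identity covariance.
clouds = {"uniform": whiten(uniform), "clustered": whiten(clustered)}
norms = {}
for name, cloud in clouds.items():
    life = h0_lifetimes(cloud)
    norms[name] = (persistence_norm(life, 1), persistence_norm(life, 2))
    print(name, "p=1: %.3f, p=2: %.3f" % norms[name])
```

Although the two whitened clouds agree in mean and covariance up to floating-point error, the clustered cloud has a few long-lived components and many very short ones, so its p = 1 norm comes out well below that of the uniform cloud. Exact values depend on the random seed and on the distance convention; some authors record deaths at half the edge length.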
Taxonomy
Topics: Topological and Geometric Data Analysis
