
TL;DR
This paper introduces a Gaussian null model and a statistical test to distinguish genuine subtypes from covariance artifacts in Mapper graph analyses of high-dimensional data.
Contribution
It provides a null modeling framework that accounts for covariance structure, improving validation of Mapper-derived subtypes.
Findings
The null model controls Type I error in Gaussian simulations.
Observed Mapper communities do not significantly differ from null expectations in four datasets.
Covariance geometry alone can produce apparent subtypes, challenging previous interpretations.
Abstract
The Mapper algorithm from topological data analysis constructs a graph summarizing the shape of a high-dimensional dataset, and groups of data points identified within this graph are widely interpreted as evidence of distinct subtypes. However, the covariance structure of the data alone can make such groups appear differentiated, even when no subtypes are present. Existing validation approaches do not account for this effect and thus cannot distinguish covariance artifacts from genuine subtypes. We propose a Gaussian null model that generates reference data matching the sample covariance matrix. We pair it with a test statistic that measures mean-level differentiation between communities. In an idealized setting, we prove that covariance geometry alone causes Mapper communities to differ in their average feature profiles, and we show that a simpler label-permutation baseline cannot…
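The testing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration only. The function `pc1_split_labels` is a hypothetical stand-in for Mapper community detection (it simply splits points at the median of the first principal component). The statistic in `mean_differentiation` is one plausible measure of mean-level differentiation, not necessarily the one used in the paper. The Monte Carlo p-value is a standard construction. The Gaussian reference data are drawn to match the sample mean and sample covariance matrix, as the abstract describes.

```python
import numpy as np


def gaussian_null_sample(X, rng):
    """Draw reference data from N(sample mean, sample covariance), same shape as X."""
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    return rng.multivariate_normal(mu, sigma, size=X.shape[0])


def pc1_split_labels(X):
    """Hypothetical stand-in for Mapper communities (assumption, not Mapper itself):
    split points at the median of their first principal component score."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[0]
    return (scores > np.median(scores)).astype(int)


def mean_differentiation(X, labels):
    """Assumed statistic: size-weighted squared distance of each community mean
    from the grand mean, divided by the number of points."""
    grand = X.mean(axis=0)
    stat = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        stat += len(members) * np.sum((members.mean(axis=0) - grand) ** 2)
    return stat / len(X)


def null_test(X, n_null=200, seed=0):
    """Compare the observed statistic with its distribution under the Gaussian null.
    The same community-detection pipeline is rerun on every null dataset."""
    rng = np.random.default_rng(seed)
    observed = mean_differentiation(X, pc1_split_labels(X))
    null_stats = np.empty(n_null)
    for i in range(n_null):
        Z = gaussian_null_sample(X, rng)
        null_stats[i] = mean_differentiation(Z, pc1_split_labels(Z))
    # Monte Carlo p-value with the usual +1 correction
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_null)
    return observed, null_stats, p_value
```

Because the communities are recomputed on each null dataset, the null distribution already contains the mean differences that covariance geometry alone induces. An observed statistic is then only notable if it exceeds that baseline. In practice the stand-in `pc1_split_labels` would be replaced by an actual Mapper pipeline followed by community detection on the resulting graph.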
