Beyond Explained Variance: A Cautionary Tale of PCA

Gionni Marchetti

arXiv:2605.13520·cond-mat.stat-mech·May 19, 2026

TL;DR

This paper critiques the use of PCA scatterplots for visualizing data that lie on a nonlinear manifold. For a fossil teeth dataset, t-SNE and persistent homology reveal a ring structure rather than the clustering suggested by an earlier PCA plot.

Contribution

The paper combines t-SNE with persistent homology to analyze nonlinear data structure, and proposes a probabilistic-geometric model that supports the resulting findings.

Findings

01

The PCA scatterplot shows clustering where PC2 < 0, but t-SNE and persistent homology (PH) reveal a ring-like structure with no evident clustering.

02

The data likely lie on a one-dimensional manifold, a circle.

03

Under the proposed model, pairwise cosine distances follow an arcsine distribution, in qualitative agreement with the observed U-shaped distribution.
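
The arcsine claim can be sketched with a short derivation. This is a reconstruction under stated assumptions, not the paper's own derivation: it takes cosine distance to mean one minus cosine similarity (the paper may use a different convention), and the symbols \theta_i, \Delta, d, F, and f are introduced here for illustration.

```latex
% Assumption: points x_i = (\cos\theta_i, \sin\theta_i), with \theta_i i.i.d. uniform on [0, 2\pi).
% Assumption: cosine distance d = 1 - x_1 \cdot x_2.
\begin{align}
  d &= 1 - \cos(\theta_1 - \theta_2) = 1 - \cos\Delta = 2\sin^2(\Delta/2),
      \qquad \Delta \sim \mathrm{Uniform}[0, 2\pi), \\
  F(t) &= \Pr(d \le t) = \frac{2}{\pi}\arcsin\sqrt{t/2}, \qquad 0 \le t \le 2, \\
  f(t) &= F'(t) = \frac{1}{\pi\sqrt{t\,(2 - t)}}, \qquad 0 < t < 2.
\end{align}
```

The density diverges at both endpoints, t = 0 and t = 2, and is smallest at t = 1, which gives the U shape. Under these assumptions the distribution has mean 1 and variance 1/2.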

Abstract

We address shortcomings of principal component analysis (PCA) for visualizing high-dimensional data lying on a nonlinear low-dimensional manifold via two-dimensional scatterplots, focusing on a fossil teeth dataset from the early mammalian insectivore Kuehneotherium. While the PCA scatterplot reported by Jolliffe and Cadima (Philosophical Transactions of the Royal Society A, 2016) shows clustering in the region where PC2 < 0, our analysis based on t-SNE and persistent homology (PH) reveals a ring-like structure with no evident clustering and intrinsic dimensionality equal to one. We further propose a generative probabilistic-geometric model in which the data are sampled uniformly from a unit circle. Under this model, pairwise cosine distances follow an arcsine distribution, in qualitative agreement with the observed U-shaped distribution, thereby independently supporting the analysis…
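
The generative model in the abstract can be checked numerically. The following is a minimal sketch, not the paper's code: it samples synthetic points uniformly from a unit circle (no fossil teeth data are used), assumes cosine distance means one minus cosine similarity, and compares the empirical distribution with the arcsine CDF on [0, 2], F(t) = (2/π) arcsin(√(t/2)). The function names, sample size, and seed are illustrative choices.

```python
import numpy as np


def sample_circle(n, seed=0):
    """Draw n points uniformly from the unit circle in the plane."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack((np.cos(theta), np.sin(theta)))


def pairwise_cosine_distances(X):
    """Return 1 - cosine similarity for every unordered pair of rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = np.clip(Xn @ Xn.T, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    upper = np.triu_indices(len(X), k=1)  # each pair once, no self-pairs
    return 1.0 - sim[upper]


def arcsine_cdf(t):
    """CDF of the arcsine distribution supported on [0, 2]."""
    return (2.0 / np.pi) * np.arcsin(np.sqrt(t / 2.0))


points = sample_circle(400)
dists = pairwise_cosine_distances(points)

# Histogram over [0, 2]: the end bins should dominate the middle ones (U shape).
counts, edges = np.histogram(dists, bins=10, range=(0.0, 2.0))

# Largest gap between the empirical CDF and the arcsine CDF.
sorted_d = np.sort(dists)
empirical_cdf = np.arange(1, sorted_d.size + 1) / sorted_d.size
max_gap = np.max(np.abs(empirical_cdf - arcsine_cdf(sorted_d)))

print("bin counts:", counts)
print("mean, variance:", dists.mean(), dists.var())
print("max CDF gap:", max_gap)
```

Note that the pairwise distances are not mutually independent, since each point appears in many pairs, so the CDF gap is a descriptive check rather than a formal goodness-of-fit test.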
