On the Incommensurability Phenomenon
Donniell E. Fishkind, Cencheng Shen, Youngser Park, Carey E. Priebe

TL;DR
This paper investigates the incommensurability phenomenon in PCA-reduced data sets, quantifying how large fitting errors can occur between noisy measurements of the same process, with implications demonstrated through simulations and real data.
Contribution
It provides a theoretical framework to quantify the incommensurability phenomenon in PCA, linking fitting error to subspace distances and correlation parameters.
Findings
Procrustean fitting-error relates to subspace Hausdorff distance and correlation.
The phenomenon can significantly impact data analysis in practice.
Simulations and real data illustrate the effect's magnitude.
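The first finding can be written schematically. This restates the abstract's "convex combination" in symbols; the symbols are labels assumed here, not the paper's notation, and this summary does not give the exact dependence of the weight on the correlation parameter.

```latex
% Schematic only. Assumed labels:
%   \epsilon^2 : square Procrustean fitting-error of the two normalized reduced data sets
%   h          : Hausdorff distance between the two projection subspaces
%   m          : maximum possible square Procrustean fitting-error for normalized data
%   \lambda    : weight in [0,1] determined by the correlation parameter
\epsilon^2 \;\approx\; \lambda\, h \;+\; (1-\lambda)\, m ,
\qquad \lambda \in [0,1] \quad \text{(asymptotically)}
```

Read this way, the fitting error can be large either because the two projection subspaces are far apart or because the weight shifts toward the maximum term.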
Abstract
Suppose that two large, multi-dimensional data sets are each noisy measurements of the same underlying random process, and principal components analysis is performed separately on the data sets to reduce their dimensionality. In some circumstances it may happen that the two lower-dimensional data sets have an inordinately large Procrustean fitting-error between them. The purpose of this manuscript is to quantify this "incommensurability phenomenon." In particular, under specified conditions, the square Procrustean fitting-error of the two normalized lower-dimensional data sets is (asymptotically) a convex combination (via a correlation parameter) of the Hausdorff distance between the projection subspaces and the maximum possible value of the square Procrustean fitting-error for normalized data. We show how this gives rise to the incommensurability phenomenon, and we employ illustrative…
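The setup in the abstract can be illustrated with a small toy experiment. The following is a minimal sketch, not the authors' simulation: the helper names (`pca_reduce`, `normalize`, `procrustes_sq_error`), the dimensions, and the noise model are all assumptions chosen for illustration. Two data sets measure the same latent signal; PCA is run separately on each, the reduced data are scaled to unit Frobenius norm, and the square Procrustean fitting-error is computed over orthogonal transformations.

```python
import numpy as np

def pca_reduce(X, d):
    """Project centered data onto its top-d principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

def normalize(Y):
    """Scale a data matrix to unit Frobenius norm."""
    return Y / np.linalg.norm(Y)

def procrustes_sq_error(A, B):
    """min over orthogonal Q of ||A Q - B||_F^2.

    Equals ||A||^2 + ||B||^2 - 2 * (sum of singular values of A^T B),
    so for unit-norm inputs it lies between 0 and 2.
    """
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.linalg.norm(A) ** 2 + np.linalg.norm(B) ** 2 - 2.0 * s.sum()

rng = np.random.default_rng(0)
n, D, d = 500, 10, 2  # samples, ambient dimension, reduced dimension (assumed)

# Shared latent signal living in the first two coordinates (assumed model).
Z = np.zeros((n, D))
Z[:, :2] = rng.normal(size=(n, 2))

# Case 1: small isotropic noise; both PCAs recover the signal subspace.
X1 = Z + 0.01 * rng.normal(size=(n, D))
X2 = Z + 0.01 * rng.normal(size=(n, D))
err_low = procrustes_sq_error(normalize(pca_reduce(X1, d)),
                              normalize(pca_reduce(X2, d)))

# Case 2: each data set carries strong noise in different coordinates,
# so each PCA selects its own noise directions instead of the signal.
N1 = np.zeros((n, D))
N1[:, 2:4] = 10.0 * rng.normal(size=(n, 2))
N2 = np.zeros((n, D))
N2[:, 4:6] = 10.0 * rng.normal(size=(n, 2))
err_high = procrustes_sq_error(normalize(pca_reduce(Z + N1, d)),
                               normalize(pca_reduce(Z + N2, d)))

print(f"low-noise fitting error:  {err_low:.4f}")
print(f"high-noise fitting error: {err_high:.4f}")
```

In the second case the two reduced data sets are close to unrelated even though both were measured from the same process, so the fitting error approaches its maximum of 2 for unit-norm data. This is one simple way an incommensurability effect can appear; the paper's conditions and asymptotic result are more specific than this sketch.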
Taxonomy
Topics: Face and Expression Recognition · Statistical Methods and Inference · Advanced Statistical Methods and Models
