Relative Information Loss in the PCA
Bernhard C. Geiger, Gernot Kubin

TL;DR
This paper analyzes PCA as a deterministic input-output system and shows that reducing the dimensionality after PCA incurs the same relative information loss as dimensionality reduction without PCA. It also examines the case where the rotation is computed from the sample covariance matrix, and how the sample size affects the loss.
Contribution
It provides a theoretical analysis of the relative information loss in PCA, covering the case where the rotation is computed from the sample covariance matrix and the effect of sample size on that loss.
Findings
The relative information loss from reducing dimensionality after PCA equals that of dimensionality reduction without PCA.
When PCA computes the rotation from the sample covariance matrix and the rotation matrix is not available at the output, an infinite amount of information is lost.
The relative information loss decreases with increasing sample size.
Abstract
In this work we analyze principal component analysis (PCA) as a deterministic input-output system. We show that the relative information loss induced by reducing the dimensionality of the data after performing the PCA is the same as in dimensionality reduction without PCA. Finally, we analyze the case where the PCA uses the sample covariance matrix to compute the rotation. If the rotation matrix is not available at the output, we show that an infinite amount of information is lost. The relative information loss is shown to decrease with increasing sample size.
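The claim about the missing rotation matrix can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes zero-mean data stored as an N x n array, uses NumPy, and the helper name `pca_scores` is hypothetical. It shows that a data set and an arbitrarily rotated copy of it yield the same PCA output (up to the sign of each component), so the output alone cannot reveal which rotation the input had. It does not compute the information-theoretic quantities themselves.

```python
import numpy as np

def pca_scores(X, M=None):
    """Rotate X (N features x n samples) using its own sample covariance.

    Assumes the rows of X are zero-mean. Returns the first M rows of the
    rotated data, or all N rows if M is None.
    """
    N, n = X.shape
    C = X @ X.T / n                      # sample covariance matrix
    eigvals, W = np.linalg.eigh(C)       # orthonormal eigenvectors in columns
    order = np.argsort(eigvals)[::-1]    # sort by descending variance
    W = W[:, order]
    Y = W.T @ X                          # rotated data
    return Y if M is None else Y[:M]

rng = np.random.default_rng(0)
N, n = 3, 50
# Scale the rows so the sample eigenvalues are well separated.
X = rng.standard_normal((N, n)) * np.array([[3.0], [2.0], [1.0]])

# A random orthogonal matrix Q, obtained from a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))

Y_a = pca_scores(X)        # PCA output for the original data
Y_b = pca_scores(Q @ X)    # PCA output for the rotated data

# Eigenvector signs are arbitrary, so compare magnitudes only.
same_output = np.allclose(np.abs(Y_a), np.abs(Y_b), atol=1e-8)

# The rotated data has a diagonal sample covariance matrix.
C_y = Y_a @ Y_a.T / n
is_diagonal = np.allclose(C_y, np.diag(np.diag(C_y)), atol=1e-8)

print("inputs differ:", not np.allclose(X, Q @ X))
print("outputs agree up to sign:", same_output)
print("output covariance is diagonal:", is_diagonal)
```

Because every orthogonal `Q` leads to the same output here, the map from input to output is many-to-one over a continuum of inputs, which is consistent with the abstract's statement that information is lost when the rotation matrix is not passed along with the rotated data.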
