A Geometric Analysis of PCA
Ayoub El Hanchi, Murat Erdogdu, Chris Maddison

TL;DR
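This paper analyzes PCA geometrically and probabilistically: it proves a central limit theorem for the error of the estimated principal subspace, derives the asymptotic distribution of the excess risk under the reconstruction loss, and gives a non-asymptotic upper bound that matches this characterization in the large sample limit. Together these results identify which property of the data distribution determines the excess risk of PCA.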
Contribution
It derives the asymptotic distribution of the PCA excess risk under the reconstruction loss, and proves that the negative block Rayleigh quotient on the Grassmannian is generalized self-concordant along geodesics emanating from its minimizer.
Findings
Central limit theorem for PCA error
Asymptotic distribution of excess risk
Non-asymptotic upper bounds on PCA excess risk
Abstract
What property of the data distribution determines the excess risk of principal component analysis? In this paper, we provide a precise answer to this question. We establish a central limit theorem for the error of the principal subspace estimated by PCA, and derive the asymptotic distribution of its excess risk under the reconstruction loss. We obtain a non-asymptotic upper bound on the excess risk of PCA that recovers, in the large sample limit, our asymptotic characterization. Underlying our contributions is the following result: we prove that the negative block Rayleigh quotient, defined on the Grassmannian, is generalized self-concordant along geodesics emanating from its minimizer of maximum rotation less than .
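To make the quantity studied in the abstract concrete, here is a minimal numerical sketch of the excess risk of PCA under the reconstruction loss. It is an illustration under stated assumptions, not the paper's construction: the data are assumed zero-mean Gaussian with a known covariance `Sigma`, the reconstruction loss of an orthonormal basis U is taken to be E||x - U U^T x||^2 = tr(Sigma) - tr(U^T Sigma U), and the function names (`pca_subspace`, `excess_risk`) are hypothetical. The paper's normalization of the block Rayleigh quotient may differ.

```python
import numpy as np

def pca_subspace(X, k):
    """Orthonormal basis (d x k) of the top-k eigenvectors of the empirical second-moment matrix."""
    S = X.T @ X / X.shape[0]
    _, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]      # keep the k leading eigenvectors

def neg_block_rayleigh(U, Sigma):
    """Negative block Rayleigh quotient -tr(U^T Sigma U) for orthonormal U (assumed normalization)."""
    return -np.trace(U.T @ Sigma @ U)

def excess_risk(n, Sigma, k, rng):
    """Reconstruction risk of the PCA subspace from n samples minus the optimal risk."""
    d = Sigma.shape[0]
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # assumed zero-mean Gaussian data
    U_hat = pca_subspace(X, k)
    # The optimal value of the negative block Rayleigh quotient is minus the sum of the top-k eigenvalues.
    optimum = -np.linalg.eigvalsh(Sigma)[::-1][:k].sum()
    # tr(Sigma) cancels, so the excess risk is the gap in the negative block Rayleigh quotient.
    return neg_block_rayleigh(U_hat, Sigma) - optimum

rng = np.random.default_rng(0)
Sigma = np.diag([4.0, 3.0, 1.0, 0.5])   # illustrative covariance with an eigengap between components 2 and 3
small_n = [excess_risk(50, Sigma, 2, rng) for _ in range(20)]
large_n = [excess_risk(5000, Sigma, 2, rng) for _ in range(20)]
print("mean excess risk, n=50:  ", np.mean(small_n))
print("mean excess risk, n=5000:", np.mean(large_n))
```

The excess risk is nonnegative up to floating-point error and shrinks as the sample size grows. The abstract's results describe how it is distributed asymptotically and bound it for finite samples.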
