Finite sample approximation results for principal component analysis: a matrix perturbation approach
Boaz Nadler

TL;DR
This paper provides finite sample bounds and a matrix perturbation perspective on PCA eigenvalues and eigenvectors, analyzing their relation to population PCA and phase transition phenomena in high-dimensional settings.
Contribution
It introduces a nonasymptotic, high-probability theorem for sample PCA eigenvalues and eigenvectors under a spiked covariance model, and offers a matrix perturbation view of phase transitions.
Findings
Finite sample bounds for PCA eigenvalues and eigenvectors.
Analysis of phase transition and eigenvector loss in high-dimensional PCA.
Eigenvector stability depends on noise level and sample size.
Abstract
Principal component analysis (PCA) is a standard tool for dimensional reduction of a set of n observations (samples), each with p variables. In this paper, using a matrix perturbation approach, we study the nonasymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size n, and those of the limiting population PCA as n → ∞. As in machine learning, we present a finite sample theorem which holds with high probability for the closeness between the leading eigenvalue and eigenvector of sample PCA and population PCA under a spiked covariance model. In addition, we also consider the relation between finite sample PCA and the asymptotic results in the joint limit p, n → ∞, with p/n = c. We present a matrix perturbation view of the "phase transition phenomenon," and a simple linear-algebra based derivation of the eigenvalue and…
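The setting described in the abstract can be explored numerically. The following is a minimal sketch, not the paper's own procedure: it assumes a rank-one spiked covariance model with Gaussian signal and noise, a population eigenvector fixed along the first coordinate axis, and a zero population mean (so no centering is applied). The function name `leading_overlap` and the parameters `signal_var` and `noise_var` are illustrative choices, not names taken from the paper.

```python
import numpy as np


def leading_overlap(n, p, signal_var, noise_var=1.0, seed=0):
    """Simulate a rank-one spiked covariance model and compare sample
    PCA with population PCA.

    Assumed model (illustrative): x = sqrt(signal_var) * u * v + noise,
    with u ~ N(0, 1), v the first coordinate axis, and noise ~ N(0, noise_var * I).
    The population covariance is then signal_var * v v^T + noise_var * I,
    whose leading eigenvector is v.

    Returns the leading sample eigenvalue and the squared overlap
    <v_hat, v>^2 between the sample and population eigenvectors.
    """
    rng = np.random.default_rng(seed)

    # Population leading eigenvector (unit norm).
    v = np.zeros(p)
    v[0] = 1.0

    # n observations of p variables: rank-one signal plus isotropic noise.
    u = rng.standard_normal(n)
    noise = np.sqrt(noise_var) * rng.standard_normal((n, p))
    X = np.sqrt(signal_var) * np.outer(u, v) + noise

    # Sample covariance; the population mean is zero by construction.
    S = X.T @ X / n

    # eigh returns eigenvalues in ascending order, so the last one leads.
    eigvals, eigvecs = np.linalg.eigh(S)
    v_hat = eigvecs[:, -1]

    return float(eigvals[-1]), float(np.dot(v_hat, v) ** 2)


if __name__ == "__main__":
    # Many samples and a strong signal: the sample eigenvector tracks v.
    print(leading_overlap(n=2000, p=200, signal_var=4.0))
    # Few samples and a weak signal: the overlap with v is largely lost.
    print(leading_overlap(n=20, p=200, signal_var=0.5))
```

Sweeping `n` or `noise_var` while holding the other parameters fixed gives an empirical picture of how the squared overlap degrades as the ratio p/n grows or the noise level rises.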
