Resampling Sensitivity of High-Dimensional PCA
Haoyu Wang

TL;DR
This paper studies how stable PCA is when entries of a high-dimensional data matrix are resampled. It identifies a sharp threshold in the number of resampled entries: below it the principal components are essentially unchanged, and above it they change completely, even though only a small part of the data was perturbed.
Contribution
It establishes the precise threshold for PCA sensitivity to data resampling in high dimensions, linking stability to the number of resampled entries.
Findings
When many entries are resampled ($k \gg n^{5/3}$), the principal components before and after resampling become asymptotically orthogonal.
When few entries are resampled ($k \ll n^{5/3}$), the principal components remain asymptotically colinear.
PCA's stability transitions sharply at $k \approx n^{5/3}$ resampled entries.
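To see why this threshold corresponds to a vanishing share of the data, compare it with the total number of entries. A rough calculation, assuming an $n \times p$ data matrix with $p = c\,n$ (the proportional growth regime named in the abstract; the symbols $p$ and $c$ are introduced here only for illustration):

$$
\frac{n^{5/3}}{n p} = \frac{n^{5/3}}{c\,n^{2}} = \frac{1}{c}\,n^{-1/3} \longrightarrow 0 \quad (n \to \infty).
$$

For example, with $p = n = 10^{6}$ the threshold is $n^{5/3} = 10^{10}$ entries out of $10^{12}$, about 1% of the matrix.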
Abstract
The study of stability and sensitivity of statistical methods or algorithms with respect to their data is an important problem in machine learning and statistics. The performance of the algorithm under resampling of the data is a fundamental way to measure its stability and is closely related to generalization or privacy of the algorithm. In this paper, we study the resampling sensitivity for the principal component analysis (PCA). Given an $n \times p$ random matrix $X$, let $X^{[k]}$ be the matrix obtained from $X$ by resampling $k$ randomly chosen entries of $X$. Let $v$ and $v^{[k]}$ denote the principal components of $X$ and $X^{[k]}$. In the proportional growth regime $p/n \to \xi \in (0,1]$, we establish the sharp threshold for the sensitivity/stability transition of PCA. When $ k \gg…
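The resampling experiment described in the abstract can be tried numerically. The following is a minimal sketch, not the paper's own code. It assumes i.i.d. standard Gaussian entries and takes "principal component" to mean the leading right singular vector of the data matrix. Neither choice is confirmed by the text here, and the function names (`top_pc`, `resample_entries`, `overlap_after_resampling`) are hypothetical.

```python
import numpy as np


def top_pc(X):
    """Leading right singular vector of X (assumed meaning of 'principal component')."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]


def resample_entries(X, k, rng):
    """Return a copy of X with k distinct, uniformly chosen entries redrawn.

    Assumption: entries are redrawn from the same N(0, 1) law used to build X.
    """
    Xk = X.copy()
    # Pick k distinct flat positions among the n * p entries.
    idx = rng.choice(X.size, size=k, replace=False)
    # ravel() of a fresh C-contiguous copy is a view, so this writes into Xk.
    Xk.ravel()[idx] = rng.standard_normal(k)
    return Xk


def overlap_after_resampling(n, p, k, seed=0):
    """|<v, v_k>| between the top principal components before and after resampling."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    Xk = resample_entries(X, k, rng)
    # The absolute value removes the sign ambiguity of singular vectors.
    return abs(float(top_pc(X) @ top_pc(Xk)))


if __name__ == "__main__":
    n = p = 100
    print(f"n^(5/3) = {n ** (5 / 3):.0f}, total entries = {n * p}")
    for k in (10, 100, 1000, 10000):
        print(k, round(overlap_after_resampling(n, p, k), 3))
```

An overlap near 1 corresponds to the colinear (stable) regime and an overlap near 0 to the orthogonal (sensitive) regime. The threshold is asymptotic, so at small $n$ the transition will look gradual and will vary with the seed. Averaging over several seeds and increasing $n$ gives a clearer picture.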
Taxonomy
Topics: Random Matrices and Applications · Blind Source Separation Techniques · Theoretical and Computational Physics
Methods: Principal Components Analysis
