Asymptotic properties of Principal Component Analysis and shrinkage-bias adjustment under the Generalized Spiked Population model
Rounak Dey, Seunggeun Lee

TL;DR
This paper studies PCA in high-dimensional data with correlated features, providing theoretical insights and methods to improve eigenvalue estimation, eigenvector alignment, and bias correction, leading to better prediction accuracy.
Contribution
It extends PCA theory to the generalized spiked population model and introduces methods for consistent estimation and bias adjustment in complex data settings.
Findings
The proposed methods significantly reduce bias in PCA estimates.
Improved prediction accuracy is demonstrated on genetic data.
Theoretical results validate the proposed estimators.
Abstract
With the development of high-throughput technologies, principal component analysis (PCA) in the high-dimensional regime is of great interest. Most of the existing theoretical and methodological results for high-dimensional PCA are based on the spiked population model in which all the population eigenvalues are equal except for a few large ones. Due to the presence of local correlation among features, however, this assumption may not be satisfied in many real-world datasets. To address this issue, we investigated the asymptotic behaviors of PCA under the generalized spiked population model. Based on the theoretical results, we proposed a series of methods for the consistent estimation of population eigenvalues, angles between the sample and population eigenvectors, correlation coefficients between the sample and population principal component (PC) scores, and the shrinkage bias…
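The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator. It assumes the classical spiked population model (one large population eigenvalue, all others equal to 1) rather than the generalized model, Gaussian data with known mean zero, and hypothetical names and parameter values (`simulate_spiked_pca`, `n`, `p`, `spike`) chosen only for illustration. It compares the leading sample eigenvalue with the population eigenvalue, and the variance of PC scores on training data with the variance on fresh test data projected onto the same sample eigenvector.

```python
import numpy as np


def simulate_spiked_pca(n=400, p=2000, spike=10.0, n_test=2000, seed=0):
    """Simulate PCA under a classical single-spike model (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Assumed population covariance: diag(spike, 1, ..., 1), so the population
    # leading eigenvector is the first coordinate axis.
    sd = np.ones(p)
    sd[0] = np.sqrt(spike)

    # Mean-zero Gaussian training and test samples (rows are observations).
    x_train = rng.standard_normal((n, p)) * sd
    x_test = rng.standard_normal((n_test, p)) * sd

    # The leading right singular vector of x_train is the leading eigenvector
    # of the sample covariance x_train.T @ x_train / n (mean taken as known 0).
    _, s, vt = np.linalg.svd(x_train, full_matrices=False)
    v1 = vt[0]

    return {
        "spike": spike,
        # Leading sample eigenvalue.
        "sample_eig": s[0] ** 2 / n,
        # Squared cosine of the angle between sample and population eigenvectors.
        "cos2_angle": v1[0] ** 2,
        # Second moment of PC scores on the training data.
        "train_score_var": np.mean((x_train @ v1) ** 2),
        # Second moment of predicted PC scores on independent test data.
        "test_score_var": np.mean((x_test @ v1) ** 2),
    }


result = simulate_spiked_pca()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these assumed settings (p much larger than n), the leading sample eigenvalue tends to exceed the population value, and the predicted scores on test data tend to have smaller variance than the training scores. That gap is the kind of shrinkage bias the paper's adjustment targets; the paper itself handles the more general case in which the non-spiked eigenvalues are not all equal.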
Taxonomy
Methods: Principal Components Analysis
