The High-Dimensional Asymptotics of Principal Component Regression
Alden Green, Elad Romanov

TL;DR
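This paper derives exact limiting formulas for the estimation and prediction risks of principal components regression (PCR) when the number of data points is proportional to the dimension, showing how the population covariance eigenvalues, the alignment between the population PCs and the true signal, and the number of selected PCs determine performance.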
Contribution
It derives exact asymptotic formulas for the PCR estimation and prediction risks in high dimensions using tools from random matrix theory, accounting for the fact that the sample covariance is an inconsistent estimate of its population counterpart.
Findings
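Because the sample covariance is an inconsistent estimate of the population covariance in this regime, sample PCs may fail to fully capture latent low-dimensional structure in the data.
The estimation and prediction risks depend on the eigenvalues of the population covariance, the alignment between the population PCs and the true signal, and the number of selected PCs.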
The asymptotic prediction risk is computed with tools from random matrix theory, such as multi-resolvent traces.
Abstract
We study principal components regression (PCR) in an asymptotic high-dimensional regression setting, where the number of data points is proportional to the dimension. We derive exact limiting formulas for the estimation and prediction risks, which depend in a complicated manner on the eigenvalues of the population covariance, the alignment between the population PCs and the true signal, and the number of selected PCs. A key challenge in the high-dimensional setting stems from the fact that the sample covariance is an inconsistent estimate of its population counterpart, so that sample PCs may fail to fully capture potential latent low-dimensional structure in the data. We demonstrate this point through several case studies, including that of a spiked covariance model. To calculate the asymptotic prediction risk, we leverage tools from random matrix theory which to our knowledge have…
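The setting described in the abstract can be explored numerically. The following is a minimal sketch, not the paper's method or its formulas: it fits PCR on data drawn from a rank-one spiked covariance and reports Monte Carlo quantities for a single draw. The function names `pcr_fit` and `simulate`, the Gaussian design, the covariance I + spike * u u^T, the choice of a signal equal to the population top PC, the noise level, and the lack of centering (the data are assumed mean-zero) are all illustrative assumptions.

```python
import numpy as np

def pcr_fit(X, y, k):
    """PCR sketch: least squares of y on the scores of the top-k sample PCs of X."""
    # Right singular vectors of X are eigenvectors of the sample covariance X^T X / n.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                       # p x k matrix of sample PCs
    Z = X @ V_k                          # n x k matrix of PC scores
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return V_k @ gamma                   # coefficient estimate in the original p coordinates

def simulate(n=400, p=200, spike=5.0, k=1, noise=0.5, seed=0):
    """One draw from an assumed spiked model; returns (est. risk, pred. risk, PC alignment)."""
    rng = np.random.default_rng(seed)
    u = np.zeros(p)
    u[0] = 1.0                           # population top PC (assumed: first coordinate axis)
    sigma_diag = np.ones(p)
    sigma_diag[0] += spike               # Sigma = I + spike * u u^T, diagonal in this basis
    X = rng.standard_normal((n, p)) * np.sqrt(sigma_diag)
    beta = u.copy()                      # assumed signal, fully aligned with the top PC
    y = X @ beta + noise * rng.standard_normal(n)

    beta_hat = pcr_fit(X, y, k)
    diff = beta_hat - beta
    est_risk = float(diff @ diff)                    # squared estimation error
    pred_risk = float(diff @ (sigma_diag * diff))    # error weighted by Sigma

    # Squared overlap between the top sample PC and the population top PC.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    alignment = float((Vt[0] @ u) ** 2)
    return est_risk, pred_risk, alignment

if __name__ == "__main__":
    est, pred, align = simulate()
    print(f"estimation risk: {est:.4f}")
    print(f"prediction risk: {pred:.4f}")
    print(f"squared alignment of top sample PC: {align:.4f}")
```

With n and p of comparable size, the squared alignment between the top sample PC and the population PC stays strictly below 1, which is one concrete way to see the inconsistency the abstract refers to. Varying `spike`, `k`, and the ratio p/n in this sketch gives a rough feel for how the risks move, though the numbers are single-draw estimates rather than the limiting values derived in the paper.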
