Convergence and prediction of principal component scores in high-dimensional settings
Seunggeun Lee, Fei Zou, Fred A. Wright

TL;DR
This paper investigates the bias in predicting principal component scores in high-dimensional data, identifies the causes, and proposes bias-adjusted methods with demonstrated improvements through simulations and real data examples.
Contribution
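It extends known inconsistency results for sample eigenvalues and eigenvectors under the spiked eigenvalue model, introduces bias-adjusted PC score prediction, and computes the asymptotic correlation between PC scores from sample and population eigenvectors.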
Findings
Naive PC score prediction for new observations can be substantially biased toward 0 when both dimensions of the data matrix are large.
Bias-adjusted estimators reduce this bias and show improved numerical properties.
Simulations and real data examples from the genetics literature support the effectiveness of the proposed methods.
Abstract
A number of settings arise in which it is of interest to predict Principal Component (PC) scores for new observations using data from an initial sample. In this paper, we demonstrate that naive approaches to PC score prediction can be substantially biased toward 0 in the analysis of large matrices. This phenomenon is largely related to known inconsistency results for sample eigenvalues and eigenvectors as both dimensions of the matrix increase. For the spiked eigenvalue model for random matrices, we expand the generality of these results, and propose bias-adjusted PC score prediction. In addition, we compute the asymptotic correlation coefficient between PC scores from sample and population eigenvectors. Simulation and real data examples from the genetics literature show the improved bias and numerical properties of our estimators.
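The shrinkage described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code or their bias-adjusted estimator; it only demonstrates the naive prediction problem under assumptions chosen for illustration: Gaussian data, a single spiked eigenvalue aligned with the first coordinate, and the hypothetical function name and default values (`n`, `p`, `spike`, `n_new`, `seed`) shown here. It projects training and new observations onto the leading sample eigenvector and compares the root-mean-square size of the two sets of scores.

```python
import numpy as np


def simulate_pc_score_shrinkage(n=200, p=2000, spike=50.0, n_new=1000, seed=0):
    """Return RMS(predicted scores for new data) / RMS(training scores).

    Illustrative assumptions: population covariance is the identity except
    for one eigenvalue equal to `spike` along the first coordinate, and the
    population mean is zero, so no centering is applied.
    """
    rng = np.random.default_rng(seed)

    # Per-coordinate standard deviations: sqrt(spike) on the first axis, 1 elsewhere.
    scale = np.ones(p)
    scale[0] = np.sqrt(spike)

    x_train = rng.standard_normal((n, p)) * scale
    x_new = rng.standard_normal((n_new, p)) * scale

    # Leading right singular vector of the training matrix, which is the
    # leading eigenvector of the (uncentered) sample covariance matrix.
    _, _, vt = np.linalg.svd(x_train, full_matrices=False)
    e1 = vt[0]

    # Naive PC scores: project observations onto the sample eigenvector.
    train_scores = x_train @ e1
    new_scores = x_new @ e1

    return float(np.sqrt(np.mean(new_scores**2) / np.mean(train_scores**2)))


if __name__ == "__main__":
    ratio = simulate_pc_score_shrinkage()
    print(f"RMS(new scores) / RMS(training scores) = {ratio:.3f}")
```

With p much larger than n, the ratio should fall below 1: scores predicted for new observations are systematically smaller in magnitude than the scores of the observations used to estimate the eigenvector. Reducing `p` relative to `n` should move the ratio closer to 1.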
