Selecting the number of principal components: estimation of the true rank of a noisy matrix
Yunjin Choi, Jonathan Taylor, Robert Tibshirani

TL;DR
This paper introduces an exact distribution-based method for determining the number of principal components in noisy data, improving PCA component selection through hypothesis testing and confidence intervals under Gaussian noise.
Contribution
It generalizes the global-null test of Taylor, Loftus and Tibshirani (2013) to test for any number of principal components using exact distributions, and derives an integrated version of the test with greater power.
Findings
Methods compare favorably to existing approaches in simulations
Provides exact tests and confidence intervals for PCA component estimation
Generalizes the earlier test of the global null to any number of components
Abstract
Principal component analysis (PCA) is a well-known tool in multivariate statistics. One significant challenge in using PCA is the choice of the number of components. In order to address this challenge, we propose an exact distribution-based method for hypothesis testing and construction of confidence intervals for signals in a noisy matrix. Assuming Gaussian noise, we use the conditional distribution of the singular values of a Wishart matrix and derive exact hypothesis tests and confidence intervals for the true signals. Our paper is based on the approach of Taylor, Loftus and Tibshirani (2013) for testing the global null: we generalize it to test for any number of principal components, and derive an integrated version with greater power. In simulation studies we find that our proposed methods compare well to existing approaches.
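To make the setting concrete, the following is a minimal sketch of the kind of problem the abstract describes: a low-rank signal observed under Gaussian noise, and a test of the global null that no signal is present. It is not the paper's exact conditional test. It substitutes a plain Monte Carlo reference distribution for the largest singular value, and it assumes the noise level sigma is known. The function names simulate_noisy_matrix and mc_global_null_pvalue, and all parameter names, are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_noisy_matrix(n, p, signal_singular_values, sigma=1.0, rng=None):
    """Return X = B + E, where B is low rank and E has i.i.d. N(0, sigma^2) entries.

    Illustrative helper (not from the paper). The rank of B equals the
    number of nonzero entries in `signal_singular_values`.
    """
    rng = np.random.default_rng(rng)
    s = np.asarray(signal_singular_values, dtype=float)
    k = s.size
    if k == 0:
        B = np.zeros((n, p))  # global null: no signal at all
    else:
        # Random orthonormal left/right factors obtained from a QR decomposition.
        U, _ = np.linalg.qr(rng.standard_normal((n, k)))
        V, _ = np.linalg.qr(rng.standard_normal((p, k)))
        B = U @ np.diag(s) @ V.T
    E = sigma * rng.standard_normal((n, p))
    return B + E


def mc_global_null_pvalue(X, sigma=1.0, n_sim=500, rng=None):
    """Monte Carlo p-value for the global null 'X is pure Gaussian noise'.

    Assumption: sigma is known. This swaps in simulation for the exact
    distribution used in the paper, so it only illustrates the hypothesis.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    observed = np.linalg.svd(X, compute_uv=False)[0]  # largest singular value
    null_top = np.array([
        np.linalg.svd(sigma * rng.standard_normal((n, p)), compute_uv=False)[0]
        for _ in range(n_sim)
    ])
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(null_top >= observed)) / (1 + n_sim)


if __name__ == "__main__":
    X = simulate_noisy_matrix(50, 10, [30.0], sigma=1.0, rng=0)
    print(mc_global_null_pvalue(X, sigma=1.0, n_sim=200, rng=1))
```

A simulation-based test like this one only addresses the global null. Testing whether a k-th component is present after accounting for the first k-1, and attaching confidence intervals to the signals, is what the exact conditional approach in the paper is designed for.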
