On the number of principal components in high dimensions
Sungkyu Jung, Myung Hee Lee, Jeongyoun Ahn

TL;DR
This paper introduces a method for choosing how many principal components to retain in high-dimensional PCA. It sequentially tests the skewness of the squared lengths of residual scores, and the resulting estimator is shown to be consistent.
Contribution
It proposes a novel skewness-based sequential testing procedure for estimating the number of principal components in high-dimensional settings.
Findings
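The proposed estimator is consistent when the dimension is much higher than the number of observations.
It performs well in high-dimensional simulation studies.
It provides reasonable estimates in a number of real data examples.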
Abstract
We consider the problem of how many components to retain in the application of principal component analysis when the dimension is much higher than the number of observations. To estimate the number of components, we propose to sequentially test skewness of the squared lengths of residual scores that are obtained by removing leading principal components. The residual lengths are asymptotically left-skewed if all principal components with diverging variances are removed, and right-skewed if not. The proposed estimator is shown to be consistent, performs well in high-dimensional simulation studies, and provides reasonable estimates in a number of real data examples.
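The sequential idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the summary does not give their test statistic, so the code below replaces the formal skewness test with a plain sign check on the sample skewness, and it uses in-sample principal component scores. The function names (`residual_sq_lengths`, `sample_skewness`, `estimate_num_components`) and the `k_max` parameter are hypothetical, chosen for illustration only.

```python
import numpy as np


def residual_sq_lengths(X, k):
    """Squared lengths of residual scores after removing the k leading PCs.

    X is an (n, d) data matrix with observations in rows.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                               # (n, r) principal component scores
    return (scores[:, k:] ** 2).sum(axis=1)      # drop the first k components


def sample_skewness(v):
    """Moment-based sample skewness m3 / m2^(3/2); 0.0 if v is constant."""
    d = np.asarray(v, dtype=float)
    d = d - d.mean()
    m2 = (d ** 2).mean()
    if m2 == 0.0:
        return 0.0
    return float((d ** 3).mean() / m2 ** 1.5)


def estimate_num_components(X, k_max=None):
    """Return the smallest k whose residual lengths are not right-skewed.

    Assumption: a sign check (skewness <= 0) stands in for the formal
    sequential test described in the abstract.
    """
    n, d = X.shape
    if k_max is None:
        k_max = max(0, min(n, d) - 3)            # keep a few residual directions
    for k in range(k_max + 1):
        if sample_skewness(residual_sq_lengths(X, k)) <= 0:
            return k
    return k_max
```

Calling `estimate_num_components(X)` on an `(n, d)` array returns an integer between 0 and `k_max`. A practical version would replace the sign check with a significance test at a chosen level, since sample skewness is noisy for small n.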
