PCA consistency in high dimension, low sample size context
Sungkyu Jung, J. S. Marron

TL;DR
This paper studies the behavior of PCA in high-dimensional, low-sample-size settings, establishing conditions for consistency and inconsistency of principal component directions as dimension grows.
Contribution
It gives broad sets of sufficient conditions for consistency, subspace consistency, and strong inconsistency of the estimated PC directions under HDLSS asymptotics, where the sample size is fixed and the dimension grows.
Findings
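If the first few population eigenvalues are large enough compared to the others, the corresponding estimated PC directions are consistent or converge to the appropriate subspace.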
Most other PC directions are strongly inconsistent.
Geometric representation of HDLSS data holds under broad conditions.
Abstract
Principal Component Analysis (PCA) is an important tool of dimension reduction especially when the dimension (or the number of variables) is very high. Asymptotic studies where the sample size is fixed, and the dimension grows [i.e., High Dimension, Low Sample Size (HDLSS)] are becoming increasingly relevant. We investigate the asymptotic behavior of the Principal Component (PC) directions. HDLSS asymptotics are used to study consistency, strong inconsistency and subspace consistency. We show that if the first few eigenvalues of a population covariance matrix are large enough compared to the others, then the corresponding estimated PC directions are consistent or converge to the appropriate subspace (subspace consistency) and most other PC directions are strongly inconsistent. Broad sets of sufficient conditions for each of these cases are specified and the main theorem gives a…
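As a rough numerical illustration of the consistency versus strong inconsistency contrast described in the abstract, the sketch below simulates a single-spike Gaussian model with population covariance diag(d^alpha, 1, ..., 1) and measures the angle between the first sample PC direction and the true first eigenvector. The single-spike model, the exponent `alpha`, and the function name `first_pc_angle` are illustrative assumptions and do not come from the text above. Under this assumed model one would expect the angle to shrink toward 0 degrees as d grows when `alpha` is above 1, and to drift toward 90 degrees when `alpha` is below 1.

```python
import numpy as np


def first_pc_angle(d, n, alpha, seed=0):
    """Angle in degrees between the first sample PC direction and e_1.

    Assumed (hypothetical) model: n i.i.d. Gaussian observations in
    dimension d with covariance diag(d**alpha, 1, ..., 1), so the first
    population eigenvector is the coordinate axis e_1.
    """
    rng = np.random.default_rng(seed)

    # Per-coordinate standard deviations: sqrt(d**alpha) for the spike, 1 elsewhere.
    scales = np.ones(d)
    scales[0] = d ** (alpha / 2.0)

    # n x d data matrix, then column-centering as in ordinary PCA.
    X = rng.standard_normal((n, d)) * scales
    X -= X.mean(axis=0)

    # Rows of vt are the sample PC directions (right singular vectors).
    _, _, vt = np.linalg.svd(X, full_matrices=False)

    # The sign of a PC direction is arbitrary, so use the absolute cosine.
    cos = min(abs(vt[0, 0]), 1.0)
    return float(np.degrees(np.arccos(cos)))


if __name__ == "__main__":
    n = 20  # sample size stays fixed while the dimension grows
    for alpha in (1.5, 0.2):
        for d in (200, 2000, 20000):
            angle = first_pc_angle(d, n, alpha)
            print(f"alpha={alpha}, d={d}: angle = {angle:.1f} degrees")
```

The SVD of the centered n x d data matrix is used instead of forming the d x d sample covariance, which keeps the computation cheap when d is much larger than n.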
