TL;DR
This paper develops new spectral tests for PCA in high-dimensional, correlated data, leveraging linear spectral statistics to detect weak principal components more effectively than traditional methods.
Contribution
It introduces a nonparametric, non-Gaussian generalization of the spiked model and derives optimal tests based on linear spectral statistics for settings where the noise variables are correlated, so that the noise eigenvalues form a general bulk.
Findings
The new LSS-based tests detect weak PCs more effectively than traditional largest-eigenvalue tests when the variables are correlated.
Optimal tests satisfy a Fredholm integral equation.
Algorithms for solving the integral equation are developed.
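To illustrate what solving such an integral equation numerically can look like, here is a minimal sketch of quadrature (Nyström) discretization. This summary does not state the kernel, the right-hand side, or even the kind of the equation, so everything specific below is an assumption: a toy second-kind equation f(x) - ∫_0^1 K(x, y) f(y) dy = g(x) with the hypothetical kernel K(x, y) = xy and g(x) = 2x/3, chosen only because its exact solution is f(x) = x. It is not the paper's algorithm.

```python
import numpy as np

def solve_fredholm_second_kind(kernel, g, n_nodes=8):
    """Nystrom sketch for f(x) - int_0^1 kernel(x, y) f(y) dy = g(x).

    Hypothetical helper for illustration only; not the paper's method.
    """
    # Gauss-Legendre nodes and weights on [-1, 1], mapped to [0, 1].
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    t = 0.5 * (x + 1.0)
    w = 0.5 * w
    # Replace the integral by a weighted sum over the nodes:
    # f(t_i) - sum_j kernel(t_i, t_j) * w_j * f(t_j) = g(t_i).
    K = kernel(t[:, None], t[None, :])
    A = np.eye(n_nodes) - K * w[None, :]
    f = np.linalg.solve(A, g(t))
    return t, f

# Toy problem (assumed, not from the paper): exact solution is f(x) = x.
t, f = solve_fredholm_second_kind(lambda x, y: x * y, lambda x: 2.0 * x / 3.0)
```

The integral becomes a finite sum, so the unknown function is recovered at the quadrature nodes by solving one linear system. An equation of the first kind would generally need regularization on top of this discretization.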
Abstract
Principal component analysis (PCA) is a widely used method for dimension reduction. In high-dimensional data, the "signal" eigenvalues corresponding to weak principal components (PCs) do not necessarily separate from the bulk of the "noise" eigenvalues. Therefore, popular tests based on the largest eigenvalue have little power to detect weak PCs. In the special case of the spiked model, certain tests asymptotically equivalent to linear spectral statistics (LSS), which average effects over all eigenvalues, were recently shown to achieve some power. We consider a nonparametric, non-Gaussian generalization of the spiked model to the setting of Marchenko and Pastur (1967). This allows a general bulk of the noise eigenvalues, accommodating correlated variables even under the null hypothesis of no significant PCs. We develop new tests based on LSS to detect weak PCs in this model. We show…
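The contrast the abstract draws between a largest-eigenvalue statistic and a linear spectral statistic can be made concrete with a small sketch. It assumes mean-zero data, simulated i.i.d. Gaussian noise under the null, and two arbitrary test functions (the identity and the logarithm). The function names are hypothetical, and the paper's optimal test function is not reproduced here.

```python
import numpy as np

def sample_covariance_eigenvalues(X):
    """Eigenvalues (ascending) of the sample covariance X^T X / n.

    Assumes the columns of X already have mean zero.
    """
    n, _ = X.shape
    S = X.T @ X / n
    return np.linalg.eigvalsh(S)

def linear_spectral_statistic(eigs, f):
    """Sum of f over all eigenvalues, i.e. an LSS with test function f."""
    return float(np.sum(f(eigs)))

rng = np.random.default_rng(0)
n, p = 200, 100                    # illustrative sizes, with n > p
X = rng.standard_normal((n, p))    # null case: pure noise, no PCs

eigs = sample_covariance_eigenvalues(X)
top_eigenvalue = eigs[-1]                                  # largest-eigenvalue statistic
lss_trace = linear_spectral_statistic(eigs, lambda x: x)   # f(x) = x gives the trace
lss_logdet = linear_spectral_statistic(eigs, np.log)       # f(x) = log x gives the log-determinant
```

The largest-eigenvalue statistic uses a single eigenvalue, while an LSS pools information from the whole spectrum, which is the averaging effect the abstract refers to.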
