Heteroskedastic PCA: Algorithm, Optimality, and Applications
Anru R. Zhang, T. Tony Cai, Yihong Wu

TL;DR
HeteroPCA is a PCA algorithm that handles heteroskedastic noise by iteratively imputing the diagonal entries of the sample covariance matrix. It is provably optimal under the generalized spiked covariance model and applies to a range of high-dimensional statistical problems.
Contribution
The paper introduces HeteroPCA, a computationally efficient and provably optimal PCA algorithm for heteroskedastic noise, supported by a new deterministic robust perturbation analysis of singular subspaces.
Findings
HeteroPCA outperforms traditional PCA in heteroskedastic settings.
The algorithm is effective for SVD under heteroskedastic noise, Poisson PCA, and SVD for heteroskedastic and incomplete data.
The method is computationally efficient and provably optimal under the generalized spiked covariance model.
Abstract
A general framework for principal component analysis (PCA) in the presence of heteroskedastic noise is introduced. We propose an algorithm called HeteroPCA, which involves iteratively imputing the diagonal entries of the sample covariance matrix to remove estimation bias due to heteroskedasticity. This procedure is computationally efficient and provably optimal under the generalized spiked covariance model. A key technical step is a deterministic robust perturbation analysis on singular subspaces, which can be of independent interest. The effectiveness of the proposed algorithm is demonstrated in a suite of problems in high-dimensional statistics, including singular value decomposition (SVD) under heteroskedastic noise, Poisson PCA, and SVD for heteroskedastic and incomplete data.
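The diagonal-imputation idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' reference implementation: the function names `hetero_pca` and `subspace_distance`, the fixed iteration count `n_iter`, and the choice to start from a zeroed diagonal are assumptions made for the example.

```python
import numpy as np


def hetero_pca(sample_cov, rank, n_iter=100):
    """Sketch of iterative diagonal imputation (names and defaults are assumptions).

    Returns a (p, rank) matrix whose columns span the estimated principal subspace.
    """
    sigma_hat = np.asarray(sample_cov, dtype=float)
    off_diag = ~np.eye(sigma_hat.shape[0], dtype=bool)

    # Assumed initialization: keep the off-diagonal entries, zero the diagonal,
    # since the diagonal is where heteroskedastic noise variances accumulate.
    current = np.where(off_diag, sigma_hat, 0.0)

    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(current)
        # Best rank-`rank` approximation of the current matrix.
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        # Keep the observed off-diagonal entries; impute the diagonal
        # with the diagonal of the low-rank approximation.
        current = np.where(off_diag, sigma_hat, low_rank)

    u, _, _ = np.linalg.svd(current)
    return u[:, :rank]


def subspace_distance(u_hat, u_true):
    """Spectral norm of the difference between the two projection matrices."""
    return np.linalg.norm(u_hat @ u_hat.T - u_true @ u_true.T, 2)
```

In use, `sample_cov` would be the sample covariance of the observed data and `rank` the assumed number of components. Because only the diagonal is overwritten, noise variances that differ across coordinates do not affect the entries the estimate relies on, whereas the leading eigenvectors of the raw covariance can be pulled toward high-variance coordinates.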
Taxonomy
Topics: Blind Source Separation Techniques · Spectroscopy and Chemometric Analyses · Statistical and numerical algorithms
Methods: Principal Components Analysis
