Correlated-PCA: Principal Components' Analysis when Data and Noise are Correlated
Namrata Vaswani, Han Guo

TL;DR
This paper studies PCA when the data and the noise are correlated (data-dependent noise), giving theoretical guarantees for the standard eigenvalue decomposition (EVD) solution and for a generalization of it, cluster-EVD.
Contribution
To the best of the authors' knowledge, it offers the first theoretical guarantees for PCA when data and noise are correlated, and it introduces cluster-EVD, a generalization of EVD that improves upon EVD in certain regimes.
Findings
Standard EVD PCA remains correct under certain data-noise correlation assumptions.
Cluster-EVD improves PCA accuracy in specific correlated noise regimes.
Theoretical guarantees extend PCA applicability to data-dependent noise scenarios.
Abstract
Given a matrix of observed data, Principal Components Analysis (PCA) computes a small number of orthogonal directions that contain most of its variability. Provably accurate solutions for PCA have been in use for a long time. However, to the best of our knowledge, all existing theoretical guarantees for it assume that the data and the corrupting noise are mutually independent, or at least uncorrelated. This is often valid in practice, but not always. In this paper, we study the PCA problem in the setting where the data and noise can be correlated. Such noise is often also referred to as "data-dependent noise". We obtain a correctness result for the standard eigenvalue decomposition (EVD) based solution to PCA under simple assumptions on the data-noise correlation. We also develop and analyze a generalization of EVD, cluster-EVD, that improves upon EVD in certain regimes.
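As a rough illustration of the EVD-based solution mentioned in the abstract, the sketch below computes the top-r eigenvectors of the empirical covariance of data corrupted by noise that depends on the data. The noise model (noise equal to a fixed small matrix `M` applied to the true data), the dimensions, and the helper names `evd_pca` and `subspace_error` are assumptions made only for this example; they are not taken from the paper, and cluster-EVD is not shown.

```python
import numpy as np

def evd_pca(Y, r):
    """Return the top-r eigenvectors of the empirical covariance (1/n) Y Y^T.

    Y is a d x n matrix whose columns are the observed data vectors.
    """
    n = Y.shape[1]
    cov = (Y @ Y.T) / n
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, ::-1][:, :r]

def subspace_error(P_hat, P):
    """Spectral norm of (I - P_hat P_hat^T) P, a common subspace-distance measure."""
    d = P.shape[0]
    return np.linalg.norm((np.eye(d) - P_hat @ P_hat.T) @ P, 2)

rng = np.random.default_rng(0)
d, r, n = 50, 3, 5000  # illustrative sizes, not from the paper

# True r-dimensional subspace with an orthonormal basis P
P, _ = np.linalg.qr(rng.standard_normal((d, r)))

# True data: L = P A, with coefficients of decreasing scale
A = rng.uniform(-1, 1, size=(r, n)) * np.array([[3.0], [2.0], [1.0]])
L = P @ A

# Hypothetical data-dependent noise: W = M L, so W is correlated with L
M = 0.05 * rng.standard_normal((d, d)) / np.sqrt(d)
W = M @ L

Y = L + W                      # observed data
P_hat = evd_pca(Y, r)          # EVD-based PCA estimate
err = subspace_error(P_hat, P)
print(f"subspace error: {err:.3f}")
```

With a fixed `M` the estimate recovers the span of `(I + M) P` rather than of `P`, so the error does not vanish as `n` grows; it stays on the order of the size of `M`. This is only meant to show why correlation between data and noise changes the analysis, not to reproduce the paper's assumptions or bounds.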
Taxonomy
Methods: Principal Components Analysis
