Robust PCA via Outlier Pursuit
Huan Xu, Constantine Caramanis, Sujay Sanghavi

TL;DR
This paper introduces Outlier Pursuit, a convex optimization algorithm for PCA that recovers the low-dimensional subspace and identifies the corrupted data points even when entire points, rather than just a few components of each point, are corrupted. This setting arises in applications such as robust collaborative filtering and bioinformatics.
Contribution
The paper proposes Outlier Pursuit, a convex optimization method that recovers the underlying low-dimensional subspace and identifies outliers when entire data points are corrupted. It extends earlier matrix decomposition work, which considered points with only a few arbitrarily corrupted components.
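As a rough illustration of the idea, Outlier Pursuit is commonly written as the convex program: minimize ||L||_* + lam * ||C||_{1,2} subject to M = L + C, where L is the low-rank part, C is column-sparse (its nonzero columns mark outliers), ||.||_* is the nuclear norm, and ||.||_{1,2} is the sum of column l2 norms. The sketch below solves this program with a generic ADMM loop. The solver choice, the function names (`svt`, `column_shrink`, `outlier_pursuit_admm`), the penalty heuristic `mu`, the value `lam=0.3`, and the synthetic data are all assumptions made for illustration, not the authors' implementation or recommended settings.

```python
import numpy as np


def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def column_shrink(X, tau):
    """Column-wise shrinkage: proximal operator of tau * (sum of column l2 norms)."""
    norms = np.linalg.norm(X, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale


def outlier_pursuit_admm(M, lam, n_iter=500, tol=1e-6):
    """Hypothetical ADMM sketch for min ||L||_* + lam*||C||_{1,2} s.t. M = L + C."""
    mu = 1.0 / np.linalg.norm(M, 2)  # penalty parameter; a heuristic assumption
    L = np.zeros_like(M)
    C = np.zeros_like(M)
    Y = np.zeros_like(M)  # dual variable for the constraint M = L + C
    for _ in range(n_iter):
        L = svt(M - C + Y / mu, 1.0 / mu)            # low-rank update
        C = column_shrink(M - L + Y / mu, lam / mu)  # column-sparse update
        R = M - L - C                                # constraint residual
        Y = Y + mu * R                               # dual ascent step
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, C


# Synthetic example (assumed data): rank-2 inliers plus 10 fully corrupted columns.
rng = np.random.default_rng(0)
p, n_in, n_out, r = 20, 190, 10, 2
inliers = rng.standard_normal((p, r)) @ rng.standard_normal((r, n_in))
outliers = 10.0 * rng.standard_normal((p, n_out))
M = np.hstack([inliers, outliers])  # outliers occupy the last n_out columns

L_hat, C_hat = outlier_pursuit_admm(M, lam=0.3)

# Flag the n_out columns of C_hat with the largest l2 norm as outliers.
col_norms = np.linalg.norm(C_hat, axis=0)
detected = np.sort(np.argsort(col_norms)[-n_out:])
print("columns flagged as outliers:", detected)
```

In this sketch the number of outliers is assumed known when flagging columns; in practice one would more likely threshold the column norms of the recovered C, and `lam` would need tuning for the data at hand.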
Findings
Recovers the exact optimal low-dimensional subspace under mild assumptions on the uncorrupted points.
Successfully identifies completely corrupted data points.
Applicable to bioinformatics and financial data analysis.
Abstract
Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the…
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques · Machine Learning and Algorithms
Methods: Principal Components Analysis
