Unlabeled Principal Component Analysis and Matrix Completion
Yunzhen Yao, Liangzu Peng, Manolis C. Tsakiris

TL;DR
This paper introduces Unlabeled Principal Component Analysis (UPCA): recovering a low-rank data matrix whose columns have had their entries permuted, and possibly some entries removed. It uses algebraic geometry to show that the problem is well posed, proposes a two-stage recovery algorithm, and demonstrates applications in data privacy and record linkage.
Contribution
The paper formulates UPCA as an algebraic problem, proposes an efficient two-stage algorithm, and extends the framework to unlabeled matrix completion with theoretical guarantees.
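The algebraic formulation can be written out as follows. This is a hedged restatement in our own notation, not the paper's exact statement: the symbols $X^*$, $y_j$, $\Pi_j$, $r$ and the power-sum encoding are assumptions chosen for illustration. Each observed column $y_j$ is the ground-truth column $x_j^*$ with its entries reordered by an unknown permutation matrix $\Pi_j$, and $x_{ij}$, $y_{ij}$ denote the $i$-th entries of $x_j$, $y_j$:

```latex
\begin{aligned}
&\text{Data:} && Y = [\,y_1,\dots,y_n\,] \in \mathbb{R}^{m\times n}, \quad y_j = \Pi_j x_j^*, \quad \operatorname{rank}(X^*) = r,\\
&\text{UPCA:} && \min_{X \in \mathbb{R}^{m\times n}} \operatorname{rank}(X) \quad \text{s.t.} \quad x_j = \Pi'_j y_j \ \text{for some permutation matrix } \Pi'_j,\ j = 1,\dots,n,\\
&\text{Encoding:} && x_j = \Pi'_j y_j \ \text{for some } \Pi'_j \iff \sum_{i=1}^{m} x_{ij}^{k} = \sum_{i=1}^{m} y_{ij}^{k}, \quad k = 1,\dots,m.
\end{aligned}
```

The well-posedness result stated in the abstract says that, under the paper's assumptions, every minimizer is a row-permutation $\Pi X^*$ of the ground truth, with one permutation $\Pi$ shared by all columns. That is the best one can hope for, since applying the same row permutation to all of $X^*$ leaves the rank and every column constraint unchanged. The power-sum equivalence in the last line is standard: by Newton's identities, the first $m$ power sums of a vector in $\mathbb{R}^m$ determine its elementary symmetric polynomials and hence the multiset of its entries. Whether the paper's polynomial system takes exactly this form is an assumption here.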
Findings
The proposed algorithms recover the original data on both synthetic and real datasets.
The method outperforms existing approaches in settings with permuted and missing data.
Applications are demonstrated in data privatization and record linkage.
Abstract
We introduce robust principal component analysis from a data matrix in which the entries of its columns have been corrupted by permutations, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that UPCA is a well-defined algebraic problem in the sense that the only matrices of minimal rank that agree with the given data are row-permutations of the ground-truth matrix, arising as the unique solutions of a polynomial system of equations. Further, we propose an efficient two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data have been permuted. Stage-I employs outlier-robust PCA methods to estimate the ground-truth column-space. Equipped with the column-space, Stage-II applies recent methods for unlabeled sensing to restore the permuted data. Allowing for missing entries on top of…
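The two-stage pipeline described in the abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The data model (noiseless, 20% of the columns each with 25% of their entries shuffled), the helper names `make_data`, `stage1_subspace`, `stage2_unlabeled` and `upca`, and all thresholds are hypothetical. We also swap in simple stand-ins for the methods the paper uses: RANSAC over columns in place of outlier-robust PCA for Stage I, and RANSAC-based robust regression followed by sort-and-refit alternating minimization in place of the paper's unlabeled-sensing methods for Stage II. Missing entries are not handled.

```python
import numpy as np


def make_data(m=20, n=60, r=3, col_frac=0.2, row_frac=0.25, seed=0):
    """Hypothetical data model: a rank-r ground truth X; in a random fraction
    of the columns, a random subset of entries is shuffled among itself."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    Y = X.copy()
    k = int(row_frac * m)
    for j in rng.choice(n, size=int(col_frac * n), replace=False):
        rows = rng.choice(m, size=k, replace=False)
        Y[rows, j] = Y[rng.permutation(rows), j]
    return X, Y


def stage1_subspace(Y, r, trials=200, seed=1):
    """Stage I stand-in: RANSAC over columns (not the paper's robust PCA).
    Returns an orthonormal basis of the estimated column space and a mask of
    the columns lying in it, which are treated as unpermuted."""
    rng = np.random.default_rng(seed)
    n = Y.shape[1]
    col_norms = np.linalg.norm(Y, axis=0)
    best = np.zeros(n, dtype=bool)
    for _ in range(trials):
        Q = np.linalg.qr(Y[:, rng.choice(n, size=r, replace=False)])[0]
        resid = np.linalg.norm(Y - Q @ (Q.T @ Y), axis=0)
        inliers = resid <= 1e-8 * col_norms
        if inliers.sum() > best.sum():
            best = inliers
    # Refit the basis on all accepted columns via an SVD.
    U = np.linalg.svd(Y[:, best], full_matrices=False)[0][:, :r]
    return U, best


def stage2_unlabeled(U, y, trials=300, iters=5, seed=2):
    """Stage II stand-in for one permuted column y: robust regression by
    RANSAC over rows (shuffled entries act as outliers), then alternating
    minimization over the coefficients c and the permutation."""
    rng = np.random.default_rng(seed)
    m, r = U.shape
    tol = 1e-6 * (1.0 + np.abs(y).max())
    best_c, best_count = None, -1
    for _ in range(trials):
        rows = rng.choice(m, size=r, replace=False)
        c = np.linalg.lstsq(U[rows], y[rows], rcond=None)[0]
        count = int(np.sum(np.abs(U @ c - y) < tol))
        if count > best_count:
            best_c, best_count = c, count
    c = best_c
    for _ in range(iters):
        # For fixed c, the least-squares-optimal permutation pairs the k-th
        # smallest entry of y with the k-th smallest entry of U @ c.
        y_matched = np.empty_like(y)
        y_matched[np.argsort(U @ c)] = y[np.argsort(y)]
        # For a fixed permutation, c is an ordinary least-squares fit.
        c = np.linalg.lstsq(U, y_matched, rcond=None)[0]
    return y_matched  # observed values moved back to their estimated rows


def upca(Y, r):
    """Hypothetical two-stage driver: keep the columns Stage I accepts and
    restore the remaining columns one at a time with Stage II."""
    U, inliers = stage1_subspace(Y, r)
    X_hat = Y.copy()
    for j in np.flatnonzero(~inliers):
        X_hat[:, j] = stage2_unlabeled(U, Y[:, j])
    return X_hat


X, Y = make_data()
print(np.allclose(upca(Y, r=3), X))  # True on this noiseless synthetic example
```

The sorting step keeps the alternation cheap: for fixed coefficients, the least-squares-optimal permutation just pairs order statistics, so no combinatorial search is needed. The RANSAC initialization matters because alternating minimization started far from the truth can stall in a poor local minimum; it relies on most entries of each column keeping their correct position, which is why this sketch assumes partially shuffled columns.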
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques · Face and Expression Recognition
Methods: Linear Regression
