Self-Paced Probabilistic Principal Component Analysis for Data with Outliers
Bowen Zhao, Xi Xiao, Wanpeng Zhang, Bin Zhang, Shutao Xia

TL;DR
This paper introduces SP-PPCA, a robust variant of probabilistic PCA that incorporates self-paced learning to effectively identify and mitigate the influence of outliers in data analysis.
Contribution
The paper integrates self-paced learning into PPCA to improve robustness against outliers, and derives an efficient optimization algorithm for the resulting model.
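For orientation, the generic hard self-paced learning objective, written for PPCA, is shown below. This is a sketch under assumptions, not the paper's exact formulation. It takes the per-sample loss $\ell_i$ to be the PPCA negative log-likelihood with parameters $\theta = (\mu, W, \sigma^2)$ and uses binary weights $v_i$. The paper's regularizer and pace schedule may differ.

$$
\min_{\theta,\ \mathbf{v}\in\{0,1\}^N}\ \sum_{i=1}^{N} v_i\,\ell_i(\theta) \;-\; \lambda \sum_{i=1}^{N} v_i,
\qquad
\ell_i(\theta) = -\log \mathcal{N}\!\left(x_i \mid \mu,\ WW^{\top} + \sigma^2 I\right),
\qquad
v_i^{*} = \begin{cases} 1, & \ell_i(\theta) < \lambda \\ 0, & \text{otherwise} \end{cases}
$$

With $\theta$ fixed, the weights have the closed form $v_i^{*}$. With $\mathbf{v}$ fixed, $\theta$ is an ordinary PPCA fit on the selected samples. Increasing the pace parameter $\lambda$ admits progressively harder samples.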
Findings
SP-PPCA effectively reduces outlier impact in synthetic and real datasets.
The method outperforms standard PPCA in robustness and accuracy.
Experimental results validate the effectiveness of the proposed approach.
Abstract
Principal Component Analysis (PCA) is a popular tool for dimensionality reduction and feature extraction in data analysis. Its probabilistic counterpart is known as Probabilistic PCA (PPCA). However, both standard PCA and PPCA lack robustness because they are sensitive to outliers. To alleviate this problem, this paper introduces the Self-Paced Learning mechanism into PPCA and proposes a novel method called Self-Paced Probabilistic Principal Component Analysis (SP-PPCA). Furthermore, we design a corresponding optimization algorithm based on an alternating search strategy and the expectation-maximization algorithm. SP-PPCA iteratively searches for optimal projection vectors and filters out outliers. Experiments on both synthetic problems and real-world datasets clearly demonstrate that SP-PPCA is able to reduce or eliminate the impact of outliers.
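The alternation described in the abstract (fit projections by EM, then filter outliers, then repeat) can be sketched in Python/NumPy as below. This is a minimal sketch, not the authors' implementation. It rests on three assumptions:

- The per-sample loss is the PPCA negative log-likelihood.
- The self-paced weights are hard 0/1 values.
- The pace parameter λ is set each round as a quantile of the current losses, rising from `start_ratio` to `end_ratio`.

The function and parameter names (`ppca_em`, `sample_losses`, `sp_ppca`, `start_ratio`, `end_ratio`) are hypothetical. The EM updates are the standard Tipping and Bishop PPCA updates.

```python
import numpy as np


def ppca_em(X, q, n_iter=100, seed=0):
    """Fit PPCA by EM. Returns the mean mu (d,), loadings W (d, q) and noise variance sigma2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)                       # ML estimate of the mean
    Xc = X - mu
    W = rng.standard_normal((d, q))           # random initial loadings
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables z_n
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ez = Xc @ W @ Minv                    # row n is E[z_n]^T
        S_zz = n * sigma2 * Minv + Ez.T @ Ez  # sum over n of E[z_n z_n^T]
        # M-step: closed-form updates for W, then sigma2 (using the new W)
        W = (Xc.T @ Ez) @ np.linalg.inv(S_zz)
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum((Ez @ W.T) * Xc)
                  + np.trace(S_zz @ W.T @ W)) / (n * d)
    return mu, W, sigma2


def sample_losses(X, mu, W, sigma2):
    """Per-sample negative log-likelihood under the PPCA marginal N(mu, W W^T + sigma2 I)."""
    d = X.shape[1]
    C = W @ W.T + sigma2 * np.eye(d)
    Xc = X - mu
    _, logdet = np.linalg.slogdet(C)
    maha = np.sum(Xc * np.linalg.solve(C, Xc.T).T, axis=1)  # (x - mu)^T C^{-1} (x - mu)
    return 0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def sp_ppca(X, q, start_ratio=0.5, end_ratio=0.9, n_rounds=5, em_iter=100, seed=0):
    """Hypothetical self-paced PPCA sketch with hard 0/1 sample weights v."""
    mu, W, sigma2 = ppca_em(X, q, em_iter, seed)         # initial fit on all samples
    v = np.ones(X.shape[0], dtype=bool)
    for ratio in np.linspace(start_ratio, end_ratio, n_rounds):
        losses = sample_losses(X, mu, W, sigma2)
        lam = np.quantile(losses, ratio)                  # pace parameter lambda for this round
        v = losses <= lam                                 # keep samples whose loss is at most lambda
        mu, W, sigma2 = ppca_em(X[v], q, em_iter, seed)   # refit PPCA on the selected samples only
    return mu, W, sigma2, v
```

Usage: `mu, W, sigma2, v = sp_ppca(X, q=2)`, where `~v` marks the samples excluded as outliers in the final round. Choosing λ by quantile avoids a multiplicative λ schedule, which misbehaves when the negative log-likelihood is negative. The trade-off is that `end_ratio` then encodes an assumed inlier fraction. The paper may control the pace differently.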
Taxonomy
Topics: Face and Expression Recognition · Spectroscopy and Chemometric Analyses · Blind Source Separation Techniques
Methods: Principal Components Analysis
