TL;DR
HePPCAT is a probabilistic PCA method that models and estimates noise that is heteroscedastic across samples, improving dimensionality reduction on heterogeneous datasets compared with standard PCA.
Contribution
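This paper develops a heteroscedastic probabilistic PCA method with efficient alternating maximization algorithms that jointly estimate the underlying factors and the unknown noise variances, addressing the limitations of classical PCA when noise levels differ across samples.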
Findings
HePPCAT outperforms traditional PCA in heterogeneous noise scenarios.
The proposed algorithms are computationally efficient and scalable.
Real data experiments demonstrate improved PCA estimates with HePPCAT.
Abstract
Principal component analysis (PCA) is a classical and ubiquitous method for reducing data dimensionality, but it is suboptimal for heterogeneous data, which are increasingly common in modern applications. PCA treats all samples uniformly and so degrades when the noise is heteroscedastic across samples, as occurs, e.g., when samples come from sources of heterogeneous quality. This paper develops a probabilistic PCA variant that estimates and accounts for this heterogeneity by incorporating it in the statistical model. Unlike in the homoscedastic setting, the resulting nonconvex optimization problem does not appear to be solvable by singular value decomposition. This paper develops a heteroscedastic probabilistic PCA technique (HePPCAT) that uses efficient alternating maximization algorithms to jointly estimate both the underlying factors and the unknown noise variances. Simulation experiments…
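To make "jointly estimate the factors and the noise variances" concrete, here is a minimal sketch under stated assumptions, not the paper's HePPCAT algorithms. It assumes samples are split into known groups l, each sample follows y = f z + η with z ~ N(0, 1) and η ~ N(0, v_l I), and there is a single factor (rank 1). It swaps in a plain EM update (an expectation conditional maximization step: update f with the v_l fixed, then each v_l with the new f). The function names `em_step`, `loglik` and `simulate` are hypothetical, and pure-Python lists keep it dependency-free.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def loglik(groups, f, v):
    """Gaussian log-likelihood when group l has covariance f f^T + v[l] I (rank-1 factor)."""
    d, ff = len(f), dot(f, f)
    total = 0.0
    for ys, vl in zip(groups, v):
        logdet = (d - 1) * math.log(vl) + math.log(vl + ff)
        for y in ys:
            fy = dot(f, y)
            # y^T C^{-1} y via the Sherman-Morrison form of the inverse covariance
            quad = (dot(y, y) - fy * fy / (vl + ff)) / vl
            total -= 0.5 * (d * math.log(2 * math.pi) + logdet + quad)
    return total


def em_step(groups, f, v, floor=1e-8):
    """One EM iteration: E-step on the latent z, then update f, then each v[l]."""
    d, ff = len(f), dot(f, f)
    # E-step: posterior mean E[z] and second moment E[z^2] for every sample
    stats = []
    for ys, vl in zip(groups, v):
        m = ff + vl
        group = []
        for y in ys:
            ez = dot(f, y) / m
            group.append((ez, vl / m + ez * ez))
        stats.append(group)
    # M-step for f (v held fixed): each sample weighted by 1/v[l]
    num, den = [0.0] * d, 0.0
    for ys, vl, st in zip(groups, v, stats):
        for y, (ez, ezz) in zip(ys, st):
            for j in range(d):
                num[j] += y[j] * ez / vl
            den += ezz / vl
    f_new = [x / den for x in num]
    # M-step for each v[l]: expected residual energy per coordinate, using the new f
    ff_new = dot(f_new, f_new)
    v_new = []
    for ys, st in zip(groups, stats):
        resid = sum(dot(y, y) - 2 * ez * dot(f_new, y) + ff_new * ezz
                    for y, (ez, ezz) in zip(ys, st))
        v_new.append(max(resid / (d * len(ys)), floor))
    return f_new, v_new


def simulate(f_true, v_true, n_per_group, seed=0):
    """Hypothetical test data: y = f_true * z + noise, noise variance v_true[l] in group l."""
    rng = random.Random(seed)
    groups = []
    for vl in v_true:
        ys = []
        for _ in range(n_per_group):
            z = rng.gauss(0.0, 1.0)
            ys.append([fj * z + rng.gauss(0.0, math.sqrt(vl)) for fj in f_true])
        groups.append(ys)
    return groups


if __name__ == "__main__":
    f_true = [2.0, 1.0, 0.0, 0.0, 0.0]
    groups = simulate(f_true, v_true=[0.1, 4.0], n_per_group=200)
    f, v = [1.0] * 5, [1.0, 1.0]  # crude initialization
    for _ in range(100):
        f, v = em_step(groups, f, v)
    cos = abs(dot(f, f_true)) / math.sqrt(dot(f, f) * dot(f_true, f_true))
    print(f"|cos(f, f_true)| = {cos:.3f}, estimated v = {[round(x, 2) for x in v]}")
```

Because the f-update weights each sample by 1/v_l, low-noise groups dominate the factor estimate, whereas standard PCA weights every sample equally. Each EM step also cannot decrease `loglik`, which makes a convenient sanity check.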
