Bayesian Variable Selection for Globally Sparse Probabilistic PCA
Charles Bouveyron, Pierre Latouche, Pierre-Alexandre Mattei

TL;DR
This paper introduces GSPPCA, a Bayesian method for sparse PCA that enforces a shared sparsity pattern across components, improving interpretability and variable selection in high-dimensional data.
Contribution
The paper presents the first exact marginal likelihood computation for Bayesian PCA with shared sparsity, along with a variational EM algorithm for model selection.
Findings
GSPPCA selects more relevant gene subsets than traditional sparse PCA.
A shared sparsity pattern makes the principal components easier to interpret.
The method is applied to both real and synthetic datasets.
Abstract
Sparse versions of principal component analysis (PCA) have established themselves as simple yet powerful ways of selecting relevant features of high-dimensional data in an unsupervised manner. However, when several sparse principal components are computed, the selected variables are difficult to interpret, since each axis has its own sparsity pattern and has to be interpreted separately. To overcome this drawback, we propose a Bayesian procedure called globally sparse probabilistic PCA (GSPPCA) that yields several sparse components with the same sparsity pattern. This allows the practitioner to identify the original variables that are relevant to describe the data. To this end, using Roweis' probabilistic interpretation of PCA and a Gaussian prior on the loading matrix, we provide the first exact computation of the marginal likelihood of a Bayesian PCA model. To avoid the…
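To make the idea of a shared sparsity pattern concrete, here is a minimal NumPy sketch. It is not the authors' implementation or inference procedure. It only simulates data from a probabilistic PCA model whose loading matrix has entire rows set to zero, so every component uses the same subset of variables. The function name `sample_globally_sparse_ppca`, the binary indicator `v`, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def sample_globally_sparse_ppca(n, p, d, q, noise_std=0.1, seed=0):
    """Simulate n observations in dimension p from a PPCA model with d
    components, where only q variables are relevant to ALL components.
    (Illustrative assumption, not the paper's code.)"""
    rng = np.random.default_rng(seed)

    # Binary indicator of relevant variables: one pattern shared by all axes.
    v = np.zeros(p, dtype=bool)
    v[rng.choice(p, size=q, replace=False)] = True

    # Gaussian loadings, then zero out the rows of irrelevant variables.
    W = rng.normal(size=(p, d))
    W[~v] = 0.0

    # Latent scores and isotropic Gaussian noise, as in probabilistic PCA.
    Y = rng.normal(size=(n, d))
    X = Y @ W.T + noise_std * rng.normal(size=(n, p))
    return X, W, v

X, W, v = sample_globally_sparse_ppca(n=200, p=10, d=3, q=4)

# Every column (component) of W has the same support as v.
row_support = np.any(W != 0, axis=1)
```

In ordinary sparse PCA each column of `W` could have its own set of zeros. Here the zeros are whole rows, so reading off `row_support` gives a single list of selected variables for all components.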
Taxonomy
Methods: Principal Components Analysis
