Sparse logistic principal components analysis for binary data
Seokho Lee, Jianhua Z. Huang, Jianhua Hu

TL;DR
The paper introduces a sparse logistic PCA method for binary data. It optimizes a criterion motivated by a penalized Bernoulli likelihood using a Majorization-Minimization algorithm, and the sparsity imposed on the loading vectors makes the principal components more interpretable and more stable to extract.
Contribution
It presents a novel sparse logistic PCA framework based on a penalized Bernoulli likelihood and an efficient optimization algorithm, extending PCA to binary data.
Findings
Effective in analyzing single nucleotide polymorphism (SNP) data
Outperforms traditional PCA in binary data contexts
Provides interpretable sparse principal components
Abstract
We develop a new principal components analysis (PCA) type dimension reduction method for binary data. Different from the standard PCA which is defined on the observed data, the proposed PCA is defined on the logit transform of the success probabilities of the binary observations. Sparsity is introduced to the principal component (PC) loading vectors for enhanced interpretability and more stable extraction of the principal components. Our sparse PCA is formulated as solving an optimization problem with a criterion function motivated from a penalized Bernoulli likelihood. A Majorization-Minimization algorithm is developed to efficiently solve the optimization problem. The effectiveness of the proposed sparse logistic PCA method is illustrated by application to a single nucleotide polymorphism data set and a simulation study.
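Illustrative sketch
The idea in the abstract can be sketched in a few lines of Python. This is a minimal rank-one illustration under stated assumptions, not the authors' implementation: the logit matrix is modeled as theta_ij = mu_j + a_i * b_j with a unit-norm score vector a and an L1-penalized loading vector b, and the Bernoulli negative log-likelihood is majorized with the standard uniform quadratic bound (curvature at most 1/4), which may differ from the bound used in the paper. All names (`sparse_logistic_pca_rank1`, `mu`, `a`, `b`, `lam`) are hypothetical.

```python
import numpy as np

def sigmoid(t):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def soft_threshold(z, t):
    # Proximal operator of the L1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(Y, mu, a, b, lam):
    # Penalized Bernoulli negative log-likelihood on the logit scale.
    theta = mu + np.outer(a, b)
    nll = np.sum(np.logaddexp(0.0, theta) - Y * theta)
    return nll + lam * np.sum(np.abs(b))

def sparse_logistic_pca_rank1(Y, lam, n_iter=100):
    """Rank-one sparse logistic PCA via an MM loop (illustrative sketch)."""
    Y = np.asarray(Y, dtype=float)
    # Initialize the column offsets from the empirical log-odds.
    p = np.clip(Y.mean(axis=0), 0.01, 0.99)
    mu = np.log(p / (1.0 - p))
    # Initialize scores and loadings from an SVD of the centered data.
    U, s, Vt = np.linalg.svd(Y - Y.mean(axis=0), full_matrices=False)
    a, b = U[:, 0], s[0] * Vt[0]
    history = [objective(Y, mu, a, b, lam)]
    for _ in range(n_iter):
        theta = mu + np.outer(a, b)
        # Majorization: the surrogate is (1/8) * ||X - Theta||_F^2 + penalty.
        X = theta + 4.0 * (Y - sigmoid(theta))
        # Minimization by block updates, each lowering the surrogate.
        mu = (X - np.outer(a, b)).mean(axis=0)
        Xc = X - mu
        b = soft_threshold(Xc.T @ a, 4.0 * lam)  # sparse loadings
        v = Xc @ b
        norm = np.linalg.norm(v)
        if norm > 0:
            a = v / norm                         # unit-norm scores
        history.append(objective(Y, mu, a, b, lam))
    return mu, a, b, history

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=200)
    loadings = np.zeros(30)
    loadings[:5] = 3.0  # only the first five variables carry signal
    probs = sigmoid(np.outer(scores, loadings))
    Y = (rng.random(probs.shape) < probs).astype(float)
    mu, a, b, history = sparse_logistic_pca_rank1(Y, lam=2.0, n_iter=50)
    print("nonzero loadings:", np.flatnonzero(b))
```

Because each block update can only lower the surrogate, and the surrogate touches the objective at the current iterate, the recorded objective values are non-increasing. Larger values of `lam` push more loadings to exactly zero, which is the source of the interpretability the abstract describes.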
