Sparse logistic principal components analysis for binary data

Seokho Lee; Jianhua Z. Huang; Jianhua Hu

arXiv:1011.3626·stat.AP·November 17, 2010

TL;DR

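The paper introduces a sparse logistic PCA method for binary data. It optimizes a criterion motivated by a penalized Bernoulli likelihood using a Majorization-Minimization algorithm, and the sparsity imposed on the loading vectors improves interpretability and stabilizes extraction of the principal components.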

Contribution

It presents a novel sparse logistic PCA framework based on a penalized Bernoulli likelihood and an efficient optimization algorithm, extending PCA to binary data.

Findings

1. Effective in analyzing SNP data

2. Outperforms traditional PCA in binary data contexts

3. Provides interpretable sparse principal components

Abstract

We develop a new principal components analysis (PCA) type dimension reduction method for binary data. Unlike standard PCA, which is defined on the observed data, the proposed PCA is defined on the logit transform of the success probabilities of the binary observations. Sparsity is introduced to the principal component (PC) loading vectors for enhanced interpretability and more stable extraction of the principal components. Our sparse PCA is formulated as solving an optimization problem with a criterion function motivated by a penalized Bernoulli likelihood. A Majorization-Minimization algorithm is developed to efficiently solve the optimization problem. The effectiveness of the proposed sparse logistic PCA method is illustrated by application to a single nucleotide polymorphism data set and a simulation study.
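
The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the logits are modeled as Theta = 1 mu^T + A B^T with orthonormal scores A and an L1 penalty on the loadings B, the Bernoulli negative log-likelihood is majorized with the uniform curvature bound 1/4, and the names `sparse_logistic_pca`, `penalized_nll`, `lam`, and `k` are hypothetical. The paper's exact penalty scaling, constraints, and majorizing function may differ.

```python
import numpy as np

def sigmoid(t):
    # Numerically stable logistic function via the identity sigma(t) = (1 + tanh(t/2)) / 2.
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def soft_threshold(c, t):
    # Elementwise minimizer of 0.5 * (b - c)^2 + t * |b|.
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def penalized_nll(X, mu, A, B, lam):
    # Bernoulli negative log-likelihood on the logit scale plus an L1 penalty on B.
    theta = mu + A @ B.T
    return np.sum(np.logaddexp(0.0, theta) - X * theta) + lam * np.abs(B).sum()

def sparse_logistic_pca(X, k=2, lam=1.0, n_iter=100, seed=0):
    """Illustrative MM iteration (assumed form, not the paper's exact algorithm)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = np.zeros(d)
    A, _ = np.linalg.qr(rng.standard_normal((n, k)))  # orthonormal starting scores
    B = 0.1 * rng.standard_normal((d, k))
    history = [penalized_nll(X, mu, A, B, lam)]
    for _ in range(n_iter):
        theta = mu + A @ B.T
        # Majorization: since the curvature of the NLL is at most 1/4,
        # NLL(Theta) <= const + (1/8) * ||Theta - Z||_F^2 with working response Z.
        Z = theta + 4.0 * (X - sigmoid(theta))
        # Minimization: block updates, each an exact minimizer of the majorizer.
        mu = (Z - A @ B.T).mean(axis=0)
        Zc = Z - mu
        U, _, Vt = np.linalg.svd(Zc @ B, full_matrices=False)
        A = U @ Vt                                     # orthogonal Procrustes step
        B = soft_threshold(Zc.T @ A, 4.0 * lam)        # sparse loadings
        history.append(penalized_nll(X, mu, A, B, lam))
    return mu, A, B, history
```

Because every block update lowers the majorizing function, the penalized objective recorded in `history` should not increase from one iteration to the next. In this sketch a larger `lam` raises the soft-threshold level and so tends to set more entries of `B` exactly to zero.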
