Bayesian segmented Gaussian copula factor model for single-cell sequencing data
Junsouk Choi, Hee Cheol Chung, Irina Gaynanova, Yang Ni

TL;DR
This paper introduces a Bayesian segmented Gaussian copula factor model tailored for single-cell sequencing data, effectively handling dropout-induced zero inflation and skewness, and automatically determining the number of latent factors.
Contribution
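The model explicitly accounts for zero inflation and skewness in single-cell data. A Dirichlet-Laplace prior on each column of the factor loadings matrix shrinks the loadings of unnecessary factors towards zero, which enables automatic selection of the number of factors and addresses identifiability issues.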
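To make the shrinkage idea concrete, here is a minimal sketch of drawing loadings columns from a standard Dirichlet-Laplace prior (Dirichlet local scales, a gamma global scale, and Laplace draws). The function name `sample_dl_column`, the concentration `a = 0.5`, and the matrix sizes are illustrative assumptions; the hyperparameters and exact column-wise construction used in the paper are not stated in this summary.

```python
import numpy as np

def sample_dl_column(p, a, rng):
    """Draw one length-p column from a standard Dirichlet-Laplace prior.

    Hypothetical helper for illustration; not the authors' code.
    """
    phi = rng.dirichlet(np.full(p, a))            # local scales, sum to 1
    tau = rng.gamma(shape=p * a, scale=2.0)       # global scale: Gamma(p*a, rate 1/2)
    return rng.laplace(loc=0.0, scale=phi * tau)  # double-exponential draws

rng = np.random.default_rng(0)
p, k = 200, 10   # genes and candidate factors (assumed sizes)
a = 0.5          # assumed Dirichlet concentration

# One independent prior draw per column of the loadings matrix.
Lambda = np.column_stack([sample_dl_column(p, a, rng) for _ in range(k)])

# Column norms give a rough sense of how much mass each factor carries.
col_norms = np.linalg.norm(Lambda, axis=0)
print(np.round(col_norms, 3))
```

Under a prior like this, most entries of a column sit near zero while a few can be large. In a fitted model, columns whose loadings are all shrunk close to zero would correspond to factors that can be dropped.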
Findings
Outperforms existing methods on simulated data with dropout and skewness
Identifies meaningful biological factors in real single-cell RNA-seq data
Detects previously uncharacterized cell subtypes
Abstract
Single-cell sequencing technologies have significantly advanced molecular and cellular biology, offering unprecedented insights into cellular heterogeneity by allowing for the measurement of gene expression at an individual cell level. However, the analysis of such data is challenged by the prevalence of low counts due to dropout events and the skewed nature of the data distribution, which conventional Gaussian factor models struggle to handle effectively. To address these challenges, we propose a novel Bayesian segmented Gaussian copula model to explicitly account for inflation of zero and near-zero counts, and to address the high skewness in the data. By employing a Dirichlet-Laplace prior for each column of the factor loadings matrix, we shrink the loadings of unnecessary factors towards zero, which leads to a simple approach to automatically determine the number of latent factors,…
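The data features described in the abstract (many zeros and a skewed positive part) can be illustrated by simulating from a simplified latent Gaussian factor model. This is only a sketch under stated assumptions: the single cutoff at zero, the `expm1` transform, and all sizes are hypothetical choices, and the paper's segmented copula construction may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 500, 30, 3   # cells, genes, latent factors (assumed sizes)

# Latent Gaussian factor model: z = eta @ Lambda^T + noise.
Lambda = rng.normal(scale=0.8, size=(p, k))
eta = rng.normal(size=(n, k))
z = eta @ Lambda.T + rng.normal(size=(n, p))
z = z / z.std(axis=0)  # rescale each gene to roughly unit variance

# Assumed observation rule: latent values at or below the cutoff are
# recorded as zero (mimicking dropout); values above it pass through a
# monotone, skew-inducing transform.
cutoff = 0.0
x = np.where(z > cutoff, np.expm1(z), 0.0)

zero_frac = (x == 0).mean()
print(f"fraction of zeros: {zero_frac:.2f}")
print(f"mean {x.mean():.2f} vs median {np.median(x):.2f}")
```

A Gaussian factor model fitted directly to `x` would ignore both the point mass at zero and the right skew. Working on the latent scale `z` instead is the motivation for a copula-based approach.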
Taxonomy
Topics: Single-cell and spatial transcriptomics
