Probabilistic Count Matrix Factorization for Single Cell Expression Data Analysis
Ghislain Durif, Laurent Modolo, Jeff E. Mold, Sophie Lambert-Lacroix, and Franck Picard

TL;DR
This paper introduces a probabilistic count matrix factorization method tailored for single-cell RNA sequencing data, effectively capturing data variability and enabling improved visualization and clustering compared to traditional PCA and other methods.
Contribution
It proposes a novel sparse Gamma-Poisson factor model with a variational EM inference, specifically designed to handle over-dispersed count data with dropouts in single-cell analysis.
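To make the model structure concrete, here is a minimal simulation sketch of a sparse Gamma-Poisson factor model with dropouts. It is an illustration under stated assumptions, not the authors' implementation: the function name, the Gamma shape and scale values, and the way sparsity and dropouts are encoded (independent Bernoulli masks) are all hypothetical choices made for this example, and the variational EM inference step is not shown.

```python
import numpy as np

def simulate_gamma_poisson_counts(n_cells=200, n_genes=50, n_factors=2,
                                  dropout_prob=0.3, sparsity_prob=0.5, seed=0):
    """Simulate counts from a sparse Gamma-Poisson factor model (illustrative).

    All hyperparameter values are assumptions chosen for the example.
    """
    rng = np.random.default_rng(seed)

    # Cell factors U (n_cells x n_factors): Gamma-distributed, hence non-negative.
    U = rng.gamma(shape=2.0, scale=1.0, size=(n_cells, n_factors))

    # Gene loadings V (n_genes x n_factors): Gamma-distributed, then sparsified
    # by a Bernoulli mask so that some genes do not contribute to a factor.
    V = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_factors))
    keep = rng.random((n_genes, n_factors)) < sparsity_prob
    V = V * keep

    # Poisson rates are the low-rank product U V^T. Mixing Poisson counts over
    # Gamma-distributed factors yields over-dispersed counts.
    rate = U @ V.T
    counts = rng.poisson(rate)

    # Dropouts: each entry is independently zeroed with probability dropout_prob.
    observed = rng.random((n_cells, n_genes)) >= dropout_prob
    X = counts * observed
    return X, U, V

X, U, V = simulate_gamma_poisson_counts()
```

The returned matrix `X` has one row per cell and one column per gene. A factorization method of this kind aims to recover low-dimensional factors such as `U` from `X` alone, and those factors can then be used for visualization or clustering.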
Findings
Outperforms PCA and t-SNE in representing single-cell data.
Provides a low-dimensional embedding suitable for clustering.
Demonstrates effectiveness on publicly available datasets.
Abstract
The development of high-throughput single-cell sequencing technologies now allows the investigation of the population-level diversity of cellular transcriptomes. This diversity has shown two faces. First, the expression dynamics (gene-to-gene variability) can be quantified more accurately, thanks to the measurement of lowly-expressed genes. Second, the cell-to-cell variability is high, with a low proportion of cells expressing the same gene at the same time/level. Those emerging patterns appear to be very challenging from the statistical point of view, especially to represent and to provide a summarized view of single-cell expression data. PCA is one of the most powerful frameworks to provide a suitable representation of high-dimensional datasets, by searching for latent directions catching the most variability in the data. Unfortunately, classical PCA is based on Euclidean distances and…
Taxonomy
Topics: Single-cell and spatial transcriptomics · Gene expression and cancer classification · Cell Image Analysis Techniques
