DIMM-SC: A Dirichlet mixture model for clustering droplet-based single cell transcriptomic data
Zhe Sun, Ting Wang, Ke Deng, Xiao-Feng Wang, Robert Lafyatis, Ying Ding, Ming Hu, Wei Chen

TL;DR
DIMM-SC introduces a Dirichlet Mixture Model tailored for clustering droplet-based single cell transcriptomic data, improving accuracy, quantifying uncertainty, and enabling rigorous statistical inference in scRNA-Seq analysis.
Contribution
This paper presents the first model-based clustering method specifically designed for droplet-based scRNA-Seq data, incorporating a Dirichlet mixture prior and an EM algorithm for enhanced performance.
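To make the mixture idea concrete, here is a minimal sketch of the E-step for a Dirichlet-multinomial mixture over UMI count vectors. It is an illustration under stated assumptions, not the authors' implementation: the function names (`dm_log_lik`, `e_step`), the toy counts, and the fixed `alphas` and `weights` are all hypothetical, and the M-step that re-estimates each cluster's Dirichlet parameters is omitted.

```python
import math
import numpy as np

# Element-wise log-gamma built from the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def dm_log_lik(counts, alpha):
    """Log Dirichlet-multinomial likelihood of each cell under one cluster.

    The multinomial coefficient is dropped because it is the same for every
    cluster and cancels when responsibilities are normalized.
    """
    counts = np.asarray(counts, dtype=float)  # shape (cells, genes)
    alpha = np.asarray(alpha, dtype=float)    # shape (genes,)
    a_sum = alpha.sum()
    n = counts.sum(axis=1)                    # total UMI count per cell
    return (_lgamma(a_sum) - _lgamma(a_sum + n)
            + (_lgamma(counts + alpha) - _lgamma(alpha)).sum(axis=1))


def e_step(counts, alphas, weights):
    """Posterior probability that each cell belongs to each cluster."""
    log_post = np.stack(
        [np.log(w) + dm_log_lik(counts, a) for a, w in zip(alphas, weights)],
        axis=1,
    )
    # Subtract the row maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    resp = np.exp(log_post)
    return resp / resp.sum(axis=1, keepdims=True)


# Toy data (hypothetical): 2 cells x 3 genes, 2 clusters with assumed parameters.
counts = np.array([[9, 1, 0],
                   [0, 2, 8]])
alphas = [np.array([5.0, 1.0, 1.0]),   # cluster 0 favors gene 0
          np.array([1.0, 1.0, 5.0])]   # cluster 1 favors gene 2
weights = [0.5, 0.5]

resp = e_step(counts, alphas, weights)
labels = resp.argmax(axis=1)
```

Each row of `resp` is a probability vector over clusters, which is what lets a model-based method report clustering uncertainty per cell rather than only a hard label.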
Findings
DIMM-SC outperforms existing methods in clustering accuracy.
It provides reliable quantification of clustering uncertainty.
Demonstrated effectiveness on simulated and real datasets.
Abstract
Motivation: Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifier (UMI). Despite the technology advances, statistical methods and computational tools are still lacking for analyzing droplet-based scRNA-Seq data. Particularly, model-based approaches for clustering large-scale single cell transcriptomic data are still under-explored.
Methods: We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell…
Taxonomy
Topics: Single-cell and spatial transcriptomics · Gene expression and cancer classification · Bayesian Methods and Mixture Models
