Model-based Clustering of Multi-Dimensional Zero-Inflated Counts via the EM Algorithm
Zahra AghahosseinaliShirazi, Pedro A. Rangel, and Camila P. E. de Souza

TL;DR
This paper introduces a model-based clustering method for multi-dimensional zero-inflated count data. It uses mixtures of zero-inflated Poisson (ZIP) or zero-inflated negative binomial (ZINB) distributions, fitted with the EM algorithm, and handles matrix-structured observations with covariates and size factors.
Contribution
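It develops an EM-based clustering framework for zero-inflated counts arranged in matrix form. Both the count and zero components depend on cluster assignment, and the mean incorporates covariates and a size factor, extending model-based clustering to this kind of heterogeneous count data.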
Findings
High clustering accuracy demonstrated in simulations
Effective handling of covariates and size factors
Successful application to real datasets
Abstract
Zero-inflated count data arise in various fields, including health, biology, economics, and the social sciences. These data are often modelled using probabilistic distributions such as zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), or zero-inflated binomial (ZIB). To account for heterogeneity in the data, it is often useful to cluster observations into groups that may explain underlying differences in the data-generating process. This paper focuses on model-based clustering for zero-inflated counts when observations are structured in a matrix form rather than a vector. We propose a clustering framework based on mixtures of ZIP or ZINB distributions, with both the count and zero components depending on cluster assignments. Our approach incorporates covariates through a log-linear structure for the mean parameter and includes a size factor to adjust for differences…
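The abstract is truncated here, so the exact parameterization is not available from this page. As a rough illustration only, the sketch below shows what an E-step for a mixture of ZIP distributions could look like under assumed choices: a log-linear mean `log(lambda) = log(size) + rho[k, g] + X @ beta`, a cluster-specific zero-inflation probability `phi[k, g]`, and independence across the columns of each observation. All names (`zip_logpmf`, `e_step`, `rho`, `beta`, `phi`, `pi`, `size`) are hypothetical and are not taken from the paper.

```python
import math
import numpy as np

# Vectorized log-gamma, so the sketch depends only on NumPy and the stdlib.
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def zip_logpmf(y, lam, phi):
    """Log-pmf of a zero-inflated Poisson.

    phi is the probability of a structural zero; lam is the Poisson mean.
    P(Y=0) = phi + (1 - phi) * exp(-lam); P(Y=y) = (1 - phi) * Pois(y; lam) for y > 0.
    """
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    phi = np.asarray(phi, dtype=float)
    log_pois = -lam + y * np.log(lam) - _lgamma(y + 1.0)
    log_zero = np.logaddexp(np.log(phi), np.log1p(-phi) + log_pois)
    log_pos = np.log1p(-phi) + log_pois
    return np.where(y == 0, log_zero, log_pos)


def e_step(Y, size, X, rho, beta, phi, pi):
    """Posterior cluster probabilities (responsibilities) for each row of Y.

    Assumed shapes (illustrative, not the paper's notation):
      Y: (N, G) counts, size: (N,) size factors, X: (N, P) covariates,
      rho: (K, G) cluster effects, beta: (P, G) covariate effects,
      phi: (K, G) zero-inflation probabilities, pi: (K,) mixing proportions.
    """
    # Log-linear mean with the size factor entering as an offset: (N, K, G).
    log_lam = (np.log(size)[:, None, None]
               + (X @ beta)[:, None, :]
               + rho[None, :, :])
    # Sum log-densities over the G columns, assuming independence given the cluster.
    loglik = zip_logpmf(Y[:, None, :], np.exp(log_lam), phi[None, :, :]).sum(axis=2)
    log_post = np.log(pi)[None, :] + loglik  # (N, K), unnormalized
    # Normalize in log space for numerical stability.
    m = log_post.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_post - m).sum(axis=1, keepdims=True))
    return np.exp(log_post - log_norm)


# Toy usage: two observations, three columns, two clusters, no covariate effect.
Y = np.array([[0, 0, 0], [20, 25, 18]])
size = np.ones(2)
X = np.zeros((2, 1))
beta = np.zeros((1, 3))
rho = np.log(np.array([[1.0, 1.0, 1.0], [20.0, 20.0, 20.0]]))
phi = np.full((2, 3), 0.5)
pi = np.array([0.5, 0.5])
resp = e_step(Y, size, X, rho, beta, phi, pi)
print(resp)
```

In a full EM loop these responsibilities would weight the M-step updates of the mixing proportions and the regression parameters. A ZINB variant would replace the Poisson term with a negative binomial log-density and add a dispersion parameter.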
Taxonomy
Topics: Single-cell and spatial transcriptomics
