A Comparison of Zero-Inflated Models for Modern Biomedical Data
Max Beveridge, Zach Goldstein, and Hee Cheol Chung

TL;DR
This paper compares zero-inflated statistical models to determine which performs best on biomedical data with excess zeros, considering factors such as dependence among variables, the level of zero inflation, and variance.
Contribution
It provides a comprehensive comparison of zero-inflated models for biomedical data, showing how their performance changes with dependence among variables, the level of zero inflation or deflation, and variance.
Findings
Zero-inflated models outperform standard models on zero-heavy data.
Model performance varies with dependence and zero-inflation levels.
Simulation and real data analyses validate model selection criteria.
Abstract
Many datasets cannot be accurately described by standard probability distributions because they contain an excess of zero values. For example, zero inflation is prevalent in microbiome data and single-cell RNA sequencing data, which serve as our real-data examples. Several models have been proposed for zero-inflated datasets, including the zero-inflated negative binomial model, the hurdle negative binomial model, and the truncated latent Gaussian copula model. This study compares these models and determines which performs best under different conditions, using both simulation studies and real data analyses. We are particularly interested in how dependence among the variables, the level of zero inflation or deflation, and the variance of the data affect model selection.
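The two negative binomial variants named above differ in how they generate zeros, and that difference is what makes them behave differently under zero inflation versus deflation. The sketch below is an illustration, not the authors' code. It assumes a negative binomial parametrized by mean `mu` and size (dispersion) `r`, so the variance is `mu + mu**2 / r`. The function names `nb_pmf`, `zinb_pmf`, and `hurdle_nb_pmf` are hypothetical. In the zero-inflated model, a zero arises either as a "structural" zero (probability `pi`) or as an ordinary negative binomial draw. In the hurdle model, all zeros come from a single binary process (probability `p0`), and positive counts follow a zero-truncated negative binomial. The truncated latent Gaussian copula model is not sketched here.

```python
import math


def nb_pmf(y: int, mu: float, r: float) -> float:
    """Negative binomial pmf with mean mu and size r (assumed parametrization).

    Variance is mu + mu**2 / r; computed on the log scale for stability.
    """
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y: int, mu: float, r: float, pi: float) -> float:
    """Zero-inflated NB: with probability pi the count is a structural zero;
    otherwise it is an NB draw, which can itself be zero."""
    base = nb_pmf(y, mu, r)
    return pi + (1 - pi) * base if y == 0 else (1 - pi) * base


def hurdle_nb_pmf(y: int, mu: float, r: float, p0: float) -> float:
    """Hurdle NB: P(Y = 0) = p0 exactly; positive counts follow an NB
    truncated at zero (renormalized by 1 - P_NB(0))."""
    if y == 0:
        return p0
    return (1 - p0) * nb_pmf(y, mu, r) / (1 - nb_pmf(0, mu, r))


if __name__ == "__main__":
    mu, r = 2.0, 1.0  # hypothetical parameters, chosen only for illustration
    print(f"NB   P(0) = {nb_pmf(0, mu, r):.3f}")             # 0.333
    print(f"ZINB P(0) = {zinb_pmf(0, mu, r, 0.3):.3f}")      # 0.533
    # A hurdle model can put LESS mass at zero than the plain NB (deflation),
    # which a ZINB with pi >= 0 cannot do.
    print(f"HNB  P(0) = {hurdle_nb_pmf(0, mu, r, 0.1):.3f}")  # 0.100
```

This illustrates one reason the abstract's distinction between zero inflation and deflation matters. With a non-negative mixing weight, the zero-inflated model can only raise P(0) above the negative binomial baseline. The hurdle model sets P(0) directly, so it can represent either direction.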
Taxonomy
Topics: Machine Learning in Healthcare · Bioinformatics and Genomic Networks · Gene expression and cancer classification
