A Comparison of Zero-Inflated Models for Modern Biomedical Data

Max Beveridge, Zach Goldstein, and Hee Cheol Chung

arXiv:2411.12086 · stat.ME · November 20, 2024

Open Access

TL;DR

This paper compares different zero-inflated statistical models to determine which performs best on biomedical data with excess zeros, considering factors like dependence, inflation level, and variance.

Contribution

It provides a comprehensive comparison of zero-inflated models for biomedical data, highlighting their performance under various data conditions.

Findings

01

Zero-inflated models outperform standard models in zero-heavy data.

02

Model performance varies with dependence and zero-inflation levels.

03

Simulation and real data analyses validate model selection criteria.

Abstract

Many data sets cannot be accurately described by standard probability distributions because they contain an excess of zero values. For example, zero-inflation is prevalent in microbiome data and single-cell RNA sequencing data, which serve as our real data examples. Several models have been proposed for zero-inflated data sets, including the zero-inflated negative binomial model, the hurdle negative binomial model, and the truncated latent Gaussian copula model. This study aims to compare various models and determine which one performs best under different conditions, using both simulation studies and real data analyses. We are particularly interested in how dependence among the variables, the level of zero-inflation or deflation, and the variance of the data affect model selection.
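
To make the distinction between two of the models named in the abstract concrete, here is a minimal sketch of their probability mass functions in their standard textbook forms. It is not taken from the paper: the function names, the mean/size parameterization of the negative binomial (mean `mu`, size `r`, variance `mu + mu**2 / r`), and the parameter names `pi` and `p_zero` are all assumptions made for illustration. The truncated latent Gaussian copula model is not covered here.

```python
import math


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu and size r (assumed parameterization)."""
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, r, pi):
    """Zero-inflated NB: mixture of a point mass at zero (weight pi) and an NB count."""
    base = (1.0 - pi) * nb_pmf(y, mu, r)
    return pi + base if y == 0 else base


def hurdle_nb_pmf(y, mu, r, p_zero):
    """Hurdle NB: zeros occur with probability p_zero; positives follow a zero-truncated NB."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * nb_pmf(y, mu, r) / (1.0 - nb_pmf(0, mu, r))


if __name__ == "__main__":
    mu, r = 3.0, 2.0
    print("NB     P(Y=0):", nb_pmf(0, mu, r))
    print("ZINB   P(Y=0):", zinb_pmf(0, mu, r, pi=0.3))
    print("Hurdle P(Y=0):", hurdle_nb_pmf(0, mu, r, p_zero=0.05))
```

In this sketch the zero-inflated form can only add zeros relative to the plain negative binomial, since `pi` is a mixture weight between 0 and 1. The hurdle form sets the zero probability directly, so a `p_zero` below the plain negative binomial's zero probability represents zero-deflation, which is one reason the two models can behave differently across the conditions the abstract describes.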

Taxonomy

Topics: Machine Learning in Healthcare · Bioinformatics and Genomic Networks · Gene expression and cancer classification