The consequences of checking for zero-inflation and overdispersion in the analysis of count data
Harlan Campbell

TL;DR
This paper investigates how checking for zero-inflation and overdispersion affects count data analysis, highlighting the potential biases introduced by model selection procedures in ecological studies.
Contribution
It provides a large-scale simulation study to assess the impact of model selection bias when testing for zero-inflation and overdispersion in count data analysis.
Findings
Model selection bias can significantly influence analysis outcomes.
Checking for zero-inflation and overdispersion affects model choice.
Simulation results highlight potential pitfalls in multi-stage modeling procedures.
Abstract
Count data are ubiquitous in ecology, and the Poisson generalized linear model (GLM) is commonly used to model the association between counts and explanatory variables of interest. When fitting this model, one typically begins by confirming that the data are not overdispersed and that there is no excess of zeros. If the data appear to be overdispersed or zero-inflated, key assumptions of the Poisson GLM may be violated, and researchers will then typically consider alternatives to the Poisson GLM. An important question is whether the potential model selection bias introduced by this data-driven, multi-stage procedure merits concern. In this paper, we conduct a large-scale simulation study to investigate the potential consequences of model selection bias that can arise in the simple scenario of analyzing a sample of potentially overdispersed, potentially…
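To make the "check first, then choose a model" procedure concrete, here is a minimal sketch in Python (standard library only). It assumes the intercept-only case (an i.i.d. sample of counts, no covariates) and makes several choices that are not taken from the paper: overdispersion is checked with a one-sided Pearson dispersion test using the Wilson-Hilferty approximation to the chi-square quantile, zero-inflation is checked with the intercept-only form of the van den Broek score test, both at the 5% level, and the four-way decision rule in `select_model` is hypothetical. All function names are illustrative; this is not the author's procedure or code.

```python
import math
import random
from collections import Counter

# Upper 5% critical values: chi-square with 1 df, and standard normal (one-sided).
CHI2_1_CRIT_05 = 3.841
Z_95 = 1.645


def _mean(y):
    if len(y) < 2:
        raise ValueError("need at least two observations")
    return sum(y) / len(y)


def overdispersed(y):
    """Assumed check: one-sided Pearson dispersion test for an intercept-only Poisson fit.

    X2 = sum((y - ybar)^2) / ybar is compared with an approximate upper 5%
    quantile of chi-square(n - 1), computed via Wilson-Hilferty.
    """
    ybar = _mean(y)
    if ybar == 0:
        return False
    x2 = sum((v - ybar) ** 2 for v in y) / ybar
    df = len(y) - 1
    crit = df * (1 - 2 / (9 * df) + Z_95 * math.sqrt(2 / (9 * df))) ** 3
    return x2 > crit


def zero_inflated(y):
    """Assumed check: van den Broek score test, intercept-only form.

    S = (n0 * exp(ybar) - n)^2 / (n * (exp(ybar) - 1 - ybar)) ~ chi-square(1);
    only an excess (not a deficit) of zeros is flagged.
    """
    ybar = _mean(y)
    if ybar == 0:
        return False
    n = len(y)
    n0 = sum(1 for v in y if v == 0)
    excess = n0 * math.exp(ybar) - n  # observed zeros / expected zero probability, minus n
    score = excess ** 2 / (n * (math.exp(ybar) - 1 - ybar))
    return excess > 0 and score > CHI2_1_CRIT_05


def select_model(y):
    """Hypothetical multi-stage rule: keep Poisson unless a check fails."""
    over, zero = overdispersed(y), zero_inflated(y)
    if over and zero:
        return "ZINB"
    if over:
        return "NB"
    if zero:
        return "ZIP"
    return "Poisson"


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def selection_frequencies(lam, pi_zero, n, reps, seed=1):
    """Count how often select_model picks each model on simulated ZIP data.

    pi_zero is the probability of a structural zero; pi_zero = 0 gives plain Poisson data.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(reps):
        y = [0 if rng.random() < pi_zero else poisson_draw(rng, lam) for _ in range(n)]
        counts[select_model(y)] += 1
    return counts


if __name__ == "__main__":
    print(selection_frequencies(lam=2.0, pi_zero=0.0, n=100, reps=1000))
    print(selection_frequencies(lam=2.0, pi_zero=0.2, n=100, reps=1000))
```

When `pi_zero = 0` the Poisson model is correct, so every sample routed to "NB", "ZIP" or "ZINB" is a false alarm; any estimate reported afterwards comes from whichever model the data happened to select. That data-dependent path is the kind of multi-stage procedure whose consequences the paper examines.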
