Bayesian nonparametric models for zero-inflated count-compositional data using ensembles of regression trees
André F. B. Menezes, Andrew C. Parnell, Keefe Murphy

TL;DR
This paper introduces two Bayesian ensemble models using regression trees to effectively analyze zero-inflated count-compositional data, addressing overdispersion, excess zeros, and complex covariate effects.
Contribution
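The authors develop two novel Bayesian models that place independent nonparametric BART priors on both the compositional and structural-zero probability components, plus an extension with latent effects, to allow flexible inference for zero-inflated count-compositional data.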
Findings
The models successfully capture covariate effects in simulations.
The extension with latent effects accounts for overdispersion and dependencies.
A case study demonstrates practical applicability to palaeoclimate data.
Abstract
Count-compositional data arise in many different fields, including high-throughput sequencing experiments, ecological surveys, and palaeoclimate studies, where a common, important goal is to understand how covariates relate to the observed compositions. Existing methods often fail to simultaneously address key challenges inherent in such data, namely: overdispersion, an excess of zeros, cross-sample heterogeneity, and complex covariate effects. To address these concerns, we propose two novel Bayesian models based on ensembles of regression trees. Specifically, we leverage the recently introduced zero-and-N-inflated multinomial distribution and assign independent nonparametric Bayesian additive regression tree (BART) priors to both the compositional and structural zero probability components of the model, to flexibly capture covariate effects. We further extend this by adding latent…
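To make the data-generating idea concrete, here is a minimal Python sketch of sampling one count vector from a zero-inflated multinomial with fixed probabilities. It is an illustration under assumptions, not the authors' implementation: the function name `simulate_zi_multinomial`, its arguments, and the exact sampling scheme (independent structural-zero indicators per category, followed by a multinomial draw over the remaining categories) are hypothetical. The BART priors that would link both sets of probabilities to covariates are omitted.

```python
import numpy as np


def simulate_zi_multinomial(n_trials, probs, zero_probs, rng):
    """Draw one count vector from a simple zero-inflated multinomial.

    Hypothetical helper for illustration only.
    probs:      compositional probabilities (assumed positive, summing to 1)
    zero_probs: per-category probabilities of being a structural zero
    """
    probs = np.asarray(probs, dtype=float)
    zero_probs = np.asarray(zero_probs, dtype=float)

    # A category is "active" when it is NOT drawn as a structural zero.
    active = rng.random(probs.shape[0]) >= zero_probs

    # If every category is a structural zero, the whole vector is zero.
    if not active.any():
        return np.zeros(probs.shape[0], dtype=int)

    # Renormalize the compositional probabilities over active categories.
    p = np.where(active, probs, 0.0)
    p = p / p.sum()
    return rng.multinomial(n_trials, p)


rng = np.random.default_rng(0)
sample = simulate_zi_multinomial(100, [0.5, 0.3, 0.2], [0.4, 0.1, 0.1], rng)
```

In this sketch, when all categories but one are structural zeros, the remaining category receives all `n_trials` counts; when all are structural zeros, the vector is entirely zero.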
