A Bayesian Nonparametric Model for Zero-Inflated Outcomes: Prediction, Clustering, and Causal Estimation
Arman Oganisian, Nandita Mitra, Jason Roy

TL;DR
This paper introduces a flexible Bayesian nonparametric model for zero-inflated outcomes that improves prediction, clustering, and causal inference in complex, skewed data settings, demonstrated through simulations and real medical cost data.
Contribution
The paper presents a fully nonparametric Bayesian generative model for continuous, zero-inflated, skewed outcomes. A single model supports prediction, clustering, and causal effect estimation, and it captures the joint data distribution better than commonly used zero-inflated methods.
Findings
Predictions capture the joint data distribution better than commonly used zero-inflated methods.
Simulations show low bias and accurate interval coverage.
The model is applied to zero-inflated medical cost data.
Abstract
Researchers are often interested in predicting outcomes, conducting clustering analysis to detect distinct subgroups of their data, or computing causal treatment effects. Pathological data distributions that exhibit skewness and zero-inflation complicate these tasks, requiring highly flexible, data-adaptive modeling. In this paper, we present a fully nonparametric Bayesian generative model for continuous, zero-inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero-inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at…
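The standardization step mentioned in the abstract can be illustrated with a deliberately simple sketch. This is not the authors' nonparametric Bayesian model: it assumes a single binary confounder, uses plain empirical stratum averages in place of posterior predictions, and produces no uncertainty estimates. The function name, variable names, and simulation settings are all hypothetical and chosen only for illustration.

```python
import numpy as np


def standardized_mean(y, a, x, a_level):
    """Standardized mean outcome under treatment a_level (discrete confounder x).

    Within each (a_level, x) stratum the mean is split into two parts,
    P(Y > 0) * E[Y | Y > 0], and then weighted by the marginal P(X = x).
    """
    y, a, x = np.asarray(y, dtype=float), np.asarray(a), np.asarray(x)
    total = 0.0
    for x_level in np.unique(x):
        y_s = y[(a == a_level) & (x == x_level)]
        if y_s.size == 0:
            # Positivity fails: this stratum has no units at a_level.
            raise ValueError(f"no units with a={a_level}, x={x_level}")
        p_nonzero = np.mean(y_s > 0)  # share of non-zero outcomes
        mean_pos = y_s[y_s > 0].mean() if p_nonzero > 0 else 0.0
        total += p_nonzero * mean_pos * np.mean(x == x_level)
    return total


# Simulated zero-inflated, right-skewed outcome (illustrative values only).
rng = np.random.default_rng(0)
n = 20_000
x = rng.binomial(1, 0.5, size=n)                    # binary confounder
a = rng.binomial(1, 0.3 + 0.4 * x)                  # treatment depends on x
nonzero = rng.binomial(1, 0.5 + 0.2 * a - 0.1 * x)  # zero vs. positive outcome
y = nonzero * rng.lognormal(mean=1.0 + 0.5 * a + 0.5 * x, sigma=1.0)

naive = y[a == 1].mean() - y[a == 0].mean()
standardized = standardized_mean(y, a, x, 1) - standardized_mean(y, a, x, 0)
print(f"naive difference:        {naive:.3f}")
print(f"standardized difference: {standardized:.3f}")
```

Because the outcome is nonnegative, the two-part product equals the plain stratum mean; the split only makes the zero component explicit. A model-based version would replace the empirical stratum quantities with predictions from a fitted model of zeros and positive values.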
