Prediction with Missing Data via Bayesian Additive Regression Trees
Adam Kapelner, Justin Bleich

TL;DR
This paper introduces a Bayesian tree-based method that effectively handles missing data without imputation, improving prediction accuracy and stability in complex scenarios involving missingness.
Contribution
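It extends Bayesian Additive Regression Trees (BART) with "Missingness Incorporated in Attributes" (Twala, 2008), an approach that builds missingness directly into the trees' partitioning, so the model can predict without imputation and can reflect missingness in its uncertainty intervals.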
Findings
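Achieves higher predictive performance and greater stability than competitors in many cases, in simulations on generated models and on real data.
Forecasts well on complicated missing-at-random and not-missing-at-random models, and on models where missingness itself influences the response.
Incorporates missingness into uncertainty intervals and can detect the influence of missingness.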
Abstract
We present a method for incorporating missing data in non-parametric statistical learning without the need for imputation. We focus on a tree-based method, Bayesian Additive Regression Trees (BART), enhanced with "Missingness Incorporated in Attributes," an approach recently proposed for incorporating missingness into decision trees (Twala, 2008). This procedure takes advantage of the partitioning mechanisms found in tree-based models. Simulations on generated models and real data indicate that our proposed method can forecast well on complicated missing-at-random and not-missing-at-random models as well as models where missingness itself influences the response. Our procedure has higher predictive performance and is more stable than competitors in many cases. We also illustrate BART's abilities to incorporate missingness into uncertainty intervals and to detect the influence of missingness…
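To make the idea of using a tree's partitioning to absorb missingness concrete, here is a minimal sketch of a split rule in the style of "Missingness Incorporated in Attributes." It is an illustration only, not the authors' implementation: the function names (`is_missing`, `goes_left`, `partition`) and the `missing_left` flag are hypothetical, and the sketch covers a single split on a single feature rather than a full BART sampler. The assumption is that each split either sends missing values to a chosen side alongside a threshold rule, or splits on missingness itself.

```python
import math


def is_missing(x):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def goes_left(x, threshold, missing_left):
    """Threshold split where missing values are routed to a chosen side.

    Observed values go left when x <= threshold; missing values go left
    only when missing_left is True.
    """
    if is_missing(x):
        return missing_left
    return x <= threshold


def partition(values, threshold=None, missing_left=True):
    """Return (left, right) index lists for one split.

    With threshold=None the split is on missingness itself:
    missing values go left, observed values go right.
    """
    left, right = [], []
    for i, x in enumerate(values):
        if threshold is None:
            to_left = is_missing(x)
        else:
            to_left = goes_left(x, threshold, missing_left)
        (left if to_left else right).append(i)
    return left, right


values = [1.0, float("nan"), 3.0, None, 2.0]
split_missing_left = partition(values, threshold=2.0, missing_left=True)
split_missing_right = partition(values, threshold=2.0, missing_left=False)
split_on_missingness = partition(values)
```

In a tree learner built this way, the direction for missing values would be chosen per split along with the feature and threshold, which is what lets the fitted trees use missingness as a signal instead of requiring imputed values.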
