Adapting tree-based multiple imputation methods for multi-level data? A simulation study
Nico Föge, Jakob Schwerter, Ketevan Gurtskaia, Markus Pauly, and Philipp Doebler

TL;DR
This study evaluates novel tree-based imputation methods adapted for hierarchical data, demonstrating their advantages over traditional MICE, especially at higher missingness rates and for level-1 variables.
Contribution
It introduces and assesses adapted tree-based imputation methods, specifically Chained Random Forests and Extreme Gradient Boosting with cluster dummies, for multilevel data.
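As a rough illustration of the cluster-dummy idea (a minimal sketch, not the authors' implementation), the snippet below appends one indicator column per cluster to each level-1 row, so that a tree-based imputer could split on cluster membership. The helper name `add_cluster_dummies` and the toy classroom data are hypothetical; the imputation step itself is not shown.

```python
# Hypothetical helper: append one-hot cluster indicators to each data row.
# This only prepares the augmented predictors; it does not impute anything.
def add_cluster_dummies(rows, cluster_ids):
    """rows: list of feature lists (None marks a missing value).
    cluster_ids: one cluster label per row (e.g. a classroom ID)."""
    if len(rows) != len(cluster_ids):
        raise ValueError("rows and cluster_ids must have the same length")
    levels = sorted(set(cluster_ids))            # fixed column order for the dummies
    index = {c: j for j, c in enumerate(levels)}
    augmented = []
    for row, c in zip(rows, cluster_ids):
        dummies = [0] * len(levels)
        dummies[index[c]] = 1                    # mark the row's own cluster
        augmented.append(list(row) + dummies)
    return augmented, levels


# Toy example (assumed data): four students in three classrooms.
rows = [[1.2, None], [0.7, 3.0], [None, 2.5], [1.9, 2.2]]
classroom = ["A", "A", "B", "C"]
augmented, levels = add_cluster_dummies(rows, classroom)
```

Missing values are left untouched, since the dummies are extra predictors for whichever imputation model is applied afterwards.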
Findings
Adapted boosting outperforms MICE at high missingness for level-1 variables.
MICE remains robust for level-2 variables at low missingness.
Tree-based methods show promise as alternatives to MICE in multilevel data.
Abstract
When data have a hierarchical structure, such as students nested within classrooms, ignoring dependencies between observations can compromise the validity of imputation procedures. Standard tree-based imputation methods implicitly assume independence between observations, limiting their applicability in multilevel data settings. Although Multivariate Imputation by Chained Equations (MICE) is widely used for hierarchical data, it has limitations, including sensitivity to model specification and computational complexity. Alternative tree-based approaches have shown promise for individual-level data, but remain largely unexplored for hierarchical contexts. In this simulation study, we systematically evaluate the performance of novel tree-based methods--Chained Random Forests and Extreme Gradient Boosting (mixgb)--explicitly adapted for multi-level data by incorporating dummy variables…
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Bayesian Methods and Mixture Models
