Evaluating tree-based imputation methods as an alternative to MICE PMM for drawing inference in empirical studies
Jakob Schwerter, Ketevan Gurtskaia, Andrés Romero, Birgit Zeyer-Gliozzo, Markus Pauly

TL;DR
Using a large educational dataset, this study compares tree-based imputation methods with the standard MICE PMM approach to missing data, focusing on how each affects statistical inference in empirical research.
Contribution
It provides a comprehensive evaluation of tree-based imputation methods versus MICE PMM, comparing their performance in coefficient estimation, Type I error rates, and power in linear models.
Findings
Random Forest-based imputations outperform MICE PMM in most scenarios.
MICE PMM shows increased bias and conservative test decisions.
All methods perform worse with higher missingness, especially missRanger.
Abstract
Dealing with missing data is an important problem in statistical analysis that is often addressed with imputation procedures. The performance and validity of such methods are of great importance for their application in empirical studies. While the prevailing method of Multiple Imputation by Chained Equations (MICE) with Predictive Mean Matching (PMM) is considered standard in the social science literature, the increase in complex datasets may require more advanced approaches based on machine learning. In particular, tree-based imputation methods have emerged as very competitive approaches. However, the performance and validity are not completely understood, particularly compared to the standard MICE PMM. This is especially true for inference in linear models. In this study, we investigate the impact of various imputation methods on coefficient estimation, Type I error, and power, to…
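To make the PMM step of MICE PMM concrete, here is a minimal sketch of predictive mean matching for a single incomplete variable. It is loosely modeled on the default behaviour of the R `mice` package (a Bayesian regression draw, then random selection among 5 nearest donors) but is a simplified illustration, not the implementation used in the study. The function name `pmm_impute` and its parameters are hypothetical, and the chained-equations loop that cycles this step over every incomplete variable is omitted.

```python
import numpy as np


def pmm_impute(y, X, donors=5, rng=None):
    """Impute NaN entries of y by predictive mean matching (simplified sketch).

    Fits a linear regression of y on X using the observed cases, draws the
    regression parameters to reflect their uncertainty, and fills each missing
    value with the observed y of a randomly chosen donor among the `donors`
    observed cases whose predicted means are closest.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float)
    # Design matrix with an intercept column; X may be 1-D or 2-D
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    obs = ~np.isnan(y)
    Xo, yo = X[obs], y[obs]

    # Least-squares fit on the observed cases
    beta_hat, *_ = np.linalg.lstsq(Xo, yo, rcond=None)
    resid = yo - Xo @ beta_hat
    df = max(len(yo) - X.shape[1], 1)

    # Draw sigma^2 (scaled inverse chi-square) and then beta given sigma^2
    sigma2 = resid @ resid / rng.chisquare(df)
    cov = sigma2 * np.linalg.inv(Xo.T @ Xo)
    beta_star = rng.multivariate_normal(beta_hat, cov)

    yhat_obs = Xo @ beta_hat          # predicted means of potential donors
    yhat_mis = X[~obs] @ beta_star    # predicted means of cases to impute

    y_imp = y.copy()
    k = min(donors, len(yo))
    for i, target in zip(np.flatnonzero(~obs), yhat_mis):
        # k observed cases whose predicted means are closest to the target
        nearest = np.argsort(np.abs(yhat_obs - target))[:k]
        # Impute an actually observed value from one randomly chosen donor
        y_imp[i] = yo[rng.choice(nearest)]
    return y_imp
```

A possible usage on simulated data:

```python
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(size=200)
y_missing = y.copy()
y_missing[rng.random(200) < 0.3] = np.nan
y_completed = pmm_impute(y_missing, x, rng=1)
```

Because every imputed value is copied from an observed case, PMM never produces values outside the observed range. Running the function several times with different seeds yields the multiple completed datasets used for inference.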
