MIBoost: A gradient boosting algorithm for variable selection after multiple imputation
Robert Kuchen

TL;DR
MIBoost is a gradient boosting algorithm for variable selection in data with missing values. It works on multiply imputed datasets and applies a single, unified selection mechanism across them, which makes it practical to use.
Contribution
It extends gradient boosting to handle multiply imputed datasets with a unified variable selection mechanism, implemented in the R package booami.
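To make the idea of a unified selection mechanism concrete, here is a minimal Python sketch of component-wise L2 boosting in which each step picks one predictor for all imputed datasets by minimising the loss summed over them, while each dataset keeps its own coefficient. This is an illustration under stated assumptions, not the author's algorithm and not the booami API: the function name `shared_selection_boost`, its parameters, the squared-error loss, and the fixed number of steps are all hypothetical choices made for the example.

```python
import numpy as np


def shared_selection_boost(datasets, n_steps=50, step_size=0.1):
    """Component-wise L2 boosting with one shared predictor pick per step.

    Hypothetical sketch: `datasets` is a list of (X, y) pairs, one per
    imputed dataset, all with the same columns and no constant columns.
    Returns an (M, p) array of coefficients, one row per imputed dataset.
    """
    # Centre predictors and outcome so the sketch needs no intercept.
    Xs = [X - X.mean(axis=0) for X, _ in datasets]
    ys = [y - y.mean() for _, y in datasets]
    m, p = len(Xs), Xs[0].shape[1]
    coefs = np.zeros((m, p))
    residuals = [y.copy() for y in ys]

    for _ in range(n_steps):
        # Least-squares slope of the current residuals on each single predictor,
        # computed separately within every imputed dataset.
        slopes = np.array(
            [(X.T @ r) / (X ** 2).sum(axis=0) for X, r in zip(Xs, residuals)]
        )
        # Residual sum of squares for each candidate, summed over all datasets.
        losses = sum(
            ((r[:, None] - X * b) ** 2).sum(axis=0)
            for X, r, b in zip(Xs, residuals, slopes)
        )
        j = int(np.argmin(losses))  # the same predictor is chosen everywhere
        for k in range(m):
            # Each dataset takes a small step along its own slope for predictor j.
            coefs[k, j] += step_size * slopes[k, j]
            residuals[k] = residuals[k] - step_size * slopes[k, j] * Xs[k][:, j]
    return coefs
```

Because the predictor index is chosen once per step, every imputed dataset ends up with the same set of selected variables, so the per-dataset coefficients can be combined without reconciling different supports. How the number of steps is tuned is left out of this sketch.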
Findings
MIBoost achieves predictive performance comparable to existing methods.
The approach simplifies variable selection across multiply imputed datasets.
The method is implemented in the R package booami for practical use.
Abstract
Statistical learning methods for automated variable selection, such as the Least Absolute Shrinkage and Selection Operator (LASSO), elastic nets, and gradient boosting, have become increasingly popular tools for building powerful prediction models. Yet, in practice, analyses are often complicated by missing data. The most widely used approach to address missingness is multiple imputation, which involves creating several completed datasets. However, there is an ongoing debate about how to perform model selection in the presence of multiple imputed datasets. Simple strategies, such as pooling models across datasets, have been shown to have suboptimal properties. Although more sophisticated methods exist, they are often difficult to implement and therefore not widely applied. In contrast, two recent approaches extend the regularization methods LASSO and elastic nets to multiply imputed…
