MIBoost: A gradient boosting algorithm for variable selection after multiple imputation

Robert Kuchen

arXiv:2507.21807·stat.ML·April 13, 2026

TL;DR

MIBoost is a gradient boosting algorithm for variable selection when missing data have been handled by multiple imputation. It applies a single, unified selection mechanism across all imputed datasets.

Contribution

MIBoost extends gradient boosting to multiply imputed datasets through a unified variable selection mechanism, and it is implemented in the R package booami.
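
To make the idea of a unified selection mechanism concrete, here is a minimal sketch in Python of one plausible reading: component-wise least-squares boosting in which every imputed dataset must update the same covariate at each step, chosen by the loss summed over datasets. This is an illustration under stated assumptions, not the paper's algorithm and not the booami API. The function name `boost_with_shared_selection`, its parameters, the squared-error loss, and the final averaging of coefficients are all hypothetical choices made for the example.

```python
def boost_with_shared_selection(datasets, n_steps=100, step_size=0.1):
    """Illustrative sketch only (hypothetical names, not the booami API).

    datasets: list of (X, y) pairs, one per imputed dataset. X is a list of
    rows (lists of floats), y a list of floats. All datasets share the same
    columns, and covariates are assumed to be roughly centered.
    """
    n_features = len(datasets[0][0][0])
    # Each imputed dataset keeps its own intercept, coefficients, residuals.
    intercepts = [sum(y) / len(y) for _, y in datasets]
    coefs = [[0.0] * n_features for _ in datasets]
    residuals = [[yi - b0 for yi in y]
                 for (_, y), b0 in zip(datasets, intercepts)]
    selected = set()

    for _ in range(n_steps):
        best = None  # (aggregated loss, covariate index, per-dataset slopes)
        for j in range(n_features):
            total_loss, slopes = 0.0, []
            for (X, _), r in zip(datasets, residuals):
                col = [row[j] for row in X]
                sxx = sum(x * x for x in col)
                # Univariate least-squares fit of the residuals on covariate j.
                slope = sum(x * ri for x, ri in zip(col, r)) / sxx if sxx > 0 else 0.0
                slopes.append(slope)
                total_loss += sum((ri - slope * x) ** 2 for x, ri in zip(col, r))
            # Shared selection: compare the loss summed over all datasets.
            if best is None or total_loss < best[0]:
                best = (total_loss, j, slopes)

        _, j_star, slopes = best
        selected.add(j_star)
        # Every dataset updates the same covariate, with its own slope.
        for d, (X, _) in enumerate(datasets):
            coefs[d][j_star] += step_size * slopes[d]
            residuals[d] = [ri - step_size * slopes[d] * row[j_star]
                            for row, ri in zip(X, residuals[d])]

    # Assumption for this sketch: pool by averaging across imputed datasets.
    m = len(datasets)
    pooled_intercept = sum(intercepts) / m
    pooled_coefs = [sum(c[j] for c in coefs) / m for j in range(n_features)]
    return pooled_intercept, pooled_coefs, sorted(selected)
```

Because the covariate is chosen once per step for all datasets, a covariate that is never chosen has a coefficient of exactly zero in every imputed dataset, so the pooled model selects the same variables everywhere. In practice the number of steps would need to be tuned, for example by cross-validation, which this sketch leaves out.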

Findings

1. MIBoost achieves predictive performance comparable to existing methods.

2. The approach simplifies variable selection across multiply imputed datasets.

3. MIBoost is implemented in an accessible R package, booami, for practical use.

Abstract

Statistical learning methods for automated variable selection, such as the Least Absolute Shrinkage and Selection Operator (LASSO), elastic nets, and gradient boosting, have become increasingly popular tools for building powerful prediction models. Yet, in practice, analyses are often complicated by missing data. The most widely used approach to address missingness is multiple imputation, which involves creating several completed datasets. However, there is an ongoing debate about how to perform model selection in the presence of multiple imputed datasets. Simple strategies, such as pooling models across datasets, have been shown to have suboptimal properties. Although more sophisticated methods exist, they are often difficult to implement and therefore not widely applied. In contrast, two recent approaches extend the regularization methods LASSO and elastic nets to multiply imputed…
