Multiple Imputation Through XGBoost

Yongshi Deng; Thomas Lumley

arXiv:2106.01574 · stat.ME · July 31, 2023 · J. Comput. Graph. Stat. · 6 cites

TL;DR

This paper introduces mixgb, a scalable multiple imputation framework using XGBoost that efficiently handles large, complex datasets by capturing non-linear relations and interactions, improving imputation accuracy and computational speed.

Contribution

The paper presents a novel MI framework based on XGBoost, combining subsampling and predictive mean matching for improved scalability and performance on large datasets.

Findings

01. Effective handling of large datasets with complex structures
02. High computational efficiency achieved by XGBoost-based imputation
03. Reduced bias through subsampling and predictive mean matching

Abstract

The use of multiple imputation (MI) is becoming increasingly popular for addressing missing data. Although some conventional MI approaches have been well studied and have shown empirical validity, they have limitations when processing large datasets with complex data structures. Their imputation performances usually rely on the proper specification of imputation models, which requires expert knowledge of the inherent relations among variables. Moreover, these standard approaches tend to be computationally inefficient for medium and large datasets. In this paper, we propose a scalable MI framework mixgb, which is based on XGBoost, subsampling, and predictive mean matching. Our approach leverages the power of XGBoost, a fast implementation of gradient boosted trees, to automatically capture interactions and non-linear relations while achieving high computational efficiency. In addition,…
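Predictive mean matching, one of the three ingredients named in the abstract, can be illustrated with a minimal sketch. This is not the mixgb implementation: the function name `pmm_impute`, the parameter `k`, and the toy numbers are all hypothetical, and the predictions are hard-coded where a fitted model such as XGBoost would normally supply them. The sketch only shows the general idea: each missing entry receives an observed value borrowed from a donor whose predicted value is close to its own, and repeating the random draw yields several imputed datasets.

```python
import random


def pmm_impute(donor_preds, donor_values, target_preds, k=5, rng=None):
    """Predictive mean matching (simplified illustration).

    donor_preds  : model predictions for rows where the variable is observed
    donor_values : the observed values for those same rows
    target_preds : model predictions for rows where the variable is missing
    k            : number of nearest donors to sample from (hypothetical default)
    """
    if len(donor_preds) != len(donor_values):
        raise ValueError("donor_preds and donor_values must have equal length")
    if not donor_preds:
        raise ValueError("at least one donor is required")
    rng = rng or random.Random()
    k = min(k, len(donor_preds))

    imputed = []
    for t in target_preds:
        # Rank donors by how close their prediction is to the target's prediction.
        nearest = sorted(
            range(len(donor_preds)), key=lambda i: abs(donor_preds[i] - t)
        )[:k]
        # Borrow the observed value of one randomly chosen nearby donor.
        imputed.append(donor_values[rng.choice(nearest)])
    return imputed


# Toy inputs (made up for illustration; in practice the predictions would
# come from a model fitted on the observed rows).
donor_preds = [1.0, 2.1, 2.9, 4.2, 5.0]
donor_values = [1.2, 1.9, 3.1, 4.0, 5.3]
target_preds = [2.0, 4.8]

# Three imputed versions of the missing entries, one per random seed.
imputations = [
    pmm_impute(donor_preds, donor_values, target_preds, k=2, rng=random.Random(seed))
    for seed in range(3)
]
```

Because imputed values are always drawn from observed values, they stay within the range of the data. The random draw among the `k` nearest donors is one source of variation between imputations in this sketch; how mixgb itself combines this step with subsampling is described in the paper, not here.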

Code & Models

Repositories

agnesdeng/mixgb-supplement
none · Official

Taxonomy

Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Bayesian Methods and Mixture Models