A comparison of strategies for selecting auxiliary variables for multiple imputation
Rheanna M. Mainzer, Cattram D. Nguyen, John B. Carlin, Margarita Moreno-Betancur, Ian R. White, Katherine J. Lee

TL;DR
This paper compares eight strategies for selecting auxiliary variables in multiple imputation, finding that including all variables generally performs best, with LASSO being a promising alternative when full inclusion is infeasible.
Contribution
It provides a comprehensive simulation and case study comparison of auxiliary variable selection strategies for multiple imputation, offering practical guidance.
Findings
The full model (all auxiliary variables included) outperformed every selection strategy in the simulations.
LASSO was the best-performing selection strategy overall.
All strategies yielded similar estimates in the case study.
Abstract
Multiple imputation (MI) is a popular method for handling missing data. Auxiliary variables can be added to the imputation model(s) to improve MI estimates. However, the choice of which auxiliary variables to include in the imputation model is not always straightforward. Including too few may lead to important information being discarded, but including too many can cause problems with convergence of the estimation procedures for imputation models. Several data-driven auxiliary variable selection strategies have been proposed. This paper uses a simulation study and a case study to provide a comprehensive comparison of the performance of eight auxiliary variable selection strategies, with the aim of providing practical advice to users of MI. A complete case analysis and an MI analysis with all auxiliary variables included in the imputation model (the full model) were also performed for…
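As a rough illustration of what a data-driven selection step can look like, the sketch below fits a LASSO regression of an incomplete variable on candidate auxiliary variables, using complete cases only, and keeps the candidates with non-zero coefficients. This is an assumption-laden toy, not the procedure used in the paper: the function names (`soft_threshold`, `lasso_select`), the variable names (`y`, `a1`, `a2`, `a3`), the penalty value `lam=0.5`, and the data are all hypothetical, and the candidates are assumed to be fully observed and on comparable scales.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_select(rows, target, candidates, lam, sweeps=100):
    """Fit a LASSO of `target` on `candidates` by coordinate descent.

    Only rows where `target` is observed (not None) are used.
    Returns (coefficients by name, list of names with non-zero coefficients).
    Objective: (1 / 2n) * ||y - X b||^2 + lam * ||b||_1, on centered data.
    """
    complete = [r for r in rows if r[target] is not None]
    n = len(complete)

    # Center the target and each candidate on the complete cases.
    y_mean = sum(r[target] for r in complete) / n
    y = [r[target] - y_mean for r in complete]
    X = {}
    for name in candidates:
        col = [r[name] for r in complete]
        col_mean = sum(col) / n
        X[name] = [v - col_mean for v in col]

    beta = {name: 0.0 for name in candidates}
    resid = y[:]  # residual y - X b, with b = 0 initially
    for _ in range(sweeps):
        for name in candidates:
            x = X[name]
            z = sum(v * v for v in x) / n
            if z == 0.0:
                continue  # constant column carries no information
            # Correlation of x with the partial residual (x's own effect added back).
            rho = sum(xi * (ri + xi * beta[name]) for xi, ri in zip(x, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[name]
            resid = [ri - xi * delta for xi, ri in zip(x, resid)]
            beta[name] = new

    selected = [name for name in candidates if beta[name] != 0.0]
    return beta, selected


# Hypothetical toy data: y is missing in the last two rows.
rows = [
    {"y": 2.1, "a1": 1, "a2": 1, "a3": 1},
    {"y": 1.9, "a1": 1, "a2": -1, "a3": -1},
    {"y": -1.9, "a1": -1, "a2": 1, "a3": -1},
    {"y": -2.1, "a1": -1, "a2": -1, "a3": 1},
    {"y": None, "a1": 1, "a2": 1, "a3": -1},
    {"y": None, "a1": -1, "a2": -1, "a3": -1},
]

beta, selected = lasso_select(rows, "y", ["a1", "a2", "a3"], lam=0.5)
print(selected)
```

In this toy, only `a1` is strongly related to `y`, so it is the only candidate that survives the penalty. In practice the penalty would usually be chosen by cross-validation, and the selected auxiliaries would then be passed to the imputation model.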
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Bayesian Methods and Mixture Models
