Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods
Jiacong Du, Jonathan Boss, Peisong Han, Lauren J Beesley, Stephen A Goutman, Stuart Batterman, Eva L Feldman, Bhramar Mukherjee

TL;DR
This paper compares stacked and grouped penalized regression methods for variable selection in multiply-imputed datasets. It proposes algorithms for both, provides an R package, and demonstrates the methods through simulations and a real-data application.
Contribution
It introduces efficient algorithms for both methods, incorporates adaptive penalties, and provides a comprehensive comparison and an R package for practical use.
Findings
Stacked methods are more computationally efficient than grouped methods.
Stacked methods also have better estimation accuracy.
Both methods were successfully applied to ALS risk data.
Abstract
Penalized regression methods, such as lasso and elastic net, are used in many biomedical applications when simultaneous regression coefficient estimation and variable selection is desired. However, missing data complicates the implementation of these methods, particularly when missingness is handled using multiple imputation. Applying a variable selection algorithm on each imputed dataset will likely lead to different sets of selected predictors, making it difficult to ascertain a final active set without resorting to ad hoc combination rules. In this paper we consider a general class of penalized objective functions which, by construction, force selection of the same variables across multiply-imputed datasets. By pooling objective functions across imputations, optimization is then performed jointly over all imputed datasets rather than separately for each dataset. We consider two…
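To make the idea of pooling objective functions across imputations concrete, here is a minimal sketch of a stacked lasso in plain Python. It is an illustration under stated assumptions, not the paper's algorithm or the API of its R package: the function names (`soft_threshold`, `stacked_lasso`), the squared-error loss, the equal weight given to every stacked row, the absence of an intercept, and the toy data are all hypothetical choices made here. The point it shows is that a single coefficient vector is fitted to all D imputed datasets at once, so a variable is either selected in every dataset or in none.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def stacked_lasso(imputed, lam, n_sweeps=200):
    """Fit one lasso coefficient vector to D stacked imputed datasets.

    imputed: list of (X, y) pairs, X a list of rows, y a list of outcomes.
    Minimizes (1 / (2 * N)) * sum over all stacked rows of (y - x.beta)^2
    + lam * ||beta||_1, where N is the total number of stacked rows.
    Assumption: centered data (no intercept) and equal weight per row.
    """
    rows = [x for X, _ in imputed for x in X]
    ys = [v for _, y in imputed for v in y]
    n, p = len(rows), len(rows[0])
    beta = [0.0] * p
    resid = list(ys)  # residuals y - X beta; beta starts at zero
    col_ss = [sum(r[j] ** 2 for r in rows) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (beta_j removed).
            rho = sum(r[j] * (e + r[j] * beta[j]) for r, e in zip(rows, resid)) / n
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i, r in enumerate(rows):
                    resid[i] -= r[j] * delta
                beta[j] = new
    return beta

# Toy data (hypothetical): only column 0 truly affects the outcome.
random.seed(0)
n, D = 100, 5
x_true = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [2.0 * x[0] + random.gauss(0, 0.5) for x in x_true]

# Mimic multiple imputation: the first 20 values of column 0 are treated as
# missing and receive a different random draw in each completed dataset.
imputed = []
for d in range(D):
    X_d = [list(x) for x in x_true]
    for i in range(20):
        X_d[i][0] = random.gauss(0, 1)
    imputed.append((X_d, y))

beta = stacked_lasso(imputed, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", beta)
print("selected columns:", selected)
```

Because the penalty acts on one shared coefficient vector, no rule for combining per-dataset selections is needed. A grouped variant would instead keep a separate coefficient vector for each imputed dataset and penalize each variable's coefficients jointly across datasets; that is not sketched here.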
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Bayesian Methods and Mixture Models
