High-dimensional Imputation for the Social Sciences: a Comparison of State-of-the-art Methods
Edoardo Costantini, Kyle M. Lang, Tim Reeskens, Klaas Sijtsma

TL;DR
This paper compares seven high-dimensional multiple imputation (MI) methods for handling large predictor sets in social science data, using a Monte Carlo simulation and a resampling study based on real survey data. It highlights lasso, forward selection, and PCA as effective techniques.
Contribution
It provides a comprehensive evaluation of seven high-dimensional MI methods, offering guidance on their relative performance in social science research.
Findings
Lasso penalty and forward selection improve imputation accuracy.
Principal component analysis effectively reduces dimensionality.
Selected methods yield unbiased and valid parameter estimates.
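To make the first finding concrete, here is a minimal sketch of lasso-based predictor selection for imputing one incomplete variable. It is an illustration under stated assumptions, not the authors' procedure: the data are simulated, the penalty value `lam=0.2` is arbitrary, the helper names (`lasso_cd`, `predict`) are hypothetical, and the imputations add residual noise only, whereas a proper MI routine would also propagate uncertainty about the model parameters.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals when all coefficients are zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (column j excluded)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

# Simulated data (assumption): 30 candidate predictors, only the first two matter.
rng = random.Random(0)
n, p, m = 200, 30, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]
# Delete roughly 30% of y completely at random.
y_obs = [None if rng.random() < 0.3 else v for v in y]

obs = [i for i in range(n) if y_obs[i] is not None]
mis = [i for i in range(n) if y_obs[i] is None]

# Center using the observed rows so the lasso needs no intercept.
x_mean = [sum(X[i][j] for i in obs) / len(obs) for j in range(p)]
y_mean = sum(y_obs[i] for i in obs) / len(obs)
Xc = [[X[i][j] - x_mean[j] for j in range(p)] for i in range(n)]

beta = lasso_cd([Xc[i] for i in obs], [y_obs[i] - y_mean for i in obs], lam=0.2)
selected = [j for j in range(p) if beta[j] != 0.0]  # predictors kept by the lasso

def predict(i):
    """Predicted y for row i using only the selected predictors."""
    return y_mean + sum(beta[j] * Xc[i][j] for j in selected)

# Residual standard deviation on the observed rows.
sigma = (sum((y_obs[i] - predict(i)) ** 2 for i in obs) / len(obs)) ** 0.5

# m completed copies of y: prediction plus a random residual draw per missing cell.
imputed_sets = []
for _ in range(m):
    completed = list(y_obs)
    for i in mis:
        completed[i] = predict(i) + rng.gauss(0, sigma)
    imputed_sets.append(completed)

print("selected predictors:", selected)
```

The analysis model would then be fitted to each of the `m` completed datasets and the estimates pooled. The sketch reuses the shrunken lasso coefficients for prediction; refitting an unpenalized model on the selected predictors is a common alternative.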
Abstract
Including a large number of predictors in the imputation model underlying a multiple imputation (MI) procedure is one of the most challenging tasks imputers face. A variety of high-dimensional MI techniques can help, but there has been limited research on their relative performance. In this study, we investigated a wide range of extant high-dimensional MI techniques that can handle a large number of predictors in the imputation models and general missing data patterns. We assessed the relative performance of seven high-dimensional MI methods with a Monte Carlo simulation study and a resampling study based on real survey data. The performance of the methods was defined by the degree to which they facilitate unbiased and confidence-valid estimates of the parameters of complete data analysis models. We found that using lasso penalty or forward selection to select the predictors used in the…
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Survey Methodology and Nonresponse · Spatial and Panel Data Analysis
