TL;DR
This paper proposes selecting among imputation models by comparing the densities of imputed and observed values after balancing covariates. The approach is demonstrated on simulated and survey data and implemented in an R package.
Contribution
It introduces an approach for choosing among imputation models based on how closely the density of imputed values matches that of observed values after covariate balancing, together with a practical implementation.
Findings
The proposed method effectively compares imputation models using discrepancy statistics.
Application to survey data shows improved model selection accuracy.
An R package is provided for practical use of the method.
Abstract
Imputing missing values is an important preprocessing step in data analysis, but the literature offers little guidance on how to choose between different imputation models. This letter suggests adopting the imputation model that generates a density of imputed values most similar to that of the observed values for an incomplete variable after balancing all other covariates. We recommend stable balancing weights as a practical approach to balance covariates whose distribution is expected to differ if the values are not missing completely at random. After balancing, discrepancy statistics can be used to compare the density of imputed and observed values. We illustrate the application of the suggested approach using simulated and real-world survey data from the American National Election Study, comparing popular imputation approaches including random forests, hot-deck, predictive mean…
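The two steps described in the abstract (balance covariates, then compare densities) can be illustrated with a minimal Python sketch. It is not the authors' R package or their exact procedure. Several parts are assumptions made for illustration: the balancing step uses a closed-form minimum-variance solution with exact mean balance and no non-negativity constraint, which is a simplified stand-in for stable balancing weights; the discrepancy statistic is a weighted Kolmogorov-Smirnov distance; and the data, the function names `min_variance_balancing_weights` and `weighted_ks`, and the two toy imputation models are hypothetical.

```python
import numpy as np

def min_variance_balancing_weights(X_obs, X_mis):
    """Weights on observed rows that sum to one and make the weighted
    covariate means equal the covariate means of the rows with missing values.
    Simplified stand-in for stable balancing weights (assumption): balance is
    exact and weights may be negative, so a closed-form solution exists."""
    n = X_obs.shape[0]
    C = np.vstack([np.ones(n), X_obs.T])             # constraint matrix, (p+1, n)
    d = np.concatenate([[1.0], X_mis.mean(axis=0)])  # targets: sum to 1, match means
    w0 = np.full(n, 1.0 / n)                         # uniform starting weights
    # Smallest change to uniform weights that satisfies C w = d
    lam = np.linalg.solve(C @ C.T, d - C @ w0)
    return w0 + C.T @ lam

def weighted_ks(y_obs, w_obs, y_imp):
    """Largest gap between the weighted ECDF of observed values and the
    unweighted ECDF of imputed values (one possible discrepancy statistic)."""
    grid = np.sort(np.concatenate([y_obs, y_imp]))
    order = np.argsort(y_obs)
    cum = np.concatenate([[0.0], np.cumsum(w_obs[order])])
    F_obs = cum[np.searchsorted(y_obs[order], grid, side="right")]
    F_imp = np.searchsorted(np.sort(y_imp), grid, side="right") / len(y_imp)
    return float(np.max(np.abs(F_obs - F_imp)))

# Hypothetical simulated data: y depends on x, and missingness depends on x.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
miss = rng.random(n) < 1.0 / (1.0 + np.exp(-x))
x_obs, y_obs, x_mis = x[~miss], y[~miss], x[miss]

# Balance the first two moments of x between observed and missing rows.
X_obs = np.column_stack([x_obs, x_obs**2])
X_mis = np.column_stack([x_mis, x_mis**2])
w = min_variance_balancing_weights(X_obs, X_mis)

# Toy imputation model A: linear regression on x with random residual draws.
slope, intercept = np.polyfit(x_obs, y_obs, 1)
resid_sd = np.std(y_obs - (intercept + slope * x_obs))
y_reg = intercept + slope * x_mis + rng.normal(scale=resid_sd, size=x_mis.size)

# Toy imputation model B: draws from observed y that ignore x entirely.
y_hot = rng.choice(y_obs, size=x_mis.size)

ks_reg = weighted_ks(y_obs, w, y_reg)
ks_hot = weighted_ks(y_obs, w, y_hot)
print(f"KS, regression imputation: {ks_reg:.3f}")
print(f"KS, unconditional draws:   {ks_hot:.3f}")
```

Under this sketch, the model with the smaller statistic would be preferred, since its imputed values lie closer to the reweighted observed values. A faithful implementation would instead solve the constrained optimization behind stable balancing weights, with non-negative weights and approximate balance, and could use other discrepancy statistics.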
