Masking criteria for selecting an imputation model
Yanjiao Yang, Daniel Suen, Yen-Chi Chen

TL;DR
This paper develops new criteria for selecting imputation models that better account for data stochasticity, extending the traditional masking-one-out method with theoretical insights and practical tools.
Contribution
It introduces three modified MOO criteria based on rank, energy distance, and likelihood, along with a comprehensive theoretical framework and a visual comparison diagram.
Findings
New criteria improve imputation model selection accuracy.
Likelihood-based approach enables model learning and consistency proofs.
Prediction-imputation diagram offers a visual utility comparison.
Abstract
The masking-one-out (MOO) procedure, which masks an observed entry and compares it with its imputed values, is a very common procedure for comparing imputation models. We study the optimum of this procedure, generalize it to a missing data assumption, and establish the corresponding semi-parametric efficiency theory. However, MOO is a measure of prediction accuracy, which is not ideal for evaluating an imputation model. To address this issue, we introduce three modified MOO criteria, based on rank transformation, energy distance, and the likelihood principle, that allow us to select an imputation model that properly accounts for the stochastic nature of data. The likelihood approach further enables an elegant framework for learning an imputation model from the data, and we derive its statistical and computational learning theories as well as the consistency of BIC model selection. We also show…
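The point that plain MOO rewards prediction accuracy rather than faithful imputation can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a toy bivariate Gaussian data set, two hypothetical imputers (`impute_mean` and `impute_draw`) that use the true conditional distribution instead of a fitted one, and the standard energy score as a stand-in for a distribution-aware criterion. The paper's exact rank, energy distance, and likelihood criteria are not given in this summary and may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): X2 = X1 + standard normal noise.
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)


def impute_mean(x1_i, m, rng):
    """Hypothetical deterministic imputer: always returns E[X2 | X1 = x1_i]."""
    return np.full(m, x1_i)


def impute_draw(x1_i, m, rng):
    """Hypothetical stochastic imputer: draws from the true conditional N(x1_i, 1)."""
    return x1_i + rng.normal(size=m)


def moo_scores(imputer, m=50):
    """Mask each observed x2[i] in turn, impute it m times, and score the draws.

    Returns the average squared error (plain MOO) and the average energy score.
    Both are "lower is better".
    """
    sq_err = np.empty(n)
    energy = np.empty(n)
    for i in range(n):
        draws = imputer(x1[i], m, rng)  # the imputer never sees x2[i]
        sq_err[i] = np.mean((draws - x2[i]) ** 2)
        # Standard energy score: E|X - y| - 0.5 * E|X - X'|,
        # estimated from the m imputed draws.
        pairwise = np.abs(draws[:, None] - draws[None, :]).mean()
        energy[i] = np.mean(np.abs(draws - x2[i])) - 0.5 * pairwise
    return sq_err.mean(), energy.mean()


mse_mean, es_mean = moo_scores(impute_mean)
mse_draw, es_draw = moo_scores(impute_draw)

print(f"squared-error MOO: mean imputer {mse_mean:.3f}, draw imputer {mse_draw:.3f}")
print(f"energy score:      mean imputer {es_mean:.3f}, draw imputer {es_draw:.3f}")
```

In this toy setting the squared-error criterion should favor the deterministic mean imputer, even though it ignores the variability of the data, while the energy score should favor the imputer that draws from the conditional distribution.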
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Machine Learning and Algorithms · Data Visualization and Analytics
