Masking criteria for selecting an imputation model

Yanjiao Yang; Daniel Suen; Yen-Chi Chen

arXiv:2511.10048 · stat.ME · November 14, 2025

Open Access

TL;DR

This paper develops new criteria for selecting imputation models that better account for data stochasticity, extending the traditional masking-one-out method with theoretical insights and practical tools.

Contribution

The paper introduces three modified masking-one-out (MOO) criteria, based on rank transformation, energy distance, and the likelihood principle, together with supporting theory and a prediction-imputation diagram for visual comparison.

Findings

01. New criteria improve imputation model selection accuracy.

02. Likelihood-based approach enables model learning and consistency proofs.

03. Prediction-imputation diagram offers a visual utility comparison.

Abstract

The masking-one-out (MOO) procedure, which masks an observed entry and compares it with its imputed values, is a very common procedure for comparing imputation models. We study the optimum of this procedure, generalize it to a missing data assumption, and establish the corresponding semi-parametric efficiency theory. However, MOO is a measure of prediction accuracy, which is not ideal for evaluating an imputation model. To address this issue, we introduce three modified MOO criteria, based on rank transformation, energy distance, and the likelihood principle, that allow us to select an imputation model that properly accounts for the stochastic nature of data. The likelihood approach further enables an elegant framework for learning an imputation model from the data, and we derive its statistical and computational learning theories as well as the consistency of BIC model selection. We also show…
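To make the abstract's point concrete, here is a minimal Python sketch of a masking-one-out loop. It is an illustration under stated assumptions, not the authors' implementation: the names `moo_scores`, `mean_imputer`, and `draw_imputer` are hypothetical, the data are simulated, and the second score is a generic energy-score-style criterion that may differ from the paper's energy-distance criterion. The sketch hides each observed entry in turn, imputes it several times, and reports both a squared-error score and the energy-style score.

```python
import numpy as np

def moo_scores(X, imputer, n_draws=20, rng=None):
    """Mask each observed entry in turn, impute it n_draws times, and score.

    Returns (mean squared error, mean energy-style score); lower is better.
    """
    rng = np.random.default_rng(rng)
    sq_err, energy = [], []
    rows, cols = np.nonzero(~np.isnan(X))
    for i, j in zip(rows, cols):
        truth = X[i, j]
        X_masked = X.copy()
        X_masked[i, j] = np.nan  # hide one observed entry
        draws = np.array([imputer(X_masked, i, j, rng) for _ in range(n_draws)])
        # Prediction-style score: average squared error of the imputed draws.
        sq_err.append(np.mean((draws - truth) ** 2))
        # Energy-style score: E|Y - x| - 0.5 * E|Y - Y'| over imputed draws Y, Y'.
        pair_sum = np.abs(draws[:, None] - draws[None, :]).sum()
        spread = pair_sum / (n_draws * (n_draws - 1))
        energy.append(np.mean(np.abs(draws - truth)) - 0.5 * spread)
    return float(np.mean(sq_err)), float(np.mean(energy))

def mean_imputer(X_masked, i, j, rng):
    # Deterministic imputer (hypothetical): always returns the column mean.
    return np.nanmean(X_masked[:, j])

def draw_imputer(X_masked, i, j, rng):
    # Stochastic imputer (hypothetical): draws one observed value from the column.
    col = X_masked[:, j]
    return rng.choice(col[~np.isnan(col)])

# Simulated data (an assumption for illustration): two independent normal
# columns with about 20% of entries missing completely at random.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(300, 2))
X[data_rng.random(X.shape) < 0.2] = np.nan

mse_mean, en_mean = moo_scores(X, mean_imputer, rng=1)
mse_draw, en_draw = moo_scores(X, draw_imputer, rng=1)
print("mean imputer: squared error", mse_mean, "energy-style score", en_mean)
print("draw imputer: squared error", mse_draw, "energy-style score", en_draw)
```

In this toy setup the two columns are independent, so drawing from the observed values matches the conditional distribution of the masked entry. The squared-error score nevertheless favors the mean imputer, while the energy-style score favors the stochastic one, which is the kind of gap between prediction accuracy and imputation quality that the abstract describes.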

Taxonomy

Topics: Statistical Methods and Bayesian Inference · Machine Learning and Algorithms · Data Visualization and Analytics