How to rank imputation methods?

Jeffrey Näf; Krystyna Grzesiak; Erwan Scornet

arXiv:2507.11297·stat.ME·July 16, 2025

TL;DR

This paper introduces a new, reliable score for ranking imputation methods by how well they reproduce the data distribution, addressing limitations of existing approaches, especially when values are missing at random (MAR).

Contribution

We develop a new Imputation Score (I-Score) that ranks imputation methods without access to the complete data, taking into account the data distribution and the choice of masking strategy.

Findings

1. The new score accurately ranks imputation methods in simulations.

2. It outperforms traditional RMSE-based approaches under MAR.

3. The score is effective for various downstream tasks.

Abstract

Imputation is an attractive tool for dealing with the widespread issue of missing values. Consequently, studying and developing imputation methods has been an active field of research over the last decade. Faced with an imputation task and a large number of methods, how does one find the most suitable imputation? Although model selection in different contexts, such as prediction, has been well studied, this question appears not to have received much attention. In this paper, we follow the concept of Imputation Scores (I-Scores) and develop a new, reliable, and easy-to-implement score to rank missing value imputations for a given data set without access to the complete data. In practice, this is usually done by artificially masking observations to compare imputed to observed values using measures such as the Root Mean Squared Error (RMSE). We discuss how this approach of additionally…
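For readers unfamiliar with the conventional procedure the abstract refers to, the following is a minimal sketch of mask-and-compare evaluation with RMSE. It is not the paper's I-Score. The function names (`masked_rmse`, `mean_impute`, `zero_impute`), the masking fraction, and the synthetic data are all illustrative assumptions, and the extra masking here is applied completely at random.

```python
import numpy as np


def masked_rmse(X, impute, mask_frac=0.2, seed=0):
    """Hide a random share of the observed entries, impute, and
    return the RMSE between imputed and true values on those entries.

    Illustrative baseline only: masking is uniform at random, which is
    an assumption of this sketch, not something taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    # Only entries that are actually observed can serve as ground truth.
    hide = observed & (rng.random(X.shape) < mask_frac)
    if not hide.any():
        raise ValueError("no entries were masked; increase mask_frac")
    X_masked = X.copy()
    X_masked[hide] = np.nan
    X_imputed = impute(X_masked)
    err = X_imputed[hide] - X[hide]
    return float(np.sqrt(np.mean(err ** 2)))


def mean_impute(X):
    """Replace each missing entry with its column mean (hypothetical candidate)."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def zero_impute(X):
    """Replace each missing entry with 0 (a deliberately poor candidate)."""
    return np.where(np.isnan(X), 0.0, X)


# Synthetic data with roughly 10% of the values already missing.
rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=1.0, size=(200, 3))
X[rng.random(X.shape) < 0.1] = np.nan

# Lower RMSE ranks higher under this conventional criterion.
scores = {
    "mean": masked_rmse(X, mean_impute),
    "zero": masked_rmse(X, zero_impute),
}
ranking = sorted(scores, key=scores.get)
print(ranking, scores)
```

Because RMSE measures pointwise error, it tends to favor imputations close to a conditional mean rather than ones that reproduce the data distribution, which is the property the summary above says the new score targets.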

Taxonomy

Topics: Advanced Statistical Methods and Models