How to rank imputation methods?
Jeffrey Näf, Krystyna Grzesiak, Erwan Scornet

TL;DR
This paper introduces a new, reliable score for ranking imputation methods by how well they replicate the data distribution, addressing limitations of existing approaches, especially when data are missing at random (MAR).
Contribution
We develop a novel Imputation Score (I-Score) that ranks imputation methods without access to the complete data, taking into account both the data distribution and the way observations are masked.
Findings
The new score accurately ranks imputation methods in simulations.
It outperforms traditional RMSE-based approaches under MAR.
The score is effective for various downstream tasks.
Abstract
Imputation is an attractive tool for dealing with the widespread issue of missing values. Consequently, studying and developing imputation methods has been an active field of research over the last decade. Faced with an imputation task and a large number of methods, how does one find the most suitable imputation? Although model selection in different contexts, such as prediction, has been well studied, this question appears not to have received much attention. In this paper, we follow the concept of Imputation Scores (I-Scores) and develop a new, reliable, and easy-to-implement score to rank missing value imputations for a given data set without access to the complete data. In practice, this is usually done by artificially masking observations to compare imputed to observed values using measures such as the Root Mean Squared Error (RMSE). We discuss how this approach of additionally…
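The masking-plus-RMSE procedure that the abstract describes as common practice can be sketched as follows. This is a minimal illustration, not the paper's I-Score: the helper names (`masked_rmse`, `mean_impute`, `draw_impute`), the toy Gaussian data, and the choice to mask cells completely at random are all assumptions made for the example. It hides some observed cells, imputes them, and compares imputed to held-out values.

```python
import numpy as np

def masked_rmse(X_obs, impute, mask_frac=0.2, seed=0):
    """Hide a random fraction of observed cells, impute, and return RMSE on them."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X_obs)
    # Assumption: cells are masked completely at random among observed ones.
    extra = observed & (rng.random(X_obs.shape) < mask_frac)
    X_masked = X_obs.copy()
    X_masked[extra] = np.nan
    X_imp = impute(X_masked, rng)
    return float(np.sqrt(np.mean((X_imp[extra] - X_obs[extra]) ** 2)))

def mean_impute(X, rng):
    """Fill each missing cell with its column mean (rng unused)."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def draw_impute(X, rng):
    """Fill each missing cell with a random observed value from the same column."""
    X_out = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        X_out[miss, j] = rng.choice(X[~miss, j], size=miss.sum())
    return X_out

# Toy data: independent standard normal columns with about 10% missing cells.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
X[rng.random(X.shape) < 0.1] = np.nan

rmse_mean = masked_rmse(X, mean_impute)
rmse_draw = masked_rmse(X, draw_impute)
print(f"mean imputation RMSE: {rmse_mean:.3f}")
print(f"random-draw imputation RMSE: {rmse_draw:.3f}")
```

In this toy setting the mean imputation should obtain the lower RMSE, even though filling every gap with a constant shrinks the spread of the imputed columns, while the random draws keep each column's marginal distribution. That is one way to see why a ranking based on RMSE alone can disagree with a ranking based on how well the data distribution is replicated.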
Taxonomy
Topics: Advanced Statistical Methods and Models
