Classification of datasets with imputed missing values: does imputation quality matter?
Tolou Shadbahr, Michael Roberts, Jan Stanczuk, Julian Gilbey, Philip Teare, Sören Dittmer, Matthew Thorpe, Ramon Vinas Torne, Evis Sala, Pietro Lio, Mishal Patel, AIX-COVNET Collaboration, James H.F. Rudd, Tuomas Mirtti, Antti Rannikko, John A.D. Aston, Jing Tang

TL;DR
This paper investigates the impact of imputation quality on classification performance in incomplete datasets, revealing flaws in current assessment methods and proposing new discrepancy scores to better evaluate imputation methods.
Contribution
It introduces a novel class of discrepancy scores that assess imputation quality by how well the imputed data recreate the overall distribution of the data, and it emphasizes that imputation quality matters for classifier interpretability.
Findings
Current quality measures are flawed
Proposed discrepancy scores better evaluate imputation
Poor imputation impairs classifier interpretability
Abstract
Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods, followed by classification of the now complete, imputed, samples. The focus of the machine learning researcher is then to optimise the downstream classification performance. In this study, we highlight that it is imperative to consider the quality of the imputation. We demonstrate how the commonly used measures for assessing quality are flawed and propose a new class of discrepancy scores which focus on how well the method recreates the overall distribution of the data. To conclude, we highlight the compromised interpretability of classifier models trained using poorly imputed data.
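To make the contrast between a pointwise error measure and a distribution-focused score concrete, here is a minimal sketch on synthetic data. It is an illustration under stated assumptions, not the authors' method: the 1D Wasserstein distance stands in for a generic distributional discrepancy, and the function names (`rmse`, `wasserstein_1d`, `compare_imputers`), the bimodal toy feature, and the two simple imputers are all hypothetical choices made for this example.

```python
import math
import random
import statistics


def rmse(truth, imputed):
    """Root mean squared error between true and imputed values (pointwise)."""
    return math.sqrt(sum((t - i) ** 2 for t, i in zip(truth, imputed)) / len(truth))


def wasserstein_1d(a, b):
    """1D Wasserstein-1 distance between two equal-size empirical samples.

    For equal-size samples this is the mean absolute difference of the
    sorted values. Used here only as a stand-in distributional score.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def compare_imputers(n=4000, missing_frac=0.3, seed=0):
    """Score mean imputation and random-draw imputation on a toy feature."""
    rng = random.Random(seed)

    # Hypothetical bimodal feature: two well-separated Gaussian modes.
    values = [rng.gauss(-2.0 if rng.random() < 0.5 else 2.0, 0.5) for _ in range(n)]

    # Remove entries completely at random.
    missing = [rng.random() < missing_frac for _ in range(n)]
    observed = [v for v, m in zip(values, missing) if not m]
    truth = [v for v, m in zip(values, missing) if m]

    # Imputer 1: fill every gap with the observed mean.
    mean_imputed = [statistics.fmean(observed)] * len(truth)
    # Imputer 2: fill every gap with a random draw from the observed values.
    draw_imputed = [rng.choice(observed) for _ in truth]

    return {
        "mean": {
            "rmse": rmse(truth, mean_imputed),
            "wasserstein": wasserstein_1d(truth, mean_imputed),
        },
        "draw": {
            "rmse": rmse(truth, draw_imputed),
            "wasserstein": wasserstein_1d(truth, draw_imputed),
        },
    }


if __name__ == "__main__":
    for name, scores in compare_imputers().items():
        print(name, scores)
```

In this toy setting, mean imputation obtains the lower RMSE even though it collapses both modes into a single value, while random-draw imputation has the higher RMSE but the smaller distributional distance. This is the kind of disagreement between measures that motivates scoring imputation by how well it recreates the data distribution.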
