A method for comparing multiple imputation techniques: a case study on the U.S. National COVID Cohort Collaborative
Elena Casiraghi, Rachel Wong, Margaret Hall, Ben Coleman, Marco Notaro, Michael D. Evans, Jena S. Tronieri, Hannah Blau, Bryan Laraway, Tiffany J. Callahan, Lauren E. Chan, Carolyn T. Bramante, John B. Buse, Richard A. Moffitt, Til Sturmer, Steven G. Johnson, Yu Raymond Shao

TL;DR
This paper introduces a new framework for evaluating multiple imputation methods that handle missing data in healthcare datasets, demonstrates it on COVID-19 patient data from the N3C cohort, and identifies the most effective strategies for that case study.
Contribution
The authors propose a novel, generalizable framework for numerically comparing multiple imputation techniques in complex datasets, addressing a key challenge in missing data analysis.
Findings
The framework effectively identified the most valid imputation strategy for the case study.
The behavior of the different models varied as their parameters changed, which gives insight into their robustness.
The approach is applicable across various research fields and heterogeneous datasets.
Abstract
Healthcare datasets obtained from Electronic Health Records have proven extremely useful for assessing associations between patients' predictors and outcomes of interest. However, these datasets often have missing values in a high proportion of cases, and simply removing those cases may introduce severe bias. For these reasons, several multiple imputation algorithms have been proposed to recover the missing information. Each algorithm has strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, choosing each algorithm's parameters and the data-related modelling options is both crucial and challenging. In this paper, we propose a novel framework to numerically evaluate strategies for handling missing data in the context of statistical analysis, with a particular…
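The abstract refers to multiple imputation without showing how results from the separately imputed datasets are combined. As background only (this is the standard textbook pooling step, not the authors' evaluation framework), the sketch below pools per-imputation estimates with Rubin's rules. The function name `pool_rubin` and the example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation results with Rubin's rules (hypothetical helper).

    estimates: point estimates Q_i (e.g. one regression coefficient),
        one per imputed dataset.
    variances: within-imputation variances U_i (squared standard errors)
        from the same fits.
    Returns (pooled estimate, total variance, between-imputation variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divides by m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance of the pooled estimate
    return q_bar, t, b


if __name__ == "__main__":
    q, t, b = pool_rubin([0.50, 0.55, 0.45], [0.010, 0.012, 0.011])
    print(f"estimate={q:.3f}  SE={math.sqrt(t):.3f}")
```

With three imputed datasets yielding coefficients 0.50, 0.55 and 0.45 (variances 0.010, 0.012, 0.011), the pooled estimate is 0.50 and the total variance is about 0.0143 (standard error about 0.12). The between-imputation term is what reflects the uncertainty introduced by the missing values, which a single imputation would hide.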
Taxonomy
Topics: Chronic Disease Management Strategies · Machine Learning in Healthcare · Diabetes Management and Research
