Combining Missing Data Imputation and Internal Validation in Clinical Risk Prediction Models
Junhui Mi, Rahul D. Tendulkar, Sarah M. C. Sittenfeld, Sujata Patil, Emily C. Zabor

TL;DR
This paper explains how to handle missing covariate data in clinical risk prediction models by bootstrapping followed by deterministic imputation, so that a model can be both constructed and internally validated.
Contribution
The paper introduces a tutorial on combining deterministic imputation and internal validation for clinical risk prediction models.
Findings
Deterministic imputation suits clinical risk prediction models because the outcome is not included in the imputation model and the same imputation can be applied to future patients.
Simulation studies help determine when imputation is appropriate in real-world clinical settings.
Bootstrapping followed by deterministic imputation of missing covariate data can be used to construct a risk prediction model and internally validate its performance.
Abstract
Methods to handle missing data have been extensively explored in the context of estimation and descriptive studies, with multiple imputation being the most widely used method in clinical research. However, in the context of clinical risk prediction models, where the goal is often to achieve high prediction accuracy and to make predictions for future patients, there are different considerations regarding the handling of missing covariate data. As a result, deterministic imputation is better suited to the setting of clinical risk prediction models, since the outcome is not included in the imputation model and the imputation method can be easily applied to future patients. In this paper, we provide a tutorial demonstrating how to conduct bootstrapping followed by deterministic imputation of missing covariate data to construct and internally validate the performance of a clinical risk…
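The workflow the abstract describes, bootstrapping followed by deterministic imputation, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: median imputation stands in for the deterministic imputation method, a small gradient-descent logistic regression stands in for the risk model, the Brier score is the performance measure, and internal validation uses a standard optimism-corrected bootstrap. The function names (`fit_imputer`, `apply_imputer`, `optimism_corrected_brier`, and so on) and the simulated data are hypothetical.

```python
import math
import random
import statistics


def fit_imputer(rows):
    """Learn one fill value per covariate: the median of its observed values.
    The outcome y is never used, so the rule carries over to future patients."""
    n_cov = len(rows[0][0])
    return [statistics.median(x[j] for x, _ in rows if x[j] is not None)
            for j in range(n_cov)]


def apply_imputer(rows, fill):
    """Replace each missing value (None) with the stored fill value."""
    return [([fill[j] if v is None else v for j, v in enumerate(x)], y)
            for x, y in rows]


def predict(w, x):
    """Predicted risk from a logistic model; w[0] is the intercept."""
    z = w[0] + sum(wj * v for wj, v in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, steps=200, lr=0.5):
    """Plain gradient descent on the logistic log-loss (illustrative only)."""
    w = [0.0] * (len(rows[0][0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in rows:
            err = predict(w, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def build(rows):
    """Full pipeline: learn the imputation rule, impute, then fit the model."""
    fill = fit_imputer(rows)
    return fill, fit_logistic(apply_imputer(rows, fill))


def brier(model, rows):
    """Mean squared error of predicted risk, after imputing with the model's rule."""
    fill, w = model
    done = apply_imputer(rows, fill)
    return sum((predict(w, x) - y) ** 2 for x, y in done) / len(done)


def optimism_corrected_brier(rows, n_boot=20, seed=0):
    """Bootstrap first, then impute and fit inside each bootstrap sample."""
    rng = random.Random(seed)
    apparent = brier(build(rows), rows)
    optimism = 0.0
    for _ in range(n_boot):
        boot = [rng.choice(rows) for _ in rows]  # resample with replacement
        model = build(boot)  # imputation rule re-learned on this sample
        optimism += brier(model, rows) - brier(model, boot)
    # Lower Brier is better, so the average optimism is added back.
    return apparent, apparent + optimism / n_boot


def simulate(n=150, seed=1):
    """Synthetic data: two covariates, about 20% of the first set to missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        p = 1.0 / (1.0 + math.exp(-(0.8 * x1 - 0.5 * x2)))
        y = 1 if rng.random() < p else 0
        rows.append(([None if rng.random() < 0.2 else x1, x2], y))
    return rows


rows = simulate()
apparent, corrected = optimism_corrected_brier(rows)
print(f"apparent Brier: {apparent:.3f}, optimism-corrected: {corrected:.3f}")
```

The design point the sketch tries to capture is the ordering: the imputation rule is re-estimated inside every bootstrap sample, so the validation reflects the whole model-building pipeline, and the stored fill values from the final model are what would be applied to a future patient with missing covariates.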
Taxonomy
Topics: Machine Learning in Healthcare · Statistical Methods and Inference · Statistical Methods and Bayesian Inference
