Combining missing data imputation and internal validation in clinical risk prediction models
Junhui Mi, Rahul D. Tendulkar, Sarah M. C. Sittenfeld, Sujata Patil, Emily C. Zabor

TL;DR
This paper demonstrates how to combine deterministic imputation with bootstrapping for internal validation of clinical risk prediction models, emphasizing practical guidance and simulation results for handling missing data effectively.
Contribution
It introduces a tutorial approach for integrating deterministic imputation with bootstrapping to improve model validation in the presence of missing data.
Findings
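Deterministic imputation is well suited to clinical risk prediction models because the outcome is not included in the imputation model and the imputation can be applied directly to future patients.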
Bootstrapping combined with imputation enhances internal validation.
Simulation results guide practical decision-making.
Abstract
Methods to handle missing data have been extensively explored in the context of estimation and descriptive studies, with multiple imputation being the most widely used method in clinical research. However, in the context of clinical risk prediction models, where the goal is often to achieve high prediction accuracy and to make predictions for future patients, there are different considerations regarding the handling of missing data. As a result, deterministic imputation is better suited to the setting of clinical risk prediction models, since the outcome is not included in the imputation model and the imputation method can be easily applied to future patients. In this paper, we provide a tutorial demonstrating how to conduct bootstrapping followed by deterministic imputation of missing data to construct and internally validate the performance of a clinical risk prediction model in the…
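The workflow the abstract describes (bootstrap resampling, then deterministic imputation, then model fitting and evaluation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: column-median imputation stands in for whatever deterministic imputation rule is chosen, a small gradient-descent logistic regression stands in for the risk model, AUC is the performance measure, and optimism correction is the bootstrap validation scheme. All function names and the simulated data are hypothetical.

```python
import math
import random
from statistics import median


def fit_imputer(X):
    """Learn one fill value per column (observed median); the outcome is never used."""
    return [median(v for v in col if v is not None) for col in zip(*X)]


def apply_imputer(X, fills):
    """Replace each missing entry (None) with the stored fill value for its column."""
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in X]


def linear_predictor(w, row):
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))


def fit_logistic(X, y, steps=200, lr=0.1):
    """Plain gradient descent on the logistic log-loss (illustrative stand-in model)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, yi in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-linear_predictor(w, row))) - yi
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def auc(scores, y):
    """Probability that a random event outranks a random non-event (ties count 0.5)."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def develop(X, y):
    """Imputation and model fitting are learned together, on the same sample."""
    fills = fit_imputer(X)
    return fills, fit_logistic(apply_imputer(X, fills), y)


def evaluate(model, X, y):
    """Apply a stored imputer and model to (possibly different) data and score it."""
    fills, w = model
    return auc([linear_predictor(w, r) for r in apply_imputer(X, fills)], y)


def optimism_corrected_auc(X, y, n_boot=20, seed=0):
    rng = random.Random(seed)
    apparent = evaluate(develop(X, y), X, y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # AUC is undefined when a resample contains only one class
        model_b = develop(Xb, yb)  # imputation is re-learned inside each resample
        optimism.append(evaluate(model_b, Xb, yb) - evaluate(model_b, X, y))
    return apparent, apparent - sum(optimism) / len(optimism)


def simulate(n=150, seed=1):
    """Hypothetical data: two predictors, about 20% of the second one missing."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        p = 1.0 / (1.0 + math.exp(-(x1 + 0.5 * x2)))
        y.append(1 if rng.random() < p else 0)
        X.append([x1, None if rng.random() < 0.2 else x2])
    return X, y


X, y = simulate()
apparent, corrected = optimism_corrected_auc(X, y)
print(f"apparent AUC: {apparent:.3f}, optimism-corrected AUC: {corrected:.3f}")
```

The design point of the sketch is that the imputer is re-fit within every bootstrap resample and its stored fill values are then applied unchanged to the original data. Because those fill values never depend on the outcome, the same stored values could be applied to a future patient whose outcome is unknown, which is the property the abstract highlights.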
Taxonomy
Topics: Machine Learning in Healthcare · Statistical Methods and Inference · Artificial Intelligence in Healthcare
