Incorporating Missing Data Considerations into Sample Size Calculations for Developing Clinical Prediction Models
Glen P. Martin, Sian Bladon, Rebecca Whittle, Molly Wells, Gary S. Collins, Richard D. Riley

TL;DR
This study quantifies how missing predictor data impacts sample size requirements and model performance in clinical prediction models, proposing a framework to incorporate missing data considerations into sample size calculations.
Contribution
It introduces a novel adaptation of posterior sampling-based sample size calculations to explicitly account for missing data and imputation strategies.
Findings
Missing data reduces predictive performance and calibration.
Increasing the sample size can mitigate the effects of missing data, although the size required is sometimes double the current minimum.
A new framework allows sample size determination considering missing data handling.
Abstract
Clinical prediction models must be developed using sufficiently large datasets to minimise overfitting and ensure robust predictive performance. Existing sample size calculations assume complete predictor data for all included participants, yet missing values are common and may increase required sample sizes. This study aimed to quantify how missing predictor data and different imputation methods affect overfitting and model degradation, within datasets that adhere to current sample size criteria. We also aimed to explore how a general sample size framework based on anticipated posterior (sampling) distributions can be adapted to incorporate missing data assumptions and handling strategies. Using a simulation study, we found that in development data meeting current minimum sample size requirements, missing data reduced predictive performance, with expected calibration slopes frequently…
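The kind of simulation the abstract describes can be illustrated with a toy sketch. This is not the authors' design: the single predictor, the true coefficients, the missing-completely-at-random mechanism, mean imputation, and every function name here (`fit_logistic`, `calibration_slope_once`, `mean_calibration_slope`) are assumptions made for illustration. It fits a logistic model on development data with some predictor values removed and imputed, then estimates the calibration slope in a large, complete validation set.

```python
import math
import random


def fit_logistic(x, y, iters=15):
    """Fit logit P(y=1) = a + b*x by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def draw(rng, m, alpha=-1.0, beta=1.0):
    """Simulate m participants; alpha and beta are assumed true values."""
    x = [rng.gauss(0.0, 1.0) for _ in range(m)]
    y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(alpha + beta * xi))) else 0
         for xi in x]
    return x, y


def calibration_slope_once(rng, n, miss, n_val=2000):
    """One replicate: develop with mean-imputed data, validate on complete data."""
    x_dev, y_dev = draw(rng, n)
    # Assumed mechanism: each predictor value is missing with probability `miss`
    is_missing = [rng.random() < miss for _ in x_dev]
    observed = [xi for xi, m in zip(x_dev, is_missing) if not m]
    mean_obs = sum(observed) / len(observed)
    x_imp = [mean_obs if m else xi for xi, m in zip(x_dev, is_missing)]
    a, b = fit_logistic(x_imp, y_dev)
    # Calibration slope: coefficient of the linear predictor in validation data
    x_val, y_val = draw(rng, n_val)
    lp = [a + b * xi for xi in x_val]
    _, slope = fit_logistic(lp, y_val)
    return slope


def mean_calibration_slope(n, miss, reps=10, seed=0):
    """Average the calibration slope over `reps` simulated development sets."""
    rng = random.Random(seed)
    return sum(calibration_slope_once(rng, n, miss) for _ in range(reps)) / reps
```

Calling `mean_calibration_slope(n=400, miss=0.0)` and `mean_calibration_slope(n=400, miss=0.5)` gives a rough comparison of complete and half-missing development data. The numbers depend on the seed and settings, and a one-predictor toy is not expected to reproduce the paper's results.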
