Extended sample size calculations for evaluation of prediction models using a threshold for classification
Rebecca Whittle, Joie Ensor, Lucinda Archer, Gary S. Collins, Paula Dhiman, Alastair Denniston, Joseph Alderman, Amardeep Legha, Maarten van Smeden, Karel G. Moons, Jean-Baptiste Cazier, Richard D. Riley, Kym I.E. Snell

TL;DR
This paper extends sample size calculation methods to precisely estimate threshold-based performance measures for prediction models, providing formulas and code to aid external validation studies.
Contribution
It introduces closed-form sample size formulas for threshold-based measures, including accuracy, sensitivity, and PPV, with implementation in Python, R, and Stata.
Findings
Sample size for threshold measures can be lower than for calibration slope.
Formulas enable precise estimation of performance measures in external validation.
Extension to time-to-event outcomes is also discussed.
Abstract
When evaluating the performance of a model for individualised risk prediction, the sample size needs to be large enough to precisely estimate the performance measures of interest. Current sample size guidance is based on precisely estimating calibration, discrimination, and net benefit, which should be the first stage of calculating the minimum required sample size. However, when a clinically important threshold is used for classification, other performance measures can also be used. We extend the previously published guidance to precisely estimate threshold-based performance measures. We have developed closed-form solutions to estimate the sample size required to target sufficiently precise estimates of accuracy, specificity, sensitivity, PPV, NPV, and F1-score in an external evaluation study of a prediction model with a binary outcome. This approach requires the user to pre-specify…
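To give a rough sense of the kind of calculation involved, the sketch below sizes a study so that the confidence interval around an anticipated sensitivity or specificity has a chosen total width. It assumes the standard Wald (normal-approximation) interval for a proportion, which is not necessarily the closed-form solution derived in the paper. The function names, the `ci_width` parameter, and the input values are hypothetical and chosen only for illustration.

```python
import math


def n_for_proportion(p, ci_width, z=1.96):
    """Observations needed so a Wald CI for proportion p has total width <= ci_width.

    Assumption: normal-approximation interval, p +/- z * sqrt(p * (1 - p) / n).
    """
    target_se = ci_width / (2 * z)          # standard error implied by the target width
    return math.ceil(p * (1 - p) / target_se ** 2)


def n_for_sensitivity(sens, prevalence, ci_width, z=1.96):
    """Total sample size so that enough participants WITH the outcome are expected.

    Sensitivity is estimated among events only, so the required number of
    events is scaled up by the anticipated outcome prevalence.
    """
    n_events = n_for_proportion(sens, ci_width, z)
    return math.ceil(n_events / prevalence)


def n_for_specificity(spec, prevalence, ci_width, z=1.96):
    """Total sample size so that enough participants WITHOUT the outcome are expected."""
    n_non_events = n_for_proportion(spec, ci_width, z)
    return math.ceil(n_non_events / (1 - prevalence))


if __name__ == "__main__":
    # Hypothetical inputs: anticipated sensitivity 0.80, specificity 0.90,
    # outcome prevalence 0.25, target 95% CI width 0.10.
    print(n_for_sensitivity(0.80, 0.25, 0.10))
    print(n_for_specificity(0.90, 0.25, 0.10))
```

Under these assumptions, the overall requirement for a study targeting both measures would be the larger of the two totals, since each measure is estimated in a different subgroup of participants.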
Taxonomy
Topics: Sepsis Diagnosis and Treatment · Advanced Causal Inference Techniques · Health Systems, Economic Evaluations, Quality of Life
