Extended sample size calculations for evaluation of prediction models using a threshold for classification

Rebecca Whittle; Joie Ensor; Lucinda Archer; Gary S. Collins; Paula Dhiman; Alastair Denniston; Joseph Alderman; Amardeep Legha; Maarten van Smeden; Karel G. Moons; Jean-Baptiste Cazier; Richard D. Riley; Kym I.E. Snell

arXiv:2406.19673 · stat.ME · July 1, 2024 · 1 citation

TL;DR

This paper extends sample size calculation methods to precisely estimate threshold-based performance measures for prediction models, providing formulas and code to aid external validation studies.

Contribution

The paper introduces closed-form sample size formulas for threshold-based performance measures (accuracy, specificity, sensitivity, PPV, NPV, and F1-score), with implementations in Python, R, and Stata.

Findings

01 The sample size needed to precisely estimate threshold-based measures can be lower than that needed to precisely estimate the calibration slope.

02 The formulas target precise estimation of threshold-based performance measures in external validation studies.

03 An extension to time-to-event outcomes is also discussed.

Abstract

When evaluating the performance of a model for individualised risk prediction, the sample size needs to be large enough to precisely estimate the performance measures of interest. Current sample size guidance is based on precisely estimating calibration, discrimination, and net benefit, which should be the first stage of calculating the minimum required sample size. However, when a clinically important threshold is used for classification, other performance measures can also be used. We extend the previously published guidance to precisely estimate threshold-based performance measures. We have developed closed-form solutions to estimate the sample size required to target sufficiently precise estimates of accuracy, specificity, sensitivity, PPV, NPV, and F1-score in an external evaluation study of a prediction model with a binary outcome. This approach requires the user to pre-specify…
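To make the idea of targeting a standard error concrete, here is a minimal Python sketch. It is not the paper's formulas or its released code: it assumes only the textbook binomial variance of a proportion, SE² = p(1 − p) / (n × fraction), where "fraction" is the share of the sample that forms the measure's denominator (everyone for accuracy, those with the outcome for sensitivity, those without it for specificity). The function names and arguments are hypothetical, and PPV, NPV, and F1-score are left out because their variances need further assumptions.

```python
import math


def min_n_for_proportion(expected, target_se, subgroup_fraction=1.0):
    """Smallest n for which a proportion with the given expected value,
    estimated in a subgroup making up `subgroup_fraction` of the sample,
    has a binomial standard error no larger than `target_se`.

    Assumption: simple binomial variance; not the paper's derivation.
    """
    if not 0.0 < expected < 1.0:
        raise ValueError("expected must lie strictly between 0 and 1")
    if target_se <= 0.0 or not 0.0 < subgroup_fraction <= 1.0:
        raise ValueError("target_se must be > 0 and subgroup_fraction in (0, 1]")
    # SE^2 = p(1 - p) / (n * fraction)  =>  n = p(1 - p) / (SE^2 * fraction)
    n = expected * (1.0 - expected) / (target_se ** 2 * subgroup_fraction)
    # Round first so floating-point noise does not push an exact integer up by one.
    return math.ceil(round(n, 6))


def min_n_threshold_measures(accuracy, sensitivity, specificity, prevalence, target_se):
    """Per-measure minimum n and the overall minimum (the largest of them)."""
    per_measure = {
        "accuracy": min_n_for_proportion(accuracy, target_se),
        "sensitivity": min_n_for_proportion(sensitivity, target_se, prevalence),
        "specificity": min_n_for_proportion(specificity, target_se, 1.0 - prevalence),
    }
    return per_measure, max(per_measure.values())


# Illustrative inputs only (not taken from the paper).
per_measure, overall = min_n_threshold_measures(
    accuracy=0.85, sensitivity=0.80, specificity=0.90,
    prevalence=0.10, target_se=0.025,
)
print(per_measure, overall)
```

Under these assumptions the measure with the smallest denominator drives the result: with a low outcome prevalence, sensitivity is estimated from few participants and so demands the largest total sample.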

Taxonomy

Topics: Sepsis Diagnosis and Treatment · Advanced Causal Inference Techniques · Health Systems, Economic Evaluations, Quality of Life