Performance evaluation of predictive AI models to support medical decisions: Overview and guidance
Ben Van Calster, Gary S. Collins, Andrew J. Vickers, Laure Wynants, Kathleen F. Kerr, Lasai Barreñada, Gaël Varoquaux, Karandeep Singh, Karel G. M. Moons, Tina Hernandez-Boussard, Dirk Timmerman, David J. McLernon, Maarten van Smeden

TL;DR
This paper reviews and guides the selection of performance measures for binary predictive AI models in medicine, emphasizing proper measures and graphical assessments to ensure safe and effective clinical decision support.
Contribution
It provides a comprehensive evaluation of 32 performance measures, highlighting their properties and recommending key measures and plots for medical AI validation.
Findings
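17 measures are proper and clearly reflect either statistical or decision-analytic performance
Classification measures such as accuracy and F1 are improper for clinically relevant decision thresholds other than 0.5 or the prevalence
Recommended measures and plots include AUROC, a calibration plot, and net benefit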
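A measure is called proper when its expected value is optimized by reporting the correct (true) probabilities. The sketch below is a hypothetical illustration, not the authors' code. It assumes a single patient with a true risk of 0.3 and compares the expected Brier score, which is proper, with expected classification accuracy at a decision threshold of 0.2. The names `expected_brier`, `expected_accuracy`, `true_risk` and `grid` are introduced only for this example.

```python
# Hypothetical illustration (not from the paper): a proper measure (Brier score)
# versus classification accuracy at a decision threshold.


def expected_brier(p: float, q: float) -> float:
    """Expected Brier score (lower is better) for a patient with true risk p
    when the model reports predicted probability q."""
    # Outcome is 1 with probability p (loss (1 - q)^2), else 0 (loss q^2).
    return p * (1 - q) ** 2 + (1 - p) * q ** 2


def expected_accuracy(p: float, q: float, threshold: float) -> float:
    """Expected accuracy (higher is better) when the patient is classified
    as positive whenever the reported probability q >= threshold."""
    # A positive call is correct with probability p, a negative call with 1 - p.
    return p if q >= threshold else 1 - p


true_risk = 0.3                        # hypothetical true risk
grid = [i / 100 for i in range(101)]   # candidate reports 0.00, 0.01, ..., 1.00

# Brier score: the best report on the grid is the true risk itself (proper).
best_brier_report = min(grid, key=lambda q: expected_brier(true_risk, q))
print(f"Brier score is minimized by reporting q = {best_brier_report:.2f}")

# Accuracy at t = 0.2: reporting the true risk (0.3 >= 0.2, classified positive)
# scores worse than a deliberately miscalibrated report of 0.1 (classified negative).
acc_true = expected_accuracy(true_risk, true_risk, threshold=0.2)
acc_shrunk = expected_accuracy(true_risk, 0.1, threshold=0.2)
print(f"accuracy at t=0.2: report 0.30 -> {acc_true:.2f}, report 0.10 -> {acc_shrunk:.2f}")

# At t = 0.5 the true risk is classified negative, which is already optimal.
acc_half = expected_accuracy(true_risk, true_risk, threshold=0.5)
print(f"accuracy at t=0.5 with the true risk: {acc_half:.2f}")
```

The Brier score is minimized at q = 0.30. At t = 0.2, however, reporting the true risk gives an expected accuracy of 0.30, while the miscalibrated report of 0.10 gives 0.70. Accuracy at that threshold therefore rewards wrong probabilities. At t = 0.5, the true risk already achieves the best attainable accuracy.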
Abstract
A myriad of measures to illustrate performance of predictive artificial intelligence (AI) models have been proposed in the literature. Selecting appropriate performance measures is essential for predictive AI models that are developed to be used in medical practice, because poorly performing models may harm patients and lead to increased costs. We aim to assess the merits of classic and contemporary performance measures when validating predictive AI models for use in medical practice. We focus on models with a binary outcome. We discuss 32 performance measures covering five performance domains (discrimination, calibration, overall, classification, and clinical utility) along with accompanying graphical assessments. The first four domains cover statistical performance, the fifth domain covers decision-analytic performance. We explain why two key characteristics are important when…
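The sketch below shows one measure from each of two of the five domains. It is a minimal, stdlib-only illustration on hypothetical toy data, not the authors' implementation. Discrimination is computed as AUROC, expressed as a concordance probability with ties counted as one half. Clinical utility is computed as net benefit at a risk threshold t, using the standard formula NB = TP/n - (FP/n) * t/(1 - t). The function names and data are illustrative assumptions.

```python
# Minimal sketch with hypothetical toy data (not the authors' code).
from typing import Sequence


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """AUROC as the concordance probability: the chance that a randomly chosen
    event has a higher predicted risk than a randomly chosen non-event
    (ties count one half)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Net benefit of treating everyone with predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    # The odds t / (1 - t) weight a false positive relative to a true positive.
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = event) and predicted risks for 8 patients.
y_toy = [1, 1, 1, 0, 0, 0, 0, 0]
p_toy = [0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.1, 0.05]

print(f"AUROC: {auroc(y_toy, p_toy):.3f}")
print(f"Net benefit (model, t=0.2): {net_benefit(y_toy, p_toy, 0.2):.4f}")
# "Treat all" reference strategy: every patient counts as above the threshold.
print(f"Net benefit (treat all, t=0.2): {net_benefit(y_toy, [1.0] * len(y_toy), 0.2):.4f}")
```

On these toy data, the AUROC is 13/15 (about 0.867). At t = 0.2, the net benefit is 0.3125 for the model and 0.21875 for treat-all. Repeating the net benefit calculation over a range of thresholds produces a decision curve.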
