Docs are ROCs: A simple off-the-shelf approach for estimating average human performance in diagnostic studies
Luke Oakden-Rayner, Lyle Palmer

TL;DR
This paper proposes using summary receiver operating characteristic curve analysis as a robust, off-the-shelf method to estimate average human performance in diagnostic studies, addressing inconsistencies in current metrics.
Contribution
The paper adapts a simple, standardized approach from meta-analysis to estimate average human performance in medical AI research, improving comparability across studies.
Findings
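Summary ROC estimates provide a more consistent measure of average human performance.
Applying the method to a handful of prominent medical AI studies demonstrates its practicality.
The method avoids the systematic underestimation of human expert performance that arises from pooled or averaged sensitivity and specificity.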
Abstract
Estimating average human performance has been performed inconsistently in research in diagnostic medicine. This has been particularly apparent in the field of medical artificial intelligence, where humans are often compared against AI models in multi-reader multi-case studies, and commonly reported metrics such as the pooled or average human sensitivity and specificity will systematically underestimate the performance of human experts. We present the use of summary receiver operating characteristic curve analysis, a technique commonly used in the meta-analysis of diagnostic test accuracy studies, as a sensible and methodologically robust alternative. We describe the motivation for using these methods and present results where we apply these meta-analytic techniques to a handful of prominent medical AI studies.
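The underestimation claim in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' method or data: it assumes three hypothetical readers who share one equal-variance binormal ROC curve (the `d_prime` value and the reader false positive rates are invented for illustration) and differ only in their decision thresholds. Because such a curve is concave, the point formed by averaging the readers' sensitivities and specificities falls below the curve that every reader actually sits on.

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal distribution

def roc_tpr(fpr, d_prime=2.0):
    """Sensitivity at a given false positive rate on an equal-variance
    binormal ROC curve. d_prime is an assumed separation parameter."""
    return norm.cdf(d_prime + norm.inv_cdf(fpr))

# Hypothetical readers: identical skill, different operating thresholds.
reader_fprs = [0.02, 0.10, 0.30]
reader_tprs = [roc_tpr(f) for f in reader_fprs]

# "Average reader" obtained by averaging sensitivity and (1 - specificity).
mean_fpr = sum(reader_fprs) / len(reader_fprs)
mean_tpr = sum(reader_tprs) / len(reader_tprs)

# Sensitivity the shared curve actually achieves at that false positive rate.
curve_tpr_at_mean_fpr = roc_tpr(mean_fpr)

print(f"averaged operating point: FPR={mean_fpr:.3f}, TPR={mean_tpr:.3f}")
print(f"shared ROC curve at same FPR: TPR={curve_tpr_at_mean_fpr:.3f}")
print(f"shortfall from averaging: {curve_tpr_at_mean_fpr - mean_tpr:.3f}")
```

Under these assumptions the averaged point understates the sensitivity of readers who all lie on the same curve, which is the kind of bias a summary ROC curve is meant to avoid.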
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Healthcare cost, quality, practices
