Recommendations on test datasets for evaluating AI solutions in pathology
André Homeyer, Christian Geißler, Lars Ole Schwen, Falk Zakrzewski, Theodore Evans, Klaus Strohmenger, Max Westphal, Roman David Bülow, Michaela Kargl, Aray Karjauv, Isidre Munné-Bertran, Carl Orge Retzlaff, Adrià Romero-López, Tomasz Sołtysiński

TL;DR
This paper provides recommendations for collecting and reporting test datasets used to evaluate AI solutions in pathology, with the aim of supporting diagnostic accuracy, regulatory approval, and clinical integration.
Contribution
It offers the first detailed, consensus-based guidelines on test dataset collection, reporting, and regulatory considerations for AI in pathology.
Findings
Guidelines on dataset size and composition
Strategies for handling low-prevalence cases
Methods for bias detection and reporting standards
Abstract
Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval. This assessment requires appropriate test datasets. However, compiling such datasets is challenging and specific recommendations are missing. A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology. Here, we summarize the results and derive general recommendations for the collection of test datasets. We address several questions: Which and how many images are needed? How to deal with low-prevalence subsets? How can potential bias be detected? How should datasets be…
