Medical Imaging AI Competitions Lack Fairness

Annika Reinke; Evangelia Christodoulou; Sthuthi Sadananda; A. Emre Kavur; Khrystyna Faryna; Daan Schouten; Bennett A. Landman; Carole Sudre; Olivier Colliot; Nick Heller; Sophie Loizillon; Martin Maška; Maëlys Solal; Arya Yazdan-Panah; Vilma Bozgo; Ömer Sümer; Siem de Jong; Sophie Fischer; Michal Kozubek; Tim Rädsch; Nadim Hammoud; Fruzsina Molnár-Gábor; Steven Hicks; Michael A. Riegler; Anindo Saha; Vajira Thambawita; Pal Halvorsen; Amelia Jiménez-Sánchez; Qingyang Yang; Veronika Cheplygina; Sabrina Bottazzi; Alexander Seitel; Spyridon Bakas; Alexandros Karargyris; Kiran Vaidhya Venkadesh; Bram van Ginneken; Lena Maier-Hein

arXiv:2512.17581·cs.CV·December 22, 2025

Open Access

TL;DR

This study reveals significant fairness issues in medical imaging AI competitions, showing that datasets often lack representativeness and accessibility, which hampers clinical relevance and reproducibility.

Contribution

The paper provides a comprehensive systematic analysis of 241 challenges, highlighting biases and access issues that undermine the fairness and utility of current benchmarks.

Findings

1. Datasets show geographic and modality biases.
2. Access restrictions limit dataset reuse.
3. Benchmark datasets often lack comprehensive documentation.

Abstract

Benchmarking competitions are central to the development of artificial intelligence (AI) in medical imaging, defining performance standards and shaping methodological progress. However, it remains unclear whether these benchmarks provide data that are sufficiently representative, accessible, and reusable to support clinically meaningful AI. In this work, we assess fairness along two complementary dimensions: (1) whether challenge datasets are representative of real-world clinical diversity, and (2) whether they are accessible and legally reusable in line with the FAIR principles. To address these questions, we conducted a large-scale systematic study of 241 biomedical image analysis challenges comprising 458 tasks across 19 imaging modalities. Our findings show substantial biases in dataset composition, including biases related to geographic location, modality, and problem type, indicating…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · Adversarial Robustness in Machine Learning