Challenge Results Are Not Reproducible
Annika Reinke, Georg Grab, Lena Maier-Hein

TL;DR
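This paper investigates whether challenge results in medical image analysis can be reproduced. Reimplementing the algorithms submitted to the 2019 Robust Medical Image Segmentation Challenge (ROBUST-MIS) produced a leaderboard that differed substantially from the original, which calls the reliability of challenge rankings into question.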
Contribution
It provides an empirical analysis of the reproducibility issues in medical image analysis challenges, highlighting the need for improved standardization and reporting.
Findings
Reproduced algorithms showed different rankings from original challenge results.
Discrepancies suggest challenge outcomes are not reliably reproducible.
Reproducibility issues may impact clinical and research decisions.
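One common way to quantify how strongly two leaderboards disagree is a rank correlation such as Kendall's tau. The sketch below is a minimal illustration of that idea and is not taken from the paper: the function name `kendall_tau`, the team labels, and the ranks are all hypothetical, and the source does not state which statistic the authors used.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings given as {team: rank} dicts (no ties)."""
    teams = sorted(rank_a)
    if set(teams) != set(rank_b):
        raise ValueError("both rankings must cover the same teams")
    if len(teams) < 2:
        raise ValueError("need at least two teams")

    concordant = discordant = 0
    for t1, t2 in combinations(teams, 2):
        # A pair is concordant when both rankings order the two teams the same way.
        product = (rank_a[t1] - rank_a[t2]) * (rank_b[t1] - rank_b[t2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(teams) * (len(teams) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks, invented for illustration only (not ROBUST-MIS results).
original = {"A": 1, "B": 2, "C": 3, "D": 4}
reimplemented = {"A": 2, "B": 1, "C": 4, "D": 3}

tau = kendall_tau(original, reimplemented)
print(f"Kendall's tau = {tau:.3f}")
```

A value of 1 means the two leaderboards agree on every pairwise ordering, and a value of -1 means one is the exact reverse of the other. Values well below 1 would be one way to express the kind of ranking instability the findings describe.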
Abstract
While clinical trials are the state-of-the-art method to assess the effect of new medication in a comparative manner, benchmarking in the field of medical image analysis is performed by so-called challenges. Recently, a comprehensive analysis of multiple biomedical image analysis challenges revealed large discrepancies between the impact of challenges and the quality control of their design and reporting standards. This work aims to follow up on these results and attempts to address the specific question of the reproducibility of the participants' methods. In an effort to determine whether alternative interpretations of the method description may change the challenge ranking, we reproduced the algorithms submitted to the 2019 Robust Medical Image Segmentation Challenge (ROBUST-MIS). The leaderboard differed substantially between the original challenge and reimplementation, indicating that…
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Artificial Intelligence in Healthcare and Education · Machine Learning in Materials Science
