Why rankings of biomedical image analysis competitions should be interpreted with care
Lena Maier-Hein, Matthias Eisenmann, Annika Reinke, Sinan Onogur, Marko Stankovic, Patrick Scholz, Tal Arbel, Hrvoje Bogunovic, Andrew P. Bradley, Aaron Carass, Carolin Feldmann, Alejandro F. Frangi, Peter M. Full, Bram van Ginneken, Allan Hanbury, Katrin Honauer

TL;DR
This paper critically examines biomedical image analysis challenges, highlighting issues with reproducibility and robustness of rankings, and proposes guidelines to improve challenge practices and result interpretation.
Contribution
It provides a comprehensive analysis of current challenge practices, identifies critical issues affecting reproducibility and robustness, and offers recommendations for better challenge organization.
Findings
Reproducibility is often hampered by limited information sharing.
Algorithm rankings are sensitive to test data and annotation variability.
Current challenge practices lack sufficient quality control.
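To make the second finding concrete, the sketch below resamples the test cases with replacement (a bootstrap) and counts how often each algorithm comes out on top. It is only an illustration, not the procedure used in the paper: the algorithm names, the per-case scores, and the helper `winner_frequencies` are all hypothetical.

```python
import random
from collections import Counter
from statistics import mean

# Hypothetical per-case scores (higher is better); not data from the paper.
case_scores = {
    "X": [0.90, 0.90, 0.10],  # strong on two cases, fails on one
    "Y": [0.70, 0.70, 0.70],  # consistently moderate
}

def winner_frequencies(scores, n_boot=1000, seed=0):
    """Count how often each algorithm has the best mean score
    over bootstrap resamples of the test cases."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    names = list(scores)
    n_cases = len(scores[names[0]])
    wins = Counter()
    for _ in range(n_boot):
        # Draw n_cases case indices with replacement.
        idx = [rng.randrange(n_cases) for _ in range(n_cases)]
        means = {name: mean(scores[name][i] for i in idx) for name in names}
        wins[max(means, key=means.get)] += 1
    return dict(wins)

if __name__ == "__main__":
    print(winner_frequencies(case_scores))
```

On the full test set "Y" has the higher mean, yet "X" wins every resample that happens to omit its failure case. If the winner changes across a noticeable share of resamples, the ranking depends strongly on which test cases were chosen.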
Abstract
International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis challenges conducted up to now. We demonstrate the importance of challenges and show that the lack of quality control has critical consequences. First, reproducibility and interpretation of the results are often hampered as only a fraction of relevant information is typically provided. Second, the rank of an algorithm is generally not robust to a number of variables such as the test data used for validation, the ranking scheme applied and the observers that make the reference annotations. To overcome these problems, we recommend best practice…
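The abstract's point about the ranking scheme can be shown with a small example. The sketch below compares two common ways of turning per-case scores into a ranking: averaging the metric first and then ranking, versus ranking on each case and then averaging the ranks. The scores, algorithm names, and function names are hypothetical and chosen only to show that the two schemes can disagree; ties are not handled.

```python
from statistics import mean

# Hypothetical per-case scores (higher is better); not data from the paper.
scores = {
    "A": [0.90, 0.90, 0.10],
    "B": [0.80, 0.80, 0.60],
    "C": [0.70, 0.70, 0.70],
}

def aggregate_then_rank(scores):
    """Average the metric per algorithm, then order by that average (best first)."""
    means = {name: mean(vals) for name, vals in scores.items()}
    return sorted(means, key=means.get, reverse=True)

def rank_then_aggregate(scores):
    """Rank the algorithms on every case, then order by mean rank (best first)."""
    names = list(scores)
    n_cases = len(scores[names[0]])
    rank_sums = {name: 0 for name in names}
    for i in range(n_cases):
        # Rank 1 goes to the highest score on case i.
        ordered = sorted(names, key=lambda n: scores[n][i], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            rank_sums[name] += rank
    return sorted(names, key=lambda n: rank_sums[n] / n_cases)

if __name__ == "__main__":
    print(aggregate_then_rank(scores))   # ['B', 'C', 'A']
    print(rank_then_aggregate(scores))   # ['A', 'B', 'C']
```

With identical scores, "A" is last under the first scheme and first under the second, because a single failure case dominates the mean but costs only one bad rank.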
