Towards a Guideline for Evaluation Metrics in Medical Image Segmentation
Dominik Müller, Iñaki Soto-Rey, Frank Kramer

TL;DR
This paper reviews and interprets key evaluation metrics for medical image segmentation, highlighting issues with current practices and proposing standardized guidelines to enhance evaluation quality and comparability.
Contribution
It provides a comprehensive overview and interpretation guide for segmentation metrics and proposes a standardized evaluation guideline for medical image segmentation research.
Findings
Current evaluation practices often contain statistical biases.
The paper offers a unified interpretation of multiple segmentation metrics.
The paper proposes a guideline to improve evaluation consistency and reproducibility.
Abstract
In the last decade, research on artificial intelligence has seen rapid growth with deep learning models, especially in the field of medical image segmentation. Various studies demonstrated that these models have powerful prediction capabilities and achieve results similar to those of clinicians. However, recent studies revealed that evaluation in image segmentation studies lacks reliable model performance assessment and shows statistical bias due to incorrect metric implementation or usage. Thus, this work provides an overview and interpretation guide for the following metrics for medical image segmentation evaluation in binary as well as multi-class problems: Dice similarity coefficient, Jaccard index, sensitivity, specificity, Rand index, ROC curves, Cohen's kappa, and Hausdorff distance. In summary, we propose a guideline for standardized medical image segmentation evaluation to improve…
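Most of the metrics named in the abstract reduce to counts of true/false positives and negatives between a predicted mask and the ground truth. The plain-Python sketch below covers Dice, Jaccard, sensitivity, specificity, Cohen's kappa, and the Hausdorff distance. It assumes binary masks with 1 for foreground and 0 for background. It also assumes that multi-class masks are scored one-vs-rest, one class at a time. All function names (`confusion_counts`, `overlap_metrics`, `per_class_dice`, `foreground_points`, `hausdorff_distance`) are hypothetical helpers, not the authors' implementation. Reporting an undefined score (for example Dice on two empty masks) as NaN is likewise an assumption. The Rand index and ROC curves are left out: ROC curves need probabilistic outputs rather than hard masks.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[int, int]


def confusion_counts(truth: Sequence[int], pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for two flattened binary masks (1 = foreground, 0 = background)."""
    if len(truth) != len(pred):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = len(truth) - tp - fp - fn  # assumes every value is exactly 0 or 1
    return tp, fp, fn, tn


def _ratio(num: float, den: float) -> float:
    # Assumption: an undefined score (zero denominator, e.g. Dice on two empty
    # masks) is reported as NaN rather than silently forced to 0 or 1.
    return num / den if den else float("nan")


def overlap_metrics(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix based scores for one binary segmentation."""
    tp, fp, fn, tn = confusion_counts(truth, pred)
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("masks must not be empty")
    p_o = (tp + tn) / n  # observed agreement
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)  # agreement expected by chance
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": _ratio(tp, tp + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "kappa": _ratio(p_o - p_e, 1 - p_e),
    }


def per_class_dice(truth: Sequence[int], pred: Sequence[int], classes: Iterable[int]) -> Dict[int, float]:
    """Multi-class case: score each class one-vs-rest instead of pooling all pixels."""
    return {
        k: overlap_metrics([int(t == k) for t in truth], [int(p == k) for p in pred])["dice"]
        for k in classes
    }


def foreground_points(mask: List[List[int]]) -> List[Point]:
    """(row, col) coordinates of all foreground pixels in a 2D binary mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v == 1]


def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Euclidean Hausdorff distance between two non-empty point sets (brute force)."""
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty point set")

    def directed(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # Distance from each src point to its nearest dst point; keep the largest.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))
```

As a toy example, take the ground truth `[[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]` and the prediction `[[0, 1, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0]]`. Flattened row by row, they give TP = 3, FP = 1, FN = 1, and TN = 7. The resulting scores are:

- Dice 0.75
- Jaccard 0.6
- sensitivity 0.75
- specificity 0.875
- kappa 0.625
- Hausdorff distance 1.0 pixel

Specificity comes out highest here because background pixels dominate TN. The brute-force Hausdorff loop is quadratic in the number of foreground pixels, so it only suits small masks.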
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Artificial Intelligence in Healthcare and Education
