Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation
Theodore Barfoot, Luis C. Garcia-Peraza-Herrera, Samet Akcay, Ben Glocker, and Tom Vercauteren

TL;DR
This paper introduces differentiable calibration loss functions for medical image segmentation that improve the reliability of confidence estimates while largely maintaining segmentation accuracy, with the aim of supporting clinical decision-making.
Contribution
It proposes differentiable formulations of the marginal L1 Average Calibration Error (mL1-ACE) as an auxiliary loss, together with dataset reliability histograms, to improve calibration in medical image segmentation models.
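The summary does not say how a dataset reliability histogram is built. One plausible reading, assumed here and not taken from the paper, pools the per-bin statistics of every pixel in every image before comparing mean confidence against observed accuracy. The names `BinStat` and `dataset_reliability_histogram`, the equal-width hard bins, and the per-class (one-vs-rest) view are all hypothetical choices for illustration.

```python
from typing import Iterable, List, NamedTuple, Sequence, Tuple


class BinStat(NamedTuple):
    lower: float       # lower edge of the confidence bin
    upper: float       # upper edge of the confidence bin
    count: int         # number of pixels pooled into the bin
    confidence: float  # mean predicted probability for the class in the bin
    accuracy: float    # fraction of those pixels whose label is the class


def dataset_reliability_histogram(
    images: Iterable[Tuple[Sequence[Sequence[float]], Sequence[int]]],
    class_idx: int,
    n_bins: int = 10,
) -> List[BinStat]:
    """Hypothetical sketch: pool one class's bin statistics over a whole dataset.

    Each image is (probs, labels), where probs[i][c] is pixel i's softmax
    probability for class c and labels[i] is its ground-truth class.
    """
    counts = [0] * n_bins
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    for probs, labels in images:  # pool every pixel of every image
        for p_vec, y in zip(probs, labels):
            p = p_vec[class_idx]
            b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
            counts[b] += 1
            conf_sum[b] += p
            hit_sum[b] += 1.0 if y == class_idx else 0.0
    # Keep only non-empty bins; a plot would draw accuracy vs. confidence per bin.
    return [
        BinStat(b / n_bins, (b + 1) / n_bins, counts[b],
                conf_sum[b] / counts[b], hit_sum[b] / counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]
```

Pooling pixels across the dataset, rather than averaging per-image histograms, lets sparsely populated high-confidence bins in any single image still contribute to a stable dataset-level estimate.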
Findings
Soft-binned mL1-ACE gives the largest calibration gains but often reduces segmentation accuracy.
Hard-binned mL1-ACE maintains segmentation performance, with weaker calibration gains.
Calibration errors, particularly ACE and MCE, are significantly reduced across four medical imaging datasets (ACDC, AMOS, KiTS, BraTS).
Abstract
Deep neural networks for medical image segmentation are often overconfident, compromising both reliability and clinical utility. In this work, we propose differentiable formulations of marginal L1 Average Calibration Error (mL1-ACE) as an auxiliary loss that can be computed on a per-image basis. We compare both hard- and soft-binning approaches to directly improve pixel-wise calibration. Our experiments on four datasets (ACDC, AMOS, KiTS, BraTS) demonstrate that incorporating mL1-ACE significantly reduces calibration errors, particularly Average Calibration Error (ACE) and Maximum Calibration Error (MCE), while largely maintaining high Dice Similarity Coefficients (DSCs). We find that the soft-binned variant yields the greatest improvements in calibration over the DSC plus cross-entropy loss baseline but often compromises segmentation performance, with hard-binned mL1-ACE maintaining…
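To make the quantity concrete, the sketch below computes mL1-ACE for a single image under stated assumptions. For each class (the "marginal", one-vs-rest view), pixel probabilities are binned on [0, 1]. Each bin contributes the absolute (L1) gap between its mean confidence and its observed class frequency, and ACE averages these gaps over non-empty bins without weighting by bin size. The per-class values are then averaged. The soft-binning kernel (a softmax over squared distances to bin centres with a hypothetical `temperature`), the bin count, and the function names are illustrative assumptions, not the authors' implementation. It uses plain Python rather than an autodiff framework.

```python
import math
from typing import List, Sequence


def bin_weights(p: float, n_bins: int, soft: bool, temperature: float) -> List[float]:
    """Membership of probability p in each of n_bins equal-width bins on [0, 1]."""
    if not soft:
        # Hard binning: one-hot membership; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        return [1.0 if b == idx else 0.0 for b in range(n_bins)]
    # Soft binning (ASSUMED kernel): softmax over negative squared distances
    # to the bin centres, sharpened by a hypothetical `temperature`.
    centres = [(b + 0.5) / n_bins for b in range(n_bins)]
    logits = [-((p - c) ** 2) / temperature for c in centres]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def ml1_ace(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    n_bins: int = 10,
    soft: bool = False,
    temperature: float = 0.01,
) -> float:
    """Marginal L1 ACE for one image: probs[i][c] is pixel i's softmax output."""
    n_classes = len(probs[0])
    class_aces = []
    for c in range(n_classes):  # "marginal": one-vs-rest per class
        weight = [0.0] * n_bins
        conf = [0.0] * n_bins
        acc = [0.0] * n_bins
        for p_vec, y in zip(probs, labels):
            p = p_vec[c]
            hit = 1.0 if y == c else 0.0
            for b, w in enumerate(bin_weights(p, n_bins, soft, temperature)):
                weight[b] += w
                conf[b] += w * p
                acc[b] += w * hit
        # ACE: unweighted mean of |accuracy - confidence| over non-empty bins
        gaps = [
            abs(acc[b] - conf[b]) / weight[b]
            for b in range(n_bins)
            if weight[b] > 1e-12
        ]
        class_aces.append(sum(gaps) / len(gaps))
    return sum(class_aces) / n_classes
```

For example, four pixels predicted as `[[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.2, 0.8]]` with labels `[0, 0, 1, 1]` give a hard-binned mL1-ACE of about 0.15. As a loss, the same arithmetic would be written in an autodiff framework. With hard bins, gradients can only flow through each bin's mean confidence because bin assignment is piecewise constant. Soft bins also let gradients move pixels between bins, which is consistent with the stronger calibration effect reported for the soft variant.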
