Confidence Calibration and Predictive Uncertainty Estimation for Deep Medical Image Segmentation
Alireza Mehrtash, William M. Wells III, Clare M. Tempany, Purang Abolmaesumi, Tina Kapur

TL;DR
This paper investigates the calibration of deep neural networks for medical image segmentation, comparing loss functions, proposing ensembling for better confidence estimates, and evaluating out-of-distribution detection across multiple medical imaging tasks.
Contribution
It introduces a systematic comparison of loss functions, proposes model ensembling for confidence calibration, and evaluates uncertainty estimation and out-of-distribution detection in medical segmentation.
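The sketch below shows the two training objectives being compared. It assumes binary (foreground/background) segmentation, with per-pixel foreground probabilities flattened into a list. The function names soft_dice_loss and binary_cross_entropy are illustrative and not taken from the paper. The epsilon smoothing is also an assumption, so the paper's exact loss formulation may differ.

```python
import math

# Assumption: binary segmentation, inputs are flat lists of per-pixel
# foreground probabilities (probs) and 0/1 labels (targets).

def soft_dice_loss(probs, targets, eps=1e-7):
    """1 minus the soft Dice coefficient, computed over the whole image."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    # eps keeps the ratio defined when both prediction and label are empty
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean pixel-wise binary cross entropy."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)
```

For example, with targets [1, 1, 0, 0] and probabilities [0.9, 0.8, 0.1, 0.2], the soft Dice loss is about 0.15 and the mean cross entropy is about 0.164. The two objectives differ in an important way. Cross entropy is a strictly proper scoring rule applied independently at each pixel, whereas soft Dice is a region-overlap objective computed over the whole image. As a result, the two need not produce equally calibrated probabilities.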
Findings
Ensembling improves confidence calibration.
Dice loss affects uncertainty estimation differently than cross entropy.
Calibrated models better predict segmentation quality and detect OOD examples.
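The ensembling finding can be illustrated with a small sketch. It assumes that an ensemble is formed by averaging the per-pixel foreground probabilities of independently trained models, and that calibration is measured with a binned expected calibration error (ECE). Both the averaging rule and the ECE binning (10 equal-width bins, confidence max(p, 1 - p)) are assumptions for illustration. ensemble_probs and expected_calibration_error are hypothetical helper names.

```python
# Assumption: each ensemble member outputs a flat list of per-pixel
# foreground probabilities for the same image; labels are 0/1.

def ensemble_probs(member_probs):
    """Average per-pixel foreground probabilities across ensemble members."""
    n = len(member_probs)
    return [sum(ps) / n for ps in zip(*member_probs)]

def expected_calibration_error(probs, targets, n_bins=10):
    """Binned ECE for binary foreground probabilities.

    A pixel's confidence is max(p, 1 - p); it counts as correct when the
    thresholded prediction (p >= 0.5) matches its label.
    """
    bins = [[] for _ in range(n_bins)]
    for p, t in zip(probs, targets):
        conf = max(p, 1.0 - p)
        correct = (p >= 0.5) == (t == 1)
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx].append((conf, correct))
    total = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(1 for _, ok in b if ok) / len(b)
            # weight each bin's |confidence - accuracy| gap by its pixel share
            ece += len(b) / total * abs(avg_conf - acc)
    return ece
```

For instance, four pixels all predicted at 0.9, with only three of them truly foreground, give an ECE of 0.15 (confidence 0.9 versus accuracy 0.75). A lower ECE for the averaged probabilities than for a single member would indicate better calibration.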
Abstract
Fully convolutional neural networks (FCNs), and in particular U-Nets, have achieved state-of-the-art results in semantic segmentation for numerous medical imaging applications. Moreover, batch normalization and Dice loss have been used successfully to stabilize and accelerate training. However, these networks are poorly calibrated, i.e., they tend to produce overconfident predictions for both correct and erroneous classifications, making them unreliable and hard to interpret. In this paper, we study predictive uncertainty estimation in FCNs for medical image segmentation. We make the following contributions: 1) We systematically compare cross entropy loss with Dice loss in terms of segmentation quality and uncertainty estimation of FCNs; 2) We propose model ensembling for confidence calibration of the FCNs trained with batch normalization and Dice loss; 3) We assess the ability of…
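Predictive entropy offers one way to sketch both the abstract's point about overconfident predictions and the (truncated) third contribution on out-of-distribution detection. The sketch assumes binary foreground probabilities and uses the mean per-pixel entropy as an image-level uncertainty score. It flags an image when that score exceeds a threshold. The scoring rule, the threshold, and the helper names are hypothetical choices, not necessarily what the paper uses.

```python
import math

# Assumption: probs is a flat list of per-pixel foreground probabilities
# from a (possibly ensembled) segmentation model.

def binary_entropy(p, eps=1e-12):
    """Predictive entropy (in nats) of a Bernoulli foreground probability."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def image_uncertainty(probs):
    """Mean per-pixel entropy, used here as a scalar image-level score."""
    return sum(binary_entropy(p) for p in probs) / len(probs)

def flag_ood(probs, threshold):
    """Hypothetical rule: flag as possibly out-of-distribution above threshold."""
    return image_uncertainty(probs) > threshold
```

An overconfident model assigns probabilities near 0 or 1 even where it is wrong. Its entropy therefore stays low, and a score like this may separate in-distribution from out-of-distribution inputs poorly, which is why calibration matters for uncertainty-based detection. In practice, the threshold would be tuned on held-out data.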
Taxonomy
Methods: Test · Dice Loss · Batch Normalization
