TL;DR
This paper evaluates post-training calibration methods for deep neural networks in medical image segmentation, showing they can improve confidence score calibration and are competitive with MC dropout, with varied results across methods.
Contribution
It introduces and compares several straightforward post hoc calibration techniques, some of them novel, for neural networks trained with different loss functions for medical image segmentation.
Findings
Post hoc calibration improves confidence scores in segmentation models.
Models trained with soft Dice loss are not necessarily less calibrated than those trained with cross-entropy.
Calibration methods are competitive with MC dropout, but subject-level variance remains similar.
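To make the notion of calibration in these findings concrete, here is a minimal sketch of one widely used post hoc method, temperature scaling, together with a binary expected calibration error (ECE) measure. It is an illustration only: this summary does not list the specific methods the paper evaluates, so this is not presented as the authors' approach. The function names (`expected_calibration_error`, `fit_temperature`, `demo_calibration`) are hypothetical, and the "voxels" are synthetic, deliberately overconfident logits rather than BraTS or ISLES data.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: bin voxels by predicted foreground probability and
    average |mean predicted probability - observed foreground rate|,
    weighted by the fraction of voxels in each bin."""
    probs = np.asarray(probs, dtype=float).ravel()
    labels = np.asarray(labels, dtype=float).ravel()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so indices fall in [0, n_bins - 1].
    idx = np.digitize(probs, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return float(ece)


def fit_temperature(logits, labels, grid=None):
    """Choose the scalar T > 0 that minimizes the binary negative
    log-likelihood of sigmoid(logits / T) on held-out voxels.
    A simple grid search is used here instead of gradient descent."""
    logits = np.asarray(logits, dtype=float).ravel()
    labels = np.asarray(labels, dtype=float).ravel()
    if grid is None:
        grid = np.linspace(0.25, 5.0, 96)  # assumed search range
    eps = 1e-12

    def nll(temperature):
        p = np.clip(sigmoid(logits / temperature), eps, 1.0 - eps)
        return -np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

    losses = [nll(t) for t in grid]
    return float(grid[int(np.argmin(losses))])


def demo_calibration(n_voxels=20000, seed=0):
    """Synthetic check: labels follow sigmoid(z), but the 'model'
    reports logits 3 * z, i.e. it is overconfident by a factor of 3."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, 1.5, size=n_voxels)
    labels = (rng.random(n_voxels) < sigmoid(z)).astype(float)
    logits = 3.0 * z

    # Fit T on one half (calibration set), evaluate on the other half.
    half = n_voxels // 2
    temperature = fit_temperature(logits[:half], labels[:half])
    ece_before = expected_calibration_error(sigmoid(logits[half:]), labels[half:])
    ece_after = expected_calibration_error(
        sigmoid(logits[half:] / temperature), labels[half:]
    )
    return temperature, ece_before, ece_after


if __name__ == "__main__":
    t, before, after = demo_calibration()
    print(f"T = {t:.2f}, ECE before = {before:.4f}, ECE after = {after:.4f}")
```

Because dividing logits by a positive scalar does not change which side of 0.5 a voxel falls on, this kind of rescaling leaves the thresholded segmentation unchanged and only alters the confidence scores.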
Abstract
Neural networks for automated image segmentation are typically trained to achieve maximum accuracy, while less attention has been given to the calibration of their confidence scores. However, well-calibrated confidence scores provide valuable information to the user. We investigate several post hoc calibration methods that are straightforward to implement, some of which are novel. They are compared to Monte Carlo (MC) dropout. They are applied to neural networks trained with cross-entropy (CE) and soft Dice (SD) losses on BraTS 2018 and ISLES 2018. Surprisingly, models trained with SD loss are not necessarily less calibrated than those trained with CE loss. In all cases, at least one post hoc method improves the calibration. There is limited consistency across the results, so we cannot conclude that any one method is superior. In all cases, post hoc calibration is competitive with MC…
