Multi-Rater Calibrated Segmentation Models
Meritxell Riera-Marín, Javier García López, Júlia Rodríguez-Comas, Miguel A. González Ballester, Adrian Galdran

TL;DR
This paper introduces a novel ordinal learning approach to improve the calibration of medical image segmentation models by leveraging inter-rater agreement as ordered information.
Contribution
It reformulates multi-rater supervision as an ordinal learning problem, enhancing probabilistic calibration without sacrificing segmentation accuracy.
Findings
Ordinal-aware training significantly improves calibration across multiple benchmarks.
The approach maintains high segmentation accuracy while better reflecting annotation uncertainty.
Calibration metrics show consistent improvement over existing methods.
Abstract
Objective: Accurate probability estimates are essential for the safe deployment of medical image segmentation models in clinical decision-making. However, modern deep segmentation networks are often poorly calibrated, a problem exacerbated when multiple expert annotations exhibit substantial disagreement. While inter-rater variability is typically treated as noise, it provides valuable information about intrinsic annotation ambiguity that must be reflected in model confidence. Methods: We improve the probabilistic calibration of medical image segmentation models by reformulating multi-rater supervision as an ordinal learning problem. Voxel-wise annotator agreement is treated as an ordered target, linking predictive confidence to the empirical variability in training data. This formulation allows the use of ordinal-aware scoring rules, such as the Ranked Probability Score ordinal loss,…
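The ordinal formulation described in the abstract can be illustrated with a minimal NumPy sketch. It assumes that the ordered target at each voxel is the number of raters who marked it as foreground (so R raters give K = R + 1 ordered levels) and that the model outputs a probability vector over those levels. It uses the standard Ranked Probability Score, the squared difference between the predicted and target cumulative distributions, here divided by K - 1. The function names, array shapes, and normalisation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def agreement_levels(masks):
    """Count foreground votes per voxel.

    masks: binary array of shape (R, ...) holding one mask per rater.
    Returns an integer array of shape (...) with values in 0..R.
    """
    masks = np.asarray(masks)
    return masks.astype(np.int64).sum(axis=0)


def ranked_probability_score(probs, levels):
    """Mean Ranked Probability Score over all voxels.

    probs:  array of shape (..., K), probabilities over K ordered levels.
    levels: integer array of shape (...), true level in 0..K-1.
    """
    probs = np.asarray(probs, dtype=np.float64)
    levels = np.asarray(levels)
    k = probs.shape[-1]

    # Predicted cumulative distribution over the ordered levels.
    pred_cdf = np.cumsum(probs, axis=-1)

    # Target cumulative distribution: a step from 0 to 1 at the true level.
    target_cdf = (np.arange(k) >= levels[..., None]).astype(np.float64)

    # Squared CDF distance, normalised by K - 1 (an assumed convention).
    per_voxel = np.sum((pred_cdf - target_cdf) ** 2, axis=-1) / (k - 1)
    return float(per_voxel.mean())
```

Because the score compares cumulative distributions, placing probability mass on a level adjacent to the true agreement level is penalised less than placing it on a distant one, which is the property that distinguishes an ordinal scoring rule from a plain cross-entropy over unordered classes.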
