Multi-Rater Calibrated Segmentation Models

Meritxell Riera-Marín; Javier García López; Júlia Rodríguez-Comas; Miguel A. González Ballester; Adrian Galdran

arXiv:2605.02437·cs.CV·May 5, 2026

TL;DR

This paper introduces a novel ordinal learning approach to improve the calibration of medical image segmentation models by leveraging inter-rater agreement as ordered information.

Contribution

The paper reformulates multi-rater supervision as an ordinal learning problem, improving probabilistic calibration without sacrificing segmentation accuracy.

Findings

1. Ordinal-aware training significantly improves calibration across multiple benchmarks.

2. The approach maintains high segmentation accuracy while better reflecting annotation uncertainty.

3. Calibration metrics show consistent improvement over existing methods.

Abstract

Objective: Accurate probability estimates are essential for the safe deployment of medical image segmentation models in clinical decision-making. However, modern deep segmentation networks are often poorly calibrated, a problem exacerbated when multiple expert annotations exhibit substantial disagreement. While inter-rater variability is typically treated as noise, it provides valuable information about intrinsic annotation ambiguity that must be reflected in model confidence.

Methods: We improve the probabilistic calibration of medical image segmentation models by reformulating multi-rater supervision as an ordinal learning problem. Voxel-wise annotator agreement is treated as an ordered target, linking predictive confidence to the empirical variability in training data. This formulation allows the use of ordinal-aware scoring rules, such as the Ranked Probability Score ordinal loss,…
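To make the idea of an ordered agreement target concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes that the ordinal target at each voxel is the number of raters who marked it as foreground (so R raters give R + 1 ordered levels), and it uses the standard unnormalized Ranked Probability Score: the sum of squared differences between the cumulative predicted distribution and the cumulative one-hot target. The function names `agreement_levels` and `ranked_probability_score` are hypothetical and chosen only for illustration.

```python
import numpy as np


def agreement_levels(rater_masks):
    """Count, per voxel, how many raters marked it as foreground.

    rater_masks: binary array of shape (R, ...), one mask per rater.
    Returns an integer array of shape (...) with values in {0, ..., R}.
    Assumption: this count is used as the ordered target.
    """
    return np.asarray(rater_masks).astype(int).sum(axis=0)


def ranked_probability_score(probs, levels):
    """Mean unnormalized Ranked Probability Score over all voxels.

    probs:  array of shape (..., K), predicted distribution over K ordered levels.
    levels: integer array of shape (...), target level in {0, ..., K-1}.
    """
    probs = np.asarray(probs, dtype=float)
    levels = np.asarray(levels)
    num_levels = probs.shape[-1]
    one_hot = np.eye(num_levels)[levels]        # (..., K) one-hot targets
    cdf_pred = np.cumsum(probs, axis=-1)        # cumulative predicted mass
    cdf_true = np.cumsum(one_hot, axis=-1)      # step function at the target level
    return float(np.mean(np.sum((cdf_pred - cdf_true) ** 2, axis=-1)))


# Toy example: 3 raters, 4 voxels, hence K = 4 agreement levels (0 to 3).
masks = np.array([[1, 1, 0, 0],
                  [1, 0, 0, 0],
                  [1, 1, 1, 0]])
levels = agreement_levels(masks)

# Two wrong predictions for a voxel whose true level is 0:
near_miss = ranked_probability_score(np.array([[0.0, 1.0, 0.0, 0.0]]), np.array([0]))
far_miss = ranked_probability_score(np.array([[0.0, 0.0, 0.0, 1.0]]), np.array([0]))
```

In this toy case, `far_miss` is larger than `near_miss`: unlike plain cross-entropy, the score penalizes a prediction more the further it lands from the target agreement level, which is the property that makes it ordinal-aware.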
