Label fusion and training methods for reliable representation of inter-rater uncertainty
Andreanne Lemay, Charley Gros, Enamundram Naga Karthik, Julien Cohen-Adad

TL;DR
This paper compares label fusion methods for training deep learning models on medical data annotated by multiple raters, and shows that the SoftSeg framework improves calibration and better preserves inter-rater variability than the conventional training framework.
Contribution
It introduces a comprehensive comparison of three label fusion techniques within SoftSeg and conventional frameworks, highlighting SoftSeg's advantages in calibration and uncertainty representation.
Findings
SoftSeg models outperform conventional models in calibration.
Averaging labels with SoftSeg leads to better uncertainty estimation.
Best label fusion method varies by dataset.
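To make the notion of calibration in these findings concrete, here is a minimal sketch of one common calibration measure, the Brier score (mean squared difference between predicted probabilities and the reference labels; lower is better). Whether the paper reports this particular metric is not stated in this summary, and the arrays below are toy values invented for illustration, not results from the paper.

```python
import numpy as np

def brier_score(probs, targets):
    """Mean squared error between predicted probabilities and reference labels."""
    probs = np.asarray(probs, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    return float(np.mean((probs - targets) ** 2))

# Toy data (hypothetical): four voxels with a binary reference label.
targets = [1, 1, 0, 0]
overconfident = [1.0, 1.0, 0.0, 1.0]   # hard outputs, one confident mistake
soft_outputs = [0.9, 0.7, 0.3, 0.4]    # graded outputs, no confident mistake

score_hard = brier_score(overconfident, targets)
score_soft = brier_score(soft_outputs, targets)
print(score_hard, score_soft)
```

In this toy case the graded outputs score lower (better) because no single voxel is confidently wrong; this illustrates the metric only and says nothing about the paper's actual numbers.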
Abstract
Medical tasks are prone to inter-rater variability due to multiple factors such as image quality, professional experience and training, or guideline clarity. Training deep learning networks with annotations from multiple raters is a common practice that mitigates the model's bias towards a single expert. Reliable models generating calibrated outputs and reflecting the inter-rater disagreement are key to the integration of artificial intelligence in clinical practice. Various methods exist to take into account different expert labels. We focus on comparing three label fusion methods: STAPLE, average of the rater's segmentation, and random sampling of each rater's segmentation during training. Each label fusion method is studied using both the conventional training framework and the recently published SoftSeg framework that limits information loss by treating the segmentation task as a…
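Two of the three label fusion methods named in the abstract, averaging the raters' segmentations and randomly sampling one rater's segmentation, can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the toy masks, and the 0.5 threshold used to show a binarized label are all hypothetical. STAPLE is omitted because it requires an iterative expectation-maximization procedure that does not fit in a short example.

```python
import numpy as np

def average_fusion(masks):
    """Voxel-wise mean of the raters' binary masks, giving a soft label in [0, 1]."""
    stacked = np.stack(masks, axis=0).astype(np.float32)
    return stacked.mean(axis=0)

def random_sampling_fusion(masks, rng):
    """Return one rater's mask chosen uniformly at random (e.g. once per training sample)."""
    return masks[rng.integers(len(masks))]

# Toy example (hypothetical): three raters annotating a 1x4 image.
raters = [
    np.array([[1, 1, 0, 0]]),
    np.array([[1, 1, 1, 0]]),
    np.array([[1, 0, 0, 0]]),
]

soft_label = average_fusion(raters)                # fraction of raters marking each voxel
hard_label = (soft_label >= 0.5).astype(np.uint8)  # 0.5 threshold is an assumption

rng = np.random.default_rng(0)
sampled_label = random_sampling_fusion(raters, rng)
```

The averaged label keeps the fraction of raters who marked each voxel, whereas thresholding it discards that information; random sampling exposes the model to each rater's opinion over the course of training instead of a single fused target.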
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Explainable Artificial Intelligence (XAI) · Reliability and Agreement in Measurement
