How inter-rater variability relates to aleatoric and epistemic uncertainty: a case study with deep learning-based paraspinal muscle segmentation

Parinaz Roshanzamir, Hassan Rivaz, Joshua Ahn, Hamza Mirza, Neda Naghdi, Meagan Anstruther, Michele C. Battié, Maryse Fortin, and Yiming Xiao

arXiv:2308.06964 · eess.IV · August 15, 2023 · 1 citation

Open Access

TL;DR

This study investigates how inter-rater variability influences uncertainty in deep learning-based medical image segmentation, comparing different models and label fusion strategies to improve reliability in clinical applications.

Contribution

It provides a detailed analysis of the relationship between inter-rater variability and model uncertainties, comparing Transformer-based and CNN models with various label fusion methods.
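Relating inter-rater variability to model uncertainty first requires a number for the variability itself. The summary does not say which measure the paper uses, so the following is only a minimal sketch under an assumed, commonly used choice: the mean pairwise Dice coefficient between raters' binary masks for one class. Masks are flat lists of 0/1 labels, and the function names `dice` and `mean_pairwise_dice` are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice coefficient between two binary masks of equal length."""
    if len(a) != len(b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Two empty masks are treated as perfect agreement (a convention chosen here).
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_pairwise_dice(masks: Sequence[Sequence[int]]) -> float:
    """Average Dice over all unordered rater pairs; lower means more variability."""
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical masks from three raters for one muscle class.
    rater_masks = [
        [0, 1, 1, 1, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 0, 1, 1, 1, 0],
    ]
    print(round(mean_pairwise_dice(rater_masks), 3))  # 0.622
```

In a multi-class setting such as paraspinal muscle segmentation, this would typically be computed per class and reported separately, since some muscles may be much harder to delineate consistently than others.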

Findings

01. Inter-rater variability affects both aleatoric and epistemic uncertainties.

02. Transformer-based models such as TransUNet exhibit different uncertainty behavior than CNNs.

03. The choice of label fusion strategy affects the reliability of segmentation models.
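Label fusion merges several raters' segmentations into a single training label. This summary does not name the fusion strategies the paper compares. The sketch below therefore assumes per-voxel majority voting over multi-class label maps, one widely used option, and is not presented as either of the paper's strategies. The tie-breaking rule (smallest class index wins) is an arbitrary choice, and `majority_vote` is a hypothetical name.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(label_maps: Sequence[Sequence[int]]) -> List[int]:
    """Fuse per-voxel class labels from several raters by majority vote.

    Ties are broken in favour of the smallest class index (arbitrary choice).
    """
    if not label_maps:
        raise ValueError("need at least one rater")
    n_voxels = len(label_maps[0])
    if any(len(m) != n_voxels for m in label_maps):
        raise ValueError("all label maps must have the same length")
    fused = []
    for voxel_labels in zip(*label_maps):
        counts = Counter(voxel_labels)
        best = max(counts.values())
        # Among the labels with the highest vote count, keep the smallest.
        fused.append(min(label for label, c in counts.items() if c == best))
    return fused


if __name__ == "__main__":
    # Hypothetical label maps (0 = background, 1..3 = muscle classes).
    raters = [
        [0, 1, 2, 2, 0],
        [0, 1, 1, 2, 3],
        [0, 2, 1, 2, 0],
    ]
    print(majority_vote(raters))  # [0, 1, 1, 2, 0]
```

Majority voting discards information about how strongly raters disagreed at each voxel. Probabilistic alternatives such as STAPLE instead estimate per-rater performance, which is one reason the fusion choice can change what a model learns near ambiguous boundaries.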

Abstract

Recent developments in deep learning (DL) techniques have led to great performance improvements in medical image segmentation tasks, especially with the latest Transformer model and its variants. While labels from fusing multi-rater manual segmentations are often employed as ideal ground truths in DL model training, inter-rater variability due to factors such as training bias, image noise, and extreme anatomical variability can still affect the performance and uncertainty of the resulting algorithms. Knowledge regarding how inter-rater variability affects the reliability of the resulting DL algorithms, a key element in clinical deployment, can help inform better training data construction and DL models, but has not been explored extensively. In this paper, we measure aleatoric and epistemic uncertainties using test-time augmentation (TTA), test-time dropout (TTD), and deep ensemble to…
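TTA, TTD, and deep ensembles all produce several softmax predictions for the same image: from augmented copies of the input, from repeated passes with dropout left active, or from independently trained models, respectively. By a common convention, TTA spread is read as aleatoric (data) uncertainty, while TTD and ensemble spread are read as epistemic (model) uncertainty. The excerpt above does not state the uncertainty metric, so this minimal sketch assumes voxel-wise entropy of the mean prediction. It also includes a toy flip-based TTA step on a 1-D signal. `predict` stands in for a hypothetical segmentation model, and none of this is the paper's implementation.

```python
import math
from typing import Callable, List, Sequence

Probs = List[List[float]]  # [voxel][class] softmax probabilities


def mean_probs(samples: Sequence[Probs]) -> Probs:
    """Average N stochastic predictions voxel-wise and class-wise."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    n_vox, n_cls = len(samples[0]), len(samples[0][0])
    return [
        [sum(s[v][c] for s in samples) / n for c in range(n_cls)]
        for v in range(n_vox)
    ]


def entropy_map(samples: Sequence[Probs]) -> List[float]:
    """Voxel-wise predictive entropy (natural log) of the mean prediction."""
    return [
        sum(-p * math.log(p) for p in voxel if p > 0.0)
        for voxel in mean_probs(samples)
    ]


def tta_flip_samples(
    image: List[float], predict: Callable[[List[float]], Probs]
) -> List[Probs]:
    """Toy TTA on a 1-D signal: identity plus left-right flip.

    The flipped prediction is flipped back so voxels stay aligned.
    """
    original = predict(image)
    flipped = predict(image[::-1])[::-1]
    return [original, flipped]


if __name__ == "__main__":
    # Two hypothetical samples: they agree on voxel 0 and disagree on voxel 1.
    samples = [
        [[1.0, 0.0], [1.0, 0.0]],
        [[1.0, 0.0], [0.0, 1.0]],
    ]
    print([round(h, 3) for h in entropy_map(samples)])  # [0.0, 0.693]
```

With TTD, the same `entropy_map` would be applied to predictions from repeated forward passes of one network with dropout enabled. With a deep ensemble, it would be applied to one prediction per trained member.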

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Layer Normalization · Adam · Softmax · Label Smoothing · Position-Wise Feed-Forward Layer · Residual Connection