Confidence Matters: Uncertainty Quantification and Precision Assessment of Deep Learning-based CMR Biomarker Estimates Using Scan-rescan Data

Dewmini Hasara Wickremasinghe; Michelle Gibogwe; Andrew Bell; Esther Puyol-Antón; Muhummad Sohaib Nazir; Reza Razavi; Bruno Paun; Paul Aljabar; Andrew P. King

arXiv:2603.26789 · cs.CV · March 31, 2026

TL;DR

This paper applies uncertainty estimation techniques (deep ensemble, test-time augmentation, and Monte Carlo dropout) to a deep learning pipeline for cardiac MRI biomarker estimation, and shows that distribution-based metrics expose limits in scan-rescan precision that accuracy and point-estimate measures miss.

Contribution

It introduces new distribution-based metrics for biomarker precision assessment and demonstrates their importance in evaluating deep learning model reliability.

Findings

01

High accuracy (average Dice 87%) and point-estimate precision achieved on two external scan-rescan validation datasets.

02

Scan and rescan confidence intervals overlapped by more than 50% in fewer than 45% of cases.

03

Statistical similarity tests found significant differences between scan and rescan biomarkers in over 65% of cases.

Abstract

The performance of deep learning (DL) methods for the analysis of cine cardiovascular magnetic resonance (CMR) is typically assessed in terms of accuracy, overlooking precision. In this work, uncertainty estimation techniques, namely deep ensemble, test-time augmentation, and Monte Carlo dropout, are applied to a state-of-the-art DL pipeline for cardiac functional biomarker estimation, and new distribution-based metrics are proposed for the assessment of biomarker precision. The model achieved high accuracy (average Dice 87%) and point estimate precision on two external validation scan-rescan CMR datasets. However, distribution-based metrics showed that the overlap between scan/rescan confidence intervals was >50% in less than 45% of the cases. Statistical similarity tests between scan and rescan biomarkers also resulted in significant differences for over 65% of the cases. We conclude…
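To make the confidence-interval overlap idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not define the metric, so the percentile-based interval, the normalisation by the shorter interval, and the function names `percentile`, `confidence_interval`, and `overlap_fraction` are all assumptions made for illustration. The input samples stand in for the spread of biomarker values that an uncertainty method (for example, an ensemble or repeated stochastic forward passes) might produce for one subject.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of pre-sorted values, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def confidence_interval(samples, level=0.95):
    """Empirical percentile interval (assumed form; the paper may differ)."""
    s = sorted(samples)
    alpha = (1 - level) / 2
    return percentile(s, alpha), percentile(s, 1 - alpha)


def overlap_fraction(ci_a, ci_b):
    """Intersection length divided by the shorter interval's length.

    Normalising by the shorter interval is an assumption; a zero-width
    interval is treated as having no overlap.
    """
    intersection = max(0.0, min(ci_a[1], ci_b[1]) - max(ci_a[0], ci_b[0]))
    shorter = min(ci_a[1] - ci_a[0], ci_b[1] - ci_b[0])
    return intersection / shorter if shorter > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical ejection-fraction samples (%) for one subject.
    scan_samples = [58.1, 59.4, 60.2, 57.8, 59.9, 60.5, 58.7, 59.1]
    rescan_samples = [61.0, 62.3, 60.8, 61.7, 62.9, 61.2, 60.4, 62.0]
    scan_ci = confidence_interval(scan_samples)
    rescan_ci = confidence_interval(rescan_samples)
    print("scan CI:", scan_ci)
    print("rescan CI:", rescan_ci)
    print("overlap > 50%:", overlap_fraction(scan_ci, rescan_ci) > 0.5)
```

Under this sketch, a subject would count toward the ">50% overlap" group when `overlap_fraction` exceeds 0.5. Two point estimates can agree closely while their intervals overlap little, which is the kind of discrepancy the abstract reports.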
