Calibrated Confidence Expression for Radiology Report Generation

David Bani-Harouni; Chantal Pellegrini; Julian Lüers; Su Hwan Kim; Markus Baalmann; Benedikt Wiestler; Rickmer Braren; Nassir Navab; Matthias Keicher

arXiv:2603.29492 · cs.CL · April 1, 2026

TL;DR

This paper presents ConRad, a reinforcement learning framework that fine-tunes medical vision-language models to state calibrated confidence estimates alongside the radiology reports they generate, so that radiologists can tell which outputs need closer review.

Contribution

The paper introduces an RL-based method for producing calibrated verbalized confidence in radiology reports. The resulting confidence estimates are better calibrated than those of existing models and align with clinical judgment.

Findings

01 ConRad significantly improves calibration of confidence estimates.

02 Report-level scores align well with clinicians' judgments.

03 Supports safer AI-assisted report generation through targeted review.

Abstract

Safe deployment of Large Vision-Language Models (LVLMs) in radiology report generation requires not only accurate predictions but also clinically interpretable indicators of when outputs should be thoroughly reviewed, enabling selective radiologist verification and reducing the risk of hallucinated findings influencing clinical decisions. One intuitive approach to this is verbalized confidence, where the model explicitly states its certainty. However, current state-of-the-art language models are often overconfident, and research on calibration in multimodal settings such as radiology report generation is limited. To address this gap, we introduce ConRad (Confidence Calibration for Radiology Reports), a reinforcement learning framework for fine-tuning medical LVLMs to produce calibrated verbalized confidence estimates alongside radiology reports. We study two settings: a single…
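To make "calibrated" concrete: a model is calibrated when its stated confidence matches how often it is actually right. The sketch below computes expected calibration error (ECE), one common calibration metric. The abstract as shown here does not say which metrics the paper uses, so ECE, the function name `expected_calibration_error`, the equal-width binning, and the binary notion of a "correct" report are all illustrative assumptions, not the authors' evaluation protocol.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Illustrative ECE: weighted mean gap between confidence and accuracy.

    Assumption: each report has one verbalized confidence in [0, 1] and a
    binary correctness label. This is a simplification for illustration.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one sample is required")

    # Group samples into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    # Sum |mean confidence - accuracy| per bin, weighted by bin size.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

Under these assumptions, a model that says 0.9 on four reports but is right on only two of them has a gap of about 0.4, which is the kind of overconfidence the abstract describes; a model whose stated confidence matches its hit rate scores 0.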
