Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning

Benjamin Gundersen; Nicolas Deperrois; Samuel Ruiperez-Campillo; Thomas M. Sutter; Julia E. Vogt; Michael Moor; Farhad Nooralahzadeh; Michael Krauthammer

arXiv:2512.10691 · cs.AI · December 12, 2025

TL;DR

This paper shows that reinforcement learning (RL) with task-specific rewards improves radiology report generation and visual grounding in a chest X-ray vision-language model, outperforming supervised fine-tuning (SFT) alone.

Contribution

The study applies Group Relative Policy Optimization (GRPO) with clinically grounded, task-specific rewards to a chest X-ray vision-language model (an updated RadVLM built on Qwen3-VL), yielding performance gains over its SFT-only baselines.

Findings

1. RL provides additional performance gains beyond supervised fine-tuning.
2. Explicit thinking does not significantly improve results on these CXR tasks.
3. RL-optimized models achieve state-of-the-art performance on report generation and visual grounding.

Abstract

Recent advances in vision-language models (VLMs) have improved Chest X-ray (CXR) interpretation in multiple aspects. However, many medical VLMs rely solely on supervised fine-tuning (SFT), which optimizes next-token prediction without evaluating answer quality. In contrast, reinforcement learning (RL) can incorporate task-specific feedback, and its combination with explicit intermediate reasoning ("thinking") has demonstrated substantial gains on verifiable math and coding tasks. To investigate the effects of RL and thinking in a CXR VLM, we perform large-scale SFT on CXR data to build an updated RadVLM based on Qwen3-VL, followed by a cold-start SFT stage that equips the model with basic thinking ability. We then apply Group Relative Policy Optimization (GRPO) with clinically grounded, task-specific rewards for report generation and visual grounding, and run matched RL experiments on…
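The abstract names GRPO with task-specific rewards but, in the truncated text here, does not give the reward formulas. As a minimal sketch under that caveat, the snippet below assumes an intersection-over-union (IoU) reward for visual grounding (a common choice, not confirmed as the paper's) and shows the group-relative advantage at the core of GRPO: each sampled answer's reward is standardized against the other samples drawn for the same prompt, so no learned value model is needed. The function names `iou` and `group_relative_advantages` are hypothetical, and using the population standard deviation is an illustrative choice.

```python
from statistics import mean, pstdev

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes.

    Assumption: used here as the grounding reward; the paper's actual
    reward may differ.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against identical rewards
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: four sampled boxes for one prompt, scored against one reference box.
gt: Box = (10, 10, 50, 50)
samples: list[Box] = [(10, 10, 50, 50), (20, 20, 60, 60), (0, 0, 5, 5), (12, 8, 48, 52)]
rewards = [iou(s, gt) for s in samples]
advantages = group_relative_advantages(rewards)
```

Here the exact match receives the largest positive advantage and the non-overlapping box the most negative one. A report-generation reward would plug into the same advantage function; only the per-sample scoring changes.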

Taxonomy

Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning