Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning

Qin Zhou; Guoyan Liang; Qianyi Yang; Jingyuan Chen; Sai Wu; Chang Yao; Zhe Wang

arXiv:2604.13598 · cs.LG · April 16, 2026

TL;DR

This paper introduces ESC-RL, a reinforcement learning framework for radiology report generation that combines evidence-aware rewards with self-correcting preference learning to improve clinical faithfulness and alignment with clinical preferences.

Contribution

It proposes an evidence-aware reward mechanism (the Group-wise Evidence-aware Alignment Reward, GEAR) and a self-correcting preference learning strategy (SPL) that together aim to improve clinical accuracy and enable continual self-improvement in report generation.

Findings

1. Achieves state-of-the-art results on two chest X-ray datasets.
2. Demonstrates improved clinical faithfulness and disease alignment.
3. Supports continual self-improvement during training.

Abstract

Recent reinforcement learning (RL) approaches have advanced radiology report generation (RRG), yet two core limitations persist: (1) report-level rewards offer limited evidence-grounded guidance for clinical faithfulness; and (2) current methods lack an explicit self-improving mechanism to align with clinical preference. We introduce clinically aligned Evidence-aware Self-Correcting Reinforcement Learning (ESC-RL), comprising two key components. First, a Group-wise Evidence-aware Alignment Reward (GEAR) delivers group-wise, evidence-aware feedback. GEAR reinforces consistent grounding for true positives, recovers missed findings for false negatives, and suppresses unsupported content for false positives. Second, a Self-correcting Preference Learning (SPL) strategy automatically constructs a reliable, disease-aware preference dataset from multiple noisy observations and leverages an LLM…
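The abstract describes GEAR only at a high level: reward true positives, recover false negatives, and suppress false positives, with feedback computed over a group of sampled reports. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each report has already been reduced to a set of finding labels, uses hypothetical weights `W_TP`, `W_FN`, `W_FP`, and substitutes plain label matching for the paper's evidence grounding. "Group-wise" is read here as a GRPO-style normalization of rewards within the group of samples drawn for one image; that reading is also an assumption.

```python
from statistics import mean, pstdev

# Hypothetical weights; the abstract does not specify how TP/FN/FP terms are weighted.
W_TP = 1.0  # reward for each finding that matches the reference (true positive)
W_FN = 0.5  # penalty for each reference finding the report misses (false negative)
W_FP = 0.5  # penalty for each finding absent from the reference (false positive)


def evidence_reward(predicted: set[str], reference: set[str]) -> float:
    """Per-report reward from label-level TP/FN/FP counts (a simplification of grounding)."""
    tp = len(predicted & reference)  # findings correctly reported
    fn = len(reference - predicted)  # findings the report missed
    fp = len(predicted - reference)  # findings the report made up
    return W_TP * tp - W_FN * fn - W_FP * fp


def group_advantages(group: list[set[str]], reference: set[str]) -> list[float]:
    """Normalize rewards within one group of sampled reports (GRPO-style assumption)."""
    rewards = [evidence_reward(p, reference) for p in group]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Small epsilon avoids division by zero when all samples score the same.
    return [(r - mu) / (sigma + 1e-6) for r in rewards]


if __name__ == "__main__":
    reference = {"cardiomegaly", "pleural effusion"}
    group = [
        {"cardiomegaly", "pleural effusion"},  # complete and supported
        {"cardiomegaly"},                      # misses one finding
        {"cardiomegaly", "pneumothorax"},      # misses one, adds an unsupported one
    ]
    for report, adv in zip(group, group_advantages(group, reference)):
        print(sorted(report), round(adv, 3))
```

In this toy group the complete report receives the highest advantage and the report with both a missed and an unsupported finding receives the lowest, which mirrors the three behaviours the abstract attributes to GEAR.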
