Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation
Kaito Baba, Risa Kishikawa, Satoshi Kodera

TL;DR
MARL-Rad is a multi-modal multi-agent reinforcement learning framework that improves radiology report generation by jointly optimizing region-specific agents and a global integrating agent within the deployed radiology workflow.
Contribution
It introduces a multi-agent reinforcement learning approach that trains the entire agentic system on policy with clinically verifiable rewards, instead of relying on fixed LLMs arranged into hand-designed workflows (post-hoc agentization).
Findings
Achieves state-of-the-art clinical efficacy on MIMIC-CXR and IU X-ray, measured by RadGraph, CheXbert, and GREEN scores.
Improves laterality consistency and produces more accurate and detailed reports.
Clinicians find reports produced by MARL-Rad clinically comparable to ground truth.
Abstract
We propose MARL-Rad, a multi-modal multi-agent reinforcement learning framework for radiology report generation that trains the entire agentic system on policy within its deployed radiology workflow. MARL-Rad addresses the limitation of post-hoc agentization, where fixed LLMs are organized into hand-designed agentic workflows without being optimized for their assigned roles. Our framework decomposes chest X-ray interpretation into region-specific agents and a global integrating agent, and jointly optimizes them using clinically verifiable rewards. Experiments on the MIMIC-CXR and IU X-ray datasets show that MARL-Rad consistently improves clinical efficacy metrics such as RadGraph, CheXbert, and GREEN scores, achieving state-of-the-art clinical efficacy performance. Further analyses show that MARL-Rad improves laterality consistency and produces more accurate and detailed reports. A…
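The decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the region names, the stub agents, the token-overlap reward (a toy stand-in for clinical metrics such as RadGraph or CheXbert), and the choice to give every agent the same report-level reward are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    agent: str   # name of the agent that produced this output
    output: str  # text generated by that agent


def run_workflow(
    image: str,
    region_agents: Dict[str, Callable[[str], str]],
    global_agent: Callable[[Dict[str, str]], str],
) -> Tuple[str, List[Step]]:
    """Run every region agent, then let the global agent integrate the findings."""
    steps: List[Step] = []
    findings: Dict[str, str] = {}
    for region, agent in region_agents.items():
        text = agent(image)
        findings[region] = text
        steps.append(Step(agent=region, output=text))
    report = global_agent(findings)
    steps.append(Step(agent="global", output=report))
    return report, steps


def token_f1(candidate: str, reference: str) -> float:
    """Toy reward: F1 over unique lowercase tokens (assumption, not a clinical metric)."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def assign_shared_reward(steps: List[Step], reward: float) -> Dict[str, float]:
    """Assumed credit assignment: each agent receives the same report-level reward."""
    return {step.agent: reward for step in steps}


# Hypothetical stub agents standing in for trained vision-language policies.
region_agents = {
    "lungs": lambda image: "lungs are clear",
    "heart": lambda image: "heart size is normal",
}
global_agent = lambda findings: " ".join(findings.values())

report, steps = run_workflow("chest_xray_placeholder", region_agents, global_agent)
reward = token_f1(report, "lungs are clear heart size is normal")
credit = assign_shared_reward(steps, reward)
```

In a real training loop the stubs would be replaced by learned policies, and the per-agent rewards in `credit` would feed a policy-gradient update for each agent; the source does not specify which update rule is used.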
