CXRMate-2: Structured Multimodal Temporal Embeddings and Tractable Reinforcement Learning for Clinically Acceptable Chest X-ray Radiology Report Generation
Aaron Nicolson, Elizabeth J. Cooper, Hwan-Jin Yoon, Claire McCafferty, Ramya Krishnan, Michelle Craigie, Nivene Saad, Jason Dowling, Ian A. Scott, and Bevan Koopman

TL;DR
CXRMate-2 is a novel CXR report generation model that combines structured multimodal embeddings and reinforcement learning to improve clinical relevance and radiologist acceptance.
Contribution
The paper introduces CXRMate-2, which conditions an LLM decoder on structured multimodal temporal embeddings and applies reinforcement learning (GRPO with a proposed reward function) to generate more clinically acceptable radiology reports.
Findings
CXRMate-2 achieves statistically significant improvements over strong benchmarks on the MIMIC-CXR, CheXpert Plus, and ReXgradient datasets.
In comparison with radiologist-written reports, the generated reports were rated acceptable in 45% of cases.
Radiologists preferred reports for readability, with similar acceptance rates for most findings.
Abstract
Chest X-ray (CXR) radiology report generation (RRG) models have shown rapid progress on automated metrics, yet their clinical utility remains uncertain due to limited qualitative evaluation by radiologists. We present CXRMate-2, a state-of-the-art CXR RRG model that enables tractable reinforcement learning (RL) through structured multimodal temporal embeddings and high-resolution visual feature compression, for efficient, unified conditioning of an LLM decoder on visual, textual, and temporal context from a study and its prior. This enables group relative policy optimisation (GRPO), where a proposed reward function is used to improve semantic alignment with radiologist reports. Across the MIMIC-CXR, CheXpert Plus, and ReXgradient datasets, CXRMate-2 achieves statistically significant improvements over strong benchmarks, including gains of 11.2% and 24.4% in GREEN and RadGraph-XL,…
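The group-relative step that gives GRPO its name can be illustrated with a minimal sketch. It assumes the commonly used formulation, in which several candidate reports are sampled for the same study and each reward is normalised by the mean and standard deviation of its own group; whether CXRMate-2 uses exactly this normalisation is not stated in the text above. The function name and the reward values are hypothetical placeholders, not the paper's reward function.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against the other samples in its group.

    `rewards` holds one scalar per candidate report sampled for the
    same study (hypothetical values; the paper's reward is not shown).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps avoids division by zero when all rewards in the group are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four candidate reports for one study, scored by a placeholder reward.
rewards = [0.2, 0.5, 0.8, 0.5]
advantages = group_relative_advantages(rewards)
```

Candidates scoring above the group mean receive a positive advantage and those below receive a negative one, so no separate learned value model is needed to provide a baseline.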
