Bridging Vision and Language: Optimal Transport-Driven Radiology Report Generation via LLMs
Haifeng Zhao, Yufei Zhang, Leilei Ma, Shuo Xu, Dengdi Sun

TL;DR
This paper introduces OTDRG, a novel framework that uses Optimal Transport to align X-ray image features with disease labels, enhancing the clinical accuracy and linguistic quality of radiology reports generated by LLMs.
Contribution
The paper proposes a new OT-based alignment method and a disease prediction module to improve the clinical relevance of LLM-generated radiology reports.
Findings
Achieves state-of-the-art results on the MIMIC-CXR and IU X-Ray datasets.
Reports are both linguistically coherent and clinically accurate.
Outperforms existing methods in natural language generation and clinical efficacy metrics.
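Clinical efficacy (CE) metrics in chest X-ray report generation are commonly computed by running a rule-based labeler (for example, the CheXpert labeler) over generated and reference reports and comparing the extracted disease labels. The sketch below assumes the labels have already been extracted as binary vectors. The function name `ce_metrics` is hypothetical, and micro-averaging is an assumption, since this summary does not state which averaging the paper uses.

```python
def ce_metrics(pred_labels, true_labels):
    """Micro-averaged precision, recall, and F1 over binary disease-label vectors.

    pred_labels / true_labels: one 0/1 vector per report, where each position
    is one disease observation (assumed already extracted by a labeler).
    """
    tp = fp = fn = 0
    for pred, true in zip(pred_labels, true_labels):
        for p, t in zip(pred, true):
            tp += int(p == 1 and t == 1)  # label predicted and present
            fp += int(p == 1 and t == 0)  # label predicted but absent
            fn += int(p == 0 and t == 1)  # label present but missed
    # Guard against division by zero when no positives exist.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Two reports, three hypothetical observations each.
preds = [[1, 0, 1], [0, 1, 0]]
trues = [[1, 0, 0], [0, 1, 1]]
print(ce_metrics(preds, trues))  # 2 TP, 1 FP, 1 FN
```

Micro-averaging pools counts across all observations, so frequent findings dominate the score; a macro average (per-observation F1, then mean) would weight rare findings equally.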
Abstract
Radiology report generation is a significant application of medical AI and has achieved impressive results. Concurrently, large language models (LLMs) have demonstrated remarkable performance across various domains. However, empirical validation indicates that general-purpose LLMs tend to prioritize linguistic fluency over clinical effectiveness and fail to capture the relationship between X-ray images and their corresponding texts, resulting in poor clinical practicability. To address these challenges, we propose Optimal Transport-Driven Radiology Report Generation (OTDRG), a novel framework that leverages Optimal Transport (OT) to align image features with disease labels extracted from reports, effectively bridging the cross-modal gap. The core component of OTDRG is Alignment & Fine-Tuning, where OT utilizes results from the encoding of…
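The summary states only that OT aligns image features with disease labels; it does not give the objective. As a minimal sketch under stated assumptions (entropic-regularized OT solved with Sinkhorn iterations, uniform marginals, and a cosine cost between image feature vectors and disease-label embeddings), an alignment loss can be computed as the expected cost under the transport plan. This is a generic OT formulation, not the authors' published method, and `cosine_cost`, `sinkhorn`, and `ot_alignment_loss` are hypothetical names.

```python
import math


def cosine_cost(a, b):
    """Cost = 1 - cosine similarity (assumed cost function)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb + 1e-12)


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic OT plan between uniform marginals via Sinkhorn scaling."""
    n, m = len(cost), len(cost[0])
    mu = [1.0 / n] * n  # assumption: uniform mass over image features
    nu = [1.0 / m] * m  # assumption: uniform mass over label embeddings
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale so row sums match mu and column sums match nu.
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_alignment_loss(image_feats, label_embs, eps=0.1):
    """Expected cost <P, C> under the Sinkhorn plan (hypothetical loss)."""
    cost = [[cosine_cost(f, l) for l in label_embs] for f in image_feats]
    plan = sinkhorn(cost, eps)
    return sum(plan[i][j] * cost[i][j]
               for i in range(len(cost)) for j in range(len(cost[0])))


# Toy 2-D features: each image feature sits close to one label embedding.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
label_embs = [[0.9, 0.1], [0.1, 0.9]]
print(ot_alignment_loss(image_feats, label_embs))  # small: a near-perfect match exists
```

Because OT solves a global matching, the loss does not depend on the order in which labels are listed; it is low whenever some assignment pairs each image feature with a nearby label embedding. Minimizing it with respect to encoder parameters would pull the two modalities together, which is the intuition behind using OT to bridge the cross-modal gap.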
