Bridging Vision and Language: Optimal Transport-Driven Radiology Report Generation via LLMs

Haifeng Zhao; Yufei Zhang; Leilei Ma; Shuo Xu; Dengdi Sun

arXiv:2507.03908 · cs.CV · July 8, 2025

TL;DR

This paper introduces OTDRG, a novel framework that uses Optimal Transport to align X-ray image features with disease labels, enhancing the clinical accuracy and linguistic quality of radiology reports generated by LLMs.

Contribution

The paper proposes a new OT-based alignment method and a disease prediction module to improve the clinical relevance of LLM-generated radiology reports.

Findings

01

Achieves state-of-the-art results on MIMIC-CXR and IU X-Ray datasets.

02

Reports are both linguistically coherent and clinically accurate.

03

Outperforms existing methods in natural language generation and clinical efficacy metrics.

Abstract

Radiology report generation represents a significant application within medical AI and has achieved impressive results. Concurrently, large language models (LLMs) have demonstrated remarkable performance across various domains. However, empirical validation indicates that general LLMs tend to focus more on linguistic fluency than on clinical effectiveness, and lack the ability to effectively capture the relationship between X-ray images and their corresponding texts, resulting in poor clinical practicability. To address these challenges, we propose Optimal Transport-Driven Radiology Report Generation (OTDRG), a novel framework that leverages Optimal Transport (OT) to align image features with disease labels extracted from reports, effectively bridging the cross-modal gap. The core component of OTDRG is Alignment & Fine-Tuning, where OT utilizes results from the encoding of…
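
To make the idea of OT-based alignment concrete, here is a minimal sketch of entropic-regularized optimal transport (the generic Sinkhorn iteration) between a set of image features and a set of disease-label embeddings. It is not the authors' implementation: the abstract shown here does not specify their cost function, marginals, or solver, so the cosine-distance cost, uniform marginals, random stand-in features, and the names `sinkhorn_plan`, `image_feats`, and `label_embs` are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, eps=0.5, n_iters=500):
    """Entropic-regularized OT plan between histograms a (rows) and b (columns).

    Generic Sinkhorn iteration; an illustrative stand-in, not the paper's solver.
    """
    K = np.exp(-cost / eps)              # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # rescale rows toward marginal a
        v = b / (K.T @ u)                # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]   # transport plan, shape (len(a), len(b))


# Hypothetical inputs: random vectors standing in for real encoder outputs.
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(6, 16))   # assumed: 6 image-region features
label_embs = rng.normal(size=(4, 16))    # assumed: 4 disease-label embeddings

# Assumed cost: cosine distance between L2-normalized features.
img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
lab = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
cost = 1.0 - img @ lab.T

# Assumed marginals: uniform mass over regions and over labels.
a = np.full(image_feats.shape[0], 1.0 / image_feats.shape[0])
b = np.full(label_embs.shape[0], 1.0 / label_embs.shape[0])

plan = sinkhorn_plan(cost, a, b)
ot_cost = float((plan * cost).sum())     # scalar that could serve as an alignment loss
```

Each entry `plan[i, j]` can be read as the amount of mass moved from image feature `i` to label `j`. In a training setup, a quantity like `ot_cost` might be minimized to pull matched features and labels together, although how OTDRG actually uses the transport result is not stated in the truncated abstract.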
