Learnable Retrieval Enhanced Visual-Text Alignment and Fusion for Radiology Report Generation
Qin Zhou, Guoyan Liang, Xindi Li, Jingyuan Chen, Wang Zhe, Chang Yao, Sai Wu

TL;DR
REVTAF is a framework for radiology report generation that combines learnable retrieval, fine-grained visual-text alignment, and dynamic fusion to address disease class imbalance and improve cross-modal integration.
Contribution
The paper proposes REVTAF, a new framework integrating adaptive retrieval and optimal transport-based alignment for improved report generation under weak supervision.
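The summary says only that REVTAF's alignment is "optimal transport-based"; it does not give the exact objective. The sketch below is a hedged illustration of the general idea. It uses standard entropic optimal transport (Sinkhorn-Knopp) with uniform marginals and a cosine cost between hypothetical image-region features and report-token features. It then computes, for each region, a transport-weighted average of token features as a simple fused text representation. The function names, `eps`, iteration count, and toy features are all assumptions and are not the authors' settings.

```python
import math
from typing import List

Matrix = List[List[float]]

def _norm(v: List[float]) -> float:
    # Guard against zero vectors so cosine similarity stays defined.
    return math.sqrt(sum(a * a for a in v)) or 1.0

def cosine_cost(regions: Matrix, tokens: Matrix) -> Matrix:
    """Cost C[i][j] = 1 - cos(region_i, token_j); lower cost = better match."""
    return [
        [1.0 - sum(a * b for a, b in zip(x, y)) / (_norm(x) * _norm(y)) for y in tokens]
        for x in regions
    ]

def sinkhorn(cost: Matrix, eps: float = 0.1, iters: int = 200) -> Matrix:
    """Entropic OT plan with uniform marginals via Sinkhorn-Knopp scaling."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on image regions (assumption)
    b = [1.0 / m] * m  # uniform mass on text tokens (assumption)
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the target marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def align_tokens_to_regions(plan: Matrix, tokens: Matrix) -> Matrix:
    """For each region, the transport-weighted average of token features."""
    dim = len(tokens[0])
    out = []
    for row in plan:
        mass = sum(row)
        out.append([sum(row[j] * tokens[j][d] for j in range(len(tokens))) / mass for d in range(dim)])
    return out

# Toy example with hypothetical 2-D features.
regions = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.9, 0.1], [0.1, 0.9]]
plan = sinkhorn(cosine_cost(regions, tokens))
aligned = align_tokens_to_regions(plan, tokens)
```

In this toy case, almost all transport mass lands on the diagonal, so each region is paired with the token whose feature points the same way. The entropic term (`eps`) trades sharpness of the matching for smooth, differentiable gradients. That trade-off is the usual reason Sinkhorn is chosen inside a trainable model.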
Findings
Outperforms state-of-the-art methods, with a 7.4% improvement on the MIMIC-CXR dataset
Achieves a 2.9% improvement on the IU X-Ray dataset
Demonstrates superiority over mainstream multimodal LLMs in radiology report tasks
Abstract
Automated radiology report generation is essential for improving diagnostic efficiency and reducing the workload of medical professionals. However, existing methods face significant challenges, such as disease class imbalance and insufficient cross-modal fusion. To address these issues, we propose the learnable Retrieval Enhanced Visual-Text Alignment and Fusion (REVTAF) framework, which effectively tackles both class imbalance and visual-text fusion in report generation. REVTAF incorporates two core components: (1) a Learnable Retrieval Enhancer (LRE) that utilizes semantic hierarchies from hyperbolic space and intra-batch context through a ranking-based metric. LRE adaptively retrieves the most relevant reference reports, enhancing image representations, particularly for underrepresented (tail) class inputs; and (2) a fine-grained visual-text alignment and fusion strategy that ensures…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
