R2GenGPT: Radiology Report Generation with Frozen LLMs
Zhanyu Wang, Lingqiao Liu, Lei Wang, Luping Zhou

TL;DR
R2GenGPT leverages frozen large language models with a visual alignment module to generate radiology reports efficiently, achieving state-of-the-art results with minimal additional training.
Contribution
The paper introduces R2GenGPT, a method that maps visual features into the word embedding space of an LLM through a lightweight alignment module, enabling radiology report generation while the LLM's own parameters stay frozen.
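To make the alignment idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes the alignment module is a single linear projection, and the function names (`make_alignment_layer`, `align`, `build_llm_inputs`, `count_trainable`) and the toy dimensions are hypothetical. The frozen LLM is not modeled; the sketch only shows how projected visual tokens could be placed in the same sequence as text embeddings.

```python
import random

def make_alignment_layer(visual_dim, llm_dim, seed=0):
    """Create weights and biases for one linear projection.

    Assumption: this projection stands in for the only trainable part;
    the LLM that consumes its output is treated as frozen.
    """
    rng = random.Random(seed)
    scale = visual_dim ** -0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(visual_dim)]
              for _ in range(llm_dim)]
    bias = [0.0] * llm_dim
    return weight, bias

def align(visual_tokens, weight, bias):
    """Project each visual token from visual_dim to llm_dim."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b
         for row, b in zip(weight, bias)]
        for token in visual_tokens
    ]

def build_llm_inputs(visual_tokens, text_embeddings, weight, bias):
    """Prepend the projected visual tokens to the text-prompt embeddings."""
    return align(visual_tokens, weight, bias) + text_embeddings

def count_trainable(weight, bias):
    """Number of parameters in the projection (weights plus biases)."""
    return sum(len(row) for row in weight) + len(bias)
```

With a hypothetical `visual_dim` of 4 and `llm_dim` of 6, three image tokens and two text embeddings yield a sequence of five vectors of width 6, and the projection holds 4 × 6 + 6 = 30 trainable parameters. The same counting rule applies at realistic dimensions.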
Findings
Achieves state-of-the-art performance on radiology report generation (R2Gen).
Trains only 5 million parameters, about 0.07% of the total parameter count.
Demonstrates high training efficiency and rapid convergence.
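As a quick sanity check on the parameter figures above, the two reported numbers (5 million trainable parameters, 0.07% of the total) imply a total model size. The sketch below does that arithmetic; both inputs are rounded values from the summary, so the result is only approximate, and the variable and function names are illustrative.

```python
# Figures as reported in the summary (both are rounded).
trainable_params = 5_000_000
trainable_fraction = 0.07 / 100  # 0.07% expressed as a fraction

# Total parameter count implied by the two figures.
implied_total = trainable_params / trainable_fraction

def trainable_share(trainable, total):
    """Return the trainable parameters as a percentage of the total."""
    return 100.0 * trainable / total
```

The implied total comes out at roughly 7.1 billion parameters, which means nearly all of the model's weights are left untouched during training.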
Abstract
Large Language Models (LLMs) have consistently showcased remarkable generalization capabilities when applied to various language tasks. Nonetheless, harnessing the full potential of LLMs for Radiology Report Generation (R2Gen) still presents a challenge, stemming from the inherent disparity in modality between LLMs and the R2Gen task. To bridge this gap effectively, we propose R2GenGPT, which is a novel solution that aligns visual features with the word embedding space of LLMs using an efficient visual alignment module. This innovative approach empowers the previously static LLM to seamlessly integrate and process image information, marking a step forward in optimizing R2Gen performance. R2GenGPT offers the following benefits. First, it attains state-of-the-art (SOTA) performance by training only the lightweight visual alignment module while freezing all the parameters of LLM. Second,…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
