Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation

Xi Zhang; Zaiqiao Meng; Jake Lever; Edmond S. L. Ho

arXiv:2412.04954·cs.CV·July 8, 2025

TL;DR

This paper presents a specialized visual language model that effectively generates detailed radiology reports from chest X-ray images by aligning vision encoders with a fine-tuned large language model.

Contribution

It introduces a novel radiology-focused visual language model that combines vision encoders with a fine-tuned LLM for accurate report generation from chest X-rays.

Findings

01. Model effectively generates radiology reports from chest X-rays.
02. Two-stage training improves alignment and report accuracy.
03. Demonstrates potential of multimodal LLMs in medical imaging.

Abstract

We introduce a radiology-focused visual language model designed to generate radiology reports from chest X-rays. Building on previous findings that large language models (LLMs) can acquire multimodal capabilities when aligned with pretrained vision encoders, we demonstrate similar potential with chest X-ray images. This integration enhances the model's ability to understand and describe chest X-ray images. Our model combines an image encoder with a fine-tuned LLM based on the Vicuna-7B architecture, enabling it to generate different sections of a radiology report with notable accuracy. Training follows a two-stage approach: (i) initial alignment of chest X-ray features with the LLM, followed by (ii) fine-tuning for radiology report generation.
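
The two-stage schedule described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The abstract does not say which parameters are frozen or updated in each stage, so the split below assumes a LLaVA-style recipe in which only an alignment layer is trained first and the LLM is unfrozen second. The component names (`image_encoder`, `projector`, `llm`) and the `Stage` and `trainable_flags` helpers are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical component names. The abstract only mentions an image encoder
# and a Vicuna-7B-based LLM; the "projector" is an assumed alignment layer.
COMPONENTS = ("image_encoder", "projector", "llm")


@dataclass(frozen=True)
class Stage:
    name: str
    objective: str
    trainable: frozenset  # components updated in this stage (assumed)


# Assumption: a LLaVA-style schedule. The abstract does not state which
# parameters are frozen or updated in either stage.
SCHEDULE = (
    Stage("alignment",
          "align chest X-ray features with the LLM",
          frozenset({"projector"})),
    Stage("finetune",
          "radiology report generation",
          frozenset({"projector", "llm"})),
)


def trainable_flags(stage):
    """Map each component name to True if it is updated in the given stage."""
    return {name: name in stage.trainable for name in COMPONENTS}


for stage in SCHEDULE:
    print(stage.name, trainable_flags(stage))
```

In a real training loop, flags like these would typically decide which parameter groups receive gradient updates in each stage; the actual choice made in the paper should be checked against the linked repository.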

Code & Models

Repositories

Glasgow-AI4BioMed/RRG-BioNLP-ACL2024 (Official)

Videos

Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation (Underline)