XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models

Omkar Thawakar; Abdelrahman Shaker; Sahal Shaji Mullappilly; Hisham Cholakkal; Rao Muhammad Anwer; Salman Khan; Jorma Laaksonen; Fahad Shahbaz Khan

arXiv:2306.07971·cs.CV·May 8, 2025·39 cites



Open Access 1 Repo

TL;DR

XrayGPT is a novel vision-language model designed for analyzing and answering questions about chest radiographs, leveraging medical image encoders and fine-tuned language models to improve radiology report understanding.

Contribution

The paper introduces XrayGPT, which combines a medical visual encoder with a large language model. The authors also generate high-quality radiology report summaries to enhance medical image analysis capabilities.

Findings

01 Enhanced visual conversation abilities in radiology

02 Generated 217k high-quality radiology report summaries

03 Open-source models and demos available

Abstract

The latest breakthroughs in large vision-language models, such as Bard and GPT-4, have showcased extraordinary abilities in performing a wide range of tasks. Such models are trained on massive datasets comprising billions of public image-text pairs with diverse tasks. However, their performance on task-specific domains, such as radiology, is still under-investigated and potentially limited due to a lack of sophistication in understanding biomedical images. On the other hand, conversational medical models have exhibited remarkable success but have mainly focused on text-based analysis. In this paper, we introduce XrayGPT, a novel conversational medical vision-language model that can analyze and answer open-ended questions about chest radiographs. Specifically, we align a medical visual encoder (MedClip) with a fine-tuned large language model (Vicuna), using a simple linear…
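The abstract is cut off after "a simple linear", so the exact alignment mechanism is not stated here. As a rough illustration only, the sketch below assumes it means a linear projection that maps visual encoder features into the language model's embedding space. The function names, dimensions, and random weights are hypothetical and are not taken from the authors' code; a real system would learn the weights and would normally use a deep learning framework rather than plain Python lists.

```python
import random


def make_linear(in_dim, out_dim, seed=0):
    """Create a randomly initialised weight matrix (out_dim x in_dim) and a zero bias."""
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(visual_tokens, weight, bias):
    """Apply y = W x + b to every visual feature vector."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b
         for row, b in zip(weight, bias)]
        for token in visual_tokens
    ]


# Hypothetical sizes, chosen only to keep the example small.
VISUAL_DIM, LLM_DIM, NUM_PATCHES, NUM_TEXT_TOKENS = 8, 16, 4, 3

# Stand-in for features produced by a frozen visual encoder.
rng = random.Random(1)
visual_tokens = [[rng.gauss(0, 1) for _ in range(VISUAL_DIM)]
                 for _ in range(NUM_PATCHES)]

weight, bias = make_linear(VISUAL_DIM, LLM_DIM)
projected = project(visual_tokens, weight, bias)

# Stand-in for embedded prompt tokens; a real model would look these up
# in the language model's embedding table.
text_embeddings = [[0.0] * LLM_DIM for _ in range(NUM_TEXT_TOKENS)]

# Projected image tokens and text tokens now share one width, so they
# can be concatenated into a single input sequence.
llm_input = projected + text_embeddings
print(len(llm_input), len(llm_input[0]))
```

The point of the projection is only to make the two feature widths match; under this assumption, the visual encoder and the language model can each stay as they are while the small linear map between them is trained.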


Code & Models

Repositories

mbzuai-oryx/xraygpt (PyTorch, Official)


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Biomedical Text Mining and Ontologies

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Layer Normalization · Residual Connection · Softmax · Byte Pair Encoding