XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models
Omkar Thawakar, Abdelrahman Shaker, Sahal Shaji Mullappilly, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Fahad Shahbaz Khan

TL;DR
XrayGPT is a novel vision-language model designed for analyzing and answering questions about chest radiographs, leveraging medical image encoders and fine-tuned language models to improve radiology report understanding.
Contribution
The paper introduces XrayGPT, which combines a medical visual encoder with a large language model, and generates high-quality radiology report summaries to enhance medical image analysis.
Findings
Enhanced visual conversation abilities in radiology
Generated 217k high-quality radiology report summaries
Open-source models and demos available
Abstract
The latest breakthroughs in large vision-language models, such as Bard and GPT-4, have showcased extraordinary abilities in performing a wide range of tasks. Such models are trained on massive datasets comprising billions of public image-text pairs with diverse tasks. However, their performance on task-specific domains, such as radiology, is still under-investigated and potentially limited due to a lack of sophistication in understanding biomedical images. On the other hand, conversational medical models have exhibited remarkable success but have mainly focused on text-based analysis. In this paper, we introduce XrayGPT, a novel conversational medical vision-language model that can analyze and answer open-ended questions about chest radiographs. Specifically, we align both medical visual encoder (MedClip) with a fine-tuned large language model (Vicuna), using a simple linear…
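The abstract is cut off after "a simple linear", so the exact alignment step is not shown here. As a minimal sketch, assuming it refers to a learned linear projection that maps visual-encoder features into the language model's embedding space, the idea might look like the following. The dimensions, function names, random weights, and the choice to prepend image tokens to the text prompt are all hypothetical and for illustration only; they are not taken from the XrayGPT code.

```python
import random


def make_linear(in_dim, out_dim, seed=0):
    """Create (weights, bias) for a linear map from in_dim to out_dim.

    Weights are random here; in a real system they would be learned.
    """
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    weights = [
        [rng.uniform(-scale, scale) for _ in range(in_dim)]
        for _ in range(out_dim)
    ]
    bias = [0.0] * out_dim
    return weights, bias


def project(visual_tokens, weights, bias):
    """Apply the same linear map to every visual token."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b
         for row, b in zip(weights, bias)]
        for token in visual_tokens
    ]


def build_llm_input(projected_tokens, text_embeddings):
    """Prepend projected image tokens to the text-prompt embeddings.

    Assumption: the language model consumes one sequence of embeddings.
    """
    return projected_tokens + text_embeddings


# Hypothetical toy sizes; real encoders and LLMs use much larger widths.
VISUAL_DIM, LLM_DIM = 4, 6

rng = random.Random(1)
# Stand-ins for frozen visual-encoder outputs (3 image tokens).
visual_tokens = [[rng.gauss(0, 1) for _ in range(VISUAL_DIM)] for _ in range(3)]
# Stand-ins for embedded prompt tokens (2 text tokens).
text_embeddings = [[rng.gauss(0, 1) for _ in range(LLM_DIM)] for _ in range(2)]

weights, bias = make_linear(VISUAL_DIM, LLM_DIM)
projected = project(visual_tokens, weights, bias)
llm_input = build_llm_input(projected, text_embeddings)
```

In this sketch the only trainable parameters would be `weights` and `bias`, which is what makes a linear alignment layer cheap to train compared with updating the encoder or the language model.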
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Biomedical Text Mining and Ontologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Layer Normalization · Residual Connection · Softmax · Byte Pair Encoding
