CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images

Seowoo Lee; Jiwon Youn; Hyungjin Kim; Mansu Kim; Soon Ho Yoon

arXiv:2310.18341 · cs.CL · January 17, 2024 · 5 citations

TL;DR

CXR-LLAVA is an open-source multimodal large language model for interpreting chest X-ray images. It shows competitive diagnostic performance and potential for autonomous radiology reporting.

Contribution

This work introduces CXR-LLAVA, a multimodal LLM trained on 592,580 publicly available CXRs. It integrates a vision transformer, pre-trained on CXRs labeled for radiographic abnormalities, with a large language model based on the LLAVA network.
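The integration of a vision transformer with a language model can be pictured with a minimal sketch of a LLaVA-style connector. It assumes, as in the original LLaVA design, a single learned linear projection that maps each vision-transformer patch feature into the language model's token-embedding space; for simplicity the projected image tokens are simply placed before the text tokens. The function names, toy dimensions, and random weights are hypothetical illustrations, not code from the CXR-LLAVA repository.

```python
import random

def linear(x, weight, bias):
    """Apply y = W x + b to one feature vector (W has shape out_dim x in_dim)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weight, bias)]

def project_patches(patch_features, weight, bias):
    """Map every vision-transformer patch feature into the LLM embedding space."""
    return [linear(p, weight, bias) for p in patch_features]

def build_llm_input(image_tokens, text_embeddings):
    """Simplification (assumption): place projected image tokens before the text tokens."""
    return image_tokens + text_embeddings

# Hypothetical toy sizes: 4 patches of dim 3, LLM embedding dim 5, 2 text tokens.
random.seed(0)
vit_dim, llm_dim = 3, 5
patches = [[random.random() for _ in range(vit_dim)] for _ in range(4)]
W = [[random.gauss(0, 0.02) for _ in range(vit_dim)] for _ in range(llm_dim)]
b = [0.0] * llm_dim
text = [[0.0] * llm_dim for _ in range(2)]

sequence = build_llm_input(project_patches(patches, W, b), text)
print(len(sequence), len(sequence[0]))  # 6 5
```

The language model then sees one sequence of equal-width embeddings. Only the projection (and, during fine-tuning, possibly the LLM) needs to learn how image features relate to report text.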

Findings

01. Achieved an average F1 score of 0.81 on the internal test set.
02. Surpassed GPT-4-vision and Gemini-Pro-Vision in diagnostic performance.
03. Achieved a 72.7% success rate in autonomous reporting.
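An "average F1 score" summarizes detection performance across several findings. The following is a minimal sketch that assumes the average is an unweighted (macro) mean of per-finding binary F1 scores, where F1 = 2PR / (P + R). This summary does not state the exact averaging scheme, and the finding names and labels below are hypothetical toy data.

```python
def f1_score(y_true, y_pred):
    """Binary F1 = 2PR / (P + R) for one finding; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def average_f1(labels_by_finding, preds_by_finding):
    """Unweighted (macro) mean of per-finding F1 scores, plus the per-finding scores."""
    scores = {name: f1_score(labels_by_finding[name], preds_by_finding[name])
              for name in labels_by_finding}
    return sum(scores.values()) / len(scores), scores

# Hypothetical labels (1 = finding present) for four images.
labels = {"finding_a": [1, 1, 0, 0], "finding_b": [1, 0, 1, 0]}
preds = {"finding_a": [1, 0, 0, 0], "finding_b": [1, 0, 1, 1]}
avg, per_finding = average_f1(labels, preds)
print(round(avg, 4))  # 0.7333
```

Here finding_a scores 2/3 (perfect precision, half recall) and finding_b scores 0.8, so the macro mean is about 0.733. A micro average that pooled all counts first would weight common findings more heavily.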

Abstract

Purpose: This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists.

Materials and Methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer on Dataset 1, we integrated it with an LLM based on the LLAVA network. The model was then fine-tuned, primarily on Dataset 2. We evaluated the model's diagnostic performance for major pathological findings and, to gauge its potential for autonomous reporting, the acceptability of its radiology reports to human radiologists.

Results: The…
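Pre-training the vision transformer on Dataset 1 can be sketched as a multi-label classification objective. The sketch assumes that each labeled abnormality is an independent binary target with its own sigmoid output, scored by binary cross-entropy. The abstract does not specify the loss, so this is an illustrative choice, not the authors' training code. It uses the numerically stable "with logits" form, max(z, 0) - z*y + log(1 + e^(-|z|)), which equals -[y log σ(z) + (1 - y) log(1 - σ(z))] without overflowing for large |z|.

```python
import math

def bce_with_logits(z, y):
    """Stable binary cross-entropy for one logit z and target y in {0, 1}."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def multilabel_bce(logits, targets):
    """Mean BCE over independent abnormality labels for one image.

    Each label is scored separately, so an image can be positive for several
    abnormalities at once (unlike a softmax over mutually exclusive classes).
    """
    return sum(bce_with_logits(z, y) for z, y in zip(logits, targets)) / len(logits)

# Hypothetical image with two labels: first abnormality present, second absent.
print(round(multilabel_bce([0.0, 0.0], [1, 0]), 4))  # 0.6931
```

With both logits at 0 the model predicts probability 0.5 for each label, so each term equals ln 2.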

Code & Models

Repositories

ecofri/cxr_llava (PyTorch, official)


Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · COVID-19 diagnosis using AI · Topic Modeling

Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Vision Transformer · Linear Layer · Dropout · Layer Normalization · Label Smoothing · Byte Pair Encoding · Dense Connections