CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images
Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon

TL;DR
CXR-LLAVA is an open-source multimodal large language model designed for interpreting chest X-ray images, demonstrating competitive diagnostic performance and potential for autonomous radiology reporting, thus advancing AI applications in medical imaging.
Contribution
This work introduces CXR-LLAVA, a novel multimodal LLM trained on extensive CXR datasets, integrating vision transformers with language models for improved radiology interpretation.
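The contribution depends on connecting a vision transformer to a language model. In LLaVA-style designs, the vision encoder's image features pass through a learned projection into the LLM's token-embedding space and then enter the input sequence alongside the text tokens. The sketch below shows only that data flow. It uses a single linear projection, toy dimensions, and plain-Python lists, and it places the image tokens first for simplicity. The function names and shapes are illustrative assumptions, not CXR-LLAVA's actual code, and a real projector would be a trained (possibly multi-layer) module.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (n x k) @ (k x m) -> (n x m)."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    m = len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(m)] for row in a]


def build_llm_input(patch_features: Matrix, projector: Matrix, text_embeddings: Matrix) -> Matrix:
    """Hypothetical LLaVA-style connector: project vision-encoder patch features
    (num_patches x d_vision) into the LLM embedding width (d_text) and prepend
    them to the text token embeddings (num_text_tokens x d_text)."""
    visual_tokens = matmul(patch_features, projector)  # (num_patches x d_text)
    d_text = len(projector[0])
    assert all(len(t) == d_text for t in text_embeddings), "text embeddings must have width d_text"
    return visual_tokens + text_embeddings  # concatenate along the sequence axis


# Toy example: 2 image patches with d_vision=3, projected to d_text=2, plus 1 text token.
seq = build_llm_input(
    [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, -0.5]],
)
print(seq)  # → [[3.0, 2.0], [1.0, 2.0], [0.5, -0.5]]
```

Projecting into the LLM's existing embedding width lets the language model treat image tokens like ordinary token embeddings, so the LLM itself needs no architectural change.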
Findings
Achieved an average F1 score of 0.81 on the internal test set
Outperformed GPT-4-vision and Gemini-Pro-Vision
Achieved a 72.7% success rate in autonomous reporting
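The headline metric is an "average F1 score" across pathological findings. The page does not say how the average is taken, so the sketch below assumes an unweighted (macro) mean of per-finding binary F1 scores, where F1 = 2·TP / (2·TP + FP + FN). The finding names and labels in the example are hypothetical toy data, not results from the paper.

```python
from typing import Dict, List, Tuple


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_f1(per_finding: Dict[str, Tuple[List[int], List[int]]]) -> float:
    """Assumed macro average: unweighted mean of per-finding F1 scores."""
    scores = [f1_score(t, p) for t, p in per_finding.values()]
    return sum(scores) / len(scores)


# Hypothetical toy labels: (ground truth, model prediction) per image, for two findings.
toy = {
    "finding_a": ([1, 1, 0, 0], [1, 0, 0, 0]),  # TP=1, FP=0, FN=1 -> F1 = 2/3
    "finding_b": ([1, 0, 1, 0], [1, 1, 1, 0]),  # TP=2, FP=1, FN=0 -> F1 = 0.8
}
print(round(average_f1(toy), 4))  # → 0.7333
```

A macro average weights rare and common findings equally. A micro average (pooling TP/FP/FN across findings) would favor common findings, which is why the averaging choice matters when comparing reported scores.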
Abstract
Purpose: This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists.
Materials and Methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer on Dataset 1, we integrated it with an LLM in an architecture influenced by the LLAVA network. The model was then fine-tuned, primarily on Dataset 2. We evaluated the model's diagnostic performance for major pathological findings, as well as the acceptability of its radiologic reports to human radiologists, to gauge its potential for autonomous reporting.
Results: The…
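The abstract says only that the vision transformer was first pre-trained on Dataset 1, whose images carry labels for certain radiographic abnormalities. A common way to pre-train on such labels, assumed here rather than confirmed by the abstract, is multi-label classification. Each abnormality gets its own logit, scored with an independent sigmoid and binary cross-entropy, so several findings can co-occur on one image. The sketch below computes that loss in the standard numerically stable "with logits" form. The function names are hypothetical.

```python
import math
from typing import List


def bce_with_logits(z: float, y: int) -> float:
    """Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1 - sigmoid(z))].
    Uses max(z, 0) - z*y + log(1 + exp(-|z|)) so large |z| never overflows."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def multilabel_bce(logits: List[float], labels: List[int]) -> float:
    """Assumed pre-training loss: mean per-label BCE over independent abnormality labels
    (one logit and one 0/1 label per abnormality for a single image)."""
    assert len(logits) == len(labels) and logits, "need one logit per label"
    return sum(bce_with_logits(z, y) for z, y in zip(logits, labels)) / len(logits)


# Hypothetical image with three abnormality labels: present, absent, present.
loss = multilabel_bce([2.0, -1.5, 0.3], [1, 0, 1])
print(f"{loss:.4f}")
```

Unlike a softmax over classes, independent sigmoids do not force the labels to compete, which suits CXRs where, for example, an effusion and cardiomegaly can appear together.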
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · COVID-19 diagnosis using AI · Topic Modeling
Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Vision Transformer · Linear Layer · Dropout · Layer Normalization · Label Smoothing · Byte Pair Encoding · Dense Connections
