Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

Juan Manuel Zambrano Chaves; Shih-Cheng Huang; Yanbo Xu; Hanwen Xu; Naoto Usuyama; Sheng Zhang; Fei Wang; Yujia Xie; Mahmoud Khademi; Ziyi Yang; Hany Awadalla; Julia Gong; Houdong Hu; Jianwei Yang; Chunyuan Li; Jianfeng Gao; Yu Gu; Cliff Wong; Mu Wei; Tristan Naumann; Muhao Chen; Matthew P. Lungren; Akshay Chaudhari; Serena Yeung-Levy; Curtis P. Langlotz; Sheng Wang; Hoifung Poon

arXiv:2403.08002 · cs.CL · April 3, 2025 · 3 cites

TL;DR

This paper presents LLaVA-Rad, a lightweight, open-source multimodal radiology model trained on extensive data. It achieves state-of-the-art performance on clinical tasks, with inference efficient enough for real-world medical settings.

Contribution

A small, open-source multimodal radiology model, developed with a modular training approach, together with a new GPT-4-based metric for evaluating factuality.

Findings

01

LLaVA-Rad outperforms larger models such as GPT-4V and Med-PaLM M on radiology tasks.

02

The model achieves high accuracy in report generation and retrieval.

03

Inference can be done efficiently on a single GPU.

Abstract

The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant performance gaps in multimodal biomedical applications. More importantly, less-acknowledged pragmatic issues, including accessibility, model cost, and tedious manual evaluation make it hard for clinicians to use state-of-the-art large models directly on private patient data. Here, we explore training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology. To maximize data efficiency, we adopt a modular approach by incorporating…

Code & Models

Repositories

microsoft/llava-rad (PyTorch, official)

Taxonomy

Topics: Radiology practices and education

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Layer Normalization · Absolute Position Encodings · Residual Connection · Dropout · Softmax · Linear Layer · Multi-Head Attention