Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation
Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu Wei, Tristan Naumann

TL;DR
This paper presents LLaVA-Rad, a lightweight, open-source multimodal radiology model trained on extensive data. It achieves state-of-the-art performance on clinical tasks, with inference efficient enough for real-world medical settings.
Contribution
Development of a small, open-source multimodal radiology model trained with a modular approach, together with a new GPT-4-based metric for automated factuality evaluation.
Findings
LLaVA-Rad outperforms larger models such as GPT-4V and Med-PaLM M on radiology tasks.
The model achieves high accuracy in report generation and retrieval.
Inference can be done efficiently on a single GPU.
Abstract
The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant performance gaps in multimodal biomedical applications. More importantly, less-acknowledged pragmatic issues, including accessibility, model cost, and tedious manual evaluation make it hard for clinicians to use state-of-the-art large models directly on private patient data. Here, we explore training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology. To maximize data efficiency, we adopt a modular approach by incorporating…
Taxonomy
Topics: Radiology practices and education
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Layer Normalization · Absolute Position Encodings · Residual Connection · Dropout · Softmax · Linear Layer · Multi-Head Attention
