The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models
Daniel P. Jeong, Pranav Mani, Saurabh Garg, Zachary C. Lipton, Michael Oberst

TL;DR
This study critically evaluates whether domain-specific pretraining of large language and vision-language models genuinely enhances medical question-answering performance, finding limited or inconsistent improvements over base models.
Contribution
The paper compares ten medical LLMs and two medical VLMs against their corresponding base models, showing that domain-adaptive pretraining often yields no significant performance gain on medical QA tasks.
Findings
Medical models rarely outperform base models in medical QA.
Performance improvements are inconsistent and often not statistically significant.
General-purpose models may already possess substantial medical knowledge.
Abstract
Several recent works seek to adapt general-purpose large language models (LLMs) and vision-language models (VLMs) for medical applications through continued pretraining on publicly available biomedical corpora. These works typically claim that such domain-adaptive pretraining improves performance on various downstream medical tasks, such as answering medical exam questions. In this paper, we compare ten "medical" LLMs and two VLMs against their corresponding base models, arriving at a different conclusion: all medical VLMs and nearly all medical LLMs fail to consistently improve over their base models in the zero-/few-shot prompting and supervised fine-tuning regimes for medical question answering (QA). For instance, on clinical-note-based QA tasks in the 3-shot setting, medical LLMs outperform their base models in only 26.7% of cases, reach a (statistical) tie in 16.7% of cases, and…
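The win and tie percentages quoted in the abstract imply a per-comparison decision rule: each (medical model, base model, dataset) comparison is labeled a win, a statistical tie, or a loss, and the labels are then counted. The sketch below shows one common way to do this, assuming a paired percentile bootstrap over per-question correctness. It is an illustration only, not necessarily the authors' procedure; the function names, the 95% interval, and the resample count are all hypothetical choices.

```python
import random


def paired_bootstrap_ci(medical_correct, base_correct,
                        n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the accuracy difference (medical - base).

    Both arguments are lists of 0/1 correctness flags for the SAME questions,
    in the same order, so that resampling keeps the pairing intact.
    """
    if len(medical_correct) != len(base_correct):
        raise ValueError("both models must be scored on the same questions")
    n = len(medical_correct)
    if n == 0:
        raise ValueError("need at least one question")
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample question indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(medical_correct[i] - base_correct[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def classify(medical_correct, base_correct, **kwargs):
    """Label one comparison: 'win', 'loss', or 'tie' if the CI contains 0."""
    lower, upper = paired_bootstrap_ci(medical_correct, base_correct, **kwargs)
    if lower > 0:
        return "win"
    if upper < 0:
        return "loss"
    return "tie"


def outcome_rates(pairs):
    """Fraction of wins, ties, and losses over (medical, base) result pairs."""
    if not pairs:
        raise ValueError("need at least one comparison")
    counts = {"win": 0, "tie": 0, "loss": 0}
    for medical_correct, base_correct in pairs:
        counts[classify(medical_correct, base_correct)] += 1
    return {label: count / len(pairs) for label, count in counts.items()}
```

Under this rule, a medical model is credited with a win only when the whole interval lies above zero, so a small raw accuracy gain on a small test set counts as a tie rather than an improvement.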
Taxonomy
Topics: Multimodal Machine Learning Applications
Methods: Balanced Selection
