To Adapt or not to Adapt, Rethinking the Value of Medical Knowledge-Aware Large Language Models

Ane G. Domingo-Aldama; Iker De La Iglesia; Maitane Urruela; Aitziber Atutxa; Ander Barrena

arXiv:2604.06854 · cs.CL · April 9, 2026

TL;DR

This study compares general and clinical large language models on medical question answering tasks, revealing limited benefits of clinical adaptation in English but notable improvements in Spanish with lightweight models.

Contribution

The paper introduces Marmoka, a family of lightweight clinical LLMs for Spanish, and proposes a perturbation-based benchmark for evaluating model robustness and instruction following.

Findings

01

Clinical LLMs do not consistently outperform general models in English.

02

Marmoka models outperform Llama on Spanish clinical tasks.

03

Both model types show limitations in instruction following and output formatting.

Abstract

BACKGROUND: Recent studies have shown that domain-adapted large language models (LLMs) do not consistently outperform their general-purpose counterparts on standard medical benchmarks, raising questions about the need for specialized clinical adaptation. METHODS: We systematically compare general and clinical LLMs on a diverse set of multiple-choice clinical question answering tasks in English and Spanish. We introduce a perturbation-based evaluation benchmark that probes model robustness, instruction following, and sensitivity to adversarial variations. Our evaluation includes one-step and two-step question transformations, multi-prompt testing, and instruction-guided assessment. We analyze a range of state-of-the-art clinical models and their general-purpose counterparts, focusing on Llama 3.1-based models. Additionally, we introduce Marmoka, a family of lightweight 8B-parameter clinical…
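To make the idea of a perturbation-based evaluation concrete, here is a minimal Python sketch. It assumes one possible one-step transformation, shuffling the answer options, which the abstract does not name as one of the paper's transformations. The `MCQ` type and the `shuffle_options` and `consistency` helpers are hypothetical names introduced only for illustration. `predict` stands in for any model wrapper that returns the index of the chosen option.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class MCQ:
    """A multiple-choice question (hypothetical container, not from the paper)."""
    question: str
    options: Tuple[str, ...]
    answer_index: int


def shuffle_options(item: MCQ, seed: int) -> MCQ:
    """Illustrative one-step perturbation: reorder the options and
    move the gold index so it still points at the same option text."""
    rng = random.Random(seed)
    order = list(range(len(item.options)))
    rng.shuffle(order)
    new_options = tuple(item.options[i] for i in order)
    new_answer = order.index(item.answer_index)
    return MCQ(item.question, new_options, new_answer)


def consistency(predict: Callable[[MCQ], int], item: MCQ,
                seeds: Iterable[int]) -> float:
    """Fraction of perturbed variants that the predictor answers correctly."""
    variants = [shuffle_options(item, s) for s in seeds]
    correct = sum(predict(v) == v.answer_index for v in variants)
    return correct / len(variants)
```

A predictor that tracks the option text scores 1.0 under this measure, while one that always picks the same position scores lower whenever the gold option moves. Position bias of that kind is one example of the sensitivity a robustness probe is meant to expose.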
