Is There Knowledge Left to Extract? Evidence of Fragility in Medically Fine-Tuned Vision-Language Models

Oliver McLaughlin; Daniel Shubin; Carsten Eickhoff; Ritambhara Singh; William Rudman; Michal Golovanevsky

arXiv:2604.09841 · cs.CV · April 14, 2026

TL;DR

This study evaluates the robustness and reasoning capabilities of medical vision-language models, revealing their fragility, prompt sensitivity, and limited benefit from domain-specific fine-tuning in high-stakes medical tasks.

Contribution

The paper provides an analysis showing that medical fine-tuning does not reliably improve reasoning, and it highlights the models' sensitivity to prompt variations and weaknesses in their visual representations.
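Whether fine-tuning helps is a paired question: the base model and its medical variant answer the same images. One standard way to check such a comparison, which is not necessarily the authors' method, is an exact McNemar test on per-item correctness. The following is a minimal sketch. It assumes each model's answers have already been scored as correct or incorrect per image. The helper name `mcnemar_exact` and the example data are hypothetical.

```python
from math import comb


def mcnemar_exact(base_correct: list[bool], tuned_correct: list[bool]) -> tuple[int, int, float]:
    """Two-sided exact McNemar test on paired per-item correctness.

    Returns (b, c, p_value): b counts items only the base model answered
    correctly, c counts items only the fine-tuned model answered correctly.
    """
    if len(base_correct) != len(tuned_correct):
        raise ValueError("both models must be scored on the same items")
    b = sum(1 for x, y in zip(base_correct, tuned_correct) if x and not y)
    c = sum(1 for x, y in zip(base_correct, tuned_correct) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant items, so no evidence either way
    # Under H0 each discordant item favours either model with probability 0.5,
    # so b ~ Binomial(n, 0.5); doubling the smaller tail gives a two-sided p.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2**n
    return b, c, min(1.0, 2 * tail)


# Hypothetical per-image correctness for a base model and its fine-tuned variant.
base = [True, True, False, False, True, False]
tuned = [True, False, True, True, True, False]
print(mcnemar_exact(base, tuned))  # (1, 2, 1.0)
```

The test uses only the images where the two models disagree. Items that both models get right or both get wrong say nothing about which model is better.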

Findings

01. Performance drops to near-random levels as task difficulty increases.
02. Medical fine-tuning offers no consistent performance advantage.
03. Models are highly sensitive to prompt formulation: minor wording changes cause large swings in accuracy and refusal rates.
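"Near-random" can be made concrete by testing observed accuracy against the chance rate. The following is a minimal sketch. It assumes balanced classes, so that uniform guessing scores 1/k on a k-class task; this page does not state how the paper defines chance. The function name `above_chance_p_value` and the example counts are hypothetical. The function computes the exact one-sided binomial tail with integer arithmetic, which avoids floating-point overflow on large test sets.

```python
from math import comb


def above_chance_p_value(n_correct: int, n_items: int, n_classes: int) -> float:
    """One-sided exact binomial p-value for accuracy above uniform chance (1/k).

    P(X >= n_correct) for X ~ Binomial(n_items, 1/k). Each term
    C(n, j) * (1/k)**j * ((k-1)/k)**(n-j) is rewritten over the common
    denominator k**n so the sum is computed exactly with Python integers.
    """
    if n_classes < 2:
        raise ValueError("need at least two classes")
    if not 0 <= n_correct <= n_items:
        raise ValueError("n_correct must lie in [0, n_items]")
    k = n_classes
    numerator = sum(
        comb(n_items, j) * (k - 1) ** (n_items - j)
        for j in range(n_correct, n_items + 1)
    )
    return numerator / k**n_items


# Hypothetical scores on a balanced 4-class task with 200 test images.
print(above_chance_p_value(55, 200, 4))   # 27.5% accuracy
print(above_chance_p_value(110, 200, 4))  # 55% accuracy
```

If labels are imbalanced, the majority-class rate is a stricter baseline than 1/k. In that case, pass that rate instead of deriving chance from the class count.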

Abstract

Vision-language models (VLMs) are increasingly adapted through domain-specific fine-tuning, yet it remains unclear whether this improves reasoning beyond superficial visual cues, particularly in high-stakes domains like medicine. We evaluate four paired open-source VLMs (LLaVA vs. LLaVA-Med; Gemma vs. MedGemma) across four medical imaging tasks of increasing difficulty: brain tumor, pneumonia, skin cancer, and histopathology classification. We find that performance degrades toward near-random levels as task difficulty increases, indicating limited clinical reasoning. Medical fine-tuning provides no consistent advantage, and models are highly sensitive to prompt formulation, with minor changes causing large swings in accuracy and refusal rates. To test whether closed-form VQA suppresses latent knowledge, we introduce a description-based pipeline where models generate image descriptions…
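The prompt-sensitivity result compares accuracy and refusal rate across rephrasings of the same question. The following is a minimal sketch of that bookkeeping. It assumes answers have already been parsed to one class label per item, that `None` marks a refusal or an unparseable answer, and that refusals count as incorrect. The names `prompt_sensitivity` and `PromptStats`, the prompt-variant keys, and the labels are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptStats:
    accuracy: float      # refusals are counted as incorrect
    refusal_rate: float  # share of items with no usable answer


def prompt_sensitivity(
    labels: list[str],
    preds_by_prompt: dict[str, list[Optional[str]]],
) -> tuple[dict[str, PromptStats], float, float]:
    """Per-prompt accuracy and refusal rate, plus the max-min spread of each.

    preds_by_prompt maps a prompt-variant name to one parsed answer per item;
    None marks a refusal or an answer that could not be mapped to a label.
    """
    if not labels or not preds_by_prompt:
        raise ValueError("need at least one item and one prompt variant")
    n = len(labels)
    stats: dict[str, PromptStats] = {}
    for prompt, preds in preds_by_prompt.items():
        if len(preds) != n:
            raise ValueError(f"prompt {prompt!r}: expected {n} predictions")
        correct = sum(p == y for p, y in zip(preds, labels))
        refused = sum(p is None for p in preds)
        stats[prompt] = PromptStats(correct / n, refused / n)
    accs = [s.accuracy for s in stats.values()]
    refusals = [s.refusal_rate for s in stats.values()]
    return stats, max(accs) - min(accs), max(refusals) - min(refusals)


# Hypothetical labels and answers under two phrasings of the same question.
labels = ["tumor", "none", "tumor", "none"]
preds = {
    "terse": ["tumor", "none", "none", "none"],
    "verbose": ["tumor", None, None, "tumor"],
}
stats, acc_spread, refusal_spread = prompt_sensitivity(labels, preds)
print(stats["terse"], stats["verbose"])
print(acc_spread, refusal_spread)  # 0.5 0.5
```

Report the spread alongside the best prompt's accuracy. Otherwise a single favorable phrasing can overstate what a model can do.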