On the Robustness of Medical Vision-Language Models: Are they Truly Generalizable?
Raza Imam, Rufael Marew, Mohammad Yaqub

TL;DR
This paper evaluates the robustness of medical vision-language models under real-world noise and corruptions, introduces a new benchmark for testing, and proposes an adaptation method to improve resilience without sacrificing generalization.
Contribution
The paper introduces MediMeta-C, a corruption benchmark for medical imaging, and proposes RobustMedCLIP, an adaptation method that uses few-shot tuning to improve model robustness.
Findings
Existing models degrade significantly under corruption.
RobustMedCLIP improves robustness with minimal loss of generalization.
Diverse training enhances model resilience across modalities.
Abstract
Medical Vision-Language Models (MVLMs) have achieved excellent generalization in medical image analysis, yet their performance under noisy, corrupted conditions remains largely untested. Clinical imaging is inherently susceptible to acquisition artifacts and noise; however, existing evaluations predominantly assess clean datasets, overlooking robustness, i.e., the model's ability to perform under real-world distortions. To address this gap, we first introduce MediMeta-C, a corruption benchmark that systematically applies several perturbations across multiple medical imaging datasets. Combined with MedMNIST-C, this establishes a comprehensive robustness evaluation framework for MVLMs. We further propose RobustMedCLIP, a visual encoder adaptation of a pretrained MVLM that incorporates few-shot tuning to enhance resilience against corruptions. Through extensive…
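As a rough illustration of what a corruption benchmark measures, the sketch below applies one perturbation (additive Gaussian noise) at five severity levels and reports how far a classifier's accuracy falls relative to clean inputs. It is a minimal sketch under stated assumptions, not the MediMeta-C implementation: the names `SEVERITY_SIGMA`, `gaussian_noise`, `accuracy`, and `corruption_report` are hypothetical, the noise levels are made up for illustration, and images are represented as flat lists of intensities in [0, 1] to keep the example dependency-free.

```python
import random
from statistics import fmean

# Hypothetical noise standard deviations for severities 1..5.
# Illustrative assumptions only, not the settings used by MediMeta-C.
SEVERITY_SIGMA = (0.04, 0.08, 0.12, 0.18, 0.26)


def gaussian_noise(pixels, severity, rng):
    """Add zero-mean Gaussian noise to intensities in [0, 1], then clip."""
    if not 1 <= severity <= len(SEVERITY_SIGMA):
        raise ValueError("severity must be between 1 and 5")
    sigma = SEVERITY_SIGMA[severity - 1]
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def accuracy(predict, images, labels):
    """Fraction of images for which predict(image) equals the label."""
    correct = sum(predict(img) == label for img, label in zip(images, labels))
    return correct / len(labels)


def corruption_report(predict, images, labels, seed=0):
    """Compare clean accuracy with accuracy at each noise severity."""
    rng = random.Random(seed)  # fixed seed so the report is reproducible
    clean = accuracy(predict, images, labels)
    corrupted = []
    for severity in range(1, len(SEVERITY_SIGMA) + 1):
        noisy = [gaussian_noise(img, severity, rng) for img in images]
        corrupted.append(accuracy(predict, noisy, labels))
    return {
        "clean": clean,
        "corrupted": corrupted,  # one accuracy per severity level
        "mean_drop": clean - fmean(corrupted),  # average loss across severities
    }
```

Here `predict` is any callable that maps one image to a label, so a real model could be wrapped the same way. A full benchmark would repeat this loop over many corruption types and datasets rather than a single perturbation.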
Taxonomy
Topics: Multimodal Machine Learning Applications
