Consistent but Dangerous: Per-Sample Safety Classification Reveals False Reliability in Medical Vision-Language Models
Binesh Sadanandan, Vahid Behzadan

TL;DR
This paper shows that medical vision-language models can look reliable under consistency checks alone while relying on text patterns rather than the image, which creates false confidence and potential safety risks.
Contribution
It introduces a four-quadrant per-sample safety taxonomy that crosses consistency with image reliance, and shows that current models are often consistent without being image-reliant, a failure mode that consistency checks alone cannot expose.
Findings
LoRA fine-tuning reduces flip rates but increases the number of Dangerous samples
Dangerous samples have high accuracy and low entropy, so they evade detection
Pairing consistency checks with text-only baselines improves safety evaluation
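The two metrics named in the findings, flip rate and entropy, can be sketched as follows. This is a minimal illustration under assumed definitions, not the paper's code: the hypothetical `flip_rate` counts a sample as flipped if its paraphrased prompts do not all yield the same answer, and the hypothetical `predictive_entropy` is the Shannon entropy (in bits) of one predicted class distribution. The paper's exact definitions may differ.

```python
import math
from typing import Sequence


def flip_rate(per_sample_preds: Sequence[Sequence[str]]) -> float:
    """Fraction of samples whose answer changes across paraphrased prompts.

    Assumed definition: a sample 'flips' if its paraphrase predictions
    are not all identical.
    """
    if not per_sample_preds:
        return 0.0
    flips = sum(1 for preds in per_sample_preds if len(set(preds)) > 1)
    return flips / len(per_sample_preds)


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits of a single predicted class distribution.

    Zero-probability classes contribute nothing (0 * log 0 is taken as 0).
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Usage: four samples, each answered under two paraphrased prompts.
preds = [["yes", "yes"], ["yes", "no"], ["no", "no"], ["no", "no"]]
rate = flip_rate(preds)                        # one of four samples flips
uncertain = predictive_entropy([0.5, 0.5])     # maximally uncertain binary answer
confident = predictive_entropy([1.0, 0.0])     # fully confident answer
```

Neither number looks at the image: a sample can have a flip rate of zero and near-zero entropy while the model ignores the X-ray entirely, which is why the findings pair consistency checks with a text-only baseline.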
Abstract
Consistency under paraphrase, the property that semantically equivalent prompts yield identical predictions, is increasingly used as a proxy for reliability when deploying medical vision-language models (VLMs). We show this proxy is fundamentally flawed: a model can achieve perfect consistency by relying on text patterns rather than the input image. We introduce a four-quadrant per-sample safety taxonomy that jointly evaluates consistency (stable predictions across paraphrased prompts) and image reliance (predictions that change when the image is removed). Samples are classified as Ideal (consistent and image-reliant), Fragile (inconsistent but image-reliant), Dangerous (consistent but not image-reliant), or Worst (inconsistent and not image-reliant). Evaluating five medical VLM configurations across two chest X-ray datasets (MIMIC-CXR, PadChest), we find that LoRA fine-tuning…
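The four-quadrant taxonomy above reduces to two boolean tests per sample. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes predictions are discrete answer strings, that "consistent" means every paraphrased prompt yields the same answer, and that image reliance is tested by comparing the answer to the original (first) prompt with the answer from a text-only run where the image is removed. All function names are illustrative.

```python
from collections import Counter
from enum import Enum
from typing import Sequence


class Quadrant(Enum):
    IDEAL = "Ideal"          # consistent and image-reliant
    FRAGILE = "Fragile"      # inconsistent but image-reliant
    DANGEROUS = "Dangerous"  # consistent but not image-reliant
    WORST = "Worst"          # inconsistent and not image-reliant


def is_consistent(paraphrase_preds: Sequence[str]) -> bool:
    """True if every paraphrased prompt yields the same prediction."""
    return len(set(paraphrase_preds)) == 1


def is_image_reliant(with_image: str, text_only: str) -> bool:
    """True if the prediction changes when the image is removed."""
    return with_image != text_only


def classify_sample(paraphrase_preds: Sequence[str], text_only_pred: str) -> Quadrant:
    """Assign one sample to a quadrant.

    Assumption: paraphrase_preds[0] is the answer to the original prompt,
    and text_only_pred is the answer to that same prompt without the image.
    """
    if not paraphrase_preds:
        raise ValueError("need at least one paraphrase prediction")
    consistent = is_consistent(paraphrase_preds)
    reliant = is_image_reliant(paraphrase_preds[0], text_only_pred)
    if consistent and reliant:
        return Quadrant.IDEAL
    if reliant:
        return Quadrant.FRAGILE
    if consistent:
        return Quadrant.DANGEROUS
    return Quadrant.WORST


# Usage: (answers under three paraphrases, text-only answer) per sample.
samples = [
    (["yes", "yes", "yes"], "no"),   # stable, and changes without the image
    (["yes", "no", "yes"], "no"),    # unstable, but depends on the image
    (["yes", "yes", "yes"], "yes"),  # stable, but same answer with no image
    (["no", "yes", "no"], "no"),     # unstable and image-independent
]
counts = Counter(classify_sample(p, t) for p, t in samples)
```

The third sample is the case the paper warns about: a consistency check alone would rate it perfectly reliable, and only the text-only comparison reveals that the answer does not depend on the X-ray.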
Taxonomy
Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Artificial Intelligence in Healthcare and Education
