Consistent but Dangerous: Per-Sample Safety Classification Reveals False Reliability in Medical Vision-Language Models

Binesh Sadanandan; Vahid Behzadan

arXiv:2603.20985 · cs.CV · March 24, 2026

Open Access

TL;DR

This paper shows that medical vision-language models can pass consistency checks while relying on text patterns rather than the image, so consistency alone can give false confidence and hide safety risks.

Contribution

The paper introduces a four-quadrant per-sample safety taxonomy and demonstrates that current models often appear consistent without being image-reliant, a safety vulnerability that consistency checks alone do not expose.

Findings

01. LoRA fine-tuning reduces flip rates but increases Dangerous samples

02. Dangerous samples have high accuracy and low entropy, evading detection

03. Pairing consistency checks with text-only baselines improves safety evaluation

Abstract

Consistency under paraphrase, the property that semantically equivalent prompts yield identical predictions, is increasingly used as a proxy for reliability when deploying medical vision-language models (VLMs). We show this proxy is fundamentally flawed: a model can achieve perfect consistency by relying on text patterns rather than the input image. We introduce a four-quadrant per-sample safety taxonomy that jointly evaluates consistency (stable predictions across paraphrased prompts) and image reliance (predictions that change when the image is removed). Samples are classified as Ideal (consistent and image-reliant), Fragile (inconsistent but image-reliant), Dangerous (consistent but not image-reliant), or Worst (inconsistent and not image-reliant). Evaluating five medical VLM configurations across two chest X-ray datasets (MIMIC-CXR, PadChest), we find that LoRA fine-tuning…
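The four-quadrant taxonomy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the string labels for predictions, and the exact decision rules are assumptions made for illustration. In particular, the sketch assumes a sample is "consistent" when all paraphrased prompts yield the same prediction, and "image-reliant" when at least one paraphrase's prediction changes once the image is removed; the paper may define or threshold these differently.

```python
from collections import Counter
from typing import Sequence


def classify_sample(with_image: Sequence[str], text_only: Sequence[str]) -> str:
    """Assign one sample to a quadrant (hypothetical helper, not from the paper).

    with_image[i]  -- prediction for paraphrase i when the image is provided
    text_only[i]   -- prediction for the same paraphrase with the image removed
    """
    if not with_image or len(with_image) != len(text_only):
        raise ValueError("need one text-only prediction per paraphrase")

    # Assumption: consistent means identical predictions across all paraphrases.
    consistent = len(set(with_image)) == 1
    # Assumption: image-reliant means removing the image changes at least one prediction.
    image_reliant = any(a != b for a, b in zip(with_image, text_only))

    if consistent and image_reliant:
        return "Ideal"
    if image_reliant:
        return "Fragile"
    if consistent:
        return "Dangerous"
    return "Worst"


def quadrant_counts(samples) -> Counter:
    """Count quadrant labels over an iterable of (with_image, text_only) pairs."""
    return Counter(classify_sample(w, t) for w, t in samples)


# Toy predictions for four samples, three paraphrases each (made-up values).
toy = [
    (["yes", "yes", "yes"], ["no", "no", "no"]),     # stable, changes without image
    (["yes", "no", "yes"], ["no", "no", "no"]),      # unstable, changes without image
    (["yes", "yes", "yes"], ["yes", "yes", "yes"]),  # stable, ignores the image
    (["yes", "no", "yes"], ["yes", "no", "yes"]),    # unstable, ignores the image
]
counts = quadrant_counts(toy)
```

The third toy sample shows why consistency alone can mislead: its predictions are perfectly stable across paraphrases, yet they are identical with and without the image, so under these assumptions it is labeled Dangerous.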

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Artificial Intelligence in Healthcare and Education