TL;DR
This paper introduces MedFocusLeak, a black-box attack method that subtly manipulates background regions in medical images to produce misleading yet plausible diagnoses, exposing vulnerabilities in vision-language models used in healthcare.
Contribution
The paper presents MedFocusLeak, a novel transferable attack that induces realistic misdiagnoses by perturbing non-diagnostic regions and shifting model attention, revealing critical weaknesses in clinical VLMs.
Findings
MedFocusLeak achieves state-of-the-art attack success across six medical imaging modalities.
The attack produces realistic, misleading diagnoses while maintaining image fidelity.
A new evaluation framework effectively measures attack success and image quality.
Abstract
Vision-Language Models (VLMs) are increasingly used in clinical diagnostics, yet their robustness to adversarial attacks remains largely unexplored, posing serious risks. Existing medical attacks focus on secondary objectives such as model stealing or adversarial fine-tuning, while transferable attacks from natural images introduce visible distortions that clinicians can easily detect. To address this, we propose MedFocusLeak, a highly transferable black-box multimodal attack that induces incorrect yet clinically plausible diagnoses while keeping perturbations imperceptible. The method injects coordinated perturbations into non-diagnostic background regions and employs an attention distraction mechanism to shift the model's focus away from pathological areas. Extensive evaluations across six medical imaging modalities show that MedFocusLeak achieves state-of-the-art performance,…
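The abstract says the perturbation is confined to non-diagnostic background regions and kept imperceptible. As an illustration of that constraint alone, the following is a minimal sketch and not the authors' implementation. It assumes a precomputed boolean `lesion_mask` marking the diagnostic region and an L-infinity budget `epsilon`; the function name, the mask, the budget value, and the random `delta` are all hypothetical placeholders. The paper's optimization of the perturbation and its attention distraction mechanism are not shown.

```python
import numpy as np


def apply_background_perturbation(image, lesion_mask, delta, epsilon=4 / 255):
    """Add a bounded perturbation to background pixels only.

    image:       float array in [0, 1], shape (H, W)
    lesion_mask: bool array, True where the region is diagnostic (assumed given)
    delta:       candidate perturbation, same shape as image
    epsilon:     L-infinity budget (illustrative value, not from the paper)
    """
    background = ~lesion_mask                       # pixels that may be modified
    bounded = np.clip(delta, -epsilon, epsilon)     # enforce the magnitude budget
    perturbed = image + bounded * background        # diagnostic pixels get zero change
    return np.clip(perturbed, 0.0, 1.0)             # stay in the valid intensity range


# Toy example with random data standing in for a real scan.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
lesion_mask = np.zeros((8, 8), dtype=bool)
lesion_mask[2:5, 2:5] = True
delta = rng.normal(scale=0.1, size=(8, 8))          # placeholder, not an optimized attack
adv = apply_background_perturbation(image, lesion_mask, delta)
```

In this sketch the diagnostic pixels are left untouched and every background pixel changes by at most `epsilon`, which is the property a defender could also check when auditing a suspicious image against a trusted original.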
