An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models
Haochen Luo, Jindong Gu, Fengyuan Liu, Philip Torr

TL;DR
This paper introduces CroPA, a novel adversarial attack method that enhances the transferability of adversarial images across different prompts in vision-language models, revealing vulnerabilities in prompt-based task adaptation.
Contribution
We propose CroPA, an attack that optimizes the visual adversarial perturbation together with learnable prompts in order to improve cross-prompt transferability in vision-language models.
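The alternating update behind this idea can be illustrated with a toy sketch. It assumes a targeted attack in which the image perturbation lowers the loss of a target output while a per-prompt perturbation raises it. The linear scorer `target_prob`, the function `cropa_toy`, and all hyperparameters (`eps`, `alpha_v`, `alpha_t`, `prompt_update_every`) are hypothetical stand-ins chosen for illustration; they are not the authors' implementation, and a real attack would backpropagate through an actual vision-language model.

```python
import numpy as np


def target_prob(w_v, w_t, image, prompt):
    """Toy stand-in for a VLM: probability of producing the target text."""
    s = w_v @ image + w_t @ prompt
    return np.exp(-np.logaddexp(0.0, -s))  # numerically stable sigmoid(s)


def cropa_toy(w_v, w_t, image, prompts, eps=8 / 255, alpha_v=1 / 255,
              alpha_t=0.01, steps=50, prompt_update_every=2):
    """Alternate between an image update (descent) and prompt updates (ascent).

    All names and default values here are illustrative assumptions.
    Pixel-range clipping of image + delta_v is omitted for brevity.
    """
    delta_v = np.zeros_like(image)    # one image perturbation shared by all prompts
    delta_t = np.zeros_like(prompts)  # one learnable perturbation per prompt
    for step in range(steps):
        for i in range(len(prompts)):
            s = w_v @ (image + delta_v) + w_t @ (prompts[i] + delta_t[i])
            # loss = -log(sigmoid(s)), so dloss/ds = -sigmoid(-s)
            dloss_ds = -np.exp(-np.logaddexp(0.0, s))
            grad_v = dloss_ds * w_v
            grad_t = dloss_ds * w_t
            # Image perturbation: signed gradient descent, kept in the L-inf ball.
            delta_v = np.clip(delta_v - alpha_v * np.sign(grad_v), -eps, eps)
            # Prompt perturbation: signed gradient ascent, updated less often.
            if step % prompt_update_every == 0:
                delta_t[i] = delta_t[i] + alpha_t * np.sign(grad_t)
    return delta_v, delta_t


rng = np.random.default_rng(0)
w_v = rng.normal(scale=0.1, size=32)   # toy "vision" weights
w_t = rng.normal(scale=0.1, size=16)   # toy "text" weights
image = rng.uniform(0.0, 1.0, size=32)
prompts = rng.normal(size=(4, 16))     # four toy prompt embeddings

delta_v, delta_t = cropa_toy(w_v, w_t, image, prompts)
for p in prompts:
    clean = target_prob(w_v, w_t, image, p)
    adv = target_prob(w_v, w_t, image + delta_v, p)
    print(f"target probability: clean={clean:.3f} adversarial={adv:.3f}")
```

Because the toy scorer is linear, the image and prompt terms decouple, so this sketch only shows the direction and schedule of the two updates. It does not reproduce the transferability gains the paper reports, which depend on the interaction between image and prompt inside a real model.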
Findings
CroPA significantly improves adversarial transferability across prompts.
Vulnerabilities are demonstrated in models like Flamingo, BLIP-2, and InstructBLIP.
Cross-prompt attacks can mislead models regardless of prompt variations.
Abstract
Different from traditional task-specific vision models, recent large VLMs can readily adapt to different vision tasks by simply using different textual instructions, i.e., prompts. However, a well-known concern about traditional task-specific vision models is that they can be misled by imperceptible adversarial perturbations. Furthermore, the concern is exacerbated by the phenomenon that the same adversarial perturbations can fool different task-specific models. Given that VLMs rely on prompts to adapt to different tasks, an intriguing question emerges: Can a single adversarial image mislead all predictions of VLMs when a thousand different prompts are given? This question essentially introduces a novel perspective on adversarial transferability: cross-prompt adversarial transferability. In this work, we propose the Cross-Prompt Attack (CroPA). This proposed method updates the visual…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
