Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors
Jiachen Sun, Changsheng Wang, Jiongxiao Wang, Yiwei Zhang, and Chaowei Xiao

TL;DR
This paper introduces SmoothVLM, a smoothing-based defense mechanism that significantly reduces the success of patched visual prompt injections in vision-language models, enhancing their robustness against adversarial patches.
Contribution
The paper proposes SmoothVLM, a novel smoothing technique that effectively defends VLMs from patched adversarial prompts, with minimal impact on image context recovery.
Findings
Attack success rate reduced to 0-5%
Achieves 67.3-95% context recovery
Robust against adaptive adversarial attacks
Abstract
Large language models have become increasingly prominent, also signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and interaction. However, this integration also enlarges the attack surface. Patch-based adversarial attacks are considered the most realistic threat model in physical vision applications, as demonstrated in much of the existing literature. In this paper, we propose to address patched visual prompt injection, where adversaries exploit adversarial patches to generate target content in VLMs. Our investigation reveals that patched adversarial prompts exhibit sensitivity to pixel-wise randomization, a trait that…
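To make the idea of exploiting sensitivity to pixel-wise randomization concrete, here is a minimal sketch of a generic smoothing-style wrapper. It is not the authors' implementation: the function names, the 30% masking ratio, the number of copies, and the majority vote are all assumptions chosen for illustration, and `model` stands for any callable that maps an image array to a hashable output such as a string.

```python
from collections import Counter

import numpy as np


def perturb_pixels(image, mask_ratio, rng):
    """Overwrite a random fraction of pixel locations with uniform random values.

    Hypothetical helper: `mask_ratio` is an illustrative parameter, not a value
    taken from the paper.
    """
    h, w = image.shape[:2]
    mask = rng.random((h, w)) < mask_ratio          # True where a pixel is randomized
    noisy = image.copy()
    n_masked = int(mask.sum())
    random_values = rng.integers(0, 256, size=(n_masked,) + image.shape[2:])
    noisy[mask] = random_values.astype(image.dtype)
    return noisy


def smoothed_predict(model, image, n_copies=10, mask_ratio=0.3, seed=0):
    """Query `model` on several randomized copies and return the most common output.

    The majority vote is an assumed aggregation rule for this sketch.
    """
    rng = np.random.default_rng(seed)
    outputs = [model(perturb_pixels(image, mask_ratio, rng)) for _ in range(n_copies)]
    return Counter(outputs).most_common(1)[0][0]


# Toy stand-in for a VLM (an assumption, deliberately exaggerated): it emits the
# attacker's target only when the 8x8 corner patch is pixel-exact.
TRIGGER = np.full((8, 8, 3), 255, dtype=np.uint8)


def toy_model(image):
    return "injected" if np.array_equal(image[:8, :8], TRIGGER) else "benign"


patched_image = np.zeros((32, 32, 3), dtype=np.uint8)
patched_image[:8, :8] = TRIGGER
```

In this toy setup, `toy_model(patched_image)` follows the patch, while `smoothed_predict(toy_model, patched_image)` aggregates over randomized copies in which the patch is almost surely disturbed. A real VLM would be far less brittle than the toy model, so the masking ratio and the number of copies would need to be tuned against both attack success and context recovery.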
Taxonomy
Topics: Security and Verification in Computing
