Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
Zonghao Ying, Aishan Liu, Tianyuan Zhang, Zhengmin Yu, Siyuan Liang, Xianglong Liu, Dacheng Tao

TL;DR
This paper introduces BAP, a novel bi-modal adversarial prompt attack that effectively bypasses safety guardrails in vision-language models by optimizing both visual and textual prompts, revealing safety vulnerabilities.
Contribution
The paper presents a new bi-modal attack method that jointly optimizes visual and textual prompts to successfully jailbreak large vision language models, outperforming existing approaches.
Findings
Achieves a 29.03% higher attack success rate on average than existing methods.
Effective against black-box commercial LVLMs like Gemini and ChatGLM.
Significantly outperforms previous methods in robustness and success rate.
Abstract
In the realm of large vision language models (LVLMs), jailbreak attacks serve as a red-teaming approach to bypass guardrails and uncover safety implications. Existing jailbreaks predominantly focus on the visual modality, perturbing solely visual inputs in the prompt for attacks. However, they fall short when confronted with aligned models that fuse visual and textual features simultaneously for generation. To address this limitation, this paper introduces the Bi-Modal Adversarial Prompt Attack (BAP), which executes jailbreaks by optimizing textual and visual prompts cohesively. Initially, we adversarially embed universally harmful perturbations in an image, guided by a few-shot query-agnostic corpus (e.g., affirmative prefixes and negative inhibitions). This process ensures that the image prompts LVLMs to respond positively to any harmful query. Subsequently, leveraging the adversarial…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Digital and Cyber Forensics · Forensic and Genetic Research
