White-box Multimodal Jailbreaks Against Large Vision-Language Models
Ruofan Wang, Xingjun Ma, Hanxu Zhou, Chuanjun Ji, Guangnan Ye, Yu-Gang Jiang

TL;DR
This paper introduces a dual-modality attack that jointly manipulates images and text to exploit vulnerabilities in large vision-language models (VLMs), bypassing their defenses and causing them to generate harmful content.
Contribution
It presents a novel universal attack strategy, the Universal Master Key, that effectively jailbreaks VLMs by jointly optimizing adversarial images and texts, revealing critical robustness weaknesses.
Findings
Achieves a 96% success rate in jailbreaking MiniGPT-4.
Demonstrates vulnerability of VLMs to combined image-text adversarial attacks.
Highlights the need for improved alignment and robustness strategies.
Abstract
Recent advancements in Large Vision-Language Models (VLMs) have underscored their superiority in various multimodal tasks. However, the adversarial robustness of VLMs has not been fully explored. Existing methods mainly assess robustness through unimodal adversarial attacks that perturb images, while assuming inherent resilience against text-based attacks. Different from existing attacks, in this work we propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within VLMs. Specifically, we propose a dual optimization objective aimed at guiding the model to generate affirmative responses with high toxicity. Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input, thus imbuing the image with toxic semantics.…
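The first stage described in the abstract can be read as a likelihood objective over the image alone. The formula below is a rough sketch, and its notation is an assumption for illustration rather than something taken from the source: $x_{\text{adv}}$ is the adversarial image prefix, initialised from random noise; $\{y_i\}_{i=1}^{m}$ is a hypothetical set of target responses; $p_{\theta}$ is the output distribution of the VLM with frozen weights $\theta$; and $\varnothing$ denotes the empty text input.

```latex
x_{\text{adv}}^{*}
  = \arg\min_{x_{\text{adv}}}
    \sum_{i=1}^{m} -\log p_{\theta}\left( y_i \mid x_{\text{adv}}, \varnothing \right)
```

Minimising this loss raises the probability that the model emits the target responses from the image alone, which is one way to interpret "imbuing the image with toxic semantics". The abstract is truncated after this step, so the text-side optimization is not sketched here.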
Taxonomy
Topics: Natural Language Processing Techniques
