Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
Ziqi Miao, Yi Ding, Lijun Li, Jing Shao

TL;DR
This paper introduces VisCo, a novel attack method that uses realistic, vision-centric scenarios with auxiliary images to effectively jailbreak multimodal large language models, exposing security vulnerabilities.
Contribution
It proposes a new vision-centric jailbreak setting and develops VisCo, a dynamic, image-driven attack strategy that significantly improves success rates against MLLMs.
Findings
Achieves 85% attack success rate on MM-SafetyBench
Reaches a toxicity score of 4.78
Outperforms baseline attack methods
Abstract
With the emergence of strong vision language capabilities, multimodal large language models (MLLMs) have demonstrated tremendous potential for real-world applications. However, the security vulnerabilities exhibited by the visual modality pose significant challenges to deploying such models in open-world environments. Recent studies have successfully induced harmful responses from target MLLMs by encoding harmful textual semantics directly into visual inputs. However, in these approaches, the visual modality primarily serves as a trigger for unsafe behavior, often exhibiting semantic ambiguity and lacking grounding in realistic scenarios. In this work, we define a novel setting: vision-centric jailbreak, where visual information serves as a necessary component in constructing a complete and realistic jailbreak context. Building on this setting, we propose the VisCo (Visual Contextual)…
Taxonomy
Topics: Medical Imaging Techniques and Applications · Brain Metastases and Treatment · COVID-19 diagnosis using AI
