VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models
Bingrui Sima, Linhua Cong, Wenxuan Wang, Kun He

TL;DR
This paper reveals that improved visual reasoning in multimodal large language models increases their vulnerability to jailbreak attacks, and introduces VisCRA, a novel method exploiting reasoning chains to bypass safety measures.
Contribution
The paper introduces VisCRA, a new attack framework that leverages visual reasoning chains to effectively jailbreak multimodal large language models, highlighting security risks associated with advanced visual reasoning.
Findings
VisCRA achieves over 76% success rate on Gemini 2.0.
It attains nearly 69% success on QvQ-Max.
It reaches 57% success on GPT-4o.
Abstract
The emergence of Multimodal Large Reasoning Models (MLRMs) has enabled sophisticated visual reasoning capabilities by integrating reinforcement learning and Chain-of-Thought (CoT) supervision. However, while these enhanced reasoning capabilities improve performance, they also introduce new and underexplored safety risks. In this work, we systematically investigate the security implications of advanced visual reasoning in MLRMs. Our analysis reveals a fundamental trade-off: as visual reasoning improves, models become more vulnerable to jailbreak attacks. Motivated by this critical finding, we introduce VisCRA (Visual Chain Reasoning Attack), a novel jailbreak framework that exploits the visual reasoning chains to bypass safety mechanisms. VisCRA combines targeted visual attention masking with a two-stage reasoning induction strategy to precisely control harmful outputs. Extensive…
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
Methods: Softmax · Attention Is All You Need
