VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack

Shiji Zhao; Shukun Xiong; Yao Huang; Yan Jin; Zhenyu Wu; Jiyang Guan; Ranjie Duan; Jialing Tao; Hui Xue; Xingxing Wei

arXiv:2512.05853 · cs.CV · December 9, 2025

TL;DR

This paper introduces VRSA, an attack method that exploits the visual reasoning of multimodal large language models (MLLMs) to induce harmful outputs. It decomposes a harmful request into a sequence of related sub-images, revealing vulnerabilities in the visual modality.

Contribution

The paper proposes VRSA, a sequential attack framework that targets visual reasoning in MLLMs. It uses techniques for scene refinement, semantic coherence, and consistency to improve attack success.

Findings

1. VRSA achieves higher attack success rates than existing methods.

2. The attack effectively exploits visual reasoning vulnerabilities in MLLMs.

3. Experimental results on GPT-4o and Claude-4.5-Sonnet demonstrate its effectiveness.

Abstract

Multimodal Large Language Models (MLLMs) are widely used in various fields due to their powerful cross-modal comprehension and generation capabilities. However, each added modality also introduces new vulnerabilities that jailbreak attacks can exploit to induce MLLMs to output harmful content. Because MLLMs have strong reasoning abilities, previous jailbreak attacks have explored reasoning-related safety risks in the text modality, while similar threats in the visual modality have been largely overlooked. To fully evaluate potential safety risks in visual reasoning tasks, we propose Visual Reasoning Sequential Attack (VRSA). VRSA induces MLLMs to gradually externalize and aggregate the complete harmful intent by decomposing the original harmful text into several sequentially related sub-images. In particular, to improve the plausibility of the scenes in the image sequence, we propose Adaptive Scene Refinement…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection