DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs
Wenzhuo Xu, Zhipeng Wei, Zonghao Ying, Deyue Zhang, Dongdong Yang, Xiangzheng Zhang, Quanchen Zou

TL;DR
This paper introduces DMN, a compositional framework that substantially raises jailbreak success rates against multimodal large language models (MLLMs) by exploiting multi-image inputs and visual reasoning tasks.
Contribution
The paper presents DMN, a compositional jailbreak method that combines distributed instructions, multimodal evidence, and a number chain task to make attacks on MLLMs more effective.
Findings
DMN achieves attack success rates of over 90% on GPT-4o, Gemini-2.5-pro, and Claude Sonnet 4.
DMN outperforms existing jailbreak methods by a large margin.
The strategy exposes fundamental safety weaknesses in current MLLMs.
Abstract
Multimodal Large Language Models (MLLMs) are vulnerable to jailbreak attacks, which can elicit harmful responses from them. Many MLLMs support multi-image inputs, which inadvertently introduces new vulnerabilities because comparatively little effort has gone into multi-image safety alignment. Previous MLLM jailbreak methods use only a single image, which restricts the attack space: they cannot distribute harmful requests across multiple images, carry rich information, or exploit additional visual reasoning tasks to distract MLLMs. To address these limitations, we propose a compositional jailbreak framework, DMN, which combines Distributed instruction, Multimodal evidence, and a Number chain task to enhance jailbreak performance. Extensive experiments show that DMN is highly effective for MLLM jailbreaking, e.g., achieving attack success rates of over…
