DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs

Wenzhuo Xu; Zhipeng Wei; Zonghao Ying; Deyue Zhang; Dongdong Yang; Xiangzheng Zhang; Quanchen Zou

arXiv:2605.18915·cs.CR·May 20, 2026

TL;DR

This paper introduces DMN, a novel compositional framework that significantly improves jailbreaking success rates in multimodal large language models by exploiting multi-image inputs and visual reasoning tasks.

Contribution

The paper presents DMN, a new compositional jailbreak method that leverages distributed instructions, multimodal evidence, and number chain tasks to enhance attack effectiveness on MLLMs.

Findings

01 DMN achieves over 90% success rate on GPT-4o, Gemini-2.5-pro, and Claude Sonnet 4.

02 DMN outperforms existing jailbreak methods by a large margin.

03 The strategy exposes fundamental safety weaknesses in current MLLMs.

Abstract

Multimodal Large Language Models (MLLMs) are vulnerable to jailbreak attacks, which can elicit harmful responses. Many MLLMs support multi-image inputs, which inadvertently introduces new vulnerabilities because multi-image safety alignment has received less effort. Previous MLLM jailbreak methods use only a single image, which restricts the attack space: they cannot distribute harmful requests across multiple images, carry abundant information, or exploit additional visual reasoning tasks to distract MLLMs. To address these limitations, we propose a compositional jailbreak framework, DMN, which leverages Distributed instruction, Multimodal evidence, and a Number chain task to enhance jailbreak performance. Extensive experiments show that DMN is highly effective for MLLM jailbreaking, e.g., achieving attack success rates of over…
