MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs

Yilian Liu; Xiaojun Jia; Guoshun Nan; Jiuyang Lyu; Zhican Chen; Tao Guan; Shuyuan Luo; Zhongyi Zhai; Yang Liu

arXiv:2603.00565 · cs.CV · March 3, 2026

Open Access · 3 Reviews

TL;DR

The paper introduces MIDAS, a multimodal jailbreak framework that disperses harmful semantics across multiple images and uses cross-image reasoning to bypass safety mechanisms in MLLMs, significantly increasing attack success rates.

Contribution

MIDAS is the first framework to decompose and disperse malicious content across multiple images, enabling more effective jailbreak attacks on advanced MLLMs.

Findings

01

Achieves an average attack success rate of 81.46% across 4 closed-source MLLMs.

02

Outperforms state-of-the-art jailbreak methods in effectiveness.

03

Enforces longer, structured multi-image reasoning to bypass safety defenses.
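The averaged attack success rate (ASR) in the first finding can be read as a mean of per-model success percentages. The sketch below shows one common way to compute such a figure. It assumes an unweighted (macro) average over models and a boolean judgement per attempt; this page does not state how the paper aggregates or judges attempts, and the function names, model names, and sample data are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def attack_success_rate(judgements):
    """Return the percentage of attempts judged successful (True)."""
    if not judgements:
        raise ValueError("need at least one judged attempt")
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(judgements) / len(judgements)


def macro_average_asr(per_model_judgements):
    """Return the unweighted mean of per-model ASR values, in percent.

    Assumption: every model contributes equally, regardless of how many
    attempts were judged for it.
    """
    return mean(attack_success_rate(j) for j in per_model_judgements.values())


# Hypothetical placeholder data, NOT taken from the paper.
placeholder = {
    "model_a": [True, True, False, True],   # 75% of attempts succeed
    "model_b": [True, False, False, True],  # 50% of attempts succeed
}

print(macro_average_asr(placeholder))  # 62.5
```

A macro average weights every model equally. A micro average, which pools all attempts before dividing, would give a different number whenever the models are evaluated on different numbers of prompts.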

Abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable performance but remain vulnerable to jailbreak attacks that can induce harmful content and undermine their secure deployment. Previous studies have shown that introducing additional inference steps, which disrupt security attention, can make MLLMs more susceptible to being misled into generating malicious content. However, these methods rely on single-image masking or isolated visual cues, which only modestly extend reasoning paths and thus achieve limited effectiveness, particularly against strongly aligned commercial closed-source models. To address this problem, in this paper, we propose Multi-Image Dispersion and Semantic Reconstruction (MIDAS), a multimodal jailbreak framework that decomposes harmful semantics into risk-bearing subunits, disperses them across multiple visual clues, and leverages cross-image reasoning…

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The paper reports strong numerical results compared to SOTA methods. It introduces a comprehensive framework and a well-presented approach, backed by large-scale experiments that support the claims made and by further analyses, such as the evaluation against an external safety detection mechanism.

Weaknesses

The idea that a longer chain-of-thought with innocuous early steps eases jailbreaks is not novel (for example, VisCRA, which the authors mention, already supports this idea). That said, the authors introduce a new method and cleverly engineer this idea. I would reduce the amount of mathematical notation in favor of more detail on method design and implementation. See the details below.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper presents a novel jailbreak formulation that disperses harmful semantics across multiple images and reconstructs them through structured reasoning, a clear departure from prior single-image or heuristic-based attacks. The combination of game-based visual reasoning and persona-driven textual reconstruction is a creative and conceptually new approach to multimodal adversarial prompting.
2. The jailbreak method proposed by MIDAS exploits the model’s own reasoning capabilities. As modern

Weaknesses

1. Limited theoretical grounding. While MIDAS provides an intuitive probabilistic formulation, the framework remains largely empirical. The paper would benefit from a deeper theoretical analysis of why cross-image dispersion weakens alignment, e.g., formalizing how reasoning-chain extension interacts with attention allocation or safety gating mechanisms.
2. Ambiguity in puzzle design generalization. The game-style visual reasoning templates are described conceptually but not quantitatively analyz

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- Clear writing and storytelling that are easy to follow.
- Good comparisons on several models demonstrate the effectiveness of the method.

Weaknesses

- Lack of novelty, which is my major concern. Previous work has demonstrated that hiding sensitive words in images can enhance the jailbreak success rate, which is why FigStep and MM-SafetyBench focus on early models. Besides, given that GPT-4V could defend against FigStep, the FigStep authors also proposed FigStep-Pro, which leverages the model's capability of analyzing each sub-token shown in the images, concatenating the words, and then answering the question. From this perspective, the novelty lies in an

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis