Conceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in MLLMs

Md Farhamdur Reza, Richeng Jin, Tianfu Wu, and Huaiyu Dai

arXiv:2605.05709 · cs.AI · May 8, 2026

TL;DR

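This paper analyzes a tradeoff in intent-obfuscation jailbreaks against multimodal large language models (MLLMs). The transformed input must conceal harmful intent from safety mechanisms, yet remain recoverable enough for the victim model to reconstruct the original request. The paper proposes strategies that exploit this tradeoff.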

Contribution

It introduces concealment-aware variant construction and distractor images, which together better balance concealment and reconstruction and expose vulnerabilities in MLLMs.

Findings

1. Existing transformations struggle to balance concealment and reconstruction.

2. Character-removed variants improve the concealment-reconstruction tradeoff.

3. The proposed strategies outperform baselines in revealing harmful intent.

Abstract

Intent-obfuscation-based jailbreak attacks on multimodal large language models (MLLMs) transform a harmful query into a concealed multimodal input to bypass safety mechanisms. We show that such attacks are governed by a reconstruction-concealment tradeoff: the transformed input must hide harmful intent from safety filters while remaining recoverable enough for the victim model to reconstruct the original request. Through a reconstruction analysis of three representative black-box methods, we find that existing transformations struggle to balance this tradeoff, limiting their effectiveness. In contrast, we show that character-removed variants achieve a better balance. Building on this, we propose concealment-aware variant construction, which greedily selects character-removed variants that are low in harmful-keyword alignment and mutually diverse, and instantiates them…