When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models

Jiacheng Hou; Yining Sun; Ruochong Jin; Haochen Han; Fangming Liu; Wai Kin Victor Chan; Alex Jinpeng Wang

arXiv:2602.10179 · cs.CV · February 12, 2026

TL;DR

This paper introduces a novel visual jailbreak attack on large image editing models, demonstrates its effectiveness, and proposes a training-free defense to enhance safety without significant computational costs.

Contribution

It presents the first visual-to-visual jailbreak attack, introduces the IESBench benchmark, and proposes a simple yet effective defense mechanism for image editing models.

Findings

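01. VJA achieves attack success rates of up to 80.9% on Nano Banana Pro and 70.1% on GPT-Image-1.5.
02. IESBench, a safety-oriented benchmark for image editing models, exposes these vulnerabilities in state-of-the-art commercial models.
03. The proposed training-free defense improves safety with negligible computational overhead.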
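
For readers unfamiliar with the metric, the attack success rate quoted above can be sketched as the fraction of attack attempts judged successful. This is a minimal illustration under that common definition; the paper's actual judging protocol is not described on this page, and the function name `attack_success_rate` and the sample outcomes are hypothetical.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Return the fraction of attack attempts judged successful.

    Assumed definition: successes / total attempts. How a single
    attempt is judged successful is outside the scope of this sketch.
    """
    if not outcomes:
        raise ValueError("need at least one attack attempt")
    return sum(outcomes) / len(outcomes)


# Hypothetical per-sample judgments, not data from the paper.
sample_outcomes = [True, True, False, True]
print(f"ASR = {attack_success_rate(sample_outcomes):.1%}")  # ASR = 75.0%
```

Under this reading, a reported rate of 80.9% would mean that roughly four in five attack attempts on the benchmark were judged to have bypassed the model's safeguards.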

Abstract

Recent advances in large image editing models have shifted the paradigm from text-driven instructions to vision-prompt editing, where user intent is inferred directly from visual inputs such as marks, arrows, and visual-text prompts. While this paradigm greatly expands usability, it also introduces a critical and underexplored safety risk: the attack surface itself becomes visual. In this work, we propose Vision-Centric Jailbreak Attack (VJA), the first visual-to-visual jailbreak attack that conveys malicious instructions purely through visual inputs. To systematically study this emerging threat, we introduce IESBench, a safety-oriented benchmark for image editing models. Extensive experiments on IESBench demonstrate that VJA effectively compromises state-of-the-art commercial models, achieving attack success rates of up to 80.9% on Nano Banana Pro and 70.1% on GPT-Image-1.5. To…

Code & Models

Datasets

CSU-JPG/IESBench (dataset, 1.6k downloads)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Digital Media Forensic Detection