OrchJail: Jailbreaking Tool-Calling Text-to-Image Agents by Orchestration-Guided Fuzzing

Jianming Chen; Yawen Wang; Junjie Wang; Zhe Liu; Qing Wang; Fanjiang Xu

arXiv:2605.07414 · cs.MA · May 11, 2026

TL;DR

OrchJail is an orchestration-guided fuzzing framework that jailbreaks tool-calling text-to-image (T2I) agents by steering the search toward high-risk tool-orchestration patterns, exposing unsafe outputs that arise from chains of individually benign tool calls.

Contribution

The paper introduces an orchestration-guided fuzzing approach that targets multi-step tool behaviors rather than surface-level prompt wording, improving jailbreak success and efficiency over existing prompt-only methods.

Findings

01. Achieves higher attack success rates against T2I agents.

02. Demonstrates lower query costs and better image fidelity.

03. Remains robust against common jailbreak defenses.

Abstract

Tool-calling text-to-image (T2I) agents can plan and execute multi-step tool chains to accomplish complex generation and editing queries. However, this capability introduces a new safety attack surface: harmful outputs may arise from tool orchestration, where individually benign steps combine into unsafe results, making prompt-only jailbreak techniques insufficient. We present OrchJail, an orchestration-guided fuzzing framework for jailbreaking tool-calling T2I agents. Its core idea is to exploit high-risk tool-orchestration patterns: by learning from successful jailbreak tool-calling traces and their causal relationships to prompt wording, OrchJail directly guides the fuzzing search toward prompts that are more likely to trigger unsafe multi-step tool behaviors, rather than relying on surface-level textual perturbations. Extensive experiments demonstrate that OrchJail improves…
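The idea of guiding the search by tool-calling traces rather than by prompt text can be illustrated with a toy sketch. This is not the authors' implementation: the keyword-routed `toy_agent`, the hand-written pattern set, the bigram-overlap `pattern_score`, and the greedy append-one-word mutator are all assumptions made for illustration. The abstract says OrchJail learns its patterns from successful traces and their causal links to prompt wording, and that learning step is not modelled here.

```python
import random
from typing import Set, Tuple

Trace = Tuple[str, ...]    # ordered tool names from one agent run
Pattern = Tuple[str, str]  # an adjacent pair of tool calls

# Hypothetical keyword -> tool routing; a real agent plans with an LLM.
TOY_ROUTES = {"draw": "generate", "replace": "inpaint", "merge": "compose"}


def toy_agent(prompt: str) -> Trace:
    """Stand-in agent: emit one tool call per routed keyword, in prompt order."""
    return tuple(TOY_ROUTES[w] for w in prompt.split() if w in TOY_ROUTES)


def pattern_score(trace: Trace, patterns: Set[Pattern]) -> float:
    """Fraction of the tracked patterns that occur as adjacent calls in the trace."""
    if not patterns:
        return 0.0
    seen = set(zip(trace, trace[1:]))
    return len(seen & patterns) / len(patterns)


def guided_fuzz(seed: str, patterns: Set[Pattern], budget: int,
                rng: random.Random) -> Tuple[str, float, int]:
    """Greedy search: keep a mutation only if its trace matches the patterns
    at least as well as the current best. Returns (prompt, score, queries),
    where queries counts mutated candidates sent to the agent."""
    vocab = list(TOY_ROUTES) + ["please", "softly"]  # assumed mutation vocabulary
    best, best_score = seed, pattern_score(toy_agent(seed), patterns)
    queries = 0
    while queries < budget and best_score < 1.0:
        candidate = best + " " + rng.choice(vocab)
        queries += 1
        score = pattern_score(toy_agent(candidate), patterns)
        if score >= best_score:  # feedback comes from the trace, not the text
            best, best_score = candidate, score
    return best, best_score, queries


if __name__ == "__main__":
    tracked = {("generate", "inpaint"), ("inpaint", "compose")}
    print(guided_fuzz("draw", tracked, budget=50, rng=random.Random(0)))
```

The point of the sketch is where the feedback signal comes from: a candidate is judged by the tool chain it triggers, so two prompts that look different on the surface but produce the same trace receive the same score.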
