Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-To-Image Generation Models
Yingkai Dong, Xiangtao Meng, Ning Yu, Zheng Li, Shanqing Guo

TL;DR
JailFuzzer is an LLM-driven fuzzing framework that efficiently generates natural jailbreak prompts to bypass safety measures in text-to-image models, highlighting vulnerabilities and aiding in developing stronger defenses.
Contribution
The paper introduces JailFuzzer, a novel LLM-based fuzzing framework that effectively creates natural jailbreak prompts in a black-box setting, outperforming existing methods in success rate and efficiency.
Findings
High success rate in jailbreaking T2I models
Generates natural, semantically coherent prompts
Reduces query overhead compared to prior methods
Abstract
Text-to-image (T2I) generative models have revolutionized content creation by transforming textual descriptions into high-quality images. However, these models are vulnerable to jailbreaking attacks, where carefully crafted prompts bypass safety mechanisms to produce unsafe content. While researchers have developed various jailbreak attacks to expose this risk, these methods face significant limitations, including impractical access requirements, easily detectable unnatural prompts, restricted search spaces, and high query demands on the target system. In this paper, we propose JailFuzzer, a novel fuzzing framework driven by large language model (LLM) agents, designed to efficiently generate natural and semantically meaningful jailbreak prompts in a black-box setting. Specifically, JailFuzzer employs fuzz-testing principles with three components: a seed pool for initial and jailbreak…
Taxonomy
Topics: Artificial Intelligence in Law · Digital and Cyber Forensics · Digital Media Forensic Detection
