Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-To-Image Generation Models

Yingkai Dong; Xiangtao Meng; Ning Yu; Zheng Li; Shanqing Guo

arXiv:2408.00523 · cs.CR · June 26, 2025 · 2 cites

TL;DR

JailFuzzer is an LLM-driven fuzzing framework that efficiently generates natural jailbreak prompts to bypass safety measures in text-to-image models, highlighting vulnerabilities and aiding in developing stronger defenses.

Contribution

The paper introduces JailFuzzer, a novel LLM-based fuzzing framework that effectively creates natural jailbreak prompts in a black-box setting, outperforming existing methods in success rate and efficiency.

Findings

01 High success rate in jailbreaking T2I models
02 Generates natural, semantically coherent prompts
03 Reduces query overhead compared to prior methods

Abstract

Text-to-image (T2I) generative models have revolutionized content creation by transforming textual descriptions into high-quality images. However, these models are vulnerable to jailbreaking attacks, where carefully crafted prompts bypass safety mechanisms to produce unsafe content. While researchers have developed various jailbreak attacks to expose this risk, these methods face significant limitations, including impractical access requirements, easily detectable unnatural prompts, restricted search spaces, and high query demands on the target system. In this paper, we propose JailFuzzer, a novel fuzzing framework driven by large language model (LLM) agents, designed to efficiently generate natural and semantically meaningful jailbreak prompts in a black-box setting. Specifically, JailFuzzer employs fuzz-testing principles with three components: a seed pool for initial and jailbreak…

Code & Models

Repositories

yingkaid/jailfuzzer (PyTorch, Official)

Taxonomy

Topics: Artificial Intelligence in Law · Digital and Cyber Forensics · Digital Media Forensic Detection

Methods: Focus