GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization

Zixuan Chen; Hao Lin; Ke Xu; Xinghao Jiang; Tanfeng Sun

arXiv:2505.18979·cs.LG·May 27, 2025

TL;DR

GhostPrompt is an automated framework that uses dynamic prompt optimization and multimodal feedback to bypass modern text-to-image safety filters, exposing weaknesses in current safety measures.

Contribution

The paper introduces GhostPrompt, which the authors describe as the first automated jailbreak method to combine dynamic prompt optimization with multimodal feedback in order to bypass advanced safety filters in T2I models.

Findings

01. Achieves a 99.0% bypass rate on ShieldLM-7B.

02. Improves CLIP score from 0.2637 to 0.2762.

03. Reduces time cost by a factor of 4.2.
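
For context on the CLIP score finding: a CLIP score is commonly computed as the cosine similarity between a CLIP image embedding and a CLIP text embedding. The sketch below assumes that definition (this summary does not state the exact formula or scaling the authors use) and substitutes made-up toy vectors for real CLIP embeddings. The helper names `cosine_similarity` and `relative_change` are illustrative, not taken from the paper. It also works out how large the reported change is in relative terms.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def relative_change(before, after):
    """Fractional change from `before` to `after`."""
    return (after - before) / before


# Toy stand-ins for embeddings (assumption: real CLIP vectors would have
# hundreds of dimensions and come from a CLIP image and text encoder).
image_embedding = [1.0, 0.0, 1.0]
text_embedding = [1.0, 1.0, 0.0]
toy_score = cosine_similarity(image_embedding, text_embedding)

# The two CLIP scores reported in the findings.
reported_gain = relative_change(0.2637, 0.2762)

print(f"toy similarity: {toy_score:.4f}")
print(f"relative change in reported CLIP score: {reported_gain:.2%}")
```

Under this reading, the move from 0.2637 to 0.2762 is a relative increase of about 4.7%, which puts the size of the semantic-alignment gain in perspective next to the much larger change in bypass rate.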

Abstract

Text-to-image (T2I) generation models can inadvertently produce not-safe-for-work (NSFW) content, prompting the integration of text and image safety filters. Recent advances employ large language models (LLMs) for semantic-level detection, rendering traditional token-level perturbation attacks largely ineffective. Our evaluation shows that existing jailbreak methods are ineffective against these modern filters. We introduce GhostPrompt, the first automated jailbreak framework that combines dynamic prompt optimization with multimodal feedback. It consists of two key components: (i) Dynamic Optimization, an iterative process that guides an LLM using feedback from text safety filters and CLIP similarity scores to generate semantically aligned adversarial prompts; and (ii) Adaptive Safety Indicator Injection, which formulates the injection of benign visual…

Taxonomy

Topics: Digital Media Forensic Detection · Digital and Cyber Forensics · Generative Adversarial Networks and Image Synthesis

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Multi-Head Attention · Layer Normalization · Byte Pair Encoding