PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation

Zonglei Jing; Xiao Yang; Xiaoqian Li; Siyuan Liang; Aishan Liu; Mingchuan Zhang; Xianglong Liu

arXiv:2508.01272 · cs.CV · August 15, 2025

TL;DR

PromptSafe introduces a gated prompt tuning method that adaptively enhances safety in text-to-image models by rewriting unsafe prompts and controlling unsafe content without degrading benign image quality.

Contribution

The paper presents a novel gated prompt tuning framework. It pairs a lightweight soft embedding, trained with text-only supervision, with an inference-time gated control network to improve safety in text-to-image (T2I) models, reducing reliance on large-scale curated image-text datasets.

Findings

01. Achieves a state-of-the-art unsafe generation rate of 2.36%.
02. Maintains high fidelity for benign images.
03. Demonstrates robustness against unseen harmful categories and adversarial attacks.
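The findings report safety as an unsafe generation rate. This sketch assumes a common reading of that metric: the percentage of generated images that an NSFW detector flags as unsafe, so lower is better. The assumption does not come from the paper. The function computes the rate from a list of per-image verdicts. The detector, the function name `unsafe_generation_rate`, and the example verdicts are all hypothetical.

```python
from typing import Sequence


def unsafe_generation_rate(flags: Sequence[bool]) -> float:
    """Percentage of generated images flagged unsafe by a (hypothetical) NSFW detector."""
    if not flags:
        raise ValueError("need at least one generated image")
    # True counts as 1 in sum(), so this is (#flagged / #generated) * 100.
    return 100.0 * sum(flags) / len(flags)


# Illustrative detector verdicts for eight generations (not data from the paper).
example_flags = [False, False, True, False, False, False, False, False]
print(f"unsafe generation rate: {unsafe_generation_rate(example_flags):.2f}%")
```

Under this reading, a rate of 2.36% means roughly 2 to 3 flagged images per 100 generations. The exact figure depends on which prompts and which detector the evaluation uses.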

Abstract

Text-to-image (T2I) models have demonstrated remarkable generative capabilities but remain vulnerable to producing not-safe-for-work (NSFW) content, such as violent or explicit imagery. While recent moderation efforts have introduced soft prompt-guided tuning by appending defensive tokens to the input, these approaches often rely on large-scale curated image-text datasets and apply static, one-size-fits-all defenses at inference time. This results not only in high computational cost and degraded benign image quality, but also in limited adaptability to the diverse and nuanced safety requirements of real-world prompts. To address these challenges, we propose PromptSafe, a gated prompt tuning framework that combines a lightweight, text-only supervised soft embedding with an inference-time gated control network. Instead of training on expensive image-text datasets, we first…
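The abstract describes the mechanism only at a high level. A soft embedding is trained with text-only supervision, and a gated control network decides at inference time how strongly to apply it. The sketch below shows one plausible form of such gating and is not the paper's architecture. It makes two hypothetical assumptions:

- The gate is a logistic score over the mean-pooled prompt embedding.
- The learned defensive tokens are scaled by that score and appended after the prompt tokens.

The names `gate_score` and `apply_gated_soft_prompt`, the toy 2-D embeddings, and the weights are all illustrative.

```python
import math
from typing import List, Tuple

# A token embedding is a plain list of floats (toy stand-in for a text-encoder output).
Vector = List[float]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average token embeddings into one prompt-level vector."""
    dim = len(tokens[0])
    return [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]


def gate_score(prompt_tokens: List[Vector], w: Vector, b: float) -> float:
    """Hypothetical gate: logistic score in (0, 1) estimating how much defense to apply."""
    pooled = mean_pool(prompt_tokens)
    z = sum(wi * xi for wi, xi in zip(w, pooled)) + b
    return 1.0 / (1.0 + math.exp(-z))


def apply_gated_soft_prompt(
    prompt_tokens: List[Vector], soft_tokens: List[Vector], w: Vector, b: float
) -> Tuple[List[Vector], float]:
    """Append the soft safety tokens, scaled by the gate value, after the prompt tokens."""
    g = gate_score(prompt_tokens, w, b)
    scaled = [[g * v for v in tok] for tok in soft_tokens]
    return prompt_tokens + scaled, g


# Toy values (illustrative only): one learned safety token in a 2-D embedding space.
SOFT = [[0.5, 0.5]]
W, B = [4.0, 4.0], -3.0

benign = [[0.1, -0.2], [0.0, -0.1]]
unsafe = [[0.9, 0.8], [1.0, 0.7]]

benign_out, g_benign = apply_gated_soft_prompt(benign, SOFT, W, B)
unsafe_out, g_unsafe = apply_gated_soft_prompt(unsafe, SOFT, W, B)
print(f"benign gate={g_benign:.3f}, unsafe gate={g_unsafe:.3f}")
```

With a high gate value, the defensive tokens pass through almost unchanged. With a low value, they are nearly zeroed, so benign prompts are left close to their original conditioning. This illustrates the adaptive, rather than static, one-size-fits-all, defense the abstract argues for. A real system would learn the gate weights and soft tokens and feed the result into the T2I model's text conditioning.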

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Hate Speech and Cyberbullying Detection