PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models

Lingzhi Yuan; Xinfeng Li; Chejian Xu; Guanhong Tao; Xiaojun Jia; Yihao Huang; Wei Dong; Yang Liu; Bo Li

arXiv:2501.03544·cs.CV·May 12, 2026

TL;DR

PromptGuard introduces a soft prompt-based safety mechanism for text-to-image models, effectively reducing unsafe content generation without sacrificing image quality or inference speed.

Contribution

The paper proposes a universal soft prompt for NSFW moderation in T2I models and reports that it outperforms existing moderation methods in both speed and safety.

Findings

01. PromptGuard reduces unsafe image outputs to around 6%.

02. It is 3.8 times faster than previous moderation techniques.

03. The method maintains high-quality benign image generation.

Abstract

Recent text-to-image (T2I) models have exhibited remarkable performance in generating high-quality images from text descriptions. However, these models are vulnerable to misuse, particularly generating not-safe-for-work (NSFW) content, such as sexually explicit, violent, political, and disturbing images, raising serious ethical concerns. In this work, we present PromptGuard, a novel content moderation technique that draws inspiration from the system prompt mechanism in large language models (LLMs) for safety alignment. Unlike LLMs, T2I models lack a direct interface for enforcing behavioral guidelines. Our key idea is to optimize a safety soft prompt that functions as an implicit system prompt within the T2I model's textual embedding space. This universal soft prompt (P*) directly moderates NSFW inputs, enabling safe yet realistic image generation without affecting inference efficiency…
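
As a rough illustration of how a soft prompt can sit in the textual embedding space, the minimal sketch below attaches an already-optimized soft prompt to a user prompt's token embeddings before they reach the text encoder. It is not the authors' implementation. The function name `apply_soft_prompt`, appending at the end of the sequence, the truncation policy, and the 77-token limit (typical of CLIP-style encoders) are all assumptions made for illustration, and the optimization that produces P* is not shown.

```python
from typing import List

Vector = List[float]


def apply_soft_prompt(
    prompt_embeds: List[Vector],
    soft_prompt: List[Vector],
    max_len: int = 77,  # assumed context length of a CLIP-style text encoder
) -> List[Vector]:
    """Append fixed soft-prompt vectors to a prompt's token embeddings.

    Hypothetical helper: the user prompt is truncated if needed so that
    the soft prompt always fits inside the encoder's context window.
    """
    if len(soft_prompt) > max_len:
        raise ValueError("soft prompt is longer than the context window")

    # All vectors must share one embedding dimension.
    dims = {len(v) for v in prompt_embeds + soft_prompt}
    if len(dims) > 1:
        raise ValueError("embedding dimensions do not match")

    # Keep as many user-prompt tokens as fit, then add the soft prompt.
    keep = max_len - len(soft_prompt)
    return prompt_embeds[:keep] + soft_prompt


if __name__ == "__main__":
    user = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy 2-d token embeddings
    p_star = [[9.0, 9.0]]                        # toy stand-in for P*
    print(apply_soft_prompt(user, p_star, max_len=3))
```

Because the soft prompt is a fixed set of vectors at inference time, this step is only a list concatenation and adds no extra model pass, which fits the abstract's claim that inference efficiency is unaffected.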

Code & Models

Repositories

https://t2i-promptguard.github.io
