TL;DR
PromptGuard introduces a soft-prompt-based safety mechanism for text-to-image (T2I) models that reduces unsafe content generation without sacrificing image quality or inference speed.
Contribution
The paper proposes a universal soft prompt for NSFW moderation in T2I models and reports that it outperforms existing methods in both speed and safety.
Findings
PromptGuard reduces the share of unsafe generated images to around 6%.
It is reported to be 3.8 times faster than previous content moderation techniques.
The method maintains high-quality benign image generation.
Abstract
Recent text-to-image (T2I) models have exhibited remarkable performance in generating high-quality images from text descriptions. However, these models are vulnerable to misuse, particularly generating not-safe-for-work (NSFW) content, such as sexually explicit, violent, political, and disturbing images, raising serious ethical concerns. In this work, we present PromptGuard, a novel content moderation technique that draws inspiration from the system prompt mechanism in large language models (LLMs) for safety alignment. Unlike LLMs, T2I models lack a direct interface for enforcing behavioral guidelines. Our key idea is to optimize a safety soft prompt that functions as an implicit system prompt within the T2I model's textual embedding space. This universal soft prompt (P*) directly moderates NSFW inputs, enabling safe yet realistic image generation without affecting inference efficiency…
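
The idea of a single learned soft prompt living in the embedding space can be illustrated with a toy sketch. This is not the paper's training procedure, which the excerpt above does not describe. It assumes a hypothetical 2-D embedding space, uses mean pooling as a stand-in for a real text encoder, and replaces the real objective with a squared distance to hand-picked targets. All names and numbers are made up for illustration.

```python
# Toy sketch (hypothetical): learn one soft-prompt vector that is appended to
# every prompt's token embeddings. Mean pooling stands in for a text encoder.

def pooled(tokens, soft):
    """Mean-pool the token embeddings together with the appended soft prompt."""
    rows = tokens + soft
    dim = len(rows[0])
    return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def total_loss(pairs, soft):
    """Sum of squared distances between pooled(prompt + soft) and each target."""
    return sum(sq_dist(pooled(tokens, soft), target) for tokens, target in pairs)

def train_soft_prompt(pairs, soft, lr=0.5, steps=200):
    """Gradient descent on the soft-prompt vectors only; prompts stay fixed."""
    soft = [row[:] for row in soft]  # copy so the caller's list is untouched
    for _ in range(steps):
        grads = [[0.0] * len(row) for row in soft]
        for tokens, target in pairs:
            n = len(tokens) + len(soft)
            c = pooled(tokens, soft)
            for g in grads:
                for d in range(len(g)):
                    # For mean pooling: d/dP ||c - t||^2 = 2 (c - t) / n
                    g[d] += 2.0 * (c[d] - target[d]) / n
        for row, g in zip(soft, grads):
            for d in range(len(row)):
                row[d] -= lr * g[d]
    return soft

# Assumed toy embeddings: axis 0 = content, axis 1 = "unsafe-ness".
unsafe_prompt = [[1.0, 1.0], [0.5, 0.8]]
benign_prompt = [[1.0, 0.0], [0.4, 0.1]]
safe_target = [0.75, 0.0]                  # same content, unsafe axis zeroed
benign_target = pooled(benign_prompt, [])  # benign output should not move

pairs = [(unsafe_prompt, safe_target), (benign_prompt, benign_target)]
init_soft = [[0.0, 0.0]]                   # one soft-prompt vector
learned = train_soft_prompt(pairs, init_soft)

loss_before = total_loss(pairs, init_soft)
loss_after = total_loss(pairs, learned)
print("loss before:", loss_before, "loss after:", loss_after)
print("learned soft prompt:", learned)
```

Because the same vector is appended to both prompts, the toy optimizer has to balance pulling the unsafe prompt toward its safe target against leaving the benign prompt unchanged. Nothing is added at inference time beyond the extra embedding, which is consistent with the abstract's claim that the soft prompt does not affect inference efficiency.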
