SPOT: Selective Prompt Projection via Total Variation for Inference-Only Safe Text-to-Image Generation
Minhyuk Lee, Hyekyung Yoon, Myungjoo Kang

TL;DR
SPOT is an inference-time framework that improves the safety of text-to-image diffusion models by selectively projecting prompts to safer alternatives without retraining the generator.
Contribution
It introduces a prompt projection method that uses large language models and visual safeguards to improve safety while preserving behavior on benign prompts.
Findings
SPOT reduces inappropriate image generation scores by up to 44.4%.
It stays close to the reference generator's behavior on benign prompts.
It is effective across multiple datasets and diffusion models.
Abstract
Text-to-Image (T2I) diffusion models enable high-quality, open-ended synthesis, but practical use requires suppressing unsafe generations while preserving behavior on benign prompts. We study this tension relative to the frozen generator, using its prompt-conditioned distribution as the preservation reference. Since T2I safety is commonly evaluated by bounded risk scores on generated images, total variation (TV) bounds how much expected risk can change from this reference. We call this fixed-reference constraint the Safety-Prompt Alignment Tradeoff (SPAT): reducing expected unsafety requires prompt-conditioned distributional deviation. To make this deviation selective and adjustable, we define the tau-safe set as prompts whose reference risk is at most tau, and cast intervention as projection toward nearby prompts in this set. We propose Selective Prompt prOjecTion (SPOT), an inference…
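The abstract's argument rests on a standard fact: for a risk score bounded in [0, 1], the gap in expected risk between two distributions is at most their total variation distance. The sketch below checks that bound on toy discrete distributions and illustrates selective projection onto a tau-safe set. It is not the authors' implementation. The names `risk_fn`, `dist_fn`, and `candidates` are hypothetical stand-ins, and the actual SPOT pipeline (described on this page as using large language models and visual safeguards) is not reproduced here.

```python
import numpy as np


def total_variation(p, q):
    """TV distance between two discrete distributions on the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * float(np.abs(p - q).sum())


def expected_risk(p, risk):
    """Expected value of a per-outcome risk score under distribution p."""
    return float(np.dot(np.asarray(p, dtype=float), np.asarray(risk, dtype=float)))


def project_prompt(prompt, candidates, risk_fn, dist_fn, tau):
    """Toy selective projection onto the tau-safe set (hypothetical interface).

    risk_fn(prompt)  -> reference risk of a prompt, assumed to lie in [0, 1]
    dist_fn(a, b)    -> non-negative distance between two prompts
    Returns the prompt unchanged if it is already tau-safe, otherwise the
    nearest tau-safe candidate, or None if no candidate qualifies.
    """
    if risk_fn(prompt) <= tau:
        return prompt  # selective: benign prompts are left untouched
    safe = [c for c in candidates if risk_fn(c) <= tau]
    if not safe:
        return None
    return min(safe, key=lambda c: dist_fn(prompt, c))


# Toy check of the bound |E_p[r] - E_q[r]| <= TV(p, q) for r in [0, 1].
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(6))      # stand-in for the reference distribution
q = rng.dirichlet(np.ones(6))      # stand-in for the intervened distribution
risk = rng.uniform(0.0, 1.0, 6)    # bounded risk score per outcome

gap = abs(expected_risk(p, risk) - expected_risk(q, risk))
tv = total_variation(p, q)
assert gap <= tv + 1e-12
```

Read in the other direction, the same inequality gives the tradeoff named in the abstract: lowering expected risk by some amount forces a TV deviation from the reference of at least that amount. Returning the prompt unchanged when its risk is at most tau is what keeps the deviation at zero for prompts already in the tau-safe set.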
