PurifyGen: A Risk-Discrimination and Semantic-Purification Model for Safe Text-to-Image Generation

Zongsheng Cao; Yangfan He; Anran Liu; Jun Xie; Feng Chen; and Zepeng Wang

arXiv:2512.23546 · cs.CV · December 30, 2025

Open Access

TL;DR

PurifyGen is a training-free, dual-stage prompt purification method for safe text-to-image generation that effectively reduces unsafe content while preserving original prompt semantics.

Contribution

PurifyGen introduces a training-free prompt purification approach that combines semantic distance evaluation with a dual-space transformation for safer T2I generation.

Findings

01. Outperforms existing safety methods across five datasets.

02. Effectively removes unsafe content without retraining or keyword matching.

03. Maintains prompt intent and coherence after purification.

Abstract

Recent advances in diffusion models have notably enhanced text-to-image (T2I) generation quality, but they also raise the risk of generating unsafe content. Traditional safety methods like text blacklisting or harmful content classification have significant drawbacks: they can be easily circumvented or require extensive datasets and extra training. To overcome these challenges, we introduce PurifyGen, a novel, training-free approach for safe T2I generation that retains the model's original weights. PurifyGen introduces a dual-stage strategy for prompt purification. First, we evaluate the safety of each token in a prompt by computing its complementary semantic distance, which measures the semantic proximity between the prompt tokens and concept embeddings from predefined toxic and clean lists. This enables fine-grained prompt classification without explicit keyword matching or…
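
The first stage described in the abstract, scoring each prompt token against toxic and clean concept embeddings, can be illustrated with a minimal sketch. The abstract is truncated here and does not give the exact formula for the complementary semantic distance, so everything below is an assumption made for illustration: the score is taken as the cosine distance to the nearest clean concept minus the cosine distance to the nearest toxic concept, and the names `cosine`, `risk_scores`, `flag_risky`, and the threshold of 0.0 are hypothetical. The toy 2-D vectors stand in for real text-encoder embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def risk_scores(token_embs, toxic_embs, clean_embs):
    """Assumed scoring rule (not taken from the paper): a token is riskier
    when it lies closer to a toxic concept than to any clean concept."""
    scores = []
    for t in token_embs:
        # Distance to the nearest concept in each list.
        d_toxic = 1.0 - max(cosine(t, c) for c in toxic_embs)
        d_clean = 1.0 - max(cosine(t, c) for c in clean_embs)
        # Positive score: nearer to the toxic list than to the clean list.
        scores.append(d_clean - d_toxic)
    return scores

def flag_risky(scores, threshold=0.0):
    """Mark tokens whose score exceeds a hypothetical threshold."""
    return [s > threshold for s in scores]

# Toy embeddings: the first token points toward the toxic concept,
# the second toward the clean concept.
tokens = [[1.0, 0.0], [0.0, 1.0]]
toxic = [[1.0, 0.1]]
clean = [[0.1, 1.0]]

scores = risk_scores(tokens, toxic, clean)
flags = flag_risky(scores)
print(flags)
```

Because the decision depends on embedding proximity rather than string matching, a token can be flagged even when it does not appear verbatim in the toxic list, which is consistent with the abstract's claim of classification without explicit keyword matching.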


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Hate Speech and Cyberbullying Detection · Multimodal Machine Learning Applications