Responsible Diffusion Models via Constraining Text Embeddings within Safe Regions

Zhiwen Li; Die Chen; Mingyuan Fan; Cen Chen; Yaliang Li; Yanhao Wang; Wenmeng Zhou

arXiv:2505.15427 · cs.CL · May 22, 2025

TL;DR

This paper introduces a novel method to constrain diffusion model text embeddings within safe regions, significantly reducing harmful content and biases without compromising overall image quality.

Contribution

The proposed self-discovery approach identifies a semantic direction vector in the embedding space and uses it to keep text embeddings within a safe region, making diffusion models safer and more robust against unsafe and biased outputs.
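As a rough illustration of what constraining an embedding along a semantic direction could look like, here is a minimal sketch. It is not the authors' implementation: the clamping rule, the function name `constrain_to_safe_region`, and the `max_projection` threshold are all assumptions made for this example, and the toy 3-dimensional vectors stand in for real text embeddings.

```python
import numpy as np


def constrain_to_safe_region(embedding, direction, max_projection):
    """Cap the component of `embedding` along `direction` at `max_projection`.

    Hypothetical helper for illustration only; the paper's actual
    constraint may differ.
    """
    # Normalize so the projection is measured in embedding units.
    unit = direction / np.linalg.norm(direction)
    # Scalar component of the embedding along the (assumed) unsafe direction.
    projection = float(embedding @ unit)
    # Only the part that exceeds the threshold is removed.
    excess = max(projection - max_projection, 0.0)
    return embedding - excess * unit


# Toy vectors: the first axis plays the role of the unsafe direction.
direction = np.array([1.0, 0.0, 0.0])
unsafe_embedding = np.array([2.0, 1.0, -1.0])
safe_embedding = np.array([0.2, 1.0, -1.0])

constrained = constrain_to_safe_region(unsafe_embedding, direction, max_projection=0.5)
unchanged = constrain_to_safe_region(safe_embedding, direction, max_projection=0.5)
```

In this sketch an embedding whose projection already lies below the threshold passes through unchanged, which mirrors the stated goal of leaving normal outputs largely unaffected; components orthogonal to the direction are never modified.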

Findings

01 Reduces NSFW content more effectively than baseline methods

02 Mitigates social biases in generated images

03 Maintains high image fidelity, with minimal impact on normal outputs

Abstract

The remarkable ability of diffusion models to generate high-fidelity images has led to their widespread adoption. However, concerns have also arisen regarding their potential to produce Not Safe for Work (NSFW) content and exhibit social biases, hindering their practical use in real-world applications. In response to this challenge, prior work has focused on employing security filters to identify and exclude toxic text, or alternatively, fine-tuning pre-trained diffusion models to erase sensitive concepts. Unfortunately, existing methods struggle to achieve satisfactory performance in the sense that they can have a significant impact on the normal model output while still failing to prevent the generation of harmful content in some cases. In this paper, we propose a novel self-discovery approach to identifying a semantic direction vector in the embedding space to restrict text embedding…

Code & Models

Repositories

lzws/Responsible-Diffusion (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques

Methods: Diffusion