Training-Free Safe Text Embedding Guidance for Text-to-Image Diffusion Models

Byeonghu Na; Mina Kang; Jiseok Kwak; Minsang Park; Jiwoo Shin; SeJoon Jun; Gayoung Lee; Jin-Hwa Kim; Il-Chul Moon

arXiv:2510.24012 · cs.LG · October 29, 2025

TL;DR

This paper introduces Safe Text embedding Guidance (STG), a training-free method that enhances the safety of text-to-image diffusion models by guiding text embeddings during sampling to reduce unsafe outputs.

Contribution

STG is a novel training-free approach that aligns diffusion model outputs with safety constraints by adjusting text embeddings based on a safety function during sampling.
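In an epsilon-prediction diffusion sampler, one standard way to obtain the "expected final denoised image" is Tweedie's estimate. A single gradient step on the text embedding c toward a higher safety score then takes the general form below. This is a sketch under assumed notation: the noise predictor ε_θ, the cumulative noise schedule ᾱ_t, the step size ρ and the safety function s are labels chosen for illustration. This summary does not state the paper's exact update rule, step-size schedule or choice of s.

```latex
\hat{x}_0(x_t, c) = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_\theta(x_t, t, c)}{\sqrt{\bar{\alpha}_t}},
\qquad
c \;\leftarrow\; c + \rho \, \nabla_c \, s\big(\hat{x}_0(x_t, c)\big)
```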

Findings

1. STG effectively reduces unsafe content in generated images across multiple safety scenarios.
2. STG outperforms existing training-based and training-free safety methods.
3. STG maintains high semantic fidelity of the generated images.

Abstract

Text-to-image models have recently made significant advances in generating realistic and semantically coherent images, driven by advanced diffusion models and large-scale web-crawled datasets. However, these datasets often contain inappropriate or biased content, raising concerns that the models will generate harmful outputs when given malicious text prompts. We propose Safe Text embedding Guidance (STG), a training-free approach to improve the safety of diffusion models by guiding the text embeddings during sampling. STG adjusts the text embeddings based on a safety function evaluated on the expected final denoised image, allowing the model to generate safer outputs without additional training. Theoretically, we show that STG aligns the underlying model distribution with safety constraints, thereby achieving safer outputs while minimally affecting generation quality. Experiments on…
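The following toy sketch illustrates the general idea of steering a text embedding with a safety function evaluated on the expected denoised image. Everything in it is a hypothetical stand-in chosen so the example runs without dependencies. It uses a linear noise predictor (`A`, `B`) and a linear "unsafe direction" `U` that defines the safety score. It computes the gradient by finite differences and updates the embedding with plain fixed-step gradient ascent. It does not reproduce the authors' implementation or update rule.

```python
import math

# Hypothetical toy parameters (not from the paper): a linear noise predictor
# eps(x_t, c) = A x_t + B c, and an "unsafe direction" U in image space.
A = [[0.1, 0.0], [0.0, 0.1]]
B = [[0.5, 0.2], [0.1, 0.4]]
U = [1.0, 0.5]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def eps_model(x_t, c):
    """Toy stand-in for a text-conditioned noise predictor eps_theta(x_t, t, c)."""
    return [a + b for a, b in zip(matvec(A, x_t), matvec(B, c))]


def expected_x0(x_t, c, alpha_bar):
    """Expected final denoised image via Tweedie's estimate for eps-prediction."""
    eps = eps_model(x_t, c)
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(x_t, eps)]


def safety_score(x0):
    """Hypothetical safety function: higher means safer (less aligned with U)."""
    return -sum(u * x for u, x in zip(U, x0))


def numerical_grad(f, c, h=1e-5):
    """Central finite-difference gradient of the scalar function f at c."""
    grad = []
    for i in range(len(c)):
        plus, minus = list(c), list(c)
        plus[i] += h
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2.0 * h))
    return grad


def guide_text_embedding(x_t, c, alpha_bar, step_size=0.1, n_steps=1):
    """Take gradient-ascent steps on the text embedding c so that the expected
    denoised image scores higher under safety_score; x_t is left unchanged."""
    def objective(emb):
        return safety_score(expected_x0(x_t, emb, alpha_bar))

    for _ in range(n_steps):
        g = numerical_grad(objective, c)
        c = [ci + step_size * gi for ci, gi in zip(c, g)]
    return c
```

For example, `guide_text_embedding([0.3, -0.2], [1.0, 1.0], 0.5, step_size=0.1, n_steps=3)` returns an embedding whose expected denoised image has a higher toy safety score than the original embedding. Finite differences keep the sketch dependency-free. A real sampler would instead backpropagate through the denoiser with automatic differentiation and apply the update inside the sampling loop.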

Videos

Training-Free Safe Text Embedding Guidance for Text-to-Image Diffusion Models· slideslive