The Safety-Aware Denoiser for Text Diffusion Models
Amman Yusuf, Zhejun Jiang, Mijung Park

TL;DR
The paper introduces the Safety-Aware Denoiser (SAD), a novel inference-time safety guidance method for text diffusion models that reduces unsafe outputs without retraining.
Contribution
It presents SAD, a lightweight safety-guidance framework that steers text diffusion models toward safe outputs during denoising itself, unlike prior post-hoc filtering or inference-time interventions designed for autoregressive models.
Findings
SAD significantly reduces unsafe text generations.
It maintains high quality, diversity, and fluency in the generated text.
It outperforms existing safety methods in the reported experiments.
Abstract
Recent work on text diffusion models offers a promising alternative to autoregressive generation, but controlling their safety remains underexplored. Existing safety approaches are geared toward autoregressive models and typically rely on post-hoc filtering or inference-time interventions, which are inadequate for addressing safety risks in text diffusion models. We propose the Safety-Aware Denoiser (SAD), a safety-guidance framework for text diffusion models. SAD modifies the iterative denoising process so that the text sample at the final denoising step is steered toward provably safe regions of the text space. Because this inference-time method integrates safety constraints directly into the denoiser, it avoids computationally expensive retraining of the underlying diffusion model and enables flexible, lightweight safety guidance. We evaluate the safety of the generated text using…
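The abstract does not say how SAD encodes its safety constraints, so the following is only a rough illustration of the general idea of guiding each denoising step rather than filtering the final output. It assumes a toy masked (discrete) text diffusion loop and swaps in the simplest constraint imaginable: forbidding a fixed set of unsafe token ids by setting their scores to negative infinity before each token is committed. The names `toy_denoiser`, `safe_scores`, `safety_guided_denoise`, and `MASK` are hypothetical, the denoiser is a random stand-in for a trained model, and this is not the authors' method.

```python
import random

MASK = -1  # hypothetical id marking a position that is still masked


def toy_denoiser(seq, vocab_size, rng):
    """Hypothetical stand-in for a trained denoiser: one score per vocabulary
    token for every position. A real model would condition on `seq`."""
    return [[rng.random() for _ in range(vocab_size)] for _ in seq]


def safe_scores(scores, unsafe_ids):
    """Assumed safety constraint (illustrative only): unsafe tokens get a score
    of -inf so they can never be selected at this denoising step."""
    return [float("-inf") if tok in unsafe_ids else s for tok, s in enumerate(scores)]


def safety_guided_denoise(length, vocab_size, unsafe_ids, steps, seed=0):
    """Unmask a fully masked sequence over `steps` iterations, applying the
    safety constraint at every step instead of filtering the final text."""
    if set(range(vocab_size)) <= set(unsafe_ids):
        raise ValueError("at least one vocabulary token must be safe")
    rng = random.Random(seed)
    seq = [MASK] * length                # start from an all-masked sequence
    per_step = -(-length // steps)       # ceiling division: positions per step
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        scores = toy_denoiser(seq, vocab_size, rng)
        for i in masked[:per_step]:      # commit a few positions each step
            guided = safe_scores(scores[i], unsafe_ids)
            seq[i] = max(range(vocab_size), key=guided.__getitem__)
    return seq


if __name__ == "__main__":
    out = safety_guided_denoise(length=8, vocab_size=10, unsafe_ids={3, 7}, steps=4)
    assert MASK not in out and not ({3, 7} & set(out))
    print(out)
```

A token blocklist only guarantees that the listed ids never appear; the "provably safe regions of the text space" described in the abstract are presumably a richer notion, so the sketch should be read as showing where a constraint enters the denoising loop, not what SAD's constraint is.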
