SafeCtrl: Region-Based Safety Control for Text-to-Image Diffusion via Detect-Then-Suppress

Lingyun Zhang; Yu Xie; Yanwei Fu; Ping Chen

arXiv:2508.11904·cs.CV·August 19, 2025

TL;DR

SafeCtrl is a region-based safety control for text-to-image diffusion models. It localizes unsafe content and suppresses it without compromising image fidelity, and it is trained with a novel strategy based on preference data.

Contribution

We propose SafeCtrl, a flexible detect-then-suppress safety mechanism that localizes unsafe regions and suppresses harmful semantics without explicit concept replacement, trained via Direct Preference Optimization.
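As background for the training objective named above, the following is a minimal sketch of the standard Direct Preference Optimization loss for a single preference pair. It is not taken from the paper: the function names, the argument names, and the default `beta=0.1` are illustrative assumptions, and how SafeCtrl obtains the log-probability terms for a diffusion model is not stated in this listing.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_logp_preferred: float,
    policy_logp_rejected: float,
    ref_logp_preferred: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift from the reference
) -> float:
    """Standard DPO loss for one (preferred, rejected) pair.

    Each argument is a log-probability of a sample under either the
    trainable policy or the frozen reference model.
    """
    # How much more the policy likes each sample than the reference does.
    preferred_margin = policy_logp_preferred - ref_logp_preferred
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # The loss falls as the policy favors the preferred sample more strongly.
    return -log_sigmoid(beta * (preferred_margin - rejected_margin))


if __name__ == "__main__":
    # Policy identical to the reference: loss equals log(2).
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy already favors the preferred sample: loss is lower.
    print(dpo_loss(-0.5, -2.0, -1.0, -1.0))
```

In a safety setting, the preferred sample would plausibly be the safe output and the rejected sample the unsafe one, but that pairing is an assumption here rather than a detail confirmed by the text.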

Findings

01. Outperforms state-of-the-art safety methods in efficacy and fidelity.

02. Effectively localizes unsafe content without pixel-level annotations.

03. Enables scalable, context-aware safety interventions in generative models.

Abstract

The widespread deployment of text-to-image models is challenged by their potential to generate harmful content. While existing safety methods, such as prompt rewriting or model fine-tuning, provide valuable interventions, they often introduce a trade-off between safety and fidelity. Recent localization-based approaches have shown promise, yet their reliance on explicit "concept replacement" can sometimes lead to semantic incongruity. To address these limitations, we explore a more flexible detect-then-suppress paradigm. We introduce SafeCtrl, a lightweight, non-intrusive plugin that first precisely localizes unsafe content. Instead of performing a hard A-to-B substitution, SafeCtrl then suppresses the harmful semantics, allowing the generative process to naturally and coherently resolve into a safe, context-aware alternative. A key aspect of our work is a novel training strategy using…
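To make the detect-then-suppress idea concrete, here is a toy sketch on a small feature map. It is an illustration only and not SafeCtrl's implementation: the cosine-similarity detector, the projection-based suppression, and every name in it (`detect_region`, `suppress_region`, `concept_direction`, `threshold`, `strength`) are assumptions made for this example.

```python
import numpy as np


def detect_region(features: np.ndarray, concept_direction: np.ndarray,
                  threshold: float = 0.5) -> np.ndarray:
    """Return a boolean (H, W) mask of locations aligned with a concept.

    features: (H, W, C) feature map; concept_direction: (C,) vector.
    Hypothetical detector: cosine similarity against one direction.
    """
    unit = concept_direction / np.linalg.norm(concept_direction)
    norms = np.linalg.norm(features, axis=-1) + 1e-8  # avoid division by zero
    cosine = (features @ unit) / norms
    return cosine > threshold


def suppress_region(features: np.ndarray, concept_direction: np.ndarray,
                    mask: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Remove the concept component inside the mask only.

    Nothing is substituted in its place; features outside the mask
    are returned unchanged.
    """
    unit = concept_direction / np.linalg.norm(concept_direction)
    coeff = features @ unit                       # (H, W) projection coefficients
    removal = strength * coeff[..., None] * unit  # (H, W, C) component to subtract
    return np.where(mask[..., None], features - removal, features)


if __name__ == "__main__":
    features = np.zeros((2, 2, 3))
    features[0, 0] = [2.0, 0.0, 1.0]  # strongly aligned with the concept
    features[1, 1] = [0.0, 1.0, 0.0]  # orthogonal to the concept
    direction = np.array([1.0, 0.0, 0.0])

    mask = detect_region(features, direction)
    cleaned = suppress_region(features, direction, mask)
    print(mask)
    print(cleaned[0, 0], cleaned[1, 1])
```

The point of the sketch is the contrast with a hard A-to-B substitution: only the flagged component is removed inside the detected region, and the remaining features are left for the generator to resolve.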

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques