SafeCtrl: Region-Based Safety Control for Text-to-Image Diffusion via Detect-Then-Suppress
Lingyun Zhang, Yu Xie, Yanwei Fu, Ping Chen

TL;DR
SafeCtrl introduces a region-based safety control for text-to-image diffusion models that localizes and suppresses unsafe content without compromising image fidelity, using a novel training strategy with preference data.
Contribution
We propose SafeCtrl, a flexible detect-then-suppress safety mechanism that localizes unsafe regions and suppresses harmful semantics without explicit concept replacement, trained via Direct Preference Optimization.
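To make the reference to Direct Preference Optimization concrete, here is a minimal sketch of the standard DPO loss for one preference pair (a preferred and a rejected sample). It is the generic formulation, not necessarily the exact objective used to train SafeCtrl, which this summary does not spell out. The function name, argument names, and the default `beta` are illustrative assumptions.

```python
import math


def dpo_loss(policy_logp_preferred, policy_logp_rejected,
             ref_logp_preferred, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single (preferred, rejected) pair.

    All names and the default beta are illustrative assumptions,
    not values taken from the SafeCtrl paper.
    """
    # How much more the trained model likes each sample than the frozen reference does.
    preferred_margin = policy_logp_preferred - ref_logp_preferred
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (preferred_margin - rejected_margin)
    # -log(sigmoid(x)) written as a numerically stable softplus(-x).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

The loss equals log 2 when the trained model and the reference agree on both samples, and it falls as the model shifts probability toward the preferred sample relative to the reference.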
Findings
Outperforms state-of-the-art safety methods in efficacy and fidelity.
Effectively localizes unsafe content without pixel-level annotations.
Enables scalable, context-aware safety interventions in generative models.
Abstract
The widespread deployment of text-to-image models is challenged by their potential to generate harmful content. While existing safety methods, such as prompt rewriting or model fine-tuning, provide valuable interventions, they often introduce a trade-off between safety and fidelity. Recent localization-based approaches have shown promise, yet their reliance on explicit "concept replacement" can sometimes lead to semantic incongruity. To address these limitations, we explore a more flexible detect-then-suppress paradigm. We introduce SafeCtrl, a lightweight, non-intrusive plugin that first precisely localizes unsafe content. Instead of performing a hard A-to-B substitution, SafeCtrl then suppresses the harmful semantics, allowing the generative process to naturally and coherently resolve into a safe, context-aware alternative. A key aspect of our work is a novel training strategy using…
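The detect-then-suppress idea described in the abstract can be illustrated with a toy sketch: flag the spatial locations whose features align with an unsafe concept, then attenuate that concept only inside the flagged region and leave everything else untouched. The projection-based attenuation below is an illustrative stand-in, not the mechanism SafeCtrl uses. The feature grid, the unit-length `unsafe_dir` vector, the threshold, and all function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def detect_mask(features, unsafe_dir, threshold):
    """Detect step: flag locations whose projection onto unsafe_dir exceeds threshold."""
    return [[dot(vec, unsafe_dir) > threshold for vec in row] for row in features]


def suppress(features, unsafe_dir, mask, strength=1.0):
    """Suppress step: remove the unsafe component only at flagged locations.

    Assumes unsafe_dir has unit length (a hypothetical setup for illustration).
    """
    out = []
    for row, mask_row in zip(features, mask):
        new_row = []
        for vec, flagged in zip(row, mask_row):
            if flagged:
                score = dot(vec, unsafe_dir)
                vec = [v - strength * score * d for v, d in zip(vec, unsafe_dir)]
            new_row.append(list(vec))
        out.append(new_row)
    return out


# A 1x2 grid of 2-D features: only the first location aligns with the unsafe direction.
features = [[[1.0, 0.0], [0.0, 1.0]]]
unsafe_dir = [1.0, 0.0]
mask = detect_mask(features, unsafe_dir, threshold=0.5)
cleaned = suppress(features, unsafe_dir, mask)
```

Nothing is substituted at the flagged location; only the unsafe component is removed. This mirrors the abstract's contrast between suppression and a hard A-to-B replacement.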
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques
