SafeCtrl: Region-Aware Safety Control for Text-to-Image Diffusion via Detect-Then-Suppress
Lingyun Zhang, Yu Xie, Zhongli Fang, Yu Liu, Ping Chen

TL;DR
SafeCtrl introduces a region-aware safety control framework for text-to-image diffusion models that localizes and neutralizes harmful content while preserving image fidelity and resisting adversarial attacks.
Contribution
SafeCtrl proposes a Detect-Then-Suppress paradigm that pairs attention-guided localization of risk regions with localized suppression optimized via DPO, improving safety and robustness over existing methods.
Findings
SafeCtrl achieves a better safety-fidelity trade-off than state-of-the-art methods.
It is more resilient to adversarial prompt attacks.
Extensive experiments validate its effectiveness across multiple risk categories.
Abstract
The widespread deployment of text-to-image diffusion models is significantly challenged by the generation of visually harmful content, such as sexually explicit content, violence, and horror imagery. Common safety interventions, ranging from input filtering to model concept erasure, often suffer from two critical limitations: (1) a severe trade-off between safety and context preservation, where removing unsafe concepts degrades the fidelity of the safe content, and (2) vulnerability to adversarial attacks, where safety mechanisms are easily bypassed. To address these challenges, we propose SafeCtrl, a Region-Aware safety control framework operating on a Detect-Then-Suppress paradigm. Unlike global safety interventions, SafeCtrl first employs an attention-guided Detect module to precisely localize specific risk regions. Subsequently, a localized Suppress module, optimized via image-level…
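The abstract does not spell out how the Detect and Suppress modules are implemented, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a 2D attention map for risk-related tokens is already available, turns it into a binary region mask by min-max normalization and thresholding, and then replaces features only inside that mask. The function names `detect_risk_mask` and `suppress_locally`, the threshold value, and the blending rule are all hypothetical.

```python
import numpy as np


def detect_risk_mask(attn_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Hypothetical Detect step: threshold a min-max normalized attention map.

    Returns a boolean mask with the same shape as attn_map.
    """
    lo, hi = attn_map.min(), attn_map.max()
    if hi - lo < 1e-12:
        # A flat map carries no localization signal, so flag nothing.
        return np.zeros(attn_map.shape, dtype=bool)
    normalized = (attn_map - lo) / (hi - lo)
    return normalized >= threshold


def suppress_locally(features: np.ndarray,
                     safe_features: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Hypothetical Suppress step: swap in safe features inside the mask only.

    features and safe_features have shape (H, W, C); mask has shape (H, W).
    """
    m = mask[..., None].astype(features.dtype)  # broadcast over channels
    return m * safe_features + (1.0 - m) * features


# Toy example: a 4x4 map whose central 2x2 block is flagged as risky.
attn = np.zeros((4, 4))
attn[1:3, 1:3] = 1.0
features = np.ones((4, 4, 3))
safe = np.zeros((4, 4, 3))

mask = detect_risk_mask(attn)
out = suppress_locally(features, safe, mask)
```

In this toy setup only the four central positions are overwritten and everything outside the mask is left untouched, which is the property a region-aware intervention relies on to preserve the safe context.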
