TL;DR
SafeRoPE introduces a risk-specific, head-wise embedding rotation method that mitigates unsafe semantics in rectified-flow transformer models for text-to-image generation while maintaining high fidelity.
Contribution
The paper proposes a lightweight framework that constructs unsafe subspaces and applies head-wise RoPE perturbations for precise safety control in transformer-based diffusion models.
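To make "head-wise RoPE perturbation" concrete, here is a minimal NumPy sketch of standard rotary embeddings with an extra per-head angle offset. It is an illustration under assumptions, not the SafeRoPE implementation: the function names, the `head_offsets` parameter, and the choice of a single constant offset per head are all hypothetical, and the summary does not state how the paper parameterizes or selects its perturbations.

```python
import numpy as np

def rope_angles(positions, head_dim, base=10000.0):
    """Standard RoPE angles, shape (seq_len, head_dim // 2)."""
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    return np.outer(positions, inv_freq)

def apply_rope(x, angles):
    """Rotate consecutive channel pairs of x (seq_len, head_dim) by angles."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

def perturbed_rope(x, positions, head_offsets):
    """Apply RoPE per head with an additional angle offset per head.

    x:            (num_heads, seq_len, head_dim) query or key vectors
    head_offsets: (num_heads,) extra rotation in radians (hypothetical knob);
                  an offset of 0 leaves that head with standard RoPE.
    """
    num_heads, _, head_dim = x.shape
    base_angles = rope_angles(positions, head_dim)
    out = np.empty_like(x)
    for h in range(num_heads):
        out[h] = apply_rope(x[h], base_angles + head_offsets[h])
    return out
```

Because each rotation is orthogonal, vector norms are unchanged and only the relative angle between queries and keys moves. One consequence worth noting: adding the same constant offset to both the queries and the keys of a head cancels out in the dot product, so a perturbation of this simple form only changes attention logits if it differs between the two sides (or varies across positions or channels).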
Findings
SafeRoPE outperforms existing methods in balancing safety and image quality.
It effectively suppresses unsafe semantics without degrading benign content.
Code is available at https://github.com/deng12yx/SafeRoPE.
Abstract
Recent Text-to-Image (T2I) models based on rectified-flow transformers (e.g., SD3, FLUX) achieve high generative fidelity but remain vulnerable to unsafe semantics, especially when triggered by multi-token interactions. Existing mitigation methods largely rely on fine-tuning or attention modulation for concept unlearning; however, their expensive computational overhead and design tailored to U-Net-based denoisers hinder direct adaptation to transformer-based diffusion models (e.g., MMDiT). In this paper, we conduct an in-depth analysis of the attention mechanism in MMDiT and find that unsafe semantics concentrate within interpretable, low-dimensional subspaces at head level, where a finite set of safety-critical heads is responsible for unsafe feature extraction. We further observe that perturbing the Rotary Positional Embedding (RoPE) applied to the query and key vectors can…
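The abstract's claim that unsafe semantics sit in low-dimensional, head-level subspaces can be illustrated with a small NumPy sketch. The abstract as shown here does not say how the subspaces are constructed, so this example substitutes a plain SVD over per-head features; `unsafe_subspace`, `subspace_energy`, the feature matrices, and the `rank` parameter are all hypothetical names chosen for illustration.

```python
import numpy as np

def unsafe_subspace(unsafe_feats, benign_feats, rank):
    """Estimate an orthonormal basis (head_dim, rank) for one attention head.

    Assumption: features collected from unsafe prompts are centered on the
    mean of benign features, and the top right-singular vectors are taken as
    the "unsafe" directions. This is a stand-in, not the paper's procedure.
    """
    centered = unsafe_feats - benign_feats.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T

def subspace_energy(feats, basis):
    """Mean fraction of each row's squared norm that lies in span(basis)."""
    proj = feats @ basis
    num = (proj ** 2).sum(axis=-1)
    den = (feats ** 2).sum(axis=-1) + 1e-12  # guard against zero rows
    return float((num / den).mean())
```

Under these assumptions, a head whose unsafe-prompt features put far more energy into the estimated subspace than its benign-prompt features would be a candidate "safety-critical" head; the threshold for that decision is left open here.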
