SafeRoPE: Risk-specific Head-wise Embedding Rotation for Safe Generation in Rectified Flow Transformers

Xiang Yang; Feifei Li; Mi Zhang; Geng Hong; Xiaoyu You; Min Yang

arXiv:2604.01826·cs.CV·April 3, 2026

TL;DR

SafeRoPE is a risk-specific, head-wise embedding rotation method that mitigates unsafe semantics in rectified-flow transformer models for text-to-image generation while maintaining high image fidelity.

Contribution

The paper proposes a lightweight framework that constructs unsafe subspaces and applies head-wise RoPE perturbations for targeted safety control in transformer-based diffusion models.

Findings

01. SafeRoPE outperforms existing methods in balancing safety and image quality.

02. It suppresses unsafe semantics without degrading benign content.

03. Code is available at https://github.com/deng12yx/SafeRoPE.

Abstract

Recent Text-to-Image (T2I) models based on rectified-flow transformers (e.g., SD3, FLUX) achieve high generative fidelity but remain vulnerable to unsafe semantics, especially when triggered by multi-token interactions. Existing mitigation methods largely rely on fine-tuning or attention modulation for concept unlearning; however, their expensive computational overhead and design tailored to U-Net-based denoisers hinder direct adaptation to transformer-based diffusion models (e.g., MMDiT). In this paper, we conduct an in-depth analysis of the attention mechanism in MMDiT and find that unsafe semantics concentrate within interpretable, low-dimensional subspaces at head level, where a finite set of safety-critical heads is responsible for unsafe feature extraction. We further observe that perturbing the Rotary Positional Embedding (RoPE) applied to the query and key vectors can…
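
As an illustration of the mechanism the abstract refers to, the following is a minimal NumPy sketch of standard 1D RoPE applied to query and key vectors, with an extra rotation angle added per head on the query side. It is not the authors' implementation: the function names, the `head_offsets` parameter, the query-only offset, and the 1D position layout are all assumptions made for illustration, and the abstract does not specify how SafeRoPE chooses or applies its perturbation.

```python
import numpy as np

def rope_angles(num_pos, head_dim, base=10000.0):
    """Standard RoPE angles: one frequency per pair of channels."""
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)  # (head_dim/2,)
    return np.outer(np.arange(num_pos), inv_freq)               # (num_pos, head_dim/2)

def rotate_pairs(x, angles):
    """Rotate each channel pair (x[..., 0::2], x[..., 1::2]) by `angles`."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

def attention_logits(q, k, head_offsets=None):
    """Scaled dot-product logits with RoPE.

    q, k: arrays of shape (heads, positions, head_dim).
    head_offsets: hypothetical per-head extra angle (radians) added to the
    query rotation only; this is an assumption, not the paper's rule.
    """
    heads, num_pos, head_dim = q.shape
    angles = rope_angles(num_pos, head_dim)
    q_angles = np.broadcast_to(angles, (heads, num_pos, head_dim // 2))
    if head_offsets is not None:
        q_angles = q_angles + np.asarray(head_offsets)[:, None, None]
    q_rot = rotate_pairs(q, q_angles)
    k_rot = rotate_pairs(k, angles)
    return q_rot @ k_rot.transpose(0, 2, 1) / np.sqrt(head_dim)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 6, 8))   # 4 heads, 6 tokens, head_dim 8
k = rng.standard_normal((4, 6, 8))

baseline = attention_logits(q, k)
offsets = np.array([0.0, 0.0, 0.5, 0.0])  # perturb head 2 only (arbitrary choice)
perturbed = attention_logits(q, k, offsets)

# Which heads had their logits changed by the offset?
changed = [not np.allclose(baseline[h], perturbed[h]) for h in range(4)]
print(changed)
```

Only the head given a nonzero offset sees its attention logits change; the other heads are untouched. The sketch offsets the query alone because RoPE logits depend only on the relative rotation between query and key, so adding the same constant angle to both would cancel out.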

Code & Models

Repositories

deng12yx/SafeRoPE (GitHub)
