SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

Javad Rajabi; Kimia Shaban; Koorosh Roohi; David B. Lindell; Babak Taati

arXiv:2605.22668 · cs.CV · May 22, 2026

TL;DR

SEGA is a training-free, adaptive attention-scaling method for diffusion transformers (DiTs) that improves high-resolution image synthesis by adjusting attention at each denoising step according to the latent's spatial-frequency content.
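The TL;DR ties attention scaling to the latent's spatial-frequency content. As a rough illustration of how that content could be quantified, the sketch below measures the fraction of a latent's spectral energy that lies above a radial frequency cutoff. It is a hypothetical helper, not code from the SEGA repository: the function name `high_frequency_energy_ratio`, the (channels, height, width) layout, and the default `cutoff` of 0.25 cycles per sample are all assumptions.

```python
import numpy as np


def high_frequency_energy_ratio(latent: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above a radial frequency cutoff (hypothetical helper).

    latent: array of shape (channels, height, width); assumed layout.
    cutoff: normalized radial frequency in cycles per sample (0 to about 0.707).
    """
    # 2D FFT over the two spatial axes of every channel.
    spectrum = np.fft.fft2(latent, axes=(-2, -1))
    energy = np.abs(spectrum) ** 2

    # Normalized frequency of every FFT bin along each spatial axis.
    fy = np.fft.fftfreq(latent.shape[-2])
    fx = np.fft.fftfreq(latent.shape[-1])
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)

    # Energy above the cutoff, summed over all channels, relative to the total.
    total = energy.sum()
    if total == 0:
        return 0.0
    high = energy[..., radius > cutoff].sum()
    return float(high / total)
```

A constant latent puts all of its energy at zero frequency and yields 0.0, while a pixel-level checkerboard puts all of it at the Nyquist corner and yields approximately 1.0.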

Contribution

SEGA introduces a content-aware, frequency-guided attention-scaling technique that scales attention differently across RoPE components, improving resolution extrapolation without additional training.
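To make "content-aware, frequency-guided" concrete, here is one hypothetical way to turn a high-frequency energy fraction into a separate attention scale for each RoPE component. The per-pair frequencies follow the standard RoPE definition theta_i = base^(-2i/d); the blending rule, the `uniform_scale` default of 1.2, and the function names are illustrative assumptions and should not be read as SEGA's published rule.

```python
import numpy as np


def rope_pair_frequencies(dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE angular frequency for each feature pair: theta_i = base**(-2i/dim)."""
    return base ** (-np.arange(0, dim, 2) / dim)


def content_aware_pair_scales(
    dim: int,
    hf_ratio: float,
    uniform_scale: float = 1.2,
    base: float = 10000.0,
) -> np.ndarray:
    """Hypothetical per-pair attention scales; NOT SEGA's published rule.

    hf_ratio is the latent's high-frequency energy fraction in [0, 1].
    Returns one multiplicative scale per RoPE pair, between 1.0 and uniform_scale.
    """
    if dim < 4 or dim % 2:
        raise ValueError("dim must be even and cover at least two RoPE pairs")
    hf_ratio = float(np.clip(hf_ratio, 0.0, 1.0))

    # Place each pair on a log-frequency axis: w = 1.0 for the fastest-rotating
    # pair (index 0), w = 0.0 for the slowest.
    log_theta = np.log(rope_pair_frequencies(dim, base))
    w = (log_theta - log_theta.min()) / (log_theta.max() - log_theta.min())

    # Assumed blending rule: smooth latents (hf_ratio near 0) send the extra
    # scaling to slow pairs; detailed latents (near 1) send it to fast pairs.
    gain = (1.0 - w) * (1.0 - hf_ratio) + w * hf_ratio
    return 1.0 + (uniform_scale - 1.0) * gain
```

With `hf_ratio = 0.5` every pair receives the same scale (1.1 for the defaults), which recovers a uniform, content-agnostic setting; values near 0 or 1 shift the extra scaling toward the slow or the fast RoPE pairs, respectively.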

Findings

1. SEGA outperforms existing training-free methods in high-resolution image generation.

2. It improves structural coherence and fine-detail fidelity across multiple resolutions.

3. Experimental results show consistent performance gains over state-of-the-art baselines.

Abstract

Diffusion transformers (DiTs) have emerged as a dominant architecture for text-to-image generation, yet their performance drops when generating at resolutions beyond their training range. Existing training-free approaches mitigate this by modifying inference-time attention behavior, often through Rotary Position Embeddings (RoPE) extrapolation combined with attention scaling. However, these strategies apply a uniform and content-agnostic scaling across RoPE components with distinct frequency characteristics, inducing a trade-off between preserving global structure and recovering fine detail. We introduce SEGA, a training-free method that dynamically scales attention across RoPE components according to the latent's spatial-frequency structure at each denoising step. This adaptive scaling improves both structural coherence and fine-detail fidelity. Experiments show that SEGA consistently…
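The abstract contrasts uniform scaling with scaling that differs per RoPE component. Because a RoPE attention logit is a sum of per-pair terms, multiplying one rotated query pair by a factor s_i rescales exactly that pair's contribution. The sketch below uses 1D RoPE for brevity (image DiTs often apply RoPE along both spatial axes) and an assumed log-ratio factor, log(N_test)/log(N_train), as the uniform baseline; published baselines may use a different factor, and all function names are hypothetical.

```python
import numpy as np


def apply_rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """1D RoPE: rotate each (even, odd) feature pair of x (tokens, dim) by position * theta_i."""
    dim = x.shape[-1]
    theta = base ** (-np.arange(0, dim, 2) / dim)          # one frequency per pair
    angles = positions[:, None] * theta[None, :]           # (tokens, dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def rope_attention_logits(
    q: np.ndarray,
    k: np.ndarray,
    positions: np.ndarray,
    pair_scales: np.ndarray,
    base: float = 10000.0,
) -> np.ndarray:
    """Scaled dot-product logits with one multiplicative scale per RoPE pair.

    The logit q.k is a sum of per-pair terms, so multiplying a rotated query
    pair by s_i multiplies exactly that pair's contribution by s_i.
    """
    q_rot = apply_rope(q, positions, base) * np.repeat(pair_scales, 2)[None, :]
    k_rot = apply_rope(k, positions, base)
    return q_rot @ k_rot.T / np.sqrt(q.shape[-1])


def uniform_scales(dim: int, n_train: int, n_test: int) -> np.ndarray:
    """Content-agnostic baseline: the same log-ratio factor for every pair (assumed form)."""
    factor = np.log(n_test) / np.log(n_train)
    return np.full(dim // 2, factor)
```

Passing a constant `pair_scales` vector (as `uniform_scales` returns) multiplies the whole logit matrix by one number, which is what makes it content-agnostic. Passing a non-constant vector instead changes the balance between fast-rotating pairs, commonly associated with fine local detail, and slow-rotating pairs, commonly associated with global structure.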

Code & Models

Repositories

rajabi2001/sega (GitHub)
