One Attention, One Scale: Phase-Aligned Rotary Positional Embeddings for Mixed-Resolution Diffusion Transformer
Haoyu Wu, Jingyi Xu, Qiaomu Miao, Dimitris Samaras, Hieu Le

TL;DR
The paper introduces Cross-Resolution Phase-Aligned Attention (CRPA), a training-free method that aligns rotary positional embedding (RoPE) phases in mixed-resolution Diffusion Transformers (DiTs), significantly improving stability and quality in image and video generation.
Contribution
It proposes CRPA, a training-free, drop-in fix that corrects the phase aliasing caused by linearly interpolated rotary embeddings in mixed-resolution attention, improving the stability and output quality of pretrained diffusion transformers.
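For reference, here is the standard RoPE relation that such a fix operates on. It is written with textbook assumptions rather than the paper's own notation: per-pair frequencies $\theta_i = b^{-2i/d}$ for base $b$ and head dimension $d$, and each query/key feature pair treated as a complex number $q_i, k_i$. A query at position $m$ and a key at position $n$ interact only through their relative phase. If a grid is laid on the position axis with spacing $s$ (as linear coordinate remapping does), each band advances by $s\theta_i$ per token, and that step is only identifiable modulo $2\pi$:

```latex
\[
\operatorname{score}(m,n) = \operatorname{Re}\sum_{i=0}^{d/2-1} q_i\,\overline{k_i}\,e^{\mathrm{j}(m-n)\theta_i},
\qquad \theta_i = b^{-2i/d}
\]
\[
p_t = s\,t,\; t\in\mathbb{Z}
\;\Longrightarrow\;
e^{\mathrm{j}\,p_t\theta_i} = e^{\mathrm{j}\,t\,(s\theta_i - 2\pi\ell)} \quad \text{for every } \ell\in\mathbb{Z}
\]
```

So whenever $s\theta_i > \pi$, band $i$ is aliased: along that grid it is indistinguishable from the wrapped per-token frequency $s\theta_i - 2\pi\ell \in (-\pi, \pi]$, while a finer grid in the same attention call samples the same band at a different rate.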
Findings
CRPA stabilizes all attention heads and layers.
CRPA outperforms previous methods in image and video generation quality.
CRPA is compatible with pretrained models, requiring no retraining.
Abstract
We identify a core failure mode that occurs when the usual linear interpolation is applied to rotary positional embeddings (RoPE) for mixed-resolution denoising with Diffusion Transformers. When tokens from different spatial grids are mixed, the attention mechanism collapses. The issue is structural: linear coordinate remapping forces a single attention head to compare RoPE phases sampled at incompatible rates, creating phase aliasing that destabilizes the score landscape. Pretrained DiTs are especially brittle: many heads exhibit extremely sharp, periodic phase selectivity, so even tiny cross-rate inconsistencies reliably cause blur, artifacts, or full collapse. Our main contribution is Cross-Resolution Phase-Aligned Attention (CRPA), a training-free drop-in fix that eliminates this failure at its source. CRPA modifies only the RoPE index map for each attention call: all Q/K…
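The sampling-rate effect behind "RoPE phases sampled at incompatible rates" can be checked numerically. The sketch below is a minimal illustration under stated assumptions: standard 1-D RoPE along a single axis with frequencies θ_i = base^(−2i/d), a hypothetical head dimension of 64 and base of 10000, and a coarse grid whose tokens land 4 position units apart after linear remapping. It flags frequency pairs whose per-token phase step exceeds π and reports the wrapped frequency those pairs appear to have. The helper names (`rope_frequencies`, `aliased_pairs`, `apparent_frequency`, `wrap_to_pi`) are hypothetical; the code illustrates the failure mode only and is not an implementation of CRPA.

```python
import numpy as np


def rope_frequencies(dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE per-pair angular frequencies theta_i = base**(-2i/dim)."""
    i = np.arange(dim // 2)
    return base ** (-2.0 * i / dim)


def wrap_to_pi(angle: np.ndarray) -> np.ndarray:
    """Wrap angles (radians) into the interval (-pi, pi]."""
    wrapped = np.mod(angle + np.pi, 2.0 * np.pi) - np.pi  # lands in [-pi, pi)
    return np.where(np.isclose(wrapped, -np.pi), np.pi, wrapped)


def aliased_pairs(theta: np.ndarray, stride: float) -> np.ndarray:
    """Indices of frequency pairs whose per-token phase step stride*theta exceeds
    pi (the Nyquist limit), i.e. pairs that a grid with this spacing under-samples."""
    return np.flatnonzero(stride * theta > np.pi)


def apparent_frequency(theta: np.ndarray, stride: float) -> np.ndarray:
    """Frequency (rad per unit position) that the sampled phases
    exp(1j * stride * t * theta) actually exhibit along the grid, after wrapping."""
    return wrap_to_pi(stride * theta) / stride


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the paper):
    # head dimension 64, base 10000, fine grid spacing 1, coarse grid spacing 4.
    theta = rope_frequencies(64)
    fine_stride, coarse_stride = 1.0, 4.0

    print("aliased on fine grid:  ", aliased_pairs(theta, fine_stride))
    print("aliased on coarse grid:", aliased_pairs(theta, coarse_stride))
    # Pair 0 has theta = 1 rad/unit; along the coarse grid its phases look like a
    # band rotating at about -0.571 rad/unit instead of +1.0 rad/unit.
    print("pair 0 true vs apparent:", theta[0], apparent_frequency(theta, coarse_stride)[0])

    # Sanity check: the sampled rotations coincide with those of the wrapped frequency.
    t = np.arange(8)
    sampled = np.exp(1j * coarse_stride * t * theta[0])
    aliased = np.exp(1j * coarse_stride * t * apparent_frequency(theta, coarse_stride)[0])
    print("identical samples:", np.allclose(sampled, aliased))
```

With these values only the highest-frequency pair aliases on the 4x grid, while a spacing of 8 aliases the first four pairs. How many bands actually cross the limit in a given pretrained DiT depends on its head dimension, RoPE base, and resolution ratio.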
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Video Quality Assessment · Image Enhancement Techniques
