Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers

Eric Tillman Bill; Cristian Perez Jensen; Sotiris Anagnostidis; Dimitri von Rütte

arXiv:2505.19122·cs.CV·May 27, 2025

TL;DR

This paper introduces a magnitude-preserving design and rotation modulation for Diffusion Transformers (DiT). Together they improve training stability and performance while reducing parameter count, and the results offer insight into conditioning strategies.

Contribution

The paper proposes a magnitude-preserving design and a rotation-based conditioning method (rotation modulation) for Diffusion Transformers, improving training stability and efficiency without normalization layers.
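To make the idea of magnitude preservation concrete, here is a minimal NumPy sketch of one common way to achieve it: rescaling each weight row of a linear layer to unit norm so that unit-variance inputs give roughly unit-variance outputs. This is an illustration in the spirit of prior magnitude-preserving work, not the authors' implementation; the function name `mp_linear` and all shapes are assumptions made for the example.

```python
import numpy as np

def mp_linear(x, weight, eps=1e-8):
    """Linear map with each weight row rescaled to unit L2 norm.

    Hypothetical helper for illustration only (not from the paper).
    x: (batch, in_dim), weight: (out_dim, in_dim).
    """
    norms = np.linalg.norm(weight, axis=1, keepdims=True)
    w_hat = weight / (norms + eps)  # every output unit now has a unit-norm weight vector
    return x @ w_hat.T

rng = np.random.default_rng(0)
x = rng.standard_normal((4096, 256))             # roughly unit-variance, uncorrelated inputs
weight = 5.0 * rng.standard_normal((128, 256))   # deliberately badly scaled weights

plain_std = (x @ weight.T).std()   # unnormalized layer: output scale depends on the weights
out_std = mp_linear(x, weight).std()  # normalized layer: output scale stays close to 1

print(f"plain: {plain_std:.2f}, magnitude-preserving: {out_std:.2f}")
```

Under the stated assumption of uncorrelated, unit-variance inputs, the normalized layer keeps the activation scale near 1 regardless of how the raw weights are scaled, which is the property a normalization-free design relies on.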

Findings

1. Reduced FID scores by approximately 12.8%.
2. Rotation modulation with scaling is competitive with AdaLN.
3. Fewer parameters needed compared to existing methods.

Abstract

Denoising diffusion models exhibit remarkable generative capabilities, but remain challenging to train due to their inherent stochasticity, where high-variance gradient estimates lead to slow convergence. Previous works have shown that magnitude preservation helps with stabilizing training in the U-Net architecture. This work explores whether this effect extends to the Diffusion Transformer (DiT) architecture. As such, we propose a magnitude-preserving design that stabilizes training without normalization layers. Motivated by the goal of maintaining activation magnitudes, we additionally introduce rotation modulation, which is a novel conditioning method using learned rotations instead of traditional scaling or shifting. Through empirical evaluations and ablation studies on small-scale models, we show that magnitude-preserving strategies significantly improve performance, notably…
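The abstract does not spell out how the rotations are parameterized, so the following NumPy sketch is only one plausible reading: consecutive feature pairs are rotated by angles predicted from the conditioning vector. The pairing scheme, the projection `w_angle`, and the function name `rotation_modulate` are all assumptions made for illustration. What the sketch does show reliably is why rotations fit the magnitude-preservation goal: a rotation leaves each token's norm unchanged, whereas scaling or shifting does not.

```python
import numpy as np

def rotation_modulate(x, angles):
    """Rotate consecutive feature pairs of x by per-pair angles.

    Hypothetical formulation for illustration (not taken from the paper).
    x: (tokens, dim) with even dim, angles: (dim // 2,).
    """
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin  # standard 2D rotation applied to each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
tokens, dim, cond_dim = 8, 16, 4
x = rng.standard_normal((tokens, dim))
cond = rng.standard_normal(cond_dim)                 # stand-in for a class/timestep embedding
w_angle = rng.standard_normal((dim // 2, cond_dim))  # assumed learned projection to angles

angles = w_angle @ cond
rotated = rotation_modulate(x, angles)

norm_before = np.linalg.norm(x, axis=1)
norm_after = np.linalg.norm(rotated, axis=1)
print(np.allclose(norm_before, norm_after))
```

In this sketch the conditioning signal changes the direction of each token's features but never their magnitude, and a zero angle vector reduces the modulation to the identity.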

Taxonomy

Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Magneto-Optical Properties and Applications

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Concatenated Skip Connection · Dense Connections · Max Pooling · Convolution · Softmax