Few Channels Draw The Whole Picture: Revealing Massive Activations in Diffusion Transformers
Evelyn Turri, Davide Bucciarelli, Sara Sarto, Lorenzo Baraldi, Marcella Cornia

TL;DR
This paper reveals that massive activations, carried by a small subset of hidden-state channels in diffusion transformers, critically influence image generation, organize spatial semantics, and can be transferred across prompts for controllable image synthesis.
Contribution
It uncovers the functional, spatial, and transfer properties of massive activations, highlighting their role as sparse semantic carriers in diffusion transformer models.
Findings
Massive activations are functionally critical for image quality.
They are spatially organized and align with main image subjects.
Transferring massive activations enables prompt interpolation and subject-driven generation.
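The transfer finding can be illustrated with a minimal NumPy sketch. It assumes that hidden states from one DiT block are stored as a (tokens, channels) array with the same token count for both prompts, that massive channels are picked by mean absolute activation (`top_k_channels`, a hypothetical helper), and that "transfer" means linearly blending the source prompt's values on those channels into the target's hidden state. The paper's exact procedure, layer choice, and timestep schedule are not stated here, so treat every name and parameter as illustrative.

```python
import numpy as np


def top_k_channels(hidden: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k channels with the largest mean |activation| over tokens.

    hidden has shape (tokens, channels); this ranking statistic is an assumption.
    """
    score = np.abs(hidden).mean(axis=0)      # one score per channel
    return np.argsort(score)[::-1][:k]       # highest scores first


def transfer_massive(target: np.ndarray, source: np.ndarray,
                     channels: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Blend the source's values on `channels` into a copy of the target.

    alpha = 0 leaves the target unchanged, alpha = 1 copies the source's
    massive channels outright, values in between interpolate linearly.
    All other channels keep the target's values.
    """
    if target.shape != source.shape:
        raise ValueError("source and target hidden states must share a shape")
    out = target.copy()
    out[:, channels] = (1.0 - alpha) * target[:, channels] + alpha * source[:, channels]
    return out


# Hypothetical usage on synthetic data: sweep alpha from the target towards the source.
rng = np.random.default_rng(0)
source = rng.normal(size=(16, 64))
source[:, [3, 17]] += 100.0                  # plant two "massive" channels
target = rng.normal(size=(16, 64))
channels = top_k_channels(source, k=2)
path = [transfer_massive(target, source, channels, a) for a in np.linspace(0.0, 1.0, 5)]
```

In a real model the same blend would be applied inside the forward pass at the chosen blocks and denoising steps, for example through a PyTorch forward hook. The sketch only isolates the channel-level operation, and sweeping `alpha` gives a simple interpolation path between two prompts.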
Abstract
Diffusion Transformers (DiTs) and related flow-based architectures are now among the strongest text-to-image generators, yet the internal mechanisms through which prompts shape image semantics remain poorly understood. In this work, we study massive activations: a small subset of hidden-state channels whose responses are consistently much larger than the rest. We show that, despite their sparsity, these few channels effectively draw the whole picture, in three complementary senses. First, they are functionally critical: a controlled disruption probe that zeroes the massive channels causes a sharp collapse in generation quality, while disrupting an equally sized set of low-statistic channels has marginal effect. Second, they are spatially organized: restricting image-stream tokens to massive channels and clustering them yields coherent partitions that closely align with the main subject…
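The two probes described in the abstract can be sketched in NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' code. Channels are ranked by their peak absolute activation over tokens (the paper's exact statistic is not given here). The disruption probe is reduced to zeroing channels in a copy of the hidden state, and because measuring image quality requires running the full model, the hypothetical `energy_removed` only reports how much hidden-state energy each ablation deletes. The spatial probe restricts image-stream tokens to the massive channels, runs plain k-means with a deterministic farthest-point initialization, and reshapes the labels back onto the token grid. All function names are hypothetical.

```python
import numpy as np


def split_channels(hidden: np.ndarray, k: int):
    """Rank channels by peak |activation| over tokens (assumed statistic).

    hidden has shape (tokens, channels). Returns (massive, low): the k
    highest-statistic and the k lowest-statistic channel indices.
    """
    stat = np.abs(hidden).max(axis=0)        # one value per channel
    order = np.argsort(stat)                 # ascending
    return order[-k:], order[:k]


def zero_channels(hidden: np.ndarray, channels: np.ndarray) -> np.ndarray:
    """Disruption probe: return a copy with the given channels set to zero."""
    out = hidden.copy()
    out[:, channels] = 0.0
    return out


def energy_removed(hidden: np.ndarray, channels: np.ndarray) -> float:
    """Fraction of the squared norm removed by zeroing `channels` (crude proxy)."""
    return float((hidden[:, channels] ** 2).sum() / (hidden ** 2).sum())


def kmeans(x: np.ndarray, n_clusters: int, n_iter: int = 20) -> np.ndarray:
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, n_clusters):
        # squared distance of every point to its nearest chosen center
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)  # copy, so updates never touch x
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels


def spatial_partition(image_tokens: np.ndarray, massive: np.ndarray,
                      grid_hw: tuple, n_clusters: int = 2) -> np.ndarray:
    """Cluster image-stream tokens on the massive channels only.

    Tokens are assumed to be in row-major order over an (H, W) grid;
    returns an (H, W) map of cluster labels.
    """
    labels = kmeans(image_tokens[:, massive], n_clusters)
    return labels.reshape(grid_hw)
```

On real features, `spatial_partition` would be applied to the image-stream tokens of a single block at a single timestep, with `grid_hw` set to the latent patch grid, and the resulting label map compared against the location of the main subject. Comparing `energy_removed` for the massive and low-statistic sets mirrors, only at the hidden-state level, the matched-size contrast the disruption probe draws.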
