Steering Video Diffusion Transformers with Massive Activations
Xianhang Cheng, Yujian Zheng, Zhenyu Xie, Tingting Liao, Hao Li

TL;DR
This paper investigates the role of Massive Activations in video diffusion transformers and introduces STAS, a training-free method that leverages these signals to improve video quality and coherence with minimal overhead.
Contribution
We identify structured patterns in Massive Activations and propose STAS, a novel, training-free activation steering technique for enhancing video diffusion models.
Findings
STAS improves video quality and temporal coherence.
Massive Activations exhibit a hierarchical magnitude pattern.
The method introduces negligible computational overhead.
Abstract
Despite rapid progress in video diffusion transformers, how their internal model signals can be leveraged with minimal overhead to enhance video generation quality remains underexplored. In this work, we study the role of Massive Activations (MAs), which are rare, high-magnitude hidden-state spikes in video diffusion transformers. We observe that MAs emerge consistently across all visual tokens, with a clear magnitude hierarchy: first-frame tokens exhibit the largest MA magnitudes; latent-frame boundary tokens (the head and tail portions of each temporal chunk in the latent space) show elevated magnitudes that are slightly lower than those of the first frame; and interior tokens within each latent frame remain elevated but are comparatively moderate in magnitude. This structured pattern suggests that the model implicitly prioritizes token positions aligned with the temporal chunking in the…
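As a rough illustration of the kind of measurement the abstract describes, the sketch below flags massive-activation dimensions and compares peak magnitudes across the three token groups (first frame, latent-frame boundary, interior). Everything in it is an assumption made for illustration: the hidden states are synthetic with planted spikes, the detection rule (magnitude above a fixed multiple of the median magnitude) and its `ratio` value are hypothetical, and the function names are invented here. It does not implement STAS and does not reproduce the paper's numbers.

```python
import random
from statistics import median

def make_hidden_states(num_frames=4, tokens_per_frame=6, dim=16, spike_dim=3, seed=0):
    """Build synthetic hidden states with planted spikes (NOT real model activations)."""
    rng = random.Random(seed)
    states, groups = [], []
    for f in range(num_frames):
        for t in range(tokens_per_frame):
            vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            if f == 0:
                group, spike = "first_frame", 300.0   # largest magnitude (assumed value)
            elif t in (0, tokens_per_frame - 1):
                group, spike = "boundary", 200.0      # head/tail of a latent frame
            else:
                group, spike = "interior", 100.0      # elevated but moderate
            vec[spike_dim] = spike
            states.append(vec)
            groups.append(group)
    return states, groups

def find_massive_dims(states, ratio=50.0):
    """Return sorted dims holding any |activation| > ratio * median |activation|.

    The ratio-to-median rule is an illustrative criterion, not the paper's definition.
    """
    all_abs = [abs(x) for vec in states for x in vec]
    thresh = ratio * median(all_abs)
    return sorted({d for vec in states for d, x in enumerate(vec) if abs(x) > thresh})

def mean_peak_by_group(states, groups):
    """Average each token's largest absolute activation within its position group."""
    sums, counts = {}, {}
    for vec, g in zip(states, groups):
        peak = max(abs(x) for x in vec)
        sums[g] = sums.get(g, 0.0) + peak
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

states, groups = make_hidden_states()
massive_dims = find_massive_dims(states)
peaks = mean_peak_by_group(states, groups)

print("massive dims:", massive_dims)
for name in ("first_frame", "boundary", "interior"):
    print(f"{name:12s} mean peak magnitude = {peaks[name]:.1f}")
```

On real models, the same grouping could be applied to hidden states captured from a transformer block, with token positions mapped to latent frames according to that model's own layout; that mapping is model-specific and is not specified in the text above.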
Taxonomy
Topics: Advanced Neuroimaging Techniques and Applications · Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis
