Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, Song Han

TL;DR
This paper introduces Radial Attention, a sparse attention mechanism with $O(n\log n)$ complexity for long video generation. It exploits the decay of attention scores with spatial and temporal distance to make video diffusion models more efficient and easier to extend to longer videos.
Contribution
It proposes Radial Attention, a novel sparse attention method inspired by energy decay, enabling scalable, efficient long video generation and fine-tuning of pre-trained diffusion models.
Findings
Achieves up to 1.9× speedup over dense attention.
Enables up to 4× longer video generation with minimal tuning.
Reduces training costs by up to 4.4× compared with direct fine-tuning.
Abstract
Recent advances in diffusion models have enabled high-quality video generation, but the additional temporal dimension significantly increases computational costs, making training and inference on long videos prohibitively expensive. In this paper, we identify a phenomenon we term Spatiotemporal Energy Decay in video diffusion models: post-softmax attention scores diminish as the spatial and temporal distance between tokens increases, akin to the physical decay of signals or waves over space and time in nature. Motivated by this, we propose Radial Attention, a scalable sparse attention mechanism with $O(n\log n)$ complexity that translates energy decay into exponentially decaying compute density, which is significantly more efficient than standard $O(n^2)$ dense attention and more expressive than linear attention. Specifically, Radial Attention employs a simple, static…
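To make "exponentially decaying compute density" concrete, here is a minimal toy sketch of a static sparse mask in which the spatial window a query may attend to halves each time the temporal distance between frames doubles. This is an illustration of the idea only, not the paper's actual mask: the function name `toy_decay_mask`, the 1D token layout per frame, and the halving rule are all assumptions made for this example.

```python
import numpy as np


def toy_decay_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    """Return a boolean (n, n) mask with n = num_frames * tokens_per_frame.

    Hypothetical rule (an assumption, not taken from the paper):
    frames at temporal distance dt fall into band floor(log2(max(dt, 1))),
    and the allowed spatial distance halves with every band.
    """
    n = num_frames * tokens_per_frame
    idx = np.arange(n)
    frame = idx // tokens_per_frame   # frame index of each token
    pos = idx % tokens_per_frame      # position of each token inside its frame

    dt = np.abs(frame[:, None] - frame[None, :])  # temporal distance
    ds = np.abs(pos[:, None] - pos[None, :])      # spatial distance

    # Exact integer floor(log2(x)) via frexp: x = m * 2**e with 0.5 <= m < 1.
    _, exponent = np.frexp(np.maximum(dt, 1).astype(np.float64))
    band = exponent - 1

    # The window halves per band but never drops below one token,
    # so the same-position token in every frame is always kept.
    width = np.maximum(tokens_per_frame // (2 ** band), 1)
    return ds < width


if __name__ == "__main__":
    for frames in (4, 8, 16):
        mask = toy_decay_mask(frames, 16)
        print(frames, "frames: fraction of entries kept =", mask.mean())
```

In this toy, attention within a frame and between adjacent frames stays dense, while each further band of temporal distances keeps a window half as wide. The fraction of kept entries therefore shrinks as frames are added, which is the qualitative behavior the abstract describes.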
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
