Ride the Wave: Precision-Allocated Sparse Attention for Smooth Video Generation
Wentai Zhang, Ronghui Xi, Shiyao Peng, Jiayu Huang, Haoran Luo, Zichen Tang, Haihong E

TL;DR
PASA is a training-free sparse attention framework that improves both efficiency and temporal smoothness in high-fidelity video generation through adaptive budgeting, hardware-aligned approximations, and stochastic routing.
Contribution
It introduces dynamic, curvature-aware budgeting and stochastic attention routing to improve the speed and stability of video diffusion models without additional training.
Findings
PASA significantly accelerates inference in video diffusion models.
It produces more fluid and stable video sequences than existing sparse attention methods.
The approach effectively eliminates flickering caused by static sparsity patterns.
Abstract
Video Diffusion Transformers have revolutionized high-fidelity video generation but suffer from the massive computational burden of self-attention. While sparse attention provides a promising acceleration solution, existing methods frequently provoke severe visual flickering caused by static sparsity patterns and deterministic block routing. To resolve these limitations, we propose Precision-Allocated Sparse Attention (PASA), a training-free framework designed for highly efficient and temporally smooth video generation. First, we implement a curvature-aware dynamic budgeting mechanism. By profiling the generation trajectory acceleration across timesteps, we elastically allocate the exact-computation budget to secure high-precision processing strictly during critical semantic transitions. Second, we replace global homogenizing estimations with hardware-aligned grouped approximations,…
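The curvature-aware budgeting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the "trajectory acceleration" can be approximated by a second-order finite difference of the latents across timesteps, and that the exact-computation budget is a per-step fraction of attention blocks. The function name `curvature_budget` and the parameters `mean_budget` and `floor` are hypothetical.

```python
import numpy as np

def curvature_budget(latents, mean_budget=0.2, floor=0.05):
    """Allocate a per-timestep exact-attention budget from trajectory curvature.

    latents: array of shape (T, D), one flattened latent per timestep (T >= 3).
    mean_budget: assumed average fraction of blocks computed exactly.
    floor: assumed minimum fraction every timestep receives.
    Returns an array of shape (T,) with fractions in [0, 1].
    """
    latents = np.asarray(latents, dtype=float)
    T = latents.shape[0]
    # Second-order finite difference as a stand-in for trajectory acceleration.
    accel = np.zeros(T)
    accel[1:-1] = np.linalg.norm(
        latents[2:] - 2.0 * latents[1:-1] + latents[:-2], axis=1
    )
    # Endpoints have no centered difference; reuse their neighbours' values.
    accel[0], accel[-1] = accel[1], accel[-2]
    total = accel.sum()
    weights = accel / total if total > 0 else np.full(T, 1.0 / T)
    # Every step gets the floor; the rest is spread in proportion to curvature.
    spare = (mean_budget - floor) * T
    budget = floor + spare * weights
    # Clipping can lower the realized average when one step dominates.
    return np.clip(budget, 0.0, 1.0)

# Toy trajectory: a straight line that bends once at timestep 4.
demo = np.array([[t, max(t - 4, 0)] for t in range(8)], dtype=float)
demo_budget = curvature_budget(demo)
```

In this toy trajectory the only nonzero curvature is at the bend, so that timestep receives the largest share of exact computation while the others stay at the floor.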
