Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention
Chengtao Lv, Yumeng Shi, Yushi Huang, Ruihao Gong, Shen Ren, Wenya Wang

TL;DR
Light Forcing introduces a sparse attention mechanism tailored for autoregressive video generation, speeding up inference while preserving the quality of the generated videos.
Contribution
The paper presents the first sparse attention solution designed specifically for AR video models, combining Chunk-Aware Growth and Hierarchical Sparse Attention to make better use of past context.
Findings
Outperforms existing sparse attention methods in quality (84.5 on VBench)
Achieves 1.2-1.3x speedup in end-to-end inference
Combined with FP8 quantization, reaches 19.7 FPS on RTX 5090
Abstract
Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we identify that applying these solutions to AR models leads to considerable performance degradation for two reasons: isolated consideration of chunk generation and insufficient utilization of past informative context. Motivated by these observations, we propose Light Forcing, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism to quantitatively estimate the contribution of each chunk, which determines their sparsity allocation. This progressive sparsity increase strategy enables the current chunk to inherit…
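The idea of giving each chunk its own sparsity budget can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not say how chunk contribution is estimated, so the linear schedule, the function names `chunk_keep_ratios` and `topk_block_mask`, and the default ratios are all hypothetical stand-ins chosen only to show what "progressively increasing sparsity" could look like.

```python
def chunk_keep_ratios(num_chunks, first_keep=1.0, last_keep=0.3):
    """Hypothetical schedule: fraction of key blocks each chunk attends to.

    Earlier chunks keep more blocks (lower sparsity); later chunks keep
    fewer. The linear shape is an assumption, not the paper's estimator.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    if num_chunks == 1:
        return [first_keep]
    step = (last_keep - first_keep) / (num_chunks - 1)
    return [first_keep + i * step for i in range(num_chunks)]


def topk_block_mask(scores, keep_ratio):
    """Keep the highest-scoring key blocks for one query block.

    `scores` is a list of per-block relevance scores (assumed given);
    returns a boolean list where True means the block is attended to.
    """
    k = max(1, round(keep_ratio * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(order[:k])
    return [i in kept for i in range(len(scores))]


# Usage: three chunks, each selecting blocks under its own budget.
ratios = chunk_keep_ratios(3)
mask = topk_block_mask([0.1, 0.9, 0.4, 0.7], keep_ratio=0.5)
```

In this sketch the first chunk attends densely and later chunks are pruned more aggressively, so the per-chunk mask would be applied before the attention computation for that chunk.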
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Visual Attention and Saliency Detection
