Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention

Chengtao Lv; Yumeng Shi; Yushi Huang; Ruihao Gong; Shen Ren; Wenya Wang

arXiv:2602.04789 · cs.CV · February 5, 2026

TL;DR

Light Forcing is a sparse attention mechanism tailored for autoregressive video generation. It speeds up inference while preserving the quality of the generated videos.

Contribution

The paper presents the first sparse attention solution designed specifically for AR video models. It combines Chunk-Aware Growth and Hierarchical Sparse Attention to make better use of past context.

Findings

01 Outperforms existing sparse attention methods in quality (84.5 on VBench)

02 Achieves a 1.2-1.3x speedup in end-to-end inference

03 Combined with FP8 quantization, reaches 19.7 FPS on an RTX 5090

Abstract

Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we identify that applying these solutions to AR models leads to considerable performance degradation for two reasons: isolated consideration of chunk generation and insufficient utilization of past informative context. Motivated by these observations, we propose Light Forcing, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism to quantitatively estimate the contribution of each chunk, which determines their sparsity allocation. This progressive sparsity increase strategy enables the current chunk to inherit…
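
To make the idea of per-chunk sparsity allocation concrete, here is a minimal NumPy sketch of generic block-sparse attention with a keep ratio that shrinks from chunk to chunk. It is not the paper's algorithm: the linear schedule in `keep_ratio_schedule`, the mean-pooled block scoring, and all function and parameter names are assumptions chosen for illustration, since the abstract does not specify how chunk contributions are estimated.

```python
import numpy as np


def keep_ratio_schedule(num_chunks, start=1.0, end=0.25):
    """Hypothetical linear schedule: the first chunk attends densely,
    later chunks keep a smaller fraction of key blocks (higher sparsity)."""
    if num_chunks == 1:
        return [start]
    step = (end - start) / (num_chunks - 1)
    return [start + step * i for i in range(num_chunks)]


def block_sparse_attention(q, k, v, block_size, keep_ratio):
    """Attend from q (n_q, d) to only the top-scoring key blocks of k, v (n_k, d).

    Blocks are ranked by the mean query dotted with each block's mean key.
    This coarse heuristic is an assumption, not the paper's selection rule.
    Returns the attention output and the sorted indices of the kept blocks.
    """
    n_k, d = k.shape
    assert n_k % block_size == 0, "sketch assumes n_k is a multiple of block_size"
    num_blocks = n_k // block_size

    # Coarse score per key block: pooled key against pooled query.
    pooled_keys = k.reshape(num_blocks, block_size, d).mean(axis=1)
    block_scores = pooled_keys @ q.mean(axis=0)

    # Keep the highest-scoring blocks, at least one.
    num_keep = max(1, int(np.ceil(keep_ratio * num_blocks)))
    kept = np.sort(np.argsort(block_scores)[-num_keep:])

    # Expand kept block ids into token indices and run softmax attention on them.
    token_idx = (kept[:, None] * block_size + np.arange(block_size)).ravel()
    logits = q @ k[token_idx].T / np.sqrt(d)
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v[token_idx], kept
```

With `keep_ratio=1.0` the function reduces to ordinary dense attention, which gives a simple correctness check. Lower ratios skip whole key blocks, which is where the savings over quadratic attention would come from in a real kernel.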

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Visual Attention and Saliency Detection