Rectified SpaAttn: Revisiting Attention Sparsity for Efficient Video Generation

Xuewen Liu, Zhikai Li, Jing Zhang, Mengjuan Chen, and Qingyi Gu

arXiv:2511.19835·cs.CV·November 26, 2025

Open Access

TL;DR

This paper introduces Rectified SpaAttn, a novel attention mechanism that improves the efficiency and accuracy of sparse attention in diffusion transformers for video generation, reducing computational costs while maintaining quality.

Contribution

We propose Rectified SpaAttn, which corrects biases in existing sparse attention methods, and develop specific rectification techniques to better align sparse and full attention maps.

Findings

01 Achieves up to 3.33x speedup on HunyuanVideo
02 Maintains high video generation quality
03 Addresses systematic biases in attention allocation

Abstract

Diffusion Transformers dominate video generation, but the quadratic complexity of attention computation introduces substantial latency. Attention sparsity reduces computational costs by focusing on critical tokens while ignoring non-critical tokens. However, existing methods suffer from severe performance degradation. In this paper, we revisit attention sparsity and reveal that existing methods induce systematic biases in attention allocation: (1) excessive focus on critical tokens amplifies their attention weights; (2) complete neglect of non-critical tokens causes the loss of relevant attention weights. To address these issues, we propose Rectified SpaAttn, which rectifies attention allocation with implicit full attention reference, thereby enhancing the alignment between sparse and full attention maps. Specifically: (1) for critical tokens, we show that their bias is proportional to…
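The two biases named in the abstract can be seen in a small numerical sketch. It assumes sparse attention is computed as a softmax over the kept ("critical") keys only, which is a common formulation but is not confirmed by the text above. The logits, the choice of kept indices, and the helper names `softmax` and `sparse_softmax` are all hypothetical, and the sketch does not attempt to reproduce the paper's rectification.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_softmax(scores, keep):
    """Softmax restricted to the indices in `keep`; all other weights are zero."""
    kept = softmax([scores[i] for i in keep])
    weights = [0.0] * len(scores)
    for i, w in zip(keep, kept):
        weights[i] = w
    return weights

# Hypothetical attention logits for one query over six keys.
scores = [2.0, 1.5, 0.3, 0.1, -0.2, -0.5]
keep = [0, 1]  # illustrative choice of "critical" tokens

full = softmax(scores)
sparse = sparse_softmax(scores, keep)

# Bias (2): attention mass that full attention gives to the dropped tokens.
kept_mass = sum(full[i] for i in keep)
lost_mass = 1.0 - kept_mass

# Bias (1): renormalizing over fewer keys inflates every kept weight.
amplification = [sparse[i] / full[i] for i in keep]

print("mass lost on non-critical tokens:", lost_mass)
print("amplification of critical tokens:", amplification)
```

Under this assumption, every kept weight is inflated by the same factor, `1 / kept_mass`, because the softmax denominator shrinks to cover only the kept keys, while the dropped keys lose their entire share.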

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Visual Attention and Saliency Detection