Training-free and Adaptive Sparse Attention for Efficient Long Video Generation

Yifei Xia; Suhan Ling; Fangcheng Fu; Yujie Wang; Huixia Li; Xuefeng Xiao; Bin Cui

arXiv:2502.21079·cs.CV·March 3, 2025

TL;DR

This paper introduces AdaSpa, a novel adaptive sparse attention method for Diffusion Transformers that significantly accelerates long video generation without sacrificing quality by leveraging hierarchical sparsity and dynamic pattern search.

Contribution

The paper presents AdaSpa, which it describes as the first sparse attention method to combine a Dynamic Pattern with Online Precise Search. The method is plug-and-play, dataset-independent, and improves efficiency in long video generation.

Findings

01

AdaSpa achieves substantial acceleration in video generation.

02

It maintains high video quality with reduced computational cost.

03

The method seamlessly integrates with existing Diffusion Transformers.

Abstract

Generating high-fidelity long videos with Diffusion Transformers (DiTs) is often hindered by significant latency, primarily due to the computational demands of attention mechanisms. For instance, generating an 8-second 720p video (110K tokens) with HunyuanVideo takes about 600 PFLOPs, with around 500 PFLOPs consumed by attention computations. To address this issue, we propose AdaSpa, the first Dynamic Pattern and Online Precise Search sparse attention method. Firstly, to realize the Dynamic Pattern, we introduce a blockified pattern to efficiently capture the hierarchical sparsity inherent in DiTs. This is based on our observation that sparse characteristics of DiTs exhibit hierarchical and blockified structures between and within different modalities. This blockified approach significantly reduces the complexity of attention computation while maintaining high fidelity in the generated…
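To make the idea of a blockified sparse pattern concrete, here is a minimal NumPy sketch. It is not the authors' algorithm: the function name `block_sparse_attention`, the `keep_ratio` parameter, and the choice to rank blocks by mean score are all assumptions made for illustration. The sketch also computes the dense score matrix in order to pick blocks, which a real sparse kernel would presumably avoid.

```python
import numpy as np

def block_sparse_attention(q, k, v, block_size, keep_ratio):
    """Toy block-sparse attention (illustrative only, not AdaSpa itself).

    For each block of queries, keep only the key blocks with the highest
    mean attention score and mask out all other blocks before the softmax.
    """
    n, d = q.shape
    assert n % block_size == 0, "sequence length must be a multiple of block_size"
    nb = n // block_size

    # Dense scores, computed here only so the block selection is easy to follow.
    scores = q @ k.T / np.sqrt(d)

    # Mean score of every (query block, key block) tile: shape (nb, nb).
    tiles = scores.reshape(nb, block_size, nb, block_size)
    block_scores = tiles.mean(axis=(1, 3))

    # Keep the top-scoring key blocks for each query block (at least one).
    keep = max(1, int(round(keep_ratio * nb)))
    top = np.argsort(-block_scores, axis=1)[:, :keep]
    block_mask = np.zeros((nb, nb), dtype=bool)
    np.put_along_axis(block_mask, top, True, axis=1)

    # Expand the block mask to token resolution and apply a masked softmax.
    mask = np.repeat(np.repeat(block_mask, block_size, axis=0), block_size, axis=1)
    masked = np.where(mask, scores, -np.inf)
    masked -= masked.max(axis=1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, block_mask

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out, block_mask = block_sparse_attention(q, k, v, block_size=4, keep_ratio=0.5)
```

With `keep_ratio=0.5`, each query block attends to half of the key blocks, so roughly half of the score tiles would not need to be computed by a kernel that skips masked blocks. Setting `keep_ratio=1.0` reproduces ordinary dense attention.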
