Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers

Pengtao Chen; Xianfang Zeng; Maosen Zhao; Peng Ye; Mingzhu Shen; Wei Cheng; Gang Yu; Tao Chen

arXiv:2506.03065 · cs.CV · June 4, 2025

TL;DR

Sparse-vDiT exploits recurring sparsity patterns in attention maps to accelerate video diffusion transformers, cutting FLOPs by more than 2x and delivering up to 1.85x faster inference while maintaining high visual quality (PSNR up to 27.09).

Contribution

This work introduces a sparsity acceleration framework for vDiT models that combines pattern-optimized sparse kernels with an offline search algorithm for selecting sparse strategies.

Findings

01 Achieves over 2x FLOP reduction in vDiT models.

02 Realizes up to 1.85x inference speedup in practice.

03 Maintains high visual fidelity with PSNR up to 27.09.
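
For readers unfamiliar with the fidelity metric quoted above, PSNR can be computed as in the minimal sketch below. This is the standard textbook definition, not the paper's evaluation code: the function name `psnr`, the default `peak` of 255, and the choice of reference frames are assumptions made here for illustration, and the page does not state which settings the authors used.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape.

    `peak` is the maximum possible pixel value (255 for 8-bit frames);
    this default is an assumption, not a value taken from the paper.
    """
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        # Identical inputs: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

A higher value means the accelerated output is closer to the reference; identical frames give infinity.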

Abstract

While Diffusion Transformers (DiTs) have achieved breakthroughs in video generation, this long-sequence generation task remains constrained by the quadratic complexity of attention mechanisms, resulting in significant inference latency. Through detailed analysis of attention maps in the Video Diffusion Transformer (vDiT), we identify three recurring sparsity patterns: diagonal, multi-diagonal, and vertical-stripe structures. Moreover, 3-6% of attention heads can even be skipped. Crucially, these patterns exhibit strong layer-depth and head-position correlations but show limited dependence on the input content. Leveraging these findings, we propose Sparse-vDiT, a sparsity acceleration framework for vDiT comprising: 1) Pattern-optimized sparse kernels that replace dense attention with computationally efficient implementations for each identified sparsity pattern. 2) An offline sparse diffusion search…
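
The three patterns named in the abstract can be pictured as boolean masks over the attention map. The sketch below is illustrative only and is not the paper's kernel implementation: the helper names, the `width`, `stride`, and `columns` parameters, and the use of NumPy are all assumptions. It also computes the full score matrix and then masks it, so it shows what each pattern keeps rather than delivering any speedup; a real sparse kernel would skip the masked entries.

```python
import numpy as np

def diagonal_mask(n, width):
    """Keep entries within `width` tokens of the main diagonal."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= width

def multi_diagonal_mask(n, stride, width):
    """Keep bands that repeat every `stride` tokens.

    `stride` is a hypothetical stand-in for the number of tokens per frame.
    """
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :]) % stride
    return np.minimum(d, stride - d) <= width

def vertical_stripe_mask(n, columns):
    """Keep a fixed set of key columns for every query."""
    mask = np.zeros((n, n), dtype=bool)
    mask[:, columns] = True
    return mask

def masked_attention(q, k, v, mask):
    """Reference softmax attention restricted to `mask`.

    Assumes every row of `mask` keeps at least one entry.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                      # masked entries become 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def density(mask):
    """Fraction of attention entries a pattern keeps."""
    return float(mask.mean())
```

Calling `density` on a mask gives a rough sense of how much of the dense attention map a pattern retains, which is the quantity a sparse kernel would turn into saved computation.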

Taxonomy

Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Neural Networks and Reservoir Computing