SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer

Tongcheng Fang; Hanling Zhang; Ruiqi Xie; Zhuo Han; Xin Tao; Tianchen Zhao; Pengfei Wan; Wenbo Ding; Wanli Ouyang; Xuefei Ning; Yu Wang

arXiv:2601.16515 · cs.CV · April 3, 2026

TL;DR

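SALAD adds a lightweight linear attention branch in parallel with sparse attention and balances the two with a static-dynamic scaling strategy. This reaches up to 90% attention sparsity and over 1.5x inference speedup in video diffusion transformers while keeping generation quality comparable to full attention.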

Contribution

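The paper presents a linear attention branch that runs in parallel with sparse attention, together with a multi-level static-dynamic scaling strategy that balances the two branches. This enables high attention sparsity with lightweight finetuning in video diffusion transformers.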

Findings

01. Achieves up to 90% attention sparsity.

02. Delivers a 1.52-2.03x inference speedup across models and sequence lengths.

03. Requires only 2,000 video samples and 30 GPU hours of finetuning.
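A common first question is why 90% sparsity yields roughly 1.5-2x rather than 10x. One standard way to see this is Amdahl's law: only the attention share of runtime shrinks, and any added work (such as an extra branch) is paid on top. The sketch below assumes attention cost scales with the fraction of kept query-key pairs. The attention share (0.6) and overhead (0.05) used in the example are made-up placeholder inputs, not measurements from the paper.

```python
def attention_speedup(sparsity):
    """Ideal speedup of attention alone if its cost scales with the kept fraction."""
    return 1.0 / (1.0 - sparsity)


def end_to_end_speedup(attn_fraction, sparsity, overhead=0.0):
    """Amdahl's law: only the attention share of the baseline runtime shrinks.

    attn_fraction: share of baseline runtime spent in attention (hypothetical input).
    overhead: extra cost as a share of baseline runtime, e.g. an added branch
              or sparse-index computation (hypothetical input).
    """
    new_time = (1.0 - attn_fraction) + attn_fraction * (1.0 - sparsity) + overhead
    return 1.0 / new_time


print(round(attention_speedup(0.9), 2))              # 10.0
print(round(end_to_end_speedup(0.6, 0.9), 2))        # 2.17
print(round(end_to_end_speedup(0.6, 0.9, 0.05), 2))  # 1.96
```

The point is qualitative. Non-attention layers and any added branch bound the end-to-end gain well below the attention-only 10x, whatever the exact inputs are.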

Abstract

Diffusion Transformers have demonstrated remarkable performance in video generation. However, their long input sequences incur substantial latency due to the quadratic complexity of full attention. Various sparse attention mechanisms have been proposed. Training-free approaches are limited to moderate sparsity and thus yield only modest acceleration, whereas training-based methods can reach much higher sparsity but demand substantial data and computation. In this work, we propose SALAD, introducing a lightweight linear attention branch in parallel with the sparse attention. Leveraging a Multi-level Static-Dynamic Scaling Strategy to balance the two branches, our method attains up to 90% sparsity and 1.52-2.03x inference speedup across different models and sequence lengths, while maintaining generation quality comparable to the full attention baseline. Moreover, our finetuning process is…
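The following NumPy sketch illustrates the general two-branch idea described in the abstract, where sparse softmax attention is summed with a scaled linear attention branch. It is not SALAD's implementation, and every specific choice here is an assumption made for illustration:

- The block-level top-k selection rule for the sparse branch.
- The elu(x)+1 feature map for the linear branch.
- The hypothetical `w_gate` and `static_scale` combination, which stands in for the paper's Multi-level Static-Dynamic Scaling Strategy. The details of that strategy are not given here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries hold -inf and become 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def block_sparse_attention(q, k, v, block=16, sparsity=0.9):
    """Softmax attention where each query block sees only its top-scoring key blocks.

    Assumed selection rule: rank key blocks by the dot product of mean-pooled
    query and key blocks, and keep a (1 - sparsity) fraction per query block.
    """
    n, d = q.shape
    assert n % block == 0, "sequence length must be a multiple of the block size"
    nb = n // block
    qb = q.reshape(nb, block, d).mean(axis=1)
    kb = k.reshape(nb, block, d).mean(axis=1)
    block_scores = qb @ kb.T                          # (nb, nb) coarse scores
    keep = max(1, int(round(nb * (1.0 - sparsity))))  # key blocks kept per row
    top = np.argsort(-block_scores, axis=1)[:, :keep]
    mask = np.zeros((nb, nb), dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    # Computed densely and then masked for readability; a real kernel would
    # skip the masked blocks entirely, which is where the speedup comes from.
    token_mask = np.repeat(np.repeat(mask, block, axis=0), block, axis=1)
    scores = np.where(token_mask, (q @ k.T) / np.sqrt(d), -np.inf)
    return softmax(scores) @ v, mask


def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention with phi(x) = elu(x) + 1 (assumed feature map)."""
    phi_q = np.where(q > 0, q + 1.0, np.exp(np.minimum(q, 0.0)))
    phi_k = np.where(k > 0, k + 1.0, np.exp(np.minimum(k, 0.0)))
    kv = phi_k.T @ v               # (d, d_v): cost linear in sequence length
    z = phi_q @ phi_k.sum(axis=0)  # (n,) per-query normalizer
    return (phi_q @ kv) / (z[:, None] + eps)


def sparse_plus_linear(q, k, v, w_gate, static_scale=1.0, block=16, sparsity=0.9):
    """Sparse branch plus a scaled linear branch (hypothetical scaling scheme)."""
    sparse_out, mask = block_sparse_attention(q, k, v, block, sparsity)
    lin_out = linear_attention(q, k, v)
    gate = 1.0 / (1.0 + np.exp(-(q @ w_gate)))  # (n, 1) hypothetical per-token gate
    return sparse_out + static_scale * gate * lin_out, mask


rng = np.random.default_rng(0)
n, d = 320, 32
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
w_gate = 0.1 * rng.standard_normal((d, 1))
out, mask = sparse_plus_linear(q, k, v, w_gate, static_scale=0.5)
print(out.shape, 1.0 - mask.mean())  # output shape and block-level sparsity (0.9 here)
```

The design intent this illustrates: the sparse branch keeps exact softmax attention on the most relevant blocks, while the linear branch gives every query a cheap, global summary of all keys. The scale factor controls how much of that summary is mixed back in.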