HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention
Xuzhe Zheng, Yuexiao Ma, Jing Xu, Xiawu Zheng, Rongrong Ji, Fei Chao

TL;DR
HASTE is a training-free, head-wise adaptive sparse attention method for video diffusion models that accelerates inference by up to 1.93x at 720P while keeping video quality competitive.
Contribution
The paper proposes a head-wise adaptive framework with two plug-in components, Temporal Mask Reuse and Error-guided Budgeted Calibration, to improve the speed-quality trade-off of training-free sparse attention.
Findings
Achieves up to 1.93x speedup at 720P resolution.
Maintains competitive video quality and similarity metrics.
Improves efficiency of pretrained video diffusion models.
Abstract
Diffusion-based video generation has advanced substantially in visual fidelity and temporal coherence, but practical deployment remains limited by the quadratic complexity of full attention. Training-free sparse attention is attractive because it accelerates pretrained models without retraining, yet existing online top-k sparse attention still spends non-negligible cost on mask prediction and applies shared thresholds despite strong head-level heterogeneity. We show that these two overlooked factors limit the practical speed-quality trade-off of training-free sparse attention in Video DiTs. To address them, we introduce a head-wise adaptive framework with two plug-in components: Temporal Mask Reuse, which skips unnecessary mask prediction based on query-key drift, and Error-guided Budgeted Calibration, which assigns per-head top-k thresholds by minimizing measured model-output error…
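The mask-reuse idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes, for illustration only, that a head's "top-k threshold" is a keep ratio over keys, that "query-key drift" is the relative L2 change of Q and K since the mask was last predicted, and that a fixed tolerance decides reuse. The names `topk_mask`, `masked_attention`, `relative_drift`, `HeadMaskCache`, `keep_ratio`, and `drift_tol` are all hypothetical.

```python
import numpy as np

def topk_mask(q, k, keep_ratio):
    """Boolean mask keeping the highest-scoring keys per query (one head)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    n_keep = max(1, int(round(keep_ratio * k.shape[0])))
    # Column indices of the n_keep largest scores in every row.
    idx = np.argpartition(scores, -n_keep, axis=-1)[:, -n_keep:]
    mask = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return mask

def masked_attention(q, k, v, mask):
    """Softmax attention restricted to the positions where mask is True."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def relative_drift(x, x_ref):
    """Relative L2 change of x with respect to a reference tensor (assumed metric)."""
    return np.linalg.norm(x - x_ref) / (np.linalg.norm(x_ref) + 1e-8)

class HeadMaskCache:
    """Per-head cache: re-predict the mask only when Q or K has drifted."""

    def __init__(self, keep_ratio, drift_tol):
        self.keep_ratio = keep_ratio  # hypothetical per-head sparsity setting
        self.drift_tol = drift_tol    # hypothetical reuse tolerance
        self.mask = None
        self.q_ref = None
        self.k_ref = None
        self.predictions = 0          # number of times the mask was recomputed

    def get(self, q, k):
        stale = (
            self.mask is None
            or relative_drift(q, self.q_ref) > self.drift_tol
            or relative_drift(k, self.k_ref) > self.drift_tol
        )
        if stale:
            self.mask = topk_mask(q, k, self.keep_ratio)
            self.q_ref, self.k_ref = q.copy(), k.copy()
            self.predictions += 1
        return self.mask
```

In this sketch each attention head would own one `HeadMaskCache` with its own `keep_ratio`, which is where a per-head calibration step would plug in; how HASTE actually measures drift and selects thresholds is not specified in the text above.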
