TL;DR
NABLA introduces a neighborhood adaptive block-level attention mechanism that reduces computational complexity in video diffusion transformers, enabling faster training and inference without significant loss in quality.
Contribution
The paper presents NABLA, a novel attention mechanism that adapts to sparsity patterns, improving efficiency in video transformers without requiring custom operators.
Findings
- Achieves up to 2.7x faster training and inference
- Maintains comparable quantitative metrics and visual quality
- Integrates with PyTorch's Flex Attention without custom low-level operators
Abstract
Recent progress in transformer-based architectures has demonstrated remarkable success in video generation tasks. However, the quadratic complexity of full attention mechanisms remains a critical bottleneck, particularly for high-resolution and long-duration video sequences. In this paper, we propose NABLA, a novel Neighborhood Adaptive Block-Level Attention mechanism that dynamically adapts to sparsity patterns in video diffusion transformers (DiTs). By leveraging block-wise attention with an adaptive sparsity-driven threshold, NABLA reduces computational overhead while preserving generative quality. Our method does not require custom low-level operator design and can be seamlessly integrated with PyTorch's Flex Attention operator. Experiments demonstrate that NABLA achieves up to 2.7x faster training and inference compared to the baseline, with almost no compromise in quantitative metrics…
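The abstract names block-wise attention with an adaptive, sparsity-driven threshold but does not spell out the procedure. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. It assumes (hypothetically) that queries and keys are mean-pooled per block, that a coarse softmax is taken over key blocks, and that each query block keeps the most probable key blocks until their cumulative probability reaches `keep_mass`. The function names and the `keep_mass` parameter are invented for this example.

```python
import math

def block_means(x, block):
    """Mean-pool consecutive groups of `block` token vectors (assumed pooling)."""
    out = []
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        dim = len(chunk[0])
        out.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return out

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_block_mask(q, k, block, keep_mass):
    """Return a boolean mask of shape [num_q_blocks][num_k_blocks].

    For each query block, key blocks are added in descending order of
    coarse attention probability until the kept mass reaches `keep_mass`.
    The number of kept blocks therefore adapts to how peaked each row is.
    """
    qb = block_means(q, block)
    kb = block_means(k, block)
    scale = 1.0 / math.sqrt(len(q[0]))
    mask = []
    for qv in qb:
        scores = [scale * sum(a * b for a, b in zip(qv, kv)) for kv in kb]
        probs = softmax(scores)
        order = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
        row = [False] * len(probs)
        mass = 0.0
        for j in order:
            row[j] = True
            mass += probs[j]
            if mass >= keep_mass:
                break
        mask.append(row)
    return mask

# Toy input: 8 tokens of dimension 2, split into 2 blocks of 4 tokens.
q = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
k = [[5.0, 0.0]] * 4 + [[0.0, 5.0]] * 4
sparse_mask = adaptive_block_mask(q, k, block=4, keep_mass=0.9)
dense_mask = adaptive_block_mask(q, k, block=4, keep_mass=0.99)
```

In this toy input each query block aligns strongly with one key block, so a lower `keep_mass` keeps only that block while a higher one keeps both. A block-level mask of this kind is the sort of structure a block-sparse attention kernel could consume; how NABLA actually builds and applies its mask is described in the paper itself.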
Code & Models
- 🤗 kandinskylab/Kandinsky-5.0-I2V-Pro-sft-5s-Diffusers · model · 273 dl · ♡ 27
- 🤗 ai-forever/Wan2.1-T2V-14B-NABLA-0.7 · model · 68 dl · ♡ 5
- 🤗 ai-forever/Wan2.1-T2V-14B-NABLA-0.6-STA-11-3-3 · model · 83 dl · ♡ 1
- 🤗 ai-forever/Wan2.1-T2V-14B-NABLA-0.5-STA-11-5-5 · model · 32 dl
- 🤗 kandinskylab/Kandinsky-5.0-T2V-Lite-pretrain-5s · model · 6 dl · ♡ 10
- 🤗 kandinskylab/Kandinsky-5.0-T2V-Lite-pretrain-10s · model · 2 dl · ♡ 1
- 🤗 kandinskylab/Kandinsky-5.0-T2V-Lite-sft-5s · model · 10 dl
- 🤗 kandinskylab/Kandinsky-5.0-T2V-Lite-sft-10s · model · 11 dl · ♡ 9
- 🤗 kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-5s · model · 5 dl · ♡ 3
- 🤗 kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-5s · model · 5 dl
