RainFusion2.0: Temporal-Spatial Awareness and Hardware-Efficient Block-wise Sparse Attention
Aiyue Chen, Yaofu Liu, Junjian Huang, Guang Lian, Yiwu Yao, Wangli Lan, Jing Lin, Zhixin Ma, Tingting Zhou

TL;DR
RainFusion2.0 introduces a hardware-efficient, adaptive sparse attention mechanism for diffusion models, significantly reducing computational costs while maintaining quality across diverse hardware.
Contribution
It proposes a novel block-wise sparse attention method with low overhead, adaptable to multiple hardware platforms, enhancing efficiency in video and image generation.
Findings
Achieves 80% sparsity with 1.5-1.8x speedup
Maintains high video quality with reduced computation
Demonstrates robustness across different hardware platforms
Abstract
In video and image generation tasks, Diffusion Transformer (DiT) models incur extremely high computational costs due to attention mechanisms, which limits their practical applications. Furthermore, with hardware advancements, a wide range of devices besides graphics processing unit (GPU), such as application-specific integrated circuit (ASIC), have been increasingly adopted for model inference. Sparse attention, which leverages the inherent sparsity of attention by skipping computations for insignificant tokens, is an effective approach to mitigate computational costs. However, existing sparse attention methods have two critical limitations: the overhead of sparse pattern prediction and the lack of hardware generality, as most of these methods are designed for GPU. To address these challenges, this study proposes RainFusion2.0, which aims to develop an online adaptive,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
