RainFusion2.0: Temporal-Spatial Awareness and Hardware-Efficient Block-wise Sparse Attention

Aiyue Chen; Yaofu Liu; Junjian Huang; Guang Lian; Yiwu Yao; Wangli Lan; Jing Lin; Zhixin Ma; Tingting Zhou

arXiv:2512.24086 · cs.CV · April 21, 2026

TL;DR

RainFusion2.0 introduces a hardware-efficient, adaptive block-wise sparse attention mechanism for Diffusion Transformer (DiT) models. It substantially reduces attention cost while maintaining generation quality across diverse hardware.

Contribution

It proposes a novel block-wise sparse attention method with low sparse-pattern prediction overhead. The method adapts to multiple hardware platforms and improves efficiency in video and image generation.

Findings

1. Achieves 80% attention sparsity with a 1.5-1.8x speedup
2. Maintains high video quality with reduced computation
3. Demonstrates robustness across different hardware platforms

Abstract

In video and image generation tasks, Diffusion Transformer (DiT) models incur extremely high computational costs due to their attention mechanisms, which limits their practical applications. Furthermore, with hardware advancements, a wide range of devices besides graphics processing units (GPUs), such as application-specific integrated circuits (ASICs), have been increasingly adopted for model inference. Sparse attention, which leverages the inherent sparsity of attention by skipping computations for insignificant tokens, is an effective approach to mitigating computational costs. However, existing sparse attention methods have two critical limitations: the overhead of sparse pattern prediction, and a lack of hardware generality, since most of these methods are designed for GPUs. To address these challenges, this study proposes RainFusion2.0, which aims to develop an online adaptive,…
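
To make "skipping computations for insignificant tokens" at block granularity concrete, here is a generic minimal sketch in Python/NumPy. It is not RainFusion2.0's actual algorithm. The mean-pooling predictor, the top-k block selection, the function names `block_mask`, `block_sparse_attention` and `dense_attention`, and the `keep_ratio` parameter are all hypothetical illustrations. The paper's temporal-spatial awareness and hardware-specific kernels are not modeled. The sketch scores query-block/key-block pairs on pooled tokens and keeps a fixed fraction of key blocks per query block. It then computes exact attention only inside the kept blocks.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_attention(q, k, v):
    # Reference full attention: softmax(Q K^T / sqrt(d)) V.
    d = q.shape[1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def block_mask(q, k, block, keep_ratio):
    # Hypothetical cheap predictor: score block pairs on mean-pooled Q and K.
    n, d = q.shape
    assert n % block == 0, "sequence length must be a multiple of the block size"
    nb = n // block
    qb = q.reshape(nb, block, d).mean(axis=1)      # (nb, d) pooled queries
    kb = k.reshape(nb, block, d).mean(axis=1)      # (nb, d) pooled keys
    scores = qb @ kb.T / np.sqrt(d)                # (nb, nb) coarse block scores
    keep = max(1, int(round(keep_ratio * nb)))     # key blocks kept per query block
    top = np.argsort(-scores, axis=1)[:, :keep]    # highest-scoring key blocks
    mask = np.zeros((nb, nb), dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask


def block_sparse_attention(q, k, v, block, keep_ratio):
    # Exact attention restricted to the key blocks selected for each query block.
    n, d = q.shape
    mask = block_mask(q, k, block, keep_ratio)
    out = np.zeros_like(v)
    for i in range(n // block):
        rows = slice(i * block, (i + 1) * block)
        cols = np.concatenate(
            [np.arange(j * block, (j + 1) * block) for j in np.flatnonzero(mask[i])]
        )
        s = q[rows] @ k[cols].T / np.sqrt(d)
        out[rows] = softmax(s) @ v[cols]
    return out, mask
```

For example, with 80 tokens, `block=8` and `keep_ratio=0.2`, each of the 10 query blocks attends to 2 of the 10 key blocks. That gives 80% block sparsity, the ratio quoted in the findings. With `keep_ratio=1.0` the sketch reduces to dense attention. The pooled predictor costs O((n/b)^2 d) instead of O(n^2 d), which illustrates why block-level pattern prediction can keep overhead low.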
