RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy
Aiyue Chen, Bin Dong, Jingru Li, Jing Lin, Kun Tian, Yiwu Yao, Gongyi Wang

TL;DR
RainFusion introduces a training-free sparse attention method that exploits visual data sparsity to significantly accelerate 3D attention in video generation models while preserving quality.
Contribution
It proposes a novel, plug-and-play sparse attention technique with an adaptive recognition module that accelerates video generation without additional training.
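The core idea behind sparse attention is to compute attention only for the query-key pairs that a sparsity mask allows. The sketch below shows this in plain Python. It is a generic masked-attention illustration under our own assumptions, not RainFusion's kernel. The function name `sparse_attention` and the banded toy mask are hypothetical, and real implementations skip whole blocks on a GPU rather than looping over individual pairs.

```python
import math

def sparse_attention(q, k, v, mask):
    """Scaled dot-product attention that scores only pairs where mask[i][j] is True.

    q, k, v are lists of vectors (lists of floats); mask is len(q) x len(k).
    Returns (outputs, number_of_scored_pairs).
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    outputs, scored = [], 0
    for i, q_i in enumerate(q):
        keys = [j for j in range(len(k)) if mask[i][j]]
        if not keys:
            raise ValueError(f"query {i} has no allowed keys")
        # Masked-out pairs are never scored: this is where sparsity saves work.
        scores = [scale * sum(a * b for a, b in zip(q_i, k[j])) for j in keys]
        scored += len(keys)
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        outputs.append([sum(w * v[j][c] for w, j in zip(weights, keys)) / total
                        for c in range(len(v[0]))])
    return outputs, scored

n = 4
q = [[0.1 * i, 0.2] for i in range(n)]
k = [[0.3, 0.1 * i] for i in range(n)]
v = [[float(i), -float(i)] for i in range(n)]
dense = [[True] * n for _ in range(n)]
local = [[abs(i - j) <= 1 for j in range(n)] for i in range(n)]  # hypothetical banded mask
_, dense_pairs = sparse_attention(q, k, v, dense)
_, local_pairs = sparse_attention(q, k, v, local)
print(dense_pairs, local_pairs)  # 16 10
```

The banded mask is an arbitrary stand-in. RainFusion's actual masks follow the Spatial, Temporal and Textural patterns described in the abstract.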
Findings
Over 2x speedup in attention computation
Maintains video quality, with minimal impact on quality scores
Applicable to multiple state-of-the-art models
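The attention-level speedup can be related to the abstract's statement that 3D attention accounts for over 80% of compute using Amdahl's law. This is an illustrative calculation, not a result reported for RainFusion. It rests on our own assumption that the 80% share applies to end-to-end runtime (the abstract states it for computational resources). Here p denotes the fraction of time spent in attention and s the attention speedup.

```latex
S_{\text{end-to-end}} = \frac{1}{(1 - p) + p/s}, \qquad
p = 0.8,\; s = 2 \;\Rightarrow\; S = \frac{1}{0.2 + 0.4} = \frac{1}{0.6} \approx 1.67
```

Because S increases with both p and s, "over 80%" and "over 2x" would place the end-to-end gain above roughly 1.67x under this assumption. This ignores the ~0.2% ARM overhead.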
Abstract
Video generation with diffusion models is highly computationally intensive: 3D attention in Diffusion Transformer (DiT) models accounts for over 80% of the total computational resources. In this work, we introduce RainFusion, a novel training-free sparse attention method that exploits the inherent sparsity of visual data to accelerate attention computation while preserving video quality. Specifically, we identify three distinct sparse patterns in video generation attention calculations: the Spatial Pattern, the Temporal Pattern and the Textural Pattern. The sparse pattern for each attention head is determined online during inference by our proposed ARM (Adaptive Recognition Module), with negligible overhead (~0.2%). RainFusion is a plug-and-play method that can be seamlessly integrated into state-of-the-art 3D-attention video generation…
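The abstract names the sparse patterns but does not define them. The sketch below makes two assumptions. It assumes a frame-major token order, and it reads the Spatial Pattern as attention within the same frame and the Temporal Pattern as attention to the same spatial position across frames. The Textural Pattern is left out because its structure is not described here. `select_pattern` is a hypothetical stand-in for ARM, not the authors' algorithm. It scores a few sampled queries densely and keeps whichever mask retains the most attention mass. The sample count and the 0.9 threshold are arbitrary choices, not values from the paper.

```python
import math
import random

def spatial_mask(frames, per_frame):
    """Same-frame pairs only (assumed frame-major order: index = frame * per_frame + pos)."""
    n = frames * per_frame
    return [[i // per_frame == j // per_frame for j in range(n)] for i in range(n)]

def temporal_mask(frames, per_frame):
    """Same spatial position across frames only."""
    n = frames * per_frame
    return [[i % per_frame == j % per_frame for j in range(n)] for i in range(n)]

def softmax_row(q_i, k):
    """Full (dense) attention probabilities for a single query."""
    scale = 1.0 / math.sqrt(len(q_i))
    scores = [scale * sum(a * b for a, b in zip(q_i, k_j)) for k_j in k]
    m = max(scores)  # subtract the max for numerical stability
    e = [math.exp(s - m) for s in scores]
    total = sum(e)
    return [x / total for x in e]

def select_pattern(q, k, masks, num_samples=4, threshold=0.9, seed=0):
    """Hypothetical ARM-style rule (not the paper's algorithm): score a few
    sampled queries densely, measure how much attention mass each candidate
    mask keeps, and fall back to "dense" if none keeps `threshold` of it."""
    rng = random.Random(seed)
    rows = rng.sample(range(len(q)), min(num_samples, len(q)))
    probs = {i: softmax_row(q[i], k) for i in rows}
    best_name, best_mass = "dense", 0.0
    for name, mask in masks.items():
        kept = sum(sum(p for p, keep in zip(probs[i], mask[i]) if keep)
                   for i in rows) / len(rows)
        if kept > best_mass:
            best_name, best_mass = name, kept
    return best_name if best_mass >= threshold else "dense"

# Toy video latent: 3 frames x 4 tokens per frame = 12 tokens.
F, P = 3, 4
masks = {"spatial": spatial_mask(F, P), "temporal": temporal_mask(F, P)}
# Head A: queries/keys encode only the frame index, so attention stays in-frame.
qa = [[10.0 if f == i // P else 0.0 for f in range(F)] for i in range(F * P)]
# Head B: queries/keys encode only the spatial position, so attention follows it across frames.
qb = [[10.0 if p == i % P else 0.0 for p in range(P)] for i in range(F * P)]
# Head C: all-zero queries give uniform attention, which no sparse mask captures.
qc = [[0.0, 0.0] for _ in range(F * P)]
print(select_pattern(qa, qa, masks), select_pattern(qb, qb, masks),
      select_pattern(qc, qc, masks))  # spatial temporal dense
```

Under these assumptions the spatial mask keeps 1/frames of the query-key pairs and the temporal mask keeps 1/per_frame, which is where the savings would come from. Scoring only a few sampled query rows keeps the selection step cheap relative to full attention.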
Taxonomy
Topics: Image Enhancement Techniques · Video Coding and Compression Technologies · Advanced Vision and Imaging
