SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation

Shenggan Cheng; Yuanxin Wei; Lansong Diao; Yong Liu; Bujiao Chen; Lianghua Huang; Yu Liu; Wenyuan Yu; Jiangsu Du; Wei Lin; Yang You

arXiv:2505.19151·cs.GR·May 27, 2025

TL;DR

SRDiffusion introduces a collaborative approach between large and small models to significantly accelerate high-quality video diffusion inference without compromising visual fidelity.

Contribution

The paper proposes a novel framework in which a large model and a small model cooperate for efficient video diffusion inference, and it reports outperforming existing acceleration methods.

Findings

01 Over 3× speedup for Wan with minimal quality loss

02 2× speedup for CogVideoX

03 Outperforms existing acceleration approaches

Abstract

Leveraging the diffusion transformer (DiT) architecture, models like Sora, CogVideoX, and Wan have achieved remarkable progress in text-to-video, image-to-video, and video editing tasks. Despite these advances, diffusion-based video generation remains computationally intensive, especially for high-resolution, long-duration videos. Prior work accelerates inference by skipping computation, usually at the cost of severe quality degradation. In this paper, we propose SRDiffusion, a novel framework that leverages collaboration between large and small models to reduce inference cost. The large model handles high-noise steps to ensure semantic and motion fidelity (Sketching), while the smaller model refines visual details in low-noise steps (Rendering). Experimental results demonstrate that our method outperforms existing approaches, achieving over 3$\times$ speedup for Wan with nearly no quality…
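The sketching/rendering split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `cooperative_sample`, the fixed threshold `switch_sigma`, the Euler-style update, and the toy denoiser are all hypothetical stand-ins chosen for illustration. The abstract does not say how the hand-off point between the two models is chosen, so a fixed noise threshold is assumed here.

```python
from typing import Callable, Dict, List, Tuple

# A denoiser maps (latent, noise level) to a predicted update direction.
# Hypothetical interface; real video DiT models take far richer inputs.
Denoiser = Callable[[List[float], float], List[float]]


def cooperative_sample(
    latent: List[float],
    sigmas: List[float],
    large_model: Denoiser,
    small_model: Denoiser,
    switch_sigma: float,
) -> Tuple[List[float], Dict[str, int]]:
    """Denoise `latent` along a decreasing noise schedule `sigmas`.

    Steps whose noise level is above `switch_sigma` use the large model
    (the "sketching" phase); the remaining steps use the small model
    (the "rendering" phase). The fixed threshold is an assumption.
    """
    usage = {"large": 0, "small": 0}
    x = list(latent)
    for sigma, next_sigma in zip(sigmas[:-1], sigmas[1:]):
        if sigma > switch_sigma:
            model, name = large_model, "large"
        else:
            model, name = small_model, "small"
        usage[name] += 1
        direction = model(x, sigma)
        # Simple Euler step from sigma to next_sigma.
        x = [xi + (next_sigma - sigma) * di for xi, di in zip(x, direction)]
    return x, usage


def toy_denoiser(x: List[float], sigma: float) -> List[float]:
    """Toy stand-in that assumes the clean signal is all zeros."""
    return [xi / sigma for xi in x]


if __name__ == "__main__":
    schedule = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
    result, calls = cooperative_sample(
        [1.0, -2.0], schedule, toy_denoiser, toy_denoiser, switch_sigma=0.5
    )
    print(calls, result)
```

In this toy setup the saving comes only from the steps routed to the cheaper model, so the achievable speedup depends on how many steps fall below the threshold and on the relative cost of the two models, neither of which is given in the text above.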

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Face recognition and analysis

Methods: Diffusion