FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

Zhengyao Lv; Chenyang Si; Junhao Song; Zhenyu Yang; Yu Qiao; Ziwei Liu; Kwan-Yee K. Wong

arXiv:2410.19355 · cs.CV · March 13, 2025

TL;DR

FasterCache is a training-free method that accelerates video diffusion model inference by reusing cached features across timesteps and reducing redundant classifier-free guidance (CFG) computation. It reports a 1.67× speedup on Vchitect-2.0 while keeping video quality comparable to the baseline.

Contribution

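The paper introduces FasterCache, a training-free approach that speeds up inference of video diffusion models through two components: a dynamic feature reuse strategy that preserves feature distinction and temporal continuity, and CFG-Cache, which optimizes the reuse of conditional and unconditional outputs.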

Findings

01 Achieves 1.67× speedup on Vchitect-2.0

02 Maintains comparable video quality to baseline

03 Outperforms existing acceleration methods

Abstract

In this paper, we present FasterCache, a novel training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation. By analyzing existing cache-based methods, we observe that directly reusing adjacent-step features degrades video quality due to the loss of subtle variations. We further perform a pioneering investigation of the acceleration potential of classifier-free guidance (CFG) and reveal significant redundancy between conditional and unconditional features within the same timestep. Capitalizing on these observations, we introduce FasterCache to substantially accelerate diffusion-based video generation. Our key contributions include a dynamic feature reuse strategy that preserves both feature distinction and temporal continuity, and CFG-Cache which optimizes the reuse of conditional and unconditional outputs…
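
The two ideas named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated and does not give the exact formulation, so the sketch substitutes two simple stand-ins. For feature reuse, it uses linear extrapolation from two cached features instead of copying the previous one. For CFG, it recomputes the unconditional branch only every few steps and otherwise approximates it from a cached residual. The names `extrapolate_feature`, `CFGResidualCache`, `interval`, and `weight` are hypothetical, and features are plain Python lists rather than tensors. Only the guidance formula in `cfg_combine` is the standard CFG combination.

```python
from typing import Callable, List, Optional

Vector = List[float]


def cfg_combine(cond: Vector, uncond: Vector, scale: float) -> Vector:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def extrapolate_feature(prev: Vector, prev_prev: Vector, weight: float) -> Vector:
    """Assumed stand-in for dynamic feature reuse.

    Rather than copying `prev` unchanged, add a weighted trend term so the
    reused feature still varies between steps: prev + weight * (prev - prev_prev).
    """
    return [p + weight * (p - q) for p, q in zip(prev, prev_prev)]


class CFGResidualCache:
    """Assumed stand-in for CFG-Cache (hypothetical helper).

    Every `interval` steps both branches are evaluated and the residual
    (uncond - cond) is cached. On the other steps only the conditional
    branch runs, and uncond is approximated as cond + cached residual.
    """

    def __init__(self, interval: int) -> None:
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.interval = interval
        self.residual: Optional[Vector] = None
        self.uncond_calls = 0  # counts how often the unconditional branch ran

    def guided(
        self,
        step: int,
        x: Vector,
        cond_fn: Callable[[Vector], Vector],
        uncond_fn: Callable[[Vector], Vector],
        scale: float,
    ) -> Vector:
        cond = cond_fn(x)
        if self.residual is None or step % self.interval == 0:
            # Full step: run the unconditional branch and refresh the cache.
            uncond = uncond_fn(x)
            self.uncond_calls += 1
            self.residual = [u - c for u, c in zip(uncond, cond)]
        else:
            # Cached step: skip the unconditional forward pass.
            uncond = [c + r for c, r in zip(cond, self.residual)]
        return cfg_combine(cond, uncond, scale)
```

With `interval=2`, the unconditional branch runs on every other step, so a four-step loop calls `uncond_fn` twice instead of four times. The approximation is exact only when the conditional/unconditional residual does not change between refreshes; how quickly it drifts in a real model is an empirical question this sketch does not address.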

Taxonomy

Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Image and Signal Denoising Methods

Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings