FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong

TL;DR
FasterCache is a training-free method that accelerates video diffusion model inference by dynamically reusing intermediate features across timesteps and exploiting the redundancy in classifier-free guidance (CFG), achieving significant speedups while keeping video quality comparable to the baseline.
Contribution
The paper introduces FasterCache, a novel training-free approach that speeds up inference of video diffusion models through dynamic feature reuse and CFG-Cache, which reuses conditional and unconditional outputs.
Findings
Achieves a 1.67× speedup on Vchitect-2.0
Maintains video quality comparable to the baseline
Outperforms existing acceleration methods
Abstract
In this paper, we present FasterCache, a novel training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation. By analyzing existing cache-based methods, we observe that directly reusing adjacent-step features degrades video quality due to the loss of subtle variations. We further perform a pioneering investigation of the acceleration potential of classifier-free guidance (CFG) and reveal significant redundancy between conditional and unconditional features within the same timestep. Capitalizing on these observations, we introduce FasterCache to substantially accelerate diffusion-based video generation. Our key contributions include a dynamic feature reuse strategy that preserves both feature distinction and temporal continuity, and CFG-Cache, which optimizes the reuse of conditional and unconditional outputs…
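The two mechanisms named in the abstract can be illustrated with a toy sketch. This summary does not give FasterCache's actual reuse rules, so everything below is an assumption: `extrapolate`, `FeatureReuseCache`, `CFGCache`, the fixed `interval`, the constant weight `w`, and the guidance `scale` are hypothetical names and parameters, and features are plain Python lists standing in for tensors. The feature cache extrapolates from the last two computed features instead of copying the previous one verbatim, and the CFG shortcut caches the conditional-minus-unconditional difference from a full step and reuses it on skipped steps. Both are simple ways to exploit the redundancy the abstract describes, not the paper's method.

```python
from typing import Callable, List, Optional

Vector = List[float]  # stand-in for a feature tensor (assumption)


def extrapolate(prev: Vector, prev2: Vector, w: float) -> Vector:
    """Estimate the next feature as prev + w * (prev - prev2).

    Adding a weighted difference of the two most recent computed features,
    rather than copying `prev` as-is, keeps some of the step-to-step
    variation that plain adjacent-step reuse throws away.
    """
    return [a + w * (a - b) for a, b in zip(prev, prev2)]


class FeatureReuseCache:
    """Hypothetical feature cache: run the expensive block every `interval`
    steps and extrapolate its output on the steps in between."""

    def __init__(self, interval: int = 2, w: float = 0.5) -> None:
        self.interval = interval
        self.w = w  # constant here; a schedule over timesteps is also plausible
        self.history: List[Vector] = []  # last two computed features
        self.computed = 0
        self.reused = 0

    def __call__(self, step: int, x: Vector,
                 block: Callable[[Vector], Vector]) -> Vector:
        # Compute on full steps, or until two features exist to extrapolate from.
        if step % self.interval == 0 or len(self.history) < 2:
            feat = block(x)
            self.history = (self.history + [feat])[-2:]
            self.computed += 1
            return feat
        self.reused += 1
        return extrapolate(self.history[-1], self.history[-2], self.w)


class CFGCache:
    """Hypothetical CFG reuse: on full steps evaluate both branches and cache
    delta = cond - uncond; on other steps evaluate only the conditional
    branch and approximate uncond as cond - delta."""

    def __init__(self, interval: int = 2, scale: float = 7.5) -> None:
        self.interval = interval
        self.scale = scale
        self.delta: Optional[Vector] = None
        self.uncond_calls = 0

    def __call__(self, step: int, x: Vector,
                 model_cond: Callable[[Vector], Vector],
                 model_uncond: Callable[[Vector], Vector]) -> Vector:
        cond = model_cond(x)
        if step % self.interval == 0 or self.delta is None:
            uncond = model_uncond(x)
            self.uncond_calls += 1
            self.delta = [c - u for c, u in zip(cond, uncond)]
        else:
            # Skip the unconditional pass and reuse the cached difference.
            uncond = [c - d for c, d in zip(cond, self.delta)]
        # Standard classifier-free guidance combination.
        return [u + self.scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    feats = FeatureReuseCache(interval=2, w=0.5)
    cfg = CFGCache(interval=2, scale=7.5)
    for t in range(10):
        x = [float(t)]
        feats(t, x, lambda v: [2.0 * v[0]])
        cfg(t, x, lambda v: [1.0], lambda v: [0.0])
    print(feats.computed, feats.reused, cfg.uncond_calls)  # 6 4 5
```

With `interval=2` over ten steps, the unconditional branch runs five times instead of ten. The feature cache computes six times rather than five because it needs two computed features before it can extrapolate.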
Taxonomy
Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Image and Signal Denoising Methods
Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
