Efficient Video Diffusion Models: Advancements and Challenges
Shitong Shao, Lichen Bai, Pengfei Wan, James Kwok, and Zeke Xie

TL;DR
This survey reviews efficient video diffusion models, categorizing methods to reduce inference costs and discussing open challenges for practical deployment.
Contribution
First comprehensive survey of efficient video diffusion: it organizes existing methods into four paradigms and analyzes the algorithmic trends within each.
Findings
Four main paradigms: step distillation, efficient attention, model compression, and cache/trajectory optimization.
Analysis of how design choices reduce the number of function evaluations and the per-step overhead.
Discussion of open challenges like quality preservation and hardware-software co-design.
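The two cost levers named in the findings (number of function evaluations and per-step overhead) can be illustrated with a back-of-the-envelope cost model. The sketch below is not taken from the survey: the function names, the latent grid sizes, and the "4 * tokens^2 * dim" FLOP estimate for full self-attention are all illustrative assumptions.

```python
def attention_flops(frames: int, height: int, width: int, dim: int) -> int:
    """Rough FLOPs for one full self-attention layer (assumed cost model).

    Counts only the QK^T product and the attention-weighted sum over V,
    each approximated as 2 * tokens^2 * dim FLOPs.
    """
    tokens = frames * height * width  # spatial-temporal token count
    return 4 * tokens**2 * dim


def sampling_cost(num_steps: int, per_step_cost: int, evals_per_step: int = 1) -> int:
    """Total cost = number of function evaluations * cost of one evaluation."""
    return num_steps * evals_per_step * per_step_cost


# Hypothetical latent sizes: one 32x32 frame versus a 16-frame clip.
image_cost = attention_flops(frames=1, height=32, width=32, dim=64)
video_cost = attention_flops(frames=16, height=32, width=32, dim=64)

# 16x more tokens gives 16^2 = 256x more attention FLOPs per step.
token_growth_ratio = video_cost // image_cost

# Fewer steps (for example via step distillation) scale total cost linearly.
baseline_total = sampling_cost(num_steps=50, per_step_cost=video_cost)
few_step_total = sampling_cost(num_steps=4, per_step_cost=video_cost)
step_speedup = baseline_total / few_step_total
```

Under these assumptions, step distillation acts on `num_steps`, while efficient attention, model compression, and caching act on `per_step_cost`; the two reductions multiply.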
Abstract
Video diffusion models have rapidly become the dominant paradigm for high-fidelity generative video synthesis, but their practical deployment remains constrained by severe inference costs. Compared with image generation, video synthesis compounds computation across spatial-temporal token growth and iterative denoising, making attention and memory traffic major bottlenecks in real-world settings. This survey provides a systematic and deployment-oriented review of efficient video diffusion models. We propose a unified categorization that organizes existing methods into four main paradigms: step distillation, efficient attention, model compression, and cache/trajectory optimization. Building on this categorization, we analyze the algorithmic trends of each paradigm and examine how different design choices target two core objectives: reducing the number…
