Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models
Yiming Wu, Zhenghao Chen, Huan Wang, Dong Xu

TL;DR
This paper proposes a novel pruning method for Video Diffusion Models that preserves individual content and motion dynamics, significantly reducing inference time while maintaining high video quality.
Contribution
It combines a pruning scheme that removes redundant blocks mainly from the shallower layers with an Individual Content and Motion Dynamics (ICMD) consistency loss, producing a lightweight VDM variant called VDMini.
Findings
Achieves up to 2.5x speedup in video generation tasks.
Maintains high-quality video output on multiple benchmarks.
Effectively preserves motion and content dynamics in pruned models.
Abstract
High computational cost and slow inference are major obstacles to deploying Video Diffusion Models (VDMs). To overcome this, we introduce a new VDM compression approach that uses individual content and motion dynamics preserved pruning together with a consistency loss. First, we empirically observe that deeper VDM layers are crucial for maintaining the quality of motion dynamics (e.g., coherence of the entire video), while shallower layers focus more on individual content (e.g., individual frames). Therefore, we prune redundant blocks from the shallower layers while preserving more of the deeper layers, resulting in a lightweight VDM variant called VDMini. Moreover, we propose an Individual Content and Motion Dynamics (ICMD) Consistency Loss so that VDMini achieves generation performance comparable to that of the larger VDM. In particular, we…
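The depth-dependent pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the excerpt does not say how redundant blocks are identified or how many are removed per layer, so the `Block` type, the `importance` scores, and the linear `keep_ratio` schedule below are all assumptions made for illustration. The sketch only shows the stated principle of pruning more blocks in shallow stages than in deep ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str          # hypothetical block identifier
    depth: int         # 0 = shallowest stage
    importance: float  # hypothetical score; higher means less redundant


def keep_ratio(depth, max_depth, shallow_keep=0.5, deep_keep=1.0):
    """Fraction of blocks kept at a stage, interpolated linearly
    from shallow_keep (depth 0) to deep_keep (max_depth).
    The linear schedule is an assumption, not taken from the paper."""
    if max_depth == 0:
        return deep_keep
    t = depth / max_depth
    return shallow_keep + t * (deep_keep - shallow_keep)


def select_blocks(blocks, shallow_keep=0.5, deep_keep=1.0):
    """Return the blocks to keep, pruning the lowest-importance
    blocks within each stage and pruning more in shallow stages."""
    max_depth = max(b.depth for b in blocks)
    kept = []
    for d in range(max_depth + 1):
        stage = [b for b in blocks if b.depth == d]
        if not stage:
            continue
        ratio = keep_ratio(d, max_depth, shallow_keep, deep_keep)
        n_keep = max(1, round(len(stage) * ratio))  # always keep at least one block
        ranked = sorted(stage, key=lambda b: b.importance, reverse=True)
        keep_names = {b.name for b in ranked[:n_keep]}
        # Preserve the original block order within the stage.
        kept.extend(b for b in stage if b.name in keep_names)
    return kept


# Toy model: 3 stages with 4 blocks each and made-up importance scores.
scores = [0.9, 0.1, 0.5, 0.3]
demo_blocks = [
    Block(name=f"s{d}.b{i}", depth=d, importance=s)
    for d in range(3)
    for i, s in enumerate(scores)
]
kept = select_blocks(demo_blocks)
print([b.name for b in kept])
```

In this toy setup the shallowest stage keeps half of its blocks, the middle stage keeps three quarters, and the deepest stage is left intact, which mirrors the qualitative claim that deeper layers matter more for motion dynamics. A real pipeline would follow pruning with fine-tuning, for which the abstract proposes the ICMD consistency loss.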
Taxonomy
Topics: Video Analysis and Summarization · Advanced Vision and Imaging · Human Motion and Animation
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Pruning · Diffusion
