Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models

Yiming Wu; Zhenghao Chen; Huan Wang; Dong Xu

arXiv:2411.18375·cs.CV·August 6, 2025

TL;DR

This paper proposes a novel pruning method for Video Diffusion Models that preserves individual content and motion dynamics, significantly reducing inference time while maintaining high video quality.

Contribution

It introduces a new pruning approach combined with an ICMD loss to create a lightweight VDM variant called VDMini, enhancing efficiency without sacrificing performance.

Findings

01 Achieves up to 2.5x speedup in video generation tasks.

02 Maintains high-quality video output on multiple benchmarks.

03 Effectively preserves motion and content dynamics in pruned models.

Abstract

The high computational cost and slow inference time are major obstacles to deploying Video Diffusion Models (VDMs). To overcome this, we introduce a new Video Diffusion Model compression approach that combines individual content and motion dynamics preserved pruning with a consistency loss. First, we empirically observe that deeper VDM layers are crucial for maintaining the quality of motion dynamics (e.g., coherence of the entire video), while shallower layers are more focused on individual content (e.g., individual frames). Therefore, we prune redundant blocks from the shallower layers while preserving more of the deeper layers, resulting in a lightweight VDM variant called VDMini. Moreover, we propose an Individual Content and Motion Dynamics (ICMD) Consistency Loss so that VDMini achieves generation performance comparable to that of the larger VDM. In particular, we…
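The depth-aware pruning idea in the abstract (remove more blocks from shallow layers, keep more in deep layers) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the linear keep-ratio schedule, the per-block importance scores, and the names `keep_ratios` and `select_blocks` are all hypothetical, chosen only to show the shape of such a selection step.

```python
def keep_ratios(num_layers, shallow_keep=0.5, deep_keep=1.0):
    """Linearly interpolate a keep ratio per layer, shallow to deep.

    Assumption: a linear schedule; the paper's actual allocation is not
    given in the abstract.
    """
    if num_layers == 1:
        return [deep_keep]
    step = (deep_keep - shallow_keep) / (num_layers - 1)
    return [shallow_keep + i * step for i in range(num_layers)]


def select_blocks(scores_per_layer, shallow_keep=0.5, deep_keep=1.0):
    """Return, for each layer, the sorted indices of the blocks to keep.

    scores_per_layer[i][j] is a hypothetical importance score for block j
    of layer i, where layer 0 is the shallowest. Higher means more important.
    """
    ratios = keep_ratios(len(scores_per_layer), shallow_keep, deep_keep)
    kept = []
    for scores, ratio in zip(scores_per_layer, ratios):
        # Always keep at least one block so no layer disappears entirely.
        n_keep = max(1, round(ratio * len(scores)))
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        kept.append(sorted(ranked[:n_keep]))
    return kept


# Three layers with four blocks each; scores are made-up illustration values.
example_scores = [
    [0.9, 0.1, 0.4, 0.3],  # shallowest layer: keep ratio 0.5
    [0.2, 0.8, 0.5, 0.6],  # middle layer: keep ratio 0.75
    [0.7, 0.3, 0.9, 0.1],  # deepest layer: keep ratio 1.0
]
kept_blocks = select_blocks(example_scores)
```

With these made-up scores, the shallowest layer keeps two of its four blocks while the deepest keeps all four. In practice the scores would come from some sensitivity measure on the real model, which the abstract does not specify.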

Taxonomy

Topics: Video Analysis and Summarization · Advanced Vision and Imaging · Human Motion and Animation

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Pruning · Diffusion