Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

Sihyun Yu; Weili Nie; De-An Huang; Boyi Li; Jinwoo Shin; Anima Anandkumar

arXiv:2403.14148 · cs.CV · March 22, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a content-motion latent diffusion model (CMD) that efficiently generates high-quality videos by decomposing videos into content frames and motion latents, significantly reducing computational costs and improving quality.

Contribution

The paper presents a novel autoencoder that encodes videos into content and motion components, and leverages pretrained image diffusion models for efficient video generation.

Findings

01 CMD samples videos 7.7× faster than prior methods.

02 CMD achieves an FVD score of 212.7 on WebVid-10M (lower is better), outperforming the previous state of the art.

03 CMD reduces computational cost while maintaining high-quality video generation.

Abstract

Video diffusion models have recently made great progress in generation quality, but are still limited by the high memory and computational requirements. This is because current video diffusion models often attempt to process high-dimensional videos directly. To tackle this issue, we propose content-motion latent diffusion model (CMD), a novel efficient extension of pretrained image diffusion models for video generation. Specifically, we propose an autoencoder that succinctly encodes a video as a combination of a content frame (like an image) and a low-dimensional motion latent representation. The former represents the common content, and the latter represents the underlying motion in the video, respectively. We generate the content frame by fine-tuning a pretrained image diffusion model, and we generate the motion latent representation by training a new lightweight diffusion model. A…
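To make the decomposition concrete, here is a minimal toy sketch in Python of splitting a video into one content frame plus a small motion vector. It is not the paper's autoencoder: the softmax-weighted average used for the content frame, the per-frame mean residual used as the motion latent, and the names `encode` and `frame_scores` are all hypothetical stand-ins for components the abstract describes as learned.

```python
import math


def softmax(scores):
    """Turn arbitrary per-frame scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode(video, frame_scores):
    """Toy encoder (assumption, not the paper's model).

    video: list of T frames, each a flat list of P pixel values.
    Returns (content, motion): one image-like frame of P values and
    a low-dimensional vector with one value per frame.
    """
    weights = softmax(frame_scores)
    num_pixels = len(video[0])
    # Content frame: weighted average of all frames, pixel by pixel.
    content = [
        sum(w * frame[i] for w, frame in zip(weights, video))
        for i in range(num_pixels)
    ]
    # Motion latent stand-in: mean deviation of each frame from the content frame.
    motion = [
        sum(frame[i] - content[i] for i in range(num_pixels)) / num_pixels
        for frame in video
    ]
    return content, motion


# Three 4-pixel frames whose brightness increases over time.
video = [[0.0] * 4, [1.0] * 4, [2.0] * 4]
content, motion = encode(video, frame_scores=[0.0, 0.0, 0.0])

# Raw video holds T * P values; the toy encoding holds P + T values.
print(len(video) * len(video[0]), len(content) + len(motion))  # prints: 12 7
```

In this toy setup the encoded size grows as P + T instead of T × P, which illustrates why modelling a content frame and a compact motion representation separately can be cheaper than modelling every pixel of every frame.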

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques

Methods: Diffusion · Latent Diffusion Model