VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Haoxin Chen; Yong Zhang; Xiaodong Cun; Menghan Xia; Xintao Wang; Chao Weng; Ying Shan

arXiv:2401.09047·cs.CV·January 18, 2024·1 cites

TL;DR

VideoCrafter2 introduces a novel training scheme that leverages low-quality videos and synthesized high-quality images to develop high-quality video diffusion models without requiring large-scale high-quality datasets.

Contribution

The paper proposes a new training approach extending Stable Diffusion, enabling high-quality video generation from low-quality data by finetuning spatial modules with high-quality images.

Findings

01. High-quality video models can be trained using low-quality videos and synthesized images.

02. Finetuning spatial modules improves video quality without motion degradation.

03. The method outperforms existing approaches in picture quality, motion, and concept composition.
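
The idea of finetuning only the spatial modules can be illustrated with a minimal sketch. It assumes that temporal modules are held frozen during the image finetuning stage, which this summary implies but does not state outright, and that temporal parameters can be identified by a substring in their names. The `Param` class, the function `select_spatial_for_finetuning`, and all parameter names are hypothetical and are not taken from the VideoCrafter2 code.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Stand-in for a model parameter; it only tracks the trainable flag."""
    name: str
    requires_grad: bool = True


def select_spatial_for_finetuning(params, temporal_marker="temporal"):
    """Freeze parameters whose name contains temporal_marker.

    All other parameters are treated as spatial and left trainable.
    Returns the list of parameters that remain trainable.
    """
    trainable = []
    for p in params:
        p.requires_grad = temporal_marker not in p.name
        if p.requires_grad:
            trainable.append(p)
    return trainable


# Hypothetical parameter names for a video UNet extended from an image model.
params = [
    Param("down.0.spatial_attn.to_q.weight"),
    Param("down.0.temporal_attn.to_q.weight"),
    Param("mid.resnet.conv1.weight"),
    Param("mid.temporal_conv.weight"),
]

trainable = select_spatial_for_finetuning(params)
print([p.name for p in trainable])
# prints ['down.0.spatial_attn.to_q.weight', 'mid.resnet.conv1.weight']
```

In a real framework the same selection would typically be applied to the model's actual parameters, with only the trainable subset passed to the optimizer, so that updates from high-quality images cannot alter the motion learned from videos.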

Abstract

Text-to-video generation aims to produce a video based on a given prompt. Recently, several commercial video models have been able to generate plausible videos with minimal noise, excellent details, and high aesthetic scores. However, these models rely on large-scale, well-filtered, high-quality videos that are not accessible to the community. Many existing research works, which train models using the low-quality WebVid-10M dataset, struggle to generate high-quality videos because the models are optimized to fit WebVid-10M. In this work, we explore the training scheme of video models extended from Stable Diffusion and investigate the feasibility of leveraging low-quality videos and synthesized high-quality images to obtain a high-quality video model. We first analyze the connection between the spatial and temporal modules of video models and the distribution shift to low-quality videos.…

Code & Models

Repositories

Models

🤗 ReySajju742/VideoCrafter (model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Advanced Vision and Imaging

Methods: Diffusion