VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, Ying Shan

TL;DR
VideoCrafter2 introduces a novel training scheme that leverages low-quality videos and synthesized high-quality images to develop high-quality video diffusion models without requiring large-scale high-quality datasets.
Contribution
The paper proposes a new training approach extending Stable Diffusion, enabling high-quality video generation from low-quality data by finetuning spatial modules with high-quality images.
Findings
High-quality video models can be trained using low-quality videos and synthesized images.
Finetuning spatial modules improves video quality without motion degradation.
The method outperforms existing approaches in picture quality, motion, and concept composition.
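To illustrate what "finetuning spatial modules" could look like in practice, here is a minimal sketch that decides which parameters stay trainable during the image finetuning stage. It is not the authors' code. The naming convention (temporal layers carrying the substring `temporal` in their parameter names), the example parameter names, and the helper `trainable_flags` are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def trainable_flags(param_names: Iterable[str],
                    temporal_tag: str = "temporal") -> Dict[str, bool]:
    """Map each parameter name to True (finetune) or False (freeze).

    Assumption: temporal modules are identifiable by `temporal_tag`
    appearing in the parameter name; everything else is treated as spatial.
    """
    return {name: temporal_tag not in name for name in param_names}


# Hypothetical parameter names for a video UNet extended from an image model.
example_names = [
    "unet.down.0.spatial_attn.to_q.weight",
    "unet.down.0.temporal_attn.to_q.weight",
    "unet.mid.resblock.conv.weight",
    "unet.mid.temporal_conv.weight",
]

flags = trainable_flags(example_names)
spatial = [n for n, keep in flags.items() if keep]    # updated on images
temporal = [n for n, keep in flags.items() if not keep]  # kept frozen
```

In a PyTorch model, flags like these would typically be applied by iterating over `model.named_parameters()` and setting `requires_grad` on each parameter, so that only the spatial weights receive gradient updates from the high-quality images.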
Abstract
Text-to-video generation aims to produce a video based on a given prompt. Recently, several commercial video models have been able to generate plausible videos with minimal noise, excellent details, and high aesthetic scores. However, these models rely on large-scale, well-filtered, high-quality videos that are not accessible to the community. Many existing research works, which train models using the low-quality WebVid-10M dataset, struggle to generate high-quality videos because the models are optimized to fit WebVid-10M. In this work, we explore the training scheme of video models extended from Stable Diffusion and investigate the feasibility of leveraging low-quality videos and synthesized high-quality images to obtain a high-quality video model. We first analyze the connection between the spatial and temporal modules of video models and the distribution shift to low-quality videos.…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Advanced Vision and Imaging
Methods: Diffusion
