VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

Haoxin Chen; Menghan Xia; Yingqing He; Yong Zhang; Xiaodong Cun; Shaoshu Yang; Jinbo Xing; Yaofang Liu; Qifeng Chen; Xintao Wang; Chao Weng; Ying Shan

arXiv:2310.19512·cs.CV·October 31, 2023·37 cites

Open Access 3 Repos 1 Models

TL;DR

VideoCrafter1 introduces two open-source diffusion models for high-quality text-to-video and image-to-video generation, enabling realistic, cinematic videos and content-preserving video synthesis from images.

Contribution

The paper presents the first open-source I2V foundation model and a high-quality T2V model, advancing open-source video generation capabilities.

Findings

01 The T2V model generates realistic videos at a resolution of 1024 × 576, outperforming other open-source T2V models in quality.

02 The I2V model preserves the content, structure, and style of the reference image.

03 Both models are open-source, supporting community research and development.

Abstract

Video generation has increasingly gained interest in both academia and industry. Although commercial tools can generate plausible videos, there is a limited number of open-source models available for researchers and engineers. In this work, we introduce two diffusion models for high-quality video generation, namely text-to-video (T2V) and image-to-video (I2V) models. T2V models synthesize a video based on a given text input, while I2V models incorporate an additional image input. Our proposed T2V model can generate realistic and cinematic-quality videos with a resolution of $1024 \times 576$, outperforming other open-source T2V models in terms of quality. The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style. This model is the first open-source I2V foundation model capable of…

Code & Models

Models

ReySajju742/VideoCrafter (Hugging Face model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization

Methods: Diffusion · Contrastive Language-Image Pre-training