MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

Weimin Wang, Jiawei Liu, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng

arXiv:2401.04468·cs.CV·January 10, 2024·2 cites

TL;DR

MagicVideo-V2 is an end-to-end multi-stage system that generates high-quality, aesthetically pleasing videos from text descriptions, integrating multiple modules for improved fidelity and smoothness.

Contribution

It introduces a novel multi-stage architecture combining text-to-image, motion, reference embedding, and interpolation modules for high-quality video synthesis.

Findings

1. Outperforms leading Text-to-Video systems in user evaluations.
2. Produces high-resolution, smooth, and aesthetically pleasing videos.
3. Demonstrates significant improvements in video fidelity and quality.

Abstract

The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field. In this work, we introduce MagicVideo-V2, which integrates a text-to-image model, a video motion generator, a reference image embedding module, and a frame interpolation module into an end-to-end video generation pipeline. Benefiting from these architectural designs, MagicVideo-V2 can generate aesthetically pleasing, high-resolution videos with remarkable fidelity and smoothness. In a large-scale user evaluation, it outperforms leading Text-to-Video systems such as Runway, Pika 1.0, Morph, Moon Valley, and the Stable Video Diffusion model.
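
The abstract names four modules chained into one pipeline. The sketch below illustrates only the order in which data flows between them, under loud assumptions: every stage function (`text_to_image`, `embed_reference`, `generate_motion`, `interpolate`, `generate_video`) is hypothetical, frames are toy lists of floats rather than images, and plain linear blending stands in for a learned frame interpolation model. None of it reflects MagicVideo-V2's actual networks or parameters.

```python
from typing import List

# Toy stand-in for an image: a flat list of pixel intensities in [0, 1].
Frame = List[float]


def text_to_image(prompt: str, size: int = 4) -> Frame:
    # Hypothetical T2I stage: derive a deterministic toy "image" from the prompt text.
    seed = sum(ord(c) for c in prompt)
    return [((seed * (i + 1)) % 256) / 255.0 for i in range(size)]


def embed_reference(image: Frame) -> Frame:
    # Hypothetical reference-embedding stage: an identity copy of the keyframe,
    # standing in for a learned embedding that conditions the motion stage.
    return list(image)


def generate_motion(reference: Frame, num_frames: int = 4, step: float = 0.05) -> List[Frame]:
    # Hypothetical motion stage: frame t brightens the reference by t * step,
    # clamped to [0, 1], so every frame stays anchored to the reference.
    return [[min(1.0, max(0.0, p + t * step)) for p in reference] for t in range(num_frames)]


def interpolate(frames: List[Frame], factor: int = 2) -> List[Frame]:
    # Linear blending between consecutive frames; a simple stand-in for a learned
    # frame interpolation model. Output length is (len(frames) - 1) * factor + 1.
    out: List[Frame] = []
    for a, b in zip(frames, frames[1:]):
        for k in range(factor):
            w = k / factor
            out.append([(1 - w) * x + w * y for x, y in zip(a, b)])
    out.append(frames[-1])
    return out


def generate_video(prompt: str, num_frames: int = 4, factor: int = 2) -> List[Frame]:
    # Chain the four stages in the order the abstract lists them.
    keyframe = text_to_image(prompt)
    reference = embed_reference(keyframe)
    frames = generate_motion(reference, num_frames)
    return interpolate(frames, factor)


video = generate_video("a cat surfing a wave")
print(len(video))  # 7 frames: 4 motion frames upsampled with factor 2
```

Keeping each stage a separate function mirrors the multi-stage framing: in this sketch, any one stage could be swapped for a real model without changing the others, and the first output frame is exactly the text-to-image keyframe.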

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Advanced Vision and Imaging

Methods: Diffusion