I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models

Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, Jingren Zhou

arXiv:2311.04145·cs.CV·November 8, 2023·22 cites


TL;DR

I2VGen-XL is a cascaded diffusion model for high-quality image-to-video synthesis. It uses a two-stage process that decouples semantic accuracy from detail refinement, and it draws on large-scale data.

Contribution

The paper introduces I2VGen-XL, a cascaded diffusion approach to image-to-video synthesis that handles semantic coherence and detail refinement in separate stages and leverages extensive datasets.

Findings

01 Outperforms current top methods in semantic accuracy and visual quality.

02 Achieves high-resolution videos at 1280×720 with improved continuity.

03 Demonstrates effectiveness across diverse datasets.

Abstract

Video synthesis has recently made remarkable strides, benefiting from the rapid development of diffusion models. However, it still encounters challenges in terms of semantic accuracy, clarity, and spatio-temporal continuity. These challenges primarily arise from the scarcity of well-aligned text-video data and the complex inherent structure of videos, which make it difficult for a model to ensure semantic and qualitative excellence simultaneously. In this report, we propose a cascaded I2VGen-XL approach that enhances model performance by decoupling these two factors and ensures the alignment of the input data by utilizing static images as a form of crucial guidance. I2VGen-XL consists of two stages: i) the base stage guarantees coherent semantics and preserves content from input images by using two hierarchical encoders, and ii) the refinement stage enhances the video's details by incorporating an…
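The two-stage control flow described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names `base_stage`, `refinement_stage`, and `cascade` are hypothetical, the base stage simply copies the input image instead of running a diffusion sampler, and the refinement stage is replaced by nearest-neighbour upsampling. It shows only how a cascade hands a low-resolution result to a second stage, not the authors' method.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame: rows of pixel values
Video = List[Frame]        # a video: a list of frames


def base_stage(image: Frame, num_frames: int) -> Video:
    """Hypothetical stand-in for the base stage.

    A real model would sample a low-resolution video conditioned on the
    image; this toy only copies the image so content is trivially preserved.
    """
    return [[row[:] for row in image] for _ in range(num_frames)]


def refinement_stage(video: Video, scale: int) -> Video:
    """Hypothetical stand-in for the refinement stage.

    Swapped-in technique: nearest-neighbour upsampling by an integer factor,
    used here only to show the resolution increase between stages.
    """
    refined: Video = []
    for frame in video:
        new_frame: Frame = []
        for row in frame:
            # Repeat each pixel `scale` times horizontally...
            wide_row = [px for px in row for _ in range(scale)]
            # ...and each widened row `scale` times vertically.
            new_frame.extend(wide_row[:] for _ in range(scale))
        refined.append(new_frame)
    return refined


def cascade(image: Frame, num_frames: int, scale: int) -> Video:
    """Run the two toy stages in sequence: base first, then refinement."""
    low_res = base_stage(image, num_frames)
    return refinement_stage(low_res, scale)
```

As a usage example, `cascade(image, num_frames=3, scale=2)` on a 2×2 image returns three 4×4 frames. With an assumed 320×180 base output and `scale=4`, the same arithmetic yields the 1280×720 resolution listed in the findings; the base resolution is an assumption and is not stated in this page.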

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Cell Image Analysis Techniques

Methods: Diffusion · Balanced Selection