I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, Jingren Zhou

TL;DR
I2VGen-XL is a cascaded diffusion model for image-to-video synthesis. It decouples semantic accuracy from detail refinement in a two-stage process and draws on large-scale data.
Contribution
The paper introduces I2VGen-XL, a cascaded diffusion approach to image-to-video synthesis that splits generation into a semantic stage and a detail-refinement stage and leverages extensive datasets.
Findings
Outperforms current top methods in semantic accuracy and visual quality.
Achieves high-resolution videos at 1280×720 with improved continuity.
Demonstrates effectiveness across diverse datasets.
Abstract
Video synthesis has recently made remarkable strides benefiting from the rapid development of diffusion models. However, it still encounters challenges in terms of semantic accuracy, clarity and spatio-temporal continuity. They primarily arise from the scarcity of well-aligned text-video data and the complex inherent structure of videos, making it difficult for the model to simultaneously ensure semantic and qualitative excellence. In this report, we propose a cascaded I2VGen-XL approach that enhances model performance by decoupling these two factors and ensures the alignment of the input data by utilizing static images as a form of crucial guidance. I2VGen-XL consists of two stages: i) the base stage guarantees coherent semantics and preserves content from input images by using two hierarchical encoders, and ii) the refinement stage enhances the video's details by incorporating an…
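The two-stage cascade described in the abstract can be pictured as a simple data flow. The sketch below is only an illustration under stated assumptions: it tracks clip shapes and performs no diffusion, and the names Clip, base_stage, refinement_stage and cascade, the 16-frame default, and the 448×256 base resolution are all hypothetical. Placing the 1280×720 target (taken from the Findings) in the refinement stage is likewise an assumption about where the resolution increase happens.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Clip:
    """Shape-only stand-in for a video tensor (holds no pixel data)."""
    num_frames: int
    width: int
    height: int
    history: Tuple[str, ...] = ()


def base_stage(num_frames: int = 16, size: Tuple[int, int] = (448, 256)) -> Clip:
    # Hypothetical base stage: would produce a semantically coherent,
    # lower-resolution clip conditioned on the input image.
    # The default size is an assumption, not a value from the source.
    return Clip(num_frames, size[0], size[1], ("base",))


def refinement_stage(clip: Clip, target: Tuple[int, int] = (1280, 720)) -> Clip:
    # Hypothetical refinement stage: would enhance details; here it only
    # records the higher target resolution and keeps the frame count.
    return Clip(clip.num_frames, target[0], target[1], clip.history + ("refine",))


def cascade(num_frames: int = 16) -> Clip:
    # The cascade feeds the base-stage output into the refinement stage.
    return refinement_stage(base_stage(num_frames))


if __name__ == "__main__":
    print(cascade())
```

Keeping the stages as separate functions mirrors the decoupling the abstract describes: each stage can be swapped or tuned without touching the other.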
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Cell Image Analysis Techniques
Methods: Diffusion · Balanced Selection
