Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans

TL;DR
Imagen Video is a text-conditional system that generates high-definition video with a cascade of video diffusion models. It combines a base video model with spatial and temporal super-resolution models and uses progressive distillation for fast sampling, producing high-fidelity, controllable videos in diverse artistic styles.
Contribution
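The paper presents a cascade of video diffusion models for high-definition text-to-video generation. Its design choices include fully-convolutional spatial and temporal super-resolution models at certain resolutions, the v-parameterization of diffusion models, and progressive distillation with classifier-free guidance for fast, high-quality sampling.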
Findings
High fidelity video generation from text prompts
Ability to generate diverse and artistic videos
Fast sampling enabled by progressive distillation
Abstract
We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design decisions such as the choice of fully-convolutional temporal and spatial super-resolution models at certain resolutions, and the choice of the v-parameterization of diffusion models. In addition, we confirm and transfer findings from previous work on diffusion-based image generation to the video generation setting. Finally, we apply progressive distillation to our video models with classifier-free guidance for fast, high quality sampling. We find Imagen Video not only capable of generating videos…
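The abstract names two ingredients that are easy to state concretely: the v-parameterization and classifier-free guidance. The following is a minimal sketch of the standard formulas for both, not the paper's implementation. It assumes a variance-preserving noise schedule (alpha^2 + sigma^2 = 1), and the function names and toy values are hypothetical, chosen only for illustration.

```python
import math

def v_target(x, eps, alpha, sigma):
    """v-parameterization target: v = alpha * eps - sigma * x (elementwise)."""
    return [alpha * e - sigma * xi for xi, e in zip(x, eps)]

def x_from_v(z, v, alpha, sigma):
    """Recover the clean sample from a noisy z and v: x = alpha * z - sigma * v.

    Exact only when alpha**2 + sigma**2 == 1 (assumed schedule).
    """
    return [alpha * zi - sigma * vi for zi, vi in zip(z, v)]

def guided(cond, uncond, w):
    """Classifier-free guidance: (1 + w) * cond - w * uncond.

    w = 0 returns the conditional prediction unchanged.
    """
    return [(1.0 + w) * c - w * u for c, u in zip(cond, uncond)]

# Toy stand-ins for a clean sample and Gaussian noise (hypothetical values).
x = [0.5, -1.0, 2.0]
eps = [0.1, 0.3, -0.7]

# Pick a point on the schedule via an angle so alpha^2 + sigma^2 = 1.
phi = 0.4
alpha, sigma = math.cos(phi), math.sin(phi)

# Forward diffusion: z = alpha * x + sigma * eps.
z = [alpha * xi + sigma * e for xi, e in zip(x, eps)]

# A perfect v-prediction lets us invert back to x.
v = v_target(x, eps, alpha, sigma)
x_rec = x_from_v(z, v, alpha, sigma)
```

In this sketch a model would be trained to predict `v` from `z`, and `guided` would be applied to the conditional and unconditional predictions at each sampling step; how Imagen Video wires these into its cascade is described in the paper itself.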
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Advanced Image Processing Techniques
Methods: Balanced Selection · Diffusion
