Imagen Video: High Definition Video Generation with Diffusion Models

Jonathan Ho; William Chan; Chitwan Saharia; Jay Whang; Ruiqi Gao; Alexey Gritsenko; Diederik P. Kingma; Ben Poole; Mohammad Norouzi; David J. Fleet; Tim Salimans

arXiv:2210.02303 · cs.CV · October 6, 2022 · 346 cites

TL;DR

Imagen Video is a high-definition, text-conditional video generation system built on a cascade of diffusion models. It achieves high fidelity, controllability, and diverse artistic outputs by scaling up a base video model with interleaved spatial and temporal super-resolution models, and it uses progressive distillation to make sampling faster.

Contribution

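The paper presents a cascade of diffusion models for high-definition text-to-video generation, together with the design choices behind it: fully convolutional spatial and temporal super-resolution models at certain resolutions, the v-parameterization of diffusion models, and progressive distillation with classifier-free guidance for higher quality and faster sampling.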
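To make "interleaved spatial and temporal super-resolution" concrete, here is a minimal shape-bookkeeping sketch of such a cascade. It only tracks tensor dimensions, not the diffusion models themselves. The `VideoShape` class, the base resolution, and the stage list are hypothetical values chosen for illustration; they are not the configuration reported in the paper. The point it shows is that spatial stages grow height and width, while temporal stages grow the frame count and frame rate together so the clip duration is preserved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    frames: int
    height: int
    width: int
    fps: float

    @property
    def seconds(self) -> float:
        # Clip duration; temporal super-resolution should leave this unchanged.
        return self.frames / self.fps


def spatial_sr(shape: VideoShape, factor: int) -> VideoShape:
    # Spatial super-resolution: upsample every frame; frame count and fps are unchanged.
    return VideoShape(shape.frames, shape.height * factor, shape.width * factor, shape.fps)


def temporal_sr(shape: VideoShape, factor: int) -> VideoShape:
    # Temporal super-resolution: fill in intermediate frames, raising frame count
    # and fps by the same factor so the duration is preserved.
    return VideoShape(shape.frames * factor, shape.height, shape.width, shape.fps * factor)


def run_cascade(base: VideoShape, stages: list[tuple[str, int]]) -> list[VideoShape]:
    # Apply each stage to the previous stage's output, recording every intermediate shape.
    ops = {"spatial": spatial_sr, "temporal": temporal_sr}
    shapes = [base]
    for kind, factor in stages:
        shapes.append(ops[kind](shapes[-1], factor))
    return shapes


# Hypothetical base resolution and stage list, for illustration only.
BASE = VideoShape(frames=16, height=24, width=40, fps=3.0)
STAGES = [("temporal", 2), ("spatial", 2), ("spatial", 4),
          ("temporal", 2), ("spatial", 2), ("temporal", 2), ("spatial", 2)]

if __name__ == "__main__":
    for shape in run_cascade(BASE, STAGES):
        print(shape, f"{shape.seconds:.2f}s")
```

Splitting the work this way lets each model in the cascade solve a smaller problem (one resolution or frame-rate jump at a time) instead of generating the full-resolution video in one pass.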

Findings

01. High-fidelity video generation from text prompts
02. Ability to generate diverse and artistic videos
03. Fast sampling enabled by progressive distillation
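The page does not spell out how progressive distillation speeds up sampling. The sketch below follows the general idea of the technique: a student is trained so that one of its deterministic DDIM steps lands where two of the teacher's steps land, and repeating this halves the number of sampling steps each round. Everything here is an assumption for illustration: the cosine schedule, the step counts, and `toy_teacher`, which stands in for a trained (and, per the abstract, classifier-free guided) video model. The `distillation_target` function solves for the clean-sample prediction that lets the student cover both teacher steps at once.

```python
import numpy as np


def alpha_sigma(t):
    # Assumed variance-preserving cosine schedule (alpha**2 + sigma**2 == 1), t in (0, 1].
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def ddim_step(z_t, x_hat, t, s):
    # Deterministic DDIM update from time t to an earlier time s, given a clean-sample prediction.
    a_t, s_t = alpha_sigma(t)
    a_s, s_s = alpha_sigma(s)
    eps_hat = (z_t - a_t * x_hat) / s_t
    return a_s * x_hat + s_s * eps_hat


def distillation_target(teacher, z_t, t, t_mid, t_end):
    # Run two teacher DDIM steps: t -> t_mid -> t_end.
    z_mid = ddim_step(z_t, teacher(z_t, t), t, t_mid)
    z_end = ddim_step(z_mid, teacher(z_mid, t_mid), t_mid, t_end)
    # Solve for the x-prediction that makes a single student step t -> t_end reach z_end.
    a_t, s_t = alpha_sigma(t)
    a_e, s_e = alpha_sigma(t_end)
    ratio = s_e / s_t
    return (z_end - ratio * z_t) / (a_e - ratio * a_t)


def halving_schedule(start_steps, min_steps):
    # Each distillation round halves the number of sampling steps.
    steps = [start_steps]
    while steps[-1] // 2 >= min_steps:
        steps.append(steps[-1] // 2)
    return steps


def toy_teacher(z, t):
    # Hypothetical stand-in for a trained denoiser; returns a clean-sample prediction.
    return 0.5 * z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=3)
    print(distillation_target(toy_teacher, z, 0.8, 0.6, 0.4))
    print(halving_schedule(256, 8))
```

In a real training loop the student would be regressed onto `distillation_target` at many sampled times, then become the teacher for the next round.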

Abstract

We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design decisions such as the choice of fully-convolutional temporal and spatial super-resolution models at certain resolutions, and the choice of the v-parameterization of diffusion models. In addition, we confirm and transfer findings from previous work on diffusion-based image generation to the video generation setting. Finally, we apply progressive distillation to our video models with classifier-free guidance for fast, high quality sampling. We find Imagen Video not only capable of generating videos…
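The abstract names two ingredients without defining them: the v-parameterization and classifier-free guidance. Below is a minimal NumPy sketch of both, assuming a variance-preserving noising process z = alpha * x + sigma * eps with alpha**2 + sigma**2 = 1 (the cosine schedule is an assumption, not taken from the source). Under the v-parameterization the network predicts v = alpha * eps - sigma * x, and both the clean sample and the noise can be recovered from it. The guidance form `pred_uncond + scale * (pred_cond - pred_uncond)` is one common convention; the source does not state which form or scale Imagen Video uses.

```python
import numpy as np


def alpha_sigma(t):
    # Assumed variance-preserving cosine schedule: alpha**2 + sigma**2 == 1.
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def v_target(x, eps, alpha, sigma):
    # Regression target under the v-parameterization.
    return alpha * eps - sigma * x


def x_from_v(z, v, alpha, sigma):
    # Clean-sample prediction recovered from a v-prediction (needs alpha**2 + sigma**2 == 1).
    return alpha * z - sigma * v


def eps_from_v(z, v, alpha, sigma):
    # Noise prediction recovered from the same v-prediction.
    return sigma * z + alpha * v


def guide(pred_cond, pred_uncond, scale):
    # Classifier-free guidance: scale == 1 returns the conditional prediction,
    # scale > 1 extrapolates further away from the unconditional one.
    return pred_uncond + scale * (pred_cond - pred_uncond)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, eps = rng.normal(size=(2, 4))
    alpha, sigma = alpha_sigma(0.7)
    z = alpha * x + sigma * eps  # forward (noising) process
    v = v_target(x, eps, alpha, sigma)
    print(np.allclose(x_from_v(z, v, alpha, sigma), x))
    print(np.allclose(eps_from_v(z, v, alpha, sigma), eps))
```

Because `x_from_v` is affine in v for a fixed z, applying guidance to v-predictions gives the same result as applying it to the recovered x-predictions, so for this linear combination the choice of space does not matter.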

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Advanced Image Processing Techniques

Methods: Balanced Selection · Diffusion