MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Vikram Voleti; Alexia Jolicoeur-Martineau; Christopher Pal

arXiv:2205.09853·cs.CV·October 14, 2022·46 cites

TL;DR

MCVD introduces a versatile probabilistic diffusion framework that unifies multiple video synthesis tasks such as prediction, generation, and interpolation, achieving state-of-the-art results with simple architectures.

Contribution

The paper proposes a single diffusion-based model, trained with random masking of the past and future frames, that handles several video synthesis tasks without task-specific retraining.

Findings

01. Achieves state-of-the-art results on standard video prediction and interpolation benchmarks.

02. Generates diverse, high-quality video frames.

03. Trains in 1-12 days using at most 4 GPUs.

Abstract

Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data is difficult. Furthermore, existing prediction frameworks are typically not capable of simultaneously handling other video-related tasks such as unconditional generation or interpolation. In this work, we devise a general-purpose framework called Masked Conditional Video Diffusion (MCVD) for all of these video synthesis tasks using a probabilistic conditional score-based denoising diffusion model, conditioned on past and/or future frames. We train the model in a manner where we randomly and independently mask all the past frames or all the future frames. This novel but straightforward setup allows us to train a single model that is capable of executing a broad range of video tasks, specifically:…
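The masking scheme described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the masking probability of 0.5, the use of zeroed frames as the "masked" value, and the representation of frames as flat lists of floats are all assumptions made for the example. The task labels in `task_for` follow the natural reading of the setup (whichever block is hidden determines what the model must synthesize) and are not quoted from the truncated text above.

```python
import random


def sample_masks(rng, p_mask_past=0.5, p_mask_future=0.5):
    """Independently decide whether to hide the whole past / whole future block.

    The probabilities are hypothetical defaults chosen for illustration.
    """
    mask_past = rng.random() < p_mask_past
    mask_future = rng.random() < p_mask_future
    return mask_past, mask_future


def task_for(mask_past, mask_future):
    """Name the task implied by which conditioning blocks are hidden."""
    if mask_past and mask_future:
        return "unconditional generation"
    if mask_future:
        return "future prediction"  # only past frames are visible
    if mask_past:
        return "past prediction"  # only future frames are visible
    return "interpolation"  # both past and future frames are visible


def build_condition(past, future, mask_past, mask_future):
    """Return the conditioning frames, blanking an entire block when masked.

    Assumption: a masked block is replaced by all-zero frames of the same shape.
    """
    def blank(block):
        return [[0.0] * len(frame) for frame in block]

    cond_past = blank(past) if mask_past else [list(f) for f in past]
    cond_future = blank(future) if mask_future else [list(f) for f in future]
    return cond_past + cond_future


if __name__ == "__main__":
    rng = random.Random(0)
    past = [[0.1, 0.2], [0.3, 0.4]]    # two toy "frames" of two pixels each
    future = [[0.5, 0.6], [0.7, 0.8]]
    for _ in range(4):
        mp, mf = sample_masks(rng)
        print(task_for(mp, mf), build_condition(past, future, mp, mf))
```

During training, a denoising network would receive the noisy current frames together with the output of `build_condition`, so a single set of weights sees all four conditioning patterns. At test time, the caller picks the task by fixing the two mask flags instead of sampling them.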

Code & Models

Videos

MCVD - Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Model Reduction and Neural Networks

Methods: Diffusion