CCVS: Context-aware Controllable Video Synthesis

Guillaume Le Moing; Jean Ponce; Cordelia Schmid

arXiv:2107.08037 · cs.CV · October 27, 2021 · 30 citations

TL;DR

This paper presents CCVS, a self-supervised method for controllable video synthesis. It improves realism and spatial resolution by conditioning generation on contextual information (for temporal continuity) and on ancillary information (for fine control), combining autoregressive prediction with adversarial training of the autoencoder.

Contribution

It introduces a self-supervised framework for controllable video synthesis that conditions on context and on multimodal ancillary information, and that uses a learnable optical flow module to enforce spatio-temporal consistency.

Findings

01. Achieves high-quality video synthesis with strong spatial and temporal consistency.

02. Demonstrates flexibility in controlling synthesis through multimodal ancillary inputs.

03. Outperforms existing methods on multiple benchmarks.

Abstract

This presentation introduces a self-supervised learning approach to the synthesis of new video clips from old ones, with several new key elements for improved spatial resolution and realism: It conditions the synthesis process on contextual information for temporal continuity and ancillary information for fine control. The prediction model is doubly autoregressive, in the latent space of an autoencoder for forecasting, and in image space for updating contextual information, which is also used to enforce spatio-temporal consistency through a learnable optical flow module. Adversarial training of the autoencoder in the appearance and temporal domains is used to further improve the realism of its output. A quantizer inserted between the encoder and the transformer in charge of forecasting future frames in latent space (and its inverse inserted between the transformer and the decoder) adds…
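The quantizer placement and latent-space forecasting described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The codebook values, the function names (`quantize`, `dequantize`, `rollout`) and the `toy_predictor` standing in for the transformer are all hypothetical, and plain Python tuples replace learned encoder features. The quantizer maps each feature vector to the index of its nearest codebook entry, the stand-in predictor samples future indices one at a time while feeding each sample back as input, and the inverse quantizer maps indices back to vectors that a decoder would consume.

```python
import random

# Hypothetical toy codebook: 4 entries of dimension 2 (a real one is learned).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector` (squared L2)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(vector, codebook[k]))


def dequantize(index, codebook):
    """Inverse of the quantizer: look up the vector for a token index."""
    return list(codebook[index])


def toy_predictor(tokens):
    """Stand-in for the transformer (assumption): favours repeating the last token."""
    probs = [0.1] * len(CODEBOOK)
    probs[tokens[-1]] = 0.7
    return probs


def rollout(context_tokens, next_token_probs, steps, rng):
    """Sample `steps` future tokens, appending each sample to the model input."""
    tokens = list(context_tokens)  # copy so the caller's context is untouched
    for _ in range(steps):
        probs = next_token_probs(tokens)
        sampled = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(sampled)
    return tokens[len(context_tokens):]


# Pretend these are encoder outputs for three context positions.
features = [(0.1, -0.2), (0.9, 0.2), (0.8, 1.1)]
context = [quantize(f, CODEBOOK) for f in features]  # [0, 1, 3]

# Forecast five tokens, then map them back to vectors for a decoder.
future = rollout(context, toy_predictor, steps=5, rng=random.Random(0))
decoded = [dequantize(t, CODEBOOK) for t in future]
```

Because the predictor outputs a distribution over discrete tokens rather than a single continuous value, running `rollout` with different seeds can yield different continuations of the same context.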

Code & Models

Repositories

16lemoing/ccvs (PyTorch, official)

Videos

CCVS: Context-aware Controllable Video Synthesis · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Advanced Image Processing Techniques