CCVS: Context-aware Controllable Video Synthesis
Guillaume Le Moing, Jean Ponce, Cordelia Schmid

TL;DR
This paper presents CCVS, a self-supervised method for controllable video synthesis. It improves realism and spatial resolution by conditioning on contextual information (for temporal continuity) and ancillary information (for fine control), and combines autoregressive prediction, adversarial training, and multimodal control.
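The sketch below illustrates the basic autoregressive idea on its own. It extends a sequence of discrete latent tokens one step at a time and feeds each prediction back in as context for the next step. This is not the authors' code. The token ids, the `rollout` function, the `window` parameter, and `toy_predictor` (a stand-in for the learned transformer) are all hypothetical.

```python
from typing import Callable, List

# Hypothetical stand-in for the forecasting transformer: maps a context of
# token ids to the id of the next token.
Predictor = Callable[[List[int]], int]


def rollout(context: List[int], n_new: int, predict: Predictor,
            window: int) -> List[int]:
    """Autoregressively extend `context` by `n_new` tokens.

    Each new token is predicted from (at most) the last `window` tokens,
    then appended so that it becomes context for the following step.
    Returns only the newly generated tokens.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    tokens = list(context)
    for _ in range(n_new):
        next_tok = predict(tokens[-window:])  # bounded context window
        tokens.append(next_tok)               # feed the prediction back in
    return tokens[len(context):]


def toy_predictor(ctx: List[int]) -> int:
    # Toy rule for illustration only: next token = sum of context mod 16.
    return sum(ctx) % 16
```

For example, `rollout([1, 2, 3], 3, toy_predictor, window=2)` returns `[5, 8, 13]`. The bounded window mirrors the general constraint that a model can only attend to a fixed-length context, so older tokens eventually drop out.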
Contribution
It introduces a self-supervised framework for controllable video synthesis that incorporates contextual conditioning, multimodal ancillary information, and a learnable optical flow module that enforces spatio-temporal consistency.
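An optical flow field is typically applied by warping: each output pixel samples the source frame at a displaced location. The sketch below shows generic backward warping with bilinear interpolation on a small grayscale grid. It assumes the flow is already given (in CCVS it would be produced by the learnable module), clamps samples to the image border, and uses hypothetical names (`backward_warp`, `bilinear_sample`). It does not reproduce the paper's flow module or how its output is combined with decoded frames.

```python
from typing import List, Tuple

Image = List[List[float]]                # grayscale image, indexed [row][col]
Flow = List[List[Tuple[float, float]]]   # per-pixel (dx, dy) displacement


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample `img` at continuous coordinates (x, y) with bilinear weights."""
    h, w = len(img), len(img[0])
    # Clamp sample coordinates to the image border (border padding).
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Interpolate horizontally on the two neighbouring rows, then vertically.
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def backward_warp(src: Image, flow: Flow) -> Image:
    """Output pixel (r, c) takes the src value at (c + dx, r + dy)."""
    return [[bilinear_sample(src, c + flow[r][c][0], r + flow[r][c][1])
             for c in range(len(src[0]))]
            for r in range(len(src))]
```

Backward warping (pulling values from the source) is a common choice because every output pixel receives exactly one value, and bilinear sampling keeps the result differentiable with respect to the flow, which is what makes a flow module trainable end to end.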
Findings
Achieves high-quality video synthesis with strong spatial and temporal consistency.
Demonstrates flexibility in controlling synthesis through multimodal ancillary inputs.
Outperforms existing methods on multiple benchmarks.
Abstract
This presentation introduces a self-supervised learning approach to the synthesis of new video clips from old ones, with several new key elements for improved spatial resolution and realism: It conditions the synthesis process on contextual information for temporal continuity and ancillary information for fine control. The prediction model is doubly autoregressive, in the latent space of an autoencoder for forecasting, and in image space for updating contextual information, which is also used to enforce spatio-temporal consistency through a learnable optical flow module. Adversarial training of the autoencoder in the appearance and temporal domains is used to further improve the realism of its output. A quantizer inserted between the encoder and the transformer in charge of forecasting future frames in latent space (and its inverse inserted between the transformer and the decoder) adds…
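The abstract places a quantizer between the encoder and the forecasting transformer, and its inverse between the transformer and the decoder. A minimal sketch, assuming a standard nearest-neighbour vector quantizer with a fixed codebook (as in VQ-VAE-style models): `quantize` maps each latent vector to the index of its closest code vector, and `dequantize` looks the vectors back up from their indices. Codebook learning and gradient handling are omitted, and all names here are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each continuous latent vector to the index of its nearest code."""
    if not codebook:
        raise ValueError("codebook must not be empty")
    return [min(range(len(codebook)),
                key=lambda k: squared_distance(z, codebook[k]))
            for z in latents]


def dequantize(indices: List[int], codebook: List[Vector]) -> List[Vector]:
    """Inverse map: replace each index with its code vector."""
    return [codebook[k] for k in indices]
```

Because the transformer then sees only integer indices, forecasting reduces to predicting one of a finite set of codebook entries per position; the inverse lookup turns those predictions back into vectors the decoder can consume.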
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Advanced Image Processing Techniques
