Controllable Video Generation through Global and Local Motion Dynamics
Aram Davtyan, Paolo Favaro

TL;DR
GLASS is an unsupervised generative model that synthesizes realistic videos from a single image by learning global and local motion dynamics. The paper also introduces a new synthetic dataset for evaluation.
Contribution
The paper introduces GLASS, a novel method for controllable video generation using global and local action representations, and presents W-Sprites, a new synthetic dataset for action modeling.
Findings
GLASS can generate realistic videos from a single image.
It learns a more expressive action space than prior work.
The method performs well on both synthetic and real datasets.
Abstract
We present GLASS, a method for Global and Local Action-driven Sequence Synthesis. GLASS is a generative model that is trained on video sequences in an unsupervised manner and that can animate an input image at test time. The method learns to segment frames into foreground-background layers and to generate transitions of the foregrounds over time through a global and local action representation. Global actions are explicitly related to 2D shifts, while local actions are instead related to (both geometric and photometric) local deformations. GLASS uses a recurrent neural network to transition between frames and is trained through a reconstruction loss. We also introduce W-Sprites (Walking Sprites), a novel synthetic dataset with a predefined action space. We evaluate our method on both W-Sprites and real datasets, and find that GLASS is able to generate realistic video sequences from a…
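The layered composition described in the abstract (a foreground layer moved by a global 2D shift and pasted over a background, trained with a reconstruction loss) can be illustrated with a toy sketch. It assumes grayscale frames stored as nested lists, a soft foreground mask in [0, 1], a global action given as an integer pixel shift (dx, dy) with zero padding, and mean squared error as the reconstruction loss. The names `shift`, `composite` and `reconstruction_loss` are hypothetical. Local deformations, the segmentation network and the recurrent transition model are omitted, so this is not the authors' implementation.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale frame, indexed as img[row][col]


def shift(img: Image, dx: int, dy: int, fill: float = 0.0) -> Image:
    """Translate img by dx pixels right and dy pixels down, padding with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands at (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def composite(fg: Image, mask: Image, bg: Image, action: Tuple[int, int]) -> Image:
    """Shift the foreground and its mask by a global action, then alpha-blend over bg."""
    dx, dy = action
    fg_s = shift(fg, dx, dy)
    m_s = shift(mask, dx, dy)  # uncovered regions get mask 0, i.e. show the background
    return [
        [m * f + (1.0 - m) * b for m, f, b in zip(m_row, f_row, b_row)]
        for m_row, f_row, b_row in zip(m_s, fg_s, bg)
    ]


def reconstruction_loss(pred: Image, target: Image) -> float:
    """Mean squared error between a predicted frame and the ground-truth frame."""
    n = len(pred) * len(pred[0])
    return sum(
        (p - t) ** 2 for p_row, t_row in zip(pred, target) for p, t in zip(p_row, t_row)
    ) / n
```

For example, with a one-pixel foreground at row 1, column 1 and the action (1, 0), the composited frame has the foreground at row 1, column 2, and the loss against a target with the object at that position is 0. Keeping the background fixed while only the masked foreground moves mirrors the foreground-background decomposition the abstract describes.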
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation
