Self-Supervised Equivariant Scene Synthesis from Video

Cinjon Resnick; Or Litany; Cosmas Heiß; Hugo Larochelle; Joan Bruna; Kyunghyun Cho

arXiv:2102.00863 · cs.CV · February 2, 2021

TL;DR

This paper introduces a self-supervised method for scene understanding from video that automatically separates background, characters, and animations, enabling real-time manipulation and synthesis of unseen scene combinations.

Contribution

To the authors' knowledge, it is the first method to perform unsupervised extraction and synthesis of interpretable scene components (background, characters, and animations) from video data.

Findings

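01. Demonstrated on three datasets: Moving MNIST with backgrounds, 2D video game sprites, and Fashion Modeling.

02. Enables real-time manipulation of image encodings to create unseen combinations of scene components.

03. Achieves unsupervised, interpretable decomposition of scenes into background, characters, and animations.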

Abstract

We propose a self-supervised framework to learn scene representations from video that are automatically delineated into background, characters, and their animations. Our method capitalizes on moving characters being equivariant with respect to their transformation across frames and the background being constant with respect to that same transformation. After training, we can manipulate image encodings in real time to create unseen combinations of the delineated components. As far as we know, we are the first method to perform unsupervised extraction and synthesis of interpretable background, character, and animation. We demonstrate results on three datasets: Moving MNIST with backgrounds, 2D video game sprites, and Fashion Modeling.
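
The two properties the abstract relies on can be illustrated with a toy NumPy sketch. This is not the authors' model: the cyclic translation used as the frame-to-frame transformation, the helper names (`shift`, `compose`, `equivariance_error`, `invariance_error`), and the hand-built frames are all assumptions made for illustration. The sketch only shows that a moving character satisfies an equivariance check, a static background satisfies an invariance check, and the raw frame satisfies neither, which is why separating the components is useful.

```python
import numpy as np


def shift(img, dy, dx):
    """Hypothetical frame-to-frame transformation T: a cyclic translation."""
    return np.roll(img, shift=(dy, dx), axis=(0, 1))


def compose(background, character):
    """Paste the character (non-zero pixels) over the background."""
    return np.where(character > 0, character, background)


def equivariance_error(code_t, code_t1, dy, dx):
    """Mean squared gap between T(code at time t) and the code at time t+1."""
    return float(np.mean((shift(code_t, dy, dx) - code_t1) ** 2))


def invariance_error(code_t, code_t1):
    """Mean squared gap between the codes at time t and time t+1."""
    return float(np.mean((code_t - code_t1) ** 2))


rng = np.random.default_rng(0)
background = rng.uniform(0.1, 0.5, size=(16, 16))  # static scene, values below 1
character = np.zeros((16, 16))
character[2:5, 3:6] = 1.0                           # a 3x3 "sprite"

dy, dx = 4, 2                                       # assumed motion between frames
frame_t = compose(background, character)
frame_t1 = compose(background, shift(character, dy, dx))

# Stand-in "encodings": the character mask is read off each frame, and the
# background is taken as known. A learned encoder would have to infer both.
char_t = (frame_t == 1.0).astype(float)
char_t1 = (frame_t1 == 1.0).astype(float)

char_equiv = equivariance_error(char_t, char_t1, dy, dx)     # character moves with T
bg_invar = invariance_error(background, background)          # background ignores T
frame_equiv = equivariance_error(frame_t, frame_t1, dy, dx)  # whole frame: not equivariant
frame_invar = invariance_error(frame_t, frame_t1)            # whole frame: not invariant

# Recombining components gives a frame that never appeared in the "video".
new_background = rng.uniform(0.1, 0.5, size=(16, 16))
unseen_frame = compose(new_background, shift(character, 1, 7))
```

In this toy setup the character and background errors are exactly zero while both whole-frame errors are positive. In a trained model the analogous quantities would presumably serve as training signals on learned encodings rather than on pixels.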

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Computer Graphics and Visualization Techniques