Learned Equivariant Rendering without Transformation Supervision
Cinjon Resnick, Or Litany, Hugo Larochelle, Joan Bruna, Kyunghyun Cho

TL;DR
This paper introduces a self-supervised method that learns scene representations from video without transformation supervision. Once trained, the model supports real-time scene manipulation and rendering.
Contribution
It presents a novel framework that leverages object equivariance and background constancy to automatically delineate objects and backgrounds in videos.
Findings
Results demonstrated on moving MNIST with backgrounds
Allows real-time scene manipulation
No transformation supervision needed
Abstract
We propose a self-supervised framework to learn scene representations from video that are automatically delineated into objects and background. Our method relies on moving objects being equivariant with respect to their transformation across frames and the background being constant. After training, we can manipulate and render the scenes in real time to create unseen combinations of objects, transformations, and backgrounds. We show results on moving MNIST with backgrounds.
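The two assumptions in the abstract (objects are equivariant with respect to their transformation, the background is constant) can be illustrated with a small NumPy sketch. This is not the authors' model: the transformation is assumed to be a wrap-around translation, the "encoder" is a stand-in box blur chosen only because it commutes with translation, and the names `translate`, `circular_blur`, and `compose` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def translate(img, dy, dx):
    """Shift an image by (dy, dx) pixels with wrap-around."""
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

def circular_blur(img):
    """Stand-in encoder (assumption): a 3x3 box filter with wrap-around."""
    out = np.zeros_like(img, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += translate(img, dy, dx)
    return out / 9.0

def compose(obj, mask, background):
    """Paste the object over the background wherever mask == 1."""
    return mask * obj + (1.0 - mask) * background

rng = np.random.default_rng(0)
obj = np.zeros((16, 16))
obj[4:8, 4:8] = 1.0                      # toy "digit": a filled square
mask = (obj > 0).astype(float)
background = rng.random((16, 16))

# Equivariance: encoding the moved object equals moving the encoded object.
lhs = circular_blur(translate(obj, 2, 3))
rhs = translate(circular_blur(obj), 2, 3)
equivariance_gap = np.abs(lhs - rhs).max()

# Background constancy: two frames differ only where the object is or was.
moved, moved_mask = translate(obj, 2, 3), translate(mask, 2, 3)
frame0 = compose(obj, mask, background)
frame1 = compose(moved, moved_mask, background)
outside = (mask + moved_mask) == 0

# Recombination: the same moved object rendered over a different background.
new_background = rng.random((16, 16))
new_frame = compose(moved, moved_mask, new_background)
```

In this toy setting `equivariance_gap` is zero and the two frames agree on every pixel in `outside`. The last two lines mimic, in a very reduced form, what the abstract calls rendering unseen combinations of objects, transformations, and backgrounds.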
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
