Unsupervised Learning of Visual Structure using Predictive Generative Networks

William Lotter; Gabriel Kreiman; David Cox

arXiv:1511.06380 · cs.LG · January 21, 2016 · 83 citations

TL;DR

This paper demonstrates that deep neural networks trained to predict future video frames can learn internal representations of 3D objects, generalize to new tasks, and outperform models trained only with reconstruction loss.

Contribution

It introduces a predictive generative network framework that learns rich, transformation-tolerant representations through unsupervised future frame prediction.

Findings

01 Achieves state-of-the-art performance on the standard 'bouncing balls' sequence prediction dataset (Sutskever et al., 2009).

02 Learns representations that generalize to object classification.

03 Outperforms models trained only with a reconstruction loss in generalization.

Abstract

The ability to predict future states of the environment is a central pillar of intelligence. At its core, effective prediction requires an internal model of the world and an understanding of the rules by which the world changes. Here, we explore the internal models developed by deep neural networks trained using a loss based on predicting future frames in synthetic video sequences, using a CNN-LSTM-deCNN framework. We first show that this architecture can achieve excellent performance in visual sequence prediction tasks, including state-of-the-art performance in a standard 'bouncing balls' dataset (Sutskever et al., 2009). Using a weighted mean-squared error and adversarial loss (Goodfellow et al., 2014), the same architecture successfully extrapolates out-of-the-plane rotations of computer-generated faces. Furthermore, despite being trained end-to-end to predict only pixel-level…
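The abstract mentions a weighted combination of mean-squared error and an adversarial loss. A minimal sketch of how such a combined generator objective could be computed is given here in plain Python. The function names, the default weights, and the choice of the non-saturating generator term -log D(x) (one of the forms discussed by Goodfellow et al., 2014) are illustrative assumptions, not values or code taken from the paper.

```python
import math


def mse(pred, target):
    """Mean-squared error between two flattened frames of equal length."""
    if len(pred) != len(target):
        raise ValueError("frames must have the same number of pixels")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def adversarial_generator_loss(disc_prob, eps=1e-12):
    """Non-saturating generator term: -log D(predicted frame).

    disc_prob is the discriminator's probability that the predicted
    frame is real; eps guards against log(0).
    """
    return -math.log(max(disc_prob, eps))


def combined_loss(pred, target, disc_prob, mse_weight=1.0, adv_weight=0.1):
    """Weighted sum of pixel MSE and the adversarial term.

    The default weights are hypothetical placeholders, not the paper's.
    """
    return (mse_weight * mse(pred, target)
            + adv_weight * adversarial_generator_loss(disc_prob))


# Toy example: a 4-pixel predicted frame, its ground-truth next frame,
# and a discriminator that is undecided (probability 0.5).
pred = [0.2, 0.4, 0.6, 0.8]
target = [0.0, 0.5, 0.5, 1.0]
loss = combined_loss(pred, target, disc_prob=0.5)
print(loss)
```

In a sketch like this, the MSE term pulls the prediction toward the ground-truth next frame pixel by pixel, while the adversarial term penalizes predictions the discriminator judges unrealistic. The relative weighting controls the trade-off between the two.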

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques