Predicting the Future with Transformational States

Andrew Jaegle; Oleh Rybkin; Konstantinos G. Derpanis; Kostas Daniilidis

arXiv:1803.09760 · cs.CV · March 28, 2018 · 1 citation

Open Access

TL;DR

This paper introduces a neural network model that predicts future images by learning present states and their transformations, capturing realistic motion over multiple frames without adversarial training, and achieves competitive results on standard benchmarks.

Contribution

The paper presents an architecture that couples a two-component latent state (the present image state and the transformation between present and future states) with an RNN core, giving stable multi-frame image prediction without adversarial training.

Findings

01 Generates stable, realistic motion sequences over multiple frames.

02 Achieves prediction accuracy comparable to state-of-the-art methods.

03 Operates effectively on benchmarks like Moving MNIST, KTH, and UCF101.

Abstract

An intelligent observer looks at the world and sees not only what is, but what is moving and what can be moved. In other words, the observer sees how the present state of the world can transform in the future. We propose a model that predicts future images by learning to represent the present state and its transformation given only a sequence of images. To do so, we introduce an architecture with a latent state composed of two components designed to capture (i) the present image state and (ii) the transformation between present and future states, respectively. We couple this latent state with a recurrent neural network (RNN) core that predicts future frames by transforming past states into future states by applying the accumulated state transformation with a learned operator. We describe how this model can be integrated into an encoder-decoder convolutional neural network (CNN)…
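
The prediction loop the abstract describes can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the names LatentState, rnn_core, apply_operator, and rollout are hypothetical, the latent components are short vectors rather than CNN feature maps, the recurrent update is a plain tanh cell, and the "learned operator" is assumed to be an additive linear map. The encoder and decoder are omitted entirely.

```python
import math
from dataclasses import dataclass


@dataclass
class LatentState:
    """Two-component latent state (hypothetical stand-in for encoder output)."""
    state: list      # stands in for the encoded present image state
    transform: list  # stands in for the encoded present-to-future transformation


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_core(accumulated, transform, w_acc, w_in):
    """Toy recurrent update that accumulates the transformation over time.

    Assumption: a plain tanh cell; the paper's actual core may differ.
    """
    recurrent = matvec(w_acc, accumulated)
    driven = matvec(w_in, transform)
    return [math.tanh(r + d) for r, d in zip(recurrent, driven)]


def apply_operator(state, accumulated, operator):
    """Apply the accumulated transformation to a state.

    Assumption: the operator is an additive linear map, chosen for brevity.
    """
    shift = matvec(operator, accumulated)
    return [s + d for s, d in zip(state, shift)]


def rollout(latent, steps, w_acc, w_in, operator):
    """Predict `steps` future states by repeatedly transforming the past state."""
    accumulated = [0.0] * len(latent.transform)
    state = list(latent.state)
    predictions = []
    for _ in range(steps):
        accumulated = rnn_core(accumulated, latent.transform, w_acc, w_in)
        state = apply_operator(state, accumulated, operator)
        predictions.append(state)
    return predictions
```

With identity matrices for all three weight arguments, calling rollout(LatentState([1.0, 2.0], [0.0, 0.0]), 3, identity, identity, identity) leaves the state unchanged at every step, since a zero transformation accumulates to zero. A nonzero transform component moves the corresponding state component further at each step. In a full model, each predicted state would be passed through a decoder to produce an image.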

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Pose and Action Recognition