Decomposing Motion and Content for Natural Video Sequence Prediction

Ruben Villegas; Jimei Yang; Seunghoon Hong; Xunyu Lin; Honglak Lee

arXiv:1706.08033·cs.CV·January 9, 2018·416 cites

TL;DR

This paper introduces an end-to-end trainable neural network that decomposes motion and content to improve future-frame prediction in natural videos. It reports state-of-the-art results and learns the decomposition without separate training stages.

Contribution

The novel contribution is an end-to-end trainable architecture that separately models motion and content for video prediction, simplifying the prediction task and improving accuracy.

Findings

01 Achieves state-of-the-art performance on multiple video datasets.

02 Successfully decomposes motion and content without separate training.

03 Demonstrates effective pixel-level future frame prediction.

Abstract

We propose a deep neural network for the prediction of future frames in natural video sequences. To effectively handle complex evolution of pixels in videos, we propose to decompose the motion and content, two key components generating dynamics in videos. Our model is built upon the Encoder-Decoder Convolutional Neural Network and Convolutional LSTM for pixel-level prediction, which independently capture the spatial layout of an image and the corresponding temporal dynamics. By independently modeling motion and content, predicting the next frame reduces to converting the extracted content features into the next frame content by the identified motion features, which simplifies the task of prediction. Our model is end-to-end trainable over multiple time steps, and naturally learns to decompose motion and content without separate training. We evaluate the proposed network architecture on…
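
The data flow described in the abstract can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: random linear maps stand in for the learned convolutional encoder and decoder, a plain tanh recurrence stands in for the Convolutional LSTM, and feeding frame differences to the motion pathway is an assumption that the abstract does not state. All names and sizes (`motion_features`, `content_features`, `predict_next`, `rollout`, `H`, `W`, `D`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, D = 8, 8, 16  # toy frame size and feature width (hypothetical values)

# Random linear maps stand in for the learned convolutional layers.
W_motion = rng.normal(scale=0.1, size=(H * W, D))
W_state = rng.normal(scale=0.1, size=(D, D))
W_content = rng.normal(scale=0.1, size=(H * W, D))
W_decode = rng.normal(scale=0.1, size=(2 * D, H * W))


def motion_features(frames):
    """Summarise temporal dynamics from consecutive frame differences.

    Assumption: a simple tanh recurrence replaces the Convolutional LSTM.
    """
    state = np.zeros(D)
    for prev, cur in zip(frames[:-1], frames[1:]):
        diff = (cur - prev).reshape(-1)
        state = np.tanh(diff @ W_motion + state @ W_state)
    return state


def content_features(frame):
    """Encode the spatial layout of a single frame."""
    return np.tanh(frame.reshape(-1) @ W_content)


def predict_next(frames):
    """Combine motion and content features, then decode one future frame."""
    motion = motion_features(frames)
    content = content_features(frames[-1])
    logits = np.concatenate([motion, content]) @ W_decode
    # Sigmoid keeps predicted pixel values strictly inside (0, 1).
    return (1.0 / (1.0 + np.exp(-logits))).reshape(H, W)


def rollout(frames, steps):
    """Predict several steps ahead by feeding predictions back as inputs."""
    history = list(frames)
    predictions = []
    for _ in range(steps):
        nxt = predict_next(history)
        predictions.append(nxt)
        history.append(nxt)
    return predictions
```

Calling `rollout(frames, 3)` on a list of four 8×8 arrays returns three predicted 8×8 frames. Because the weights are random and untrained, the outputs are meaningless as images; the sketch only shows how separate motion and content pathways could be combined and unrolled over multiple time steps.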

Code & Models

Repositories

rubenvillegas/iclr2017mcnet
tf

Taxonomy

Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Advanced Vision and Imaging

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory