High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

Ruben Villegas; Arkanath Pathak; Harini Kannan; Dumitru Erhan; Quoc V. Le; Honglak Lee

arXiv:1911.01655 · cs.CV · November 6, 2019 · 20 cites

Open Access

TL;DR

This paper explores whether minimal inductive biases combined with large neural networks can effectively predict future video frames, achieving state-of-the-art results across diverse datasets without complex architectural assumptions.

Contribution

It presents the first large-scale empirical study on video prediction with minimal biases and demonstrates that large stochastic recurrent neural networks can outperform specialized models.

Findings

1. Achieved state-of-the-art performance on three diverse datasets.
2. Large models with minimal biases can effectively predict complex video dynamics.
3. Questioned the necessity of handcrafted inductive biases in video prediction.

Abstract

Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time. Previously proposed solutions require complex inductive biases inside network architectures with highly specialized computation, including segmentation masks, optical flow, and foreground and background separation. In this work, we question if such handcrafted architectures are necessary and instead propose a different approach: finding minimal inductive bias for video prediction while maximizing network capacity. We investigate this question by performing the first large-scale empirical study and demonstrate state-of-the-art performance by learning large models on three different datasets: one for modeling object interactions, one for modeling human motion, and one for modeling car driving.

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging