Learning to Generate Long-term Future via Hierarchical Prediction

Ruben Villegas; Jimei Yang; Yuliang Zou; Sungryull Sohn; Xunyu Lin; Honglak Lee

arXiv:1704.05831·cs.CV·January 9, 2018·180 cites

TL;DR

This paper introduces a hierarchical method for long-term video prediction that estimates high-level structures first, reducing error propagation and improving accuracy over existing recursive pixel-level prediction methods.

Contribution

The paper presents a novel hierarchical approach combining LSTM and analogy-based CNNs to predict future video frames without recursive pixel-level predictions, enhancing long-term prediction accuracy.
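The three-stage pipeline described above (estimate structure, predict how it evolves, then generate frames from one observed frame) can be sketched in a few lines. This is a toy sketch under stated assumptions, not the authors' implementation. All function names are hypothetical. A "frame" is a short list whose first two entries play the role of the high-level structure and whose remaining entries play the role of appearance. A constant-velocity extrapolator stands in for the LSTM. The analogy-based generator is assumed to take the form decode(enc(frame) + enc(target) - enc(ref)) with identity encoders. The last observed frame is assumed to be the reference frame.

```python
from typing import List

Vector = List[float]  # toy frame: [x, y, *appearance]; structure = [x, y]


def estimate_structure(frame: Vector) -> Vector:
    # Hypothetical stand-in for a pose/landmark estimator:
    # the "high-level structure" is just the first two entries.
    return frame[:2]


def predict_structures(history: List[Vector], steps: int) -> List[Vector]:
    # Stand-in for the LSTM: constant-velocity extrapolation.
    # Recursion happens only in low-dimensional structure space.
    prev, current = history[-2], history[-1]
    velocity = [b - a for a, b in zip(prev, current)]
    future = []
    for _ in range(steps):
        current = [c + v for c, v in zip(current, velocity)]
        future.append(current)
    return future


def generate_frame(ref_frame: Vector, ref_structure: Vector,
                   target_structure: Vector) -> Vector:
    # Assumed analogy form: decode(enc(frame) + enc(target) - enc(ref)).
    # With identity encoders this shifts the structure slots by the
    # structure difference and copies the appearance slots unchanged.
    delta = [t - r for t, r in zip(target_structure, ref_structure)]
    shifted = [f + d for f, d in zip(ref_frame, delta)]
    return shifted + ref_frame[len(delta):]


def hierarchical_predict(frames: List[Vector], steps: int) -> List[Vector]:
    structures = [estimate_structure(f) for f in frames]
    future_structures = predict_structures(structures, steps)
    ref_frame, ref_structure = frames[-1], structures[-1]
    # Every future frame is built from the same observed frame;
    # generated frames are never fed back in as inputs.
    return [generate_frame(ref_frame, ref_structure, s)
            for s in future_structures]
```

For example, `hierarchical_predict([[0.0, 0.0, 5.0, 7.0], [1.0, 2.0, 5.0, 7.0]], 3)` moves the structure slots along the extrapolated trajectory while the appearance slots `5.0, 7.0` are copied from the observed frame in every output. The design point the sketch isolates is that recursion happens only in `predict_structures`; `generate_frame` never consumes its own output.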

Findings

01 Outperforms state-of-the-art methods on the Human3.6M and Penn Action datasets

02 Effectively reduces error accumulation in long-term predictions

03 Demonstrates significant improvement in human action video forecasting

Abstract

We propose a hierarchical approach for making long-term predictions of future frames. To avoid inherent compounding errors in recursive pixel-level prediction, we propose to first estimate high-level structure in the input frames, then predict how that structure evolves in the future, and finally, by observing a single frame from the past together with the predicted high-level structure, construct the future frames without having to observe any of the pixel-level predictions. Long-term video prediction is difficult to perform by recurrently observing the predicted frames because small errors in pixel space amplify exponentially as predictions are made deeper into the future. Our approach prevents pixel-level error propagation by removing the need to observe the predicted frames. Our model is built with a combination of LSTM and analogy-based encoder-decoder convolutional…
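The abstract's argument about exponential error amplification can be made concrete with a toy error model. This is an illustrative assumption, not a result from the paper: each prediction step adds a fixed error `step_error`, and in recursive pixel-level prediction the error already present in the fed-back frame is multiplied by a hypothetical `gain` greater than 1. In the hierarchical setting each frame is generated from a real observed frame, so no pixel error is carried forward. The toy deliberately ignores errors in the predicted structure itself, which the hierarchical approach does not eliminate.

```python
from typing import List


def recursive_pixel_errors(step_error: float, gain: float, steps: int) -> List[float]:
    # Toy model: each predicted frame is fed back as input, so the error
    # already present is amplified by `gain` and a fresh `step_error` is added.
    errors, e = [], 0.0
    for _ in range(steps):
        e = gain * e + step_error
        errors.append(e)
    return errors


def hierarchical_pixel_errors(step_error: float, steps: int) -> List[float]:
    # Toy model: every frame is generated from a real observed frame, so no
    # pixel error is carried forward. Assumes perfect structure prediction.
    return [step_error] * steps
```

With `step_error=1.0` and `gain=2.0`, the recursive errors over five steps are 1, 3, 7, 15, 31, matching the closed form `step_error * (gain**k - 1) / (gain - 1)`, which grows geometrically in the step index `k`; the hierarchical errors stay at 1 for every step.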

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Image Enhancement Techniques