Learning to Generate Long-term Future via Hierarchical Prediction
Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee

TL;DR
This paper introduces a hierarchical method for long-term video prediction that estimates high-level structures first, reducing error propagation and improving accuracy over existing recursive pixel-level prediction methods.
Contribution
The paper presents a novel hierarchical approach combining LSTM and analogy-based CNNs to predict future video frames without recursive pixel-level predictions, enhancing long-term prediction accuracy.
Findings
Outperforms state-of-the-art on Human3.6M and Penn Action datasets
Effectively reduces error accumulation in long-term predictions
Demonstrates significant improvement in human action video forecasting
Abstract
We propose a hierarchical approach for making long-term predictions of future frames. To avoid inherent compounding errors in recursive pixel-level prediction, we propose to first estimate high-level structure in the input frames, then predict how that structure evolves in the future, and finally by observing a single frame from the past and the predicted high-level structure, we construct the future frames without having to observe any of the pixel-level predictions. Long-term video prediction is difficult to perform by recurrently observing the predicted frames because the small errors in pixel space exponentially amplify as predictions are made deeper into the future. Our approach prevents pixel-level error propagation from happening by removing the need to observe the predicted frames. Our model is built with a combination of LSTM and analogy based encoder-decoder convolutional…
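The three stages described in the abstract (estimate structure, predict how it evolves, generate frames from one observed frame plus the predicted structure) can be sketched as plain control flow. This is a minimal illustration, not the authors' model: `predict_hierarchical`, the `Pose` and `Frame` aliases, and the `toy_*` stand-ins are hypothetical names, the constant-velocity extrapolation replaces the LSTM, and a simple coordinate shift replaces the analogy-based encoder-decoder network. Passing the reference pose to the generator is also an assumption about how the analogy is formed.

```python
from typing import Callable, List, Sequence

Pose = List[float]   # hypothetical stand-in for high-level structure
Frame = List[float]  # hypothetical stand-in for an image


def predict_hierarchical(
    past_frames: Sequence[Frame],
    horizon: int,
    estimate_pose: Callable[[Frame], Pose],
    predict_poses: Callable[[Sequence[Pose], int], List[Pose]],
    generate_frame: Callable[[Frame, Pose, Pose], Frame],
) -> List[Frame]:
    # Stage 1: estimate high-level structure in the observed frames.
    past_poses = [estimate_pose(f) for f in past_frames]
    # Stage 2: predict how that structure evolves (an LSTM in the paper).
    future_poses = predict_poses(past_poses, horizon)
    # Stage 3: build every future frame from the SAME observed frame and a
    # predicted structure. No generated frame is ever fed back as input,
    # so pixel-level errors cannot compound across time steps.
    ref_frame, ref_pose = past_frames[-1], past_poses[-1]
    return [generate_frame(ref_frame, ref_pose, p) for p in future_poses]


# Toy stand-ins (assumptions for illustration only).
def toy_estimate_pose(frame: Frame) -> Pose:
    # Treat the first two values of a "frame" as its structure.
    return list(frame[:2])


def toy_predict_poses(past_poses: Sequence[Pose], horizon: int) -> List[Pose]:
    # Constant-velocity extrapolation in place of a learned sequence model.
    last, prev = past_poses[-1], past_poses[-2]
    velocity = [a - b for a, b in zip(last, prev)]
    return [[c + v * (k + 1) for c, v in zip(last, velocity)]
            for k in range(horizon)]


def toy_generate_frame(ref_frame: Frame, ref_pose: Pose, target_pose: Pose) -> Frame:
    # Analogy-style step: apply the change in structure to the reference
    # frame and keep its remaining content (appearance) untouched.
    shift = [t - r for t, r in zip(target_pose, ref_pose)]
    moved = [c + s for c, s in zip(ref_frame[:2], shift)]
    return moved + list(ref_frame[2:])


past = [[0.0, 0.0, 7.0], [1.0, 2.0, 7.0]]
future = predict_hierarchical(
    past, 3, toy_estimate_pose, toy_predict_poses, toy_generate_frame
)
```

In this sketch, errors can still accumulate in the predicted structure in stage 2, but stage 3 always starts from a real observed frame, which is the property the abstract credits for avoiding pixel-level error propagation.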
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Image Enhancement Techniques
