Hierarchical Model for Long-term Video Prediction
Peter Wang, Zhongxia Yan, Jeff Zhang

TL;DR
This paper introduces a hierarchical approach for long-term video prediction that estimates high-level structure first, then generates realistic frames using an analogy network, improving long-term prediction quality.
Contribution
The paper presents a hierarchical model, largely adopted from Villegas et al., that combines LSTMs and analogy-based convolutional auto-encoder networks with an adversarial loss for long-term video prediction.
Findings
Effective high-level structure prediction over long sequences
Improved realism in generated video frames
Demonstrated on Penn Action dataset with promising results
Abstract
Video prediction has been an active topic of research in the past few years. Many algorithms focus on pixel-level predictions, which generate results that blur and disintegrate within a few frames. In this project, we use a hierarchical approach for long-term video prediction. We first estimate high-level structure in the input frame, then predict how that structure evolves in the future. Finally, we use an image analogy network to recover a realistic image from the predicted structure. Our method is largely adopted from the work by Villegas et al. It combines LSTMs with analogy-based convolutional auto-encoder networks. Additionally, to generate more realistic frame predictions, we adopt an adversarial loss. We evaluate our method on the Penn Action dataset and demonstrate good results on high-level long-term structure prediction.
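The three stages described in the abstract (estimate structure, predict future structure, recover frames by analogy) can be illustrated with a minimal toy sketch. Everything named here is hypothetical and is not the authors' code: `predict_structure` uses constant-velocity extrapolation as a stand-in for the LSTM, and `encode_image`, `encode_structure`, and `decode` are trivial placeholders for the learned convolutional encoder and decoder networks. Frames and poses are flat lists of equal length purely so the feature arithmetic is easy to follow. The sketch assumes the analogy is formed in feature space, roughly "future frame is to current frame as future structure is to current structure", which is how the approach of Villegas et al. is usually described.

```python
from typing import List

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def encode_image(frame: Vector) -> Vector:
    # Hypothetical placeholder for a learned image encoder (identity here).
    return list(frame)


def encode_structure(pose: Vector) -> Vector:
    # Hypothetical placeholder for a learned structure encoder (scales by 2).
    return [2.0 * v for v in pose]


def decode(features: Vector) -> Vector:
    # Hypothetical placeholder for a learned decoder (identity here).
    return list(features)


def predict_structure(past_poses: List[Vector], horizon: int) -> List[Vector]:
    # Stand-in for the LSTM: extrapolate the last observed motion linearly.
    velocity = sub(past_poses[-1], past_poses[-2])
    current = past_poses[-1]
    predictions = []
    for _ in range(horizon):
        current = add(current, velocity)
        predictions.append(current)
    return predictions


def analogy_frame(frame_t: Vector, pose_t: Vector, pose_future: Vector) -> Vector:
    # Shift the image features by the change in structure features, then decode.
    structure_change = sub(encode_structure(pose_future), encode_structure(pose_t))
    return decode(add(encode_image(frame_t), structure_change))


def predict_video(frame_t: Vector, past_poses: List[Vector], horizon: int) -> List[Vector]:
    # Every future frame is built from the last observed frame, not from
    # earlier predicted frames, so pixel-level errors are not fed back in.
    future_poses = predict_structure(past_poses, horizon)
    return [analogy_frame(frame_t, past_poses[-1], p) for p in future_poses]


frames = predict_video([10.0, 20.0], [[0.0, 0.0], [1.0, 0.5]], horizon=3)
print(frames)
```

In this sketch only the structure sequence is rolled forward in time; each frame is then produced in a single step from the observed frame and the predicted structure. That separation is the intuition behind why a hierarchical model can avoid the blurring that accumulates in frame-by-frame pixel prediction.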
Taxonomy
Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
