Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction
Chaofan Ling, Junpei Zhong, Weihua Li

TL;DR
This paper introduces a multi-scale predictive coding model with encoder-decoder LSTM for improved future video frame prediction, integrating hierarchical predictions and enhanced training strategies to better capture temporal and spatial dependencies.
Contribution
It proposes a novel multi-scale predictive coding framework with encoder-decoder LSTM, incorporating hierarchical predictions and improved training methods for better video prediction.
Findings
Achieves strong performance on KTH, Moving MNIST, and Caltech Pedestrian datasets.
Effectively models temporal and spatial dependencies in video prediction.
Addresses training instability and long-term prediction errors.
Abstract
We present a multi-scale predictive coding model for future video frame prediction. Drawing inspiration from "Predictive Coding" theories in cognitive science, the model is updated by a combination of bottom-up and top-down information flows, which enhances the interaction between different network levels. However, traditional predictive coding models only predict what is happening hierarchically rather than predicting the future. To address this problem, our model employs a multi-scale (coarse-to-fine) approach, in which higher-level neurons generate coarser, lower-resolution predictions while lower-level neurons generate finer, higher-resolution predictions. In terms of network architecture, we directly incorporate the encoder-decoder network within the LSTM module and share the final encoded high-level semantic information across different network levels. This enables comprehensive…
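The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 2x2 average pooling, the mean-absolute-error measure, and every function name below are assumptions chosen for illustration. It only shows how one target frame could be turned into per-level targets of decreasing resolution, so that a higher level is compared against a coarser frame and a lower level against a finer one.

```python
# Illustrative sketch only; not the paper's code. All names are hypothetical.

def downsample2x(frame):
    """Halve resolution by averaging non-overlapping 2x2 blocks (assumed pooling)."""
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[i][j] + frame[i][j + 1] + frame[i + 1][j] + frame[i + 1][j + 1]) / 4.0
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


def build_pyramid(frame, num_levels):
    """Return targets ordered from finest (level 0) to coarsest (last level)."""
    pyramid = [frame]
    for _ in range(num_levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


def level_errors(predictions, targets):
    """Mean absolute error between prediction and target at each level."""
    errors = []
    for pred, tgt in zip(predictions, targets):
        total = sum(
            abs(p - t)
            for pred_row, tgt_row in zip(pred, tgt)
            for p, t in zip(pred_row, tgt_row)
        )
        errors.append(total / (len(tgt) * len(tgt[0])))
    return errors


# Toy 8x8 "next frame": a bright 4x4 square on a dark background.
frame = [[1.0 if 2 <= i < 6 and 2 <= j < 6 else 0.0 for j in range(8)] for i in range(8)]
targets = build_pyramid(frame, num_levels=3)
shapes = [(len(t), len(t[0])) for t in targets]

# Stand-in predictions: all-zero frames at each scale. A real model would
# produce these from its recurrent units; zeros are used here as a placeholder.
predictions = [[[0.0] * w for _ in range(h)] for h, w in shapes]
errors = level_errors(predictions, targets)
```

Here `targets` holds frames of size 8x8, 4x4, and 2x2. In a predictive coding network, per-level errors such as `errors` would be what flows bottom-up, while the predictions flow top-down; how the paper combines them is not specified in the text above.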
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Human Pose and Action Recognition
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
