Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction

Chaofan Ling; Junpei Zhong; Weihua Li

arXiv:2212.11642·cs.CV·October 10, 2023

TL;DR

This paper introduces a multi-scale predictive coding model with encoder-decoder LSTM for improved future video frame prediction, integrating hierarchical predictions and enhanced training strategies to better capture temporal and spatial dependencies.

Contribution

The paper proposes a multi-scale predictive coding framework built on an encoder-decoder LSTM, combining hierarchical predictions with improved training methods for video prediction.

Findings

01

Achieves strong performance on KTH, Moving MNIST, and Caltech Pedestrian datasets.

02

Effectively models temporal and spatial dependencies in video prediction.

03

Addresses training instability and long-term prediction errors.

Abstract

We present a multi-scale predictive coding model for future video frame prediction. Drawing inspiration from "Predictive Coding" theories in cognitive science, the model is updated by a combination of bottom-up and top-down information flows, which can enhance the interaction between different network levels. However, traditional predictive coding models only predict what is happening hierarchically rather than predicting the future. To address this problem, our model employs a multi-scale (coarse-to-fine) approach, where the higher-level neurons generate coarser predictions (lower resolution), while the lower-level neurons generate finer predictions (higher resolution). In terms of network architecture, we directly incorporate the encoder-decoder network within the LSTM module and share the final encoded high-level semantic information across different network levels. This enables comprehensive…
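The coarse-to-fine idea in the abstract can be illustrated with a small sketch. This is not the authors' network: the encoder-decoder LSTM is replaced by a placeholder that simply blends the last observed frame with the prediction handed down from the level above, and every name here (`downsample2x`, `upsample2x`, `build_pyramid`, `coarse_to_fine_predict`) is hypothetical. It only shows how higher levels work at lower resolution and how predictions flow top-down to finer levels.

```python
from typing import List

Frame = List[List[float]]  # a grayscale frame as rows of pixel values


def downsample2x(frame: Frame) -> Frame:
    """Halve the resolution by averaging each non-overlapping 2x2 block."""
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[r][c] + frame[r][c + 1]
             + frame[r + 1][c] + frame[r + 1][c + 1]) / 4.0
            for c in range(0, w - 1, 2)
        ]
        for r in range(0, h - 1, 2)
    ]


def upsample2x(frame: Frame) -> Frame:
    """Double the resolution by nearest-neighbour repetition."""
    out: Frame = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(frame: Frame, levels: int) -> List[Frame]:
    """Return [finest, ..., coarsest]; index 0 is the lowest network level."""
    pyramid = [frame]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


def coarse_to_fine_predict(last_frame: Frame, levels: int) -> List[Frame]:
    """Top-down pass: the coarsest level predicts first, then each finer
    level refines the upsampled prediction from the level above.

    ASSUMPTION: the per-level predictor is a stand-in (an equal blend of
    the top-down prediction and the last observed frame at that scale),
    not the encoder-decoder LSTM described in the paper.
    """
    observed = build_pyramid(last_frame, levels)
    predictions: List[Frame] = [[] for _ in range(levels)]
    predictions[-1] = observed[-1]  # coarsest level: copy-last-frame baseline
    for lvl in range(levels - 2, -1, -1):
        top_down = upsample2x(predictions[lvl + 1])
        predictions[lvl] = [
            [0.5 * t + 0.5 * o for t, o in zip(t_row, o_row)]
            for t_row, o_row in zip(top_down, observed[lvl])
        ]
    return predictions
```

Calling `coarse_to_fine_predict(frame, 3)` on a 4x4 frame yields predictions at 4x4, 2x2, and 1x1 resolution, ordered from the lowest (finest) level to the highest (coarsest). In a real model, each level's placeholder blend would be replaced by a learned recurrent unit.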

Code & Models

Repositories

ling-cf/mspn
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Human Pose and Action Recognition

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory