Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction

M. Akin Yilmaz; A. Murat Tekalp

arXiv:2008.06106·cs.CV·August 17, 2020

TL;DR

This paper compares feedforward and recurrent neural network architectures, and their associated training methods, for next frame prediction, highlighting the trade-off between prediction accuracy (PSNR) and computational cost.

Contribution

The paper analyzes a residual FCNN, a convolutional RNN (CRNN), and a convolutional LSTM (CLSTM) trained for next frame prediction with the mean square loss, and compares stateless and stateful training for the recurrent networks.

Findings

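01. The residual FCNN achieves the highest PSNR, at the expense of higher training and inference computational complexity.

02. The CRNN can be trained stably and very efficiently with stateful truncated backpropagation through time (BPTT) and achieves near real-time frame prediction.

03. Recurrent networks can be trained stably and are more computationally efficient at inference than the residual FCNN.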
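
The difference between stateless and stateful handling of the hidden state across truncated segments can be illustrated with a minimal sketch. It is not the authors' implementation: the toy scalar RNN, its weights, and the names `rnn_step` and `run_chunks` are all assumptions made for illustration, and only the forward pass is shown (in real stateful truncated BPTT the gradient would also be cut at each segment boundary).

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """One step of a toy scalar RNN (hypothetical weights): h' = tanh(w_h*h + w_x*x)."""
    return math.tanh(w_h * h + w_x * x)

def run_chunks(seq, chunk_len, stateful):
    """Process seq in segments of chunk_len and return the hidden state after each step."""
    h = 0.0
    states = []
    for start in range(0, len(seq), chunk_len):
        if not stateful:
            # Stateless: the hidden state is reset at every segment boundary.
            h = 0.0
        # Stateful: h carries over from the previous segment. In actual
        # training, backpropagation would stop here (truncation), but the
        # forward value of h is preserved.
        for x in seq[start:start + chunk_len]:
            h = rnn_step(h, x)
            states.append(h)
    return states

seq = [0.1, 0.4, -0.2, 0.3, 0.5, -0.1]           # made-up input sequence
full = run_chunks(seq, len(seq), stateful=True)   # one long segment
stateful = run_chunks(seq, 2, stateful=True)      # segments of 2, state kept
stateless = run_chunks(seq, 2, stateful=False)    # segments of 2, state reset
```

In this sketch the stateful run reproduces the hidden states of the full-sequence pass, while the stateless run diverges from it after the first segment boundary because the state is reset.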

Abstract

We analyze the performance of feedforward vs. recurrent neural network (RNN) architectures and associated training methods for learned frame prediction. To this effect, we trained a residual fully convolutional neural network (FCNN), a convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM) network for next frame prediction using the mean square loss. We performed both stateless and stateful training for recurrent networks. Experimental results show that the residual FCNN architecture performs the best in terms of peak signal to noise ratio (PSNR) at the expense of higher training and test (inference) computational complexity. The CRNN can be trained stably and very efficiently using the stateful truncated backpropagation through time procedure, and it requires an order of magnitude less inference runtime to achieve near real-time frame prediction with an acceptable…
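
The mean square loss and PSNR mentioned in the abstract are related by a standard formula, sketched below in plain Python. The function names `mse` and `psnr` and the peak value of 255 (8-bit pixel intensities) are assumptions for illustration; the abstract does not state the pixel range used in the paper.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def psnr(pred, target, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE). peak=255 assumes 8-bit frames."""
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / err)
```

Because PSNR is a monotonically decreasing function of the MSE, a network trained to minimize the mean square loss is also being trained to maximize PSNR on the same data.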
