STIP: A SpatioTemporal Information-Preserving and Perception-Augmented Model for High-Resolution Video Prediction
Zheng Chang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, and Wen Gao

TL;DR
This paper introduces STIP, a novel high-resolution video prediction model that preserves spatiotemporal information and enhances perceptual quality using a multi-grained auto-encoder, a spatiotemporal GRU, and a GAN-based perceptual loss.
Contribution
The paper proposes a new model combining MGST-AE, STGRU, and a perceptual loss to improve high-resolution video prediction performance.
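STGRU is the paper's own recurrent cell, and its equations are not given in this summary. As a hedged point of reference, the sketch below shows one common formulation of a plain, fully connected GRU update (biases omitted) to illustrate what a gated state transition does: the update gate decides how much of the previous state is kept and how much is overwritten by a candidate state. It is not STGRU. Convolutional variants (ConvGRU) replace the matrix products with convolutions. The function and parameter names (`gru_step`, `Wz`, `Uz`, and so on) are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Dense matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(*vs):
    # Element-wise sum of equally sized vectors.
    return [sum(t) for t in zip(*vs)]

def gru_step(x, h, params):
    """One generic GRU update (not STIP's STGRU); biases omitted for brevity."""
    Wz, Uz, Wr, Ur, Wh, Uh = (params[k] for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh"))
    z = [sigmoid(a) for a in add(matvec(Wz, x), matvec(Uz, h))]  # update gate
    r = [sigmoid(a) for a in add(matvec(Wr, x), matvec(Ur, h))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]                        # gated old state
    h_tilde = [math.tanh(a) for a in add(matvec(Wh, x), matvec(Uh, rh))]  # candidate
    # Convex blend: keep (1 - z) of the old state, write z of the candidate.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_tilde)]

# With all-zero weights both gates sit at 0.5 and the candidate is 0,
# so the new state is exactly half of the previous one.
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {k: zeros for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
print(gru_step([1.0, 2.0], [0.4, -0.8], params))  # [0.2, -0.4]
```

The gate makes information preservation a learned choice: driving `z` toward 0 carries the previous state forward almost unchanged, which is the kind of behavior a state-transition module aiming to avoid information loss can exploit.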
Findings
Outperforms state-of-the-art methods in visual quality
Preserves more spatiotemporal information during feature extraction
Achieves better perceptual quality with lower computational load
Abstract
Although recurrent neural network (RNN) based video prediction methods have made significant progress, their performance on high-resolution datasets is still far from satisfactory because of the information loss problem and the perception-insensitive mean squared error (MSE) based loss functions. In this paper, we propose a Spatiotemporal Information-Preserving and Perception-Augmented Model (STIP) to solve these two problems. To address the information loss problem, the proposed model preserves the spatiotemporal information of videos during both feature extraction and state transitions. First, a Multi-Grained Spatiotemporal Auto-Encoder (MGST-AE) is designed based on the X-Net structure. The proposed MGST-AE helps the decoders recall multi-grained information from the encoders in both the temporal and spatial domains. In this way,…
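The abstract attributes part of the problem to MSE-based losses being insensitive to perception. A standard toy argument (not taken from the paper) shows why: when the future is uncertain, the prediction that minimizes expected MSE is the pixel-wise average of the plausible futures, which is a blurry frame rather than any sharp one. The sketch assumes hypothetical 1-D "frames" containing a single bright pixel that moves either left or right with equal probability; the helper names are illustrative.

```python
def mse(a, b):
    # Mean squared error between two equally sized frames.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def frame_with_spot(pos, width=8):
    # Hypothetical 1-D frame: one bright pixel at `pos`, zeros elsewhere.
    return [1.0 if i == pos else 0.0 for i in range(width)]

# Two equally likely ground-truth futures: the spot moved left or right.
futures = [frame_with_spot(3), frame_with_spot(5)]

def expected_mse(pred):
    # Average MSE over the equally likely futures.
    return sum(mse(pred, f) for f in futures) / len(futures)

sharp_guess = frame_with_spot(3)                       # commits to one future
blurry_avg = [(a + b) / 2 for a, b in zip(*futures)]  # pixel-wise mean of both

print(expected_mse(sharp_guess))  # 0.125
print(expected_mse(blurry_avg))   # 0.0625
```

The blurry average halves the expected MSE even though it matches neither possible future, which is why MSE-trained predictors tend toward blur and why perception-oriented objectives, such as the GAN-based perceptual loss STIP uses, are added on top.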
Taxonomy
Topics: Advanced Image Processing Techniques · Advanced Vision and Imaging · Image and Video Quality Assessment
