From Single to Multiple: Leveraging Multi-level Prediction Spaces for Video Forecasting
Mengcheng Lan, Shuliang Ning, Yanran Li, Qian Chen, Xunlai Chen, Xiaoguang Han, Shuguang Cui

TL;DR
This paper introduces a novel multi-prediction space approach for video forecasting, combining pixel and high-level feature spaces with recurrent connections to improve accuracy and reduce artifacts.
Contribution
It is the first to explore and fuse multiple prediction spaces in video forecasting, significantly enhancing performance over existing single-space models.
Findings
32.1% MAE improvement on the MNIST-2 dataset
21.4% MAE improvement on the KTH dataset
Reduces distortions and blurry artifacts in long-term predictions
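MAE (mean absolute error) averages the absolute per-pixel differences between predicted and ground-truth frames. The summary does not name the baseline, the aggregation convention (per pixel, per frame, or per sequence), or whether the percentages are relative. The minimal Python sketch below assumes the figures are relative MAE reductions and flattens frames into plain lists of pixel values. The function names `mae` and `relative_improvement` and all numbers in it are hypothetical illustrations, not results from the paper.

```python
def mae(pred: list[float], target: list[float]) -> float:
    """Mean absolute error over flattened pixel values (aggregation is an assumption)."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def relative_improvement(baseline_mae: float, new_mae: float) -> float:
    """Percentage reduction of MAE relative to a (hypothetical) baseline model."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


# Hypothetical values, for illustration only.
print(mae([1.0, 2.0], [0.0, 4.0]))          # 1.5
print(relative_improvement(50.0, 40.0))     # 20.0
```

Under this reading, a "32.1% MAE improvement" would mean the new model's MAE is 32.1% lower than the baseline's.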
Abstract
Although video forecasting has been widely explored in recent years, most existing work still restricts its models to a single prediction space and neglects how to leverage multiple prediction spaces. This work fills that gap. For the first time, we thoroughly study numerous strategies for performing video forecasting in multiple prediction spaces and fusing their results to boost performance. Prediction in the pixel space usually fails to preserve the semantic and structural content of the video, whereas prediction in a high-level feature space is prone to errors during the reduction and recovery process. We therefore build a recurrent connection between different feature spaces and incorporate their outputs into the upsampling process. Rather surprisingly, this simple idea yields a much more significant…
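The abstract describes the architecture only at a high level. As a hedged illustration of the idea (a pixel-space branch and a reduced feature-space branch linked by a recurrent connection, with their outputs fused during upsampling), the toy NumPy sketch below assumes 2×2 average pooling for the "reduction" step, nearest-neighbour repetition for the "recovery" step, tanh recurrences, and fixed scalar weights (`a`, `b`, `c`, `fuse`) in place of learned layers. All function names are hypothetical; this is not the authors' model.

```python
import numpy as np


def downsample(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling: a stand-in for reducing into a coarser 'high-level' space."""
    h, w = x.shape  # assumes even height and width
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling: a stand-in for recovering pixel resolution."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def step(frame, h_pix, h_feat, a=0.5, b=0.5, c=0.3, fuse=0.5):
    """One recurrent step over two prediction spaces (weights are hypothetical scalars)."""
    # Pixel-space branch: recurrent update at full resolution.
    h_pix = np.tanh(a * frame + b * h_pix)
    # Feature-space branch: operates on the reduced representation and also receives
    # the downsampled pixel-space state through a cross-space recurrent connection.
    h_feat = np.tanh(a * downsample(frame) + b * h_feat + c * downsample(h_pix))
    # Fusion during upsampling: blend the recovered feature-space prediction
    # with the pixel-space prediction.
    pred = fuse * h_pix + (1.0 - fuse) * upsample(h_feat)
    return pred, h_pix, h_feat


def forecast(frames, horizon):
    """Warm up on observed frames (at least one), then roll out `horizon` frames."""
    h, w = frames[0].shape
    h_pix = np.zeros((h, w))
    h_feat = np.zeros((h // 2, w // 2))
    for frame in frames:
        pred, h_pix, h_feat = step(frame, h_pix, h_feat)
    outputs = []
    for _ in range(horizon):
        outputs.append(pred)
        # Autoregressive rollout: feed each prediction back in as the next input.
        pred, h_pix, h_feat = step(pred, h_pix, h_feat)
    return outputs


rng = np.random.default_rng(0)
observed = [rng.random((8, 8)) for _ in range(4)]
future = forecast(observed, horizon=3)
print(len(future), future[0].shape)  # 3 (8, 8)
```

In a real model the scalar weights would be learned convolutions and the reduction and recovery steps learned encoders and decoders; the sketch only shows where the cross-space recurrent connection and the upsampling-time fusion sit in the data flow.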
Taxonomy
Topics: Advanced Image Processing Techniques · Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis
