MMVP: Motion-Matrix-based Video Prediction
Yiqi Zhong, Luming Liang, Ilya Zharkov, Ulrich Neumann

TL;DR
MMVP introduces a novel two-stream framework that decouples motion and appearance in video prediction, leading to improved accuracy, efficiency, and smaller models compared to previous methods.
Contribution
The paper proposes a new approach using appearance-agnostic motion matrices to enhance video prediction performance and efficiency.
Findings
Outperforms state-of-the-art methods by about 1 dB in PSNR.
Uses significantly smaller models, at 84% of the size of previous models or less.
Demonstrates superior accuracy and efficiency on public datasets.
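To put the roughly 1 dB figure in context, the sketch below uses the standard definition PSNR = 10 · log10(MAX² / MSE) to show what a PSNR gain means in terms of mean squared error. The function names and the pixel range `max_value=1.0` are assumptions made for illustration; they do not come from the paper.

```python
import math


def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error.

    max_value is the largest possible pixel value (assumed 1.0 here,
    i.e. images normalized to [0, 1]).
    """
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE is multiplied when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    baseline_mse = 0.01  # hypothetical baseline error, not a reported result
    improved_mse = baseline_mse * mse_ratio_for_gain(1.0)
    print(f"baseline PSNR: {psnr(baseline_mse):.2f} dB")
    print(f"improved PSNR: {psnr(improved_mse):.2f} dB")
    print(f"MSE multiplier for +1 dB: {mse_ratio_for_gain(1.0):.4f}")
```

Because PSNR is logarithmic, a 1 dB gain corresponds to multiplying the MSE by 10^(-0.1), a reduction of roughly 20%, whatever the baseline.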
Abstract
A central challenge of video prediction lies where the system has to reason the objects' future motions from image frames while simultaneously maintaining the consistency of their appearances across frames. This work introduces an end-to-end trainable two-stream video prediction framework, Motion-Matrix-based Video Prediction (MMVP), to tackle this challenge. Unlike previous methods that usually handle motion prediction and appearance maintenance within the same set of modules, MMVP decouples motion and appearance information by constructing appearance-agnostic motion matrices. The motion matrices represent the temporal similarity of each and every pair of feature patches in the input frames, and are the sole input of the motion prediction module in MMVP. This design improves video prediction in both accuracy and efficiency, and reduces the model size. Results of extensive experiments…
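A minimal sketch of the motion-matrix idea described in the abstract, assuming cosine similarity as the similarity measure and plain lists of numbers as patch features. The abstract does not specify either choice, so `cosine` and `motion_matrix` are hypothetical helpers and not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def motion_matrix(patches_a, patches_b):
    """Similarity of every patch in frame A to every patch in frame B.

    patches_a, patches_b: lists of feature vectors, one per patch.
    Returns a len(patches_a) x len(patches_b) nested list.
    """
    return [[cosine(p, q) for q in patches_b] for p in patches_a]


if __name__ == "__main__":
    # Toy features: frame_b holds the same three patches as frame_a,
    # each moved one position to the right (with wrap-around).
    frame_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    frame_b = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    matrix = motion_matrix(frame_a, frame_b)
    for row in matrix:
        best = max(range(len(row)), key=row.__getitem__)
        print([round(x, 3) for x in row], "-> best match at index", best)
```

In this toy case each row peaks at the position the patch moved to, so the matrix encodes where content went between frames rather than the feature values themselves. With cosine similarity, rescaling the features leaves the matrix unchanged.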
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Video Analysis and Summarization
