A Dynamic Multi-Scale Voxel Flow Network for Video Prediction
Xiaotao Hu, Zhewei Huang, Ailin Huang, Jun Xu, Shuchang Zhou

TL;DR
This paper introduces DMVFN (Dynamic Multi-scale Voxel Flow Network), a neural network that predicts future video frames from RGB images alone. It adaptively selects sub-networks according to the motion scale of the input, which yields high image quality at lower computational cost.
Contribution
The paper presents a novel differentiable routing module enabling adaptive sub-network selection for improved video prediction efficiency and quality using only RGB inputs.
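As a rough illustration of what adaptive sub-network selection could look like at inference time, the sketch below gates a stack of flow-refinement blocks with per-block routing probabilities. Everything in it is hypothetical and not taken from the paper: the names `select_blocks`, `run_routed`, and `make_block`, the 0.5 threshold, the residual update rule, and the toy list-of-floats "flow". The summary does not describe how the routing decision is kept differentiable during training, so this only shows the hard selection step once probabilities are available.

```python
from typing import Callable, List, Sequence

Flow = List[float]  # toy stand-in for a dense flow field (assumption)


def select_blocks(probs: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Turn per-block routing probabilities into hard on/off decisions."""
    return [p >= threshold for p in probs]


def run_routed(
    blocks: Sequence[Callable[[Flow], Flow]],
    mask: Sequence[bool],
    init_flow: Flow,
) -> Flow:
    """Apply only the selected blocks; each adds a residual to the flow estimate."""
    flow = list(init_flow)
    for block, keep in zip(blocks, mask):
        if not keep:
            continue  # a skipped block is never called, so it costs nothing
        residual = block(flow)
        flow = [f + r for f, r in zip(flow, residual)]
    return flow


def make_block(step: float) -> Callable[[Flow], Flow]:
    """Hypothetical block that proposes a constant residual of size `step`."""
    return lambda flow: [step for _ in flow]


# Three blocks ordered from coarse to fine motion scale (assumed ordering).
blocks = [make_block(4.0), make_block(2.0), make_block(1.0)]
mask = select_blocks([0.9, 0.2, 0.7])  # hypothetical router output
flow = run_routed(blocks, mask, [0.0, 0.0])
```

Here the middle block is skipped because its probability falls below the threshold, so only two of the three blocks run. A hard threshold like this has no useful gradient, so training such a router would need some relaxation; which one DMVFN uses is not stated in this summary.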
Findings
DMVFN is an order of magnitude faster than Deep Voxel Flow.
DMVFN surpasses OPT, the state-of-the-art iterative method, in generated image quality.
The method achieves comparable or better performance with lower computational costs.
Abstract
The performance of video prediction has been greatly boosted by advanced deep neural networks. However, most of the current methods suffer from large model sizes and require extra inputs, e.g., semantic/depth maps, for promising performance. For efficiency consideration, in this paper, we propose a Dynamic Multi-scale Voxel Flow Network (DMVFN) to achieve better video prediction performance at lower computational costs with only RGB images, than previous methods. The core of our DMVFN is a differentiable routing module that can effectively perceive the motion scales of video frames. Once trained, our DMVFN selects adaptive sub-networks for different inputs at the inference stage. Experiments on several benchmarks demonstrate that our DMVFN is an order of magnitude faster than Deep Voxel Flow and surpasses the state-of-the-art iterative-based OPT on generated image quality. Our code and…
Taxonomy
Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
Methods: OPT · A Dynamic Multi-Scale Voxel Flow Network
