Photo-Realistic Video Prediction on Natural Videos of Largely Changing Frames
Osamu Shouno

TL;DR
This paper introduces a hierarchical deep residual network with adversarial training for natural video prediction, significantly reducing blurriness and improving realism in future frames, especially with large motions.
Contribution
It proposes a novel hierarchical residual architecture combined with adversarial and perceptual losses, achieving more realistic and detailed video predictions than existing methods.
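As a rough illustration of how an adversarial term and a perceptual term can be combined into one generator objective, here is a minimal sketch. It is not the paper's formulation: the non-saturating GAN loss, the weights `w_adv` and `w_perc`, and the toy feature extractor `toy_features` are all assumptions introduced for this example.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def toy_features(signal):
    """Hypothetical stand-in for a pretrained feature extractor:
    differences between neighbouring values (a crude edge feature)."""
    return [b - a for a, b in zip(signal, signal[1:])]

def generator_loss(pred, target, disc_prob_real, w_adv=1.0, w_perc=0.5):
    """Weighted sum of an adversarial and a perceptual term.

    disc_prob_real is the discriminator's probability, in (0, 1], that
    the predicted frame is real. The weights are illustrative only.
    """
    # Non-saturating GAN generator loss (assumed here, not from the paper).
    adversarial = -math.log(disc_prob_real)
    # Perceptual loss: distance in feature space rather than pixel space.
    perceptual = mse(toy_features(pred), toy_features(target))
    return w_adv * adversarial + w_perc * perceptual

# Usage: a flattened "frame" of three pixels and its ground truth.
loss = generator_loss([0.0, 1.0, 3.0], [0.0, 1.0, 1.0], disc_prob_real=0.5)
print(round(loss, 4))
```

The perceptual term compares features rather than raw pixels, which is one common way to penalize blur; in practice the features would come from a pretrained network rather than the toy function used here.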
Findings
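Quantitatively outperforms state-of-the-art baselines on video sequences with both largely and slightly changing frames.
Generates more realistic and detailed future frames.
Reduces blurriness and distortion when there is large motion between frames.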
Abstract
Recent advances in deep learning have significantly improved the performance of video prediction. However, state-of-the-art methods still suffer from blurriness and distortions in their future predictions, especially when there are large motions between frames. To address these issues, we propose a deep residual network with a hierarchical architecture in which each layer predicts the future state at a different spatial resolution, and the predictions of the different layers are merged via top-down connections to generate future frames. We trained our model with adversarial and perceptual loss functions and evaluated it on a natural video dataset captured by car-mounted cameras. Our model quantitatively outperforms state-of-the-art baselines in future frame prediction on video sequences of both largely and slightly changing frames. Furthermore, our model generates future frames with…
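The top-down merging described in the abstract can be sketched as a coarse-to-fine loop. This is only a minimal illustration under stated assumptions: the merge rule (nearest-neighbour upsampling followed by addition), the factor-of-two spacing between levels, and the names `upsample2x` and `merge_top_down` are hypothetical and are not taken from the paper. Frames are single-channel grids stored as plain lists to avoid dependencies.

```python
def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of a 2-D grid (list of rows)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out

def merge_top_down(predictions):
    """Merge per-layer predictions, ordered coarsest first.

    Each prediction is assumed to have twice the resolution of the
    previous one. The running estimate is upsampled and the finer
    layer's prediction is added to it as a residual.
    """
    merged = predictions[0]
    for finer in predictions[1:]:
        up = upsample2x(merged)
        merged = [[u + f for u, f in zip(up_row, fine_row)]
                  for up_row, fine_row in zip(up, finer)]
    return merged

# Usage: a hypothetical three-level pyramid (2x2, 4x4, 8x8) of constant grids.
levels = [[[1.0] * n for _ in range(n)] for n in (2, 4, 8)]
frame = merge_top_down(levels)
print(len(frame), len(frame[0]))  # 8 8
```

In this arrangement the coarse levels carry large displacements cheaply while the fine levels only add detail, which is the general motivation for multi-resolution prediction; the actual layers in the paper are learned and would replace the fixed addition used here.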
Taxonomy
Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
