Geometry-Based Next Frame Prediction from Monocular Video
Reza Mahjourian, Martin Wicke, Anelia Angelova

TL;DR
This paper introduces a geometry-based method for next frame prediction from monocular video, leveraging depth prediction from sequences of images to improve accuracy and produce richer, geometry-aware frame predictions.
Contribution
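It presents an approach that predicts depth from video sequences with a recurrent convolutional neural network, then uses the resulting scene geometry, the current image, and the camera trajectory to compute the next frame. The authors report that it outperforms existing frame prediction methods on the KITTI dataset.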
Findings
Depth prediction accuracy improves when more prior frames are used as input.
The method produces more realistic and more accurate next frame predictions than existing approaches.
Results outperform existing frame prediction approaches on the KITTI dataset.
Abstract
We consider the problem of next frame prediction from video input. A recurrent convolutional neural network is trained to predict depth from monocular video input, which, along with the current video image and the camera trajectory, can then be used to compute the next frame. Unlike prior next-frame prediction approaches, we take advantage of the scene geometry and use the predicted depth for generating the next frame prediction. Our approach can produce rich next frame predictions which include depth information attached to each pixel. Another novel aspect of our approach is that it predicts depth from a sequence of images (e.g. in a video), rather than from a single still image. We evaluate the proposed approach on the KITTI dataset, a standard dataset for benchmarking tasks relevant to autonomous driving. The proposed method produces results which are visually and numerically…
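The geometric step described in the abstract (depth, current image, and camera motion in; next frame out) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `predict_next_frame`, the intrinsics matrix `K`, and the rigid transform `T` are assumptions introduced here, depth is taken as a given input rather than produced by a recurrent network, and the forward warp with nearest-point selection is one simple way to do the reprojection, not necessarily the one used in the paper.

```python
import numpy as np

def predict_next_frame(image, depth, K, T):
    """Forward-warp `image` into the next camera pose (illustrative sketch).

    image: (H, W, C) array, current frame
    depth: (H, W) array, per-pixel depth in the current camera frame
    K:     (3, 3) camera intrinsics (assumed known)
    T:     (4, 4) rigid transform from current-camera to next-camera
           coordinates (assumed known, e.g. from the camera trajectory)
    Returns (next_image, next_depth); pixels that receive no source stay 0.
    """
    H, W = depth.shape
    # Homogeneous pixel coordinates (u, v, 1), shape (3, H*W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])

    # Back-project each pixel to a 3D point using its depth.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Express the points in the next camera's coordinate frame.
    pts = T[:3, :3] @ pts + T[:3, 3:4]

    # Project into the next image plane, guarding against z <= 0.
    z = pts[2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)
    proj = K @ pts
    un = np.rint(proj[0] / safe_z).astype(int)
    vn = np.rint(proj[1] / safe_z).astype(int)
    valid = in_front & (un >= 0) & (un < W) & (vn >= 0) & (vn < H)

    # When several source pixels land on one target, keep the nearest one.
    src = np.flatnonzero(valid)
    tgt = vn[src] * W + un[src]
    order = np.lexsort((z[src], tgt))  # primary key: target, secondary: depth
    src, tgt = src[order], tgt[order]
    _, first = np.unique(tgt, return_index=True)
    src, tgt = src[first], tgt[first]

    out_img = np.zeros((H * W, image.shape[2]), dtype=image.dtype)
    out_depth = np.zeros(H * W)
    out_img[tgt] = image.reshape(H * W, -1)[src]
    out_depth[tgt] = z[src]
    return out_img.reshape(image.shape), out_depth.reshape(H, W)
```

With an identity `T` the function returns the input frame unchanged. The returned `next_depth` mirrors the abstract's point that each predicted pixel can carry depth information. Holes left by disocclusion are simply zero in this sketch; a practical system would need to fill them.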
