Temporally Consistent Depth Prediction with Flow-Guided Memory Units
Chanho Eom, Hyunjong Park, and Bumsub Ham

TL;DR
This paper introduces a flow-guided memory module within a two-stream CNN to improve the temporal consistency of monocular depth prediction in videos, achieving state-of-the-art results on the KITTI dataset.
Contribution
It proposes a novel memory-augmented CNN architecture with ConvGRUs and optical flow to enforce long-term temporal coherence in depth estimation from monocular videos.
Findings
Achieves state-of-the-art accuracy on the KITTI benchmark.
Significantly improves temporal consistency in depth predictions.
Demonstrates effectiveness of memory modules in video-based depth estimation.
Abstract
Predicting depth from a monocular video sequence is an important task for autonomous driving. Although it has advanced considerably in the past few years, recent methods based on convolutional neural networks (CNNs) discard temporal coherence in the video sequence and estimate depth independently for each frame, which often leads to undesired inconsistent results over time. To address this problem, we propose to memorize temporal consistency in the video sequence, and leverage it for the task of depth prediction. To this end, we introduce a two-stream CNN with a flow-guided memory module, where each stream encodes visual and temporal features, respectively. The memory module, implemented using convolutional gated recurrent units (ConvGRUs), inputs visual and temporal features sequentially together with optical flow tailored to our task. It memorizes trajectories of individual features…
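To make the memory module described above more concrete, the following is a minimal NumPy sketch of a single flow-guided ConvGRU step. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the optical flow enters the recurrence, so the sketch assumes the flow is used to warp the previous hidden state into alignment with the current frame before the standard ConvGRU gates are applied. The function names (`warp`, `conv3x3`, `convgru_step`, `init_params`), the 3x3 kernel size, the nearest-neighbour sampling, and the flow convention (channel 0 is the horizontal offset, channel 1 the vertical offset, pointing from a current pixel to its source in the previous state) are all hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def conv3x3(x, w):
    """Zero-padded 3x3 convolution (cross-correlation, as in most CNN libraries).

    x: (C_in, H, W) feature map, w: (C_out, C_in, 3, 3) kernel.
    """
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], H, W))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx],
                             xp[:, dy:dy + H, dx:dx + W])
    return out


def warp(h, flow):
    """Backward-warp h (C, H, W) by flow (2, H, W).

    Assumed convention: out[:, y, x] = h[:, y + flow[1], x + flow[0]],
    with nearest-neighbour sampling and coordinates clamped to the border.
    """
    _, H, W = h.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_x = np.clip(np.rint(xs + flow[0]).astype(int), 0, W - 1)
    src_y = np.clip(np.rint(ys + flow[1]).astype(int), 0, H - 1)
    return h[:, src_y, src_x]


def init_params(rng, c_in, c_h, scale=0.1):
    """Random kernels for the update (z), reset (r) and candidate (c) paths."""
    p = {}
    for gate in ("z", "r", "c"):
        p["W" + gate] = scale * rng.standard_normal((c_h, c_in, 3, 3))
        p["U" + gate] = scale * rng.standard_normal((c_h, c_h, 3, 3))
    return p


def convgru_step(x, h_prev, flow, p):
    """One ConvGRU update on input features x, with a flow-aligned memory."""
    # Assumption: align the memory with the current frame before gating.
    h_prev = warp(h_prev, flow)
    z = sigmoid(conv3x3(x, p["Wz"]) + conv3x3(h_prev, p["Uz"]))  # update gate
    r = sigmoid(conv3x3(x, p["Wr"]) + conv3x3(h_prev, p["Ur"]))  # reset gate
    h_tilde = np.tanh(conv3x3(x, p["Wc"]) + conv3x3(r * h_prev, p["Uc"]))
    return (1.0 - z) * h_prev + z * h_tilde


# Usage: run the memory over a short sequence of random stand-in features.
rng = np.random.default_rng(0)
params = init_params(rng, c_in=4, c_h=8)
h = np.zeros((8, 6, 10))                      # empty memory at t = 0
for t in range(3):
    x_t = rng.standard_normal((4, 6, 10))     # stand-in for encoder features
    flow_t = np.zeros((2, 6, 10))             # stand-in for estimated flow
    h = convgru_step(x_t, h, flow_t, params)
```

Warping the hidden state first means each memory location is gated against features of the same scene point rather than the same pixel coordinate, which is one plausible way to "memorize trajectories of individual features". A practical version would typically use bilinear sampling and learned kernels in a deep-learning framework.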
Taxonomy
Topics: Advanced Vision and Imaging · Video Surveillance and Tracking Methods · Image Enhancement Techniques
