TL;DR
This paper introduces ManyDepth, a self-supervised monocular depth estimation method that uses video sequence information at test time when it is available, and outperforms published self-supervised baselines on the KITTI and Cityscapes datasets.
Contribution
ManyDepth is the first adaptive, end-to-end cost volume based approach that effectively utilizes test-time sequence data for monocular depth estimation.
Findings
Outperforms all published self-supervised baselines on KITTI and Cityscapes.
Introduces a consistency loss that encourages the network to ignore the cost volume where it is unreliable, for example around moving objects.
Adds a training augmentation scheme to cope with static cameras.
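One plausible reading of the consistency loss can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a separate single-frame depth estimate is available to compare against (here called teacher_depth), and the function name consistency_loss and the threshold ratio_thresh are hypothetical. Pixels where the depth implied by the cost volume disagrees strongly with the single-frame estimate are treated as unreliable, and only there is the final prediction pulled toward the single-frame estimate.

```python
import numpy as np

def consistency_loss(pred_depth, cv_depth, teacher_depth, ratio_thresh=2.0):
    """Sketch of a masked consistency term (all names are hypothetical).

    pred_depth:    final multi-frame depth prediction, shape (H, W)
    cv_depth:      depth of the lowest-cost hypothesis in the cost volume, (H, W)
    teacher_depth: single-frame depth estimate, (H, W)
    All depths are assumed strictly positive.
    """
    # Symmetric relative disagreement between the two depth estimates.
    ratio = np.maximum(cv_depth / teacher_depth, teacher_depth / cv_depth)
    unreliable = ratio > ratio_thresh
    if not unreliable.any():
        return 0.0, unreliable
    # L1 penalty applied only where the cost volume is flagged as unreliable.
    loss = np.abs(pred_depth - teacher_depth)[unreliable].mean()
    return float(loss), unreliable

# Toy example: one pixel where the cost volume disagrees with the teacher.
teacher = np.full((4, 4), 10.0)
cv = teacher.copy()
cv[1, 2] = 40.0          # e.g. a moving object fooling the cost volume
pred = teacher.copy()
pred[1, 2] = 14.0        # prediction partly misled at that pixel
loss, mask = consistency_loss(pred, cv, teacher)
```

In a real training loop the single-frame estimate would normally be treated as a fixed target for this term, so that only the multi-frame prediction is adjusted.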
Abstract
Self-supervised monocular depth estimation networks are trained to predict scene depth using nearby frames as a supervision signal during training. However, for many applications, sequence information in the form of video frames is also available at test time. The vast majority of monocular networks do not make use of this extra signal, thus ignoring valuable information that could be used to improve the predicted depth. Those that do, either use computationally expensive test-time refinement techniques or off-the-shelf recurrent networks, which only indirectly make use of the geometric information that is inherently available. We propose ManyDepth, an adaptive approach to dense depth estimation that can make use of sequence information at test time, when it is available. Taking inspiration from multi-view stereo, we propose a deep end-to-end cost volume based approach that is trained…
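The cost volume idea borrowed from multi-view stereo can be illustrated with a small plane-sweep sketch. It is not the ManyDepth implementation: the function name build_cost_volume, the nearest-neighbour sampling, and the mean absolute feature difference are choices made for this example, and the intrinsics K and relative pose T_ref_to_src are assumed to be known. For each candidate depth, every reference pixel is back-projected, moved into the source camera, re-projected, and compared with the source feature found there.

```python
import numpy as np

def build_cost_volume(ref_feat, src_feat, K, T_ref_to_src, depths):
    """Plane-sweep cost volume sketch (hypothetical helper).

    ref_feat, src_feat: feature maps of shape (C, H, W)
    K: 3x3 camera intrinsics; T_ref_to_src: 4x4 rigid transform
    depths: 1-D array of depth hypotheses
    Returns costs of shape (D, H, W); np.inf marks pixels that fall outside
    the source image for a given hypothesis.
    """
    C, H, W = ref_feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)]).astype(np.float64)
    rays = np.linalg.inv(K) @ pix                 # back-projected rays at unit depth
    R, t = T_ref_to_src[:3, :3], T_ref_to_src[:3, 3:4]
    ref_flat = ref_feat.reshape(C, -1)
    cost = np.full((len(depths), H * W), np.inf)
    for i, d in enumerate(depths):
        pts = R @ (rays * d) + t                  # 3-D points in the source frame
        z = pts[2]
        in_front = z > 1e-6
        z_safe = np.where(in_front, z, 1.0)       # avoid division by zero
        proj = K @ pts
        u = np.rint(proj[0] / z_safe).astype(int) # nearest-neighbour sampling
        v = np.rint(proj[1] / z_safe).astype(int)
        valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        warped = src_feat[:, v[valid], u[valid]]
        cost[i, valid] = np.abs(warped - ref_flat[:, valid]).mean(axis=0)
    return cost.reshape(len(depths), H, W)

# Toy scene: a fronto-parallel plane at depth 5 seen after a sideways shift,
# which moves every feature 2 pixels to the right (focal length 10).
rng = np.random.default_rng(0)
ref = rng.normal(size=(8, 6, 16))
src = np.roll(ref, 2, axis=2)
K = np.array([[10.0, 0.0, 8.0], [0.0, 10.0, 3.0], [0.0, 0.0, 1.0]])
T = np.eye(4)
T[0, 3] = 1.0
depths = np.array([2.5, 5.0, 10.0])
cost = build_cost_volume(ref, src, K, T, depths)
best_depth = depths[cost.argmin(axis=0)]          # per-pixel lowest-cost hypothesis
```

In this toy scene the lowest cost falls on the true depth wherever the warped pixel stays inside the image. On real video, moving objects or a static camera break this geometric assumption, which is the failure case the summary above refers to as an unreliable cost volume.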
