TL;DR
This paper introduces simple yet effective improvements to self-supervised monocular depth estimation, achieving state-of-the-art results by focusing on robust loss functions and sampling methods rather than complex architectures.
Contribution
The paper presents a set of novel, simple design choices that significantly enhance self-supervised monocular depth estimation performance.
Findings
Achieves state-of-the-art results for self-supervised monocular depth estimation on the KITTI benchmark.
Proposes a minimum reprojection loss that handles occlusions robustly.
Introduces a full-resolution multi-scale sampling method.
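The minimum reprojection loss can be illustrated with a minimal sketch. It assumes the source frames have already been warped into the target view, and it uses a plain L1 photometric error as a simplification; the function names (`photometric_error`, `min_reprojection_loss`, `mean_reprojection_loss`) and the toy data are hypothetical and not taken from the authors' code. The idea shown is that taking the per-pixel minimum over source frames, rather than the average, keeps a pixel that is occluded in one source frame from being penalized when another source frame sees it correctly.

```python
import numpy as np


def photometric_error(target, warped):
    """Per-pixel L1 error, averaged over channels: (H, W, C) -> (H, W).

    Simplification for illustration: plain L1 only.
    """
    return np.abs(target - warped).mean(axis=-1)


def min_reprojection_loss(target, warped_sources):
    """Per-pixel minimum of the error over all warped source frames."""
    errors = np.stack([photometric_error(target, w) for w in warped_sources])
    return errors.min(axis=0).mean()


def mean_reprojection_loss(target, warped_sources):
    """Baseline for comparison: per-pixel average over source frames."""
    errors = np.stack([photometric_error(target, w) for w in warped_sources])
    return errors.mean(axis=0).mean()


# Toy data (hypothetical): each warped source matches the target
# except at one pixel, standing in for an occlusion in that frame.
target = np.zeros((2, 2, 3))
warped_prev = target.copy()
warped_prev[0, 0] = 1.0  # "occluded" pixel in the previous frame
warped_next = target.copy()
warped_next[1, 1] = 1.0  # "occluded" pixel in the next frame

loss_min = min_reprojection_loss(target, [warped_prev, warped_next])
loss_mean = mean_reprojection_loss(target, [warped_prev, warped_next])
print(loss_min, loss_mean)
```

In this toy case every pixel is matched correctly by at least one source frame, so the minimum-based loss is zero while the averaged loss is not.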
Abstract
Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces…
