TL;DR
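This paper introduces MonoDEVS, a monocular depth estimation method that combines virtual-world supervision with real-world SfM self-supervision. The combination mitigates known weaknesses of SfM self-supervision, such as scale ambiguity and textureless areas, and outperforms previous CNN-based models trained on monocular and stereo data.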
Contribution
It presents a novel training approach that leverages virtual-world data to enhance monocular depth estimation trained with real-world SfM signals.
Findings
MonoDEVS outperforms previous MDE CNNs trained on monocular and stereo data.
Combining virtual-world supervision with SfM self-supervision improves depth estimation accuracy.
Addressing the virtual-to-real domain gap and the limitations of SfM self-supervision enhances model robustness.
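One way to picture the combination described in the findings above is as a weighted sum of two training signals: a supervised depth error on virtual-world images, where pixelwise ground truth is available, and a photometric reconstruction error on real-world frames, which is the usual form of SfM self-supervision. The sketch below is illustrative only and is not the paper's actual loss: the function names, the L1 error terms, and the weights `w_sup` and `w_sfm` are all assumptions, and the warped source image is assumed to have been computed elsewhere from the predicted depth and camera motion.

```python
import numpy as np


def supervised_depth_loss(pred_depth, gt_depth, valid_mask):
    """Mean absolute depth error over pixels with (virtual-world) ground truth.

    Hypothetical helper: L1 is an assumed choice of error term.
    """
    valid = valid_mask.astype(bool)
    if not valid.any():
        return 0.0
    return float(np.abs(pred_depth[valid] - gt_depth[valid]).mean())


def photometric_loss(target_img, warped_source_img):
    """Mean absolute intensity difference between a real target frame and a
    neighbouring frame already warped into the target view.

    Hypothetical helper: the warping itself is assumed to happen elsewhere.
    """
    return float(np.abs(target_img - warped_source_img).mean())


def combined_loss(pred_virtual, gt_virtual, valid_mask,
                  target_real, warped_real, w_sup=1.0, w_sfm=1.0):
    """Weighted sum of the supervised and self-supervised terms.

    The weights w_sup and w_sfm are illustrative assumptions.
    """
    l_sup = supervised_depth_loss(pred_virtual, gt_virtual, valid_mask)
    l_sfm = photometric_loss(target_real, warped_real)
    return w_sup * l_sup + w_sfm * l_sfm


# Toy inputs: a 4x4 virtual depth map and a 4x4 real grayscale frame.
pred_virtual = np.full((4, 4), 2.0)
gt_virtual = np.full((4, 4), 1.5)
valid_mask = np.ones((4, 4), dtype=bool)
target_real = np.full((4, 4), 0.2)
warped_real = np.full((4, 4), 0.1)

total = combined_loss(pred_virtual, gt_virtual, valid_mask,
                      target_real, warped_real, w_sup=1.0, w_sfm=0.5)
print(total)
```

Because the supervised term is expressed in metric depth, it is one plausible way for virtual-world data to anchor the absolute scale that the photometric term alone cannot determine.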
Abstract
Depth information is essential for on-board perception in autonomous driving and driver assistance. Monocular depth estimation (MDE) is very appealing because it places appearance and depth in direct pixelwise correspondence without further calibration. The best MDE models are based on Convolutional Neural Networks (CNNs) trained in a supervised manner, i.e., assuming pixelwise ground truth (GT). Usually, this GT is acquired at training time through a calibrated multi-modal suite of sensors. However, using only a monocular system at training time as well is cheaper and more scalable. This is possible by relying on structure-from-motion (SfM) principles to generate self-supervision. Nevertheless, camouflaged objects, visibility changes, static-camera intervals, textureless areas, and scale ambiguity diminish the usefulness of such self-supervision. In this paper, we perform…
