Monocular Depth Estimation by Learning from Heterogeneous Datasets
Akhil Gurram, Onay Urfalioglu, Ibrahim Halfaoui, Fahd Bouzaraa, and Antonio M. Lopez

TL;DR
This paper introduces a method for monocular depth estimation that learns from heterogeneous datasets containing different types of annotations, improving performance over existing methods.
Contribution
It demonstrates that CNNs for depth estimation can be trained using separate datasets for depth and semantic labels, removing the need for joint annotations.
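As a rough illustration of what training from separately annotated datasets can look like, the sketch below alternates batches drawn from a depth-labelled set and a semantics-labelled set, and updates only the shared layers plus the head whose labels are present. This is a minimal sketch under assumptions, not the authors' training procedure: the strict one-to-one alternation, the names `alternate_batches` and `params_to_update`, the parameter-group names, and the placeholder batch contents are all hypothetical.

```python
from itertools import cycle


def alternate_batches(depth_batches, semantic_batches, steps):
    """Yield (task, batch) pairs, switching dataset on every step.

    Assumption: strict 1:1 alternation; other mixing ratios are possible.
    """
    depth_iter = cycle(depth_batches)        # restart when exhausted
    semantic_iter = cycle(semantic_batches)  # datasets may differ in size
    for step in range(steps):
        if step % 2 == 0:
            yield "depth", next(depth_iter)
        else:
            yield "semantic", next(semantic_iter)


def params_to_update(task):
    """Hypothetical parameter groups touched by a batch of the given task.

    The shared layers learn from both datasets; each task head only sees
    batches that actually carry its kind of label.
    """
    return ["shared_trunk", f"{task}_head"]


# Placeholder data: (image, label) pairs from two unrelated image sets.
depth_batches = [("depth_img_0", "depth_map_0"), ("depth_img_1", "depth_map_1")]
semantic_batches = [("sem_img_0", "label_map_0")]

schedule = [
    (task, batch[0], params_to_update(task))
    for task, batch in alternate_batches(depth_batches, semantic_batches, 4)
]

for task, image, groups in schedule:
    print(task, image, groups)
```

In a real training loop, each scheduled step would compute only the loss that matches the batch's annotation type, so no image needs both depth and semantic ground truth.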
Findings
Outperforms state-of-the-art methods on the KITTI and Cityscapes datasets.
Shows that training with heterogeneous datasets is effective.
Reduces annotation constraints for monocular depth estimation.
Abstract
Depth estimation provides essential information for autonomous driving and driver assistance. Monocular Depth Estimation is especially interesting from a practical point of view, since using a single camera is cheaper than many other options and avoids the need for continuous calibration strategies as required by stereo-vision approaches. State-of-the-art methods for Monocular Depth Estimation are based on Convolutional Neural Networks (CNNs). A promising line of work consists of introducing additional semantic information about the traffic scene when training CNNs for depth estimation. In practice, this means that the depth data used for CNN training is complemented with images having pixel-wise semantic labels, which are usually difficult to annotate (e.g., crowded urban images). Moreover, so far it is common practice to assume that the same raw training data is associated with…
