Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation
Yuanzhouhan Cao, Chunhua Shen, Heng Tao Shen

TL;DR
This paper shows that depth estimated from single monocular images can improve object detection and semantic segmentation accuracy when features learned from the estimated depth are combined with RGB features.
Contribution
The authors develop a deep depth estimation model from monocular images and incorporate estimated depth features into detection and segmentation tasks, introducing a multi-task training scheme for semantic segmentation.
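As a rough illustration of what "incorporating estimated depth features" can mean, the sketch below assumes simple late fusion: a feature vector computed from the RGB image and one computed from the estimated depth map are concatenated into one descriptor, which a classifier then scores. The function names, feature sizes and weights are hypothetical. This summary does not specify the paper's actual networks or where the fusion happens.

```python
from typing import List, Sequence


def fuse_features(rgb_feat: Sequence[float], depth_feat: Sequence[float]) -> List[float]:
    """Hypothetical late fusion: concatenate an RGB feature vector with a
    feature vector computed from the *estimated* depth map."""
    return list(rgb_feat) + list(depth_feat)


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Toy linear classifier applied to the fused descriptor."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Hypothetical feature vectors; real ones would come from CNNs.
rgb_feat = [0.5, 0.25]    # e.g. pooled features from an RGB network
depth_feat = [1.0]        # e.g. pooled features from a depth network
fused = fuse_features(rgb_feat, depth_feat)
score = linear_score(fused, [1.0, 1.0, 0.5])
print(fused, score)  # [0.5, 0.25, 1.0] 1.25
```

Concatenation is only the simplest way to combine the two streams. It keeps each stream's features intact and leaves the downstream classifier to learn how much weight the depth cues deserve.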
Findings
Estimated depth improves detection accuracy
Estimated depth enhances segmentation performance
Multi-task training benefits semantic segmentation
Abstract
Augmenting RGB data with measured depth has been shown to improve the performance of a range of tasks in computer vision including object detection and semantic segmentation. Although depth sensors such as the Microsoft Kinect have facilitated easy acquisition of such depth information, the vast majority of images used in vision tasks do not contain depth information. In this paper, we show that augmenting RGB images with estimated depth can also improve the accuracy of both object detection and semantic segmentation. Specifically, we first exploit the recent success of depth estimation from monocular images and learn a deep depth estimation model. Then we learn deep depth features from the estimated depth and combine them with RGB features for object detection and semantic segmentation. Additionally, we propose an RGB-D semantic segmentation method which applies a multi-task training…
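The abstract is cut off before it describes the multi-task training scheme, so the following is only a generic sketch of one common form of multi-task training, not the authors' method. A per-pixel segmentation cross-entropy is combined with an auxiliary depth-regression term, weighted by a hypothetical factor `lam`. The function names, the choice of mean squared error for depth, and `lam = 0.5` are all assumptions made for illustration.

```python
import math
from typing import Sequence


def pixel_cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Mean negative log-probability of the true class, one entry per pixel."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def depth_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and reference depth (per pixel)."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(target)


def multitask_loss(probs, labels, pred_depth, target_depth, lam: float = 0.5) -> float:
    """Hypothetical joint objective: segmentation loss + lam * depth loss."""
    return pixel_cross_entropy(probs, labels) + lam * depth_mse(pred_depth, target_depth)


# Two toy pixels, two classes each (softmax outputs already computed).
probs = [[1.0, 0.0], [0.5, 0.5]]
labels = [0, 1]
loss = multitask_loss(probs, labels, pred_depth=[1.0, 2.0], target_depth=[1.0, 3.0])
print(round(loss, 4))  # 0.5966
```

In a setup like this, the weight `lam` controls how strongly the auxiliary depth task shapes the shared features. It would normally be tuned on validation data rather than fixed in advance.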
