Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

Yuanzhouhan Cao; Chunhua Shen; Heng Tao Shen

arXiv:1610.01706·cs.CV·November 17, 2016

TL;DR

This paper shows that depth estimated from single monocular images, when its features are combined with RGB features, can improve the accuracy of both object detection and semantic segmentation.

Contribution

The authors learn a deep depth estimation model for monocular images, learn deep depth features from the estimated depth, and combine them with RGB features for object detection and semantic segmentation. They also propose an RGB-D semantic segmentation method that applies a multi-task training scheme.

Findings

01. Estimated depth improves detection accuracy

02. Estimated depth enhances segmentation performance

03. Multi-task training benefits semantic segmentation

Abstract

Augmenting RGB data with measured depth has been shown to improve the performance of a range of tasks in computer vision including object detection and semantic segmentation. Although depth sensors such as the Microsoft Kinect have facilitated easy acquisition of such depth information, the vast majority of images used in vision tasks do not contain depth information. In this paper, we show that augmenting RGB images with estimated depth can also improve the accuracy of both object detection and semantic segmentation. Specifically, we first exploit the recent success of depth estimation from monocular images and learn a deep depth estimation model. Then we learn deep depth features from the estimated depth and combine with RGB features for object detection and semantic segmentation. Additionally, we propose an RGB-D semantic segmentation method which applies a multi-task training…
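
The abstract says depth features are combined with RGB features but does not say how. As a rough illustration only, the sketch below assumes the simplest form of fusion: each feature vector is L2-normalized and the two are concatenated. The function names `l2_normalize` and `fuse_features` and the toy vectors are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit L2 norm (all zeros stay all zeros)."""
    norm = sum(x * x for x in vec) ** 0.5
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def fuse_features(rgb_feat: Sequence[float],
                  depth_feat: Sequence[float]) -> List[float]:
    """Assumed fusion rule: normalize each modality, then concatenate.

    Normalizing first keeps one modality from dominating the fused
    vector purely because of its scale.
    """
    return l2_normalize(rgb_feat) + l2_normalize(depth_feat)


# Toy stand-ins for features extracted from an RGB image and from its
# estimated depth map (hypothetical values).
rgb_feature = [3.0, 4.0]
depth_feature = [0.0, 2.0]

fused = fuse_features(rgb_feature, depth_feature)
print(len(fused))  # length is len(rgb_feature) + len(depth_feature)
```

A downstream detector or segmentation classifier would then take `fused` as its input in place of the RGB-only feature; the actual fusion used in the paper may differ.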
