SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular images
Lei He, Jiwen Lu, Guanghui Wang, Shiyu Song, Jie Zhou

TL;DR
SOSD-Net is a neural network that jointly performs semantic object segmentation and depth estimation from monocular images. It exploits geometric constraints between the two tasks and an iterative training approach, and it reports better results than existing methods on Cityscapes and NYU v2.
Contribution
The paper introduces the concept of semantic objectness to exploit the geometric relationship between segmentation and depth estimation, and it employs an EM-inspired iterative training method for these joint scene understanding tasks.
Findings
Outperforms existing methods on the Cityscapes and NYU v2 datasets.
To the authors' knowledge, the first network to exploit a geometry constraint in joint segmentation and depth estimation.
Demonstrates improved accuracy through an iterative training approach.
Abstract
Depth estimation and semantic segmentation play essential roles in scene understanding. The state-of-the-art methods employ multi-task learning to simultaneously learn models for these two tasks at the pixel-wise level. They usually focus on sharing the common features or stitching feature maps from the corresponding branches. However, these methods lack in-depth consideration on the correlation of the geometric cues and the scene parsing. In this paper, we first introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks through an analysis of the imaging process, then propose a Semantic Object Segmentation and Depth Estimation Network (SOSD-Net) based on the objectness assumption. To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic…
