TL;DR
This paper introduces SGDepth, a self-supervised monocular depth estimation method that handles moving dynamic-class (DC) objects such as cars and pedestrians. It combines joint training with semantic segmentation, semantic masking of the photometric loss, and detection of frames with non-moving DC objects, improving accuracy without requiring depth labels.
Contribution
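The work presents a semantically guided approach that jointly trains self-supervised depth estimation and supervised semantic segmentation with task-specific network heads. It uses semantic masking to keep moving dynamic-class (DC) objects from contaminating the photometric loss, and it detects frames with non-moving DC objects so that their depth can still be learned.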
Findings
Outperforms all baselines on the Eigen split benchmark.
Handles moving dynamic-class objects that violate the static-world assumption made during training.
Achieves this performance without test-time refinement.
Abstract
Self-supervised monocular depth estimation presents a powerful method to obtain 3D scene information from single camera images, which is trainable on arbitrary image sequences without requiring depth labels, e.g., from a LiDAR sensor. In this work we present a new self-supervised semantically-guided depth estimation (SGDepth) method to deal with moving dynamic-class (DC) objects, such as moving cars and pedestrians, which violate the static-world assumptions typically made during training of such models. Specifically, we propose (i) mutually beneficial cross-domain training of (supervised) semantic segmentation and self-supervised depth estimation with task-specific network heads, (ii) a semantic masking scheme providing guidance to prevent moving DC objects from contaminating the photometric loss, and (iii) a detection method for frames with non-moving DC objects, from which the depth…
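To illustrate point (ii), the idea of semantic masking can be sketched as excluding DC pixels from the photometric error before averaging. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the class IDs in `DC_CLASS_IDS`, the plain L1 error, and the function names `dc_mask` and `masked_photometric_loss` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical label IDs treated as dynamic-class (DC); real IDs depend on the label set.
DC_CLASS_IDS = (11, 12, 13)

def dc_mask(seg, dc_ids=DC_CLASS_IDS):
    """Return 1.0 for non-DC pixels and 0.0 for DC pixels of a label map (H, W)."""
    return (~np.isin(seg, dc_ids)).astype(np.float32)

def masked_photometric_loss(target, warped, seg_target, seg_warped):
    """Mean L1 photometric error over pixels that are non-DC in both label maps.

    target, warped: float images of shape (H, W, 3).
    seg_target, seg_warped: integer label maps of shape (H, W).
    Assumption: a pixel is dropped if either map labels it as DC.
    """
    per_pixel = np.abs(target - warped).mean(axis=-1)    # (H, W) L1 error
    mask = dc_mask(seg_target) * dc_mask(seg_warped)     # 1 = keep, 0 = drop
    kept = mask.sum()
    if kept == 0:
        return 0.0  # every pixel is DC, so nothing contributes
    return float((per_pixel * mask).sum() / kept)
```

In this sketch, an error that occurs only inside a DC region contributes nothing to the loss, whereas the same error in an unmasked frame raises the mean. How the mask is applied and when it is disabled for frames with non-moving DC objects follow the paper, not this sketch.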
