TL;DR
This paper introduces a semi-supervised framework for semantic segmentation that uses self-supervised monocular depth estimation from unlabeled image sequences, together with geometry-based data augmentation and depth-based selection of samples to annotate, to improve performance when labeled data is limited.
Contribution
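It combines three components for semi-supervised segmentation: transfer of features learned during self-supervised depth estimation, geometry-based data augmentation that blends images and labels, and selection of the samples to annotate using depth feature diversity and the difficulty of learning depth in a student-teacher framework.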
Findings
Achieves state-of-the-art results on Cityscapes
Demonstrates significant performance gains with the proposed modules
Validates the effectiveness of depth-based sample selection
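One way to picture diversity-driven sample selection is greedy farthest-point sampling over per-image feature vectors. The sketch below is only an illustration under assumptions: the function name `select_diverse`, the use of Euclidean distance, and the idea of one pooled depth-feature vector per image are hypothetical and not taken from the paper. It also leaves out the difficulty criterion entirely.

```python
import numpy as np

def select_diverse(features, k, start=0):
    """Greedily pick k samples that are far apart in feature space.

    features: (N, D) array, one feature vector per unlabeled image
              (assumed here to be pooled depth features).
    k:        number of samples to pick for annotation.
    start:    index of the first sample (arbitrary choice).
    """
    features = np.asarray(features, dtype=float)
    k = min(k, len(features))  # cannot pick more samples than exist
    selected = [start]
    # Distance from every sample to its nearest already-selected sample.
    min_dist = np.linalg.norm(features - features[start], axis=1)
    while len(selected) < k:
        # The next pick is the sample farthest from everything chosen so far.
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_to_nxt = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_nxt)
    return selected

feats = [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [5.0, 0.0]]
picked = select_diverse(feats, k=3)
```

In this toy input the near-duplicate of the starting sample is skipped in favor of samples that cover more of the feature space, which is the intuition behind selecting by diversity.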
Abstract
Training deep networks for semantic segmentation requires large amounts of labeled training data, which presents a major challenge in practice, as labeling segmentation masks is a highly labor-intensive process. To address this issue, we present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences. In particular, we propose three key contributions: (1) We transfer knowledge from features learned during self-supervised depth estimation to semantic segmentation, (2) we implement a strong data augmentation by blending images and labels using the geometry of the scene, and (3) we utilize the depth feature diversity as well as the level of difficulty of learning depth in a student-teacher framework to select the most useful samples to be annotated for semantic segmentation. We validate the…
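Contribution (2), blending images and labels using scene geometry, can be sketched as a depth-guided mix of two samples. This is a minimal sketch under assumptions, not the authors' implementation: the function name `depth_guided_mix`, the rule "keep the pixel that is closer to the camera", and the convention that smaller depth values mean closer are all hypothetical choices made for illustration.

```python
import numpy as np

def depth_guided_mix(img_a, lbl_a, depth_a, img_b, lbl_b, depth_b):
    """Blend two samples per pixel, keeping whichever is closer to the camera.

    img_*:   (H, W, C) images
    lbl_*:   (H, W) integer label maps
    depth_*: (H, W) depth estimates (assumed: smaller value = closer)
    """
    # Boolean mask: True where sample A is in front of sample B.
    mask = depth_a < depth_b
    # Broadcast the mask over the channel axis for the images.
    mixed_img = np.where(mask[..., None], img_a, img_b)
    # Labels are mixed with the same mask so image and label stay aligned.
    mixed_lbl = np.where(mask, lbl_a, lbl_b)
    return mixed_img, mixed_lbl, mask

img_a = np.ones((2, 2, 3))
img_b = np.zeros((2, 2, 3))
lbl_a = np.full((2, 2), 7)
lbl_b = np.full((2, 2), 2)
depth_a = np.array([[1.0, 5.0], [5.0, 1.0]])
depth_b = np.full((2, 2), 3.0)

mixed_img, mixed_lbl, mask = depth_guided_mix(
    img_a, lbl_a, depth_a, img_b, lbl_b, depth_b
)
```

Using the same mask for image and label keeps the mixed label consistent with the mixed image, and choosing by depth means nearer objects occlude farther ones rather than being cut at arbitrary boundaries.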
