Depth-Guided Semi-Supervised Instance Segmentation
Xin Chen, Jie Hu, Xiawu Zheng, Jianghang Lin, Liujuan Cao, Rongrong Ji

TL;DR
This paper introduces a depth-guided semi-supervised instance segmentation framework that uses depth maps to improve pseudo-label accuracy and model performance, outperforming previous RGB-only methods on the COCO and Cityscapes datasets.
Contribution
The paper proposes a depth-guided framework that combines Depth Feature Fusion with a Depth Controller to improve the accuracy of semi-supervised instance segmentation.
Findings
Achieves state-of-the-art results on COCO, reaching 22.29% mAP with only 1% of the data labeled.
Outperforms previous methods on the Cityscapes dataset.
Demonstrates that effectively integrating depth information improves segmentation performance.
Abstract
Semi-Supervised Instance Segmentation (SSIS) aims to leverage unlabeled data during training. Previous frameworks primarily used the RGB information of unlabeled images to generate pseudo-labels. However, such a mechanism often introduces unstable noise, as a single instance can display multiple RGB values. To overcome this limitation, we introduce a Depth-Guided (DG) SSIS framework. This framework uses depth maps extracted from input images, which represent individual instances with closely associated distance values, offering precise contours for distinct instances. Unlike RGB data, depth maps provide a unique perspective, making their integration into the SSIS process complex. To this end, we propose Depth Feature Fusion, which integrates features extracted from depth estimation. This integration allows the model to understand depth information better and ensure its…
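The motivation in the abstract (one instance can show many RGB values, while its depth values stay close together) can be illustrated with a toy example. The sketch below is not the paper's method: the pixel data, the thresholds, and the helper names `threshold_mask` and `accuracy` are all made up for illustration. It compares a naive foreground mask obtained by thresholding intensity with one obtained by thresholding depth, for a striped object in front of a distant background.

```python
# Hypothetical toy scene (assumed data, not from the paper).
# Each pixel is (intensity in [0, 1], depth in metres, true instance id).
# Instance 1 is a striped object about 2 m away; id 0 is background about 8 m away.
pixels = [
    (0.10, 2.00, 1), (0.90, 2.05, 1), (0.15, 1.98, 1), (0.85, 2.02, 1),
    (0.50, 8.00, 0), (0.55, 8.10, 0), (0.45, 7.95, 0), (0.52, 8.05, 0),
]

INTENSITY, DEPTH, LABEL = 0, 1, 2


def threshold_mask(pixels, channel, threshold):
    """Mark a pixel as foreground (1) when its channel value is below threshold."""
    return [1 if p[channel] < threshold else 0 for p in pixels]


def accuracy(predicted, pixels):
    """Fraction of pixels whose predicted label matches the true instance id."""
    hits = sum(1 for pred, p in zip(predicted, pixels) if pred == p[LABEL])
    return hits / len(pixels)


# Thresholds are arbitrary choices for this toy scene.
intensity_acc = accuracy(threshold_mask(pixels, INTENSITY, 0.5), pixels)
depth_acc = accuracy(threshold_mask(pixels, DEPTH, 5.0), pixels)

print(f"intensity-based mask accuracy: {intensity_acc}")  # 0.625
print(f"depth-based mask accuracy:     {depth_acc}")      # 1.0
```

In this constructed scene the bright stripes of the object are lost by the intensity threshold, while the depth threshold recovers the whole instance. Real depth maps are estimated and noisy, so the gap would be smaller in practice; the example only shows why depth can be a useful complementary cue for pseudo-labels.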
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Machine Learning and Data Classification
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
