Impact of Pseudo Depth on Open World Object Segmentation with Minimal User Guidance
Robin Schön, Katja Ludwig, Rainer Lienhart

TL;DR
This paper shows that pseudo depth maps generated by pretrained networks improve open world object segmentation under minimal user guidance (a single click on the object), with the largest gains in generalization to classes never seen during training.
Contribution
It introduces a method leveraging pseudo depth maps and minimal user input to enhance open world object segmentation, especially for unseen classes.
Findings
Depth-based segmentation outperforms RGB-only methods on unseen classes.
Using pseudo depth maps improves IoU scores from 61.57 to 69.79 on unseen classes.
The approach generalizes well with minimal user guidance and partial class training.
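The findings above are reported as IoU (intersection over union) scores. As a reminder of what that metric measures, here is a minimal sketch for two binary masks. The function name `mask_iou` and the empty-mask convention are assumptions made for illustration, not the authors' evaluation code, and the reported values appear to be on a 0 to 100 scale, so the fraction returned here would be multiplied by 100 to compare.

```python
import numpy as np


def mask_iou(pred, target):
    """Intersection over union of two binary masks, as a fraction in [0, 1].

    Hypothetical helper for illustration; not taken from the paper.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty. Treating this as perfect agreement is a
        # convention chosen here, not something the source specifies.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection) / float(union)


# Predicted mask covers two pixels, ground truth covers one of them.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
score = mask_iou(pred, target)  # intersection = 1, union = 2
```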
Abstract
Pseudo depth maps are depth map predictions which are used as ground truth during training. In this paper we leverage pseudo depth maps in order to segment objects of classes that have never been seen during training. This renders our object segmentation task an open world task. The pseudo depth maps are generated using pretrained networks, which have either been trained with the full intention to generalize to downstream tasks (LeRes and MiDaS), or which have been trained in an unsupervised fashion on video sequences (MonodepthV2). In order to tell our network which object to segment, we provide the network with a single click on the object's surface on the pseudo depth map of the image as input. We test our approach on two different scenarios: one without the RGB image and one where the RGB image is part of the input. Our results demonstrate a considerably better generalization…
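The abstract describes two input configurations: pseudo depth plus a single click, and the same with the RGB image added. One plausible way to assemble such an input is sketched below. The abstract does not say how the click is encoded, so the Gaussian heatmap, the min-max depth normalization, the channel order, and the names `click_to_heatmap` and `build_input` are all assumptions for illustration rather than the authors' implementation.

```python
import numpy as np


def click_to_heatmap(height, width, click_yx, sigma=10.0):
    """Encode one click as a Gaussian bump centered on the clicked pixel.

    Assumption: the paper's actual click encoding is not given in the abstract.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = click_yx
    squared_dist = (ys - cy) ** 2 + (xs - cx) ** 2
    return np.exp(-squared_dist / (2.0 * sigma ** 2)).astype(np.float32)


def build_input(depth, click_yx, rgb=None, sigma=10.0):
    """Stack a pseudo depth map and a click map (optionally RGB) channel-first.

    depth: (H, W) array, e.g. the output of a pretrained depth estimator.
    rgb:   optional (H, W, 3) uint8 image for the RGB-plus-depth scenario.
    """
    depth = np.asarray(depth, dtype=np.float32)
    height, width = depth.shape
    span = depth.max() - depth.min()
    # Min-max normalization to [0, 1]; a constant map becomes all zeros.
    depth_norm = (depth - depth.min()) / span if span > 0 else np.zeros_like(depth)
    channels = [depth_norm, click_to_heatmap(height, width, click_yx, sigma)]
    if rgb is not None:
        rgb = np.asarray(rgb, dtype=np.float32) / 255.0
        channels = [rgb[..., c] for c in range(3)] + channels
    return np.stack(channels, axis=0)


# Toy data standing in for a real pseudo depth map and image.
depth = np.arange(24, dtype=np.float32).reshape(4, 6)
rgb = np.zeros((4, 6, 3), dtype=np.uint8)

depth_only = build_input(depth, click_yx=(1, 2))          # 2 channels
rgb_depth = build_input(depth, click_yx=(1, 2), rgb=rgb)  # 5 channels
```

In this sketch the depth-only scenario yields a 2-channel tensor and the RGB scenario a 5-channel one, so a segmentation network would only need a different first-layer input width to switch between them.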
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Vision and Imaging · Human Pose and Action Recognition
Methods: Test
