The Best of Both Modes: Separately Leveraging RGB and Depth for Unseen Object Instance Segmentation
Christopher Xie, Yu Xiang, Arsalan Mousavian, Dieter Fox

TL;DR
This paper introduces a novel two-stage approach that separately leverages synthetic RGB and depth data to improve unseen object instance segmentation in robotic tabletop environments, outperforming existing methods.
Contribution
The authors propose a new method that uses synthetic RGB and depth data separately for better segmentation of unseen objects, along with a large-scale synthetic dataset for training.
Findings
Outperforms state-of-the-art on unseen object segmentation
Effective in robotic grasping scenarios
Learns from non-photorealistic synthetic RGB-D data
Abstract
In order to function in unstructured environments, robots need the ability to recognize unseen novel objects. We take a step in this direction by tackling the problem of segmenting unseen object instances in tabletop environments. However, the type of large-scale real-world dataset required for this task typically does not exist for most robotic settings, which motivates the use of synthetic data. We propose a novel method that separately leverages synthetic RGB and synthetic depth for unseen object instance segmentation. Our method consists of two stages: the first stage operates only on depth to produce rough initial masks, and the second stage refines these masks with RGB. Surprisingly, our framework is able to learn from synthetic RGB-D data where the RGB is non-photorealistic. To train our method, we introduce a large-scale synthetic dataset of random objects on tabletops.…
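The two-stage data flow described in the abstract (depth only first, then RGB refinement of each rough mask) can be illustrated with a toy sketch. This is not the authors' implementation: both stages in the paper are learned networks, whereas the stand-ins below are simple hand-written heuristics chosen only to make the flow runnable. All names (`depth_stage`, `rgb_refine`, `segment`) and parameters (`table_depth`, `min_height`, `pad`, `tol`) are hypothetical.

```python
from collections import deque

import numpy as np


def depth_stage(depth, table_depth, min_height=0.01):
    """Stand-in for stage 1 (assumption: NOT the paper's network).

    Uses depth only: marks pixels closer to the camera than the table
    and groups them into 4-connected components as rough instance masks.
    """
    foreground = (table_depth - depth) > min_height
    labels = np.zeros(depth.shape, dtype=np.int32)
    next_label = 0
    for start in zip(*np.nonzero(foreground)):
        if labels[start]:
            continue  # pixel already belongs to a component
        next_label += 1
        labels[start] = next_label
        queue = deque([start])
        while queue:  # breadth-first flood fill
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < depth.shape[0] and 0 <= nc < depth.shape[1]
                if inside and foreground[nr, nc] and not labels[nr, nc]:
                    labels[nr, nc] = next_label
                    queue.append((nr, nc))
    return [labels == i for i in range(1, next_label + 1)]


def rgb_refine(rgb, mask, pad=2, tol=0.1):
    """Stand-in for stage 2 (assumption: NOT the paper's network).

    Looks at a padded crop around the rough mask and keeps pixels whose
    color is close to the mean color under the rough mask.
    """
    rows, cols = np.nonzero(mask)
    r0, r1 = max(rows.min() - pad, 0), min(rows.max() + pad + 1, mask.shape[0])
    c0, c1 = max(cols.min() - pad, 0), min(cols.max() + pad + 1, mask.shape[1])
    mean_color = rgb[mask].mean(axis=0)
    close = np.linalg.norm(rgb[r0:r1, c0:c1] - mean_color, axis=-1) < tol
    refined = np.zeros_like(mask)
    refined[r0:r1, c0:c1] = close
    return refined


def segment(rgb, depth, table_depth):
    """Run the depth stage first, then refine every rough mask with RGB."""
    rough_masks = depth_stage(depth, table_depth)
    return [rgb_refine(rgb, mask) for mask in rough_masks]
```

In this sketch `rgb` is assumed to be a float array of shape `(H, W, 3)` with values in `[0, 1]`, and `depth` a float array of shape `(H, W)` in the same units as `table_depth`. The point of the separation is that a gap in the depth-based mask, for example from sensor noise, can be filled in by the RGB stage.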
Taxonomy
Topics: Advanced Neural Network Applications · Robot Manipulation and Learning · Robotics and Sensor-Based Localization
