Robot Instance Segmentation with Few Annotations for Grasping
Moshe Kimhi, David Vainshtein, Chaim Baskin, Dotan Di Castro

TL;DR
This paper introduces a semi-supervised, interaction-based learning framework for robot instance segmentation that significantly reduces annotation requirements while achieving state-of-the-art results in cluttered environments.
Contribution
The framework combines semi-supervised learning (SSL) with learning through interaction (LTI), which enables effective segmentation from minimal annotations and draws temporal context from unlabeled data.
Findings
Achieves 86.37 AP50 on ARMBench, nearly 20% better than previous methods.
Attains 84.89 AP50 with only 1% annotated data, demonstrating high data efficiency.
Outperforms existing methods on ARMBench and OCID benchmarks.
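For readers unfamiliar with the metric, AP50 is average precision computed with a mask intersection-over-union (IoU) threshold of 0.5: a predicted instance counts as correct only if it overlaps a ground-truth instance with IoU of at least 0.5. The sketch below illustrates only that matching criterion on toy binary masks. It is not the paper's evaluation code, the function names `mask_iou` and `is_match_at_50` are hypothetical, and a full AP computation would additionally rank predictions by confidence and integrate the precision-recall curve.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two equally sized binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                inter += 1  # pixel covered by both masks
            if a or b:
                union += 1  # pixel covered by at least one mask
    # Two empty masks have no overlap to measure; treat IoU as 0.
    return inter / union if union else 0.0


def is_match_at_50(pred_mask, gt_mask, threshold=0.5):
    """True if the prediction would count as a hit under the AP50 criterion."""
    return mask_iou(pred_mask, gt_mask) >= threshold


# Toy 3x3 example (illustrative values only).
gt = [[1, 1, 0],
      [1, 1, 0],
      [0, 0, 0]]
shifted = [[0, 1, 1],
           [0, 1, 1],
           [0, 0, 0]]
close = [[1, 1, 0],
         [1, 0, 0],
         [0, 0, 0]]

print(is_match_at_50(shifted, gt))  # overlap of 2 pixels out of 6 in the union
print(is_match_at_50(close, gt))    # overlap of 3 pixels out of 4 in the union
```

Under this criterion the shifted mask is rejected and the close mask is accepted, which is why AP50 is more forgiving of imprecise boundaries than metrics averaged over stricter IoU thresholds.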
Abstract
The ability of robots to manipulate objects relies heavily on their aptitude for visual perception. In domains characterized by cluttered scenes and high object variability, most methods call for vast labeled datasets, laboriously hand-annotated, with the aim of training capable models. Once deployed, the challenge of generalizing to unfamiliar objects implies that the model must evolve alongside its domain. To address this, we propose a novel framework that combines Semi-Supervised Learning (SSL) with Learning Through Interaction (LTI), allowing a model to learn by observing scene alterations and leverage visual consistency despite temporal gaps without requiring curated data of interaction sequences. As a result, our approach exploits partially annotated data through self-supervision and incorporates temporal context using pseudo-sequences generated from unlabeled still images. We…
Taxonomy
Topics: Robot Manipulation and Learning · Natural Language Processing Techniques · Handwritten Text Recognition Techniques
