TL;DR
This paper introduces a self-supervised, end-to-end trainable framework that segments unknown grasped objects in manipulation sequences using the motion between consecutive RGB frames. It requires no 3D object models, precise hand-eye calibration, or additional sensor data, and is evaluated in extensive experiments.
Contribution
It presents a novel, fully trainable architecture that jointly leverages motion and semantic cues for object segmentation in robotic manipulation tasks.
Findings
Outperforms previous methods in segmentation accuracy.
Robustly handles distractor objects moving in the background as well as completely static scenes.
A semantic segmentation network trained on the automatically labeled data performs on par with one trained on manual annotations.
Abstract
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator. Our method successively learns an agnostic foreground segmentation followed by a distinction between manipulator and object solely by observing the motion between consecutive RGB frames. In contrast to previous approaches, we propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge. Furthermore, while the motion of the manipulator and the object are substantial cues for our algorithm, we present means to robustly deal with distraction objects moving in the background, as well as with completely static scenes. Our method neither depends on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data. By extensive experimental evaluation we demonstrate the…
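The motion cue the abstract relies on, change between consecutive RGB frames, can be illustrated with plain frame differencing. This is only a minimal sketch of the cue itself, not the paper's learned architecture: the function names `motion_mask` and `is_static`, the threshold values, and the nested-list frame format are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed frame format: rows of (R, G, B) pixels with values in 0..255.
Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]


def motion_mask(prev: Frame, curr: Frame, threshold: int = 30) -> List[List[bool]]:
    """Mark pixels whose summed absolute RGB change exceeds `threshold`.

    Hypothetical helper: a hand-written stand-in for a motion cue,
    not the learned foreground segmentation described in the abstract.
    """
    mask = []
    for row_prev, row_curr in zip(prev, curr):
        mask_row = []
        for p, c in zip(row_prev, row_curr):
            change = sum(abs(a - b) for a, b in zip(p, c))
            mask_row.append(change > threshold)
        mask.append(mask_row)
    return mask


def is_static(mask: List[List[bool]], min_moving_fraction: float = 0.01) -> bool:
    """Treat a frame pair as static when too few pixels moved (assumed cutoff)."""
    total = sum(len(row) for row in mask)
    moving = sum(sum(row) for row in mask)
    return total == 0 or moving / total < min_moving_fraction


# Usage: a 2x2 black frame in which one pixel brightens.
black = (0, 0, 0)
prev_frame = [[black, black], [black, black]]
curr_frame = [[black, (200, 200, 200)], [black, black]]
mask = motion_mask(prev_frame, curr_frame)
```

Differencing of this kind marks every moving pixel, so it cannot on its own separate the manipulator from the grasped object or ignore a distractor moving in the background, and it yields an empty mask for a static scene. Those are the cases the abstract says the proposed framework is designed to handle.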
