Grasp2Vec: Learning Object Representations from Self-Supervised Grasping
Eric Jang, Coline Devin, Vincent Vanhoucke, Sergey Levine

TL;DR
This paper introduces Grasp2Vec, a self-supervised learning method for robotic object representations based on object persistence, enabling improved grasping and scene understanding without human labels.
Contribution
We propose a novel self-supervised approach that learns object-centric representations through autonomous robot interactions, enhancing grasping and scene understanding.
Findings
Outperforms reinforcement learning from images
Enables object identification and localization
Improves robotic grasping success rates
Abstract
Well structured visual representations can make robot learning faster and can improve generalization. In this paper, we study how we can acquire effective object-centric representations for robotic manipulation tasks without human labeling by using autonomous robot interaction with the environment. Such representation learning methods can benefit from continuous refinement of the representation as the robot collects more experience, allowing them to scale effectively without human intervention. Our representation learning approach is based on object persistence: when a robot removes an object from a scene, the representation of that scene should change according to the features of the object that was removed. We formulate an arithmetic relationship between feature vectors from this observation, and use it to learn a representation of scenes and objects that can then be used to identify…
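The object-persistence idea in the abstract can be read as embedding arithmetic: the scene embedding before a grasp, minus the scene embedding after it, should be close to the embedding of the removed object. The sketch below is a minimal toy illustration of that relationship, not the authors' implementation. The hand-written vectors stand in for learned features, and the names `subtract`, `cosine_similarity`, and `identify_removed_object` are hypothetical helpers introduced here for illustration only.

```python
import math


def subtract(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_removed_object(scene_before, scene_after, object_embeddings):
    """Return the name of the object whose embedding best matches
    (scene_before - scene_after), using cosine similarity."""
    difference = subtract(scene_before, scene_after)
    return max(
        object_embeddings,
        key=lambda name: cosine_similarity(difference, object_embeddings[name]),
    )


# Toy embeddings (assumption: a real system would produce these with
# learned networks; here they are hand-picked for illustration).
OBJECTS = {
    "cup": [1.0, 0.0, 0.0],
    "ball": [0.0, 1.0, 0.0],
    "block": [0.0, 0.0, 1.0],
}

# Assumption for this toy: a scene embedding is the sum of the
# embeddings of the objects it contains.
SCENE_BEFORE = [1.0, 1.0, 1.0]  # cup + ball + block
SCENE_AFTER = [1.0, 0.0, 1.0]   # cup + block (the ball was grasped)

removed = identify_removed_object(SCENE_BEFORE, SCENE_AFTER, OBJECTS)
print(removed)
```

In this toy setup the difference vector is exactly the embedding of the removed object. With learned features the match would only be approximate, which is why the lookup uses a similarity score rather than exact equality.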
Taxonomy
Topics: Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
