Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours
Lerrel Pinto, Abhinav Gupta

TL;DR
This paper introduces a large-scale dataset of 50,000 robot grasping attempts collected over 700 hours, enabling training of a CNN for grasp prediction that generalizes well to unseen objects.
Contribution
The authors increase the training data for robot grasping to 40 times that of prior work, recast the problem as multi-class classification, and propose a multi-stage training approach to improve grasp prediction.
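To make the classification recast concrete, here is a minimal sketch. The summary does not say how the classes are defined, so this assumes that each class is a discretized gripper angle. The bin count `NUM_BINS`, the 180-degree symmetry of the gripper, and the helper names `angle_to_bin`, `bin_to_angle`, and `best_angle` are all illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: treating grasp-angle prediction as classification
# over discretized angle bins instead of regressing a continuous angle.

NUM_BINS = 18  # assumption: illustrative bin count, not stated in the summary


def angle_to_bin(theta_deg: float, num_bins: int = NUM_BINS) -> int:
    """Map a gripper angle in degrees to a class index.

    Assumes a grasp at theta and at theta + 180 degrees is equivalent,
    so angles are wrapped into [0, 180) before binning.
    """
    width = 180.0 / num_bins
    return int((theta_deg % 180.0) // width)


def bin_to_angle(index: int, num_bins: int = NUM_BINS) -> float:
    """Return the centre angle in degrees of the given bin."""
    width = 180.0 / num_bins
    return (index + 0.5) * width


def best_angle(scores: list[float]) -> float:
    """Pick the highest-scoring bin and return its centre angle."""
    index = max(range(len(scores)), key=scores.__getitem__)
    return bin_to_angle(index, len(scores))
```

Under these assumptions, a classifier would output one score per bin for an image patch, and `best_angle` would turn those scores back into a gripper angle to execute.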
Findings
Large-scale dataset improves grasping accuracy
Multi-stage training enhances model robustness
State-of-the-art generalization to unseen objects
Abstract
Current learning-based robot grasping approaches exploit human-labeled datasets for training the models. However, there are two problems with such a methodology: (a) since each object can be grasped in multiple ways, manually labeling grasp locations is not a trivial task; (b) human labeling is biased by semantics. While there have been attempts to train robots using trial-and-error experiments, the amount of data used in such experiments remains substantially low and hence makes the learner prone to over-fitting. In this paper, we take the leap of increasing the available training data to 40 times more than prior work, leading to a dataset size of 50K data points collected over 700 hours of robot grasping attempts. This allows us to train a Convolutional Neural Network (CNN) for the task of predicting grasp locations without severe overfitting. In our formulation, we recast the…
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Soft Robotics and Applications
