Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations

Moo Jin Kim; Jiajun Wu; Chelsea Finn

arXiv:2307.05959 · cs.RO · July 13, 2023 · 1 citation

TL;DR

This paper introduces a method that uses inexpensive, unlabeled human video demonstrations to improve eye-in-hand robotic manipulation policies, enabling robots to generalize to new tasks and environments without explicit domain adaptation.

Contribution

The authors propose a framework that augments narrow robot imitation datasets with broad, unlabeled human video demonstrations, without relying on explicit domain adaptation techniques.

Findings

1. 58% average improvement in success rate on real-world tasks

2. Enables generalization to new environments and unseen tasks

3. Effective without explicit domain adaptation methods

Abstract

Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation. However, for robotic imitation, it is still expensive to have a human teleoperator collect large amounts of expert demonstrations with a real robot. Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation and can be quickly captured in a wide range of scenarios. Therefore, human video demonstrations are a promising data source for learning generalizable robotic manipulation policies at scale. In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies. Although a clear visual domain gap exists between human and robot data, our framework does…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition