Robotic Telekinesis: Learning a Robotic Hand Imitator by Watching Humans on Youtube
Aravind Sivakumar, Kenneth Shaw, Deepak Pathak

TL;DR
This paper presents a system that lets anyone control a robotic hand and arm by demonstrating motions with their own hand. At run time the system needs only a single RGB camera; internet videos of human hands are used to train it, which makes robot teleoperation more accessible.
Contribution
The authors introduce a novel approach that leverages internet videos to train a system for real-time, marker-free, and uncalibrated human-to-robot hand motion retargeting for dexterous manipulation.
Findings
Enables untrained users to teleoperate a robot on various tasks
Achieves smooth, safe, and semantically accurate robot trajectories
Uses internet videos to reduce data collection costs
Abstract
We build a system that enables any human to control a robot hand and arm, simply by demonstrating motions with their own hand. The robot observes the human operator via a single RGB camera and imitates their actions in real-time. Human hands and robot hands differ in shape, size, and joint structure, and performing this translation from a single uncalibrated camera is a highly underconstrained problem. Moreover, the retargeted trajectories must effectively execute tasks on a physical robot, which requires them to be temporally smooth and free of self-collisions. Our key insight is that while paired human-robot correspondence data is expensive to collect, the internet contains a massive corpus of rich and diverse human hand videos. We leverage this data to train a system that understands human hands and retargets a human video stream into a robot hand-arm trajectory that is smooth,…
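The abstract requires retargeted trajectories to be temporally smooth before they run on a physical robot. As a rough illustration of that requirement only, and not the authors' method, the sketch below applies exponential smoothing to a stream of per-frame joint angles and clamps the result to joint limits. The function name `smooth_joint_stream`, the `alpha` parameter, and the limit arguments are hypothetical and do not come from the paper.

```python
from typing import Iterable, List, Optional, Sequence


def smooth_joint_stream(
    frames: Iterable[Sequence[float]],
    alpha: float = 0.3,
    lower: Optional[Sequence[float]] = None,
    upper: Optional[Sequence[float]] = None,
) -> List[List[float]]:
    """Exponentially smooth per-frame joint angles, then clamp to limits.

    Hypothetical helper for illustration; not taken from the paper.
    alpha close to 1 follows the raw input, alpha close to 0 smooths more.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[List[float]] = []
    prev: Optional[List[float]] = None
    for frame in frames:
        current = list(frame)
        if prev is not None:
            # Blend the new estimate with the previous smoothed command.
            current = [alpha * c + (1.0 - alpha) * p for c, p in zip(current, prev)]
        if lower is not None:
            # Keep each joint at or above its lower limit.
            current = [max(c, lo) for c, lo in zip(current, lower)]
        if upper is not None:
            # Keep each joint at or below its upper limit.
            current = [min(c, hi) for c, hi in zip(current, upper)]
        smoothed.append(current)
        prev = current
    return smoothed


if __name__ == "__main__":
    # A single joint that jumps from 0 to 1 rad is eased toward the target.
    print(smooth_joint_stream([[0.0], [1.0], [1.0]], alpha=0.5))
```

A filter like this trades responsiveness for smoothness: a smaller `alpha` suppresses jitter from per-frame hand-pose estimates but adds lag between the operator's motion and the robot's. It does nothing about self-collisions, which the abstract lists as a separate requirement.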
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Multimodal Machine Learning Applications
