H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos
Guangrun Li, Yaoxu Lyu, Zhuoyang Liu, Chengkai Hou, Jieyu Zhang, Shanghang Zhang

TL;DR
H2R is a data augmentation pipeline that converts egocentric human videos into robot-centric visual data for robot pre-training, improving downstream success rates by up to 10.2% on simulation benchmarks and up to 23.3% in real-world tasks.
Contribution
The paper presents H2R, a novel augmentation method that bridges the visual gap between human and robot embodiments during pre-training, improving downstream robot policy performance.
Findings
H2R improves success rates by up to 23.3% in real-world tasks.
H2R enhances simulation benchmark success rates by up to 10.2%.
A CLIP-based metric effectively evaluates the semantic fidelity of the augmented data.
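The summary does not say how the CLIP-based metric is computed. CLIP-style scores are commonly built on cosine similarity between embedding vectors, so the sketch below shows only that building block. It assumes the embeddings have already been produced elsewhere by a CLIP encoder; the function name `cosine_similarity` and the plain-list vector representation are illustrative choices, not the authors' implementation.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length.

    Assumption: u and v are embeddings computed elsewhere (for example,
    an augmented frame and a reference frame or caption). This helper is
    hypothetical and is not taken from the H2R paper.
    """
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)
```

Under this reading, a score near 1 would suggest that an augmented frame stays semantically close to its reference, while a lower score would flag drift introduced by the augmentation.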
Abstract
Large-scale pre-training using egocentric human videos has proven effective for robot learning. However, the models pre-trained on such data can be suboptimal for robot learning due to the significant visual gap between human hands and those of different robots. To remedy this, we propose H2R, a human-to-robot data augmentation pipeline that converts egocentric human videos into robot-centric visual data. H2R estimates human hand pose from videos, retargets the motion to simulated robotic arms, removes human limbs via segmentation and inpainting, and composites rendered robot embodiments into the original frames with camera-aligned geometry. This process explicitly bridges the visual gap between human and robot embodiments during pre-training. We apply H2R to augment large-scale egocentric human video datasets such as Ego4D and SSv2. To verify the effectiveness of the augmentation…
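The last stage of the pipeline, compositing a rendered robot into a frame from which the human limb has been removed, can be illustrated with standard alpha blending. This is a minimal sketch under stated assumptions, not the authors' code: the function name `composite`, the grayscale nested-list image representation, and the per-pixel alpha mask are all hypothetical, and pose estimation, retargeting, segmentation, inpainting, and rendering are assumed to have happened upstream.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of floats in [0, 1].
Image = List[List[float]]


def composite(background: Image, robot: Image, alpha: Image) -> Image:
    """Alpha-blend a rendered robot layer over an inpainted background.

    background: frame with the human limb already removed (assumed input)
    robot:      rendered robot layer, camera-aligned with the frame (assumed)
    alpha:      per-pixel robot coverage in [0, 1]; 1 means fully robot
    """
    if not (len(background) == len(robot) == len(alpha)):
        raise ValueError("all layers must have the same height")
    out: Image = []
    for bg_row, rb_row, a_row in zip(background, robot, alpha):
        if not (len(bg_row) == len(rb_row) == len(a_row)):
            raise ValueError("all layers must have the same width")
        # Per-pixel blend: robot where alpha is high, background elsewhere.
        out.append([a * r + (1.0 - a) * b
                    for b, r, a in zip(bg_row, rb_row, a_row)])
    return out
```

For example, `composite([[0.0, 1.0]], [[1.0, 0.0]], [[1.0, 0.0]])` takes the first pixel from the robot layer and the second from the background. A real pipeline would operate on color frames and would also need the robot render to match the original camera geometry, which this sketch takes as given.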
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Social Robot Interaction and HRI
