H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos

Guangrun Li; Yaoxu Lyu; Zhuoyang Liu; Chengkai Hou; Jieyu Zhang; Shanghang Zhang

arXiv:2505.11920 · cs.RO · March 17, 2026

Open Access

TL;DR

H2R introduces a data augmentation pipeline that converts egocentric human videos into robot-centric visual data, significantly enhancing robot learning and generalization across various tasks and platforms.

Contribution

The paper presents H2R, a novel augmentation method that bridges the visual gap between human and robot embodiments during pre-training, improving downstream robot policy performance.

Findings

01. H2R improves success rates by up to 23.3% in real-world tasks.

02. H2R enhances simulation benchmark success rates by up to 10.2%.

03. The CLIP-based metric effectively evaluates semantic fidelity of augmented data.
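
This page does not say how the CLIP-based metric is computed. As a rough illustration only, the sketch below assumes it reduces to a cosine similarity between two embedding vectors, for example the embedding of an augmented frame and the embedding of a text description of the action. The function name `cosine_similarity` and the toy vectors are hypothetical; no CLIP model is loaded here, and the paper's actual metric may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors.

    Assumption: the vectors stand in for embeddings produced by some
    CLIP-style encoder; they are plain Python sequences of floats here.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical placeholder embeddings (not real CLIP outputs).
frame_embedding = [0.2, 0.7, 0.1]
text_embedding = [0.4, 1.4, 0.2]
score = cosine_similarity(frame_embedding, text_embedding)
```

Under this assumption, a higher score would mean the augmented frame stays semantically closer to the reference it is compared against.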

Abstract

Large-scale pre-training using egocentric human videos has proven effective for robot learning. However, the models pre-trained on such data can be suboptimal for robot learning due to the significant visual gap between human hands and those of different robots. To remedy this, we propose H2R, a human-to-robot data augmentation pipeline that converts egocentric human videos into robot-centric visual data. H2R estimates human hand pose from videos, retargets the motion to simulated robotic arms, removes human limbs via segmentation and inpainting, and composites rendered robot embodiments into the original frames with camera-aligned geometry. This process explicitly bridges the visual gap between human and robot embodiments during pre-training. We apply H2R to augment large-scale egocentric human video datasets such as Ego4D and SSv2. To verify the effectiveness of the augmentation…
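
The last stage named in the abstract, compositing a rendered robot into a frame whose human limb has already been removed, can be illustrated with standard alpha blending. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `composite_robot`, the array layouts, and the use of a per-pixel alpha mask are all hypothetical, and the earlier stages (hand pose estimation, retargeting, segmentation, inpainting, rendering) are assumed to have produced the inputs.

```python
import numpy as np


def composite_robot(inpainted_frame, robot_render, robot_alpha):
    """Alpha-blend a rendered robot over an inpainted background frame.

    Assumed inputs (hypothetical layout):
      inpainted_frame: float array (H, W, 3) in [0, 1], human limb removed
      robot_render:    float array (H, W, 3) in [0, 1], rendered robot
      robot_alpha:     float array (H, W) in [0, 1], 1 where the robot is opaque
    """
    if inpainted_frame.shape != robot_render.shape:
        raise ValueError("frame and render must have the same shape")
    if robot_alpha.shape != inpainted_frame.shape[:2]:
        raise ValueError("alpha must match the frame height and width")
    alpha = robot_alpha[..., None]  # add a channel axis so it broadcasts over RGB
    return alpha * robot_render + (1.0 - alpha) * inpainted_frame


# Toy 2x2 example: black background, white "robot", mixed coverage.
frame = np.zeros((2, 2, 3))
render = np.ones((2, 2, 3))
alpha = np.array([[1.0, 0.0],
                  [0.5, 0.0]])
out = composite_robot(frame, render, alpha)
```

Pixels with alpha 1 take the robot color, pixels with alpha 0 keep the inpainted background, and fractional values blend the two, which softens the robot's silhouette edges.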

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Social Robot Interaction and HRI