H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation

Hongzhe Bi; Lingxuan Wu; Tianwei Lin; Hengkai Tan; Zhizhong Su; Hang Su; Jun Zhu

arXiv:2507.23523 · cs.RO · August 4, 2025

TL;DR

H-RDT leverages large-scale human manipulation videos with 3D hand poses to improve robotic manipulation, using a two-stage training process that enhances performance across various tasks and settings.

Contribution

The paper introduces H-RDT, a diffusion transformer model trained on human manipulation data to boost robotic manipulation capabilities, addressing data scarcity and embodiment diversity challenges.

Findings

01. H-RDT outperforms both training from scratch and state-of-the-art methods.

02. It achieves improvements of 13.9% in simulation and 40.5% in real-world experiments over training from scratch.

03. It is effective in single-task, multitask, few-shot, and robustness scenarios.

Abstract

Imitation learning for robotic manipulation faces a fundamental challenge: the scarcity of large-scale, high-quality robot demonstration data. Recent robotic foundation models often pre-train on cross-embodiment robot datasets to increase data scale, but they face significant limitations because the diverse morphologies and action spaces across robot embodiments make unified training challenging. In this paper, we present H-RDT (Human to Robotics Diffusion Transformer), a novel approach that leverages human manipulation data to enhance robot manipulation capabilities. Our key insight is that large-scale egocentric human manipulation videos with paired 3D hand pose annotations provide rich behavioral priors that capture natural manipulation strategies and can benefit robotic policy learning. We introduce a two-stage training paradigm: (1) pre-training on large-scale egocentric…

Videos

H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation

Taxonomy

Topics: Robot Manipulation and Learning