# HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation

**Authors:** Zhecheng Yuan, Tianming Wei, Langzhe Gu, Pu Hua, Tianhai Liang, Yuanpei Chen, Huazhe Xu

arXiv: 2508.20085 · 2025-09-03

## TL;DR

HERMES is a human-to-robot learning framework that turns multi-source human hand motion data into mobile bimanual dexterous manipulation skills, combining unified reinforcement learning, depth image-based sim2real transfer, and PnP-aided navigation to operate in diverse real-world environments.

## Contribution

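The paper introduces a unified reinforcement learning approach for translating heterogeneous human hand motions into physically plausible robot behaviors, an end-to-end depth image-based sim2real transfer method for better real-world generalization, and a closed-loop Perspective-n-Point (PnP) localization mechanism that augments a navigation foundation model.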

## Key findings

- HERMES consistently exhibits generalizable behaviors across diverse, in-the-wild scenarios.
- The approach successfully performs numerous complex mobile bimanual dexterous manipulation tasks.
- An end-to-end, depth image-based transfer method is used to mitigate the sim2real gap.

## Abstract

Leveraging human motion data to impart robots with versatile manipulation skills has emerged as a promising paradigm in robotic manipulation. Nevertheless, translating multi-source human hand motions into feasible robot behaviors remains challenging, particularly for robots equipped with multi-fingered dexterous hands characterized by complex, high-dimensional action spaces. Moreover, existing approaches often struggle to produce policies capable of adapting to diverse environmental conditions. In this paper, we introduce HERMES, a human-to-robot learning framework for mobile bimanual dexterous manipulation. First, HERMES formulates a unified reinforcement learning approach capable of seamlessly transforming heterogeneous human hand motions from multiple sources into physically plausible robotic behaviors. Subsequently, to mitigate the sim2real gap, we devise an end-to-end, depth image-based sim2real transfer method for improved generalization to real-world scenarios. Furthermore, to enable autonomous operation in varied and unstructured environments, we augment the navigation foundation model with a closed-loop Perspective-n-Point (PnP) localization mechanism, ensuring precise alignment of visual goals and effectively bridging autonomous navigation and dexterous manipulation. Extensive experimental results demonstrate that HERMES consistently exhibits generalizable behaviors across diverse, in-the-wild scenarios, successfully performing numerous complex mobile bimanual dexterous manipulation tasks. Project Page: https://gemcollector.github.io/HERMES/.
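The abstract describes the PnP localization step only as "closed-loop", so the following is a minimal illustrative sketch of what such a loop could look like, not the authors' implementation. Every name here (`Pose2D`, `SimulatedBase`, `closed_loop_align`, the gain and tolerances) is hypothetical, the pose estimate that a PnP solver would supply is replaced by a perfect simulated reading, and motion is applied directly in the goal frame for simplicity.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    """Planar base pose: position in metres, heading in radians (assumed representation)."""
    x: float
    y: float
    yaw: float


def pose_error(current: Pose2D, goal: Pose2D):
    """Return (dx, dy, dyaw) from current to goal, with dyaw wrapped to [-pi, pi]."""
    dx = goal.x - current.x
    dy = goal.y - current.y
    diff = goal.yaw - current.yaw
    dyaw = math.atan2(math.sin(diff), math.cos(diff))
    return dx, dy, dyaw


class SimulatedBase:
    """Hypothetical stand-in for a mobile base plus a PnP-based pose estimator."""

    def __init__(self, pose: Pose2D):
        self.pose = pose

    def estimate_pose(self) -> Pose2D:
        # Assumption: a real system would solve PnP on 2D-3D correspondences here.
        return self.pose

    def apply_motion(self, dx: float, dy: float, dyaw: float) -> None:
        # Assumption: the base executes the commanded displacement exactly.
        self.pose = Pose2D(self.pose.x + dx, self.pose.y + dy, self.pose.yaw + dyaw)


def closed_loop_align(base, goal, pos_tol=0.01, yaw_tol=0.02, gain=0.5, max_steps=50):
    """Re-estimate the pose and correct a fraction of the error until within tolerance.

    Returns (converged, number_of_corrections_applied).
    """
    for step in range(max_steps):
        dx, dy, dyaw = pose_error(base.estimate_pose(), goal)
        if math.hypot(dx, dy) <= pos_tol and abs(dyaw) <= yaw_tol:
            return True, step
        # Proportional correction: move only part of the way, then measure again.
        base.apply_motion(gain * dx, gain * dy, gain * dyaw)
    return False, max_steps


if __name__ == "__main__":
    base = SimulatedBase(Pose2D(0.4, -0.3, 0.3))
    converged, steps = closed_loop_align(base, Pose2D(0.0, 0.0, 0.0))
    print(converged, steps, base.pose)
```

The point of the loop is that the pose is re-measured after every correction, so errors from imperfect motion execution do not accumulate the way they would in a single open-loop move.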

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20085/full.md

## Figures

24 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20085/full.md

## References

96 references — full list in the complete paper: https://tomesphere.com/paper/2508.20085/full.md

---
Source: https://tomesphere.com/paper/2508.20085