Learning Human-Intention Priors from Large-Scale Human Demonstrations for Robotic Manipulation

Yifan Xie; YuAn Wang; Guangyu Chen; Jinkun Liu; Yu Sun; Wenbo Ding

arXiv:2604.24681 · cs.RO · May 22, 2026

TL;DR

This paper introduces MoT-HRA, a hierarchical vision-language-action framework that learns human-intention priors from large-scale human demonstrations and uses them to improve robotic manipulation.

Contribution

The paper presents a hierarchical model (MoT-HRA) and a large-scale dataset (HA-2.2M) for learning embodiment-agnostic human manipulation priors that transfer to robots.

Findings

01. MoT-HRA improves motion plausibility in manipulation tasks.

02. The framework makes control more robust under distribution shift.

03. The HA-2.2M dataset (2.2M episodes) enables large-scale learning of human manipulation priors.

Abstract

Human videos contain rich manipulation priors, but using them for robot learning remains difficult because raw observations entangle scene understanding, human motion, and embodiment-specific action. We introduce MoT-HRA, a hierarchical vision-language-action framework that learns human-intention priors from large-scale human demonstrations. We first curate HA-2.2M, a 2.2M-episode action-language dataset reconstructed from heterogeneous human videos through hand-centric filtering, spatial reconstruction, temporal segmentation, and language alignment. On top of this dataset, MoT-HRA factorizes manipulation into three coupled experts: a vision-language expert predicts an embodiment-agnostic 3D trajectory, an intention expert models MANO-style hand motion as a latent human-motion prior, and a fine expert maps the intention-aware representation to robot action chunks. A shared-attention…
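
The three-expert factorization described in the abstract can be pictured as a data flow from observation to trajectory to intention latent to action chunk. The sketch below is only an illustration of that flow under heavy assumptions: every function name, dimension, and computation is hypothetical, each expert is a hand-written stub rather than a learned model, and the experts are chained sequentially for simplicity even though the paper describes them as coupled.

```python
from dataclasses import dataclass
from typing import List

Vec = List[float]


@dataclass
class Observation:
    image_features: Vec  # placeholder for encoded camera input (assumption)
    instruction: str     # natural-language task description


def vision_language_expert(obs: Observation, horizon: int = 8) -> List[Vec]:
    """Stub: return an embodiment-agnostic 3D trajectory of `horizon` waypoints."""
    scale = sum(obs.image_features) / max(len(obs.image_features), 1)
    # Straight-line path whose extent depends on the fake visual features.
    return [[scale * t / horizon, 0.0, 0.1 * t] for t in range(1, horizon + 1)]


def intention_expert(trajectory: List[Vec], latent_dim: int = 4) -> Vec:
    """Stub: compress the trajectory into a latent standing in for a hand-motion prior."""
    flat = [c for waypoint in trajectory for c in waypoint]
    latent = []
    for i in range(latent_dim):
        part = flat[i::latent_dim]
        # Mean of a strided slice; a real model would use a learned encoder.
        latent.append(sum(part) / max(len(part), 1))
    return latent


def fine_expert(trajectory: List[Vec], latent: Vec,
                action_dim: int = 7, chunk_size: int = 4) -> List[Vec]:
    """Stub: map the intention-aware representation to a chunk of robot actions."""
    bias = sum(latent) / len(latent)
    actions = []
    for waypoint in trajectory[:chunk_size]:
        # First three dims follow the waypoint; the rest are filled with the latent bias.
        actions.append(list(waypoint) + [bias] * (action_dim - 3))
    return actions


def predict_action_chunk(obs: Observation) -> List[Vec]:
    """Chain the three stubs: trajectory -> intention latent -> action chunk."""
    trajectory = vision_language_expert(obs)
    latent = intention_expert(trajectory)
    return fine_expert(trajectory, latent)


obs = Observation(image_features=[0.2, 0.4, 0.6], instruction="pick up the cup")
chunk = predict_action_chunk(obs)
print(len(chunk), len(chunk[0]))  # prints: 4 7
```

The point of the sketch is the interface between stages: only the last stage emits robot-specific actions, while the first two outputs carry no embodiment-specific dimensions.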
