Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations

Wei-Jin Huang; Yue-Yi Zhang; Yi-Lin Wei; Zhi-Wei Xia; Juantao Tan; Yuan-Ming Li; Zhilin Zhao; Wei-Shi Zheng

arXiv:2601.09518 · cs.RO · January 15, 2026

TL;DR

This paper introduces a framework for teaching humanoid robots whole-body physical interaction with humans: it converts Human-Human Interaction (HHI) data into Human-Humanoid Interaction (HHoI) data and trains a hierarchical policy for responsive, synchronized behavior.

Contribution

The paper presents PAIR (Physics-Aware Interaction Retargeting), a contact-preserving retargeting method, and D-STAR (Decoupled Spatio-Temporal Action Reasoner), a hierarchical policy with phase attention and multi-scale spatial modules. Together they enable effective learning from HHI data.

Findings

01 High-quality HHoI data generated from HHI data improves learning.

02 D-STAR outperforms baseline policies in simulation.

03 The framework enables responsive and synchronized humanoid interactions.

Abstract

Enabling humanoid robots to physically interact with humans is a critical frontier, but progress is hindered by the scarcity of high-quality Human-Humanoid Interaction (HHoI) data. While leveraging abundant Human-Human Interaction (HHI) data presents a scalable alternative, we first demonstrate that standard retargeting fails by breaking the essential contacts. We address this with PAIR (Physics-Aware Interaction Retargeting), a contact-centric, two-stage pipeline that preserves contact semantics across morphology differences to generate physically consistent HHoI data. This high-quality data, however, exposes a second failure: conventional imitation learning policies merely mimic trajectories and lack interactive understanding. We therefore introduce D-STAR (Decoupled Spatio-Temporal Action Reasoner), a hierarchical policy that disentangles when to act from where to act. In D-STAR,…

Taxonomy

Topics: Social Robot Interaction and HRI · Robot Manipulation and Learning · Human Pose and Action Recognition