Hand-Eye Autonomous Delivery: Learning Humanoid Navigation, Locomotion and Reaching

Sirui Chen; Yufei Ye; Zi-Ang Cao; Jennifer Lew; Pei Xu; C. Karen Liu

arXiv:2508.03068 · cs.RO · August 11, 2025

Open Access

TL;DR

This paper introduces HEAD, a modular framework enabling humanoids to learn navigation, locomotion, and reaching skills directly from human motion and vision data, with successful real-world and simulation demonstrations.

Contribution

HEAD is the first framework to integrate learning of navigation, locomotion, and reaching for humanoids from human data, emphasizing modularity and scalability.

Findings

01 Effective in simulation and real-world environments

02 Decouples perception from physical actions for scalable learning

03 Successfully performs complex navigation and reaching tasks

Abstract

We propose Hand-Eye Autonomous Delivery (HEAD), a framework that learns navigation, locomotion, and reaching skills for humanoids directly from human motion and vision perception data. We take a modular approach in which a high-level planner commands the target position and orientation of the humanoid's hands and eyes, and a low-level policy delivers those targets by controlling the whole-body movements. Specifically, the low-level whole-body controller learns to track the three points (eyes, left hand, and right hand) from existing large-scale human motion capture data, while the high-level policy learns from human data collected by Aria glasses. Our modular approach decouples the egocentric vision perception from physical actions, promoting efficient learning and scalability to novel scenes. We evaluate our method both in simulation and in the real world, demonstrating humanoid's capabilities to…
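The three-point interface described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `Pose`, `ThreePointTarget`, `toy_planner`, and `tracking_errors` are hypothetical, the planner is a hand-written stand-in for the learned high-level policy, and the error function only shows the quantity a low-level tracker would plausibly drive toward zero.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), identity by default


@dataclass(frozen=True)
class Pose:
    """Target position and orientation of one tracked point (hypothetical type)."""
    position: Vec3
    orientation: Quat = (1.0, 0.0, 0.0, 0.0)


@dataclass(frozen=True)
class ThreePointTarget:
    """The interface between the two levels: eyes, left hand, right hand."""
    eyes: Pose
    left_hand: Pose
    right_hand: Pose


def toy_planner(goal: Vec3) -> ThreePointTarget:
    """Stand-in for the high-level policy (assumption: the real one is learned
    from egocentric human data). Places the eyes above the goal and the hands
    on either side of it."""
    x, y, z = goal
    return ThreePointTarget(
        eyes=Pose((x, y, z + 0.5)),
        left_hand=Pose((x, y + 0.2, z)),
        right_hand=Pose((x, y - 0.2, z)),
    )


def tracking_errors(target: ThreePointTarget,
                    current: ThreePointTarget) -> Dict[str, float]:
    """Euclidean position error for each tracked point. A low-level whole-body
    controller would act to reduce these; orientation error is omitted here."""
    return {
        name: math.dist(getattr(target, name).position,
                        getattr(current, name).position)
        for name in ("eyes", "left_hand", "right_hand")
    }


if __name__ == "__main__":
    current = toy_planner((0.0, 0.0, 1.0))   # where the three points are now
    target = toy_planner((3.0, 4.0, 1.0))    # where the planner wants them
    print(tracking_errors(target, current))
```

Because the planner only emits a `ThreePointTarget`, either level could in principle be retrained or replaced without touching the other, which is the decoupling the abstract emphasizes.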

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Social Robot Interaction and HRI