Hand-Eye Autonomous Delivery: Learning Humanoid Navigation, Locomotion and Reaching
Sirui Chen, Yufei Ye, Zi-Ang Cao, Jennifer Lew, Pei Xu, C. Karen Liu

TL;DR
This paper introduces HEAD, a modular framework enabling humanoids to learn navigation, locomotion, and reaching skills directly from human motion and vision data, with successful real-world and simulation demonstrations.
Contribution
HEAD integrates the learning of navigation, locomotion, and reaching for humanoids from human data. Its modular design pairs a high-level planner with a low-level whole-body controller, which the authors argue aids scalability to novel scenes.
Findings
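Evaluated both in simulation and in the real world
Decouples egocentric vision perception from physical actions, promoting efficient learning and scalability to novel scenes
Learns whole-body tracking of three points (eyes, left hand, right hand) from large-scale human motion capture data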
Abstract
We propose Hand-Eye Autonomous Delivery (HEAD), a framework that learns navigation, locomotion, and reaching skills for humanoids directly from human motion and vision perception data. We take a modular approach in which the high-level planner commands the target position and orientation of the humanoid's hands and eyes, and the low-level policy that controls whole-body movement delivers them. Specifically, the low-level whole-body controller learns to track the three points (eyes, left hand, and right hand) from existing large-scale human motion capture data, while the high-level policy learns from human data collected by Aria glasses. Our modular approach decouples the egocentric vision perception from physical actions, promoting efficient learning and scalability to novel scenes. We evaluate our method both in simulation and in the real world, demonstrating the humanoid's capabilities to…
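The split described in the abstract can be pictured as a narrow interface between the two levels: the planner emits pose targets for three points, and the controller turns those targets into whole-body commands. The following is a minimal sketch of that interface only, not the authors' implementation. Every name here (`Pose`, `ThreePointTarget`, `control_step`, the dummy planner and controller) is hypothetical, and the quaternion convention, observation type, and action format are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # assumed (w, x, y, z) convention


@dataclass(frozen=True)
class Pose:
    """Target position and orientation of one tracked point."""
    position: Vec3
    orientation: Quat = (1.0, 0.0, 0.0, 0.0)  # identity rotation


@dataclass(frozen=True)
class ThreePointTarget:
    """The interface between the two levels: eyes, left hand, right hand."""
    eyes: Pose
    left_hand: Pose
    right_hand: Pose

    def flatten(self) -> List[float]:
        """Concatenate the three poses into one 21-dimensional vector."""
        out: List[float] = []
        for pose in (self.eyes, self.left_hand, self.right_hand):
            out.extend(pose.position)
            out.extend(pose.orientation)
        return out


# Hypothetical signatures: the planner sees only egocentric observations,
# the controller sees only the targets plus proprioception.
HighLevelPlanner = Callable[[object], ThreePointTarget]
LowLevelController = Callable[[ThreePointTarget, Sequence[float]], List[float]]


def control_step(planner: HighLevelPlanner,
                 controller: LowLevelController,
                 egocentric_obs: object,
                 proprio: Sequence[float]) -> List[float]:
    """One step of the modular loop: plan three-point targets, then track them."""
    target = planner(egocentric_obs)
    return controller(target, proprio)


def dummy_planner(egocentric_obs: object) -> ThreePointTarget:
    """Placeholder planner that ignores its input and returns fixed targets."""
    return ThreePointTarget(
        eyes=Pose((0.0, 0.0, 1.5)),
        left_hand=Pose((0.3, 0.2, 1.0)),
        right_hand=Pose((0.3, -0.2, 1.0)),
    )


def dummy_controller(target: ThreePointTarget,
                     proprio: Sequence[float]) -> List[float]:
    """Placeholder controller that returns one zero command per joint reading."""
    return [0.0] * len(proprio)
```

Because the controller depends only on `ThreePointTarget` and proprioception, either side could in principle be retrained or swapped without touching the other, which is the decoupling the abstract emphasizes.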
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Social Robot Interaction and HRI
