EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations
Justin Yu, Yide Shentu, Di Wu, Pieter Abbeel, Ken Goldberg, Philipp Wu

TL;DR
EgoMI is a framework that captures synchronized head and hand movements during human manipulation demonstrations, enabling robots to better imitate human actions despite embodiment differences by modeling active viewpoint changes.
Contribution
We introduce EgoMI, a novel data collection and learning framework that incorporates head motion and active visual search strategies to improve robot imitation from egocentric human demonstrations.
Findings
Policies with head-motion modeling outperform baselines.
Explicit head-motion modeling improves robustness in imitation learning.
EgoMI effectively bridges the embodiment gap for semi-humanoid robots.
Abstract
Imitation learning from human demonstrations offers a promising approach for robot skill acquisition, but egocentric human data introduces fundamental challenges due to the embodiment gap. During manipulation, humans actively coordinate head and hand movements, continuously reposition their viewpoint and use pre-action visual fixation search strategies to locate relevant objects. These behaviors create dynamic, task-driven head motions that static robot sensing systems cannot replicate, leading to a significant distribution shift that degrades policy performance. We present EgoMI (Egocentric Manipulation Interface), a framework that captures synchronized end-effector and active head trajectories during manipulation tasks, resulting in data that can be retargeted to compatible semi-humanoid robot embodiments. To handle rapid and wide-spanning head viewpoint changes, we introduce a…
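The abstract describes capturing synchronized end-effector and head trajectories. As a rough illustration of what "synchronized" can mean in practice, the sketch below pairs each head-pose timestamp with the nearest hand-pose timestamp and drops pairs that are too far apart. This is not the authors' pipeline: the function name `align_nearest`, the `max_gap` tolerance, and the assumption that both streams carry sorted timestamps in seconds are all hypothetical choices made for this example.

```python
from bisect import bisect_left


def align_nearest(head_ts, hand_ts, max_gap=0.02):
    """Match each head timestamp to the nearest hand timestamp.

    head_ts, hand_ts: sorted lists of timestamps in seconds (assumed).
    max_gap: hypothetical tolerance; pairs further apart are rejected.
    Returns a list with one entry per head sample: the index of the
    matched hand sample, or None when no sample is close enough.
    """
    matches = []
    for t in head_ts:
        # Position where t would be inserted to keep hand_ts sorted.
        i = bisect_left(hand_ts, t)
        # The nearest sample is either just before or just after that position.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(hand_ts)]
        if not candidates:
            matches.append(None)
            continue
        best = min(candidates, key=lambda j: abs(hand_ts[j] - t))
        matches.append(best if abs(hand_ts[best] - t) <= max_gap else None)
    return matches


if __name__ == "__main__":
    head = [0.00, 0.10, 0.20]
    hand = [0.01, 0.09, 0.50]
    print(align_nearest(head, hand))
```

Nearest-timestamp matching is only one option; interpolating poses between samples is a common alternative when the two sensors run at different rates.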
Taxonomy
Topics: Robot Manipulation and Learning · Social Robot Interaction and HRI · Motor Control and Adaptation
