EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations

Justin Yu; Yide Shentu; Di Wu; Pieter Abbeel; Ken Goldberg; Philipp Wu

arXiv:2511.00153·cs.RO·March 11, 2026

Open Access

TL;DR

EgoMI is a framework that captures synchronized head and hand movements during human manipulation demonstrations, enabling robots to better imitate human actions despite embodiment differences by modeling active viewpoint changes.

Contribution

We introduce EgoMI, a novel data collection and learning framework that incorporates head motion and active visual search strategies to improve robot imitation from egocentric human demonstrations.

Findings

01 Policies with head-motion modeling outperform baselines.

02 Explicit head-motion modeling improves robustness in imitation learning.

03 EgoMI effectively bridges the embodiment gap for semi-humanoid robots.

Abstract

Imitation learning from human demonstrations offers a promising approach for robot skill acquisition, but egocentric human data introduces fundamental challenges due to the embodiment gap. During manipulation, humans actively coordinate head and hand movements, continuously repositioning their viewpoint and using pre-action visual fixation search strategies to locate relevant objects. These behaviors create dynamic, task-driven head motions that static robot sensing systems cannot replicate, leading to a significant distribution shift that degrades policy performance. We present EgoMI (Egocentric Manipulation Interface), a framework that captures synchronized end-effector and active head trajectories during manipulation tasks, producing data that can be retargeted to compatible semi-humanoid robot embodiments. To handle rapid and wide-spanning head viewpoint changes, we introduce a…

Taxonomy

Topics: Robot Manipulation and Learning · Social Robot Interaction and HRI · Motor Control and Adaptation