TL;DR
HoMMI is a scalable framework for learning whole-body mobile manipulation directly from robot-free human demonstrations. It adds egocentric sensing to UMI interfaces to capture global context and uses a cross-embodiment hand-eye policy design to bridge the resulting human-to-robot embodiment gap.
Contribution
The paper presents a data collection and policy learning framework that bridges the human-to-robot embodiment gap, in both observation and action spaces, so that policies learned from robot-free human demonstrations transfer to the robot.
Findings
Enables long-horizon bimanual and whole-body manipulation tasks.
Uses egocentric sensing for global context in data collection.
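Transfers policies from human demonstrations to the robot through a cross-embodiment hand-eye policy design: an embodiment-agnostic visual representation, a relaxed head action representation, and a whole-body controller.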
Abstract
We present Whole-Body Mobile Manipulation Interface (HoMMI), a data collection and policy learning framework that learns whole-body mobile manipulation directly from robot-free human demonstrations. We augment UMI interfaces with egocentric sensing to capture the global context required for mobile manipulation, enabling portable, robot-free, and scalable data collection. However, naively incorporating egocentric sensing introduces a larger human-to-robot embodiment gap in both observation and action spaces, making policy transfer difficult. We explicitly bridge this gap with a cross-embodiment hand-eye policy design, including an embodiment agnostic visual representation; a relaxed head action representation; and a whole-body controller that realizes hand-eye trajectories through coordinated whole-body motion under robot-specific physical constraints. Together, these enable long-horizon…
Taxonomy
Topics: Robot Manipulation and Learning · Social Robot Interaction and HRI · Motor Control and Adaptation
