HoMeR: Learning In-the-Wild Mobile Manipulation via Hybrid Imitation and Whole-Body Control

Priya Sundaresan; Rhea Malhotra; Phillip Miao; Jingyun Yang; Jimmy Wu; Hengyuan Hu; Rika Antonova; Francis Engelmann; Dorsa Sadigh; Jeannette Bohg

arXiv:2506.01185 · cs.RO · October 14, 2025

TL;DR

HoMeR is an imitation learning framework that combines whole-body control with hybrid action modes, enabling effective and generalizable mobile manipulation on real-world household tasks using only 20 demonstrations per task.

Contribution

HoMeR introduces a hybrid action space and a whole-body control approach for mobile manipulation, improving task success and generalization in in-the-wild environments.
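
The abstract describes the hybrid action space concretely: the policy switches between absolute end-effector pose predictions for long-range movement and relative pose predictions for fine-grained manipulation. The sketch below shows one way such an action could be resolved into a single world-frame end-effector target before it is handed to a controller. It is not the authors' code: `Pose`, `HybridAction`, `wrap_angle`, and `resolve_target` are hypothetical names, poses are simplified to position plus yaw, and expressing relative deltas in the end-effector's yaw-rotated frame is an assumed convention.

```python
import math
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Pose:
    """Simplified end-effector pose: position in meters plus yaw in radians."""
    x: float
    y: float
    z: float
    yaw: float


@dataclass(frozen=True)
class HybridAction:
    """Hypothetical policy output: a mode flag plus a pose payload."""
    mode: Literal["absolute", "relative"]
    pose: Pose


def wrap_angle(a: float) -> float:
    """Wrap an angle into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def resolve_target(current: Pose, action: HybridAction) -> Pose:
    """Turn a hybrid action into one world-frame end-effector target."""
    if action.mode == "absolute":
        # Long-range motion: the payload is already a world-frame target pose.
        return action.pose
    if action.mode == "relative":
        # Fine-grained motion: the payload is a small delta in the end effector's
        # yaw-rotated frame (an assumed convention), composed onto the current pose.
        d = action.pose
        c, s = math.cos(current.yaw), math.sin(current.yaw)
        return Pose(
            x=current.x + c * d.x - s * d.y,
            y=current.y + s * d.x + c * d.y,
            z=current.z + d.z,
            yaw=wrap_angle(current.yaw + d.yaw),
        )
    raise ValueError(f"unknown action mode: {action.mode!r}")


ee = Pose(1.0, 2.0, 0.8, math.pi / 2)
far = resolve_target(ee, HybridAction("absolute", Pose(4.0, -1.0, 0.9, 0.0)))
# With yaw = pi/2, a +x delta in the end-effector frame moves +y in the world:
# near is approximately Pose(1.0, 2.05, 0.78, pi/2).
near = resolve_target(ee, HybridAction("relative", Pose(0.05, 0.0, -0.02, 0.0)))
```

Resolving both modes to the same kind of world-frame target means a downstream controller sees one interface regardless of which mode the policy chose.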

Findings

1. Achieves a 79.17% success rate with 20 demonstrations per task.
2. Outperforms baselines by 29.17% on average.
3. Compatible with vision-language models for better generalization.

Abstract

We introduce HoMeR, an imitation learning framework for mobile manipulation that combines whole-body control with hybrid action modes that handle both long-range and fine-grained motion, enabling effective performance on realistic in-the-wild tasks. At its core is a fast, kinematics-based whole-body controller that maps desired end-effector poses to coordinated motion across the mobile base and arm. Within this reduced end-effector action space, HoMeR learns to switch between absolute pose predictions for long-range movement and relative pose predictions for fine-grained manipulation, offloading low-level coordination to the controller and focusing learning on task-level decisions. We deploy HoMeR on a holonomic mobile manipulator with a 7-DoF arm in a real home. We compare HoMeR to baselines without hybrid actions or whole-body control across 3 simulated and 3 real household tasks such…
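
The abstract describes the whole-body controller as a fast, kinematics-based mapping from desired end-effector poses to coordinated base and arm motion. As a hedged illustration of that general idea only, the sketch below runs weighted damped least-squares inverse kinematics on a toy planar robot: a holonomic base translating in x and y plus a hypothetical two-link arm, tracking end-effector position only. The link lengths, weights, damping, and step limit are made-up values, the function names are hypothetical, and the paper's actual controller (for a holonomic base with a 7-DoF arm and full end-effector poses) may be formulated differently.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy robot: holonomic base translating in (x, y) plus a planar 2-link arm.
LINK1, LINK2 = 0.4, 0.3  # link lengths in meters (made-up values)


def forward_kinematics(q: Sequence[float]) -> Tuple[float, float]:
    """End-effector (x, y) for q = [base_x, base_y, theta1, theta2]."""
    bx, by, t1, t2 = q
    x = bx + LINK1 * math.cos(t1) + LINK2 * math.cos(t1 + t2)
    y = by + LINK1 * math.sin(t1) + LINK2 * math.sin(t1 + t2)
    return x, y


def jacobian(q: Sequence[float]) -> List[List[float]]:
    """2x4 Jacobian of end-effector position w.r.t. [base_x, base_y, theta1, theta2]."""
    _, _, t1, t2 = q
    s1, c1 = math.sin(t1), math.cos(t1)
    s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
    return [
        [1.0, 0.0, -LINK1 * s1 - LINK2 * s12, -LINK2 * s12],
        [0.0, 1.0, LINK1 * c1 + LINK2 * c12, LINK2 * c12],
    ]


def whole_body_step(
    q: Sequence[float],
    target: Tuple[float, float],
    weights: Sequence[float] = (4.0, 4.0, 1.0, 1.0),  # higher weight = costlier joint motion
    damping: float = 0.05,
    max_step: float = 0.1,
) -> List[float]:
    """One weighted damped least-squares step:
    dq = W^-1 J^T (J W^-1 J^T + damping^2 I)^-1 e."""
    x, y = forward_kinematics(q)
    e = (target[0] - x, target[1] - y)
    J = jacobian(q)
    winv = [1.0 / w for w in weights]
    # A = J W^-1 J^T + damping^2 I is 2x2 and positive definite.
    A = [
        [
            sum(J[i][k] * winv[k] * J[j][k] for k in range(4))
            + (damping ** 2 if i == j else 0.0)
            for j in range(2)
        ]
        for i in range(2)
    ]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    # Solve A v = e with the closed-form 2x2 inverse.
    v0 = (A[1][1] * e[0] - A[0][1] * e[1]) / det
    v1 = (-A[1][0] * e[0] + A[0][0] * e[1]) / det
    dq = [winv[k] * (J[0][k] * v0 + J[1][k] * v1) for k in range(4)]
    # Clamp the joint-space step (mixes meters and radians; acceptable for a sketch).
    norm = math.sqrt(sum(d * d for d in dq))
    if norm > max_step:
        dq = [d * max_step / norm for d in dq]
    return [qi + di for qi, di in zip(q, dq)]


def solve(q: Sequence[float], target: Tuple[float, float],
          iters: int = 500, tol: float = 1e-4) -> List[float]:
    """Iterate until the end effector is within tol of target or iters run out."""
    q = list(q)
    for _ in range(iters):
        x, y = forward_kinematics(q)
        if math.hypot(target[0] - x, target[1] - y) < tol:
            break
        q = whole_body_step(q, target)
    return q


# Target far outside the 0.7 m arm reach, so the base has to travel.
q_far = solve([0.0, 0.0, 0.3, 0.5], (3.0, 1.0))
```

Because the base columns of the stacked Jacobian form an identity block, the Jacobian keeps full row rank even when the arm is stretched straight, so the base can carry long-range motion while the arm also contributes. The larger base weights penalize base motion more heavily in the least-squares objective; their relative value is one hypothetical knob for trading off base motion against arm motion.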

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Social Robot Interaction and HRI