Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Zipeng Fu, Tony Z. Zhao, Chelsea Finn

TL;DR
This paper introduces Mobile ALOHA, a low-cost whole-body teleoperation system for collecting demonstrations of bimanual mobile manipulation tasks, and shows that co-training with existing static ALOHA datasets substantially improves the success rates of imitation learning.
Contribution
The work presents a low-cost mobile whole-body teleoperation system and shows that co-training with existing static ALOHA datasets improves performance on mobile manipulation tasks.
Findings
With 50 demonstrations per task, co-training increases success rates by up to 90%.
Mobile ALOHA autonomously completes complex mobile manipulation tasks such as cooking and door opening.
The system augments ALOHA with a mobile base and a whole-body teleoperation interface.
Abstract
Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body teleoperation system for data collection. It augments the ALOHA system with a mobile base, and a whole-body teleoperation interface. Using data collected with Mobile ALOHA, we then perform supervised behavior cloning and find that co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations for each task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks…
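The co-training idea in the abstract can be illustrated with a minimal sketch: each training batch mixes mobile demonstrations with demonstrations from the existing static datasets. Everything named here is hypothetical and not taken from the paper, including the function `sample_cotraining_batch`, the `static_ratio` parameter, and its default of 0.5; the abstract does not state the mixing ratio or how the two datasets are aligned.

```python
import random


def sample_cotraining_batch(mobile_data, static_data, batch_size,
                            static_ratio=0.5, rng=None):
    """Draw one training batch that mixes two demonstration datasets.

    static_ratio is an assumed knob: the probability that each example
    is drawn from the static dataset rather than the mobile one.
    """
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        # Pick the source dataset first, then a random example from it.
        source = static_data if rng.random() < static_ratio else mobile_data
        batch.append(rng.choice(source))
    return batch


# Placeholder examples standing in for (observation, action) pairs.
mobile_demos = [("mobile", i) for i in range(50)]
static_demos = [("static", i) for i in range(200)]

batch = sample_cotraining_batch(mobile_demos, static_demos, batch_size=8,
                                rng=random.Random(0))
```

Sampling the source per example, rather than concatenating the datasets, keeps the smaller mobile dataset from being swamped by the larger static one. A behavior cloning loss would then be computed on each batch as usual.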
Taxonomy
Topics: Robot Manipulation and Learning · Stroke Rehabilitation and Recovery · Social Robot Interaction and HRI
Methods: Context Aggregated Bi-lateral Network for Semantic Segmentation · Focus
