BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation
Rutav Shah, Albert Yu, Yifeng Zhu, Yuke Zhu, Roberto Martín-Martín

TL;DR
BUMBLE is a unified vision-language framework enabling service robots to perform long-horizon, building-wide mobile manipulation tasks by integrating perception, motor skills, and memory, significantly improving success rates and user satisfaction.
Contribution
The paper introduces BUMBLE, a unified Vision-Language Model (VLM)-based framework that integrates open-world RGBD perception, a wide spectrum of gross-to-fine motor skills, and dual-layered memory to tackle building-wide mobile manipulation tasks.
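As a rough illustration of how a single decision loop could tie a skill library to two layers of memory, here is a minimal Python sketch. It is not the authors' implementation: every name in it (Memory, run_task, scripted_policy, the skill names, the step budget of 20) is hypothetical, the scripted policy merely stands in for a VLM query, and the split into per-trial and cross-trial memory is an assumption about what "dual-layered" could mean.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout; this is a sketch, not the BUMBLE codebase.


@dataclass
class Memory:
    # Assumed layer 1: what happened so far in the current trial.
    short_term: List[str] = field(default_factory=list)
    # Assumed layer 2: notes that persist across trials (here, failures).
    long_term: List[str] = field(default_factory=list)


Skill = Callable[[], bool]
Policy = Callable[[str, List[str], Memory], str]


def run_task(goal: str, skills: Dict[str, Skill], choose_skill: Policy,
             memory: Memory, max_steps: int = 20) -> bool:
    """Ask the policy for one skill at a time until it answers 'done'."""
    for _ in range(max_steps):
        name = choose_skill(goal, sorted(skills), memory)
        if name == "done":
            return True
        ok = skills[name]()  # execute the chosen skill
        memory.short_term.append(f"{name}:{'ok' if ok else 'fail'}")
        if not ok:
            memory.long_term.append(f"{name} failed during '{goal}'")
    return False  # step budget exhausted


def scripted_policy(goal: str, skill_names: List[str], memory: Memory) -> str:
    """Stand-in for a VLM call: follows a fixed plan, retrying failed steps."""
    plan = ["navigate_to_room", "open_door", "pick_up_object"]
    done = sum(entry.endswith(":ok") for entry in memory.short_term)
    return plan[done] if done < len(plan) else "done"


# Dummy skills that always succeed, for demonstration only.
demo_skills: Dict[str, Skill] = {
    "navigate_to_room": lambda: True,
    "open_door": lambda: True,
    "pick_up_object": lambda: True,
}
```

Calling run_task("fetch a soda", demo_skills, scripted_policy, Memory()) walks through the three dummy skills in order and then stops. In a real system the policy would receive camera observations and query a VLM, which this sketch deliberately leaves out.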
Findings
Achieves a 47.1% success rate averaged over 70 trials across different buildings, tasks, and scene layouts.
Outperforms multiple baselines on long-horizon tasks that require sequencing up to 12 ground-truth skills and span 15 minutes per trial.
User study shows 22% higher satisfaction compared to state-of-the-art methods.
Abstract
To operate at a building scale, service robots must perform very long-horizon mobile manipulation tasks by navigating to different rooms, accessing different floors, and interacting with a wide and unseen range of everyday objects. We refer to these tasks as Building-wide Mobile Manipulation. To tackle these inherently long-horizon tasks, we introduce BUMBLE, a unified Vision-Language Model (VLM)-based framework integrating open-world RGBD perception, a wide spectrum of gross-to-fine motor skills, and dual-layered memory. Our extensive evaluation (90+ hours) indicates that BUMBLE outperforms multiple baselines in long-horizon building-wide tasks that require sequencing up to 12 ground truth skills spanning 15 minutes per trial. BUMBLE achieves 47.1% success rate averaged over 70 trials in different buildings, tasks, and scene layouts from different starting rooms and floors. Our user…
Taxonomy
Topics: Robot Manipulation and Learning · Robotic Path Planning Algorithms · Robotics and Automated Systems
