BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation

Rutav Shah; Albert Yu; Yifeng Zhu; Yuke Zhu; Roberto Martín-Martín

arXiv:2410.06237·cs.RO·October 10, 2024

TL;DR

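BUMBLE is a unified Vision-Language Model (VLM)-based framework that lets service robots perform long-horizon, building-wide mobile manipulation tasks by integrating open-world RGBD perception, gross-to-fine motor skills, and dual-layered memory. It reaches a 47.1% success rate over 70 trials, outperforms multiple baselines, and scores 22% higher on user satisfaction than state-of-the-art methods.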

Contribution

The paper introduces BUMBLE, a unified VLM-based framework (not a new model) that combines open-world RGBD perception, a wide spectrum of gross-to-fine motor skills, and dual-layered memory for building-wide mobile manipulation tasks.

Findings

01 Achieves a 47.1% success rate averaged over 70 trials across different buildings, tasks, and scene layouts.

02 Outperforms multiple baselines in long-horizon tasks that require sequencing up to 12 ground-truth skills and span 15 minutes per trial.

03 User study shows 22% higher satisfaction compared to state-of-the-art methods.

Abstract

To operate at a building scale, service robots must perform very long-horizon mobile manipulation tasks by navigating to different rooms, accessing different floors, and interacting with a wide and unseen range of everyday objects. We refer to these tasks as Building-wide Mobile Manipulation. To tackle these inherently long-horizon tasks, we introduce BUMBLE, a unified Vision-Language Model (VLM)-based framework integrating open-world RGBD perception, a wide spectrum of gross-to-fine motor skills, and dual-layered memory. Our extensive evaluation (90+ hours) indicates that BUMBLE outperforms multiple baselines in long-horizon building-wide tasks that require sequencing up to 12 ground truth skills spanning 15 minutes per trial. BUMBLE achieves 47.1% success rate averaged over 70 trials in different buildings, tasks, and scene layouts from different starting rooms and floors. Our user…
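The abstract describes a VLM that sequences skills while drawing on perception and memory. The following is a minimal sketch of what such a loop could look like, not the authors' implementation. Every name in it (`Memory`, `run_episode`, `choose_skill`, the two toy skills) is hypothetical, the split of memory into short-term and long-term layers is an assumption about what "dual-layered" means, and the VLM query is replaced by a hand-written rule so the example runs on its own. The step budget of 12 simply mirrors the "up to 12 skills" figure above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical names throughout; nothing here is taken from the BUMBLE codebase.

@dataclass
class Memory:
    # Assumed layer 1: what happened so far in the current trial.
    short_term: List[str] = field(default_factory=list)
    # Assumed layer 2: notes carried over from earlier trials.
    long_term: List[str] = field(default_factory=list)

State = Dict[str, object]
Skill = Callable[[State], str]
Chooser = Callable[[State, Memory], Optional[str]]

def run_episode(task: str, skills: Dict[str, Skill], choose_skill: Chooser,
                memory: Memory, max_steps: int = 12) -> List[str]:
    """Repeatedly pick a skill, execute it, and log the outcome to memory."""
    state: State = {"task": task, "done": False}
    for _ in range(max_steps):
        name = choose_skill(state, memory)  # stand-in for a VLM query
        if name is None:
            break
        outcome = skills[name](state)
        memory.short_term.append(f"{name}: {outcome}")
        if state["done"]:
            break
    return list(memory.short_term)

# Toy skills that only mutate a dictionary; a real system would drive a robot.
def navigate(state: State) -> str:
    state["room"] = "kitchen"
    return "ok"

def pick(state: State) -> str:
    state["holding"] = True
    state["done"] = True
    return "ok"

def rule_based_chooser(state: State, memory: Memory) -> Optional[str]:
    """Hand-written policy used in place of a VLM so the sketch is runnable."""
    if state.get("room") != "kitchen":
        return "navigate"
    if not state.get("holding"):
        return "pick"
    return None

history = run_episode("fetch a can from the kitchen",
                      {"navigate": navigate, "pick": pick},
                      rule_based_chooser, Memory())
print(history)
```

Swapping `rule_based_chooser` for a function that prompts a VLM with the current observation and the memory contents would keep the rest of the loop unchanged, which is the main reason for passing the chooser in as a parameter.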

Code & Models

Repositories

ut-austin-robin/bumble

Taxonomy

Topics: Robot Manipulation and Learning · Robotic Path Planning Algorithms · Robotics and Automated Systems