Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps

Dicong Qiu; Wenzong Ma; Zhenfu Pan; Hui Xiong; Junwei Liang

arXiv:2406.18115 · cs.RO · June 27, 2024

TL;DR

This paper presents a novel framework enabling mobile robots to perform open-vocabulary manipulation in unseen dynamic environments by integrating 3D semantic mapping, visual-language models, and large language models for natural language understanding and planning.

Contribution

It introduces a new framework combining 3D semantic maps, visual-language models, and large language models for zero-shot manipulation and natural language understanding in dynamic environments.

Findings

01. Achieved over 80% success rate in real-world experiments.

02. Demonstrated effective zero-shot manipulation and natural language processing.

03. Improved navigation and task success metrics significantly over baseline methods.

Abstract

Open-Vocabulary Mobile Manipulation (OVMM) is a crucial capability for autonomous robots, especially when faced with the challenges posed by unknown and dynamic environments. This task requires robots to explore and build a semantic understanding of their surroundings, generate feasible plans to achieve manipulation goals, adapt to environmental changes, and comprehend natural language instructions from humans. To address these challenges, we propose a novel framework that leverages the zero-shot detection and grounded recognition capabilities of pre-trained visual-language models (VLMs) combined with dense 3D entity reconstruction to build 3D semantic maps. Additionally, we utilize large language models (LLMs) for spatial region abstraction and online planning, incorporating human instructions and spatial semantic context. We have built a 10-DoF mobile manipulation robotic platform…
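To make the idea of a 3D semantic map concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an upstream detector has already produced a text label and a 3D position for each observed object, and it fuses repeated observations of the same object into one map entry. The names `Entity`, `SemanticMap`, `add_detection`, `find`, and the `merge_radius` parameter are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Entity:
    """One mapped object: an open-vocabulary label plus a running 3D centroid."""
    label: str
    centroid: tuple
    observations: int = 1


@dataclass
class SemanticMap:
    """Toy 3D semantic map; merge_radius (metres) is an assumed parameter."""
    merge_radius: float = 0.3
    entities: list = field(default_factory=list)

    def add_detection(self, label, position):
        """Fuse a labelled 3D detection into a nearby same-label entity, else add it."""
        for e in self.entities:
            if e.label == label and math.dist(e.centroid, position) <= self.merge_radius:
                n = e.observations
                # Running mean of the centroid over all observations so far.
                e.centroid = tuple(
                    (c * n + p) / (n + 1) for c, p in zip(e.centroid, position)
                )
                e.observations = n + 1
                return e
        e = Entity(label, tuple(position))
        self.entities.append(e)
        return e

    def find(self, label):
        """Return every entity whose label exactly matches the query string."""
        return [e for e in self.entities if e.label == label]
```

In this sketch, two detections of "mug" that lie within `merge_radius` of each other become a single entity with an averaged centroid, while a distant "mug" becomes a second entity. A real system would match labels by embedding similarity rather than exact string equality and would store reconstructed geometry rather than a single point; those parts are deliberately left out here.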

Taxonomy

Topics: Robotics and Automated Systems · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Semi-Pseudo-Label · Shrink and Fine-Tune