Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps
Dicong Qiu, Wenzong Ma, Zhenfu Pan, Hui Xiong, Junwei Liang

TL;DR
This paper presents a novel framework enabling mobile robots to perform open-vocabulary manipulation in unseen dynamic environments by integrating 3D semantic mapping, visual-language models, and large language models for natural language understanding and planning.
Contribution
It introduces a new framework combining 3D semantic maps, visual-language models, and large language models for zero-shot manipulation and natural language understanding in dynamic environments.
Findings
Achieved over 80% success rate in real-world experiments.
Demonstrated effective zero-shot manipulation and natural language processing.
Improved navigation and task success metrics significantly over baseline methods.
Abstract
Open-Vocabulary Mobile Manipulation (OVMM) is a crucial capability for autonomous robots, especially when faced with the challenges posed by unknown and dynamic environments. This task requires robots to explore and build a semantic understanding of their surroundings, generate feasible plans to achieve manipulation goals, adapt to environmental changes, and comprehend natural language instructions from humans. To address these challenges, we propose a novel framework that leverages the zero-shot detection and grounded recognition capabilities of pretrained visual-language models (VLMs) combined with dense 3D entity reconstruction to build 3D semantic maps. Additionally, we utilize large language models (LLMs) for spatial region abstraction and online planning, incorporating human instructions and spatial semantic context. We have built a 10-DoF mobile manipulation robotic platform…
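The abstract describes building a 3D semantic map from labelled detections and their reconstructed 3D points. The sketch below is only an illustration of what such a map could look like as a data structure; it is not the authors' implementation. The names `Entity`, `SemanticMap`, `add_detection`, `find`, and `merge_radius` are hypothetical, and exact string matching on labels stands in for the VLM-based grounding the paper relies on.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Entity:
    """One mapped object: a text label plus its accumulated 3D points."""
    label: str
    points: List[Point] = field(default_factory=list)

    @property
    def centroid(self) -> Point:
        # Mean of all points observed for this entity so far.
        n = len(self.points)
        return tuple(sum(p[i] for p in self.points) / n for i in range(3))


class SemanticMap:
    """Hypothetical minimal map: a flat list of labelled entities."""

    def __init__(self, merge_radius: float = 0.5):
        # Assumed heuristic: detections with the same label whose centroids
        # lie within this distance (metres) are treated as the same object.
        self.merge_radius = merge_radius
        self.entities: List[Entity] = []

    def add_detection(self, label: str, points: Iterable[Point]) -> Entity:
        """Fuse a new labelled detection into the map."""
        pts = list(points)
        centroid = Entity(label, pts).centroid
        for entity in self.entities:
            same_label = entity.label == label
            if same_label and dist(entity.centroid, centroid) <= self.merge_radius:
                entity.points.extend(pts)  # re-observation of a known object
                return entity
        entity = Entity(label, pts)  # first sighting of a new object
        self.entities.append(entity)
        return entity

    def find(self, label: str) -> List[Entity]:
        """Return every mapped entity carrying exactly this label."""
        return [e for e in self.entities if e.label == label]
```

In this sketch, a planner would call `find("mug")` to obtain candidate targets and navigate to a candidate's `centroid`. Handling a dynamic environment would also require removing or relocating entities that are no longer observed, which is left out here.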
Taxonomy
Topics: Robotics and Automated Systems · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Semi-Pseudo-Label · Shrink and Fine-Tune
