Language-Conditioned Open-Vocabulary Mobile Manipulation with Pretrained Models
Shen Tan, Dong Zhou, Xiangyu Shao, Junqiao Wang, Guanghui Sun

TL;DR
This paper introduces LOVMM, a framework that combines large language and vision-language models to enable robots to perform open-vocabulary mobile manipulation tasks in household environments using natural language commands, demonstrating strong zero-shot and multi-task capabilities.
Contribution
The novel LOVMM framework integrates LLMs and VLMs for open-vocabulary mobile manipulation, enabling robots to understand and execute complex natural language instructions in household settings.
Findings
Strong zero-shot generalization in household environments
Effective multi-task learning capabilities
Higher success rates than state-of-the-art methods
Abstract
Open-vocabulary mobile manipulation (OVMM), which involves handling novel and unseen objects across different workspaces, remains a significant challenge for real-world robotic applications. In this paper, we propose a novel Language-conditioned Open-Vocabulary Mobile Manipulation framework, named LOVMM, which incorporates a large language model (LLM) and a vision-language model (VLM) to tackle various mobile manipulation tasks in household environments. Our approach can solve various OVMM tasks specified by free-form natural language instructions (e.g., "toss the food boxes on the office room desk to the trash bin in the corner" and "pack the bottles from the bed to the box in the guestroom"). Extensive simulated experiments in complex household environments show the strong zero-shot generalization and multi-task learning abilities of LOVMM. Moreover, our approach can also generalize to…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Multi-Agent Systems and Negotiation
