ReplanVLM: Replanning Robotic Tasks with Visual Language Models

Aoran Mei, Guo-Niu Zhu, Huaxiang Zhang, and Zhongxue Gan

arXiv:2407.21762 · cs.RO · August 1, 2024

TL;DR

ReplanVLM introduces a framework combining visual language models with error correction and replanning strategies to improve robotic task success rates in complex, real-world environments.

Contribution

The paper presents ReplanVLM, a framework that integrates internal and external error correction mechanisms with replanning strategies for robotic tasks.
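The abstract breaks off before describing the internal and external mechanisms, so the following Python sketch does not reproduce ReplanVLM. It is a generic closed-loop illustration under our own assumptions: an "internal" check that inspects a plan before execution, an "external" check that inspects the scene afterwards, and replanning with the error message passed back as feedback. Every name here (`run_with_replanning`, `RunLog`, the toy drawer stubs) is hypothetical; in a real system the planner and both checks would presumably be VLM calls rather than hard-coded functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical callback types for illustration only; ReplanVLM's real
# interfaces are not described in the abstract.
Planner = Callable[[str, Optional[str]], List[str]]  # (goal, feedback) -> steps
PlanCheck = Callable[[List[str]], Optional[str]]     # steps -> error or None
Executor = Callable[[str], bool]                     # step -> succeeded?
SceneCheck = Callable[[str], Optional[str]]          # goal -> error or None


@dataclass
class RunLog:
    attempts: int = 0
    feedback: List[str] = field(default_factory=list)


def run_with_replanning(goal: str, plan: Planner, check_plan: PlanCheck,
                        execute: Executor, check_scene: SceneCheck,
                        max_attempts: int = 3) -> Tuple[bool, RunLog]:
    """Plan, check, execute, verify; on any error, replan with feedback."""
    log = RunLog()
    feedback: Optional[str] = None
    while log.attempts < max_attempts:
        log.attempts += 1
        steps = plan(goal, feedback)
        # Assumed "internal" check: reject a flawed plan before acting.
        error = check_plan(steps)
        if error is None:
            for step in steps:
                if not execute(step):
                    error = f"step failed: {step}"
                    break  # stop at the first failed step
        if error is None:
            # Assumed "external" check: compare the observed scene with the goal.
            error = check_scene(goal)
        if error is None:
            return True, log
        feedback = error  # the next plan() call sees why this attempt failed
        log.feedback.append(error)
    return False, log


# Toy stubs: the first plan forgets to open the drawer.
state = {"drawer_open": False, "cup_in_drawer": False}


def toy_plan(goal: str, feedback: Optional[str]) -> List[str]:
    if feedback is None:
        return ["pick cup", "place cup in drawer"]
    return ["open drawer", "pick cup", "place cup in drawer"]


def toy_execute(step: str) -> bool:
    if step == "open drawer":
        state["drawer_open"] = True
    elif step == "place cup in drawer":
        if not state["drawer_open"]:
            return False
        state["cup_in_drawer"] = True
    return True


def toy_check_scene(goal: str) -> Optional[str]:
    return None if state["cup_in_drawer"] else "cup not in drawer"


ok, log = run_with_replanning("put the cup in the drawer", toy_plan,
                              lambda steps: None, toy_execute, toy_check_scene)
print(ok, log.attempts, log.feedback)
```

With these stubs, the first attempt fails at "place cup in drawer", that failure message is fed back to the planner, and the second plan succeeds, so the script prints `True 2 ['step failed: place cup in drawer']`. Capping `max_attempts` keeps a planner that never recovers from looping indefinitely.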

Findings

1. Higher success rates in real robot experiments
2. Robust error correction in open-world tasks
3. Effective replanning for task failures

Abstract

Large language models (LLMs) have gained increasing popularity in robotic task planning due to their exceptional abilities in text analytics and generation, as well as their broad knowledge of the world. However, they fall short in decoding visual cues. Because LLMs have limited direct perception of the world, their grasp of its current state is deficient. Visual language models (VLMs) fill this gap by integrating visual perception modules, which can make robotic task planning more autonomous. Despite these advances, VLMs still face challenges, such as task execution errors even when they are given accurate instructions. To address such issues, this paper proposes the ReplanVLM framework for robotic task planning. In this study, we focus on error correction interventions. An internal error correction mechanism and an…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Robotics and Automated Systems