ReplanVLM: Replanning Robotic Tasks with Visual Language Models
Aoran Mei, Guo-Niu Zhu, Huaxiang Zhang, and Zhongxue Gan

TL;DR
ReplanVLM introduces a framework combining visual language models with error correction and replanning strategies to improve robotic task success rates in complex, real-world environments.
Contribution
The paper presents ReplanVLM, a robotic task-planning framework that integrates internal and external error correction mechanisms with replanning strategies.
Findings
Higher success rates in real robot experiments
Robust error correction in open-world tasks
Effective replanning for task failures
Abstract
Large language models (LLMs) have gained increasing popularity in robotic task planning due to their exceptional abilities in text analytics and generation, as well as their broad knowledge of the world. However, they fall short in decoding visual cues. LLMs have limited direct perception of the world, which leads to a deficient grasp of the current state of the world. By contrast, the emergence of visual language models (VLMs) fills this gap by integrating visual perception modules, which can enhance the autonomy of robotic task planning. Despite these advancements, VLMs still face challenges, such as the potential for task execution errors, even when provided with accurate instructions. To address such issues, this paper proposes a ReplanVLM framework for robotic task planning. In this study, we focus on error correction interventions. An internal error correction mechanism and an…
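The abstract describes the framework only at a high level and is cut off before the two correction mechanisms are detailed. As a rough illustration, the Python sketch below shows a generic plan, check, execute, verify, and replan loop. It rests on stated assumptions. `plan_fn`, `internal_check`, `execute_fn`, and `external_check` are hypothetical callables that stand in for a VLM planner, a pre-execution self-review of the plan, a robot executor, and a post-execution visual check. They are not the authors' API. The toy drawer task is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReplanResult:
    success: bool
    attempts: int
    history: List[str] = field(default_factory=list)


def run_with_replanning(
    task: str,
    plan_fn: Callable[[str, str], List[str]],  # hypothetical planner: (task, feedback) -> steps
    internal_check: Callable[[List[str]], str],  # hypothetical plan review: "" if OK, else critique
    execute_fn: Callable[[str], None],  # hypothetical executor for a single step
    external_check: Callable[[str], str],  # hypothetical outcome check: "" if OK, else failure text
    max_replans: int = 3,
) -> ReplanResult:
    """Generic replanning loop (illustrative sketch, not ReplanVLM itself)."""
    feedback = ""
    history: List[str] = []
    # One initial attempt plus up to max_replans replanning attempts.
    for attempt in range(1, max_replans + 2):
        steps = plan_fn(task, feedback)
        # "Internal" correction: reject a flawed plan before the robot moves.
        critique = internal_check(steps)
        if critique:
            history.append(f"attempt {attempt}: plan rejected: {critique}")
            feedback = critique
            continue
        # "External" correction: check the observed outcome after each step.
        failure = ""
        for step in steps:
            execute_fn(step)
            failure = external_check(step)
            if failure:
                break
        if not failure:
            history.append(f"attempt {attempt}: success")
            return ReplanResult(True, attempt, history)
        history.append(f"attempt {attempt}: execution failed: {failure}")
        feedback = failure  # feed the failure description back into replanning
    return ReplanResult(False, max_replans + 1, history)


# --- Toy usage with hypothetical stubs ---
state = {"drawer_open": False, "grasp_fails": 1}


def plan_fn(task: str, feedback: str) -> List[str]:
    # Naive first plan; any feedback makes the stub planner add the missing step.
    if feedback:
        return ["open drawer", "pick cup", "place cup in drawer"]
    return ["pick cup", "place cup in drawer"]


def internal_check(steps: List[str]) -> str:
    return "" if "open drawer" in steps else "drawer is closed; open it first"


def execute_fn(step: str) -> None:
    if step == "open drawer":
        state["drawer_open"] = True


def external_check(step: str) -> str:
    # Simulate a single grasp failure that only execution feedback can reveal.
    if step == "pick cup" and state["grasp_fails"] > 0:
        state["grasp_fails"] -= 1
        return "cup slipped from gripper"
    return ""


result = run_with_replanning(
    "put the cup in the drawer", plan_fn, internal_check, execute_fn, external_check, max_replans=3
)
for line in result.history:
    print(line)
```

In this toy run, the internal check rejects the first plan before any motion. The external check then catches the simulated grasp failure, and the loop succeeds on the third attempt. Bounding the loop with `max_replans` keeps a persistently failing task from replanning indefinitely.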
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Robotics and Automated Systems
