RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
Yaran Chen, Wenbo Cui, Yuanwen Chen, Mining Tan, Xinyao Zhang, Dongbin, Zhao, He Wang

TL;DR
RoboGPT is a novel robotic agent that combines large language models with re-planning and specialized skills to improve long-term decision-making and task execution in daily instruction tasks, outperforming existing methods.
Contribution
The paper introduces RoboGPT, integrating LLM-based planning with a re-plan module and specialized RoboSkills, supported by a new robotic dataset, to enhance feasibility and adaptability in robotic task planning.
Findings
Outperforms SOTA on ALFRED daily tasks
Exceeds SOTA LLM planners in task rationality
Generalizes well to unseen tasks
Abstract
Robotic agents must master common sense and long-term sequential decisions to solve daily tasks through natural language instruction. The developments in Large Language Models (LLMs) in natural language processing have inspired efforts to use LLMs in complex robot planning. Despite LLMs' great generalization and comprehension of instruction tasks, LLMs-generated task plans sometimes lack feasibility and correctness. To address the problem, we propose a RoboGPT agent\footnote{our code and dataset will be released soon} for making embodied long-term decisions for daily tasks, with two modules: 1) LLMs-based planning with re-plan to break the task into multiple sub-goals; 2) RoboSkill individually designed for sub-goals to learn better navigation and manipulation skills. The LLMs-based planning is enhanced with a new robotic dataset and re-plan, called RoboGPT. The new robotic dataset of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
