Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian Ma, Yitao Liang

TL;DR
This paper introduces DEPS, an interactive planning method using large language models that improves multi-task agent performance in open-world environments like Minecraft by enabling better reasoning, error correction, and goal selection.
Contribution
The paper presents DEPS, a novel interactive planning framework that enhances multi-task agents' robustness and efficiency in open-world environments through description, explanation, and goal selection modules.
Findings
Accomplished 70+ Minecraft tasks as a zero-shot multi-task agent.
Nearly doubled overall performance compared to baseline methods.
Demonstrated effectiveness in non-open-ended domains such as ALFWorld and tabletop manipulation.
Abstract
We investigate the challenge of task planning for multi-task embodied agents in open-world environments. Two main difficulties are identified: 1) executing plans in an open-world environment (e.g., Minecraft) necessitates accurate and multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla planners do not consider how easily the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, the resulting plan could be inefficient or even infeasible. To this end, we propose "Describe, Explain, Plan and Select" (DEPS), an interactive planning approach based on Large Language Models (LLMs). DEPS facilitates better error correction on the initial LLM-generated plan by integrating a description of the plan execution process and providing…
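The interactive loop described in the abstract can be sketched roughly as follows. This is a minimal toy illustration, not the authors' implementation: every function name here (`interactive_plan`, `toy_plan`, `toy_execute`, `toy_describe`, `toy_explain`, `toy_select`) is hypothetical, the planner and explainer are hard-coded stand-ins for LLM calls, and the selector simply takes the first candidate as a placeholder for ranking sub-goals by how easily the agent can achieve them.

```python
from typing import Callable, List, Optional, Tuple


def interactive_plan(
    task: str,
    plan_fn: Callable[[str, Optional[str]], List[str]],
    select_fn: Callable[[List[str]], str],
    execute_fn: Callable[[str], bool],
    describe_fn: Callable[[str], str],
    explain_fn: Callable[[str], str],
    max_rounds: int = 5,
) -> Tuple[bool, List[str]]:
    """Execute sub-goals; on failure, describe it, explain it, and re-plan."""
    feedback: Optional[str] = None
    done: List[str] = []
    for _ in range(max_rounds):
        # (Re-)plan, skipping sub-goals that were already completed.
        remaining = [g for g in plan_fn(task, feedback) if g not in done]
        failed = False
        while remaining:
            goal = select_fn(remaining)
            if execute_fn(goal):
                done.append(goal)
                remaining.remove(goal)
            else:
                # Describe the failure, explain it, and feed that to the planner.
                feedback = explain_fn(describe_fn(goal))
                failed = True
                break
        if not failed:
            return True, done
    return False, done


# --- Toy stand-ins (all hypothetical) ---
completed = set()


def toy_plan(task: str, feedback: Optional[str]) -> List[str]:
    plan = ["collect logs", "craft planks", "craft sticks", "craft wooden pickaxe"]
    if feedback is not None and "crafting table" in feedback:
        plan.insert(3, "craft crafting table")  # repair the plan using feedback
    return plan


def toy_execute(goal: str) -> bool:
    # The pickaxe step fails until a crafting table has been made.
    if goal == "craft wooden pickaxe" and "craft crafting table" not in completed:
        return False
    completed.add(goal)
    return True


def toy_describe(goal: str) -> str:
    return f"failed sub-goal: {goal}"


def toy_explain(description: str) -> str:
    return description + "; a crafting table is probably missing"


def toy_select(candidates: List[str]) -> str:
    return candidates[0]  # placeholder for a feasibility-aware selector


success, done = interactive_plan(
    "craft wooden pickaxe", toy_plan, toy_select, toy_execute, toy_describe, toy_explain
)
print(success, done)
```

In this toy run the first plan omits the crafting table, the pickaxe step fails, and the explanation of that failure lets the second planning round insert the missing sub-goal.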
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
