Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents

Yue Wu; So Yeon Min; Yonatan Bisk; Ruslan Salakhutdinov; Amos Azaria; Yuanzhi Li; Tom Mitchell; Shrimai Prabhumoye

arXiv:2305.02412·cs.CL·May 9, 2023·6 cites

TL;DR

This paper introduces the Plan, Eliminate, and Track (PET) framework, which uses large language models to decompose tasks, filter observations, and track progress, improving embodied-agent performance on the AlfWorld instruction-following benchmark by 15% over the prior state of the art.

Contribution

The PET framework uses LLMs to simplify control problems for embodied agents without fine-tuning, addressing architecture constraints and improving generalization.

Findings

1. 15% improvement over SOTA on the AlfWorld benchmark
2. Effective task decomposition and observation filtering
3. Enhanced generalization to human goal specifications

Abstract

Pre-trained large language models (LLMs) capture procedural knowledge about the world. Recent work has leveraged LLMs' ability to generate abstract plans to simplify challenging control tasks, either by action scoring or by action modeling (fine-tuning). However, the transformer architecture inherits several constraints that make it difficult for the LLM to directly serve as the agent: e.g., limited input lengths, fine-tuning inefficiency, bias from pre-training, and incompatibility with non-text environments. To maintain compatibility with a low-level trainable actor, we propose to instead use the knowledge in LLMs to simplify the control problem, rather than solving it. We propose the Plan, Eliminate, and Track (PET) framework. The Plan module translates a task description into a list of high-level sub-tasks. The Eliminate module masks out irrelevant objects and receptacles from the…
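
The Plan and Eliminate steps named in the abstract can be loosely illustrated with a short Python sketch. This is not the authors' implementation: `toy_plan`, `toy_relevance`, `eliminate`, and the `threshold` parameter are hypothetical names introduced here, and the string-matching logic is only a stand-in for the LLM queries that the paper describes.

```python
from typing import Callable, List


def toy_plan(task: str) -> List[str]:
    """Hypothetical stand-in for a Plan step: split a task into sub-tasks.

    A real system would ask an LLM; here we simply split on " and ".
    """
    return [part.strip() for part in task.split(" and ") if part.strip()]


def toy_relevance(obj: str, task: str) -> float:
    """Hypothetical stand-in for an LLM relevance score in [0, 1].

    Returns 1.0 if the object name literally appears in the task text.
    """
    return 1.0 if obj.lower() in task.lower() else 0.0


def eliminate(
    objects: List[str],
    task: str,
    relevance: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cutoff, chosen only for illustration
) -> List[str]:
    """Keep only the objects whose relevance score meets the threshold."""
    return [obj for obj in objects if relevance(obj, task) >= threshold]


task = "heat an apple and put it in the fridge"
observed = ["apple", "fridge", "sofa", "microwave"]

sub_tasks = toy_plan(task)
kept = eliminate(observed, task, toy_relevance)
```

Here `sub_tasks` holds two entries and `kept` retains only `apple` and `fridge`. The keyword scorer also discards `microwave`, which a person would consider useful for heating; that gap is the kind of common-sense judgment the abstract attributes to the LLM rather than to simple matching.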

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques