On Grounded Planning for Embodied Tasks with Language Models
Bill Yuchen Lin, Chengsong Huang, Qian Liu, Wenda Gu, Sam Sommerer, Xiang Ren

TL;DR
This paper investigates whether language models can generate grounded, executable plans for embodied tasks. It introduces G-PlanET, a novel problem formulation, together with an evaluation protocol, and shows that environment encoding improves planning performance.
Contribution
It presents the first study on grounded planning with language models, introduces G-PlanET and an accompanying evaluation protocol, and demonstrates the benefits of environment encoding and iterative decoding strategies.
Findings
Encoding the environment as a table of objects improves planning accuracy.
Iterative decoding enhances plan quality.
Grounded planning performance varies with environment encoding.
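The summary does not spell out how iterative decoding works, so the following is only a minimal sketch under one assumption: the plan is generated one step at a time, and the steps produced so far are appended to the input before the next step is requested. The names `iterative_decode`, `toy_next_step`, and the `[PLAN]` and `[END]` markers are hypothetical and are not taken from the paper; `toy_next_step` is a scripted stand-in for a real language model.

```python
from typing import Callable, List


def iterative_decode(
    context: str,
    next_step: Callable[[str], str],
    max_steps: int = 10,
    stop_token: str = "[END]",
) -> List[str]:
    """Build a plan step by step, feeding earlier steps back into the prompt.

    `next_step` is any callable that maps a prompt string to one plan step.
    The prompt layout and the stop token are illustrative assumptions.
    """
    plan: List[str] = []
    for _ in range(max_steps):
        # Condition the next step on the context and on all steps so far.
        prompt = context + " [PLAN] " + " ; ".join(plan)
        step = next_step(prompt).strip()
        if not step or step == stop_token:
            break
        plan.append(step)
    return plan


def toy_next_step(prompt: str) -> str:
    """Scripted stand-in for a language model (hypothetical, for illustration)."""
    script = ["find mug", "pick up mug", "go to sink", "[END]"]
    # Count how many steps are already present after the [PLAN] marker.
    done = prompt.rsplit(" [PLAN] ", 1)[1]
    n_done = len([s for s in done.split(" ; ") if s])
    return script[min(n_done, len(script) - 1)]


if __name__ == "__main__":
    print(iterative_decode("goal: wash the mug", toy_next_step))
```

In a real setting, `next_step` would wrap a call to a trained model; the loop itself stays the same.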
Abstract
Language models (LMs) have demonstrated their capability in possessing commonsense knowledge of the physical world, a crucial aspect of performing tasks in everyday life. However, it remains unclear **whether LMs have the capacity to generate grounded, executable plans for embodied tasks.** This is a challenging task as LMs lack the ability to perceive the environment through vision and feedback from the physical environment. In this paper, we address this important research question and present the first investigation into the topic. Our novel problem formulation, named **G-PlanET**, inputs a high-level goal and a data table about objects in a specific environment, and then outputs a step-by-step actionable plan for a robotic agent to follow. To facilitate the study, we establish an **evaluation protocol** and design a dedicated metric to assess the quality of the plans. Our…
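To make the G-PlanET input described above more concrete, here is a rough sketch of how a high-level goal and an object table might be flattened into a single text sequence for a language model. The column names, the `[TABLE]` and `[ROW]` separators, and the functions `linearize_table` and `build_input` are all assumptions made for illustration; the abstract does not specify the encoding the authors actually use.

```python
from typing import Dict, List


def linearize_table(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Flatten table rows into 'col: value | col: value' strings joined by [ROW]."""
    parts = []
    for row in rows:
        # Missing cells are rendered as empty strings rather than raising.
        cells = [f"{col}: {row.get(col, '')}" for col in columns]
        parts.append(" | ".join(cells))
    return " [ROW] ".join(parts)


def build_input(goal: str, rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Combine the goal and the linearized object table into one model input."""
    return f"goal: {goal} [TABLE] {linearize_table(rows, columns)}"


if __name__ == "__main__":
    # Hypothetical object table for a kitchen scene.
    objects = [
        {"name": "mug", "location": "counter"},
        {"name": "sink", "location": "kitchen"},
    ]
    print(build_input("wash the mug", objects, ["name", "location"]))
```

The model's output for such an input would be the step-by-step plan; how well the plan respects the listed objects is what a grounded evaluation would need to check.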
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
