On the Limit of Language Models as Planning Formalizers
Cassie Huang, Li Zhang

TL;DR
This paper evaluates how well large language models (LLMs) generate complete formal planning representations, such as PDDL domain and problem files, from natural-language descriptions, highlighting both their strengths and their limitations as planning formalizers.
Contribution
It systematically assesses LLMs' ability to produce complete PDDL representations from natural-language descriptions, showing that this formalization approach is effective and robust relative to direct plan generation.
Findings
Most large models effectively formalize descriptions as PDDL.
Performance decreases as descriptions become more natural.
Models are robust to lexical perturbations.
Abstract
Large language models (LLMs) have been found to create plans that are neither executable nor verifiable in grounded environments. An emerging line of work demonstrates success in using the LLM as a formalizer to generate a formal representation of the planning domain in some language, such as the Planning Domain Definition Language (PDDL). This formal representation can be deterministically solved to find a plan. We systematically evaluate this methodology while bridging some major gaps. While previous work only generates a partial PDDL representation, given templated, and therefore unrealistic, environment descriptions, we generate the complete representation given descriptions of various naturalness levels. Among an array of observations critical to improving LLMs' formal planning abilities, we note that most large enough models can effectively formalize descriptions as PDDL, outperforming those…
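To make the "formalize, then solve deterministically" pipeline concrete, here is a toy sketch. It assumes a hand-grounded STRIPS encoding in Python as a stand-in for real PDDL files: `ground_blocksworld` plays the role of a domain file (action schemas) instantiated over a problem file's objects, `solve` is a breadth-first search standing in for an off-the-shelf PDDL planner, and `validate` checks whether a plan (for example, one an LLM wrote directly) is executable and reaches the goal. Blocksworld is chosen only because it is a standard benchmark domain; all names here are hypothetical and none of this is the paper's code.

```python
from collections import deque
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Action:
    """A grounded STRIPS action: preconditions, add effects, delete effects."""
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset


def ground_blocksworld(blocks):
    """Ground hypothetical Blocksworld action schemas (the 'domain')
    over the objects of a 'problem'. Facts are tuples like ("on", "a", "b")."""
    actions = []
    for b in blocks:
        hand_free = {("clear", b), ("ontable", b), ("handempty",)}
        actions.append(Action(f"pickup {b}", frozenset(hand_free),
                              frozenset({("holding", b)}), frozenset(hand_free)))
        actions.append(Action(f"putdown {b}", frozenset({("holding", b)}),
                              frozenset(hand_free), frozenset({("holding", b)})))
    for b, c in permutations(blocks, 2):
        stacked = {("on", b, c), ("clear", b), ("handempty",)}
        held = {("holding", b), ("clear", c)}
        actions.append(Action(f"stack {b} {c}", frozenset(held),
                              frozenset(stacked), frozenset(held)))
        actions.append(Action(f"unstack {b} {c}", frozenset(stacked),
                              frozenset(held), frozenset(stacked)))
    return actions


def apply(state, action):
    """Successor state: remove delete effects, then add add effects."""
    return (state - action.delete) | action.add


def solve(init, goal, actions):
    """Breadth-first search over states. Deterministic for a fixed action
    order; returns a shortest plan as a list of action names, or None."""
    start = frozenset(init)
    frontier = deque([start])
    parent = {start: None}  # state -> (previous state, action name)
    while frontier:
        state = frontier.popleft()
        if goal <= state:
            plan = []
            while parent[state] is not None:
                state, name = parent[state]
                plan.append(name)
            return plan[::-1]
        for a in actions:
            if a.pre <= state:
                nxt = apply(state, a)
                if nxt not in parent:
                    parent[nxt] = (state, a.name)
                    frontier.append(nxt)
    return None


def validate(init, goal, actions, plan):
    """True iff every step is applicable in turn and the goal holds at the end."""
    by_name = {a.name: a for a in actions}
    state = frozenset(init)
    for step in plan:
        a = by_name.get(step)
        if a is None or not a.pre <= state:
            return False
        state = apply(state, a)
    return goal <= state


# Usage: the Sussman anomaly (c on a; b on the table; goal: a on b on c).
BLOCKS = ["a", "b", "c"]
INIT = {("on", "c", "a"), ("ontable", "a"), ("ontable", "b"),
        ("clear", "c"), ("clear", "b"), ("handempty",)}
GOAL = frozenset({("on", "a", "b"), ("on", "b", "c")})
ACTIONS = ground_blocksworld(BLOCKS)
PLAN = solve(INIT, GOAL, ACTIONS)
print(PLAN)
```

The design point the sketch illustrates is the division of labor: the language model only has to get the representation right, after which the search and the validity check are mechanical and repeatable, whereas a directly generated action list has no such guarantee until it is run through something like `validate`.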
Taxonomy
Topics: Model-Driven Software Engineering Techniques
