An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs

Kaustubh Vyas; Damien Graux; Sébastien Montella; Pavlos Vougiouklis; Ruofei Lai; Keshuang Li; Yang Ren; Jeff Z. Pan

arXiv:2502.20175·cs.AI·February 28, 2025

Open Access

TL;DR

This paper systematically evaluates how well 20 large language models from 7 families, both commercial and open-source, understand and generate Planning Domain Definition Language (PDDL) in zero-shot settings, revealing strengths and limitations in their ability to perform formal planning tasks.

Contribution

It provides the first extensive analysis of multiple LLMs' capabilities in understanding and generating PDDL, highlighting current limitations and potential for AI planning applications.

Findings

01. Some models effectively parse and generate PDDL in zero-shot settings.

02. Limitations appear in complex planning scenarios requiring nuanced understanding.

03. Results inform future development of LLMs for formal planning tasks.

Abstract

In recent advancements, large language models (LLMs) have exhibited proficiency in code generation and chain-of-thought reasoning, laying the groundwork for tackling automatic formal planning tasks. This study evaluates the potential of LLMs to understand and generate Planning Domain Definition Language (PDDL), an essential representation in artificial intelligence planning. We conduct an extensive analysis across 20 distinct models spanning 7 major LLM families, both commercial and open-source. Our comprehensive evaluation sheds light on the zero-shot LLM capabilities of parsing, generating, and reasoning with PDDL. Our findings indicate that while some models demonstrate notable effectiveness in handling PDDL, others pose limitations in more complex scenarios requiring nuanced planning knowledge. These results highlight the promise and current limitations of LLMs in formal planning…
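For readers unfamiliar with PDDL, the following is a minimal sketch of what "parsing" a PDDL domain involves at the syntactic level. It is an illustration only and not the evaluation procedure used in the paper. The toy domain `toy-switch` and the helper names `tokenize`, `parse`, and `action_names` are invented for this example, and the parser ignores PDDL comments (lines starting with `;`).

```python
# Hypothetical illustration: NOT the paper's evaluation code.
# A toy PDDL domain; all names in it are invented for this sketch.
DOMAIN = """
(define (domain toy-switch)
  (:requirements :strips)
  (:predicates (on ?s) (off ?s))
  (:action turn-on
    :parameters (?s)
    :precondition (off ?s)
    :effect (and (on ?s) (not (off ?s)))))
"""


def tokenize(text):
    """Split PDDL text into parentheses and atoms (comments are not handled)."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Turn tokens into nested lists; raise ValueError on unbalanced parentheses."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unexpected ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]  # list of top-level expressions


def action_names(tree):
    """Collect the names of all (:action ...) blocks inside top-level expressions."""
    names = []
    for expr in tree:
        for item in expr:
            if isinstance(item, list) and item and item[0] == ":action":
                names.append(item[1])
    return names


if __name__ == "__main__":
    print(action_names(parse(tokenize(DOMAIN))))
```

A check like this only confirms that the text is well-formed as an s-expression. Whether a generated domain is semantically valid or solvable would require a real PDDL validator or planner, which this sketch does not attempt.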

Taxonomy

Topics: AI-based Problem Solving and Planning · Artificial Intelligence in Games · Multimodal Machine Learning Applications