How Far Are LLMs from Symbolic Planners? An NLP-Based Perspective
Ma'ayan Armony, Albert Meroño-Peñuela, Gerard Canal

TL;DR
This paper evaluates the reasoning capabilities of Large Language Models (LLMs) in AI planning by treating planning as an NLP task. It proposes a recovery pipeline for LLM-generated plans and analyzes its effectiveness compared to classical planners.
Contribution
It introduces an NLP-based evaluation and recovery pipeline for LLM-generated plans, providing a holistic analysis of their planning capabilities and limitations.
Findings
LLMs show no clear evidence of reasoning during plan generation.
The recovery pipeline improves plan quality and success rate.
On average, only 2.65 actions are executable in generated plans.
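One way to make the notion of "executable actions" concrete is to count how many actions at the start of a plan can be applied in sequence before a precondition fails. The sketch below is only an illustration under assumed STRIPS-style semantics; the `Action` class, the `executable_prefix_length` function, and the toy blocks domain are hypothetical and are not taken from the paper, whose exact metric may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS-style action (an assumption, not the paper's format)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset


def executable_prefix_length(initial_state, plan):
    """Count the leading actions that can be applied before one fails."""
    state = set(initial_state)
    count = 0
    for action in plan:
        # An action is executable only if all its preconditions hold.
        if not action.preconditions <= state:
            break
        # Apply the action: remove deleted facts, then add new facts.
        state = (state - action.del_effects) | action.add_effects
        count += 1
    return count


# Toy two-block domain, invented purely for illustration.
pick_a = Action(
    "pick-up a",
    frozenset({"clear a", "on-table a", "hand-empty"}),
    frozenset({"holding a"}),
    frozenset({"clear a", "on-table a", "hand-empty"}),
)
stack_a_b = Action(
    "stack a b",
    frozenset({"holding a", "clear b"}),
    frozenset({"on a b", "clear a", "hand-empty"}),
    frozenset({"holding a", "clear b"}),
)
pick_b = Action(
    "pick-up b",
    frozenset({"clear b", "on-table b", "hand-empty"}),
    frozenset({"holding b"}),
    frozenset({"clear b", "on-table b", "hand-empty"}),
)

initial = {"clear a", "clear b", "on-table a", "on-table b", "hand-empty"}
plan = [pick_a, stack_a_b, pick_b]  # the third action is invalid: b is covered

print(executable_prefix_length(initial, plan))
```

In this toy plan the first two actions apply, and the third fails because block b is no longer clear. Averaging such a count over many generated plans would give a figure comparable in spirit to the one reported above.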
Abstract
The reasoning and planning abilities of Large Language Models (LLMs) have been a frequent topic of discussion in recent years. Their ability to take unstructured planning problems as input has made LLMs' integration into AI planning an area of interest. Nevertheless, LLMs are still not reliable as planners, with the generated plans often containing mistaken or hallucinated actions. Existing benchmarking and evaluation methods investigate planning with LLMs, focusing primarily on success rate as a quality indicator in various planning tasks, such as validating plans or planning in relaxed conditions. In this paper, we approach planning with LLMs as a natural language processing (NLP) task, given that LLMs are NLP models themselves. We propose a recovery pipeline consisting of an NLP-based evaluation of the generated plans, along with three stages to recover the plans through NLP…
Taxonomy
Topics: AI-based Problem Solving and Planning · Language, Metaphor, and Cognition · Multi-Agent Systems and Negotiation
