Can LLM-Reasoning Models Replace Classical Planning? A Benchmark Study
Kai Goebel, Patrik Zips

TL;DR
This study systematically evaluates large language models' ability to perform robotic task planning, comparing their performance with classical planners across various benchmarks, highlighting their strengths and limitations in complex scenarios.
Contribution
The paper provides a comprehensive benchmark analysis of LLMs for planning, revealing their current capabilities and challenges compared to traditional planning algorithms.
Findings
LLMs perform well on simple planning tasks.
They struggle with complex scenarios requiring resource management.
Fundamental challenges remain in applying LLMs to real-world robotic planning.
Abstract
Recent advancements in Large Language Models (LLMs) have sparked interest in their potential for robotic task planning. While these models demonstrate strong generative capabilities, their effectiveness in producing structured and executable plans remains uncertain. This paper presents a systematic evaluation of a broad spectrum of current state-of-the-art language models, each directly prompted using Planning Domain Definition Language (PDDL) domain and problem files, and compares their planning performance with the Fast Downward planner across a variety of benchmarks. In addition to measuring success rates, we assess how faithfully the generated plans translate into sequences of actions that can actually be executed, identifying both strengths and limitations of using these models in this setting. Our findings show that while the models perform well on simpler planning tasks, they continue to…
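To make "plans that can actually be executed" concrete, here is a minimal sketch of an executability check for a STRIPS-style plan. It is not the authors' evaluation harness, which this summary does not describe. The `Action` class, the `validate_plan` function, and the toy block-stacking facts are all hypothetical, and actions are assumed to be already grounded (no PDDL parsing, no parameters, no numeric resources).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A ground STRIPS-style action (hypothetical, for illustration only)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset


def validate_plan(initial_state, goal, plan, actions):
    """Simulate `plan` step by step and return (ok, message).

    A plan passes only if every step names a known action, every
    precondition holds when the step is applied, and all goal facts
    hold in the final state.
    """
    state = set(initial_state)
    for step, name in enumerate(plan, start=1):
        action = actions.get(name)
        if action is None:
            return False, f"step {step}: unknown action {name!r}"
        missing = action.preconditions - state
        if missing:
            return False, f"step {step}: unmet preconditions {sorted(missing)}"
        # Apply the action: remove deleted facts, then add new ones.
        state = (state - action.delete_effects) | action.add_effects
    unmet = set(goal) - state
    if unmet:
        return False, f"goal not reached: {sorted(unmet)}"
    return True, "plan is executable and reaches the goal"


# Toy two-block problem (assumed facts, not taken from the paper's benchmarks).
ACTIONS = {
    "pick-a": Action(
        name="pick-a",
        preconditions=frozenset({"handempty", "clear-a", "ontable-a"}),
        add_effects=frozenset({"holding-a"}),
        delete_effects=frozenset({"handempty", "clear-a", "ontable-a"}),
    ),
    "stack-a-b": Action(
        name="stack-a-b",
        preconditions=frozenset({"holding-a", "clear-b"}),
        add_effects=frozenset({"on-a-b", "handempty", "clear-a"}),
        delete_effects=frozenset({"holding-a", "clear-b"}),
    ),
}
INITIAL_STATE = {"handempty", "clear-a", "ontable-a", "clear-b", "ontable-b"}
GOAL = {"on-a-b"}
```

Calling `validate_plan(INITIAL_STATE, GOAL, ["pick-a", "stack-a-b"], ACTIONS)` accepts the plan, while the one-step plan `["stack-a-b"]` is rejected because `holding-a` does not hold in the initial state. The second case is the kind of failure that separates a plausible-looking action sequence from an executable one; a real evaluation would parse the PDDL files and handle parameterized actions rather than hand-written ground facts.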
Taxonomy
Topics: Logic, Reasoning, and Knowledge · AI-based Problem Solving and Planning · Semantic Web and Ontologies
