On the Planning Abilities of Large Language Models: A Critical Investigation
Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, Subbarao Kambhampati

TL;DR
This paper critically examines the planning abilities of large language models, revealing limited autonomous planning success but promising improvements when used as heuristic guides in external planning systems.
Contribution
The study systematically evaluates LLMs' planning capabilities and introduces the LLM-Modulo setting, showing how LLMs can assist external planners and verifiers.
Findings
GPT-4, the best model tested, achieves an average success rate of only ~12% across domains when planning autonomously.
LLMs improve search efficiency when used as heuristics.
External verifiers enhance plan quality through feedback.
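The verifier-feedback finding can be illustrated with a minimal sketch of a generate-and-verify loop. This is not the authors' implementation: the STRIPS-style `Action` type, the `verify` and `llm_modulo` functions, the two-action toy domain, and `stub_propose` (a hard-coded stand-in for an LLM call) are all hypothetical names introduced here for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Optional


class Action(NamedTuple):
    """Hypothetical STRIPS-style action: preconditions, add list, delete list."""
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def verify(plan: List[str], init: FrozenSet[str], goal: FrozenSet[str],
           actions: Dict[str, Action]) -> Optional[str]:
    """Simulate the plan; return None if it is valid, else a feedback message."""
    state = set(init)
    for i, name in enumerate(plan):
        if name not in actions:
            return f"step {i}: unknown action {name}"
        action = actions[name]
        missing = action.pre - state
        if missing:
            return f"step {i}: {name} missing preconditions {sorted(missing)}"
        state = (state - action.delete) | action.add
    unmet = goal - state
    if unmet:
        return f"goal not reached: {sorted(unmet)}"
    return None


def llm_modulo(propose: Callable[[Optional[str]], List[str]],
               init: FrozenSet[str], goal: FrozenSet[str],
               actions: Dict[str, Action],
               max_rounds: int = 5) -> Optional[List[str]]:
    """Ask the proposer for plans, feeding verifier errors back, until one passes."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        plan = propose(feedback)
        feedback = verify(plan, init, goal, actions)
        if feedback is None:
            return plan
    return None  # no verified plan within the round budget


# Toy domain (assumed for illustration): pick up block A and stack it on block B.
ACTIONS = {
    "pickup_a": Action(
        pre=frozenset({"clear_a", "ontable_a", "handempty"}),
        add=frozenset({"holding_a"}),
        delete=frozenset({"clear_a", "ontable_a", "handempty"}),
    ),
    "stack_a_b": Action(
        pre=frozenset({"holding_a", "clear_b"}),
        add=frozenset({"on_a_b", "clear_a", "handempty"}),
        delete=frozenset({"holding_a", "clear_b"}),
    ),
}
INIT = frozenset({"clear_a", "clear_b", "ontable_a", "ontable_b", "handempty"})
GOAL = frozenset({"on_a_b"})


def stub_propose(feedback: Optional[str]) -> List[str]:
    """Stand-in for an LLM: a flawed first guess, then a repaired plan."""
    if feedback is None:
        return ["stack_a_b"]
    return ["pickup_a", "stack_a_b"]


plan = llm_modulo(stub_propose, INIT, GOAL, ACTIONS)
```

In this sketch the proposer never has to be correct on its own: the verifier decides whether a plan is accepted, and its error message is the only signal passed back for the next attempt.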
Abstract
Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. We aim to evaluate (1) the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks and (2) the potential of LLMs in LLM-Modulo settings where they act as a source of heuristic guidance for external planners and verifiers. We conduct a systematic study by generating a suite of instances on domains similar to the ones employed in the International Planning Competition and evaluate LLMs in two distinct modes: autonomous and heuristic. Our findings reveal that LLMs' ability to generate executable plans autonomously is rather limited, with the best model (GPT-4) having an average success rate of ~12% across the domains. However, the results in the LLM-Modulo setting show more promise. In the…
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and Terminology Studies · Topic Modeling
