On the Planning Abilities of Large Language Models: A Critical Investigation

Karthik Valmeekam; Matthew Marquez; Sarath Sreedharan; Subbarao Kambhampati

arXiv:2305.15771 · cs.AI · November 27, 2023 · 52 cites

TL;DR

This paper critically examines the planning abilities of large language models, revealing limited autonomous planning success but promising improvements when used as heuristic guides in external planning systems.

Contribution

The study systematically evaluates LLMs' planning capabilities and introduces the LLM-Modulo setting, showing how LLMs can assist external planners and verifiers.

Findings

01. GPT-4 achieves an average success rate of ~12% in autonomous planning.

02. LLMs improve search efficiency when used as a source of heuristic guidance for external planners.

03. External verifiers improve plan quality by providing feedback on generated plans.
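The verifier-feedback idea in the third finding can be illustrated with a minimal sketch. This is not the authors' implementation: the toy STRIPS-style domain, the `verify` and `plan_with_feedback` functions, and the `stub_generator` that stands in for an LLM call are all hypothetical names introduced here for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# Hypothetical toy action format: (preconditions, add effects, delete effects).
Action = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def verify(plan: List[str], actions: Dict[str, Action],
           init: FrozenSet[str], goal: FrozenSet[str]) -> Optional[str]:
    """Simulate the plan; return None if valid, else a feedback message."""
    state = set(init)
    for i, name in enumerate(plan):
        if name not in actions:
            return f"step {i}: unknown action {name}"
        pre, add, delete = actions[name]
        missing = pre - state
        if missing:
            return f"step {i}: {name} needs {sorted(missing)}"
        state = (state - delete) | add
    unmet = goal - state
    if unmet:
        return f"goal not reached: missing {sorted(unmet)}"
    return None


def plan_with_feedback(generate: Callable[[Optional[str]], List[str]],
                       actions: Dict[str, Action], init: FrozenSet[str],
                       goal: FrozenSet[str], max_rounds: int = 5):
    """Ask the generator for a plan, verify it, and feed errors back."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        plan = generate(feedback)
        feedback = verify(plan, actions, init, goal)
        if feedback is None:
            return plan, round_no
    return None, max_rounds


# Two-block toy domain (an assumption for this sketch, not from the paper).
ACTIONS: Dict[str, Action] = {
    "pickup-a": (frozenset({"clear-a", "ontable-a", "handempty"}),
                 frozenset({"holding-a"}),
                 frozenset({"clear-a", "ontable-a", "handempty"})),
    "stack-a-b": (frozenset({"holding-a", "clear-b"}),
                  frozenset({"on-a-b", "handempty", "clear-a"}),
                  frozenset({"holding-a", "clear-b"})),
}
INIT = frozenset({"clear-a", "ontable-a", "clear-b", "ontable-b", "handempty"})
GOAL = frozenset({"on-a-b"})


def stub_generator(feedback: Optional[str]) -> List[str]:
    """Stand-in for an LLM: a flawed first attempt, then a corrected plan."""
    if feedback is None:
        return ["stack-a-b"]
    return ["pickup-a", "stack-a-b"]


if __name__ == "__main__":
    print(plan_with_feedback(stub_generator, ACTIONS, INIT, GOAL))
```

In a real setting the stub would be replaced by a model call that receives the feedback string in its prompt, and the verifier would be a sound external plan validator rather than this toy simulator.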

Abstract

Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. We aim to evaluate (1) the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks and (2) the potential of LLMs in LLM-Modulo settings where they act as a source of heuristic guidance for external planners and verifiers. We conduct a systematic study by generating a suite of instances on domains similar to the ones employed in the International Planning Competition and evaluate LLMs in two distinct modes: autonomous and heuristic. Our findings reveal that LLMs' ability to generate executable plans autonomously is rather limited, with the best model (GPT-4) having an average success rate of ~12% across the domains. However, the results in the LLM-Modulo setting show more promise. In the…

Code & Models

Videos

Do you think that ChatGPT can reason? [Prof. Subbarao Kambhampati] · youtube

On the Planning Abilities of Large Language Models - A Critical Investigation · slideslive

Taxonomy

Topics: Natural Language Processing Techniques · Linguistics and Terminology Studies · Topic Modeling