LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench

Karthik Valmeekam; Kaya Stechly; Subbarao Kambhampati

arXiv:2409.13373 · cs.AI · September 23, 2024 · 6 cites

TL;DR

This paper evaluates the planning abilities of OpenAI's o1, described as a new kind of model called a Large Reasoning Model (LRM), on the PlanBench benchmark. o1 improves significantly on earlier LLMs but still shows limitations in accuracy and efficiency.

Contribution

The paper evaluates o1's planning capabilities on PlanBench, compares it with existing LLMs, and highlights both its advances and the challenges that remain.

Findings

01. o1 outperforms other models on PlanBench

02. o1's performance is a significant improvement, but the benchmark is not saturated

03. Questions about accuracy and efficiency remain before deployment

Abstract

The ability to plan a course of action that achieves a desired state of affairs has long been considered a core competence of intelligent agents and has been an integral part of AI research since its inception. With the advent of large language models (LLMs), there has been considerable interest in the question of whether or not they possess such planning abilities. PlanBench, an extensible benchmark we developed in 2022, soon after the release of GPT3, has remained an important tool for evaluating the planning abilities of LLMs. Despite the slew of new private and open source LLMs since GPT3, progress on this benchmark has been surprisingly slow. OpenAI claims that their recent o1 (Strawberry) model has been specifically constructed and trained to escape the normal limitations of autoregressive LLMs--making it a new kind of model: a Large Reasoning Model (LRM). Using this development…

Taxonomy

Topics: Library Science and Information Systems · Semantic Web and Ontologies · Mathematics, Computing, and Information Processing