Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1
Karthik Valmeekam, Kaya Stechly, Atharva Gundawar, Subbarao Kambhampati

TL;DR
This paper evaluates the planning and scheduling abilities of OpenAI's Strawberry o1 Large Reasoning Model, demonstrating its improvements over traditional LLMs, analyzing its limitations, and proposing a combined system with external verifiers for guaranteed correctness.
Contribution
The paper introduces an evaluation of Strawberry o1 LRMs for planning, highlighting their strengths and weaknesses, and proposes an LRM-Modulo system that ensures output correctness.
Findings
o1 models outperform autoregressive LLMs on planning tasks
o1 models incur significant inference costs
combining o1 with external verifiers guarantees the correctness of the combined system's output
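The last finding rests on a generate-and-verify loop, which can be sketched as follows. This is a minimal illustration and not the authors' implementation: the toy Blocksworld encoding, `verify`, `solve_with_verifier`, and `stub_generator` are all hypothetical names introduced here, and the stub stands in for a call to an actual model. The point is only that a plan is returned when, and only when, a sound checker has accepted it.

```python
from typing import Callable, Dict, List, Optional, Tuple

Move = Tuple[str, str]   # (block, destination); destination is a block or "table"
State = Dict[str, str]   # block -> what it currently sits on

def verify(init: State, goal: State, plan: List[Move]) -> Tuple[bool, str]:
    """Simulate the plan step by step and report the first problem found."""
    on = dict(init)
    for i, (block, dest) in enumerate(plan):
        if block not in on:
            return False, f"step {i}: unknown block {block!r}"
        if dest != "table" and dest not in on:
            return False, f"step {i}: unknown destination {dest!r}"
        if block == dest:
            return False, f"step {i}: cannot stack {block!r} on itself"
        if block in on.values():
            return False, f"step {i}: {block!r} is not clear"
        if dest != "table" and dest in on.values():
            return False, f"step {i}: {dest!r} is not clear"
        on[block] = dest
    unmet = {b: d for b, d in goal.items() if on.get(b) != d}
    if unmet:
        return False, f"goal not reached: {unmet}"
    return True, "ok"

Generator = Callable[[State, State, str], List[Move]]

def solve_with_verifier(generate: Generator, init: State, goal: State,
                        max_rounds: int = 5) -> Optional[List[Move]]:
    """Ask the generator for plans, feeding back critiques, until one verifies."""
    feedback = ""
    for _ in range(max_rounds):
        plan = generate(init, goal, feedback)
        ok, feedback = verify(init, goal, plan)
        if ok:
            return plan      # only verified plans ever leave the loop
    return None              # budget exhausted: no plan, rather than a wrong one

def stub_generator(init: State, goal: State, feedback: str) -> List[Move]:
    """Hypothetical stand-in for a model call: wrong first, fixed after a critique."""
    if not feedback:
        return [("A", "B")]
    return [("B", "table"), ("A", "B")]

INIT: State = {"A": "table", "B": "A", "C": "table"}
GOAL: State = {"A": "B"}

if __name__ == "__main__":
    print(solve_with_verifier(stub_generator, INIT, GOAL))
```

The guarantee in this sketch is one-sided: any plan that comes back is correct with respect to the checker, but the loop may still give up and return `None`. Each extra round also costs another model call, which is where the inference costs noted above would add up.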
Abstract
The ability to plan a course of action that achieves a desired state of affairs has long been considered a core competence of intelligent agents and has been an integral part of AI research since its inception. With the advent of large language models (LLMs), there has been considerable interest in the question of whether or not they possess such planning abilities, but -- despite the slew of new private and open source LLMs since GPT3 -- progress has remained slow. OpenAI claims that their recent o1 (Strawberry) model has been specifically constructed and trained to escape the normal limitations of autoregressive LLMs -- making it a new kind of model: a Large Reasoning Model (LRM). In this paper, we evaluate the planning capabilities of two LRMs (o1-preview and o1-mini) on both planning and scheduling benchmarks. We see that while o1 does seem to offer significant improvements over…
Taxonomy
Topics: BIM and Construction Integration
