On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability

Kevin Wang, Junbo Li, Neel P. Bhatt, Yihan Xi, Qiang Liu, Ufuk Topcu, and Zhangyang Wang

arXiv:2409.19924·cs.AI·October 15, 2024·6 cites

TL;DR

This paper evaluates the planning abilities of OpenAI's o1 models along three axes (feasibility, optimality, and generalizability) across a range of benchmark planning tasks, reporting both their strengths and their limitations.

Contribution

The paper provides the first comprehensive empirical assessment of the o1 models' planning capabilities, highlighting their strengths in constraint adherence and identifying bottlenecks in decision-making and memory management.

Findings

01. o1 models outperform GPT-4 in constraint adherence

02. Models struggle with spatial reasoning and generalization

03. Suboptimal solutions with redundant actions are common

Abstract

Recent advancements in Large Language Models (LLMs) have showcased their ability to perform complex reasoning tasks, but their effectiveness in planning remains underexplored. In this study, we evaluate the planning capabilities of OpenAI's o1 models across a variety of benchmark tasks, focusing on three key aspects: feasibility, optimality, and generalizability. Through empirical evaluations on constraint-heavy tasks (e.g., Barman, Tyreworld) and spatially complex environments (e.g., Termes, Floortile), we highlight o1-preview's strengths in self-evaluation and constraint-following, while also identifying bottlenecks in decision-making and memory management, particularly in tasks requiring robust spatial reasoning. Our results reveal that o1-preview outperforms GPT-4 in adhering to task constraints and managing state transitions in structured…
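
To make two of the abstract's criteria concrete, here is a minimal sketch of how feasibility and optimality could be checked for a single plan. It is not the paper's evaluation code: the STRIPS-style `Action` type, the helper names, and the tiny wheel-removal domain (loosely inspired by Tyreworld) are all hypothetical and chosen only for illustration. A plan counts as feasible if every action's preconditions hold when it is executed and the goal holds at the end. Optimality is approximated by comparing the plan's length with the shortest plan found by breadth-first search. Generalizability is not covered here.

```python
from collections import deque
from typing import Dict, FrozenSet, List, NamedTuple, Optional

State = FrozenSet[str]


class Action(NamedTuple):
    """Hypothetical STRIPS-style action: preconditions, add list, delete list."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str]


def apply(state: State, action: Action) -> Optional[State]:
    """Return the successor state, or None if a precondition is violated."""
    if not action.preconditions <= state:
        return None
    return (state - action.del_effects) | action.add_effects


def is_feasible(plan: List[str], actions: Dict[str, Action],
                init: State, goal: State) -> bool:
    """Feasible: every step is executable and the goal holds at the end."""
    state = init
    for name in plan:
        action = actions.get(name)
        if action is None:
            return False  # the plan names an action that does not exist
        next_state = apply(state, action)
        if next_state is None:
            return False  # constraint violation
        state = next_state
    return goal <= state


def shortest_plan_length(actions: Dict[str, Action],
                         init: State, goal: State) -> Optional[int]:
    """Breadth-first search for the optimal plan length (toy domains only)."""
    frontier = deque([(init, 0)])
    seen = {init}
    while frontier:
        state, depth = frontier.popleft()
        if goal <= state:
            return depth
        for action in actions.values():
            nxt = apply(state, action)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


def redundant_steps(plan: List[str], actions: Dict[str, Action],
                    init: State, goal: State) -> Optional[int]:
    """Steps beyond the optimal length; None if the plan is infeasible."""
    if not is_feasible(plan, actions, init, goal):
        return None
    best = shortest_plan_length(actions, init, goal)
    return None if best is None else len(plan) - best


def _act(name, pre, add, delete) -> Action:
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


# Hypothetical toy domain, not taken from the paper's benchmarks.
ACTIONS = {a.name: a for a in [
    _act("fetch_wrench", [], ["have_wrench"], []),
    _act("loosen_nut", ["have_wrench", "nut_tight"], ["nut_loose"], ["nut_tight"]),
    _act("tighten_nut", ["have_wrench", "nut_loose"], ["nut_tight"], ["nut_loose"]),
    _act("remove_wheel", ["nut_loose", "wheel_on"], ["wheel_off"], ["wheel_on"]),
]}
INIT: State = frozenset(["nut_tight", "wheel_on"])
GOAL: State = frozenset(["wheel_off"])

OPTIMAL_PLAN = ["fetch_wrench", "loosen_nut", "remove_wheel"]
REDUNDANT_PLAN = ["fetch_wrench", "loosen_nut", "tighten_nut",
                  "loosen_nut", "remove_wheel"]
INFEASIBLE_PLAN = ["loosen_nut", "remove_wheel"]  # skips fetching the wrench
```

In this sketch, `REDUNDANT_PLAN` reaches the goal but includes a tighten/loosen detour, the kind of feasible-but-suboptimal output the findings describe, while `INFEASIBLE_PLAN` fails at its first step because a precondition is unmet. Exhaustive search is only practical for toy domains; real benchmarks would need a dedicated planner or validator.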

Taxonomy

Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding