Exploring and Benchmarking the Planning Capabilities of Large Language Models

Bernd Bohnet; Azade Nova; Aaron T Parisi; Kevin Swersky; Katayoon Goshvadi; Hanjun Dai; Dale Schuurmans; Noah Fiedel; Hanie Sedghi

arXiv:2406.13094 · cs.CL · November 5, 2024 · 1 citation

TL;DR

This paper develops a comprehensive benchmark suite and evaluates various methods, including in-context learning, fine-tuning, and chain-of-thought reasoning, to enhance the planning capabilities of large language models across diverse scenarios.

Contribution

It introduces a systematic benchmark for LLM planning tasks and analyzes the effectiveness of multiple techniques to improve planning performance and generalization.

Findings

01. Many-shot in-context learning: longer contexts holding more solved examples improve planning accuracy.

02. Fine-tuning on optimal plans improves planning performance.

03. Chain-of-thought reasoning boosts planning performance.

Abstract

Classical and natural language planning tasks remain a difficult domain for modern large language models (LLMs). In this work, we lay the foundations for improving planning capabilities of LLMs. First, we construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios. This suite includes algorithms to methodically generate instances of tasks with varying levels of difficulty, allowing for rigorous and systematic evaluation of LLM performance. Next, we investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance. In addition, we demonstrate the positive impact of fine-tuning LLMs on optimal planning paths. We also probe the efficacy of chain-of-thought reasoning methods to improve LLM planning performance. Moreover, we…
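The abstract mentions three ingredients without detailing them on this page: generators that produce task instances at controlled difficulty, optimal plans used as fine-tuning targets, and many-shot prompts. The sketch below is a hypothetical illustration of how these pieces fit together, not the authors' generator. It assumes Blocksworld (a standard classical planning domain) as the task, uses the number of blocks as the difficulty knob, finds a shortest plan with breadth-first search, and formats k solved instances plus one unsolved query as a many-shot prompt. Every function name and the text format are assumptions made for illustration.

```python
import random
from collections import deque

# Hypothetical Blocksworld encoding (assumption, not from the paper):
# a state is a sorted tuple of stacks, each stack a tuple of blocks bottom-to-top.

def canonical(stacks):
    # Drop empty stacks and sort, so equivalent arrangements compare equal.
    return tuple(sorted(s for s in stacks if s))

def random_state(n_blocks, rng):
    # Random (not uniformly distributed) arrangement of n_blocks lettered blocks.
    blocks = [chr(ord("A") + i) for i in range(n_blocks)]
    rng.shuffle(blocks)
    stacks = []
    for b in blocks:
        if stacks and rng.random() < 0.5:
            rng.choice(stacks).append(b)  # place on top of an existing stack
        else:
            stacks.append([b])            # start a new stack on the table
    return canonical(tuple(s) for s in stacks)

def successors(state):
    # Yield (action, next_state) for every legal single-block move.
    for i, src in enumerate(state):
        block, rest = src[-1], src[:-1]
        others = [s for k, s in enumerate(state) if k != i]
        if rest:  # moving a lone block to the table would change nothing
            yield f"move {block} to table", canonical(others + [rest, (block,)])
        for j, dst in enumerate(others):
            moved = list(others)
            moved[j] = dst + (block,)
            yield f"move {block} onto {dst[-1]}", canonical(moved + [rest])

def optimal_plan(start, goal):
    # Breadth-first search returns a shortest action sequence.
    if start == goal:
        return []
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for action, nxt in successors(state):
            if nxt in parent:
                continue
            parent[nxt] = (state, action)
            if nxt == goal:
                plan, cur = [], nxt
                while parent[cur] is not None:  # walk back to the start
                    cur, act = parent[cur]
                    plan.append(act)
                return plan[::-1]
            queue.append(nxt)
    return None  # unreachable: any two arrangements of the same blocks connect

def make_instance(n_blocks, seed):
    # Difficulty grows with n_blocks; the seed makes instances reproducible.
    rng = random.Random(seed)
    start, goal = random_state(n_blocks, rng), random_state(n_blocks, rng)
    return start, goal, optimal_plan(start, goal)

def describe(state):
    # "A-B | C" means B sits on A, and C stands alone on the table.
    return " | ".join("-".join(stack) for stack in state)

def format_example(start, goal, plan=None):
    text = f"Initial: {describe(start)}\nGoal: {describe(goal)}\nPlan:"
    if plan is not None:
        text += " " + (", ".join(plan) if plan else "(empty)")
    return text

def many_shot_prompt(k, n_blocks, query_seed=0):
    # k solved examples followed by one unsolved query for the model to complete.
    parts = [format_example(*make_instance(n_blocks, seed=1000 + i)) for i in range(k)]
    query_start, query_goal, _ = make_instance(n_blocks, seed=query_seed)
    parts.append(format_example(query_start, query_goal))
    return "\n\n".join(parts)
```

Raising `k` lengthens the context, mirroring the many-shot setting, while the `(start, goal, plan)` triples from `make_instance` are the kind of optimal-plan pairs one could fine-tune on. Exhaustive BFS keeps the plans provably shortest but only scales to small block counts, so a real generator for harder instances would need a stronger planner.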

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling