Exploring and Benchmarking the Planning Capabilities of Large Language Models
Bernd Bohnet, Azade Nova, Aaron T Parisi, Kevin Swersky, Katayoon Goshvadi, Hanjun Dai, Dale Schuurmans, Noah Fiedel, Hanie Sedghi

TL;DR
This paper develops a comprehensive benchmark suite and evaluates various methods, including in-context learning, fine-tuning, and chain-of-thought reasoning, to enhance the planning capabilities of large language models across diverse scenarios.
Contribution
It introduces a systematic benchmark for LLM planning tasks and analyzes the effectiveness of multiple techniques to improve planning performance and generalization.
Findings
Many-shot in-context learning with longer contexts improves planning performance.
Fine-tuning on optimal planning paths improves planning performance.
Chain-of-thought reasoning boosts planning performance.
Abstract
Classical and natural language planning tasks remain a difficult domain for modern large language models (LLMs). In this work, we lay the foundations for improving planning capabilities of LLMs. First, we construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios. This suite includes algorithms to methodically generate instances of tasks with varying levels of difficulty, allowing for rigorous and systematic evaluation of LLM performance. Next, we investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance. In addition, we demonstrate the positive impact of fine-tuning LLMs on optimal planning paths. We also probe the efficacy of chain-of-thought reasoning methods to improve LLM planning performance. Moreover, we…
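The abstract mentions two mechanisms that are easier to picture with a small example: generating task instances at a controlled difficulty, and assembling many solved instances into a many-shot prompt. The sketch below is only an illustration under stated assumptions and is not the paper's benchmark code. It uses a toy grid-navigation task as a stand-in for a planning domain, treats grid size as the difficulty knob, and uses breadth-first search to produce optimal plans. All function names (make_instance, optimal_plan, many_shot_prompt, is_valid_plan) and the prompt layout are hypothetical.

```python
import random
from collections import deque

# Toy stand-in domain (an assumption, not one of the paper's benchmarks):
# move an agent on a size x size grid from a start cell to a goal cell.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def make_instance(size, rng):
    """Sample distinct start and goal cells; a larger grid is a harder instance."""
    cells = [(x, y) for x in range(size) for y in range(size)]
    start, goal = rng.sample(cells, 2)
    return {"size": size, "start": start, "goal": goal}


def optimal_plan(inst):
    """Return a shortest move sequence found by breadth-first search."""
    size, start, goal = inst["size"], inst["start"], inst["goal"]
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (x, y), plan = queue.popleft()
        if (x, y) == goal:
            return plan
        for name, (dx, dy) in MOVES.items():
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [name]))
    return None  # unreachable on an obstacle-free grid


def is_valid_plan(inst, plan):
    """Check that a plan stays on the grid and ends at the goal."""
    x, y = inst["start"]
    for step in plan:
        if step not in MOVES:
            return False
        dx, dy = MOVES[step]
        x, y = x + dx, y + dy
        if not (0 <= x < inst["size"] and 0 <= y < inst["size"]):
            return False
    return (x, y) == inst["goal"]


def describe(inst):
    """Render an instance as text (hypothetical format)."""
    return f"Grid {inst['size']}x{inst['size']}. Start {inst['start']}. Goal {inst['goal']}."


def many_shot_prompt(num_shots, size, query, seed=0):
    """Concatenate num_shots solved examples, then the unsolved query."""
    rng = random.Random(seed)
    parts = []
    for _ in range(num_shots):
        inst = make_instance(size, rng)
        parts.append(f"Problem: {describe(inst)}\nPlan: {' '.join(optimal_plan(inst))}")
    parts.append(f"Problem: {describe(query)}\nPlan:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    query = make_instance(4, random.Random(1))
    print(many_shot_prompt(num_shots=3, size=4, query=query))
```

Raising num_shots lengthens the context, which is the variable the many-shot experiments study. A validator such as is_valid_plan lets model outputs be scored automatically rather than by string match against a single reference plan.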
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
