PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models
Ziliang Zhao, Zenan Xu, Shuting Wang, Hongjin Qian, Yan Lei, Minda Hu, Zhao Wang, Shihan Dou, Zhicheng Dou, Pluto Zhou

TL;DR
PlanningBench is a framework that generates scalable, diverse, and verifiable planning data from real scenarios, enabling better evaluation and training of large language models in complex planning tasks.
Contribution
PlanningBench introduces a structured taxonomy and a constraint-driven synthesis pipeline that make planning data generation controllable and realistic while improving scalability and verifiability.
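To make "verifiable" concrete, the following is a minimal sketch of how a planning instance with explicit constraints can be checked automatically. It is not the PlanningBench pipeline: the toy event-planning scenario, the `Task` class, and the `verify_plan` function are all hypothetical and introduced only for illustration. The three constraint families shown (precedence, a single shared worker, a deadline) are coupled, since moving one task to satisfy one constraint can break another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task: a name and a duration in hours."""
    name: str
    duration: int


def verify_plan(tasks, precedences, deadline, plan):
    """Return a list of violated constraints (empty list means the plan is valid).

    plan maps each task name to its start hour. One worker is assumed,
    so no two tasks may overlap in time.
    """
    durations = {t.name: t.duration for t in tasks}
    missing = sorted(set(durations) - set(plan))
    if missing:
        return [f"unscheduled: {', '.join(missing)}"]

    end = {name: plan[name] + durations[name] for name in durations}
    violations = []

    # Precedence: the first task must finish before the second starts.
    for before, after in precedences:
        if end[before] > plan[after]:
            violations.append(f"precedence: {before} before {after}")

    # Single worker: check adjacent tasks in start order for overlap.
    ordered = sorted(durations, key=lambda name: plan[name])
    for a, b in zip(ordered, ordered[1:]):
        if end[a] > plan[b]:
            violations.append(f"overlap: {a} and {b}")

    # Deadline: the last task must finish in time.
    finish = max(end.values())
    if finish > deadline:
        violations.append(f"deadline: finishes at {finish}, limit {deadline}")

    return violations


# Toy instance (assumed for illustration only).
tasks = [Task("book_venue", 2), Task("send_invites", 1), Task("order_catering", 3)]
precedences = [("book_venue", "send_invites")]
deadline = 6

good_plan = {"book_venue": 0, "send_invites": 2, "order_catering": 3}
bad_plan = {"book_venue": 0, "send_invites": 1, "order_catering": 3}

print(verify_plan(tasks, precedences, deadline, good_plan))
print(verify_plan(tasks, precedences, deadline, bad_plan))
```

Returning the list of violated constraints, rather than a single pass/fail flag, is a design choice of this sketch: it shows which constraints a candidate plan breaks, which is useful when diagnosing failures on coupled constraints.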
Findings
Current models struggle with coupled constraints in planning.
Reinforcement learning on PlanningBench data improves model performance.
Well-specified solutions lead to more stable training dynamics.
Abstract
Planning is a fundamental capability for large language models (LLMs) because complex tasks require models to coordinate goals, constraints, resources, and long-term consequences into executable and verifiable solutions. Existing planning benchmarks, however, usually treat planning data as fixed collections of instances rather than controllable generation targets. This limits scenario coverage, ties difficulty to surface-level proxies rather than structural sources, and offers limited support for scalable generation, automatic verification, or planning-oriented training. We introduce PlanningBench, a framework for generating scalable, diverse, and verifiable planning data for both evaluation and training. PlanningBench starts from real planning scenarios and abstracts practical workflows into a structured taxonomy of more than 30 task types, subtasks, constraint families, and…
