DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints

Yinger Zhang; Shutong Jiang; Renhao Li; Jianhong Tu; Yang Su; Lianghao Deng; Xudong Guo; Chenxu Lv; Junyang Lin

arXiv:2601.18137 · cs.AI · January 27, 2026

TL;DR

DeepPlanning introduces a challenging benchmark for long-horizon agent planning under complex constraints. It shows that even frontier LLMs struggle with such tasks and points to explicit reasoning and parallel tool use as directions for improvement.

Contribution

The paper presents DeepPlanning, a new benchmark for practical long-horizon planning with real-world constraints, highlighting the need for improved reasoning and tool integration in agentic LLMs.

Findings

01. Current LLMs struggle with long-horizon planning tasks.
02. Explicit reasoning patterns improve planning effectiveness.
03. Parallel tool use enhances efficiency in complex tasks.
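The third finding concerns parallel tool use. This page does not describe how the evaluated agents batch their tool calls, so what follows is only a minimal sketch, assuming an asyncio-based agent and three hypothetical tools (`search_flights`, `search_hotels`, `get_weather`) that simulate network latency with a fixed delay. Because the lookups are independent, issuing them together with `asyncio.gather` takes roughly one delay instead of three.

```python
import asyncio
import time

# Hypothetical tool stubs (not DeepPlanning's tools): each simulates a
# network-bound lookup with a fixed 0.2 s delay.
async def search_flights(city: str) -> str:
    await asyncio.sleep(0.2)
    return f"flights to {city}"

async def search_hotels(city: str) -> str:
    await asyncio.sleep(0.2)
    return f"hotels in {city}"

async def get_weather(city: str) -> str:
    await asyncio.sleep(0.2)
    return f"weather in {city}"

async def gather_info(city: str) -> list[str]:
    # The three lookups do not depend on each other, so run them as one
    # concurrent batch rather than three sequential calls.
    return list(await asyncio.gather(
        search_flights(city), search_hotels(city), get_weather(city)
    ))

if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(gather_info("Kyoto"))
    print(results, f"{time.perf_counter() - start:.2f}s")
```

Run sequentially, the same three calls would take about 0.6 s. The batched version completes in about 0.2 s, and the results keep the order in which the calls were passed to `gather`.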

Abstract

While agent evaluation has shifted toward long-horizon tasks, most benchmarks still emphasize local, step-level reasoning rather than the global constrained optimization (e.g., time and financial budgets) that demands genuine planning ability. Meanwhile, existing LLM planning benchmarks underrepresent the active information gathering and fine-grained local constraints typical of real-world settings. To address this, we introduce DeepPlanning, a challenging benchmark for practical long-horizon agent planning. It features multi-day travel planning and multi-product shopping tasks that require proactive information acquisition, local constrained reasoning, and global constrained optimization. Evaluations on DeepPlanning show that even frontier agentic LLMs struggle with these problems, highlighting the importance of reliable explicit reasoning patterns and parallel tool use for achieving…
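The abstract distinguishes local constraints, which apply to individual choices, from global constraints such as time and financial budgets that apply to the whole plan. This page does not describe DeepPlanning's actual checker or data format. The following is only a minimal sketch, assuming a hypothetical `PlanItem` record for a multi-day travel plan. `check_local` tests each item against its venue's opening hours. `check_global` tests the total cost against a budget and rejects same-day items that overlap in time.

```python
from dataclasses import dataclass

# Hypothetical plan record; field names are illustrative, not the benchmark's schema.
@dataclass
class PlanItem:
    day: int
    name: str
    cost: float
    start_hour: int
    end_hour: int
    open_hour: int   # venue opening hour
    close_hour: int  # venue closing hour

def check_local(item: PlanItem) -> list[str]:
    """Local constraint: the item must fit within its venue's opening hours."""
    if item.start_hour < item.open_hour or item.end_hour > item.close_hour:
        return [f"{item.name}: outside opening hours"]
    return []

def check_global(items: list[PlanItem], budget: float) -> list[str]:
    """Plan-level constraints: total cost within budget, no same-day overlaps."""
    errors = []
    total = sum(i.cost for i in items)
    if total > budget:
        errors.append(f"total cost {total} exceeds budget {budget}")
    by_day: dict[int, list[PlanItem]] = {}
    for i in items:
        by_day.setdefault(i.day, []).append(i)
    for day, day_items in by_day.items():
        day_items.sort(key=lambda i: i.start_hour)
        # Consecutive items on the same day must not overlap in time.
        for a, b in zip(day_items, day_items[1:]):
            if b.start_hour < a.end_hour:
                errors.append(f"day {day}: {a.name} overlaps {b.name}")
    return errors

def verify(items: list[PlanItem], budget: float) -> list[str]:
    """Return every violated constraint; an empty list means the plan passes."""
    errors = [e for i in items for e in check_local(i)]
    errors += check_global(items, budget)
    return errors
```

For example, a museum visit (10:00 to 12:00, cost 20) and a dinner (18:00 to 20:00, cost 60) on the same day pass with a budget of 100. With a budget of 50, `verify` returns `["total cost 80.0 exceeds budget 50.0"]`.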

Taxonomy

Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · AI-based Problem Solving and Planning