Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

Yilun Hao; Yang Zhang; Chuchu Fan

arXiv:2410.12112·cs.AI·July 10, 2025·2 cites

Open Access · 3 Reviews

TL;DR

This paper introduces LLMFP, a universal framework that uses large language models to formalize and solve complex planning problems as optimization tasks without task-specific training, achieving high success rates across diverse scenarios.

Contribution

The paper presents LLMFP, a novel, general-purpose approach leveraging LLMs for formalized, zero-shot planning as optimization, outperforming existing methods across multiple planning tasks.

Findings

01

Achieves optimal rates of 83.7% and 86.8% across 9 planning tasks with GPT-4o and Claude 3.5 Sonnet, respectively.

02

Outperforms baseline methods by 37.6% and 40.7% with GPT-4o and Claude 3.5 Sonnet, respectively.

03

Validates the framework's components through ablation experiments and an analysis of success and failure cases.

Abstract

While large language models (LLMs) have recently demonstrated strong potential in solving planning problems, there is a trade-off between flexibility and complexity. LLMs, as zero-shot planners themselves, are still not capable of directly generating valid plans for complex planning problems such as multi-constraint or long-horizon tasks. On the other hand, many frameworks aiming to solve complex planning problems often rely on task-specific preparatory efforts, such as task-specific in-context examples and pre-defined critics/verifiers, which limits their cross-task generalization capability. In this paper, we tackle these challenges by observing that the core of many planning problems lies in optimization problems: searching for the optimal solution (best plan) with goals subject to constraints (preconditions and effects of decisions). With LLMs' commonsense, reasoning, and…
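To make the "planning as optimization" framing concrete, here is a minimal sketch of a toy problem with a goal, decision variables, and constraints, solved by exhaustive search. It is not LLMFP's pipeline: the task, the warehouse names, the costs, the capacities, and the brute-force solver are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical toy task (not from the paper): decide how many units to ship
# from each warehouse so that demand is met at minimum cost.
COST_PER_UNIT = {"warehouse_a": 3, "warehouse_b": 5}  # assumed unit costs
CAPACITY = {"warehouse_a": 4, "warehouse_b": 6}       # assumed capacities
DEMAND = 7                                            # assumed goal


def is_feasible(plan):
    """Constraints: stay within each capacity and meet demand exactly."""
    within_capacity = all(0 <= plan[w] <= CAPACITY[w] for w in plan)
    return within_capacity and sum(plan.values()) == DEMAND


def cost(plan):
    """Objective: total shipping cost of a plan."""
    return sum(COST_PER_UNIT[w] * units for w, units in plan.items())


def solve():
    """Enumerate every assignment and return the cheapest feasible plan."""
    names = sorted(CAPACITY)
    ranges = [range(CAPACITY[w] + 1) for w in names]
    candidates = (dict(zip(names, choice)) for choice in product(*ranges))
    feasible = [plan for plan in candidates if is_feasible(plan)]
    return min(feasible, key=cost) if feasible else None


best_plan = solve()
print(best_plan, cost(best_plan))
```

Exhaustive enumeration only works for tiny decision spaces like this one; the point of the sketch is the separation between decision variables, constraints, and objective, which is the structure the abstract attributes to many planning problems.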

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- The paper is generally clear (even if it is sometimes hand-wavy).
- The problem is interesting, and the related work seems to cover all bases.
- The results are impressive and much better than the baselines.
- The proposed workflow makes sense and works well.

Weaknesses

- I'm not sure the proposed work is sufficiently novel over the PDDL-based approach for a top conference.
- The appendix is huge (~40 pages!). This does not seem reasonable to me, as the main paper should be self-contained.
- The presentation is too hand-wavy. It would be great to capture more of it in a more formal manner.

Reviewer 02 · Rating 1 · Confidence 4

Strengths

LLMFP's ability to handle a wide variety of planning problems without task-specific examples is a significant strength.

Weaknesses

1. The baselines do not seem to offer a fair comparison to LLMFP. See questions.
2. The related work does not cover a relevant set of papers that should have been used as baselines for this work. A few of them are listed below:
[1] Webb, T., Mondal, S. S., Wang, C., Krabach, B., & Momennejad, I. (2023). A Prefrontal Cortex-inspired Architecture for Planning in Large Language Models. arXiv preprint arXiv:2310.00194.
[2] Fabiano, F., Pallagani, V., Ganapini, M. B., Horesh, L., Lor

Reviewer 03 · Rating 6 · Confidence 3

Strengths

I like the general idea and the presented approach. One could argue that it is simply a combination of prompt engineering and the incorporation of external tools. However, showing an effective way of doing this can be a significant contribution. The baselines and ablations are well-chosen for evaluating the performance of LLMFP. The paper is written very clearly, making it easy to read. The figures are well-chosen (particularly Figure 1) and are helpful in understanding the pipeline. I like

Weaknesses

The goal stated in the introduction is "Can we build a universal LLM-based planning system that can solve complex planning problems without task-specific efforts?". However, my main concern is whether the tasks used for the experiments are indeed complex planning problems. Specifically, the 5 multi-constraint problems simply resemble optimization problems rather than planning problems. Hence, it is quite clear that adding an external optimizer to an LLM would be much better than using an LLM alone. On the ot

Taxonomy

Topics: AI-based Problem Solving and Planning · Logic, Reasoning, and Knowledge · Logic, programming, and type systems