Systematic Analysis of LLM Contributions to Planning: Solver, Verifier, Heuristic

Haoming Li; Zhaoliang Chen; Songyuan Liu; Yiming Lu; Fei Liu

arXiv:2412.09666·cs.AI·December 16, 2024

TL;DR

This paper systematically analyzes how large language models contribute to planning tasks in three roles: solver, verifier, and source of heuristic guidance. It finds that they are stronger at giving feedback on intermediate solutions than at generating correct plans directly.

Contribution

It introduces a framework for evaluating LLMs in planning, highlighting their utility in heuristic feedback and proposing a new benchmark for learning user preferences dynamically.

Findings

01 LLMs excel at providing feedback signals for intermediate solutions.

02 Generating correct plans directly remains challenging for LLMs.

03 A new benchmark for learning user preferences on the fly is proposed.

Abstract

In this work, we provide a systematic analysis of how large language models (LLMs) contribute to solving planning problems. In particular, we examine how LLMs perform when they are used as a problem solver, as a solution verifier, and as a source of heuristic guidance to improve intermediate solutions. Our analysis reveals that although it is difficult for LLMs to generate correct plans out of the box, LLMs are much better at providing feedback signals for intermediate or incomplete solutions in the form of comparative heuristic functions. This evaluation framework provides insights into how future work may design better LLM-based tree-search algorithms to solve diverse planning and reasoning problems. We also propose a novel benchmark to evaluate LLMs' ability to learn user preferences on the fly, which has wide applications in practical settings.
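To make the idea of a comparative heuristic guiding tree search concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a toy planning problem (reach a target integer using the actions "+3" and "*2"), and every name in it (ACTIONS, stub_compare, beam_search, verify_plan) is hypothetical. The stub_compare function stands in for an LLM that is asked which of two partial plans looks more promising; here it simply compares distance to the target. The verify_plan function plays the verifier role by replaying a finished plan.

```python
from functools import cmp_to_key
from typing import Callable, Dict, List, Optional, Tuple

# A search state is (current value, actions taken so far).
State = Tuple[int, Tuple[str, ...]]

# Toy action set; an assumption for illustration only.
ACTIONS: Dict[str, Callable[[int], int]] = {
    "+3": lambda v: v + 3,
    "*2": lambda v: v * 2,
}


def expand(state: State) -> List[State]:
    """Generate every successor of a partial plan."""
    value, plan = state
    return [(fn(value), plan + (name,)) for name, fn in ACTIONS.items()]


def stub_compare(a: State, b: State, target: int) -> int:
    """Hypothetical stand-in for an LLM pairwise judgment.

    Returns a negative number when `a` looks more promising than `b`.
    A real system would prompt a model with both partial plans instead.
    """
    return abs(a[0] - target) - abs(b[0] - target)


def beam_search(
    start: int,
    target: int,
    compare: Callable[[State, State, int], int],
    beam_width: int = 2,
    max_depth: int = 6,
) -> Optional[List[str]]:
    """Beam search that ranks candidates only through pairwise comparisons."""
    beam: List[State] = [(start, ())]
    for _ in range(max_depth):
        for value, plan in beam:
            if value == target:
                return list(plan)
        candidates = [child for state in beam for child in expand(state)]
        # Rank by the comparative heuristic, then keep the best few.
        candidates.sort(key=cmp_to_key(lambda a, b: compare(a, b, target)))
        beam = candidates[:beam_width]
    for value, plan in beam:
        if value == target:
            return list(plan)
    return None


def verify_plan(start: int, target: int, plan: List[str]) -> bool:
    """Verifier role: replay the plan and check that it reaches the target."""
    value = start
    for name in plan:
        if name not in ACTIONS:
            return False
        value = ACTIONS[name](value)
    return value == target
```

Calling beam_search(1, 11, stub_compare) returns a plan that verify_plan(1, 11, plan) accepts. The search never needs an absolute score for a state, only an answer to "which of these two is better?", which is the kind of signal the abstract reports LLMs provide more reliably than complete plans.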
