Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools

Yilun Hao; Yongchao Chen; Yang Zhang; Chuchu Fan

arXiv:2404.11891 · cs.AI · January 30, 2025 · 2 cites

TL;DR

This paper introduces a formal-verification-based framework that enables large language models to solve complex multi-constraint planning problems with high success rates and strong generalizability.

Contribution

The authors propose an LLM-based planning framework that formalizes multi-constraint planning problems as constrained satisfiability problems and passes them to sound and complete satisfiability solvers, reaching a 93.9% success rate on TravelPlanner along with strong generalization.

Findings

1. Achieves a 93.9% success rate on the TravelPlanner benchmark

2. Generalizes to unseen constraints and domains

3. Identifies unsatisfiable queries and suggests modifications

Abstract

Large Language Models (LLMs) struggle to directly generate correct plans for complex multi-constraint planning problems, even with self-verification and self-critique. For example, on TravelPlanner, a U.S. domestic travel planning benchmark proposed by Xie et al. (2024), the best LLM, OpenAI o1-preview, finds viable travel plans with only a 10% success rate even when given all needed information. In this work, we tackle this by proposing an LLM-based planning framework that formalizes and solves complex multi-constraint planning problems as constrained satisfiability problems, which are further consumed by sound and complete satisfiability solvers. We start with TravelPlanner as the primary use case and show that our framework achieves a success rate of 93.9% and is effective with diverse paraphrased prompts. More importantly, our framework has strong zero-shot generalizability,…
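
To make the idea of planning as constrained satisfiability concrete, here is a minimal Python sketch. It is not the authors' implementation: the city names, prices, and the functions `solve` and `suggest_budget` are hypothetical, and plain exhaustive enumeration stands in for a real satisfiability solver. On a finite toy domain, enumeration is still sound and complete, so a returned plan always meets the constraints and `None` means no plan exists.

```python
# Hypothetical toy data, invented for illustration (not from TravelPlanner).
FLIGHTS = {"Austin": 120, "Denver": 200, "Seattle": 310}  # round-trip price, USD
HOTELS = {"Austin": [80, 150], "Denver": [95, 220], "Seattle": [130]}  # nightly rate, USD


def solve(budget, nights, excluded_cities=()):
    """Return the cheapest (city, nightly_rate, total) meeting all constraints, or None.

    Constraints: the city is not excluded, and flight + lodging fits the budget.
    """
    feasible = []
    for city, flight in FLIGHTS.items():
        if city in excluded_cities:
            continue
        for rate in HOTELS[city]:
            total = flight + rate * nights
            if total <= budget:
                feasible.append((city, rate, total))
    # Pick the cheapest satisfying assignment; None signals "unsatisfiable".
    return min(feasible, key=lambda plan: plan[2]) if feasible else None


def suggest_budget(nights, excluded_cities=()):
    """For an unsatisfiable query, return the smallest budget that would make it satisfiable."""
    plan = solve(float("inf"), nights, excluded_cities)
    return None if plan is None else plan[2]


if __name__ == "__main__":
    print(solve(500, 3))      # a satisfiable query
    print(solve(300, 3))      # an unsatisfiable query
    print(suggest_budget(3))  # a suggested relaxation of the budget constraint
```

In this sketch, an unsatisfiable query is detected by the empty feasible set, and the suggested modification is found by relaxing one constraint (the budget) and solving again. A production system would more plausibly encode the same constraints for an off-the-shelf solver rather than enumerate.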

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques