Enhancing LLM Planning Capabilities through Intrinsic Self-Critique
Bernd Bohnet, Pierre-Alexandre Kamienny, Hanie Sedghi, Dilan Gorur, Pranjal Awasthi, Aaron Parisi, Kevin Swersky, Rosanne Liu, Azade Nova, Noah Fiedel

TL;DR
This paper introduces a method where large language models critique their own answers to improve planning performance, achieving state-of-the-art results across multiple datasets without external verification.
Contribution
The paper presents an intrinsic self-critique approach in which an LLM critiques and revises its own plans, improving planning accuracy over strong few-shot and many-shot baselines without an external source such as a verifier.
Findings
Significant performance improvements on Blocksworld, Logistics, and Mini-grid datasets.
Achieved new state-of-the-art results with October 2024 LLM checkpoints.
Iterative correction and refinement further boost planning accuracy.
Abstract
We demonstrate an approach for LLMs to critique their own answers with the goal of enhancing their performance, leading to significant improvements on established planning benchmarks. Although earlier research has cast doubt on the effectiveness of LLMs leveraging self-critique methods, we show significant performance gains on planning datasets in the Blocksworld domain through intrinsic self-critique, without an external source such as a verifier. We also demonstrate similar improvements on Logistics and Mini-grid datasets, exceeding strong baseline accuracies. We employ a few-shot learning technique and progressively extend it to a many-shot approach as our base method, and we demonstrate that substantial improvement on top of this already competitive approach is possible by employing an iterative process for correction and refinement. We illustrate…
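The iterative correction-and-refinement process described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `plan_with_self_critique`, the prompt wording, the `CORRECT` stop signal, and the `toy_llm` stand-in are all hypothetical, and the model is assumed to be any callable that maps a prompt string to a response string.

```python
from typing import Callable


def plan_with_self_critique(
    llm: Callable[[str], str], problem: str, max_iters: int = 3
) -> str:
    """Generate a plan, then let the same model critique and revise it.

    No external verifier is consulted: the critique comes from `llm` itself.
    Prompts and the 'CORRECT' stop signal are illustrative assumptions.
    """
    plan = llm(f"Problem:\n{problem}\n\nWrite a step-by-step plan.")
    for _ in range(max_iters):
        critique = llm(
            f"Problem:\n{problem}\n\nPlan:\n{plan}\n\n"
            "Check each step. Reply 'CORRECT' if the plan is valid; "
            "otherwise explain the error."
        )
        if critique.strip().upper().startswith("CORRECT"):
            break  # the model accepts its own plan
        plan = llm(
            f"Problem:\n{problem}\n\nPrevious plan:\n{plan}\n\n"
            f"Critique:\n{critique}\n\nWrite a corrected plan."
        )
    return plan


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model, scripted for one Blocksworld case."""
    if "Write a corrected plan" in prompt:
        return "unstack A from B; put A on table; stack B on A"
    if "Check each step" in prompt:
        if "unstack" in prompt:
            return "CORRECT"
        return "Error: B is not clear, A is on top of it."
    return "stack B on A"  # deliberately flawed first attempt


if __name__ == "__main__":
    print(plan_with_self_critique(toy_llm, "A is on B. Goal: B on A."))
```

With the scripted `toy_llm`, the first plan is rejected by the model's own critique, revised once, and then accepted. With a real model, `max_iters` bounds the number of critique rounds, since the model may never declare its plan correct.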
Taxonomy
Topics: Multimodal Machine Learning Applications · Geographic Information Systems Studies · AI-based Problem Solving and Planning
