Enhancing LLM Planning Capabilities through Intrinsic Self-Critique

Bernd Bohnet; Pierre-Alexandre Kamienny; Hanie Sedghi; Dilan Gorur; Pranjal Awasthi; Aaron Parisi; Kevin Swersky; Rosanne Liu; Azade Nova; Noah Fiedel

arXiv:2512.24103 · cs.LG · January 1, 2026

TL;DR

This paper introduces a method where large language models critique their own answers to improve planning performance, achieving state-of-the-art results across multiple datasets without external verification.

Contribution

The paper presents an intrinsic self-critique approach in which an LLM critiques and refines its own plans, improving planning accuracy beyond strong baselines on established benchmarks without relying on an external source such as a verifier.

Findings

01. Significant performance improvements on Blocksworld, Logistics, and Mini-grid datasets.

02. Achieved new state-of-the-art results with October 2024 LLM checkpoints.

03. Iterative correction and refinement further boost planning accuracy.

Abstract

We demonstrate an approach in which LLMs critique their own answers to enhance their performance, leading to significant improvements on established planning benchmarks. Although earlier research has cast doubt on the effectiveness of LLM self-critique methods, we show significant performance gains on planning datasets in the Blocksworld domain through intrinsic self-critique, without an external source such as a verifier. We also demonstrate similar improvements on the Logistics and Mini-grid datasets, exceeding strong baseline accuracies. We employ a few-shot learning technique, progressively extended to a many-shot approach, as our base method, and demonstrate that substantial improvement can be gained on top of this already competitive approach through an iterative process of correction and refinement. We illustrate…
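The iterative correction and refinement process described in the abstract can be pictured roughly as a generate, critique, revise loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names (`self_critique_loop`, `toy_generate`, `toy_critique`), the `max_rounds` parameter, and the plan format are all hypothetical, and the two toy functions are hard-coded stand-ins for what would be calls to the same LLM in an intrinsic self-critique setting.

```python
from typing import Callable, Tuple


def self_critique_loop(
    generate: Callable[[str, str], str],
    critique: Callable[[str, str], Tuple[bool, str]],
    problem: str,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Produce a plan, then repeatedly critique and revise it.

    Returns the final plan and the number of revisions that were made.
    Both callables are assumed to wrap the same model (no external verifier).
    """
    plan = generate(problem, "")  # initial attempt, no feedback yet
    for revisions in range(max_rounds):
        accepted, feedback = critique(problem, plan)
        if accepted:
            return plan, revisions
        # Feed the model's own critique back into the next attempt.
        plan = generate(problem, feedback)
    return plan, max_rounds


# Hypothetical stand-ins for LLM calls, used only to make the sketch runnable.
def toy_generate(problem: str, feedback: str) -> str:
    # The first attempt forgets to pick up the block; the revision fixes it.
    if not feedback:
        return "stack A B"
    return "pick-up A; stack A B"


def toy_critique(problem: str, plan: str) -> Tuple[bool, str]:
    # Stand-in for the model judging its own plan step by step.
    if not plan.startswith("pick-up A"):
        return False, "The hand is empty, so A must be picked up before stacking."
    return True, ""
```

Calling `self_critique_loop(toy_generate, toy_critique, "stack A on B")` with these toy functions goes through one revision before the critique accepts the plan. In a real setup, the stopping rule and the prompt format for the critique step would be design choices that this page does not specify.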

Taxonomy

Topics: Multimodal Machine Learning Applications · Geographic Information Systems Studies · AI-based Problem Solving and Planning