On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks
Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati

TL;DR
This paper empirically investigates the limitations of self-verification in large language models, revealing that external verification significantly outperforms self-critique in reasoning and planning tasks.
Contribution
It provides a systematic empirical study on the effectiveness of iterative prompting and external verification for LLMs in reasoning and planning, highlighting the limitations of self-critique.
Findings
Self-critique causes significant performance collapse.
External verification leads to notable performance improvements.
Re-prompting with a sound verifier retains most benefits.
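The external-verification setup behind these findings can be sketched as a generate, check, re-prompt loop. The sketch below is illustrative only and is not the authors' code. Graph coloring is used as a toy task for this example, and `verify_coloring`, `random_proposer`, and `solve_with_external_verifier` are hypothetical names. The proposer is a seeded random stand-in for an LLM call, and it ignores the feedback it is given, which mirrors plain re-prompting against a sound verifier.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Edge = Tuple[int, int]
Coloring = Dict[int, int]


def verify_coloring(edges: List[Edge], coloring: Coloring, num_colors: int) -> List[str]:
    """Sound external verifier: returns every violation found; an empty list means valid."""
    errors: List[str] = []
    vertices = sorted({v for edge in edges for v in edge})
    for v in vertices:
        if v not in coloring:
            errors.append(f"vertex {v} is uncolored")
        elif not 0 <= coloring[v] < num_colors:
            errors.append(f"vertex {v} uses color {coloring[v]} outside 0..{num_colors - 1}")
    for a, b in edges:
        if a in coloring and b in coloring and coloring[a] == coloring[b]:
            errors.append(f"edge ({a}, {b}) has both ends colored {coloring[a]}")
    return errors


def random_proposer(vertices: List[int], num_colors: int, seed: int = 0) -> Callable[[List[str]], Coloring]:
    """Hypothetical stand-in for an LLM: proposes a random coloring and ignores feedback."""
    rng = random.Random(seed)

    def generate(feedback: List[str]) -> Coloring:
        return {v: rng.randrange(num_colors) for v in vertices}

    return generate


def solve_with_external_verifier(
    generate: Callable[[List[str]], Coloring],
    edges: List[Edge],
    num_colors: int,
    max_rounds: int = 15,
) -> Tuple[Optional[Coloring], int]:
    """Re-prompt until the verifier accepts a candidate or the round budget runs out."""
    feedback: List[str] = []
    for round_index in range(1, max_rounds + 1):
        candidate = generate(feedback)
        feedback = verify_coloring(edges, candidate, num_colors)
        if not feedback:
            # Only verifier-approved candidates are ever returned.
            return candidate, round_index
    return None, max_rounds


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    proposer = random_proposer(vertices=[0, 1, 2], num_colors=3, seed=0)
    solution, rounds = solve_with_external_verifier(proposer, triangle, num_colors=3)
    print(solution, rounds)
```

The design point is that acceptance is decided by a checker that cannot be wrong about validity, so a returned answer is always correct. In a self-critique setup the same model would play the role of `verify_coloring`, and its false positives or false negatives would flow straight into the final answer.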
Abstract
There has been considerable divergence of opinion on the reasoning abilities of Large Language Models (LLMs). While the initial optimism that reasoning might emerge automatically with scale has been tempered thanks to a slew of counterexamples--ranging from multiplication to simple planning--there persists a widespread belief that LLMs can self-critique and improve their own solutions in an iterative fashion. This belief seemingly rests on the assumption that verification of correctness should be easier than generation--a rather classical argument from computational complexity--which should be irrelevant to LLMs to the extent that what they are doing is approximate retrieval. In this paper, we set out to systematically investigate the effectiveness of iterative prompting in the context of reasoning and planning. We present a principled empirical study of the performance of GPT-4 in…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Sparse Evolutionary Training · Position-Wise Feed-Forward Layer · Dropout · Linear Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Softmax · Byte Pair Encoding
