Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

Rylan Schaeffer; Kateryna Pistunova; Samar Khanna; Sarthak Consul; Sanmi Koyejo

arXiv:2307.10573·cs.AI·July 25, 2023·2 cites

Open Access

TL;DR

This paper investigates why Chain-of-Thought (CoT) prompting improves language model performance. It finds that prompts with logically invalid reasoning steps perform similarly to valid ones even on challenging tasks, suggesting that factors other than logical validity drive the gains.

Contribution

The paper demonstrates that logically invalid reasoning prompts can match valid prompts in improving performance on difficult tasks, challenging the assumption that logical validity is necessary.

Findings

01

Invalid prompts achieve gains similar to those of valid prompts on BIG-Bench Hard (BBH) tasks.

02

Some CoT prompts from previous work contain logical errors.

03

Performance improvements are influenced by factors beyond logical validity.

Abstract

Language models can be prompted to reason through problems in a manner that significantly improves performance. However, why such prompting improves performance is unclear. Recent work showed that using logically invalid Chain-of-Thought (CoT) prompting improves performance almost as much as logically valid CoT prompting, and that editing CoT prompts to replace problem-specific information with abstract information or out-of-distribution information typically doesn't harm performance. Critics have responded that these findings are based on too few and too easily solved tasks to draw meaningful conclusions. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically…
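
The comparison described in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' code: the exemplars, the `build_cot_prompt` and `accuracy` helpers, the "So the answer is" format, and the answer-extraction rule are all hypothetical, and keeping the exemplar's final answer correct in the invalid variant is an assumption made here for illustration.

```python
from typing import Callable, List, Tuple

# (question, reasoning, answer); every exemplar below is made up for illustration.
Exemplar = Tuple[str, str, str]


def build_cot_prompt(exemplars: List[Exemplar], question: str) -> str:
    """Join few-shot CoT exemplars and append the unanswered target question."""
    shots = [f"Q: {q}\nA: {r} So the answer is {a}." for q, r, a in exemplars]
    return "\n\n".join(shots + [f"Q: {question}\nA:"])


# Same question and final answer; only the reasoning steps differ.
VALID: List[Exemplar] = [
    ("Tom has 3 apples and buys 2 more. How many apples does he have?",
     "Tom starts with 3 apples. 3 + 2 = 5.", "5"),
]
INVALID: List[Exemplar] = [
    ("Tom has 3 apples and buys 2 more. How many apples does he have?",
     "Tom starts with 3 apples. 3 * 2 = 9, and 9 - 4 = 5.", "5"),
]


def accuracy(model: Callable[[str], str],
             exemplars: List[Exemplar],
             dataset: List[Tuple[str, str]]) -> float:
    """Fraction of (question, gold) pairs where the completion's last token equals gold."""
    correct = 0
    for question, gold in dataset:
        completion = model(build_cot_prompt(exemplars, question))
        # Hypothetical extraction rule: take the last whitespace-separated token.
        tokens = completion.strip().rstrip(".").split()
        predicted = tokens[-1] if tokens else ""
        correct += predicted == gold
    return correct / len(dataset)
```

Under these assumptions, calling `accuracy` once with `VALID` and once with `INVALID` on the same model and dataset gives the two numbers whose difference the paper is concerned with; `model` can be any callable that maps a prompt string to a completion string.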

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks