Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov, He He

TL;DR
This paper systematically analyzes the reasoning process of large language models using a synthetic dataset, revealing their strengths in deduction but limitations in proof planning and exploration.
Contribution
It introduces PrOntoQA, a synthetic dataset enabling formal analysis of LLM reasoning, and provides insights into their reasoning capabilities and limitations.
Findings
LLMs can perform correct deduction steps
They are capable of reasoning in fictional contexts
They struggle with proof planning and exploring multiple deduction paths
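The gap between taking correct steps and planning a proof can be illustrated with a toy sketch. The ontology, concept names, and both functions below are hypothetical and are not taken from the paper or from PrOntoQA. The sketch assumes each rule has the form "every X is a Y", and compares a strategy that always takes the first applicable rule with one that explores every option.

```python
from collections import deque

# Hypothetical toy ontology: each key implies every concept in its list,
# e.g. "Every wumpus is a yumpus" and "Every wumpus is a tumpus".
RULES = {
    "wumpus": ["yumpus", "tumpus"],
    "yumpus": ["dumpus"],   # this branch never reaches "sweet"
    "tumpus": ["rompus"],
    "rompus": ["sweet"],
}

def greedy_proof(start, target, rules):
    """Apply only the first applicable rule at each step; never backtrack."""
    path = [start]
    seen = {start}
    while path[-1] != target:
        options = [c for c in rules.get(path[-1], []) if c not in seen]
        if not options:
            return None  # stuck on a dead end
        seen.add(options[0])
        path.append(options[0])
    return path

def search_proof(start, target, rules):
    """Breadth-first search over every applicable rule."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in rules.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(greedy_proof("wumpus", "sweet", RULES))
print(search_proof("wumpus", "sweet", RULES))
```

Every step the greedy strategy takes is a valid deduction, yet it commits to the wrong branch and never reaches the goal. The exhaustive search finds the proof because it keeps the alternative branch open.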
Abstract
Large language models (LLMs) have shown remarkable reasoning capabilities given chain-of-thought prompts (examples with intermediate reasoning steps). Existing benchmarks measure reasoning ability indirectly, by evaluating accuracy on downstream tasks such as mathematical reasoning. However, it is unclear how these models obtain the answers and whether they rely on simple heuristics rather than the generated chain-of-thought. To enable systematic exploration of the reasoning ability of LLMs, we present a new synthetic question-answering dataset called PrOntoQA, where each example is generated from a synthetic world model represented in first-order logic. This allows us to parse the generated chain-of-thought into symbolic proofs for formal analysis. Our analysis on InstructGPT and GPT-3 shows that LLMs are quite capable of making correct individual deduction steps, and so are generally…
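A much-simplified sketch of the idea described in the abstract follows. It is not the authors' generator or parser. The function names (make_example, parse_step, check_steps), the sentence templates, and the restriction to a single linear chain of "every X is a Y" rules are all assumptions made for illustration.

```python
import re

def make_example(chain, entity):
    """Render a linear chain of concepts as rules, context sentences, and a query."""
    rules = set(zip(chain, chain[1:]))
    context = [f"Every {a} is a {b}." for a, b in zip(chain, chain[1:])]
    context.append(f"{entity} is a {chain[0]}.")
    query = f"True or false: {entity} is a {chain[-1]}."
    return rules, context, query

def parse_step(sentence):
    """Parse 'Max is a tumpus.' into ('Max', 'tumpus'); return None if it does not match."""
    match = re.fullmatch(r"(\w+) is a (\w+)\.", sentence)
    return match.groups() if match else None

def check_steps(rules, start, sentences):
    """Mark each chain-of-thought sentence as a valid single deduction step or not."""
    known = {start}
    labels = []
    for sentence in sentences:
        parsed = parse_step(sentence)
        concept = parsed[1] if parsed else None
        # A step is valid if one rule leads from an already-derived concept to it.
        valid = any((k, concept) in rules for k in known)
        if valid:
            known.add(concept)
        labels.append(valid)
    return labels

rules, context, query = make_example(["wumpus", "tumpus", "rompus"], "Max")
print(context)
print(query)
print(check_steps(rules, "wumpus", ["Max is a tumpus.", "Max is a rompus."]))
```

Because the world model is symbolic, each generated sentence maps back to a logical form, so a model's output can be checked step by step rather than only by its final answer.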
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Adam · Dense Connections · Dropout · Layer Normalization
