Self-Consistency Improves Chain of Thought Reasoning in Language Models

Xuezhi Wang; Jason Wei; Dale Schuurmans; Quoc Le; Ed Chi; Sharan Narang; Aakanksha Chowdhery; Denny Zhou

arXiv:2203.11171 · cs.CL · March 8, 2023 · 683 citations

TL;DR

This paper introduces self-consistency, a decoding method for chain-of-thought prompting in large language models. It samples multiple reasoning paths and selects the answer that is most consistent across them, improving accuracy on arithmetic and commonsense reasoning benchmarks.

Contribution

The paper proposes self-consistency, a decoding strategy that replaces greedy decoding in chain-of-thought prompting: it samples a diverse set of reasoning paths and selects the most consistent final answer by marginalizing over them.

Findings

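01. Self-consistency improves the performance of chain-of-thought prompting on arithmetic and commonsense reasoning benchmarks.

02. Reported accuracy gains include GSM8K (+17.9%), SVAMP (+11.0%), and AQuA (+12.2%), with further gains on StrategyQA and ARC-challenge.

03. Sampling multiple reasoning paths and aggregating their answers gives more reliable answers than taking the single greedy path.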
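As a rough intuition for the last finding, the toy calculation below treats each sampled path as an independent vote that is correct with probability p, with only two possible answers. Both assumptions are simplifications introduced here for illustration and are not claims from the paper; real samples are correlated and answers are rarely binary. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n votes is correct.

    Toy model (an assumption, not from the paper): votes are independent,
    each is correct with probability p, and n is odd so ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of votes to avoid ties")
    # Sum the binomial probabilities of every outcome with more than n/2 correct votes.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 11, 41):
        print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Under this model, a per-path accuracy above 0.5 grows with the number of votes, while a per-path accuracy below 0.5 shrinks. That is only a caricature of why aggregation can help.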

Abstract

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%),…
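The two steps described in the abstract (sample several reasoning paths, then keep the most consistent answer) can be sketched as a majority vote over extracted answers. This is a minimal sketch under stated assumptions, not the authors' implementation: `sample_path` stands in for any stochastic call to a language model, the "The answer is X." suffix is an assumed output convention, and the stub sampler exists only so the example runs without a model.

```python
import itertools
import re
from collections import Counter
from typing import Callable, Iterable, Optional


def extract_answer(reasoning_path: str) -> Optional[str]:
    """Return the final answer from a reasoning path, or None if absent.

    Assumption: each path ends with a sentence like 'The answer is X.'
    """
    match = re.search(r"The answer is\s+(.+?)\.?\s*$", reasoning_path.strip())
    return match.group(1) if match else None


def self_consistency(
    sample_path: Callable[[str], str],  # hypothetical: one sampled model completion
    prompt: str,
    num_paths: int = 10,
) -> Optional[str]:
    """Sample num_paths reasoning paths and return the most frequent answer."""
    answers = []
    for _ in range(num_paths):
        answer = extract_answer(sample_path(prompt))
        if answer is not None:  # skip paths with no parsable answer
            answers.append(answer)
    if not answers:
        return None
    # Majority vote: the reasoning text is discarded, only final answers count.
    return Counter(answers).most_common(1)[0][0]


def make_stub_sampler(paths: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for a model: cycles through canned reasoning paths."""
    cycle = itertools.cycle(paths)
    return lambda prompt: next(cycle)


if __name__ == "__main__":
    canned = [
        "She has 16 - 3 - 4 = 9 eggs left, sold at $2 each. The answer is 18.",
        "She sells 16 - 4 = 12 eggs for $2 each, minus 3. The answer is 21.",
        "After using 7 eggs she has 9, and 9 * 2 = 18. The answer is 18.",
    ]
    sampler = make_stub_sampler(canned)
    print(self_consistency(sampler, "How much does she make per day?", num_paths=3))  # prints 18
```

With a real model, `sample_path` would need a nonzero sampling temperature (or another stochastic decoding scheme) so that the paths actually differ; with greedy decoding every path would be identical and the vote would be pointless.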

Code & Models

Datasets

affjljoo3581/arc-cot (dataset · 49 downloads)

Videos

Self-Consistency Improves Chain of Thought Reasoning in Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks