Large Language Models are Zero-Shot Reasoners

Takeshi Kojima; Shixiang Shane Gu; Machel Reid; Yutaka Matsuo; Yusuke Iwasawa

arXiv:2205.11916 · cs.CL · January 31, 2023 · 1.1k cites

TL;DR

Large language models can perform complex multi-step reasoning in a zero-shot setting when a single trigger phrase ("Let's think step by step") is added to the prompt, substantially improving performance across diverse reasoning benchmarks without any few-shot examples.

Contribution

Demonstrates that adding "Let's think step by step" before each answer makes LLMs decent zero-shot reasoners, establishing a strong zero-shot baseline for challenging reasoning benchmarks and pointing to untapped zero-shot capabilities.

Findings

01

With InstructGPT (text-davinci-002), Zero-shot-CoT improves accuracy on MultiArith from 17.7% to 78.7% and on GSM8K from 10.4% to 40.7%.

02

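Significant gains over standard zero-shot prompting across arithmetic, symbolic, and other logical reasoning benchmarks, with similar improvements on the 540B-parameter PaLM.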

03

Suggests that high-level, multi-task cognitive capabilities may be extracted from LLMs by simple prompting.

Abstract

Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and are generally known to be excellent few-shot learners given task-specific exemplars. Notably, chain of thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved state-of-the-art performance in arithmetic and symbolic reasoning, difficult system-2 tasks that do not follow the standard scaling laws for LLMs. While these successes are often attributed to LLMs' ability for few-shot learning, we show that LLMs are decent zero-shot reasoners by simply adding "Let's think step by step" before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performance on diverse benchmark reasoning tasks including…
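The single-template idea can be sketched as a two-stage pipeline: first elicit reasoning with the trigger phrase, then feed that reasoning back and ask for the final answer. This is a minimal illustration under stated assumptions, not the authors' code. Only the trigger "Let's think step by step" is quoted from the abstract; the answer-extraction phrase is modeled on the paper's arithmetic setting and should be treated as an assumption, as should the first-number cleansing rule. All helper names (`reasoning_prompt`, `answer_prompt`, `extract_number`, `zero_shot_cot`, `fake_generate`) are hypothetical, and `fake_generate` is a canned stand-in for a real LLM call.

```python
import re
from typing import Callable, Optional

# Stage-1 trigger, quoted from the abstract.
REASONING_TRIGGER = "Let's think step by step."
# Stage-2 answer-extraction phrase: an assumption modeled on the arithmetic setting.
ANSWER_TRIGGER = "Therefore, the answer (arabic numerals) is"


def reasoning_prompt(question: str) -> str:
    """Stage 1: ask the model to reason, using the same fixed trigger for every question."""
    return f"Q: {question}\nA: {REASONING_TRIGGER}"


def answer_prompt(question: str, reasoning: str) -> str:
    """Stage 2: append the generated reasoning, then ask for the final answer."""
    return f"{reasoning_prompt(question)} {reasoning}\n{ANSWER_TRIGGER}"


def extract_number(text: str) -> Optional[str]:
    """Hypothetical answer cleansing: return the first number in the text, if any."""
    match = re.search(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return match.group(0) if match else None


def zero_shot_cot(question: str, generate: Callable[[str], str]) -> Optional[str]:
    """Run both stages with any text-completion function `generate`."""
    reasoning = generate(reasoning_prompt(question))
    raw_answer = generate(answer_prompt(question, reasoning))
    return extract_number(raw_answer)


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call that returns canned completions."""
    if prompt.endswith(ANSWER_TRIGGER):
        return " 4."
    return (
        "There are 16 balls in total. Half of the balls are golf balls, "
        "so there are 8 golf balls. Half of the golf balls are blue, "
        "so there are 4 blue golf balls."
    )


if __name__ == "__main__":
    question = (
        "A juggler can juggle 16 balls. Half of the balls are golf balls, "
        "and half of the golf balls are blue. How many blue golf balls are there?"
    )
    print(zero_shot_cot(question, fake_generate))  # prints 4
```

Swapping `fake_generate` for a real completion function leaves the pipeline unchanged: no task-specific exemplars are needed, and the same template is reused for every question.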

Code & Models

Datasets

alehc/rejection-sampling-QA
dataset · 32 dl

Videos

The Professor Who Proved ChatGPT Can't Think — Subbarao Kambhampati · YouTube

Large Language Models are Zero-Shot Reasoners · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Pathways Language Model