Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot

Xiang Cheng; Chengyan Pan; Minjun Zhao; Deyang Li; Fangchao Liu; Xinyu Zhang; Xiao Zhang; Yong Liu

arXiv:2506.14641 · cs.CL · January 9, 2026

TL;DR

This paper investigates the effectiveness of Chain-of-Thought prompting in recent strong language models, finding that traditional and enhanced CoT exemplars do not improve reasoning performance compared to Zero-Shot CoT, and often serve only to align output formats.

Contribution

The study systematically evaluates CoT exemplars on recent strong models, showing that they offer little benefit over Zero-Shot CoT and highlighting the need to re-examine the ICL paradigm for mathematical reasoning tasks.

Findings

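01. Traditional CoT exemplars do not improve reasoning performance over Zero-Shot CoT in recent strong models.

02. Enhanced CoT exemplars, built from answers by advanced models, also fail to improve reasoning performance.

03. Models tend to ignore the exemplars and focus on the instructions.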

Abstract

In-Context Learning (ICL) is an essential emergent ability of Large Language Models (LLMs), and recent studies introduce Chain-of-Thought (CoT) to exemplars of ICL to enhance the reasoning capability, especially in mathematics tasks. However, given the continuous advancement of model capabilities, it remains unclear whether CoT exemplars still benefit recent, stronger models in such tasks. Through systematic experiments, we find that for recent strong models such as the Qwen2.5 series, adding traditional CoT exemplars does not improve reasoning performance compared to Zero-Shot CoT. Instead, their primary function is to align the output format with human expectations. We further investigate the effectiveness of enhanced CoT exemplars, constructed using answers from advanced models such as Qwen2.5-Max and DeepSeek-R1. Experimental results indicate that these enhanced…
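To make the comparison in the abstract concrete, the sketch below contrasts the two prompt styles: a Zero-Shot CoT prompt that carries only an instruction, and a few-shot CoT prompt that prepends worked exemplars to the same instruction. It is a minimal illustration, not the paper's setup: the instruction wording, the `Exemplar` type, the "Question:/Answer:" layout, and both function names are assumptions made here for clarity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Exemplar:
    """One worked example: a question, its reasoning chain, and the final answer."""
    question: str
    rationale: str
    answer: str


# Hypothetical instruction text; the paper's actual templates may differ.
COT_INSTRUCTION = "Let's think step by step, then state the final answer."


def zero_shot_cot_prompt(question: str) -> str:
    """Build a prompt that contains an instruction but no exemplars."""
    return f"Question: {question}\n{COT_INSTRUCTION}\nAnswer:"


def few_shot_cot_prompt(question: str, exemplars: list[Exemplar]) -> str:
    """Prepend worked exemplars to the same zero-shot prompt."""
    shots = [
        f"Question: {ex.question}\nAnswer: {ex.rationale} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # The target question always comes last and reuses the zero-shot template,
    # so the two conditions differ only in the presence of exemplars.
    return "\n\n".join(shots + [zero_shot_cot_prompt(question)])


if __name__ == "__main__":
    demo = [Exemplar("What is 2 + 3?", "Adding 2 and 3 gives 5.", "5")]
    print(zero_shot_cot_prompt("What is 7 * 6?"))
    print("-----")
    print(few_shot_cot_prompt("What is 7 * 6?", demo))
```

Keeping the final question block identical in both functions means any difference in model output can be attributed to the exemplars alone. This matches the kind of comparison the abstract describes, though the paper's own templates and evaluation details are not shown here.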
