Prompting Science Report 2: The Decreasing Value of Chain of Thought in Prompting
Lennart Meincke, Ethan Mollick, Lilach Mollick, Dan Shapiro

TL;DR
This report critically examines Chain-of-Thought prompting in large language models, revealing its variable effectiveness across different models and tasks, often increasing computational cost without consistent accuracy improvements.
Contribution
It provides a nuanced analysis of CoT prompting, showing its limited benefits for models with inherent reasoning abilities and highlighting increased costs and variability.
Findings
CoT improves performance mainly in non-reasoning models
Recent models often perform implicit CoT reasoning without prompts
CoT increases token usage and response time without guaranteed accuracy gains
Abstract
This is the second in a series of short reports that seek to help business, education, and policy leaders understand the technical details of working with AI through rigorous testing. In this report, we investigate Chain-of-Thought (CoT) prompting, a technique that encourages a large language model (LLM) to "think step by step" (Wei et al., 2022). CoT is a widely adopted method for improving performance on reasoning tasks; however, our findings reveal a more nuanced picture of its effectiveness. We demonstrate two things:
- The effectiveness of Chain-of-Thought prompting can vary greatly depending on the type of task and model. For non-reasoning models, CoT generally improves average performance by a small amount, particularly if the model does not inherently engage in step-by-step processing by default. However, CoT can introduce more variability in answers, sometimes triggering occasional errors…
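To make the comparison concrete, the sketch below shows one way a direct prompt and a CoT prompt might be built for the same question, and how repeated attempts could be summarized so that average performance and answer consistency are reported separately. The instruction wording, the function names `build_prompt` and `summarize_trials`, and the two summary metrics are illustrative assumptions, not the prompts or metrics used in the report.

```python
from statistics import mean

# Hypothetical instruction wording; the report's actual prompts are not shown here.
COT_INSTRUCTION = "Think step by step, then state your final answer."
DIRECT_INSTRUCTION = "Reply with only the final answer."


def build_prompt(question: str, use_cot: bool) -> str:
    """Append either a CoT or a direct-answer instruction to the question."""
    instruction = COT_INSTRUCTION if use_cot else DIRECT_INSTRUCTION
    return f"{question.strip()}\n\n{instruction}"


def summarize_trials(trials: list[list[bool]]) -> dict[str, float]:
    """Summarize repeated attempts per question.

    trials[i][j] is True if attempt j on question i was graded correct.
    Returns the mean per-question accuracy and the share of questions
    answered correctly on every attempt (a simple consistency measure).
    """
    per_question = [
        mean(1.0 if ok else 0.0 for ok in attempts) for attempts in trials
    ]
    return {
        "average_accuracy": mean(per_question),
        "all_attempts_correct": mean(
            1.0 if rate == 1.0 else 0.0 for rate in per_question
        ),
    }


if __name__ == "__main__":
    # Print both prompt variants for the same question.
    question = "What is 17 * 24?"
    print(build_prompt(question, use_cot=False))
    print("---")
    print(build_prompt(question, use_cot=True))
```

Reporting the two numbers side by side matters because a prompt can raise average accuracy slightly while lowering the share of questions answered correctly every time, which is the kind of added variability the abstract describes.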
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling
