Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Chengshuai Zhao; Zhen Tan; Pingchuan Ma; Dawei Li; Bohan Jiang; Yancheng Wang; Yingzhen Yang; Huan Liu

arXiv:2508.01191·cs.AI·May 12, 2026

TL;DR

This paper investigates when Chain-of-Thought (CoT) prompting is effective in large language models, showing that its success depends on the training data distribution and that it breaks down on out-of-distribution queries.

Contribution

It introduces a data distribution perspective and a controllable environment, DataAlchemy, to systematically analyze when CoT reasoning succeeds or fails.

Findings

01 CoT reasoning is effective when test queries fall within the training data distribution.

02 Performance degrades significantly when test data diverge from the training distribution.

03 CoT reasoning is brittle and does not generalize well beyond training conditions.

Abstract

Chain-of-Thought (CoT) prompting has been shown to be effective in eliciting structured reasoning (i.e., CoT reasoning) from large language models (LLMs). Despite its popularity, recent studies expose its failures in some reasoning tasks, raising fundamental questions about the nature of CoT reasoning. In this work, we propose a data distribution lens to understand when and why CoT reasoning succeeds or fails. We hypothesize that CoT reasoning reflects a structured inductive bias learned from in-distribution data, enabling models to conditionally generate reasoning trajectories that approximate those observed during training. As such, the effectiveness of CoT reasoning is fundamentally governed by the nature and degree of distribution discrepancy between training data and test queries. Guided by this lens, we dissect CoT reasoning via three dimensions: task, length, and format. To…

Code & Models

Repositories

chengshuaizhao0/DataAlchemy
github

Datasets

chengshuaizhao/DataAlchemy
dataset · 422 downloads
