Data Shifts Hurt CoT: A Theoretical Study

Lang Yin; Debangshu Banerjee; Gagandeep Singh

arXiv:2506.10647·cs.LG·June 17, 2025

Open Access · 3 Reviews

TL;DR

This paper provides a theoretical analysis of how data shifts, including distribution changes and poisoning, negatively affect Chain of Thought methods in large language models, especially for solving the $k$-parity problem.

Contribution

It is the first work to rigorously analyze the impact of data shifts on CoT, revealing that such shifts can significantly impair model performance and explaining the underlying mechanisms.

Findings

01

Data shifts harm CoT performance on $k$-parity.

02

CoT can perform worse than direct prediction under data shifts.

03

Theoretical explanations for the impact of distribution and poisoning shifts.

Abstract

Chain of Thought (CoT) has been applied to various large language models (LLMs) and proven to be effective in improving the quality of outputs. In recent studies, transformers are proven to have absolute upper bounds in terms of expressive power, and consequently, they cannot solve many computationally difficult problems. However, empowered by CoT, transformers are proven to be able to solve some difficult problems effectively, such as the $k$-parity problem. Nevertheless, those works rely on two imperative assumptions: (1) identical training and testing distribution, and (2) corruption-free training data with correct reasoning steps. However, in the real world, these assumptions do not always hold. Although the risks of data shifts have caught attention, our work is the first to rigorously study the exact harm caused by such shifts to the best of our knowledge. Focusing on the…
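To make the two assumptions in the abstract concrete, here is a minimal Python sketch of a $k$-parity dataset with reasoning steps, where both assumptions can be broken on purpose. Everything specific in it is an illustrative assumption rather than the paper's construction: the function name `make_parity_cot`, the $\pm 1$ input encoding, the chain-style decomposition into running products, the bias parameter `rho` with $P(x_i = +1) = (1 + \rho)/2$, and the independent sign-flip model controlled by `poison`.

```python
import random


def make_parity_cot(n, d, support, rho=0.0, poison=0.0, seed=0):
    """Sample n examples of k-parity with a chain of intermediate steps.

    All modelling choices here are assumptions made for illustration:
      support : indices of the k relevant bits (hypothetical secret set)
      rho     : input bias, P(x_i = +1) = (1 + rho) / 2; rho = 0 is uniform
      poison  : probability that an intermediate step is sign-flipped
    Returns a list of (x, shown_steps, y) tuples with entries in {-1, +1}.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Draw each bit independently with the chosen bias.
        x = [1 if rng.random() < (1 + rho) / 2 else -1 for _ in range(d)]
        # Clean reasoning chain: running product over the relevant bits.
        steps, running = [], 1
        for i in support:
            running *= x[i]
            steps.append(running)
        y = steps[-1]  # the parity label is the last clean step
        # Corrupt only the intermediate steps; the final label stays clean.
        shown = [-s if rng.random() < poison else s for s in steps[:-1]]
        shown.append(y)
        data.append((x, shown, y))
    return data
```

Under these assumptions, a shifted setup would draw the training set with one value of `rho` and the test set with another, while `poison > 0` on the training set corrupts the reasoning steps the model is supervised on. The sketch only generates data; it does not reproduce the paper's model or training procedure.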

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

- Very interesting question and approach!
- Novel investigation: Rigorous characterization of the three-way relationship between distribution shift, poisoning, and performance
- Surprising finding that distribution shift always hurts, even when it leaks information
- Honest discussion of limitations

Weaknesses

- Results limited to k-parity with specific CoT decomposition
- The imbalanced k-parity problem (Theorem 4.1) was already known to be solvable without CoT, so showing CoT performs worse is not surprising.
- Theoretical guarantees require impractically large d; experiments at realistic scale show different behavior
- Limited practical impact due to restrictive assumptions (uniform/near-uniform distributions, binary inputs, specific problem structure)

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The paper demonstrates that a one-layer transformer can effectively learn to solve the k-parity problem without CoT, as shown in Theorem 4.1, indicating the problem's tractability under imbalanced distributions.
2. It reveals that models trained with CoT data are highly sensitive to data shifts, such as poisoning and distribution changes (regulated by ρ), with Theorem 4.2 providing a rigorous necessary and sufficient condition for training success. This is both novel and insightful for under

Weaknesses

1. The setting is relatively simple: the model only needs to learn specific positions using one-hot positional encoding, and the analysis focuses on one-step gradient updates, which may oversimplify real-world scenarios.
2. The apparent contradiction between Theorems 4.1 and 4.2/4.3 is initially confusing: if the model can effectively solve the problem in one step without CoT, why does it struggle in the CoT setting? After all, the CoT decomposition breaks the k-parity problem into multiple sim

Reviewer 03 · Rating 6 · Confidence 2

Strengths

1. This paper is the first to rigorously analyze CoT robustness under distribution shift and label poisoning.
2. Theorems 4.1 and 4.2 give clear analytic characterizations of success/failure thresholds, including asymptotic rates.
3. Some insights from this work's theoretical study are valuable, such as the finding that data shift always leads to worse training performance.

Weaknesses

1. The choice of linear layer function is too artificial and may not generalize well.
2. The paper does not clearly contrast its theoretical assumptions with prior CoT expressivity works (e.g., Merrill & Sabharwal 2024; Kim & Suzuki 2025) beyond citing them.

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Big Data and Digital Economy