On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective
Yue Zhang, Zhiyi Dong, Tommaso Cesari, Yongyi Mao

TL;DR
This paper presents a learning-theoretic framework analyzing the benefits and costs of Chain of Thought reasoning, highlighting conditions under which it improves or hampers performance.
Contribution
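The paper introduces a canonical decomposition of the CoT reasoning risk into an oracle-trajectory risk (OTR), which captures the benefit of CoT, and a trajectory-mismatch risk (TMR), which captures its cost. It also characterizes the stability conditions that determine whether that cost stays controlled.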
Findings
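The reasoning risk decomposes into an oracle-trajectory risk (OTR) and a trajectory-mismatch risk (TMR).
The cost of CoT is unavoidable without structure: if any one of the loss, the hypothesis answer map, or the chain rule lacks stability, the TMR can be arbitrarily large even when the OTR is zero.
Under stability conditions, the paper provides bounds that characterize the regimes of error growth.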
Abstract
We develop a learning-theoretic framework for understanding Chain of Thought (CoT). We model CoT as the interaction between an answer map and a chain rule that generates intermediate questions autoregressively, and define the reasoning risk of a hypothesis under this interaction. Our first result is a tight canonical decomposition of this risk into two terms with opposing roles: an oracle-trajectory risk (OTR), which captures the benefit of CoT and reduces to a target-domain risk in a domain adaptation problem, and a trajectory-mismatch risk (TMR), which captures the cost of CoT through error accumulation along mismatched reasoning trajectories. We then show that this cost is unavoidable without structure: if any one of the loss, the hypothesis answer map, or the chain rule lacks stability, the TMR can be arbitrarily large even when the OTR is zero and the hypothesis is uniformly close…
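The split between benefit and cost described in the abstract can be illustrated with a toy simulation. The sketch below is an assumption-laden reading, not the paper's construction: questions and answers are real numbers, the loss is the absolute error, the chain rule is a hypothetical linear map `gain * answer`, and OTR and TMR are taken to be the two terms of a triangle inequality on the final answer. The paper's formal definitions may differ. All names (`rollout`, `decompose`, `make_chain_rule`, `EPS`) are invented for illustration.

```python
from typing import Callable, Tuple

AnswerMap = Callable[[float], float]
ChainRule = Callable[[float, float], float]


def rollout(answer_map: AnswerMap, chain_rule: ChainRule, x: float, steps: int) -> float:
    """Generate intermediate questions autoregressively and return the final one."""
    q = x
    for _ in range(steps):
        a = answer_map(q)      # answer the current question
        q = chain_rule(q, a)   # the chain rule produces the next question
    return q


def decompose(h: AnswerMap, f_star: AnswerMap, chain_rule: ChainRule,
              x: float, steps: int) -> Tuple[float, float, float]:
    """Return (risk, otr, tmr) for one input under absolute loss (assumed)."""
    q_oracle = rollout(f_star, chain_rule, x, steps)  # trajectory driven by true answers
    q_hyp = rollout(h, chain_rule, x, steps)          # trajectory driven by h's own answers
    risk = abs(h(q_hyp) - f_star(q_oracle))           # end-to-end reasoning error
    otr = abs(h(q_oracle) - f_star(q_oracle))         # h judged on the oracle trajectory
    tmr = abs(h(q_hyp) - h(q_oracle))                 # error caused by trajectory mismatch
    return risk, otr, tmr


EPS = 1e-3                      # hypothetical uniform gap between h and f_star
f_star = lambda q: q            # toy ground-truth answer map
h = lambda q: q + EPS           # hypothesis that is uniformly EPS-close to f_star


def make_chain_rule(gain: float) -> ChainRule:
    """Toy chain rule; |gain| plays the role of a Lipschitz constant."""
    return lambda q, a: gain * a


if __name__ == "__main__":
    for gain in (0.5, 2.0):
        risk, otr, tmr = decompose(h, f_star, make_chain_rule(gain), x=1.0, steps=10)
        print(f"gain={gain}: risk={risk:.6f} otr={otr:.6f} tmr={tmr:.6f}")
```

In this toy setting the OTR stays at roughly `EPS` for either gain, while the TMR stays on the order of `EPS` when the gain is below 1 and grows geometrically with the number of steps when the gain is above 1. That mirrors, in a very simplified way, the abstract's point that the cost term depends on stability.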
