Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang

TL;DR
This paper provides a theoretical analysis of Chain-of-Thought prompting in Large Language Models, revealing its capabilities and limitations in solving mathematical and decision-making problems through circuit complexity and empirical validation.
Contribution
It takes a first step toward a theoretical account of CoT's expressivity, showing how Transformers can generate solutions step by step and identifying conditions under which they succeed or fail.
Findings
Bounded-depth Transformers cannot directly produce correct answers for basic arithmetic/equation tasks unless the model size grows super-polynomially with the input length.
Constant-size autoregressive Transformers can solve these tasks by generating CoT derivations step by step.
Transformers with CoT can handle decision-making problems in the Dynamic Programming class.
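To make the idea of a step-by-step derivation concrete, here is a minimal Python sketch that reduces an arithmetic expression one operation at a time and records every intermediate expression. It only illustrates what a CoT-style trace for arithmetic can look like; it is not the paper's Transformer construction, and the function names (`tokenize`, `step`, `cot_derivation`) and the exact output format are assumptions made for this example. It handles non-negative integer literals with `+`, `-`, `*` and parentheses, and assumes well-formed input.

```python
import re

def tokenize(expr):
    """Split an expression into integers and the symbols ( ) + - *."""
    return [int(t) if t.isdigit() else t
            for t in re.findall(r"\d+|[()+\-*]", expr)]

def render(tokens):
    """Turn a token list back into a compact string."""
    return "".join(str(t) for t in tokens)

def step(tokens):
    """Apply exactly one reduction and return the new token list."""
    tokens = list(tokens)
    # Locate the innermost parenthesised group, or use the whole expression.
    if ")" in tokens:
        close = tokens.index(")")
        open_ = max(i for i in range(close) if tokens[i] == "(")
    else:
        open_, close = -1, len(tokens)
    segment = tokens[open_ + 1:close]
    if len(segment) > 1:
        # Multiplication binds tighter; otherwise take the leftmost + or -.
        k = open_ + 1 + (segment.index("*") if "*" in segment else 1)
        a, op, b = tokens[k - 1:k + 2]
        value = a * b if op == "*" else a + b if op == "+" else a - b
        tokens[k - 1:k + 2] = [value]
        close -= 2
    # Drop parentheses that now wrap a single number.
    if open_ >= 0 and close - open_ == 2:
        tokens[open_:close + 1] = [tokens[open_ + 1]]
    return tokens

def cot_derivation(expr):
    """Return the list of intermediate expressions, ending with the answer."""
    tokens = tokenize(expr)
    lines = [render(tokens)]
    while len(tokens) > 1:
        tokens = step(tokens)
        lines.append(render(tokens))
    return lines

if __name__ == "__main__":
    print(" = ".join(cot_derivation("(7+5)*(6+4*3-2*7)")))
```

Each call to `step` does a small, fixed amount of work on the current expression, while the number of steps grows with the input. This loosely mirrors the contrast in the findings: emitting intermediate expressions spreads the computation over many generated tokens, whereas producing the final answer directly would require doing all of it in one pass.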
Abstract
Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs), particularly when dealing with complex tasks involving mathematics or reasoning. Despite the enormous empirical success, the underlying mechanisms behind CoT and how it unlocks the potential of LLMs remain elusive. In this paper, we take a first step towards theoretically answering these questions. Specifically, we examine the expressivity of LLMs with CoT in solving fundamental mathematical and decision-making problems. By using circuit complexity theory, we first give impossibility results showing that bounded-depth Transformers are unable to directly produce correct answers for basic arithmetic/equation tasks unless the model size grows super-polynomially with respect to the input length. In contrast, we then prove by construction that…
Taxonomy
Topics: Advanced Graph Neural Networks · Machine Learning and Algorithms · Topic Modeling
