Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma

TL;DR
This paper provides a theoretical framework explaining how chain of thought (CoT) enhances transformer models by enabling them to perform serial computations, significantly improving their ability to solve complex, inherently sequential problems.
Contribution
It offers a formal expressiveness analysis of transformers with CoT, showing that T steps of CoT extend the computational power of constant-depth, constant-precision transformers from AC^0 to any problem solvable by Boolean circuits of size T.
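The intuition behind the size-T result can be shown without any transformer. If a circuit's gates are listed in topological order, each gate's value can be emitted as one intermediate token that depends only on earlier tokens. The sketch below is a hypothetical illustration: the gate encoding and `cot_trace` are our own, not from the paper. It shows only this one-gate-per-step evaluation order. The paper's contribution is proving that a constant-depth transformer can carry out each such step.

```python
from typing import List, Tuple

# Hypothetical encoding: a gate is (op, i, j) with op in {"AND", "OR", "NOT"},
# where i and j index earlier wires. Wires 0..n-1 are the input bits and
# wire n+k holds the output of gate k (so gates must be topologically ordered).
Gate = Tuple[str, int, int]

def cot_trace(inputs: List[int], gates: List[Gate]) -> List[int]:
    """Evaluate a Boolean circuit one gate per step, CoT-style.

    Each step reads at most two earlier positions of the growing sequence and
    appends one new bit. Per-step work is constant, and the number of steps
    equals the circuit size T = len(gates).
    """
    seq = list(inputs)  # the "prompt": the input bits
    for op, i, j in gates:
        assert i < len(seq) and j < len(seq), "gates must be topologically ordered"
        a, b = seq[i], seq[j]
        if op == "AND":
            out = a & b
        elif op == "OR":
            out = a | b
        elif op == "NOT":
            out = 1 - a  # j is ignored for NOT
        else:
            raise ValueError(f"unknown gate {op!r}")
        seq.append(out)  # one "CoT token" per gate
    return seq[len(inputs):]  # the chain; its last entry is the circuit output
```

For example, XOR of two bits as (x0 OR x1) AND NOT (x0 AND x1) takes four steps: `cot_trace([1, 0], [("OR", 0, 1), ("AND", 0, 1), ("NOT", 3, 3), ("AND", 2, 4)])` returns `[1, 0, 1, 1]`, whose last token is the XOR.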
Findings
CoT enables transformers to solve problems beyond AC^0.
Empirical results show CoT improves accuracy on complex, serial tasks.
Constant-depth transformers with T steps of CoT can simulate any Boolean circuit of size T.
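Parity of n bits is a standard example of a function outside AC^0. With intermediate steps, it becomes n serial steps, each combining the previous intermediate token with one new input bit. The sketch below is a hypothetical illustration of that reduction (the function name is ours, and the paper is not claimed to use this example). It does not model a transformer.

```python
from typing import List

def parity_with_cot(bits: List[int]) -> List[int]:
    """Return the chain of running parities; the last token is the answer."""
    chain: List[int] = []
    running = 0
    for b in bits:
        running ^= b           # constant work: previous CoT token XOR next input bit
        chain.append(running)  # emit the running parity as the next CoT token
    return chain
```

Each step needs only the last emitted token and one input bit. The serial chain replaces the unbounded fan-in or depth that a single parallel pass would otherwise need.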
Abstract
Instructing the model to generate a sequence of intermediate steps, a.k.a. a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetic and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT empowers the model with the ability to perform inherently serial computation, which is otherwise lacking in transformers, especially when depth is low. Given input length n, previous works have shown that constant-depth transformers with finite precision poly(n) embedding size can only solve problems in TC^0 without CoT. We first show an even tighter expressiveness upper bound for constant-depth transformers with constant-bit precision,…
Taxonomy
Topics: Parallel Computing and Optimization Techniques
