Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

Zhiyuan Li; Hong Liu; Denny Zhou; Tengyu Ma

arXiv:2402.12875·cs.LG·September 24, 2024·5 cites

TL;DR

This paper provides a theoretical framework explaining how chain of thought (CoT) enhances transformer models by enabling them to perform serial computations, significantly improving their ability to solve complex, inherently sequential problems.

Contribution

It gives a formal expressiveness analysis of constant-depth transformers with CoT: with constant-bit precision and no CoT they are limited to AC^0, whereas with T steps of CoT they can solve any problem solvable by boolean circuits of size T.

Findings

01. CoT enables constant-depth transformers to solve problems beyond AC^0.

02. Empirical results show that CoT improves accuracy on inherently serial tasks.

03. Constant-depth transformers with T steps of CoT can simulate boolean circuits of size T.
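The circuit-simulation finding can be pictured with a small sketch. It is an illustration under stated assumptions, not the paper's construction: the gate encoding and the names `cot_step`, `run_with_cot`, and `XOR_GATES` are hypothetical, and the code does not model attention or embeddings. It shows only the serial structure: each CoT step reads a constant number of bits already in the sequence and appends one new bit, so a circuit with T gates takes T steps.

```python
from typing import List, Tuple

# Hypothetical gate encoding: (op, i, j), where i and j index earlier
# positions in the sequence (input bits first, then emitted gate outputs).
Gate = Tuple[str, int, int]


def cot_step(seq: List[int], gate: Gate) -> int:
    """One illustrative CoT step: evaluate a single gate on bits already in seq."""
    op, i, j = gate
    if op == "AND":
        return seq[i] & seq[j]
    if op == "OR":
        return seq[i] | seq[j]
    if op == "NOT":
        return 1 - seq[i]  # second index is ignored for NOT
    raise ValueError(f"unknown gate {op!r}")


def run_with_cot(inputs: List[int], gates: List[Gate]) -> List[int]:
    """Append one bit per gate, so T gates produce T intermediate tokens."""
    seq = list(inputs)
    for gate in gates:
        seq.append(cot_step(seq, gate))
    return seq


# XOR of two bits as a 4-gate circuit: (x0 OR x1) AND NOT (x0 AND x1).
# Positions: 0, 1 are inputs; 2..5 are the outputs of the gates below.
XOR_GATES: List[Gate] = [
    ("OR", 0, 1),   # position 2
    ("AND", 0, 1),  # position 3
    ("NOT", 3, 3),  # position 4
    ("AND", 2, 4),  # position 5, the final answer
]


def xor_via_cot(x0: int, x1: int) -> int:
    """Return the last emitted bit, which is the circuit's output."""
    return run_with_cot([x0, x1], XOR_GATES)[-1]
```

Calling `run_with_cot([1, 0], XOR_GATES)` yields the two input bits followed by four intermediate bits, one per gate. The per-step work stays constant while the number of steps grows with the circuit size, which is the trade-off the finding describes.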

Abstract

Instructing the model to generate a sequence of intermediate steps, a.k.a. a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetic and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT empowers the model with the ability to perform inherently serial computation, which is otherwise lacking in transformers, especially when depth is low. Given input length $n$, previous works have shown that constant-depth transformers with finite precision and $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^{0}$ without CoT. We first show an even tighter expressiveness upper bound for constant-depth transformers with constant-bit precision,…

Taxonomy

Topics: Parallel Computing and Optimization Techniques