Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers

Alireza Amiri; Xinting Huang; Mark Rofin; Michael Hahn

arXiv:2502.02393 · cs.LG · July 15, 2025

TL;DR

This paper investigates the fundamental limits of chain-of-thought reasoning in hard-attention transformers. It proves lower bounds on the number of reasoning steps needed for various algorithmic problems, tight up to logarithmic factors, and thereby clarifies the computational power of chain-of-thought.

Contribution

The paper establishes systematic lower bounds on the number of chain-of-thought steps that hard-attention transformers need for multiple problems, advancing the understanding of their capabilities and limitations.
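To make "hard attention" concrete, the sketch below contrasts one common formalization, unique hard attention, where each query attends to a single highest-scoring position with ties broken toward the leftmost one, against ordinary softmax attention. This is an illustrative assumption about the attention model, not the paper's own definition. The function names `unique_hard_attention` and `softmax_attention` are hypothetical.

```python
import math
from typing import List, Sequence


def unique_hard_attention(scores: Sequence[float], values: Sequence[List[float]]) -> List[float]:
    """Return the value vector at the single highest-scoring position.

    Assumption: ties go to the leftmost maximum (one common convention)."""
    if not scores or len(scores) != len(values):
        raise ValueError("scores and values must be non-empty and of equal length")
    best = 0
    for i in range(1, len(scores)):
        if scores[i] > scores[best]:  # strict '>' keeps the leftmost maximum on ties
            best = i
    return list(values[best])


def softmax_attention(scores: Sequence[float], values: Sequence[List[float]]) -> List[float]:
    """Return the softmax-weighted average of the value vectors, for comparison."""
    if not scores or len(scores) != len(values):
        raise ValueError("scores and values must be non-empty and of equal length")
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / z for d in range(dim)]
```

For example, with scores `[1.0, 3.0, 3.0, 2.0]` and values `[[0.0], [1.0], [2.0], [3.0]]`, hard attention returns `[1.0]`, the value at the leftmost maximum. Softmax attention instead blends all four positions. Hard attention cannot average, which is one reason averaging-style problems such as counting can be difficult in this regime.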

Findings

01

Lower bounds are tight up to logarithmic factors for several problems.

02

For certain algorithmic tasks, hard-attention transformers provably require a substantial number of chain-of-thought steps.

03

Results challenge assumptions about the efficiency of chain-of-thought reasoning.

Abstract

Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpads can extend transformers' expressivity from $\mathsf{TC}^0$ to $\mathsf{PTIME}$, their required length remains poorly understood. Empirical evidence even suggests that transformers need scratchpads for many problems in $\mathsf{TC}^0$, such as Parity or Multiplication, challenging optimistic bounds derived from circuit complexity. In this work, we initiate the study of systematic lower bounds for the number of chain-of-thought steps across different algorithmic problems, in the hard-attention regime. We study a variety of algorithmic problems, and provide bounds that are tight up to logarithmic factors. Overall, these results contribute to the emerging understanding of the power and limitations of chain-of-thought…
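Parity is the abstract's running example of a $\mathsf{TC}^0$ problem for which transformers empirically need a scratchpad. The sketch below shows the textbook running-XOR scratchpad. It writes the parity of the prefix read so far after each input bit, so each step needs only the previous scratchpad token and one new bit. It uses one chain-of-thought step per input bit. This is only an illustration of what "number of chain-of-thought steps" counts. It is not the paper's construction and makes no claim about the paper's specific bound for Parity. The function name `parity_with_scratchpad` is hypothetical.

```python
from typing import List, Tuple


def parity_with_scratchpad(bits: List[int]) -> Tuple[int, List[int]]:
    """Compute the parity of `bits`, recording one scratchpad token per input bit.

    Token i is the parity of bits[0..i], so the final token is the answer and
    the number of chain-of-thought steps equals len(bits)."""
    scratchpad: List[int] = []
    running = 0
    for b in bits:
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        running ^= b  # update the prefix parity with the next input bit
        scratchpad.append(running)
    return running, scratchpad
```

For example, `parity_with_scratchpad([1, 0, 1, 1])` returns `(1, [1, 1, 0, 1])`, using four steps for four input bits. The lower bounds studied in the paper ask how far such step counts can be reduced.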

Videos

Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers (SlidesLive)