Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn

TL;DR
This paper studies the limits of chain-of-thought reasoning in hard-attention transformers. It gives lower bounds, tight up to logarithmic factors, on the number of reasoning steps needed for a variety of algorithmic problems.
Contribution
It establishes systematic lower bounds on the number of chain-of-thought steps that hard-attention transformers need across multiple algorithmic problems, clarifying both the power and the limitations of chain-of-thought reasoning.
Findings
Lower bounds are tight up to logarithmic factors for several problems.
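Certain algorithmic tasks require many chain-of-thought steps in hard-attention transformers; empirically, scratchpads appear necessary even for Parity and Multiplication.
The results challenge optimistic bounds on chain-of-thought length derived from circuit complexity.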
Abstract
Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpads can extend transformers' expressivity from TC^0 to PTIME, their required length remains poorly understood. Empirical evidence even suggests that transformers need scratchpads for many problems in TC^0, such as Parity or Multiplication, challenging optimistic bounds derived from circuit complexity. In this work, we initiate the study of systematic lower bounds for the number of chain-of-thought steps across different algorithmic problems, in the hard-attention regime. We study a variety of algorithmic problems, and provide bounds that are tight up to logarithmic factors. Overall, these results contribute to emerging understanding of the power and limitations of chain-of-thought…
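To make the notion of counting chain-of-thought steps concrete, here is a minimal Python sketch of a scratchpad for Parity. It is not taken from the paper: the function name and the step format are illustrative assumptions, and the plain loop does not model a hard-attention transformer. Each step writes the parity of one more input bit, so this particular scratchpad has exactly one entry per input bit.

```python
def parity_with_scratchpad(bits):
    """Compute the parity of a bit list while recording intermediate steps.

    Returns (scratchpad, answer), where scratchpad[i] is the parity of
    bits[:i + 1]. This is a hypothetical illustration of a scratchpad,
    not the construction analysed in the paper.
    """
    scratchpad = []
    running = 0
    for b in bits:
        running ^= b                # fold the next bit into the running parity
        scratchpad.append(running)  # one scratchpad entry per input bit
    answer = scratchpad[-1] if scratchpad else 0
    return scratchpad, answer


steps, answer = parity_with_scratchpad([1, 0, 1, 1])
print(len(steps), answer)
```

A lower bound on chain-of-thought steps limits how much shorter than this kind of step-by-step trace a scratchpad can be for a given problem and model class.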
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Advanced Memory and Neural Computing · Visual Attention and Saliency Detection
