Barriers to Universal Reasoning With Transformers (And How to Overcome Them)

Oliver Kraus; Yash Sarrof; Yuekun Yao; Alexander Koller; Michael Hahn

arXiv:2604.25800·cs.LG·April 29, 2026

TL;DR

This paper studies whether Transformers can generalize Chain-of-Thought (CoT) reasoning to traces longer than those seen during training, identifies the core obstacles to doing so, and proposes encoding strategies that overcome them.

Contribution

The paper identifies fundamental barriers to length generalization in Transformers with CoT and introduces encoding techniques that enable a length-generalizable simulation of Turing machines, with CoT trace length linear in the simulated runtime.

Findings

01

Under standard positional encodings and a finite alphabet, Transformers with CoT cannot learn, in a length-generalizable way, to solve problems beyond $TC^0$.

02

Allowing the vocabulary to grow with problem size enables a length-generalizable Turing machine simulation whose CoT trace length is linear in the simulated runtime.

03

Signpost tokens and value change encodings improve length generalization on complex problems.
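
To make "trace length linear in the simulated runtime" concrete, here is a minimal sketch of a Turing machine simulator that emits exactly one trace record per machine step. It is an illustration only, not the paper's construction. The toy binary-increment machine, the `run_with_trace` helper, and the record format are all assumptions made for this example, and the sketch does not implement the paper's signpost tokens or value change encodings.

```python
from typing import Dict, List, Tuple

# (state, read_symbol) -> (next_state, write_symbol, head_move)
Rule = Tuple[str, str, int]

# Hypothetical toy machine: adds 1 to a binary number.
# The head starts on the least significant (rightmost) bit.
INCREMENT: Dict[Tuple[str, str], Rule] = {
    ("carry", "1"): ("carry", "0", -1),  # 1 + carry = 0, keep carrying left
    ("carry", "0"): ("halt", "1", 0),    # 0 + carry = 1, done
    ("carry", "_"): ("halt", "1", 0),    # past the left edge: new leading 1
}

MOVE_NAMES = {-1: "L", 0: "S", 1: "R"}


def run_with_trace(bits: str) -> Tuple[str, List[str]]:
    """Run the toy machine and return (final tape, per-step trace)."""
    tape = dict(enumerate(bits))        # sparse tape, blank cells read as "_"
    head, state = len(bits) - 1, "carry"
    trace: List[str] = []
    while state != "halt":
        read = tape.get(head, "_")
        state, write, move = INCREMENT[(state, read)]
        tape[head] = write
        # One record per step, describing only the local change.
        trace.append(f"{write}{MOVE_NAMES[move]}:{state}")
        head += move
    lo, hi = min(tape), max(tape)
    final = "".join(tape.get(i, "_") for i in range(lo, hi + 1))
    return final, trace
```

Calling `run_with_trace("1011")` returns the tape `"1100"` together with a three-record trace, one record for each of the three machine steps. Because every step appends a constant-size record, the trace grows linearly with the number of steps rather than with the tape length times the number of steps, which is the property the second finding refers to.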

Abstract

Chain-of-Thought (CoT) has been shown to empirically improve Transformers' performance, and theoretically increase their expressivity to Turing completeness. However, whether Transformers can learn to generalize to CoT traces longer than those seen during training is understudied. We use recent theoretical frameworks for Transformer length generalization and find that -- under standard positional encodings and a finite alphabet -- Transformers with CoT cannot solve problems beyond $TC^0$, i.e., the expressivity benefits do not hold under the stricter requirement of length-generalizable learnability. However, if we allow the vocabulary to grow with problem size, we attain a length-generalizable simulation of Turing machines where the CoT trace length is linear in the simulated runtime up to a constant. Our construction overcomes two core obstacles to reliable length generalization:…
