Barriers to Universal Reasoning With Transformers (And How to Overcome Them)
Oliver Kraus, Yash Sarrof, Yuekun Yao, Alexander Koller, Michael Hahn

TL;DR
This paper investigates the limitations of Transformers in generalizing Chain-of-Thought reasoning to longer traces, identifies core obstacles, and proposes encoding strategies to overcome these barriers for improved length generalization.
Contribution
It reveals the fundamental barriers to length generalization in Transformers with CoT and introduces encoding techniques that enable Turing-complete simulations with linear trace length.
Findings
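Under standard positional encodings and a finite alphabet, Transformers with CoT cannot length-generalizably solve problems beyond $TC^0$.
Allowing the vocabulary to grow with problem size enables a length-generalizable simulation of Turing machines, with CoT trace length linear in the simulated runtime.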
Signpost tokens and value change encodings improve length generalization on complex problems.
Abstract
Chain-of-Thought (CoT) has been shown to empirically improve Transformers' performance, and theoretically increase their expressivity to Turing completeness. However, whether Transformers can learn to generalize to CoT traces longer than those seen during training is understudied. We use recent theoretical frameworks for Transformer length generalization and find that -- under standard positional encodings and a finite alphabet -- Transformers with CoT cannot solve problems beyond $TC^0$, i.e. the expressivity benefits do not hold under the stricter requirement of length-generalizable learnability. However, if we allow the vocabulary to grow with problem size, we attain a length-generalizable simulation of Turing machines where the CoT trace length is linear in the simulated runtime up to a constant. Our construction overcomes two core obstacles to reliable length generalization:…
