How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad
Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Colin Sandon, Omid Saremi

TL;DR
This paper investigates the limitations of Transformers in reasoning tasks, introduces the concept of 'globality degree' to measure learnability, and proposes scratchpad techniques to overcome these barriers, enhancing reasoning and generalization.
Contribution
It introduces the 'globality degree' as a measure of target distribution learnability and develops scratchpad methods, including inductive scratchpads, to surpass the globality barrier in reasoning tasks.
Findings
Distributions with high globality cannot be learned efficiently by Transformers.
Agnostic scratchpads cannot overcome the globality barrier.
Inductive scratchpads can break the barrier and improve out-of-distribution generalization.
Abstract
Can Transformers predict new syllogisms by composing established ones? More generally, what type of targets can be learned by such models from scratch? Recent works show that Transformers can be Turing-complete in terms of expressivity, but this does not address the learnability objective. This paper puts forward the notion of 'globality degree' of a target distribution to capture when weak learning is efficiently achievable by regular Transformers. This measure shows a contrast with the expressivity results of Transformers captured by TC^0/TC^1 classes (further studied here), since the globality relates to correlations with the more limited NC^0 class. We show here experimentally and theoretically under additional assumptions that distributions with high globality cannot be learned efficiently. In particular, syllogisms cannot be composed on long chains. Further, we develop…
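To make "composing syllogisms on long chains" concrete, here is a minimal sketch that treats each established syllogism as a directed implication and checks whether a new one follows by chaining them. The edge-list representation and the names `chain_length` and `premises` are assumptions made for illustration; they are not taken from the paper's code or experiments.

```python
from collections import deque

def chain_length(premises, start, goal):
    """Return how many implications must be chained to derive start -> goal.

    premises is a list of (a, b) pairs, each read as "a implies b".
    Returns None when goal cannot be derived from start.
    """
    # Build an adjacency list from the implication pairs.
    graph = {}
    for a, b in premises:
        graph.setdefault(a, []).append(b)

    # Breadth-first search finds the shortest chain of implications.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None

premises = [("a", "b"), ("b", "c"), ("c", "d")]
print(chain_length(premises, "a", "d"))  # 3
print(chain_length(premises, "d", "a"))  # None
```

In this toy setting, deciding whether "a implies d" holds requires every premise on the chain; no single premise settles the answer. That is only an informal reading of why long chains are the hard case, and the paper's formal notion of globality degree is what makes the claim precise.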
Taxonomy
Topics: Deception detection and forensic psychology · Neural Networks and Applications · Computability, Logic, AI Algorithms
