How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad

Emmanuel Abbe; Samy Bengio; Aryo Lotfi; Colin Sandon; Omid Saremi

arXiv:2406.06467 · cs.LG · November 5, 2024

TL;DR

This paper investigates the limitations of Transformers in reasoning tasks, introduces the concept of 'globality degree' to measure learnability, and proposes scratchpad techniques to overcome these barriers, enhancing reasoning and generalization.

Contribution

It introduces the 'globality degree' as a measure of target distribution learnability and develops scratchpad methods, including inductive scratchpads, to surpass the globality barrier in reasoning tasks.

Findings

01. High globality distributions are hard to learn with Transformers.

02. Agnostic scratchpads cannot overcome the globality barrier.

03. Inductive scratchpads can break the barrier and improve out-of-distribution generalization.

Abstract

Can Transformers predict new syllogisms by composing established ones? More generally, what type of targets can be learned by such models from scratch? Recent works show that Transformers can be Turing-complete in terms of expressivity, but this does not address the learnability objective. This paper puts forward the notion of 'globality degree' of a target distribution to capture when weak learning is efficiently achievable by regular Transformers. This measure shows a contrast with the expressivity results of Transformers captured by $TC^0/TC^1$ classes (further studied here), since the globality relates to correlations with the more limited $NC^0$ class. We show here experimentally and theoretically under additional assumptions that distributions with high globality cannot be learned efficiently. In particular, syllogisms cannot be composed on long chains. Further, we develop…
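
As a rough illustration of why long syllogism chains are "global", the sketch below builds a toy chain of implications and counts how many premises the conclusion depends on. It is not the paper's formal definition of globality degree and is not taken from the authors' repository; the function names (`make_chain`, `entails`, `premises_needed`) and the chain construction are assumptions made for this example only.

```python
import random

def make_chain(n, seed=0):
    """Return n shuffled premises (a, b), each meaning 'a implies b',
    that together form the single chain 0 -> 1 -> ... -> n."""
    rng = random.Random(seed)
    premises = [(i, i + 1) for i in range(n)]
    rng.shuffle(premises)  # order of premises carries no information
    return premises

def entails(premises, start, goal):
    """Decide whether 'start implies goal' follows by composing premises.
    This is plain graph reachability over the implication edges."""
    successors = {}
    for a, b in premises:
        successors.setdefault(a, []).append(b)
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        if node == goal:
            return True
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

def premises_needed(premises, start, goal):
    """Count the premises whose removal alone breaks the conclusion."""
    return sum(
        not entails(premises[:i] + premises[i + 1:], start, goal)
        for i in range(len(premises))
    )

n = 8
chain = make_chain(n)
print(entails(chain, 0, n))          # True
print(premises_needed(chain, 0, n))  # 8
```

In this toy setting every one of the n premises is needed, so no small subset of the input settles the answer; the number of inputs that must be combined grows with the chain length, which is the intuition behind the statement that syllogisms cannot be composed on long chains.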

Code & Models

Repositories

aryol/inductive-scratchpad (PyTorch, official)

Videos

How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad · SlidesLive

Taxonomy

Topics: Deception detection and forensic psychology · Neural Networks and Applications · Computability, Logic, AI Algorithms