Transformers as Transducers
Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, Ashish Sabharwal

TL;DR
This paper studies the expressive power of transformers on sequence-to-sequence tasks by relating them to finite transducers. It uses variants of the RASP programming language as an intermediate representation and shows that masked average-hard attention transformers can simulate S-RASP.
Contribution
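It extends the Boolean variant B-RASP to sequence-to-sequence functions and shows that it computes exactly the first-order rational functions. It also introduces two extensions: B-RASP[pos], which contains all first-order regular functions, and S-RASP, which contains all first-order polyregular functions.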
Findings
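Transformers can express surprisingly large classes of transductions.
B-RASP computes exactly the first-order rational functions; B-RASP[pos] contains all first-order regular functions; S-RASP contains all first-order polyregular functions.
Masked average-hard attention transformers can simulate S-RASP.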
Abstract
We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of transductions. We do so using variants of RASP, a programming language designed to help people "think like transformers," as an intermediate representation. We extend the existing Boolean variant B-RASP to sequence-to-sequence functions and show that it computes exactly the first-order rational functions (such as string rotation). Then, we introduce two new extensions. B-RASP[pos] enables calculations on positions (such as copying the first half of a string) and contains all first-order regular functions. S-RASP adds prefix sum, which enables additional arithmetic operations (such as squaring a string) and contains all first-order polyregular functions. Finally, we show that masked average-hard attention transformers can…
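To make the three example transductions named in the abstract concrete, here is a minimal Python sketch. It is not RASP code and not the authors' construction; it only illustrates the input-output behavior. The function names are hypothetical, and the exact definitions are assumptions: rotation is taken to move the last symbol to the front, "first half" is taken to round down on odd lengths, and "squaring" is taken to mean repeating a string of length n a total of n times.

```python
def rotate_right(w: str) -> str:
    """String rotation (assumed: last symbol moves to the front)."""
    return w[-1:] + w[:-1]


def copy_first_half(w: str) -> str:
    """Copy the first half of a string (assumed: floor(len/2) symbols)."""
    return w[: len(w) // 2]


def square(w: str) -> str:
    """Squaring a string (assumed: w repeated len(w) times, so the
    output length is len(w) ** 2)."""
    return w * len(w)


if __name__ == "__main__":
    for word in ["abcde", "abcdef", "abc"]:
        print(word, rotate_right(word), copy_first_half(word), square(word))
```

Under these assumptions, the output length grows linearly with the input for the first two functions and quadratically for the third, which matches the abstract's progression from rational and regular functions to polyregular ones.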
Taxonomy
Topics: Neural Networks and Applications
