Average Attention Transformers and Arithmetic Circuits
Lena Ehrmuth, Laura Strieker

TL;DR
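This paper shows that transformer encoders with average hard attention can simulate constant-depth arithmetic circuits given as input, and that with typical average attention the functions these transformers compute are also computed by the same class of circuit families. The results hold over the reals, the rationals and any ring in between.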
Contribution
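It shows that average hard attention is enough to simulate arithmetic circuits supplied as input to an encoder, using transformers that have arithmetic circuits in place of feed-forward networks, and it places the functions such transformers compute within the same class of circuit families.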
Findings
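Average hard attention can simulate constant-depth arithmetic circuit families with unbounded addition, binary multiplication and sign gates, when the circuit is given as input to the encoder.
With typical average attention, the functions these transformers compute are also computed by the same class of circuit families.
The results hold for transformers over the reals, the rationals and any ring in between the two.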
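To make the gate types named in the findings concrete, here is a minimal Python sketch that evaluates a small circuit with unbounded-fan-in addition, binary multiplication and sign gates over the rationals. The gate encoding, the `evaluate` helper and the convention that the sign gate returns -1, 0 or 1 are illustrative assumptions, not the paper's definitions.

```python
from fractions import Fraction


def sign(x):
    # Assumed convention: -1, 0 or 1. The paper may define its sign gate differently.
    return Fraction((x > 0) - (x < 0))


def evaluate(circuit, inputs):
    """Evaluate a circuit given as a topologically ordered list of (kind, args) gates.

    The value of the last gate is returned as the circuit's output.
    """
    values = []
    for kind, args in circuit:
        if kind == "input":
            # args holds the index of the input variable to read
            values.append(Fraction(inputs[args[0]]))
        elif kind == "add":
            # unbounded fan-in: any number of predecessor gates
            values.append(sum((values[i] for i in args), Fraction(0)))
        elif kind == "mul":
            # binary: exactly two predecessor gates
            a, b = args
            values.append(values[a] * values[b])
        elif kind == "sign":
            values.append(sign(values[args[0]]))
        else:
            raise ValueError(f"unknown gate kind: {kind}")
    return values[-1]


# Hypothetical example circuit computing sign(x0 * x1 + x2)
example = [
    ("input", (0,)),
    ("input", (1,)),
    ("input", (2,)),
    ("mul", (0, 1)),
    ("add", (3, 2)),
    ("sign", (4,)),
]
```

For instance, `evaluate(example, [2, -3, 5])` computes the sign of 2 * (-3) + 5. Using `Fraction` keeps the arithmetic exact, which matches the setting of circuits over the rationals.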
Abstract
We analyse the computational power of transformer encoders as sequence-to-sequence functions on vectors. We show that average hard attention can be used to simulate arithmetic circuits if they are given as an input to an encoder. The circuit families that can be simulated this way have constant depth while using unbounded addition, binary multiplication and sign gates. The transformers we use have arithmetic circuits instead of feed-forward networks. With typical average attention the functions they compute are also computed by the same class of circuit families. Our results hold for transformers over the reals, rationals and any ring in between the two.
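As a rough illustration of average hard attention, the sketch below uses the common formulation in which attention weight is split uniformly over all positions whose score is maximal. The function name and the list-based interface are hypothetical, and the paper's formal definition may differ in detail.

```python
from fractions import Fraction


def average_hard_attention(scores, values):
    """Return the average of the value vectors at all positions with maximal score.

    Sketch under the assumption that ties for the maximum share the weight equally.
    """
    top = max(scores)
    winners = [v for s, v in zip(scores, values) if s == top]
    dim = len(values[0])
    # Exact rational averaging, one coordinate at a time
    return [
        sum((w[k] for w in winners), Fraction(0)) / len(winners)
        for k in range(dim)
    ]


# Positions 1 and 2 tie for the maximal score, so their values are averaged.
out = average_hard_attention([1, 3, 3, 0], [[1, 0], [2, 4], [4, 0], [9, 9]])
```

Here `out` is the coordinate-wise mean of `[2, 4]` and `[4, 0]`. When a single position attains the maximum, the function simply returns that position's value vector.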
