Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation
Samuel Cognolato, Alberto Testolin

TL;DR
This paper demonstrates that universal transformers with local attention and adaptive halting can learn to perform multi-digit addition, discovering human-like calculation strategies and generalizing beyond training data.
Contribution
The study introduces a universal transformer with local attention and adaptive halting that learns symbolic addition by operating on an external, grid-like memory, which enables extrapolation beyond the training distribution and human-like calculation strategies.
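To make the idea of local attention over a grid concrete, the following is a minimal sketch of a boolean attention mask in which each grid cell may attend only to cells in its immediate neighborhood. The function name `local_grid_mask`, the square window, and the `radius` parameter are assumptions made for illustration; the summary above does not state the window shape or size used in the paper.

```python
def local_grid_mask(rows, cols, radius=1):
    """Return a (rows*cols) x (rows*cols) boolean mask for a 2D grid.

    mask[q][k] is True when key cell k lies within `radius` rows and
    `radius` columns of query cell q. The square (Chebyshev) window is
    an illustrative assumption, not the paper's stated design.
    """
    # Flatten the grid row by row so cell (r, c) has index r * cols + c.
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    return [
        [max(abs(qr - kr), abs(qc - kc)) <= radius for (kr, kc) in cells]
        for (qr, qc) in cells
    ]


if __name__ == "__main__":
    mask = local_grid_mask(3, 3, radius=1)
    # Print which flattened cell indices the top-left corner may attend to.
    print([k for k, allowed in enumerate(mask[0]) if allowed])
```

With a mask like this, a cell's update depends only on nearby cells, so the same local rule can in principle be applied to grids larger than those seen during training.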
Findings
Achieves high accuracy on multi-digit addition tasks
Discovers human-like calculation strategies such as place value alignment
Successfully generalizes to problems outside training distribution
Abstract
Mathematical reasoning is one of the most impressive achievements of human intellect but remains a formidable challenge for artificial intelligence systems. In this work we explore whether modern deep learning architectures can learn to solve a symbolic addition task by discovering effective arithmetic procedures. Although the problem might seem trivial at first glance, generalizing arithmetic knowledge to operations involving a larger number of terms, possibly composed of longer sequences of digits, has proven extremely challenging for neural networks. Here we show that universal transformers equipped with local attention and adaptive halting mechanisms can learn to exploit an external, grid-like memory to carry out multi-digit addition. The proposed model achieves remarkable accuracy even when tested with problems requiring extrapolation outside the training distribution; most…
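As a point of reference for what a grid-based, step-by-step addition procedure looks like, here is a hand-written sketch of the familiar column-wise algorithm with a simple halting condition. It is not the learned model and is not taken from the paper; `build_grid`, `add_on_grid`, and `max_steps` are hypothetical names, and the sketch assumes non-negative integer operands.

```python
def build_grid(operands):
    """Right-align operands in a grid of digits, one operand per row."""
    width = max(len(str(n)) for n in operands)
    # zfill pads with leading zeros so place values line up by column.
    return [[int(ch) for ch in str(n).zfill(width)] for n in operands]


def add_on_grid(operands, max_steps=64):
    """Add non-negative integers column by column, right to left.

    Returns (total, steps). The loop halts as soon as every column has
    been processed and no carry remains, a hand-coded stand-in for an
    adaptive halting rule. If max_steps is too small for the input,
    the result is truncated and therefore wrong.
    """
    grid = build_grid(operands)
    col = len(grid[0]) - 1
    carry = 0
    result_digits = []  # least significant digit first
    steps = 0
    while steps < max_steps:
        if col < 0 and carry == 0:
            break  # nothing left to process: halt
        column_sum = carry
        if col >= 0:
            column_sum += sum(row[col] for row in grid)
        result_digits.append(column_sum % 10)  # digit written to the result row
        carry = column_sum // 10               # carry passed to the next column
        col -= 1
        steps += 1
    total = int("".join(str(d) for d in reversed(result_digits)))
    return total, steps


if __name__ == "__main__":
    print(add_on_grid([478, 95, 6]))  # (579, 3)
    print(add_on_grid([999, 1]))      # (1000, 4)
```

Each step reads only one column and the incoming carry, and the number of steps grows with the number of digits rather than being fixed in advance, which is the kind of behavior local attention and adaptive halting are meant to support.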
Taxonomy
Topics: Computational Physics and Python Applications · Model Reduction and Neural Networks · Neural Networks and Applications
