Training Transformers as a Universal Computer
Ruize Xu, Chenxiao Yang, Yanhong Li, and David McAllester

TL;DR
This paper shows that a small transformer can be trained to execute programs in a universal programming language, demonstrating the potential of transformers as universal computers.
Contribution
It introduces a method for training transformers to execute MicroPy programs, showing they can generalize to complex and novel computations.
Findings
A transformer can learn to execute MicroPy programs after training only on randomly generated code.
The trained transformer generalizes to human-written and out-of-distribution programs.
Because MicroPy is computationally universal, a transformer that executes it can in principle act as a universal computer.
Abstract
We demonstrate that a small transformer can learn to execute programs in MicroPy, a simplified yet computationally universal programming language. Given procedure definitions together with an expression to evaluate, the transformer predicts small-step execution using PENCIL scaffolding for space-efficient execution within a bounded context window. After training on randomly generated, meaningless MicroPy programs, the learned transformer generalizes to various human-written programs including bit copying and flipping, binary addition and multiplication, and SAT verification and solving. We note that the trained model can achieve out-of-distribution generalization; i.e., it can evaluate novel programs drawn from a distribution over programs different from the one seen in training. Since MicroPy can express any computation, our results provide empirical evidence that a standard transformer can be trained to act as a universal computer.
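To make the idea of small-step execution concrete, the following is a minimal Python sketch, not MicroPy and not the paper's method. It assumes a hypothetical toy language in which an expression is either an integer or a tuple `(op, left, right)`; the names `step`, `trace`, and `evaluate_bounded` are illustrative only. Each step rewrites exactly one subexpression, which is the kind of state-to-state transition a model would be asked to predict. The bounded variant keeps only the current state, which loosely mirrors the goal of space-efficient execution but is not PENCIL's actual mechanism.

```python
# Hypothetical toy language: an expression is an int or a tuple (op, left, right).
# This is NOT MicroPy; it only illustrates small-step evaluation.
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def step(expr):
    """Perform one small step: reduce the leftmost innermost operation."""
    op, left, right = expr
    if not isinstance(left, int):
        return (op, step(left), right)
    if not isinstance(right, int):
        return (op, left, step(right))
    return OPS[op](left, right)


def trace(expr):
    """Return every intermediate state, from the input expression to its value."""
    states = [expr]
    while not isinstance(expr, int):
        expr = step(expr)
        states.append(expr)
    return states


def evaluate_bounded(expr):
    """Keep only the current state and a step counter, discarding earlier states."""
    steps = 0
    while not isinstance(expr, int):
        expr = step(expr)
        steps += 1
    return expr, steps


program = ("+", ("*", 2, 3), ("+", 4, 5))

if __name__ == "__main__":
    for state in trace(program):
        print(state)
    print(evaluate_bounded(program))
```

Here `trace` grows with the length of the computation, while `evaluate_bounded` holds a single state at a time; the contrast is meant only to suggest why discarding finished intermediate work matters when the context window is bounded.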
