Autoregressive Large Language Models are Computationally Universal
Dale Schuurmans, Hanjun Dai, Francesco Zanini

TL;DR
This paper shows that an existing autoregressive large language model, gemini-1.5-pro-001, can perform universal computation: under a generalized form of autoregressive decoding, it can be prompted to simulate a universal Lag system, which in turn simulates a universal Turing machine.
Contribution
The paper links a generalized form of autoregressive decoding to Lag systems, a classical model of computation known to be universal, and shows that an existing large language model can simulate a universal Turing machine.
Findings
Autoregressive models can realize universal computation without weight modification.
A single prompt can drive the model to simulate a universal Lag system.
The model can implement a Turing machine through extended autoregressive decoding.
Abstract
We show that autoregressive decoding of a transformer-based language model can realize universal computation, without external intervention or modification of the model's weights. Establishing this result requires understanding how a language model can process arbitrarily long inputs using a bounded context. For this purpose, we consider a generalization of autoregressive decoding where, given a long input, emitted tokens are appended to the end of the sequence as the context window advances. We first show that the resulting system corresponds to a classical model of computation, a Lag system, that has long been known to be computationally universal. By leveraging a new proof, we show that a universal Turing machine can be simulated by a Lag system with 2027 production rules. We then investigate whether an existing large language model can simulate the behaviour of such a universal Lag…
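To make the decoding scheme in the abstract concrete, here is a minimal Python sketch of a Lag system. It assumes the usual formulation: at each step a rule is selected by the first `window` symbols of the sequence, the rule's output is appended to the end, and only the first symbol is dropped, which mirrors a context window advancing by one token. The function name `run_lag_system`, the window size of 2, and the two toy rules are illustrative assumptions; they are not the paper's 2027-rule universal system.

```python
from collections import deque
from itertools import islice


def run_lag_system(rules, initial, window=2, max_steps=1000):
    """Run a toy Lag system (illustrative sketch, not the paper's rule set).

    rules:   dict mapping a tuple of `window` symbols to the string to append.
    initial: starting sequence of symbols.
    Halts when no rule matches the leading symbols, when the sequence is
    shorter than the window, or after `max_steps` steps.
    """
    queue = deque(initial)
    for _ in range(max_steps):
        if len(queue) < window:
            break  # not enough symbols left to fill the context window
        context = tuple(islice(queue, window))  # symbols currently "in context"
        if context not in rules:
            break  # no production rule applies: the system halts
        queue.extend(rules[context])  # emitted symbols go to the end
        queue.popleft()  # the window advances by exactly one symbol
    return "".join(queue)


# Hypothetical toy rules: each leading "aa" emits one "b"; "ab" emits nothing.
TOY_RULES = {
    ("a", "a"): "b",
    ("a", "b"): "",
}

print(run_lag_system(TOY_RULES, "aaa"))  # prints bb
```

In this analogy, the rule lookup plays the role of the language model: it reads a bounded context and emits the next tokens, while the queue holds an input that may be much longer than the context itself.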
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
