On the Turing Completeness of Modern Neural Network Architectures
Jorge Pérez, Javier Marinković, Pablo Barceló

TL;DR
This paper demonstrates that modern neural network architectures like Transformers and Neural GPUs are Turing complete solely through their internal computations, without external memory, highlighting their theoretical computational power.
Contribution
It proves the Turing completeness of Transformers and Neural GPUs based on their internal representations, clarifying their computational capabilities without external memory.
Findings
Transformers are Turing complete through internal dense representations.
Neural GPUs achieve Turing completeness without external memory.
Minimal elements needed for Turing completeness are identified.
Abstract
Alternatives to recurrent neural networks, in particular, architectures based on attention or convolutions, have been gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored. We study the computational power of two of the most paradigmatic architectures exemplifying these mechanisms: the Transformer (Vaswani et al., 2017) and the Neural GPU (Kaiser & Sutskever, 2016). We show both models to be Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. In particular, neither the Transformer nor the Neural GPU requires access to an external memory to become Turing complete. Our study also reveals some minimal sets of elements needed to obtain these completeness results.
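As a point of reference for the attention mechanism the abstract mentions, the following is a minimal sketch of scaled dot-product attention as defined in Vaswani et al. (2017), written in plain Python. It is an illustration only and is not the construction used in the paper's proof. The function names `softmax` and `attention` are chosen here for the example, and the sketch handles a single query vector with no learned projections.

```python
import math


def softmax(scores):
    """Turn a list of real scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query:  list of d floats
    keys:   list of n vectors, each of length d
    values: list of n vectors, each of the same length
    Returns the weighted average of the value vectors.
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of value vectors, computed per coordinate.
    return [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim)
    ]
```

When the query matches every key equally, the weights are uniform and the result is the plain average of the value vectors. Each output is computed only from the dense vectors passed in, with no separate memory structure being read or written.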
Taxonomy
Topics: Neural Networks and Applications · Computability, Logic, AI Algorithms · Fuzzy Logic and Control Systems
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
