On the Turing Completeness of Modern Neural Network Architectures

Jorge Pérez; Javier Marinković; Pablo Barceló

arXiv:1901.03429 · cs.LG · January 14, 2019 · 61 cites

TL;DR

This paper shows that the Transformer and the Neural GPU are Turing complete based solely on their capacity to compute and access internal dense representations of the data. Neither architecture requires external memory.

Contribution

The paper proves that the Transformer and the Neural GPU are Turing complete using only their internal dense representations, which establishes that external memory is not needed for either architecture to reach this level of computational power.

Findings

01. The Transformer is Turing complete based solely on its capacity to compute and access internal dense representations of the data.

02. The Neural GPU is Turing complete without access to an external memory.

03. Some minimal sets of elements needed to obtain these completeness results are identified.

Abstract

Alternatives to recurrent neural networks, in particular, architectures based on attention or convolutions, have been gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored. We study the computational power of two of the most paradigmatic architectures exemplifying these mechanisms: the Transformer (Vaswani et al., 2017) and the Neural GPU (Kaiser & Sutskever, 2016). We show both models to be Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. In particular, neither the Transformer nor the Neural GPU requires access to an external memory to become Turing complete. Our study also reveals some minimal sets of elements needed to obtain these completeness results.

Taxonomy

Topics: Neural Networks and Applications · Computability, Logic, AI Algorithms · Fuzzy Logic and Control Systems

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax