On the Computational Power of Transformers and its Implications in Sequence Modeling

Satwik Bhattamishra; Arkil Patel; Navin Goyal

arXiv:2006.09286·cs.LG·October 13, 2020

TL;DR

This paper studies the computational power of Transformers as captured by Turing-completeness. It shows that both vanilla Transformers and Transformers with only positional masking (no positional encoding) are Turing-complete, examines which components that power depends on, and validates the findings experimentally on translation and synthetic tasks.

Contribution

The paper gives an alternate, simpler proof that vanilla Transformers are Turing-complete and analyzes which components, such as residual connections, are necessary for that result.

Findings

01

Transformers with only positional masking, and no positional encodings, are still Turing-complete.

02

A specific residual connection type is essential for Turing-completeness.

03

Experiments on translation and synthetic tasks illustrate the practical implications of the theoretical findings.

Abstract

Transformers are being used extensively across several sequence modeling tasks. Significant research effort has been devoted to experimentally probe the inner workings of Transformers. However, our conceptual and theoretical understanding of their power and inherent limitations is still nascent. In particular, the roles of various components in Transformers such as positional encodings, attention heads, residual connections, and feedforward networks, are not clear. In this paper, we take a step towards answering these questions. We analyze the computational power as captured by Turing-completeness. We first provide an alternate and simpler proof to show that vanilla Transformers are Turing-complete and then we prove that Transformers with only positional masking and without any positional encoding are also Turing-complete. We further analyze the necessity of each component for the…
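The abstract's distinction between positional masking and positional encoding can be illustrated with a small sketch. It assumes single-head dot-product self-attention with identity query, key, and value maps, which is a simplification chosen for illustration and not the construction used in the paper; the names `self_attention`, `softmax`, and `close` are hypothetical helpers. Without any positional encoding, unmasked self-attention gives a token the same output wherever it sits in the sequence, whereas a causal mask makes the output depend on which tokens precede it.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(xs, causal):
    """Single-head dot-product self-attention with identity Q/K/V maps.

    xs: list of token vectors; no positional encoding is added.
    causal: if True, position i attends only to positions j <= i.
    """
    outputs = []
    for i, query in enumerate(xs):
        visible = xs[: i + 1] if causal else xs
        scores = [sum(q * k for q, k in zip(query, key)) for key in visible]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, visible))
            for d in range(len(query))
        ])
    return outputs


def close(u, v, tol=1e-9):
    """Element-wise comparison of two vectors up to a small tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(u, v))


# Two orderings of the same tokens: 'a' is first in seq_1 and last in seq_2.
a, b = [1.0, 0.0], [0.0, 1.0]
seq_1 = [a, b, b]
seq_2 = [b, b, a]

unmasked_1 = self_attention(seq_1, causal=False)
unmasked_2 = self_attention(seq_2, causal=False)
masked_1 = self_attention(seq_1, causal=True)
masked_2 = self_attention(seq_2, causal=True)

# Unmasked: the output for token 'a' ignores where 'a' sits.
print(close(unmasked_1[0], unmasked_2[2]))  # True
# Masked: the output for 'a' depends on what precedes it.
print(close(masked_1[0], masked_2[2]))      # False
```

In this toy setting the mask alone lets the model tell the two orderings apart, which gives some intuition for why masking can stand in for explicit positional information.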

Code & Models

Repositories

satwik77/Transformer-Computation-Analysis
PyTorch · Official

Taxonomy

Methods: Residual Connection