Saturated Transformers are Constant-Depth Threshold Circuits

William Merrill; Ashish Sabharwal; Noah A. Smith

arXiv:2106.16213·cs.CL·April 12, 2022

TL;DR

This paper demonstrates that saturated transformers, which better model practical attention mechanisms, can be simulated by constant-depth threshold circuits, extending the theoretical understanding of their computational power.

Contribution

It shows that saturated transformers can be simulated by constant-depth threshold circuits, placing them in TC^0, while surpassing the known limitations of hard-attention transformers in formal language recognition.

Findings

01

Saturated transformers transcend hard-attention limitations.

02

They can be simulated by constant-depth threshold circuits.

03

The languages they recognize lie within the class TC^0.
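
A threshold gate outputs 1 exactly when a weighted sum of its Boolean inputs meets a threshold. TC^0 is the class of constant-depth, polynomial-size circuits built from such gates together with AND, OR, and NOT. A single threshold gate already computes MAJORITY, a function known to lie outside AC^0, the constant-depth AND/OR class that bounds hard-attention transformers. The following is a minimal illustrative sketch; the names `threshold_gate` and `majority` are hypothetical and do not come from the paper.

```python
from typing import Sequence


def threshold_gate(inputs: Sequence[int], weights: Sequence[int], theta: int) -> int:
    """Hypothetical helper: return 1 iff sum(w_i * x_i) >= theta over Boolean inputs."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= theta else 0


def majority(bits: Sequence[int]) -> int:
    """MAJORITY as one threshold gate: 1 iff strictly more than half the bits are 1."""
    n = len(bits)
    # With unit weights, "more ones than zeros" means sum > n/2, i.e. sum >= n//2 + 1.
    return threshold_gate(bits, [1] * n, n // 2 + 1)


print(majority([1, 1, 0]))     # 1
print(majority([1, 0, 0, 1]))  # 0 (a tie is not a strict majority)
```

Unit weights suffice for MAJORITY. General threshold gates allow arbitrary integer weights, as in `threshold_gate([1, 0, 1], [2, -1, 1], 3)`, which returns 1.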

Abstract

Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that transformers with hard attention are quite limited in power (Hahn, 2020), as they can be simulated by constant-depth AND/OR circuits (Hao et al. 2021). However, hard attention is a strong assumption, which may complicate the relevance of these results in practice. In this work, we analyze the circuit complexity of transformers with saturated attention: a generalization of hard attention that more closely captures the attention patterns learnable in practical transformers. We first show that saturated transformers transcend the known limitations of hard-attention transformers. We then prove saturated transformers with floating-point values can be simulated by constant-depth threshold circuits, giving…
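
To make the difference between the two attention variants concrete, here is a minimal sketch under stated assumptions. It assumes hard attention selects a single maximal-score position, with ties broken toward the leftmost position (a choice made here only for illustration). It also assumes saturated attention averages uniformly over all maximal-score positions, which is the limit of softmax attention as the scores are scaled toward infinity. Scalar values stand in for vectors. The function names, including `majority_via_uniform_attention`, are hypothetical and do not reproduce the paper's constructions.

```python
import math
from typing import Sequence


def softmax_attention(scores: Sequence[float], values: Sequence[float], scale: float = 1.0) -> float:
    """Ordinary softmax attention with the scores multiplied by `scale`."""
    m = max(scale * s for s in scores)  # subtract the max for numerical stability
    weights = [math.exp(scale * s - m) for s in scores]
    z = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / z


def hard_attention(scores: Sequence[float], values: Sequence[float]) -> float:
    """Attend to one maximal position (leftmost on ties, an illustrative assumption)."""
    return values[list(scores).index(max(scores))]


def saturated_attention(scores: Sequence[float], values: Sequence[float]) -> float:
    """Average uniformly over every position whose score equals the maximum."""
    best = max(scores)
    chosen = [v for s, v in zip(scores, values) if s == best]
    return sum(chosen) / len(chosen)


def majority_via_uniform_attention(bits: Sequence[int]) -> int:
    """With all scores equal, saturated attention returns the fraction of 1s."""
    fraction = saturated_attention([0.0] * len(bits), [float(b) for b in bits])
    return 1 if fraction > 0.5 else 0


scores, values = [2.0, 5.0, 5.0, 1.0], [10.0, 4.0, 8.0, 0.0]
print(hard_attention(scores, values))       # 4.0 (only the leftmost maximum)
print(saturated_attention(scores, values))  # 6.0 (mean of 4.0 and 8.0)
print(softmax_attention(scores, values, scale=50.0))  # approaches 6.0 as scale grows
print(majority_via_uniform_attention([1, 0, 1, 1]))   # 1
```

Hard attention can only copy one position's value. Uniform saturated attention, by contrast, can aggregate a count over the whole input, which is the kind of computation a threshold gate performs and AND/OR circuits of constant depth cannot.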
