Average-Hard Attention Transformers are Constant-Depth Uniform Threshold Circuits

Lena Strobl

arXiv:2308.03212·cs.CL·August 23, 2023·1 cites

TL;DR

This paper demonstrates that average-hard attention transformers can be simulated by uniform constant-depth threshold circuits, extending previous results that linked transformers with non-uniform circuits, thus providing a deeper understanding of their computational power.

Contribution

It extends prior work by proving that average-hard attention transformers can be simulated by uniform TC0 circuits, not only by non-uniform ones, so the upper bound on their computational power comes with a uniform circuit family.

Findings

01

Average-hard attention transformers recognize only languages in the class TC0.

02

Uniform TC0 circuits can simulate average-hard attention transformers.

03

The result places average-hard attention transformers, like log-precision transformers, within uniform TC0, tying both models to the same classical circuit complexity class.
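For readers unfamiliar with the gates that TC0 circuits are built from, here is a minimal Python sketch of a threshold gate, with MAJORITY expressed as a single such gate. The function names `threshold_gate` and `majority` are illustrative choices, not taken from the paper, and the sketch only shows the gate type, not the paper's construction.

```python
from typing import Sequence


def threshold_gate(inputs: Sequence[int], weights: Sequence[int], theta: int) -> int:
    """Return 1 iff the weighted sum of the Boolean inputs is at least theta."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= theta)


def majority(bits: Sequence[int]) -> int:
    """MAJORITY as one threshold gate: 1 iff strictly more than half the bits are 1."""
    n = len(bits)
    # Unit weights; the threshold is the smallest integer above n / 2.
    return threshold_gate(bits, [1] * n, n // 2 + 1)


if __name__ == "__main__":
    print(majority([1, 1, 0]))
    print(majority([1, 0, 1, 0]))
```

A TC0 circuit family stacks a constant number of layers of such gates, with polynomially many gates per input length; uniformity additionally requires that the circuit for each input length can be generated by a single resource-bounded procedure.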

Abstract

Transformers have emerged as a widely used neural network model for various natural language processing tasks. Previous research explored their relationship with constant-depth threshold circuits, making two assumptions: average-hard attention and logarithmic precision for internal computations relative to input length. Merrill et al. (2022) prove that average-hard attention transformers recognize languages that fall within the complexity class TC0, denoting the set of languages that can be recognized by constant-depth polynomial-size threshold circuits. Likewise, Merrill and Sabharwal (2023) show that log-precision transformers recognize languages within the class of uniform TC0. This shows that both transformer models can be simulated by constant-depth threshold circuits, with the latter being more robust due to generating a uniform circuit family. Our paper shows that the first…
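As a rough illustration of the average-hard attention assumption mentioned in the abstract, the following Python sketch spreads attention uniformly over the positions that attain the maximum score and returns the average of their value vectors. The function name `average_hard_attention` and the use of exact `Fraction` arithmetic are assumptions made for this example, not details from the paper.

```python
from fractions import Fraction
from typing import List, Sequence


def average_hard_attention(
    scores: Sequence[Fraction],
    values: Sequence[Sequence[Fraction]],
) -> List[Fraction]:
    """Average the value vectors at all positions whose score equals the maximum."""
    best = max(scores)
    # Every position tied for the maximum score receives equal weight.
    winners = [i for i, s in enumerate(scores) if s == best]
    dim = len(values[0])
    return [
        sum((Fraction(values[i][d]) for i in winners), Fraction(0)) / len(winners)
        for d in range(dim)
    ]


if __name__ == "__main__":
    scores = [Fraction(1), Fraction(3), Fraction(3)]
    values = [[Fraction(0)], [Fraction(2)], [Fraction(4)]]
    print(average_hard_attention(scores, values))
```

Unlike softmax attention, positions below the maximum score contribute nothing, so the output is an average over a subset of positions and its denominator is at most the input length.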

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Topic Modeling · Machine Learning and Algorithms