Logical Languages Accepted by Transformer Encoders with Hard Attention

Pablo Barcelo; Alexander Kozachinskiy; Anthony Widjaja Lin; Vladimir Podolskii

arXiv:2310.03817·cs.FL·October 9, 2023·2 cites

Open Access 3 Reviews

TL;DR

This paper investigates the expressive power of transformer encoders with hard attention, locating the formal languages they recognize relative to the circuit complexity classes $AC^{0}$ and $TC^{0}$.

Contribution

It bounds, from above and below, the classes of formal languages recognized by UHAT (unique hard attention) and AHAT (average hard attention) transformer encoders.

Findings

01. UHAT encoders recognize only a subset of $AC^{0}$ languages.

02. AHAT encoders can recognize all languages definable in first-order logic with unary predicates.

03. UHAT cannot recognize some $AC^{0}$ languages, but can recognize all regular languages within $AC^{0}$.

Abstract

We contribute to the study of formal languages that can be recognized by transformer encoders. We focus on two self-attention mechanisms: (1) UHAT (Unique Hard Attention Transformers) and (2) AHAT (Average Hard Attention Transformers). UHAT encoders are known to recognize only languages inside the circuit complexity class $AC^{0}$, i.e., accepted by a family of poly-sized and depth-bounded boolean circuits with unbounded fan-ins. On the other hand, AHAT encoders can recognize languages outside $AC^{0}$, but their expressive power still lies within the bigger circuit complexity class $TC^{0}$, i.e., $AC^{0}$-circuits extended by majority gates. We first show a negative result that there is an $AC^{0}$-language that cannot be recognized by a UHAT encoder. On the positive side, we show that UHAT encoders can recognize a rich fragment of $AC^{0}$-languages, namely,…
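The difference between the two attention mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the paper's construction: the function names, the scalar values, and the leftmost/rightmost tie-breaking rule for unique hard attention are assumptions made for illustration. Unique hard attention returns the value at a single maximal-score position, while average hard attention returns the mean over all maximal-score positions.

```python
from fractions import Fraction


def unique_hard_attention(scores, values, leftmost=True):
    """Return the value at one position with maximal score.

    Assumption: ties are broken by taking the leftmost (or rightmost)
    maximizer; this tie-breaking rule is chosen here for illustration.
    """
    best = max(scores)
    maximizers = [i for i, s in enumerate(scores) if s == best]
    pick = maximizers[0] if leftmost else maximizers[-1]
    return values[pick]


def average_hard_attention(scores, values):
    """Return the exact mean of the values at all maximal-score positions."""
    best = max(scores)
    maximizers = [i for i, s in enumerate(scores) if s == best]
    total = sum(Fraction(values[i]) for i in maximizers)
    return total / len(maximizers)


def majority_via_average_attention(bits):
    """Hypothetical helper: decide whether a bit string has more 1s than 0s.

    With constant scores every position ties, so average hard attention
    returns the fraction of 1s, which is then compared against 1/2.
    """
    scores = [0] * len(bits)
    return average_hard_attention(scores, bits) > Fraction(1, 2)
```

For example, with scores `[1, 3, 3, 2]` and values `[10, 20, 30, 40]`, unique hard attention picks one of the two tied positions, whereas average hard attention returns their mean. The `majority_via_average_attention` helper hints at why averaging adds power: MAJORITY is a standard example of a language outside $AC^{0}$, and a single averaging step followed by a threshold comparison is enough to decide it in this sketch.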

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 2

Strengths

The paper contributes to a fun connection with formal language theory in order to analyze the expressivity of transformers. This line of research sounds more like research performed in TCS than in AI tracks, but since it is applied to AI-defined models it makes some sense. Understanding the expressivity of those models might be enlightening to people actually playing with them.

Weaknesses

Some of the contributions sound really artificial to me. In particular, why should we study the commutative closure of languages defined by transformers? While it makes sense in TCS, I fail to grasp the importance in this context.

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

1. The study about whether circuit language can be accepted by existing transformer encoders is interesting and impressive.

2. The paper offers comprehensive theoretical justification to demonstrate and validate their findings.

Weaknesses

1. The structure of this paper is very messy, which makes it very hard to follow. Let me take the Introduction as an instance:

1.a In Section 1 (Introduction), the paper claims that "the expressive power of transformer encoders has not been fully elucidated to date." I am curious about that. What do you mean by not fully elucidated?

1.b I am very confused about the challenges of studying the circuit language. I could not find any information to discuss the existing challenges and relate

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

Justifying the expressiveness of Transformers through the lens of formal languages is a very important topic and could lead to a better understanding of the working mechanisms of Transformers. This paper strengthens prior theoretical results and better bounds the expressiveness of UHAT and AHAT.

Weaknesses

Although the theoretical results themselves sound interesting, I found that some definitions and assumptions are not appropriately stated, which potentially leads to incorrect results. While it is possible that the theoretical results still hold after fixing all the problems, I think the paper needs a major revision to ensure its validity.

Missing important restrictions when defining the transformer model.

- Precision of the numbers processed by the transformer. The paper does not include any restr

