Logical Languages Accepted by Transformer Encoders with Hard Attention
Pablo Barcelo, Alexander Kozachinskiy, Anthony Widjaja Lin, Vladimir Podolskii

TL;DR
This paper investigates the expressive power of transformer encoders with hard attention mechanisms, showing their limitations and capabilities in recognizing formal languages within certain circuit complexity classes.
Contribution
It characterizes the classes of formal languages recognized by UHAT and AHAT transformer encoders, revealing their computational boundaries and capabilities.
Findings
UHAT encoders recognize only a subset of $\mathrm{AC}^0$ languages.
AHAT encoders can recognize all languages definable in first-order logic with unary numerical predicates.
UHAT encoders cannot recognize some $\mathrm{AC}^0$ languages, but they can recognize all regular languages within $\mathrm{AC}^0$.
Abstract
We contribute to the study of formal languages that can be recognized by transformer encoders. We focus on two self-attention mechanisms: (1) UHAT (Unique Hard Attention Transformers) and (2) AHAT (Average Hard Attention Transformers). UHAT encoders are known to recognize only languages inside the circuit complexity class $\mathrm{AC}^0$, i.e., languages accepted by a family of poly-sized and depth-bounded boolean circuits with unbounded fan-ins. On the other hand, AHAT encoders can recognize languages outside $\mathrm{AC}^0$, but their expressive power still lies within the bigger circuit complexity class $\mathrm{TC}^0$, i.e., $\mathrm{AC}^0$-circuits extended by majority gates. We first show a negative result that there is an $\mathrm{AC}^0$-language that cannot be recognized by a UHAT encoder. On the positive side, we show that UHAT encoders can recognize a rich fragment of $\mathrm{AC}^0$-languages, namely,…
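The Python sketch below contrasts the two attention mechanisms named in the abstract at the level of a single head. It is an illustration only, not the paper's formal definition. The dot-product scoring, the leftmost tie-breaking rule for UHAT, and all helper names (`unique_hard_attention`, `average_hard_attention`, `first_a_position`, `majority_ahat`) are assumptions made for this example. Real encoders also use positional encodings, several layers, and feed-forward sublayers. The `majority_ahat` demo relies on a standard observation: averaging over all tied positions yields an exact letter frequency, which lets AHAT recognize MAJORITY, a language known to lie outside $\mathrm{AC}^0$.

```python
from fractions import Fraction
from typing import List, Sequence

Vec = List[Fraction]


def dot(u: Sequence[Fraction], v: Sequence[Fraction]) -> Fraction:
    """Attention score: plain inner product of query and key (assumed scoring)."""
    return sum((a * b for a, b in zip(u, v)), Fraction(0))


def unique_hard_attention(query: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """UHAT-style head: return the value at ONE maximal-score position.

    Ties are broken toward the leftmost position (an assumed convention).
    """
    scores = [dot(query, k) for k in keys]
    j = scores.index(max(scores))  # list.index returns the leftmost maximizer
    return list(values[j])


def average_hard_attention(query: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """AHAT-style head: average the values at ALL maximal-score positions."""
    scores = [dot(query, k) for k in keys]
    best = max(scores)
    winners = [j for j, s in enumerate(scores) if s == best]
    dim = len(values[0])
    return [sum((values[j][d] for j in winners), Fraction(0)) / len(winners)
            for d in range(dim)]


def first_a_position(word: str) -> int:
    """Hypothetical UHAT demo: index of the leftmost 'a' (0 if none; word must be non-empty)."""
    one = Fraction(1)
    keys = [[one if c == "a" else Fraction(0)] for c in word]  # 'a' positions score 1
    values = [[Fraction(i)] for i in range(len(word))]         # value = position index
    return int(unique_hard_attention([one], keys, values)[0])


def majority_ahat(word: str) -> bool:
    """Hypothetical AHAT demo: accept words over {a, b} with strictly more a's than b's."""
    if not word:
        return False
    one = Fraction(1)
    keys = [[one] for _ in word]  # identical keys: every position ties for the max
    values = [[one if c == "a" else Fraction(0)] for c in word]
    frac_a = average_hard_attention([one], keys, values)[0]  # exact fraction of a's
    return frac_a > Fraction(1, 2)
```

For example, `first_a_position("bbab")` selects position 2, and `majority_ahat("aab")` accepts while `majority_ahat("ab")` rejects. A unique-hard-attention head can only copy one selected position. Averaging over ties is what exposes counts, and counting is exactly the capability that majority gates add on top of $\mathrm{AC}^0$.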
Peer Reviews
Decision: ICLR 2024 poster
The paper contributes to a fun connection between formal language theory and transformers in order to analyze the expressivity of transformers. This line of research sounds more like research performed in TCS than in AI tracks, but since it is applied to an AI-defined model it makes some sense. Understanding the expressivity of those models might be enlightening to people actually playing with them.
Some of the contributions sound really artificial to me. In particular, why should we study the commutative closure of languages defined by transformers? While it makes sense in TCS, I fail to grasp its importance in this context.
1. The study of whether circuit languages can be accepted by existing transformer encoders is interesting and impressive.
2. The paper offers comprehensive theoretical justification to demonstrate and validate its findings.
1. The structure of this paper is very messy and hard to follow. Take the Introduction as an instance:
1.a In Section 1 (Introduction), the paper claims that "the expressive power of transformer encoders has not been fully elucidated to date." I am curious about that. What do you mean when you say it has not been fully elucidated?
1.b I am very confused about the challenges of studying circuit languages. I could not find any information discussing the existing challenges and relate…
Justifying the expressiveness of Transformers through the lens of formal languages is a very important topic and could lead to a better understanding of the working mechanisms of Transformers. This paper strengthens prior theoretical results and better bounds the expressiveness of UHAT and AHAT.
Although the theoretical results themselves sound interesting, I found that some definitions and assumptions are not appropriately stated, which potentially leads to incorrect results. While it is possible that the theoretical results still hold after fixing all the problems, I think the paper needs a major revision to ensure its validity.
Missing important restrictions when defining the transformer model:
- Precision of the numbers processed by the transformer. The paper does not include any restr…
