Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages

Andy Yang, David Chiang, and Dana Angluin

arXiv:2310.13897·cs.FL·October 31, 2024·1 cites

TL;DR

This paper characterizes the expressive power of masked hard-attention transformers, showing that with strict masking and no position embeddings they recognize exactly the star-free languages, and examines how position embeddings, strict masking, and depth affect that power.

Contribution

It gives exact characterizations of transformers with hard attention and attention masking, proving them expressively equivalent to linear temporal logic (LTL) via Boolean RASP as an intermediate language, and then transfers known LTL results to transformers.

Findings

01

Masked hard-attention transformers with strict masking and no position embeddings are expressively equivalent to linear temporal logic (LTL).

02

Position embeddings, strict masking, and greater depth each increase the expressive power of these transformers.

03

Because LTL defines exactly the star-free languages, masked hard-attention transformers with strict masking and no position embeddings recognize exactly the star-free languages.

Abstract

The expressive power of transformers over inputs of unbounded size can be studied through their ability to recognize classes of formal languages. In this paper, we establish exact characterizations of transformers with hard attention (in which all attention is focused on exactly one position) and attention masking (in which each position only attends to positions on one side). With strict masking (each position cannot attend to itself) and without position embeddings, these transformers are expressively equivalent to linear temporal logic (LTL), which defines exactly the star-free languages. A key technique is the use of Boolean RASP as a convenient intermediate language between transformers and LTL. We then take numerous results known for LTL and apply them to transformers, showing how position embeddings, strict masking, and depth all increase expressive power.
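
To make the attention mechanism described in the abstract concrete, the following is a minimal Python sketch of strictly masked hard attention: each position attends to exactly one earlier position (never itself), with ties broken toward the rightmost or leftmost match. It is then used to decide membership in (ab)*, a standard example of a star-free language. The function names `strict_past_hard_attention` and `recognizes_ab_star`, the default-value convention for positions with nothing to attend to, and the two-step construction are illustrative assumptions, not the paper's Boolean RASP syntax or its proof construction.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def strict_past_hard_attention(
    values: Sequence[T],
    score: Callable[[int, int], bool],
    default: T,
    rightmost: bool = True,
) -> List[T]:
    """Hypothetical helper: for each position i, attend to exactly one j < i.

    Only positions j with score(i, j) true are eligible. Ties are broken
    toward the rightmost (or leftmost) eligible j. If no j is eligible,
    position i receives `default` (an assumption of this sketch).
    """
    out: List[T] = []
    for i in range(len(values)):
        # Strict masking: range(i) excludes i itself.
        candidates = [j for j in range(i) if score(i, j)]
        if not candidates:
            out.append(default)
        else:
            j = candidates[-1] if rightmost else candidates[0]
            out.append(values[j])
    return out


def recognizes_ab_star(word: str) -> bool:
    """Decide membership in (ab)* using two strictly masked attention steps."""
    if not word:
        return True  # the empty word is in (ab)*
    symbols = list(word)
    # Step 1: every position reads the symbol immediately to its left
    # ("^" marks the start of the word when there is no earlier position).
    prev = strict_past_hard_attention(symbols, lambda i, j: True, default="^")
    # Position-wise check: 'a' must follow the start or a 'b'; 'b' must follow an 'a'.
    ok = [
        (c == "a" and p in ("^", "b")) or (c == "b" and p == "a")
        for c, p in zip(symbols, prev)
    ]
    # Step 2: every position asks whether some earlier position failed the check.
    violations = [not x for x in ok]
    bad_before = strict_past_hard_attention(
        violations, lambda i, j: violations[j], default=False
    )
    # Acceptance is read off at the last position.
    return ok[-1] and not bad_before[-1] and symbols[-1] == "b"
```

For example, `recognizes_ab_star("abab")` returns `True` and `recognizes_ab_star("aab")` returns `False`. No position information is used beyond the left-to-right order imposed by the mask, which mirrors the setting without position embeddings.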

Videos

Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages · SlidesLive

Taxonomy

Topics: Cellular Automata and Applications · Formal Methods in Verification · Semigroups and Automata Theory