The Power of Hard Attention Transformers on Data Sequences: A Formal Language Theoretic Perspective

Pascal Bergsträßer; Chris Köcher; Anthony Widjaja Lin; Georg Zetzsche

arXiv:2405.16166 · cs.FL · November 13, 2024

TL;DR

This paper shows that hard attention transformers are more expressive on data sequences than on strings: over data sequences they can recognize non-regular properties.

Contribution

The paper initiates the formal study of the expressive power of transformer encoders on data sequences (i.e. tuples of numbers), showing that they can capture non-regular properties and languages in an extension of linear temporal logic.

Findings

01 UHAT over data sequences is not contained in the circuit complexity class $AC^0$

02 UHAT over data sequences can recognize non-regular properties

03 UHAT captures languages in an extension of linear temporal logic

Abstract

Formal language theory has recently been successfully employed to unravel the power of transformer encoders. This setting is primarily applicable in Natural Language Processing (NLP), as a token embedding function (where a bounded number of tokens is admitted) is first applied before feeding the input to the transformer. On certain kinds of data (e.g. time series), we want our transformers to be able to handle arbitrary input sequences of numbers (or tuples thereof) without a priori limiting the values of these numbers. In this paper, we initiate the study of the expressive power of transformer encoders on sequences of data (i.e. tuples of numbers). Our results indicate an increase in expressive power of hard attention transformers over data sequences, in stark contrast to the case of strings. In particular, we prove that Unique Hard Attention Transformers (UHAT) over inputs as data…

Videos

The Power of Hard Attention Transformers on Data Sequences: A Formal Language Theoretic Perspective · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · Topic Modeling · Algorithms and Data Compression