The Power of Hard Attention Transformers on Data Sequences: A Formal Language Theoretic Perspective
Pascal Bergsträßer, Chris Köcher, Anthony Widjaja Lin, Georg Zetzsche

TL;DR
This paper studies the expressive power of hard attention transformers on data sequences and shows that, in contrast to the string setting, this power increases: they can recognize non-regular properties.
Contribution
It initiates the formal study of transformer encoders on data sequences (i.e. sequences of tuples of numbers), showing that they can capture non-regular properties and languages definable in an extension of linear temporal logic.
Findings
UHAT over data sequences goes beyond the circuit complexity class $AC^0$
UHAT can recognize non-regular properties
UHAT captures languages in extended linear temporal logic
Abstract
Formal language theory has recently been successfully employed to unravel the power of transformer encoders. This setting is primarily applicable in Natural Language Processing (NLP), as a token embedding function (where a bounded number of tokens is admitted) is first applied before feeding the input to the transformer. On certain kinds of data (e.g. time series), we want our transformers to be able to handle arbitrary input sequences of numbers (or tuples thereof) without a priori limiting the values of these numbers. In this paper, we initiate the study of the expressive power of transformer encoders on sequences of data (i.e. tuples of numbers). Our results indicate an increase in expressive power of hard attention transformers over data sequences, in stark contrast to the case of strings. In particular, we prove that Unique Hard Attention Transformers (UHAT) over inputs as data…
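To make the notion of unique hard attention over a data sequence concrete, here is a minimal sketch in Python. It is not the paper's formal definition: the function name `unique_hard_attention`, the choice of leftmost tie-breaking, and the example score functions are assumptions made for illustration. The point it shows is that the inputs are raw tuples of numbers with no bounded token vocabulary, and that each position attends to exactly one position, the one with the highest score.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def unique_hard_attention(
    xs: Sequence[Vector],
    score: Callable[[Vector, Vector], float],
) -> List[int]:
    """For each position i, return the index j that maximizes score(xs[i], xs[j]).

    Ties are broken in favor of the leftmost position (an assumption of this
    sketch), so every position attends to exactly one position.
    """
    attended = []
    for query in xs:
        best_j = 0
        best_score = score(query, xs[0])
        for j in range(1, len(xs)):
            s = score(query, xs[j])
            if s > best_score:  # strict comparison keeps the leftmost maximum
                best_score, best_j = s, j
        attended.append(best_j)
    return attended


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product, used here as a hypothetical attention score."""
    return sum(a * b for a, b in zip(u, v))


# A data sequence: arbitrary numbers, each paired with a constant 1.0.
seq = [(3.5, 1.0), (-2.0, 1.0), (7.25, 1.0), (7.25, 1.0)]

# Score depends only on the key's first coordinate: every position
# attends to the leftmost occurrence of the maximum value.
to_max = unique_hard_attention(seq, lambda q, k: k[0])

# Score is the dot product of query and key.
by_dot = unique_hard_attention(seq, dot)
```

In this sketch `to_max` selects the leftmost maximum of the sequence from every position, an operation that depends on comparing unbounded numeric values rather than on a finite alphabet.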
Taxonomy
Topics: Neural Networks and Applications · Topic Modeling · Algorithms and Data Compression
