Characterizing the Expressivity of Local Attention in Transformers

Jiaoda Li; Ryan Cotterell

arXiv:2605.00768·cs.CL·May 20, 2026

TL;DR

This paper gives a formal analysis of local attention in transformers. It shows that local attention expands the model's expressivity and that combining it with global attention improves language modeling performance.

Contribution

The paper introduces a formal framework that links local attention to recognizer expressivity, and it shows that local and global attention are complementary, which benefits language modeling.

Findings

01. Local attention introduces a second temporal operator, enlarging expressivity.

02. Hybrid global-local transformers outperform global-only models.

03. Experiments confirm theoretical predictions with improved language modeling results.

Abstract

The transformer is the most popular neural architecture for language modeling. The cornerstone of the transformer is its global attention mechanism, which lets the model aggregate information from all preceding tokens before generating the next token. One common variant of attention is called local attention, which restricts each token to aggregating information from a bounded window of predecessors, reducing the quadratic cost of global attention to linear. Although this restriction is usually motivated by efficiency, it has also been found to improve model quality, a phenomenon that has so far lacked a satisfactory explanation. We provide a formal account of this phenomenon in terms of recognizer expressivity. It has been shown that fixed-precision transformers with global attention correspond to a fragment of linear temporal logic containing a single past operator. We additionally…
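
The difference between the two attention patterns described in the abstract can be illustrated with boolean masks. The following is a minimal sketch, not the authors' implementation: the function names, the `window` parameter, and the choice to let each position attend to itself are all assumptions made for illustration. It counts the attended (query, key) pairs to show why a bounded window turns quadratic cost into linear cost.

```python
def global_causal_mask(n):
    """mask[i][j] is True when position i may attend to position j.

    Global causal attention: every position up to and including i.
    (Including position i itself is an assumption of this sketch.)
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def local_causal_mask(n, window):
    """Local causal attention: only the `window` most recent positions.

    `window` is a hypothetical parameter name; the window here counts
    position i itself, which is an assumption of this sketch.
    """
    return [[i - window < j <= i for j in range(n)] for i in range(n)]


def attended_pairs(mask):
    """Number of (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


n, window = 8, 3
global_pairs = attended_pairs(global_causal_mask(n))        # n * (n + 1) / 2
local_pairs = attended_pairs(local_causal_mask(n, window))  # at most n * window

print("global:", global_pairs)
print("local: ", local_pairs)
```

With the sequence length `n` growing and `window` held fixed, the global count grows as n(n + 1)/2 while the local count is bounded by `n * window`, which is the quadratic-to-linear reduction the abstract refers to.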
