On the Existence of Universal Simulators of Attention

Debanjan Dutta; Anish Chakrabarty; Faizanuddin Ansari; Swagatam Das

arXiv:2506.18739·cs.LG·April 23, 2026

TL;DR

This paper investigates whether transformer encoders can universally simulate attention mechanisms and elementary operations, providing a formal, data-agnostic construction of such a universal simulator.

Contribution

It introduces a universal transformer-based simulator that can replicate attention outputs and elementary matrix operations without training, bridging learnability and expressivity.

Findings

01. Constructed a universal transformer encoder simulator for attention mechanisms.

02. Provided an algorithmic, data-agnostic solution previously only approximated by learning.

03. Demonstrated the theoretical existence of a universal transformer simulator for attention.

Abstract

Previous work on the learnability of transformers, focused primarily on examining their ability to approximate specific algorithmic patterns through training, has largely been data-driven, offering only probabilistic guarantees rather than deterministic solutions. Expressivity, by contrast, has been studied to characterize theoretically the problems computable by such architectures. These results proved the Turing-completeness of transformers and investigated bounds based on circuit complexity and formal logic. At the crossroads between learnability and expressivity, the question remains: can a transformer, as a computational model, simulate an arbitrary attention mechanism, or in particular, the underlying operations? In this study, we investigate the transformer encoder's ability to simulate a vanilla attention mechanism. By constructing a…
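For orientation, here is a minimal sketch of the operation being simulated. It assumes "vanilla attention" means the standard scaled dot-product form softmax(QK^T / sqrt(d_k))V, which the truncated abstract does not spell out. It is not the authors' simulator construction, and the function names (`softmax`, `matmul`, `transpose`, `attention`) are illustrative only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def attention(q, k, v):
    """Assumed vanilla attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    # Scale each score row, then normalise it into attention weights.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


# A zero query scores every key equally, so the output averages the values.
q = [[0.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
out = attention(q, k, v)
```

The elementary operations visible here (matrix product, transposition, row-wise softmax) are the kind of "underlying operations" the abstract's question refers to, under the assumption stated above.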
