On the Existence of Universal Simulators of Attention
Debanjan Dutta, Anish Chakrabarty, Faizanuddin Ansari, Swagatam Das

TL;DR
This paper investigates whether transformer encoders can universally simulate attention mechanisms and elementary operations, providing a formal, data-agnostic construction of such a universal simulator.
Contribution
It introduces a universal transformer-based simulator that can replicate attention outputs and elementary matrix operations without training, bridging learnability and expressivity.
Findings
Constructed a universal transformer encoder simulator for attention mechanisms.
Provided an algorithmic, data-agnostic construction for simulating attention, a task that earlier work had only approximated through training.
Demonstrated the theoretical existence of a universal transformer simulator for attention.
Abstract
Previous work on the learnability of transformers, focused primarily on examining their ability to approximate specific algorithmic patterns through training, has largely been data-driven, offering only probabilistic guarantees rather than deterministic solutions. Expressivity research, by contrast, addresses theoretically which problems are computable by such architectures. These results proved the Turing-completeness of transformers and investigated bounds based on circuit complexity and formal logic. At the crossroads between learnability and expressivity, the question remains: can a transformer, as a computational model, simulate an arbitrary attention mechanism, or in particular, the underlying operations? In this study, we investigate the transformer encoder's ability to simulate a vanilla attention mechanism. By constructing a…
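For orientation, here is a minimal sketch of the operation the abstract asks a transformer to simulate. It assumes "vanilla attention" means standard scaled dot-product softmax attention, which the truncated abstract does not spell out. The helper names (`softmax`, `matmul`, `transpose`, `attention`) are illustrative only, and the sketch does not reproduce the paper's simulator construction.

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(a):
    # Swap rows and columns of a list-of-lists matrix.
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # Plain matrix product; a is (n x m), b is (m x p).
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def attention(q, k, v):
    # Assumed form: softmax(Q K^T / sqrt(d_k)) V, applied row-wise.
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

# A zero query scores every key equally, so the output averages the value rows.
out = attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 4.0]])
```

The elementary operations involved here (matrix product, transposition, row-wise softmax) are the kind of "underlying operations" the abstract refers to. How the paper realizes them inside an encoder is not shown in this excerpt.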
