Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures
Charles O'Neill

TL;DR
This paper introduces a category-theoretic framework for understanding self-attention in transformers, modeling layers as endofunctors and monads, and analyzing positional encodings and equivariance properties.
Contribution
It provides a novel categorical model of self-attention, representing layers as endofunctors and monads, and clarifies the algebraic and geometric structures underlying transformer architectures.
Findings
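Stacking self-attention layers corresponds to constructing the free monad on the endofunctor induced by the query, key, and value maps.
Strictly additive positional encodings correspond to monoid actions in an affine sense; sinusoidal encodings, though not additive, retain a universal property.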
Linear self-attention exhibits permutation equivariance.
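The permutation-equivariance finding can be checked numerically on a small example. The sketch below is illustrative only and is not taken from the paper: it assumes "linear self-attention" means attention with the softmax dropped, so the output is (Q Kᵀ) V, and the names `linear_attention`, `permute_rows`, and the integer weight matrices are hypothetical. Integer entries keep the comparison exact.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def permute_rows(x: Matrix, perm: List[int]) -> Matrix:
    """Reorder the token rows of x according to perm."""
    return [x[p] for p in perm]


def linear_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Softmax-free attention (an assumption of this sketch): (Q K^T) V."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scores = matmul(q, transpose(k))  # n x n token-token scores
    return matmul(scores, v)


# Three tokens with two features each; all values are made up for illustration.
X = [[1, 0], [2, 1], [0, 3]]
WQ = [[1, 2], [0, 1]]
WK = [[2, 0], [1, 1]]
WV = [[1, 1], [0, 2]]
perm = [2, 0, 1]

# Equivariance: permuting the tokens first gives the same result
# as permuting the output rows afterwards.
permute_then_attend = linear_attention(permute_rows(X, perm), WQ, WK, WV)
attend_then_permute = permute_rows(linear_attention(X, WQ, WK, WV), perm)
print(permute_then_attend == attend_then_permute)
```

The equality holds because a permutation matrix P satisfies PᵀP = I, so (PQ)(PK)ᵀ(PV) = P(QKᵀV). The same argument applies to each of the query, key, and value maps on its own, since they act on every token independently.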
Abstract
Self-attention mechanisms have revolutionised deep learning architectures, yet their core mathematical structures remain incompletely understood. In this work, we develop a category-theoretic framework focusing on the linear components of self-attention. Specifically, we show that the query, key, and value maps naturally define a parametric 1-morphism in the 2-category Para(Vect). On the underlying 1-category Vect, these maps induce an endofunctor whose iterated composition precisely models multi-layer attention. We further prove that stacking multiple self-attention layers corresponds to constructing the free monad on this endofunctor. For positional encodings, we demonstrate that strictly additive embeddings correspond to monoid actions in an affine sense, while standard sinusoidal encodings, though not additive, retain a universal property among injective…
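The claim about strictly additive positional embeddings can be illustrated with a toy example. This is a sketch under stated assumptions, not the paper's construction: positions are taken to be the monoid of non-negative integers under addition, the additive embedding is a hypothetical `additive_pos(n) = n * DELTA` for a made-up step vector `DELTA`, and the action on a token vector is translation. The sinusoidal comparison uses a single (sin, cos) pair purely to show that it is not additive.

```python
import math
from typing import List

# Hypothetical step vector; any fixed integer vector would do.
DELTA = [1, -2, 3]


def additive_pos(n: int) -> List[int]:
    """Strictly additive embedding: additive_pos(m + n) = additive_pos(m) + additive_pos(n)."""
    return [n * d for d in DELTA]


def act(n: int, x: List[int]) -> List[int]:
    """Position n acts on a token vector x by translation (an affine map)."""
    return [xi + pi for xi, pi in zip(x, additive_pos(n))]


def sinusoidal_pos(n: int) -> List[float]:
    """A one-frequency sinusoidal embedding, used here only for comparison."""
    return [math.sin(n), math.cos(n)]


x = [5, 0, -1]

# Monoid-action laws: the identity position acts trivially,
# and acting by m + n equals acting by n and then by m.
identity_law = act(0, x) == x
composition_law = act(2 + 3, x) == act(2, act(3, x))

# The sinusoidal embedding fails additivity: pos(1) + pos(1) differs from pos(2).
s1, s2 = sinusoidal_pos(1), sinusoidal_pos(2)
sinusoidal_is_additive = all(math.isclose(a + a, b) for a, b in zip(s1, s2))

print(identity_law, composition_law, sinusoidal_is_additive)
```

Both action laws reduce to the additivity of `additive_pos`, which is why the translation action is well defined only for strictly additive embeddings.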
Taxonomy
Topics: Cognitive Science and Education Research · Design Education and Practice
Methods: Attention Is All You Need · Softmax
