Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures

Charles O'Neill

arXiv:2501.02931·cs.LG·January 15, 2025

TL;DR

This paper introduces a category-theoretic framework for understanding self-attention in transformers, modeling layers as endofunctors and monads, and analyzing positional encodings and equivariance properties.

Contribution

The paper provides a novel categorical model of self-attention, representing layers as endofunctors and monads, and clarifies the algebraic and geometric structures underlying transformer architectures.

Findings

01

Stacking self-attention layers corresponds to constructing the free monad on the induced endofunctor.

02

Positional encodings relate to monoid actions and universal properties.

03

Linear self-attention exhibits permutation equivariance.
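
The permutation-equivariance finding can be checked numerically. The sketch below is not taken from the paper: it assumes the softmax-free form $(XW_Q)(XW_K)^\top(XW_V)$ for "linear self-attention", and the helper names (`matmul`, `linear_attention`, `permute_rows`) are hypothetical. Integer entries are used so the comparison is exact rather than approximate.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def linear_attention(x, w_q, w_k, w_v):
    """Assumed softmax-free attention: (X W_q)(X W_k)^T (X W_v)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    return matmul(matmul(q, transpose(k)), v)


def permute_rows(a, perm):
    """Reorder the rows (tokens) of a matrix: row i of the result is a[perm[i]]."""
    return [a[i] for i in perm]


def random_matrix(rows, cols, rng):
    """Small integer matrix, so all arithmetic below is exact."""
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_tokens, d_model = 5, 4
x = random_matrix(n_tokens, d_model, rng)
w_q, w_k, w_v = (random_matrix(d_model, d_model, rng) for _ in range(3))

perm = list(range(n_tokens))
rng.shuffle(perm)

# Equivariance: permuting the tokens first gives the same result as
# permuting the output rows afterwards.
lhs = linear_attention(permute_rows(x, perm), w_q, w_k, w_v)
rhs = permute_rows(linear_attention(x, w_q, w_k, w_v), perm)
equivariant = lhs == rhs
print(equivariant)
```

The check works because a permutation matrix $P$ satisfies $P^\top P = I$, so $(PXW_Q)(PXW_K)^\top(PXW_V) = P\,(XW_Q)(XW_K)^\top(XW_V)$. A non-additive positional encoding applied to `x` before the call would generally break this equality.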

Abstract

Self-attention mechanisms have revolutionised deep learning architectures, yet their core mathematical structures remain incompletely understood. In this work, we develop a category-theoretic framework focusing on the linear components of self-attention. Specifically, we show that the query, key, and value maps naturally define a parametric 1-morphism in the 2-category $\mathbf{Para}(\mathbf{Vect})$. On the underlying 1-category $\mathbf{Vect}$, these maps induce an endofunctor whose iterated composition precisely models multi-layer attention. We further prove that stacking multiple self-attention layers corresponds to constructing the free monad on this endofunctor. For positional encodings, we demonstrate that strictly additive embeddings correspond to monoid actions in an affine sense, while standard sinusoidal encodings, though not additive, retain a universal property among injective…

Taxonomy

Topics: Cognitive Science and Education Research · Design Education and Practice

Methods: Attention Is All You Need · Softmax