Self-Attention as Distributional Projection: A Unified Interpretation of Transformer Architecture

Nihal Mehta

arXiv:2511.13780·cs.LG·November 19, 2025

Open Access

TL;DR

This paper gives a mathematical interpretation of self-attention in Transformers grounded in distributional semantics. It argues that self-attention arises from projecting corpus-level co-occurrence statistics into sequence context, which accounts for the architecture's design choices.

Contribution

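The paper introduces a unified projection-based framework for understanding self-attention: it connects self-attention to distributional semantics and derives the Transformer's components (the query-key-value mechanism, positional encodings, and multi-head attention) from a single projection principle.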

Findings

01 Self-attention can be derived by projecting corpus-level co-occurrence matrices into sequence context.

02 Positional encodings and multi-head attention are structured refinements of the projection principle.

03 The Transformer architecture's algebraic form follows from the distributional projection framework.

Abstract

This paper presents a mathematical interpretation of self-attention by connecting it to distributional semantics principles. We show that self-attention emerges from projecting corpus-level co-occurrence statistics into sequence context. Starting from the co-occurrence matrix underlying GloVe embeddings, we demonstrate how the projection naturally captures contextual influence, with the query-key-value mechanism arising as the natural asymmetric extension for modeling directional relationships. Positional encodings and multi-head attention then follow as structured refinements of this same projection principle. Our analysis demonstrates that the Transformer architecture's particular algebraic form follows from these projection principles rather than being an arbitrary design choice.
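
The projection idea in the abstract can be illustrated with a small toy sketch. This is not the paper's derivation: the vocabulary, the counts in `X`, the embeddings `E`, the matrices `W_q` and `W_k`, and all function names are hypothetical, and raw log counts stand in for learned GloVe vectors (GloVe fits `w_i · w~_j + b_i + b~_j ≈ log X_ij`). Under those assumptions, restricting the log co-occurrence matrix to the tokens of one sequence and applying a row-wise softmax gives attention-like weights, and separate query and key maps make the scores directional.

```python
import math

# Hypothetical toy vocabulary and symmetric co-occurrence counts (assumed data).
vocab = ["the", "cat", "sat", "mat"]
X = [
    [10.0, 6.0, 4.0, 5.0],
    [6.0, 8.0, 5.0, 2.0],
    [4.0, 5.0, 7.0, 3.0],
    [5.0, 2.0, 3.0, 9.0],
]

def project(cooc, token_ids):
    """Restrict corpus-level statistics to one sequence: S[i][j] = log X[t_i][t_j]."""
    return [[math.log(cooc[a][b]) for b in token_ids] for a in token_ids]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(A, B):
    """Plain nested-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Sequence "the cat sat" as vocabulary indices.
token_ids = [0, 1, 2]

# Step 1: project the corpus statistics into the sequence and normalize each row.
scores = project(X, token_ids)
weights = [softmax(row) for row in scores]  # weights[0] ≈ [0.5, 0.3, 0.2]

# Step 2: mix toy value vectors (one 2-d embedding per sequence position).
E = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
context = matmul(weights, E)

# Step 3: directional scores via separate query and key maps (assumed matrices).
W_q = [[1.0, 0.0], [0.0, 1.0]]
W_k = [[1.0, 1.0], [0.0, 1.0]]
Q = matmul(E, W_q)
K = matmul(E, W_k)
sym_scores = matmul(E, transpose(E))   # shared embeddings: symmetric scores
asym_scores = matmul(Q, transpose(K))  # separate Q and K: scores[i][j] != scores[j][i] in general
```

Because the softmax of log counts is just the row-normalized counts, each row of `weights` is the empirical co-occurrence distribution restricted to the sequence. The symmetric `sym_scores` cannot distinguish "i attends to j" from "j attends to i"; the separate `W_q` and `W_k` maps are what allow that distinction in this sketch.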

Taxonomy

Topics: Embodied and Extended Cognition · Philosophy and Theoretical Science · Ferroelectric and Negative Capacitance Devices