What are you sinking? A geometric approach on attention sink
Valeria Ruscio, Umberto Nanni, Fabrizio Silvestri

TL;DR
This paper argues that attention sink patterns in transformers arise from a geometric principle, the establishment of reference frames that anchor representational spaces, with implications for architecture design and for how attention mechanisms are understood.
Contribution
It introduces a geometric perspective showing attention sink as a manifestation of reference frame establishment in transformer representations.
Findings
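Attention sink correlates with three reference frame types: centralized, distributed, and bidirectional.
Reference frames emerge in the earliest stages of training as optimal solutions to the problem of establishing stable coordinate systems in high-dimensional spaces.
Architecture components, particularly the position encoding implementation, influence which type of reference frame emerges.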
Abstract
Attention sink (AS) is a consistent pattern in transformer attention maps where certain tokens (often special tokens or positional anchors) disproportionately attract attention from other tokens. We show that in transformers, AS is not an architectural artifact, but it is the manifestation of a fundamental geometric principle: the establishment of reference frames that anchor representational spaces. We analyze several architectures and identify three distinct reference frame types, centralized, distributed, and bidirectional, that correlate with the attention sink phenomenon. We show that they emerge during the earliest stages of training as optimal solutions to the problem of establishing stable coordinate systems in high-dimensional spaces. We show the influence of architecture components, particularly position encoding implementations, on the specific type of reference frame. This…
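The abstract's definition of an attention sink (a token that disproportionately attracts attention from other tokens) can be made concrete with a small numerical sketch. The code below is not the authors' method: it builds a toy causal attention map in NumPy and measures how much attention each key position receives. The function names, the `sink_bias` parameter, and the choice of token 0 as the favored position are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def toy_causal_attention(n=8, sink_bias=4.0, seed=0):
    # Hypothetical toy map: bounded random scores plus a bias toward token 0.
    rng = np.random.default_rng(seed)
    scores = rng.uniform(-1.0, 1.0, size=(n, n))
    scores[:, 0] += sink_bias
    # Causal mask: query i may only attend to keys j <= i.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores[future] = -np.inf
    return softmax(scores, axis=-1)

def received_attention(attn):
    # Mean attention each key receives, averaged over the queries that can see it.
    # Under a causal mask, key j is visible to queries j..n-1, i.e. n - j queries.
    n = attn.shape[0]
    visible = n - np.arange(n)
    return attn.sum(axis=0) / visible

attn = toy_causal_attention()
mass = received_attention(attn)
sink = int(np.argmax(mass))
print("received attention per key:", np.round(mass, 3))
print("candidate sink position:", sink)
```

In a real model the attention matrix would come from a specific layer and head rather than from a synthetic bias, and a single dominant column is only the simplest case; the distributed and bidirectional frames named in the abstract would presumably show up as attention mass spread over several positions.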
Taxonomy
Topics: Visual Attention and Saliency Detection · Neural and Behavioral Psychology Studies · Data Visualization and Analytics
