The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks
Shangwen Sun, Alfredo Canziani, Yann LeCun, Jiachen Zhu

TL;DR
This paper investigates two phenomena in Transformer language models, massive activations and attention sinks. It shows that their co-occurrence is largely an architectural artifact, that the two serve related but distinct functions, and it identifies the key design choices that drive their co-occurrence.
Contribution
The study systematically analyzes the relationship between massive activations and attention sinks, revealing their distinct roles and the architectural factors that cause their co-occurrence.
Findings
Massive activations act as implicit parameters across layers.
Attention sinks bias attention towards short-range dependencies.
The pre-norm configuration enables the co-occurrence of the two phenomena.
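To make the pre-norm finding more concrete, here is a minimal sketch in plain Python that contrasts where normalization sits in a residual block. It omits the learned gain, real attention, and real MLPs. The sublayer `outlier_writer` is a hypothetical stand-in that always writes a fixed value into channel 0. It is not the authors' experimental setup. Under pre-norm the residual stream is never normalized, so the outlier accumulates across blocks. Under post-norm, RMSNorm bounds every residual entry by √d. The sketch also shows a separate effect: a token carrying a massive entry is mapped by RMSNorm to nearly √d·e₀ whatever its other channels contain. This is one way to see how a massive activation could produce a near-constant sublayer input.

```python
import math

def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned gain: x / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def pre_norm_block(x, sublayer):
    # Pre-norm: the sublayer reads a normalized copy; the residual x keeps its raw scale.
    update = sublayer(rms_norm(x))
    return [xi + ui for xi, ui in zip(x, update)]

def post_norm_block(x, sublayer):
    # Post-norm: the residual sum itself is normalized after every block.
    update = sublayer(x)
    return rms_norm([xi + ui for xi, ui in zip(x, update)])

def outlier_writer(h):
    # Hypothetical sublayer: always writes +50 into channel 0 and nothing elsewhere.
    return [50.0] + [0.0] * (len(h) - 1)

d, n_blocks = 8, 20
pre = post = [0.1] * d  # both blocks return new lists, so the shared start is safe
for _ in range(n_blocks):
    pre = pre_norm_block(pre, outlier_writer)
    post = post_norm_block(post, outlier_writer)

print(round(pre[0], 3))                            # outlier grows with depth under pre-norm
print(max(abs(v) for v in post) <= math.sqrt(d))   # post-norm keeps every entry <= sqrt(d)

# Two tokens that share a massive channel but differ elsewhere (raw gaps up to 2.4).
tok_a = [1000.0, 0.5, -1.2, 0.3, 0.8, -0.4, 1.1, -0.7]
tok_b = [1000.0, -0.9, 0.6, 1.4, -0.2, 0.9, -1.3, 0.2]
na, nb = rms_norm(tok_a), rms_norm(tok_b)
print(max(abs(a - b) for a, b in zip(na, nb)))  # normalized inputs are nearly identical
```

The √d bound holds because a single coordinate can never exceed √d times the root mean square of the vector. With a learned gain, the bound scales by that gain. Post-norm can still produce large values inside a sublayer. The sketch only shows that post-norm keeps the residual stream itself bounded.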
Abstract
We study two recurring phenomena in Transformer language models: massive activations, in which a small number of tokens exhibit extreme outliers in a few channels, and attention sinks, in which certain tokens attract disproportionate attention mass regardless of semantic relevance. Prior work observes that these phenomena frequently co-occur and often involve the same tokens, but their functional roles and causal relationship remain unclear. Through systematic experiments, we show that the co-occurrence is largely an architectural artifact of modern Transformer design, and that the two phenomena serve related but distinct functions. Massive activations operate globally: they induce near-constant hidden representations that persist across layers, effectively functioning as implicit parameters of the model. Attention sinks operate locally: they modulate attention outputs across heads and…
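The two definitions in the abstract can be turned into simple diagnostics. The sketch below uses plain Python on toy data. It flags massive activations with an assumed threshold rule: a magnitude above 100 that is also more than 1000× the layer's median absolute value. These thresholds are illustrative and are not taken from this paper. It measures an attention sink as the average attention mass that queries place on a single key position. All function names are hypothetical.

```python
import math
import statistics

def find_massive_activations(hidden, abs_threshold=100.0, median_ratio=1000.0):
    """Return (token, channel, value) triples with extreme magnitude.

    hidden: seq_len x d_model list of lists from one layer.
    Assumed heuristic: |value| > abs_threshold and > median_ratio * median |value|.
    """
    median = statistics.median(abs(v) for row in hidden for v in row)
    return [
        (t, c, v)
        for t, row in enumerate(hidden)
        for c, v in enumerate(row)
        if abs(v) > abs_threshold and abs(v) > median_ratio * median
    ]

def causal_softmax_attention(scores):
    """Row-wise softmax with a causal mask: query i attends to keys 0..i."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        m = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        z = sum(exps)
        weights.append([e / z for e in exps] + [0.0] * (len(row) - i - 1))
    return weights

def sink_mass(weights, key_pos=0):
    """Average attention mass that all queries place on one key position."""
    return sum(row[key_pos] for row in weights) / len(weights)

# Toy layer: token 0 carries one extreme channel.
hidden = [[0.5, -1.0, 2000.0, 0.8]] + [[0.6, -0.9, 1.1, -0.7] for _ in range(5)]
print(find_massive_activations(hidden))  # [(0, 2, 2000.0)]

seq_len = 6
sink_scores = [[5.0 if j == 0 else 0.0 for j in range(seq_len)] for _ in range(seq_len)]
flat_scores = [[0.0] * seq_len for _ in range(seq_len)]
print(sink_mass(causal_softmax_attention(sink_scores)))  # most mass lands on key 0
print(sink_mass(causal_softmax_attention(flat_scores)))  # content-free baseline
```

The uniform baseline matters. Early queries see only a few keys, so even content-free causal attention puts roughly 0.41 of its mass on position 0 in this toy example. A sink is therefore better judged relative to that baseline than against zero.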
