Attention Sinks: A 'Catch, Tag, Release' Mechanism for Embeddings
Stephen Zhang, Mustafa Khan, Vardan Papyan

TL;DR
This paper investigates attention sinks in large language models, showing that they carry semantic information and proposing a 'catch, tag, release' mechanism that explains what sinks do and how they affect model performance.
Contribution
It proposes 'catch, tag, release' as a mechanistic account of attention sinks and shows that sinks encode semantic information and matter for model reasoning and compression.
Findings
Attention sinks carry meaningful semantic information.
The mechanism explains variance in embeddings across models.
Sinks persist even with query-key normalization.
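Query-key normalization rescales each query and key to a fixed norm before the dot product, so every attention logit is bounded in magnitude by the logit scale. The sketch below illustrates why that bound alone need not eliminate a sink. It assumes unit-norm queries and keys and a single scalar scale `s`; real implementations typically apply LayerNorm or RMSNorm with learned gains, and the helper names here are hypothetical. It is an illustration only, not the paper's experiment: once `s` is moderately large, a key aligned with the query can still absorb almost all of the attention.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def qk_norm_attention(query, keys, scale):
    """Hypothetical QK-normalized attention: logits are scale * cosine(query, key)."""
    q = l2_normalize(query)
    logits = [scale * sum(a * b for a, b in zip(q, l2_normalize(k))) for k in keys]
    return softmax(logits)


# Toy 3-d setup: key 0 (the "sink") points along the query, the other
# eight keys are orthogonal to it. Raw magnitudes are discarded by normalization.
query = [5.0, 0.0, 0.0]
keys = [[0.1, 0.0, 0.0]] + [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]] * 4

for scale in (1.0, 4.0, 8.0):
    weights = qk_norm_attention(query, keys, scale)
    print(f"scale={scale}: attention on sink = {weights[0]:.3f}")
```

With one aligned key among nine, the sink's weight is e^s / (e^s + 8), so concentration is governed by the scale rather than by the raw query and key norms that normalization removes.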
Abstract
Large language models (LLMs) often concentrate their attention on a few specific tokens referred to as attention sinks. Common examples include the first token, which is a prompt-independent sink, and punctuation tokens, which are prompt-dependent. While the tokens causing the sinks often lack direct semantic meaning, the presence of the sinks is critical for model performance, particularly under model compression and KV-caching. Despite their ubiquity, the function, semantic role, and origin of attention sinks, especially those beyond the first token, remain poorly understood. In this work, we conduct a comprehensive investigation demonstrating that attention sinks catch a sequence of tokens, tag them using a common direction in embedding space, and release them back into the residual stream, where tokens are later retrieved based on the tags they have acquired. Probing experiments reveal…
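The catch, tag, release description can be made concrete with a toy single-head example. Everything in the sketch below is hypothetical: a 4-dimensional residual stream, position 0 acting as the sink, a hand-picked set `CAUGHT` of query positions that give the sink a large logit (`SINK_LOGIT`), and a sink value vector equal to a dedicated `TAG` direction. It is not the paper's model or code; it only shows how attending to a shared sink can write a common direction into several tokens, and how a later query along that direction retrieves those tokens. The `sink_score` line uses the total attention mass each key position receives, one simple way to flag a sink.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


DIM = 4
TAG = [0.0, 0.0, 0.0, 1.0]  # hypothetical shared "tag" direction
SINK_LOGIT = 6.0            # hypothetical extra logit a catching query gives the sink
CAUGHT = {1, 2}             # hypothetical query positions the sink catches

# Position 0 is the sink (e.g. the first token); positions 1-4 are content tokens.
# No token starts with any component along TAG.
residual = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
n = len(residual)

# Value vectors: the sink writes TAG, content tokens write their own embedding.
values = [TAG] + residual[1:]

# CATCH: causal attention. Caught queries give the sink a large logit;
# every other query attends uniformly over its visible prefix.
attn = []
for i in range(n):
    logits = [SINK_LOGIT if (j == 0 and i in CAUGHT) else 0.0 for j in range(i + 1)]
    attn.append(softmax(logits))

# Total attention mass each key position receives; the sink stands out.
sink_score = [sum(attn[i][j] for i in range(j, n)) for j in range(n)]

# TAG + RELEASE: add the head output back into the residual stream.
released = []
for i in range(n):
    out = [0.0] * DIM
    for j, a in enumerate(attn[i]):
        out = [o + a * v for o, v in zip(out, values[j])]
    released.append([r + o for r, o in zip(residual[i], out)])

# RETRIEVE: a later query along TAG scores content positions (1-4) by their tag.
RETRIEVE_SCALE = 8.0  # hypothetical query magnitude
query = [RETRIEVE_SCALE * t for t in TAG]
content = list(range(1, n))
retrieval = softmax([dot(query, released[j]) for j in content])

tags = [dot(x, TAG) for x in released]
print("sink score per position:", [round(s, 3) for s in sink_score])
print("tag component per position:", [round(t, 3) for t in tags])
print("retrieval weights over positions 1-4:", [round(w, 3) for w in retrieval])
```

Raising `SINK_LOGIT` pushes the caught tokens' tag components toward 1. Tokens that attend uniformly still pick up a small share of the tag (equal to their attention weight on the sink), so in this toy retrieval is sharp but not perfectly exclusive.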
Taxonomy
Topics: Cognitive Science and Mapping · Computational and Text Analysis Methods · Software Engineering Research
Methods: Softmax · Attention Is All You Need · Attention Sinks
