Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Zunhai Su; Hengyuan Zhang; Wei Wu; Yifan Zhang; Yaxiu Liu; He Xiao; Qingyao Yang; Yuxuan Sun; Rui Yang; Chao Zhang; Keyu Fan; Weihao Ye; Jing Xiong; Hui Shen; Chaofan Tao; Taiqiang Wu; Zhongwei Wan; Yulei Qian; Yuchen Xie; Ngai Wong

arXiv:2604.10098 · cs.LG · April 14, 2026

TL;DR

This survey reviews Attention Sink in Transformers, covering how it is utilized, interpreted, and mitigated, with the aim of improving model interpretability and performance.

Contribution

It is the first systematic survey that consolidates research on Attention Sink, clarifies key concepts, and guides future research directions.

Findings

01 Identifies key dimensions of Attention Sink research

02 Provides a structured overview of utilization, interpretation, and mitigation

03 Offers guidance for managing Attention Sink in Transformers

Abstract

As the foundational architecture of modern machine learning, Transformers have driven remarkable progress across diverse AI domains. Despite their transformative impact, a persistent challenge across various Transformers is Attention Sink (AS), in which a disproportionate amount of attention is focused on a small subset of specific yet uninformative tokens. AS complicates interpretability, significantly affects training and inference dynamics, and exacerbates issues such as hallucinations. In recent years, substantial research has been dedicated to understanding and harnessing AS. However, a comprehensive survey that systematically consolidates AS-related research and offers guidance for future advancements remains lacking. To address this gap, we present the first survey on AS, structured around three key dimensions that define the current research landscape: Fundamental…
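
The definition in the abstract, a disproportionate share of attention landing on a few tokens, can be made concrete with a small numerical sketch. The code below is an illustration only and is not taken from the survey: the function names `causal_attention` and `received_attention`, the sequence length, and the bias of 4.0 added to the first token's scores are all assumptions chosen for the toy example. It measures how much attention each key position receives on average and compares a biased score matrix against an unbiased one.

```python
import numpy as np


def causal_attention(scores):
    """Row-wise softmax over a (T, T) score matrix with a causal mask."""
    T = scores.shape[0]
    visible = np.tril(np.ones((T, T), dtype=bool))  # query i sees keys 0..i
    masked = np.where(visible, scores, -np.inf)
    masked = masked - masked.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(masked)  # masked entries become exactly 0
    return weights / weights.sum(axis=-1, keepdims=True)


def received_attention(attn):
    """Mean attention each key gets, averaged over the queries that can see it."""
    T = attn.shape[0]
    n_queries = np.arange(T, 0, -1)  # key j is visible to T - j queries
    return attn.sum(axis=0) / n_queries


T = 6                                # assumed toy sequence length
baseline_scores = np.zeros((T, T))   # no token is preferred
sink_scores = baseline_scores.copy()
sink_scores[:, 0] += 4.0             # assumed bias toward the first token

baseline = received_attention(causal_attention(baseline_scores))
received = received_attention(causal_attention(sink_scores))

print("baseline: ", np.round(baseline, 3))
print("with sink:", np.round(received, 3))
```

Causal masking alone already tilts this statistic toward early positions, because early queries have few keys to choose from. That is why the sketch reports the unbiased baseline next to the biased case rather than reading the raw numbers in isolation.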

Code & Models

Repositories

ZunhaiSu/Awesome-Attention-Sink (GitHub)
