Garbage Attention in Large Language Models: BOS Sink Heads and Sink-aware Pruning
Jaewon Sok, Jewon Yeom, Seonghyeon Park, Jeongjae Park, Taesup Kim

TL;DR
This paper identifies the BOS sink phenomenon, in which attention heads place most of their attention on the beginning-of-sequence (BOS) token, as a key source of redundancy in large language models. It proposes a pruning method that removes high-BOS sink heads, compressing the model while largely preserving performance.
Contribution
The paper introduces the BOS sink concept to explain layer-wise redundancy and builds a pruning strategy on it that outperforms traditional weight- or activation-based methods.
Findings
High-BOS sink heads are highly redundant and contribute little to performance.
Pruning high-BOS sink heads preserves accuracy even with aggressive compression.
Sink head behavior remains stable across different sequence lengths.
Abstract
Large Language Models (LLMs) are known to contain significant redundancy, yet a systematic explanation for why certain components, particularly in higher layers, are more redundant has remained elusive. In this work, we identify the BOS sink phenomenon as a key mechanism driving this layer-wise sensitivity. We show that attention heads with high BOS sink scores are strongly associated with functional redundancy: such heads, especially in deeper layers, contribute little to predictive performance and effectively serve as "dumping grounds" for superfluous attention weights. This provides a concrete functional explanation for the structural redundancy reported in prior studies. Leveraging this insight, we introduce a simple pruning strategy that removes high-BOS sink heads. Experiments on Gemma-3, Llama-3.1, and Qwen3 demonstrate that this approach identifies redundant transformer…
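The abstract does not spell out how a BOS sink score is computed, so the following is only a minimal sketch under stated assumptions: the score of a head is taken to be the average attention weight that queries place on key position 0 (the BOS token), and heads are ranked by that score to choose pruning candidates. The function names, the toy attention maps, and the (layer, head) labels are all hypothetical, and the paper's actual definition and selection rule may differ. The sketch only selects heads; it does not remove them from a model.

```python
import math


def causal_softmax_attention(scores):
    """Row-wise softmax over a square score matrix with a causal mask."""
    n = len(scores)
    attn = []
    for i in range(n):
        row = scores[i][: i + 1]  # query i may only attend to keys 0..i
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        attn.append([e / z for e in exps] + [0.0] * (n - i - 1))
    return attn


def bos_sink_score(attn):
    """Assumed definition: mean attention weight on key 0 (BOS).

    Query 0 is skipped because under a causal mask it can only attend
    to itself, which would inflate every head's score equally.
    """
    n = len(attn)
    return sum(attn[i][0] for i in range(1, n)) / (n - 1)


def select_heads_to_prune(sink_scores, num_prune):
    """Return the num_prune head labels with the highest sink score."""
    ranked = sorted(sink_scores, key=sink_scores.get, reverse=True)
    return ranked[:num_prune]


def make_toy_head(seq_len, bos_bias):
    """Toy attention map: all logits are 0 except a bias on the BOS column."""
    scores = [[0.0] * seq_len for _ in range(seq_len)]
    for row in scores:
        row[0] = bos_bias
    return causal_softmax_attention(scores)


# Hypothetical (layer, head) labels mapped to a BOS logit bias.
# A larger bias imitates a head that dumps more attention on BOS.
biases = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 8.0, (1, 1): 6.0}
sink_scores = {
    head: bos_sink_score(make_toy_head(16, bias))
    for head, bias in biases.items()
}

pruned = select_heads_to_prune(sink_scores, num_prune=2)
print(pruned)
```

In this toy setup the two heads with the largest BOS bias receive the highest scores and are selected. With a real model, the attention maps would come from forward passes over calibration text rather than from `make_toy_head`.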
Taxonomy
Topics: Machine Learning in Materials Science · Artificial Intelligence in Healthcare and Education · Topic Modeling
