Garbage Attention in Large Language Models: BOS Sink Heads and Sink-aware Pruning

Jaewon Sok; Jewon Yeom; Seonghyeon Park; Jeongjae Park; Taesup Kim

arXiv:2601.06787·cs.CL·January 13, 2026

TL;DR

This paper identifies the BOS sink phenomenon as a key source of redundancy in large language models and proposes a pruning method that removes high-BOS sink heads, compressing the model while preserving performance.

Contribution

The paper introduces the BOS sink concept to explain layer-wise redundancy and develops a pruning strategy based on this insight that outperforms traditional weight- or activation-based pruning methods.

Findings

01

High-BOS sink heads are highly redundant and contribute little to performance.

02

Pruning high-BOS sink heads preserves accuracy even with aggressive compression.

03

Sink head behavior remains stable across different sequence lengths.

Abstract

Large Language Models (LLMs) are known to contain significant redundancy, yet a systematic explanation for why certain components, particularly in higher layers, are more redundant has remained elusive. In this work, we identify the BOS sink phenomenon as a key mechanism driving this layer-wise sensitivity. We show that attention heads with high BOS sink scores are strongly associated with functional redundancy: such heads, especially in deeper layers, contribute little to predictive performance and effectively serve as "dumping grounds" for superfluous attention weights. This provides a concrete functional explanation for the structural redundancy reported in prior studies. Leveraging this insight, we introduce a simple pruning strategy that removes high-BOS sink heads. Experiments on Gemma-3, Llama-3.1, and Qwen3 demonstrate that this approach identifies redundant transformer…
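The idea of scoring heads by how much attention they send to the BOS token can be illustrated with a small sketch. The abstract does not give the paper's exact definition of the BOS sink score, so the code below assumes one plausible reading: the mean attention weight a head places on key position 0, averaged over query positions. The function names, the toy attention values, and the 0.8 threshold are all hypothetical and chosen only for illustration.

```python
from typing import List

# Attention weights indexed as [head][query position][key position].
Attention = List[List[List[float]]]


def bos_sink_scores(attn: Attention) -> List[float]:
    """Assumed score: mean attention mass each head puts on key position 0 (BOS)."""
    scores = []
    for head in attn:
        # Average the BOS-column weight over all query positions of this head.
        scores.append(sum(row[0] for row in head) / len(head))
    return scores


def heads_to_prune(scores: List[float], threshold: float) -> List[int]:
    """Indices of heads whose score is at or above a hypothetical threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy causal attention for 2 heads over 3 tokens; every row sums to 1.
toy_attn = [
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.1, 0.1]],  # head 0: mostly attends to BOS
    [[1.0, 0.0, 0.0], [0.2, 0.8, 0.0], [0.1, 0.3, 0.6]],  # head 1: attends to content
]

scores = bos_sink_scores(toy_attn)
pruned = heads_to_prune(scores, threshold=0.8)
print(scores, pruned)
```

In this toy input, head 0 is flagged and head 1 is kept. Under a causal mask the first query position can only attend to BOS, so a more careful variant might exclude that row from the average; whether the paper does so is not stated in the abstract.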

Taxonomy

Topics: Machine Learning in Materials Science · Artificial Intelligence in Healthcare and Education · Topic Modeling