Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Zhongzhi Yu; Zheng Wang; Yonggan Fu; Huihong Shi; Khalid Shaikh; Yingyan Celine Lin

arXiv:2406.15765 · cs.LG · June 25, 2024 · 1 citation

TL;DR

This paper investigates the phenomenon of attention sinks in large language models, visualizes their occurrence, and introduces a training-free attention calibration method that improves model accuracy during inference without weight updates.

Contribution

It uncovers new insights about attention sinks in LLMs, including their occurrence beyond sequence starts and their varied impact on accuracy, and proposes a novel attention calibration technique that enhances performance without training.
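To make the idea of a sink "beyond the sequence start" concrete, here is a minimal sketch of how one might flag candidate attention sinks in a single causal attention map. It is an illustration only, not the paper's procedure: the function name `find_attention_sinks`, the `ratio` threshold, and the toy attention values are all assumptions introduced here.

```python
def find_attention_sinks(attn, ratio=2.0):
    """Flag key positions that receive disproportionately large attention.

    attn  : square list of lists; attn[i][j] is the attention that query
            token i pays to key token j (causal, so attn[i][j] == 0 for j > i).
    ratio : hypothetical threshold; a token is flagged when its mean received
            attention exceeds ratio / n, i.e. `ratio` times a uniform share.
    """
    n = len(attn)
    sinks = []
    for j in range(n):
        # Only queries i >= j can see key j under a causal mask.
        received = sum(attn[i][j] for i in range(j, n)) / (n - j)
        if received > ratio / n:
            sinks.append(j)
    return sinks


# Toy 5-token causal attention map (each row sums to 1); values are made up.
toy_attn = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.7, 0.3, 0.0, 0.0, 0.0],
    [0.5, 0.1, 0.4, 0.0, 0.0],
    [0.4, 0.05, 0.5, 0.05, 0.0],
    [0.3, 0.05, 0.5, 0.05, 0.1],
]

print(find_attention_sinks(toy_attn))  # flags the first token and token 2
```

In this toy map, token 0 is the familiar start-of-sequence sink, while token 2 is a sink that appears later in the input.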

Findings

01. Attention sinks occur not only at the start of a sequence but also at later tokens in the input.

02. Not all attention sinks positively impact accuracy.

03. The proposed attention calibration technique (ACT) improves LLM accuracy by up to 7.30% across datasets.
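As a rough illustration of what calibrating an attention distribution at inference time can look like, the sketch below scales down the attention paid to chosen sink positions in one row and redistributes the removed mass proportionally over the remaining tokens, so the row still sums to 1. This is a generic sketch under stated assumptions, not the ACT algorithm itself: the function name `calibrate_row`, the scaling factor `beta`, and the choice of proportional redistribution are hypothetical.

```python
def calibrate_row(row, sinks, beta=0.5):
    """Reduce attention on sink positions and renormalize one attention row.

    row   : attention weights of one query over all keys (sums to 1).
    sinks : indices of key tokens treated as attention sinks.
    beta  : hypothetical factor in (0, 1]; sink weights are multiplied by it.
    """
    sink_set = set(sinks)
    # Attention mass taken away from the sink positions.
    removed = sum(row[j] * (1.0 - beta) for j in sink_set)
    # Attention mass currently held by all non-sink positions.
    rest = sum(p for j, p in enumerate(row) if j not in sink_set)
    if rest == 0:
        # Nothing to redistribute to; leave the row unchanged.
        return list(row)
    scale = (rest + removed) / rest
    # Masked positions (weight 0) stay 0 because both updates are multiplicative.
    return [p * beta if j in sink_set else p * scale for j, p in enumerate(row)]


# Toy row in which token 2 holds half of the attention; values are made up.
row = [0.3, 0.05, 0.5, 0.05, 0.1]
calibrated = calibrate_row(row, sinks=[2], beta=0.5)
print(calibrated)
```

No model weights are touched here; only the attention weights computed during the forward pass are adjusted, which is what makes this kind of intervention training-free.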

Abstract

Attention is a fundamental component behind the remarkable achievements of large language models (LLMs). However, our current understanding of the attention mechanism, especially regarding how attention distributions are established, remains limited. Inspired by recent studies that explore the presence of an attention sink at the initial token, which receives disproportionately large attention scores despite its lack of semantic importance, this work delves deeper into this phenomenon. We aim to provide a more profound understanding of the existence of attention sinks within LLMs and to uncover ways to enhance the achievable accuracy of LLMs by directly optimizing the attention distributions, without the need for weight finetuning. Specifically, this work begins with comprehensive visualizations of the attention distributions in LLMs during inference across various inputs and tasks.…

Code & Models

Repositories

gatech-eic/act (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need · Attention Sinks