Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification
Konstantin Donhauser, Charles Arnal, Mohammad Pezeshki, Vivien Cabannes, David Lopez-Paz, Kartik Ahuja

TL;DR
This paper investigates how attention heads in long-context NLP models function, revealing that some heads rely on local information while others switch between local and long-context, and shows how to identify crucial heads efficiently.
Contribution
It introduces a method to predict which attention heads need long-context information using only local keys, simplifying analysis and potentially improving efficiency.
Findings
Some heads attend only locally, others switch between local and long-context.
Long-context attention scores can be modeled with second-moment approximations.
Identifying crucial heads enables more efficient long-sequence processing.
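To make the second-moment idea concrete, here is a minimal sketch of how such a test might look. It is not the authors' implementation. It assumes that the scores q·k over the long-context keys are roughly Gaussian, so their log-sum-exp can be estimated from the key mean and second moment alone, and it compares that estimate with the exact log-sum-exp over the local window. The function names, the threshold `tau`, and the Gaussian assumption are all hypothetical choices made for illustration; the query is assumed to be already scaled by 1/sqrt(d).

```python
import numpy as np


def logsumexp(scores):
    """Numerically stable log(sum(exp(scores)))."""
    top = np.max(scores)
    return top + np.log(np.sum(np.exp(scores - top)))


def approx_long_logmass(q, key_mean, key_second_moment, n_long):
    """Estimate log sum_i exp(q . k_i) over n_long long-context keys.

    Assumption (illustrative): the scores are Gaussian with mean m and
    variance v, so E[exp(score)] = exp(m + v / 2).
    """
    cov = key_second_moment - np.outer(key_mean, key_mean)
    mean_score = q @ key_mean
    var_score = q @ cov @ q
    return np.log(n_long) + mean_score + 0.5 * var_score


def head_is_local(q, local_keys, key_mean, key_second_moment, n_long, tau=0.6):
    """Return (is_local, estimated fraction of softmax mass on local keys).

    tau is a hypothetical threshold, not a value taken from the paper.
    """
    local_log = logsumexp(local_keys @ q)  # exact, local keys only
    long_log = approx_long_logmass(q, key_mean, key_second_moment, n_long)
    # local / (local + long), computed in log space to avoid overflow
    local_frac = float(np.exp(local_log - np.logaddexp(local_log, long_log)))
    return local_frac >= tau, local_frac


# Usage on synthetic data (random keys, purely illustrative)
rng = np.random.default_rng(0)
d, n_long = 8, 4000
long_keys = 0.3 * rng.normal(size=(n_long, d))
local_keys = rng.normal(size=(16, d))
q = rng.normal(size=d)
key_mean = long_keys.mean(axis=0)
key_second_moment = long_keys.T @ long_keys / n_long
is_local, frac = head_is_local(q, local_keys, key_mean, key_second_moment, n_long)
print(is_local, round(frac, 3))
```

In this sketch the per-query decision touches only the local keys plus two precomputed statistics of the long-context keys, which is why a test of this kind can be cheap relative to full attention.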
Abstract
The ability to process long contexts is crucial for many natural language processing tasks, yet it remains a significant challenge. While substantial progress has been made in enhancing the efficiency of attention mechanisms, there is still a gap in understanding how attention heads function in long-context settings. In this paper, we observe that while certain heads consistently attend to local information only, others swing between attending to local and long-context information depending on the query. This raises the question: can we identify which heads require long-context information to predict the next token accurately? We demonstrate that it's possible to predict which heads are crucial for long-context processing using only local keys. The core idea here is to exploit a simple model for the long-context scores via second moment approximations. These findings unveil simple…
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Functional Brain Connectivity Studies
Methods: Softmax · Attention Is All You Need
