Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding
Mingyu Jin, Kai Mei, Wujiang Xu, Mingjie Sun, Ruixiang Tang, Mengnan Du, Zirui Liu, Yongfeng Zhang

TL;DR
This paper shows that massive values in the queries and keys of self-attention modules are crucial for contextual knowledge understanding in large language models, that Rotary Positional Encoding (RoPE) drives their concentration, and that this has implications for model interpretability and design.
Contribution
It uncovers the emergence and significance of massive values in the attention queries (Q) and keys (K), links them to Rotary Positional Encoding (RoPE), and shows that they support contextual understanding rather than parametric knowledge retrieval.
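For readers unfamiliar with RoPE, the following is a minimal NumPy sketch of how a rotary encoding rotates pairs of dimensions in a query or key vector by a position-dependent angle. It is not the paper's code: the function name `rope`, the `base` value of 10000, and the "split halves" pairing convention are assumptions chosen for illustration, and real implementations differ in how they pair dimensions.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply a rotary positional encoding to x of shape (seq_len, head_dim).

    Illustrative sketch only: dimension i is paired with dimension
    i + head_dim // 2 (an assumed convention), and head_dim must be even.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per pair; later pairs rotate more slowly.
    inv_freq = base ** (-np.arange(half) / half)
    # Rotation angle for every (position, pair) combination.
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Standard 2D rotation applied to each (x1, x2) pair.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))   # toy "query" matrix: 8 positions, head_dim 16
y = rope(x)
```

Because each pair is only rotated, the encoding leaves the norm of every position's vector unchanged, and position 0 is left untouched (its rotation angle is zero).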
Findings
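Massive values consistently emerge in specific regions of attention queries (Q) and keys (K), but not in values (V).
Quantization strategies that ignore massive values cause a pronounced performance drop on tasks requiring rich contextual understanding.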
Rotary Positional Encoding causes the concentration of massive values.
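One way to look for such concentrated values can be sketched as follows: take the norm of each head dimension across the sequence, then flag dimensions whose norm is far above the head's average. This is a hedged illustration on synthetic data, not the authors' procedure; the function name `find_massive_dims`, the `ratio` threshold of 5, and the tensor layout `(num_heads, seq_len, head_dim)` are all assumptions.

```python
import numpy as np

def find_massive_dims(q, ratio=5.0):
    """Flag head dimensions with unusually large magnitude.

    q: array of shape (num_heads, seq_len, head_dim) (assumed layout).
    ratio: hypothetical threshold; a dimension is flagged when its norm
           exceeds `ratio` times the mean norm within the same head.
    Returns a boolean mask of shape (num_heads, head_dim).
    """
    # L2 norm of each dimension across the sequence axis.
    norms = np.linalg.norm(q, axis=1)                  # (num_heads, head_dim)
    # Average norm over dimensions, computed separately per head.
    mean_per_head = norms.mean(axis=1, keepdims=True)  # (num_heads, 1)
    return norms > ratio * mean_per_head

# Synthetic stand-in for a query tensor: 4 heads, 16 tokens, head_dim 64.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16, 64))
q[:, :, 10] *= 50.0            # inject one artificially large dimension
mask = find_massive_dims(q)
```

On this toy input, only the injected dimension (index 10) is flagged in each head. Applying the same function to the Q, K, and V activations of a real model would require extracting those tensors first, which is outside the scope of this sketch.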
Abstract
Large language models (LLMs) have achieved remarkable success in contextual knowledge understanding. In this paper, we show that these concentrated massive values consistently emerge in specific regions of attention queries (Q) and keys (K) while not having such patterns in values (V) in various modern transformer-based LLMs (Q, K, and V mean the representations output by the query, key, and value layers respectively). Through extensive experiments, we further demonstrate that these massive values play a critical role in interpreting contextual knowledge (knowledge obtained from the current context window) rather than in retrieving parametric knowledge stored within the model's parameters. Our further investigation of quantization strategies reveals that ignoring these massive values leads to a pronounced drop in performance on tasks requiring rich contextual understanding, aligning…
Taxonomy
Topics: Innovative Teaching and Learning Methods · Educational Strategies and Epistemologies · Intelligent Tutoring Systems and Adaptive Learning
Methods: Softmax · Attention Is All You Need
