Demystifying the Slash Pattern in Attention: The Role of RoPE
Yuan Cheng, Fengzhuo Zhang, Yunlong Hou, Cunxiao Du, Chao Du, Tianyu Pang, Aixin Sun, Zhuoran Yang

TL;DR
This paper investigates why slash attention patterns emerge in large language models. It shows that they are intrinsic to the models and explains their formation through empirical analysis and theoretical modeling of queries, keys, and Rotary Position Embedding (RoPE).
Contribution
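It provides a comprehensive analysis of slash-dominant heads (SDHs) in LLMs. It identifies two characteristic conditions under which they arise (nearly rank-one queries and keys, and RoPE dominated by medium- and high-frequency components) and offers a theoretical framework that explains their emergence and generalization.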
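To make "slash-dominant" concrete, the sketch below scores an attention matrix by the average weight it places on each sub-diagonal, i.e. on entries (i, i - delta). The functions diagonal_mass, slash_profile and is_slash_dominant, the 0.5 threshold and the max_delta cutoff are hypothetical choices for illustration, not the criterion used in the paper.

```python
from typing import Optional, Tuple

import numpy as np

def diagonal_mass(attn: np.ndarray, delta: int) -> float:
    """Mean attention weight on the delta-th sub-diagonal, i.e. on entries attn[i, i - delta]."""
    n = attn.shape[0]
    if not 0 <= delta < n:
        raise ValueError(f"delta must lie in [0, {n})")
    rows = np.arange(delta, n)            # only rows i >= delta have an (i, i - delta) entry
    return float(attn[rows, rows - delta].mean())

def slash_profile(attn: np.ndarray, max_delta: int) -> np.ndarray:
    """diagonal_mass for every offset 0..max_delta."""
    return np.array([diagonal_mass(attn, d) for d in range(max_delta + 1)])

def is_slash_dominant(attn: np.ndarray, threshold: float = 0.5,
                      max_delta: Optional[int] = None) -> Tuple[bool, int]:
    """Hypothetical criterion: the best offset holds at least `threshold` of each row's weight on average."""
    n = attn.shape[0]
    max_delta = n // 2 if max_delta is None else max_delta
    profile = slash_profile(attn, max_delta)
    best = int(np.argmax(profile))
    return bool(profile[best] >= threshold), best

def causal_uniform(n: int) -> np.ndarray:
    """Row i spreads its weight evenly over keys 0..i."""
    return np.tril(np.ones((n, n))) / np.arange(1, n + 1)[:, None]

def causal_slash(n: int, delta: int) -> np.ndarray:
    """Rows i >= delta put all weight on key i - delta; earlier rows fall back to uniform."""
    attn = causal_uniform(n)
    for i in range(delta, n):
        attn[i] = 0.0
        attn[i, i - delta] = 1.0
    return attn

print(is_slash_dominant(causal_slash(16, 2)))   # (True, 2)
print(is_slash_dominant(causal_uniform(16)))    # (False, 0)
```

Capping the offset at max_delta avoids trusting very large offsets, whose sub-diagonals contain only a handful of entries.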
Findings
SDHs are intrinsic to models and generalize out-of-distribution
Queries and keys are nearly rank-one in SDHs
RoPE's medium- and high-frequency components drive SDHs
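The role of rank-one queries and keys can be checked numerically. In the idealized case where every token shares one query vector q and one key vector k (a special rank-one case; real SDHs are only approximately like this), RoPE makes the logit between positions i and j depend only on j - i, so the logit matrix is constant along each diagonal (Toeplitz). A minimal NumPy sketch, assuming the standard RoPE frequencies base^(-2m/d) with base 10000 and the interleaved pairing of the original RoFormer formulation (many implementations pair dimension m with m + d/2 instead, which only permutes dimensions). The helper names are hypothetical.

```python
import numpy as np

def rope_freqs(d: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE frequencies theta_m = base^(-2m/d) for m = 0..d/2-1 (fastest to slowest)."""
    return base ** (-np.arange(0, d, 2) / d)

def apply_rope(x: np.ndarray, pos: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Rotate each interleaved pair (x[t, 2m], x[t, 2m+1]) by angle pos[t] * theta[m]."""
    ang = pos[:, None] * theta[None, :]            # shape (n, d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def is_toeplitz(m: np.ndarray) -> bool:
    """True if m is constant along every diagonal, i.e. m[i, j] == m[i+1, j+1]."""
    return bool(np.allclose(m[1:, 1:], m[:-1, :-1]))

rng = np.random.default_rng(0)
n, d = 32, 64
theta = rope_freqs(d)
pos = np.arange(n, dtype=float)

# Idealized rank-one case: every token has the same query q and the same key k.
q, k = rng.standard_normal(d), rng.standard_normal(d)
Q, K = np.tile(q, (n, 1)), np.tile(k, (n, 1))
logits_rank1 = apply_rope(Q, pos, theta) @ apply_rope(K, pos, theta).T

# Baseline: independent random queries and keys at every position (full rank).
Qr, Kr = rng.standard_normal((n, d)), rng.standard_normal((n, d))
logits_random = apply_rope(Qr, pos, theta) @ apply_rope(Kr, pos, theta).T

print(is_toeplitz(logits_rank1), is_toeplitz(logits_random))  # True False
```

The identity behind this is that rotations compose, so <R_i q, R_j k> = q^T R_(j-i) k. Per-token scale factors (q_i = a_i q) would break the exact Toeplitz structure, which is why the example fixes them to 1.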
Abstract
Large Language Models (LLMs) often exhibit slash attention patterns, where attention scores concentrate along the Δ-th sub-diagonal for some offset Δ. These patterns play a key role in passing information across tokens. But why do they emerge? In this paper, we demystify the emergence of these Slash-Dominant Heads (SDHs) from both empirical and theoretical perspectives. First, by analyzing open-source LLMs, we find that SDHs are intrinsic to models and generalize to out-of-distribution prompts. To explain the intrinsic emergence, we analyze the queries, keys, and Rotary Position Embedding (RoPE), which jointly determine attention scores. Our empirical analysis reveals two characteristic conditions of SDHs: (1) queries and keys are almost rank-one, and (2) RoPE is dominated by medium- and high-frequency components. Under these conditions, queries and keys are nearly…
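The frequency condition can be illustrated the same way. The sketch below puts equal energy into a chosen band of RoPE pairs and sets the key to the query pre-rotated by a target offset delta, so pair m contributes scale * cos((delta - (i - j)) * theta_m) to the logit for query i and key j. Fast-rotating (high/medium-frequency) pairs make this sum drop sharply away from i - j = delta, while slow-rotating (low-frequency) pairs keep it nearly flat across a short context. The band boundaries (pairs 0-15 versus 24-31 with d = 64), scale, delta, the interleaved pairing and the helper names are all hypothetical illustration choices, not values or methods from the paper.

```python
import numpy as np

def rope_freqs(d: int, base: float = 10000.0) -> np.ndarray:
    """RoPE frequencies theta_m = base^(-2m/d), m = 0..d/2-1, from fastest to slowest."""
    return base ** (-np.arange(0, d, 2) / d)

def apply_rope(x: np.ndarray, pos: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Rotate interleaved pair (x[t, 2m], x[t, 2m+1]) by angle pos[t] * theta[m]."""
    ang = pos[:, None] * theta[None, :]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def causal_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax where query i only sees keys j <= i."""
    n = logits.shape[0]
    masked = np.where(np.tril(np.ones((n, n), dtype=bool)), logits, -np.inf)
    masked -= masked.max(axis=1, keepdims=True)      # numerical stability
    w = np.exp(masked)
    return w / w.sum(axis=1, keepdims=True)

def slash_mass(band, delta: int = 3, n: int = 64, d: int = 64, scale: float = 4.0) -> float:
    """Mean attention on the delta-th sub-diagonal when q and k only use the RoPE pairs in `band`."""
    theta = rope_freqs(d)
    q = np.zeros(d)
    q[2 * np.asarray(band)] = np.sqrt(scale)         # |q_m|^2 = scale for m in band, 0 elsewhere
    # Key = query rotated by delta, so logit(i, j) = sum_m scale * cos((delta - (i - j)) * theta_m)
    k = apply_rope(q[None, :], np.array([float(delta)]), theta)[0]
    pos = np.arange(n, dtype=float)
    # Identical q and k at every position (the idealized rank-one setting)
    Q = apply_rope(np.tile(q, (n, 1)), pos, theta)
    K = apply_rope(np.tile(k, (n, 1)), pos, theta)
    attn = causal_softmax(Q @ K.T)
    rows = np.arange(delta, n)
    return float(attn[rows, rows - delta].mean())

high_mid = range(0, 16)   # hypothetical "high/medium" band: largest theta_m
low = range(24, 32)       # hypothetical "low" band: smallest theta_m
print(f"high/medium band: {slash_mass(high_mid):.3f}")
print(f"low band:         {slash_mass(low):.3f}")
```

With these settings the high/medium band puts most of each row's attention on the delta-th sub-diagonal, whereas the low band spreads attention almost uniformly over the causal prefix.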
