DoPE: Denoising Rotary Position Embedding
Jing Xiong, Liyang Fan, Hui Shen, Zunhai Su, Min Yang, Lingpeng Kong, Ngai Wong

TL;DR
This paper analyzes the instability of Rotary Position Embedding (RoPE) in large language models and introduces DoPE, a training-free method that improves stability and performance in long-context extrapolation by suppressing noisy attention heads.
Contribution
The paper provides a spectral analysis of RoPE's instabilities and proposes DoPE, a training-free method that improves LLM stability and long-context performance without any fine-tuning.
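The spectral analysis concerns RoPE's per-dimension rotation frequencies. In the standard RoPE parameterization, dimension pair i of a head rotates by angle m·θ_i at position m, with θ_i = base^(-2i/d). The minimal sketch below assumes base = 10000, a head size of 128 and a 4096-token training window (illustrative values, not taken from the paper). It counts the low-frequency pairs that never complete a full rotation inside that window. The "less than one period" cutoff is a common heuristic chosen for illustration; it is not necessarily how the paper defines its low-frequency band, and the helper names are hypothetical.

```python
import numpy as np


def rope_frequencies(head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE angular frequencies theta_i = base^(-2i/d), i = 0..d/2-1 (hypothetical helper)."""
    i = np.arange(head_dim // 2)
    return base ** (-2.0 * i / head_dim)


def rotation_periods(head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Tokens needed for each dimension pair to complete one 2*pi rotation."""
    return 2.0 * np.pi / rope_frequencies(head_dim, base)


def low_frequency_mask(head_dim: int, train_len: int, base: float = 10000.0) -> np.ndarray:
    """True for pairs whose period exceeds train_len (assumed cutoff, for illustration only)."""
    return rotation_periods(head_dim, base) > train_len


if __name__ == "__main__":
    d, train_len = 128, 4096  # assumed head size and training context length
    mask = low_frequency_mask(d, train_len)
    print(f"{int(mask.sum())} of {d // 2} frequency pairs rotate less than once in {train_len} tokens")
```

Because θ_i shrinks geometrically with i, the slowest pairs sit at the end of the head dimension; within the training window their rotation angle barely changes, so these components carry nearly position-independent structure.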
Findings
DoPE improves length extrapolation performance.
DoPE increases robustness to perturbations.
DoPE improves performance on in-context learning tasks.
Abstract
Positional encoding is essential for large language models (LLMs) to represent sequence order, yet recent studies show that Rotary Position Embedding (RoPE) can induce massive activation. We investigate the source of these instabilities via a spectral analysis of RoPE, and show that its low-frequency components concentrate structured energy, producing low-rank, over-aligned attention patterns. We theoretically reveal that this low-frequency alignment manifests as activation noise, degrading stability during long-context extrapolation. To mitigate this effect, we introduce Denoising Rotary Position Embedding (DoPE), a training-free method that identifies and suppresses noisy attention heads using truncated matrix entropy, then reparameterizes their attention maps with an isotropic Gaussian distribution. Across a range of settings, DoPE improves length extrapolation performance without…
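The head-selection step described above can be illustrated with a small NumPy sketch. The abstract does not say which matrix the truncated matrix entropy is computed over or how heads are thresholded, so the sketch makes explicit assumptions: it uses one head's query or key vectors, takes the uncentered second-moment matrix, keeps its top-k eigenvalues, normalizes them into a distribution, and reports the Shannon entropy. A low value then indicates a low-rank, over-aligned head. Flagging the lowest-entropy heads is a hypothetical selection rule, and `truncated_matrix_entropy` and `flag_noisy_heads` are illustrative names, not the authors' code.

```python
import numpy as np


def truncated_matrix_entropy(x: np.ndarray, k: int) -> float:
    """Shannon entropy of the top-k normalized eigenvalues of x's second-moment matrix.

    x: (n_tokens, head_dim) query or key vectors for one head (assumed input).
    The matrix is left uncentered, so a direction shared by all tokens shows up
    as one dominant eigenvalue and lowers the entropy.
    """
    gram = x.T @ x / x.shape[0]                # (head_dim, head_dim), symmetric PSD
    eig = np.linalg.eigvalsh(gram)[::-1][:k]   # top-k eigenvalues, largest first
    eig = np.clip(eig, 0.0, None)              # drop tiny negative round-off
    total = eig.sum()
    if total <= 0.0:
        return 0.0
    p = eig / total                            # normalized truncated spectrum
    p = p[p > 0.0]
    return float(-(p * np.log(p)).sum())


def flag_noisy_heads(head_states: np.ndarray, k: int, num_flagged: int) -> np.ndarray:
    """Indices of the num_flagged heads with the lowest truncated entropy.

    head_states: (num_heads, n_tokens, head_dim). Picking the lowest-entropy
    (most low-rank) heads is a hypothetical rule, not the paper's criterion.
    """
    scores = np.array([truncated_matrix_entropy(h, k) for h in head_states])
    return np.argsort(scores)[:num_flagged]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heads = rng.standard_normal((4, 512, 64))  # roughly isotropic heads
    # Make head 2 nearly rank-1: every token lies along one shared direction.
    shared = rng.standard_normal(64)
    heads[2] = np.outer(rng.standard_normal(512), shared) + 0.01 * heads[2]
    print("flagged heads:", flag_noisy_heads(heads, k=8, num_flagged=1))
```

On this synthetic input the nearly rank-1 head has an entropy close to zero while the isotropic heads sit near log(k), so head 2 is the one flagged. The follow-up step, reparameterizing flagged heads' attention maps with an isotropic Gaussian distribution, is not sketched because the abstract does not specify its exact form.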
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
