DoPE: Denoising Rotary Position Embedding

Jing Xiong; Liyang Fan; Hui Shen; Zunhai Su; Min Yang; Lingpeng Kong; Ngai Wong

arXiv:2511.09146 · cs.CL · January 7, 2026

TL;DR

This paper analyzes the instability issues of Rotary Position Embedding in large language models and introduces DoPE, a training-free method that enhances stability and performance during long-context extrapolation by suppressing noisy attention heads.

Contribution

The paper provides a spectral analysis of RoPE's instabilities and proposes DoPE, a training-free approach that improves LLM stability and performance without any fine-tuning.
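The spectral view rests on RoPE's per-pair rotation frequencies. The sketch below assumes the standard RoPE schedule theta_i = base^(-2i/d) with the common default base of 10000 (an assumption, not a value taken from this paper) and counts which 2-D rotation pairs complete less than one full turn over a given context. Those slow, low-frequency pairs barely change with position, so their share of the query-key dot product is close to position-independent. The helper names are hypothetical and only illustrate the frequency spectrum; they are not the authors' analysis code.

```python
import math


def rope_frequencies(head_dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE angular frequencies theta_i = base^(-2i/d), one per 2-D pair."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def total_rotation(theta: float, context_len: int) -> float:
    """Angle in radians that a pair with frequency theta sweeps over context_len positions."""
    return theta * context_len


def low_frequency_pairs(head_dim: int, context_len: int, base: float = 10000.0) -> list[int]:
    """Indices of pairs that rotate less than one full turn (2*pi) across the context.

    Hypothetical helper: 'low frequency' is defined here as 'under one period
    within the context window', which is an illustrative threshold only.
    """
    return [
        i
        for i, theta in enumerate(rope_frequencies(head_dim, base))
        if total_rotation(theta, context_len) < 2 * math.pi
    ]


if __name__ == "__main__":
    freqs = rope_frequencies(128)
    slow = low_frequency_pairs(128, 4096)
    print(f"{len(slow)} of {len(freqs)} pairs rotate < 2*pi over 4096 tokens")
    print(f"slowest pair sweeps {total_rotation(freqs[-1], 4096):.3f} rad")
```

With head_dim=128 and a 4096-token context, low_frequency_pairs returns pairs 46 through 63 (18 of 64), and the slowest pair rotates by less than half a radian across the entire window, which is why those dimensions behave almost like a fixed, position-insensitive component.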

Findings

01

DoPE improves length extrapolation performance.

02

DoPE increases robustness to perturbations.

03

DoPE boosts in-context learning tasks.

Abstract

Positional encoding is essential for large language models (LLMs) to represent sequence order, yet recent studies show that Rotary Position Embedding (RoPE) can induce massive activation. We investigate the source of these instabilities via a spectral analysis of RoPE, and show that its low-frequency components concentrate structured energy, producing low-rank, over-aligned attention patterns. We theoretically reveal that this low-frequency alignment manifests as activation noise, degrading stability during long-context extrapolation. To mitigate this effect, we introduce Denoising Rotary Position Embedding (DoPE), a training-free method that identifies and suppresses noisy attention heads using truncated matrix entropy, then reparameterizes their attention maps with an isotropic Gaussian distribution. Across a range of settings, DoPE improves length extrapolation performance without…
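The head-identification step can be sketched as follows, under several assumptions that the abstract does not pin down: the matrix entropy is computed from the covariance of one head's query (or key) vectors, "truncated" is read as keeping only the top_k largest eigenvalues before normalizing them into a distribution, and heads with the lowest entropy are flagged as noisy, following the abstract's description of low-rank, over-aligned patterns. The function names and the ratio parameter are hypothetical; the paper's actual criterion, threshold, and Gaussian reparameterization of the flagged heads may differ, and this sketch covers only the scoring and selection.

```python
import numpy as np


def truncated_matrix_entropy(x: np.ndarray, top_k: int) -> float:
    """Entropy of the top_k covariance eigenvalues of one head's vectors.

    x: (num_tokens, head_dim) query or key vectors for a single head (assumption).
    """
    x = x - x.mean(axis=0, keepdims=True)          # center the token representations
    cov = x.T @ x / x.shape[0]                      # (head_dim, head_dim) covariance
    eig = np.linalg.eigvalsh(cov)[::-1][:top_k]     # eigvalsh is ascending; keep the largest top_k
    eig = np.clip(eig, 0.0, None)                   # drop tiny negative round-off values
    total = eig.sum()
    if total <= 0.0:
        return 0.0                                  # degenerate (constant) input: no spread
    p = eig / total                                 # normalize the spectrum to a distribution
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def flag_noisy_heads(per_head: list[np.ndarray], top_k: int, ratio: float) -> list[int]:
    """Return indices of the lowest-entropy fraction of heads (hypothetical criterion)."""
    scores = [truncated_matrix_entropy(h, top_k) for h in per_head]
    n_flag = max(1, int(round(ratio * len(per_head))))
    return sorted(int(i) for i in np.argsort(scores)[:n_flag])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    iso = rng.standard_normal((512, 64))            # spread-out, high-entropy head
    low = rng.standard_normal((512, 1)) @ rng.standard_normal((1, 64))
    low += 0.01 * rng.standard_normal((512, 64))    # nearly rank-1, over-aligned head
    print(flag_noisy_heads([iso, low, iso, low], top_k=16, ratio=0.5))
```

On the synthetic heads above, the two nearly rank-1 heads receive entropy close to zero while the isotropic heads score near log(16), so indices 1 and 3 are the ones flagged.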

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning