The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval

Ting-Rui Chiang; Dani Yogatama

arXiv:2502.11276·cs.CL·February 18, 2025

Open Access

TL;DR

This paper investigates whether the Rotary Position Embedding (RoPE) makes some attention-head dimensions inefficient in large language models. Over long contexts, RoPE rotates certain dimensions through a wide range of angles, which may leave those dimensions underutilized, particularly in long-distance retrieval tasks.

Contribution

Through a controlled experiment and analyses of three LLMs, the study provides empirical evidence that RoPE can cause certain dimensions to be underused, highlighting a potential limitation of RoPE in long-context modeling.

Findings

1. RoPE causes low utility in some dimensions during attention.
2. Dimensions with large rotation angles are less effective.
3. RoPE's dimension inefficiency impacts long-distance question answering.

Abstract

The Rotary Position Embedding (RoPE) is widely used in the attention heads of many large language models (LLMs). It rotates dimensions in the query and the key vectors by different angles according to their positions in the input sequence. For long-context modeling, the range of positions may vary a lot, and thus RoPE rotates some dimensions by a great range of angles. We hypothesize that the wide range of rotation angles may prevent LLMs from utilizing those dimensions. To validate this hypothesis, we present a controlled experiment showing that applying RoPE causes low utility of certain dimensions. Our analyses on three LLMs also indicate that these dimensions do not help LLMs do long-context question answering.
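
To make the rotation described in the abstract concrete, here is a minimal NumPy sketch of RoPE applied to a single query or key vector. It is an illustration under stated assumptions, not the authors' experimental code: the frequency schedule `base ** (-2i / d)` with `base = 10000` is the commonly used RoPE default, and the interleaved pairing of dimensions, the helper names `rope_frequencies` and `apply_rope`, the head size of 8, and the 4096-token context are all hypothetical choices made for the example.

```python
import numpy as np


def rope_frequencies(head_dim, base=10000.0):
    """Per-pair rotation frequency theta_i = base^(-2i/d) (assumed default schedule)."""
    i = np.arange(head_dim // 2)
    return base ** (-2.0 * i / head_dim)


def apply_rope(x, position, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) of a 1-D vector by position * theta_i."""
    theta = rope_frequencies(x.shape[-1], base)
    angle = position * theta
    cos, sin = np.cos(angle), np.sin(angle)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out


rng = np.random.default_rng(0)
q = rng.standard_normal(8)  # hypothetical query vector, head_dim = 8
k = rng.standard_normal(8)  # hypothetical key vector

# The attention logit depends only on the relative offset (3 in both cases).
score_a = apply_rope(q, 5) @ apply_rope(k, 2)
score_b = apply_rope(q, 105) @ apply_rope(k, 102)

# Total angle (in radians) swept by each dimension pair over 4096 positions.
angle_span = 4096 * rope_frequencies(8)
```

In this sketch the first dimension pair sweeps thousands of radians across the context while the last pair sweeps only a few. Under the abstract's hypothesis, the widely rotated pairs are the ones a model may struggle to use when the query and key are far apart.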

Taxonomy

Topics: Spatial Cognition and Navigation · Augmented Reality Applications · Robotics and Automated Systems

Methods: Softmax · Attention Is All You Need