Base of RoPE Bounds Context Length
Xin Men, Mingyu Xu, Bingning Wang, Qingyu Zhang, Hongyu Lin, Xianpei Han, Weipeng Chen

TL;DR
This paper investigates the role of Rotary Position Embedding (RoPE) in LLMs and derives a theoretical lower bound on the RoPE base parameter required to reach a given context length, which constrains how the base should be chosen for long-context training.
Contribution
It introduces the idea that the RoPE base bounds the achievable context length, and supports it with both theoretical analysis and empirical evidence.
Findings
There is a lower bound on the RoPE base for a desired context length.
Choosing the base according to out-of-distribution (OOD) theory alone may yield only a superficial long-context ability.
The relationship between RoPE base and context length is both theoretically and empirically established.
Abstract
Position embedding is a core component of current Large Language Models (LLMs). Rotary position embedding (RoPE), a technique that encodes position information with a rotation matrix, has been the de facto choice for position embedding in many LLMs, such as the Llama series. RoPE has further been used to extend long-context capability, roughly by adjusting the base parameter of RoPE to mitigate out-of-distribution (OOD) problems in position embedding. However, in this paper, we find that LLMs may obtain a superficial long-context ability based on the OOD theory. We revisit the role of RoPE in LLMs and propose a novel property of long-term decay, from which we derive that the base of RoPE bounds context length: there is an absolute lower bound for the base value to obtain a certain context length capability. Our work reveals the relationship between context…
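To make the role of the base parameter concrete, the following is a minimal sketch of the standard RoPE rotation, assuming the common formulation in which channel pair i rotates at frequency base^(-2i/d). The function names (rope_inv_freq, apply_rope, longest_wavelength) are hypothetical and chosen for illustration only. The sketch does not reproduce the paper's lower-bound derivation; it only shows how the base sets the rotation frequencies and that the rotated dot product depends on relative position.

```python
import numpy as np

def rope_inv_freq(dim, base=10000.0):
    # Assumed standard RoPE frequencies: theta_i = base^(-2i/dim), i = 0..dim/2-1.
    return base ** (-np.arange(0, dim, 2) / dim)

def apply_rope(x, pos, base=10000.0):
    # Rotate each channel pair (x[2i], x[2i+1]) by the angle pos * theta_i.
    theta = rope_inv_freq(x.shape[-1], base)
    cos, sin = np.cos(pos * theta), np.sin(pos * theta)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out

def longest_wavelength(dim, base=10000.0):
    # Period (in tokens) of the slowest-rotating channel pair.
    return 2 * np.pi / rope_inv_freq(dim, base)[-1]

rng = np.random.default_rng(0)
q = rng.standard_normal(64)
k = rng.standard_normal(64)

# The attention score depends only on the relative offset (here 2).
score_a = apply_rope(q, 5) @ apply_rope(k, 3)
score_b = apply_rope(q, 12) @ apply_rope(k, 10)
print(np.isclose(score_a, score_b))

# A larger base slows the lowest frequency, lengthening its period.
print(longest_wavelength(64, 1e4), longest_wavelength(64, 1e6))
```

Raising the base stretches the slowest rotation over more tokens, which is the knob that long-context methods adjust. The paper's claim is that this base cannot be chosen arbitrarily small for a given target context length.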
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Balanced Selection · LLaMA
