Extending Context Window of Large Language Models from a Distributional Perspective
Yingsheng Wu, Yuxuan Gu, Xiaocheng Feng, Weihong Zhong, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin

TL;DR
This paper extends the context window of RoPE-based large language models by choosing the scaling strategy that least disturbs the rotary angle distribution seen during pre-training, which improves performance on long-context benchmarks.
Contribution
It proposes a novel extension strategy based on rotary angle distribution analysis, outperforming empirical methods in maintaining model performance for longer sequences.
Findings
Reduces distributional disturbance by up to 72% for 8k context extension
Achieves up to 4.33% improvement on LongBench-E benchmark
Maintains performance on Hugging Face Open LLM benchmark with minimal fluctuation
Abstract
Scaling the rotary position embedding (RoPE) has become a common method for extending the context window of RoPE-based large language models (LLMs). However, existing scaling methods often rely on empirical approaches and lack a profound understanding of the internal distribution within RoPE, resulting in suboptimal performance in extending the context window length. In this paper, we propose to optimize the context window extending task from the view of rotary angle distribution. Specifically, we first estimate the distribution of the rotary angles within the model and analyze the extent to which length extension perturbs this distribution. Then, we present a novel extension strategy that minimizes the disturbance between rotary angle distributions to maintain consistency with the pre-training phase, enhancing the model's capability to generalize to longer sequences. Experimental…
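The idea of estimating a rotary angle distribution and measuring how much an extension perturbs it can be illustrated with a small sketch. This is not the authors' implementation. It assumes the standard RoPE frequency formula, a histogram over angles taken modulo 2π, and KL divergence as the disturbance measure (the truncated abstract does not name the metric). The function names, `head_dim=128`, `base=10000`, and the bin count are all hypothetical choices made for illustration.

```python
import numpy as np

def rotary_angles(num_positions, dim_index, head_dim=128, base=10000.0, scale=1.0):
    """Rotary angles (mod 2*pi) of one RoPE dimension pair.

    scale > 1 mimics linear position interpolation (positions divided by scale).
    All defaults are illustrative assumptions, not values from the paper.
    """
    freq = base ** (-2.0 * dim_index / head_dim)
    positions = np.arange(num_positions) / scale
    return np.mod(positions * freq, 2 * np.pi)

def angle_histogram(angles, num_bins=360):
    """Empirical distribution of angles over equal-width bins on [0, 2*pi)."""
    counts, _ = np.histogram(angles, bins=num_bins, range=(0.0, 2 * np.pi))
    return counts / counts.sum()

def kl_disturbance(p_pretrain, q_extended, eps=1e-6):
    """KL(q || p) with additive smoothing; assumed here as the disturbance measure."""
    p = p_pretrain + eps
    q = q_extended + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(q * np.log(q / p)))

# Compare two ways of reaching 8k positions from a 4k pre-training window
# on a low-frequency dimension (the last of 64 pairs when head_dim is 128).
pretrain_len, extended_len, dim = 4096, 8192, 63
p = angle_histogram(rotary_angles(pretrain_len, dim))
q_extrapolate = angle_histogram(rotary_angles(extended_len, dim))
q_interpolate = angle_histogram(rotary_angles(extended_len, dim, scale=2.0))

print("extrapolation disturbance:", kl_disturbance(p, q_extrapolate))
print("interpolation disturbance:", kl_disturbance(p, q_interpolate))
```

On this low-frequency dimension, direct extrapolation produces angles the model never saw during pre-training, so its disturbance is larger than that of interpolation. A selection rule in the spirit of the abstract would compute such a score per dimension and keep the strategy with the smaller value.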
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Computational and Text Analysis Methods
