Extending Context Window of Large Language Models via Positional Interpolation
Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian

TL;DR
This paper introduces Position Interpolation, a method to extend the context window of RoPE-based large language models up to 32,768 tokens with minimal fine-tuning, enabling better performance on long-context tasks while maintaining original quality within the trained window.
Contribution
The paper proposes Position Interpolation, a simple and stable technique to extend context windows of pretrained LLMs without retraining from scratch or significant architectural changes.
Findings
Extended models perform well on long-context tasks like summarization and retrieval.
Position Interpolation preserves original model quality within the original context window.
Theoretical analysis shows interpolation is more stable than extrapolation: its upper bound on attention scores is at least about 600 times smaller.
Abstract
We present Position Interpolation (PI) that extends the context window sizes of RoPE-based pretrained LLMs such as LLaMA models to up to 32768 with minimal fine-tuning (within 1000 steps), while demonstrating strong empirical results on various tasks that require long context, including passkey retrieval, language modeling, and long document summarization from LLaMA 7B to 65B. Meanwhile, the model extended by Position Interpolation preserves quality relatively well on tasks within its original context window. To achieve this goal, Position Interpolation linearly down-scales the input position indices to match the original context window size, rather than extrapolating beyond the trained context length, which may lead to catastrophically high attention scores that completely ruin the self-attention mechanism. Our theoretical study shows that the upper bound of interpolation is at least…
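The down-scaling step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `interpolate_position` and `rope_rotate`, the RoPE base of 10000, the toy 4-dimensional query, and the 2048 to 8192 window sizes are all assumptions chosen for the example. The idea is that a position index m in the extended window is mapped to m · L / L' (L the original window, L' the extended one) before the rotary embedding is applied, so every position the model sees stays inside the range it was trained on.

```python
import math


def interpolate_position(position, original_len, extended_len):
    """Linearly down-scale a position index into the trained range [0, original_len)."""
    return position * original_len / extended_len


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by RoPE angles for a (possibly fractional) position.

    Hypothetical helper for illustration; real implementations operate on tensors.
    """
    dim = len(vec)
    out = []
    for i in range(dim // 2):
        theta = position * base ** (-2.0 * i / dim)  # angle for the i-th pair
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


ORIGINAL_LEN = 2048   # assumed pretrained context window
EXTENDED_LEN = 8192   # assumed target context window

query = [1.0, 0.0, 0.5, -0.5]          # toy query vector
last = EXTENDED_LEN - 1                # last position in the extended window
scaled = interpolate_position(last, ORIGINAL_LEN, EXTENDED_LEN)
rotated = rope_rotate(query, scaled)   # RoPE applied at the down-scaled position

print(scaled)  # 2047.75, which is still inside the original window
```

Without the scaling step, position 8191 would be fed to RoPE directly, which is the extrapolation case the abstract warns about. With it, neighbouring tokens are simply spaced 0.25 apart instead of 1, and a short fine-tuning run lets the model adapt to the denser spacing.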
Code & Models
- 🤗 bhenrym14/airoboros-33b-gpt4-1.4.1-PI-8192-GPTQ (model) · 700 dl · ♡ 14
- 🤗 bhenrym14/airoboros-33b-gpt4-1.4.1-PI-8192-fp16 (model) · 722 dl · ♡ 4
- 🤗 bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-GPTQ (model) · 5 dl · ♡ 3
- 🤗 bhenrym14/airoboros-33b-gpt4-1.4.1-NTK-16384-GPTQ (model) · 4 dl · ♡ 7
- 🤗 bhenrym14/airoboros-7b-gpt4-1.4.1-lxctx-PI-16384-fp16 (model) · 6 dl
- 🤗 bhenrym14/airoboros-7b-gpt4-1.4.1-lxctx-PI-16384-GPTQ (model) · 3 dl · ♡ 2
- 🤗 bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 (model) · 718 dl · ♡ 4
- 🤗 bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-GPTQ (model) · 3 dl · ♡ 5
- 🤗 bhenrym14/airophin-13b-pntk-16k-GPTQ (model) · 5 dl · ♡ 4
- 🤗 bhenrym14/airophin-13b-pntk-16k-fp16 (model) · 752 dl · ♡ 4
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
