Extending Context Window of Large Language Models via Positional Interpolation

Shouyuan Chen; Sherman Wong; Liangjian Chen; Yuandong Tian

arXiv:2306.15595·cs.CL·June 29, 2023·44 cites

TL;DR

This paper introduces Position Interpolation, a method to extend the context window of RoPE-based large language models up to 32,768 tokens with minimal fine-tuning, enabling better performance on long-context tasks while maintaining original quality within the trained window.

Contribution

The paper proposes Position Interpolation, a simple and stable technique to extend context windows of pretrained LLMs without retraining from scratch or significant architectural changes.

Findings

01

Extended models perform well on long-context tasks like summarization and retrieval.

02

Position Interpolation preserves original model quality within the original context window.

03

Theoretical analysis shows interpolation is more stable than extrapolation: the upper bound on the interpolated attention score is about 600 times smaller than the corresponding bound for extrapolation.

Abstract

We present Position Interpolation (PI) that extends the context window sizes of RoPE-based pretrained LLMs such as LLaMA models to up to 32768 with minimal fine-tuning (within 1000 steps), while demonstrating strong empirical results on various tasks that require long context, including passkey retrieval, language modeling, and long document summarization from LLaMA 7B to 65B. Meanwhile, the extended model by Position Interpolation preserves quality relatively well on tasks within its original context window. To achieve this goal, Position Interpolation linearly down-scales the input position indices to match the original context window size, rather than extrapolating beyond the trained context length, which may lead to catastrophically high attention scores that completely ruin the self-attention mechanism. Our theoretical study shows that the upper bound of interpolation is at least…
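The linear down-scaling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window sizes (2048 and 8192), the RoPE base of 10000, and the choice to rotate consecutive pairs of dimensions are all assumptions made for illustration. The idea shown is only that each position index m is multiplied by L / L' (original window over extended window) before the rotary angles are computed, so every position the model sees stays inside the range it was trained on.

```python
import math


def interpolate_positions(seq_len, original_window, extended_window):
    """Linearly down-scale integer positions 0..seq_len-1 by L / L'.

    Hypothetical helper: the resulting (fractional) positions all fall
    inside [0, original_window) when seq_len <= extended_window.
    """
    scale = original_window / extended_window
    return [m * scale for m in range(seq_len)]


def rope_angles(position, head_dim, base=10000.0):
    """Rotation angle for each 2-D pair at a (possibly fractional) position."""
    return [position * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by the RoPE angles for `position`.

    The pairing convention (consecutive dimensions) is an assumption;
    real implementations differ in how they pair dimensions.
    """
    out = []
    for i, theta in enumerate(rope_angles(position, len(vec), base)):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out


# Illustrative sizes only: a model trained on 2048 positions, extended to 8192.
original_window = 2048
extended_window = 8192
positions = interpolate_positions(extended_window, original_window, extended_window)

# The last token of the extended window maps to a position below 2048,
# so the rotary embedding is never evaluated outside the trained range.
query = apply_rope([1.0, 0.0, 0.0, 1.0], positions[-1])
```

Under these assumptions, extrapolation would correspond to passing the raw index (for example 8191) straight to `apply_rope`, while interpolation passes the scaled value instead. The abstract's fine-tuning step is not modeled here.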

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis