Position Interpolation Improves ALiBi Extrapolation
Faisal Al-Khateeb, Nolan Dey, Daria Soboleva, Joel Hestness

TL;DR
This paper shows that linear position interpolation helps models using ALiBi extrapolate to longer sequences, improving performance on language modeling, summarization, and retrieval tasks.
Contribution
The paper applies linear position interpolation, a technique already known to help models with rotary position embeddings (RoPE), to models using Attention with Linear Biases (ALiBi) in order to extend their extrapolation range.
Findings
Position interpolation significantly improves the extrapolation capability of ALiBi models on upstream language modeling.
It also improves performance on downstream summarization and retrieval tasks.
The method targets models using Attention with Linear Biases (ALiBi), extending a technique previously applied to RoPE models.
Abstract
Linear position interpolation helps pre-trained models using rotary position embeddings (RoPE) to extrapolate to longer sequence lengths. We propose using linear position interpolation to extend the extrapolation range of models using Attention with Linear Biases (ALiBi). We find position interpolation significantly improves extrapolation capability on upstream language modelling and downstream summarization and retrieval tasks.
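The abstract does not spell out the formula, so the following is only a minimal sketch of how linear position interpolation could be applied to ALiBi. It assumes the standard ALiBi bias of -m_h * (i - j) with geometric head slopes m_h = 2^(-8h/n), and it assumes that interpolation is implemented by scaling those slopes by train_len / seq_len when the input is longer than the training length. The function names and the train_len parameter are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def alibi_slopes(num_heads):
    # Geometric slope sequence commonly used with ALiBi when the head
    # count is a power of two: 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    return np.array([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])

def alibi_bias(seq_len, num_heads, train_len=None):
    """Return an additive attention bias of shape (heads, seq_len, seq_len).

    If train_len is given and seq_len exceeds it, the slopes are scaled by
    train_len / seq_len (assumed form of linear position interpolation).
    """
    slopes = alibi_slopes(num_heads)
    if train_len is not None and seq_len > train_len:
        # Assumption: interpolation shrinks the per-token penalty so the
        # largest bias stays close to the range seen during training.
        slopes = slopes * (train_len / seq_len)
    pos = np.arange(seq_len)
    # distance[i, j] = i - j for past keys; future keys are clipped to 0
    # here and would be removed by a separate causal mask.
    distance = np.maximum(pos[:, None] - pos[None, :], 0)
    return -slopes[:, None, None] * distance[None, :, :]

# Hypothetical usage: a model trained on 4 tokens, evaluated on 8.
bias_interp = alibi_bias(seq_len=8, num_heads=4, train_len=4)
bias_plain = alibi_bias(seq_len=8, num_heads=4)
```

In this sketch the interpolated bias is exactly half the plain bias at every position, because the slopes are multiplied by 4 / 8. The resulting array would be added to the attention logits before the softmax.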
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
