Position Interpolation Improves ALiBi Extrapolation

Faisal Al-Khateeb; Nolan Dey; Daria Soboleva; Joel Hestness

arXiv:2310.13017 · cs.CL · October 23, 2023 · 1 citation

TL;DR

This paper demonstrates that linear position interpolation enhances the ability of models with ALiBi to extrapolate to longer sequences, improving performance on language modeling, summarization, and retrieval tasks.

Contribution

It applies linear position interpolation, previously used to extend models with rotary position embeddings (RoPE), to models using ALiBi, extending the sequence lengths to which they can extrapolate.
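Because ALiBi's bias depends only on the query-key distance, the effect of interpolation on it can be written out in one line. The derivation below assumes interpolation is the same position rescaling used for RoPE, with positions multiplied by L/L', where L is the training length and L' > L the evaluation length; the symbols L, L', m_h and b_h are introduced here for illustration and are not taken from the paper's notation.

```latex
% ALiBi bias added to the attention logit of query i and key j (j \le i) in head h:
b_h(i, j) = -m_h \, (i - j)

% Linear position interpolation (training length L, evaluation length L' > L):
p \;\mapsto\; p \cdot \frac{L}{L'}

% Substituting the interpolated positions into the bias:
b_h^{\text{interp}}(i, j) = -m_h \left( i \frac{L}{L'} - j \frac{L}{L'} \right)
                          = -\left( m_h \frac{L}{L'} \right) (i - j)
```

Under this assumption, interpolating positions is equivalent to scaling every head's slope by L/L', so distances up to L' receive roughly the same range of bias values the model saw at length L during training.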

Findings

01

Position interpolation significantly improves extrapolation on upstream language modeling.

02

Enhanced performance on downstream summarization and retrieval tasks.

03

The method applies to models using Attention with Linear Biases (ALiBi).

Abstract

Linear position interpolation helps pre-trained models using rotary position embeddings (RoPE) to extrapolate to longer sequence lengths. We propose using linear position interpolation to extend the extrapolation range of models using Attention with Linear Biases (ALiBi). We find position interpolation significantly improves extrapolation capability on upstream language modelling and downstream summarization and retrieval tasks.
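The abstract does not spell out how interpolation is applied to ALiBi, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' code or the API of the ofirpress/attention_with_linear_biases repository. The function names alibi_slopes and interpolated_alibi_bias are hypothetical. The sketch assumes the power-of-two slope schedule from the original ALiBi paper and assumes interpolation rescales query-key distances by train_len / seq_len only when seq_len exceeds train_len.

```python
def alibi_slopes(num_heads):
    """Per-head ALiBi slopes m_h = 2^(-8h/n) for h = 1..n (power-of-two n only)."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only supports a power-of-two number of heads")
    return [2 ** (-8 * h / num_heads) for h in range(1, num_heads + 1)]


def interpolated_alibi_bias(seq_len, train_len, num_heads):
    """Causal ALiBi bias indexed [head][query][key], with linear position interpolation.

    Assumption (hypothetical, not from the paper): positions are rescaled by
    train_len / seq_len when seq_len > train_len, which is the same as
    multiplying every head's slope by that factor.
    """
    # No interpolation within the training length; shrink distances beyond it.
    scale = min(1.0, train_len / seq_len)
    bias = []
    for slope in alibi_slopes(num_heads):
        m = slope * scale  # effective (interpolated) slope for this head
        head = []
        for i in range(seq_len):
            # Keys j <= i get a linear distance penalty; future keys are masked out.
            row = [-m * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
            head.append(row)
        bias.append(head)
    return bias


if __name__ == "__main__":
    bias = interpolated_alibi_bias(seq_len=8, train_len=4, num_heads=8)
    print(bias[0][7][0])  # -1.75: last query vs. first key, head 0
```

With train_len=4 and seq_len=8, head 0's slope of 0.5 is effectively halved to 0.25, so the bias between the last query and the first key is -1.75 instead of the -3.5 that plain ALiBi would assign. In a real model this bias would be added to the attention logits before the softmax.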

Code & Models

Repositories

ofirpress/attention_with_linear_biases (PyTorch; original ALiBi implementation)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis