An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding

Tong Wu; Yanpeng Zhao; Zilong Zheng

arXiv:2406.07138 · cs.CL · October 11, 2024

TL;DR

CREAM is a simple, training-efficient method that extends the context length of large language models to 256K tokens by interpolating positional encodings and focusing fine-tuning on the middle of the context.
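For RoPE-based models such as Llama 2, the most basic form of positional-encoding interpolation is linear positional interpolation: every position index is multiplied by the ratio of the pre-trained window to the target length, so positions up to 256K fall back inside the 4K range seen during pre-training. The minimal sketch below shows only that generic scaling step, under that assumption; it is not CREAM's full scheme, and `rope_angles` is a hypothetical helper.

```python
def rope_angles(position: float, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> list[float]:
    """Rotation angles RoPE applies at one position (hypothetical helper).

    Linear positional interpolation multiplies the position index by
    scale = pretrained_len / target_len (< 1), so indices up to target_len
    are squeezed back into the range seen during pre-training.
    """
    if dim % 2 != 0:
        raise ValueError("RoPE dimension must be even")
    pos = position * scale  # interpolated (fractional) position
    # One angle per pair of dimensions: pos * base^(-2i / dim).
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]


PRETRAINED_LEN, TARGET_LEN = 4096, 262144  # 4K window, 256K target
SCALE = PRETRAINED_LEN / TARGET_LEN        # 1/64

angles = rope_angles(TARGET_LEN - 1, dim=128, scale=SCALE)
```

With this scale, the last 256K position (262143) is mapped to roughly 4095.98, just inside the pre-trained 4K range.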

Contribution

The paper introduces CREAM, a positional-encoding interpolation technique that extends the context window with fine-tuning only at the pre-trained length and mitigates the loss of information from the middle of the context.

Findings

1. Extends Llama 2-7B to a 256K context length.
2. Improves the utilization of information in the middle of the context.
3. Requires fine-tuning only at the original (pre-trained) context window.

Abstract

Recently, many methods have been developed to extend the context length of pre-trained large language models (LLMs), but they often require fine-tuning at the target length (≫4K) and struggle to effectively utilize information from the middle part of the context. To address these issues, we propose Continuity-Relativity indExing with gAussian Middle (CREAM), which interpolates positional encodings by manipulating position indices. Apart from being simple, CREAM is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context length (e.g., 256K). To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during…
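The abstract combines two ideas: manipulating position indices so that a sequence no longer than the pre-trained window receives positions spanning the target length, and a truncated Gaussian that biases sampling toward the middle of the context. The sketch below illustrates both under assumptions of our own. The training sequence is split into a head, a middle chunk, and a tail. The head keeps the first positions and the tail takes the last positions of the target range. The middle chunk's starting position is drawn from a Gaussian centred on the admissible range and truncated to it. The split sizes, the names `truncated_gauss` and `middle_focused_position_ids`, and the `sigma_frac` width are hypothetical; this is not the official implementation in bigai-nlco/cream.

```python
import random
from typing import Optional


def truncated_gauss(rng: random.Random, mu: float, sigma: float,
                    lo: float, hi: float) -> float:
    """Draw from N(mu, sigma^2) restricted to [lo, hi] by rejection sampling."""
    while True:
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            return x


def middle_focused_position_ids(seq_len: int, target_len: int, head_len: int,
                                sigma_frac: float = 0.15,
                                seed: Optional[int] = None) -> list[int]:
    """Hypothetical sketch: position ids spanning [0, target_len) for a
    seq_len-token training sequence split into head / middle / tail chunks."""
    mid_len = seq_len - 2 * head_len
    if mid_len <= 0 or target_len < seq_len:
        raise ValueError("need 2 * head_len < seq_len <= target_len")
    rng = random.Random(seed)
    lo = head_len                         # earliest start: right after the head
    hi = target_len - head_len - mid_len  # latest start: middle ends where the tail begins
    if hi == lo:
        start = lo                        # no room to move: plain contiguous ids
    else:
        # Centre the Gaussian on the admissible range so the middle chunk
        # most often lands mid-context (sigma_frac is an assumption).
        mu = (lo + hi) / 2
        start = round(truncated_gauss(rng, mu, sigma_frac * (hi - lo), lo, hi))
    head = list(range(head_len))                           # contiguous at the start
    middle = list(range(start, start + mid_len))           # relocated middle chunk
    tail = list(range(target_len - head_len, target_len))  # contiguous at the end
    return head + middle + tail


# A 4K training sequence whose positions span a 256K target range.
ids = middle_focused_position_ids(seq_len=4096, target_len=262144,
                                  head_len=1024, seed=0)
```

In this sketch, fixing the head and tail keeps their positions contiguous with the start and end of the target range, while the sampled middle chunk exposes the model to large relative distances. Plain rejection sampling suffices because, with `sigma_frac = 0.15`, the admissible interval extends more than three standard deviations on each side of the mean, so almost every draw is accepted.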

Code & Models

Repositories

bigai-nlco/cream (official)

Videos

An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding · SlidesLive

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques

Methods: Balanced Selection · LLaMA