LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

Zhiyuan Hu; Yuliang Liu; Jinman Zhao; Suyuchen Wang; Yan Wang; Wei Shen; Qing Gu; Anh Tuan Luu; See-Kiong Ng; Zhiwei Jiang; Bryan Hooi

arXiv:2409.00509·cs.CL·September 5, 2024

TL;DR

LongRecipe is an efficient training method that significantly extends the context window of large language models, enabling better long-range dependency understanding with reduced computational resources.

Contribution

We propose LongRecipe, a novel training strategy that extends LLMs' context window efficiently without extensive retraining, improving long-sequence processing capabilities.

Findings

01. Extends context window from 8k to 128k tokens.
02. Reduces training resources by over 85%.
03. Maintains performance on general tasks.

Abstract

Large language models (LLMs) face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences. Meanwhile, extending the context window in LLMs through post-pretraining is highly resource-intensive. To address this, we introduce LongRecipe, an efficient training strategy for extending the context window of LLMs, including impactful token analysis, position index transformation, and training optimization strategies. It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies. Experiments on three types of LLMs show that LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resource over…
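The idea of simulating long-sequence inputs with a shorter training sequence can be illustrated with a small position-index sketch. This is not the paper's algorithm: the function name `simulate_long_positions`, the `num_segments` parameter, and the random-gap scheme are all assumptions made for illustration. It only shows how a sequence that is 30% of the target window can be given position indices that span the full target range.

```python
import random


def simulate_long_positions(train_len, target_len, num_segments=4, seed=0):
    """Assign increasing position ids in [0, target_len) to train_len tokens.

    Hypothetical illustration: tokens are split into contiguous segments.
    Ids increase by 1 inside a segment and may jump forward between segments.
    """
    if not 0 < train_len <= target_len:
        raise ValueError("need 0 < train_len <= target_len")
    num_segments = min(num_segments, train_len)
    rng = random.Random(seed)

    # Segment lengths, as equal as possible.
    base, extra = divmod(train_len, num_segments)
    seg_lens = [base + (1 if i < extra else 0) for i in range(num_segments)]

    # Cumulative skip applied to each segment; never exceeds the unused range.
    slack = target_len - train_len
    offsets = [0] + sorted(rng.randint(0, slack) for _ in range(num_segments - 1))

    position_ids = []
    start = 0
    for seg_len, offset in zip(seg_lens, offsets):
        position_ids.extend(range(start + offset, start + offset + seg_len))
        start += seg_len
    return position_ids


target_len = 128_000
train_len = target_len * 30 // 100  # 30% of the target window
ids = simulate_long_positions(train_len, target_len)
print(len(ids), ids[0], ids[-1])
```

The model would still attend over only `train_len` tokens per example, while the position ids it sees cover distances up to the target window. How segments and gaps are actually chosen in LongRecipe is described in the paper and repository, not here.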

Code & Models

Repositories

zhiyuanhubj/LongRecipe
PyTorch · Official

Videos

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models· underline

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer · Adam