LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models

Chi Han; Qifan Wang; Hao Peng; Wenhan Xiong; Yu Chen; Heng Ji; Sinong Wang

arXiv:2308.16137 · cs.CL · June 26, 2024 · 5 cites

TL;DR

LM-Infinite is a method that enables large language models to effectively process extremely long contexts, up to 200 million tokens, without retraining, significantly improving their applicability to long-text tasks.

Contribution

The paper introduces LM-Infinite, a simple, flexible, and parameter-free approach that enhances LLMs' ability to handle ultra-long inputs, overcoming limitations of existing techniques.

Findings

01. Enables models trained on 2K-4K segments to process up to 200M tokens.

02. Achieves 2.7x decoding speedup and 7.5x memory reduction.

03. Improves zero-shot performance on tasks like Passkey Retrieval and Qasper.

Abstract

Today's large language models (LLMs) typically train on short text segments (e.g., <4K tokens) due to the quadratic complexity of their Transformer architectures. As a result, their performance suffers drastically on inputs longer than those encountered during training, substantially limiting their applications in real-world tasks involving long contexts such as encoding scientific articles, code repositories, or long dialogues. Through theoretical analysis and empirical investigation, this work identifies three major factors contributing to this length generalization failure. Our theoretical analysis further reveals that commonly used techniques like truncating the attention window or relative positional encodings are inadequate to address them. Answering these challenges, we propose LM-Infinite, a simple and effective method for enhancing LLMs' capabilities of handling long contexts.…
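
The abstract attributes short training segments to the quadratic complexity of Transformer attention. The sketch below is only an illustration of that scaling argument, not part of the paper's method: it counts how many (query, key) pairs full causal self-attention scores at a given length. The function names `causal_attention_pairs` and `growth_factor` are hypothetical helpers introduced here, and the count is a proxy for cost rather than a measured runtime or memory figure.

```python
# Illustrative only: counts attention score entries, not measured cost.
# Function names are hypothetical helpers, not from the LM-Infinite code.


def causal_attention_pairs(n_tokens: int) -> int:
    """Count (query, key) pairs scored by full causal self-attention.

    Token i attends to tokens 0..i, so the total is 1 + 2 + ... + n.
    """
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    return n_tokens * (n_tokens + 1) // 2


def growth_factor(short_len: int, long_len: int) -> float:
    """Ratio of scored pairs between a longer and a shorter sequence."""
    return causal_attention_pairs(long_len) / causal_attention_pairs(short_len)


if __name__ == "__main__":
    # Compare a typical training length with longer inputs.
    for n in (4_096, 8_192, 32_768):
        print(f"{n:>6} tokens -> {causal_attention_pairs(n):,} scored pairs")
    print("growth from 4K to 8K:", round(growth_factor(4_096, 8_192), 3))
```

Under this count, doubling the sequence length roughly quadruples the number of scored pairs, which is why training on segments much longer than a few thousand tokens quickly becomes expensive.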

Code & Models

Repositories

Glaciohound/LM-Infinite
PyTorch · Official

Models

🤗
budecosystem/genz-13b-infinite
model · 783 dl · ♡ 2

Videos

LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models · underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Position-Wise Feed-Forward Layer · Layer Normalization · Absolute Position Encodings · Softmax · Dense Connections · Label Smoothing