InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

Chaojun Xiao; Pengle Zhang; Xu Han; Guangxuan Xiao; Yankai Lin; Zhengyan Zhang; Zhiyuan Liu; Maosong Sun

arXiv:2402.04617·cs.CL·May 29, 2024·1 cites

TL;DR

InfLLM is a training-free method that enhances large language models' ability to process extremely long sequences by using an efficient memory mechanism, avoiding costly retraining.

Contribution

This paper introduces InfLLM, a novel memory-based approach that enables LLMs to handle long sequences without additional training or fine-tuning.

Findings

01. InfLLM achieves performance comparable to models that were continually trained on long sequences.

02. It captures long-distance dependencies in sequences of up to 1,024K tokens.

03. The method requires no additional training or fine-tuning.

Abstract

Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs (e.g., LLM-driven agents). However, existing LLMs, pre-trained on sequences with a restricted maximum length, cannot process longer sequences due to out-of-domain and distraction issues. Common solutions often involve continual pre-training on longer sequences, which introduces expensive computational overhead and uncontrollable changes in model capabilities. In this paper, we unveil the intrinsic capacity of LLMs for understanding extremely long sequences without any fine-tuning. To this end, we introduce a training-free memory-based method, InfLLM. Specifically, InfLLM stores distant contexts in additional memory units and employs an efficient mechanism to look up token-relevant units for attention computation. Thereby, InfLLM allows LLMs to efficiently process…
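The store-and-look-up idea described in the abstract can be illustrated with a small NumPy sketch. This is not the code from thunlp/infllm: the function names (`build_memory_units`, `lookup_units`, `memory_attention`), the fixed `unit_size`, and the use of a mean key as each unit's summary are all assumptions chosen for brevity. It only shows the general pattern of splitting distant keys and values into units, scoring units against the current query, and attending over the selected units together with the local context.

```python
import numpy as np

def build_memory_units(keys, values, unit_size):
    """Split distant keys/values into fixed-size units.

    Each unit is summarized by its mean key. The mean is an assumption
    made for this sketch, not the scoring rule from the paper.
    """
    units = []
    for start in range(0, len(keys), unit_size):
        k = keys[start:start + unit_size]
        v = values[start:start + unit_size]
        units.append({"keys": k, "values": v, "summary": k.mean(axis=0)})
    return units

def lookup_units(query, units, top_k):
    """Return indices of the top_k units whose summary best matches the query."""
    scores = np.array([u["summary"] @ query for u in units])
    top = np.argsort(scores)[::-1][:top_k]
    return sorted(top.tolist())  # restore original (temporal) order

def attend(query, keys, values):
    """Single-query scaled dot-product attention with a stable softmax."""
    logits = keys @ query / np.sqrt(query.shape[-1])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values

def memory_attention(query, local_keys, local_values, units, top_k):
    """Attend over the selected memory units plus the local window."""
    picked = lookup_units(query, units, top_k)
    keys = np.concatenate([units[i]["keys"] for i in picked] + [local_keys])
    values = np.concatenate([units[i]["values"] for i in picked] + [local_values])
    return attend(query, keys, values), picked
```

In this sketch the attention cost per query depends on `top_k * unit_size` plus the local window size rather than on the full sequence length, which is the general reason a look-up step can keep long inputs tractable.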

Code & Models

Repositories

thunlp/infllm (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications