InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun

TL;DR
InfLLM is a training-free method that enhances large language models' ability to process extremely long sequences by using an efficient memory mechanism, avoiding costly retraining.
Contribution
This paper introduces InfLLM, a novel memory-based approach that enables LLMs to handle long sequences without additional training or fine-tuning.
Findings
InfLLM achieves performance comparable to models that were continually trained on long sequences.
It effectively captures long-distance dependencies up to 1,024K tokens.
The method requires no additional training or fine-tuning.
Abstract
Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs (e.g., LLM-driven agents). However, existing LLMs, pre-trained on sequences with a restricted maximum length, cannot process longer sequences due to the out-of-domain and distraction issues. Common solutions often involve continual pre-training on longer sequences, which introduces expensive computational overhead and uncontrollable changes in model capabilities. In this paper, we unveil the intrinsic capacity of LLMs for understanding extremely long sequences without any fine-tuning. To this end, we introduce a training-free memory-based method, InfLLM. Specifically, InfLLM stores distant contexts in additional memory units and employs an efficient mechanism to look up token-relevant units for attention computation. Thereby, InfLLM allows LLMs to efficiently process…
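The memory idea described in the abstract (store distant context in units, look up the units relevant to the current token, then attend over them) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed `unit_size`, the use of a mean key as each unit's summary, and the dot-product relevance score are all assumptions chosen to keep the example small.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def split_into_units(keys, values, unit_size):
    """Chunk distant key/value vectors into fixed-size memory units."""
    units = []
    for start in range(0, len(keys), unit_size):
        k = keys[start:start + unit_size]
        v = values[start:start + unit_size]
        dim = len(k[0])
        # Assumption: a unit is summarized by the mean of its keys.
        summary = [sum(vec[d] for vec in k) / len(k) for d in range(dim)]
        units.append({"start": start, "keys": k, "values": v, "summary": summary})
    return units

def lookup_units(query, units, top_k):
    """Pick the top_k units whose summary scores highest against the query."""
    ranked = sorted(units, key=lambda u: dot(query, u["summary"]), reverse=True)
    # Return the selected units in their original positional order.
    return sorted(ranked[:top_k], key=lambda u: u["start"])

def attend(query, keys, values):
    """Softmax attention of a single query over the given keys/values."""
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def memory_attention(query, distant_keys, distant_values,
                     local_keys, local_values, unit_size=2, top_k=1):
    """Attend over the retrieved memory units plus the local window only."""
    units = split_into_units(distant_keys, distant_values, unit_size)
    selected = lookup_units(query, units, top_k)
    keys = [k for u in selected for k in u["keys"]] + local_keys
    values = [v for u in selected for v in u["values"]] + local_values
    return attend(query, keys, values)
```

In this sketch the attention cost per token depends on `top_k * unit_size` plus the local window size rather than on the full context length, which is the efficiency argument the abstract gestures at.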
Taxonomy
Topics: Neural Networks and Applications
