XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference

Shengnan Wang, Youhui Bai, Lin Zhang, Pingyi Zhou, Shixiong Zhao, Gong Zhang, Sen Wang, Renhai Chen, Hua Xu, Hongwei Sun

arXiv:2405.17755 · cs.CL · May 29, 2024

TL;DR

XL3M is a training-free framework that enables large language models to handle extremely long sequences by decomposing the input into short segments and constructing a key context from the relevant ones, significantly improving length generalization without additional training.

Contribution

The paper introduces XL3M, a novel training-free method allowing LLMs to reason over much longer sequences by segmenting inputs and selecting relevant segments, without further training or fine-tuning.
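The segment-and-select idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: fixed-length segmentation, the helper names `split_into_segments` and `build_key_context`, the top-`k` cutoff, and the scoring function are all hypothetical choices made for the example.

```python
from typing import Callable, List, Sequence

# Hypothetical helpers: segment length, k and the score function are assumptions,
# not details taken from the paper.


def split_into_segments(tokens: Sequence[int], seg_len: int) -> List[List[int]]:
    """Split a long token sequence into consecutive segments of at most seg_len tokens."""
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    return [list(tokens[i:i + seg_len]) for i in range(0, len(tokens), seg_len)]


def build_key_context(
    segments: List[List[int]],
    score: Callable[[List[int]], float],
    k: int,
) -> List[int]:
    """Keep the k highest-scoring segments and concatenate them in their original order."""
    ranked = sorted(range(len(segments)), key=lambda i: score(segments[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore the original (reading) order of the kept segments
    return [tok for i in keep for tok in segments[i]]


segs = split_into_segments(list(range(10)), 4)
print(segs)                              # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(build_key_context(segs, sum, 2))   # [4, 5, 6, 7, 8, 9]
```

With `score=sum` the segment scores are 6, 22 and 17, so the two highest-scoring segments are the second and third. Re-sorting the kept indices is a design choice in this sketch that keeps the selected text in its original reading order.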

Findings

1. Llama2-7B can reason over 20 million tokens using XL3M.
2. XL3M outperforms existing methods in length generalization benchmarks.
3. The framework operates efficiently on standard hardware without additional training.

Abstract

The length generalization failure problem, namely that a large language model (LLM) fails to generalize to texts longer than its maximum training length, greatly restricts the application of LLMs in scenarios with streaming long inputs. Existing methods for addressing this problem either incur substantial costs or introduce precision loss. In this paper, we empirically find that the accuracy of an LLM's prediction is highly correlated with its certainty. Based on this, we propose an efficient training-free framework, named XL3M (extra-long large language model), which enables LLMs trained on short sequences to reason over extremely long sequences without any further training or fine-tuning. Under the XL3M framework, the input context is first decomposed into multiple short sub-contexts, where each sub-context contains an independent segment and a common "question", which is…
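The abstract is cut off before it specifies how certainty is measured or how segments are chosen. The sketch below assumes one plausible reading: each sub-context is a segment followed by the common question, certainty is the negative mean entropy of the model's predicted distributions at the question positions, and segments are ranked by that score. The model interface `ProbFn`, the `toy_probs` stand-in, and the entropy-based measure are assumptions for illustration, not details confirmed by the text above.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical model interface: given a token sequence and a count n, return one
# probability distribution over the vocabulary for each of the last n positions.
ProbFn = Callable[[List[int], int], List[List[float]]]


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def certainty(sub_context: List[int], n_question: int, probs: ProbFn) -> float:
    """Assumed certainty measure: negative mean entropy over the question positions."""
    dists = probs(sub_context, n_question)
    return -sum(entropy(d) for d in dists) / len(dists)


def rank_segments(segments: List[List[int]], question: List[int], probs: ProbFn) -> List[int]:
    """Return segment indices ordered from most to least certain."""
    # Each sub-context = one independent segment + the common question.
    scores = [certainty(seg + question, len(question), probs) for seg in segments]
    return sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)


# Toy stand-in for an LLM (hypothetical): confident whenever token 7 is present.
def toy_probs(context: List[int], n: int) -> List[List[float]]:
    peaked = [0.97, 0.01, 0.01, 0.01]
    uniform = [0.25, 0.25, 0.25, 0.25]
    return [peaked if 7 in context else uniform for _ in range(n)]


print(rank_segments([[1, 2], [7, 3], [4, 5]], [9, 9], toy_probs))  # [1, 0, 2]
```

Taking the first few indices from such a ranking and concatenating those segments would give a key context short enough for the model's training length; how many segments to keep is left open in this sketch.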

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Mathematics, Computing, and Information Processing