A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)
Yan Li, Tianyi Zhang, Zechuan Li, Soyeon Caren Han

TL;DR
GALI is a training-free method that lets large language models handle inputs longer than their training context window by interpolating attention logits, improving stability and performance.
Contribution
We introduce GALI, a novel training-free approach that reuses pretrained positional intervals and interpolates attention logits to improve length extrapolation in LLMs.
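The core idea of interpolating attention logits, rather than position ids, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a rotary (RoPE-style) positional encoding, which the summary here does not specify, and the helper names `rope_rotate`, `logit_at`, and `interpolated_logit` are hypothetical. It only shows how a logit at a fractional relative distance might be obtained from the logits at the two neighbouring integer distances the model saw during pretraining; the greedy reuse of pretrained positional intervals is not modelled.

```python
import math

def rope_rotate(x, pos, base=10000.0):
    """Rotate vector x (even length) by a RoPE-style angle for position pos.
    Assumption: a rotary encoding; the source does not name the scheme."""
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        angle = pos * base ** (-i / half)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out

def logit_at(q, k, rel_dist):
    """Attention logit for an integer relative distance.
    With rotary encodings the logit depends only on the relative distance,
    so rotating q by rel_dist and leaving k unrotated is sufficient."""
    rq = rope_rotate(q, rel_dist)
    return sum(a * b for a, b in zip(rq, k))

def interpolated_logit(q, k, frac_dist):
    """Hypothetical helper: linearly interpolate between the logits at the
    two nearest integer distances instead of feeding a fractional position
    into the positional encoding directly."""
    lo, hi = math.floor(frac_dist), math.ceil(frac_dist)
    if lo == hi:
        return logit_at(q, k, lo)
    w = frac_dist - lo
    return (1.0 - w) * logit_at(q, k, lo) + w * logit_at(q, k, hi)

# Illustrative vectors only; they are not taken from the paper.
q = [1.0, 0.5, -0.3, 0.8]
k = [0.2, -0.1, 0.4, 0.9]
print(interpolated_logit(q, k, 3.5))
```

Because the interpolated value is a convex combination of two logits computed at in-distribution distances, it always lies between them, which gives some intuition for why interpolating at the logit level can avoid outliers.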
Findings
GALI outperforms existing methods across a wide range of long-context tasks, without input-length-specific tuning.
Interpolation within narrower positional ranges improves performance.
LLMs interpret positional intervals unevenly, affecting extrapolation.
Abstract
Transformer-based Large Language Models (LLMs) struggle with inputs exceeding their training context window due to positional out-of-distribution (O.O.D.) issues that disrupt attention. Existing solutions, including fine-tuning and training-free methods, face challenges like inefficiency, redundant interpolation, logit outliers, or loss of local positional information. We propose Greedy Attention Logit Interpolation (GALI), a training-free method that improves length extrapolation by greedily reusing pretrained positional intervals and interpolating attention logits to eliminate outliers. GALI achieves stable and superior performance across a wide range of long-context tasks without requiring input-length-specific tuning. Our analysis further reveals that LLMs interpret positional intervals unevenly and that restricting interpolation to narrower ranges improves performance, even on…
Taxonomy
Topics: Speech Recognition and Synthesis · Machine Learning and Data Classification · Neural Networks and Applications
Methods: Softmax · Attention Is All You Need
