A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)

Yan Li; Tianyi Zhang; Zechuan Li; Soyeon Caren Han

arXiv:2502.02659 · cs.CL · June 2, 2025



TL;DR

GALI is a training-free method that enhances large language models' ability to handle longer inputs by interpolating attention logits, improving stability and performance without additional training.

Contribution

We introduce GALI, a novel training-free approach that reuses pretrained positional intervals and interpolates attention logits to improve length extrapolation in LLMs.

Findings

01. GALI outperforms existing methods on long-context tasks.

02. Interpolation within narrower positional ranges improves performance.

03. LLMs interpret positional intervals unevenly, affecting extrapolation.

Abstract

Transformer-based Large Language Models (LLMs) struggle with inputs that exceed their training context window because positional out-of-distribution (O.O.D.) issues disrupt attention. Existing solutions, including fine-tuning and training-free methods, face challenges such as inefficiency, redundant interpolation, logit outliers, or loss of local positional information. We propose Greedy Attention Logit Interpolation (GALI), a training-free method that improves length extrapolation by greedily reusing pretrained positional intervals and interpolating attention logits to eliminate outliers. GALI achieves stable and superior performance across a wide range of long-context tasks without requiring input-length-specific tuning. Our analysis further reveals that LLMs interpret positional intervals unevenly and that restricting interpolation to narrower ranges improves performance, even on…
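
The idea of interpolating attention logits, rather than positions, can be illustrated with a small sketch. This is not the authors' implementation: it assumes RoPE-style attention, and the function names `rope_logit` and `interpolated_logit` are hypothetical. It shows only one plausible reading of the interpolation step, in which the logit at a fractional relative distance is a linear blend of the logits at the two neighboring integer distances. The greedy assignment of pretrained positional intervals described in the abstract is not reproduced here.

```python
import math


def rope_logit(q, k, rel_pos, base=10000.0):
    """Dot product of RoPE-rotated q and k at an integer relative distance.

    Assumption: standard RoPE, where each consecutive pair of dimensions
    is rotated by an angle proportional to the position.
    """
    d = len(q)
    total = 0.0
    for i in range(0, d, 2):
        theta = rel_pos * base ** (-i / d)
        a = q[i] * k[i] + q[i + 1] * k[i + 1]  # real part of q * conj(k)
        b = q[i + 1] * k[i] - q[i] * k[i + 1]  # imaginary part of q * conj(k)
        total += a * math.cos(theta) - b * math.sin(theta)
    return total


def interpolated_logit(q, k, frac_rel_pos):
    """Hypothetical helper: blend the logits at the two nearest integer distances.

    Both neighbors are integer distances, so each logit is computed with
    positions of the kind the model saw during pretraining.
    """
    lo = math.floor(frac_rel_pos)
    hi = math.ceil(frac_rel_pos)
    if lo == hi:
        return rope_logit(q, k, lo)
    w = frac_rel_pos - lo
    return (1 - w) * rope_logit(q, k, lo) + w * rope_logit(q, k, hi)
```

For example, `interpolated_logit(q, k, 2.5)` returns the average of the logits at distances 2 and 3, whereas rotating `q` and `k` directly by a fractional position would feed the model an angle it never encountered in training.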


Code & Models

Repositories

academycityl/gali (PyTorch, official)


Taxonomy

Topics: Speech Recognition and Synthesis · Machine Learning and Data Classification · Neural Networks and Applications

Methods: Softmax · Attention Is All You Need