Long-Short Alignment for Effective Long-Context Modeling in LLMs

Tianqi Du; Haotian Huang; Yifei Wang; Yisen Wang

arXiv:2506.11769·cs.CL·June 16, 2025


TL;DR

This paper introduces a novel approach to improving long-context modeling in large language models: aligning the model's output distributions across input sequences of different lengths, which leads to better length generalization.

Contribution

The paper proposes the concept of long-short alignment, introduces a metric called Long-Short Misalignment, and develops a regularization method that promotes this alignment to enhance length generalization in LLMs.
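To make the metric and the regularizer concrete, here is a minimal Python sketch under stated assumptions. It assumes (our assumption, not necessarily the paper's exact definition) that Long-Short Misalignment is measured as the symmetric KL divergence between the model's next-token distribution given a full input and given only the last `short_len` tokens of that input, and that the regularizer adds this term to the training loss with a weight `lam`. The names `long_short_misalignment`, `regularized_loss`, `short_len`, `lam`, and the two toy models are hypothetical illustrations.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical model interface (an assumption for illustration): maps a token
# sequence to a next-token probability distribution over a fixed vocabulary.
Model = Callable[[Sequence[int]], List[float]]


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Average of KL(p || q) and KL(q || p)."""
    return 0.5 * (kl(p, q) + kl(q, p))


def long_short_misalignment(model: Model, tokens: Sequence[int], short_len: int) -> float:
    """Assumed metric: divergence between the prediction given the full
    sequence and the prediction given only its last `short_len` tokens."""
    if not 0 < short_len < len(tokens):
        raise ValueError("short_len must be between 1 and len(tokens) - 1")
    p_long = model(tokens)
    p_short = model(tokens[-short_len:])
    return symmetric_kl(p_long, p_short)


def regularized_loss(task_loss: float, model: Model, tokens: Sequence[int],
                     short_len: int, lam: float = 0.1) -> float:
    """Assumed objective: task loss plus a weighted misalignment penalty.
    Plain floats here; a real trainer would use differentiable tensors."""
    return task_loss + lam * long_short_misalignment(model, tokens, short_len)


# Toy models for a usage check (hypothetical).
def last_token_model(tokens: Sequence[int]) -> List[float]:
    # Depends only on the last token, so long and short inputs always agree.
    return [0.8, 0.2] if tokens[-1] == 0 else [0.2, 0.8]


def length_sensitive_model(tokens: Sequence[int]) -> List[float]:
    # Drifts toward token 0 as the input grows, so long and short inputs disagree.
    p = min(0.9, 0.5 + 0.01 * len(tokens))
    return [p, 1.0 - p]


tokens = [0, 1] * 20  # 40 tokens
aligned = long_short_misalignment(last_token_model, tokens, short_len=10)  # 0.0
misaligned = long_short_misalignment(length_sensitive_model, tokens, short_len=10)
```

In this toy setting, a model whose predictions ignore input length scores zero, while one whose predictions drift with length scores above zero. A practical version would presumably average the divergence over many sequences and truncation lengths and compute it on model logits inside the training loop.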

Findings

1. Long-Short Misalignment correlates with length generalization performance.
2. Regularization promoting long-short alignment improves model performance on long sequences.
3. The approach offers a new perspective beyond positional encodings for long-context modeling.

Abstract

Large language models (LLMs) have exhibited impressive performance and surprising emergent properties. However, their effectiveness remains limited by the fixed context window of the transformer architecture, posing challenges for long-context modeling. Among these challenges, length generalization (the ability to generalize to sequences longer than those seen during training) is a classical and fundamental problem. In this work, we propose a fresh perspective on length generalization, shifting the focus from the conventional emphasis on input features such as positional encodings or data structures to the output distribution of the model. Specifically, through case studies on synthetic tasks, we highlight the critical role of long-short alignment: the consistency of output distributions across sequences of varying lengths. Extending this insight to natural language…

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Focus