Long-Short Alignment for Effective Long-Context Modeling in LLMs
Tianqi Du, Haotian Huang, Yifei Wang, Yisen Wang

TL;DR
This paper improves long-context modeling in large language models by aligning the model's output distributions across different sequence lengths, which leads to better length generalization.
Contribution
The paper proposes the concept of long-short alignment, introduces a metric called Long-Short Misalignment, and develops a regularization method to enhance length generalization in LLMs.
Findings
Long-Short Misalignment correlates with length generalization performance.
Regularization promoting long-short alignment improves model performance on long sequences.
The approach offers a new perspective beyond positional encodings for long-context modeling.
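The idea behind the findings above can be illustrated with a minimal sketch. The summary does not give the paper's exact definition of Long-Short Misalignment, so everything here is an assumption: the sketch uses a symmetric KL divergence between the next-token distributions a model produces for a long input and for a shortened version of it, and adds that quantity to a task loss with a hypothetical weight `alpha`. The function names (`misalignment`, `regularized_loss`) are illustrative and not taken from the paper.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def symmetric_kl(p, q, eps=1e-12):
    """Symmetrized KL divergence between distributions along the last axis."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    kl_pq = np.sum(p * np.log(p / q), axis=-1)
    kl_qp = np.sum(q * np.log(q / p), axis=-1)
    return 0.5 * (kl_pq + kl_qp)

def misalignment(logits_long, logits_short):
    """Assumed stand-in for a long-short misalignment score.

    logits_long:  output logits for the full-length inputs, shape (batch, vocab)
    logits_short: output logits for shortened versions of the same inputs
    Returns the mean symmetric KL between the two output distributions.
    """
    p_long = softmax(np.asarray(logits_long, dtype=float))
    p_short = softmax(np.asarray(logits_short, dtype=float))
    return float(np.mean(symmetric_kl(p_long, p_short)))

def regularized_loss(task_loss, logits_long, logits_short, alpha=0.1):
    """Task loss plus an alignment penalty; alpha is a hypothetical weight."""
    return task_loss + alpha * misalignment(logits_long, logits_short)
```

Under these assumptions, a score of zero means the model predicts the same distribution regardless of input length, and a larger score means its predictions drift as the input grows. The regularizer penalizes that drift during training.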
Abstract
Large language models (LLMs) have exhibited impressive performance and surprising emergent properties. However, their effectiveness remains limited by the fixed context window of the transformer architecture, posing challenges for long-context modeling. Among these challenges, length generalization -- the ability to generalize to sequences longer than those seen during training -- is a classical and fundamental problem. In this work, we propose a fresh perspective on length generalization, shifting the focus from the conventional emphasis on input features such as positional encodings or data structures to the output distribution of the model. Specifically, through case studies on synthetic tasks, we highlight the critical role of long-short alignment -- the consistency of output distributions across sequences of varying lengths. Extending this insight to natural language…
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Focus
