LongAlign: A Recipe for Long Context Alignment of Large Language Models
Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li

TL;DR
LongAlign presents a comprehensive approach for training large language models to effectively understand and generate long-context sequences, combining new datasets, training strategies, and evaluation benchmarks.
Contribution
It introduces a novel recipe including data construction, training techniques, and evaluation methods specifically for long context alignment in large language models.
Findings
Outperforms existing methods by up to 30% on long context tasks.
Maintains proficiency in short, generic tasks.
Provides open-source code, data, and models.
Abstract
Extending large language models to effectively handle long contexts requires instruction fine-tuning on input sequences of similar length. To address this, we present LongAlign -- a recipe of the instruction data, training, and evaluation for long context alignment. First, we construct a long instruction-following dataset using Self-Instruct. To ensure the data diversity, it covers a broad range of tasks from various long context sources. Second, we adopt the packing and sorted batching strategies to speed up supervised fine-tuning on data with varied length distributions. Additionally, we develop a loss weighting method to balance the contribution to the loss across different sequences during packing training. Third, we introduce the LongBench-Chat benchmark for evaluating instruction-following capabilities on queries of 10k-100k in length. Experiments show that LongAlign outperforms…
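The batching and loss-weighting ideas named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the first-fit packing heuristic, and the specific weighting (average each sequence's token losses first, then average across sequences) are assumptions chosen to show why a plain token-level mean lets sequences with many target tokens dominate a packed batch.

```python
from typing import List, Sequence


def sorted_batches(lengths: Sequence[int], batch_size: int) -> List[List[int]]:
    """Sorted batching: group example indices so each batch holds
    examples of similar length, which reduces padding waste."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def pack_first_fit(lengths: Sequence[int], max_len: int) -> List[List[int]]:
    """Packing (hypothetical first-fit heuristic): place each example in
    the first pack that still has room, opening a new pack otherwise."""
    packs: List[List[int]] = []
    room: List[int] = []  # remaining token budget of each pack
    for idx, n in enumerate(lengths):
        if n > max_len:
            raise ValueError(f"example {idx} exceeds max_len")
        for p, free in enumerate(room):
            if n <= free:
                packs[p].append(idx)
                room[p] -= n
                break
        else:  # no existing pack had room
            packs.append([idx])
            room.append(max_len - n)
    return packs


def token_mean_loss(token_losses: Sequence[Sequence[float]]) -> float:
    """Unweighted baseline: mean over all target tokens in the batch.
    Sequences with more target tokens contribute more."""
    flat = [x for seq in token_losses for x in seq]
    return sum(flat) / len(flat)


def sequence_balanced_loss(token_losses: Sequence[Sequence[float]]) -> float:
    """Assumed weighting: mean of per-sequence mean losses, so every
    sequence contributes equally regardless of its target length."""
    per_seq = [sum(seq) / len(seq) for seq in token_losses]
    return sum(per_seq) / len(per_seq)


if __name__ == "__main__":
    lengths = [60, 50, 40, 30]
    print(sorted_batches(lengths, batch_size=2))
    print(pack_first_fit(lengths, max_len=100))
    # One sequence with four target tokens, one with a single token.
    losses = [[1.0, 1.0, 1.0, 1.0], [3.0]]
    print(token_mean_loss(losses), sequence_balanced_loss(losses))
```

In the toy batch, the short sequence's higher loss is diluted under the token-level mean but counts equally under the balanced version. A real training loop would compute the same quantities on tensors and would also need attention masking to keep packed sequences from attending to each other.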
Code & Models
- 🤗 zai-org/LongAlign-7B-64k (model) · 28 dl · ♡ 3
- 🤗 zai-org/LongAlign-6B-64k (model) · 23 dl · ♡ 2
- 🤗 zai-org/LongAlign-13B-64k (model) · 18 dl · ♡ 13
- 🤗 zai-org/LongAlign-6B-64k-base (model) · 20 dl · ♡ 5
- 🤗 zai-org/LongAlign-7B-64k-base (model) · 17 dl · ♡ 4
- 🤗 zai-org/LongAlign-13B-64k-base (model) · 11 dl · ♡ 3
- 🤗 MaziyarPanahi/LongAlign-13B-64k-GGUF (model) · 189 dl · ♡ 3
- 🤗 MaziyarPanahi/LongAlign-13B-64k-AWQ (model) · 7 dl · ♡ 2
- 🤗 MaziyarPanahi/LongAlign-13B-64k-GPTQ (model) · 5 dl · ♡ 1
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
