LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
Guanzheng Chen, Xin Li, Michael Qizhe Shieh, Lidong Bing

TL;DR
LongPO enables large language models to self-improve on long-context tasks by internally transferring short-context capabilities, effectively balancing performance across different context lengths without extensive human annotation.
Contribution
The paper introduces LongPO, a novel method allowing LLMs to self-evolve for long-context tasks through internal preference learning, reducing reliance on human annotations.
Findings
LongPO retains short-context performance while improving long-context capabilities.
Models trained with LongPO outperform naive fine-tuning and DPO in both short- and long-context settings.
LongPO-trained models achieve results comparable to or better than GPT-4-128K on long-context benchmarks.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities through pretraining and alignment. However, superior short-context LLMs may underperform in long-context scenarios due to insufficient long-context alignment. This alignment process remains challenging due to the impracticality of human annotation for extended contexts and the difficulty in balancing short- and long-context performance. To address these challenges, we introduce LongPO, which enables short-context LLMs to self-evolve to excel on long-context tasks by internally transferring short-context capabilities. LongPO harnesses LLMs to learn from self-generated short-to-long preference data, comprising paired responses generated for identical instructions with long-context inputs and their compressed short-context counterparts, respectively. This preference reveals capabilities and potentials of LLMs cultivated…
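The pairing described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the response generated from the compressed short context is treated as the preferred ("chosen") one, that both responses are scored against the long-context input during training, and that the objective is the plain DPO loss listed under Methods. The names `PreferencePair`, `build_pair`, `generate`, and `dpo_loss` are hypothetical, and any additional terms in the full LongPO objective are not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    instruction: str
    long_context: str
    chosen: str    # response generated from the compressed short context (assumed preferred)
    rejected: str  # response generated from the full long context


def build_pair(instruction, long_context, short_context, generate):
    """Build one short-to-long pair.

    `generate(context, instruction)` is a hypothetical callable wrapping the model.
    """
    chosen = generate(short_context, instruction)
    rejected = generate(long_context, instruction)
    return PreferencePair(instruction, long_context, chosen, rejected)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO loss for one pair, given summed sequence log-probabilities.

    Assumption: all four log-probabilities are computed with the long context
    as the conditioning input.
    """
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log(sigmoid(margin)) equals softplus(-margin)
    return softplus(-margin)
```

When the policy assigns relatively more probability to the chosen response than the reference model does, the margin is positive and the loss falls below log 2; the opposite preference pushes it above log 2.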
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Direct Preference Optimization · Shrink and Fine-Tune
