LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

Guanzheng Chen; Xin Li; Michael Qizhe Shieh; Lidong Bing

arXiv:2502.13922 · cs.CL · March 4, 2025

TL;DR

LongPO enables large language models to self-improve on long-context tasks by internally transferring short-context capabilities, effectively balancing performance across different context lengths without extensive human annotation.

Contribution

The paper introduces LongPO, a method that lets LLMs self-evolve on long-context tasks by learning from self-generated preference data, reducing reliance on human annotation.

Findings

1. LongPO retains short-context performance while improving long-context capabilities.

2. Models trained with LongPO outperform naive fine-tuning and DPO in both short- and long-context settings.

3. LongPO-trained models achieve results comparable to or better than GPT-4-128K on long-context benchmarks.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities through pretraining and alignment. However, superior short-context LLMs may underperform in long-context scenarios due to insufficient long-context alignment. This alignment process remains challenging because human annotation is impractical for extended contexts and because short- and long-context performance are difficult to balance. To address these challenges, we introduce LongPO, which enables short-context LLMs to self-evolve to excel on long-context tasks by internally transferring short-context capabilities. LongPO harnesses LLMs to learn from self-generated short-to-long preference data, comprising paired responses generated for identical instructions with long-context inputs and their compressed short-context counterparts, respectively. This preference reveals capabilities and potentials of LLMs cultivated…
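The short-to-long preference data described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released LongPO code: it assumes the response generated from the compressed short context is the preferred (chosen) one and the response generated from the full long input is the rejected one, uses a hypothetical `generate` callable in place of sampling from the model, and scores a pair with the standard DPO loss, -log σ(β · margin), on scalar sequence log-probabilities. LongPO's full objective, whose description is truncated above, may include terms beyond plain DPO.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class PreferencePair:
    instruction: str
    long_context: str  # full long input the policy is trained on
    chosen: str        # assumption: response generated from the compressed short context
    rejected: str      # assumption: response generated from the full long context


def build_pair(instruction: str, long_context: str, short_context: str,
               generate: Callable[[str], str]) -> PreferencePair:
    """Hypothetical helper: `generate` stands in for sampling from the short-context LLM."""
    chosen = generate(f"{short_context}\n\n{instruction}")
    rejected = generate(f"{long_context}\n\n{instruction}")
    return PreferencePair(instruction, long_context, chosen, rejected)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * margin)."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(z)) == softplus(-z)
    return softplus(-beta * margin)
```

When the policy and reference assign identical log-probabilities the loss is log 2; it falls below log 2 as the policy raises the chosen response's likelihood relative to the rejected one. Using `softplus` instead of computing the sigmoid directly keeps large margins from overflowing.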

Code & Models

Repositories

DAMO-NLP-SG/LongPO
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Direct Preference Optimization · Shrink and Fine-Tune