PolicyLong: Towards On-Policy Context Extension

Junlong Jia; Ziyang Chen; Xing Wu; Chaochen Gao; TingHao Yu; Feng Zhang; Songlin Hu

arXiv:2604.07809 · cs.LG · April 10, 2026

TL;DR

PolicyLong introduces an on-policy data construction method for extending LLM context windows, dynamically adapting to the model's evolving capabilities to improve long-context learning.

Contribution

PolicyLong proposes an iterative on-policy data screening approach that re-runs data selection with the current model, so the training data stays aligned with the model's state and long-context performance improves.

Findings

01

PolicyLong outperforms EntropyLong and NExtLong across multiple benchmarks.

02

Significant gains are observed at longer context lengths, e.g., +2.54 at 128K on RULER.

03

Dynamic on-policy data evolution improves the model's ability to handle long contexts.

Abstract

Extending LLM context windows is hindered by scarce high-quality long-context data. Recent methods synthesize data with genuine long-range dependencies via information-theoretic verification, selecting contexts that reduce a base model's predictive entropy. However, their single-pass offline construction with a fixed model creates a fundamental off-policy gap: the static screening landscape misaligns with the model's evolving capabilities, causing the training distribution to drift. We propose PolicyLong, shifting data construction towards a dynamic on-policy paradigm. By iteratively re-executing data screening (entropy computation, retrieval, and verification) using the current model, PolicyLong ensures the training distribution tracks evolving capabilities, yielding an emergent self-curriculum. Crucially, both positive and hard negative contexts derive from the current model's entropy…
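The screening loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `predict`, `update`, `screen_contexts`, and `threshold` are hypothetical, the retrieval step is assumed to have already produced a list of candidate contexts, and hard-negative selection is left out. The sketch only shows the core idea that a context is kept when it lowers the current model's predictive entropy on a target, and that screening is repeated after each model update.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: predict(context, target) returns the model's
# next-token probability distribution for `target` given `context`.
Predict = Callable[[str, str], Sequence[float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_reduction(predict: Predict, target: str, context: str) -> float:
    """How much prepending `context` lowers the model's entropy on `target`."""
    return entropy(predict("", target)) - entropy(predict(context, target))


def screen_contexts(predict: Predict, target: str,
                    candidates: List[str], threshold: float) -> List[str]:
    """Verification step: keep candidates whose entropy reduction meets the
    (assumed) threshold under the model passed in."""
    return [c for c in candidates
            if entropy_reduction(predict, target, c) >= threshold]


def on_policy_construction(predict: Predict,
                           update: Callable[[Predict, Dict[str, List[str]]], Predict],
                           targets: List[str], candidates: List[str],
                           threshold: float, rounds: int) -> List[Dict[str, List[str]]]:
    """Re-run screening with the current model in every round.

    `update` is a stand-in for a training stage: it takes the current model
    and the screened data and returns the next model.
    """
    history = []
    for _ in range(rounds):
        data = {t: screen_contexts(predict, t, candidates, threshold)
                for t in targets}
        history.append(data)
        predict = update(predict, data)  # later rounds screen with the new model
    return history
```

An off-policy pipeline in this framing would call `screen_contexts` once with the base model and reuse the result for all training stages; the on-policy variant differs only in that the selection is recomputed after each `update`, so contexts that no longer reduce the updated model's entropy drop out of the training data.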
