OPSDL: On-Policy Self-Distillation for Long-Context Language Models

Xinsen Zhang; Zhenkai Ding; Tianjun Pan; Run Yang; Chun Kang; Xue Xiong; Jingnan Gu

arXiv:2604.17535 · cs.CL · April 21, 2026

TL;DR

OPSDL is an on-policy self-distillation technique that improves long-context understanding in large language models by using the model's own short-context capability as a teacher, without sacrificing short-context performance.

Contribution

This paper presents OPSDL, a novel self-distillation method that uses the model's own short-context abilities as a teacher to improve long-context performance efficiently and stably.

Findings

01. OPSDL outperforms standard post-training methods such as SFT and DPO.

02. The method improves long-context performance across models from 7B to 32B parameters.

03. OPSDL maintains short-context performance while enhancing long-context understanding.

Abstract

Extending the effective context length of large language models (LLMs) remains a central challenge for real-world applications. While recent post-training methods have made progress in long-context scaling, they rely either on high-quality supervision data or on sparse sequence-level rewards, leading to unstable and inefficient optimization. We propose OPSDL, an On-Policy Self-Distillation method for enhancing the Long-context capabilities of LLMs. Unlike other recent self-distillation methods that inject privileged information and rely on the model's in-context learning ability to act as a teacher, OPSDL leverages the model's own inherently strong short-context capability as a self-teacher to supervise its own generation in long-context scenarios. The model first generates responses conditioned on the full long context; the self-teacher then provides per-token supervision signals via…
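To make the idea of per-token supervision concrete, the following is a minimal sketch and not the authors' implementation. The abstract is cut off before it names the supervision signal, so the reverse KL divergence used here, KL(student || teacher), is an assumption borrowed from common on-policy distillation practice. The function names and the toy logits are hypothetical. In a real setup the student logits would come from the model conditioned on the full long context and the teacher logits from the same model conditioned on a shorter context, both scored on the response the student generated.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def per_token_reverse_kl(student_logits, teacher_logits):
    """Return KL(student || teacher) for each response position.

    Assumption: reverse KL is the per-token signal; the source text
    is truncated before it states which signal is actually used.
    """
    losses = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        s_logp = log_softmax(s_row)
        t_logp = log_softmax(t_row)
        kl = sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s_logp, t_logp))
        losses.append(kl)
    return losses


# Toy example: 2 response tokens, vocabulary of 3 (hypothetical numbers).
# Position 0: student and teacher agree. Position 1: they disagree.
student = [[2.0, 0.5, 0.1], [0.2, 0.2, 0.2]]
teacher = [[2.0, 0.5, 0.1], [1.5, 0.1, 0.1]]

token_losses = per_token_reverse_kl(student, teacher)
mean_loss = sum(token_losses) / len(token_losses)
```

Unlike a sequence-level reward, this kind of signal gives one value per generated token: positions where the long-context student already matches the short-context teacher contribute nothing, and positions where they diverge contribute a positive loss.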
