OPSDL: On-Policy Self-Distillation for Long-Context Language Models
Xinsen Zhang, Zhenkai Ding, Tianjun Pan, Run Yang, Chun Kang, Xue Xiong, Jingnan Gu

TL;DR
OPSDL is an on-policy self-distillation technique that improves long-context understanding in large language models by using their own short-context capabilities as the teacher, without sacrificing short-context performance.
Contribution
This paper presents OPSDL, a novel self-distillation method that uses the model's own short-context abilities as a teacher to improve long-context performance efficiently and stably.
Findings
OPSDL outperforms standard post-training methods such as SFT and DPO.
The method improves long-context performance across models from 7B to 32B parameters.
OPSDL maintains short-context performance while enhancing long-context understanding.
Abstract
Extending the effective context length of large language models (LLMs) remains a central challenge for real-world applications. While recent post-training methods have made progress in long-context scaling, they rely either on high-quality supervision data or on sparse sequence-level rewards, leading to unstable and inefficient optimization. We propose OPSDL, an On-Policy Self-Distillation method for enhancing the Long-context capabilities of LLMs. Unlike other recent self-distillation methods that inject privileged information and rely on the model's in-context learning ability to act as a teacher, OPSDL leverages the model's own inherently strong short-context capability as a self-teacher to supervise its own generation in long-context scenarios. The model first generates responses conditioned on the full long context, then the self-teacher provides per-token supervision signals via…
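The abstract is cut off before it names the exact per-token signal, so the following is only a minimal sketch of what per-token supervision between a student and a self-teacher could look like. It assumes a reverse KL divergence, KL(student || teacher), computed at each generated position, which is a common choice in on-policy distillation but is not confirmed by the text above. The function names and the toy logits are hypothetical; in practice both sets of logits would come from the same model scored on the same sampled response under different contexts.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized row."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def per_token_reverse_kl(student_logits, teacher_logits):
    """Return KL(student || teacher) for each token position.

    Assumption (not from the source): the supervision signal is a reverse KL.
    Both arguments are lists of rows, one row of logits per generated token.
    """
    kls = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        s_logp = log_softmax(s_row)
        t_logp = log_softmax(t_row)
        # Expectation under the student distribution of the log-ratio.
        kls.append(sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s_logp, t_logp)))
    return kls


# Toy example: two token positions over a vocabulary of size 2.
student = [[0.0, 0.0], [2.0, 0.0]]  # hypothetical long-context logits
teacher = [[0.0, 0.0], [0.0, 2.0]]  # hypothetical short-context logits
kls = per_token_reverse_kl(student, teacher)
```

In this toy case the first position contributes no loss because the two distributions agree, while the second contributes a positive penalty. That dense, position-level signal is what distinguishes per-token supervision from a single sequence-level reward.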
