On-Policy Context Distillation for Language Models
Tianzhu Ye, Li Dong, Xun Wu, Shaohan Huang, Furu Wei

TL;DR
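This paper introduces On-Policy Context Distillation (OPCD), a framework that trains a student language model on its own generated trajectories to match a context-conditioned teacher, so that in-context knowledge is internalized into the model's parameters. The authors report higher task accuracy than baseline methods and better preservation of out-of-distribution capabilities.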
Contribution
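The paper proposes OPCD, a method that combines on-policy distillation with context distillation: the student learns from its own sampled outputs while being guided by a context-conditioned teacher. The method also supports cross-size distillation, in which a smaller student learns from a larger teacher.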
Findings
OPCD outperforms baseline methods on mathematical reasoning, text-based games, and domain-specific tasks.
Models trained with OPCD better preserve out-of-distribution capabilities.
Cross-size distillation enables smaller models to learn from larger teachers.
Abstract
Context distillation enables language models to internalize in-context knowledge into their parameters. In our work, we propose On-Policy Context Distillation (OPCD), a framework that bridges on-policy distillation with context distillation by training a student model on its own generated trajectories while minimizing reverse Kullback-Leibler divergence against a context-conditioned teacher. We demonstrate the effectiveness of OPCD on two important applications: experiential knowledge distillation, where models extract and consolidate transferable knowledge from their historical solution traces, and system prompt distillation, where models internalize beneficial behaviors encoded in optimized prompts. Across mathematical reasoning, text-based games, and domain-specific tasks, OPCD consistently outperforms baseline methods, achieving higher task accuracy while better preserving…
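The objective described in the abstract, reverse Kullback-Leibler divergence between the student and a context-conditioned teacher on student-generated trajectories, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`log_softmax`, `reverse_kl`, `opcd_loss`), the toy logits, and the choice to average over token positions are all assumptions made for illustration. The sketch also assumes that the student scores the trajectory without the context while the teacher scores the same tokens with the context prepended.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position.

    The expectation is taken under the student distribution, which is
    what makes this the reverse direction.
    """
    log_p = log_softmax(student_logits)   # student, assumed to see no context
    log_q = log_softmax(teacher_logits)   # teacher, assumed to see the context
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def opcd_loss(student_steps, teacher_steps):
    """Mean per-token reverse KL over one student-sampled trajectory.

    Averaging over positions is an assumption of this sketch, not a
    detail confirmed by the source.
    """
    if not student_steps or len(student_steps) != len(teacher_steps):
        raise ValueError("need two non-empty, equally long logit sequences")
    per_token = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(per_token) / len(per_token)

# Hypothetical logits for a two-token trajectory over a three-word vocabulary.
student_steps = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher_steps = [[2.5, 0.0, -1.0], [0.1, 0.2, 0.3]]
loss = opcd_loss(student_steps, teacher_steps)
```

In a real training loop the trajectory would be sampled from the student at each step, and gradients of this loss would flow only into the student parameters. The loss is zero exactly when the student already matches the teacher at every position.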
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning
