Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

Yuchen Cai; Ding Cao; Liang Lin; Chunxi Luo; Xin Xu; Kai Yang; Weijie Liu; Saiyong Yang; Tianxiang Zhao; Guangzhong Sun; Guiquan Liu; Junfeng Fang

arXiv:2605.11739 · cs.CL · May 22, 2026


TL;DR

This paper argues that the efficiency of on-policy distillation (OPD) for large language models stems from a form of foresight: OPD establishes a stable update trajectory toward the final model early in training. It also introduces EffOPD, a method that accelerates this process.

Contribution

The paper uncovers the parameter-level mechanisms behind OPD's efficiency and proposes EffOPD, a simple, plug-and-play acceleration technique that triples training speed without extra modules.

Findings

01

OPD's efficiency stems from stable update trajectories established early in training.

02

EffOPD accelerates OPD by 3x without extra modules.

03

OPD's dominant update subspaces align with the final update subspace early in training.
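The alignment claim in finding 03 can be made concrete with a subspace-overlap measurement. The sketch below is an assumption, not the paper's stated metric: for one weight matrix it takes the update at an intermediate checkpoint, ΔW_t = W_t - W_0, and at the end of training, ΔW_T = W_T - W_0, extracts the top-k left singular vectors of each, and reports the normalized overlap ||U_t^T U_T||_F^2 / k. This equals 1 when the two subspaces coincide and is roughly k/d for unrelated ones. The function names and the synthetic matrices are hypothetical.

```python
import numpy as np

def top_k_left_subspace(delta: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis (d_out x k) spanned by the top-k left singular vectors."""
    u, _, _ = np.linalg.svd(delta, full_matrices=False)
    return u[:, :k]

def subspace_overlap(delta_t: np.ndarray, delta_final: np.ndarray, k: int) -> float:
    """Normalized overlap ||U_t^T U_T||_F^2 / k, in [0, 1].

    1.0 means the two top-k subspaces coincide; random subspaces give about k / d_out.
    """
    u_t = top_k_left_subspace(delta_t, k)
    u_final = top_k_left_subspace(delta_final, k)
    return float(np.linalg.norm(u_t.T @ u_final, "fro") ** 2 / k)

# Synthetic illustration only: no real checkpoints are involved.
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 4
a = rng.standard_normal((d_out, r))
b = rng.standard_normal((r, d_in))
# Hypothetical "final" update: a rank-r signal plus small noise.
delta_final = a @ b + 0.01 * rng.standard_normal((d_out, d_in))
# Hypothetical early update: 20% of the way along the same directions, plus noise.
delta_early = 0.2 * (a @ b) + 0.05 * rng.standard_normal((d_out, d_in))
# Baseline: an unrelated random update.
delta_random = rng.standard_normal((d_out, d_in))

print(subspace_overlap(delta_early, delta_final, k=r))   # close to 1
print(subspace_overlap(delta_random, delta_final, k=r))  # much smaller
```

Computing this overlap for every checkpoint yields a curve over training; a curve that approaches 1 early is what the "foresight" description would predict, though the paper's own measurement may be defined differently.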

Abstract

On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, existing studies largely attribute this advantage to denser and more stable supervision, while the parameter-level mechanisms underlying OPD's efficiency remain poorly understood. In this work, we argue that OPD's efficiency stems from a form of "foresight": it establishes a stable update trajectory toward the final model early in training. This foresight manifests in two aspects. First, at the Module-Allocation Level, OPD identifies regions with low marginal utility and concentrates updates on modules that are more critical to reasoning. Second, at the Update-Direction Level, OPD exhibits stronger low-rank concentration, with its dominant subspaces aligning closely with the final update subspace early in training. Building on these findings, we propose…
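The two levels described in the abstract can be probed with simple per-matrix statistics. The sketch below is a hypothetical illustration, not the paper's procedure (the abstract is truncated here). It uses the relative update norm ||W_t - W_0||_F / ||W_0||_F as a proxy for how updates are allocated across modules, and the fraction of squared Frobenius norm captured by the top-k singular values of W_t - W_0 as a proxy for low-rank concentration. Module names such as attn.q_proj are placeholders, and the weights are synthetic.

```python
import numpy as np

def top_k_energy(delta: np.ndarray, k: int) -> float:
    """Fraction of ||delta||_F^2 captured by the top-k singular values."""
    s = np.linalg.svd(delta, compute_uv=False)  # singular values, descending
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))

def relative_update_norms(w0: dict, wt: dict) -> dict:
    """Per-module ||W_t - W_0||_F / ||W_0||_F (a proxy for update allocation)."""
    return {
        name: float(np.linalg.norm(wt[name] - w0[name]) / np.linalg.norm(w0[name]))
        for name in w0
    }

# Hypothetical modules with synthetic weights; real use would load two checkpoints.
rng = np.random.default_rng(1)
w0 = {
    "attn.q_proj": rng.standard_normal((32, 32)),
    "mlp.up_proj": rng.standard_normal((64, 32)),
}
wt = {
    # Tiny, isotropic (full-rank) update: little allocation, no low-rank structure.
    "attn.q_proj": w0["attn.q_proj"] + 0.001 * rng.standard_normal((32, 32)),
    # Larger rank-1 update: strong allocation, fully low-rank.
    "mlp.up_proj": w0["mlp.up_proj"]
    + 0.1 * np.outer(rng.standard_normal(64), rng.standard_normal(32)),
}

print(relative_update_norms(w0, wt))
for name in w0:
    print(name, top_k_energy(wt[name] - w0[name], k=1))
```

In this toy setup the rank-1 update to mlp.up_proj has a top-1 energy near 1 and a much larger relative norm, while the small isotropic update to attn.q_proj spreads its energy across many directions. With real models one would loop over every linear layer and compare OPD against a baseline trained on the same data.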


Code & Models

Repositories

caiyuchen-ustc/EffOPD (GitHub)
