Near-Policy: Accelerating On-Policy Distillation via Asynchronous Generation and Selective Packing

Miao Rang; Zhenni Bi; Hang Zhou; Kai Han; Xuechun Wang; An Xiao; Xinghao Chen; Yunhe Wang; Hanting Chen

arXiv:2605.05940 · cs.LG · May 8, 2026

TL;DR

Near-Policy Distillation (NPD) accelerates on-policy distillation of autoregressive models by decoupling student generation from training. Sample filtering and sparse student updates keep optimization stable despite the resulting policy lag, and the paper reports an 8.1x speedup over on-policy baselines and an 8.09% gain over supervised fine-tuning.

Contribution

The paper introduces NPD, an asynchronous distillation framework for autoregressive models. It combines $\Delta$-IFD sample filtering with sparse student updates to keep training efficient and stable when generation and training are decoupled.

Findings

1. Achieves 8.1x speedup over on-policy baselines.

2. Outperforms supervised fine-tuning by 8.09%.

3. Enables openPangu-Embedded-1B to reach a state-of-the-art score of 68.73%.

Abstract

Standard knowledge distillation for autoregressive models often suffers from distribution mismatch. While on-policy methods mitigate this by leveraging student-generated outputs, they rely on computationally expensive Reinforcement Learning (RL) frameworks. To improve efficiency, we propose Near-Policy Distillation (NPD), an asynchronous approach that decouples student generation from training. This reformulation enables Supervised Fine-Tuning (SFT) with sequence packing. However, asynchronous updates inevitably introduce policy lag and sample noise, which can cause the behavior to drift from near-policy toward off-policy. To counteract this without sacrificing efficiency, NPD integrates sparse student updates and $\Delta$-IFD filtering, a heuristic sample selection mechanism that empirically stabilizes the optimization trajectory. By filtering extreme out-of-distribution…
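
The abstract credits part of the efficiency gain to SFT-style training with sequence packing. As a rough illustration of what packing means in general, the sketch below groups variable-length samples into fixed-size token budgets with a greedy first-fit-decreasing heuristic. This is an assumption made for illustration, not the paper's packing algorithm; the function name `pack_sequences` and the parameter `max_len` are hypothetical.

```python
from typing import List


def pack_sequences(lengths: List[int], max_len: int) -> List[List[int]]:
    """Group sample indices into bins whose total length fits in max_len.

    Greedy first-fit-decreasing: visit samples from longest to shortest and
    place each one in the first bin that still has room. Illustrative only;
    not taken from the NPD paper.
    """
    # Visit the longest samples first (ties keep their original order).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []   # sample indices assigned to each packed row
    remaining: List[int] = []    # free token budget left in each packed row

    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sample {i} has length {n} > max_len {max_len}")
        for b, free in enumerate(remaining):
            if n <= free:
                bins[b].append(i)
                remaining[b] -= n
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([i])
            remaining.append(max_len - n)
    return bins


if __name__ == "__main__":
    sample_lengths = [5, 3, 4, 2, 6]
    print(pack_sequences(sample_lengths, max_len=8))
```

With the five sample lengths above and a budget of 8 tokens, the samples fit into three packed rows instead of five padded ones. A real training pipeline would also need attention masks or position resets so that packed samples do not attend to each other.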
