Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe

Wenjin Hou; Shangpin Peng; Weinong Wang; Zheng Ruan; Yue Zhang; Zhenglin Zhou; Mingqi Gao; Yifei Chen; Kaiqi Wang; Hongming Yang; Chengquan Zhang; Zhuotao Tian; Han Hu; Yi Yang; Fei Wu; Hehe Fan

arXiv:2605.03677 · cs.LG · May 6, 2026

TL;DR

Uni-OPD is a unified framework for on-policy distillation that improves both the exploration of informative student-generated states and the reliability of teacher supervision, and it applies to large language models as well as multimodal large language models.

Contribution

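The paper proposes Uni-OPD, a unified OPD framework built on a dual-perspective (student and teacher) optimization strategy. It targets two bottlenecks in on-policy distillation for LLMs and MLLMs: insufficient exploration of informative states and unreliable teacher supervision for student rollouts.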

Findings

01. Effective across 5 domains and 16 benchmarks.

02. Improves exploration of informative states during training.

03. Restores order consistency in teacher supervision.

Abstract

On-policy distillation (OPD) has recently emerged as an effective post-training paradigm for consolidating the capabilities of specialized expert models into a single student model. Despite its empirical success, the conditions under which OPD yields reliable improvement remain poorly understood. In this work, we identify two fundamental bottlenecks that limit effective OPD: insufficient exploration of informative states and unreliable teacher supervision for student rollouts. Building on this insight, we propose Uni-OPD, a unified OPD framework that generalizes across Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs), centered on a dual-perspective optimization strategy. Specifically, from the student's perspective, we adopt two data balancing strategies to promote exploration of informative student-generated states during training. From the teacher's…

Code & Models

Repositories

wenjinhou/Uni-OPD (GitHub)
