DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models

Quanhao Li; Junqiu Yu; Kaixun Jiang; Yujie Wei; Zhen Xing; Pandeng Li; Ruihang Chu; Shiwei Zhang; Yu Liu; Zuxuan Wu

arXiv:2605.15055 · cs.LG · May 15, 2026

TL;DR

DiffusionOPD is a multi-task training method for diffusion models that uses on-policy distillation (OPD): task-specific teachers are trained independently, and their capabilities are then distilled into a single unified student, improving both efficiency and performance over multi-reward and cascade RL baselines.

Contribution

The paper presents a new multi-task training paradigm for diffusion models based on on-policy distillation (OPD), lifts OPD from discrete tokens to continuous-state Markov processes, and reports results that surpass multi-reward and cascade RL baselines.
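Why moving OPD from tokens to continuous states can still work from student rollouts alone can be motivated with a standard identity. This is a sketch, not the paper's derivation: assume a student kernel p_θ and a teacher kernel q are Markov chains over x_T, …, x_0 that share the initial distribution p(x_T). The trajectory-level KL then splits into per-step KLs evaluated at states drawn from the student's own marginals, and if both kernels are Gaussian with a shared variance σ_t² (an assumption introduced here, along with the mean functions μ_θ and μ_q), each term has a closed form.

```latex
\begin{aligned}
\mathrm{KL}\big(p_\theta(x_{0:T}) \,\|\, q(x_{0:T})\big)
  &= \sum_{t=1}^{T} \mathbb{E}_{x_t \sim p_\theta}\Big[\mathrm{KL}\big(p_\theta(x_{t-1} \mid x_t) \,\|\, q(x_{t-1} \mid x_t)\big)\Big], \\
\mathrm{KL}\big(\mathcal{N}(\mu_\theta(x_t,t), \sigma_t^2 I) \,\|\, \mathcal{N}(\mu_q(x_t,t), \sigma_t^2 I)\big)
  &= \frac{\lVert \mu_\theta(x_t,t) - \mu_q(x_t,t) \rVert^2}{2\sigma_t^2}.
\end{aligned}
```

Because the expectation runs over the student's own state distribution, the teacher only has to score states the student actually visits, which is the continuous analogue of scoring student-sampled tokens with a teacher.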

Findings

01. DiffusionOPD outperforms multi-reward RL and cascade RL baselines in both efficiency and performance.
02. A theoretical derivation yields a closed-form KL objective that unifies SDE and ODE refinements.
03. The method achieves state-of-the-art benchmark results across multiple tasks.
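The second finding mentions a closed-form KL objective but does not state it here. As background only, the KL divergence between two diagonal Gaussians has a well-known closed form, which is what makes per-step transition KLs cheap to evaluate when both the student's and the teacher's transitions are Gaussian (for example, stochastic sampler steps). This sketch assumes diagonal covariances; the function names are hypothetical, and it does not reproduce the paper's SDE/ODE unification.

```python
import numpy as np


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(N(mu_p, diag(var_p)) || N(mu_q, diag(var_q))), summed over dims."""
    mu_p, var_p, mu_q, var_q = map(np.asarray, (mu_p, var_p, mu_q, var_q))
    return 0.5 * np.sum(np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def shared_var_kl(mu_p, mu_q, var):
    """Special case with one shared isotropic variance: ||mu_p - mu_q||^2 / (2 * var)."""
    diff = np.asarray(mu_p) - np.asarray(mu_q)
    return np.sum(diff ** 2) / (2.0 * var)


# With equal variances the general formula collapses to a scaled squared mean gap.
mu_p, mu_q, var = np.array([0.0, 0.0]), np.array([1.0, 2.0]), 0.5
print(gaussian_kl(mu_p, np.full(2, var), mu_q, np.full(2, var)))  # 5.0
print(shared_var_kl(mu_p, mu_q, var))  # 5.0
```

The shared-variance case is the one that matters most for distilling one sampler into another: if both transitions use the same noise schedule, the per-step KL reduces to a weighted squared distance between predicted means, with no sampling needed to evaluate it.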

Abstract

Reinforcement learning has emerged as a powerful tool for improving diffusion-based text-to-image models, but existing methods are largely limited to single-task optimization. Extending RL to multiple tasks is challenging: joint optimization suffers from cross-task interference and imbalance, while cascade RL is cumbersome and prone to catastrophic forgetting. We propose DiffusionOPD, a new multi-task training paradigm for diffusion models based on Online Policy Distillation (OPD). DiffusionOPD first trains task-specific teachers independently, then distills their capabilities into a unified student along the student's own rollout trajectories. This decouples single-task exploration from multi-task integration and avoids the optimization burden of solving all tasks jointly from scratch. Theoretically, we lift the OPD framework from discrete tokens to continuous-state Markov processes,…
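The two-stage recipe described in the abstract (independent teachers, then distillation along the student's own rollouts) can be illustrated on a toy problem. Everything below is a hypothetical sketch, not the authors' implementation: states are 2-D vectors rather than images, each task-specific teacher is a fixed linear-drift Gaussian kernel standing in for a separately trained teacher, the student is one task-conditioned linear drift, and the loss is the closed-form per-step KL between the two Gaussian transitions, evaluated at states the student itself visits. All names (`W_TRUE`, `C_TRUE`, `distill`, and so on) are made up for illustration.

```python
import numpy as np

# Hypothetical toy setting: D-dim states, K tasks, T Euler-Maruyama steps.
D, K, T, DT, SIGMA = 2, 3, 10, 0.1, 1.0

# Task-specific "teachers": x_next ~ N(x + f_k(x) * DT, SIGMA^2 * DT * I),
# with f_k(x) = W_TRUE x + C_TRUE[k]. Sharing W_TRUE is an assumption that
# lets a single task-conditioned student match every teacher exactly.
W_TRUE = np.array([[-0.5, 0.2], [-0.1, -0.4]])
C_TRUE = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])


def teacher_drift(x, tasks):
    return x @ W_TRUE.T + C_TRUE[tasks]


def student_drift(x, tasks, W, C):
    # One shared matrix W plus a per-task offset C[k] (the task conditioning).
    return x @ W.T + C[tasks]


def rollout_states(W, C, tasks, rng):
    """Sample trajectories from the student's own kernel; return every visited state."""
    x = rng.standard_normal((len(tasks), D))
    states = []
    for _ in range(T):
        states.append(x)
        noise = SIGMA * np.sqrt(DT) * rng.standard_normal(x.shape)
        x = x + student_drift(x, tasks, W, C) * DT + noise
    return np.concatenate(states), np.tile(tasks, T)


def distill(num_iters=500, batch=64, lr=2.0, seed=0):
    rng = np.random.default_rng(seed)
    W, C = np.zeros((D, D)), np.zeros((K, D))
    # Equal-variance Gaussian KL per step: (DT / SIGMA^2) / 2 * ||f_student - f_teacher||^2.
    scale = DT / SIGMA**2
    losses = []
    for _ in range(num_iters):
        tasks = rng.integers(0, K, size=batch)
        # On-policy: states come from student rollouts and are treated as constants.
        X, task_ids = rollout_states(W, C, tasks, rng)
        r = student_drift(X, task_ids, W, C) - teacher_drift(X, task_ids)
        n = len(X)
        losses.append(0.5 * scale * np.mean(np.sum(r**2, axis=1)))
        grad_W = scale * r.T @ X / n
        grad_C = np.zeros_like(C)
        np.add.at(grad_C, task_ids, r)  # accumulate residuals per task
        grad_C *= scale / n
        W -= lr * grad_W
        C -= lr * grad_C
    return W, C, losses


W, C, losses = distill()
print(f"distillation loss: {losses[0]:.4f} -> {losses[-1]:.2e}")
```

Because the toy teachers share `W_TRUE`, one student can match all of them, so the loss should fall toward zero; with real teachers that genuinely disagree, the student could only approximate each one. Treating visited states as constants is a simplification of this sketch: the exact gradient of the trajectory KL also depends on how the student's parameters shift its own state distribution.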
