D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

Dengyang Jiang; Xin Jin; Dongyang Liu; Zanyi Wang; Mingzhe Zheng; Ruoyi Du; Xiangpeng Yang; Qilong Wu; Zhen Li; Peng Gao; Harry Yang; Steven Hoi

arXiv:2605.05204 · cs.CV · May 19, 2026

TL;DR

D-OPSD introduces an on-policy self-distillation training method for step-distilled diffusion models, allowing continuous fine-tuning without losing their efficient inference capabilities.

Contribution

The paper proposes a novel on-policy self-distillation paradigm for fine-tuning step-distilled diffusion models while preserving their few-step inference efficiency.

Findings

01. Enables step-distilled models to learn new concepts without sacrificing few-step inference speed.

02. Formulates supervised fine-tuning as an on-policy self-distillation process.

03. Leverages the in-context capabilities that the diffusion model inherits from its LLM/VLM encoder.

Abstract

High-performance image generation is currently shifting from inefficient multi-step models to efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these few-step models are difficult to fine-tune directly with continuous supervised fine-tuning. For example, applying commonly used fine-tuning techniques would compromise their inherent few-step inference capability. To address this, we propose D-OPSD, a novel training paradigm for step-distilled diffusion models that enables on-policy learning during supervised fine-tuning. We first find that modern diffusion models that use an LLM/VLM as the encoder can inherit the encoder's in-context capabilities. This enables us to formulate the training as an on-policy self-distillation process. Specifically, during training, we make the model act as both the teacher and the student with…

Code & Models

Repositories

vvvvvjdy/D-OPSD
github

Models

bdsqlsz/Qwen3-VL-4B-Reweight
model
