TL;DR
D-OPSD introduces an on-policy self-distillation training method for step-distilled diffusion models, allowing continuous fine-tuning without losing their efficient inference capabilities.
Contribution
The paper proposes a novel on-policy self-distillation paradigm for fine-tuning step-distilled diffusion models while preserving their few-step inference efficiency.
Findings
D-OPSD lets step-distilled models learn new concepts without sacrificing few-step inference speed.
Supervised fine-tuning is formulated as an on-policy self-distillation process.
Training leverages the in-context capabilities that the model inherits from its LLM/VLM encoder.
Abstract
The landscape of high-performance image generation models is currently shifting from inefficient multi-step models to efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for direct continuous supervised fine-tuning. For example, applying commonly used fine-tuning techniques compromises their inherent few-step inference capability. To address this, we propose D-OPSD, a novel training paradigm for step-distilled diffusion models that enables on-policy learning during supervised fine-tuning. We first find that modern diffusion models, in which an LLM/VLM serves as the encoder, can inherit their encoder's in-context capabilities. This enables us to formulate the training as an on-policy self-distillation process. Specifically, during training, we make the model act as both the teacher and the student with…
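The general shape of an on-policy self-distillation step can be sketched in a few lines. This is a toy illustration, not the paper's algorithm: the abstract is cut off before it says how the teacher and student roles differ, so the sketch assumes, purely for illustration, that the two roles share weights and differ only in the context vector they receive. The linear "denoiser", the function names `predict` and `self_distill_step`, the squared-error loss, and all dimensions are hypothetical stand-ins.

```python
import numpy as np

def predict(W, x, ctx):
    """Toy linear stand-in for a denoiser: maps a sample plus a context vector to an output."""
    return W @ np.concatenate([x, ctx])

def self_distill_step(W, noise, student_ctx, teacher_ctx, lr=0.05):
    """One illustrative update. Returns (new_W, loss, student_input, teacher_target)."""
    # On-policy: the training input is generated by the current model itself,
    # not drawn from a fixed dataset of noised images.
    x = predict(W, noise, student_ctx)

    # Teacher role (ASSUMPTION: same weights, different context).
    # The target is treated as a constant, i.e. no gradient flows through it.
    target = predict(W, x, teacher_ctx)

    # Student role: same weights, student context.
    inp = np.concatenate([x, student_ctx])
    err = W @ inp - target
    loss = float(np.mean(err ** 2))

    # Gradient of the mean squared error w.r.t. W, holding inp and target fixed.
    grad = (2.0 / err.size) * np.outer(err, inp)
    return W - lr * grad, loss, inp, target

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((4, 7))   # 4-dim sample, 3-dim context
    noise = rng.standard_normal(4)
    student_ctx = rng.standard_normal(3)
    teacher_ctx = rng.standard_normal(3)
    W_new, loss, inp, target = self_distill_step(W, noise, student_ctx, teacher_ctx)
    print("loss before update:", loss)
    print("loss after update: ", float(np.mean((W_new @ inp - target) ** 2)))
```

The point of the sketch is the data flow: the student is trained on its own outputs, and the supervision signal comes from the same network queried in a different role, so no separate multi-step teacher is required. How D-OPSD actually constructs the two roles and its loss is not recoverable from the truncated abstract.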
