Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning
Linwei Dong, Ruoyu Guo, Ge Bai, Zehuan Yuan, Yawei Luo, Changqing Zou

TL;DR
GDMD introduces a gradient-based reinforcement learning framework for diffusion distillation, enhancing few-step generation quality by aligning RL rewards with distillation gradients.
Contribution
It redefines reward mechanisms in diffusion distillation, enabling direct evaluation of distillation updates and improving generation quality with fewer steps.
Findings
GDMD achieves state-of-the-art results in few-step generation.
4-step models outperform multi-step teachers in quality metrics.
GDMD significantly exceeds previous DMDR results on GenEval and in human preference evaluations.
Abstract
Diffusion distillation, exemplified by Distribution Matching Distillation (DMD), has shown great promise in few-step generation but often sacrifices quality for sampling speed. While integrating Reinforcement Learning (RL) into distillation offers potential, a naive fusion of these two objectives relies on suboptimal raw sample evaluation. This sample-based scoring creates inherent conflicts with the distillation trajectory and produces unreliable rewards due to the noisy nature of early-stage generation. To overcome these limitations, we propose GDMD, a novel framework that redefines the reward mechanism by prioritizing distillation gradients over raw pixel outputs as the primary signal for optimization. By reinterpreting the DMD gradients as implicit target tensors, our framework enables existing reward models to directly evaluate the quality of distillation updates. This…
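The idea of reading a DMD gradient as an implicit target tensor can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy score vectors, and the stand-in reward below are all hypothetical. It assumes the common DMD formulation in which the update direction is the fake score minus the real (teacher) score, and it uses the identity that the loss 0.5 * ||x - target||^2, with target = x - grad held constant, has exactly grad as its gradient with respect to x. Under that reading, the target is a sample-shaped tensor that a reward model could score in place of the raw generator output.

```python
# Minimal sketch under stated assumptions; every name here is hypothetical.

def dmd_gradient(fake_score, real_score):
    """DMD-style update direction: fake score minus real (teacher) score."""
    return [f - r for f, r in zip(fake_score, real_score)]

def implicit_target(x, grad):
    """Reinterpret the gradient as a target tensor: target = x - grad.
    In an autodiff framework this target would be detached (stop-gradient)."""
    return [xi - gi for xi, gi in zip(x, grad)]

def surrogate_loss(x, target):
    """0.5 * ||x - target||^2 with the target treated as a constant."""
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))

def numerical_grad(x, target, eps=1e-6):
    """Central finite differences of surrogate_loss with respect to x."""
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((surrogate_loss(hi, target) - surrogate_loss(lo, target)) / (2 * eps))
    return grads

def toy_reward(sample):
    """Stand-in for a learned reward model (assumption): prefers small norms."""
    return -sum(v * v for v in sample)

# Toy generator output and toy score estimates (made-up numbers).
x = [1.0, -2.0, 0.5]
fake_score = [0.4, -0.6, 0.2]
real_score = [0.1, -0.1, 0.0]

grad = dmd_gradient(fake_score, real_score)
target = implicit_target(x, grad)

# The surrogate loss reproduces the DMD gradient when differentiated w.r.t. x.
recovered = numerical_grad(x, target)

# A reward model can score the implicit target instead of the raw sample.
reward_on_sample = toy_reward(x)
reward_on_target = toy_reward(target)
```

Comparing `recovered` with `grad` confirms that the regression-to-target loss and the original gradient describe the same update, which is what makes the target a meaningful object to evaluate. How GDMD actually combines the reward with the distillation update is not specified in the text above, so that step is left out of the sketch.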
