Distribution Matching Distillation Meets Reinforcement Learning
Dengyang Jiang, Dongyang Liu, Zanyi Wang, Qilong Wu, Liuzhuozheng Li, Hengzhuang Li, Xin Jin, David Liu, Changsheng Lu, Zhen Li, Bo Zhang, Mengmeng Wang, Steven Hoi, Peng Gao, Harry Yang

TL;DR
This paper introduces DMDR, a unified framework combining Distribution Matching Distillation and Reinforcement Learning to improve the quality and controllability of diffusion model distillation, surpassing multi-step teacher models.
Contribution
The work presents a joint optimization framework that unifies DMD and RL: RL makes distillation preference-aware and controllable, DMD regularizes RL training against reward hacking, and the framework introduces new training strategies.
Findings
DMDR achieves state-of-the-art visual quality in few-step generation.
It surpasses the performance of multi-step teacher models.
Joint optimization improves controllability and reduces reward hacking.
Abstract
Distribution Matching Distillation (DMD) facilitates efficient inference by distilling multi-step diffusion models into few-step variants. Concurrently, Reinforcement Learning (RL) has emerged as a vital tool for aligning generative models with human preferences. While both represent critical post-training stages for large-scale diffusion models, existing studies typically treat them as independent, sequential processes, leaving a systematic framework for their unification largely unexplored. In this work, we demonstrate that jointly optimizing these two objectives yields mutual benefits: RL enables more preference-aware and controllable distillation rather than uniformly compressing the full data distribution, while DMD serves as an effective regularizer to mitigate reward hacking during RL training. Building on these insights, we propose DMDR, a unified framework that incorporates…
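The regularization claim above can be illustrated with a deliberately tiny toy problem. This is a hedged sketch, not the paper's method: the 1-D Gaussian student, the linear reward r(x) = x, the weight `lam`, and the helper `train` are all assumptions made up for illustration. The student is N(mu, 1), the "teacher" is N(0, 1), and the distribution-matching term is the closed-form KL(N(mu, 1) || N(0, 1)) = mu^2 / 2 rather than a score-based DMD estimate.

```python
def train(steps, lr, lam, use_dm):
    """Gradient descent on a toy objective: [KL to teacher] - lam * E[reward].

    Hypothetical setup (not from the paper):
      student  = N(mu, 1), teacher = N(0, 1)
      reward   = r(x) = x, so E[r] = mu (unbounded, easy to "hack")
      KL term  = mu**2 / 2, whose gradient w.r.t. mu is mu
    """
    mu = 0.0
    for _ in range(steps):
        # Gradient of the distribution-matching term (zero if disabled).
        grad_dm = mu if use_dm else 0.0
        # Gradient of the expected reward E[x] = mu.
        grad_reward = 1.0
        # Minimize KL while maximizing reward.
        mu -= lr * (grad_dm - lam * grad_reward)
    return mu


# Joint objective: the matching term pulls mu back toward the teacher,
# so mu settles at a finite trade-off point (analytically, mu = lam).
joint_mu = train(steps=500, lr=0.1, lam=2.0, use_dm=True)

# Reward only: nothing restrains the student, so mu keeps growing with
# the number of steps, a cartoon of reward hacking.
reward_only_mu = train(steps=500, lr=0.1, lam=2.0, use_dm=False)

print(f"joint: mu = {joint_mu:.3f}")
print(f"reward only: mu = {reward_only_mu:.3f}")
```

In this toy, the joint run converges because the quadratic matching term eventually dominates the linear reward; the reward-only run drifts arbitrarily far from the teacher. Real DMD estimates the matching gradient from score networks, and the actual balance between the two terms in DMDR is not specified in this summary.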
Code & Models
- 🤗 Tongyi-MAI/Z-Image-Turbo (model) · 824k downloads · ♡ 4375
- 🤗 unsloth/Z-Image-Turbo-GGUF (model) · 39k downloads · ♡ 120
- 🤗 BlackStone-Yu/Z-Image-Turbo-GGUF (model) · 1.0k downloads · ♡ 2
- 🤗 unsloth/Z-Image-Turbo-unsloth-bnb-4bit (model) · 439 downloads · ♡ 5
- 🤗 mrfakename/Z-Image-Turbo (model) · 70 downloads · ♡ 21
- 🤗 mzbac/Z-Image-Turbo-8bit (model) · 752 downloads · ♡ 4
- 🤗 vantagewithai/Z-Image-Turbo-GGUF (model) · 6.8k downloads · ♡ 48
- 🤗 not-pegasus/IMAGE_MODAL (model) · 5 downloads
- 🤗 tsqn/Z-Image-Turbo_fp32-fp16-bf16_full_and_ema-only (model) · 609 downloads · ♡ 12
- 🤗 kp-forks/Z-Image-Turbo (model) · 1 download
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques
