Distribution Matching Distillation Meets Reinforcement Learning

Dengyang Jiang; Dongyang Liu; Zanyi Wang; Qilong Wu; Liuzhuozheng Li; Hengzhuang Li; Xin Jin; David Liu; Changsheng Lu; Zhen Li; Bo Zhang; Mengmeng Wang; Steven Hoi; Peng Gao; Harry Yang

arXiv:2511.13649 · cs.CV · March 26, 2026

TL;DR

This paper introduces DMDR, a unified framework that combines Distribution Matching Distillation (DMD) and Reinforcement Learning (RL) to improve the quality and controllability of diffusion model distillation. The resulting few-step models surpass their multi-step teachers.

Contribution

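The work presents a joint optimization framework that unifies DMD and RL so that each benefits the other: RL steers distillation toward preferred outputs instead of uniformly compressing the full data distribution, while DMD regularizes RL against reward hacking. It also introduces new training strategies for this joint setting.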
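One way to picture such a joint objective is sketched below. This is an illustration under stated assumptions, not the paper's exact formulation: the generator $G_\theta$, noise $z$, reward model $r$, weight $\lambda$, and timestep weight $w_t$ are illustrative symbols. The first term is the distribution-matching (KL) objective that DMD distills with; the second is a reward term that RL maximizes. DMD approximates the KL gradient by the difference between the score of a "fake" diffusion model that tracks the generator's outputs and the score of the frozen teacher, evaluated at a noised sample $x_t$ of $G_\theta(z)$.

```latex
\min_\theta \; \mathcal{L}(\theta)
  = D_{\mathrm{KL}}\big(p_\theta \,\|\, p_{\text{real}}\big)
  - \lambda \, \mathbb{E}_{z}\big[ r(G_\theta(z)) \big]

\nabla_\theta D_{\mathrm{KL}}
  \approx \mathbb{E}_{z,\,t,\,x_t}\Big[ w_t \,
  \big( s_{\text{fake}}(x_t, t) - s_{\text{real}}(x_t, t) \big)^{\top}
  \frac{\partial G_\theta(z)}{\partial \theta} \Big]
```

With $\lambda = 0$ this reduces to plain distribution matching; with the KL term dropped it reduces to pure reward maximization, which is the setting where reward hacking tends to appear.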

Findings

01. DMDR achieves state-of-the-art visual quality in few-step generation.
02. The distilled few-step models surpass their multi-step teachers.
03. Joint optimization improves controllability and reduces reward hacking.

Abstract

Distribution Matching Distillation (DMD) facilitates efficient inference by distilling multi-step diffusion models into few-step variants. Concurrently, Reinforcement Learning (RL) has emerged as a vital tool for aligning generative models with human preferences. While both represent critical post-training stages for large-scale diffusion models, existing studies typically treat them as independent, sequential processes, leaving a systematic framework for their unification largely unexplored. In this work, we demonstrate that jointly optimizing these two objectives yields mutual benefits: RL enables more preference-aware and controllable distillation rather than uniformly compressing the full data distribution, while DMD serves as an effective regularizer to mitigate reward hacking during RL training. Building on these insights, we propose DMDR, a unified framework that incorporates…
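The claim that DMD acts as a regularizer against reward hacking can be illustrated with a toy 1-D example. This is our illustration, not DMDR's training procedure. Assumptions: the generator is $x = \mu + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, 1)$; the real (teacher) distribution is $\mathcal{N}(m_{\text{real}}, 1)$; the fake score is the exact score of the generator's distribution rather than a learned network; and $r(x) = x$ is a hypothetical, deliberately hackable reward. The helper names `dmd_grad`, `reward_grad`, and `train` are hypothetical.

```python
import random


def dmd_grad(mu: float, m_real: float, x: float) -> float:
    """DMD-style gradient of KL(p_fake || p_real) w.r.t. mu for a 1-D Gaussian toy.

    Generator: x = mu + eps with eps ~ N(0, 1), so dx/dmu = 1.
    s_real(x) = -(x - m_real) is the score of N(m_real, 1) (the "teacher").
    s_fake(x) = -(x - mu) is the score of N(mu, 1); DMD would learn this instead.
    """
    s_real = -(x - m_real)
    s_fake = -(x - mu)
    return (s_fake - s_real) * 1.0  # (s_fake - s_real) * dx/dmu


def reward_grad(x: float) -> float:
    """Gradient of a hypothetical, hackable reward r(x) = x (higher x always scores better)."""
    return 1.0


def train(lam: float, use_dmd: bool, steps: int = 2000, lr: float = 0.01,
          m_real: float = 0.0, seed: int = 0) -> float:
    """Run SGD on -lam * r(x), optionally plus the DMD term; return the final mu."""
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(steps):
        x = mu + rng.gauss(0.0, 1.0)  # one generator sample
        grad = -lam * reward_grad(x)  # gradient of -lam * r(x) w.r.t. mu (dx/dmu = 1)
        if use_dmd:
            grad += dmd_grad(mu, m_real, x)  # pulls mu back toward m_real
        mu -= lr * grad
    return mu


print(train(lam=1.0, use_dmd=False))  # reward only: drifts upward without bound
print(train(lam=1.0, use_dmd=True))   # joint: settles near m_real + lam
```

With the reward alone, $\mu$ grows by $\text{lr} \cdot \lambda$ every step (reaching 20 after 2000 steps here) and never stops, which is the toy analogue of reward hacking. With the DMD term added, the update becomes $\mu \leftarrow \mu - \text{lr}\,[(\mu - m_{\text{real}}) - \lambda]$, whose fixed point is $m_{\text{real}} + \lambda$: the reward still shifts the model toward preferred outputs, but the distribution-matching term keeps it anchored to the teacher. Because both toy distributions share unit variance, the score difference here does not depend on $x$; in real DMD the scores come from networks evaluated at noised samples.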

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques