$R_\text{dm}$: Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation
Linqian Fan, Peiqin Sun, Tiancheng Wen, Shun Lu, Chengru Song

TL;DR
This paper introduces a reward-based framework for diffusion distillation, improving stability, efficiency, and flexibility in high-quality image synthesis.
Contribution
It re-conceptualizes distribution matching as a reward, unifying diffusion distillation with reinforcement learning to enhance optimization and sampling.
Findings
GNDM reduces FID by 1.87 over vanilla DMD.
GNDMR achieves a peak HPS of 30.37 and a low FID-SD of 12.21.
The framework improves sampling efficiency and supports adaptive reward integration.
Abstract
Diffusion models achieve state-of-the-art generative performance but are fundamentally bottlenecked by their slow, iterative sampling process. While diffusion distillation techniques enable high-fidelity, few-step generation, traditional objectives often restrict the student's performance by anchoring it solely to the teacher. Recent approaches have attempted to break this ceiling by integrating Reinforcement Learning (RL), typically through a simple summation of distillation and RL objectives. In this work, we propose a novel paradigm by re-conceptualizing distribution matching as a reward, denoted as $R_\text{dm}$. This unified perspective bridges the algorithmic gap between Distribution Matching Distillation (DMD) and RL, providing several primary benefits. (1) Enhanced Optimization Stability: We introduce Group Normalized Distribution Matching (GNDM), which adapts standard RL group…
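The abstract is cut off before it describes GNDM in detail, so the following is only a minimal sketch of the generic group normalization commonly used in RL fine-tuning, where each reward is standardized against the other samples in its group. It is not the paper's GNDM formulation. The function name `group_normalize`, the `eps` parameter, and the reward values are all hypothetical and chosen for illustration.

```python
from statistics import fmean, pstdev


def group_normalize(rewards, eps=1e-8):
    """Standardize rewards within one group: (r - mean) / (std + eps).

    Assumption: this is the generic RL-style group normalization,
    not the exact GNDM objective from the paper.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation of the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical per-sample reward values for one group of generations.
group = [0.2, 0.5, 0.9, 0.4]
advantages = group_normalize(group)
```

After normalization the values in a group sum to approximately zero, so samples are scored relative to their peers rather than on an absolute scale. The `eps` term keeps the division defined when every sample in a group receives the same reward.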
