Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning
Guanjie Chen, Shirui Huang, Kai Liu, Jianchen Zhu, Xiaoye Qu, Peng Chen, Yu Cheng, Yifu Sun

TL;DR
Flash-DMD introduces a fast, stable, and high-quality image generation framework that combines efficient timestep-aware distillation with joint reinforcement learning, significantly reducing training costs and improving generation fidelity.
Contribution
It proposes a novel distillation strategy and joint RL training scheme that enhance efficiency, stability, and quality in few-step diffusion-based image generation.
Findings
Outperforms DMD2 with only 2.1% of its training cost.
Achieves state-of-the-art quality in few-step sampling.
Demonstrates improved visual quality and human preference metrics.
Abstract
Diffusion Models have emerged as a leading class of generative models, yet their iterative sampling process remains computationally expensive. Timestep distillation is a promising technique to accelerate generation, but it often requires extensive training and leads to image quality degradation. Furthermore, fine-tuning these distilled models for specific objectives, such as aesthetic appeal or user preference, using Reinforcement Learning (RL) is notoriously unstable and easily falls into reward hacking. In this work, we introduce Flash-DMD, a novel framework that enables fast convergence with distillation and joint RL-based refinement. Specifically, we first propose an efficient timestep-aware distillation strategy that significantly reduces training cost with enhanced realism, outperforming DMD2 with only 2.1% of its training cost. Second, we introduce a joint training scheme where…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Visual Attention and Saliency Detection
