Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

Dongyang Liu; Peng Gao; David Liu; Ruoyi Du; Zhen Li; Qilong Wu; Xin Jin; Sihan Cao; Shifeng Zhang; Hongsheng Li; Steven Hoi

arXiv:2511.22677 · cs.CV · December 1, 2025

TL;DR

This paper reexamines diffusion model distillation, revealing that CFG Augmentation, not distribution matching, is the main driver of few-step performance, leading to improved methods and practical applications.

Contribution

It uncovers the primary role of CFG Augmentation in diffusion distillation, decouples it from distribution matching, and proposes principled modifications for better performance.
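The decoupling of CFG Augmentation from distribution matching can be illustrated with a small sketch. It assumes the standard classifier-free guidance combination s_cfg = s_uncond + w * (s_cond - s_uncond) and a DMD-style update direction s_fake - s_cfg. It then uses the algebraic identity s_cfg = s_cond + (w - 1) * (s_cond - s_uncond) to split that direction into a distribution-matching part (fake model vs. unguided conditional teacher) and a CFG-augmentation part that depends only on the teacher. Scores are plain Python lists, and all function names are hypothetical; the paper's exact parameterization, sign convention, and weighting may differ.

```python
from typing import List, Tuple

Vec = List[float]


def cfg_score(s_cond: Vec, s_uncond: Vec, w: float) -> Vec:
    """Classifier-free guidance: s_uncond + w * (s_cond - s_uncond)."""
    return [u + w * (c - u) for c, u in zip(s_cond, s_uncond)]


def dmd_direction(s_real_cond: Vec, s_real_uncond: Vec, s_fake: Vec, w: float) -> Vec:
    """DMD-style generator direction (assumed sign): fake score minus guided teacher score."""
    s_real = cfg_score(s_real_cond, s_real_uncond, w)
    return [f - r for f, r in zip(s_fake, s_real)]


def decompose(s_real_cond: Vec, s_real_uncond: Vec, s_fake: Vec, w: float) -> Tuple[Vec, Vec]:
    """Split the DMD direction into (DM term, CA term) via
    s_cfg = s_cond + (w - 1) * (s_cond - s_uncond)."""
    # DM term: fake score vs. the conditional (unguided) teacher score.
    dm = [f - c for f, c in zip(s_fake, s_real_cond)]
    # CA term: depends only on the teacher's guidance direction; it vanishes when w == 1.
    ca = [-(w - 1.0) * (c - u) for c, u in zip(s_real_cond, s_real_uncond)]
    return dm, ca


if __name__ == "__main__":
    s_c, s_u, s_f, w = [1.0, 2.0], [0.5, 1.0], [0.0, 0.0], 3.0
    dm, ca = decompose(s_c, s_u, s_f, w)
    total = [a + b for a, b in zip(dm, ca)]
    print("full:", dmd_direction(s_c, s_u, s_f, w))
    print("DM:", dm, "CA:", ca, "DM + CA:", total)
```

Because DM + CA reproduces the full direction exactly, the split changes nothing about the update itself; it only makes the two contributions separately inspectable. With w = 1 (no guidance) the CA term is identically zero and only the DM term remains.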

Findings

01. CFG Augmentation is the core engine of distillation.
02. Distribution Matching acts mainly as a regularizer.
03. Decoupling noise schedules improves performance.
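One way to read "decoupling noise schedules" is that the noise level τ, at which the generator output x0 is re-noised before the teacher and fake score networks are queried, is drawn separately for the Distribution Matching (DM) term and the CFG Augmentation (CA) term. The sketch below shows only that mechanic. The forward process (linear interpolation toward Gaussian noise), the score-network interface, the toy stand-in networks, and the τ ranges are all hypothetical placeholders, not the schedules used in the paper.

```python
import random
from typing import Callable, List, Tuple

Vec = List[float]
# Hypothetical score-network interface: (noisy sample, noise level tau) -> score estimate.
ScoreFn = Callable[[Vec, float], Vec]


def renoise(x0: Vec, tau: float, rng: random.Random) -> Vec:
    """Assumed forward process: x_tau = (1 - tau) * x0 + tau * eps, eps ~ N(0, 1)."""
    return [(1.0 - tau) * x + tau * rng.gauss(0.0, 1.0) for x in x0]


def decoupled_direction(
    x0: Vec,
    real_cond: ScoreFn,
    real_uncond: ScoreFn,
    fake: ScoreFn,
    w: float,
    tau_dm_range: Tuple[float, float],
    tau_ca_range: Tuple[float, float],
    rng: random.Random,
) -> Tuple[Vec, float, float]:
    """Sum of DM and CA terms, each evaluated at its own independently sampled tau."""
    # DM term: fake score vs. conditional teacher score at tau_dm.
    tau_dm = rng.uniform(*tau_dm_range)
    x_dm = renoise(x0, tau_dm, rng)
    dm = [f - c for f, c in zip(fake(x_dm, tau_dm), real_cond(x_dm, tau_dm))]

    # CA term: teacher guidance direction at a separately sampled tau_ca.
    tau_ca = rng.uniform(*tau_ca_range)
    x_ca = renoise(x0, tau_ca, rng)
    s_c, s_u = real_cond(x_ca, tau_ca), real_uncond(x_ca, tau_ca)
    ca = [-(w - 1.0) * (c - u) for c, u in zip(s_c, s_u)]

    return [a + b for a, b in zip(dm, ca)], tau_dm, tau_ca


if __name__ == "__main__":
    rng = random.Random(0)

    def teacher_cond(x: Vec, tau: float) -> Vec:  # toy stand-in for the conditional teacher
        return [-v for v in x]

    def teacher_uncond(x: Vec, tau: float) -> Vec:  # toy stand-in for the unconditional teacher
        return [0.0 for _ in x]

    def student_fake(x: Vec, tau: float) -> Vec:  # toy stand-in for the fake score model
        return [-0.5 * v for v in x]

    direction, tau_dm, tau_ca = decoupled_direction(
        [1.0, -2.0], teacher_cond, teacher_uncond, student_fake,
        w=4.0, tau_dm_range=(0.0, 0.5), tau_ca_range=(0.5, 1.0), rng=rng,
    )
    print(direction, tau_dm, tau_ca)
```

Sharing a single τ and a single noisy sample between the two terms would recover the coupled form; the only change here is that the two terms no longer have to share a noise level.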

Abstract

Diffusion model distillation has emerged as a powerful technique for creating efficient few-step and single-step generators. Among such methods, Distribution Matching Distillation (DMD) and its variants stand out for their impressive performance, which is widely attributed to their core mechanism of matching the student's output distribution to that of a pre-trained teacher model. In this work, we challenge this conventional understanding. Through a rigorous decomposition of the DMD training objective, we reveal that in complex tasks like text-to-image generation, where CFG is typically required for desirable few-step performance, the primary driver of few-step distillation is not distribution matching, but a previously overlooked component we identify as CFG Augmentation (CA). We demonstrate that this term acts as the core "engine" of distillation, while the Distribution Matching (DM) term…

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

* Provides a timely and insightful analysis of the functional roles of DMD’s two loss terms, addressing the open question of why DMD excels in few-step or one-step generation.
* The authors design careful, hypothesis-driven experiments to isolate and test the contribution of each loss term, leading to well-supported conclusions.
* Based on these insights, the paper proposes using distinct τ values for the two terms, leading to measurable performance gains.

Weaknesses

Most experiments rely primarily on qualitative evaluation (visual inspection of generated images). While visualization is valuable for illustrating effects, heavy reliance on qualitative judgments risks confirmation bias—highlighting supportive examples while overlooking contradictory ones. A more scientifically rigorous approach would involve defining quantitative metrics and validating observations across the entire test set, to ensure statistical robustness and reproducibility.

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. This paper identifies a discrepancy between theory and practice in DMD: CFG is applied only to the teacher model, not to the student model. This is an interesting observation and a natural motivation for this important research topic.
2. The decomposition of the DMD loss into the DM and CA terms provides novel and valuable insights toward a better and more principled understanding of the underlying mechanism of DMD.
3. The arguments and hypotheses in the paper are supported by extensive experiments…

Weaknesses

Overall, I like the paper very much. My only concern is the paper's claim that the CA term is the engine of DMD, which seems a bit strong to me. For example, early DMD papers achieved strong distillation performance on unconditional CIFAR image generation, which is not discussed or explored in this paper.

Reviewer 03 · Rating 4 · Confidence 3

Strengths

I really like this research topic. I believe distribution-matching distillation is under-explored, and a purely divergence-based perspective cannot explain why it works, or why it fails in some scenarios, so I think the topic of this paper is very valuable. The experiments are also sound and support the argument.

Weaknesses

My major concern with this paper is that I found the conclusion a little too conclusive. The argument is that CFG Augmentation is the engine for distillation and Distribution Matching is the regularizer for stability. However, many CIFAR experiments are not label-conditioned and still achieve one-step distillation, e.g., the original Diff-Instruct paper or a more recent paper: https://arxiv.org/pdf/2502.08005. In this case, the only driving engine is the distribution matching term, which couldn't…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques