Diffusion Models without Classifier-free Guidance
Zhicong Tang, Jianmin Bao, Dong Chen, Baining Guo

TL;DR
This paper introduces Model-guidance, a new training objective for diffusion models that eliminates the need for classifier-free guidance, leading to faster training, higher inference speed, and state-of-the-art image quality.
Contribution
It proposes Model-guidance, a novel, plug-and-play training method that improves diffusion models by removing classifier-free guidance, enhancing efficiency and performance.
Findings
Doubles inference speed compared to CFG-based models
Achieves state-of-the-art FID of 1.34 on ImageNet 256
Demonstrates scalability across different datasets and models
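The speed claim above follows from how CFG is usually applied at sampling time: each denoising step runs the network twice (once with the condition, once without) and blends the two predictions. The sketch below only illustrates that cost difference. `ToyDenoiser` is a hypothetical stand-in for a noise-prediction network, not the paper's model, and the blend uses one common CFG convention (conventions for the guidance scale `w` vary).

```python
import numpy as np

class ToyDenoiser:
    """Hypothetical noise predictor that counts its forward passes."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(dim, dim)) * 0.1
        self.class_shift = rng.normal(size=(dim,))
        self.calls = 0

    def __call__(self, x_t, t, cond):
        self.calls += 1
        # cond=None plays the role of the "null" (unconditional) condition.
        shift = self.class_shift if cond is not None else 0.0
        return x_t @ self.W + t * shift


def cfg_eps(model, x_t, t, cond, w):
    """CFG-style prediction: two forward passes per denoising step."""
    eps_c = model(x_t, t, cond)   # conditional pass
    eps_u = model(x_t, t, None)   # unconditional pass
    # One common convention: w = 1 recovers the conditional prediction.
    return eps_u + w * (eps_c - eps_u)


def single_pass_eps(model, x_t, t, cond):
    """Guidance-free prediction: one forward pass per denoising step."""
    return model(x_t, t, cond)


def count_calls(step_fn, steps, dim=4):
    """Run `steps` toy denoising steps and report the number of network calls."""
    model = ToyDenoiser(dim)
    x_t = np.ones((2, dim))
    for i in range(steps):
        step_fn(model, x_t, 1.0 - i / steps, 1)
    return model.calls
```

For example, `count_calls(lambda m, x, t, c: cfg_eps(m, x, t, c, 1.5), 10)` counts twice as many calls as `count_calls(single_pass_eps, 10)`. This is the source of the roughly 2x saving when a model no longer needs the unconditional pass.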
Abstract
This paper presents Model-guidance (MG), a novel objective for training diffusion models that addresses and removes the need for the commonly used classifier-free guidance (CFG). Our approach goes beyond the standard modeling of the data distribution alone and also incorporates the posterior probability of conditions. The proposed technique originates from the idea of CFG and is simple yet effective, making it a plug-and-play module for existing models. Our method significantly accelerates the training process, doubles the inference speed, and achieves quality that matches and even surpasses concurrent diffusion models with CFG. Extensive experiments demonstrate its effectiveness, efficiency, and scalability on different models and datasets. Finally, we establish state-of-the-art performance on ImageNet 256 benchmarks with an FID of 1.34. Our code is available at…
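As a rough illustration of what a CFG-derived training target could look like, the sketch below builds a regression target from the ground-truth noise plus a scaled difference between the model's own conditional and unconditional predictions, with that difference treated as a constant. This form is an assumption pieced together from the peer-review excerpts on this page (which mention ground-truth noise in the target and a stop-gradient); it is not a reproduction of the paper's Eq. (14). `toy_eps_model`, `mg_style_target`, and `mg_style_loss` are hypothetical names.

```python
import numpy as np

def toy_eps_model(params, x_t, t, cond):
    """Hypothetical linear noise predictor; cond=None is the unconditional branch."""
    W, class_shift = params
    shift = class_shift if cond is not None else 0.0
    return x_t @ W + t * shift


def mg_style_target(params, x_t, t, cond, eps, w):
    """Assumed target: true noise plus a scaled, gradient-free guidance offset."""
    eps_c = toy_eps_model(params, x_t, t, cond)
    eps_u = toy_eps_model(params, x_t, t, None)
    # NumPy has no autograd, so this offset is already a constant. In an
    # autograd framework it would be wrapped in a stop-gradient call
    # (e.g. jax.lax.stop_gradient or Tensor.detach()).
    offset = eps_c - eps_u
    return eps + w * offset


def mg_style_loss(params, x_t, t, cond, eps, w):
    """Mean squared error between the conditional prediction and the assumed target."""
    target = mg_style_target(params, x_t, t, cond, eps, w)
    pred = toy_eps_model(params, x_t, t, cond)
    return float(np.mean((pred - target) ** 2))
```

With `w = 0` the target reduces to the plain noise `eps`, so the loss falls back to the standard noise-prediction objective; a nonzero `w` shifts the target in the direction that CFG would push the prediction at sampling time.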
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. This paper introduces Model-Guidance (MG), a novel training objective that replaces the traditional Classifier-Free Guidance (CFG) by directly modeling conditional posteriors within diffusion models, thereby unifying conditional and unconditional learning. 2. The proposed method eliminates the need for dual forward passes during inference, doubling generation efficiency while maintaining or improving sample quality. 3. The paper provides a theoretical derivation of MG based on Bayes’ rule and
1. From a theoretical perspective, the derivation of the target term in Equation (14) lacks rigor. It is unclear why the first term uses the ground-truth noise ($\epsilon$) instead of the model prediction ($\epsilon_\theta$). The paper does not specify under what assumptions this formulation holds, nor whether it conflicts with the original derivation of Classifier-Free Guidance (CFG). 2. Although the authors employ a stop-gradient mechanism to prevent model collapse, there is no theoretical an
1. The paper is generally well-written and easy to follow. 2. Reducing inference time in diffusion sampling is an important problem, and the proposed method offers a practical approach to halving inference time when classifier-free guidance (CFG) is applied. 3. Although the method is relatively straightforward, the empirical results demonstrate its effectiveness.
1. While the proposed methods show some promising results, their novelty appears limited. In essence, the method introduces an additional network to record the drift produced by CFG, thereby saving one forward pass during inference. 2. In the introduction, the authors mention several drawbacks of CFG, including the simultaneous modeling of both unconditional and conditional tasks during inference. However, it is unclear how the proposed approach addresses these issues. From my understanding, in
The paper addresses a highly relevant and practical problem in diffusion models: the computational cost and potential distributional issues of CFG. The experimental results are comprehensive, showing strong performance gains, particularly on ImageNet, where the method achieves state-of-the-art FID scores. The practical benefit of a 2x speedup in inference is a major potential advantage. Furthermore, the authors provide extensive ablation studies and the method is presented as being simple to imp
Despite the promising results and appealing concept, the paper has critical weaknesses that undermine its contribution and practicality. Lack of Theoretical Depth and Justification: The theoretical foundation for MG is underdeveloped. The paper presents the loss function as a given, without a rigorous analysis of its convergence properties or the precise distribution it models. The mechanism is presented as a heuristic, and the "self-improvement cycle" lacks a solid theoretical explanation. A m
- The paper is well structured and has a reasonably coherent narrative. - The experiment section is elaborate and covers different settings and evaluations. Results are promising.
- The paper would definitely benefit from another round of thorough proofreading; there are typos and grammatical errors across the document. This has to be improved during the rebuttal. - I think the justification behind the mathematical derivations in Section 3.1 might not be well established. More specifically, where does Eq. (7) come from? That's not classifier guidance (as the posterior should be modeled as $p_\phi$, not $\theta$), and the first term on the right should not be $p(x_t|c)$ but $p(x_t)$. At th
Taxonomy
Topics: Distributed Control Multi-Agent Systems · Mathematical Biology Tumor Growth · Guidance and Control Systems
Methods: Diffusion
