Rectified Diffusion Guidance for Conditional Generation
Mengfei Xia, Nan Xue, Yujun Shen, Ran Yi, Tieliang Gong, Yong-Jin Liu

TL;DR
This paper introduces ReCFG, a theoretically grounded modification to classifier-free guidance that corrects expectation shifts and aligns denoising with diffusion theory, improving the reliability of conditional diffusion models without retraining.
Contribution
The authors propose ReCFG, a rectified guidance method with closed-form coefficients that ensures proper theoretical alignment and can be applied post-hoc to existing diffusion models.
Findings
ReCFG corrects expectation shifts in guidance coefficients.
ReCFG maintains sampling speed and compatibility with state-of-the-art models.
Empirical results show improved generative quality without retraining.
Abstract
Classifier-Free Guidance (CFG), which combines the conditional and unconditional score functions with two coefficients summing to one, serves as a practical technique for diffusion model sampling. Theoretically, however, denoising with CFG cannot be expressed as a reciprocal diffusion process, which may consequently leave some hidden risks during use. In this work, we revisit the theory behind CFG and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) brings about expectation shift of the generative distribution. To rectify this issue, we propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory. We further show that our approach enjoys a closed-form solution given the guidance strength. That way,…
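The summing-to-one combination described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the (w, 1 - w) parameterization, and the equal-variance 1-D Gaussian toy are all assumptions made here for clarity, and the ReCFG closed-form coefficients are not reproduced because they do not appear in this summary.

```python
def combine_scores(score_cond, score_uncond, gamma_cond, gamma_uncond):
    """Linear combination of a conditional and an unconditional score.

    Hypothetical helper: the two coefficients are left free, so the
    summing-to-one rule is just one possible choice.
    """
    return gamma_cond * score_cond + gamma_uncond * score_uncond


def cfg_score(score_cond, score_uncond, w):
    """Standard CFG under the assumed convention: coefficients w and 1 - w."""
    return combine_scores(score_cond, score_uncond, w, 1.0 - w)


def gaussian_score(x, mean, var):
    """Score (gradient of the log-density) of a 1-D Gaussian N(mean, var)."""
    return -(x - mean) / var


# Toy setting (assumption): equal-variance Gaussians for both scores.
mu_c, mu_u, var, w = 2.0, 0.0, 1.0, 3.0

# In this toy case the guided score is linear in x and vanishes at
# w * mu_c + (1 - w) * mu_u, not at the conditional mean mu_c.
guided_mean = w * mu_c + (1.0 - w) * mu_u
score_at_guided_mean = cfg_score(
    gaussian_score(guided_mean, mu_c, var),
    gaussian_score(guided_mean, mu_u, var),
    w,
)
```

With w = 3.0 the guided score in this toy setting vanishes at 6.0 rather than at the conditional mean 2.0, which gives some intuition for how a mean can move under guidance. It is only a simplified picture and should not be read as the paper's formal expectation-shift result.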
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
The paper provides a rigorous analysis of a known issue with CFG, quantifying the expectation shift and proposing ReCFG as a theoretically grounded alternative. This analysis contributes to understanding the mathematical foundation of guided sampling in diffusion models. The resulting ReCFG is designed as a post-hoc solution that can be applied to pretrained models without additional training. The experiments on the toy data are helpful for understanding the method and subsequent real d
My main concern is the scope of the experiments and the size of the demonstrated improvements. CFG is a very important component of most AIGC models; even GPT-type sequential image modeling uses CFG for better performance. For a method targeting such a component, the current work does not demonstrate enough practical impact on more complex generative tasks such as text-to-image generation. Performance-wise, the empirical gains measured by FID or CLIP scores on ImageNet and CC12M are not that significant.
1. The paper provides a theoretical analysis that identifies a flaw in the popular CFG technique and proposes a mathematically grounded solution; it also provides a rigorous proof of the benefits of ReCFG.
2. The introduction of ReCFG offers a novel solution to the expectation shift problem associated with CFG in diffusion models.
1. Since this is mainly a theoretical paper, it would be beneficial to add more explanation after Theorem 3, and perhaps to reduce the length of Theorem 2.
2. If I understand correctly, the method needs to solve (28) for each time step t, which may increase the computation.
3. Although it is a theoretical paper, it would be better to add more experimental results, e.g., some illustrations of generated images.
- The paper is well written
- The proposed approach is novel
- The paper is well illustrated with intuitive theorems
- The proposed method doesn't give a closed form expression but an estimator
- The method is computationally intensive
The analysis of CFG is extensive.
* The proposed method appears numerically unstable, given the minute deviation of the coefficients from the default value of 1.0, which casts doubt on the reliability of the experiments. Furthermore, this slight difference in coefficients may indicate that the violation of the zero-mean assumption in the original CFG has negligible practical implications. Please consider conducting an ablation study showing how results change with small perturbations to the coefficients.
* The merits of unbiasedness sacrifices the
Taxonomy
Topics: Quantum chaos and dynamical systems
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion
