Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
Seyedmorteza Sadat, Otmar Hilliges, Romann M. Weber

TL;DR
This paper proposes an adaptive guidance method for diffusion models that reduces oversaturation caused by high guidance scales, improving image quality and realism without extra computational cost.
Contribution
It introduces a novel decomposition of guidance updates and a rescaling technique, enabling higher guidance scales without oversaturation in diffusion model sampling.
Findings
Improved FID and recall scores across various models.
Reduced oversaturation and artifacts in generated images.
Maintained precision comparable to standard guidance.
Abstract
Classifier-free guidance (CFG) is crucial for improving both generation quality and alignment between the input condition and final output in diffusion models. While a high guidance scale is generally required to enhance these aspects, it also causes oversaturation and unrealistic artifacts. In this paper, we revisit the CFG update rule and introduce modifications to address this issue. We first decompose the update term in CFG into parallel and orthogonal components with respect to the conditional model prediction and observe that the parallel component primarily causes oversaturation, while the orthogonal component enhances image quality. Accordingly, we propose down-weighting the parallel component to achieve high-quality generations without oversaturation. Additionally, we draw a connection between CFG and gradient ascent and introduce a new rescaling and momentum method for the CFG…
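The decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the CFG update is written as the conditional prediction plus (w - 1) times the difference between conditional and unconditional predictions, treats predictions as flat vectors, and uses hypothetical names (`decompose`, `guided_prediction`, and a down-weighting factor `eta`) chosen only for illustration. The rescaling and momentum components mentioned in the abstract are not covered here.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose(diff, cond):
    """Split `diff` into parts parallel and orthogonal to `cond`."""
    coef = dot(diff, cond) / max(dot(cond, cond), 1e-12)
    parallel = [coef * c for c in cond]
    orthogonal = [d - p for d, p in zip(diff, parallel)]
    return parallel, orthogonal


def guided_prediction(cond, uncond, w, eta=0.0):
    """Guided prediction with the parallel part scaled by `eta`.

    Assumption: eta=1.0 recovers standard CFG, eta=0.0 keeps only the
    orthogonal part. The name and role of `eta` are illustrative.
    """
    diff = [c - u for c, u in zip(cond, uncond)]
    parallel, orthogonal = decompose(diff, cond)
    return [
        c + (w - 1.0) * (o + eta * p)
        for c, o, p in zip(cond, orthogonal, parallel)
    ]


# Toy vectors standing in for flattened model predictions.
random.seed(0)
cond = [random.gauss(0.0, 1.0) for _ in range(16)]
uncond = [random.gauss(0.0, 1.0) for _ in range(16)]
w = 7.5

standard_cfg = [u + w * (c - u) for c, u in zip(cond, uncond)]
same_as_cfg = guided_prediction(cond, uncond, w, eta=1.0)
projected = guided_prediction(cond, uncond, w, eta=0.0)
```

With `eta=1.0` the sketch reproduces the standard CFG output, which makes it easy to compare against the down-weighted variant at the same guidance scale. With `eta=0.0` the update added to the conditional prediction has no component along that prediction.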
Peer Reviews
Decision: ICLR 2025 Poster
- The intuition and theory are well-balanced.
- The results are quite promising. Their guidance does not drastically alter the image content, which aligns well with their claims.
- They demonstrate robustness across various types of diffusion models.

- I do not view this as a straightforward rescaling of CFG. However, it might be open to interpretation, and I would be interested to hear the authors’ perspective on this.
- Which model was used for Figure 8?
- Further analysis on the rescale weight would be helpful: does it show any trends with respect to t?
- *(As a suggestion)* Including illustrations of the geometry, along with the related intuitions, would further enhance the clarity.

1. The paper introduces a new method, APG, that addresses a significant problem in diffusion models: oversaturation and artifacts associated with high guidance scales. The approach is innovative and directly tackles a well-known issue in the field.
2. The authors provide extensive experimental results to validate their claims. They demonstrate the effectiveness of APG across various models and samplers, showing improvements in FID, recall, and saturation scores.
3. The paper provides a clear comp

1. The final example generated by APG in Figure 10 has a logical error, where the cat has only one paw.
2. For Table 1, the authors should consider adding HPSv2 and ImageReward metrics to evaluate the generation performance.
3. For Table 1, why is the guidance scale of Stable Diffusion XL so large (w=15)? When reducing the guidance scale, are the performance improvements limited? The authors should provide more quantitative results.
4. The authors claim that the orthogonal component is chiefly r

1. APG is presented as a plug-and-play method with virtually no additional computational overhead, making it attractive for practical use in various diffusion models.
2. The method of down-weighting the parallel component in CFG is novel and interesting, effectively reducing oversaturation and preserving image quality at high guidance scales.

1. In Figure 8, could the authors explain why the proposed APG shows lower precision and higher FID compared to CFG when the guidance scale w is below certain values (e.g., w = 2.5, where APG precision is lower than CFG)?
2. While reverse momentum is introduced, the intuition behind why it improves the image generation quality could be expanded. More detailed theoretical or experimental exploration of this component would clarify its importance.
Taxonomy
Topics: Aerospace and Aviation Technology · Inertial Sensor and Navigation · Infrared Target Detection Methodologies
Methods: Diffusion
