Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales
Seyedmorteza Sadat, Tobias Vontobel, Farnood Salehi, Romann M. Weber

TL;DR
This paper introduces a frequency domain perspective on classifier-free guidance in diffusion models, revealing distinct roles of frequency components and proposing a frequency-decoupled guidance method to improve image quality and diversity.
Contribution
The paper presents a novel frequency domain analysis of CFG and introduces FDG, a new guidance method that applies separate guidance strengths to different frequency components.
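As a rough sketch of what "separate guidance strengths" could look like in symbols: the block below writes standard CFG in one common convention, then splits the guidance term with a linear low-pass operator. The operator $\mathcal{L}$ and the names $D_c$, $D_u$, $w_{\mathrm{low}}$, $w_{\mathrm{high}}$ are notation assumed here for illustration and may differ from the paper's own.

```latex
% Standard CFG (one common convention), with conditional prediction D_c,
% unconditional prediction D_u, and guidance scale w:
\hat{D}_{\mathrm{CFG}} = D_u + w\,(D_c - D_u)

% Split the guidance term with an assumed linear low-pass operator \mathcal{L}:
\Delta = D_c - D_u, \qquad
\Delta_{\mathrm{low}} = \mathcal{L}[\Delta], \qquad
\Delta_{\mathrm{high}} = \Delta - \Delta_{\mathrm{low}}

% Frequency-decoupled form with one scale per band:
\hat{D}_{\mathrm{FDG}} = D_u + w_{\mathrm{low}}\,\Delta_{\mathrm{low}} + w_{\mathrm{high}}\,\Delta_{\mathrm{high}}
```

Under this formulation, setting $w_{\mathrm{low}} = w_{\mathrm{high}} = w$ recovers standard CFG, since $\Delta_{\mathrm{low}} + \Delta_{\mathrm{high}} = \Delta$.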
Findings
FDG improves sample fidelity at low guidance scales.
FDG maintains diversity and improves FID and recall.
Frequency analysis clarifies CFG's effects on image quality.
Abstract
Classifier-free guidance (CFG) has become an essential component of modern conditional diffusion models. Although highly effective in practice, the underlying mechanisms by which CFG enhances quality, detail, and prompt alignment are not fully understood. We present a novel perspective on CFG by analyzing its effects in the frequency domain, showing that low and high frequencies have distinct impacts on generation quality. Specifically, low-frequency guidance governs global structure and condition alignment, while high-frequency guidance mainly enhances visual fidelity. However, applying a uniform scale across all frequencies -- as is done in standard CFG -- leads to oversaturation and reduced diversity at high scales and degraded visual quality at low scales. Based on these insights, we propose frequency-decoupled guidance (FDG), an effective approach that decomposes CFG into low- and…
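The idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the FFT-based `low_pass` filter, the `cutoff` parameter, the function names, and the example scales are all assumptions made for illustration, and the paper's actual frequency decomposition may differ.

```python
import numpy as np

def low_pass(x, cutoff=0.25):
    """Hypothetical low-pass filter: keep spatial frequencies whose radius
    (in cycles per pixel) is at most `cutoff`, via a hard FFT mask."""
    h, w = x.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies in [-0.5, 0.5)
    mask = np.sqrt(fx**2 + fy**2) <= cutoff
    spectrum = np.fft.fft2(x, axes=(-2, -1)) * mask
    return np.real(np.fft.ifft2(spectrum, axes=(-2, -1)))

def cfg(cond, uncond, w):
    """Standard CFG (one common convention): a single scale for all frequencies."""
    return uncond + w * (cond - uncond)

def fdg(cond, uncond, w_low, w_high, cutoff=0.25):
    """Sketch of frequency-decoupled guidance: split the guidance term into
    low- and high-frequency parts and scale each part separately."""
    delta = cond - uncond
    delta_low = low_pass(delta, cutoff)
    delta_high = delta - delta_low    # complement, so low + high == delta
    return uncond + w_low * delta_low + w_high * delta_high

# Toy predictions standing in for a model's conditional/unconditional outputs.
rng = np.random.default_rng(0)
cond = rng.standard_normal((3, 32, 32))
uncond = rng.standard_normal((3, 32, 32))

# Arbitrary illustrative scales: weaker low-frequency, stronger high-frequency guidance.
guided = fdg(cond, uncond, w_low=1.5, w_high=4.0)
```

Because the split is linear and the two bands sum to the original guidance term, using the same scale for both bands reproduces standard CFG, which is a convenient sanity check for any such decomposition.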
Peer Reviews
Decision: Submitted to ICLR 2026
1. I like the presentation of the paper. The motivation, figures, and explanations are clear.
2. The empirical results suggest that the proposed method significantly improves on standard classifier-free guidance.
3. I appreciate the authors' take on explaining the performance of autoguidance and guidance in limited time intervals. It seems convincing to me.
The biggest weakness of the paper, in my opinion, is around the question of whether the comparison with CFG is fair. The authors say (line 310): "hyperparameters used for each experiment are given in Appendix D", and they are provided in Table 12. However, the procedure for selecting these hyperparameters: "the guidance scales are selected in the same way practitioners typically choose CFG values for a model, i.e., generating a few samples and visually inspecting them" makes me doubt that the co
- This paper tackles one of the key issues in CFG, saturation at high guidance scales, and provides a principled way to mitigate it.
- Unlike other guidance methods that require multiple model predictions (often 2×2 = 4 predictions) to combine different forms of guidance, the proposed approach enhances sample quality without additional model calls, maintaining computational efficiency.
- The use of a lightweight frequency-domain decomposition (via Laplacian transform) makes the approach conceptu
- Lack of proper citation. In the Introduction (L084-L098), the discussion lacks references, even though prior works have already analyzed diffusion processes in the frequency domain, such as FreeU (Si et al., 2024) and [a]. These should be cited appropriately to situate this work in the existing literature.
- Ambiguity in frequency computation. It is unclear how the frequency components are calculated. Since intermediate predictions are inherently noisy, it should be explicitly stated whether t
- **Simplicity**: FDG is a plug-and-play and training-free method that only introduces a few additional lines of code with negligible computational cost. This makes FDG a versatile method for general conditional diffusion models.
- **Comprehensive Evaluation**: The authors evaluate FDG's compatibility with various base diffusion models (EDM, SD, etc.), various samplers (DDIM, DPM++, etc.), distilled models (SDXL-Lightning), and other guidance-improvement techniques (CADS, APG, FreeU), showing
- **Discrepancy in FID Reporting**: There appears to be a mismatch between the reported baseline CFG FIDs for models in Table 1 and the FIDs commonly reported in their respective original papers, such as EDM2-S (9.77 vs. 2.23), EDM2-XXL (8.65 vs. 1.81), and DiT-XL/2 (9.31 vs. 2.27). While I have not compared all the other models, the inconsistency in FIDs may stem from the arbitrary guidance scale (e.g., DiT-XL utilizes cfg=1.5 for the best FID, but the paper reports FID with cfg=2.0 according t
Taxonomy
Topics: Ultrasound Imaging and Elastography
