Classifier-free Guidance with Adaptive Scaling
Dawid Malarz, Artur Kasymov, Maciej Zięba, Jacek Tabor, Przemysław Spurek

TL;DR
This paper introduces $\beta$-CFG, an adaptive guidance method for diffusion models that dynamically balances image quality against prompt fidelity, improving FID scores while maintaining CLIP similarity.
Contribution
The paper proposes $\beta$-CFG, a novel adaptive guidance technique that stabilizes guidance and dynamically adjusts its strength during diffusion, enhancing image quality and prompt alignment.
Findings
Improved FID scores over standard CFG.
Maintained CLIP similarity comparable to reference CFG.
Effective dynamic adjustment of guidance during diffusion.
Abstract
Classifier-free guidance (CFG) is an essential mechanism in contemporary text-driven diffusion models. In practice, controlling the impact of guidance exposes a trade-off between the quality of the generated images and their correspondence to the prompt. With strong guidance, generated images fit the conditioning text closely, but at the cost of their quality. Conversely, weak guidance yields high-quality results, but the generated images do not match the prompt. In this paper, we present $\beta$-CFG ($\beta$-adaptive scaling in Classifier-Free Guidance), which controls the impact of guidance during generation to address this trade-off. First, $\beta$-CFG stabilizes the effects of guidance through gradient-based adaptive normalization. Second, $\beta$-CFG uses a family of single-modal ($\beta$-distribution), time-dependent curves to dynamically adapt the trade-off…
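The two ingredients named in the abstract can be illustrated with a minimal sketch. The `cfg_step` function is the standard CFG combination of unconditional and conditional noise predictions. Everything in `adaptive_cfg_step` is an assumption made for illustration and is not taken from the paper: the L2 normalization of the guidance direction, the use of the Beta density as the time-dependent weight, and the parameter names `base_scale`, `a`, and `b` are all hypothetical. The time variable `t` is assumed to be rescaled to [0, 1].

```python
import math
import numpy as np

def beta_pdf(t, a, b):
    """Density of the Beta(a, b) distribution at t in [0, 1]."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * t ** (a - 1) * (1 - t) ** (b - 1)

def cfg_step(eps_uncond, eps_cond, scale):
    """Standard CFG: extrapolate from the unconditional prediction
    toward the conditional one by a fixed guidance scale."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def adaptive_cfg_step(eps_uncond, eps_cond, t,
                      base_scale=7.5, a=2.0, b=2.0, tiny=1e-8):
    """Hypothetical adaptive variant (an assumption, not the paper's formula):
    the guidance direction is L2-normalized and weighted by a
    single-modal, Beta-shaped function of the normalized time t."""
    diff = eps_cond - eps_uncond
    direction = diff / (np.linalg.norm(diff) + tiny)  # assumed normalization
    weight = base_scale * beta_pdf(t, a, b)           # assumed time-dependent curve
    return eps_uncond + weight * direction

# Usage: with a = b = 2 the weight vanishes at both ends of the
# trajectory and peaks in the middle.
eps_u = np.zeros(4)
eps_c = np.array([3.0, 4.0, 0.0, 0.0])
for t in (0.0, 0.5, 1.0):
    print(t, adaptive_cfg_step(eps_u, eps_c, t, base_scale=1.0))
```

Under these assumptions, guidance is weak at the start and end of sampling and strongest midway, which is one way a single-modal curve could trade prompt fidelity against image quality over time. The shape parameters `a` and `b` shift where the peak falls.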
Taxonomy
Topics: Inertial Sensor and Navigation · Advanced Measurement and Metrology Techniques · Astronomical Observations and Instrumentation
Methods: Diffusion · Contrastive Language-Image Pre-training
