Guiding a Diffusion Model with a Bad Version of Itself
Tero Karras, Miika Aittala, Tuomas Kynkäänniemi, Jaakko Lehtinen, Timo Aila, Samuli Laine

TL;DR
This paper introduces a novel guidance method for diffusion models that uses a less-trained version of the model itself, enabling disentangled control over image quality and variation, and achieving state-of-the-art results on ImageNet.
Contribution
It demonstrates that guiding with a weaker model version allows independent control of image quality and diversity, improving generation results without sacrificing variation.
Findings
Set record FIDs on ImageNet: 1.01 at 64x64 and 1.25 at 512x512.
Guidance with a weaker model improves image quality in unconditional diffusion models.
The method is effective using publicly available networks.
Abstract
The primary axes of interest in image-generating diffusion models are image quality, the amount of variation in the results, and how well the results align with a given condition, e.g., a class label or a text prompt. The popular classifier-free guidance approach uses an unconditional model to guide a conditional model, leading to simultaneously better prompt alignment and higher-quality images at the cost of reduced variation. These effects seem inherently entangled, and thus hard to control. We make the surprising observation that it is possible to obtain disentangled control over image quality without compromising the amount of variation by guiding generation using a smaller, less-trained version of the model itself rather than an unconditional model. This leads to significant improvements in ImageNet generation, setting record FIDs of 1.01 for 64x64 and 1.25 for 512x512, using…
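The guidance idea in the abstract can be illustrated with a minimal sketch. It assumes the usual guidance extrapolation form, in which the output of a guiding model is pushed toward and past the output of the main model by a weight `w`. The listing does not state the formula, so that form is an assumption here. The names `guided_denoise`, `toy_main`, and `toy_guide` are hypothetical stand-ins, not the authors' code, and the toy denoisers are plain functions rather than neural networks.

```python
from typing import Callable, List, Sequence

# A denoiser maps a noisy sample and a noise level to a denoised estimate.
Denoiser = Callable[[Sequence[float], float], List[float]]


def guided_denoise(
    x: Sequence[float],
    sigma: float,
    main_model: Denoiser,
    guide_model: Denoiser,
    w: float,
) -> List[float]:
    """Extrapolate from the guiding model's output toward the main model's.

    Assumed form: guide + w * (main - guide).
    w = 1 reproduces the main model; w > 1 pushes further away from the guide.
    """
    d_main = main_model(x, sigma)
    d_guide = guide_model(x, sigma)
    return [g + w * (m - g) for m, g in zip(d_main, d_guide)]


# Hypothetical toy denoisers. In the setting the abstract describes, the guide
# would be a smaller, less-trained version of the main model with the same
# conditioning, rather than an unconditional model as in classifier-free guidance.
def toy_main(x: Sequence[float], sigma: float) -> List[float]:
    return [0.9 * v for v in x]


def toy_guide(x: Sequence[float], sigma: float) -> List[float]:
    return [0.5 * v for v in x]


x = [1.0, -2.0]
unguided = guided_denoise(x, 1.0, toy_main, toy_guide, w=1.0)
guided = guided_denoise(x, 1.0, toy_main, toy_guide, w=2.0)
```

In this sketch the only difference from classifier-free guidance is which model is passed as `guide_model`; the extrapolation step itself is unchanged.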
Peer Reviews
Decision: NeurIPS 2024 oral
- The paper studies an important topic. Since CFG is widely used in current diffusion models, overcoming its shortcomings will have a noticeable impact in the future.
- The method is well-motivated through controlled experiments that shed light on the behavior of CFG and how autoguidance improves it in those aspects.
- The experiments are well-organized and clearly demonstrate the impact of different components in autoguidance.
- The paper is well-written and enjoyable to read.
- More visual examples are needed to show how the diversity of generations changes as the guidance scale increases. In the final version, please include a batch of examples with a fixed condition and compare the sampling with CFG and autoguidance to better demonstrate the disentanglement between image quality and diversity in autoguidance.
- The method is not readily applicable to pretrained diffusion models such as Stable Diffusion. This might limit the current use cases of autoguidance. Howev
Taxonomy
Topics: Advanced Neuroimaging Techniques and Applications · Generative Adversarial Networks and Image Synthesis · Medical Image Segmentation Techniques
Methods: ALIGN · Diffusion
