Compress Guidance in Conditional Diffusion Sampling
Anh-Dung Dinh, Daochang Liu, Chang Xu

TL;DR
This paper introduces Compress Guidance, a method that spreads a small amount of guidance across many timesteps during diffusion sampling. It improves image quality and diversity while cutting the number of guided timesteps by nearly 40%, which mitigates the model-fitting issue that arises when guidance is enforced throughout sampling.
Contribution
It proposes a novel guidance distribution strategy that mitigates model-fitting issues and enhances diffusion sampling efficiency and quality.
Findings
Reducing guidance at many timesteps improves image quality.
Distributing guidance over more steps reduces total guidance needed.
The method outperforms baseline models on multiple benchmarks.
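The idea of guiding only a subset of sampling timesteps can be illustrated with a small sketch. The code below is not the authors' Compress Guidance algorithm. It only builds boolean masks for the two simpler baselines that the peer reviews mention, Early Stopping and Uniform Skipping, under the assumption that a sampler consults the mask at each step to decide whether to apply guidance. All function names and parameters here are hypothetical.

```python
from typing import List


def early_stopping_schedule(num_steps: int, num_guided: int) -> List[bool]:
    """Guide only the first `num_guided` sampling steps, then turn guidance off."""
    if not 0 <= num_guided <= num_steps:
        raise ValueError("num_guided must be between 0 and num_steps")
    return [i < num_guided for i in range(num_steps)]


def uniform_skipping_schedule(num_steps: int, num_guided: int) -> List[bool]:
    """Spread `num_guided` guided steps evenly over all `num_steps` steps."""
    if not 0 <= num_guided <= num_steps:
        raise ValueError("num_guided must be between 0 and num_steps")
    # Integer arithmetic keeps the chosen indices distinct and evenly spaced.
    guided = {(k * num_steps) // num_guided for k in range(num_guided)}
    return [i in guided for i in range(num_steps)]


def guided_fraction(schedule: List[bool]) -> float:
    """Fraction of sampling steps on which guidance is applied."""
    return sum(schedule) / len(schedule) if schedule else 0.0


if __name__ == "__main__":
    # Hypothetical setting: 10 sampling steps, 6 of them guided.
    mask = uniform_skipping_schedule(10, 6)
    print([i for i, g in enumerate(mask) if g])
    print(guided_fraction(mask))
```

In a real sampler, the mask would gate the call to the classifier or guidance gradient at each step. How the skipped steps are compensated for is specific to the paper's method and is not modeled here.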
Abstract
We found that enforcing guidance throughout the sampling process is often counterproductive due to the model-fitting issue, where samples are 'tuned' to match the classifier's parameters rather than generalizing the expected condition. This work identifies and quantifies the problem, demonstrating that reducing or excluding guidance at numerous timesteps can mitigate this issue. By distributing a small amount of guidance over a large number of sampling timesteps, we observe a significant improvement in image quality and diversity while also reducing the required guidance timesteps by nearly 40%. This approach addresses a major challenge in applying guidance effectively to generative tasks. Consequently, our proposed method, termed Compress Guidance, allows for the exclusion of a substantial number of guidance timesteps while still surpassing baseline models in image quality. We validate…
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The paper presents the model fitting problem of classifier guidance with experimental evidence. 2. It proposes various alternative methods to address this problem, demonstrating that the proposed compress guidance is the most effective. 3. The approach shows numerical improvements across various models and guidance scenarios.
1. Lack of analysis regarding the relationship with the ODE sampler: This method inherently requires more sampling steps to function effectively. A performance comparison based on SNR variation via the ODE sampler seems necessary.
The way the paper sets up the model-fitting problem and the on-sampling/off-sampling losses may be novel, but with the current presentation it is hard to understand.
The paper seems to have been written in haste. There are too many typos and grammatical errors, which hurt readability. Many table and figure references are incorrect, which adds another barrier. Many quotation marks are wrong. The main idea proposed here is model fitting, but from the definition I cannot tell exactly what the authors mean by it; it is too loosely defined. Also, in the off-sampling loss, I don't know what phi' is or how it is obtained. With thi
Pros: - The paper addresses a significant issue: guidance in diffusion models. - The experiments are well-structured and organized.
Cons: - The motivation for compressing guidance is unclear. The paper fails to adequately demonstrate through experiments the weaknesses of uncompressed guidance, such as model-fitting issues and poor image quality. - Distributing guidance across different timesteps presents a vast search space, which is a significant challenge. - The method described in section 3.3 doesn't seem as simple as claimed in the paper's contributions. - The table format is uncomfortable to read and appears inconsiste
1. This work provides clear theoretical and experimental explanations for model-fitting. The explanation is convincing. 2. The method is simple but effective, it is not hard to implement technically. 3. This paper includes sufficient experiments including U-Net and Transformer-based Diffusion models.
1. As the key contribution and observation of this paper, the model-fitting experiments shown in Figure 2 are not well explained (for example, the details of the model used and the dataset setting). Moreover, the main concern is whether the conclusion from Figure 2 remains valid for different model architectures and datasets. 2. The main assumption of this method is that the gradient of guidance should be concentrated in the early stages. Ignoring the later-stage guidance means les
- The paper is well structured, and the motivation is clear. - The comparison with Early Stopping and Uniform Skipping is intuitive and easy to follow. - The newly proposed Compress Guidance encourages further work on guided sampling theory.
- There are many theoretical flaws in the paper, and some conclusions that are not obvious are drawn without clarification. 1. The proof of Thm. 1 is wrong. The forward process of a diffusion model is technically a Markov process; therefore, one cannot assume that two $\mathbf{x}_t$ at different timesteps $t_1$ and $t_2$ are diffused with the same noise $\epsilon$. Besides, the noise prediction is not independent of $\epsilon$, so one cannot assume that $|\epsilon - \epsilon(\mathbf{x}_t,t)|$ ha
Taxonomy
Topics: Survey Sampling and Estimation Techniques · Bayesian Methods and Mixture Models · Sparse and Compressive Sensing Techniques
