TL;DR
This paper identifies a bias in diffusion probabilistic models related to SNR and timestep during inference, provides empirical and theoretical analysis, and proposes a correction method that improves generation quality with minimal overhead.
Contribution
The paper identifies the SNR-t bias in diffusion models (a mismatch during inference between a sample's SNR and the timestep it is assigned), analyzes its causes, and introduces a frequency-based differential correction technique to mitigate it.
Findings
The SNR-t bias causes error accumulation during inference.
The proposed correction method improves generation quality with minimal overhead.
The approach works across various diffusion models and datasets.
Abstract
Diffusion Probabilistic Models have demonstrated remarkable performance across a wide range of generative tasks. However, we have observed that these models often suffer from a Signal-to-Noise Ratio-timestep (SNR-t) bias. This bias refers to the misalignment between the SNR of the denoising sample and its corresponding timestep during the inference phase. Specifically, during training, the SNR of a sample is strictly coupled with its timestep. However, this correspondence is disrupted during inference, leading to error accumulation and impairing the generation quality. We provide comprehensive empirical evidence and theoretical analysis to substantiate this phenomenon and propose a simple yet effective differential correction method to mitigate the SNR-t bias. Recognizing that diffusion models typically reconstruct low-frequency components before focusing on high-frequency details…
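To make the training-time coupling between SNR and timestep concrete, here is a minimal sketch. It is not the paper's code: it assumes the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, for which SNR(t) = alpha_bar_t / (1 - alpha_bar_t), and a linear beta schedule with commonly used default values. The helper names `snr_schedule` and `nearest_timestep` and the 10% SNR drop are hypothetical, chosen only for illustration.

```python
import numpy as np


def snr_schedule(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """SNR at each training timestep for a linear beta schedule (assumed, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention
    return alpha_bar / (1.0 - alpha_bar)  # SNR(t) = alpha_bar_t / (1 - alpha_bar_t)


def nearest_timestep(snr_value, snr):
    """Timestep whose training-time SNR is closest (in log space) to snr_value."""
    return int(np.argmin(np.abs(np.log(snr) - np.log(snr_value))))


snr = snr_schedule()

# During training, each timestep t maps to exactly one SNR value.
t = 500

# Hypothetical inference-time drift: the sample's actual SNR is 10% lower
# than the value the schedule prescribes for the timestep fed to the network.
biased_snr = 0.9 * snr[t]
t_match = nearest_timestep(biased_snr, snr)

print(f"nominal timestep: {t}, timestep matching the drifted SNR: {t_match}")
```

Under these assumptions, the drifted sample looks to the network like a sample from a later (noisier) timestep than the one it is conditioned on. That is the kind of misalignment the abstract describes; the size and direction of the drift here are illustrative only, not measurements from the paper.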
