Elucidating the Exposure Bias in Diffusion Models
Mang Ning, Mingxiao Li, Jianlin Su, Albert Ali Salah, Itir Onal Ertugrul

TL;DR
This paper investigates the exposure bias in diffusion models, analytically models its causes, and proposes a simple training-free method called Epsilon Scaling to reduce this bias, improving sampling quality.
Contribution
It provides a systematic analysis of exposure bias in diffusion models and introduces Epsilon Scaling, a novel, effective, training-free technique to mitigate this bias during sampling.
Findings
Epsilon Scaling reduces exposure bias by scaling down network outputs.
The method improves sampling quality across various diffusion frameworks.
The method achieves state-of-the-art results on CIFAR-10 with fewer sampling steps.
Abstract
Diffusion models have demonstrated impressive generative capabilities, but their exposure bias problem, described as the input mismatch between training and sampling, lacks in-depth exploration. In this paper, we systematically investigate the exposure bias problem in diffusion models by first analytically modelling the sampling distribution, based on which we then identify the prediction error at each sampling step as the root cause of the exposure bias issue. Furthermore, we discuss potential solutions to this issue and propose an intuitive metric for it. Along with the elucidation of exposure bias, we propose a simple, yet effective, training-free method called Epsilon Scaling to alleviate the exposure bias. We show that Epsilon Scaling explicitly moves the sampling trajectory closer to the vector field learned in the training phase by scaling down the network output,…
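To make "scaling down the network output" concrete, here is a minimal NumPy sketch of ancestral DDPM sampling in which every noise prediction is divided by a factor $\lambda_t \ge 1$ before it enters the update. It is an illustration under stated assumptions, not the authors' code: it assumes an $\epsilon$-prediction network, a linear $\beta$ schedule, posterior variance $\sigma_t^2 = \beta_t$, and a hypothetical schedule $\lambda_t = k t + b$ whose default values are placeholders. The names `eps_model`, `linear_lambda`, `ddpm_sample_es`, and the toy model are all hypothetical.

```python
import numpy as np


def epsilon_scaling(eps, lam_t):
    """Scale down the predicted noise by lambda_t (>= 1 shrinks it)."""
    return eps / lam_t


def linear_lambda(t, k=0.0, b=1.005):
    """Hypothetical schedule lambda_t = k * t + b; k = 0 gives a constant."""
    return k * t + b


def ddpm_sample_es(eps_model, shape, betas, lam=linear_lambda, seed=0):
    """Ancestral DDPM sampling with Epsilon Scaling on every network call."""
    rng = np.random.default_rng(seed)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        eps = epsilon_scaling(eps_model(x, t), lam(t))
        # Standard DDPM posterior mean expressed through the (scaled) noise prediction
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x


# Usage with a hypothetical stand-in for a trained network.
betas = np.linspace(1e-4, 0.02, 50)
toy_model = lambda x, t: 0.5 * x
sample = ddpm_sample_es(toy_model, (4,), betas)
```

Because the change is a single division inside the loop, the same idea could in principle be dropped into other samplers that consume an $\epsilon$ prediction; only the network output is touched, so no retraining is involved.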
Peer Reviews
Decision: ICLR 2024 poster
* The background and motivation sections are clearly written: they first establish the need to calibrate the expected noise term in diffusion sampling, and then compare the existing method with the newly proposed one.
* To the best of our knowledge, this paper proposes the first training-free method that calibrates the distribution drift (i.e., exposure bias) by modifying the neural network output with some dataset statistics.
* The experimental section showed that this training-free method
* Although this method is introduced as simulation-free, it is not completely simulation-free: to determine $\lambda_t$ for each timestep, one computes dataset statistics over all intermediate trajectory particles, which requires some simulation (though not heavy).
* The assumption that the model output $x_\theta^t$ averages to the true $x_0$ should be verified further.
* In the experimental section, only constant or linearized values are used as the scaling sche
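One way to turn per-timestep measurements into the constant or linear $\lambda_t$ schedules the reviewer mentions is an ordinary least-squares fit. The sketch below is our assumption, not necessarily how the authors chose $\lambda_t$: `fit_lambda_schedules` is a hypothetical helper, and the synthetic ratios exist only to show the call.

```python
import numpy as np


def fit_lambda_schedules(timesteps, ratios):
    """Least-squares fits of a constant and a linear lambda_t schedule.

    `ratios` are hypothetical per-timestep measurements of how much larger
    the network output is during sampling than during training.
    Returns (b_const, (k, b_lin)) so lambda_t = b_const or k * t + b_lin.
    """
    t = np.asarray(timesteps, dtype=float)
    r = np.asarray(ratios, dtype=float)
    b_const = float(r.mean())           # best constant fit in the L2 sense
    k, b_lin = np.polyfit(t, r, deg=1)  # polyfit returns slope, then intercept
    return b_const, (float(k), float(b_lin))


# Usage with synthetic ratios that grow linearly with t (illustrative only).
steps = np.arange(1000)
measured = 1.002 + 1e-5 * steps
b_const, (k, b_lin) = fit_lambda_schedules(steps, measured)
```

The constant fit needs only a single scalar, while the linear fit adds one slope parameter; richer schedules would need more measurements, which is where the simulation cost the reviewer notes comes in.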
- The paper addresses an important issue in diffusion models, where initial error accumulation can degrade the quality of the generated samples.
- It introduces a method that uses the empirical $\ell_2$ ratio, computed during both the training and sampling phases, to decide the appropriate scaling factor. This technique is straightforward yet effective.
- Extensive experiments show that the proposed method consistently improves pre-trained models across datasets.
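The $\ell_2$ ratio the reviewer describes can be sketched as the mean per-sample norm of the network output during sampling divided by the same quantity during training, at a fixed timestep. We assume the ratio is taken as sampling over training, so a value above 1 becomes the divisor $\lambda_t$ that scales the output down. `l2_norm_ratio` is a hypothetical name, and the random arrays merely stand in for real network outputs.

```python
import numpy as np


def l2_norm_ratio(eps_sampling, eps_training):
    """Ratio of mean per-sample L2 norms: sampling-phase / training-phase outputs.

    Both arrays have shape (batch, ...) and hold network outputs at the same t.
    A ratio above 1 means the sampler sees larger predicted noise than training.
    """
    def mean_norm(a):
        a = np.asarray(a, dtype=float)
        return np.linalg.norm(a.reshape(a.shape[0], -1), axis=1).mean()

    return mean_norm(eps_sampling) / mean_norm(eps_training)


# Illustrative inputs standing in for real network outputs at one timestep.
rng = np.random.default_rng(0)
train_out = rng.standard_normal((64, 3, 8, 8))
sample_out = 1.01 * train_out
lam_t = l2_norm_ratio(sample_out, train_out)  # candidate divisor lambda_t
```

Averaging norms over a batch, rather than comparing individual samples, keeps the estimate a single scalar per timestep, which matches the constant or linear schedules discussed in the reviews.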
- Several prior works have observed and identified the "exposure bias" problem studied in the current paper. It would be helpful to discuss them in the paper: (1) Section 4 (practical considerations) and Fig 13 in EDM [1] points out that the neural network tends to remove slightly too much noise. Hence they use an inflated noise to counteract it. I think adding $S_{noise}$ into ODE or SDE samplers is a valid baseline for the current paper. (2) Section 4.2 in PFGM [2] / Section 5, Fig 4.b in PFGM
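For comparison with the $S_{noise}$ baseline the reviewer suggests, here is a sketch of the noise-injection ("churn") half of the stochastic sampler in EDM [1], written from our reading of that paper: the noise level is raised from $t$ to $\hat t = (1+\gamma)t$ and fresh noise drawn from $\mathcal{N}(0, S_{noise}^2 I)$ is added. The function name and default values are illustrative assumptions, and the subsequent denoising step is omitted.

```python
import numpy as np


def edm_noise_injection(x, t, rng, s_churn=0.0, s_noise=1.0, num_steps=18,
                        s_tmin=0.0, s_tmax=float("inf")):
    """Noise-injection step of an EDM-style stochastic sampler (sketch).

    Raises the noise level from t to t_hat = (1 + gamma) * t and adds fresh
    noise from N(0, s_noise**2 I); s_noise slightly above 1 inflates the
    added noise to offset a denoiser that removes a little too much.
    """
    if s_tmin <= t <= s_tmax:
        gamma = min(s_churn / num_steps, np.sqrt(2.0) - 1.0)
    else:
        gamma = 0.0
    t_hat = t + gamma * t
    eps = s_noise * rng.standard_normal(np.shape(x))
    x_hat = x + np.sqrt(t_hat**2 - t**2) * eps
    return x_hat, t_hat


# Usage: with s_churn = 0 the step leaves x and t unchanged.
rng = np.random.default_rng(0)
x = np.zeros(5)
x_hat, t_hat = edm_noise_injection(x, 2.0, rng)
```

Note the difference in mechanism: this baseline inflates the injected noise, whereas Epsilon Scaling shrinks the network's noise prediction.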
1. It is interesting and insightful to explore the exposure bias problem in diffusion models in depth. This paper connects exposure bias with prediction error and gives expressions for the prediction error.
2. To solve the exposure bias in a learning-free manner, this paper proposes scaling the noise prediction in the sampling process to match the noise prediction in the training process.
3. Solid experiments. Extensive experiments demonstrate the generality of Epsilon Scaling and
1. The main concern lies in the assumption at the start of the derivation. In Section 3.2, the authors assume that the reconstructed image $x_\theta^t$ in the sampling process follows a Gaussian distribution whose mean is the GT image $x_0$ and whose deviation from it is Gaussian noise. This conflicts with some intuitive observations. For example, the reconstructed image $x_\theta^t$ is often a degraded version of the GT image, and the mean of $x_\theta^t$ differs from $x_0$.
2. Some formulatio
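The Gaussian assumption under discussion can be probed numerically. Below is our own Monte Carlo sketch, not code from the paper: assume a standard DDPM linear $\beta$ schedule (illustrative values), draw $x_t$ from the training distribution, replace $x_0$ in the posterior mean by $x_\theta^t = x_0 + e_t \epsilon$ with a hypothetical error scale $e_t$, and compare the empirical variance of $x_{t-1}$ with the closed form $(1-\bar\alpha_{t-1}) + \mu_1^2 e_t^2$, where $\mu_1 = \sqrt{\bar\alpha_{t-1}}\,\beta_t / (1-\bar\alpha_t)$ is the posterior-mean coefficient on $x_0$.

```python
import numpy as np


def one_step_variance(t, e_t, n=200_000, x0_value=0.3, seed=0,
                      num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Monte Carlo check of Var(x_{t-1}) when x_theta = x_0 + e_t * noise.

    Returns (empirical variance, predicted variance, training variance).
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)
    abar_t, abar_prev = abar[t], abar[t - 1]

    x0 = np.full(n, x0_value)
    # Network input drawn from the training distribution q(x_t | x_0).
    x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * rng.standard_normal(n)
    # Assumption under test: x_theta is x_0 plus independent Gaussian error.
    x_theta = x0 + e_t * rng.standard_normal(n)

    # Coefficients and variance of the DDPM posterior q(x_{t-1} | x_t, x_0).
    mu1 = np.sqrt(abar_prev) * betas[t] / (1.0 - abar_t)
    mu2 = np.sqrt(alphas[t]) * (1.0 - abar_prev) / (1.0 - abar_t)
    var_post = (1.0 - abar_prev) / (1.0 - abar_t) * betas[t]

    x_prev = mu1 * x_theta + mu2 * x_t + np.sqrt(var_post) * rng.standard_normal(n)
    predicted = (1.0 - abar_prev) + (mu1 * e_t) ** 2
    return x_prev.var(), predicted, 1.0 - abar_prev


emp, pred, train = one_step_variance(t=10, e_t=0.5)
```

Under this assumption the sampling-time variance exceeds the training-time value $1-\bar\alpha_{t-1}$, which is the kind of input mismatch the paper calls exposure bias. If, as the reviewer argues, $x_\theta^t$ is biased rather than centred on $x_0$, the mean would shift as well and this closed form would no longer hold.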
Taxonomy
Topics: Advanced Neuroimaging Techniques and Applications · Model Reduction and Neural Networks · Advanced Mathematical Modeling in Engineering
Methods: Diffusion
