Diffusion Models With Learned Adaptive Noise
Subham Sekhar Sahoo, Aaron Gokaslan, Chris De Sa, Volodymyr Kuleshov

TL;DR
This paper introduces MULAN, a learned adaptive noise process for diffusion models that improves likelihood estimation and achieves state-of-the-art results on CIFAR-10 and ImageNet, while reducing training steps.
Contribution
It proposes a multivariate learned adaptive noise schedule that makes the ELBO dependent on the noise process, challenging previous invariance assumptions.
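For context, the invariance being challenged can be written out for the univariate case. The block below is a sketch recalled from the standard continuous-time variational diffusion formulation, not taken from this page; the symbols ($\mathrm{SNR}(t)$ for the signal-to-noise ratio at time $t$, $\hat{x}_\theta$ for the denoiser, $z_t$ for the noised latent) are assumptions introduced for illustration.

```latex
% Continuous-time diffusion loss for a scalar (univariate) noise schedule
\mathcal{L}_\infty(x)
  = -\frac{1}{2}\,\mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}
    \int_0^1 \mathrm{SNR}'(t)\,
    \bigl\lVert x - \hat{x}_\theta(z_t; t) \bigr\rVert_2^2 \, dt

% Change of variables v = SNR(t), with \tilde{x}_\theta(z_v, v) the denoiser re-indexed by v
\mathcal{L}_\infty(x)
  = \frac{1}{2}\,\mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}
    \int_{\mathrm{SNR}_{\min}}^{\mathrm{SNR}_{\max}}
    \bigl\lVert x - \tilde{x}_\theta(z_v, v) \bigr\rVert_2^2 \, dv
```

In the second form the schedule enters only through the endpoints $\mathrm{SNR}_{\min}$ and $\mathrm{SNR}_{\max}$, so two scalar schedules that share endpoints give the same loss. With a multivariate schedule each dimension has its own signal-to-noise ratio, no single change of variables applies, and the loss can depend on the whole path between the endpoints.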
Findings
Sets new state-of-the-art density estimation on CIFAR-10 and ImageNet.
Reduces training steps by 50%.
Demonstrates the effectiveness of learned adaptive noise in diffusion models.
Abstract
Diffusion models have gained traction as powerful algorithms for synthesizing high-quality images. Central to these algorithms is the diffusion process, a set of equations which maps data to noise in a way that can significantly affect performance. In this paper, we explore whether the diffusion process can be learned from data. Our work is grounded in Bayesian inference and seeks to improve log-likelihood estimation by casting the learned diffusion process as an approximate variational posterior that yields a tighter lower bound (ELBO) on the likelihood. A widely held assumption is that the ELBO is invariant to the noise process: our work dispels this assumption and proposes multivariate learned adaptive noise (MULAN), a learned diffusion process that applies noise at different rates across an image. Specifically, our method relies on a multivariate noise schedule that is a function of…
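To make "applies noise at different rates across an image" concrete, here is a minimal sketch of a variance-preserving forward step in which each pixel has its own noise schedule. It is an illustration under stated assumptions, not the authors' implementation: the functional form `gamma`, the endpoint values, and the hand-picked per-pixel `rates` are all hypothetical, whereas the paper learns its schedule.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gamma(t, rate, gamma_min=-10.0, gamma_max=10.0):
    """Hypothetical per-pixel schedule (an assumption, not the paper's form).

    Every pixel shares the same endpoints at t=0 and t=1, but `rate`
    controls how quickly that pixel moves between them.
    """
    return gamma_min + (gamma_max - gamma_min) * t ** rate


def forward_noise(pixels, rates, t, rng):
    """Noise each pixel with its own (alpha, sigma), where alpha^2 + sigma^2 = 1."""
    noisy, sigmas = [], []
    for x, r in zip(pixels, rates):
        g = gamma(t, r)
        alpha = math.sqrt(sigmoid(-g))  # signal scale for this pixel
        sigma = math.sqrt(sigmoid(g))   # noise scale for this pixel
        noisy.append(alpha * x + sigma * rng.gauss(0.0, 1.0))
        sigmas.append(sigma)
    return noisy, sigmas


rng = random.Random(0)
pixels = [0.8, -0.3, 0.5, 0.1]   # a flattened toy "image"
rates = [0.5, 1.0, 1.5, 2.0]     # hand-picked here; learned in the paper
noisy, sigmas = forward_noise(pixels, rates, 0.5, rng)
```

At the midpoint `t = 0.5`, pixels with a smaller `rate` already carry more noise than pixels with a larger one, even though all pixels start and end at the same noise level. A univariate schedule corresponds to the special case where every entry of `rates` is equal.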
Peer Reviews
Decision: NeurIPS 2024 spotlight
- The paper is well-written and easy to follow.
- The authors present a theoretical motivation for making the noise schedule multivariate and learnable from a variational inference perspective, showing that it can impact the ELBO of the model unlike a univariate schedule.
- The proposed method demonstrates state-of-the-art likelihood estimation results.
- The authors conduct an extensive ablation study to thoroughly investigate the proposed method. Overall, I believe MuLAN is a well-motivated a
- According to the experimental results provided, the only benefit of MuLAN is improved likelihood estimation. The experiments are conducted exclusively in the image domain, but MuLAN does not yield better FID scores compared to prior works. As discussed in Section 3.1, a model with better likelihood estimation may be useful for tasks like compression or adversarial example detection. However, the authors only provide likelihood estimation results and do not demonstrate improved practical outcom
The ELBO perspective on diffusion models has drawn much attention since diffusion models were introduced. This paper makes a non-trivial extension to the current framework and shows that the well-known assumption that the noise schedule does not alter the ELBO no longer holds. This observation gives additional flexibility to increase the ELBO, resulting in better density estimation performance.
1. In section 3.5, the authors discussed why the generalization makes the ELBO rely on the entire trajectory; however, they state the fact without giving any intuitive explanations. The authors should discuss this issue more. In addition, providing some toy examples to clearly show how the extended framework is differentiated from the existing one could significantly improve the paper's presentation.
2. The FIDs of the proposed model are significantly worse than the existing diffusion models.
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Medical Image Segmentation Techniques · Radiomics and Machine Learning in Medical Imaging
Methods: Sparse Evolutionary Training · Diffusion
