Stable Target Field for Reduced Variance Score Estimation in Diffusion Models
Yilun Xu, Shangyuan Tong, Tommi Jaakkola

TL;DR
This paper introduces a stable target field method for diffusion models that reduces the variance of training targets by computing them from a reference batch. The method improves image quality, training stability, and training speed, and reaches a state-of-the-art FID score of 1.90 on CIFAR-10.
Contribution
The paper proposes a novel stable target calculation using a reference batch to reduce variance in diffusion model training, enhancing performance and stability.
Findings
Reduced covariance of training targets improves model stability.
Enhanced image quality and training speed across datasets.
Achieved state-of-the-art FID score of 1.90 on CIFAR-10.
Abstract
Diffusion models generate samples by reversing a fixed forward diffusion process. Despite already providing impressive empirical results, these diffusion model algorithms can be further improved by reducing the variance of the training targets in their denoising score-matching objective. We argue that the source of such variance lies in the handling of intermediate noise-variance scales, where multiple modes in the data affect the direction of reverse paths. We propose to remedy the problem by incorporating a reference batch, which we use to calculate weighted conditional scores as more stable training targets. We show that the procedure indeed helps in the challenging intermediate regime by reducing (the trace of) the covariance of training targets. The new stable targets can be seen as trading bias for reduced variance, where the bias vanishes with increasing reference batch size.…
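The weighted conditional score described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a Gaussian perturbation kernel p(x_t | x_k) = N(x_k, sigma^2 I), so each conditional score is (x_k - x_t) / sigma^2, and it assumes the weights are the kernel densities normalized over the reference batch. The names `stable_target`, `ref_batch`, and `sigma` are hypothetical.

```python
import numpy as np


def stable_target(x_t, ref_batch, sigma):
    """Weighted average of conditional scores over a reference batch.

    Assumption (not taken from the summary above): the perturbation kernel is
    Gaussian, p(x_t | x_k) = N(x_k, sigma^2 I), so the conditional score is
    (x_k - x_t) / sigma^2 and the weight of x_k is its normalized density.
    """
    diffs = ref_batch - x_t                                # shape (n, d)
    log_w = -np.sum(diffs ** 2, axis=1) / (2.0 * sigma ** 2)
    log_w -= log_w.max()                                   # avoid underflow in exp
    weights = np.exp(log_w)
    weights /= weights.sum()                               # self-normalized weights
    cond_scores = diffs / sigma ** 2                       # one score per reference point
    return weights @ cond_scores                           # shape (d,)


rng = np.random.default_rng(0)
ref_batch = rng.normal(size=(8, 2))       # toy reference batch; row 0 is the clean sample
sigma = 1.5                               # an intermediate noise scale (illustrative)
x_t = ref_batch[0] + sigma * rng.normal(size=2)

# Standard denoising score-matching target: uses only the clean sample.
dsm_target = (ref_batch[0] - x_t) / sigma ** 2
# Stable target: averages conditional scores over the whole reference batch.
stf_target = stable_target(x_t, ref_batch, sigma)
```

With a reference batch containing only the clean sample, this sketch reduces to the usual denoising score-matching target. Larger batches average over nearby data points, which is the bias-for-variance trade the abstract describes.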
Taxonomy
Topics: Advanced Neuroimaging Techniques and Applications · Advanced Mathematical Modeling in Engineering · MRI in cancer diagnosis
Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
