Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training
Yunshu Wu, Yingtao Luo, Xianghao Kong, Evangelos E. Papalexakis, Greg Ver Steeg

TL;DR
This paper reveals that diffusion models function as noise classifiers and introduces contrastive training to improve denoising, especially in out-of-distribution regions, enhancing sample quality and sampling speed.
Contribution
The authors propose a novel contrastive training objective that leverages the implicit log-likelihood ratio in diffusion models to improve out-of-distribution denoising performance.
Findings
Contrastive training improves OOD denoising.
Enhanced parallel sampling performance.
Significant speedup in sample generation.
Abstract
Diffusion models learn to denoise data and the trained denoiser is then used to generate new samples from the data distribution. In this paper, we revisit the diffusion sampling process and identify a fundamental cause of sample quality degradation: the denoiser is poorly estimated in regions that are far Outside Of the training Distribution (OOD), and the sampling process inevitably evaluates in these OOD regions. This can become problematic for all sampling methods, especially when we move to parallel sampling which requires us to initialize and update the entire sample trajectory of dynamics in parallel, leading to many OOD evaluations. To address this problem, we introduce a new self-supervised training objective that differentiates the levels of noise added to a sample, leading to improved OOD denoising performance. The approach is based on our observation that diffusion models…
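The idea of telling noise levels apart can be illustrated with a toy case. The sketch below is not the paper's training objective. It assumes 1-D data drawn from a standard normal, so the noisy marginal at each noise level is known in closed form, and it classifies which of two noise levels produced a sample by the sign of the log-likelihood ratio. All function names and the values of `sigma_low` and `sigma_high` are hypothetical choices for illustration.

```python
import math
import random


def log_marginal(z, sigma):
    """Log density of z = x + sigma * eps, assuming x ~ N(0, 1) and eps ~ N(0, 1)."""
    var = 1.0 + sigma ** 2  # variances of independent Gaussians add
    return -0.5 * (math.log(2.0 * math.pi * var) + z * z / var)


def log_likelihood_ratio(z, sigma_a, sigma_b):
    """log p_a(z) - log p_b(z) for the two noisy marginals."""
    return log_marginal(z, sigma_a) - log_marginal(z, sigma_b)


def classify_noise_level(z, sigma_a, sigma_b):
    """Return 0 if noise level a is more likely for z, otherwise 1."""
    return 0 if log_likelihood_ratio(z, sigma_a, sigma_b) > 0 else 1


def simulate_accuracy(n=10000, sigma_low=0.5, sigma_high=3.0, seed=0):
    """Fraction of noisy samples whose noise level is identified correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        label = rng.randrange(2)  # 0 -> low noise, 1 -> high noise
        sigma = sigma_low if label == 0 else sigma_high
        z = rng.gauss(0.0, 1.0) + sigma * rng.gauss(0.0, 1.0)
        correct += classify_noise_level(z, sigma_low, sigma_high) == label
    return correct / n


if __name__ == "__main__":
    print(f"noise-level classification accuracy: {simulate_accuracy():.3f}")
```

In a real diffusion model the marginals are not available in closed form. According to the summary above, the paper instead works with the log-likelihood ratio implicit in the trained denoiser, which this toy does not attempt to reproduce.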
Taxonomy
Topics: Forecasting Techniques and Applications · Gaussian Processes and Bayesian Inference
Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
