Information Theoretic Learning for Diffusion Models with Warm Start
Yirong Shen, Lu Gan, Cong Ling

TL;DR
This paper takes an information-theoretic approach to diffusion models, deriving a tighter likelihood bound that improves the accuracy and efficiency of maximum likelihood training. It reports competitive negative log-likelihood (NLL) on CIFAR-10 and state-of-the-art results on ImageNet.
Contribution
The paper extends the classical relationship between KL divergence and Fisher information from Gaussian noise to arbitrary noise perturbations, which permits structured noise distributions and yields a tighter likelihood bound for diffusion models.
Findings
Achieves competitive NLL on CIFAR-10
Sets SOTA results on ImageNet
Works effectively without data augmentation
Abstract
Generative models that maximize model likelihood have gained traction in many practical settings. Among them, perturbation-based approaches underpin many strong likelihood estimation models, yet they often face slow convergence and limited theoretical understanding. In this paper, we derive a tighter likelihood bound for noise-driven models to improve both the accuracy and efficiency of maximum likelihood learning. Our key insight extends the classical relationship between KL divergence and Fisher information to arbitrary noise perturbations, going beyond the Gaussian assumption and enabling structured noise distributions. This formulation allows flexible use of randomized noise distributions that naturally account for sensor artifacts, quantization effects, and data distribution smoothing, while remaining compatible with standard diffusion training. Treating the diffusion process as a Gaussian…
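For readers unfamiliar with the classical relationship the abstract refers to, the Gaussian case can be checked numerically. If two densities p and q are both perturbed by independent Gaussian noise of variance t, giving p_t and q_t, then d/dt KL(p_t || q_t) = -1/2 J(p_t || q_t), where J is the relative Fisher information E_p[(d/dx log p - d/dx log q)^2]. The sketch below is not taken from the paper and does not reproduce its extension to arbitrary noise; it assumes one-dimensional zero-mean Gaussians for p and q so that both sides have closed forms, and all function names are hypothetical.

```python
import math


def kl_gauss(var_p, var_q):
    """KL( N(0, var_p) || N(0, var_q) ) in nats, 1-D zero-mean case."""
    r = var_p / var_q
    return 0.5 * (r - 1.0 - math.log(r))


def rel_fisher_gauss(var_p, var_q):
    """Relative Fisher information E_p[(d/dx log p - d/dx log q)^2].

    For zero-mean Gaussians the scores are -x/var_p and -x/var_q,
    and E_p[x^2] = var_p.
    """
    return var_p * (1.0 / var_q - 1.0 / var_p) ** 2


def check_identity(var_p=1.0, var_q=3.0, t=0.5, eps=1e-5):
    """Return (d/dt KL(p_t || q_t), -0.5 * J(p_t || q_t)).

    Adding Gaussian noise of variance t to a zero-mean Gaussian simply
    adds t to its variance. The derivative is a central finite difference.
    """
    def kl_at(s):
        return kl_gauss(var_p + s, var_q + s)

    lhs = (kl_at(t + eps) - kl_at(t - eps)) / (2.0 * eps)
    rhs = -0.5 * rel_fisher_gauss(var_p + t, var_q + t)
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = check_identity()
    print(f"dKL/dt = {lhs:.6f}, -J/2 = {rhs:.6f}")
```

The two printed values should agree up to finite-difference error. The negative sign reflects that adding the same Gaussian noise to both distributions can only bring them closer in KL divergence.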
