Heavy-Tailed Diffusion Models
Kushagra Pandey, Jaideep Pathak, Yilun Xu, Stephan Mandt, Michael Pritchard, Arash Vahdat, Morteza Mardani

TL;DR
This paper introduces a novel heavy-tailed diffusion framework using Student-t distributions, enhancing the modeling of rare and extreme events in data, with practical extensions and improved performance on weather datasets.
Contribution
It develops a heavy-tailed diffusion framework with a tailored perturbation kernel and training objective, compatible with existing models, to better model heavy-tailed behavior.
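A tailored perturbation kernel here means replacing Gaussian forward noise with multivariate Student-t noise. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an additive kernel x_t = x0 + sigma * eps with eps ~ t_nu(0, I), and the function names `student_t_perturb` and `excess_kurtosis` are hypothetical. It uses the standard scale-mixture construction (a Gaussian divided by sqrt(chi^2_nu / nu)) and shows that the degrees-of-freedom parameter nu controls tail heaviness, with large nu approaching the Gaussian case.

```python
import numpy as np

def student_t_perturb(x0, sigma, nu, rng):
    """Return (x_t, eps) with x_t = x0 + sigma * eps and eps ~ multivariate t_nu(0, I).

    Assumes x0 has shape (n, d). Scale-mixture construction:
    eps = z / sqrt(kappa / nu), z ~ N(0, I), kappa ~ chi^2(nu).
    One kappa is drawn per row (sample), so all coordinates of a sample
    share the same heavy-tailed scale (joint, not per-coordinate, Student-t).
    """
    z = rng.standard_normal(x0.shape)
    kappa = rng.chisquare(nu, size=(x0.shape[0], 1))
    eps = z / np.sqrt(kappa / nu)
    return x0 + sigma * eps, eps

def excess_kurtosis(v):
    """Sample excess kurtosis: about 0 for a Gaussian, 6 / (nu - 4) for t_nu with nu > 4."""
    c = v - v.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2 - 3.0

rng = np.random.default_rng(0)
x0 = np.zeros((200_000, 2))
for nu in (5.0, 1e6):
    _, eps = student_t_perturb(x0, sigma=1.0, nu=nu, rng=rng)
    print(f"nu={nu:g}: excess kurtosis of eps[:, 0] = {excess_kurtosis(eps[:, 0]):.2f}")
```

Sharing one mixing variable per sample is the design choice that makes the noise jointly Student-t; drawing a separate chi-square per coordinate would instead give independent univariate t noise, which is a different distribution.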
Findings
Outperforms standard models in heavy-tail estimation
Effectively models rare and extreme weather events
Requires minimal modifications to existing diffusion models
Abstract
Diffusion models achieve state-of-the-art generation quality across many applications, but their ability to capture rare or extreme events in heavy-tailed distributions remains unclear. In this work, we show that traditional diffusion and flow-matching models with standard Gaussian priors fail to capture heavy-tailed behavior. We address this by repurposing the diffusion framework for heavy-tail estimation using multivariate Student-t distributions. We develop a tailored perturbation kernel and derive the denoising posterior based on the conditional Student-t distribution for the backward process. Inspired by γ-divergence for heavy-tailed distributions, we derive a training objective for heavy-tailed denoisers. The resulting framework introduces controllable tail generation using only a single scalar hyperparameter, making it easily tunable for diverse real-world distributions.…
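The denoising posterior mentioned above rests on a standard property: conditionals of a multivariate Student-t are again Student-t. The sketch below computes the textbook conditional parameters for a partitioned t_d(mu, Sigma, nu); it is not the paper's exact posterior, and the function name `conditional_student_t` and its argument layout are hypothetical. If (x1, x2) ~ t_d(mu, Sigma, nu) with x2 of dimension d2, then x1 | x2 = y has degrees of freedom nu + d2, location mu1 + Sigma12 Sigma22^{-1} (y - mu2), and scale ((nu + delta) / (nu + d2)) times the Schur complement, where delta is the squared Mahalanobis distance of y.

```python
import numpy as np

def conditional_student_t(mu, Sigma, nu, y, d1):
    """Parameters of x1 | x2 = y when (x1, x2) ~ t_d(mu, Sigma, nu).

    x1 holds the first d1 coordinates, x2 the remaining d2 = d - d1.
    Returns (loc, scale_matrix, dof) of the conditional Student-t.
    """
    mu1, mu2 = mu[:d1], mu[d1:]
    S11 = Sigma[:d1, :d1]
    S12 = Sigma[:d1, d1:]
    S22 = Sigma[d1:, d1:]
    d2 = mu2.shape[0]
    diff = y - mu2
    # Solve linear systems instead of explicitly inverting S22.
    w = np.linalg.solve(S22, diff)
    loc = mu1 + S12 @ w
    schur = S11 - S12 @ np.linalg.solve(S22, S12.T)
    delta = diff @ w  # squared Mahalanobis distance of the observed block
    scale = (nu + delta) / (nu + d2) * schur
    return loc, scale, nu + d2

# Bivariate example: condition the first coordinate on the second.
loc, scale, dof = conditional_student_t(
    mu=np.zeros(2),
    Sigma=np.array([[1.0, 0.5], [0.5, 1.0]]),
    nu=3.0,
    y=np.array([2.0]),
    d1=1,
)
print(loc, scale, dof)
```

Unlike the Gaussian case, where the conditional covariance does not depend on the observed value, the Student-t conditional scale grows with delta, so the posterior widens when the observation is far from the mean.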
Peer Reviews
Decision: ICLR 2025 Poster
The t-EDM provides a straightforward and mathematically justified extension of diffusion models to heavy-tailed data. This is to be contrasted, for instance, with Yoon et al. (2023), who also provide a framework for heavy-tailed diffusion models (using Lévy processes), but whose implementation requires heavier mathematics and, to work in practice, makes theoretically unjustified changes to the model. The authors do a good job presenting the concepts as natural extensions of what wa
The experiments for unconditional generation in Table 2 show that EDM (with preconditioning) outperforms t-EDM on the KR and SR metrics for the VIL channel on test data. This undermines the authors' claim that standard diffusion models, even with preconditioning, fail to capture heavy-tailed distributions (line 043), and that t-EDM outperforms standard diffusion models. Qualitatively, the plots in Figure 3 do show that t-EDM outperforms EDM with preconditioning in capturing the
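The review cites KR and SR metrics without defining them. As a hedged sketch, assume KR and SR are the relative errors of a generated sample's kurtosis and skewness against the data's, for example KR = |k_gen - k_data| / |k_data|; the paper's exact definitions may differ, and the helper `moment_ratio` is hypothetical. The example uses Student-t draws as a stand-in for heavy-tailed data and a Gaussian as a light-tailed surrogate generator.

```python
import numpy as np

def skewness(v):
    """Sample skewness (third standardized moment)."""
    v = np.asarray(v, dtype=float)
    c = v - v.mean()
    return np.mean(c**3) / np.mean(c**2) ** 1.5

def kurtosis(v):
    """Non-excess (Pearson) kurtosis: equals 3 for a Gaussian."""
    v = np.asarray(v, dtype=float)
    c = v - v.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2

def moment_ratio(stat, generated, data):
    """Hypothetical KR/SR-style score: relative error of a moment statistic."""
    ref = stat(data)
    return abs(stat(generated) - ref) / abs(ref)

rng = np.random.default_rng(0)
data = rng.standard_t(df=5, size=100_000)           # heavy-tailed stand-in for real data
gauss = rng.standard_normal(100_000) * data.std()   # light-tailed surrogate "generator"
print("KR (assumed definition):", moment_ratio(kurtosis, gauss, data))
```

Under this assumed form, an SR-style score built from `skewness` becomes ill-conditioned when the data are nearly symmetric, because the denominator approaches zero; that is one reason such ratios can disagree with qualitative tail plots.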
- The paper introduces a new way of training a score-based model with heavy-tailed noise, based on the Student-t distribution, and shows clear benefits in training.
- The experimental results are reasonably convincing.
- The paper fails to discuss previous work covering the same subject matter, for example "Heavy-tailed denoising score matching".
- The paper lacks experiments on synthetic data generated to be heavy-tailed. While not strictly needed, such experiments would help make the case that this method definitively helps with modeling heavy tails, by demonstrating it on data known to be heavy-tailed.
I really liked the problem the paper addresses: better characterisation of extremes in diffusion- and flow-based generative models, especially in the context of environmental forecasting. I think this is an important practical problem. I also liked the use of a simple example to analyse how well diffusion models capture tail behaviour. (In fact, I would have liked to see more investigation of toy examples of this sort to pin down the failure modes of diffusion more generally.)
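A simple way to see why Gaussian-based models underweight extremes is to compare two-sided tail probabilities of a Gaussian and a variance-matched Student-t. This is a minimal illustration, not the paper's toy experiment; the choice nu = 3 and the helper `two_sided_tail` are assumptions for illustration. The Student-t is rescaled to unit variance (Var = nu / (nu - 2) before scaling) so the comparison isolates tail shape rather than spread.

```python
import numpy as np
from scipy import stats

def two_sided_tail(dist, k):
    """P(|X| > k) for a symmetric frozen scipy.stats distribution."""
    return 2.0 * dist.sf(k)

nu = 3.0
# Scale so that scale**2 * nu / (nu - 2) == 1, i.e. unit variance.
t_unit = stats.t(df=nu, scale=np.sqrt((nu - 2.0) / nu))
gauss = stats.norm()

for k in (2.0, 4.0, 6.0):
    pt = two_sided_tail(t_unit, k)
    pg = two_sided_tail(gauss, k)
    print(f"k={k}: Student-t {pt:.2e}, Gaussian {pg:.2e}, ratio {pt / pg:.1e}")
```

The ratio grows rapidly with k. Near the shoulder of the distribution the variance-matched Student-t can carry slightly less mass than the Gaussian, so tail behavior is best judged well away from the mode.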
I would have liked to see a bit more clarity on the formulation of the denoising process (the sub-optimalities here seem significantly more severe than in the Gaussian case; see questions below for more detail). The use of the power divergence for training was creative, but tying the model parameters to the divergence parameters also seemed sub-optimal, as it does not allow the framework to learn the degrees-of-freedom parameter of the Student-t distribution. The experimentatio
Taxonomy
Topics: Advanced Mathematical Modeling in Engineering
Methods: Diffusion
