# Closing the ODE–SDE gap in score-based diffusion models through the Fokker–Planck equation

**Authors:** Teo Deveney, Jan Stanczuk, Lisa Kreusser, Chris Budd, Carola-Bibiane Schönlieb

PMC · DOI: 10.1098/rsta.2024.0503 · Philosophical transactions. Series A, Mathematical, physical, and engineering sciences · 2025-06-05

## TL;DR

This paper analyses why ODE-based samplers in score-based diffusion models are reported to perform worse than SDE-based ones, relates the gap to a Fokker–Planck residual, and shows that penalizing this residual during training narrows it.

## Contribution

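The paper describes the dynamics and approximations that arise when training score-based diffusion models, derives an upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker–Planck residual, and proposes adding that residual as a regularization term during training to reduce the gap.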
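
For orientation, the dynamics being compared can be written in the standard notation of the score-based diffusion literature. The symbols below ($f$, $g$, $p_t$, $s_\theta$, $\mathcal{R}$) are assumptions of this sketch, not notation taken from this summary, and the paper's own definitions may differ in detail.

$$
\begin{aligned}
\text{forward SDE:}\quad & \mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t,\\
\text{reverse SDE:}\quad & \mathrm{d}x = \left[f(x,t) - g(t)^2\, s_\theta(x,t)\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{W}_t,\\
\text{probability flow ODE:}\quad & \frac{\mathrm{d}x}{\mathrm{d}t} = f(x,t) - \tfrac{1}{2}\, g(t)^2\, s_\theta(x,t).
\end{aligned}
$$

When $s_\theta = \nabla_x \log p_t$ exactly, where $p_t$ is the marginal density of the forward SDE, the reverse SDE and the ODE share the same marginals. Taking the gradient of the Fokker–Planck equation for $\log p_t$ shows that the exact score satisfies

$$
\partial_t s = \nabla_x\!\left[\tfrac{1}{2}\, g(t)^2\left(\nabla_x \cdot s + \lVert s \rVert^2\right) - f \cdot s - \nabla_x \cdot f\right],
$$

so one natural residual for a learned score is

$$
\mathcal{R}[s_\theta] = \partial_t s_\theta - \nabla_x\!\left[\tfrac{1}{2}\, g(t)^2\left(\nabla_x \cdot s_\theta + \lVert s_\theta \rVert^2\right) - f \cdot s_\theta - \nabla_x \cdot f\right].
$$

A learned $s_\theta$ with non-zero residual cannot be the exact score of the forward marginals, which is the setting in which the ODE- and SDE-induced distributions can differ.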

## Key findings

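- The Wasserstein 2-distance between the ODE- and SDE-induced distributions is bounded above in terms of a Fokker–Planck residual.
- Conventionally trained score-based diffusion models can show significant differences between the ODE- and SDE-induced distributions in numerical experiments.
- Adding the Fokker–Planck residual as a regularization term during training closes the gap between the two distributions and can improve the distribution generated by the ODE.
- This regularization can come at the cost of degraded SDE sample quality.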
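
To make the Fokker–Planck residual in the findings above concrete, here is a minimal one-dimensional sketch in plain Python. It is not the paper's implementation. The forward process (an Ornstein–Uhlenbeck SDE with constant rate `BETA`), the Gaussian toy data, the finite-difference derivatives and every function name are assumptions chosen for illustration. It checks that the exact score of a Gaussian gives a residual near zero, while a perturbed score does not.

```python
import math

BETA = 1.0        # assumed constant noise rate of the forward SDE
SIGMA0_SQ = 0.25  # assumed variance of the toy Gaussian data at t = 0


def drift(x, t):
    """Forward drift f(x, t) = -0.5 * BETA * x (illustrative choice)."""
    return -0.5 * BETA * x


def diffusion_sq(t):
    """Squared diffusion coefficient g(t)^2 = BETA (illustrative choice)."""
    return BETA


def exact_score(x, t):
    """Score of the forward marginal N(0, var(t)) for Gaussian data."""
    var = SIGMA0_SQ * math.exp(-BETA * t) + 1.0 - math.exp(-BETA * t)
    return -x / var


def perturbed_score(x, t):
    """Stand-in for an imperfectly learned score (hypothetical error term)."""
    return exact_score(x, t) + 0.1 * math.sin(x)


def fp_residual(score, x, t, h=1e-3):
    """Residual of the 1D score Fokker-Planck equation at (x, t):

        r = d_t s - d_x[ 0.5 * g^2 * (d_x s + s^2) - f * s - d_x f ]

    All derivatives use central finite differences with step h.
    """
    def d_dx(fn, x_, t_):
        return (fn(x_ + h, t_) - fn(x_ - h, t_)) / (2.0 * h)

    def bracket(x_, t_):
        s = score(x_, t_)
        ds_dx = d_dx(score, x_, t_)
        df_dx = d_dx(drift, x_, t_)
        return 0.5 * diffusion_sq(t_) * (ds_dx + s * s) - drift(x_, t_) * s - df_dx

    ds_dt = (score(x, t + h) - score(x, t - h)) / (2.0 * h)
    return ds_dt - d_dx(bracket, x, t)


if __name__ == "__main__":
    print("exact score residual:    ", fp_residual(exact_score, 0.7, 0.5))
    print("perturbed score residual:", fp_residual(perturbed_score, 0.7, 0.5))
```

In a training setting, a penalty of this kind would presumably be computed with automatic differentiation on the score network and averaged over sampled points and times. The finite differences here only keep the sketch dependency-free.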

## Abstract

Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, due to both their mathematical foundations and their state-of-the-art performance in many tasks. Empirically, it has been reported that samplers based on ordinary differential equations (ODEs) are inferior to those based on stochastic differential equations (SDEs). In this article, we systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models and show how this relates to an associated Fokker–Planck equation. We rigorously describe the full range of dynamics and approximations arising when training score-based diffusion models and derive a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker–Planck residual. We also show numerically that conventional score-based diffusion models can exhibit significant differences between ODE- and SDE-induced distributions that we demonstrate using explicit comparisons. Moreover, we show numerically that reducing this Fokker–Planck residual by adding it as an additional regularization term during training closes the gap between ODE- and SDE-induced distributions. Our experiments suggest that this regularization can improve the distribution generated by the ODE; however, this can come at the cost of degraded SDE sample quality.

This article is part of the theme issue ‘Partial differential equations in data science’.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12139524/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12139524/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/PMC12139524/full.md

---
Source: https://tomesphere.com/paper/PMC12139524