# Are We Really Learning the Score Function? Reinterpreting Diffusion Models Through Wasserstein Gradient Flow Matching

**Authors:** An B. Vuong, Michael T. McCann, Javier E. Santos, Yen Ting Lin

arXiv: 2509.00336 · 2025-09-03

## TL;DR

This paper challenges the standard view that diffusion models learn the score function. It argues instead that diffusion training performs flow matching to the velocity field of a Wasserstein Gradient Flow (WGF), which explains why sampling succeeds even though the learned vector fields are not conservative.

## Contribution

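The paper proposes a theoretical perspective in which diffusion training is flow matching to the velocity field of a Wasserstein Gradient Flow, rather than score learning for a reverse-time stochastic differential equation. This view clarifies why diffusion models generate well even when the neural vector field is not a true score.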
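As an illustration of how a probability flow can come out of a gradient flow without any reverse-time SDE, consider the simplest textbook case. The notation below (a pure-diffusion forward process $dX_t = \sqrt{2}\,dW_t$, its density $\rho_t$, and the functional $\mathcal{F}$) is an assumption chosen for this sketch and is not taken from the paper, whose setting may be more general.

```latex
\begin{aligned}
\partial_t \rho_t &= \Delta \rho_t = \nabla \cdot \left( \rho_t \nabla \log \rho_t \right)
  && \text{(heat equation for the forward process)} \\
\partial_t \rho_t + \nabla \cdot \left( \rho_t v_t \right) &= 0,
  \quad v_t = -\nabla \log \rho_t
  && \text{(the same equation written as a continuity equation)} \\
\mathcal{F}[\rho] &= \int \rho \log \rho \, dx,
  \quad \frac{\delta \mathcal{F}}{\delta \rho} = \log \rho + 1
  && \text{(negative entropy and its first variation)} \\
v_t &= -\nabla \frac{\delta \mathcal{F}}{\delta \rho}[\rho_t] = -\nabla \log \rho_t
  && \text{(Wasserstein gradient flow velocity of } \mathcal{F}\text{)}
\end{aligned}
```

In this special case the deterministic velocity that transports $\rho_t$ is exactly the gradient-flow velocity, so regressing a network onto $v_t$ can be read as flow matching. Note also that adding any field $w_t$ with $\nabla \cdot (\rho_t w_t) = 0$ to $v_t$ leaves the continuity equation unchanged. This is a standard observation, offered here as one possible reason non-conservative errors need not disturb density transport, and not necessarily the argument the paper makes.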

## Key findings

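- Trained diffusion networks violate both the integral and the differential constraints that a true score function must satisfy, so the learned vector fields are not conservative.
- The models still perform well as generative mechanisms despite these non-conservative vector fields.
- Under the WGF perspective, the probability flow arises directly from the gradient flow without invoking reverse-time SDE theory, and non-conservative approximation errors do not necessarily harm density transport.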
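To make the two kinds of constraint concrete, here is a minimal sketch of what a differential and an integral test for conservativeness can look like in two dimensions. It is not the paper's evaluation code. The toy fields `gaussian_score` and `perturbed_field`, the helpers `curl_2d` and `circulation`, and the parameter `eps` are all hypothetical names introduced for illustration, and a trained network would stand in for the toy fields.

```python
import math


def gaussian_score(x, y):
    """Exact score of a standard 2-D Gaussian: grad log p(x, y) = (-x, -y)."""
    return (-x, -y)


def perturbed_field(x, y, eps=0.1):
    """Toy non-conservative field: the score plus a small rotation (-y, x)."""
    return (-x - eps * y, -y + eps * x)


def curl_2d(field, x, y, h=1e-5):
    """Differential test: d v_y/dx - d v_x/dy via central differences.

    A gradient field has a symmetric Jacobian, so this value is zero.
    """
    dvy_dx = (field(x + h, y)[1] - field(x - h, y)[1]) / (2 * h)
    dvx_dy = (field(x, y + h)[0] - field(x, y - h)[0]) / (2 * h)
    return dvy_dx - dvx_dy


def circulation(field, cx=0.0, cy=0.0, radius=1.0, n=2000):
    """Integral test: line integral of the field around a circle (midpoint rule).

    A gradient field integrates to zero around every closed loop.
    """
    dtheta = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dtheta
        px = cx + radius * math.cos(t)
        py = cy + radius * math.sin(t)
        vx, vy = field(px, py)
        # Tangent of the circle is radius * (-sin t, cos t).
        total += (-vx * math.sin(t) + vy * math.cos(t)) * radius * dtheta
    return total
```

For the exact score both quantities vanish up to rounding. For the perturbed field the curl is about `2 * eps` and the circulation around the unit circle is about `2 * math.pi * eps`, which agree through Stokes' theorem (curl times enclosed area).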

## Abstract

Diffusion models are commonly interpreted as learning the score function, i.e., the gradient of the log-density of noisy data. However, this assumption implies that the target of learning is a conservative vector field, which is not enforced by the neural network architectures used in practice. We present numerical evidence that trained diffusion networks violate both integral and differential constraints required of true score functions, demonstrating that the learned vector fields are not conservative. Despite this, the models perform remarkably well as generative mechanisms. To explain this apparent paradox, we advocate a new theoretical perspective: diffusion training is better understood as flow matching to the velocity field of a Wasserstein Gradient Flow (WGF), rather than as score learning for a reverse-time stochastic differential equation. Under this view, the "probability flow" arises naturally from the WGF framework, eliminating the need to invoke reverse-time SDE theory and clarifying why generative sampling remains successful even when the neural vector field is not a true score. We further show that non-conservative errors from neural approximation do not necessarily harm density transport. Our results advocate for adopting the WGF perspective as a principled, elegant, and theoretically grounded framework for understanding diffusion generative models.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00336/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00336/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/2509.00336/full.md

---
Source: https://tomesphere.com/paper/2509.00336