Speed-accuracy relations for diffusion models: Wisdom from nonequilibrium thermodynamics and optimal transport

Kotaro Ikeda, Tomoya Uda, Daisuke Okanohara, and Sosuke Ito

arXiv:2407.04495·cond-mat.stat-mech·August 4, 2025


TL;DR

This paper establishes a theoretical framework linking diffusion models and nonequilibrium thermodynamics, deriving speed-accuracy inequalities that inform optimal data generation strategies, validated through numerical experiments on image datasets.

Contribution

It introduces a novel connection between diffusion models and stochastic thermodynamics, deriving speed-accuracy relations and optimal protocols using optimal transport theory.

Findings

01. Derived speed-accuracy inequalities for diffusion models.

02. Validated relations through numerical experiments with various noise schedules.

03. Demonstrated applicability to real-world image data generation.

Abstract

We discuss a connection between a generative model, called the diffusion model, and nonequilibrium thermodynamics for the Fokker-Planck equation, called stochastic thermodynamics. Using techniques from stochastic thermodynamics, we derive the speed-accuracy relations for diffusion models, which are inequalities that relate the accuracy of data generation to the entropy production rate. The entropy production rate can be interpreted as the speed of the diffusion dynamics in the absence of the non-conservative force. From a stochastic thermodynamic perspective, our results provide quantitative insight into how best to generate data in diffusion models. The optimal learning protocol is given by the geodesic in the space equipped with the 2-Wasserstein distance of optimal transport theory. We numerically illustrate the validity of the speed-accuracy relations for diffusion models with different noise schedules and…
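To make the notion of a 2-Wasserstein geodesic concrete, here is a minimal sketch for the special case of one-dimensional Gaussians. This case is chosen purely for illustration and is not taken from the paper. For 1D Gaussians the 2-Wasserstein distance has a closed form, and the geodesic interpolates the mean and standard deviation linearly, so the path moves at constant speed. The function names are hypothetical.

```python
import math

def w2_gaussian_1d(m0, s0, m1, s1):
    """2-Wasserstein distance between N(m0, s0^2) and N(m1, s1^2) in 1D."""
    return math.hypot(m1 - m0, s1 - s0)

def geodesic_gaussian_1d(m0, s0, m1, s1, t):
    """(mean, std) of the W2 geodesic between the two Gaussians at t in [0, 1]."""
    return (1 - t) * m0 + t * m1, (1 - t) * s0 + t * s1

def segment_lengths(m0, s0, m1, s1, steps):
    """W2 distance covered in each of `steps` equal time intervals on the geodesic."""
    pts = [geodesic_gaussian_1d(m0, s0, m1, s1, k / steps) for k in range(steps + 1)]
    return [w2_gaussian_1d(*pts[k], *pts[k + 1]) for k in range(steps)]

if __name__ == "__main__":
    total = w2_gaussian_1d(0.0, 1.0, 3.0, 5.0)
    lengths = segment_lengths(0.0, 1.0, 3.0, 5.0, steps=4)
    print(total, lengths)
```

Along this path every segment has the same length and the segments sum to the total distance, which is what moving along a geodesic at constant speed means in this toy setting.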

