Generative diffusion models in infinite dimensions: a survey

Giulio Franzese; Pietro Michiardi

PMC · DOI:10.1098/rsta.2024.0322·June 19, 2025


Open Access

TL;DR

This survey explores how diffusion models can be extended to work with infinite-dimensional data, covering theory, methods, and applications in function spaces.

Contribution

The paper provides a comprehensive survey of generative diffusion models in infinite-dimensional spaces, highlighting their theoretical and practical advancements.

Findings

01

Diffusion models in infinite dimensions are grounded in stochastic differential equations in Hilbert spaces.

02

Applications include functional data generation and solving inverse problems.

03

The survey identifies open problems and future research directions in the field.

Abstract

Diffusion models have recently emerged as a powerful class of generative models, achieving state-of-the-art performance in various domains such as image and audio synthesis. While most existing work focuses on finite-dimensional data, there is growing interest in extending diffusion models to infinite-dimensional function spaces. This survey provides a comprehensive overview of the theoretical foundations and practical applications of diffusion models in infinite dimensions. We review the necessary background on stochastic differential equations in Hilbert spaces, and then discuss different approaches to define generative models rooted in such formalism. Finally, we survey recent applications of infinite-dimensional diffusion models in areas such as generative modelling for function spaces, conditional generation of functional data and solving inverse problems. Throughout the survey, we…


Table 1. Finite-dimensional generative models and infinite-dimensional counterparts.

finite-dimensional	infinite-dimensional	position in survey
DDPM [5,40,41]	[28]	§4a
annealed-LD [42,43]	[30]	§3a
score-SDE [1]	[27,31,44,45]	§2a
flow matching [46,47]	[29]	§3b
diffusion bridges and stochastic control [48,49]	[32,33,50]	§5a
Bayesian inverse problems [51,52]	[24,25]	§5b

Funding

Huawei Technologies (http://dx.doi.org/10.13039/501100003816)

Keywords

stochastic differential equations, generative diffusion models, Hilbert spaces


Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Advanced Mathematical Modeling in Engineering · Stochastic processes and financial applications

Full text

Introduction

Diffusion models have emerged as a prominent class of generative models, achieving notable success in various fields such as image generation [1], audio synthesis [2,3], video [4,5], molecular structures and general three-dimensional shapes [6–9] and conditional generation from open-ended text prompts [10–13]. For applications involving stochastic processes and complex systems where finite-dimensional approximations are insufficient, like weather forecasts or seismology [14,15] or extremely high-resolution point clouds [16], generative models which operate directly in function space are required. This survey aims to provide a structured overview of the theoretical and practical developments in generative diffusion models operating in infinite dimensions.

A fundamental motivation for considering infinite-dimensional approaches arises from Bayesian inverse problems in function spaces. The work in [17] established a rigorous mathematical framework for inference in such settings, emphasizing that naive finite-dimensional approximations often introduce undesirable pathologies. Rather than discretizing early in the modelling process, a more principled approach is to avoid discretization until the last possible moment [17]. Numerous studies have demonstrated that working directly in infinite-dimensional spaces leads to improved stability and accuracy in inference tasks [18,19].

The necessity of such models stems from the intrinsic limitations of finite-dimensional approximations, particularly their inconsistency across different resolutions. In empirical studies on generative diffusion models, it has been widely observed that models trained at one resolution often fail to generalize across different resolutions or scales unless carefully adapted [20–23]. This issue extends beyond neural architecture design (e.g. parameter adjustments) and is fundamentally linked to the role of trace-class noise in the perturbation process [24–31]. When the additive noise has a flat spectral density, changing the sampling frequency (i.e. the resolution) typically alters the effective signal-to-noise ratio, affecting consistency in generative models.
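The effect of the noise spectrum on resolution consistency can be illustrated with a minimal numerical sketch. It is not taken from the works surveyed here and rests on assumptions chosen for illustration only: a hypothetical signal whose k-th coefficient in an orthonormal basis is 1/k, "resolution" identified with the number of resolved modes, flat noise with unit variance per mode, and trace-class noise with eigenvalues k^(-decay).

```python
import numpy as np

def expected_snr(n_modes, noise="white", decay=2.0):
    """Expected signal-to-noise ratio when only the first n_modes are resolved."""
    k = np.arange(1, n_modes + 1, dtype=float)
    signal_energy = np.sum(k ** -2.0)       # hypothetical signal with coefficients 1/k
    if noise == "white":
        noise_energy = float(n_modes)       # flat spectrum: unit variance in every mode
    elif noise == "trace_class":
        noise_energy = np.sum(k ** -decay)  # summable eigenvalues k^(-decay)
    else:
        raise ValueError("noise must be 'white' or 'trace_class'")
    return signal_energy / noise_energy

for n in (64, 256, 1024):
    print(n, expected_snr(n, "white"), expected_snr(n, "trace_class"))
```

Under these assumptions the ratio for flat noise decays roughly like 1/n as more modes are resolved, whereas for trace-class noise it converges to a finite limit.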

In contrast, infinite-dimensional methods allow for the construction of generative models that generalize effectively across different resolutions and are not constrained by regular sampling grids [24–33]. Additionally, infinite-dimensional generative models have been shown to provide enhanced parameter efficiency [26]. These models inherently accommodate function spaces in a way that ensures robustness and consistency in inference and generation tasks, making them particularly well-suited for applications requiring flexibility across multiple scales.

The problem of generative modelling in functional domains can be approached by adapting the techniques valid for the finite-dimensional case to the one where observed data points are considered as finite-resolution realizations of underlying functions [34–36]. These approaches allow for the incorporation of the notion of underlying manifold (whether Euclidean space or not) on which the data is collected, without the need to explicitly consider the machinery of infinite-dimensional stochastic calculus [37]. However, scaling these methods to arbitrary resolution is challenging. A different approach, which has been incorporated by some of the purely functional domain diffusion models, is to consider pseudo-invertible mappings from infinite-dimensional domains onto finite-dimensional ones and consider classical diffusion in those finite spaces [38,39]. Given the limitations of such approaches, a purely function space investigation of diffusion processes has been explored by the community, which we describe in the following.

(a) An overview of existing methods and applications

Although the primary focus of this survey is the exploration of infinite-dimensional diffusion models, we first provide an overview of key generative models based on diffusion and related dynamics in finite dimensions, to set the stage for a discussion of their infinite-dimensional counterparts and applications.

In the left column of table 1, we summarize the major classes of finite-dimensional generative models. Among these, discrete-time diffusion models, commonly referred to as DDPM [5,40,41], learn to approximate the reversal of a forward Markov chain with a finite number of steps. This forward process perturbs the original data, typically through additive Gaussian noise, which smooths the data distribution. A related approach, annealed Langevin dynamics, replaces single-step transitions with multiple runs of Langevin dynamics, using the perturbed score function as the drift term to iteratively refine the noisy samples [42,43].

As the number of steps tends to infinity, the Markov chain formulation converges to a continuous-time setting, leading to score-based SDEs [1], which model the forward and reverse evolution through stochastic differential equations. Another approach, known as Flow Matching [46,47], models the perturbation dynamics deterministically, inducing transport flows that map between the initial noisy distribution and the target data distribution.

Beyond generative modelling, diffusion models have been applied to stochastic control problems and Bayesian inverse problems. In finite dimensions, diffusion bridges provide a framework for modelling stochastic paths conditioned on endpoints, making them useful in molecular dynamics and controlled trajectory planning [48,49]. Similarly, diffusion-based priors have been leveraged in Bayesian inverse problems, where they help reconstruct unknown samples from noisy measurements [51,52]. By providing strong probabilistic priors, these models enhance inference accuracy and enable uncertainty quantification in complex systems.

The transition to infinite-dimensional settings introduces additional mathematical and computational challenges, which we explore in later sections. In particular, we discuss the infinite-dimensional counterpart of score-SDEs [27,31,44,45] in §2a, where they appear as specific cases of equation (2.4). Alternative formulations, including annealed Langevin dynamics [30] and flow matching [29], are examined in §3a,b, in connection with equations (3.1) and (3.4). Applications of diffusion bridges and Bayesian inverse problems in infinite dimensions are covered in §5a,b, particularly in equations (5.2)–(5.4).

(b) Structure of the manuscript

In §2, we delve into the formal definition of diffusion models in infinite dimensions, focusing on Hilbert spaces. We introduce the necessary mathematical foundations, such as SDEs in Hilbert spaces, the properties of infinite-dimensional diffusion processes and key concepts like time-reversal of SDEs and the formulation of probability measures. Special attention is given to generative processes driven by these models, alongside theoretical considerations such as existence and uniqueness of solutions. In §3, we explore alternative methodologies for constructing generative models in infinite-dimensional settings, contrasting them with the diffusion-based approaches discussed earlier. We review methods like annealed Langevin dynamics and functional flow matching and connect their mathematical underpinnings to diffusion models, highlighting both their strengths and limitations. In §4, we outline the loss functions and training objectives associated with these parametric methods, particularly in terms of denoising and likelihood maximization. We also briefly overview neural architectures suitable for functional data and analyse their ability to handle resolution invariance. In §5, we provide an overview of various real-world applications of infinite-dimensional diffusion models. Other than purely generative use cases, we discuss the solution of Bayesian inverse problems, stochastic bridges and optimal control.

Finally, in §6, we summarize the key takeaways from this survey, highlighting the theoretical advancements and practical applications of diffusion models in infinite dimensions. We also outline open problems and future research directions, suggesting ways to further refine these models and explore their broader use in generative modelling and related fields.

Diffusion-based generative models in Hilbert spaces

We consider $[eqn]$ to be a real, separable Hilbert space with inner product $[eqn]$ and norm $[eqn]$ . Let $[eqn]$ be the set of bounded linear operators on $[eqn]$ , $[eqn]$ be its Borel $[eqn]$ -algebra, $[eqn]$ be the set of bounded $[eqn]$ -measurable functions $[eqn]$ and $[eqn]$ be the set of probability measures on $[eqn]$ .

Our focus is to survey existing generative models which leverage the properties of diffusion processes in Hilbert spaces. Informally, given a finite collection of samples drawn from an underlying probability measure $[eqn]$ , we consider different, but strictly related techniques, to transform a sample drawn from an easy measure into a sample whose measure coincides with the desired $[eqn]$ . Consider the following $[eqn]$ -valued SDE whose initial conditions are drawn from $[eqn]$

[eqn]

where $[eqn]$ , $[eqn]$ is a $[eqn]$ -Wiener process on $[eqn]$ (of trace class) defined on the quadruplet $[eqn]$ . In our definitions, $[eqn]$ are the sample space and canonical filtration, respectively: we consider $[eqn]$ to be $[eqn]$ , that is the space of all continuous mappings $[eqn]$ , and $[eqn]$ to be the canonical process. The domain of $[eqn]$ is $[eqn]$ , where $[eqn]$ is a measurable map. $[eqn]$ is assumed to be Lipschitz continuous. The negative, symmetric operator $[eqn]$ is the infinitesimal generator of a $[eqn]$ -semigroup $[eqn]$ in $[eqn]$ $[eqn]$ , and $[eqn]$ is a probability measure in $[eqn]$ . By definition, the measure associated with equation (2.1) is indicated with $[eqn]$ . The law induced at time $[eqn]$ by the canonical process on the measure $[eqn]$ is indicated with $[eqn]$ , where $[eqn]$ , and $[eqn]$ is any element of $[eqn]$ . We define the law associated with the process defined in equation (2.1), when initial conditions are deterministic ( $[eqn]$ ) as $[eqn]$ . Notice that, by disintegration properties,

[eqn]

Given a countable orthonormal basis $[eqn]$ for the Hilbert space $[eqn]$ , we can always construct an isomorphism with the space of square summable sequences $[eqn]$ . Since $[eqn]$ is a trace class operator, there exists a complete orthonormal system in $[eqn]$ [37] that diagonalizes $[eqn]$ , i.e. $[eqn]$ , with $[eqn]$ a positive scalar, where we leverage the fact that $[eqn]$ , being a covariance, is a symmetric operator. The covariance operator can be used to define the Cameron–Martin space $[eqn]$ associated with $[eqn]$ , where the corresponding induced scalar product is $[eqn]$ .

Then, it is useful to notice that equation (2.1) can also be expressed as an (infinite) system of stochastic differential equations, in terms of $[eqn]$ , as:

[eqn]

where we introduced the projection $[eqn]$ , with $[eqn]$ and Lebesgue measure to obtain densities from measures. Moreover, $[eqn]$ with covariance given by $[eqn]$ , $[eqn]$ in the Kronecker sense. Under condition 2.1, the SDE equation (2.1) admits a strong solution [37].

Condition 2.1. (i) $[eqn]$ , (ii) $[eqn]$ is a Hilbert–Schmidt operator, and (iii) $[eqn]$ . Notice that this condition is trivially satisfied whenever $[eqn]$ is a bounded operator.

(a) Time reversal of SDEs

The idea of generative models based on SDEs in the form of equation (2.1) is to consider the time-reversal of the stochastic process $[eqn]$ , namely the process $[eqn]$ . Under certain conditions, the process $[eqn]$ can be shown to be a solution of a new SDE. When this is the case, simulation of its dynamics allows obtaining a sample from the desired measure ( $[eqn]$ ).

More formally, we require that the time reversal of the canonical process, $[eqn]$ , is again a diffusion process, with distribution given by the path-reversed measure $[eqn]$ , along with the reversed filtration $[eqn]$ . In the finite-dimensional case (which corresponds to the case where the Hilbert space has finite dimensionality), several results exist which prove (under appropriate technical conditions) that the time reversal of SDEs in the form of equation (2.1) (although finite dimensional) can again be expressed as an SDE, with a new drift term composed as $[eqn]$ , where $[eqn]$ is the density of the measure w.r.t. the Lebesgue measure $[eqn]$ [53,54]. However, the time reversal of an infinite-dimensional process is more involved than for the finite-dimensional case. Indeed, in infinite-dimensional spaces, there is no equivalent of the Lebesgue measure to obtain densities from measures (i.e. $[eqn]$ in $[eqn]$ does not exist [55]). Nevertheless, in the infinite-dimensional case, we can consider the single-dimensional conditional density $[eqn]$ (provided it exists), $[eqn]$ being the single-dimensional Lebesgue measure. At a rather informal level, the reader can understand why considering such objects can alleviate the problem of non-existence of the desired densities in infinite dimensions: in the finite-dimensional case, thanks to disintegration of measures, the second term of the time-reversed drift can be simplified as $[eqn]$ .

It is then possible to derive results for the infinite-dimensional case which are analogous to the finite-dimensional one. There are two major approaches to guarantee the existence of the reverse diffusion process. The first approach [56] relies on a finite local entropy condition. The second approach is based on stochastic calculus of variations [57], which we use to claim what follows.

Theorem 2.1. Consider equation (2.1). Suppose that the conditions listed in Theorem 4.3 of [57] hold. Then $[eqn]$ , corresponding to the path measure $[eqn]$ , has the following SDE representation:

[eqn]

where $[eqn]$ is a $[eqn]$ $[eqn]$ -Wiener process and the notation $[eqn]$ stands for the mapping $[eqn]$ that, when projected, satisfies $[eqn]$ . By projecting onto the eigenbasis, we have an infinite system of SDEs:

[eqn]

Notice that strictly speaking, using the symbol $[eqn]$ is an abuse of notation, since we cannot identify it with the gradient (in Hilbert space) of a single function.

The result in [57] is remarkably general and elegant, but this generality comes at the cost of increased complexity in verifying that the necessary technical conditions hold. Although not the most general case, a concrete example of conditions that satisfy the coefficient assumptions outlined in [57] are those considered in [26], which we report hereafter, as discussed in detail in appendix A:

Condition 2.2. $[eqn]$ and $[eqn]$ is diagonal in $[eqn]$ , i.e. $[eqn]$ .

Condition 2.3. Condition 2.2 is satisfied, and $[eqn]$ is bounded: $[eqn]$ , with $[eqn]$ a finite positive constant.

It is worth mentioning that much effort has been put in by the authors in [31] to generalize the settings considered here for the dynamics in equation (2.1), in particular with the possibility of considering non-constant operators $[eqn]$ , which precondition the Wiener process via $[eqn]$ . In their work, they focus on the time reversal formula at the level of the generator directly: in general, the existence of time reversal of a process and the validity of time reversal formulas do not imply each other. Such an approach is useful for investigating the relationship between different methodologies, as we will explore later.

For the validity of the methodologies described above, it is required that the conditional densities exist. Next, we introduce a different condition, which will be used to prove the existence of such densities and is at the root of the different generative modelling techniques explored in this survey, the trait d’union between all the technical requirements of infinite-dimensional diffusion models.

Condition 2.4. For each $[eqn]$ , the measure $[eqn]$ is absolutely continuous with respect to the measure $[eqn]$ .

In our introduction, we considered the generic nonlinear ( $[eqn]$ ) SDE equation (2.1). However, in almost all actual applications in the literature, the simple linear case where $[eqn]$ is considered. This assumption allows a simpler investigation of the validity of time reversal equations (see appendix A). Furthermore, this has tremendous practical benefits, as under such conditions, there exists a weak solution to equation (2.1) as

[eqn]

which is typically amenable to cheap simulation. In this case, the conditional distribution admits a simple expression $[eqn]$ , where $[eqn]$ .
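To illustrate why the linear case is cheap to simulate, the following is a minimal sketch in a truncated eigenbasis. It assumes (beyond what is stated above) that the linear operator is diagonal with negative eigenvalues `alpha[k]` and that the noise covariance has eigenvalues `lam[k]`, so that every mode is an independent Ornstein–Uhlenbeck process whose Gaussian transition law is known in closed form. The names `forward_marginal` and `sample_forward`, the truncation level `K` and the chosen spectra are illustrative only.

```python
import numpy as np

def forward_marginal(x0, alpha, lam, t):
    """Mean and variance of each mode X_t^k given X_0^k = x0[k] (alpha[k] < 0)."""
    decay = np.exp(alpha * t)
    mean = decay * x0
    var = lam * (1.0 - decay ** 2) / (-2.0 * alpha)
    return mean, var

def sample_forward(x0, alpha, lam, t, rng):
    """Draw X_t in one shot, without time stepping, mode by mode."""
    mean, var = forward_marginal(x0, alpha, lam, t)
    return mean + np.sqrt(var) * rng.standard_normal(x0.shape)

K = 32                                  # truncation level (assumption)
k = np.arange(1, K + 1, dtype=float)
alpha = -np.ones(K)                     # hypothetical choice: operator equal to minus identity
lam = k ** -2.0                         # hypothetical trace-class spectrum
x0 = 1.0 / k                            # hypothetical initial function, in coefficients
rng = np.random.default_rng(0)
xt = sample_forward(x0, alpha, lam, 0.5, rng)
```

Because the transition law is available in closed form, a noisy sample at any time `t` costs a single Gaussian draw per mode, which is what makes denoising-style training affordable in this setting.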

Condition 2.2 is particularly helpful in practical implementations, simplifying the computation of terms in the form $[eqn]$ , which would generally require working with a different basis and consequently induce extremely high computational cost. For simplicity of exposition in relating different works, we will often consider this assumption in our writing.

Under condition 2.2, $[eqn]$ . This simplified expression is central in the works [24,27,45] where reverse time dynamics are proved directly with approaches that do not explicitly require the existence of the conditional densities. Furthermore, in this particular case, the measures $[eqn]$ have a simple Gaussian expression $[eqn]$ . In particular, in the works [24,25,27], it is assumed that $[eqn]$ , and consequently, $[eqn]$ , $[eqn]$ . The proof strategy in these works revolves around the conditional expectation directly, without the need to explicitly define conditional densities. This is an advantage over the conditions imposed, e.g. in [56,57], which are designed for generally nonlinear processes. In particular, the key idea adopted for the proofs is to define projected, low-dimensional dynamics for the process $[eqn]$ , via $[eqn]$ , where $[eqn]$ is the projection operator on the first $[eqn]$ basis vectors of $[eqn]$ . Then, the existence of a backward process with a finite-dimensional expected value is proven using classical results. The core technical contribution is to show that as $[eqn]$ the processes remain well-behaved and converge to the original infinite-dimensional dynamics, allowing us to claim that equations in the form of equation (2.4) (with the substitution of equation (2.7)) are valid generative models. Summarizing, the linear case has the following important property

Theorem 2.2. Assume that $[eqn]$ and condition 2.2 holds. Then

(i) Condition 2.4 holds ( $[eqn]$ exists).

(ii) The conditional densities $[eqn]$ exist.

(iii)

[eqn]

(iv)

[eqn]

Proof.

(i) Consider temporarily the assumptions $[eqn]$ and $[eqn]$ . Condition 2.4 is equivalent to $[eqn]$ for any measurable set $[eqn]$ and any $[eqn]$ . Since $[eqn]$ , the implication is true if $[eqn]$ for $[eqn]$ a.e. $[eqn]$ . Given that both measures are Gaussian with means $[eqn]$ and $[eqn]$ , respectively, and covariance $[eqn]$ , the absolute continuity holds if $[eqn]$ for $[eqn]$ a.e. $[eqn]$ . Given that $[eqn]$ , this is equivalent to $[eqn]$ for all $[eqn]$ , which is exactly the assumption $[eqn]$ . We now prove that $[eqn]$ holds true if condition 2.2 holds. First, let us notice that the assumption is equivalent to assuming that $[eqn]$ . In this case, $[eqn]$ . Given a single $[eqn]$ , clearly $[eqn]$ satisfies the equality. Furthermore, its norm is equal to $[eqn]$ $[eqn]$ $[eqn]$ .

(ii) Condition 2.4 implies that the Radon–Nikodym derivative exists and is well defined. By restriction, $[eqn]$ exists and is uniquely defined as well. Then, by disintegration $[eqn]$ , from which we conclude the existence of $[eqn]$ . Based on this, $[eqn]$ . Since $[eqn]$ , the existence of the densities is proven.

(iii) $[eqn]$ . Furthermore, $[eqn]$ $[eqn]$ , where $[eqn]$ which has density w.r.t. $[eqn]$ . Then, $[eqn]$ , and $[eqn]$ $[eqn]$ . Upon simple algebraic manipulation, we can get the desired result (the proof is based on the content of [24,26,33]).

(iv) Due to the Feldman–Hajek theorem [37]

[eqn]

Then, $[eqn]$ . Consequently, $[eqn]$ $[eqn]$ and equation (2.8) holds (see [30] for further insights).

∎
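The linear, diagonal setting of theorem 2.2 can be made concrete with a small simulation of the projected reverse-time system. This is a sketch under strong assumptions that are not part of the survey: the data measure is taken to be Gaussian with independent modes of variance `s[k]`, purely so that the score of every marginal is available in closed form; the operator is diagonal with eigenvalues `alpha[k] < 0`; the noise has eigenvalues `lam[k]`; and the reverse drift uses the standard form "minus forward drift plus noise variance times score", discretized with Euler–Maruyama. In practice the score would be replaced by a learned approximation.

```python
import numpy as np

def marginal_var(t, s, alpha, lam):
    """Variance of mode k at forward time t when X_0^k ~ N(0, s[k])."""
    decay2 = np.exp(2.0 * alpha * t)
    return decay2 * s + lam * (1.0 - decay2) / (-2.0 * alpha)

def reverse_sample(s, alpha, lam, T, n_steps, n_samples, rng):
    """Euler-Maruyama simulation of the per-mode reverse-time SDE."""
    dt = T / n_steps
    # Start from the exact law at time T (known here because the data are Gaussian).
    y = np.sqrt(marginal_var(T, s, alpha, lam)) * rng.standard_normal((n_samples, s.size))
    for i in range(n_steps):
        t_fwd = T - i * dt                               # forward time seen by the reverse process
        score = -y / marginal_var(t_fwd, s, alpha, lam)  # closed-form Gaussian score
        drift = -alpha * y + lam * score
        y = y + drift * dt + np.sqrt(lam * dt) * rng.standard_normal(y.shape)
    return y

alpha = np.array([-1.0, -1.0])
lam = np.array([1.0, 0.25])
s = np.array([2.0, 0.05])
rng = np.random.default_rng(0)
y = reverse_sample(s, alpha, lam, T=2.0, n_steps=1000, n_samples=20000, rng=rng)
print(y.var(axis=0))  # should be close to s, up to Monte Carlo and discretization error
```

The empirical variances of the generated modes should approach the data variances `s`, which is the measure-matching property that the reverse dynamics are designed to deliver.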

If the linear operator $[eqn]$ satisfies certain spectral conditions, then $[eqn]$ , for all $[eqn]$ and $[eqn]$ , where both measures have known Gaussian form [37,58–60]. Consequently, $[eqn]$ for all $[eqn]$ , which implies $[eqn]$ , i.e. condition 2.4 holds ( $[eqn]$ exists). Unfortunately, this result, which is stronger than condition 2.4, does not hold for the most commonly used operators $[eqn]$ , like $[eqn]$ , as it requires $[eqn]$ , for generic $[eqn]$ [60].

Theorem 2.1 states that, under some conditions, the reverse time dynamics of equation (2.1) exist in the form of an SDE. However, for generative modelling purposes, the content of theorem 2.1 is stronger than necessary. What suffices is that there exists a new SDE whose time-varying law corresponds to the time reversal of the original $[eqn]$ . Clearly, if a time reversal exists, such a process satisfies the weaker measure matching condition.

To expand on this point and connect together the different generative modelling techniques discussed in the literature, it is necessary to introduce the Kolmogorov operators [31,61–64]

[eqn]

where $[eqn]$ is the time derivative, $[eqn]$ are first- and second-order Fréchet derivatives in space (since both the scalar product and the trace are defined on the Hilbert space $[eqn]$ , and not its dual, the quantities in the equation above should be interpreted after Riesz mapping $[eqn]$ ). The domain of the operator is assumed to be the set of smooth cylinder functions of finite support [31,61].

Provided appropriate conditions are satisfied, see for example [61,62], the time-varying measure $[eqn]$ exists, is unique, and solves the Fokker–Planck equation

[eqn]

The result of theorem 2.1 can be thus understood at the level of time reversal of measures: by simply considering the negative of the operator $[eqn]$ , we obtain $[eqn]$ . While $[eqn]$ can be associated with a new drift term, it is not possible to construct a Brownian motion term with covariance $[eqn]$ . However, $[eqn]$ . As shown in [31, App. B.2],

[eqn]

which allows us, as anticipated, to prove in a simpler way a weaker form of the content of theorem 2.1: an SDE in the form of equation (2.4), defined on a proper probability space, induces a time-varying measure which corresponds to the time-reversal of the original $[eqn]$ , i.e. the measure associated with equation (2.1). Furthermore, the equality in equation (2.11) allows us to claim that the measure $[eqn]$ is also a solution of the continuity equation (see [63–65] for definiteness and uniqueness)

[eqn]

which corresponds to an ordinary differential equation (ODE) in the Hilbert space with deterministic drift

[eqn]

These facts allow us to build a connection between the methodology discussed so far and other prominent approaches from the literature, as shown hereafter.
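As an informal check of the deterministic counterpart, the sketch below integrates a per-mode ODE whose drift is the forward drift minus one half of the noise variance times the score, the usual probability-flow form. This specific drift, together with the Gaussian data assumption (independent modes with variances `s[k]`), the diagonal operator with eigenvalues `alpha[k] < 0` and the noise eigenvalues `lam[k]`, is an assumption made for illustration. Under these assumptions the ODE should rescale every mode by the square root of the ratio of marginal variances.

```python
import numpy as np

def marginal_var(t, s, alpha, lam):
    """Variance of mode k at time t when X_0^k ~ N(0, s[k])."""
    decay2 = np.exp(2.0 * alpha * t)
    return decay2 * s + lam * (1.0 - decay2) / (-2.0 * alpha)

def probability_flow(x0, s, alpha, lam, T, n_steps):
    """Explicit Euler integration of dx/dt = alpha*x - 0.5*lam*score_t(x), per mode."""
    dt = T / n_steps
    x = x0.copy()
    for i in range(n_steps):
        score = -x / marginal_var(i * dt, s, alpha, lam)  # closed-form Gaussian score
        x = x + (alpha * x - 0.5 * lam * score) * dt
    return x

alpha = np.array([-1.0, -1.0])
lam = np.array([1.0, 0.25])
s = np.array([2.0, 0.05])
x0 = np.array([1.0, -0.3])
xT = probability_flow(x0, s, alpha, lam, T=2.0, n_steps=20000)
expected = x0 * np.sqrt(marginal_var(2.0, s, alpha, lam) / s)
```

The deterministic trajectory carries no noise, yet it transports the law at time 0 to the law at time `T`, in line with the continuity-equation view.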

Alternative generative approaches

In §2, we introduced a family of generative models rooted in (variants of) the time reversal formula, shown in equation (2.4). As anticipated, inverting the dynamics of a diffusion process is a sufficient, but not necessary, strategy to obtain a generative model in function space. In this section, we briefly overview some different methodologies which appeared in the literature, clearly connecting their working mechanism and underlying assumptions with the ones described previously.

(a) Annealed Langevin dynamics

Consider the $[eqn]$ dimensional sequence of measures $[eqn]$ . Suppose we have access to computational schemes which allow us, starting from a sample drawn from $[eqn]$ , to obtain a sample from the target $[eqn]$ . Then, a valid generative modelling technique is to run sequentially such computational schemes on the reverse order sequence. One particular approach [30] relies on running multiple annealed chains of Langevin sampling schemes, in the following fashion:

[eqn]

where $[eqn]$ has covariance $[eqn]$ . Dynamics in the form of equation (3.1) have the following fundamental property: their time-invariant measure is $[eqn]$ [19,66]. Notice that for the scheme to be valid, the Radon–Nikodym derivative should be well defined, i.e. condition 2.4 should hold, as required explicitly by the authors of [30]. Consequently, by selecting the sequence of measures as the law of a diffusion process in the form of equation (2.1), with the conditions described in theorem 2.2, the condition is satisfied, allowing comparison of the methods on the same ground.

To connect this scheme with the one described in [30], it is necessary to notice that $[eqn]$ . The iterative procedure that starts from a sample $[eqn]$ , simulates dynamics equation (3.1) for a sufficiently long time (to reach approximately steady state) and obtains a sample from $[eqn]$ , is consequently a valid generative model for $[eqn]$ .
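The annealing procedure can be sketched numerically in a truncated eigenbasis. The snippet below is only an illustration under assumptions that go beyond the text: the sequence of measures is taken to be the Gaussian marginals of a linear diagonal diffusion started from Gaussian data (so every score is known in closed form), and each level runs a noise-preconditioned Langevin chain with per-mode drift `lam * score` and diffusion `sqrt(2 * lam)`, discretized with Euler–Maruyama. It is not claimed to reproduce the exact scheme of [30].

```python
import numpy as np

def marginal_var(t, s, alpha, lam):
    """Variance of mode k at time t when X_0^k ~ N(0, s[k])."""
    decay2 = np.exp(2.0 * alpha * t)
    return decay2 * s + lam * (1.0 - decay2) / (-2.0 * alpha)

def annealed_langevin(s, alpha, lam, T, n_levels, n_inner, step, n_samples, rng):
    """Run one preconditioned Langevin chain per level, from time T down to time 0."""
    y = np.sqrt(marginal_var(T, s, alpha, lam)) * rng.standard_normal((n_samples, s.size))
    for t in np.linspace(T, 0.0, n_levels):
        v = marginal_var(t, s, alpha, lam)   # target variance at this level
        for _ in range(n_inner):
            score = -y / v                   # closed-form Gaussian score
            y = y + step * lam * score + np.sqrt(2.0 * step * lam) * rng.standard_normal(y.shape)
    return y

alpha = np.array([-1.0, -1.0])
lam = np.array([1.0, 0.25])
s = np.array([2.0, 0.05])
rng = np.random.default_rng(0)
y = annealed_langevin(s, alpha, lam, T=2.0, n_levels=4, n_inner=1000,
                      step=0.01, n_samples=5000, rng=rng)
print(y.var(axis=0))  # should be close to s, up to step-size bias and Monte Carlo error
```

Each inner chain only needs to move samples from one level to the next, which is why warm-starting from the previous level keeps the required number of inner steps moderate.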

Our goal hereafter is to discuss more clearly the connections between this approach and the schemes described previously. Considering a simple time-rescaling argument, it can be shown [37] that the dynamics in equation (3.1) are equivalent to

[eqn]

where $[eqn]$ is a Brownian motion with covariance $[eqn]$ . Then, under the assumptions of theorem 2.2, equation (3.2) reads

[eqn]

which is reminiscent of a classical Langevin equation in $[eqn]$ . Clearly, equation (3.3) preserves the measure $[eqn]$ . The reverse time dynamics described by equation (2.5) can then be interpreted as the schemes which anneal through the infinite ( $[eqn]$ ) sequences as follows:

[eqn]

Furthermore, the approach in [30] can also be directly linked with the denoising approach of [24,25,27] thanks to the result in equation (2.8), which allows us to appreciate the connection among all the methods and a denoising interpretation of the generative dynamics.

(b) Flow matching

It is worth mentioning that, in the literature, a purely deterministic approach to the problem (a generalization of the ODE equation (2.13)), built as a generalization of the finite-dimensional Flow Matching approach of [46], has been explored in [29]. In this work, the authors avoid the issues associated with time reversal by directly constructing a path of probability measures $[eqn]$ which anneals from a tractable initial measure $[eqn]$ (a Gaussian measure for example) to the target data distribution $[eqn]$ . This path is constructed by flowing the initial measure along a time-dependent vector field $[eqn]$ on the Hilbert space $[eqn]$ :

[eqn]

with $[eqn]$ being a given trace class covariance. Since it represents a deterministic evolution, equation (3.4) can equivalently be described by a flow mapping $[eqn]$ , which is usually the space in which the design of such models is performed. The evolution of $[eqn]$ along $[eqn]$ is governed by the continuity equation (again for $[eqn]$ in the set of cylinder test functions)

[eqn]

Notice that the difference between this methodology and the ones described previously is that the transformation of a Gaussian reference measure into the data measure takes place exactly in finite time, whereas for the other methodologies, this does not happen. Indeed, in equation (2.4), the initial conditions are drawn from $[eqn]$ , which can be close but not equal to a purely Gaussian distribution (unless $[eqn]$ is the steady-state distribution itself of the process, provided it exists, or $[eqn]$ ). Similarly, the annealed Langevin approach requires running the last step of the chains for an infinite amount of time.

Given a conditional vector field $[eqn]$ , it is possible to compute the induced sequence of measures $[eqn]$ . The inverse procedure, which is more important from a practical perspective (the annealed sequence is a design parameter and the field is unknown), is much more challenging. A priori, it is not even known whether, given a sequence of measures, a vector field which matches this sequence exists. In [29], the strategy for the construction of a sequence for which the field exists is split into steps: (i) selection of a family of vector fields $[eqn]$ which induce a flow of measures $[eqn]$ whose starting point is $[eqn]$ (thus independent from $[eqn]$ ) and ending one (at time $[eqn]$ ) is a measure concentrated around $[eqn]$ , (ii) assumption that $[eqn]$ for $[eqn]$ a.e. $[eqn]$ and almost every $[eqn]$ , and (iii) construction of $[eqn]$ . The biggest technical problem is to ensure that $[eqn]$ , which is proven to be true if the collection of parametrized measures are $[eqn]$ a.e. mutually absolutely continuous. This holds for a class of conditional flows (induced by the conditional fields) of the form $[eqn]$ , where $[eqn]$ are selected appropriately ( $[eqn]$ ), which corresponds to conditional vector fields

[eqn]

In this practical case, the sequence of measures $[eqn]$ admits known Gaussian closed form.
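A minimal sketch of such a conditional flow, in truncated coefficient space, is given below. The concrete parametrization is an assumption borrowed from the common linear-interpolation path used in finite-dimensional flow matching: the conditional mean moves linearly from zero to the data function `f`, and the scale shrinks linearly from 1 to a small `sigma_min`. The names `conditional_field` and `integrate_flow` are hypothetical, and the choice in [29] may differ in its details.

```python
import numpy as np

def conditional_field(g, f, t, sigma_min):
    """Vector field of the flow g_t = (1 - (1 - sigma_min) * t) * g_0 + t * f."""
    c = 1.0 - sigma_min
    return (f - c * g) / (1.0 - c * t)

def integrate_flow(g0, f, sigma_min, n_steps):
    """Explicit Euler integration of dg/dt = conditional_field(g, f, t) on [0, 1]."""
    g = g0.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        g = g + dt * conditional_field(g, f, i * dt, sigma_min)
    return g

K = 16
k = np.arange(1, K + 1, dtype=float)
rng = np.random.default_rng(0)
g0 = np.sqrt(k ** -2.0) * rng.standard_normal(K)  # draw from a trace-class Gaussian reference
f = np.sin(k) / k                                 # hypothetical data function, in coefficients
g1 = integrate_flow(g0, f, sigma_min=0.01, n_steps=100)
```

At time 1 the flow maps the reference sample `g0` to `f + sigma_min * g0`, that is, to a sample from a Gaussian measure concentrated around the data point, reached in finite time.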

The relationship between the technique described in [29] and the other ones presented previously can be understood assuming that $[eqn]$ , with $[eqn]$ . Then $[eqn]$ . Notice that, in this comparison, the flow starts from a measure $[eqn]$ which is only close to a purely Gaussian measure. Then, we can explicitly identify the vector field $[eqn]$ from equation (2.13), after simple manipulation, as $[eqn]$ , thanks to the content of the second form of the Fokker–Planck equation, as shown in equation (2.12).

This correspondence is useful in that the same conditions of theorem 2.2, which allow us to claim validity for the other class of models, are sufficient to ensure that the models considered in [29] are well defined. Indeed, other than integrability assumptions, the main assumption in Theorem 1 of [29] is that $[eqn]$ for $[eqn]$ a.e. $[eqn]$ and almost every $[eqn]$ . Under the same assumptions of theorem 2.2, this holds.

Theorem 3.1. As in theorem 2.2, assume that $[eqn]$ and condition 2.2 holds. Then, $[eqn]$ for $[eqn]$ a.e. $[eqn]$ and almost every $[eqn]$ .

Proof. We consider point (ii) of theorem 2.2, $[eqn]$ , which reads $[eqn]$ for any $[eqn]$ . Since $[eqn]$ , we have $[eqn]$ , for $[eqn]$ a.e. $[eqn]$ . Chaining the results, this implies that $[eqn]$ , for $[eqn]$ a.e. $[eqn]$ . Since both measures are Gaussian, this implies their equivalence, $[eqn]$ for $[eqn]$ a.e. $[eqn]$ . Then, for any $[eqn]$ , $[eqn]$ , which is enough to prove that $[eqn]$ for $[eqn]$ a.e. $[eqn]$ and almost every $[eqn]$ , since $[eqn]$ has full measure ([29], Theorem 2).∎

Parametric approximations

Our discussion so far has been rooted in the underlying assumption of having access to the key vector fields of the dynamics for the generation process: $[eqn]$ and $[eqn]$ . Clearly, this is not the case in practice, for realistic (and unknown) $[eqn]$ . In all implementations, the true vector field is replaced by a parametric ( $[eqn]$ ) approximation $[eqn]$ . Importantly, working in the functional domain calls for architectural choices suited to the infinite-dimensional setting. The most popular choice of resolution-invariant architecture is the family of Neural (Fourier) Operators [67,68]. These architectures have strong representational power but suffer from the drawback of requiring the input points to be collected on a regularly spaced grid [68], which limits their usage. Alternatively, transformer architectures [69] have been considered, by interpreting them as mappings between Hilbert spaces [70]. Their flexibility in handling irregularly spaced grids comes at a cost: resolution invariance has to be learned during training and is not guaranteed by default, as it is for Neural Operators. Finally, it is worth mentioning the class of Implicit Neural Representations [71], which have the required resolution-invariance properties and are known to be naturally good denoisers [72], but require meta-learning techniques to work properly [73]. Another possibility, not yet explored by the community, is to combine the different architectures mentioned, as done in the empirical work [74].
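The resolution invariance of Fourier-type layers, and their reliance on a regular grid, can be illustrated with a minimal sketch of a single spectral convolution. This is a simplified stand-in for a Neural Operator layer (one channel, no nonlinearity, random rather than learned weights); the function name and the test signal are our own assumptions.

```python
import numpy as np


def spectral_conv_1d(u, weights):
    """Rescale the lowest Fourier modes of u, sampled on a regular periodic grid.

    With norm="forward" the Fourier coefficients do not depend on the number
    of grid points, so the same weights act consistently at every resolution.
    """
    n = u.shape[-1]
    n_modes = len(weights)
    u_hat = np.fft.rfft(u, norm="forward")
    out_hat = np.zeros_like(u_hat)
    out_hat[..., :n_modes] = u_hat[..., :n_modes] * weights
    return np.fft.irfft(out_hat, n=n, norm="forward")


def signal(x):
    """Band-limited toy input function on [0, 1)."""
    return np.sin(2.0 * np.pi * x) + 0.5 * np.cos(6.0 * np.pi * x)


rng = np.random.default_rng(0)
# Stand-in for learned parameters: one complex multiplier per retained mode.
weights = rng.standard_normal(8) + 1j * rng.standard_normal(8)

x_coarse = np.arange(64) / 64
x_fine = np.arange(128) / 128
y_coarse = spectral_conv_1d(signal(x_coarse), weights)
y_fine = spectral_conv_1d(signal(x_fine), weights)
```

The outputs on the two grids agree at the shared points (`y_fine[::2]` versus `y_coarse`), which is the discrete counterpart of the layer being defined on functions rather than on vectors. The use of the FFT is also where the regular-grid requirement enters.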

(a) Loss functions and time discretization

While the selection of an appropriate architecture is an important endeavour, a fundamental aspect is the selection of appropriate loss functions for learning the parameters of such architectures. We then proceed by providing an overview of various loss functions employed in different approaches to infinite-dimensional diffusion models, highlighting their mathematical formulations and connections. Interestingly, all variants presented in the literature can be understood under the lens of learning to denoise a corrupted version of the input data.

The learning objective associated with equation (2.4) can be formulated as an evidence lower bound (ELBO) on the log-likelihood of the data (see [26] for full details). In particular, such ELBO arises from comparing the path measures of the forward and reverse diffusion processes, leveraging Girsanov’s theorem in infinite dimensions [37]. The discrepancy between the measure $[eqn]$ obtained with the approximated dynamics (and with initial conditions drawn from $[eqn]$ in place of $[eqn]$ ) and the true one can be expressed in Kullback–Leibler (KL) terms as

[eqn]

where $[eqn]$ . In the linear case, the conditional score term has expression [26]

[eqn]

Minimizing the loss $[eqn]$ is then equivalent to minimizing the ELBO. Related losses, but with different preconditioning, are also considered in [24,31,45]. A typical parametrization chosen for $[eqn]$ is in the form $[eqn]$ , from which

[eqn]

and it is thus evident that the optimal solution for the parametric network $[eqn]$ is the ‘denoiser’ $[eqn]$ .
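The denoising interpretation can be checked numerically in a toy setting where the optimal denoiser is known in closed form. The sketch below works in spectral coordinates and assumes Gaussian data, a trace class noise covariance that is diagonal in the same basis, and an Ornstein-Uhlenbeck-type perturbation $X_t = e^{-t/2}X_0 + \sqrt{1-e^{-t}}\,\xi$. All of these choices are ours, made for illustration only, and we read the 'denoiser' as the conditional expectation of the clean sample given the noisy one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_modes, n_samples, t = 16, 200_000, 0.5

k = np.arange(1, n_modes + 1)
data_var = k ** -3.0    # hypothetical spectrum of the data measure
noise_var = k ** -2.0   # hypothetical trace class noise spectrum

# Assumed perturbation kernel: X_t = a * X_0 + b * xi, with xi ~ N(0, C).
a = np.exp(-0.5 * t)
b = np.sqrt(1.0 - np.exp(-t))

x0 = rng.standard_normal((n_samples, n_modes)) * np.sqrt(data_var)
xi = rng.standard_normal((n_samples, n_modes)) * np.sqrt(noise_var)
xt = a * x0 + b * xi

# "Train" a per-mode linear denoiser D(x_t) = gain * x_t by least squares,
# i.e. by minimizing the empirical denoising loss mean ||D(X_t) - X_0||^2.
gain_fit = (x0 * xt).sum(axis=0) / (xt * xt).sum(axis=0)

# Closed-form conditional expectation E[X_0 | X_t] = gain_opt * X_t.
gain_opt = a * data_var / (a ** 2 * data_var + b ** 2 * noise_var)

denoising_loss = np.mean(np.sum((gain_fit * xt - x0) ** 2, axis=1))
```

In this Gaussian toy problem the fitted gains match the conditional-expectation gains up to Monte Carlo error, which is the finite-sample version of the statement that the minimizer of the denoising loss is the denoiser.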

While the continuous-time framework provides a strong theoretical foundation for understanding diffusion processes, practical implementations often rely on discrete-time approximations. In [28], chronologically the first work to consider Hilbert-space-valued generative diffusion models, the continuous path of measures $[eqn]$ is replaced by a finite sequence of measures, and the forward and reverse diffusion processes are approximated using discrete-time transitions, from which an ELBO analogous to equation (4.1) is obtained (effectively, integration over $[eqn]$ is replaced by a finite sum over the discrete steps of the generation chain). One important aspect of this approach is that the lower bound automatically accounts for the discretization of the SDEs, which the continuous-time methods do not include explicitly (when implemented, such methods clearly need some form of numerical integration [75–81]). In the finite-dimensional diffusion literature, it has been observed that, when the number of numerical integration steps is limited, directly optimizing the discrete-time lower bound provides better results [82]. While the same should in principle hold in the infinite-dimensional setting, to the best of our knowledge no work in the literature has directly explored the problem.

Other families of approaches [29,30] adopt similar techniques, where a conditional version of the target field is considered for learning the vector fields. In particular, for learning $[eqn]$ through the equality formalized by equation (2.8), the following loss is considered: $[eqn]$ . Each term in the loss corresponds to the mismatch between the true Langevin drift and the approximated one. Interestingly, in principle, the scheme in [30] could still achieve very good performance while having bad approximations for all the terms but the last one, provided the last chain is run for long enough. A quantitative analysis of such discrepancy is an interesting avenue for future work, but it is worth mentioning that the study of the convergence rate of Langevin dynamics in infinite dimensions is much more challenging than for the finite-dimensional case [19]. Finally, in [29], the considered loss is $[eqn]$ which, upon noticing that $[eqn]$ (equation (3.6)), can again be interpreted as a denoising objective.

Applications and extensions

One of the primary applications of diffusion models in function spaces is data generation at arbitrary resolutions. This capability is essential in domains where high-resolution or continuous data representations are required. For example, in the fields of image and audio synthesis, where traditional generative models operate in discrete, finite-dimensional spaces, diffusion models in function spaces can inherently capture and generate data that varies continuously across scales. Recent works have demonstrated that these models can be particularly powerful for generating high-resolution images or audio waveforms, with Neural Operators, INRs or Transformer architectures providing a way to model functional data while maintaining resolution invariance [26,27,31,50,83]. Other authors have mainly focused on time series modelling under the functional formalism [28,29], or focused on new methodologies for physical sciences simulations and inverse problems [30]. In general, function space diffusion models can be applied to any modality on which finite-dimensional techniques have been applied, and beyond. We comment that, other than the obvious application scenario of generative modelling, variants of these diffusion models have also been applied for other problems, which we overview in the following.

(a) Diffusion bridges and optimal control

For example, the works presented in [32,33,50] introduce an approach for simulating nonlinear diffusion bridges in infinite-dimensional spaces. The method involves considering a diffusion process (with measure $[eqn]$ )

[eqn]

and proving, through a generalization of Doob’s h-transform [84] to infinite dimensions, that the representation of an SDE corresponding to a process with path measure $[eqn]$ , where $[eqn]$ , $[eqn]$ , is again an SDE in the form

[eqn]

where $[eqn]$ . Unfortunately, excluding simple cases, knowledge of $[eqn]$ is out of reach. Even worse, a priori, it is not even known whether $[eqn]$ is differentiable and the problem is consequently well defined.

In the linear case, under the assumption $[eqn]$ , the derivative exists and furthermore $[eqn]$ , with known differentiable density $[eqn]$ (for the exact form and a proof refer to [59,85,86]).1 Then, noticing the time homogeneity of the problem $[eqn]$ $[eqn]$ $[eqn]$ , that is a form allowing for the computation of $[eqn]$ , which coincides with $[eqn]$ [50]. In [50], such formalism is investigated in deeper detail and adopted for building bridges between broader modifications of $[eqn]$ , for example for the measure $[eqn]$ . These extensions allow us, for example, to perform generic bridge matching and implement Bayesian learning in function space [50]. Other than for generative modelling purposes or stochastic control, stochastic bridges are also useful in other contexts, such as phylogenetic shape analysis, which models the stochastic change in animals' morphometry over time [87].
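As an illustration of the linear case, we sketch the simplest instance we are aware of: a driftless reference process $\mathrm{d}X_t = \mathrm{d}W_t^Q$ written in the eigenbasis of $Q$, so that each coordinate is an independent scaled Brownian motion. For this process, the h-transform drift towards a fixed endpoint $y$ at time $T$ is $(y - x)/(T - t)$ in every coordinate. The zero-drift reference process, the eigenvalue decay and the function name are simplifying assumptions of ours and do not reproduce the general setting of equations (5.1) and (5.2).

```python
import numpy as np


def simulate_bridge(x_start, y_end, eigvals, n_steps, rng, horizon=1.0):
    """Euler-Maruyama scheme for dX = (y - X) / (T - t) dt + dW^Q.

    Everything is expressed in the eigenbasis of Q, whose eigenvalues are
    `eigvals`, so the Q-Wiener increments are independent across coordinates.
    """
    dt = horizon / n_steps
    x = np.array(x_start, dtype=float)
    path = [x.copy()]
    for i in range(n_steps):
        t = i * dt
        drift = (y_end - x) / (horizon - t)      # h-transform correction
        noise = np.sqrt(eigvals * dt) * rng.standard_normal(x.shape)
        x = x + drift * dt + noise
        path.append(x.copy())
    return np.stack(path)                        # (n_steps + 1, n_modes)


rng = np.random.default_rng(0)
n_modes = 32
eigvals = np.arange(1, n_modes + 1) ** -2.0      # assumed trace class spectrum
x_start = np.zeros(n_modes)
y_end = rng.standard_normal(n_modes) * np.sqrt(eigvals)

path = simulate_bridge(x_start, y_end, eigvals, n_steps=1000, rng=rng)
```

The simulated path starts at `x_start` and is pinned to `y_end` up to the noise injected in the last Euler step. For nonlinear dynamics the correction term is not available in closed form, which is precisely the difficulty discussed in the text.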

When considering nonlinear SDEs, the situation becomes more complex as, in general, it is harder to prove the existence of the derivative of $[eqn]$ , and it is impossible to sample from $[eqn]$ without having first constructed the bridge, which introduces a circular dependency. The authors in [32,33] adopt the strategy of time reversal to solve such problems and simplify the implementation.

For the following passages to be valid, it is important to assume the existence of the various Radon–Nikodym derivatives involved, whose proof is in general a far from trivial task. We consider for simplicity of exposition deterministic terminal conditions, i.e. $[eqn]$ , and consequently $[eqn]$ . By writing $[eqn]$ , we can explicitly write $[eqn]$ , and in particular then $[eqn]$ . The time reversal of equation (5.2) involves the distribution $[eqn]$ associated to $[eqn]$ , and consequently $[eqn]$ . Then $[eqn]$ . Conceptually, this implies the following: to simulate a bridge in the form of equation (5.2), it is possible to learn $[eqn]$ from equation (5.1), which has no constraints on the ending value and is thus an easier task, and then simulate the backward dynamics of equation (5.2), where initial conditions are known and the drift term is equal to $[eqn]$ . For full details and formal validity of the derivations, we refer the reader to [32,33].

(b) Bayesian inverse problems

Finite-dimensional diffusion models have been explored as strong priors for Bayesian inverse problems in multiple domains [51,52]. The extension of these techniques to the Hilbert space settings has been explored by the authors in [24,25], in the case of linear and nonlinear observations, respectively. Considering the linear observation problem, it is assumed to have access to $[eqn]$ , where $[eqn]$ is a linear operator and $[eqn]$ is an additive noise random variable with a density $[eqn]$ w.r.t. the Lebesgue measure. Then, if $[eqn]$ (note that the work in [24] considers a more generic $[eqn]$ ) it is possible to show that the following scheme

[eqn]

with $[eqn]$ being a $[eqn]$ -Wiener process provides, at time $[eqn]$ , a valid sample from the posterior distribution $[eqn]$ . The result is obtained considering the particular case of equation (2.1) where $[eqn]$ and $[eqn]$ . The proof requires mimicking the result of [45], extending it for the conditional case. The authors extend their work in two directions: first, considering the case of nonlinear operator $[eqn]$ , and second, performing conditional sampling with only knowledge of the unconditional $[eqn]$ (thus allowing the technique to adopt any pre-trained unconditional model as in [88]). A simplified expression of the scheme proposed in [25] reads

[eqn]

where $[eqn]$ is a parameter which is annealed from large to small values. Given the similarity between equations (5.4) and (3.3), it is natural to interpret the former as an infinite-dimensional generalization of the conditioning trick [89], which in finite dimension reads $[eqn]$ . We invite the reader to refer to [25] for generalizations of this scheme and quantitative bounds on the quality of the approximated posterior.
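In finite dimensions, conditioning through the score rests on the Bayes-rule decomposition $\nabla_x \log p(x \mid y) = \nabla_x \log p(x) + \nabla_x \log p(y \mid x)$, which we assume is the identity referred to above. The sketch below checks it on a linear-Gaussian toy problem, where the posterior is available in closed form. The prior spectrum, the observation operator `A` and the noise level are hypothetical choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_obs, noise_std = 8, 3, 0.1

prior_var = np.arange(1, dim + 1) ** -2.0        # diagonal Gaussian prior (assumed)
A = rng.standard_normal((n_obs, dim))            # hypothetical linear observation operator
x_true = rng.standard_normal(dim) * np.sqrt(prior_var)
y = A @ x_true + noise_std * rng.standard_normal(n_obs)


def prior_score(x):
    """Score of the unconditional prior N(0, diag(prior_var))."""
    return -x / prior_var


def likelihood_score(x):
    """Gradient in x of log p(y | x) for Gaussian observation noise."""
    return A.T @ (y - A @ x) / noise_std ** 2


def posterior_score(x):
    """Conditional score, assembled from the unconditional score and the likelihood."""
    return prior_score(x) + likelihood_score(x)


# Closed-form Gaussian posterior mean, used only as a reference.
precision = np.diag(1.0 / prior_var) + A.T @ A / noise_std ** 2
post_mean = np.linalg.solve(precision, A.T @ y / noise_std ** 2)

residual = posterior_score(post_mean)
```

The assembled score vanishes at the analytical posterior mean, as it must for a Gaussian posterior. In the functional setting the prior score would be replaced by a pre-trained unconditional model, which is what makes this decomposition attractive in practice.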

Conclusions

In this survey, we have explored the extension of diffusion-based generative models from finite-dimensional settings to infinite-dimensional function spaces, focusing on the theoretical underpinnings and practical implementations of such models in Hilbert spaces. By leveraging SDEs and the properties of time reversal in infinite dimensions, we have highlighted how these models can effectively generate function-valued data, offering promising applications in domains that require high-resolution generative capabilities.

While significant progress has been made in adapting diffusion models to infinite-dimensional settings, several challenges remain open. One of the key difficulties is ensuring the efficient computation of generative dynamics in function spaces without sacrificing resolution invariance. Approximations and parametric methods, such as those based on neural Fourier operators and transformer architectures, have shown promise, but further improvements are necessary to fully realize the potential of these models in practical settings.

An interesting direction for future research would be the exploration of perturbation-based scenarios for infinite-dimensional diffusion models, akin to those already developed for finite-dimensional cases. For instance, works such as [90–93] have successfully extended finite-dimensional diffusion models to account for different perturbative settings. Applying these methods to function spaces could open new avenues for developing more robust generative models that operate under complex data perturbations or in noisier, real-world applications. This extension could provide stronger priors for solving inverse problems, denoising functional data or addressing generative tasks in physical sciences. Alternatively, it would be interesting to draw the functional domain equivalent of the alternatives to diffusion in finite dimensions [47,94,95].

Moreover, the theoretical understanding of how these perturbation techniques interact with the infinite-dimensional structure remains an open problem. As with finite-dimensional models, establishing precise conditions for the stability, convergence and efficiency of these models in infinite dimensions will be critical for both theoretical development and their practical utility.

In conclusion, infinite-dimensional diffusion models represent a powerful and flexible approach to generative modelling in functional domains. Their potential in practical applications is clear but under-explored, and further extensions, especially along the lines of perturbation methods, could significantly advance both the theory and applications of functional models.

Bibliography


1. Song Y, Sohl-Dickstein J, Kingma DP, Kumar A, Ermon S, Poole B. 2021 Score-based generative modeling through stochastic differential equations. In Int. Conf. on Learning Representations, Vienna, Austria, 4 May, 2021. Red Hook, NY: Curran Associates, Inc. https://www.proceedings.com/75296.html.
2. Kong Z, Ping W, Huang J, Zhao K, Catanzaro B. 2021 Diffwave: a versatile diffusion model for audio synthesis. In Int. Conf. on Learning Representations, Vienna, Austria, 4 May, 2021. Red Hook, NY: Curran Associates, Inc. https://www.proceedings.com/75296.html.
3. Liu J, Li C, Ren Y, Chen F, Zhao Z. 2022 Diffsinger: singing voice synthesis via shallow diffusion mechanism. In Proc. of the AAAI Conf. on Artificial Intelligence, Online, 22 February - 1 March 2022, vol. 36, pp. 11020–11028. Red Hook, NY: Curran Associates. (doi:10.1609/aaai.v36i10.21350)
4. He Y, Yang T, Zhang Y, Shan Y, Chen Q. 2022 Latent video diffusion models for high-fidelity video generation with arbitrary lengths. arXiv (doi:10.48550/arXiv.2302.10130)
5. Ho J et al. 2022 Imagen video: high definition video generation with diffusion models. See https://imagen.research.google/video/paper.pdf.
6. Hoogeboom E, Satorras VG, Vignac C, Welling M. 2022 Equivariant diffusion for molecule generation in 3D. In Int. Conf. on Machine Learning, Baltimore, MD, 17–23 July 2022. Red Hook, NY: Curran Associates, Inc. https://www.proceedings.com/68317.html.
7. Luo S, Hu W. Diffusion probabilistic models for 3d point cloud generation. In Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Online, 19–25 June 2021, pp. 2837–2845. Red Hook, NY: Curran Associates, Inc. https://www.proceedings.com/60773.html.
8. Trippe BL, Yim J, Tischer D, Baker D, Broderick T, Barzilay R, Jaakkola T. 2022 Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem. In International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023. Red Hook, NY: Curran Associates, Inc. https://www.proceedings.com/75096.html.