Fisher information regularization schemes for Wasserstein gradient flows
Wuchen Li, Jianfeng Lu, Li Wang

TL;DR
This paper introduces a Fisher information regularized variational scheme for Wasserstein gradient flows, improving convexity, stability, and computational efficiency, with applications to various PDEs.
Contribution
It develops a novel regularization approach based on Fisher information within the Jordan--Kinderlehrer--Otto framework, enhancing numerical stability and efficiency.
Findings
Improves convexity and stability of Wasserstein gradient flow computations.
Reduces computational cost by eliminating the need for additional time interpolation.
Demonstrates effectiveness on multiple PDE examples, including porous media and nonlinear Fokker-Planck equations.
Abstract
We propose a variational scheme for computing Wasserstein gradient flows. The scheme builds upon the Jordan--Kinderlehrer--Otto framework with the Benamou-Brenier's dynamic formulation of the quadratic Wasserstein metric and adds a regularization by the Fisher information. This regularization can be derived in terms of energy splitting and is closely related to the Schr{\"o}dinger bridge problem. It improves the convexity of the variational problem and automatically preserves the non-negativity of the solution. As a result, it allows us to apply sequential quadratic programming to solve the sub-optimization problem. We further save the computational cost by showing that no additional time interpolation is needed in the underlying dynamic formulation of the Wasserstein-2 metric, and therefore, the dimension of the problem is vastly reduced. Several numerical examples, including porous…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Fisher information regularization schemes for Wasserstein gradient flows
Wuchen Li
Mathematics department, University of California, Los Angeles 90095
,
Jianfeng Lu
Departments of Mathematics, Physics, and Chemistry, Duke University, Box 90320, Durham, NC 27708.
and
Li Wang
School of Mathematics, University of Minnesota, Twin cities, MN 55455.
Abstract.
We propose a variational scheme for computing Wasserstein gradient flows. The scheme builds upon the Jordan–Kinderlehrer–Otto framework with the Benamou-Brenier’s dynamic formulation of the quadratic Wasserstein metric and adds a regularization by the Fisher information. This regularization can be derived in terms of energy splitting and is closely related to the Schrödinger bridge problem. It improves the convexity of the variational problem and automatically preserves the non-negativity of the solution. As a result, it allows us to apply sequential quadratic programming to solve the sub-optimization problem. We further save the computational cost by showing that no additional time interpolation is needed in the underlying dynamic formulation of the Wasserstein-2 metric, and therefore, the dimension of the problem is vastly reduced. Several numerical examples, including porous media equation, nonlinear Fokker-Planck equation, aggregation diffusion equation, and Derrida-Lebowitz-Speer-Spohn equation, are provided. These examples demonstrate the simplicity and stableness of the proposed scheme.
Key words and phrases:
Time discretization; Gradient flow; Fisher information; Optimal transport; Schrödinger bridge problem.
1. Introduction
Consider the general continuity equation of the form:
[TABLE]
where , is the particle density function, is an internal energy, is a drift potential, and is an interaction potential. , are gradient and divergence operators with respect to in . This equation can be derived as a mean-field limit of particle systems with a number of physical and biological applications, such as granular materials [22], chemotaxis, animal swarming [19, 3], and many others. In particular, the Fokker-Planck equation [44], porous medium equation [54], aggregation equation [59, 38], Keller-Segel equation [45], and quantum drift-diffusion equation [41] all fall within this framework.
As written, equation (1) possesses two immediate properties: it preserves the non-negativity of the solution and conserves total mass. Therefore, in what follows, we will always consider nonnegative initial data with mass one, so that the solution is in the set of probability measures on , . The third property of (1) is the dissipation of the energy, which can be seen as follows. Given an energy , we may formally define its gradient with respect to the quadratic Wasserstein metric as
[TABLE]
where always denotes the first variation in throughout the paper. Comparing it with (1), one can write the velocity field as , and view equation (1) as the gradient flow of the energy
[TABLE]
Differentiating the energy (2) along solutions of (1), one formally obtains the decreasing of energy along the gradient flow , which indicates that the solution evolves in the direction of steepest descent of an energy. This property entails a full characterization of the set of stationary states, and provides a necessary tool to study its stability.
Desired numerical methods for (1) are to attain all three properties above at the discrete level, which, however, are rather challenging. Existing methods have been developed on different prospects of the equation. One kind of methods views it as an advection diffusion equation and employs finite difference, finite volume, or discontinuous Galerkin [39, 31, 26, 52, 58, 1]. Such methods are explicit or semi-implicit in time, so the per time computation is cheap. But they often suffer from stability constraints, due either to the degeneracy of the diffusion or the non-locality from the interaction potential, such as the mesa problem [51]. Another approach leverages structural similarities between (1) and equations from fluid dynamics to develop particle methods [36, 12, 9, 53, 15]. On one hand, particle methods naturally conserve mass and positivity, and they can also be designed to respect the underlying gradient flow structure of the equation so as to dissipate the energy along time. On the other hand, a large number of particles is often required to resolve finer properties of solutions.
A third class of methods builds on a variational formulation, following the seminal work by Jordan, Kinderlehrer, and Otto [44]. Given a time step , the scheme (known as the JKO scheme) recursively defines a sequence as
[TABLE]
where , and denotes the quadratic Wasserstein distance between two probability measures. Therefore, (3) offers a positivity preserving, energy dissipating, and unconditionally stable time discretization. A major bottleneck of this approach is the computation of , which is an infinite dimensional minimization problem. Hence, early works that use (3) avoid direct computing either by linearization [42, 11] or by diffeomorphisms [10, 23, 24], which lead to methods that lose some inherited properties in (3) or are limited by complicated geometry and structure. Only recent progress in computing has enabled the direct application of (3) [56, 6, 32, 27].
In the present work, we will adopt Benamou-Brenier’s dynamic formulation for the Wasserstein distance [5]. In particular, given two measures and , their Wasserstein distance can be obtained by solving
[TABLE]
where is the outer unit normal on the boundary of the domain . Adapting (4) into (3), and let , we have the following computable reformulation of the JKO scheme: given , with solving
[TABLE]
where
[TABLE]
To solve (5)–(9), there are two sources of difficulties. One lies in the non-smooth function of , so that second order information that often used to accelerate the optimization can not be applied. Occasionally, erroneous solution near may be produced. The other comes from the artificial time introduced in the dynamic formulation (4), which increases the dimension of the problem.
To overcome these two issues, we propose the following scheme: where
[TABLE]
The additional term, is the Fisher information functional. It keeps away from zero (see Theorem 5) and thus simplifies to just . More importantly, it improves the convexity of the cost functional and gives access to the second order sequential programming which enjoys much faster convergence. In addition, we replace the time derivative in the dynamics by a one step finite difference. We shall show in Theorem 3 that such a simplification will not violate the first order accuracy of the original JKO formulation (5). Furthermore, as we shall see in Section 3, compared to classical backward Euler method that may suffer from ill-conditioned Jacobian [1], (10) provides a symmetric, structure preserving version of implicit method that is insensitive to the condition number of the Hessian, and has guaranteed convergence. We note that the relation between Fisher information and Schrödinger bridge problem (SBP) can be seen from [13, 29, 35, 46, 48], and will be further discussed in Section 2.
It is also important to mention that the Fisher information regularization is closely related to the entropic regularization that has been successfully applied in many optimal transport problems [37, 56, 14, 40, 32]. There, the Kantorovich formulation based on the joint distribution between two measures is adopted, and the entropic regularization term is added to the cost function, so that an iterative projection method, Sinkhorn method or more general Dykstra’s algorithm, can be applied with linear convergence. This method, when applied to gradient flow problem, has a major difficulty in computing the proximal of the energy (2) with respect to the Kullback-Leibler divergence, which does not have a closed form in general.
The paper is organized as follows. In the next section, we provide necessary background on the dynamical formulation of Schrodinger bridge problem (SBP), its relation with Wasserstein gradient flows and the Fisher information functional. We then derive the Fisher information regularized semi-discrete scheme in the end. In Section 3, we introduce a fully discrete scheme and study the properties of this new scheme. Numerical results are provided in Section 4, and the paper is concluded in Section 5.
2. Semi-discretization with Fisher information regularization
In this section, we briefly review the Schrödinger bridge problem, Fisher information regularization and Wasserstein gradient flow. We then weave together these ideas to derive our new regularized time discretization.
2.1. Schrödinger Bridge problem and Fisher regularization
Consider a bounded convex domain , and the probability density space
[TABLE]
Definition 1** (Schrödinger bridge problem).**
Denote . Given , , let
[TABLE]
where the infimum is taken among all drift functions and density functions satisfying the Fokker-Planck equation
[TABLE]
with the fixed initial and ending density functions
[TABLE]
and Neumann boundary condition for : on .
Here, , are two given constant parameters in . Note specifically that when , equals to the Wasserstein-2 distance between and , where the problem (11) is equivalent to the Benamou–Brenier formula [7]. Here we are using the product as a regularization parameter just to facilitate the derivation for the gradient flow later.
Interestingly, the variational problem (11) has the following symmetric reformulations.
Proposition 2** (Fisher information regularization).**
Denote , then
[TABLE]
subject to the dynamical constraint
[TABLE]
with initial and boundary conditions:
[TABLE]
Proof.
First, rewrite the the Fokker-Planck equation (12) as follows:
[TABLE]
where we notice the fact that , and .
Denote , and let , then (12) reduces to (14). Next, we shall show that with the above definition of , the cost functional in (13) is the same as that in (11). Indeed,
[TABLE]
We then show that last term in the above equation only depends on the initial and final condition. This is seen from the fact that
[TABLE]
where the second last equality comes from the definition of first variation. The other direction of the equivalence follows similarly. ∎
Here, the symmetric version of Schrödinger bridge problem relates to the optimal control problem of gradient flows. See related geometric studies in [47, 49]. The additional term
[TABLE]
in the cost functional is named the Fisher information. In the sequel, we will apply the symmetric SBP to compute the Wasserstein gradient flow with serving as a regularization. The numerical benefits of this regularization will be discussed in Section 3.
2.2. Energy splitting and time discretization
We are now ready to derive the main scheme (10) of this paper. Starting from the classical JKO formulation (5) of the Wasserstein gradient flow (1), we split the energy into two parts
[TABLE]
and move to the flow constraint in (5). Here is the entropy defined above. More specifically, given , we update by solving the following new form for :
[TABLE]
Intuitively, the difference between (5) and (16) lies in the flow of , , between and . In (5), the flow is purely convective, and one controls it at final time using full energy ; whereas in (16), the diffusion effect in full energy (i.e., ) is moved to modify the flow so that the flow is both convective and diffusive, and therefore one only need to control its partial energy at the final time. Moreover, we observe that, the flux for in this new form is the same as that in the original form (5). Since (5) provides a first order approximation of that resembles backward Euler scheme, the equivalence in the flux implies that (16) is also a first order approximation in terms of . Indeed, we rewrite (5) using the Lagrangian multipliers
[TABLE]
then the optimality condition leads to
[TABLE]
Therefore satisfies the Hamilton-Jacobi equation , and
[TABLE]
Plugging it into the constraint PDE in (5), one gets the flux for the original gradient flow equation (1) after one time step . Similarly, we rewrite (16) as
[TABLE]
then the optimality condition leads to
[TABLE]
Consequently, , which substituting back into the constraint of (16) leads to the same flux for as in (18).
Next, we rewrite (16) in line with Proposition 2. Let (so that ), and plug it into the objective function in (16), we have
[TABLE]
where the reformulation of the third term in the second equation follows (15). Omitting the tilde in (19), (16) can be reformulated as
[TABLE]
In practice, we want to remove the additional dimension induced by the flow, and therefore approximate the derivate in in the constraint PDE of (20) by a one step difference and the integral in time in the objective function by a one term quadrature. This leads to our main scheme (10). In the following theorem, we show that, such an approximation does not violate the first order accuracy of the original JKO scheme.
Theorem 3** (Fisher information regularization scheme).**
The minimizer of the variational problem (10) is a first-order time consistent scheme for Wasserstein gradient flow (1).
Proof.
First it is straightforward to check that variational problem (10) is strictly convex. We then solve it by the Lagrange multiplier method. Define the Lagrangian as:
[TABLE]
The critical solution of above variation problem forms
[TABLE]
The solution of above system satisfies
[TABLE]
Denote the solution of above system as follows: . We then derive the following update
[TABLE]
Therefore scheme (10) is a first order time discretization. ∎
It is worth mentioning that there are several cases that the Fisher information regularization schemes are exact for the computation of gradient flows.
Proposition 4** (Exact cases).**
If , then the iterative scheme (10) is a first order scheme for the equation
[TABLE]
Proof.
If , then , the scheme (10) becomes
[TABLE]
Following Theorem 3, the algorithm is a consistent first time discretization of Wasserstein gradient flow for functional . ∎
Several remarks are in order.
Remark 1* (Comparison with the classical JKO scheme).*
Compared with the classical approach of JKO (5), our method does not require any inner time interpolation in the underlying dynamical formulation. It still preserves the first order time accuracy of the time discretization.
Remark 2* (Schrödinger Bridge problem proximal).*
The variational problem (20) can be viewed as a Schrödinger bridge proximal method of Wasserstein gradient flows.
Remark 3* (Comparison with entropic regularization of gradient flow [57]).*
We also compare our method with the entropic gradient flow studied in [14, 57]. A known fact is that when , the SBP problem has the static formulation [55]
[TABLE]
where is a constant and the infimum is over all joint histogram with marginals , . In [14, 57], the algorithm applies the above static formulation and considers the iterative regularization algorithm for the computation of gradient flow. Our formulation mainly uses the dynamical formulation of SBP, especially its time symmetric version in Proposition 2.
Remark 4* (Generalized regularization functional).*
Besides using , we can also study other types of regularizations, e.g., . We leave these studies for future works.
3. Full discretization and optimization algorithm
In this section, we detail the spatial discretization and provide a complete algorithm for the fully discrete problem. The underlying principle for spatial discretization is to preserve the structure of Wasserstein metric tensor in the discrete sense so that it can be easily adapted to unstructured grid and more complicated equations with energy involving high order derivatives. Thanks to the Fisher information regularization, the resulting optimization is strictly convex and therefore gives access to second order Newton type optimization algorithms.
3.1. Spatial Discretization
To better explain the idea, we first consider the discretization in one spatial dimension on uniform grid. Let be the computational domain and and be the spatial grid and temporal step respectively. Choose , and define
[TABLE]
where , and . Note first that from the boundary condition, then the cost function in scheme (10) can be discretized as
[TABLE]
where in its general form reads
[TABLE]
Here and are vector representations of vectors and , i.e., , and . The constraint is discretized with center difference in space as follows
[TABLE]
and the zero boundary conditions is applied.
Extension to two dimension is straightforward. Denote
[TABLE]
then the no-flux boundary condition on are imposed dimension by dimension, i.e.,
[TABLE]
The cost function then writes
[TABLE]
and the constraint becomes
[TABLE]
Generalization on graphs can be found in related studies [33, 34].
Upon spatial discretization, we therefore have the following finite dimensional variational problem:
[TABLE]
Here is a vector of sub index (e.g., in two dimension), or , . Written in this way, the discretization can be directly generalized to unstructured grid.
Remark 5*.*
In practice, we will impose an non-negativity of to avoid unexpected negative solution when the optimization is not fully converged, i.e., the iteration terminates when the stopping criteria is met. However, as we will show in Theorem 5, the non-negativity shall be preserved when the underlying optimization is solved exactly.
Denote its minimizer as , then . We study the property of problem (26). Note that the constraints contain both equalities and inequalities, we will demonstrate that the Fisher information regularization plays the crucial role of penalty function, which enforces the density solution staying in the interior of probability simplex. We next prove several properties of the proposed algorithm.
Theorem 5**.**
For each , the following properties hold for scheme (26):
- (i)
There exists a unique minimizer for the problem;
- (ii)
The modified energy decays
[TABLE]
- (iii)
There exists a constant , such that
[TABLE]
- (iv)
The total mass is conserved
[TABLE]
Proof.
(i) The proof is based on the result of [50]. For the completeness of paper, we present it here. We shall show that
- (1)
The discrete Fisher information functional is shown to be positive infinity on the boundary of the probability set. Thus the minimizer of (26) is obtained in the interior of simplex.
- (2)
The optimization problem (26) is strict convex in the interior of the constraint.
For notational convenience, we denote
[TABLE]
We first show that the minimizer of (26) in term of is strictly positive. This is true since is positive infinity on the boundary of simplex set, i.e.
[TABLE]
where is the vertices set of the discretization. Suppose the above is not true, there exists a constant , such that if there exists some , , then
[TABLE]
where is the edge set of the discretization. Notice that each term in (27) is non-negative, thus
[TABLE]
for any edge. Since , the above formula further implies that for any , . This is true since if , we have
[TABLE]
Similarly, we show that for any nodes , . Here is the neighborhood of node in the discretization grids. We iterate the above steps a finite number of times. Since the lattice graph is connected and the set is finite, we obtain , for any . This contradicts the assumption that , which finishes the proof.
We now prove that is strictly convex in the variable with a constraint , , for any . We shall show
[TABLE]
Here , and is the constraint for lying on the simplex set. Notice the fact that
[TABLE]
where
[TABLE]
Hence
[TABLE]
where is due to the convention that each edge is summed twice.
We next show that the strict inequality in (28) holds. Suppose (28) is not true, there exists a unit vector such that
[TABLE]
Then . Combining this with the constraint , we have , which contradicts that is a unit vector.
Second, we show that is strictly convex in . Notice that is in the interior of optimization domain, we have , thus the objective function is smooth. We shall show that , where
[TABLE]
subject to
[TABLE]
Here, is the smallest eigenvalue of Hessian matrix for the objective function with tangent vectors . We last show that is a smooth, convex function in the interior of simplex set. We have
[TABLE]
Since is convex when and is concave on variables , . Then is convex. From (28), we have
[TABLE]
We claim that the inequality in (31) is strict. Suppose there exists , such that (31) is zero, i.e.
[TABLE]
In this case, from (28), . Thus (31) forms
[TABLE]
Since is strictly positive, we have , which contradicts the fact that . From the above statements, we prove that there exists a unique solution
(ii) Denote as the minimizer of variation problem (26). Then
[TABLE]
This further implies
[TABLE]
which finishes the proof.
(iii) holds since goes to infinity on the boundary of simplex set.
(iv) is true, because the continuity equation in (26) satisfies
[TABLE]
This finishes the proof. ∎
3.2. Optimization method
To solve (26), we first rewrite our problem into a vector form. Let , then (26) can be written as
[TABLE]
where is defined in (22), is the matrix representation of the constraint (23) or (25), and is a selection matrix that only selects the components in . Let be the indicator function, then (32) can be further reformulated as
[TABLE]
Here defined in either (22) or (24) is a smooth, convex function (provided is convex), and indicator function is also convex. Therefore we adopt the (approximate) sequential quadratic programming to solve it:
[TABLE]
where is either the Hessian or an approximation of it.
[TABLE]
There are several approaches to solve the subproblem in line 6. Among them, we have tried interior point method, projected preconditioned conjugate gradient method [43], and first order fast iterative shrinkage thresholding algorithm (FISTA) [4]. In our case where is sparse but ill conditioned, we found that the MATLAB built-in function ‘quadprog’ with interior point solver performs the best.
3.3. Convergence
In this section, we analyze the convergence of (34), especially the role that plays. We have the following assumptions:
- (A1)
, ;
- (A2)
the subproblem in (34) is solved exactly.
First we note that can be rewritten as
[TABLE]
where . Further, let be the unique minimizer to (33), then solves
[TABLE]
where and is any positive definite matrix. Our first result is, when is only an approximation of , we get first order convergence with convergence rate depends on the condition number . More specifically, we have
Theorem 6**.**
Consider uniform time step . Let , then
[TABLE]
where is the condition number of .
To prove the above theorem, we need the following lemma on the contraction of the proximal operator.
Lemma 7**.**
If , , where is a convex function, and is a positive definite matrix, then we have . Consequently, .
The proof of this lemma is standard, so we omit the details and directly jump to the proof of Theorem 6.
Proof of Theorem 6.
By virtue of (35), and (36) with , we have
[TABLE]
where the first inequality uses Lemma 7. Since both and are positive definite from assumption (A1), we denote as the eigenvalues of , then
[TABLE]
Here we have used the fact that for two symmetric positive semi-definite matrix and , if , then . Indeed, since , let , we have , therefore, .
Choose in (39) so that it minimize its RHS, and plug it into (38) to get the final result. ∎
Remark 6* (Comparison with proximal gradient).*
Consider the proximal gradient method for solving (33)
[TABLE]
Comparing it to (35), we see that (35) is a preconditioned version of (40). Indeed, substituting (36) with from (40), we get
[TABLE]
where is the condition number of . Therefore when is ill-conditioned, which is the case in the presence of vacuum due to the nonlinear diffusion, the convergence rate in (41) is much slower than that in (37).
Remark 7* (Choice of ).*
In our problems when the energy term only contains internal and potential energies, both of which are local in , we directly compute the Hessian of as , since in this case the Hessian is sparse and very cheap to compute. When also contains interaction energy, the Hessian of is dense, and we instead approximate the Hessian of by replacing the interaction energy with entropy , and adjusting the parameter in the Fisher information term to approximate the original Hessian. More specifically, for the general case where is
[TABLE]
We compute the Hessian of
[TABLE]
as an approximation of . Here is an integer multiple of .
We close this subsection by stating following result that when is exact Hessian of , we obtain local quadratic convergence. We omit the proof, which is standard.
Theorem 8**.**
Assume further that . If in (34), then for sufficiently large , , and satisfies
[TABLE]
4. Numerical examples
In this section, we demonstrate several numerical examples to show the accuracy and efficiency of the proposed scheme (26). The stopping criteria in the sub optimization problem (see line 9 in the Algorithm) is chosen as
[TABLE]
where TOL is set to be unless otherwise specified.
4.1. 1D problem
4.1.1. Heat equation
For heat equation, we directly choose , and let the initial condition be
[TABLE]
In Fig. 1 on the left, we apply (26) on a coarse mesh and compare the solution with the reference solution obtained by implicit diffusion solver on a fine mesh, and observe good agreements. When is sufficiently small, we check the first order accuracy of our scheme by computing the following relative error
[TABLE]
and error with respect to the reference solution
[TABLE]
with decreasing .
4.1.2. Porous medium equation
The porous medium equation
[TABLE]
can be considered as the Wasserstein gradient flow of the energy (2), with and . A well-known family of exact solutions is given by Barenblatt profiles (c.f. [60]), which are densities of the form
[TABLE]
In our tests, we choose , and . We plot the evolution of the numerical solution over time in Fig. 2, and we observe good agreement with the exact solution of the form (46), which is shown in dashed curve.
Next, we examine how the entropic regularization affects the solution. In the left plot of Fig. 3, we compare solutions obtained by our scheme with various and we observe that near the boundary of the solutions’ support where a non-smooth transition is expected (see the black dashed curve for the exact solution), our solution with regularization inevitably smooth out the solution. As decreases, the solution improves moderately. On the right, we compare the error between our solution with the exact formula (46):
[TABLE]
As expected, smaller leads to better accuracy. However, as the regularization parameter is closely related to the convexity of the problem and thus affects the convergence of the method, one has to strike a balance between the accuracy and efficiency by choosing neither too big nor too small.
4.1.3. Nonlinear Fokker-Planck equation
Next, we consider a nonlinear variant of the Fokker-Planck equation, by replacing the linear diffusion with the porous medium type nonlinear diffusion (45):
[TABLE]
When is a confining drift potential, all solutions approach the unique steady state
[TABLE]
where depends on the mass of the initial data, i.e., denote , then , see [25, 21] for a derivation.
In Figure 4, we compute the solutions to the nonlinear Fokker-Planck equation with , , and initial data given by . On the left, we plot the evolution of the density towards the steady state . On the right, we compute the rate of decay of the corresponding energy (2) as a function of time, observing exponential decay as the solution approaches equilibrium, which is consistent with the analytic results on convergence to equilibrium [25, 18].
4.1.4. Aggregation equation
In this subsection, we consider a nonlocal aggregation equation of the form
[TABLE]
where the interaction kernel is repulsive at short length scales and attractive at longer distances. In particular, we choose the following kernel with logarithmic repulsion and quadratic attraction
[TABLE]
then it is proved that there exists a unique equilibrium profile [28], given by
[TABLE]
In practice, to avoid evaluation of at , we set to equal the average value of on the cell of width centered at 0, i.e., , where we compute this value analytically. (See also [26, 27] for a similar treatment.)
The numerical results are gathered in Fig. 5. On the left, we simulate the solution to the aggregation equation with Gaussian initial data at varying times, observing convergence to the equilibrium profile . On the right, we compute the rate of the decay of the energy as a function of time, observing exponential decay with the theoretical rate as obtained by Carrillo et. al. [28].
4.1.5. Derrida-Lebowitz-Speer-Spohn (DLSS) equation
We now consider a DLSS equation
[TABLE]
where and is the first variation operator with
[TABLE]
As written, the DLSS equation can be considered as the Wasserstein gradient flow of functional: . In practice, we just replace by and choose in (10).
When , the stationary solution has an explicit form
[TABLE]
With double-Gaussian initial condition
[TABLE]
we plot the results in Fig. 6. On the left, one sees an evolution of towards the equilibrium (51); on the right, an exponential convergence of the energy is demonstrated.
Likewise, for a double-well potential , with the same initial condition, we collect the results in Fig.7. Unlike the previous case, the steady state here has two bumps.
4.2. 2D problem
4.2.1. Aggregation equation
We first consider aggregation equation (49) with attractive-repulsive potentials in two dimensions with interaction kernel
[TABLE]
where . In this case, the repulsion near the origin determines the dimension of the support of the steady state measure, see [2, 17].
In the first example, we choose , , and take the initial data to be a Gaussian
[TABLE]
with mean and variance . Here the steady state concentrates on a Dirac ring with radius 0.5 centered at , recovering analytical results on the existence of a stable Dirac ring equilibrium [8]. We also compare the convergence in the first outer JKO time step of our regularized sequential quadratic programming with the un-regularized primal dual method [27] in Fig. 9, and a much faster convergence in Newton’s method is observed.
In the second example, we consider interaction kernel (52) with different parameters: and and the results are displayed in Fig. 10. We observe that the solution converges to a characteristic function on the disk of radius 1, centered at , recovering analytic results on solutions of the aggregation equation with Newtonian repulsion [38, 9].
4.2.2. Aggregation drift equation
We compute solutions of aggregation-drift equations
[TABLE]
where and . As shown in the analytical results [30, 20], the steady state is a characteristic function on a torus, with inner and outer radius given by , . The initial condition consists of five Gaussians, which is non-radially symmetric. The evolution of towards equilibrium along with the energy decay in time are displayed in Fig. 11.
4.2.3. Aggregation diffusion equation
Consider the aggregation diffusion equations
[TABLE]
When the interaction kernel is attractive, the competition between the nonlocal aggregation and nonlinear diffusion causes solutions to behave differently in various regimes—either finite time blow up or globally exist in time, see the survey [16]. In Fig. 12, we take , , . Computational domain is chosen as , initial data is .
4.2.4. The DLSS model
We close the section by computing a two dimensional DLSS equation
[TABLE]
with and initial condition consisting of four Gaussians. In this case, we do not need a regularization and Hessian is computed exactly. As seen in Fig. 13, the density converges to the equilibrium very rapidly. In the bottom center plot, we also compare slice of the steady state computed via our method with the exact equilibrium, and observe a good match.
5. Discussion
In this paper, we propose a variational time discretization scheme for Wasserstein gradient flows. The scheme applies the quadric approximation of Wasserstein-2 metric and introduces the Fisher information regularization into the iterative regularization. In discrete grids, this regularized term helps the gradient flow path to maintain positivity during the evolution and further improves the convexity of the variational problem.
Acknowledgement: WL was partially supported by AFOSR MURI FA9550-18-1-0502. JL was partially supported by NSF under grant DMS-1454939. LW was partially supported by NSF grant DMS-1903420 and NSF CAREER grant DMS-1846854. The authors are grateful to the support from KI-Net (NSF grant RNMS-1107444) and UMN-Math Visitors Program to facilitate the collaboration.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] R. Bailo, J. A. Carrillo, and J. Hu. Fully discrete positivity-preserving and energy-dissipative schemes for nonlinear nonlocal equations with a gradient flow structure. preprint ar Xiv: , 2018.
- 2[2] D. Balagué, J. A. Carrillo, T. Laurent, and G. Raoul. Dimensionality of local minimizers of the interaction energy. Arch. Ration. Mech. Anal. , 209(3):1055–1088, 2013.
- 3[3] Alethea B. T. Barbaro, José A. Cañizo, José A. Carrillo, and Pierre Degond. Phase transitions in a kinetic flocking model of Cucker-Smale type. Multiscale Model. Simul. , 14(3):1063–1088, 2016.
- 4[4] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences , 2(1):183–202, 2009.
- 5[5] J.-D. Benamou and Y. Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numer. Math. , 84:375–393, 2000.
- 6[6] J.-D. Benamou, G. Carlier, and M. Laborde. An augmented Lagrangian approach to Wasserstein gradient flows and applications. ESAIM: PROCEEDINGS AND SURVEYS , 54:1–17, 2016.
- 7[7] Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the monge-kantorovich mass transfer problem. Numerische Mathematik , 84(3):375–393, Jan 2000.
- 8[8] Andrea L. Bertozzi, Theodore Kolokolnikov, Hui Sun, David Uminsky, and James von Brecht. Ring patterns and their bifurcations in a nonlocal model of biological swarms. Commun. Math. Sci. , 13(4):955–985, 2015.
