On the fundamental solution and a variational formulation of a degenerate diffusion of Kolmogorov type

Manh Hong Duong; Hoang Minh Tran

arXiv:1703.07622·math.AP·May 4, 2018


TL;DR

This paper constructs the fundamental solution for a degenerate Kolmogorov diffusion and introduces a variational scheme for its adjoint, leveraging optimal transport and mean squared derivative costs, with proven convergence.

Contribution

It provides a novel fundamental solution and a variational scheme for the adjoint of a degenerate Kolmogorov diffusion, extending previous methods.

Findings

01

Successfully constructed the fundamental solution.

02

Developed a convergent variational scheme.

03

Extended results to more general degenerate diffusions.

Abstract

In this paper, we construct the fundamental solution to a degenerate diffusion of Kolmogorov type and develop a time-discrete variational scheme for its adjoint equation. The so-called mean squared derivative cost function plays a crucial role occurring in both the fundamental solution and the variational scheme. The latter is implemented by minimizing a free energy functional with respect to the Kantorovich optimal transport cost functional associated with the mean squared derivative cost. We establish the convergence of the scheme to the solution of the adjoint equation generalizing previously known results for the Fokker-Planck equation and the Kramers equation.

Equations

$$\partial_{t}\rho(t,x_{1},\ldots,x_{n})=-\sum_{i=2}^{n}x_{i}\cdot\nabla_{x_{i-1}}\rho+\mathop{\mathrm{div}}\nolimits_{x_{n}}(\nabla V(x_{n})\rho)+\Delta_{x_{n}}\rho$$

$$\partial_{t}f(t,x_{1},\ldots,x_{n})=\sum_{i=2}^{n}x_{i}\cdot\nabla_{x_{i-1}}f+\Delta_{x_{n}}f.$$

$$\partial_{t}\rho=\mathop{\mathrm{div}}\nolimits(\nabla V\rho)+\Delta\rho,$$

$$\partial_{t}\rho=-x_{2}\cdot\nabla_{x_{1}}\rho+\mathop{\mathrm{div}}\nolimits_{x_{2}}(\nabla V(x_{2})\rho)+\Delta_{x_{2}}\rho.$$

$$\partial_{t}f=x_{2}\cdot\nabla_{x_{1}}f+\Delta_{x_{2}}f.$$

$$\Phi_{\rm FP}(t,\textbf{x},\textbf{y})=\frac{1}{(4\pi t)^{\frac{d}{2}}}\exp\Big(-\frac{C^{\rm FP}(\textbf{x},\textbf{y})}{4t}\Big)\quad\text{with}\quad C^{\rm FP}(\textbf{x},\textbf{y})=|\textbf{x}-\textbf{y}|^{2}.$$

$$\Phi_{2}(t,x_{1},y_{1};x_{2},y_{2})=\Big(\frac{\sqrt{3}}{2\pi t^{2}}\Big)^{d}\exp\Big(-\frac{C^{\rm KR}_{t}(\textbf{x},\textbf{y})}{4t}\Big)\quad\text{with}\quad C^{\rm KR}_{t}(\textbf{x},\textbf{y})=|y_{2}-y_{1}|^{2}+12\Big|\frac{x_{2}-x_{1}}{t}-\frac{y_{1}+y_{2}}{2}\Big|^{2}$$

$$W_{2}^{2}(\mu,\nu):=\inf_{\gamma\in\Gamma(\mu,\nu)}\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}|x-y|^{2}\,\gamma(dx,dy).$$

$$\rho^{k}_{h}=\mathop{\mathrm{argmin}}_{\rho\in\mathcal{P}_{2}(\mathbb{R}^{d})}\left\{\frac{1}{2h}W_{2}^{2}(\rho^{k-1}_{h},\rho)+\int_{\mathbb{R}^{d}}\big(V+\log\rho\big)\rho\,d\textbf{x}\right\}.$$

$$\tilde{W}_{h}^{2}(\mu,\nu)=\inf_{\gamma\in\Gamma(\mu,\nu)}\int_{\mathbb{R}^{2d}\times\mathbb{R}^{2d}}C_{h}^{\rm KR}(\textbf{x},\textbf{y})\,\gamma(d\textbf{x},d\textbf{y}),$$

$$C^{\rm FP}(\textbf{x},\textbf{y})=h\min_{\xi}\Big\{\int_{0}^{h}|\dot{\xi}(t)|^{2}\,dt:\xi\in C^{1}([0,h],\mathbb{R}^{d})~\text{such that}~\xi(0)=\textbf{x},~\xi(h)=\textbf{y}\Big\}\quad\text{and}$$

$$C^{\rm KR}_{h}(\textbf{x},\textbf{y})=h\min_{\xi}\Big\{\int_{0}^{h}|\ddot{\xi}(t)|^{2}\,dt:\xi\in C^{2}([0,h],\mathbb{R}^{d})~\text{such that}~(\xi,\dot{\xi})(0)=\textbf{x}=(x_{1},x_{2}),~(\xi,\dot{\xi})(h)=\textbf{y}=(y_{1},y_{2})\Big\}.$$

$$\mathcal{C}_{t}(\textbf{x},\textbf{y}):=t\inf_{\xi}\int_{0}^{t}|\xi^{(n)}(s)|^{2}\,ds,$$

$$(\xi,\dot{\xi},\ldots,\xi^{(n-1)})(0)=(x_{1},x_{2},\ldots,x_{n})\quad\text{and}\quad(\xi,\dot{\xi},\ldots,\xi^{(n-1)})(t)=(y_{1},y_{2},\ldots,y_{n}).$$

$$\Phi(t,\textbf{x},\textbf{y}):=\frac{\beta_{d}}{t^{\frac{n^{2}d}{2}}}\exp\Big(-\frac{\mathcal{C}_{t}(\textbf{x},\textbf{y})}{4t}\Big),$$

$$\lim_{t\to 0}\Phi(t,\textbf{x},\textbf{y})=\delta_{\textbf{x}=\textbf{y}}.$$

$$\mathcal{W}_{h}(\mu,\nu)=\inf_{\gamma\in\Gamma(\mu,\nu)}\int_{\mathbb{R}^{dn}\times\mathbb{R}^{dn}}\mathcal{C}_{h}(\textbf{x},\textbf{y})\,\gamma(d\textbf{x}\,d\textbf{y}).$$

$$\min_{\rho\in\mathcal{P}_{2}(\mathbb{R}^{dn})}\frac{1}{2h}\mathcal{W}_{h}(\rho_{k-1}^{h},\rho)+\int_{\mathbb{R}^{dn}}\big(V(x_{n})+\log\rho\big)\rho\,d\textbf{x}.$$

$$\int_{0}^{\infty}\int_{\mathbb{R}^{dn}}\Big[\partial_{t}\varphi+\sum_{i=2}^{n}x_{i}\cdot\nabla_{x_{i-1}}\varphi-\nabla_{x_{n}}V(x_{n})\cdot\nabla_{x_{n}}\varphi+\Delta_{x_{n}}\varphi\Big]\rho(t,\textbf{x})\,d\textbf{x}\,dt=-\int_{\mathbb{R}^{dn}}\varphi(0,\textbf{x})\rho_{0}(\textbf{x})\,d\textbf{x}\quad\text{for all}~\varphi\in C_{c}^{\infty}(\mathbb{R}\times\mathbb{R}^{dn})$$

$$V\in C^{2}(\mathbb{R}^{d}),\quad V(x)\geq 0\quad\text{for all}~x\in\mathbb{R}^{d},$$

$$|\nabla V(x_{1})-\nabla V(x_{2})|\leq C|x_{1}-x_{2}|.$$

$$\rho^{h}(t)=\rho_{k}^{h}\quad\text{for}~(k-1)h<t\leq kh.$$

$$\rho^{h}\rightharpoonup\rho\quad\text{weakly in}~L^{1}((0,T)\times\mathbb{R}^{dn})~\text{as}~h\to 0,$$

$$\rho^{h}(t)\to\rho(t)\quad\text{weakly in}~L^{1}(\mathbb{R}^{dn})~\text{as}~h\to 0~\text{for any}~t>0,$$

$$\rho(t)\to\rho_{0}\quad\text{in}~L^{1}(\mathbb{R}^{dn}).$$

$$\mathcal{C}_{t}(\textbf{x},\textbf{y})=t^{2-2n}\,[\textbf{b}(t,\textbf{x},\textbf{y})]^{T}M\,\textbf{b}(t,\textbf{x},\textbf{y})$$

$$\textbf{b}(t,\textbf{x},\textbf{y})=\begin{pmatrix}y_{1}-x_{1}-\frac{t}{1!}x_{2}-\ldots-\frac{t^{n-1}}{(n-1)!}x_{n}\\ \vdots\\ t^{i-1}\Big(y_{i}-\sum_{j=i}^{n}\frac{t^{j-i}}{(j-i)!}x_{j}\Big)\\ \vdots\\ t^{n-1}(y_{n}-x_{n})\end{pmatrix}$$

$$A=\begin{pmatrix}1&\ldots&1\\ \binom{n}{1}&\ldots&\binom{2n-1}{1}\\ \vdots&\vdots&\vdots\\ k!\binom{n}{k}&\ldots&k!\binom{2n-1}{k}\\ \vdots&\vdots&\vdots\\ (n-1)!\binom{n}{n-1}&\ldots&(n-1)!\binom{2n-1}{n-1}\end{pmatrix}\quad\text{and}\quad B_{ki}=\begin{cases}(-1)^{n-k}\frac{(n+i-1)!}{(k+i-n-1)!}&\text{if}~k+i\geq n+1,\\ 0&\text{if}~k+i<n+1.\end{cases}$$

$$U_{ij}=\begin{cases}\frac{(j-1)!}{(j-i)!}&\text{if}~j\geq i,\\ 0&\text{otherwise},\end{cases}\quad\text{and}\quad L_{kj}=\begin{cases}\binom{k-1}{j-1}\frac{n!}{(n-k+j)!}&\text{if}~j\leq k,\\ 0&\text{otherwise}.\end{cases}$$


Full text

On the fundamental solution and a variational formulation for a degenerate diffusion of Kolmogorov type

Manh Hong Duong (corresponding author)

Department of Mathematics, Imperial College London, London SW7 2AZ, UK. Email: [email protected].

Hoang Minh Tran

Data Analytics Department, Esmart Systems, Håkon Melbergs vei 16, 1783 Halden, Norway. Email: [email protected]

Abstract

In this paper, we construct the fundamental solution to a degenerate diffusion of Kolmogorov type and develop a time-discrete variational scheme for its adjoint equation. The so-called mean squared derivative cost function plays a crucial role, occurring in both the fundamental solution and the variational scheme. The latter is implemented by minimizing a free energy functional with respect to the Kantorovich optimal transport cost functional associated with the mean squared derivative cost. We establish the convergence of the scheme to the solution of the adjoint equation, generalizing previously known results for the Fokker-Planck equation and the Kramers equation.

2010 Mathematics Subject Classification. Primary: 49S05; Secondary: 49J40, 35Q84.

Key words and phrases. Generalized gradient flow structure, variational methods, hypo-elliptic PDEs, optimal transport.

1 Introduction

In this paper, we are interested in the following partial differential equation

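$$\partial_{t}\rho(t,x_{1},\ldots,x_{n})=-\sum_{i=2}^{n}x_{i}\cdot\nabla_{x_{i-1}}\rho+\mathop{\mathrm{div}}\nolimits_{x_{n}}(\nabla V(x_{n})\rho)+\Delta_{x_{n}}\rho\tag{1}$$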

and its adjoint equation when $V\equiv 0$

$$\partial_{t}f(t,x_{1},\ldots,x_{n})=\sum_{i=2}^{n}x_{i}\cdot\nabla_{x_{i-1}}f+\Delta_{x_{n}}f.\tag{2}$$

In the above equations, the unknowns are $\rho=\rho(t,x_{1},\ldots,x_{n})$ and $f=f(t,x_{1},\ldots,x_{n})$, where $t>0$ and $\textbf{x}=(x_{1},\ldots,x_{n})\in\mathbb{R}^{dn}$ are the time and spatial co-ordinates, respectively. The notations $\nabla,\mathop{\mathrm{div}}\nolimits$ and $\Delta$ denote the gradient, the divergence and the Laplacian operators, respectively. The subscripts on these operators indicate that they act only on the corresponding variables. The two equations are complemented with initial data. The initial data for (1) is a probability measure, $\rho_{0}$, on $\mathbb{R}^{dn}$. Since the right-hand side of (1) is in divergence form (note that the summation term can be written as $-\sum_{i=2}^{n}\mathop{\mathrm{div}}\nolimits_{x_{i-1}}(x_{i}\rho)$), $\rho(t)$ is also a probability measure on $\mathbb{R}^{dn}$ for each $t>0$. Since we will be interested in the fundamental solution of (2), the initial data for (2) is a Dirac measure $\delta_{\textbf{y}}$ for $\textbf{y}\in\mathbb{R}^{dn}$. Two special cases of (1), corresponding to $n=1$ and $n=2$ respectively, are the Fokker-Planck equation

$$\partial_{t}\rho=\mathop{\mathrm{div}}\nolimits(\nabla V\rho)+\Delta\rho,\tag{3}$$

and the Kramers equation

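$$\partial_{t}\rho=-x_{2}\cdot\nabla_{x_{1}}\rho+\mathop{\mathrm{div}}\nolimits_{x_{2}}(\nabla V(x_{2})\rho)+\Delta_{x_{2}}\rho.\tag{4}$$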

Their corresponding adjoint equations with $V\equiv 0$ are, respectively, the diffusion equation $\partial_{t}f=\Delta f$ and the ultra-parabolic equation

$$\partial_{t}f=x_{2}\cdot\nabla_{x_{1}}f+\Delta_{x_{2}}f.\tag{5}$$

Equations (1) and (2), particularly the Fokker-Planck equation and the Kramers equation, play an important role in statistical mechanics [Ris84], reaction-rate theory [HTB90] and mathematical finance [LPP02, Pas05]. For instance, in the context of statistical mechanics, (1) has been used to describe the time evolution of the probability density function of a many-particle system. Equation (1) with $n>2$ can also be viewed as a simplified model of a finite Markovian approximation for the generalised Langevin dynamics [OP11, Duo15] or as a model of a harmonic chain of oscillators that arises in the context of non-equilibrium statistical mechanics [EH00, BL08, DM10].

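Equations (1) and (2) belong to a class of degenerate diffusions of Kolmogorov type, where the Laplacian acts only on some of the variables. Two main issues that have attracted considerable research interest for this class are (i) constructing the fundamental solution of (2) and (ii) developing a time-discrete variational scheme for (1). To motivate our work, let us discuss the relevant literature on these issues for the Fokker-Planck equation and the Kramers equation. Regarding the first issue, it is a classical result that the diffusion equation $\partial_{t}f=\Delta f$ has the fundamental solution

$$\Phi_{\rm FP}(t,\textbf{x},\textbf{y})=\frac{1}{(4\pi t)^{\frac{d}{2}}}\exp\Big(-\frac{C^{\rm FP}(\textbf{x},\textbf{y})}{4t}\Big)\quad\text{with}\quad C^{\rm FP}(\textbf{x},\textbf{y})=|\textbf{x}-\textbf{y}|^{2}.$$

In the seminal paper [Kol34], Kolmogorov showed that

$$\Phi_{2}(t,x_{1},y_{1};x_{2},y_{2})=\Big(\frac{\sqrt{3}}{2\pi t^{2}}\Big)^{d}\exp\Big(-\frac{C^{\rm KR}_{t}(\textbf{x},\textbf{y})}{4t}\Big)\quad\text{with}\quad C^{\rm KR}_{t}(\textbf{x},\textbf{y})=|y_{2}-y_{1}|^{2}+12\Big|\frac{x_{2}-x_{1}}{t}-\frac{y_{1}+y_{2}}{2}\Big|^{2}$$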

is the fundamental solution to the ultra-parabolic equation (5). It is this equation that was Hörmander’s starting point to develop the hypo-elliptic theory [Hör67], which has become a powerful tool in the theory of partial differential equations, see e.g. [Bra14] for discussions. Since Kolmogorov’s paper there has been a considerable amount of work on extending his result to other hypoelliptic equations including (2), see e.g., [Web51, Kup72, Pol94, FP05, DM10, IB15]. We refer to the mentioned papers and references therein for more information on this direction.

Regarding the second issue, the functions $C^{\rm FP}$ and $C^{\rm KR}$ also play a crucial role in the more recent development of time-discrete variational schemes for the Fokker-Planck equation and the Kramers equation, respectively. In the seminal paper [JKO98], Jordan-Kinderlehrer-Otto prove a remarkable result that the Fokker-Planck equation (3) can be seen as a gradient flow of the free energy, which is the sum of the potential energy $\int V\rho$ and the Boltzmann entropy $\int\rho\log\rho$ , with respect to the Wasserstein distance on the space of probability measures with finite second moments. Let $\mu,\nu\in\mathcal{P}_{2}(X)$ be two probability measures on some Euclidean space $X$ having finite second moments. Throughout this paper, $\Gamma(\mu,\nu)$ denotes the set of all probability measures on $X\times X$ having $\mu$ and $\nu$ as first and second marginals. The Wasserstein distance $W_{2}(\mu,\nu)$ between them is defined via

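$$W_{2}^{2}(\mu,\nu):=\inf_{\gamma\in\Gamma(\mu,\nu)}\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}|x-y|^{2}\,\gamma(dx,dy).$$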

The main ingredient in [JKO98] is the following variational approximation scheme, which is now known as the JKO-scheme,

JKO-scheme: Let $h>0$ be a time-step. Define $\rho^{0}_{h}:=\rho_{0}$. Then, for each $k=1,2,\ldots$, $\rho^{k}_{h}$ is determined as

$$\rho^{k}_{h}=\mathop{\mathrm{argmin}}_{\rho\in\mathcal{P}_{2}(\mathbb{R}^{d})}\left\{\frac{1}{2h}W_{2}^{2}(\rho^{k-1}_{h},\rho)+\int_{\mathbb{R}^{d}}\big(V+\log\rho\big)\rho\,d\textbf{x}\right\}.\tag{9}$$

The main result in [JKO98] then states that, after an appropriate interpolation, the sequence $\{\rho^{k}_{h}\}$ constructed from the JKO-scheme converges to the solution of the Fokker-Planck equation (3). This result has sparked a large body of research over the last two decades in the field of partial differential equations, linking the field to other branches of mathematics such as optimal transport theory, geometric measure theory and functional inequalities; see the monographs [Vil03, AGS08, Vil09] for expositions of these developments.
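To make the JKO-scheme concrete, the following is a minimal sketch under simplifying assumptions that are not part of the paper: $d=1$, $V(x)=x^{2}/2$, and the minimization is restricted to Gaussian densities $N(m,s^{2})$. For one-dimensional Gaussians $W_{2}^{2}=(m-m')^{2}+(s-s')^{2}$, and the free energy equals $(m^{2}+s^{2})/2-\log s$ up to an additive constant, so each step has a closed form. The function names are hypothetical.

```python
import math

def jko_gaussian_step(m_prev, s_prev, h):
    """One JKO step for V(x) = x^2/2 in d = 1, minimizing over Gaussians N(m, s^2) only.

    Setting the derivatives of (1/(2h)) * W_2^2 + free energy with respect to
    m and s to zero gives the two closed-form updates below.
    """
    m = m_prev / (1.0 + h)
    # Positive root of (1 + h) * s^2 - s_prev * s - h = 0.
    s = (s_prev + math.sqrt(s_prev**2 + 4.0 * h * (1.0 + h))) / (2.0 * (1.0 + h))
    return m, s

def run_jko(m0, s0, h, T):
    """Iterate the restricted JKO step up to time T."""
    m, s = m0, s0
    for _ in range(round(T / h)):
        m, s = jko_gaussian_step(m, s, h)
    return m, s

def exact_fokker_planck(m0, s0, T):
    """Mean and standard deviation of the exact solution of (3) with V(x) = x^2/2."""
    m = m0 * math.exp(-T)
    var = 1.0 + (s0**2 - 1.0) * math.exp(-2.0 * T)
    return m, math.sqrt(var)

m_h, s_h = run_jko(2.0, 0.5, h=1e-3, T=1.0)
m_ex, s_ex = exact_fokker_planck(2.0, 0.5, T=1.0)
print(abs(m_h - m_ex), abs(s_h - s_ex))
```

In this restricted setting each step is an implicit Euler step for the mean and standard deviation of the Ornstein-Uhlenbeck flow, which is one way to see why the discrepancy shrinks as $h\to 0$.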

Inspired by the JKO-scheme, Huang [Hua00] and then Duong et al. [DPZ14] have established different approximation schemes for the Kramers equation (4). The challenge here is that the techniques in [JKO98] cannot be directly applied, as the Kramers equation is neither a gradient flow nor a Hamiltonian flow, even though these schemes are of the same form as in (9). The free energy functional is the same, but instead of the Wasserstein distance, the following Monge-Kantorovich optimal transport cost has been used in these papers

$$\tilde{W}_{h}^{2}(\mu,\nu)=\inf_{\gamma\in\Gamma(\mu,\nu)}\int_{\mathbb{R}^{2d}\times\mathbb{R}^{2d}}C_{h}^{\rm KR}(\textbf{x},\textbf{y})\,\gamma(d\textbf{x},d\textbf{y}),$$

where $C^{\rm KR}_{h}(\textbf{x},\textbf{y})$ is defined in (1). The cost functional $\tilde{W}_{h}$ has also been used to construct time-discrete variational schemes for other evolution equations such as the system of isentropic Euler equations [GW09, Wes10] and the compressible Euler equations [CSW14].

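The above discussion reveals a similarity between the Fokker-Planck equation and the Kramers equation, namely the role of the cost functions $C^{\rm FP}$ and $C_{t}^{\rm KR}$: they appear both in the discrete-time variational scheme and in the fundamental solution. In addition, these cost functions are the minimal values of the velocity and acceleration integrals, respectively:

$$C^{\rm FP}(\textbf{x},\textbf{y})=h\min_{\xi}\Big\{\int_{0}^{h}|\dot{\xi}(t)|^{2}\,dt:\xi\in C^{1}([0,h],\mathbb{R}^{d})~\text{such that}~\xi(0)=\textbf{x},~\xi(h)=\textbf{y}\Big\}\quad\text{and}$$

$$C^{\rm KR}_{h}(\textbf{x},\textbf{y})=h\min_{\xi}\Big\{\int_{0}^{h}|\ddot{\xi}(t)|^{2}\,dt:\xi\in C^{2}([0,h],\mathbb{R}^{d})~\text{such that}~(\xi,\dot{\xi})(0)=\textbf{x}=(x_{1},x_{2}),~(\xi,\dot{\xi})(h)=\textbf{y}=(y_{1},y_{2})\Big\}.$$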
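The acceleration characterisation can be checked by hand for $d=1$. The sketch below assumes the standard fact that the minimizer of $\int_{0}^{h}|\ddot{\xi}|^{2}\,dt$ under position and velocity boundary conditions is the cubic polynomial matching them (its Euler-Lagrange equation is $\xi''''=0$). It compares $h\int_{0}^{h}|\ddot{\xi}|^{2}\,dt$ for that cubic with the closed form $|y_{2}-x_{2}|^{2}+12\,|(y_{1}-x_{1})/h-(x_{2}+y_{2})/2|^{2}$, written in the convention $\textbf{x}=(x_{1},x_{2})$, $\textbf{y}=(y_{1},y_{2})$ for the initial and final (position, velocity) pairs. The function names are hypothetical.

```python
from fractions import Fraction

def cubic_acceleration_cost(x1, x2, y1, y2, h):
    """h * int_0^h |xi''(s)|^2 ds for the cubic with (xi, xi')(0) = (x1, x2), (xi, xi')(h) = (y1, y2)."""
    p = y1 - x1 - x2 * h   # displacement left after the free flight x1 + x2 * s
    q = y2 - x2            # required change of velocity
    # xi(s) = x1 + x2*s + a*s^2 + b*s^3; a and b solve the two end conditions at s = h.
    a = (3 * p - q * h) / h**2
    b = (q * h - 2 * p) / h**3
    # xi''(s) = 2a + 6b*s, integrated exactly over [0, h].
    integral = 4 * a**2 * h + 12 * a * b * h**2 + 12 * b**2 * h**3
    return h * integral

def kramers_cost(x1, x2, y1, y2, h):
    """Closed form of the Kramers cost in the (position, velocity) convention."""
    return (y2 - x2) ** 2 + 12 * ((y1 - x1) / h - (x2 + y2) / 2) ** 2

args = [Fraction(v) for v in ("1/3", "-2", "5/4", "7/2")]
step = Fraction(1, 10)
print(cubic_acceleration_cost(*args, step) == kramers_cost(*args, step))
```

Exact rational arithmetic is used so that the two expressions can be compared with `==` rather than up to rounding.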

Recently in [DT16] we have studied the minimization problem

$$\mathcal{C}_{t}(\textbf{x},\textbf{y}):=t\inf_{\xi}\int_{0}^{t}|\xi^{(n)}(s)|^{2}\,ds,\tag{11}$$

where $\mathbf{x}=(x_{1},\ldots,x_{n})\in\mathbb{R}^{dn}$, $\mathbf{y}=(y_{1},\ldots,y_{n})\in\mathbb{R}^{dn}$ and the infimum is taken over all curves $\xi\in C^{n}([0,t],\mathbb{R}^{d})$ that satisfy the boundary conditions

$$(\xi,\dot{\xi},\ldots,\xi^{(n-1)})(0)=(x_{1},x_{2},\ldots,x_{n})\quad\text{and}\quad(\xi,\dot{\xi},\ldots,\xi^{(n-1)})(t)=(y_{1},y_{2},\ldots,y_{n}).$$

The optimal value $\mathcal{C}_{t}(\textbf{x},\textbf{y})$ is called the mean squared derivative cost function and has been found to be useful in the modelling and design of various real-world systems, such as motor control, biometrics, online signatures and robotics; see [DT16] for further discussion.

Inspired by the role of $C^{\rm FP}$ and $C_{t}^{\rm KR}$ for the Fokker-Planck equation and the Kramers equation as discussed above, it is natural to ask

(Q1) Is the function

$$\Phi(t,\textbf{x},\textbf{y}):=\frac{\beta_{d}}{t^{\frac{n^{2}d}{2}}}\exp\Big(-\frac{\mathcal{C}_{t}(\textbf{x},\textbf{y})}{4t}\Big),\tag{13}$$

where $\beta_{d}$ is the normalising constant, the fundamental solution to Equation (2)?

(Q2) Can Equation (1) be approximated by a discrete-time variational scheme in the spirit of the JKO-scheme where $C^{\rm FP}(\textbf{x},\textbf{y})$ is replaced by $\mathcal{C}_{h}(\textbf{x},\textbf{y})$?

The aim of the present paper is to provide affirmative answers to these questions. We now describe our main results.

1.1 Main results of the present paper

Our first main result is the following theorem about the fundamental solution to (2).

Theorem 1.1.

The function $\Phi(t,\textbf{x},\textbf{y})$ defined in (13) is the fundamental solution to (2). That is, for each $\textbf{y}$, $\Phi(t,\textbf{x},\textbf{y})$, as a function of $t$ and $\textbf{x}$, satisfies (2). In addition,

$$\lim_{t\to 0}\Phi(t,\textbf{x},\textbf{y})=\delta_{\textbf{x}=\textbf{y}}.\tag{14}$$

This theorem, together with the explicit formula for the mean squared derivative cost function $\mathcal{C}_{t}(\textbf{x},\textbf{y})$ in [DT16, Theorem 1.2] (see also Theorem 2.1 below), provides an explicit formula for the fundamental solution $\Phi(t,\textbf{x},\textbf{y})$.
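As a numerical sanity check of the statement for $n=2$, $d=1$ (Kolmogorov's case), the sketch below evaluates a central-difference approximation of the residual $\partial_{t}\Phi-x_{2}\,\partial_{x_{1}}\Phi-\partial^{2}_{x_{2}}\Phi$ of the ultra-parabolic equation (5). It is an illustration, not part of the proof. It assumes the constant $\beta=\sqrt{3}/(2\pi)$ from Kolmogorov's formula and writes the cost in the convention $\textbf{x}=(x_{1},x_{2})$, $\textbf{y}=(y_{1},y_{2})$. The function names, the step size `eps` and the sample point are arbitrary choices.

```python
import math

def cost(t, x1, x2, y1, y2):
    """Mean squared derivative cost for n = 2, d = 1."""
    return (y2 - x2) ** 2 + 12.0 * ((y1 - x1) / t - (x2 + y2) / 2.0) ** 2

def phi(t, x1, x2, y1, y2):
    """Candidate fundamental solution: beta / t^(n^2 d / 2) * exp(-C_t / (4t)) with n = 2, d = 1."""
    beta = math.sqrt(3.0) / (2.0 * math.pi)
    return beta / t**2 * math.exp(-cost(t, x1, x2, y1, y2) / (4.0 * t))

def residual(t, x1, x2, y1, y2, eps=1e-3):
    """Central-difference estimate of d_t Phi - x2 * d_{x1} Phi - d^2_{x2} Phi."""
    d_t = (phi(t + eps, x1, x2, y1, y2) - phi(t - eps, x1, x2, y1, y2)) / (2.0 * eps)
    d_x1 = (phi(t, x1 + eps, x2, y1, y2) - phi(t, x1 - eps, x2, y1, y2)) / (2.0 * eps)
    d_x2x2 = (phi(t, x1, x2 + eps, y1, y2) - 2.0 * phi(t, x1, x2, y1, y2)
              + phi(t, x1, x2 - eps, y1, y2)) / eps**2
    return d_t - x2 * d_x1 - d_x2x2

res = residual(1.0, 0.1, 0.2, 0.3, -0.1)
print(res)  # expected to be close to zero, up to discretization error
```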

Our second main result is concerned with a variational formulation of (1). We first introduce the approximation scheme in the spirit of the JKO-scheme. Let $h>0$ be given and let $\mathcal{C}_{h}(\textbf{x},\textbf{y})$ be the mean squared derivative cost function defined in (11). Let $\mu$ and $\nu$ be two probability measures on $\mathbb{R}^{dn}$ having finite second moments. The Monge-Kantorovich optimal transport cost $\mathcal{W}_{h}(\mu,\nu)$ between $\mu$ and $\nu$ is defined by

$$\mathcal{W}_{h}(\mu,\nu)=\inf_{\gamma\in\Gamma(\mu,\nu)}\int_{\mathbb{R}^{dn}\times\mathbb{R}^{dn}}\mathcal{C}_{h}(\textbf{x},\textbf{y})\,\gamma(d\textbf{x}\,d\textbf{y}).$$

The variational approximation scheme of this paper is constructed as follows.

Scheme 1.2.

Let $\rho_{0}^{h}:=\rho_{0}$ . For $k\geq 1$ , define $\rho_{k}^{h}$ as the solution of the minimization problem

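$$\min_{\rho\in\mathcal{P}_{2}(\mathbb{R}^{dn})}\frac{1}{2h}\mathcal{W}_{h}(\rho_{k-1}^{h},\rho)+\int_{\mathbb{R}^{dn}}\big(V(x_{n})+\log\rho\big)\rho\,d\textbf{x}.\tag{16}$$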

Next we introduce the concept of a weak solution of (1). A function $\rho\in L^{1}(\mathbb{R}^{+}\times\mathbb{R}^{dn})$ is called a weak solution of equation (1) with initial datum $\rho_{0}\in\mathcal{P}_{2}(\mathbb{R}^{dn})$ if it satisfies the following weak formulation of (1):

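$$\int_{0}^{\infty}\int_{\mathbb{R}^{dn}}\Big[\partial_{t}\varphi+\sum_{i=2}^{n}x_{i}\cdot\nabla_{x_{i-1}}\varphi-\nabla_{x_{n}}V(x_{n})\cdot\nabla_{x_{n}}\varphi+\Delta_{x_{n}}\varphi\Big]\rho(t,\textbf{x})\,d\textbf{x}\,dt=-\int_{\mathbb{R}^{dn}}\varphi(0,\textbf{x})\rho_{0}(\textbf{x})\,d\textbf{x}\quad\text{for all}~\varphi\in C_{c}^{\infty}(\mathbb{R}\times\mathbb{R}^{dn}).$$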

Throughout the paper we make the following assumptions.

Assumption 1.3.

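$$V\in C^{2}(\mathbb{R}^{d}),\quad V(x)\geq 0\quad\text{for all}~x\in\mathbb{R}^{d},$$

and there exists a constant $C>0$ such that for all $x_{1},x_{2}\in\mathbb{R}^{d}$

$$|\nabla V(x_{1})-\nabla V(x_{2})|\leq C|x_{1}-x_{2}|.$$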

The second main result of the paper is about the convergence of the approximation scheme to a weak solution of (1).

Theorem 1.4.

Suppose that $V$ satisfies Assumption 1.3. Let $\rho_{0}\in\mathcal{P}_{2}(\mathbb{R}^{dn})$ satisfy $\int_{\mathbb{R}^{dn}}(V(x_{n})+\log\rho_{0})\,\rho_{0}\,d\textbf{x}<\infty$. For any $h>0$ sufficiently small, let $\rho_{k}^{h}$ be the sequence of solutions of (16). For any $t\geq 0$, define the piecewise-constant time interpolation

$$\rho^{h}(t)=\rho_{k}^{h}\quad\text{for}~(k-1)h<t\leq kh.$$

Then for any $T>0$ ,

$$\rho^{h}\rightharpoonup\rho\quad\text{weakly in}~L^{1}((0,T)\times\mathbb{R}^{dn})~\text{as}~h\to 0,$$

where $\rho$ is the unique weak solution of Equation (1) with initial value $\rho_{0}$ . Moreover

$$\rho^{h}(t)\to\rho(t)\quad\text{weakly in}~L^{1}(\mathbb{R}^{dn})~\text{as}~h\to 0~\text{for any}~t>0,$$

and as $t\to 0$ ,

$$\rho(t)\to\rho_{0}\quad\text{in}~L^{1}(\mathbb{R}^{dn}).$$

1.2 Comparison to related work

Our work is twofold: to construct the fundamental solution to equation (2) (Theorem 1.1) and to develop a time-discrete variational scheme for equation (1) which is the adjoint equation of (2) with an additional external force field (Theorem 1.4). The mean squared derivative cost function (11) plays a central role appearing both in the fundamental solution and in the approximation scheme. We now give further comments on these issues.

On the fundamental solution of (2). The fundamental solution to (2) is not new. As mentioned above, it has been obtained in [Web51, Hör67, Kup72, Pol94, FP05, DM10, IB15]. In these papers, the fundamental solution is constructed using various methods, such as the parametrix method and the Fourier transform. In this paper, we provide a direct verification based on elementary combinatorial and linear algebra techniques. We use explicit formulas for the mean squared derivative cost function that we obtained in our recent work [DT16]. Our method is close to that of [Kup72]. However, that work does not represent the fundamental solution of (2) in terms of the mean squared derivative cost function. The reference [DM10] provides an implicit representation via the controllability property of a differential system, but it does not address the variational formulation of (1).

Variational formulation for equation (1). Theorem 1.4 is a generalisation of the main results of [JKO98, Hua00, DPZ14] to an arbitrary $n$. Even though the standard procedure in these papers can be used to prove the theorem, two additional difficulties arise. The first is how to select an appropriate perturbation flow to derive the Euler-Lagrange equation for the sequence $\rho_{k}^{h}$. The second is to prove the convergence of the scheme to a solution of (1), which amounts to showing that the error terms vanish as $h\to 0$. Both difficulties can be resolved by using the explicit formulas for the mean squared derivative cost function derived in our previous work [DT16]. We also note that [Hua00] and [DPZ14] considered the full Kramers equation (i.e., with $n=2$) with an external force field $-\nabla_{x_{1}}U(x_{1})$, where $U=U(x_{1})$ is a given potential. In this case, the right-hand side of (1) has an additional term $\mathop{\mathrm{div}}\nolimits_{x_{2}}(\nabla_{x_{1}}U\rho)$. [Hua00] dealt with this term by adding $\frac{1}{h}\int U(x_{1})\rho\,d\textbf{x}$ to the free-energy functional, resulting in an unusual scaling. In contrast, this term is encoded in the cost function in [DPZ14]. In the present work, for an arbitrary number of variables $n$, we assume that $V$ depends only on the last co-ordinate $x_{n}$. Due to this assumption, Equation (1) resembles the Fokker-Planck equation in the last co-ordinate, and the cost function $\mathcal{C}_{t}$ depends only on the $n$-th order derivative of $\xi$, giving rise to a controllable formula. It is not clear to us at the moment how to adapt [Hua00, DPZ14] to deal with a more general case where $V$ depends on more than the last co-ordinate or where the co-ordinates are coupled in a more complex way, as in [EH00, DM10, OP11]. We leave this issue for future research.

Microscopic interpretation of the variational scheme. The main results in recent papers [ADPZ11, DLR13, DPZ13, EMR15] show that Scheme 1.2 with $n=1$ and $n=2$ for the Fokker-Planck equation and the Kramers equation, respectively, can be derived from large-deviation principles of empirical measures associated to underlying stochastic processes. These results provide, among other things, microscopic interpretations for Scheme 1.2 in these cases. We expect that these results can be extended to the general case $n$ , but we will not follow this direction in this paper.

1.3 Organization of the paper

The rest of the paper is organised as follows. In Section 2, we summarize the main properties of the mean squared derivative cost function from [DT16]. In Section 3 and Section 4 we prove Theorem 1.1 and Theorem 1.4, respectively. Finally, Appendix 5 contains proofs of technical lemmas.

2 The mean squared derivative cost function

In this section, we collect relevant results on the mean squared derivative cost function in [DT16].

Theorem 2.1.

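[DT16, Theorem 1.2]

The mean squared derivative cost function $\mathcal{C}_{t}(\textbf{x},\textbf{y})$ is given by

$$\mathcal{C}_{t}(\textbf{x},\textbf{y})=t^{2-2n}\,[\textbf{b}(t,\textbf{x},\textbf{y})]^{T}M\,\textbf{b}(t,\textbf{x},\textbf{y})$$

where

$$\textbf{b}(t,\textbf{x},\textbf{y})=\begin{pmatrix}y_{1}-x_{1}-\frac{t}{1!}x_{2}-\ldots-\frac{t^{n-1}}{(n-1)!}x_{n}\\ \vdots\\ t^{i-1}\Big(y_{i}-\sum_{j=i}^{n}\frac{t^{j-i}}{(j-i)!}x_{j}\Big)\\ \vdots\\ t^{n-1}(y_{n}-x_{n})\end{pmatrix}$$

and $M=BA^{-1}$ with

$$A=\begin{pmatrix}1&\ldots&1\\ \binom{n}{1}&\ldots&\binom{2n-1}{1}\\ \vdots&\vdots&\vdots\\ k!\binom{n}{k}&\ldots&k!\binom{2n-1}{k}\\ \vdots&\vdots&\vdots\\ (n-1)!\binom{n}{n-1}&\ldots&(n-1)!\binom{2n-1}{n-1}\end{pmatrix}\quad\text{and}\quad B_{ki}=\begin{cases}(-1)^{n-k}\frac{(n+i-1)!}{(k+i-n-1)!}&\text{if}~k+i\geq n+1,\\ 0&\text{if}~k+i<n+1.\end{cases}$$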
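For small $n$ the formula can be evaluated exactly. The sketch below builds $A$ and $B$ from the entries stated in the theorem (scalar case $d=1$), forms $M=BA^{-1}$ with rational arithmetic, and evaluates $t^{2-2n}\,\textbf{b}^{T}M\,\textbf{b}$. For $n=1$ it should reduce to $|y-x|^{2}$, and for $n=2$ to the Kramers cost written in the convention $\textbf{x}=(x_{1},x_{2})$, $\textbf{y}=(y_{1},y_{2})$. The helper names are hypothetical and the Gauss-Jordan inversion is only one convenient way to obtain $A^{-1}$.

```python
from fractions import Fraction
from math import comb, factorial

def matrix_A(n):
    # Row k = 0..n-1, column j = 1..n: entry k! * binom(n + j - 1, k).
    return [[Fraction(factorial(k) * comb(n + j - 1, k)) for j in range(1, n + 1)]
            for k in range(n)]

def matrix_B(n):
    # 1-based indices k, i: the entry is nonzero only when k + i >= n + 1.
    return [[Fraction((-1) ** (n - k) * factorial(n + i - 1), factorial(k + i - n - 1))
             if k + i >= n + 1 else Fraction(0)
             for i in range(1, n + 1)] for k in range(1, n + 1)]

def inverse(mat):
    """Gauss-Jordan inverse over exact rationals."""
    n = len(mat)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def cost_matrix(n):
    """M = B A^{-1}."""
    B, A_inv = matrix_B(n), inverse(matrix_A(n))
    return [[sum(B[r][m] * A_inv[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]

def cost(n, t, x, y):
    """C_t(x, y) = t^(2 - 2n) * b^T M b for d = 1 (0-based lists x and y of length n)."""
    M = cost_matrix(n)
    b = [t ** i * (y[i] - sum(t ** (j - i) / Fraction(factorial(j - i)) * x[j]
                              for j in range(i, n))) for i in range(n)]
    quad = sum(b[i] * M[i][j] * b[j] for i in range(n) for j in range(n))
    return t ** (2 - 2 * n) * quad

t = Fraction(1, 4)
x = [Fraction(1, 3), Fraction(-2)]
y = [Fraction(5, 4), Fraction(7, 2)]
kramers = (y[1] - x[1]) ** 2 + 12 * ((y[0] - x[0]) / t - (x[1] + y[1]) / 2) ** 2
print(cost_matrix(2), cost(2, t, x, y) == kramers)
```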

Moreover, the matrix $A$ and its inverse have explicit $LU$ -decompositions given in the following theorem.

Theorem 2.2.

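[DT16, Theorem 1.3]

(1) $A=LU$, where $U$ and $L$ are defined as follows

$$U_{ij}=\begin{cases}\frac{(j-1)!}{(j-i)!}&\text{if}~j\geq i,\\ 0&\text{otherwise},\end{cases}\quad\text{and}\quad L_{kj}=\begin{cases}\binom{k-1}{j-1}\frac{n!}{(n-k+j)!}&\text{if}~j\leq k,\\ 0&\text{otherwise}.\end{cases}$$

(2) The inverse of $A$ is given by the product of the following two matrices: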

[TABLE]

Note that throughout this paper, all the matrices $A,B,M,L$ and $U$ are of order $dn$. Each entry of these matrices should be understood as a $d\times d$ matrix equal to the entry multiplied by the $d$-dimensional identity matrix $I_{d}$. For instance, $A_{ij}$, for $1\leq i,j\leq n$, should be understood as the matrix $A_{ij}I_{d}$. Matrix multiplication is carried out as multiplication of block matrices.
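The factorisation $A=LU$ can be checked directly for small $n$ in the scalar case $d=1$. The sketch below builds the three matrices from the entry formulas of Theorems 2.1 and 2.2 with integer arithmetic and compares the product $LU$ with $A$. The function names are hypothetical.

```python
from math import comb, factorial

def matrix_A(n):
    # Row k = 0..n-1, column j = 1..n: entry k! * binom(n + j - 1, k).
    return [[factorial(k) * comb(n + j - 1, k) for j in range(1, n + 1)]
            for k in range(n)]

def matrix_U(n):
    # 1-based indices: U_ij = (j-1)! / (j-i)! if j >= i, else 0 (upper triangular).
    return [[factorial(j - 1) // factorial(j - i) if j >= i else 0
             for j in range(1, n + 1)] for i in range(1, n + 1)]

def matrix_L(n):
    # 1-based indices: L_kj = binom(k-1, j-1) * n! / (n-k+j)! if j <= k, else 0.
    return [[comb(k - 1, j - 1) * (factorial(n) // factorial(n - k + j)) if j <= k else 0
             for j in range(1, n + 1)] for k in range(1, n + 1)]

def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

for n in range(1, 7):
    print(n, matmul(matrix_L(n), matrix_U(n)) == matrix_A(n))
```

For $n=2$ this gives $L=\begin{pmatrix}1&0\\2&1\end{pmatrix}$, $U=\begin{pmatrix}1&1\\0&1\end{pmatrix}$ and $A=\begin{pmatrix}1&1\\2&3\end{pmatrix}$.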

In the subsequent sections, we also need the following property of the cost function $\mathcal{C}_{t}(\textbf{x},\textbf{y})$.

Lemma 2.3.

There exists a constant $K>0$ independent of $t$ such that

[TABLE]

Proof.

It follows from the formula for $\mathcal{C}_{t}(\textbf{x};\textbf{y})$ that only the symmetric part $M_{s}$ of $M$ contributes to $\mathcal{C}_{t}$ . We have

[TABLE]

where

[TABLE]

By the definition of b, we have

[TABLE]

Therefore

[TABLE]

where $\bar{\mathcal{T}}=\mathrm{diag}(t^{n-1},\ldots,1)$ , which implies that $\textbf{y}-\textbf{x}=\bar{\mathcal{T}}Q^{-1}\,\mathbf{z}+\mathbf{w}$ .

By the Cauchy-Schwarz inequality, for $t$ sufficiently small, we have $\|\bar{\mathcal{T}}\|\leq K$ and $\|\mathbf{w}\|^{2}\leq Kt^{2}\|\textbf{x}\|^{2}$ for some constant $K>0$. Here and below, $K$ denotes a universal constant that may change from line to line. Therefore, we get

[TABLE]

This finishes the proof of this lemma. ∎

3 The fundamental solution of the adjoint equation (2)

In this section we prove Theorem 1.1. The proof consists of two main steps which are Proposition 3.1 and Proposition 3.2 below.

Proposition 3.1.

Let $\Phi(t,\textbf{x},\textbf{y})$ be defined as in (13). Then it is a solution of (2) if and only if $\mathcal{C}_{t}(\textbf{x},\textbf{y})$ satisfies the following equation

[TABLE]

Proof.

Let $\alpha(n,d)=\frac{n^{2}d}{2}$. We compute each term in (2) from the representation of $\Phi$ in (13). For simplicity of notation, in the following computations we write $\mathcal{C}:=\mathcal{C}_{t}(\textbf{x},\textbf{y})$.

First, we calculate the time-derivative of $\Phi$ .

[TABLE]

Next, we calculate $\sum_{i=2}^{n}x_{i}\cdot\nabla_{x_{i-1}}\Phi$ . We have

[TABLE]

from which by taking the sum over $i$ from $2$ to $n$ , we get

[TABLE]

The Laplacian, $\Delta_{x_{n}}\Phi$ , with respect to variable $x_{n}$ is computed analogously

[TABLE]

It follows from (36), (37) and (38) that $\Phi$ is a solution of (2) if and only if

[TABLE]

The above equality is equivalent to

[TABLE]

By re-arranging the terms in the above equality and recalling that $\alpha(n,d)=\frac{n^{2}d}{2}$ , we obtain (35). This finishes the proof of the proposition. ∎

The following matrices will play an important role in the rest of the paper

[TABLE]

Note that $H_{1}(t),H_{2}(t),Q,D\in\mathbb{R}^{dn\times dn}$. Each entry of these matrices should be understood as a matrix of order $d$ equal to the entry multiplied by the $d$-dimensional identity matrix.

Proposition 3.2.

The following assertions hold

(1) $T_{1}$ is anti-symmetric.

(2) $T_{2}=0$.

(3) $T_{3}$ is anti-symmetric.

(4) $\mathrm{Tr}(DH_{2}^{T}MH_{2})=n^{2}\,d\,t^{2(n-1)}$.

Proof.

The assertions of the proposition are proved using combinatorial techniques; the proofs are given in Appendix 5. ∎
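Assertion (4) can be checked exactly for small $n$ in the scalar case $d=1$. The sketch below rests on an assumption that is not stated in this excerpt: that $H_{2}$ is the matrix multiplying $\textbf{x}$ in $\textbf{b}$, so that $(H_{2})_{ij}=t^{j-1}/(j-i)!$ for $j\geq i$ and zero otherwise. Since $D=\mathrm{diag}(0,\ldots,0,1)$, the trace reduces to the $(n,n)$ entry of $H_{2}^{T}MH_{2}$, that is, $c^{T}Mc$ with $c$ the last column of $H_{2}$. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def inverse(mat):
    """Gauss-Jordan inverse over exact rationals."""
    n = len(mat)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def cost_matrix(n):
    """M = B A^{-1} with A and B as in Theorem 2.1 (scalar case d = 1)."""
    A = [[Fraction(factorial(k) * comb(n + j - 1, k)) for j in range(1, n + 1)]
         for k in range(n)]
    B = [[Fraction((-1) ** (n - k) * factorial(n + i - 1), factorial(k + i - n - 1))
          if k + i >= n + 1 else Fraction(0)
          for i in range(1, n + 1)] for k in range(1, n + 1)]
    A_inv = inverse(A)
    return [[sum(B[r][m] * A_inv[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]

def trace_term(n, t):
    """Tr(D H_2^T M H_2) for d = 1, i.e. the (n, n) entry of H_2^T M H_2."""
    M = cost_matrix(n)
    # ASSUMPTION: last column of H_2 is (H_2)_{in} = t^(n-1) / (n-i)! for 1-based i.
    c = [t ** (n - 1) / Fraction(factorial(n - i)) for i in range(1, n + 1)]
    return sum(c[i] * M[i][j] * c[j] for i in range(n) for j in range(n))

t = Fraction(3, 2)
for n in (1, 2, 3):
    print(n, trace_term(n, t) == n * n * t ** (2 * (n - 1)))
```

For general $d$ every scalar entry is multiplied by $I_{d}$, which contributes the extra factor $d$ in the trace.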

The following lemma is elementary but will be used several times in the sequel. We include it for the sake of completeness and reference.

Lemma 3.3.

Suppose that $A\in\mathbb{R}^{N\times N}$ is an anti-symmetric matrix. Then for all $x\in\mathbb{R}^{N}$ we have

$$x^{T}Ax=0.$$

Proof.

Since $x^{T}Ax$ is a scalar, it equals its own transpose, so $x^{T}Ax=x^{T}A^{T}x$. Hence $x^{T}Ax=\frac{1}{2}x^{T}(A+A^{T})x=0$, because $A+A^{T}=0$. ∎
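A quick integer example of the lemma, with an arbitrarily chosen anti-symmetric matrix and vector (both hypothetical, for illustration only):

```python
def quadratic_form(A, x):
    """Return x^T A x for a square matrix A given as a list of rows."""
    return sum(x[i] * A[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))

# A is anti-symmetric: A[i][j] == -A[j][i] and the diagonal is zero.
A = [[0, 2, -5],
     [-2, 0, 7],
     [5, -7, 0]]
x = [3, -1, 4]
value = quadratic_form(A, x)
print(value)  # the off-diagonal contributions cancel in pairs
```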

We are now ready to prove Theorem 1.1.

Proof of Theorem 1.1.

To prove Theorem 1.1, by Proposition 3.1 it is sufficient to prove (35). According to Theorem 2.1 we have

[TABLE]

where

[TABLE]

where $H_{1}$ and $H_{2}$ are given in (39). Therefore, we have

[TABLE]

Next we verify the condition in Proposition 3.1, that is, we show that the function $\mathcal{C}$ satisfies Equation (35).

We compute the time-derivative of $\mathcal{C}$ first.

[TABLE]

From (44), we have

[TABLE]

Using the matrices $Q$ and $D$ defined in (40), $\sum_{i=2}^{n}x_{i}\cdot\nabla_{x_{i-1}}\mathcal{C}$ and $\nabla_{x_{n}}\mathcal{C}$ can be computed as follows

[TABLE]

and similarly

[TABLE]

Therefore, we get

[TABLE]

where to obtain the second equality we have used the fact that $D^{T}D=\mathrm{diag}(0,\ldots,0,1)=D$ .

The Laplacian $\Delta_{x_{n}}\mathcal{C}$ is then computed via the Trace operator.

[TABLE]

We now verify that

[TABLE]

Substituting the computations from (3) to (48), we need to verify that

[TABLE]

or equivalently, using (41)–(43), we need to verify that

[TABLE]

According to Proposition 3.2 we have

(i) $T_{2}=0$ and $2t\,\mathrm{Tr}(DH_{2}^{T}MH_{2})=2t\,dn^{2}t^{2(n-1)}=2dn^{2}t^{2n-1}$,

(ii) $T_{1}$ and $T_{3}$ are anti-symmetric. By Lemma 3.3, we have $\textbf{y}^{T}T_{1}\textbf{y}=0=\textbf{x}^{T}T_{3}\textbf{x}$.

Therefore, using (i)–(ii) above, we obtain

[TABLE]

which is equal to the left-hand side of (3) as required.

Finally, the initial condition (14) follows from the representation of $\mathcal{C}$ in Theorem 2.1 and the formula of b in (24). ∎

4 The variational formulation of Equation (1)

4.1 Well-posedness of Scheme 1.2 and the Euler-Lagrange equation

In this section we prove the well-posedness of Scheme 1.2 and establish the Euler-Lagrange equations for the sequence of its minimizers.

Under Assumption 1.3 the free energy functional

[TABLE]

is well-defined in $\mathcal{P}_{2}(\mathbb{R}^{dn})$. The following two lemmas show that Scheme 1.2 is well-defined. Their proofs are now classical (see e.g. [Vil03, Theorem 1.3], [JKO98, Proposition 4.1] and [Hua00, Lemma 4.2]), so we omit them here.

Lemma 4.1.

Let $\rho_{0},\rho\in\mathcal{P}_{2}(\mathbb{R}^{dn})$ be given. There exists a unique optimal plan $P_{\mathrm{opt}}\in\Gamma(\rho_{0},\rho)$ such that

[TABLE]

Lemma 4.2.

Let $\rho_{0}\in\mathcal{P}_{2}(\mathbb{R}^{dn})$ be given. If $h$ is small enough, then the minimization problem

[TABLE]

has a unique solution.

Next we establish the Euler-Lagrange equation for the sequence of minimizers of Scheme 1.2. We will need two auxiliary lemmas whose proofs are presented in Appendix 5.

Lemma 4.3.

$H_{2}^{-1}H_{1}=H$ where

[TABLE]

In particular, $H_{ii}=1$, $H_{i,i+1}=-h$ and $H_{ij}=o(h^{2})$ for $j\geq i+2$. Note that $H\in\mathbb{R}^{dn\times dn}$ where $H_{ij}$ should be understood as $H_{ij}I_{d}$.

Lemma 4.4.

Let $\mathcal{K}=h^{2n-2}(H_{2}^{T}MH_{1})^{-1}$ . Then

[TABLE]

In particular, $\mathcal{K}_{nn}=1$ and $\mathcal{K}_{ij}=o(h)$ for all $(i,j)\neq(n,n)$ . Note also that $\mathcal{K}\in\mathbb{R}^{dn\times dn}$ where $\mathcal{K}_{ij}$ should be understood as $\mathcal{K}_{ij}I_{d}$ .

Having these two lemmas, we are now ready to derive the Euler-Lagrange equation for the sequence of minimizers in Scheme 1.2.

Lemma 4.5 (Euler-Lagrange equation for the sequence of minimizers).

Let $\{\rho_{k}^{h}\}_{k\geq 1}$ be the sequence of the minimizers of Scheme 1.2. Then we have

[TABLE]

where $P_{k}^{h}$ is the optimal plan in $\mathcal{W}_{h}(\rho_{k-1}^{h},\rho_{k}^{h})$ .

Proof.

Let $\overline{\mu}\in\mathcal{P}_{2}(\mathbb{R}^{dn})$ be given and let $\mu$ be the unique solution of the minimization problem

[TABLE]

We will show that

[TABLE]

where $P_{\mathrm{opt}}$ is the optimal plan in $\mathcal{W}_{h}(\overline{\mu},\mu)$ .

Although establishing the Euler-Lagrange equation for the minimizer $\mu$ has become a well-established route, see e.g., [JKO98] and [Hua00, DPZ13] for the Fokker-Planck equation ($n=1$) and the Kramers equation ($n=2$) respectively, there is one additional difficulty: how to select an appropriate perturbation flow from which the Euler-Lagrange equation is deduced. We first define a perturbation of $\mu$ by a push-forward under an appropriate flow. Let $\phi_{1},\ldots,\phi_{n}\in C_{0}^{\infty}(\mathbb{R}^{dn},\mathbb{R}^{d})$. We define the flows $\Phi^{1},\ldots,\Phi^{n}\colon[0,\infty)\times\mathbb{R}^{dn}\rightarrow\mathbb{R}^{d}$ such that

[TABLE]

Let $\mu_{s}(\textbf{x})$ be the push forward of $\mu(\textbf{x})$ under the flow $(\Phi^{1}_{s},\ldots,\Phi^{n}_{s})$ , i.e., for any $\varphi\in C_{0}^{\infty}(\mathbb{R}^{dn},\mathbb{R})$ we have

[TABLE]

Since $(\Phi^{1}_{0},\ldots,\Phi^{n}_{0})=\textbf{x}$ , we have $\mu_{0}(\textbf{x})=\mu(\textbf{x})$ . Taking derivatives with respect to $s$ of both sides gives

[TABLE]
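The differentiation of a push-forward at $s=0$ can be illustrated on a discrete measure in one dimension. The sketch below makes several simplifying assumptions: the flow is replaced by its first-order expansion $x\mapsto x+s\,\phi(x)$, which has the same derivative at $s=0$ as a flow generated by $\phi$; the test function and the field are smooth but not compactly supported; and all names are hypothetical. It compares a central finite difference of $s\mapsto\int\varphi\,d\mu_{s}$ with $\int\varphi^{\prime}\phi\,d\mu$.

```python
import math


def pushforward_integral(points, weights, test_fn, field, s):
    """Integrate test_fn against the push-forward of a discrete measure.

    The flow is replaced by its first-order expansion x -> x + s * field(x),
    which has the same derivative at s = 0 as the exact flow.
    """
    return sum(w * test_fn(x + s * field(x)) for x, w in zip(points, weights))


def first_variation(points, weights, test_grad, field):
    """Integral of test_fn'(x) * field(x) against the original measure."""
    return sum(w * test_grad(x) * field(x) for x, w in zip(points, weights))


points = [-1.0, -0.3, 0.4, 1.2]   # atoms of the illustrative measure
weights = [0.1, 0.4, 0.3, 0.2]    # weights summing to one
test_fn, test_grad = math.sin, math.cos  # hypothetical test function and derivative


def field(x):
    """Hypothetical perturbation field."""
    return math.cos(2.0 * x)


eps = 1e-5
finite_diff = (
    pushforward_integral(points, weights, test_fn, field, eps)
    - pushforward_integral(points, weights, test_fn, field, -eps)
) / (2 * eps)
exact = first_variation(points, weights, test_grad, field)
print(abs(finite_diff - exact))  # small: central-difference error only
```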

We then compute, using (14) and (15), the stationarity condition for $\mu$ following the calculations in [JKO98, Hua00, DPZ13]

[TABLE]

According to (44) we have

[TABLE]

Therefore,

[TABLE]

Let $\varphi\in C_{0}^{\infty}(\mathbb{R}^{dn},\mathbb{R})$ . We choose $\phi_{1},\ldots,\phi_{n}$ such that

[TABLE]

where $\mathcal{K}$ is given in Lemma 4.4, which implies that $h^{2-2n}\mathcal{K}^{T}(H_{1}^{T}MH_{2})=I$.

Using Lemmas 4.3 and 4.4, we compute

[TABLE]

(ii) $\phi_{n}(\textbf{x})=\sum_{j=1}^{n}\mathcal{K}_{nj}\nabla_{x_{j}}\varphi(\textbf{x})=\nabla_{x_{n}}\varphi(\textbf{x})+o(h)$ .

(iii) $\sum_{i=1}^{n}\mathop{\mathrm{div}}\nolimits_{x_{i}}\phi_{i}(\textbf{x})=\mathop{\mathrm{div}}\nolimits_{\textbf{x}}[\mathcal{K}\nabla\varphi(\textbf{x})]=\sum_{i,j}\mathcal{K}_{ij}\partial^{2}_{x_{i}x_{j}}\varphi=\Delta_{x_{n}}\varphi(\textbf{x})+o(h)$ .

Substituting these calculations back into (58) we obtain

[TABLE]

which is the desired equality (55).

Applying (55) for the minimizers $\{\rho_{k}^{h}\}_{k\geq 1}$ of Scheme 1.2 yields the statement of Lemma 4.5. ∎

4.2 A priori estimates

In this section, we derive a priori estimates for the sequence of the minimizers of Scheme 1.2. The proofs of Lemma 4.6, Lemma 4.7 and Lemma 4.8 below are now standard, see e.g. [JKO98, Hua00, DPZ13], hence we omit them.

The following lemma provides an upper bound for the sum $\sum_{k=1}^{n}\mathcal{W}_{h}(\rho_{k-1}^{h},\rho_{k}^{h})$ . From now on, we denote by $M_{2}(\rho)$ the second moment of a probability measure $\rho$ , i.e., $M_{2}(\rho)=\int|\textbf{x}|^{2}\,d\rho$ .

Lemma 4.6.

Let $\{\rho_{k}^{h}\}_{k\geq 1}$ be the sequence of the minimizers of Scheme 1.2 for fixed $h>0$ . Then for any positive integer $n$ and sufficiently small $h$ , we have

[TABLE]

for some constant $C>0$ independent of $n$ .

The next lemma shows boundedness of the second moment $M_{2}(\rho_{k}^{h})$ and the entropy $S(\rho_{k}^{h})$ locally in time.

Lemma 4.7.

There exist positive constants $T_{0}$ , $h_{0}$ , and $C$ , independent of the initial data, such that for any $0<h\leq h_{0}$ , the solutions $\{\rho_{k}^{h}\}_{k\geq 1}$ for Scheme 1.2 satisfy

[TABLE]

where $K_{0}=\lceil{T_{0}}/{h}\rceil$ .

The last lemma of this section extends Lemma 4.7 to any final time $T>0$ .

Lemma 4.8.

Let $\{\rho_{k}^{h}\}_{k\geq 1}$ be the sequence of the minimizers of Scheme 1.2 for fixed $h>0$ . For any $T>0$ , there exists a constant $C>0$ depending on $T$ and on the initial data such that

[TABLE]

for any $h\leq h_{0}$ and $k\leq K_{h}$ , where $K_{h}=\left\lceil\frac{T}{h}\right\rceil$ .

4.3 Proof of Theorem 1.4

Having established the Euler-Lagrange equation and a priori estimates in the previous sections, we now prove Theorem 1.4. The proof is similar to that of [JKO98, Hua00, DPZ13]. Therefore, we only present the part that is different, namely the convergence of the discrete Euler-Lagrange equations to the weak formulation (17) of Equation (1) as $h\to 0$. The key point is to link the Euler-Lagrange equation for the sequence of minimizers obtained in Lemma 4.5 to a time-discretization of Equation (1).

Let $T>0$ be a given final time. For each $h>0$ we set $K_{h}\colonequals\lceil T/h\rceil$ . Let $(\rho_{k}^{h})_{k\geq 1}$ be the sequence of minimizers of Scheme 1.2 and let $t\mapsto\rho^{h}(t)$ be the piecewise-constant interpolation (20). By Lemma 4.8 we have

[TABLE]
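The bookkeeping behind the time discretization can be sketched as follows. The number of steps $K_{h}=\lceil T/h\rceil$ is taken from the text; the convention that the interpolation equals $\rho_{k}^{h}$ on the interval $((k-1)h,kh]$ is an assumption made for this sketch, and the function names and string labels are hypothetical placeholders for the measures.

```python
import math
from fractions import Fraction


def number_of_steps(final_time, h):
    """K_h = ceil(T / h), the number of time steps needed to reach T."""
    return math.ceil(Fraction(final_time) / Fraction(h))


def step_index(t, h):
    """Index k with (k-1)h < t <= kh, and k = 0 at t = 0 (assumed convention)."""
    return max(0, math.ceil(Fraction(t) / Fraction(h)))


def interpolate(minimizers, t, h):
    """Piecewise-constant interpolation: the minimizer active at time t."""
    return minimizers[step_index(t, h)]


h = Fraction(3, 10)
labels = ["rho_0", "rho_1", "rho_2", "rho_3", "rho_4"]  # placeholders

print(number_of_steps(1, h))                       # steps needed for T = 1
print(interpolate(labels, Fraction(3, 10), h))     # right endpoint of step 1
print(interpolate(labels, Fraction(31, 100), h))   # just inside step 2
```

Exact rational arithmetic is used so that the endpoints $t=kh$ are assigned to the intended step without floating-point rounding.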

Since the function $z\mapsto\max\{z\log z,0\}$ has super-linear growth, (66) guarantees that there exists a subsequence, denoted again by $\rho^{h}$ , and a function $\rho\in L^{1}((0,T)\times\mathbb{R}^{dn})$ such that

[TABLE]

We now prove that the limit $\rho$ satisfies the weak formulation (17). Let $\varphi\in C_{c}^{\infty}((-\infty,T)\times\mathbb{R}^{dn})$ be given. All constants $C$ below depend on the parameters of the problem, on the initial datum $\rho_{0}$ , and on $\varphi$ , but are independent of $k$ and of $h$ .

Let $P_{k}^{h}\in\Gamma(\rho_{k-1}^{h},\rho_{k}^{h})$ be the optimal plan for $\mathcal{W}_{h}(\rho_{k-1}^{h},\rho_{k}^{h})$ . For any $0<t<T$ , we have

[TABLE]

where the error term $\varepsilon_{k}$ comes from the Taylor expansion of $\varphi$ and can be estimated by

[TABLE]

Multiplying (68) with $\frac{1}{h}$ and combining with (54) we get

[TABLE]

where

[TABLE]

It is worth noting that $\theta_{k}$ depends on $t$ through the $t$-dependence of $\varphi$. Integrating (70) with respect to $t$ from $(k-1)h$ to $kh$, we obtain

[TABLE]

Summing this relation from $k=1$ to $K_{h}$ gives

[TABLE]

where

[TABLE]

By a discrete integration by parts, the left hand side of (72) can be written as

[TABLE]
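Discrete integration by parts is the summation-by-parts identity $\sum_{k=1}^{K}(a_{k}-a_{k-1})b_{k}=a_{K}b_{K}-a_{0}b_{1}-\sum_{k=1}^{K-1}a_{k}(b_{k+1}-b_{k})$, which moves the difference from one sequence to the other. The following minimal sketch checks this generic identity in exact arithmetic. The sequences are arbitrary illustrations; how they correspond to the terms of (72) is not asserted here.

```python
from fractions import Fraction


def lhs(a, b):
    """Sum over k = 1..K of (a_k - a_{k-1}) * b_k."""
    K = len(a) - 1
    return sum((a[k] - a[k - 1]) * b[k] for k in range(1, K + 1))


def rhs(a, b):
    """Boundary terms minus the sum over k = 1..K-1 of a_k * (b_{k+1} - b_k)."""
    K = len(a) - 1
    interior = sum(a[k] * (b[k + 1] - b[k]) for k in range(1, K))
    return a[K] * b[K] - a[0] * b[1] - interior


# Arbitrary illustrative sequences indexed by k = 0..5 (b[0] is unused).
a = [Fraction(k * k, 7) for k in range(6)]
b = [Fraction(1, k + 2) for k in range(6)]

print(lhs(a, b) == rhs(a, b))
```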

From (72) and (74) we obtain

[TABLE]

Next we show that $R_{h}\rightarrow 0$ as $h\to 0$ . Indeed, we have

[TABLE]

Taking the limit $h\rightarrow 0$ in (75) yields equation (17) proving (21).

The proof of the stronger convergence (22) and of the continuity (23) at $t=0$ follows from the equi-near-continuity estimate, see [JKO98, Hua00, DPZ13]

[TABLE]

where $W_{2}(\rho_{0},\rho_{1})$ is the Wasserstein distance between $\rho_{0}$ and $\rho_{1}$ defined in (8). This estimate follows from the inequality (see (34))

[TABLE]

and the estimates (66) and (64).

5 Appendix

This appendix contains proofs of technical lemmas in the previous sections.

5.1 Proof of Proposition 3.2

In this appendix, we prove Proposition 3.2. For convenience, we recall the relevant matrices here:

[TABLE]

We need several auxiliary lemmas. The first one is an explicit formula for the inverse of the matrix $B$ defined in (32).

Lemma 5.1.

Recall the matrix $B$ in (32)

[TABLE]

Then $B^{-1}$ has the following form

[TABLE]

Proof.

Define the matrix $\tilde{B}$ by

[TABLE]

We now show that $B\tilde{B}=I$. We consider three cases.

If $k<j$ , then

[TABLE]

The last equality is [math] since by the definition of $\tilde{B}$ we have $\tilde{B}_{ij}=0$ for every $i\geq n+1-k>n+1-j$.

If $k=j$, then similarly to the case above, we get

[TABLE]

If $k>j$ , then since $B_{ki}=0$ for $i<n+1-k$ and $\tilde{B}_{ij}=0$ for $i>n+1-j$ , we get

[TABLE]

From these three cases, we conclude that $B\tilde{B}=I$, therefore $B^{-1}=\tilde{B}$, which concludes the proof of the lemma. ∎

The next two lemmas show relations between the matrices $H_{1},H_{2},Q$ and $D$ .

Lemma 5.2.

It holds that

[TABLE]

Proof.

For three matrices $X,\,Y,\,Z$ , we have

[TABLE]

In particular, applying this identity for $X=H_{2},Y=D$ and $Z=H_{2}^{T}$ , we get

[TABLE]

where we used the definition of $D$ and $H_{2}$ to obtain $(*)$ and $(**)$ respectively. This finishes the proof of the lemma. ∎
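The role of $D=\mathrm{diag}(0,\ldots,0,1)$ in a triple product can be illustrated independently of the specific form of $H_{2}$: for any square matrix $X$, the entrywise identity $(XDX^{T})_{ij}=X_{in}X_{jn}$ holds, so $XDX^{T}$ is the outer product of the last column of $X$ with itself. The sketch below checks this for an arbitrary integer matrix, which is a stand-in chosen for illustration and not the matrix $H_{2}$.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def selector(n):
    """D = diag(0, ..., 0, 1) of order n."""
    return [[1 if i == j == n - 1 else 0 for j in range(n)] for i in range(n)]


x = [[2, -1, 3], [0, 4, 5], [7, 1, -2]]  # arbitrary test matrix, not H_2
n = len(x)

product = matmul(matmul(x, selector(n)), transpose(x))
outer = [[x[i][n - 1] * x[j][n - 1] for j in range(n)] for i in range(n)]

print(product == outer)
```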

Lemma 5.3.

It holds that

[TABLE]

Proof.

Let $T_{23}:=[-2t(H_{2}^{T})^{-1}(H_{2}^{\prime})^{T}+2t(H_{2}^{T})^{-1}QH_{2}^{T}]$ . Then

[TABLE]

Note that $(H_{2}^{T})_{ik}^{-1}(H_{2}^{\prime})_{kj}^{T}\neq 0$ iff $i\geq k,k\geq j$ and $(H_{2}^{T})_{i,k+1}^{-1}(H_{2}^{T})_{kj}\neq 0$ iff $i\geq k+1,k\geq j$ . We transform further $(T_{23})_{ij}$ .

If $j>i$, then $(H_{2}^{T})_{ik}^{-1}(H_{2}^{\prime})_{kj}^{T}=(H_{2}^{T})_{i,k+1}^{-1}(H_{2}^{T})_{kj}=0$. This gives $(T_{23})_{ij}=0$.

If $j<i$ , then we have

[TABLE]

Consider the function $g(x)=(x-1)^{i-j}$. On the one hand, since $g^{\prime}(x)=(i-j)(x-1)^{i-j-1}$, it follows that $g^{\prime}(1)=0$. On the other hand, we have

[TABLE]

which implies that

[TABLE]

In addition, we have

[TABLE]

Summing up (82) and (83) yields $(I)=0$ . The term $(II)$ is equal to

[TABLE]

Hence if $j<i$, then $(T_{23})_{ij}=0$.

Finally, if $i=j$ , then we have

[TABLE]

From these three cases we obtain that

[TABLE]

As a result, we have

[TABLE]

which is (81). This finishes the proof of the lemma. ∎
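The device used in the proof above, evaluating $g(x)=(x-1)^{m}$ and its derivative at $x=1$ after a binomial expansion, yields alternating binomial sums. The following minimal sketch, with $m$ playing the role of $i-j$, evaluates both sums directly. It is an independent numerical check of the generic identities and does not reproduce the specific sums (82) and (83).

```python
from math import comb


def g_at_one(m):
    """g(1) for g(x) = (x - 1)^m, expanded with the binomial theorem."""
    return sum(comb(m, k) * (-1) ** (m - k) for k in range(m + 1))


def g_prime_at_one(m):
    """g'(1), obtained by differentiating the expansion term by term."""
    return sum(k * comb(m, k) * (-1) ** (m - k) for k in range(m + 1))


for m in range(1, 8):
    print(m, g_at_one(m), g_prime_at_one(m))
```

The first sum vanishes for every $m\geq 1$. The second vanishes for $m\geq 2$, while for $m=1$ it equals $1$, since then $g^{\prime}\equiv 1$.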

The following lemma is a purely combinatorial identity. We expect that this identity is not new, but we include a proof here for the sake of completeness.

Lemma 5.4.

It holds that

[TABLE]

Proof.

We prove (89) by induction. Obviously, (89) is correct for $k=0$. Assume that (89) is valid for some value of $k$.

We will prove (89) is also valid for $k+1$ . We have

[TABLE]

where in the last equality we have used the fact that $\binom{n-1}{k}+\binom{n-1}{k+1}=\binom{n}{k+1}$. This is (89) for $k+1$. Therefore by the induction principle, (89) is proved. ∎
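The induction step relies on Pascal's rule. As a quick sanity check, the following sketch verifies the rule over a small range of indices; the function name `pascal_holds` is a hypothetical helper, and the check concerns Pascal's rule only, not the full identity (89).

```python
from math import comb


def pascal_holds(n, k):
    """Check C(n-1, k) + C(n-1, k+1) == C(n, k+1) for one pair (n, k)."""
    return comb(n - 1, k) + comb(n - 1, k + 1) == comb(n, k + 1)


print(all(pascal_holds(n, k) for n in range(1, 12) for k in range(0, n)))
```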

Lemma 5.5.

For all $j\leq k$ and $k-j\leq n$ , we have

[TABLE]

Proof.

The identity (90) has been used and proved in our previous paper [DT16, Proof of Theorem 4.1]. For the convenience of the reader we provide a proof here. Consider the function $f(x)=x^{n}(1-x)^{j-1}$ . On the one hand, we have

[TABLE]

Therefore,

[TABLE]

It follows that

[TABLE]

and by multiplying both sides of this equality with $(-1)^{j}$ , we get

[TABLE]

On the other hand, according to Leibniz formula, we have

[TABLE]

In particular, we get

[TABLE]

The identity (90) then follows from (91) and (92). ∎
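The Leibniz formula used in this proof can be checked on the function $f(x)=x^{n}(1-x)^{j-1}$ with exact polynomial arithmetic. The sketch below represents polynomials as coefficient lists and compares the direct derivative of the product with the Leibniz sum. The values of $n$, $j$ and the derivative order are illustrative choices, and the helper names are hypothetical.

```python
from math import comb


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_diff(p, order=1):
    """Derivative of the given order; the zero polynomial is [0]."""
    for _ in range(order):
        p = [k * c for k, c in enumerate(p)][1:] or [0]
    return p


def poly_add(p, q):
    """Sum of two polynomials, padding the shorter one with zeros."""
    size = max(len(p), len(q))
    p = p + [0] * (size - len(p))
    q = q + [0] * (size - len(q))
    return [a + b for a, b in zip(p, q)]


def leibniz(u, v, order):
    """Sum over i of C(order, i) * u^(i) * v^(order - i)."""
    total = [0]
    for i in range(order + 1):
        term = poly_mul(poly_diff(u, i), poly_diff(v, order - i))
        total = poly_add(total, [comb(order, i) * c for c in term])
    return total


n, j, order = 4, 3, 3                                # illustrative values
u = [0] * n + [1]                                    # x^n
v = [comb(j - 1, k) * (-1) ** k for k in range(j)]   # (1 - x)^(j-1)

direct = poly_diff(poly_mul(u, v), order)
difference = poly_add(direct, [-c for c in leibniz(u, v, order)])
print(all(c == 0 for c in difference))
```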

Having these lemmas, we are ready to prove Proposition 3.2 in the following subsections.

5.1.1 Proof of Proposition 3.2–(1): $T_{1}$ anti-symmetric

In this section, we prove that $T_{1}$ is anti-symmetric where

[TABLE]

We have

[TABLE]

Therefore, we get

[TABLE]

where we have used Lemma 5.2 to obtain $(*)$ . Let

[TABLE]

To prove $T_{1}$ is anti-symmetric, it suffices to prove that $T_{11}$ is anti-symmetric. Let $C_{ij}$ be the element $(i,j)$ of $T_{11}$ . Then $C_{ij}$ can be computed as follows.

[TABLE]

where to obtain the last equality, we have changed variable $l:=n+1-k-j$ .

Applying (89) for $n=p+q+1$ and $k=q$ we obtain

[TABLE]

Now applying (100) for $p=n-i$ and $q=n-j$ we get

[TABLE]

From (101) and (94) we achieve

[TABLE]

which implies that $C_{ij}=-C_{ji}$. Therefore, $T_{11}$ is anti-symmetric and so is $T_{1}$.

5.1.2 Proof of Proposition 3.2–(2): $T_{2}=0$

In this section, we prove that $T_{2}=0$ where

[TABLE]

According to Lemma 5.3, we have

[TABLE]

implying that

[TABLE]

Therefore,

[TABLE]

Since $tH_{1}^{\prime}H_{1}^{-1}=\mathrm{diag}(0,...,n-1)$ , we have

[TABLE]
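The identity $tH_{1}^{\prime}H_{1}^{-1}=\mathrm{diag}(0,\ldots,n-1)$ can be checked for one family of diagonal matrices that is consistent with it, namely $H_{1}(t)=\mathrm{diag}(1,t,\ldots,t^{n-1})$. This form of $H_{1}$ is an assumption made for the sketch, not a restatement of definition (39), and the function names are hypothetical.

```python
from fractions import Fraction


def scaling_diagonal(n, t):
    """Diagonal of the assumed H_1(t) = diag(1, t, ..., t^(n-1))."""
    return [t ** i for i in range(n)]


def scaling_diagonal_derivative(n, t):
    """Diagonal of d/dt H_1(t) under the same assumption."""
    return [i * t ** (i - 1) if i > 0 else Fraction(0) for i in range(n)]


def log_derivative_diagonal(n, t):
    """Diagonal of t * H_1'(t) * H_1(t)^(-1); diagonal matrices multiply entrywise."""
    h1 = scaling_diagonal(n, t)
    dh1 = scaling_diagonal_derivative(n, t)
    return [t * d / h for d, h in zip(dh1, h1)]


print(log_derivative_diagonal(5, Fraction(3, 2)))
```

The result does not depend on the value of $t\neq 0$, because $t\cdot(i-1)t^{i-2}/t^{i-1}=i-1$ on the $i$-th diagonal entry.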

Therefore, we get

[TABLE]

We will prove that $T_{22}=0$ where

[TABLE]

Note that the last equality is because $M^{-1}=AB^{-1}$ . We compute each element of $T_{22}$ as follows.

[TABLE]

Therefore $T_{22}=0$ . It follows that $T_{2}=0$ .

5.1.3 Proof of Proposition 3.2–(3): $T_{3}$ anti-symmetric

In this section, we prove that $T_{3}$ is anti-symmetric where

[TABLE]

According to Lemma 5.2 $H_{2}DH_{2}^{T}=H_{0}$ ; therefore we have

[TABLE]

We define

[TABLE]

To prove $T_{3}$ is anti-symmetric, it suffices to show that $T_{31}$ is anti-symmetric.

From (94) we have:

[TABLE]

which implies that (using the fact that $(H_{0})_{ij}=\frac{1}{(n-i)!(n-j)!}$)

[TABLE]

Using (102) and Lemma 5.3, we obtain

[TABLE]

Therefore, $T_{31}$ is anti-symmetric and so is $T_{3}$ .

5.1.4 Proof of Proposition 3.2–(4): $\mathrm{Tr}(DH_{2}^{T}MH_{2})=n^{2}\,d\,t^{2(n-1)}$

In this section, we prove the last identity in Proposition 3.2. We will show that

[TABLE]

We recall that all the matrices $A,B,H_{2},H_{1},D,M,L$ and $U$ are of order $dn$. Each entry of these matrices should be understood as a matrix of order $d$ equal to that entry multiplied by the $d$-dimensional identity matrix $I_{d}$.

Since $D=\mathrm{diag}(0,\ldots,0,1)$ , we have $DH_{2}^{T}MH_{2}=\mathrm{diag}(0,\ldots,0,[(H_{2})^{T}MH_{2}]_{nn})$ . Therefore

[TABLE]
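The trace computation combines two elementary facts: left-multiplication by $D$ keeps only the last block row of a matrix, and each scalar entry stands for a multiple of $I_{d}$, so the trace picks up a factor $d$. The following minimal sketch checks $\mathrm{Tr}\big((D\otimes I_{d})(X\otimes I_{d})\big)=d\,X_{nn}$ for an arbitrary scalar matrix $X$, which is a stand-in for illustration and not $H_{2}^{T}MH_{2}$. The helper names are hypothetical.

```python
def kron_identity(a, d):
    """Return kron(A, I_d): each scalar a_ij becomes the d x d block a_ij * I_d."""
    n = len(a)
    return [
        [a[i // d][j // d] if i % d == j % d else 0 for j in range(n * d)]
        for i in range(n * d)
    ]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))


n, d = 3, 2
x = [[2, -1, 3], [0, 4, 5], [7, 1, -6]]  # arbitrary scalar matrix
selector = [[1 if i == j == n - 1 else 0 for j in range(n)] for i in range(n)]

big_product = matmul(kron_identity(selector, d), kron_identity(x, d))
print(trace(big_product), d * x[n - 1][n - 1])
```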

Next we show that $M_{nn}=n^{2}I_{d}$ . According to Theorem 2.2, we have

[TABLE]

where

[TABLE]

Therefore,

[TABLE]

Now we show that

[TABLE]

Applying (90) for $j=n$ and $k=n+1$ we get

[TABLE]

which implies that

[TABLE]

as desired, that is $M_{nn}=n^{2}I_{d}$ .

Therefore we get $\mathrm{Tr}(DH_{2}^{T}MH_{2})=t^{2(n-1)}\mathrm{Tr}(M_{nn})=t^{2(n-1)}n^{2}d$ as stated.

5.2 Proof of Lemma 4.3

In this section we prove Lemma 4.3: we need to show that $H_{2}^{-1}H_{1}=H$ where $H_{1},H_{2}$ and $H$ are given in (39) and (52).

We will show that $(H_{2}H)_{ik}=(H_{1})_{ik}$ for all $i,k$ .

• If $k<i$, we have $(H_{2})_{ij}=0$ if $j\leq k$ and $H_{jk}=0$ if $j>k$. Therefore,

[TABLE]

• If $k>i$, we have $(H_{2})_{ij}=0$ if $j\leq i$ and $H_{jk}=0$ if $j>k$. Therefore,

[TABLE]

• If $k=i$, we have $(H_{2})_{ij}=0$ if $j<i$ and $H_{ji}=0$ if $j>k$. Therefore,

[TABLE]

It follows from these cases that $H_{2}H=H_{1}$ .

5.3 Proof of Lemma 4.4

In this section, we prove Lemma 4.4. First we will show that $(H_{2}^{T})^{-1}=P$ where

[TABLE]

We consider the following cases:

• If $i=j$, we have $P_{lj}=0$ if $l<i$ and $(H_{2}^{T})_{il}=0$ if $l>i$. Therefore,

[TABLE]

• If $i<j$, we have $P_{lj}=0$ if $l\leq i$ and $(H_{2}^{T}(h))_{il}=0$ if $l>i$. Therefore,

[TABLE]

• If $i>j$, we have $P_{lj}=0$ if $l<j$ and $(H_{2})_{il}=0$ if $l>i$. Therefore,

[TABLE]

Therefore, $(H_{2}^{T})^{-1}=P$. By the definition of $\mathcal{K}$ in Lemma 4.4, we have $\mathcal{K}=h^{2n-2}(H_{2}^{T}MH_{1})^{-1}=h^{2n-2}H_{1}^{-1}M^{-1}(H_{2}^{T})^{-1}$. Therefore,

[TABLE]

We now transform further the last expression. Let $\beta=n-j,\alpha=n-i,u=n-j-l$ , and $f(x)=\sum_{u=0}^{\beta}\frac{1}{(\alpha+1+u)}\frac{\beta!(-1)^{\beta-u}}{u!(\beta-u)!}x^{u+\alpha}$ . From (103), $\mathcal{K}_{ij}$ can be rewritten as follows

[TABLE]

We now compute $f(1)$ in an alternative way. We have

[TABLE]

In particular, taking $x=1$ yields

[TABLE]

We now compute the integral on the RHS of (106). We have

[TABLE]

It follows from (105), (106) and (107) that

[TABLE]

which finishes the proof of Lemma 4.4.
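The evaluation of $f(1)$ in this proof can be cross-checked numerically. Since $\int_{0}^{1}x^{u+\alpha}\,dx=\frac{1}{\alpha+1+u}$, the definition of $f$ gives $f(1)=\int_{0}^{1}x^{\alpha}(x-1)^{\beta}\,dx=(-1)^{\beta}\frac{\alpha!\,\beta!}{(\alpha+\beta+1)!}$ by the Beta integral. The sketch below compares the term-by-term sum with this closed form in exact arithmetic. Identifying this integral with the one on the right-hand side of (106) is an assumption, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def f_at_one(alpha, beta):
    """f(1) for the polynomial f of the proof, summed term by term.

    Uses beta! / (u! (beta - u)!) = C(beta, u).
    """
    return sum(
        Fraction(comb(beta, u) * (-1) ** (beta - u), alpha + 1 + u)
        for u in range(beta + 1)
    )


def beta_closed_form(alpha, beta):
    """(-1)^beta * alpha! * beta! / (alpha + beta + 1)!, from the Beta integral."""
    return Fraction(
        (-1) ** beta * factorial(alpha) * factorial(beta),
        factorial(alpha + beta + 1),
    )


print(all(
    f_at_one(a, b) == beta_closed_form(a, b)
    for a in range(6)
    for b in range(6)
))
```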

Acknowledgements

The majority of this paper was written when M. H. Duong was at the Mathematics Institute, University of Warwick and was supported by ERC Starting Grant 335120 while H. M. Tran was at the Department of Industrial Engineering, Texas A&M University.

Bibliography

[ADPZ11] S. Adams, N. Dirr, M. A. Peletier, and J. Zimmer. From a large-deviations principle to the Wasserstein gradient flow: a new micro-macro passage. Communications in Mathematical Physics, 307:791–815, 2011.
[AGS08] L. Ambrosio, N. Gigli, and G. Savaré. Gradient flows in metric spaces and in the space of probability measures. Lectures in Mathematics. ETH Zürich. Birkhäuser, Basel, 2nd edition, 2008.
[BL08] T. Bodineau and R. Lefevere. Large deviations of lattice Hamiltonian dynamics coupled to stochastic thermostats. Journal of Statistical Physics, 133(1):1–27, 2008.
[Bra14] M. Bramanti. An Invitation to Hypoelliptic Operators and Hörmander's Vector Fields. Springer, Berlin, 2014.
[CSW14] F. Cavalletti, M. Sedjro, and M. Westdickenberg. A variational time discretization for the compressible Euler equations. http://arxiv.org/abs/1411.1012, 2014.
[DLR13] M. H. Duong, V. Laschos, and M. Renger. Wasserstein gradient flows from large deviations of many-particle limits. ESAIM Control Optim. Calc. Var., 19(4):1166–1188, 2013.
[DM10] F. Delarue and S. Menozzi. Density estimates for a random noise propagating through a chain of differential equations. J. Funct. Anal., 259(6):1577–1630, 2010.
[DPZ13] M. H. Duong, M. A. Peletier, and J. Zimmer. GENERIC formalism of a Vlasov-Fokker-Planck equation and connection to large-deviation principles. Nonlinearity, 26(11):2951–2971, 2013.
