Numerical methods for stochastic differential equations based on Gaussian mixture

Lei Li; Jianfeng Lu; Jonathan Mattingly; Lihan Wang

arXiv:1812.11932·math.NA·August 12, 2021


TL;DR

This paper introduces a novel numerical method for stochastic differential equations that uses Gaussian mixtures to achieve weak second order accuracy, offering an efficient alternative to traditional schemes.

Contribution

The paper proposes a new Gaussian mixture-based scheme for SDEs that simplifies computation and achieves higher weak order accuracy compared to conventional methods.

Findings

01

Achieves weak second order accuracy for SDEs.

02

Complexity of the method scales linearly with dimension.

03

Provides an efficient alternative to Itô-Taylor based schemes.

Abstract

We develop in this work a numerical method for stochastic differential equations (SDEs) with weak second order accuracy based on Gaussian mixture. Unlike the conventional higher order schemes for SDEs based on Itô-Taylor expansion and iterated Itô integrals, the proposed scheme approximates the probability measure $\mu(X^{n+1}\mid X^{n}=x_{n})$ by a mixture of Gaussians. The solution at next time step $X^{n+1}$ is then drawn from the Gaussian mixture with complexity linear in the dimension $d$ . This provides a new general strategy to construct efficient high weak order numerical schemes for SDEs.

Equations

dX(t) = b(X(t))\,dt + \sigma(X(t))\,dW, \quad X(0) = x,

C_{b}^{k} = \Big\{ f\in C^{k} : \|f\|_{C^{k}} := \sup_{x\in\mathbb{R}^{d}} \sum_{|\alpha|\leq k} |D^{\alpha}f| < \infty \Big\}.

\Lambda(x) := \sigma(x)\sigma^{T}(x)

\inf_{x\in\mathbb{R}^{d}} \min \lambda(\Lambda(x)) \geq \sigma_{0}^{2} > 0

|b(x)-b(y)| + |\sigma(x)-\sigma(y)| \leq K|x-y|.

\sup_{0\leq t\leq T} \mathbb{E}_{x}|X(t)|^{2m} \leq C(x,T).

\mathcal{L} = \sum_{i=1}^{d} b_{i}\partial_{i} + \sum_{1\leq i,j\leq d} \frac{1}{2}\Lambda_{ij}\partial_{ij}.

\partial_{t}p = \mathcal{L}^{*}p := -\nabla\cdot(bp) + \frac{1}{2}\sum_{ij}\partial_{ij}^{2}(\Lambda_{ij}p).

u(x,t) = \mathbb{E}_{x}\phi(X(t)).

u_{t} = \mathcal{L}u.

u(x,t) = e^{t\mathcal{L}}\phi(x) = \sum_{j=0}^{\infty} \frac{t^{j}}{j!}\mathcal{L}^{j}\phi(x).

\sup_{x\in\mathbb{R}^{d}} \Big| u(x,h) - \Big( \phi(x) + \sum_{j=1}^{2} \frac{h^{j}}{j!}\mathcal{L}^{j}\phi \Big) \Big| \leq \rho(\|\phi\|_{C^{6}})h^{3}.

h = T/N.

\bigl| \mathbb{E}\phi(X^{n}) - \mathbb{E}\phi(X(t_{n})) \bigr| \leq Ch^{r}, \quad \forall\, 1\leq n\leq N, \text{ whenever } h\in(0,h_{0}).

X^{n+1} = X^{n} + A(X^{n},\zeta^{n},h), \quad X^{0} = x,

|\mathbb{E}_{x}\phi(X(h)) - \mathbb{E}_{x}\phi(X^{1})| \leq \rho(\|\phi\|_{C^{2(r+1)}})h^{r+1}, \quad \forall x\in\mathbb{R}^{d},

\sup_{x\in\mathbb{R}^{d}} \Bigl| \mathbb{E}_{x}(\phi(X^{1})) - \sum_{j=0}^{2} \frac{h^{j}}{j!}\mathcal{L}^{j}\phi(x) \Bigr| \leq \rho(\|\phi\|_{C^{6}})h^{3},

X^{n+1} = X^{n} + b(X^{n})h + \sigma(X^{n})\Delta W_{n}

dY(t) = \mu(t)\,dt + \sigma(t)\,dW.

m(t) = m_{0} + \int_{0}^{t}\mu(s)\,ds, \quad S(t) = S_{0} + \int_{0}^{t}\sigma(s)\sigma^{T}(s)\,ds.

dY(t) = \dot{m}(t)\,dt + \sqrt{\dot{S}(t)}\,dW

\mathcal{L}(s) = \dot{m}(s)\cdot\nabla_{x} + \frac{1}{2}\dot{S}_{ij}(s)\partial_{ij}.

\mathbb{E}\phi(Y(h)) = \exp\Big( \int_{0}^{h}\mathcal{L}(s)\,ds \Big)\phi(x_{0}) = \exp\Big( (m-x_{0})\cdot\nabla_{x} + \frac{1}{2}S_{ij}\partial_{ij} \Big)\phi(x_{0}).

\mathbb{E}\phi(Y) = \exp(\mathcal{L}_{z})\phi(z), \quad \forall z\in\mathbb{R}^{d},

\mathcal{L}_{z} = (m-z)\cdot\nabla_{x} + \frac{1}{2}S_{ij}\partial_{ij}.

X^{1} = x_{0} + A(x_{0},\xi,h) \sim \tilde{\rho}(x;h,x_{0}) = \frac{1}{\sqrt{2\pi S(h,x_{0})}} \exp\Big( -\frac{(x-m(h,x_{0}))^{2}}{2S(h,x_{0})} \Big).

\exp\Big( (m(h,x_{0})-x_{0})\partial_{x} + \frac{1}{2}S(h,x_{0})\partial_{xx} \Big)\varphi(x_{0}) = \varphi(x_{0}) + h\mathcal{L}\varphi(x_{0}) + \frac{h^{2}}{2}\mathcal{L}^{2}\varphi(x_{0}) + O(h^{3})

m(h,x_{0})

S(h,x_{0})

X^{1} \sim \sum_{i=1}^{M} w_{i}\,\mathcal{N}(m_{i}(h),S_{i}(h)).

\mathbb{E}\phi(X^{1}) = \sum_{i=1}^{M} w_{i}\exp(\mathcal{L}_{i})\phi(x_{0}).


Full text

Numerical methods for stochastic differential equations based on Gaussian mixture

Lei Li, School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University, Minhang District, Shanghai 200240, China.

Jianfeng Lu, Department of Mathematics, Department of Chemistry and Department of Physics, Duke University, Durham, NC 27708, USA.

Jonathan C. Mattingly, Department of Mathematics and Department of Statistical Science, Duke University, Durham, NC 27708, USA.

Lihan Wang, Department of Mathematics, Duke University, Durham, NC 27708, USA.

Abstract

We develop in this work a numerical method for stochastic differential equations (SDEs) with weak second-order accuracy based on Gaussian mixture. Unlike conventional higher order schemes for SDEs based on Itô-Taylor expansion and iterated Itô integrals, the scheme we propose approximates the probability measure $\mu(X^{n+1}\mid X^{n}=x_{n})$ using a mixture of Gaussians. The solution at the next time step $X^{n+1}$ is drawn from the Gaussian mixture with complexity linear in dimension $d$ . This provides a new strategy to construct efficient high weak order numerical schemes for SDEs.

keywords:

Gaussian mixture; stochastic differential equation; second-order scheme; weak convergence

AMS subject classifications:

60H35; 65C30; 65L20

1 Introduction

Stochastic differential equations (SDEs) [23] have been used to model a wide range of phenomena, such as stock prices of financial derivatives [4, 11], and physical systems in contact with a heat bath [30, 5, 12]. SDEs have recently also been used for analyzing stochastic gradient descent (SGD) in machine learning [13, 9, 7]. SDEs are dynamical systems with noise [23, 6], where the noise often represents interactions that are not included in the model but affect the dynamics. For example, in the Langevin equations [5, 24], one considers the evolution of a subsystem, while the rest of the system, consisting of potentially many degrees of freedom, is regarded as a heat bath. The interaction between the heat bath and the subsystem is modelled by noise and dissipation terms.

We will consider general SDEs driven by white noise [24, 6] and in particular SDEs in the Itô sense [23, Chap. 5]:

$$dX(t)=b(X(t))\,dt+\sigma(X(t))\,dW,\quad X(0)=x, \tag{1.1}$$

where $X\in\mathbb{R}^{d}$ , $W$ is a standard $m$ -dimensional Brownian motion, $b:\mathbb{R}^{d}\to\mathbb{R}^{d}$ is the drift, and $\sigma:\mathbb{R}^{d}\to\mathbb{R}^{d\times m}$ is the diffusion matrix. We are interested in its numerical approximations, and depending on whether our goal is to approximate the sample paths or the distributions, the numerical methods can be classified into strong schemes and weak schemes [10, 6]. The weak schemes approximate the distributions, and we refer the readers to Section 2.2 for more details. Indeed, weak schemes attempt to match the moments of the iterated Itô integrals, and therefore, the key question for designing weak schemes is how to approximate these moments efficiently. The classical Euler-Maruyama scheme (3.15) is known to be a weak first-order scheme. In applications, schemes with higher order accuracy are often desired. The weak second-order schemes, however, are not trivial for SDEs. The traditional second-order schemes, based on Itô-Taylor expansion [10, 20], involve evaluations of the spatial derivatives of drift and diffusion coefficients, as well as iterated Itô integrals. The weak second-order schemes date back to Milstein and Talay [18, 28]. The Talay-Tubaro expansion can also yield a weak second-order approximation [29]. In [25, 26], Runge-Kutta methods that achieve arbitrary weak order for scalar noise and weak second-order for general noise are developed. Runge-Kutta schemes can avoid approximating some derivatives of the drift and the diffusion coefficients directly (see for example the Talay-Tubaro scheme), and lead to better stability. A weak trapezoidal second-order method has been developed in [2], which is derivative free and no evaluation of iterated Itô integrals is needed. However, it leverages the structure of a particular, but common, class of equations. In [1], higher order convergence for a class of SDEs is achieved based on solving modified SDEs. 
In [22, 21], another class of higher order schemes was developed, often referred to as Lie splitting methods. Like the current methods, they strive for a given level of weak accuracy by deriving a condition which guarantees that certain terms vanish in an expansion of the difference between the true density and that of the numerical method.

As will be discussed in Section 2.2, one may approximate the conditional distribution $\mu(X^{n+1}\mid X^{n}=x_{n})$ with asymptotic weak local error $O(h^{3})$ to achieve global weak second-order accuracy, where $\{X^{n}\}$ are the numerical solutions and $h$ is the step size. In this work, we propose a novel Gaussian mixture method to achieve $O(h^{3})$ weak local error in which $X^{n+1}$ can be sampled from one of a mixture of Gaussians. Our ansatz is inspired by the expansion of the solution using commutators. Our Gaussian particle ansatz has two attractive properties:

• Since only one of the Gaussians in the mixture is chosen in each step, the cost in each step is minimal and the simulation is fast: we only need to generate $d$ scalar random variables (with three possible values only) to generate an initial point, and then generate a $d$ -dimensional multi-variate normal variable whose mean and covariance matrix depend on the $d$ scalar random variables and are obtained by solving an ODE (see Remark 4.7 for some comments on the number of random variables needed). In this sense, the method we construct in this paper can be considered as a random mixture of Euler-Maruyama steps that produces a higher order method, and it is simple to implement.

• Our scheme does not need the spatial derivatives of the coefficients, which is useful in several contexts.

Numerical simulations show that our Gaussian mixture method is indeed weak second-order for reasonable values of the step size. This agrees with our theoretical results that our methods are asymptotically second-order as the step size goes to zero. For related works about using Gaussian approximations for general distributions, see [14, 3]. In [3], Gaussian processes based on a variational approach are used to approximate posterior measure in path space. In [15], Gaussian approximation is used to approximate transition paths in Langevin dynamics.

This work is primarily concerned with accuracy considerations. This is reflected in the use of essentially forward-Euler-like steps to build the Gaussian mixture. It is easy to imagine replacing these with an implicit step to address stability concerns.

The rest of the paper is organized as follows. In Section 2, we give a brief introduction to SDEs and the basic setup of our problem. In particular, the concept and criteria for weak accuracy using test functions with bounded derivatives are introduced. In Section 3, we introduce the idea of Gaussian mixtures for high order weak accuracy and develop an algorithm for one-dimensional SDEs with weak second-order accuracy, where the mean and variance of each Gaussian are computed either by solving ODEs or by direct construction. In Section 4, we generalize the algorithm for 1D SDEs to SDEs in multi-dimensions. The number of Gaussian beams is exponential in the dimension $d$ , but we only need $d$ discrete random variables to determine which beam we choose, so the complexity is linear in $d$ . In Section 5, we present several numerical examples to see how our algorithm performs in different respects.

2 Preliminaries

In this section, we collect some definitions and notations related to SDEs. Moreover, the notion of weak convergence is introduced in detail, which lays the foundation of our construction of Gaussian mixture methods in later sections.

2.1 Notations and assumptions.

For the convenience of later discussions, we introduce for any integer $k$ the following set of functions

$$C_{b}^{k}=\Big\{f\in C^{k}:\|f\|_{C^{k}}:=\sup_{x\in\mathbb{R}^{d}}\sum_{|\alpha|\leq k}|D^{\alpha}f|<\infty\Big\}.$$

Here the subscript $b$ is used to remind the reader that the functions and all their derivatives up to the specified order are bounded. We use $\mathbb{E}_{x}$ to denote the expectation under the law of the process $X(t)$ with $X(0)=x$ . The notation $\mathcal{N}(m,\Lambda)$ denotes the normal distribution with mean $m$ and covariance matrix $\Lambda$ .

We make the following assumptions, which will be used throughout this work:

Assumption 2.1.

The diffusion matrix

$$\Lambda(x):=\sigma(x)\sigma^{T}(x)$$

is uniformly positive definite. In other words,

$$\inf_{x\in\mathbb{R}^{d}}\min\lambda(\Lambda(x))\geq\sigma_{0}^{2}>0$$

for some $\sigma_{0}>0$ , where $\lambda(\Lambda)$ is the set of eigenvalues of $\Lambda$ .

Assumption 2.2.

The coefficients are smooth in the sense that $b,\sigma\in C_{b}^{6}$ .

Note that Assumption 2.2 implies that the coefficients are Lipschitz continuous, i.e. there exists a constant $K>0$ such that for all $x,y\in\mathbb{R}^{d}$ ,

$$|b(x)-b(y)|+|\sigma(x)-\sigma(y)|\leq K|x-y|.$$

It is well known that Assumption 2.2 ensures the existence of strong solutions to (1.1) [23] and that the moments of the solution are bounded:

$$\sup_{0\leq t\leq T}\mathbb{E}_{x}|X(t)|^{2m}\leq C(x,T).$$

Though likely overly restrictive, Assumptions 2.1 and 2.2 will simplify the analysis and make the ideas more transparent. Analysis based on Assumption 2.2 has been pursued in many works (see for example [2]). Compared with Assumption 2.2, some authors relax the coefficients to have polynomial growth (see for example [19]). The current results can be extended to locally Lipschitz coefficients with polynomial growth under appropriate one-sided Lyapunov conditions, or simply under the arbitrary moment bounds they imply; see for instance [16].

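The generator of the diffusion process (1.1) is given by

$$\mathcal{L}=\sum_{i=1}^{d}b_{i}\partial_{i}+\sum_{1\leq i,j\leq d}\frac{1}{2}\Lambda_{ij}\partial_{ij}. \tag{2.5}$$

The evolution of the law satisfies the Fokker-Planck equation (or the forward Kolmogorov equation)

$$\partial_{t}p=\mathcal{L}^{*}p:=-\nabla\cdot(bp)+\frac{1}{2}\sum_{ij}\partial_{ij}^{2}(\Lambda_{ij}p).$$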

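For a smooth function $\phi$ , let us define

$$u(x,t)=\mathbb{E}_{x}\phi(X(t)).$$

With regularity Assumptions 2.1 and 2.2, $u$ satisfies the backward Kolmogorov equation (see, for example, [23, Chap. 8])

$$u_{t}=\mathcal{L}u. \tag{2.8}$$

Formally, this implies the semigroup expansion

$$u(x,t)=e^{t\mathcal{L}}\phi(x)=\sum_{j=0}^{\infty}\frac{t^{j}}{j!}\mathcal{L}^{j}\phi(x).$$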

Given regularity assumptions on $\phi$ , the expansion can be rigorously established up to a certain order. We cite a classical result in [8, Chap. XI] for expansion up to $j=2$ , which has been modified for our purpose.

Lemma 2.1.

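([8, Theorem 11.6.4]) Under Assumptions 2.1–2.2, there exists a non-negative non-decreasing function $\rho$ , such that for all $\phi\in C_{b}^{\infty}$ ,

$$\sup_{x\in\mathbb{R}^{d}}\Big|u(x,h)-\Big(\phi(x)+\sum_{j=1}^{2}\frac{h^{j}}{j!}\mathcal{L}^{j}\phi\Big)\Big|\leq\rho(\|\phi\|_{C^{6}})h^{3}.$$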
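The $O(h^{3})$ truncation error in Lemma 2.1 can be checked by hand on a simple example. The sketch below assumes an Ornstein-Uhlenbeck equation $dX=-X\,dt+\sigma\,dW$ with the test function $\phi(x)=x^{2}$, for which $u(x,t)$ is known in closed form. Both choices are illustrative assumptions: the drift and test function are unbounded, so this example lies outside the $C_{b}$ setting of the lemma, and the constant `SIGMA` is hypothetical.

```python
import math

SIGMA = 0.7  # hypothetical constant diffusion coefficient (illustration only)


def exact_u(x, t):
    """u(x, t) = E_x[X(t)^2] for dX = -X dt + SIGMA dW, in closed form."""
    decay = math.exp(-2.0 * t)
    return x * x * decay + 0.5 * SIGMA**2 * (1.0 - decay)


def second_order_expansion(x, h):
    """phi + h L phi + (h^2 / 2) L^2 phi for phi(x) = x^2.

    Here L = -x d/dx + (SIGMA^2 / 2) d^2/dx^2, so
    L phi = -2 x^2 + SIGMA^2 and L^2 phi = 4 x^2 - 2 SIGMA^2.
    """
    phi = x * x
    l_phi = -2.0 * x * x + SIGMA**2
    l2_phi = 4.0 * x * x - 2.0 * SIGMA**2
    return phi + h * l_phi + 0.5 * h * h * l2_phi


def truncation_error(x, h):
    """Absolute gap between the exact expectation and the truncated series."""
    return abs(exact_u(x, h) - second_order_expansion(x, h))


if __name__ == "__main__":
    for h in (0.1, 0.05, 0.025):
        print(h, truncation_error(1.0, h))
```

Halving $h$ should reduce the printed error by a factor close to $8$, which is the signature of a cubic local error.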

In a numerical scheme, we generate the approximation sequences for the diffusion process at discrete time steps. Let $T>0$ be the terminal time point, and $N$ the number of numerical steps such that

$$h=T/N.$$

We use $t_{n}=nh$ ( $n=0,1,\ldots$ ) to denote the time grid points, $X^{n}$ the random variable generated by some numerical method to approximate $X(t_{n})$ , and $x_{n}$ a particular realization of the random variable $X^{n}$ .

2.2 Weak convergence.

We only require the law of $X^{n}$ to approximate the law of the solution to (1.1). This is described by the notion of weak convergence, which will be the focus of this section.

Definition 2.2.

Fix $T>0$ . Let $N,h$ and $X^{n}$ be given as in Section 2.1. We say $X^{n}$ converges weakly with order $r>0$ to $X(t_{n})$ as $h\to 0$ if for any $\phi\in C_{b}^{\infty}$ , there exist $C>0$ , $h_{0}>0$ that are independent of $h$ (but may depend on $T$ and $\phi$ ) such that

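$$\bigl|\mathbb{E}\phi(X^{n})-\mathbb{E}\phi(X(t_{n}))\bigr|\leq Ch^{r},\quad\forall\,1\leq n\leq N,\text{ whenever }h\in(0,h_{0}).$$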

Here, $\mathbb{E}$ represents the expectation under the law of $X^{n}$ or $X(t_{n})$ .
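To make the notion of weak order concrete, the following sketch measures it for the Euler-Maruyama scheme on an assumed Ornstein-Uhlenbeck equation $dX=-\theta X\,dt+\sigma\,dW$ with the test function $\phi(x)=x$. For this choice the mean of the numerical iterates is $(1-\theta h)^{n}x_{0}$ and the exact mean is $x_{0}e^{-\theta t_{n}}$, so no sampling is needed. The equation, the test function (which is unbounded and therefore not in $C_{b}^{\infty}$), and all function names are assumptions made for illustration.

```python
import math


def em_mean(x0, theta, T, N):
    """Mean of the Euler-Maruyama iterate X^N for dX = -theta X dt + sigma dW."""
    h = T / N
    return x0 * (1.0 - theta * h) ** N


def weak_error_in_mean(x0, theta, T, N):
    """|E phi(X^N) - E phi(X(T))| for the test function phi(x) = x."""
    return abs(em_mean(x0, theta, T, N) - x0 * math.exp(-theta * T))


def observed_order(x0=1.0, theta=1.0, T=1.0, N=64):
    """Estimate r from errors at step sizes h and h/2: r ~ log2(e(h) / e(h/2))."""
    e_coarse = weak_error_in_mean(x0, theta, T, N)
    e_fine = weak_error_in_mean(x0, theta, T, 2 * N)
    return math.log2(e_coarse / e_fine)


if __name__ == "__main__":
    print("observed weak order:", observed_order())
```

The estimated exponent should be close to $1$, consistent with Euler-Maruyama being a weak first-order scheme.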

Remark 2.3.

Note that the test functions here have bounded derivatives, and those used in [10, Sec. 9.7] and [19] have derivatives with polynomial growth. Test functions with bounded derivatives induce weaker topology but are much easier to handle (see e.g., [2]). The results can be extended to the more general setting with additional work and assumptions to ensure boundedness of moments.

We now move on to the criteria of weak convergence. Suppose the random sequence $X^{n}$ is generated by

$$X^{n+1}=X^{n}+A(X^{n},\zeta^{n},h),\quad X^{0}=x, \tag{2.13}$$

where $\zeta^{n}$ is a random vector generated at time $t_{n}$ and $A$ is a function. If the $\zeta^{n}$ are i.i.d., then $\{X^{n}\}$ is a time-homogeneous Markov chain. (For example, in Algorithm 1 below, $\zeta^{n}$ is a combination of the $z$ random variable and the standard 1D normal variable $\xi$ .)

The following proposition is standard and we provide the proof in Appendix A for reference. We emphasize that the following two results are not new and are part of the standard “folklore” meta-theorems in the subject. We repeat them only to make precise the versions we need and their dependences on parameters.

Proposition 2.4.

Let $b$ and $\sigma$ satisfy $b,\sigma\in C_{b}^{2(r+1)}$ for $r>0$ . If there is a nonnegative and non-decreasing function $\rho$ such that for all $\phi\in C_{b}^{\infty}$ , we have the local truncation error bounded by

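$$|\mathbb{E}_{x}\phi(X(h))-\mathbb{E}_{x}\phi(X^{1})|\leq\rho(\|\phi\|_{C^{2(r+1)}})h^{r+1},\quad\forall x\in\mathbb{R}^{d},$$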

then $X^{n}$ converges weakly with order $r$ to $X(t_{n})$ as $h\to 0$ .

As before, $\mathbb{E}_{x}$ represents the expectation under the law of the process or Markov chain starting at $x$ . This proposition basically says that if the local truncation error is $O(h^{r+1})$ , then the global error is of order $r$ . We have the following trivial observation by Lemma 2.1 and Proposition 2.4:

Corollary 2.5.

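Under Assumptions 2.1–2.2, if there exists $\rho$ that is nonnegative and non-decreasing such that $\forall\phi\in C_{b}^{\infty}$ , we have

$$\sup_{x\in\mathbb{R}^{d}}\Bigl|\mathbb{E}_{x}(\phi(X^{1}))-\sum_{j=0}^{2}\frac{h^{j}}{j!}\mathcal{L}^{j}\phi(x)\Bigr|\leq\rho(\|\phi\|_{C^{6}})h^{3}, \tag{2.14}$$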

then the method (2.13) is of weak second-order accuracy.
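The step from a local error of $O(h^{r+1})$ to a global error of $O(h^{r})$ in Proposition 2.4 follows a standard telescoping argument. The sketch below is one common way to write it and is not necessarily the proof given in Appendix A. It uses $u(x,t)=\mathbb{E}_{x}\phi(X(t))$, the Markov property $u(x,T-t_{n})=\mathbb{E}_{x}\,u(X(h),T-t_{n+1})$, and the assumption that $C:=\sup_{n}\rho(\|\psi_{n}\|_{C^{2(r+1)}})$ is finite, which is where the smoothness of $b$ and $\sigma$ enters.

```latex
\begin{aligned}
\mathbb{E}\phi(X^{N}) - \mathbb{E}_{x}\phi(X(T))
  &= \sum_{n=0}^{N-1} \mathbb{E}\Big[ u(X^{n+1}, T-t_{n+1}) - u(X^{n}, T-t_{n}) \Big] \\
  &= \sum_{n=0}^{N-1} \mathbb{E}\Big[ \mathbb{E}_{X^{n}}\psi_{n}(X^{1})
       - \mathbb{E}_{X^{n}}\psi_{n}(X(h)) \Big],
     \qquad \psi_{n} := u(\cdot, T-t_{n+1}), \\
\big| \mathbb{E}\phi(X^{N}) - \mathbb{E}_{x}\phi(X(T)) \big|
  &\leq N \cdot C h^{r+1} = C\,T\,h^{r}.
\end{aligned}
```

Each summand is a one-step error started from $X^{n}$ with test function $\psi_{n}$, so the local bound applies term by term, and the $N=T/h$ terms cost one power of $h$.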

3 Weak second-order Gaussian mixture method

The Euler-Maruyama scheme [10] for SDE (1.1)

$$X^{n+1}=X^{n}+b(X^{n})h+\sigma(X^{n})\Delta W_{n} \tag{3.15}$$

generates Gaussian distributions for $X^{n+1}$ conditioning on $X^{n}=x_{n}$ but has only weak first-order accuracy. It is well known that constructing a weak second-order scheme is nontrivial, not to mention a weak second-order scheme using Gaussian approximations for the measure $\mu(X^{n+1}|X^{n}=x_{n})$ . In fact, as we will see, an approximation using a single Gaussian is generally insufficient for weak second-order accuracy. Hence, we aim to use Gaussian mixture to construct higher order schemes.
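For reference, the Euler-Maruyama step discussed above can be sketched in a few lines. The function names and the calling convention (`b` returns a vector of length $d$, `sigma` returns a $d\times m$ matrix) are assumptions made for this illustration.

```python
import numpy as np


def euler_maruyama_step(x, b, sigma, h, rng):
    """One step X^{n+1} = X^n + b(X^n) h + sigma(X^n) dW with dW ~ N(0, h I_m)."""
    sig = sigma(x)  # diffusion matrix at x, shape (d, m)
    dW = rng.normal(0.0, np.sqrt(h), size=sig.shape[1])  # Brownian increment
    return x + b(x) * h + sig @ dW


def simulate(x0, b, sigma, T, N, rng):
    """Apply N Euler-Maruyama steps of size h = T / N starting from x0."""
    h = T / N
    x = np.asarray(x0, dtype=float)
    for _ in range(N):
        x = euler_maruyama_step(x, b, sigma, h, rng)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    drift = lambda x: -x
    diffusion = lambda x: 0.5 * np.eye(2)
    print(simulate([1.0, 2.0], drift, diffusion, T=1.0, N=100, rng=rng))
```

Conditioned on $X^{n}=x_{n}$, the output of one step is Gaussian with mean $x_{n}+b(x_{n})h$ and covariance $h\,\sigma(x_{n})\sigma^{T}(x_{n})$, which is the single-Gaussian structure the Gaussian mixture construction generalizes.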

To start with, let us recall that the law of $Y(t)$ , the weak solution of the following SDE with additive noise, is a Gaussian distribution if $Y(0)$ is a Gaussian random variable independent of $W$ :

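$$dY(t)=\mu(t)\,dt+\sigma(t)\,dW.$$

Here, $\mu(t)$ and $\sigma(t)$ only depend on time. The mean and covariance matrix of $Y(t)$ are given respectively by

$$m(t)=m_{0}+\int_{0}^{t}\mu(s)\,ds,\quad S(t)=S_{0}+\int_{0}^{t}\sigma(s)\sigma^{T}(s)\,ds.$$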

Conversely, if we are given some $m$ and some positive semidefinite matrix $S$ , we can recover the normal distribution $\mathcal{N}(m,S)$ by constructing “paths” $\{m(t)\}_{t\in[0,h]}$ and $\{S(t)\}_{t\in[0,h]}$ with $m(h)=m,\,S(h)=S$ , and $\dot{S}$ being positive semi-definite so that the solution to the SDE

$$dY(t)=\dot{m}(t)\,dt+\sqrt{\dot{S}(t)}\,dW$$

with $Y(0)\sim\mathcal{N}(m(0),S(0))$ , independent of $W$ , will satisfy $Y(h)\sim\mathcal{N}(m,S)$ . Here $\dot{m}$ and $\dot{S}$ denote their respective time derivatives. Now let us consider the time-dependent generator

$$\mathcal{L}(s)=\dot{m}(s)\cdot\nabla_{x}+\frac{1}{2}\dot{S}_{ij}(s)\partial_{ij}.$$

We now assume $S(0)=0$ and $m(0)=x_{0}$ for some $x_{0}$ . By the backward equation (2.8), we have

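$$\mathbb{E}\phi(Y(h))=\exp\Big(\int_{0}^{h}\mathcal{L}(s)\,ds\Big)\phi(x_{0})=\exp\Big((m-x_{0})\cdot\nabla_{x}+\frac{1}{2}S_{ij}\partial_{ij}\Big)\phi(x_{0}).$$

Here the second equality comes from integrating $\mathcal{L}$ in time. Therefore, for any random variable $Y\sim\mathcal{N}(m,S)$ , we can express the expectation of $\phi$ as

$$\mathbb{E}\phi(Y)=\exp(\mathcal{L}_{z})\phi(z),\quad\forall z\in\mathbb{R}^{d}, \tag{3.20}$$

where

$$\mathcal{L}_{z}=(m-z)\cdot\nabla_{x}+\frac{1}{2}S_{ij}\partial_{ij}.$$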
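The identity $\mathbb{E}\phi(Y)=\exp(\mathcal{L}_{z})\phi(z)$ can be verified exactly when $\phi$ is a polynomial, because the exponential series then terminates. The sketch below does this in one dimension with NumPy's `Polynomial` class. Polynomial test functions are an assumption made for illustration (they are not in $C_{b}^{\infty}$), and the helper names are hypothetical.

```python
import math

from numpy.polynomial import Polynomial


def apply_generator(p, drift, S):
    """Apply L_z = drift * d/dx + (S / 2) * d^2/dx^2 to the polynomial p."""
    return drift * p.deriv(1) + 0.5 * S * p.deriv(2)


def gaussian_expectation_via_exp(phi, m, S, z):
    """Evaluate sum_k (L_z^k phi)(z) / k! with drift = m - z.

    Each application of L_z lowers the degree by at least one, so the
    series has at most deg(phi) + 1 nonzero terms.
    """
    total = 0.0
    term = phi  # holds L_z^k phi
    for k in range(phi.degree() + 1):
        total += term(z) / math.factorial(k)
        term = apply_generator(term, m - z, S)
    return total


if __name__ == "__main__":
    phi = Polynomial([0, 0, 0, 0, 1])  # phi(x) = x^4
    m, S = 0.3, 0.5
    fourth_moment = m**4 + 6 * m**2 * S + 3 * S**2  # E[Y^4] for Y ~ N(m, S)
    for z in (0.3, -1.0, 2.0):
        print(z, gaussian_expectation_via_exp(phi, m, S, z), fourth_moment)
```

The result does not depend on the base point $z$, which is the freedom the construction exploits when expanding around $x_{0}$.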

We will use (3.20) to construct our scheme. In this section, we start with $d=1$ . Our construction for $d=1$ will be used as the building block for constructing our scheme in higher dimensions in Section 4.

3.1 Conditions for second order Gaussian mixtures.

First of all, we claim that using a single Gaussian distribution to approximate $\mu(X^{n+1}|X^{n}=x_{n})$ is generally insufficient for weak second order accuracy. To start with, we assume $X^{1}$ generated by (2.13) conditioning on $X^{0}=x_{0}$ is a normal distribution with mean $m(h,x_{0})$ and variance $S(h,x_{0})$ :

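$$X^{1}=x_{0}+A(x_{0},\xi,h)\sim\tilde{\rho}(x;h,x_{0})=\frac{1}{\sqrt{2\pi S(h,x_{0})}}\exp\Big(-\frac{(x-m(h,x_{0}))^{2}}{2S(h,x_{0})}\Big).$$

Here $X^{1}\sim\tilde{\rho}$ means the law of $X^{1}$ has a density $\tilde{\rho}$ . Using (3.20), we desire

$$\exp\Big((m(h,x_{0})-x_{0})\partial_{x}+\frac{1}{2}S(h,x_{0})\partial_{xx}\Big)\varphi(x_{0})=\varphi(x_{0})+h\mathcal{L}\varphi(x_{0})+\frac{h^{2}}{2}\mathcal{L}^{2}\varphi(x_{0})+O(h^{3})$$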

in order to achieve global weak second-order accuracy. Clearly, we need $m(h,x_{0})-x_{0}=o(1),S(h,x_{0})=o(1)$ as $h\to 0$ . Using the semigroup expansion (Lemma 2.1), we infer that

[TABLE]

where $R_{1},R_{2}$ are bounded. Detailed calculation shows the following:

Proposition 3.1.

For a general multiplicative noise (or equivalently $\sigma(x)$ is not constant), there exist no $(m_{0},m_{1},m_{2},S_{0},S_{1})$ as functions of $x_{0}$ such that the constraint (2.14) can be satisfied.

Remark 3.2.

By the proof of Proposition 3.1 in Appendix B, if the noise is additive ( $\sigma$ is independent of $x$ ), it is possible to construct an approximation with a single Gaussian that yields global weak second-order accuracy.

The proof of Proposition 3.1 is provided in Appendix B. Proposition 3.1 is a strong indication that no approximation with one Gaussian can reach weak second-order accuracy, which forces us to seek Gaussian mixtures. In the derivation below, we use $R(x)$ to denote a generic function with a bound that depends only on $\|\cdot\|_{C^{6}}$ norms of $b,\sigma$ , and $\phi$ (the test function), and its concrete meaning may change from line to line. Below, we first present an informal argument to derive the scheme; the rigorous analysis of the scheme will be deferred to later sections.

As we have mentioned, considering the law of $X^{1}$ given the initial position $X^{0}=X(0)=x_{0}$ is sufficient to determine the whole Markov chain by time homogeneity. Therefore, it suffices to consider that the law of $X^{1}$ is given by a mixture of $M$ Gaussians:

$$X^{1}\sim\sum_{i=1}^{M}w_{i}\,\mathcal{N}(m_{i}(h),S_{i}(h)).$$

Here we abuse notation by letting $\mathcal{N}(m,S)$ denote the density function of a Gaussian with the given mean and covariance.
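Drawing $X^{1}$ from such a mixture takes two steps: choose a component $i$ with probability $w_{i}$, then draw from $\mathcal{N}(m_{i},S_{i})$. A minimal one-dimensional sketch follows. The weights, means, and variances in the usage example are hypothetical placeholders, not the values derived in this paper.

```python
import numpy as np


def sample_gaussian_mixture(weights, means, variances, rng, size=1):
    """Draw `size` samples from sum_i w_i N(m_i, S_i) in one dimension."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Step 1: pick one component per sample according to the weights.
    idx = rng.choice(len(weights), size=size, p=weights)
    # Step 2: draw from the chosen Gaussian only.
    return means[idx] + np.sqrt(variances[idx]) * rng.standard_normal(size)


def mixture_mean_and_variance(weights, means, variances):
    """Exact mean and variance of the mixture, for checking the sampler."""
    w, m, s = (np.asarray(a, dtype=float) for a in (weights, means, variances))
    mean = np.sum(w * m)
    var = np.sum(w * (s + m**2)) - mean**2
    return mean, var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = [2 / 3, 1 / 6, 1 / 6]  # hypothetical weights
    m = [0.0, 0.5, -0.5]       # hypothetical component means
    S = [0.1, 0.1, 0.1]        # hypothetical component variances
    draws = sample_gaussian_mixture(w, m, S, rng, size=200_000)
    print(draws.mean(), draws.var(), mixture_mean_and_variance(w, m, S))
```

Only one Gaussian is sampled per draw, so the per-step cost does not grow with the number of components $M$ beyond the cost of selecting the index.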

Let $\mathcal{L}_{i}:=(m_{i}(h)-x_{0})\partial_{x}+\frac{1}{2}S_{i}(h)\partial_{xx}$ . By (3.20), we have

$$\mathbb{E}\phi(X^{1})=\sum_{i=1}^{M}w_{i}\exp(\mathcal{L}_{i})\phi(x_{0}).$$

Here, the dependence on $x_{0}$ in the coefficients is not written out explicitly for simplicity. Since after time $h$ , the scale for a pure diffusion process is $\sqrt{h}$ (since $\mathbb{E}|X(t+h)-X(t)|^{2}\sim h$ ), we expect that $|m_{i}(h)-x_{0}|\leq C\sqrt{h}$ and $|S_{i}(h)|\leq C_{M}h$ . Therefore, by Corollary 2.5, the scheme will be of weak second-order if the following holds for all $\phi\in C_{b}^{\infty}$ :

[TABLE]

where $R(h)$ is bounded and depends on the function $\phi$ . We stop at $\mathcal{L}_{i}^{4}$ because we expect $|m_{i}(h)-x_{0}|\leq C\sqrt{h}$ , so that $\mathcal{L}_{i}^{k}$ is at least of order $O(h^{\frac{k}{2}})$ . Note that by (2.5),

[TABLE]

Due to the $\sqrt{h}$ scale in displacement, we take the following ansatz for $m_{i}(h)$ and $S_{i}(h)$ :

[TABLE]

Substituting the ansatz (3.28) into (3.26), after a tedious but straightforward calculation, we are able to derive the following conditions:

[TABLE]

In the above equations, functions $b,\Lambda$ and their spatial derivatives are all evaluated at point $x_{0}$ .

In the following Section 3.2, we consider a possible approach to satisfy these constraints, by choosing $M=3$ .

Remark 3.3.

We have not yet derived a weak third order Gaussian mixture scheme. The number of variables and equations grows to the point where our current methods to solve them are infeasible. However, we expect that a minimum of five Gaussians is needed to reach third order, which is suggested by the second and sixth equations of (3.29). These are constraints for $\phi^{\prime\prime}$ in first order and $\phi^{(4)}$ in second order respectively (and there will be another constraint for $\phi^{(6)}$ in third order), which only involve the weights $w_{i}$ and the leading order diffusion scaling terms $m_{i0}$ and $S_{i1}$ .

3.2 An ODE approach.

In this section, we give a particular construction of $m_{i}(h)$ and $S_{i}(h)$ that satisfy (3.29) which determines our numerical scheme. Our approach is to construct ODEs for $m_{i}$ and $S_{i}$ with a certain initial condition and solve them at time $h$ , which has the advantage of avoiding derivative evaluations of $b$ and $\Lambda$ .

To satisfy the last condition of (3.29), we consider a “symmetric” construction. It is convenient to relabel the Gaussians as $i=0,\pm 1$ , so that $\mathcal{N}(m_{0}(h),S_{0}(h))$ is “centralized” and does not contribute to the odd powers of $h^{\frac{1}{2}}$ . The centers of the other two Gaussians $m_{\pm 1}(h)$ are placed at both sides of $m_{0}(h)$ with $O(\sqrt{h})$ distance apart, and the variance matrices $S_{1}(h),S_{-1}(h)$ are constructed similarly with each other. These two Gaussians will contribute powers like $h^{k/2}$ . Moreover, we impose $w_{1}=w_{-1}$ , so that the odd powers of $h^{\frac{1}{2}}$ will cancel out with each other due to symmetry.

For initial conditions, we set

[TABLE]

where $\gamma>0$ is a parameter and

[TABLE]

This choice takes into consideration that the diffusion scale is $\sqrt{h}$ , while transportation scale is $h$ . For the choice of ODE flows, we take the following ansatz, where the functions $g_{i}$ are to be determined later:

[TABLE]

Our choice in (3.32a) is natural in the sense that we expect

[TABLE]

and the approximation is exact if $b$ is a linear function. For symmetry, we require $g_{1}=g_{-1}$ . Clearly, the $\sqrt{h}$ factor enters $m_{i}(h)$ through the initial value and then the equation. Due to symmetries in both $m_{\pm 1}$ and $S_{\pm 1}$ and $w_{1}=w_{-1}$ , all the odd powers of $h^{1/2}$ indeed cancel out in $\sum_{i}w_{i}\mathcal{L}_{i}^{m}$ .

We now find the constraints on the functions $g_{i}$ and the parameters so that (3.29) can be satisfied. To start with, we have by Taylor expansion that

[TABLE]

Hence, substituting (3.30) into (3.33), and considering our ansatz (3.28), we obtain

[TABLE]

Similarly, we can find the $S_{ij}$ :

[TABLE]

However, substituting (3.34) and (3.35) into (3.29) cannot uniquely determine the parameters $w_{i},~g_{i}$ and $\gamma$ . We further impose $S_{01}=S_{11}$ , which makes the construction of the scheme easier in higher dimensions and uniquely determines the solution:

[TABLE]

Clearly, choosing the following functions will suffice:

[TABLE]

Unfortunately, this choice has one issue: $g_{0}$ and $g_{1}$ are not always nonnegative, so $S_{i}(h)$ given by (3.32b) could be negative. To solve this issue, we simply set $S_{i}(h)$ to zero if that happens. Fortunately, since $g(x)\approx\frac{1}{2}\Lambda(x_{0})$ is positive whenever $x$ is close to $x_{0}$ , $S_{i}(h)$ can be guaranteed to be positive whenever $h$ is sufficiently small, and thus it can be shown that this error has a lower-order effect. A similar situation also arises in [2].

The details of the procedure outlined above are expressed more exactly in the following Algorithm 1 which gives the pseudocode to generate $x_{n+1}$ from $x_{n}$ .

Remark 3.4.

One may truncate the function and consider

[TABLE]

where $\psi(x;x_{0})$ is some truncation function that is $1$ in a neighborhood of $x_{0}$ so that $g_{0},g_{1}$ are positive definite for all $x$ . This approach, however, is not very convenient and in practice the behavior is not very satisfactory.

We are now in a position to present the following theorem, which states that our scheme is indeed of weak second-order.

Theorem 3.5.

Let $d=1$ . Suppose Assumptions 2.1–2.2 hold. Then Algorithm 1 is a weak second-order scheme for SDE (1.1).

Proof 3.6.

It is clear that there exists $h_{0}>0$ such that for $h<h_{0}$ ,

[TABLE]

Consider that $X(0)=X^{0}=x_{0}$ . By construction, $|m_{i}(t)-m_{i}(0)|\leq\|b\|_{\infty}h$ for all $t\leq h$ , and we have

[TABLE]

Hence, $S_{i}(h)>0$ for $h<h_{0}$ . Moreover, any reasonable numerical approximation to $S_{i}(h)$ will also be positive for sufficiently small $h$ .

By (3.37), we can conclude that for $h<h_{0}$ , (3.29) holds, and $S_{i}(h)>0$ . In other words, (3.26) holds and

[TABLE]

By Corollary 2.5, we find that our scheme constructed here is of weak second-order if (3.39) is solved exactly. For any numerical solver for (3.39) that is of at least second order, the error induced by solving (3.39) is $O(h^{3})$ or smaller, and therefore the above local estimate still holds. Our Algorithm 1 is thus of weak second order as well.
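As an illustration of a second-order ODE solver whose one-step error is $O(h^{3})$, the sketch below implements Heun's method (the explicit trapezoidal rule) for a generic autonomous ODE $\dot{y}=f(y)$. The specific right-hand sides used by the scheme are not reproduced here; the linear test problem $\dot{y}=-y$ and all function names are assumptions made for illustration.

```python
import math


def heun_flow(f, y0, h, n_sub=1):
    """Integrate y' = f(y) from t = 0 to t = h using n_sub Heun substeps."""
    y = y0
    dt = h / n_sub
    for _ in range(n_sub):
        k1 = f(y)                      # slope at the current point
        k2 = f(y + dt * k1)            # slope at the Euler predictor
        y = y + 0.5 * dt * (k1 + k2)   # trapezoidal corrector
    return y


def local_error_linear_test(h):
    """One-step error for y' = -y, y(0) = 1, measured against exp(-h)."""
    return abs(heun_flow(lambda y: -y, 1.0, h) - math.exp(-h))


if __name__ == "__main__":
    for h in (0.1, 0.05, 0.025):
        print(h, local_error_linear_test(h))
```

Halving $h$ reduces the one-step error by a factor close to $8$, so a single Heun step over $[0,h]$ stays within the $O(h^{3})$ budget of the local estimate.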

Remark 3.7.

The above construction with ODE flow gives $S_{i}(h)$ that can possibly be negative, though it is positive asymptotically as $h\to 0$ , and when it becomes negative, we can always fix this by setting it to zero. One may desire to have a method that ensures $S_{i}(h)$ to be positive. In Appendix D, we provide a direct way to construct the $S_{i}$ so that positivity can be guaranteed. However, this method involves evaluation of the derivatives of $\Lambda$ , which is oftentimes undesired.

4 Gaussian mixture for multi-dimensions

In this section, we generalize the Gaussian mixture method constructed in Section 3 to higher dimensions. We assume that we have the eigen-decomposition for $\Lambda(x)$ :

[TABLE]

where the $\lambda_{i}(x)$ are the eigenvalues of the matrix $\Lambda(x)$ , and $\{v_{i}\}$ forms an orthonormal basis of $\mathbb{R}^{d}$ .
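In practice, an eigen-decomposition of the symmetric matrix $\Lambda(x)=\sigma(x)\sigma^{T}(x)$ can be obtained with a symmetric eigensolver. The sketch below uses `numpy.linalg.eigh`; the function name and the sample matrix are assumptions made for illustration.

```python
import numpy as np


def diffusion_eigendecomposition(sigma_matrix):
    """Return Lambda = sigma sigma^T with its eigenvalues and orthonormal eigenvectors.

    The columns of `eigvecs` play the role of the v_i, and `eigvals`
    holds the corresponding lambda_i in ascending order.
    """
    Lam = sigma_matrix @ sigma_matrix.T
    eigvals, eigvecs = np.linalg.eigh(Lam)  # eigh is for symmetric matrices
    return Lam, eigvals, eigvecs


if __name__ == "__main__":
    sigma = np.array([[1.0, 0.2], [0.1, 0.8]])  # hypothetical sigma(x) at some x
    Lam, lam, V = diffusion_eigendecomposition(sigma)
    print(lam)
    print(np.allclose(V @ np.diag(lam) @ V.T, Lam))
```

Under Assumption 2.1 all eigenvalues are bounded below by $\sigma_{0}^{2}>0$, so the square roots $\sqrt{\lambda_{i}}$ used to scale displacements are well defined.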

As discussed in Section 3, we only need to focus on how to generate $X^{1}$ given $X^{0}=x_{0}$ . Again, we assume that $X^{1}$ has the conditional probability measure of the form

[TABLE]

for Gaussian mixture approximations. Here we use $P$ to denote the set of indices $p$ .

To illustrate our choice of the number of Gaussians and their initial positions, suppose we have a $d$ -dimensional decoupled diffusion process (the diffusion matrix is diagonal). We can then approximate each dimension using the 1D technique of Section 3 and obtain a global second-order approximation. In each dimension we have three Gaussians, for a total of $3^{d}$ Gaussians. If the diffusion matrix is no longer diagonal, we can still consider using $3^{d}$ Gaussians. At first glance the complexity appears large, but fortunately it turns out to grow linearly with $d$ rather than exponentially.

We now explain our construction. Let the index set $P=\{-1,0,1\}^{d}$ , so that $|P|=3^{d}$ , and each index $p\in P$ can be expressed as $p=(z_{p}^{1},\cdots,z_{p}^{d})$ where $z_{p}^{i}\in\{0,\pm 1\}$ . Let us consider the Gaussians with initial centers $y_{p}$ , given by

$$y_{p}=x_{0}+\sum_{i=1}^{d}z_{p}^{i}\sqrt{\gamma\lambda_{i}h}\,v_{i}.$$

These formulas and $\gamma=\frac{3}{2}$ are obtained from the 1D construction in Section 3. The weight for the Gaussian with index $p$ is

$$w_{p}=\prod_{i=1}^{d}w_{z_{p}^{i}},$$

with the parameters given by

$$w_{0}=\frac{2}{3},\qquad w_{1}=w_{-1}=\frac{1}{6}.$$

Remark 4.1.

Another natural idea is to place the initial points at $x_{0},~{}x_{0}\pm\sqrt{\gamma\lambda_{i}h}v_{i}$ and there are $2d+1$ such points. After some attempts, we found that this strategy hardly works when $d$ is large.

With these initial positions and weights, we can easily generalize our Gaussian mixture constructions for $d=1$ to arbitrary dimensions.
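To make the cost concrete, the sketch below draws an index $p$ and its center $y_{p}$ without enumerating $P$. It assumes that the weights factorize over coordinates, $w_{p}=\prod_{i}w_{z_{p}^{i}}$ with $w_{0}=2/3$ and $w_{\pm 1}=1/6$ (consistent with the identity $2w_{1}=\frac{1}{3}$ used in Section 4.2), and that $y_{p}=x_{0}+\sum_{i}z_{p}^{i}\sqrt{\gamma\lambda_{i}h}\,v_{i}$. The function name and signature are illustrative only.

```python
import numpy as np

GAMMA = 1.5                          # gamma = 3/2 from the 1D construction
W = np.array([1 / 6, 2 / 3, 1 / 6])  # assumed weights for z = -1, 0, +1


def sample_center(x0, lam, V, h, rng):
    """Draw z in {-1,0,1}^d coordinate by coordinate and return (z, y_p).

    lam: eigenvalues of Lambda(x0); V: orthonormal eigenvectors as columns.
    Only d categorical draws are needed, not a draw from 3^d indices.
    """
    d = len(lam)
    z = rng.choice([-1, 0, 1], size=d, p=W)
    y = x0 + V @ (z * np.sqrt(GAMMA * lam * h))
    return z, y


rng = np.random.default_rng(0)
x0 = np.zeros(3)
lam = np.array([1.0, 2.0, 0.5])      # made-up eigenvalues
V = np.eye(3)                        # made-up eigenvectors
z, y = sample_center(x0, lam, V, h=0.01, rng=rng)
```

Because the assumed weights are a product over coordinates, sampling each $z_{p}^{i}$ independently yields the same distribution on $P$ as sampling $p$ directly.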

4.1 The ODE approach for multi-dimensions.

Following the construction in the 1D case, we consider $m_{p}(h)$ and $S_{p}(h)$ for $p\in P$ given by

[TABLE]

where

[TABLE]

Thanks to imposing $S_{01}=S_{11}$ in (3.35), we obtain the simple expression (4.47). The algorithm is summarized as Algorithm 2.

Remark 4.2.

Our algorithm requires a matrix factorization at every time step, which is the most computationally costly step. However, as $\Lambda(X(t))$ does not change much between consecutive time steps, one could use the matrix of $v_{i}$ ’s as the preconditioner for next step’s computation, which will significantly reduce the computational cost in high dimensions.

We now establish the main result for multi-dimensions:

Theorem 4.3.

Suppose Assumptions 2.1-2.1 are satisfied. Then there exists $h_{0}>0$ such that when $h<h_{0}$ :

(i) $S_{p}(h)$ is positive definite for all $p\in P$ and for any initial position $x_{0}$ .

(ii) for any test function $\phi\in C_{b}^{\infty}$ , there exists a constant $C$ depending on the $C^{6}$ norms of $\phi,b,\sigma$ only, such that

[TABLE]

Consequently, the Gaussian mixture Algorithm 2 is a weak second-order scheme to (1.1).

To prove this theorem, we first present a useful lemma, the proof of which is deferred to Appendix C:

Lemma 4.4.

For a function $\phi\in C_{b}^{\infty}$ , we have

[TABLE]

Here we use shorthand notation $D_{i}:=D_{v_{i}}$ .

Remark 4.5.

Here $D_{v_{i}}\phi(x):=v_{i}(x_{0})\cdot\nabla\phi(x)$ , so we have $D_{v_{i}}^{2}\phi=v_{i}\cdot\nabla(v_{i}\cdot\nabla\phi(x))=v_{i}\otimes v_{i}:\nabla^{2}\phi(x)$ . The function inside is $v_{i}(x_{0})\cdot\nabla\phi(x)$ . In other words, we allow $\phi$ to change for $x\neq x_{0}$ but $v_{i}$ is frozen to be its value at $x_{0}$ .
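The identity $D_{v}^{2}\phi=v\otimes v:\nabla^{2}\phi$ for a frozen direction $v$ can be checked numerically. The test function below is a hypothetical choice made only for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def second_directional(hess, v):
    """D_v^2 phi = (v outer v) : Hessian = v^T H v for a fixed direction v."""
    return float(v @ hess @ v)


# Hypothetical test function phi(x) = x_1^2 x_2 + x_2^3 and its Hessian.
def phi(x):
    return x[0] ** 2 * x[1] + x[1] ** 3


def hess_phi(x):
    return np.array([[2 * x[1], 2 * x[0]],
                     [2 * x[0], 6 * x[1]]])


x = np.array([0.3, -0.7])
v = np.array([1.0, 1.0]) / np.sqrt(2.0)   # plays the role of the frozen v_i(x_0)

exact = second_directional(hess_phi(x), v)
eps = 1e-4
# Central second difference along v approximates D_v^2 phi(x).
fd = (phi(x + eps * v) - 2 * phi(x) + phi(x - eps * v)) / eps ** 2
```

The direction `v` is held fixed while `x` varies, which mirrors the convention in the remark that $v_{i}$ is frozen at $x_{0}$.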

Proof 4.6.

(Proof of Theorem 4.3)

(i). To prove this claim, we find that for all $p\in P$ ,

[TABLE]

Hence,

[TABLE]

Recall that we use $\lambda(M)$ to represent the set of eigenvalues of matrix $M$ . If $h$ is sufficiently small, $\min\lambda(G(m_{p}(t)))$ is positive for all $p\in P$ for $t\leq h$ . By Equation (4.46), $\min\lambda(S_{p}(h))$ is positive for all $p$ .

(ii). Noting that $\partial_{ijkl}\phi$ is symmetric in all of its indices, we find (the Einstein summation convention is used)

[TABLE]

Using Lemma 4.4, we are able to compute the sums. For example, we find:

[TABLE]

Here, we used (4.47) and identities like

[TABLE]

Noting

[TABLE]

we have after some computation:

[TABLE]

where

[TABLE]

and

[TABLE]

Again, by the eigen-decomposition $\Lambda=\sum_{i=1}^{d}\lambda_{i}v_{i}v_{i}^{T}$ , we find

[TABLE]

Similarly, we find

[TABLE]

which equals $B$ . Together with (i), Corollary 2.5 gives the claim.

Remark 4.7.

For the multi-dimensional Algorithm 2, though we have exponentially many Gaussians, the complexity is just linear in $d$ . In fact, one needs the number of random variables to grow at least linearly in $d$ to get a weak second-order scheme for general SDEs [18, 28, 19].

4.2 Efficiency of the Monte Carlo method.

For the multi-dimensional Algorithm 2, though we have exponentially many Gaussians, we see that the complexity is just linear in $d$ , which means our algorithm has good computational efficiency. Since we only care about the distributions, we often use Monte Carlo methods [17, 27] to generate a large number of samples and use the empirical measure to approximate the probability measure. As we know, the error and efficiency of Monte Carlo methods depend on the variance. The variance of the Euler-Maruyama scheme (3.15) is $\Lambda(x_{n})h$ , where $x_{n}$ is the value of the scheme at $t_{n}$ . For the same reason, if we can show that the variance of Algorithm 2 after one step is proportional to $h$ , then the Monte Carlo method based on our algorithm is as efficient as the Monte Carlo method based on the Euler-Maruyama method (3.15).
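The statement that one Euler-Maruyama step has covariance $\Lambda(x_{n})h$ can be checked empirically. The sketch below assumes NumPy; the drift `b` and diffusion coefficient `sigma` are hypothetical and chosen only for illustration.

```python
import numpy as np


def em_one_step_samples(x_n, b, sigma, h, n_samples, rng):
    """Draw n_samples of X^{n+1} given X^n = x_n using Euler-Maruyama."""
    d = x_n.shape[0]
    dW = rng.normal(scale=np.sqrt(h), size=(n_samples, d))
    # Each row of dW is one Brownian increment; dW @ sigma^T applies sigma to it.
    return x_n + b(x_n) * h + dW @ sigma(x_n).T


# Hypothetical coefficients, not taken from the paper.
def b(x):
    return -x


def sigma(x):
    return np.array([[1.0, 0.0],
                     [0.5, 1.0 + x[0] ** 2]])


rng = np.random.default_rng(0)
x_n = np.array([1.0, 2.0])
h = 0.01
samples = em_one_step_samples(x_n, b, sigma, h, 200_000, rng)
cov = np.cov(samples, rowvar=False)        # empirical covariance
Lam_h = sigma(x_n) @ sigma(x_n).T * h      # Lambda(x_n) h
```

Up to sampling noise, `cov` and `Lam_h` agree entry by entry.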

In this section, we compute the second moment

[TABLE]

and show that it is indeed $O(h)$ even though we have exponentially many Gaussians. For notational convenience, we define the matrix norm

[TABLE]

where $|\Lambda(x)|=\sum_{i=1}^{d}|\lambda_{i}(x)|v_{i}(x)v_{i}^{T}(x)$ if $\Lambda(x)$ is given by (4.41).

Proposition 4.8.

There exists $h_{0}>0$ such that when $h<h_{0}$

[TABLE]

for Algorithm 2.

Proof 4.9.

By (4.42), direct computation shows that

[TABLE]

The first inequality in (4.55) follows from (4.49) and (4.50):

[TABLE]

For the last equality, since the $v_{i}$ ’s are orthonormal, we have

[TABLE]

and the last equality in (4.55) follows since $\sum_{p\in P}w_{p}|z_{p}^{i}|^{2}=2w_{1}=\frac{1}{3}$ (see (4.80)). Now, noticing $\operatorname{tr}(G(m(t)))\leq\frac{3}{2}\|\Lambda\|_{\operatorname{tr}}$ , we obtain

[TABLE]

For Algorithm 2, when $h$ is small enough, we have

[TABLE]

Since $\|b\|_{\infty}^{2}h^{2}$ is of higher order, the claim follows.
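As a sanity check on the part of the second moment that comes from the spread of the initial centers, the sketch below enumerates all $3^{d}$ indices for a small $d$ and compares with the closed form $\gamma\cdot 2w_{1}\cdot h\operatorname{tr}\Lambda=\frac{1}{2}h\operatorname{tr}\Lambda$. It assumes product weights with $w_{0}=2/3$, $w_{\pm 1}=1/6$ and centers $y_{p}=x_{0}+\sum_{i}z_{p}^{i}\sqrt{\gamma\lambda_{i}h}\,v_{i}$; the eigenvalues are made up, and the contribution of the covariances $S_{p}(h)$ is not included.

```python
import itertools
import math

GAMMA = 1.5
W = {-1: 1 / 6, 0: 2 / 3, 1: 1 / 6}   # assumed product weights


def center_second_moment(lam, h):
    """Sum over all p in {-1,0,1}^d of w_p |y_p - x_0|^2 by brute force.

    Uses orthonormality of the v_i:
    |y_p - x_0|^2 = gamma * h * sum_i (z_p^i)^2 * lam_i.
    """
    total = 0.0
    for z in itertools.product((-1, 0, 1), repeat=len(lam)):
        w_p = math.prod(W[zi] for zi in z)
        total += w_p * GAMMA * h * sum(zi * zi * li for zi, li in zip(z, lam))
    return total


lam = [0.5, 1.0, 2.0, 3.5]             # made-up eigenvalues of Lambda(x_0)
h = 0.01
brute = center_second_moment(lam, h)   # 3^4 = 81 terms
closed = 0.5 * h * sum(lam)            # gamma * 2*w_1 * h * tr(Lambda)
```

The brute-force sum costs $3^{d}$ terms and is only practical for small $d$; the closed form shows that the result depends on $d$ only through $\operatorname{tr}\Lambda$.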

5 Numerical experiments

In this section, we apply the algorithm to SDE (1.1) in the Itô sense with different choices of $b$ and $\sigma$ . Note that Assumption 2.1, $\sigma,b\in C_{b}^{m}$ , is only imposed for convenience of the theoretical analysis. For a diffusion process starting at $x_{0}$ , within a finite time $T$ the probability density is concentrated in a finite domain, and the behavior of $b$ and $\sigma$ far away is not important. Hence, in the simulations here, we may use unbounded $b$ and $\sigma$ . We also check how the algorithm behaves if $\Lambda$ has degenerate points (i.e. points where $\Lambda$ is only positive semi-definite).

5.1 A 1D example with regular $\sigma$ .

This example is designed to test the correctness of Algorithm 1. The dimension is $d=1$ and $\sigma^{2}$ is uniformly bounded from below. We will also plot the empirical distribution generated by our algorithm to compare with the one generated by Euler-Maruyama scheme (3.15).

The SDE we consider is as follows:

[TABLE]

The diffusion coefficient $\sigma(x)=\sqrt{x^{2}+4}$ is bounded below uniformly so that there is no degenerate point.

To test the correctness of our algorithm, we use the test function $\phi(x)=x^{2}$ and define the relative error as

[TABLE]

where $X^{(k)}=\{X^{(k),n}\}_{n\geq 0}$ is the sequence generated by the numerical algorithm in the $k$ -th experiment. Hence, $X^{(k)}$ is a sample path. The exact expectation $\mathbb{E}_{x_{0}}X^{2}(T)$ , by Itô’s formula [23, Chap. 4], is given by

[TABLE]

In Figure 1, we plot the results of the simulation for $X(0)=2,\lambda=-2$ and $T=2$ . Each error is computed using $N=10^{8}$ trajectories. The “error bars” are obtained by chopping all samples into $10$ slices, with each slice containing $10^{7}$ trajectories. We then compute the relative error (5.59) in each slice, denoted by $E^{(m)}$ ( $1\leq m\leq 10$ ). We find the standard deviation $\sigma_{E}$ for the data $\{E^{(m)}\}_{m=1}^{10}$ , and use $[E-1.65\sigma_{E},E+1.65\sigma_{E}]$ as our confidence interval. We find that our Gaussian mixture method gives weak second-order accuracy.
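The slicing procedure for the error bars can be sketched as below. Two points are assumptions rather than statements from the text: the relative error is taken to be $|\text{sample mean}-\text{exact}|/|\text{exact}|$, and $\sigma_{E}$ is computed as the sample standard deviation of the per-slice errors. The function name is hypothetical.

```python
import statistics


def sliced_confidence_interval(values, exact, n_slices=10, z=1.65):
    """Relative error of the sample mean and a slice-based interval.

    values: per-trajectory evaluations phi(X^{(k)}(T)); exact: reference
    expectation. Returns (E, E - z*sigma_E, E + z*sigma_E), where sigma_E is
    the standard deviation of the per-slice relative errors.
    """
    size = len(values) // n_slices
    n = size * n_slices                      # drop any remainder

    def rel(mean):
        return abs(mean - exact) / abs(exact)

    E = rel(sum(values[:n]) / n)
    slice_errors = [rel(sum(values[m * size:(m + 1) * size]) / size)
                    for m in range(n_slices)]
    sigma_E = statistics.stdev(slice_errors)
    return E, E - z * sigma_E, E + z * sigma_E
```

With $N=10^{8}$ trajectories and 10 slices, each slice holds $10^{7}$ values, matching the setup described above.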

To confirm that the Gaussian mixture method gives the desired distribution, we now plot the empirical distribution in Figure 2 by histcounts. All the empirical densities are obtained by using $N=10^{6}$ points, and the initial condition $X(0)=2$ . We take the results obtained from Euler-Maruyama (E-M) scheme (3.15) with $\Delta t=h^{3}$ as the reference density (green curves in Figure 2 (a) and (b)).

In Figure 2 (a), we plot the empirical densities obtained by Algorithm 1 (red) and Euler-Maruyama (black) after one step with step size $\Delta t=h=1/32$ . At time $t=h$ , the reference density (green curve) has a peak at $x_{c}\approx 1.79$ while its mean is located at the black dot ( $\bar{x}\approx 1.88$ ). We also calculated the empirical skewness $\gamma_{1}=\mathbb{E}\Big(\dfrac{X(h)-\bar{x}}{\sigma}\Big)^{3}\approx 0.3695$ (in this formula only, $\sigma$ denotes the standard deviation of the reference density), and the kurtosis $K=\mathbb{E}\Big(\dfrac{X(h)-\bar{x}}{\sigma}\Big)^{4}\approx 3.3078$ , while the accurate skewness and kurtosis are 0.3718 and 3.3153 respectively. The skewness and kurtosis for a Gaussian (Euler-Maruyama method) are 0 and 3 respectively. For Algorithm 1, these two numbers are 0.3717 and 3.1888.
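A minimal sketch of computing empirical skewness and kurtosis from samples. Unlike the text, which normalises by the mean and spread of the reference density, this hypothetical helper normalises by the mean and standard deviation of the samples themselves.

```python
import math
import random


def skewness_kurtosis(samples):
    """Empirical skewness and (non-excess) kurtosis, normalised by the
    mean and standard deviation of the samples themselves."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    std = math.sqrt(var)
    skew = sum(((x - mean) / std) ** 3 for x in samples) / n
    kurt = sum(((x - mean) / std) ** 4 for x in samples) / n
    return skew, kurt


rng = random.Random(0)
gauss = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
skew_g, kurt_g = skewness_kurtosis(gauss)   # close to 0 and 3 for a Gaussian
```

A single Gaussian step, as in Euler-Maruyama, therefore cannot reproduce the nonzero skewness reported above, whereas a mixture of Gaussians can.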

In Figure 2 (b), we plot the empirical densities obtained by Algorithm 1 (red) and Euler-Maruyama (black) at time $t=1$ with step size $\Delta t=h=1/32$ . We find that the density given by our weak second-order algorithm almost coincides with the reference density, while the one given by E-M is worse.

To sum up, for this example (5.58), the Gaussian mixture method has weak second-order accuracy and captures the distribution more accurately than the Euler-Maruyama scheme.

5.2 1D Geometric Brownian Motion.

In this example, we consider the 1D Geometric Brownian Motion

$$dX=\lambda X\,dt+\sigma X\,dW,\qquad X(0)=x_{0},$$

which has a degenerate diffusion coefficient

[TABLE]

Again, we test the weak accuracy with test function $\phi(x)=x^{2}$ and define the weak error

[TABLE]

By Itô calculus, it is straightforward to find

$$\mathbb{E}_{x_{0}}X^{2}(T)=x_{0}^{2}\,e^{(2\lambda+\sigma^{2})T}.$$

In Figure 3, we plot the weak error of simulations for $\lambda=-0.8,\sigma=0.85,x_{0}=5,T=1$ with $N=2\times 10^{8}$ . The error bars are computed by slicing the samples into $5$ pieces of equal size, and the method is the same as in Section 5.1 (confidence interval is $[E-1.65\sigma_{E},E+1.65\sigma_{E}]$ ).
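For comparison, the weak error of the Euler-Maruyama baseline on this example can be computed without any sampling, because the second moment of its iterates has a closed form. The sketch assumes the Geometric Brownian Motion is written as $dX=\lambda X\,dt+\sigma X\,dW$ and uses a relative error; it does not implement Algorithm 1, and the function names are hypothetical.

```python
import math


def gbm_second_moment_exact(x0, lam, sigma, T):
    """E[X(T)^2] for dX = lam*X dt + sigma*X dW (Ito), X(0) = x0."""
    return x0 ** 2 * math.exp((2 * lam + sigma ** 2) * T)


def gbm_second_moment_em(x0, lam, sigma, T, h):
    """E[(X^N)^2] for Euler-Maruyama with step h, in closed form: each
    step multiplies the second moment by (1 + lam*h)^2 + sigma^2*h."""
    n_steps = round(T / h)
    return x0 ** 2 * ((1 + lam * h) ** 2 + sigma ** 2 * h) ** n_steps


x0, lam, sigma, T = 5.0, -0.8, 0.85, 1.0
exact = gbm_second_moment_exact(x0, lam, sigma, T)
errors = {h: abs(gbm_second_moment_em(x0, lam, sigma, T, h) - exact) / exact
          for h in (1 / 16, 1 / 32)}
ratio = errors[1 / 16] / errors[1 / 32]   # near 2 for a weak first-order scheme
```

Halving $h$ roughly halves the Euler-Maruyama error here, whereas a weak second-order scheme would reduce it by a factor close to four once $h$ is small enough.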

For the tested parameters our Gaussian mixture method still works and is of weak second order. For this example, the error of Algorithm 1 scales like $h^{2}$ only when $h$ becomes small. This can be seen from the kink in Figure 3, where only the left-most two points seem to line up with the order $h^{2}$ line. After further investigation, we find that for the first three $h$ values ( $h=0.25,0.125,0.0833$ ), there is roughly a $1/6$ chance that the computed $S(h)$ from the ODE is negative. For smaller values of $h$ ( $h=0.0625,0.05$ ), $S(h)$ is always nonnegative for the samples we have. In light of this, we expect the second-order behavior of our approach to appear for $h\lesssim 0.0625$ . When there is a reasonable chance that $\sigma^{2}$ is degenerate, our approach seems to lose second-order accuracy.

5.3 A 2D example.

In this example, we consider a 2D SDE, which is a modification of the first example in [2]:

[TABLE]

where $W_{1}(t)$ and $W_{2}(t)$ are independent standard Brownian motions, and $\sigma$ is a positive constant. The purpose here is to show that our Gaussian mixture method for multi-dimensions (Algorithm 2) works for $\Lambda(x)$ that has varying eigen-directions.

We consider the solution of (5.60) at $T=1$ with initial condition $X_{1}(0)=X_{2}(0)=1$ and $\sigma=0.1$ . We will use the test function $\phi(x)=x_{2}^{2}$ to check the weak accuracy. By Itô’s formula,

[TABLE]

As before, the relative error is computed as

[TABLE]

In Figure 4, we plot the errors with $N=2\times 10^{8}$ and again slice the samples into $10$ equal pieces for the “error bar” calculation (the confidence interval is $[E-1.65\sigma_{E},E+1.65\sigma_{E}]$ , where $\sigma_{E}$ is the standard deviation of these 10 values). We find that our Gaussian mixture method gives weak second-order accuracy for this 2D example as well.

5.4 A 6D Example.

According to Algorithm 2, the proposed Gaussian mixture method depends explicitly on the dimension, so it is natural to ask what happens as the dimension grows. In this example, we look at a 6D problem and verify that our algorithm is still of weak second order.

The SDE we consider is given by:

[TABLE]

We take $\sigma=0.7$ and check the solution at $t=2$ . The initial condition we use is $X_{i}(0)=1$ for all $1\leq i\leq 6$ . The test function we use is

[TABLE]

By Itô’s formula

[TABLE]

The relative error is again defined as

[TABLE]

For the following log-log error plot (Figure 5), we choose $h=\dfrac{1}{4k}$ , $1\leq k\leq 5$ . The sample size is $N=2\times 10^{8}$ for $h\geq\dfrac{1}{16}$ and $5\times 10^{8}$ for $h=\dfrac{1}{20}$ , chopped into $10$ equal slices to produce the error bars with confidence interval $[E-1.65\sigma_{E},E+1.65\sigma_{E}]$ ( $\sigma_{E}$ is again the standard deviation of these $10$ data). The plot demonstrates that the scheme works in high dimensions as well.
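The slope of such a log-log plot can be estimated by a least-squares fit. The errors below are synthetic values that are exactly proportional to $h^{2}$, made up for illustration; they are not the paper's data, and the helper name is hypothetical.

```python
import math


def estimate_order(hs, errors):
    """Least-squares slope of log(error) against log(h)."""
    xs = [math.log(h) for h in hs]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


hs = [1 / (4 * k) for k in range(1, 6)]   # h = 1/(4k), 1 <= k <= 5
synthetic = [0.3 * h ** 2 for h in hs]    # made-up errors, exactly O(h^2)
order = estimate_order(hs, synthetic)
```

With real Monte Carlo errors the fitted slope is only meaningful when the error bars are small relative to the errors themselves.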

Acknowledgements

The work of L. Li is partially sponsored by NSFC 11901389,11971314, and Shanghai Sailing Program 19YF1421300. The work of J. Lu and L. Wang is supported in part by National Science Foundation under grant DMS-1454939, while J. Mattingly is supported in part by National Science Foundation under grant DMS-1613337.

Appendix A Proof of Proposition 2.4

Proof A.1.

Let us fix $\phi\in C_{b}^{2(r+1)}$ and define

[TABLE]

By the Markov property, we have

[TABLE]

Similarly, we have

[TABLE]

Note that $u$ satisfies the backward Kolmogorov equation

[TABLE]

with initial condition

[TABLE]

By standard parabolic PDE theory, for $b,\sigma\in C_{b}^{2(r+1)}$ , we have

[TABLE]

By Equations (1.65) and (1.66), we have for all $x\in\mathbb{R}^{d}$ that

[TABLE]

Define

[TABLE]

by the assumption of Proposition 2.4 on local truncation error and Equation (1.67) we have

[TABLE]

where $C=\sup_{0\leq t\leq T}\rho(\|u(\cdot,t)\|_{C^{2(r+1)}})$ . This further implies that

[TABLE]

Appendix B Proof of Proposition 3.1

Proof B.1.

For notational convenience, we drop the dependence on $x_{0}$ , so that $m(h)$ means $m(h,x_{0})$ , $m_{1}$ means $m_{1}(x_{0})$ , and so on. Denote $\mathcal{L}_{1}:=(m(h)-x_{0})\partial_{x}+\frac{1}{2}S(h)\partial_{xx}$ , and we have

[TABLE]

It follows that

[TABLE]

Here, $B$ and $C$ are the coefficients of $h$ and $h^{2}$ :

[TABLE]

To satisfy the condition (2.14), we need to have

[TABLE]

Recall that $\mathcal{L}=b\,\partial_{x}+\frac{1}{2}\Lambda(x)\partial_{x}^{2}$ , so $B=\mathcal{L}\phi(x_{0})$ requires that for any sufficiently smooth $\phi$ ,

[TABLE]

which requires

[TABLE]

On the other hand, the requirement $C=\frac{1}{2}\mathcal{L}^{2}\phi(x_{0})$ can be expanded as

[TABLE]

This is impossible in general. For example, the coefficient of $\phi^{\prime\prime\prime}$ on the right-hand side is $\frac{1}{2}m_{1}S_{0}$ , that is, $\frac{1}{2}b(x_{0})\Lambda(x_{0})$ , but the one on the left-hand side is $\frac{1}{2}b(x_{0})\Lambda(x_{0})+\frac{1}{4}\Lambda(x_{0})\Lambda^{\prime}(x_{0})$ . They cannot balance unless the diffusion matrix $\Lambda(x)$ is constant.

Appendix C Proof of Lemma 4.4

Proof C.1.

In this proof, we will again use $R$ to denote a generic function that can depend on the $C^{6}$ norm of the test function but can be bounded uniformly in $x_{0}$ and $h$ . However, its concrete meaning can change from line to line.

Clearly, due to the symmetry, we only need to prove that for all $\phi\in C_{b}^{\infty}$ ,

[TABLE]

Without loss of generality, we set $x_{0}=0.$ With Equation (4.44), it is convenient to denote the left hand side of (3.71) as

[TABLE]

If $d=1$ , the claim follows from 1D Taylor expansion derived in Section 3. Assume that the claim is valid for all $d=1,2,\ldots,m$ , $m\geq 1$ , and we want to prove for $d=m+1$ . Define $P_{m}$ to be the index set with $d=m$ . We find by definition:

[TABLE]

For each $p$ , we do Taylor expansion of $\phi$ about $\sum_{i=1}^{m}z_{p}^{i}\sqrt{\gamma\lambda_{i}h}v_{i}$ and have

[TABLE]

By the induction hypothesis, we have

[TABLE]

Arranging the terms on the right hand side, we find the claim is also true for $d=m+1$ .

Appendix D A variance construction approach

D.1 The variance construction method for one dimension.

Motivated by (3.35) and (3.36), we can construct

[TABLE]

We can verify that the constraints are all satisfied. The third term is added to ensure that $S_{0}$ is non-negative. Compared with the ODE flow method, the drawback of this method is that it involves higher-order spatial derivatives, such as $\Lambda^{\prime\prime}$ . In practice, one may approximate it by the finite difference $\frac{1}{h^{2}}\bigl(\Lambda(x_{0}+h)-2\Lambda(x_{0})+\Lambda(x_{0}-h)\bigr)$ .
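The finite difference above is straightforward to code. As an illustration, the sketch uses $\Lambda(x)=x^{2}+4$ , which corresponds to the diffusion coefficient $\sigma(x)=\sqrt{x^{2}+4}$ of Section 5.1; the helper name is hypothetical.

```python
def second_difference(Lam, x0, h):
    """Central difference (Lam(x0+h) - 2 Lam(x0) + Lam(x0-h)) / h^2."""
    return (Lam(x0 + h) - 2.0 * Lam(x0) + Lam(x0 - h)) / h ** 2


def Lam(x):
    # Lambda(x) = sigma(x)^2 with sigma(x) = sqrt(x^2 + 4).
    return x * x + 4.0


approx = second_difference(Lam, x0=2.0, h=1 / 32)   # exact second derivative is 2
```

For a quadratic $\Lambda$ the central difference has no truncation error; in general its error is $O(h^{2})$ for smooth $\Lambda$ .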

Remark D.1.

The third correction term can be thrown away if $h$ is small enough. For example,

[TABLE]

This construction gives the following Algorithm 3 to generate $x_{n+1}$ given $X^{n}=x_{n}$ .

One can verify that the requirements in (3.29) are satisfied, which gives the following theorem:

Theorem D.2.

Let $d=1$ . Suppose Assumptions 2.1-2.1 hold. Then Algorithm 3 is a weak second-order scheme for the 1D diffusion process (1.1).

The proof is identical to that of Theorem 3.5 and is omitted here.

D.2 The variance construction method for multi-dimension.

As before, one may want to guarantee that $S_{p}(h)$ is positive definite for $p\in P$ . We now present a variance construction method for $S_{p}(h)$ for multi-dimension. Consider that $m_{p}(h)$ and $S_{p}(h)$ are given by

[TABLE]

where $\sum_{i=1}^{d}(1-|z_{p}^{i}|)\lambda_{i}D_{i}^{2}\Lambda$ can be approximated by finite difference. In particular, if we set $\theta=\sum_{i=1}^{d}\sqrt{(1-|z_{p}^{i}|)\lambda_{i}}v_{i}$ , then

[TABLE]

$F_{p}(h)h^{3}$ is added to ensure that $S_{p}$ is positive semi-definite. Let the first two terms in $S_{p}$ be $hA_{p}$ and $h^{2}B_{p}$ , where $A_{p}$ is positive definite and thus invertible. Then, we have

[TABLE]

We propose Algorithm 4 to generate $x_{n+1}$ given $X^{n}=x_{n}$ .

Remark D.3.

Notice that we need to invert a matrix to get $F(h)$ , which is not desirable when $d$ is large. However, as in the 1D case, if $h$ is small enough, $F(h)h^{3}$ can be thrown away and we can still guarantee positive definiteness.

Theorem D.4.

Suppose Assumptions 2.1-2.1 hold. Then Algorithm 4 is a weak second-order scheme for the multi-dimensional diffusion process (1.1).

Proof D.5.

Again, the idea is to check the conditions in Corollary 2.5. Rather than verifying the conditions directly, we compare the scheme with Algorithm 2, which uses the ODE approach.

Again, we only have to check $X^{1}$ given $X^{0}=x_{0}$ . Let $S_{p}^{o}$ be the covariance matrix obtained following Algorithm 2 at time $h$ , and let $S_{p}^{s}$ be the covariance matrix constructed in this section at time $h$ , for $p\in P$ . Let $\mathbb{E}_{x_{0}}^{s}$ denote the expectation under the process constructed here, and $\mathbb{E}_{x_{0}}^{o}$ the expectation under the process in Algorithm 2.

Consider

[TABLE]

Since the two algorithms only give different covariance matrices, we have by Equation (4.42):

[TABLE]

We denote

[TABLE]

By Equation (4.46) and direct Taylor expansion in $t$ , we find for $p\in P$ ,

[TABLE]

where $K_{p}^{1}(h)$ is a bounded function.

We do expansion on $S_{p}^{s}$ and have

[TABLE]

where $K_{p}^{2}$ is some bounded function. This implies

[TABLE]

Hence, we can replace $m_{p}(h)$ with $y_{p}$ and throw away the terms involving $\partial_{ijkl}\phi$ in (4.77), introducing errors of at most $R(h)h^{3}$ :

[TABLE]

By (4.79) and (4.78), we find

[TABLE]

We note first that

[TABLE]

We justify the second equality with an example. Let $j\in\{\pm 1,0\}$ be the index over the beams in one dimension, and $z_{j}=j$ . Then, when $m=n$ ,

[TABLE]

When $m\neq n$ ,

[TABLE]

Using (4.80), we find that

[TABLE]

Therefore

[TABLE]

However, we know that $E_{1}$ does not contain $h^{5/2}$ terms, and neither does $E_{2}$ because of the symmetry. Hence, $|E|\leq R(h)h^{3}$ , which finishes the proof.

Bibliography

[1] A. Abdulle, D. Cohen, G. Vilmart, and K. C. Zygalakis, High weak order methods for stochastic differential equations based on modified equations, SIAM Journal on Scientific Computing, 34 (2012), pp. A1800–A1823.
[2] D. F. Anderson and J. C. Mattingly, A weak trapezoidal method for a class of stochastic differential equations, Commun. Math. Sci., 9 (2011), pp. 301–318.
[3] C. Archambeau, D. Cornford, M. Opper, and J. Shawe-Taylor, Gaussian process approximations of stochastic differential equations, in Gaussian Processes in Practice, 2007, pp. 1–16.
[4] F. Black and M. Scholes, The pricing of options and corporate liabilities, Journal of Political Economy, 81 (1973), pp. 637–654.
[5] W. T. Coffey and Y. P. Kalmykov, The Langevin equation: with applications to stochastic problems in physics, chemistry and electrical engineering, vol. 27, World Scientific, 2012.
[6] W. E, T. Li, and E. Vanden-Eijnden, Applied stochastic analysis, vol. 199, American Mathematical Soc., 2019.
[7] Y. Feng, L. Li, and J.-G. Liu, Semigroups of stochastic gradient descent and online principal component analysis: properties and diffusion approximations, Communications in Mathematical Sciences, 16 (2018), pp. 777–789.
[8] E. Hille and R. S. Phillips, Functional analysis and semi-groups, vol. 31, American Mathematical Soc., 1996.
