A generic construction for high order approximation schemes of semigroups using random grids

Aurélien Alfonsi; Vlad Bally

arXiv:1905.08548·math.PR·May 22, 2019·Numerische Mathematik


TL;DR

This paper introduces a universal method for constructing high-order approximation schemes for semigroups of linear operators using random grids, applicable to various processes including diffusions and deterministic systems.

Contribution

It develops a general framework for high-order semigroup approximation schemes with random grids, achieving any order of accuracy with controlled complexity and broad applicability.

Findings

01

Achieves arbitrary order of approximation with finite variance estimators.

02

Constructs random grids and coefficients for universal applicability.

03

Demonstrates effectiveness on diffusions, ODEs, and Markov processes.


Tables

Table 1: Empirical standard deviation of $c(\mathcal{A})\Gamma^{\mathcal{A}}_{0}$ for $f(x)=x^{2}$ and $n=5$, on the SDE example described in Subsection 5.3.

$\mathcal{A}$	Standard deviation of $c(\mathcal{A})\Gamma^{\mathcal{A}}_{0}$	Used for approximations of order
$\{\emptyset\}$	$8.8\times 10^{-2}$	$\nu\geq 1$
$\{\emptyset,1\}$	$3.2\times 10^{-2}$	$\nu\geq 2$
$\{\emptyset,1,11\}$	$1.4\times 10^{-2}$	$\nu\geq 3$
$\{\emptyset,1,2\}$	$4.4\times 10^{-3}$	$\nu\geq 3$
$\{\emptyset,1,11,111\}$	$1.0\times 10^{-2}$	$\nu\geq 4$
$\{\emptyset,1,11,2\}$	$1.7\times 10^{-3}$	$\nu\geq 4$
$\{\emptyset,1,2,21\}$	$1.5\times 10^{-3}$	$\nu\geq 4$
$\{\emptyset,1,11,2,21\}$	$5.4\times 10^{-4}$	$\nu\geq 4$
$\{\emptyset,1,2,3\}$	$3.7\times 10^{-4}$	$\nu\geq 4$

Equations

\[P_{t}f=\sum_{i=1}^{r}c_{i}\mathbb{E}(P_{t}^{\Pi_{i}(\omega)}f(x))+O(n^{-\nu})\]

\[dX_{t}=\sum_{j=1}^{d}\sigma_{j}(X_{t})dW_{t}^{j}+b(X_{t})dt,\]

\[X_{(k+1)h}^{n}=X_{kh}^{n}+\sum_{j=1}^{d}\sigma_{j}(X_{kh}^{n})(W_{(k+1)h}^{j}-W_{kh}^{j})+b(X_{kh}^{n})h\]

\[\left|P_{T}f(x)-P_{T}^{n}f(x)\right|\leq\frac{C}{n}\left\|f\right\|_{4,\infty}\]

\[P_{T}f(x)-P_{T}^{n}f(x)=\sum_{k=0}^{n-1}P_{[n-(k+1)]h}(P_{h}-P_{h}^{n})P_{kh}^{n}f(x).\]

\[\left|P_{h}f(x)-P_{h}^{n}f(x)\right|\leq C\left\|f\right\|_{4,\infty}h^{2},\]

\[P_{T}f(x)\]

\[P_{T}f=\sum_{i=1}^{r}c_{i}\mathbb{E}[P_{T}^{\Pi_{\nu}^{i}}f]+O(n^{-\nu}).\]

\[X_{t_{i+1}}^{\Pi}=X_{t_{i}}^{\Pi}+\sum_{j=1}^{d}\sigma_{j}(X_{t_{i}}^{\Pi})(W_{t_{i+1}}^{j}-W_{t_{i}}^{j})+b(X_{t_{i}}^{\Pi})(t_{i+1}-t_{i}).\]

\[\mathbb{E}[f(X_{T})]=\sum_{i=1}^{r}c_{i}\mathbb{E}[f(X_{T}^{\Pi_{\nu}^{i}})]+O(n^{-\nu}),\]

\[\exists C>0,k\in\mathbb{N}^{*},\forall f\in F,\quad\|P_{T}f-\hat{P}_{T}^{\nu,n}f\|_{0}\leq C\|f\|_{k}n^{-\nu}.\]

\[\forall l,k\in\mathbb{N},\exists C>0,\forall f\in F,\quad\|(P_{h_{l}}-Q_{l})f\|_{k}\leq C\|f\|_{k+\beta}h_{l}^{1+\alpha},\]

\[\forall l,m\in\mathbb{N},\exists C>0,\quad\max_{0\leq k\leq n^{l}}\|Q_{l}^{[k]}f\|_{m}+\sup_{t\leq T}\|P_{t}f\|_{m}\leq C\|f\|_{m}.\]

\[\forall f\in F,\quad\left\|P_{T}f-\sum_{i=1}^{r}c_{i}\mathbb{E}[P_{T}^{\Pi_{\nu}^{i}}f]\right\|_{0}\leq C\|f\|_{k}n^{-\nu}.\]

\[\|f\|_{k,\infty}=\sum_{0\leq|\gamma|\leq k}\sup_{x\in\mathbb{R}^{d}}|\partial^{\gamma}f(x)|\]

\[|\gamma|=m\quad\text{and}\quad\partial^{\gamma}=\partial_{x_{\gamma_{1}}}\dots\partial_{x_{\gamma_{m}}}.\]

\[\partial^{\alpha}[f\circ g]=\sum_{|\beta|\leq|\alpha|}(\partial^{\beta}f)(g)P_{\alpha,\beta}(g)\]

\[P_{\alpha,\beta}(g)=\sum c_{\alpha,\beta}((\gamma_{1},j_{1}),\dots,(\gamma_{k},j_{k}))\prod_{i=1}^{k}\partial^{\gamma_{i}}g^{j_{i}},\]

\[P_{t+s}=P_{t}P_{s}.\]

\[h_{l}=\frac{T}{n^{l}},\quad h=h_{1}=\frac{T}{n},\quad h_{0}=T.\]

\[\forall l,k\in\mathbb{N},\exists C>0,\forall f\in C_{b}^{\infty}(\mathbb{R}^{d}),\quad\|(P_{h_{l}}-Q_{l})f\|_{k,\infty}\leq C\|f\|_{k+\beta,\infty}h_{l}^{1+\alpha}\]

\[\Delta_{h_{l}}=P_{h_{l}}-Q_{l}\]

\[P_{h_{l}}^{h_{l}}=Q_{l}\quad\text{and}\quad P_{kh_{l}}^{h_{l}}=Q_{l}\dots Q_{l}\quad(k\text{ times}).\]

\[\forall l,m\in\mathbb{N},\exists C>0,\quad\max_{kh_{l}\leq T}\|P_{kh_{l}}^{h_{l}}f\|_{m,\infty}+\sup_{t\leq T}\|P_{t}f\|_{m,\infty}\leq C\|f\|_{m,\infty}.\]

\[\Delta_{h}f(x)=P_{h}f(x)-P_{h}^{h}f(x)=\int_{0}^{h}\int_{0}^{s}\Big(\mathbb{E}[(L^{2}f)(X_{r}(x))]-\mathbb{E}[(L_{x}^{2}f)(x+b(x)r+\sigma(x)W_{r})]\Big)drds,\]

\[\forall k\in\mathbb{N},\exists C>0,\quad\|\Delta_{h}f\|_{k,\infty}\leq C\|f\|_{k+4,\infty}h^{2}\]

\[P_{T}-P_{T}^{h_{1}}\]

\[P_{T}=P_{T}^{h_{1}}+\sum_{i=1}^{m-1}I_{i}^{h_{1}}(n)+R_{m}^{h_{1}}(n)\]

\[I_{i}^{h}(n)\]

\[R_{m}^{h}(n)\]

\[\|R_{m}^{h}(n)f\|_{\infty}\]


Taxonomy

Topics: Mathematical Approximation and Integration · Stochastic Gradient Optimization Techniques · Stochastic processes and financial applications

Full text

A generic construction for high order approximation schemes of semigroups using random grids

Aurélien Alfonsi (Université Paris-Est, Cermics (ENPC), INRIA, F-77455 Marne-la-Vallée, France) and Vlad Bally (LAMA (UMR CNRS, UPEMLV, UPEC), MathRisk INRIA, Université Paris-Est)

Abstract

Our aim is to construct high order approximation schemes for general semigroups of linear operators $P_{t},t\geq 0$. To do so, we fix a time horizon $T$ and the discretization steps $h_{l}=\frac{T}{n^{l}},l\in\mathbb{N}$, and we suppose that we have at hand some short time approximation operators $Q_{l}$ such that $P_{h_{l}}=Q_{l}+O(h_{l}^{1+\alpha})$ for some $\alpha>0$. Then, we consider random time grids $\Pi(\omega)=\{t_{0}(\omega)=0<t_{1}(\omega)<\dots<t_{m}(\omega)=T\}$ such that for all $1\leq k\leq m$, $t_{k}(\omega)-t_{k-1}(\omega)=h_{l_{k}}$ for some $l_{k}\in\mathbb{N}$, and we associate the approximation discrete semigroup $P_{T}^{\Pi(\omega)}=Q_{l_{m}}\dots Q_{l_{1}}$. Our main result is the following: for any approximation order $\nu$, we can construct random grids $\Pi_{i}(\omega)$ and coefficients $c_{i}$, with $i=1,\dots,r$, such that

\[P_{t}f=\sum_{i=1}^{r}c_{i}\mathbb{E}(P_{t}^{\Pi_{i}(\omega)}f(x))+O(n^{-\nu})\]

where the expectation is taken with respect to the random grids $\Pi_{i}(\omega)$. Besides, $\textup{Card}(\Pi_{i}(\omega))=O(n)$ and the complexity of the algorithm is of order $n$, for any order of approximation $\nu$. The standard example concerns diffusion processes, using the Euler approximation for $Q_{l}$. In this particular case and under suitable conditions, we are able to gather the terms in order to produce an estimator of $P_{t}f$ with finite variance. However, an important feature of our approach is its universality, in the sense that it works for every general semigroup $P_{t}$ and approximations. Besides, approximation schemes sharing the same $\alpha$ lead to the same random grids $\Pi_{i}$ and coefficients $c_{i}$. Numerical illustrations are given for ordinary differential equations, piecewise deterministic Markov processes and diffusions.

Keywords: approximation schemes, random grids, parametrix, Monte-Carlo methods.

AMS: 60H35, 65C30, 65C05, 65C20

1 Introduction

We consider a semigroup of linear operators $(P_{t},t\geq 0)$ and we want to construct high order approximation schemes based on some random grids. Before presenting our general result, we would like to present the popular example of diffusion processes and of approximation schemes of Euler type. Consider the diffusion process

\[dX_{t}=\sum_{j=1}^{d}\sigma_{j}(X_{t})dW_{t}^{j}+b(X_{t})dt, \tag{1}\]

where $W$ is a $d$ -dimensional Brownian motion and $\sigma_{j},b:\mathbb{R}^{d}\to\mathbb{R}^{d}$ are smooth vector fields. Our aim is to construct an approximation scheme for the semigroup $P_{t}f(x)=\mathbb{E}(f(X_{t}(x)))$ , where $X_{t}(x)$ is the diffusion process starting from $x$ . Given the time horizon $T>0$ and the time step $h=\frac{T}{n}$ one constructs the Euler scheme of step $h$ by

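\[X_{(k+1)h}^{n}=X_{kh}^{n}+\sum_{j=1}^{d}\sigma_{j}(X_{kh}^{n})(W_{(k+1)h}^{j}-W_{kh}^{j})+b(X_{kh}^{n})h.\]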
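The Euler recursion above, combined with a Monte Carlo average of $f$ at the final time, can be sketched as follows in dimension one. This is only an illustration: the function names, the test coefficients $b(x)=-x$, $\sigma(x)=0.5$ and the sample size are assumptions made here, not choices taken from the paper.

```python
import math
import random

def euler_path_end(x0, b, sigma, T, n, rng):
    """Return X^n_T for the one-dimensional Euler scheme with step h = T / n."""
    h = T / n
    x = x0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(h))  # Brownian increment W_{(k+1)h} - W_{kh}
        x = x + sigma(x) * dw + b(x) * h
    return x

def euler_semigroup(f, x0, b, sigma, T, n, n_samples, seed=0):
    """Monte Carlo estimate of E[f(X^n_T)] started from x0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += f(euler_path_end(x0, b, sigma, T, n, rng))
    return total / n_samples

# Hypothetical test case: dX = -X dt + 0.5 dW, X_0 = 1, f(x) = x.
# For this linear drift the Euler mean is (1 - h)^n exactly.
estimate = euler_semigroup(lambda x: x, 1.0, lambda x: -x, lambda x: 0.5,
                           T=1.0, n=20, n_samples=20000)
```

In this linear test case the estimate can be compared with the closed-form Euler mean $(1-h)^{n}$, which separates the Monte Carlo error from the discretization error.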

Then one constructs the approximation semigroup $P_{t}^{n}f(x)=\mathbb{E}(f(X_{kh}^{n}(x)))$ for $kh\leq t<(k+1)h$ . It is well known that

\[\left|P_{T}f(x)-P_{T}^{n}f(x)\right|\leq\frac{C}{n}\left\|f\right\|_{4,\infty} \tag{2}\]

where $\left\|f\right\|_{4,\infty}$ is the supremum norm of $f$ and its derivatives up to order four. The proof is based on the Lindeberg method (or Duhamel’s principle):

\[P_{T}f(x)-P_{T}^{n}f(x)=\sum_{k=0}^{n-1}P_{[n-(k+1)]h}(P_{h}-P_{h}^{n})P_{kh}^{n}f(x). \tag{3}\]

Since

\[\left|P_{h}f(x)-P_{h}^{n}f(x)\right|\leq C\left\|f\right\|_{4,\infty}h^{2}, \tag{4}\]

the above inequality gives (2). If we want to go further we develop $P_{T-(k+1)h}=P_{[n-(k+1)]h}$ as well and we obtain

[TABLE]

The last term is of order $n^{-2}$, which gives an error of order two if we keep only the first two terms. One may continue and go further in the development: one develops $P_{T-(k_{2}+1)h}$ and so on. This is similar to the development made in the classical parametrix method. But now a problem appears: how to compute $P_{h}-P_{h}^{n}$ in the second term? In the classical parametrix method, one uses an integration by parts formula based on the infinitesimal operator of the diffusion semigroup. Here we proceed differently: we develop $P_{h}-P_{h}^{n}$ itself in the same way as for $P_{t}-P_{t}^{n}$ in order to improve the order of approximation. So, in our approach we have two simultaneous developments: a "horizontal" one as in (1) and a "vertical" one which is used to refine $P_{h}-P_{h}^{n}$. In both cases we continue the development until we have reached the desired order of approximation $\nu\in\mathbb{N}^{*}$. The control of this two-fold "Taylor expansion" gives rise to a rather intricate combinatorial problem. The natural way to describe it is to use trees constructed by backward recurrence, which is somewhat tricky. A second problem concerns the computation of the sum $\sum_{k=0}^{n-1}$ and more generally of sums of the form $\sum_{0\leq k_{1}<...<k_{m}<n}$. Computing all these terms would make the development (1) inefficient for computational purposes. The idea is then to randomize $0\leq k_{1}<...<k_{m}<n$ by using order statistics and then to use the Monte Carlo method to compute the sum. This is why random grids appear in our schemes.
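For concreteness, the second-order development discussed in this paragraph can be reconstructed by substituting the Lindeberg decomposition of $P_{T}-P_{T}^{n}$ into itself. The display below is a sketch under that assumption, with the shorthand $\Delta=P_{h}-P_{h}^{n}$ and $P^{n}_{jh}$ denoting $j$ Euler steps; it is not a verbatim copy of the authors' formula.

```latex
\begin{align*}
P_{T}f-P_{T}^{n}f
 &=\sum_{k=0}^{n-1}P^{n}_{[n-(k+1)]h}\,\Delta\,P^{n}_{kh}f \\
 &\quad+\sum_{0\leq k_{1}<k_{2}<n}P_{[n-(k_{2}+1)]h}\,\Delta\,
   P^{n}_{(k_{2}-k_{1}-1)h}\,\Delta\,P^{n}_{k_{1}h}f,
 \qquad \Delta=P_{h}-P_{h}^{n}.
\end{align*}
```

The double sum has about $n^{2}/2$ terms, each of order $h^{4}=T^{4}/n^{4}$ when the short-time estimate is applied twice (assuming the intermediate operators are bounded in the relevant norms), which is consistent with the stated order $n^{-2}$.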

These developments are made in Sections 2 and 3. In the end, we show that for any order $\nu\in\mathbb{N}^{*}$, there exist $r\in\mathbb{N}^{*}$, coefficients $c_{1},\dots,c_{r}\in\mathbb{R}$ and random grids $\Pi^{i}_{\nu}(\omega)\subset\{jT/n^{l}:j\leq n^{l}\},i=1,...,r$ with $\textup{Card}(\Pi^{i}_{\nu})\leq C_{\nu}\times n$ for some $C_{\nu}>0$ such that

\[P_{T}f=\sum_{i=1}^{r}c_{i}\mathbb{E}[P_{T}^{\Pi_{\nu}^{i}}f]+O(n^{-\nu}).\]

This is our first main result, stated precisely in Theorem 3.10, where we give an explicit construction of the coefficients $c_{i}$ and of the time-grids $\Pi^{i}_{\nu}$. Thus, the complexity of our algorithm remains of order $r\times C_{\nu}\times n$, for any order $\nu$ of precision. However, $r\times C_{\nu}$ grows quickly with $\nu$, and this has to be accounted for in the complexity analysis when choosing $\nu$.

To use the approximation in practice, one has to work with a probabilistic representation of the semigroup. On the discretization time grid $\Pi=\{t_{0}=0<t_{1}<\dots<t_{m}=T\}$ , we define the corresponding Euler scheme by $X_{0}^{\Pi}=x$ and

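\[X_{t_{i+1}}^{\Pi}=X_{t_{i}}^{\Pi}+\sum_{j=1}^{d}\sigma_{j}(X_{t_{i}}^{\Pi})(W_{t_{i+1}}^{j}-W_{t_{i}}^{j})+b(X_{t_{i}}^{\Pi})(t_{i+1}-t_{i}).\]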

Since the grids are independent of $W$, we then have for smooth functions $f$

\[\mathbb{E}[f(X_{T})]=\sum_{i=1}^{r}c_{i}\mathbb{E}[f(X_{T}^{\Pi_{\nu}^{i}})]+O(n^{-\nu}),\]

and the right hand side gives an estimator that can be computed in $O(n)$ operations. An important issue is then the variance of this estimator. In Section 4, we present a specific organization of the algorithm which yields a finite variance. Theorem 3.10 proposes a particular way to gather the terms, i.e. a partition $\mathcal{I}_{1},\dots,\mathcal{I}_{q}$ of $\{1,\dots,r\}$, and Theorem 4.4 shows that the variance of $\sum_{i\in\mathcal{I}_{q^{\prime}}}c_{i}f(X_{T}^{\Pi^{i}_{\nu}})$ is bounded for all $q^{\prime}$, so that the variance of the estimator is bounded.
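The Euler scheme on a non-uniform grid, as defined above, can be sketched as follows in dimension one. The helper `refined_grid` builds one simple kind of grid (a uniform grid of step $T/n$ with a single interval refined at step $T/n^{2}$); it is a hypothetical illustration of a grid with steps in $\{h_{l}\}$, not the paper's construction of the grids $\Pi^{i}_{\nu}$.

```python
import math
import random

def euler_on_grid(x0, b, sigma, grid, rng):
    """Euler scheme along an arbitrary increasing time grid t_0 < ... < t_m."""
    x = x0
    for t_prev, t_next in zip(grid, grid[1:]):
        dt = t_next - t_prev
        dw = rng.gauss(0.0, math.sqrt(dt))  # increment W_{t_{i+1}} - W_{t_i}
        x = x + sigma(x) * dw + b(x) * dt
    return x

def refined_grid(T, n, k):
    """Uniform grid of step T/n whose k-th interval is refined with step T/n**2."""
    h1, h2 = T / n, T / n**2
    grid = [j * h1 for j in range(k + 1)]               # coarse points up to k*h1
    grid += [k * h1 + j * h2 for j in range(1, n + 1)]  # fine points inside the interval
    grid += [j * h1 for j in range(k + 2, n + 1)]       # remaining coarse points
    return grid

# Hypothetical usage: refine a randomly chosen interval, then run the scheme.
rng = random.Random(1)
grid = refined_grid(1.0, 4, rng.randrange(4))
x_T = euler_on_grid(1.0, lambda x: -x, lambda x: 0.5, grid, rng)
```

Such a grid has $2n$ points, so one path still costs $O(n)$ operations, in line with the complexity stated in the text.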

An important feature of our approach is that it is generic and provides an algorithm that can be used in many contexts. Indeed, in the previous approach the only requirement for the algorithm to work is to have at hand a short time approximation $P_{h}^{n}$ of $P_{h}$ such that (4) holds. The construction of the grids $\Pi_{i}$ and the coefficients $c_{i}$ only depends on this. This leads us to consider the following abstract framework. Let $F$ be a vector space endowed with a family of seminorms $\|\cdot\|_{k}$, $k\in\mathbb{N}$, such that $\|f\|_{k}\leq\|f\|_{k+1}$. We consider a family $(P_{t},t\geq 0)$ of linear operators on $F$ that have the semigroup property, i.e. $P_{0}f=f$ and $P_{t+s}f=P_{t}P_{s}f$ for all $f\in F$, $t,s\geq 0$. Our goal is to approximate $P_{T}f$ and to build, for any $\nu\in\mathbb{N}^{*}$, a linear operator $\hat{P}^{\nu,n}_{T}$ such that

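\[\exists C>0,k\in\mathbb{N}^{*},\forall f\in F,\quad\|P_{T}f-\hat{P}_{T}^{\nu,n}f\|_{0}\leq C\|f\|_{k}n^{-\nu}.\]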

To achieve this goal, we suppose that we have at hand a family of linear operators $Q_{l}:F\to F$, $l\in\mathbb{N}$, such that we have for some $\alpha>0$ and $\beta\in\mathbb{N}$,

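\[\forall l,k\in\mathbb{N},\exists C>0,\forall f\in F,\quad\|(P_{h_{l}}-Q_{l})f\|_{k}\leq C\|f\|_{k+\beta}h_{l}^{1+\alpha}, \tag{$\overline{H_{1}}$}\]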

where $h_{l}=T/n^{l}$. We denote by $Q_{l}^{[0]}$ the identity operator on $F$ and, for $k\in\mathbb{N}^{*}$, by $Q_{l}^{[k]}=Q_{l}^{[k-1]}Q_{l}$ the operator obtained by applying $k$ times the operator $Q_{l}$. We also assume that all these operators satisfy

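\[\forall l,m\in\mathbb{N},\exists C>0,\quad\max_{0\leq k\leq n^{l}}\|Q_{l}^{[k]}f\|_{m}+\sup_{t\leq T}\|P_{t}f\|_{m}\leq C\|f\|_{m}.\]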

The Euler scheme discussed before corresponds to $Q_{l}=P_{h_{l}}^{n^{l}}$ with $h_{l}=T/n^{l}$, and the approximation of order $2$ only involves $Q_{1}$. But, if we want to construct higher order schemes as in (1), we have to mix operators $Q_{l}$ for $l\in\mathbb{N}^{*}$. This leads us to consider grids $\Pi=\{0=t_{0}<t_{1}<...<t_{m}=T\}$ with the property that for every $i=1,...,m$, we have $t_{i}-t_{i-1}=h_{l_{i}}$ for some $l_{i}\in\mathbb{N}$. Then we define $P_{T}^{\Pi}=Q_{l_{m}}Q_{l_{m-1}}...Q_{l_{1}}$. Notice that $P_{T}^{\Pi}$ is built by using the "short time" approximation operators $Q_{l},l\in\mathbb{N}$ only. In the end, we show (see Theorem 3.10) that for any order $\nu\in\mathbb{N}^{*}$, there exist coefficients $c_{1},\dots,c_{r}\in\mathbb{R}$ and random grids $\Pi^{i}_{\nu}(\omega)\subset\{jT/n^{l}:j\leq n^{l}\},i=1,...,r$ with $\textup{Card}(\Pi^{i}_{\nu})\leq C_{\nu}\times n$ for some $C_{\nu}>0$, and constants $C>0$ and $k\in\mathbb{N}$, such that

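\[\forall f\in F,\quad\left\|P_{T}f-\sum_{i=1}^{r}c_{i}\mathbb{E}[P_{T}^{\Pi_{\nu}^{i}}f]\right\|_{0}\leq C\|f\|_{k}n^{-\nu}.\]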

We stress that the coefficients $c_{i}$ and the grids $\Pi_{i}(\omega),i=1,..,r$ do not depend on $P_{t}$ nor on the specific form of $Q_{l}$: only the order of approximation $\nu$ and $\alpha$ in ( $\overline{H_{1}}$ ) matter. Then, we give several examples of applications besides the Euler scheme: the Ninomiya-Victoir scheme for diffusion processes (then $\alpha=2$ and $\beta=6$ ), or approximation schemes for ordinary differential equations and piecewise deterministic Markov processes.
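The grid-indexed composition of short-time operators described above can be sketched on a toy deterministic example. Here $Q_{l}$ is taken to be the explicit Euler step for an ODE $x'=b(x)$ acting on functions, $Q_{l}f(x)=f(x+h_{l}b(x))$; this choice and the helper names `make_Q` and `grid_operator` are assumptions made for illustration only.

```python
def make_Q(b, h):
    """Short-time operator Q f(x) = f(x + h * b(x)) (explicit Euler step for x' = b(x))."""
    def Q(f):
        return lambda x: f(x + h * b(x))
    return Q

def grid_operator(T, n, levels, b):
    """Compose Q_{l_m} ... Q_{l_1} for step levels l_1, ..., l_m, with h_l = T / n**l."""
    steps = [T / n**l for l in levels]
    if abs(sum(steps) - T) > 1e-12:
        raise ValueError("the steps h_{l_k} must add up to T")
    def P(f):
        g = f
        for h in steps:            # applies Q_{l_1} first and Q_{l_m} last
            g = make_Q(b, h)(g)
        return g
    return P

# Hypothetical usage: T = 1, n = 2, one step of size 1/2 and two of size 1/4.
P = grid_operator(1.0, 2, [1, 2, 2], lambda x: 1.0)
value = P(lambda x: x * x)(0.5)    # with b = 1 every grid gives f(x + T)
```

With the constant field $b\equiv 1$ the short-time operators are exact translations, so any admissible grid returns $f(x+T)$; this makes the composition easy to check before moving to schemes with a genuine short-time error.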

The approximations introduced in this paper are of any order $\nu$ with a computation time in $O(n)$. To compute $P_{T}f$ with a precision $\varepsilon$, we naturally use a Monte-Carlo method with $n\sim\varepsilon^{-1/\nu}$ and $M\sim\varepsilon^{-2}$ samples, which has a computational cost of $O(\varepsilon^{-(2+1/\nu)})$. Since $\nu$ is arbitrarily large, we will denote this complexity by $O(\varepsilon^{-2+})$. There is a large literature in numerical probability dedicated to constructing either unbiased estimators of $P_{T}f$, leading to a computational cost of $O(\varepsilon^{-2})$ (but this is only true in the case of finite variance), or approximate estimators leading to a computational cost of $O(\varepsilon^{-2+})$. Let us give an overview of the different methods to position our work.
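The cost arithmetic in this paragraph can be made concrete with a short sketch. All multiplicative constants are set to one, which is an assumption: in practice the constants (in particular $r\times C_{\nu}$) depend on $\nu$ and matter for the choice of the order.

```python
import math

def monte_carlo_budget(eps, nu):
    """Grid size n ~ eps**(-1/nu), sample count M ~ eps**(-2), and cost n * M."""
    n = math.ceil(eps ** (-1.0 / nu))
    M = math.ceil(eps ** (-2.0))
    return n, M, n * M

# Hypothetical comparison for a target precision eps = 1e-3.
for nu in (1, 2, 4):
    print(nu, monte_carlo_budget(1e-3, nu))
```

Increasing $\nu$ shrinks the grid size needed for a given bias while the number of samples stays at $\varepsilon^{-2}$, which is why the total cost approaches $\varepsilon^{-2}$ from above.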

When $Q_{1}^{[n]}f=P_{T}f+c_{1}n^{-1}+\dots+c_{\nu-1}n^{1-\nu}+O(n^{-\nu})$, the Richardson-Romberg extrapolation provides an approximation of order $\nu$, and Pagès [23] shows in the case of the Euler scheme for SDEs how to get with this method an estimator with bounded variance. In a different way, extending Fujiwara’s method, Oshima et al. [22] propose approximations for SDEs of any order by considering linear combinations of Ninomiya and Victoir schemes with different time steps. These approximations have very similar properties to the ones presented in this paper, but they are obtained with a significantly different approach: they are constructed with linear combinations of schemes using uniform grids obtained with multiples of the same time-step, while our approximations use non-uniform time grids that are refined at some random places. Also, the principle of our methodology is not to find a combination of schemes that cancels the terms of orders $n^{-i}$ for $i=1,\dots,\nu-1$, but instead to calculate the contribution of all these terms.

The Multi-Level Monte-Carlo (MLMC) method proposed by Giles [12] that generalizes the statistical Romberg method of Kebaier [16] gives another generic way to approximate $P_{T}f$ in $O(\varepsilon^{-2+})$ . McLeish [19] and Rhee and Glynn [24] have then proposed an unbiased estimator constructed with similar ideas, see also the recent work of Vihola [25]. Contrary to the previous approaches, the MLMC method does not rely on the development of high order approximations since it already works using the Euler scheme. It stems from a clever probabilistic representation and variance analysis. The MLMC method is in fact complementary to high order approximations. For instance, Lemaire and Pagès [18] have proposed estimators combining the MLMC method and the Richardson-Romberg extrapolation, improving the asymptotic complexity of the standard MLMC method with the Euler scheme.

Last, there is a stream of papers that develop unbiased estimators for $P_{T}f$ in the case of SDEs. We have already mentioned the unbiased estimators [19, 24] that are obtained as telescopic series and that use a discretization scheme with more and more refined time grids. Another direction of research is to try to write $P_{T}f=\mathbb{E}[W_{T}f(\tilde{X}_{T})]$, where $\tilde{X}_{T}$ is a simulatable process (e.g. an Euler scheme) and $W_{T}$ is some computable weight. By using a change of measure and a rejection algorithm, Beskos and Roberts [9] have proposed such a method for one-dimensional diffusions. Recently, Bally and Kohatsu-Higa [7] have given a probabilistic representation of the parametrix method that opens the road to constructing unbiased estimators for a wide class of Markov processes, including stopped or reflected diffusions [11, 4]. By using a different approach, Henry-Labordère et al. [13] have lately proposed unbiased estimators for SDEs that nonetheless have a structure similar to the ones obtained with the parametrix method. A common important issue with all these unbiased estimators is to come up with a bounded variance estimator. This problem is tackled by Andersson and Kohatsu-Higa [5], who provide a finite variance estimator for the parametrix method, see also Agarwal and Gobet [1]. The approximation method that we develop in this paper can be seen as a discrete version of the parametrix method. Instead of considering a continuous time approximating semigroup $P^{x}_{t}$ and iterating indefinitely the formula $P_{T}f-P^{x}_{T}f=\int_{0}^{T}P^{x}_{T-s}(L-L^{x})P_{s}fds$ ( $L$ and $L^{x}$ are the corresponding infinitesimal generators), we iterate the equality $P_{T}f-Q_{1}^{[n]}f=\sum_{k=0}^{n-1}P_{(n-(k+1))h_{1}}(P_{h_{1}}-Q_{1})Q_{1}^{[k]}f$ a finite number of times until we achieve an approximation of order $\nu$.
The main advantage of our approximation schemes is that their construction is generic and only depends on the parameter $\alpha$ in ( $\overline{H_{1}}$ ), while the weights involved in these unbiased estimators depend on the underlying SDE or Markov process. This makes our approach much easier to implement for a whole class of processes. Besides, the discrete structure enables us to gather the correcting terms so as to get a finite variance estimator, as already mentioned.

The paper is organized as follows. In Section 2, we introduce some notation and present the recipe to construct iteratively high-order approximation schemes. Section 3 introduces trees, random trees and random grids that we use to construct our approximation schemes. It also prepares the variance analysis by gathering the terms of the approximations in an appropriate way. Theorem 3.10 states our first main result. Section 4 specifies these approximations in some cases by using particular probabilistic representations of semigroups, for instance in the case of the Euler scheme for SDEs. In this case, we state in Theorem 4.4 our second main result, which ensures that our estimators have a finite variance. Last, we provide in Section 5 numerical examples of our approximations that illustrate the broad applicability of our approach.

2 Basic development

We first introduce notation that will be used throughout the paper. We denote by $C_{b}^{\infty}(\mathbb{R}^{d})$ the space of smooth functions from $\mathbb{R}^{d}$ to $\mathbb{R}^{d}$ which are bounded and have bounded derivatives of any order. We work with the norms

\[\|f\|_{k,\infty}=\sum_{0\leq|\gamma|\leq k}\sup_{x\in\mathbb{R}^{d}}|\partial^{\gamma}f(x)|\]

where for a multi-index $\gamma=(\gamma_{1},...,\gamma_{m})\in\cup_{m^{\prime}\in\mathbb{N}}\{1,...,d\}^{m^{\prime}}$ we denote

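\[|\gamma|=m\quad\text{and}\quad\partial^{\gamma}=\partial_{x_{\gamma_{1}}}\dots\partial_{x_{\gamma_{m}}}.\]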

In many proofs of the paper, we will have to deal with derivatives of composed functions. Let $f:\mathbb{R}^{d}\rightarrow\mathbb{R}$ and $g:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ be smooth functions. We denote by $g^{j}$, $j\in\{1,\dots,d\}$, the coordinates of $g$. Then, one may prove by induction that

\[\partial^{\alpha}[f\circ g]=\sum_{|\beta|\leq|\alpha|}(\partial^{\beta}f)(g)P_{\alpha,\beta}(g)\]

with

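\[P_{\alpha,\beta}(g)=\sum c_{\alpha,\beta}((\gamma_{1},j_{1}),\dots,(\gamma_{k},j_{k}))\prod_{i=1}^{k}\partial^{\gamma_{i}}g^{j_{i}},\]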

where the sum is over all $k=1,\dots,|\alpha|$, $j_{1},\dots,j_{k}\in\{1,\dots,d\}$ and (non-empty) multi-indices $\gamma_{1},\dots,\gamma_{k}\in\cup_{m\geq 1}\{1,\dots,d\}^{m}$ such that $\sum_{i=1}^{k}|\gamma_{i}|\leq|\alpha|$. For $f:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$, we write $\partial^{\alpha}[f\circ g]:=(\partial^{\alpha}[f^{1}\circ g],\dots,\partial^{\alpha}[f^{d}\circ g])$, and the same formula applies coordinate by coordinate. This result is known in the literature as Faà di Bruno’s formula, but in this work we do not need the explicit formula for the coefficients $c_{\alpha,\beta}$, see Constantine and Savits [10].
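As a quick sanity check of the structure of this formula, consider dimension $d=1$ and $|\alpha|=2$, where it reduces to the familiar second-order chain rule (the general coefficients are not needed for this special case):

```latex
\[
(f\circ g)''=f'(g)\,g''+f''(g)\,(g')^{2}.
\]
```

The term with one derivative of $f$ is multiplied by $g''$ and the term with two derivatives of $f$ by $(g')^{2}$; in both cases the derivatives falling on $g$ add up to $2=|\alpha|$, in agreement with the constraint $\sum_{i}|\gamma_{i}|\leq|\alpha|$.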

We consider a semigroup of linear operators $P_{t}:C_{b}^{\infty}(\mathbb{R}^{d})\rightarrow C_{b}^{\infty}(\mathbb{R}^{d})$ which satisfies

\[P_{t+s}=P_{t}P_{s}.\]

Let $T>0$ be a time horizon which is fixed in the sequel. We are interested in building approximation schemes for $P_{T}.$ For $n\in\mathbb{N}^{*}$ and $l\in\mathbb{N}$ , we define

\[h_{l}=\frac{T}{n^{l}},\quad h=h_{1}=\frac{T}{n},\quad h_{0}=T.\]

We suppose that we are given a sequence of linear operators $Q_{l}:C_{b}^{\infty}(\mathbb{R}^{d})\rightarrow C_{b}^{\infty}(\mathbb{R}^{d}),l\in\mathbb{N}$ which will be used in order to construct our approximation schemes. The operator $Q_{l}$ is supposed to be an approximation of $P_{h_{l}}$ , more precisely we assume that for every $k\in\mathbb{N}$ and $l\in\mathbb{N}$

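\[\forall l,k\in\mathbb{N},\exists C>0,\forall f\in C_{b}^{\infty}(\mathbb{R}^{d}),\quad\|(P_{h_{l}}-Q_{l})f\|_{k,\infty}\leq C\|f\|_{k+\beta,\infty}h_{l}^{1+\alpha} \tag{$H_{1}$}\]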

for some $\alpha>0,\beta\in\mathbb{N}.$ In the case of the Euler scheme we have $\alpha=1,$ and $\beta=4$ (see Example 2.2 below). We denote

\[\Delta_{h_{l}}=P_{h_{l}}-Q_{l}\]

and

\[P_{h_{l}}^{h_{l}}=Q_{l}\quad\text{and}\quad P_{kh_{l}}^{h_{l}}=Q_{l}\dots Q_{l}\quad(k\text{ times}).\]

Thus, we produce a discrete semigroup $P_{t}^{h_{l}},t=kh_{l}.$ We will use the following regularity hypothesis:

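\[\forall l,m\in\mathbb{N},\exists C>0,\quad\max_{kh_{l}\leq T}\|P_{kh_{l}}^{h_{l}}f\|_{m,\infty}+\sup_{t\leq T}\|P_{t}f\|_{m,\infty}\leq C\|f\|_{m,\infty}. \tag{$H_{2}$}\]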

Remark 2.1.

We could more generally assume that the left hand-side of ( $H_{2}$ ) is upper bounded by $C\|f\|_{m+\tilde{\beta},\infty}$ for some $\tilde{\beta}\in\mathbb{N}$ : this would not modify the main results of Sections 2 and 3. However, this generalization is not relevant for usual semigroups that already satisfy this bound for $\tilde{\beta}=0$ . For simplicity, we only consider this case.

Example 2.2.

(Euler scheme for diffusion processes) We work with the diffusion process (1). We assume that $\sigma_{j},b\in C^{\infty}(\mathbb{R}^{d})$ and the derivatives of any order of $\sigma_{j}$ and $b$ are bounded. In particular they have linear growth. By standard results on stochastic flows (see Proposition 2.1 and Theorem 2.3 of [14], Chapter 5), we have ( $H_{2}$ ).

We denote by $P_{t}$ the semigroup of the diffusion process (1) and by $P_{h}^{h}f(x):=\mathbb{E}[f(x+b(x)h+\sigma(x)W_{h})]$, $P_{kh}^{h}:=(P_{h}^{h})^{k}$ the (discrete) semigroup of the Euler scheme of step $h$. Then

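\[\Delta_{h}f(x)=P_{h}f(x)-P_{h}^{h}f(x)=\int_{0}^{h}\int_{0}^{s}\Big(\mathbb{E}[(L^{2}f)(X_{r}(x))]-\mathbb{E}[(L_{x}^{2}f)(x+b(x)r+\sigma(x)W_{r})]\Big)drds,\]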

where $L$ is the infinitesimal operator of the semigroup $P_{t}$ and $L_{x}$ is the infinitesimal operator corresponding to the Euler scheme, with frozen coefficients $b(x)$ and $\sigma(x)$. By Theorem 4.4 of [17], we can take a modification of the solution such that the flow $x\rightarrow X_{t}(x)$ is infinitely differentiable with derivatives which have finite moments of any order. Thus we have

\[\forall k\in\mathbb{N},\exists C>0,\quad\|\Delta_{h}f\|_{k,\infty}\leq C\|f\|_{k+4,\infty}h^{2}.\]

Thus, the property ( $H_{1}$ ) is satisfied with $\alpha=1,\beta=4.$
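The $h^{2}$ scaling of the one-step Euler error can be checked by hand on a case with closed-form moments. The sketch below uses the linear SDE $dX=\mu X\,dt+\sigma X\,dW$ and $f(x)=x^{2}$; these choices and the parameter values are assumptions made for illustration, and $f$ is unbounded, so the example sits outside $C_{b}^{\infty}$ and only illustrates the order of the local error.

```python
import math

MU, SIGMA = 0.1, 0.2   # illustrative values, not taken from the paper

def exact_second_moment(x, h):
    """P_h f(x) for f(x) = x**2 and dX = MU * X dt + SIGMA * X dW."""
    return x * x * math.exp((2.0 * MU + SIGMA**2) * h)

def euler_second_moment(x, h):
    """E[(x + MU*x*h + SIGMA*x*W_h)**2] with W_h ~ N(0, h): one Euler step."""
    return x * x * ((1.0 + MU * h) ** 2 + SIGMA**2 * h)

def local_error(x, h):
    """Delta_h f(x) = P_h f(x) - (one-step Euler value) for f(x) = x**2."""
    return exact_second_moment(x, h) - euler_second_moment(x, h)

for h in (0.1, 0.05, 0.025):
    print(h, local_error(1.0, h) / h**2)   # the ratios stay roughly constant
```

Halving $h$ divides the local error by about four, which is the behaviour expected from an error of order $h^{1+\alpha}$ with $\alpha=1$.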

We come back to the general case and we present the basic decomposition that we will use. We use the linearity of the operators in order to get

[TABLE]

Iterating this equality, we get for every $1\leq m\leq n$ ,

\[P_{T}=P_{T}^{h_{1}}+\sum_{i=1}^{m-1}I_{i}^{h_{1}}(n)+R_{m}^{h_{1}}(n) \tag{11}\]

with (convention $k_{0}=-1$ and $\prod_{j=0}^{i-1}A_{j}=A_{i-1}\dots A_{0}$ for non commutative operators $A_{j}$ )

[TABLE]

Then, using ( $H_{1}$ ) and ( $H_{2}$ ) we get for $h\in\{h_{l},l\in\mathbb{N}\}$ ,

[TABLE]

Thus, we get an error of order $O(h^{\alpha m})=O(n^{-\alpha m})$ for $h=h_{1}=T/n$ .
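The gain in order obtained by keeping more terms of the expansion can be observed on a toy case where all operators act on $f(x)=x^{2}$ by multiplication, so that every sum is available in closed form. The sketch assumes that the $i$-th correction collects all products containing exactly $i$ factors of $\Delta_{h}$ (in this commutative case, $\binom{n}{i}q^{n-i}\delta^{i}$); the model $dX=\mu X\,dt+\sigma X\,dW$ and the parameter values are illustrative assumptions as well.

```python
import math

MU, SIGMA, T = 0.1, 0.2, 1.0      # illustrative values, not from the paper
A = 2.0 * MU + SIGMA**2           # exact semigroup multiplies x**2 by exp(A * t)

def truncated_expansion(n, m):
    """Sum of the terms with fewer than m factors of Delta_h, for h = T / n."""
    h = T / n
    q = (1.0 + MU * h) ** 2 + SIGMA**2 * h   # one Euler step multiplies x**2 by q
    delta = math.exp(A * h) - q              # Delta_h acting on x**2
    return sum(math.comb(n, i) * q ** (n - i) * delta**i for i in range(m))

def error(n, m):
    """Distance between the exact value exp(A * T) and the truncated expansion."""
    return abs(math.exp(A * T) - truncated_expansion(n, m))

for m in (1, 2):
    print(m, error(50, m) / error(100, m))   # roughly 2**m when alpha = 1
```

Doubling $n$ divides the error by about $2$ for $m=1$ and by about $4$ for $m=2$, matching the order $O(n^{-\alpha m})$ with $\alpha=1$.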

Formula (11) represents a discretization with step $h_{1}>0$ on the interval $[0,T]=[0,h_{0}]$ . In the sequel we will use similar developments on intervals $[0,h_{l}]$ with step $h_{l+1}$ . So, using the above formula with $T=h_{l}$ and with step $h_{l+1}$ instead of $h=h_{1}$ , we obtain

[TABLE]

with $l\in\mathbb{N}$ . We then have from (12) with $h=h_{l+1}=Tn^{-(l+1)}$ ,

[TABLE]

Similarly, we get

[TABLE]

Formula (11) is appealing since it may lead to an approximation of order $O(n^{-\alpha m})$. The natural question is then how to simulate the terms $I_{i}^{h}(n)f$. This raises two problems that we now explain.

Problem 1. Computing the sum defining $I_{i}^{h}(n)$ directly is cumbersome, with a complexity of $O(n^{i})$. To avoid this issue, we will use a randomization procedure (inspired by [7] in the framework of the parametrix method). We fix $i$ and we consider a random variable $\kappa(\omega)=(\kappa_{1}(\omega),...,\kappa_{i}(\omega))$ that follows a "discrete order statistics" distribution on $\{0,1,...,n-1\}.$ Precisely, $\kappa$ follows the distribution

[TABLE]

We will use the notation

[TABLE]

Then

[TABLE]

Remark 2.3.

If we look at the equality between the terms in (16) and in (17), we see that (17) gives a way to compute the sum which appears in (16) by the Monte-Carlo method. This Monte Carlo approach avoids the “curse of dimensionality”, since the discrete simplex of dimension $i$, $\{0\leq k_{1}<...<k_{i}<n\}$, has $O(n^{i})$ elements. Besides, the different terms in the sum (16) may have values that are very close to each other, leading to a bounded variance. This will be analyzed later on in Subsection 4.3 for SDEs and the Euler scheme.

Let us note that this randomization makes the approximation (11) effective. Otherwise, it would have a computational cost of $O(n+\dots+n^{m-1})=O(n^{m})$ for a precision in $O(n^{-\alpha m})$ (see (12)), exactly as $P^{h_{m}}_{n^{m}h_{m}}$ .
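The randomization of Problem 1 can be sketched as follows. The code assumes that $\kappa$ is uniformly distributed on the discrete simplex $\{0\leq k_{1}<\dots<k_{i}<n\}$, which is what sorting a sample drawn without replacement produces; the exact law used by the authors is given in a display not reproduced here, and the function names are hypothetical.

```python
import math
import random

def sample_increasing(n, i, rng):
    """Uniform draw of 0 <= k_1 < ... < k_i < n (sorted sample without replacement)."""
    return tuple(sorted(rng.sample(range(n), i)))

def randomized_sum(g, n, i, n_samples, seed=0):
    """Monte Carlo estimate of the sum of g over all 0 <= k_1 < ... < k_i < n."""
    rng = random.Random(seed)
    mean = sum(g(sample_increasing(n, i, rng)) for _ in range(n_samples)) / n_samples
    return math.comb(n, i) * mean   # number of terms times the average term

# Hypothetical usage: the exact value of this sum over pairs with n = 10 is 405.
approx = randomized_sum(lambda k: k[0] + k[1], 10, 2, 20000)
```

Each sample costs $O(i)$ index draws instead of the $O(n^{i})$ evaluations needed for the full sum; the price is a statistical error that depends on how much the summands vary.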

Problem 2. The basic element in the above formula is $\Delta_{h},$ and we are not able to simulate directly this quantity, due to $P_{h}$ . To overcome this problem, we will use the fact that the short-time estimate ( $H_{1}$ ) of the semigroup is more and more precise when $l$ increases since $\alpha>0$ :

[TABLE]

Thus, we will construct by backward induction on $l$ some approximations based only on the approximation kernels $Q_{l}$, each of them involving at most $O(n)$ iterations of these kernels (so we keep a complexity of order $n$ ). This is made precise by the following lemma, which is the core of our computations. We will use the following numbers: for $l\in\mathbb{N},i\in\mathbb{N}^{\ast}$ and $\nu\in\mathbb{N}^{\ast}$ we define

[TABLE]

Here, $\lceil x\rceil=q$ if $x\in(q-1,q]$ is the ceiling function. We observe that $i\in\mathbb{N}^{\ast}\mapsto q_{i}(l,\nu)$ is nonincreasing and therefore $q_{i}(l,\nu)\leq q_{1}(l,\nu)=\nu+1$ .

Lemma 2.4.

Let $l\in\mathbb{N}$ and $\nu_{0}\in\mathbb{N}^{\ast}$ . Suppose that we have already a sequence of operators $\hat{P}_{h_{l+1}}^{\nu}$ for $1\leq\nu\leq\nu_{0}+1$ such that

[TABLE]

for some $C_{l+1,\nu}>0$ and $k(l+1,\nu)\in\mathbb{N}$ . For $\nu\leq\nu_{0}$ , we define

[TABLE]

with

[TABLE]

Then, we have

[TABLE]

for some $C_{l,\nu}>0$ and with

[TABLE]

Remark 2.5.

1. Compare the definition of $I_{i}^{\nu,h_{l+1}}(n)$ with the one of $I_{i}^{h_{l+1}}(n)$ in (17): one just replaces $\Delta_{h_{l+1}}=(P_{h_{l+1}}-P_{h_{l+1}}^{h_{l+1}})$ by $(\hat{P}_{h_{l+1}}^{q_{i}(l,\nu)}-P_{h_{l+1}}^{h_{l+1}})$. So $P_{h_{l+1}}$ is replaced by $\hat{P}_{h_{l+1}}^{q_{i}(l,\nu)}$, which is supposed to be "computable".

2. Recall that for $l=0$, we have $h_{0}=\frac{T}{n^{0}}=T$. Thus $\hat{P}_{h_{0}}^{\nu}=\hat{P}_{T}^{\nu}$ is an approximation of order $n^{-\nu}$ of $P_{h_{0}}=P_{T}$. This is what we want to obtain.

3. The inductive construction suggested by Lemma 2.4 to get a $\nu$ -th order scheme for $P_{h_{l}}$ is finite. See the construction of the tree $\mathcal{T}_{l}^{\nu}$ in (54).

Proof of Lemma 2.4.

For $\nu\leq\alpha+(1+\alpha)l$ , we have $m(l,\nu)=1$ so that $\hat{P}_{h_{l}}^{\nu}=P_{h_{l}}^{h_{l+1}}.$ Using (14) with $m=1$ , we obtain (23).

Suppose now that $\nu>\alpha+(1+\alpha)l$ so that we have $m(l,\nu)-1>0.$ We write

[TABLE]

First we compare $I_{i}^{h_{l+1}}(n)$ and $I_{i}^{\nu,h_{l+1}}(n)$ . From (22) we have

[TABLE]

We get by expanding (choose $j$ times $\hat{P}_{h_{l+1}}^{q_{i}(l,\nu)}-P_{h_{l+1}}$ and $i-j$ times $\Delta_{h_{l+1}}$ ) and using ( $H_{1}$ ) and (20)

[TABLE]

with $C$ depending on the constants $C_{l+1,q_{i}(l,\nu)}.$ We have to check that, for every $j=1,...,i$

[TABLE]

The first inequality is true if $q_{i}(l,\nu)\geq(1+\alpha)(l+1)$ and thus if $\nu+i-(1+\alpha)(l+1)(i-1)\geq(1+\alpha)(l+1)$ by using (18). The last inequality is equivalent to

[TABLE]

which holds since

[TABLE]

We obtain

[TABLE]

We deal now with the remainder. Using (14) with $m=m(l,\nu)$ we obtain

[TABLE]

Since $((1+\alpha)l+\alpha)m(l,\nu)\geq\nu$ the proof is completed. ∎

Remark 2.6.

Lemma 2.4 gives a recursive way to construct approximations of order $\nu\in\mathbb{N}$. When $\alpha$ is not an integer, another natural choice may be to consider approximations of order $\alpha\nu$, with $\nu\in\mathbb{N}$. Of course, it is then possible to get an analogous recursive construction.

3 High order approximations of semigroups

Lemma 2.4 gives the recipe to construct high order approximations of semigroups by induction: from high order approximations on a time step $h_{l+1}$, we produce high order approximations on a time step $h_{l}$, and we continue this construction until we get high order approximations for $T=h_{0}$. To describe this construction precisely, we need to introduce some basic mathematical objects. In Subsection 3.1, we introduce "trees", "random trees" and "random grids" that will be used to define our approximation schemes. Then, Subsection 3.2 presents a sequence of abstract operators and the composition operations associated to a given random tree. All these definitions are motivated by the approximation schemes that we describe in Subsection 3.3, but for the moment we keep an abstract framework because this allows a precise and simple presentation.

3.1 Trees, random trees and random grids

The approximations that we construct in this paper involve quite intricate combinatorics. This can be understood from Lemma 2.4: an approximation of order $\nu$ at a level $l$ is constructed from approximations of different orders at level $l+1$. To describe this recursion, we will use trees, see (58) and (56) below. Then, to make this approximation more explicit and non-recursive, we will use what we call random trees, i.e. trees labeled with particular random variables. In the case of SDEs and the Euler scheme approximations, these random trees can be seen as a way to represent the random grid on which the Euler scheme is constructed. In this paper, we will use as much as possible the letter $\mathcal{T}$ for trees and $\mathcal{A}$ for random trees.

3.1.1 Trees

We will use the Neveu notation [20]. Let $\mathcal{U}=\cup_{n\geq 0}(\mathbb{N}^{\ast})^{n}$ be the set of finite sequences of positive integers. For $u=(u_{1},...,u_{m})\in\mathcal{U}$ and $i\in\mathbb{N}^{\ast}$ we denote $iu=(i,u_{1},...,u_{m})$ and $ui=(u_{1},...,u_{m},i).$ We also denote by $\left|u\right|=m$ the length of $u.$

Definition 3.1.

(Trees) A tree is a subset $\mathcal{T}\subset\mathcal{U}$ such that:

1. $\emptyset\in\mathcal{T}$,

2. $uj\in\mathcal{T}\quad\Rightarrow\quad u\in\mathcal{T}$,

3. $uj\in\mathcal{T}\quad\Rightarrow\quad ui\in\mathcal{T}$ for every $i<j.$

Convention: Throughout the paper, we use a different symbol for the ancestor (root) of a tree and for the void set:

[TABLE]

We think of $\mathcal{T}$ as a genealogical tree: each $u=(u_{1},...,u_{m})\in\mathcal{T}$ represents an individual (we also call it a node or vertex), which is characterized by its genealogy: $u$ is the $u_{m}$-th son of $(u_{1},...,u_{m-1}).$ So the first property imposes that the root $\emptyset$ belongs to the tree (it is the universal ancestor), the second property says that any node of the tree (except the root) has a father, and the third property imposes that the sons of a node are numbered increasingly, without skipping any number: in other words, if a third son exists, then a second one has to exist also. Last, let us mention that a tree $\mathcal{T}$ may in general be infinite, but throughout the paper we will only consider finite trees.

We introduce some more notation related to $\mathcal{T}$. For $u\in\mathcal{T}$, we denote by $j_{u}(\mathcal{T})$ the number of sons of $u$, that is

[TABLE]

We denote by $\mathcal{T}_{i}^{\prime}$ the sub tree of $\mathcal{T}$ rooted in $i$ that is $\mathcal{T}_{i}^{\prime}=\{u\in\mathcal{U}:iu\in\mathcal{T}\}.$ We also denote $i\mathcal{T}=\{iu:u\in\mathcal{T}\}$ . This means that we root $\mathcal{T}$ at the point $i\in\mathbb{N}^{*}$ . Notice that this is not a tree because it does not contain $j\mathcal{T}$ , for $j<i$ , nor the ancestor. We note $\left|\mathcal{T}\right|=\max\{\left|u\right|:u\in\mathcal{T}\}$ , the depth of the tree $\mathcal{T}$ . Finally we define the extreme points (leaves) of $\mathcal{T}:$

[TABLE]
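To make these notions concrete, here is a minimal sketch in which a tree is stored as a set of integer tuples, the root $\emptyset$ being the empty tuple. The function names are ours, and the characterization of the leaves as the nodes with no son is the one used in Subsection 3.1.2.

```python
def is_tree(nodes):
    """Check the three axioms of Definition 3.1 for a set of integer tuples."""
    nodes = set(nodes)
    if () not in nodes:                 # axiom 1: the root belongs to the tree
        return False
    for u in nodes:
        if not u:
            continue
        father, j = u[:-1], u[-1]
        if father not in nodes:         # axiom 2: every node has a father
            return False
        if any(father + (i,) not in nodes for i in range(1, j)):
            return False                # axiom 3: sons are numbered without gaps
    return True


def num_sons(tree, u):
    """j_u(T): number of sons of the node u."""
    j = 0
    while u + (j + 1,) in tree:
        j += 1
    return j


def leaves(tree):
    """E(T): nodes without sons."""
    return {u for u in tree if num_sons(tree, u) == 0}


def depth(tree):
    """|T|: maximal length of a node."""
    return max(len(u) for u in tree)


def subtree(tree, i):
    """T'_i = {u : iu in T}, the subtree rooted at the i-th son of the root."""
    return {u[1:] for u in tree if u and u[0] == i}


tree = {(), (1,), (2,), (3,), (1, 1)}
print(is_tree(tree), num_sons(tree, ()), sorted(leaves(tree)), depth(tree))
```

Removing the node `(2,)` from this example leaves a set that satisfies the first two axioms but not the third, so `is_tree` rejects it.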

3.1.2 Random trees

Let $\mathcal{A}$ be a finite tree and $n\in\mathbb{N}^{*}$ such that $n\geq\max_{u\in\mathcal{A}}j_{u}(\mathcal{A})$. To every vertex $u\in\mathcal{A}\setminus\mathcal{E}(\mathcal{A})$ (i.e. such that $j_{u}(\mathcal{A})>0$) we associate a random variable

[TABLE]

which we may consider as the (random) birthdays of the sons of $u$. We denote

[TABLE]

We assume:

• The random variables $\kappa(u),u\in\mathcal{A}\setminus\mathcal{E}(\mathcal{A})$, are mutually independent.

• $0\leq\kappa_{1}(u)<...<\kappa_{j_{u}(\mathcal{A})}(u)\leq n-1$ is an order statistic, i.e. $\kappa(u)$ is uniformly distributed on $\{(k_{1},\dots,k_{j_{u}(\mathcal{A})})\in\{0,1,...,n-1\}^{j_{u}(\mathcal{A})}:0\leq k_{1}<\dots<k_{j_{u}(\mathcal{A})}\}.$

Definition 3.2.

(Random trees) If the above hypotheses are satisfied, we call $(\mathcal{A},\kappa(\mathcal{A}))$ a random tree.
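A possible way to sample the labels of a random tree is sketched below. It assumes the representation of a tree as a set of tuples (root equal to the empty tuple) and uses the fact that a uniformly chosen subset of $\{0,\dots,n-1\}$ with $j$ elements, sorted increasingly, is uniformly distributed on the discrete simplex. The names `num_sons` and `sample_labels` are hypothetical.

```python
import random


def num_sons(tree, u):
    """j_u(A): number of sons of the node u."""
    j = 0
    while u + (j + 1,) in tree:
        j += 1
    return j


def sample_labels(tree, n, rng):
    """Attach to every non-leaf node u an independent draw kappa(u).

    kappa(u) is an increasing tuple of j_u(A) integers in {0, ..., n-1},
    uniformly distributed on the discrete simplex.
    """
    labels = {}
    for u in sorted(tree):              # fixed order, for reproducibility
        j = num_sons(tree, u)
        if j > n:
            raise ValueError("n must be at least the maximal number of sons")
        if j > 0:
            labels[u] = tuple(sorted(rng.sample(range(n), j)))
    return labels


tree = {(), (1,), (2,), (1, 1)}
labels = sample_labels(tree, 5, random.Random(0))
print(labels)
```

Only the nodes `()` and `(1,)` receive a label here, since the leaves have no sons.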

Remark 3.3.

Let us make precise the relation between the random tree $(\mathcal{A},\kappa(\mathcal{A}))$ and its subtrees $(\mathcal{A}_{j}^{\prime},\kappa^{j}(\mathcal{A}_{j}^{\prime}))$, $j=1,...,i$, with $i=j_{\emptyset}(\mathcal{A})$. For $u\in\mathcal{A}$ with $u=jv$ we will assume $\kappa(u)=\kappa^{j}(v)$, where $\kappa^{j}(v)$ is the random variable associated to $v\in\mathcal{A}_{j}^{\prime}$. So, $\kappa(\mathcal{A})$ is the family of uniform random variables obtained from $\kappa^{j}(\mathcal{A}_{j}^{\prime}),j=1,...,i$, to which is added one independent random variable $\kappa(\emptyset)$ uniformly distributed on $\{(k_{1},\dots,k_{j_{\emptyset}(\mathcal{A})})\in\{0,1,...,n-1\}^{j_{\emptyset}(\mathcal{A})}:0\leq k_{1}<\dots<k_{j_{\emptyset}(\mathcal{A})}\}$.

3.1.3 Random grids

We associate to a random tree $(\mathcal{A},\kappa(\mathcal{A}))$ a random grid. We fix $l\in\mathbb{N},n\in\mathbb{N}$ and $T\geq 0$ and we recall

[TABLE]

and we use here (and in the sequel) the convention

[TABLE]

Then, we construct by recurrence the random grid $G_{l}(\mathcal{A})$ on $[0,h_{l}]$ in the following way. For convenience, we drop in the notation $G_{l}(\mathcal{A})$ the dependence on $\kappa(\mathcal{A})$, even though this grid is constructed by using $\kappa(\mathcal{A})$. If $\mathcal{A}=\{\emptyset\}$ (the tree contains just the ancestor) then $j_{\emptyset}(\mathcal{A})=0$ and we define

[TABLE]

This is the usual uniform grid of step $h_{l+1}$ on $[0,h_{l}]$ . Otherwise, we have $j_{\emptyset}(\mathcal{A})>0$ and we define by recurrence

[TABLE]

where $t+G:=\{t+s,s\in G\}$ . This means that we consider the uniform grid of step $h_{l+1}$ on $[0,h_{l}]$ and moreover we refine the intervals $[\kappa_{i}h_{l+1},\kappa_{i}^{\prime}h_{l+1}]$ according to the random grid $G_{l+1}(\mathcal{A}_{i}^{\prime})$ (see Remark 3.3 to see how the random trees $\mathcal{A}$ and $\mathcal{A}^{\prime}_{i}$ are related). Notice that, if $\left|\mathcal{A}\right|=r,$ then $G_{l}(\mathcal{A})\subset\{qh_{l+r+1},q=0,...,n^{r+1}\}.$ We denote $m=\textup{Card}(G_{l}(\mathcal{A}))-1$ and we define

[TABLE]

the reordering of $G_{l}(\mathcal{A})$. We notice that for every $k=1,\dots,m$ one has $s_{k}-s_{k-1}=h_{l+p_{k}}$ for some $p_{k}\in\{1,2,...,r+1\}$ (the finest possible step being $h_{l+r+1}$). We finally give an alternative representation of the random grid $G_{l}(\mathcal{A})$.

Lemma 3.4.

Let $(\mathcal{A},\kappa(\mathcal{A}))$ be a random tree. Let us define $t_{l}(\emptyset)=0$ and for $u=(u_{1},...,u_{m})\in\mathcal{A}$ , we define

[TABLE]

Then, we have

[TABLE]

Proof.

We prove this result by recurrence on the depth $|\mathcal{A}|$. The result is clear for $\mathcal{A}=\{\emptyset\}$. Suppose the result is true for any random tree $\mathcal{A}^{\prime}$ with $|\mathcal{A}^{\prime}|\leq r-1$, and assume $|\mathcal{A}|=r$. Let $i\in\{1,\dots,j_{\emptyset}(\mathcal{A})\}$ and let $\kappa^{i}$ be the random variables associated to $\mathcal{A}^{\prime}_{i}$ (see Remark 3.3). For $u^{\prime}\in\mathcal{A}^{\prime}_{i}$, we have

[TABLE]

We set $u=(i,u^{\prime})$ . Since $\kappa((i,u))=\kappa^{i}(u)$ for any $u\in\mathcal{A}^{\prime}_{i}$ , we get

[TABLE]

From $\mathcal{A}=\{\emptyset\}\cup\left(\cup_{i=1}^{j_{\emptyset}(\mathcal{A})}i\mathcal{A}^{\prime}_{i}\right)$ , we deduce that

[TABLE]

By using the recurrence hypothesis and (28), this set is equal to $G_{l}(\mathcal{A})$ . ∎
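The recursive construction of the grid described before Lemma 3.4 can be sketched as follows. To avoid floating point issues, the grid points are expressed as integer multiples of the finest step $h_{l+r+1}$, with $r=|\mathcal{A}|$. We assume here that the refined interval attached to the $i$-th son is $[\kappa_{i}h_{l+1},(\kappa_{i}+1)h_{l+1}]$; the function name and the dictionary `labels` (node to $\kappa(u)$) are hypothetical.

```python
def random_grid(tree, labels, n):
    """Return G_l(A), sorted, in integer multiples of h_{l+r+1}, r = depth of A.

    tree   : set of tuples, the root being the empty tuple
    labels : dict mapping each non-leaf node u to the increasing tuple kappa(u)
    """
    r = max(len(u) for u in tree)

    def build(u):
        # Grid attached to node u: uniform grid of step h_{l+|u|+1} on an
        # interval of length h_{l+|u|}, refined on the cells chosen by kappa(u).
        step = n ** (r - len(u))        # h_{l+|u|+1} in units of h_{l+r+1}
        points = {k * step for k in range(n + 1)}
        for i, kappa_i in enumerate(labels.get(u, ()), start=1):
            # Refine the cell starting at kappa_i * step with the grid of son i.
            points |= {kappa_i * step + s for s in build(u + (i,))}
        return points

    return sorted(build(()))


grid = random_grid({(), (1,)}, {(): (2,)}, 3)
print(grid)  # [0, 3, 6, 7, 8, 9]
```

In this example the cell $[2h_{l+1},3h_{l+1}]$ is refined with step $h_{l+2}$, so the consecutive differences of the grid are powers of $n$ in units of the finest step.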

3.2 Operators

We consider again a sequence of operators $Q_{l}:C^{\infty}(\mathbb{R}^{d})\rightarrow C^{\infty}(\mathbb{R}^{d})$ , $l\in\mathbb{N}$ (or more generally $Q_{l}:F\rightarrow F$ where $F$ is an abstract vector space). For $k\in\mathbb{N}$ we denote

[TABLE]

Given a random tree $(\mathcal{A},\kappa(\mathcal{A}))$ we construct by recurrence $Q_{l}^{\mathcal{A}}$ in the following way (again we drop the dependence on $\kappa(\mathcal{A})$ in the notation). If $\mathcal{A}=\{\emptyset\}$ we define

[TABLE]

Suppose now that $j_{\emptyset}(\mathcal{A})=i\geq 1$ and let $\{\kappa_{1}<...<\kappa_{i}\}=\kappa_{\emptyset}(\mathcal{A})$ . We define by recurrence

[TABLE]

see Remark 3.3 for the dependence between the random trees. Finally, we define $\Gamma_{l}^{\mathcal{A}}$ in the following way. If $\mathcal{A}=\{\emptyset\}$ we define

[TABLE]

and if ${j}_{\emptyset}(\mathcal{A})=i\geq 1,$ we define by recurrence

[TABLE]

Notice that the recurrence formula (33) is the same as (31), but the initial condition (32) is different from (30).

We consider now a deterministic tree $\mathcal{T}$ (in contrast with $\mathcal{A}$ which is a random tree) and define by recurrence $\Delta_{l}(\mathcal{T})$ . If $\mathcal{T}=\{\emptyset\}$ , then

[TABLE]

and if ${j}_{\emptyset}(\mathcal{T})>0,$

[TABLE]

where $\mu_{i}$ is the uniform law of the order statistics $0\leq\kappa_{1}<...<\kappa_{i}\leq n-1.$

Our aim now is to give an explicit computational formula for $\Delta_{l}(\mathcal{T}).$ In order to do it, we need to introduce one more notation concerning families of trees (forests). Let $\mathbb{F}=\{\mathcal{T}_{j},j\in J_{\mathbb{F}}\}$ , where $J_{\mathbb{F}}$ is a family of indices and $\mathcal{T}_{j}$ is a tree for all $j\in J_{\mathbb{F}}$ . Given $m\in\mathbb{N}^{*}$ , we construct

[TABLE]

with

[TABLE]

So, the tree $\mathcal{T}(j_{1},\dots,j_{m})$ is obtained by rooting to the ancestor the tree $\mathcal{T}_{j_{k}}$ at the node $k$ , for each $k=1,...,m$ . We note $\mathcal{A}(j_{1},\dots,j_{m})$ the random tree obtained by labeling the nodes of $\mathcal{T}(j_{1},\dots,j_{m})$ with random variables, according to Definition 3.2.

Using this notation, we are able to construct the family of trees associated to a tree $\mathcal{T}$ by recurrence, in the following way. If $\mathcal{T}=\{\emptyset\}$ (this means that the tree consists only of the ancestor) then we define $\mathbf{F}(\mathcal{T})=\mathbf{F}(\{\emptyset\})=\{\emptyset\}$, so the finite family associated to the tree $\{\emptyset\}$ has just one element, which is the tree $\{\emptyset\}$. Suppose now that $j_{\emptyset}(\mathcal{T})\geq 1$. Then, we define

[TABLE]

where $\mathbf{F}^{\otimes i}(\mathcal{T}_{i}^{\prime})$ is the shorthand notation for $(\mathbf{F}(\mathcal{T}_{i}^{\prime}))^{\otimes i}$ . We are now able to give the first result from this section.

Proposition 3.5.

Let $\mathcal{T}$ be a tree and let $\mathbf{F}(\mathcal{T})$ be the family of trees associated to $\mathcal{T}$ in (38). We label each tree $\mathcal{A}\in\mathbf{F}(\mathcal{T})$ with random variables, so that $\mathcal{A}$ is a random tree in the sense of Definition 3.2. Then, with $\Gamma_{l}^{\mathcal{A}}$ defined in (32) and (33), we have

[TABLE]

with

[TABLE]

Proof.

Take first $\mathcal{T}=\{\emptyset\}$. By definition, we have $\Delta_{l}(\{\emptyset\})=\Gamma_{l}^{\{\emptyset\}}=Q_{l+1}^{[n]}-Q_{l}.$ On the other hand we have $\mathbf{F}(\mathcal{T})=\mathbf{F}(\{\emptyset\})=\{\emptyset\}$ and $c(\emptyset)\Gamma_{l}^{\{\emptyset\}}=\Gamma_{l}^{\{\emptyset\}}=\Delta_{l}(\{\emptyset\}).$ So the equality (39) holds true.

Suppose now that (39) holds if $\left|\mathcal{T}\right|\leq q-1$ and let us prove it for $\left|\mathcal{T}\right|=q.$ Using first the recurrence formula (35) and then the recurrence hypothesis, we get

[TABLE]

Let $i\in\{1,..,{j}_{\emptyset}(\mathcal{T})\}.$ We have

[TABLE]

We recall that $\mathcal{A}(j_{1},...,j_{i})$ is defined in (37), and by construction we have $(\mathcal{A}(j_{1},...,j_{i}))_{k}^{\prime}=\mathcal{A}_{j_{k}}.$ Then, we use the recurrence formula (33) in the definition of $\Gamma_{l}^{\mathcal{A}_{i}(j_{1},...,j_{i})}$ and we obtain

[TABLE]

Moreover, we have

[TABLE]

so the term in (3.2) is equal to

[TABLE]

We conclude that

[TABLE]

Our aim now is to compute $\Gamma_{l}^{\mathcal{A}}$ in an explicit way. For a tree $\mathcal{A}$ and for a subset of leaves $\Lambda\subset\mathcal{E}(\mathcal{A})$, we define $\mathcal{A}_{\Lambda}=\mathcal{A}\setminus\Lambda$: we cut the extreme nodes which belong to $\Lambda$. Notice that $\mathcal{A}_{\Lambda}$ is in general no longer a tree: for example, if $\mathcal{A}=\{\emptyset,1,2,3\}$ and $\Lambda=\{2\}$ then $\mathcal{A}_{\Lambda}=\{\emptyset,1,3\}$ is not a tree, since the first and second axioms of Definition 3.1 are satisfied but not the third. We also stress that $\mathcal{A}_{\Lambda}$ may be the void set, in the case $\mathcal{A}=\{\emptyset\}$ and $\Lambda=\{\emptyset\}$. Thus, we regard $\mathcal{A}_{\Lambda}$ as a set (not a tree) which may be void as well (remember the convention (25)).

Suppose that $j_{\emptyset}(\mathcal{A})=r.$ Our first concern is to make precise how $\Lambda\subset\mathcal{E}(\mathcal{A})$ is decomposed on each of the subtrees $\mathcal{A}_{i}^{\prime},i=1,...,r$. We define

[TABLE]

We stress that, if no descendant of $i$ belongs to $\Lambda$ , then we have $\{u\in\mathcal{A}_{i}^{\prime}:iu\in\Lambda\}=\varnothing$ (void set). We also have

[TABLE]

We define now $Q_{l}^{\mathcal{A}_{\Lambda}}$ recursively. First, if $\mathcal{A}_{\Lambda}=\varnothing$ (void set) or if $\mathcal{A}_{\Lambda}=\{\emptyset\}$ (ancestor) we define

[TABLE]

Otherwise we have $j_{\emptyset}(\mathcal{A})=r\geq 1$, and we define

[TABLE]

with $(\kappa_{1},...,\kappa_{r})=\kappa_{\emptyset}(\mathcal{A})$ and $\Lambda_{i}$ defined in (42).

Before going further, we construct the grid $G_{l}(\mathcal{A}_{\Lambda})$ in a similar way to $G_{l}(\mathcal{A})$ defined in (28). We denote $j_{\emptyset}^{\Lambda}(\mathcal{A}):=j_{\emptyset}(\mathcal{A})-\textup{Card}(\{1,...,j_{\emptyset}(\mathcal{A})\}\cap\Lambda).$ So $j_{\emptyset}^{\Lambda}(\mathcal{A})$ represents the number of sons of the ancestor $\emptyset$ which are not in $\Lambda$ (that is, those that are alive after killing the individuals from $\Lambda$). We also denote by $\{i_{1},...,i_{j_{\emptyset}^{\Lambda}(\mathcal{A})}\}=\{1,...,j_{\emptyset}(\mathcal{A})\}\setminus\Lambda$ the indices of the surviving sons. Then, we define (with the convention $\cup_{j=1}^{0}=\varnothing$)

[TABLE]

if $\mathcal{A}_{\Lambda}\not=\varnothing$ , and $G_{l}(\varnothing)=\{0,h_{l}\}$ . Here $\Lambda_{i}$ is the set defined in (42). So, we use the refinement procedure for $i_{j}$ only, and not for every $i=1,\dots,j_{\emptyset}(\mathcal{A})$ . In the case $\Lambda=\varnothing$ (void set) $G_{l}(\mathcal{A}_{\Lambda})$ coincides with $G_{l}(\mathcal{A})$ . As for Lemma 3.4, we can show that

[TABLE]

Note that we need to add the union with $\{0,h_{l}\}$ for the case $\mathcal{A}_{\Lambda}=\varnothing$ , i.e. when $\mathcal{A}=\Lambda=\{\emptyset\}$ . We denote

[TABLE]

the reordering of $G_{l}(\mathcal{A}_{\Lambda})$. We notice that for every $k=1,\dots,m$ one has $s_{k}-s_{k-1}=h_{l+p_{k}}$ for some $p_{k}=1,2,...,|\mathcal{A}_{\Lambda}|$. Thus, we produce a sequence $p(\mathcal{A}_{\Lambda},\kappa(\mathcal{A}_{\Lambda}))$ associated to $\mathcal{A}_{\Lambda}$, and we have

[TABLE]

Proposition 3.6.

Let $\Gamma_{l}^{\mathcal{A}}$ be defined by (32) and (33) and $Q_{l}^{\mathcal{A}_{\Lambda}}$ be defined by (44). Then

[TABLE]

The above sum includes $\Lambda=\varnothing$ (void set) and $\Lambda=\mathcal{E}(\mathcal{A}).$
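The index set of this sum is easy to enumerate. The sketch below lists every subset $\Lambda\subset\mathcal{E}(\mathcal{A})$ together with a sign $(-1)^{\textup{Card}(\Lambda)}$; this sign convention is an assumption, chosen to agree with the base case $\Gamma_{l}^{\{\emptyset\}}=Q_{l}^{\{\emptyset\}}-Q_{l}^{\varnothing}$ used in the proof. Trees are stored as sets of tuples and the function names are hypothetical.

```python
import itertools


def leaves(tree):
    """E(A): nodes of the tree that have no son."""
    return {u for u in tree if u + (1,) not in tree}


def signed_leaf_subsets(tree):
    """Yield (sign, Lambda) for every subset Lambda of the leaves of the tree.

    The sign is (-1)^Card(Lambda); there are 2^Card(E(A)) terms in total.
    """
    ext = sorted(leaves(tree))
    for size in range(len(ext) + 1):
        for subset in itertools.combinations(ext, size):
            yield (-1) ** size, frozenset(subset)


for sign, lam in signed_leaf_subsets({(), (1,), (2,)}):
    print(sign, sorted(lam))
```

For the tree with two sons of the root this gives four terms, one for each way of killing a subset of the two leaves.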

Before giving the proof of the above proposition, we need to get a more detailed description of the set $\Lambda$ and of the decomposition given in (42). Let $\mathcal{A}$ be such that $j_{\emptyset}(\mathcal{A})>0$ , so that $\emptyset\not\in\mathcal{E}(\mathcal{A})$ . Then, for any $\Lambda\subset\mathcal{E}(\mathcal{A})$ , we denote

[TABLE]

where $\Lambda_{i}$ is defined by (42). We define now the converse operation: given a sequence of sets $\Lambda_{i}^{\prime}\subset\mathcal{E(}\mathcal{A}_{i}^{\prime}),i=1,...,r$ we define

[TABLE]

In order to make precise the structure of $\Lambda^{\prime}$ we consider the sets of indices $J_{i}\subset\{1,...,r\}$ defined by

[TABLE]

We stress that for $i\in J_{3},$ the set $\Lambda_{i}^{\prime}$ is not void and does not contain the ancestor $\emptyset.$ Then the set $\Lambda^{\prime}$ defined in (50) is given by

[TABLE]

and we define

[TABLE]

Lemma 3.7.

Let $\mathcal{A}$ be a tree such that $j_{\emptyset}(\mathcal{A})=r>0$. Then, we have for any $\Lambda\subset\mathcal{E}(\mathcal{A})$ and any $\Lambda_{i}^{\prime}\subset\mathcal{E}(\mathcal{A}^{\prime}_{i})$, $i=1,\dots,r$

[TABLE]

Proof.

We just check the first equality. Let $\Lambda^{\prime}=D^{-1}(\Lambda_{1}^{\prime},...,\Lambda_{r}^{\prime}).$ We have to prove that for each $i=1,...,r$ we have $\Lambda_{i}^{\prime}=D_{i}(\Lambda^{\prime})$, where $D_{i}$ is the $i$-th coordinate of the map $D$ defined by (49). From (51), we have $\Lambda^{\prime}=\cup_{1\leq i\leq r:\Lambda^{\prime}_{i}\not=\varnothing}\{iu:u\in\Lambda^{\prime}_{i}\}$. Thus, $D_{i}(\Lambda^{\prime})=\varnothing$ if $\Lambda^{\prime}_{i}=\varnothing$ and $D_{i}(\Lambda^{\prime})=\{u:u\in\Lambda^{\prime}_{i}\}=\Lambda^{\prime}_{i}$ otherwise. The second equality is verified in a similar way. ∎

Proof of Proposition 3.6.

If $j_{\emptyset}(\mathcal{A})=0$ then $\mathcal{A}=\{\emptyset\}$ and $\Gamma_{l}^{\mathcal{A}}=Q_{l+1}^{[n]}-Q_{l}=Q_{l}^{\{\emptyset\}}-Q_{l}^{\varnothing}=Q_{l}^{\mathcal{A}_{\Lambda_{1}}}-Q_{l}^{\mathcal{A}_{\Lambda_{2}}}$, where $\Lambda_{1}$ is the void set and $\Lambda_{2}=\{\emptyset\}.$ So (48) holds.

If $j_{\emptyset}(\mathcal{A})=r>0$ then, using the recurrence hypothesis

[TABLE]

Let $\Lambda=D^{-1}(\Lambda_{j_{1}},...,\Lambda_{j_{r}}).$ We have $\textup{Card}(\Lambda_{j_{1}})+\dots+\textup{Card}(\Lambda_{j_{r}})=\textup{Card}(\Lambda)$ , and according to (44)

[TABLE]

Since every $\Lambda\subset\mathcal{E(A)}$ may be decomposed in this way by Lemma 3.7, we get

[TABLE]

3.3 Tree representation of the approximation schemes

We define now a family of trees which describes our approximation schemes. For $\nu\geq 1$ and $l\geq 0$ , let us define the tree $\mathcal{T}_{l}^{\nu}$ as follows:

[TABLE]

with $q_{i}(l,\nu),m(l,\nu)$ given in (18), (19) and the convention $\cup_{i=1}^{0}\{...\}=\varnothing$ (void set). These trees are defined by recurrence, and it is not clear at first glance that the induction ends. This is guaranteed by the next lemma.

Lemma 3.8.

Let $\alpha>0$ . Let us denote for $k\in\mathbb{N}$ ,

[TABLE]

We have $\cup_{k\in\mathbb{N}}\mathcal{H}_{k}=\mathbb{N}^{2}$ and

[TABLE]

In particular, the recursion defining $\mathcal{T}^{\nu}_{l}$ in formula (54) ends for every $(\nu,l)\in\mathbb{N}^{2}$ .

Proof.

Since $\alpha>0$ , we have $(\nu,l)\in\mathcal{H}_{\lceil\max((\nu-l)/\alpha-(l+1),0)\rceil}$ for any $(\nu,l)\in\mathbb{N}^{2}$ , which gives $\cup_{k\in\mathbb{N}}\mathcal{H}_{k}=\mathbb{N}^{2}$ . Let us first observe that for $(\nu,l)\in\mathcal{H}_{0}$ , we have $m(l,\nu)=1$ and thus $\mathcal{T}_{l}^{\nu}=\{\emptyset\}$ by (54). Therefore, the implication will prove that the recursion ends. Let us take then $(\nu,l)\in\mathcal{H}_{k+1}$ and $i$ such that $1\leq i\leq m(l,\nu)-1$ . The last inequality implies $\nu\geq(1+\alpha)li+\alpha i$ and then $q_{i}(l,\nu)\geq 0$ . Thus, we have to check that

[TABLE]

Since $\nu\leq\alpha+(1+\alpha)l+\alpha(k+1)$ , it is sufficient to prove

[TABLE]

After simplifications, this inequality is equivalent to

[TABLE]

which clearly holds true for every $l\in\mathbb{N},i\in\mathbb{N}^{\ast}$ since $\alpha\geq 0$ . ∎
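The recursion defining $\mathcal{T}_{l}^{\nu}$ can be explored numerically. The sketch below relies on two assumptions reconstructed from the properties used in the text, not on the displayed formulas themselves: $m(l,\nu)=\lceil\nu/(\alpha+(1+\alpha)l)\rceil$ and $q_{i}(l,\nu)=\lceil\nu+i-(1+\alpha)(l+1)(i-1)\rceil$, together with the relations $j_{\emptyset}(\mathcal{T}_{l}^{\nu})=m(l,\nu)-1$ and $(\mathcal{T}_{l}^{\nu})_{i}^{\prime}=\mathcal{T}_{l+1}^{q_{i}(l,\nu)}$. Trees are sets of tuples with the empty tuple as root, and all function names are hypothetical.

```python
import math


def ceil_tol(x):
    # Ceiling with a small tolerance against floating point round-off.
    return math.ceil(x - 1e-12)


def m(l, nu, alpha):
    # Assumed form of m(l, nu): equal to 1 as soon as nu <= alpha + (1 + alpha) l.
    return ceil_tol(nu / (alpha + (1 + alpha) * l))


def q(i, l, nu, alpha):
    # Assumed form of q_i(l, nu): nonincreasing in i, with q_1(l, nu) = nu + 1.
    return ceil_tol(nu + i - (1 + alpha) * (l + 1) * (i - 1))


def scheme_tree(l, nu, alpha):
    """T_l^nu: the root has m(l, nu) - 1 sons, and the subtree rooted at the
    i-th son is T_{l+1}^{q_i(l, nu)}."""
    tree = {()}
    for i in range(1, m(l, nu, alpha)):
        for u in scheme_tree(l + 1, q(i, l, nu, alpha), alpha):
            tree.add((i,) + u)
    return tree


# alpha = 1 corresponds to the Euler scheme in the SDE examples.
print(sorted(scheme_tree(0, 3, 1.0)))  # [(), (1,), (1, 1), (2,)]
```

With $\alpha=2$ the same order is reached with smaller trees, for instance `scheme_tree(0, 4, 2.0)` contains only the root and one son under these assumptions.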

Now, we explain how we associate an approximation scheme to a finite tree. In the following we will work with the specific trees $\mathcal{T}_{l}^{\nu}$ constructed above, but for the moment we consider a general finite tree $\mathcal{T}$ . We recall that ${j}_{\emptyset}(\mathcal{T})$ is the number of sons of the root $\emptyset$ , and for $1\leq i\leq j_{\emptyset}(\mathcal{T})$ , $\mathcal{T}_{i}^{\prime}=\{u\in\mathcal{U},iu\in\mathcal{T}\}$ is the subtree that is rooted at the node $i$ . For a finite tree $\mathcal{T}$ , we define the approximation scheme $\hat{Q}_{h_{l}}(\mathcal{T})$ as follows by induction.

If ${j}_{\emptyset}(\mathcal{T})=0$ then $\mathcal{T}=\{\emptyset\}$ and we put

[TABLE]

If ${j}_{\emptyset}(\mathcal{T})\geq 1$ we define by recurrence

[TABLE]

Since $\left|\mathcal{T}_{i}^{\prime}\right|=\left|\mathcal{T}\right|-1$ and the tree $\mathcal{T}$ is finite, this induction clearly ends.

Proposition 3.9.

(Tree representation of the approximations of order $\nu$ )

For every $\nu\geq 1,l\geq 0$ , we have

[TABLE]

where $\hat{P}_{h_{l}}^{\nu}$ is the approximation defined in (21). Consequently, we have

[TABLE]

with $k(l,\nu)$ defined in (24). In particular, taking $l=0$ (recall that $h_{0}=T)$ we obtain

[TABLE]

Proof.

We consider the sets $\mathcal{H}_{k}$ defined in (55) and prove the result by induction on $k$. For $(\nu,l)\in\mathcal{H}_{0}$ we have $m(l,\nu)=1$ and $\mathcal{T}_{l}^{\nu}=\{\emptyset\}$, so that $\hat{Q}_{h_{l}}(\mathcal{T}_{l}^{\nu})=P_{h_{l}}^{h_{l+1}}=\hat{P}_{h_{l}}^{\nu}$. Let $(\nu,l)\in\mathcal{H}_{k+1}$. From (54), we have $(\mathcal{T}_{l}^{\nu})_{i}^{\prime}=\mathcal{T}_{l+1}^{q_{i}(l,\nu)}$. Using Lemma 3.8 and the induction hypothesis, we get $\hat{Q}_{h_{l+1}}(\mathcal{T}_{l+1}^{q_{i}(l,\nu)})=\hat{P}_{h_{l+1}}^{q_{i}(l,\nu)}.$ We also have ${j}_{\emptyset}(\mathcal{T}_{l}^{\nu})=m(l,\nu)-1,$ so the recurrence formulas (56) for $\hat{Q}_{h_{l}}(\mathcal{T}_{l}^{\nu})$ and (21) for $\hat{P}_{h_{l}}^{\nu}$ coincide, proving the claim. ∎

We put the above formula in an alternative form which is more enlightening and easier to handle. We define

[TABLE]

Then, $\Delta_{l}(\{\emptyset\})=P_{h_{l}}^{h_{l+1}}-P_{h_{l}}^{h_{l}}$ and formula (56) is equivalent to

[TABLE]

This is precisely the operator defined in (35). We are now able to give the main result in this section, which is a consequence of Propositions 3.5 and 3.9.

Theorem 3.10.

Suppose that Hypotheses ( $H_{1}$ ) and ( $H_{2}$ ) hold true. Let $\nu\in\mathbb{N}$ be given and let $\mathcal{T}_{0}^{\nu}$ be the tree constructed in (54) for $l=0$ . Let $\mathbf{F}(\mathcal{T}_{0}^{\nu})$ be the family of trees associated to $\mathcal{T}_{0}^{\nu}$ in (38) and let $c(\mathcal{A})$ be given in (40). Then, we define

[TABLE]

and we have

[TABLE]

Remark 3.11.

The result presented in this section also holds in the more abstract framework described in the introduction, with semigroups defined on a vector space $F$ with seminorms $\|\|_{k}$ . Under ( $\overline{H_{1}}$ ) and ( $\overline{H_{2}}$ ), we have similarly $\left\|(\hat{Q}_{T}(\mathcal{T}_{0}^{\nu})-P_{T})f\right\|_{0}\leq C\left\|f\right\|_{k(0,\nu)}n^{-\nu}$ .

4 Probabilistic representation of the approximation semigroup for some Markov processes

All the results presented in the previous sections apply to an abstract semigroup $P_{t}$ with a family of approximation schemes corresponding to the abstract operators $Q_{l}$. If one wants to use a Monte Carlo algorithm, one needs some probabilistic representation of $Q_{l}$ in order to compute the approximation schemes. This probabilistic representation may be very different according to the problem at hand. Nonetheless, a crucial common issue is the variance of the estimator. More precisely, the approximation proposed in (60) has to be seen as a sum of correction terms, one for each $\mathcal{A}\in\mathbf{F}(\mathcal{T}^{\nu}_{0})$, and these correction terms can be calculated independently of one another. In contrast, within a given correction term it is very important to calculate jointly the terms appearing in $\mathbb{E}[\Gamma^{\mathcal{A}}_{0}f]$. This is a sum of $2^{\textup{Card}(\mathcal{E}(\mathcal{A}))}$ terms with rather close values, since $\mathbb{E}[Q^{\mathcal{A}_{\Lambda}}_{0}f]=P_{T}f+O(h_{1})$. If one used independent samples to compute each term, the correcting term $\mathbb{E}[c(\mathcal{A})\Gamma^{\mathcal{A}}_{0}f]$ would have roughly $c(\mathcal{A})^{2}\times 2^{\textup{Card}(\mathcal{E}(\mathcal{A}))}$ times the variance of the basic Monte Carlo estimator, which would make the approximation (60) inefficient. Fortunately, it is in general possible to do much better.

The goal of this section is to make precise the probabilistic representation of the approximation schemes when considering diffusion processes or Piecewise Deterministic Markov Processes (PDMP). In these cases, it is possible to specify a probabilistic representation of all the schemes $(Q^{\mathcal{A}_{\Lambda}}_{0},\Lambda\subset\mathcal{E}(\mathcal{A}))$ on the same probability space, in such a way that the variance of $c(\mathcal{A})\Gamma^{\mathcal{A}}_{0}f$ is bounded.

4.1 Probabilistic representation of $Q^{\mathcal{A}_{\Lambda}}_{0}$

We start by presenting a general framework. We consider a Polish space $\mathcal{Z}$ and a kernel $\Theta:\mathbb{R}_{+}\times\mathcal{Z}\times\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ such that the map $(t,z,x)\rightarrow\Theta(t,z,x)$ is measurable. Moreover, we consider an independent random variable $Z:\Omega\rightarrow\mathcal{Z}$ and define

[TABLE]

We will assume that for some $\beta\in\mathbb{N}$ and $\alpha>0,$ the estimates ($H_{1}$) and ($H_{2}$) hold true. We consider now a grid $\Pi=\{0=s_{0}<s_{1}<\dots<s_{m}=T\}$ and denote $\delta_{k}=s_{k+1}-s_{k}$. We also consider a sequence of independent copies $Z_{k}$ of $Z$ and define the random vector fields

[TABLE]

and the approximating flow defined by $X_{0}^{\Pi}(x)=x$ and

[TABLE]

To a tree $\mathcal{A}$ and to a subset $\Lambda\subset\mathcal{E(A)}$ we associate $Q_{0}^{\mathcal{A}_{\Lambda}}$ defined in (44) and the grid $\Pi_{0}(\mathcal{A}_{\Lambda})$ defined in (45). It is easy to check that we have the probabilistic representation

[TABLE]

Therefore, (60) (with $c(\mathcal{A})$ given in (40)) can be rewritten as

[TABLE]

It is clear that $\mathbb{E}[f(X_{T}^{\Pi_{0}(\mathcal{A}_{\Lambda})}(x))]$ (and consequently $\widehat{Q}_{T}(\mathcal{T}_{0}^{\nu})f(x)$) may be computed using Monte Carlo simulation, and Theorem 3.10 gives

[TABLE]

Remark 4.1.

The above estimate involves $\left\|f\right\|_{k(0,\nu),\infty}$, which requires much regularity for the test function $f$. However, under some supplementary regularity and non-degeneracy assumptions one may prove convergence in total variation distance. Precisely, one may consider measurable and bounded test functions and replace $\left\|f\right\|_{k(0,\nu),\infty}$ by $\left\|f\right\|_{\infty}$ in the estimate of the error. This has been done in Bally and Rey [8] for usual approximation schemes and the uniform grid (which corresponds in our framework to $\nu=1$, i.e. to $\mathcal{T}_{0}^{\nu}=\{\emptyset\}$). The supplementary hypotheses are the following. First, one has to assume that $Z:\Omega\rightarrow\mathbb{R}^{q}$ satisfies the so-called Doeblin condition: there exist $\varepsilon>0,r>0$ and $z\in\mathbb{R}^{q}$ such that for every measurable set $A\subset\{z^{\prime}:|z^{\prime}-z|<r\}$ one has $\mathbb{P}(Z\in A)\geq\varepsilon\lambda(A)$, where $\lambda$ is the Lebesgue measure. Moreover, one has to assume some non-degeneracy condition on the gradient of $\Theta$ with respect to $z$. However, the proof is technical and non-trivial, so we do not consider this possible extension in the present paper.

4.2 Probabilistic representation of $\Gamma^{\mathcal{A}_{\Lambda}}_{0}$

We now specify, on some examples, how to sample jointly the random variables $Z$ for all the schemes $X^{\Pi_{0}(\mathcal{A}_{\Lambda})}(x)$ for $\Lambda\subset\mathcal{E}(\mathcal{A})$ .

4.2.1 Approximation schemes for SDEs

We deal with approximation schemes for the $d$ dimensional diffusion process $X_{t}$ which solves the SDE (1):

[TABLE]

Here $\circ dW^{j}$ denotes the Stratonovich integral and $\overline{b}$ designates the drift coefficient that one obtains when passing from the Itô integral to the Stratonovich integral. We assume that the coefficients $\sigma_{j}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ and $b:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ are $C^{\infty}$ , bounded with bounded derivatives of any order.

We start with the Euler scheme. It corresponds to

[TABLE]

with $Z$ distributed as a standard normal random variable on $\mathbb{R}^{d}$. The finest discretization is $\Pi_{0}(\mathcal{A})=\{0=s_{0}<s_{1}<\dots<s_{m}=h_{0}=T\}$, and one therefore needs $m$ independent random variables $Z_{0},\dots,Z_{m-1}$. The grids $\Pi_{0}(\mathcal{A}_{\Lambda})$ with $\Lambda\subset\mathcal{E}(\mathcal{A})$ are sub-grids of $\Pi_{0}(\mathcal{A})$. For some indices $i_{k}$, such a sub-grid goes directly from $s_{i_{k}}$ to $s_{i_{k}+n}$, while the uniform discretization of $[s_{i_{k}},s_{i_{k}+n}]$ is contained in $\Pi_{0}(\mathcal{A})$. One then takes $\frac{Z_{i_{k}}+\dots+Z_{i_{k}+n-1}}{\sqrt{n}}$ as the corresponding normal variable for $\Pi_{0}(\mathcal{A}_{\Lambda})$.
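The effect of this coupling on the variance can be observed on a toy example. The sketch below treats the simplest case $\mathcal{A}=\{\emptyset\}$, where the fine grid has $n$ steps of size $T/n$ and the coarse grid is $\{0,T\}$. It assumes a one-dimensional Euler step of the form $x+b(x)\delta+\sigma(x)\sqrt{\delta}z$; the coefficients `b`, `sigma`, the test function `f` and all other names are hypothetical choices made for the illustration.

```python
import math
import random
import statistics


def b(x):       # hypothetical bounded drift
    return math.sin(x)


def sigma(x):   # hypothetical bounded diffusion coefficient
    return 1.0 + 0.5 * math.cos(x)


def euler(x, steps):
    """Apply Euler steps; `steps` is a list of (delta, z) pairs."""
    for delta, z in steps:
        x = x + b(x) * delta + sigma(x) * math.sqrt(delta) * z
    return x


def sample_triple(x0, T, n, rng):
    """Return (fine, coupled coarse, independent coarse) Euler samples."""
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    fine = euler(x0, [(T / n, z) for z in zs])
    z_bar = sum(zs) / math.sqrt(n)      # standard normal built from the zs
    coarse = euler(x0, [(T, z_bar)])
    coarse_indep = euler(x0, [(T, rng.gauss(0.0, 1.0))])
    return fine, coarse, coarse_indep


def f(x):
    return x * x


rng = random.Random(0)
draws = [sample_triple(0.5, 1.0, 5, rng) for _ in range(20000)]
var_coupled = statistics.variance(f(a) - f(c) for a, c, _ in draws)
var_indep = statistics.variance(f(a) - f(ci) for a, _, ci in draws)
print(var_coupled, var_indep)
```

Both differences have the same expectation, but the coupled one reuses the Gaussian increments of the fine grid and is therefore expected to have a smaller empirical variance.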

We now present the Ninomiya and Victoir scheme [21]. We use the following notation: for a vector field $V:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ , we define $\Phi_{V}$ to be the solution of the ODE

[TABLE]

and denote $\exp(tV)(x)=\Phi_{V}(x,t)$ . Then, we set

[TABLE]

Let $\rho$ be a Bernoulli random variable such that $P(\rho=1)=P(\rho=0)=\frac{1}{2}$ and let $Z\sim\mathcal{N}_{d}(0,I_{d})$ be an independent standard normal random variable on $\mathbb{R}^{d}$ . Then $\Theta(\delta,(Z,\rho),x)$ represents the Ninomiya and Victoir scheme. We denote

[TABLE]

One may show, adapting for example the proof of Theorem 1.18 in [3], that

[TABLE]

and more generally that ( $H_{1}$ ) holds with $\alpha=2$ and $\beta=6$ . Therefore, the tree $\mathcal{T}_{0}^{\nu}$ will be different from the one of the Euler scheme and shorter. To sample the Ninomiya and Victoir scheme on the grids $\Pi_{0}(\mathcal{A})$ , we take a sequence $(Z_{k},\rho_{k})_{k\in\{0,\dots,m-1\}}$ of independent copies of $(Z,\rho)$ . We define the corresponding flow by

[TABLE]

For each grid $\Pi_{0}(\mathcal{A}_{\Lambda})$ with $\Lambda\subset\mathcal{E}(\mathcal{A})$ , we again take $\frac{Z_{i_{k}}+\dots+Z_{i_{k}+n-1}}{\sqrt{n}}$ each time that the discretization goes from $s_{i_{k}}$ to $s_{i_{k}+n}$ , and take $\rho_{i_{k}}$ for the associated Bernoulli variable.

4.2.2 Approximation schemes for PDMPs

We consider the infinitesimal operator

[TABLE]

Here $(E,\mathcal{E})$ is a measurable space, $\nu$ is a finite measure on $E$ , $b,\lambda:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ are globally Lipschitz continuous functions, $c:E\times\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ is measurable, bounded and Lipschitz continuous with respect to $x$ , uniformly with respect to $z\in E$ . We set $\bar{\lambda}(x)=\lambda(x)/\|\lambda\|_{\infty}$ . We denote by $P_{t}$ the semigroup associated to the infinitesimal operator $L$ . The probabilistic representation of $P_{t}$ is given in the following way.

Let $J$ be a Poisson process with intensity $\nu(E)\|\lambda\|_{\infty}$ , and a sequence of independent random variables $(Z_{k},U_{k})_{k\in\mathbb{N}}$ such that

[TABLE]

and $Z_{k}$ being independent of $U_{k}$ . Then, we define $X_{t}(x)$ as the solution of

[TABLE]

where $T_{k}$ is the time of the $k$-th jump of $J$. It is well known that under our hypotheses the above equation has a unique solution: between two jump times, for $t\in[T_{k-1},T_{k})$, it follows the deterministic curve given by $dX_{t}=b(X_{t})dt$, and at time $T_{k}$ it makes the jump $c(Z_{k},X_{T_{k}-})$ if $U_{k}\leq\bar{\lambda}(X_{T_{k}-})$. This process satisfies $P_{t}f(x)=\mathbb{E}(f(X_{t}(x)))$. This is one particular possible description of PDMPs. There is a huge literature concerning this type of process and its applications, see e.g. [15].
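The probabilistic representation above is a thinning construction, and a minimal sketch of it may help. The sketch assumes that the flow of $dX_{t}=b(X_{t})dt$ is available in closed form; the callables `flow`, `lam_bar`, `jump` and `sample_mark` are hypothetical placeholders for $\Phi_{b}$, $\bar{\lambda}$, $c$ and the law of $Z_{k}$, and `total_rate` stands for $\nu(E)\|\lambda\|_{\infty}$.

```python
import random

def simulate_pdmp(x0, T, flow, lam_bar, jump, sample_mark, total_rate, rng):
    """Sample X_T by thinning a Poisson process of intensity total_rate.

    flow(x, t)     : position at time t of the ODE started from x (assumed known)
    lam_bar(x)     : acceptance probability lambda(x) / ||lambda||_inf
    jump(z, x)     : jump amplitude c(z, x)
    sample_mark(r) : draws a mark Z_k using the generator r
    """
    t, x = 0.0, x0
    while True:
        wait = rng.expovariate(total_rate)   # time until the next candidate jump
        if t + wait >= T:
            return flow(x, T - t)            # no further candidate before T
        x = flow(x, wait)                    # deterministic motion up to T_k-
        t += wait
        z, u = sample_mark(rng), rng.random()
        if u <= lam_bar(x):                  # candidate jump accepted
            x = x + jump(z, x)

# Hypothetical toy model: unit drift, halving jumps, acceptance probability min(x, 1).
rng = random.Random(0)
x_T = simulate_pdmp(1.0, 1.0, flow=lambda x, t: x + t,
                    lam_bar=lambda x: min(max(x, 0.0), 1.0),
                    jump=lambda z, x: -x / 2.0,
                    sample_mark=lambda r: None, total_rate=3.0, rng=rng)
```

When `lam_bar` is identically zero, every candidate jump is rejected and the routine simply returns the deterministic flow at time $T$.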

We now define the approximation scheme. Let $\tilde{X}_{t}(x)$ be the solution of

[TABLE]

which is the solution of (67) for $b\equiv 0$ . Then, we define

[TABLE]

where $\tilde{P}_{h}$ is the semigroup associated to $\tilde{X}$. On the discretization time-grid $\Pi=\{0=s_{0}<s_{1}<\dots<s_{m}=h_{0}=T\}$, this amounts to considering

[TABLE]

where $\Theta$ is defined recursively by $\Theta(\delta,(0,()),x)=x+b(x)\delta$ and

[TABLE]

Here, the generic Polish space $\mathcal{Z}$ introduced in Subsection 4.1 is $\cup_{n\in\mathbb{N}}\{(n,(z_{k},u_{k})_{1\leq k\leq n}):z_{k}\in E,u_{k}\in[0,1]\}$. Let us note that other approximation schemes are possible. The interest of (69) is that all the schemes $X^{\Pi_{0}(\mathcal{A}_{\Lambda})}$ with $\Lambda\subset\mathcal{E}(\mathcal{A})$ are sampled from the same random variables $(J_{t})_{t\in[0,T]}$ and $(Z_{k},U_{k})_{1\leq k\leq J_{T}}$ and are likely to have very similar jumps, which helps to reduce the variance of $\Gamma^{\mathcal{A}}_{0}f$.

Let us assume now that $b$ , $\lambda$ and $x\mapsto c(z,x)$ are $C^{\infty}$ , bounded with bounded derivatives (uniformly in $z$ ). We check that ( $H_{2}$ ) holds in this case. To do so, we introduce $\Phi_{b}(x,t)$ the flow associated to $b$ , see equation (66). We have (see e.g. [15], Lemma 7.3.3)

[TABLE]

By differentiating this equation, we get by induction on $k$ (we clearly have $\|P_{t}f\|_{\infty}\leq\|f\|_{\infty}$ for $k=0$ ) that

[TABLE]

using the Faà di Bruno formula and Gronwall’s lemma. This property ( $H_{2}$ ) has been studied for more general jump SDEs very recently by Bally, Goreac and Rabiet [6]. The next lemma proves that ( $H_{1}$ ) also holds with $\alpha=1$ and $\beta=2$.

Lemma 4.2.

We assume that $b$ , $\lambda$ and $x\mapsto c(z,x)$ are $C^{\infty}$ , bounded with bounded derivatives, uniformly in $z$ . Then, we have

[TABLE]

Proof.

From (70) and $f(\Phi_{b}(x,t))=f(x)+\int_{0}^{t}b(\Phi_{b}(x,s))\nabla f(\Phi_{b}(x,s))ds$ , we get

[TABLE]

which leads to

[TABLE]

Since $P_{t}f=f+\int_{0}^{t}P_{t-s}Lfds$ , we get from (71) $\|P_{t}f-f-tLf\|_{k,\infty}\leq Ct^{2}\|Lf\|_{k+1,\infty}\leq Ct^{2}\|f\|_{k+2,\infty}$ . We define $\tilde{\Phi}_{b}(x,t)=x+b(x)t$ which has bounded derivatives, uniformly in $t\in[0,T]$ . Let $\tilde{L}f(x)=\int_{E}(f(x+c(z,x))-f(x))\lambda(x)\nu(dz)$ be the infinitesimal generator of (68). We have similarly

[TABLE]

We obviously have $\|f\circ\tilde{\Phi}_{b}(\cdot,t)-f-tb\nabla f\|_{k,\infty}\leq Ct^{2}\|f\|_{k+2,\infty}.$ Since $\|f\circ\tilde{\Phi}_{b}(\cdot,t)-f\|_{k,\infty}\leq Ct\|f\|_{k+1,\infty}$, we also have $\|t\tilde{L}f\circ\tilde{\Phi}_{b}(\cdot,t)-t(Lf-b\nabla f)\|_{k,\infty}\leq Ct^{2}\|f\|_{k+1,\infty}$, which yields $\|P_{h}f-P^{h}_{h}f\|_{k,\infty}\leq C\|f\|_{k+2,\infty}h^{2}$. ∎

4.3 Estimates of the variance on the Euler scheme for SDEs

The aim of this section is to estimate the variance of the algorithm given in (63), (64) in the case of the Euler scheme for SDEs. We consider the $\mathbb{R}^{d}$ valued diffusion process solution of the SDE (1). Given a random grid

[TABLE]

we construct the corresponding Euler scheme by $X_{0}^{\Pi}=x$ and

[TABLE]

In (64), we have constructed an approximation scheme based on a linear combination of $X_{T}^{\Pi_{0}(\mathcal{A}_{\Lambda})}(x).$ We use here all the notation introduced there. We denote

[TABLE]

Remark 4.3.

The important point here is that all the Euler schemes $X_{T}^{\Pi_{0}(\mathcal{A}_{\Lambda})}(x)$ for $\Lambda\subset\mathcal{E}(\mathcal{A})$ are defined on the same probability space and constructed with the same Brownian motion $W$. Thus, all the values of $X_{T}^{\Pi_{0}(\mathcal{A}_{\Lambda})}(x)$ are close. When summing according to (72), we may then expect that $\Upsilon_{\mathcal{A}}f(x)$ is small with a small variance. This is made precise in the next theorem.

Theorem 4.4.

Suppose that $\sigma_{j},b\in C_{b}^{\infty}(\mathbb{R}^{d})$ . Then, we have for any $f\in C_{b}^{\infty}(\mathbb{R}^{d})$ ,

[TABLE]

In particular, we have $\mathbb{E}[(c(\mathcal{A})\Upsilon_{\mathcal{A}}f(x))^{2}]\leq C$ and thus $Var[c(\mathcal{A})\Upsilon_{\mathcal{A}}f(x)]\leq C$.

The proof needs some preparation. We use the alternative representation of the random grids $G_{0}(\mathcal{A})$ and $G_{0}(\mathcal{A}_{\Lambda})$ with $\Lambda\subset\mathcal{E(A)}$ given by Lemma 3.4 and (45). We recall that $\Pi_{0}(\mathcal{A})$ is the ordered grid $G_{0}(\mathcal{A})$ :

[TABLE]

We write $\mathcal{E(A)}=\{u^{(1)},\dots,u^{(r)}\}$ with $r=\textup{Card}(\mathcal{E(A)})$ . For $k\in\{1,\dots,r\}$ , there exists $i_{k}\in\{0,\dots,m\}$ such that $s_{i_{k}}=t_{0}(u^{(k)})$ . We check that these indices are distinct, and we assume without loss of generality that $i_{1}<\dots<i_{r}$ . These are the "extreme times". We also denote

[TABLE]

Example 4.5.

We consider $n=3$ , $\mathcal{A}=\{\emptyset,1,2,21\}$ with $\kappa(\emptyset)=(0,2)$ and $\kappa(2)=1$ . The grid $\Pi_{0}(\mathcal{A})$ is drawn below to scale, with $s_{i}-s_{i-1}=T/n^{l_{i}}$ , $l_{i}\in\{1,2,3\}$ .

[Figure: the grid $\Pi_{0}(\mathcal{A})$ on $[0,T]$, with the points $s_{0},s_{1},\dots,s_{9}$ marked.]

On this example, we have $\mathcal{E}(\mathcal{A})=\{1,21\}$ , $r=2$ , $i_{1}=0$ , $j_{1}=3$ , $i_{2}=5$ and $j_{2}=8$ . The three other grids $\Pi_{0}(\mathcal{A}_{\Lambda})$ needed in the computation of (72) are

[TABLE]

Notice that the grid $\Pi_{0}(\mathcal{A})$ contains, by construction, all the points in the uniform grid on $[s_{i_{k}},s_{j_{k}}]$ , that is $s_{i_{k}+j},$ with $j=1,\dots,n$ . But, if $u^{(k)}\in\Lambda$ , the grid $\Pi_{0}(\mathcal{A}_{\Lambda})$ is not refined between $s_{i_{k}}$ and $s_{j_{k}}$ and does not contain $s_{i_{k}+j}$ with $j=1,\dots,n-1$ . We deduce the next lemma.

Lemma 4.6.

Let $\Lambda\subset\mathcal{E}(\mathcal{A})$. We write $\Lambda=\{u^{(k_{1})},\dots,u^{(k_{\ell})}\}$ with $\ell=\textup{Card}(\Lambda)$ and set $\mathcal{I}^{\Lambda}=\{0,\dots,m\}\setminus\left(\cup_{\ell^{\prime}=1}^{\ell}\{i_{k_{\ell^{\prime}}}+1,\dots,j_{k_{\ell^{\prime}}}-1\}\right)$. Then, $\Pi_{0}(\mathcal{A}_{\Lambda})=\{s_{i},i\in\mathcal{I}^{\Lambda}\}$.
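As an illustration of Lemma 4.6, the following sketch computes the index set $\mathcal{I}^{\Lambda}$ for the data of Example 4.5 ($m=9$, $(i_{1},j_{1})=(0,3)$, $(i_{2},j_{2})=(5,8)$). The function name and the 0-based encoding of $\Lambda$ are hypothetical conventions chosen for the example.

```python
def subgrid_indices(m, extreme, selected):
    """Indices i such that s_i belongs to Pi_0(A_Lambda).

    m        : index of the last point of the finest grid {s_0, ..., s_m}
    extreme  : list of pairs (i_k, j_k), one per leaf u^(k)
    selected : set of 0-based positions k such that u^(k) is in Lambda
    """
    removed = set()
    for k in selected:
        i_k, j_k = extreme[k]
        removed.update(range(i_k + 1, j_k))   # interior points i_k+1, ..., j_k-1
    return [i for i in range(m + 1) if i not in removed]

# Data of Example 4.5.
extreme = [(0, 3), (5, 8)]
grid_none = subgrid_indices(9, extreme, set())    # Lambda empty: the finest grid
grid_first = subgrid_indices(9, extreme, {0})     # Lambda = {u^(1)}
grid_both = subgrid_indices(9, extreme, {0, 1})   # Lambda = E(A)
```

For $\Lambda=\mathcal{E}(\mathcal{A})$ only the indices $0,3,4,5,8,9$ remain, i.e. both refined intervals are crossed in a single step.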

We also recall that, in order to construct our scheme (see (63)), we have considered a kernel $\Theta_{\rho}$ and a sequence of independent random variables $Z_{k}$ and $\rho_{k},$ and we have defined the vector fields $\theta_{k}(x)=\Theta_{\rho_{k}}(\delta_{k},\sqrt{\delta_{k}}Z_{k},x)$ with $\delta_{k}=s_{k+1}-s_{k}$ (see (62)). In the case of the Euler scheme, we have a special representation of these random variables and of these operators. We define

[TABLE]

This corresponds to the quantity defined in (62) with $\delta_{k}=s_{k+1}-s_{k}$ and $Z_{k}=(s_{k+1}-s_{k})^{-1/2}(W_{s_{k+1}}^{j}-W_{s_{k}}^{j}).$ Moreover, for $k=1,...,r=\textup{Card}(\mathcal{E(A)})$ we define

[TABLE]

with the convention $\left(\prod_{j=j_{k-1}}^{i_{k}-1}\right)\theta_{j}(x)=x$ if $i_{k}=j_{k-1}$ . This represents the flow of the approximation scheme which runs from $s_{j_{k-1}}$ to $s_{i_{k}}$ , and that is common to all the grids $\Pi_{0}(\mathcal{A}_{\Lambda})$ for $\Lambda\subset\mathcal{E(A)}$ . We also define the flow between $s_{j_{k}}$ and $s_{m}=T$ :

[TABLE]

We now specify if we use or not the refined grid on the interval $[s_{i_{k}},s_{j_{k}}]$ . For the case where the grid is refined (i.e. when $u^{(k)}\not\in\Lambda$ ), we define

[TABLE]

This is the Euler scheme which starts from $\Psi_{k}(x)$ and runs from $s_{i_{k}}$ to $s_{j_{k}}=s_{i_{k}+n}$ using the uniform step. Instead, for the coarse discretization which goes from $s_{i_{k}}$ to $s_{i_{k}+n}$ directly in one single step (i.e. when $u^{(k)}\in\Lambda$ ), we set

[TABLE]

Now, we are able to define the flow of the whole Euler scheme on the grid $\Pi_{0}(\mathcal{A}_{\Lambda})$. If $u^{(k)}\not\in\Lambda$, we use $\Phi_{k}$ in order to go from $s_{j_{k-1}}$ to $s_{j_{k}}$. Instead, if $u^{(k)}\in\Lambda$, we use $\phi_{k}$. Thus, we define, for $k=1,...,r$

[TABLE]

From Lemma 4.6, we get

[TABLE]

As a consequence, we have

[TABLE]

with $r=\textup{Card}(\mathcal{E(A)})$ and $\Gamma_{r}^{\varnothing}(f\circ\Psi_{r})$ defined in (80). We are now in the framework of Appendix A. The above formula has to be understood in the following way: $\theta_{k}^{\Lambda}$ represents the approximating flow associated to the grid $\Pi_{0}(\mathcal{A}_{\Lambda})$ which runs from $s_{j_{k-1}}$ to $s_{i_{k}}.$ So when $k=r,$ we arrive in $s_{i_{r}}.$ This is the last "extreme time". After this, we run with $\Psi_{r+1}$ up to $s_{m}=T.$

Lemma 4.7.

With the notation above, we define families $\mathcal{X}_{k}$ of $2^{k}$ elements of $\mathbb{R}^{d}\times\{-1,1\}$ as follows. We set $\mathcal{X}_{0}=\{(x,1)\}$ and for $k\in\{1,\dots,r\}$ , we define

[TABLE]

where $\mathcal{X}_{k-1}=\{(x^{k-1}_{j},\epsilon^{k-1}_{j}),1\leq j\leq 2^{k-1}\}$ and $\cup$ has to be understood as the concatenation symbol. Then, we have

[TABLE]

Lemma 4.7 is obvious but important for simulation purposes: by branching, it is possible to simulate at the same time all the values of $(X_{T}^{\Pi_{0}(\mathcal{A}_{\Lambda})}(x),(-1)^{\textup{Card}(\Lambda)})$ as explained in Subsection 5.1. More precisely, there is no need to store $\Lambda$ : adding the sign $\epsilon$ to the state space makes the branching dynamics Markovian.

Proof of Theorem 4.4.

Our aim now is to check that $\Phi_{k}$ and $\phi_{k}$ verify the hypothesis of Proposition A.1. Standard estimates concerning Euler schemes (see the short sketch below) give

[TABLE]

and the same estimate holds for $\phi_{k}.$ Then, as a consequence of Proposition A.1 we obtain

[TABLE]

Notice that $s_{j_{k}}-s_{i_{k}}=Tn^{-\left|u^{(k)}\right|}$ where $s_{i_{k}}=t_{0}(u^{(k)})$ . Thus, from Lemma A.3 we get easily that $\left\|\Theta_{k}-\prod_{j=i_{k}}^{j_{k}-1}\theta_{j}\right\|_{r,4,\infty}\leq\frac{C}{n^{\left|u^{(k)}\right|}}$ and then

[TABLE]

by using the Faà di Bruno formula. Then (73) follows. Moreover, we have

[TABLE]

One checks easily by induction that $\sum_{u\in\mathcal{A}}j_{u}(\mathcal{A})\leq\sum_{u\in\mathcal{E}(\mathcal{A})}\left|u\right|$ , which gives (73).

We now give a sketch of the proof of (74). We consider the grid $\Pi_{0}(\mathcal{A})=\{0=s_{0}<s_{1}<....<s_{m}=T\}$ given at the beginning of this section. The corresponding Euler scheme on $[0,T]$ is defined by

[TABLE]

where $\tau(s)=s_{i}$ for $s\in[s_{i},s_{i+1})$. Then, we have $\Phi_{k}(x)=X_{h_{l}}(x).$ Using the Burkholder-Davis-Gundy inequality and the fact that the coefficients are bounded, we get $\mathbb{E}(\left|X_{r}(x)-x\right|^{p})\leq C$. Moreover, the first derivatives satisfy

[TABLE]

Since $\nabla\sigma_{j}$ and $\nabla b$ are bounded, using the Burkholder-Davis-Gundy inequality and Gronwall’s lemma we get $\mathbb{E}(\left|\nabla X_{r}(x)\right|^{p})\leq C.$ For higher order derivatives, the proof is similar.∎

Remark 4.8.

We have a better estimate for (76) when $\sigma(x)$ is constant. In this case, we have from Lemma A.3

[TABLE]

which leads to $\mathbb{E}(\Upsilon_{\mathcal{A}}^{2}f(x))\leq\frac{C}{n^{3\sum_{u\in\mathcal{E(A)}}\left|u\right|}}$ instead of (73). When $\sigma(x)=0$, we even have $\left\|\Phi_{k}-\phi_{k}\right\|_{1,p,\infty}\leq\frac{C}{n^{2\left|u^{(k)}\right|}}$ and thus $\mathbb{E}(\Upsilon_{\mathcal{A}}^{2}f(x))\leq\frac{C}{n^{4\sum_{u\in\mathcal{E(A)}}\left|u\right|}}$.

Remark 4.9.

Since $c(\mathcal{A})=O(n^{\sum_{u\in\mathcal{E(A)}}\left|u\right|})$ , we get $\mathbb{E}(|c(\mathcal{A})\Upsilon_{\mathcal{A}}f(x)|)=O(n^{(1-a)\sum_{u\in\mathcal{E(A)}}\left|u\right|})$ with $a=2$ if $\sigma=0$ , $a=3/2$ when $\sigma$ is a constant function. Thus, the computation of some terms in the sum (60) is useless: we can drop the terms $\mathbb{E}[\Gamma^{\mathcal{A}}_{0}]$ for any $\mathcal{A}$ such that $(a-1)\sum_{u\in\mathcal{E(A)}}\left|u\right|\geq\nu$ . More precisely, $\hat{Q}^{\prime}_{T}(\mathcal{T}_{0}^{\nu})=Q_{0}+\sum_{\mathcal{A}\in\mathbf{F}(\mathcal{T}_{0}^{\nu}):(a-1)\sum_{u\in\mathcal{E(A)}}\left|u\right|<\nu}c(\mathcal{A})\mathbb{E}[\Gamma^{\mathcal{A}}_{0}]$ also satisfies $\left\|(\hat{Q}^{\prime}_{T}(\mathcal{T}_{0}^{\nu})-P_{T})f\right\|_{\infty}\leq C_{l}\left\|f\right\|_{k(0,\nu),\infty}n^{-\nu}$ .

For example, the tree $\mathcal{A}=\{\emptyset,1,11,2,21\}\in\mathbf{F}(\mathcal{T}^{4}_{0})$ is such that $\sum_{u\in\mathcal{E(A)}}\left|u\right|=4$ and its calculation is useless for an approximation of order $4$ for ODEs ( $\sigma=0$ ).
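The selection rule of Remark 4.9 is easy to automate. The sketch below encodes a tree in Neveu notation as a set of tuples of integers (a hypothetical encoding chosen here), extracts the leaves $\mathcal{E}(\mathcal{A})$, and tests the criterion $(a-1)\sum_{u\in\mathcal{E}(\mathcal{A})}|u|\geq\nu$.

```python
def leaves(tree):
    """Leaves E(A) of a tree given as a set of words (tuples of ints):
    the words that have no child in the tree."""
    return {u for u in tree
            if not any(len(v) == len(u) + 1 and v[:len(u)] == u for v in tree)}

def can_drop(tree, a, nu):
    """Criterion of Remark 4.9: (a - 1) * sum of |u| over the leaves >= nu."""
    return (a - 1) * sum(len(u) for u in leaves(tree)) >= nu

# A = {emptyset, 1, 11, 2, 21}, the tree discussed above.
tree = {(), (1,), (1, 1), (2,), (2, 1)}
drop_for_ode = can_drop(tree, a=2, nu=4)           # sigma = 0
drop_for_const_sigma = can_drop(tree, a=1.5, nu=4)  # constant sigma
```

On this tree the leaves are $11$ and $21$, so the sum equals $4$: the term can be skipped for ODEs at order $4$, but not when $\sigma$ is a nonzero constant.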

5 Numerical results

5.1 Implementation

First, we have to calculate the tree $\mathcal{T}^{\nu}_{0}$ given by Equation (54) as a function of the desired order $\nu$ of convergence. To calculate this tree, we only have to know $\nu$ and the coefficient $\alpha$ that characterizes the order of convergence of the elementary scheme (see hypothesis $(H_{1})$). For the Euler scheme, we have $\alpha=1$. For example, the trees corresponding to the approximations of order $\nu=4$ and $\nu=6$ constructed with the Euler scheme are given in Figure 2. To compute these trees, we use the induction formula (54). To help the reader, we have indicated in each node the convergence order (i.e. the value of $q_{i}(l,\nu)$ in (54)) needed in the induction. For example, $q_{1}(0,4)=5$, $q_{2}(0,4)=4$ and $q_{3}(0,4)=3$ are the values indicated for the sons of the ancestor of the tree $\mathcal{T}^{4}_{0}$.

The second step consists in calculating the forest $\mathbf{F}(\mathcal{T}^{\nu}_{0})$ . According to Proposition 3.5, each tree of this forest represents a combination of elementary schemes. For example, using the Neveu notation, we have for the Euler scheme

[TABLE]

Let us note that the number of trees in the forest $\mathbf{F}(\mathcal{T}^{\nu}_{0})$ increases rapidly with $\nu$: for the Euler scheme ( $\alpha=1$ ), we have $\textup{Card}(\mathbf{F}(\mathcal{T}^{4}_{0}))=9$, $\textup{Card}(\mathbf{F}(\mathcal{T}^{6}_{0}))=67$, $\textup{Card}(\mathbf{F}(\mathcal{T}^{10}_{0}))=29135$. Nonetheless, these forests can be calculated once and for all.

The last step consists in calculating $\mathbb{E}[\Gamma^{\mathcal{A}}_{0}]$ for all the trees $\mathcal{A}\in\mathbf{F}(\mathcal{T}^{\nu}_{0})$. Then, we get the approximation by using (60). The key point here is to sample all the Euler schemes from the same Brownian path, as explained in Section 4. Figure 3 gives an illustration of the time grids that are involved in the calculation of $\Gamma^{\mathcal{A}}_{0}$, with $\mathcal{A}=\{\emptyset,1,11,2\}$.

To implement the Euler schemes involved in $\Gamma^{\mathcal{A}}$, it is possible to proceed “by hand”, i.e. to generate the random tree and then to simulate simultaneously the $2^{\textup{Card}(\mathcal{E}(\mathcal{A}))}$ schemes. This is easy to do for rather small trees $\mathcal{A}$, but the drawback is that it requires writing a routine for each $\mathcal{A}\in\mathbf{F}(\mathcal{T}^{\nu}_{0})$. Thus, this direct implementation is easy up to order three, but then it becomes rather cumbersome since the number of routines needed is rather large. Instead, it is possible to write a recursive routine that works for any $\mathcal{A}$. This routine starts from one initial value, calculates at the same time the $2^{\textup{Card}(\mathcal{E}(\mathcal{A}))}$ schemes, and branches each time it finds a leaf. It also calculates inductively the weight $\pm 1$ associated to each scheme. This routine works as follows. It takes as arguments a tree $\mathcal{A}$, a step $h_{l}$, and a set $\mathcal{X}=\{(x_{i},\epsilon_{i}),1\leq i\leq 2^{M}\}$ of initial values $x_{i}$ with weights $\epsilon_{i}\in\{-1,+1\}$. If $\mathcal{A}=\{\emptyset\}$, it samples independent increments $(W_{kh_{l}/n}-W_{(k-1)h_{l}/n})_{1\leq k\leq n}$ and calculates, for each $i$, the Euler scheme $\hat{X}^{c}_{i,h_{l}}$ on the coarse grid with time step $h_{l}$ starting from $x_{i}$ and the Euler scheme $\hat{X}^{f}_{i,h_{l}}$ on the fine grid with time step $h_{l}/n$ starting from $x_{i}$. It returns the set of $2^{M+1}$ values

[TABLE]

Otherwise, we have $\mathcal{A}=\{\emptyset,1\mathcal{A}^{\prime}_{1},\dots,r\mathcal{A}^{\prime}_{r}\}$ with $r\leq n$. We draw $\kappa(\emptyset)=(\kappa_{1},\dots,\kappa_{r})$, a uniform random variable on $\{(k_{1},\dots,k_{r}):0\leq k_{1}<\dots<k_{r}<n\}$ (see Remark 5.1). Then, we apply to all the initial values $k_{1}$ times the Euler scheme with time step $h_{l+1}=h_{l}/n$, conserving their weights. The resulting values are used as arguments to apply inductively the function with $\mathcal{A}^{\prime}_{1}$ and $h_{l+1}$, which covers one more interval of length $h_{l+1}$. This generates a set of values and weights to which we apply $k_{2}-k_{1}-1$ times the Euler scheme with time step $h_{l+1}$, and then we apply again inductively the function with $\mathcal{A}^{\prime}_{2}$ and $h_{l+1}$. We repeat this $r$ times, and finally apply $n-(k_{r}+1)$ times the Euler scheme with time step $h_{l+1}$. This inductive algorithm consists precisely in implementing the formula given in Lemma 4.7.
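The recursive routine just described can be sketched as follows for a scalar SDE. This is only an illustration under stated assumptions: trees are encoded as nested lists (`[]` for $\{\emptyset\}$, `[A1, ..., Ar]` for a root with $r$ sub-trees), the coefficients `b` and `sigma` are hypothetical callables, and the sign convention (the coarse scheme flips the weight) is an assumption consistent with the factor $(-1)^{\textup{Card}(\Lambda)}$ of Lemma 4.7.

```python
import math
import random

def euler_all(states, dt, dw, b, sigma):
    """Apply one Euler step with the same Brownian increment dw to every state."""
    return [(x + b(x) * dt + sigma(x) * dw, eps) for x, eps in states]

def branch(tree, h, states, n, b, sigma, rng):
    """Propagate all (value, weight) pairs over an interval of length h."""
    dt = h / n
    if not tree:  # A = {emptyset}: branch into the fine and the coarse scheme
        dws = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
        fine = states
        for dw in dws:
            fine = euler_all(fine, dt, dw, b, sigma)
        coarse = euler_all(states, h, sum(dws), b, sigma)
        # Assumed convention: the coarse scheme changes the sign of the weight.
        return fine + [(x, -eps) for x, eps in coarse]
    # Sorting a sample without replacement gives a uniform draw on S_r.
    kappa = sorted(rng.sample(range(n), len(tree)))
    pos = 0
    for k, sub in zip(kappa, tree):
        for _ in range(k - pos):  # plain Euler steps before sub-interval k
            states = euler_all(states, dt, rng.gauss(0.0, math.sqrt(dt)), b, sigma)
        states = branch(sub, dt, states, n, b, sigma, rng)  # sub-interval k
        pos = k + 1
    for _ in range(n - pos):      # remaining n - (k_r + 1) plain steps
        states = euler_all(states, dt, rng.gauss(0.0, math.sqrt(dt)), b, sigma)
    return states

# Tree A = {emptyset, 1, 11, 2}: child 1 carries {emptyset, 1}, child 2 is a leaf.
tree = [[[]], []]
out = branch(tree, 1.0, [(0.4, 1)], n=3,
             b=lambda x: 1.0, sigma=lambda x: 0.2, rng=random.Random(0))
```

With the constant coefficients used here, every Euler scheme is exact along the shared Brownian path, so the four returned values coincide up to rounding and the weights sum to zero; this gives a cheap consistency check of the branching logic.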

Remark 5.1.

To sample a uniform random variable on $\mathcal{S}_{r}:=\{(k_{1},\dots,k_{r}):0\leq k_{1}<\dots<k_{r}<n\}$ for $r\in\{1,\dots,n\}$ , we can proceed as follows. If $r=1$ , we simply draw a uniform r.v. on $\{0,\dots,n-1\}$ . For $r\geq 2$ , we proceed by induction and draw a uniform random variable $(\kappa^{\prime}_{1},\dots,\kappa^{\prime}_{r-1})$ on $\mathcal{S}_{r-1}$ . Then, we draw a uniform random variable $\kappa^{\prime}_{r}$ on $\{0,\dots,n-1\}\setminus\{\kappa^{\prime}_{1},\dots,\kappa^{\prime}_{r-1}\}$ . This can be done by sampling an independent random variable $\xi$ that is uniform on $\{0,\dots,n-r\}$ and then set $\kappa^{\prime}_{r}=\xi+\sum_{i=1}^{r-1}\mathbf{1}_{\xi+(i-1)\geq\kappa^{\prime}_{i}}$ . Last, we sort the $\kappa^{\prime}$ , which produces a vector $(\kappa_{1},\dots,\kappa_{r})$ that is uniformly distributed on $\mathcal{S}_{r}$ .
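The procedure of Remark 5.1 translates directly into code. In the sketch below, the helper names are hypothetical; `insert_by_rank` implements the formula $\kappa^{\prime}_{r}=\xi+\sum_{i=1}^{r-1}\mathbf{1}_{\xi+(i-1)\geq\kappa^{\prime}_{i}}$, applied to the already sorted previous values.

```python
import random

def insert_by_rank(kappa_prev, xi):
    """Return the xi-th (0-based) integer that is not in the sorted tuple kappa_prev."""
    return xi + sum(1 for i, k in enumerate(kappa_prev, start=1)
                    if xi + (i - 1) >= k)

def sample_sorted_tuple(n, r, rng):
    """Draw (kappa_1 < ... < kappa_r) uniformly in {0, ..., n-1} by induction on r."""
    kappa = (rng.randrange(n),)                # case r = 1
    for size in range(2, r + 1):
        xi = rng.randrange(n - size + 1)       # uniform on {0, ..., n - size}
        new_value = insert_by_rank(kappa, xi)  # uniform on the complement of kappa
        kappa = tuple(sorted(kappa + (new_value,)))
    return kappa

kappa = sample_sorted_tuple(5, 3, random.Random(1))
```

For instance, with previous values $(1,3)$ and $n=5$, the three possible values of $\xi$ are mapped to $0$, $2$ and $4$, which are exactly the integers left available.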

Now that we have an algorithm that is able to calculate $\mathbb{E}[\Gamma^{\mathcal{A}}_{0}]$ for any tree $\mathcal{A}$, we just have to approximate these quantities for all the trees $\mathcal{A}\in\mathbf{F}(\mathcal{T}^{\nu}_{0})$ and then to sum these contributions according to (60). To decide how many samples $N_{\mathcal{A}}$ we use to approximate $\mathbb{E}[\Gamma^{\mathcal{A}}_{0}]$, we fix a desired precision $\varepsilon>0$, calculate the empirical variance $\hat{V}_{\mathcal{A}}$ of $c(\mathcal{A})\Gamma^{\mathcal{A}}_{0}$ on a small pilot sample and then take $N_{\mathcal{A}}$ such that $1.96\sqrt{\hat{V}_{\mathcal{A}}/N_{\mathcal{A}}}\approx\varepsilon$, so that all the terms have roughly the same statistical error, with a 95% confidence interval half-width equal to $\varepsilon$.
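The rule for choosing $N_{\mathcal{A}}$ amounts to solving $1.96\sqrt{\hat{V}_{\mathcal{A}}/N_{\mathcal{A}}}=\varepsilon$ for $N_{\mathcal{A}}$. A minimal sketch, with a hypothetical function name and a toy pilot sample:

```python
import math
import statistics

def required_samples(pilot, eps, z=1.96):
    """Smallest N such that z * sqrt(V / N) <= eps, where V is the
    empirical variance of the pilot sample of c(A) * Gamma_0^A."""
    v_hat = statistics.variance(pilot)
    return max(1, math.ceil(z * z * v_hat / (eps * eps)))

# Toy pilot sample (illustrative values only): empirical variance equal to 2.
n_samples = required_samples([0.0, 2.0], eps=0.1)
```

In practice the pilot sample should be large enough for $\hat{V}_{\mathcal{A}}$ to be a stable estimate before it is used to size the main run.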

5.2 Numerical results for an ODE

To visualize numerically the orders of convergence provided by (60) for the Euler scheme, it is more convenient to work with ODEs. In this case, the variance of the terms is very small and it is possible to observe the first five orders of convergence. In the particular case of a linear ODE $dX_{t}=k(\theta-X_{t})dt$, we can go further, but we can also check that the value of $\Gamma^{\mathcal{A}}_{0}$ is deterministic and does not depend on the uniform random variables $\kappa$. Thus, we have considered the following example

[TABLE]

with $X_{0}=-0.4$ and $\alpha=0.1$. The exact value is given by $X_{T}=\tanh(\textup{arctanh}(X_{0})+\alpha T)$. We have drawn on Figure 4, for $T=1$, the values of $\log(|X_{T}-\hat{\xi}^{n,\nu}_{T}|)$ as a function of $\log(T/n)$ with $\nu=2$, $\nu=3$, $\nu=4$ and $\nu=5$, where $\hat{\xi}^{n,\nu}_{T}$ is the estimator of $X_{T}$ given by equation (60) with $f(x)=x$. The corresponding values of the slopes are $2.003$, $3.025$, $4.056$ and $5.012$, which is in line with what is expected. All the values given on this example have a half-width of the 95% confidence interval that does not exceed $3\times 10^{-7}$. Our run for the approximation of order $\nu=6$ already gives with $n=6$ a value that is accurate up to $8\times 10^{-8}$: the exact value $-0.31280256721$ is already in the 95% confidence interval.

Last, let us mention that for this ODE, we have used the same approximation rule as for the SDE and calculated all the terms of (60). However, as noticed in Remark 4.9, it is possible to avoid the calculation of many terms.

5.3 Numerical results for an SDE

We now want to illustrate the orders of convergence for the approximation given by (60) for the Euler-Maruyama scheme. We consider the following SDE

[TABLE]

with $X_{0}=1$, $k=1$, $\sigma=0.2$. In Figure 5, we have plotted the approximation of $\mathbb{E}[X_{T}^{2}]$ with $T=1$ for the orders $\nu\in\{2,3,4\}$ as a function of $1/n$. We still denote by $\hat{\xi}^{n,\nu}_{T}$ the estimator of $\mathbb{E}[X_{T}^{2}]$ given by (60), using the approximation of order $\nu$ with $n$ time-steps. The half-width of the 95% confidence interval is about $2\times 10^{-4}$. The approximation of order $\nu=5$ is already at this level of precision for $n=5$, and we have indicated this value as a reference line for the other schemes. The convergence rates are again in line with what is expected.

5.4 Numerical results for a PDMP

We consider the TCP process with infinitesimal generator

[TABLE]

starting from $X_{0}=1$, and our goal is to approximate $\mathbb{E}[X_{T}]$, with $T=1$. Since the jumps are only downward, the jump intensity $\lambda(x)$ is bounded by $X_{0}\times e$ on $[0,1]$. We are thus in the framework of paragraph 4.2.2, and use the scheme described in (69). We denote again by $\hat{\xi}^{n,\nu}_{T}$ the estimator of $\mathbb{E}[X_{T}]$ given by (60), using the approximation of order $\nu$ with $n$ time-steps. In Figure 6, we have plotted the approximation of $\mathbb{E}[X_{T}]$ with $T=1$ for the orders $\nu\in\{2,3,4\}$ as a function of $1/n$. The half-width of the 95% confidence interval is about $7\times 10^{-4}$. The approximation of order $\nu=5$ is already at this level of precision for $n=5$, and we have indicated this value as a reference line for the other schemes. The plot is very similar to the one obtained in Figure 5. This demonstrates numerically that the approximations described by (60) are relevant for a wide range of processes and applications.

5.5 A rough complexity analysis

Now, let us do a rough complexity analysis to understand which order of approximation to use in practice. To make this derivation, we assume for the sake of simplicity that the variance corresponding to the term $c(\mathcal{A})\Gamma^{\mathcal{A}}_{0}$ is equal to $1$ for all $\mathcal{A}\in\mathbf{F}(\mathcal{T}^{\nu}_{0})$, $\nu\geq 1$. Thus, in this analysis, we will use the same number of samples for all these terms. We also suppose that we want to achieve a precision of order $\varepsilon>0$, with a standard error which is exactly $\varepsilon$. Then, we have the following.

1. For the approximation of order $1$, we use one Euler scheme with $n$ time steps and the standard error is $1/\sqrt{N}$, where $N$ is the number of samples. We take $n=\varepsilon^{-1}$ and $N=\varepsilon^{-2}$, and the calculation time (counted as the number of Euler iterations used) is $N\times n=\varepsilon^{-3}$.

2. For the approximation of order $2$, we have two terms corresponding to $\mathcal{A}=\{\emptyset\}$ and $\mathcal{A}=\{\emptyset,1\}$. The first one requires $n$ Euler iterations. The second one requires between $2n$ and $3n$ Euler iterations: due to the branching implementation, we only calculate $2n$ iterations when $\kappa(\emptyset)=n-1$ and $3n$ iterations when $\kappa(\emptyset)=0$. For simplicity, we will only consider in this computational cost analysis the worst case and count $3n$ iterations. Since the convergence is of order $2$, we take $n=\varepsilon^{-1/2}$. The standard error is $\sqrt{2/N}$, and we take $N=2\varepsilon^{-2}$. Thus, the calculation time is $N\times(n+3n)=8\varepsilon^{-5/2}$.

3. For the order 3, we have in addition to calculate $\mathbb{E}[\Gamma^{\mathcal{A}}_{0}]$ for $\mathcal{A}=\{\emptyset,1,11\}$ and $\mathcal{A}=\{\emptyset,1,2\}$, which require respectively $2n+3n=5n$ and $n+2\times 2n+3n=8n$ Euler iterations. We take $n=\varepsilon^{-1/3}$ to have an approximation of order $\varepsilon$. The standard error is $\sqrt{4/N}$, and we take $N=4\varepsilon^{-2}$. Thus, the calculation time is $N\times(4n+5n+8n)=68\varepsilon^{-7/3}$.

4. For the order 4, we have in addition to calculate $\mathbb{E}[\Gamma^{\mathcal{A}}_{0}]$ for $\mathcal{A}=\{\emptyset,1,11,111\}$, $\mathcal{A}=\{\emptyset,1,11,2\}$, $\mathcal{A}=\{\emptyset,1,2,21\}$, $\mathcal{A}=\{\emptyset,1,11,2,21\}$ and $\mathcal{A}=\{\emptyset,1,2,3\}$: they require respectively $7n$, $2n+2\times 3n+4n=12n$ (see Figure 3), $12n$, $3n+2\times 4n+5n=16n$ and $n+3\times 2n+3\times 3n+4n=20n$ Euler iterations. The overall cost is $17n+7n+2\times 12n+16n+20n=84n$. We then take $n=\varepsilon^{-1/4}$ and $N=9\varepsilon^{-2}$ to have a standard error $\varepsilon$. Thus, the calculation time is $84n\times N=756\varepsilon^{-9/4}$.

With this rough cost analysis, we would use:

• the approximation of order 2 rather than the approximation of order 1 if $8\varepsilon^{-5/2}<\varepsilon^{-3}$, i.e. $\varepsilon<1/64$,

• the approximation of order 3 rather than the approximation of order 2 if $68\varepsilon^{-7/3}<8\varepsilon^{-5/2}$, i.e. $\varepsilon<(8/68)^{6}\approx 2.6\times 10^{-6}$,

• the approximation of order 4 rather than the approximation of order 3 if $756\varepsilon^{-9/4}<68\varepsilon^{-7/3}$, i.e. $\varepsilon<(68/756)^{12}\approx 2.8\times 10^{-13}$.
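The thresholds above can be rechecked with a few lines of code. The sketch below takes the per-sample costs and the numbers of terms from the analysis of this subsection, writes the total cost of order $\nu$ as $K_{\nu}\varepsilon^{-(2+1/\nu)}$ (which reproduces $\varepsilon^{-3}$, $8\varepsilon^{-5/2}$, $68\varepsilon^{-7/3}$ and $756\varepsilon^{-9/4}$), and solves for the precision at which two consecutive orders have the same cost. The names are hypothetical.

```python
# Worst-case cost of one sample of all the terms, in units of n Euler iterations.
cost_in_n = {1: 1, 2: 1 + 3, 3: 1 + 3 + 5 + 8, 4: 17 + 7 + 2 * 12 + 16 + 20}
# Number of terms (trees) involved at each order, under the unit-variance assumption.
num_terms = {1: 1, 2: 2, 3: 4, 4: 9}

def cost_constant(order):
    """Constant K in the total cost K * eps ** (-(2 + 1 / order))."""
    return num_terms[order] * cost_in_n[order]

def crossover(order):
    """Precision eps below which `order` is cheaper than `order - 1`."""
    ratio = cost_constant(order - 1) / cost_constant(order)
    exponent = 1.0 / (1.0 / (order - 1) - 1.0 / order)
    return ratio ** exponent

thresholds = {order: crossover(order) for order in (2, 3, 4)}
```

The exponent is $\nu(\nu-1)$, i.e. $2$, $6$ and $12$, which explains why the thresholds decrease so quickly with the order in this unit-variance model.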

This analysis shows that in practice the order 3 may already be sufficient for the precision that is usually needed. However, this cost analysis has to be tempered, because the assumption of a unit variance for each term is rather pessimistic. For ODEs or SDEs with constant diffusion coefficient, we already know from our theoretical results (see Remark 4.8) that the variance of $c(\mathcal{A})\Gamma^{\mathcal{A}}_{0}$ may be much smaller. Also, for SDEs, we see from Table 1 that, globally, the terms that are needed for the calculation of order 4 have a smaller variance than those needed for the order 3, which in turn have a smaller variance than those needed for the order 2. Of course, there are exceptions: for example in Table 1, the standard deviation associated to $\{\emptyset,1,11,111\}$ is of the same magnitude as the one associated to $\{\emptyset,1,11\}$ or even $\{\emptyset,1\}$. This is why it is better in practice to estimate first the variance of each term and then determine how many samples are needed to achieve a given precision. For the example of Figure 5, to get a precision of $\varepsilon=2\times 10^{-4}$, the approximation of order 2 has required 88s ( $n=30$ ), the order 3 about 89s ( $n=10$ ), the order 4 about 214s ( $n=6$ ) and the order 5 about 345s ( $n=5$ ). Thus, the scheme of order 3 is already competitive for this precision with respect to the order 2.

Appendix A Technical results for the variance analysis

We introduce some notation. We consider smooth random fields, that is functions $\varphi:\Omega\times\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ which are measurable with respect to $(\omega,x)$ and such that, for each $\omega,$ the function $x\mapsto\varphi(\omega,x)$ is of class $C^{\infty}(\mathbb{R}^{d})$ . For such a random field we denote

[TABLE]

Moreover, we will say that random fields $\varphi_{i},i=1,...,m$ are independent if there are some independent $\sigma$-algebras $\mathcal{G}_{i},i=1,...,m$ such that $\varphi_{i}$ is $\mathcal{G}_{i}\otimes\mathcal{B}(\mathbb{R}^{d})$ measurable. We will use this property as follows. Suppose that $\Phi$ is $\mathcal{G}_{m}\otimes\mathcal{B}(\mathbb{R}^{d})$ measurable and $\Psi$ and $\Theta$ are $\vee_{i=1}^{m-1}\mathcal{G}_{i}\otimes\mathcal{B}(\mathbb{R}^{d})$ measurable. Then, for every $x\in\mathbb{R}^{d}$ and every $p\geq 1$

[TABLE]

In the sequel we consider a sequence of smooth random fields $\Phi_{i}:\Omega\times\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ and $\phi_{i}:\Omega\times\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ , $i\in\mathbb{N}$ and moreover, a vector field $\varphi:\Omega\times\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ . We assume that $\varphi$ and $(\Phi_{j},\phi_{j}),j\in\mathbb{N}$ are independent. We fix $r\in\mathbb{N}$ and, for a set $\Lambda\subset\{1,...,r\},$ we define

[TABLE]

Moreover, given a multi-index $\alpha,$ we define

[TABLE]

Proposition A.1.

Suppose that for every $p,q\in\mathbb{N}$ , there exists $C_{q,p}$ such that

[TABLE]

Then, for every $p\geq 1$ and every multi-index $\alpha$ we have

[TABLE]

for some $C$ depending on $r$ , $\left|\alpha\right|$ and $C_{\left|\alpha\right|+r,2|\alpha|p}.$

Remark A.2.

This proposition says the following: if at each step the error is of order $\delta_{i}=\left\|\Phi_{i}-\phi_{i}\right\|_{q,p^{\prime},\infty}$, then after $r$ steps we have an error of order $\delta_{1}\times...\times\delta_{r}.$ This may seem a little surprising, since one may have expected an error of order $\delta_{1}+...+\delta_{r}$, but it is due to the way the terms are summed with $\sum_{\Lambda\subset\{1,...,r\}}(-1)^{\left|\Lambda\right|}.$
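The multiplicative effect can be seen on a toy example. The sketch below assumes that the alternating sum has the form $\sum_{\Lambda\subset\{1,...,r\}}(-1)^{|\Lambda|}\,\theta^{\Lambda}_{r}\circ\dots\circ\theta^{\Lambda}_{1}(x)$ with $\theta^{\Lambda}_{i}=\phi_{i}$ if $i\in\Lambda$ and $\theta^{\Lambda}_{i}=\Phi_{i}$ otherwise, and uses hypothetical deterministic linear maps $\Phi_{i}(x)=a_{i}x$ and $\phi_{i}(x)=b_{i}x$, for which the sum equals $x\prod_{i}(a_{i}-b_{i})$.

```python
from itertools import combinations
from math import prod

def alternating_sum(x, Phi, phi):
    """Sum over subsets Lambda of (-1)^|Lambda| * (theta_r o ... o theta_1)(x),
    where theta_i = phi[i] if i is in Lambda and Phi[i] otherwise."""
    r = len(Phi)
    total = 0.0
    for size in range(r + 1):
        for subset in combinations(range(r), size):
            y = x
            for i in range(r):
                y = (phi[i] if i in subset else Phi[i])(y)
            total += (-1) ** size * y
    return total

# Toy linear maps (illustrative values): per-step errors 0.01, 0.01 and 0.02.
a = [1.10, 0.95, 1.02]
b = [1.09, 0.96, 1.00]
Phi = [lambda x, c=c: c * x for c in a]
phi = [lambda x, c=c: c * x for c in b]
value = alternating_sum(1.0, Phi, phi)
expected = prod(ai - bi for ai, bi in zip(a, b))
```

Here the three per-step errors are of size $10^{-2}$ while the alternating sum is of size $2\times 10^{-6}$, i.e. the product of the errors rather than their sum.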

Proof.

Step 1. We use the Faà di Bruno formula $\partial^{\alpha}[f\circ g]=\sum_{\left|\beta\right|\leq\left|\alpha\right|}(\partial^{\beta}f)(g)P_{\alpha,\beta}(g)$ (see (8)) and the inequality between geometric and arithmetic means to upper bound the terms $|\prod_{i=1}^{k}\partial^{\gamma_{i}}g^{j_{i}}|$ defining $P_{\alpha,\beta}(g)$ . We then obtain for random functions $g$

[TABLE]

Besides, for two random fields $g_{1}$ and $g_{2}$ , we write

[TABLE]

Using the inequality between geometric and arithmetic means for the product on $i\not=j$ and then the Cauchy-Schwarz inequality, we get

[TABLE]

Step 2. We prove (82) for $r=1.$ In this case $\Lambda=\varnothing$ or $\Lambda=\{1\}$ so that

[TABLE]

with

[TABLE]

and

[TABLE]

Using (81) and the fact that $\varphi$ is independent of $\lambda\Phi_{1}+(1-\lambda)\phi_{1}$ we get (see (79))

[TABLE]

so that $\left\|A_{\beta}\right\|_{0,p,\infty}\leq C\left\|\Phi_{1}-\phi_{1}\right\|_{0,p,\infty}.$ Moreover, using again (79) first and then (83), we get

[TABLE]

so (82) is proved for $r=1$ .
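For reference, the case $r=1$ rests on the first-order Taylor formula with integral remainder. Written componentwise, as a sketch in the notation of this appendix (the index $j$ and the exact layout of the display are ours), it reads

\[
\varphi(\Phi_{1}(x))-\varphi(\phi_{1}(x))=\sum_{j=1}^{d}\big(\Phi_{1}^{j}(x)-\phi_{1}^{j}(x)\big)\int_{0}^{1}(\partial_{j}\varphi)\big(\lambda\Phi_{1}(x)+(1-\lambda)\phi_{1}(x)\big)\,d\lambda .
\]

The difference $\Phi_{1}-\phi_{1}$ thus appears as an explicit factor, which is what produces a bound proportional to $\left\|\Phi_{1}-\phi_{1}\right\|$.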

Step 3. Suppose (82) is true for $r-1,$ for every $\alpha$ and every $p\geq 1.$ We prove it for $r.$ We do it first for $\alpha=\varnothing$ (without derivatives) because it is simpler. We write

[TABLE]

Note that we have made a slight abuse of notation here: the notation $\theta_{(r-1)}^{\Lambda^{\prime}}$ is used in fact for $\theta^{\Lambda^{\prime}}_{r}\circ\dots\circ\theta^{\Lambda^{\prime}}_{2}$ , not for $\theta^{\Lambda^{\prime}}_{r-1}\circ\dots\circ\theta^{\Lambda^{\prime}}_{1}$ . Since $(\lambda\Phi_{1}+(1-\lambda)\phi_{1})(x)$ is independent of $\Gamma_{r-1}^{(i)}\varphi,$ we have from (79)

[TABLE]

Then, by using the induction hypothesis, we get

[TABLE]

We prove now (82) for a general multi-index $\alpha$ and make the same abuse of notation for $\theta^{\Lambda^{\prime}}$ . Using (8) and (9) for $f=\varphi(\theta_{(r-1)}^{\Lambda^{\prime}})$ and $g_{1}=\Phi_{1},g_{2}=\phi_{1}$ we obtain

[TABLE]

with

[TABLE]

and

[TABLE]

By assumption $\theta_{2},\dots,\theta_{r}$ are independent of $(\Phi_{1},\phi_{1})$ . Therefore, $\Gamma_{r-1}^{\beta}\varphi(x)$ is independent of $(\Phi_{1},\phi_{1})$ . We use (79) first and then the induction hypothesis and (83) to obtain

[TABLE]

Moreover

[TABLE]

Notice that $\partial^{i}\Gamma_{r-1}^{\beta}\varphi=\Gamma_{r-1}^{(\beta,i)}\varphi$ . Using again (79) and the induction hypothesis, we get

[TABLE]

Lemma A.3.

Let $(X_{t}(x))_{t\geq 0}$ denote the flow of the SDE (1) and $\hat{X}_{t}(x)=x+b(x)t+\sigma(x)W_{t}$ the flow of the Euler scheme. We assume that $b$ and $\sigma$ are $C^{\infty}$ , bounded and with bounded derivatives. Then, we have

[TABLE]

with $a=2$ if $\sigma=0$ , $a=3/2$ if $\sigma(x)$ is a constant function and $a=1$ in the general case.
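The exponent $a=2$ in the case $\sigma=0$ can be observed on a scalar ODE whose flow is known in closed form. The sketch below is an illustration only: the drift $b(x)=\sin(x)$, the starting point and the step sizes are our choices, not the paper's, and the helper names are hypothetical. For this drift the exact flow started in $(0,\pi)$ is $X_{t}(x)=2\arctan(\tan(x/2)e^{t})$, and halving $t$ should divide the one-step Euler error by roughly $4$.

```python
import math


def exact_flow(x, t):
    # Exact solution at time t of x' = sin(x) started at x in (0, pi).
    return 2.0 * math.atan(math.tan(x / 2.0) * math.exp(t))


def euler_step(x, t):
    # One Euler step with sigma = 0: x + b(x) * t, with b = sin.
    return x + math.sin(x) * t


def one_step_error(x, t):
    return abs(exact_flow(x, t) - euler_step(x, t))


x0 = 1.0
errors = {t: one_step_error(x0, t) for t in (0.1, 0.05, 0.025)}
# Empirical order: log2 of the error ratio when t is halved.
order = math.log2(errors[0.1] / errors[0.05])
print(errors, order)
```

The estimated order is close to $2$, in line with $a=2$ for $\sigma=0$. The cases $a=3/2$ and $a=1$ concern $L^{p}$ norms of random quantities and would need a Monte Carlo experiment instead.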

Proof.

We show this result by induction on $q$ . We focus only on the general case; the cases $\sigma=0$ or $\sigma(x)$ constant can then be easily deduced. For $q=0$ , this result is stated for example in Proposition 1.2 of [2]. For simplicity of notation, we do the proof in dimension $d=1$ with $b=0$ . We denote by $\sigma^{(q)}$ the $q$ -th derivative of $\sigma$ . For $q=1$ , we have $\hat{X}_{t}^{(1)}(x)=1+\sigma^{(1)}(x)W_{t}$ and

[TABLE]

Since $\sigma^{(1)}$ is bounded, we have $\forall t>0,\sup_{x}\mathbb{E}[\sup_{s\in[0,t]}|X_{s}^{(1)}(x)|^{p}]<\infty$ . We write

[TABLE]

Since $\sigma^{(1)}$ is bounded and Lipschitz, we get by using the Burkholder-Davis-Gundy (BDG) inequality and then Jensen's inequality

[TABLE]

with a constant $C$ that does not depend on $x$ . We then check, again with the BDG inequality, that $\mathbb{E}[|X_{s}(x)-x|^{p}]\leq Cs^{p/2}$ since $\sigma$ is bounded, and that $\mathbb{E}[|X_{s}^{(1)}(x)-1|^{p}]\leq Cs^{p/2}$ since $\sigma^{(1)}$ is bounded and by (84). Thus, we have $\mathbb{E}[|\hat{X}_{t}^{(1)}(x)-X_{t}^{(1)}(x)|^{p}]\leq Ct^{p}$ .

We now suppose that the result holds for $q-1\in\mathbb{N}^{*}$ and that we have shown that

[TABLE]

for each $p$ , with a constant $C$ that does not depend on $x$ . We have $\hat{X}_{t}^{(q)}(x)=\sigma^{(q)}(x)W_{t},$ and by the Faà di Bruno formula

[TABLE]

with $A_{t}=\sum_{m_{1}+\dots+qm_{q}=q,m_{1}\not=q,m_{q}=0}c_{m_{1},\dots,m_{q}}\prod_{k=1}^{q}(X^{(k)}_{t}(x))^{m_{k}}\sigma^{(m_{1}+\dots+m_{q})}(X_{t}(x))$ . Note that this sum is equal to [math] for $q=2$ , and that otherwise each of its terms has at least one $k\in\{2,\dots,q-1\}$ such that $m_{k}\geq 1$ . This gives $\mathbb{E}[|A_{t}|^{p}]\leq Ct^{p/2}$ by using the induction hypothesis (85) and Hölder type inequalities. Since $X^{(q)}_{0}(x)=0$ , $\sigma^{(1)}$ and $\sigma^{(q)}$ are bounded and $\sup_{x}\mathbb{E}[\sup_{s\in[0,t]}|X_{s}^{(1)}(x)|^{p}]<\infty$ for any $p$ , we get $\mathbb{E}[|X_{t}^{(q)}(x)|^{p}]\leq Ct^{p/2}$ by using the BDG and Gronwall inequalities. Therefore, $\tilde{A}_{t}=A_{t}+X^{(q)}_{t}(x)\sigma^{(1)}(X_{t}(x))$ also satisfies $\mathbb{E}[|\tilde{A}_{t}|^{p}]\leq Ct^{p/2}$ .
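To make the structure of the Faà di Bruno expansion concrete, here are the two lowest-order cases in dimension $d=1$, written with the shorthand $X=X_{t}(x)$ and $X^{(k)}=X^{(k)}_{t}(x)$. These are the standard chain-rule identities; reading $A_{t}$ as the collection of all terms other than $\sigma^{(q)}(X)(X^{(1)})^{q}$ and $\sigma^{(1)}(X)X^{(q)}$ is our interpretation, suggested by the definition of $\tilde{A}_{t}$.

\[
\frac{d^{2}}{dx^{2}}\sigma(X)=\sigma^{(2)}(X)\,(X^{(1)})^{2}+\sigma^{(1)}(X)\,X^{(2)},
\]

\[
\frac{d^{3}}{dx^{3}}\sigma(X)=\sigma^{(3)}(X)\,(X^{(1)})^{3}+3\,\sigma^{(2)}(X)\,X^{(1)}X^{(2)}+\sigma^{(1)}(X)\,X^{(3)}.
\]

For $q=3$ the only intermediate term is $3\,\sigma^{(2)}(X)X^{(1)}X^{(2)}$. It contains the factor $X^{(2)}$, whose $L^{p}$ norm is of order $t^{1/2}$, which is where the bound on $A_{t}$ comes from.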

We now repeat the same arguments as for $q=1$ : from

[TABLE]

we get $\mathbb{E}[|\hat{X}_{t}^{(q)}(x)-X_{t}^{(q)}(x)|^{p}]\leq Ct^{p}$ .

∎

Acknowledgements

Aurélien Alfonsi benefited from the support of the “Chaire Risques Financiers”, Fondation du Risque.

Bibliography

[1] Ankush Agarwal and Emmanuel Gobet. Finite variance unbiased estimation of stochastic differential equations, 2018.
[2] A. Alfonsi, B. Jourdain, and A. Kohatsu-Higa. Pathwise optimal transport bounds between a one-dimensional diffusion and its Euler scheme. Ann. Appl. Probab., 24(3):1049–1080, 2014.
[3] Aurélien Alfonsi. High order discretization schemes for the CIR process: application to affine term structure and Heston models. Math. Comp., 79(269):209–237, 2010.
[4] Aurélien Alfonsi, Masafumi Hayashi, and Arturo Kohatsu-Higa. Parametrix methods for one-dimensional reflected SDEs. In Modern problems of stochastic analysis and statistics, volume 208 of Springer Proc. Math. Stat., pages 43–66. Springer, Cham, 2017.
[5] Patrik Andersson and Arturo Kohatsu-Higa. Unbiased simulation of stochastic differential equations using parametrix expansions. Bernoulli, 23(3):2028–2057, 2017.
[6] Vlad Bally, Dan Goreac, and Victor Rabiet. Regularity and stability for the semigroup of jump diffusions with state-dependent intensity. Ann. Appl. Probab., 28(5):3028–3074, 2018.
[7] Vlad Bally and Arturo Kohatsu-Higa. A probabilistic interpretation of the parametrix method. Ann. Appl. Probab., 25(6):3095–3138, 2015.
[8] Vlad Bally and Clément Rey. Approximation of Markov semigroups in total variation distance. Electron. J. Probab., 21:Paper No. 12, 44, 2016.
