Strong convergence rates of probabilistic integrators for ordinary differential equations

H. C. Lie; A. M. Stuart; T. J. Sullivan

arXiv:1703.03680·math.NA·October 29, 2019·Stat. Comput.


TL;DR

This paper analyses the strong convergence of probabilistic integrators for ordinary differential equations. It shows that, when the random perturbations decay fast enough with the time step, the mean-square convergence rate matches that of the underlying deterministic method, which supports the use of these integrators for uncertainty quantification.

Contribution

It extends the convergence analysis of Conrad et al. (2017) to weaker assumptions on the vector field and on the noise, and strengthens the error bound by placing the maximum over time steps inside the expectation.

Findings

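01. Probabilistic integrators achieve the same mean-square convergence rates as their deterministic counterparts, provided the integration noise decays fast enough with the time step.

02. Convergence holds for state-dependent, non-Gaussian, and non-centred random perturbations.

03. The results cover randomised high-order integrators for globally Lipschitz flows and randomised Euler integrators for dissipative vector fields with polynomially bounded local Lipschitz constants.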

Abstract

Probabilistic integration of a continuous dynamical system is a way of systematically introducing model error, at scales no larger than errors introduced by standard numerical discretisation, in order to enable thorough exploration of possible responses of the system to inputs. It is thus a potentially useful approach in a number of applications such as forward uncertainty quantification, inverse problems, and data assimilation. We extend the convergence analysis of probabilistic integrators for deterministic ordinary differential equations, as proposed by Conrad et al. (Stat. Comput., 2017), to establish mean-square convergence in the uniform norm on discrete- or continuous-time solutions under relaxed regularity assumptions on the driving vector fields and their induced flows. Specifically, we show that randomised high-order integrators for globally Lipschitz flows and…

Equations

\frac{\mathrm{d}}{\mathrm{d}t}u(t)=f(u(t)),\quad 0\leq t\leq T,\qquad u(0)=u_{0}

\Phi^{t}(u_{0})=u_{0}+\int_{0}^{t}f(\Phi^{s}(u_{0}))\,\mathrm{d}s

t_{k}\coloneqq k\tau\quad\text{for }k\in[K]\coloneqq\{0,1,\dots,K\},

\max_{k\in[K]}\|u_{k}-\widetilde{u}_{k}\|\leq C\tau^{q}.

u_{k+1}=\Phi^{\tau}(u_{k})=u_{k}+\int_{t_{k}}^{t_{k+1}}f(u(s))\,\mathrm{d}s

U_{k+1}\coloneqq\Psi^{\tau}(U_{k})+\xi_{k}(\tau),\quad U_{0}=u_{0},

\max_{k\in[K]}\mathbb{E}\bigl[\|u_{k}-U_{k}\|^{2}\bigr]\leq C\tau^{2\min\{p,q\}}.

\mathbb{E}\biggl[\max_{k\in[K]}\|u_{k}-U_{k}\|^{n}\biggr]\leq C\tau^{n\cdot\min\{p-c,q\}},

\mathbb{P}\biggl[\max_{k\in[K]}\|u_{k}-U_{k}\|\leq r\biggr]\geq 1-C\tau^{n\min\{p-c,q\}}r^{-n};

\mathop{\mathrm{Lip}}\nolimits(\Phi)\coloneqq\min\bigl\{L\geq 0\,\bigm|\,\|\Phi(x)-\Phi(y)\|\leq L\|x-y\|\bigr\}

ab\leq\frac{\delta}{r}a^{r}+\frac{1}{r^{\ast}\delta^{r^{\ast}/r}}b^{r^{\ast}},\quad\text{for all }a,b\geq 0.

\|x-y\|^{2}\leq(1+\delta)\|x\|^{2}+(1+\delta^{-1})\|y\|^{2},

x_{k}\leq\alpha_{k}+\sum_{0\leq j<k}\beta_{j}x_{j}\quad\text{and}\quad\alpha_{k}\leq A,

(x+y)^{n}\leq x^{n}(1+\delta 2^{n-1})+y^{n}(1+(2/\delta)^{n-1}).

\Bigl|\sum_{j=1}^{N}s_{j}\Bigr|^{m}\leq N^{m-1}\sum_{j=1}^{N}|s_{j}|^{m}.

\Bigl|\sum_{j=1}^{N}s_{j}\Bigr|^{m}\leq\Bigl(\sum_{j=1}^{N}|s_{j}|\Bigr)^{m}=N^{m}\Bigl(\frac{1}{N}\sum_{j=1}^{N}|s_{j}|\Bigr)^{m}\leq N^{m}\cdot\frac{1}{N}\sum_{j=1}^{N}|s_{j}|^{m}=N^{m-1}\sum_{j=1}^{N}|s_{j}|^{m},

\langle f(x)-f(y),x-y\rangle\leq\mu\|x-y\|^{2},\quad\text{for all }x,y\in\mathbb{R}^{d},

\sup_{u\in\mathbb{R}^{d}}\|\Psi^{\tau}(u)-\Phi^{\tau}(u)\|\leq C_{\Psi}\tau^{q+1}.

\|\Phi^{\tau}(u)-\Psi^{\tau}(u)\|\leq C^{\prime}(u)\tau^{q+1},

\mathbb{E}\bigl[\|\Phi^{\tau}(U_{k})-\Psi^{\tau}(U_{k})\|^{n}\bigr]\leq C\tau^{n(q+1)}

\mathbb{E}\bigl[\|\xi_{k}(\tau)\|^{r}\bigr]\leq\left(C_{\xi,R}\tau^{p+1/2}\right)^{r}.

U_{k+1}\coloneqq\Psi^{\tau}(U_{k})+\xi_{k}(\tau,U_{k}),\quad\text{for all }k\in[K].

\mathbb{E}\Biggl[\Bigl(\sum_{i=1}^{T/\tau}\|\xi_{i}(\tau)\|^{w}\Bigr)^{v}\Biggr]\leq\bigl(TC_{\xi,R}^{w}\tau^{w(p+1/2)-1}\bigr)^{v}.

\mathbb{E}\Biggl[\Bigl(\sum_{i=1}^{T/\tau}\|\xi_{i}(\tau)\|^{w}\Bigr)^{v}\Biggr]\leq\Bigl(\frac{T}{\tau}\Bigr)^{v-1}\sum_{i=1}^{T/\tau}\mathbb{E}\bigl[\|\xi_{i}(\tau)\|^{wv}\bigr]\leq\Bigl(\frac{T}{\tau}\Bigr)^{v}\bigl(C_{\xi,R}\tau^{p+1/2}\bigr)^{wv}=\bigl(TC_{\xi,R}^{w}\tau^{w(p+1/2)-1}\bigr)^{v},

e_{k+1}=\bigl(\Phi^{\tau}(u_{k})-\Phi^{\tau}(U_{k})\bigr)-\bigl(\Psi^{\tau}(U_{k})-\Phi^{\tau}(U_{k})\bigr)-\xi_{k}(\tau).

\mathbb{E}\biggl[\max_{k\in[K]}\|e_{k}\|^{2}\biggr]\leq C\tau^{2p\wedge 2q}.

\mathbb{E}\biggl[\max_{\ell\in[K]}\|e_{\ell}\|^{n}\biggr]\leq\overline{C}\tau^{n(q\wedge(p-1/2))}.

\overline{C}\coloneqq 2T\max\{(4C_{\Psi})^{n},(2C_{\xi,R})^{n}\}\exp\bigl(TC_{\Phi}(n,\tau^{\ast})\bigr)


Full text

Han Cheng Lie: Institute of Mathematics, Universität Potsdam, Campus Golm, Haus 9, Karl-Liebknecht Str. 24-25, 14476 Potsdam OT Golm, Germany. ORCID: https://orcid.org/0000-0002-6905-9903. Email: [email protected]

A. M. Stuart: Department of Computing and Mathematical Sciences, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125, United States of America. Email: [email protected]

T. J. Sullivan: Freie Universität Berlin, and Zuse Institute Berlin, Takustrasse 7, 14195 Berlin, Germany. Email: [email protected]


Acknowledgements: HCL and TJS are supported by the Freie Universität Berlin within the Excellence Initiative of the German Research Foundation. HCL is supported by the Universität Potsdam. AMS is grateful to DARPA, EPSRC and ONR for funding. This material was based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of these funding agencies and institutions.


Received: September 28, 2018 / Accepted: December 28, 2018 / Handling Editor: C. J. Oates

This is a post-peer-review, pre-copyedit version of an article published in Statistics and Computing (2019). The final authenticated version is available online at: https://doi.org/10.1007/s11222-019-09898-6.

Abstract

Probabilistic integration of a continuous dynamical system is a way of systematically introducing discretisation error, at scales no larger than errors introduced by standard numerical discretisation, in order to enable thorough exploration of possible responses of the system to inputs. It is thus a potentially useful approach in a number of applications such as forward uncertainty quantification, inverse problems, and data assimilation. We extend the convergence analysis of probabilistic integrators for deterministic ordinary differential equations, as proposed by Conrad et al. (Stat. Comput., 2017), to establish mean-square convergence in the uniform norm on discrete- or continuous-time solutions under relaxed regularity assumptions on the driving vector fields and their induced flows. Specifically, we show that randomised high-order integrators for globally Lipschitz flows and randomised Euler integrators for dissipative vector fields with polynomially-bounded local Lipschitz constants all have the same mean-square convergence rate as their deterministic counterparts, provided that the variance of the integration noise is not of higher order than the corresponding deterministic integrator. These and similar results are proven for probabilistic integrators where the random perturbations may be state-dependent, non-Gaussian, or non-centred random variables.

Keywords: probabilistic numerical methods · ordinary differential equations · convergence rates · uncertainty quantification

MSC: 65L20 · 65C99 · 37H10 · 68W20

Journal: Statistics and Computing

1 Introduction

This article concerns the analysis of probabilistic numerical integrators for deterministic initial value problems of the form

$$\frac{\mathrm{d}}{\mathrm{d}t}u(t)=f(u(t)),\quad 0\leq t\leq T,\qquad u(0)=u_{0},\tag{1}$$

where $T>0$ . Let $\Phi^{t}\colon\mathbb{R}^{d}\to\mathbb{R}^{d}$ denote the flow induced by (1), so that

$$\Phi^{t}(u_{0})=u_{0}+\int_{0}^{t}f(\Phi^{s}(u_{0}))\,\mathrm{d}s\tag{2}$$

for all $(t,u_{0})\in[0,T]\times\mathbb{R}^{d}$ . Given an integration time step $\tau>0$ such that $K\coloneqq T/\tau\in\mathbb{N}$ and the corresponding time mesh

$$t_{k}\coloneqq k\tau\quad\text{for }k\in[K]\coloneqq\{0,1,\dots,K\},\tag{3}$$

a deterministic one-step numerical method for the solution of (1) is a numerical flow map $\Psi^{\tau}\colon\mathbb{R}^{d}\to\mathbb{R}^{d}$ that generates approximations $\widetilde{u}_{k}\approx u_{k}\coloneqq u(t_{k})$ by the recursion $\widetilde{u}_{k}\coloneqq\Psi^{\tau}(\widetilde{u}_{k-1})$ ; note that $u_{k}=\Phi^{\tau}(u_{k-1})$ . A key property of the numerical method is its global order of convergence, i.e. the largest $q\geq 0$ such that, for some constant $C=C(T)$ , independent of $\tau$ ,

$$\max_{k\in[K]}\|u_{k}-\widetilde{u}_{k}\|\leq C\tau^{q}.\tag{4}$$

As a modelling choice, epistemic stochasticity can be introduced into the numerical solution of (1) on the basis that, while the exact solution satisfies

$$u_{k+1}=\Phi^{\tau}(u_{k})=u_{k}+\int_{t_{k}}^{t_{k+1}}f(u(s))\,\mathrm{d}s\tag{5}$$

for all $k\in[K-1]$ , the only information available about the values of the solution off the time mesh comes from the numerical solution on the mesh, and so the integrand $f(u(s))$ is not exactly accessible. This uncertainty is relevant in the setting where, given a large-scale mathematical model, it may be more statistically informative to spend computational resources on solving a differential equation-based model many times on a coarser grid than on solving the same model a few times on a finer grid. This is often the case in forward uncertainty quantification (Smith, 2014; Sullivan, 2015), inverse problems (Kaipio and Somersalo, 2005; Stuart, 2010), and data assimilation (Law et al., 2015; Reich and Cotter, 2015); the area of multi-level Monte Carlo methods makes particular use of this kind of cost-accuracy tradeoff (Giles, 2015). Furthermore, in many such settings, the quantity of interest is often not the solution of a differential equation-based model, but a functional thereof. In all cases, estimates of the off-mesh uncertainty due to the numerical method can and should be fed forward to estimate the uncertainty in the quantity of interest.

This article is motivated by the work of Conrad et al. (2017), in which one seeks to model the off-mesh uncertainty by considering probabilistic solvers. For the same mesh given in (3), the probabilistic solver of Conrad et al. (2017) involves producing a sequence of random variables $(U_{k})_{k\in[K]}$ according to

$$U_{k+1}\coloneqq\Psi^{\tau}(U_{k})+\xi_{k}(\tau),\quad U_{0}=u_{0},\tag{6}$$

where $\Psi^{\tau}$ is the map associated to the deterministic numerical method, and each $\xi_{k}(\tau)$ is an i.i.d. copy of a random variable $\xi_{0}(\tau)\coloneqq\int_{0}^{\tau}\chi_{0}(s)\,\mathrm{d}s$ , where $\chi_{0}$ is a stochastic process over the time interval $[0,\tau]$ that models the off-mesh behaviour of the unknown function $f(u(s))$ in (5). We refer the reader to Conrad et al. (2017, Figure 2) for a pictorial representation of (6). The process $\chi_{0}$ is introduced so that one can probe the uncertainty induced by the mesh $(t_{k})_{k\in[K]}$ and the underlying solver, and thus explore possible responses of the system to inputs. Comparing (5) and (6), it follows that the random variable $\xi_{k}(\tau)$ is a statistical model for the approximation error $\Phi^{\tau}(u_{k})-\Psi^{\tau}(u_{k})$ .
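The recursion (6) can be sketched in a few lines of code. The following is a minimal illustration, not the implementation of Conrad et al. (2017): it assumes a scalar ODE, takes $\Psi^{\tau}$ to be the explicit Euler map, and draws each $\xi_{k}(\tau)$ as a centred Gaussian with standard deviation $\sigma\tau^{p+1/2}$. The function name `randomised_euler` and the parameters `sigma` and `p` are hypothetical choices made for this example.

```python
import math
import random


def randomised_euler(f, u0, T, K, p=1.0, sigma=1.0, rng=None):
    """Iterate U_{k+1} = Psi^tau(U_k) + xi_k(tau) for a scalar ODE u' = f(u).

    Assumptions of this sketch: Psi^tau is the explicit Euler map and
    xi_k(tau) is a centred Gaussian with standard deviation
    sigma * tau**(p + 1/2).
    """
    if rng is None:
        rng = random.Random(0)
    tau = T / K
    noise_sd = sigma * tau ** (p + 0.5)
    path = [u0]
    for _ in range(K):
        u = path[-1]
        # Deterministic Euler step plus additive, state-independent noise.
        path.append(u + tau * f(u) + rng.gauss(0.0, noise_sd))
    return path


def f(u):
    # Linear test problem u' = -u with exact solution exp(-t).
    return -u


path = randomised_euler(f, u0=1.0, T=1.0, K=100)
exact = math.exp(-1.0)
print(f"U_K = {path[-1]:.4f}, u(T) = {exact:.4f}")
```

Calling the function repeatedly with different random number generators produces an ensemble of trajectories whose spread reflects the modelled discretisation uncertainty.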

We emphasise that the additive, state-independent noise model appearing in (6) should be interpreted as providing a prior on the local truncation error (Hairer et al., 2009). A frequent criticism levelled at the field of probabilistic numerical integration is that the statistical properties of the noise $\xi_{k}$ that have been imposed in existing published works do not reflect known prior information about local truncation error. Here we address this issue by considerably weakening the assumptions made on the $\xi_{k}$ . However we anticipate future work in this direction, especially when specific structure on the vector field $f$ is used to further inform the prior. Note also that, in the presence of large amounts of data, we expect posterior contraction and forgetting of the prior; see, e.g., Knapik et al. (2011). Posterior contraction for (6) was demonstrated numerically on a number of examples by Conrad et al. (2017).

In the spirit of (4), the main convergence result (Conrad et al., 2017, Theorem 2.2) yields that, if the vector field $f$ in (1) is globally Lipschitz, if the deterministic numerical method has uniform local truncation of order $q+1$ , and if $\chi_{0}$ is a centred Gaussian process such that the second moment of $\xi_{0}(\tau)$ decays as $\tau^{2p+1}$ for some $p\geq 1$ , then

$$\max_{k\in[K]}\mathbb{E}\bigl[\|u_{k}-U_{k}\|^{2}\bigr]\leq C\tau^{2\min\{p,q\}}.\tag{7}$$

This shows that the convergence rate of the probabilistic solver (6) is determined by the convergence of the ‘worst-case error’ of the deterministic method $\Psi^{\tau}$ , and the convergence of the ‘statistical error’ $\xi_{0}$ , as described by the parameters $q$ and $p$ respectively. Choosing $\xi_{0}$ with $p=q$ introduces the maximum amount of solution uncertainty consistent with preserving the order of accuracy of the original deterministic integrator.
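The rate in (7) can be probed empirically. The sketch below is an illustration only and is not taken from Conrad et al. (2017): it assumes the scalar test problem $u'=-u$ on $[0,1]$ with $u_{0}=1$, the explicit Euler map (so $q=1$), and centred Gaussian noise with standard deviation $\tau^{p+1/2}$. The function name `mean_square_error` and the sample size are hypothetical. With $p=q=1$, halving $\tau$ should reduce the estimated mean-square error by roughly a factor of four, up to Monte Carlo error.

```python
import math
import random


def mean_square_error(K, p, n_samples=200, seed=0):
    """Monte Carlo estimate of max_k E|u_k - U_k|^2 for u' = -u on [0, 1]."""
    rng = random.Random(seed)
    tau = 1.0 / K
    sd = tau ** (p + 0.5)
    sums = [0.0] * (K + 1)
    for _ in range(n_samples):
        U = 1.0
        for k in range(1, K + 1):
            # Randomised explicit Euler step for f(u) = -u.
            U = U - tau * U + rng.gauss(0.0, sd)
            sums[k] += (math.exp(-k * tau) - U) ** 2
    return max(s / n_samples for s in sums)


errs = {K: mean_square_error(K, p=1.0) for K in (10, 20, 40, 80)}
for K in (10, 20, 40):
    rate = math.log2(errs[K] / errs[2 * K])
    print(f"tau = 1/{K}: halving tau gives observed exponent {rate:.2f}")
```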

It is important to stress that, despite the apparent similarities between (6) and Euler–Maruyama schemes for stochastic differential equations (SDEs) driven by Brownian motion, the analysis of the latter does not directly apply to probabilistic solvers, even though we will borrow some techniques from that field. This is because the variance of $\xi_{0}(\tau)$ for probabilistic solvers of the form (6) is assumed to decay to zero strictly faster than $\tau$ , whereas, for SDEs driven by Brownian motion, the variance is proportional to $\tau$ . A key aspect of this work is to determine how to scale the noise so that the rate of convergence of the underlying deterministic numerical integrator is not affected, yet uncertainty arising from numerical approximation is accounted for.

1.1 Contribution and outline of the paper

The purpose of this paper is to make significant extensions of the convergence analysis of Conrad et al. (2017) for (1). We accomplish this by obtaining stronger error estimates (and hence stronger convergence results) under assumptions on both the underlying differential equation and on the noise model for probabilistic numerical integration that are weaker than their counterparts in Conrad et al. (2017). The convergence results of this paper are of the form

$$\mathbb{E}\biggl[\max_{k\in[K]}\|u_{k}-U_{k}\|^{n}\biggr]\leq C\tau^{n\cdot\min\{p-c,q\}},\tag{8}$$

where $n\in\mathbb{N}$ , $q$ is the order of the numerical method $\Psi^{\tau}$ , $p$ is an exponent of decay in the moments of the random variables $(\xi_{k}(\tau))_{k\in[K]}$ , and $c\geq 0$ is a penalty term in the convergence rate that depends solely on the random variables $(\xi_{k}(\tau))_{k\in[K]}$ . Note that, when $c=0$ and $n=2$ , the convergence rate of $n\min\{p-c,q\}$ on the right-hand side of (8) agrees with that of (7) shown by Conrad et al. (2017), so that the right-hand sides of (7) and (8) differ only in the constant prefactor $C$ . However, because the time supremum is inside the expectation, (8) implies (7). Furthermore, by Markov’s inequality, (8) yields an estimate of the frequentist coverage of the true solution $u$ by the randomised solutions $U$ :

$$\mathbb{P}\biggl[\max_{k\in[K]}\|u_{k}-U_{k}\|\leq r\biggr]\geq 1-C\tau^{n\min\{p-c,q\}}r^{-n};$$

such estimates are useful in the context of forward uncertainty quantification and inverse problems (Lie et al., 2018).
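For completeness, the step from (8) to the coverage estimate can be spelled out. For any $r>0$, Markov's inequality applied to the $n$th power of the maximal error gives the following chain, and taking complements yields the stated lower bound on the coverage probability:

```latex
\mathbb{P}\biggl[\max_{k\in[K]}\|u_{k}-U_{k}\|>r\biggr]
=\mathbb{P}\biggl[\max_{k\in[K]}\|u_{k}-U_{k}\|^{n}>r^{n}\biggr]
\leq r^{-n}\,\mathbb{E}\biggl[\max_{k\in[K]}\|u_{k}-U_{k}\|^{n}\biggr]
\leq C\tau^{n\min\{p-c,q\}}r^{-n}.
```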

We emphasize that, in addition to strengthening the form of the convergence results so that the supremum is inside the expectation, we also prove the results in this paper under weaker assumptions on the vector field $f$ , and under weaker assumptions on the noise $\xi_{k}$ , than those employed by Conrad et al. (2017). Specifically we do not assume that $f$ and its derivatives are globally bounded, and we do not assume that the random variables are Gaussian; furthermore in results generalizing (8) we relax the assumption that the random variables are centred, paving the way for future analyses which incorporate specific known structure and bias in the truncation error.

Error estimates like (8) show that the randomised numerical solution has convergence properties that are asymptotically no worse than the deterministic numerical solution. This can be interpreted as saying that the trajectories obtained from the randomised numerical integrator are all equally valid approximations to the solution of the original system, modulo the uncertainty induced by solving in discrete time. This can be useful for many purposes, for example in studying limits on predictability in chaotic systems, as shown for the Lorenz-63 system by Conrad et al. (2017).

After introducing some notation and auxiliary results in Section 2, the rest of the paper is organised as follows. In Section 3, Theorem 3.4 yields (8) for numerical methods of arbitrary order, for vector fields $f$ whose induced flow maps $\Phi^{\tau}$ are globally Lipschitz — including one-sided Lipschitz vector fields — and for collections $(\xi_{k}(\tau))_{k\in[K]}$ of random variables that are independent and centred, but not necessarily Gaussian. Conrad et al. (2017) assumed the vector field $f$ to be globally Lipschitz, and the random variables $(\xi_{k}(\tau))_{k\in[K]}$ were assumed to be i.i.d. centred Gaussian random variables. In Theorem 3.5, we prove a result similar to (8) in which we relax the assumption that the $(\xi_{k}(\tau))_{k\in[K]}$ are independent and that they are centred; the price we pay for these weaker constraints on the noise is a stronger decay assumption, with respect to the time-step, on the second moments of the $(\xi_{k}(\tau))_{k\in[K]}$ . We use this assumption in order to introduce the maximal noise that is consistent with retaining the rate of convergence of the underlying deterministic numerical integrator.

In Section 4, we further weaken the conditions on the vector field $f$ , by considering locally Lipschitz vector fields that satisfy a polynomial growth condition. In Theorem 4.2, we show that, under the assumption that the $(\xi_{k}(\tau))_{k\in[K]}$ are almost surely bounded, we can again obtain (8). In Theorem 4.5, we remove the almost-sure boundedness condition, but add the assumption that the vector field $f$ satisfies a generalised dissipativity condition.

In Section 5 we discuss a continuous-time analogue of (6), and show how convergence results of the form (8) can be obtained. We also show that there exists a nonempty set of random variables (or more generally, stochastic processes) that satisfy the regularity assumptions on the random variables $(\xi_{k}(\tau))_{k\in[K]}$ used throughout this paper.

Proofs of the results may be found in Appendix A.

1.2 Review of probabilistic numerical methods

Continuous relationships such as ODEs and PDEs are commonplace as forward models in uncertainty quantification problems, or as Bayesian likelihoods in modern statistical inverse problems (Kaipio and Somersalo, 2005; Stuart, 2010), and in particular in data assimilation algorithms with critical everyday applications such as numerical weather prediction (Law et al., 2015; Reich and Cotter, 2015). The use of a discretised solver for such forward models is usually unavoidable in practice, but introduces an additional source of uncertainty both into forward propagation of uncertainty and into subsequent inferences. While the solution to the ODE/PDE may not be random in the frequentist sense, it is nonetheless only imperfectly known through the discretised numerical solution. Probability in the subjective or Bayesian sense is one appropriate means of representing this epistemic uncertainty, particularly if the ODE/PDE solution forms part of the forward model in a Bayesian inverse problem. Failure to properly account for discretisation errors and uncertainties can lead to biased, inconsistent, and over-confident inferences (Conrad et al., 2017).

Probabilistic numerical solutions of problems such as the solution of ODEs have a long history. Modern foundations for this field were laid by the work of Diaconis (1988), O’Hagan (1992), and Skilling (1992) under the term of “Bayesian numerical analysis”. More recently, such ideas have received renewed attention under the term “probabilistic numerics” (Hennig et al., 2015; Cockayne et al., SIAM Rev., to appear): the discussion of probabilistic numerical methods for ordinary differential equations given by Schober et al. (2014); Conrad et al. (2017); Chkrebtii et al. (2016), and Teymur et al. (2018) is particularly relevant here. Also of interest in the field of probabilistic numerics, but not directly relevant to the present work, are probabilistic numerical methods for linear algebra (Hennig, 2015), optimisation (Gonzalez et al., 2016), partial differential equations (Cockayne et al., 2017; Owhadi, 2015, 2017; Wang et al., 2018), and quadrature (Briol et al., 2015). In particular, Cockayne et al. (SIAM Rev., to appear) sets out some axiomatic foundations for probabilistic numerical methods broadly conceived, and in particular what it means for a probabilistic numerical method to be “Bayesian”.

Randomised solutions of ODEs have also been studied in the context of stochastic or rough differential equations. In the case of non-autonomous ODEs driven by Carathéodory vector fields — i.e. vector fields that are locally integrable in time and continuous in the state space — it has been observed that randomised Euler and Runge–Kutta methods outperform their deterministic counterparts: see e.g. Stengle (1990); Jentzen and Neuenkirch (2009), and Kruse and Wu (2017) and the references therein.

We note that analysing the convergence properties of numerical solutions to (1) in terms of the approximation error for the solution, as in (7) and (8), is very much in the spirit of classical numerical analysis. For uncertainty quantification of the discretised solution of (1) as a stand-alone forward problem, this viewpoint is often sufficient. However, for applications to inverse problems and data assimilation, in which the numerical solution of (1) is used to (approximately) evaluate the data misfit or likelihood, an alternative paradigm is to directly examine the impact of discretisation upon the quality of later inferences using e.g. Bayes factors (Capistrán et al., 2016; Christen, 2017). There is also the well-established literature of information-based complexity and average-case analysis, with its greater emphasis on algorithmic aspects such as computational cost and optimal accuracy for given classes of information (Novak, 1988; Ritter, 2000; Traub and Woźniakowski, 1980; Traub et al., 1983).

2 Setup and notation

Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space sufficiently rich to serve as a common domain of definition for all the random variables and processes under consideration, and let $\mathbb{E}$ denote expectation with respect to $\mathbb{P}$ . The space of $s$ th-power integrable random variables over $(\Omega,\mathcal{F},\mathbb{P})$ will be denoted $L^{s}_{\mathbb{P}}$ . The scalars $C$ , $C^{\prime}$ , etc. denote non-negative constants whose values may change from occurrence to occurrence, but are independent of the time step $\tau>0$ . $\mathop{\mathrm{Lip}}\nolimits(\Phi)$ denotes the best Lipschitz constant of $\Phi\colon\mathbb{R}^{d}\to\mathbb{R}^{d}$ :

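$$\mathop{\mathrm{Lip}}\nolimits(\Phi)\coloneqq\min\bigl\{L\geq 0\,\bigm|\,\|\Phi(x)-\Phi(y)\|\leq L\|x-y\|\bigr\}$$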

for all $x,y\in\mathbb{R}^{d}$ . We let $\mathbb{N}$ denote the natural numbers beginning with $1$ , and $\mathbb{N}_{0}\coloneqq\mathbb{N}\cup\{0\}$ . We shall sometimes abuse notation and write $[K]\coloneqq\{0,1,\dotsc,K-1\}$ or $[K]\coloneqq\{1,2,\dotsc,K\}$ , and we shall write $u_{k}\coloneqq u(t_{k})\equiv\Phi^{\tau}(u_{k-1})$ for the value of the exact solution to (1) at time $t_{k}$ . We denote the minimum of a pair of real numbers $a$ and $b$ by $a\wedge b=\min\{a,b\}$ .

It will be assumed throughout that $T>0$ is a fixed, deterministic time, and that $f$ in (1) is sufficiently smooth such that (1) has a unique solution for every initial condition $u_{0}$ . The flow map $\Phi^{t}$ associated to (1) is defined in (2), and the output of a one-step deterministic numerical integration method for a given $x$ and time step $\tau$ will be given by $\Psi^{\tau}(x)$ . This setting encompasses many of the time-stepping methods in common use, such as Runge–Kutta methods of all orders.

The analysis of this paper will make repeated use of several useful inequalities, which are collected here for reference. First, recall Young’s inequality: for any $\delta>0$ and any pair of Hölder conjugate exponents $r,r^{\ast}>1$ ,

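$$ab\leq\frac{\delta}{r}a^{r}+\frac{1}{r^{\ast}\delta^{r^{\ast}/r}}b^{r^{\ast}},\quad\text{for all }a,b\geq 0.$$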

Combining that inequality for $r=r^{\ast}=2$ with the Cauchy–Schwarz inequality in $\mathbb{R}^{d}$ yields

$$\|x-y\|^{2}\leq(1+\delta)\|x\|^{2}+(1+\delta^{-1})\|y\|^{2},$$

which will often be used either with $\delta=1$ or $\delta=\tau$ .
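The combination of Young's inequality and the Cauchy–Schwarz inequality mentioned above can be written out as follows. This is a short worked derivation added for the reader's convenience, using Young's inequality with $a=\|x\|$, $b=\|y\|$, and $r=r^{\ast}=2$, so that $2ab\leq\delta a^{2}+\delta^{-1}b^{2}$:

```latex
\begin{align*}
\|x-y\|^{2}
&=\|x\|^{2}-2\langle x,y\rangle+\|y\|^{2}\\
&\leq\|x\|^{2}+2\|x\|\,\|y\|+\|y\|^{2}
&&\text{(Cauchy--Schwarz)}\\
&\leq\|x\|^{2}+\delta\|x\|^{2}+\delta^{-1}\|y\|^{2}+\|y\|^{2}
&&\text{(Young, $r=r^{\ast}=2$)}\\
&=(1+\delta)\|x\|^{2}+(1+\delta^{-1})\|y\|^{2}.
\end{align*}
```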

The following discrete-time version of Grönwall’s inequality (Holte, 2009) will also be useful:

Theorem 2.1

Let $(x_{k})_{k\in\mathbb{N}_{0}}$ , $(\alpha_{k})_{k\in\mathbb{N}_{0}}$ , and $(\beta_{k})_{k\in\mathbb{N}_{0}}$ be non-negative sequences. If, for all $k\in\mathbb{N}_{0}$ ,

$$x_{k}\leq\alpha_{k}+\sum_{0\leq j<k}\beta_{j}x_{j}\quad\text{and}\quad\alpha_{k}\leq A,$$

then $x_{k}\leq A\exp\left(\sum_{0\leq j<k}\beta_{j}\right)$ for all $k\in\mathbb{N}_{0}$ .
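Theorem 2.1 can be checked numerically on a simple example. The sketch below is an illustration under stated assumptions: it takes constant $\alpha_{k}=A$ and constant $\beta_{j}$, builds the sequence that satisfies the hypothesis with equality (the extremal case), and compares it with the bound $A\exp(\sum_{0\leq j<k}\beta_{j})$. The function name `gronwall_bound` and the numerical values are hypothetical.

```python
import math


def gronwall_bound(A, betas):
    """Return A * exp(sum_{j<k} beta_j) for k = 0, ..., len(betas)."""
    bounds, partial = [], 0.0
    for k in range(len(betas) + 1):
        bounds.append(A * math.exp(partial))
        if k < len(betas):
            partial += betas[k]
    return bounds


# Extremal case of the hypothesis: x_k = A + sum_{j<k} beta_j * x_j.
A, betas = 2.0, [0.1] * 50
xs = []
for k in range(len(betas) + 1):
    xs.append(A + sum(betas[j] * xs[j] for j in range(k)))

bounds = gronwall_bound(A, betas)
print(all(x <= b for x, b in zip(xs, bounds)))
```

In this example the recursion gives $x_{k}=A(1+\beta)^{k}$, which lies below $A\exp(k\beta)$ because $1+\beta\leq\exp(\beta)$.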

For completeness, we state the following lemma.

Lemma 1

Let $x,y\geq 0$ , $n\in\mathbb{N}$ , and $\delta>0$ . Then

$$(x+y)^{n}\leq x^{n}(1+\delta 2^{n-1})+y^{n}(1+(2/\delta)^{n-1}).$$

We shall also use the following inequality, which is valid for arbitrary $N\in\mathbb{N}$ and $m\geq 1$ : for all $\{s_{j}\}_{j\in[N]}\in\mathbb{R}^{N}$ ,

$$\Bigl|\sum_{j=1}^{N}s_{j}\Bigr|^{m}\leq N^{m-1}\sum_{j=1}^{N}|s_{j}|^{m}.\tag{12}$$

This follows from

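$$\Bigl|\sum_{j=1}^{N}s_{j}\Bigr|^{m}\leq\Bigl(\sum_{j=1}^{N}|s_{j}|\Bigr)^{m}=N^{m}\Bigl(\frac{1}{N}\sum_{j=1}^{N}|s_{j}|\Bigr)^{m}\leq N^{m}\cdot\frac{1}{N}\sum_{j=1}^{N}|s_{j}|^{m}=N^{m-1}\sum_{j=1}^{N}|s_{j}|^{m},$$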

where we used Jensen’s inequality in the second inequality.

3 High-order integration of Lipschitz flows

The purpose of this section is to establish, given the initial value problem (1), the strong convergence result (8) for probabilistic solvers of the form (6), under the following assumptions.

Assumption 3.1

The vector field $f$ admits $0<\tau^{\ast}\leq 1$ and $C_{\Phi}\geq 1$ , such that for $0<\tau<\tau^{\ast}$ , the flow map $\Phi^{\tau}$ defined by (2) is globally Lipschitz with Lipschitz constant $\mathop{\mathrm{Lip}}\nolimits(\Phi^{\tau})\leq 1+C_{\Phi}\tau$ .

As is well known, Assumption 3.1 holds if the generating vector field $f$ is itself globally Lipschitz. However, Assumption 3.1 also holds under weaker conditions, for instance if $f$ merely satisfies the one-sided Lipschitz inequality

$$\langle f(x)-f(y),x-y\rangle\leq\mu\|x-y\|^{2},\quad\text{for all }x,y\in\mathbb{R}^{d},$$

for some constant $\mu\in\mathbb{R}$ ; in this case, a calculation of $\frac{\mathrm{d}}{\mathrm{d}t}\|u(t)-v(t)\|^{2}$ for trajectories $u$ and $v$ starting at initial conditions $u_{0},v_{0}\in\mathbb{R}^{d}$ and an application of the differential version of Grönwall’s inequality shows that $\|u(t)-v(t)\|\leq\exp(\mu|t|)\|u_{0}-v_{0}\|$ , so that $\mathop{\mathrm{Lip}}\nolimits(\Phi^{t})\leq 1+2|\mu||t|$ for small $|t|$ .
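A standard example of a vector field that satisfies the one-sided Lipschitz inequality without being globally Lipschitz is the scalar cubic $f(x)=x-x^{3}$, for which $\langle f(x)-f(y),x-y\rangle=(x-y)^{2}-(x-y)^{2}(x^{2}+xy+y^{2})\leq(x-y)^{2}$, so that $\mu=1$ works. This example is our illustration and does not appear in the text above. The sketch checks the inequality on random pairs and shows that difference quotients of $f$ are large away from the origin.

```python
import random


def f(x):
    # Cubic vector field: one-sided Lipschitz with mu = 1, not globally Lipschitz.
    return x - x ** 3


def one_sided_gap(x, y, mu=1.0):
    """Return mu*|x - y|^2 - <f(x) - f(y), x - y>; non-negative if the bound holds."""
    return mu * (x - y) ** 2 - (f(x) - f(y)) * (x - y)


rng = random.Random(0)
pairs = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(1000)]
worst_gap = min(one_sided_gap(x, y) for x, y in pairs)

# The difference quotient |f(10) - f(9)| / |10 - 9| is far larger than mu = 1.
slope = abs(f(10.0) - f(9.0))
print(worst_gap >= -1e-9, slope)
```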

Assumption 3.2

The numerical method $\Psi^{\tau}$ has uniform local truncation error of order $q+1$ : for some constant $C_{\Psi}\geq 1$ that does not depend on $\tau$ ,

$$\sup_{u\in\mathbb{R}^{d}}\|\Psi^{\tau}(u)-\Phi^{\tau}(u)\|\leq C_{\Psi}\tau^{q+1}.$$

Assumption 3.2 holds, in particular, for single- and multi-step methods derived from a $q$ -times continuously differentiable vector field $f$ with bounded $q$ th derivatives (Hairer et al., 2009, Section III.2). Imposing global bounds on the derivatives of $f$ , and therefore on those of $\Phi^{\tau}$ , forces us to consider a smaller class of flow maps $\Phi^{\tau}$ than the class of flow maps that satisfy Assumption 3.1. We may alleviate this problem by weakening Assumption 3.2 to a bound of the form

$$\|\Phi^{\tau}(u)-\Psi^{\tau}(u)\|\leq C^{\prime}(u)\tau^{q+1},\tag{13}$$

with the consequence that the dependence of $C^{\prime}(u)$ on $u$ must be specified; this dependence will vary according to the chosen numerical method $\Psi^{\tau}$ . Moreover, whenever we apply (13) in place of Assumption 3.2 with a random variable $U_{k}$ in place of a deterministic $u_{k}$ — as we do below, e.g. in deriving (36) — we will need to ensure that $\mathbb{E}[C^{\prime}(U_{k})]$ is finite, and of the correct order in $\tau$ if necessary. In Section 4, we consider the implicit Euler method for a class of locally Lipschitz flow maps $\Phi^{\tau}$ , obtain an expression for $C^{\prime}(U_{k})$ , and with this expression obtain a bound of the form

$$\mathbb{E}\bigl[\|\Phi^{\tau}(U_{k})-\Psi^{\tau}(U_{k})\|^{n}\bigr]\leq C\tau^{n(q+1)}$$

where $U_{k}$ denotes the output of the randomised numerical integrator according to (6), $n\in\mathbb{N}$ , and $C>0$ does not depend on $\tau$ or on $k$ ; see Proposition 2. Note that there is no supremum inside the expectation in the inequality above. However, in this section, we shall apply Assumption 3.2 instead of (13), in order to avoid lengthy analyses that are specific to the choice of numerical method. We make no assumptions about how the integrator $\Psi^{\tau}$ has been derived and treat it as a ‘black box’ satisfying Assumption 3.2.

Assumption 3.3

The random variables $(\xi_{k}(\tau))_{k\in\mathbb{N}}$ admit parameters $p\geq 1$ , $R\in\mathbb{N}\cup\{+\infty\}$ , and $C_{\xi,R}\geq 1$ , independent of $k$ and $\tau$ , such that for all $1\leq r\leq R$ and all $k\in\mathbb{N}$ ,

$$\mathbb{E}\bigl[\|\xi_{k}(\tau)\|^{r}\bigr]\leq\left(C_{\xi,R}\tau^{p+1/2}\right)^{r}.$$

Note that we do not assume that the $(\xi_{k}(\tau))_{k\in[K]}$ are mutually independent, identically distributed, or centred. However, we will impose independence and centredness as additional assumptions in Theorem 3.4. The parameter $p$ determines the decay rate of the $r$ th moments of the $(\xi_{k}(\tau))_{k\in[K]}$ , for $1\leq r\leq R$ , while $R$ determines the highest order moment for which the same decay behaviour holds.

Since Assumption 3.3 does not assume that the $\xi_{k}$ are identically distributed or mutually independent, it can hold for the following variant of (6):

$$U_{k+1}\coloneqq\Psi^{\tau}(U_{k})+\xi_{k}(\tau,U_{k}),\quad\text{for all }k\in[K].$$

In this setting, we interpret Assumption 3.3 as the condition that the dependence of the moments of $\xi_{k}$ on the state $U_{k}$ , can be uniformly controlled by the constant $C_{\xi,R}$ . We leave a more extensive investigation of state-dependent noise models for future work.
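One way to picture a state-dependent, non-Gaussian noise model that satisfies Assumption 3.3 is sketched below. This is a hypothetical construction for illustration, not one proposed in the text: the noise is uniform on an interval whose half-width is $\tau^{p+1/2}/(1+U_{k}^{2})$, so that $|\xi_{k}(\tau,U_{k})|\leq\tau^{p+1/2}$ almost surely and the moment bound holds with $C_{\xi,R}=1$ for every $r$. The names `state_dependent_noise`, `randomised_step`, and `euler_map` are assumptions of this example.

```python
import random


def state_dependent_noise(tau, U, p=1.0, rng=random):
    """Draw xi_k(tau, U): bounded, non-Gaussian, with state-dependent scale.

    The scale 1 / (1 + U**2) lies in (0, 1], so |xi| <= tau**(p + 1/2).
    """
    return tau ** (p + 0.5) * rng.uniform(-1.0, 1.0) / (1.0 + U * U)


def randomised_step(psi, tau, U, rng=random):
    """One step U_{k+1} = Psi^tau(U_k) + xi_k(tau, U_k)."""
    return psi(tau, U) + state_dependent_noise(tau, U, rng=rng)


def euler_map(tau, u):
    # Explicit Euler map for u' = -u, used here as the black-box Psi^tau.
    return u - tau * u


rng = random.Random(1)
tau, U = 0.01, 1.0
for _ in range(100):
    U = randomised_step(euler_map, tau, U, rng=rng)
print(f"U_K = {U:.4f}")
```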

It follows from (12) and Assumption 3.3 that, for $v,w\in\mathbb{N}$ ,

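$$\mathbb{E}\Biggl[\Bigl(\sum_{i=1}^{T/\tau}\|\xi_{i}(\tau)\|^{w}\Bigr)^{v}\Biggr]\leq\bigl(TC_{\xi,R}^{w}\tau^{w(p+1/2)-1}\bigr)^{v}.$$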

This is because

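$$\mathbb{E}\Biggl[\Bigl(\sum_{i=1}^{T/\tau}\|\xi_{i}(\tau)\|^{w}\Bigr)^{v}\Biggr]\leq\Bigl(\frac{T}{\tau}\Bigr)^{v-1}\sum_{i=1}^{T/\tau}\mathbb{E}\bigl[\|\xi_{i}(\tau)\|^{wv}\bigr]\leq\Bigl(\frac{T}{\tau}\Bigr)^{v}\bigl(C_{\xi,R}\tau^{p+1/2}\bigr)^{wv}=\bigl(TC_{\xi,R}^{w}\tau^{w(p+1/2)-1}\bigr)^{v},$$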

where we used (12) and Assumption 3.3 for the first and second inequality respectively.
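As an illustrative special case, assuming $R\geq 2$ so that Assumption 3.3 covers second moments, taking $v=1$ and $w=2$ shows that the second moments of the noise accumulated over the whole time interval are of order $\tau^{2p}$:

```latex
\mathbb{E}\Biggl[\sum_{i=1}^{T/\tau}\|\xi_{i}(\tau)\|^{2}\Biggr]
=\sum_{i=1}^{T/\tau}\mathbb{E}\bigl[\|\xi_{i}(\tau)\|^{2}\bigr]
\leq\frac{T}{\tau}\bigl(C_{\xi,R}\tau^{p+1/2}\bigr)^{2}
=T\,C_{\xi,R}^{2}\,\tau^{2p}.
```

The extra half power of $\tau$ in Assumption 3.3 is what compensates for summing over the $T/\tau$ time steps.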

As noted in the introduction, the focus of this paper is on the convergence rate of the error $e_{k}\coloneqq u_{k}-U_{k}$ and not on, say, the covariance operator of $e_{k}$ , though that information is also important in applications. Note that if $\xi_{k}(\tau)$ in Assumption 3.3 does not belong to $L^{2}_{\mathbb{P}}$ , then $\xi_{k}(\tau)$ does not admit a covariance operator. Accordingly, Assumption 3.3 and similar assumptions later in the paper are only upper bounds, and we do not actually work with the covariance operator of $\xi_{k}$ . The precise construction of stochastic models for discretisation and truncation error is an interesting topic in its own right at the interface of numerical analysis and probability, upon which this paper only starts to touch; we anticipate that there will be further research concerning this question.

Given $e_{k}=u_{k}-U_{k}$ , it follows from (5) and (6) that

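$$e_{k+1}=\bigl(\Phi^{\tau}(u_{k})-\Phi^{\tau}(U_{k})\bigr)-\bigl(\Psi^{\tau}(U_{k})-\Phi^{\tau}(U_{k})\bigr)-\xi_{k}(\tau).\qquad(15)$$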

We shall use the decomposition (15) throughout this article.

The next result is stronger than Conrad et al. (2017, Theorem 2.2), as the discrete time supremum is inside the expectation, and as it does not require the vector field $f$ to be globally Lipschitz nor $\xi$ to be Gaussian:

Theorem 3.4

Suppose Assumptions 3.1 and 3.2 hold, and fix $u_{0}=U_{0}$ . Furthermore, if it holds that $X\in L^{2}_{\mathbb{P}}\implies\Psi^{\tau}(X)\in L^{2}_{\mathbb{P}}$ , and if the $(\xi_{k}(\tau))_{k\in[K]}$ have zero mean, are mutually independent, and satisfy Assumption 3.3 for $R=2$ and $p\geq 1$ , then there exists $C>0$ that does not depend on $\tau$ such that

[TABLE]

In contrast to Theorem 3.4, which required that the $(\xi_{k}(\tau))_{k\in[K]}$ be independent and centred in order to construct a martingale, we make no independence or centredness assumptions on the $(\xi_{k}(\tau))_{k\in[K]}$ for the rest of this article. The following result should be compared to Theorem 3.4 by considering the case $R=n=2$ . Then for the randomised method to have the same order as the deterministic method on which it is based, we need that $p\geq q+\tfrac{1}{2}$ . In other words, if we remove the assumptions on the $(\xi_{k}(\tau))_{k\in[K]}$ of independence and centredness, then we require that the second moments of the $(\xi_{k}(\tau))_{k\in[K]}$ decay to zero with time-step $\tau$ at a faster rate than in Theorem 3.4, since the lower bound $q+\tfrac{1}{2}$ on $p$ implied by Theorem 3.5 is larger than the lower bound $q$ on $p$ implied by Theorem 3.4.

Theorem 3.5

Let $n\in\mathbb{N}$ . Suppose that Assumptions 3.1, 3.2, and 3.3 hold with $\tau^{\ast}\leq 1$ , $q\geq 1$ , $p\geq 1$ , and $R$ , and that $u_{0}=U_{0}$ . Then, there exists a $\overline{C}>0$ that does not depend on $\tau$ such that for $0<\tau<\tau^{\ast}$ ,

[TABLE]

where

[TABLE]

and $C_{\Phi}(n,\tau^{\ast})$ is defined according to (39).
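A heuristic count indicates where the gap of $\tfrac{1}{2}$ between the lower bounds on $p$ in Theorem 3.4 and Theorem 3.5 comes from. The Python sketch below is an illustration only: it assumes that each $\|\xi_{k}(\tau)\|$ is of size $\tau^{p+1/2}$, it ignores the propagation of perturbations by the flow, and the function names are hypothetical. It compares the accumulation of $K=T/\tau$ perturbations in the worst case (all aligned) with the root-mean-square accumulation of independent, centred perturbations.

```python
import math


def worst_case_accumulation(tau, p, T=1.0):
    """Sum of K = T / tau terms, each of size tau^(p + 1/2), all aligned."""
    K = round(T / tau)
    return K * tau ** (p + 0.5)


def independent_rms_accumulation(tau, p, T=1.0):
    """Root-mean-square size of a sum of K independent centred terms of that size."""
    K = round(T / tau)
    return math.sqrt(K) * tau ** (p + 0.5)


tau, q = 0.01, 1.0
# Without independence or centredness, p = q + 1/2 is needed to reach order tau^q.
print(worst_case_accumulation(tau, p=q + 0.5), tau ** q)
# With independent centred noise, p = q already gives order tau^q.
print(independent_rms_accumulation(tau, p=q), tau ** q)
```

In this count the worst-case sum scales like $\tau^{p-1/2}$ and the independent sum like $\tau^{p}$, which is consistent with the conditions $p\geq q+\tfrac{1}{2}$ and $p\geq q$ discussed above.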

We shall show that if we strengthen Assumption 3.3 by allowing for arbitrarily large $R\in\mathbb{N}$ , then the moment generating function of $\max_{\ell\in[K]}\|e_{\ell}\|^{n}$ is finite on $\mathbb{R}$ .

Corollary 1

Fix $n\in\mathbb{N}$ . Suppose that Assumptions 3.1 and 3.2 hold, and that Assumption 3.3 holds with $R=+\infty$ and $p\geq 1/2$ . Then, for all $0<\tau<\tau^{\ast}$ and all $\rho\in\mathbb{R}$ ,

[TABLE]

Hence, by Markov’s inequality, the distribution of $\max_{\ell\in[K]}\|e_{\ell}\|^{n}$ concentrates exponentially about its mean.

We close this section by noting that, while we have made no attempt to find the optimal constants in Theorem 3.4 and Theorem 3.5, the convergence orders in these results cannot be improved at the present level of generality. This is because the convergence order of the randomised solution cannot exceed that of the underlying deterministic solver, unless the random variables $\xi_{k}(\tau)$ used to model the error $\Phi^{\tau}(u_{k})-\Psi^{\tau}(u_{k})$ at each time step $t_{k}$ are chosen to achieve this effect. We leave the construction of such randomised solvers for future work.
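The matching of orders can be probed numerically. The following Python sketch is an illustration under stated assumptions, not an experiment from this paper: the test problem $u'=-u$, the choice of explicit Euler for $\Psi^{\tau}$ (so that $q=1$), the i.i.d. centred Gaussian noise with standard deviation $\tau^{p+1/2}$, and the helper name `mean_square_sup_error` are all chosen here for illustration.

```python
import math
import random


def mean_square_sup_error(tau, p=1.0, T=1.0, u0=1.0, samples=400, seed=0):
    """Monte Carlo estimate of E[max_k |u_k - U_k|^2] for randomised explicit Euler.

    Test problem (an assumption of this sketch): u' = -u, so u(t) = u0 * exp(-t).
    Noise model (also an assumption): xi_k ~ N(0, tau^(2p+1)), i.i.d. and centred.
    """
    rng = random.Random(seed)
    K = round(T / tau)
    noise_std = tau ** (p + 0.5)
    total = 0.0
    for _ in range(samples):
        U = u0
        worst = 0.0
        for k in range(1, K + 1):
            # One step U_{k+1} = Psi^tau(U_k) + xi_k(tau), with Psi = explicit Euler.
            U = U + tau * (-U) + rng.gauss(0.0, noise_std)
            exact = u0 * math.exp(-k * tau)
            worst = max(worst, (exact - U) ** 2)
        total += worst
    return total / samples


coarse = mean_square_sup_error(tau=1 / 50)
fine = mean_square_sup_error(tau=1 / 100)
print(coarse, fine, coarse / fine)
```

With $p=q=1$, a mean-square error of order $\tau^{2}$ would make the printed ratio roughly 4 when $\tau$ is halved, up to Monte Carlo error.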

4 Integration for locally Lipschitz vector fields

This section considers the numerical integration of vector fields $f$ that satisfy the following polynomial growth condition.

Assumption 4.1

The vector field $f$ is continuously differentiable, and both $f$ and the associated map $\Phi^{\tau}$ defined by (2) admit $0<\tau^{\ast}\leq 1$ , $C_{\Phi}\geq 1$ , and $s\geq 1$ , such that the following inequalities hold for all $a,b\in\mathbb{R}^{d}$ and all $0<\tau<\tau^{\ast}$ :

[TABLE]

The inequality (20a) implies

[TABLE]

By Taylor’s theorem, the remainder term $R^{\tau}(a)$ in the first-order Taylor expansion (45) of $\Phi^{\tau}(a)$ is given by the derivatives of $f$ , evaluated at some $a\in\mathbb{R}^{d}$ for some $0\leq t\leq\tau$ . The condition (20b) means that for some $\tau^{\ast}>0$ that is sufficiently small, the norm of the difference between two remainder terms can be controlled. The growth condition (20a) is not new; see for example Higham et al. (2002, Assumption 4.1).

The following result is analogous to Theorem 3.5. It states that we can replace Assumption 3.1 with Assumption 4.1 and obtain the same result as Theorem 3.5, provided that the $(\xi_{k}(\tau))_{k\in[K]}$ are $\mathbb{P}$ -a.s. bounded.

Theorem 4.2

Suppose that Assumptions 4.1, 3.2, and 3.3 hold for $p$ and $R$ as in Theorem 3.5. Suppose that $u_{0}=U_{0}$ . If the $(\xi_{k}(\tau))_{k\in[K]}$ are $\mathbb{P}$ -a.s. uniformly bounded over all $k$ by a positive scalar that is $O(\tau)$ , then the conclusions of Theorem 3.5 hold.

It is of theoretical interest to determine whether there exists a deterministic numerical method $\Psi$ such that the randomised version given by (6) has the same order even when each $\xi_{k}(\tau)$ is not $\mathbb{P}$ -a.s. bounded. In the remainder of this section, we shall show that for the implicit Euler method $\Psi^{\tau}\colon\mathbb{R}^{d}\to\mathbb{R}^{d}$ defined by

$$\Psi^{\tau}(a)\coloneqq a+\tau f(\Psi^{\tau}(a)),\qquad(22)$$

the randomised version given by (6) has order 1, under the following dissipativity assumption.

Assumption 4.3

The function $f$ admits parameters $\alpha\geq 0$ and $\beta\in\mathbb{R}$ such that

[TABLE]

Assumption 4.3 is more general than the usual dissipativity property found in Humphries and Stuart (1994, Equation (1.2)) because $\beta$ may assume positive values. The sign of $\beta$ in (23) plays an important role in the behaviour of the solution $u$ of (1), as well as in numerical methods for solving for $u$ . For example, if $\beta$ is positive, then the problem (1) may be stiff. In this paper, we study only the rate of convergence, and leave the issue of stiffness for future work. In particular, allowing for positive $\beta$ poses no problem for establishing moment bounds, as we show in Lemma 2.

Recent studies in numerical methods for stochastic differential equations consider constraints on the drift that feature the same right-hand side as (23), e.g. Fang and Giles (2016) and Mao and Szpruch (2013). We reiterate, however, that the analysis of numerical methods for stochastic differential equations cannot be applied to probabilistic solvers of the form (6), because of the different behaviour in the additive noise (see e.g. Assumption 3.3).

Assumption 4.4

Let $\tau^{\ast}\leq 1$ be as in Assumption 4.1 and $\beta\in\mathbb{R}$ be as in Assumption 4.3. Then there exists some $0<\tau^{\prime}\leq\min\{\tau^{\ast},(2|\beta|)^{-1}\}$ such that there exists a solution $\Psi^{\tau}(a)$ to the implicit equation (22) for every $0\leq\tau\leq\tau^{\prime}$ , such that the solution $\Psi^{\tau}(a)$ varies continuously as a function of $\tau$ in the interval $0\leq\tau\leq\tau^{\prime}$ , and such that $\left.\Psi^{\tau}\right|_{\tau=0}(a)=a$ .

Note that Assumption 4.4 is weaker than assuming unique solvability of (22) for every $a\in\mathbb{R}^{d}$ over a sufficiently small time interval.
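To make the implicit equation concrete, the following Python sketch solves $y=a+\tau f(y)$ by Newton's method for a scalar example. Everything specific in it is an assumption made for illustration: the vector field $f(u)=u-u^{3}$, the helper names, and the reading of the dissipativity condition as $\langle f(a),a\rangle\leq\alpha+\beta\|a\|^{2}$, under which this $f$ has $\alpha=0$ and $\beta=1>0$.

```python
def f(u):
    """Illustrative vector field: u * f(u) = u^2 - u^4 <= u^2 (alpha = 0, beta = 1)."""
    return u - u ** 3


def df(u):
    """Derivative of f, used by Newton's method."""
    return 1.0 - 3.0 * u ** 2


def implicit_euler_step(a, tau, tol=1e-14, max_iter=50):
    """Solve y = a + tau * f(y) for y by Newton's method, starting from y = a."""
    y = a
    for _ in range(max_iter):
        residual = y - a - tau * f(y)
        if abs(residual) <= tol:
            break
        # Newton update for g(y) = y - a - tau * f(y), with g'(y) = 1 - tau * f'(y).
        y -= residual / (1.0 - tau * df(y))
    return y


a, tau = 2.0, 0.1
y = implicit_euler_step(a, tau)
print(y, y - a - tau * f(y))
```

For $\tau=0$ the iteration returns $a$ immediately, in line with the requirement $\left.\Psi^{\tau}\right|_{\tau=0}(a)=a$; for this particular $f$ and $0\leq\tau<1$, the map $y\mapsto y-a-\tau f(y)$ is strictly increasing, so the solution is unique.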

Unless otherwise specified, we shall assume hereafter that $0<\tau<\tau^{\prime}$ .

4.1 Moment bounds for implicit Euler

Lemma 2

Suppose that Assumptions 4.1, 4.3, and 4.4 hold, and let $n\in\mathbb{N}$ be arbitrary. Given a fixed, deterministic $U_{0}$ , the following holds uniformly in $\omega\in\Omega$ :

[TABLE]

for $C_{2}$ given in (44) below.

Note that Lemma 2 is the only statement for which we directly use Assumption 4.3. The following results depend on Assumption 4.3 only insofar as they depend on the conclusions of Lemma 2.

Proposition 1

Suppose that Assumptions 4.1, 4.4, and 4.3 hold, and let $n\in\mathbb{N}$ be arbitrary. If Assumption 3.3 holds for some $R\geq 2n$ and some $p\geq 1$ , then

[TABLE]

for $C_{2}$ defined in (44), and $C_{\xi,R}$ in Assumption 3.3.

Proof

The statement follows directly from the conclusion (24) of Lemma 2 and (14) with $w=2$ and $v=n$ .

Corollary 2

Suppose that Assumptions 4.1, 4.4, 4.3, and 3.3 hold with $R=+\infty$ and $p\geq 1/2$ . Then

[TABLE]

Proof

The result follows from Proposition 1, the series expansion of the exponential, and the dominated convergence theorem.

Lemma 2 shows that whenever Assumption 4.3 holds, then regardless of the growth behaviour of $f$ , the randomised implicit Euler method has the property that if $X\in L^{R}_{\mathbb{P}}$ for some $R\in\mathbb{N}$ , then $\Psi^{\tau}(X)\in L^{R}_{\mathbb{P}}$ as well; cf. the hypothesis on $\Psi^{\tau}$ in Theorem 3.4.

4.2 Convergence in discrete time for implicit Euler

Proposition 2

Let $n\in\mathbb{N}$ , and suppose that Assumptions 4.1 and 3.3 hold for some $R\geq 2n(2s+1)$ and some $p\geq 1$ . Then there exists a scalar $C_{\Psi}>0$ that does not depend on $\tau$ or $k\in[K]$ , such that for all $k\in[K]$ ,

[TABLE]

with $C_{\Psi}$ as in (50).

Proposition 2 shows that when $f$ satisfies the polynomial growth condition and $\Psi$ is the implicit Euler method, then the local truncation error at step $k$ of the randomised numerical integrator satisfies a bound analogous to that in Assumption 3.2, provided that the random variables $(\xi_{k}(\tau))_{k\in[K]}$ are sufficiently regular.

Theorem 4.5

Let $n\in\mathbb{N}$ , and let $\Psi^{\tau}$ be given by (22). Suppose that Assumptions 4.1, 4.3, and 4.4 hold, with parameters $s\geq 1$ and $\tau^{\prime}>0$ . Suppose that Assumption 3.3 holds with $R\geq 2n(2s+1)$ and $p\geq\tfrac{3}{2}$ . Then there exists some $C>0$ that does not depend on $\tau$ such that for $0<\tau<\tau^{\prime}$ ,

[TABLE]

Note that the condition $p\geq\tfrac{3}{2}$ coincides with the condition $p\geq q+\tfrac{1}{2}$ on $p$ in Theorem 3.5, since the implicit Euler method has order $q=1$.

4.3 Alternative decomposition of the error

The decomposition (15) of the error $e_{k+1}$ was used to derive the convergence results above. One might consider instead using the decomposition

[TABLE]

with the goal of using some stability properties of the implicit Euler method. However, this approach leads to a convergence result that is weaker, either because it requires exponential integrability of $\|U_{k}\|$ , or because the convergence is uniform only on a proper subset $\Omega_{\tau}$ of the event space $\Omega$ . Recall that we do not assume any of the $\xi_{k}(\tau)$ to be a.s. bounded.

By (10) and the fact that implicit Euler has order one (i.e. Assumption 3.2)

[TABLE]

where one can show, using the proof of Proposition 2, that $C>0$ in (27) depends on $\|u_{k}\|^{s}$ but not on $\tau$ . By (10) we obtain

[TABLE]

Substituting the result above into (27), and assuming that $\tau<1$ , we obtain

[TABLE]

The definition (22) of the implicit Euler method and (10) yield

[TABLE]

by Assumption 4.1. Rearranging the above yields

[TABLE]

where $\hat{M}\coloneqq[1+\|\Psi^{\tau}(u_{k})\|^{s}+\|\Psi^{\tau}(U_{k})\|^{s}]^{2}$ is a random variable. Analogously, define the random variable $M$ by

[TABLE]

Suppose that $u_{0}=U_{0}$ are fixed, and define

[TABLE]

Since it is not the case that all of the random variables $(\xi_{k}(\tau))_{k\in[K]}$ are a.s. bounded, it follows that $\Omega_{\tau}$ is a proper subset of $\Omega$, for every $\tau>0$. In what follows, we assume that $\Omega_{\tau}$ is nonempty, and that $\omega\in\Omega_{\tau}$; we suppress the $\omega$-dependence of all random variables. Define $\widetilde{C}>0$ by

[TABLE]

Using (30), we have

[TABLE]

and substituting the above into (28) yields

[TABLE]

Proceeding as in the proof of Theorem 4.5, we use a telescoping sum, Grönwall’s theorem, and Assumption 3.3 with $p\geq q+1/2$ to obtain

[TABLE]

where $\kappa$ depends on $\tau$ according to

[TABLE]

For any $\tau>0$ , it follows from the definition of $\kappa$ , and considering the zeroth order term $3+\widetilde{C}$ above that

[TABLE]

From (29), it follows that, for all $\omega\in\Omega_{\tau}$ , we have

[TABLE]

where the right-hand side increases to infinity as $\tau$ decreases to zero. Thus, it need not be true that the quantity $\mathbb{E}[1_{\Omega_{\tau}}\exp(T\kappa)]$ is finite. One way to ensure that $\mathbb{E}[1_{\Omega_{\tau}}\exp(T\kappa)]$ is finite for $0<\tau<\tau^{\prime}$ would be to require that $\mathbb{E}[\exp(T\kappa)]$ is finite on the same range. By the inequality for $\kappa$ above, a necessary condition for $\mathbb{E}[\exp(T\kappa)]$ to be finite is exponential integrability of $\max_{k\in[K]}\|\Psi^{\tau}(U_{k})\|^{2s}$. In many cases, a necessary condition for this would be exponential integrability of $\max_{k\in[K]}\|U_{k}\|^{2s}$. By Corollary 2, in order to guarantee exponential integrability of $\max_{k\in[K]}\|U_{k}\|^{2s}$, we would need to impose much stronger regularity conditions on the $(\xi_{k}(\tau))_{k\in[K]}$ than those in Theorem 4.5. Finally, we remark that if the $(\xi_{k})_{k\in[K]}$ are not $\mathbb{P}$-a.s. uniformly bounded, then, for any $\tau>0$, (31) is a weaker convergence result than Theorem 4.5, since in this case $\Omega_{\tau}$ is a proper subset of $\Omega$.

5 Additional results

5.1 Convergence for continuous-time interpolant

Recall that (6) defines the discrete-time process $(U_{k})_{k\in[K]}$. In many applications it is useful to have a numerical method that provides continuous output, e.g. in an inverse problem or a data assimilation task that requires comparison between the numerical solution and an observation that is not on the time grid $(t_{k})_{k\in[K]}$ defined in (3). Given this time grid $(t_{k})_{k\in[K]}$, we may define a continuous-time process $U$ by

[TABLE]

For the above definition to work, we assume that each $\xi_{k}$ is a stochastic process defined on the time interval $[0,\tau]$ . In addition, to ensure that the process $U$ has $\mathbb{P}$ -almost surely continuous paths, we require that $\mathbb{P}(\xi_{k}(0)=0)=1$ . The corresponding notion of the error at time $0\leq t\leq T$ is given by $e(t)\coloneqq u(t)-U(t)$ , where $u(t)=\Phi^{t}(u_{0})$ . We emphasise that the continuous-time process $(U(t))_{0\leq t\leq T}$ described above will in general differ from the continuous-time process obtained by linear interpolation of $(U_{k})_{k\in[K]}$ .
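The role of the condition $\mathbb{P}(\xi_{k}(0)=0)=1$ can be seen in a small Python sketch. It assumes that the continuous-time process takes the form $U(t)=\Psi^{t-t_{k}}(U_{k})+\xi_{k}(t-t_{k})$ for $t_{k}<t\leq t_{k+1}$, and it uses illustrative choices that are not taken from this paper: $f(u)=-u$, explicit Euler for $\Psi$, and the linear ramp $\xi_{k}(s)=(s/\tau)Z_{k}$ with Gaussian $Z_{k}$. The name `build_interpolant` is hypothetical.

```python
import math
import random


def build_interpolant(u0, tau, K, p=1.0, seed=0):
    """Return grid values U_0..U_K and a function t -> U(t) on [0, K * tau].

    Assumptions of this sketch: f(u) = -u, Psi^h(a) = a + h * f(a) (explicit Euler),
    and xi_k(s) = (s / tau) * Z_k with Z_k ~ N(0, tau^(2p+1)), so that xi_k(0) = 0.
    """
    rng = random.Random(seed)

    def psi(h, a):
        return a + h * (-a)

    Z = [rng.gauss(0.0, tau ** (p + 0.5)) for _ in range(K)]
    U = [u0]
    for k in range(K):
        U.append(psi(tau, U[k]) + Z[k])  # U_{k+1} = Psi^tau(U_k) + xi_k(tau)

    def U_of_t(t):
        # Locate k with t_k < t <= t_{k+1}; t = 0 is assigned to the first interval.
        k = min(max(math.ceil(t / tau) - 1, 0), K - 1)
        s = t - k * tau
        return psi(s, U[k]) + (s / tau) * Z[k]

    return U, U_of_t


U, U_of_t = build_interpolant(u0=1.0, tau=0.125, K=8)
print(U[3], U_of_t(3 * 0.125), U_of_t(3 * 0.125 + 1e-9))
```

At each grid time the interpolant reproduces $U_{k}$, and just after $t_{k}$ it stays close to $U_{k}$ because both $\Psi^{0}(a)=a$ and $\xi_{k}(0)=0$ hold in this construction.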

We now demonstrate how one can obtain a convergence result for the continuous-time process from a discrete-time convergence result by strengthening the assumption on the noise, using Theorem 3.5 as an example. Consider the following version of Assumption 3.3:

Assumption 5.1

Fix $\tau>0$ . The collection $(\xi_{k})_{k\in\mathbb{N}}$ of stochastic processes $\xi_{k}\colon\Omega\times[0,\tau]\to\mathbb{R}^{d}$ satisfies $\mathbb{P}(\xi_{k}(0)=0)=1$ and admits $p\geq 1$ , $R\in\mathbb{N}\cup\{+\infty\}$ and some $C_{\xi,R}\geq 1$ that do not depend on $k\in\mathbb{N}$ or $\tau$ , such that for all $1\leq r\leq R$ and for all $k\in\mathbb{N}$ ,

[TABLE]

Recall that we do not assume that the $\xi_{k}$ are independent, identically distributed, or centred.

Theorem 5.2

Let $n\in\mathbb{N}$ , and suppose that Assumptions 3.1, 3.2, and 5.1 hold with parameters $\tau^{\ast}$ , $C_{\Phi}$ , $C_{\Psi}$ , $q$ , $C_{\xi,R}$ , $p$ , and $R$ . Then for all $0<\tau<\tau^{\ast}$ ,

[TABLE]

where $\overline{C}$ is defined in (18).

The next result follows from Theorem 5.2 in the same way that Corollary 1 follows from Theorem 3.5.

Corollary 3

Fix $n\in\mathbb{N}$ . Suppose that Assumptions 3.1 and 3.2 hold, and that Assumption 5.1 holds with $R=+\infty$ and $p\geq 1/2$ . Then, for all $0<\tau<\tau^{\ast}$ ,

[TABLE]

Proof

The proof follows by the series representation of the exponential and the dominated convergence theorem; see the proof of Corollary 1.

5.2 Existence of processes that satisfy the $(p,R)$ -regularity condition

The lemma below shows that there exist random variables that are not $\mathbb{P}$ -a.s. bounded, and that satisfy Assumption 3.3 and, more generally, Assumption 5.1 for $R=+\infty$ .

Lemma 3

Let $\tau>0$ and $p\geq 1$ be arbitrary, and let $(B_{t})_{0\leq t\leq\tau}$ be $\mathbb{R}^{d}$ -valued Brownian motion. Then

[TABLE]

satisfies

[TABLE]

Note that variants of the integrated Brownian motion process have been used for modelling local truncation error in other works (Schober et al., 2014; Conrad et al., 2017). However, the point of Lemma 3 is not to suggest that the local truncation error behaves as an integrated Brownian motion, nor even that the integrated Brownian motion process is a suitable model for the local truncation error. The point of Lemma 3 is simply to show that there exist processes that satisfy Assumption 5.1 with $R=+\infty$ . The construction of models that better reflect known properties of the truncation error, for specific classes of vector fields $f$ , is an interesting task that we leave for future work.
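The scaling of this process in $\tau$ can be observed pathwise in a discretised sketch. The Python code below is an illustration under assumptions: it treats the scalar case $d=1$, takes the process to be $\xi_{0}(t)=\tau^{p-1}\int_{0}^{t}B_{s}\,\mathrm{d}s$ as in the proof of Lemma 3, approximates the integral by a left-endpoint Riemann sum, and uses a hypothetical function name. It computes the supremum of a single path and is not a Monte Carlo estimate of the moments.

```python
import math
import random


def sup_scaled_integrated_bm(tau, p, steps=1000, seed=0):
    """Return sup_{t <= tau} |xi(t)| for one discretised scalar path of
    xi(t) = tau^(p-1) * int_0^t B_s ds, using a left-endpoint Riemann sum.
    """
    rng = random.Random(seed)
    dt = tau / steps
    B = 0.0         # Brownian motion B_t
    integral = 0.0  # running value of int_0^t B_s ds
    sup = 0.0
    for _ in range(steps):
        integral += B * dt
        B += math.sqrt(dt) * rng.gauss(0.0, 1.0)
        sup = max(sup, abs(tau ** (p - 1) * integral))
    return sup


coarse = sup_scaled_integrated_bm(tau=1.0, p=1.0)
fine = sup_scaled_integrated_bm(tau=0.25, p=1.0)
print(coarse / fine)  # ratio of suprema when tau shrinks by a factor 4
```

Because both calls reuse the same standard normal draws, the two suprema differ by the factor $4^{p+1/2}$ (here $8$) up to rounding, which is the discrete counterpart of Brownian scaling and matches a bound of order $\tau^{p+1/2}$.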

Appendix A Proofs

Proof (Proof of Lemma 1)

The assertion (11) holds immediately for $n=1$ , so let $n\in\mathbb{N}\setminus\{1\}$ , and recall the binomial formula: for $x,y\in\mathbb{R}$ and $n\in\mathbb{N}\setminus\{1\}$ ,

[TABLE]

Fix $\delta>0$ . By (9), for any $1\leq k\leq n-1$ ,

[TABLE]

where the second inequality follows from $-\tfrac{k}{n-k}\geq-(n-1)$ . Therefore,

[TABLE]

and the proof is complete upon observing that

[TABLE]

and bounding the other binomial sum in a similar way.∎

Proof (Proof of Theorem 3.4)

By (15),

[TABLE]

By (10) with $\delta=\tau$ , by Assumption 3.1 and Assumption 3.2, and using that $\tau<\tau^{\ast}\leq 1$ ,

[TABLE]

Observe that $[(1+\tau)(1+C_{\Phi}\tau)^{2}-1]\tau^{-1}$ equals a quadratic polynomial in $\tau$ with coefficients $a_{0}$ , $a_{1}$ , and $a_{2}$ . Calculating these coefficients and defining

[TABLE]

then yields that $[(1+\tau)(1+C_{\Phi}\tau)^{2}-1]\tau^{-1}\leq C_{1}$ for all $0<\tau<\tau^{\ast}$ .
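The coefficients can be read off by expanding $(1+\tau)(1+C_{\Phi}\tau)^{2}=1+(1+2C_{\Phi})\tau+(2C_{\Phi}+C_{\Phi}^{2})\tau^{2}+C_{\Phi}^{2}\tau^{3}$. The short Python check below confirms this expansion numerically; the function names are hypothetical, and the bound $a_{0}+a_{1}+a_{2}$ used at the end is one admissible constant for $\tau^{\ast}\leq 1$, which need not coincide with the constant $C_{1}$ defined above.

```python
def quadratic_coefficients(C_phi):
    """Coefficients (a0, a1, a2) of [(1 + tau) * (1 + C_phi * tau)^2 - 1] / tau."""
    a0 = 1.0 + 2.0 * C_phi
    a1 = 2.0 * C_phi + C_phi ** 2
    a2 = C_phi ** 2
    return a0, a1, a2


def lhs(tau, C_phi):
    """Direct evaluation of [(1 + tau) * (1 + C_phi * tau)^2 - 1] / tau."""
    return ((1.0 + tau) * (1.0 + C_phi * tau) ** 2 - 1.0) / tau


C_phi = 2.0
a0, a1, a2 = quadratic_coefficients(C_phi)
for tau in (0.05, 0.3, 0.9):
    # The two columns agree, and both stay below a0 + a1 + a2 when tau < 1.
    print(tau, lhs(tau, C_phi), a0 + a1 * tau + a2 * tau ** 2, a0 + a1 + a2)
```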

Combining the preceding estimates yields

[TABLE]

Using (38) in the telescoping sum

[TABLE]

the fact that $e_{0}=u_{0}-U_{0}=0$ and $K=T/\tau$ , we obtain

[TABLE]

It follows from the last inequality that

[TABLE]

Now replace $\|e_{j}\|^{2}$ on the right-hand side with $\max_{\ell\leq j}\|e_{\ell}\|^{2}$ and take expectations of both sides of the inequality. Since Assumption 3.3 holds with $R=2$ ,

[TABLE]

Next, define for every $j\in[K]$ the $\sigma$-algebra $\mathcal{F}_{j}$ generated by $\xi_{0}(\tau),\ldots,\xi_{j}(\tau)$. Then the sequence $(\mathcal{F}_{j})_{j\in[K]}$ forms a filtration. Define $(M_{k})_{k\in[K]}$ by

[TABLE]

We want to show that this process is a martingale with respect to $(\mathcal{F}_{j})_{j\in[K]}$ . By (6), $U_{j}$ is measurable with respect to $\mathcal{F}_{j-1}$ , so $M_{k}$ is measurable with respect to $\mathcal{F}_{k}$ . Hence $(M_{k})_{k\in[K]}$ is adapted with respect to $(\mathcal{F}_{k})_{k\in[K]}$ . Observe that

[TABLE]

Using the assumption that $X\in L^{2}_{\mathbb{P}}\implies\Psi^{\tau}(X)\in L^{2}_{\mathbb{P}}$ , (6), Assumption 3.3, and the fact that $U_{0}=u_{0}$ is fixed, it follows that $U_{j}$ and $\Psi^{\tau}(U_{j})$ belong to $L^{2}_{\mathbb{P}}$ ; thus $M_{k}$ belongs to $L^{1}_{\mathbb{P}}$ for every $k\in[K]$ . We now use the assumption that $\mathbb{E}[\xi_{j}(\tau)]=0$ for every $j\in[K]$ , and that the $(\xi_{k}(\tau))_{k\in[K]}$ are mutually independent, in order to establish the martingale property:

[TABLE]

and the right-hand side vanishes since $U_{k}$ is measurable with respect to $\mathcal{F}_{k-1}$ as noted earlier. Since $(M_{k})_{k\in[K]}$ is a martingale, we may apply the Burkholder–Davis–Gundy inequality [Peškir, 1996, Equation (2.2)]. Letting $[Y]_{\ell}$ denote the quadratic variation up to time $\ell$ of a process $Y_{k}$ , we have

[TABLE]

where we define $b\coloneqq\sqrt{\sum_{j=1}^{\ell-1}\|\xi_{j}(\tau)-\xi_{j-1}(\tau)\|^{2}}$ and $a\coloneqq\sqrt{\max_{j\leq\ell}\|\Phi^{\tau}(u_{j})-\Psi^{\tau}(U_{j})\|^{2}}$ . Using (9) with the same $a$ and $b$ , $r=r^{\ast}=2$ , and $\delta=[6(1+\tau)(1+C_{\Phi}\tau)^{2}]^{-1}$ , and using (36), it follows that

[TABLE]

where we applied (10) with $\delta=1$ , $r=r^{\ast}=2$ , $a=\xi_{j}(\tau)$ and $b=\xi_{j-1}(\tau)$ to obtain the last inequality. Thus by Assumption 3.3 and by using $\ell-1\leq K=T/\tau$ ,

[TABLE]

Combining the preceding estimates, we obtain

[TABLE]

and by rearranging terms and using that $\tau<\tau^{\ast}\leq 1$ , we obtain

[TABLE]

By the discrete Grönwall inequality (Theorem 2.1) with $x_{k}\coloneqq\mathbb{E}[\max_{\ell\leq k}\|e_{\ell}\|^{2}]$ and constant $\alpha_{k}$ and $\beta_{j}=2\tau C_{1}$ , and by using that $K=T/\tau$ , we obtain

[TABLE]

This establishes (16).∎
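The final step of the proof relies on the fact that the Grönwall factor stays bounded as $\tau\to 0$. The Python sketch below illustrates this with a common form of the discrete Grönwall inequality, which may differ in detail from Theorem 2.1: if $x_{k}\leq\alpha+\beta\sum_{j<k}x_{j}$ with $\alpha,\beta\geq 0$, then $x_{k}\leq\alpha\exp(k\beta)$. The values of $T$, $C_{1}$, and $\alpha$, and the function name, are chosen here for illustration.

```python
import math


def extremal_sequence(alpha, beta, K):
    """x_k = alpha + beta * sum_{j<k} x_j, the equality case of the hypothesis."""
    xs = []
    running_sum = 0.0
    for _ in range(K + 1):
        x = alpha + beta * running_sum
        xs.append(x)
        running_sum += x
    return xs


T, C1, alpha = 1.0, 3.0, 1.0
for tau in (0.1, 0.01, 0.001):
    K = round(T / tau)
    xs = extremal_sequence(alpha, 2 * tau * C1, K)
    # Largest term of the sequence versus the tau-independent Gronwall bound.
    print(tau, xs[-1], alpha * math.exp(2 * T * C1))
```

With $\beta=2\tau C_{1}$ and $k\leq K=T/\tau$, the bound is $\alpha\exp(2TC_{1})$ for every $\tau$, which is why the constant in the resulting estimate does not depend on $\tau$.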

Proof (Proof of Theorem 3.5)

Let $0\leq k\leq K-1$ and $n\in\mathbb{N}$ . By applying the triangle inequality, (11), Assumptions 3.1 and 3.2, and by using that $1+\tau 2^{n-1}\leq 1+2^{n-1}$ (since $\tau\leq 1$ ),

[TABLE]

Observe that, since $2^{n-1}$ and $C_{\Phi}$ are nonnegative, and since $0<\tau<\tau^{\ast}$ ,

[TABLE]

Note that $C_{\Phi}(n,\tau)\leq C_{\Phi}(n,\tau^{\ast})$ .

Since $n\geq 1$ implies that $1+(2/\tau)^{n-1}\leq 2^{n}\tau^{1-n}$ , we have

[TABLE]

Decomposing $\|e_{k+1}\|^{n}-\|e_{0}\|^{n}$ as a telescoping sum, using that $e_{0}=u_{0}-U_{0}=0$ , using the nonnegativity of the summands on the right-hand side of the last inequality, and using the relation $\|e_{\ell}\|^{n}\leq\max_{j\leq\ell}\|e_{j}\|^{n}$ , we obtain

[TABLE]

Using that $K=T/\tau$ and Grönwall’s inequality (Theorem 2.1),

[TABLE]

Taking expectations, using (14) with $w=n$ and $v=1$ , and using that $K=T/\tau$ yields

[TABLE]

Rearranging the above produces the desired inequality.∎

Proof (Proof of Corollary 1)

Let $m\in\mathbb{N}$ be arbitrary. Using (40), and applying (12) twice, we obtain

[TABLE]

Taking expectations and using (14) with $w=n$ and $v=m$ , we obtain

[TABLE]

The conclusion follows by the series expansion of the exponential and the dominated convergence theorem.∎

Proof (Proof of Theorem 4.2)

Recall that the solution map $\Phi^{\tau}$ of the initial value problem (1) satisfies

[TABLE]

For any $\tau>0$ and $a,b\in\mathbb{R}^{d}$ , Assumption 4.1 and the integral Grönwall–Bellman inequality yield

[TABLE]

Given the boundedness hypothesis on the $(\xi_{k}(\tau))_{k\in[K]}$ , we may define a finite constant $C>0$ that does not depend on $\tau$ or $k$ , such that

[TABLE]

The rest of the proof follows in a similar manner to that of Theorem 3.5. ∎

Proof (Proof of Lemma 2)

In what follows, we shall omit the dependence of all random variables on $\omega$ , with the understanding that $\omega$ is arbitrary. Let $n\in[K]$ , where $K=T/\tau\in\mathbb{N}$ . From (6) we have, by (9),

[TABLE]

Taking the inner product of (22) with $\Psi^{\tau}(U_{n})$ , we obtain by (23)

[TABLE]

Thus,

[TABLE]

where we used the inequality $1-2|\beta|\tau\leq 1+2\beta\tau$ for the second inequality. Then (41) and (A) yield

[TABLE]

Let $c_{1}(\tau)\coloneqq\tfrac{1+2|\beta|}{1-2|\beta|\tau}$ and $c_{2}(\tau)\coloneqq\tfrac{2\alpha}{1-2|\beta|\tau}$ . By (43), it follows that

[TABLE]

Using the telescoping sum

[TABLE]

it follows that

[TABLE]

Since $n\leq K\coloneqq T/\tau$ , and since the right-hand side of the inequality above is nonnegative,

[TABLE]

Applying the Grönwall inequality (Theorem 2.1), yields, for all $n\in[K]$ ,

[TABLE]

where we define, for $\tau^{\prime}$ as in Assumption 4.4, the scalar

[TABLE]

This yields (24) for $n=1$ . By applying (12), we obtain (24) for arbitrary $n\in\mathbb{N}$ .∎

Proof (Proof of Proposition 2)

Recall that in Assumption 4.1, we assume $f\in C^{1}(\mathbb{R}^{d};\mathbb{R}^{d})$ . Therefore, Taylor’s theorem applied to the function $t\mapsto\Phi^{t}(a)$ yields

[TABLE]

where $R^{\tau}(a)\to 0$ as $\tau\to 0$ . Then, by (20a), (22), and (12),

[TABLE]

By (20a), (22), (21), and (12) with the fact that $C_{\Phi}\geq 1$ in Assumption 4.1, we obtain

[TABLE]

From (A) and (12), it holds that for any $n$ and $r$ such that $nr\geq 1$ ,

[TABLE]

for $\tau^{\prime}$ in Assumption 4.4. Applying the second inequality for the appropriate values of $r$ and computing exponents yields that, for the polynomials $\pi_{1}$ , $\pi_{2}$ and $\pi$ defined on $\mathbb{R}$ by

[TABLE]

and $\pi(x)\coloneqq\pi_{1}(x)\pi_{2}(x)$ , it follows from Lemma 2 that

[TABLE]

Taking expectations, applying Proposition 1, and using that $\tau<\tau^{\prime}$ to bound the right-hand side of the inequality (25) in Proposition 1, we may define some $C_{3}=C_{3}(\alpha,\beta,C_{\Phi},\tau^{\prime},n)$ that does not depend on $k$ or $\tau$ , such that

[TABLE]

By Proposition 1, the finiteness of $C_{3}$ follows from the hypothesis $R\geq 2n(2s+1)$ and the observation that $\pi_{1}(x^{2})$ and $\pi_{2}(x^{2})$ have degree $ns$ and $n(s+1)$ in $x^{2}$ , respectively.

Now it remains to show that $\|R^{\tau}(U_{k})\|^{2n}\in L^{1}_{\mathbb{P}}$ . From (20a), (20b), and (45), we obtain

[TABLE]

By the triangle inequality and (48),

[TABLE]

Then by applying (12) and Proposition 1 with the hypothesis that $R\geq 2n(2s+1)\geq 2n(s+1)$ , and using the bound $\tau<\tau^{\prime}$ , it follows that we may define a positive scalar $C_{4}$ that does not depend on $k$ or $\tau$ , such that

[TABLE]

Therefore, with $C_{3}$ and $C_{4}$ as in (47) and (49) above, (46) yields

[TABLE]

as desired.∎

The proof below makes clear that we make absolutely no effort to find optimal constants.

Proof (Proof of Theorem 4.5)

Let $n\in\mathbb{N}$ . By (11)

[TABLE]

Since $\tau\leq 1$ and $n\geq 1$ , it holds that $1+\tau^{1-2n}2^{2n-1}\leq\tau^{1-2n}(1+2^{2n-1})$ and $1+\tau 2^{2n-1}\leq 1+2^{2n-1}$ . Using these inequalities, (11), and (20b) in the preceding inequality, we obtain

[TABLE]

Using (11) again, we obtain

[TABLE]

so that by defining

[TABLE]

we have

[TABLE]

and, therefore,

[TABLE]

By nonnegativity of $C_{5}$ , it follows that $[(1+\tau C_{5})^{2n+1}-1]\tau^{-1}$ is a polynomial of degree $2n$ in $\tau$ with positive coefficients. In particular, if we recall the definition of $C_{5}$ and define $C_{6}$ by

[TABLE]

then $C_{6}$ does not depend on $\tau$ , $[(1+\tau C_{5})^{2n+1}-1]\tau^{-1}\leq C_{6}$ for all $0<\tau<\tau^{\prime}$ , and

[TABLE]

By the telescoping sum associated to $\|e_{k+1}\|^{2n}-\|e_{k}\|^{2n}$ , the fact that $e_{0}=0$ , the bound $1+2^{2n-1}\leq 2^{2n}$ , the nonnegativity of the terms on the right-hand side of the inequality above, and the bound $\|e_{j}\|\leq\max_{\ell\leq j}\|e_{\ell}\|$ , we obtain

[TABLE]

By Lemma 2,

[TABLE]

which implies that

[TABLE]

Define

[TABLE]

Since $C_{\Phi},C_{1}\geq 1$, it follows that $2^{4n}\leq C_{7}$, and by Grönwall’s inequality (Theorem 2.1) we obtain

[TABLE]

Taking expectations completes the proof, provided that we can ensure each sum is of the right order in $\tau$ . By Proposition 2 with the hypothesis that $R\geq 2n(2s+1)$ , and by Assumption 3.3,

[TABLE]

Thus we need $p-1/2\geq 1$ to hold. Next, using the bound $\|e_{\ell}\|\leq\max_{j\in[K]}\|e_{j}\|$ , Young’s inequality (9) with $a=(\sum^{K}_{i=1}\|\xi_{i}(\tau)\|^{2})^{ns}$ , $b=\|e_{\ell}\|^{2n}$ , and some $\delta>0$ and conjugate exponent pair $(r,r^{\ast})\in(1,\infty)^{2}$ to be determined later, we obtain with (14) that

[TABLE]

Since $R\geq 2n(2s+1)$ , the maximal value of $r$ compatible with integrability of $(\sum^{K}_{i=1}\|\xi_{i}(\tau)\|^{2})^{nrs}$ is $r=2+s^{-1}$ . Since we are not interested in optimal estimates, we shall set $r=r^{\ast}=2$ and $\delta=\tau^{-n(2+s)}$ . We thus obtain

[TABLE]

For the exponent of $\tau$ of the first term in the parentheses, we want to ensure that $-n(2+s)+2p(2ns)-ns\geq 2n$ , or equivalently that $p\geq\tfrac{1}{s}+\tfrac{1}{2}$ . Comparing this condition with the condition $p-\tfrac{1}{2}\geq 1$ that arose from (53), and recalling that $s\geq 1$ , we observe that if $p\geq\tfrac{3}{2}$ , then the preceding estimates yield

[TABLE]

It remains to bound $\mathbb{E}[\max_{\ell\in[K]}\|e_{\ell}\|^{4n}]$ by a constant that does not depend on $\tau$ . By (12), Proposition 1, and the assumption that $\tau<\tau^{\prime}$ for $\tau^{\prime}$ in Assumption 4.4, we obtain

[TABLE]

where $C_{8}=C_{8}(C_{2},C_{\xi,R},n,p,\tau^{\prime},T,(u_{t})_{0\leq t\leq T})>0$ does not depend on $\tau$ . Note that in applying Proposition 1, we have used that $s\geq 1$ for the exponent $s$ in Assumption 4.1, since this implies that $2n(2s+1)\geq 4n$ . ∎

Proof (Proof of Theorem 5.2)

Let $k\in[K]$ and $t_{k}<t\leq t_{k+1}$ . Then

[TABLE]

and given that Assumption 3.1 implies that $\Phi^{t^{\prime}}$ is Lipschitz on $\mathbb{R}^{d}$ for every $t^{\prime}\geq 0$ ,

[TABLE]

by applying (12). Since $t-t_{k}\leq\tau$ , it follows from the inequality above that

[TABLE]

By Assumption 5.1,

[TABLE]

Note that Assumption 5.1 is stronger than Assumption 3.3. Therefore we may apply Theorem 3.5 to obtain (32).∎

Proof (Proof of Lemma 3)

If $r=0$ , then the desired statement follows immediately. Therefore, let $p,r\geq 1$ . Let $\xi_{0}$ be the integrated $\mathbb{P}$ -Brownian motion process scaled by $\tau^{p-1}$ , so that

[TABLE]

where we applied Jensen’s inequality to the uniform probability measure on $[0,t]$ . It follows that

[TABLE]

Above, we used the Fubini–Tonelli theorem to interchange expectation and integration with respect to $s$, and the fact that $\mathbb{E}\bigl[\sup_{t\leq\tau}\|B_{t}\|^{r}\bigr]$ is constant with respect to the variable of integration $s$. For $r=1$, the Burkholder–Davis–Gundy martingale inequality [Peškir, 1996, Equation (2.2)] yields

[TABLE]

with $(4-r)/(2-r)=3$ for $r=1$ . For $r>1$ , Doob’s inequality [Peškir, 1996, Equation (2.1)] yields

[TABLE]

Since $r\mapsto[r/(r-1)]^{r}$ is continuously differentiable and monotonically decreasing on $2<r<\infty$ , the desired conclusion follows.∎

Bibliography


1. Briol et al. [2015] F.-X. Briol, C. Oates, M. Girolami, and M. A. Osborne. Frank–Wolfe Bayesian quadrature: Probabilistic integration with theoretical guarantees. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 1162–1170. Curran Associates, Inc., 2015.
2. Capistrán et al. [2016] M. A. Capistrán, J. A. Christen, and S. Donnet. Bayesian analysis of ODEs: solver optimal accuracy and Bayes factors. SIAM/ASA J. Uncertain. Quantif., 4(1):829–849, 2016. doi: 10.1137/140976777.
3. Chkrebtii et al. [2016] O. A. Chkrebtii, D. A. Campbell, B. Calderhead, and M. A. Girolami. Bayesian solution uncertainty quantification for differential equations. Bayesian Anal., 11(4):1239–1267, 2016. doi: 10.1214/16-BA1017.
4. Christen [2017] J. A. Christen. Posterior distribution existence and error control in Banach spaces, 2017. arXiv:1712.03299.
5. Cockayne et al. [2017] J. Cockayne, C. Oates, T. J. Sullivan, and M. Girolami. Probabilistic numerical methods for PDE-constrained Bayesian inverse problems. In G. Verdoolaege, editor, Proceedings of the 36th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, volume 1853 of AIP Conference Proceedings, pages 060001–1–060001–8, 2017. doi: 10.1063/1.4985359.
6. Cockayne et al. [SIAM Rev., to appear] J. Cockayne, C. Oates, T. J. Sullivan, and M. Girolami. Bayesian probabilistic numerical methods, SIAM Rev., to appear. arXiv:1702.03673.
7. Conrad et al. [2017] P. R. Conrad, M. Girolami, S. Särkkä, A. M. Stuart, and K. C. Zygalakis. Statistical analysis of differential equations: introducing probability measures on numerical solutions. Stat. Comput., 27(4):1065–1082, 2017. ISSN 0960-3174. doi: 10.1007/s11222-016-9671-0.
8. Diaconis [1988] P. Diaconis. Bayesian numerical analysis. In Statistical Decision Theory and Related Topics, IV, Vol. 1 (West Lafayette, Ind., 1986), pages 163–175. Springer, New York, 1988.
