Strong convergence rates of probabilistic integrators for ordinary differential equations
H. C. Lie, A. M. Stuart, T. J. Sullivan

TL;DR
This paper analyzes the convergence rates of probabilistic integrators for ordinary differential equations, showing they match deterministic methods under certain conditions, thus enabling reliable uncertainty quantification.
Contribution
It extends convergence analysis to probabilistic integrators with relaxed assumptions, demonstrating their mean-square convergence rates match deterministic methods.
Findings
Probabilistic integrators achieve the same convergence rates as deterministic ones.
Convergence holds for state-dependent, non-Gaussian, and non-centred random perturbations.
Results apply to high-order integrators for Lipschitz flows and Euler methods for dissipative fields.
Abstract
Probabilistic integration of a continuous dynamical system is a way of systematically introducing model error, at scales no larger than errors introduced by standard numerical discretisation, in order to enable thorough exploration of possible responses of the system to inputs. It is thus a potentially useful approach in a number of applications such as forward uncertainty quantification, inverse problems, and data assimilation. We extend the convergence analysis of probabilistic integrators for deterministic ordinary differential equations, as proposed by Conrad et al.\ (\textit{Stat.\ Comput.}, 2017), to establish mean-square convergence in the uniform norm on discrete- or continuous-time solutions under relaxed regularity assumptions on the driving vector fields and their induced flows. Specifically, we show that randomised high-order integrators for globally Lipschitz flows and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
∎
11institutetext: Han Cheng Lie 22institutetext: Institute of Mathematics, Universität Potsdam, Campus Golm, Haus 9, Karl-Liebknecht Str. 24-25, 14476 Potsdam OT Golm, Germany
https://orcid.org/0000-0002-6905-9903
22email: [email protected] 33institutetext: A. M. Stuart44institutetext: Department of Computing and Mathematical Sciences, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125, United States of America
44email: [email protected] 55institutetext: T. J. Sullivan66institutetext: Freie Universität Berlin, and Zuse Institute Berlin, Takustrasse 7, 14195 Berlin, Germany
66email: [email protected]
Strong convergence rates of probabilistic integrators for ordinary differential equations
††thanks: HCL and TJS are supported by the Freie Universität Berlin within the Excellence Initiative of the German Research Foundation. HCL is supported by the Universität Potsdam. AMS is grateful to DARPA, EPSRC and ONR for funding. This material was based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of these funding agencies and institutions.
Han Cheng Lie
A. M. Stuart
T. J. Sullivan
(Received: September 28, 2018 / Accepted: December 28, 2018 / Handling Editor: C. J. Oates This is a post-peer-review, pre-copyedit version of an article published in Statistics and Computing (2019). The final authenticated version is available online at: https://doi.org/10.1007/s11222-019-09898-6.)
Abstract
Probabilistic integration of a continuous dynamical system is a way of systematically introducing discretisation error, at scales no larger than errors introduced by standard numerical discretisation, in order to enable thorough exploration of possible responses of the system to inputs. It is thus a potentially useful approach in a number of applications such as forward uncertainty quantification, inverse problems, and data assimilation. We extend the convergence analysis of probabilistic integrators for deterministic ordinary differential equations, as proposed by Conrad et al. (Stat. Comput., 2017), to establish mean-square convergence in the uniform norm on discrete- or continuous-time solutions under relaxed regularity assumptions on the driving vector fields and their induced flows. Specifically, we show that randomised high-order integrators for globally Lipschitz flows and randomised Euler integrators for dissipative vector fields with polynomially-bounded local Lipschitz constants all have the same mean-square convergence rate as their deterministic counterparts, provided that the variance of the integration noise is not of higher order than the corresponding deterministic integrator. These and similar results are proven for probabilistic integrators where the random perturbations may be state-dependent, non-Gaussian, or non-centred random variables.
Keywords:
probabilistic numerical methods ordinary differential equations convergence rates uncertainty quantification
MSC:
65L20 65C99 37H10 68W20
††journal: Statistics and Computing
1 Introduction
This article concerns the analysis of probabilistic numerical integrators for deterministic initial value problems of the form
[TABLE]
where . Let denote the flow induced by (1), so that
[TABLE]
for all . Given an integration time step such that and the corresponding time mesh
[TABLE]
a deterministic one-step numerical method for the solution of (1) is a numerical flow map that generates approximations by the recursion ; note that . A key property of the numerical method is its global order of convergence, i.e. the largest such that, for some constant , independent of ,
[TABLE]
As a modelling choice, epistemic stochasticity can be introduced into the numerical solution of (1) on the basis that, while the exact solution satisfies
[TABLE]
for all , the only information available about the values of the solution off the time mesh comes from the numerical solution on the mesh, and so the integrand is not exactly accessible. This uncertainty is relevant in the setting where, given a large-scale mathematical model, it may be more statistically informative to spend computational resources on solving a differential equation-based model many times on a coarser grid than on solving the same model a few times on a finer grid. This is often the case in forward uncertainty quantification (Smith, 2014; Sullivan, 2015), inverse problems (Kaipio and Somersalo, 2005; Stuart, 2010), and data assimilation (Law et al., 2015; Reich and Cotter, 2015); the area of multi-level Monte Carlo methods makes particular use of this kind of cost-accuracy tradeoff (Giles, 2015). Furthermore, in many such settings, the quantity of interest is often not the solution of a differential equation-based model, but a functional thereof. In all cases, estimates of the off-mesh uncertainty due to the numerical method can and should be fed forward to estimate the uncertainty in the quantity of interest.
This article is motivated by the work of Conrad et al. (2017), in which one seeks to model the off-mesh uncertainty by considering probabilistic solvers. For the same mesh given in (3), the probabilistic solver of Conrad et al. (2017) involves producing a sequence of random variables according to
[TABLE]
where is the map associated to the deterministic numerical method, and each is an i.i.d. copy of a random variable , where is a stochastic process over the time interval that models the off-mesh behaviour of the unknown function in (5). We refer the reader to Conrad et al. (2017, Figure 2) for a pictorial representation of (6). The process is introduced so that one can probe the uncertainty induced by the mesh and the underlying solver, and thus explore possible responses of the system to inputs. Comparing (5) and (6), it follows that the random variable is a statistical model for the approximation error .
We emphasise that the additive, state-independent noise model appearing in (6) should be interpreted as providing a prior on the local truncation error (Hairer et al., 2009). A frequent criticism levelled at the field of probabilistic numerical integration is that the statistical properties of the noise that have been imposed in existing published works do not reflect known prior information about local truncation error. Here we address this issue by considerably weakening the assumptions made on the . However we anticipate future work in this direction, especially when specific structure on the vector field is used to further inform the prior. Note also that, in the presence of large amounts of data, we expect posterior contraction and forgetting of the prior; see, e.g., Knapik et al. (2011). Posterior contraction for (6) was demonstrated numerically on a number of examples by Conrad et al. (2017).
In the spirit of (4), the main convergence result (Conrad et al., 2017, Theorem 2.2) yields that, if the vector field in (1) is globally Lipschitz, if the deterministic numerical method has uniform local truncation of order , and if is a centred Gaussian process such that the second moment of decays as for some , then
[TABLE]
This shows that the convergence rate of the probabilistic solver (6) is determined by the convergence of the ‘worst-case error’ of the deterministic method , and the convergence of the ‘statistical error’ , as described by the parameters and respectively. Choosing with introduces the maximum amount of solution uncertainty consistent with preserving the order of accuracy of the original deterministic integrator.
It is important to stress that, despite the apparent similarities between (6) and Euler–Maruyama schemes for stochastic differential equations (SDEs) driven by Brownian motion, the analysis of the latter does not directly apply to probabilistic solvers, even though we will borrow some techniques from that field. This is because the variance of for probabilistic solvers of the form (6) is assumed to decay to zero strictly faster than , whereas, for SDEs driven by Brownian motion, the variance is proportional to . A key aspect of this work is to determine how to scale the noise so that the rate of convergence of the underlying deterministic numerical integrator is not affected, yet uncertainty arising from numerical approximation is accounted for.
1.1 Contribution and outline of the paper
The purpose of this paper is to make significant extensions of the convergence analysis of Conrad et al. (2017) for (1). We accomplish this by obtaining stronger error estimates (and hence stronger convergence results) under assumptions on both the underlying differential equation and on the noise model for probabilistic numerical integration that are weaker than their counterparts in Conrad et al. (2017). The convergence results of this paper are of the form
[TABLE]
where , is the order of the numerical method , is an exponent of decay in the moments of the random variables , and is a penalty term in the convergence rate that depends solely on the random variables . Note that, when and , the convergence rate of on the right-hand side of (8) agrees with that of (7) shown by Conrad et al. (2017), so that the right-hand sides of (7) and (8) differ only in the constant prefactor . However, because the time supremum is inside the expectation, (8) implies (7). Furthermore, by Markov’s inequality, (8) yields an estimate of the frequentist coverage of the true solution by the randomised solutions :
[TABLE]
such estimates are useful in the context of forward uncertainty quantification and inverse problems (Lie et al., 2018).
We emphasize that, in addition to strengthening the form of the convergence results so that the supremum is inside the expectation, we also prove the results in this paper under weaker assumptions on the vector field , and under weaker assumptions on the noise , than those employed by Conrad et al. (2017). Specifically we do not assume that and its derivatives are globally bounded, and we do not assume that the random variables are Gaussian; furthermore in results generalizing (8) we relax the assumption that the random variables are centred, paving the way for future analyses which incorporate specific known structure and bias in the truncation error.
Error estimates like (8) show that the randomised numerical solution has convergence properties that are asymptotically no worse than the deterministic numerical solution. This can be interpreted as saying that the trajectories obtained from the randomised numerical integrator are all equally valid approximations to the solution of the original system, modulo the uncertainty induced by solving in discrete time. This can be useful for many purposes, for example in studying limits on predictability in chaotic systems, as shown for the Lorenz-63 system by Conrad et al. (2017).
After introducing some notation and auxiliary results in Section 2, the rest of the paper is organised as follows. In Section 3, Theorem 3.4 yields (8) for numerical methods of arbitrary order, for vector fields whose induced flow maps are globally Lipschitz — including one-sided Lipschitz vector fields — and for collections of random variables that are independent and centred, but not necessarily Gaussian. Conrad et al. (2017) assumed the vector field to be globally Lipschitz, and the random variables were assumed to be i.i.d. centred Gaussian random variables. In Theorem 3.5, we prove a result similar to (8) in which we relax the assumption that the are independent and that they are centred; the price we pay for these weaker constraints on the noise is a stronger decay assumption, with respect to the time-step, on the second moments of the . We use this assumption in order to introduce the maximal noise that is consistent with retaining the rate of convergence of the underlying deterministic numerical integrator.
In Section 4, we further weaken the conditions on the vector field , by considering locally Lipschitz vector fields that satisfy a polynomial growth condition. In Theorem 4.2, we show that, under the assumption that the are almost surely bounded, we can again obtain (8). In Theorem 4.5, we remove the almost-sure boundedness condition, but add the assumption that the vector field satisfies a generalised dissipativity condition.
In Section 5 we discuss a continuous-time analogue of (6), and show how convergence results of the form (8) can be obtained. We also show that there exists a nonempty set of random variables (or more generally, stochastic processes) that satisfy the regularity assumptions on the random variables used throughout this paper.
Proofs of the results may be found in Appendix A.
1.2 Review of probabilistic numerical methods
Continuous relationships such as ODEs and PDEs are commonplace as forward models in uncertainty quantification problems, or as Bayesian likelihoods in modern statistical inverse problems (Kaipio and Somersalo, 2005; Stuart, 2010), and in particular in data assimilation algorithms with critical everyday applications such as numerical weather prediction (Law et al., 2015; Reich and Cotter, 2015). The use of a discretised solver for such forward models is usually unavoidable in practice, but introduces an additional source of uncertainty both into forward propagation of uncertainty and into subsequent inferences. While the solution to the ODE/PDE may not be random in the frequentist sense, it is nonetheless only imperfectly known through the discretised numerical solution. Probability in the subjective or Bayesian sense is one appropriate means of representing this epistemic uncertainty, particularly if the ODE/PDE solution forms part of the forward model in a Bayesian inverse problem. Failure to properly account for discretisation errors and uncertainties can lead to biased, inconsistent, and over-confident inferences (Conrad et al., 2017).
Probabilistic numerical solutions of problems such as the solution of ODEs have a long history. Modern foundations for this field were laid by the work of Diaconis (1988), O’Hagan (1992), and Skilling (1992) under the term of “Bayesian numerical analysis”. More recently, such ideas have received renewed attention under the term “probabilistic numerics” (Hennig et al., 2015; Cockayne et al., SIAM Rev., to appear): the discussion of probabilistic numerical methods for ordinary differential equations given by Schober et al. (2014); Conrad et al. (2017); Chkrebtii et al. (2016), and Teymur et al. (2018) is particularly relevant here. Also of interest in the field of probabilistic numerics, but not directly relevant to the present work, are probabilistic numerical methods for linear algebra (Hennig, 2015), optimisation (Gonzalez et al., 2016), partial differential equations (Cockayne et al., 2017; Owhadi, 2015, 2017; Wang et al., 2018), and quadrature (Briol et al., 2015). In particular, Cockayne et al. (SIAM Rev., to appear) sets out some axiomatic foundations for probabilistic numerical methods broadly conceived, and in particular what it means for a probabilistic numerical method to be “Bayesian”.
Randomised solutions of ODEs have also been studied in the context of stochastic or rough differential equations. In the case of non-autonomous ODEs driven by Carathéodory vector fields — i.e. vector fields that are locally integrable in time and continuous in the state space — it has been observed that randomised Euler and Runge–Kutta methods outperform their deterministic counterparts: see e.g. Stengle (1990); Jentzen and Neuenkirch (2009), and Kruse and Wu (2017) and the references therein.
We note that analysing the convergence properties of numerical solutions to (1) in terms of the approximation error for the solution, as in (7) and (8), is very much in the spirit of classical numerical analysis. For uncertainty quantification of the discretised solution of (1) as a stand-alone forward problem, this viewpoint is often sufficient. However, for applications to inverse problems and data assimilation, in which the numerical solution of the (1) is used to (approximately) evaluate the data misfit or likelhood, an alternative paradigm is to directly examine the impact of discretisation upon the quality of later inferences using e.g. Bayes factors (Capistrán et al., 2016; Christen, 2017). There is also the well-established literature of information-based complexity and average-case analysis, with its greater emphasis on algorithmic aspects such as computational cost and optimal accuracy for given classes of information (Novak, 1988; Ritter, 2000; Traub and Woźniakowsi, 1980; Traub et al., 1983).
2 Setup and notation
Let be a probability space sufficiently rich to serve as a common domain of definition for all the random variables and processes under consideration, and let denote expectation with respect to . The space of th-power integrable random variables over will be denoted . The scalars , , etc. denote non-negative constants whose value may change from occurence to occurence, but are independent of the time step . denotes the best Lipschitz constant of :
[TABLE]
for all . We let denote the natural numbers beginning with , and . We shall sometimes abuse notation and write or , and we shall write for the value of the exact solution to (1) at time . We denote the minimum of a pair of real numbers and by .
It will be assumed throughout that is a fixed, deterministic time, and that in (1) is sufficiently smooth such that (1) has a unique solution for every initial condition . The flow map associated to (1) is defined in (2), and the output of a one-step deterministic numerical integration method for a given and time step will be given by . This setting encompasses many of the time-stepping methods in common use, such as Runge–Kutta methods of all orders.
The analysis of this paper will make repeated use of several useful inequalities, which are collected here for reference. First, recall Young’s inequality: for any and any pair of Hölder conjugate exponents ,
[TABLE]
Combining that inequality for with the Cauchy–Schwarz inequality in yields
[TABLE]
which will often be used either with or .
The following discrete-time version of Grönwall’s inequality (Holte, 2009) will also be useful:
Theorem 2.1
Let , , and be non-negative sequences. If, for all ,
[TABLE]
then for all .
For completeness, we state the following lemma.
Lemma 1
Let , , and . Then
[TABLE]
We shall also use the following inequality, which is valid for arbitrary and : for all ,
[TABLE]
This follows from
[TABLE]
where we used Jensen’s inequality in the second inequality.
3 High-order integration of Lipschitz flows
The purpose of this section is to establish, given the initial value problem (1), the strong convergence result (8) for probabilistic solvers of the form (6), under the following assumptions.
Assumption 3.1
The vector field admits and , such that for , the flow map defined by (2) is globally Lipschitz with Lipschitz constant .
As is well known, Assumption 3.1 holds if the generating vector field is itself globally Lipschitz. However, Assumption 3.1 holds if, for instance, merely satisfies the one-sided Lipschitz inequality
[TABLE]
for some constant ; in this case, a calculation of for trajectories and starting at initial conditions and an application of the differential version of Grönwall’s inequality shows that , so that for small .
Assumption 3.2
The numerical method has uniform local truncation error of order : for some constant that does not depend on ,
[TABLE]
Assumption 3.2 holds, in particular, for single- and multi-step methods derived from a -times continuously differentiable vector field with bounded th derivatives (Hairer et al., 2009, Section III.2). Imposing global bounds on the derivatives of , and therefore on those of , forces us to consider a smaller class of flow maps than the class of flow maps that satisfy Assumption 3.1. We may alleviate this problem by weakening Assumption 3.2 to a bound of the form
[TABLE]
with the consequence that the dependence of on must be specified; this dependence will vary according to the chosen numerical method . Moreover, whenever we apply (13) in place of Assumption 3.2 with a random variable in place of a deterministic — as we do below, e.g. in deriving (36) — we will need to ensure that is finite, and of the correct order in if necessary. In Section 4, we consider the implicit Euler method for a class of locally Lipschitz flow maps , obtain an expression for , and with this expression obtain a bound of the form
[TABLE]
where denotes the output of the randomised numerical integrator according to (6), , and does not depend on or on ; see Proposition 2. Note that there is no supremum inside the expectation in the inequality above. However, in this section, we shall apply Assumption 3.2 instead of (13), in order to avoid lengthy analyses that are specific to the choice of numerical method. We make no assumptions about how the integrator has been derived and treat it as a ‘black box’ satisfying Assumption 3.2.
Assumption 3.3
The random variables admit parameters , , and , independent of and , such that for all and all ,
[TABLE]
Note that we do not assume that the are identically distributed nor that they are centred. However we will impose these two additional assumptions in Theorem 3.4. The parameter determines the decay rate of the th moments of the , for , while determines the highest order moment for which the same decay behaviour holds.
Since Assumption 3.3 does not assume that the are identically distributed or mutually independent, it can hold for the following variant of (6):
[TABLE]
In this setting, we interpret Assumption 3.3 as the condition that the dependence of the moments of on the state , can be uniformly controlled by the constant . We leave a more extensive investigation of state-dependent noise models for future work.
It follows from (12) and Assumption 3.3 that, for ,
[TABLE]
This is because
[TABLE]
where we used (12) and Assumption 3.3 for the first and second inequality respectively.
As noted in the introduction, the focus of this paper is on the convergence rate of the error and not on, say, the covariance operator of , though that information is also important in applications. Note that if in Assumption 3.3 does not belong to , then does not admit a covariance operator. Accordingly, Assumption 3.3 and similar assumptions later in the paper are only upper bounds, and we do not actually work with the covariance operator of . The precise construction of stochastic models for discretisation and truncation error is an interesting topic in its own right at the interface of numerical analysis and probability, upon which this paper only starts to touch; we anticipate that there will be further research concerning this question.
Given , it follows from (5) and (6) that
[TABLE]
We shall use the decomposition (15) throughout this article.
The next result is stronger than Conrad et al. (2017, Theorem 2.2), as the discrete time supremum is inside the expectation, and as it does not require the vector field to be globally Lipschitz nor to be Gaussian:
Theorem 3.4
Suppose Assumptions 3.1 and 3.2 hold, and fix . Furthermore, if it holds that , and if the have zero mean, are mutually independent, and satisfy Assumption 3.3 for and , then there exists that does not depend on such that
[TABLE]
In contrast to Theorem 3.4, which required that the be independent and centred in order to construct a martingale, we make no independence or centredness assumptions on the for the rest of this article. The following result should be compared to Theorem 3.4 by considering the case . Then for the randomised method to have the same order as the deterministic method on which it is based, we need that . In other words, if we remove the assumptions on the of independence and centredness, then we require that the second moments of the decay to zero with time-step at a faster rate than in Theorem 3.4, since the lower bound on implied by Theorem 3.5 is larger than the lower bound on implied by Theorem 3.4.
Theorem 3.5
Let . Suppose that Assumptions 3.1, 3.2, and 3.3 hold with , , , and , and that . Then, there exists a that does not depend on such that for ,
[TABLE]
where
[TABLE]
and is defined according to (39).
We shall show that if we strengthen Assumption 3.3 by allowing for arbitrarily large , then the moment generating function of is finite on .
Corollary 1
Fix . Suppose that Assumptions 3.1 and 3.2 hold, and that Assumption 3.3 holds with and . Then, for all and all ,
[TABLE]
Hence, by Markov’s inequality, the distribution of
concentrates exponentially about its
mean.
We close this section by noting that, while we have made no attempt to find the optimal constants in Theorem 3.4 and Theorem 3.5, the convergence orders in these results cannot be improved at the present level of generality. This is because the convergence order of the randomised solution cannot exceed that of the underlying deterministic solver, unless the random variables used to model the error at each time step are chosen to achieve this effect. We leave the construction of such randomised solvers for future work.
4 Integration for locally Lipschitz vector fields
This section considers the numerical integration of vector fields that satisfy the following polynomial growth condition.
Assumption 4.1
The vector field is continuously differentiable, and both and the associated map defined by (2) admit , , and , such that the following inequalities hold for all and all :
[TABLE]
The inequality (20a) implies
[TABLE]
By Taylor’s theorem, the remainder term in the first-order Taylor expansion (45) of is given by the derivatives of , evaluated at some for some . The condition (20b) means that for some that is sufficiently small, the norm of the difference between two remainder terms can be controlled. The growth condition (20a) is not new; see for example Higham et al. (2002, Assumption 4.1).
The following result is analogous to Theorem 3.5. It states that we can replace Assumption 3.1 with Assumption 4.1 and obtain the same result as Theorem 3.5, provided that the are -a.s. bounded.
Theorem 4.2
Suppose that Assumptions 4.1, 3.2, and 3.3 hold for and as in Theorem 3.5. Suppose that . If the are -a.s. uniformly bounded over all by a positive scalar that is , then the conclusions of Theorem 3.5 hold.
It is of theoretical interest to determine whether there exists a deterministic numerical method such that the randomised version given by (6) has the same order even when each is not -a.s. bounded. In the remainder of this section, we shall show that for the implicit Euler method defined by
[TABLE]
the randomised version given by (6) has order 1, under the following dissipativity assumption.
Assumption 4.3
The function admits parameters and such that
[TABLE]
Assumption 4.3 is more general than the usual dissipativity property found in Humphries and Stuart (1994, Equation (1.2)) because may assume positive values. The sign of in (23) plays an important role in the behaviour of the solution of (1), as well as in numerical methods for solving for . For example, if is positive, then the problem (1) may be stiff. In this paper, we study only the rate of convergence, and leave the issue of stiffness for future work. In particular, allowing for positive poses no problem for establishing moment bounds, as we show in Lemma 2.
Recent studies in numerical methods for stochastic differential equations consider constraints on the drift that feature the same right-hand side as (23), e.g. Fang and Giles (2016) and Mao and Szpruch (2013). We reiterate, however, that the analysis of numerical methods for stochastic differential equations cannot be applied to probabilistic solvers of the form (6), because of the different behaviour in the additive noise (see e.g. Assumption 3.3).
Assumption 4.4
Let be as in Assumption 4.1 and be as in Assumption 4.3. Then there exists some such that there exists a solution to the implicit equation (22) for every , such that the solution varies continuously as a function of in the interval , and such that .
Note that Assumption 4.4 is weaker than assuming unique solvability of (22) for every over a sufficiently small time interval.
Unless otherwise specified, we shall assume hereafter that .
4.1 Moment bounds for implicit Euler
Lemma 2
Suppose that Assumptions 4.1, 4.3, and 4.4 hold, and let be arbitrary. Given a fixed, deterministic , the following holds uniformly in :
[TABLE]
for given in (44) below.
Note that Lemma 2 is the only statement for which we directly use Assumption 4.3. The following results depend on Assumption 4.3 only insofar as they depend on the conclusions of Lemma 2.
Proposition 1
Suppose that Assumptions 4.1, 4.4, and 4.3 hold, and let be arbitrary. If Assumption 3.3 holds for some and some , then
[TABLE]
for defined in (44), and in Assumption 3.3.
Proof
The statement follows directly from the conclusion (24) of Lemma 2 and (14) with and .
Corollary 2
Suppose that Assumptions 4.1, 4.4, 4.3, and 3.3 hold with and . Then
[TABLE]
Proof
The result follows from Proposition 1, the series expansion of the exponential, and the dominated convergence theorem.
Lemma 2 shows that whenever Assumption 4.3 holds, then regardless of the growth behaviour of , the randomised implicit Euler method has the property that if for some , then as well; cf. the hypothesis on in Theorem 3.4.
4.2 Convergence in discrete time for implicit Euler
Proposition 2
Let , and suppose that Assumptions 4.1 and 3.3 hold for some and some . Then there exists a scalar that does not depend on or , such that for all ,
[TABLE]
with as in (50).
Proposition 2 shows that when satisfies the polynomial growth condition and is the implicit Euler method, then the local truncation error at step of the randomised numerical integrator satisfies a bound analogous to that in Assumption 3.2, provided that the random variables are sufficiently regular.
Theorem 4.5
Let , and let be given by (22). Suppose that Assumptions 4.1, 4.3, and 4.4 hold, with parameters and . Suppose that Assumption 3.3 holds with and . Then there exists some that does not depend on such that for ,
[TABLE]
Note that the condition is the same condition on in Theorem 3.5, since the implicit Euler method has order .
4.3 Alternative decomposition of the error
The decomposition (15) of the error was used to derive the convergence results above. One might consider instead using the decomposition
[TABLE]
with the goal of using some stability properties of the implicit Euler method. However, this approach leads to a convergence result that is weaker, either because it requires exponential integrability of , or because the convergence is uniform only on a proper subset of the event space . Recall that we do not assume any of the to be a.s. bounded.
By (10) and the fact that implicit Euler has order one (i.e. Assumption 3.2)
[TABLE]
where one can show, using the proof of Proposition 2, that in (27) depends on but not on . By (10) we obtain
[TABLE]
Substituting the result above into (27), and assuming that , we obtain
[TABLE]
The definition (22) of the implicit Euler method and (10) yield
[TABLE]
by Assumption 4.1. Rearranging the above yields
[TABLE]
where is a random variable. Analogously, define the random variable by
[TABLE]
Suppose that are fixed, and define
[TABLE]
Since it is not the case that all of the random variables are a.s.-bounded, it follows that is a proper subset of , for every . In what follows, we assume that is nonempty, and that ; we suppress the –dependence of all random variables. Define by
[TABLE]
Using (30), we have
[TABLE]
and substituting the above into (28) yields
[TABLE]
Proceeding as in the proof of Theorem 4.5, we use a telescoping sum, Grönwall’s theorem, and Assumption 3.3 with to obtain
[TABLE]
where depends on according to
[TABLE]
For any , it follows from the definition of , and considering the zeroth order term above that
[TABLE]
From (29), it follows that, for all , we have
[TABLE]
where the right-hand side increases to infinity as decreases to zero. Thus, it need not be true that the quantity is finite. One way to ensure that is finite for would be to require that is finite on the same range. By the inequality for above, a necessary condition for to be finite is exponential integrability of . In many cases, a necessary condition for this would be exponential integrability of . By Corollary 2, in order to guarantee exponential integrability of , we would need to impose much stronger regularity conditions on the than those in Theorem 4.5. Finally, we also remark that if the are not -a.s. uniformly bounded, then for any , (31) is a weaker convergence result than Theorem 4.5, since in this case for any will be a proper subset of .
5 Additional results
5.1 Convergence for continuous-time interpolant
Recall (6) defines the discrete-time process ; in many applications, it is often useful to have a numerical method that provides continuous output, e.g. an inverse problem or data assimilation that requires comparison between the numerical solution and an observation that is not on the time grid defined in (3). Given this time grid , we may define a continuous-time process by
[TABLE]
For the above definition to work, we assume that each is a stochastic process defined on the time interval . In addition, to ensure that the process has -almost surely continuous paths, we require that . The corresponding notion of the error at time is given by , where . We emphasise that the continuous-time process described above will in general differ from the continuous-time process obtained by linear interpolation of .
We now demonstrate how one can obtain a convergence result for the continuous-time process from a discrete-time convergence result by strengthening the assumption on the noise, using Theorem 3.5 as an example. Consider the following version of Assumption 3.3:
Assumption 5.1
Fix . The collection of stochastic processes satisfies and admits , and some that do not depend on or , such that for all and for all ,
[TABLE]
Recall that we do not assume that the are independent, identically distributed, or centred.
Theorem 5.2
Let , and suppose that Assumptions 3.1, 3.2, and 5.1 hold with parameters , , , , , , and . Then for all ,
[TABLE]
where is defined in (18).
The next result follows from Theorem 5.2 in the same way that Corollary 1 follows from Theorem 3.5.
Corollary 3
Fix . Suppose that Assumptions 3.1 and 3.2 hold, and that Assumption 5.1 holds with and . Then, for all ,
[TABLE]
Proof
The proof follows by the series representation of the exponential and the dominated convergence theorem; see the proof of Corollary 1.
5.2 Existence of processes that satisfy the -regularity condition
The lemma below shows that there exist random variables that are not -a.s. bounded, and that satisfy Assumption 3.3 and, more generally, Assumption 5.1 for .
Lemma 3
Let and be arbitrary, and let be -valued Brownian motion. Then
[TABLE]
satisfies
[TABLE]
Note that variants of the integrated Brownian motion process have been used for modelling local truncation error in other works (Schober et al., 2014; Conrad et al., 2017). However, the point of Lemma 3 is not to suggest that the local truncation error behaves as an integrated Brownian motion, nor even that the integrated Brownian motion process is a suitable model for the local truncation error. The point of Lemma 3 is simply to show that there exist processes that satisfy Assumption 5.1 with . The construction of models that better reflect known properties of the truncation error, for specific classes of vector fields , is an interesting task that we leave for future work.
Appendix A Proofs
Proof (Proof of Lemma 1)
The assertion (11) holds immediately for , so let , and recall the binomial formula: for and ,
[TABLE]
Fix . By (9), for any ,
[TABLE]
where the second inequality follows from . Therefore,
[TABLE]
and the proof is complete upon observing that
[TABLE]
and bounding the other binomial sum in a similar way.∎
Proof (Proof of Theorem 3.4)
By (15),
[TABLE]
By (10) with , by Assumption 3.1 and Assumption 3.2, and using that ,
[TABLE]
Observe that equals a quadratic polynomial in with coefficients , , and . Calculating these coefficients and defining
[TABLE]
then yields that for all .
Combining the preceding estimates yields
[TABLE]
Using (38) in the telescoping sum
[TABLE]
the fact that and , we obtain
[TABLE]
It follows from the last inequality that
[TABLE]
Now replace on the right-hand side with and take expectations of both sides of the inequality. Since Assumption 3.3 holds with ,
[TABLE]
Next, define for every the -algebra generated by Then the sequence forms a filtration. Define by
[TABLE]
We want to show that this process is a martingale with respect to . By (6), is measurable with respect to , so is measurable with respect to . Hence is adapted with respect to . Observe that
[TABLE]
Using the assumption that , (6), Assumption 3.3, and the fact that is fixed, it follows that and belong to ; thus belongs to for every . We now use the assumption that for every , and that the are mutually independent, in order to establish the martingale property:
[TABLE]
and the right-hand side vanishes since is measurable with respect to as noted earlier. Since is a martingale, we may apply the Burkholder–Davis–Gundy inequality [Peškir, 1996, Equation (2.2)]. Letting denote the quadratic variation up to time of a process , we have
[TABLE]
where we define and . Using (9) with the same and , , and , and using (36), it follows that
[TABLE]
where we applied (10) with , , and to obtain the last inequality. Thus by Assumption 3.3 and by using ,
[TABLE]
Combining the preceding estimates, we obtain
[TABLE]
and by rearranging terms and using that , we obtain
[TABLE]
By the discrete Grönwall inequality (Theorem 2.1) with and constant and , and by using that , we obtain
[TABLE]
This establishes (16).∎
Proof (Proof of Theorem 3.5)
Let and . By applying the triangle inequality, (11), Assumptions 3.1 and 3.2, and by using that (since ),
[TABLE]
Observe that, since and are nonnegative, and since ,
[TABLE]
Note that .
Since implies that , we have
[TABLE]
Decomposing as a telescoping sum, using that , using the nonnegativity of the summands on the right-hand side of the last inequality, and using the relation , we obtain
[TABLE]
Using that and Grönwall’s inequality (Theorem 2.1),
[TABLE]
Taking expectations, using (14) with and , and using that yields
[TABLE]
Rearranging the above produces the desired inequality.∎
Proof (Proof of Corollary 1)
Let be arbitrary. Using (40), and applying (12) twice, we obtain
[TABLE]
Taking expectations and using (14) with and , we obtain
[TABLE]
The conclusion follows by the series expansion of the exponential and the dominated convergence theorem.∎
Proof (Proof of Theorem 4.2)
Recall that the solution map of the initial value problem (1) satisfies
[TABLE]
For any and , Assumption 4.1 and the integral Grönwall–Bellman inequality yield
[TABLE]
Given the boundedness hypothesis on the , we may define a finite constant that does not depend on or , such that
[TABLE]
The rest of the proof follows in a similar manner to that of Theorem 3.5. ∎
Proof (Proof of Lemma 2)
In what follows, we shall omit the dependence of all random variables on , with the understanding that is arbitrary. Let , where . From (6) we have, by (9),
[TABLE]
Taking the inner product of (22) with , we obtain by (23)
[TABLE]
Thus,
[TABLE]
where we used the inequality for the second inequality. Then (41) and (A) yield
[TABLE]
Let and . By (43), it follows that
[TABLE]
Using the telescoping sum
[TABLE]
it follows that
[TABLE]
Since , and since the right-hand side of the inequality above is nonnegative,
[TABLE]
Applying the Grönwall inequality (Theorem 2.1), yields, for all ,
[TABLE]
where we define, for as in Assumption 4.4, the scalar
[TABLE]
This yields (24) for . By applying (12), we obtain (24) for arbitrary .∎
Proof (Proof of Proposition 2)
Recall that in Assumption 4.1, we assume . Therefore, Taylor’s theorem applied to the function yields
[TABLE]
where as . Then, by (20a), (22), and (12),
[TABLE]
By (20a), (22), (21), and (12) with the fact that in Assumption 4.1, we obtain
[TABLE]
From (A) and (12), it holds that for any and such that ,
[TABLE]
for in Assumption 4.4. Applying the second inequality for the appropriate values of and computing exponents yields that, for the polynomials , and defined on by
[TABLE]
and , it follows from Lemma 2 that
[TABLE]
Taking expectations, applying Proposition 1, and using that to bound the right-hand side of the inequality (25) in Proposition 1, we may define some that does not depend on or , such that
[TABLE]
By Proposition 1, the finiteness of follows from the hypothesis and the observation that and have degree and in , respectively.
Now it remains to show that . From (20a), (20b), and (45), we obtain
[TABLE]
By the triangle inequality and (48),
[TABLE]
Then by applying (12) and Proposition 1 with the hypothesis that , and using the bound , it follows that we may define a positive scalar that does not depend on or , such that
[TABLE]
Therefore, with and as in (47) and (49) above, (46) yields
[TABLE]
as desired.∎
The proof below makes clear that we make absolutely no effort to find optimal constants.
Proof (Proof of Theorem 4.5)
Let . By (11)
[TABLE]
Since and , it holds that and . Using these inequalities, (11), and (20b) in the preceding inequality, we obtain
[TABLE]
Using (11) again, we obtain
[TABLE]
so that by defining
[TABLE]
we have
[TABLE]
and, therefore,
[TABLE]
By nonnegativity of , it follows that is a polynomial of degree in with positive coefficients. In particular, if we recall the definition of and define by
[TABLE]
then does not depend on , for all , and
[TABLE]
By the telescoping sum associated to , the fact that , the bound , the nonnegativity of the terms on the right-hand side of the inequality above, and the bound , we obtain
[TABLE]
By Lemma 2,
[TABLE]
which implies that
[TABLE]
Define
[TABLE]
Since , it follows that ,and by Grönwall’s inequality (Theorem 2.1) we obtain
[TABLE]
Taking expectations completes the proof, provided that we can ensure each sum is of the right order in . By Proposition 2 with the hypothesis that , and by Assumption 3.3,
[TABLE]
Thus we need to hold. Next, using the bound , Young’s inequality (9) with , , and some and conjugate exponent pair to be determined later, we obtain with (14) that
[TABLE]
Since , the maximal value of compatible with integrability of is . Since we are not interested in optimal estimates, we shall set and . We thus obtain
[TABLE]
For the exponent of of the first term in the parentheses, we want to ensure that , or equivalently that . Comparing this condition with the condition that arose from (53), and recalling that , we observe that if , then the preceding estimates yield
[TABLE]
It remains to bound by a constant that does not depend on . By (12), Proposition 1, and the assumption that for in Assumption 4.4, we obtain
[TABLE]
where does not depend on . Note that in applying Proposition 1, we have used that for the exponent in Assumption 4.1, since this implies that . ∎
Proof (Proof of Theorem 5.2)
Let and . Then
[TABLE]
and given that Assumption 3.1 implies that is Lipschitz on for every ,
[TABLE]
by applying (12). Since , it follows from the inequality above that
[TABLE]
By Assumption 5.1,
[TABLE]
Note that Assumption 5.1 is stronger than Assumption 3.3. Therefore we may apply Theorem 3.5 to obtain (32).∎
Proof (Proof of Lemma 3)
If , then the desired statement follows immediately. Therefore, let . Let be the integrated -Brownian motion process scaled by , so that
[TABLE]
where we applied Jensen’s inequality to the uniform probability measure on . It follows that
[TABLE]
Above, we used the Fubini–Tonelli theorem to interchange expectation and integration with respect to , and the fact that \mathbb{E}\bigl{[}\sup_{t\leq\tau}\|B_{t}\|^{r}\bigr{]} is constant with respect to the variable of integration . For , the Burkholder–Davis–Gundy martingale inequality [Peškir, 1996, Equation (2.2)] yields
[TABLE]
with for . For , Doob’s inequality [Peškir, 1996, Equation (2.1)] yields
[TABLE]
Since is continuously differentiable and monotonically decreasing on , the desired conclusion follows.∎
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Briol et al. [2015] F.-X. Briol, C. Oates, M. Girolami, and M. A. Osborne. Frank–Wolfe Bayesian quadrature: Probabilistic integration with theoretical guarantees. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28 , pages 1162–1170. Curran Associates, Inc., 2015.
- 2Capistrán et al. [2016] M. A. Capistrán, J. A. Christen, and S. Donnet. Bayesian analysis of OD Es: solver optimal accuracy and Bayes factors. SIAM/ASA J. Uncertain. Quantif. , 4(1):829–849, 2016. doi: 10.1137/140976777 .
- 3Chkrebtii et al. [2016] O. A. Chkrebtii, D. A. Campbell, B. Calderhead, and M. A. Girolami. Bayesian solution uncertainty quantification for differential equations. Bayesian Anal. , 11(4):1239–1267, 2016. doi: 10.1214/16-BA 1017 .
- 4Christen [2017] J. A. Christen. Posterior distribution existence and error control in Banach spaces, 2017. ar Xiv:1712.03299.
- 5Cockayne et al. [2017] J. Cockayne, C. Oates, T. J. Sullivan, and M. Girolami. Probabilistic numerical methods for PDE-constrained Bayesian inverse problems. In G. Verdoolaege, editor, Proceedings of the 36 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering , volume 1853 of AIP Conference Proceedings , pages 060001–1–060001–8, 2017. doi: 10.1063/1.4985359 .
- 6Cockayne et al. [SIAM Rev., to appear] J. Cockayne, C. Oates, T. J. Sullivan, and M. Girolami. Bayesian probabilistic numerical methods, SIAM Rev., to appear. ar Xiv:1702.03673.
- 7Conrad et al. [2017] P. R. Conrad, M. Girolami, S. Särkkä, A. M. Stuart, and K. C. Zygalakis. Statistical analysis of differential equations: introducing probability measures on numerical solutions. Stat. Comput. , 27(4):1065–1082, 2017. ISSN 0960-3174. doi: 10.1007/s 11222-016-9671-0 .
- 8Diaconis [1988] P. Diaconis. Bayesian numerical analysis. In Statistical Decision Theory and Related Topics, IV, Vol. 1 (West Lafayette, Ind., 1986) , pages 163–175. Springer, New York, 1988.
