Donsker-Type Theorem for BSDEs: Rate of Convergence
Philippe Briand, Christel Geiss, Stefan Geiss, Céline Labart

TL;DR
This paper investigates the convergence rate of a Markovian backward stochastic differential equation (BSDE) approximation driven by a scaled random walk, extending Donsker-type theorems to BSDEs and analyzing their Wasserstein distance convergence.
Contribution
It introduces a Donsker-type theorem for BSDEs, providing a quantitative rate of convergence for approximations driven by scaled random walks.
Findings
Establishes a convergence rate in Wasserstein distance for BSDE approximations.
Extends classical Donsker theorems to the context of BSDEs.
Provides theoretical bounds for approximation accuracy.
Abstract
In this paper, we study in the Markovian case the rate of convergence in the Wasserstein distance of an approximation of the solution to a BSDE given by a BSDE which is driven by a scaled random walk as introduced in Briand, Delyon and Mémin (Electron. Comm. Probab. 6 (2001), 1–14).
Dedication (Philippe Briand): In memory of Jean Mémin from whom I learned lots of mathematics.
Acknowledgement: Many thanks to Pierre Baras for very fruitful discussions about the heat equation.
1. Introduction
In this paper, we are concerned with the discretization of solutions to BSDEs of the form
[TABLE]
where is a standard Brownian motion. These equations have been introduced by Jean-Michel Bismut for linear generators in [2] and by Étienne Pardoux and Shige Peng for Lipschitz generators in [14].
In one of the first studies on this topic, in the case where the generator may depend on as well, Philippe Briand, Bernard Delyon and Jean Mémin [5] proposed an approximation based on Donsker’s theorem. They showed that the solution to the previous BSDE can be approximated by the solution to the BSDE
[TABLE]
where is the scaled random walk
[TABLE]
and is an i.i.d. sequence of symmetric Bernoulli random variables. They proved, in full generality, meaning that is only required to be a square integrable random variable, that converges to . However, the question of the rate of convergence was left open. At present, it seems hopeless to get a result in this direction for such a general path-dependent terminal condition. But in the Markovian case, meaning that , this problem seems to be tractable, in particular due to the underlying PDE structure. Indeed, is related to the semilinear heat equation
[TABLE]
where, under certain regularity conditions, we can choose
[TABLE]
In the case where is the discretized Brownian motion, the link to PDEs was exploited in [18, 3] to get the rate of convergence, in the Markovian case, of the classical scheme for BSDEs. The convergence of this scheme was already proved in [6, Proposition 13] for a general terminal condition and a generator that is Lipschitz in its spatial coordinates but without any rate of convergence.
Even though the link with PDEs was pointed out in [5], the rate of convergence of the approximation of BSDEs given by scaled random walks was completely open. In two recent papers, Christel Geiss, Céline Labart and Antti Luoto [10, 9] give a first answer to this question. They showed that the error between and is of order when is assumed to be -Hölder continuous and Lipschitz continuous. One of the main arguments in these papers consists in constructing the random walk from the Brownian motion using the Skorohod embedding (see [17]) together with generalizations of the pioneering work of Jin Ma and Jianfeng Zhang [13] on representation theorems for BSDEs. This approach allows one to work with convergence in the -sense even if the problem naturally arises in the weak sense. The drawback is that the rate of convergence obtained in these papers is not optimal as one can expect .
The objective of our study is to confirm this expected rate . This improvement was made possible by a weak limit approach, where the error is considered in the Wasserstein distance. Our starting point is a result of Emmanuel Rio [15] who proved that, when , for all , there exists a constant such that, for all , , where is the -Wasserstein distance and a standard normal random variable (see Section 3). First, we generalize this result to cover the case where which corresponds to the heat equation. Then, using the associated PDE, in particular representation formulas in the spirit of [13], we are able to prove that
[TABLE]
for and , respectively, when and are -Hölder continuous and is -Hölder continuous. We refer to Theorem 10 in Section 5 for the precise statement. One of the main difficulties in the proof concerned various gradient estimates in order to obtain the estimate for .
For and we obtain the rate which is the same rate as obtained by Rio for the random walk approximation of a Gaussian random variable in the Wasserstein distance as mentioned above.
2. Notation
In all the sequel, is a fixed positive real number. We work on a complete probability space carrying a standard real Brownian motion and stands for the augmented filtration of which is right continuous and complete.
We consider the following BSDE
[TABLE]
Throughout this article, we will assume for the function defining the terminal condition and the generator the following:
Assumption (A1).
There exist and such that it holds:
(i) The function is -Hölder continuous: for all one has
[TABLE]
(ii) The function is -Hölder continuous in time, -Hölder continuous in space and Lipschitz continuous with respect to : for all and in one has
[TABLE]
Most of the time, we do not need to distinguish between and and we let .
Convention: Later the phrase that a constant depends on stands for the fact that can be expressed in terms of where
[TABLE]
Similarly, a dependence on means an additional dependence on .
From [4, Theorem 4.2] it is known that under (A1), the BSDE (2) has a unique -solution for any So for we let be the square integrable solution to the BSDE
[TABLE]
where , and set, as usual, for , , and, for ,
[TABLE]
It is well known that the function is continuous on (see also Lemma 6 below) and under Lipschitz assumptions in and for it is the viscosity solution to (1), see [20, Theorem 5.5.8]. Moreover, in this Markovian setting, for , we have a.s. for all . In [19, Theorem 3.2], for a generator which is Lipschitz continuous in all space variables and a measurable with polynomial growth, J. Zhang proved that belongs to and that a.e. on . Moreover, the following representation holds
[TABLE]
If is the function given by
[TABLE]
we thus have
[TABLE]
together with
[TABLE]
These formulas play an important role in the sequel.
In Section 4 and in the appendix, we extend these results to the case where is -Hölder continuous and make the regularity of and precise.
As mentioned before, we are concerned with the approximation of the solution to (4) by a solution to the BSDE driven by a scaled random walk. To do this, let us consider, on some probability space, not necessarily , an i.i.d. sequence of symmetric Bernoulli random variables. For we set and we consider the scaled random walk
[TABLE]
where for any real number As we did for the Brownian motion, for and we put
[TABLE]
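The scaled random walk is easy to simulate. The following is a minimal sketch, assuming the form used in [5]: with time step h = T/n, the walk at grid time kh equals sqrt(h) times the sum of the first k symmetric Bernoulli variables, and the path is constant between grid points. The function names `scaled_random_walk` and `value_at` and their parameters are illustrative choices, not notation from the paper.

```python
import numpy as np


def scaled_random_walk(n, T=1.0, n_paths=1, rng=None):
    """Simulate paths of the scaled random walk on the grid t_k = k*h, h = T/n.

    Assumed form: B^n_{t_k} = sqrt(h) * (xi_1 + ... + xi_k), where the xi_i
    are i.i.d. with P(xi_i = 1) = P(xi_i = -1) = 1/2.
    Returns (times, paths) with paths of shape (n_paths, n + 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    h = T / n
    xi = rng.choice([-1.0, 1.0], size=(n_paths, n))
    start = np.zeros((n_paths, 1))
    paths = np.concatenate([start, np.sqrt(h) * np.cumsum(xi, axis=1)], axis=1)
    times = h * np.arange(n + 1)
    return times, paths


def value_at(t, times, paths):
    """Evaluate the piecewise constant, right-continuous paths at time t."""
    h = times[1] - times[0]
    n = len(times) - 1
    # Small tolerance so that grid times are not pushed down by rounding.
    k = min(int(np.floor(t / h + 1e-9)), n)
    return paths[:, k]
```

Each increment of the simulated path has square exactly h, so the quadratic variation at time T equals T for every path; this is the discrete counterpart of the quadratic variation of Brownian motion.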
Let us introduce some further notation. We denote the ceiling function by for Moreover, we set
[TABLE]
For let us consider the following BSDE driven by :
[TABLE]
It was shown in [5] that, as soon as , this BSDE has a unique square integrable solution , being adapted and being predictable with respect to the filtration generated by . By construction, is a piecewise constant càdlàg process with . The process is defined as an element of where we start with a defined only on the points and extend it to as a càglàd process by setting . The previous BSDE is actually a discrete BSDE that can be solved by hand since, for , we have
[TABLE]
Thus, if is given,
[TABLE]
where the last equality follows by taking the conditional expectation w.r.t. of the second line.
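The backward recursion described above can be sketched on the recombining binomial tree of the walk. This is an illustration under stated assumptions, not the authors' code: we assume that one step reads Y_k = E[Y_{k+1} | F_k] + h f(t_{k+1}, x, Y_k, Z_{k+1}) with Z_{k+1} = E[Y_{k+1} xi_{k+1} | F_k] / sqrt(h), as in the scheme of [5], and we solve the implicit equation in Y_k by a fixed-point iteration, which converges when h times the Lipschitz constant of the generator in y is smaller than 1. The name `solve_discrete_bsde` and its arguments are hypothetical.

```python
import numpy as np


def solve_discrete_bsde(g, f, n, T=1.0, x0=0.0, picard_iters=50):
    """Backward induction for a BSDE driven by a scaled random walk.

    g: terminal function, applied to an array of terminal positions.
    f: generator f(t, x, y, z), vectorized in x, y, z (assumed argument order).
    Returns (Y_0, Z_{t_1}) at the root of the tree started at x0.
    """
    h = T / n
    sqrt_h = np.sqrt(h)
    # Terminal layer: the n + 1 reachable positions x0 + (2j - n) * sqrt(h).
    y = g(x0 + sqrt_h * (2 * np.arange(n + 1) - n))
    z = None
    for k in range(n - 1, -1, -1):
        x = x0 + sqrt_h * (2 * np.arange(k + 1) - k)
        up, down = y[1:], y[:-1]          # values after an up or a down move
        z = (up - down) / (2.0 * sqrt_h)  # E[Y_{k+1} xi_{k+1} | F_k] / sqrt(h)
        cond_mean = 0.5 * (up + down)     # E[Y_{k+1} | F_k]
        t_next = (k + 1) * h
        y_new = cond_mean.copy()
        for _ in range(picard_iters):     # fixed point for the implicit Y_k
            y_new = cond_mean + h * f(t_next, x, y_new, z)
        y = y_new
    return float(y[0]), float(z[0])
```

With a vanishing generator the recursion reduces to an expectation over the binomial law, for example `solve_discrete_bsde(lambda x: x**2, lambda t, x, y, z: 0.0 * y, n=100)` returns a value of Y_0 equal to T up to rounding.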
Since we are in a Markovian setting, there is also an analog of the Feynman-Kac formula. If is a given function we set
[TABLE]
and
[TABLE]
Remark 1.
From the definition of and , we get that if is -Hölder, and are also -Hölder with constant .
Let be the solution to the finite difference equation, where for and we require
[TABLE]
Then, we obtain from (8) and (9) (cf. [5, Proposition 5.1]) that
[TABLE]
In continuous time, these formulas can be rewritten as
[TABLE]
If we set, for and , , we have .
More generally, for , we define as the solution and to the BSDE
[TABLE]
We set . Then,
[TABLE]
Let us observe that is first defined at the points , . As before we let for . We have
[TABLE]
In particular,
[TABLE]
Of course, we have
[TABLE]
Similarly, we define, for ,
[TABLE]
With this notation, (16) rewrites as
[TABLE]
It follows that
[TABLE]
which can be rewritten, taking into account (15) and (18), as
[TABLE]
where
[TABLE]
We will prove in Section 5 that converges to .
From now on we assume that where is the integer given in Lemma 12 in the appendix and which automatically implies also existence and uniqueness of solutions because
3. Scaled random walk and Wasserstein distance
One starting point of our paper is the following result of Emmanuel Rio [15] (Theorem 2.1); see also [16]. This result covers, up to a generalization, the case where the generator vanishes, i.e. .
Let be the convex function defined by . The Orlicz norm associated to this function of any real random variable is given by
[TABLE]
Let us recall that, for any ,
[TABLE]
Let and be two random variables and let us denote by the law of and by the law of . With the usual abuse of notation, the Wasserstein distance associated to is defined by
[TABLE]
Let be an i.i.d. sequence of random variables with , and such that, for some , . Let be a standard normal random variable. In [15, Theorem 2.1], Emmanuel Rio proved that there exists a constant such that, for ,
[TABLE]
As a byproduct, for any , there exists a constant such that
[TABLE]
where stands for the -Wasserstein distance
[TABLE]
We have also the result of Kantorovich-Rubinstein, i.e.
[TABLE]
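The order of the distance between a normalized symmetric Bernoulli sum and a standard normal variable can be checked numerically for p = 1. The sketch below relies on the one-dimensional identity that the 1-Wasserstein distance equals the integral of the absolute difference of the two distribution functions, and approximates that integral with a midpoint rule. The truncation interval, the number of steps, and the function names are illustrative assumptions. The last function compares the first absolute moments, which by the Kantorovich-Rubinstein duality cannot differ by more than the 1-Wasserstein distance, since the absolute value is 1-Lipschitz.

```python
import math


def normal_cdf(x):
    """Distribution function of a standard normal random variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def walk_law(n):
    """Atoms and probabilities of n^{-1/2} (xi_1 + ... + xi_n)."""
    atoms = [(2 * k - n) / math.sqrt(n) for k in range(n + 1)]
    probs = [math.comb(n, k) / 2 ** n for k in range(n + 1)]
    return atoms, probs


def w1_walk_vs_normal(n, lo=-10.0, hi=10.0, steps=40000):
    """Approximate W_1 as the integral of |F_n - Phi| over [lo, hi]."""
    atoms, probs = walk_law(n)
    dx = (hi - lo) / steps
    total, cdf, j = 0.0, 0.0, 0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        # Add the mass of all atoms lying to the left of the current point.
        while j <= n and atoms[j] <= x:
            cdf += probs[j]
            j += 1
        total += abs(cdf - normal_cdf(x)) * dx
    return total


def abs_moment_gap(n):
    """|E|S_n / sqrt(n)| - E|G||, a lower bound for W_1 by duality."""
    atoms, probs = walk_law(n)
    walk_moment = sum(p * abs(a) for a, p in zip(atoms, probs))
    return abs(walk_moment - math.sqrt(2.0 / math.pi))
```

Evaluating `math.sqrt(n) * w1_walk_vs_normal(n)` for increasing n gives a quantity that stays bounded, which is consistent with a rate of order n to the power -1/2.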
Remark 2.
We could also consider the case where by using the fact that, in this case, is a distance (see the arguments in [1, Section 7.1]). In general, we have for
Let us start with a straightforward generalization of Rio’s result.
Proposition 3.
There exists a such that, for all and all ,
[TABLE]
As a byproduct, taking into account (21), for any , there exists a such that, for all and all ,
[TABLE]
Proof of Proposition 3.
We have, for any and all ,
[TABLE]
If then and we have
[TABLE]
Let us assume that and let us write
[TABLE]
Let us treat each term separately. For the first one, Rio’s result gives
[TABLE]
and multiplying by , we get, since is equal to in distribution,
[TABLE]
Let us deal with the second term of (25). Let . Then
[TABLE]
But and this concludes the proof. ∎
Let us finish with a simple consequence of this result that we will use in the sequel.
Corollary 4.
Let and let be an -Hölder continuous function. Then there exists a depending on T such that, for all and all ,
[TABLE]
and, setting ,
[TABLE]
Proof.
Let and . For any coupling of and , using Hölder’s inequality when ,
[TABLE]
Thus, we have, by (24) for ,
[TABLE]
Choosing in (23), this implies the first result.
Let us prove the second assertion. We start by observing that, since and are centered random variables, we have, setting ,
[TABLE]
Let us remark that, for any real numbers and , , and using the fact that ,
[TABLE]
Young’s inequality, , leads to
[TABLE]
In the case where we have
[TABLE]
where . Since implies and we have
[TABLE]
using the fact that .
Let us turn to the case . For any coupling of and , using (22) and (26),
[TABLE]
and, by Hölder’s inequality with and ,
[TABLE]
From (22) it follows that
[TABLE]
where we have used (24) for .
Thus, for ,
[TABLE]
and the result follows as before by choosing in (23). ∎
4. Regularity results on , and
Let us start with known regularity properties of the function that follow from classical a priori estimates for BSDEs.
Lemma 5.
Under Assumption (A1) there exists a constant depending on such that, for all ,
[TABLE]
Proof.
The first two results follow directly from classical a priori estimates for BSDEs, see e.g. [8, Proposition 2.1]. The last one ensues from the following upper bound: for any real and for ,
[TABLE]
Since the norm in of is of order , we use the Cauchy-Schwarz inequality to bound the first term, and a priori estimates allow us (similarly to the proof of [8, Proposition 4.1]) to bound the second term. ∎
Next we extend [19, Theorem 3.2] to the case where is Hölder continuous.
Lemma 6.
Recall the notation (5) and let Assumption (A1) hold.
(a) The function belongs to and, for all , we have
[TABLE]
as well as (7), i.e.
[TABLE]
(b) Moreover, there exists a constant depending on such that
(i) for all
(ii) for all
Consequently, for ,
[TABLE]
Proof of Lemma 6.
The proof is divided into two steps.
Step 1. We assume in addition that is Lipschitz continuous w.r.t. . Then according to [19], we have only the second point to prove and we know that, for some constant ,
[TABLE]
(bi) The representation (28) yields
[TABLE]
Since is -Hölder continuous we get
[TABLE]
Similarly, we obtain by the conditional Cauchy-Schwarz inequality the estimate
[TABLE]
Using (3) for , we have
[TABLE]
and the Hölder continuity of stated in Lemma 5 yields
[TABLE]
By combining the above estimates we conclude from (4) that
[TABLE]
for Because of (29) we have
[TABLE]
with . Hence we may apply Gronwall’s lemma (Lemma 14) and get
[TABLE]
for some Especially, for this implies
[TABLE]
for some
(bii) We first notice that for any -Hölder continuous function and for all we have
[TABLE]
Therefore, we obtain from (7) that
[TABLE]
Using (31) for and taking into account that satisfies (bi) we get
[TABLE]
for some This finishes the proof of the first step.
Step 2. General case. The proof relies on a regularization procedure and is postponed to appendix A.3. ∎
Remark 7.
From now on we will always use the continuous version of given by .
Lemma 8.
For all and for , with defined as in Lemma 12, we have
(i)
(ii)
where depends on and depends on
Proof.
The result on ensues from Lemma 12, by choosing and . Let us prove the result on . By (17) and (10) we have that
[TABLE]
We want to use (2), where we realize that
[TABLE]
A similar argument can be used for the integral expression so that we get
[TABLE]
Then
[TABLE]
Since is -Hölder, is bounded by . Concerning the second term, we get, since satisfies (3),
[TABLE]
We will use that and are -Hölder continuous in , i.e.
[TABLE]
where tends to infinity when tends to [math]. For , Lemma 12 with gives
[TABLE]
while for this is an immediate consequence of Remark 1 and (35) with Then
[TABLE]
Since for we get that .
∎
Proposition 9.
Under (A1), there exists a constant depending on such that, for all ,
[TABLE]
Proof.
From Lemma 6, we know that, for ,
[TABLE]
where we have set
[TABLE]
It holds and . We also have, for ,
[TABLE]
Let us observe that, for and any -Hölder continuous function , it holds
[TABLE]
Indeed, we have
[TABLE]
and, from the Cauchy-Schwarz inequality, we deduce that
[TABLE]
Coming back to (37), we write, for ,
[TABLE]
to have, taking into account the fact that by (33),
[TABLE]
∎
5. Main results
In this section, we state the main result of this paper, which gives the rate of convergence in the Wasserstein distance between the solution to the BSDE (4) and the solution to the BSDE driven by the scaled random walk (14). For what follows, we remind the reader of Remark 7.
Theorem 10.
Under (A1), for any , there exists a constant depending at most on such that for all ,
(i)
(ii)
This result is a consequence of the following proposition, which gives the rate of the pointwise convergence of the solution to (13) towards the solution of the semilinear heat equation (1).
Proposition 11.
Under (A1) there exists a constant depending at most on such that
(i)
(ii)
Proof.
We split the proof into three parts. We begin by studying , we proceed by obtaining an estimate for and then we conclude with a Gronwall argument.
Estimate for
From (6) and (2) we conclude that
[TABLE]
Let be the function given by
[TABLE]
Using the notation (20) we also have that . With this notation in hand, we have, taking into account (15) and (18),
[TABLE]
In view of the regularity of in time, we have
[TABLE]
Moreover, the Cauchy-Schwarz inequality leads to
[TABLE]
and, taking into account the growth of , we have
[TABLE]
where we have used Lemma 12 to get
[TABLE]
Coming back to (38), we derive the following inequality
[TABLE]
From Corollary 4 we get
[TABLE]
We split the second term on the RHS of (5) into two parts
[TABLE]
Since has the regularity (33), Corollary 4 gives
[TABLE]
By the above estimates we derive from (5) the inequality
[TABLE]
Coming back to the definition of and (see (5) and (39)) and using the Lipschitz continuity of with respect to , we have
[TABLE]
Setting for simplicity, for ,
[TABLE]
for , Lemma 5, Lemma 6 and Lemma 8 imply that, for some and
[TABLE]
respectively. We deduce the following estimate
[TABLE]
and get, coming back to (41), for and for any ,
[TABLE]
We end up with the inequality
[TABLE]
and since belongs to Gronwall’s inequality (Lemma 13) gives
[TABLE]
Estimate for
In order to take advantage of the previous inequality, we need to estimate . To do this, we use the representations (7) and (34). We will divide the study into two parts
[TABLE]
Study of the difference
We have
[TABLE]
For the first term, since
[TABLE]
we have, using the fact that is -Hölder continuous,
[TABLE]
But, exploiting that and , we obtain
[TABLE]
from which we deduce that
[TABLE]
Since , from Corollary 4, the absolute value of the second term on the RHS of (5) is bounded by
[TABLE]
Then we get
[TABLE]
Study of the difference
Here we have to estimate for ,
[TABLE]
When we observe that and combining the regularity (33) of with the estimate (32) we obtain
[TABLE]
Thus, for , .
Let us now consider the case where i.e. . We first write
[TABLE]
For the second term of the RHS of this equality, we proceed as above and get
[TABLE]
But, since , for , and, since ,
[TABLE]
Secondly, we split the term
[TABLE]
into two parts:
[TABLE]
and, the remaining term
[TABLE]
But, due to the uniform regularity of in time, we have, since ,
[TABLE]
Thus, for ,
[TABLE]
We split the integrand of the first term on the RHS of the inequality into three parts,
[TABLE]
so that
[TABLE]
The term
Since has mean zero,
[TABLE]
and the regularity (33) of gives
[TABLE]
Since and the same upper bound holds for (since ), we get
[TABLE]
Finally, we remark that and , to obtain
[TABLE]
and, as a consequence,
[TABLE]
The term
We use once again the regularity (33) of together with Corollary 4 to obtain, for ,
[TABLE]
We first use the fact that to get
[TABLE]
and, since , we have
[TABLE]
from which we deduce the estimate
[TABLE]
The term
For this last term, we come back to the definitions (5) and (39) of and respectively. By (42) we have, for ,
[TABLE]
and, by the Cauchy-Schwarz inequality, we derive the estimate
[TABLE]
Summary for
Let us summarize the estimates we got for . For , we have . For we obtained the upper bound
[TABLE]
Hence we have, for ,
[TABLE]
Coming back to (45), we have, for any and ,
[TABLE]
and, as a byproduct,
[TABLE]
Global estimate
Plugging (43) into (46), we get, for ,
[TABLE]
For , since , we have
[TABLE]
Again, we have,
[TABLE]
from which we deduce
[TABLE]
It follows that, for ,
[TABLE]
Thus,
[TABLE]
But we have
[TABLE]
and, for , since if and only if ,
[TABLE]
But and , so we get
[TABLE]
Finally, we have
[TABLE]
and from Gronwall’s inequality (Lemma 13)
[TABLE]
Coming back to (43), we have also,
[TABLE]
The proof of Proposition 11 is complete. ∎
Proof of Theorem 10.
Theorem 10 is mainly a corollary of Proposition 11.
Let us begin with the convergence of the processes. Let us fix , and . We have
[TABLE]
Since, by Lemma 5, is -Hölder continuous in space, uniformly in time, we have, by Hölder’s inequality,
[TABLE]
where we have used Proposition 3 (see (24)) for the last inequality. Moreover, by Proposition 11,
[TABLE]
This gives the first part of the result.
Let us continue with the convergence of the processes. The proof is almost the same except for the grid points. Let with i.e. . We have as before
[TABLE]
Since, by Lemma 6, is -Hölder continuous, we have, by Hölder’s inequality,
[TABLE]
where the last inequality follows from (24). Moreover, by Proposition 11,
[TABLE]
and the result follows, in this case, from equalities (27) and (18) together with Remark 7.
Let us now consider the case where In this case we have
[TABLE]
which is not equal to in general. We first write
[TABLE]
The second term on the RHS can be bounded by using our previous result, namely
[TABLE]
For the first term, one can write
[TABLE]
From Lemma 6, is -Hölder continuous and we have
[TABLE]
where is a standard normal random variable. Finally, for the last term, we use Proposition 9 for the time regularity of . We have
[TABLE]
and the estimate for follows.
This ends the proof. ∎
Appendix A
A.1. A priori estimate for discrete BSDEs
For the convenience of the reader, we prove a generalization of an a priori estimate for BSDEs driven by random walks given in [6, Proposition 7] (see also the appendix in [5]). This generalization allows one to consider two different generators.
Lemma 12.
There exists an integer and a constant both depending only on and such that for any couple of functions satisfying (A1) and for all with and all ,
[TABLE]
where and, for all , denotes the solution to (14) where is replaced by .
Proof.
Let be such that and . Since, , doing exactly the same computation as in the proof of Proposition 7 in [6], we get, for a universal constant ,
[TABLE]
for all deterministic where
[TABLE]
We choose an integer such that, with , . Then, there exists an such that, for it holds and As soon as and , we have
[TABLE]
We set, for , and we introduce the following norm on :
[TABLE]
Considering and summing up (47) over yields
[TABLE]
from which we get
[TABLE]
This finishes the proof since upper bounds the LHS of the inequality stated in the Lemma. ∎
A.2. Gronwall lemmas
We recall the Gronwall lemmas used in this article.
Lemma 13.
Suppose that are integrable functions, and For if
[TABLE]
then
[TABLE]
The second lemma is of Volterra type. It can either be proved directly by a convolution argument or one can use [12, Exercise 4, page 190].
Lemma 14.
Assume a measurable and such that
[TABLE]
for all . Then for for a constant .
A.3. Proof of Lemma 6: Step 2
We assume that ; the case was treated in Step 1. For , let us consider the function
[TABLE]
When satisfies (3), then does as well and it is -Lipschitz continuous w.r.t. . Moreover
[TABLE]
with . Indeed,
[TABLE]
In particular, . Let be the solution to the BSDE (4) with data and be the function . By the usual classical estimate for BSDEs (see, for instance, [8, Proposition 2.1 and remarks] or [11, Lemma 5.26]), there exists a such that, for all ,
[TABLE]
In particular, converges to , as , uniformly on .
Proof of (a), (bi), and (bii). Since is Lipschitz continuous w.r.t. and satisfies (3) (uniformly in ), by Step 1, we know that and for a.e. with
[TABLE]
Taking into account the convergence of to in , we have
[TABLE]
We define the function
[TABLE]
First we show that
[TABLE]
which also implies that is measurable. For this we denote
[TABLE]
We also use (28) for so that
[TABLE]
Then we apply the conditional Cauchy-Schwarz inequality to and use (48) to get
[TABLE]
Taking into account the bound for given in (A.3), we have
[TABLE]
where depends on , , and .
Now we find a sequence such that the RHS tends to [math]. Indeed, this follows by dominated convergence as (A.3) guarantees a sequence such that
[TABLE]
and because of the equations (50) and (51). For , converges to in and, in particular, for we obtain the desired convergence . Because of for a.e. this also gives that
[TABLE]
Coming back to (52), Gronwall’s lemma (Lemma 14) gives,
[TABLE]
Especially, , so that .
Moreover, we conclude from this estimate and from the continuity of on that also is continuous. Finally, follows from taking the limit in
[TABLE]
where we use dominated convergence based on the inequality (50).
References
[1] L. Ambrosio, N. Gigli and G. Savaré, Gradient Flows in Metric Spaces and in the Space of Probability Measures, Birkhäuser, Basel, Boston, Berlin (2005).
[2] J.M. Bismut, Théorie probabiliste du contrôle des diffusions, Mem. AMS 176 (1973).
[3] B. Bouchard and N. Touzi, Discrete-time approximation and Monte-Carlo simulation of backward stochastic differential equations, Stochastic Process. Appl. 111 (2004), no. 2, 175–206.
[4] P. Briand, B. Delyon, Y. Hu, E. Pardoux and L. Stoica, Lp solutions of backward stochastic differential equations, Stochastic Process. Appl. 108 (2003), 109–129.
[5] Ph. Briand, B. Delyon, and J. Mémin, Donsker–type theorem for BSDEs, Electron. Comm. Probab. 6 (2001), 1–14, (electronic).
[6] Ph. Briand, B. Delyon, and J. Mémin, On the robustness of backward stochastic differential equations, Stochastic Process. Appl. 97 (2002), no. 2, 229–253.
[7] J. Jacod and A. N. Shiryaev, Limit theorems for stochastic processes, Springer (2003).
[8] N. El Karoui, S. Peng, and M.C. Quenez, Backward Stochastic Differential Equations in Finance, Math. Finance 7 (1997), 1–71.
