Adapted Wasserstein Distances and Stability in Mathematical Finance
Julio Backhoff-Veraguas, Daniel Bartl, Mathias Beiglböck, Manu Eder

TL;DR
This paper introduces an adapted Wasserstein distance that incorporates temporal structure, enabling stability analysis of hedging strategies in financial models and overcoming limitations of traditional Wasserstein metrics.
Contribution
It proposes a novel adapted Wasserstein distance for financial models, establishing Lipschitz stability of hedging strategies with respect to this metric.
Findings
The adapted Wasserstein distance accounts for temporal structure in models.
Hedging strategies are Lipschitz continuous under this new metric.
Results are sharp for Brownian motion and European options.
Abstract.
Assume that an agent models a financial asset through a measure Q with the goal to price / hedge some derivative or optimize some expected utility. Even if the model Q is chosen in the most skilful and sophisticated way, she is left with the possibility that Q does not provide an "exact" description of reality. This leads us to the following question: will the hedge still be somewhat meaningful for models in the proximity of Q?
If we measure proximity with the usual Wasserstein distance (say), the answer is NO. Models which are similar w.r.t. Wasserstein distance may provide dramatically different information on which to base a hedging strategy.
Remarkably, this can be overcome by considering a suitable adapted version of the Wasserstein distance which takes the temporal structure of pricing models into account. This adapted Wasserstein distance is most closely related to the nested distance as pioneered by Pflug and Pichler [52, 53, 54]. It allows us to establish Lipschitz properties of hedging strategies for semimartingale models in discrete and continuous time. Notably, these abstract results are sharp already for Brownian motion and European call options.
Keywords: Hedging, utility maximization, optimal transport, causal optimal transport, Wasserstein distance, sensitivity, stability.
AMS subject classifications (2010) 91G80, 60G42, 60G44, 90C15
1. Introduction
1.1. Outline
Assume that a reference measure is used to model the evolution of a financial asset with the purpose to hedge a financial claim or to maximize some expected utility. We do not expect that the model captures reality in an absolutely accurate way. However, supposing that is close enough to reality (described by a probability ) we would still hope that a strategy which is developed for leads to reasonable results.
A main goal of this paper is to establish this intuitive idea rigorously based on a new notion of adapted Wasserstein distance between semimartingale measures. To fix ideas, we provide a first example of the results we are after.
Theorem 1.1.
Let be continuous semimartingale models for the asset price process , and assume that denotes an -Lipschitz payoff of a (path-dependent) derivative . Assume that a predictable trading strategy , and an initial endowment constitute a -superhedge of , i.e.
[TABLE]
Then there is a predictable s.t. constitute an “almost” -superhedge:
[TABLE]
While the adapted Wasserstein distance will be defined in abstract terms (see (1.3)), it relates directly to the model parameters for ‘simple’ models. In particular, if are Brownian models with different volatilities, then the distance between these models is just the difference of these volatilities. Moreover, the bound in (1.1) (as well as further Lipschitz bounds given below) is already sharp in such a simple setting and for a European call option.
Below we will provide a number of results with similar flavour as Theorem 1.1. E.g. we will provide versions where the hedging error is controlled in terms of risk measures and we will show that a Lipschitz bound of the type (1.1) applies (with bigger constants) if the same trading strategy is applied in the model as well as in the model . Importantly, we establish that comparable results of Lipschitz continuity apply to utility maximization and utility indifference pricing.
We emphasize that familiar concepts such as the Lévy-Prokhorov metric or the usual Wasserstein distance do not appear suitable to derive results comparable to Theorem 1.1. For example, in the vicinity of financially meaningful models there are models with arbitrarily high arbitrage even for bounded strategies; similar phenomena appear w.r.t. completeness / incompleteness. Instead we introduce an adapted Wasserstein distance which takes the temporal structure of semimartingale models into account. These distances are conceptually closely related to the nested distance as pioneered by Pflug and Pichler [53, 54, 55]; see [1, 30, 20] for first articles which link such a type of distance to finance. We describe these contributions more closely in Section 2 below.
1.2. Notation and adapted Wasserstein distances
Throughout we let
[TABLE]
The first setting shall be referred to as the discrete time case, and the second as the continuous time case. (Footnote 1: The arguments in the discrete and the continuous case use the same set of ideas, but the presentation is significantly less technical in the discrete case, which was an important reason to include the discrete case in the paper.) In the first case we denote by the time-index set, and in the second . Throughout the article we will provide definitions and results without specifying which of the two cases we are referring to: this means that the definitions / results apply in both cases. Only occasionally will we consider one case specifically, and in this situation we will state this explicitly.
We interpret as the set of all possible evolutions (in time) of the 1-dimensional asset price. Importantly, mutatis mutandis, all our results (except Propositions 3.3, 3.6 and Example 3.4) remain true for multi-dimensional asset price processes (corresponding to / ). We chose to go for the 1-dimensional version to simplify notation.
The mappings denote the canonical processes (i.e. the identity map), and we make the convention that on the process denotes the first coordinate and the second one. The spaces and are endowed with the maximum-norm and the corresponding Borel σ-field. In continuous time, the space is endowed with the right-continuous filtration generated by ; in discrete time we use the plain filtration generated by . In any case we denote this filtration by and endow with the product filtration . Given a σ-algebra and a probability on we write for the -completion of . The set of couplings between probability measures consists of all probability measures on such that and . A Monge coupling is a coupling that is of the form for some Borel mapping that transports to , i.e. satisfies . Given a metric on and , the -Wasserstein distance of is
[TABLE]
In many cases of practical interest the infimum in (1.2) remains unchanged if one minimizes only over Monge couplings, cf. [56].
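To make the role of Monge couplings concrete, the following minimal Python sketch treats the simplest case: two uniform measures with the same number of atoms on the real line and cost |x - y|^p. The function names and the sample data are illustrative assumptions, not notation from the paper. For such measures every Monge coupling is a permutation of the atoms, and in one dimension the monotone (sorted) matching is known to be optimal for p ≥ 1, so the brute-force minimum over permutations and the sorted matching should agree.

```python
from itertools import permutations

def monge_cost(xs, ys, p=1):
    """Brute-force minimum over Monge maps (permutations) of uniform atoms."""
    n = len(xs)
    best = min(
        sum(abs(x - ys[j]) ** p for x, j in zip(xs, perm)) / n
        for perm in permutations(range(n))
    )
    return best ** (1.0 / p)

def sorted_cost(xs, ys, p=1):
    """Cost of the monotone matching (i-th smallest x to i-th smallest y)."""
    n = len(xs)
    total = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / n
    return total ** (1.0 / p)

xs = [0.0, 1.0, 3.0, 4.5]  # atoms of the first measure (hypothetical data)
ys = [0.5, 2.0, 2.5, 5.0]  # atoms of the second measure (hypothetical data)

w1_brute = monge_cost(xs, ys, p=1)
w1_sorted = sorted_cost(xs, ys, p=1)
w2_brute = monge_cost(xs, ys, p=2)
w2_sorted = sorted_cost(xs, ys, p=2)
```

The brute-force search is only feasible for a handful of atoms; it is used here purely to confirm that restricting to Monge maps loses nothing in this toy setting.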
Before defining the adapted Wasserstein distance between measures and on , let us hint why distances related to weak convergence are not suitable for the results we have in mind. Assume for example that we are interested in a utility maximization problem in two periods and that Figure 1 describes the laws of two traded assets. Clearly they are very close in Wasserstein distance, as follows from considering the obvious Monge coupling induced by depicted in Figure 1. At the same time, the outcome of utility maximization is certainly very different. Similarly, is a martingale measure while allows for arbitrage. The clear reason for that is the different structure of information available at time .
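The phenomenon can be reproduced numerically on a toy pair of two-period models. The sketch below is an illustration under stated assumptions: the two models, the parameter `EPS` and all function names are hypothetical and are not taken from Figure 1. In the first model a small first move reveals the sign of the final move; in the second nothing is learned at time 1. The code computes the cost of an obvious Monge coupling (an upper bound for the 1-Wasserstein distance w.r.t. the maximum norm) and the best expected gain of a trading strategy bounded by 1.

```python
EPS = 0.01  # size of the small, informative first move (hypothetical)

# Paths are (X_1, X_2) with X_0 = 0; each path is mapped to its probability.
# "Informative" model: the sign of X_1 reveals the sign of X_2.
informative = {(EPS, 1.0): 0.5, (-EPS, -1.0): 0.5}
# "Uninformative" model: X_1 = 0, so nothing is learned at time 1.
uninformative = {(0.0, 1.0): 0.5, (0.0, -1.0): 0.5}

def sup_dist(x, y):
    """Maximum-norm distance between two paths."""
    return max(abs(a - b) for a, b in zip(x, y))

def transport_map(path):
    """Monge map: keep the terminal value, erase the first move."""
    return (0.0, path[1])

# Check that the map pushes the first model onto the second one.
pushforward = {}
for path, prob in informative.items():
    image = transport_map(path)
    pushforward[image] = pushforward.get(image, 0.0) + prob

# Cost of this Monge coupling: an upper bound for the 1-Wasserstein distance.
monge_cost = sum(p * sup_dist(x, transport_map(x)) for x, p in informative.items())

def best_expected_gain(model, bound=1.0):
    """Largest expected gain H_1 * (X_2 - X_1) over |H_1| <= bound,
    where the position H_1 may depend on X_1 only."""
    drift_by_x1 = {}
    for (x1, x2), prob in model.items():
        drift_by_x1[x1] = drift_by_x1.get(x1, 0.0) + prob * (x2 - x1)
    return sum(bound * abs(d) for d in drift_by_x1.values())

gain_informative = best_expected_gain(informative)
gain_uninformative = best_expected_gain(uninformative)
```

With `EPS = 0.01` the coupling cost is 0.01, while the best expected gains are 0.99 and 0 respectively. In the informative model the gain 1 - EPS is even realised on every path, which is an arbitrage, whereas the uninformative model is a martingale.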
To exhibit why the Wasserstein distance does not reflect this different structure of information, let us review the transport condition . We rephrase it as
[TABLE]
While this condition is of course perfectly natural in mass transport, (1.3) almost seems like cheating when viewed from a probabilistic perspective: the map should not be allowed to consider the future value in order to determine . To define an adapted version of the Wasserstein distance, the ‘process’ should be taken to be adapted in order to account for the different information structures of and .
Naturally our official definition of adapted Wasserstein distances will not refer to adapted Monge transports but rather to couplings which are ‘adapted’ in an appropriate sense. Following Lassalle [45], we call such couplings (bi-)causal. Since the definition below may appear a bit technical at first glance, the following may be reassuring: In the discrete time setting and for absolutely continuous measures , the weak closure of the set of adapted Monge couplings, i.e. for adapted, is precisely the set of all causal couplings, see [42].
Definition 1.2 ((bi-)causal couplings).
For a coupling of denote by a regular disintegration w.r.t. . The set of causal couplings consists of all such that for all and
[TABLE]
The set of all bi-causal couplings consists of all such that also , where .
In discrete time, a coupling is causal if and only if
[TABLE]
-a.s. for every and Borel set , that is, at time , given the past of , the distribution of does not depend on the future of .
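For finitely supported couplings the discrete-time criterion can be checked mechanically. The sketch below assumes the following reading of the criterion: for every time t, the conditional law of (Y_1, ..., Y_t) given the whole X-path depends only on (X_1, ..., X_t). The helper names and the two-period example (a small informative first move of size `EPS` versus no first move) are hypothetical illustrations, not objects from the paper.

```python
from collections import defaultdict

def conditional_laws(coupling, t):
    """For each X-path, the law of (Y_1, ..., Y_t) given that whole X-path."""
    mass = defaultdict(float)
    joint = defaultdict(lambda: defaultdict(float))
    for (x, y), prob in coupling.items():
        mass[x] += prob
        joint[x][y[:t]] += prob
    return {x: {y_pre: q / mass[x] for y_pre, q in laws.items()}
            for x, laws in joint.items()}

def is_causal(coupling, tol=1e-12):
    """True if X-paths sharing the prefix up to t induce the same
    conditional law of (Y_1, ..., Y_t), for every t."""
    horizon = len(next(iter(coupling))[0])
    for t in range(1, horizon + 1):
        reference = {}
        for x, law in conditional_laws(coupling, t).items():
            ref = reference.setdefault(x[:t], law)
            keys = set(ref) | set(law)
            if any(abs(ref.get(k, 0.0) - law.get(k, 0.0)) > tol for k in keys):
                return False
    return True

def swap(coupling):
    """Exchange the roles of the two coordinates."""
    return {(y, x): prob for (x, y), prob in coupling.items()}

def is_bicausal(coupling):
    return is_causal(coupling) and is_causal(swap(coupling))

EPS = 0.01
# Monge coupling that erases the informative first move.
monge = {((EPS, 1.0), (0.0, 1.0)): 0.5, ((-EPS, -1.0), (0.0, -1.0)): 0.5}
# Independent (product) coupling of the same two marginals.
product = {(x, y): 0.25
           for x in [(EPS, 1.0), (-EPS, -1.0)]
           for y in [(0.0, 1.0), (0.0, -1.0)]}

monge_causal = is_causal(monge)
monge_bicausal = is_bicausal(monge)
product_bicausal = is_bicausal(product)
```

In this example the Monge coupling is causal in one direction only: recovering the first coordinate of the informative path from the uninformative one requires knowledge of the terminal value. The product coupling passes the check in both directions.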
Replacing couplings by bi-causal couplings in (1.2) one arrives at the nested distance as introduced by Pflug and Pichler [52, 53]. Since our goal is to compare also semimartingale models in continuous time we will work with an adapted Wasserstein distance that is defined slightly differently. (Notably, it is straightforward that the two distances are equivalent for probabilities on . We will elaborate in Section 3.3 below, why the definition in (1.4) is more appropriate for our purposes even in discrete time.)
In continuous time, we denote by the set of all probabilities on (the Borel σ-field of) under which the canonical process is a continuous semimartingale. In discrete time, denotes the set of all Borel probabilities on under which is integrable. In either case we can uniquely decompose , with a finite variation predictable process started at zero, and a local martingale. Indeed, in the first case is a special semimartingale and in fact and are continuous too, and in the second case this is the Doob decomposition of an integrable adapted discrete-time process. For we denote by the subset of for which
[TABLE]
where is the quadratic variation and the first variation norm. Note also that by the BDG inequality for , hence is a true martingale.
Definition 1.3 (Adapted Wasserstein distance).
For , set
[TABLE]
where denote the semimartingale decomposition of and resp.
It is shown in Lemma 3.1 that is well-defined (i.e. that is a semimartingale under every bi-causal coupling) and in Lemma 3.2 that in fact defines a metric.
Remark 1.4.
In the continuous time setup, the adapted Wasserstein distance can also be computed through
[TABLE]
Here MV denotes the mean variation, i.e. , where the supremum is taken over all finite partitions of .
In Section 3.2 below we will give explicit formulae for the adapted Wasserstein distance in the case of semimartingale measures described by simple SDEs.
1.3. Stability of Superhedging
For the rest of this article, fix some and let be the set of all predictable processes
[TABLE]
For every , write for the ‘upper’ Burkholder-Davis-Gundy (BDG) constant, cf. Remark 3.12 below. In particular it is known that and that .
Our first main result concerns the stability of superhedging and constitutes a stronger version of Theorem 1.1 stated above.
Theorem 1.5.
Let , and let be Lipschitz with constant . Then the hedging error under is bounded by the distance of and plus the hedging error under in the following sense: there exists such that
[TABLE]
Assume in addition that is Lipschitz with constant for every . Then we can take and obtain
[TABLE]
where .
Importantly, it is impossible to transfer a superhedge under into a superhedge under . This occurs already in a one-period framework and is not a by-product of our definition of adapted Wasserstein distance; see Remark 5.2. Similar reasoning makes it necessary to consider only trading strategies bounded by ; see Remark 5.3.
It is worthwhile to compare the inequalities (WHI) and (SHI):
- (S) In a certain sense the ‘strong hedging inequality’ (SHI) seems to be the more relevant assertion: after all a trader does not know that the model (rather than the model ) describes reality and hence she might (somewhat stubbornly) stick to the initial plan of hedging her risk according to the strategy . The inequality (SHI) then allows one to quantify the losses due to this model error.
- (W) However, the ‘weak hedging inequality’ (WHI) also has a particular merit: suppose that a trader starts with the prior belief that the asset price evolves according to a Black-Scholes model with volatility but soon after time [math] realizes that a volatility (where ) yields a more adequate description of reality. If the witty trader makes an accurate guess about the correct model and updates her trading strategy accordingly, her losses can be controlled through the tighter bound in (WHI).
In Theorem 4.2 we provide a version of Theorem 1.5, where is replaced by a convex, strictly increasing loss function .
Another way to gauge the effectiveness of an almost superhedge is by means of risk measures. We postpone the general formulation to Theorem 4.3 and first present a version that appeals to the average value at risk . Recall that for a random variable
[TABLE]
is the average value at risk at level under model . We then have
Theorem 1.6.
Assume that is Lipschitz with constant . Then
[TABLE]
for . If is such that is Lipschitz with constant for every and is the constant defined in Theorem 1.5, then
[TABLE]
The interpretation of this result is similar to the one of Theorem 1.5: As is translation invariant, one has
[TABLE]
and the right-hand side constitutes a relaxed version of the superhedging price.
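Translation invariance is easy to see on an empirical sample. The sketch below is a minimal illustration under explicit assumptions: it uses the Rockafellar-Uryasev representation of average value at risk, treats larger values of the sample as worse outcomes (losses), and interprets the level `alpha` as the weight of the worst tail; the paper's own sign and level conventions may differ. The function name and the data are hypothetical.

```python
def avar(losses, alpha):
    """Average value at risk of an equally weighted sample, computed as the
    minimum over m of  m + E[(Z - m)^+] / alpha  (Rockafellar-Uryasev form)."""
    n = len(losses)

    def objective(m):
        return m + sum(max(z - m, 0.0) for z in losses) / (n * alpha)

    # The objective is convex and piecewise linear with kinks at the sample
    # points, so a minimiser can be found among the sample points.
    return min(objective(m) for m in losses)

sample = [1.0, 2.0, 3.0, 4.0]            # hypothetical losses
shifted = [z + 10.0 for z in sample]      # the same losses plus a constant

avar_sample = avar(sample, alpha=0.5)     # average of the worst half
avar_shifted = avar(shifted, alpha=0.5)   # shifts by the same constant
avar_full = avar(sample, alpha=1.0)       # reduces to the plain mean
```

For this sample the value at `alpha = 0.5` is 3.5, the mean of the two largest losses, and adding the constant 10 to every loss raises it to 13.5.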
Notably, the explicit calculations of adapted Wasserstein distance given in Section 3.2 imply that Theorem 1.6 (and similarly Theorem 1.5) is sharp:
Example 1.7 (Hedging in a Brownian framework).
Consider a European call option , where for simplicity . Moreover, let be Wiener measure with constant volatility . Then for every , , and it holds that (we defer the proof of this fact to Section 4)
[TABLE]
This shows that the estimate in Theorem 1.6 is tight (up to constants), in the sense that it is essentially impossible to improve on the probability metric .
We make the important remark that Glanzer, Pflug, and Pichler [30] use the nested distance to control acceptability prices in discrete time models in a Lipschitz fashion through the nested distance of these models. Specifically, in a discrete time one-period framework [30, Proposition 3] and Theorem 1.6 yield almost the same assertion: in this setup, the only difference is that [30, Proposition 3] does not specify a Lipschitz constant and does not assume uniform boundedness of the admissible hedging strategy. (However, the latter seems to be in conflict with our Remark 5.3 below.)
1.4. Stability of Utility Maximization and Utility Indifference Pricing
We move on to consider the continuity of utility maximization. Let be a utility function which is concave, increasing, and denote by the left-continuous version of the derivative. We have
Theorem 1.8.
Let be Lipschitz continuous and assume that there exists such that for all . Then, for every there exists a constant such that
[TABLE]
for all with .
The failure of usual Wasserstein distances to guarantee stability of utility maximization is illustrated in Remark 5.1.
A common way of quantifying the value of a claim is via utility indifference pricing (Footnote 2: We are grateful to the anonymous referee for pointing out that we could include the stability of utility indifference pricing w.r.t. adapted Wasserstein distance.): given a claim , the utility indifference (bid-) price is defined as the solution of the following equation
[TABLE]
Continuing in the spirit of the present paper, we are interested in the stability of , where the latter denotes the utility indifference price associated to the model .
Theorem 1.9.
Let be Lipschitz continuous and assume that there exists such that for all . Then, for every there exists a constant such that
[TABLE]
for all with .
1.5. Structure of the paper
In Section 2 we briefly review the literature related to this paper. In Section 3 we establish some basic properties of the adapted Wasserstein distance, discuss the choice of cost function and give some examples. Moreover we derive a contraction principle (Theorem 3.10) which relates adapted Wasserstein distance with a ‘weak’ (in the sense of Gozlan et al [32]) transport distance. This result forms the basis for the proofs of the results mentioned in the introduction, as well as certain extensions of these results, see Section 4. Finally we conclude with some remarks in Section 5.
2. Literature
The articles closest in spirit to ours are [1, 20, 30]. Acciaio, Zalashko and one of the present authors consider in [1] an object related to the adapted Wasserstein distance in continuous time in connection with utility maximization, enlargement of filtrations and optimal stopping. Glanzer, Pflug, and Pichler [30] prove a deviation-inequality for the so-called nested distance in a discrete time framework (Footnote 3: Note added in revision: improved convergence rates have been recently obtained in [7] for a related sample-based estimator. Together with the results of the present article, this gives statistical consistency for an empirical version of the financial problems considered.), and consider acceptability pricing over an ambiguity set described through the nested distance. Bion-Nadal and Talay [20] study via PDE arguments a continuous-time optimization problem which is related to the adapted Wasserstein distance.
The concept of causal couplings, and optimal transport over causal couplings, has been recently popularized by Lassalle [45] although precursors can be found in the works [62, 58]. This notion is central to the recent articles [1, 10, 8, 9].
The idea of strengthening weak convergence of measures in order to account for the temporal evolution has some history. Indeed several authors have independently introduced different approaches to address this challenge: The seminal unpublished work by Aldous [2] introduces the notion of extended weak convergence for the study of stability of optimal stopping problems. The principal idea is not to compare the laws of processes directly, but rather the laws of the corresponding prediction processes. Independently, Hellwig [33] introduces the information topology for the stability of equilibrium problems in economics. Roughly, two probability measures on a product of finitely many spaces are considered to be close if for each the projections onto the first coordinates as well as the corresponding conditional (regular) disintegrations are close. Unrelated to these developments Pflug and Pichler [52, 53, 54] have introduced the nested distances for the stability of stochastic programming in discrete time. The nested distance is the obvious role model for the adapted Wasserstein distances considered in this article and (as mentioned above) for a fixed number of time steps and , they are obviously equivalent. Yet another idea to account for the temporal evolution of processes would be to symmetrise the causal transport costs defined by Lassalle [45] by taking the maximum or sum of and ; this was pointed out by Soumik Pal.
In parallel work [6], the four authors of the present article investigate the relations between these concepts in detail. Remarkably, in discrete time all of the concepts mentioned above (adapted Wasserstein distances, extended weak convergence, information topology, nested distances, symmetrised causal transport costs) define the same topology. As noted above, this ‘weak adapted topology’ refines the usual weak topology (properly for , see also Remark 5.2). The articles [8, 6, 27] investigate basic properties of this topology, e.g. the weak adapted topology is Polish [8, Section 5], and sets are totally bounded w.r.t. adapted Wasserstein distance / nested distance if and only if they are totally bounded w.r.t. usual Wasserstein distance [6, Lemma 1.6]. For recent applications of these concepts to optimal transport and probabilistic variants thereof we refer to [11, 12, 61].
In contrast, fundamental topological properties of the above mentioned concepts in the continuous time case seem to be much less understood and, as far as the authors are concerned, pose an interesting challenge for future research. Specifically, it is not clear to us whether the topology associated to the adapted Wasserstein distance is Polish in the continuous time case. In a similar vein, we expect that results analogous to the ones of the present article should apply in the case of càdlàg paths, but this extension is beyond the scope of our current understanding of adapted Wasserstein distances.
The question of stability in mathematical finance has been studied from different perspectives over the years. Notably, starting with the articles of Lyons [46] and Avellaneda, Levy, Paras [5] the area of robust finance has mainly focused on extremal models and hedging strategies which dominate the payoff for every model in a specified class. Following the publication of Hobson’s seminal article [36] connections with the Skorokhod embedding problem have been a driving force of the field, see the surveys of Hobson [37] and Obłój [48]. Recently this has been complemented by techniques coming from (martingale) optimal transport; early papers which advance this viewpoint include [38, 15, 29, 16, 21, 26, 24, 18]. The literature on ‘local’ misspecification of volatility in a sense more closely related to the present article appears more sparse. El Karoui, Jeanblanc, and Shreve [28] establish in a stochastic volatility framework that if the misspecified volatility dominates the true volatility, then the misspecified price of call options dominates the real price; see also the elegant account of Hobson [39]. More recently, the question of pricing and hedging under uncertainty about the volatility of a reference local volatility model is studied by Herrmann, Muhle-Karbe, and Seifried [35] (see also [34]). Less plausible models are penalized through a mean square distance to the volatility of the reference model and the authors obtain explicit formulas for prices and hedging strategies in a limit for small uncertainty aversion. Becherer and Kentia [14] derive worst-case good-deal bounds under model ambiguity which concerns drift as well as volatility. Indeed, discussions with Dirk Becherer motivated us to consider also models with drift in our results on stability of superhedging. The behaviour of the superhedging price in a ball (w.r.t. various notions of distance) around a reference model is studied in depth by Obłój and Wiesel [49] for a -dimensional asset and one time period.
A notable implication of our work is that it yields a coherent way to measure model-uncertainty (in the sense of Cont’s influential article [25]): Fix a subset of the set of all consistent models, i.e. martingale measures which are consistent with benchmark instruments whose price can be observed on the market. Given , the model uncertainty associated to a derivative can be gauged through
[TABLE]
The worst-case approach typically pursued in robust finance then yields for , but it appears equally natural to take to be an infinitesimal ball around a reference model. This approach is first carried out by Drapeau, Obłój, Wiesel and one of the present authors [13] in a one period framework. Our results indicate that adapted Wasserstein distance provides a way to extend this to a multi-period setup, and we intend to pursue this further in future work.
On a different note, much work has been done regarding the convergence of discrete time models to their continuous time analogues. Due to the vastness of this literature we refer the reader to the book [57] for references. Finally, in more recent times and starting from the works of Kardaras and Žitković, the stability of utility maximization has been studied in [41, 43, 44, 47, 60] among others.
3. The adapted Wasserstein distance
3.1. Basic properties of
The following lemma shows that is well-defined.
Lemma 3.1.
Let be integrable (semi-)martingale measures for , respectively, and let be a bi-causal coupling between and . Then are (semi-)martingales w.r.t. . Further, if denotes the semimartingale decomposition under , then up to evanescence is the semimartingale decomposition of under .
Proof.
Let be the semimartingale decomposition under and consider and as processes on via and . Further let be a bi-causal coupling between and . To show that remains the semimartingale decomposition under , it is enough to show that is a martingale under . To that end, let and let be -measurable and bounded. (Recall that denotes the right-continuous filtration generated by and that we endow with the filtration .) Then the random variable defined by
[TABLE]
and clearly bounded. Indeed, if for -measurable bounded functions and , then it follows from the definition of bi-causality that is -measurable; the general statement then follows from a monotone class argument. Therefore
[TABLE]
by the martingale property of under . This shows that is a martingale under and therefore that is the semimartingale decomposition under . ∎
Lemma 3.2.
 defines a metric on the set .
We note that very similar arguments could be used to show that defines a metric for semimartingales with infinite time horizon or .
Proof of Lemma 3.2.
It is clear that for all . Suppose that . As , it is immediate that if participates in the infimum defining , and , then
[TABLE]
where denotes the BDG constant and we used the BDG inequality for the martingale . Hence the usual Wasserstein distance between and (defined w.r.t. the -norm) is dominated from above by , and so .
We now prove the triangle inequality. Let given. We fix and assume is bi-causal -optimal for and is bi-causal -optimal for . In the next couple of lines, will always denote the first coordinate of a vector in , the second, and the last. Let
[TABLE]
be disintegrations, and define by
[TABLE]
If is the projection of onto the first and third components, then it is clear that the first and second marginals of are and respectively. Moreover, a disintegration of is given by
[TABLE]
where, as indicated above, now denotes the disintegration of w.r.t. the first coordinate, that is . We claim that, for every , the mapping is -measurable. Indeed, by bi-causality of one has that is -measurable. Thus there is an -measurable function and a -almost surely zero function such that for all . Then for all . The first term is -measurable (by bi-causality of ), and, as is a coupling between and , one has that for -almost all .
The argument for is similar and therefore is a bi-causal coupling between and . Finally, it follows as in the proof of Lemma 3.1 that, if , , and are the semimartingale decompositions under , , and , then they remain the semimartingale decomposition under on endowed with the product filtration.
To finish the proof of the triangle inequality, we observe that
[TABLE]
The function is known to be a norm on the space of -martingales started at zero whose supremum is -integrable. Likewise is a norm on the space of finite variation processes with -integrable variation. Hence
[TABLE]
is a norm on the product of these spaces. We conclude the proof for the triangle inequality with
[TABLE]
since the semimartingale decomposition of under is , with an analogous expression for under .
To conclude the proof, it remains to show that for all . By Lemma 3.1, we have where is the semimartingale decomposition under . Therefore the triangle inequality implies that is real-valued on . ∎
3.2. Examples and explicit calculations
We start with a simple result which permits a closed-form expression of the adapted Wasserstein distance in certain continuous-time situations:
Proposition 3.3.
For consider the SDEs with bounded progressive coefficients:
[TABLE]
Assume that each SDE admits a unique strong solution and denote by the respective laws. Further assume that
-  is a function of time only (namely )
-  and at least one of them is a function of time only.
Then the synchronous coupling (namely joint law of , where in (3.1)), is optimal in the definition of .
The discrete time version of the aforementioned synchronous coupling is given by the Knothe-Rosenblatt rearrangement [10], and a variant of the previous result can also be obtained in the discrete time framework.
Proof.
Let be a feasible coupling for , leading to a finite cost. Naturally for this proof we denote the coordinate process on by . As before we let be the unique continuous semimartingale decomposition of under the -completion of its right-continuous filtration. Observe that is a.s. deterministic, by the assumption on , and that the law of is independent of the coupling . Both facts can be derived easily from the identity
[TABLE]
which by the Lebesgue differentiation theorem holds -a.s. As a consequence, the term is independent of the coupling and so we may ignore it and only focus on the term .
By Doob’s martingale representation [40, Theorem 4.2], in a possibly enlarged filtered probability space we may represent the martingale by
[TABLE]
where are independent standard one-dimensional Brownian motions and real-valued processes, both of them adapted in the enlarged filtered space. In the following we will omit the argument from . Necessarily
[TABLE]
By the Cauchy-Schwarz inequality we deduce that almost surely
[TABLE]
and accordingly we get the lower bound
[TABLE]
As in the beginning of the proof, the right-hand side does not depend on the coupling thanks to either being a function of time only. To conclude observe that for the synchronous coupling we have equality in the above equation. ∎
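For two driftless Brownian models with constant volatilities the effect of the synchronous coupling can be seen in a small simulation. The sketch below is an illustration under stated assumptions: it looks only at the martingale part of the cost (the square root of the quadratic variation of the difference), approximates quadratic variation by its realized counterpart on a grid, and compares the synchronous coupling with the independent (product) coupling, which is also bi-causal. The function name, the volatilities and the grid size are hypothetical choices.

```python
import math
import random

def realized_cost(sigma_x, sigma_y, synchronous, n_steps=20000, horizon=1.0, seed=0):
    """Square root of the realized quadratic variation of X - Y on a grid,
    where X = sigma_x * B and Y = sigma_y * B' are driven by the same
    Brownian motion (synchronous) or by independent ones."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    qv = 0.0
    for _ in range(n_steps):
        db_x = rng.gauss(0.0, math.sqrt(dt))
        db_y = db_x if synchronous else rng.gauss(0.0, math.sqrt(dt))
        qv += (sigma_x * db_x - sigma_y * db_y) ** 2
    return math.sqrt(qv)

sigma, sigma_bar = 0.2, 0.3  # hypothetical volatilities

sync_cost = realized_cost(sigma, sigma_bar, synchronous=True)
indep_cost = realized_cost(sigma, sigma_bar, synchronous=False)

# Exact limits as the grid is refined (horizon 1).
sync_limit = abs(sigma - sigma_bar)
indep_limit = math.sqrt(sigma ** 2 + sigma_bar ** 2)
```

Under the synchronous coupling the quadratic variation of the difference is (sigma - sigma_bar)^2 times the horizon, so the cost is the difference of the volatilities when the horizon is 1. Under the independent coupling the quadratic variations add up, which gives a strictly larger cost.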
As an easy consequence we have
Example 3.4.
For bounded Lipschitz functions we denote by the law of the diffusion
[TABLE]
Assume that
-  is independent of the -variable, some , and
-  is independent of the -variable, some .
Calling and , we have
[TABLE]
We now illustrate that in general it is not true that the straightforward synchronous coupling of Proposition 3.3 is optimal. As a consequence, we do not expect a closed-form expression for the adapted Wasserstein distance. A discrete-time version of this observation is discussed in [8, Section 7].
Example 3.5.
Consider , , and for each introduce
[TABLE]
Assuming that is a Brownian motion, and for , we introduce the couplings
[TABLE]
These couplings share the same marginals and each of them is bi-causal. It is easy to compute
[TABLE]
We conclude that, for each , there are plenty of pairs such that the “synchronous” coupling is not optimal between its marginals for the metric .
To close this section, we estimate the distance between two geometric Brownian motions with different volatilities.
Proposition 3.6**.**
For , let be the law of the solution to the SDE with , where denotes Brownian motion and . Letting , we then have
[TABLE]
and for
[TABLE]
where is the constant in the BDG-inequality which allows one to control the quadratic variation by the terminal value.
Proof.
We have
[TABLE]
where denotes a centered Gaussian with variance . For and we obtain equality. ∎
3.3. Choice of the ‘cost functional’
Recall from Definition 1.3 that the adapted Wasserstein distance is given through
[TABLE]
where the ‘cost functional’
[TABLE]
is defined using the semimartingale decompositions . The distinctive property of this “quadratic plus first variation” functional is that it exhibits the proper scaling to interpret the discrete time case as approximation to the continuous time counterpart. To wit, consider , and let be the law of where , Brownian motion and . For each , denote by the law of a random walk on with independent increments from to distributed according to . Then one can compute that for
[TABLE]
For comparison, consider the consequences of replacing in (3.2) with corresponding to quadratic nested distance (in terms of Pflug and Pichler [53]). While and are equivalent metrics for each fixed , does not exhibit the appropriate scaling for large . A straightforward computation shows as whenever . In consequence, bounds on the hedging error in terms of become progressively weaker as . In particular they do not allow for a meaningful continuous time limit.
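The different scaling of the two costs can be seen in a small simulation. The following sketch rests on assumptions of ours: it couples two Gaussian random walks on `n_steps` time steps by feeding them the same standard normal noise (a synchronous, hence bi-causal, coupling), and compares the expected sum of squared increment differences with the expected sum of squared path differences. It therefore evaluates one particular coupling with second moments and does not compute the distances themselves. The function name and parameter values are hypothetical.

```python
import random


def synchronous_costs(sigma, sigma_bar, n_steps, n_paths=4000, seed=0):
    """Monte Carlo estimates, under the synchronous coupling, of
    E[sum_i (dX_i - dY_i)^2] and E[sum_t (X_t - Y_t)^2]."""
    rng = random.Random(seed)
    diff = sigma - sigma_bar
    qv_total = 0.0
    path_total = 0.0
    for _path in range(n_paths):
        gap = 0.0  # current value of X_t - Y_t
        for _step in range(n_steps):
            # Same noise for both walks; increments have variance 1/n_steps.
            inc = diff * rng.gauss(0.0, 1.0) / n_steps ** 0.5
            qv_total += inc * inc
            gap += inc
            path_total += gap * gap
    return qv_total / n_paths, path_total / n_paths


results = {n: synchronous_costs(1.0, 2.0, n) for n in (4, 16, 64)}
```

In this setup the first quantity has expectation `(sigma - sigma_bar)**2` for every number of steps, whereas the second has expectation `(sigma - sigma_bar)**2 * (n_steps + 1) / 2` and grows linearly in the number of steps.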
When restricting solely to martingale measures , a sensible alternative to (3.2) would be to consider the maximum norm, i.e. . In fact, by the BDG-inequalities this is essentially equivalent to our choice in (3.2). However, when considering semimartingales, this cost is too coarse. For example, let be a sequence in which converges to zero in maximum norm but for which the first variation tends to infinity. Then converges to (when adapted distance is defined only with maximum norm as cost), however, none of our optimization problems converge (take a strategy for which $(H(X)\bullet X)_{T}\approx k|\omega_{n}|_{\text{1-var}}$ almost surely).
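A sequence of paths of the kind used in this argument is easy to write down. The sketch below uses one hypothetical choice, `omega_n(t) = sin(2*pi*n^2*t)/n` on `[0, 1]`, which is not taken from the text: its maximum norm is `1/n`, while its first variation is `4n`. Both quantities are approximated on a uniform grid.

```python
import math


def omega(n, t):
    # Hypothetical path: small amplitude 1/n, n^2 full oscillations on [0, 1].
    return math.sin(2.0 * math.pi * n * n * t) / n


def sup_norm_and_variation(n, grid=20_000):
    """Grid approximations of the maximum norm and the first variation."""
    values = [omega(n, i / grid) for i in range(grid + 1)]
    sup_norm = max(abs(v) for v in values)
    variation = sum(abs(b - a) for a, b in zip(values, values[1:]))
    return sup_norm, variation


table = {n: sup_norm_and_variation(n) for n in (1, 2, 4, 8)}
```

A strategy bounded by `k` that follows the sign of the increments of such a path collects a gain of order `k` times the first variation, which is why a cost based on the maximum norm alone cannot control the hedging problems.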
3.4. Stochastic integrals and a contraction principle
We present here the two technical results which underlie the proofs of the main theorems in the article. The first one is
Lemma 3.7**.**
Let , , and be a bi-causal coupling between and . Then there exists a process such that for every , -almost surely. Moreover, we have $(G(Y)\bullet Y)_{T}=\mathbb{E}_{\pi}[(H(X)\bullet Y)_{T}|Y]$, -almost surely.
Proof.
In discrete time, write for Borel functions . Let be a disintegration and define
[TABLE]
for every and . By definition of bi-causal coupling is -measurable. It remains to pick functions which are measurable such that -almost surely. Since -almost surely, it is clear that $(G(Y)\bullet Y)_{T}=\mathbb{E}_{\pi}[(H(X)\bullet Y)_{T}|Y]$ -almost surely.
In continuous time we take to be the predictable projection of , under the reference measure , with respect to the -completion of the filtration . By [1, Lemma C.1] the result is -indistinguishable from a predictable process under the -completion of the filtration . The -by-, -almost sure equality , is then a consequence of the definition of predictable projection. The -almost sure equality $(G(Y)\bullet Y)_{T}=\mathbb{E}_{\pi}[(H\bullet Y)_{T}|Y]$ is established in Lemma 3.8 below, assuming that . The general case follows by localization. ∎
Lemma 3.8**.**
In the continuous-time context of Lemma 3.7, assume further that . Then we have $(G(Y)\bullet Y)_{T}=\mathbb{E}_{\pi}[(H(X)\bullet Y)_{T}|Y]$, -almost surely.
Proof.
The statement is true if instead of the stochastic integrals we considered the integrals w.r.t. the finite variation part of (either by properties of Riemann-Stieltjes integrals, or directly from the definition of predictable projection). For this reason we may now assume that is itself a martingale.
We first take for granted the following result: if is bounded and predictable in the filtration of , and if denotes its predictable projection in the filtration of under the measure , then
[TABLE]
We know that there exist a sequence of predictable simple processes s.t.
[TABLE]
By Itô isometry the stochastic integrals $(H^{n}\bullet Y)_{T}$ converge in to $(H\bullet Y)_{T}$. Denoting by the predictable projection of with respect to the -filtration, we deduce from (3.3) that
[TABLE]
so again by Itô isometry $(G^{n}\bullet Y)_{T}$ converges in to $(G\bullet Y)_{T}$. The -almost sure equality $(G^{n}\bullet Y)_{T}=\mathbb{E}_{\pi}[(H^{n}\bullet Y)_{T}|Y]$ follows easily by the bi-causality of the coupling , and by taking limits the desired conclusion is obtained.
To finish the proof we must establish (3.3). First we observe that
[TABLE]
as follows from predictable projection and upon taking . The result is a consequence of the equality
[TABLE]
∎
Our next crucial technical result is given in Theorem 3.10 below. But first we need some preparation.
Lemma 3.9**.**
Let , let be a bi-causal coupling between and , let , and write for the semimartingale decomposition under . Then, for every , we have
[TABLE]
where is the upper constant in the BDG-inequality. If further is -Lipschitz continuous for every , then we have
[TABLE]
where .
Proof.
The elementary inequality for together with the BDG-inequality and the fact that imply
[TABLE]
This proves the first part. The same arguments imply
[TABLE]
from which the second part follows. To prove the third claim, write
[TABLE]
The second term is smaller than by the second part. It remains to estimate $\mathbb{E}_{\pi}[|((H(X)-H(Y))\bullet X)_{T}|^{p}]$. Write for the semimartingale decomposition of under . By Lemma 3.1, the semimartingale decomposition under is still . Moreover, the BDG-inequality, the Lipschitz-continuity of , and Hölder’s inequality, imply that
[TABLE]
It now follows from the first part that
[TABLE]
and by Lemma 3.1 we have
[TABLE]
Putting all estimates together and replacing and yields the claim. ∎
Denote by the set of all Borel probability measures on such that . Moreover, let be the usual -Wasserstein distance, and let the weak -Wasserstein cost, that is,
[TABLE]
Here denotes the disintegration. Note that is not symmetric and as a consequence of Jensen’s inequality, we always have . Problems akin to go under the name of ‘weak optimal transport’ and have been recently introduced by Gozlan et al. in [32], but see also [3, 4, 11, 9, 31]. We have
Theorem 3.10** (Contraction).**
Let , let a bi-causal coupling between and , let be Lipschitz with constant , and let . Further denote by the semimartingale decomposition under and let such that $(G(Y)\bullet Y)_{T}=\mathbb{E}_{\pi}[(H(X)\bullet Y)_{T}|Y]$ -almost surely. Then
[TABLE]
Now assume in addition that is -Lipschitz continuous for every , then
[TABLE]
where is the constant of Lemma 3.9.
Proof.
We start by proving the first claim. Let be as stated, and define $a(X):=C(X)+(H(X)\bullet X)_{T}$ as well as $b(Y):=C(Y)+(G(Y)\bullet Y)_{T}$. Now let so that is trivially a coupling between and . Therefore
[TABLE]
By assumption it holds that
[TABLE]
Thus, using the tower property and Jensen’s inequality, it follows that
[TABLE]
The claim now follows from the first and second estimates in Lemma 3.9.
In the second case where is additionally Lipschitz, let $d(X):=C(X)+(H(X)\bullet X)_{T}$ as well as $e(Y):=C(Y)+(H(Y)\bullet Y)_{T}$ and . Then, similarly as before,
[TABLE]
and the claim follows from the first and third estimates of Lemma 3.9. ∎
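The comparison between the weak cost and the usual Wasserstein cost, noted before Theorem 3.10 as a consequence of Jensen's inequality, can be illustrated on a finite example. The sketch below assumes that the weak cost has the barycentric form standard in weak optimal transport: for a coupling with disintegration `pi_x`, the integrand is `|x - mean of pi_x|^p`. It evaluates both costs (to the power `p`) for one fixed coupling of two discrete measures and does not carry out the minimization over couplings. All names and numbers are hypothetical.

```python
def weak_and_transport_costs(xs, ys, coupling, p=2.0):
    """For a coupling given as a matrix of masses coupling[i][j] on (xs[i], ys[j]),
    return (sum_i mu_i |x_i - E[Y | x_i]|^p, sum_ij pi_ij |x_i - y_j|^p)."""
    weak = 0.0
    transport = 0.0
    for i, x in enumerate(xs):
        mass = sum(coupling[i])  # first marginal weight of x
        barycenter = sum(w * y for w, y in zip(coupling[i], ys)) / mass
        weak += mass * abs(x - barycenter) ** p
        transport += sum(w * abs(x - y) ** p for w, y in zip(coupling[i], ys))
    return weak, transport


xs = [0.0, 1.0]
ys = [-1.0, 0.0, 2.0]
mu = [0.5, 0.5]
nu = [0.25, 0.5, 0.25]
# Independent coupling, chosen only for simplicity.
coupling = [[m * q for q in nu] for m in mu]
weak_cost, transport_cost = weak_and_transport_costs(xs, ys, coupling)
```

Since the inequality holds coupling by coupling for every `p >= 1`, it passes to the infimum over couplings.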
Remark 3.11**.**
An evident question is whether an estimate for the usual Wasserstein distance holds true without the (Lipschitz-) continuity assumption on . Namely if (3.4) holds for instead of . The following example shows that this is not true. In a two-period discrete time model , let
[TABLE]
so that as for every . Then, set and . For the projection under any bi-causal coupling between and of onto one computes and . In particular $(G(Y)\bullet Y)_{T}=0$ -almost surely. However, for every one has $\mathbb{P}_{\varepsilon}((H(X)\bullet X)_{T}\geq 1-\varepsilon)\geq 1/4$ which implies that the respective laws cannot converge.
Remark 3.12**.**
By we denote the smallest real number such that
[TABLE]
for every martingale . For it was established by Burkholder [22] that but the value of is unknown for according to [50], [51, page 427]. By [17], . (The optimal constant in the reverse inequality is known for the trivial case and for . In the latter instance one obtains [23] and [59] for continuous martingales, resp.)
4. Proofs of the results stated in the introduction and extensions
Thanks to work done in the previous section, the strategy for the proofs boils down to two parts. In a first step, one forgets about the space and only focuses on continuity of the problem at hand with respect to or when image measures on are plugged in: e.g. in utility maximization this means to study continuity of . In a second step, one uses the obtained continuity and the contraction theorem in the previous section.
4.1. Proof of Theorem 1.5
We will need the elementary estimate
Lemma 4.1**.**
Let and let be convex and Lipschitz.
[TABLE]
where is the Lipschitz constant of .
Proof.
Let be a coupling of and . Applying Jensen’s inequality we obtain
[TABLE]
As was arbitrary, this implies the claim. ∎
In fact there is equality in the previous lemma if one takes the supremum on the l.h.s. of (4.1) over all -Lipschitz convex functions, as shown in [32, Proposition 3.2].
We now turn to the proof of Theorem 1.5. For let be a bi-causal coupling which attains the infimum in the definition of modulo a -margin. By Lemma 3.7 there is such that $(G^{n}(Y)\bullet Y)_{T}=\mathbb{E}_{\pi}[(H(X)\bullet Y)_{T}|Y]$ -almost surely. Define
[TABLE]
(Note that as .) By Lemma 4.1 we have
[TABLE]
From Theorem 3.10 we obtain
[TABLE]
Assume first that and denote by the finite variation process associated to . Then, as is uniformly bounded by , there exists a predictable and a sequence of forward-convex combinations of which converge in to . This, (4.2), and the convexity of lead to the desired conclusion. The general case follows by a simple but notationally heavy localization argument.
The proof in case that and is Lipschitz follows analogously from the second part of Theorem 3.10.
4.2. Proof of Theorem 1.6
In a first step notice that for all and random variables , it follows as in Lemma 4.1 that
[TABLE]
Indeed, if is a coupling from to then
[TABLE]
so minimizing over yields the claim.
The rest of the proof now follows the same line of argument as the proof of Theorem 1.5. Fix . Assume only for notational simplicity that there exists a bi-causal coupling which attains the infimum in the definition of , and that there exist such that
[TABLE]
By Lemma 3.7 there is such that $(G^{\ast}(Y)\bullet Y)_{T}=\mathbb{E}_{\pi}[(H^{\ast}(X)\bullet Y)_{T}|Y]$ -almost surely. Therefore
[TABLE]
where the last inequality is due to Theorem 3.10. Interchanging the role of and yields the desired conclusion. The proof for the second estimate follows analogously.
4.3. Proof of Example 1.7
First note that for every integrable random variable . Indeed, this follows from integrating the pointwise inequality . Therefore, as the Brownian stochastic integral has expectation zero, we conclude that $\inf_{H\in\mathcal{H}_{k}}\mathrm{AVaR}^{\mathbb{P}}_{\alpha}(C(X)-(H(X)\bullet X)_{T})\geq\mathbb{E}_{\mathbb{P}}[C(X)]$. On the other hand, define
[TABLE]
where stands for the normal distribution with mean 0 and variance . Then and for every . Thus, by Itô’s formula and the fact that the martingale property implies that the finite variation part vanishes, one has for the predictable trading strategy . As further for every and , one has
[TABLE]
The proof now follows from the explicit formula for the adapted Wasserstein distance derived in Example 3.4 and the fact that .
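The first step of this proof, that the average value at risk dominates the expectation, can be checked on an empirical distribution. The sketch below assumes the Rockafellar-Uryasev representation `AVaR_alpha(Z) = inf_m { m + E[(Z - m)^+] / alpha }` for a loss `Z` and a level `alpha` in `(0, 1]`; the section does not restate its definition, so this form is an assumption, and the sample of losses is hypothetical.

```python
def average_value_at_risk(losses, alpha):
    """Empirical AVaR via inf_m { m + E[(Z - m)^+] / alpha }.
    The objective is convex and piecewise linear in m with kinks at the
    sample points, so it suffices to minimise over the sample itself."""
    n = len(losses)

    def objective(m):
        return m + sum(max(z - m, 0.0) for z in losses) / (alpha * n)

    return min(objective(m) for m in losses)


losses = [-2.0, -0.5, 0.0, 1.0, 3.0]  # hypothetical equally weighted losses
mean_loss = sum(losses) / len(losses)
avar = {alpha: average_value_at_risk(losses, alpha) for alpha in (0.2, 0.4, 1.0)}
```

For this sample the values at levels 0.2 and 0.4 are the averages of the worst one and the worst two outcomes, and at level 1 the risk measure reduces to the mean, so the lower bound by the expectation holds at every level.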
4.4. Proof of Theorem 1.8
Recall that for all and some constant . Let be arbitrary and assume only for notational simplicity that there is such that
[TABLE]
and that there is a bi-causal coupling between and which is optimal for . By Lemma 3.7 there is such that $(G^{\ast}(Y)\bullet Y)_{T}=\mathbb{E}_{\pi}[(H^{\ast}(X)\bullet Y)_{T}|Y]$ -almost surely. Let
[TABLE]
and let be an (almost) optimal coupling for . As is concave and increasing, we have . Using Jensen’s inequality for the concave function we have
[TABLE]
where we used Hölder’s inequality in the last line and denotes the conjugate Hölder exponent of (that is, ). As , the growth assumption on implies that for some (new) constant . Then, by Lemma 3.9, we have
[TABLE]
for ). Exchanging the roles of and and using Theorem 3.10 completes the proof.
4.5. The proof of Theorem 1.9
In a first step, we claim that is uniformly bounded over all with . Indeed, using the growth assumption on , the fact that is strictly increasing, and the BDG-inequality to control the -th moment of $(H\bullet X)_{T}$, it follows that there exist such that
[TABLE]
for all with . Now assume that there exists a sequence with but . Then, using the BDG-inequality once more, it follows that
[TABLE]
a contradiction to (4.3). The case is excluded analogously.
At this point, using the definition of , a twofold application of Theorem 1.8 yields
[TABLE]
Indeed, while a direct application of the theorem would give a constant which depends on , an inspection of its proof shows that the constant depends only on the size of . By the first step this is bounded uniformly over with .
Now let and be arbitrary, and set $Y:=Y^{H}:=C-v(\mathbb{P})+(H\bullet X)_{T}$. Then, it follows that there is some constant (depending on and ) such that
[TABLE]
Indeed, this would follow directly if were bounded by a fixed constant but readily extends to the present setting as for some constant , independent of and as long as . In a similar manner .
Putting everything together reveals
[TABLE]
for some (where is a new constant emerging from and ). Thus which completes the proof.
4.6. Two generalizations
The following two results can be proved using almost the same arguments as used in the proofs of Theorem 1.6 and Theorem 1.8. In particular the proofs boil down to establishing convergence for image measures with respect to and give no new insight on adapted Wasserstein distances, so we shall skip them.
Proposition 4.2**.**
Let be a convex and strictly increasing function and let . Assume that is such that for some constant . Then, for every Lipschitz continuous function , the function
[TABLE]
is continuous on .
Let be a law-invariant risk measure which we directly view as a functional from to the reals. For and a random variable (such that ) we write . A typical example of a law-invariant risk measure which satisfies for some constant depending on the -th moment of and is the optimized certainty equivalent, introduced to the mathematical finance community in [19]. For a convex, increasing function which is bounded from below and satisfies as , the optimized certainty equivalent is defined via
[TABLE]
If , then it follows that the infimum over can be taken in some compact set depending on the -th moments. Due to cash additivity of , the following proposition has the same interpretation as Theorem 1.6.
Proposition 4.3**.**
Assume that satisfies for some constant depending on the -th moment of and . Then, for every Lipschitz function , the mapping
[TABLE]
is locally Lipschitz continuous on .
Finally, let us point out that (though not a convex risk measure) the Value-at-Risk () would be another natural candidate to study continuity. However, as is not continuous w.r.t. weak convergence, already in a one period model continuity of $\mathbb{P}\mapsto\inf\{m\in\mathbb{R}:\text{there is }H\in\mathcal{H}_{k}\text{ with }\mathrm{VaR}^{\mathbb{P}}(C(X)-m-(H(X)\bullet X)_{T})\leq 0\}$ does not hold.
5. Final remarks
Remark 5.1** (Usual Wasserstein does not work I).**
We note that convergence in the usual Wasserstein distance is not sufficient to obtain continuity in any of the problems we study in this paper. Consider a two period market with
[TABLE]
Then and each satisfy the classical no-arbitrage condition, unlike the situation described in Figure 1. While converges to in usual Wasserstein distance, one can verify that convergence in nested distance does not hold. For example in utility maximization of the trivial claim , we have $\sup_{H\in\mathcal{H}_{k}}\mathbb{E}_{\mathbb{P}}[U(C(X)+(H(X)\bullet X)_{T})]=U(0)$ by Jensen’s inequality (as is a martingale under ). For taking the strategy consisting of and , one gets
[TABLE]
showing the lack of continuity.
Remark 5.2** (Usual Wasserstein does not work II).**
As explained in the introduction, the objective in Theorem 1.5 can be seen as a relaxed version of the superhedging problem. The reason to consider this relaxation is not a technical simplification; it is necessary to obtain continuity without further assumptions. Indeed, the problem of superhedging
[TABLE]
is not continuous in w.r.t. adapted distance for any . In fact, this already happens in one period, where adapted and the usual Wasserstein distances coincide. Consider a sequence of measures with full support which converge weakly to a measure . Then the superhedging price w.r.t. equals the concave envelope of , while the superhedging price w.r.t. equals the concave envelope of restricted to the support of . For a recent paper on this problem in one period, see the work of Obłój and Wiesel [49].
Remark 5.3** (Uniformly bounded strategies are necessary).**
Similarly to Remark 5.2, the restriction to trading strategies in (i.e. uniformly bounded strategies) is not a mere technical simplification. For example, in a one-period framework, the measures converge to in every (adapted) Wasserstein distance. However, we have for small
[TABLE]
where is the set of all bounded trading strategies.
Acknowledgements
All authors are grateful to the anonymous referees whose insightful comments had a significant impact on this article. J. Backhoff gratefully acknowledges financial support by the FWF through grant P30750 and by the Vienna University of Technology. D. Bartl has been funded by the Austrian Science Fund (FWF) under Project P28661. M. Beiglboeck and M. Eder gratefully acknowledge financial support by the FWF through grant Y782.
References
- [1] B. Acciaio, J. Backhoff-Veraguas, and A. Zalashko. Causal optimal transport and its links to enlargement of filtrations and continuous-time stochastic optimization. Forthcoming at Stoch. Processes and their Applications, 2016.
- [2] D. J. Aldous. Weak convergence and general theory of processes. Unpublished monograph; Department of Statistics, University of California, Berkeley, CA 94720, July 1981.
- [3] A. Alfonsi, J. Corbetta, and B. Jourdain. Sampling of one-dimensional probability measures in the convex order and computation of robust option price bounds. International Journal of Theoretical and Applied Finance, 22(03):1950002, 2019.
- [4] J.-J. Alibert, G. Bouchitte, and T. Champion. A new class of cost for optimal transport planning. hal-preprint, 2018.
- [5] M. Avellaneda, A. Levy, and A. Paràs. Pricing and hedging derivative securities in markets with uncertain volatilities. Appl. Math. Finance, 2(2):73–88, 1995.
- [6] J. Backhoff-Veraguas, D. Bartl, M. Beiglböck, and M. Eder. All Adapted Topologies are Equal. arXiv e-prints, page arXiv:1905.00368, May 2019.
- [7] J. Backhoff-Veraguas, D. Bartl, M. Beiglböck, and J. Wiesel. Estimating processes in adapted Wasserstein distance. arXiv e-prints, 2020.
- [8] J. Backhoff-Veraguas, M. Beiglböck, M. Eder, and A. Pichler. Fundamental properties of process distances. arXiv e-prints, 2017.
