All Adapted Topologies are Equal
Julio Backhoff-Veraguas, Daniel Bartl, Mathias Beiglböck, Manu Eder

TL;DR
This paper demonstrates that various topologies on the set of stochastic process laws, developed for different purposes, are actually equivalent in finite discrete time, unifying their theoretical framework.
Contribution
It proves that all these different adapted topologies coincide in finite discrete time, providing a unified understanding of their structure and properties.
Findings
All adapted topologies are equivalent in finite discrete time.
The weak adapted topology is characterized by continuity of optimal stopping problems.
Different approaches to defining topologies on stochastic laws unify under this framework.
All adapted topologies are equal
Julio Backhoff-Veraguas, Daniel Bartl, Mathias Beiglböck and Manu Eder
Department of Mathematics, University of Vienna, Austria
Abstract.
A number of researchers have introduced topological structures on the set of laws of stochastic processes. A unifying goal of these authors is to strengthen the usual weak topology in order to adequately capture the temporal structure of stochastic processes.
Aldous defines an extended weak topology based on the weak convergence of prediction processes. In the economic literature, Hellwig introduced the information topology to study the stability of equilibrium problems. Bion-Nadal and Talay introduce a version of the Wasserstein distance between the laws of diffusion processes. Pflug and Pichler consider the nested distance (and the weak nested topology) to obtain continuity of stochastic multistage programming problems. These distances can be seen as a symmetrization of Lassalle’s causal transport problem, but there are also further natural ways to derive a topology from causal transport.
Our main result is that all of these seemingly independent approaches define the same topology in finite discrete time. Moreover we show that this ‘weak adapted topology’ is characterized as the coarsest topology that guarantees continuity of optimal stopping problems for continuous bounded reward functions.
J. Backhoff-Veraguas gratefully acknowledges financial support from the Austrian Science Fund (FWF) under grant P30750. D. Bartl has been funded by the Vienna Science and Technology Fund (WWTF) through projects VRG17-005 and MA16-021, as well as by the Austrian Science Fund (FWF) through project P28661. M. Beiglboeck gratefully acknowledges financial support by the FWF through grant Y782. M. Eder gratefully acknowledges financial support by the FWF through grant Y782 and by the WWTF through project MA16-021.
Keywords: Aldous’ extended weak topology, Hellwig’s information topology, nested distance, causal optimal transport, stability of optimal stopping, Vershik’s iterated Kantorovich distance
1. Introduction
1.1. Outline
If some type of natural phenomenon is modelled through a stochastic process, one might expect that the model does not describe reality in an entirely accurate way. To be able to study the impact of such inaccuracies on the problems one is trying to solve, it makes sense to equip the set of laws of stochastic processes with a suitable notion of distance or topology.
Denoting by $\mathcal{X} := \mathsf{X}^N$ the path space (where $\mathsf{X}$ is some Polish space and $N \in \mathbb{N}$), the set of laws of stochastic processes is $\mathcal{P}(\mathcal{X})$, i.e. the set of probability measures on $\mathcal{X}$.
Clearly, $\mathcal{P}(\mathcal{X})$ carries the usual weak topology. However, this topology does not respect the time evolution of stochastic processes, which has a number of potentially inconvenient consequences: e.g., problems of optimal stopping / utility maximization / stochastic programming are not continuous, arbitrary processes can be approximated by processes which are deterministic after the first period, etc. In the following we describe a number of approaches which have been developed by different authors to deal with these (and related) problems. Our main result (Theorem 1.2) is that all of these approaches actually define the same topology in the present discrete time setup. Moreover, this topology is the weakest topology which allows for continuity of optimal stopping problems.
1.2. Adapted Wasserstein distances, nested distance
A number of authors have independently introduced variants of the Wasserstein distance which take the temporal structure of processes into account: the definition of ‘iterated Kantorovich distance’ by Vershik [60, 61] might be seen as a first construction in this direction. The topic is also considered by Rüschendorf [58]. Independently, Pflug and Pflug–Pichler [52, 56, 53, 54, 55, 30] introduce the nested distance and describe the concept’s rich potential for the approximation of stochastic multi-period optimization problems. Lassalle [46] considers the ‘causal transport problem’ that leads to a corresponding notion of distance. Once again independently of these developments, Bion-Nadal and Talay [16] define an adapted version of the Wasserstein distance between laws of solutions to SDEs. Gigli [28, Chapter 4] introduces a similar distance for measures whose first marginal agrees, see also [4, Section 12.4].
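To set the stage for describing these ‘adapted’ variants, let us fix $p \in [1,\infty)$ and recall the definition of the usual $p$-Wasserstein distance.
$(\mathsf{X}, \rho_{\mathsf{X}})$ is now a Polish metric space. On $\mathcal{X} = \mathsf{X}^N$ we use the Polish metric $\rho_{\mathcal{X}}((x_t)_t, (y_t)_t) := \big(\sum_{t=1}^N \rho_{\mathsf{X}}(x_t, y_t)^p\big)^{1/p}$. Typically, when clear from the context we will omit the subscript for the metric. We use $(X_t)_{t=1}^N$ to denote the canonical process on $\mathcal{X}$, i.e. $X_t$ is the projection onto the $t$-th factor of $\mathcal{X} = \mathsf{X}^N$. On $\mathcal{X} \times \mathcal{X}$ call $X = (X_t)_t$ the projection on the first factor and call $Y = (Y_t)_t$ the projection on the second factor. For $\mu, \nu \in \mathcal{P}(\mathcal{X})$ we denote by $\mathrm{Cpl}(\mu,\nu)$ the set of probability measures $\pi$ on $\mathcal{X} \times \mathcal{X}$ for which $X \sim \mu$ and $Y \sim \nu$ under $\pi$, i.e. for which the distribution of $X$ under $\pi$ is $\mu$ and that of $Y$ under $\pi$ is $\nu$. In applications, a particular role is played by Monge couplings. A Monge coupling from $\mu$ to $\nu$ is a coupling $\pi$ for which $Y = T(X)$ $\pi$-a.s. for some Borel mapping $T : \mathcal{X} \to \mathcal{X}$ that transports $\mu$ to $\nu$, i.e. satisfies $T_\#\mu = \nu$.
For $\mu, \nu \in \mathcal{P}_p(\mathcal{X})$, i.e. for probability measures on $\mathcal{X}$ with finite $p$-th moment, their $p$-Wasserstein distance is
$$\mathcal{W}_p(\mu,\nu) := \inf_{\pi \in \mathrm{Cpl}(\mu,\nu)} \Big( \int \rho_{\mathcal{X}}(x,y)^p \, d\pi(x,y) \Big)^{1/p}. \tag{1}$$
Following [57], the infimum in (1) remains unchanged in many situations if one minimizes only over Monge couplings.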
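As a small numerical illustration of the distance in (1), the following sketch computes the $p$-Wasserstein distance between two empirical measures on the real line with the same number of equally weighted atoms. It relies on the classical fact that, on the real line and for the cost $|x-y|^p$ with $p \ge 1$, the monotone coupling (which is a Monge coupling here) is optimal. The function name `wasserstein_1d` is ours and does not appear in the paper.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """p-Wasserstein distance between the uniform measures on the atoms xs and ys."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    # On the real line the monotone coupling (i-th smallest atom of xs is sent
    # to the i-th smallest atom of ys) is optimal for the cost |x - y|**p, p >= 1.
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return cost ** (1.0 / p)


# Shifting every atom by 0.5 moves the measure by a distance of 0.5.
print(wasserstein_1d([0.0, 1.0, 2.0], [0.5, 1.5, 2.5], p=2))
```

This shortcut is specific to one-dimensional state spaces; on the path space $\mathsf{X}^N$ with $N \ge 2$ the infimum in (1) is a genuine optimization problem.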
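To motivate the formal definition of the adapted cousins in (5) and (6) below, we start with an informal discussion in terms of Monge mappings: In probabilistic terms, the preservation of mass assumption asserts
$$(Y_1, \dots, Y_N) = \big(T_1(X_1, \dots, X_N), \dots, T_N(X_1, \dots, X_N)\big) \sim \nu \quad \text{for } X \sim \mu, \tag{2}$$
which ignores the evolution of $X$ and $Y$ (resp.) in time. Rather it would appear more natural to restrict to mappings $T = (T_t)_{t=1}^N$ which are adapted in the sense that $T_t$ depends only on $X_1, \dots, X_t$. Adapted Wasserstein distances can be defined following precisely this intuition, relying on a suitable version of adaptedness on the level of couplings:
The set $\mathrm{Cpl}_c(\mu,\nu)$ of causal couplings (see Footnote 1) consists of all couplings $\pi \in \mathrm{Cpl}(\mu,\nu)$ of $\mu$ and $\nu$ such that
$$\pi\big((Y_1, \dots, Y_t) \in A \mid X_1, \dots, X_N\big) = \pi\big((Y_1, \dots, Y_t) \in A \mid X_1, \dots, X_t\big) \tag{3}$$
for all $t \le N$ and $A \subseteq \mathsf{X}^t$ measurable, cf. [46]. The set $\mathrm{Cpl}_{bc}(\mu,\nu)$ of all bi-causal couplings consists of all $\pi \in \mathrm{Cpl}_c(\mu,\nu)$ such that the distribution of $(Y, X)$ under $\pi$ is also in $\mathrm{Cpl}_c(\nu,\mu)$, i.e. that (3) also holds with the roles of $X$ and $Y$ reversed.
Footnote 1: Intuitively, at time $t$, given the past $(X_1, \dots, X_t)$ of $X$, the distribution of $Y_t$ does not depend on the future $(X_{t+1}, \dots, X_N)$ of $X$. For measures $\mu$ such that the first marginal of $\mu$ has no atoms, the weak closure of the set of adapted Monge couplings, i.e. of those $\pi$ for which $Y = T(X)$ $\pi$-a.s. for $T$ adapted, is precisely the set of all causal couplings, see [44].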
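The term causal was introduced by Lassalle [46], who considers a causal transport problem in which the usual set of couplings is replaced by the set of causal couplings. The resulting concept is not actually a metric as it lacks symmetry, but, as suggested by Soumik Pal, this is easily mended, and we formally define the causal and the symmetrized-causal $p$-Wasserstein distance, as well as the adapted $p$-Wasserstein distance built on bi-causal couplings, as follows:
For $\mu, \nu \in \mathcal{P}_p(\mathcal{X})$ set
$$\mathcal{CW}_p(\mu,\nu) := \inf_{\pi \in \mathrm{Cpl}_c(\mu,\nu)} \Big( \int \rho_{\mathcal{X}}(x,y)^p \, d\pi(x,y) \Big)^{1/p}, \tag{4}$$
$$\mathcal{SCW}_p(\mu,\nu) := \max\big( \mathcal{CW}_p(\mu,\nu), \mathcal{CW}_p(\nu,\mu) \big), \tag{5}$$
$$\mathcal{AW}_p(\mu,\nu) := \inf_{\pi \in \mathrm{Cpl}_{bc}(\mu,\nu)} \Big( \int \rho_{\mathcal{X}}(x,y)^p \, d\pi(x,y) \Big)^{1/p}. \tag{6}$$
Rüschendorf [58] refers to $\mathcal{AW}_p$ as ‘modified Wasserstein distance’. Pflug-Pichler [52, Definition 1] use the names multi-stage distance of order $p$ and nested distance. It can also be considered as a discrete time version of the ‘Wasserstein-type distance’ of Bion-Nadal and Talay [16]. In [5] we use a slightly modified definition of $\mathcal{AW}_p$ which scales better with the number of time-periods $N$ but leads to an equivalent metric (for fixed $N$ and $p$). We shall discuss further properties of $\mathcal{AW}_p$ (and in particular the connection with Vershik’s iterated Kantorovich distance) in Section 1.8 below.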
1.3. Hellwig’s information topology
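The information topology introduced by Hellwig in [31] (as well as Aldous’ extended weak topology, which we discuss next) is based on the idea that an essential part of the structure of a process is the information that we may deduce about the future behaviour of the process given its behaviour up to the current time $t$. For a process whose law is $\mu$, this information is captured by the conditional law of $(X_{t+1}, \dots, X_N)$ given $(X_1, \dots, X_t)$ under $\mu$, i.e. by
$$\mu_{x_1, \dots, x_t} := \mathcal{L}^{\mu}\big(X_{t+1}, \dots, X_N \mid X_1 = x_1, \dots, X_t = x_t\big).$$
The kernel $(x_1, \dots, x_t) \mapsto \mu_{x_1, \dots, x_t}$ is also the disintegration of $\mu$ w.r.t. the first $t$ coordinates.
Hellwig’s information topology is the initial topology w.r.t. a family of maps $(\mathcal{I}_t)_{t=1}^{N-1}$ which are defined based on these disintegrations:
$$\mathcal{I}_t : \mathcal{P}(\mathcal{X}) \to \mathcal{P}\big(\mathsf{X}^t \times \mathcal{P}(\mathsf{X}^{N-t})\big), \qquad \mathcal{I}_t(\mu) := \big( (x_1, \dots, x_N) \mapsto (x_1, \dots, x_t, \mu_{x_1, \dots, x_t}) \big)_\# \mu.$$
Equivalently, $\mathcal{I}_t(\mu)$ is the joint law of
$$\big(X_1, \dots, X_t, \ \mathcal{L}^{\mu}(X_{t+1}, \dots, X_N \mid X_1, \dots, X_t)\big)$$
under $\mu$, and Hellwig’s information topology is therefore the coarsest topology which makes continuous for all $1 \le t \le N-1$ the maps which send a probability $\mu$ to the joint law describing the evolution of the coordinate process up to time $t$ and the prediction about the future behaviour of the coordinate process after $t$.
Remark 1.1.
All the topologies we consider in this paper are second countable. As such they can be characterized by saying which sequences converge. Restated in the language of sequences, the above definition says that a sequence $(\mu_n)_n$ in $\mathcal{P}(\mathcal{X})$ converges in Hellwig’s information topology to $\mu$ if and only if, for every $1 \le t \le N-1$, the sequence $(\mathcal{I}_t(\mu_n))_n$ converges to $\mathcal{I}_t(\mu)$ in the usual weak topology on $\mathcal{P}\big(\mathsf{X}^t \times \mathcal{P}(\mathsf{X}^{N-t})\big)$.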
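For finitely supported laws, the map that sends a law to the joint law of ‘history up to time $t$’ and ‘prediction about the future’ can be written down explicitly. The following sketch does this under the assumption that a law is stored as a dictionary from paths to exact probabilities; the name `information_map` and the two example laws are ours, chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def information_map(law, t):
    """Joint law of (X_1..X_t, conditional law of X_{t+1}..X_N given X_1..X_t).

    `law` maps paths (tuples of length N) to probabilities. The result maps
    pairs (history, prediction) to probabilities, where a prediction is a
    frozenset of (future, conditional probability) pairs.
    """
    mass = defaultdict(Fraction)                          # mass of each history
    futures = defaultdict(lambda: defaultdict(Fraction))  # joint mass of history and future
    for path, prob in law.items():
        history, future = path[:t], path[t:]
        mass[history] += prob
        futures[history][future] += prob
    image = {}
    for history, m in mass.items():
        prediction = frozenset((f, q / m) for f, q in futures[history].items())
        image[(history, prediction)] = m
    return image


half = Fraction(1, 2)
eps = Fraction(1, 100)
mu_eps = {(eps, 1): half, (-eps, -1): half}   # X_1 already reveals the sign of X_2
nu = {(0, 1): half, (0, -1): half}            # X_1 carries no information

for name, law in [("mu_eps", mu_eps), ("nu", nu)]:
    for (history, prediction), m in information_map(law, 1).items():
        print(name, history, sorted(prediction), m)
```

As `eps` tends to $0$ the laws `mu_eps` converge weakly to `nu`, but the predictions under `mu_eps` remain Dirac masses while the prediction under `nu` is a fair coin flip. The images under the map at $t = 1$ therefore do not converge, which is the kind of behaviour the information topology is designed to detect.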
The work of Hellwig [31] was motivated by questions of stability in dynamic economic models/games; see the related articles [40, 59, 32, 11].
1.4. Aldous’ extended weak topology
Aldous [3] introduces a type of convergence for pairs of filtrations and continuous time stochastic processes on them that he calls extended weak convergence [3, Definition 15.2]. Restricted to our current setting, his definition can be paraphrased in a similar manner as that of the information topology. Aldous’ idea is to represent a stochastic process with law $\mu$ through the associated prediction process (see Footnote 2), that is, the process $Z = (Z_t)_{t=1}^N$ given by
$$Z_t := \mathcal{L}^{\mu}\big(X \mid X_1, \dots, X_t\big).$$
That is, $Z$ is a measure-valued martingale that makes increasingly accurate predictions about the full trajectory of the process $X$.
Rather than comparing the laws of processes directly, the extended weak topology is derived from the weak topology on the corresponding prediction processes (plus the original processes). Formally, the extended weak topology on $\mathcal{P}(\mathcal{X})$ is the initial topology w.r.t. the map
$$\mathcal{E} : \mathcal{P}(\mathcal{X}) \to \mathcal{P}\big(\mathcal{X} \times \mathcal{P}(\mathcal{X})^N\big)$$
which sends $\mu$ to the joint distribution of
$$(X, Z) = (X_1, \dots, X_N, Z_1, \dots, Z_N)$$
under $\mu$.
Note that, to stay faithful to Aldous’ original definition, we defined $\mathcal{E}$ to map not just to the law of the prediction process but to the joint law of the original process and its prediction process. One easily checks that the original process may be omitted in our setting without changing the resulting topology.
Footnote 2: The definition of the prediction process goes back at least to Knight [41].
1.5. The optimal stopping topology
The usual weak topology on $\mathcal{P}(\mathcal{X})$ is the coarsest topology which makes continuous all the functions
$$\mu \mapsto \int f \, d\mu$$
for $f : \mathcal{X} \to \mathbb{R}$ continuous and bounded.
One may follow a similar pattern and look at the coarsest topology which makes continuous the outcomes of all sequential decision procedures. Perhaps the easiest way to formalize this is to look at optimal stopping problems. In detail, write $AC(\mathcal{X})$ for the set of all processes $L = (L_t)_{t=1}^N$ which are adapted, bounded and satisfy that $x \mapsto L_t(x)$ is continuous for each $t \le N$. Write $v^L(\mu)$ for the corresponding value function, given that the process $X$ follows the law $\mu$, i.e.
$$v^L(\mu) := \inf\big\{ \mathbb{E}_\mu[L_\tau] : \tau \le N \text{ is a stopping time} \big\}.$$
The optimal stopping topology on $\mathcal{P}(\mathcal{X})$ is the coarsest topology which makes the functions
$$\mu \mapsto v^L(\mu)$$
continuous for all $L \in AC(\mathcal{X})$.
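The value of an optimal stopping problem under a finitely supported law can be computed by backward induction (the Snell envelope). The sketch below assumes that the value is a minimum of expected costs over stopping times (the maximization variant is analogous, replace the cost by its negative) and that a law is stored as a dictionary from paths to probabilities summing to one. The names `stopping_value`, `reward`, `mu_eps` and `nu` are ours.

```python
def stopping_value(law, reward):
    """Minimum over stopping times tau of E[reward(tau, X_1..X_tau)].

    `law` maps paths (tuples of equal length N) to probabilities;
    `reward(t, history)` is the cost of stopping at time t (1-based)
    after observing `history` = (x_1, ..., x_t).
    """
    n_steps = len(next(iter(law)))

    def value(t, history):
        stop = reward(t, history)
        if t == n_steps:
            return stop
        # Paths that extend the observed history, with their (unnormalized) mass.
        paths = {x: q for x, q in law.items() if x[:t] == history}
        total = sum(paths.values())
        successors = {x[: t + 1] for x in paths}
        cont = sum(
            sum(q for x, q in paths.items() if x[: t + 1] == h) / total * value(t + 1, h)
            for h in successors
        )
        return min(stop, cont)  # Snell envelope: stop now or continue

    firsts = {x[:1] for x in law}
    return sum(sum(q for x, q in law.items() if x[:1] == h) * value(1, h) for h in firsts)


def reward(t, history):
    # Stopping at time 1 costs 0, stopping at time 2 costs X_2 truncated to [-1, 1].
    return 0.0 if t == 1 else max(-1.0, min(1.0, history[-1]))


eps = 0.01
mu_eps = {(eps, 1.0): 0.5, (-eps, -1.0): 0.5}  # X_1 reveals the sign of X_2
nu = {(0.0, 1.0): 0.5, (0.0, -1.0): 0.5}       # X_1 reveals nothing

print(stopping_value(mu_eps, reward), stopping_value(nu, reward))
```

Under `mu_eps` the decision maker knows at time 1 whether waiting pays off, so the value is $-1/2$ for every `eps > 0`, whereas under the weak limit `nu` the value is $0$. This is a concrete instance of the discontinuity of optimal stopping w.r.t. the usual weak topology mentioned in Section 1.1.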
1.6. Main result
We can now state our main result:
Theorem 1.2.
Let $(\mathsf{X}, \rho_{\mathsf{X}})$ be a Polish metric space, where $\rho_{\mathsf{X}}$ is a bounded metric, and set $\mathcal{X} := \mathsf{X}^N$. Then the following topologies on $\mathcal{P}(\mathcal{X})$ are equal:
(1) the topology induced by the adapted Wasserstein distance $\mathcal{AW}_p$,
(2) the topology induced by the symmetrized-causal Wasserstein distance $\mathcal{SCW}_p$,
(3) Hellwig’s information topology,
(4) Aldous’ extended weak topology,
(5) the optimal stopping topology.
The assumption that $\rho_{\mathsf{X}}$ is bounded serves only to simplify the statement of the theorem, because in this case the topology induced by the Wasserstein distance $\mathcal{W}_p$ coincides with the weak topology. For every Polish space there is a bounded complete metric which induces the topology (given any complete metric $\rho$, replace it by e.g. $\min(1, \rho)$).
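1.6.1. $p$-Wasserstein and unbounded metrics
There is an analogous statement, Theorem 1.3 below, which drops the assumption that the metric on $\mathsf{X}$ is bounded. To be able to state it, we introduce slight variations of Hellwig’s information topology, of Aldous’ extended weak topology and of the optimal stopping topology:
In [31] Hellwig equips the target spaces of the maps $\mathcal{I}_t$ defining the information topology with the weak topology – or more precisely he equips $\mathcal{P}(\mathsf{X}^{N-t})$ with the weak topology, $\mathsf{X}^t \times \mathcal{P}(\mathsf{X}^{N-t})$ with the product topology and finally $\mathcal{P}\big(\mathsf{X}^t \times \mathcal{P}(\mathsf{X}^{N-t})\big)$ with the weak topology based on this topology. One may easily define a $p$-Wasserstein version of Hellwig’s information topology by using the recipe ‘replace the weak topology by the $p$-Wasserstein metric everywhere’. Concretely, if we restrict $\mathcal{I}_t$ to $\mathcal{P}_p(\mathcal{X})$, we may view it as a map into $\mathcal{P}_p\big(\mathsf{X}^t \times \mathcal{P}_p(\mathsf{X}^{N-t})\big)$, where the last space carries the $p$-Wasserstein metric built on the product metric
$$\big((x, \alpha), (y, \beta)\big) \mapsto \big( \rho_{\mathsf{X}^t}(x, y)^p + \mathcal{W}_p(\alpha, \beta)^p \big)^{1/p}.$$
We will call the resulting variant of Hellwig’s information topology on $\mathcal{P}_p(\mathcal{X})$ the $p$-information topology.
Similarly, one may systematically replace every occurrence of the weak topology in the definition of the extended weak topology by the $p$-Wasserstein metric. We call the resulting topology on $\mathcal{P}_p(\mathcal{X})$ the extended $p$-topology.
Just like the weak topology is the coarsest topology which makes integration of continuous bounded functions continuous, the $p$-Wasserstein topology is the coarsest topology which makes integration of continuous functions bounded by $x \mapsto c\,(1 + \rho_{\mathcal{X}}(x_0, x)^p)$ continuous. Following this analogy, we define $AC_p(\mathcal{X})$ as the set of all processes $L = (L_t)_{t=1}^N$ which are adapted, bounded by $x \mapsto c\,(1 + \rho_{\mathcal{X}}(x_0, x)^p)$ for some $x_0 \in \mathcal{X}$, $c > 0$, and satisfy that $x \mapsto L_t(x)$ is continuous for each $t \le N$.
The $p$-optimal stopping topology on $\mathcal{P}_p(\mathcal{X})$ is the coarsest topology which makes the optimal stopping value functions
$$\mu \mapsto v^L(\mu)$$
continuous for all $L \in AC_p(\mathcal{X})$.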
With these we may state the following generalization of Theorem 1.2:
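Theorem 1.3.
Let $(\mathsf{X}, \rho_{\mathsf{X}})$ be a Polish metric space and set $\mathcal{X} := \mathsf{X}^N$. Then the following topologies on $\mathcal{P}_p(\mathcal{X})$ are equal:
(1) the topology induced by the adapted Wasserstein distance $\mathcal{AW}_p$,
(2) the topology induced by the symmetrized-causal Wasserstein distance $\mathcal{SCW}_p$,
(3) the $p$-information topology,
(4) the extended $p$-topology,
(5) the $p$-optimal stopping topology.
Clearly, one recovers Theorem 1.2 from Theorem 1.3 by choosing a bounded metric $\rho_{\mathsf{X}}$ on $\mathsf{X}$, because the $p$-information topology for bounded $\rho_{\mathsf{X}}$ is just the information topology, the extended $p$-topology for bounded $\rho_{\mathsf{X}}$ is just the extended weak topology and the $p$-optimal stopping topology for bounded $\rho_{\mathsf{X}}$ is just the optimal stopping topology.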
The relationship between the topologies listed in Theorem 1.2 and those listed in Theorem 1.3 is similar to the non-adapted case where we know that usual $p$-Wasserstein convergence is equivalent to usual weak convergence plus convergence of the $p$-th moments.
Lemma 1.3.
Convergence in any of the topologies of Theorem 1.3 is equivalent to convergence in any of the topologies of Theorem 1.2 (where for building the adapted and the symmetrized-causal Wasserstein distances, the metric $\rho$ is replaced by a bounded compatible complete metric, e.g. $\min(1, \rho)$) plus convergence of $p$-th moments on $\mathcal{X}$ w.r.t. (the original) $\rho$.
We prove Lemma 1.3 in Section 6, making use of (parts of) Theorem 1.2 and Theorem 1.3.
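The role of the moment condition is already visible in the non-adapted, one-dimensional case. The sketch below considers the measures $\mu_n = (1 - 1/n)\,\delta_0 + (1/n)\,\delta_n$ on the real line (an example of our choosing, not taken from the paper): integrals of a bounded continuous test function converge to the value at $0$, while the first moment and hence the $1$-Wasserstein distance to $\delta_0$ stays equal to $1$. The helper names are ours, and a single test function is of course only indicative of weak convergence.

```python
import math


def integrate(atoms, f):
    """Integral of f against a finitely supported measure given as {point: mass}."""
    return sum(mass * f(x) for x, mass in atoms.items())


def w1_to_dirac_at_zero(atoms):
    """W_1 distance to the Dirac mass at 0.

    The only coupling with a Dirac mass is the product coupling,
    so the transport cost is simply E|X|.
    """
    return integrate(atoms, abs)


for n in (10, 100, 1000):
    mu_n = {0.0: 1.0 - 1.0 / n, float(n): 1.0 / n}
    bounded_test = integrate(mu_n, math.atan)  # tends to atan(0) = 0
    print(n, bounded_test, w1_to_dirac_at_zero(mu_n))
```

The printed integrals of `math.atan` shrink to $0$ while the last column stays at $1$ up to rounding, so $\mu_n \to \delta_0$ weakly but not in $\mathcal{W}_1$: the first moments fail to converge.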
1.7. Further remarks on related work
1.7.1. Some further articles of successors of Aldous
One of the original applications of Aldous’ weak extended topology concerned the stability of optimal stopping [3]. This corresponds to one half of (4)=(5) in Theorem 1.2, but in a much more general setting. This line of work has been continued by Lamberton and Pagès [45], Coquet and Toldo [20], among others.
Aldous’ extended weak topology was also inspiring and instrumental for the development of the theory of convergence of filtrations, and the associated questions of stability of the martingale representation property and Doob-Meyer decompositions. In this regard, see the works by Hoover et al. [37, 35] and by Mémin et al. [19, 48]. The related question of stability of stochastic differential equations (as well as their backwards version) with respect to the driving noise has particularly seen a burst of activity in the last two decades. For brevity’s sake we only refer to the recent article by Papapantoleon, Possamaï, and Saplaouras [50] for an overview of the many available works in this direction.
1.7.2. Previous applications of adapted Wasserstein distances.
Pflug, Pichler and co-authors [52, 56, 53, 54, 55, 30] have extensively developed and applied the notion of nested distances for the purpose of scenario generation, stability, sensitivity bounds, and distributionally robust stochastic optimization, in the context of operations research.
Acciaio, Zalashko, and one of the present authors consider in [2] the adapted Wasserstein distance in continuous time in connection with utility maximization, enlargement of filtrations and optimal stopping.
Causal couplings have appeared in the work by Yamada and Watanabe [62], Jacod and Mémin [38] as well as Kurtz [42, 43], concerning weak solutions of stochastic differential equations, and by Rüschendorf [58] concerning approximation theorems in probability theory. The term ‘causal’ is first used by Lassalle [46], who uses it in an additional constraint for the transport problem and gives an alternative derivation of the Talagrand inequality for the Wiener measure. Causal couplings are also present in the numerical scheme suggested in [1] for (extended mean-field) stochastic control.
The article [7] connects adapted Wasserstein distance (in continuous time) to martingale optimal transport (cf. [34, 13, 27, 23, 17, 33, 18, 12, 14] among many others). Several familiar objects appear as solutions to variational problems in this context. E.g. geometric Brownian motion is the martingale which is closest in adapted Wasserstein distance to usual Brownian motion subject to having a log-normal distribution at the terminal time-point, and the local volatility model is closest to Brownian motion subject to matching one-dimensional marginals.
Bion-Nadal and Talay [16] introduce an adapted Wasserstein-type distance on the set of diffusion SDEs and show that this distance corresponds to the computation of a tractable stochastic control problem. They also apply their results to the problem of fitting diffusion models to given marginals.
In [5] the present authors consider adapted Wasserstein distances in relation to stability in finance: Lipschitz continuity of utility maximization/hedging is established w.r.t. the underlying models in discrete and continuous time.
1.8. Another formulation of the adapted Wasserstein distance and of Hellwig’s information topology
Here we give an alternative formulation of the adapted Wasserstein distance / nested distance due to Pflug and Pichler.
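Again, $\mathsf{X}$ is a Polish space and $\rho$ is a compatible metric on $\mathsf{X}$. For laws $\mu, \nu$ on $\mathsf{X}^N$ write $\mu_1, \nu_1$ for their first marginals and $\mu^{t+1}_{x_1, \dots, x_t}$ for the conditional law of $X_{t+1}$ given $X_1 = x_1, \dots, X_t = x_t$ under $\mu$ (analogously for $\nu$). Starting with $V_N^p := 0$ we define
$$V_t^p(x_1, \dots, x_t, y_1, \dots, y_t) := \inf_{\gamma \in \mathrm{Cpl}(\mu^{t+1}_{x_1, \dots, x_t}, \, \nu^{t+1}_{y_1, \dots, y_t})} \int \Big( V_{t+1}^p(x_1, \dots, x_{t+1}, y_1, \dots, y_{t+1}) + \rho(x_{t+1}, y_{t+1})^p \Big) \, d\gamma(x_{t+1}, y_{t+1}).$$
The nested distance is finally obtained in a backwards recursive way by
$$V_0^p := \inf_{\gamma \in \mathrm{Cpl}(\mu_1, \nu_1)} \int \Big( V_1^p(x_1, y_1) + \rho(x_1, y_1)^p \Big) \, d\gamma(x_1, y_1).$$
Then the adapted Wasserstein distance of $\mu$ and $\nu$ satisfies $\mathcal{AW}_p(\mu, \nu)^p = V_0^p$. We refer to [8] for the (straightforward) justification.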
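The backward recursion can be carried out by hand in a two-period example. The sketch below is restricted, by assumption, to real-valued processes with $N = 2$ for which the second law has a deterministic first coordinate, so that the only coupling of the first marginals is the product coupling and no outer optimization is needed. The inner one-dimensional transport problems are solved with the monotone coupling. All function and variable names are ours.

```python
def wp_pow_1d(a, b, p):
    """W_p^p between finitely supported probability measures on the real line ({atom: mass})."""
    xs, ys = sorted(a.items()), sorted(b.items())
    i = j = 0
    left_x, left_y = xs[0][1], ys[0][1]  # mass still to ship from / to the current atoms
    cost = 0.0
    while True:
        moved = min(left_x, left_y)      # monotone (quantile) coupling
        cost += moved * abs(xs[i][0] - ys[j][0]) ** p
        left_x -= moved
        left_y -= moved
        if left_x <= 1e-12:
            i += 1
            if i == len(xs):
                break
            left_x = xs[i][1]
        if left_y <= 1e-12:
            j += 1
            if j == len(ys):
                break
            left_y = ys[j][1]
    return cost


def adapted_wp_two_period(mu, nu, p=1.0):
    """Adapted Wasserstein distance of two-period laws when nu has a deterministic X_1."""
    first_nu = {y[0] for y in nu}
    if len(first_nu) != 1:
        raise ValueError("sketch only covers a Dirac first marginal for nu")
    y1 = first_nu.pop()
    nu_kernel = {y[1]: q for y, q in nu.items()}  # law of the second coordinate under nu
    total = 0.0
    for x1 in {x[0] for x in mu}:
        m = sum(q for x, q in mu.items() if x[0] == x1)
        mu_kernel = {x[1]: q / m for x, q in mu.items() if x[0] == x1}
        # cost at time 1 plus optimal cost between the conditional laws at time 2
        total += m * (abs(x1 - y1) ** p + wp_pow_1d(mu_kernel, nu_kernel, p))
    return total ** (1.0 / p)


eps = 0.01
mu_eps = {(eps, 1.0): 0.5, (-eps, -1.0): 0.5}
nu = {(0.0, 1.0): 0.5, (0.0, -1.0): 0.5}
print(adapted_wp_two_period(mu_eps, nu, p=1.0))
```

For $p = 1$ the result is $1 + \varepsilon$, whereas the coupling that pairs $(\varepsilon, 1)$ with $(0, 1)$ and $(-\varepsilon, -1)$ with $(0, -1)$ shows that the usual $1$-Wasserstein distance is at most $\varepsilon$. That coupling is not bi-causal, because it uses the first coordinate under `mu_eps` to predict the second coordinate under `nu`.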
For the adapted Wasserstein distance is not complete. As was established in [6], a natural complete space into which embeds is given by the space of nested distributions:
Consider the sequence of metric spaces
[TABLE]
where at each stage , the space is endowed with the -Wasserstein distance with respect to the metric on , which we denote by . The space of nested distributions (of depth ) is defined as . We endow with the complete metric .
The space of nested distributions was defined by Pflug [51]. Notably the idea to iterate the formation of Wasserstein spaces and metrics goes back to Vershik [60, 61] who uses the name ‘iterated Kantorovich distance’. The main interest of Vershik (and his successors) lies in the classification of filtrations (in the language of ergodic theory). We refer to the work of Emery and Schachermayer [25] for a survey from a probabilistic perspective and to Janvresse, Laurent and de la Rue [39] for a contemporary article (again from a probabilistic viewpoint).
is naturally embedded in the set of nested distributions of depth through the map given by
[TABLE]
where is a vector with law , again denotes (conditional) law and we use as a shorthand for the vector .
Following [6], we have:
Theorem 1.4.
The map defined in (12) embeds the metric space isometrically into the complete separable metric space .
Remark 1.5.
When has no isolated points, is actually the completion of , i.e. considered as a subset of is dense.
1.8.1. Hellwig’s information topology in terms of adapted Wasserstein distances
We note that Hellwig’s definition of the information topology can also be rephrased using the concept of adapted Wasserstein distance: Assume that is a bounded metric and for , set
[TABLE]
I.e. for each , we consider as the product of two Polish spaces (which one might consider as ‘history’ and ‘future’). Extending the definition of in the obvious way to products of not necessarily equal Polish spaces, we can then equip with a one period adapted Wasserstein distance . Setting for
[TABLE]
we obtain a compatible metric for the information topology. This is relatively straightforward (whereas the full version of Theorem 1.2 is not straightforward as far as we are concerned).
1.9. Preservation of Compactness
We close this section with a result about the preservation of relative compactness which we shall use in Sections 4 and 6, but which also might be of independent interest. Specifically, in [9, 10] the two-step version of Lemma 1.6 is used as a crucial tool in the investigation of the weak transport problem.
A more detailed investigation of compactness in with the weak adapted topology is the topic of the companion paper to this one, [24].
Assume for simplicity that is a bounded metric. Then we have
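Lemma 1.6 (Compactness lemma).
A set $A \subseteq \mathcal{P}(\mathcal{X})$ is relatively compact w.r.t. the usual weak topology iff its image under the embedding of Section 1.8 into the space of nested distributions is relatively compact.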
We note that Lemma 1.6 is essentially a consequence of the characterization of compact subsets in ; in a somewhat different framework it was first proved in [36]. The version stated here follows by repeated application of [24, Lemma 3.3]/[9, Lemma 2.6].
The implication that relatively compact implies relatively compact is rather easy to see, but the other direction that relatively compact implies relatively compact is nontrivial since the mapping is not continuous when is endowed with the usual weak topology (except for trivial cases). Lemma 1.6 would not be true if we were to replace relative compactness by compactness.
The assumption that is bounded is inessential. A version of Lemma 1.6 holds if we replace by and the weak topology by the one induced by the -Wasserstein metric.
A similar result based on Hellwig’s information topology, relating relative compactness in to relative compactness in , is also true.
2. Preparations
The rest of the paper will essentially be devoted to proving Theorem 1.2, or really its generalization Theorem 1.3.
In Section 3 we prove that Hellwig’s information topology equals the topology induced by the adapted Wasserstein distance $\mathcal{AW}_p$, i.e. (3) = (1) in Theorem 1.3. In a sense, of all the topologies listed in Theorem 1.3, Hellwig’s information topology ‘looks’ the coarsest – or at least like one of the coarser ones, while the topology induced by $\mathcal{AW}_p$ ‘looks’ the finest.
In Section 4 we sandwich the topology induced by the symmetrized-causal Wasserstein distance $\mathcal{SCW}_p$ between Hellwig’s information topology and the topology induced by $\mathcal{AW}_p$, i.e. we show (3) $\subseteq$ (2) $\subseteq$ (1) in Theorem 1.3.
In Section 5 we show that Aldous’ extended weak topology is equal to Hellwig’s information topology, i.e. (4) = (3) in Theorem 1.3.
In Section 6 we prove Lemma 1.3.
In Section 7 we prove that the optimal stopping topology is coarser than the topology induced by $\mathcal{AW}_p$ and finer than Hellwig’s ($p$-)information topology, i.e. (3) $\subseteq$ (5) $\subseteq$ (1) in Theorem 1.3.
2.1. Notation
The nested structure of spaces like for example introduced in Section 1.8 is (at least for the authors) not so easy to gain an intuition for. It seems rather challenging to picture probability measures on probability measures on probability measures… etc.
Therefore, much of the proofs in the following two sections will be about bookkeeping and not getting lost in these nested structures. In most other contexts we would regard such bookkeeping as abstract nonsense better swept under the rug, but in the context of the present paper we believe that it really constitutes an important and nontrivial ingredient in successfully carrying out the proofs.
To aid in this endeavour we make some notational preparations and introduce a few conventions.
2.1.1. Operations on Spaces
In the introduction we described the topologies listed in Theorems 1.2 and 1.3 as initial topologies w.r.t. maps into more complex spaces. These spaces are built up from just a few basic operations, and in most cases the maps can also be constructed using a few relatively simple ingredients.
For spaces, the operations in question are
- •
product formation, i.e. for spaces and we may form their product space ,
- •
and passing from a space to the space of probability measures on .
Here we run into some tension between the various existing definitions in the literature. While Hellwig and Aldous originally defined their topologies based on equipping the space of probability measures on some space with the weak topology, without any mention of metrics, is a metric built on the -Wasserstein metric, and Theorem 1.4 exhibits this metric as the ‘initial metric’ w.r.t. an embedding of (not ) into .
Luckily, when the base metric on is bounded and we decide that we only care about topologies and not the metrics that induce them, all of these distinctions vanish, and one may hope for these fine distinctions to not be so important in the end.
To give as uniform and as streamlined a treatment as possible of all the various ways in which these metric and topological spaces can be related to each other we employ the following strategy: A lot of our arguments are agnostic to the distinction between and , and to whether we are talking about metric or topological spaces etc. They only rely on properties of the operations of product formation and formation of spaces of probability measures and on properties of maps between various spaces built using these operations which hold in either case. For the rest of the paper we will therefore drop the in and other explicit mentions of these distinctions. The reader may decide to read the paper using either of the following two sets of conventions, which are to be applied recursively:
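Convention 1 (weak topologies)
- $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$, $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, etc. are Polish spaces.
- $\mathcal{A} \times \mathcal{B}$ is a topological space with the product topology (again Polish).
- $\mathcal{P}(\mathcal{A})$ is a topological space with the weak topology (also Polish).
- ‘space’ will mean Polish space.
Convention 2 ($p$-Wasserstein metrics)
- $p \in [1, \infty)$ is fixed throughout the paper.
- $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$, $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, etc. are Polish (i.e. complete separable) metric spaces with metrics $\rho_{\mathcal{X}}$, $\rho_{\mathcal{Y}}$, $\rho_{\mathcal{Z}}$, $\rho_{\mathcal{A}}$, $\rho_{\mathcal{B}}$, $\rho_{\mathcal{C}}$, etc. respectively.
- $\mathcal{A} \times \mathcal{B}$ is a Polish metric space with the metric
$$\rho_{\mathcal{A} \times \mathcal{B}}\big((a, b), (a', b')\big) := \big( \rho_{\mathcal{A}}(a, a')^p + \rho_{\mathcal{B}}(b, b')^p \big)^{1/p}.$$
- $\mathcal{P}(\mathcal{A})$, understood as the set of probability measures on $\mathcal{A}$ with finite $p$-th moment, is a Polish metric space with the $p$-Wasserstein metric
$$\mathcal{W}_p(\mu, \nu) := \inf_{\pi \in \mathrm{Cpl}(\mu, \nu)} \Big( \int \rho_{\mathcal{A}}(a, a')^p \, d\pi(a, a') \Big)^{1/p}.$$
- The subscript on the metric may be dropped when clear from the context.
- ‘space’ will mean Polish metric space.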
Unless specified otherwise everything said from here on will be true for either way of reading. Convention 1 will lead to a direct proof of Theorem 1.2, while Convention 2 will give a proof of the more general version, Theorem 1.3. Occasionally an argument will require us to talk directly about metrics to establish continuity of some map. When one only cares about Theorem 1.2 and not Theorem 1.3 these sections can be read while assuming that and that all metrics mentioned are bounded.
Another space we will need is
Definition 2.1.
is the space of probability measures on which are concentrated on the graph of a measurable function, i.e.:
[TABLE]
The space carries the subspace topology / the restriction of the metric on .
2.1.2. Maps between spaces
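Assuming Convention 1, when $f : \mathcal{A} \to \mathcal{B}$ is a continuous map, the pushforward under $f$, i.e. the map $\mathcal{P}(\mathcal{A}) \to \mathcal{P}(\mathcal{B})$ which sends $\mu$ to the measure $f_\#\mu$ with $f_\#\mu(B) := \mu(f^{-1}(B))$, is also continuous.
Similarly, assuming Convention 2, when $f : \mathcal{A} \to \mathcal{B}$ is a Lipschitz-continuous map between metric spaces, the pushforward under $f$ is also Lipschitz-continuous from $\mathcal{P}(\mathcal{A})$ to $\mathcal{P}(\mathcal{B})$.
We will use $\mathcal{P}(f)$ to denote the pushforward under $f$, to emphasize the fact that $\mathcal{P}$ is a functor, i.e. that it sends a diagram with a ‘nice’ (read continuous/Lipschitz) map
$$f : \mathcal{A} \to \mathcal{B}$$
to a similar diagram
$$\mathcal{P}(f) : \mathcal{P}(\mathcal{A}) \to \mathcal{P}(\mathcal{B})$$
where the map $\mathcal{P}(f)$ is also ‘nice’, and that $\mathcal{P}(g \circ f) = \mathcal{P}(g) \circ \mathcal{P}(f)$ and $\mathcal{P}(\mathrm{id}_{\mathcal{A}}) = \mathrm{id}_{\mathcal{P}(\mathcal{A})}$ (where $\mathrm{id}_{\mathcal{A}}$ is the identity function on $\mathcal{A}$).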
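For finitely supported measures the two functor identities can be checked directly. The following sketch represents a measure as a dictionary from points to masses; the helper `pushforward` and the example maps are ours.

```python
from collections import defaultdict


def pushforward(f):
    """Return the map sending a finite measure {point: mass} to its image measure under f."""
    def push(measure):
        image = defaultdict(float)
        for point, mass in measure.items():
            image[f(point)] += mass  # mass of all preimages of f(point) is added up
        return dict(image)
    return push


def square(x):
    return x * x


def shift(x):
    return x + 1


mu = {-1: 0.25, 0: 0.5, 1: 0.25}

lhs = pushforward(lambda x: shift(square(x)))(mu)   # pushforward under the composition
rhs = pushforward(shift)(pushforward(square)(mu))   # composition of the pushforwards
print(lhs, rhs, pushforward(lambda x: x)(mu) == mu)
```

Both routes give mass $1/2$ to each of the points $1$ and $2$, and pushing forward under the identity leaves the measure unchanged.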
For a product of spaces , the projection onto will alternatively be denoted by either or by the same letter that is used for the space, but in a non-calligraphic font, i.e. .
If is defined on some product of spaces, we also introduce a shorthand notation for marginals of , i.e. for the pushforward of under projection onto the product of some subset of the original factors:
[TABLE]
If and are functions we write for the function
[TABLE]
If we want to specify a map from, say to but we only really care about one of the variables we will use an underscore ‘’ instead of naming the unused variables, as in . Similarly, when integrating we may also use to denote unused variables, i.e. for we might write .
Two important maps will be the disintegration map and its left inverse .
The disintegration map
[TABLE]
sends a probability on to the measure
[TABLE]
where is a classical disintegration of , i.e. if then
[TABLE]
The disintegration map is measurable (see for example [15, Proposition 7.27]) and injective. It is not continuous w.r.t. the weak topologies or the Wasserstein metrics.
When writing we will not insist that has to be the first factor in the domain of – and may even be products themselves, whose factors are intermingled in the product that makes up the domain of . Also, we may sometimes omit , only specifying the variable(s) w.r.t. which we are disintegrating, not the ones which are left over, as in .
The map
[TABLE]
is (Lipschitz-)continuous.
The pair , enjoy the following properties:
(1) is the left inverse of the disintegration map, i.e.
[TABLE]
This is a direct consequence of the definition of the disintegration.
(2) is injective. Therefore,
(3) , i.e. and are inverse bijections between and .
The last two properties are just a reformulation of the known fact that the disintegration of a measure is almost-surely uniquely defined.
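On finitely supported measures the disintegration map and its left inverse are easy to write out. The sketch below stores a measure on a product $\mathcal{A} \times \mathcal{B}$ as a dictionary from pairs to exact probabilities and a conditional law as a frozenset of (point, probability) pairs. The names `disintegrate` and `recombine` are ours and merely stand in for the two maps discussed above.

```python
from collections import defaultdict
from fractions import Fraction


def disintegrate(joint):
    """Send a finite measure on A x B to a measure on A x P(B).

    Each point a of the first marginal is paired with the conditional law of b given a.
    """
    marginal = defaultdict(Fraction)
    for (a, _), mass in joint.items():
        marginal[a] += mass
    image = {}
    for a, m in marginal.items():
        kernel = frozenset((b, mass / m) for (a2, b), mass in joint.items() if a2 == a)
        image[(a, kernel)] = m
    return image


def recombine(measure):
    """Integrate the conditional laws back into a measure on A x B."""
    joint = defaultdict(Fraction)
    for (a, kernel), m in measure.items():
        for b, q in kernel:
            joint[(a, b)] += m * q
    return dict(joint)


joint = {("u", 0): Fraction(1, 4), ("u", 1): Fraction(1, 4), ("d", 0): Fraction(1, 2)}
print(recombine(disintegrate(joint)) == joint)

# A measure on A x P(B) that is NOT concentrated on the graph of a function:
# the same point "u" is paired with two different conditional laws.
dirac0 = frozenset({(0, Fraction(1))})
dirac1 = frozenset({(1, Fraction(1))})
off_graph = {("u", dirac0): Fraction(1, 2), ("u", dirac1): Fraction(1, 2)}
print(disintegrate(recombine(off_graph)) == off_graph)
```

The first check illustrates the left-inverse property. The second shows why the reverse composition is the identity only on measures concentrated on the graph of a function: recombining `off_graph` and disintegrating again merges the two predictions at `"u"` into a single averaged one.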
2.1.3. Processes which take values in different spaces at different times
Already in the introduction, in Section 1.8.1, we found it convenient to extend the definition of to products of not necessarily equal Polish spaces ‘in the obvious way’. To accommodate for reapplication of concepts in a similar style as seen there we make the minor generalization of letting all the processes we talk about take values in different spaces at different times – typically at time they will take values in a space .
Denote by and define , , .
3. Hellwig’s -information topology is equal to the topology induced by
In this section we show in Theorem 1.3. We will do so by identifying both topologies as initial topologies w.r.t. a single map each, i.e. finding a space which is homeomorphic to with Hellwig’s (-)information topology and one which is homeomorphic to with the topology induced by , and then showing that these spaces are homeomorphic in the right way. As an auxiliary tool we will introduce another topology on which was not mentioned in the introduction, but which is very similar to Hellwig’s. The proof strategy can be summarized by saying that we want to show that the following diagram is commutative.
[TABLE]
Here is the map which induces the same topology as , induces Hellwig’s topology and induces what we will call the reduced information topology. We shortly restate their definitions below.
Since these mappings are injective, it follows from the definition of the initial topology that all of them are homeomorphisms onto their images. To be precise, is a homeomorphism from with the topology induced by onto (cf. Theorem 1.4), is a homeomorphism from with the information topology onto , and is a homeomorphism from with the reduced information topology onto .
The maps , , are still to be found.
As introduced in Section 1.3 Hellwig’s (-)information topology is induced by a family of maps , given by:
[TABLE]
Equivalently, the information topology is the initial topology w.r.t. the map
[TABLE]
We saw in Section 1.8 that is induced by an embedding . Rephrasing the definition there, is obtained by defining recursively from to :
[TABLE]
In fact, because maps into the space of measures concentrated on the graph of a function, also maps into a smaller space, which we call , and which is again defined by recursion down from to :
[TABLE]
I.e. is with all occurrences of replaced by . Remember that we had
[TABLE]
For convenience, let us also define
[TABLE]
The fact that
[TABLE]
and that therefore maps into is a consequence of Lemma 3.1 below.
Finally, is defined as follows
[TABLE]
I.e. the reduced information topology, like the information topology, makes continuous predictions about the behaviour of the process after time given information about its behaviour up to time , only now we are just predicting what the process will do in the next step, not for the rest of time.
, and are injective and therefore bijections onto their codomains. This means that the values of the maps , , in diagram (14) as functions between sets are really already prescribed. The task consists in finding a representation for them which makes it clear that they are continuous.
Lemma 3.1**.**
* restricted to maps onto $\mathscr{F}\big(\mathcal{A}\rightsquigarrow\mathscr{F}(\mathcal{B}\rightsquigarrow\mathcal{Y})\big)$.*
Proof.
We first show that it maps into $\mathscr{F}\big(\mathcal{A}\rightsquigarrow\mathscr{F}(\mathcal{B}\rightsquigarrow\mathcal{Y})\big)$. Let $\nu\in\mathscr{F}\big(\mathcal{A}\times\mathcal{B}\rightsquigarrow\mathcal{Y}\big)$ and let be a function witnessing this fact, i.e. .
Let . Then
[TABLE]
This means that for -a.a. we have , i.e. is concentrated on the graph of the function .
To see that any $\alpha\in\mathscr{F}\big(\mathcal{A}\rightsquigarrow\mathscr{F}(\mathcal{B}\rightsquigarrow\mathcal{Y})\big)$ can be obtained as the image of some under , note that for such , by the existence of measurably dependent (classical) disintegrations (see for example [15, Proposition 7.27]), , and . ∎
3.1. Homeomorphisms
We give a plain language description of what follows in this section:
The continuity of will be quite trivial, because we are just discarding information.
The components of the map are obtained by ‘folding’ both the ‘head’ and the ‘tail’ of using iterated application of the map .
[TABLE]
By continuity of , it’s easy to see that is continuous. To show that the map with the components is the map we are looking for, we basically show that
[TABLE]
is again another way of ‘folding’ all of using to arrive at . As is also , showing (15) amounts to showing that these two different ways of ‘folding’ – first the head and tail and then in a last step the junction between and on the one hand, and from front to back on the other hand – do the same thing. This may be intuitively clear to the reader. The proof works by repeated application of Lemma 3.5, which represents one step of ‘folding order doesn’t matter’. Using Lemma 3.5 the proof is completely analogous to the proof that for an operation satisfying , i.e. for an associative operation, one has
[TABLE]
As we know, for such an operation any way of parenthesizing the multiplication of elements gives the same result. An analogous statement holds for , though we do not formally state or prove this.
Finally, in Lemma 3.8, using Lemma 3.7 as the main ingredient, we prove the ‘hard direction’, i.e. that is continuous. If the continuity of and as informally described here seems obvious to the reader, they may wish to skip ahead to Lemma 3.7 and Lemma 3.8.
Remark 3.2**.**
The reader interested in working out the details and analogies between ‘folding’ using and associative binary operations might be interested in reading about monads in the context of Category Theory first. (See for example Chapter VI in [47].) In fact, forms a monad, where
[TABLE]
sends an element of to the Dirac measure at and
[TABLE]
This monad is studied in a little more detail in [29]. can be obtained from and a tensorial strength in the sense described for example in [49].
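The analogy between ‘folding’ and associativity can be checked by hand on finitely supported measures. The sketch below assumes that the monad in question is the probability monad, with the Dirac map as unit and averaging of a measure on measures as multiplication; this reading is our assumption, since the displayed definitions are not reproduced here. The names `unit`, `fmap`, `join` and `collapse` are illustrative, and a distribution is encoded as a list of (point, weight) pairs.

```python
from fractions import Fraction


def unit(x):
    """Dirac measure at x."""
    return [(x, Fraction(1))]


def fmap(f, dist):
    """Pushforward of a finitely supported distribution under f."""
    return [(f(x), w) for x, w in dist]


def join(dist_of_dists):
    """Average a distribution on distributions into a single distribution."""
    return [(x, w * p) for inner, w in dist_of_dists for x, p in inner]


def collapse(dist):
    """Merge repeated atoms so that two distributions can be compared."""
    out = {}
    for x, w in dist:
        out[x] = out.get(x, Fraction(0)) + w
    return out


half = Fraction(1, 2)
p = [("a", half), ("b", half)]
q = [("b", Fraction(1, 3)), ("c", Fraction(2, 3))]

# A distribution on distributions on distributions (three levels).
pp1 = [(p, half), (q, half)]
pp2 = [(q, Fraction(1))]
ppp = [(pp1, Fraction(1, 4)), (pp2, Fraction(3, 4))]

# Two ways of folding three levels into one.
outer_first = collapse(join(join(ppp)))
inner_first = collapse(join(fmap(join, ppp)))
```

Both orders of folding give the same distribution on `{"a", "b", "c"}`, which is the finite-support instance of the associativity law of the monad. The unit laws `join(unit(p))` and `join(fmap(unit, p))` can be checked in the same way.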
To show that is continuous we will need the following lemma.
Lemma 3.3**.**
* is natural in , i.e. for the following diagram commutes.*
[TABLE]
Proof.
This is just a straightforward calculation using the definitions. ∎
Applying Lemma 3.3 with , , and we get that
[TABLE]
Setting we get and then setting gives .
There is an analogue of Lemma 3.3 which we list here for completeness.
Lemma 3.4**.**
* is natural in , i.e. for the following diagram commutes:*
[TABLE]
In particular, if then
[TABLE]
if we regard as a subset of by recursively using the recipe: ‘if is a subset of , then we can view as the subset of those which are concentrated on ’.
Proof.
Again this is just calculation. ∎
We already implicitly used the ‘in particular’-part of Lemma 3.4 when we said that can be regarded both as a map into and into but the use there seemed too trivial to warrant much mention. There will be more such tacit uses.
Now we show that is continuous. We claim that it can be written as
[TABLE]
where
[TABLE]
or without the dots, letting $\mathop{\bigcirc}$ denote concatenation of functions, e.g. $\mathop{\bigcirc}_{i=3}^{1}f_{i}=f_{3}\circ f_{2}\circ f_{1}$:
[TABLE]
To prove this we will repeatedly apply the following lemma.
Lemma 3.5** ( is ‘associative’).**
* satisfies the following relation:*
[TABLE]
These maps can be seen in the following commutative diagram.
[TABLE]
Proof.
This is just expanding the definition. Both maps send a measure to the measure with
[TABLE]
∎
Lemma 3.6**.**
The following relation holds.
[TABLE]
Proof.
Again, this is just repeated application of Lemma 3.5. Below we define for and show that
[TABLE]
for all by showing for all . The left hand side of (17) is the left hand side of (16) with the common tail $\mathop{\bigcirc}_{i=k-1}^{1}\operatorname{int}_{\overline{\mathcal{X}}^{i}}^{\mathcal{X}_{i+1:N}}$ of the left and right side in (16) dropped. will be the right hand side of (16) with the common part dropped.
[TABLE]
Here we regard $\mathop{\bigcirc}_{r}^{s}\dots$ with (an empty product in our context) as the identity function. For the first factor is an empty product and therefore clearly (17) is true for . To get from to we leave the first factor alone and apply Lemma 3.5 with , and . This transforms
[TABLE]
into
[TABLE]
and therefore into . ∎
Lemma 3.7**.**
The right hand triangle in (14) commutes, i.e.
[TABLE]
Proof.
Prepending to (16) gives
[TABLE]
and appending gives
[TABLE]
∎
Now we will show that is continuous. We postpone the proof of Lemma 3.7 below, which is the crucial non-bookkeeping ingredient in the proof of Lemma 3.8 below, until the end of this section. The methods used in the proof of Lemma 3.7 differ significantly from those in the rest of this section: they make use of the concept of the modulus of continuity for measures, and results relating to it, introduced in the companion paper [24].
Lemma.
Let
[TABLE]
be the set of all s.t.
[TABLE]
The function
[TABLE]
is continuous.
Clearly, as a function between sets, only depends on . But, as we know, is not continuous. Only when we refine the topology on the source space, which we encode by regarding as a map from the above subset of a product space, does it become continuous.
Lemma 3.8**.**
* is continuous.*
Proof.
We will inductively define
[TABLE]
(again down from to ) so that they will be continuous by construction (and by virtue of Lemma 3.7). Also by construction, we will have . will be so that .
Set , the projection from onto the last factor. by definition. Given define
[TABLE]
where is the projection from onto the -th factor.
For this to be well-defined we need to check that for we have
[TABLE]
I.e. for we want
[TABLE]
The composite of the maps on the left-hand side is equal to
[TABLE]
On the right-hand side we get by induction hypothesis
[TABLE]
Using that we see for
[TABLE]
i.e. by induction (19) is also equal to .
As a composite of continuous maps is clearly continuous. (This is where we use Lemma 3.7.) As a map between sets is just
[TABLE]
by induction hypothesis and definition of . ∎
3.2. Proof of Lemma 3.7
In this part we prove Lemma 3.7. Here we use several of the ideas developed in the companion paper [24]. In particular we will need [24, Lemma 4.2] which we reproduce below.
Lemma 3.9** ([24, Lemma 4.2]).**
Let . For any there is a s.t. if
[TABLE]
then
[TABLE]
For easy reference we also restate Lemma 3.7.
Proof of Lemma 3.7.
Let . Let .
Choose according to Lemma 3.9 with , i.e. s.t. for any with and any with we have .
Let with .
This means we can find with
[TABLE]
Let and be measurable functions on whose graph and , respectively, are concentrated. Let , .
As noted in the proof of Lemma 3.1 we know that for -a.a. the measure is concentrated on the graph of the function (and similarly for ). This together with (which is a consequence of (18)) implies that
[TABLE]
(again similarly for ).
From this we see that the measure defined as
[TABLE]
is in .
We may measurably select almost-witnesses for the distances s.t. building on (20) we get
[TABLE]
Now
[TABLE]
where is defined as
[TABLE]
The integral over the first two summands in (22) is less than by (21). By our choice of in the beginning this implies that the integral over the last summand is also less than , so that overall
[TABLE]
As was arbitrary, this concludes the proof. ∎
4. The symmetrized causal Wasserstein distance
In this section we prove that the topology induced by is sandwiched between Hellwig’s -information topology and the topology induced by , and is therefore, by what we have already seen in the previous section, equal to both of them. Our arguments in this section make explicit use of metrics. The reader who is only interested in the simpler version of our main theorem, Theorem 1.2, may assume that and that all metrics are bounded.
Remember that for we have
[TABLE]
In proving this we will take a slightly roundabout route. First we will focus on the case where is the product of just two spaces, i.e. where we have only two time points. Moreover, for expositional purposes, let us for the moment assume that and are both compact. Generalizing from this setting will not be very hard.
In the compact, two-time-point case we will show equality of the two topologies in question by extending both to a larger (compact) space and showing equality of the topologies on that larger space.
In more detail:
When there are only two time points, Hellwig’s -information topology and the topology induced by trivially coincide. Both are induced by embedding into via . The latter space carries its standard metric , which – as was already established in Theorem 1.4 in Section 1.8 of the introduction – is an extension of . To highlight this connection, in this section we will also refer to that metric as . As a reminder,
[TABLE]
where is the normal Wasserstein distance (on in this case). We will find an extension of to , which still satisfies all properties of a metric except for symmetry and which is dominated by . Symmetrizing this extension gives a metric (which we will call ). The identity function from topologized with to topologized with will then be a continuous bijection from a compact space (this is where we use compactness of , ) to a Hausdorff space, i.e. a homeomorphism.
The next subsection will be devoted to finding an expression for the extension of to and proving that it satisfies all the properties mentioned above.
Remark 4.1**.**
When contains no isolated points, because is the metric completion of w.r.t. and because the above properties imply that is (uniformly) continuous w.r.t. , we have already uniquely identified . Still, we want to find an expression that allows us to work with and in particular that allows us to prove that is a metric and not just a pseudometric, i.e. that the induced topology is in fact Hausdorff. This is exactly what we gain from assuming compact base spaces and passing to the completion: instead of having to find a lower bound for in terms of (and possibly ) we now just have to prove that if then .
For definiteness we note that we do not assume compactness of any space in the following.
4.1. Extending the causal ‘distance’
So now we are working with two Polish metric spaces , . Remember that we denote the ‘canonical process’ on by , i.e. is the projection onto the -th coordinate.
To differentiate between the different roles that may play - i.e. is it the space for the left measure or the right measure when measuring the ‘distance’ - we will also refer to , by the aliases , respectively. (And later , as well.) Analogously, we have . (And .)
In this section we will repeatedly make use of the following construction:
Definition 4.2**.**
Let , , be Polish metric spaces. Let and with . We define
[TABLE]
as the measure given by
[TABLE]
where is a disintegration of w.r.t. and similarly for .
We further define
[TABLE]
Remark 4.3**.**
If is a probability on and is a probability on , another way of saying what $\mu\mathbin{\otimes_{\mathcal{B}}}\nu$ is, is to state that it is a probability on s.t. the law of is equal to , the law of is equal to (where per our convention is the projection onto , etc.), and is conditionally independent from given . (For the notion of conditional independence see for example [22, Definition II.43].)
Another helpful intuition comes from looking at the case where is concentrated on the graph of some measurable function and is concentrated on the graph of a measurable function . $\mu\mathbin{\fatsemi_{\mathcal{B}}}\nu$ is then concentrated on the graph of . In some contexts is also written as $f\mathbin{\fatsemi}g$, which is where we borrowed the symbol from.
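For finitely supported measures the construction of Definition 4.2 can be written out directly. The sketch below assumes that the glued measure is $\mu(da,db)\,\nu_b(dc)$ on $\mathcal{A}\times\mathcal{B}\times\mathcal{C}$ and that the second operation is its projection onto $\mathcal{A}\times\mathcal{C}$; both readings are our interpretation of the displayed formulas. The function names `marginal`, `glue` and `compose` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def marginal(law, idx):
    """Marginal of a finite law {point: weight} on the coordinates in idx."""
    out = defaultdict(Fraction)
    for point, w in law.items():
        if w:
            out[tuple(point[i] for i in idx)] += w
    return dict(out)


def glue(mu, nu):
    """Glue mu on A x B and nu on B x C along B into a law on A x B x C."""
    mu_b = marginal(mu, (1,))
    if mu_b != marginal(nu, (0,)):
        raise ValueError("the B-marginals of mu and nu must coincide")
    out = {}
    for (a, b), w in mu.items():
        for (b2, c), v in nu.items():
            if b2 == b and w and v:
                # mu(a, b) * nu_b(c), where nu_b(c) = nu(b, c) / nu^B(b)
                out[(a, b, c)] = w * v / mu_b[(b,)]
    return out


def compose(mu, nu):
    """Project the glued law onto the outer factors A x C."""
    return marginal(glue(mu, nu), (0, 2))


# mu sits on the graph of f(a) = a % 2, nu on the graph of g = labels.
mu = {(a, a % 2): Fraction(1, 4) for a in range(4)}
labels = {0: "even", 1: "odd"}
nu = {(b, labels[b]): Fraction(1, 2) for b in (0, 1)}

glued = glue(mu, nu)
composed = compose(mu, nu)
```

In this example the glued law has $\mu$ and $\nu$ as its two-dimensional marginals, and `composed` is concentrated on the graph of the composite of the two functions, in line with the intuition described in Remark 4.3.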
Remark 4.4**.**
We will often encounter the situation that one of the factors , or in Definition 4.2 is itself a product of spaces and the individual factors may not always be so nicely sorted. We will rely on naming in the subscript the space(s) along which to join the measures and . For example if and we might write
[TABLE]
to refer to the measure that we get when in (26) we use as the middle variable . We will not be systematic about the order of the factors in the resulting product space on which e.g. $\mu\mathbin{\otimes_{\mathcal{B}_{1},\mathcal{B}_{2}}}\nu$ is a measure, again relying on naming our spaces for disambiguation.
For future reference we paraphrase the definition of a causal transport plan given in (3) in the introduction.
Lemma 4.5**.**
Let be a measure on and be a measure on . is a causal transference plan from to iff under
[TABLE]
Proof.
One way of formulating conditional independence is as in (3), see for example [22, Definition II.43, Theorem II.45]. ∎
In other words, is a causal transference plan iff $\gamma_{\restriction\mathcal{X}_{1},\mathcal{X}_{2},\mathcal{Y}_{1}}=\mu\mathbin{\otimes_{\mathcal{X}_{1}}}\gamma_{\restriction\mathcal{X}_{1},\mathcal{Y}_{1}}$.
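In the two-time-point setting with finite supports, this criterion can be tested mechanically. The sketch below checks the equivalent conditional independence statement, namely that under $\gamma$ the coordinate in $\mathcal{Y}_1$ is conditionally independent of the coordinate in $\mathcal{X}_2$ given the coordinate in $\mathcal{X}_1$. The function names and the two example couplings are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict
from fractions import Fraction


def marginal(law, idx):
    """Marginal of a finite law {point: weight} on the coordinates in idx."""
    out = defaultdict(Fraction)
    for point, w in law.items():
        if w:
            out[tuple(point[i] for i in idx)] += w
    return dict(out)


def is_causal(gamma):
    """Check causality of a finite law gamma on X1 x X2 x Y1 x Y2.

    Points are tuples (x1, x2, y1, y2). The test is
    P(x1, x2, y1) * P(x1) == P(x1, x2) * P(x1, y1) for all points,
    i.e. Y1 is conditionally independent of X2 given X1.
    """
    g_x1 = marginal(gamma, (0,))
    g_x1x2 = marginal(gamma, (0, 1))
    g_x1y1 = marginal(gamma, (0, 2))
    g_x1x2y1 = marginal(gamma, (0, 1, 2))
    for (x1, x2), w_x in g_x1x2.items():
        for (x1_other, y1), w_y in g_x1y1.items():
            if x1_other != x1:
                continue
            lhs = g_x1x2y1.get((x1, x2, y1), Fraction(0)) * g_x1[(x1,)]
            if lhs != w_x * w_y:
                return False
    return True


quarter = Fraction(1, 4)
half = Fraction(1, 2)

# X1 is trivial, X2 is a fair coin; Y1 is an independent fair coin.
product_coupling = {(0, x2, y1, 0): quarter for x2 in (0, 1) for y1 in (0, 1)}

# Y1 copies X2, so Y1 'looks into the future' of the first process.
anticipative_coupling = {(0, 0, 0, 0): half, (0, 1, 1, 0): half}
```

Here `is_causal(product_coupling)` holds, while `is_causal(anticipative_coupling)` fails because the first coordinate of the second process is determined by information that the first process only reveals at time 2.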
We start by reexpressing in different ways until we find one which also makes sense in .
Let and . Then
[TABLE]
This is true because, on the one hand, clearly a is causal by Lemma 4.5 and the alternative characterization of $\otimes_{\mathcal{X}_{1}}$.
On the other hand, given any causal , again by Lemma 4.5, $\gamma_{\restriction\mathcal{X}_{1},\mathcal{X}_{2},\mathcal{Y}_{1}}=\mu\mathbin{\otimes_{\mathcal{X}_{1}}}\gamma_{\restriction\mathcal{X}_{1},\mathcal{Y}_{1}}$, and we may define $\gamma^{\prime}:=\left(\mu\mathbin{\otimes_{\mathcal{X}_{1}}}\gamma_{\restriction\mathcal{X}_{1},\mathcal{Y}_{1}}\right)\mathbin{\otimes_{\mathcal{X}_{2},\mathcal{Y}_{1}}}\gamma_{\restriction\mathcal{X}_{2},\mathcal{Y}_{1},\mathcal{Y}_{2}}\in\operatorname{Cpl}\left(\mu,\nu\right)$. Now and , so in particular
[TABLE]
We may name the different building blocks of to get
[TABLE]
with
[TABLE]
i.e. there is a bijection between and given by sending to where , , and, in the other direction, by sending to $\gamma^{\prime}:=\left(\mu\mathbin{\otimes_{\mathcal{X}_{1}}}\gamma\right)\mathbin{\otimes_{\mathcal{X}_{2},\mathcal{Y}_{1}}}\beta$.
We can apply the bijection to . Translating the conditions on to conditions on we arrive at
[TABLE]
where
[TABLE]
Let and let be a measurable mapping with for -a.a. . Then we have that also , where is defined by
[TABLE]
By employing a -a.e. measurable selector this implies that
[TABLE]
We need
Lemma 4.6.
If and then the only measure with and is \kappa\mathbin{\otimes_{\mathcal{B}}}\lambda.
Proof.
If satisfies the properties above and , are (classical) disintegrations of , w.r.t. , then a (classical) disintegration of w.r.t. has to satisfy and a.s. As is a Dirac measure a.s. this forces to be almost surely. ∎
This implies that for the distribution of
[TABLE]
under is already determined by , i.e. because the distribution of is \operatorname{dis}_{\mathcal{Y}_{1}}\!\!\left(\gamma\mathbin{\raise 2.58334pt\hbox{\oalign{\hfil\scriptscriptstyle\mathrm{o}\scriptscriptstyle\mathrm{9}\hfil}}}_{\mathcal{X}_{1}}\mu\right) and the distribution of is , the distribution of (27) under must be equal to
[TABLE]
This means that we may get rid of :
[TABLE]
For the final step we need another lemma:
Lemma 4.7.
Let and . Let denote the projection onto . Then
[TABLE]
is equal to the distribution of
[TABLE]
Proof.
Let be a version of the (classical) disintegration of w.r.t. and let be a disintegration of w.r.t. .
As one easily checks, a version of the (classical) disintegration of \lambda\mathbin{\raise 2.58334pt\hbox{\oalign{\hfil\scriptscriptstyle\mathrm{o}\scriptscriptstyle\mathrm{9}\hfil}}}_{\mathcal{B}}\beta w.r.t. is given by , so that \operatorname{dis}_{\mathcal{A}}\left(\lambda\mathbin{\raise 2.58334pt\hbox{\oalign{\hfil\scriptscriptstyle\mathrm{o}\scriptscriptstyle\mathrm{9}\hfil}}}_{\mathcal{B}}\beta\right) is equal to
[TABLE]
By the same argument a version of the disintegration of \lambda\mathbin{\raise 2.58334pt\hbox{\oalign{\hfil\scriptscriptstyle\mathrm{o}\scriptscriptstyle\mathrm{9}\hfil}}}_{\mathcal{B}}\operatorname{dis}_{\mathcal{B}}(\beta) w.r.t. is given by , where is a disintegration of w.r.t. . But such a disintegration is given by (where is the Dirac measure at ). So . This means (a version of) is given by
[TABLE]
so that the distribution of under is also given by
[TABLE]
∎
Using this lemma with , , , , and writing , for the projections onto , respectively, we find:
[TABLE]
where \eta(\gamma):=\operatorname{dis}_{\mathcal{X}_{1}}(\mu)\mathbin{\otimes_{\mathcal{X}_{1}}}\gamma\mathbin{\otimes_{\mathcal{Y}_{1}}}\operatorname{dis}_{\mathcal{Y}_{1}}(\nu).
By Lemma 4.6 the function is a bijection, so we may as well write
[TABLE]
Finally, under any we know that is almost surely equal to a function of , so that the completions of the sigma-algebras generated by and respectively are equal. This means that a.s. and we arrive at our final expression for :
[TABLE]
Now this expression is trivial to generalize to and , i.e. for such , we set
[TABLE]
To summarize our discussion up to this point:
Lemma 4.8.
The function
[TABLE]
as defined in (28) is really an extension of
[TABLE]
as defined in (23) (when is embedded into via ).
Next, as promised, we show
Lemma 4.9.
is bounded by , i.e.
[TABLE]
Proof.
By the conditional version of Jensen’s inequality applied to the convex function we have
[TABLE]
∎
Remark 4.10.
For the reader who may be sceptical of whether Jensen’s inequality holds in this rather unusual setting, where we have a convex function
[TABLE]
and conditional expectations on spaces of measures we remark that for the Wasserstein distance in particular this is very easy to check. The proof is just integrating transport plans between and w.r.t. the distribution of these conditioned on (in this case) to get transport plans between and .
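The argument in the remark can be checked numerically in the simplest finite case: mixing transport plans shows that the distance between two mixtures is at most the mixture of the distances. The following is a minimal sketch under stated assumptions, not the paper's construction. It uses the 1-Wasserstein distance on the integer grid 0, 1, ..., n-1, computed through the standard formula as the sum of absolute differences of the cumulative distribution functions. The names `w1_on_grid` and `mix`, and the example measures, are hypothetical.

```python
from itertools import accumulate


def w1_on_grid(mu, nu):
    """W_1 between two probability vectors on the grid 0, 1, ..., n-1 (unit spacing).

    On the real line W_1 equals the integral of |F_mu - F_nu|; on a unit grid
    this is the sum of absolute differences of the cumulative sums.
    """
    cdf_mu = list(accumulate(mu))
    cdf_nu = list(accumulate(nu))
    return sum(abs(a - b) for a, b in zip(cdf_mu, cdf_nu))


def mix(weights, measures):
    """Convex combination of probability vectors of equal length."""
    n = len(measures[0])
    return [sum(w * m[k] for w, m in zip(weights, measures)) for k in range(n)]


# Hypothetical example: two "conditional" pairs (mu_i, nu_i), each with weight 1/2.
weights = [0.5, 0.5]
mus = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # Dirac at 0, Dirac at 2
nus = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]  # Dirac at 2, Dirac at 0

# Distance between the averaged measures ...
lhs = w1_on_grid(mix(weights, mus), mix(weights, nus))
# ... versus the average of the conditional distances.
rhs = sum(w * w1_on_grid(m, n) for w, m, n in zip(weights, mus, nus))
```

In this example the two averaged measures coincide, so `lhs` is 0 while `rhs` is 2; the inequality `lhs <= rhs` is the Jensen-type statement, and it can be strict.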
Lemma 4.11.
Let . Then
[TABLE]
Proof.
Using our naming convention we have
[TABLE]
We denote the projections onto , , by , , respectively. , .
Let and . In the following let refer to (conditional) expectation w.r.t. \kappa:=\gamma\mathbin{\otimes_{\mathcal{Y}_{1},\mathscr{P}\!\left(\mathcal{Y}_{2}\right)}}\eta, and let refer to the -norm w.r.t. .
Combining the triangle inequalities for , and the we get
[TABLE]
By the conditional Jensen inequality
[TABLE]
and therefore
[TABLE]
By construction, is conditionally independent from given , so that (this basic fact about conditional independence can be found for example as Theorem 45 in [22]). Combining this with (30) gives
[TABLE]
Putting together (29) and (31) with the triangle inequality for we get
[TABLE]
∎
Lemma 4.12.
is uniformly continuous w.r.t. on .
Proof.
Let . We repeatedly use Lemma 4.11:
[TABLE]
therefore
[TABLE]
Switching the roles of and implies
[TABLE]
∎
Lemma 4.13.
The infimum in (28) is attained.
Proof.
This is an application of [9, Theorem 1.2].
For self-containedness and because it’s a nice application of the nested distance, we also sketch the argument here. We know that is compact. The problem is that is not (lower semi-) continuous. But we may switch to a topology which is better adapted to the problem at hand, namely the two-timepoint -topology. In this case the space for the first timepoint is and that for the second is . In effect that means that instead of we are now looking at . The function that we are optimizing over can be written as
[TABLE]
is a continuous function and so is . Now is not compact, but
[TABLE]
is. So we can find a minimizer of in this set. To return to , or more precisely , we can send to the distribution of . Because is continuous and convex in its last argument and by (the conditional version of) Jensen’s inequality (which could again be proved ‘by hand’ here) . is the sought-after minimizer of (28). ∎
Lemma 4.14.
Let . Then implies .
Proof.
Call
[TABLE]
To have labels for our spaces, see as
[TABLE]
Let s.t. .
Let s.t. .
All the following considerations happen under \gamma\mathbin{\otimes_{\vec{\mathcal{Y}}}}\eta. Clearly, a.s.
Moreover, because , the random variables form a martingale w.r.t. the filtration generated by . The distribution of is equal to the distribution of . Both of these statements are also true if we integrate some bounded measurable function w.r.t. our random variables, i.e. for any bounded measurable we have that is a martingale and that the distribution of is equal to the distribution of . But this means that we must have a.s. (Lemma 4.15 below). As this is true for all from a countable generator of the sigma-algebra on , we have a.s. ∎
Lemma 4.15.
Let be a bounded martingale over . If the distribution of is equal to the distribution of then a.s.
Proof.
This is a consequence of the strict version of Jensen’s inequality applied to any everywhere strictly convex function. (Take for example .) ∎
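For a concrete instance of this argument, one can take the strictly convex function to be the square. The derivation below is a sketch under assumed notation: the symbols of the lemma did not survive in this copy, so we write the bounded real-valued martingale as (M_1, M_2) with respect to a filtration (F_1, F_2); these names are hypothetical.

```latex
% Assumed notation: (M_1, M_2) is a bounded real-valued martingale,
% i.e. E[M_2 | F_1] = M_1; boundedness makes every term below finite.
\begin{align*}
\mathbb{E}\big[M_2^2\big]
  &= \mathbb{E}\big[M_1^2\big]
     + 2\,\mathbb{E}\big[M_1 (M_2 - M_1)\big]
     + \mathbb{E}\big[(M_2 - M_1)^2\big] \\
  &= \mathbb{E}\big[M_1^2\big] + \mathbb{E}\big[(M_2 - M_1)^2\big],
\end{align*}
% since E[M_1 (M_2 - M_1)] = E[ M_1 E[M_2 - M_1 | F_1] ] = 0.
```

If M_1 and M_2 have the same distribution, then the second moments on both sides agree, so the expectation of (M_2 - M_1)^2 vanishes and M_1 = M_2 almost surely.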
Remark 4.16.
The reason we took the detour of turning our probability-measure-valued martingale into a family of martingales on and arguing on these is that this way we avoid having to exhibit a continuous, everywhere strictly convex function on .
As a reminder:
Definition 4.17.
For ,
[TABLE]
Theorem 4.18.
is a metric on satisfying
[TABLE]
Proof.
This follows from Lemma 4.11, Lemma 4.14 and Lemma 4.9. ∎
Remark 4.19.
As outlined at the beginning of this section, and thanks to Theorem 4.18, we now know enough to conclude that the topology induced by is equal to the topology induced by , in case both and are compact. The non-compact case is not much harder; to settle it we need the following lemma.
Lemma 4.20.
The map
[TABLE]
is a contraction when we equip the source space with and the target space with . More specifically for
[TABLE]
Proof.
We prove the second statement. Let , . Given and the task is to find s.t.
[TABLE]
We take inspiration from the discussion at the beginning of this section. Let be a measurable selector satisfying
[TABLE]
The obvious choice for , namely will not work because in general it gets the relationship between and wrong, i.e. its first marginal may not be . Instead we again define and and set \gamma^{\prime}:=\gamma_{L}\mathbin{\otimes_{\mathcal{X}_{2},\mathcal{Y}_{1}}}\gamma_{R}.
[TABLE]
Clearly, if we can actually define as announced, then (33) will hold, because then
[TABLE]
It remains to check that and can actually be composed, i.e. that has the same distribution under and .
[TABLE]
The step in the middle has its own Lemma 4.21 below. ∎
Lemma 4.21.
Let be a probability on , for Polish spaces . Let be a measurable function. Then
[TABLE]
where without superscript is the (conditional) expectation w.r.t. and is the projection onto .
Note that is on both sides introduced by the expectation operator which carries a superscript, while may on both sides be interpreted as coming from the outermost context. On the right hand side may also be seen as having been introduced by the outermost conditional expectation operator. (As this operator conditions on this is the same thing.)
Proof.
Both sides are clearly -measurable. We prove that for , multiplying by and taking expectation gives the same result. By definition of the conditional expectation
[TABLE]
Applying the continuous linear function this gives
[TABLE]
Again by the definition of the conditional expectation:
[TABLE]
where for the third equality we plugged in the previous equation. ∎
Alternative proof of Lemma 4.20 when has no isolated points.
When the space has no isolated points one can show that the space is dense in . This allows for a shorter proof of Lemma 4.20:
By the original definition (23) of on the space the inequality (32) holds on . Both and are uniformly continuous on w.r.t. some product metric of with itself. is dense in , and therefore is dense in . This implies that (32) holds on all of . ∎
Theorem 4.22.
The topology induced by on is equal to the topology induced by on that space.
Proof.
As both topologies are metric and therefore first-countable we may argue on sequences. Let be a sequence in . As , if converges to w.r.t. it also converges to w.r.t. .
Now assume that a sequence in converges to w.r.t. . We will show that every subsequence of has a subsequence which converges to w.r.t. .
Note that convergence of w.r.t. implies that the set is relatively compact w.r.t. the topology induced by . As is continuous as a map from with the topology induced by to with the topology induced by (Lemma 4.20), we have that is also relatively compact. By Lemma 1.6/[24, Lemma 3.3] this implies that is relatively compact in with the topology induced by . Now let be some subsequence of . As is relatively compact we can find a subsequence of , which converges w.r.t. to some . As this sequence also converges to w.r.t. . But also converges to w.r.t. . Because the topology induced by is Hausdorff (Lemma 4.14), we must have , i.e. converges to w.r.t. . ∎
Now we return to the general case of time-points.
Theorem 4.23.
The topology induced by on is equal to Hellwig’s -information topology and to the topology induced by .
Proof.
As every bicausal transport plan between and can be interpreted as a causal transport plan from to and also as a causal transport plan from to we have that . This means that the identity from with the topology induced by to with the topology induced by is continuous. For the other direction we show that the identity from with the topology induced by to with the -information topology is continuous, i.e. we show that each of the maps
[TABLE]
is continuous when gets the topology induced by .
If and is causal, then, in particular, is ‘causal at the timestep from to ’, i.e. is causal when regarded as a coupling between . This means that if we define like , but only require causality based on the decomposition of as , then , i.e. the identity from with the topology induced by to with the topology induced by is continuous. By Theorem 4.22 the map
[TABLE]
is continuous when we equip with the topology induced by . Now is continuous as a composite of continuous maps. ∎
5. Aldous’ extended weak convergence
In this section we show that Aldous’ extended -/weak topology is equal to Hellwig’s (-)information topology.
We recall and paraphrase here the definition, already given in the introduction, of Aldous’ topology.
Definition 5.1.
Given let be the value of a (classical) disintegration of w.r.t. the first coordinates at . (By convention ). Define
[TABLE]
The extended /weak topology on is the initial topology w.r.t. .
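For a law with finitely many paths, the disintegrations entering this definition can be computed directly. The sketch below is only an illustration under assumptions: it represents a law as a dictionary from paths (tuples) to probabilities, and it returns, for each time t, the conditional law of the remaining coordinates given the first t coordinates. Whether one records the law of the remaining coordinates or of the whole path is a convention chosen here; the function names `conditional_law` and `prediction_process` and the example law are hypothetical.

```python
from collections import defaultdict


def conditional_law(law, prefix):
    """Conditional law of the remaining coordinates, given the path starts with `prefix`.

    `law` maps paths (tuples of equal length) to probabilities.
    """
    t = len(prefix)
    mass = defaultdict(float)
    for path, p in law.items():
        if path[:t] == prefix:
            mass[path[t:]] += p
    total = sum(mass.values())
    if total == 0.0:
        # A disintegration is only determined up to null sets.
        raise ValueError("prefix has probability zero")
    return {tail: p / total for tail, p in mass.items()}


def prediction_process(law, path):
    """Conditional laws along `path`, for t = 0, 1, ..., len(path)."""
    return [conditional_law(law, path[:t]) for t in range(len(path) + 1)]


# Hypothetical two-step law: the first step already reveals the second.
law = {(1, 2): 0.5, (-1, -2): 0.5}
process = prediction_process(law, (1, 2))
```

Here `process[0]` is the law itself (empty prefix), `process[1]` is the Dirac mass at the tail `(2,)`, and `process[2]` is the trivial law on the empty tail.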
Remark 5.2.
Reasonable people may disagree about whether the most faithful / useful transcription of Aldous’ definition should include the factors and in the above product of spaces. When including , as we did, one has to interpret simply as . We leave it as an exercise to the reader to check that either or both may be dropped in the definition of without affecting the resulting topology on .
Theorem 5.3.
The (-)extended weak topology is equal to the (-)information topology.
Proof.
We construct continuous maps
[TABLE]
such that
[TABLE]
The first equality above implies that the identity on is continuous from the extended weak topology to the information topology, the second implies that it is continuous in the other direction.
is very simple. We just need to select the right factors and then discard the unnecessary part of the measure component. Formally
[TABLE]
which is clearly continuous.
We construct recursively, by constructing as a composite of continuous maps
[TABLE]
satisfying
[TABLE]
. We need the helper functions
[TABLE]
Given satisfying the induction hypothesis we set
[TABLE]
where is the obvious permutation of the coordinates to get the factors into the right order. is continuous because by [24, Lemma 4.1] \otimes_{\overline{\mathcal{X}}^{m}} is continuous when one of the arguments is an element of some . That (34) still holds for is a straightforward calculation. This way we get to . Finally, set
[TABLE]
∎
6. Bounded vs unbounded metrics
Because we will need it in the next section, we interject here a proof of Lemma 1.3.
Proof of Lemma 1.3.
We provide the proof only for Hellwig’s topology, i.e. (3) of Theorem 1.3 and Theorem 1.2, respectively. As we have already seen in the previous sections, the topologies (2)–(4) are equivalent topologies, and the result therefore carries over to them. The (-)optimal stopping topology, (5), is treated below. It is clear that convergence w.r.t. the -information topology implies convergence in Hellwig’s information topology plus convergence of -th moments. For the reverse implication, let , and denote by the first and by the last coordinates. Now assume that converges to in Hellwig’s information topology and that the -th moments converge. The classical (not adapted) version of the very lemma we prove here implies that in ; in particular is relatively compact. Lemma 1.6 (or really [24, Lemma 3.3]/[9, Lemma 2.6]) therefore guarantees that is relatively compact.
Every subsequence of therefore has a subsequence which converges w.r.t. the topology on (i.e. the one coming from nested Wasserstein metrics) to some . Because convergence in is stronger than convergence in (i.e. in the nested weak sense) we must also have in . But also, by assumption, in and therefore . ∎
7. Optimal Stopping
In this section we investigate the relation between the (-)optimal stopping topology and the adapted Wasserstein topology. Lemma 7.1 states that the topology induced by ((1) of Theorem 1.3) is finer than the -optimal stopping topology. Lemma 7.5 states that the -optimal stopping topology is finer than the -information topology ((3) of Theorem 1.3). This will finish the proof of Theorem 1.3.
Recall that
[TABLE]
for .
Lemma 7.1.
Let . Then is continuous w.r.t. . In fact, one has
[TABLE]
for every .
Proof.
Let and assume that . Moreover, let and be arbitrary, and fix a stopping time satisfying . For define
[TABLE]
where the equality holds by the properties of stopping times and since is causal. We then have that
[TABLE]
As further is a stopping time for every fixed one has and therefore
[TABLE]
Exchanging the roles of and and using that and were arbitrary yields (35).
Now assume that and that is less than away from attaining the infimum . Then , where is the identity coupling of . (A coupling between and is given by .) Because is a continuous function of growth of at most order , we get that
[TABLE]
Together with (35) this implies that is continuous w.r.t. . ∎
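To see why an adapted distance is needed for this continuity, it can help to compute optimal stopping values on small finite examples. The sketch below is an illustration under assumptions, not the paper's construction: it takes the value to be the infimum of the expected cost over stopping times with values in 1, ..., N, assumes a non-anticipative cost `cost(t, prefix)`, and evaluates it by backward induction on a finite law given as a dictionary from paths to positive probabilities. The function `stopping_value`, the cost and both example laws are hypothetical.

```python
def stopping_value(law, cost):
    """Minimal expected cost over stopping times 1 <= tau <= N (assumed convention).

    `law` maps paths (tuples of length N) to positive probabilities;
    `cost(t, prefix)` is the cost of stopping at time t after seeing `prefix`.
    """
    horizon = len(next(iter(law)))

    def prob(prefix):
        return sum(p for path, p in law.items() if path[:len(prefix)] == prefix)

    def value(prefix):
        t = len(prefix)
        if t == horizon:
            return cost(t, prefix)  # forced to stop at the final time
        children = sorted({path[:t + 1] for path in law if path[:t] == prefix})
        mass = prob(prefix)
        continuation = sum(prob(c) / mass * value(c) for c in children)
        # Stopping is not allowed at time 0; afterwards take the cheaper option.
        return continuation if t == 0 else min(cost(t, prefix), continuation)

    return value(())


def cost(t, prefix):
    """Hypothetical cost: stopping at time 1 costs 0, stopping at time 2 costs x_2."""
    return 0.0 if t == 1 else float(prefix[1])


eps = 0.01
# First law: the small first step reveals the sign of the second step.
law_eps = {(eps, 1): 0.5, (-eps, -1): 0.5}
# Second law: the first step carries no information.
law_zero = {(0.0, 1): 0.5, (0.0, -1): 0.5}

value_eps = stopping_value(law_eps, cost)
value_zero = stopping_value(law_zero, cost)
```

The usual (non-adapted) Wasserstein distance between the two laws is at most `eps`, since moving the first coordinate by `eps` couples them. The stopping values nevertheless come out as -0.5 and 0: under the first law the stopper can wait exactly when the second step will be negative, under the second law no such information is available.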
Remark 7.2.
The above proof reveals that if is Lipschitz with constant for every , then .
In order to show that the optimal stopping topology is finer than the -information topology, we need to make a few preparations.
Lemma 7.3.
Let be a Polish space. Then the family
[TABLE]
is convergence determining for the weak topology on , that is, a sequence of probability measures in converges weakly to a probability measure if and only if for all in (38).
This follows from the Stone-Weierstrass theorem in case of compact and readily extends to general Polish spaces e.g. via Stone-Čech compactification.
Lemma 7.4.
Let be a Polish space. The family of functions
[TABLE]
is convergence determining for the weak topology on .
Proof.
Let , , and as in (38). Moreover, let such that for all and define . Then is compact and satisfies
[TABLE]
Let be some fixed bounded continuous sigmoid function such as or .
By the universal approximation result of Cybenko [21, Theorem 2], the set
[TABLE]
is dense in w.r.t. the supremum norm. As a result, it is enough to replace in (38) by functions of the form . Evaluating the latter function on the vector yields
[TABLE]
upon defining , , and finally for every . The result follows from Lemma 7.3. ∎
Lemma 7.5.
The -optimal stopping topology is finer than the -information topology.
Proof.
The choice and for shows that convergence in the -optimal stopping topology implies convergence of the -th moments. Thus, we are left to show that convergence in the optimal stopping topology implies convergence in Hellwig’s information topology. Then, by the part of Lemma 1.3 which has already been established, we obtain convergence in the -information topology.
Fix and denote by the first and by the last coordinates. As is convergence determining for , and is, by Lemma 7.4, convergence determining for , it follows e.g. from [26, Proposition 4.6 (p.115)] that
[TABLE]
is convergence determining for the weak topology on . Since in (40) is bounded, one can actually take in (40) to be compactly supported. But a continuous compactly supported function can be approximated uniformly by piecewise linear functions. The latter are linear combinations of functions of the form where . It therefore follows that
[TABLE]
is also convergence determining for the weak topology on . Let be a function in (41), defined via and , and let be a bound for and . Define via
[TABLE]
(Where is the projection onto the first coordinates and is the projection onto the remaining coordinates.)
By dynamic programming (the Snell-envelope theorem) one has
[TABLE]
for every . This implies that the optimal stopping topology is finer than the initial topology of over in (41). As (41) is convergence determining for the weak topology on , the optimal stopping topology is indeed finer than the information topology and, as observed at the beginning of this proof, the -optimal stopping topology is therefore finer than the -information topology. ∎
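The proof above relies on writing piecewise linear functions as linear combinations of simple building blocks. The exact form of those building blocks is lost in this copy, so the sketch below assumes hinge functions of the form x ↦ max(x, c); this is an assumption, not a quotation of the paper. Under it, a continuous piecewise linear function that vanishes outside its knots is exactly a linear combination of such hinges, with coefficients given by the jumps of the slope. The helper names `hinge_coefficients` and `evaluate` are hypothetical.

```python
def hinge_coefficients(knots, values):
    """Coefficients a_i with f(x) = sum_i a_i * max(x, knots[i]).

    f is the piecewise linear interpolant of (knots, values), extended by zero;
    this requires values[0] == values[-1] == 0 and strictly increasing knots.
    """
    slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
              for i in range(len(knots) - 1)]
    padded = [0.0] + slopes + [0.0]  # f is flat (zero) outside the knots
    # Each coefficient is the jump of the slope at the corresponding knot.
    return [padded[i + 1] - padded[i] for i in range(len(knots))]


def evaluate(knots, coeffs, x):
    """Evaluate the hinge combination sum_i coeffs[i] * max(x, knots[i])."""
    return sum(a * max(x, c) for a, c in zip(coeffs, knots))


# Hypothetical example: a tent function supported on [0, 3] with peak 2 at x = 1.
knots = [0.0, 1.0, 3.0]
values = [0.0, 2.0, 0.0]
coeffs = hinge_coefficients(knots, values)
```

No additive constant is needed here: because the function vanishes to the right of the last knot, the coefficients sum to zero and so do the products of coefficients and knots, which is what makes the max-form and the more familiar positive-part form agree.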
References
- [1] B. Acciaio, J. Backhoff-Veraguas, and R. Carmona. Extended mean field control problems: stochastic maximum principle and transport perspective. arXiv preprint arXiv:1802.05754, 2018.
- [2] B. Acciaio, J. Backhoff-Veraguas, and A. Zalashko. Causal optimal transport and its links to enlargement of filtrations and continuous-time stochastic optimization. ArXiv e-prints, 2016.
- [3] D. J. Aldous. Weak convergence and general theory of processes. Unpublished draft of monograph; Department of Statistics, University of California, Berkeley, CA 94720, July 1981.
- [4] L. Ambrosio, N. Gigli, and G. Savaré. Gradient flows in metric spaces and in the space of probability measures. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, second edition, 2008.
- [5] J. Backhoff-Veraguas, D. Bartl, M. Beiglböck, and M. Eder. Adapted Wasserstein Distances and Stability in Mathematical Finance. arXiv e-prints, page arXiv:1901.07450, Jan 2019.
- [6] J. Backhoff-Veraguas, M. Beiglböck, M. Eder, and A. Pichler. Fundamental properties of process distances. ArXiv e-prints, 2017.
- [7] J. Backhoff-Veraguas, M. Beiglböck, M. Huesmann, and S. Källblad. Martingale Benamou–Brenier: a probabilistic perspective. ArXiv e-prints, Aug. 2017.
- [8] J. Backhoff-Veraguas, M. Beiglböck, Y. Lin, and A. Zalashko. Causal transport in discrete time and applications. SIAM Journal on Optimization, 27(4):2528–2562, 2017.
