Convergence of the Population Dynamics algorithm in the Wasserstein metric
Mariana Olvera-Cravioto

TL;DR
This paper proves the convergence of the population dynamics algorithm in the Wasserstein metric for a class of stochastic fixed-point equations, ensuring the algorithm's reliability in approximating the special endogenous solution.
Contribution
It establishes the convergence in Wasserstein metric and the consistency of estimators for the population dynamics algorithm applied to stochastic fixed-point equations.
Findings
Convergence in the Wasserstein metric of order $p$ ($p \geq 1$) is proven.
Sample-based estimators are shown to be consistent.
The results validate the algorithm's effectiveness in approximating the solution.
Abstract
We study the convergence of the population dynamics algorithm, which produces sample pools of random variables having a distribution that closely approximates that of the special endogenous solution to a stochastic fixed-point equation of the form:
$$R \stackrel{\mathcal{D}}{=} \Phi(Q, N, \{C_i\}, \{R_i\}),$$
where $(Q, N, \{C_i\})$ is a real-valued random vector with $N \in \mathbb{N}$, and $\{R_i\}_{i \in \mathbb{N}}$ is a sequence of i.i.d. copies of $R$, independent of $(Q, N, \{C_i\})$; the symbol $\stackrel{\mathcal{D}}{=}$ denotes equality in distribution. Specifically, we show its convergence in the Wasserstein metric of order $p$ ($p \geq 1$) and prove the consistency of estimators based on the sample pool produced by the algorithm.
Keywords: Population dynamics; iterative bootstrap; Wasserstein metric; distributional fixed-point equations.
1 Introduction
We study an iterative bootstrap algorithm, known as the “population dynamics” algorithm, that can be used to efficiently generate samples of random variables whose distribution closely approximates that of the so-called special endogenous solution to a stochastic fixed-point equation (SFPE) of the form:
$$R \stackrel{\mathcal{D}}{=} \Phi(Q, N, \{C_i\}, \{R_i\}), \qquad (1.1)$$
where $(Q, N, \{C_i\})$ is a real-valued random vector with $N \in \mathbb{N}$, and $\{R_i\}_{i \in \mathbb{N}}$ is a sequence of i.i.d. copies of $R$, independent of $(Q, N, \{C_i\})$. These equations appear in a variety of problems, ranging from computer science to statistical physics, e.g., in the analysis of divide and conquer algorithms such as Quicksort [27, 14, 28] and FIND [13], the analysis of Google’s PageRank algorithm [30, 18, 9, 23], the study of queueing networks with synchronization requirements [22, 26], and the analysis of the Ising model [12], to name a few. In general, SFPEs of the form in (1.1) can have multiple solutions, but in most cases we are interested in computing those that can be explicitly constructed on a weighted branching process, known as endogenous solutions. In some cases, even the endogenous solution is not unique [5], but characterizing all endogenous solutions can be done using the special endogenous solution, which is the only attracting solution, and can be constructed by iterating (1.1) starting from some well-behaved initial distribution.
This work focuses on the analysis of a simulation algorithm that can be used to generate samples from a distribution that closely approximates that of the special endogenous solution to a variety of SFPEs. The need for such an approximate algorithm lies in the numerical complexity of simulating even a few generations of a weighted branching process using naive Monte Carlo methods. The population dynamics algorithm, described in §14.6.4 in [25] and §8.1 in [1], circumvents this problem by resampling with replacement from previously computed iterations of (1.1), i.e., by using an iterative bootstrap technique. However, as is the case with the standard bootstrap algorithm, the samples obtained are neither independent nor exactly distributed according to the target distribution, which raises the need to study the convergence properties of the algorithm.
Before presenting the algorithm and stating our main results, it may be helpful to describe in more detail some of the examples mentioned above. Throughout the paper, we use $x \vee y$ and $x \wedge y$ to denote the maximum and the minimum, respectively, of $x$ and $y$.
- The linear SFPE or “smoothing transform”:
$$R \stackrel{\mathcal{D}}{=} Q + \sum_{i=1}^{N} C_i R_i \qquad (1.2)$$
appears in the analysis of the number of comparisons required by the sorting algorithm Quicksort [27, 14, 28], and can also be used to describe the distribution of the ranks computed by Google’s PageRank algorithm on directed complex networks [30, 18, 9, 23].
- The maximum SFPE or “high-order Lindley equation”:
$$R \stackrel{\mathcal{D}}{=} Q \vee \bigvee_{i=1}^{N} C_i R_i \qquad (1.3)$$
arises as the limiting waiting time distribution on queueing networks with parallel servers and synchronization requirements [22, 26] and in the analysis of the branching random walk [1].
- The discounted tree-sum SFPE:
$$R \stackrel{\mathcal{D}}{=} Q + \bigvee_{i=1}^{N} C_i R_i \qquad (1.4)$$
appears in the worst-case analysis of the FIND algorithm [13] and the analysis of the “discounted branching random walk” [6].
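- The “free-entropy” SFPE:
$$R \stackrel{\mathcal{D}}{=} Q + \sum_{i=1}^{N} \tanh^{-1}\left( \tanh(\beta) \tanh(R_i) \right) \qquad (1.5)$$
characterizes the asymptotic free-entropy density in the ferromagnetic Ising model on locally tree-like graphs [12]. In this case, $C_i \equiv \beta$ for all $i$, $\beta$ represents the “inverse temperature”, and $Q$ the magnetic field.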
- Although the analysis presented here does not directly apply to this case, we mention that the population dynamics algorithm can also be used to simulate the fixed points of the belief propagation equations on random graphical models [25]:
[TABLE]
where the are i.i.d. copies of independent of the vector and the are i.i.d. copies of independent of the vector , with and potentially different.
We refer the reader to [1] for even more examples, including some involving minimums.
The existence and uniqueness of solutions to any of these SFPEs is in itself a non-trivial problem. We refer the reader again to [1] for a broad survey of known results and open problems on this topic. The most well-studied equations are the linear (1.2) and maximum (1.3) SFPEs, which have been extensively studied in [24, 16, 2, 4, 5, 3, 17] and [7, 21], respectively. However, to provide some context to where the population dynamics algorithm fits in, we briefly mention that the existence of solutions is often established by showing that the transformation that maps the distribution on to the distribution of
[TABLE]
where the are i.i.d. random variables distributed according to , independent of the vector , is strictly contracting under some suitable metric. Note that in this case, we have that the sequence of probability measures converges as to a fixed point of (1.1). Moreover, as long as the initial distribution has sufficiently light tails, one can show that converges to the special endogenous solution to (1.1), and the contracting nature of provides an upper bound of the form
[TABLE]
for some constant , where is the distance under which is a contraction.
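To make the geometric bound concrete, the sketch below works through a degenerate case that is our own illustration and not taken from the paper: the linear map with deterministic $Q \equiv q$, $N \equiv 1$ and $C \equiv c$, $|c| < 1$. Every iterate is then a point mass, the Wasserstein distance between two point masses is the absolute difference of their locations, and the distance to the fixed point $q/(1-c)$ shrinks by a factor of $c$ at each step. The names `iterate_linear_map`, `q` and `c` are hypothetical.

```python
def iterate_linear_map(q, c, r0, k):
    """Apply the deterministic map r -> q + c * r exactly k times, starting at r0."""
    r = r0
    for _ in range(k):
        r = q + c * r
    return r

# Hypothetical constants for illustration; any |c| < 1 gives a contraction.
q, c = 1.0, 0.5
fixed_point = q / (1.0 - c)

# Distance between the k-th iterate (started at 0) and the fixed point.
errors = [abs(iterate_linear_map(q, c, 0.0, k) - fixed_point) for k in range(8)]

# Successive ratios are all equal to c, i.e. the error decays geometrically.
ratios = [errors[k + 1] / errors[k] for k in range(7)]
```

In this toy case the bound holds with equality: the error after $k$ steps is exactly $c^k$ times the initial error.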
As will be discussed in more detail later (see Examples 2.7), all the examples provided earlier define contractions under , the Wasserstein metric of order , for some . For completeness, we also include a result (Theorem 2.5) that gives easy to verify conditions guaranteeing that
[TABLE]
for some , where and have distributions and , respectively.
It follows that from a computational point of view, it suffices to have an algorithm for computing for a fixed number of iterations . The population dynamics algorithm produces a sample of observations approximately distributed according to , which can also be helpful in searching for the existence of endogenous solutions, as stated in [1]. We now describe how to obtain an exact sample of , which will also make clear the need for a computationally efficient method.
1.1 Constructing endogenous solutions on weighted branching processes
As mentioned earlier, the attracting endogenous solution to (1.1), provided it exists, can be constructed on a structure known as a weighted branching process [27]. We now elaborate on this point.
Let be the set of positive integers and let be the set of all finite sequences , , where by convention contains the null sequence . To ease the exposition, we will use to denote the index concatenation operation. Next, let be a real-valued vector with . We will refer to this vector as the generic branching vector. Now let be a sequence of i.i.d. copies of the generic branching vector. To construct a weighted branching process we start by defining a tree as follows: let denote the root of the tree, and define the th generation according to the recursion
[TABLE]
Now, assign to each node in the tree a weight according to the recursion
[TABLE]
see Figure 1. If and for all , the weighted branching process reduces to a Galton-Watson process.
To generate a sample from we first need to fix the initial distribution , e.g., by letting be the probability measure of a constant, say zero or one. Now construct a weighted branching process with generations, and let be i.i.d. random variables having distribution . Next, define recursively for each , ,
[TABLE]
The random variable is distributed according to , and its generation requires on average i.i.d. copies of the generic branching vector . It follows that if the goal was to obtain an i.i.d. sample of size from distribution , one would need to generate on average copies of the generic branching vector. However, in applications one typically has , e.g., for Quicksort, in the analysis of PageRank on the WWW graph, and can be in the hundreds for MapReduce implementations related to the maximum SFPE. This makes the exact simulation of using a weighted branching process impractical.
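The cost of the exact construction can be seen in a short recursive sketch. It assumes the linear map, an initial distribution concentrated at zero, and a made-up branching vector (`sample_branching_vector` is hypothetical, not one of the paper's examples). Each call consumes one branching vector per tree node, so the number of vectors needed for a single exact draw grows geometrically in the depth whenever the mean number of offspring exceeds one.

```python
import random

def sample_branching_vector(rng):
    """Hypothetical branching vector: Q ~ U(0,1), N uniform on {1,2,3}, C_i ~ U(0,0.3)."""
    n = rng.randint(1, 3)
    q = rng.random()
    weights = [0.3 * rng.random() for _ in range(n)]
    return q, weights

def exact_sample(k, rng, counter):
    """One exact draw of the k-th iterate of the linear map, starting from 0.

    counter[0] is incremented once per branching vector generated,
    i.e. once per node of the tree in generations 0..k-1.
    """
    if k == 0:
        return 0.0
    q, weights = sample_branching_vector(rng)
    counter[0] += 1
    # Each child needs its own independent subtree of depth k - 1.
    return q + sum(c * exact_sample(k - 1, rng, counter) for c in weights)

counter = [0]
value = exact_sample(5, random.Random(0), counter)
vectors_used = counter[0]
```

With up to three offspring per node, a depth-5 draw uses between 5 and 121 branching vectors, and that cost is paid again for every additional independent sample.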
The population dynamics algorithm, described below, uses a bootstrap approach to produce a sample of size of random variables that are approximately distributed according to , and that although not independent, can be used to obtain consistent estimators for moments, quantiles and other functions of .
1.2 The population dynamics algorithm
The population dynamics algorithm is based on the bootstrap, i.e., on the idea of sampling random variables with replacement from a common pool. As described above, the algorithm starts by generating a sample of i.i.d. random variables having distribution , with the difference that when computing the next level of the recursion, it samples with replacement from this pool as needed by the map . In other words, to obtain a pool of approximate copies of we bootstrap from the pool previously obtained of approximate copies of . The approximation lies in the fact that we are not sampling from itself, but from a finite sample of conditionally independent observations that are only approximately distributed as . The algorithm is described below.
Let denote the generic branching vector defining the weighted branching process. Let be the depth of the recursion that we want to simulate, i.e., the algorithm will produce a sample of random variables approximately distributed according to . Choose to be the bootstrap sample size. For each , the algorithm outputs , which we refer to as the sample pool at level .
1. Initialize: Set . Simulate a sequence of i.i.d. random variables distributed according to some initial distribution . Let for . Output and update .
2. While :
   (a) Simulate a sequence of i.i.d. copies of the generic branching vector, independent of everything else.
   (b) Let
   [TABLE]
   where the are sampled uniformly with replacement from the pool .
   (c) Output and update .
We conclude this section by pointing out that the complexity of the algorithm described above is of order , while the naive Monte Carlo approach described earlier, which consists in sampling i.i.d. copies of a weighted branching process up to the th generation, has order . Our main results establish the convergence of the algorithm in the Wasserstein metric of order (), as well as the consistency of estimators constructed using the pool . The following section contains all the statements, and the proofs are given in Section 3.
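The steps of the algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the map is passed in as a function `phi`, the branching vector sampler `example_branching_vector` is made up for the example, and the names `k` (recursion depth) and `m` (pool size) are our own labels. Each level costs `m` branching vectors, so the whole run uses `k * m` of them.

```python
import random

def linear_phi(q, weights, draws):
    """Linear map: q + sum_i weights[i] * draws[i]."""
    return q + sum(c * r for c, r in zip(weights, draws))

def example_branching_vector(rng):
    """Hypothetical branching vector: Q ~ U(0,1), N uniform on {1,2,3}, C_i ~ U(0,0.3)."""
    n = rng.randint(1, 3)
    return rng.random(), [0.3 * rng.random() for _ in range(n)]

def population_dynamics(phi, sample_vector, init_sampler, k, m, rng):
    """Return the list of sample pools for levels 0..k, each of size m."""
    pool = [init_sampler(rng) for _ in range(m)]  # level-0 pool: i.i.d. draws
    pools = [pool]
    for _ in range(k):
        new_pool = []
        for _ in range(m):
            q, weights = sample_vector(rng)
            # Bootstrap step: draw with replacement from the previous pool.
            draws = [rng.choice(pool) for _ in weights]
            new_pool.append(phi(q, weights, draws))
        pool = new_pool
        pools.append(pool)
    return pools

rng = random.Random(0)
pools = population_dynamics(linear_phi, example_branching_vector,
                            lambda r: 0.0, k=5, m=1000, rng=rng)
```

Passing the map as a function keeps the sketch usable for the maximum or discounted tree-sum maps as well, by swapping `linear_phi` for a function with the same signature.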
2 Main results
We start by defining the Wasserstein metric of order .
Definition 2.1
Let denote the set of joint probability measures on with marginals and . Then, the Wasserstein metric of order () between and is given by
[TABLE]
An important advantage of working with the Wasserstein metrics is that on the real line they admit the explicit representation
[TABLE]
where and are the cumulative distribution functions of and , respectively, and denotes the generalized inverse of . It follows that the optimal coupling of two real random variables and is given by , where is uniformly distributed in .
With some abuse of notation, we use to denote the Wasserstein distance of order between the probability measures and , where and are their corresponding cumulative distribution functions.
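For two empirical distributions with the same number of atoms, the quantile representation reduces to matching order statistics: sort both samples, pair them in order, and average the $p$-th powers of the differences. The sketch below computes this; the function name `wasserstein_p_empirical` is our own, and the equal-size restriction is a simplifying assumption of the example.

```python
def wasserstein_p_empirical(xs, ys, p=1.0):
    """Wasserstein distance of order p between two equal-size empirical distributions.

    Uses the quantile (sorted) coupling, which is optimal on the real line.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    # Pair the i-th smallest of xs with the i-th smallest of ys.
    total = sum(abs(a - b) ** p for a, b in zip(xs_sorted, ys_sorted))
    return (total / len(xs)) ** (1.0 / p)

d_shift = wasserstein_p_empirical([0.0, 1.0], [1.0, 2.0], p=2.0)   # pure shift by 1
d_perm = wasserstein_p_empirical([3.0, 1.0, 2.0], [2.0, 3.0, 1.0])  # same atoms
```

A shift of every atom by one gives distance one for any order, and two samples with the same atoms are at distance zero regardless of ordering.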
Our main results establish the convergence of as , both in mean and almost surely, where
[TABLE]
and is the pool generated by the population dynamics algorithm. The theorems are proven under two different assumptions, the first one imposing a Lipschitz condition on the mean of , and the second one requiring to be Lipschitz continuous almost surely.
Assumption 2.2
For some there exists a constant such that if is a sequence of i.i.d. random vectors, independent of , then
[TABLE]
[linear0] For the linear SFPE (1.2), it suffices that the inequality holds for and having the same mean.
Assumption 2.3
Suppose that for any vector , with , and any sequences of numbers and for which and are well defined, there exists a function such that
[TABLE]
Remarks 2.4
- (i) To see that Assumption 2.3 implies Assumption 2.2, note that Lemma 4.1 in [20] gives that
[TABLE]
and therefore Assumption 2.2 holds with , provided the expectation is finite. However, much tighter bounds can be obtained for specific examples, and we can usually find such that .
- (ii) The existence of a for which is important for obtaining estimates for the rate of convergence of the algorithm that are uniform in , and also has important implications for the convergence of as , as the next result shows.
Theorem 2.5
Suppose Assumption 2.2 holds for some , , and any i.i.d. sequence independent of . Then, provided , there exists a random variable and constants and such that
[TABLE]
where and are distributed according to and , respectively. For the linear SFPE (1.2), we have that (2.2) also holds under either of the following conditions:
- i) If Assumption 2.2 [linear0] holds and .
- ii) If and , where .
As the proof of Theorem 2.5 shows, one can take under the main set of conditions as well as under conditions (i), whereas for (ii) we have . As a consequence of the proof of Theorem 2.5 we also obtain the following explicit bound for the moments of .
Lemma 2.6
Suppose Assumption 2.2 holds for some . In the linear case, if only Assumption 2.2 [linear0] holds, suppose further that . Then, for any ,
[TABLE]
where .
Before stating the main theorems establishing the convergence of the algorithm in the Wasserstein metric, we point out how Assumptions 2.2 and 2.3 are satisfied by all the examples mentioned in the introduction.
Examples 2.7
- The linear SFPE (1.2) clearly satisfies Assumption 2.3 with . Moreover, for the Quicksort algorithm studied in [27, 14, 28] we have , and , with uniformly distributed on and , in which case we can take any and in Assumption 2.2 [linear0]. Lemma 2.6 also gives that is uniformly bounded in for all .
For the PageRank algorithm studied in [30, 18, 9] we have i.i.d. and independent of , a.s., and for any . Hence, we can take and in Assumption 2.2. Furthermore, Theorem 2.5(ii) gives that for some provided , which in turn gives the uniform boundedness of .
- Using the inequality
[TABLE]
for any real numbers and any , we obtain that the maximum SFPE (1.3) satisfies Assumption 2.3 with as well. Furthermore, in the analysis of queueing networks with parallel servers and synchronization requirements from [22, 26], where (equivalently, ), the stability condition of the system implies that for any whenever the system is stable. Lemma 2.6 then implies that is uniformly bounded in for all .
- In the case of the discounted tree-sum SFPE (1.4), inequality (2.3) implies that we can also take in Assumption 2.3. For the analysis of the FIND algorithm in [13] in particular, we have , and , with uniformly distributed on , and we can take for any in Assumption 2.2. Lemma 2.6 then gives that is uniformly bounded in for all .
- To see that (1.5) also satisfies Assumption 2.3 with (in this case for all ), let (since ) and note that the function
[TABLE]
has derivative
[TABLE]
and therefore satisfies
[TABLE]
Assumption 2.2 is then satisfied for and , with at high temperatures (). Moreover, since , in the “free entropy” SFPE (1.5) is smaller than or equal to , where
[TABLE]
Hence, provided , Theorem 2.5(ii) gives that for any for which , is uniformly bounded in .
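The two Lipschitz-type facts used in the examples above can be checked numerically. The sketch below rests on two assumptions of ours, since the displayed formulas are not legible in this copy: that the inequality for maxima is of the form $|\max_i x_i - \max_i y_i| \leq \max_i |x_i - y_i|$, and that the function in the free-entropy example is $x \mapsto \tanh^{-1}(c \tanh(x))$ with $c \in [0, 1)$, whose derivative is bounded by $c$. The helper names are hypothetical.

```python
import math
import random

def max_gap(xs, ys):
    """Return max_i |x_i - y_i| - |max(xs) - max(ys)|, expected to be >= 0."""
    return max(abs(a - b) for a, b in zip(xs, ys)) - abs(max(xs) - max(ys))

def f(x, c):
    """Assumed form of the free-entropy summand: atanh(c * tanh(x)), 0 <= c < 1."""
    return math.atanh(c * math.tanh(x))

def lipschitz_slack(x, y, c):
    """Return c * |x - y| - |f(x) - f(y)|, expected to be >= 0 up to rounding."""
    return c * abs(x - y) - abs(f(x, c) - f(y, c))

rng = random.Random(0)

# Smallest slack over random test points; a negative value would be a counterexample.
worst_max = min(
    max_gap([rng.uniform(-5, 5) for _ in range(4)],
            [rng.uniform(-5, 5) for _ in range(4)])
    for _ in range(1000)
)
worst_f = min(
    lipschitz_slack(rng.uniform(-5, 5), rng.uniform(-5, 5), 0.7)
    for _ in range(1000)
)
```

A random search of this kind cannot prove either inequality; it only gives a quick sanity check that no counterexample shows up among the sampled points.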
Our first result establishes the convergence in mean of under the “optimal” moment conditions, that is, assuming only that . In view of Remark 2.4(ii), this is implied in all our examples by . This result was previously proven in [10] for the linear SFPE (1.2) for .
Theorem 2.8
Fix and suppose that satisfies Assumption 2.2, or Assumption 2.2 [linear0], for . Assume further that for any fixed , . Let be an i.i.d. sample from distribution , and let denote their corresponding empirical distribution function. Then,
[TABLE]
where is the same as in Assumption 2.2. Moreover, if for , , then
[TABLE]
where is a constant that only depends on and .
Remarks 2.9
- (i) Note that Assumption 2.2 does not require that , i.e., it is not necessary for to define a contraction for the algorithm to work. However, when the bound provided by Theorem 2.8 becomes independent of , ensuring that the complexity of the population dynamics algorithm remains linear in , rather than exponential, i.e., , as for the naive algorithm. When for all the bound given above may grow with the level of the recursion, i.e., the value of , and the convergence of the sequence as may not be guaranteed.
- (ii) Even in the case when for all , the explicit bounds provided by Theorem 2.8 may be useful for determining whether endogenous solutions exist, since they guarantee that we can accurately approximate .
- (iii) We also point out that the first inequality in Theorem 2.8 implies that the rate at which converges to zero is determined by . Since corresponds to implementing the population dynamics algorithm by sampling without replacement from a “perfect” i.i.d. pool of observations from , this convergence rate is in some sense optimal.
- (iv) For all the examples given in Examples 2.7, we have and for some , making the bound provided by Theorem 2.8 independent of . Moreover, for the Quicksort and FIND algorithms, as well as for the queueing networks with parallel servers and synchronization requirements, the best possible rate of convergence is achieved, i.e., uniformly in .
We now turn our attention to the almost sure convergence of , for which we provide two different results. The first one holds under Assumption 2.2 as above, but under rather strong moment conditions. Note that for the linear case Assumption 2.2, in its general form, holds for any for which by Remark 2.4(i). Allowing Assumption 2.2 to hold for only is important for guaranteeing that we can choose in Theorem 2.8, but is unimportant for the almost sure convergence of the algorithm.
Theorem 2.10
Fix and suppose that satisfies Assumption 2.2 for both and . Assume further that for any fixed , . Then,
[TABLE]
The moment condition requiring the finiteness of the absolute moment also appears in some related (stronger) results for the convergence of the Wasserstein distance between a distribution function and its empirical measure, specifically, concentration inequalities [15] and a central limit theorem [11]. In our case, where we seek only to establish the almost sure convergence of the algorithm, this condition is too strong, so we provide below an improved result under the finer Assumption 2.3.
Theorem 2.11
Fix and suppose that satisfies Assumption 2.3. Assume further that for some , where . Then, for any fixed ,
[TABLE]
Our last result relates the convergence of to the consistency of estimators based on the pool . More precisely, the value of the algorithm lies in the fact that it efficiently produces a sample of identically distributed random variables whose distribution is approximately . A natural estimator for quantities of the form is then given by
[TABLE]
However, the random variables in are not independent of each other, and the consistency of such estimators requires proof. In the sequel, the symbol denotes convergence in probability.
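In code, the natural estimator is simply the average of a function over the final pool. The sketch below assumes the estimator has this sample-mean form; the name `pool_estimate` and the test function are our own choices for illustration.

```python
def pool_estimate(h, pool):
    """Average of h over the sample pool: (1/m) * sum_i h(pool[i])."""
    if not pool:
        raise ValueError("pool must be non-empty")
    return sum(h(r) for r in pool) / len(pool)

# Example: estimate the second moment from a small hypothetical pool.
second_moment = pool_estimate(lambda r: r * r, [1.0, 2.0, 3.0])
```

Because the entries of the pool are dependent, the usual law of large numbers does not apply directly, which is why consistency has to be derived from convergence in the Wasserstein metric.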
Definition 2.12
We say that is a weakly consistent estimator for if as . We say that it is a strongly consistent estimator for if a.s.
Our last result shows the consistency of estimators of the form in (2.4) for a broad class of functions.
Proposition 2.13
Fix and suppose that satisfies for all and some constant . Then, the following hold:
- a.) If as , then (2.4) is a weakly consistent estimator for for each fixed .
- b.) If a.s., as , then (2.4) is a strongly consistent estimator for for each fixed .
We conclude that the population dynamics algorithm can be used to efficiently generate sample pools of random variables having a distribution that closely approximates that of the special endogenous solution to SFPEs of the form in (1.1). Furthermore, these sample pools can be used to produce consistent estimators for a broad class of functions. The gain of efficiency of the algorithm compared to a naive Monte Carlo approach, combined with the consistency guarantees proved in this paper, make it extremely useful for the numerical analysis of many problems where SFPEs appear.
3 Proofs
This section includes the proofs of Theorems 2.8, 2.10, 2.11, Proposition 2.13, Theorem 2.5, and of Lemma 2.6, in that order. The last two appear at the end since they are not directly related to the Population Dynamics algorithm. The first four proofs are based on a construction of the pools where we carefully couple the random variables with i.i.d. observations from their limiting distribution .
To start, for any let
[TABLE]
be a collection of i.i.d. random vectors where has the same distribution as the generic branching vector and the are i.i.d. random variables uniformly distributed in , independent of . Next, we recursively construct a sequence of random variables as follows:
- i. Set , for ; define
[TABLE]
- ii. For and each ,
[TABLE]
define
[TABLE]
Note that the random variables are i.i.d. and have distribution , and therefore, is an empirical distribution function for . The distribution functions are those obtained through the population dynamics algorithm.
Throughout the proofs we will also use repeatedly the sigma-algebra for . We point out that all the random variables are measurable with respect to for all .
We are now ready to prove Theorem 2.8.
Proof of Theorem 2.8. Let be a sequence of random vectors constructed as explained above.
Next, note that from the triangle inequality we obtain
[TABLE]
Now let be a Uniform(0,1) random variable independent of everything else, and define the random variables
[TABLE]
which conditionally on are distributed according to and , respectively. Then, from the definition of we have
[TABLE]
It follows from the observation that the random variables are identically distributed, that
[TABLE]
Next, suppose first that Assumption 2.2 holds for any and , and note that
[TABLE]
For the linear case when only Assumption 2.2 [linear0] holds, note that
[TABLE]
and therefore,
[TABLE]
It now follows from (3.2) and Minkowski’s inequality, that
[TABLE]
Iterating the recursion above we obtain
[TABLE]
Now let and use the fact that is concave to obtain
[TABLE]
or equivalently,
[TABLE]
This completes the first part of the proof.
Next, assume that for , , and use Theorem 1 in [15] to obtain that
[TABLE]
where is a constant that does not depend on . The second statement of the theorem now follows.
We now turn to the proof of Theorem 2.10. To simplify its exposition we first provide a preliminary result for the mean Wasserstein distance between a distribution and its empirical distribution function.
Lemma 3.1
Let be a distribution on and let be i.i.d. random variables distributed according to . Suppose for some , and let denote the empirical distribution function of the . Then,
[TABLE]
Proof. Fix and define for the functions
[TABLE]
Next, use Proposition 7.14 in [8] followed by the monotonicity of the norm, to see that
[TABLE]
where is the generalized inverse of function .
Next, to bound (3.4) note that
[TABLE]
where in the last equality we used the observation that , respectively, . Now note that for any we have
[TABLE]
Hence, (3.4) is bounded from above by a constant times
[TABLE]
To analyze (3.5) use the observation that to obtain that
[TABLE]
Since and
[TABLE]
we obtain that (3.5) is finite. Finally, the same steps used to bound (3.5) give that (3.6) is bounded by
[TABLE]
We now give the proof for the first result on the almost sure convergence of the algorithm. The idea of the proof is to first identify a recursive formula for the Wasserstein distance as it was done for the convergence in mean theorem. Once we do this, the main difficulty lies in ensuring that the errors in the bound converge sufficiently fast to satisfy the criterion for almost sure convergence in the Borel-Cantelli lemma. In the case when we have a bit more than finite moments this can be done using Chebyshev’s inequality, similarly to the proof of the strong law of large numbers under finite fourth moment conditions. We start with this case below.
Proof of Theorem 2.10. We will start the proof by deriving an upper bound for . To this end, we construct the random variables according to the construction given at the beginning of the section. Recall that , where is given by (3.1), and that Assumption 2.2 holds for both and .
We start by noting that the triangle inequality followed by (3.3) give
[TABLE]
Next, define for , and note that by construction, the random variables are identically distributed and conditionally independent given . Now set and note that
[TABLE]
It follows that
[TABLE]
which in turn implies that
[TABLE]
where in the last step we used the inequality for and . Iterating (3.8) more times we obtain
[TABLE]
Now note that by the Glivenko-Cantelli lemma and the strong law of large numbers,
[TABLE]
as , and therefore, by Definition 6.8 and Theorem 6.9 in [29], a.s. for each . It suffices then to show that for each the sums a.s. as well.
To see this note that for any ,
[TABLE]
Moreover, using the same arguments we used in the proof of Theorem 2.8, we obtain that
[TABLE]
Next, note that by Theorem 2.8 we have
[TABLE]
It follows that for any ,
[TABLE]
Finally, since by Lemma 3.1 we have that
[TABLE]
for each , the Borel-Cantelli Lemma gives that a.s. This completes the proof.
We now move on to the proof of Theorem 2.11, where we only have a bit more than finite moments. In this case, we cannot use Chebyshev’s inequality to verify the condition for the Borel-Cantelli lemma, and a finer analysis of the errors is required. In particular, our proof uses the Lipschitz condition from Assumption 2.3 to derive a large-deviations bound for the sum of independent random variables appearing in the recursive analysis of . Before proceeding to the main proof, we give three preliminary results. The first one provides an upper bound for the generalized inverse of any distribution function having finite absolute moments.
Lemma 3.2
Let be a distribution function on , and let be its generalized inverse. Suppose that has finite absolute moments of order . Then, for any ,
[TABLE]
Proof. Let be a random variable having distribution , and define and . Then,
[TABLE]
while if we define to be the right-continuous generalized inverse of , then
[TABLE]
Now use Markov’s inequality to obtain that for all ,
[TABLE]
and
[TABLE]
The first inequality implies that for any ,
[TABLE]
while the second one plus the continuity of gives
[TABLE]
It follows that
[TABLE]
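The Markov-inequality argument above can be illustrated on an empirical distribution. The sketch assumes a bound of the form $|F^{-1}(u)| \leq \left( E[|X|^p] / (u \wedge (1-u)) \right)^{1/p}$, which is our reading of the kind of estimate the lemma provides and follows from Markov's inequality applied to each tail; the exact constants in the lemma's display may differ. The helper names are hypothetical.

```python
import math

def empirical_quantile(sorted_xs, u):
    """Generalized inverse of the empirical CDF: the ceil(n*u)-th order statistic."""
    n = len(sorted_xs)
    idx = max(math.ceil(n * u), 1) - 1
    return sorted_xs[idx]

def markov_quantile_bound(xs, u, p):
    """(E|X|^p / min(u, 1 - u))^(1/p) under the empirical distribution of xs."""
    moment = sum(abs(x) ** p for x in xs) / len(xs)
    return (moment / min(u, 1.0 - u)) ** (1.0 / p)

data = sorted([-3.0, -1.0, 0.0, 2.0, 5.0])   # hypothetical sample
checks = [
    (abs(empirical_quantile(data, u)), markov_quantile_bound(data, u, p))
    for p in (1.0, 2.0)
    for u in (0.1, 0.3, 0.5, 0.7, 0.9)
]
bound_holds = all(q <= b for q, b in checks)
```

The bound blows up as $u$ approaches 0 or 1, which is the behavior one expects from a tail estimate based only on a finite moment.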
The next two preliminary results provide key steps for the proof of Theorem 2.11, which essentially consist in giving a large-deviations bound (uniform in ) for the sample mean of (conditionally) i.i.d. random variables. The random variables defined below will be used as upper bounds for in the proof of Theorem 2.11, and the estimates we need have to be very tight considering that we no longer have finite second moments, so the rate of convergence to their mean can be very slow. The lemma below gives an upper bound for the truncated summands.
Lemma 3.3
Fix and . Suppose Assumption 2.3 holds and for some , where . Let , where is defined by (3.1), set , , , and
[TABLE]
for . Then, on the event , we have
[TABLE]
Proof. We start by noting that
[TABLE]
To bound each of the probabilities in (3.15) use Chernoff’s bound to obtain that
[TABLE]
Note that by Remark 2.4(i), we have that on the event ,
[TABLE]
Next, use the inequality for to obtain that
[TABLE]
Now use the inequality to see that
[TABLE]
It follows that by choosing we obtain
[TABLE]
which in turn implies that (3.15) is bounded from above by
[TABLE]
This completes the proof.
The next lemma gives the complementary estimate for the probability that any of the exceeds the truncation value in Lemma 3.3. The challenge here is the uniformity in of the result.
Lemma 3.4
Fix . Suppose Assumption 2.3 holds and for some , where . Let and for , fix , and let be defined according to (3.9). Then, for any and all ,
[TABLE]
Proof. To simplify the notation, let
[TABLE]
Next, note that
[TABLE]
where
[TABLE]
Now, let , where is given by (3.1), and note that
[TABLE]
Moreover, if we let and use Lemma 3.2, we obtain that, conditionally on ,
[TABLE]
Furthermore, by Minkowski’s inequality, we have that on the event ,
[TABLE]
It follows that conditionally on , we have that on the event ,
[TABLE]
where by Remark 2.4(ii).
Thus, we have that on the event , the union bound and Markov’s inequality yield
[TABLE]
where by assumption , and we have used the observation that . Finally, note that by Remark 2.4(i), we have
[TABLE]
and
[TABLE]
We conclude that
[TABLE]
We are now ready to prove Theorem 2.11, which proves by induction that a.s. as .
Proof of Theorem 2.11. Define for . We will prove by induction on that
[TABLE]
for . Since for all and , the Glivenko-Cantelli lemma and the strong law of large numbers yield
[TABLE]
Therefore, by Definition 6.8 and Theorem 6.9 in [29],
[TABLE]
Suppose now that (3.11) holds for . To prove that a.s. as , we start by constructing the random variables as explained at the beginning of this section. Now note that for any ,
[TABLE]
To analyze (3.13) note that its convergence to zero as is equivalent to the a.s. convergence of to zero as , which corresponds to the induction hypothesis (3.11).
To show that (3.14) converges to zero as , note that by Remark 2.4(ii) we have , which implies that . Hence, the Glivenko-Cantelli lemma, the strong law of large numbers, and Definition 6.8 and Theorem 6.9 in [29] give that a.s., which is equivalent to
[TABLE]
Next, to prove that (3.12) converges to zero we first define the random variables according to (3.9), and define the events
[TABLE]
Now use (3.3) and Assumption 2.3 to obtain
[TABLE]
To analyze (3.15), choose and let denote the sigma-algebra generated by , as given by (3.1). Note that
[TABLE]
By Lemma 3.3, we obtain that on the event , we have
[TABLE]
which implies that (3.15) is bounded from above by .
To analyze (3.16) note that
[TABLE]
Now set and , and note that . Then, by Lemma 3.4,
[TABLE]
for any , where
[TABLE]
by Remark 2.4(ii). It follows that (3.16) is bounded from above by
[TABLE]
for all . Since and
[TABLE]
as , we conclude that (3.12) is bounded from above by
[TABLE]
which converges to zero as . This completes the proof.
We now give below the proof of Proposition 2.13.
Proof of Proposition 2.13. The second statement of the proposition, regarding the almost sure convergence, follows directly from Definition 6.8 and Theorem 6.9 in [29]. For the convergence in probability we argue as follows.
Define and . By assumption, we have that in and therefore in probability, as . Hence, for every subsequence there is a further subsequence such that a.s. as . Definition 6.8 and Theorem 6.9 in [29] now give that
[TABLE]
We conclude that for any subsequence we can find a further subsequence such that (3.17) holds, and therefore,
[TABLE]
The remaining two proofs in the paper correspond to Theorem 2.5 and Lemma 2.6, which although not directly related to the Population Dynamics algorithm, may be of independent interest.
Proof of Theorem 2.5. Suppose first that Assumption 2.2 holds for any i.i.d. independent of . Recall that . Then, for any we have
[TABLE]
Moreover,
[TABLE]
It follows that for any we have
[TABLE]
which converges to zero as uniformly in whenever and . Therefore, the sequence is Cauchy, and since the Wasserstein space metrized by (see Definition 6.4 in [29]) is complete by Theorem 6.18 in [29], we have that there exists a random variable having distribution such that
[TABLE]
Equation (2.2) now follows by taking to obtain:
[TABLE]
and using the optimal coupling .
We now move to the linear SFPE (1.2), for which it is known (see [19]) that admits the explicit representation
[TABLE]
as described in Section 1.1. When conditions (i) hold we have for all and the arguments used above remain valid.
Suppose now that conditions (ii) hold, in which case we can take , where the are i.i.d. copies of . Therefore, Minkowski’s inequality gives
[TABLE]
where and . Now use Lemma 4.4 in [19] to obtain that under conditions (ii) there exist constants such that
[TABLE]
where . Hence,
[TABLE]
This completes the proof.
Finally, we provide the proof of Lemma 2.6.
Proof of Lemma 2.6. By (3.18) we have for any ,
[TABLE]
and by (3.19),
[TABLE]
Hence,
[TABLE]
and we obtain that
[TABLE]
References
[1] D.J. Aldous and A. Bandyopadhyay. A survey of max-type recursive distributional equations. Annals of Applied Probability, 15(2):1047–1110, 2005.
[2] G. Alsmeyer, J.D. Biggins, and M. Meiners. The functional equation of the smoothing transform. Ann. Probab., 40(5):2069–2105, 2012.
[3] G. Alsmeyer and P. Dyszewski. Thin tails of fixed points of the nonhomogeneous smoothing transform. Stochastic Processes and their Applications, 2017.
[4] G. Alsmeyer and M. Meiners. Fixed points of inhomogeneous smoothing transforms. J. Differ. Equ. Appl., 18(8):1287–1304, 2012.
[5] G. Alsmeyer and M. Meiners. Fixed points of the smoothing transform: Two-sided solutions. Probab. Theory Rel., 155(1-2):165–199, 2013.
[6] K.B. Athreya. Discounted branching random walks. Advances in Applied Probability, 17:53–66, 1985.
[7] J.D. Biggins. Lindley-type equations in the branching random walk. Stochastic Process. Appl., 75:105–133, 1998.
[8] S. Bobkov and M. Ledoux. One-dimensional empirical measures, order statistics, and Kantorovich transport distances. To appear in Memoirs of the American Mathematical Society, 2017.
