Convergence of the Population Dynamics algorithm in the Wasserstein metric

Mariana Olvera-Cravioto

arXiv:1705.09747·math.PR·February 12, 2018


TL;DR

This paper proves the convergence of the population dynamics algorithm in the Wasserstein metric for a class of stochastic fixed-point equations, ensuring the algorithm's reliability in approximating the special endogenous solution.

Contribution

It establishes convergence in the Wasserstein metric, and the consistency of sample-based estimators, for the population dynamics algorithm applied to stochastic fixed-point equations.

Findings

01

Convergence in the Wasserstein metric of order $p$ ($p \geq 1$) is proven.

02

Sample-based estimators are shown to be consistent.

03

The results validate the algorithm's effectiveness in approximating the solution.

Abstract

We study the convergence of the population dynamics algorithm, which produces sample pools of random variables having a distribution that closely approximates that of the special endogenous solution to a stochastic fixed-point equation of the form $R\stackrel{\mathcal{D}}{=}\Phi(Q,N,\{C_{i}\},\{R_{i}\})$, where $(Q,N,\{C_{i}\})$ is a real-valued random vector with $N\in\mathbb{N}$, and $\{R_{i}\}_{i\in\mathbb{N}}$ is a sequence of i.i.d. copies of $R$, independent of $(Q,N,\{C_{i}\})$; the symbol $\stackrel{\mathcal{D}}{=}$ denotes equality in distribution. Specifically, we show its convergence in the Wasserstein metric of order $p$ ($p\geq 1$) and prove the consistency of estimators based on the sample pool produced by the algorithm.

Equations

$$R\stackrel{\mathcal{D}}{=}\Phi(Q,N,\{C_{i}\},\{R_{i}\}),$$

$$R\stackrel{\mathcal{D}}{=}Q+\sum_{i=1}^{N}C_{i}R_{i},$$

$$R\stackrel{\mathcal{D}}{=}Q\vee\bigvee_{i=1}^{N}C_{i}R_{i},\quad\text{equivalently,}\quad X\stackrel{\mathcal{D}}{=}T\vee\bigvee_{i=1}^{N}(\xi_{i}+X_{i}),$$

$$R\stackrel{\mathcal{D}}{=}Q+\bigvee_{i=1}^{N}C_{i}R_{i}$$

$$R\stackrel{\mathcal{D}}{=}Q+\sum_{i=1}^{N}\operatorname{arctanh}\left(\tanh(\beta)\tanh(R_{i})\right)$$

$$R\stackrel{\mathcal{D}}{=}\Phi(Q,N,\{C_{i}\},\{\tilde{R}_{i}\})\quad\text{and}\quad\tilde{R}\stackrel{\mathcal{D}}{=}\Psi(\tilde{Q},\tilde{N},\{\tilde{C}_{i}\},\{R_{i}\}),$$

$$\Phi(Q,N,\{C_{i}\},\{X_{i}\}),$$

$$d(\mu_{n},\mu)\leq d(T(\mu_{n-1}),T(\mu))\leq c\,d(\mu_{n-1},\mu)\leq c^{n}d(\mu_{0},\mu),\qquad n=1,2,\dots,$$

$$E\left[|R^{(k)}-R|^{p}\right]\leq c^{k}$$

$$A_{n}=\{({\bf i},i_{n})\in U:{\bf i}\in A_{n-1},\,1\leq i_{n}\leq N_{\bf i}\},\qquad n\geq 1.$$

$$\Pi_{\emptyset}\equiv 1,\qquad\Pi_{({\bf i},i_{n})}=C_{({\bf i},i_{n})}\Pi_{\bf i},\qquad n\geq 1,$$

$$R_{\bf i}^{(r)}=\Phi\left(Q_{\bf i},N_{\bf i},\{C_{({\bf i},j)}\}_{j\geq 1},\{R_{({\bf i},j)}^{(r-1)}\}_{j\geq 1}\right).$$

$$\hat{R}_{i}^{(j,m)}=\Phi\left(Q_{i}^{(j)},N_{i}^{(j)},\{C_{(i,r)}^{(j)}\},\{\hat{R}_{(i,r)}^{(j-1,m)}\}\right),\qquad i=1,\dots,m,$$

$$d_{p}(\mu,\nu)=\inf_{\pi\in M(\mu,\nu)}\left(\int_{\mathbb{R}\times\mathbb{R}}|x-y|^{p}\,d\pi(x,y)\right)^{1/p}.$$

$$d_{p}(\mu,\nu)=\left(\int_{0}^{1}|F^{-1}(u)-G^{-1}(u)|^{p}\,du\right)^{1/p},$$

$$\hat{F}_{k,m}(x)=\frac{1}{m}\sum_{i=1}^{m}1(\hat{R}_{i}^{(k,m)}\leq x)\quad\text{and}\quad F_{k}(x)=\mu_{k}((-\infty,x]),\qquad k\in\mathbb{N},$$

$$E\left[\left|\Phi(Q,N,\{C_{r}\},\{X_{r}\})-\Phi(Q,N,\{C_{r}\},\{Y_{r}\})\right|^{p}\right]\leq H_{p}E\left[|X_{1}-Y_{1}|^{p}\right].$$

$$\left|\Phi(q,n,\{c_{r}\},\{x_{r}\})-\Phi(q,n,\{c_{r}\},\{y_{r}\})\right|\leq\sum_{r=1}^{n}\varphi(c_{r})|x_{r}-y_{r}|.$$

$$\begin{aligned}E\left[\left(\sum_{r=1}^{N}\varphi(C_{r})|X_{r}-Y_{r}|\right)^{p}\right]&\leq\cdots+E\left[\left(\sum_{r=1}^{N}\varphi(C_{r})\right)^{p}\right]\left(E\left[|X_{1}-Y_{1}|^{\lceil p\rceil-1}\right]\right)^{p/(\lceil p\rceil-1)}\\&\leq 2E\left[\left(\sum_{r=1}^{N}\varphi(C_{r})\right)^{p}\right]E\left[|X_{1}-Y_{1}|^{p}\right],\end{aligned}$$

$$E\left[|R^{(k)}-R|^{p}\right]\leq A_{p}c_{p}^{k}\to 0,\qquad k\to\infty,$$

$$\left(E\left[|R^{(k)}|^{p}\right]\right)^{1/p}\leq A_{p}\sum_{i=0}^{k-1}\left(H_{p}^{1/p}\right)^{i},$$

$$\max_{1\leq i\leq n}\{x_{i}\}-\max_{1\leq i\leq n}\{y_{i}\}\leq\max_{1\leq i\leq n}|x_{i}-y_{i}|\leq\sum_{i=1}^{n}|x_{i}-y_{i}|$$

$$f(x)=\frac{1}{2}\ln\left(\frac{e^{2x}(1+c)+1-c}{e^{2x}(1-c)+1+c}\right)$$

$$f'(x)=\frac{4c}{2(1+c^{2})+(e^{2x}+e^{-2x})(1-c^{2})}=\frac{2c}{1+c^{2}+\cosh(2x)(1-c^{2})},$$

$$|f(x)-f(y)|=|f'(\xi)||x-y|\leq c|x-y|,\quad\text{for some }\xi\text{ between }x\text{ and }y.$$

$$\tilde{R}^{(k)}=|Q|+\sum_{i=1}^{N}\tanh(\beta)\tilde{R}_{i}^{(k-1)}.$$

$$E\left[d_{p}(\hat{F}_{k,m},F_{k})^{p}\right]\leq\left(\sum_{r=0}^{k}\left(H_{p}^{1/p}\right)^{r}\right)^{p-1}\sum_{j=0}^{k}\left(H_{p}^{1/p}\right)^{k-j}E\left[d_{p}(F_{j,m},F_{j})^{p}\right],$$

$$E\left[d_{p}(\hat{F}_{k,m},F_{k})^{p}\right]$$


Full text


Abstract

We study the convergence of the population dynamics algorithm, which produces sample pools of random variables having a distribution that closely approximates that of the special endogenous solution to a stochastic fixed-point equation of the form:

$$R\stackrel{\mathcal{D}}{=}\Phi\left(Q,N,\{C_{i}\},\{R_{i}\}\right),$$

where $(Q,N,\{C_{i}\})$ is a real-valued random vector with $N\in\mathbb{N}$ , and $\{R_{i}\}_{i\in\mathbb{N}}$ is a sequence of i.i.d. copies of $R$ , independent of $(Q,N,\{C_{i}\})$ ; the symbol $\stackrel{{\scriptstyle\mathcal{D}}}{{=}}$ denotes equality in distribution. Specifically, we show its convergence in the Wasserstein metric of order $p$ ( $p\geq 1$ ) and prove the consistency of estimators based on the sample pool produced by the algorithm.

Keywords: Population dynamics; iterative bootstrap; Wasserstein metric; distributional fixed-point equations.

1 Introduction

We study an iterative bootstrap algorithm, known as the “population dynamics” algorithm, that can be used to efficiently generate samples of random variables whose distribution closely approximates that of the so-called special endogenous solution to a stochastic fixed-point equation (SFPE) of the form:

$$R\stackrel{\mathcal{D}}{=}\Phi\left(Q,N,\{C_{i}\},\{R_{i}\}\right),\tag{1.1}$$

where $(Q,N,\{C_{i}\})$ is a real-valued random vector with $N\in\mathbb{N}=\{0,1,2,\dots\}$ , and $\{R_{i}\}_{i\in\mathbb{N}}$ is a sequence of i.i.d. copies of $R$ , independent of $(Q,N,\{C_{i}\})$ . These equations appear in a variety of problems, ranging from computer science to statistical physics, e.g.: in the analysis of divide and conquer algorithms such as Quicksort [27, 14, 28] and FIND [13], the analysis of Google’s PageRank algorithm [30, 18, 9, 23], the study of queueing networks with synchronization requirements [22, 26], and the analysis of the Ising model [12], to name a few. In general, SFPEs of the form in (1.1) can have multiple solutions, but in most cases we are interested in computing those that can be explicitly constructed on a weighted branching process, known as endogenous solutions. In some cases, even the endogenous solution is not unique [5], but characterizing all endogenous solutions can be done using the special endogenous solution, which is the only attracting solution, and can be constructed by iterating (1.1) starting from some well-behaved initial distribution.

This work focuses on the analysis of a simulation algorithm that can be used to generate samples from a distribution that closely approximates that of the special endogenous solution to a variety of SFPEs. The need for such an approximate algorithm stems from the computational cost of simulating even a few generations of a weighted branching process using naive Monte Carlo methods. The population dynamics algorithm, described in §14.6.4 in [25] and §8.1 in [1], circumvents this problem by resampling with replacement from previously computed iterations of (1.1), i.e., by using an iterative bootstrap technique. However, as is the case with the standard bootstrap algorithm, the samples obtained are neither independent nor exactly distributed according to the target distribution, which raises the need to study the convergence properties of the algorithm.

Before presenting the algorithm and stating our main results, it may be helpful to describe in more detail some of the examples mentioned above. Throughout the paper, we use $x\vee y=\max\{x,y\}$ and $x\wedge y=\min\{x,y\}$ to denote the maximum and the minimum, respectively, of $x$ and $y$ .

•

The linear SFPE or “smoothing transform”:

$$R\stackrel{\mathcal{D}}{=}Q+\sum_{i=1}^{N}C_{i}R_{i},\tag{1.2}$$

appears in the analysis of the number of comparisons required by the sorting algorithm Quicksort [27, 14, 28], and can also be used to describe the distribution of the ranks computed by Google’s PageRank algorithm on directed complex networks [30, 18, 9, 23].

•

The maximum SFPE or “high-order Lindley equation”:

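$$R\stackrel{\mathcal{D}}{=}Q\vee\bigvee_{i=1}^{N}C_{i}R_{i},\quad\text{equivalently,}\quad X\stackrel{\mathcal{D}}{=}T\vee\bigvee_{i=1}^{N}(\xi_{i}+X_{i}),\tag{1.3}$$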

arises as the limiting waiting time distribution on queueing networks with parallel servers and synchronization requirements [22, 26] and in the analysis of the branching random walk [1].

•

The discounted tree-sum SFPE:

$$R\stackrel{\mathcal{D}}{=}Q+\bigvee_{i=1}^{N}C_{i}R_{i}\tag{1.4}$$

appears in the worst-case analysis of the FIND algorithm [13] and the analysis of the “discounted branching random walk” [6].

•

The “free-entropy” SFPE:

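$$R\stackrel{\mathcal{D}}{=}Q+\sum_{i=1}^{N}\operatorname{arctanh}\left(\tanh(\beta)\tanh(R_{i})\right)\tag{1.5}$$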

characterizes the asymptotic free-entropy density in the ferromagnetic Ising model on locally tree-like graphs [12]. In this case, $C_{i}\equiv\tanh(\beta)$ for all $i\geq 1$ , $\beta\geq 0$ represents the “inverse temperature”, and $Q$ the magnetic field.

•

Although the analysis presented here does not directly apply to this case, we mention that the population dynamics algorithm can also be used to simulate the fixed points of the belief propagation equations on random graphical models [25]:

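$$R\stackrel{\mathcal{D}}{=}\Phi(Q,N,\{C_{i}\},\{\tilde{R}_{i}\})\quad\text{and}\quad\tilde{R}\stackrel{\mathcal{D}}{=}\Psi(\tilde{Q},\tilde{N},\{\tilde{C}_{i}\},\{R_{i}\}),$$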

where the $\{\tilde{R}_{i}\}$ are i.i.d. copies of $\tilde{R}$ independent of the vector $(Q,N,\{C_{i}\})$ and the $\{R_{i}\}$ are i.i.d. copies of $R$ independent of the vector $(\tilde{Q},\tilde{N},\{\tilde{C}_{i}\})$ , with $\Phi$ and $\Psi$ potentially different.

We refer the reader to [1] for even more examples, including some involving minimums.
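As an illustration, the maps $\Phi$ behind the first four examples above can be written as plain Python functions acting on one realization of the branching vector. This is a minimal sketch, assuming a realization is passed as a number `q` and a list `c` of the weights $C_1,\dots,C_N$ (so $N$ is the length of the lists); the function names, the argument layout, and the convention that the maximum in (1.4) contributes nothing when $N=0$ are our own illustrative choices, not taken from the paper.

```python
import math
from typing import Sequence


def phi_linear(q: float, c: Sequence[float], r: Sequence[float]) -> float:
    """Linear SFPE (1.2): Q + sum_{i=1}^N C_i R_i, with N = len(c)."""
    return q + sum(ci * ri for ci, ri in zip(c, r))


def phi_max(q: float, c: Sequence[float], r: Sequence[float]) -> float:
    """Maximum SFPE (1.3): Q v max_{i<=N} C_i R_i; reduces to Q when N = 0."""
    return max([q] + [ci * ri for ci, ri in zip(c, r)])


def phi_tree_sum(q: float, c: Sequence[float], r: Sequence[float]) -> float:
    """Discounted tree-sum SFPE (1.4): Q + max_{i<=N} C_i R_i.

    Assumption (ours): an empty maximum is taken as 0, so the map is Q when N = 0.
    """
    terms = [ci * ri for ci, ri in zip(c, r)]
    return q + (max(terms) if terms else 0.0)


def phi_free_entropy(q: float, beta: float, r: Sequence[float]) -> float:
    """Free-entropy SFPE (1.5): every weight is C_i = tanh(beta), N = len(r).

    Assumes beta is moderate, so t * tanh(r_i) stays strictly inside (-1, 1)
    in floating point and atanh is defined.
    """
    t = math.tanh(beta)
    return q + sum(math.atanh(t * math.tanh(ri)) for ri in r)


print(phi_linear(1.0, [0.5, 0.5], [2.0, 4.0]))  # 4.0
print(phi_max(1.0, [0.5, 2.0], [2.0, 3.0]))     # 6.0
```

Keeping $\Phi$ as an ordinary function of one realization is what lets the same map be reused, unchanged, by both the exact weighted-branching construction and the bootstrap-based algorithm described later.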

The existence and uniqueness of solutions to any of these SFPEs is in itself a non-trivial problem. We refer the reader again to [1] for a broad survey of known results and open problems on this topic. The best-understood equations are the linear (1.2) and maximum (1.3) SFPEs, which have been studied extensively in [24, 16, 2, 4, 5, 3, 17] and [7, 21], respectively. To provide some context for where the population dynamics algorithm fits in, we briefly mention that the existence of solutions is often established by showing that the transformation $T$ that maps the distribution $\mu$ on $\mathbb{R}$ to the distribution of

$$\Phi(Q,N,\{C_{i}\},\{X_{i}\}),$$

where the $\{X_{i}\}$ are i.i.d. random variables distributed according to $\mu$ , independent of the vector $(Q,N,\{C_{i}\})$ , is strictly contracting under some suitable metric. In this case, the sequence of probability measures $\mu_{n+1}=T(\mu_{n})$ converges, as $n\to\infty$ , to a fixed point of (1.1). Moreover, as long as the initial distribution $\mu_{0}$ has sufficiently light tails, one can show that $\{\mu_{n}\}$ converges to the special endogenous solution to (1.1), and the contracting nature of $T$ provides an upper bound of the form

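$$d(\mu_{n},\mu)\leq d(T(\mu_{n-1}),T(\mu))\leq c\,d(\mu_{n-1},\mu)\leq c^{n}d(\mu_{0},\mu),\qquad n=1,2,\dots,$$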

for some constant $0<c<1$ , where $d$ is the distance under which $T$ is a contraction.
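For the linear SFPE (1.2), the geometric decay $c^{n}$ can be seen in its simplest form by tracking only means. Assume $E[|Q|]<\infty$, $E[|R^{(0)}|]<\infty$, $E\left[\sum_{i=1}^{N}|C_{i}|\right]<\infty$, and write $\kappa\triangleq E\left[\sum_{i=1}^{N}C_{i}\right]$ with $|\kappa|<1$ (the symbol $\kappa$ is ours). Taking expectations in $R^{(n+1)}=Q+\sum_{i=1}^{N}C_{i}R^{(n)}_{i}$, with the $R^{(n)}_{i}$ i.i.d. copies of $R^{(n)}$ independent of $(Q,N,\{C_{i}\})$, gives the scalar recursion $m_{n+1}=E[Q]+\kappa\,m_{n}$, whose fixed point is $m=E[Q]/(1-\kappa)$ and whose error satisfies $|m_{n}-m|=|\kappa|^{n}|m_{0}-m|$. The sketch below (function name hypothetical) checks this numerically; it follows a single functional of $\mu_{n}$ and is not the metric $d$ of the text.

```python
def mean_iterates(mean_q: float, kappa: float, m0: float, n: int) -> list:
    """Return [m_0, ..., m_n] for the mean recursion m_{j+1} = E[Q] + kappa * m_j.

    kappa plays the role of E[sum_{i<=N} C_i]; the recursion contracts when |kappa| < 1.
    """
    means = [m0]
    for _ in range(n):
        means.append(mean_q + kappa * means[-1])
    return means


mean_q, kappa, m0 = 1.0, 0.5, 0.0
fixed_point = mean_q / (1.0 - kappa)          # m = E[Q] / (1 - kappa) = 2.0
means = mean_iterates(mean_q, kappa, m0, 10)
errors = [abs(m - fixed_point) for m in means]
# Geometric decay: errors[n] equals kappa**n * errors[0] up to rounding.
print([round(e, 6) for e in errors[:4]])      # [2.0, 1.0, 0.5, 0.25]
```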

As will be discussed in more detail later (see Examples 2.7), all the examples provided earlier define contractions under $d_{p}$ , the Wasserstein metric of order $p$ , for some $p\geq 1$ . For completeness, we also include a result (Theorem 2.5) that gives easy-to-verify conditions guaranteeing that

$$E\left[|R^{(k)}-R|^{p}\right]\leq c^{k}$$

for some $0<c<1$ , where $R^{(k)}$ and $R$ have distributions $\mu_{k}$ and $\mu$ , respectively.

It follows that from a computational point of view, it suffices to have an algorithm for computing $\mu_{k}$ for a fixed number of iterations $k\in\mathbb{N}$ . The population dynamics algorithm produces a sample of observations approximately distributed according to $\mu_{k}$ , which can also be helpful in searching for the existence of endogenous solutions, as stated in [1]. We now describe how to obtain an exact sample of $\mu_{k}$ , which will also make clear the need for a computationally efficient method.

1.1 Constructing endogenous solutions on weighted branching processes

As mentioned earlier, the attracting endogenous solution to (1.1), provided it exists, can be constructed on a structure known as a weighted branching process [27]. We now elaborate on this point.

Let $\mathbb{N}_{+}=\{1,2,3,\dots\}$ be the set of positive integers and let $U=\bigcup_{k=0}^{\infty}(\mathbb{N}_{+})^{k}$ be the set of all finite sequences ${\bf i}=(i_{1},i_{2},\dots,i_{n})$ , $n\geq 0$ , where by convention $\mathbb{N}_{+}^{0}=\{\emptyset\}$ contains the null sequence $\emptyset$ . To ease the exposition, we will use $({\bf i},j)=(i_{1},\dots,i_{n},j)$ to denote the index concatenation operation. Next, let $(Q,N,\{C_{i}\}_{i\geq 1})$ be a real-valued vector with $N\in\mathbb{N}$ . We will refer to this vector as the generic branching vector. Now let $\{(Q_{\bf i},N_{\bf i},\{C_{({\bf i},j)}\}_{j\geq 1})\}_{{\bf i}\in U}$ be a sequence of i.i.d. copies of the generic branching vector. To construct a weighted branching process we start by defining a tree as follows: let $A_{0}=\{\emptyset\}$ denote the root of the tree, and define the $n$th generation according to the recursion

$$A_{n}=\{({\bf i},i_{n})\in U:{\bf i}\in A_{n-1},\,1\leq i_{n}\leq N_{\bf i}\},\qquad n\geq 1.$$

Now, assign to each node ${\bf i}$ in the tree a weight $\Pi_{\bf i}$ according to the recursion

$$\Pi_{\emptyset}\equiv 1,\qquad\Pi_{({\bf i},i_{n})}=C_{({\bf i},i_{n})}\Pi_{\bf i},\qquad n\geq 1,$$

see Figure 1. If $P(N<\infty)=1$ and $C_{i}\equiv 1$ for all $i\geq 1$ , the weighted branching process reduces to a Galton-Watson process.
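The recursions for $A_{n}$ and $\Pi_{\bf i}$ translate directly into code. The following Python sketch grows the first `n_gen` generations of a weighted branching process, storing each node ${\bf i}=(i_{1},\dots,i_{n})$ as a tuple, with the root $\emptyset$ as the empty tuple. The sampler `sample_vector`, which must return one realization `(q, n, c)` of the generic branching vector with `len(c) >= n`, is a hypothetical user-supplied function, and stopping after finitely many generations is our choice for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

Node = Tuple[int, ...]                      # i = (i_1, ..., i_n); () is the root
Vector = Tuple[float, int, List[float]]     # one realization (Q_i, N_i, {C_(i,j)}_j)


def build_wbp(n_gen: int,
              sample_vector: Callable[[random.Random], Vector],
              rng: random.Random):
    """Grow generations A_0, ..., A_{n_gen} and weights Pi_i of a weighted branching process."""
    generations: List[List[Node]] = [[()]]  # A_0 = {root}
    weights: Dict[Node, float] = {(): 1.0}  # Pi_root = 1
    vectors: Dict[Node, Vector] = {}        # i.i.d. branching vector attached to each expanded node
    for _ in range(n_gen):
        next_gen: List[Node] = []
        for node in generations[-1]:
            q, n, c = sample_vector(rng)
            vectors[node] = (q, n, c)
            for j in range(1, n + 1):       # children (i, j), 1 <= j <= N_i
                child = node + (j,)
                weights[child] = c[j - 1] * weights[node]  # Pi_(i,j) = C_(i,j) Pi_i
                next_gen.append(child)
        generations.append(next_gen)
    return generations, weights, vectors


# Hypothetical deterministic vector: N = 2, C_1 = C_2 = 1/2.
gens, pis, _ = build_wbp(3, lambda rng: (1.0, 2, [0.5, 0.5]), random.Random(0))
print(len(gens[3]), pis[(1, 2, 1)])         # 8 0.125
```

With all weights equal to one and a random $N$, the same routine simply grows a Galton-Watson tree, matching the remark above.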

To generate a sample from $\mu_{k}$ we first need to fix the initial distribution $\mu_{0}$ , e.g., by letting $\mu_{0}$ be the probability measure of a constant, say zero or one. Now construct a weighted branching process with $k$ generations, and let $\{R^{(0)}_{\bf i}\}_{{\bf i}\in A_{k}}$ be i.i.d. random variables having distribution $\mu_{0}$ . Next, define recursively for each ${\bf i}\in A_{k-r}$ , $1\leq r\leq k$ ,

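$$R^{(r)}_{\bf i}=\Phi\left(Q_{\bf i},N_{\bf i},\{C_{({\bf i},j)}\}_{j\geq 1},\{R^{(r-1)}_{({\bf i},j)}\}_{j\geq 1}\right).$$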

The random variable $R^{(k)}_{\emptyset}$ is distributed according to $\mu_{k}$ , and its generation requires on average $(E[N])^{k}$ i.i.d. copies of the generic branching vector $(Q,N,\{C_{i}\}_{i\geq 1})$ . It follows that if the goal were to obtain an i.i.d. sample of size $m$ from distribution $\mu_{k}$ , one would need to generate on average $m(E[N])^{k}$ copies of the generic branching vector. However, in applications one typically has $E[N]>1$ , e.g., $N\equiv 2$ for Quicksort, $E[N]\approx 30$ in the analysis of PageRank on the WWW graph, and $E[N]$ can be in the hundreds for MapReduce implementations related to the maximum SFPE. This makes the exact simulation of $R^{(k)}$ using a weighted branching process impractical.
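To make this cost concrete, here is a minimal recursive sketch of the exact construction: it draws one copy of $R^{(k)}_{\emptyset}$ by descending $k$ generations and counts the branching vectors it consumes. The helper names `sample_vector`, `phi`, and `sample_r0` are hypothetical; `phi(q, c, r)` receives the realized weights and the $N$ child values.

```python
import random
from typing import Callable, List, Tuple

Vector = Tuple[float, int, List[float]]


def exact_sample(k: int,
                 sample_vector: Callable[[random.Random], Vector],
                 phi: Callable[[float, List[float], List[float]], float],
                 sample_r0: Callable[[random.Random], float],
                 rng: random.Random) -> Tuple[float, int]:
    """Return (one exact draw of R^(k) ~ mu_k, number of branching vectors used)."""
    if k == 0:
        return sample_r0(rng), 0            # nodes of generation k carry R^(0) ~ mu_0
    q, n, c = sample_vector(rng)            # branching vector of the current node
    children, used = [], 1
    for _ in range(n):                      # each child is an independent copy of R^(k-1)
        value, cost = exact_sample(k - 1, sample_vector, phi, sample_r0, rng)
        children.append(value)
        used += cost
    return phi(q, c, children), used


def linear(q: float, c: List[float], r: List[float]) -> float:
    return q + sum(ci * ri for ci, ri in zip(c, r))


# Hypothetical deterministic vector N = 2, C_1 = C_2 = 1/2, Q = 1, with R^(0) = 0.
value, used = exact_sample(5, lambda rng: (1.0, 2, [0.5, 0.5]), linear,
                           lambda rng: 0.0, random.Random(0))
print(value, used)                          # 5.0 31
```

With $N\equiv 2$ one draw already consumes $2^{k}-1$ branching vectors (one per node in generations $0,\dots,k-1$), so an i.i.d. sample of size $m$ costs about $m\,2^{k}$, in line with the $m(E[N])^{k}$ count above.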

The population dynamics algorithm, described below, uses a bootstrap approach to produce a sample of size $m$ of random variables that are approximately distributed according to $\mu_{k}$ , and that although not independent, can be used to obtain consistent estimators for moments, quantiles and other functions of $\mu_{k}$ .

1.2 The population dynamics algorithm

The population dynamics algorithm is based on the bootstrap, i.e., on the idea of sampling random variables with replacement from a common pool. As in the construction described above, the algorithm starts by generating a sample of i.i.d. random variables having distribution $\mu_{0}$ , with the difference that when computing the next level of the recursion, it samples with replacement from this pool as needed by the map $\Phi$ . In other words, to obtain a pool of approximate copies of $R^{(j)}$ we bootstrap from the previously obtained pool of approximate copies of $R^{(j-1)}$ . The approximation lies in the fact that we are not sampling from $R^{(j-1)}$ itself, but from a finite sample of conditionally independent observations that are only approximately distributed as $R^{(j-1)}$ . The algorithm is described below.

Let $(Q,N,\{C_{r}\})$ denote the generic branching vector defining the weighted branching process. Let $k$ be the depth of the recursion that we want to simulate, i.e., the algorithm will produce a sample of random variables approximately distributed according to $\mu_{k}$ . Choose $m\in\mathbb{N}_{+}$ to be the bootstrap sample size. For each $0\leq j\leq k$ , the algorithm outputs $\mathscr{P}^{(j,m)}\triangleq\left(\hat{R}^{(j,m)}_{1},\hat{R}^{(j,m)}_{2},\dots,\hat{R}^{(j,m)}_{m}\right)$ , which we refer to as the sample pool at level $j$ .

a.)

Initialize: Set $j=0$ . Simulate a sequence $\{R^{(0)}_{i}\}_{i=1}^{m}$ of i.i.d. random variables distributed according to some initial distribution $\mu_{0}$ . Let $\hat{R}^{(0,m)}_{i}=R^{(0)}_{i}$ for $i=1,\dots,m$ .

Output $\mathscr{P}^{(0,m)}=\left(\hat{R}^{(0,m)}_{1},\hat{R}^{(0,m)}_{2},\dots,\hat{R}^{(0,m)}_{m}\right)$ and update $j=1$ .

b.)

While $j\leq k$ :

(a)

Simulate a sequence $\{(Q_{i}^{(j)},N_{i}^{(j)},\{C_{(i,r)}^{(j)}\}_{r\geq 1})\}_{i=1}^{m}$ of i.i.d. copies of the generic branching vector, independent of everything else.

(b)

Let

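$$\hat{R}^{(j,m)}_{i}=\Phi\left(Q^{(j)}_{i},N^{(j)}_{i},\{C^{(j)}_{(i,r)}\},\{\hat{R}^{(j-1,m)}_{(i,r)}\}\right),\qquad i=1,\dots,m,$$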

where the $\hat{R}^{(j-1,m)}_{(i,r)}$ are sampled uniformly with replacement from the pool $\mathscr{P}^{(j-1,m)}$ .

(c)

Output $\mathscr{P}^{(j,m)}=\left(\hat{R}^{(j,m)}_{1},\hat{R}^{(j,m)}_{2},\dots,\hat{R}^{(j,m)}_{m}\right)$ and update $j=j+1$ .
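Steps a.) and b.) can be sketched in a few lines of Python. This is a minimal illustration under our own conventions, not the author's code: `sample_vector`, `phi`, and `sample_r0` are hypothetical user-supplied functions (a branching-vector sampler returning `(q, n, c)`, the map $\Phi$, and a sampler for $\mu_{0}$), and the Quicksort branching vector from Examples 2.7 ($N\equiv 2$, $C_{1}=U=1-C_{2}$, $Q=2U\ln U+2(1-U)\ln(1-U)+1$) serves only as a test case, with $x\ln x$ extended by $0$ at $x=0$.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = Tuple[float, int, List[float]]


def population_dynamics(k: int, m: int,
                        sample_vector: Callable[[random.Random], Vector],
                        phi: Callable[[float, List[float], List[float]], float],
                        sample_r0: Callable[[random.Random], float],
                        rng: random.Random) -> List[List[float]]:
    """Return the pools P^(0,m), ..., P^(k,m), each a list of m values."""
    pools = [[sample_r0(rng) for _ in range(m)]]          # a.) i.i.d. pool from mu_0
    for _ in range(1, k + 1):                             # b.) levels j = 1, ..., k
        previous = pools[-1]
        pool = []
        for _ in range(m):
            q, n, c = sample_vector(rng)                  # (a) fresh branching vector
            # (b) children drawn uniformly with replacement from P^(j-1,m)
            children = [rng.choice(previous) for _ in range(n)]
            pool.append(phi(q, c, children))
        pools.append(pool)                                # (c) output P^(j,m)
    return pools


def linear(q: float, c: List[float], r: List[float]) -> float:
    return q + sum(ci * ri for ci, ri in zip(c, r))


def xlogx(x: float) -> float:
    return x * math.log(x) if x > 0.0 else 0.0            # continuous extension at 0


def quicksort_vector(rng: random.Random) -> Vector:
    u = rng.random()
    q = 2.0 * xlogx(u) + 2.0 * xlogx(1.0 - u) + 1.0
    return q, 2, [u, 1.0 - u]


pools = population_dynamics(k=10, m=1000, sample_vector=quicksort_vector,
                            phi=linear, sample_r0=lambda rng: 0.0,
                            rng=random.Random(2018))
print(len(pools), len(pools[-1]))                         # 11 1000
```

Each level draws $m$ branching vectors and about $m\,E[N]$ resampled values, so the work grows linearly in $k$, consistent with the order $km$ stated in the text when $E[N]$ is treated as a constant.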

We conclude this section by pointing out that the complexity of the algorithm described above is of order $km$ , while the naive Monte Carlo approach described earlier, which consists of sampling $m$ i.i.d. copies of a weighted branching process up to the $k$ th generation, has order $(E[N])^{k}m$ . Our main results establish the convergence of the algorithm in the Wasserstein metric of order $p$ ( $p\geq 1$ ), as well as the consistency of estimators constructed using the pool $\mathscr{P}^{(k,m)}$ . The following section contains all the statements, and the proofs are given in Section 3.

2 Main results

We start by defining the Wasserstein metric of order $p$ .

Definition 2.1

Let $M(\mu,\nu)$ denote the set of joint probability measures on $\mathbb{R}\times\mathbb{R}$ with marginals $\mu$ and $\nu$ . Then, the Wasserstein metric of order $p$ ( $1\leq p<\infty$ ) between $\mu$ and $\nu$ is given by

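$$d_{p}(\mu,\nu)=\inf_{\pi\in M(\mu,\nu)}\left(\int_{\mathbb{R}\times\mathbb{R}}|x-y|^{p}\,d\pi(x,y)\right)^{1/p}.$$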

An important advantage of working with the Wasserstein metrics is that on the real line they admit the explicit representation

$$d_{p}(\mu,\nu)=\left(\int_{0}^{1}|F^{-1}(u)-G^{-1}(u)|^{p}\,du\right)^{1/p},$$

where $F$ and $G$ are the cumulative distribution functions of $\mu$ and $\nu$ , respectively, and $f^{-1}(t)=\inf\{x\in\mathbb{R}:f(x)\geq t\}$ denotes the generalized inverse of $f$ . It follows that the optimal coupling of two real random variables $X$ and $Y$ is given by $(X,Y)=(F^{-1}(U),G^{-1}(U))$ , where $U$ is uniformly distributed in $[0,1]$ .
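For two empirical distributions with the same number $m$ of equally weighted atoms, the quantile representation reduces to pairing order statistics: $F^{-1}$ and $G^{-1}$ are step functions equal to the $i$th smallest points on $((i-1)/m,i/m]$, so $d_{p}^{p}$ is the average of $|x_{(i)}-y_{(i)}|^{p}$. A minimal Python sketch of this special case follows (the function name is ours); for $p=1$ it should agree with SciPy's `scipy.stats.wasserstein_distance` applied to the two samples.

```python
from typing import Sequence


def wasserstein_p(xs: Sequence[float], ys: Sequence[float], p: float = 1.0) -> float:
    """d_p between the empirical distributions of two samples of equal size.

    Uses d_p^p = (1/m) * sum_i |x_(i) - y_(i)|^p, i.e. the quantile representation
    specialised to two m-point empirical distributions.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("the Wasserstein metric is defined here for p >= 1")
    total = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (total / len(xs)) ** (1.0 / p)


print(wasserstein_p([0.0, 1.0], [1.0, 2.0], p=1))   # 1.0
print(wasserstein_p([0.0, 0.0], [1.0, 3.0], p=2))   # sqrt(5), about 2.236
```

Sorting both samples is exactly the coupling $(F^{-1}(U),G^{-1}(U))$ mentioned above, restricted to empirical measures.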

With some abuse of notation, we use $d_{p}(F,G)$ to denote the Wasserstein distance of order $p$ between the probability measures $\mu$ and $\nu$ , where $F(x)=\mu((-\infty,x])$ and $G(x)=\nu((-\infty,x])$ are their corresponding cumulative distribution functions.

Our main results establish the convergence of $d_{p}(\hat{F}_{k,m},F_{k})$ as $m\to\infty$ , both in mean and almost surely, where

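$$\hat{F}_{k,m}(x)=\frac{1}{m}\sum_{i=1}^{m}1(\hat{R}^{(k,m)}_{i}\leq x)\quad\text{and}\quad F_{k}(x)=\mu_{k}((-\infty,x]),\qquad k\in\mathbb{N},$$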

and $\mathscr{P}^{(k,m)}=\left(\hat{R}_{1}^{(k,m)},\dots,\hat{R}_{m}^{(k,m)}\right)$ is the pool generated by the population dynamics algorithm. The theorems are proven under two different assumptions, the first one imposing a Lipschitz condition on the mean of $\Phi$ , and the second one requiring $\Phi$ to be Lipschitz continuous almost surely.

Assumption 2.2

For some $p\geq 1$ there exists a constant $0<H_{p}<\infty$ such that if $\{(X_{i},Y_{i}):i\geq 1\}$ is a sequence of i.i.d. random vectors, independent of $(Q,N,\{C_{r}\})$ , then

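$$E\left[\left|\Phi(Q,N,\{C_{r}\},\{X_{r}\})-\Phi(Q,N,\{C_{r}\},\{Y_{r}\})\right|^{p}\right]\leq H_{p}E\left[|X_{1}-Y_{1}|^{p}\right].$$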

[linear0] For the linear SFPE (1.2), it suffices that the inequality holds for $\{X_{i}\}$ and $\{Y_{i}\}$ having the same mean.

Assumption 2.3

Suppose that for any vector $(q,n,\{c_{r}\})$ , with $n\in\mathbb{N}\cup\{\infty\}$ , and any sequences of numbers $\{x_{r}\}$ and $\{y_{r}\}$ for which $\Phi(q,n,\{c_{r}\},\{x_{r}\})$ and $\Phi(q,n,\{c_{r}\},\{y_{r}\})$ are well defined, there exists a function $\varphi:\mathbb{R}\to\mathbb{R}_{+}$ such that

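$$\left|\Phi(q,n,\{c_{r}\},\{x_{r}\})-\Phi(q,n,\{c_{r}\},\{y_{r}\})\right|\leq\sum_{r=1}^{n}\varphi(c_{r})|x_{r}-y_{r}|.$$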

Remarks 2.4

(i)

To see that Assumption 2.3 implies Assumption 2.2, note that Lemma 4.1 in [20] gives that

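$$\begin{aligned}E\left[\left(\sum_{r=1}^{N}\varphi(C_{r})|X_{r}-Y_{r}|\right)^{p}\right]&\leq\cdots+E\left[\left(\sum_{r=1}^{N}\varphi(C_{r})\right)^{p}\right]\left(E\left[|X_{1}-Y_{1}|^{\lceil p\rceil-1}\right]\right)^{p/(\lceil p\rceil-1)}\\&\leq 2E\left[\left(\sum_{r=1}^{N}\varphi(C_{r})\right)^{p}\right]E\left[|X_{1}-Y_{1}|^{p}\right],\end{aligned}$$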

and therefore Assumption 2.2 holds with $H_{p}=2E\left[\left(\sum_{r=1}^{N}\varphi(C_{r})\right)^{p}\right]$ , provided the expectation is finite. However, much tighter bounds can be obtained for specific examples, and we can usually find $p\geq 1$ such that $H_{p}<1$ .

(ii)

The existence of a $p\geq 1$ for which $H_{p}<1$ is important for obtaining estimates for the rate of convergence of the algorithm that are uniform in $k$ , and has also important implications for the convergence of $R^{(k)}\to R$ as $k\to\infty$ , as the next result shows.

Theorem 2.5

Suppose Assumption 2.2 holds for some $p\geq 1$ , $H_{p}<1$ , and any i.i.d. sequence $\{(X_{i},Y_{i}):i\geq 1\}$ independent of $(Q,N,\{C_{r}\})$ . Then, provided $E\left[|R^{(0)}|^{p}+|\Phi(Q,N,\{C_{r}\},\{0\})|^{p}\right]<\infty$ , there exists a random variable $R$ and constants $0\leq c_{p}<1$ and $A_{p}<\infty$ such that

$$E\left[|R^{(k)}-R|^{p}\right]\leq A_{p}c_{p}^{k}\to 0,\qquad k\to\infty,\tag{2.2}$$

where $R^{(k)}$ and $R$ are distributed according to $\mu_{k}$ and $\mu$ , respectively. For the linear SFPE (1.2), we have that (2.2) also holds under either of the following conditions:

i)

If Assumption 2.2 [linear0] holds and $E[Q]=E[R^{(0)}]=0$ .

ii)

If $E\left[\left(\sum_{i=1}^{N}|C_{i}|\right)^{p}+|R^{(0)}|^{p}+|Q|^{p}\right]<\infty$ and $\rho_{1}\vee\rho_{p}<1$ , where $\rho_{\beta}\triangleq E\left[\sum_{i=1}^{N}|C_{i}|^{\beta}\right]$ .

As the proof of Theorem 2.5 shows, one can take $c_{p}=H_{p}$ under the main set of conditions as well as under condition (i), whereas for (ii) we have $c_{p}=\rho_{1}\vee\rho_{p}$ . As a consequence of the proof of Theorem 2.5 we also obtain the following explicit bound for the moments of $R^{(k)}$ .

Lemma 2.6

Suppose Assumption 2.2 holds for some $p\geq 1$ . In the linear case, if only Assumption 2.2 [linear0] holds, suppose further that $E[R^{(0)}]=E[Q]=0$ . Then, for any $k\geq 0$ ,

$$\left(E\left[|R^{(k)}|^{p}\right]\right)^{1/p}\leq A_{p}\sum_{i=0}^{k-1}\left(H_{p}^{1/p}\right)^{i},$$

where $A_{p}=(H_{p}^{1/p}+1)\left(E[|R^{(0)}|^{p}]\right)^{1/p}+\left(E\left[\left|\Phi(Q,N,\{C_{r}\},\{0\})\right|^{p}\right]\right)^{1/p}$ .

Before stating the main theorems establishing the convergence of the algorithm in the Wasserstein metric, we point out how Assumptions 2.2 and 2.3 are satisfied by all the examples mentioned in the introduction.

Examples 2.7

•

The linear SFPE (1.2) clearly satisfies Assumption 2.3 with $\varphi(t)=|t|$ . Moreover, for the Quicksort algorithm studied in [27, 14, 28] we have $N\equiv 2$ , $C_{1}=U=1-C_{2}$ and $Q=2U\ln U+2(1-U)\ln(1-U)+1$ , with $U$ uniformly distributed on $[0,1]$ and $E[Q]=0$ , in which case we can take any $p\in\mathbb{N}_{+}$ and $H_{p}=1-2pE[U^{p-1}(1-U)]=(p-1)/(p+1)<1$ in Assumption 2.2 [linear0]. Lemma 2.6 also gives that $E[|R^{(k)}|^{p}]$ is uniformly bounded in $k$ for all $p\geq 1$ .

For the PageRank algorithm studied in [30, 18, 9] we have $\{C_{i}\}_{1\leq i\leq N}$ i.i.d. and independent of $N$ , $|C_{i}|\leq c<1$ a.s., and $E[|C_{1}|^{p}]\leq c^{p}/E[N]$ for any $p\geq 1$ . Hence, we can take $p=1$ and $H_{1}=E[N]E[|C_{1}|]\leq c<1$ in Assumption 2.2. Furthermore, Theorem 2.5(ii) gives that $E[|R^{(k)}-R|^{q}]=O(\gamma^{k})$ for some $0<\gamma<1$ provided $E[|Q|^{q}+N^{q}]<\infty$ , which in turn gives the uniform boundedness of $E[|R^{(k)}|^{q}]$ .

•

Using the inequality

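$$\max_{1\leq i\leq n}\{x_{i}\}-\max_{1\leq i\leq n}\{y_{i}\}\leq\max_{1\leq i\leq n}|x_{i}-y_{i}|\leq\sum_{i=1}^{n}|x_{i}-y_{i}|\tag{2.3}$$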

for any real numbers $\{x_{i},y_{i}\}$ and any $n\geq 1$ , we obtain that the maximum SFPE (1.3) satisfies Assumption 2.3 with $\varphi(t)=|t|$ as well. Furthermore, in the analysis of queueing networks with parallel servers and synchronization requirements from [22, 26], where $T\equiv 0$ (equivalently, $Q\equiv 1$ ), the stability condition of the system implies that $H_{p}<1$ for any $p\geq 1$ . Lemma 2.6 then implies that $E[|R^{(k)}|^{p}]$ is uniformly bounded in $k$ for all $p\geq 1$ .

•

In the case of the discounted tree-sum SFPE (1.4), inequality (2.3) implies that we can also take $\varphi(t)=|t|$ in Assumption 2.3. For the analysis of the FIND algorithm in [13] in particular, we have $N\equiv 2$ , $C_{1}=U=1-C_{2}$ and $Q\equiv 1$ , with $U$ uniformly distributed on $[0,1]$ , and we can take $H_{p}=2E[U^{p}]=2/(p+1)<1$ for any $p>1$ in Assumption 2.2. Lemma 2.6 then gives that $E[|R^{(k)}|^{p}]$ is uniformly bounded in $k$ for all $p>1$ .

•

To see that (1.5) also satisfies Assumption 2.3 with $\varphi(t)=|t|$ (in this case $C_{i}\equiv\tanh(\beta)$ for all $i\geq 1$ ), let $c=\tanh(\beta)\in[0,1)$ (since $\beta\geq 0$ ) and note that the function

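$$f(x)=\operatorname{arctanh}\left(c\tanh(x)\right)=\frac{1}{2}\ln\left(\frac{e^{2x}(1+c)+1-c}{e^{2x}(1-c)+1+c}\right)$$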

has derivative

$$f'(x)=\frac{4c}{2(1+c^{2})+(e^{2x}+e^{-2x})(1-c^{2})}=\frac{2c}{1+c^{2}+\cosh(2x)(1-c^{2})},$$

and therefore satisfies

$$|f(x)-f(y)|=|f'(\xi)||x-y|\leq c|x-y|,\quad\text{for some }\xi\text{ between }x\text{ and }y.$$

Assumption 2.2 is then satisfied for $p=1$ and $H_{1}\leq E[N]\tanh(\beta)$ , so that $H_{1}<1$ at high temperatures ( $\beta<1/E[N]$ , since $\tanh(\beta)\leq\beta$ ). Moreover, since $|f(x)|\leq c|x|$ , $R^{(k)}$ in the “free entropy” SFPE (1.5) is smaller than or equal to $\tilde{R}^{(k)}$ , where

$$\tilde{R}^{(k)}=|Q|+\sum_{i=1}^{N}\tanh(\beta)\tilde{R}^{(k-1)}_{i}.$$

Hence, provided $\beta<1/E[N]$ , Theorem 2.5(ii) gives that for any $p\geq 1$ for which $E[|Q|^{p}+N^{p}]<\infty$ , $E[|R^{(k)}|^{p}]$ is uniformly bounded in $k$ .
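The constants quoted in Examples 2.7 are easy to double-check. The Python sketch below (helper names are ours) evaluates the Quicksort and FIND constants exactly with rational arithmetic, using $E[U^{a}]=1/(a+1)$ for $U$ uniform on $[0,1]$ and restricting to integer $p$ so that the arithmetic stays exact, and compares the closed form of $f'$ for the free-entropy SFPE with a numerical derivative of $f(x)=\operatorname{arctanh}(c\tanh(x))$ and with the bound $|f'|\leq c$.

```python
import math
from fractions import Fraction


def quicksort_hp(p: int) -> Fraction:
    """H_p = 1 - 2p E[U^{p-1}(1-U)] for the Quicksort SFPE, U ~ Uniform[0,1]."""
    moment = Fraction(1, p) - Fraction(1, p + 1)   # E[U^{p-1}] - E[U^p]
    return 1 - 2 * p * moment


def find_hp(p: int) -> Fraction:
    """H_p = 2 E[U^p] = 2/(p+1) for the FIND SFPE."""
    return Fraction(2, p + 1)


def ising_f(x: float, c: float) -> float:
    return math.atanh(c * math.tanh(x))


def ising_fprime(x: float, c: float) -> float:
    """Closed form f'(x) = 2c / (1 + c^2 + cosh(2x)(1 - c^2))."""
    return 2 * c / (1 + c ** 2 + math.cosh(2 * x) * (1 - c ** 2))


print([str(quicksort_hp(p)) for p in range(1, 5)])   # ['0', '1/3', '1/2', '3/5']
print([str(find_hp(p)) for p in range(2, 5)])         # ['2/3', '1/2', '2/5']
```

The maximum of $f'$ is attained at $x=0$, where the denominator equals $2$ and $f'(0)=c$, which is exactly the Lipschitz constant used above.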

Our first result establishes the convergence in mean of $d_{p}(\hat{F}_{k,m},F_{k})$ under the “optimal” moment conditions, that is, assuming only that $\max_{0\leq j\leq k}E[|R^{(j)}|^{p}]<\infty$ . In view of Remark 2.4(ii), this is implied in all our examples by $E\left[\left(\sum_{i=1}^{N}\varphi(C_{i})\right)^{p}\right]<\infty$ . This result was previously proven in [10] for the linear SFPE (1.2) for $p=1$ .

Theorem 2.8

Fix $1\leq p<\infty$ and suppose that $\Phi$ satisfies Assumption 2.2, or Assumption 2.2 [linear0], for $p$ . Assume further that for any fixed $k\in\mathbb{N}$ , $\max_{0\leq j\leq k}E[|R^{(j)}|^{p}]<\infty$ . Let $\{R^{(j)}_{1},\dots,R^{(j)}_{m}\}$ be an i.i.d. sample from distribution $F_{j}$ , and let $F_{j,m}$ denote their corresponding empirical distribution function. Then,

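$$E\left[d_{p}(\hat{F}_{k,m},F_{k})^{p}\right]\leq\left(\sum_{r=0}^{k}\left(H_{p}^{1/p}\right)^{r}\right)^{p-1}\sum_{j=0}^{k}\left(H_{p}^{1/p}\right)^{k-j}E\left[d_{p}(F_{j,m},F_{j})^{p}\right],$$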

where $0<H_{p}<\infty$ is the constant from Assumption 2.2. Moreover, if $\max_{0\leq j\leq k}E[|R^{(j)}|^{q}]<\infty$ for $q>p\geq 1$ , $q\neq 2p$ , then

[TABLE]

where $K=K(p,q)$ is a constant that only depends on $p$ and $q$ .

Remarks 2.9

(i)

Note that Assumption 2.2 does not require that $H_{p}<1$ , i.e., it is not necessary for $\Phi$ to define a contraction for the algorithm to work. However, when $H_{p}<1$ the bound provided by Theorem 2.8 becomes independent of $k$ , ensuring that the complexity of the population dynamics algorithm remains linear in $k$ , rather than exponential, i.e., $(E[N])^{k}$ , as for the naive algorithm. When $H_{p}\geq 1$ for all $p\geq 1$ the bound given above may grow with the level of the recursion, i.e., the value of $k$ , and the convergence of the sequence $\{\mu_{k}\}$ as $k\to\infty$ may not be guaranteed.

(ii)

Even in the case when $H_{p}\geq 1$ for all $p\geq 1$ , the explicit bounds provided by Theorem 2.8 may be useful for determining whether endogenous solutions exist, since they guarantee that we can accurately approximate $R^{(k)}$ .

(iii)

We also point out that the first inequality in Theorem 2.8 implies that the rate at which $E\left[d_{p}(\hat{F}_{k,m},F_{k})^{p}\right]$ converges to zero is determined by $\max_{0\leq j\leq k}E[d_{p}(F_{j,m},F_{j})^{p}]$ . Since $d_{p}(F_{j,m},F_{j})$ corresponds to implementing the population dynamics algorithm by sampling without replacement from a “perfect” i.i.d. pool of observations from $\mu_{j-1}$ , this convergence rate is in some sense optimal.

(iv)

For all the examples given in Examples 2.7, we have $H_{p}<1$ and $\sup_{k\geq 0}E[|R^{(k)}|^{q}]<\infty$ for some $q>p$ , making the bound provided by Theorem 2.8 independent of $k$ . Moreover, for the Quicksort and FIND algorithms, as well as for the queueing networks with parallel servers and synchronization requirements, the best possible rate of convergence is achieved, i.e., $E[d_{p}(\hat{F}_{k,m},F_{k})^{p}]=O(m^{-1/2})$ uniformly in $k$ .

We now turn our attention to the almost sure convergence of $d_{p}(\hat{F}_{k,m},F_{k})$ , for which we provide two different results. The first one holds under Assumption 2.2 as above, but under rather strong moment conditions. Note that for the linear case Assumption 2.2, in its general form, holds for any $p\geq 1$ for which $E\left[\left(\sum_{i=1}^{N}|C_{i}|\right)^{p}\right]<\infty$ by Remark 2.4(i). Requiring Assumption 2.2 to hold only for sequences with $E[X_{i}-Y_{i}]=0$ is important for guaranteeing that we can choose $H_{p}<1$ in Theorem 2.8, but it is unimportant for the almost sure convergence of the algorithm.

Theorem 2.10

Fix $1\leq p<\infty$ and suppose that $\Phi$ satisfies Assumption 2.2 for both $p$ and $2p$ . Assume further that for any fixed $k\in\mathbb{N}$ , $\max_{0\leq j\leq k}E[|R^{(j)}|^{2p}(\log|R^{(j)}|)^{+}]<\infty$ . Then,

$$\lim_{m\to\infty}d_{p}(\hat{F}_{k,m},F_{k})=0\qquad\text{a.s.}$$

The moment condition requiring the finiteness of the $2p$ absolute moment also appears in some related (stronger) results for the convergence of the Wasserstein distance between a distribution function and its empirical measure, specifically, concentration inequalities [15] and a central limit theorem [11]. In our case, where we seek only to establish the almost sure convergence of the algorithm, this condition is too strong, so we provide below an improved result under the finer Assumption 2.3.

Theorem 2.11

Fix $1\leq p<\infty$ and suppose that $\Phi$ satisfies Assumption 2.3. Assume further that $E[|R^{(0)}|^{p+\delta}+Z^{p+\delta}]<\infty$ for some $\delta>0$ , where $Z=\sum_{i=1}^{N}\varphi(C_{i})$ . Then, for any fixed $k\in\mathbb{N}$ ,

$$\lim_{m\to\infty}d_{p}(\hat{F}_{k,m},F_{k})=0\qquad\text{a.s.}$$

Our last result relates the convergence of $d_{p}(\hat{F}_{k,m},F_{k})$ to the consistency of estimators based on the pool $\mathscr{P}^{(k,m)}$ . More precisely, the value of the algorithm lies in the fact that it efficiently produces a sample of identically distributed random variables whose distribution is approximately $F_{k}$ . A natural estimator for quantities of the form $E[h(R^{(k)})]$ is then given by

$$\frac{1}{m}\sum_{i=1}^{m}h\big(\hat{R}_{i}^{(k,m)}\big).\tag{2.4}$$

However, the random variables in $\mathscr{P}^{(k,m)}$ are not independent of each other, and the consistency of such estimators requires proof. In the sequel, the symbol $\stackrel{{\scriptstyle P}}{{\to}}$ denotes convergence in probability.

Definition 2.12

We say that $\Theta_{n}$ is a weakly consistent estimator for $\theta$ if $\Theta_{n}\stackrel{{\scriptstyle P}}{{\to}}\theta$ as $n\to\infty$ . We say that it is a strongly consistent estimator for $\theta$ if $\Theta_{n}\to\theta$ a.s.

Proposition 2.13 below establishes the consistency of estimators of the form (2.4) for a broad class of functions.

Proposition 2.13

Fix $1\leq p<\infty$ and suppose that $h:\mathbb{R}\to\mathbb{R}$ is continuous and satisfies $|h(x)|\leq C(1+|x|^{p})$ for all $x\in\mathbb{R}$ and some constant $C>0$. Then, the following hold:

a.)

If $E[d_{p}(\hat{F}_{k,m},F_{k})^{p}]\to 0$ as $m\to\infty$, then (2.4) is a weakly consistent estimator for $E[h(R^{(k)})]$ for each fixed $k\in\mathbb{N}$.

b.)

If $d_{p}(\hat{F}_{k,m},F_{k})\to 0$ a.s., as $m\to\infty$ , then (2.4) is a strongly consistent estimator for $E[h(R^{(k)})]$ for each fixed $k\in\mathbb{N}$ .

We conclude that the population dynamics algorithm can be used to efficiently generate sample pools of random variables whose distribution closely approximates that of the special endogenous solution to SFPEs of the form (1.1). Furthermore, these sample pools can be used to produce consistent estimators for a broad class of functions. The efficiency gain of the algorithm over a naive Monte Carlo approach, combined with the consistency guarantees proved in this paper, makes it a very useful tool for the numerical analysis of the many problems in which SFPEs appear.

3 Proofs

This section includes the proofs of Theorems 2.8, 2.10, 2.11, Proposition 2.13, Theorem 2.5, and of Lemma 2.6, in that order. The last two appear at the end since they are not directly related to the Population Dynamics algorithm. The first four proofs are based on a construction of the pools $\{\mathscr{P}^{(j,m)}:0\leq j\leq k\}$ where we carefully couple the random variables $\{\hat{R}_{i}^{(j,m)}\}$ with i.i.d. observations from their limiting distribution $F_{j}$ .

To start, for any $k\in\mathbb{N}$ let

[TABLE]

be a collection of i.i.d. random vectors, where $\left(Q_{i}^{(j)},N_{i}^{(j)},\{C_{(i,r)}^{(j)}\}_{r\geq 1}\right)$ has the same distribution as the generic branching vector $(Q,N,\{C_{r}\}_{r\geq 1})$ and the $\{U_{(i,r)}^{(j)}\}_{r\geq 1}$ are i.i.d. random variables uniformly distributed on $[0,1]$, independent of $\left(Q_{i}^{(j)},N_{i}^{(j)},\{C_{(i,r)}^{(j)}\}_{r\geq 1}\right)$. Next, we recursively construct a sequence of random variables $\{(\hat{R}_{i}^{(j,m)},R_{i}^{(j)}):1\leq i\leq m,\,0\leq j\leq k\}$ as follows:

i.

Set $\hat{R}_{i}^{(0,m)}=F_{0}^{-1}(U_{(i,1)}^{(0)})=R_{i}^{(0)}$, for $1\leq i\leq m$; define

[TABLE]

ii.

For $1\leq j\leq k$ and each $1\leq i\leq m$ ,

[TABLE]

define

[TABLE]

Note that the random variables $\{R_{i}^{(j)}\}_{i=1}^{m}$ are i.i.d. and have distribution $F_{j}$ , and therefore, $F_{j,m}$ is an empirical distribution function for $F_{j}$ . The distribution functions $\hat{F}_{j,m}$ are those obtained through the population dynamics algorithm.
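The pools can be made concrete with a short simulation. The Python sketch below is a minimal illustration, not the paper's implementation, and it assumes the linear map $\Phi(Q,N,\{C_r\},\{R_r\})=Q+\sum_{r=1}^{N}C_rR_r$; the function names and the sampler interface (`sample_branching`, `sample_r0`) are hypothetical. Each new pool element uses a fresh copy of the branching vector and draws its $N$ inputs uniformly, with replacement, from the previous pool, which is how we read the role of the uniforms $U^{(j)}_{(i,r)}$ in the construction; the last function computes the estimator (2.4).

```python
import random


def population_dynamics(m, k, sample_branching, sample_r0, rng):
    """Build the pools P^(0,m), ..., P^(k,m) for the (assumed) linear map
    Phi(Q, N, {C_r}, {R_r}) = Q + sum_{r=1}^N C_r R_r.

    sample_branching(rng) must return (q, n, c) with len(c) == n;
    sample_r0(rng) returns one draw from F_0.
    """
    pool = [sample_r0(rng) for _ in range(m)]  # i.i.d. draws from F_0
    pools = [pool]
    for _ in range(k):
        new_pool = []
        for _ in range(m):
            q, n, c = sample_branching(rng)  # a fresh copy of (Q, N, {C_r})
            # Each input R_r is drawn uniformly, with replacement, from the previous pool.
            value = q + sum(c[r] * pool[rng.randrange(m)] for r in range(n))
            new_pool.append(value)
        pool = new_pool
        pools.append(pool)
    return pools


def estimate(pool, h):
    """Sample-pool estimator (1/m) * sum_i h(R_hat_i^(k,m)), as in (2.4)."""
    return sum(h(x) for x in pool) / len(pool)


if __name__ == "__main__":
    def branching(rng):
        # Hypothetical example: Q = 1, N uniform on {0,1,2,3}, C_r ~ Uniform(0, 0.8).
        n = rng.randrange(4)
        return 1.0, n, [rng.uniform(0.0, 0.8) for _ in range(n)]

    pools = population_dynamics(10_000, 10, branching, lambda rng: 0.0, random.Random(1))
    print(estimate(pools[-1], lambda x: x))
    print((1 - 0.6 ** 10) / 0.4)  # E[R^(10)] when R^(0) = 0, since E[N] E[C] = 0.6
```

In the example run, $E[N]E[C_1]=1.5\times 0.4=0.6$ and $R^{(0)}=0$, so by linearity the pool average targets $E[R^{(10)}]=\sum_{j=0}^{9}0.6^{j}$, the value printed on the second line.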

Throughout the proofs we will also repeatedly use the sigma-algebra $\mathcal{F}_{k}=\sigma(\mathscr{E}_{k})$ for $k\in\mathbb{N}$. We point out that all the random variables $\{(\hat{R}_{i}^{(k,m)},R_{i}^{(k)}):i\geq 1\}$ are measurable with respect to $\mathcal{F}_{k}$ for all $m\geq 1$.

We are now ready to prove Theorem 2.8.

Proof of Theorem 2.8. Let $\{(\hat{R}_{i}^{(j,m)},R_{i}^{(j)}):1\leq i\leq m,\,0\leq j\leq k\}$ be a sequence of random vectors constructed as explained above.

Next, note that from the triangle inequality we obtain

[TABLE]

Now let $\chi$ be a Uniform(0,1) random variable independent of everything else, and define the random variables

[TABLE]

which conditionally on $\mathcal{F}_{j}$ are distributed according to $\hat{F}_{j,m}$ and $F_{j,m}$ , respectively. Then, from the definition of $d_{p}$ we have

[TABLE]

Since the random variables $X_{i}^{(j)}=\hat{R}_{i}^{(j,m)}-R_{i}^{(j)}$ are identically distributed, it follows that

[TABLE]
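The variables built from $\chi$ form a pair whose conditional marginals are $\hat{F}_{j,m}$ and $F_{j,m}$, so the definition of $d_p$ bounds $d_p(\hat{F}_{j,m},F_{j,m})^p$ by the expected cost of that pairing. Assuming, on our part, that both are obtained by selecting the same uniformly chosen index $I\in\{1,\dots,m\}$, this cost is $m^{-1}\sum_{i=1}^{m}|\hat{R}_i^{(j,m)}-R_i^{(j)}|^p$. The Python sketch below (helper names are illustrative) compares that paired cost with the exact $d_p$ between two empirical measures with $m$ atoms each, which on the real line is attained by the sorted (monotone) matching.

```python
import random


def empirical_wasserstein(xs, ys, p):
    """d_p between the empirical measures of two samples of equal size m.

    On the real line the monotone (sorted) matching is optimal, so
    d_p^p = (1/m) * sum_i |x_(i) - y_(i)|^p.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    m = len(xs)
    total = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (total / m) ** (1.0 / p)


def paired_cost(xs, ys, p):
    """((1/m) * sum_i |x_i - y_i|^p)^(1/p): the cost of the coupling that
    picks the pair (x_I, y_I) with I uniform on {1, ..., m}."""
    m = len(xs)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / m) ** (1.0 / p)


if __name__ == "__main__":
    rng = random.Random(7)
    ys = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for R_i^(j)
    xs = [y + rng.gauss(0.0, 0.1) for y in ys]       # stand-in for R_hat_i^(j,m)
    print(empirical_wasserstein(xs, ys, 2), "<=", paired_cost(xs, ys, 2))
```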

Suppose first that Assumption 2.2 holds for arbitrary $\{X_{i}\}$ and $\{Y_{i}\}$, and note that

[TABLE]

For the linear case, when Assumption 2.2 holds only for $\{X_{i}\}$ and $\{Y_{i}\}$ satisfying $E[X_{i}-Y_{i}]=0$, note that

[TABLE]

and therefore,

[TABLE]

It now follows from (3.2) and Minkowski’s inequality that

[TABLE]

Iterating the recursion above we obtain

[TABLE]

Now let $\lambda_{j,r}=(H_{p}^{1/p})^{j-r}\left(\sum_{s=0}^{j}(H_{p}^{1/p})^{j-s}\right)^{-1}$ for $0\leq r\leq j$, and use the fact that $g(x)=x^{1/p}$ is concave to obtain

[TABLE]

or equivalently,

[TABLE]

This completes the first part of the proof.
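The weights $\lambda_{j,r}$ are nonnegative and sum to one, which is what allows the concavity of $g(x)=x^{1/p}$ to enter through Jensen's inequality, $\sum_{r}\lambda_{j,r}x_r^{1/p}\leq\big(\sum_{r}\lambda_{j,r}x_r\big)^{1/p}$. The short Python check below, with hypothetical values of $H_p$ and $p$, verifies both facts numerically; it illustrates the inequality and does not reproduce the displayed bounds of the proof.

```python
import random


def geometric_weights(h, j):
    """lambda_{j,r} = h^(j-r) / sum_{s=0}^{j} h^(j-s), r = 0, ..., j (here h = H_p^(1/p))."""
    terms = [h ** (j - r) for r in range(j + 1)]
    total = sum(terms)
    return [t / total for t in terms]


def jensen_gap(weights, xs, p):
    """(sum_r w_r x_r)^(1/p) - sum_r w_r x_r^(1/p); nonnegative because x -> x^(1/p) is concave."""
    mixed = sum(w * x for w, x in zip(weights, xs))
    return mixed ** (1.0 / p) - sum(w * x ** (1.0 / p) for w, x in zip(weights, xs))


if __name__ == "__main__":
    p, H_p, j = 2.0, 0.5, 6  # hypothetical values with H_p < 1
    lam = geometric_weights(H_p ** (1.0 / p), j)
    rng = random.Random(0)
    xs = [rng.random() for _ in range(j + 1)]
    print(sum(lam), jensen_gap(lam, xs, p))
```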

Next, assume that $\max_{0\leq r\leq k}E[|R^{(r)}|^{q}]<\infty$ for $q>p\geq 1$ , $q\neq 2p$ , and use Theorem 1 in [15] to obtain that

[TABLE]

where $C=C(p,q)$ is a constant that does not depend on $F_{r}$ . The second statement of the theorem now follows.
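To give a concrete sense of how $d_p(F_{r,m},F_r)$ shrinks as $m$ grows, the sketch below computes $d_1$ exactly between the empirical distribution of $m$ Uniform(0,1) samples and the Uniform(0,1) distribution, using the one-dimensional identity $d_1(G_m,G)=\int|G_m(x)-G(x)|\,dx$ and integrating piecewise between consecutive order statistics. It is only an illustration for one hypothetical choice of $F_r$; it does not reproduce the constant $C(p,q)$ or the rate in Theorem 1 of [15], and the function names are ours.

```python
import random


def _antiderivative(t, c):
    # d/dt [(t - c) * |t - c| / 2] = |t - c|
    return (t - c) * abs(t - c) / 2.0


def w1_to_uniform(sample):
    """Exact d_1 between the empirical distribution of `sample` (values in [0, 1])
    and Uniform(0,1): the integral over [0, 1] of |G_m(x) - x|."""
    xs = sorted(sample)
    m = len(xs)
    grid = [0.0] + xs + [1.0]
    total = 0.0
    for i in range(m + 1):
        # On [x_(i), x_(i+1)) the empirical CDF equals i/m.
        a, b, level = grid[i], grid[i + 1], i / m
        total += _antiderivative(b, level) - _antiderivative(a, level)
    return total


def mean_w1(m, reps, rng):
    """Monte Carlo average of d_1(G_m, G) over `reps` independent samples of size m."""
    return sum(w1_to_uniform([rng.random() for _ in range(m)]) for _ in range(reps)) / reps


if __name__ == "__main__":
    rng = random.Random(0)
    for m in (10, 100, 1000, 10000):
        print(m, mean_w1(m, 20, rng))
```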

We now turn to the proof of Theorem 2.10. To simplify its exposition we first provide a preliminary result for the mean Wasserstein distance between a distribution and its empirical distribution function.

Lemma 3.1

Let $G$ be a distribution on $\mathbb{R}$ and let $\{X_{i}\}_{i\geq 1}$ be i.i.d. random variables distributed according to $G$ . Suppose $E[|X_{1}|^{q}(\log|X_{1}|)^{+}]<\infty$ for some $q\geq 2$ , and let $G_{m}(x)=m^{-1}\sum_{i=1}^{m}1(X_{i}\leq x)$ denote the empirical distribution function of the $\{X_{i}\}$ . Then,

[TABLE]

Proof. Fix $\epsilon>0$ and define for $x\geq 0$ the functions

[TABLE]

Next, use Proposition 7.14 in [8] followed by the monotonicity of the $L_{p}$ norm to see that

[TABLE]

where $g^{-1}(t)=\inf\{x\in\mathbb{R}:g(x)\geq t\}$ is the generalized inverse of the function $g$.
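The generalized inverse is also what turns uniforms into samples, as in $F_0^{-1}(U^{(0)}_{(i,1)})$ in the construction of the pools. For an empirical distribution function $G_m$ it reduces to an order statistic: $G_m^{-1}(t)=X_{(i)}$ for the smallest $i$ with $i/m\geq t$. The Python helper below (a hypothetical name, shown only as an illustration) implements this definition with a binary search and uses it for inverse-transform resampling.

```python
import bisect
import random


def empirical_generalized_inverse(sample):
    """Return t -> G_m^{-1}(t) = inf{x : G_m(x) >= t} for the empirical CDF G_m of `sample`.

    With order statistics x_(1) <= ... <= x_(m), this is x_(i) for the smallest i with i/m >= t.
    """
    xs = sorted(sample)
    m = len(xs)
    levels = [i / m for i in range(1, m + 1)]  # G_m(x_(i)) >= i/m

    def inverse(t):
        if not 0.0 < t <= 1.0:
            raise ValueError("t must lie in (0, 1]")
        return xs[bisect.bisect_left(levels, t)]

    return inverse


if __name__ == "__main__":
    # Inverse-transform sampling: G_m^{-1}(U) with U uniform on (0, 1] resamples the data.
    inv = empirical_generalized_inverse([3.0, 1.0, 2.0, 2.0])
    rng = random.Random(0)
    print([inv(1.0 - rng.random()) for _ in range(5)])
```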

Next, to bound (3.4) note that

[TABLE]

where in the last equality we used the observations $\{x<a^{-1}(m)\}=\{a(x)<m\}$ and $\{x<b^{-1}(m)\}=\{b(x)<m\}$. Now note that for any $n\geq 0$ we have

[TABLE]

Hence, (3.4) is bounded from above by a constant times

[TABLE]

To analyze (3.5) use the observation that $\{x\geq a^{-1}(m)\}=\{a(x)\geq m\}$ to obtain that

[TABLE]

Since $\sup_{t\geq 1}\log(t+1)/\log t<\infty$ and

[TABLE]

we obtain that (3.5) is finite. Finally, the same steps used to bound (3.5) give that (3.6) is bounded by

[TABLE]

We now give the proof of the first result on the almost sure convergence of the algorithm. The idea is to first derive a recursive bound for the Wasserstein distance $d_{p}(\hat{F}_{k,m},F_{k})$, as was done in the proof of Theorem 2.8. Once this is done, the main difficulty lies in ensuring that the error terms in the bound converge fast enough to satisfy the summability criterion of the Borel-Cantelli lemma. When slightly more than $2p$ moments are finite, this can be done using Chebyshev’s inequality, similarly to the proof of the strong law of large numbers under a finite fourth moment. We start with this case below.

Proof of Theorem 2.10. We will start the proof by deriving an upper bound for $d_{p}(\hat{F}_{k,m},F_{k})$ . To this end, we construct the random variables $\{(\hat{R}_{i}^{(j,m)},R_{i}^{(j)}):1\leq i\leq m,\,0\leq j\leq k\}$ according to the construction given at the beginning of the section. Recall that $\mathcal{F}_{j}=\sigma(\mathscr{E}_{j})$ , where $\mathscr{E}_{j}$ is given by (3.1), and that Assumption 2.2 holds for both $p$ and $2p$ .

We start by noting that the triangle inequality followed by (3.3) give

[TABLE]

Next, define for $j\geq 1$ , $X_{i}^{(j,m)}=\left|\hat{R}_{i}^{(j,m)}-R_{i}^{(j)}\right|^{p}$ and note that by construction, the random variables $\{X_{i}^{(j,m)}\}_{i\geq 1}$ are identically distributed and conditionally independent given $\mathcal{F}_{j-1}$ . Now set $Z_{i}^{(j,m)}=X_{i}^{(j,m)}-E[X_{1}^{(j,m)}|\mathcal{F}_{j-1}]$ and note that

[TABLE]

It follows that

[TABLE]

which in turn implies that

[TABLE]

where in the last step we used the inequality $\left(x+y\right)^{\beta}\leq x^{\beta}+y^{\beta}$ for $0<\beta\leq 1$ and $x,y\geq 0$ . Iterating (3.8) $k-1$ more times we obtain

[TABLE]

Now note that by the Glivenko-Cantelli lemma and the strong law of large numbers,

[TABLE]

as $m\to\infty$ , and therefore, by Definition 6.8 and Theorem 6.9 in [29], $d_{p}(F_{j,m},F_{j})\to 0$ a.s. for each $j\geq 1$ . It suffices then to show that for each $1\leq j\leq k$ the sums $m^{-1}\sum_{i=1}^{m}Z_{i}^{(j,m)}\to 0$ a.s. as well.

To see this note that for any $\epsilon>0$ ,

[TABLE]

Moreover, using the same arguments we used in the proof of Theorem 2.8, we obtain that

[TABLE]

Next, note that by Theorem 2.8 we have

[TABLE]

It follows that for any $1\leq j\leq k$ ,

[TABLE]

Finally, since by Lemma 3.1 we have that

[TABLE]

for each $0\leq r\leq j-1$ , the Borel-Cantelli Lemma gives that $\lim_{m\to\infty}m^{-1}\sum_{i=1}^{m}Z_{i}^{(j,m)}=0$ a.s. This completes the proof.

We now move on to the proof of Theorem 2.11, where only slightly more than $p$ moments are finite. In this case, we cannot use Chebyshev’s inequality to verify the condition for the Borel-Cantelli lemma, and a finer analysis of the errors is required. In particular, our proof uses the Lipschitz condition from Assumption 2.3 to derive a large-deviations bound for the sum of independent random variables appearing in the recursive analysis of $d_{p}(\hat{F}_{k,m},F_{k})$. Before proceeding to the main proof, we give three preliminary results. The first one provides an upper bound for the generalized inverse of any distribution function having a finite absolute moment of order $q$.

Lemma 3.2

Let $G$ be a distribution function on $\mathbb{R}$ , and let $G^{-1}$ be its generalized inverse. Suppose that $G$ has finite absolute moments of order $q>0$ . Then, for any $u\in(0,1)$ ,

[TABLE]

Proof. Let $X$ be a random variable having distribution $G$ , and define $G_{+}(x)=P(X^{+}\leq x)=G(x)1(x\geq 0)$ and $G_{-}(x)=P(X^{-}\leq x)=P(X\geq-x)1(x\geq 0)$ . Then,

[TABLE]

while if we define $G_{-}^{*}$ to be the right-continuous generalized inverse of $G_{-}$ , then

[TABLE]

Now use Markov’s inequality to obtain that for all $x>0$ ,

[TABLE]

and

[TABLE]

The first inequality implies that for any $u\in(0,1)$ ,

[TABLE]

while the second one plus the continuity of $H_{-}$ gives

[TABLE]

It follows that

[TABLE]
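For intuition, the Markov-inequality step produces quantile bounds of the following type: if $E[|X|^q]=M<\infty$, then $-(M/u)^{1/q}\leq G^{-1}(u)\leq (M/(1-u))^{1/q}$ for $u\in(0,1)$. This is a sketch of the kind of bound the argument yields and may differ in form from the statement of Lemma 3.2. The Python check below (function name hypothetical) compares it with the exact quantiles of an Exponential(1) distribution, for which $G^{-1}(u)=-\log(1-u)$ and $E[X^q]=\Gamma(q+1)$, and of a standard normal distribution.

```python
import math
from statistics import NormalDist


def markov_quantile_bounds(u, q, abs_moment):
    """Two-sided bounds on G^{-1}(u), u in (0, 1), implied by Markov's inequality
    when E|X|^q = abs_moment < infinity (q > 0):
        -(abs_moment / u)^(1/q) <= G^{-1}(u) <= (abs_moment / (1 - u))^(1/q).
    """
    lower = -((abs_moment / u) ** (1.0 / q))
    upper = (abs_moment / (1.0 - u)) ** (1.0 / q)
    return lower, upper


if __name__ == "__main__":
    for u in (0.01, 0.5, 0.99):
        # Exponential(1) with q = 2: E[X^2] = Gamma(3) = 2.
        lo, hi = markov_quantile_bounds(u, 2.0, math.gamma(3.0))
        print("exp", u, lo, -math.log1p(-u), hi)
        # Standard normal with q = 2: E[Z^2] = 1.
        lo, hi = markov_quantile_bounds(u, 2.0, 1.0)
        print("normal", u, lo, NormalDist().inv_cdf(u), hi)
```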

The next two preliminary results provide the key steps for the proof of Theorem 2.11; together they give a large-deviations bound (uniform in $m$) for the sample mean of (conditionally) i.i.d. random variables. The random variables $\{Y_{i}^{(j,m)}\}$ defined below will be used as upper bounds for $d_{p+\delta_{j+1}}(\hat{F}_{j,m},F_{j})$ in the proof of Theorem 2.11, and the estimates we need must be very tight, since we no longer have finite second moments and the convergence to their mean can therefore be very slow. The lemma below gives an upper bound for the truncated summands.

Lemma 3.3

Fix $1\leq p<\infty$ and $\epsilon>0$ . Suppose Assumption 2.3 holds and $E[|R^{(0)}|^{p+\delta}+Z^{p+\delta}]<\infty$ for some $\delta>0$ , where $Z=\sum_{i=1}^{N}\varphi(C_{i})$ . Let $\mathcal{F}_{j}=\sigma(\mathscr{E}_{j})$ , where $\mathscr{E}_{j}$ is defined by (3.1), set $\delta_{j}=\delta(k-j)/k$ , $0\leq j\leq k$ , $\eta=\left(\epsilon^{-1}4e^{2/\epsilon}\max\{1,E[Z^{p+\delta}]\}\right)^{-(p+\delta_{j})/(p+\delta_{j+1})}$ , and

[TABLE]

for $i=1,\dots,m$ . Then, on the event $\left\{\sup_{m\geq n}d_{p+\delta_{j}}(\hat{F}_{j,m},F_{j})^{p+\delta_{j}}\leq\eta\right\}$ , we have

[TABLE]

Proof. We start by noting that

[TABLE]

To bound each of the probabilities in (3.15) use Chernoff’s bound to obtain that

[TABLE]

Note that by Remark 2.4(i), we have that on the event $\left\{\sup_{m\geq n}d_{p+\delta_{j}}(\hat{F}_{j,m},F_{j})^{p+\delta_{j}}\leq\eta\right\}$ ,

[TABLE]

Next, use the inequality $e^{x}\leq 1+xe^{x}$ for $x\geq 0$ to obtain that

[TABLE]

Now use the inequality $1+x\leq e^{x}$ to see that

[TABLE]

Choosing $\theta=(2/\epsilon)\log m/m$, we obtain

[TABLE]

which in turn implies that (3.15) is bounded from above by

[TABLE]

This completes the proof.
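Lemma 3.3 relies on Chernoff's bound, $P(S_m\geq t)\leq e^{-\theta t}\,E[e^{\theta X_1}]^m$ for $\theta>0$ and a sum $S_m$ of $m$ i.i.d. summands. As a generic illustration of the technique, and not of the specific truncated variables $Y_i^{(j,m)}$ or the choice $\theta=(2/\epsilon)\log m/m$, the Python sketch below minimizes this bound over a grid of $\theta$ for Bernoulli summands and compares it with the exact binomial tail. The bound is computed in log space to avoid overflow; the function names are hypothetical.

```python
import math


def chernoff_bound(m, prob, t, thetas):
    """min over theta in `thetas` (theta > 0) of exp(-theta t) E[exp(theta X)]^m,
    an upper bound for P(S_m >= t) when S_m is a sum of m i.i.d. Bernoulli(prob)."""
    best_log = 0.0  # log of the trivial bound 1
    for theta in thetas:
        mgf = 1.0 - prob + prob * math.exp(theta)
        best_log = min(best_log, -theta * t + m * math.log(mgf))
    return math.exp(best_log)


def binomial_upper_tail(m, prob, t):
    """Exact P(S_m >= t) for S_m ~ Binomial(m, prob) and integer t."""
    return sum(math.comb(m, s) * prob ** s * (1.0 - prob) ** (m - s) for s in range(t, m + 1))


if __name__ == "__main__":
    thetas = [i / 100 for i in range(1, 501)]
    for m, t in ((100, 60), (100, 70), (400, 240)):
        print(m, t, binomial_upper_tail(m, 0.5, t), chernoff_bound(m, 0.5, t, thetas))
```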

The next lemma gives the complementary estimate for the probability that any of the $\{Y_{i}^{(j,m)}\}$ exceeds the truncation value in Lemma 3.3. The challenge here is the uniformity in $m$ of the result.

Lemma 3.4

Fix $1\leq p<\infty$ . Suppose Assumption 2.3 holds and $E[Z^{p+\delta}]<\infty$ for some $\delta>0$ , where $Z=\sum_{i=1}^{N}\varphi(C_{i})$ . Let $\delta_{j}=\delta(k-j)/k$ and $q_{j}=p+\delta_{j}$ for $0\leq j<k$ , fix $\eta>0$ , and let $Y_{1}^{(j,m)}$ be defined according to (3.9). Then, for any $q_{j+1}<r_{j}<q_{j}$ and all $t\geq n$ ,

[TABLE]

Proof. To simplify the notation, let

[TABLE]

Next, note that

[TABLE]

where

[TABLE]

Now, let $\mathcal{F}_{j}=\sigma(\mathscr{E}_{j})$ , where $\mathscr{E}_{j}$ is given by (3.1), and note that

[TABLE]

Moreover, if we let $q_{j}=p+\delta_{j}$ and use Lemma 3.2, we obtain that, conditionally on $\mathcal{F}_{j}$ ,

[TABLE]

Furthermore, by Minkowski’s inequality, we have that on the event $\{\sup_{m\geq n}d_{q_{j}}(\hat{F}_{j,m},F_{j})^{q_{j}}\leq\eta\}$ ,

[TABLE]

It follows that conditionally on $\mathcal{F}_{j}$ , we have that on the event $\{\sup_{m\geq n}d_{q_{j}}(\hat{F}_{j,m},F_{j})^{q_{j}}\leq\eta\}$ ,

[TABLE]

where $K_{j}\triangleq\eta^{1/q_{j}}+\|R^{(j)}\|_{q_{j}}<\infty$ by Remark 2.4(ii).

Thus, we have that on the event $\{\sup_{m\geq n}d_{q_{j}}(\hat{F}_{j,m},F_{j})^{q_{j}}\leq\eta\}$ , the union bound and Markov’s inequality yield

[TABLE]

where by assumption $q_{j+1}<r_{j}<q_{j}$ , and we have used the observation that $U_{i}\stackrel{{\scriptstyle\mathcal{D}}}{{=}}1-U_{i}$ . Finally, note that by Remark 2.4(i), we have

[TABLE]

and

[TABLE]

We conclude that

[TABLE]

We are now ready to prove Theorem 2.11. The proof proceeds by induction on the step $j$ and yields, in particular, that $d_{p}(\hat{F}_{k,m},F_{k})\to 0$ a.s. as $m\to\infty$.

Proof of Theorem 2.11. Define $\delta_{j}=\delta(k-j)/k$ for $0\leq j\leq k$. We will prove by induction on $j$ that

$$\lim_{m\to\infty}d_{p+\delta_{j}}(\hat{F}_{j,m},F_{j})=0\quad\text{a.s.}\tag{3.11}$$

for $0\leq j\leq k$. Since $\hat{F}_{0,m}(x)\equiv F_{0,m}(x)$ for all $x\in\mathbb{R}$ and $E[|R^{(0)}|^{p+\delta}]<\infty$, the Glivenko-Cantelli lemma and the strong law of large numbers yield

[TABLE]

Therefore, by Definition 6.8 and Theorem 6.9 in [29],

[TABLE]

Suppose now that (3.11) holds for some $0\leq j<k$. To prove that $d_{p+\delta_{j+1}}(\hat{F}_{j+1,m},F_{j+1})\to 0$ a.s. as $m\to\infty$, we start by constructing the random variables $\{(\hat{R}_{i}^{(t,m)},R_{i}^{(t)}):1\leq i\leq m,\,0\leq t\leq k\}$ as explained at the beginning of this section. Now note that for any $\epsilon,\eta>0$,

[TABLE]

To analyze (3.13) note that its convergence to zero as $n\to\infty$ is equivalent to the a.s. convergence of $d_{p+\delta_{j}}(\hat{F}_{j,m},F_{j})$ to zero as $m\to\infty$ , which corresponds to the induction hypothesis (3.11).

To show that (3.14) converges to zero as $n\to\infty$ , note that by Remark 2.4(ii) we have $E[|R^{(j+1)}|^{p+\delta}]<\infty$ , which implies that $E[|R^{(j+1)}|^{p+\delta_{j+1}}]<\infty$ . Hence, the Glivenko-Cantelli lemma, the strong law of large numbers, and Definition 6.8 and Theorem 6.9 in [29] give that $\lim_{m\to\infty}d_{p+\delta_{j+1}}(F_{j+1,m},F_{j+1})=0$ a.s., which is equivalent to

[TABLE]

Next, to prove that (3.12) converges to zero we first define the random variables $\{Y_{i}^{(j,m)}:1\leq i\leq m\}$ according to (3.9), and define the events

[TABLE]

Now use (3.3) and Assumption 2.3 to obtain

[TABLE]

To analyze (3.15), choose $\eta=\left(\epsilon^{-1}4e^{2/\epsilon}\max\{1,E[Z^{p+\delta}]\}\right)^{-(p+\delta_{j})/(p+\delta_{j+1})}$ and let $\mathcal{F}_{j}=\sigma(\mathscr{E}_{j})$ denote the sigma-algebra generated by $\mathscr{E}_{j}$ , as given by (3.1). Note that

[TABLE]

By Lemma 3.3, we obtain that on the event $\left\{\sup_{m\geq n}d_{p+\delta_{j}}(\hat{F}_{j,m},F_{j})^{p+\delta_{j}}\leq\eta\right\}$ , we have

[TABLE]

which implies that (3.15) is bounded from above by $2(n-1)^{-1/2}$ .

To analyze (3.16) note that

[TABLE]

Now set $q_{j}=p+\delta_{j}$ and $r_{j}=q_{j+1}+\delta/(2k)$ , and note that $q_{j+1}<r_{j}<q_{j}\leq p+\delta$ . Then, by Lemma 3.4,

[TABLE]

for any $t\geq n$ , where

[TABLE]

by Remark 2.4(ii). It follows that (3.16) is bounded from above by

[TABLE]

for all $n\geq 3$ . Since $r_{j}/q_{j+1}>1$ and

[TABLE]

as $n\to\infty$ , we conclude that (3.12) is bounded from above by

[TABLE]

which converges to zero as $n\to\infty$ . This completes the proof.
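The moment schedule used in the induction can be checked mechanically. The short Python sketch below, with hypothetical values of $p$, $\delta$ and $k$ and exact rational arithmetic, computes $\delta_j=\delta(k-j)/k$, $q_j=p+\delta_j$ and $r_j=q_{j+1}+\delta/(2k)$, and confirms that $q_{j+1}<r_j<q_j\leq p+\delta$ and $q_k=p$, so that the last step of the induction concerns $d_p$ itself.

```python
from fractions import Fraction


def moment_schedule(p, delta, k):
    """delta_j = delta (k - j) / k and q_j = p + delta_j for 0 <= j <= k,
    and r_j = q_{j+1} + delta / (2k) for 0 <= j < k."""
    delta_j = [delta * Fraction(k - j, k) for j in range(k + 1)]
    q = [p + d for d in delta_j]
    r = [q[j + 1] + delta / (2 * k) for j in range(k)]
    return delta_j, q, r


if __name__ == "__main__":
    # Hypothetical values: p = 1, delta = 1/2, k = 4.
    d, q, r = moment_schedule(Fraction(1), Fraction(1, 2), 4)
    print([str(x) for x in q])
    print([str(x) for x in r])
```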

We now give the proof of Proposition 2.13.

Proof of Proposition 2.13. The second statement of the proposition, regarding the almost sure convergence, follows directly from Definition 6.8 and Theorem 6.9 in [29]. For the convergence in probability we argue as follows.

Define $\Theta_{k,m}=\frac{1}{m}\sum_{i=1}^{m}h(\hat{R}_{i}^{(k,m)})$ and $\theta_{k}=E[h(R^{(k)})]$ . By assumption, we have that $d_{p}(\hat{F}_{k,m},F_{k})\to 0$ in $L_{p}$ and therefore in probability, as $m\to\infty$ . Hence, for every subsequence $\{m_{i}\}_{i\geq 1}$ there is a further subsequence $\{m_{i_{j}}\}_{j\geq 1}$ such that $d_{p}(\hat{F}_{k,m_{i_{j}}},F_{k})\to 0$ a.s. as $j\to\infty$ . Definition 6.8 and Theorem 6.9 in [29] now give that

[TABLE]

We conclude that for any subsequence $\{m_{i}\}_{i\geq 1}$ we can find a further subsequence $\{m_{i_{j}}\}_{j\geq 1}$ such that (3.17) holds, and therefore,

[TABLE]

The remaining two proofs in the paper correspond to Theorem 2.5 and Lemma 2.6, which, although not directly related to the Population Dynamics algorithm, may be of independent interest.

Proof of Theorem 2.5. Suppose first that Assumption 2.2 holds for any i.i.d. $\{(X_{i},Y_{i}):i\geq 1\}$ independent of $(Q,N,\{C_{i}\})$ . Recall that $F_{k}(x)=P(R^{(k)}\leq x)$ . Then, for any $j\in\mathbb{N}_{+}$ we have

[TABLE]

Moreover,

[TABLE]

It follows that for any $m\in\mathbb{N}_{+}$ we have

[TABLE]

which converges to zero as $k\to\infty$, uniformly in $m$, whenever $H_{p}<1$ and $E\left[|R^{(0)}|^{p}+|\Phi(Q,N,\{C_{r}\},\{0\})|^{p}\right]<\infty$. Therefore, the sequence $\{R^{(k)}:k\geq 0\}$ is Cauchy with respect to $d_{p}$, and since the Wasserstein space $P_{p}(\mathbb{R})$ metrized by $d_{p}$ (see Definition 6.4 in [29]) is complete by Theorem 6.18 in [29], there exists a random variable $R$ having distribution $F_{*}(x)=P(R\leq x)$ such that

[TABLE]

Equation (2.2) now follows by taking $m\to\infty$ to obtain:

[TABLE]

and using the optimal coupling $(R^{(k)},R)=(F_{k}^{-1}(U),F_{*}^{-1}(U))$ .
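On the real line the coupling $(F^{-1}(U),G^{-1}(U))$ is optimal for the cost $|x-y|^p$, $p\geq 1$, so $d_p(F,G)^p=\int_0^1|F^{-1}(u)-G^{-1}(u)|^p\,du$. As an illustration under our own choice of distributions, the Python sketch below approximates this integral with a midpoint rule for two normal distributions, for which $d_2^2=(\mu_1-\mu_2)^2+(\sigma_1-\sigma_2)^2$ is known in closed form; the function name is hypothetical.

```python
from statistics import NormalDist


def quantile_coupling_distance(inv_f, inv_g, p, n=20_000):
    """Midpoint-rule approximation of d_p(F, G) on the real line, i.e. of
    (integral_0^1 |F^{-1}(u) - G^{-1}(u)|^p du)^(1/p), the cost of the
    coupling (F^{-1}(U), G^{-1}(U)) with U ~ Uniform(0,1)."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        total += abs(inv_f(u) - inv_g(u)) ** p
    return (total / n) ** (1.0 / p)


if __name__ == "__main__":
    f, g = NormalDist(0.0, 1.0), NormalDist(1.0, 2.0)
    # For normal distributions, d_2^2 = (mu_1 - mu_2)^2 + (sigma_1 - sigma_2)^2 = 2 here.
    print(quantile_coupling_distance(f.inv_cdf, g.inv_cdf, 2), 2 ** 0.5)
```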

We now move to the linear SFPE (1.2), for which it is known (see [19]) that $R$ admits the explicit representation

[TABLE]

as described in Section 1.1. When conditions (i) hold we have $E[R^{(k)}]=0$ for all $k\geq 0$ and the arguments used above remain valid.

Suppose now that conditions (ii) hold, in which case we can take $R^{(k)}=\sum_{j=0}^{k-1}\sum_{{\bf i}\in A_{j}}\Pi_{\bf i}Q_{\bf i}+\sum_{{\bf i}\in A_{k}}\Pi_{\bf i}R^{(0)}_{\bf i}$ , where the $\{R^{(0)}_{\bf i}:{\bf i}\in U\}$ are i.i.d. copies of $R^{(0)}$ . Therefore, Minkowski’s inequality gives

[TABLE]

where $W_{j}\triangleq\sum_{{\bf i}\in A_{j}}|\Pi_{\bf i}||Q_{\bf i}|$ and $W_{k}(R^{(0)})\triangleq\sum_{{\bf i}\in A_{k}}|\Pi_{\bf i}||R^{(0)}_{\bf i}|$. Now use Lemma 4.4 in [19] to obtain that under conditions (ii) there exist constants $K_{p},K_{p}^{\prime}<\infty$ such that

[TABLE]

where $\rho_{\beta}\triangleq E\left[\sum_{i=1}^{N}|C_{i}|^{\beta}\right]$ . Hence,

[TABLE]

This completes the proof.
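The representation of $R^{(k)}$ as a sum over the first $k$ generations of a weighted branching tree also suggests the naive Monte Carlo alternative to the population dynamics algorithm: simulate an independent tree for every sample. The hypothetical Python sketch below does this for the linear map $R=Q+\sum_{r=1}^{N}C_rR_r$. Each draw requires on the order of $E[N]^k$ tree nodes when $E[N]>1$, whereas building the pools costs roughly $m\,k\,E[N]$ operations in total for $m$ (dependent) samples, which is the efficiency gain referred to earlier.

```python
import random


def naive_sample(k, sample_branching, sample_r0, rng):
    """One draw of R^(k) for the (assumed) linear map R = Q + sum_{r<=N} C_r R_r,
    obtained by expanding the weighted branching tree k generations deep with
    independent subtrees. Returns (value, number_of_tree_nodes_generated)."""
    if k == 0:
        return sample_r0(rng), 1
    q, n, c = sample_branching(rng)
    value, nodes = q, 1
    for r in range(n):
        child_value, child_nodes = naive_sample(k - 1, sample_branching, sample_r0, rng)
        value += c[r] * child_value
        nodes += child_nodes
    return value, nodes


if __name__ == "__main__":
    # Hypothetical example with N = 2 and C_r = 0.25 fixed: each draw uses 2^(k+1) - 1 nodes.
    value, nodes = naive_sample(3, lambda rng: (1.0, 2, [0.25, 0.25]), lambda rng: 0.0, random.Random(0))
    print(value, nodes)
```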

Finally, we provide the proof of Lemma 2.6.

Proof of Lemma 2.6. By (3.18) we have for any $j\in\mathbb{N}_{+}$ ,

[TABLE]

and by (3.19),

[TABLE]

Hence,

[TABLE]

and we obtain that

[TABLE]

Bibliography

[1] D.J. Aldous and A. Bandyopadhyay. A survey of max-type recursive distributional equations. Annals of Applied Probability, 15(2):1047–1110, 2005.
[2] G. Alsmeyer, J.D. Biggins, and M. Meiners. The functional equation of the smoothing transform. Ann. Probab., 40(5):2069–2105, 2012.
[3] G. Alsmeyer and P. Dyszewski. Thin tails of fixed points of the nonhomogeneous smoothing transform. Stochastic Processes and their Applications, 2017.
[4] G. Alsmeyer and M. Meiners. Fixed points of inhomogeneous smoothing transforms. J. Differ. Equ. Appl., 18(8):1287–1304, 2012.
[5] G. Alsmeyer and M. Meiners. Fixed points of the smoothing transform: Two-sided solutions. Probab. Theory Related Fields, 155(1-2):165–199, 2013.
[6] K.B. Athreya. Discounted branching random walks. Advances in Applied Probability, 17:53–66, 1985.
[7] J.D. Biggins. Lindley-type equations in the branching random walk. Stochastic Process. Appl., 75:105–133, 1998.
[8] S. Bobkov and M. Ledoux. One-dimensional empirical measures, order statistics, and Kantorovich transport distances. To appear in Memoirs of the American Mathematical Society, 2017.