A central limit theorem for the gossip process
A. D. Barbour, A. Röllin

A. D. Barbour¹ and A. Röllin²
¹ Institut für Mathematik, Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich. Work begun while ADB was Saw Swee Hock Professor of Statistics at the National University of Singapore, and supported in part by Australian Research Council Grants Nos DP120102728, DP120102398, DP150101459 and DP150103588.
² Department of Statistics and Applied Probability, National University of Singapore, 6 Science Drive 2, 117546 Singapore. Supported in part by NUS Research Grant R-155-000-167-112 and Australian Research Council Grant No. DP150101459.
Universität Zürich and National University of Singapore
Abstract
The Aldous gossip process represents the dissemination of information in geographical space as a process of locally deterministic spread, augmented by random long range transmissions. Starting from a single initially informed individual, the proportion of individuals informed follows an almost deterministic path, but for a random time shift, caused by the stochastic behaviour in the very early stages of development. In this paper, it is shown that, even with the extra information available after a substantial development time, this broad description remains accurate to first order. However, the precision of the prediction is now much greater, and the random time shift is shown to have an approximately normal distribution, with mean and variance that can be computed from the current state of the process.
Keywords.
Gossip process, deterministic approximation, branching processes, central limit theorem
MSC subject classification.
92H30; 60K35, 60J85.
1 Introduction
A model for the dissemination of information in space, in which random long-range contacts facilitate spread, was introduced in Aldous (2012). In an idealized version, proposed by Chatterjee & Durrett (2011), individuals are represented as a continuum, evenly distributed over a two-dimensional torus of large area . Information spreads locally at constant rate from individuals to their neighbours, so that a disc of informed individuals, centred on an initial informant, grows steadily in the torus. However, information is also spread by long range transmissions to other, randomly chosen points of the torus, according to a Poisson process, whose rate is proportional to the area of currently informed individuals. Any such transmission initiates a new disc of informed individuals. The process can also be interpreted as a model of the spread of an SI disease, in which local infection is supplemented by occasional long-range contacts.
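To make the model concrete, here is a minimal simulation sketch of the torus version just described. It is not the authors' construction: it assumes unit local speed, long-range transmissions at rate `lam` per unit of informed area, and it estimates the informed area on a lattice of sample points, so all areas are approximate. The long-range transmission times are generated by Lewis-Shedler thinning, which is valid here because the transmission rate can never exceed `lam` times the total area of the torus.

```python
import numpy as np


def simulate_gossip(side=10.0, lam=0.05, t_max=20.0, grid=200, seed=0):
    """Sketch of the gossip process on the torus [0, side)^2.

    Assumptions (illustrative, not from the paper): discs grow at unit
    speed, long-range contacts occur at rate lam * (informed area), and the
    informed area is estimated on a grid x grid lattice of sample points.
    """
    rng = np.random.default_rng(seed)
    xs = (np.arange(grid) + 0.5) * side / grid
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    # Earliest time at which each sample point becomes informed.
    informed_at = np.full(gx.shape, np.inf)

    def add_centre(centre, t0):
        # Torus (wrap-around) distance from the new centre to every sample point.
        dx = np.abs(gx - centre[0])
        dx = np.minimum(dx, side - dx)
        dy = np.abs(gy - centre[1])
        dy = np.minimum(dy, side - dy)
        np.minimum(informed_at, t0 + np.hypot(dx, dy), out=informed_at)

    def proportion(t):
        return float(np.mean(informed_at <= t))

    add_centre(rng.uniform(0.0, side, size=2), 0.0)
    bound = lam * side ** 2  # the transmission rate never exceeds this
    t = 0.0
    while True:
        t += rng.exponential(1.0 / bound)  # candidate event of a rate-`bound` process
        if t > t_max:
            break
        # Thinning: accept with probability lam * area(t) / bound = proportion(t).
        if rng.uniform() < proportion(t):
            add_centre(rng.uniform(0.0, side, size=2), t)

    times = np.linspace(0.0, t_max, 101)
    return times, np.array([proportion(s) for s in times])


times, prop = simulate_gossip()
```

With the default (deliberately small) torus, the initial disc alone reaches every point by time 5√2 ≈ 7.07, so the long-range contacts mainly speed up the early phase; the interesting regime of the paper is a torus whose area is large.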
With denoting the area of informed individuals by time , Chatterjee & Durrett (2011) showed that, after some randomness in the initial stages of the process, the proportion of the torus that has been informed by time closely follows a particular deterministic path. The time taken for to increase from almost zero to almost one is relatively short, and this transition occurs around a time , which is a fixed multiple of . In what follows, we therefore concentrate on times relative to . Roughly speaking, Chatterjee & Durrett (2011) showed that, for large , we have
[TABLE]
for some function , where is a scaling factor related to the speed of spread of information, and where is a random variable. The path is the same for all realizations of the process, but the position on the path at a particular time varies from realization to realization because of the random time shift . This result was generalized to gossip processes on rather general homogeneous Riemannian manifolds by Barbour & Reinert (2013), hereafter referred to as [BR], as well as to related ‘small world’ processes; they also derived a uniform bound on the approximation error. In addition, the equation describing the deterministic development was interpreted in terms of the Laplace transform of the limiting random variable corresponding to an associated Crump–Mode–Jagers (CMJ) branching process (Jagers, 1975).
By analogy with the theory of Markov population processes (Kurtz 1970, 1971), one might expect that the fluctuations around the deterministic path of the proportions informed would be approximately Gaussian, with standard deviation , at least while the proportion informed is not too small or too close to . Here, however, the random quantity of most interest — the difference between the actual course of the process and a prediction of the course based on information available early in its development — involves the fluctuations of the process while the proportion informed is rather small, and the standard analogy does not apply. Instead, in view of the approximation already established, it seems reasonable at times to predict the value of by , where is the expected value of , given the information at time , and to augment the point prediction with a confidence interval around , derived from the (approximate) conditional distribution of , given the current information.
The validity of the procedure is justified in detail in Section 3. The broad argument is to exploit the fact that is the probability that a point , chosen independently and uniformly at random in , belongs to the informed set :
[TABLE]
As it stands, this changes nothing. However, it indicates that a good approximation might be obtained by replacing by , or, equivalently, replacing by , for chosen so that is close enough to . In particular, for prediction from , we need to choose so that
[TABLE]
where and denote expectation and standard deviation given the information at time .
The advantage of using is that can be approximated as the probability of at least one of many small balls, with centres chosen independently and at random in , intersecting . These balls are the islands in an independent ‘backwards’ gossip process, run for a length of time from . There are many such balls if is not too small, and the intersection probability can be approximated by a Poisson probability, using the Stein–Chen method; see Lemma 3.3. The mean of the Poisson distribution can, with considerable effort, be shown to be close to , where is a quantity that can be simply expressed in terms of a carefully chosen branching process, and is a constant. Now, given the information available at time , the quantity (which loosely corresponds to ) is known, and the conditional distribution of the difference is approximately normal, as is shown in Theorem 2.8 in Section 2. This, in turn, leads to a normal approximation for the difference between and its prediction at time . This implies the main result of the paper, that
[TABLE]
for a suitable choice of the standard deviation depending on and ; a precise statement is given in Theorem 1.1. The error in the normal approximation is shown to be small if the number of individuals informed at time is large, even if they form only a very small proportion of the whole population. For practical purposes, in an epidemic, the very earliest development may well pass almost unnoticed (the origins are often obscure), but prediction on the basis of the information gained from the first few hundred cases is an important public health goal, and in that setting the normal approximation is reasonable.
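The practical use of such a result can be sketched as follows: a normal approximation for the time shift gives a confidence interval for it, and because the proportion informed increases along the deterministic path, the interval endpoints can simply be pushed through that path. The sketch assumes the proportion at time t is approximately `curve(t - shift)`; the logistic `curve` in the example is a hypothetical placeholder, not the paper's deterministic path, and `mean_shift`, `sd_shift` stand for the conditional mean and standard deviation that the theorem supplies.

```python
import math
from statistics import NormalDist


def predict_proportion(mean_shift, sd_shift, curve, t, level=0.95):
    """Point prediction and normal-theory interval for curve(t - shift).

    Assumes shift ~ N(mean_shift, sd_shift**2) and that `curve` is increasing,
    so that curve(t - shift) is decreasing in the shift.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    low_shift = mean_shift - z * sd_shift
    high_shift = mean_shift + z * sd_shift
    # A larger shift means a later epidemic, hence a smaller proportion.
    return curve(t - high_shift), curve(t - mean_shift), curve(t - low_shift)


def logistic(u):
    """Hypothetical increasing S-shaped path, used only for illustration."""
    return 1.0 / (1.0 + math.exp(-u))


low, mid, high = predict_proportion(0.0, 0.5, logistic, t=0.0)
```

Mapping the interval for the shift, rather than approximating the proportion itself by a normal law, keeps the interval inside [0, 1] automatically.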
1.1 Detailed formulation
We now describe the problem in more detail. We consider the gossip process evolving on a smooth closed homogeneous Riemannian manifold of dimension , such as a sphere or a torus, having large finite volume with respect to its intrinsic metric. An individual at point informed at time [math] gives rise to deterministic local spread that informs the set by time ; in addition, random ‘long range transmissions’ to independent and uniformly distributed points of occur at rate times the intrinsic volume of the set currently informed. Thus the process can be constructed from knowledge of the points of a point process on (characterized immediately below), together with an independent sequence of independent points , uniformly distributed in , and an initial point . The informed set and its volume are denoted by
[TABLE]
The point process is simple, and has conditional intensity at time with respect to the filtration , where .
The sets are assumed to be closed balls, centred at and of radius , with respect to a metric that makes a geodesic space: exactly when . Since is assumed to be homogeneous, the volume of is independent of , and we will therefore denote it by . The sets are also assumed to be locally almost Euclidean in the sense that for some constant . More precisely, we will assume that, for constants ,
[TABLE]
The quantity has physical dimensions , so that can be interpreted as a local velocity of spread of information in any particular direction. Assumption (1.4) is satisfied, for instance, for balls with respect to geodesic distance on the surface of a -dimensional sphere of large radius , when and
[TABLE]
(Li, 2011), in which case we can take in all dimensions .
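For d = 2, the near-Euclidean behaviour of geodesic balls on a large sphere can be checked numerically. The sketch uses the standard formula for the area of a spherical cap of geodesic radius r on a sphere of radius R, namely 2πR²(1 - cos(r/R)), written as 4πR² sin²(r/(2R)) to avoid cancellation, and compares it with the Euclidean area πr². The relative deficit lies between 0 and (r/R)²/12. This only illustrates the flavour of (1.4) on the two-sphere; the constants of (1.4) itself are not reproduced here.

```python
import math


def cap_area_ratio(r, R):
    """Area of a geodesic disc of radius r on a sphere of radius R, divided by pi r^2."""
    half_angle = r / (2.0 * R)
    cap_area = 4.0 * math.pi * R ** 2 * math.sin(half_angle) ** 2  # = 2 pi R^2 (1 - cos(r/R))
    return cap_area / (math.pi * r ** 2)


for R in (10.0, 100.0):
    deficit = 1.0 - cap_area_ratio(1.0, R)
    print(f"R={R:6.1f}  relative deficit={deficit:.3e}  bound (r/R)^2/12={1.0 / (12 * R ** 2):.3e}")
```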
Using (1.4), the probability of there being no long range transmission before time is given by
[TABLE]
so that the mean time to the first long range transmission is approximately
[TABLE]
Thus
[TABLE]
having physical dimensions , is such that represents the time scale for the first long range transmission, and then reflects the size of the initial neighbourhood when the first long range transmission occurs; the exact specification of is to make it equal to the growth rate of the associated CMJ process ([BR], p.986). For our approximations to be good, the size of the initial neighbourhood when the first long range transmission occurs should be small compared to , so that, defining
[TABLE]
a quantity without physical dimension, we shall take to be large. Note that, if this is so, the approximations made above have small error, in view of (1.4).
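The computation of the mean time to the first long-range transmission can be illustrated under a simplifying assumption: before that transmission the informed set is a single ball whose volume at time t is exactly c·t^d (so c absorbs the local speed and the unit-ball volume), and transmissions occur at rate λ per unit informed volume. The waiting time then has survival function exp(-λc t^{d+1}/(d+1)), whose integral has the closed form coded below; the sketch checks it against a crude quadrature and exhibits the scaling (λc)^{-1/(d+1)}. The names `lam`, `c`, `d` are illustrative, and the exact expressions displayed in the paper are not reproduced.

```python
import math


def mean_first_transmission(lam, c, d):
    """Closed form: int_0^inf exp(-a t^(d+1)) dt = Gamma(1 + 1/(d+1)) * a^(-1/(d+1)),
    with a = lam * c / (d + 1)."""
    a = lam * c / (d + 1)
    return math.gamma(1.0 + 1.0 / (d + 1)) * a ** (-1.0 / (d + 1))


def mean_first_transmission_numeric(lam, c, d, n=200_000):
    """Midpoint-rule integral of the survival function P(no transmission by t)."""
    a = lam * c / (d + 1)
    t_end = (50.0 / a) ** (1.0 / (d + 1))  # survival is below exp(-50) beyond t_end
    h = t_end / n
    return h * sum(math.exp(-a * ((i + 0.5) * h) ** (d + 1)) for i in range(n))


# Example: a disc in the plane (d = 2, c = pi) with lam = 0.01.
print(mean_first_transmission(0.01, math.pi, 2), mean_first_transmission_numeric(0.01, math.pi, 2))
```

Multiplying `lam` by 8 in two dimensions halves the mean waiting time, which is the (λc)^{-1/(d+1)} time scale in action.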
To start with, the points of closely match the birth events of a CMJ process , whose birth intensity as a function of age is given by . In fact, the approximation of , constructed by using the CMJ process to approximate and with the same sequence of points , is excellent for times if ([BR], §2.2), and still gives an approximation to the volume of at time that is accurate to the first order if ([BR], Theorem 3.2 and (2.23)). This CMJ approximation takes the form
[TABLE]
for a constant , where is a limiting random variable associated with the CMJ process . Taking
[TABLE]
with large and negative in the range in which this approximation holds, this implies that closely follows the curve
[TABLE]
where .
In [BR], Theorem 3.2, an analogous approximation
[TABLE]
is established, with uniformly small error, for all values of , with defined before (1.11), and with the time shift given by , for a suitably chosen constant . Clearly, to be compatible with (1.9), as , as follows from ([BR], following (2.23)).
For any fixed , the distribution of is close to that of , and is a bounded random variable. Hence it can only be approximately normally distributed, after appropriate centring and normalization, in circumstances in which the distribution of is concentrated close to some fixed value. This is not true of the distribution of at time [math]. However, when predicting from a time for any fixed , , the conditional distribution of , given the information up to time , is concentrated close to an approximation provided only that , even though the size of the informed set is still relatively small when compared to for any . The aim is now to show that the difference , suitably normalized, is approximately normally distributed.
It turns out to be easier to work with a ‘flattened’ CMJ process , rather than with the original CMJ process . The process has birth rate at age given by , and is thus the same process for all , whereas depends implicitly on through the function . The quantity then turns out to be the Malthusian parameter of . In a CMJ process with Malthusian parameter , at large times, a randomly sampled individual has average age approximately . For , , and replacing by in (1.4) confirms that the two CMJ processes and have birth rates that are close to each other if is large. The essentials of the proof of the normal approximation to are carried out in Section 2. The argument hinges on examining a collection of (complex valued) martingales associated with , that are defined in (2.13) below. In particular, , , is non-negative and square integrable, having limit . It is then shown that , suitably normalized, is close enough to the integral of a function with respect to an independent standard Brownian motion , giving the normal approximation.
The arguments in Section 3, as outlined before (1.2), rely heavily on comparisons between birth and growth processes. The actual process is compared with the branching approximation , and is compared to its flattened version . Further (flattened) CMJ processes and are then introduced, to act as upper and lower bounds for ; the comparison is formalized in Lemma 3.1. All the detailed computations in Section 3 are made using these processes, including the reduction of the intersection probability in Lemma 3.3 to a tractable form in Lemma 3.6.
To state our theorem, we take
[TABLE]
as an approximation to , where the set indexes the set of all non-intersecting neighbourhoods of . For each of these, the radii can be determined, and so can be derived from . Then let , and
[TABLE]
and define
[TABLE]
where is as above; see also (2.13) and (2.18). Let denote the bounded Wasserstein distance between probability measures on :
[TABLE]
where consists of all Lipschitz functions whose Lipschitz constant is at most . The theorem is as follows.
Theorem 1.1
With the above definitions, suppose that for , where is as in (1.4). Then, for any , there exists a and an event with such that
[TABLE]
uniformly in , where as in (1.8) and
[TABLE]
So, for instance, for spherical neighbourhoods in , it is possible to take any strictly between [math] and in Theorem 1.1. The order statements can be replaced by inequalities, valid for all sufficiently large, in which the constants depend only on and ; however, the lower bound on the value of then also involves and the constants and from (1.4).
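For intuition about the metric used in Theorem 1.1, the sketch below brackets the bounded Wasserstein distance between two empirical distributions on the real line. It assumes that the class of test functions consists of functions bounded by 1 in absolute value with Lipschitz constant at most 1 (the exact constants are not recoverable from the text above). Under that assumption the distance is at most the 1-Wasserstein distance, which for equal-size samples is obtained by matching order statistics, and at least the discrepancy produced by any single admissible test function; here the clipped ramps x ↦ max(-1, min(1, x - s)) are used.

```python
import random


def wasserstein1_empirical(xs, ys):
    """1-Wasserstein distance between two equal-size empirical measures on R."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def bw_lower_bound(xs, ys, shifts):
    """Best discrepancy over clipped ramps f_s(x) = max(-1, min(1, x - s)).

    Each f_s is bounded by 1 and 1-Lipschitz, so it is an admissible test function.
    """
    def ramp(v, s):
        return max(-1.0, min(1.0, v - s))

    best = 0.0
    for s in shifts:
        mean_x = sum(ramp(x, s) for x in xs) / len(xs)
        mean_y = sum(ramp(y, s) for y in ys) / len(ys)
        best = max(best, abs(mean_x - mean_y))
    return best


rng = random.Random(3)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = [rng.gauss(0.3, 1.0) for _ in range(2000)]
shifts = [k / 10 for k in range(-30, 31)]
lower, upper = bw_lower_bound(xs, ys, shifts), wasserstein1_empirical(xs, ys)
```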
In fact, the proof shows a little more: we could realize the normal random variables , for different values of , as for the same standard normal random variable . The interpretation of this is that the fluctuations in are essentially those of , and that the remaining randomness after time is overwhelmingly that of the difference , a single random variable. This result, surprising at first sight, reflects a phenomenon common to branching processes: the randomness determining the growth of a super-critical branching process occurs at the very beginning of its development.
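The early-randomness phenomenon can already be seen in the simplest supercritical branching process. The sketch below uses a Yule process (each individual gives birth at rate 1, starting from one individual), not the CMJ process of the paper. For this process N(t) is geometric with success probability e^{-t}, and W(t) = e^{-t}N(t) is a martingale with mean 1 converging to an exponential limit. Sampling W at two times shows that its value at the later time is already nearly determined by its value at the earlier one.

```python
import math
import random


def geometric(p, rng):
    """Sample G with P(G = k) = p (1 - p)^(k - 1), k = 1, 2, ..."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return 1 + int(math.log(u) / math.log1p(-p))


def yule_martingale(t1, t2, rng):
    """Return (W(t1), W(t2)) with W(t) = exp(-t) N(t) for a Yule process from one individual."""
    n1 = geometric(math.exp(-t1), rng)
    # Each of the n1 individuals alive at t1 founds an independent Yule process.
    p = math.exp(-(t2 - t1))
    n2 = sum(geometric(p, rng) for _ in range(n1))
    return n1 * math.exp(-t1), n2 * math.exp(-t2)


def correlation(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
pairs = [yule_martingale(4.0, 7.0, rng) for _ in range(2000)]
w4 = [a for a, _ in pairs]
w7 = [b for _, b in pairs]
print(correlation(w4, w7), sum(w7) / len(w7))
```

The theoretical correlation between W(4) and W(7) is √((1 - e^{-4})/(1 - e^{-7})), about 0.99, by the martingale property.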
2 The branching process
In this section, we investigate the limit , as , of a martingale associated with a particular CMJ branching process. We show that is approximately normally distributed, and give an explicit bound on the accuracy of the approximation. Although, for a (multitype) Galton–Watson process, a central limit theorem of this sort is not difficult to establish (Asmussen & Hering, Theorem 7.1), the corresponding theorems for general CMJ processes seem not to be available. Here, we are able to exploit the particular structure of our CMJ process to prove what we need.
We start by identifying the branching process that we work with, which can be expressed as a Markov process in a -dimensional space. The properties of the coordinate processes , and of some equivalent (complex valued) martingales are established in Lemma 2.1. The component is a non-negative real valued martingale, and is its limit as . Using Kolmogorov’s inequality, the fluctuations of the sample paths of the processes are controlled in Lemma 2.2, and this in turn gives control over the processes .
The martingale difference is written in (2.23) as an integral of an explicit function of the process with respect to a standard compensated Poisson process. Using the control that we have over the , we determine successively simpler approximations to this process, in (2.29) and (2.31), at each stage making sure that the error incurred is sufficiently small (Lemma 2.4 and Corollary 2.6). Finally, in (2.35), an expression is obtained in which integration with respect to the compensated Poisson process has been replaced by integration with respect to standard Brownian motion, and this can be used with an error controlled in Lemma 2.7. The results of these steps are collected as a functional approximation in Theorem 2.8. The version that is used to prove Theorem 3.9 in Section 3 is given as Corollary 2.10.
2.1 Properties of the flattened process
The first step is to determine a suitable . We do so by way of a ‘flattened’ version of the CMJ branching process . The process is the counting process associated with a point process on , with a.s., whose compensator is given by , where , and where , as before, denotes the intensity per unit volume. At time , can be thought of as consisting of neighbourhoods, whose volumes at time are given by , asymptotically close to, but not the same as the volume . The intensity is then precisely that of a CMJ process, in which neighbourhoods play the part of individuals, and the point process of an individual’s offspring is an inhomogeneous Poisson process with rate at age . The mean number of offspring of an individual is thus infinite, but the Malthusian parameter , chosen so that the equation
[TABLE]
is satisfied, is finite, and is given by . Note that
[TABLE]
where is the inhomogeneous Poisson process with rate at age .
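The Malthusian equation above can be solved numerically once the offspring rate is specified. The sketch assumes, for illustration, that an individual of age a gives birth at rate b·a^m (for example m = d if the rate is proportional to the volume of a d-dimensional ball of radius a; this identification is an assumption). The mean offspring number ∫ b a^m da is then infinite, as noted in the text, while the Laplace transform b·m!/α^{m+1} is finite, giving α = (b·m!)^{1/(m+1)}. The sketch recovers this value by bisection on a numerically integrated transform.

```python
import math


def offspring_transform(alpha, b, m, n=10_000):
    """Midpoint-rule value of int_0^inf exp(-alpha a) * b a^m da."""
    a_end = (60.0 + 10.0 * m) / alpha  # the integrand is negligible beyond a_end
    h = a_end / n
    total = 0.0
    for i in range(n):
        a = (i + 0.5) * h
        total += b * a ** m * math.exp(-alpha * a)
    return h * total


def malthusian_bisection(b, m, lo=1e-3, hi=1e3, rel_tol=1e-8):
    """Solve offspring_transform(alpha) = 1; the transform decreases in alpha."""
    while hi - lo > rel_tol * hi:
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if offspring_transform(mid, b, m) > 1.0:
            lo = mid  # transform too large: alpha must increase
        else:
            hi = mid
    return math.sqrt(lo * hi)


def malthusian_closed_form(b, m):
    """alpha with b * Gamma(m + 1) / alpha^(m + 1) = 1."""
    return (b * math.gamma(m + 1)) ** (1.0 / (m + 1))
```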
We can immediately deduce some useful general properties of the process . To start with, because the variance of the discounted offspring number is finite, being given by , it follows from Ganuza & Durham (1974, Theorem 1) that there exist finite constants and such that, for all ,
[TABLE]
in view of (2.1), and depend only on . Then the intensity can be expressed as , where
[TABLE]
This in turn implies from (2.2) that
[TABLE]
using Cauchy–Schwarz for the second inequality.
However, also has special structure that will prove useful in what follows, relating to the sums
[TABLE]
of the -th powers of the ages of the neighbourhoods. Note that is as defined previously, and that
[TABLE]
Since has intensity , letting denote a unit rate Poisson process, we can write
[TABLE]
Defining , for any , the equations (2.6) reduce to
[TABLE]
with the particular choice , equation (2.7) becomes
[TABLE]
so that . In particular, from (2.8) and (2.9), it follows that the process defined by
[TABLE]
is a Markov process. It also follows directly from (2.8) and (2.9), or as a consequence of (2.1), that
[TABLE]
where denotes the process with . Note that may depend on , as also may .
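Why the vector of power sums of the ages is Markov can be checked directly: between births every age increases at unit rate, so (a + h)^k expands binomially and the new power sums are linear combinations of the old ones, while a birth adds an individual of age 0, which changes only the zeroth sum. If, as assumed for this sketch, the birth intensity depends on the ages only through these sums, nothing further about the individual ages is needed to continue the process. The function names are illustrative.

```python
import math


def power_sums(ages, d):
    """S_k = sum of age^k over individuals, for k = 0, ..., d."""
    return [sum(a ** k for a in ages) for k in range(d + 1)]


def age_by(S, h):
    """Power sums after time h with no births: sum_j C(k, j) h^(k-j) S_j."""
    return [sum(math.comb(k, j) * h ** (k - j) * S[j] for j in range(k + 1))
            for k in range(len(S))]


def add_newborn(S):
    """A newborn has age 0: S_0 increases by one and the other sums are unchanged."""
    return [S[0] + 1] + S[1:]


ages, d, h = [0.3, 1.2, 2.5], 3, 0.7
via_sums = add_newborn(age_by(power_sums(ages, d), h))
direct = power_sums([a + h for a in ages] + [0.0], d)
```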
In order to describe the properties of the process in more detail, we introduce the (complex valued) processes
[TABLE]
where , , which are martingales with respect to the natural filtration of . In particular, for , we have , and
[TABLE]
is a real valued, càdlàg martingale, and plays a key part in our arguments. It is shown in the next lemma that it is also non-negative, and the rest of the section is then devoted to proving a normal approximation to , which is the basis for the central limit theorem for the gossip process itself. Note that the distribution of can be derived from the corresponding martingale for the process with , since, from (2.11),
[TABLE]
from this, it also follows that the distribution of is the same for all . The remaining martingales are useful, because they enable the quantities to be expressed in a tractable form, as in the next lemma.
Lemma 2.1
With notation as above, we have
[TABLE]
and
[TABLE]
Proof: It follows from (2.8) that, for any ,
[TABLE]
and, by partial integration, that
[TABLE]
Hence
[TABLE]
and thus
[TABLE]
Taking for any , we have , making the right hand side equal to , because , by (2.8) and (2.9); hence
[TABLE]
The first statement of the lemma follows by taking , and the second by using the orthogonality relation .
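One natural reading of the complex-valued martingales, stated here only as an assumption, is that their exponents are the d + 1 complex solutions α·ω^j, with ω = e^{2πi/(d+1)}, of a Malthusian equation of the form θ^{d+1} = α^{d+1}, as happens when the offspring rate at age a is proportional to a^d. Under that reading, the orthogonality relation is the discrete Fourier identity (1/(d+1)) Σ_j ω^{jk} = 1 if d + 1 divides k and 0 otherwise, and only the real root j = 0 has real part equal to α, which is consistent with just one term remaining significant in the limit. The sketch checks these facts numerically.

```python
import cmath
import math


def malthusian_roots(alpha, d):
    """The d + 1 complex solutions of theta^(d+1) = alpha^(d+1)."""
    omega = cmath.exp(2j * math.pi / (d + 1))
    return [alpha * omega ** j for j in range(d + 1)]


def dft_orthogonality(d, k):
    """(1 / (d + 1)) * sum_j omega^(j k): equals 1 if (d + 1) divides k, else 0."""
    omega = cmath.exp(2j * math.pi / (d + 1))
    return sum(omega ** (j * k) for j in range(d + 1)) / (d + 1)


alpha, d = 1.5, 3
roots = malthusian_roots(alpha, d)
print([round(r.real, 6) for r in roots])  # real parts of the d + 1 roots
```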
Now, writing and noting that , it follows from (2.12) that, for and for ,
[TABLE]
Using this bound with , we see that the variances of the terms with in the sum in Lemma 2.1 converge to zero as . However, the term with remains significant as , since, by (2.17) with and , it follows that is square integrable, and that
[TABLE]
Note that the distribution of , through its Laplace transform as in (1.12), already appears in the statement of Theorem 1.1, and is the same for all , as remarked following (2.14). Thus each of the satisfies
[TABLE]
We shall exploit more detailed versions of these asymptotics in Section 3.
In order to use Lemma 2.1 to describe further the behaviour of the , we need good control of the fluctuations of the processes . As indicated by (2.17), their asymptotic behaviour depends substantially on whether or not . Note, for future reference, that , where is as in (1.11).
Lemma 2.2
For any and , and for any , define the events
[TABLE]
similarly, for , define
[TABLE]
Then there exist constants , , such that, for all ,
[TABLE]
Proof: Combining (2.16) with (2.10), it follows that $\mathcal{L}\bigl((W_0(s),\ldots,W_d(s)),\, s\geq v \,\big|\, \widehat{\mathcal{F}}_v\bigr)$ depends on only through the value of . Then, noting that, for , and for any ,
[TABLE]
and using Kolmogorov’s inequality on the real and imaginary parts of , it follows that
[TABLE]
For , taking , it follows from (2.17) that
[TABLE]
For , taking and , it follows from (2.17) that
[TABLE]
and adding over gives
[TABLE]
For , taking and , it follows from (2.17) that
[TABLE]
and adding over gives
[TABLE]
For , the result is proved in analogous fashion, starting from
[TABLE]
and observing that, from (2.17),
[TABLE]
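Kolmogorov's maximal inequality, which the proof of Lemma 2.2 applies to the real and imaginary parts of the complex martingales, states that a square-integrable martingale M satisfies P(max_{k ≤ n} |M_k| ≥ x) ≤ E[M_n²]/x². A quick Monte Carlo check on a simple symmetric random walk, used here purely as a stand-in for the martingales of the paper, shows the bound in action.

```python
import random


def kolmogorov_check(n=200, x=25.0, trials=5_000, seed=2):
    """Monte Carlo estimate of P(max_{k<=n} |S_k| >= x) for a simple symmetric
    random walk, together with Kolmogorov's bound E[S_n^2] / x^2 = n / x^2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = peak = 0
        for _ in range(n):
            s += 1 if rng.random() < 0.5 else -1
            peak = max(peak, abs(s))
        if peak >= x:
            hits += 1
    return hits / trials, n / x ** 2


estimate, bound = kolmogorov_check()
```

The estimate comes out well below the bound, which is typical: the inequality is crude, but it is uniform over the whole path, which is what is needed to control the sample-path fluctuations.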
As a result of this lemma, we can sharpen (2.19) by giving an explicit bound on the error made when approximating by for any . To state the bound, we define
[TABLE]
noting that, on , for all . Then for all and , and if , we have
[TABLE]
on . Furthermore, from Lemma 2.2,
[TABLE]
2.2 Approximating an integral representation of
The aim of this section is to prove an approximation theorem, when is large, for the process in . We recall (2.7) and (2.9), and use the representation (2.12), writing
[TABLE]
where is a unit rate Poisson process, with increments independent of , starting with , and where , , are constructed in from the Poisson process , using (2.8) and (2.9), with initial values , . Once again, the process depends on its past only through . Since the expression (2.23) is too complicated to use directly, we simplify it in a series of stages.
We start by approximating in . In view of (2.21), we have , or ; the precise result is as follows. Note that, for our purposes, can be thought of as small.
Lemma 2.3
Fix any . Then, on the event , we have
[TABLE]
for all , where , , and is as defined in (2.20).
Proof: We begin by noting that , so that, from (2.21), for ,
[TABLE]
So, defining
[TABLE]
it follows that, on ,
[TABLE]
Now substitute into (2.26) for , giving
[TABLE]
Writing and inverting, it then follows immediately that
[TABLE]
establishing the lemma.
This now allows (2.23) to be rewritten in the form
[TABLE]
where is a unit rate Poisson process, with respect to which both upper limit and integrand are predictable, the latter being decreasing in and bounded between
[TABLE]
for all , on the event . In order to show that we can replace both the integrand and the upper limit of integration in (2.27) with simpler expressions, without making too great an error, we use Lemma 4.1 from the Appendix.
We first replace the integrand in (2.27), showing that is close to , defined by
[TABLE]
using (2.28). We set
[TABLE]
Lemma 2.4
With the above definitions, for any and any , we have
[TABLE]
where is as in (2.22), and .
Proof: It follows from (2.27) that is an integral of the form considered in Lemma 4.1, albeit with a random upper limit, and its corresponding function satisfies
[TABLE]
on , in view of (2.28). We can thus apply Lemma 4.1 to the process with and with as in (2.30), noting that then, recalling (2.22),
[TABLE]
Now, from (2.30), we have . We can then choose in Lemma 4.1, because
[TABLE]
if , and the result follows.
The next step is to simplify the upper limit in (2.29), using Lemma 4.1 to show that, with as defined in (2.25), is close to the process given by
[TABLE]
For this, we need to control , for defined in (2.26).
Lemma 2.5
With the definitions given in (2.26), (2.29) and (2.31), and for any , we have
[TABLE]
where , and
[TABLE]
Proof: We consider the ranges and separately. In the first range of , define for , and set : then for each . By Lemma 4.1, with the constant and , we have
[TABLE]
for , since on . Hence, by a standard argument,
[TABLE]
In the second range of , we define
[TABLE]
noting that . By Lemma 4.1 with , we have
[TABLE]
since on , and hence
[TABLE]
since also on . We need here as the bound on the supremum difference, rather than the usual , because it is possible to have for some ; however, it then has to be the case that, for such , if , which is the case on .
In view of Lemma 2.5 and (2.26), we immediately have the following corollary.
Corollary 2.6
With the definitions of Lemma 2.5,
[TABLE]
We now show that is close in distribution to the process defined by
[TABLE]
where, for the integrator, the compensated Poisson process from has been replaced by a standard Brownian motion . Note that is itself just a time-changed Brownian motion:
[TABLE]
and so, conditional on , .
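The replacement of the compensated Poisson integrator by Brownian motion can be motivated by a moment check. For a deterministic integrand f and a Poisson process N of rate n, the normalized integral n^{-1/2} ∫_0^T f(s) d(N(s) - ns) has mean 0 and variance ∫_0^T f(s)² ds for every n, and is approximately normal for large n, matching ∫_0^T f dB ~ N(0, ∫ f²). The sketch below is only a moment-level illustration with an arbitrary choice f(s) = e^{-s}; the integrands in the paper are random and predictable, and the quantitative coupling used in Lemma 2.7 is the Komlós, Major & Tusnády construction.

```python
import math
import random


def normalized_poisson_integral(f, integral_f, T, n, rng):
    """n^(-1/2) * int_0^T f(s) d(N(s) - n s) for a homogeneous Poisson process N of rate n."""
    total = 0.0
    t = rng.expovariate(n)
    while t <= T:  # sum f over the Poisson points in [0, T]
        total += f(t)
        t += rng.expovariate(n)
    return (total - n * integral_f) / math.sqrt(n)


def f(s):
    return math.exp(-s)


T, n = 2.0, 100
integral_f = 1.0 - math.exp(-T)                 # int_0^T f(s) ds
variance_target = (1.0 - math.exp(-2 * T)) / 2  # int_0^T f(s)^2 ds
rng = random.Random(4)
samples = [normalized_poisson_integral(f, integral_f, T, n, rng) for _ in range(4000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
```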
Lemma 2.7
Fix . Then there are constants and , depending only on , with the following properties. For all such that , it is possible to construct and on the same probability space, in such a way that
[TABLE]
Proof: For any , there are constants with the property that, for any , a standard Poisson process and a standard Brownian motion can be constructed on the same probability space in such a way that , where
[TABLE]
This follows from Komlós, Major & Tusnády (1975, Theorem 1 (ii)), together with elementary exponential bounds for the fluctuations of the standard Poisson process and Brownian motion over the time interval . Fix , and take for , where is chosen so that , implying that . Then use the corresponding choices of and to realize and , which we express, by partial integration, in the form
[TABLE]
Taking the difference, it is immediate that, for and on ,
[TABLE]
and that
[TABLE]
This shows that, on ,
[TABLE]
Then, taking , , and in Lemma 4.1, with the choice of permissible for all , where is chosen such that , we have
[TABLE]
The same bound is satisfied also for , as can be deduced from the representation (2.36). Now choose so that , and set .
Summarizing the conclusions of Lemmas 2.4 and 2.7 and of Corollary 2.6, we have the following theorem. In the error terms, is defined in (2.22), in Lemma 2.4, in Lemma 2.5 and in Lemma 2.7.
Theorem 2.8
With the definitions (2.12), (2.25) and (2.35), fixing any , we can construct and a time changed Brownian motion on the same probability space, in such a way that, for all ,
[TABLE]
where , is as defined in (2.32), and the constants and , which depend only on , can be deduced from Lemma 2.7 with .
2.3 Consequences for the gossip process
Theorem 2.8 is not yet in a form easily applied to the gossip process. To start with, the statement of the theorem involves the -measurable random variables , , and , , and it is useful to have some idea of their magnitude. It is also useful to specify how big the probability may be. To derive appropriate statements, we begin with the random elements and , .
Lemma 2.9
For any , we have
[TABLE]
for . Furthermore, for any ,
[TABLE]
for a suitably chosen .
Proof: The first part follows from (2.17) and Chebyshev’s inequality, and, for , the bound on the upper tail holds because and . For the lower tail, note that a.s., so that, because is càdlàg and positive on , we have a.s. also. Suppose that is such that . Then, for , if any of the offspring of the initial individual that are born before time generate families with , where . The probability that there are no such offspring is just . Hence, for and ,
[TABLE]
In view of (2.20), if , then on the event
[TABLE]
and the first part (2.38) of Lemma 2.9 directly implies that
[TABLE]
for a suitable constant ; of course, by definition, . The second part of Lemma 2.9 implies that is such that . From these observations and (2.32), it follows that
[TABLE]
if is such that , and hence, for such ,
[TABLE]
in addition,
[TABLE]
on also.
For the quantities , , note that, from (2.22),
[TABLE]
and that . Then, as in Lemma 2.7, if we take . From Lemma 2.4, , and both and , defined in Lemma 2.5, are super-exponentially small in on the event . Finally, by the last inequality in Lemma 2.9,
[TABLE]
which is also super-exponentially small in . Hence, taking
[TABLE]
for which , and assuming that is such that , we have the following consequence of Theorem 2.8. To state it, and for future use, we define
[TABLE]
an upper bound for the times to be considered in proving the central limit theorem.
Corollary 2.10
For any and such that and , there are constants and and an event , with , such that, for any such that ,
[TABLE]
uniformly for all .
Taking any and setting , we also observe from Lemma 2.1 that
[TABLE]
on , the probability of whose complement is bounded in (2.41).
3 The central limit theorem
In this section, the central limit theorem is proved much as outlined in the introduction. With , we show in Lemma 3.2 that
[TABLE]
if is chosen to be sufficiently long after . The approximation of as a Poisson probability is then accomplished in Lemma 3.3, with an error that is small if is sufficiently large. Lemmas 3.4–3.6 approximate the mean of the Poisson distribution by successively simpler quantities, and bound the errors involved in the approximations. The combined result of these steps is summarized in Corollary 3.7, showing that, given , the distribution of is close to that of .
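The Poisson step can be illustrated in the simplest setting, in which the events "ball i meets the informed set" are independent with small probabilities p_i (in Lemma 3.3 they are not independent, which is why the Stein-Chen method is needed). The number W of hits then satisfies the Barbour-Hall bound d_TV(L(W), Po(λ)) ≤ λ^{-1}(1 - e^{-λ}) Σ p_i², with λ = Σ p_i, so in particular P(W = 0) = Π(1 - p_i) lies within that distance of e^{-λ}. The sketch compares the two.

```python
import math


def no_hit_probability(p):
    """Exact P(W = 0), its Poisson approximation, and the Barbour-Hall bound,
    for W a sum of independent indicators with success probabilities p."""
    lam = sum(p)
    exact = math.prod(1.0 - pi for pi in p)
    poisson = math.exp(-lam)
    if lam == 0.0:
        return exact, poisson, 0.0
    bound = (1.0 - math.exp(-lam)) / lam * sum(pi * pi for pi in p)
    return exact, poisson, bound


exact, poisson, bound = no_hit_probability([0.001] * 500)
```

With many balls each hit with small probability, Σ p_i² is tiny compared with λ, which is the regime in which the Poisson approximation is accurate.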
Now the normalized difference can be shown, using Corollary 2.10, to have a normal approximation. Because of the normalization, it is important at this point to check that the approximation errors in the previous steps are all much smaller than ; this places some restrictions on how large may be. The linearization of the difference , needed to show that it is itself approximately normally distributed, is accomplished in Lemma 3.8, and the final result is given in Theorem 3.9.
3.1 Comparisons of processes
The detailed calculations make heavy use of comparisons between a number of processes, which we justify in Lemma 3.1 by realizing them on a common probability space. The process itself can be realized by starting with the times of the branching process , paired with a sequence of independent uniform points of . This yields a process
[TABLE]
in terms of which we define
[TABLE]
We can then define the set valued process
[TABLE]
obtained by taking the unions of the neighbourhoods generated by . The process can be augmented to a process of quadruples, by including a set of pairs , , where and , denoting the subsets from which the long range contacts were made and the positions of the individuals within them: given ,
[TABLE]
and is then chosen uniformly from the set . The process is derived from sequentially, by thinning. The pair is not included in unless . This thinning process ensures that, when neighbourhoods overlap in , only contacts from the neighbourhood that was informed earliest are allowed, ensuring that the rate of long range transmissions from remains equal to . Note that, if , the pair is included in defining ; however, it is redundant in (1.3), the newly informed individual having previously been informed, and it never contributes to further transmission, because of the definition of the thinning step. The resulting set of times and positions we denote by , with
[TABLE]
and is as given by (1.3); it satisfies , with strict inclusion for all large enough times.
The process acts as a tractable upper bound for , and it is useful also to have tractable lower bounds. In particular, when calculating the probability that a neighbourhood intersects , where is fixed and is a uniform random point of , the way in which the neighbourhoods of intersect one another enters in a complicated way. However, if happened to consist of a union of non-intersecting neighbourhoods, which were also separated from one another by distance at least , then the probability could be deduced by simply adding the intersection probabilities for the individual neighbourhoods. Then, because the neighbourhoods are balls in a geodesic metric space, the probability of two neighbourhoods and intersecting, if one or both of and are chosen uniformly and independently in , is given by
[TABLE]
where can be estimated in terms of , in view of (1.4). Of course, as grows, intersections occur in , but, at least for a while, their effect may not be too large. So the next step is to construct subsets of with the necessary separation properties, and which are amenable to analysis.
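The intersection formula above can be checked numerically. The sketch below assumes, for illustration only, a flat two-dimensional torus of side `side`; the paper's space and its volume normalization are not recoverable from this text. In such a space two closed balls of radii r1 and r2 meet exactly when their centres are within r1 + r2 of each other. So if one centre is uniform, the probability is the area of a disc of radius r1 + r2 divided by the area of the torus, provided that r1 + r2 ≤ side/2. The helper names `torus_dist`, `intersection_prob_mc` and `intersection_prob_exact` are hypothetical.

```python
import math
import random


def torus_dist(a, b, side):
    """Geodesic (flat) distance between points a and b of the torus [0, side)^d."""
    total = 0.0
    for x, y in zip(a, b):
        delta = abs(x - y) % side
        delta = min(delta, side - delta)  # go the short way round
        total += delta * delta
    return math.sqrt(total)


def intersection_prob_mc(r1, r2, side, n_samples, rng):
    """Monte Carlo estimate of P(B(x, r1) meets B(Y, r2)) with Y uniform on the 2-torus."""
    x = (0.0, 0.0)  # by translation invariance the fixed centre can be anywhere
    hits = 0
    for _ in range(n_samples):
        y = (rng.uniform(0, side), rng.uniform(0, side))
        if torus_dist(x, y, side) <= r1 + r2:  # closed balls meet iff centres are within r1 + r2
            hits += 1
    return hits / n_samples


def intersection_prob_exact(r1, r2, side):
    """Area of a disc of radius r1 + r2 over the torus area; valid while r1 + r2 <= side / 2."""
    return math.pi * (r1 + r2) ** 2 / side ** 2


rng = random.Random(12345)
est = intersection_prob_mc(0.1, 0.15, 1.0, 200_000, rng)
exact = intersection_prob_exact(0.1, 0.15, 1.0)
print(f"Monte Carlo: {est:.4f}  formula: {exact:.4f}")
```

The additivity used in the text relies on exactly this: when the islands are disjoint and well separated, the intersection events for a single uniformly placed ball are disjoint, so their probabilities simply add.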
Fix any , and thin the process to obtain a set valued process as follows. Start with and , defining
[TABLE]
let denote the initial set of indices of censored points of . Then proceed sequentially. Suppose that the quadruples have already been considered. If , set and proceed to the next quadruple; descendants of censored points are also censored. If not, thin much as in the construction of , except that a point is also thinned if it belongs to , where, for and ,
[TABLE]
set
[TABLE]
The extra thinning in (3.7) ensures that the neighbourhoods in are at distance at least from one another. If denotes the set of indices of the points of that enter up to time , then consists of disjoint neighbourhoods , and new points are generated at rate , where the censoring probability is given by
[TABLE]
In our applications, we can find suitably small bounds for , so that the growth of the numbers of neighbourhoods in is still reasonably close to that of the CMJ process . In view of the ‘hard core’ censoring, the points are no longer independent of one another, but their marginal distribution is still uniform on if is chosen at random. Note also that for each and .
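The following is a simplified sketch of the sequential 'hard core' censoring described above. It rests on assumptions of our own. Neighbourhoods grow at unit speed, so the ball of a point born at time τ has radius `horizon - τ` at time `horizon`. A point is censored if its parent is censored, or if its ball would come within distance `gap` of a ball that has already been kept. The paper's actual censoring region, the extra thinning of (3.7), is not reproduced here. The functions `hard_core_censor` and `random_genealogy` are hypothetical helpers.

```python
import math
import random


def torus_dist(a, b, side):
    """Geodesic (flat) distance between points a and b of the torus [0, side)^d."""
    total = 0.0
    for x, y in zip(a, b):
        delta = abs(x - y) % side
        delta = min(delta, side - delta)
        total += delta * delta
    return math.sqrt(total)


def hard_core_censor(points, horizon, gap, side):
    """Sequentially censor points so that surviving balls are at least `gap` apart at `horizon`.

    points: list of (birth_time, position, parent) sorted by birth_time; parent is the index
    of an earlier point, or None for the initial point. Returns the set of censored indices.
    """
    censored, kept = set(), []
    for i, (tau, pos, parent) in enumerate(points):
        if parent is not None and parent in censored:
            censored.add(i)  # descendants of censored points are also censored
            continue
        radius = horizon - tau  # assumed unit-speed growth of the neighbourhood
        clash = any(
            torus_dist(pos, points[j][1], side) < radius + (horizon - points[j][0]) + gap
            for j in kept
        )
        if clash:
            censored.add(i)
        else:
            kept.append(i)
    return censored


def random_genealogy(n, horizon, side, rng):
    """Hypothetical input: uniform birth times and positions, parent chosen among earlier points."""
    times = sorted(rng.uniform(0, horizon) for _ in range(n - 1))
    points = [(0.0, (side / 2, side / 2), None)]
    for k, tau in enumerate(times, start=1):
        pos = (rng.uniform(0, side), rng.uniform(0, side))
        points.append((tau, pos, rng.randrange(k)))
    return points


rng = random.Random(7)
pts = random_genealogy(60, 0.1, 1.0, rng)
cens = hard_core_censor(pts, horizon=0.1, gap=0.02, side=1.0)
print(f"{len(pts) - len(cens)} of {len(pts)} points survive censoring")
```

Processing points in order of birth time is what makes the rule well defined. Each decision depends only on earlier points, which mirrors the sequential construction in the text.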
We shall also use comparisons between the CMJ process and ‘flattened’ versions , and that are of the form discussed in the previous section. We start by noting that, from the inequality (1.4),
[TABLE]
where is as in (2.45), and
[TABLE]
Hence, up to time , the process is stochastically dominated by the flattened process , defined as in the previous section, having intensity per unit volume, and hence growth rate ; similarly, it stochastically dominates the flattened process with and . We also define the flattened process with intensity per unit volume, and with growth rate . The quantities , and , and their standardized versions , and , correspond to these processes. We make the relationships between the processes precise with the following construction.
Lemma 3.1
Let the successive birth times in the branching processes , , and be denoted by , respectively, and let denote the sets of birth times up to time in each of the processes. If, for some , and , then the processes , , and can be defined on the same probability space, in such a way that, for all ,
[TABLE]
Proof: The birth rate of at time is given by
[TABLE]
and of by
[TABLE]
with analogous representations for and . Thus, for any time such that
[TABLE]
we have and . Hence, for as given, we can construct all four processes on the same probability space, for , by realizing on together with an independent sequence of independent random variables uniformly distributed on , and then thinning in the following way. At each successive point , include it as a point of if ; similarly, if , include as a point of , and if , include as a point of . This construction preserves the inclusions (3.12) for all times up to , and, because independently thinned Poisson processes are again Poisson processes, also yields the right distributions for the processes , and .
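The coupling in the proof of Lemma 3.1 is an instance of Poisson thinning with shared uniform marks, as in the Lewis–Shedler construction. The minimal sketch below makes one simplification: the intensities are deterministic functions of time, whereas in the lemma they depend on the past of the branching process. The mechanism is the same. Each point of a dominating Poisson process of rate `rate_max` carries one uniform mark u, and it is kept in process k when u·rate_max ≤ λ_k(t). Because all processes share the same mark, pointwise ordered intensities give nested point sets, as in (3.12). The intensity functions `lower`, `middle` and `upper` are hypothetical.

```python
import random


def poisson_times(rate, horizon, rng):
    """Arrival times of a homogeneous Poisson process of the given rate on [0, horizon]."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)


def coupled_thinning(rate_funcs, rate_max, horizon, rng):
    """Realize inhomogeneous Poisson processes with intensities rate_funcs[k] <= rate_max
    on one probability space: each point t of the dominating process gets one uniform mark u
    and is kept in process k iff u * rate_max <= rate_funcs[k](t)."""
    base = poisson_times(rate_max, horizon, rng)
    marks = [rng.random() for _ in base]
    return [[t for t, u in zip(base, marks) if u * rate_max <= rate(t)] for rate in rate_funcs]


# Hypothetical intensities, ordered pointwise on [0, 5] and all bounded by 5.
def lower(t):
    return 1.0 + 0.5 * t


def middle(t):
    return 1.1 + 0.6 * t


def upper(t):
    return 1.2 + 0.7 * t


rng = random.Random(2024)
low_pts, mid_pts, up_pts = coupled_thinning([lower, middle, upper], 5.0, 5.0, rng)
print(len(low_pts), len(mid_pts), len(up_pts))
```

Each thinned process is again Poisson with the stated intensity. For `lower` on [0, 5], the expected number of points is the integral of 1 + 0.5t, that is 11.25.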
In what follows, we shall use to denote the filtration for the combined construction in Lemma 3.1. We shall henceforth only consider times in , and will take large enough that
[TABLE]
3.2 Relating the proportion informed to the function
The first step in our detailed calculations is to replace with , where , for suitable ; this conditional expectation is easier to handle. We start by bounding the conditional variance , for suitable values of .
The basis for our argument is given by the observations that
[TABLE]
where and are chosen independently and uniformly in , implying that
[TABLE]
On the other hand,
[TABLE]
where denotes the set of all points at time that, if informed, would inform by time . Now, for the gossip process, is independent of , and has the same distribution as . In view of (3.16), we thus have
[TABLE]
where is -measurable and is independent of , and
[TABLE]
with and independent of , but not of each other. Indeed, in view of (3.15), it is the extent of their dependence that measures .
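The pair-of-points device behind (3.15) and (3.16) rests on a general identity. We state it here as our reading of those displays, which were lost in extraction. Let S be a random set, let p be its normalized volume, and let X and Y be independent uniform points, independent of S. Then P[X ∈ S, Y ∈ S | S] = p², so Var p = P[X ∈ S, Y ∈ S] − P[X ∈ S] P[Y ∈ S]. The sketch below checks this by simulation for a hypothetical random set on a discretized circle (a union of random arcs), not for the gossip process itself.

```python
import random


def random_informed_set(n_cells, max_arcs, arc_len, rng):
    """Hypothetical random 'informed set' on a circle of n_cells cells:
    a union of 1..max_arcs arcs of arc_len cells at uniform positions."""
    informed = set()
    for _ in range(rng.randint(1, max_arcs)):
        start = rng.randrange(n_cells)
        informed.update((start + k) % n_cells for k in range(arc_len))
    return informed


def variance_two_ways(n_sets, n_cells, max_arcs, arc_len, rng):
    """Estimate Var(p), with p the proportion informed, directly and via two independent points."""
    props, both_informed = [], 0
    for _ in range(n_sets):
        s = random_informed_set(n_cells, max_arcs, arc_len, rng)
        props.append(len(s) / n_cells)
        x, y = rng.randrange(n_cells), rng.randrange(n_cells)  # independent of each other and of s
        both_informed += (x in s) and (y in s)
    mean_p = sum(props) / n_sets  # estimates P(X informed) = E p
    direct = sum(p * p for p in props) / n_sets - mean_p ** 2
    via_pairs = both_informed / n_sets - mean_p ** 2  # P(X, Y both informed) - P(X informed)^2
    return direct, via_pairs


rng = random.Random(31)
direct, via_pairs = variance_two_ways(50_000, 100, 5, 15, rng)
print(f"direct: {direct:.4f}  via pairs: {via_pairs:.4f}")
```

The second estimate never uses p² directly. It only asks whether two independently chosen points are both informed, which is why bounding the dependence between the two events controls the variance.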
Writing , our argument now involves bounding the differences
[TABLE]
between the probabilities (3.17) and (3.18) and the smaller ones obtained by replacing and by their related (independent) branching and growth processes and .
These, as observed in the joint construction at the beginning of the section, give rise to stochastically larger sets than and . If both of the differences (3.19) and (3.20) are smaller than some , then the independence of and immediately implies that . Using this strategy, we prove the following lemma.
Lemma 3.2
Under the above assumptions, there is a constant such that
[TABLE]
Proof: To control the differences (3.19) and (3.20), we begin by running a process , defined following (3.2), until time , and thin to obtain . As in (3.3), let , and set and . We then thin further to construct the process , by the method used to construct in (3.8).
We now consider the difference
[TABLE]
which is an upper bound for the real quantity (3.19) of interest to us. The quantity is no larger than the conditional expectation given of the number of intersections between censored islands of and the islands of . If an island born in at is censored, the expected number of censored islands that result at is at most , by (2.2) and because is stochastically dominated by . These islands each have radius at most . Hence, given , the expected number of intersections resulting from a censored island born at is at most
[TABLE]
in view of (3.6), (1.4) and (3.13); and are as in (3.3). Similarly, using (3.9), the conditional probability of an island born in at being censored for , given the history up to , is bounded above by
[TABLE]
Hence, again using as an upper bound for the number of uncensored islands, and noting that the birth intensity in at time is at most
[TABLE]
we have
[TABLE]
Now, by (2.2), (2.4) and Cauchy–Schwarz, and because is stochastically dominated by ,
[TABLE]
Using this in (3.2), and noting that and that , gives the following bound for (3.19):
[TABLE]
We now need to bound (3.20). This can be done by introducing a process , constructed in the same way as , but starting from two initial points and using a CMJ process , which is the same as using two independent CMJ processes and , by the branching property. Now , and the conditional expectation given of the number of intersections between censored islands of and the islands of satisfies
[TABLE]
by an argument exactly as before, but for a larger constant than appearing in (3.23). Since is a bound for the difference in (3.20), we have enough to prove the lemma.
Remark. With and , where , and since , from (2.4), it follows that is typically of order $O\bigl(\Lambda^{2\alpha_2-\alpha_1-2}(\log\Lambda)^{d}\bigr)$.
Our main interest is in approximating the distribution of when
[TABLE]
for fixed. This is because the times asymptotically represent the period in which increases from [math] to . Taking and in the remark, it follows that is typically of order for . Now pick and , with . Then
[TABLE]
in which the latter term, again by the remark, is typically of order if . Supposing that is actually of magnitude , this indicates that the conditional distribution of given is essentially that of the conditional distribution of given . So the next step is to examine in detail, for , and to express it in more amenable form.
The next lemma once again uses the backward branching process from a randomly chosen point . We define , where contains the information about when the islands of were formed, up to time , but not where they are centred. We then write for the number of islands of that intersect .
Lemma 3.3
With the definitions above, there is a constant such that
[TABLE]
where .
Proof: We start by using (3.14), (3.16) and (3.23) to show that, for ,
[TABLE]
We now use Poisson approximation to approximate the probability , using the conditional independence between the locations of the islands of , given , as the basis of the approximation.
We first observe that the conditional probability that an island of with radius intersects , given , is at most
[TABLE]
in view of (3.6), by (1.4), (3.11) and (3.13), and because . This, using to denote the number of islands of that intersect , implies that
[TABLE]
by Barbour, Holst & Janson (1992, (1.23)), where . Hence, from (3.28),
[TABLE]
and combining this with (3.26) gives the lemma.
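The Poisson approximation step in the proof of Lemma 3.3 can be illustrated in the simplest setting of independent indicators. Let W be a sum of independent Bernoulli(p_i) variables and λ = Σ p_i. The sketch below uses the Barbour–Hall bound d_TV(L(W), Po(λ)) ≤ λ⁻¹(1 − e^{−λ}) Σ p_i². We do not claim this is the exact form of (1.23) in Barbour, Holst & Janson (1992). Note also that in the lemma the indicators are only conditionally independent, given the island formation times. Since total variation bounds the difference for every event, it also controls |P[W = 0] − e^{−λ}|, a probability of no intersections of the kind approximated there. The probabilities used below are hypothetical.

```python
import math


def bernoulli_sum_pmf(probs):
    """Exact distribution of W = sum of independent Bernoulli(p_i), by dynamic programming."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)
            new[k + 1] += mass * p
        pmf = new
    return pmf


def poisson_pmf(lam, k):
    """P(N = k) for N ~ Poisson(lam), computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def tv_to_poisson(probs):
    """Total variation distance between L(W) and Po(lambda), lambda = sum(probs)."""
    lam = sum(probs)
    pmf = bernoulli_sum_pmf(probs)
    po = [poisson_pmf(lam, k) for k in range(len(pmf))]
    tail = max(0.0, 1.0 - sum(po))  # Poisson mass beyond the support of W also counts
    return 0.5 * (sum(abs(a - b) for a, b in zip(pmf, po)) + tail), lam


def barbour_hall_bound(probs):
    """Stein-Chen bound (1 - exp(-lambda)) / lambda * sum p_i^2."""
    lam = sum(probs)
    return (1 - math.exp(-lam)) / lam * sum(p * p for p in probs)


probs = [0.02 * (1 + i % 5) for i in range(40)]
dtv, lam = tv_to_poisson(probs)
bound = barbour_hall_bound(probs)
p0 = bernoulli_sum_pmf(probs)[0]
print(f"lambda={lam:.2f}  d_TV={dtv:.4f}  bound={bound:.4f}  |P(W=0)-exp(-lambda)|={abs(p0 - math.exp(-lam)):.4f}")
```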
We now define
[TABLE]
as an approximation to . The following lemma bounds the accuracy of the approximation for .
Lemma 3.4
For any , there is an event with such that, for ,
[TABLE]
where and depend only on .
Proof: We begin by introducing the censored version of the process . We denote the indices of islands in by , and write . It then follows that
[TABLE]
with the lower bound using the separation between the islands of . Now, from (3.10), (3.11) and (3.30),
[TABLE]
and
[TABLE]
Hence
[TABLE]
and
[TABLE]
This implies that
[TABLE]
where . Thus we need to bound the conditional expectation given of the right hand side of (3.33).
Define by
[TABLE]
Since is independent of in (3.33), it follows that we can easily take the expectation, given , of its second term. For , and using (2.2) and (2.4), this gives
[TABLE]
where we have twice used for , as follows from (3.13). For the first term in (3.33), from (3.29), we have
[TABLE]
Defining
[TABLE]
it thus follows from the independence of and that, for ,
[TABLE]
using (3.13) to bound .
To complete the proof of the lemma, we need to show that
[TABLE]
For , we bound and $\mathbb{E}\bigl\{\sum_{j\in\overline{J}_s\setminus J^{s,s}_s} r_{js}^{d}\bigr\}$, and then use Markov’s inequality. We begin by bounding the conditional probability , given the past up to time , that an island of , born to an uncensored parent at , is censored in . Using (3.9), it is no greater than
[TABLE]
If it is censored, bounding by the branching process and using (2.2) and (2.4), the expected number of its offspring by time , all of which are also censored, is at most , and the expected volume censored at most . Hence
[TABLE]
again by (2.2) and (2.4), and from Cauchy–Schwarz. Then, by a similar argument,
[TABLE]
Combining (3.37) and (3.38) and using Markov’s inequality, , for a constant depending only on .
For , we again bound by the branching process and use (2.2) and (2.4), giving
[TABLE]
Hence, from Markov’s inequality, , for a constant depending only on , and the lemma is proved by taking .
We now replace by an expression involving the function defined in (1.12), and using the quantity defined by
[TABLE]
where the inequality follows from Lemma 3.1, so that, from (3.13), .
Lemma 3.5
Take , and let be defined as in (3.29), as in (3.40) and as for Lemma 1.12. Then, for any and , there is an event and constants and , depending only on , such that
[TABLE]
and that
[TABLE]
uniformly in , where .
Proof: We first observe, from (3.29) and (1.6) that
[TABLE]
with as before. Now realize , and together as in Lemma 3.1, so that
[TABLE]
Then, for such , it follows from (2.47), then using Lemma 4.2, (2.40) and (2.41), that, on an event such that , we have
[TABLE]
and
[TABLE]
for all choices of , where . Define
[TABLE]
Then, on and for , we have
[TABLE]
for all , from (3.42), (3.43), (3.40) and (3.45), where
[TABLE]
Arguing analogously, we also deduce that
[TABLE]
Now . Then, since
[TABLE]
and using (3.13), we have
[TABLE]
in , and hence, by Markov’s inequality,
[TABLE]
Thus the event
[TABLE]
is such that
[TABLE]
for a suitable constant .
Now, taking , where
[TABLE]
(3.41) implies that
[TABLE]
Hence also
[TABLE]
Now, because can also be bounded between copies and of and , using Lemma 3.1, we have the inequality
[TABLE]
Hence, since the -processes can be chosen to be independent of , it follows that
[TABLE]
for any . Thus, from (3.51), it follows that
[TABLE]
The next step is to examine the difference
[TABLE]
where . To start with, from (3.52) and Lemma 2.1,
[TABLE]
Hence, for any non-negative and -measurable random variable , we have
[TABLE]
where
[TABLE]
and is as above, with the final equalities a consequence of (2.14). Since , we conclude from Lemma 4.2 and (3.13) that
[TABLE]
as long as . Taking and , and using (3.55), (3.57) and (3.13), this gives
[TABLE]
From (3.40), we have . Thus, defining
[TABLE]
it follows that , and, combining (3.54) and (3.58), that
[TABLE]
uniformly in . But now, from Lemma 4.2, on the event ,
[TABLE]
and by (1.12), (2.14) and (2.18). This establishes the lemma, with , in view of (3.49) and (3.59).
3.3 Replacing by
Our aim is to approximate the conditional distribution of , given , for suitably chosen . After Lemma 3.5, the problem has largely been reduced to considering the conditional distribution of . However, in order to use the results of Section 2, it is advantageous to replace by a function of a flattened branching process; is constructed from the birth times of the original branching process . Accordingly, we define
[TABLE]
for , , corresponding to the (flattened) branching process of Lemma 3.1, taken to have initial condition , . Note that . The error involved in replacing by is bounded in the following lemma.
Lemma 3.6
For , we have
[TABLE]
Proof: We once more use Lemma 3.1 to justify that both and belong to the interval
[TABLE]
where the processes and both have the same initial condition as . Now
[TABLE]
and
[TABLE]
hence
[TABLE]
by (3.13). This, together with (1.12) and Lemma 4.2, implies that
[TABLE]
as required.
We now combine the results of Lemmas 3.2–3.6 to give the following result, relating the distribution of to that of .
Corollary 3.7
Take and for , and fix . Then there is an event , and constants , and , such that
[TABLE]
and such that , where
[TABLE]
Proof: We take the results of Lemmas 3.2–3.6 in turn. Using Lemma 3.1, we have
[TABLE]
Define the event , whose probability is at most , by Markov’s inequality. Then, from Lemma 3.2 and (3.63), it follows that
[TABLE]
implying that, on , we have
[TABLE]
Next, from Lemma 3.3 and (3.63) and on the event , we have
[TABLE]
Turning to Lemma 3.4, we find that
[TABLE]
Then, from Lemma 3.5, we have
[TABLE]
Finally, from Lemma 3.6, on the event , we have
[TABLE]
Combining (3.64) to (3.68), we deduce that, on the event , and uniformly in ,
[TABLE]
where .
For the exceptional set, from Lemmas 3.5 and 3.4, we have
[TABLE]
On the other hand, for any set with , and for any -field , we have
[TABLE]
by the total probability formula, implying that with probability at least . Hence there is an event , whose complement has probability at most
[TABLE]
on which . Now define and . Then, for any bounded Lipschitz function , we conclude from (3.69) and (3.70) that, for and on the event
[TABLE]
we have
[TABLE]
This proves the corollary.
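The total probability step in the proof above is a conditional form of Markov's inequality. We write it in generic notation of our own: A is an event, 𝒢 a σ-field, and η, θ are positive constants. The paper's specific thresholds are not recoverable from this text.

```latex
\mathbb{P}[A^c] \le \eta
\;\Longrightarrow\;
\mathbb{E}\bigl\{\mathbb{P}[A^c \mid \mathcal{G}]\bigr\} = \mathbb{P}[A^c] \le \eta
\;\Longrightarrow\;
\mathbb{P}\bigl[\mathbb{P}[A^c \mid \mathcal{G}] > \theta\bigr] \le \eta/\theta .
```

Taking θ = √η, for instance, shows that P[A^c | 𝒢] ≤ √η outside an event of probability at most √η.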
3.4 The main theorem
We now use Corollary 3.7 to compare the conditional distributions, given , of the normalized random variables and , where
[TABLE]
for a careful choice of , with the centring constant chosen because . These are the correct standardizations to achieve a non-trivial limit. Thus we wish to compare with , for Lipschitz functions that have and . This corresponds to taking and in Corollary 3.7, because of the pre-factors in the definitions of and . Thus, although is already small for large , if , we also need to show that, for , it is possible to choose , and so as to make small with . Recalling the definition (3.11) of , the expression for in Corollary 3.7 shows that this is the case, for chosen small enough, if
[TABLE]
So, for
[TABLE]
choose so that and then so that ; then, if we choose
[TABLE]
it follows that there are constants and such that
[TABLE]
for all , except on an event of probability at most . Particular choices are to take
[TABLE]
in which case we can take any , and express the error in (3.73) as , except on an event of probability at most , albeit with different constants and .
Corollary 3.7 and (3.73) compare the distribution of with that of the quantity , for any . The path of is approximated, to first order, by a time shift of the deterministic path , and the shift is the same throughout the path, being determined by the value of the single -measurable random variable . In the remaining argument, we exploit this to show that, to a good approximation, the path after time is that of the approximation , together with a perturbation that can be expressed in the form , where is an -measurable function depending on the value of , and is the standard normal distribution.
To do so, in view of (3.73), we now need a central limit theorem for as defined in (3.71). Writing
[TABLE]
where the final equality holds for all , the next lemma shows that is close in distribution to .
Lemma 3.8
Let be defined as in (3.71), and let and ; suppose that is as for (3.72) and , where is as in (3.74). Then there is a constant such that, for all , and on the event ,
[TABLE]
uniformly in .
Proof: From (1.12), we have , so that, by Taylor’s expansion, for any , we can write
[TABLE]
from (2.17). Thus, in making a linear approximation to
[TABLE]
the remainder term can be bounded by . Now, because , we have
[TABLE]
where the inequality follows using (2.17). Hence, for any , and using (3.75), we have
[TABLE]
Thus, taking in (3.76), and on , it follows that
[TABLE]
and the lemma follows because , from (3.72).
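The linearization in the proof of Lemma 3.8 uses Taylor's theorem with a Lagrange remainder: if |f''| ≤ K, then |f(m + h) − f(m) − f'(m)h| ≤ Kh²/2. The sketch below checks this numerically. It uses the logistic curve as a hypothetical stand-in for the function defined in (1.12); for the logistic curve, sup|f''| = √3/18 ≈ 0.0962, rounded up to 0.1 here. With h = σz, the linear term is of order σ while the remainder is of order σ², which is why the remainder becomes negligible after dividing by σ.

```python
import math


def logistic(x):
    """Hypothetical stand-in for the smooth function of (1.12)."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_prime(x):
    y = logistic(x)
    return y * (1.0 - y)


# sup |f''| for the logistic curve is sqrt(3)/18 (about 0.0962); 0.1 is a safe rounded-up constant.
SECOND_DERIV_BOUND = 0.1


def linearization_error(m, h):
    """Remainder after replacing f(m + h) by its first-order Taylor expansion at m."""
    return logistic(m + h) - logistic(m) - logistic_prime(m) * h


def remainder_bound(h):
    """Lagrange form of Taylor's theorem: |remainder| <= sup|f''| * h**2 / 2."""
    return 0.5 * SECOND_DERIV_BOUND * h * h


m = 1.0
for sigma in (0.1, 0.01, 0.001):
    worst = max(abs(linearization_error(m, sigma * z)) for z in (-3.0, -1.0, 1.0, 3.0))
    print(f"sigma={sigma}: worst remainder / sigma = {worst / sigma:.2e}")
```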
We are now in a position to prove a central limit theorem, with an error bound expressed in terms of the bounded Wasserstein distance.
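For reference, the bounded Wasserstein distance between the laws of two real random variables X and Y is commonly defined as below. Conventions for normalizing the class of test functions vary between authors; we assume this one here rather than quote the paper's.

```latex
d_{BW}\bigl(\mathcal{L}(X), \mathcal{L}(Y)\bigr)
  = \sup_{f \in \mathcal{F}} \bigl| \mathbb{E} f(X) - \mathbb{E} f(Y) \bigr|,
\qquad
\mathcal{F} = \bigl\{ f\colon \|f\|_\infty \le 1,\ \|f\|_{\mathrm{Lip}} \le 1 \bigr\}.
```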
Theorem 3.9
Suppose that for , where is as in (1.4) and as in (1.11) (so that for ). Suppose that is as for (3.72) and , where and are as in (3.74), with . Suppose that is large enough that (3.13) is satisfied, and that and , where is as in Theorem 2.8. Then, for any , there exist constants and and an event with such that
[TABLE]
uniformly in , where is defined in (3.75), in Lemma 3.5 and in (3.25).
Proof: In view of (3.73) and Lemma 3.8, it suffices to show that
[TABLE]
with and as in (3.74). Corollary 2.10, with , shows that there is an event with such that, on ,
[TABLE]
provided that . Then, from (2.25) and (2.36),
[TABLE]
and the theorem follows because and
[TABLE]
on , and , from (3.72).
This theorem is not quite the same as Theorem 1.1, because both mean and variance are expressed in terms of , which, as can be seen from its definition in (3.40), is not necessarily determined by knowledge of alone, since all the birth times of enter its definition. Instead, one can observe as in (1.10). We now show that this is enough.
We construct a lower bound for by summing over the subset of the birth times in (1.10) that belong to , where is defined in (3.5), and
[TABLE]
with the birth times of before , defined in (3.3). These give rise to non-intersecting neighbourhoods at time , though not necessarily to all such, and they form a subset more amenable to calculation. Then it is immediate from (1.4) that, for all sufficiently large,
[TABLE]
the final inequality following from (2.2). Then, using arguments analogous to those in Lemma 3.2, we have
[TABLE]
Hence, for ,
[TABLE]
and, for , this is of order . This error enters most sensitively into , where the difference has to be small relative to , because of the factor ; but this holds if , as in the statement of the theorem, by Lemma 4.2. The conversion of into an event that can be determined from can be accomplished in similar fashion, by modifying the definitions of its constituent events in terms of , .
Appendix
We note here two technical lemmas that are used in the previous arguments. The first establishes a bound on the extreme fluctuations of an integral with respect to a compensated Poisson process.
Lemma 4.1
Let , where is a Poisson process and the process is predictable and a.s. bounded in modulus by the deterministic function . Define and . Then
[TABLE]
for all . If is decreasing, we have
[TABLE]
for all .
Proof: For any , the process
[TABLE]
is a supermartingale (van de Geer (1995, p. 1795)), and stopping at easily yields
[TABLE]
if . The corresponding bound for is proved in analogous fashion. Now, if , choose , giving the first conclusion of the lemma. The second follows by choosing .
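The supermartingale in the proof of Lemma 4.1 is of exponential type. Consider the simplest case: a constant integrand h and a Poisson process of constant rate λ, with M_t = h(N_t − λt). Then exp{θM_t − λt(e^{θh} − 1 − θh)} is an exact martingale, and Markov's inequality gives P[M_t ≥ x] ≤ exp{−θx + λt(e^{θh} − 1 − θh)} for every θ > 0. The sketch below verifies both facts by summing over the Poisson distribution. This special case is chosen for illustration. We do not claim it reproduces the exact form in van de Geer (1995), where predictable bounded integrands lead to a supermartingale.

```python
import math


def poisson_pmf(mu, n):
    """P(N = n) for N ~ Poisson(mu), computed on the log scale for stability."""
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))


def compensator_term(theta, h, lam, t):
    """lam * t * (exp(theta*h) - 1 - theta*h), the correction in the exponential martingale."""
    return lam * t * (math.exp(theta * h) - 1.0 - theta * h)


def expected_exponential_martingale(theta, h, lam, t, n_max=400):
    """E exp(theta*M_t - compensator) for M_t = h*(N_t - lam*t); equals 1 exactly."""
    mu = lam * t
    corr = compensator_term(theta, h, lam, t)
    return sum(poisson_pmf(mu, n) * math.exp(theta * h * (n - mu) - corr) for n in range(n_max))


def exact_tail(x, h, lam, t, n_max=400):
    """P(M_t >= x), computed directly from the Poisson distribution of N_t."""
    mu = lam * t
    return sum(poisson_pmf(mu, n) for n in range(n_max) if h * (n - mu) >= x)


def exponential_tail_bound(x, theta, h, lam, t):
    """Markov's inequality applied to exp(theta*M_t): P(M_t >= x) <= exp(-theta*x + compensator)."""
    return math.exp(-theta * x + compensator_term(theta, h, lam, t))


print(expected_exponential_martingale(0.5, 1.0, 2.0, 5.0))
for theta in (0.1, 0.3, 0.5, 1.0):
    print(theta, exact_tail(5.0, 1.0, 2.0, 5.0), exponential_tail_bound(5.0, theta, 1.0, 2.0, 5.0))
```

Optimizing the bound over θ, as done at the end of the proof above, trades the linear gain −θx against the compensator term, which grows faster than linearly in θ.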
The second lemma establishes some smoothness of the function .
Lemma 4.2
With defined as above, and for any , we have
[TABLE]
Proof: We note that and that for all . Then, writing and using (2.17), we have
[TABLE]
for any . Hence, using (4.1), and taking expectations first conditional on , we have
[TABLE]
This implies that
[TABLE]
since , proving the first inequality.
For the second, since in and ,
[TABLE]
Acknowledgement
ADB thanks the Department of Statistics and Applied Probability at the National University of Singapore, and the mathematics departments of the University of Melbourne and Monash University, for their kind hospitality while much of the work was undertaken. AR thanks the School of Mathematics and Statistics at the University of Melbourne for their kind hospitality.
References
[1] D. J. Aldous (2012) When knowing early matters: gossip, percolation and Nash equilibria. In: Prokhorov and Contemporary Probability Theory, Eds. A. N. Shiryaev, S. R. S. Varadhan and E. L. Presman, pp. 3–28. Springer Proceedings in Mathematics & Statistics 33, Springer Verlag, Heidelberg.
[2] S. Asmussen & H. Hering (1983) Branching Processes. Progress in Probability and Statistics 3, Birkhäuser, Boston.
[3] F. G. Ball (1983) The threshold behaviour of epidemic models. J. Appl. Probab. 20, 227–241.
[4] A. D. Barbour, L. Holst & S. Janson (1992) Poisson Approximation. Oxford University Press.
[5] A. D. Barbour & G. Reinert (2013) Asymptotic behaviour of gossip processes and small world networks. Adv. Appl. Probab. 45, 981–1010.
[6] S. Chatterjee & R. Durrett (2011) Asymptotic behavior of Aldous’ gossip process. Ann. Appl. Probab. 21, 2447–2482.
[7] E. Z. Ganuza & S. D. Durham (1974) Mean-square and almost-sure convergence of supercritical age-dependent branching processes. J. Appl. Probab. 11, 678–686.
[8] S. van de Geer (1995) Exponential inequalities for martingales, with application to maximum likelihood estimation for counting processes. Ann. Statist. 23, 1779–1801.
