Large deviations for i.i.d. replications of the total progeny of a Galton--Watson process
Claudio Macci, Barbara Pacchiarotti

TL;DR
This paper explores large deviation principles for the total progeny in Galton--Watson processes, including cases with random initial populations and estimators of offspring mean, linking branching process theory with large deviation techniques.
Contribution
It introduces large deviation results for total progeny distributions in Galton--Watson processes, including new insights for random initial populations and estimator sequences.
Findings
Large deviation rate functions for total progeny are characterized.
Results extend to processes with random initial populations.
Estimates of offspring mean exhibit specific large deviation behaviors.
Abstract
The Galton--Watson process is the simplest example of a branching process. The relationship between the offspring distribution, and, when the extinction occurs almost surely, the distribution of the total progeny is well known. In this paper, we illustrate the relationship between these two distributions when we consider the large deviation rate function (provided by Cramér's theorem) for empirical means of i.i.d. random variables. We also consider the case with a random initial population. In the final part, we present large deviation results for sequences of estimators of the offspring mean based on i.i.d. replications of total progeny.
Large deviations for i.i.d. replications of the total progeny
of a Galton–Watson process
Claudio Macci
Dipartimento di Matematica, Università di Roma Tor Vergata,
Via della Ricerca Scientifica, I-00133 Rome, Italy
Barbara Pacchiarotti
(2017; 27 September 2016; 15 December 2016; 17 December 2016)
Keywords: Cramér’s theorem, initial random population, estimators of offspring mean
MSC 2010: 60F10, 60J80, 62F10, 62F12
DOI: 10.15559/16-VMSTA72
Volume 4, issue 1
Published online: 11 January 2017
1 Introduction
There is a vast literature on branching processes. Here we cite the monographs [1, 3, 12]; moreover, we also cite the monograph [18] for the multitype case, [10], which focuses on statistical inference, and [13] and [15] for applications in biology.
The simplest example of a branching process is the Galton–Watson process. We consider the case of a population that has a unique individual at the beginning and in which all the individuals (of all generations) live for a unitary time; moreover, at the end of its lifetime, every individual of the population (of every generation) produces a random number of new individuals, independently of all the rest, according to a specific fixed distribution. So, if we consider a sequence of random variables $\{V_n:n\geq 0\}$ such that $V_n$ is the population size at time $n$ (for all $n\geq 0$), we have $V_0=1$ and
\[
V_n=\sum_{i=1}^{V_{n-1}}X_{n,i}\quad(\mbox{for all}\ n\geq 1),
\]
where $\{X_{n,i}:n,i\geq 1\}$ is a family of nonnegative integer-valued i.i.d. random variables. In other words, $X_{n,1},\ldots,X_{n,V_{n-1}}$ represent the offspring generated at time $n$ by each of the $V_{n-1}$ individuals that live at time $n-1$. We recall some other preliminaries on the Galton–Watson process in Section 2, where, in particular, we consider a slightly different notation to allow the case with a random initial population (instead of the case with a unitary initial population cited before).
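The recursion described above can be illustrated with a minimal simulation sketch. The function name `simulate_total_progeny`, the list `offspring_probs` (entry `k` is the assumed probability of having `k` children) and the safety cap `max_generations` are hypothetical choices made for this illustration; they do not come from the paper.

```python
import random

def simulate_total_progeny(offspring_probs, initial_population=1, rng=None,
                           max_generations=10_000):
    """Simulate one Galton-Watson path and return its total progeny.

    offspring_probs[k] is the (assumed) probability that an individual has
    k children. The total progeny counts every individual that ever lived,
    the initial ones included.
    """
    rng = rng or random.Random()
    child_counts = list(range(len(offspring_probs)))
    current = initial_population      # population size at time 0
    total = current
    for _ in range(max_generations):
        if current == 0:              # extinction: the total is now final
            return total
        # every living individual reproduces independently, same law
        current = sum(rng.choices(child_counts, weights=offspring_probs,
                                  k=current))
        total += current
    raise RuntimeError("no extinction within max_generations")
```

With a subcritical offspring law such as `[0.6, 0.3, 0.1]` (mean 0.5) and a single ancestor, the average of many simulated values should be close to $1/(1-0.5)=2$, the well-known mean of the total progeny in that case.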
In this paper, we present large deviation results. The theory of large deviations is a collection of techniques that gives an asymptotic estimate of small probabilities in an exponential scale (see, e.g., [6] as a reference). We recall some preliminaries in Section 2. The literature on large deviations for branching processes is large. Here we essentially recall some references with results concerning the Galton–Watson process.
In several references, the large-time behavior for the supercritical case is studied, namely the case where the offspring mean is strictly larger than one (in such a case, the extinction probability is strictly less than one). Here we recall [2] (see also [4] for the multitype case), [5], where the main object is the study of the tails of , [19] with a careful analysis based on harmonic moments of , [20] (and [21]) with some conditional large deviation results based on some local limit theorems, [8] where the central role of some “lower deviation probabilities” is highlighted for the study of the asymptotic behavior of the Lotka–Nagaev estimator of .
Other references study the most likely paths to extinction at some time when the initial population is large. The idea is to consider the representation of a branching process with initial population equal to as a sum of i.i.d. replications of the process with a unitary initial population; in this case, Cramér’s theorem for empirical means of i.i.d. random variables (on ) plays a crucial role. A most likely path to extinction in [16] (see also [17]) is a trajectory that minimizes the rate function among the paths that reach the level 0 at time . A generalization of this concept for the most likely paths to reach a level can be found in [11].
In this paper, we are interested in a different direction. Namely, we are interested in the empirical means of i.i.d. replications of the total progeny of a Galton–Watson process. The total progenies of branching processes are studied in several references: here we cite the old references [7, 14, 22] for a Galton–Watson process, and [9] (see Section 2.2) among the references concerning different branching processes. The total progeny of a Galton–Watson process is an almost surely finite random variable when the extinction occurs almost surely, and therefore the supercritical case will not be considered. Some relationships between the offspring distribution and the total progeny distribution of a Galton–Watson process are well known (see \eqref{eq:link-pmf} for the probability mass functions and \eqref{eq:link-pgf} for the probability generating functions).
A new relationship is provided by Proposition 1, where we illustrate how the rate function for the empirical means of total progenies can be expressed in terms of the analogous rate function for the empirical means of a single progeny. This is a quite natural problem in the theory of large deviations, and, as one can expect, \eqref{eq:link-pgf} plays an important role in the proof; in fact, the large deviation rate function for empirical means of i.i.d. random variables (provided by Cramér’s theorem recalled below; see Theorem 1) is given by the Legendre transform of the logarithm of the (common) moment generating function of the random variables. Moreover, the relationship provided by Proposition 1 may be of interest in information theory because the rate functions involved can be expressed in terms of suitable relative entropies (or Kullback–Leibler divergences); see, for example, [23] for a discussion on the rate function expressions in terms of the relative entropy.
Another result presented in this paper is Proposition 2, which is a version of Proposition 1 where the initial population is a random variable with a suitable distribution. Finally, in Propositions 3 and 4, we prove large deviation results for some estimators of the offspring mean in terms of i.i.d. replications of the total progeny and of the initial population (we are considering the case where the initial population is a random variable as in Proposition 2).
We conclude with the outline of the paper. We start with some preliminaries in Section 2. In Section 3, we prove the results concerning the large deviation rate functions related to Cramér’s theorem. Finally, in Section 4, we prove the large deviation results for the estimators of the offspring mean.
2 Preliminaries
We start with some preliminaries on the Galton–Watson process. In the second part, we recall some preliminaries on large deviations.
2.1 Preliminaries on Galton–Watson process
Here we introduce a slightly different notation, and, moreover, we recall some preliminaries in order to define the total progeny of a Galton–Watson process.
We start with some notation concerning the offspring distribution (note that defined further coincides with in the Introduction):
- the probability mass function (for all integer );
- the probability generating function $f$;
- the mean value $\mu_f$ (and we have $\mu_f=f'(1)$).
Moreover, we introduce the analogous items for the initial population:
- the probability mass function (see \eqref{eq:pmf-initial-population});
- the probability generating function $g$;
- the mean value $\mu_g$ (and we have $\mu_g=g'(1)$).
So, from now on, we consider the following slightly different notation:
[TABLE]
(in place of presented before). More precisely:
- •
the probability generating function of is (so does not depend on ), and therefore
[TABLE]
- •
for a family of i.i.d. random variables with probability generating function , we have
[TABLE]
Remark 1**.**
Note that here corresponds to presented in the Introduction if or, equivalently, if (i.e. for all ).
If we consider the extinction probability
[TABLE]
then it is known that we have
[TABLE]
moreover, if , then we have if and if . More generally, we have
[TABLE]
and, if (we obviously have if ), then we have the following cases:
[TABLE]
Then, if and , then the random variable defined by
[TABLE]
is almost surely finite and provides the total progeny of . In view of what follows, we consider the probability generating function
[TABLE]
where is the probability mass function of the random variable . Moreover, we have the mean value
[TABLE]
in particular, even if , namely
[TABLE]
Finally, we recall some well-known connections between total progeny and offspring distributions (see e.g. [7]): for the probability mass functions, we have
[TABLE]
where is the th power of convolution of ; for the probability generating functions, we have
[TABLE]
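As a hedged illustration of the link between the two probability mass functions, the sketch below uses the classical formula from [7] for a single ancestor, $P(Y=k)=\frac{1}{k}\,p^{*k}(k-1)$, where $p^{*k}$ is the $k$-fold convolution of the offspring probability mass function $p$. The symbols $Y$ and $p$, the function names and the truncation level `kmax` are assumptions made for this example; the displayed equation in the text may be stated in a more general form (random initial population).

```python
def convolve(p, q, size):
    """Convolution of two pmfs on {0, 1, ...}, truncated to `size` entries."""
    out = [0.0] * size
    for i, pi in enumerate(p):
        if i >= size:
            break
        for j, qj in enumerate(q):
            if i + j >= size:
                break
            out[i + j] += pi * qj
    return out

def total_progeny_pmf(offspring_pmf, kmax):
    """Return {k: P(Y = k)} for k = 1..kmax, single ancestor.

    Uses P(Y = k) = (1/k) * p^{*k}(k - 1); truncating each convolution
    to kmax entries does not change the entries that are read.
    """
    power = [1.0] + [0.0] * (kmax - 1)   # p^{*0} is the point mass at 0
    pmf = {}
    for k in range(1, kmax + 1):
        power = convolve(power, offspring_pmf, kmax)   # now p^{*k}
        pmf[k] = power[k - 1] / k
    return pmf
```

For the offspring law `[0.6, 0.3, 0.1]`, the first value is $P(Y=1)=0.6$ (the ancestor has no children) and the second is $P(Y=2)=0.3\cdot 0.6=0.18$ (one child, who has none), which matches a direct enumeration.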
2.2 Preliminaries on large deviations
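We start with the concept of large deviation principle (LDP). A sequence of random variables $\{Z_n:n\geq 1\}$ taking values in a topological space $\mathcal{Z}$ satisfies the LDP with rate function $I$ if $I:\mathcal{Z}\to[0,\infty]$ is a lower semicontinuous function,
\[
\liminf_{n\to\infty}\frac{1}{n}\log P(Z_n\in O)\geq-\inf_{z\in O}I(z)\quad\mbox{for all open sets}\ O,
\]
and
\[
\limsup_{n\to\infty}\frac{1}{n}\log P(Z_n\in C)\leq-\inf_{z\in C}I(z)\quad\mbox{for all closed sets}\ C.
\]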
We also recall that a rate function is said to be good if all its level sets are compact.
Remark 2**.**
If $P(Z_n\in C)=1$ for some closed set $C$ (at least eventually with respect to $n$), then $I(z)=\infty$ for $z\notin C$; this can be checked by taking the lower bound for the open set $O=C^c$.
In particular, we refer to Cramér’s theorem on $\mathbb{R}^d$ (see e.g. Theorems 2.2.3 and 2.2.30 in [6] for the cases $d=1$ and $d\geq 2$), and we recall its statement. We remark that, in this paper, we consider the cases $d=1$ (in such a case, the rate function need not be a good rate function) and $d=2$. Moreover, we use the symbol $\langle\cdot,\cdot\rangle$ for the inner product in $\mathbb{R}^d$.
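Theorem 1 (Cramér’s theorem)
Let $\{W_n:n\geq 1\}$ be a sequence of i.i.d. $\mathbb{R}^d$-valued random variables, and let $\{\bar{W}_n:n\geq 1\}$ be the sequence of empirical means defined by $\bar{W}_n:=\frac{1}{n}\sum_{k=1}^nW_k$ (for all $n\geq 1$).
(i) If $d=1$, then $\{\bar{W}_n:n\geq 1\}$ satisfies the LDP with rate function $I$ defined by
\[
I(x):=\sup_{\theta\in\mathbb{R}}\left\{\theta x-\log E\left[e^{\theta W_1}\right]\right\}.
\]
(ii) If $d\geq 2$ and the origin of $\mathbb{R}^d$ belongs to the interior of the set $\{\theta\in\mathbb{R}^d:\log E[e^{\langle\theta,W_1\rangle}]<\infty\}$, then $\{\bar{W}_n:n\geq 1\}$ satisfies the LDP with good rate function $I$ defined by
\[
I(x):=\sup_{\theta\in\mathbb{R}^d}\left\{\langle\theta,x\rangle-\log E\left[e^{\langle\theta,W_1\rangle}\right]\right\}.
\]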
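To make the one-dimensional rate function concrete, here is a minimal numerical sketch of the Legendre transform $\sup_\theta\{\theta x-\log E[e^{\theta W_1}]\}$. The Poisson example, the search interval $[-30,30]$ and all function names are assumptions chosen for illustration; the ternary search relies only on the concavity of the objective in $\theta$.

```python
import math

def legendre_sup(x, log_mgf, lo=-30.0, hi=30.0, iters=200):
    """Approximate sup over theta in [lo, hi] of theta*x - log_mgf(theta).

    The objective is concave in theta, so a ternary search is enough.
    """
    def objective(theta):
        return theta * x - log_mgf(theta)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2)

def poisson_log_mgf(lam):
    """Logarithm of the moment generating function of a Poisson(lam) law."""
    return lambda theta: lam * (math.exp(theta) - 1.0)

def poisson_rate(x, lam):
    """Closed-form Cramér rate function of Poisson(lam), for x > 0."""
    return x * math.log(x / lam) - x + lam
```

For instance, `legendre_sup(1.0, poisson_log_mgf(0.5))` should agree with `poisson_rate(1.0, 0.5)` up to numerical error, and both vanish at the mean `x = 0.5`.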
3 Applications of Cramér’s theorem
The aim of this section is to prove Propositions 1 and 2. In view of this, we recall Lemmas 1 and 2, which give two immediate applications of Cramér’s theorem (Theorem 1) with $d=1$; in Lemma 2, we consider the case with a unitary initial population almost surely (thus, as stated in Remark 1, the case $g=\mathrm{id}$, i.e. $g(s)=s$ for all $s$).
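Lemma 1 (Cramér’s theorem for offspring distribution)
Let $\{X_n:n\geq 1\}$ be i.i.d. random variables with probability generating function $f$. Let $\{\bar{X}_n:n\geq 1\}$ be the sequence of empirical means defined by $\bar{X}_n:=\frac{1}{n}\sum_{k=1}^nX_k$ (for all $n\geq 1$). Then $\{\bar{X}_n:n\geq 1\}$ satisfies the LDP with rate function $I_f$ defined by $I_f(x):=\sup_{\alpha\in\mathbb{R}}\{\alpha x-\log f(e^\alpha)\}$.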
Lemma 2 (Cramér’s theorem for total progeny distribution with $g=\mathrm{id}$)
*Assume that and . Let be i.i.d. random variables with probability generating function . Let be the sequence of empirical means defined by *(for all . Then satisfies the LDP with rate function defined by .
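Now we can prove our main results. We start with Proposition 1, which provides an expression for $I_{\mathcal{G}_{f,\mathrm{id}}}$ in terms of $I_f$.
Proposition 1
Let $I_f$ and $I_{\mathcal{G}_{f,\mathrm{id}}}$ be the rate functions in Lemmas 1 and 2. Then we have $I_{\mathcal{G}_{f,\mathrm{id}}}(y)=yI_f\left(\frac{y-1}{y}\right)$ for all $y\geq 1$.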
Proof.
We remark that
[TABLE]
where , and
[TABLE]
where , by Lemmas 1 and 2, respectively.
Moreover, the function defined by
[TABLE]
is a bijection. This can be checked noting that (for all ) because (here we take into account \eqrefeq:link-pgf); moreover, its inverse is defined by
[TABLE]
(where is the inverse of ), and (for all ) because .
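Thus, we can set $\alpha=\log\mathcal{G}_{f,\mathrm{id}}(e^\beta)$ (for $\beta\in\mathcal{D}(\mathcal{G}_{f,\mathrm{id}})$) in the expression of $I_f$, and we get
\[
I_f(x)=\sup_{\beta\in\mathcal{D}(\mathcal{G}_{f,\mathrm{id}})}\left\{x\log\mathcal{G}_{f,\mathrm{id}}(e^\beta)-\log f(\mathcal{G}_{f,\mathrm{id}}(e^\beta))\right\}.
\]
Then (we take into account \eqref{eq:link-pgf} in the second equality below)
\begin{align*}
I_f(x)&=\sup_{\beta\in\mathcal{D}(\mathcal{G}_{f,\mathrm{id}})}\left\{\log\mathcal{G}_{f,\mathrm{id}}(e^\beta)x-\log\left(e^{-\beta}e^{\beta}f(\mathcal{G}_{f,\mathrm{id}}(e^\beta))\right)\right\}\\
&=\sup_{\beta\in\mathcal{D}(\mathcal{G}_{f,\mathrm{id}})}\left\{\log\mathcal{G}_{f,\mathrm{id}}(e^\beta)x+\beta-\log\mathcal{G}_{f,\mathrm{id}}(e^\beta)\right\}\\
&=\sup_{\beta\in\mathcal{D}(\mathcal{G}_{f,\mathrm{id}})}\left\{\beta-(1-x)\log\mathcal{G}_{f,\mathrm{id}}(e^\beta)\right\},
\end{align*}
and, for $x\in[0,1)$, we get
\[
\frac{I_f(x)}{1-x}=\sup_{\beta\in\mathcal{D}(\mathcal{G}_{f,\mathrm{id}})}\left\{\frac{\beta}{1-x}-\log\mathcal{G}_{f,\mathrm{id}}(e^\beta)\right\}=I_{\mathcal{G}_{f,\mathrm{id}}}\left(\frac{1}{1-x}\right).
\]
We conclude by taking $x=\frac{y-1}{y}$ for $y\geq 1$ (thus, $\frac{1}{1-x}=y$), and we obtain the desired equality with some easy computations. ∎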
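The relationship in Proposition 1 can be sanity-checked numerically. The sketch below reads the statement as $I_{\mathcal{G}_{f,\mathrm{id}}}(y)=yI_f((y-1)/y)$ for $y\geq 1$ (our reading of the computations in the proof) and assumes Poisson offspring with mean `LAM`; in that case the total progeny with a single ancestor has the Borel distribution, $P(Y=k)=e^{-\lambda k}(\lambda k)^{k-1}/k!$. The truncation level `kmax`, the search interval and all names are illustrative assumptions.

```python
import math

LAM = 0.5  # assumed Poisson offspring mean (subcritical case)

def rate_offspring(x):
    """Closed-form Cramér rate function of Poisson(LAM) at x >= 0."""
    if x == 0:
        return LAM
    return x * math.log(x / LAM) - x + LAM

def log_mgf_progeny(beta, kmax=500):
    """log E[exp(beta * Y)] for the Borel(LAM) total progeny.

    The series is truncated at kmax terms and summed with the
    log-sum-exp trick to avoid overflow.
    """
    terms = [beta * k - LAM * k + (k - 1) * math.log(LAM * k)
             - math.lgamma(k + 1) for k in range(1, kmax + 1)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def rate_progeny(y, lo=-30.0, hi=1.0, iters=100):
    """Numerical Legendre transform of log_mgf_progeny at y (ternary search)."""
    def objective(beta):
        return beta * y - log_mgf_progeny(beta)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2)
```

For moderate values such as `y = 3.0`, `rate_progeny(y)` and `y * rate_offspring((y - 1) / y)` should agree to several decimal places; for large `y` the truncation at `kmax` terms would need to be increased.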
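Now we present Proposition 2, which concerns the LDP for the empirical means of i.i.d. bivariate random variables distributed as the pair formed by the total progeny and the initial population. In particular, we obtain an expression for the rate function $I_{\mathcal{G}_{f,g},g}$ in terms of $I_f$ in Lemma 1 and $I_g$ defined by
\[
I_g(z):=\sup_{\delta\in\mathbb{R}}\left\{\delta z-\log g(e^\delta)\right\}.
\]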
Proposition 2
*Let be i.i.d. random variables distributed as . Assume that is finite in a neighborhood of . Let be the sequence of empirical means defined by *(for all . Then satisfies the LDP with good rate function defined by
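\[
I_{\mathcal{G}_{f,g},g}(y,z):=\begin{cases}
yI_f\left(\frac{y-z}{y}\right)+I_g(z)&\mbox{if}\ y\geq z>0,\\
I_g(0)&\mbox{if}\ y=z=0,\\
\infty&\mbox{otherwise}.
\end{cases}
\]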
Remark 3**.**
We are assuming (implicitly) that and ; in fact, since we require that is finite in a neighborhood of , we are assuming that and .
Proof.
The LDP is a consequence of Cramér’s theorem (Theorem 1) with , and the rate function is defined by
[TABLE]
Throughout the proof, we restrict our attention on the pairs such that . In fact, almost surely, we have , and therefore ; thus, by Remark 2 we have if condition fails.
We remark that , and therefore
[TABLE]
thus,
[TABLE]
Furthermore, the function
[TABLE]
is a bijection defined on , where
[TABLE]
as in the proof of Proposition 1; then, for , we obtain
[TABLE]
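Thus, we have (note that the last equality holds by Proposition 1)
\begin{align*}
I_{\mathcal{G}_{f,g},g}(y,z)&\leq\sup_{\beta\in\mathbb{R}}\left\{\beta y-z\log\mathcal{G}_{f,\mathrm{id}}(e^\beta)\right\}+\sup_{\delta\in\mathbb{R}}\left\{\delta z-\log g(e^\delta)\right\}\\
&=\begin{cases}
zI_{\mathcal{G}_{f,\mathrm{id}}}(y/z)+I_g(z)&\mbox{if}\ y\geq z>0,\\
I_g(0)&\mbox{if}\ y=z=0,\\
\infty&\mbox{otherwise}
\end{cases}\\
&=\begin{cases}
yI_f\left(\frac{y-z}{y}\right)+I_g(z)&\mbox{if}\ y\geq z>0,\\
I_g(0)&\mbox{if}\ y=z=0,\\
\infty&\mbox{otherwise}.
\end{cases}
\end{align*}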
We conclude by showing the inverse inequality
[TABLE]
To this end, we take two sequences and such that
[TABLE]
and
[TABLE]
Then we have
[TABLE]
and we get \eqrefeq:inverse-inequality letting go to infinity. ∎
4 Large deviations for estimators of $\mu_f$
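In this section, we prove two LDPs for two sequences of estimators of the offspring mean $\mu_f$. Namely, if $\{(\bar{Y}_n,\bar{V}_n):n\geq 1\}$ is the sequence in Proposition 2, where $\bar{Y}_n$ and $\bar{V}_n$ are the empirical means of the total progenies and of the initial populations (see also the precise assumptions in Remark 3; in particular, we have $\mu_f<1$), then we consider:
1. $\left\{\frac{\bar{Y}_n-\bar{V}_n}{\bar{Y}_n}:n\geq 1\right\}$;
2. $\left\{\frac{\bar{Y}_n-\mu_g}{\bar{Y}_n}:n\geq 1\right\}$.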
Obviously, these estimators are well defined if the denominators are different from zero; then, in order to have well-defined estimators, we always assume that (where is as in \eqrefeq:pmf-initial-population), and, noting that, in general, , we have
[TABLE]
Moreover, both sequences converge to as (see in \eqrefeq:mean-value-total-progeny), and they coincide when the initial population is deterministic (equal to almost surely).
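The following simulation sketch illustrates the two sequences of estimators under explicit assumptions. We read the first estimator as $(\bar{Y}_n-\bar{V}_n)/\bar{Y}_n$ (consistent with the substitution $y=z/(1-x)$ used in the proof of Proposition 3) and the second one as $(\bar{Y}_n-\mu_g)/\bar{Y}_n$; this second reading is our assumption. Here $\bar{Y}_n$ and $\bar{V}_n$ denote the empirical means of the total progenies and of the initial populations. The function names, the offspring law and the uniform initial population are hypothetical.

```python
import random

def total_progeny(offspring_probs, initial_population, rng):
    """Total progeny of one Galton-Watson path (subcritical law assumed)."""
    counts = list(range(len(offspring_probs)))
    current = total = initial_population
    while current > 0:
        current = sum(rng.choices(counts, weights=offspring_probs, k=current))
        total += current
    return total

def estimate_offspring_mean(offspring_probs, initial_values, n, rng):
    """Return (first, second) estimates from n i.i.d. replications.

    initial_values: equally likely positive initial sizes (an assumption
    made for this example, so that the denominators never vanish).
    """
    mu_g = sum(initial_values) / len(initial_values)
    v = [rng.choice(initial_values) for _ in range(n)]
    y = [total_progeny(offspring_probs, v0, rng) for v0 in v]
    y_bar = sum(y) / n
    v_bar = sum(v) / n
    first = (y_bar - v_bar) / y_bar   # uses the observed initial populations
    second = (y_bar - mu_g) / y_bar   # uses the known mean mu_g instead
    return first, second
```

With the offspring law `[0.6, 0.3, 0.1]` (mean 0.5) and a large `n`, both values should be close to 0.5; with a deterministic initial population (for example `initial_values=[2]`) the two estimates coincide.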
The LDPs of these two sequences are proved in Propositions 3 and 4. Moreover, Corollary 1 and Remark 4 concern the comparison between the convergence of the first sequence and its analogue when the initial population is deterministic (equal to the mean). Propositions 3 and 4 are proved by combining the contraction principle (see e.g. Theorem 4.2.1 in [6]) and Proposition 2 (note that the rate function in Proposition 2 is good, as it is required to apply the contraction principle). We remark that, in the proofs of Propositions 3 and 4, we take into account that by Proposition 2 and . At the end of this section, we present some remarks on the comparison between the rate functions in Propositions 3 and 4 (Remarks 5 and 6).
We start with the LDP of the first sequence of estimators.
Proposition 3
Assume the same hypotheses of Proposition 2 and . Let be i.i.d. random variables distributed as . Let be the sequence of empirical means defined by (for all ). Then satisfies the LDP with good rate function defined by
[TABLE]
Proof.
By Proposition 2 and the contraction principle we have the LDP of with good rate function defined by
[TABLE]
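The case $x\notin[0,1)$ is trivial because we have the infimum over the empty set. For $x\in[0,1)$, we rewrite this expression as follows (where we take into account the expression of the rate function in Proposition 2):
\begin{align*}
J_{\mathcal{G}_{f,g},g}(x)&=\inf\left\{I_{\mathcal{G}_{f,g},g}\left(\frac{z}{1-x},z\right):z>0\right\}\\
&=\inf\left\{\frac{z}{1-x}I_f\left(\frac{\frac{z}{1-x}-z}{\frac{z}{1-x}}\right)+I_g(z):z>0\right\}\\
&=\inf\left\{\frac{z}{1-x}I_f(x)+I_g(z):z>0\right\}\\
&=-\sup\left\{-z\frac{I_f(x)}{1-x}-I_g(z):z>0\right\};
\end{align*}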
thus, since for , we obtain by taking into account the definition of in \eqrefdef:rf-initial-population and the well-known properties of Legendre transforms (see e.g. Lemma 4.5.8 in [6]; see also Lemma 2.2.5(a) and Exercise 2.2.22 in [6] for the convexity and the lower semicontinuity of ). ∎
We have an immediate consequence of this proposition that concerns the case with a deterministic initial population equal to (almost surely). Namely, if we consider the probability generating function defined by (for all ), then we mean the case , and therefore:
- •
almost surely; thus, and almost surely (for all );
- •
are i.i.d. random variables distributed as , that is,
[TABLE]
- •
the rate function is
[TABLE]
by Proposition 3.
Corollary 1** **(Comparison between in Proposition
3 and )
We have for all . Moreover the inequality turns into an equality if and only if we have one of the following cases:
- •
* and ;*
- •
* and ;*
- •
* is deterministic, equal to , and for all .*
Proof.
The case is trivial. On the contrary, if , then by Jensen’s inequality we have
[TABLE]
moreover, the cases where the inequality turns into an equality follow from the well-known properties of Jensen’s inequality. ∎
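The comparison in Corollary 1 can be illustrated numerically. The sketch assumes that, for $x\in[0,1)$, the rate function in Proposition 3 takes the form $-\log g\big(e^{-I_f(x)/(1-x)}\big)$ (our reading of the Legendre transform step in its proof) and that the deterministic counterpart is $\mu_g I_f(x)/(1-x)$; Jensen's inequality then gives the stated ordering. The Poisson offspring law, the two-point initial population and all names are hypothetical choices for this example.

```python
import math

LAM = 0.5                       # assumed Poisson offspring mean
INITIAL_PMF = {1: 0.5, 3: 0.5}  # assumed law of the initial population
MU_G = sum(k * p for k, p in INITIAL_PMF.items())  # mean initial size: 2.0

def rate_offspring(x):
    """Closed-form Cramér rate function of Poisson(LAM) at x >= 0."""
    if x == 0:
        return LAM
    return x * math.log(x / LAM) - x + LAM

def rate_random_initial(x):
    """-log g(exp(-theta)) with theta = I_f(x) / (1 - x), for 0 <= x < 1."""
    theta = rate_offspring(x) / (1.0 - x)
    g_value = sum(p * math.exp(-theta * k) for k, p in INITIAL_PMF.items())
    return -math.log(g_value)

def rate_deterministic_initial(x):
    """mu_g * I_f(x) / (1 - x): initial population equal to MU_G a.s."""
    return MU_G * rate_offspring(x) / (1.0 - x)
```

On a grid of points in $[0,1)$, `rate_random_initial(x)` never exceeds `rate_deterministic_initial(x)`, and both vanish at `x = LAM`; away from that point the inequality is strict because the initial population in this example is not deterministic.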
Remark 4** (Comparison between convergence of estimators of ).**
Assume that and the initial population is not deterministic. Then there exists such that
[TABLE]
Thus, we can say that converges to (as faster than ; in fact, we can find such that
[TABLE]
We can repeat the same argument to say that converges to (as faster than in Lemma 1. In fact, we have almost surely, is an integer, and, since because , we have ; then we have
[TABLE]
(we can also consider the case if .
Now we present the LDP for the second sequence of estimators.
Proposition 4
*Assume the same hypotheses of Proposition 2 and . Let be i.i.d. random variables distributed as . Let be the sequence of empirical means defined by *(for all . Then satisfies the LDP with good rate function defined by
[TABLE]
Proof.
By Proposition 2 and the contraction principle we have the LDP of with good rate function defined by
[TABLE]
The case is trivial because we have the infimum over the empty set (we recall that because ). For , we have
[TABLE]
and we obtain the desired formula by taking into account the expression of the rate function in Proposition 2. ∎
Remark 5** (We can have for some ).**
We know that, for in Proposition 3, we have for . On the contrary, as we see, we could have for some . In order to explain this fact, we denote the minimum value such that by ; then we have ; moreover, we have if . In conclusion, we can say that if , then the range of negative values such that is
[TABLE]
in fact, for , both and are finite for , and therefore we can say that if or, equivalently, if \eqrefeq:range-of-negative-x holds.
Remark 6** (Estimators of when ).**
If , that is, for all or, equivalently, , then the rate function in Proposition 3 is
[TABLE]
Then it is easy to check that coincides with , and therefore coincides with in \eqrefeq:main-estimators-rf-deterministic-initial-population (note that, in particular, we cannot have the strict inequalities in \eqrefeq:local-strict-inequality-between-rf in Remark 4 stated for the case ). Finally, if (and as usual or, equivalently, ), then we have in the variational formula of the rate function in Proposition 4, and therefore
[TABLE]
Note the rate function in \eqrefeq:rf-prop-minor-estimators-muf=0 can also be derived by combining the contraction principle and the rate function for the empirical means ; in fact, we have , and the rate function is good by the hypotheses of Proposition 4 (see Proposition 2 and Remark 3). Finally, we also note that inequality \eqrefeq:range-of-negative-x appears in the rate function expression \eqrefeq:rf-prop-minor-estimators-muf=0.
Acknowledgments
The authors thank a referee for suggesting shorter proofs of Propositions 1 and 2. The support of GNAMPA (INDAM) is acknowledged.
References
- [1] Asmussen, S., Hering, H.: Branching Processes. Birkhäuser, Boston (1983). doi:10.1007/978-1-4615-8155-0. MR0701538
- [2] Athreya, K.B.: Large deviation rates for branching processes. I. Single type case. Ann. Appl. Probab. 4, 779–790 (1994). doi:10.1214/aoap/1177004971. MR1284985
- [3] Athreya, K.B., Ney, P.E.: Branching Processes. Springer, New York, Heidelberg (1972). MR0373040
- [4] Athreya, K.B., Vidyashankar, A.N.: Large deviation rates for branching processes. II. The multitype case. Ann. Appl. Probab. 5, 566–576 (1995). doi:10.1214/aoap/1177004778. MR1336883
- [5] Biggins, J.D., Bingham, N.H.: Large deviations in the supercritical branching process. Adv. Appl. Probab. 25, 757–772 (1993). doi:10.1017/S0001867800025738, doi:10.2307/1427790. MR1241927
- [6] Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, 2nd edn. Springer, New York (1998). doi:10.1007/978-1-4612-5320-4. MR1619036
- [7] Dwass, M.: The total progeny in a branching process and a related random walk. J. Appl. Probab. 6, 682–686 (1969). doi:10.1017/S0021900200026711. MR0253433
- [8] Fleischmann, K., Wachtel, V.: Lower deviation probabilities for supercritical Galton–Watson processes. Ann. Inst. Henri Poincaré Probab. Stat. 43, 233–255 (2007). doi:10.1016/j.anihpb.2006.03.001. MR2303121
