The variance of the average depth of a pure birth process converges to 7
Ken R. Duffy, Gianfelice Meli, Seva Shneer

TL;DR
This paper proves that the variance of the average leaf depth in a pure birth process converges to 7, showing consistency within individual trees despite fluctuations across the ensemble.
Contribution
It establishes that the variance of the average leaf depth in a pure birth process converges to a constant, contrasting with the linear growth of variance in individual leaf depths.
Findings
Variance of average leaf depth converges to 7.
Across individual trees, the average leaf depth is highly consistent.
The variance of the depth of a randomly selected leaf grows linearly in time, whereas the variance of the per-tree average depth stabilizes.
Abstract
If trees are constructed from a pure birth process and one defines the depth of a leaf to be the number of edges to its root, it is known that the variance in the depth of a randomly selected leaf of a randomly selected tree grows linearly in time. In this letter, we instead consider the variance of the average depth of leaves within each individual tree, establishing that, in contrast, it converges to a constant, 7. This result indicates that while the variance in leaf depths amongst the ensemble of pure birth processes undergoes large fluctuations, the average depth across individual trees is much more consistent.
Ken R. Duffy Hamilton Institute, Maynooth University, Maynooth, Ireland
Gianfelice Meli Hamilton Institute, Maynooth University, Maynooth, Ireland
Seva Shneer School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
1 Introduction
Continuous time branching processes form fundamental building blocks of many stochastic models (e.g. [8]) and much is known about many statistics associated with them. A pure birth process [14] is the simplest continuous time branching process. It describes the growth of a directed tree that starts at time $0$ with a root, which is the first leaf. Each leaf extends the tree by creating two new leaves after an exponentially distributed time with mean $1/\lambda$, independently of everything else. Pure birth processes appear as a fundamental model in a large number of applications, from data structures in computer science to likelihood methods in phylogenetics to the study of random walkers on random graphs, and are well studied.
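The growth rule just described can be simulated directly. The following is a minimal sketch, not the authors' code: the function name `simulate_leaf_depths` and the parameters `t_end`, `lam` and `seed` are illustrative choices, and `lam` is assumed to be the splitting rate, i.e. the reciprocal of the mean lifetime of a leaf. It relies on two consequences of the memoryless property: with $n$ leaves alive the waiting time to the next split is exponential with rate $n$ times the per-leaf rate, and the splitting leaf is uniformly distributed among the current leaves.

```python
import random


def simulate_leaf_depths(t_end, lam=1.0, seed=None):
    """Return the list of leaf depths of one pure birth tree at time t_end.

    Assumption: each leaf splits into two after an Exp(lam) lifetime,
    so the mean lifetime is 1 / lam.
    """
    rng = random.Random(seed)
    depths = [0]  # the root is the first leaf, at depth 0
    t = 0.0
    while True:
        # With n leaves alive, the next split occurs after an Exp(n * lam) time.
        t += rng.expovariate(lam * len(depths))
        if t > t_end:
            return depths
        # By memorylessness the splitting leaf is uniform among current leaves.
        i = rng.randrange(len(depths))
        child_depth = depths[i] + 1
        depths[i] = child_depth      # first child reuses the parent's slot
        depths.append(child_depth)   # second child is appended as a new leaf


depths = simulate_leaf_depths(t_end=3.0, lam=1.0, seed=1)
print(len(depths), sum(depths) / len(depths))  # number of leaves, average depth
```

Only the leaf depths are stored, since the number of leaves and the sum of their depths are all that the average depth requires.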
Of interest to us here is a measure of tree depth, the distance from root to leaves. If one conditions on the number of nodes, much is known. For example, Pittel [12] linked prior results regarding binary search trees [15, 5, 3] to continuous time Markovian branching processes, establishing scaling properties of the depth of both the shallowest and the deepest leaf. Further extensions of those results have since been found [13, 1]. Without conditioning on the number of nodes in the tree, relatively little appears in the literature. For a pure birth process, it is known that the mean depth of a randomly chosen leaf in a randomly selected tree grows as $2\lambda t$ with variance $2\lambda t$ [16]. However, for many applications, particularly in the life sciences (e.g. [11, 9]), one is interested in the properties of individual growing trees. We denote the number of leaves in a random tree at time $t$ by $Z(t)$ and the sum of their depths by $G(t)$, with $Z(0) = 1$ and $G(0) = 0$. The object of the present study is the variance across trees of the average depth of the leaves within them, i.e. $\mathrm{Var}(G(t)/Z(t))$, and our main result is as follows.
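Theorem 1.
For a pure birth process with $Z(t)$ leaves and sum of leaf depths $G(t)$ at time $t$, we have that
$$\lim_{t \to \infty} \mathrm{Var}\left(\frac{G(t)}{Z(t)}\right) = 7.$$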
In addition to the results in [16], this finding is potentially surprising because it is known that the two processes $Z(t)$, the number of leaves, and $G(t)$, the sum of their depths, have different growth rates, $e^{\lambda t}$ and $t e^{\lambda t}$, respectively [7, 17], from which one might anticipate that the variability of the average depth of a tree diverges to infinity as $t \to \infty$. Those suppositions are incorrect as it has recently been established that, for general continuous time branching processes, $Z(t)$ and $G(t)$ are strongly correlated at the level of sample paths [10], and that, for a pure birth process, $G(t)/(t Z(t)) \to 2\lambda$ almost surely. A visualization of the result in Theorem 1, obtained by Monte Carlo simulation, is provided in Fig. 1. Note that the result does not depend on the splitting rate $\lambda$, which only influences the speed of convergence.
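A Monte Carlo check of Theorem 1 along the lines of Fig. 1 can be sketched as follows. This is an illustrative simulation under stated assumptions, not the code used for the figure: the names `average_depth` and `estimate_variance`, the horizon `t_end`, the rate `lam` (taken to be the reciprocal of the mean lifetime of a leaf) and the number of trees are all hypothetical choices.

```python
import random
import statistics


def average_depth(t_end, lam, rng):
    """Simulate one tree up to t_end and return its average leaf depth."""
    depths = [0]                          # the root is the only leaf at time 0
    t = rng.expovariate(lam)              # time of the first split
    while t <= t_end:
        i = rng.randrange(len(depths))    # uniformly chosen splitting leaf
        depths[i] += 1                    # one child replaces the parent
        depths.append(depths[i])          # the other child is a new leaf
        t += rng.expovariate(lam * len(depths))
    return sum(depths) / len(depths)


def estimate_variance(t_end=5.0, lam=1.0, n_trees=3000, seed=0):
    """Sample variance of the average leaf depth across independent trees."""
    rng = random.Random(seed)
    samples = [average_depth(t_end, lam, rng) for _ in range(n_trees)]
    return statistics.variance(samples)


print(estimate_variance())  # expected to lie in the vicinity of 7
```

The estimate carries sampling noise of a few percent at this number of trees, so it should be read as a plausibility check rather than as evidence for the exact constant.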
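In order to evaluate $\mathrm{Var}(G(t)/Z(t))$, we condition the average generation on the number of leaves at time $t$, $Z(t)$. By the Law of Total Variance (e.g. [2])
$$\mathrm{Var}\left(\frac{G(t)}{Z(t)}\right) = \mathbb{E}\left(\mathrm{Var}\left(\frac{G(t)}{Z(t)} \,\Big|\, Z(t)\right)\right) + \mathrm{Var}\left(\mathbb{E}\left(\frac{G(t)}{Z(t)} \,\Big|\, Z(t)\right)\right) \tag{1}$$
and, in order to study the variance of the average depth of the leaves at time $t$, we study the quantities $\mathbb{E}(\mathrm{Var}(G(t)/Z(t) \mid Z(t)))$ and $\mathrm{Var}(\mathbb{E}(G(t)/Z(t) \mid Z(t)))$ in Lemmas 2 and 3, respectively. Theorem 1 then follows.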
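The Law of Total Variance decomposition can be checked exactly on a toy example. The sketch below is illustrative only. It uses the fact that a tree with three leaves always has depths (1, 2, 2), while a tree with four leaves has total depth 8 with probability 1/3 and 9 with probability 2/3. The two sizes are mixed with hypothetical weights of 1/2 each rather than the true law of the number of leaves, and the helper names are assumptions made for this example.

```python
from fractions import Fraction as F

# Conditional laws of the average depth A = G / Z given the number of leaves Z.
# Keys are values of Z; each entry lists (value of A, conditional probability).
conditional = {
    3: [(F(5, 3), F(1))],
    4: [(F(8, 4), F(1, 3)), (F(9, 4), F(2, 3))],
}
# Hypothetical weights for Z, chosen only for illustration.
weights = {3: F(1, 2), 4: F(1, 2)}


def mean(dist):
    """Mean of a finite distribution given as (value, probability) pairs."""
    return sum(x * p for x, p in dist)


def var(dist):
    """Variance of a finite distribution given as (value, probability) pairs."""
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist)


# Left-hand side: variance of A under the joint law.
joint = [(x, weights[z] * p) for z, d in conditional.items() for x, p in d]
total = var(joint)

# Right-hand side: E[Var(A | Z)] + Var(E[A | Z]).
expected_cond_var = sum(weights[z] * var(d) for z, d in conditional.items())
var_cond_mean = var([(mean(d), weights[z]) for z, d in conditional.items()])

print(total, expected_cond_var + var_cond_mean)  # prints 5/72 5/72
```

Exact rational arithmetic is used so that the two sides agree identically rather than up to rounding.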
2 Results
Before proceeding with the analysis of the two terms on the RHS of (1), we prove a lemma that will simplify the proofs of Lemmas 2 and 3. For that, we introduce a new process, $S(t)$, denoting the sum of the squares of the depths of the leaves at time $t$, which appears when the second moment of $G(t)$ is studied. In the following we also consider the discrete-time processes associated with $G(t)$ and $S(t)$, namely $G_n$ and $S_n$, which account for the sum and the sum of the squares of the depths of the leaves, respectively, when the number of leaves is $n$.
Lemma 1.
We have that
[TABLE]
Proof.
Throughout this proof, we condition on $Z(t) = n$ and denote by $D_1, \ldots, D_n$ the depths of the leaves present at time $t$, which are not independent. From the definitions, we have $G_n = \sum_{i=1}^{n} D_i$ and $S_n = \sum_{i=1}^{n} D_i^2$. The idea of the proof is to recover the formulas given above by finding recurrence equations for $\mathbb{E}(G_n)$, $\mathbb{E}(S_n)$ and $\mathbb{E}(G_n^2)$.
For $i \in \{1, \ldots, n\}$, denote by $X_i$ a random variable that takes value $1$ if the $i$-th leaf is the first one, among the existing leaves, to extend the tree with two new leaves, and $0$ otherwise. The random variables in the set $\{X_i\}$ are independent of the depths $D_j$ for $i, j \in \{1, \ldots, n\}$ and, due to the memoryless property of the exponential distribution, $P(X_i = 1) = 1/n$ for all $i$, with $n$ the number of leaves in the tree. Furthermore, the $X_i$ are not independent of each other because only one of them can assume value $1$, i.e. $\sum_{i=1}^{n} X_i = 1$, implying that $X_i^2 = X_i$ and $X_i X_j = 0$ if $i \neq j$. With that in mind, we establish the following relations
[TABLE]
From the first equation in (4) we obtain
[TABLE]
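where we have used that the indicator $X_i$ of the splitting leaf and the leaf depth $D_i$ are independent. This gives the recurrence relation $\mathbb{E}(G_{n+1}) = \frac{n+1}{n}\mathbb{E}(G_n) + 2$ that, solved with initial condition $\mathbb{E}(G_1) = 0$, results in the first formula in (2).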
Similarly, using the second equation in (4), we have that
[TABLE]
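from which we get the recurrence equation $\mathbb{E}(S_{n+1}) = \frac{n+1}{n}\mathbb{E}(S_n) + \frac{4}{n}\mathbb{E}(G_n) + 2$. Solving this recursion with $\mathbb{E}(S_1) = 0$, we obtain the second result in (2).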
Using (5) and the two results just found (i.e. the formulas in (2)), we can now find an expression for .
[TABLE]
The equation above can be rewritten as the recurrence equation
[TABLE]
that, when solved with initial condition $\mathbb{E}(G_1^2) = 0$, gives (3). ∎
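The recurrences behind Lemma 1 can be iterated exactly. The following sketch is re-derived from the splitting rule rather than copied from the paper: when a uniformly chosen leaf of depth $d$ is replaced by two leaves of depth $d+1$, the sum of depths increases by $d+2$ and the sum of squared depths by $d^2+4d+2$. The names `g`, `s` and `g2` (the means of the sum of depths, of the sum of squared depths, and of the squared sum of depths, given $n$ leaves) are assumptions of this example, as is the harmonic-number form $2n(H_n - 1)$ used for comparison.

```python
from fractions import Fraction as F


def moments(n_max):
    """Exact E[G_n], E[S_n], E[G_n^2] for n = 1..n_max leaves.

    G_n is the sum of leaf depths and S_n the sum of squared leaf depths.
    Conditioning on the tree and averaging over the uniform splitting leaf:
      E[G_{n+1}]   = (1 + 1/n) E[G_n] + 2
      E[S_{n+1}]   = (1 + 1/n) E[S_n] + (4/n) E[G_n] + 2
      E[G_{n+1}^2] = (1 + 2/n) E[G_n^2] + E[S_n]/n + 4 (1 + 1/n) E[G_n] + 4
    """
    g, s, g2 = F(0), F(0), F(0)  # a single root leaf at depth 0
    out = {1: (g, s, g2)}
    for n in range(1, n_max):
        g_next = (1 + F(1, n)) * g + 2
        s_next = (1 + F(1, n)) * s + F(4, n) * g + 2
        g2_next = (1 + F(2, n)) * g2 + s / n + 4 * (1 + F(1, n)) * g + 4
        g, s, g2 = g_next, s_next, g2_next
        out[n + 1] = (g, s, g2)
    return out


table = moments(200)
for n in (2, 4, 50, 200):
    g, s, g2 = table[n]
    harmonic = sum(F(1, k) for k in range(1, n + 1))
    assert g == 2 * n * (harmonic - 1)     # closed form of the first recurrence
    print(n, float((g2 - g * g) / n**2))   # Var(G_n / n)
```

The printed ratio is the conditional variance of the average depth given $n$ leaves. It is zero for $n \le 3$, where the tree shape is forced, and for large $n$ it should drift slowly toward $7 - 2\pi^2/3 \approx 0.42$.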
We now use Lemma 1 to study the limit behaviour of the first term on the RHS of (1).
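Lemma 2.
For a pure birth process, we have that
$$\lim_{t \to \infty} \mathbb{E}\left(\mathrm{Var}\left(\frac{G(t)}{Z(t)} \,\Big|\, Z(t)\right)\right) = 7 - \frac{2\pi^2}{3}.$$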
Proof.
Given that $Z(t) \to \infty$ a.s. [6, Chapter 5], for every fixed $n$ we have that $P(Z(t) \le n) \to 0$ as $t \to \infty$. This implies that
[TABLE]
Using Lemma 1, we can now compute this variance:
[TABLE]
where in the third equality we have added and subtracted the quantity
[TABLE]
Taking the limit as , we have that
[TABLE]
Using Lemma 1, the first term on the RHS of (2) becomes
[TABLE]
The first term on the RHS of (7) is given by
[TABLE]
whereas the second one is given by
[TABLE]
So, the first sum in the RHS of (2) is equal to . For the last sum in the RHS of (2), we have
[TABLE]
Joining all these results, we obtain
[TABLE]
∎
Lemma 1 allows us to also understand the behaviour of the conditional variance of the expected average depth of the leaves given their number.
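Lemma 3.
For a pure birth process, we have that
$$\lim_{t \to \infty} \mathrm{Var}\left(\mathbb{E}\left(\frac{G(t)}{Z(t)} \,\Big|\, Z(t)\right)\right) = \frac{2\pi^2}{3}.$$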
Proof.
From Lemma 1 we know that
[TABLE]
where, in the second equality, we have used the fact that the variance of a random variable does not change when a constant is added. Given that $Z(t)$ is a pure birth process, the distribution of $Z(t)$ is given by (e.g. [14, pg. 430])
$$P(Z(t) = n) = e^{-\lambda t}\left(1 - e^{-\lambda t}\right)^{n-1}, \quad n \geq 1,$$
where $1/\lambda$ is the expected time before a leaf generates two new leaves, which allows us to evaluate the second term in (8) exactly:
[TABLE]
Let . Then
[TABLE]
and, given , we have that . This implies that
[TABLE]
and the second term in the brackets on the RHS of (8) is therefore .
Consider the first term on the RHS of (8).
[TABLE]
The first term in the brackets on the RHS of (2) is given by
[TABLE]
For the second term, we have that
[TABLE]
Denoting with and noticing that and
[TABLE]
we obtain that , and the second term on the RHS of (2) is thus .
So, joining all the results, we have that
[TABLE]
∎
Theorem 1 follows from equation (1) using the results in Lemmas 2 and 3.
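As a numerical sanity check of how the two terms of the total-variance decomposition combine, the sketch below evaluates both at a finite time. It is an illustration under assumptions, not the authors' computation: it takes the number of leaves to be geometric with success probability $e^{-\lambda t}$, uses the conditional mean $2(H_n - 1)$ of the average depth, with $H_n$ the $n$-th harmonic number, and obtains the conditional variance from moment recurrences re-derived from the splitting rule. The function name `variance_terms` and the argument `lam_t` (standing for $\lambda t$) are hypothetical.

```python
import math


def variance_terms(lam_t, tail=1e-12):
    """Return (E[Var(G/Z | Z)], Var(E[G/Z | Z])) for Z geometric(exp(-lam_t))."""
    p = math.exp(-lam_t)
    # Truncate the sum over n where the neglected geometric mass is below `tail`.
    n_max = int(math.log(tail) / math.log1p(-p)) + 1
    g = s = g2 = 0.0      # E[G_n], E[S_n], E[G_n^2] for n = 1
    harmonic = 1.0        # H_1
    weight = p            # P(Z = 1)
    e_var = e_mean = e_mean_sq = 0.0
    for n in range(1, n_max + 1):
        cond_mean = 2.0 * (harmonic - 1.0)   # E[G/Z | Z = n]
        cond_var = (g2 - g * g) / (n * n)    # Var(G/Z | Z = n)
        e_var += weight * cond_var
        e_mean += weight * cond_mean
        e_mean_sq += weight * cond_mean ** 2
        # Advance the moment recurrences from n to n + 1 leaves.
        g, s, g2 = (
            (1 + 1 / n) * g + 2,
            (1 + 1 / n) * s + 4 * g / n + 2,
            (1 + 2 / n) * g2 + s / n + 4 * (1 + 1 / n) * g + 4,
        )
        harmonic += 1.0 / (n + 1)
        weight *= 1.0 - p
    return e_var, e_mean_sq - e_mean ** 2


for lam_t in (4.0, 7.0, 10.0):
    within, between = variance_terms(lam_t)
    print(lam_t, within, between, within + between)
```

As `lam_t` grows, the first term should settle near $7 - 2\pi^2/3$ and the second near $2\pi^2/3$, so that their sum approaches 7. Only the product $\lambda t$ enters, which is consistent with the rate affecting the speed of convergence but not the limit.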
Acknowledgments: The authors thank Tom S. Weber (WEHI) for contributing to the conjecture of Theorem 1. Part of this work was supported by Science Foundation Ireland grant 12 IP 1263.
- [1] J. D. Biggins and D. R. Grey. A note on the growth of random trees. Stat. Probab. Lett., 32(4):339–342, 1997.
- [2] J. K. Blitzstein and J. Hwang. Introduction to probability. Chapman and Hall/CRC, 2014.
- [3] L. Devroye. A note on the height of binary search trees. J. ACM, 33(3):489–498, 1986.
- [4] B. Efron and R. J. Tibshirani. An introduction to the bootstrap. CRC press, 1994.
- [5] P. Flajolet and A. Odlyzko. Exploring binary trees and other simple trees. In 21st FOCS, pages 207–216. IEEE, 1980.
- [6] T. E. Harris. The theory of branching processes. Springer-Verlag, Berlin, 1963.
- [7] P. Jagers. Renewal theory and the almost sure convergence of branching processes. Ark. Mat., 7(6):495–504, 1969.
- [8] M. Kimmel and D. E. Axelrod. Branching Processes in Biology. Springer, 2002.
