A non-increasing tree growth process for recursive trees and applications

Laura Eslava

arXiv:1701.01656·math.PR·November 11, 2021·Comb. Probab. Comput.

TL;DR

This paper introduces a novel non-increasing tree growth process that models recursive trees with applications to Kingman's coalescent and degree distribution analysis, providing new couplings and convergence rate results.

Contribution

The paper presents a new tree growth process that maintains the shape of recursive trees while allowing non-monotonous degree distributions, and applies it to coalescent coupling and degree extremal analysis.

Findings

01

Provides a non-standard coupling of all finite Kingman's coalescents.

02

Extends degree distribution properties to extremal cases with convergence rates.

03

Shows the shape of the tree matches that of a recursive tree despite non-monotonous degrees.

Abstract

We introduce a non-increasing tree growth process $((T_{n}, σ_{n}), n \geq 1)$, where $T_{n}$ is a rooted labeled tree on $n$ vertices and $σ_{n}$ is a permutation of the vertex labels. The construction of $(T_{n}, σ_{n})$ from $(T_{n - 1}, σ_{n - 1})$ involves rewiring a random (possibly empty) subset of edges in $T_{n - 1}$ towards the newly added vertex; as a consequence $T_{n - 1} \not\subset T_{n}$ with positive probability. The key feature of the process is that the shape of $T_{n}$ has the same law as that of a random recursive tree, while the degree distribution of any given vertex is not monotonous in the process. We present two applications. First, while couplings between Kingman's coalescent and random recursive trees were known for any fixed $n$, this new process provides a non-standard coupling of all finite Kingman's coalescents. Second, we use the new process and the…

Equations

$$\mathcal{D}_{n}=\{(T,\sigma):\,T\in{\mathcal{T}}_{n},\ \sigma\text{ is a stamp history of }T\}.$$

$$Z_{m}^{(n)}=\#\{v\in[n]:\,\mathrm{d}_{R_{n}}(v)\geq m\}$$

$$2^{-m+\log n}(1-o(n^{-\gamma}))={\mathbb{E}}\left[Z_{m}^{(n)}\right]\leq 2^{-m+\log n}.$$

$$\frac{Z_{m}^{(n)}-\lambda_{n,m}}{\lambda_{n,m}}\stackrel{\mathrm{dist}}{\longrightarrow}N(0,1).$$

$$Z_{m}^{(n_{j})}\stackrel{\mathrm{dist}}{\longrightarrow}\mathrm{Poi}(\lambda).$$

$$\mathrm{d}_{\mathrm{TV}}(Z_{m}^{(n)},\mathrm{Poi}(\lambda_{n,m}))$$

$${\mathbb{P}}\left(\Delta_{n}<\lfloor\log n\rfloor-i\right)=\exp\{-2^{i+\varepsilon_{n}}\}(1+o(1)),$$

$$\rho_{C}(e)=\max\{i\in[n-1]:\,e\in E(f_{i})\}.$$

$$\sigma_{C}(u)=\rho_{C}(uv)+1.$$

$${\mathbb{P}}\left((F_{n},\ldots,F_{1})=(f_{n},\ldots,f_{1})\right)=\prod_{k=1}^{n-1}{\mathbb{P}}\left(F_{k}=f_{k}\mid(F_{n},\ldots,F_{k+1})=(f_{n},\ldots,f_{k+1})\right).$$

$${\mathbb{P}}\left((F_{n},\ldots,F_{1})=(f_{n},\ldots,f_{1})\right)=[(n-1)!\,n!]^{-1}.$$

$$P_{n}=\{(k,l,x):\,1\leq l<k\leq n,\ x\in\{0,1\}^{n-1}\}\cup\{(1,0,x):\,x\in\{0,1\}^{n-1},\ x_{1}=1\};$$

$$V_{n}(k,l,x,\sigma)=V_{n}(k,x,\sigma)=\{v\in[n-1]:\,x_{\sigma(v)}=1,\ \sigma(v)\geq k\}.$$

$$h_{n}((T,\sigma),(k,l,x))=(T^{\prime},\sigma^{\prime})$$

$$E(T^{\prime})=\begin{cases}(E(T)\cup\{vn;\,v\in V\})\setminus\{v\,p_{T}(v);\,v\in V\}&\text{if }k=1,\\ \{n\,\sigma^{-1}(l)\}\cup(E(T)\cup\{vn;\,v\in V\})\setminus\{v\,p_{T}(v);\,v\in V\}&\text{if }k>1.\end{cases}$$

$$\sigma^{\prime}(v)=\sigma(v)+{\mathbf{1}}_{[\sigma(v)\geq k]}.$$

$$h_{n}((T,\sigma),(k,l,x))\in\mathcal{D}_{n}.$$

$$\mathrm{H}_{n}(T,\sigma)=h_{n}((T,\sigma),(K,L,X)).$$

$${\mathbb{P}}\left(p_{T}(v)=w,\ \sigma(v)=j,\ \sigma(w)=i\right)=\frac{1}{n(n-1)(j-1)}{\mathbf{1}}_{[j>i]}.$$

$$(p_{\sigma(T)}(\sigma^{-1}(v)),\,v\in V(T)\setminus\{r(T)\})\stackrel{\mathrm{dist}}{=}\{p_{R_{n}}(j),\,1<j\leq n\},$$

$${\mathbb{P}}\left(p_{T}(v)=w,\ \sigma(v)=j,\ \sigma(w)=i\right)=\frac{1}{n(n-1)}{\mathbb{P}}\left(p_{\sigma(T)}(j)=i\right)=\frac{1}{n(n-1)(j-1)}{\mathbf{1}}_{[j>i]}.$$

$${\mathbb{P}}\left(p_{T}(v)=w_{v}\mid\sigma=\pi\right)=\frac{1}{\pi(v)-1}.$$

$${\mathbb{P}}\left(\sigma(T)=\pi(T)\mid\sigma=\pi\right)={\mathbb{P}}\left(p_{T}(v)=p_{T}(v),\,v\in V(t)\setminus\{r(T)\}\mid\sigma=\pi\right)=\prod_{v\in V(T)\setminus r(T)}{\mathbb{P}}\left(p_{T}(v)=p_{T}(v)\mid\sigma=\pi\right)=[(n-1)!]^{-1};$$

$${\mathbb{P}}\left((T,\sigma)=(T,\pi)\right)=\frac{1}{n!}{\mathbb{P}}\left(\sigma(T)=\pi(T)\mid\sigma=\pi\right)=[n!\,(n-1)!]^{-1}.$$

$$\{p_{T}(v),\,v\in V(T)\setminus\{r(T)\}\}=\ldots\cup\{p_{T}(v),\,\sigma(n)\leq\sigma(v)\leq n\}$$

Full text

A non-increasing tree growth process for recursive trees and applications

Laura Eslava

(Date: June 22nd 2018)

Abstract.

We introduce a non-increasing tree growth process $((T_{n},{\sigma}_{n}),\,n\geq 1)$ , where $T_{n}$ is a rooted labeled tree on $n$ vertices and ${\sigma}_{n}$ is a permutation of the vertex labels. The construction of $(T_{n},{\sigma}_{n})$ from $(T_{n-1},{\sigma}_{n-1})$ involves rewiring a random (possibly empty) subset of edges in $T_{n-1}$ towards the newly added vertex; as a consequence $T_{n-1}\not\subset T_{n}$ with positive probability. The key feature of the process is that the shape of $T_{n}$ has the same law as that of a random recursive tree, while the degree distribution of any given vertex is not monotonous in the process.

We present two applications. First, while couplings between Kingman’s coalescent and random recursive trees were known for any fixed $n$, this new process provides a non-standard coupling of all finite Kingman’s coalescents. Second, we use the new process and the Chen-Stein method to extend the well-understood properties of the degree distribution of random recursive trees to extremal-range cases. Namely, we obtain convergence rates on the number of vertices with degree at least $c\ln n$, $c\in(1,2)$, in trees with $n$ vertices. Further avenues of research are discussed.

Key words and phrases:

Tree growth processes, Kingman’s coalescent, random recursive trees, coupling, Chen-Stein method, extreme values

2010 Mathematics Subject Classification:

60C05, 05C80.

1. Introduction

In a 1970 paper [21], Na and Rapoport posed the problem of modeling how the structure of networks (such as sociograms, communication networks and acquaintance networks) emerges through time. They considered two cases: ‘growing’ trees and ‘static’ trees. The ‘growing’ model is now known as the uniform attachment model, and each instance is usually called a (random) recursive tree. These are part of a broad class of tree growth models where vertices are sequentially added and connected to a random vertex in the current tree. On the other hand, the term ‘static’ was motivated by the fact that this construction starts with the $n$ vertices the tree is meant to have, and $n-1$ edges are added one by one (without creating cycles). The ‘static’ model was an early description of what is now referred to as coalescent processes. These two seemingly distinct models of growth have been shown to be related for certain coalescent procedures (e.g. additive and Kingman’s); that is, their resulting trees can also be constructed by a growth process [1, 20, 22]. In particular, Kingman’s coalescents correspond, for any fixed number of vertices $n$, to recursive trees; see Remark 2.3.

Here we present a non-increasing tree growth process $((T_{n},{\sigma}_{n}),\,n\geq 1)$ where $T_{n}$ is a rooted labeled tree on $n$ vertices and ${\sigma}_{n}$ is a permutation of the vertex labels. The three key features of this new growth process are:

(1) The shape of $T_{n}$ has the same distribution as that of recursive trees (vertices are labeled uniformly at random),

(2) adding edges according to the permutation $\sigma_{n}$ (in reverse order) recovers Kingman’s coalescent,

(3) there is a positive probability that $T_{n-1}\not\subset T_{n}$.

Formally, we introduce the class $\mathcal{D}_{n}$ of decorated trees on $n$ vertices and a random mapping $\mathrm{H}_{n}:\mathcal{D}_{n-1}\to\mathcal{D}_{n}$ such that $\mathrm{H}_{n}(T_{n-1},{\sigma}_{n-1})\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}(T_{n},{\sigma}_{n})$ for all $n>1$. Our main result, Theorem 1.1, states that recursively applying the mappings $\mathrm{H}_{n}$ to the unique element of $\mathcal{D}_{1}$ gives uniformly random decorated trees in $\mathcal{D}_{n}$, from which the properties above are recovered. The fact that we can construct recursive trees in a non-increasing fashion is, to the best of our knowledge, a novel idea, and it opens a wide range of further avenues of research. We discuss some of them in the last section.

We call the random mapping $\mathrm{H}_{n}$ that builds $(T_{n},{\sigma}_{n})$ from $(T_{n-1},{\sigma}_{n-1})$ the Robin-Hood pruning; it is the key conceptual contribution of this work and builds on the correspondence between recursive trees and Kingman’s coalescent exploited in [2, 10]. This connection seems to have been rarely exploited, with the exception of [6, 23], where an equivalent construction was used to study union-find trees and then related to recursive trees.

Additionally, we provide applications to high-degree vertices of recursive trees and their maximum degree. Kingman’s coalescent had already been exploited by Addario-Berry and the author to describe near-maximum degrees in recursive trees [2, 10]. With the new procedure, we are able to extract finer information about extreme degree values in recursive trees. The main underlying technique is the Chen-Stein method for convergence rates to Poisson distributions. Informally, this method approximates the law of a sum $W$ of indicator variables by understanding how the law of these indicator variables changes when conditioning on one of them being equal to one. In our case, the sum $W$ counts the number of vertices with high degree. The perspective of the Robin-Hood pruning allows us to understand how the vertex-degree distributions change when we condition on one such vertex degree being large.

Before we continue to precise statements of our results, we introduce basic notation that will be used throughout the paper as well as the standard construction of recursive trees.

1.1. Notation

For $n\in\mathbb{N}$ , we write $[n]=\{1,\ldots,n\}$ and ${\mathcal{S}}_{n}$ for the set of permutations on $[n]$ . We denote natural logarithms by $\ln(\cdot)$ and logarithms with base 2 by $\log(\cdot)$ .

Given a rooted labeled tree $T=(V(T),E(T))$ , write $|T|=|V(T)|$ and call $|T|$ the size of $T$ . We write ${\mathcal{T}}_{n}$ for the set of rooted trees $T$ with vertex set $V(T)=[n]$ . By convention, we direct all edges toward the root $r(T)$ and write $e=uv$ for an edge with tail $u$ and head $v$ . For $u\in V(T)\setminus\{r(T)\}$ we write $p_{T}(u)$ for the parent of $u$ , that is, the unique vertex $v$ with $uv$ in $E(T)$ . Finally, write $\mathrm{d}_{T}(v)$ for the number of edges directed toward $v$ in $T$ , and call $\mathrm{d}_{T}(v)$ the degree of $v$ . Note that $\mathrm{d}_{T}(v)=\#\{u:p_{T}(u)=v\}$ .

We say $T\in{\mathcal{T}}_{n}$ is increasing if its vertex labels increase along root-to-leaf paths; in other words, if $T\in{\mathcal{T}}_{n}$ and $p_{T}(v)<v$ for all $v\in[n]\setminus\{r(T)\}$ (in particular, $r(T)=1$ ). We write $\mathcal{I}_{n}\subset{\mathcal{T}}_{n}$ for the set of increasing trees of size $n$ . Using induction, it is easy to see that $|\mathcal{I}_{n}|=(n-1)!$ for all $n$ . Next, a tree growth process is a sequence $(T_{n},\,n\geq 1)$ of trees with $T_{n}\in{\mathcal{T}}_{n}$ for each $n$ . The process is increasing if $T_{n}\subset T_{n+1}$ for all $n$ ; this implies that $T_{n}\in\mathcal{I}_{n}$ for all $n$ .

Recursive trees on $n$ vertices, which we denote $R_{n}$ , are usually constructed as follows. Start with $R_{1}$ as a single node with label 1. For each $1<j\leq n$ , $R_{j}$ is obtained from $R_{j-1}$ by adding a new vertex $j$ and connecting it to $v_{j}\in[j-1]$ ; the choice of $v_{j}$ is uniformly random and independent for each $1<j\leq n$ . It is readily seen that $R_{n}$ is a uniformly random tree in $\mathcal{I}_{n}$ . It follows that the process $(R_{n},\,n\geq 1)$ is a random increasing tree growth process.
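The standard construction just described can be sketched in a few lines. This is only an illustrative sketch: the function names `random_recursive_tree` and `degrees`, and the choice to store the tree as a parent map, are assumptions made here and are not taken from the paper.

```python
import random

def random_recursive_tree(n, rng=None):
    """Sample R_n as a parent map {j: v_j}; vertex 1 is the root and has no entry."""
    rng = rng or random.Random()
    parent = {}
    for j in range(2, n + 1):
        # v_j is uniform on [j-1] and independent of earlier choices
        parent[j] = rng.randint(1, j - 1)
    return parent

def degrees(parent, n):
    """d_T(v) = number of edges directed toward v, i.e. number of children of v."""
    deg = {v: 0 for v in range(1, n + 1)}
    for par in parent.values():
        deg[par] += 1
    return deg
```

Every tree produced this way satisfies $p_{T}(v)<v$ for all non-root $v$, so it lies in $\mathcal{I}_{n}$.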

1.2. The new growth process

In what follows we extend the concept of increasing trees. If $T\in{\mathcal{T}}_{n}$ and $\sigma\in{\mathcal{S}}_{n}$ then $\sigma(T)$ is the tree $T^{\prime}\in{\mathcal{T}}_{n}$ with edges $\{\sigma(u)\sigma(v):uv\in E(T)\}$. In words, $T^{\prime}$ is obtained from $T$ by relabeling the vertices of $T$ according to the permutation $\sigma$; see Figure 1 for an example. We say that $\sigma$ is a stamp history for $T$ if $\sigma(T)$ is increasing. If $\sigma$ is a stamp history for $T$ then we say that the pair $(T,\sigma)$ is a recursively decorated tree or decorated tree, and that vertex $v$ has time stamp $\sigma(v)$. We denote the set of decorated trees of size $n$ by

$$\mathcal{D}_{n}=\{(T,\sigma):\,T\in{\mathcal{T}}_{n},\ \sigma\text{ is a stamp history of }T\}.$$
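The definition of a stamp history can be checked mechanically. The following is a minimal sketch, assuming a tree is stored as a parent map `{u: p_T(u)}` over its non-root vertices and a permutation as a dictionary; the names `relabel` and `is_stamp_history` are hypothetical helpers introduced here for illustration.

```python
def relabel(parent, sigma):
    """Return sigma(T): apply sigma to both endpoints of every edge u -> p_T(u)."""
    return {sigma[u]: sigma[p] for u, p in parent.items()}

def is_stamp_history(parent, sigma):
    """True iff sigma(T) is increasing, i.e. sigma(p_T(u)) < sigma(u) for every non-root u."""
    return all(sigma[p] < sigma[u] for u, p in parent.items())

# Tree on [3] with root 2 and edges 1 -> 2, 3 -> 1.
tree = {1: 2, 3: 1}
good = {2: 1, 1: 2, 3: 3}      # time stamps increase away from the root
identity = {1: 1, 2: 2, 3: 3}  # not a stamp history: the root 2 would need stamp 1
```

Here `(tree, good)` is a decorated tree, while `(tree, identity)` is not.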

For each $n\geq 2$ , the Robin-Hood pruning $\mathrm{H}_{n}:\mathcal{D}_{n-1}\to\mathcal{D}_{n}$ is a random mapping that can be applied to any decorated tree. The exact definition of $\mathrm{H}_{n}$ will be given in Section 3. Broadly speaking, $\mathrm{H}_{n}(T,\sigma)$ is obtained from $(T,\sigma)$ by pruning some subtrees of $T$ and placing them as subtrees of a new vertex labeled $n$ ; additionally, vertex $n$ attaches to a random vertex or becomes the root of the new tree. The stamp history in $\mathrm{H}_{n}(T,\sigma)$ is adjusted from $\sigma$ such that vertex $n$ has a uniformly random time stamp. Heuristically, the random procedure follows a ‘steal from the old to give to the new’ scheme; that is, once the time stamp of $n$ has been determined, vertices with an earlier time stamp have larger probability of being reattached to vertex $n$ .

Our main theorem says that, when the input of $\mathrm{H}_{n}$ is uniformly random in $\mathcal{D}_{n-1}$, the output is uniformly random in $\mathcal{D}_{n}$. For the remainder of the paper, for any $n\geq 1$, the pair $(T_{n},{\sigma}_{n})$ denotes a uniformly random element in $\mathcal{D}_{n}$. This result boils down to carefully setting up the distribution of the random parameters involved in the Robin-Hood pruning.

Theorem 1.1.

For each $n\geq 2$ , the Robin-Hood pruning provides a coupling between $(T_{n-1},{\sigma}_{n-1})$ and $(T_{n},{\sigma}_{n})$ such that $(T_{n},{\sigma}_{n})=\mathrm{H}_{n}(T_{n-1},{\sigma}_{n-1})$ .

Note that $|\mathcal{D}_{1}|=1$ , thus the Robin-Hood pruning can be unambiguously applied to decorated trees starting from $\mathcal{D}_{1}$ . Theorem 1.1 implies that the tree growth process $((T_{n},{\sigma}_{n}),\,n\geq 1)$ given by $(T_{n},{\sigma}_{n})=\mathrm{H}_{n}((T_{n-1},{\sigma}_{n-1}))$ is composed of uniformly random decorated trees, but it yields a non-increasing growth process on trees. This occurs since the rewiring may destroy some subtrees in the previous tree; see Remark 3.4. However, the shape of $T_{n}$ has the same law as that of $R_{n}$ ; this is proven by a straightforward bijection between $\mathcal{D}_{n}$ and $\mathcal{I}_{n}\times{\mathcal{S}}_{n}$ .

Proposition 1.2.

For each $n\in\mathbb{N}$ , $|\mathcal{D}_{n}|=n!(n-1)!$ and if $(T_{n},{\sigma}_{n})\in\mathcal{D}_{n}$ is chosen uniformly at random then ${\sigma}_{n}(T_{n})\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}R_{n}$ is a recursive tree of size $n$ and ${\sigma}_{n}$ is a uniformly chosen permutation in ${\mathcal{S}}_{n}$ .

Proof.

By definition, if $(T,\sigma)\in\mathcal{D}_{n}$, then $\sigma(T)\in\mathcal{I}_{n}$. Let $\varphi:\mathcal{D}_{n}\to\mathcal{I}_{n}\times{\mathcal{S}}_{n}$ be defined by $\varphi(T,\sigma)=(\sigma(T),\sigma)$. For an increasing tree $T$ and $\sigma\in{\mathcal{S}}_{n}$, let $T^{\prime}=\sigma^{-1}(T)$; then $\varphi(T^{\prime},\sigma)=(T,\sigma)$, so $\varphi$ is surjective. It is also straightforward that $\varphi$ is injective. Therefore, $|\mathcal{D}_{n}|=|\mathcal{I}_{n}|\cdot|{\mathcal{S}}_{n}|=n!(n-1)!$. The result follows since bijections preserve the uniform measure on finite probability spaces. ∎
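The count $|\mathcal{D}_{n}|=n!(n-1)!$ can be sanity-checked by brute force for very small $n$. The sketch below enumerates all rooted labeled trees on $[n]$ as parent maps and counts the permutations that are stamp histories; the helper names are assumptions made for this illustration, and the enumeration is only practical for $n\leq 5$ or so.

```python
from itertools import permutations, product
from math import factorial

def reaches_root(v, parent, root, n):
    """Follow parent pointers from v for at most n steps; True iff the root is reached."""
    for _ in range(n):
        if v == root:
            return True
        v = parent[v]
    return v == root

def rooted_trees(n):
    """Yield every rooted labeled tree on [n] as a parent map (the root has no entry)."""
    verts = range(1, n + 1)
    for root in verts:
        others = [v for v in verts if v != root]
        for choice in product(verts, repeat=len(others)):
            parent = dict(zip(others, choice))
            if all(reaches_root(v, parent, root, n) for v in others):
                yield parent

def count_decorated(n):
    """Count pairs (T, sigma) such that sigma is a stamp history of T."""
    count = 0
    for parent in rooted_trees(n):
        for perm in permutations(range(1, n + 1)):
            sigma = dict(zip(range(1, n + 1), perm))
            if all(sigma[p] < sigma[u] for u, p in parent.items()):
                count += 1
    return count
```

Calling `count_decorated(n)` for small `n` and comparing with `factorial(n) * factorial(n - 1)` gives a direct check of the proposition's cardinality claim.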

Growth procedures naturally couple families of trees as the size varies. For example, Proposition 1.3 below shows that $(T_{n},\sigma_{n})$ is a representation of Kingman’s coalescent on $[n]$ ; informally, the stamp history encodes the addition of edges in the coalescent. Precise definitions are given in Section 2, for the moment it suffices to say that $\mathbf{C}=(F_{n},\ldots,F_{1})$ denotes a Kingman’s coalescent, where the $F_{j}$ are forests.

Proposition 1.3.

Let $(T_{n},{\sigma}_{n})$ be uniformly random in $\mathcal{D}_{n}$ and $\mathbf{C}=(F_{n},\ldots,F_{1})$ be a Kingman’s coalescent. Write $F_{1}=\{T_{\mathbf{C}}\}$; then $T_{\mathbf{C}}\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}T_{n}$ and the evolution of the forests is given by $\sigma_{n}$.

Typically there is no simple coupling of finite $n$ -coalescent processes as $n$ varies. The first application of Theorem 1.1 is that the Robin-Hood pruning produces, given a Kingman’s coalescent on $n$ vertices, a Kingman’s coalescent on $n+1$ vertices.

Corollary 1.4.

The tree growth process $((T_{n},{\sigma}_{n}),\,n\geq 1)$ , coupled as in Theorem 1.1 gives an explicit coupling of all finite Kingman’s coalescents.

The proof of Proposition 1.3 is given in Section 2 and is based on previous connections between recursive trees and Kingman’s coalescents; see Remark 2.3.

1.3. High-degree vertices in $R_{n}$

In this section we establish a phase change in the number of high-degree vertices in recursive trees. A phase change occurs in a random structure when a class of variables undergoes a transition from asymptotic normal limits to asymptotic Poisson limits. The change is marked by the mean of the variables going from unbounded to bounded. In recursive trees, for example, the number of fringe trees of a given size undergoes a phase change when the size $k$ of the trees tends to infinity and $k=o(\sqrt{n})$ no longer holds [12]; similar results are given when the fringe trees are required to satisfy any given property or pattern [5, 16]. For an integer $0<m\leq n$, let us count the number of high-degree vertices by

$$Z_{m}^{(n)}=\#\{v\in[n]:\,\mathrm{d}_{R_{n}}(v)\geq m\}$$

and write $\lambda_{n,m}={\mathbb{E}}\left[Z_{m}^{(n)}\right]$. The following estimates were implicitly given in [2] and a proof can be found in Appendix A, Proposition 6.3. For each $c\in(0,2)$, there is $\gamma=\gamma(c)$ such that uniformly over $m<c\ln n$,

$$2^{-m+\log n}(1-o(n^{-\gamma}))={\mathbb{E}}\left[Z_{m}^{(n)}\right]\leq 2^{-m+\log n}.\tag{1}$$

It thus follows that the phase change occurs when $m=m(n)\approx\log n$ . Using a Poisson approximation together with (1) we obtain the following phase change for the counts on high-degree vertices.
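To get a feel for the counts $Z_{m}^{(n)}$, one can sample a recursive tree and tabulate them directly. The sketch below is illustrative only; the function name `high_degree_counts` is an assumption, and the comparison value $2^{-m+\log n}=n/2^{m}$ is simply the upper estimate for the mean quoted above, not a prediction for a single sample.

```python
import random

def high_degree_counts(n, rng):
    """Sample R_n and return Z with Z[m] = #{v in [n] : d(v) >= m}, m = 0..max degree."""
    deg = [0] * (n + 1)  # deg[v] = number of children of v; index 0 is unused
    for j in range(2, n + 1):
        deg[rng.randint(1, j - 1)] += 1  # vertex j attaches to a uniform vertex of [j-1]
    max_deg = max(deg[1:])
    return [sum(1 for d in deg[1:] if d >= m) for m in range(max_deg + 1)]

if __name__ == "__main__":
    n = 10_000
    Z = high_degree_counts(n, random.Random(2021))
    for m, z in enumerate(Z):
        print(m, z, n / 2 ** m)  # observed count next to the estimate n / 2^m
```

Two identities hold for every sample and make convenient checks: $Z_{0}^{(n)}=n$, and $\sum_{m\geq 1}Z_{m}^{(n)}=n-1$ because the degrees sum to the number of edges.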

Theorem 1.5.

For each $c\in(1,\log e)$ there exists $c^{\prime}\in(1,c)$ such that if $c^{\prime}\ln n<m<c\ln n$ , then

$$\frac{Z_{m}^{(n)}-\lambda_{n,m}}{\lambda_{n,m}}\stackrel{\mathrm{dist}}{\longrightarrow}N(0,1).\tag{2}$$

If $m\geq\log n$ , under sequences $n_{j}$ for which $\lambda_{n_{j},m}\to\lambda$ we have that

$$Z_{m}^{(n_{j})}\stackrel{\mathrm{dist}}{\longrightarrow}\mathrm{Poi}(\lambda).$$

Remark 1.6.

Counting the high-degree vertices is equivalent to counting fringe trees (of all sizes) with a high-degree root. Therefore, the asymptotic normal distribution of $Z_{m}^{(n)}$, (2), follows from [16, Corollary 1.25]; however, the computation of both the mean and variance for the renormalization of the variables is seemingly not straightforward. Nevertheless, we remark that the associated convergence rates in Theorem 1.7 are strong and novel.

Previous results on the profile of recursive trees consider $X_{m}^{(n)}=\#\{v\in[n]:\mathrm{d}_{R_{n}}(v)=m\}$ , for $m<n$ . For finite values of $m$ , Janson established the joint limiting distribution of $(X_{m}^{(n)},\,m\geq 1)$ in [17]. Addario-Berry and the author addressed the case $m=m(n)\to\infty$ , providing all the possible limiting distributions of $(X_{\lfloor\log n\rfloor+k}^{(n)},\,k\in\mathbb{Z})$ and establishing asymptotic normality for $X_{m}^{(n)}$ when $m=\log n-d$ and $d=d(n)$ slowly tends to infinity [2].

Theorem 1.5 follows from the convergence rates of the next theorem, which, in turn, applies the Chen-Stein method to $Z_{m}^{(n)}$. By changing the perspective from recursive trees to the distributionally equivalent $T_{n}$, we can use the Robin-Hood pruning to understand how the variables $({\mathbf{1}}_{[\mathrm{d}_{T_{n}}(v)\geq m]},\,v\in[n])$ change when conditioning on $\mathrm{d}_{T_{n}}(v)\geq m$. The details of this approach are somewhat delicate, so we defer the discussion to Section 4.

Theorem 1.7.

Fix $1<c^{\prime}<c<2$ . There are constants $\alpha=\alpha(c)\in(0,1)$ and $\beta=\beta(c^{\prime})>0$ such that uniformly for $m=m(n)$ satisfying $c^{\prime}\ln n<m<c\ln n$ ,

[TABLE]

Remark 1.8.

A detailed but simple track of the conditions on $\alpha$ , see Proposition 4.1, shows that there is a non-empty interval $\mathcal{I}=((1-\alpha)\log e,c)$ such that if $c^{\prime}\in\mathcal{I}\cap(1,2)$ , then the bounds in Theorem 1.7 are, in fact, tending to zero.

Remark 1.9.

The exponent $\alpha$ is determined by almost negative correlation between pairs of vertices in $T_{n}$ (see Proposition 4.1), while the exponent $\beta$ depends on an auxiliary coupling based on the Robin-Hood pruning (see Proposition 4.4). We believe that the constraint on $c^{\prime}>1$ could be relaxed by obtaining uniform bounds on ${\mathbb{P}}\left(\mathrm{d}_{R_{n}}(i)=m\right)$ rather than ${\mathbb{P}}\left(\mathrm{d}_{R_{n}}(i)\geq m\right)$ .

Finally, consider the maximum degree $\Delta_{n}$ of a recursive tree $R_{n}$. Note that $Z_{m}^{(n)}>0$ if and only if $\Delta_{n}\geq m$. Therefore, having ${\mathbb{E}}\left[Z^{(n)}_{\log n}\right]\approx 1$ indicates $\Delta_{n}\approx\log n$. In fact, Devroye and Lu showed that $\Delta_{n}/\log n\to 1$ a.s. [7]. The first tail bounds on $\Delta_{n}$ were obtained for ${\mathbb{P}}\left(\Delta_{n}<\lfloor\log n\rfloor+i\right)$ with $i\in\mathbb{Z}$ using singularity analysis of generating functions [13]. The relation between recursive trees and Kingman’s coalescent provided simpler proofs of these results, extending them also to $i<2\ln n-\log n$ [2]. The bounds in Theorem 1.7 yield broader, tighter bounds.

Corollary 1.10.

There exists $C>0$ such that uniformly over $0<i=i(n)<\log e\ln\ln n-C$ ,

$${\mathbb{P}}\left(\Delta_{n}<\lfloor\log n\rfloor-i\right)=\exp\{-2^{i+\varepsilon_{n}}\}(1+o(1)),$$

where $\varepsilon_{n}=\log n-\lfloor\log n\rfloor$ .

The maximum of i.i.d. random variables is, under rather general conditions, distributed in the limit as the Gumbel (or double-exponential) distribution [15]; however, lattice distributions are excluded from this regime. Addressing the case of integer-valued variables, Anderson gives sufficient conditions under which the Gumbel distribution serves as an approximation for their maximum [3]; the geometric distribution is among those covered. Now, when we randomize the labels in $R_{n}$ (e.g. using the tree $T_{n}$ instead), vertex degrees become exchangeable and their limiting distributions are geometric. Although the degrees of $T_{n}$ are not independent, their correlations are weak and the Gumbel-type approximation still arises for the distribution of $\Delta_{n}$. Goh and Schmutz provide an alternative heuristic based on the fact that $\mathrm{d}_{R_{n}}(i)$, with $i\to\infty$ slowly, is asymptotically normal [13].

Outline

The paper is divided into two parts. First, we further discuss the connection between recursive trees, Kingman’s coalescents and other tree models in Section 2. The precise definition of the Robin-Hood pruning $\mathrm{H}_{n}$, together with the proof of Theorem 1.1, is given in Section 3. Second, the results on high-degree vertices of recursive trees use the Chen-Stein method and the Robin-Hood pruning in a non-trivial way. An overview of how we use the Chen-Stein method is given in Section 4. Assuming the existence of an auxiliary coupling (Proposition 4.4), we complete the proofs concerning high-degree vertices in Section 4.1. The auxiliary coupling, based on the Robin-Hood pruning, is presented in Section 5. Finally, Section 6 discusses further avenues of research.

2. Kingman’s coalescents and recursive trees: distinct representations

Discrete coalescents are processes on partitions of $[n]$ that can be represented with different tree structures. One can encode the coalescent using an $n$-chain: a sequence of forests where, at all times, there are $n$ vertices (or elements), and $n-1$ edges are added one by one until a tree is formed. However, there is a more traditional construction using binary search trees (BST) where internal nodes correspond to merges and only external nodes correspond to elements of the coalescent. In the next section we introduce the representation used in this paper and prove Proposition 1.3. Following that, we discuss the well-known bijection between BSTs and recursive trees and the difference between the two coalescent representations. In addition, we explain the difference between the Yule-Harding model of phylogenetic trees and its uniform model, and highlight the importance of clarifying both the rules applied to the mergings in coalescent processes and their representation as trees.

2.1. Recursive trees perspective

Na and Rapoport loosely described this process as the construction of ‘static’ trees with $n$ vertices [21]:

“Initially, single elements move about at random. Each collision forms a couple. A collision of a couple with a single element forms a triple, a collision of an $s$ -tuple with a $t$ -tuple forms an $(s+t)$ -tuple, and so on. At each collision a link is established between an element of one $X$ -tuple and an element of another, the links being rigid so that the elements of the same $k$ -tuple cannot collide. The process goes on until the entire set of $n$ elements has been joined into an $n$ -tuple.”

By changing the rule on how to link the elements of the tuples, we obtain distinct coalescent distributions. In the description in [21] there are no restrictions on the elements allowed to be linked during the coalescent (footnote 1: unfortunately, it was incorrectly presumed in [21] that ‘static’ trees build uniformly random unrooted labeled trees). In fact, the discrete multiplicative coalescent arises when any possible link is chosen uniformly at random. It is associated with Kruskal’s algorithm for the minimum weighted spanning tree problem [1].

Kingman’s coalescent is characterized by the property that the merging probability of any pair of components is independent of the components’ sizes. The representation used in this paper relies on the fact that, at each time, the representative of each ‘tuple’ is the current root of its tree; it is closely related to the ‘union-find’ algorithm used in computer science (see e.g. [24]). A formal description follows.

A forest $f$ is a set of trees with pairwise disjoint vertex sets. Denote by $V(f)$ and $E(f)$ , respectively, the union of the vertex and edge sets of the trees in $f$ . For $n\geq 1$ , an $n$ -chain is a sequence $C=(f_{n},\ldots,f_{1})$ of elements of $\mathcal{F}_{n}=\{f:V(f)=[n]\}$ such that, for $1<i\leq n$ , $f_{i-1}$ is obtained from $f_{i}$ by adding a directed edge between the roots of some pair of trees in $f_{i}$ . In particular, $f_{n}$ consists of $n$ one-vertex trees and $f_{1}$ consists of a single tree on $n$ vertices denoted by $T_{C}$ . For an example see Figure 2.

Next we introduce the necessary notation to define Kingman’s coalescent using $n$ -chains. For an $n$ -chain $(f_{n},\ldots,f_{1})$ and $1\leq i\leq n$ , list the trees of $f_{i}$ in increasing order of their smallest-labeled vertex as $T_{1}^{(i)},\ldots,T_{i}^{(i)}$ . Independently for each $1<i\leq n$ let $\{a_{i},b_{i}\}\subset\{\{a,b\}:1\leq a<b\leq i\}$ be uniformly chosen at random; in addition, let $\xi_{i}$ be independent Bernoulli random variables with mean $1/2$ .

Definition 2.1.

Kingman’s $n$ -coalescent is defined as $\mathbf{C}=(F_{n},\ldots,F_{1})$ constructed as follows. For $1<i\leq n$ , $F_{i-1}$ is obtained from $F_{i}$ by adding an edge between $r(T_{a_{i}}^{(i)})$ and $r(T_{b_{i}}^{(i)})$ . If $\xi_{i}=1$ then direct the edge towards $r(T_{a_{i}}^{(i)})$ ; otherwise direct it towards $r(T_{b_{i}}^{(i)})$ . The forest $F_{i-1}$ consists of the new tree and the remaining $i-2$ unaltered trees from $F_{i}$ .

In other words, if $\mathbf{C}=(F_{n},\ldots,F_{1})$ is a Kingman’s coalescent, then each of the trees of $F_{i}$ corresponds to a set of coalesced elements after $n-i+1$ steps of the process. At each step, two sets (represented by their roots) coalesce and a new representative is chosen uniformly at random.
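Definition 2.1 can be simulated by tracking only the current roots. The sketch below is a simplification made for illustration: instead of listing trees by smallest-labeled vertex and drawing $\{a_{i},b_{i}\}$ and $\xi_{i}$ separately, it draws an ordered pair of distinct roots uniformly at random, which has the same law (a uniform unordered pair followed by a fair coin). The function name `kingman_chain` is an assumption.

```python
import random

def kingman_chain(n, rng):
    """Return the n-1 directed edges (tail, head) of a Kingman n-coalescent, in order of addition."""
    roots = list(range(1, n + 1))  # F_n: n one-vertex trees, each vertex is its own root
    edges = []
    for _ in range(n - 1):
        # Ordered pair of distinct roots: unordered pair uniform, orientation a fair coin.
        tail, head = rng.sample(roots, 2)
        edges.append((tail, head))  # edge directed toward the root of the merged tree
        roots.remove(tail)          # the merged tree is now represented by head
    return edges
```

The head of the last edge is the root of the final tree $T_{\mathbf{C}}$, and every other vertex appears exactly once as a tail.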

To link $n$-chains with decorated trees, we first define a natural edge labeling that tracks the number of trees left in the forest when a given edge is added. Fix $C=(f_{n},\ldots,f_{1})$; for each $e\in E(T_{C})$, let

$$\rho_{C}(e)=\max\{i\in[n-1]:\,e\in E(f_{i})\}.$$

We next define a vertex labeling $\sigma_{C}:V(T_{C})\to[n]$. Let $\sigma_{C}(r(T_{C}))=1$, and for each $uv\in E(T_{C})$, let

$$\sigma_{C}(u)=\rho_{C}(uv)+1.$$

The following proposition shows that the pair $(T_{C},\sigma_{C})\in\mathcal{D}_{n}$ contains all the information to recover the original $n$ -chain $C$ ; in other words, if $\mathcal{C}_{n}$ denotes the set of $n$ -chains, then $\mathcal{D}_{n}$ and $\mathcal{C}_{n}$ are in bijection.
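The labelings $\rho_{C}$ and $\sigma_{C}$ can be computed directly from the order in which edges are added. The sketch below assumes an $n$-chain is given as the list of its directed edges in order of addition, so that the $j$-th edge takes $f_{n-j+1}$ to $f_{n-j}$ and is therefore present in $f_{i}$ exactly for $i\leq n-j$; the function name `stamp_labels` and this encoding are assumptions made for illustration.

```python
def stamp_labels(n, edges):
    """edges[j-1] = (u, v) is the j-th edge added, directed from tail u to head v.

    Returns (rho, sigma) with rho[(u, v)] = max{i : edge in f_i} and
    sigma[u] = rho[(u, v)] + 1 for every tail u, sigma[root] = 1.
    """
    rho, sigma = {}, {}
    for j, (u, v) in enumerate(edges, start=1):
        rho[(u, v)] = n - j          # the j-th edge first appears in f_{n-j}
        sigma[u] = rho[(u, v)] + 1
    root = (set(range(1, n + 1)) - set(sigma)).pop()  # the only vertex that is never a tail
    sigma[root] = 1
    return rho, sigma

# 3-chain: first add 2 -> 3, then 3 -> 1; the final tree has root 1.
rho, sigma = stamp_labels(3, [(2, 3), (3, 1)])
```

In this small example the time stamps decrease along the path $2\to 3\to 1$ toward the root, as the definition of a stamp history requires.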

Proposition 2.2.

Let $\Upsilon:\mathcal{C}_{n}\to\mathcal{D}_{n}$ be defined as follows. For an $n$ -chain $C=(f_{n},\ldots,f_{1})$ , let $\Upsilon(C)=(T_{C},\sigma_{C})$ . Then $\Upsilon$ is a bijection.

Proof.

First, we show that $\mathcal{C}_{n}$ and $\mathcal{D}_{n}$ have the same cardinality. To count the number of $n$ -chains, consider constructing $(f_{n},\ldots,f_{1})$ by deciding which edge to add from $f_{k}$ to $f_{k-1}$ . Since there are $k$ trees in $f_{k}$ , when we have chosen $(f_{n},\ldots,f_{k})$ , there are $k(k-1)$ possible directed edges to add. Therefore, $|\mathcal{C}_{n}|=n!(n-1)!$ .

Next, let $C=(f_{n},\ldots,f_{1})$ be an $n$-chain. For each $1\leq i<n$, the new edge in $f_{i}$ joins the roots of two trees in $f_{i+1}$ and is directed towards the root of the resulting tree. Thus, the labels $\{\rho_{C}(e),\,e\in E(T_{C})\}$ decrease along all paths in $T_{C}$ towards the root $r(T_{C})$. Consequently, the labels $\{\sigma_{C}(v),\,v\in[n]\}$ are, indeed, a stamp history of $T_{C}$. It follows that $\Upsilon$ is well defined.

Finally, let $C=(f_{n},\ldots,f_{1})$, $C^{\prime}=(f^{\prime}_{n},\ldots,f^{\prime}_{1})$ be distinct $n$-chains and write $k=\min\{i:\,f_{i}\neq f^{\prime}_{i}\}$. If $k=1$ then $T_{C}\neq T_{C^{\prime}}$ and clearly, $\Upsilon(C)\neq\Upsilon(C^{\prime})$. Otherwise $T_{C}=T_{C^{\prime}}$, $f_{k-1}=f^{\prime}_{k-1}$ and the (unique) edges $e\in E(f_{k-1})\setminus E(f_{k})$ and $e^{\prime}\in E(f^{\prime}_{k-1})\setminus E(f^{\prime}_{k})$ are distinct. It follows that $e=uv\in E(f^{\prime}_{k})$, so $\rho_{C^{\prime}}(e)\geq k$ while $\rho_{C}(e)=k-1$; hence $\sigma_{C}(u)=k<\sigma_{C^{\prime}}(u)$. This shows that $\Upsilon$ is injective, and so $\Upsilon$ is a bijection between $\mathcal{C}_{n}$ and $\mathcal{D}_{n}$. ∎

Using the bijection of Proposition 2.2, it follows that Proposition 1.3 boils down to showing that $\mathbf{C}$ is uniformly random in $\mathcal{C}_{n}$ .

Proof of Proposition 1.3.

Let $\mathbf{C}=(F_{n},\ldots,F_{1})$ be a Kingman’s coalescent. For any fixed $n$ -chain $(f_{n},\ldots,f_{1})\in\mathcal{C}_{n}$ ,

[TABLE]

Among the $k(k+1)$ possible oriented edges connecting roots of $f_{k+1}$ , exactly one of them can be added to $f_{k+1}$ to yield $f_{k}$ . Thus, regardless of the sequence $(f_{n},\ldots,f_{1})$ ,

[TABLE]

Recall $F_{1}=\{T_{\mathbf{C}}\}$ . By Proposition 2.2, $(T_{\mathbf{C}},{\sigma}_{\mathbf{C}})\in\mathcal{D}_{n}$ and it has a uniform distribution, since the bijection preserves the uniform measure of $\mathbf{C}$ . Finally, by Proposition 2.2, it follows that $T_{\mathbf{C}}\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}T_{n}$ . The evolution of the forests is given by $\rho_{\mathbf{C}}$ ; equivalently by $\sigma_{\mathbf{C}}$ . ∎

Remark 2.3.

It follows from Propositions 1.2 and 1.3 that, for any fixed $n$ and up to relabeling of vertices, Kingman’s coalescent corresponds to recursive trees. See also [1, 2] for direct proofs of this fact.

2.2. The binary search tree connection

Binary search trees have been related to both recursive trees and phylogenetic trees. In this section we briefly discuss these connections and compare them with Kingman’s coalescent. Let $\mathcal{B}_{n}$ be the set containing all plane, rooted, unlabeled binary trees with $n$ external nodes. Trees in $\mathcal{B}_{n}$ distinguish between left and right subtrees of any given internal vertex. It can be shown that the sizes $|\mathcal{B}_{n}|=3\cdot 5\cdots(2n-3)$ are given by the Catalan numbers.

Binary search trees are the tree representation of the sorting algorithm quicksort. Simply described, for each $n\geq 1$ , the quicksort algorithm takes a permutation $\sigma$ of $[n]$ and constructs (step by step) a binary tree with internal vertex labels in $[n]$ as follows. The root is $\sigma(1)$ and vertices $\sigma(2),\ldots,\sigma(n)$ are sequentially added so that the final tree satisfies the following property: for any internal node $j$ , all nodes in its left subtree are smaller than $j$ and all nodes in its right subtree are larger than $j$ . It follows that, given the shape of a binary tree $B\in\mathcal{B}_{n+1}$ , there is exactly one way to label its internal vertices.
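As an illustration of the insertion procedure just described, the following Python sketch builds the binary search tree of a permutation and records only its shape. The representation of shapes as nested tuples, with `None` standing for an external node, and the name `bst_shape` are choices made for this example only.

```python
from collections import Counter
from itertools import permutations

def bst_shape(perm):
    """Insert perm[0], perm[1], ... into a BST; return its shape as nested (left, right) tuples."""
    children = {perm[0]: [None, None]}  # key -> [left child, right child]
    for x in perm[1:]:
        node = perm[0]
        while True:
            side = 0 if x < node else 1   # smaller keys go left, larger go right
            if children[node][side] is None:
                children[node][side] = x
                children[x] = [None, None]
                break
            node = children[node][side]
    def shape(v):
        return None if v is None else (shape(children[v][0]), shape(children[v][1]))
    return shape(perm[0])

# Shapes obtained from the 3! = 6 input permutations of [3].
counts = Counter(bst_shape(p) for p in permutations((1, 2, 3)))
balanced = ((None, None), (None, None))
assert counts[balanced] == 2  # permutations (2,1,3) and (2,3,1)
```

For $n=3$ the balanced shape arises from two of the six permutations while every other shape arises from exactly one, so a uniformly random input permutation does not produce a uniformly random shape.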

There are $n!$ distinct permutations as input for the quicksort algorithm. Devroye introduced the representation of the binary search tree (process) using time stamps which record the entire insertion process. Using this representation, the rotation correspondence is a one-to-one map between recursive trees (on $n-1$ vertices) and binary search trees (with $n$ external vertices). For a thorough description of the correspondence see [16, Section 2, Figures 1-2].

On the other hand, phylogenetic trees on $n$ species are represented by elements in $\mathcal{B}_{n}$ . In this case, species are assumed to have a common ancestor (the root), internal nodes are also ancestors, and the time elapsed between differentiation of species (the length of the branches) is omitted. Two common distributions on phylogenetic trees are the uniform one, known as the Catalan model, and the Yule-Harding model. The latter is a process that constructs trees starting from the root, by choosing a uniformly random external node and replacing it with a cherry (an internal node with two external nodes). Clearly, this construction corresponds one-to-one to the quicksort algorithm. We remark that the Yule-Harding process does not yield uniform phylogenetic trees (just as the BST is not uniform in $\mathcal{B}_{n}$ ). For further discussion of the two models, see e.g. [4, Section 3].

It has been presumed that Kingman’s coalescent is the bottom-up construction of the Yule-Harding model, see e.g. [5]; however, such a correspondence has to be made carefully, as merges are in principle not bound to satisfy planarity constraints. As we can see through the bijections and $n!$ -to-1 mappings from Propositions 2.2 and 1.2, there should be a correspondence between BSTs with time stamps and Kingman’s coalescent.

The construction of Kingman’s coalescent as a binary tree in $\mathcal{B}_{n}$ with time stamps is the following. Using the same random variables used in Definition 2.1, add an internal node connecting the two roots of the merging trees, while the coin flip indicates which of the trees is the left child of the new internal vertex; the time stamps indicate the (reversed) order of addition of internal nodes. Note that in this construction, the symmetry breaking of the coin flip is still necessary.

Conversely, we describe how to interpret time stamps of a BST as the merging history of a Kingman’s coalescent. To do so, we have to label external nodes uniformly at random (so that there are a total of $n!(n-1)!$ different processes). Now, the role of internal vertices is as follows. At step $k\in[n-1]$ , the two sets of external vertices in the subtrees of the vertex with time stamp $n-k$ are the subsets to be merged in the coalescent.

Kingman’s coalescent has uniform distribution when considering all possible merging histories with elements labeled exchangeably. However, considering only the final tree (either in ${\mathcal{T}}_{n}$ or $\mathcal{B}_{n}$ ) yields a non-uniform distribution: there are $n!(n-1)!$ total ways to merge the subtrees (if we use the symmetry breaking at each merging), but there are only $|{\mathcal{T}}_{n}|=n^{n-1}$ and $|\mathcal{B}_{n}|=3\cdot 5\cdots(2n-3)$ different rooted, labeled trees and phylogenetic trees, respectively.

3. The Robin-Hood pruning

The Robin-Hood pruning $\mathrm{H}_{n}:\mathcal{D}_{n-1}\to\mathcal{D}_{n}$ is a random procedure based on randomizing the parameters of a deterministic mapping $\mathrm{h}_{n}:\mathcal{D}_{n-1}\times\mathcal{P}_{n}\to\mathcal{D}_{n}$ where the set $\mathcal{P}_{n}$ defines all possible ways to prune a decorated tree on $n-1$ vertices. The distribution on $\mathcal{P}_{n}$ is tailored so that the Robin-Hood pruning, in fact, yields a coupling of $((T_{n},\sigma_{n});\,n\geq 1)$ .

First we introduce the necessary notation to define $\mathcal{P}_{n}$ , the deterministic pruning $\mathrm{h}_{n}$ and verify that, indeed, the mapping $\mathrm{h}_{n}$ is well defined. We then continue to define the distribution on $\mathcal{P}_{n}$ used in defining the Robin-Hood pruning (Definition 3.5). The proof of Theorem 1.1 requires us to characterize the properties of the uniform distribution in decorated trees. For the characterization in Lemma 3.6 and the proof of Theorem 1.1, we underline the difference between deterministic elements $(T,\pi)$ of $\mathcal{D}_{n}$ and random elements $(\bm{T},\bm{\sigma})$ using bold notation; the distribution of $(\bm{T},\bm{\sigma})$ is not given a priori.

3.1. A deterministic process

Informally, we define all possible ways to prune a decorated tree on $n-1$ vertices using three parameters $(k,l,x)\in\mathcal{P}_{n}$ : the time stamp $k$ of the new vertex, its point of attachment $l$ given by time stamp, and the vertices to be rewired encoded by time stamp in the sequence $x=(x_{1},\ldots,x_{n-1})$ . Once a stamp history is given to a tree $T$ , $\mathcal{V}_{n}$ contains the vertices to be pruned and rewired towards the new vertex $n$ .

We now proceed to precise definitions. Let $n\geq 2$ and set

[TABLE]

additionally, for $(k,l,x)\in\mathcal{P}_{n}$ and a permutation $\sigma\in{\mathcal{S}}_{n-1}$ , let

[TABLE]

Remark 3.1.

The definition of $\mathcal{P}_{n}$ is such that $\sigma^{-1}(1)\in\mathcal{V}_{n}$ if and only if $k=1$ .

The following deterministic pruning is illustrated in Figure 3.

Definition 3.2.

Fix $n\geq 2$ , $(T,\sigma)\in\mathcal{D}_{n-1}$ and $(k,l,x)\in\mathcal{P}_{n}$ . We define $(T^{\prime},{\sigma}^{\prime})$ and set

[TABLE]

as follows. First, let $\mathcal{V}=\mathcal{V}_{n}(k,x,\sigma)$ and construct $T^{\prime}$ from $T$ : For each $v\in\mathcal{V}\setminus\{r(T)\}$ , replace the edge $vp_{T}(v)$ with an edge connecting $v$ to a new vertex labeled $n$ . Now, if $k=1$ then attach $r(T)$ to $n$ ; otherwise, attach vertex $n$ to $\sigma^{-1}(l)$ . In other words, the edges of $T^{\prime}$ are given by

[TABLE]

Second, let ${\sigma}^{\prime}:[n]\to[n]$ be defined by ${\sigma}^{\prime}(n)=k$ and for $v<n$ ,

[TABLE]

Lemma 3.3.

For any $n\geq 2$ , $\mathrm{h}_{n}:\mathcal{D}_{n-1}\times\mathcal{P}_{n}\to\mathcal{D}_{n}$ is well defined. That is, for any $(T,\sigma)\in\mathcal{D}_{n-1}$ and $(k,l,x)\in\mathcal{P}_{n}$ ,

[TABLE]

Proof.

Write $\mathrm{h}_{n}((T,\sigma),(k,l,x))=(T^{\prime},\sigma^{\prime})$ . When $k=1$ , it is clear that $T^{\prime}$ is a tree. When $k>1$ , let $w=\sigma^{-1}(l)$ be the parent of $n$ in $T^{\prime}$ and let $(w=v_{1},\ldots,v_{j}=r(T))$ be the path from $w$ to the root of $T$ . Since $\sigma$ is a stamp history of $T$ , $l=\sigma(v_{1})>\sigma(v_{2})>\cdots>\sigma(v_{j})=1$ ; moreover, $l<k$ . It follows that $v_{i}\notin\mathcal{V}_{n}(k,x,\sigma)$ for all $i\in[j]$ and consequently, no edge in the path from $n$ to the root in $T^{\prime}$ closes a cycle by connecting to $n$ .

Now, we show that ${\sigma}^{\prime}$ is a stamp history for $T^{\prime}$ . It is clear that ${\sigma}^{\prime}$ is a permutation of $[n]$ , so it suffices to prove that ${\sigma}^{\prime}(v)>{\sigma}^{\prime}(p_{T^{\prime}}(v))$ , for all $v\in V(T^{\prime})\setminus\{r(T^{\prime})\}$ . First, for vertices $v$ with $p_{T^{\prime}}(v)=n$ we have $\sigma(v)\geq k$ and consequently ${\sigma}^{\prime}(v)=\sigma(v)+1>k={\sigma}^{\prime}(n)$ .

Second, consider $v,w<n$ with $p_{T^{\prime}}(v)=w$ . It follows that $vw\in E(T)$ and thus $\sigma(v)>\sigma(w)$ . Consequently, ${\mathbf{1}}_{[\sigma(v)\geq k]}\geq{\mathbf{1}}_{[\sigma(w)\geq k]}$ and so ${\sigma}^{\prime}(v)>{\sigma}^{\prime}(w)$ . The last case occurs when $k>1$ and $p_{T^{\prime}}(n)=w=\sigma^{-1}(l)$ . We then have ${\sigma}^{\prime}(n)=k>l=\sigma(w)={\sigma}^{\prime}(w)$ . ∎

Remark 3.4.

Whenever $(k,l,x)\in\mathcal{P}_{n}$ has $x_{j}=1$ for some $j\geq k$ , setting $(T^{\prime},\sigma^{\prime})=\mathrm{h}_{n}((T,\sigma),(k,l,x))$ and $v=\sigma^{-1}(j)$ yields $n=p_{T^{\prime}}(v)\neq p_{T}(v)\in[n-1]$ . This implies that $T\not\subset T^{\prime}$ .

3.2. The random process

The $\mathrm{H}_{n}$ -set is a sample of $\mathcal{P}_{n}$ according to the following distribution.

Definition 3.5.

Fix $n\geq 1$ . Let $K\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}\mathrm{Unif}\left(1,2,\ldots,n\right)$ ; if $K=1$ let $L=0$ , and if $K>1$ let $L\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}\mathrm{Unif}\left(1,2,\ldots,K-1\right)$ conditionally given $K$ . Independently, let $X=(X_{1},\ldots,X_{n-1})$ where $X_{i}\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}\mathrm{Bernoulli}\left(1/i\right)$ are independent variables. An $\mathrm{H}_{n}$ -set is a triple of random variables with the same law as $(K,L,X)\in\mathcal{P}_{n}$ .

We are ready to define the Robin-Hood pruning. For each $n\geq 2$ , let $(K,L,X)\in\mathcal{P}_{n}$ be an $\mathrm{H}_{n}$ -set and define

[TABLE]

The law of $\mathrm{H}_{n}(T,\sigma)$ depends on the initial input $(T,\sigma)$ ; however, the distribution of the $\mathrm{H}_{n}$ -set is tailored so that $\mathrm{H}_{n}(T_{n-1},\sigma_{n-1})$ preserves the uniform measure on decorated trees. In order to prove Theorem 1.1, we start with a characterization of $(T_{n},\sigma_{n})$ .

Lemma 3.6.

Let $n\geq 1$ be an integer. A random decorated tree $(\bm{T},\bm{\sigma})\in\mathcal{D}_{n}$ is uniformly random if and only if the following properties are satisfied.

i) The permutation $\bm{\sigma}$ is uniformly random on ${\mathcal{S}}_{n}$ .

ii) Conditionally given $\bm{\sigma}$ , the vertices $(p_{\bm{\sigma}(\bm{T})}(\bm{\sigma}^{-1}(v)),\,v\in V(\bm{T})\setminus\{r(\bm{T})\})$ are independent.

iii) For all vertices $v,w\in[n]$ and indices $i,j\in[n]$ ,

[TABLE]

Proof.

Let $(\bm{T},\bm{\sigma})=(T_{n},\sigma_{n})$ be uniformly random on $\mathcal{D}_{n}$ . Condition $i)$ follows directly from Proposition 1.2 which states that $\bm{\sigma}$ is a uniformly random permutation and $\bm{\sigma}(\bm{T})$ has the law of a recursive tree $R_{n}$ . In addition, Proposition 1.2 implies

[TABLE]

from which conditions $ii)$ and $iii)$ immediately follow: Parents in recursive trees are chosen independently for each of the vertices, and for all $v,w,i,j\in[n]$ ,

[TABLE]

Now consider a random decorated tree $(\bm{T},\bm{\sigma})\in\mathcal{D}_{n}$ satisfying conditions $i)$ - $iii)$ . Fix a decorated tree $(T,\pi)\in\mathcal{D}_{n}$ , and for $v\in V(T)\setminus\{r(T)\}$ , let $w_{v}=p_{T}(v)$ . Condition $ii)$ on the conditional independence of parents gives, for $v\neq r(T)$ ,

[TABLE]

Using that $\pi$ is a stamp history for $T$ , so $\pi(v)>\pi(w_{v})$ , and that $\bm{\sigma}$ is uniformly random, it follows from (3) that

[TABLE]

Any increasing tree $T^{\prime}\in\mathcal{I}_{n}$ is determined by the set of parents $\{p_{T^{\prime}}(v),\,1<v\leq n\}$ . Using that $\pi(T)\in\mathcal{I}_{n}$ and the conditional independence from condition $ii)$ we get

[TABLE]

the last equality holds by (4) and the fact that $\{\pi(v),\,v\in V(T)\setminus\{r(T)\}\}=\{2,\ldots,n\}$ . Finally, using the equation above and that $\bm{\sigma}$ is uniformly random, we have

[TABLE]

This holds regardless of the choice of $(T,\pi)$ , so $(\bm{T},\bm{\sigma})$ is uniformly random in $\mathcal{D}_{n}$ .

∎

We are now ready to prove Theorem 1.1.

Proof of Theorem 1.1.

Let $(T_{n-1},\sigma_{n-1})\in\mathcal{D}_{n-1}$ be a uniformly random decorated tree. Let $(K,L,X)$ be an $\mathrm{H}_{n}$ -set and let $(\bm{T},\bm{\sigma})=\mathrm{h}_{n}((T_{n-1},\sigma_{n-1}),(K,L,X))$ . It suffices to show that $(\bm{T},\bm{\sigma})$ satisfies the properties in Lemma 3.6.

First, condition $i)$ follows from the construction of $\bm{\sigma}$ and the distributions of both $K$ and $\sigma_{n-1}$ . Second, once conditioning on $\bm{\sigma}$ , which is equivalent to conditioning on both $\sigma_{n-1}$ and $K$ , we get

[TABLE]

where the last two sets are conditionally independent given $\bm{\sigma}$ . Now, since $(T_{n-1},\sigma_{n-1})$ is uniformly random in $\mathcal{D}_{n-1}$ , the parents $\{p_{T_{n-1}}(v),\,1<\sigma_{n-1}(v)<K\}$ are independent, conditionally given $\sigma_{n-1}$ (and thus, also conditionally given $\bm{\sigma}$ ). On the other hand, for $v$ with $\bm{\sigma}(v)\geq K$ ,

[TABLE]

Note that $p_{\bm{T}}(v)$ is determined independently from other vertices, thus $\{p_{\bm{T}}(v),\,K\leq\bm{\sigma}(v)\leq n\}$ are also independent, conditionally given $\bm{\sigma}$ . This implies that condition $ii)$ is satisfied.

Third, fix $1\leq i<j\leq n$ and fix distinct $v,w\in[n]$ . We consider three cases; namely $v=n$ , $w=n$ , and $\{v,w\}\subset[n-1]$ . Let

[TABLE]

It remains to show that the probabilities of $A_{1},A_{2},A_{3}$ are given by (3) for all $i,j\in[n]$ . The event $p_{\bm{T}}(n)=w$ implies that $\sigma_{n-1}(w)=L<K$ . Therefore, $A_{1}$ occurs precisely when $K=j$ , $L=i$ , and $\sigma_{n-1}(w)=i$ . Then,

[TABLE]

Next, $p_{\bm{T}}(v)=n$ implies that $\sigma_{n-1}(v)\geq K$ and thus $\bm{\sigma}(v)=\sigma_{n-1}(v)+1$ . It then follows that $A_{2}$ occurs when $K=i$ , $\sigma_{n-1}(v)=j-1$ , and $X_{j-1}=1$ . Therefore,

[TABLE]

For the last case, since $v,w<n$ , it follows that $K\notin\{i,j\}$ . For each $k\in[n]\setminus\{i,j\}$ let

[TABLE]

In computing the probabilities ${\mathbb{P}}\left(A_{3,k}\right)$ we use that $(T_{n-1},\sigma_{n-1})$ is uniformly random in $\mathcal{D}_{n-1}$ . If $K>j$ , then both $\sigma_{n-1}(v)=\bm{\sigma}(v)$ and $\sigma_{n-1}(w)=\bm{\sigma}(w)$ ; in addition, $p_{\bm{T}}(v)=w$ only if $p_{T_{n-1}}(v)=w$ . Therefore, if $k>j$ , then

[TABLE]

Similarly, if $K<j$ , then $\sigma_{n-1}(v)=\bm{\sigma}(v)-1$ , $\sigma_{n-1}(w)=\bm{\sigma}(w)-{\mathbf{1}}_{[K<i]}$ , and additionally $X_{j-1}=0$ . It then follows that, if $k<j$ ,

[TABLE]

We have shown that ${\mathbb{P}}\left(A_{3,k}\right)$ is uniform for all $k\in[n]\setminus\{i,j\}$ , and we get

[TABLE]

Altogether, we have shown that condition $iii)$ is satisfied and so the proof is complete. ∎
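The statement of Theorem 1.1 can be verified exactly for small $n$ by pushing the uniform law on $\mathcal{D}_{n-1}$ through the Robin-Hood pruning. The Python sketch below does this with rational arithmetic. It relies on two forms inferred from the proofs above rather than quoted from the displayed definitions, so they should be read as assumptions: $\mathcal{V}_{n}(k,x,\sigma)=\{v:\sigma(v)\geq k,\ x_{\sigma(v)}=1\}$ and $\sigma^{\prime}(v)=\sigma(v)+{\mathbf{1}}_{[\sigma(v)\geq k]}$ for $v<n$ . All function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def prune(parent, sigma, k, l, x):
    """Deterministic pruning h_n of a decorated tree on [n-1] (parent: non-root vertex -> parent)."""
    n = len(sigma) + 1
    at_stamp = {s: v for v, s in sigma.items()}
    # Assumed form of V_n(k, x, sigma): stamp at least k and x_stamp = 1.
    rewired = [v for v, s in sigma.items() if s >= k and x[s - 1] == 1]
    new_parent = dict(parent)
    for v in rewired:                 # for k = 1 this includes the old root, as x_1 = 1
        new_parent[v] = n
    if k > 1:
        new_parent[n] = at_stamp[l]   # vertex n is attached to sigma^{-1}(l)
    new_sigma = {v: s + (s >= k) for v, s in sigma.items()}
    new_sigma[n] = k
    return new_parent, new_sigma

def h_set_law(n):
    """Yield ((k, l, x), probability) for the H_n-set of Definition 3.5."""
    for k in range(1, n + 1):
        ls = [0] if k == 1 else list(range(1, k))
        for l in ls:
            for tail in product((0, 1), repeat=n - 2):
                x = (1,) + tail       # X_1 ~ Bernoulli(1) equals 1 almost surely
                p = Fraction(1, n) / len(ls)
                for i, xi in enumerate(x, start=1):
                    p *= Fraction(1, i) if xi else 1 - Fraction(1, i)
                yield (k, l, x), p

def freeze(parent, sigma):
    return tuple(sorted(parent.items())), tuple(sorted(sigma.items()))

def law_after(n):
    """Exact law of the decorated tree after steps 2, ..., n, started from the single vertex."""
    law = {freeze({}, {1: 1}): Fraction(1)}
    for step in range(2, n + 1):
        new_law = defaultdict(Fraction)
        for (par, sig), p in law.items():
            for (k, l, x), q in h_set_law(step):
                new_law[freeze(*prune(dict(par), dict(sig), k, l, x))] += p * q
        law = dict(new_law)
    return law

law = law_after(4)
assert len(law) == 144 and set(law.values()) == {Fraction(1, 144)}
```

Under the stated assumptions, the resulting law charges exactly $4!\,3!=144$ decorated trees, each with mass $1/144$ , which is the uniform measure on $\mathcal{D}_{4}$ .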

4. The Poisson approximation

Recall that $(T_{n},\sigma_{n})$ is a uniform decorated tree and that $T_{n}$ has the shape of a recursive tree. In fact, Proposition 1.2 implies that the following distributional identity holds, for all $n\in\mathbb{N}$ ,

[TABLE]

It follows that the distribution of $(Z_{m}^{(n)},\,m\geq 1)$ and $\Delta_{n}$ does not change if we redefine them as $Z_{m}^{(n)}=\#\{v\in[n]:\,\mathrm{d}_{T_{n}}(v)\geq m\}$ and $\Delta_{n}=\max\{\mathrm{d}_{T_{n}}(v):\,v\in[n]\}$ . However, the correlations in $(\mathrm{d}_{T_{n}}(v);\,v\in[n])$ have a subtle difference in comparison with those in $(\mathrm{d}_{R_{n}}(i);\,i\in[n])$ . To see this, observe that $(\mathrm{d}_{R_{n}}(i),\,i\in[n])$ is negative orthant dependent; for a definition see [8]. This fact can be proven by induction from the two-vertex case $(\mathrm{d}_{R_{n}}(i),\mathrm{d}_{R_{n}}(j))$ , which, in turn, follows essentially from the negative orthant dependency of multinomial distributions, see e.g. [7, Lemma 1]. As a consequence, for all $i,j\in[n]$ ,

[TABLE]

On the other hand, the following proposition gives conditions on $m$ for the degrees in $T_{n}$ to have a pairwise ‘almost’ negative correlation.

Proposition 4.1.

For any $c\in(0,2)$ there exists $\alpha=\alpha(c)>0$ such that uniformly for $m=m(n)<c\ln n$ and distinct $v,w\in[n]$ ,

[TABLE]

Moreover, $\alpha<\frac{1}{4}(1-c+\sqrt{1+2c-c^{2}})<1$ .

We make precise the constraints on $\alpha$ as this is crucial to Theorem 1.5. A weaker version of Proposition 4.1, without explicit error bounds, was proved in [2, Proposition 4.2]; a complete proof of Proposition 4.1 appears in Appendix A.

Although we do not claim the bounds in Proposition 4.1 are optimal, it seems that the property in (5) is lost when randomizing the vertex labels of $R_{n}$ to obtain $T_{n}$ . The bound in (6) will be an important input to the Chen-Stein Method.

Briefly explained, our application of the Chen-Stein method compares, in total variation distance, the sum $Z_{m}^{(n)}$ with a Poisson variable with mean ${\mathbb{E}}\left[Z_{m}^{(n)}\right]$ . The strength of the bounds depends on finding suitable couplings between $({\mathbf{1}}_{[\mathrm{d}_{T_{n}}(v)\geq m]};v\in[n])$ and conditional versions of such variables. More precisely, we use the pruning procedure to obtain $T_{n}$ and Fact 5.1 describes $(\mathrm{d}_{T_{n}}(i),\,i\in[n])$ in terms of the independent elements $(\mathrm{d}_{T_{n-1}}(i),\,i\in[n-1])$ and $\mathrm{d}_{T_{n}}(n)$ . This allows us to analyze the conditional law of $(\mathrm{d}_{T_{n}}(i),\,i\in[n-1])$ given that $\{\mathrm{d}_{T_{n}}(n)\geq m\}$ holds.

Before going into further details we lay out the necessary notation. Given probability measures $\mu$ and $\nu$ , a coupling of $\mu$ and $\nu$ is a pair $(X,Y)$ of random variables (either real or vector-valued) with $X\sim\mu$ and $Y\sim\nu$ . Let $I=(I_{a},\,a\in\mathcal{A})$ be a collection of $\{0,1\}$ -valued random variables. Let $\mu$ be the law of $W=\sum_{a\in\mathcal{A}}I_{a}$ and for $a\in\mathcal{A}$ let $\nu_{a}$ be the conditional law of $W$ given that $I_{a}=1$ , so

[TABLE]

We use the Chen-Stein method stated below.

Theorem 4.2 ([14, Theorem 3.7]).

Let $I=(I_{a},\,a\in\mathcal{A})$ be a collection of $\{0,1\}$ -valued random variables and let $W=\sum_{a\in\mathcal{A}}I_{a}$ . For each $a\in\mathcal{A}$ fix a coupling $(W,W_{a})$ of $\mu$ and $\nu_{a}$ . Then with $\lambda={\mathbb{E}}\left[W\right]$ , we have

[TABLE]

To apply Theorem 4.2 with as tight as possible bounds, one can exploit properties of the variables $I_{a}$ or construct couplings of $\mu$ and $\nu$ with specific properties.

Corollary 4.3.

Let $I=(I_{a},\,a\in\mathcal{A})$ be a collection of $\{0,1\}$ -valued random variables and let $W=\sum_{a\in\mathcal{A}}I_{a}$ . If the variables $I=(I_{a},\,a\in\mathcal{A})$ are exchangeable, then for any fixed $a\in\mathcal{A}$ and coupling $(W,W_{a})$ of $\mu$ and $\nu_{a}$ , we have

[TABLE]

If, moreover, $W_{a}=\sum_{b\in\mathcal{A}}J_{ab}$ and there is a coupling $(W,W_{a})$ of $\mu$ and $\nu_{a}$ satisfying $J_{ab}\leq I_{b}$ for all $b\in\mathcal{A}\setminus\{a\}$ , then

[TABLE]

Now, for the remainder of the section, fix $m$ and let, for all $v\in[n]$ , $I_{v}={\mathbf{1}}_{[\mathrm{d}_{T_{n}}(v)\geq m]}$ , so that $Z_{m}^{(n)}\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}\sum_{v\in[n]}I_{v}$ . Let $(I,J)=((I_{v},\,v\in[n]),(J_{v},\,v\in[n]))$ be a coupling of $\mu$ and $\nu=\nu_{n}$ where $\mu$ is the law of $(I_{1},\ldots,I_{n})$ and $\nu=\nu_{n}$ is the conditional law of $(I_{1},\ldots,I_{n})$ given that $I_{n}=1$ .

If we had negative orthant dependence for $(\mathrm{d}_{T_{n}}(v),\,v\in[n])$ , then it would follow that for all $v\in[n-1]$ , ${\mathbb{E}}\left[I_{n}I_{v}\right]-{\mathbb{E}}\left[I_{n}\right]{\mathbb{E}}\left[I_{v}\right]\leq 0$ and so the conditions for (8) would be satisfied. Although such a strong property has not yet been established, Proposition 4.1 implies for each $v\in[n-1]$ ,

[TABLE]

This suggests that there are couplings of $\mu$ and $\nu$ for which, with high probability, $J_{v}\leq I_{v}$ for all $v\in[n-1]$ . The existence of such couplings is delicate as the inequality $J_{v}\leq I_{v}$ has to hold for all $v\in[n-1]$ simultaneously.

The next proposition is the key ingredient in applying the Chen-Stein method to prove Theorem 1.7. The coupling is based on the Robin-Hood pruning and its proof is the content of Section 5.

Proposition 4.4.

Let $c\in(1,2)$ . There is $\beta=\beta(c)>0$ such that for any $m=m(n)>c\ln n$ there exists a coupling $(I,J)=((I_{1},\ldots,I_{n}),(J_{1},\ldots,J_{n}))$ of $\mu$ and $\nu$ , in which for all $v\in[n-1]$ ,

[TABLE]

In the next section we assume Proposition 4.4 and complete the proofs of the results on high-degree vertices of $R_{n}$ .

4.1. Proofs for high-degree vertices

Proof of Theorem 1.7.

Fix $1<c^{\prime}<c<2$ and let $c^{\prime}\ln n<m=m(n)<c\ln n$ . We apply the Chen-Stein method to $Z_{m}^{(n)}\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}\sum_{v\in[n]}I_{v}$ . First, we use the coupling $(I,J)=((I_{1},\ldots,I_{n}),(J_{1},\ldots,J_{n}))$ of $\mu$ and $\nu$ given in Proposition 4.4. By (7), we have

[TABLE]

It thus remains to show that the terms in the bound above are $O(2^{-m+(1-\alpha)\log n})+O(n^{-\beta})$ , where $\alpha=\alpha(c)\in(0,1)$ and $\beta=\beta(c^{\prime})>0$ are defined as in Propositions 4.1 and 4.4 respectively. For any $v\in[n-1]$ ,

[TABLE]

The terms in the last line are bounded by (6) and Proposition 4.4, respectively. Since (1) gives ${\mathbb{E}}\left[I_{n}\right]=2^{-m}(1+o(1))$ we get

[TABLE]

Finally, (1) together with $\alpha<1$ also gives ${\mathbb{E}}\left[I_{n}\right]=O(2^{-m+(1-\alpha)\log n})$ . ∎
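As a numerical sanity check of the first-moment input ${\mathbb{E}}\left[I_{n}\right]=2^{-m}(1+o(1))$ used above, the following Python sketch computes ${\mathbb{E}}[Z_{m}^{(n)}]/n$ exactly for the recursive tree $R_{n}$ . It assumes the standard construction in which vertex $j$ attaches to a uniformly random vertex of $[j-1]$ , and that the degree of a vertex is its number of children; the function name is hypothetical.

```python
def expected_high_degree_fraction(n, m):
    """Return E[#{i in [n] : d_{R_n}(i) >= m}] / n for a random recursive tree R_n."""
    # pmf[d] = P(degree = d) for d < m and pmf[m] = P(degree >= m), for the current vertex.
    pmf = [1.0] + [0.0] * m          # vertex n has no children
    total = pmf[m]
    for i in range(n - 1, 0, -1):
        # Compared with vertex i+1, vertex i has one extra potential child,
        # namely vertex i+1, which attaches to it with probability 1/i.
        p = 1.0 / i
        new = [0.0] * (m + 1)
        for d, q in enumerate(pmf):
            new[d] += q * (1 - p)
            new[min(d + 1, m)] += q * p
        pmf = new
        total += pmf[m]
    return total / n

# For fixed m and large n the fraction is close to 2^{-m}.
assert abs(expected_high_degree_fraction(2000, 3) - 2 ** -3) < 1e-3
```

The recursion only uses the marginal law of each degree, which is all that the expectation of $Z_{m}^{(n)}$ requires; it says nothing about the correlations discussed in this section.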

Proof of Theorem 1.5.

Fix $c\in(1,\log e)$ and let $\alpha=\alpha(c)$ be as in Theorem 1.7. The upper bound for $\alpha$ in Proposition 4.1, together with simple computations, yields $(1-\alpha)\log e<c$ . Thus, we can choose $c^{\prime}\in((1-\alpha)\log e,c)$ . Let $m=m(n)$ be such that $c^{\prime}\ln n<m<c\ln n$ . By the choice of $c$ and $c^{\prime}$ , we have that, as $n\to\infty$ , $(1-\alpha)\log n-m<0$ ; while (1) implies

[TABLE]

The result then follows by Theorem 1.7 and the central limit theorem of Poisson variables, see e.g. [9, Exercise 3.4.4].

∎

Proof of Theorem 1.10.

Recall that $\varepsilon_{n}=\log n-\lfloor\log n\rfloor$ . Let $i=i(n)$ satisfy $0<i<\log e\ln\ln n-C$ , where $C>0$ is a constant to be determined below, and note that $2^{i+\varepsilon_{n}}\leq 2^{i+1}<2^{-C+1}\ln n$ . Let $m=\lfloor\log n\rfloor-i$ and $Z\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}\mathrm{Poi}\left(\lambda_{m,n}\right)$ .

The event $\{\Delta_{n}<\lfloor\log n\rfloor-i\}$ occurs if and only if $\{Z_{m}^{(n)}=0\}$ occurs. Therefore,

[TABLE]

We deal with the two terms on the right-hand side of (9) separately. First, using the lower bound on $i$ , there is a constant $c\in(\log e,2)$ such that for $n$ large enough, $m-i<c\ln n$ . Therefore, (1) gives $\gamma>0$ such that $\lambda_{n,m}=2^{i+\varepsilon_{n}}+o(n^{-\gamma}\ln n)$ . Consequently,

[TABLE]

For the second term in (9), Theorem 1.7 gives $\alpha,\beta>0$ such that

[TABLE]

It remains to deal with these two error terms. Note that $\exp\{2^{i+\varepsilon_{n}}\}\leq\exp\{2^{-C+1}\ln n\}$ . Therefore, if $C>1+\log(1/\beta)$ then

[TABLE]

similarly, for $C$ large enough,

[TABLE]

The two limits above imply that $\mathrm{d}_{\mathrm{TV}}(Z_{m}^{(n)},Z)=o(\exp\{-2^{i+\varepsilon_{n}}\})$ , completing the proof.

∎

5. The coupling for the Chen-Stein Method

In this section we define and analyze the auxiliary coupling used in Proposition 4.4. The coupling is based on the following straightforward property of the deterministic pruning.

Fact 5.1.

Fix $n\geq 2$ . For $\mathrm{h}_{n}((T,\sigma),(k,l,x))=(T^{\prime},\sigma^{\prime})$ , we have $\mathrm{d}_{T^{\prime}}(n)=\sum_{i=k}^{n-1}x_{i}$ , and for $v\in[n-1]$ ,

[TABLE]

In words, Fact 5.1 specifies when the degree of a vertex $v<n$ changes: either for having $n$ as a new child or for losing children that are rewired towards $n$ . Clearly, the degree of $n$ equals the total number of such rewirings.

The heuristic for the almost negative relation obtained in Proposition 4.4 is the following. Start with $(T_{n-1},\sigma_{n-1})$ and apply the Robin-Hood procedure. If the degree of vertex $n$ is large, Fact 5.1 implies that a large number of vertices in $T_{n-1}$ were rewired towards $n$ in the new tree; thus, many (parent) vertices decreased their degree by at least one. In short, conditioning on $\mathrm{d}_{T_{n}}(n)\geq m$ implies that other vertices are (slightly) less likely to satisfy $\mathrm{d}_{T_{n}}(v)\geq m$ .

For the remainder of the section, fix $n\in\mathbb{N}$ , $c\in(1,2)$ and $m=m(n)>c\ln n$ . Let $(T_{n-1},\sigma_{n-1})$ be uniformly random in $\mathcal{D}_{n-1}$ , $(K,L,X)$ be an $\mathrm{H}_{n}$ -set, and $(K^{\prime},L^{\prime},X^{\prime})$ be distributed as an $\mathrm{H}_{n}$ -set conditioned to satisfy $\sum_{i=K}^{n-1}X^{\prime}_{i}\geq m$ . Now, write

[TABLE]

To avoid cluttered notation, we omit the dependency on $m$ of the conditional random variables $(K^{\prime},L^{\prime},X^{\prime})$ and $(\bm{T},\bm{\sigma})$ . By Fact 5.1 and Theorem 1.1, $(\bm{T},\bm{\sigma})$ is a conditional version of $(T_{n},\sigma_{n})$ given that $\mathrm{d}_{T_{n}}(n)\geq m$ . Consequently, if $I_{v}={\mathbf{1}}_{[\mathrm{d}_{T_{n}}(v)\geq m]}$ and $J_{v}={\mathbf{1}}_{[\mathrm{d}_{\bm{T}}(v)\geq m]}$ for all $v\in[n]$ , then any coupling between $(K,L,X)$ and $(K^{\prime},L^{\prime},X^{\prime})$ yields a coupling for the measures $\mu$ and $\nu$ in Proposition 4.4.

Our goal is then to couple $(K,L,X)$ and $(K^{\prime},L^{\prime},X^{\prime})$ in such a way that the negative relation between $I_{v}$ and $J_{v}$ fails on a negligible set. More precisely, we construct a coupling so that there is $\beta=\beta(c)>0$ satisfying

[TABLE]

Lemmas 5.2–5.4 provide the coupling between $(K,L,X)$ and $(K^{\prime},L^{\prime},X^{\prime})$ , while Proposition 5.5 gives necessary conditions, under the coupling, for $I_{v}<J_{v}$ to hold. The proof of Proposition 4.4 then follows from bounding the probability that such necessary conditions occur.

5.1. Construction of the coupling

For any integer $n-m\leq k<n$ , let $X^{k}=(X_{i}^{k},\,i\in[n-1])$ be a conditional version of $X$ given that $\sum_{i=k}^{n-1}X_{i}\geq m$ . The following observation is quite standard but we include a proof for completeness. For $a=(a_{1},\ldots,a_{d})$ and $b=(b_{1},\ldots,b_{d})\in\{0,1\}^{d}$ , we write $a\leq b$ if $a_{i}\leq b_{i}$ for all $i\in[d]$ . We say that $S\subset\{0,1\}^{d}$ is monotone if $a\leq b$ and $a\in S$ imply $b\in S$ .

Lemma 5.2.

For each $k<n$ , there exists a coupling of $X^{k}$ and $X$ such that $X_{i}\leq X^{k}_{i}$ for all $i\in[n-1]$ .

Proof.

Fix $k<n$ . Note that $S_{k}=\{a\in\{0,1\}^{n-1}:a_{k}+\ldots+a_{n-1}\geq m\}$ is a monotone subset of $\{0,1\}^{n-1}$ . Harris’ inequality implies ${\mathbb{P}}\left(X\in S\cap S_{k}\right)\geq{\mathbb{P}}\left(X\in S_{k}\right){\mathbb{P}}\left(X\in S\right)$ , for any monotone subset $S\subset\{0,1\}^{n-1}$ . Dividing through by ${\mathbb{P}}\left(X\in S_{k}\right)$ yields ${\mathbb{P}}\left(X^{k}\in S\right)\geq{\mathbb{P}}\left(X\in S\right)$ . Therefore, $X^{k}$ stochastically dominates $X$ . The existence of the coupling is then guaranteed by Strassen’s theorem [19]. ∎

Before the next coupling, we gather two observations. First, for fixed $(k,l)$ , we have ${\mathbb{P}}\left(L=l|K=k\right)={\mathbb{P}}\left(L^{\prime}=l|K^{\prime}=k\right)$ . To see this, observe that ${\mathbb{P}}\left(L^{\prime}=l|K^{\prime}=k\right)$ can be rewritten as

[TABLE]

the claim then follows by the independence between $X$ and $(K,L)$ . Second, the sequence $p_{k}={\mathbb{P}}\left(K=k\,|\,\sum_{i=K}^{n-1}X_{i}\geq m\right)$ is decreasing in $k$ . Indeed, the probabilities ${\mathbb{P}}\left(\sum_{i=k}^{n-1}X_{i}\geq m\right)$ are clearly decreasing in $k$ , and $p_{k}$ is proportional to them, with normalizing factor $Z=n{\mathbb{P}}\left(\sum_{i=K}^{n-1}X_{i}\geq m\right)$ . To see this, use the independence between $X$ and $K$ to obtain

[TABLE]

Lemma 5.3.

There exists a coupling of $(K,L)$ and $(K^{\prime},L^{\prime})$ such that $K^{\prime}\leq K$ and $L^{\prime}\leq L$ .

Proof.

Let $X=(X_{1},\ldots,X_{n-1})$ be independent with $X_{i}\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}\mathrm{Bernoulli}\left(1/i\right)$ and independently, let $U_{1},U_{2}$ be i.i.d. $\mathrm{Unif}\left(0,1\right)$ . By a slight abuse of notation we redefine the variables $(K,L)$ and $(K^{\prime},L^{\prime})$ using the variables $U_{1},U_{2}$ and argue that the original law is preserved.

Let $(K,L)=(\lceil nU_{1}\rceil,\lceil(K-1)U_{2}\rceil)$ and $(K^{\prime},L^{\prime})=(K^{\prime},\lceil(K^{\prime}-1)U_{2}\rceil)$ with

[TABLE]

It is straightforward that $(K,L)$ and $K^{\prime}$ have the correct law by construction, while $L^{\prime}$ has the correct law since ${\mathbb{P}}\left(L=l|K=k\right)={\mathbb{P}}\left(L^{\prime}=l|K^{\prime}=k\right)$ for each $0\leq l<k\leq n$ . Moreover, since $p_{k}$ is decreasing, it follows that $K^{\prime}=j$ implies $U_{1}>\sum_{i=1}^{j-1}p_{i}\geq\frac{j-1}{n}$ . It follows that $K\geq j=K^{\prime}$ , and so $L=\lceil(K-1)U_{2}\rceil\geq\lceil(K^{\prime}-1)U_{2}\rceil=L^{\prime}$ . ∎
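The monotone coupling of $K$ and $K^{\prime}$ can be illustrated with exact arithmetic. The Python sketch below computes $p_{k}$ from the tail probabilities ${\mathbb{P}}\left(\sum_{i=k}^{n-1}X_{i}\geq m\right)$ and couples both variables through the same uniform value $u$ . It assumes that $K^{\prime}$ is obtained from $U_{1}$ by inverting the cumulative sums of $(p_{k})$ , that is, $K^{\prime}=\min\{j:\sum_{i\leq j}p_{i}\geq U_{1}\}$ , which is how we read the proof above; the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil

def conditional_law_of_K(n, m):
    """Return [p_1, ..., p_n] with p_k = P(K = k | X_K + ... + X_{n-1} >= m)."""
    tails = {n: Fraction(int(m <= 0))}        # empty sum when k = n
    pmf = [Fraction(1)] + [Fraction(0)] * m   # law of the partial sum, capped at m
    for k in range(n - 1, 0, -1):
        p = Fraction(1, k)                    # X_k ~ Bernoulli(1/k)
        new = [Fraction(0)] * (m + 1)
        for d, w in enumerate(pmf):
            new[d] += w * (1 - p)
            new[min(d + 1, m)] += w * p
        pmf = new
        tails[k] = pmf[m]                     # P(X_k + ... + X_{n-1} >= m)
    z = sum(tails.values())                   # K is uniform, so p_k is tails[k] / z
    return [tails[k] / z for k in range(1, n + 1)]

def couple(p, u):
    """Return (K, K') driven by the same value u in (0, 1]."""
    n = len(p)
    cum = Fraction(0)
    for j, pj in enumerate(p, start=1):
        cum += pj
        if cum >= u:                          # assumed inverse-CDF definition of K'
            return ceil(n * u), j

p = conditional_law_of_K(12, 3)
assert all(a >= b for a, b in zip(p, p[1:]))  # p_k is non-increasing in k
assert all(kp <= k for k, kp in (couple(p, Fraction(t, 500)) for t in range(1, 501)))
```

Since $(p_{k})$ is non-increasing and sums to one, its partial sums dominate those of the uniform law on $[n]$ , which is exactly what forces $K^{\prime}\leq K$ for every value of $u$ .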

Lemma 5.4.

There exists a coupling of $(K,L,X)$ and $(K^{\prime},L^{\prime},X^{\prime})$ such that $K^{\prime}\leq K$ , $L^{\prime}\leq L$ and $X_{i}\leq X^{\prime}_{i}$ for all $i\in[n-1]$ .

Proof.

Let $U_{1},U_{2}$ be i.i.d. $\mathrm{Unif}\left(0,1\right)$ and independently, let $X=(X_{1},\ldots,X_{n-1})$ be independent with $X_{i}\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}\mathrm{Bernoulli}\left(1/i\right)$ . For each $1\leq k<n$ fix a vector $X^{k}$ coupled with $X$ according to Lemma 5.2. The dependence structure of $X^{1},\ldots,X^{n-1}$ is unimportant to the argument, but for concreteness we may, e.g., take them to be conditionally independent given $X$ . On the other hand, it is important to insist that the $X^{k}$ are independent of $(K^{\prime},L^{\prime})$ . Since we will define $(K^{\prime},L^{\prime})$ using $U_{1},U_{2}$ , the existence of such joint coupling is straightforward.

Again, by a slight abuse of notation we redefine the variables and argue that the original law is preserved. Define $(K,L)$ , $(K^{\prime},L^{\prime})$ as in Lemma 5.3 and let $X^{\prime}=X^{K^{\prime}}$ . Clearly, $(K,L,X)$ is an $\mathrm{H}_{n}$ -set. It remains to show that $(K^{\prime},L^{\prime},X^{\prime})$ has the conditional distribution of $(K,L,X)$ given that $\sum_{i=K}^{n-1}X_{i}\geq m$ . For any $(k,l,x)\in\mathcal{P}_{n}$ , the probability ${\mathbb{P}}\left((K,L,X)=(k,l,x)\,\middle|\,\sum_{i=K}^{n-1}X_{i}\geq m\right)$ can be rewritten as

[TABLE]

Adding two factors of ${\mathbb{P}}\left(\sum_{i=k}^{n-1}x_{i}\geq m\right)$ and using the independence between $(K,L)$ and $X$ , we can factorize these probabilities as

[TABLE]

These probabilities correspond, respectively, to the distributions of $(K^{\prime},L^{\prime})$ and $X^{k}$ , which are independent. Therefore,

[TABLE]

as desired. Finally, the variables $(K,L,X)$ and $(K^{\prime},L^{\prime},X^{\prime})$ satisfy the desired inequalities by Lemmas 5.2 and 5.3. ∎

5.2. Analysis of the coupling

The proof of Proposition 4.4 boils down to understanding necessary conditions for $\mathrm{d}_{T_{n}}(v)<m\leq\mathrm{d}_{\bm{T}}(v)$ to hold under the coupling of Lemma 5.4.

Proposition 5.5.

Consider $(K,L,X)$ and $(K^{\prime},L^{\prime},X^{\prime})$ defined in Lemma 5.4 and their corresponding decorated trees $(T_{n},\sigma_{n}),(\bm{T},\bm{\sigma})$ defined in (10) and (11). For any $v\in[n-1]$ ,

[TABLE]

Proof.

From the properties of the coupling in Lemma 5.4,

[TABLE]

Consequently, using Fact 5.1 we have that $\mathrm{d}_{\bm{T}}(v)-\mathrm{d}_{T_{n}}(v)\leq{\mathbf{1}}_{[L^{\prime}=\sigma_{n-1}(v)]}$ . On the other hand, if $\{\mathrm{d}_{T_{n}}(v)<m\leq\mathrm{d}_{\bm{T}}(v)\}$ holds, then it follows that $\mathrm{d}_{\bm{T}}(v)-\mathrm{d}_{T_{n}}(v)>0$ and so it is necessary that $\{L^{\prime}=\sigma_{n-1}(v)\}$ holds. Finally, $\{m\leq\mathrm{d}_{\bm{T}}(v)\}$ implies that

[TABLE]

or equivalently, that $\{\mathrm{d}_{T_{n-1}}(v)\geq m-1\}$ . ∎

We can also argue, more specifically, that

[TABLE]

however, the approach we chose allows us to use uniform bounds for all $v\in[n-1]$ . We will frame the events $\{\mathrm{d}_{T_{n-1}}(v)\geq m-1\}$ from the perspective of recursive trees, where the degree distributions are distinct for each vertex. Recall the following version of Bernstein's inequality (see, e.g., [18, Theorem 2.8, (2.5)]). For a sum $S$ of independent $\{0,1\}$ -valued variables and $\varepsilon>0$ , ${\mathbb{P}}\left(S>(1+\varepsilon){\mathbb{E}}\left[S\right]\right)\leq\exp\left\{-\frac{3\varepsilon^{2}}{2(3+\varepsilon)}{\mathbb{E}}\left[S\right]\right\}$ . By the construction of $R_{n}$ we have that $\mathrm{d}_{R_{n}}(i)\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}\sum_{k=i}^{n}B_{k}\leq\sum_{k=1}^{n}B_{k}$ where $(B_{k},\,k\geq 1)$ are independent Bernoulli variables with mean $1/k$ . Therefore,

[TABLE]

Using that ${\mathbb{E}}\left[\sum_{k=1}^{n}B_{k}\right]=\ln n+O(1)<c\ln n$ , we can apply Bernstein's inequality with $\varepsilon=c-1+o(1)$ and set $\beta=\frac{3\varepsilon^{2}}{2(3+\varepsilon)}$ . It follows that there is $\beta=\beta(c)>0$ such that uniformly over $m>c\ln n$ , and $i\in[n]$ ,

[TABLE]
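As a sanity check on the use of Bernstein's inequality above, the tail of $\sum_{k=1}^{n}B_{k}$ can be computed exactly for moderate $n$ and compared with the quoted bound. The sketch below is only illustrative: the function names are ours, and $\varepsilon$ is chosen so that $(1+\varepsilon){\mathbb{E}}\left[S\right]=c\ln n$ , which is one concrete reading of $\varepsilon=c-1+o(1)$ .

```python
import math


def tail_of_bernoulli_sum(means, threshold):
    """Exact P(S > threshold) for S a sum of independent Bernoulli(means[i])."""
    dist = [1.0]  # dist[s] = P(partial sum == s)
    for q in means:
        nxt = [0.0] * (len(dist) + 1)
        for s, pr in enumerate(dist):
            nxt[s] += pr * (1.0 - q)
            nxt[s + 1] += pr * q
        dist = nxt
    return sum(pr for s, pr in enumerate(dist) if s > threshold)


def bernstein_upper_bound(mean, eps):
    """The bound exp{-3 eps^2 / (2 (3 + eps)) * E[S]} quoted in the text."""
    return math.exp(-3.0 * eps ** 2 / (2.0 * (3.0 + eps)) * mean)


def compare(n, c):
    """Return (exact tail, Bernstein bound) for the threshold c * ln(n)."""
    means = [1.0 / k for k in range(1, n + 1)]
    mean = sum(means)  # harmonic number H_n = ln n + O(1)
    threshold = c * math.log(n)
    eps = threshold / mean - 1.0  # so that (1 + eps) * E[S] = c * ln n
    return tail_of_bernoulli_sum(means, threshold), bernstein_upper_bound(mean, eps)
```

Calling `compare(200, 1.5)` returns the exact probability followed by the bound; the first value should never exceed the second, although for such small $n$ the bound is far from tight.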

Proof of Proposition 4.4.

Fix $c\in(1,2)$ . Let $m=m(n)>c\ln n$ and $\beta=\beta(c)>0$ be as in (14). Let $T_{n}$ and $\bm{T}$ be as defined in (10) and (11) with $((K,L,X),(K^{\prime},L^{\prime},X^{\prime}))$ as in Lemma 5.4. Set $I_{v}={\mathbf{1}}_{[\mathrm{d}_{T_{n}}(v)\geq m]}$ and $J_{v}={\mathbf{1}}_{[\mathrm{d}_{\bm{T}}(v)\geq m]}$ for all $v\in[n]$ , so that $(I,J)=((I_{1},\ldots,I_{n}),(J_{1},\ldots J_{n}))$ is a coupling of the measures $\mu$ and $\nu$ .

Our goal is to bound ${\mathbb{P}}\left(I_{v}<J_{v}\right)={\mathbb{P}}\left(\mathrm{d}_{T_{n}}(v)<m\leq\mathrm{d}_{\bm{T}}(v)\right)$ . First, by Proposition 5.5,

[TABLE]

Next we obtain uniform bounds for the terms on the right-hand side. Recall that $\sigma_{n-1}$ is a uniformly random permutation independent of $L^{\prime}$ and that $\sigma_{n-1}(T_{n-1})\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}R_{n-1}$ . These facts, together with (14), give, for each $j\in[n-1]$ ,

[TABLE]

Combining these bounds, we get for any $v\in[n-1]$ ,

[TABLE]

∎

6. Conclusions and further research

The Robin-Hood pruning yields an interesting process $((T_{n},{\sigma}_{n}),\,n\geq 1)$ . By Theorem 1.1 and Proposition 1.2, $\sigma_{n}(T_{n})\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}R_{n}$ for all $n\geq 1$ ; that is, $T_{n}$ has the shape of a recursive tree. The novelty of this process is that the Robin-Hood pruning is a fairly complex dynamic of trees which has potential connections to mathematical models of social and economic networks and raises challenging theoretical questions.

First, only asymptotically about half the time $T_{n}$ is obtained from $T_{n-1}$ by simply attaching $n$ to a uniformly random vertex. To see this, recall that $\mathrm{d}_{T_{n}}(n)\stackrel{{\scriptstyle\mathrm{dist}}}{{=}}\min\{\mathrm{Geo}\left(1/2\right),|{\mathcal{S}}|\}$ where $|{\mathcal{S}}|\to\infty$ (see Fact 6.1 and Lemma 6.2). It follows that with probability tending to $1/2$ the newly added vertex will be a leaf. Second, for all $n\geq 1$ , $\mathrm{d}_{R_{n}}(n)=0$ a.s. , while Fact 5.1 and the distribution of the $\mathrm{H}$ -set yield

[TABLE]

Third, from time to time, a large proportion of edges will be rewired towards the newly added vertex, drastically reshaping the structure of the tree. For example, for any $a\in[0,1)$ ,

[TABLE]

As for applications, in the context of random networks, the Robin-Hood pruning has an interpretation in terms of ‘trends’; for example, a new vertex brings in a new idea to the network which may drastically rewire the interests or connections of established individuals in the network. The stamp history $\sigma_{n}$ gives a ranking between the elements of $T_{n}$ that determines the susceptibility of changing parents in the tree. Preferential attachment models are considered better models for real-world networks. It would be interesting to devise a similar pruning procedure that, acting on preferential attachment trees, preserves their scale-free degree distribution.

In the context of biology, Kingman’s coalescent is usually represented with increasing binary trees, keeping individuals as external nodes and adding an internal node for each merge between two lineages. The representation using $n$ -chains breaks the symmetry between the pairs of trees merging at each step. Thus, it is not clear how the Robin-Hood pruning process would have a significant interpretation in terms of the genealogical information.

Regardless of the perspective we use to motivate the process $((T_{n},\sigma_{n}),\,n\geq 1)$ , there are many interesting theoretical questions that would be worth pursuing. To name just a few:

(1) Understand the process describing how the parent and descendants of a given vertex change with time.

• Describe how the size of the subtree rooted at a fixed node $j$ evolves.

• How does the maximum size of such a subtree grow?

(2) Understand the maximum degree dynamics in both $(R_{n},\,n\geq 1)$ and $(T_{n},\,n\geq 1)$ .

• How often do the vertices attaining the maximum degree change?

• Are these dynamics the same for both processes?

(3) Determine whether there is a coupling for which the sequence $({\mathbf{1}}_{[\mathrm{d}_{T_{n}}(v)]},v\in[n])$ is negatively related (i.e. that conditions for (8) are satisfied), or similarly, whether the sequence is negatively orthant dependent.

Acknowledgements

I would like to thank Louigi Addario-Berry and Henning Sulzbach for some very helpful discussions, and the anonymous referees, who provided insight on how to improve the presentation of the results and additional references. This research was supported by FQRNT through PBEEE scholarship with number 169888.

Appendix A: Proof of Proposition 4.1

We use the representation of Kingman’s coalescent that consists of a chain $\mathbf{C}=(F_{n},\ldots,F_{1})$ and write $T^{(n)}$ for the unique tree contained in $F_{1}$ . By Proposition 1.3 we can work with the tree $T^{(n)}$ . The proof mimics that of [2, Proposition 4.2], but requires a little more care as we wish to obtain explicit error bounds.

For each $v,j\in[n]$ let $T_{j}(v)$ denote the tree in $F_{j}$ that contains vertex $v$ . For each $v\in[n]$ , the selection set of $v$ is defined as

[TABLE]

this set keeps record of the times when the tree containing $v$ merges. Finally, for each $2\leq j\leq n$ , we say that $\xi_{j}$ is favorable for vertices in $T_{a_{j}}^{(j)}$ (resp. vertices in $T_{b_{j}}^{(j)}$ ) if $\xi_{j}=1$ (resp. $\xi_{j}=0$ ).

The key property of Kingman’s coalescent is the following. For each $j\in{\mathcal{S}}_{n}(v)$ , if $\xi_{j}$ favors $v$ , then $r(T_{j}(v))$ increases its degree by one in the process; otherwise $r(T_{j}(v))$ attaches to the root of the other merging tree and the degree of $r(T_{j}(v))$ remains unchanged for the rest of the process. Since all vertices start the process as roots, $\mathrm{d}_{T^{(n)}}(v)$ is equal to the length of the first streak of favorable times for $v$ . Moreover, $(\xi_{j},\,j\in[n-1])$ are independent and distributed as $\mathrm{Bernoulli}\left(1/2\right)$ . Therefore we have the following distributional equivalence.

Fact 6.1.

Let $D$ be a random variable with distribution $\mathrm{Geo}\left(1/2\right)$ independent of ${\mathcal{S}}_{n}(v)$ . Then

[TABLE]
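The streak description of the degrees can be checked by direct simulation. The following is a minimal sketch under our own conventions (vertex labels $1,\ldots,n$, degree counted as number of children, and an explicit fair coin playing the role of $\xi_{j}$); it is not code from the paper. For each vertex it records the times $j$ at which its tree merges, together with whether the merge was favorable for that vertex.

```python
import random


def kingman_degrees(n, rng):
    """Simulate the chain F_n, ..., F_1 and return (degree, history).

    history[v] lists (j, favoured) for every time j at which the tree
    containing v is one of the two merging trees.
    """
    roots = list(range(1, n + 1))       # roots of the trees in the current forest
    members = {v: [v] for v in roots}   # vertices of the tree rooted at each root
    degree = {v: 0 for v in roots}      # number of children of each vertex
    history = {v: [] for v in roots}
    for j in range(n, 1, -1):           # the forest F_j has j trees
        a, b = rng.sample(roots, 2)     # two distinct trees chosen uniformly
        if rng.random() < 0.5:          # fair coin: which of the two is favoured
            a, b = b, a
        for v in members[a]:
            history[v].append((j, True))
        for v in members[b]:
            history[v].append((j, False))
        degree[a] += 1                  # favoured root a gains the root b as a child
        members[a].extend(members.pop(b))
        roots.remove(b)
    return degree, history


def first_streak(flags):
    """Length of the initial run of True values in flags."""
    count = 0
    for flag in flags:
        if not flag:
            break
        count += 1
    return count
```

In every run, the degree of each vertex should coincide with `first_streak` applied to its sequence of favorable flags, which is the deterministic identity behind the distributional statement of Fact 6.1.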

This fact, together with the next lemma, allows us to get estimates for the tails of $\mathrm{d}_{T^{(n)}}(v)$ .

Lemma 6.2.

Let $c\in(0,2)$ and $0<\varepsilon\leq 1-c/2$ . Writing $a=1-\varepsilon-c/2$ , we have

[TABLE]

Proof.

First, there are $j(j-1)/2$ distinct pairs of trees in $F_{j}$ , and exactly $j-1$ of these pairs contain $T_{j}(v)$ ; thus ${\mathbb{P}}\left(j\in{\mathcal{S}}_{n}(v)\right)=2/j$ . Since the merging trees are chosen independently at each time, for any $a\in[0,1)$ we have

[TABLE]

where the variables $B_{1},\ldots,B_{n}$ are independent Bernoulli variables with ${\mathbb{E}}\left[B_{i}\right]=2/i$ . The desired bound is then a straightforward application of Bernstein's inequalities (see, e.g., [18, Theorem 2.8 and (2.6)]). For a sum $S$ of independent $\{0,1\}$ -valued variables, we have ${\mathbb{P}}\left(S\leq{\mathbb{E}}\left[S\right]-t\right)\leq\exp\{-t^{2}/(2{\mathbb{E}}\left[S\right])\}$ . In this case, $S=\sum_{i=n^{a}}^{n}B_{i}$ and

[TABLE]

The result follows by setting $t=2\varepsilon\ln n+O(1)$ . ∎

Proposition 6.3.

If $c\in(0,2)$ and $m<c\ln n$ , then for $\varepsilon=(2-c)^{2}/4$ ,

[TABLE]

Proof of Proposition 6.3.

It follows from Fact 6.1 that

[TABLE]

The upper bound on ${\mathbb{P}}\left(\mathrm{d}_{T^{(n)}}(1)\geq m\right)$ is then trivial, while the lower bound follows by Lemma 6.2 using $\varepsilon=1-c/2$ and that ${\mathcal{S}}_{n}(v)={\mathcal{S}}_{n}(v)\setminus[1]$ . ∎

Now, consider two distinct vertices $v,w\in[n]$ . For $m\in\mathbb{N}$ , let $\mathcal{G}_{m}\in\{2,\ldots,n\}^{2}$ contain all pairs of selection sets that enable vertices $v$ and $w$ to have degree at least $m$ ; that is, $(A,B)\in\mathcal{G}_{m}$ only if

[TABLE]

Since the $\xi_{j}$ are independent of the selection times, we have that

[TABLE]

To estimate ${\mathbb{P}}\left(({\mathcal{S}}_{n}(v),{\mathcal{S}}_{n}(w))\in\mathcal{G}_{m}\right)$ we need more details on the dynamics of the model. We start with a simple tail bound for the following random variable; let

[TABLE]

Lemma 6.4.

For $a\in(0,1)$ , ${\mathbb{P}}\left(\tau>n^{a}\right)\leq 4n^{-a}$ .

Proof.

Vertices in $T^{(n)}$ are exchangeable, so we can take $v=1,w=2$ ; these vertices belong to distinct trees in $F_{j}$ for all $j\geq\tau$ . Additionally, by the ordering convention of trees in $F_{j}$ , it follows that $T_{j}(1)=1$ and $T_{j}(2)=2$ for all $j\geq\tau$ .

We claim that for all $2<k\leq n$ ,

[TABLE]

This follows by induction on $n-k$ . Clearly, $\tau=n$ only if $\{a_{n},b_{n}\}=\{1,2\}$ which occurs with probability $\frac{2}{n(n-1)}$ , thus ${\mathbb{P}}\left(\tau\leq n-1\right)$ satisfies the equation above. For $k<n$ , we have

[TABLE]

Next, for $k$ large enough,

[TABLE]

The second inequality uses that $1-x>e^{-2x}$ for $x>0$ sufficiently small, followed by the fact that $e^{-\sum 2x_{j}}>1-\sum 2x_{j}$ . The result follows with $k=n^{a}$ . ∎
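The tail bound of Lemma 6.4 can also be checked with exact arithmetic. The sketch below assumes that the claim in the proof is the product form ${\mathbb{P}}\left(\tau<k\right)=\prod_{j=k}^{n}\left(1-\frac{2}{j(j-1)}\right)$ , which is what the merge probability $\frac{2}{j(j-1)}$ used in the induction suggests; the closed form in `closed_form` is our own telescoping of that product and does not appear in the text.

```python
from fractions import Fraction


def prob_tau_below(n, k):
    """Assumed P(tau < k): the two tagged trees do not merge at steps j = n, ..., k."""
    prob = Fraction(1)
    for j in range(k, n + 1):
        prob *= 1 - Fraction(2, j * (j - 1))
    return prob


def closed_form(n, k):
    """Telescoped value of the product: (k - 2)(n + 1) / (k (n - 1))."""
    return Fraction((k - 2) * (n + 1), k * (n - 1))


def prob_tau_above(n, k):
    """P(tau > k) = 1 - P(tau < k + 1), for 2 <= k < n."""
    return 1 - prob_tau_below(n, k + 1)
```

Under this assumption ${\mathbb{P}}\left(\tau>k\right)=\frac{2(n-k)}{(k+1)(n-1)}$ , which is at most $4/k$ for every $2\leq k<n$ and not only for $k$ large, in agreement with the statement of the lemma.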

Lemma 6.5.

If $c\in(0,2)$ and $m<c\ln n$ , then for any $\gamma<\frac{1}{4}(1-c+\sqrt{1+2c-c^{2}})$ ,

[TABLE]

Proof.

For each $\varepsilon\in(0,1-c/2]$ write $a=a(\varepsilon)=1-\varepsilon-c/2$ , then

[TABLE]

Before establishing (16), we note that the terms in the right-hand side of (16) are bounded by Lemmas 6.4 and 6.2, respectively. Since such bounds depend on the choice of $\varepsilon$ , we can use

[TABLE]

The last equality holds since the functions to be minimized are decreasing and increasing, respectively, on the $(0,1)$ interval. It then follows that the maximum is attained when $0<\varepsilon<1-c/2$ satisfies $1-\varepsilon-c/2=\varepsilon^{2}/(\varepsilon+\frac{c}{2})$ .
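For completeness, the balancing equation can be solved explicitly; the following short computation is ours and only uses the equation stated above. Writing $s=c/2\in(0,1)$ and multiplying both sides by $\varepsilon+s$ ,

$$
\begin{aligned}
(1-\varepsilon-s)(\varepsilon+s)=\varepsilon^{2}
&\iff 2\varepsilon^{2}+(2s-1)\varepsilon+s^{2}-s=0\\
&\iff \varepsilon=\frac{1-2s\pm\sqrt{(2s-1)^{2}-8(s^{2}-s)}}{4}
=\frac{1-c\pm\sqrt{1+2c-c^{2}}}{4}.
\end{aligned}
$$

Since the product of the two roots is $(s^{2}-s)/2<0$ , exactly one root is positive, namely $\varepsilon=\frac{1}{4}(1-c+\sqrt{1+2c-c^{2}})$ . This is the same expression that appears as the threshold for $\gamma$ in the statement of Lemma 6.5.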

We now proceed to establish equation (16). At step $\tau$ , exactly one of $v$ and $w$ is favored by $\xi_{\tau}$ . Thus, at least one of $v$ or $w$ gets its degree fixed for the remainder of the process. Therefore,

[TABLE]

By intersecting with the event $\tau>n^{a}$ , and the exchangeability of vertices in $T^{(n)}$ we get,

[TABLE]

from which (16) follows. ∎

Proof of Proposition 4.1.

Fix $c\in(0,2)$ , $m=m(n)<c\ln n$ and let $I_{v},J_{v}$ be defined as in Proposition 4.1. By Proposition 1.3, it follows that ${\mathbb{E}}\left[I_{v}\right]={\mathbb{P}}\left(\mathrm{d}_{T^{(n)}}(v)\geq m\right)$ and

[TABLE]

the last equality by (15). Lemma 6.5 and Proposition 6.3 then give that for $\alpha<\frac{1}{4}(1-c+\sqrt{1+2c-c^{2}})$ ,

[TABLE]

∎

Bibliography


[1] L. Addario-Berry. Partition functions of discrete coalescents: from Cayley’s formula to Frieze’s $\zeta(3)$ limit theorem. In XI Symposium on Probability and Stochastic Processes. Progress in Probability, volume 68. Birkhäuser, Basel.
[2] L. Addario-Berry and L. Eslava. High degree of random recursive trees. Random Structures Algorithms, 52:560–575, 2018.
[3] C. W. Anderson. Extreme value theory for a class of discrete distributions with applications to some stochastic processes. J. Appl. Probability, 7:99–113, 1970.
[4] M. Blum, O. François and S. Janson. The mean, the variance and limiting distributions of two statistics sensitive to phylogenetic tree balance. Ann. Appl. Probability, 16:2195–2214, 2006.
[5] H. Chang and M. Fuchs. Limit theorems for patterns in phylogenetic trees. J. Math. Biol., 60(4):481–512, 2010.
[6] L. Devroye. Branching processes in the analysis of the heights of trees. Acta Inform., 24(3):277–298, 1987.
[7] L. Devroye and J. Lu. The strong convergence of maximal degrees in uniform random recursive trees and dags. Random Structures Algorithms, 7(1):1–14, 1995.
[8] D. Dubhashi and D. Ranjan. Balls and bins: a study in negative dependence. Random Structures Algorithms, 13(2):99–124, 1998.
