Random Self-Similar Trees: A mathematical theory of Horton laws

Yevgeniy Kovchegov; Ilya Zaliapin

arXiv:1905.02629·math.PR·May 21, 2019

TL;DR

This paper develops a mathematical framework for understanding Horton laws in hierarchical trees, linking their self-similarity and invariance properties to pruning operations, with applications across various scientific disciplines.

Contribution

It provides a unified mathematical theory connecting Horton laws to pruning and self-similarity in trees, advancing the understanding of branching structures.

Findings

01

Horton laws are characterized as invariants under pruning.

02

Self-similarity explains the universality of Horton laws across disciplines.

03

Pruning operations are essential for modeling branching and coalescent processes.

Abstract

The Horton laws originated in hydrology with a 1945 paper by Robert E. Horton, and for a long time remained a purely empirical finding. Ubiquitous in hierarchical branching systems, the Horton laws have been rediscovered in many disciplines ranging from geomorphology to genetics to computer science. Attempts to build a mathematical foundation behind the Horton laws during the 1990s revealed their close connection to the operation of pruning -- erasing a tree from the leaves down to the root. This survey synthesizes recent results on invariances and self-similarities of tree measures under various forms of pruning. We argue that pruning is an indispensable instrument for describing branching structures and representing a variety of coalescent and annihilation dynamics. The Horton laws appear as a characteristic imprint of self-similarity, which settles some questions prompted by…

Tables (1)

Table 1: Mean size, $\mathsf{E}[X_{K}]=\mathcal{N}_{1}[K]$, and length, $\mathsf{E}[Y_{K}]=\mathsf{E}[\textsc{length}(T)]$, of a critical binary Galton-Watson tree $T\stackrel{d}{\sim}\mathsf{GW}(1)$; here $c=2$, $R=4$.

$\mathsf{ord}(T)$	$\mathcal{N}_{1}[K]$	$\mathsf{E}[\textsc{length}(T)]$	$2-\frac{\mathsf{E}[Y_{K}]}{\mathsf{E}[X_{K}]}$	$4-\frac{\mathsf{E}[X_{K}]}{\mathsf{E}[X_{K-1}]}$
$1$	$1$	$1$	$1$	–
$2$	$3$	$5$	$1 / 3$	$1$
$3$	$11$	$21$	$9 \times 10^{- 2}$	$1 / 3$
$4$	$43$	$85$	$2 \times 10^{- 2}$	$9 \times 10^{- 2}$
$5$	$171$	$341$	$6 \times 10^{- 3}$	$2 \times 10^{- 2}$
$6$	$683$	$1365$	$1 \times 10^{- 3}$	$6 \times 10^{- 3}$
$7$	$2731$	$5461$	$4 \times 10^{- 4}$	$1 \times 10^{- 3}$
$8$	$10923$	$21845$	$9 \times 10^{- 5}$	$4 \times 10^{- 4}$
$9$	$43691$	$87381$	$2 \times 10^{- 5}$	$9 \times 10^{- 5}$
$10$	$174763$	$349525$	$6 \times 10^{- 6}$	$2 \times 10^{- 5}$
$11$	$699051$	$1398101$	$1 \times 10^{- 6}$	$6 \times 10^{- 6}$
$12$	$2796203$	$5592405$	$4 \times 10^{- 7}$	$1 \times 10^{- 6}$
$13$	$11184811$	$22369621$	$9 \times 10^{- 8}$	$4 \times 10^{- 7}$
$14$	$44739243$	$89478485$	$2 \times 10^{- 8}$	$9 \times 10^{- 8}$
$15$	$178956971$	$357913941$	$6 \times 10^{- 9}$	$2 \times 10^{- 8}$
$16$	$715827883$	$1431655765$	$1 \times 10^{- 9}$	$6 \times 10^{- 9}$
$17$	$2863311531$	$5726623061$	$3 \times 10^{- 10}$	$1 \times 10^{- 9}$
$18$	$11453246123$	$22906492245$	$9 \times 10^{- 11}$	$3 \times 10^{- 10}$
$19$	$45812984491$	$91625968981$	$2 \times 10^{- 11}$	$9 \times 10^{- 11}$
$20$	$183251937963$	$366503875925$	$5 \times 10^{- 12}$	$2 \times 10^{- 11}$
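The tabulated values appear to satisfy $\mathcal{N}_{1}[K]=(2\cdot 4^{K-1}+1)/3$ and $\mathsf{E}[Y_{K}]=2\,\mathsf{E}[X_{K}]-1$. These closed forms are inferred here from the table entries themselves, not quoted from the text, so treat them as an assumption. Under that assumption, the two deviation columns are exactly $1/\mathcal{N}_{1}[K]$ and $1/\mathcal{N}_{1}[K-1]$, which is consistent with the rounded entries and with the convergence of the ratios to $c=2$ and $R=4$. The sketch below regenerates the rows with exact rational arithmetic.

```python
from fractions import Fraction

def n1(k: int) -> int:
    """Mean size N_1[K]; closed form inferred from Table 1 (an assumption)."""
    return (2 * 4 ** (k - 1) + 1) // 3  # numerator is always divisible by 3

def mean_length(k: int) -> int:
    """Mean length E[length(T)]; inferred relation E[Y_K] = 2 E[X_K] - 1."""
    return 2 * n1(k) - 1

def table_rows(k_max: int = 20):
    """Rows (K, N_1[K], E[length], 2 - E[Y_K]/E[X_K], 4 - E[X_K]/E[X_{K-1}])."""
    rows = []
    for k in range(1, k_max + 1):
        x, y = n1(k), mean_length(k)
        dev_length = 2 - Fraction(y, x)
        # The last column is undefined for K = 1 (shown as a dash in the table).
        dev_ratio = 4 - Fraction(x, n1(k - 1)) if k > 1 else None
        rows.append((k, x, y, dev_length, dev_ratio))
    return rows

for k, x, y, d1, d2 in table_rows(5):
    print(k, x, y, f"{float(d1):.0e}", "-" if d2 is None else f"{float(d2):.0e}")
```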

Equations (1446)

\frac{N_{K}}{N_{K+1}} = R \Leftrightarrow N_{K} \propto R^{-K}

\textsc{length}(T) = \sum_{i=1}^{\#T} l_{i}.

\textsc{height}(T) = \max_{1 \leq i \leq \#T} d(v_{i}, \rho).
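For a tree stored as a parent array with edge lengths, each functional is a single pass: the length sums the $l_{i}$, and the height takes the largest root distance $d(v_{i},\rho)$. A minimal Python sketch, assuming vertices are indexed $0$ (the root $\rho$) through $\#T$ in depth-first order, so every parent index precedes its offspring; the names `parent` and `edge_len` are hypothetical.

```python
def tree_length(edge_len: dict) -> float:
    """length(T): sum of the lengths l_i of the parental edges e_i."""
    return sum(edge_len.values())

def tree_height(parent: dict, edge_len: dict) -> float:
    """height(T): maximum over vertices of the distance d(v_i, rho)."""
    depth = {0: 0.0}  # vertex 0 is the root rho
    for v in sorted(parent):  # depth-first indexing guarantees parent(v) < v
        depth[v] = depth[parent[v]] + edge_len[v]
    return max(depth.values())

# Hypothetical planted tree: rho - 1, and vertex 1 branches to 2 and 3.
parent = {1: 0, 2: 1, 3: 1}
edge_len = {1: 1.0, 2: 0.5, 3: 2.0}
print(tree_length(edge_len), tree_height(parent, edge_len))
```

The empty tree $\phi$ (no edges) gets length and height $0$.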

d(w, x) + d(y, z) \leq \max\{d(w, y) + d(x, z),\ d(x, y) + d(w, z)\}.
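The four-point condition holds for every quadruple of points of a tree with edge lengths. The brute-force sketch below, on a hypothetical small weighted tree with depth-first vertex indexing, computes path distances through the lowest common ancestor and checks the inequality over all quadruples of vertices. It is meant for illustration only; the quartic loop is not an efficient test.

```python
from itertools import product

def root_paths(parent, edge_len):
    """Map each vertex to its ancestors (itself first) and to its root distance."""
    anc, dist = {0: [0]}, {0: 0.0}
    for v in sorted(parent):  # parent(v) < v under depth-first indexing
        anc[v] = [v] + anc[parent[v]]
        dist[v] = dist[parent[v]] + edge_len[v]
    return anc, dist

def tree_distance(u, w, anc, dist):
    """d(u, w) = d(u, rho) + d(w, rho) - 2 d(lca(u, w), rho)."""
    w_ancestors = set(anc[w])
    lca = next(a for a in anc[u] if a in w_ancestors)  # deepest common ancestor
    return dist[u] + dist[w] - 2 * dist[lca]

def four_point_holds(parent, edge_len, tol=1e-12):
    """Check d(w,x) + d(y,z) <= max(d(w,y) + d(x,z), d(x,y) + d(w,z)) for all quadruples."""
    anc, dist = root_paths(parent, edge_len)
    d = lambda a, b: tree_distance(a, b, anc, dist)
    for w, x, y, z in product(anc, repeat=4):
        if d(w, x) + d(y, z) > max(d(w, y) + d(x, z), d(x, y) + d(w, z)) + tol:
            return False
    return True

parent = {1: 0, 2: 1, 3: 1, 4: 3, 5: 3}
edge_len = {1: 1.0, 2: 0.5, 3: 2.0, 4: 0.25, 5: 1.5}
print(four_point_holds(parent, edge_len))
```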

T \equiv \mathcal{R}^{0}(T) \to \mathcal{R}^{1}(T) \to \dots \to \mathcal{R}^{k}(T) = \phi,

\mathsf{ord}(T) = \min\{k \geq 0 : \mathcal{R}^{k}(T) = \phi\}.

\mathsf{ord}(p) = \begin{cases} r & \text{if } \#\{s : i_{s} = r\} = 1, \\ r + 1 & \text{otherwise}. \end{cases}

\mathsf{ord}(p) = \max(i, j) + \delta_{ij} = \lfloor \log_{2}(2^{i} + 2^{j}) \rfloor,
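For a binary merger of branches with orders $i$ and $j$, the two closed forms agree: if $i=j$ then $2^{i}+2^{j}=2^{i+1}$, and if $i>j$ then $2^{i}<2^{i}+2^{j}<2^{i+1}$. A quick exhaustive check in Python, using the integer method `bit_length()` so that $\lfloor\log_{2} n\rfloor$ is computed exactly rather than in floating point:

```python
def merge_order(i: int, j: int) -> int:
    """Order of a vertex whose two offspring have orders i and j: max(i, j) + delta_ij."""
    return max(i, j) + (1 if i == j else 0)

def merge_order_log2(i: int, j: int) -> int:
    """Same order via floor(log2(2^i + 2^j)); bit_length() - 1 is an exact floor(log2)."""
    return (2 ** i + 2 ** j).bit_length() - 1

# Exhaustive comparison over a range of orders.
assert all(merge_order(i, j) == merge_order_log2(i, j)
           for i in range(1, 60) for j in range(1, 60))
```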

\mathcal{R}(\textsc{embed}(T)) = \textsc{embed}(\mathcal{R}(T)),

\mu_{K}(T) = \mu(T \mid T \in \mathcal{H}_{K})

\mu = \sum_{K=1}^{\infty} p_{K} \mu_{K}.

\nu(T)=\mu\circ\mathcal{R}^{-1}(T)=\mu\big{(}\mathcal{R}^{-1}(T)\big{)}.

\nu(T \mid T \neq \phi) = \mu(T).

p_{K} = p(1 - p)^{K-1}, \quad K \geq 1,

\mu_{K+1}(\mathcal{R}^{-1}(T)) = \mu_{K}(T).

\mathcal{R}^{-1}(\mathcal{H}_{K-1}) = \mathcal{H}_{K}, \quad K \geq 2.

\mu(\mathcal{R}^{-1}(T)) = \mu(T)\,(1 - \mu(\mathcal{H}_{1})).

\mu(\mathcal{H}_{K}) \overset{\eqref{H_shift}}{=} \mu(\mathcal{R}^{-1}(\mathcal{H}_{K-1})) \overset{\eqref{def:pi1}}{=} (1 - \mu(\mathcal{H}_{1}))\,\mu(\mathcal{H}_{K-1}),
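Iterating this recursion down to $\mathcal{H}_{1}$ yields the geometric order distribution with $p=\mu(\mathcal{H}_{1})$. Combined with $\mu(T)=\mu(\mathcal{H}_{K})\,\mu_{K}(T)$ for $T\in\mathcal{H}_{K}$, which follows from the definition of $\mu_{K}$, this gives the decomposition of $\mu(T)$ listed next.

```latex
\mu(\mathcal{H}_{K}) = (1-\mu(\mathcal{H}_{1}))\,\mu(\mathcal{H}_{K-1})
  = (1-\mu(\mathcal{H}_{1}))^{2}\,\mu(\mathcal{H}_{K-2})
  = \dots
  = \mu(\mathcal{H}_{1})\,(1-\mu(\mathcal{H}_{1}))^{K-1},
\qquad\text{so}\qquad
p_{K} = p\,(1-p)^{K-1},\quad p = \mu(\mathcal{H}_{1}).
```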

\mu(T) = \mu(\mathcal{H}_{1})\,(1 - \mu(\mathcal{H}_{1}))^{K-1}\,\mu_{K}(T),

\mu(\mathcal{R}^{-1}(T)) = \mu(\mathcal{H}_{1})\,(1 - \mu(\mathcal{H}_{1}))^{K}\,\mu_{K+1}(\mathcal{R}^{-1}(T)).

l_{T} = x_{T} \cdot \textsc{length}(T).

\text{Combinatorial shape: } \mu(\tau) = \mathsf{Law}(\textsc{shape}(T) = \tau),

\text{Relative edge lengths: } \chi_{\tau}(\bar{x}) = \mathsf{Law}(x_{T} = \bar{x} \mid \textsc{shape}(T) = \tau),

\text{Total tree length: } F_{\tau,\bar{x}}(\ell) = \mathsf{Law}(\textsc{length}(T) = \ell \mid x_{T} = \bar{x},\ \textsc{shape}(T) = \tau).

\Xi_{\tau}(\bar{x})={\sf Law}\left(x_{\mathcal{R}(T)}=\bar{x}\,|\,\textsc{shape}\big{(}\mathcal{R}(T)\big{)}=\tau\right)

\Phi_{\tau,\bar{x}}(\ell)={\sf Law}\left(\textsc{length}\big{(}\mathcal{R}(T)\big{)}=\ell\,|\,x_{\mathcal{R}(T)}=\bar{x},~{}\textsc{shape}\big{(}\mathcal{R}(T)\big{)}=\tau\right).

\mu_{K}^{H}(\tau) = \mu_{K}(\tau) \quad \forall \tau \in \mathcal{H}_{K}.

\chi_{\tau}^{H}(\bar{x}) = \chi_{\tau}(\bar{x}),

F_{\tau,\bar{x}}^{H}(\ell) = F_{\tau,\bar{x}}(\ell).

\mu(\tau) = \nu(\tau \mid \tau \neq \phi).

\Xi_{\tau}(\bar{x}) = \chi_{\tau}(\bar{x})

Full text

Random Self-Similar Trees:

A mathematical theory of Horton laws

Yevgeniy Kovchegov, Department of Mathematics, Oregon State University

2000 SW Campus Way, Corvallis, OR 97331-4605

Ilya Zaliapin, Department of Mathematics and Statistics, University of Nevada Reno

1664 North Virginia St., Reno, NV 89557-0084

Abstract

The Horton laws originated in hydrology with a 1945 paper by Robert E. Horton, and for a long time remained a purely empirical finding. Ubiquitous in hierarchical branching systems, the Horton laws have been rediscovered in many disciplines ranging from geomorphology to genetics to computer science. Attempts to build a mathematical foundation behind the Horton laws during the 1990s revealed their close connection to the operation of pruning – erasing a tree from the leaves down to the root. This survey synthesizes recent results on invariances and self-similarities of tree measures under various forms of pruning. We argue that pruning is an indispensable instrument for describing branching structures and representing a variety of coalescent and annihilation dynamics. The Horton laws appear as a characteristic imprint of self-similarity, which settles some questions prompted by geophysical data.

MSC classes: 05C05, 05C80, 05C63, 58-02.

This is an original survey paper.

The work is supported by FAPESP award 2018/07826-5 and by NSF award DMS-1412557.

The work is supported by NSF award EAR-1723033.

1 Introduction
1.1 Early empirical evidence
1.2 Survey structure
2 Definitions and notations
2.1 Spaces of finite rooted trees
2.2 Real trees
2.3 Horton pruning
2.4 Horton-Strahler orders
2.5 Alternative definitions of Horton-Strahler orders
2.6 Tokunaga indices and side branching
2.7 Labeling edges
2.8 Galton-Watson trees
3 Self-similarity with respect to Horton pruning
3.1 Self-similarity of a combinatorial tree
3.2 Self-similarity of a tree with edge lengths
3.3 Mean self-similarity of a combinatorial tree
3.4 Examples of self-similar trees
4 Horton law in self-similar trees
4.1 Proof of Theorem 1
4.2 Well-defined asymptotic Horton ratios
4.3 Entropy and information theory
4.4 Applications
4.4.1 Hydrology
4.4.2 Computer science
5 Critical binary Galton-Watson tree
5.1 Combinatorial case
5.1.1 Horton and Tokunaga self-similarities
5.1.2 Dynamics of branching probabilities under Horton pruning
5.1.3 The Central Limit Theorem and the strong Horton law for branch counts
5.2 Metric case
5.2.1 Length of a Galton-Watson random tree ${\sf GW}(\lambda)$
5.2.2 Height of a Galton-Watson random tree ${\sf GW}(\lambda)$
6 Hierarchical Branching Process
6.1 Definition and main properties
6.2 Hydrodynamic limit
6.3 Criticality and time invariance
6.3.1 Definitions
6.3.2 Criticality and time-invariance in a self-similar process
6.4 Closed form solution for equally distributed branch lengths
6.5 Critical Tokunaga process
6.6 Martingale approach
6.6.1 Markov tree process
6.6.2 Martingale representation of tree size and length
6.6.3 Strong Horton laws in a critical Tokunaga tree
6.7 Combinatorial HBP: Geometric Branching Process
6.7.1 Definition and main properties
6.7.2 Tokunaga self-similarity of time invariant process
6.7.3 Frequency of orders in a large critical Tokunaga tree
6.7.4 Proof of Theorem 15
7 Tree representation of continuous functions
7.1 Harris path
7.2 Level set tree
7.2.1 Tamed functions: finite number of local extrema
7.2.2 General case
7.3 Reciprocity of Harris path and level set tree
7.4 Horton pruning of positive excursions
7.5 Excursion of a symmetric random walk
7.6 Exponential random walks
7.7 Geometric random walks and critical non-binary Galton-Watson trees
7.8 White noise and Kingman’s coalescent
7.8.1 Coalescent processes, trees
7.8.2 White noise
7.9 Level set trees on higher dimensional manifolds and Morse theory
8 Kingman’s coalescent process
8.1 Smoluchowski-Horton ODEs for Kingman’s coalescent
8.2 Hydrodynamic limit
8.3 Some properties of the Smoluchowski-Horton system of ODEs
8.3.1 Simplifying the Smoluchowski-Horton system of ODEs
8.3.2 Rescaling to $[0,1]$ interval
8.4 Proof of the existence of the root-Horton limit
8.4.1 Proof of Lemma 26 and related results
8.4.2 Proof of Lemma 27 and related results
9 Generalized dynamical pruning
9.1 Examples of generalized dynamical pruning
9.1.1 Example: pruning via the tree height
9.1.2 Example: pruning via the Horton-Strahler order
9.1.3 Example: pruning via the total tree length
9.1.4 Example: pruning via the number of leaves
9.2 Pruning for $\mathbb{R}$ -trees
9.3 Relation to other generalizations of pruning
9.4 Invariance with respect to the generalized dynamical pruning
9.5 Prune invariance of ${\sf GW}(\lambda)$
10 Continuum 1-D ballistic annihilation
10.1 Continuum model, sinks, and shock trees
10.2 Piece-wise linear potential with unit slopes
10.2.1 Graphical representation of the shock wave tree
10.2.2 Structure of the shock wave tree
10.2.3 Ballistic annihilation as generalized pruning
10.3 Ballistic annihilation of an exponential excursion
10.4 Random sink in an infinite exponential potential
10.5 Real tree description of ballistic annihilation
10.5.1 $\mathbb{R}$ -tree representation of ballistic annihilation
10.5.2 Metric spaces on the set of initial particles
10.5.3 Other prunings on $\mathbb{T}$
11 Infinite trees built from leaves down
11.1 Infinite plane trees built from the leaves down
11.2 Infinite exponential critical binary Galton-Watson tree built from the leaves down
11.3 Continuum annihilation
12 Some open problems
A Weak convergence results of Kurtz for density dependent population processes
B Characterization of exponential random variables
C Notations
D Standard distributions
E Tree functions and mappings

1 Introduction

Invariance of Galton-Watson tree measures with respect to pruning (erasure) that begins at the leaves and progresses down to the tree root has been recognized since the late 1980s. Both continuous [105] and discrete [29] versions of pruning have been studied. The prune-invariance of the trees naturally translates into symmetries of the respective Harris paths [65]. The richness of this connection is supported by the well-studied embeddings of Galton-Watson trees in the excursions of random walks and Brownian motions (e.g., [107, 89, 116]). This provides a point of departure for this survey of recent results on prune-invariance, and the more restrictive self-similarity, of tree measures and related stochastic processes on the real line. While the critical Galton-Watson tree and its Harris path (which is known to be a random walk) serve as an important example, the results extend to trees with more complicated structure and to non-Markovian Harris paths. The main focus is on discrete Horton pruning for finite trees (Sects. 2-8), yet we also consider infinite and real trees, and general forms of pruning (Sects. 9-11). Looking at random trees through the prism of self-similarity offers a concise parameterization of the respective measures via their Tokunaga sequences (Sect. 3), and uncovers a variety of structures and symmetries (e.g., Thms. 1, 12, 15, 23, 24). The surveyed results suggest that particular forms of pruning may underlie the evolution of familiar dynamical systems, allowing their efficient analytical treatment (Sects. 8, 10). They also pose new questions related to random self-similar trees.

We begin by summarizing the key empirical observations that provided an impetus for the topic (Sect. 1.1) and discussing the structure and main results of this survey (Sect. 1.2). Here, we keep the references to a minimum and indicate the survey sections where further information can be found.

1.1 Early empirical evidence

The theory of random self-similar trees originated in the studies of river networks, which supplied the key empirical observations reviewed below.

Horton-Strahler orders (Sects. 2.4, 2.5). Informally, orders quantify the importance of vertices and edges in the tree hierarchy. It is natural to agree that a vertex and its parental edge have the same order; hence, we only need to order vertices. In a perfect binary tree (where all leaves are located at the same depth, i.e., at the same distance from the root) one can assign orders that decrease linearly with the vertex depth; see Fig. 1(a). In other words, we start with order $1$ at the leaves and increase the order by unity with every step towards the root.

A celebrated ordering scheme that generalizes this idea to an arbitrary tree (not necessarily binary) was originally developed by Robert E. Horton [70] and later redesigned by Arthur N. Strahler [129] into its present form. It assigns integer orders to tree vertices and edges, beginning with order $1$ at the leaves and increasing the order by unity every time a pair of edges of the same order meets at a vertex; see Fig. 1(b). A maximal sequence of adjacent vertices/edges with the same order is called a branch.
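The ordering rule can be computed in one bottom-up pass. A minimal Python sketch, assuming (for illustration only) that a tree is stored as nested tuples in which `()` is a leaf and a tuple lists the offspring subtrees of a vertex. For non-binary trees it applies the general rule: a vertex receives the maximal offspring order $r$ if $r$ is attained once, and $r+1$ otherwise. Branch counts $N_{K}$ then follow by counting vertices whose order differs from that of their parent, since each branch starts at exactly one such vertex.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical representation: () is a leaf; (t1, ..., tm) lists offspring subtrees.
Tree = tuple

@lru_cache(maxsize=None)
def strahler(t: Tree) -> int:
    """Horton-Strahler order of the vertex at the top of subtree t."""
    if not t:
        return 1  # leaves have order 1
    orders = [strahler(child) for child in t]
    r = max(orders)
    # The order increases only if the maximal offspring order is attained at least twice.
    return r if orders.count(r) == 1 else r + 1

def branch_counts(t: Tree) -> Counter:
    """N_K: number of branches (maximal chains of equal order) of each order K."""
    counts = Counter()
    stack = [(t, None)]  # (subtree, order of its parent vertex)
    while stack:
        node, parent_order = stack.pop()
        k = strahler(node)
        if k != parent_order:
            counts[k] += 1  # a new branch starts at this vertex
        stack.extend((child, k) for child in node)
    return counts

perfect3 = (((), ()), ((), ()))  # perfect binary tree of order 3
print(strahler(perfect3), branch_counts(perfect3))
```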

An example of Horton-Strahler ordering is shown in Fig. 2(a) for a small river network in the south-central US. Here, the orders serve as a good proxy for (the logarithm of) various physical characteristics of river channels: channel length, the area of the contributing basin, etc. The Horton-Strahler orders (a.k.a. Strahler numbers) provide an efficient ranking of the tree branches and have proven essential in numerous fields (see Sect. 4.4). For example, the highest-order channel in a river basin commonly coincides with the basin’s namesake river (e.g., the Amazon River is the highest-order channel of the Amazon basin). Remarkably, such an identification can be made using purely combinatorial properties of the basin. Further examples of Horton-Strahler ordering are shown in Figs. 8, 9, 10.

Horton laws and Horton exponents (Sect. 4). A geometric decay of the number of branches of increasing Horton-Strahler orders was first described by Robert E. Horton [70] in a study of river stream networks. Since then, the Horton law and its ramifications have proven indispensable in hydrology and have been reported in multiple other areas; see Sect. 4.4 for details and references.

The Horton law for branch numbers states that the numbers $N_{K}$ of channels (branches) of order $K$ in a large basin decay geometrically with the order:

$\frac{N_{K}}{N_{K+1}}=R \quad\Leftrightarrow\quad N_{K}\propto R^{-K}$

for some Horton exponent $R>1$ . Figure 3(a) illustrates the Horton law for branch numbers in the Beaver creek network of Fig. 2(a). In this basin, we find $R\approx 4.55$ .
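Given branch counts $N_{1},\dots,N_{\Omega}$ from one basin, one simple way to estimate $R$ is a least-squares fit of $\log N_{K}$ against $K$, setting $R=e^{-\text{slope}}$. The sketch below shows this estimator; it is one of several possible choices and not necessarily the procedure behind Fig. 3. Because the Beaver creek counts are not reproduced here, it is demonstrated on the exact counts $N_{K}=2^{\Omega-K}$ of a perfect binary tree (each vertex of order $K$ forms its own branch), for which $R=2$.

```python
import math

def horton_exponent(counts: dict) -> float:
    """Estimate R from {K: N_K} via least squares of log N_K on K (N_K ~ C * R**(-K)).

    Requires counts for at least two distinct orders.
    """
    ks = sorted(counts)
    ys = [math.log(counts[k]) for k in ks]
    k_bar = sum(ks) / len(ks)
    y_bar = sum(ys) / len(ys)
    slope = (sum((k - k_bar) * (y - y_bar) for k, y in zip(ks, ys))
             / sum((k - k_bar) ** 2 for k in ks))
    return math.exp(-slope)

# Perfect binary tree of order 6: N_K = 2**(6 - K), so R = 2.
omega = 6
perfect_counts = {k: 2 ** (omega - k) for k in range(1, omega + 1)}
print(horton_exponent(perfect_counts))
```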

The Horton laws are also found for multiple other river statistics (basin area, basin magnitude, channel length, etc.), with different Horton exponents. Figure 3(b) illustrates the Horton laws for the average magnitude (the number of leaves) $M_{K}$ in a subbasin of order $K$ , and the average number $L_{K}$ of edges in a channel of order $K$ in the Beaver creek network of Fig. 2(a). The respective Horton exponents here are $R_{M}\approx 4.55$ (for magnitude) and $R_{L}\approx 2.275$ (for edge number).

Horton pruning and its generalizations (Sects. 2.3, 9). The Horton-Strahler orders are naturally connected to the Horton pruning operation, which erases the leaves of a tree together with the adjacent edges and removes the degree-$2$ vertices that might result from such erasure. Figure 2 illustrates a consecutive application of Horton pruning to the Beaver creek network. The channels (branches) of order $K$ are erased at the $K$-th iteration of Horton pruning. The mathematical theory of Horton laws concerns tree measures that are invariant with respect to Horton pruning. We also introduce a generalized dynamical pruning that allows one to erase a metric tree from the leaves down to the root in different ways, both continuous (metric) and discrete (combinatorial), and consider the respective prune-invariance.
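Horton pruning can be sketched on a nested-tuple representation (an illustrative assumption, not the paper's data structure): `()` is a leaf, a planted tree is represented by the subtree below the root's only offspring, and `None` stands for the empty tree $\phi$. One pruning step deletes every leaf and then suppresses any vertex left with a single offspring, merging the two edges around it. Iterating until $\phi$ is reached counts the Horton-Strahler order, $\mathsf{ord}(T)=\min\{k\geq 0:\mathcal{R}^{k}(T)=\phi\}$.

```python
from typing import Optional

# Hypothetical representation: () is a leaf; (t1, ..., tm) lists offspring subtrees.
Tree = tuple

def horton_prune(t: Tree) -> Optional[Tree]:
    """One Horton pruning step R(T); returns None for the empty tree phi."""
    if not t:
        return None  # a leaf is erased together with its parental edge
    kept = [s for s in (horton_prune(child) for child in t) if s is not None]
    if not kept:
        return ()  # all offspring were leaves: this vertex becomes a leaf
    if len(kept) == 1:
        return kept[0]  # degree-2 vertex removed; its two edges merge
    return tuple(kept)

def order_by_pruning(t: Optional[Tree]) -> int:
    """Number of Horton prunings needed to reduce T to the empty tree phi."""
    k = 0
    while t is not None:
        t = horton_prune(t)
        k += 1
    return k

cherry = ((), ())
print(horton_prune((cherry, cherry)), order_by_pruning((cherry, cherry)))
```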

Tokunaga model (Sects. 6.5, 6.6, 6.7). A notable observation inherited from the study of river networks is the Tokunaga law [133]. It complements the Horton law by describing the mergers of branches of distinct orders. Informally, the Tokunaga law suggests that the average number $\bar{N}_{i,j}$, $i<j$, of branches of order $i$ that merge with a branch of order $j$ in a given basin is an exponential function of the order difference, $\ln(\bar{N}_{i,j}) \propto j-i$. The Tokunaga model is surprisingly powerful in approximating the observed river networks [155] and predicting the values of multiple Horton exponents. Figure 3 shows how a one-parameter critical Tokunaga model $S^{\rm Tok}$ of Sect. 6.5 fits the average values of three branching statistics in the Beaver creek network.

In this work, we show the fundamental importance of the Toeplitz constraint $\bar{N}_{i,j}=f(j-i)$. We also provide a theoretical justification for the classical version of the Tokunaga law, which corresponds to a particular choice $\ln f(x) \propto x$.

1.2 Survey structure

Our primary goal is to survey the recent developments in the theory of random self-similar trees; yet a number of results, models, and approaches presented here are original. These novel results are motivated by the need to connect the dots and bridge the gaps when presenting a unified theory from the perspective of Horton pruning and its generalizations. We highlight some of these original contributions below in a list of survey topics.

The survey begins with the main definitions and notations in Sect. 2. This includes the definitions of finite rooted trees and tree spaces, and a brief overview of real trees. Next, Horton pruning and Horton-Strahler orders are introduced.

Section 3 defines the main types of invariances for tree measures sought in this survey. This includes a strong, distributional, Horton self-similarity and a weaker mean Horton self-similarity. Importantly, we justify the requirement of coordination, which, together with prune-invariance, constitutes the self-similarity studied in this work. Every Horton self-similar tree (either mean or distributional) is associated with a sequence of nonnegative Tokunaga coefficients $\{T_{k}\}_{k\geq 1}$, which are theoretical analogs of the empirical averages $\bar{N}_{i,i+k}$. The Tokunaga self-similar trees are a two-parameter sub-family of the mean Horton self-similar trees, with $T_{k}=ac^{k-1}$.

The Horton law for tree measures is formally defined in Sect. 4 in terms of the random counts $N_{k}[T]$ of branches of order $k$ in a random tree $T$ . We introduce two versions of the strong Horton law, where one is convergence in probability and the other is convergence of expectation ratios. The main result of the section (Thm. 1) establishes that the mean Horton self-similarity implies the strong Horton law in expectation ratios, and expresses the Horton exponent $R$ via the Tokunaga sequence $\{T_{k}\}$ . Subsequently, we survey computations of the entropy rate for trees that satisfy the strong Horton law, as a function of the Horton exponent $R$ , and for the Tokunaga self-similar trees, as a function of the Tokunaga parameters $(a,c)$ . This emphasizes a special role played by the critical Tokunaga self-similar trees with $a=c-1$ , and a special point $(a,c)=(1,2)$ that describes (but is not limited to) the critical binary Galton-Watson tree. The section concludes with a brief discussion of the applications of Horton-Strahler orders and Horton laws in natural and computer sciences.
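For the Tokunaga sequence $T_{k}=ac^{k-1}$, the Horton exponent of Thm. 1 is commonly written in the closed form $R=\big(2+c+a+\sqrt{(2+c+a)^{2}-8c}\big)/2$. This formula is not stated in the present excerpt and is quoted here as an assumption. The sketch evaluates it and checks two consequences consistent with the text: the special point $(a,c)=(1,2)$ gives $R=4$, matching the caption of Table 1, and on the critical line $a=c-1$ the discriminant becomes $(2c-1)^{2}$, so that $R=2c$.

```python
import math

def tokunaga_horton_exponent(a: float, c: float) -> float:
    """Horton exponent R for Tokunaga coefficients T_k = a * c**(k - 1), a >= 0, c >= 1.

    Assumed closed form: R = (2 + c + a + sqrt((2 + c + a)**2 - 8c)) / 2.
    """
    s = 2 + c + a
    return (s + math.sqrt(s * s - 8 * c)) / 2

print(tokunaga_horton_exponent(1, 2))  # critical binary Galton-Watson point -> 4.0
for c in (1.5, 2.0, 3.0):
    # Critical Tokunaga trees, a = c - 1: expected R = 2c.
    print(c, tokunaga_horton_exponent(c - 1, c))
```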

Section 5 discusses the Horton law and Tokunaga self-similarity for the combinatorial critical binary Galton-Watson tree. The proofs of the strong Horton law for branch numbers (Cor. 2) and the Central Limit Theorem for branch numbers (Cor. 3) are novel and emphasize the power of the pruning approach. We also find the length and height of the critical binary Galton-Watson tree with i.i.d. exponential edge lengths, called the exponential critical binary Galton-Watson tree.

Section 6 introduces a multi-type Hierarchical Branching Process (HBP), which is the main model of this work. The process trajectories are described by time-oriented trees; this induces a probability measure on the space of planar binary trees with edge lengths. The HBP can generate trees with an arbitrary sequence of Tokunaga coefficients $\{T_{k}\}$. The combinatorial part of these trees is always mean Horton self-similar; the measures are also (distributionally) Horton self-similar under mild conditions (Thm. 9). A hydrodynamic limit is established (Thm. 10) that describes the averaged branch dynamics as a deterministic system of ordinary differential equations (ODEs). This system of ODEs is used to detect a phase transition that separates fading and explosive behavior of the average process progeny (Thm. 11). A subclass of critical Tokunaga processes (Def. 26) that lies on the phase transition boundary and corresponds to $T_{k}=(c-1)c^{k-1}$ reproduces many of the symmetries seen in the exponential critical binary Galton-Watson tree, including independence of edge lengths. The exponential critical binary Galton-Watson tree is a special case of the critical Tokunaga process with $c=2$.

The results in Sect. 6.6 are original. We introduce a Markov tree-valued process that generates the critical Tokunaga trees. We find a two-dimensional martingale with respect to the filtration of this Markov tree process and use Doob’s Martingale Convergence Theorem to establish the strong Horton law for the branch numbers (Thm. 14, Cor. 6).

The Geometric Branching Process that describes the combinatorial part of a Horton self-similar HBP is examined in Sect. 6.7. We show, in particular, that invariance of this process with respect to the unit time shift is equivalent to a one-dimensional version, $a=c-1$ , of the Tokunaga constraint $T_{k}=ac^{k-1}$ (Thm. 15). This provides an independent justification for studying the critical Tokunaga process. We show that the complete non-empty descendant subtrees in a combinatorial critical Tokunaga tree have the same distribution, and two non-overlapping trees are independent if and only if the process is critical binary Galton-Watson (Cor. 9). Moreover, the empirical frequencies of edge/vertex orders in a large random critical Tokunaga tree approximate the order distribution in the respective space of trees (Props. 11, 12). This property is convenient for applied statistical analysis, where one might only be able to examine a handful of (large) trees.

Section 7 extends the Horton self-similarity results to time series via the tree representation of continuous functions, a construction that goes back to Menger [99], Kronrod [77] and the celebrated Kolmogorov-Arnold representation theorem [8, 141]. The level set tree of a continuous function is defined following the well-known pseudo-metric approach (158) [3, 4, 89, 106, 45, 116]. We emphasize the connection of this construction with the Rising Sun Lemma (Lem. 18) of F. Riesz [118]. Proposition 14 reveals equivalence between the Horton pruning and transition to the local extrema of a function. This allows us to interpret the Horton self-similarity for level set trees as the existence of a time series whose distribution is invariant under transition to local extrema; see (167). An example of such an extreme-invariant process is given by the symmetric exponential random walk of Sect. 7.6.

The results in Sect. 7.5 are novel; they refer to the level set tree $T$ of a positive excursion of a symmetric homogeneous random walk $\{X_{k}\}_{k\in\mathbb{Z}}$ on $\mathbb{R}$ . The main result of this section (Thm. 16) shows that the combinatorial shape of $T$ is distributed as the critical binary Galton-Watson tree, for any choice of the transition kernel for $\{X_{k}\}$ . We also show (Lem. 20) that $T$ has identically distributed edge lengths if and only if the transition kernel of $\{X_{k}\}$ is the probability density function of the Laplace distribution. The results of this section complement Thm. 18, a classical result on Galton-Watson representation of the level set tree of an exponential excursion, that can be found in [116, Lemma 7.3] and [89, 106].

Section 7.8 demonstrates a close connection between the level set tree of a sequence of i.i.d. random variables (discrete white noise) and the tree of Kingman’s coalescent process. The two trees are separated by a single Horton pruning (Thm. 21).

Section 7.9 expands the level set tree construction to a Morse function defined on a multidimensional compact differentiable manifold. The key results from the Morse theory [103, 109, 31] are used to describe the tree structure (Cor. 19, Lem. 23).

Section 8 establishes a weak form of the Horton law for a tree representation of Kingman’s coalescent process (Thm. 23). The proof is based on the Smoluchowski-Horton ODEs (190), a Smoluchowski-type system that describes the evolution of the number of branches of a given Horton-Strahler order in a tree that represents Kingman’s $N$-coalescent, in a hydrodynamic limit. Section 8.2 uses T. Kurtz’s weak convergence results for density dependent population processes (Appendix A) to give a new derivation of the hydrodynamic limit, shorter than the original one [82]. We present two alternative, more concise, versions of the Smoluchowski-Horton ODEs in (200) and (203), and use them to find a close numerical approximation to the Horton exponent of Kingman’s coalescent: $R=3.0438279\dots$. This exponent also applies to the level set tree of a discrete white noise, via the equivalence of Thm. 21 in Sect. 7.8.

Section 9 introduces the generalized dynamical pruning (213). This operation erases consecutively larger parts of a tree $T$, starting from the leaves and going down towards the root, according to a monotone nondecreasing pruning function $\varphi$ along the tree. The generalized dynamical pruning encompasses a number of discrete and continuous pruning operations, notably including the tree erasure of Jacques Neveu [105] (Sect. 9.1.1) and Horton pruning (Sect. 9.1.2). Importantly for our discussion, it generically includes erasures that do not satisfy the semigroup property (Sects. 9.1.3, 9.1.4). Theorem 24 establishes prune invariance (Def. 35) of the exponential critical binary Galton-Watson tree with respect to a generalized dynamical pruning with an arbitrary admissible pruning function $\varphi$. The scaling exponents (Def. 35(ii)) that describe such pruning for the function $\varphi$ being the tree length, tree height, or Horton-Strahler order are found in Thm. 25.

As an illuminating application of the generalized dynamical pruning, Sect. 10 examines the continuum 1-D ballistic annihilation model $A+A\rightarrow\varnothing$ for a constant initial particle density and initial velocity that alternates between the values of $\pm 1$. The model dynamics creates coalescing shock waves, similar to those that appear in Hamilton-Jacobi equations [18], that have tree structure. We show (Cor. 21 of Thm. 26) that the shock tree is isometric to the level set tree of the initial potential (integral of velocity), and the model evolution is equivalent to a generalized dynamical pruning of the shock tree, with the pruning function equal to the total tree length (Thm. 28). This equivalence allows us to construct a complete probabilistic description of the annihilation dynamics for the initial velocity that alternates between the values of $\pm 1$ at the epochs of a constant rate Poisson point process (Thms. 29, 30, 31). A real tree representation of the continuum ballistic annihilation is presented in Sect. 10.5.

Section 11 is novel. Here we construct an infinite level set tree, built from leaves down, for a time series $\{X_{k}\}_{k\in\mathbb{Z}}$ . This gives a fresh perspective on multiple earlier results; e.g., those concerning the level set trees of random walks (Sect. 7.6), the generalized dynamical pruning (Sect. 9.5), or the evolution of an infinite exponential potential in the continuum annihilation model (Sect. 10.4). For instance, the infinite-tree version of prune-invariance for the exponential Galton-Watson tree (Thm. 32) can be established in a much simpler way than its finite counterpart (Thm. 24). Although this natural perspective has always influenced our research, this is the first time it is presented in explicit form.

The survey concludes with a short list of open problems (Sect. 12).

Many concepts used in this survey overlap with the recent expositions on random trees, branching and coalescent processes by Aldous [3, 4, 5], Berestycki [22], Bertoin [26], Drmota [39], Duquesne and Le Gall [45], Evans [52], Le Gall [90], Lyons and Peres [93], and Pitman [116]. We expect that, with time, the perspectives presented in this survey will connect and intertwine with better established topics in the theory of random trees.

2 Definitions and notations

2.1 Spaces of finite rooted trees

A connected acyclic graph is called a tree. Consider the space $\mathcal{T}$ of finite unlabeled rooted reduced trees with no planar embedding. The (combinatorial) distance between a pair of tree vertices is the number of edges in a shortest path between them. A tree is called rooted if one of its vertices, denoted by $\rho$, is selected as the tree root. The existence of a root imposes a parent-offspring relation between each pair of adjacent vertices: the one closer to the root is called the parent, and the other the offspring. The space $\mathcal{T}$ includes the empty tree $\phi$ comprised of a root vertex and no edges. The absence of planar embedding in this context is the absence of order among the offspring of the same parent. The tree root is the only vertex that does not have a parent. We write $\#T$ for the number of non-root vertices, equal to the number of edges, in a tree $T$. Hence, a finite tree $T=\rho\cup\{v_{i},e_{i}\}_{1\leq i\leq\#T}$ is comprised of the root $\rho$ and a collection of non-root vertices $v_{i}$, each of which is connected to its unique parent ${\sf parent}(v_{i})$ by the parental edge $e_{i}$, $1\leq i\leq\#T$. Unless indicated otherwise, the vertices are indexed in order of depth-first search, starting from the root. A tree is called reduced if it has no vertices of degree $2$, with the root as the only possible exception.

The space of trees from $\mathcal{T}$ with positive edge lengths is denoted by $\mathcal{L}$. The trees in $\mathcal{L}$, also known as weighted trees [116, 93], can be considered metric spaces. Specifically, the trees from $\mathcal{L}$ are isometric to one-dimensional connected sets comprised of a finite number of line segments that can share end points. The distance along tree paths is defined according to the Lebesgue measure on the edges. Each such tree can be embedded into $\mathbb{R}^{2}$ without creating additional edge intersections (see Fig. 4). Such a two-dimensional pictorial representation serves as the best intuitive model for the trees discussed in this work.

We write $\mathcal{T}_{\rm plane}$ and $\mathcal{L}_{\rm plane}$ for the spaces of trees from $\mathcal{T}$ and $\mathcal{L}$ with planar embedding, respectively. Any tree from $\mathcal{T}$ or $\mathcal{L}$ can be embedded in a plane by selecting an order for the offspring of each parent. Choosing different embeddings for the same tree $T\in\mathcal{T}$ (or $\mathcal{L}$) leads, in general, to different trees from $\mathcal{T}_{\rm plane}$ (or $\mathcal{L}_{\rm plane}$). Figure 4 illustrates alternative planar embeddings of a tree $T\in\mathcal{L}$. Planar embedding (offspring order) should not be confused with drawing style, which concerns how edges are represented in a plane. Each panel in Fig. 4 uses a separate drawing style.

Sometimes we focus on the combinatorial tree ${\textsc{shape}(T)}$ , which retains the combinatorial structure of $T\in\mathcal{L}$ (or $\mathcal{L}_{\rm plane}$ ) while omitting its edge lengths and embedding. Similarly, the combinatorial tree ${\textsc{p-shape}(T)}$ retains the combinatorial structure of $T\in\mathcal{L}_{\rm plane}$ and planar embedding, and omits the edge length information. Here shape is a projection from $\mathcal{L}$ or $\mathcal{L}_{\rm plane}$ to $\mathcal{T}$ , and p-shape is a projection from $\mathcal{L}_{\rm plane}$ to $\mathcal{T}_{\rm plane}$ .

A non-empty rooted tree is called planted if its root has degree $1$; in this case the only edge connected to the root is called the stem. Otherwise the root has degree $\geq 2$ and the tree is called stemless. We denote by $\mathcal{L}^{|}$ and $\mathcal{L}^{\vee}$ the subspaces of $\mathcal{L}$ consisting of planted and stemless trees, respectively. Hence $\mathcal{L}=\mathcal{L}^{|}\cup\mathcal{L}^{\vee}$. We also include the empty tree $\phi$ in each of these spaces, so that $\mathcal{L}^{|}\cap\mathcal{L}^{\vee}=\{\phi\}$. Similarly, we write $\mathcal{L}_{\rm plane}^{|}$ and $\mathcal{L}_{\rm plane}^{\vee}$ for the subspaces of $\mathcal{L}_{\rm plane}$ consisting of planted and stemless trees, respectively. Clearly, $\mathcal{L}_{\rm plane}=\mathcal{L}_{\rm plane}^{|}\cup\mathcal{L}_{\rm plane}^{\vee}$ and $\mathcal{L}_{\rm plane}^{|}\cap\mathcal{L}_{\rm plane}^{\vee}=\{\phi\}$. Fig. 5 shows examples of a planted and a stemless tree.

For any space $\mathcal{S}$ from the list $\{\mathcal{T},\mathcal{T}_{\rm plane},\mathcal{L},\mathcal{L}_{\rm plane}\}$ we write $\mathcal{B}\mathcal{S}$ for the respective subspace of binary trees, $\mathcal{S}^{|}$ for the subspace of planted trees in $\mathcal{S}$ including $\phi$ , and $\mathcal{S}^{\vee}$ for the subspace of stemless trees in $\mathcal{S}$ including $\phi$ . We also consider subspaces $\mathcal{B}\mathcal{S}^{|}=\mathcal{S}^{|}\cap\mathcal{B}\mathcal{S}$ of planted binary trees and $\mathcal{B}\mathcal{S}^{\vee}=\mathcal{S}^{\vee}\cap\mathcal{B}\mathcal{S}$ of stemless binary trees.

Let $l_{T}=(l_{1},\dots,l_{\#T})$ with $l_{i}>0$ be the vector of edge lengths of a tree $T\in\mathcal{L}$ (or $\mathcal{L}_{\rm plane}$ ). The length of a tree $T$ is the sum of the lengths of its edges:

$$\textsc{length}(T)=\sum_{i=1}^{\#T}l_{i}.$$

The height of a tree $T$ is the maximal distance between the root and a vertex:

$$\textsc{height}(T)=\max_{v\in T}d(\rho,v).$$
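As a concrete illustration of $\#T$, $\textsc{length}(T)$ and $\textsc{height}(T)$, the following Python sketch evaluates all three recursively. It assumes a hypothetical nested encoding that is not part of the survey: a tree is the list of subtrees hanging from its root, each subtree is a pair (parental edge length, list of children), and the empty tree $\phi$ is the empty list.

```python
from typing import List, Tuple

# Hypothetical encoding (not from the survey): a tree is the list of subtrees
# hanging from its root; a subtree is (parental_edge_length, children).
# The empty tree phi (a root with no edges) is the empty list.
Tree = List[Tuple[float, "Tree"]]


def num_edges(tree: Tree) -> int:
    """#T: the number of non-root vertices, equal to the number of edges."""
    return sum(1 + num_edges(kids) for _, kids in tree)


def tree_length(tree: Tree) -> float:
    """LENGTH(T): the sum of all edge lengths."""
    return sum(l + tree_length(kids) for l, kids in tree)


def tree_height(tree: Tree) -> float:
    """HEIGHT(T): the maximal distance from the root to a vertex (0 for phi)."""
    return max((l + tree_height(kids) for l, kids in tree), default=0.0)


# Planted tree: a stem of length 1.0 whose far end carries leaf edges 2.0 and 0.5.
T = [(1.0, [(2.0, []), (0.5, [])])]
print(num_edges(T), tree_length(T), tree_height(T))  # 3 3.5 3.0
```

The recursion mirrors the depth-first indexing used above: each subtree contributes its parental edge plus everything below it.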

2.2 Real trees

It is often natural to consider metric trees with structures more complicated than those allowed by the finite spaces $\mathcal{L}$ and $\mathcal{L}_{\rm plane}$. In such cases, we use the following general definition.

Definition 1 (Metric tree [116, Sect. 7]).

A metric space $(M,d)$ is called a tree if for each choice of $u,v\in M$ there is a unique continuous path $\sigma_{u,v}:[0,d(u,v)]\to M$ that travels from $u$ to $v$ at unit speed, and for any simple continuous path $F:[0,L]\to M$ with $F(0)=u$ and $F(L)=v$ , the ranges of $F$ and $\sigma_{u,v}$ coincide.

As an example of a metric tree that does not belong to $\mathcal{L}_{\rm plane}$, consider the unit disk in the complex plane, $M=\{z\in\mathbb{C}:|z|\leq 1\}$, and connect each point $z\in M$ to the origin ${\bf 0}$ by a linear segment $[z,{\bf 0}]$. Distances between points are computed in the usual way, but only along such segments. This is a tree whose (uncountable) set of leaves coincides with the unit circle $\{|z|=1\}$. We refer to the book by Steve Evans [52] for a comprehensive discussion and further examples. Sections 7 and 10 of the present survey examine several natural constructions of a metric $d$ on an $n$-dimensional manifold $M$ with $n\geq 1$ such that $(M,d)$ becomes a (one-dimensional) tree according to Def. 1.

Consider a metric tree $T=(M,d)$. For any two points $x,y\in M$, we define the segment $[x,y]\subset M$ to be the image of the unique path $\sigma_{x,y}$ of Def. 1. We call a point $y\in M$ a descendant of $x\in M$ if the path $[\rho,y]$ includes $x$. Equivalently, removing $x$ from the tree separates its descendants from the root. To lighten notation, we write $x\in T$ to indicate that a point $x\in M$ belongs to the tree $T$.

Metric trees admit an alternative characterization. Recall that a metric space $(M,d)$ is called $0$-hyperbolic if any quadruple $w,x,y,z\in M$ satisfies the following four point condition [52, Lemma 3.12]:

$$d(w,x)+d(y,z)\leq\max\big\{d(w,y)+d(x,z),\,d(x,y)+d(w,z)\big\}.$$

The four point condition is an algebraic description of an intuitive geometric constraint on the geodesic connectivity of quadruples, shown in Fig. 6(a). An equivalent way to define $0$-hyperbolicity is the three point condition illustrated in Fig. 6(b). It is readily seen that the four point condition is satisfied by any finite tree with edge lengths (considered as a metric space). In general, a connected and $0$-hyperbolic metric space is called a real tree, or $\mathbb{R}$-tree [52, Theorem 3.40]. Similarly to the case of finite trees, we say that a point $p\in T$ is an ancestor of a point $q\in T$ if the segment with endpoints $q$ and $\rho$ includes $p$: $p\in[q,\rho]\subset T$. In this case, the point $q$ is called a descendant of the point $p$. We denote by $\Delta_{p,T}$ the descendant tree at point $p$, that is, the set of all descendants of the point $p\in T$, including $p$ as the tree root. The set of all descendant leaves of the point $p$ is denoted by $\Delta^{\circ}_{p,T}$. We use real trees in Sect. 10 to represent the dynamics of a continuum ballistic annihilation model.
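For finite trees with edge lengths, the four point condition can be checked by brute force. The Python sketch below (helper names are hypothetical) computes the path metric of a tree given as a list of weighted edges and tests the inequality over all ordered quadruples; as a contrast, the shortest-path metric of a 4-cycle violates it. The tolerance `tol` is an added assumption that absorbs floating-point rounding.

```python
import itertools
from collections import defaultdict


def tree_metric(edges):
    """All-pairs path distances of a finite tree given as (u, v, length) edges."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {}
    for s in adj:
        # In a tree the path from s to any vertex is unique, so a plain
        # depth-first traversal accumulates the exact path lengths.
        d = {s: 0.0}
        stack = [s]
        while stack:
            x = stack.pop()
            for y, w in adj[x]:
                if y not in d:
                    d[y] = d[x] + w
                    stack.append(y)
        dist[s] = d
    return dist


def satisfies_four_point(dist, tol=1e-12):
    """Check d(w,x)+d(y,z) <= max(d(w,y)+d(x,z), d(x,y)+d(w,z)) for all quadruples."""
    points = list(dist)
    for w, x, y, z in itertools.product(points, repeat=4):
        lhs = dist[w][x] + dist[y][z]
        rhs = max(dist[w][y] + dist[x][z], dist[x][y] + dist[w][z])
        if lhs > rhs + tol:
            return False
    return True


edges = [("rho", "a", 1.0), ("a", "b", 2.0), ("a", "c", 0.5),
         ("c", "d", 1.5), ("c", "e", 0.7)]
print(satisfies_four_point(tree_metric(edges)))  # True

# Shortest-path metric of the 4-cycle w-x-y-z-w with unit edges: not a tree.
labels = "wxyz"
cycle = {p: {q: min(abs(i - j), 4 - abs(i - j)) for j, q in enumerate(labels)}
         for i, p in enumerate(labels)}
print(satisfies_four_point(cycle))  # False
```

In the cycle, the quadruple $(w,y,x,z)$ gives $d(w,y)+d(x,z)=4$ while both alternative pairings sum to $2$.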

2.3 Horton pruning

The concepts of Horton pruning and self-similarity under Horton pruning were originally developed for combinatorial binary trees $T\in\mathcal{BT}$ [113, 29, 150, 81]. Here we provide a general definition of Horton pruning and Horton-Strahler orders for trees in $\mathcal{T}$ , their planar embeddings $\mathcal{T}_{\rm plane}$ , and trees with edge lengths from $\mathcal{L}$ and $\mathcal{L}_{\rm plane}$ . Horton pruning is illustrated in Fig. 7.

Definition 2 (Series reduction).

The operation of series reduction on a rooted tree (with or without edge lengths, plane or not) removes each degree-two non-root vertex by merging its adjacent edges into one. For trees with edge lengths it adds the lengths of the two merging edges. The series reduction does not affect the left/right orientation in the planar trees.

Thus, the series reduction is a mapping from the space of rooted trees (with or without edge lengths, plane or not) to the corresponding space of reduced rooted trees, which can be either $\mathcal{T},\mathcal{T}_{\rm plane},\mathcal{L},$ or $\mathcal{L}_{\rm plane}$ . Hence the term reduced in the definition of these spaces.

Definition 3 (Horton pruning).

Horton pruning $\mathcal{R}$ on either of the spaces $\mathcal{T},\mathcal{T}_{\rm plane},\mathcal{L},$ or $\mathcal{L}_{\rm plane}$ is an onto function whose value $\mathcal{R}(T)$ for a tree $T\neq\phi$ is obtained by removing the leaves and their parental edges from $T$ , followed by series reduction. We also set $\mathcal{R}(\phi)=\phi$ .

Horton pruning induces a map on the underlying space of trees (Fig. 7). The trajectory of each tree $T$ under $\mathcal{R}(\cdot)$ is uniquely determined and finite:

$$T\equiv\mathcal{R}^{0}(T)\to\mathcal{R}^{1}(T)\to\dots\to\mathcal{R}^{k}(T)=\phi,$$

with the empty tree $\phi$ as the (only) fixed point. The pre-image $\mathcal{R}^{-1}(T)$ of any non-empty tree $T$ consists of an infinite collection of trees.
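Definitions 2 and 3 translate directly into code. The following is a minimal Python sketch under a hypothetical nested encoding (a tree is the list of subtrees hanging from its root, each subtree a pair of parental edge length and children list, with $\phi$ the empty list). It erases leaves with their parental edges, then merges every non-root vertex of degree two while adding the edge lengths, and iterates this map until $\phi$ is reached.

```python
from typing import List, Tuple

# Hypothetical encoding: a tree is the list of subtrees hanging from its root;
# a subtree is (parental_edge_length, children). The empty tree phi is [].
Tree = List[Tuple[float, "Tree"]]


def remove_leaves(tree: Tree) -> Tree:
    """Erase every leaf together with its parental edge."""
    return [(l, remove_leaves(kids)) for l, kids in tree if kids]


def series_reduce(tree: Tree) -> Tree:
    """Merge each non-root degree-two vertex into one edge, adding lengths."""
    out = []
    for l, kids in tree:
        kids = series_reduce(kids)
        while len(kids) == 1:          # this vertex has a parent and one child
            child_len, grandkids = kids[0]
            l += child_len
            kids = grandkids
        out.append((l, kids))
    return out                         # the root itself is never merged


def horton_prune(tree: Tree) -> Tree:
    """Horton pruning R: remove leaves, then apply series reduction; R(phi) = phi."""
    return series_reduce(remove_leaves(tree))


def trajectory(tree: Tree) -> List[Tree]:
    """The finite orbit T, R(T), R^2(T), ..., ending at phi."""
    orbit = [tree]
    while orbit[-1]:
        orbit.append(horton_prune(orbit[-1]))
    return orbit


# Planted tree: a stem whose end carries a cherry (two unit leaves) and a leaf of length 2.
T = [(1.0, [(1.0, [(1.0, []), (1.0, [])]), (2.0, [])])]
print(horton_prune(T))         # [(2.0, [])]
print(len(trajectory(T)) - 1)  # 2
```

In the example, pruning leaves a stem and one degree-two vertex, which series reduction merges into a single edge of length $2$; one more pruning yields $\phi$, so the orbit has two steps, which is the order of Def. 4.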

2.4 Horton-Strahler orders

It is natural to think of the distance to $\phi$ under the Horton pruning map and introduce the respective notion of tree order [70, 129] (see Fig. 7).

Definition 4 (Horton-Strahler order).

The Horton-Strahler order ${\sf ord}(T)\in\mathbb{Z}_{+}$ of a tree $T\in\mathcal{T}$ ( $\mathcal{T}_{\rm plane},\mathcal{L},\mathcal{L}_{\rm plane}$ ) is defined as the minimal number of Horton prunings necessary to eliminate the tree:

$${\sf ord}(T)=\min\big\{k\geq 0:\mathcal{R}^{k}(T)=\phi\big\}.$$

In particular, the order of the empty tree is ${\sf ord}(\phi)=0$ , because $\mathcal{R}^{0}(\phi)=\phi$ . Most of our discussion will be focused on non-empty trees with orders ${\sf ord}(T)>0$ . We will often consider measures on tree spaces that assign probability zero to the empty tree $\phi$ .

Horton pruning partitions the underlying tree space into an exhaustive and mutually exclusive collection of subspaces $\mathcal{H}_{K}$ of trees of Horton-Strahler order $K\geq 0$ such that $\mathcal{R}(\mathcal{H}_{K+1})=\mathcal{H}_{K}$. Here $\mathcal{H}_{0}=\{\phi\}$, $\mathcal{H}_{1}$ consists of a single tree comprised of a root and a leaf descendant to the root, and all other subspaces $\mathcal{H}_{K}$, $K\geq 2$, consist of an infinite number of trees. In particular, the tree size in these subspaces is unbounded from above: for any $M>0$ and any $K\geq 2$, there exists a tree $T\in\mathcal{H}_{K}$ such that $\#T>M$. At the same time, the definition of Horton-Strahler orders implies that $\min\{\#T:T\in\mathcal{H}_{K}\}\geq 2^{K-1}$ for any $K\geq 2$.

Definition 5 (Horton-Strahler terminology).

We introduce the following definitions related to the Horton-Strahler order of a tree (see Fig. 8):

1. (Subtree at a vertex) For any non-root vertex $v$ in $T\neq\phi$, a subtree $T_{v}\subset T$ is the only planted subtree in $T$ rooted at the parental vertex ${\sf parent}(v)$ of $v$ and comprised of $v$ and all its descendant vertices together with their parental edges.

2. (Vertex order) For any vertex $v\in T\setminus\{\rho\}$ we set ${\sf ord}(v)={\sf ord}(T_{v})$ (Fig. 8a). We also set ${\sf ord}(\rho)={\sf ord}(T)$.

3. (Edge order) The parental edge of a non-root vertex has the same order as the vertex.

4. (Branch) A maximal connected component consisting of vertices and edges of the same order is called a branch (Fig. 8a). Note that a tree $T$ always has a single branch of the maximal order ${\sf ord}(T)$. In a stemless tree, the maximal order branch may consist of a single root vertex.

5. (Initial and terminal vertex of a branch) The branch vertex closest to the root is called the initial vertex of the branch. The branch vertex farthest from the root is called the terminal vertex of the branch. See Fig. 8a.

6. (Complete subtree of a given order) Consider a connected component of tree $T$ that has been completely removed in $K$ pruning operations (but has not been completely removed in $K-1$ prunings). This connected component together with the vertex used to connect it to the rest of the tree is a subtree of $T$ that will be called a complete subtree of order $K$.

We observe that each subtree $T_{v}$ at the initial vertex $v$ of a branch of order $K\leq{\sf ord}(T)$ is a complete subtree of order $K$, and vice versa (Fig. 8b-d). A complete subtree of order ${\sf ord}(T)$ coincides with $T$ (Fig. 8e). All subtrees of order $1$ are complete (each consists of a single leaf and its parental edge).

Figures 1,2,9,10 show examples of Horton-Strahler ordering in binary trees.

2.5 Alternative definitions of Horton-Strahler orders

Definition 4 connects the Horton-Strahler orders to the Horton pruning operation, which is the main theme of this survey. Here we give two alternative, equivalent, definitions of the Horton-Strahler orders. The proof of equivalence is straightforward and is left as an exercise.

The Horton-Strahler orders can be defined via hierarchical counting [70, 129, 36, 113, 108, 29]. In this approach, each leaf is assigned order $1$ . If an internal vertex $p$ has $m\geq 1$ offspring with orders $i_{1},i_{2},\ldots,i_{m}$ and $r=\max\left\{i_{1},i_{2},\ldots,i_{m}\right\}$ , then

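$${\sf ord}(p)=\begin{cases}r, & \text{if exactly one of } i_{1},\dots,i_{m} \text{ equals } r,\\ r+1, & \text{otherwise.}\end{cases}$$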

The parental edge of a non-root vertex has the same order as the vertex. The Horton-Strahler order of a tree $T\neq\phi$ is ${\sf ord}(T)=\max\limits_{v\in T}{\sf ord}(v)$ , where the maximum is taken over all vertices in $T$ . This definition is most convenient for practical calculations, which explains its popularity in the literature.

For instance, in a reduced binary tree, an internal vertex $p$ with two offspring of orders $i$ and $j$ has order

$${\sf ord}(p)=\max(i,j)+\delta_{ij}=\big\lfloor\log_{2}\big(2^{i}+2^{j}\big)\big\rfloor,$$

where $\delta_{ij}$ is the Kronecker delta and $\lfloor x\rfloor$ denotes the maximal integer less than or equal to $x$. In words, the order increases by unity every time two edges of the same order meet at a vertex (Figs. 1, 2, 9, 10).
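The hierarchical counting rule and its binary specialization can be sketched in Python as follows, for combinatorial trees in a hypothetical nested-list encoding: a vertex is the list of its offspring, a leaf is `[]`, and a planted tree is a one-element list at the top level. The module-level assertion checks the identity $\max(i,j)+\delta_{ij}=\lfloor\log_2(2^i+2^j)\rfloor$, which holds for all positive integers $i,j$, using exact integer arithmetic (`int.bit_length`) instead of floating-point logarithms.

```python
from typing import List

# Hypothetical encoding: a vertex is the list of its offspring, a leaf is [],
# and a tree is the list of subtrees hanging from its root (so a planted tree
# is a one-element list and the empty tree phi is []).


def vertex_order(offspring: List) -> int:
    """Horton-Strahler order of a vertex by hierarchical counting."""
    if not offspring:
        return 1                                   # every leaf has order 1
    orders = [vertex_order(child) for child in offspring]
    r = max(orders)
    # r if exactly one offspring attains the maximum, r + 1 otherwise
    return r if orders.count(r) == 1 else r + 1


def tree_order(tree: List) -> int:
    """ord(T): the order of the root (the maximum over vertices); 0 for phi."""
    return vertex_order(tree) if tree else 0


def binary_merge(i: int, j: int) -> int:
    """Order of a binary vertex whose two offspring have orders i and j."""
    return max(i, j) + (i == j)                    # max(i, j) + Kronecker delta


# Identity check: max(i, j) + delta_ij == floor(log2(2**i + 2**j)),
# computed exactly via n.bit_length() - 1 == floor(log2(n)) for n >= 1.
assert all(binary_merge(i, j) == (2**i + 2**j).bit_length() - 1
           for i in range(1, 40) for j in range(1, 40))

print(tree_order([[[[], []], []]]))        # 2: stem -> (cherry, leaf)
print(tree_order([[[[], []], [[], []]]]))  # 3: planted perfect binary tree
```

These values agree with the number of Horton prunings needed to eliminate the same trees, as the equivalence of the definitions requires.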

Finally, we observe that ${\sf ord}(T)$ of a planted tree $T$ equals the depth of the maximal planted perfect binary subtree of $T$ with the same root (see Sect. 3.4, Ex. 1).

2.6 Tokunaga indices and side branching

The Tokunaga indices complement the Horton-Strahler orders (Sects. 2.4, 2.5) by cataloging the mergers of branches according to their orders. In this work, we define and use the Tokunaga indices in binary trees. It is straightforward to adapt these definitions to trees with general branching.

Recall that a branch (Def. 5) is an uninterrupted sequence of vertices and edges of the same order (Fig. 8(a)). According to the Horton-Strahler ordering rules, every time two branches of the same order $i$ meet at a vertex, this vertex (and hence the branch for which it is the terminal vertex) is assigned order $i\!+\!1$. We refer to this as principal branching. A merger of two branches of distinct orders at a vertex, however, does not result in assigning this vertex (and the corresponding branch) a higher order; in this case the higher-order branch absorbs the lower-order branch. This phenomenon is known as side branching [108]. A branch of order $i$ that merges with (and is absorbed by) a branch of a higher order $j>i$ is referred to as a side branch of Tokunaga index $\{i,j\}$.

Formally, for a non-root vertex $v$ in a reduced binary tree, we let ${\sf sibling}(v)$ denote the unique vertex of the tree that has the same parent as $v$ , i.e., ${\sf parent}(v)={\sf parent}({\sf sibling}(v)).$

Definition 6 (Tokunaga indices).

In a binary tree $T$, consider a branch $b$ of order $i\in\{1,\dots,{\sf ord}(T)-1\}$, and let $v$ denote the initial vertex of the branch $b$, whence ${\sf ord}(v)=i$. The branch $b$ is assigned the Tokunaga index $\{i,j\}$, where $j={\sf ord}({\sf sibling}(v))$. The Horton-Strahler ordering rules imply that $j\geq i$. A branch with Tokunaga index $\{i,i\}$ is called a principal branch. A branch with Tokunaga index $\{i,j\}$ such that $i<j$ is called a side branch.

The definition of Tokunaga indices is illustrated in Fig. 11.
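Branch counts and Tokunaga indices can be read off from vertex orders: a non-root vertex is the initial vertex of a branch exactly when its order differs from its parent's order, and in a binary tree that branch has index $\{{\sf ord}(v),{\sf ord}({\sf sibling}(v))\}$. The Python sketch below assumes a hypothetical nested-list encoding (a vertex is the list of its offspring, a leaf is `[]`) and reduced binary trees, so that every non-root internal vertex has exactly two offspring; the function name is illustrative.

```python
from collections import Counter

# Hypothetical encoding: a vertex is the list of its offspring, a leaf is [],
# and a tree is the list of subtrees hanging from its root. Trees are assumed
# reduced and binary: every non-root internal vertex has exactly two offspring.


def branch_counts(tree):
    """Return (N, NS): N[k] = number of branches of order k, and
    NS[(i, j)] = number of side branches with Tokunaga index {i, j}, i < j."""
    N, NS = Counter(), Counter()

    def order(offspring):
        if not offspring:
            return 1
        orders = [order(child) for child in offspring]
        r = max(orders)
        own = r if orders.count(r) == 1 else r + 1
        for k, child_order in enumerate(orders):
            # A child whose order differs from its parent's order is the
            # initial vertex of a branch of order child_order.
            if child_order != own:
                N[child_order] += 1
                sibling_order = orders[1 - k]   # binary: the other offspring
                if sibling_order > child_order:
                    NS[(child_order, sibling_order)] += 1
        return own

    if tree:
        N[order(tree)] += 1   # the unique branch of maximal order starts at the root
    return N, NS


N, NS = branch_counts([[[[], []], []]])   # stem -> (cherry, leaf)
print(dict(N), dict(NS))                  # {1: 3, 2: 1} {(1, 2): 1}
```

In the example, the leaf attached next to the cherry is a side branch of index $\{1,2\}$, while the two leaves of the cherry are principal branches.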

Remark 1.

We emphasize that the Tokunaga indices refer to the tree branches, not to individual vertices and edges as is the case with the Horton-Strahler orders.

2.7 Labeling edges

The edges of a planar tree can be labeled by numbers $1,\dots,\#T$ in order of depth-first search. For a tree with no embedding, labeling is done by selecting a suitable embedding and then using the depth-first search labeling as above. Such embedding should be properly aligned with the Horton pruning $\mathcal{R}$ , as we describe in the following definition.

Definition 7 (Proper embedding).

An embedding function $\textsc{embed}:\mathcal{T}\to\mathcal{T}_{\rm plane}$ ( $\mathcal{L}\to\mathcal{L}_{\rm plane}$ ) is called proper if for any $T\in\mathcal{T}$ $(T\in\mathcal{L})$

$$\mathcal{R}\big(\textsc{embed}(T)\big)=\textsc{embed}\big(\mathcal{R}(T)\big),$$

where the pruning on the left-hand side is in $\mathcal{T}_{\rm plane}$ ( $\mathcal{L}_{\rm plane}$ ) and pruning on the right-hand side is in $\mathcal{T}$ ( $\mathcal{L})$ .

An example of proper embedding is given in [84].

2.8 Galton-Watson trees

The Galton-Watson distributions (aka Bienaymé-Galton-Watson distributions) over $\mathcal{T}^{|}$ are pivotal in the theory of random trees. Recall that a random Galton-Watson tree starts with a single progenitor represented by the tree root. The population then develops in discrete steps. At every discrete step $d>0$ each existing population member (represented by a tree leaf at the maximal depth $d-1$ ) gives birth to $k\geq 0$ offspring with probability $q_{k}$ , $\sum_{k\geq 0}q_{k}=1$ , with $k=0$ representing no offspring, and terminates. Hence, each member that terminates at step $d$ is represented by a tree vertex at depth $d-1$ . The process stops at step $d_{\rm max}$ when every leaf at depth $d_{\rm max}-1$ produces no offspring.

We denote the respective tree distribution on $\mathcal{T}^{|}$ by $\mathcal{GW}(\{q_{k}\})$. We require $q_{1}=0$ in order to generate reduced trees. Whenever $q_{1}<1$ (in particular, when $q_{1}=0$), the resulting tree is finite with probability one if and only if $\sum_{k}k\,q_{k}\leq 1$ [66, 11]. At the same time, it is well known that in the critical case (i.e., for $\sum_{k}k\,q_{k}=1$) the time to extinction (and hence the tree size) has an infinite first moment.

We write $\mathcal{GW}(q_{0},q_{2})$ for the probability distribution of (combinatorial) binary Galton-Watson trees in $\mathcal{BT}^{|}$. The critical case (unit expected progeny) corresponds to $q_{0}=q_{2}=1/2$. Finally, we let $\mathcal{GW}_{\rm plane}(q_{0},q_{2})$ denote the probability distribution of (combinatorial) plane binary Galton-Watson trees in $\mathcal{BT}_{\rm plane}^{|}$. A random tree in $\mathcal{BT}_{\rm plane}^{|}$ with distribution $\mathcal{GW}_{\rm plane}(q_{0},q_{2})$ is obtained from a random tree in $\mathcal{BT}^{|}$ with distribution $\mathcal{GW}(q_{0},q_{2})$ via the uniform planar embedding, which assigns the left-right orientation to each pair of offspring uniformly and independently at each vertex.

We conclude this section with a particular characterization of the critical binary Galton-Watson distribution $\mathcal{GW}(1/2,1/2)$ ; it follows directly from the process definition and will be used later.

Remark 2.

A distribution $\mu$ on $\mathcal{BT}^{|}$ is $\mathcal{GW}(1/2,1/2)$ if and only if it can be constructed in the following way. Start with a stem (root edge). With probability $1/2$ this completes the tree generation process. With the complementary probability $1/2$ , draw two trees independently from the distribution $\mu$ , and attach them (as subtrees) to the non-root vertex of the stem. This completes the construction.
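Remark 2 suggests a direct sampler. The Python sketch below grows a planted tree from a stem and lets every vertex independently either stay a leaf or receive two offspring, each with probability $1/2$; unrolling the recursion of Remark 2 gives exactly this rule. Because critical trees have infinite expected size, the sketch adds a truncation parameter `max_edges` (an assumption, not part of the model) and returns `None` for trees that would exceed it. The encoding (a vertex is the list of its offspring) and the function name are hypothetical.

```python
import random


def sample_gw_critical(rng, max_edges=10_000):
    """Sample a planted tree from GW(1/2, 1/2) in a nested-list encoding
    (a vertex is the list of its offspring; the tree is [stem_end]).

    Each vertex independently stays a leaf (probability 1/2) or receives two
    offspring (probability 1/2). Returns None if the tree would exceed
    max_edges, a truncation added only to bound the cost of the sketch."""
    stem_end = []
    tree = [stem_end]          # planted: the root has a single child
    pending = [stem_end]       # vertices whose offspring are not yet decided
    edges = 1
    while pending:
        vertex = pending.pop()
        if rng.random() < 0.5:
            continue           # no offspring: this vertex is a leaf
        left, right = [], []
        vertex.extend([left, right])
        pending.extend([left, right])
        edges += 2
        if edges > max_edges:
            return None
    return tree


rng = random.Random(2019)
sample = [sample_gw_critical(rng) for _ in range(2000)]
single_stem = sum(t == [[]] for t in sample) / len(sample)
print(f"fraction of single-stem trees: {single_stem:.3f} (exact value 1/2)")
```

Discarding truncated trees biases any statistic that depends on large trees, so `max_edges` should be chosen with the statistic of interest in mind.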

3 Self-similarity with respect to Horton pruning

This section introduces self-similarity for finite combinatorial and metric trees. The term self-similarity is associated with invariance of a tree distribution with respect to the Horton pruning $\mathcal{R}$ introduced in Sect. 2.3. The prune-invariance alone, however, is insufficient to generate interesting families of trees. This calls for an additional property – coordination among trees of different orders. Coordination together with prune-invariance constitutes the self-similarity studied in this work.

We start in Sects. 3.1 and 3.2 with a strong, distributional, self-similarity for measures on the spaces $\mathcal{T}$ and $\mathcal{L}$, respectively. A weaker form of self-similarity that only considers the average values of selected branch statistics is discussed in Sect. 3.3 for a narrower class of combinatorial binary trees from $\mathcal{BT}$.

3.1 Self-similarity of a combinatorial tree

Let $\mathcal{H}_{K}\subset\mathcal{T}$ be the subspace of trees of Horton-Strahler order $K\geq 0$. Naturally, $\mathcal{H}_{K}\cap\mathcal{H}_{K^{\prime}}=\emptyset$ if $K\neq K^{\prime}$, and $\bigcup_{K\geq 0}\mathcal{H}_{K}=\mathcal{T}$. Given a probability measure $\mu$ on $\mathcal{T}$, consider a set of conditional probability measures $\{\mu_{K}\}_{K\geq 0}$, each of which is defined on $\mathcal{H}_{K}$ by

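$$\mu_{K}(T)=\mu\big(T\,\big|\,{\sf ord}(T)=K\big)=\frac{\mu(T)}{\mu(\mathcal{H}_{K})},\quad T\in\mathcal{H}_{K},$$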

and let $p_{K}=\mu(\mathcal{H}_{K})$ . Then $\mu$ can be represented as a mixture of the conditional measures:

$$\mu(T)=\sum_{K\geq 0}p_{K}\,\mu_{K}(T),\quad T\in\mathcal{T}.$$

Definition 8 (Horton prune-invariance).

Consider a probability measure $\mu$ on $\mathcal{T}$ such that $\mu(\phi)=0$ . Let $\nu$ be the pushforward measure, $\nu=\mathcal{R}_{*}(\mu)$ , i.e.,

$$\nu(T)=\mathcal{R}_{*}(\mu)(T)=\mu\big(\mathcal{R}^{-1}(T)\big),\quad T\in\mathcal{T}.$$

Measure $\mu$ is called invariant with respect to the Horton pruning (Horton prune-invariant) if for any tree $T\in\mathcal{T}$ we have

$$\nu\big(T\,\big|\,T\neq\phi\big)=\mu(T).$$

Remark 3.

The pushforward measure $\nu$ is induced by the original measure $\mu$ via the pruning operation: if $T^{\prime}\stackrel{{\scriptstyle d}}{{\sim}}\mu$ then $T=\mathcal{R}(T^{\prime})\stackrel{{\scriptstyle d}}{{\sim}}\nu$ . In particular, we observe that $\nu(\phi)=\mu(\mathcal{H}_{1})$ and this probability can be positive.

Proposition 1.

Let $\mu$ be a Horton prune-invariant measure on $\mathcal{T}$ . Then the distribution of orders, $p_{K}=\mu(\mathcal{H}_{K})$ , is geometric:

$$p_{K}=p\,(1-p)^{K-1},\quad K\geq 1,$$

where $p=p_{1}=\mu(\mathcal{H}_{1})$ , and for any $T\in\mathcal{H}_{K}$

$$\mu_{K}(T)=\mu_{K+1}\big(\mathcal{R}^{-1}(T)\big).$$

Proof.

Horton pruning $\mathcal{R}$ is a shift operator on the sequence of subspaces $\{\mathcal{H}_{k}\}$ :

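$$\mathcal{R}(\mathcal{H}_{K+1})=\mathcal{H}_{K},\ K\geq 0,\qquad\mathcal{R}^{-1}(\mathcal{H}_{K})=\mathcal{H}_{K+1},\ K\geq 1.$$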

The only non-empty tree eliminated by pruning is the tree of order $1$: $\{\tau\neq\phi:\mathcal{R}(\tau)=\phi\}=\mathcal{H}_{1}$. Since $\mu(\phi)=0$, this allows us to rewrite (8) for any $T\neq\phi$ as

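$$\mu(T)=\nu\big(T\,\big|\,T\neq\phi\big)=\frac{\mu\big(\mathcal{R}^{-1}(T)\big)}{1-\mu(\mathcal{H}_{1})}=\frac{\mu\big(\mathcal{R}^{-1}(T)\big)}{1-p}.$$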

Combining (11) and (12) we find for any $K\geq 2$

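$$p_{K}=\mu(\mathcal{H}_{K})=\mu\big(\mathcal{R}^{-1}(\mathcal{H}_{K-1})\big)=(1-p)\,\mu(\mathcal{H}_{K-1})=(1-p)\,p_{K-1},$$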

which establishes (9). Next, for any tree $T\in\mathcal{H}_{K}$ we have

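$$\mu_{K+1}\big(\mathcal{R}^{-1}(T)\big)=\frac{\mu\big(\mathcal{R}^{-1}(T)\big)}{p_{K+1}}=\frac{\mu\big(\mathcal{R}^{-1}(T)\big)}{(1-p)\,p_{K}}.$$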

Together with (12) this implies (10). ∎

Proposition 1 shows that a Horton prune-invariant measure $\mu$ is completely specified by its conditional measures $\mu_{K}$ and the mass $p=\mu(\mathcal{H}_{1})$ of the tree of order $K=1$ . The same result was obtained for Galton-Watson trees in [29, Thm. 3.5].
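For the critical binary Galton-Watson tree, the geometric law can be checked exactly. By Remark 2, a $\mathcal{GW}(1/2,1/2)$ tree is a single stem (order $1$) with probability $1/2$; otherwise its order is $\max(i,j)+\delta_{ij}$ for two independent copies of orders $i$ and $j$. Writing $\pi_K$ for the probability of order $K$ and $S_{K-1}=\pi_1+\dots+\pi_{K-1}$, order $K\geq 2$ arises either from $i=j=K-1$ or from one copy of order $K$ and the other of lower order, so

$$\pi_K=\tfrac{1}{2}\big(\pi_{K-1}^{2}+2\,\pi_K S_{K-1}\big),\qquad\text{i.e.,}\qquad\pi_K=\frac{\pi_{K-1}^{2}}{2\,(1-S_{K-1})},\quad K\geq 2.$$

The short Python sketch below (function name hypothetical) iterates this recursion in exact rational arithmetic. It returns $\pi_K=2^{-K}$, that is, $p(1-p)^{K-1}$ with $p=\mu(\mathcal{H}_1)=1/2$.

```python
from fractions import Fraction


def gw_order_distribution(k_max):
    """Exact P(ord(T) = K), K = 1..k_max, for T ~ GW(1/2, 1/2).

    Solves pi_K = (pi_{K-1}**2 + 2 * pi_K * S_{K-1}) / 2 for pi_K, where
    S_{K-1} = pi_1 + ... + pi_{K-1}, starting from pi_1 = 1/2."""
    pi = [Fraction(1, 2)]
    for _ in range(2, k_max + 1):
        s = sum(pi)                                  # S_{K-1}
        pi.append(pi[-1] ** 2 / (2 * (1 - s)))
    return pi


dist = gw_order_distribution(8)
print([str(x) for x in dist])
# ['1/2', '1/4', '1/8', '1/16', '1/32', '1/64', '1/128', '1/256']
```

By induction, $S_{K-1}=1-2^{-(K-1)}$ and $\pi_{K-1}=2^{-(K-1)}$ give $\pi_K=2^{-K}$ at every step.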

Next, we introduce a (distributional) coordination property. Informally, we require that a complete subtree $T_{K}$ of a given order $K$ uniformly randomly selected from a random tree $T_{H}$ of order $H\geq K$ has a common distribution independent of $H$ . Since a tree $T_{K}$ of order $K$ has only one complete subtree of order $K$ , which coincides with $T_{K}$ , this common distribution must be $\mu_{K}$ . Formally, consider the following process of selecting a uniform random complete subtree ${\sf subtree}_{K,H}$ of order $K$ from a random tree $T_{H}\in\mathcal{H}_{H}$ . First, select a random tree $T_{H}$ according to the conditional measure $\mu_{H}$ . Label all complete subtrees of order $K$ in $T_{H}$ in order of proper labeling of Sect. 2.7, and select a uniform random subtree, which we denote ${\sf subtree}_{K,H}$ . By construction, ${\sf subtree}_{K,H}\in\mathcal{H}_{K}$ ; we denote the corresponding sampling measure on $\mathcal{H}_{K}$ by $\mu^{H}_{K}$ .
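The selection of ${\sf subtree}_{K,H}$ from a given tree can be sketched in Python as follows. The sketch assumes a hypothetical nested-list encoding (a vertex is the list of its offspring, a leaf is `[]`) and uses the traversal order of that encoding in place of the proper labeling of Sect. 2.7; since the final choice is uniform, the ordering does not change the sampled distribution. A non-root vertex $v$ is taken to start a complete subtree of order $K$ when ${\sf ord}(v)=K$ and its parent has a larger order, and the whole tree is returned when ${\sf ord}(T)=K$. Drawing $T_H\sim\mu_H$ itself is left to the caller; the function names are illustrative.

```python
import random

# Hypothetical encoding: a vertex is the list of its offspring, a leaf is [],
# and a tree is the list of subtrees hanging from its root (planted: one element).


def complete_subtrees(tree, K):
    """All complete subtrees of order K, each returned as the planted tree T_v."""

    def visit(offspring):
        # Returns (order of this vertex, complete order-K subtrees below it).
        if not offspring:
            return 1, []
        results = [visit(child) for child in offspring]
        orders = [o for o, _ in results]
        r = max(orders)
        own = r if orders.count(r) == 1 else r + 1
        found = []
        for child, (o, below) in zip(offspring, results):
            if o == K and own > K:     # child is the initial vertex of an order-K branch
                found.append([child])  # T_v: the child planted at its parent
            found.extend(below)
        return own, found

    if not tree:
        return []
    root_order, found = visit(tree)
    # The complete subtree of order ord(T) is T itself.
    return [tree] if root_order == K else found


def uniform_complete_subtree(tree, K, rng):
    """Uniformly chosen complete subtree of order K, or None if there is none."""
    candidates = complete_subtrees(tree, K)
    return rng.choice(candidates) if candidates else None


T = [[[[], []], [[], []]]]                 # planted perfect binary tree, order 3
print(len(complete_subtrees(T, 1)), len(complete_subtrees(T, 2)))  # 4 2
print(uniform_complete_subtree(T, 2, random.Random(0)))             # [[[], []]]
```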

Definition 9 (Coordination).

A set of measures $\{\mu_{K}\}_{K\geq 1}$ on $\{\mathcal{H}_{K}\}_{K\geq 1}$ is called coordinated if $\mu^{H}_{K}(T)=\mu_{K}(T)$ for any $K\geq 1$ , $H\geq K$ , and $T\in\mathcal{H}_{K}$ . A measure $\mu$ on $\mathcal{T}$ is called coordinated if the respective conditional measures $\{\mu_{K}\}$ , as in Eq. (7), are coordinated.

Definition 10 (Combinatorial Horton self-similarity).

A probability measure $\mu$ on $\mathcal{T}$ is called self-similar with respect to Horton pruning (Horton self-similar) if it is coordinated and Horton prune-invariant.

3.2 Self-similarity of a tree with edge lengths

Consider a tree $T\in\mathcal{L}$ with edge lengths given by a positive vector $l_{T}=(l_{1},\dots,l_{\#T})$ and let $\textsc{length}(T)=\sum_{i}l_{i}$ . We assume that the edges are labeled in a proper way as described in Sect. 2.7. A tree is completely specified by its combinatorial shape $\textsc{shape}(T)$ and edge length vector $l_{T}$ . The edge length vector $l_{T}$ can be specified by distribution $\chi(\cdot)$ of a point $x_{T}=(x_{1},\dots,x_{\#T})$ on the simplex $\sum_{i}x_{i}=1$ , $0<x_{i}\leq 1$ , and conditional distribution $F(\cdot|x_{T})$ of the tree length $\textsc{length}(T)$ , where

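$$x_{T}=\frac{l_{T}}{\textsc{length}(T)},\quad\text{i.e.,}\quad x_{i}=\frac{l_{i}}{\textsc{length}(T)},\quad 1\leq i\leq\#T.$$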
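The decomposition of the edge-length vector into a point on the simplex and a total length is elementary; the Python sketch below (function name hypothetical) returns the pair $(x_T,\textsc{length}(T))$, from which $l_T=\textsc{length}(T)\,x_T$ is recovered.

```python
def split_edge_lengths(lengths):
    """Split a positive edge-length vector l_T into (x_T, LENGTH(T)),
    where x_T = l_T / LENGTH(T) lies on the simplex sum(x_i) = 1."""
    if not lengths or any(l <= 0 for l in lengths):
        raise ValueError("expected a non-empty vector of positive edge lengths")
    total = sum(lengths)
    return [l / total for l in lengths], total


x, total = split_edge_lengths([1.0, 2.0, 0.5, 0.5])
print(x, total)  # [0.25, 0.5, 0.125, 0.125] 4.0
```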

A measure $\eta$ on $\mathcal{L}$ is a joint distribution of tree’s combinatorial shape and its edge lengths; it has the following component measures.

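$$\mu(\tau)=\eta\big(\textsc{shape}(T)=\tau\big),\qquad\chi_{\tau}(\bar{x})=\eta\big(x_{T}=\bar{x}\,\big|\,\textsc{shape}(T)=\tau\big),\qquad F_{\tau,\bar{x}}(\ell)=\eta\big(\textsc{length}(T)\leq\ell\,\big|\,\textsc{shape}(T)=\tau,\,x_{T}=\bar{x}\big).$$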

The definition of self-similarity for a tree with edge lengths builds on its analog for combinatorial trees in Sect. 3.1. The combinatorial notions of coordination (Def. 9) and Horton prune-invariance (Def. 8), which we refer to as coordination and prune-invariance in shapes, are complemented with analogous properties in edge lengths. Formally, we denote by $\mu^{H}_{K}(\tau)$ , $\chi^{H}_{\tau}(\bar{x})$ , and $F^{H}_{\tau,\bar{x}}(\ell)$ the component measures for a uniform complete subtree ${\sf subtree}_{K,H}$ . (Notice that the subtree order $K$ is completely specified by the tree shape $\tau$ , which explains the absence of subscript $K$ in the component measures for subtree length). We also consider the distribution of edge lengths after pruning:

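$$\chi^{\mathcal{R}}_{\tau}(\bar{x})=\eta\big(x_{\mathcal{R}(T)}=\bar{x}\,\big|\,\textsc{shape}(\mathcal{R}(T))=\tau\big)$$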

and

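$$F^{\mathcal{R}}_{\tau,\bar{x}}(\ell)=\eta\big(\textsc{length}(\mathcal{R}(T))\leq\ell\,\big|\,\textsc{shape}(\mathcal{R}(T))=\tau,\,x_{\mathcal{R}(T)}=\bar{x}\big).$$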

Finally, we adopt here the notation $\mathcal{H}_{K}$ for a subspace of trees of order $K\geq 1$ from $\mathcal{L}$ , and consider conditional measures $\mu_{K}(\tau)=\mu(\tau|{\sf ord}(\tau)=K)$ , $K\geq 1$ , for a tree $\tau\in\mathcal{L}$ .

Definition 11 (Horton self-similarity of a tree with edge lengths).

We call a measure $\eta$ on $\mathcal{L}$ self-similar with respect to Horton pruning $\mathcal{R}$ if the following conditions hold:

(i)

The measure is coordinated in shapes. This means that for every $K\geq 1$ and every $H\geq K$ we have

$$\mu^{H}_{K}(\tau)=\mu_{K}(\tau)\quad\text{for all }\tau\in\mathcal{H}_{K}.$$

(ii)

The measure is coordinated in lengths. This means that for every $K\geq 1$ , $H\geq K$ , and $\tau\in\mathcal{H}_{K}$ we have

$$\chi^{H}_{\tau}(\bar{x})=\chi_{\tau}(\bar{x}),$$

and for every given $\bar{x}$ ,

$$F^{H}_{\tau,\bar{x}}(\ell)=F_{\tau,\bar{x}}(\ell).$$

(iii)

The measure is Horton prune-invariant in shapes. This means that for the pushforward measure $\nu=\mathcal{R}_{*}(\mu)=\mu\circ\mathcal{R}^{-1}$ we have

$$\nu\big(\tau\,\big|\,\tau\neq\phi\big)=\mu(\tau)\quad\text{for any }\tau\in\mathcal{T}.$$

(iv)

The measure is Horton prune-invariant in lengths. This means that

$$\chi^{\mathcal{R}}_{\tau}(\bar{x})=\chi_{\tau}(\bar{x}),$$

and there exists a scaling exponent $\zeta>0$ such that for any combinatorial tree $\tau\in\mathcal{T}$ we have

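$$F^{\mathcal{R}}_{\tau,\bar{x}}(\ell)=F_{\tau,\bar{x}}\big(\ell/\zeta\big)\quad\text{for every given }\bar{x}.$$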

3.3 Mean self-similarity of a combinatorial tree

The discussion of this section refers to the space $\mathcal{BT}$ of combinatorial binary trees. Let $N_{k}=N_{k}[T]$ be the number of branches of order $k$ in a tree $T$ , and $N_{i,j}=N_{i,j}[T]$ be the number of side branches with Tokunaga index $\{i,j\}$ with $1\leq i<j\leq{\sf ord}(T)$ in a tree $T$ , i.e., the number of instances when an order- $i$ branch merges with and is being absorbed by an order- $j$ branch. Examples of counts $N_{i}[T]$ and $N_{i,j}[T]$ are given in Figs. 8,10,11. We do not consider the numbers $N_{i,i}[T]$ of principal branches in $T$ , since $N_{i,i}[T]=2N_{i+1}[T]$ and hence such counts are redundant with respect to the branch counts.

We write ${\sf E}_{K}[\cdot]$ for the mathematical expectation with respect to $\mu_{K}$ of Eq. (6). As before, we adopt the notation $\mathcal{H}_{K}$ for the subspace of trees of order $K$ in $\mathcal{BT}$ .

We define the average Horton numbers for subspace $\mathcal{H}_{K}$ as

$$\mathcal{N}_{k}[K]={\sf E}_{K}\big[N_{k}\big],\quad 1\leq k\leq K,$$

and the average side-branch numbers of index $\{i,j\}$ as

$$\mathcal{N}_{i,j}[K]={\sf E}_{K}\big[N_{i,j}\big],\quad 1\leq i<j\leq K.$$

We assume below that the average branch and side-branch numbers are finite for any $K\geq 1$ :

$$\mathcal{N}_{k}[K]<\infty,\quad\mathcal{N}_{i,j}[K]<\infty,\quad 1\leq k\leq K,\ 1\leq i<j\leq K.$$

The Tokunaga coefficient $T_{i,j}[K]$ for subspace $\mathcal{H}_{K}$ is defined as the ratio of the average side-branch number of index $\{i,j\}$ to the average Horton number of order $j$ :

$$T_{i,j}[K]=\frac{\mathcal{N}_{i,j}[K]}{\mathcal{N}_{j}[K]}.$$

The Tokunaga coefficient $T_{i,j}[K]$ hence reflects the average number of side branches of index $\{i,j\}$ per branch of order $j$ in a tree of order $K$.
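In practice, $T_{i,j}[K]$ is often estimated from a sample of trees of order $K$ by replacing the expectations with sample averages. Both averages run over the same sample, so their ratio equals the ratio of the sample totals. The Python sketch below (hypothetical function name and input format) takes per-tree counts $N_k[T]$ and $N_{i,j}[T]$, however they were obtained, and returns these plug-in estimates.

```python
from collections import Counter


def estimate_tokunaga(samples):
    """Plug-in estimates of T_{i,j}[K] = N_{i,j}[K] / N_j[K].

    `samples` holds one (branch_counts, side_branch_counts) pair per tree of
    order K, with branch_counts[k] = N_k[T] and side_branch_counts[(i, j)] =
    N_{i,j}[T]. Sample means replace the expectations E_K; their ratio equals
    the ratio of the sample totals."""
    total_N, total_NS = Counter(), Counter()
    for N, NS in samples:
        total_N.update(N)
        total_NS.update(NS)
    return {(i, j): total_NS[(i, j)] / total_N[j] for (i, j) in sorted(total_NS)}


# Two planted trees of order 2: (stem -> cherry, leaf) and (stem -> cherry).
samples = [({1: 3, 2: 1}, {(1, 2): 1}),
           ({1: 2, 2: 1}, {})]
print(estimate_tokunaga(samples))  # {(1, 2): 0.5}
```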

Remark 4.

Suppose that the measure $\mu$ is coordinated (Def. 9). Then all (complete) branches of order $j$ within a random tree $T\in\mathcal{H}_{K}$ sampled with $\mu_{K}$ have the same distribution. In particular, the numbers $n_{i,j}(b_{k})$ of branches of order $i$ that merge into a particular branch $b_{k}$, $k=1,\dots,N_{j}[T]$, of order $j$ in $T$ have the same distribution for all $b_{k}$. Let $n_{i,j}$ be a random variable such that $n_{i,j}(b_{k})\stackrel{d}{=}n_{i,j}$. Assume, furthermore, that the random counts $n_{i,j}(b_{k})$ are independent of $N_{j}[T]$. Then, by Wald's equation, we have

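$$\mathcal{N}_{i,j}[K]={\sf E}_{K}\Big[\sum_{k=1}^{N_{j}[T]}n_{i,j}(b_{k})\Big]={\sf E}\big[n_{i,j}\big]\,\mathcal{N}_{j}[K]$$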

and, accordingly,

$$T_{i,j}[K]={\sf E}\big[n_{i,j}\big].$$

In other words, the Tokunaga coefficient in this case is the expected number of side-branches of appropriate index in a randomly selected branch. This is how the Tokunaga coefficient is often defined (e.g., [29]). The definition (14) adopted here is more general, as it does not require the distributional coordination and independence of side-branch numbers and branch numbers.

Next, we introduce a property that ensures that the side-branch structure is independent of the tree order. This is a weaker version of the distributional coordination (Def. 9).

Definition 12 (Mean coordination).

A set of probability measures $\{\mu_{K}\}_{K\geq 1}$ on $\{\mathcal{H}_{K}\}_{K\geq 1}$ is called mean coordinated if

$$T_{i,j}[K]=T_{i,j}\quad\text{for all }K\geq 2\text{ and }1\leq i<j\leq K.$$

A measure $\mu$ on $\mathcal{BT}$ is called mean coordinated if the respective conditional measures $\{\mu_{K}\}$ , as in Eq. (7), are mean coordinated.

For a mean coordinated measure $\mu$ , the Tokunaga matrix $\mathbb{T}_{K}$ is a $K\times K$ matrix

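$$\mathbb{T}_K = \begin{pmatrix} 0 & T_{1,2} & T_{1,3} & \cdots & T_{1,K} \\ 0 & 0 & T_{2,3} & \cdots & T_{2,K} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & T_{K-1,K} \\ 0 & \cdots & 0 & 0 & 0 \end{pmatrix},$$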

which coincides with the upper-left $K\times K$ block of any larger-order Tokunaga matrix $\mathbb{T}_{M}$, $M>K$.

Definition 13 (Toeplitz property).

A set of probability measures $\{\mu_{K}\}_{K\geq 1}$ on $\{\mathcal{H}_{K}\}_{K\geq 1}$ is said to satisfy the Toeplitz property if for every $K\geq 2$ there exists a sequence $T_{k}[K]\geq 0$ , $k=1,2,\dots$ such that

$$T_{i,j}[K] = T_{j-i}[K] \quad\text{for all } 1\leq i<j\leq K. \tag{16}$$

The elements of the sequences $T_{k}[K]$ are also referred to as Tokunaga coefficients, which does not create confusion with $T_{i,j}[K]$ . A measure $\mu$ on $\mathcal{BT}$ is said to satisfy the Toeplitz property if the respective conditional measures $\{\mu_{K}\}$ , as in Eq. (7), satisfy the Toeplitz property.

Definition 14 (Mean Horton self-similarity).

A set of probability measures $\{\mu_{K}\}_{K\geq 1}$ on $\{\mathcal{H}_{K}\}_{K\geq 1}$ is called mean Horton self-similar if it is mean coordinated and satisfies the Toeplitz property. A measure $\mu$ on $\mathcal{BT}$ is called mean Horton self-similar if the respective conditional measures $\{\mu_{K}\}$ , as in Eq. (7), are mean Horton self-similar.

The alternative Def. 16 stated below explains the name.

Combining Eqs. (15) and (16) we find that for a mean Horton self-similar measure there exists a nonnegative Tokunaga sequence $\{T_{k}\}_{k=1,2,\ldots}$ such that

$$T_{i,j}[K] = T_{j-i} \quad\text{for all } K\geq 2 \text{ and } 1\leq i<j\leq K,$$

and the corresponding Tokunaga matrices $\mathbb{T}_{K}$ are Toeplitz:

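$$\mathbb{T}_K = \begin{pmatrix} 0 & T_1 & T_2 & \cdots & T_{K-1} \\ 0 & 0 & T_1 & \cdots & T_{K-2} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & T_1 \\ 0 & \cdots & 0 & 0 & 0 \end{pmatrix}.$$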

Recall that Horton pruning $\mathcal{R}$ decreases the Horton-Strahler order of each vertex (and hence of each branch) by unity; in particular

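$${\sf ord}\big(\mathcal{R}(T)\big) = {\sf ord}(T)-1, \qquad N_k[\mathcal{R}(T)] = N_{k+1}[T], \qquad N_{i,j}[\mathcal{R}(T)] = N_{i+1,j+1}[T].$$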
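The order and pruning rules above can be checked mechanically on small trees. The following minimal sketch assumes a hypothetical encoding of planted binary trees as nested Python tuples (a leaf is `()`, an internal vertex is a pair `(left, right)`); the helper names `order`, `branch_counts`, `prune` and `perfect` are ours, not the paper's. It computes Horton-Strahler orders, branch counts $N_k$ and side-branch counts $N_{i,j}$, and Horton pruning $\mathcal{R}$ as removal of leaves followed by series reduction, and shows on a tree of order 3 that pruning lowers the order by one and shifts the counts.

```python
from collections import Counter

# Hypothetical encoding (not from the paper): a leaf is (), an internal
# vertex is the pair (left, right); the planted root edge is implicit.


def order(t):
    """Horton-Strahler order: 1 at a leaf; at an internal vertex, the maximum
    of the children's orders, plus one if the two orders are equal."""
    if t == ():
        return 1
    a, b = order(t[0]), order(t[1])
    return a + 1 if a == b else max(a, b)


def branch_counts(t):
    """Return (N, S): N[k] = number of branches of order k,
    S[(i, j)] = number of side-branches of index {i, j}."""
    N, S = Counter(), Counter()

    def visit(v):
        if v == ():
            return 1
        a, b = visit(v[0]), visit(v[1])
        if a == b:              # two order-a branches end and start an order-(a+1) branch
            N[a] += 2
            return a + 1
        lo, hi = min(a, b), max(a, b)
        N[lo] += 1              # the lower-order branch ends as a side-branch
        S[(lo, hi)] += 1
        return hi

    N[visit(t)] += 1            # the branch that contains the root
    return N, S


def prune(t):
    """Horton pruning: delete leaves with their parental edges, then remove
    vertices left with a single child. Returns None for an emptied tree."""
    if t == ():
        return None
    left, right = prune(t[0]), prune(t[1])
    if left is None and right is None:
        return ()               # both children were leaves: the vertex becomes a leaf
    if left is None:
        return right            # series reduction
    if right is None:
        return left
    return (left, right)


def perfect(K):
    """The perfect binary tree Bin_K of Horton-Strahler order K."""
    return () if K == 1 else (perfect(K - 1), perfect(K - 1))


T = ((((), ()), ()), ((), ()))
N, S = branch_counts(T)
NR, SR = branch_counts(prune(T))
print(order(T), order(prune(T)))   # 3 2
print(dict(N), dict(NR))           # {1: 5, 2: 2, 3: 1} {1: 2, 2: 1}
```

The counting in `branch_counts` relies on the fact that every branch except the one containing the root ends at exactly one internal vertex: two branches of equal order end there and start a branch of the next order, while otherwise only the lower-order branch ends, as a side-branch.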

Consider the pushforward probability measure $\mathcal{R}_{*}(\mu)$ induced on $\mathcal{H}_{K}$ by the pruning operator:

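$$\mathcal{R}_*(\mu)(A) = \mu_{K+1}\big(\mathcal{R}^{-1}(A)\big) \quad\text{for any measurable } A\subseteq\mathcal{H}_K.$$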

The Tokunaga coefficients computed on $\mathcal{H}_{K}$ using the pushforward measure $\mathcal{R}_{*}(\mu)$ are denoted by $T_{i,j}^{\mathcal{R}}[K]$ . Formally,

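$$T^{\mathcal{R}}_{i,j}[K] = \frac{\mathsf{E}\big[N_{i+1,j+1}[T]\big]}{\mathsf{E}\big[N_{j+1}[T]\big]} = \frac{\mathcal{N}_{i+1,j+1}[K+1]}{\mathcal{N}_{j+1}[K+1]} = T_{i+1,j+1}[K+1], \qquad T\stackrel{d}{\sim}\mu_{K+1}.$$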

Definition 15 (Mean Horton prune-invariance).

A set of probability measures $\{\mu_{K}\}_{K\geq 1}$ on $\{\mathcal{H}_{K}\}_{K\geq 1}$ is called mean Horton prune-invariant if

$$T^{\mathcal{R}}_{i,j}[K] = T_{i,j}[K]$$

for any $K\geq 2$ and all $1\leq i<j\leq K$ . A measure $\mu$ on $\mathcal{BT}$ is called mean Horton prune-invariant if the respective conditional measures $\{\mu_{K}\}$ , as in Eq. (7), are mean Horton prune-invariant.

Definition 16 (Mean Horton self-similarity).

A set of probability measures $\{\mu_{K}\}_{K\geq 1}$ on $\{\mathcal{H}_{K}\}_{K\geq 1}$ is called mean self-similar with respect to Horton pruning, or mean Horton self-similar, if it is mean coordinated and mean Horton prune-invariant. A measure $\mu$ on $\mathcal{BT}$ is called mean self-similar with respect to Horton pruning if the respective conditional measures $\{\mu_{K}\}$ , as in Eq. (7), are mean self-similar with respect to Horton pruning.

Proposition 2.

Definitions 14 and 16 of mean self-similarity are equivalent.

This equivalence was proven in [81]. Its validity is readily seen from the diagram of Fig. 12a, which shows relations among the quantities $T_{i,j}[K]$ , $T_{i,j}[K+1]$ , and $T_{i+1,j+1}[K+1]$ involved in the definitions of mean coordination (Def. 12), Toeplitz property (Def. 13), and mean Horton prune-invariance (Def. 15). Moreover, we observe that if any two of these properties hold, the third also holds. The Venn diagram of Fig. 12b illustrates the relation among mean coordination, mean prune-invariance, Toeplitz property and mean self-similarity in the binary tree space $\mathcal{BT}$ .

Consider a mean Horton self-similar measure $\mu$ . Observe that since exactly two branches of order $k$ are required to form a branch of order $(k+1)$ , the average number of side-branches of order $1\leq k<K$ within $\mathcal{H}_{K}$ is ${\mathcal{N}}_{k}[K]-2{\mathcal{N}}_{k+1}[K]$ . This number can also be computed by counting the average number of side-branches of order $k$ for all higher-order branches:

$$\sum_{j=k+1}^{K} T_{j-k}\,\mathcal{N}_j[K].$$

Equating these two expressions, we arrive at the main system of counting equations:

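$$\mathcal{N}_k[K] - 2\,\mathcal{N}_{k+1}[K] = \sum_{j=k+1}^{K} T_{j-k}\,\mathcal{N}_j[K], \quad 1\leq k\leq K-1, \qquad \mathcal{N}_K[K] = 1. \tag{22}$$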

Consider a $K\times K$ linear operator

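$$\mathbb{G}_K = \begin{pmatrix} -1 & T_1+2 & T_2 & \cdots & T_{K-1} \\ 0 & -1 & T_1+2 & \cdots & T_{K-2} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & T_1+2 \\ 0 & \cdots & 0 & 0 & -1 \end{pmatrix}. \tag{23}$$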

The counting equations (22) rewrite as

$$\mathbb{G}_K\big(\mathcal{N}_1[K],\mathcal{N}_2[K],\dots,\mathcal{N}_K[K]\big)^T = -e_K, \tag{24}$$

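where $e_{K}$ is the $K$-th coordinate basis vector. Applying this equation with $K+1$ in place of $K$ and considering its last $K$ components (the operator is upper triangular, and its lower-right $K\times K$ block is $\mathbb{G}_{K}$), we obtain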

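$$\mathbb{G}_K\big(\mathcal{N}_2[K+1],\dots,\mathcal{N}_{K+1}[K+1]\big)^T = -e_K, \quad\text{and hence}\quad \mathcal{N}_{k+1}[K+1] = \mathcal{N}_k[K], \quad 1\leq k\leq K.$$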

This proves the following statement.

Proposition 3.

Consider a mean Horton self-similar measure $\mu$ on $\mathcal{BT}$ . Then for any $K\geq 1$ and $1\leq k\leq K$ we have

$$\mathcal{N}_k[K] = \mathcal{N}_{k+1}[K+1]$$

and

$$\mathcal{N}_k[K] = \mathcal{N}_1[K-k+1].$$
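Since the counting equations form a triangular system, the mean Horton numbers are determined by the Tokunaga sequence alone. A minimal sketch (the function name `mean_horton_numbers` is hypothetical) solves (22) by back-substitution from $\mathcal{N}_K[K]=1$ in exact rational arithmetic; for $T_j=2^{j-1}$ and $K=3$ it returns $\mathcal{N}_1[3]=11$, $\mathcal{N}_2[3]=3$, $\mathcal{N}_3[3]=1$, and the shift property of Prop. 3 can be checked directly.

```python
from fractions import Fraction


def mean_horton_numbers(T, K):
    """Solve the counting equations (22) for [N_1[K], ..., N_K[K]].

    T[0], T[1], ... hold the Tokunaga coefficients T_1, T_2, ...
    (at least K - 1 of them). Back-substitution starts from N_K[K] = 1,
    since a tree of order K has exactly one branch of order K."""
    N = [Fraction(0)] * (K + 1)      # 1-based indexing; N[0] is unused
    N[K] = Fraction(1)
    for k in range(K - 1, 0, -1):
        # N_k[K] = 2 N_{k+1}[K] + sum_{j=k+1}^{K} T_{j-k} N_j[K]
        N[k] = 2 * N[k + 1] + sum(T[j - k - 1] * N[j] for j in range(k + 1, K + 1))
    return N[1:]


T = [2 ** (j - 1) for j in range(1, 20)]             # T_j = 2^(j-1)
print([int(x) for x in mean_horton_numbers(T, 3)])   # [11, 3, 1]
```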

Definition 17 (Tokunaga self-similarity).

A mean Horton self-similar measure $\mu$ on $\mathcal{BT}$ is called Tokunaga self-similar with parameters $(a,c)$ if its Tokunaga sequence $\{T_{j}\}_{j=1,2,\ldots}$ is expressed as

$$T_j = a\,c^{j-1}, \qquad j\geq 1, \tag{25}$$

for some constants $a\geq 0$ and $c>0$ .

Tokunaga self-similarity (25) specifies a combinatorial tree shape (up to a permutation of side branch attachment within a given branch) with only two parameters $(a,c)$, hence suggesting a conventional modeling paradigm. The empirical validity of the Tokunaga self-similarity constraints (25) has been confirmed for a variety of river networks at different geographic locations [113, 131, 38, 94, 155], as well as in other types of data represented by trees, including botanical trees [108], the veins of botanical leaves [137, 114], clusters of dynamically limited aggregation [111, 108], percolation and forest-fire model clusters [152, 145], earthquake aftershock sequences [135, 69, 149], tree representations of symmetric random walks [150] (Sect. 7.6), and hierarchical clustering [58]. The conditions (25), however, lack a theoretical justification. We make a step towards justifying these conditions in Sect. 6.7.2.

Remark 5 (Mean self-similarity is a property of conditional measures).

The properties introduced in this section – mean coordination (Def. 12), Toeplitz (Def. 13), mean Horton prune-invariance (Def. 15), and mean Horton self-similarity (Def. 14,16) – are completely specified by a set of conditional measures $\{\mu_{K}\}$ , and are independent of the randomization probabilities $p_{K}=\mu(\mathcal{H}_{K})$ , see Eq. (7).

Remark 6 (Terminology).

The self-similarity concepts studied in this work refer to a measure $\mu$, or a collection of conditional measures $\{\mu_{K}\}$, on a suitable space of trees. For the sake of brevity, we sometimes use a common abuse of notation and discuss self-similarity of a random tree $T\stackrel{{\scriptstyle d}}{{\sim}}\mu$ (e.g., claiming that a tree $T$ is mean Horton self-similar, etc.). Formally, such statements apply to the respective tree distribution $\mu$.

3.4 Examples of self-similar trees

This section collects some examples (and non-examples) of self-similar trees and related properties.

Example 1 (Perfect binary trees).

Recall that a binary tree is called perfect if it is reduced and all its leaves have the same depth (combinatorial distance from the root). Consider the space ${\sf Bin}\subset\mathcal{BT}^{|}$ of finite planted perfect binary trees; see Fig. 13. We write $D=D[T]$ for the depth of a tree $T$ and ${\sf Bin}_{D}\subset{\sf Bin}$ for the subspace of trees of depth $D\geq 1$. The subspace ${\sf Bin}_{D}$ consists of a single tree with $2^{D-1}$ leaves; it has Horton-Strahler order $D$. Every conditional measure $\mu_{K}$ in this case is a point measure on ${\sf Bin}_{K}$, $K\geq 1$. Moreover, the order of a vertex at depth $1\leq d\leq D$ (and of its parental edge) is $D-d+1$, and for the tree ${\sf Bin}_{K}$ we have

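$$N_k[{\sf Bin}_K] = 2^{K-k}, \quad 1\leq k\leq K, \qquad N_{i,j}[{\sf Bin}_K] = 0, \quad 1\leq i<j\leq K.$$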

We write ${\sf Bin}(\kappa)\subset\mathcal{BL}^{|}$ for the space of metric trees with combinatorial shapes from ${\sf Bin}$ and length $\kappa^{i-1}$ assigned to edges of order $i\geq 1$ . The bottom row of Fig. 13 shows trees ${\sf Bin}_{i}$ , $i=4,3,2,1$ , that correspond to $\kappa=1.5$ .

(a)

Coordination in shapes (Def. 9 or 11(i)) and in lengths (Def. 11(ii)). The space ${\sf Bin}$ is coordinated in shapes and lengths, since every subtree of order $K$ in a tree of order $H\geq K$ (not necessarily a uniform complete subtree) is the tree ${\sf Bin}_{K}$ .

(b)

Mean coordination (Def. 12) and Toeplitz property (Def. 13). By construction, the space ${\sf Bin}$ has no side-branching ( $N_{i,j}[T]=0$ ), and so

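$$T_{i,j}[K] = 0 \quad\text{for all } 1\leq i<j\leq K, \qquad\text{that is,}\quad T_k = 0, \quad k\geq 1.$$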

This implies mean coordination and Toeplitz property.

(c)

Mean self-similarity (Def. 14) follows from (b).

(d)

Mean Horton self-similarity (Def. 16). Recall that subspace ${\sf Bin}_{K}$ consists of a single tree for any $K\geq 1$ . Since

$$\mathcal{R}({\sf Bin}_{K+1}) = {\sf Bin}_K, \qquad K\geq 1,$$

the space is mean Horton prune-invariant. Together with mean coordination of (b) this implies mean Horton self-similarity.

(e)

Combinatorial Horton self-similarity (Def. 10). Observe that the argument used in (d) also implies Horton prune-invariance in shapes (Def. 8 or 11(iii)). Together with coordination in shapes of (a) this gives combinatorial Horton self-similarity.

(f)

Tokunaga self-similarity with $a=0$ (Def. 17) follows from (b).

(g)

Horton prune-invariance in lengths (Def. 11(iv)). By construction, the leaves of a pruned tree have length $\kappa$ ; and the edge lengths change by a multiplicative factor $\kappa$ with every combinatorial step toward the root. This implies Horton prune-invariance in lengths with $\zeta=\kappa$ .

(h)

Self-similarity (Def. 11) with $\zeta=\kappa$ follows from (a), (c) or (d), and (g). It implies that for any $K\geq 1$ and $m\geq 0$ , the tree ${\sf Bin}_{K}$ is obtained by scaling all edges of the tree $\mathcal{R}^{m}({\sf Bin}_{K+m})$ by a multiplicative factor $\kappa^{-m}$ . The four columns of Fig. 13 correspond to $m=0,1,2,3$ and $K+m=4$ .

Example 2 (Combinatorial critical binary Galton-Watson trees).

The Galton-Watson distribution $\mathcal{GW}(\{q_{k}\})$ on $\mathcal{T}^{|}$ has the coordination property for any distribution $\{q_{k}\}$ with $q_{1}\neq 1$. Indeed, the Markovian branching mechanism (see Sect. 2.8) creates subtrees of the same structure, independently of the tree order. This implies coordination. However, mean and distributional prune-invariance (and hence mean and combinatorial Horton self-similarity) only hold in the critical binary case $\mathcal{GW}(\frac{1}{2},\frac{1}{2})$ [29]. The corresponding Tokunaga sequence is $T_{j}=2^{j-1}$, $j\geq 1$, which implies Tokunaga self-similarity with parameters $(a,c)=(1,2)$.
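The coordination of critical binary Galton-Watson trees and the sequence $T_j=2^{j-1}$ can be explored by simulation. The sketch below is an illustration only, not the method of [29]: the names `gw_tree`, `horton_counts` and `tokunaga_estimates` are hypothetical, trees larger than a size cap are discarded (which slightly biases the sample toward smaller trees), and the pooled ratio of side-branch counts to branch counts over trees of a fixed order $K$ is used as a plug-in estimate of $T_{i,j}[K]$.

```python
import random
from collections import Counter


def gw_tree(rng, max_vertices=10_000):
    """Critical binary Galton-Watson tree: every vertex independently has
    0 or 2 offspring with probability 1/2 each. Returns children[v] (None for
    a leaf, a pair of indices otherwise; vertex 0 is the root), or None if the
    tree grows past max_vertices."""
    children = [None]
    v = 0
    while v < len(children):
        if rng.random() < 0.5:
            n = len(children)
            children[v] = (n, n + 1)
            children.extend([None, None])
            if len(children) > max_vertices:
                return None
        v += 1
    return children


def horton_counts(children):
    """Branch counts N[k] and side-branch counts S[(i, j)], computed bottom-up;
    children always have larger indices than their parent."""
    order = [0] * len(children)
    N, S = Counter(), Counter()
    for v in range(len(children) - 1, -1, -1):
        if children[v] is None:
            order[v] = 1
            continue
        a, b = (order[c] for c in children[v])
        if a == b:                       # two order-a branches merge
            N[a] += 2
            order[v] = a + 1
        else:                            # the lower-order branch is a side-branch
            lo, hi = min(a, b), max(a, b)
            N[lo] += 1
            S[(lo, hi)] += 1
            order[v] = hi
    N[order[0]] += 1                     # the branch containing the root
    return N, S


def tokunaga_estimates(n_trees, K, seed=0):
    """Pooled ratios sum N_{i,j} / sum N_j over simulated trees of order K."""
    rng = random.Random(seed)
    side, branches = Counter(), Counter()
    for _ in range(n_trees):
        children = gw_tree(rng)
        if children is None:
            continue                     # oversized tree discarded
        N, S = horton_counts(children)
        if max(N) == K:
            side.update(S)
            branches.update(N)
    return {(i, j): side[(i, j)] / branches[j]
            for j in range(2, K + 1) for i in range(1, j) if branches[j]}
```

For example, `tokunaga_estimates(20_000, 4)` returns pooled estimates for $1\leq i<j\leq 4$, which one would expect to lie near $2^{j-i-1}$, up to sampling error and the bias introduced by the size cap.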

Example 3 (Critical binary Galton-Watson trees with i.i.d. exponential edge lengths).

The space of critical binary Galton-Watson trees with independent exponential edge lengths is Horton self-similar with $\zeta=2$ ; this is shown in Sect. 5.1.

Example 4 (Hierarchical Branching Process).

Section 6 introduces a rich class of measures on $\mathcal{BL}^{|}$ induced by the Hierarchical Branching Process (HBP). Notably, one can construct a version of the process that is Horton self-similar (Def. 11) with an arbitrary Tokunaga sequence $\{T_{j}\}$ and for an arbitrary $\zeta>0$ . This class includes the critical binary Galton-Watson tree with independent exponential lengths as a special case.

Example 5 (Combinatorial Tokunaga trees).

Tokunaga self-similar trees (Def. 17) are specified by a particular form of the Tokunaga sequence:

$$T_j = a\,c^{j-1}, \qquad j\geq 1.$$

This is a very flexible model that can account for a variety of dendritic patterns. Figure 14 shows four selected examples:

$$T_j = 0, \qquad T_j = \delta_{j,1}, \qquad T_j = 1, \qquad T_j = 2^{j-1}, \qquad j\geq 1.$$

The case $T_{j}=0$ corresponds to perfect binary trees with no side branching (see also Ex. 1). In this case, every branch merger increases the branch order by unity. This results in the most symmetric, deterministic tree structure. Some side branching appears for $T_{j}=\delta_{j,1}$ (hence $T_{1}=1,T_{2}=0,T_{3}=0,\dots$): every branch of order $K$ has on average a single side branch of order $(K-1)$, and no side branches of lower orders. This destroys symmetry and introduces randomness in tree shape. The case $T_{j}=1$ corresponds to an average of one side branch of any order $1\leq k\leq K-1$ within a branch of order $K$, resulting in tentacle-shaped formations of varying length. The most complicated case illustrated here corresponds to $T_{j}=2^{j-1}$, which is the Tokunaga sequence for critical binary Galton-Watson trees (but not necessarily vice versa); see Ex. 2. In this case the number of side branches increases geometrically with the difference of branch orders, hence producing branches with widely varying lengths and shapes.

Example 6 (Tokunaga trees with i.i.d. exponential edge lengths).

Random edge lengths often appear as an element of applied modeling. Figure 15 illustrates the same four Tokunaga models as in Ex. 5, with i.i.d. exponential edge lengths. Clearly, this additional random element substantially affects the appearance of the trees. The edge length variability becomes a dominant element of the metric tree shape. We notice, in particular, that the four types of trees with exponential edge lengths in Fig. 15 look much more similar than the same four types with deterministic edge lengths related to branch order.

Example 7 (Critical Tokunaga processes).

Section 6.5 introduces a subclass of HBP, called critical Tokunaga processes, with $T_{j}=(c-1)c^{j-1}$ , $j\geq 1$ for an arbitrary $c\geq 1$ . These processes generate tree distributions that are Horton self-similar with $\zeta=c$ and have i.i.d. exponential edge lengths.

Example 8 (Independent random attachment).

A variety of mean Horton self-similar measures on $\mathcal{T}$ can be constructed for an arbitrary sequence of Tokunaga coefficients $\{T_{j}\}_{j=1,2,\ldots}$ . Here we give a natural example [81].

Fix a sequence $\{T_{j}\}_{j=1,2,\ldots}$ of Tokunaga coefficients. By Remark 5, it is sufficient to construct a set of mean Horton self-similar conditional measures $\mu_{K}$, $K\geq 1$.

The subspace $\mathcal{H}_{1}$ , which consists of a single-leaf tree $\tau_{1}$ , possesses a trivial unity mass conditional measure $\mu_{1}$ . To construct a random tree from $\mathcal{H}_{2}$ , we select a discrete probability distribution $P_{1,2}(n)$ , $n=0,1,\dots$ , with the mean value $T_{1}$ . A random tree $T\in\mathcal{H}_{2}$ is obtained from the single-leaf tree $\tau_{1}$ via the following two operations. First, we attach two offspring vertices to the leaf of $\tau_{1}$ . This creates a tree of order $2$ with no side-branches – one internal vertex of degree 3, two leaves, and the root. Second, we draw the number $\tilde{N}_{1,2}$ from the distribution $P_{1,2}$ , and attach $\tilde{N}_{1,2}$ vertices to this tree so that they form side-branches of index $\{1,2\}$ .

In general, we use a recursive construction procedure. Assume that a measure $\mu_{K-1}$, $K\geq 2$, is constructed. To construct a random tree $T\in\mathcal{H}_{K}$ we select a set of discrete probability distributions $P_{k,K}(n)$, $k=1,\dots,K-1$, on $\mathbb{Z}_{+}$ with the respective mean values $T_{k}$. A random tree $T\in\mathcal{H}_{K}$ is constructed by adding branches of order $1$ (leaves) to a random tree $\tau\in\mathcal{H}_{K-1}$. First, we add two new child vertices to every leaf of $\tau$, hence producing a tree $\tilde{T}$ of order $K$ with no side-branches of order $1$. Second, for each branch $b$ of order $2\leq j\leq K$ in $\tilde{T}$ we draw a random number $\tilde{N}_{1,j}(b)$ from the distribution $P_{j-1,K}$ and attach $\tilde{N}_{1,j}(b)$ new child vertices to this branch so that they form side-branches of index $\{1,j\}$. Each new vertex is attached in a random order with respect to the existing side-branches. Specifically, we notice that $m\geq 0$ side-branches attached to a branch of order $j$ are uniquely associated with $m+1$ edges within this branch. The attachment of the new $\tilde{N}_{1,j}(b)$ vertices among the $m+1$ edges is given by the equiprobable multinomial distribution with $m+1$ categories and $\tilde{N}_{1,j}(b)$ trials.

The procedure described above generates a set of mean-coordinated measures $\{\mu_{K}\}_{K\geq 1}$ on $\{\mathcal{H}_{K}\}_{K\geq 1}$, since the mean values $T_{k}$ of the distributions $P_{k,K}$ are independent of $K$. Furthermore, observe that

$$\mathcal{N}_{i,j}[K] = T_{j-i}\,\mathcal{N}_j[K], \qquad 1\leq i<j\leq K,$$

and hence $T_{i,j}[K]={\mathcal{N}}_{i,j}[K]/{\mathcal{N}}_{j}[K]=T_{j-i}$ , so the tree is mean self-similar, according to Def. 14.
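Only the means of $P_{k,K}$ enter the mean branch counts, which is why the construction is mean coordinated. A count-level sketch, assuming Poisson distributions $P_{k,K}=\mathrm{Poisson}(T_k)$ (our illustrative choice; any distributions with means $T_k$ are allowed) and using a hypothetical helper `attachment_counts`, tracks only the numbers of branches of each order, since the multinomial placement of side-branches does not change them. With $T_k=2^{k-1}$ and $K=4$, the sample mean of $N_1$ should be close to the value $\mathcal{N}_1[4]=43$ given by the counting equations.

```python
import math
import random


def poisson(rng, lam):
    """Poisson(lam) variate by Knuth's multiplication method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def attachment_counts(T, K, rng):
    """Branch counts [N_1, ..., N_K] of one random tree of order K from the
    independent random attachment construction, with T[k-1] = T_k and every
    P_{k,K} taken to be Poisson(T_k) (an assumption of this sketch)."""
    N = [1]                                   # order 1: the single-leaf tree
    for new_order in range(2, K + 1):
        shifted = N                           # order j-1 in tau becomes order j
        # each branch of order j >= 2 receives Poisson(T_{j-1}) new side leaves
        side = sum(poisson(rng, T[j - 2])
                   for j in range(2, new_order + 1)
                   for _ in range(shifted[j - 2]))
        # two new children per leaf of tau, plus the side-branch leaves
        N = [2 * shifted[0] + side] + shifted
    return N


rng = random.Random(0)
T = [1, 2, 4, 8]                              # T_k = 2^(k-1), k = 1..4
samples = [attachment_counts(T, 4, rng) for _ in range(5000)]
print(sum(s[0] for s in samples) / len(samples))   # should be close to 43
```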

Finally, to make that construction combinatorially Horton self-similar (Def. 10), each subspace $\mathcal{H}_{K}$ must be assigned the probability $p_{K}=p(1-p)^{K-1}$.

Example 9 (Why coordination?).

Relating mean Horton self-similarity (Def. 16) to mean prune-invariance (Def. 15) is quite intuitive (see also [29]). Much less so is the requirement of mean coordination of conditional measures (Def. 12), included in the definition of mean self-similarity. This requirement is motivated by our goal to bridge the measure-theoretic definition of self-similarity via the pruning operation (Def. 16) to a branch counting definition (Def. 14). In applications, when a handful of trees of different orders is observed, the coordination assumption allows one to estimate the Tokunaga coefficients $T_{i,j}$ and make inference regarding the Toeplitz property; see [113, 108, 38, 155]. The absence of coordination, at the same time, allows for a variety of prune-invariant measures with no Toeplitz constraint, which are hardly treatable in applications. To give an example of such a measure, select any tree $\tau_{2}$ from the pre-image of the only tree $\tau_{1}\in\mathcal{H}_{1}$ of order $K=1$ under the pruning operation: $\tau_{2}\in\mathcal{R}^{-1}(\tau_{1})=\mathcal{H}_{2}$. In a similar fashion, select any tree $\tau_{K+1}$ from the pre-image of $\tau_{K}$ for $K\geq 2$. This gives us a collection of trees $\tau_{K}\in\mathcal{H}_{K}$, $K\geq 1$, such that $\mathcal{R}(\tau_{K+1})=\tau_{K}$. Assign the full measure on $\mathcal{H}_{K}$ to $\tau_{K}$: $\mu_{K}(\tau_{K})=1$. By construction, the measures $\{\mu_{K}\}$ are mean prune-invariant. They, however, may satisfy neither the mean coordination nor the Toeplitz property. This example illustrates how one can produce rather obscure collections of mean prune-invariant measures, providing a motivation for the coordination requirement.

4 Horton law in self-similar trees

In this section, we introduce the strong Horton law for the numbers of branches of different orders in a combinatorial tree on $\mathcal{T}$ (Def. 18) and for the respective averages (Def. 19). The main result of this section (Thm. 1) shows that the mean Horton self-similarity (Defs. 14 and 16) implies the strong Horton law for mean branch numbers (Def. 19).

Consider a measure $\mu$ on $\mathcal{T}$ and its conditional measures $\mu_{K}$ , each defined on subspace $\mathcal{H}_{K}\subset\mathcal{T}$ of trees of Horton-Strahler order $K\geq 1$ . We write $T\stackrel{{\scriptstyle d}}{{\sim}}\mu_{K}$ for a random tree $T$ drawn from subspace $\mathcal{H}_{K}$ according to measure $\mu_{K}$ .

Definition 18 (Strong Horton law for branch numbers).

We say that a probability measure $\mu$ on $\mathcal{T}$ satisfies a strong Horton law for branch numbers if there exists a constant Horton exponent $R\geq 2$ such that for any $k\geq 1$

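$$\left(\frac{N_k[T]}{N_1[T]};\ T\stackrel{d}{\sim}\mu_K\right)\stackrel{p}{\to}R^{1-k} \quad\text{as } K\to\infty; \tag{27}$$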

that is, for any $\epsilon>0$

$$\lim_{K\to\infty}\mu_K\left(\left|\frac{N_k[T]}{N_1[T]}-R^{1-k}\right|>\epsilon\right) = 0. \tag{28}$$

Corollary 6 in Sect. 6.6.2 is an example of the strong Horton law for branch numbers. In the context of Horton laws, the adjective strong refers to the type of geometric decay, while the convergence of random variables is in probability. Section 4.2 discusses weaker types of geometric convergence. An alternative, weaker, definition of the Horton law is formulated in terms of expected branch counts.

Definition 19 (Strong Horton law for mean branch numbers).

We say that a probability measure $\mu$ on $\mathcal{T}$ satisfies a strong Horton law for mean branch numbers if there exists a constant Horton exponent $R\geq 2$ such that for any $k\geq 1$

$$\lim_{K\to\infty}\frac{\mathcal{N}_k[K]}{\mathcal{N}_1[K]} = R^{1-k}. \tag{29}$$

Lemma 1.

The strong Horton law for branch numbers (Def. 18) implies the strong Horton law for mean branch numbers (Def. 19).

Proof.

By construction, if ${\sf ord}(T)=K$ , then $N_{1}[T]\geq 2^{K-1}$ . Accordingly, for any $k\leq K$ we have ${N_{k}[T]\over N_{1}[T]}\leq 2^{1-k}$ . Assuming the strong Horton law (28) for branch numbers, for any given $\epsilon>0$ , we have

[TABLE]

for all sufficiently large $K$ . Thus, for a given $k\in\mathbb{N}$ and for all sufficiently large $K$ exceeding $k$ , we have

[TABLE]

as $\left|{N_{k}[T]\over N_{1}[T]}-R^{1-k}\right|\leq\max\Big{(}2^{1-k},\,R^{1-k}\Big{)}\leq 2^{1-k}$ . This establishes (29). ∎

A similar calculation allows us to establish the following result.

Lemma 2.

Consider a probability measure $\mu$ on $\mathcal{T}$ and suppose the following properties hold:

(i)

$\mu$ satisfies the strong Horton law for mean branch numbers (Def. 19), and

(ii)

$\forall k\geq 1$ $\exists L_{k}\in[0,\infty)$ such that $\left(\frac{N_{k}[T]}{N_{1}[T]};T\stackrel{{\scriptstyle d}}{{\sim}}\mu_{K}\right)\stackrel{{\scriptstyle p}}{{\to}}L_{k}$ as $K\to\infty$.

Then, the measure $\mu$ satisfies the strong Horton law for branch numbers (Def. 18), i.e., $L_{k}=R^{1-k}$ .

Sufficient conditions for the strong Horton law for mean branch numbers in binary trees were found in [81], hence providing rigorous foundations for the celebrated regularity that has escaped a formal explanation for a long time. These conditions are presented in Thm. 1 of this section. It has been shown in [82] that the tree that describes a trajectory of Kingman’s coalescent process with $N$ particles obeys a weaker version of Horton law as $N\to\infty$ (Sect. 8), and that the first pruning of this tree for any finite $N$ is equivalent to a level set tree of a white noise (see Sect. 7 for definitions).

Consider a mean self-similar measure $\mu$ on $\mathcal{BT}$ with a Tokunaga sequence $\{T_{j}\}_{j=1,2,\ldots}$ . Define a sequence $t(j)$ as

$$t(0) = -1, \qquad t(1) = T_1+2, \qquad t(j) = T_j \quad (j\geq 2),$$

and let $\hat{t}(z)$ denote the generating function of $\{t(j)\}_{j=0,1,\ldots}$ :

$$\hat{t}(z) = \sum_{j=0}^{\infty} t(j)\,z^j = -1 + 2z + \sum_{j=1}^{\infty} T_j\,z^j.$$

For a holomorphic function $f(z)$ represented by a power series $f(z)=\sum\limits_{j=0}^{\infty}a_{j}z^{j}$ in a nonempty disk $|z|\leq\rho$ we write

$$[z^j]\,f(z) = a_j.$$

Theorem 1 (Strong Horton law in a mean self-similar tree).

Suppose $\mu$ is a mean Horton self-similar measure on $\mathcal{BT}$ with a Tokunaga sequence $\{T_{j}\}_{j=1,2,\ldots}$ such that

$$\limsup_{j\to\infty} T_j^{1/j} < \infty. \tag{33}$$

Then the strong Horton law for mean branch numbers (Def. 19) holds with the Horton exponent $R=1/w_{0}$ , where $w_{0}$ is the only real zero of the generating function $\hat{t}(z)$ in the interval $\left(0,{1\over 2}\right]$ . Moreover,

$$\mathcal{N}_1[K+1] = -[z^K]\,\frac{1}{\hat{t}(z)}, \qquad K\geq 0, \tag{34}$$

and

$$\lim_{K\to\infty}\frac{\mathcal{N}_1[K+1]}{\mathcal{N}_1[K]} = R = \frac{1}{w_0}. \tag{35}$$

Conversely, if $~{}\limsup\limits_{j\rightarrow\infty}T_{j}^{1/j}=\infty$ , then the limit $\lim\limits_{K\to\infty}\frac{{\mathcal{N}}_{k}[K]}{{\mathcal{N}}_{1}[K]}$ does not exist at least for some $k$ .
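Theorem 1 can be illustrated numerically. The minimal sketch below (helper names hypothetical) truncates the Tokunaga sequence to finitely many terms, finds the zero $w_0$ of $\hat{t}(z)=-1+2z+\sum_j T_j z^j$ in $(0,1/2]$ by bisection, and compares $1/w_0$ with the ratio $\mathcal{N}_1[K+1]/\mathcal{N}_1[K]$ obtained from the recursion $\mathcal{N}_1[m+1]=2\mathcal{N}_1[m]+\sum_{j=1}^{m}T_j\mathcal{N}_1[m+1-j]$, which follows from the counting equations (22) with $k=1$ and Prop. 3. The truncation is an assumption of the sketch; for $T_j=2^{j-1}$ cut at $j=30$ its effect on $w_0$ is far below the tolerances used.

```python
def t_hat(T, z):
    """t^(z) = -1 + 2z + sum_{j>=1} T_j z^j for a finite list T = [T_1, T_2, ...]."""
    return -1 + 2 * z + sum(Tj * z ** j for j, Tj in enumerate(T, start=1))


def real_root_w0(T, tol=1e-13):
    """Bisection for the real zero w_0 of t^ in (0, 1/2]: t^(0) = -1 < 0 and,
    for a finite nonnegative list T, t^(1/2) = sum_j T_j 2^(-j) >= 0."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_hat(T, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def n1_sequence(T, K):
    """[N_1[1], ..., N_1[K]] from N_1[m+1] = 2 N_1[m] + sum_{j=1}^{m} T_j N_1[m+1-j]."""
    n = [1]                                   # n[m] holds N_1[m+1]
    for m in range(1, K):
        n.append(2 * n[m - 1] + sum(T[j - 1] * n[m - j]
                                    for j in range(1, min(m, len(T)) + 1)))
    return n


T = [2 ** (j - 1) for j in range(1, 31)]      # T_j = 2^(j-1), truncated at j = 30
w0 = real_root_w0(T)
n1 = n1_sequence(T, 20)
print(1 / w0, n1[-1] / n1[-2])                # both approximately 4
```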

Proof.

The proof of Thm. 1 is given in Sect. 4.1. ∎

That the Horton exponent $R$ is reciprocal to the real root of $\hat{t}(z)$ was noticed by Peckham [113], under the assumption $\displaystyle\lim_{K\to\infty}\left(N_{k}R^{k-K}\right)=const.>0$ .

Below we give two examples of using Theorem 1.

Example 10 (Tokunaga self-similar trees).

Consider a Tokunaga self-similar tree (Def. 17) with $T_{j}=a\,c^{j-1}$, where $a,c>0$. (We exclude the case $a=0\Rightarrow T_{j}=0$, which corresponds to perfect binary trees with no side branching.) This model received considerable attention in the literature [113, 133, 98], in part because of its ability to closely describe river networks [155]. Here we have

$$t(0) = -1, \qquad t(1) = a+2, \qquad t(j) = a\,c^{j-1} \quad (j\geq 2),$$

and

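$$\hat{t}(z) = -1 + 2z + \sum_{j=1}^{\infty} a\,c^{j-1} z^j = -1 + 2z + \frac{az}{1-cz} = \frac{-2cz^2 + (2+a+c)\,z - 1}{1-cz}, \qquad |z|<\frac{1}{c}.$$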

The discriminant of the quadratic polynomial in the numerator is positive,

$$(2+a+c)^2 - 8c = (a+c-2)^2 + 8a > 0.$$

Therefore, there exist two real roots, $z_{1}<z_{2}$ , of the numerator. It is easy to check that

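$$0<z_1<\min\Big(\frac{1}{2},\frac{1}{c}\Big) \quad\text{and}\quad z_2>\max\Big(\frac{1}{2},\frac{1}{c}\Big),$$
since the numerator equals $-1$ at $z=0$, $a/2>0$ at $z=1/2$, and $a/c>0$ at $z=1/c$, while its leading coefficient $-2c$ is negative.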

Hence, there is a single root of $\hat{t}(z)=0$ for $|z|<1/c$ of algebraic multiplicity one:

$$w_0 = z_1 = \frac{2+a+c-\sqrt{(2+a+c)^2-8c}}{4c},$$

and the respective Horton exponent is

$$R = \frac{1}{w_0} = \frac{2+a+c+\sqrt{(2+a+c)^2-8c}}{2}, \tag{37}$$

as was observed in earlier works [133, 113, 98]. A map of the values of the Horton exponent $R(a,c)$ is shown in Fig. 16a. As suggested by (37), the level sets of $R(a,c)$ are fairly approximated by $a+c=const.$
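The closed form can be checked numerically. For $|z|<1/c$, summing the geometric series gives $\hat{t}(z)=-1+2z+az/(1-cz)$, whose numerator is $-2cz^2+(2+a+c)z-1$. The sketch below (function names hypothetical) computes the two roots and $R=1/z_1$; for $a=1$, $c=2$ it gives $z_1=1/4$, $z_2=1$, and $R=4$.

```python
import math


def tokunaga_roots(a, c):
    """Roots z1 < z2 of the numerator -2c z^2 + (2 + a + c) z - 1 of t^(z)
    for the Tokunaga sequence T_j = a c^(j-1), with a, c > 0."""
    s = 2 + a + c
    r = math.sqrt(s * s - 8 * c)          # discriminant = (a + c - 2)^2 + 8a > 0
    return (s - r) / (4 * c), (s + r) / (4 * c)


def horton_exponent(a, c):
    """R = 1 / z1 = (2 + a + c + sqrt((2 + a + c)^2 - 8c)) / 2."""
    s = 2 + a + c
    return (s + math.sqrt(s * s - 8 * c)) / 2


print(tokunaga_roots(1, 2), horton_exponent(1, 2))   # (0.25, 1.0) 4.0
```

Computing $R$ directly as $(s+\sqrt{s^2-8c})/2$ rather than as $1/z_1$ avoids the cancellation in $s-\sqrt{s^2-8c}$ when $8c$ is small relative to $s^2$.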

To examine the rate of convergence in the strong Horton law, we use (34). The reciprocal generating function is given by

$$\frac{1}{\hat{t}(z)} = -\frac{1-cz}{2c\,(z-z_1)(z-z_2)}.$$

Thus, since ${1\over z-p}=-\sum\limits_{k=0}^{\infty}{1\over p^{k+1}}z^{k}$ for $|z|<|p|$ , formula (34) implies

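$$\mathcal{N}_1[K+1] = \frac{(1-cz_1)\,z_1^{-K-1} + (cz_2-1)\,z_2^{-K-1}}{2c\,(z_2-z_1)}, \qquad \frac{\mathcal{N}_1[K+1]}{\mathcal{N}_1[K]} = \frac{1}{z_1}\cdot\frac{1+C\,(z_1/z_2)^{K+1}}{1+C\,(z_1/z_2)^{K}}, \qquad C=\frac{cz_2-1}{1-cz_1}>0.$$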

Accordingly, the rate of convergence in (35) is determined by the ratio $z_{1}/z_{2}<1$ – values farther away from 1 lead to faster convergence. Recall (Prop. 3) that

$$\frac{\mathcal{N}_k[K]}{\mathcal{N}_1[K]} = \frac{\mathcal{N}_1[K-k+1]}{\mathcal{N}_1[K]}.$$

Hence, the ratio $z_{1}/z_{2}$ also determines the rate of convergence in (29). Figure 16(b) shows the ratio $z_{1}/z_{2}$ as a function of $(a,c)$. The only region where the ratio approaches 1, hence slowing down the convergence rate in the strong Horton law, corresponds to $\{c\approx 2,a<1\}$.

Figure 17 illustrates the strong Horton law in a Tokunaga mean self-similar tree with $a=1,c=2$ , which corresponds to $T_{j}=2^{j-1}$ , $j\geq 1$ . In this case (Figs. 17(a),18)

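$$\hat{t}(z) = \frac{-4z^2+5z-1}{1-2z}, \qquad w_0 = z_1 = \frac{1}{4}, \qquad z_2 = 1, \qquad \mathcal{N}_1[K] = \frac{2\cdot 4^{K-1}+1}{3}.$$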

The ratios ${\mathcal{N}}_{k}[K]/{\mathcal{N}}_{k+1}[K]$ for $K=20$ are shown in Fig. 17(b). The ratios are very close to the theoretical value $R=1/w_{0}=4$, except for the branch orders $k$ close to the tree order $K$, $k>15$. As suggested by Fig. 16(b), for most choices of $(a,c)$ the convergence rate is higher, so we expect a larger number of ratios in a close vicinity of the limit value $R$. As discussed above, the convergence in (35) has the same rate, with the first terms (small $k$) deviating from the limit value rather than the last ones, as was the case in (29) and Fig. 17(b).
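For $a=1$, $c=2$ the numbers in this paragraph can be reproduced exactly. With $T_j=2^{j-1}$, the recursion $\mathcal{N}_1[m+1]=2\mathcal{N}_1[m]+\sum_{j=1}^{m}T_j\mathcal{N}_1[m+1-j]$ has the solution $\mathcal{N}_1[m]=(2\cdot 4^{m-1}+1)/3$ (so $\mathcal{N}_1[1],\mathcal{N}_1[2],\mathcal{N}_1[3]=1,3,11$), and by Prop. 3 the ratio $\mathcal{N}_k[K]/\mathcal{N}_{k+1}[K]$ equals $\mathcal{N}_1[K-k+1]/\mathcal{N}_1[K-k]=4-3/(2\cdot 4^{K-k-1}+1)$. A short sketch (helper name hypothetical) tabulates these ratios for $K=20$; the deviation from $R=4$ shrinks by a factor close to $z_1/z_2=1/4$ per unit of $K-k$.

```python
from fractions import Fraction


def n1_closed_form(m):
    """N_1[m] = (2 * 4**(m - 1) + 1) / 3 for T_j = 2^(j-1); the integer
    division is exact because 2 * 4**(m - 1) + 1 is divisible by 3."""
    return (2 * 4 ** (m - 1) + 1) // 3


K = 20
# Prop. 3: N_k[K] / N_{k+1}[K] = N_1[K - k + 1] / N_1[K - k]
ratios = {k: Fraction(n1_closed_form(K - k + 1), n1_closed_form(K - k))
          for k in range(1, K)}
for k in (1, 10, 15, 17, 19):
    print(k, float(ratios[k]))
```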

We show below in Eq. 47 that, in general, the rate of convergence in the strong Horton law (29), (35) is controlled by

[TABLE]

where $\gamma$ separates $w_{0}$ from the other possible zeros of $\hat{t}(z)$; higher values of $\gamma$ lead to faster convergence. Figure 18 shows the values of $\log_{10}|\hat{t}(z)|$ on its disk of convergence for the Tokunaga tree of this example. Here, the only zero of $\hat{t}(z)$, at $z=1/4$ (downward peak), is well isolated, so that the surrounding values are bounded away from zero; this suggests a high rate of convergence, which we already illustrated more directly in (39) and Figs. 16(b), 17(b).

Example 11 (Shallow side-branching).

Suppose $T_{j}=0$ for $j\geq 3$, that is, we only have “shallow” side-branches of indices $\{k-2,k\}$ and $\{k-1,k\}$. Then

$$\hat{t}(z) = -1 + (2+T_1)\,z + T_2\,z^2 = 0.$$

The only root of this equation within $[0,1/2]$ is

$$w_0 = \frac{\sqrt{(2+T_1)^2+4T_2}-(2+T_1)}{2T_2} \qquad (T_2>0),$$

which leads to

$$R = \frac{1}{w_0} = \frac{2+T_1+\sqrt{(2+T_1)^2+4T_2}}{2}.$$

In particular, if $T_{j}=0$ for $j\geq 2$ , then $R=T_{1}+2$ ; such trees are called “cyclic” [113]. This shows that the entire range of Horton exponents $2\leq R<\infty$ can be achieved by trees with only very shallow side-branching.
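A quick check of the shallow case (function name hypothetical): multiplying $\hat{t}(1/R)=0$ by $R^2$ shows that $R$ is the positive root of $R^2-(2+T_1)R-T_2=0$, and $R\geq 2$ always. Setting $T_2=0$ recovers $R=T_1+2$.

```python
import math


def horton_exponent_shallow(T1, T2):
    """Horton exponent R = 1/w_0 when T_j = 0 for j >= 3, where w_0 is the root
    of -1 + (2 + T1) z + T2 z^2 in (0, 1/2]; reduces to R = T1 + 2 if T2 = 0."""
    b = 2 + T1
    return (b + math.sqrt(b * b + 4 * T2)) / 2


print(horton_exponent_shallow(3, 0), horton_exponent_shallow(0, 0))   # 5.0 2.0
```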

We conclude this section with a linear algebra construction that clarifies the essence of Horton law in a mean self-similar tree. Define a vector $\zeta_{K}\in\mathbb{R}^{K}$ of average Horton numbers and a respective normalized vector $\xi_{K}\in\mathbb{R}^{\infty}$ as

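$$\zeta_K = \big(\mathcal{N}_1[K],\mathcal{N}_2[K],\dots,\mathcal{N}_K[K]\big)^T, \qquad \xi_K = \Big(\frac{\mathcal{N}_1[K]}{\mathcal{N}_1[K]},\frac{\mathcal{N}_2[K]}{\mathcal{N}_1[K]},\dots,\frac{\mathcal{N}_K[K]}{\mathcal{N}_1[K]},0,0,\dots\Big)^T,$$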

and consider an infinite dimensional extension $\mathbb{G}$ of the operator $\mathbb{G}_{K}$ of (23):

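$$\mathbb{G} = \begin{pmatrix} -1 & T_1+2 & T_2 & T_3 & \cdots \\ 0 & -1 & T_1+2 & T_2 & \cdots \\ 0 & 0 & -1 & T_1+2 & \cdots \\ \vdots & & & \ddots & \ddots \end{pmatrix}.$$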

Using these notations, the main counting equations (24) become $\mathbb{G}_{K}\zeta_{K}=-e_{K}$, and therefore

$$\mathbb{G}\,\xi_K = -\frac{1}{\mathcal{N}_1[K]}\,e_K.$$

Here ${\mathcal{N}}_{1}[K]\geq(T_{1}+2)^{K-1}\to\infty$ as $K\to\infty$ , and hence the strong Horton law for mean branch numbers (Def. 19) is equivalent to the existence of a limit solution $\lim\limits_{K\rightarrow\infty}\xi_{K}=\xi$ to an infinite dimensional linear operator equation

$$\mathbb{G}\,\xi = 0, \qquad \xi(1) = 1,$$

with coordinates $\xi(k)=R^{1-k}$ .

4.1 Proof of Theorem 1

First, we establish (Prop. 4) necessary and sufficient conditions for the existence of the strong Horton law. Then we show that these conditions are satisfied and express the value of the Horton exponent $R$ via the Tokunaga coefficients $\{T_{j}\}$ .

Proposition 4.

Let $\mu$ be a mean Horton self-similar measure on $\mathcal{BT}$ . Suppose that the limit

$$\lim_{K\to\infty}\frac{\mathcal{N}_1[K+1]}{\mathcal{N}_1[K]} = R \tag{42}$$

exists and is finite. Then, the strong Horton law for mean branch numbers holds; that is, for each positive integer $k$ ,

$$\lim_{K\to\infty}\frac{\mathcal{N}_k[K]}{\mathcal{N}_1[K]} = R^{1-k}. \tag{43}$$

Conversely, if the limit (42) does not exist, then the limit on the left-hand side of (43) also does not exist, at least for some $k$.

Proof.

Suppose the limit (42) exists and is finite. Proposition 3 implies that for any fixed integer $m\geq 1$

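$$\lim_{K\to\infty}\frac{\mathcal{N}_{m+1}[K]}{\mathcal{N}_{m}[K]} = \lim_{K\to\infty}\frac{\mathcal{N}_1[K-m]}{\mathcal{N}_1[K-m+1]} = \frac{1}{R}.$$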

Thus, for any fixed integer $k\geq 2$ ,

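$$\lim_{K\to\infty}\frac{\mathcal{N}_k[K]}{\mathcal{N}_1[K]} = \prod_{m=1}^{k-1}\lim_{K\to\infty}\frac{\mathcal{N}_{m+1}[K]}{\mathcal{N}_m[K]} = R^{1-k}.$$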

Conversely, suppose the limit $\lim\limits_{K\rightarrow\infty}{{\mathcal{N}}_{1}[K+1]\over{\mathcal{N}}_{1}[K]}$ does not exist. Taking $k=2$ , we obtain by Prop. 3

$$\frac{\mathcal{N}_2[K]}{\mathcal{N}_1[K]} = \frac{\mathcal{N}_1[K-1]}{\mathcal{N}_1[K]}.$$

Thus $\lim\limits_{K\rightarrow\infty}{{\mathcal{N}}_{2}[K]\over{\mathcal{N}}_{1}[K]}$ diverges. ∎

Next, we express ${\mathcal{N}}_{1}[K]$ via the elements of the Tokunaga sequence $\{T_{j}\}_{j=1,2,\ldots}$ that satisfy condition (33). The quantity ${\mathcal{N}}_{1}[K+1]$ can be computed by counting, and expressed via convolution products as follows:

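$$\mathcal{N}_1[K+1] = 2\,\mathcal{N}_1[K] + \sum_{j=1}^{K} T_j\,\mathcal{N}_1[K+1-j] = \sum_{j=0}^{K}(t+\delta_0)(j)\,\mathcal{N}_1[K+1-j], \qquad K\geq 1,$$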

where $\delta_{0}(j)$ is the Kronecker delta, and therefore, $(t+\delta_{0})(0)=0$ . Hence, taking the $z$ -transform of ${\mathcal{N}}_{1}[K]$ , we obtain

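$$\sum_{K=0}^{\infty} z^K\,\mathcal{N}_1[K+1] = 1 + \big(\hat{t}(z)+1\big)\sum_{K=0}^{\infty} z^K\,\mathcal{N}_1[K+1], \quad\text{that is,}\quad \sum_{K=0}^{\infty} z^K\,\mathcal{N}_1[K+1] = -\frac{1}{\hat{t}(z)},$$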

for $|z|$ small enough. Recalling the definition (32) establishes (34):

$$\mathcal{N}_1[K+1] = -[z^K]\,\frac{1}{\hat{t}(z)}.$$

Since $T_{j}\geq 0$ for any $j\geq 1$ , the function $\hat{t}(z)=-1+2z+\sum\limits_{j=1}^{\infty}z^{j}T_{j}$ has a single real root $w_{0}$ in the interval $(0,1/2]$ . Our goal is to show that the Horton exponent $R$ is reciprocal to $w_{0}$ . We begin by showing that $w_{0}$ is the root of $\hat{t}(z)$ closest to the origin.

Lemma 3.

Let $w_{0}$ be the only real root of $\hat{t}(z)=-1+2z+\sum\limits_{j=1}^{\infty}z^{j}T_{j}$ in the interval $\left(0,1/2\right]$ . Then, for any other root $w$ of $~{}\hat{t}(z)$ , we have $|w|>w_{0}.$

Proof.

Since $\{T_{j}\}$ are all nonnegative reals, we have $\overline{\hat{t}(\bar{z})}=\hat{t}(z)$ . The radius of convergence of $\sum\limits_{j=1}^{\infty}z^{j}T_{j}$ must be greater than $w_{0}$ . Suppose $w=re^{i\theta}$ ( $0\leq\theta<2\pi$ ) is a root of magnitude at most $w_{0}$ . That is $~{}\hat{t}(w)=0~{}$ and $r:=|w|\leq w_{0}.$ Then $~{}\hat{t}(\bar{w})=0~{}$ and

[TABLE]

If $r<w_{0}$ , then

[TABLE]

arriving at a contradiction. Thus $r=w_{0}$.

Next we show that $\theta=0$ . Suppose not. Then

[TABLE]

arriving at another contradiction. Hence $r=w_{0}$, $\theta=0$, and $w=w_{0}$. ∎

Let $L=\limsup\limits_{j\rightarrow\infty}T_{j}^{1/j}$ . Then $L^{-1}$ is the radius of convergence of $\hat{t}(z)$ (we set $L^{-1}=\infty$ if $L=0$ ), and $L^{-1}>w_{0}.$ Lemma 3 asserts that there exists a positive real $\gamma\in(w_{0},L^{-1})$ such that

[TABLE]

Accordingly, for $0<\rho<w_{0}$

[TABLE]

Observe that $Res\left({1\over\hat{t}(z)z^{K}};w_{0}\right)$ is a constant multiple of $w_{0}^{-K}$ since $w_{0}$ is a root of $\hat{t}(z)$ of algebraic multiplicity one. Thus, since $w_{0}<\gamma$ and

[TABLE]

we have

[TABLE]

Proposition 4 now implies the following lemma.

Lemma 4.

Suppose $~{}\limsup\limits_{j\rightarrow\infty}T_{j}^{1/j}<\infty$ . Then, for each positive integer $k$

$$\lim_{K\to\infty}\frac{\mathcal{N}_k[K]}{\mathcal{N}_1[K]} = w_0^{k-1}.$$

Moreover,

[TABLE]

To establish the converse we need the following statement.

Proposition 5.

Suppose $\mu$ is a mean Horton self-similar measure on $\mathcal{BT}$ with Tokunaga sequence $\{T_{j}\}_{j\geq 1}$ . Then

$$\mathcal{N}_1[K] \geq T_j^{(K-1)/j}$$

for all $j\in\mathbb{N}$ and $(K-1)\in j\mathbb{N}.$

Proof.

Fix any $j\geq 1$ . The main counting equations (22) show that for any integer $m\geq 0$

$$\mathcal{N}_{mj+1}[K] \geq T_j\,\mathcal{N}_{(m+1)j+1}[K], \qquad (m+1)j+1\leq K.$$

Accordingly,

$$\mathcal{N}_1[K] \geq T_j^{m}\,\mathcal{N}_{mj+1}[K],$$

given $mj+1\leq K$ . Choosing $m=(K-1)/j$ we obtain

$$\mathcal{N}_1[K] \geq T_j^{(K-1)/j}\,\mathcal{N}_K[K] = T_j^{(K-1)/j}.$$

∎
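The bound of Prop. 5 can be spot-checked with integer arithmetic. The sketch below (helper name and Tokunaga sequence chosen for illustration only) computes $\mathcal{N}_1[K]$ from the recursion $\mathcal{N}_1[m+1]=2\mathcal{N}_1[m]+\sum_{j=1}^{m}T_j\mathcal{N}_1[m+1-j]$ and compares it with $T_j^{(K-1)/j}$ whenever $j$ divides $K-1$.

```python
def n1_sequence(T, K):
    """[N_1[1], ..., N_1[K]] via N_1[m+1] = 2 N_1[m] + sum_{j=1}^{m} T_j N_1[m+1-j];
    T = [T_1, T_2, ...] with T_j = 0 beyond the list (integers keep it exact)."""
    n = [1]
    for m in range(1, K):
        n.append(2 * n[m - 1] + sum(T[j - 1] * n[m - j]
                                    for j in range(1, min(m, len(T)) + 1)))
    return n


T = [1, 5, 0, 100]                    # illustrative T_1..T_4; T_j = 0 for j > 4
n1 = n1_sequence(T, 13)
checks = [(j, K, n1[K - 1], Tj ** ((K - 1) // j))
          for j, Tj in enumerate(T, start=1)
          for K in range(1, 14) if (K - 1) % j == 0]
print(all(value >= bound for _, _, value, bound in checks))   # True
```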

Suppose the limit

$$\lim_{K\to\infty}\frac{\mathcal{N}_1[K+1]}{\mathcal{N}_1[K]}$$

exists and is finite. Proposition 5 asserts that ${\mathcal{N}}_{1}[K]^{1/(K-1)}\geq T_{j}^{1/j}~{}$ for all $j\in\mathbb{N}$ and $(K-1)\in j\mathbb{N}$ . Hence,

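$$\limsup_{j\to\infty}T_j^{1/j} \leq \sup_{j\geq 1}T_j^{1/j} \leq \lim_{K\to\infty}\mathcal{N}_1[K]^{1/(K-1)} = \lim_{K\to\infty}\frac{\mathcal{N}_1[K+1]}{\mathcal{N}_1[K]} < \infty.$$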

We summarize this in a lemma.

Lemma 5.

Suppose $~{}\limsup\limits_{j\rightarrow\infty}T_{j}^{1/j}=\infty$ . Then, the limit $~{}\lim\limits_{K\rightarrow\infty}{{\mathcal{N}}_{k}[K]\over{\mathcal{N}}_{1}[K]}$ does not exist at least for some $k$ .

Finally, Thm. 1 follows from Lem. 4 and Lem. 5.
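The dichotomy behind Thm. 1 can be explored numerically. The following is a minimal sketch under two stated assumptions: that the main counting equations (22) read $\mathcal{N}_K[K]=1$ and $\mathcal{N}_k[K]=2\mathcal{N}_{k+1}[K]+\sum_{j=1}^{K-k}T_j\mathcal{N}_{k+j}[K]$ for $1\le k<K$, and that $\hat{t}(z)=-1+2z+\sum_{j\ge1}z^jT_j$. Both are our reading of the earlier sections, not quotations. For $T_j=2^{j-1}$ (finite $\limsup T_j^{1/j}$, the setting of Lemma 4) the ratios $\mathcal{N}_k[K]/\mathcal{N}_1[K]$ approach $w_0^{k-1}$, in line with the residue computation above. For the hypothetical sequence $T_j=j!$ (the setting of Lemma 5), Proposition 5 with $j=K-1$ gives $\mathcal{N}_1[K]\ge (K-1)!$, so $\mathcal{N}_1[K]^{1/(K-1)}$ is unbounded.

```python
from math import factorial

def mean_branch_counts(T, K):
    """Mean branch counts N[1..K] for trees of order K.

    ASSUMED form of the counting equations (22):
        N[K] = 1,  N[k] = 2*N[k+1] + sum_{j=1}^{K-k} T(j) * N[k+j].
    T is a callable j -> T_j; integer T_j give exact integer results.
    """
    N = [0] * (K + 1)                    # index 0 unused
    N[K] = 1
    for k in range(K - 1, 0, -1):
        N[k] = 2 * N[k + 1] + sum(T(j) * N[k + j] for j in range(1, K - k + 1))
    return N

def root_w0(T, jmax=200, tol=1e-15):
    """Bisection for the root of t(z) = -1 + 2z + sum_j z^j T_j in (0, 1/2].

    ASSUMED form of t(z); the series is truncated after jmax terms.
    t(0) = -1 < 0 and t(1/2) >= 0, so the root is bracketed.
    """
    def t(z):
        return -1.0 + 2.0 * z + sum(z ** j * T(j) for j in range(1, jmax + 1))
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Lemma 4 setting: T_j = 2^(j-1), the Tokunaga sequence of the critical binary GW tree.
T_gw = lambda j: 2 ** (j - 1)
N = mean_branch_counts(T_gw, 30)
w0 = root_w0(T_gw)
for k in (2, 3, 4):
    print(k, N[k] / N[1], w0 ** (k - 1))      # the two columns agree closely

# Lemma 5 setting: hypothetical T_j = j!, so limsup T_j^(1/j) is infinite.
for K in range(2, 13):
    N1 = mean_branch_counts(factorial, K)[1]
    assert N1 >= factorial(K - 1)             # Proposition 5 with j = K - 1, m = 1
    print(K, N1 ** (1.0 / (K - 1)))           # keeps growing: no finite Horton exponent
```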

4.2 Well-defined asymptotic Horton ratios

The setting for the Horton laws in (27) and (29) can be generalized beyond randomizing the tree measure with respect to Horton-Strahler orders as in (7). For instance, as will be the case with the combinatorial critical binary Galton-Watson trees $\mathcal{GW}\left({1\over 2},{1\over 2}\right)$ in (63), the tree measure may be randomized with respect to the number of leaves in a tree. A general setup for the Horton laws is described below.

Let ${\mathcal{Q}}_{n}$ , $n\in\mathbb{N}$ , be a sequence of probability measures on $\mathcal{T}$ . We write $N_{j}^{({\mathcal{Q}}_{n})}$ for the number of branches of Horton-Strahler order $j\geq 1$ in a tree generated according to ${\mathcal{Q}}_{n}$ .

Definition 20 (Well-defined asymptotic Horton ratios).

We say that a sequence of probability measures $\{{\mathcal{Q}}_{n}\}_{n\in\mathbb{N}}$ has well-defined asymptotic Horton ratios if for each $j\geq 1$

[TABLE]

where ${\mathcal{N}}_{j}$ is a constant, called the asymptotic Horton ratio of the branches of order $j$ .

Sometimes it is possible to establish a stronger limit than in (48). One such example is the almost sure convergence in equation (130) of Sect. 6.6.2.

For a sequence of well-defined asymptotic Horton ratios ${\mathcal{N}}_{j}$ , the Horton law states that ${\mathcal{N}}_{j}$ decreases in a geometric fashion as $j$ goes to infinity. We consider three particular forms of geometric decay.

Definition 21 (Root, ratio, and strong Horton laws).

Consider a sequence $\{{\mathcal{Q}}_{n}\}_{n\in\mathbb{N}}$ of probability measures on $\mathcal{T}$ with well-defined asymptotic Horton ratios (Def. 20). Then, the sequence $\{{\mathcal{Q}}_{n}\}$ is said to obey

•

a root-Horton law if the following limit exists: $\lim\limits_{j\rightarrow\infty}\Big{(}{\mathcal{N}}_{j}\Big{)}^{-{1\over j}}=R$ ;

•

a ratio-Horton law if the following limit exists: $\lim\limits_{j\rightarrow\infty}{{\mathcal{N}}_{j}\over{\mathcal{N}}_{j+1}}=R$ ;

•

a strong Horton law if the following limit exists: $\lim\limits_{j\rightarrow\infty}\big{(}{\mathcal{N}}_{j}R^{j}\big{)}=const$ .

The constant $R$ is called the Horton exponent. In each case, we require the Horton exponent $R$ to be finite and positive.

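Observe that the Horton laws in Def. 21 are listed in order from weakest to strongest: a strong Horton law with a positive constant implies the ratio-Horton law with the same $R$, and a ratio-Horton law implies the root-Horton law.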
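To see that the three forms in Def. 21 genuinely differ, one can compute the corresponding estimates for an explicit sequence. The sketch below uses the hypothetical sequence $\mathcal{N}_j=jR^{-j}$, chosen for illustration rather than taken from any tree model: its root and ratio estimates converge to $R$, while $\mathcal{N}_jR^j=j$ diverges, so it obeys the root- and ratio-Horton laws but not the strong one.

```python
def root_estimates(N):
    """N_j^(-1/j) for j = 1, 2, ...; N[0] holds N_1."""
    return [n ** (-1.0 / j) for j, n in enumerate(N, start=1)]

def ratio_estimates(N):
    """N_j / N_{j+1} for j = 1, 2, ..."""
    return [N[i] / N[i + 1] for i in range(len(N) - 1)]

def strong_scaled(N, R):
    """N_j * R^j, which must tend to a positive constant under the strong law."""
    return [n * R ** j for j, n in enumerate(N, start=1)]

R = 4.0
N = [j * R ** (-j) for j in range(1, 61)]   # hypothetical N_j = j * R^(-j)

print(root_estimates(N)[-1])    # j^(-1/j) * R: tends to R, slowly
print(ratio_estimates(N)[-1])   # R * j / (j + 1): tends to R
print(strong_scaled(N, R)[-1])  # equals j: diverges, so no strong Horton law
```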

4.3 Entropy and information theory

Information-theoretic aspects of self-similar trees were not addressed until very recently. This section reviews recent results by Chunikhina [33, 34], where the entropy rate is computed, as a function of the respective parameters $R$ and $(a,c)$, for trees that satisfy the strong Horton law for branch numbers (Def. 18) and for Tokunaga self-similar trees (Def. 17).

Consider the subspace $\mathcal{T}_{N_{1},\ldots,N_{K}}$ of $\mathcal{BT}_{\rm plane}^{|}$ consisting of trees of a given order ${\sf ord}(T)=K$ with given admissible branch counts $N_{1},N_{2},\ldots,N_{K}$ (that is, $N_{K}=1$ and $N_{j}\geq 2N_{j+1}$):

[TABLE]

In [33], Chunikhina finds the size of $\mathcal{T}_{N_{1},\ldots,N_{K}}$, providing an alternative form of an expression first derived by Shreve [124].

Lemma 6 (Branch counting lemma, [33]).

[TABLE]
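Lemma 6 can be sanity-checked by brute force for small trees. The sketch below enumerates all plane binary trees with $n$ leaves ($C_{n-1}$ of them), computes their Horton-Strahler branch counts, and compares the tallies with Shreve's product formula $\prod_{k=1}^{K-1}2^{N_k-2N_{k+1}}\binom{N_k-2}{N_k-2N_{k+1}}$. We state that formula from memory as the classical expression of [124]; the form given in Lemma 6 may look different, and it is the enumeration, not the formula, that should be trusted.

```python
from collections import Counter
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def plane_trees(n):
    """All plane full binary trees with n leaves as nested tuples; a leaf is ()."""
    if n == 1:
        return ((),)
    return tuple((left, right)
                 for k in range(1, n)
                 for left in plane_trees(k)
                 for right in plane_trees(n - k))

def horton_strahler(tree, counts):
    """Return the order of `tree`; add 1 to counts[j] whenever a branch of order j starts."""
    if tree == ():
        counts[1] += 1                       # every leaf starts a branch of order 1
        return 1
    a = horton_strahler(tree[0], counts)
    b = horton_strahler(tree[1], counts)
    if a == b:                               # equal orders merge into a new, higher branch
        counts[a + 1] += 1
        return a + 1
    return max(a, b)

def branch_counts(tree):
    counts = Counter()
    order = horton_strahler(tree, counts)
    return tuple(counts[j] for j in range(1, order + 1))

def shreve(N):
    """Shreve's product formula, quoted from memory, for branch counts N = (N_1, ..., N_K)."""
    total = 1
    for k in range(len(N) - 1):
        d = N[k] - 2 * N[k + 1]
        total *= 2 ** d * comb(N[k] - 2, d)
    return total

for n in range(1, 10):
    tally = Counter(branch_counts(t) for t in plane_trees(n))
    assert sum(tally.values()) == comb(2 * n - 2, n - 1) // n   # C_{n-1} trees in total
    assert all(shreve(N) == cnt for N, cnt in tally.items())
print("Shreve's formula matches brute force for n <= 9")
```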

Subsequently, Lem. 6 is used to find the entropy rate for trees that satisfy the strong Horton law (Def. 18) with exponent $R>2$ .

Theorem 2 (Entropy rate for Horton self-similar trees, [33]).

For a given $R>2$ , let $T$ be a random tree, uniformly sampled from the space

[TABLE]

where $\epsilon\in(0,R)$ is a given small quantity. Then, the entropy rate

[TABLE]

where

[TABLE]

is the binary entropy function illustrated in Fig. 19(a). The entropy rate $\mathcal{H}_{\infty}(R)$ is illustrated in Fig. 19(b).

Notice that the trees in $\mathcal{T}_{R,K}$ satisfy the strong Horton law (Def. 18) with the Horton exponent $R$ , and $2R^{K-1}$ is the asymptotic number of nodes in a tree $T$ from $\mathcal{T}_{R,K}$ .

Remark 7.

It is easily verified that a random tree $T$ selected uniformly from the subspace

[TABLE]

of $\mathcal{BT}_{\rm plane}^{|}$ containing only the trees with $N$ leaves ( $2N$ nodes and $2N-1$ edges) is distributed as a random tree sampled from the critical plane Galton-Watson distribution $\mathcal{GW}_{\rm plane}\left({1\over 2},{1\over 2}\right)$ , conditioned on $\#T=2N-1$ , i.e.,

[TABLE]

Consequently, we have that

[TABLE]

The number $\big|\mathcal{BT}_{\rm plane}^{|}(N)\big|$ of distinct combinatorial shapes of rooted planted plane binary trees with $N$ leaves and $2N-1$ edges is given by $C_{N-1}$, where $C_{n}$ denotes the Catalan number defined as

[TABLE]

Using $\big|\mathcal{BT}_{\rm plane}^{|}(N)\big|=C_{N-1}$ and Stirling's formula, it is observed in [33] that the entropy rate for a tree $T^{\prime}$ selected uniformly from $\mathcal{BT}_{\rm plane}^{|}(N)$ is

[TABLE]

Thus, scaling by the asymptotic number of nodes $2R^{K-1}$ in Thm. 2 implies

[TABLE]

Indeed, by definition of the corresponding spaces,

[TABLE]

where the union is taken over $N$ ranging from

[TABLE]

and therefore

[TABLE]

Hence, since the following limits are known to converge, we have

[TABLE]

Moreover, scaling by the asymptotic number of nodes $2R^{K-1}$ in Thm. 2 allows one to represent $\mathcal{H}_{\infty}(R)$ as the limit ratio of the entropy for Horton self-similar trees with parameter $R$ to the entropy for uniformly selected binary trees. Specifically, let $T$ be a random tree sampled uniformly from the space $\mathcal{T}_{R,K}$ and let $T^{\prime}$ be a random tree sampled uniformly from the space $\mathcal{BT}_{\rm plane}^{|}(N)$ with $N=R^{K-1}$. Then, equations (49) and (55) imply that $\mathcal{H}_{\infty}(R)$ is the limit ratio of entropies as the space sizes grow with $K\rightarrow\infty$:

[TABLE]
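The Catalan asymptotics behind the entropy of $T^{\prime}$ can be checked numerically. The sketch below computes $\log_2 C_{N-1}$ exactly and divides it by the $2N$ nodes of the tree; measuring entropy in bits with this per-node normalization is our assumption about the convention, chosen to match the $2R^{K-1}$ scaling used with Thm. 2. Since $C_n\sim 4^n/(n^{3/2}\sqrt{\pi})$ by Stirling's formula, the ratio tends to $1$, but only at rate $O(\log N/N)$.

```python
from math import comb, log2, pi

def catalan(n):
    """Catalan number C_n = binom(2n, n) / (n + 1), computed exactly."""
    return comb(2 * n, n) // (n + 1)

def uniform_tree_entropy_rate(N):
    """Entropy in bits of a uniform tree with N leaves, per node (2N nodes; assumed convention)."""
    return log2(catalan(N - 1)) / (2 * N)

def stirling_log2_catalan(n):
    """log2 of the Stirling approximation 4^n / (n^(3/2) * sqrt(pi))."""
    return 2 * n - 1.5 * log2(n) - 0.5 * log2(pi)

for N in (10, 100, 1000, 10000):
    print(N, uniform_tree_entropy_rate(N))   # increases slowly towards 1
```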

As an important consequence of Thm. 2, the parameter $R=4$ holds a special place among all Horton exponents $R\in[2,\infty)$:

[TABLE]

Not surprisingly, $R=4$ is the parameter value for the strong Horton law results we will encounter in Sect. 5, primarily in the context of the critical binary Galton-Watson tree $\mathcal{GW}\left({1\over 2},{1\over 2}\right)$ . Indeed, as stated in Rem. 7, the tree $T^{\prime\prime}=\textsc{shape}(T^{\prime})\in\mathcal{BT}^{|}$ in (56) is a random tree sampled from the Galton-Watson distribution $\mathcal{GW}\left({1\over 2},{1\over 2}\right)$ conditioned on $\#T^{\prime\prime}=2N-1$ .

In [34], Chunikhina extended the results of [33] by counting the number of trees with given merger numbers $N_{i,j}$ (see Sect. 3.3) and by finding the entropy rates for the Tokunaga self-similar trees (Def. 17) as a function of the parameters $(a,c)$. For a given integer $K>1$, consider a finite sequence of admissible branch counts $\{N_{i}\}_{i=1,\ldots,K}$ and a finite sequence of admissible branch numbers $\{N_{i,j}\}_{1\leq i<j\leq K}$. Admissibility means that for all $i\leq K-1$,

[TABLE]

as all $N_{i}$ branches of Horton-Strahler order $i$ have to merge into a higher order branch (either two branches of order $i$ merge and originate a branch of order $i+1$ , or a branch of order $i$ merges into a branch of order $j>i$ ). Consider the subspace

[TABLE]

Lemma 7 (Side branch counting lemma, [34]).

[TABLE]

Lemma 7 is used to obtain the following asymptotic results. Consider a Tokunaga self-similar tree with parameters $(a,c)$. Such a tree satisfies the strong Horton law for mean branch numbers (Def. 19) with the Horton exponent (37)

[TABLE]
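For numerical work it helps to have (37) in executable form. The sketch below takes the Horton exponent of a Tokunaga self-similar tree to be the larger root of $R^2-(2+a+c)R+2c=0$, i.e. $R(a,c)=\big((2+c+a)+\sqrt{(2+c+a)^2-8c}\big)/2$. This is the standard closed form as we recall it, not a quotation of (37), but it reproduces three values stated in this document: $R(1,2)=4$ (Thm. 4), $R=2c$ on the line $a=c-1$, and $R\approx4.6$ for Peckham's estimate $(a,c)\approx(1.2,2.5)$ (Sect. 4.4.1).

```python
from math import sqrt

def tokunaga_horton_exponent(a, c):
    """Larger root of R^2 - (2 + a + c) R + 2c = 0 (assumed form of (37))."""
    b = 2.0 + a + c
    # the discriminant b^2 - 8c >= (c - 2)^2 >= 0 whenever a >= 0
    return (b + sqrt(b * b - 8.0 * c)) / 2.0

print(tokunaga_horton_exponent(1.0, 2.0))   # critical binary Galton-Watson: 4.0
print(tokunaga_horton_exponent(2.0, 3.0))   # on the line a = c - 1: 2c = 6.0
print(tokunaga_horton_exponent(1.2, 2.5))   # Peckham's river estimate: about 4.62
```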

Next, similarly to $\mathcal{T}_{R,K}$ , one can define the space $\mathcal{T}_{a,c,K}$ of asymptotically Tokunaga self-similar trees of order $K$ . Informally, this space includes the trees in $\mathcal{BT}_{\rm plane}^{|}$ such that

[TABLE]

where $R=R(a,c)$ , and the asymptotic equality $\sim$ is taken as $K\to\infty$ .

Theorem 3 (Entropy rate for Tokunaga self-similar trees, [34]).

For given $a,c>0$ , let $T$ be a random tree, uniformly sampled from the space $\mathcal{T}_{a,c,K}$ . Then, the entropy rate

[TABLE]

Figure 20(a) illustrates the entropy rate $\mathcal{H}_{\infty}(a,c)$ .

If $a=c-1$, then $R=2c$ by (37), and equation (3) simplifies, leading to the following corollary.

Corollary 1 ([34]).

Let $T$ be a random tree, uniformly sampled from the space $\mathcal{T}_{a,c,K}$ with $c>1$ and $a=c-1$ . Then $T$ satisfies the strong Horton law (29) with $R=2c$ , and the entropy rate is given by

[TABLE]

where $H(z)$ is the binary entropy function (50) and $\mathcal{H}_{\infty}(R)$ is defined by (49).

Figure 20(b) illustrates this result, by showing how the difference of entropy rates $\mathcal{H}_{\infty}(a,c)-\mathcal{H}_{\infty}(R)$ decreases away from the line $a=c-1$ . The special place for the line $a=c-1$ within the parameter space of the Tokunaga self-similar random trees was observed earlier in [139, 83, 84]. See Remark 11. The constraint $a=c-1$ will reappear in many instances in Sect. 6 of the present work.

Finally, the maximum value $\max\mathcal{H}_{\infty}(a,c)=1$ is achieved at the special point $(a,c)=(1,2)$ of the special line $a=c-1$ . Once again, this is not surprising as $(a,c)=(1,2)$ is the parameter value for the Tokunaga self-similarity results of Sect. 5, presented in the context of the critical binary Galton-Watson trees $\mathcal{GW}\left({1\over 2},{1\over 2}\right)$ and related processes. We recall that the combinatorial shape $T^{\prime\prime}=\textsc{shape}(T^{\prime})\in\mathcal{BT}^{|}$ of the random binary tree $T^{\prime}$ in (55) is distributed according to $\mathcal{GW}\left({1\over 2},{1\over 2}\right)$ conditioned on $\#T^{\prime\prime}=2N-1$ .

4.4 Applications

A quantitative understanding of branching patterns is instrumental in hydrology [120, 132, 96, 15, 27, 76], geomorphology [38, 67], statistical seismology [13, 135, 69, 154, 60, 151, 149], statistical physics of fracture [121], vascular analysis [72], brain studies [32], ecology [30], biology [137], and beyond, which encourages a rigorous treatment. Introduced in hydrology to describe the dendritic structure of river networks, which is among the most evident examples of natural branching, the Horton-Strahler [70, 129] and Tokunaga [133] indexing schemes have been rediscovered and used in other fields. Subsequently, the Horton law (Def. 18) and Tokunaga self-similarity (Def. 17) have been empirically or rigorously established in numerous observed and modeled systems [108]. These include hydrology (see Sect. 4.4.1), vein structure of botanical leaves [108, 137], diffusion limited aggregation [111, 97, 147], two-dimensional site percolation [136, 145, 152, 153], a hierarchical coagulation model of Gabrielov et al. [58] introduced in the framework of self-organized criticality, and a random self-similar network model of Veitzer and Gupta [139] developed as an alternative to Shreve's random topology model for river networks. The Horton exponent commonly reported in empirical studies is within the range $3<R<6$. Curiously, it has been observed in [83] that the critical Tokunaga model (Sect. 6.5) with this range of Horton exponents generates trees with fractal dimension in the range $\approx(1.6,3)$, which includes all the trees that may exist in a three-dimensional world, excluding the range $<1.6$ that corresponds to almost “linear” and probably less studied trees.

4.4.1 Hydrology

An illuminating natural example of Horton laws and Tokunaga self-similarity is given by the combinatorial structure of river networks (Figs. 2,3). The hydrological Horton law was first described by Robert E. Horton [70] who noticed that the empirical ratio $N_{K}/N_{K+1}$ in river streams is close to $4$ . This observation has been strongly corroborated in numerous observational studies [75, 124, 91, 113, 131, 62, 120, 101, 134]. See Barndorff-Nielsen [17] for a 1993 survey for probabilists.

Write $Z_{K}$ for the value of a selected statistic $Z$ averaged over basins/channels of order $K$. This can be basin area, basin magnitude (number of leaves in the tree that describes the basin), the length of the longest channel, the total channel length, etc. The Horton law approximates the growth of $Z_{K}$ with order as a geometric sequence: $Z_{K}\propto R_{Z}^{K}$ with $R_{Z}>1$. Informally, this suggests that the order $K$ of a channel (branch) or a subbasin (subtree) is proportional to $\ln(Z_{K})$, where $Z_{K}$ can be interpreted as the channel/basin “size”. If a statistic $Z$ satisfies the Horton law with exponent $R_{Z}$, and the branch counts $N_{K}$ satisfy the Horton law (1) with Horton exponent $R$, then

[TABLE]

A similar power relation holds for any pair of statistics that satisfy the Horton law. A well-studied example is Hack's law, which relates the length $L$ of the longest stream to the basin area $A$ via $L\propto A^{h}$ with $h\approx 0.6$ [119].
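In practice, a Horton exponent $R_Z$ is estimated by fitting a straight line to $\ln Z_K$ against $K$. Since $K\approx \ln Z_K/\ln R_Z$, two statistics $Z$ and $W$ that obey Horton laws are related by $W_K\propto Z_K^{\ln R_W/\ln R_Z}$; for Hack's law this reads $h=\ln R_L/\ln R_A$. The sketch below applies this to synthetic, exactly geometric per-order averages; the numbers are made up for illustration and are not measurements.

```python
from math import exp, log

def fit_horton_exponent(orders, values):
    """Least-squares slope of ln(value) on order; returns exp(slope) as the estimate of R_Z."""
    n = len(orders)
    ys = [log(v) for v in values]
    mx, my = sum(orders) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(orders, ys))
    sxx = sum((x - mx) ** 2 for x in orders)
    return exp(sxy / sxx)

orders = [1, 2, 3, 4, 5, 6]
# Hypothetical per-order averages: channel lengths with R_L = 2.5, basin areas with R_A = 4.6.
lengths = [0.3 * 2.5 ** k for k in orders]
areas = [0.1 * 4.6 ** k for k in orders]

R_L = fit_horton_exponent(orders, lengths)
R_A = fit_horton_exponent(orders, areas)
h = log(R_L) / log(R_A)        # exponent in L ~ A^h implied by the two Horton laws
print(R_L, R_A, h)
```

With these made-up values the implied exponent is $\ln 2.5/\ln 4.6\approx 0.60$; on real basin data the per-order averages are noisy, and the fit is usually restricted to the orders where the log-linear trend holds.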

Furthermore, it has been shown that river networks are closely approximated by a two-parameter Tokunaga self-similar model (Def. 17) with parameters that are independent of the river's geographic location [133, 113, 38, 155]. With only two parameters, the Tokunaga model closely predicts the values of the Horton exponents for multiple basin statistics (see Fig. 3).

Discovery of the Horton law prompted exploration of various branching models, the most popular of which is the critical binary Galton-Watson tree (Sect. 5), also known in hydrology as Shreve's random topology model [124, 125]; it is conditionally equivalent to the uniform distribution on planar binary trees with a fixed number of leaves [116]. This model has the Horton exponent $R=4$ and Tokunaga parameters $(a,c)=(1,2)$; see Thm. 4. For a long time, the critical binary Galton-Watson tree remained the only well-known probability model for which Horton and Tokunaga self-similarity had been rigorously established, and whose Horton-Strahler ordering had received attention in the literature [124, 125, 73, 17, 36, 113, 112, 143, 148, 29]. The model has been particularly popular in hydrology as an approximation to the topology of observed river networks [132]. Scott Peckham [113] was the first to notice explicitly, by performing a high-precision extraction of river channels for the Kentucky River, Kentucky, and the Powder River, Wyoming, that the Horton exponents and Tokunaga parameters of the observed rivers deviate significantly from those of the Galton-Watson model. He reported values $R\approx 4.6$ and $(a,c)\approx(1.2,2.5)$ and emphasized the importance of studying a broad range of Horton exponents and Tokunaga parameters. The general interest in fractals and self-similar structures in the natural sciences during the 1990s resulted in a quest, mainly inspired and led by Donald Turcotte, for Tokunaga self-similar tree graphs of diverse origin. As a result, Horton and Tokunaga self-similarity, with a broad range of respective parameters, has been empirically or rigorously established in numerous observed and modeled systems, well beyond river networks.

4.4.2 Computer science

The Horton-Strahler orders are known in computer science as the register function or register number. They first appeared in the 1958 paper by Ershov [49] as the minimal number of memory registers required for evaluating a binary arithmetic expression.

A study of Flajolet et al. [55] concerns calculating the average register function in a random plane planted binary tree with $n$ leaves. That is, let the random tree $T$ be uniformly sampled from all $C_{n-1}$ trees in the subspace $\mathcal{BT}_{\rm plane}^{|}(n)$ of $\mathcal{BT}_{\rm plane}^{|}$ defined in (51), where $C_{n}$ is the Catalan number (54). Following Rem. 7, we know that the combinatorial shape $\textsc{shape}(T)\in\mathcal{BT}^{|}$ of such a binary tree $T$ can also be obtained by sampling from the Galton-Watson distribution $\mathcal{GW}\left({1\over 2},{1\over 2}\right)$ conditioned on $\#T=2n-1$. The work [55] finds the average register function (Horton-Strahler order) in a random binary tree $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf Unif}\big{(}\mathcal{BT}_{\rm plane}^{|}(n)\big{)}$,

[TABLE]

where $v_{2}(n)$ is known as the dyadic valuation of $n$ . Specifically, the dyadic valuation of $n$ is the cardinality of the inverse image of

[TABLE]

i.e., $v_{2}(n)=\big{|}\{(p,k)\in\mathbb{Z}_{+}\times\mathbb{N}~{}:~{}k2^{p}=n\}\big{|}$ .

In addition, Flajolet et al. [55] proved that for $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf Unif}\big{(}\mathcal{BT}_{\rm plane}^{|}(n)\big{)}$ ,

[TABLE]

where $D(\cdot)$ is a particular continuous periodic function of period one, explicitly derived in [55]. We illustrate Eq. (59) below in Fig. 50(a), which closely reproduces Fig. 6 from the original paper by Flajolet et al. [55]. Equation (59) is related to the tree size asymptotic (35) of Thm. 1, with the Horton exponent $R=4$ .
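Exact counts make the average in (58) and the oscillation in (59) easy to explore. The sketch below computes, by dynamic programming over the split at the root, the number $a(n,k)$ of plane binary trees with $n$ leaves and Horton-Strahler order $k$, and from it the exact value of ${\sf E}[{\sf ord}(T)]$ for $T\stackrel{d}{\sim}{\sf Unif}\big(\mathcal{BT}_{\rm plane}^{|}(n)\big)$. It does not reproduce the closed form of [55]; it only produces numbers against which that form, or the behavior of ${\sf E}[{\sf ord}(T)]-\log_4 n$, can be compared. Register-function conventions in the computer science literature may differ from ${\sf ord}$ by an additive constant.

```python
from fractions import Fraction
from math import comb, log

def prefix_below(row):
    """below[k] = row[0] + ... + row[k-1]: number of trees of order strictly less than k."""
    below, s = [], 0
    for c in row:
        below.append(s)
        s += c
    return below

def strahler_counts(n_max):
    """a[n][k] = number of plane binary trees with n leaves and Horton-Strahler order k."""
    k_max = n_max.bit_length() + 1              # order k needs at least 2**(k-1) leaves
    a = [[0] * (k_max + 1) for _ in range(n_max + 1)]
    below = [[0] * (k_max + 1) for _ in range(n_max + 1)]
    a[1][1] = 1                                 # the single-leaf tree has order 1
    below[1] = prefix_below(a[1])
    for n in range(2, n_max + 1):
        row = a[n]
        for i in range(1, n):                   # i leaves on the left, n - i on the right
            l, r = a[i], a[n - i]
            lb, rb = below[i], below[n - i]
            for k in range(2, k_max + 1):
                # two subtrees of order k-1 merge into order k; otherwise the larger order k is kept
                row[k] += l[k - 1] * r[k - 1] + l[k] * rb[k] + lb[k] * r[k]
        below[n] = prefix_below(row)
    return a

def mean_order(a, n):
    """Exact mean Horton-Strahler order of a uniform plane binary tree with n leaves."""
    row = a[n]
    return Fraction(sum(k * c for k, c in enumerate(row)), sum(row))

a = strahler_counts(256)
# Sanity check: every tree is counted once, so row n sums to the Catalan number C_{n-1}.
assert all(sum(a[n]) == comb(2 * n - 2, n - 1) // n for n in range(1, 257))
for m in range(1, 5):
    n = 4 ** m
    print(n, float(mean_order(a, n)), float(mean_order(a, n)) - log(n, 4))
```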

For more on register functions see [56, 117, 104, 41, 64] and references therein.

5 Critical binary Galton-Watson tree

The critical binary Galton-Watson tree is pivotal for the theory of random trees and for diverse applications because of its transparent generation process and multiple symmetries. This section summarizes some properties of this tree used in our further discussion.

5.1 Combinatorial case

Here we discuss the combinatorial binary Galton-Watson trees.

5.1.1 Horton and Tokunaga self-similarities

Burd, Waymire, and Winn [29] were the first to recognize the special position held by the critical binary tree with respect to Horton pruning in the space of Galton-Watson distributions $\mathcal{GW}(\{q_{k}\})$ on $\mathcal{T}^{|}$. We now state the main result of [29] using the language of the present work.

Theorem 4 (Horton self-similarity of Galton-Watson trees, [29]).

Consider a collection of Galton-Watson measures $\mathcal{GW}(\{q_{k}\})$ on $\mathcal{T}^{|}$ . The following statements are equivalent:

(a)

A distribution is Horton self-similar (Def. 10);

(b)

A distribution is mean Horton self-similar (Def. 14,16);

(c)

A distribution is Tokunaga self-similar (Def. 17);

(d)

A distribution is critical binary: $q_{0}=q_{2}=1/2$ .

Furthermore, the critical binary distribution has Tokunaga sequence $T_{j}=2^{j-1}$ , $j\geq 1$ , which corresponds to Tokunaga self-similarity with $(a,c)=(1,2)$ and strong Horton law with exponent $R=4$ .

The following statement provides a useful characterization of the critical binary Galton-Watson tree.

Proposition 6 ([29]).

Suppose $T\stackrel{{\scriptstyle d}}{{\sim}}\mathcal{GW}(1/2,1/2)$ . Then, the tree order ${\sf ord}(T)$ has geometric distribution:

[TABLE]

Furthermore, let $b_{j}$ be a branch of order $j\geq 2$ in $T$ selected uniformly and randomly among all branches of order $j$ in $T$ . Then, the total number $m_{j}\geq 0$ of side branches within the branch $b_{j}$ is geometrically distributed:

[TABLE]

In particular,

[TABLE]

where $T_{i}=2^{i-1}$ , $i\geq 1$ , are the Tokunaga coefficients. Conditioned on $m_{j}$ , each side branch within $b_{j}$ is assigned order $i$ independently of other side branches with probability

[TABLE]
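The geometric law of ${\sf ord}(T)$ in Proposition 6 can be checked with a short recursion. For a binary Galton-Watson tree $\mathcal{GW}(q_0,q_2)$, conditioning on the first split gives $p_1=q_0$ and $p_k=q_2\,(p_{k-1}^2+2p_kS_k)$ with $S_k=\sum_{j<k}p_j$, hence $p_k=q_2p_{k-1}^2/(1-2q_2S_k)$, where $p_k={\sf P}({\sf ord}(T)=k)$. This derivation is ours, not quoted from [29]. With exact rational arithmetic the critical case returns $p_k=2^{-k}$, while a subcritical choice such as $q_2=1/3$ gives $2/3,\,4/15,\,16/255,\dots$, which is not geometric.

```python
from fractions import Fraction

def order_distribution(q2, k_max):
    """P(ord(T) = k), k = 1..k_max, for T ~ GW(q0, q2) with q0 = 1 - q2.

    Exact when q2 is a Fraction. Uses p_k = q2 p_{k-1}^2 / (1 - 2 q2 S_k), S_k = P(ord < k).
    """
    q0 = 1 - q2
    p = [q0]                       # P(ord = 1): the stem ends in a leaf
    for _ in range(2, k_max + 1):
        S = sum(p)                 # P(ord < k)
        p.append(q2 * p[-1] ** 2 / (1 - 2 * q2 * S))
    return p

crit = order_distribution(Fraction(1, 2), 20)
print(", ".join(map(str, crit[:4])))              # 1/2, 1/4, 1/8, 1/16
sub = order_distribution(Fraction(1, 3), 4)
print([sub[i + 1] / sub[i] for i in range(3)])    # successive ratios differ: not geometric
```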

Notably, critical non-binary Galton-Watson trees converge to the critical binary tree under consecutive Horton pruning, as described in the following statement.

Theorem 5 (Attraction property of critical binary Galton-Watson tree, [29]).

Suppose a Galton-Watson measure $\mu\equiv\mathcal{GW}(\{q_{k}\})$ on $\mathcal{T}^{|}$ satisfies the following conditions:

•

The measure $\mu$ is critical, i.e. $q_{1}\neq 1$ and $\sum_{k}kq_{k}=1$ ;

•

The measure $\mu$ has a.s. bounded offspring number, i.e., there exists $j_{0}\geq 2$ such that $q_{j}=0$ for any $j\geq j_{0}$.

Then, for any $\tau\in\mathcal{T}^{|}$

[TABLE]

where $\mu^{*}$ denotes the critical binary Galton-Watson measure on $\mathcal{T}^{|}$ :

[TABLE]

The Markov structure of the Galton-Watson tree $T\stackrel{{\scriptstyle d}}{{\sim}}\mathcal{GW}(\{q_{k}\})$ ensures the existence of the following additional properties:

(i)

The forest of trees obtained by removing the edges and the vertices below combinatorial depth $d\geq 0$ has the same frequency structure as the original space $\mathcal{GW}(\{q_{k}\})$ ;

(ii)

A subtree rooted in a uniform random vertex of $T$ has the same distribution as $T$ ; and

(iii)

The forest of trees obtained by considering subtrees rooted at every vertex of $T$ approximates the frequency structure of the entire space of trees when the order of $T$ increases.

We define these properties more formally in Sect. 6.7. Combined with the Horton self-similarity of Thm. 4, they further highlight very special symmetries of the critical binary Galton-Watson distribution $\mathcal{GW}({1\over 2},{1\over 2})$. Stated loosely, this distribution is invariant with respect to various forms of cutting, either from the leaves down or from the root up. Moreover, this is the only distribution in the family of Galton-Watson distributions $\mathcal{GW}(\{q_{k}\})$ that enjoys all these invariances. Analysis of real-world data (e.g. [113, 108]), however, reveals self-similar tree-like structures with Tokunaga parameters and Horton exponents different from those of the critical binary Galton-Watson model. This motivates a search for invariant tree models outside of the Galton-Watson family. In Sect. 6.5, we construct a one-parameter family of trees, called critical Tokunaga trees, that inherit all the invariance properties mentioned in this section and include the critical binary Galton-Watson tree as a special case. In particular, this family generates self-similar trees with Horton exponents $2\leq R<\infty$.

5.1.2 Dynamics of branching probabilities under Horton pruning

The following result of Burd et al. [29] clarifies the Horton self-similarity of the critical binary Galton-Watson tree and its absence in the non-critical case.

Theorem 6 (Dynamics of branching [29, Proposition 2.1]).

Consider a critical or subcritical combinatorial binary Galton-Watson probability measure $\mu_{0}=\mathcal{GW}(q_{0},q_{2})$ on $\mathcal{BT}^{|}$ , i.e. require $q_{0}+q_{2}=1$ and $q_{2}\leq 1/2$ . Construct a recursion by repeatedly applying Horton pruning operation $\mathcal{R}$ as follows. Starting with $k=0$ , and for each consecutive integer, let $\nu_{k}=\mathcal{R}_{*}(\mu_{k})$ be the pushforward probability measure induced by the pruning operator, i.e.,

[TABLE]

and set

[TABLE]

Then for each $k\geq 0$ , distribution $\mu_{k}(T)$ is a binary Galton-Watson distribution $\mathcal{GW}(q_{0}^{(k)},q_{2}^{(k)})$ with $q_{0}^{(k)}$ and $q_{2}^{(k)}$ constructed recursively as follows: start with $q_{0}^{(0)}=q_{0}$ and $q_{2}^{(0)}=q_{2}$ , and let

[TABLE]

Consequently, a combinatorial binary Galton-Watson probability distribution $\mathcal{GW}(q_{0},q_{2})$ is prune-invariant as in Def. 8 if and only if it is critical, i.e.,

[TABLE]
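A compact way to see this dichotomy is to follow the branching probability through repeated pruning. Given that the pruned tree is non-empty, each branching vertex of the original tree has either two leaf children (it becomes a leaf after pruning), exactly one leaf child (it disappears under series reduction), or no leaf children (it stays a branching vertex). By this short argument, which is our own and which we expect to agree with the recursion of [29], $q_2^{(k+1)}=(q_2^{(k)})^2/\big((q_0^{(k)})^2+(q_2^{(k)})^2\big)$, so the odds $q_2^{(k)}/q_0^{(k)}$ are squared at every step. The sketch below iterates this map: $1/2$ is a fixed point, and any subcritical start collapses to $0$.

```python
def prune_step(q2):
    """One Horton pruning of GW(1 - q2, q2), conditioned on survival (assumed recursion)."""
    q0 = 1.0 - q2
    return q2 * q2 / (q0 * q0 + q2 * q2)

def prune_orbit(q2, steps):
    """Branching probabilities q2^(0), q2^(1), ..., q2^(steps) under repeated pruning."""
    orbit = [q2]
    for _ in range(steps):
        orbit.append(prune_step(orbit[-1]))
    return orbit

print(prune_orbit(0.5, 5))     # critical: stays at 0.5
print(prune_orbit(0.45, 5))    # subcritical: decreases to 0 very quickly
```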

5.1.3 The Central Limit Theorem and the strong Horton law for branch counts

For a given $N\in\mathbb{N}$ , consider $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf Unif}\big{(}\mathcal{BT}_{\rm plane}^{|}(N)\big{)}$ . Following Remark 7, we know that $\textsc{shape}(T)\stackrel{{\scriptstyle d}}{{\sim}}\left(\mathcal{GW}\left({1\over 2},{1\over 2}\right)\Big{|}\#T=2N-1\right)$ . The branch counts

[TABLE]

are integer valued random variables induced by $T$ . They are the same for $T$ and $\textsc{shape}(T)$ , i.e., $N_{j}^{(N)}[\textsc{shape}(T)]=N_{j}^{(N)}[T]$ . The following Law of Large Numbers was proved in Wang and Waymire [143] (Theorem 2.1).

Theorem 7 (LLN for order two branches, [143]).

For a random tree $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf Unif}\big{(}\mathcal{BT}_{\rm plane}^{|}(N)\big{)}$ ,

[TABLE]

Recall that we know from Theorem 6 that the critical binary Galton-Watson tree is invariant under the Horton pruning operation $\mathcal{R}$ . Thus, the strong Horton law for branch numbers is deduced from Theorem 7 as follows.

Corollary 2 (The strong Horton law for branch counts).

For a random tree $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf Unif}\big{(}\mathcal{BT}_{\rm plane}^{|}(N)\big{)}$ and for all $j\in\mathbb{N}$ ,

[TABLE]

Proof.

For a fixed integer $k>1$ and a tree $T^{\rm GW}\stackrel{{\scriptstyle d}}{{\sim}}\mathcal{GW}\left({1\over 2},{1\over 2}\right)$ , we have for any positive integers $N$ and $M\leq 2^{-(k-1)}N$ ,

[TABLE]

as $\mathcal{R}^{k-1}(T^{\rm GW})\stackrel{{\scriptstyle d}}{{=}}T^{\rm GW}$ by the Horton prune-invariance Theorem 6 (and a more general statement in Theorem 24 of Sect. 9.4). The first equality in (64) is easily verified from the permutability of attachments of smaller order branches to larger order branches. Specifically, the event $N_{k}^{(N)}[T^{\rm GW}]=M$ is equivalent to the event that the pruned tree $\mathcal{R}^{k-1}\big{(}T^{\rm GW}\big{)}$ has $\#\mathcal{R}^{k-1}\big{(}T^{\rm GW}\big{)}=2M-1$ edges. Thus, conditioned on the combinatorial shape of $\mathcal{R}^{k-1}\big{(}T^{\rm GW}\big{)}$, all complete subtrees $T_{v}$ (see Def. 5(6)) of $T$ such that ${\sf ord}(T_{v})={\sf ord}(v)<k$ and ${\sf ord}({\sf parent}(v))\geq k$ can be attached to the edges and leaves of $\mathcal{R}^{k-1}\big{(}T^{\rm GW}\big{)}$ in the same number of ways for each $\mathcal{R}^{k-1}\big{(}T^{\rm GW}\big{)}$ with $\#\mathcal{R}^{k-1}\big{(}T^{\rm GW}\big{)}=2M-1$ edges.

Thus, for a fixed $k\in\mathbb{N}$ and a random tree

[TABLE]

we have by (64),

[TABLE]

for all $M\leq 2^{-(k-1)}N$ . Hence, Thm. 7 implies

[TABLE]

Next, we set ${0\over 0}=0$, since here $N_{k}^{(N)}[T]\leq N_{k-1}^{(N)}[T]$, and

[TABLE]

Then, as $\lim\limits_{N\rightarrow\infty}{\sf P}\big{(}{\sf ord}(T)<k\big{)}=0$, we have

[TABLE]

Finally, iterating (65), we obtain

[TABLE]

∎
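The strong Horton law for branch counts is easy to observe in simulation. The sketch below draws a uniform random plane binary tree with $N$ leaves using Rémy's algorithm, a standard exact sampler that we substitute for conditioning $\mathcal{GW}\left({1\over 2},{1\over 2}\right)$ on its size (by Rem. 7 the two give the same combinatorial shapes), and compares $N_j^{(N)}[T]/N$ with $4^{1-j}$, the value we expect from Thm. 7 and Cor. 2 with $R=4$. The seed and the size $N=20000$ are arbitrary choices.

```python
import random
from collections import Counter

def remy_tree(n_leaves, rng):
    """Uniform random plane full binary tree with n_leaves leaves (Remy's algorithm).

    children[v] is None for a leaf and [left, right] for a branching vertex.
    """
    children, parent, root = [None], [None], 0
    for _ in range(n_leaves - 1):
        x = rng.randrange(len(children))       # every current vertex is equally likely
        v, leaf = len(children), len(children) + 1
        children += [None, None]
        parent += [parent[x], v]               # v takes x's place; the new leaf hangs from v
        children[v] = [x, leaf] if rng.random() < 0.5 else [leaf, x]
        p = parent[x]
        if p is None:
            root = v
        else:
            children[p][children[p].index(x)] = v
        parent[x] = v
    return children, root

def branch_counts(children, root):
    """Counts N_j of Horton-Strahler branches of each order j (iterative post-order)."""
    order = [0] * len(children)
    counts = Counter()
    stack = [(root, False)]
    while stack:
        v, expanded = stack.pop()
        ch = children[v]
        if ch is None:                         # a leaf starts a branch of order 1
            order[v] = 1
            counts[1] += 1
        elif not expanded:
            stack.append((v, True))
            stack.extend((c, False) for c in ch)
        else:
            a, b = order[ch[0]], order[ch[1]]
            order[v] = a + 1 if a == b else max(a, b)
            if a == b:                         # equal orders merge: a new branch starts
                counts[a + 1] += 1
    return counts

rng = random.Random(2019)
N = 20000
children, root = remy_tree(N, rng)
counts = branch_counts(children, root)
for j in range(1, 5):
    print(j, counts[j] / N, 4.0 ** (1 - j))
```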

Following Theorem 7, the corresponding Central Limit Theorem was proved in Wang and Waymire [143] (Theorem 2.4).

Theorem 8 (CLT for order two branches, [143]).

For a random tree $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf Unif}\big{(}\mathcal{BT}_{\rm plane}^{|}(N)\big{)}$ ,

[TABLE]

Next, using the pruning framework, the following Central Limit Theorem for $N_{j}^{(N)}[T]$ is readily obtained as a direct consequence of the original Theorem 8 of Wang and Waymire [143] and the Horton prune-invariance (Def. 8) of $\mathcal{GW}\left({1\over 2},{1\over 2}\right)$ as stated in Theorem 6, and a more general statement that will appear in Theorem 24 of Sect. 9.4.

Corollary 3 (CLT for branch numbers, [146]).

For a random tree $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf Unif}\big{(}\mathcal{BT}_{\rm plane}^{|}(N)\big{)}$ ,

[TABLE]

where we set ${0\over 0}=0$ .

Proof.

Pruning $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf Unif}\big{(}\mathcal{BT}_{\rm plane}^{|}(N)\big{)}$ iteratively $j-1$ times, we obtain a tree $\mathcal{R}^{j-1}(T)$ that, conditioned on $N_{j}^{(N)}[T]$, is distributed as ${\sf Unif}\Big{(}\mathcal{BT}_{\rm plane}^{|}\big{(}N_{j}^{(N)}[T]\big{)}\Big{)}$, where for the case when $j>{\sf ord}(T)$ and $N_{j}^{(N)}[T]=0$, we set $\mathcal{BT}_{\rm plane}^{|}(0):=\{\phi\}$. Hence, Theorem 8 immediately implies

[TABLE]

Thus, substituting (63) into (68), we obtain (67). ∎

The limit (67) was derived by Yamamoto [146] directly, after a series of technically involved calculations.

5.2 Metric case

In this section we turn to the trees in $\mathcal{BL}^{|}$ . In particular, we will assign i.i.d. exponential lengths to the edges of a critical plane binary Galton-Watson tree $\mathcal{GW}_{\rm plane}({1\over 2},{1\over 2})$ in $\mathcal{T}^{|}$ , thus obtaining what will be called the exponential critical binary Galton-Watson tree.

Definition 22 (Exponential critical binary Galton-Watson tree).

We say that a random tree $T\in\mathcal{BL}_{\rm plane}^{|}$ is an exponential critical binary Galton-Watson tree with (edge length) parameter $\lambda>0$ , and write $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\lambda)$ , if the following conditions are satisfied:

(i)

p-shape($T$) is a critical plane binary Galton-Watson tree $\mathcal{GW}_{\rm plane}({1\over 2},{1\over 2})$;

(ii)

conditioned on a given p-shape($T$), the edge lengths of $T$ are sampled as independent ${\sf Exp}(\lambda)$ random variables, i.e., random variables with probability density function (p.d.f.)

[TABLE]

The branching process that generates an exponential critical binary Galton-Watson tree is known as the continuous-time Galton-Watson process, and is sometimes simply called a Markov branching process [66].

5.2.1 Length of a Galton-Watson random tree ${\sf GW}(\lambda)$

Recall the modified Bessel functions of the first kind

[TABLE]

Lemma 8.

Suppose $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\lambda)$ is an exponential critical binary Galton-Watson tree with parameter $\lambda$ . The total length $\textsc{length}(T)$ of the tree $T$ has the p.d.f.

[TABLE]

Proof.

Recall that the number of different combinatorial shapes of a planted plane binary tree with $n+1$ leaves, and therefore $2n+1$ edges, is given by the Catalan number (54), i.e.,

[TABLE]

The total length of $2n+1$ edges is a gamma random variable with parameters $\lambda$ and $2n+1$ and density function

[TABLE]

Hence, the total length of the tree $T$ has the p.d.f.

[TABLE]

∎

Next, we compute the Laplace transform of $\ell(x)$. By the summation formula in the proof of Lemma 8,

[TABLE]

where we let $Z={\lambda\over 2(\lambda+s)}$, and the generating function of the Catalan numbers

[TABLE]

is well known. Therefore

[TABLE]

Note that the Laplace transform $\mathcal{L}\ell(s)$ can also be derived from the total probability formula

[TABLE]

where $\phi_{\lambda}(x)$ is the exponential p.d.f. (69). Thus, $\mathcal{L}\ell(s)$ solves

[TABLE]
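The two descriptions of $\mathcal{L}\ell(s)$ can be cross-checked numerically. Summing the Catalan-weighted gamma densities in the proof of Lemma 8 gives, by our own calculation, $\ell(x)=e^{-\lambda x}I_1(\lambda x)/x$, and the solution of the fixed-point equation implied by the total probability formula with $\mathcal{L}\ell(0)=1$ is, again by our calculation, $\mathcal{L}\ell(s)=\big(\lambda+s-\sqrt{s^2+2\lambda s}\big)/\lambda$; both are reconstructions rather than quotations of the displays. The sketch evaluates $I_1$ from its power series, folding the factor $e^{-\lambda x}$ into each term to avoid overflow, and compares a Simpson-rule Laplace transform with the closed form.

```python
import math

def ell(x, lam):
    """Density of length(T), T ~ GW(lam): exp(-lam x) I_1(lam x) / x (our reconstruction)."""
    if x == 0.0:
        return lam / 2.0                         # limit as x -> 0
    z = lam * x
    log_half_z = math.log(z / 2.0)
    total = 0.0
    for n in range(int(z) + 60):                 # I_1(z) = sum_n (z/2)^(2n+1) / (n! (n+1)!)
        total += math.exp((2 * n + 1) * log_half_z
                          - math.lgamma(n + 1) - math.lgamma(n + 2) - z)
    return total / x

def laplace_numeric(s, lam, x_max=40.0, m=4000):
    """Composite Simpson rule for int_0^x_max exp(-s x) ell(x) dx (m must be even)."""
    h = x_max / m
    acc = 0.0
    for i in range(m + 1):
        w = 1 if i in (0, m) else (4 if i % 2 else 2)
        x = i * h
        acc += w * math.exp(-s * x) * ell(x, lam)
    return acc * h / 3.0

def laplace_closed(s, lam):
    return (lam + s - math.sqrt(s * s + 2.0 * lam * s)) / lam

results = {(s, lam): laplace_numeric(s, lam) for s, lam in ((1.0, 1.0), (0.5, 2.0))}
for (s, lam), value in results.items():
    print(s, lam, value, laplace_closed(s, lam))
```

The cutoff $x_{\max}=40$ is harmless here because the factor $e^{-sx}$ makes the neglected tail negligible for the values of $s$ used; for $s\to0$ the heavy $x^{-3/2}$ tail of $\ell$ would require a much larger cutoff.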

Corollary 4.

The p.d.f. $f(x)$ of the length of an excursion in an exponential symmetric random walk with parameter $\lambda$ is given by

[TABLE]

Proof.

Observe that the excursion has twice the length of a tree ${\sf GW}(\lambda)$ . ∎

5.2.2 Height of a Galton-Watson random tree ${\sf GW}(\lambda)$

Lemma 9 ([85]).

Suppose $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\lambda)$ is an exponential critical binary Galton-Watson tree with parameter $\lambda$ . Then, the height $\textsc{height}(T)$ of the tree $T$ has the cumulative distribution function

[TABLE]

Proof.

The proof is based on duality between trees and positive real excursions that we introduce in Sect. 7. In particular, Thm. 18 establishes equivalence between the level set tree (Sect. 7.2) of a positive excursion of an exponential random walk (Sect. 7.6) and an exponential critical binary Galton-Watson tree ${\sf GW}(\lambda)$ . This implies, in particular, that for a tree $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\lambda)$ the $\textsc{height}(T)$ has the same distribution as the height of a positive excursion of an exponential random walk $Y_{k}$ with $Y_{0}=0$ and independent increments $Y_{k+1}-Y_{k}$ distributed according to the Laplace density function ${\phi_{\lambda}(x)+\phi_{\lambda}(-x)\over 2}={\lambda\over 2}e^{-\lambda|x|}$ , with $\phi_{\lambda}(x)$ defined in (69).

Notice that $Y_{k}$ is a martingale. We condition on $Y_{1}>0$ , and consider an excursion $Y_{0},Y_{1},\ldots,Y_{\tau_{-}}$ with $\tau_{-}=\min\{k>1~{}:~{}Y_{k}\leq 0\}$ denoting the termination step of the excursion. For $x>0$ , we write

[TABLE]

for the probability that the height of the excursion exceeds $x$ . The problem of finding $p_{x}$ is solved using the Optional Stopping Theorem. Let

[TABLE]

Observe that

[TABLE]

For a fixed $y\in(0,x)$ , by the Optional Stopping Theorem, we have

[TABLE]

Hence,

[TABLE]

Thus,

[TABLE]

and therefore,

[TABLE]

Hence,

[TABLE]

∎
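A quick Monte Carlo check of Lemma 9 is possible because the event $\{\textsc{height}(T)\le x\}$ only requires exploring the tree up to depth $x$. As the reference curve we use $\lambda x/(2+\lambda x)$, which we obtain independently of the excursion argument by solving the backward equation $u'(x)=\frac{\lambda}{2}\big(1-u(x)\big)^2$, $u(0)=0$, for $u(x)={\sf P}(\textsc{height}(T)\le x)$ in the continuous-time critical binary branching process; this closed form is our own derivation, offered as a cross-check of (77) rather than a quotation of it.

```python
import random

def height_at_most(x, lam, rng):
    """Sample the indicator of height(T) <= x for T ~ GW(lam), exploring only up to depth x."""
    starts = [0.0]                               # depths at which unexplored edges begin
    while starts:
        end = starts.pop() + rng.expovariate(lam)    # Exp(lam) edge length
        if end > x:
            return False                         # this vertex, hence some leaf, lies beyond x
        if rng.random() < 0.5:                   # branching vertex: two child edges
            starts += [end, end]
    return True                                  # every explored branch ended in a leaf by x

def height_cdf_mc(x, lam, samples, rng):
    return sum(height_at_most(x, lam, rng) for _ in range(samples)) / samples

rng = random.Random(7)
for x in (0.5, 1.0, 4.0):
    print(x, height_cdf_mc(x, 1.0, 100_000, rng), x / (2.0 + x))
```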

We continue examining the height function $\textsc{height}(T)$ for $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\lambda)$ . This time, we condition on $\#T=2n-1$ , i.e., the tree $T$ has $n$ leaves and $n-1$ internal non-root vertices. We let ${\sf H}_{n}(x)$ denote the corresponding conditional cumulative distribution function,

$${\sf H}_n(x):={\sf P}\big(\textsc{height}(T)\le x~\big|~\#T=2n-1\big),\qquad n\ge 1.$$

Here, for a one-leaf tree,

$${\sf H}_1(x)=1-e^{-\lambda x},$$

and for $n\geq 2$ , the following recursion follows from conditioning on the length of the stem (root edge),

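$${\sf H}_n(a)=\int_0^a\lambda e^{-\lambda x}\sum_{k=1}^{n-1}\frac{C_{k-1}\,C_{n-k-1}}{C_{n-1}}\,{\sf H}_k(a-x)\,{\sf H}_{n-k}(a-x)\,dx,$$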

where $C_{n}$ is the Catalan number as defined in (54).

Next, we consider the following $z$ -transform:

$${\sf h}(a;z):=\sum_{n=1}^{\infty}C_{n-1}\,{\sf H}_n(a)\,z^n.$$

Then, (79) and (80) imply

$${\sf h}(a;z)=z\big(1-e^{-\lambda a}\big)+\int_0^a\lambda e^{-\lambda x}\,{\sf h}^2(a-x;z)\,dx,$$

which, if we let $y=a-x$ , simplifies to

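$${\sf h}(a;z)=z\big(1-e^{-\lambda a}\big)+\lambda e^{-\lambda a}\int_0^a e^{\lambda y}\,{\sf h}^2(y;z)\,dy.$$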

We differentiate the above equation, obtaining

$$\frac{\partial}{\partial a}{\sf h}(a;z)=\lambda\Big({\sf h}^2(a;z)-{\sf h}(a;z)+z\Big).$$

Let

$${\sf x}_1(z)=\frac{1+\sqrt{1-4z}}{2}\qquad\text{and}\qquad{\sf x}_2(z)=\frac{1-\sqrt{1-4z}}{2}$$

be the two roots of ${\sf x}^{2}-{\sf x}+z=0$ . Here, ${\sf x}_{2}(z)/z=1/{\sf x}_{1}(z)=c(z)$ is the $z$ -transform of the Catalan sequence $C_{n}$ , introduced in (72). Then, (82) solves as

$$\frac{{\sf h}(a;z)-{\sf x}_1(z)}{{\sf h}(a;z)-{\sf x}_2(z)}=\Phi(z)\,e^{\lambda\sqrt{1-4z}\,a},$$

where due to the initial conditions ${\sf h}(0;z)=0$ , we have $\Phi(z)={\sf x}_{1}(z)/{\sf x}_{2}(z)$ , and

[TABLE]

Solution (83) implies

$${\sf h}(a;z)={\sf x}_1(z)\,\frac{1-e^{\lambda\sqrt{1-4z}\,a}}{1-\Phi(z)\,e^{\lambda\sqrt{1-4z}\,a}}.$$

Here and throughout we use $-\pi<\arg(z)\leq\pi$ branch of the logarithm when defining $\sqrt{1-4z}$ for $|z|<1/4$ .
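The identities ${\sf x}_2(z)/z=1/{\sf x}_1(z)=c(z)$ and the leaf-count law ${\sf P}(\#T=2n-1)=2C_{n-1}4^{-n}$, which is used next, are easy to check numerically. The sketch below assumes that (54) is the standard convention $C_n=\binom{2n}{n}/(n+1)$, under which $C_0=C_1=1$. The function names are hypothetical, and the generating function is truncated after a fixed number of terms.

```python
import cmath
from math import comb


def catalan(n):
    """Catalan number C_n = binom(2n, n) / (n + 1); C_0 = C_1 = 1, C_2 = 2."""
    return comb(2 * n, n) // (n + 1)


def roots(z):
    """Roots x1, x2 of x^2 - x + z = 0 (principal square root), with x1 -> 1 as z -> 0."""
    s = cmath.sqrt(1 - 4 * z)
    return (1 + s) / 2, (1 - s) / 2


def catalan_series(z, n_terms=200):
    """Truncated generating function c(z) = sum_{n>=0} C_n z^n, valid for |z| < 1/4."""
    return sum(catalan(n) * z ** n for n in range(n_terms))


def leaf_count_pmf(n):
    """P(#T = 2n - 1) = 2 C_{n-1} 4^{-n}: probability that the GW tree has n leaves."""
    return 2 * catalan(n - 1) / 4 ** n


z = 0.1 + 0.05j
x1, x2 = roots(z)
print(abs(x2 / z - catalan_series(z)), abs(1 / x1 - catalan_series(z)))  # both near 0
print(sum(leaf_count_pmf(n) for n in range(1, 2001)))  # slightly below 1; tail decays like n^(-1/2)
```

The slow $n^{-1/2}$ decay of the tail of the leaf-count law is why the limit in (85) is taken as $z$ approaches $1/4$, rather than by evaluating the series termwise.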

Now, since ${\sf P}\big{(}\#T=2n-1\big{)}=2C_{n-1}4^{-n}$ , the series expansion (81) implies

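$${\sf P}\big(\textsc{height}(T)\le a\big)=\sum_{n=1}^{\infty}2C_{n-1}4^{-n}\,{\sf H}_n(a)=2\lim_{z\to 1/4}{\sf h}(a;z),$$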

where $z\in(-1/4,\,1/4)$ is real. We substitute (84) into the limit (85),

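$${\sf P}\big(\textsc{height}(T)\le a\big)=2\lim_{z\to 1/4}{\sf x}_1(z)\,\frac{1-e^{\lambda\sqrt{1-4z}\,a}}{1-\Phi(z)\,e^{\lambda\sqrt{1-4z}\,a}}=\frac{\lambda a}{2+\lambda a},$$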

thus obtaining an alternative proof of formula (77) in Lemma 9.
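A Monte Carlo check of Lemma 9 can be run by simulating ${\sf GW}(\lambda)$ directly, not via the excursion representation. The sketch below is illustrative and uses hypothetical function names. Because the critical tree has infinite expected size, the exploration stops as soon as any vertex lies farther than $x$ from the root, which is enough to decide the event $\{\textsc{height}(T)\le x\}$ and keeps the expected work finite.

```python
import random


def height_at_most(x, lam, rng):
    """Simulate one GW(lam) tree just far enough to decide whether height <= x."""
    stack = [0.0]                                   # distances from the root to starts of pending edges
    while stack:
        end = stack.pop() + rng.expovariate(lam)    # edge length ~ Exp(lam)
        if end > x:
            return False                            # a vertex beyond x forces height > x
        if rng.random() < 0.5:                      # critical binary branching: 0 or 2 offspring
            stack.extend((end, end))
    return True


def height_cdf_mc(x, lam, n_samples=20000, seed=1):
    """Empirical estimate of P(height(T) <= x)."""
    rng = random.Random(seed)
    return sum(height_at_most(x, lam, rng) for _ in range(n_samples)) / n_samples


def height_cdf_exact(x, lam):
    """Distribution function stated in Lemma 9."""
    return lam * x / (2 + lam * x)


for x in (0.5, 1.0, 2.0, 5.0):
    print(x, round(height_cdf_mc(x, 1.0), 3), round(height_cdf_exact(x, 1.0), 3))
```

With $2\times 10^4$ samples the standard error is below $0.004$, so the estimates should agree with $\lambda x/(2+\lambda x)$ to about two decimal places.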

The asymptotics of the height distribution ${\sf H}_{n}(a)$ for a given number of leaves $n$ were analyzed in [79, 144, 61, 44]. In particular, Gupta et al. [61] extended the results of Kolchin [79] by showing that

[TABLE]

It was also observed in [61] that ${\sf H}_{\infty}\left({a\over 2\sqrt{2}}\right)$ is the distribution function for the maximum of the Brownian excursion as shown in the work of Durrett and Iglehart [43]. The results of [61] were further developed in [44] for more general trees with edge lengths.

6 Hierarchical Branching Process

Tree self-similarity has been studied primarily in terms of the average values of selected branch statistics, as defined in Sect. 3.3. Until recently, rigorous results had been obtained only for very special classes of Markov trees (e.g., binary Galton-Watson trees with no edge lengths, as in Sect. 5.1). At the same time, solid empirical evidence motivates a search for a flexible class of self-similar models that would encompass a variety of observed combinatorial and metric structures and rules of tree growth. In Sect. 3.2 we introduced a general concept of self-similarity that accounts for both the combinatorial and the metric tree structure. In this section we describe a model, called the hierarchical branching process, that generates a broad range of self-similar trees (Thm. 9) and includes the critical binary Galton-Watson tree with exponential edge lengths as a special case (Thm. 13). We also introduce a class of critical self-similar Tokunaga processes (Sect. 6.5) that enjoy additional symmetries: their edge lengths are i.i.d. random variables (Prop. 10), and subtrees of large Tokunaga trees reproduce the probabilistic structure of the entire random tree space (Prop. 11). The results of this section are derived in [84].

The results of Sect. 5 concerned a very narrow class of mean self-similar trees with $T_{j}=2^{j-1}$ . Among such trees, the self-similarity is established only for the critical binary Galton-Watson tree ${\sf GW}(\gamma)$ with independent exponential edge lengths, i.e., continuous parameter Galton-Watson binary branching Markov processes; this case corresponds to the scaling exponent $\zeta=2$ . Next, we construct a multi-type branching process [66, 11] that generates self-similar trees for an arbitrary sequence $T_{j}\geq 0$ and for any $\zeta>0$ ; it includes the critical binary Galton-Watson tree as a special case.

6.1 Definition and main properties

Consider a probability mass function $\{p_{K}\}_{K\geq 1}$ , a sequence $\{T_{k}\}_{k\geq 1}$ of nonnegative Tokunaga coefficients, and a sequence $\{\lambda_{j}\}_{j\geq 1}$ of positive termination rates. We now define a hierarchical branching process $S(t)$ .

Definition 23 (Hierarchical Branching Process (HBP)).

We say that $S(t)$ is a hierarchical branching process with a triplet of parameter sequences $\{T_{k}\}$ , $\{\lambda_{j}\}$ , and $\{p_{K}\}$ , and write

$$S(t)\stackrel{{\scriptstyle d}}{{\sim}}{\sf HBP}\big(\{T_{k}\},\{\lambda_{j}\},\{p_{K}\}\big),$$

if $S(t)$ is a multi-type branching process that develops in continuous time $t>0$ according to the following rules:

(i)

The process $S(t)$ starts at $t=0$ with a single progenitor (root branch) whose Horton-Strahler order (type) is $K\geq 1$ with probability $p_{K}$ .

(ii)

Every branch of order $j\leq K$ produces offspring (side branches) of every order $i<j$ with rate $\lambda_{j}T_{j-i}$ . Each offspring (side branch) is assigned a uniform random orientation (right or left).

(iii)

A branch of order $j$ terminates with rate $\lambda_{j}$ .

(iv)

At its termination time, a branch of order $j\geq 2$ splits into two independent branches of order $j-1$ . The two branches are assigned uniform random orientations, i.e., a uniformly randomly selected branch becomes right and the other becomes left.

(v)

A branch of order $j=1$ terminates without leaving offspring.

(vi)

Generation of side branches and termination of distinct branches are independent.

The definition implies that the process $S(t)$ terminates a.s. in finite time. Accordingly, the branching history of $S(t)$ creates a random binary tree $T[S]$ in the space $\mathcal{BL}_{\rm plane}^{|}$ of planted binary trees with edge lengths and planar embedding. To avoid heavy notation, we sometimes use the process distribution name ${\sf HBP}(\cdot,\cdot,\cdot)$, as well as its various special cases introduced below, to also denote the measures induced by the process on suitable tree spaces ($\mathcal{T}_{\rm plane}^{|}$, $\mathcal{L}_{\rm plane}^{|}$, $\mathcal{BL}_{\rm plane}^{|}$, etc.).

The next statement describes the branching structure of $T[S]$ .

Proposition 7 (Side-branching in hierarchical branching process, [84]).

Consider a hierarchical branching process $S(t)\stackrel{{\scriptstyle d}}{{\sim}}{\sf HBP}\big{(}\{T_{k}\},\{\lambda_{j}\},\{p_{K}\}\big{)}$ and let $T[S]$ be the tree generated by $S(t)$ in $\mathcal{BL}_{\rm plane}^{|}$ . For a branch $b\subset T[S]$ of order $K\geq 1$ , let $m_{i}:=m_{i}(b)\geq 0$ be the number of its side branches of order $i=1,\dots,K-1$ , and $m:=m(b)=m_{1}+\dots+m_{K-1}$ be the total number of the side branches. Conditioned on $m$ , let $l_{i}:=l_{i}(b)$ be the lengths of $m+1$ edges within $b$ , counted sequentially from the initial vertex, and $l:=l(b)=l_{1}+\dots+l_{m+1}$ be the total branch length. Define

$$S_K:=1+\sum_{i=0}^{K}T_i=1+T_1+\dots+T_K$$

for $K\geq 0$ by assuming $T_{0}=0$ . Then the following statements hold:

1.

The tree order satisfies

$${\sf P}\big({\sf ord}(T[S])=K\big)=p_K,\qquad K\ge 1.$$

2.

The total number $m(b)$ of side branches within a branch $b$ of order $K$ has geometric distribution:

$${\sf P}\big(m(b)=k\big)=\frac{1}{S_{K-1}}\Big(1-\frac{1}{S_{K-1}}\Big)^{k},\qquad k\ge 0,$$

with ${\sf E}[m(b)]=S_{K-1}-1=T_{1}+\dots+T_{K-1}.$

3.

Conditioned on the total number $m$ of side branches, the distribution of vector $(m_{1},\dots,m_{K-1})$ is multinomial with $m$ trials and success probabilities

$$\frac{T_{K-i}}{S_{K-1}-1},\qquad i=1,\dots,K-1.$$

The vector $({\sf ord}_{1},\dots,{\sf ord}_{m})$ of side branch orders, where the side branches are labeled sequentially starting from the initial vertex of $b$ , is obtained from the sequence

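$$\big(o_1,\dots,o_m\big):=\big(\underbrace{1,\dots,1}_{m_1},\underbrace{2,\dots,2}_{m_2},\dots,\underbrace{K-1,\dots,K-1}_{m_{K-1}}\big)$$

by a uniform random permutation $\sigma_{m}$ of indices $\{1,\dots,m\}$:

$$\big({\sf ord}_1,\dots,{\sf ord}_m\big)=\big(o_{\sigma_m(1)},\dots,o_{\sigma_m(m)}\big).$$

4.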

The total numbers of side branches and orders of side branches are independent in distinct branches.

5.

The branch length $l$ has exponential distribution with rate $\lambda_{K}$ , independent of the lengths of any other branch (of any order). The corresponding edge lengths $l_{i}$ are i.i.d. random variables; they have a common exponential distribution with rate

$$\lambda_K\,S_{K-1}.$$

Proof.

All the properties readily follow from Def. 23. ∎

Combining properties 2 and 3 of Prop. 7 we find that the number $m_{i}$ of side branches of order $i$ within a branch $b$ of order $K$ has geometric distribution:

$${\sf P}\big(m_i=k\big)=\frac{1}{1+T_{K-i}}\Big(\frac{T_{K-i}}{1+T_{K-i}}\Big)^{k},\qquad k\ge 0,$$

with ${\sf E}\left[m_{i}\right]=T_{K-i}.$ We also notice that the numbers $m_{i}(b)$ for $i=1,\dots,K-1$ within the same branch $b$ are dependent.
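The geometric law of $m_i$ can be checked numerically by mixing a binomial thinning over the geometric total count: each of the $m$ side branches independently has order $i$ with probability $T_{K-i}/(S_{K-1}-1)$. This is a minimal sketch. The function names are hypothetical, and the mixture over $m$ is truncated at `m_max`, where the geometric weights are negligible.

```python
from math import comb


def pmf_side_count(k, i, K, T, m_max=3000):
    """P(m_i = k) in a branch of order K: Binomial thinning mixed over the
    Geom_0(1/S_{K-1}) total side-branch count m (truncated at m_max)."""
    S = 1 + sum(T[1:K])          # S_{K-1} = 1 + T_1 + ... + T_{K-1}
    q = 1 / S                    # success probability of the geometric total count
    r = T[K - i] / (S - 1)       # chance that a given side branch has order i
    return sum(q * (1 - q) ** m * comb(m, k) * r ** k * (1 - r) ** (m - k)
               for m in range(k, m_max))


def pmf_side_count_closed(k, i, K, T):
    """Geometric law with mean T_{K-i}."""
    t = T[K - i]
    return (1 / (1 + t)) * (t / (1 + t)) ** k


T = [0, 1, 2, 4, 8]  # hypothetical example: T_k = 2^(k-1), the Galton-Watson coefficients
for i in (1, 2, 3):
    print(i, [round(pmf_side_count(k, i, 4, T), 6) for k in range(4)],
          [round(pmf_side_count_closed(k, i, 4, T), 6) for k in range(4)])
```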

Proposition 7 provides an alternative definition of the hierarchical branching process and suggests a recursive construction of $T[S]$ that does not require time-dependent simulations. Specifically, a tree of order $K=1$ consists of two vertices (root and leaf) connected by an edge of exponential length with rate $\lambda_{1}$. Assume now that $K\geq 2$ and that we know how to construct a random tree of any order below $K$. To construct a tree of order $K$, we start with a perfect (combinatorial) planted binary tree of depth $K$, which we call the skeleton. The combinatorial shapes of such trees are illustrated in Fig. 13. All leaves in the skeleton have the same depth $K$, and all vertices at depth $1\leq\kappa\leq K$ have the same Horton-Strahler order $K-\kappa+1$. The root (at depth 0) has order $K$. Next, we assign lengths to the branches of the skeleton. Recall (Ex. 1) that each branch in a perfect tree consists of a single edge. To assign length to a branch $b$ of order $\kappa$, with $1\leq\kappa\leq K$, we generate a geometric number $m\stackrel{{\scriptstyle d}}{{\sim}}{\sf Geom}_{0}(S^{-1}_{\kappa-1})$ according to (89) and then $m+1$ i.i.d. exponential lengths $l_{i}$, $i=1,\dots,m+1$, with the common rate $\lambda_{\kappa}S_{\kappa-1}$ according to (91). The total length of the branch $b$ is $l_{1}+\dots+l_{m+1}$. Moreover, branch $b$ has $m$ side branches that are attached along $b$ with spacings $l_{i}$, starting from the branch point closest to the root. The order assignment for the side branches is done according to (90). We generate side branches (each has order below $K$) independently and attach them to the branch $b$. This completes the construction of a random tree of order $K$. To construct a random HBP tree, one first generates a random order $K\geq 1$ according to (88) and then constructs a tree of order $K$ using the above recursive process.
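The recursive construction can be sketched in Python as follows. This is an illustrative sketch, not the authors' code, and the names `Node`, `branch`, `strahler`, and `leaves_and_edges` are hypothetical. Side-branch orders are drawn i.i.d. with probabilities $T_{\kappa-i}/(S_{\kappa-1}-1)$, which is equivalent to the multinomial counts of (90) followed by a uniform random permutation. The example uses the critical Tokunaga values $T_k=(c-1)c^{k-1}$ and $\lambda_j=\gamma c^{1-j}$, for which every edge rate $\lambda_\kappa S_{\kappa-1}$ equals $\gamma$.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    length: float                                  # length of the edge entering this vertex
    children: list = field(default_factory=list)   # [] for a leaf, otherwise [left, right]


def branch(kappa, T, lam, rng):
    """Top vertex of a random branch of order kappa, with everything hanging below it.
    T[k] = T_k and lam[j] = lambda_j; index 0 of both lists is unused."""
    S = 1 + sum(T[1:kappa])                        # S_{kappa-1}
    m = 0
    while rng.random() >= 1 / S:                   # m ~ Geom_0(1/S_{kappa-1}), as in (89)
        m += 1
    weights = [T[kappa - i] for i in range(1, kappa)]
    orders = rng.choices(range(1, kappa), weights=weights, k=m) if m else []
    rate = lam[kappa] * S                          # common edge rate lambda_kappa * S_{kappa-1}, (91)
    lengths = [rng.expovariate(rate) for _ in range(m + 1)]
    # terminal vertex: a leaf if kappa == 1, else a split into two branches of order kappa - 1
    below = [] if kappa == 1 else [branch(kappa - 1, T, lam, rng) for _ in range(2)]
    node = Node(lengths[m], below)
    for j in reversed(range(m)):                   # attach side branches, moving toward the root
        kids = [node, branch(orders[j], T, lam, rng)]
        rng.shuffle(kids)                          # uniform left/right orientation
        node = Node(lengths[j], kids)
    return node


def strahler(v):
    """Horton-Strahler order of the subtree below vertex v."""
    if not v.children:
        return 1
    a, b = (strahler(w) for w in v.children)
    return a + 1 if a == b else max(a, b)


def leaves_and_edges(v):
    """Number of leaves and list of all edge lengths below v (iterative traversal)."""
    stack, n_leaves, lengths = [v], 0, []
    while stack:
        u = stack.pop()
        lengths.append(u.length)
        if not u.children:
            n_leaves += 1
        stack.extend(u.children)
    return n_leaves, lengths


c, gamma, K_max = 2.0, 1.0, 8                      # c = 2 is the Galton-Watson case
T = [0.0] + [(c - 1) * c ** (k - 1) for k in range(1, K_max + 1)]
lam = [0.0] + [gamma * c ** (1 - j) for j in range(1, K_max + 1)]
tree = branch(4, T, lam, random.Random(7))
print(strahler(tree), leaves_and_edges(tree)[0])   # order 4 and a random leaf count >= 8
```

Every generated tree has order exactly $K$, at least the $2^{K-1}$ leaves of its skeleton, and $2n-1$ edges when it has $n$ leaves. These invariants make a convenient sanity check.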

Next, we establish various forms of self-similarity for the hierarchical branching process.

Theorem 9 (Self-similarity of hierarchical branching process, [84]).

Consider a hierarchical branching process $S(t)\stackrel{{\scriptstyle d}}{{\sim}}{\sf HBP}\big{(}\{T_{k}\},\{\lambda_{j}\},\{p_{K}\}\big{)}$ and let $T:=T[S]$ be the tree generated by $S(t)$ on $\mathcal{BL}_{\rm plane}^{|}$ . The following statements hold.

1.

The combinatorial tree $\textsc{shape}(T)$ is mean Horton self-similar (according to Defs. 14 and 16) with Tokunaga coefficients $\{T_{k}\}$.

2.

The combinatorial tree $\textsc{shape}(T)$ is Horton self-similar (according to Def. 10) with Tokunaga coefficients $\{T_{k}\}$ if and only if

$$p_K=p\,(1-p)^{K-1},\qquad K\ge 1,\quad\text{for some } 0<p<1.$$

3.

The tree $T$ is Horton self-similar (according to Def. 11) with scaling exponent $\zeta>0$ if and only if

$$p_K=p\,(1-p)^{K-1}\qquad\text{and}\qquad\lambda_j=\gamma\,\zeta^{-j},\qquad K,j\ge 1,$$

for some positive $\gamma$ and $0<p<1$ .

Proof.

By process construction, the tree $T$ is coordinated in shapes and lengths (according to Def. 11), with independent complete subtrees.

(1) Proposition 7, part (3) implies that the expected value of the number $\tilde{N}_{i,j}$ of side branches of order $i\geq 1$ within a branch of order $j>i$ is given by ${\sf E}\left[\tilde{N}_{i,j}\right]=T_{j-i}$ . The mean self-similarity of Def. 14 with coefficients $T_{k}$ immediately follows, using a conditional argument as in (8).

(2) Assume that $\textsc{shape}\left(T\right)$ is self-similar. A geometric distribution of orders is then established in Prop. 1. Conversely, a geometric distribution of orders ensures that the total mass $\mu\left(\mathcal{H}_{K}\right)$, $K\geq 1$, is invariant with respect to pruning. The conditional distribution of trees of a given order is completely specified by the side branch distribution, described in Proposition 7, parts (1)-(3). Consider a branch of order $K+1$, $K\geq 1$. Pruning decreases the orders of this branch, and of all its side branches, by unity. Pruning eliminates a random geometric number $m_{1}$ of side branches of order $1$ from the branch. It acts as a thinning with removal probability $T_{K}/(S_{K}-1)$ on the total side branch count $m$. Accordingly, the total side branch count after pruning has geometric distribution with success probability

$$\frac{S_K^{-1}}{S_K^{-1}+\big(1-S_K^{-1}\big)\Big(1-\frac{T_K}{S_K-1}\Big)}=\frac{1}{S_{K-1}}.$$

The order assignment among the remaining side branches (with possible orders $i=1,\dots,K-1$ ) is done according to multinomial distribution with probabilities proportional to $T_{K-i}$ . This coincides with the side branch structure in the original tree, hence completing the proof of (2).

(3) Having proven (2), it remains to prove the statement for the length structure of the tree. Assume that $T$ is self-similar with scaling exponent $\zeta$. The branches of order $j\geq 2$ become branches of order $j-1$ after pruning, which necessitates $\lambda_{j}=\zeta^{-1}\lambda_{j-1}$. Conversely, pruning acts as a thinning on the side branches within a branch of order $K+1$, eliminating the side branches of order ${\sf ord}=1$. Accordingly, the spacings between the remaining side branches are exponentially distributed with a decreased rate

$$\lambda_{K+1}S_K\cdot\frac{S_{K-1}}{S_K}=\lambda_{K+1}S_{K-1}=\zeta^{-1}\lambda_K S_{K-1}.$$

Comparing this with (91), and recalling the self-similarity of $\textsc{shape}\left(T\right)$ , we conclude that Def. 11 is satisfied with scaling exponent $\zeta$ . ∎

6.2 Hydrodynamic limit

Here we analyze the average numbers of branches of different orders in a hierarchical branching process, using a hydrodynamic limit. Specifically, let $n\,x^{(n)}_{j}(s)$ be the number of branches of order $j$ at time $s$ observed in $n$ independent copies of the process $S$ . Let $N_{j}(s)$ be the number of branches of order $j\geq 1$ in the process $S$ at instant $s\geq 0$ . We observe that, by the law of large numbers,

$$\lim_{n\to\infty}x^{(n)}_j(s)={\sf E}\big[N_j(s)\big]\quad\text{a.s.}$$

Theorem 10 (Hydrodynamic limit for branch dynamics, [84]).

Suppose that the following conditions are satisfied:

$$L:=\limsup_{k\to\infty}T_k^{1/k}<\infty$$

and

[TABLE]

Then, for any given $T>0$ , the empirical process

[TABLE]

converges almost surely, as $n\to\infty$ , to the process

[TABLE]

that satisfies

$$\dot{x}(s)=\mathbb{G}\Lambda\,x(s),\qquad x(0)=\sum_{K=1}^{\infty}p_K\,e_K,$$

where $\Lambda=\text{diag}\{\lambda_{1},\lambda_{2},\ldots\}$ is a diagonal operator with the entries $\lambda_{1},\lambda_{2},\ldots$, $e_{i}$ are the standard basis vectors, and $\mathbb{G}$ is the operator defined in Eq. (41).

Proof.

The process $~{}x^{(n)}(s)~{}$ evolves according to the transition rates

[TABLE]

with

[TABLE]

Here the first term reflects termination of branches of order $1$ ; the second term reflects termination of branches of orders $i+1>1$ , each of which results in creation of two branches of order $i$ ; and the last term reflects side-branching. Thus, the infinitesimal generator of the stochastic process $x^{(n)}(s)$ is

[TABLE]

Let

[TABLE]

The convergence result of Kurtz ([50, Theorem 2.1, Chapter 11], [87, Theorem 8.1]) given here in Appendix A extends (without changing the proof) to the Banach space $\ell^{1}(\mathbb{R})$ provided the same conditions are satisfied for $\ell^{1}(\mathbb{R})$ as for $\mathbb{R}^{d}$ in Theorem 33. Specifically, we require that for a compact set $\mathcal{C}$ in $\ell^{1}(\mathbb{R})$ ,

[TABLE]

and there exists $M_{\mathcal{C}}>0$ such that

[TABLE]

Here the condition (97) follows from

[TABLE]

which in turn follow from conditions (94). Similarly, Lipschitz conditions (98) are satisfied in $\mathcal{C}$ due to conditions (94). Thus, by Theorem 33 extended for $\ell^{1}(\mathbb{R})$ , the process $x^{(n)}(s)$ converges almost surely to $x(s)$ that satisfies $\dot{x}=F(x)$ , which expands as the following system of ordinary differential equations:

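$$\dot{x}_j=-\lambda_j x_j+2\lambda_{j+1}x_{j+1}+\sum_{k=1}^{\infty}T_k\,\lambda_{j+k}\,x_{j+k},\qquad j\ge 1,$$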

with the initial conditions $x(0)=\lim\limits_{n\rightarrow\infty}x^{(n)}(0)=\pi:=\sum\limits_{K=1}^{\infty}p_{K}e_{K}$ by the law of large numbers. Finally, we observe that $\|\pi\|_{1}=1$ , and conditions (94) imply that $\mathbb{G}\Lambda$ is a bounded operator in $\ell^{1}(\mathbb{R})$ . ∎
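Once the limiting system (99) is truncated to finitely many orders, it can be integrated numerically. The sketch below is illustrative: it uses a hand-written fixed-step RK4 integrator, and the truncation level `N` and all names are assumptions. Terms with $j+k>N$ are dropped, which is harmless here because $\lambda_{j}p_{j}$ decays geometrically. The example uses $T_k=2^{k-1}$ and $\lambda_j=2^{1-j}$ (the Galton-Watson case, $\gamma=1$), with geometric initial orders $p_K=p(1-p)^{K-1}$ for $p=1/2$ and, for contrast, $p=0.7$.

```python
def rhs(x, T, lam):
    """dx_j/ds from (99), truncated to orders j = 1..N (index 0 unused)."""
    N = len(x) - 1
    dx = [0.0] * (N + 1)
    for j in range(1, N + 1):
        v = -lam[j] * x[j]                              # termination of order-j branches
        if j < N:
            v += 2 * lam[j + 1] * x[j + 1]              # order j+1 splits into two order-j branches
        v += sum(T[k] * lam[j + k] * x[j + k] for k in range(1, N - j + 1))  # side branching
        dx[j] = v
    return dx


def rk4(x, T, lam, s_end, steps=200):
    """Classical fixed-step Runge-Kutta integration of dx/ds = rhs(x) on [0, s_end]."""
    h = s_end / steps
    for _ in range(steps):
        k1 = rhs(x, T, lam)
        k2 = rhs([a + h / 2 * b for a, b in zip(x, k1)], T, lam)
        k3 = rhs([a + h / 2 * b for a, b in zip(x, k2)], T, lam)
        k4 = rhs([a + h * b for a, b in zip(x, k3)], T, lam)
        x = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(x, k1, k2, k3, k4)]
    return x


N, c, gamma = 40, 2.0, 1.0
T = [0.0] + [(c - 1) * c ** (k - 1) for k in range(1, N + 1)]
lam = [0.0] + [gamma * c ** (1 - j) for j in range(1, N + 1)]


def geometric_start(p):
    """Initial condition x(0) = sum_K p (1-p)^(K-1) e_K, truncated at N."""
    return [0.0] + [p * (1 - p) ** (K - 1) for K in range(1, N + 1)]


pi_crit = geometric_start(0.5)
x_crit = rk4(pi_crit, T, lam, s_end=5.0)
print(sum(x_crit[1:]), max(abs(a - b) for a, b in zip(x_crit, pi_crit)))
x_other = rk4(geometric_start(0.7), T, lam, s_end=0.1, steps=20)
print(sum(x_other[1:]))   # average number of branches at s = 0.1 when p = 0.7
```

With $p=1/2$ the solution stays at its initial value (the time invariance discussed in Sect. 6.3). With $p=0.7$, the total $\sum_j x_j(s)$ starts to decrease, because the extra mass on order-1 branches terminates without offspring.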

6.3 Criticality and time invariance

6.3.1 Definitions

Assume that the hydrodynamic limit $x(s)$ , and hence the averages $x_{j}(s)$ , exist. Write $\pi=\sum\limits_{K=1}^{\infty}p_{K}e_{K}$ for the initial distribution of the process. Consider the average progeny of the process, that is the average number of branches of any order alive at instant $s\geq 0$ :

$$C(s):=\sum_{j=1}^{\infty}x_j(s)=\|x(s)\|_1.$$

Definition 24.

A hierarchical branching process $S(s)$ is said to be critical if its average progeny is constant: $C(s)=1$ for all $s\geq 0$ .

Definition 25.

A hierarchical branching process $S(s)$ is said to be time-invariant if

$$x(s)=\pi\qquad\text{for all } s\ge 0.$$

Proposition 8.

Suppose the hydrodynamic limit $x(s)$ exists, and the hierarchical branching process $S(s)$ is time-invariant. Then the process $S(s)$ is critical.

Proof.

$C(s)=\|x(s)\|_{1}=\|e^{\mathbb{G}\Lambda s}\pi\|_{1}=\|\pi\|_{1}=1.$ ∎

Recall the function $~{}\hat{t}(z)=-1+2z+\sum_{j}z^{j}\,T_{j}~{}$ defined in Eq. (31) for all complex $|z|<1/L$ , where the inverse radius of convergence $L$ is defined in Eq. (93). We also recall that there is a unique real root $w_{0}$ of $\hat{t}(z)$ within $(0,\frac{1}{2}]$ . We formulate some of the results below in terms of $\hat{t}(z)$ and the Horton exponent $R:=w_{0}^{-1}$ ; see Theorem 1.

Proposition 9.

Suppose $\Lambda\pi$ is a constant multiple of the geometric vector $v_{0}=\sum\limits_{K=1}^{\infty}R^{-K}e_{K}$ . Then the process $S(s)$ is time-invariant.

Proof.

Observe that since $\hat{t}\left(R^{-1}\right)=0$ and $\mathbb{G}$ is a Toeplitz operator,

$$\mathbb{G}v_0=\hat{t}\big(R^{-1}\big)\,v_0,$$

and

[TABLE]

Hence $\mathbb{G}\Lambda\pi=\hat{t}\left(R^{-1}\right)\Lambda\pi=0$ and

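$$x(s)=e^{\mathbb{G}\Lambda s}\pi=\sum_{n=0}^{\infty}\frac{s^n}{n!}\,(\mathbb{G}\Lambda)^n\pi=\pi,\qquad s\ge 0.$$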

∎

Remark 8.

Proposition 9 states that the condition

$$\Lambda\pi=b\,v_0,\qquad\text{i.e.,}\qquad\lambda_K\,p_K=b\,R^{-K},\quad K\ge 1,\tag{101}$$

is sufficient for time-invariance, for any proportionality constant $b>0$ . This implies that a time-invariant process can be constructed for

(i)

an arbitrary sequence of Tokunaga coefficients $\{T_{k}\}$ satisfying (93) – by selecting $\lambda_{K}\,p_{K}=b\,R^{-K}$ ;

(ii)

arbitrary sequences $\{T_{k}\}$ satisfying (93) and $\{p_{K}\}$ – by selecting $\lambda_{K}=b\,R^{-K}\,p_{K}^{-1}$ ;

(iii)

arbitrary sequences $\{T_{k}\}$ satisfying (93) and $\{\lambda_{K}\}$ – by selecting $p_{K}=b\,R^{-K}\,\lambda_{K}^{-1}$ .

At the same time, arbitrary sequences $\{\lambda_{K}\},\{p_{K}\}$ will not, in general, satisfy (101) and hence will not correspond to a time-invariant process.
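Case (iii) of Remark 8 can be illustrated numerically. The sketch below picks Tokunaga-type coefficients $T_k=a\,c^{k-1}$ with $a=1$ and $c=2.5$ (so $a\neq c-1$) and an arbitrary, hypothetical choice of rates $\lambda_K=1/K$. It finds $w_0=R^{-1}$ by bisection, sets $p_K\propto R^{-K}/\lambda_K$, and checks that $\mathbb{G}\Lambda\pi$ vanishes up to truncation error. All names, and the truncation at $N$ orders, are assumptions made for the illustration.

```python
def t_hat(z, a, c):
    """t_hat(z) = -1 + 2z + sum_{k>=1} a c^(k-1) z^k, summed in closed form for |z| < 1/c."""
    return -1 + 2 * z + a * z / (1 - c * z)


def bisect(f, lo, hi, iters=200):
    """Plain bisection; requires f(lo) < 0 < f(hi)."""
    assert f(lo) < 0 < f(hi), "root not bracketed"
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def lam(K):
    """Hypothetical termination rates lambda_K = 1/K."""
    return 1.0 / K


def time_invariant_start(a, c, lam, N):
    """p_K = b R^{-K} / lambda_K, normalised over K = 1..N (Remark 8, case (iii))."""
    w0 = bisect(lambda z: t_hat(z, a, c), 0.0, 0.999 * min(0.5, 1 / c))
    raw = [w0 ** K / lam(K) for K in range(1, N + 1)]
    b = 1 / sum(raw)
    return w0, [0.0] + [b * r for r in raw]


def G_Lambda(x, a, c, lam):
    """(G Lambda x)_j for j = 1..N, with T_k = a c^(k-1) and terms beyond N dropped."""
    N = len(x) - 1
    out = [0.0] * (N + 1)
    for j in range(1, N + 1):
        v = -lam(j) * x[j] + (2 * lam(j + 1) * x[j + 1] if j < N else 0.0)
        v += sum(a * c ** (k - 1) * lam(j + k) * x[j + k] for k in range(1, N - j + 1))
        out[j] = v
    return out


a, c, N = 1.0, 2.5, 60
w0, pi = time_invariant_start(a, c, lam, N)
print(1 / w0, max(abs(v) for v in G_Lambda(pi, a, c, lam)))   # R and a residual near 0
```

For these coefficients, $\hat t(z)=0$ reduces to $5z^2-5.5z+1=0$, so $w_0=(5.5-\sqrt{10.25})/10\approx 0.2298$ and $R\approx 4.35$.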

6.3.2 Criticality and time-invariance in a self-similar process

A convenient characterization of criticality can be established for self-similar hierarchical branching processes. Recall that by Theorem 9, part (3), a self-similar process $S(s)$ is specified by parameters $\gamma>0$ , $0<p<1$ and length self-similarity constant $\zeta>0$ such that $p_{K}=p(1-p)^{K-1}$ and $\lambda_{j}=\gamma\,\zeta^{-j}$ . We refer to a self-similar process by its parameter triplet, and write $S(s)\stackrel{{\scriptstyle d}}{{\sim}}S_{p,\gamma,\zeta}(s)$ . We denote the respective average progeny by $C_{p,\gamma,\zeta}(s)$ . Observe that in the self-similar case the first of the conditions (94) is equivalent to $\zeta\geq 1$ , and the second is equivalent to $\zeta\geq L$ . Hence, the conditions (94) are equivalent to $\zeta\geq 1\vee L$ .

Theorem 11 (Average progeny of a self-similar process, [84]).

Consider a self-similar process $S_{p,\gamma,\zeta}(s)$ with $0<p<1$ and $\gamma>0$ . Suppose that (93) is satisfied and $\zeta\geq 1\vee L$ . Then

[TABLE]

Proof.

The choice of the limits for $\zeta$ ensures that the conditions (94) are satisfied and hence, by Theorem 10, the hydrodynamic limit $x(s)$ exists and the function $C_{p,\gamma,\zeta}(s)$ is well defined. Now we have

[TABLE]

and therefore

[TABLE]

Iterating recursively, we obtain

[TABLE]

and in general,

[TABLE]

Thus, taking $x(0)=\pi$ ,

[TABLE]

The average progeny function for fixed values of $p\in(0,1)$ , $\gamma>0$ and $\zeta\geq 1$ can therefore be expressed as

[TABLE]

since

[TABLE]

Next, notice that by letting $p^{\prime}=1-\zeta^{-1}(1-p)$ , we have from (6.3.2) and the uniform convergence of the corresponding series for any fixed $M>0$ and $s\in[0,M]$ , that

[TABLE]

Observe that $\zeta\geq 1$ implies $p^{\prime}\geq p$ and $~{}C_{p^{\prime},\gamma,\zeta}(s)\leq C_{p,\gamma,\zeta}(s)$ . Also, observe that

[TABLE]

as $\hat{t}$ is an increasing function on $[0,\infty)$ and $\hat{t}\big{(}1/R\big{)}=0$ . This leads to the statement of the theorem. ∎

Remark 9.

If $\zeta=1$ , then $p^{\prime}=p$ and equation (105) has an explicit solution

[TABLE]

Accordingly,

[TABLE]

This case is further examined in Sect. 6.4. In general, the average progeny $C_{p,\gamma,\zeta}(s)$ may increase sub-exponentially for $p<1-{\zeta\over R}$ . For example, if there is a nonnegative integer $d$ such that $\zeta^{d+1}<R$ , then for $p=1-{\zeta^{d+1}\over R}$ we have $\hat{t}\big{(}\zeta^{-d-1}(1-p)\big{)}=0$ . Accordingly, (103) implies that $C_{p,\gamma,\zeta}(s)$ is a polynomial of degree $d$ .

Theorem 12 (Criticality of a self-similar process, [84]).

Consider a self-similar process $S_{p,\gamma,\zeta}(s)$ with $0<p<1$ , $\gamma>0$ . Suppose that (93) is satisfied and $\zeta\geq 1\vee L$ . Then the following conditions are equivalent:

(i)

The process is critical.

(ii)

The process is time-invariant.

(iii)

The following relations hold: $\zeta<R\quad\text{and}\quad p=p_{c}:=1-\frac{\zeta}{R}.$

Proof.

(i) $\leftrightarrow$ (iii) is established in Theorem 11. (ii) $\rightarrow$ (i) is established in Prop. 8. (iii) $\rightarrow$ (ii): Observe that $\hat{t}\left(\zeta^{-1}(1-p)\right)=\hat{t}\left(R^{-1}\right)=0$. Time invariance now follows from (103). ∎

Remark 10.

By Thm. 9, the product $\lambda_{K}\,p_{K}$ in a self-similar process is given by

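$$\lambda_K\,p_K=\gamma\,\zeta^{-K}\,p\,(1-p)^{K-1}=\frac{\gamma\,p}{1-p}\Big(\frac{1-p}{\zeta}\Big)^{K},\qquad K\ge 1,$$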

for some $0<p<1$, $\gamma>0$, and $\zeta\geq 1\vee L$. Hence, a time-invariant process can be constructed, according to Prop. 9 and (101), by selecting any sequence $\{T_{k}\}$ such that the unique real zero $w_{0}$ in $(0,1/2]$ of the respective function $\hat{t}(z)$ is given by

$$w_0=\frac{1-p}{\zeta},\qquad\text{i.e.,}\qquad R=\frac{\zeta}{1-p}.$$

Theorem 12 states that this is the only possible way to construct a time-invariant process, given that the process is self-similar.

6.4 Closed form solution for equally distributed branch lengths

Consider a self-similar hierarchical branching process with $\Lambda=I$ and $x(0)=e_{K}$ for a given integer $K\geq 1$ . In other words, we assume $\lambda_{j}=1$ for all $j\geq 1$ , which implies $\gamma=\zeta=1$ .

In this case, the system of equations (99) is finite-dimensional,

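$$\dot{x}_j=-x_j+2x_{j+1}+\sum_{k=1}^{K-j}T_k\,x_{j+k},\qquad j=1,\dots,K,\quad x_{K+1}\equiv 0,\tag{106}$$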

with the initial conditions $x(0)=e_{K}$ .

Recall the sequence $t(j)$ defined in Eq. (30), and let $y(s)=e^{s}x(s)$ . Then (106) rewrites in terms of the coordinates of $y(s)$ as follows

$$\dot{y}_j=\sum_{k=1}^{K-j}t(k)\,y_{j+k},\qquad j=1,\dots,K,\tag{107}$$

with the initial conditions $y(0)=e_{K}$. The ODEs (107) can be solved recursively, taking the equations of the system in reverse order, which gives, for $m=1,\ldots,K-1$,

$$y_K(s)\equiv 1,\qquad y_{K-m}(s)=\sum_{k=1}^{m}t(k)\int_0^s y_{K-m+k}(u)\,du.$$

Let $\delta_{0}(j)={\bf 1}_{\{j=0\}}$ be the Kronecker delta function. Then we arrive at the closed-form solution

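$$x_j(s)=e^{-s}\sum_{n=0}^{K-j}\frac{s^n}{n!}\,\big(t+\delta_0\big)^{*n}(K-j),\qquad j=1,\dots,K,$$

where $f^{*n}$ denotes the $n$-fold convolution power of a sequence $f$, with $f^{*0}=\delta_0$.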

Observe that if we randomize the orders of trees by assigning an order $K$ to a tree with geometric probability $p_{K}=p(1-p)^{K-1}$ , then the above closed form expression (6.4) would yield an expression for the average progeny that was observed in Remark 9 of this section:

[TABLE]
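Because the substitution $y(s)=e^{s}x(s)$ turns the system into $\dot y=Ny$ with $N$ strictly upper triangular, hence nilpotent, the exponential series for $y(s)=e^{sN}e_K$ terminates after $K$ terms. The sketch below compares this finite series with a direct RK4 integration of $\dot x_j=-x_j+2x_{j+1}+\sum_k T_k x_{j+k}$. It assumes $t(1)=2+T_1$ and $t(k)=T_k$ for $k\ge 2$, i.e., that $t$ is the coefficient sequence of $\hat t(z)=-1+2z+\sum_k T_k z^k$. The function names and the coefficients in the example are hypothetical.

```python
import math


def closed_form(K, T, s):
    """x(s) for Lambda = I and x(0) = e_K: x = e^{-s} exp(sN) e_K with N nilpotent.
    Assumes t(1) = 2 + T_1 and t(k) = T_k for k >= 2."""
    t = [0.0] * (K + 1)
    for k in range(1, K + 1):
        t[k] = T[k] + (2.0 if k == 1 else 0.0)
    term = [0.0] * (K + 1)
    term[K] = 1.0                                   # n = 0 term of the exponential series
    y = term[:]
    for n in range(1, K):                           # N^K = 0, so the series stops here
        term = [0.0] + [s / n * sum(t[k] * term[j + k] for k in range(1, K - j + 1))
                        for j in range(1, K + 1)]
        y = [u + v for u, v in zip(y, term)]
    return [math.exp(-s) * v for v in y]


def rhs(x, T):
    """-x_j + 2 x_{j+1} + sum_k T_k x_{j+k}, for j = 1..K (index 0 unused)."""
    K = len(x) - 1
    return [0.0] + [-x[j] + (2 * x[j + 1] if j < K else 0.0)
                    + sum(T[k] * x[j + k] for k in range(1, K - j + 1))
                    for j in range(1, K + 1)]


def rk4(x, T, s_end, steps=20000):
    """Classical fixed-step Runge-Kutta integration on [0, s_end]."""
    h = s_end / steps
    for _ in range(steps):
        k1 = rhs(x, T)
        k2 = rhs([a + h / 2 * b for a, b in zip(x, k1)], T)
        k3 = rhs([a + h / 2 * b for a, b in zip(x, k2)], T)
        k4 = rhs([a + h * b for a, b in zip(x, k3)], T)
        x = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(x, k1, k2, k3, k4)]
    return x


K = 5
T = [0.0, 1.0, 2.0, 4.0, 8.0, 16.0]                # hypothetical Tokunaga coefficients
x0 = [0.0] * (K + 1)
x0[K] = 1.0
print(closed_form(K, T, 1.5))
print(rk4(x0, T, 1.5))
```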

6.5 Critical Tokunaga process

We introduce here a class of hierarchical branching processes that enjoy all of the symmetries discussed in this work – Horton self-similarity, criticality, time-invariance, strong Horton law, Tokunaga self-similarity, and also have independently distributed edge lengths. Despite these multiple constraints, the class is sufficiently broad, allowing the self-similarity constant $\zeta$ (Def. 11, part (iv)) to take any value $\zeta\geq 1$ , and the Horton exponent to take any value $R\geq 2$ . The critical binary Galton-Watson process is a special case of this class.

Definition 26 (Critical Tokunaga process).

We say that $S(t)$ is a critical Tokunaga process with parameters ( $\gamma$ , $c$ ), and write $S(t)\stackrel{{\scriptstyle d}}{{\sim}}S^{\rm Tok}(t;c,\gamma)$ , if it is a hierarchical branching process with the following parameter triplet:

$$T_k=(c-1)\,c^{k-1},\qquad\lambda_j=\gamma\,c^{1-j},\qquad p_K=2^{-K},\qquad k,j,K\ge 1,\tag{109}$$

for some $\gamma>0,~{}c\geq 1$ .

Proposition 10 (Critical Tokunaga process).

Suppose $S(t)\stackrel{{\scriptstyle d}}{{\sim}}S^{\rm Tok}(t;c,\gamma)$ and let $T[S]$ be the tree of $S(t)$ . Then,

1.

$S(t)$ is a Horton self-similar, critical, and time-invariant process with

$$p=\frac{1}{2},\qquad\zeta=c,\qquad R=2c.$$

2.

Independently of the combinatorial shape of $T[S]$, its edge lengths are i.i.d. exponential random variables with rate $\gamma$.

3.

We have

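$$\hat{t}(z)=-1+2z+\frac{(c-1)\,z}{1-cz}=-\frac{(1-z)(1-2cz)}{1-cz},\quad |z|<\frac{1}{c},\qquad\text{and}\qquad L=c.$$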

Proof.

Self-similarity follows from Thm. 9, part (3). Specification of parameters (109) implies $p=2^{-1}$ and $\zeta=c$ . The Horton exponent $R=2c$ is found from (37). Criticality and time-invariance now follow from Thm. 12, since here

$$\zeta=c<2c=R\qquad\text{and}\qquad p=\frac{1}{2}=1-\frac{c}{2c}=1-\frac{\zeta}{R}.$$

To establish the edge lengths property, observe that

$$S_K=1+\sum_{k=1}^{K}(c-1)\,c^{k-1}=c^{K},\qquad K\ge 0.$$

Recall from Prop. 7, part (4) that the edge lengths within a branch of order $K\geq 1$ are i.i.d. exponential r.v.s with rate

$$\lambda_K\,S_{K-1}=\gamma\,c^{1-K}\,c^{K-1}=\gamma.$$

The values of $R$, $p$, and $\zeta$ were obtained above, in the proof of part 1. The expression for $\hat{t}(z)$ and the equality $L=c$ are readily obtained from the geometric form of the Tokunaga coefficients $T_{k}$. ∎
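The closed form of $\hat t(z)$ in part 3 and the Horton exponent $R=2c$ can be checked numerically. The sketch below is illustrative and uses hypothetical names. It compares a term-by-term partial sum of the series with the closed form, and locates the zero of $\hat t$ on $(0,1/c)$ by bisection. On that interval the sign of $\hat t$ is the sign of $2cz-1$, so there is exactly one root.

```python
def t_hat_series(z, c, n_terms=4000):
    """-1 + 2z + sum_{k>=1} (c-1) c^(k-1) z^k, summed term by term (requires |c z| < 1)."""
    return -1 + 2 * z + sum((c - 1) / c * (c * z) ** k for k in range(1, n_terms))


def t_hat_closed(z, c):
    """Closed form of the same function: -(1 - z)(1 - 2cz) / (1 - cz)."""
    return -(1 - z) * (1 - 2 * c * z) / (1 - c * z)


def zero_of_t_hat(c, iters=200):
    """Bisection on (0, 1/c): t_hat(0) = -1 < 0 and t_hat > 0 just below 1/c."""
    lo, hi = 0.0, 0.999 / c
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t_hat_closed(mid, c) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for c in (1.0, 1.5, 2.0, 3.0):
    print(c, 1 / zero_of_t_hat(c))   # Horton exponent R = 1/w0, expected to equal 2c
```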

Criticality and i.i.d. edge length distribution property characterize the critical Tokunaga process, as we explain in the following statement.

Lemma 10.

Consider a self-similar hierarchical branching process $S(t)\!\!\stackrel{{\scriptstyle d}}{{\sim}}\!\!S_{p,\gamma,\zeta}(t)$ with $p\in(0,1)$ and $\gamma>0$ . Suppose that (93) holds and $\zeta\geq 1\vee L$ . Let $T[S]$ be the tree of $S(t)$ . Then, the following conditions are equivalent:

1.

$S(t)$ is critical and the edges in $T$ have i.i.d. exponential lengths with rate $\gamma>0$.

2.

$S(t)$ is a critical Tokunaga process: $S(t)\stackrel{{\scriptstyle d}}{{\sim}}S_{{1\over 2},\gamma,c}(t).$

Proof.

The implication ( $2\Rightarrow 1$ ) was established in Prop. 10. To show ( $1\Rightarrow 2$ ), recall from Prop. 7, Eq. (91), that the edge lengths within a branch of order $K$ are i.i.d. with rate $\lambda_{K}S_{K-1}$ . If the rate is independent of $K$ , we have for any $K\geq 1$ :

$$\lambda_K\,S_{K-1}=\lambda_{K+1}\,S_K,$$

or

$$S_K=\frac{\lambda_K}{\lambda_{K+1}}\,S_{K-1}=\zeta\,S_{K-1}.$$

Given $S_{0}=1$, we find $S_{K}=\zeta^{K}$, and hence $T_{K}=(\zeta-1)\zeta^{K-1}.$ By (37), the Horton exponent is $R=2\zeta$. Criticality implies (Thm. 12, part (iii)):

$$p=1-\frac{\zeta}{R}=1-\frac{\zeta}{2\zeta}=\frac{1}{2},$$

which completes the proof. ∎

It follows from the proof of Lemma 10 that the i.i.d. edge length property alone (without criticality) is equivalent to the following constraints on the process parameters:

[TABLE]

while allowing an arbitrary choice of $p\in(0,1)$. The tree of such a process is Tokunaga self-similar, although not critical unless $p=2^{-1}$.

The next result shows that the critical binary Galton-Watson tree ${\sf GW}(\lambda)$ with i.i.d. exponential edge lengths is a special case of the critical Tokunaga process.

Theorem 13 (Critical binary Galton-Watson tree, [84]).

Suppose $S(t)$ is a critical Tokunaga process with parameters

$$T_k=2^{k-1},\qquad\lambda_j=\gamma\,2^{1-j},\qquad p_K=2^{-K},\tag{110}$$

which means $S(t)\stackrel{{\scriptstyle d}}{{\sim}}S^{\rm Tok}(t;2,\gamma)$ . Let $T[S]$ be the tree of $S(t)$ . Then $T[S]$ has the same distribution on $\mathcal{BL}_{\rm plane}^{|}$ as the critical binary Galton-Watson tree with i.i.d. edge lengths: $T[S]\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\gamma)$ .

Proof.

Consider a tree $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\gamma)$ in $\mathcal{BL}_{\rm plane}^{|}$ . We show below that this tree can be dynamically generated according to Def. 23 of the hierarchical branching process with parameters (110).

First, notice that by Prop. 6

$${\sf P}\big({\sf ord}(T)=K\big)=2^{-K},\qquad K\ge 1.$$

We will establish later in Corollary 12 that the length of every branch of order $j$ in $T$ is exponentially distributed with parameter $\lambda_{j}=\gamma 2^{1-j}$, which matches the branch length distribution in the hierarchical branching process (110). Furthermore, by Corollary 12, conditioned on $\mathcal{R}^{i}(T)\neq\phi$ (which happens with positive probability), we have $\mathcal{R}^{i}(T)\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(2^{-i}\gamma)$. This means that the distribution of Galton-Watson trees pruned $i$ times is a linearly scaled version of the original distribution (the same combinatorial structure, linearly scaled edge lengths). Recall (Prop. 6) that the total number $m_{j}$ of side branches within a branch of order $j\geq 2$ in $T$ is geometrically distributed with mean $T_{1}+\dots+T_{j-1}=2^{j-1}-1$, where $T_{i}=2^{i-1}$, $i\geq 1$. Conditioned on $m_{j}$, the assignment of orders among the $m_{j}$ side branches is done according to the multinomial distribution with $m_{j}$ trials and success probability for order $i=1,\dots,j-1$ given by $T_{j-i}/(T_{1}+\dots+T_{j-1})$. This implies that the leaves of the original tree merge into every branch of the pruned tree as a Poisson point process with intensity $\gamma/2=\lambda_{j}T_{j-1}$. Iterating this pruning argument, the branches of order $i$ merge into any branch of order $j$ in the pruned tree $\mathcal{R}^{i}(T)$ as a Poisson point process with intensity $\gamma\,2^{-i}=\lambda_{j}T_{j-i}$ for every $j>i$.

Finally, the orientation of the two offspring of the same parent in ${\sf GW}(\gamma)$ is uniform random, by Def. 22. We conclude that tree ${\sf GW}(\gamma)$ has the same distribution on $\mathcal{BL}_{\rm plane}^{|}$ as the critical Tokunaga process with parameters (110). ∎

Remark 11.

The condition $T_{i,i+k}=T_{k}=a\,c^{k-1}$ was first introduced in hydrology by Eiji Tokunaga [133] in a study of river networks, hence the process name. The additional constraint $a=c-1$ is necessitated here by the self-similarity of tree lengths, which requires the sequence $\lambda_{j}$ to be geometric. The sequence of the Tokunaga coefficients then also has to be geometric, and to satisfy $a=c-1$, to ensure identical distribution of the edge lengths; see Prop. 7(4). Recall also the special role that the case $a=c-1$ plays for the entropy rate of Tokunaga self-similar trees, as elaborated in Sect. 4.3 (see Cor. 1). Interestingly, the constraint $a=c-1$ appears in the random self-similar network (RSN) model introduced by Veitzer and Gupta [139], which uses a purely topological algorithm of recursive local replacement of the network generators to construct random self-similar trees. The importance of the constraint $a=c-1$ in a purely combinatorial context is revealed in Sect. 6.7.

6.6 Martingale approach

In this section, we propose a martingale representation for the size and length of a critical Tokunaga tree of a given order. This leads, via the martingale techniques, to the strong Horton laws for both these quantities, and allows us to find the asymptotic order of a tree of a given size. The proposed martingale representation is related to an alternative construction of a critical Tokunaga tree, via a Markov tree process on $\mathcal{BL}_{\rm plane}^{|}$ .

6.6.1 Markov tree process

Consider a critical Tokunaga process $S^{\rm Tok}(t;c,\gamma)$ (Def. 26) with $c>1$ (hence excluding a trivial case $c=1$ of perfect binary trees), and let $\mu$ be the measure induced by this process on $\mathcal{BL}_{\rm plane}^{|}$ . Following the notations introduced in Sect. 3.1, Eq. (6), we consider conditional measures

$$\mu_K(\cdot):=\mu\big(\,\cdot\,\big|\,{\sf ord}(T)=K\big),\qquad K\ge 1.$$

Next, we construct a discrete time Markov tree process $\big{\{}\Upsilon_{K}\big{\}}_{K\in\mathbb{N}}$ on $\mathcal{BL}_{\rm plane}^{|}$ such that for each $K\in\mathbb{N}$ ,

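$$\Upsilon_K\stackrel{{\scriptstyle d}}{{\sim}}\mu_K\qquad\text{and}\qquad\mathcal{R}\big(\Upsilon_{K+1}\big)=\Upsilon_K.\tag{111}$$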

Let

[TABLE]

be the number of leaves in $\Upsilon_{K}$ and $Y_{K}=\textsc{length}(\Upsilon_{K})\in\mathbb{R}_{+}$ be the tree length. We let $\Upsilon_{1}$ be an I-shaped tree of Horton-Strahler order one, with the edge length $Y_{1}\stackrel{{\scriptstyle d}}{{\sim}}{\sf Exp}(\gamma)$ . This tree has one leaf, $X_{1}=1$ .

Conditioned on $\Upsilon_{K}$, the tree $\Upsilon_{K+1}$ is constructed according to the following transition rules. Denote by $\Upsilon_{K}^{\prime}$ the tree $\Upsilon_{K}$ with edge lengths scaled by $c$. That is, the tree $\Upsilon_{K}^{\prime}$ is obtained by multiplying the edge lengths in $\Upsilon_{K}$ by $c$, while preserving the combinatorial shape and planar embedding:

[TABLE]

Next, we attach new leaf edges to $\Upsilon_{K}^{\prime}$ at points sampled by a Poisson point process with intensity $\gamma(c-1)c^{-1}$ along $\Upsilon_{K}^{\prime}$. We also attach a pair of new leaf edges to each of the leaves in $\Upsilon_{K}^{\prime}$; there are exactly $2X_{K}$ such attachments ($X_{K}$ pairs). The lengths of all the newly attached leaf edges are i.i.d. exponential random variables with parameter $\gamma$. The left-right orientation of the newly added edges is determined independently and uniformly. Finally, the tree $\Upsilon_{K+1}$ consists of $\Upsilon_{K}^{\prime}$ and all the attached leaves and leaf edges.
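The transition rule above can be illustrated numerically at the level of the two summary statistics used below: the leaf count $X_K$ and the length $Y_K$. The Python sketch below is a minimal illustration under our own assumptions. It does not store the tree. It draws the side attachments as points of a homogeneous Poisson process on a segment of length $cY_K$, since only their number matters here. The helper names `poisson_count` and `markov_tree_sizes` are hypothetical. Each step applies $X_{K+1}=2X_K+V_K$ and $Y_{K+1}=cY_K+U_K$, where $U_K$ is the total length of the $X_{K+1}$ new leaf edges; these are the relations recorded in Sect. 6.6.2.

```python
import random


def poisson_count(rate, length, rng):
    """Number of points of a homogeneous Poisson process with the given rate on [0, length]."""
    if rate <= 0:
        return 0
    count, t = 0, rng.expovariate(rate)
    while t <= length:
        count += 1
        t += rng.expovariate(rate)
    return count


def markov_tree_sizes(K, c, gamma, rng):
    """Simulate [(X_1, Y_1), ..., (X_K, Y_K)]: leaf count and total length of Upsilon_k.

    Hypothetical helper: only the two summary statistics are tracked, not the tree itself.
    """
    x, y = 1, rng.expovariate(gamma)        # Upsilon_1: a single edge with Exp(gamma) length
    path = [(x, y)]
    for _ in range(K - 1):
        # Side attachments: Poisson points of intensity gamma*(c-1)/c along the scaled tree,
        # whose total length is c*y, so that V_K ~ Poi(gamma*(c-1)*Y_K).
        v = poisson_count(gamma * (c - 1) / c, c * y, rng)
        x_next = 2 * x + v                   # each old leaf gets two children; each side point adds a leaf
        # U_K: total length of the X_{K+1} new leaf edges, i.i.d. Exp(gamma).
        # random.gammavariate(shape, scale) uses the scale parametrization, hence 1/gamma.
        u = rng.gammavariate(x_next, 1.0 / gamma)
        x, y = x_next, c * y + u
        path.append((x, y))
    return path
```

With $c=1$ there are no side attachments, so $X_K=2^{K-1}$ exactly. With $c=2$ and $\gamma=1$, iterating the conditional means of the recursion gives ${\sf E}[X_4]=43$, so an average of $X_4$ over a few thousand runs should land near that value.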

Lemma 11.

The process $\big{\{}\Upsilon_{K}\big{\}}_{K\in\mathbb{N}}$ is a Markov process that satisfies (111).

Proof.

The process construction readily implies the Markov property, and ensures that ${\sf ord}(\Upsilon_{K})=K$ and $\mathcal{R}(\Upsilon_{K+1})=\Upsilon_{K}$ . Next, we show that a random tree $\Upsilon_{K}$ satisfies Def. 23, conditioned on the tree order $K\geq 1$ , with the critical Tokunaga parameters

$$T_{k}=(c-1)\,c^{k-1}\quad\text{and}\quad\lambda_{j}=\gamma\,c^{1-j},\qquad j,k\in\mathbb{N}.$$

The tree $\Upsilon_{1}$ has exponential edge length with parameter $\lambda_{1}=\gamma$ and no side branching, hence $\Upsilon_{1}\stackrel{{\scriptstyle d}}{{\sim}}\mu_{1}$ . Assume now that $\Upsilon_{K}\stackrel{{\scriptstyle d}}{{\sim}}\mu_{K}$ for some $K\geq 1$ and establish each of the properties of Def. 23, except the tree order property (i), for $\Upsilon_{K+1}$ .

Property Def. 23(ii). Fix any $j$ such that $1<j\leq K+1$. Every branch of order $j$ in $\Upsilon_{K+1}$ is formed from a branch of order $j-1$ in $\Upsilon_{K}$, whose length is multiplied by $c$. Likewise, every side branch of order $i>1$ in $\Upsilon_{K+1}$ is formed from a side branch of order $i-1$ in $\Upsilon_{K}$. Accordingly, every branch of order $j$ within $\Upsilon_{K+1}$ produces offspring of every order $i$ such that $1<i<j$ with rate

$$c^{-1}\lambda_{j-1}\,T_{(j-1)-(i-1)}=c^{-1}\lambda_{j-1}\,T_{j-i}=\lambda_{j}\,T_{j-i}.$$

By construction, the side branches of order $i=1$ are generated with rate

$$\gamma(c-1)c^{-1}=\gamma c^{1-j}\,(c-1)c^{j-2}=\lambda_{j}\,T_{j-1}.$$

This establishes property (ii).

Property Def. 23(iii). Using the same argument as above, each branch of order $j>1$ in $\Upsilon_{K+1}$ terminates with rate $c^{-1}\lambda_{j-1}=\lambda_{j}.$ By construction, each branch of order $i=1$ terminates with rate $\gamma=\lambda_{1}.$ This establishes property (iii).

Properties Def. 23(iv,v,vi) follow trivially from the process construction. This completes the proof.

∎

Notice that if a random variable $\kappa\stackrel{{\scriptstyle d}}{{\sim}}{\sf Geom}_{1}\left({1\over 2}\right)$ is sampled independently of the process $\big{\{}\Upsilon_{K}\big{\}}_{K\in\mathbb{N}}$, then the stopped process satisfies $\Upsilon_{\kappa}\stackrel{{\scriptstyle d}}{{\sim}}\mu$.

6.6.2 Martingale representation of tree size and length

By construction, the pairs $(X_{K},Y_{K})$ and $(X_{K+1},Y_{K+1})$ are related in an iterative way as follows. Conditioned on the values of $(X_{K},Y_{K})$ , we have

$$X_{K+1}=2X_{K}+V_{K},\tag{112}$$

where $V_{K}\stackrel{{\scriptstyle d}}{{\sim}}{\sf Poi}\big{(}\gamma(c-1)Y_{K}\big{)}$ is the number of side branches of order one attached to $\Upsilon^{\prime}_{K}$ . Next, conditioning on $X_{K+1}$ , we have

$$Y_{K+1}=cY_{K}+U_{K},\tag{113}$$

where $U_{K}\stackrel{{\scriptstyle d}}{{\sim}}{\sf Gamma}\big{(}X_{K+1},\gamma\big{)}$ is the sum of $X_{K+1}$ i.i.d. edge lengths, each exponentially distributed with parameter $\gamma$ .

Lemma 12 (Martingale representation).

The sequence

$$M_{K}=R^{1-K}\big(X_{K}+\gamma(c-1)Y_{K}\big),\qquad K\in\mathbb{N},\tag{114}$$

is a martingale with respect to the Markov tree process $\big{\{}\Upsilon_{K}\big{\}}_{K\in\mathbb{N}}$ .

Proof.

Taking conditional expectations in (112) and (113) gives

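$${\sf E}\big[X_{K+1}\,|\,\Upsilon_{K}\big]=2X_{K}+\gamma(c-1)Y_{K},\qquad {\sf E}\big[Y_{K+1}\,|\,\Upsilon_{K}\big]=cY_{K}+\gamma^{-1}{\sf E}\big[X_{K+1}\,|\,\Upsilon_{K}\big]=2\gamma^{-1}X_{K}+(2c-1)Y_{K}.$$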

This can be summarized as

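$${\sf E}\left[\begin{pmatrix}X_{K+1}\\ Y_{K+1}\end{pmatrix}\,\middle|\,\Upsilon_{K}\right]=\mathbb{M}\begin{pmatrix}X_{K}\\ Y_{K}\end{pmatrix},\tag{117}$$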

where

$$\mathbb{M}=\begin{pmatrix}2&\gamma(c-1)\\ 2\gamma^{-1}&2c-1\end{pmatrix}.$$

The eigenvalues of the matrix $\mathbb{M}$ are $R=2c$ and $1$ . The largest eigenvalue equals the Horton exponent $R$ ; the respective eigenspace is $y=2\gamma^{-1}x$ . Equation (117) implies that

$$\mathbb{M}^{1-K}\begin{pmatrix}X_{K}\\ Y_{K}\end{pmatrix}$$

is a vector valued martingale with respect to the Markov tree process $\big{\{}\Upsilon_{K}\big{\}}_{K\in\mathbb{N}}$ . Multiplying this martingale by the left eigenvector $\Big{(}1,\,\gamma(c-1)\Big{)}$ of $\mathbb{M}$ that corresponds to the largest eigenvalue $R$ , we obtain a scalar martingale with respect to $\big{\{}\Upsilon_{K}\big{\}}_{K\in\mathbb{N}}$ :

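$$M_{K}=\big(1,\,\gamma(c-1)\big)\,\mathbb{M}^{1-K}\begin{pmatrix}X_{K}\\ Y_{K}\end{pmatrix}=R^{1-K}\big(X_{K}+\gamma(c-1)Y_{K}\big).$$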

This completes the proof. ∎
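The spectral facts used in this proof can be checked by hand. The conditional expectations of (112)-(113), with $V_K\sim{\sf Poi}(\gamma(c-1)Y_K)$ and $U_K\sim{\sf Gamma}(X_{K+1},\gamma)$, give $\mathbb{M}=\begin{pmatrix}2&\gamma(c-1)\\ 2\gamma^{-1}&2c-1\end{pmatrix}$. A short computation then reads:

```latex
\begin{aligned}
\operatorname{tr}\mathbb{M} &= 2c+1, \qquad
\det\mathbb{M} = 2(2c-1)-\gamma(c-1)\cdot 2\gamma^{-1} = 2c,\\
\det(\mathbb{M}-\lambda I) &= \lambda^{2}-(2c+1)\lambda+2c=(\lambda-2c)(\lambda-1),\\
\big(1,\ \gamma(c-1)\big)\,\mathbb{M}
  &= \big(2+2(c-1),\ \gamma(c-1)+\gamma(c-1)(2c-1)\big)=2c\,\big(1,\ \gamma(c-1)\big),\\
(\mathbb{M}-2c\,I)\begin{pmatrix}x\\ y\end{pmatrix}=0
  &\iff (2-2c)\,x+\gamma(c-1)\,y=0 \iff y=2\gamma^{-1}x \qquad (c>1).
\end{aligned}
```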

Lemma 13.

Suppose $\mu=S^{\rm Tok}(t;c,\gamma)$ is the distribution of a critical Tokunaga process, and $\big{\{}\Upsilon_{K}\big{\}}_{K\in\mathbb{N}}$ is the corresponding Markov tree process. Then,

$$\lim_{K\to\infty}\frac{Y_{K}}{X_{K}}=2\gamma^{-1}\quad\text{a.s.}\tag{118}$$

Proof.

Recall that $Y_{K}$ is a sum of $2X_{K}-1$ independent edge lengths, each being exponentially distributed with parameter $\gamma$ . Thus, since $X_{K}=N_{1}[\Upsilon_{K}]\geq 2^{K-1}$ , the Chebyshev inequality implies for any $\epsilon>0$ ,

[TABLE]

as ${\sf E}[Y_{K}\,|X_{K}]=2\gamma^{-1}X_{K}-\gamma^{-1}$ and ${\sf E}[Y_{K}^{2}\,|X_{K}]=4\gamma^{-2}X_{K}^{2}-2\gamma^{-2}X_{K}$, the first two moments of the ${\sf Gamma}(2X_{K}-1,\gamma)$ distribution.

Hence, by the Borel-Cantelli lemma, we arrive at the almost sure convergence in (118). ∎
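The derivation below shows one way to carry out the Chebyshev step. It uses only the conditional law $Y_K\,|\,X_K\sim{\sf Gamma}(2X_K-1,\gamma)$ and the bound $X_K\geq 2^{K-1}$. The constants may be arranged differently from the display in the proof.

```latex
\begin{aligned}
{\sf E}\big[(Y_K-2\gamma^{-1}X_K)^2\,\big|\,X_K\big]
  &={\sf Var}(Y_K\,|\,X_K)+\big({\sf E}[Y_K\,|\,X_K]-2\gamma^{-1}X_K\big)^2
   =\frac{2X_K-1}{\gamma^{2}}+\frac{1}{\gamma^{2}}=\frac{2X_K}{\gamma^{2}},\\
{\sf P}\Big(\Big|\frac{Y_K}{X_K}-\frac{2}{\gamma}\Big|>\epsilon\,\Big|\,X_K\Big)
  &\le\frac{2X_K}{\gamma^{2}\epsilon^{2}X_K^{2}}=\frac{2}{\gamma^{2}\epsilon^{2}X_K}
   \le\frac{2^{2-K}}{\gamma^{2}\epsilon^{2}}.
\end{aligned}
```

Taking expectations removes the conditioning. The right-hand side is summable in $K$, which is what the Borel-Cantelli lemma needs.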

Lemma 14.

Suppose $\mu=S^{\rm Tok}(t;c,\gamma)$ is the distribution of a critical Tokunaga process, and $\big{\{}\Upsilon_{K}\big{\}}_{K\in\mathbb{N}}$ is the corresponding Markov tree process. Then,

[TABLE]

Proof.

For a given integer $x\geq 2^{K-1}$ , we condition on the event $X_{K}=x$ . Then, $Y_{K}$ is a sum of $2X_{K}-1=2x-1$ i.i.d. exponential edge lengths. Hence, $Y_{K}\stackrel{{\scriptstyle d}}{{\sim}}{\sf Gamma}\big{(}2x-1,\gamma\big{)}$ . Finally, recall that in the setup of (112), $V_{K}\stackrel{{\scriptstyle d}}{{\sim}}{\sf Poi}\big{(}\gamma(c-1)Y_{K}\big{)}$ . Therefore, we can compute the moment generating function of $V_{K}$ conditioned on the event $X_{K}=x$ as follows

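$${\sf E}\big[e^{sV_{K}}\,\big|\,X_{K}=x\big]={\sf E}\Big[e^{\gamma(c-1)(e^{s}-1)Y_{K}}\,\Big|\,X_{K}=x\Big]=\big(1-(c-1)(e^{s}-1)\big)^{-(2x-1)}=\big(c-(c-1)e^{s}\big)^{-(2x-1)},$$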

with the domain $s\in\left(-\infty,\,\log{c\over c-1}\right)$ .

Next, we use this moment generating function in the exponential Markov inequality (the Chernoff bound). For a given $\varepsilon\in\Big{(}0,(c-1)c^{-1}\Big{)}$ and $x\geq 2^{K-1}$, by (112) we have, for all $s\geq 0$,

[TABLE]

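We minimize the function ${e^{(1-\varepsilon)cs}\over ce^{s}-(c-1)}$ of (120) over $s\geq 0$. Setting its logarithmic derivative to zero gives the minimizer

$$s=\log\frac{(1-\varepsilon)(c-1)}{(1-\varepsilon)c-1},$$

which is positive for $\varepsilon\in\big(0,(c-1)c^{-1}\big)$. Substituting it into the right-hand side of (120), we obtain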

[TABLE]

Now, since $X_{K}\geq 2^{K-1}$ , (121) implies

[TABLE]

Next, plugging $\varepsilon=1-e^{-1/K^{2}}$ into the last inequality, we find that

[TABLE]

and equivalently,

[TABLE]

Therefore, by the Borel-Cantelli lemma,

[TABLE]

where $|\cdot|$ denotes the cardinality of a set. Hence, as $\sum\limits_{K=1}^{\infty}{1\over K^{2}}<\infty$,

[TABLE]

This completes the proof. ∎
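The optimization step in the proof of Lemma 14 can be checked numerically. The Python sketch below works with $h(s)=(1-\varepsilon)cs-\log\big(ce^{s}-(c-1)\big)$, the logarithm of the ratio minimized in (120), and with the stationary point $s^{*}=\log\frac{(1-\varepsilon)(c-1)}{(1-\varepsilon)c-1}$. The helper names are hypothetical. Because $h(0)=0$, $h'(0)=-\varepsilon c<0$ and $h$ is convex, the minimum $h(s^{*})$ is strictly negative. Since the right-hand side of (120) involves this ratio raised to a power proportional to $x$, that negativity is what makes the bound decay exponentially in $x$.

```python
import math


def chernoff_log_factor(s, c, eps):
    """h(s) = log[e^{(1-eps) c s} / (c e^s - (c-1))], the factor minimized over s >= 0."""
    return (1 - eps) * c * s - math.log(c * math.exp(s) - (c - 1))


def chernoff_minimizer(c, eps):
    """Root of h'(s) = 0, i.e. (1-eps)(c e^s - (c-1)) = e^s; requires 0 < eps < (c-1)/c."""
    if not 0 < eps < (c - 1) / c:
        raise ValueError("eps must lie in (0, (c-1)/c)")
    return math.log((1 - eps) * (c - 1) / ((1 - eps) * c - 1))
```

For example, with $c=2$ and $\varepsilon=0.2$ the minimizer is $s^{*}=\log(4/3)$.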

6.6.3 Strong Horton laws in a critical Tokunaga tree

The martingale representation of Lemma 12 has an immediate implication for the asymptotic behavior of the average size of a critical Tokunaga tree, stated below.

Corollary 5 (Strong Horton law for mean branch numbers).

Suppose $\mu=S^{\rm Tok}(t;c,\gamma)$ is the distribution of a critical Tokunaga process with $c\geq 1$ . Then, the following closed form expression holds for all $1\leq k\leq K$ :

[TABLE]

Consequently, $\mu=S^{\rm Tok}(t;c,\gamma)$ satisfies the strong Horton law for mean branch numbers (Def. 19). The equation (127) implies, in particular,

[TABLE]

Proof.

Since $Y_{K}$ is a sum of $2X_{K}-1$ independent edge lengths, each exponentially distributed with parameter $\gamma$ , we have ${\sf E}[Y_{K}]=\gamma^{-1}(2{\sf E}[X_{K}]-1)$ . Therefore,

[TABLE]

Furthermore, for all $1\leq k\leq K$ , substituting $K-k+1$ instead of $K$ in the above equation, we obtain

[TABLE]

Since $M_{K}$ is a martingale (see Lemma 12), we have ${\sf E}[M_{K-k+1}]={\sf E}[M_{K}]$ . Hence,

[TABLE]

as ${\sf E}[X_{K-k+1}]={\sf E}\big{[}N_{k}[\Upsilon_{K}]\big{]}$ and ${\sf E}[X_{K}]={\sf E}\big{[}N_{1}[\Upsilon_{K}]\big{]}$ . This establishes (127). The strong Horton law (29) for mean branch numbers follows from (127). The expression (128) is obtained by using $k=K$ in (127). This completes the proof. ∎

We also suggest an alternative proof that emphasizes the spectral property of the transition matrix $\mathbb{M}$ of (117).

Alternative proof of Corollary 5.

Taking expectation in (117) we obtain, for any $K>1$ ,

[TABLE]

Since $Y_{K}$ is a sum of $2X_{K}-1$ independent edge lengths, each exponentially distributed with parameter $\gamma$ , we have ${\sf E}[Y_{K}]=\gamma^{-1}(2{\sf E}[X_{K}]-1)$ . Recall also that $\Big{(}1,\gamma(c-1)\Big{)}$ is the left eigenvector of $\mathbb{M}$ that corresponds to the eigenvalue $R$ . Accordingly,

[TABLE]

Premultiplying (129) by the eigenvector $\Big{(}1,\gamma(c-1)\Big{)}$ we hence obtain

[TABLE]

which establishes (127), since ${\sf E}[X_{1}]={\sf E}\big{[}N_{K}[\Upsilon_{K}]\big{]}$ and ${\sf E}[X_{K}]={\sf E}\big{[}N_{1}[\Upsilon_{K}]\big{]}$ . The strong Horton law (29) for mean branch numbers follows from (127). The expression (128) is obtained by using $k=K$ in (127). This completes the proof. ∎
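The sketch below is a quick exact check of Corollary 5. It iterates the mean recursion $({\sf E}X_{K+1},{\sf E}Y_{K+1})^{T}=\mathbb{M}({\sf E}X_K,{\sf E}Y_K)^{T}$ with $\mathbb{M}=\begin{pmatrix}2&\gamma(c-1)\\ 2\gamma^{-1}&2c-1\end{pmatrix}$, read off from (112)-(113). It starts from ${\sf E}X_1=1$ and ${\sf E}Y_1=\gamma^{-1}$ and uses exact rational arithmetic. The result is compared with ${\sf E}[X_K]=(cR^{K-1}+c-1)/(2c-1)$. That closed form is our own rearrangement of the martingale identity together with ${\sf E}[Y_K]=\gamma^{-1}(2{\sf E}[X_K]-1)$. The check also confirms that $R^{1-K}\big({\sf E}X_K+\gamma(c-1){\sf E}Y_K\big)$ stays constant, as the left-eigenvector argument predicts. The function names are hypothetical.

```python
from fractions import Fraction


def mean_sizes(K, c, gamma):
    """Exact [(E X_k, E Y_k)] for k = 1..K, iterating the mean recursion with matrix M."""
    x, y = Fraction(1), 1 / Fraction(gamma)      # E[X_1] = 1, E[Y_1] = 1/gamma
    out = [(x, y)]
    for _ in range(K - 1):
        # (E X, E Y) <- M (E X, E Y) with M = [[2, gamma(c-1)], [2/gamma, 2c-1]]
        x, y = 2 * x + gamma * (c - 1) * y, 2 * x / gamma + (2 * c - 1) * y
        out.append((x, y))
    return out


def mean_leaves_closed_form(K, c):
    """E[X_K] = (c R^{K-1} + c - 1) / (2c - 1) with R = 2c."""
    R = 2 * c
    return (c * R ** (K - 1) + c - 1) / (2 * c - 1)
```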

The sizes of trees of distinct orders have fixed asymptotic ratios in a much stronger (almost sure) sense, as we show below.

Theorem 14.

Suppose $\mu=S^{\rm Tok}(t;c,\gamma)$ is the distribution of a critical Tokunaga process, and $\big{\{}\Upsilon_{K}\big{\}}_{K\in\mathbb{N}}$ is the corresponding Markov tree process. Then,

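$$\lim_{K\to\infty}\frac{N_{k}[\Upsilon_{K}]}{N_{1}[\Upsilon_{K}]}=R^{1-k}\quad\text{a.s. for any }k\in\mathbb{N}.\tag{130}$$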

Proof.

Recall that by Lemma 12, $M_{K}$ defined in (114) is a martingale. Also, $M_{K}>0$, so ${\sf E}|M_{K}|={\sf E}[M_{1}]$ for all $K\in\mathbb{N}$ and the martingale is bounded in $L^{1}$. Thus, by Doob’s Martingale Convergence Theorem, $M_{K}$ converges almost surely. Hence, by (118), $R^{1-K}X_{K}$ also converges almost surely, and

[TABLE]

In other words, for almost every trajectory of the process $\big{\{}\Upsilon_{K}\big{\}}_{K\in\mathbb{N}}$ , we have $R^{1-K}X_{K}=R^{1-K}N_{1}[\Upsilon_{K}]$ converging to a finite limit $V_{\infty}$ , where $V_{\infty}$ is a random variable. Hence, for any $k\in\mathbb{N}$ , the random sequences

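$$R^{1-K}N_{1}[\Upsilon_{K}]\qquad\text{and}\qquad R^{k-K}N_{k}[\Upsilon_{K}]=R^{1-(K-k+1)}X_{K-k+1},\qquad K\geq k,$$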

converge almost surely to the same finite $V_{\infty}$ , where $V_{\infty}>0~{}a.s.$ by Lemma 14. The almost sure convergence (130) follows. ∎

The almost sure convergence (130) in Theorem 14 implies the corresponding weak convergence

[TABLE]

via the Bounded Convergence Theorem. We restate it as the following corollary.

Corollary 6 (Strong Horton law for branch numbers).

The distribution $\mu=S^{\rm Tok}(t;c,\gamma)$ of a critical Tokunaga process satisfies the strong Horton law for branch numbers (Def. 18). That is, for any $\epsilon>0$ ,

[TABLE]

Corollary 7 (Asymptotic tree order).

Suppose $\mu=S^{\rm Tok}(t;c,\gamma)$ is the distribution of a critical Tokunaga process, and $\big{\{}\Upsilon_{K}\big{\}}_{K\in\mathbb{N}}$ is the corresponding Markov tree process. Then,

[TABLE]

Proof.

Recall from (131) that

$$\lim_{K\to\infty}R^{1-K}X_{K}=V_{\infty}\quad\text{a.s.},$$

where $V_{\infty}$ is finite (by Doob’s Martingale Convergence Theorem) and $V_{\infty}>0~{}a.s.$ by Lemma 14. Accordingly,

$$\lim_{K\to\infty}\big(\log_{R}X_{K}-(K-1)\big)=\log_{R}V_{\infty}\quad\text{a.s.},$$

with $-\infty<\log_{R}{V_{\infty}}<\infty~{}a.s.$ Recalling that $\#\Upsilon_{K}=2X_{K}-1$ completes the proof. ∎

The almost sure convergence (118) allows us to restate the limit results of this section in terms of the tree length $Y_{K}$.

Corollary 8 (Strong Horton laws for tree lengths).

Suppose $\mu=S^{\rm Tok}(t;c,\gamma)$ is the distribution of a critical Tokunaga process, and $\big{\{}\Upsilon_{K}\big{\}}_{K\in\mathbb{N}}$ is the corresponding Markov tree process. Then, for a tree $T\stackrel{{\scriptstyle d}}{{\sim}}\mu$ ,

[TABLE]

Furthermore, we have, for any $k\geq 1$ ,

[TABLE]

which implies the strong Horton law for tree lengths: for any $\epsilon>0$ ,

[TABLE]

Example 12 (Critical binary Galton Watson tree).

Theorem 13 asserts that the critical binary Galton-Watson tree with exponential i.i.d. edge lengths, $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\lambda)$ , has the same distribution as a critical Tokunaga branching process with $c=2$ and $\gamma=\lambda$ . In this case $R=2c=4$ and the expressions (127), (128) give, for any $K\geq 1$ ,

[TABLE]

Fixing $\lambda=1$, by expression (133) we have, for any $K\geq 1$,

[TABLE]

Table 1 shows the values of the mean size and mean length of a critical binary Galton-Watson tree $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(1)$ , conditioned on selected values of tree order.
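The sketch below tabulates Table 1 from closed forms. It assumes ${\sf E}[X_K]=(cR^{K-1}+c-1)/(2c-1)$, which follows from the martingale identity of Lemma 12 together with ${\sf E}[X_1]=1$ and ${\sf E}[Y_K]=\gamma^{-1}(2{\sf E}[X_K]-1)$. For $c=2$ and $\lambda=1$ these give ${\sf E}[X_K]=(2\cdot 4^{K-1}+1)/3$ and ${\sf E}[Y_K]=(4^{K}-1)/3$. The function `gw_table` is hypothetical. The numbers it produces are our computation from these formulas, not copied from the table.

```python
def gw_table(K_max):
    """Rows (K, E[X_K], E[Y_K]) for T ~ GW(1), i.e. c = 2, R = 4, lambda = 1 (hypothetical helper)."""
    rows = []
    for K in range(1, K_max + 1):
        mean_leaves = (2 * 4 ** (K - 1) + 1) // 3    # exact: 2*4^(K-1) + 1 is divisible by 3
        mean_length = (4 ** K - 1) // 3              # equals 2*E[X_K] - 1, since each edge has mean length 1
        rows.append((K, mean_leaves, mean_length))
    return rows


for K, mean_leaves, mean_length in gw_table(6):
    print(f"K = {K}:  E[X_K] = {mean_leaves},  E[Y_K] = {mean_length}")
```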

6.7 Combinatorial HBP: Geometric Branching Process

This section focuses on the combinatorial structure of a Horton self-similar hierarchical branching process [84]

[TABLE]

Let $T[S]$ be the tree generated by $S(t)$ in $\mathcal{BL}_{\rm plane}^{|}$ . Section 6.7.1 introduces a discrete time multi-type geometric branching process $\mathcal{G}(s)={\mathcal{G}}(s;\{T_{k}\},p)$ whose trajectories induce a random tree $\mathcal{G}(\{T_{k}\},p)$ on $\mathcal{BT}^{|}$ such that

[TABLE]

We then show in Sect. 6.7.2 that a geometric branching process is time invariant (in discrete time) if and only if it is Tokunaga self-similar with $T_{k}=(c-1)c^{k-1}$ and $p=1/2$.

6.7.1 Definition and main properties

Our goal is to study the combinatorial shape of a self-similar hierarchical branching process. The following definition gives an explicit time-dependent construction of such a process, which we denote $\mathcal{G}(s;\{T_{k}\},p)$.

Definition 27 (Geometric Branching Process).

Consider a sequence of Tokunaga coefficients $\{T_{k}\geq 0\}_{k\geq 1}$ and $0<p<1$ . Define

$$S_{K}=1+T_{0}+T_{1}+\dots+T_{K}=1+\sum_{k=1}^{K}T_{k}$$

for $K\geq 0$ by assuming $T_{0}=0$ . The Geometric Branching Process (GBP) $\mathcal{G}(s)={\mathcal{G}}(s;\{T_{k}\},p)$ describes a discrete time multi-type population growth:

(i)

The process starts at $s=0$ with a progenitor of order ${\sf ord}(\mathcal{G})$ such that ${\sf ord}(\mathcal{G})\stackrel{{\scriptstyle d}}{{\sim}}{\sf Geom}_{1}(p)$ .

(ii)

At every integer time instant $s>0$, each population member of order $K\in\{1,\dots,{\sf ord}(\mathcal{G})\}$ terminates with probability $q_{K}=S^{-1}_{K-1}$, independently of other members. At termination, a member of order $K>1$ produces two offspring of order $(K-1)$, and a member of order $K=1$ leaves no offspring.

(iii)

At every integer time instant $s>0$ , each population member of order $K\in\{1,\dots,{\sf ord}(\mathcal{G})\}$ survives (does not terminate) with probability

$$1-q_{K}=1-S^{-1}_{K-1},$$

independently of other members. In this case, it produces a single offspring (side branch). The offspring order $i\in\{1,\dots,K-1\}$ is drawn from the distribution

$${\sf P}(i)=\frac{T_{K-i}}{S_{K-1}-1},\qquad i=1,\dots,K-1.$$

The geometric tree ${\mathcal{G}}(\{T_{k}\},p)$ is a combinatorial tree generated by the trajectories of ${\mathcal{G}}(s;\{T_{k}\},p)$ in $\mathcal{BT}^{|}$ .

By construction, the distribution of a geometric tree $\mathcal{G}(\{T_{k}\},p)$ coincides with that of the combinatorial shape of the tree of a combinatorially Horton self-similar hierarchical branching process $S(t)$ with Tokunaga coefficients $\{T_{k}\}$, initial order distribution $p_{K}=p(1-p)^{K-1}$ and an arbitrary positive sequence of termination rates $\{\lambda_{i}\}$. Accordingly, the branching structure of a geometric tree is described by Prop. 7, items (1)-(4). The essential elements of the geometric trees (tree order, total number of side branches within a branch, numbers of side branches of a given order within a branch) are described by geometric laws, hence the model name.

Similarly to the tree of an HBP, a geometric tree can be constructed without time-dependent simulations, following a suitable modification of the algorithm given after Prop. 7. Specifically, the step that involves generation and assignment of the edge lengths $l_{i}$ should be skipped.
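A direct simulation of Def. 27 is short if one tracks only the number of leaves. The Python sketch below is illustrative and is not the authors' algorithm. The names `partial_sums` and `gbp_leaves` are hypothetical, and the Tokunaga coefficients are passed as a list `T` with `T[0] = 0`. Each call follows one member of order $K$ until it terminates. Along the way it recurses into side branches, choosing order $i$ with weight $T_{K-i}$. At termination it recurses into two offspring of order $K-1$.

```python
import random


def partial_sums(T):
    """S_0 = 1 and S_K = 1 + T_1 + ... + T_K, for T given as [T_0 = 0, T_1, T_2, ...]."""
    S = [1.0]
    for t in T[1:]:
        S.append(S[-1] + t)
    return S


def gbp_leaves(K, T, S, rng):
    """Number of leaves descending from one GBP member of order K (Def. 27, steps (ii)-(iii))."""
    leaves = 0
    while True:
        if rng.random() < 1.0 / S[K - 1]:                # terminate w.p. q_K = 1 / S_{K-1}
            if K == 1:
                return leaves + 1                        # an order-1 member ends as a leaf
            return (leaves + gbp_leaves(K - 1, T, S, rng)
                    + gbp_leaves(K - 1, T, S, rng))      # two offspring of order K - 1
        # Survive: one side branch of order i in {1, ..., K-1} w.p. T_{K-i} / (S_{K-1} - 1).
        orders = list(range(1, K))
        i = rng.choices(orders, weights=[T[K - j] for j in orders])[0]
        leaves += gbp_leaves(i, T, S, rng)
```

With all $T_k=0$ (the case $c=1$), every member splits immediately and an order-5 tree has exactly 16 leaves. For $T_k=(c-1)c^{k-1}$ with $c=2$, the empirical mean leaf count of order-3 trees should be close to 11. That is ${\sf E}[X_3]$ for the Markov tree process with $c=2$, obtained by iterating the conditional means of (112)-(113).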

Consider a geometric tree $\mathcal{G}=\mathcal{G}(\{T_{k}\},p)$ and its two subtrees $T^{a}$ and $T^{b}$ rooted at the internal vertex closest to the root, labeled in a uniformly random order. We call $T^{a}$ and $T^{b}$ the principal subtrees of $\mathcal{G}$. Let $K$ be the order of $\mathcal{G}$, and, conditioned on $K>1$, let $K_{a},K_{b}$ be the orders of the principal subtrees $T^{a}$ and $T^{b}$, respectively. Observe that the pair $K_{a},K_{b}$ uniquely defines the tree order $K$:

$$K=\max(K_{a},K_{b})+{\bf 1}_{\{K_{a}=K_{b}\}}.$$

We write $K_{1}\leq K_{2}$ for the order statistics of $K_{a}$ , $K_{b}$ .

Lemma 15 (Order of principal subtrees).

Conditioned on the tree order $K$ , the joint distribution of the order statistics $(K_{1},K_{2})$ is given by

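$${\sf P}\big((K_{1},K_{2})=(K-1,K-1)\,|\,K\big)=S_{K-1}^{-1},\qquad {\sf P}\big((K_{1},K_{2})=(j,K)\,|\,K\big)=T_{K-j}\,S_{K-1}^{-1},\quad 1\leq j\leq K-1,\tag{138}$$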

where

[TABLE]

Proof.

Definition 27, part (ii) states that a branch of order $K$ splits into two branches of order $K-1$ with probability $S^{-1}_{K-1}$ , which establishes the first case of (138). Definition 27, part (iii) states that, otherwise, with probability $1-S^{-1}_{K-1}$ , a side branch is created whose order equals $j$ with probability $T_{K-j}(S_{K-1}-1)^{-1}$ . This gives

$$\big(1-S_{K-1}^{-1}\big)\,T_{K-j}\,(S_{K-1}-1)^{-1}=T_{K-j}\,S_{K-1}^{-1},$$

which establishes the second case. ∎

6.7.2 Tokunaga self-similarity of time invariant process

Let $x_{i}(s)$ , $i\geq 1$ , denote the average number of vertices of order $i$ at time $s$ in the process $\mathcal{G}(s)$ , and ${\bf x}(s)=(x_{1}(s),x_{2}(s),\dots)^{T}$ be the state vector. By definition we have

$${\bf x}(0)=\sum_{K\geq 1}p(1-p)^{K-1}\,{\bf e}_{K},$$

where ${\bf e}_{K}$ are standard basis vectors. Furthermore, if $q_{a,b}$ , $a\geq b$ , denotes the probability that a vertex of order ${\sf ord}=a+{\bf 1}_{\{a=b\}}$ that exists at time $s$ splits into a pair of vertices of orders $(a,b)$ at time $(s+1)$ , then

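$$x_{K}(s+1)=2\,q_{K,K}\,x_{K+1}(s)+x_{K}(s)\sum_{i=1}^{K-1}q_{K,i}+\sum_{i>K}q_{i,K}\,x_{i}(s),\qquad K\geq 1.\tag{139}$$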

The first term in the right-hand side of (139) corresponds to a split of an order- $(K+1)$ vertex into two vertices of order $K$ , the second – to a split of an order- $K$ vertex into a vertex of order $K$ and a vertex of a smaller order, and the third – to a split of a vertex of order $i>K$ into a vertex of order $K$ and a vertex of order $i$ . The geometric branching implies (see Lemma 15, Eq. (138))

$$q_{K,K}=S_{K}^{-1},\qquad q_{K,i}=T_{K-i}\,S_{K-1}^{-1},\quad 1\leq i<K.$$

Accordingly, the system (139) rewrites as

[TABLE]

where $\mathbb{G}$ is defined in Eq. (41), and

[TABLE]

In this setup, the unit time shift operator $\mathcal{S}$, which advances the process time by one unit, can be applied to individual trees and forests (collections of trees). For each tree $T\in\mathcal{T}^{|}$, the operator removes the root and the stem, resulting in two principal subtrees $T^{a}$ and $T^{b}$. Applying $d$ consecutive time shifts to a tree $T$ is equivalent to removing the vertices at depth $\leq d$ from the root together with their parental edges (Fig. 21). Next we define time invariance with respect to the shift $\mathcal{S}$.

Definition 28 (Time invariance).

Geometric branching process $\mathcal{G}(s)$ , $s\in\mathbb{Z}_{+}$ , is called time invariant if the state vector ${\bf x}(s)$ is invariant with respect to a unit time shift $\mathcal{S}$ :

[TABLE]

Now we formulate the main result of this Section.

Theorem 15 ([83]).

A geometric branching process $\mathcal{G}(s;\{T_{k}\},p)$ is time invariant if and only if

$$T_{k}=(c-1)\,c^{k-1},\quad k\geq 1,\ c\geq 1,\qquad\text{and}\qquad p=\frac{1}{2}.\tag{145}$$

We call this family a (combinatorial) critical Tokunaga process, and the respective trees – (combinatorial) critical Tokunaga trees.

Theorem 15 is proven in Sect. 6.7.4 by solving a nonlinear system of equations that expresses (144) in terms of the ratios $S_{k}/S_{k+1}$.

Corollary 9.

Let $\mathcal{G}$ be a combinatorial critical Tokunaga tree. Then the distribution of the principal subtree $T^{a}$ (and hence $T^{b}$) matches that of the initial tree $\mathcal{G}$. The principal subtrees $T^{a}$ and $T^{b}$ are independent if and only if $c=2$.

Proof.

Let ${\sf ord}(\mathcal{G})$ denote the (random) order of a random geometric tree $\mathcal{G}$ . Conditioned on ${\sf ord}(\mathcal{G})>1$ , at instant $s=1$ (equivalently, after a unit time shift $\mathcal{S}$ ) there exist exactly two vertices that are the roots of the principal subtrees $T^{a}$ and $T^{b}$ . Since the trees $T^{a}$ and $T^{b}$ have the same distribution, their roots have the same order distribution. Denote by $y_{k}$ the probability that the tree $T^{a}$ has order $k\geq 1$ and let ${\bf y}=(y_{1},y_{2},\dots)^{T}$ . By Thm. 15, the process $\mathcal{G}(s)$ is time invariant. We have $p=\pi_{1}=1/2$ , which, together with time invariance, implies

[TABLE]

This establishes the first statement.

The second statement follows from examining the joint distribution $q_{a,b}$ of (142). Recall that we write $K$ for the order of the tree $\mathcal{G}$, $K_{a}$, $K_{b}$ for the orders of the principal subtrees $T^{a}$, $T^{b}$, and $K_{1}\leq K_{2}$ for the order statistics of $K_{a}$, $K_{b}$. Observe that for $k>1$,

[TABLE]

Furthermore,

[TABLE]

Accordingly, the joint distribution of $K_{a}$ , $K_{b}$ equals the product of their marginals if and only if $c=2$ . This establishes the second statement.

We also notice that

[TABLE]

which provides an alternative, direct proof of the first statement of the corollary that does not use the time invariance. ∎
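Corollary 9 can be checked exactly on the joint law of the principal subtree orders. Under the combinatorial critical Tokunaga tree (145), conditioning on ${\sf ord}(\mathcal{G})>1$ gives ${\sf P}({\sf ord}(\mathcal{G})=k)=2^{1-k}$, and Lemma 15 then gives the pair of orders. We assume the two principal subtrees are labeled $T^{a}$, $T^{b}$ by a fair coin. The sketch below uses hypothetical helper names and exact rational arithmetic. It confirms that the marginal of $K_a$ is $2^{-m}$, and that the joint law factorizes for $c=2$ but not for the other tested values of $c$.

```python
from fractions import Fraction


def joint_principal_orders(m, n, c):
    """P(K_a = m, K_b = n | ord > 1) for the combinatorial critical Tokunaga tree (145).

    Uses P(ord = k | ord > 1) = 2^{1-k}, the two cases of Lemma 15 with S_k = c^k and
    T_k = (c-1) c^{k-1}, and a fair coin deciding which principal subtree is called T^a.
    """
    def S(k):
        return c ** k

    def T(k):
        return (c - 1) * c ** (k - 1)

    if m == n:                                   # root of order n+1 splits into two order-n branches
        return Fraction(1, 2 ** n) / S(n)
    low, high = min(m, n), max(m, n)             # order-`high` root sheds an order-`low` side branch
    return Fraction(1, 2 ** (high - 1)) * T(high - low) / S(high - 1) / 2


def is_product_measure(c, N=8):
    """True if P(K_a = m, K_b = n) = 2^{-m} 2^{-n} for all 1 <= m, n <= N."""
    return all(joint_principal_orders(m, n, c) == Fraction(1, 2 ** (m + n))
               for m in range(1, N + 1) for n in range(1, N + 1))
```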

Remark 12.

Corollary 9 asserts that the principal subtrees in a random critical Tokunaga tree are dependent, except in the critical binary Galton-Watson case $c=2$. This implies that, in general, non-overlapping subtrees within a critical Tokunaga tree are dependent. Accordingly, the increments of the Harris path $H$ of a critical Tokunaga process have (long-range) dependence. The only exception is the case $c=2$ that will be discussed in Sect. 7.6. The structure of $H$ is hence reminiscent of a self-similar random process with long-range dependence [100, 122]. Establishing the correlation structure of the Harris paths of critical Tokunaga processes is an interesting open problem (see Sect. 12).

6.7.3 Frequency of orders in a large critical Tokunaga tree

Combinatorial trees of the critical Tokunaga processes (Def. 26, Prop. 10), and hence the time-invariant geometric trees (also called combinatorial critical Tokunaga trees) of Thm. 15, have an additional important property: the frequencies of vertex orders in a large-order tree approximate the tree order distribution $p_{K}=2^{-K}$ in the space $\mathcal{BT}^{|}$. To formalize this observation, let $\mu$ be the measure on $\mathcal{BT}^{|}$ induced by a combinatorial critical Tokunaga tree $\mathcal{G}$ of (145). For a fixed $K\geq 1$, let $\mu_{K}(\mathcal{G})=\mu(\mathcal{G}|{\sf ord}(\mathcal{G})=K)$. We write $V_{k}[\mathcal{G}]$ for the number of non-root vertices of order $k$ in a tree $\mathcal{G}$, and let $\mathcal{V}_{k}[K]={\sf E}_{K}\big{[}V_{k}[\mathcal{G}]\big{]}.$ Finally, we denote by $V[\mathcal{G}]=\sum\limits_{k=1}^{{\sf ord}(\mathcal{G})}V_{k}[\mathcal{G}]$ the total number of non-root vertices in $\mathcal{G}$, and notice that $V[\mathcal{G}]=2V_{1}[\mathcal{G}]-1$, since the non-root vertices are in bijection with the edges and a planted binary tree with $V_{1}[\mathcal{G}]$ leaves has $2V_{1}[\mathcal{G}]-1$ edges. Thus, $\mathcal{V}[K]:={\sf E}_{K}\big{[}V[\mathcal{G}]\big{]}=2\mathcal{V}_{1}[K]-1$.

Proposition 11.

Let $\mathcal{G}$ be a combinatorial critical Tokunaga tree (145). Then

[TABLE]

Let $v\in\mathcal{G}$ be a vertex selected by uniform random drawing from the non-root vertices of $\mathcal{G}$ . Then, for any $k\geq 1$ ,

[TABLE]

Proof.

Theorem 1 asserts that a critical Tokunaga tree $\mathcal{G}$ satisfies the strong Horton law (29) with Horton exponent $R=2c$ :

[TABLE]

Conditioned on ${\sf ord}(\mathcal{G})=K$ we have, for any $k\in\{1,\ldots,K\}$ ,

[TABLE]

where $m_{i}(\mathcal{G})$ is the number of side branches that merge with the $i$-th branch of order $k$ in $\mathcal{G}$, according to the proper branch labeling of Sect. 2.7. Proposition 7 gives

[TABLE]

For a critical Tokunaga tree with $T_{k}=(c-1)c^{k-1}$ this implies

[TABLE]

To show (148), we write

[TABLE]

where $m(i)$ is a random variable that represents the total number of side branches within the $i$-th branch of order $k$ within $\mathcal{G}$. Since $N_{k}[\mathcal{G}]\stackrel{{\scriptstyle p}}{{\to}}\infty$ for any $k\geq 1$ as ${\sf ord}(\mathcal{G})\to\infty$, the Weak Law of Large Numbers gives

[TABLE]

Finally, the strong Horton law of Cor. 6 gives

[TABLE]

This implies (148) and completes the proof. ∎
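The bookkeeping behind Proposition 11 can be checked with exact arithmetic. Our reading of the proof is an assumption, since its displays are not reproduced here. An order-$k$ branch consists of $1+m_i$ edges, and for $T_k=(c-1)c^{k-1}$ we have ${\sf E}[m_i]=T_1+\dots+T_{k-1}=c^{k-1}-1$, so that $\mathcal{V}_k[K]=c^{k-1}\mathcal{N}_k[K]$. We combine this with $\mathcal{N}_k[K]=\mathcal{N}_1[K-k+1]$ and the closed form $\mathcal{N}_1[K]=(cR^{K-1}+c-1)/(2c-1)$. The sketch then verifies the identity $\sum_k\mathcal{V}_k[K]=2\mathcal{V}_1[K]-1$ and the limit $\mathcal{V}_k[K]/\mathcal{V}[K]\to 2^{-k}$. All function names are hypothetical.

```python
from fractions import Fraction


def mean_leaves(K, c):
    """N_1[K] = (c R^{K-1} + c - 1) / (2c - 1), R = 2c: mean leaf count of an order-K tree."""
    return (c * (2 * c) ** (K - 1) + c - 1) / (2 * c - 1)


def mean_branches(k, K, c):
    """N_k[K] = N_1[K - k + 1]: pruning k - 1 times leaves an order-(K - k + 1) tree."""
    return mean_leaves(K - k + 1, c)


def mean_vertices(k, K, c):
    """V_k[K] = c^{k-1} N_k[K] (assumed: an order-k branch has on average c^{k-1} edges)."""
    return c ** (k - 1) * mean_branches(k, K, c)


def vertex_order_frequency(k, K, c):
    """V_k[K] / V[K] with V[K] = 2 V_1[K] - 1."""
    return mean_vertices(k, K, c) / (2 * mean_vertices(1, K, c) - 1)
```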

Proposition 11 has an immediate extension to trees with edge lengths, which we include here for completeness. Recall (Def. 1) that a tree $\mathcal{G}\in\mathcal{BL}^{|}$ can be considered a metric space with distance $d(a,b)$ between two points $a,b\in\mathcal{G}$ defined as the length of the shortest path within $\mathcal{G}$ connecting them.

Proposition 12.

Let $\mathcal{G}$ be a combinatorial critical Tokunaga tree (145). Let a point $u\in\mathcal{G}$ be sampled uniformly (with respect to length) from the metric space $\mathcal{G}$, and let ${\sf ord}(u)$ denote the order of the edge to which $u$ belongs. Then

[TABLE]

Proof.

Proposition 10 establishes that the edge lengths in $\mathcal{G}$ are i.i.d. exponential random variables. Thus we can generate $\mathcal{G}$ by first sampling a combinatorial critical Tokunaga tree $\textsc{shape}(\mathcal{G})$, and then assigning i.i.d. exponential edge lengths. Suppose $\textsc{shape}(\mathcal{G})$ has already been sampled. Then assigning the i.i.d. edge lengths, selecting the point $u\in\mathcal{G}$ uniformly at random, and marking the edge to which $u$ belongs is equivalent to selecting an edge of $\textsc{shape}(\mathcal{G})$ uniformly at random (with edges enumerated by the proper labeling of Sect. 2.7). Indeed, by exchangeability of the i.i.d. edge lengths $l_{1},\dots,l_{n}$, the point $u$ falls on edge $i$ with probability ${\sf E}\big[l_{i}/(l_{1}+\dots+l_{n})\big]=1/n$ for every $i$. The order ${\sf ord}(u)$ is uniquely determined by the edge to which $u$ belongs. The statement now follows from Prop. 11. ∎
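The key step in the proof of Proposition 12 is easy to illustrate by simulation: once the edge lengths are i.i.d., a point chosen uniformly by length lands on each edge with equal probability. The Python sketch below uses hypothetical helper names. In every trial it redraws $n$ i.i.d. ${\sf Exp}(1)$ lengths, places a uniform point on their concatenation, and records which edge it hits. The empirical frequencies should all be close to $1/n$.

```python
import bisect
import itertools
import random


def edge_hit_by_uniform_point(lengths, rng):
    """Index of the edge containing a point drawn uniformly (by length) from the edges laid end to end."""
    cumulative = list(itertools.accumulate(lengths))
    u = rng.uniform(0.0, cumulative[-1])
    return bisect.bisect_left(cumulative, u)


def edge_hit_frequencies(n_edges, trials, rng):
    """Empirical hit frequencies when fresh i.i.d. Exp(1) lengths are drawn in every trial."""
    counts = [0] * n_edges
    for _ in range(trials):
        lengths = [rng.expovariate(1.0) for _ in range(n_edges)]
        counts[edge_hit_by_uniform_point(lengths, rng)] += 1
    return [k / trials for k in counts]
```

For a fixed, unequal set of lengths the hit probabilities would instead be proportional to the lengths. Averaging over i.i.d. lengths is what restores uniformity.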

6.7.4 Proof of Theorem 15

Lemma 16 ([83]).

A geometric branching process $\mathcal{G}(s)$ is time invariant if and only if $p=1/2$ and the sequence $\{T_{k}\}$ solves the following (nonlinear) system of equations:

[TABLE]

Proof.

Assume that the process is time invariant. Then the process progeny is constant in time and equals unity:

$$\sum_{k\geq 1}x_{k}(s)=\sum_{k\geq 1}x_{k}(0)=1,\qquad s\in\mathbb{Z}_{+}.$$

Observe that in one time step, every vertex of order ${\sf ord}=1$ terminates, and any vertex of order ${\sf ord}>1$ splits in two. Hence, the process progeny at $s=1$ is

$$\sum_{k\geq 1}x_{k}(1)=2\sum_{k\geq 2}x_{k}(0)=2(1-p),$$

which implies $p=1/2$ . Accordingly, $p(1-p)^{k-1}=2^{-k}$ and the time invariance (144) takes the following coordinate form

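$$2^{-k}=2^{-k}S_{k}^{-1}+2^{-k}\big(1-S_{k-1}^{-1}\big)+\sum_{i>k}2^{-i}\,T_{i-k}\,S_{i-1}^{-1},\qquad k\geq 1.\tag{151}$$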

Multiplying (151) by $2^{k}$ and observing that $T_{k}=S_{k}-S_{k-1}$ , we obtain

[TABLE]

and

[TABLE]

We prove (150) by induction. For $k=1$ we have

[TABLE]

which establishes the base case

[TABLE]

Next, assuming that the statement is proven for $(k-1)$, the left-hand side of (152) vanishes, and the right-hand side rewrites as (150). This establishes necessity.

Conversely, we showed that the system (150) is equivalent to (144) in case $p=1/2$ . This establishes sufficiency. ∎
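Theorem 15 and Lemma 16 can be seen at work numerically. The sketch below evaluates one step of the mean dynamics at $x_i(0)=2^{-i}$, that is, with $p=1/2$. It assumes our reading of the one-step balance: $x_k(1)=2q_{k,k}x_{k+1}(0)+x_k(0)\sum_{i<k}q_{k,i}+\sum_{i>k}q_{i,k}x_i(0)$, with $q_{k,k}=S_k^{-1}$ and $q_{k,i}=T_{k-i}S_{k-1}^{-1}$ as given by Lemma 15. The infinite sum is truncated, and `one_step_density` is a hypothetical name. For the critical Tokunaga coefficients the state $2^{-k}$ is reproduced. For the constant sequence $T_k\equiv 1$ it is not.

```python
def one_step_density(k, T, S, N=200):
    """x_k(1) when x_i(0) = 2^{-i}; T(k), S(k) are callables, and the sum over i > k stops at N."""
    split = 2 * (1 / S(k)) * 2.0 ** (-(k + 1))        # order-(k+1) vertex -> two order-k vertices
    keep = 2.0 ** (-k) * (1 - 1 / S(k - 1))           # order-k vertex survives and sheds a side branch
    side = sum(2.0 ** (-i) * T(i - k) / S(i - 1)      # order-i vertex (i > k) sheds an order-k branch
               for i in range(k + 1, N))
    return split + keep + side
```

For $k=1$ and $T_k\equiv 1$ (so $S_k=1+k$), the value is $1/4+(\ln 2-1/2)\approx 0.443$ rather than $1/2$.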

Let $a_{k}=S_{k}/S_{k+1}\leq 1$ for all $k\geq 0$ . Then, for any $i\geq 0$ and any $k>0$ we have $S_{i}/S_{k+i}=a_{i}\,a_{i+1}\dots a_{i+k-1}$ . The system (150) rewrites in terms of $a_{i}$ as

[TABLE]

and so on, which can be summarized as

[TABLE]

Lemma 17 ([83]).

The system (153) with the initial value $a_{0}=1/c>0$ has a unique solution

$$a_{k}=\frac{1}{c},\qquad k\geq 0.$$

Proof of Lemma 17.

Suppose $\{a_{0},a_{1},a_{2},\dots\}$ is a solution to system (153). Then $\{1,a_{1}/a_{0},a_{2}/a_{0},\dots\}$ is also a solution, since each equation is homogeneous: all its terms are monomials in the $a_{j}$ of the same degree. Thus, without loss of generality we assume $a_{0}=1$, and we need to prove that

$$a_{k}=1,\qquad k\geq 0.$$

We consider two cases.

Case I. Suppose the sequence $\{a_{j}\}$ has a maximum: there exists an index $i\in\mathbb{N}$ such that $a_{i}=\max\limits_{j\in\mathbb{N}}a_{j}.$ Define

[TABLE]

Using $n=\ell$ in (153) we obtain that for any $\ell\in\mathbb{N}$ ,

[TABLE]

and using $n=\ell+1$ we find that an arbitrary $a_{\ell}$ is the weighted average of $\{a_{\ell+j}\}_{j=1,2,\ldots}$ :

[TABLE]

Hence, since $a_{i}=\max\limits_{j\in\mathbb{N}}a_{j}$ ,

[TABLE]

Similarly, letting $\ell=i-1$ in (154) and (155), we obtain $a_{i-1}=a$ . Recursively, by plugging in $\ell=i-2,~{}i-3,\ldots$ , we show that

[TABLE]

Finally, ${1\over 2}a_{1}+{1\over 4}a_{2}+{1\over 8}a_{3}+\ldots=1$ implies $a=1$ .

Case II. Suppose there is no $\max\limits_{j\in\mathbb{N}}a_{j}$ . Let $U:=\limsup\limits_{j\rightarrow\infty}a_{j}.$ From (153) we know via cancelation that

[TABLE]

Thus, $2^{-1}\,a_{n}<1$ and $U\leq 2$ . The absence of maximum implies $a_{j}<U\leq 2$ for all $j\in\mathbb{N}$ .

Plugging $n+1$ in (153), we obtain

[TABLE]

Thus, since $a_{j}<U$ for all $j\in\mathbb{N}$ ,

[TABLE]

which simplifies via (156) to

[TABLE]

For all $\varepsilon\in(0,1)$ , there are infinitely many $n\in\mathbb{N}$ such that $a_{n}>(1-\varepsilon)U$ . Then, for any such $n$ , the above inequality (157) implies

[TABLE]

where

[TABLE]

Let $\varphi^{(k)}=\varphi\circ\ldots\circ\varphi$ denote the $k$-fold composition of $\varphi$. Repeating the argument for any given number of iterations $K\in\mathbb{N}$, we obtain

[TABLE]

Thus, given any $K\in\mathbb{N}$, fix $\varepsilon\in(0,1)$ small enough so that $\varphi^{(k)}(\varepsilon)\in(0,1)$ for all $k=1,2,\ldots,K$. Then, taking $n>K$ such that $a_{n}>(1-\varepsilon)U$, we obtain from (156) that

[TABLE]

Now, since $\varepsilon$ can be chosen arbitrarily small,

[TABLE]

Finally, since $K$ can be selected arbitrarily large, we have proven that $U\leq 1$. However, this contradicts the assumption of Case II. Indeed, if $a_{j}<U\leq 1$ for all $j\in\mathbb{N}$, then

[TABLE]

contradicting the identity ${1\over 2}a_{1}+{1\over 4}a_{2}+{1\over 8}a_{3}+\ldots=1$, which follows from (153) with $a_{0}=1$ (see the end of Case I). Thus, the assumptions of Case II cannot be satisfied. We conclude that there exists a maximal element in the sequence $\{a_{j}\}_{j=1,2,\ldots}$, as assumed in Case I, which implies the statement of the lemma. ∎

Proof of Theorem 15.

Lemma 17 implies $a_{k}=S_{k}/S_{k+1}=1/c$ for some $c\geq 1$ . Hence $S_{1}=1+T_{1}=c$ and $T_{1}=c-1.$ Furthermore,

$$S_{k}=c\,S_{k-1}=c^{k},\qquad k\geq 0,$$

and, accordingly,

$$T_{k}=S_{k}-S_{k-1}=(c-1)\,c^{k-1},\qquad k\geq 1,$$

which completes the proof. ∎

7 Tree representation of continuous functions

We review here the results of [89, 106, 116, 150] on tree representation of continuous functions. This representation allows us to apply the self-similarity concepts to time series.

7.1 Harris path

For any embedded tree $T\in\mathcal{L}_{\rm plane}$ with edge lengths, the Harris path (also known as the contour function, or Dyck path) is defined as a piece-wise linear function [65, 116]

$$H_{T}(t):\big[0,\,2\,\textsc{length}(T)\big]\to\mathbb{R}_{+}$$

that equals the distance from the root of the point visited at time $t$ when the tree $T$ is traversed in depth-first search at unit speed, as illustrated in Fig. 22. For a tree $T$ with $n$ leaves, the Harris path $H_{T}(t)$ is a piecewise linear positive excursion that consists of $2n$ linear segments with alternating slopes $\pm 1$.
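The depth-first description of the Harris path translates directly into code. In the Python sketch below, a planted plane tree is encoded (our own assumption) by nested pairs `(edge_length, [children from left to right])`, with the outer pair being the stem. The function `harris_path` returns the breakpoints $(t,H_T(t))$. The function `monotone_segments` counts the maximal pieces of slope $\pm 1$, which should equal $2n$ for a tree with $n$ leaves. Both names are hypothetical.

```python
def harris_path(tree):
    """Breakpoints (t, H(t)) of the Harris path of a planted plane tree.

    A tree is a pair (edge_length, [children, left to right]); the outer pair is the stem.
    """
    points = [(0.0, 0.0)]

    def visit(node, t, h):
        length, children = node
        t, h = t + length, h + length          # climb the edge at unit speed
        points.append((t, h))
        for child in children:
            t = visit(child, t, h)             # each child excursion returns to height h
        t, h = t + length, h - length          # descend back toward the parent
        points.append((t, h))
        return t

    visit(tree, 0.0, 0.0)
    return points


def monotone_segments(points):
    """Number of maximal monotone pieces (slope +1 or -1) of the piecewise linear path."""
    signs = [1 if h2 > h1 else -1
             for (t1, h1), (t2, h2) in zip(points, points[1:]) if t2 > t1]
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
```

Take a cherry with stem length 1 and leaf edges of lengths 2 and 3. Its path climbs to height 3, returns to height 1, climbs to height 4, and descends to 0 at time $12=2\cdot\textsc{length}(T)$, giving four monotone pieces.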

7.2 Level set tree

This section introduces a tree representation of continuous functions, which we call a level set tree. We begin in Sect. 7.2.1 by assuming a finite number of local extrema; this construction is more intuitive and is sufficient for analysis of finite trees from $\mathcal{L}_{\rm plane}$ . A general definition for continuous functions follows in Sect. 7.2.2.

7.2.1 Tamed functions: finite number of local extrema

Consider a closed interval $I\subset\mathbb{R}$ and function $f(x)\in C(I)$ , where $C(I)$ is the space of continuous functions from $I$ to $\mathbb{R}$ . Suppose that $f(x)$ has a finite number of distinct local minima. The level set $\mathcal{L}_{\alpha}\left(f\right)$ is defined as the pre-image of the function values equal to or above $\alpha$ :

$$\mathcal{L}_{\alpha}(f)=\big\{x\in I:\,f(x)\geq\alpha\big\}.$$

The level set $\mathcal{L}_{\alpha}$ for each $\alpha$ is a union of non-overlapping intervals; we write $|\mathcal{L}_{\alpha}|$ for their number. Notice that $|\mathcal{L}_{\alpha}|=|\mathcal{L}_{\beta}|$ whenever the interval $[\alpha,\,\beta]$ contains no local extremum value of $f(x)$, and that $0\leq|\mathcal{L}_{\alpha}|\leq n$, where $n$ is the total number of local maxima of $f(x)$ over $I$.
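Consider a tamed function that is piecewise linear between grid values. Then $|\mathcal{L}_{\alpha}|$ can be read off the grid. On each linear piece the set $\{f\geq\alpha\}$ is an interval containing an endpoint, so the components of $\mathcal{L}_{\alpha}$ are the maximal runs of grid values $\geq\alpha$. The Python sketch below uses this observation to illustrate the two facts just stated. The grid representation and the function names are our assumptions. The facts are that $|\mathcal{L}_{\alpha}|$ is constant between consecutive extremum values, and that it never exceeds the number of local maxima.

```python
def level_set_components(values, alpha):
    """|L_alpha| for the piecewise linear interpolation of grid `values`.

    Components of {f >= alpha} are exactly the maximal runs of grid values >= alpha.
    """
    count, inside = 0, False
    for v in values:
        if v >= alpha and not inside:
            count += 1
        inside = v >= alpha
    return count


def count_local_maxima(values):
    """Strict local maxima of the interpolation (no plateaus assumed); an endpoint
    counts if it exceeds its only neighbour."""
    n = len(values)
    count = 0
    for i, v in enumerate(values):
        left = values[i - 1] if i > 0 else float("-inf")
        right = values[i + 1] if i < n - 1 else float("-inf")
        if v > left and v > right:
            count += 1
    return count
```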

The level set tree $\textsc{level}(f)\in\mathcal{L}_{\rm plane}$ is a tree that describes the structure of the level sets $\mathcal{L}_{\alpha}$ as a function of threshold $\alpha$ , as illustrated in Fig. 23. Specifically, there are bijections between

(i)

the leaves of $\textsc{level}(f)$ and the local maxima of $f(x)$ ;

(ii)

the internal (parental) vertices of $\textsc{level}(f)$ and the local minima of $f(x)$ , excluding possible local minima achieved on the boundary $\partial I$ ;

(iii)

a pair of subtrees of $\textsc{level}(f)$ rooted at the parental vertex that corresponds to a local minimum $f(x^{*})$ and the adjacent positive excursions (or meanders bounded by $\partial I$) of $f(x)-f(x^{*})$ to the right and left of $x^{*}$.

Furthermore, every edge in the tree is assigned a length equal to the difference of the values of $f(x)$ at the local extrema that correspond to the vertices adjacent to this edge according to the bijections (i) and (ii) above. The tree root corresponds to the global minimum of $f(x)$ on $I$. If the minimum is achieved at $x\in I\setminus\partial I$, then the level set tree is stemless, $\textsc{level}(f)\in\mathcal{L}_{\rm plane}^{\vee}$; this case is shown in Fig. 23. Otherwise, if the minimum is on the boundary $\partial I$, then the level set tree is planted, $\textsc{level}(f)\in\mathcal{L}_{\rm plane}^{|}$.

7.2.2 General case

For a function $f(x)\in C(I)$ on a closed interval $I\subset\mathbb{R}$ , the level set tree is defined via the framework of Def. 1, following Aldous [3, 4] and Pitman [116]. Specifically, let $\underline{f}[a,b]:=\inf_{x\in[a,b]}f(x)$ for any subinterval $[a,b]\subset\,I$ . We define a pseudo-metric on $I$ as [4, 116]

$$d_{f}(a,b):=f(a)+f(b)-2\,\underline{f}[a,b],\qquad a\leq b,$$

and $d_{f}(a,b):=d_{f}(b,a)$ for $a>b$.

We write $a\sim_{f}b$ if $d_{f}(a,b)=0$. Then $d_{f}$ is a metric on the quotient space $I_{f}\equiv I/\!\sim_{f}$. It can be shown [116] that $\left(I_{f},d_{f}\right)$ is a tree in the sense of Def. 1. Figure 24 illustrates this construction for a particular piecewise function (left panel) and shows the respective tree $(I_{f},d_{f})$ as an element of $\mathcal{L}_{\rm plane}^{|}$ (right panel).
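For a sequence of samples, the pseudo-metric has a direct discrete analogue, assuming the standard form $d_f(a,b)=f(a)+f(b)-2\,\underline{f}[a,b]$ with $\underline{f}[a,b]$ replaced by a minimum over indices. The sketch below (function names are hypothetical) computes it and brute-force checks the pseudo-metric axioms on a random integer sequence; integer values keep every comparison exact.

```python
import itertools
import random


def d_f(values, i, j):
    """Discrete d_f between sample indices i and j (in either order):
    values[i] + values[j] - 2 * min(values between i and j, inclusive)."""
    lo, hi = min(i, j), max(i, j)
    return values[i] + values[j] - 2 * min(values[lo:hi + 1])


def is_pseudometric(values):
    """Brute-force check of d(i, i) = 0, symmetry, non-negativity and the
    triangle inequality over all index pairs and triples."""
    idx = range(len(values))
    return (
        all(d_f(values, i, i) == 0 for i in idx)
        and all(d_f(values, i, j) == d_f(values, j, i) >= 0
                for i, j in itertools.product(idx, repeat=2))
        and all(d_f(values, i, k) <= d_f(values, i, j) + d_f(values, j, k)
                for i, j, k in itertools.product(idx, repeat=3))
    )


print(d_f([0, 3, 1, 4, 2], 1, 3))  # 5
print(d_f([2, 5, 2], 0, 2))        # 0: indices 0 and 2 are equivalent
rng = random.Random(0)
vals = [rng.randint(0, 9) for _ in range(20)]
print(is_pseudometric(vals))       # True
```

Pairs with $d_f=0$, such as the two endpoints of a nonnegative excursion, are exactly the points identified in the quotient $I_f$.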

We describe now the unique path $\sigma_{a,b}\subset I_{f}$ between a pair of points $a,b$ . Let $c\in[a,b]$ be the leftmost point where $f(x)$ achieves the minimum on $[a,b]$ :

$$c:=\min\big\{x\in[a,b]:\ f(x)=\underline{f}[a,b]\big\}.$$

We define a function $\underline{f}(x)$ on $[a,b]$ as

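$$\underline{f}(x):=\begin{cases}\underline{f}[a,x], & x\in[a,c],\\ \underline{f}[x,b], & x\in[c,b].\end{cases}$$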

By construction, $\underline{f}(x)$ is a continuous function that is monotone non-increasing on $[a,c]$ and monotone nondecreasing on $[c,b]$ . Furthermore, $\underline{f}(x)\leq f(x)$ and, in particular, $\underline{f}(x)=f(x)$ for $x\in\{a,b,c\}$ .

Lemma 18 (Rising Sun Lemma, F. Riesz [118]).

Let

$$S:=\big\{x\in[a,b]:\ f(x)>\underline{f}(x)\big\}.$$

Then $S$ is an open set that can be represented as a countable union of disjoint intervals

$$S=\bigcup_{k}(a_{k},b_{k}),$$

such that $f(a_{k})=f(b_{k})=\underline{f}(a_{k})=\underline{f}(b_{k})$ and $f(x)>f(a_{k})$ for any $x\in(a_{k},b_{k})$ .

Proof.

The statement is equivalent to the Rising Sun Lemma of Riesz [118, 130] applied to the functions $-f(x)$ on $[c,b]$ and $-f(-x)$ on $[-c,-a]$. We only need to notice that $f(c)$ is the global minimum of $f(x)$ on $[a,b]$, so $c$ cannot belong to $S$. Hence the two resulting open sets lie in $(a,c)$ and $(c,b)$, respectively, and their union is again an open set represented as a countable union of disjoint intervals. This completes the proof. ∎

Figure 25 illustrates the Rising Sun Lemma in our setting on the interval $[c,b]$. As the sun rises in the east (on the right), it illuminates some segments of the graph of $f(x)$ and leaves the other segments in shade. The pre-image of the shaded segments is the set $S$, while the pre-image of the illuminated segments is the path $\sigma_{c,b}$. The path, considered as a subset of $[c,b]$, makes at most a countable number of jumps over the intervals $(a_{k},b_{k})$ that comprise the set $S$ of Lemma 18.
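A discrete sketch of Lemma 18, assuming $f$ is given by its values at integer points of $[a,b]$: the envelope $\underline{f}$ is a running minimum taken from the left up to $c$ and from the right after $c$, the shade $S$ consists of the indices where $f>\underline{f}$, and the remaining indices trace the path $\sigma_{a,b}$. The function names are illustrative only.

```python
def lower_envelope(values):
    """Discrete underline-f on [a, b]: running minimum from the left on [a, c]
    and from the right on [c, b], where c is the leftmost global minimizer."""
    n = len(values)
    c = values.index(min(values))  # leftmost index of the global minimum
    env = list(values)
    running = values[0]
    for x in range(c + 1):
        running = min(running, values[x])
        env[x] = running  # min(values[0..x]): non-increasing on [a, c]
    running = values[-1]
    for x in range(n - 1, c - 1, -1):
        running = min(running, values[x])
        env[x] = running  # min(values[x..n-1]): non-decreasing on [c, b]
    return env


def rising_sun(values):
    """Split indices into the shade S (f > underline-f) and the path sigma."""
    env = lower_envelope(values)
    shade = [x for x, (v, e) in enumerate(zip(values, env)) if v > e]
    path = [x for x, (v, e) in enumerate(zip(values, env)) if v == e]
    return shade, path


shade, path = rising_sun([3, 5, 2, 4, 1, 6, 3, 4])
print(shade, path)  # [1, 3, 5] [0, 2, 4, 6, 7]
```

Along the returned path the values $3,2,1,3,4$ first decrease to the minimum and then increase, as the monotonicity of $\underline{f}$ requires.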

For a tamed function with a finite number of local extrema, the path $\sigma_{a,b}$ is the pre-image of the graph of $\underline{f}(x)$ excluding the constant intervals. The Rising Sun Lemma ensures that this statement generalizes to any continuous function:

$$\sigma_{a,b}=[a,b]\setminus S=[a,b]\setminus\bigcup_{k}(a_{k},b_{k}),$$

which is traversed at unit speed from left to right. As a subset of the real line, the path $\sigma_{a,b}$ may have a quite complicated structure; for instance, it can be a Cantor set. This, however, does not affect the continuity of the map $[0,d_{f}(a,b)]\to I_{f}$ in Def. 1.

The Rising Sun Lemma asserts that the function $\underline{f}(x)$ on $[a,b]$ is constant on at most countably many disjoint intervals $I^{(k)}=(a_{k},b_{k})$, each of which corresponds to a positive excursion of $f(x)$. The end points of each such interval are equivalent in $I_{f}$; hence each interval generates a tree $(I^{(k)}_{f},d_{f})$ whose root corresponds to the equivalence class in $I$ that contains $\{a_{k},b_{k}\}$. This observation leads to the following statement.

Corollary 10.

The level set tree $\textsc{level}(f)$ of a continuous function $f(x)$ on a real closed interval $[a,b]\subset\mathbb{R}$ consists of a segment of length $d_{f}(a,b)$ and at most a countable number of trees attached to this segment with the same orientation. There is a one-to-one correspondence between these trees and the intervals $(a_{k},b_{k})$ from the Rising Sun Lemma.

It is straightforward to observe that the tree $(I_{f},d_{f})$ is equivalent to the level set tree $\textsc{level}(f)$ defined above for a function $f(x)\in C\left(I\right)$ with a finite number of distinct local minima. It suffices to notice that for any subinterval $[a,b]\subset I$, the relation $a\sim_{f}b$ implies that $\{f(x):\ x\in[a,b]\}$ is a nonnegative excursion, i.e.,

$$f(a)=f(b)=\underline{f}[a,b],\quad\text{i.e.,}\quad f(x)\geq f(a)=f(b)\ \ \text{for all }x\in[a,b].$$

In other words, every point in $\left(I_{f},d_{f}\right)$ is an equivalence class of points of $I$ with respect to $\sim_{f}$. There are three types of equivalence classes, depending on the number of distinct points of $I$ they include: (i) each single-point class corresponds to a leaf vertex (local maximum), (ii) each two-point class corresponds to an internal edge point (positive excursion), and (iii) each three-point class corresponds to an internal vertex (two adjacent positive excursions). For a general $f(x)\in C(I)$ there may exist equivalence classes that include an arbitrary number $n$ of points of $I$, corresponding to $(n-1)$ adjacent positive excursions, as well as classes that consist of an infinite (countable or uncountable) number of points. Conversely, for every $\alpha$, the level set $\mathcal{L}_{\alpha}(f)$ is a union of non-overlapping intervals $[a_{j},b_{j}]$, i.e.,

$$\mathcal{L}_{\alpha}(f)=\bigcup_{j}[a_{j},b_{j}],$$

where for each $j$ , $a_{j}\sim_{f}b_{j}$ .

Representing level sets of a continuous function as a tree goes back to the works of Menger [99] and Kronrod [77]. A multivariate analog of the level set tree is among the key tools in the proof of the celebrated Kolmogorov-Arnold representation theorem (every multivariate continuous function can be represented as a superposition of continuous functions of two variables), which gives a positive answer to a general version of Hilbert's thirteenth problem [8, 141]. Such trees have also been discussed by Vladimir Arnold in connection with the topological classification of Morse functions and generalizations of Hilbert's sixteenth problem [9, 10]. Level set trees for multivariate Morse functions (albeit slightly different from those considered by Arnold) are discussed in Sect. 7.9.

7.3 Reciprocity of Harris path and level set tree

Consider a function $f(x)\in C(I)$ with a finite number of distinct local minima. By construction, the level set tree $\textsc{level}(f)$ is completely determined by the sequence of values of the local extrema of $f$, and is not affected by the timing of those extrema as long as their order is preserved. This means, for instance, that if $g(x)$ is a continuous and monotone increasing function that maps $I$ onto itself, then the trees $\textsc{level}(f)$ and $\textsc{level}(f\circ g)$ are equivalent in $\mathcal{L}_{\rm plane}$. Hence, without loss of generality, we can focus on the level set trees of continuous functions with alternating slopes $\pm 1$. We write $\mathcal{E}^{\rm ex}$ for the space of all positive piecewise linear continuous finite excursions with alternating slopes $\pm 1$ and a finite number of segments (i.e., a finite number of local extrema).
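Because only the ordered sequence of extremal values matters, a sampled excursion can be reduced to its local extrema and redrawn with slopes $\pm1$; for a positive excursion this gives an element of $\mathcal{E}^{\rm ex}$. A minimal sketch with hypothetical helper names, assuming consecutive sample values are distinct (no flat pieces):

```python
def local_extrema(values):
    """Boundary values together with the interior strict local extrema, in order.
    Assumes consecutive values are distinct (no flat pieces)."""
    extrema = [values[0]]
    for prev, cur, nxt in zip(values, values[1:], values[2:]):
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            extrema.append(cur)
    extrema.append(values[-1])
    return extrema


def unit_slope_path(extrema):
    """Breakpoints (t, value) of the piecewise linear path with slopes +1/-1
    that visits the given extrema in order, starting at t = 0."""
    t = 0.0
    points = [(t, extrema[0])]
    for prev, cur in zip(extrema, extrema[1:]):
        t += abs(cur - prev)  # with slope +/-1, elapsed time equals |rise|
        points.append((t, cur))
    return points


ext = local_extrema([0, 1, 3, 2, 2.5, 1, 0])
print(ext)                   # [0, 3, 2, 2.5, 0]
print(unit_slope_path(ext))  # [(0.0, 0), (3.0, 3), (4.0, 2), (4.5, 2.5), (7.0, 0)]
```

The redrawn path has the same sequence of extrema as the input, so, by the observation above, both have the same level set tree.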

The level set tree of an excursion from $\mathcal{E}^{\rm ex}$ and the Harris path are reciprocal to each other, as described in the following statement.

Proposition 13 (Reciprocity of Harris path and level set tree).

The Harris path $H:\mathcal{L}_{\rm plane}^{|}\to\mathcal{E}^{\rm ex}$ and the level set tree ${\textsc{level}}:\mathcal{E}^{\rm ex}\to\mathcal{L}_{\rm plane}^{|}$ are reciprocal to each other. This means that for any $T\in\mathcal{L}_{\rm plane}^{|}$ we have $\textsc{level}(H_{T}(t))\equiv T,$ and for any $g(t)\in\mathcal{E}^{\rm ex}$ we have $H_{\textsc{level}(g)}(t)\equiv g(t).$

This statement is readily verified by examining the excursions and trees in Figs. 22 and 24.

7.4 Horton pruning of positive excursions

This section examines the level set tree and its Horton pruning for a positive excursion on a finite real interval. We use these results for the analysis of random walks $X_{k}$, $k\in\mathbb{Z}$, which is why we write here $X_{t}$, $t\in\mathbb{R}$, for a continuous function.

Consider a continuous positive excursion $X_{t}$, $t\in[a,b]$, with a finite number of distinct local minima and such that $X_{a}=X_{b}=0$ and $X_{t}>0$ for $a<t<b$. Furthermore, consider the excursion $X^{(1)}_{t}$, $t\in[a,b]$, obtained by linear interpolation of the boundary values and the local minima of $X_{t}$, as well as the functions $X^{(m)}_{t}$, $t\in[a,b]$, $m\geq 1$, obtained by iteratively taking local minima $m$ times and linearly interpolating their values together with $X_{a}=X_{b}=0$ (see Fig. 26a).

In the space of level set trees of tamed continuous functions, the Horton pruning $\mathcal{R}$ corresponds to coarsening the respective function by removing (smoothing) its local maxima, as illustrated in Fig. 26. Iterated pruning corresponds to iterated passage to the local minima, as we describe in the next statement.

Proposition 14 (Horton pruning of positive excursions, [150]).

The transition from a positive excursion $X_{t}$ to the respective excursion $X^{(1)}_{t}$ of its local minima corresponds to the Horton pruning of the level set tree $\textsc{level}(X_{t})$. This is illustrated by the diagram in Fig. 27. In general,

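$$\mathcal{R}^{m}\big(\textsc{level}(X_{t})\big)=\textsc{level}\big(X^{(m)}_{t}\big)\quad\text{for any }m\geq 1.$$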

Proof.

First,

$$\mathcal{R}\big(\textsc{level}(X_{t})\big)=\textsc{level}\big(X^{(1)}_{t}\big)$$

is established via the following observation. For a pair of consecutive local minima $s_{1}<s_{2}$ , the level set tree $\textsc{level}(\widetilde{X}_{t})$ of the function

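$$\widetilde{X}_{t}=\begin{cases}X_{t}, & t\notin(s_{1},s_{2}),\\[2pt] X_{s_{1}}+\dfrac{t-s_{1}}{s_{2}-s_{1}}\,\big(X_{s_{2}}-X_{s_{1}}\big), & t\in(s_{1},s_{2}),\end{cases}$$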

is obtained from $\textsc{level}\left(X_{t}\right)$ by removing the leaf that corresponds to the unique local maximum of $X_{t}$ inside $(s_{1},s_{2})$, together with the parental edge that connects it to the parental vertex corresponding to $\max\{X_{s_{1}},X_{s_{2}}\}$. Thus, replacing $X_{t}$ with the linear interpolation of its local minima, $X^{(1)}_{t}$, results in the simultaneous removal of all leaves together with their parental edges. The statement of the proposition follows by recursive application of (159). ∎
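Proposition 14 turns Horton pruning into an operation on sequences, which makes it easy to test. The sketch below (hypothetical helper names; interior values of a discretized excursion assumed distinct) computes the Horton-Strahler order of $\textsc{level}(X_t)$ in two ways: by repeatedly passing to the interpolated local minima $X^{(1)}$ until no interior points remain, and by a direct recursion that splits the tree at the lowest interior local minimum (the first branching vertex) and applies the Horton-Strahler rule, under which the order increases by one only when both subtrees have equal order. Proposition 14 predicts that the two counts agree.

```python
import random


def interior_local_minima(seq):
    """Values at interior strict local minima of seq, left to right."""
    return [seq[i] for i in range(1, len(seq) - 1)
            if seq[i - 1] > seq[i] < seq[i + 1]]


def order_by_pruning(seq):
    """Number of Horton prunings until level(X) is empty, using Prop. 14:
    R(level(X)) = level(X^(1)), with X^(1) the interpolated local minima."""
    order = 0
    while len(seq) > 2:  # interior points remain, so the tree is non-empty
        seq = [seq[0]] + interior_local_minima(seq) + [seq[-1]]
        order += 1
    return order


def order_by_recursion(seq):
    """Horton-Strahler order of level(X) computed from the tree directly:
    the lowest interior local minimum is the first branching vertex."""
    if len(seq) <= 2:
        return 0
    mins = [i for i in range(1, len(seq) - 1)
            if seq[i - 1] > seq[i] < seq[i + 1]]
    if not mins:
        return 1  # a single local maximum: the tree is one leaf edge
    c = min(mins, key=lambda i: seq[i])
    left = order_by_recursion(seq[:c + 1])
    right = order_by_recursion(seq[c:])
    return left + 1 if left == right else max(left, right)


def agree_on_random_excursions(trials=300, seed=3):
    """Compare both computations on random excursions with distinct values."""
    rng = random.Random(seed)
    for _ in range(trials):
        inner = [rng.uniform(0.01, 1.0) for _ in range(rng.randint(1, 30))]
        seq = [0.0] + inner + [0.0]
        if order_by_pruning(seq) != order_by_recursion(seq):
            return False
    return True


excursion = [0, 5, 2, 6, 1, 7, 3, 8, 0]
print(order_by_pruning(excursion), order_by_recursion(excursion))  # 3 3
print(agree_on_random_excursions())  # True
```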

It is straightforward to formulate an analog of Prop. 14 without the excursion assumption, for continuous functions with a finite number of distinct local minima within $[a,b]$ .

7.5 Excursion of a symmetric random walk

We turn now to random walks $\{X_{k}\}_{k\in\mathbb{Z}}$. Linear interpolations of their trajectories are tamed continuous functions. A random walk $\{X_{k}\}_{k\in\mathbb{Z}}$ with a transition kernel $p(x,y)$ is called homogeneous if $p(x,y)\equiv p(y-x)$ for any $x,y\in\mathbb{R}$. A homogeneous random walk is symmetric if $p(x)=p(-x)$ for all $x\in\mathbb{R}$. The transition kernel of a symmetric random walk can be represented as the even part of a p.d.f. $f(x)$ with support ${\sf supp}(f)\subseteq\mathbb{R}_{+}$:

$$p(x)=\frac{f(x)+f(-x)}{2}.$$

We assume that $p(x)$ , and hence $f(x)$ , is an atomless density function.

We write $\{X^{(1)}_{k}\}_{k\in\mathbb{Z}}$ for the sequence of local minima of $\{X_{k}\}_{k\in\mathbb{Z}}$, listed in the order of occurrence, from left to right. In particular, we set $X^{(1)}_{0}$ to be the value of the leftmost local minimum of $X_{k}$ for $k\geq 0$. Recursively, we let $\{X^{(j+1)}_{k}\}_{k\in\mathbb{Z}}$ denote the sequence of local minima of $\{X^{(j)}_{k}\}_{k\in\mathbb{Z}}$.

Lemma 19 (Local minima of random walks, [150]).

The following statements hold.

(i)

The sequence of local minima $\{X^{(1)}_{k}\}_{k\in\mathbb{Z}}$ of a homogeneous random walk $\{X_{k}\}_{k\in\mathbb{Z}}$ is itself a homogeneous random walk.

(ii)

The sequence of local minima $\{X^{(1)}_{k}\}_{k\in\mathbb{Z}}$ of a symmetric homogeneous random walk $\{X_{k}\}_{k\in\mathbb{Z}}$ is itself a symmetric homogeneous random walk.

Proof.

Let $d_{j}=X^{(1)}_{j+1}-X^{(1)}_{j}$ . We have, for each $j$ ,

$$d_{j}=\sum_{i=1}^{\xi_{+}}Y_{i}-\sum_{i=1}^{\xi_{-}}Z_{i},$$

where the first sum corresponds to $\xi_{+}$ positive increments of $X_{k}$ between a local minimum $X^{(1)}_{j}$ and the subsequent local maximum $m_{j}$ , and the second sum corresponds to $\xi_{-}$ negative increments between the local maximum $m_{j}$ and the subsequent local minimum $X^{(1)}_{j+1}$ . Accordingly, $\xi_{+}$ and $\xi_{-}$ are independent geometric random variables

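$${\sf P}(\xi_{\pm}=k)=p^{\pm}\,(1-p^{\pm})^{k-1},\qquad k=1,2,\dots,$$

with parameters, respectively,

$$p^{+}=\int_{-\infty}^{0}p(x)\,dx,\qquad p^{-}=\int_{0}^{\infty}p(x)\,dx,$$

and $Y_{i}$, $Z_{i}$ are i.i.d. positive continuous random variables with p.d.f.s, respectively,

$$\frac{p(x)}{1-p^{+}}\quad\text{and}\quad\frac{p(-x)}{1-p^{-}},\qquad x>0.$$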

$(i)$ By the independence of the increments of a random walk, the jumps $d_{j}$ are independent and identically distributed. This establishes the statement.

$(ii)$ For the kernel of a symmetric random walk, we have representation (160). In this case, $\xi_{+}$ and $\xi_{-}$ are independent geometric random variables with parameters $p^{+}=p^{-}=1/2$ and $Y_{i}$ , $Z_{i}$ are i.i.d. positive continuous random variables with p.d.f. $f(x)$ . Hence, both sums in (161) have the same distribution, and their difference has a symmetric distribution. Thus $\{X^{(1)}_{j}\}_{j\in\mathbb{Z}}$ is a symmetric homogeneous random walk. ∎

We notice that the kernel $p^{(1)}(x)$ of the chain of local minima $\{X^{(1)}_{j}\}_{j\in\mathbb{Z}}$ is necessarily different from $p(x)$ in both parts of Lemma 19. Hence, the random walk $\{X^{(1)}_{j}\}$ of local minima is always different from the initial random walk $\{X_{k}\}$. In the symmetric case, however, the two processes turn out to be closely related in terms of the structure of their level set trees. We now explore this relation.

Consider linear interpolation $\{X_{t}\}_{t\in\mathbb{R}}$ of a symmetric homogeneous random walk $\{X_{k}\}_{k\in\mathbb{Z}}$ with an atomless transition kernel $p(x)$ . For any $k\in\mathbb{Z}$ , let

$$T^{\rm ex}=T^{\rm ex}(X_{t},k)$$

be the level set tree of the first positive excursion of $X_{t}-X_{k}$ to the right of $k$ , with convention $T^{\rm ex}=\phi$ if $X_{k+1}-X_{k}<0$ . Formally, let $r=r(k)\in\mathbb{R}$ be the unique epoch such that (Fig. 28)

[TABLE]

The epoch $r(k)$ is almost surely finite, as can be demonstrated by a renewal argument using the symmetry of the increments of $X_{k}$ . We define

[TABLE]

It follows from this definition that

[TABLE]

The basic properties of symmetric homogeneous random walks imply that the distribution of $T^{\rm ex}(X_{t},k)$ is the same for all points $k\in\mathbb{Z}$ . This justifies the following definition.

Definition 29 (Positive and nonnegative excursions).

In the above setup, we call process $X^{\rm ex}_{t}$ a nonnegative excursion of the linearly interpolated symmetric homogeneous random walk $\{X_{t}\}_{t\in\mathbb{R}}$ if

[TABLE]

Furthermore, we call process $X^{\rm ex}_{t}$ a positive excursion of the linearly interpolated symmetric homogeneous random walk $\{X_{t}\}_{t\in\mathbb{R}}$ if

[TABLE]

A positive excursion defined above will also be called a positive right excursion. The corresponding positive excursion in the reversed time order, starting from $k$ and going in the negative time direction, will be called positive left excursion. According to Def. 29, a nonnegative excursion may consist of a single point (if $r(k)=k$ ), in which case its level set tree is the empty tree. A positive excursion necessarily includes at least one positive value, and its level set tree is non-empty.

Consider a homogeneous random walk $X_{k}$ with a symmetric atomless transition kernel $p(x)$, $x\in\mathbb{R}$, represented as in (160). Note that $X_{k}$ is time reversible, with $p(x)$ also being the transition kernel of the reversed process. The increment between a pair of consecutive local extrema (a minimum and a maximum) of $X_{k}$ is a sum of a ${\sf Geom}_{1}(1/2)$-distributed number of i.i.d. $f(x)$-distributed random variables, and therefore has density

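$${\sf s}(x)=\sum_{k=1}^{\infty}2^{-k}f^{*k}(x),$$

where $f^{*k}$ denotes the $k$-fold convolution of $f$ with itself.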
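In terms of characteristic functions this density is easy to handle: if $\widehat f$ is the characteristic function of $f$, then $\widehat{\sf s}=\widehat f/(2-\widehat f)$, because ${\sf E}[z^{\xi}]=z/(2-z)$ for $\xi\sim{\sf Geom}_1(1/2)$. The sketch below (hypothetical function names; the rate $\lambda=1.5$ is an arbitrary choice) checks numerically that an exponential $f$ with rate $\lambda$ yields an exponential ${\sf s}$ with rate $\lambda/2$, a fact used later in the proof of Lemma 20.

```python
def char_exp(lam, t):
    """Characteristic function E[exp(i t Y)] of Y ~ Exp(lam)."""
    return lam / (lam - 1j * t)


def char_s(char_f, t):
    """Characteristic function of s: a sum of xi ~ Geom_1(1/2) i.i.d. f-draws.

    Uses E[z**xi] = sum_k 2**-k z**k = z / (2 - z) with z = char_f(t).
    """
    z = char_f(t)
    return z / (2 - z)


lam = 1.5
grid = [-3.0, -0.5, 0.0, 0.7, 4.0]
max_err = max(abs(char_s(lambda u: char_exp(lam, u), t) - char_exp(lam / 2, t))
              for t in grid)
print(max_err < 1e-12)  # True: s is Exp(lam / 2) when f is Exp(lam)
```

Algebraically, $\frac{\lambda/(\lambda-it)}{2-\lambda/(\lambda-it)}=\frac{\lambda}{\lambda-2it}$, which is the characteristic function of ${\sf Exp}(\lambda/2)$; the grid check only confirms this up to rounding.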

We now examine a positive-time process $\{X_{k}\}_{k\geq 0}$, conditioned on $X_{0}=0$. Consider the sequence of local minima $\big\{X_{j}^{(1)}\big\}_{j\geq 0}$, where we set $X_{0}^{(1)}=0$, and $X_{1}^{(1)},X_{2}^{(1)},\ldots$ are the local minima of the random walk $X_{k}$, $k>0$, listed from left to right. For a positive right excursion $X^{\sf ex}$ originating at $k=0$, the number $N$ of leaves in the level set tree $\textsc{level}(X^{\sf ex})$ is determined by the location of the first local minimum below zero:

$$N=\min\big\{j\geq 1:\ X_{j}^{(1)}<0\big\}.$$

The number of edges in the level set tree is $\#\textsc{level}(X^{\sf ex})=2N-1$ . Moreover, let $\kappa>0$ be the time of the first local minimum below zero, $X_{\kappa}=X_{N}^{(1)}$ . Next, we define the quantity by which the first nonpositive local minimum of $X_{k}$ falls below the starting point at zero.

Definition 30 (Extended positive excursion and excess value).

In the above setup, the process $\breve{X}^{\sf ex}=\{X_{t}\}_{t\in[0,\kappa]}$ is called the extended positive excursion or extended positive right excursion. That is, $\breve{X}^{\sf ex}$ is obtained by extending the excursion $X^{\sf ex}$ until the first local minimum $X_{\kappa}$ below the starting value. The quantity $\Lambda\big(\breve{X}^{\sf ex}\big):=-X_{N}^{(1)}$ is called the excess value of the extended excursion. This definition is illustrated in Fig. 29(a).

The notions of the extended positive excursion and the excess value $\Lambda\big(\breve{X}^{\sf ex}\big)$ can be extended to left and right excursions with arbitrary initial values.

Theorem 16 (Combinatorial excursion tree is critical Galton-Watson).

Suppose $X^{\sf ex}$ is a positive excursion of a homogeneous random walk on $\mathbb{R}$ with a symmetric atomless transition kernel and $T=\textsc{level}(X^{\sf ex})$ . Then, the combinatorial shape of $T$ has the same distribution on $\mathcal{BT}^{|}$ as the critical binary Galton-Watson tree:

$$\textsc{shape}(T)\stackrel{d}{\sim}\mathcal{GW}(1/2,1/2).$$

Proof.

Recall that $\textsc{shape}(T)$ is almost surely in $\mathcal{BT}^{|}$. Without loss of generality we consider a positive right excursion $X^{\sf ex}$ originating at $k=0$, where we set $X_{0}=0$. The tree $\textsc{shape}(T)$ has exactly one leaf if and only if the first local minimum falls below zero, that is, if the jump from $X_{0}=0$ to the first local maximum is smaller than or equal to the jump from the first local maximum to the subsequent local minimum. Since these two jumps are i.i.d. with the continuous density ${\sf s}$, the probability of this event is:

$${\sf P}\big({\sf ord}(T)=1\big)={\sf P}\big(X^{(1)}_{1}<0\big)=\frac{1}{2}.$$

According to the characterization of the critical Galton-Watson distribution $\mathcal{GW}(1/2,1/2)$ given in Remark 2 of Sect. 2.8, the proof will be complete if we show that conditioned on ${\sf ord}(T)\geq 2$ , the tree $\textsc{shape}(T)$ splits into a pair of complete subtrees sampled independently from the same distribution as $\textsc{shape}(T)$ . This step is completed as follows.

Consider the space $\mathcal{X}_{L}$ of all trajectories of extended positive left excursions originating at $X_{0}=0$ whose level set trees are of Horton-Strahler order $\geq 2$. Similarly, consider the space $\mathcal{X}_{R}$ of all trajectories of extended positive right excursions originating at $X_{0}=0$ whose level set trees are of Horton-Strahler order $\geq 2$. We know from (163) that the probability measure of each of the sets $\mathcal{X}_{L}$ and $\mathcal{X}_{R}$ totals $1/2$. Thus, we may consider the union $\mathcal{X}_{L}\cup\mathcal{X}_{R}$ of left and right extended positive excursions and equip it with a new probability measure obtained by gluing together the two respective restrictions of the probability measures for the left and the right positive excursions. That is, when restricted to either $\mathcal{X}_{L}$ or $\mathcal{X}_{R}$, the probability measure over the trajectories in $\mathcal{X}_{L}\cup\mathcal{X}_{R}$ coincides with the respective probability measure for the left or the right positive excursions, and the total probability adds up to one. Now, since all the left and right extended positive excursions in $\mathcal{X}_{L}\cup\mathcal{X}_{R}$ have Horton-Strahler order $\geq 2$, for each $\breve{X}^{\sf ex}\in\mathcal{X}_{L}\cup\mathcal{X}_{R}$ there is almost surely a unique integer $d^{*}$ such that $\breve{X}^{\sf ex}_{d^{*}}>0$ is the smallest local minimum of the excursion $\breve{X}^{\sf ex}$.

Next, conditioning on $X_{0}=0$ being a local minimum of $X_{t}$ , we consider a space $\mathcal{X}_{LR}$ of all possible trajectories such that each trajectory consists of the left and the right extended positive excursions originating from $X_{0}=0$ (with no restrictions on their orders). For a trajectory in $\mathcal{X}_{LR}$ , let $\kappa_{L}<0$ and $\kappa_{R}>0$ denote the (random) endpoints of the left and the right extended positive excursions. Thus, a trajectory $X_{t}$ , $t\in[\kappa_{L},\kappa_{R}]$ , in $\mathcal{X}_{LR}$ consists of a left extended positive excursion $X_{t}~{}(\kappa_{L}\leq t\leq 0)$ and a right extended positive excursion $X_{t}~{}(0\leq t\leq\kappa_{R})$ . This construction is illustrated in Fig. 29(b). The probability measure over the space $\mathcal{X}_{LR}$ is a product measure of the left and the right positive excursions. We claim that there exists a bijective measure preserving shift map

$$\Psi:\ \mathcal{X}_{LR}\to\mathcal{X}_{L}\cup\mathcal{X}_{R}.$$

Indeed, if the excess value $\Lambda\big(\{X_{t}\}_{\kappa_{L}\leq t\leq 0}\big)=-X_{\kappa_{L}}$ of the left excursion is smaller than the excess value $\Lambda\big(\{X_{t}\}_{0\leq t\leq\kappa_{R}}\big)=-X_{\kappa_{R}}$ of the right excursion, we set

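$$\Psi\big(\{X_{t}\}_{\kappa_{L}\leq t\leq\kappa_{R}}\big)=\{X_{t+\kappa_{L}}-X_{\kappa_{L}}\}_{0\leq t\leq\kappa_{R}-\kappa_{L}}\in\mathcal{X}_{R}.$$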

Otherwise, we set

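$$\Psi\big(\{X_{t}\}_{\kappa_{L}\leq t\leq\kappa_{R}}\big)=\{X_{t+\kappa_{R}}-X_{\kappa_{R}}\}_{\kappa_{L}-\kappa_{R}\leq t\leq 0}\in\mathcal{X}_{L}.$$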

The map $\Psi$ is one-to-one and onto, as it consists of vertical and horizontal shifts. Also observe that under the mapping $\Psi$, the point $(0,0)$ of a trajectory in $\mathcal{X}_{LR}$ is sent to the point $(d^{*},\breve{X}^{\sf ex}_{d^{*}})$ of the image trajectory in $\mathcal{X}_{L}\cup\mathcal{X}_{R}$. Accordingly, we can construct $\Psi^{-1}:\,\mathcal{X}_{L}\cup\mathcal{X}_{R}\rightarrow\mathcal{X}_{LR}$ as the map that shifts a trajectory $\breve{X}^{\sf ex}$ in $\mathcal{X}_{L}\cup\mathcal{X}_{R}$ by subtracting $(d^{*},\breve{X}^{\sf ex}_{d^{*}})$. Finally, because we take the same product of the transition kernel values ${\sf s}(x)$ for the increments of a trajectory in $\mathcal{X}_{LR}$ as for its image in $\mathcal{X}_{L}\cup\mathcal{X}_{R}$ under the one-to-one mapping $\Psi$, the mapping $\Psi$ is measure preserving.

Thus, since vertical and horizontal shifts of a function preserve its level set tree, we conclude that the distributions of the level set trees for the trajectories in $\mathcal{X}_{LR}$ and for the trajectories in $\mathcal{X}_{L}\cup\mathcal{X}_{R}$ coincide. The level set tree of a trajectory in $\mathcal{X}_{LR}$ consists of a stem that branches into the two level set trees of the left and right positive excursions adjacent to $X_{0}=0$, sampled independently from the same distribution as $\textsc{shape}(T)$. This is so since, for the trajectories in $\mathcal{X}_{LR}$, $X_{0}=0$ is the smallest local minimum. Finally, we observe that the distribution of $\textsc{shape}\big(\textsc{level}(\breve{X}^{\sf ex})\big)$ is the same when $\breve{X}^{\sf ex}$ is sampled from $\mathcal{X}_{L}$ as when it is sampled from $\mathcal{X}_{R}$. Thus, for $\breve{X}^{\sf ex}$ sampled from $\mathcal{X}_{R}$, $\textsc{shape}\big(\textsc{level}(\breve{X}^{\sf ex})\big)$ consists of a stem that branches into two level set trees. If $X^{\sf ex}$ is the right positive excursion corresponding to $\breve{X}^{\sf ex}$ sampled from $\mathcal{X}_{R}$, then almost surely,

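$$\textsc{shape}\big(\textsc{level}(X^{\sf ex})\big)=\textsc{shape}\big(\textsc{level}(\breve{X}^{\sf ex})\big).$$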

Thus, conditioned on ${\sf ord}(T)\geq 2$ , the tree $\textsc{shape}(T)$ splits into a pair of complete subtrees sampled independently from the same distribution as $\textsc{shape}(T)$ . This completes the proof. ∎

Theorem 16 establishes that the level set trees of symmetric random walks have the same combinatorial structure (that of a critical binary Galton-Watson tree), regardless of the choice of the transition kernel $p(x)$. The planar embedding and the metric structure of the level set trees, however, may depend on the kernel, as we illustrate in the following remark.
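The kernel independence in Theorem 16 can be probed by simulation. For the critical binary Galton-Watson tree, the number of leaves $N$ satisfies ${\sf P}(N=n)=C_{n-1}\,2^{-(2n-1)}$, where $C_m$ is the $m$-th Catalan number: a full binary tree with $n$ leaves has $2n-1$ vertices, each contributing a factor $1/2$, and there are $C_{n-1}$ such trees. The sketch below assumes, hypothetically, that $f$ is the Uniform$[0,1]$ density; it draws the jumps between consecutive extrema as ${\sf Geom}_1(1/2)$ sums of $f$-variables and records the index of the first local minimum below zero. The estimates should be close to $1/2$, $1/8$ and $1/16$ for $n=1,2,3$ up to Monte Carlo error; this is a consistency check, not a proof.

```python
import math
import random


def uniform_f(rng):
    """Hypothetical choice of f: the Uniform[0, 1] density on R_+."""
    return rng.random()


def sample_s(rng, sample_f):
    """Jump between consecutive extrema: a Geom_1(1/2) number of f-draws."""
    total = sample_f(rng)
    while rng.random() < 0.5:  # each further step continues the run w.p. 1/2
        total += sample_f(rng)
    return total


def leaf_count(rng, sample_f, cap):
    """Leaves N of level(X^ex) for a positive right excursion from 0, i.e. the
    index of the first local minimum below 0; returns None if N > cap."""
    x = 0.0
    for n in range(1, cap + 1):
        x += sample_s(rng, sample_f)  # rise to the n-th local maximum
        x -= sample_s(rng, sample_f)  # fall to the n-th local minimum
        if x < 0:
            return n
    return None


def gw_leaf_pmf(n):
    """P(N = n) for the critical binary Galton-Watson tree: C_{n-1} / 2**(2n-1)."""
    catalan = math.comb(2 * (n - 1), n - 1) // n
    return catalan / 2 ** (2 * n - 1)


def empirical_pmf(trials=100_000, cap=3, seed=7):
    """Monte Carlo frequencies of N = 1, ..., cap (index 0 is unused)."""
    rng = random.Random(seed)
    counts = [0] * (cap + 1)
    for _ in range(trials):
        n = leaf_count(rng, uniform_f, cap)
        if n is not None:
            counts[n] += 1
    return [c / trials for c in counts]


est = empirical_pmf()
for n in (1, 2, 3):
    print(n, round(est[n], 3), gw_leaf_pmf(n))  # estimate versus exact value
```

Only the first `cap` extrema are simulated because $N$ is heavy-tailed; replacing `uniform_f` by another sampler on $\mathbb{R}_+$ should leave the frequencies unchanged, in line with Theorem 16.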

Remark 13.

Consider an extended positive right excursion $\breve{X}^{\rm ex}$ of a symmetric homogeneous random walk and let $T=\textsc{level}(\breve{X}^{\rm ex})$ be its level set tree. Condition on the event ${\sf ord}(T)\geq 2$, which ensures that the left and right principal subtrees of $T$, denoted $T^{\ell}$ and $T^{r}$, respectively, are non-empty.

It follows from the construction in the proof of Thm. 16 that the subtrees $T^{\ell}$ and $T^{r}$ can be sampled as follows. Consider two independent excursions: an extended positive right excursion $\breve{X}_{t}^{\sf ex,r}$, $t\in[0,\kappa_{r}]$, and an extended positive left excursion $\breve{X}_{t}^{\sf ex,\ell}$, $t\in[\kappa_{\ell},0]$. Next, condition on the event that the excess value of the left excursion is less than that of the right excursion:

$$\Lambda\big(\breve{X}^{\sf ex,\ell}\big)<\Lambda\big(\breve{X}^{\sf ex,r}\big).$$

Denote by $X^{\sf ex,\ell}$ and $X^{\sf ex,r}$ the positive left and right excursions that correspond to the extended excursions $\breve{X}^{\sf ex,\ell}$ and $\breve{X}^{\sf ex,r}$ . Then,

[TABLE]

Write $X^{\sf ex}$ for the positive right excursion that corresponds to the extended excursion $\breve{X}^{\rm ex}$. Then the stem of the tree $\textsc{level}(X^{\sf ex})\in\mathcal{BL}_{\rm plane}^{|}$ has length equal to $\Lambda\big(\{\breve{X}_{t}^{\sf ex,\ell}\}_{t\in[\kappa_{\ell},0]}\big)$. In general, this may introduce dependence between the planar embedding of $T$ and its edge lengths. Such dependence is absent in the exponential critical binary Galton-Watson tree ${\sf GW}(\lambda)$.

Next, condition on the event that $X^{\sf ex}$ is a $\Lambda$-shaped excursion, which is equivalent to

$$\#T=1,\quad\text{i.e.,}\quad X^{(1)}_{1}<0.$$

Then, the density function of the excess value $\Lambda\big{(}\breve{X}^{\sf ex}\big{)}$ that we denote by $\lambda_{1}(x)$ satisfies

$$\lambda_{1}(x)=2\int_{0}^{\infty}{\sf s}(y)\,{\sf s}(x+y)\,dy,\qquad x>0,$$

where ${\sf s}(x)$ was defined in (162). This is so because conditioned on

$$\#T=1,$$

the extended excursion $\breve{X}^{\sf ex}$ consists of an ${\sf s}$ -distributed jump upward, and a larger ${\sf s}$ -distributed jump downward. The excess value $\Lambda\big{(}\breve{X}^{\sf ex}\big{)}$ is the difference between the jumps. The multiple of $2$ in (165) is due to conditioning upon the event of probability $1/2$ that the jump up is smaller than the jump down.
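The relation $\lambda_1(x)=2\int_0^\infty{\sf s}(y)\,{\sf s}(x+y)\,dy$ is easy to evaluate numerically. The sketch below (hypothetical names; trapezoid rule on a truncated range) compares $\lambda_1$ with ${\sf s}$ in two cases: an exponential ${\sf s}$, where the two coincide, and the ${\sf s}$ induced by the non-exponential choice $f(x)=xe^{-x}$. For the latter, inverting $\widehat{\sf s}=\widehat f/(2-\widehat f)$, which follows from (162), gives ${\sf s}(y)=e^{-y}\sinh(y/\sqrt2)/\sqrt2$, so that ${\sf s}(0)=0$ while $\lambda_1(0)=2\int_0^\infty{\sf s}(y)^2\,dy=1/4$. Such a mismatch between $\lambda_1$ and ${\sf s}$ is what Lemma 20 relates to the dependence between the tree and its excess value.

```python
import math


def lambda_1(s, x, upper=40.0, n=40_000):
    """Trapezoid rule for lambda_1(x) = 2 * int_0^inf s(y) s(x + y) dy,
    truncated at y = upper (assumes s has a light enough tail)."""
    h = upper / n
    total = 0.5 * (s(0.0) * s(x) + s(upper) * s(x + upper))
    for k in range(1, n):
        y = k * h
        total += s(y) * s(x + y)
    return 2.0 * h * total


def s_exponential(y, mu=0.75):
    """s for an exponential f: again exponential (hypothetical rate mu)."""
    return mu * math.exp(-mu * y)


def s_gamma_f(y):
    """s for the hypothetical f(x) = x exp(-x):
    s(y) = exp(-y) * sinh(y / sqrt(2)) / sqrt(2)."""
    r = 1.0 / math.sqrt(2.0)
    return math.exp(-y) * math.sinh(r * y) * r


for x in (0.0, 0.5, 2.0):
    print(x, lambda_1(s_exponential, x), s_exponential(x))  # nearly equal
print(lambda_1(s_gamma_f, 0.0), s_gamma_f(0.0))  # about 0.25 versus 0.0
```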

Similarly, one can condition on the event that $X^{\sf ex}$ is an $M$ -shaped excursion, which is equivalent to the event that the level set tree has $2$ leaves and $3$ edges, i.e.,

$$\#T=3,\quad\text{i.e.,}\quad X^{(1)}_{1}>0\ \text{and}\ X^{(1)}_{2}<0.$$

Then, the density function of the excess value $\Lambda\big{(}\breve{X}^{\sf ex}\big{)}$ that we denote by $\lambda_{2}(x)$ satisfies

$$\lambda_{2}(x)=2\int_{0}^{\infty}\lambda_{1}(y)\,\lambda_{1}(x+y)\,dy,\qquad x>0.$$

This is so because conditioned on

$$\#T=3,$$

the extended excursion $\breve{X}^{\sf ex}$ consists of two $\Lambda$ -shaped (left and right) extended positive excursions originating from the only local minimum within the interior of the time domain $[0,\kappa]$ of $\breve{X}^{\sf ex}$ . The excess value $\Lambda\big{(}\breve{X}^{\sf ex}\big{)}$ is the difference between the two $\lambda_{1}$ -distributed excess values of the two $\Lambda$ -shaped extended positive excursions.
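For an exponential ${\sf s}$, both integral relations for $\lambda_1$ and $\lambda_2$ can be evaluated in closed form. The short computation below, with a generic rate $\mu>0$, shows that ${\sf s}$, $\lambda_1$ and $\lambda_2$ then coincide, which is the mechanism behind the implication $(c)\Rightarrow(a)$ in Lemma 20.

```latex
% Exponential case: s(x) = \mu e^{-\mu x} for x > 0 (generic rate \mu > 0)
\lambda_1(x) = 2\int_0^\infty \mu e^{-\mu y}\,\mu e^{-\mu (x+y)}\,dy
             = 2\mu^2 e^{-\mu x}\int_0^\infty e^{-2\mu y}\,dy
             = \mu e^{-\mu x} = {\sf s}(x),
\qquad
\lambda_2(x) = 2\int_0^\infty \lambda_1(y)\,\lambda_1(x+y)\,dy
             = \lambda_1(x).
```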

Lemma 20.

Consider a homogeneous random walk $X_{k}$ on $\mathbb{R}$ with a symmetric atomless transition kernel $p(x)$ , $x\in\mathbb{R}$ , i.e., there is a p.d.f. $f(x)$ with the support ${\sf supp}(f)\subseteq\mathbb{R}_{+}$ such that $p(x)={1\over 2}(f(x)+f(-x))$ . Consider an extended positive excursion $\breve{X}^{\sf ex}$ of $X_{k}$ , and the corresponding positive excursion $X^{\sf ex}$ . Let $T=\textsc{level}(X^{\sf ex})$ . Then, the following statements are equivalent:

(a)

$T$ is independent of the excess value $\Lambda\big(\breve{X}^{\sf ex}\big)$;

(b)

conditioned on $\textsc{p-shape}(T)$ , the edge lengths are identically distributed;

(c)

$f(x)$ is an exponential p.d.f.

If any of these statements holds, then the edge lengths are i.i.d. exponential random variables.

Proof.

$(c)\Rightarrow(a)$. It is easy to show via characteristic functions that ${\sf s}(x)$ is an exponential p.d.f. if and only if $f(x)$ is an exponential p.d.f. The memoryless property of exponential random variables implies that if ${\sf s}(x)$ is an exponential p.d.f., then $T=\textsc{level}(X^{\sf ex})$ is independent of the excess value $\Lambda\big(\breve{X}^{\sf ex}\big)$.

$(a)\Rightarrow(c)$. The excess value of a $\Lambda$-shaped extended positive excursion has the same distribution as the excess value of an $M$-shaped extended positive excursion if and only if $\lambda_{1}(x)\equiv\lambda_{2}(x)$. If this equality holds, then by equation (166) the p.d.f. $\lambda_{1}(x)$ satisfies equation (233) in Lemma 33, which implies that $\lambda_{1}(x)\equiv\lambda_{2}(x)$ is an exponential density function. Hence, from (165) and Lemma 34 we conclude that ${\sf s}(x)\equiv\lambda_{1}(x)$ is an exponential density function, which in turn implies that $f(x)$ is exponential.

$(b)\Rightarrow(c)$. The leaf length is distributed as the minimum of two independent ${\sf s}(x)$-distributed random variables. Thus the cumulative distribution function of the leaf length equals

$$F_{1}(x)=1-\Big(1-\int_{0}^{x}{\sf s}(y)\,dy\Big)^{2}.$$

The cumulative distribution function for the length of the non-leaf edge in a $Y$ -shaped branch equals

$$F_{2}(x)=1-\Big(1-\int_{0}^{x}\lambda_{1}(y)\,dy\Big)^{2}.$$

Here, $F_{1}(x)\equiv F_{2}(x)$ if and only if $\lambda_{1}(x)\equiv{\sf s}(x)$, which by Lemma 33 and equation (165) happens if and only if ${\sf s}(x)$ is exponential. This implies that $f(x)$ is an exponential p.d.f.

$(c)\Rightarrow(b)$ . Suppose $f(x)$ is the exponential density with parameter $\lambda$ , i.e., $f(x)=\phi_{\lambda}(x)$ . According to the construction in the proof of Thm. 16, together with statement $(a)$ , and because any edge in $T$ is a stem of a unique descendant subtree of $T$ , it suffices to prove that conditioned on $\textsc{p-shape}(T)$ , the tree stem (root edge) has exponential distribution with parameter $\lambda$ .

According to (162), ${\sf s}(x)$ has the exponential density with parameter $\lambda/2$ . Conditioned on ${\sf ord}(T)=1$ , the length of the stem (the only edge of the tree) equals the minimum of two independent exponentially distributed random variables with density ${\sf s}(x)$ , and hence has the exponential density with parameter $\lambda$ . Conditioned on ${\sf ord}(T)\geq 2$ , the length of the stem is the minimum of the excess values of two independent extended positive excursions. By the memoryless property of the exponential distribution, each of these excess values has the exponential distribution with parameter $\lambda/2$ . Hence, the stem length has the exponential distribution with parameter $\lambda$ . This shows that the edge lengths in $T$ have the same distribution.

Finally, suppose that one, and therefore all three, of the statements (a)-(c) hold. Then properties (b) and (c) ensure that the edge lengths are identically and exponentially distributed, while property (a) ensures the independence of the edge lengths. This completes the proof. ∎

7.6 Exponential random walks

Proposition 14 (and the subsequent comment) suggests that the problem of finding Horton self-similar trees with edge lengths is related to finding extreme-invariant processes

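$$\{X^{(1)}_{k}\}_{k\in\mathbb{Z}}\stackrel{d}{=}\{\zeta X_{k}\}_{k\in\mathbb{Z}}\quad\text{for some }\zeta>0,$$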

where $\{X_{k}\}_{k\in\mathbb{Z}}$ is a time series with an atomless distribution at every $k$, and $X^{(1)}_{j}$ is the corresponding time series of local minima. The equality (167) is understood as the equality in distribution of the two time series.

In this section we establish a sufficient condition for a symmetric homogeneous random walk to solve (167), and show that in this case $\zeta=2$. Moreover, we show that if a symmetric random walk $X_{k}$ satisfies (167), then the level set tree of its finite positive excursion, considered as an element of $\mathcal{L}_{\rm plane}$, is self-similar according to Def. 11. Symmetric random walks with exponential increments are an example of a process that solves (167).

The following result describes the solution of the problem (167) in terms of the characteristic function of $f(x)$ .

Proposition 15 (Extreme-invariance of a symmetric homogeneous random walk, [150]).

Consider a symmetric homogeneous random walk $\{X_{k}\}_{k\in\mathbb{Z}}$ with transition kernel $p(x)=\frac{f(x)+f(-x)}{2}$, where $f(x)$ is a p.d.f. with support ${\sf supp}(f)\subseteq\mathbb{R}_{+}$ and a finite second moment. Then the local minima $\{X^{(1)}_{j}\}_{j\in\mathbb{Z}}$ of $\{X_{k}\}_{k\in\mathbb{Z}}$ form a symmetric homogeneous random walk with transition kernel

[TABLE]

if and only if $\zeta=2$ and

[TABLE]

where $\widehat{f}(s)$ is the characteristic function of $f(x)$ and $\Re[z]$ denotes the real part of $z\in\mathbb{C}$ .

Proof.

Each increment between the consecutive local minima of $X_{k}$ can be represented as $d_{j}$ of (161) where $\{Y_{i}\}$ and $\{Z_{i}\}$ are i.i.d. with density $f(x)$ , and $\xi_{+}$ and $\xi_{-}$ are independent geometric random variables with parameter $1/2$ , i.e., ${\sf Geom}_{1}(1/2)$ .

The law of total variance readily implies that $\zeta=2$ . Indeed,

[TABLE]

where $\mu$ and $\mu^{2}+\sigma^{2}$ are the first and the second moments of $f(x)$ respectively. Thus, on one hand, the variance of the increments of $X_{k}$ is

[TABLE]

since for a symmetric homogeneous random walk, ${\sf E}[X_{k+1}-X_{k}]=0$. On the other hand, (161) and the law of total variance computation above imply that the variance of the increments in the sequence of local minima $X^{(1)}_{j}$ is

[TABLE]

Hence, ${\sf Var}(X^{(1)}_{j+1}-X^{(1)}_{j})=4\,{\sf Var}(X_{k+1}-X_{k})$ , and therefore $\zeta=2$ is the only value of $\zeta$ for which the scaling (168) may hold.

Taking the characteristic functions in (168), we obtain

[TABLE]

while taking the characteristic function of $d_{j}$ in (161) we have

[TABLE]

Thus, (168) is satisfied if and only if

[TABLE]

Substituting $\zeta=2$ into (171) completes the proof. ∎
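The variance bookkeeping in the proof can be checked mechanically. The sketch below is a minimal illustration, assuming our reading of (161), namely $d_j=\sum_{i=1}^{\xi_+}Y_i-\sum_{i=1}^{\xi_-}Z_i$ with independent ${\sf Geom}_1(1/2)$ counts $\xi_\pm$; the helper names are hypothetical. It evaluates the law of total variance exactly with rational arithmetic, then compares against a simulated walk whose $f$ is uniform on $[0,1]$ (an arbitrary choice).

```python
import random
import statistics
from fractions import Fraction


def var_random_sum(mean_n, var_n, mu, sigma2):
    """Var(Y_1 + ... + Y_N) for i.i.d. Y_i independent of N (law of total variance)."""
    return mean_n * sigma2 + var_n * mu ** 2


def exact_ratio(mu, sigma2):
    """Var(d_j) / Var(X_{k+1} - X_k) for a symmetric walk with kernel (f(x) + f(-x)) / 2."""
    p = Fraction(1, 2)
    mean_xi = 1 / p               # E[xi] = 2 for Geom_1(1/2) on {1, 2, ...}
    var_xi = (1 - p) / p ** 2     # Var(xi) = 2
    # The two halves of d_j are independent, so their variances add.
    var_d = 2 * var_random_sum(mean_xi, var_xi, mu, sigma2)
    var_step = mu ** 2 + sigma2   # second moment of f; the step itself has mean zero
    return var_d / var_step


def simulated_ratio(n_steps=300_000, seed=1):
    """Monte Carlo version: steps are +U or -U with U uniform on [0, 1]."""
    rng = random.Random(seed)
    steps = [rng.random() * (1 if rng.random() < 0.5 else -1) for _ in range(n_steps)]
    x = [0.0]
    for s in steps:
        x.append(x[-1] + s)
    minima = [x[k] for k in range(1, n_steps) if x[k - 1] > x[k] < x[k + 1]]
    d = [b - a for a, b in zip(minima, minima[1:])]
    var_step = statistics.fmean(s * s for s in steps)
    return statistics.pvariance(d) / var_step


print(exact_ratio(Fraction(1), Fraction(1)))         # 4
print(exact_ratio(Fraction(1, 2), Fraction(1, 12)))  # 4 (uniform f on [0, 1])
print(round(simulated_ratio(), 2))                   # should be close to 4
```

The exact ratio does not depend on $\mu$ or $\sigma^2$, which is why $\zeta^2=4$ is forced for every admissible $f$.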

Example 13.

Exponential density $f(x)=\phi_{\lambda}(x)$ of (69) solves (169) with any $\lambda>0$ ; see Thm. 17 below for more detail.
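Example 13 can also be checked numerically. Following the proof of Prop. 15, the characteristic function of $d_j$ is $|\widehat{f}(s)/(2-\widehat{f}(s))|^2$ (a ${\sf Geom}_1(1/2)$ sum of the $Y_i$ minus an independent one of the $Z_i$), while a rescaled kernel $\zeta^{-1}p(x/\zeta)$ has characteristic function $\Re[\widehat{f}(\zeta s)]$. The sketch below assumes that (169) is the resulting identity with $\zeta=2$, namely $\Re[\widehat{f}(2s)]=|\widehat{f}(s)|^2/|2-\widehat{f}(s)|^2$, and tests it on a grid of $s$ values; the function names are hypothetical, and the uniform density on $[0,1]$ serves as a non-example.

```python
import cmath
import math


def exponential_cf(lam):
    """Characteristic function of the exponential density with parameter lam."""
    return lambda s: lam / (lam - 1j * s)


def uniform_cf(s):
    """Characteristic function of the uniform density on [0, 1] (valid for s != 0)."""
    return (cmath.exp(1j * s) - 1) / (1j * s)


def minima_increment_cf(cf, s):
    """CF of d_j: a Geom_1(1/2) sum of Y_i minus an independent Geom_1(1/2) sum of Z_i."""
    g = cf(s) / (2 - cf(s))
    return abs(g) ** 2


def rescaled_kernel_cf(cf, s, zeta=2.0):
    """CF of zeta^{-1} p(x / zeta), where p(x) = (f(x) + f(-x)) / 2."""
    return cf(zeta * s).real


def satisfies_invariance(cf, grid):
    return all(
        math.isclose(minima_increment_cf(cf, s), rescaled_kernel_cf(cf, s),
                     rel_tol=1e-9, abs_tol=1e-12)
        for s in grid
    )


grid = [0.1 * k for k in range(1, 60)]
print(satisfies_invariance(exponential_cf(1.0), grid))   # True
print(satisfies_invariance(exponential_cf(3.5), grid))   # True
print(satisfies_invariance(uniform_cf, grid))            # False
```

For $f=\phi_\lambda$ both sides equal $\lambda^2/(\lambda^2+4s^2)$, so the identity holds for every $\lambda>0$.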

Consider a time series $\{X_{k}\}_{k\in\mathbb{Z}}$ with an atomless distribution of values at each $k$. Let $\{X_{t}\}_{t\in\mathbb{R}}$ be the continuous function obtained by linear interpolation of the values $X_{k}$. We define a positive excursion of $X_{k}$ as a fragment of the time series on an interval $[l,r]$, $l,r\in\mathbb{Z}$, such that $X_{l}\geq X_{r}$ and $X_{k}>X_{l}$ for all $l<k<r$ (see Fig. 28). To each positive excursion of $X_{k}$ on $[l,r]$ there corresponds a positive excursion of $X_{t}$ on $[l,\tilde{r}]$, where $\tilde{r}\in(r-1,r]$ is such that $X_{\tilde{r}}=X_{l}$. The level set tree of a positive excursion of $X_{k}$ is defined as that of the corresponding positive excursion of $X_{t}$.

Propositions 15 and 14 imply the following statement.

Corollary 11.

Consider a symmetric homogeneous random walk $\{X_{k}\}_{k\in\mathbb{Z}}$ with a transition kernel $p(x)=\frac{f(x)+f(-x)}{2}$, where $f(x)$ is a p.d.f. with support ${\sf supp}(f)\subseteq\mathbb{R}_{+}$ and a finite second moment. Let

[TABLE]

be the level set tree of a positive excursion $\{X_{t}\}_{t\in[l,r]}$ generated by the random walk $X_{k}$ as defined in Sect. 7.2. Then the tree $T$ has a Horton self-similar distribution (Def. 11) over $\mathcal{BL}_{\rm plane}^{|}$ if and only if condition (169) holds for the characteristic function $\widehat{f}(s)$ of $f(x)$.

Proof.

The coordination of shapes and lengths follows from the random walk construction, and Props. 15 and 14 establish the Horton prune-invariance. ∎

A homogeneous random walk on $\mathbb{R}$ is called an exponential random walk if its transition kernel is a mixture of exponential jumps:

[TABLE]

where $\phi_{\lambda}$ is the exponential density with parameter $\lambda>0$ as defined in Eq. (69). We refer to an exponential random walk by its parameter triplet $\{\rho,\lambda_{u},\lambda_{d}\}$. Each interpolated exponential random walk with parameters $\{\rho,\lambda_{u},\lambda_{d}\}$ is a piecewise linear function whose positive (up) and negative (down) increments are independent exponential random variables with respective parameters $\lambda_{u}$ and $\lambda_{d}$, and the probabilities of a positive or negative increment at every integer instant are $\rho$ and $(1-\rho)$, respectively. After a time change that makes all segments have slopes $\pm 1$, each interpolated exponential random walk with parameters $\{\rho,\lambda_{u},\lambda_{d}\}$ corresponds to a piecewise linear function with alternating rises and falls that have independent exponential lengths with parameters $(1-\rho)\lambda_{u}$ and $\rho\lambda_{d}$, respectively. An exponential random walk is symmetric if and only if $\rho=1/2$ and $\lambda_{u}=\lambda_{d}$.

Theorem 17 (Self-similarity of exponential random walks, [150]).

Let $\{X_{k}\}_{k\in\mathbb{Z}}$ be an exponential random walk with parameters $\{\rho,\lambda_{u},\lambda_{d}\}$ . Then

(a)

The sequence $\{X^{(1)}_{j}\}_{j\in\mathbb{Z}}$ of the local minima of $X_{k}$ is an exponential random walk with parameters $\{\rho^{*},\lambda_{u}^{*},\lambda_{d}^{*}\}$ such that

[TABLE]

(b)

The exponential walk $X_{k}$ satisfies the self-similarity condition (167) if and only if it is symmetric ($\rho=1/2$ and $\lambda_{u}=\lambda_{d}$), i.e., when $p(x)$ is a mean-zero Laplace p.d.f.

(c)

The self-similarity (167) is achieved after the first Horton pruning, for the chain $\{X^{(1)}_{j}\}_{j\in\mathbb{Z}}$ of the local minima, if and only if the walk’s increments have zero mean, $\rho\,\lambda_{d}=(1-\rho)\,\lambda_{u}$ .

Proof.

(a) By Lemma 19(i), the sequence of local minima $X^{(1)}_{j}$ of $X_{k}$ is a homogeneous random walk with transition kernel $p^{(1)}(x)$ . The latter is the probability distribution of the jumps $d_{j}$ given by (161) with

[TABLE]

The characteristic function $\widehat{p}^{(1)}(s)$ of the transition kernel $p^{(1)}(x)$ is found here as follows

[TABLE]

where

[TABLE]

is the characteristic function of an exponential random variable with parameter $\lambda$ , and $\rho^{*},\lambda^{*}_{u},\lambda^{*}_{d}$ are given by (172). Thus,

[TABLE]

This means that the sequence of local minima $\{X^{(1)}_{j}\}$ also evolves according to a two-sided exponential transition kernel, only with different parameters, $\rho^{*}$ , $\lambda^{*}_{d}$ , and $\lambda^{*}_{u}$ .

Part (b) of the theorem follows immediately from part (a). Alternatively, we observe that the exponential density $f(x)=\phi_{\lambda}(x)$ solves (169) with any $\lambda>0$ : by (173) we have

[TABLE]

and

[TABLE]

Hence, Part (b) follows from Prop. 15.

(c) Observe that $\rho^{*}=1/2$ and $\lambda^{*}_{d}=\lambda^{*}_{u}$ if and only if $\rho\,\lambda_{d}=(1-\rho)\,\lambda_{u}$ .

∎
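Part (a) can be illustrated with a short simulation. The sketch below derives the minima parameters from the rise/fall description given before Thm. 17: a rise is exponential with parameter $a=(1-\rho)\lambda_u$, a fall is exponential with parameter $b=\rho\lambda_d$, and the difference $U-D$ of two independent exponentials is positive with probability $b/(a+b)$, with exponential tails of parameters $a$ and $b$ on the two sides. This gives $\rho^*=\rho\lambda_d/((1-\rho)\lambda_u+\rho\lambda_d)$, $\lambda_u^*=(1-\rho)\lambda_u$, $\lambda_d^*=\rho\lambda_d$; this is our own derivation, to be compared with (172). The helper names and the parameter values ($\rho=0.3$, $\lambda_u=1$, $\lambda_d=2$) are hypothetical choices for illustration.

```python
import random
from fractions import Fraction


def minima_walk_params(rho, lam_u, lam_d):
    """(rho*, lam_u*, lam_d*) of the walk of local minima (derived as in the lead-in)."""
    a = (1 - rho) * lam_u   # rate of a rise from a local min to the next local max
    b = rho * lam_d         # rate of the following fall
    return b / (a + b), a, b


def minima_increments(rho, lam_u, lam_d, n_steps, seed=0):
    """Simulate the exponential walk and return the increments between local minima."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n_steps):
        if rng.random() < rho:
            x.append(x[-1] + rng.expovariate(lam_u))
        else:
            x.append(x[-1] - rng.expovariate(lam_d))
    minima = [x[k] for k in range(1, n_steps) if x[k - 1] > x[k] < x[k + 1]]
    return [m2 - m1 for m1, m2 in zip(minima, minima[1:])]


# Part (b): a symmetric walk maps to a symmetric walk with halved rates.
print(minima_walk_params(Fraction(1, 2), Fraction(2), Fraction(2)))
# Part (c): a zero-mean walk (rho * lam_d == (1 - rho) * lam_u) maps to a symmetric one.
print(minima_walk_params(Fraction(1, 3), Fraction(1), Fraction(2)))

rho, lam_u, lam_d = 0.3, 1.0, 2.0
rho_s, lam_u_s, lam_d_s = minima_walk_params(rho, lam_u, lam_d)
inc = minima_increments(rho, lam_u, lam_d, 200_000)
up = [d for d in inc if d > 0]
down = [-d for d in inc if d < 0]
print(len(up) / len(inc), rho_s)            # empirical vs predicted rho*
print(sum(up) / len(up), 1 / lam_u_s)       # mean upward jump of the minima walk
print(sum(down) / len(down), 1 / lam_d_s)   # mean downward jump of the minima walk
```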

We now extend Def. 22 to non-critical Galton-Watson trees.

Definition 31 (Exponential binary Galton-Watson tree, [116]).

We say that a random planted embedded binary tree $T\in\mathcal{BL}_{\rm plane}^{|}$ is an exponential binary Galton-Watson tree and write $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\lambda^{\prime},\lambda)$ , for $0\leq\lambda^{\prime}<\lambda$ , if

(i)

$\textsc{shape}(T)$ is a binary Galton-Watson tree $\mathcal{GW}(q_{0},q_{2})$ with

[TABLE]

(ii)

the orientation for every pair of siblings in $T$ is random and symmetric; and

(iii)

conditioned on a given $\textsc{shape}(T)$, the edge lengths of $T$ are sampled as independent exponential random variables with parameter $\lambda$, i.e., with density (69).

In particular, we observe that ${\sf GW}(\lambda)={\sf GW}(0,\lambda)$. A connection between exponential random walks and exponential Galton-Watson trees is provided by the following well-known result.

Theorem 18 ([116, Lemma 7.3], [89, 106]).

Consider a random excursion $Y_{t}$ in $\mathcal{E}^{\rm ex}$. The level set tree $\textsc{level}(Y_{t})$ is an exponential binary Galton-Watson tree ${\sf GW}(\lambda^{\prime},\lambda)$ if and only if the alternating rises and falls of $Y_{t}$, excluding the last fall, are distributed as independent exponential random variables with parameters ${\lambda+\lambda^{\prime}\over 2}$ and ${\lambda-\lambda^{\prime}\over 2}$, respectively, for some $0\leq\lambda^{\prime}<\lambda$.

Equivalently, for a random excursion $Y_{t}$ of a homogeneous random walk in $\mathcal{E}^{\rm ex}$, the level set tree $\textsc{level}(Y_{t})$ is an exponential binary Galton-Watson tree ${\sf GW}(\lambda^{\prime},\lambda)$ if and only if $Y_{t}$, as an element of $\mathcal{E}^{\rm ex}$, corresponds to an excursion of an exponential walk with parameters $\{\rho,\lambda_{u},\lambda_{d}\}$ satisfying $(1-\rho)\lambda_{u}={\lambda+\lambda^{\prime}\over 2}$ and $\rho\lambda_{d}={\lambda-\lambda^{\prime}\over 2}$.

We emphasize the following direct consequence of Thms. 17(a) and 18.

Corollary 12.

Suppose $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\gamma)$ is an exponential critical binary Galton-Watson tree. Then, the following statements hold:

(a)

The pruned exponential critical binary Galton-Watson tree is an exponential critical binary Galton-Watson tree:

[TABLE]

(b)

The lengths of branches of Horton-Strahler order $j\geq 1$ in $T$ (see Def. 5) have the exponential distribution with parameter $2^{1-j}\,\gamma$. The lengths of branches (of all orders) are independent.

Remark 14 (A link between Thm. 17 and Thm. 6).

Consider an excursion of an exponential random walk $X_{t}$ with parameters $\{\rho,\lambda_{u},\lambda_{d}\}$ . The geometric stability of the exponential distribution implies that the monotone rises and falls of $X_{t}$ are exponentially distributed with parameters $(1-\rho)\,\lambda_{u}$ and $\rho\,\lambda_{d}$ , respectively. Thus, Thm. 18 implies that $\textsc{shape}\left(\textsc{level}(X_{t})\right)$ is distributed as a binary Galton-Watson tree $\mathcal{GW}(q_{0},q_{2})$ with

[TABLE]

The first pruning $X_{t}^{(1)}$ (see Sect. 7.4), according to (172), is an exponential random walk with parameters

[TABLE]

Its upward and downward increments are exponentially distributed with parameters, respectively,

[TABLE]

Accordingly, the level set tree of a positive excursion of $X_{t}^{(1)}$ is a binary Galton-Watson tree $\mathcal{GW}(q_{0}^{(1)},q_{2}^{(1)})$ with

[TABLE]

Continuing this way, we find that the $n$-th pruning $X_{t}^{(n)}$ of $X_{t}$ is an exponential random walk such that the level set tree of its positive excursion has the binary Galton-Watson distribution $\mathcal{GW}(q_{0}^{(n)},q_{2}^{(n)})$ with

[TABLE]

The first equality in (174) defines the same iterative system as (61) in Thm. 6 of Burd et al. that describes iterative Horton pruning of Galton-Watson trees. Another noteworthy relation connecting the exponential random walk $X_{t}^{(n)}$ with parameters $\{\rho^{(n)},\lambda^{(n)}_{u},\lambda^{(n)}_{d}\}$ and the Galton-Watson tree $\mathcal{GW}(q_{0}^{(n-1)},q_{2}^{(n-1)})$ is given by

[TABLE]
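The iteration in Remark 14 is easy to reproduce numerically. Write $a=(1-\rho)\lambda_u$ and $b=\rho\lambda_d$ for the rates of the rises and falls described before Thm. 17. Treating an increment of the minima walk as the difference of a rise and a fall, as in the proof of Thm. 17(a), one gets $\rho^*=b/(a+b)$, $\lambda_u^*=a$, $\lambda_d^*=b$ (our derivation, to be compared with (172)), so the new rates are $a'=a^2/(a+b)$ and $b'=b^2/(a+b)$. Taking $q_0=a/(a+b)$ and $q_2=b/(a+b)$, which is our reading of Thm. 18, this yields $q_0'=q_0^2/(q_0^2+q_2^2)$; in this parametrization $\rho^*=q_2$. The helper names below are hypothetical, and $a\geq b$ corresponds to $\lambda'\geq 0$.

```python
from fractions import Fraction


def prune_rates(a, b):
    """Rise/fall rates (a, b) after one Horton pruning of the exponential walk."""
    return a * a / (a + b), b * b / (a + b)


def gw_params(a, b):
    """(q0, q2) of the Galton-Watson shape of a positive excursion (requires a >= b)."""
    q0 = a / (a + b)
    return q0, 1 - q0


def pruning_orbit(a, b, n):
    """(q0, q2) for the walk and for each of its first n prunings."""
    orbit = []
    for _ in range(n + 1):
        orbit.append(gw_params(a, b))
        a, b = prune_rates(a, b)
    return orbit


# Critical case a == b: q0 stays at 1/2 while both rates halve at every pruning.
print(pruning_orbit(Fraction(1, 2), Fraction(1, 2), 3))
# Subcritical case a > b: q0 increases towards 1 under q0' = q0^2 / (q0^2 + q2^2).
print([float(q0) for q0, _ in pruning_orbit(Fraction(3), Fraction(2), 4)])
```

The critical value $q_0=1/2$ is the only fixed point with $a\geq b$ other than the degenerate limit $q_0=1$.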

7.7 Geometric random walks and critical non-binary Galton-Watson trees

A recent study by Barbosa et al. [16] examines the self-similar properties of the level-set trees corresponding to the excursions of the so-called geometric random walk on $\mathbb{Z}$ , defined below (Def. 32). The results in [16] give a discrete-space version of the results discussed in Sect. 7.6.

For the given probabilities $\{p_{1},p_{2},r_{1},r_{2}\}$ such that $p_{1}+p_{2}\leq 1$ , consider a discrete-time random walk on $\mathbb{Z}$ , where at each time step, $p_{1}$ is the probability of an upward jump, $p_{2}$ is the probability of a downward jump, and $1-p_{1}-p_{2}$ is the probability of remaining at the same location. Conditioned on jumping upward, the increment size is a ${\sf Geom}_{1}(r_{1})$ -distributed random variable, while conditioned on jumping downward, the increment size is a ${\sf Geom}_{1}(r_{2})$ -distributed random variable. Here is a formal definition.

Definition 32 (Geometric random walk).

A geometric random walk $X_{t}$ with probability parameters

[TABLE]

is a discrete time space-homogeneous random walk on $\mathbb{Z}$ with transition probabilities $p(x,y)=p(y-x)$ such that its jump kernel $p(x)$ is a double-sided geometric probability mass function (discrete Laplace distribution) that can be expressed as

[TABLE]

where $\delta_{0}(x)$ denotes the Kronecker delta function at $0$ (equal to $1$ if $x=0$ and to $0$ otherwise), and $g_{i}(x)$ ($i=1,2$) is the probability mass function of a ${\sf Geom}_{1}(r_{i})$-distributed random variable. The distribution of a geometric random walk is denoted by ${\rm GRW}(p_{1},p_{2},r_{1},r_{2})$.

Example 14.

The most celebrated example of a geometric random walk is the simple random walk on $\mathbb{Z}$ with distribution ${\rm GRW}\big{(}{1\over 2},{1\over 2},1,1\big{)}$ .

By (175), the characteristic function for the increments in a geometric walk is given by

[TABLE]
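For concreteness, assuming ${\sf Geom}_1(r)$ has mass function $r(1-r)^{k-1}$ on $k\geq 1$, summing the geometric series in (175) gives the closed form $\varphi(s)=1-p_1-p_2+p_1\frac{r_1e^{is}}{1-(1-r_1)e^{is}}+p_2\frac{r_2e^{-is}}{1-(1-r_2)e^{-is}}$, which can be compared with (176). The sketch below (hypothetical function names) checks this closed form against a truncated series and recovers $\cos s$ for the simple random walk of Example 14.

```python
import cmath
import math


def grw_cf(s, p1, p2, r1, r2):
    """Closed-form characteristic function of a GRW(p1, p2, r1, r2) increment."""
    e = cmath.exp(1j * s)
    up = r1 * e / (1 - (1 - r1) * e)        # upward Geom_1(r1) jump
    down = r2 / e / (1 - (1 - r2) / e)      # downward Geom_1(r2) jump
    return (1 - p1 - p2) + p1 * up + p2 * down


def grw_cf_series(s, p1, p2, r1, r2, k_max=5000):
    """Truncated sum of exp(i s x) p(x) over the jump kernel."""
    total = complex(1 - p1 - p2)
    for k in range(1, k_max + 1):
        total += p1 * r1 * (1 - r1) ** (k - 1) * cmath.exp(1j * s * k)
        total += p2 * r2 * (1 - r2) ** (k - 1) * cmath.exp(-1j * s * k)
    return total


params = (0.3, 0.5, 0.4, 0.25)
for s in (0.3, 1.0, 2.5):
    assert cmath.isclose(grw_cf(s, *params), grw_cf_series(s, *params), abs_tol=1e-9)

# Example 14: the simple random walk GRW(1/2, 1/2, 1, 1) has characteristic function cos(s).
print(grw_cf(0.7, 0.5, 0.5, 1, 1), math.cos(0.7))
```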

Equation (176) leads to the derivation of the following invariance result, analogous to Thm. 17(a) in a discrete space setting.

Theorem 19 ([16]).

Suppose $X_{t}$ is a geometric random walk ${\rm GRW}(p_{1},p_{2},r_{1},r_{2})$. Then the time series $X^{(1)}_{t}$ of its consecutive local minima (including flat plateaus) is also a geometric random walk ${\rm GRW}\big{(}p^{(1)}_{1},p^{(1)}_{2},r^{(1)}_{1},r^{(1)}_{2}\big{)}$ with probability parameters

[TABLE]

If $r_{1}=r_{2}=r$ and $p_{1}=p_{2}=p$, the geometric random walk ${\rm SGRW}(p,r)\equiv{\rm GRW}(p,p,r,r)$ is called a symmetric geometric random walk (SGRW). In this case, Thm. 19 can be restated as the following discrete-space analogue of Thm. 17(b).

Corollary 13 ([16]).

Suppose $X_{t}\stackrel{{\scriptstyle d}}{{\sim}}{\rm SGRW}(p,r)$ is a symmetric geometric random walk on $\mathbb{Z}$ . Then, the time series $X^{(1)}_{t}$ of its consecutive local minima is also a symmetric geometric random walk ${\rm SGRW}\big{(}p^{(1)},r^{(1)}\big{)}$ with probability parameters

[TABLE]

Next, consider the case of a geometric random walk $X_{t}$ with mean zero increments,

[TABLE]

In this case $p_{1}r_{2}=p_{2}r_{1}$ , and Thm. 19 and Cor. 13 imply the following result.

Corollary 14 ([16]).

Suppose $X_{t}\stackrel{{\scriptstyle d}}{{\sim}}{\rm GRW}(p_{1},p_{2},r_{1},r_{2})$ is a mean zero geometric random walk, i.e. $p_{1}r_{2}=p_{2}r_{1}$ . Then, the time series $X^{(1)}_{t}$ of its consecutive local minima is a symmetric geometric random walk ${\rm SGRW}\big{(}p^{(1)},r^{(1)}\big{)}$ with probability parameters

[TABLE]

where $r=\frac{2p_{1}r_{2}}{p_{1}+p_{2}}=\frac{2p_{2}r_{1}}{p_{1}+p_{2}}$ .

Furthermore, let $X^{(k+1)}_{t}$ for $k=1,2,\ldots$ be the time series of the consecutive local minima of $X^{(k)}_{t}$ . Then, $X^{(k)}_{t}$ is also a symmetric geometric random walk ${\rm SGRW}\big{(}p^{(k)},r^{(k)}\big{)}$ with probability parameters

[TABLE]
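The mechanism behind Thm. 19 and Cors. 13 and 14 can be explored by simulation. The sketch below is our own heuristic derivation, not a transcription of the displayed formulas: ignoring steps of size zero, the number of upward jumps in a run is ${\sf Geom}_1(p_2/(p_1+p_2))$, and a ${\sf Geom}_1(q)$ sum of i.i.d. ${\sf Geom}_1(r)$ variables is ${\sf Geom}_1(qr)$, so a rise is ${\sf Geom}_1(a)$ with $a=r_1p_2/(p_1+p_2)$ and a fall is ${\sf Geom}_1(b)$ with $b=r_2p_1/(p_1+p_2)$. Assuming each flat plateau at a local minimum is recorded once, the difference of these two independent variables gives the parameters returned by the hypothetical helper `grw_minima_params`; in the symmetric case this yields $r^{(1)}=r/2$ and $p^{(1)}=(2-r)/(4-r)$. Compare with (177) before relying on it.

```python
import random


def grw_minima_params(p1, p2, r1, r2):
    """Hypothetical helper: (p1', p2', r1', r2') of the walk of local minima.

    Our derivation, assuming each flat plateau at a local minimum is recorded once.
    """
    a = r1 * p2 / (p1 + p2)      # a rise (local min to local max) is Geom_1(a)
    b = r2 * p1 / (p1 + p2)      # the following fall is Geom_1(b)
    denom = a + b - a * b        # equals 1 - (1 - a)(1 - b)
    return b * (1 - a) / denom, a * (1 - b) / denom, a, b


def geom1(rng, r):
    """Sample Geom_1(r): P(k) = r (1 - r)^(k - 1) for k = 1, 2, ..."""
    k = 1
    while rng.random() >= r:
        k += 1
    return k


def minima_increments(p1, p2, r1, r2, n_steps, seed=0):
    """Simulate GRW(p1, p2, r1, r2) and return the increments between local minima."""
    rng = random.Random(seed)
    x = [0]
    for _ in range(n_steps):
        u = rng.random()
        if u < p1:
            x.append(x[-1] + geom1(rng, r1))
        elif u < p1 + p2:
            x.append(x[-1] - geom1(rng, r2))
        else:
            x.append(x[-1])                  # the walk stays put
    # Collapse flat plateaus, then keep strict local minima.
    y = [x[0]] + [cur for prev, cur in zip(x, x[1:]) if cur != prev]
    minima = [y[k] for k in range(1, len(y) - 1) if y[k - 1] > y[k] < y[k + 1]]
    return [m2 - m1 for m1, m2 in zip(minima, minima[1:])]


def sgrw_orbit(p, r, n):
    """Symmetric case of the same derivation: SGRW(p, r) -> SGRW((2 - r)/(4 - r), r/2)."""
    orbit = [(p, r)]
    for _ in range(n):
        p, r = (2 - r) / (4 - r), r / 2
        orbit.append((p, r))
    return orbit


params = (0.3, 0.4, 0.5, 0.6)
p1_new, p2_new, r1_new, r2_new = grw_minima_params(*params)
inc = minima_increments(*params, n_steps=300_000)
ups = [d for d in inc if d > 0]
print(len(ups) / len(inc), p1_new)                  # P(up) of the minima walk
print(sum(d < 0 for d in inc) / len(inc), p2_new)   # P(down) of the minima walk
print(sum(ups) / len(ups), 1 / r1_new)              # mean upward jump
print(sgrw_orbit(0.5, 1.0, 4))                      # here r^(k) = 2^(-k) r -> 0
```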

For the remainder of this section, let $\{p^{(k)},r^{(k)}\}$ denote the parameters of the symmetric geometric random walk ${\rm SGRW}\big{(}p^{(k)},r^{(k)}\big{)}$ , obtained by taking $k$ iterations of local minima of $X_{t}\stackrel{{\scriptstyle d}}{{\sim}}{\rm GRW}(p_{1},p_{2},r_{1},r_{2})$ , as in Cor. 14.

Corollary 15 ([16]).

Suppose $X_{t}\stackrel{{\scriptstyle d}}{{\sim}}{\rm GRW}(p_{1},p_{2},r_{1},r_{2})$ is a mean zero geometric random walk, i.e. $p_{1}r_{2}=p_{2}r_{1}$ . Then,

[TABLE]

The following is a discrete analogue of Thm. 18, stated in Sect. 7.6.

Theorem 20 ([16]).

Suppose $X_{t}\stackrel{{\scriptstyle d}}{{\sim}}{\rm GRW}(p_{1},p_{2},r_{1},r_{2})$ is a geometric random walk with a nonpositive drift, i.e., $p_{1}r_{2}\leq p_{2}r_{1}$ (equivalently, $p_{1}/r_{1}\leq p_{2}/r_{2}$). Let $T^{\sf ex}$ be the level set tree of a positive excursion of $X_{t}$. Then,

[TABLE]

with

[TABLE]

where $r_{1}^{(1)}$ , $r_{2}^{(1)}$ , $p_{1}^{(1)}$ , and $p_{2}^{(1)}$ are as in Theorem 19 (recall that $q_{1}\equiv 0$ since we work with reduced trees). Moreover, if $X_{t}$ is a mean zero geometric random walk (i.e., $p_{1}r_{2}=p_{2}r_{1}$ ), then

[TABLE]

where $r_{1}^{(1)}$ and $r_{2}^{(1)}$ are as in Corollary 14.

Observe that, in the setting of Thm. 20, if we consider a mean zero GRW ( $p_{1}r_{2}=p_{2}r_{1}$ and, equivalently, $r_{1}^{(1)}=r_{2}^{(1)}$ ) then,

[TABLE]

In other words, the level set tree of its positive excursion is distributed as a critical Galton-Watson tree $\mathcal{GW}\big{(}\{q_{k}\}\big{)}$ . Combining Prop. 14 with Thm. 20 we have the following corollary.

Corollary 16 ([16]).

Suppose $X_{t}\stackrel{{\scriptstyle d}}{{\sim}}{\rm GRW}(p_{1},p_{2},r_{1},r_{2})$ is a mean zero geometric random walk, i.e. $p_{1}r_{2}=p_{2}r_{1}$ . Let $T^{\sf ex}$ be the level set tree of a positive excursion of $X_{t}$ . Then, $\textsc{shape}(T^{\sf ex})\stackrel{{\scriptstyle d}}{{\sim}}\mathcal{GW}\big{(}\{q_{k}\}\big{)}$ , where $\mathcal{GW}\big{(}\{q_{k}\}\big{)}$ is a critical Galton-Watson distribution on $\mathcal{T}^{|}$ . Moreover, for any $n\geq 1$ , the level set tree of a positive excursion of $X_{t}^{(n)}$ is distributed as

[TABLE]

with

[TABLE]

where $r^{(n)}$ is given by equation (177) of Corollary 14.

Finally, letting $n\to\infty$ , we have

[TABLE]

The convergence in (179) follows from Cor. 15 as $r^{(n)}\to 0$ . Writing $\nu=\mathcal{GW}\big{(}\{q_{k}\}\big{)}$ , we have by Cor. 16 that the pushforward measure satisfies

[TABLE]

while equation (179) additionally asserts that

[TABLE]

where $\mu^{*}$ denotes the critical binary Galton-Watson measure on $\mathcal{T}^{|}$ defined in (60). Equation (180) provides a specific example of Thm. 5 (Thm. 1.3 in [29]) showing that recursive pruning of a critical Galton-Watson tree converges to a binary critical Galton-Watson tree.

7.8 White noise and Kingman’s coalescent

This section establishes a correspondence between the tree representations of a white noise (a sequence of i.i.d. random variables) and the celebrated Kingman's coalescent process [74]. We begin with an informal review of coalescent processes and their trees.

7.8.1 Coalescent processes, trees

Coalescent processes [116, 5, 26, 22, 51]. A general finite coalescent process begins with $N$ singletons. The cluster formation is governed by a symmetric collision rate kernel $K(i,j)=K(j,i)>0$ . Specifically, a pair of clusters with masses (weights) $i$ and $j$ coalesces at the rate $K(i,j)/N$ , independently of the other pairs, to form a new cluster of mass $i+j$ . The process continues until there is a single cluster of mass $N$ .

Formally, for a given $N\geq 1$ consider the space $\mathcal{P}_{[N]}$ of partitions of $[N]=\{1,2,\ldots,N\}$. Let $\Pi^{(N)}_{0}$ be the initial partition into singletons, and let $\Pi^{(N)}_{t}$ ($t\geq 0$) be a strong Markov process such that $\Pi^{(N)}_{t}$ transitions from partition $\pi\in\mathcal{P}_{[N]}$ to $\pi^{\prime}\in\mathcal{P}_{[N]}$ with rate $K(i,j)/N$ provided that partition $\pi^{\prime}$ is obtained from partition $\pi$ by merging two clusters of $\pi$ of weights $i$ and $j$. If $K(i,j)\equiv 1$ for all positive integer masses $i$ and $j$, the process $\Pi^{(N)}_{t}$ is known as the $N$-particle Kingman's coalescent process. If $K(i,j)=i+j$ the process is called the $N$-particle additive coalescent. Finally, if $K(i,j)=ij$ the process is called the $N$-particle multiplicative coalescent.

Coalescent tree. A merger history of the $N$-particle coalescent process can be naturally described by a time oriented binary tree constructed as follows. Start with $N$ leaves that represent the initial $N$ particles and have time mark $t=0$. When two clusters coalesce (a transition occurs), join the corresponding vertices at a new internal vertex whose time mark is the time of this merger. The final coalescence forms the tree root. The resulting time oriented binary tree represents the history of the process. Note that a given unlabeled tree corresponds to multiple coalescent trajectories obtained by relabeling the initial particles.

Let $T^{(N)}_{\rm K}$ denote the coalescent tree for the $N$ -particle Kingman’s coalescent process. Let $N_{j}$ denote the number of branches of Horton-Strahler order $j$ in the tree $T^{(N)}_{\rm K}$ . In Sect. 8 we will show that for each $j\geq 1$ , the asymptotic Horton ratios ${\mathcal{N}}_{j}$ are well-defined (Def. 20), that is

[TABLE]

Moreover, the Horton ratios ${\mathcal{N}}_{j}$ are finite and can be expressed as

[TABLE]

where the sequence $g_{j}(x)$ solves the following system of ordinary differential equations (ODEs):

[TABLE]

with $g_{1}(x)=2/(x+2)$ , $g_{j}(0)=0$ for $j\geq 2$ . Equivalently,

[TABLE]

where $h_{0}\equiv 0$ and the sequence $h_{j}(x)$ satisfies the ODE system

[TABLE]

with the initial conditions $h_{j}(0)=1$ for $j\geq 1$.

The root-Horton law (Def. 21) for the well-defined Horton ratios ${\mathcal{N}}_{j}$ of (181) for Kingman's coalescent process is established in Thm. 23, with the Horton exponent in the interval $2\leq R\leq 4$. Moreover, the ODE representations (182) and (183) yield the numerical estimate $R=3.0438279\ldots$ of the Horton exponent. Numerical computations (not shown here) indicate that the ratio-Horton and strong Horton laws of Def. 21 also hold for Kingman's coalescent.

7.8.2 White noise

In this section we will show that the combinatorial shape function for the level set tree $T_{\sf wn}$ of white noise is closely connected to the shape function of Kingman's coalescent tree $T_{\rm K}=T^{(N)}_{\rm K}$ introduced in Sect. 7.8.1. Specifically, the two are separated by a single Horton pruning $\mathcal{R}$. In other words, conditioning on the same number of leaves, $\textsc{shape}\big{(}\mathcal{R}(T_{\rm K})\big{)}\stackrel{{\scriptstyle d}}{{=}}\textsc{shape}\big{(}T_{\sf wn}\big{)}$.

Let $W^{(N)}_{j}$, $j=1,\dots,N\!-\!1$, be a discrete white noise, that is, a discrete-time process comprising $N\!-\!1$ i.i.d. random variables with a common atomless distribution. Next, we consider an auxiliary process $\tilde{W}^{(N)}_{i}$, $i=1,\dots,2N\!-\!1$, that has exactly $N$ local maxima and $N\!-\!1$ internal local minima $\tilde{W}^{(N)}_{2j}=W^{(N)}_{j}$, $j=1,\dots,N\!-\!1$. We call $\tilde{W}^{(N)}_{i}$ an extended white noise. It can be constructed as in the following example.

Example 15 (Extended white noise).

[TABLE]

where $i^{\prime}=\max\left(1,\frac{i-1}{2}\right)$ and $i^{\prime\prime}=\min\left(N-1,\frac{i+1}{2}\right)$ .

Let $T_{\sf wn}^{(N)}=\textsc{level}\left(\tilde{W}^{(N)}_{i}\right)$ be the level set tree of $\tilde{W}^{(N)}_{i}$ . By construction, $T_{\sf wn}^{(N)}$ has exactly $N$ leaves. Also observe that the level set trees $T_{\sf wn}^{(N)}$ and $\textsc{level}\left(W^{(N)}_{j}\right)$ are separated by a single Horton pruning:

[TABLE]

Lemma 21.

The distribution of $\textsc{shape}\left(T_{\sf wn}^{(N)}\right)$ on $\mathcal{BT}^{|}$ is the same for any atomless distribution $F$ of the values of the associated white noise $W^{(N)}_{j}$ .

Proof.

Atomlessness of $F$ ensures that, with probability one, all values are distinct and the level set tree is binary. By construction, the combinatorial level set tree is completely determined by the relative ordering of the local minima of the trajectory, regardless of the particular values of its local maxima and minima. The proof is completed by noticing that the ordering of the i.i.d. values $W^{(N)}_{j}$ is a uniformly random permutation for any choice of atomless distribution $F$. ∎

Let $T^{(N)}_{\rm K}$ be the tree that corresponds to Kingman's $N$-coalescent, and let $\textsc{shape}\left(T^{(N)}_{\rm K}\right)$ be its combinatorial version that drops the time marks of the vertices. Both trees $\textsc{shape}\left(T_{\sf wn}^{(N)}\right)$ and $\textsc{shape}\left(T^{(N)}_{\rm K}\right)$ belong to the space $\mathcal{BT}^{|}$ (more specifically, to the subset of $\mathcal{BT}^{|}$ consisting of trees with $N$ leaves).

Theorem 21.

The trees $\textsc{shape}\left(T_{\sf wn}^{(N)}\right)$ and $\textsc{shape}\left(T^{(N)}_{\rm K}\right)$ have the same distribution on $\mathcal{BT}^{|}$ .

Proof.

The proof uses a construction similar in some respects to the celebrated Kingman paintbox process [74, 116, 26, 22]. For Kingman's $N$-coalescent, let us enumerate the initial singletons from $1$ to $N$. We identify each cluster with a collection of singletons listed from left to right; the order of this list matters, as it encodes part of the merger history of the process. Specifically, consider a pair of clusters ${\bf i}$ and ${\bf j}$, identified with the corresponding collections of singletons as follows

[TABLE]

Next, we split the merger rate of ${1\over N}$ into two. We let the clusters ${\bf i}$ and ${\bf j}$ merge into the new cluster

[TABLE]

with rate ${1\over 2N}$ , or into the new cluster

[TABLE]

also with rate ${1\over 2N}$ . The final merger results in a cluster consisting of all $N$ singletons, listed as a permutation from $S_{N}$ ,

[TABLE]

Conditioning on the final permutation $\sigma$ , the merger history is described by the random connection times,

[TABLE]

where $t_{j}$ is the merger time when the singletons $\sigma_{j}$ and $\sigma_{j+1}$ meet in the same cluster. The following diagram helps visualize the connection times:

[TABLE]

Since all $(N\!-\!1)!$ orderings of the connection times $t_{1},\ldots,t_{N-1}$ are equiprobable, the combinatorial shape of the resulting coalescent tree is distributed as the combinatorial tree $\textsc{shape}\left(T_{\sf wn}^{(N)}\right)$ , where all $(N\!-\!1)!$ orderings of the analogous connection times $W^{(N)}_{1},W^{(N)}_{2},\ldots,W^{(N)}_{N-1}$ are also equiprobable. ∎
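Thm. 21 lends itself to a quick Monte Carlo check. The sketch below, with hypothetical helper names, samples (i) the shape of Kingman's $N$-coalescent by merging a uniformly chosen pair at each step, which is the jump chain when every pair coalesces at the same rate $1/N$, and (ii) $\textsc{shape}\left(T_{\sf wn}^{(N)}\right)$, built by splitting the $N$ maxima recursively at the deepest of the $N-1$ i.i.d. minima, as the level set tree construction prescribes. It then compares average Horton-Strahler branch counts $N_j$. A useful sanity check: each branch of order 2 contains exactly one vertex whose two children are leaves, and for white noise these vertices are the minima $W^{(N)}_j$ that exceed their neighbours, so ${\sf E}[N_2]=N/3$ exactly.

```python
import random
from collections import Counter


def horton_counts(children, root):
    """Horton-Strahler branch counts N_j of a binary tree.

    children maps each internal vertex to its (left, right) children;
    leaves are the vertices that do not appear as keys.
    """
    counts = Counter()

    def order(v):
        if v not in children:
            return 1
        o1, o2 = order(children[v][0]), order(children[v][1])
        o = o1 + 1 if o1 == o2 else max(o1, o2)
        for oc in (o1, o2):
            if oc < o:              # this child is the top vertex of a branch of order oc
                counts[oc] += 1
        return o

    counts[order(root)] += 1        # the branch that contains the root
    return counts


def kingman_shape(n, rng):
    """Shape of Kingman's n-coalescent tree: merge a uniformly chosen pair each time."""
    clusters, children, label = list(range(n)), {}, n
    while len(clusters) > 1:
        i, j = rng.sample(range(len(clusters)), 2)
        pair = (clusters[i], clusters[j])
        for k in sorted((i, j), reverse=True):
            clusters.pop(k)
        children[label] = pair
        clusters.append(label)
        label += 1
    return children, clusters[0]


def white_noise_shape(n, rng):
    """Level set tree shape of an extended white noise with n maxima and n - 1 minima."""
    w = [rng.random() for _ in range(n - 1)]     # w[m] sits between leaves m and m + 1
    children = {}

    def build(lo, hi):                           # subtree spanning leaves lo..hi
        if lo == hi:
            return lo
        m = min(range(lo, hi), key=w.__getitem__)    # the deepest minimum is the root
        children[n + m] = (build(lo, m), build(m + 1, hi))
        return n + m

    return children, build(0, n - 1)


def mean_counts(sampler, n, reps, seed):
    rng, total = random.Random(seed), Counter()
    for _ in range(reps):
        total.update(horton_counts(*sampler(n, rng)))
    return {j: c / reps for j, c in sorted(total.items())}


N = 99
print(mean_counts(kingman_shape, N, 300, seed=1))
print(mean_counts(white_noise_shape, N, 300, seed=2))   # N_2 averages near N/3 in both
```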

The following result is a consequence of Thm. 21 above and Thm. 23, stated and proved in Sect. 8, which establishes the root-Horton law (Def. 21) for Kingman's coalescent tree $\textsc{shape}\left(T^{(N)}_{\rm K}\right)$.

Corollary 17.

The combinatorial level set tree of a discrete white noise $W^{(N)}_{j}$ is root-Horton self-similar with the same Horton exponent $R$ as that of Kingman's $N$-coalescent.

Proof.

Together, Theorems 21 and 23 imply the root-Horton self-similarity for $\textsc{shape}\left(T_{\sf wn}^{(N)}\right)$ , with the same Horton exponent $R$ .

By definition, Horton pruning corresponds to an index shift in Horton statistics: $N_{j}\big{[}\mathcal{R}(T)\big{]}=N_{j+1}[T]$ ( $j\geq 1$ ). Thus, the root-Horton self-similarity for $\textsc{shape}\left(T_{\sf wn}^{(N)}\right)$ implies the root-Horton self-similarity for $\textsc{shape}\left(\textsc{level}\big{(}W^{(N)}_{j}\big{)}\right)$ . Finally, the Horton exponent is preserved under the extra Horton pruning as

[TABLE]

∎

7.9 Level set trees on higher dimensional manifolds and Morse theory

Consider an $n$ -dimensional differentiable manifold $M=M^{n}$ , and a differentiable function $f:M\rightarrow\mathbb{R}$ . A point $p$ is called a critical point of $f$ if $df(p)=0$ , in which case, $f(p)$ is said to be a critical value of $f$ . A point $x\in M$ is called a regular point of $f$ if it is not a critical point.

If $p$ is a critical point of $f$ , then

[TABLE]

is the Taylor expansion of $f$ around $p$ , where

[TABLE]

is a symmetric bilinear form over the tangent space $T_{p}M$ generated by the Hessian matrix ${\partial^{2}f\over\partial x_{i}\,\partial x_{j}}(p)$ , and $O(3)$ denotes the third and higher order terms.

Definition 33 (Nondegenerate points and Morse functions [109]).

Let $M$ and $f$ be as above. A critical point $p\in M$ of $f$ is said to be nondegenerate if the determinant of its Hessian matrix ${\partial^{2}f\over\partial x_{i}\,\partial x_{j}}(p)$ is not equal to zero. A differentiable function $f:M\rightarrow\mathbb{R}$ is said to be a Morse function if all of its critical points are nondegenerate.

Theorem 22 (Morse, [109]).

Consider an $n$ -dimensional differentiable manifold $M$ , and a differentiable function $f:M\rightarrow\mathbb{R}$ . If $p\in M$ is a nondegenerate critical point of $f$ , then there exists an open neighborhood $U$ of $p$ and local coordinates $(x_{1},\ldots,x_{n})$ on $U$ with

[TABLE]

such that in these coordinates $f(x)$ is a quadratic polynomial represented as

[TABLE]

If $B(u,v):V\times V\rightarrow\mathbb{R}$ is a nondegenerate (i.e., with non-zero determinant) symmetric bilinear form over an $n$ -dimensional vector space $V$ , then there exists a unique nonnegative integer $\lambda\leq n$ and at least one basis $\mathcal{B}$ of $V$ such that, in basis $\mathcal{B}$ ,

[TABLE]

This implies the following corollary to the Morse Theorem (Thm. 22), known as the Morse Lemma.

Corollary 18 (Morse Lemma [109]).

Consider an $n$-dimensional differentiable manifold $M$ and a differentiable function $f:M\rightarrow\mathbb{R}$. If $p\in M$ is a nondegenerate critical point of $f$, then there exist an open neighborhood $U$ of $p$ and local coordinates $(x_{1},\ldots,x_{n})$ on $U$ with

[TABLE]

such that in these coordinates,

[TABLE]

The integer $\lambda$ in Cor. 18 is called the index of the nondegenerate critical point $p\in M$. The next lemma directly concerns the structure of the level set trees for $f:M\rightarrow\mathbb{R}$. Let $M$ and $f$ be as above. Following the one-dimensional setup of Sect. 7.2.1, for $\alpha\in\mathbb{R}$ we consider the level set

[TABLE]

Lemma 22 ([103, 31]).

Consider an $n$-dimensional differentiable manifold $M$ and a Morse function $f:M\rightarrow\mathbb{R}$. Let $p,q\in M$, and let $\gamma:\,[0,1]\rightarrow M$ be a differentiable curve such that $\gamma(0)=p$ and $\gamma(1)=q$. Let $a=\min\big{\{}f(p),f(q)\big{\}}$ be the minimal endpoint value, and let $b=\min\limits_{t\in[0,1]}\big{(}f\circ\gamma(t)\big{)}$.

Suppose $f^{-1}\big{(}[a,b]\big{)}$ is compact and does not contain any critical points of index $n$ or $n-1$ . Then, for any $\delta>0$ , there exists a differentiable curve $\widetilde{\gamma}:\,[0,1]\rightarrow M$ homotopic to $\gamma$ such that $\widetilde{\gamma}(0)=p$ and $\widetilde{\gamma}(1)=q$ , and

[TABLE]

Consider an $n$ -dimensional compact differentiable manifold $M$ , and a Morse function $f:M\rightarrow\mathbb{R}$ . Recalling the definition of a level set tree in dimension one, for $p,q\in M$ , let

[TABLE]

where the supremum is taken over all continuous curves $\gamma:\,[0,1]\rightarrow M$ such that $\gamma(0)=p$ and $\gamma(1)=q$. Next, as in the case $\dim(M)=1$, we define a pseudo-metric on $M$ as

[TABLE]

We write $p\sim_{f}q$ if $d_{f}(p,q)=0$, and observe that $d_{f}$ is a metric on the quotient space $M/\!\sim_{f}$. Thus, $\left(M/\!\sim_{f},d_{f}\right)$ is a metric space satisfying Def. 1 of a tree. This tree is called the level set tree of $f$ and is denoted by $\textsc{level}(f)$. Here, $d_{f}(p,q)\geq|f(p)-f(q)|$, with $d_{f}(p,q)=|f(p)-f(q)|$ if and only if the points $(p/\!\sim_{f})$ and $(q/\!\sim_{f})$ of $\textsc{level}(f)$ belong to the same lineage. In particular, if $d_{f}(p,q)=f(p)-f(q)$, then $(p/\!\sim_{f})$ is a descendant of $(q/\!\sim_{f})$ and, respectively, $(q/\!\sim_{f})$ is an ancestor of $(p/\!\sim_{f})$. Figures 32 and 33 show examples of level set trees for functions $f$ on $\mathbb{R}^{2}$.

Example 16 (Compactness requirement).

The requirement for the manifold $M$ to be compact is necessary to ensure that there are no pairs of disjoint closed sets such that the distance between the two sets equals zero. As a counterexample, consider a function $f(x,y)=x^{2}-e^{y}$ on $M=\mathbb{R}^{2}$ (Fig. 34). Here, the level set $\mathcal{L}_{0}$ consists of two nonintersecting closed regions, marked by gray shading in Fig. 34(b):

[TABLE]

and

[TABLE]

The distance between $A$ and $B$ is zero, as the two sets get arbitrarily close along the line $x=0$ as $y\to-\infty$. Consider the points $p=(e,2)\in A$ and $q=(-e,2)\in B$ marked in Fig. 34. The points $p$ and $q$ are not connected by a continuous path inside $\mathcal{L}_{0}$, since each such path must intersect the line $x=0$, along which $f<0$. Yet, if we were to extend the distance in (186) to $M=\mathbb{R}^{2}$, then $\underline{f}(p,q)=0$, since for any $\delta>0$ there exists a path similar to $\gamma$ in Fig. 34(b), with its tip on the line $x=0$ at a sufficiently negative $y$, so that $\gamma\subset\mathcal{L}_{-\delta}$. Consequently, $d_{f}(p,q)=0$, implying that the points $p$ and $q$ are equivalent on the level set tree of $f$, $p\sim_{f}q$, although they belong to two disconnected components of $\mathcal{L}_{0}$.

Naturally, if $f:M\rightarrow\mathbb{R}$ is a Morse function, the critical points of index $n$ (local maxima) correspond to the leaves of the level set tree $\textsc{level}(f)$ . As we decrease $\alpha$ , new segments of $\mathcal{L}_{\alpha}$ appear at the critical points of index $n$ , and disconnected components of $\mathcal{L}_{\alpha}$ merge at some critical points of index less than $n$ . If $M$ is a compact manifold and $f:M\rightarrow\mathbb{R}$ is a Morse function, then by Lem. 22 the critical points of index less than $n-1$ cannot be the merger points of separated pieces of $\mathcal{L}_{\alpha}$ . Thus, we obtain the following corollary of Lem. 22.

Corollary 19.

Consider an $n$ -dimensional compact differentiable manifold $M$ , and a Morse function $f:M\rightarrow\mathbb{R}$ . Then, there is a bijection between the leaves of $\textsc{level}(f)$ and the critical points of $f$ of index $n$ , and a one-to-one (but not necessarily onto) correspondence between the internal (non-leaf) vertices of $\textsc{level}(f)$ and the critical points of $f$ of index $n-1$ .

Proof.

Suppose $c\in M$ is a critical point of $f$ of index less than $n-1$ such that $(c/\!\sim_{f})$ is an internal (non-leaf) vertex of $\textsc{level}(f)$ . Then, $(c/\!\sim_{f})$ is a parent vertex to at least one pair of points $(p/\!\sim_{f})$ and $(q/\!\sim_{f})$ of $\textsc{level}(f)$ that do not belong to the same lineage, $\underline{f}(p,q)=f(c)$ , and therefore

[TABLE]

where $a=\min\big{\{}f(p),f(q)\big{\}}$ . Thus, since $M$ is a differentiable manifold, there exists a differentiable curve $\gamma:\,[0,1]\rightarrow M$ such that $\gamma(0)=p$ and $\gamma(1)=q$ , and $\min\limits_{t\in[0,1]}\big{(}f\circ\gamma(t)\big{)}=f(c)$ . Then, by Lemma 22, for any $\delta>0$ , there exists a differentiable curve $\widetilde{\gamma}:\,[0,1]\rightarrow M$ homotopic to $\gamma$ such that $\widetilde{\gamma}(0)=p$ and $\widetilde{\gamma}(1)=q$ , and

[TABLE]

Hence,

[TABLE]

for any $\delta>0$ . Therefore, $d_{f}(p,q)=|f(p)-f(q)|$ , contradicting (187), i.e., contradicting the assumption that $(p/\!\sim_{f})$ and $(q/\!\sim_{f})$ do not belong to the same lineage in $\textsc{level}(f)$ . ∎

Remark 15.

Corollary 19 asserts that, while every internal vertex of the level set tree corresponds to a critical point of index $n-1$ (index $1$ for the functions on $\mathbb{R}^{2}$ shown in the figures), not every such critical point needs to correspond to an internal vertex. Figure 32 shows an example of a function where every critical point of index $1$ (saddle) corresponds to an internal vertex. Figure 33 shows an example of a function where the critical point of index $1$ (saddle) does not correspond to an internal vertex.

Finally, Cor. 19 together with the Morse Lemma (Cor. 18) implies the following lemma.

Lemma 23.

Consider an $n$ -dimensional compact differentiable manifold $M$ , and a Morse function $f:M\rightarrow\mathbb{R}$ . Suppose there are no two distinct critical points $p$ and $q$ of index $n-1$ with the same value $f(p)=f(q)$ . Then, the level set tree $\textsc{level}(f)$ is binary.

Proof.

Suppose $p$ is a critical point of $f$ corresponding to an internal (non-leaf) vertex in $\textsc{level}(f)$ . Then, by Corollary 19, $p$ has index $\lambda=n-1$ . Corollary 18 asserts that there exist an open neighborhood $U$ of $p$ and local coordinates $(x_{1},\ldots,x_{n})$ on $U$ with

$$x_{1}(p)=\cdots=x_{n}(p)=0,$$

such that in these coordinates,

$$f=f(p)-x_{1}^{2}-\cdots-x_{n-1}^{2}+x_{n}^{2}.$$

Hence, as $\alpha$ decreases, the merger of distinct components of $\mathcal{L}_{\alpha}$ happens along the $x_{n}$ -coordinate axis. This allows for the merger of at most two components. ∎

Vladimir Arnold studied an alternative (albeit similar in spirit) construction of level set trees that he called the graph of a Morse function $f:M\rightarrow\mathbb{R}$ , concentrating mainly on the sphere $M=S^{2}$ ; see [8, 9, 10] and references therein. Arnold has shown that these graphs are binary trees as well. These trees are constructed in such a way that both the local minima (index $0$ ) and the local maxima (index $2$ ) of $f$ correspond to the leaves, while the saddle points (index $1$ ) correspond to the internal (non-leaf) vertices. The goal of Arnold’s study was to shed light on the problem, formulated by A. Cayley in 1868, of classifying all possible configurations of the horizontal lines on topographical maps. In [10], Arnold quotes a communication with Morse: “M. Morse has told me, in 1965, that the problem of the description of the possible combinations of several critical points of a smooth function on a manifold looks hopeless to him. L. S. Pontrjagin and H. Whitney were of the same opinion.” Arnold’s work on the topological classification of level lines of Morse functions on $S^{2}$ enriched the collection of questions accompanying Hilbert’s sixteenth problem, which promoted the study of the topological structure of the level lines of real polynomials $p(x)$ over $x\in\mathbb{R}^{n}$ [68, 9, 10].

8 Kingman’s coalescent process

We refer to Section 7.8.1 for the general definition of a coalescent process. Recall that in an $N$ -particle coalescent process, a pair of clusters with masses $i$ and $j$ coalesces at the rate $K(i,j)/N$ . The mass-independent rate $K(i,j)=1$ defines Kingman’s coalescent process [74]. The following result establishes a weak form of the Horton law for Kingman’s coalescent.

Theorem 23 (Root-Horton law for Kingman’s coalescent, [82]).

Consider Kingman’s $N$ -coalescent process and its tree representation $T^{(N)}_{\rm K}$ . Let $N_{j}=N_{j}^{(N)}$ denote the number of branches of Horton-Strahler order $j$ in the tree $T^{(N)}_{\rm K}$ .

(i)

The asymptotic Horton ratios ${\mathcal{N}}_{j}$ exist and are finite for all $j\in\mathbb{N}$ , as in Def. 20. That is, for each $j$ , the following limit exists and is finite:

$$\mathcal{N}_{j}=\lim_{N\to\infty}\frac{N_{j}^{(N)}}{N}\quad\text{(in probability)}.\tag{188}$$

(ii)

Furthermore, ${\mathcal{N}}_{j}$ satisfy the root-Horton law (Def. 21):

$$\lim_{j\to\infty}\left(\mathcal{N}_{j}\right)^{-1/j}=R,$$

with Horton exponent $2\leq R\leq 4$ .
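A quick Monte Carlo sketch can illustrate the ratios $N_{j}/N$ of Theorem 23(i). Because every pair of clusters coalesces at the same rate $1/N$, the next merging pair in Kingman’s coalescent is uniform among all current pairs, and the merger times affect neither the tree shape nor the Horton-Strahler counts; the sketch therefore does not simulate times. It applies the Horton-Strahler rule (two branches of equal order $k$ start a branch of order $k+1$, otherwise the higher order continues). The function name and seed are illustrative.

```python
import random


def horton_counts_kingman(n, seed=0):
    """Count Horton-Strahler branches in the tree of Kingman's n-coalescent.

    Returns a dict {order j: number of branches of order j}. Merger times are
    not simulated: under Kingman's kernel every pair of current clusters is
    equally likely to merge next, so the tree shape only needs uniform pairs.
    """
    rng = random.Random(seed)
    orders = [1] * n          # Horton-Strahler order of every current cluster
    counts = {1: n}           # each of the n leaves is a branch of order 1
    while len(orders) > 1:
        i, j = rng.sample(range(len(orders)), 2)   # uniform pair of clusters
        a, b = orders[i], orders[j]
        if a == b:
            new = a + 1                            # a new branch of order a+1 starts
            counts[new] = counts.get(new, 0) + 1
        else:
            new = max(a, b)                        # the higher-order branch continues
        # remove clusters i and j (larger index first, swap with last), add the merged one
        for idx in sorted((i, j), reverse=True):
            orders[idx] = orders[-1]
            orders.pop()
        orders.append(new)
    return counts


if __name__ == "__main__":
    n = 20000
    counts = horton_counts_kingman(n, seed=1)
    for j in sorted(counts):
        print(j, counts[j] / n)   # Monte Carlo estimates of N_j / N
```

For large $n$ the printed value for $j=2$ should be close to $1/3$, consistent with ${\mathcal{N}}_{1}/{\mathcal{N}}_{2}=3$ in Sect. 8.1, and the value for $j=3$ close to $0.1097$, as implied by ${\mathcal{N}}_{2}/{\mathcal{N}}_{3}\approx 3.039$ reported there, up to Monte Carlo error.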

8.1 Smoluchowski-Horton ODEs for Kingman’s coalescent

In this section we provide a heuristic derivation of Smoluchowski-type ODEs for the number of Horton-Strahler branches in the coalescent tree $T^{(N)}_{\rm K}$ and consider the asymptotic version of these equations as $N\to\infty$ . Section 8.2 formally establishes the validity of the hydrodynamic limit.

Recall that $K(i,j)\equiv 1$ . Let $|\Pi^{(N)}_{t}|$ denote the total number of clusters at time $t\geq 0$ , and let $\eta_{(N)}(t):=|\Pi^{(N)}_{t}|/N$ be the total number of clusters relative to the system size $N$ . Then $\eta_{(N)}(0)=N/N=1$ and $\eta_{(N)}(t)$ decreases by $1/N$ with each coalescence of clusters; this happens with the rate

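$$\binom{N\eta_{(N)}(t)}{2}\frac{1}{N}=N\left(\frac{\eta_{(N)}^{2}(t)}{2}-\frac{\eta_{(N)}(t)}{2N}\right),$$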

since $1/N$ is the coalescence rate for any pair of clusters regardless of their masses. Informally, this implies that the large-system limit relative number of clusters $\displaystyle\eta(t)=\lim_{N\to\infty}\eta_{(N)}(t)$ satisfies the following ODE:

$$\frac{d}{dt}\eta(t)=-\frac{\eta^{2}(t)}{2}.\tag{189}$$

The initial condition $\eta(0)=1$ implies a unique solution $\eta(t)=2/(2+t)$ . The existence of the limit $\eta(t)$ is established in Lem. 24(a) of Sect. 8.2.

Next, for any $k\in\mathbb{N}$ we write $\eta_{k,N}(t)$ for the relative number of clusters (with respect to the system size $N$ ) that correspond to branches of Horton-Strahler order $k$ in tree $T^{(N)}_{\rm K}$ at time $t$ . Initially, each particle represents a leaf of Horton-Strahler order $1$ . Accordingly, the initial conditions are set to be, using Kronecker’s delta notation,

$$\eta_{k,N}(0)=\delta_{1}(k)=\begin{cases}1, & k=1,\\ 0, & k\neq1.\end{cases}$$

Below we describe the evolution of $\eta_{k,N}(t)$ using the definition of Horton-Strahler orders.

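Observe that $~{}\eta_{k,N}(t)$ increases by $1/N$ with each coalescence of two clusters of Horton-Strahler order $k-1$ , which happens at the rate

$$\binom{N\eta_{k-1,N}(t)}{2}\frac{1}{N}=N\left(\frac{\eta_{k-1,N}^{2}(t)}{2}+o(1)\right).$$

Thus ${\eta_{k-1,N}^{2}(t)\over 2}+o(1)$ is the instantaneous rate of increase of $\eta_{k,N}(t)$ .

Similarly, $~{}\eta_{k,N}(t)$ decreases by $1/N$ when a cluster of order $k$ coalesces with a cluster of order strictly higher than $k$ , which happens at the rate

$$N\eta_{k,N}(t)\cdot N\Big(\eta_{(N)}(t)-\sum_{j=1}^{k}\eta_{j,N}(t)\Big)\cdot\frac{1}{N}=N\,\eta_{k,N}(t)\Big(\eta_{(N)}(t)-\sum_{j=1}^{k}\eta_{j,N}(t)\Big),$$

and it decreases by $2/N$ when a cluster of order $k$ coalesces with another cluster of order $k$ , which happens at the rate

$$\binom{N\eta_{k,N}(t)}{2}\frac{1}{N}=N\left(\frac{\eta_{k,N}^{2}(t)}{2}+o(1)\right).$$

Thus the instantaneous rate of decrease of $\eta_{k,N}(t)$ is

$$\eta_{k,N}(t)\Big(\eta_{(N)}(t)-\sum_{j=1}^{k}\eta_{j,N}(t)\Big)+\eta_{k,N}^{2}(t)+o(1)=\eta_{k,N}(t)\Big(\eta_{(N)}(t)-\sum_{j=1}^{k-1}\eta_{j,N}(t)\Big)+o(1).$$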

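We can informally write the limit rates-in and rates-out for the clusters of Horton-Strahler order $k$ via the following Smoluchowski-Horton system of ODEs:

$$\frac{d}{dt}\eta_{k}(t)=\frac{\eta_{k-1}^{2}(t)}{2}-\eta_{k}(t)\Big(\eta(t)-\sum_{j=1}^{k-1}\eta_{j}(t)\Big),\qquad k\geq1,\tag{190}$$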

with the initial conditions $\eta_{k}(0)=\delta_{1}(k)$ . Here we interpret $\displaystyle\eta_{k}(t)$ as the hydrodynamic limit of $\eta_{k,N}(t)$ as $N\to\infty$ , which will be rigorously established in Lem. 24(b) of Sect. 8.2. We also let $\eta_{0}\equiv 0$ .

Since $\eta_{k}(t)$ has the instantaneous rate of increase $\eta_{k-1}^{2}(t)/2$ , the relative total number of clusters corresponding to branches of Horton-Strahler order $k$ is then

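$$\mathcal{N}_{k}=\frac{1}{2}\int_{0}^{\infty}\eta_{k-1}^{2}(t)\,dt,\qquad k\geq2,\tag{191}$$

while $\mathcal{N}_{1}=1$ , since every particle is a leaf of order $1$ .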

This equation has a simple heuristic interpretation. Specifically, according to the Horton-Strahler rule (5), a branch of order $k>1$ can only be created by merging two branches of order $k-1$ . In Kingman’s coalescent process these two branches are selected at random from all pairs of branches of order $k-1$ that exist at instant $t$ . As $N$ goes to infinity, the asymptotic density of a pair of branches of order $(k-1)$ , and hence the instantaneous intensity of newly formed branches of order $k$ , is $\eta^{2}_{k-1}(t)/2$ . The integration over time gives the relative total number of order- $k$ branches. The validity of equation (191) is established within the proof of Thm. 23(i) that follows Lem. 24.
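The hydrodynamic system can also be integrated numerically. The sketch below assumes the explicit form $\frac{d}{dt}\eta_{k}=\frac{\eta_{k-1}^{2}}{2}-\eta_{k}\big(\eta-\sum_{j<k}\eta_{j}\big)$ , $\eta_{k}(0)=\delta_{1}(k)$ , obtained from the rates listed above, with $\eta(t)=2/(2+t)$ , and evaluates $\mathcal{N}_{k+1}=\frac{1}{2}\int_{0}^{\infty}\eta_{k}^{2}(t)\,dt$ . To handle the infinite time interval it changes variables to $s=\log(1+t/2)$ , under which $\eta=e^{-s}$ and $dt=2e^{s}ds$ , so all integrands decay exponentially in $s$ . The function name, cutoff `s_max`, and step size `h` are illustrative choices; the integrator is classical fourth-order Runge-Kutta.

```python
import math


def horton_ratios_kingman(K=4, s_max=25.0, h=2e-3):
    """Approximate [N_1, ..., N_{K+1}] for Kingman's coalescent by integrating
    the Smoluchowski-Horton ODEs in the variable s = log(1 + t/2)."""

    def rhs(s, y):
        # y[0:K] = eta_1..eta_K; y[K:2K] = running integrals (1/2) int eta_k^2 dt
        eta = math.exp(-s)
        dt_ds = 2.0 * math.exp(s)
        out = [0.0] * (2 * K)
        below = 0.0   # sum_{j<k} eta_j
        prev = 0.0    # eta_{k-1}, with eta_0 = 0
        for i in range(K):                    # y[i] is eta_{i+1}
            g = eta - below                   # clusters of order >= i+1
            out[i] = dt_ds * (0.5 * prev * prev - y[i] * g)
            out[K + i] = dt_ds * 0.5 * y[i] * y[i]
            below += y[i]
            prev = y[i]
        return out

    y = [1.0] + [0.0] * (2 * K - 1)           # eta_k(0) = delta_1(k)
    s = 0.0
    for _ in range(int(round(s_max / h))):
        k1 = rhs(s, y)
        k2 = rhs(s + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
        k3 = rhs(s + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
        k4 = rhs(s + h, [a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
        s += h
    return [1.0] + y[K:]                      # N_1 = 1, then N_2, ..., N_{K+1}


if __name__ == "__main__":
    N = horton_ratios_kingman(K=4)
    print([N[k] / N[k + 1] for k in range(4)])   # n_1, ..., n_4
```

With these settings the first two ratios should reproduce ${\mathcal{N}}_{1}/{\mathcal{N}}_{2}=3$ and ${\mathcal{N}}_{2}/{\mathcal{N}}_{3}\approx 3.03895$ reported below, and every ratio should lie in $[2,4]$ , in line with Proposition 16(d),(e).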

It is not hard to compute the first three terms of the sequence ${\mathcal{N}}_{k}$ by solving equations (189) and (190) in the first three iterations:

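$$\mathcal{N}_{1}=1,\qquad\mathcal{N}_{2}=\frac{1}{3},\qquad\mathcal{N}_{3}=\frac{e^{4}}{128}-\frac{e^{2}}{8}+\frac{233}{384}.$$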

Hence, we have ${{\mathcal{N}}_{1}/{\mathcal{N}}_{2}}={3}$ and ${{\mathcal{N}}_{2}/{\mathcal{N}}_{3}}=3.038953879388\dots$ Our numerical results yield, moreover,

[TABLE]
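The first terms can be checked by hand. The following derivation is our own sketch rather than a reproduction of the source; it assumes the explicit form $\frac{d}{dt}\eta_{k}=\frac{\eta_{k-1}^{2}}{2}-\eta_{k}\big(\eta-\sum_{j<k}\eta_{j}\big)$ of the Smoluchowski-Horton equations and uses the substitution $u=\eta(t)=2/(2+t)$ , which maps $t\in[0,\infty)$ to $u\in(0,1]$ with $dt=-2\,du/u^{2}$ . The equation for $\eta_{2}$ is linear and is solved with the integrating factor $e^{2u}/u^{2}$ .

```latex
\begin{aligned}
&\frac{d\eta_{1}}{du}=\frac{2\eta_{1}}{u},\quad \eta_{1}\big|_{u=1}=1
  \;\Longrightarrow\; \eta_{1}=u^{2},\\
&\frac{d\eta_{2}}{du}-\frac{2(1-u)}{u}\,\eta_{2}=-u^{2},\quad \eta_{2}\big|_{u=1}=0
  \;\Longrightarrow\; \eta_{2}=\frac{u^{2}}{2}\big(e^{2-2u}-1\big),\\
&\mathcal{N}_{2}=\frac12\int_{0}^{\infty}\eta_{1}^{2}\,dt=\int_{0}^{1}u^{2}\,du=\frac13,\\
&\mathcal{N}_{3}=\frac12\int_{0}^{\infty}\eta_{2}^{2}\,dt
  =\frac14\int_{0}^{1}u^{2}\big(e^{2-2u}-1\big)^{2}\,du\approx 0.1096869 .
\end{aligned}
```

The resulting ratio ${\mathcal{N}}_{2}/{\mathcal{N}}_{3}\approx 3.03895$ agrees with the value quoted above.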

8.2 Hydrodynamic limit

This section establishes the existence of the asymptotic ratios ${\mathcal{N}}_{k}$ of (188) as well as the validity of the equations (189), (190) and (191) in a hydrodynamic limit. We refer to Darling and Norris [35] for a survey of techniques for establishing convergence of a Markov chain to the solution of a differential equation.

Notice that if the first $k-1$ functions $\eta_{1}(t),\ldots,\eta_{k-1}(t)$ are given, then (190) is a linear equation in $\eta_{k}(t)$ . This quasilinearity implies the existence and uniqueness of a solution.

We now proceed with establishing a hydrodynamic limit for the Smoluchowski-Horton system of ODEs (190). Let

[TABLE]

Lemma 24.

Let $\eta_{(N)}(t)$ be the relative total number of clusters and $\eta(t)$ be the solution to equation (189) with the initial condition $\eta(0)=1$ . Let $\eta_{k,N}(t)$ denote the relative number of clusters that correspond to branches of Horton-Strahler order $k$ and let functions $\eta_{k}(t)$ solve the system of equations (190) with the initial conditions $\eta_{k}(0)=\delta_{1}(k)$ . Then, as $N\to\infty$ ,

(a)

$~{}\big{\|}\eta_{(N)}(t)-\eta(t)\big{\|}_{L^{\infty}[0,\infty)}\stackrel{{\scriptstyle p}}{{\to}}0$ ;

(b)

$~{}\|\eta_{k,N}(t)-\eta_{k}(t)\|_{L^{\infty}[0,\infty)}\stackrel{{\scriptstyle p}}{{\to}}0$ , $\forall k\geq 1$ .

Proof.

We adopt here the approach of [80] that uses the weak limit law established in [50, Theorem 2.1, Chapter 11] and [87, Theorem 8.1]; it is briefly explained in Appendix A of this manuscript. This approach is different from the original proof given in [82], and also from the method developed in Norris [110] for the Smoluchowski equations.

For a fixed positive integer $K$ , let

[TABLE]

with $\hat{X}_{N}(0)=Ne_{1}$ . The process $\hat{X}_{N}(t)$ is a finite dimensional Markov process. Its transition rates can be found using the formalism (228) for density dependent population processes. Specifically, let $x=(x_{1},x_{2},\ldots,x_{K+1})$ . Then, for any $1\leq k\leq K$ , the change vector $\ell=-e_{k}-e_{K+1}$ corresponding to a merger of a cluster of order $k$ into a cluster of order higher than $k$ has the rate

[TABLE]

where $\beta_{\ell}(x)=x_{k}\left(x_{K+1}-\sum\limits_{j=1}^{k}x_{j}\right)$ . For a given $k$ such that $1\leq k\leq K$ , the change vector

[TABLE]

corresponding to a merger of a pair of clusters of order $k$ is assigned the rate

[TABLE]

where $\beta_{\ell}(x)=\frac{x_{k}^{2}}{2}$ . Finally, the change vector $\ell=-e_{K+1}$ corresponding to a merger of two clusters, both of order greater than $K$ , is assigned the rate

[TABLE]

where $\beta_{\ell}(x)=\frac{x_{K+1}^{2}}{2}$ .

By Thm. 33, $X_{N}(t)=N^{-1}\hat{X}_{N}(t)$ converges to $X(t)$ as in (231), where $X(t)$ satisfies (230) with

[TABLE]

where we let $x_{-1}=0$ at all times. Here, $F(x)$ naturally satisfies the Lipschitz continuity conditions (229), and the initial conditions are $X(0)=X_{N}(0)=e_{1}$ .

Therefore, for a given integer $K>0$ and a fixed real $T>0$ , equation (230) in Thm. 33 with $F(x)$ as defined above yields

[TABLE]

and

[TABLE]

for all $k=1,2,\ldots,K$ , with $\eta$ satisfying (189) and $\eta_{k}$ satisfying the Smoluchowski-Horton system of ODEs (190).

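Let $T_{m}$ be the time of the $m$ -th coalescence, that is, the time by which the first $m$ mergers have occurred. While $i$ clusters are present, the next merger occurs at rate $\binom{i}{2}\frac{1}{N}$ , so the expectation of $T_{m}$ is

$${\sf E}[T_{m}]=\sum_{i=N-m+1}^{N}\frac{2N}{i(i-1)}=2N\left(\frac{1}{N-m}-\frac{1}{N}\right)=\frac{2m}{N-m}.\tag{196}$$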

For given $\epsilon\in(0,1)$ and $\gamma>1$ let $m=\lfloor(1-\epsilon)N\rfloor$ . Taking $T>{2(1-\epsilon)\over\epsilon}\gamma$ , we have for all $t\geq T$ ,

[TABLE]

Thus $~{}\big{|}\eta_{(N)}(t)-\eta(t)\big{|}>\epsilon~{}$ would imply $~{}\eta_{(N)}(t)>\epsilon>\eta(t)>0$ , and by Markov’s inequality, we obtain

[TABLE]

Together (194) and the above equation (197) imply

[TABLE]

Hence $~{}\|\eta_{(N)}(t)-\eta(t)\|_{L^{\infty}[0,\infty)}\rightarrow 0~{}$ in probability, establishing Lemma 24(a).

Finally, observe that for any $\epsilon>0$ and for $T>0$ large enough so that $~{}\eta(T)<\epsilon$ ,

[TABLE]

Thus,

[TABLE]

where the last bound is obtained from Markov inequality: for $m=\lfloor(1-\epsilon)N\rfloor$ ,

[TABLE]

by (196). Together, equations (195) and (198) imply

[TABLE]

∎

Consequently, we establish a hydrodynamic limit for the Horton ratios (Thm. 23(i)) and validate formula (191).

Proof of Theorem 23(i).

The existence of the limit ${\mathcal{N}}_{j}=\lim_{N\to\infty}N_{j}/N$ in probability and its expression (191) via the solutions $\eta_{k}(t)$ of (190) follow from (192) in the context of Theorem 33 and the tail bound (197). ∎

8.3 Some properties of the Smoluchowski-Horton system of ODEs

Here we restate the Smoluchowski-Horton system of ODEs (190) as a simpler quasilinear system of ODEs (200), which we later (Sect. 8.3.2) rescale to the interval $[0,1]$ (203). Some of the properties established in Prop. 16 and Lem. 25 of this section are used in the proof of Thm. 23(ii) in Sect. 8.4.

8.3.1 Simplifying the Smoluchowski-Horton system of ODEs

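Let $g_{1}(t)=\eta(t)$ and $g_{k}(t)=\eta(t)-\sum\limits_{j:~{}j<k}\eta_{j}(t)$ be the asymptotic relative number of clusters of Horton-Strahler order $k$ or higher at time $t$ . We can rewrite (190) via $g_{k}$ using $\eta_{k}(t)=g_{k}(t)-g_{k+1}(t)$ :

$$\frac{d}{dt}\big(g_{k}(t)-g_{k+1}(t)\big)=\frac{\big(g_{k-1}(t)-g_{k}(t)\big)^{2}}{2}-\big(g_{k}(t)-g_{k+1}(t)\big)\,g_{k}(t),\qquad k\geq2.$$

We now rearrange the terms, obtaining for all $k\geq 2$ ,

$$\frac{d}{dt}g_{k+1}(t)-\frac{g_{k}^{2}(t)}{2}+g_{k}(t)\,g_{k+1}(t)=\frac{d}{dt}g_{k}(t)-\frac{g_{k-1}^{2}(t)}{2}+g_{k-1}(t)\,g_{k}(t).$$

One can readily check that ${d\over dt}g_{2}(t)-{g^{2}_{1}(t)\over 2}+g_{1}(t)\,g_{2}(t)=0$ ; the above equations hence simplify as follows

$$\frac{d}{dt}g_{k+1}(t)=\frac{g_{k}^{2}(t)}{2}-g_{k}(t)\,g_{k+1}(t),\qquad g_{k+1}(0)=0,\qquad k\geq1,\tag{200}$$

with $g_{1}(t)=\eta(t)=2/(2+t)$ .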

Observe that the existence and uniqueness of the solution sequence $g_{k}$ of (200) follows immediately from the quasilinear structure of the system (200): for a known $g_{k}(t)$ , the next function $g_{k+1}(t)$ is obtained by solving a first-order linear equation.

From (200) one has $g_{k}(t)>0$ for all $t>0$ , and similarly, from the equation (190) one has

[TABLE]

Next, returning to the asymptotic ratios ${\mathcal{N}}_{k}$ , we observe that (199) implies, for $k\geq 2$ ,

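$$\mathcal{N}_{k}=\frac{1}{2}\int_{0}^{\infty}g_{k}^{2}(t)\,dt=\frac{1}{2}\|g_{k}\|^{2}_{L^{2}[0,\infty)},$$

since

$$\mathcal{N}_{k}=\frac{1}{2}\int_{0}^{\infty}\big(g_{k-1}(t)-g_{k}(t)\big)^{2}dt=\int_{0}^{\infty}\Big(\frac{d}{dt}g_{k}(t)+\frac{g_{k}^{2}(t)}{2}\Big)dt,$$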

where $0\leq g_{k}(t)<g_{1}(t)\rightarrow 0$ as $t\rightarrow\infty$ , and $\int\limits_{0}^{\infty}{d\over dt}g_{k}(t)dt=g_{k}(\infty)-g_{k}(0)=0$ for $k\geq 2$ . Let $n_{k}$ represent the number of order- $k$ branches relative to the number of order- $(k+1)$ branches:

$$n_{k}=\frac{\mathcal{N}_{k}}{\mathcal{N}_{k+1}}.$$

Consider the following limits that represent, respectively, the root and the ratio asymptotic Horton laws:

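$$\lim_{k\to\infty}\left(\mathcal{N}_{k}\right)^{-1/k}=\lim_{k\to\infty}\Big(\prod_{j=1}^{k}n_{j}\Big)^{1/k}\qquad\text{and}\qquad\lim_{k\to\infty}n_{k}.$$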

Theorem 23(ii) establishes the existence of the first limit. We expect the second, stronger, limit also to exist and both of them to be equal to $3.043827\dots$ according to our numerical results. We now establish some basic facts about $g_{k}$ and $n_{k}$ .

Proposition 16.

Let $g_{k}(t)$ solve the ODE system (200). Then

(a)

$~{}\frac{1}{2}\int\limits_{0}^{\infty}{g^{2}_{k}(t)}dt=\int\limits_{0}^{\infty}g_{k}(t)g_{k+1}(t)dt,$

(b)

$~{}\int\limits_{0}^{\infty}g^{2}_{k+1}(t)dt=\int\limits_{0}^{\infty}(g_{k}(t)-g_{k+1}(t))^{2}dt,$

(c)

$~{}\lim\limits_{t\rightarrow\infty}tg_{k}(t)=2,$

(d)

$~{}n_{k}={\|g_{k}\|^{2}_{L^{2}[0,\infty)}\over\|g_{k+1}\|^{2}_{L^{2}[0,\infty)}}\geq{2},$

(e)

$~{}n_{k}={\|g_{k}\|^{2}_{L^{2}[0,\infty)}\over\|g_{k+1}\|^{2}_{L^{2}[0,\infty)}}\leq{4}.$

Proof.

Part (a) follows from integrating (200), and part (b) follows from part (a). Part (c) is proved by induction, using L’Hôpital’s rule as follows. It is obvious that $~{}\lim\limits_{t\rightarrow\infty}tg_{1}(t)=2$ . Hence, for any $k\geq 1$ , (201) implies

[TABLE]

Also,

[TABLE]

implying $~{}[tg_{k+1}]^{\prime}\geq 0~{}$ for all $t\geq 0$ as $g_{k}(t)-g_{k+1}(t)\geq 0$ and $2-tg_{k}(t)>0$ . Hence, $tg_{k+1}(t)$ is bounded and nondecreasing. Thus, $~{}\lim\limits_{t\rightarrow\infty}tg_{k+1}(t)$ exists for all $k\geq 1$ .

Next, suppose $~{}\lim\limits_{t\rightarrow\infty}tg_{k}(t)=2$ . Then by the Mean Value Theorem, for any $t>0$ and for all $y>t$ ,

[TABLE]

Taking $~{}y\rightarrow\infty$ , we obtain

[TABLE]

Therefore

[TABLE]

implying $~{}\lim\limits_{t\rightarrow\infty}tg_{k+1}(t)=2$ .

Statement (d) follows from (202) as we have ${\mathcal{N}}_{k}\geq 2{\mathcal{N}}_{k+1}$ from the definition of the Horton-Strahler order. An alternative proof of (d) using the system of ODEs (203) is given in Sect. 8.3.2.

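Part (e) follows from part (a) together with the Hölder (here, Cauchy-Schwarz) inequality

$$\frac{1}{2}\|g_{k}\|^{2}_{L^{2}[0,\infty)}=\int_{0}^{\infty}g_{k}(t)\,g_{k+1}(t)\,dt\leq\|g_{k}\|_{L^{2}[0,\infty)}\,\|g_{k+1}\|_{L^{2}[0,\infty)},$$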

which implies $n_{k}={\|g_{k}\|^{2}_{L^{2}[0,\infty)}\over\|g_{k+1}\|^{2}_{L^{2}[0,\infty)}}\leq{4}$ . ∎

Remark 16.

The statements (a) and (b) of Proposition 16 have a straightforward heuristic interpretation, similar to that of equation (191) above. Specifically, (a) claims that the asymptotic relative total number of vertices of order $k+1$ and above in Kingman’s tree (left-hand side) equals twice the asymptotic relative total number of vertices of order $k+1$ and above except the vertices parental to two vertices of order $k$ (right-hand side). This is nothing but an asymptotic property of a binary tree: the total number of vertices is asymptotically twice the number of internal vertices. Item (a) hence merely states that Kingman’s tree formed by clusters of order above $k$ is binary for any $k\geq 1$ . Similarly, item (b) states that the asymptotic relative total number of vertices of order $(k+2)$ and above (left-hand side) equals the asymptotic relative total number of vertices of order $(k+1)$ (right-hand side). This is yet another way of saying that Kingman’s tree is binary.

8.3.2 Rescaling to $[0,1]$ interval

Define

$$h_{k}(x)=\frac{g_{1}(t)-g_{k+1}(t)}{g_{1}^{2}(t)},\qquad t=\frac{2x}{1-x},$$

for $x\in[0,1)$ . Then $h_{0}\equiv 0$ , $h_{1}\equiv 1$ , and the system of ODEs (200) rewrites as

$$h_{k+1}^{\prime}(x)=2h_{k}(x)\,h_{k+1}(x)-h_{k}^{2}(x),\qquad k\geq0,\tag{203}$$

with the initial conditions $h_{k}(0)=1$ .

Observe that the above quasilinear system of ODEs (203) has $h_{k}(x)$ converging to $h(x)={1\over 1-x}$ as $k\rightarrow\infty$ , where $h(x)$ is the solution to the Riccati equation $h^{\prime}(x)=h^{2}(x)$ over $[0,1)$ with the initial value $h(0)=1$ . Specifically, we have proven that $g_{k}(t)\rightarrow 0$ as $k\rightarrow\infty$ . Thus

[TABLE]

Observe that $h_{2}(x)=(1+e^{2x})/2$ , but for $k\geq 3$ finding a closed form expression becomes increasingly hard.

We observe from (202) that the quantity $n_{k}$ rewrites in terms of $h_{k}$ as follows

[TABLE]

Consequently, equation (204) implies

[TABLE]

Now, for a known $h_{k}(x)$ , (203) is a first-order linear ODE in $h_{k+1}(x)$ . Its solution is given by $h_{k+1}(x)=\mathcal{H}h_{k}(x)$ , where $\mathcal{H}$ is a nonlinear operator defined as follows

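$$\mathcal{H}f(x)=e^{2\int_{0}^{x}f(y)\,dy}\left(1-\int_{0}^{x}f^{2}(y)\,e^{-2\int_{0}^{y}f(z)\,dz}\,dy\right).$$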

Hence, the problem of establishing the limit (205) for the root-Horton law concerns the asymptotic behavior of an iterated nonlinear functional.
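To see the iteration $h_{k+1}=\mathcal{H}h_{k}$ in action, the following sketch integrates the rescaled system numerically. It assumes (203) in the form $h_{k+1}^{\prime}=2h_{k}h_{k+1}-h_{k}^{2}$ , which is equivalent to the identity $h^{\prime}_{k+1}+(h_{k+1}-h_{k})^{2}=h_{k+1}^{2}$ used in the proof of Lemma 25, with $h_{1}\equiv1$ and $h_{k}(0)=1$ . It compares $h_{2}(1)$ with the closed form $h_{2}(x)=(1+e^{2x})/2$ and prints the ratios $\gamma_{k}=h_{k}(1)/h_{k+1}(1)$ of Observation 2, which Lemma 29 shows to be increasing. The function name and step count are illustrative; the integrator is classical fourth-order Runge-Kutta.

```python
import math


def h_at_one(K, n_steps=20000):
    """Values [h_1(1), ..., h_K(1)] of the rescaled Smoluchowski-Horton system.

    Integrates h_{k+1}' = 2 h_k h_{k+1} - h_k^2 on [0, 1] with h_1 = 1 and
    h_k(0) = 1, all K functions at once.
    """
    def rhs(h):
        # h[0] = h_1 is constant; h[k] is driven by h[k-1]
        return [0.0] + [2.0 * h[k - 1] * h[k] - h[k - 1] ** 2 for k in range(1, K)]

    dx = 1.0 / n_steps
    h = [1.0] * K
    for _ in range(n_steps):
        k1 = rhs(h)
        k2 = rhs([a + dx / 2 * b for a, b in zip(h, k1)])
        k3 = rhs([a + dx / 2 * b for a, b in zip(h, k2)])
        k4 = rhs([a + dx * b for a, b in zip(h, k3)])
        h = [a + dx / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(h, k1, k2, k3, k4)]
    return h


if __name__ == "__main__":
    h = h_at_one(6)
    print(h[1], (1 + math.exp(2)) / 2)            # numerical vs closed-form h_2(1)
    gammas = [h[k] / h[k + 1] for k in range(5)]  # gamma_k = h_k(1) / h_{k+1}(1)
    print(gammas)
```

The ratios $1/\gamma_{k}=h_{k+1}(1)/h_{k}(1)$ are the quantities whose limit is the subject of Lemmas 26 and 27.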

The following lemma will be used in Sect. 8.4.

Lemma 25.

[TABLE]

Proof.

Observing $\,h^{\prime}_{k+1}(x)+(h_{k+1}(x)-h_{k}(x))^{2}=h_{k+1}^{2}(x)\,$ , we use integration by parts to obtain

[TABLE]

as $1/h(x)=1-x$ . ∎

Next, we notice that (201) implies

[TABLE]

for all $k\geq 1$ .

Finally, an alternative proof to Proposition 16(d) using the system of ODEs (203) follows from Lemma 25 and (207).

Alternative proof of Proposition 16(d).

Lemma 25 implies

[TABLE]

Hence, equation (207) yields $~{}n_{k}={\big{\|}1-h_{k}/h\big{\|}_{L^{2}[0,1]}^{2}\over\big{\|}1-h_{k+1}/h\big{\|}_{L^{2}[0,1]}^{2}}\geq 2$ . ∎

8.4 Proof of the existence of the root-Horton limit

Here we present a proof of Thm. 23(ii). The proof is based on Lemmas 26 and 27 stated below, which will be proven in Sects. 8.4.1 and 8.4.2.

Lemma 26.

If the limit $\lim\limits_{k\rightarrow\infty}{h_{k+1}(1)\over h_{k}(1)}$ exists, then $\lim\limits_{k\rightarrow\infty}\left({\mathcal{N}}_{k}\right)^{-{1\over k}}=\lim\limits_{k\rightarrow\infty}\left(\prod\limits_{j=1}^{k}n_{j}\right)^{1\over k}$ also exists, and

[TABLE]

Lemma 27.

The limit $\lim\limits_{k\rightarrow\infty}{h_{k+1}(1)\over h_{k}(1)}$ exists, is finite, and is at least $1$ .

Once Lemmas 26 and 27 are established, the validity of root-Horton law Theorem 23(ii) is proved as follows.

Proof of Theorem 23(ii).

The existence and finiteness of $\lim\limits_{k\rightarrow\infty}{h_{k+1}(1)\over h_{k}(1)}$ established in Lemma 27 is the precondition for Lemma 26 that in turn implies the existence and finiteness of the limit

[TABLE]

as needed for the root-Horton law. Furthermore,

[TABLE]

and $2\leq R\leq 4$ by Proposition 16. ∎

8.4.1 Proof of Lemma 26 and related results

Proposition 17.

[TABLE]

Proof.

Equation (203) implies

[TABLE]

Integrating both sides of equation (210) from $0$ to $1$ , we obtain

[TABLE]

as $h_{k+1}(0)=1$ .

Hence, using Lemma 25, the first inequality in (209) is proved as follows

[TABLE]

Finally, equations (207) and (210) imply

[TABLE]

This completes the proof. ∎

Proof of Lemma 26.

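If the limit $\lim\limits_{k\rightarrow\infty}{h_{k+1}(1)\over h_{k}(1)}$ exists and is finite, then so does the limit $\lim\limits_{k\rightarrow\infty}\left({1\over h_{k}(1)}\right)^{-{1\over k}}=\lim\limits_{k\rightarrow\infty}\big(h_{k}(1)\big)^{1\over k}$ , and the two limits coincide, since for a positive sequence the convergence of consecutive ratios implies the convergence of the $k$ -th roots to the same limit. Then, the existence and the finiteness of the limit $\lim\limits_{k\rightarrow\infty}\left({\mathcal{N}}_{k}\right)^{-{1\over k}}$ follow from equation (205) and Proposition 17. ∎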

8.4.2 Proof of Lemma 27 and related results

In this subsection we use the approach developed by Drmota [40] to prove that the limit $\lim\limits_{k\rightarrow\infty}{h_{k+1}(1)\over h_{k}(1)}\geq 1$ exists and is finite. As we saw earlier, this result was used for proving the existence, finiteness, and positivity of $\lim\limits_{k\rightarrow\infty}\left({\mathcal{N}}_{k}\right)^{-{1\over k}}=\lim\limits_{k\rightarrow\infty}\left(\prod\limits_{j=1}^{k}n_{j}\right)^{{1\over k}}$ , the root-Horton law.

Definition 34.

Given $\gamma\in(0,1]$ , let

[TABLE]

Note that the sequences of functions $h_{k}(x)$ and $V_{k,\gamma}(x)$ can be extended beyond $x=1$ .

Next, we make some observations about the above defined functions.

Observation 1.

$V_{k,\gamma}(x)$ are positive continuous functions satisfying

[TABLE]

for all $x\in[0,1]\setminus\{1-\gamma\}$ , with initial conditions $V_{k,\gamma}(0)=1$ .

Observation 2.

Let $\gamma_{k}={h_{k}(1)\over h_{k+1}(1)}$ . Then

[TABLE]

and

[TABLE]

Observation 3.

[TABLE]

for all $x\in[0,1]$ since $h_{k}(x)\leq h_{k+1}(x)$ .

Observation 4.

Since $h_{1}(x)\equiv 1$ and $\gamma_{1}={h_{1}(1)\over h_{2}(1)}$ ,

[TABLE]

Observation 4 generalizes as follows.

Proposition 18.

[TABLE]

In order to prove Proposition 18 we will need the following lemma.

Lemma 28.

For any $\gamma\in(0,1)$ and $k\geq 1$ , function $V_{k,\gamma}(x)-h_{k+1}(x)$ changes its sign at most once as $x$ increases from $1-\gamma$ to $1$ . Moreover, since $V_{k,\gamma}(1-\gamma)=h(1-\gamma)>h_{k+1}(1-\gamma)$ , function $V_{k,\gamma}(x)-h_{k+1}(x)$ can only change sign from nonnegative to negative.

Proof.

This is a proof by induction with base at $k=1$ . Here $V_{1,\gamma}(x)={1\over\gamma}$ is constant on $[1-\gamma,1]$ , while $h_{2}(x)=(1+e^{2x})/2$ is an increasing function, and

[TABLE]

For the induction step, we need to show that if $V_{k,\gamma}(x)-h_{k+1}(x)$ changes its sign at most once, then so does $V_{k+1,\gamma}(x)-h_{k+2}(x)$ . Since both sequences of functions satisfy the same ODE relation (see Observation 1), we have

${d\over dx}\left[(V_{k+1,\gamma}(x)-h_{k+2}(x))\cdot e^{-2\int\limits_{1-\gamma}^{x}h_{k+1}(y)dy}\right]$

[TABLE]

where $h_{k+1}(x)\leq V_{k+1,\gamma}(x)$ by definition of $V_{k+1,\gamma}(x)$ , and $V_{k,\gamma}(x)\leq V_{k+1,\gamma}(x)$ as in Observation 3.

Now, let

[TABLE]

Then

[TABLE]

The function $2V_{k+1,\gamma}(x)-V_{k,\gamma}(x)-h_{k+1}(x)\geq 0$ , and since $V_{k,\gamma}(x)-h_{k+1}(x)$ changes its sign at most once, $I(x)$ changes its sign from nonnegative to negative at most once as $x$ increases from $1-\gamma$ to $1$ . Hence

[TABLE]

changes its sign from nonnegative to negative at most once as

[TABLE]

by (207). ∎

Proof of Proposition 18.

Take $\gamma=\gamma_{k}$ in Lemma 28. Then the function $V_{k,\gamma_{k}}(x)-h_{k+1}(x)$ changes its sign at most once within the interval $[1-\gamma_{k},1]$ , and only from nonnegative to negative. Hence, $V_{k,\gamma_{k}}(1-\gamma_{k})>h_{k+1}(1-\gamma_{k})$ and $h_{k+1}(1)=V_{k,\gamma_{k}}(1)$ imply $h_{k+1}(x)\leq V_{k,\gamma_{k}}(x)$ as in the statement of the proposition. ∎

Now we are ready to prove the monotonicity result.

Lemma 29.

[TABLE]

Proof.

We prove it by contradiction. Suppose $\gamma_{k}\geq\gamma_{k+1}$ for some $k\in\mathbb{N}^{+}$ . Then

[TABLE]

and therefore

[TABLE]

as $h_{k+1}(x)\leq V_{k,\gamma_{k}}(x)$ by Proposition 18.

Recall that for $x\in[1-\gamma_{k+1},1]$ ,

[TABLE]

where at $1-\gamma_{k+1}$ we consider only the right-hand derivative. Thus for $x\in[1-\gamma_{k+1},1]$ ,

[TABLE]

where $A(x)=2V_{k+1,\gamma_{k+1}}(x)-V_{k,\gamma_{k+1}}(x)-h_{k+1}(x)\geq 0$ , $B(x)=2h_{k+1}(x)>0$ , and $V_{k+1,\gamma_{k+1}}(1-\gamma_{k+1})-h_{k+2}(1-\gamma_{k+1})=h(1-\gamma_{k+1})-h_{k+2}(1-\gamma_{k+1})>0$ . Hence

[TABLE]

arriving at a contradiction since $V_{k+1,\gamma_{k+1}}(1)=h_{k+2}(1)$ . ∎

Corollary 20.

The limit $\lim\limits_{k\rightarrow\infty}\gamma_{k}$ exists.

Proof.

Lemma 29 implies $\gamma_{k}$ is a monotone increasing sequence, bounded by $1$ . ∎

Proof of Lemma 27.

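Lemma 27 follows immediately from Corollary 20 and the observation that ${h_{k+1}(1)\over h_{k}(1)}={1\over\gamma_{k}}$ : since $\gamma_{k}$ is increasing with $0<\gamma_{1}\leq\gamma_{k}\leq 1$ , its limit lies in $(0,1]$ , so $1/\gamma_{k}$ converges to a finite limit that is at least $1$ . ∎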

9 Generalized dynamical pruning

The Horton pruning (Def. 3), which is the key element of the self-similarity theory developed in previous sections, is a very particular way of erasing a tree. Here we suggest a general approach to erasing a finite tree from the leaves down to the root that includes both combinatorial and metric prunings, and discuss the respective prune-invariance.

Given a tree $T\in\mathcal{L}$ and a point $x\in T$ , let $\Delta_{x,T}$ be the descendant tree of $x$ : it consists of all points of $T$ descendant to $x$ , including $x$ ; see Fig. 35a. Then $\Delta_{x,T}$ is itself a tree in $\mathcal{L}$ with root at $x$ . Let $T_{1}=(M_{1},d_{1})$ and $T_{2}=(M_{2},d_{2})$ be two metric rooted trees (Def. 1), and let $\rho_{1}$ denote the root of $T_{1}$ . A function $f:T_{1}\rightarrow T_{2}$ is said to be an isometry if ${\sf Image}[f]\subseteq\Delta_{f(\rho_{1}),T_{2}}$ and for all pairs $x,y\in T_{1}$ ,

$$d_{2}\big(f(x),f(y)\big)=d_{1}(x,y).$$

The tree isometry is illustrated in Fig. 35b. We use the isometry to define a partial order in the space $\mathcal{L}$ as follows. We say that $T_{1}$ is less than or equal to $T_{2}$ and write $T_{1}\preceq T_{2}$ if there is an isometry $f:T_{1}\rightarrow T_{2}$ . The relation $\preceq$ is a partial order as it satisfies the reflexivity, antisymmetry, and transitivity conditions. Moreover, a variety of other properties of this partial order can be observed, including order denseness and semi-continuity.

We say that a function $\varphi:\mathcal{L}\rightarrow\mathbb{R}$ is monotone nondecreasing with respect to the partial order $\preceq$ if $\varphi(T_{1})\leq\varphi(T_{2})$ whenever $T_{1}\preceq T_{2}.$ Consider a monotone nondecreasing function $\varphi:\mathcal{L}\rightarrow\mathbb{R}_{+}$ . We define the generalized dynamical pruning operator $\mathcal{S}_{t}(\varphi,T):\mathcal{L}\rightarrow\mathcal{L}$ induced by $\varphi$ for any $t\geq 0$ as

$$\mathcal{S}_{t}(\varphi,T)=\rho\cup\big\{x\in T\setminus\rho:\ \varphi(\Delta_{x,T})\geq t\big\},$$

where $\rho$ denotes the root of tree $T$ . Informally, the operator $\mathcal{S}_{t}$ cuts all subtrees $\Delta_{x,T}$ for which the value of $\varphi$ is below threshold $t$ , and always keeps the tree root. Extending the partial order to $\mathcal{L}$ by assuming $\phi\preceq T$ for all $T\in\mathcal{L}$ , we observe for any $T\in\mathcal{L}$ that $\mathcal{S}_{s}(\varphi,T)\preceq\mathcal{S}_{t}(\varphi,T)$ whenever $s\geq t$ .

9.1 Examples of generalized dynamical pruning

The dynamical pruning operator $\mathcal{S}_{t}$ encompasses and unifies a range of problems, depending on a choice of $\varphi$ , as we illustrate in the following examples.

9.1.1 Example: pruning via the tree height

Let the function $\varphi(T)$ equal the height of tree $T$ :

$$\varphi(T)=\textsc{height}(T)=\max_{x\in T}d(\rho,x).\tag{214}$$

In this case the operator $\mathcal{S}_{t}$ satisfies the continuous semigroup property:

$$\mathcal{S}_{t}\circ\mathcal{S}_{s}=\mathcal{S}_{t+s},\qquad t,s\geq0.$$

It coincides with the continuous pruning (a.k.a. tree erasure) studied by Jacques Neveu [105], who established the invariance of critical and subcritical binary Galton-Watson trees with i.i.d. exponential edge lengths with respect to this operation.

It is readily seen that for a coalescent process (Sect. 7.8.1), the dynamical pruning $\mathcal{S}_{t}$ of the corresponding coalescent tree with $\varphi(T)$ as in (214) replicates the coalescent process. More specifically, the timing and order of particle mergers is reproduced by the dynamics of the leaves of $\mathcal{S}_{t}(\varphi,T)$ . See Sect. 10.2.3, Thm. 27 for a concrete version of this statement for the coalescent dynamics of shocks in the continuum ballistic annihilation model.

9.1.2 Example: pruning via the Horton-Strahler order

Let the function $\varphi(T)$ be one unit less than the Horton-Strahler order ${\sf ord}(T)$ of a tree $T$ :

$$\varphi(T)={\sf ord}(T)-1.\tag{215}$$

This function is also known as the register number [49, 55], as it equals the minimum number of memory registers necessary to evaluate an arithmetic expression described by a tree $T$ , assuming that the result is stored in an additional register that also can be used for calculations.

With the choice (215), the dynamical pruning operator coincides with the Horton pruning (Def. 3), $\mathcal{S}_{t}=\mathcal{R}^{\lfloor t\rfloor}$ , if we assume that all edge lengths equal unity. It is readily seen that $\mathcal{S}_{t}$ satisfies the discrete semigroup property:

$$\mathcal{S}_{i}\circ\mathcal{S}_{j}=\mathcal{S}_{i+j},\qquad i,j\in\mathbb{N}.$$

Most of the present survey is focused on invariance of a tree distribution with respect to this operation.
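The identity $\mathcal{S}_{t}=\mathcal{R}^{\lfloor t\rfloor}$ rests on the fact that one Horton pruning lowers the Horton-Strahler order by exactly one. The sketch below checks this on finite binary trees, ignoring edge lengths. The encoding is hypothetical (a leaf is `()`, an internal vertex is a pair `(left, right)`, the empty tree is `None`); the order is computed with the Horton-Strahler rule, and $\mathcal{R}$ is implemented as "delete the leaves with their parental edges, then apply series reduction". Random test trees are grown by merging uniformly chosen pairs, as in Kingman’s coalescent.

```python
import random


def hs_order(tree):
    """Horton-Strahler order: 1 at a leaf; +1 if both children tie, else the max."""
    if tree == ():
        return 1
    a, b = hs_order(tree[0]), hs_order(tree[1])
    return a + 1 if a == b else max(a, b)


def horton_prune(tree):
    """Horton pruning R: delete leaves with their parental edges, then series-reduce."""
    if tree == ():
        return None                      # pruning a single leaf gives the empty tree
    left, right = tree
    if left == () and right == ():
        return ()                        # a cherry collapses to a leaf
    if left == ():
        return horton_prune(right)       # series reduction of the degree-2 vertex
    if right == ():
        return horton_prune(left)
    return (horton_prune(left), horton_prune(right))


def random_binary_tree(n_leaves, rng):
    """Random binary tree built by merging uniformly chosen pairs of clusters."""
    clusters = [()] * n_leaves
    while len(clusters) > 1:
        i, j = sorted(rng.sample(range(len(clusters)), 2))
        merged = (clusters[i], clusters[j])
        clusters.pop(j)
        clusters.pop(i)
        clusters.append(merged)
    return clusters[0]


if __name__ == "__main__":
    T = (((), ()), (((), ()), ()))
    print(hs_order(T), horton_prune(T))   # 3 ((), ())
    rng = random.Random(7)
    for n in (2, 5, 50, 200):
        tree = random_binary_tree(n, rng)
        print(n, hs_order(tree), hs_order(horton_prune(tree)))
```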

9.1.3 Example: pruning via the total tree length

Let the function $\varphi(T)$ equal the total length of $T$ :

$$\varphi(T)=\textsc{length}(T).\tag{216}$$

The dynamical pruning by the tree length is illustrated in Fig. 36 for a Y-shaped tree that consists of three edges.

Importantly, in this case $\mathcal{S}_{t}$ does not satisfy the semigroup property. To see this, consider an internal vertex point $x\in T$ (see Fig. 36, where the only internal vertex is marked by a gray ball). Then $\Delta_{x,T}$ consists of point $x$ as its root, the left subtree of length $a$ and the right subtree of length $b$ . Observe that the whole left subtree is pruned away by time $a$ , and the whole right subtree is pruned away by time $b$ . However, since

$$\varphi(\Delta_{x,T})=\textsc{length}(\Delta_{x,T})=a+b>\max\{a,b\},$$

the junction point $x$ will not be pruned until time instant $a+b$ . Thus, $x$ will be a leaf of $\mathcal{S}_{t}(\varphi,T)$ for all $t$ such that

$$\max\{a,b\}<t<a+b.$$

This situation corresponds to Stage IV in Fig. 36, where each of the left and right subtrees stemming from point $x$ (marked by a gray ball) consists of a single root vertex.

The semigroup property in this example can be introduced by considering mass-equipped trees. Informally, we replace each pruned subtree $\tau$ of $T$ with a point of mass equal to the total length of $\tau$ . The massive points contain some of the information lost during the pruning process, which is enough to establish the semigroup property. Specifically, by time $a$ , the pruned away left subtree (Fig. 36, Stage III) turns into a massive point of mass $a$ attached to $x$ on the left side. Similarly, by time $b$ , the pruned away right subtree (Fig. 36, Stage IV) turns into a massive point of mass $b$ attached to $x$ on the right side. For $\max\{a,b\}\leq t\leq a+b$ , this construction keeps track of the quantity $a+b-t$ associated with point $x$ , and when the quantity $a+b-t$ decreases to $0$ , the two massive points coalesce into one. If at instant $t$ a single massive point sits at a leaf, its mass is $m=t$ , and the leaf’s parental edge is being pruned. If at instant $t$ two massive points (left and right) sit at a leaf, their total mass is $m\geq t$ , and further pruning of the leaf’s parental edge is prevented until the instant $t=m$ , when the two massive points coalesce. Keeping track of all such quantities makes $\mathcal{S}_{t}$ satisfy the continuous semigroup property. This construction is formally introduced in Sect. 10, which shows that the pruning operator $\mathcal{S}_{t}$ with (216) coincides with the potential dynamics of the continuum mechanics formulation of the 1-D ballistic annihilation model, $A+A\rightarrow\emptyset$ .

9.1.4 Example: pruning via the number of leaves

Let the function $\varphi(T)$ equal the number of leaves in a tree $T$. This choice is closely related to the mass-conditioned dynamics of an aggregation process. Specifically, consider $N$ singletons (particles with unit mass) that appear in a system at instants $t_{n}\geq 0$, $1\leq n\leq N$. The existing clusters merge into progressively larger clusters via pairwise mergers. The cluster mass is additive: a merger of two clusters of masses $i$ and $j$ results in a cluster of mass $i+j$. We consider a time-oriented tree $T$ that describes this process. The tree $T$ has $N$ leaves and $(N-1)$ internal vertices. Each leaf corresponds to an initial particle, each internal vertex corresponds to a merger of two clusters, and the edge lengths represent times between the respective mergers. The action of $\mathcal{S}_{t}$ on such a tree yields the state of the process restricted to the clusters of mass at least $t$. A well-studied special case is the coalescent process with kernel $K(i,j)$ of Sect. 7.8.1.
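A minimal Python sketch of this correspondence, assuming the same survival convention $\varphi(\Delta_{x,T})\geq t$: every point on the parental edge of a cluster has the same number of descendant leaves as the cluster itself, so $\mathcal{S}_{t}$ keeps exactly the clusters of mass at least $t$, each with its whole parental edge. The merger encoding (singletons labelled $0,\dots,N-1$, the $k$-th merger creating cluster $N+k$) and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def cluster_masses(n: int, merges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Mass of every cluster in a history of pairwise mergers of n singletons.

    Hypothetical encoding: singletons are 0..n-1 and the k-th merger creates
    cluster n + k. Masses add up, so a cluster's mass equals the number of
    leaves below its vertex in the merger tree.
    """
    mass = {i: 1 for i in range(n)}
    for k, (i, j) in enumerate(merges):
        mass[n + k] = mass[i] + mass[j]
    return mass


def surviving_clusters(n: int, merges: List[Tuple[int, int]], t: float) -> List[int]:
    """Vertices kept by S_t with phi = number of leaves: clusters of mass >= t."""
    return sorted(c for c, m in cluster_masses(n, merges).items() if m >= t)


# Balanced history: 0+1 -> 4, 2+3 -> 5, 4+5 -> 6.
print(surviving_clusters(4, [(0, 1), (2, 3), (4, 5)], 2))  # [4, 5, 6]
# Caterpillar history: 0+1 -> 4, 4+2 -> 5, 5+3 -> 6.
print(surviving_clusters(4, [(0, 1), (4, 2), (5, 3)], 3))  # [5, 6]
```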

9.2 Pruning for $\mathbb{R}$ -trees

The generalized dynamical pruning is readily applied to real trees (Sect. 2.2), although this is not the focus of our work. Note that the total tree length (Example 9.1.3) and the number of leaves (Example 9.1.4) may be undefined (infinite) for an $\mathbb{R}$-tree. In Sect. 10.5.3 we introduce a mass function that can serve as a natural general analog of these and other functions on finite trees. We show (Sect. 10.2.3, Thm. 28) that pruning by mass is equivalent to pruning by the total tree length in the particular setting of the ballistic annihilation model with a piece-wise linear potential with a finite number of segments. Accordingly, our results should extend straightforwardly to $\mathbb{R}$-trees that appear, for instance, in the description of the continuum ballistic annihilation dynamics for other initial potentials.

9.3 Relation to other generalizations of pruning

A pruning operation similar in spirit to the generalized dynamical pruning was considered in a work by Duquesne and Winkel [46] that extended a formalism by Evans [52] and Evans et al. [53]. We note that the two definitions of pruning, the generalized dynamical pruning of Sect. 9 and that of [46], are fundamentally different despite their similar appearance. In essence, the work [46] assumes Borel measurability with respect to the Gromov-Hausdorff metric ([46], Section 2), which implies the semigroup property of the respective pruning ([46], Lemma 3.11). In contrast, the generalized dynamical pruning defined here can have the semigroup property only for very particular choices of $\varphi(T)$, such as those in Sect. 9.1.1 and 9.1.2. Most natural choices of $\varphi(T)$, including the tree length $\varphi(T)=\textsc{length}(T)$ (Sect. 9.1.3) and the number of leaves in a tree (Sect. 9.1.4), do not yield the semigroup property and hence are not covered by the pruning of [46]. The main results of our Sect. 10 concern the pruning function $\varphi(T)=\textsc{length}(T)$, which does not satisfy the semigroup property, as shown in Sect. 9.1.3.

Curiously, for the above two examples with no semigroup property, i.e., when $\varphi(T)=\textsc{length}(T)$ and when $\varphi(T)$ equals the number of leaves in $T$ , the following discontinuity property holds with respect to the Gromov-Hausdorff metric $d_{\sf GH}$ defined in [52, 53, 46]. For any $\epsilon>0$ and any $M>0$ , there exist trees $T$ and $T^{\prime}$ in $\mathcal{L}$ such that

[TABLE]

Indeed, if $\varphi(T)=\textsc{length}(T)$, we consider a tree $T$ whose number of leaves exceeds $M/\epsilon$, and let $T^{\prime}$ be the tree obtained from $T$ by elongating each of its leaf edges by $\epsilon$; then $\textsc{length}(T^{\prime})-\textsc{length}(T)=\epsilon\cdot\#\{\text{leaves of }T\}>M$, while every point of $T^{\prime}$ lies within distance $\epsilon$ of $T$. Similarly, if $\varphi(T)$ is the number of leaves in $T$, we construct $T^{\prime}$ from $T$ by attaching at least $M/\epsilon$ new leaves, each of length $\epsilon$.

9.4 Invariance with respect to the generalized dynamical pruning

Consider a tree $T\in\mathcal{L}_{\rm plane}$ with edge lengths given by a vector $l_{T}=(l_{1},\dots,l_{\#T})$. The vector $l_{T}$ can be specified by the distribution $\chi(\cdot)$ of a point $x_{T}=(x_{1},\dots,x_{\#T})$ on the standard simplex

$$\Delta^{\#T}=\Big\{x\in\mathbb{R}^{\#T}:~x_{i}\geq 0,~\sum_{i=1}^{\#T}x_{i}=1\Big\}$$

and the conditional distribution $F(\cdot|x_{T})$ of the tree length $\textsc{length}(T)$, so that

$$l_{T}=\textsc{length}(T)\cdot x_{T}.$$

Accordingly, a tree $T$ can be completely specified by its planar shape, a vector of proportional edge lengths, and the total tree length:

[TABLE]

A measure $\eta$ on $\mathcal{L}_{\rm plane}$ is a joint distribution of these three components:

[TABLE]

where the tree planar shape is specified by

[TABLE]

the relative edge lengths are specified by

[TABLE]

and the total tree length is specified by

[TABLE]

Let us fix $t\geq 0$ and a function $\varphi:\mathcal{L}_{\rm plane}\rightarrow\mathbb{R}$ that is monotone nondecreasing with respect to the partial order $\preceq$ . We denote by $\mathcal{S}_{t}^{-1}(\varphi,T)$ the preimage of a tree $T\in\mathcal{L}_{\rm plane}$ under the generalized dynamical pruning:

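$$\mathcal{S}_{t}^{-1}(\varphi,T)=\big\{T^{\prime}\in\mathcal{L}_{\rm plane}:~\mathcal{S}_{t}(\varphi,T^{\prime})=T\big\}.$$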

Consider the distribution of edge lengths induced by the pruning:

[TABLE]

and

[TABLE]

where the notation $\tilde{T}:=\mathcal{S}_{t}(\varphi,T)$ is used for brevity.

Definition 35 (Generalized prune invariance).

Consider a function $\varphi:\mathcal{L}_{\rm plane}\rightarrow\mathbb{R}_{+}$ that is monotone nondecreasing with respect to the partial order $\preceq$ . A measure $\eta$ on $\mathcal{L}_{\rm plane}$ is called invariant with respect to the generalized dynamical pruning $\mathcal{S}_{t}(\cdot)=\mathcal{S}_{t}(\varphi,\cdot)$ (or simply prune invariant) if the following conditions hold for all $t\geq 0$ :

(i)

The measure is prune-invariant in shapes. This means that for the pushforward measure $\nu=(\mathcal{S}_{t})_{*}(\mu)=\mu\circ\mathcal{S}_{t}^{-1}$ we have

[TABLE]

(ii)

The measure is prune-invariant in edge lengths. This means that for any combinatorial planar tree $\tau\in\mathcal{T}_{\rm plane}$

[TABLE]

and there exists a scaling exponent $\zeta\equiv\zeta(\varphi,t)>0$ such that for any relative edge length vector $\bar{x}\in\Delta^{\#\tau}$ we have

[TABLE]

Remark 17 (Pruning trees with no embedding).

The generalized dynamical pruning (213) and the notion of prune invariance (Def. 35) can be similarly defined on the space $\mathcal{L}$ of metric trees with no planar embedding. In this work we only apply the concept of prune invariance to planar trees.

Remark 18 (Relation to Horton prune-invariance).

Definition 35 is similar to Def. 11 of prune invariance with respect to the Horton pruning, with combinatorial Horton pruning $\mathcal{R}$ being replaced with metric generalized dynamical pruning $\mathcal{S}_{t}$ .

The prune invariance of Def. 35 unifies multiple invariance properties examined in the literature. For example, the classical work by Jacques Neveu [105] establishes the prune invariance of the exponential critical binary Galton-Watson trees ${\sf GW}(\lambda)$ with respect to the tree erasure from the leaves down to the root at a unit rate, which is equivalent to the generalized dynamical pruning with function $\varphi(T)=\textsc{height}(T)$ (Sect. 9.1.1). The prune invariance with respect to the Horton pruning (Sect. 9.1.2) has been established by Burd et al. [29] for the combinatorial critical binary Galton-Watson $\mathcal{GW}\left({1\over 2},{1\over 2}\right)$ trees (Thm. 4 in Sect. 5.1.1). Duquesne and Winkel [46] established the prune-invariance of the exponential critical binary Galton-Watson ${\sf GW}(\lambda)$ trees with respect to the so-called hereditary property, which includes the tree erasure of Sect. 9.1.1 and Horton pruning of Sect. 9.1.2. The critical Tokunaga trees analyzed in Sect. 6.5 are prune-invariant with respect to the Horton pruning; this model includes ${\sf GW}(\lambda)$ trees as a special case. Section 9.5 below establishes the prune invariance of the exponential critical binary Galton-Watson ${\sf GW}(\lambda)$ trees with respect to the generalized pruning with an arbitrary pruning function $\varphi(T)$ .

9.5 Prune invariance of ${\sf GW}(\lambda)$

This section establishes prune invariance of exponential critical binary Galton-Watson trees with respect to arbitrary generalized pruning.

Theorem 24 ([85]).

Let $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\lambda)$ , $T\in\mathcal{BL}_{\rm plane}^{|}$ , be an exponential critical binary Galton-Watson tree with parameter $\lambda>0$ . Then, for any monotone nondecreasing function $\varphi:\mathcal{BL}_{\rm plane}^{|}\rightarrow\mathbb{R}_{+}$ and any $\Delta>0$ we have

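$$\big\{T^{\Delta}\,\big|\,T^{\Delta}\neq\phi\big\}\stackrel{d}{\sim}{\sf GW}\big(\lambda\,p_{\Delta}(\lambda,\varphi)\big),\qquad T^{\Delta}:=\mathcal{S}_{\Delta}(\varphi,T),$$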

where $p_{\Delta}(\lambda,\varphi)={\sf P}(\mathcal{S}_{\Delta}(\varphi,T)\not=\phi)$ . That is, the pruned tree $T^{\Delta}$ conditioned on surviving is an exponential critical binary Galton-Watson tree with parameter

$$\mathcal{E}_{\Delta}(\lambda,\varphi)=\lambda\,p_{\Delta}(\lambda,\varphi).$$

Proof.

Let $X$ denote the length of the stem (edge adjacent to the root) in $T$, and $Y$ denote the length of the stem in $T^{\Delta}$. Let $x$ be the descendant vertex (a junction or a leaf) of $T$ nearest to the root. Then $X$, which is an exponential random variable with parameter $\lambda$, represents the distance from the root of $T$ to $x$. Let ${\sf deg}_{T}(x)$ denote the degree of $x$ in tree $T$ and ${\sf deg}_{T^{\Delta}}(x)$ denote the degree of $x$ in tree $T^{\Delta}$. If $T^{\Delta}=\phi$, then $Y=0$. Let

[TABLE]

The event $\{Y\leq h\}$ is partitioned into the following non-overlapping sub-events S1, ..., S4 illustrated in Fig. 37:

(S1)

The event $\{{\sf deg}_{T}(x)=1\text{ and }X\leq h\}$ has probability

[TABLE]

(S2)

The event

[TABLE]

has probability

[TABLE]

(S3)

The event that $X\leq h$, ${\sf deg}_{T}(x)=3$, and either both subtrees of $T$ descending from $x$ are pruned away completely (do not intersect $T^{\Delta}$) or $x\in T^{\Delta}$ with ${\sf deg}_{T^{\Delta}}(x)=3$, has probability

[TABLE]

(S4)

The event

[TABLE]

has probability (here, ${\sf deg}_{T^{\Delta}}(x)=2$ means that $x$ is neither a junction nor a leaf in $T^{\Delta}$)

[TABLE]

Using this we have two representations for the probability ${\sf P}(Y\leq h)$ :

[TABLE]

which simplifies to

[TABLE]

Differentiating the above equality we obtain the following equation for the p.d.f. $f(y)={d\over dy}F(y)$ of $Y$ :

[TABLE]

where, as before, $\phi_{\lambda}$ denotes the exponential density with parameter $\lambda$ as in (69). Taking the Fourier transform of both sides of the equation, we obtain the characteristic function $\widehat{f}(s)={\sf E}\big{[}e^{isY}\big{]}$ of $Y$,

[TABLE]

Thus, we conclude that, conditionally on $T^{\Delta}\neq\phi$, $Y$ is an exponential random variable with parameter $\lambda p_{\Delta}$.

Next, let $y$ be the descendant vertex (a junction or a leaf) of $T^{\Delta}$ nearest to the root. If $T^{\Delta}=\phi$, let $y$ denote the root. Let

[TABLE]

Then,

[TABLE]

implying

[TABLE]

which in turn yields $q={1\over 2}$ .

We saw that, conditionally on $\mathcal{S}_{\Delta}(\varphi,T)\not=\phi$, the pruned tree $T^{\Delta}$ has its stem length exponentially distributed with parameter $\lambda p_{\Delta}$. Then, with probability $q={1\over 2}$, the pruned tree $T^{\Delta}$ branches at $y$ (the stem end point farthest from the root) into two independent subtrees, each distributed as $\{T^{\Delta}~{}|~{}T^{\Delta}\neq\phi\}$. Thus, we recursively obtain that $T^{\Delta}$, conditioned on $T^{\Delta}\neq\phi$, is a critical binary Galton-Watson tree with i.i.d. exponential edge lengths with parameter $\lambda p_{\Delta}$. ∎

Next, we find an exact form of the survival probability $p_{\Delta}(\lambda,\varphi)$ for three particular choices of $\varphi$ , thus obtaining $\mathcal{E}_{\Delta}(\lambda,\varphi)$ .

Theorem 25 ([85]).

In the settings of Theorem 24, we have

(a)

If $\varphi(T)$ equals the total length of $T$ $(\varphi=\textsc{length}(T))$ , then

[TABLE]

(b)

If $\varphi(T)$ equals the height of $T$ $(\varphi=\textsc{height}(T))$ , then

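$$p_{\Delta}(\lambda,\varphi)={2\over\lambda\Delta+2}\qquad\text{and}\qquad\mathcal{E}_{\Delta}(\lambda,\varphi)={2\lambda\over\lambda\Delta+2}.$$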

(c)

If $\varphi(T)+1$ equals the Horton-Strahler order of the tree $T$ , then

[TABLE]

where $\lfloor\Delta\rfloor$ denotes the maximal integer $\leq\Delta$ .

Proof.

Part (a). Suppose $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\lambda)$ , and let $\ell(x)$ once again denote the p.d.f. of the total length $\textsc{length}(T)$ . Then, by Lemma 8,

[TABLE]

where for the last equality we used formula 11.3.14 in [2].

Part (b). Suppose $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\lambda)$ . Let ${\sf H}(x)$ once again denote the cumulative distribution function of the height $\textsc{height}(T)$ . Then by Lemma 9, for any $\Delta>0$ ,

[TABLE]

Part (c). Follows from Corollary 12(a). ∎
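Parts (a) and (b) can be checked by a quick Monte Carlo experiment. The Python sketch below assumes that the pruned tree survives iff $\varphi(T)\geq\Delta$, so that $p_{\Delta}={\sf P}(\varphi(T)\geq\Delta)$, and grows each ${\sf GW}(\lambda)$ tree only until this event is decided. For the height we compare with $2/(\lambda\Delta+2)$, the value given by part (b). For the length we compare with $e^{-\lambda\Delta}\big[I_{0}(\lambda\Delta)+I_{1}(\lambda\Delta)\big]$, which is our own computation: it is the tail $\int_{\Delta}^{\infty}\ell(x)\,dx$ of the total-length density $\ell(x)=x^{-1}e^{-\lambda x}I_{1}(\lambda x)$ of a ${\sf GW}(\lambda)$ tree, using ${d\over dx}\big[e^{-x}(I_{0}(x)+I_{1}(x))\big]=-e^{-x}I_{1}(x)/x$. The function names, sample size, and seed are illustrative choices.

```python
import math
import random


def bessel_i(nu: int, x: float, terms: int = 60) -> float:
    """Modified Bessel function I_nu(x) for integer nu >= 0 via its power series."""
    return sum((x / 2) ** (2 * k + nu) / (math.factorial(k) * math.factorial(k + nu))
               for k in range(terms))


def reaches(lam: float, delta: float, rng: random.Random, use_length: bool) -> bool:
    """Grow a GW(lam) tree only until we know whether phi(T) >= delta.

    phi is the total length if use_length is True and the height otherwise.
    Edges are Exp(lam); each edge ends in a leaf or a binary split w.p. 1/2.
    Both phi's only grow as more edges are revealed, so stopping early is safe.
    """
    pending = [0.0]   # distances from the root at which unexplored edges start
    total = 0.0       # total length revealed so far
    while pending:
        start = pending.pop()
        edge = rng.expovariate(lam)
        end = start + edge
        total += edge
        if (total if use_length else end) >= delta:
            return True
        if rng.random() < 0.5:
            pending += [end, end]
    return False


def survival_estimate(lam: float, delta: float, use_length: bool,
                      n: int = 20_000, seed: int = 1) -> float:
    rng = random.Random(seed)
    return sum(reaches(lam, delta, rng, use_length) for _ in range(n)) / n


lam, delta = 1.0, 2.0
p_height = 2 / (lam * delta + 2)
# Closed form for phi = length: our own computation, see the lead-in.
p_length = math.exp(-lam * delta) * (bessel_i(0, lam * delta) + bessel_i(1, lam * delta))
est_height = survival_estimate(lam, delta, use_length=False)
est_length = survival_estimate(lam, delta, use_length=True)
print(f"height: simulated {est_height:.3f} vs 2/(lam*delta+2) = {p_height:.3f}")
print(f"length: simulated {est_length:.3f} vs closed form {p_length:.3f}")
```

Since $\textsc{length}(T)\geq\textsc{height}(T)$, the length-based survival probability is always the larger of the two.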

Remark 19.

Let ${\mathcal{E}}_{\Delta}(\lambda,\varphi)={2\lambda\over\lambda\Delta+2}$ as in Theorem 25(b). Here ${\mathcal{E}}_{0}(\lambda,\varphi)=\lambda$, and ${\mathcal{E}}_{\Delta}(\lambda,\varphi)$ is the linear-fractional transformation of $\lambda$ associated with the matrix

$${\mathcal{A}}_{\Delta}=\begin{pmatrix}1&0\\ \Delta/2&1\end{pmatrix}.$$

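Since the matrices ${\mathcal{A}}_{\Delta}$ satisfy ${\mathcal{A}}_{\Delta_{1}}{\mathcal{A}}_{\Delta_{2}}={\mathcal{A}}_{\Delta_{1}+\Delta_{2}}$ (they form a one-parameter subgroup of $SL_{2}(\mathbb{R})$ when $\Delta$ ranges over $\mathbb{R}$), the transformations $\left\{{\mathcal{E}}_{\Delta}\right\}_{\Delta\geq 0}$ satisfy the semigroup property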

$${\mathcal{E}}_{\Delta_{1}}\circ{\mathcal{E}}_{\Delta_{2}}={\mathcal{E}}_{\Delta_{1}+\Delta_{2}}$$

for any pair $\Delta_{1},\Delta_{2}\geq 0$ .

We notice also that the operator ${\mathcal{E}}_{\Delta}(\lambda,\varphi)$ in part (c) of Theorem 25 satisfies only the discrete semigroup property for nonnegative integer times. Finally, one can check that ${\mathcal{E}}_{\Delta}(\lambda,\varphi)$ in part (a) does not satisfy the semigroup property.
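The contrast between parts (a) and (b) can be checked numerically. In the Python sketch below, `a_matrix` uses our assumed normalization $\mathcal{A}_{\Delta}=\begin{pmatrix}1&0\\ \Delta/2&1\end{pmatrix}\in SL_{2}(\mathbb{R})$, which maps $\lambda$ to $2\lambda/(\lambda\Delta+2)$, and `e_length` uses $\lambda e^{-\lambda\Delta}\big[I_{0}(\lambda\Delta)+I_{1}(\lambda\Delta)\big]$ for $\varphi=\textsc{length}$, which is our own evaluation of $\lambda\,{\sf P}(\textsc{length}(T)\geq\Delta)$ for ${\sf GW}(\lambda)$ (the tail of the density $x^{-1}e^{-\lambda x}I_{1}(\lambda x)$). The height map composes additively in $\Delta$; the length map does not.

```python
import math


def bessel_i(nu: int, x: float, terms: int = 60) -> float:
    """Modified Bessel function I_nu(x) for integer nu >= 0 via its power series."""
    return sum((x / 2) ** (2 * k + nu) / (math.factorial(k) * math.factorial(k + nu))
               for k in range(terms))


def mobius(m, z):
    """Linear-fractional map z -> (a z + b) / (c z + d) for m = ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)


def a_matrix(delta):
    """Assumed normalization of A_Delta in SL_2(R); it sends lam to 2 lam / (lam delta + 2)."""
    return ((1.0, 0.0), (delta / 2, 1.0))


def e_height(delta, lam):
    """E_Delta(lam, height) of Theorem 25(b)."""
    return mobius(a_matrix(delta), lam)


def e_length(delta, lam):
    """Assumed closed form lam * P(length(T) >= delta) for T ~ GW(lam), see the lead-in."""
    x = lam * delta
    return lam * math.exp(-x) * (bessel_i(0, x) + bessel_i(1, x))


lam = 1.0
print(e_height(1.0, e_height(1.0, lam)), e_height(2.0, lam))  # agree up to rounding
print(e_length(1.0, e_length(1.0, lam)), e_length(2.0, lam))  # differ
```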

10 Continuum 1-D ballistic annihilation

As an illuminating application of the generalized dynamical pruning (Sect. 9) and its invariance properties (Sect. 9.4), we consider the dynamics of particles governed by the $1$-D ballistic annihilation model, traditionally denoted $A+A\rightarrow\varnothing$ [47]. This model describes the dynamics of particles on a real line: a particle with Lagrangian coordinate $x$ moves with a constant velocity $v(x)$ until it collides with another particle, at which moment both particles annihilate, hence the model notation. The annihilation dynamics appears in chemical kinetics and bimolecular reactions and has received attention in the physics and probability literature [47, 20, 19, 115, 42, 21, 48, 28, 86, 126].

In a continuum version of the ballistic annihilation model introduced in [85], the moving shock waves represent the sinks that aggregate the annihilated particles and hence accumulate the mass of the medium. The dynamics of these sinks resembles a coalescent process that generates a tree structure for their trajectories, which explains the term shock wave tree that we use below. The dynamics of a ballistic annihilation model with two coalescing sinks is illustrated in Fig. 38.

Sect. 10.1 introduces the continuum annihilation model and describes the natural emergence of sinks (shocks). The model initial conditions are given by a particle velocity distribution and a particle density on $\mathbb{R}$. Subsequently, we only consider a constant density and an initial velocity distribution with alternating values $\pm 1$, or, equivalently, an initial piece-wise linear potential $\psi(x,0)$ with alternating slopes $\pm 1$ (Fig. 39). Section 10.2 discusses a construction of the graphical embedding of the shock wave tree into the phase space $(x,\psi(x,t))$ and the space-time domain $(x,t)$. Theorems 27 and 28 in Sect. 10.2.3 establish the equivalence of the ballistic annihilation dynamics to the generalized dynamical pruning of a (mass-equipped) shock wave tree. Sections 10.3 and 10.4 illustrate how the pruning interpretation of annihilation dynamics facilitates analytical treatment of the model. Specifically, we give a complete description of the time-advanced potential function $\psi(x,t)$ at any instant $t>0$ for an initial potential in the form of an exponential excursion (Thm. 29), and describe the temporal dynamics of a random sink (Thms. 30 and 31). A real tree representation of ballistic annihilation is discussed in Sect. 10.5.

10.1 Continuum model, sinks, and shock trees

Consider a Lebesgue measurable initial density $g(x)\geq 0$ of particles on an interval $[a,b]\subset\mathbb{R}$ . The initial particle velocities are given by $v(x,0)=v(x)$ . Prior to collision and subsequent annihilation, a particle located at $x_{0}$ at time $t=0$ moves according to its initial velocity, so its coordinate $x(t)$ changes as

$$x(t)=x_{0}+v(x_{0})\,t.$$

When the particle collides with another particle, it annihilates. Accordingly, two particles with initial coordinates and velocities $(x_{-},v_{-})$ and $(x_{+},v_{+})$ collide and annihilate at time $t$ when they meet at the same new position,

$$x_{-}+v_{-}t=x_{+}+v_{+}t,$$

given that neither of the particles annihilated prior to $t$ . In this case, the annihilation time is given by

$$t={x_{+}-x_{-}\over v_{-}-v_{+}}.$$

Let $v(x,t)$ be the Eulerian specification of the velocity field at coordinate $x$ and time instant $t$ ; we define the corresponding potential function

[TABLE]

so that $v(x,t)=-\partial_{x}\psi(x,t)$ . Let $\psi(x,0)=\Psi_{0}(x)$ be the initial potential.

We call a point $\sigma(t)$ a sink (or shock) if there exist two particles that annihilate at coordinate $\sigma(t)$ at time $t$. Suppose $v(x)\in C^{1}(\mathbb{R})$. Equation (219) implies that the appearance of a sink is associated with a point $x^{*}$ at which $v^{\prime}$ attains a negative local minimum; we call such points sink sources. Specifically, if $x^{*}$ is a sink source, then a sink appears at the breaking time $t^{*}=-1/v^{\prime}(x^{*})$ at the location given by

$$\sigma(t^{*})=x^{*}+v(x^{*})\,t^{*},$$

provided there exists a punctured neighborhood

$$N_{\delta}(x^{*})=(x^{*}-\delta,\,x^{*})\cup(x^{*},\,x^{*}+\delta),\qquad\delta>0,$$

such that none of the particles with the initial coordinates in $N_{\delta}(x^{*})$ is annihilated before time $t^{*}$ .
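The breaking-time criterion can be sketched numerically. Two particles starting at $x$ and $x+dx$ satisfy $x+v(x)t=x+dx+v(x+dx)t$ at $t=-dx/(v(x+dx)-v(x))\approx-1/v^{\prime}(x)$, so the earliest sink forms where $v^{\prime}$ is most negative. The Python sketch below only locates the global minimum of a central-difference estimate of $v^{\prime}$ on a uniform grid and ignores the neighborhood condition above (it assumes no earlier annihilation); the function name and grid size are our choices. For $v(x)=-\sin x$ on $[-2,2]$ it returns $x^{*}\approx 0$, $t^{*}\approx 1$, $\sigma(t^{*})\approx 0$.

```python
import math
from typing import Callable, Tuple


def first_sink(v: Callable[[float], float], a: float, b: float,
               n: int = 20_000) -> Tuple[float, float, float]:
    """Earliest breaking point of a smooth velocity field v on [a, b].

    Returns (x_star, t_star, sigma): the grid point where the central-difference
    estimate of v' is most negative, the breaking time -1/v'(x_star), and the
    sink location x_star + v(x_star) * t_star. Earlier annihilations are ignored.
    """
    h = (b - a) / n
    x_star, slope_min = a, math.inf
    for i in range(1, n):
        x = a + i * h
        slope = (v(x + h) - v(x - h)) / (2 * h)  # central difference for v'(x)
        if slope < slope_min:
            x_star, slope_min = x, slope
    if slope_min >= 0:
        raise ValueError("v' >= 0 on [a, b]: neighbouring particles never collide")
    t_star = -1.0 / slope_min
    return x_star, t_star, x_star + v(x_star) * t_star


print(first_sink(lambda x: -math.sin(x), -2.0, 2.0))  # approximately (0.0, 1.0, 0.0)
```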

Sinks, which originate at sink sources, can move and coalesce (see Fig. 38). We refer to a sink trajectory as a shock wave. We impose the conservation of mass condition by defining the mass of a sink at time $t$ to be the total mass of particles annihilated in the sink between time zero and time $t$. When sinks coalesce, their masses add up. It will be convenient to assume that sinks do not disappear when they stop accumulating mass. Informally, we assume that the sinks are being pushed by the system particles. Formally, there are three cases depending on the occupancy of a neighborhood of $\sigma(t)$. If there exists an empty neighborhood around the sink coordinate $\sigma(t)$, the sink is considered at rest: its coordinate does not change. If only the left neighborhood of $\sigma(t)$ is empty, and the right adjacent velocity is negative:

$$v(\sigma_{+},t):=\lim_{\epsilon\downarrow 0}v(\sigma(t)+\epsilon,t)<0,$$

the sink at $\sigma(t)$ moves with velocity $v(\sigma_{+},t)$ . A similar rule is applied to the case of right empty neighborhood. The appearance, motion, and subsequent coalescence of sinks can be described by a time oriented shock tree. In particular, the coalescence of sinks under initial conditions with a finite number of sink sources is described by a finite tree.

The dynamics of ballistic annihilation, in either its discrete or continuum version, can be quite intricate and lacks a general description. Existing analyses focus on the evolution of selected statistics under particular initial conditions. In the following sections, we give a complete description of the dynamics in the case of a two-valued initial velocity and constant particle density.

10.2 Piece-wise linear potential with unit slopes

The discrete 1-D ballistic annihilation model with two possible velocities $\pm v$ was considered in [47, 19, 21, 48, 28]; the three-velocity case ($-1$, $0$, and $+1$) appeared in [42, 126]. Here, we explore a continuum version of the 1-D ballistic annihilation with two possible initial velocities and constant initial density, i.e. $v(x)=\pm v$ and $g(x,0)\equiv g(x)\equiv g_{0}$ for $x\in[a,b]$. Since we can scale both space and time, without loss of generality we let $v(x)=\pm 1$ and $g(x)\equiv 1$.

Recall (Sect. 7.3) the space $\mathcal{E}^{\rm ex}$ of positive piece-wise linear continuous excursions with alternating slopes $\pm 1$ and a finite number of segments. We write ${\mathcal{E}}^{\sf ex}([a,b])$ for the restriction of this space to the real interval $[a,b]$. We consider an initial potential $\psi(x,0)=\Psi_{0}(x)$ such that $-\psi(x,0)\in{\mathcal{E}}^{\sf ex}([a,b])$; see Fig. 39. This space has many symmetries that facilitate our analysis.

The dynamics of a system with a simple unit slope potential is illustrated in Fig. 40. Prior to collision, the particles move at unit speed either to the left or to the right, so their trajectories in the $(x,t)$ space are given by lines with slope $\pm 1$ (Fig. 40, top panel, gray lines). The local minima of the potential $\Psi_{0}(x)$ correspond to the points whose right neighborhood moves to the left and left neighborhood moves to the right with unit speed, hence immediately creating a sink. Accordingly, the sinks appear at $t=0$ at the local minima of the potential; and those are the only sinks of the system. The sinks move and merge to create a shock wave tree, shown in blue in Fig. 40.

Observe that the domain $[a,b]$ is partitioned into non-overlapping subintervals with boundaries $x_{j}$ such that the initial particle velocity is constant on each subinterval and alternates between $+1$ and $-1$ on consecutive subintervals, with boundary values $v(a,0)=v(a)=1$ and $v(b,0)=v(b)=-1$. Because of the choice of potential $\Psi_{0}(x)$, we have

$$\int_{a}^{b}v(x)\,dx=\Psi_{0}(a)-\Psi_{0}(b)=0,$$

i.e. the total length of the subintervals with the initial velocity $-1$ equals the total length of the subintervals with the initial velocity $1$ . For a finite interval $[a,b]$ , there exists a finite time $t_{\rm max}=(b-a)/2$ at which all particles aggregate into a single sink of mass $m=(b-a)=2\,t_{\rm max}$ [85]. We only consider the solution on the time interval $[0,t_{\rm max}]$ , and assume that the density of particles vanishes outside of $[a,b]$ .
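For intuition, a discrete analog of the two-velocity model with point particles can be solved exactly with a stack. With speeds $\pm 1$, only a right-mover followed (to its right) by a left-mover can meet, at time $(x_{+}-x_{-})/2$ by the annihilation-time formula with $v_{-}=1$, $v_{+}=-1$, and the annihilating pairs are matched like parentheses. This is our own illustration, not the continuum model of [85], and the function name is hypothetical. If the velocities, read as up ($+1$) and down ($-1$) steps, form a positive excursion, as the slopes of $-\Psi_{0}$ do here, then every particle is matched and the outermost pair meets last; for particles at the midpoints of unit cells of $[a,b]$ that last time is $(b-a-1)/2$, approaching $t_{\rm max}=(b-a)/2$ as the spacing shrinks.

```python
from typing import List, Tuple


def annihilate_pm1(positions: List[float], velocities: List[int]) -> List[Tuple[int, int, float]]:
    """Exact annihilation pairs for point particles with velocities +1 / -1.

    `positions` must be sorted increasingly. Returns (i, j, time) for each pair in
    which the right-mover i meets the left-mover j at time (x_j - x_i) / 2.
    """
    pairs = []
    waiting = []                      # unmatched right-movers, nearest one on top
    for j, (x, v) in enumerate(zip(positions, velocities)):
        if v == 1:
            waiting.append(j)
        elif v == -1 and waiting:
            i = waiting.pop()         # nearest unmatched right-mover to the left
            pairs.append((i, j, (x - positions[i]) / 2))
        # a left-mover with no partner escapes to the left
    return pairs                      # right-movers still waiting escape to the right


print(annihilate_pm1([0.5, 1.5, 2.5, 3.5, 4.5, 5.5], [1, 1, -1, 1, -1, -1]))
# [(1, 2, 0.5), (3, 4, 0.5), (0, 5, 2.5)]
```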

10.2.1 Graphical representation of the shock wave tree

For our fixed choice of the initial particle density $g(x)\equiv 1$, the model dynamics is completely determined by the potential $\Psi_{0}(x)$. We will be particularly interested in the dynamics of sinks (shocks), whose trajectories we refer to as shock waves. The trajectories of sinks can be described by a set (Fig. 40, top panel)

[TABLE]

in the system space-time domain $(x,t):x\in[a,b],~{}t\in\big{[}0,(b-a)/2\big{]}.$ These trajectories have a finite binary tree structure: the combinatorial planar shape of $\mathcal{G}^{(x,t)}(\Psi_{0})$ is a finite tree in $\mathcal{BT}_{\rm plane}^{|}$ [85]. For any two points $(x_{i},t_{i})\in\mathcal{G}^{(x,t)}(\Psi_{0})$ , $i=1,2$ , connected by a unique self-avoiding path $\gamma$ within $\mathcal{G}^{(x,t)}(\Psi_{0})$ , we define the distance between them as

[TABLE]

where

[TABLE]

Equivalently, the distance between two points within a single edge is defined as the absolute difference of their time coordinates; this induces the distance $d^{(x,t)}$ on $\mathcal{G}^{(x,t)}(\Psi_{0})$.

Similarly, the trajectories of the sinks can be described by a set (Fig. 40, bottom panel)

[TABLE]

in the system phase space $(x,\psi(x,t)):~{}x\in[a,b],~{}t\in\big{[}0,(b-a)/2\big{]}.$ For any two points $(x_{i},\psi_{i})\in\mathcal{G}^{(x,\psi)}(\Psi_{0})$ , $i=1,2$ , connected by a unique self-avoiding path $\gamma$ within $\mathcal{G}^{(x,\psi)}(\Psi_{0})$ , we define the distance between them as

[TABLE]

Equivalently, one can consider the $L^{1}$ distance between the points within a single edge; this induces the distance $d^{(x,\psi)}$ on $\mathcal{G}^{(x,\psi)}(\Psi_{0})$ .

Lemma 30 ([85]).

The metric spaces $\big{(}\mathcal{G}^{(x,t)}(\Psi_{0}),d^{(x,t)}\big{)}$ and $\big{(}\mathcal{G}^{(x,\psi)}(\Psi_{0}),d^{(x,\psi)}\big{)}$ are trees (Def. 1). Furthermore, they have a finite number of edges and are isometric to a unique binary tree from $\mathcal{BL}_{\rm plane}^{|}$ that we denote by $S(\Psi_{0})$.

We refer to the trees of Lem. 30 as the graphical trees $\mathcal{G}^{(x,t)}(\Psi_{0})$ and $\mathcal{G}^{(x,\psi)}(\Psi_{0})$ since they are two alternative graphical representations of the shock wave tree $S(\Psi_{0})$ .

10.2.2 Structure of the shock wave tree

Importantly, for our particular choice of the initial potential, the combinatorial structure and the planar embedding of the shock wave tree coincide with that of the level set tree $T=\textsc{level}\big{(}-\Psi_{0}\big{)}$ of the initial potential, as we state in the following theorem.

Theorem 26 (Shock wave tree is a level set tree, [85]).

Suppose $g(x)\equiv 1$ and the initial potential $\Psi_{0}(x)$ is such that $-\Psi_{0}(x)\in\mathcal{E}^{\sf ex}$ . Then

[TABLE]

Theorem 26 implies that there is a one-to-one correspondence between the internal local maxima of $\Psi_{0}(x)$ and the internal non-root vertices of $S(\Psi_{0})$. There is also a one-to-one correspondence between the local minima of $\Psi_{0}(x)$ and the leaves. We label the tree vertices with the indices $j$ that correspond to the enumeration of the local extrema $x_{j}$ of $\Psi_{0}(x)$; see Fig. 41. We write ${\sf parent}(i)$ for the index of the parent of vertex $i$; ${\sf right}(i)$ and ${\sf left}(i)$ for the indices of the right and left offspring of an internal vertex $i$; and ${\sf sibling}(i)$ for the index of the sibling of vertex $i$.
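Since, by Theorem 26, the planar shape of $S(\Psi_{0})$ can be read off the level set tree of $-\Psi_{0}$, here is a minimal Python sketch that builds this level set tree from the successive local extreme values of $-\Psi_{0}$. The encoding (nested `(edge_length, children)` tuples, edge lengths equal to level differences) and the function names are ours, and distinct local minimum values are assumed so that the tree is binary. The deepest interior local minimum of $-\Psi_{0}$ becomes the first junction above the root, and the construction recurses on the two sides; leaves correspond to the local maxima of $-\Psi_{0}$, i.e. to the local minima of $\Psi_{0}$, where the initial sinks sit.

```python
from typing import List, Tuple

Tree = Tuple[float, list]  # (parental edge length, [children in planar order])


def level_set_tree(extremes: List[float]) -> Tree:
    """Level set tree of a positive excursion with finitely many linear pieces.

    `extremes` lists the successive extreme values [0, max_1, min_1, ..., max_n, 0];
    odd positions are local maxima (leaves), interior even positions are local
    minima (junctions). Distinct minimum values are assumed.
    """
    def build(i: int, j: int, base: float) -> Tree:
        if j == i + 2:  # a single peak between positions i and j: a leaf
            return (extremes[i + 1] - base, [])
        k = min(range(i + 2, j, 2), key=lambda m: extremes[m])  # deepest interior minimum
        level = extremes[k]
        return (level - base, [build(i, k, level), build(k, j, level)])

    return build(0, len(extremes) - 1, 0.0)


def count_leaves(tree: Tree) -> int:
    _, children = tree
    return 1 if not children else sum(count_leaves(c) for c in children)


# -Psi_0 with local maxima 3, 2, 4 and local minima 1, 0.5.
tree = level_set_tree([0, 3, 1, 2, 0.5, 4, 0])
print(tree)                # (0.5, [(0.5, [(2, []), (1, [])]), (3.5, [])])
print(count_leaves(tree))  # 3 leaves, i.e. three initial sinks
```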

For a local extremum $x_{j}$ , we define its basin $\mathcal{B}_{j}$ as the shortest interval that contains $x_{j}$ and supports a non-positive excursion of $\Psi_{0}(x)$ . Formally, $\mathcal{B}_{j}=[x^{\rm left}_{j},x^{\rm right}_{j}]$ , where

[TABLE]

We observe that the basin $\mathcal{B}_{j}$ of a local minimum $x_{j}$ degenerates to a single point: $\mathcal{B}_{j}=\{x_{j}\}$, with $x_{j}=x^{\rm left}_{j}=x^{\rm right}_{j}$.

The basin’s length is $\big{|}\mathcal{B}_{j}\big{|}=x^{\rm right}_{j}-x^{\rm left}_{j}$ . Point $c_{j}=(x^{\rm right}_{j}+x^{\rm left}_{j})/2$ denotes the center of the basin $\mathcal{B}_{j}$ . Additionally, we let

[TABLE]

We are now ready to describe the metric structure of the shock tree $S(\Psi_{0})$ and a constructive embedding $\mathcal{G}^{(x,\psi)}(\Psi_{0})$ of the tree $S(\Psi_{0})$ into the system’s phase space.

Metric tree structure. The length $l_{j}$ of the parental edge of a non-root vertex $j$ within $S(\Psi_{0})$ is given by $l_{j}=\mathrm{v}_{j}+\mathrm{h}_{j}.$

Graphical shock tree in the phase space. The tree $\mathcal{G}^{(x,\psi)}(\Psi_{0})$ is the union of the following vertical and horizontal segments:

$(\mathrm{v})$

For every local extremum $x_{j}$ of $\Psi_{0}(x)$ there exists a vertical segment from $(c_{j},\Psi_{0}(x_{j}))$ to $(c_{j},\Psi_{0}(x_{j})+\mathrm{v}_{j})$ .

(h)

For every local maximum $x_{j}$ of $\Psi_{0}(x)$ there exists a horizontal segment of length $\mathrm{h}_{{\sf left}(j)}+\mathrm{h}_{{\sf right}(j)}$ from $(c_{{\sf left}(j)},\Psi_{0}(x_{j}))$ to $(c_{{\sf right}(j)},\Psi_{0}(x_{j}))$ .

Figure 41 shows the graphical shock trees $\mathcal{G}^{(x,\psi)}$ and $\mathcal{G}^{(x,t)}$ for an initial potential with two local maxima and three local minima, and illustrates the labeling of vertical ( $\mathrm{v}_{j}$ ) and horizontal ( $\mathrm{h}_{j}$ ) segments of the tree. Figure 42 shows an example of the graphical tree $\mathcal{G}^{(x,\psi)}$ for an initial potential with nine local minima (and, hence, with nine initial sinks).

Consider a tree $\mathcal{V}(\Psi_{0})\in\mathcal{BL}_{\rm plane}^{|}$ that has the same planar combinatorial structure as $S(\Psi_{0})$ , and the length of the parental edge of vertex $j$ is given by $l_{j}=\mathrm{v}_{j}$ . Informally, this is a tree that consists of the vertical segments of the graphical tree $\mathcal{G}^{(x,\psi)}(\Psi_{0})$ (Fig. 40, bottom). We have the following corollary of Thm. 26.

Corollary 21 ([85]).

Suppose $g(x)\equiv 1$ and potential $\Psi_{0}(x)$ is such that $-\Psi_{0}(x)\in\mathcal{E}^{\sf ex}$ . Then

[TABLE]

10.2.3 Ballistic annihilation as generalized pruning

This section shows that the dynamics of continuum ballistic annihilation with constant initial density and unit-slope potential is equivalent to the generalized dynamical pruning of either the shock wave tree (Thm. 27) or the level set tree of the potential (Thm. 28).

Suppose a tree $T\in\mathcal{BL}_{\rm plane}^{|}$ has a particular graphical representation $\mathcal{G}_{T}\subset\mathbb{R}^{2}$ implemented by a bijective isometry $f:T\rightarrow\mathcal{G}_{T}$ that maps the root of $T$ to the root of $\mathcal{G}_{T}$. We extend the notion of the generalized dynamical pruning $\mathcal{S}_{t}(\varphi,\mathcal{G}_{T})$ to the graphical tree $\mathcal{G}_{T}$ by considering the $f$-image of $\mathcal{S}_{t}(\varphi,T)$:

$$\mathcal{S}_{t}(\varphi,\mathcal{G}_{T}):=f\big(\mathcal{S}_{t}(\varphi,T)\big).$$

Consider a natural isometry (Lem. 30) between the shock wave tree $S(\Psi_{0})$ and either of the graphical shock trees, $\mathcal{G}^{(x,t)}(\Psi_{0})$ (in the space-time domain) or $\mathcal{G}^{(x,\psi)}(\Psi_{0})$ (in the phase space). The next theorem formalizes an observation that the dynamics of sinks is described by the continuous pruning (Sect. 9.1.1) of the shock wave tree.

Theorem 27 (Annihilation pruning I, [85]).

Suppose $g(x)\equiv 1$ , and the initial potential $\Psi_{0}(x)$ is such that $-\Psi_{0}(x)\in\mathcal{E}^{\sf ex}$ . Then, the dynamics of sinks is described by the generalized dynamical pruning $\mathcal{S}_{t}(\varphi,\mathcal{G})$ of either the graphical tree $\mathcal{G}=\mathcal{G}^{(x,\psi)}(\Psi_{0})$ (in the phase space) or $\mathcal{G}=\mathcal{G}^{(x,t)}(\Psi_{0})$ (in the space-time domain), with the pruning function $\varphi(T)=\textsc{height}(T)$ . Specifically, the locations of sinks at any instant $t\in[0,t_{\rm max})$ coincide with the location of the leaves of the pruned tree $\mathcal{S}_{t}(\varphi,\mathcal{G})$ .

Theorem 27 only refers to the dynamics of the sinks; it is, however, intuitively clear that the entire potential $\psi(x,t)$ at any given $t>0$ can be uniquely reconstructed from either of the pruned graphical trees, $\mathcal{G}^{(x,t)}(\Psi_{0})$ or $\mathcal{G}^{(x,\psi)}(\Psi_{0})$ . Because of the multiple symmetries [85], the graphical trees possess significant redundant information. It has been shown in [85] that the reduced tree $\mathcal{V}(\Psi_{0})$ (Cor. 21) equipped with information about the sinks provides a minimal description sufficient for reconstructing the entire continuum annihilation dynamics.

Lemma 31 ([85]).

Suppose $g(x)\equiv 1$ , and the initial potential $\Psi_{0}(x)$ is such that $-\Psi_{0}(x)\in\mathcal{E}^{\sf ex}$ . Then,

[TABLE]

Lemma 31 states that the level set tree (i.e., the sequence of the local extreme values) of $\psi(x,t)$ is uniquely reconstructed from the pruned tree $\mathcal{V}(\Psi_{0})$. This, however, is not sufficient to reconstruct the entire time-advanced potential, which has plateaus corresponding to the intervals of zero density (recall the empty regions in the top panels of Fig. 40). The information about such plateaus is lost in the pruned tree. It turns out that it suffices to remember “the size” of the pruned-out parts of the tree in order to completely reconstruct the annihilation dynamics from $\mathcal{V}(\Psi_{0})$. Specifically, we store the value $\varphi(\tau)$ for each subtree $\tau$ that has been pruned out. These values are stored in the cuts, the points where the pruned subtrees were attached to the initial tree; see Fig. 43(a). The set of cuts is the union of the leaves of the pruned tree and the vertices of the initial tree that became edge points in the pruned tree. A formal definition is given below.

Definition 36 (Cuts).

The set $\mathcal{D}_{t}(\varphi,T)$ of cuts in a pruned tree $\mathcal{S}_{t}(\varphi,T)$ is defined as the boundary of the pruned part of the tree

[TABLE]

We now define an extension $\widetilde{\mathcal{S}}_{t}(\varphi,T)$ of the generalized dynamical pruning that preserves the sizes of pruned subtrees. Such pruning starts with a tree from $\mathcal{BL}_{\rm plane}^{|}$ and results in a tree from the space of mass-equipped trees, denoted $\widetilde{\mathcal{BL}}_{\rm plane}^{|}$. The pruning $\widetilde{\mathcal{S}}_{t}(\varphi,T)$ of a tree $T\in\mathcal{BL}_{\rm plane}^{|}$ is a tree from $\widetilde{\mathcal{BL}}_{\rm plane}^{|}$ whose projection to $\mathcal{BL}_{\rm plane}^{|}$ coincides with $\mathcal{S}_{t}(\varphi,T)$. In addition, the tree is equipped with massive points placed at the cuts. Each massive point corresponds to a pruned-out subtree $\tau$ of $T$, with mass equal to $\varphi(\tau)$. If a cut is the boundary of two pruned subtrees (Fig. 43(a), cuts a and d), then it hosts two oriented masses. Such cuts are typical in prunings that do not have the semigroup property (see Fig. 36, Stage IV). Figure 43(b) illustrates the mass-equipped pruning $\widetilde{\mathcal{S}}_{t}(\varphi,T)$ with pruning function $\varphi=\textsc{length}$.
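A minimal Python sketch of $\widetilde{\mathcal{S}}_{t}(\textsc{length},T)$ applied once to an ordinary finite tree, under the assumed survival convention $\textsc{length}(\Delta_{x,T})\geq t$. Each vertex records its masses as (position, mass) pairs: position 0 or 1 marks a pruned left or right subtree, and $-1$ marks the mass below a leaf created by cutting an edge. This encoding and the function names are ours, and the sketch does not implement how existing masses evolve and coalesce under further pruning.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    edge: float                                            # parental edge length
    children: List["Node"] = field(default_factory=list)
    # masses at this vertex: (position of the pruned child, mass); position -1
    # marks the single mass below a leaf created by cutting an edge (our encoding)
    masses: List[Tuple[int, float]] = field(default_factory=list)


def subtree_length(v: Node) -> float:
    return sum(c.edge + subtree_length(c) for c in v.children)


def mass_prune(v: Node, t: float) -> Optional[Node]:
    """One-shot mass-equipped pruning by total length of an ordinary tree.

    Survival convention (assumed): keep a point iff its descendant length is >= t.
    Every pruned-away subtree tau leaves a point of mass length(tau) at its cut.
    """
    below = subtree_length(v)
    if below >= t:
        kept, masses = [], []
        for pos, c in enumerate(v.children):
            pc = mass_prune(c, t)
            if pc is None:                   # c's whole planted subtree is cut off
                masses.append((pos, c.edge + subtree_length(c)))
            else:
                kept.append(pc)
        return Node(v.edge, kept, masses)
    top = v.edge + below - t                 # surviving part next to the parent
    if top <= 0:
        return None
    return Node(top, [], [(-1, (v.edge - top) + below)])  # pruned part, equals t


T = Node(1.0, [Node(1.0), Node(2.0)])  # junction x with leaf edges a = 1, b = 2
print(mass_prune(T, 1.0))
print(mass_prune(T, 2.5))
```

At $t=1$ the junction keeps a left mass 1 and the right edge ends in a leaf of mass $1=t$; at $t=2.5$ the junction is a leaf carrying two oriented masses of total $3\geq t$, which, as described in Sect. 9.1.3, holds off further pruning of the stem until $t=3$.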

Next, we describe how to construct a potential $\psi_{T,t}(x)$ for a given $t\in[0,t_{\rm max}]$ and all $x\in[a,b]$ from a pruned mass-equipped tree $T=\widetilde{\mathcal{S}}_{t}(\textsc{length},\mathcal{V}(\Psi_{0}))$. Theorem 28 then shows that this reconstructed potential coincides with the time-advanced potential of the annihilation dynamics.

Construction 1 (Tree $\rightarrow$ potential).

Suppose $T=\widetilde{\mathcal{S}}_{t}(\textsc{length},\mathcal{V}(\Psi_{0}))$ . The corresponding potential $\psi_{T,t}(x)$ , with $-\psi_{T,t}(x)\in{\mathcal{E}}^{\sf ex}$ , is constructed in the following steps:

(1)

Construct the Harris path $H_{T}(x)$ for the projection of $T$ to $\mathcal{BL}_{\rm plane}^{|}$ (i.e., disregarding masses), and consider the negative excursion $-H_{T}(x)$ .

(2)

At every local minimum of $-H_{T}(x)$ that corresponds to a double mass $(m_{\rm L},m_{\rm R})$ , insert a horizontal plateau of length

[TABLE]

as illustrated in Fig. 44, Stage $2$ .

(3)

At every monotone point of $-H_{T}(x)$ that corresponds to an internal mass $m$ , insert a horizontal plateau of length $2m$ (Fig. 44, Stage $3$ ).

(4)

At every internal local maximum of $-H_{T}(x)$, insert a horizontal plateau of length $2t$ (Fig. 44, Stage $1$).
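Step (1) is the only step that requires a tree traversal. The Python sketch below lists the breakpoints of the Harris path of a planted binary plane tree with edge lengths, assuming the convention of Sect. 7.1 as we read it: the path starts at the root, moves at unit speed along the tree from left to right, peaks at the leaves and dips to the branching vertices, and has total duration twice the tree length. `harris_extrema` is a hypothetical helper; the potential of Construction 1 is built from the negative of this path, and the plateau insertions of steps (2) through (4) are not reproduced.

```python
from dataclasses import dataclass, field
from itertools import accumulate
from typing import List, Tuple


@dataclass
class Node:
    edge: float                                   # length of the edge above this vertex
    children: List["Node"] = field(default_factory=list)


def harris_extrema(root: Node) -> Tuple[List[float], List[float]]:
    """Breakpoints (x_k, H_k) of the Harris path of a planted binary plane tree.
    Heights are distances from the root: local maxima are leaves, interior
    local minima are branching vertices, visited in depth-first left-to-right order."""
    def visit(v: Node, h: float) -> List[float]:
        if not v.children:
            return [h]
        out: List[float] = []
        for k, c in enumerate(v.children):
            if k > 0:
                out.append(h)                     # return to the branching vertex
            out += visit(c, h + c.edge)
        return out

    heights = [0.0] + visit(root, root.edge) + [0.0]
    # Unit slopes: the time between consecutive breakpoints equals the height change.
    steps = (abs(b - a) for a, b in zip(heights, heights[1:]))
    xs = list(accumulate(steps, initial=0.0))
    return xs, heights
```

For the tree with stem 1, a left leaf edge 0.5 and a right subtree with edges 1, 2 and 0.3, the heights are 0, 1.5, 1, 4, 2, 2.3, 0 and the path ends at time 9.6, twice the tree length 4.8.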

The following theorem establishes the equivalence of the continuum annihilation dynamics and mass-equipped generalized dynamical pruning with respect to the tree length. In particular, it includes the statement of Lem. 31.

Theorem 28 (Annihilation pruning II, [85]).

Suppose $g(x)\equiv 1$ and the initial potential $\Psi_{0}(x)$ is such that $-\Psi_{0}(x)\in{\mathcal{E}}^{\sf ex}$. Then, for any $t\in[0,t_{\rm max}]$, the time-advanced potential $\psi(x,t)$ is uniquely reconstructed (by Construction 1) from the pruned tree $T(t)=\widetilde{\mathcal{S}}_{t}(\textsc{length},\mathcal{V}(\Psi_{0}))$. That is, $\psi(x,t)=\psi_{T(t),t}(x)$ for all $x\in[a,b]$.

Conversely, it is shown in [85] that the mass-equipped tree $\widetilde{\mathcal{S}}_{t}(\textsc{length},\mathcal{V}(\Psi_{0}))$ can be uniquely reconstructed from the time-advanced potential $\psi(x,t)$. Hence, the continuum ballistic annihilation dynamics is equivalent to the mass-equipped generalized dynamical pruning of the level set tree of the initial potential. The next sections illustrate how this equivalence facilitates the analytical treatment of the model.

10.3 Ballistic annihilation of an exponential excursion

This section examines a special case of a piecewise linear potential with unit slopes: a negative exponential excursion. Consider the potential

[TABLE]

that is the negative Harris path (Sect. 7.1) of an exponential critical binary Galton-Watson tree with parameter $\lambda$ (Def. 31). In words, the potential is a negative finite excursion with linear segments of alternating slopes $\pm 1$ , such that the lengths of all segments except the last one are i.i.d. exponential random variables with parameter $\lambda/2$ . Accordingly, the initial particle velocity $v(x,0)$ alternates between the values $\pm 1$ at epochs of a stationary Poisson point process on $\mathbb{R}$ with rate $\lambda/2$ , starting with $+1$ and until the respective potential crosses the zero level.
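This potential is easy to simulate. The minimal Python sketch below (hypothetical function name, with a seeded `random.Random` as the source of randomness) draws alternating up and down segments of unit slope with i.i.d. exponential lengths of rate $\lambda/2$, and truncates the first descending segment that would cross zero. It returns the breakpoints of the positive excursion $H$; the potential is $\Psi_0=-H$. Since the tree is critical, the excursion is finite almost surely, but its length is heavy-tailed, so occasional samples can be very long.

```python
import random
from typing import List, Tuple


def exponential_excursion(lam: float, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Breakpoints (x_k, H_k) of a unit-slope positive excursion whose segments,
    except the truncated last one, are i.i.d. Exponential with rate lam/2.
    The initial potential of the text is Psi_0 = -H."""
    xs, hs = [0.0], [0.0]
    slope = 1                                  # the excursion starts upward
    while True:
        seg = rng.expovariate(lam / 2)         # rate lam/2, i.e. mean 2/lam
        x, h = xs[-1], hs[-1]
        if slope < 0 and seg >= h:             # descent reaches level 0: stop there
            xs.append(x + h)
            hs.append(0.0)
            return xs, hs
        xs.append(x + seg)
        hs.append(h + slope * seg)
        slope = -slope
```

The local maxima of the returned path are the leaves of the level set tree of $-\Psi_0$ and its interior local minima are the branching vertices, so every sample has exactly one more maximum than interior minimum.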

Corollary 22 (Exponential excursion).

Suppose $g(x)\equiv 1$ and initial potential $\Psi_{0}(x)=-H_{{\sf GW}(\lambda)}(x)$ . Then the corresponding tree $\mathcal{V}(\Psi_{0})\in\mathcal{BL}_{\rm plane}^{|}$ is an exponential binary critical Galton-Watson tree ${\sf GW}(\lambda)$ .

Proof.

By Cor. 21, the tree $\mathcal{V}(\Psi_{0})$ is the level set tree of the negative potential $-\Psi_{0}(x)$ . The statement now follows from Thm. 18. ∎

To formulate the next result, recall that if $T\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\lambda)$ and $\varphi(T)=\textsc{length}(T)$ , then by (9.5),

[TABLE]

Also, the p.d.f. of $\textsc{length}(T)$ is given by $\ell(x)$ of (70).

Theorem 29 (Ballistic annihilation dynamics of an exponential excursion, [85]).

Suppose the initial particle density is constant, $g(x)\equiv 1$ , and the initial potential $\psi(x,0)$ is the negative Harris path of an exponential critical binary Galton-Watson tree with parameter $\lambda$ , i.e., $\mathcal{V}(\Psi_{0})\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}(\lambda)$ . Then, at any instant $t>0$ the mass-equipped shock tree $\mathcal{V}_{t}=\widetilde{\mathcal{S}}_{t}(\textsc{length},\mathcal{V}(\Psi_{0}))$ conditioned on surviving, $\mathcal{V}_{t}\not=\phi$ , is distributed according to the following rules.

(i)

The planar shape of the tree, as an element of $\mathcal{BL}_{\rm plane}^{|}$, is distributed as an exponential binary Galton-Watson tree ${\sf GW}(\lambda_{t})$ with $\lambda_{t}:=\lambda p_{t}$.

(ii)

Either a single or a double mass point is placed, independently, at each leaf; the probability of a single mass is

[TABLE]

(iii)

Each single mass at a leaf has mass $m=t$ . For a double mass, the individual masses $(m_{\rm L},m_{\rm R})$ have the following joint p.d.f.

[TABLE]

for $a,b>0$, $a\vee b\leq t<a+b$.

(iv)

The number of mass points placed in the interior of any edge is distributed geometrically with the probability of placing $k$ masses being

[TABLE]

The locations of the $k$ mass points are independent and uniformly distributed in the interior of the edge. The orientation of each mass is left or right, independently, with probability $1/2$.

(v)

The edge masses are i.i.d. random variables with the following common p.d.f.

[TABLE]

10.4 Random sink in an infinite exponential potential

Here we focus on the dynamics of a random sink in the case of a negative exponential excursion potential. To avoid subtle conditioning related to a finite potential, we consider an infinite exponential potential $\Psi^{\rm exp}_{0}(x)$, $x\in\mathbb{R}$, constructed as follows. Let $x_{i}$, $i\in\mathbb{Z}$, be the epochs of a Poisson point process on $\mathbb{R}$ with rate $\lambda/2$, indexed so that $x_{0}$ is the epoch closest to the origin. The initial velocity $v(x,0)$ is a piecewise constant, left-continuous function that alternates between the values $\pm 1$ on the intervals $(x_{i-1},x_{i}]$, with $v(x_{0},0)=1$. Accordingly, the initial potential $\Psi^{\rm exp}_{0}(x)$ is a piecewise linear continuous function with a local minimum at $x_{0}$ and alternating slopes $\pm 1$ on intervals of independent exponential lengths. The results in this section refer to the sink $\mathcal{M}_{0}$ with initial Lagrangian coordinate $x_{0}$. By the translation invariance of the Poisson point process, we refer to $\mathcal{M}_{0}$ as a random sink.

Observe that for any fixed $t>0$, the dynamics of $\mathcal{M}_{0}$ is completely specified by a finite excursion within $\Psi^{\rm exp}_{0}(x)$. For instance, one can consider the shortest negative excursion of $\Psi^{\rm exp}_{0}(x)$ within an interval $\mathcal{B}_{0}^{t}$ such that $x_{0}\in\mathcal{B}_{0}^{t}$, $|\mathcal{B}_{0}^{t}|>2t$, and one end of $\mathcal{B}_{0}^{t}$ is a local maximum of $\Psi^{\rm exp}_{0}(x)$ (see Fig. 45). This excursion is the negative Harris path of an exponential Galton-Watson tree ${\sf GW}(\lambda)$. The dynamics of $\mathcal{M}_{0}$ consists of alternating intervals of mass accumulation (vertical segments of $\mathcal{G}^{(x,\psi)}$) and motion (horizontal segments of $\mathcal{G}^{(x,\psi)}$), starting with a mass accumulation interval. Denote by $\mathrm{v}_{i}$ the lengths of the vertical segments and by $\mathrm{h}_{i}$ the lengths of the horizontal segments, in the order of appearance in the examined trajectory. Corollary 22 implies that the $\mathrm{v}_{i}$ and $\mathrm{h}_{i}$ are independent; the $\mathrm{v}_{i}$ are i.i.d. exponential random variables with parameter $\lambda$; and the $\mathrm{h}_{i}$ equal the total lengths of independent Galton-Watson trees ${\sf GW}\left(\lambda\right)$. This description, illustrated in Fig. 46, allows us to find the mass dynamics of a random sink, as stated in the next two theorems.

Theorem 30 (Growth probability of a random sink, [85]).

The probability $\xi(t)$ that a random sink $\mathcal{M}_{0}$ is growing at a given instant $t>0$ (that is, it is at rest and accumulates mass) is given by

[TABLE]

Theorem 31 (Mass distribution of a random sink, [85]).

The mass of a random sink $\mathcal{M}_{0}$ at instant $t>0$ has probability distribution

[TABLE]

where $\delta_{2t}$ denotes Dirac delta function (point mass) at $2t$ .

Remark 20.

One can notice that the continuum annihilation dynamics of this section, with its shock waves, shock wave trees, and sink masses is reminiscent of that in the 1-D inviscid Burgers equation that describes the evolution of the velocity field $v(x,t)$ :

[TABLE]

The Burgers dynamics appears in a surprising variety of problems, ranging from cosmology to fluid dynamics and vehicle traffic models; see [18, 57, 63] for comprehensive reviews. The solution of the Cauchy problem for the Burgers equation develops singularities (shocks) that correspond to intersections of the trajectories of individual particles. The shocks evolve via shock waves that can be described as massive particles that aggregate the colliding regular particles and hence accumulate the mass of the medium. The dynamics of these massive particles generates a tree structure for their world trajectories, the shock wave tree [25, 63].

The case of smooth random initial velocity can be treated explicitly via the Hopf-Cole solution. The case of non-smooth random initial velocities, e.g., a white noise or a (fractional) Brownian motion, has been extensively studied, both numerically [123] and analytically [127, 24, 25, 59]. In this case, tracing the dynamics of the massive particles backward in time (from a point within a shock tree to the leaves) corresponds to fragmentation of the mass and describes the genealogy of the shocks, i.e., the sets of particles that merge with a given massive particle [23, 59]. In particular, it has been established in [25] that the shock wave tree for a Brownian motion initial velocity becomes the eternal additive coalescent after a proper time change; similar arguments apply to Lévy type initial velocities [102]. However, despite a general heuristic understanding of the structure of the Burgers shock wave tree, a complete analytical description is lacking (e.g., [123]).

10.5 Real tree description of ballistic annihilation

Recall that an $\mathbb{R}$-tree is a generalization of the concept of a finite tree with edge lengths to infinite spaces; see Sect. 2.2 for a formal setup. We construct here (Sect. 10.5.1) an $\mathbb{R}$-tree $\mathbb{T}=\mathbb{T}(\Psi_{0})$ that describes the entire model dynamics as coalescence of particles and sinks; this tree is sketched by gray lines in the top panels of Figs. 40 and 47. Specifically, the tree consists of points $(x,t)$ such that there exists either a particle or a sink with coordinate $x$ at time $t$. There is a one-to-one correspondence between the initial particles $(x,0)$ and the leaf vertices of $\mathbb{T}$. Each leaf edge of $\mathbb{T}$ corresponds (one-to-one) to the free (ballistic) run of the corresponding particle before it annihilates in a sink. Four such free runs are depicted by green arrows in Fig. 47. The shock wave tree (movement and coalescence of sinks) corresponds to the non-leaf part of the tree $\mathbb{T}$; it is shown by blue lines in Figs. 40 and 47. We adopt the convention that the motion of a particle consists of two parts: an initial ballistic run at unit speed, and subsequent motion within the respective sink. For example, the within-sink motion of particles $x$ and $x^{\prime}$ is shown by a red line in Fig. 47. This interpretation extends the motion of all particles to the same time interval $[0,t_{\rm max}]$, with $t_{\rm max}$ being the time of appearance of the final sink that accumulates the total mass on the initial interval. This final sink serves as the tree root. Section 10.5.1 introduces a proper metric on this space so that the model is represented by a time-oriented rooted $\mathbb{R}$-tree. In particular, the metric induced by this tree on the initial particles $(x,0)$ becomes an ultrametric, with the distance between any two particles equal to the time until their collision (as particles or as respective sinks).

Section 10.5.2 discusses two non-Lebesgue metrics on the system’s domain $[a,b]$. Both describe the ballistic annihilation dynamics and are readily constructed from the initial potential $\Psi_{0}(x)$. One of these descriptions is an $\mathbb{R}$-tree and the other is not. The $\mathbb{R}$-tree description identifies the two points of each colliding pair, such as the pairs $(x,x^{\prime})$ and $(y,y^{\prime})$ in Fig. 47. This tree is isometric to the level set tree $\textsc{level}(-\Psi_{0})$ of the initial potential, which is used in this work to describe the shock wave tree (Cor. 21); it is known in the literature as a tree in continuous path [116, Def. 7.6],[52, Ex. 3.14]. In Sect. 10.5.3 we briefly discuss a natural way of introducing prunings on $\mathbb{R}$-trees and show that a typical pruning does not have the semigroup property.

10.5.1 $\mathbb{R}$ -tree representation of ballistic annihilation

We construct here a real tree representation of the continuum ballistic annihilation model of Sect. 10.2. Specifically, we assume a unit particle density $g(x)\equiv 1$ and initial potential $-\Psi_{0}(x)\equiv-\psi(x,0)\in{\mathcal{E}^{\rm ex}}$ , i.e. $\Psi_{0}(x)$ is a unit slope negative excursion with a finite number of segments on a finite interval $[a,b]$ (e.g., bottom panel of Fig. 40). Recall that the interval $[a,b]$ completely annihilates by time $t_{\rm max}=(b-a)/2$ , producing a single sink at space-time location $((b+a)/2,t_{\rm max})$ .

Consider the model’s entire space-time domain $\mathbb{T}=\mathbb{T}(\Psi_{0})$ that consists of all points of the form $(x,t)$ , $x\in[a,b]$ , $0\leq t\leq t_{\rm max}$ , such that there exists either a particle or a sink at location $x$ at time instant $t$ . The shaded (hatched) regions in the top panels of Figs. 40,41 are examples of such sets of points. For any pair of points $(x,t)$ and $(y,s)$ in $\mathbb{T}$ , we define their unique earliest common ancestor as a point

[TABLE]

such that $w$ is the infimum over all $w^{\prime}$ such that

[TABLE]

The length of the unique segment between the points $(x,t)$ and $(y,s)$ is defined as

[TABLE]

where $w$ is the time component of $(z,w)={\sf A}_{\mathbb{T}}((x,t),(y,s))$ .

The tree $(\mathbb{T},d)$ for a simple initial potential is illustrated in the top panel of Fig. 40 by gray lines. The tree has a relatively simple structure. There is a one-to-one correspondence between the initial particles $(x,0)$, $x\in[a,b]$, and the leaf vertices of $\mathbb{T}$. There is a one-to-one correspondence between the ballistic runs of the initial particles (runs before collision and annihilation) and the leaf edges of $\mathbb{T}$. Four such runs are shown by green arrows in Fig. 47. There is a one-to-one correspondence between the sink points $(\sigma(t),t)$ and the non-leaf part of $\mathbb{T}$. In particular, the tree root corresponds to the final sink $((a+b)/2,t_{\rm max})$. The sink points are shown by blue lines in Figs. 40, 41. It is now straightforward to check that the tree $(\mathbb{T},d)$ satisfies the four-point condition.

Consider again the sink subspace of $\mathbb{T}$, which consists of the points $\{(\sigma(t),t)\}$ such that there exists a sink at location $\sigma(t)$ at time instant $t$, equipped with the distance (223). This metric subspace is also a tree, as a connected subspace of an $\mathbb{R}$-tree [52]. This tree is isometric to the shock wave tree $S(\Psi_{0})$ and hence to either of its graphical representations $\mathcal{G}^{(x,t)}(\Psi_{0})$ or $\mathcal{G}^{(x,\psi)}(\Psi_{0})$ that are illustrated in Figs. 40, 41 (top and bottom panels, respectively).

From the above construction, it follows that all leaves $(x,0)$ are located at the same depth (distance from the root) $t_{\rm max}$. To see this, consider the segment that connects a leaf and the root and apply (223). Moreover, each time section at a fixed instant $t_{0}$, ${\sf sec}(\mathbb{T},t_{0})=\{(x,t_{0})\in\mathbb{T}\}$, is located at the same depth $t_{\max}\!-\!t_{0}$. This implies, in particular, that for any fixed $t_{0}\geq 0$, the metric induced by $\mathbb{T}$ on ${\sf sec}(\mathbb{T},t_{0})$ is an ultrametric, which means that $d(p,q)\leq d(p,r)\vee d(r,q)$ for any triplet of points $p,q,r\in{\sf sec}(\mathbb{T},t_{0})$. Accordingly, each triangle $p,q,r\in{\sf sec}(\mathbb{T},t_{0})$ is isosceles, meaning that at least two of the three pairwise distances between $p,q$ and $r$ are equal and not smaller than the third [52, Def. 3.31]. The length definition (223) implies that the distance between any pair of points from any fixed section ${\sf sec}(\mathbb{T},t_{0})$ equals the time until the two points (each of which can be either a particle or a sink) collide.
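The isosceles characterization gives a direct numerical test of the ultrametric property. The Python sketch below (hypothetical helper `is_ultrametric`) checks that in every triangle the two largest sides coincide, which is equivalent to $d(p,q)\le d(p,r)\vee d(r,q)$ for all triples. It is illustrated on made-up pairwise collision times of four particles and, for contrast, on the Euclidean distance between points of a line.

```python
from itertools import combinations
from typing import Callable, Hashable, Iterable


def is_ultrametric(points: Iterable[Hashable],
                   dist: Callable[[Hashable, Hashable], float],
                   tol: float = 1e-12) -> bool:
    """Check d(p,q) <= max(d(p,r), d(r,q)) for all triples via the isosceles
    criterion: in every triangle the two largest sides must be equal."""
    for p, q, r in combinations(list(points), 3):
        a, b, c = sorted((dist(p, q), dist(q, r), dist(p, r)))
        if c - b > tol:
            return False
    return True


# Made-up collision times: A and B collide at time 1, C and D at time 2,
# and the two resulting sinks collide at time 5.
times = {frozenset("AB"): 1.0, frozenset("CD"): 2.0}


def collision_time(p: str, q: str) -> float:
    return times.get(frozenset((p, q)), 5.0)


if __name__ == "__main__":
    print(is_ultrametric("ABCD", collision_time))
    print(is_ultrametric([0.0, 1.0, 3.0], lambda p, q: abs(p - q)))
```

The collision times pass the test, while the Euclidean distances 1, 2, 3 on the line violate it.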

We notice that the collection of leaf vertices $\Delta^{\circ}_{p,\mathbb{T}}$ descendant to a point $p\in\mathbb{T}$ can be either a single point $(x_{p},0)$ , if $p$ is within a leaf edge and represents the ballistic run of a particle, or an interval $\{(x,0):x_{\rm left}(p)\leq x\leq x_{\rm right}(p)\}$ , if $p$ is a non-leaf point that represents a sink. We define the mass $m(p)$ of a point $p\in\mathbb{T}$ as

[TABLE]

where the last equality reflects the assumption $g(z)\equiv 1$ . The mass $m(p)$ generalizes the quantity “number of descendant leaves” (Sect. 9.1.4) to the $\mathbb{R}$ -tree situation with an uncountable set of leaves. We observe that (i) a point $p\in\mathbb{T}$ represents a ballistic run if and only if $m(p)=0$ ; (ii) a point $p\in\mathbb{T}$ represents a sink if and only if $m(p)>0$ . This means that the shock wave tree, which is isometric to the sink part of the tree $(\mathbb{T},d)$ , can be extracted from $(\mathbb{T},d)$ by the condition $\{p:m(p)>0\}$ .

10.5.2 Metric spaces on the set of initial particles

In this section we discuss two metrics on the system’s domain $[a,b]$ , which is isometric to the set $\{(x,0):x\in[a,b]\}$ of initial particles. These spaces contain the key information about the system dynamics and, unlike the complete tree $(\mathbb{T},d)$ of Sect. 10.5.1, can be readily constructed from the potential $\Psi_{0}(x)$ .

Metric $h_{1}(x,y)$ reproduces the ultrametric induced by $(\mathbb{T},d)$ on $[a,b]$ . Below we explicitly connect this metric to $\Psi_{0}(x)$ . For any pair of points $x,y\in[a,b]$ we define a basin ${\sf B}_{\Psi_{0}}(x,y)$ as the interval that supports the minimal negative excursion within $\Psi_{0}(x)$ that contains the points $x,y$ . Formally, assuming without loss of generality that $x<y$ we find the maximum of $\Psi_{0}$ on $[x,y]$ :

[TABLE]

and use it to define the basin ${\sf B}_{\Psi_{0}}(x,y)=[l,r],$ where

[TABLE]

The metric is now defined as

[TABLE]

It is straightforward to check that

[TABLE]

where the collision is understood as either a collision of particles, a collision of the sinks that annihilated the particles, or a collision between a sink that annihilated one of the particles and the other particle. For instance, the claim is readily verified for any pair of points from the set $\{x,x^{\prime},y,y^{\prime}\}$ by examining the bottom panel of Fig. 47. The metric space $([a,b],h_{1})$ is not a tree. Moreover, this space is totally disconnected, since only finitely many points (the local minima of $\Psi_{0}(x)$) have other points at arbitrarily small distance from them. Any other point at Euclidean distance $\epsilon$ from the nearest local minimum is separated from all other points by at least $\epsilon/2$.

Metric $h_{2}(x,y)$ describes the mass accumulation by sinks during the annihilation process. Specifically, we introduce an equivalence relation among the annihilating particles, by writing $x\sim_{\Psi_{0}}y$ if the particles with initial coordinates $x$ and $y$ collide and annihilate with each other. For example, in Fig. 47 we have $x\sim_{\Psi_{0}}x^{\prime}$ and $y\sim_{\Psi_{0}}y^{\prime}$ . The following metric is now defined on the quotient space $[a,b]|_{\sim_{\Psi_{0}}}$ :

[TABLE]

In words, the distance $h_{2}(x,y)$ between particles $x$ and $y$ equals the total mass accumulated by the sinks to which the particles belong during the time intervals between the instants when the particles joined the respective sinks and the instant of particle (or respective sink) collision. Another interpretation is that $h_{2}(x,y)$ equals the minimal Euclidean distance between points $x,y\in[a,b]|_{\sim_{\Psi_{0}}}$ in the quotient space; one can travel in this quotient space as along a regular real interval, with the possibility to jump (with no distance accumulation) between equivalent points. This $\mathbb{R}$-tree construction is known as the tree in continuous path [116, Def. 7.6],[52, Ex. 3.14].
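The display defining $h_2$ is not reproduced here. Assuming that it agrees with the standard tree-in-continuous-path distance of [116, Def. 7.6] applied to $H=-\Psi_0$, namely $h_2(x,y)=H(x)+H(y)-2\min_{[x\wedge y,\,x\vee y]}H$ (an assumption consistent with the isometry to the level set tree, not a quotation), the metric can be evaluated from samples of $H$ as in the Python sketch below. The helper name is hypothetical, and $H$ must be sampled at all breakpoints between the query points so that the minimum is exact for a piecewise linear potential. Pairs at distance zero are the identified (colliding) pairs.

```python
from typing import Sequence


def continuous_path_distance(H: Sequence[float], i: int, j: int) -> float:
    """Tree-in-continuous-path distance H(x) + H(y) - 2 * min_{[x, y]} H between
    grid points i and j, with H = -Psi_0 sampled at (at least) every breakpoint."""
    lo, hi = min(i, j), max(i, j)
    return H[i] + H[j] - 2 * min(H[lo:hi + 1])


if __name__ == "__main__":
    # Unit-slope example on the integer grid: H = -Psi_0 at x = 0, 1, ..., 6.
    H = [0, 1, 2, 1, 0, 1, 0]
    print(continuous_path_distance(H, 1, 3))   # points identified by the quotient
    print(continuous_path_distance(H, 2, 5))
```

In the example, the points $x=1$ and $x=3$ sit at the same level on either side of a single peak, so their distance vanishes and they represent the same point of the quotient space.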

The metric space $([a,b]|_{\sim_{\Psi_{0}}},h_{2})$ is a tree that is isometric to the level set tree of the negative potential $-\Psi_{0}(x)$ on $[a,b]$ and hence to the (finite) shock wave tree $\mathcal{V}(\Psi_{0})$ (by Cor. 21), with the convention that the root is placed at $a\sim_{\Psi_{0}}b$. This means, in particular, that prunings of these two trees, with the same pruning function and pruning time, coincide.

10.5.3 Other prunings on $\mathbb{T}$

One can introduce a large class of prunings on an $\mathbb{R}$-tree $(\mathbb{T},d)$ following the approach used above to define the point mass $m(p)$. Specifically, consider a measure $\eta(\cdot)$ on $[a,b]$ and define $m_{\eta}(p)=\eta(\Delta^{\circ}_{p,\mathbb{T}})$. The function $m_{\eta}(p)$ is nondecreasing along each segment that connects a leaf and the root $\rho_{\mathbb{T}}$ of $\mathbb{T}$. Hence, one can define a pruning with respect to $m_{\eta}$ on $\mathbb{T}$ by cutting all points $p$ with $m_{\eta}(p)<t$ for a given $t\geq 0$. It is readily seen that the function $m_{\eta}(p)$ typically has discontinuities along a path between a leaf and the root of $\mathbb{T}$. This means that pruning with respect to $m_{\eta}$ typically does not have the semigroup property.

11 Infinite trees built from leaves down

Examples of infinite trees built from the root up are plentiful; they include the infinite trees induced by the Yule processes or any other birth processes; infinite trees generated by a supercritical branching process; the trees that represent depth-first search and breadth-first search algorithms on infinite networks. In this section we explore the infinite trees built from leaves down that arise naturally in the context of infinitely many coalescing particles or the level set trees of continuous functions. Interestingly, many of the results about finite trees can be obtained from the characterizations of the corresponding infinite trees built from leaves down.

11.1 Infinite plane trees built from the leaves down

In the context of Sect. 7.2, set $I=\mathbb{R}$ and consider a function $f(x)\in C(\mathbb{R})$ . Let $\mathcal{X}$ and $\mathcal{Y}$ be the sets containing all locations of local minima and local maxima of $f(x)$ , respectively. Formally, $x_{0}\in\mathcal{X}$ if $\exists\delta>0$ s.t. $f(x)\geq f(x_{0})$ $\forall x\in(x_{0}-\delta,x_{0}+\delta)$ , and $\mathcal{Y}$ is defined analogously. Hence, the local extrema may include plateaus of constant values. We assume that $f(x)$ satisfies the following conditions:

(a)

The set $\mathcal{X}$ of the locations of local minima has infinite image, i.e.,

[TABLE]

This condition guarantees that the level set tree of $f(x)$ that we construct below has an infinite number of vertices.

(b)

The intersection of $\mathcal{X}$ with any finite interval $[a,b]$ is either empty or consists of a finite number of closed intervals (possibly including separate points). This condition guarantees that every descendant subtree of the infinite level set tree of $f(x)$ is finite. The conditions (a),(b) guarantee that the level set tree has countably many vertices.

(c)

$\forall a\in\mathbb{R}$ , the sets

[TABLE]

are empty or consist of finitely many closed intervals (including separate points). Here, $f^{-1}(-\infty)$ is an empty set. This condition, or an equivalent one, guarantees that the level set tree has finite branching (no vertices of infinite degree).

By the construction of Sect. 7.2.2, the level set tree $T_{\infty}=\textsc{level}\big{(}f(x)\big{)}$ has infinitely many leaves. Here, $T_{\infty}=\Big{(}\mathbb{R}/\!\sim_{f},\,d_{f}\Big{)}$ is the metric quotient space obtained by identifying (denoted by $a_{\ell}\sim_{f}a_{r}$) pairs of points $a_{\ell}$ and $a_{r}$ in $\mathbb{R}$ as one point. Recall that $a_{\ell}\sim_{f}a_{r}$ whenever the following conditions are satisfied:

(1)

$a_{\ell}<a_{r}$ and $f(a_{\ell})=f(a_{r})$;

(2)

$\forall x\in(a_{\ell},a_{r})$ we have $f(x)\geq f(a_{\ell})=f(a_{r})$.

The local maxima $\mathcal{Y}$ (including plateaus) constitute the leaves of $T_{\infty}$, and the local minima $\mathcal{X}$ (including plateaus) constitute the internal vertices (junctions) of $T_{\infty}$. Such a tree $T_{\infty}$ is also called an infinite plane tree built from the leaves down induced by the function $f(x)$. The name reflects the fact that, as we study $f(x)$ over larger and larger intervals (e.g., $[-a,a]$ as $a\rightarrow\infty$), we discover more and more leaves of $T_{\infty}$ (local maxima) and their merger history (local minima) from the leaves down, but never reach the root.

To give a convenient description of an infinite tree $T_{\infty}$ built from leaves down, we designate one leaf as the golden leaf, and its ancestral lineage is called the golden lineage (Fig. 48). In the above construction, we let the leaf that corresponds to the first local maximum in the nonnegative half-line,

[TABLE]

be designated as the golden leaf. Let $\mathcal{L}_{\rm plane}^{\infty}$ denote the space of infinite plane trees built from the leaves down, with edge lengths and a designated golden leaf. For a tree $T_{\infty}\in\mathcal{L}_{\rm plane}^{\infty}$ with a designated golden leaf $\gamma^{*}$, we let $\ell=[\gamma^{*},\phi]$ denote the unique ancestral path from the golden leaf $\gamma^{*}$ to its parent, grandparent, great-grandparent and on towards the tree root $\phi$, where $\phi$ is a point at infinity. Here, the ancestral path $\ell$ will be called the golden lineage. The golden lineage $\ell=\left\{\ell(i),e(i)\right\}$ consists of infinitely many vertices $\ell(i)$, enumerated by the index $i\geq 0$ along the path starting from the golden leaf $\ell(0)=\gamma^{*}$ and increasing as we go down the golden lineage $\ell$, and infinitely many edges $e(i)=[\ell(i),\ell(i+1)]$.
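The golden lineage can be read off from $f$ directly. Going down the lineage corresponds to lowering a level $c$ and following the component of $\{f>c\}$ that contains the golden leaf; by the identification rule above, this component merges with a new subtree each time it reaches a local minimum that is lower than everything between it and the golden leaf. The Python sketch below implements this reading for a piecewise linear $f$ given by its breakpoints on a finite window, assuming strict extrema with distinct values (no plateaus); only the lineage vertices visible in the window are returned, and the helper name is hypothetical.

```python
from typing import List, Sequence, Tuple


def golden_lineage(xs: Sequence[float], ys: Sequence[float]
                   ) -> Tuple[Tuple[float, float], List[Tuple[float, float]]]:
    """Golden leaf and the visible part of the golden lineage for a piecewise
    linear f with breakpoints (xs[k], ys[k]) on a finite window.
    Assumes strict extrema with distinct values (no plateaus)."""
    n = len(xs)

    def is_max(i: int) -> bool:
        return ys[i] > ys[i - 1] and ys[i] > ys[i + 1]

    def is_min(i: int) -> bool:
        return ys[i] < ys[i - 1] and ys[i] < ys[i + 1]

    # Golden leaf: the first local maximum in the nonnegative half-line.
    g = next(i for i in range(1, n - 1) if xs[i] >= 0 and is_max(i))

    def records(indices) -> List[Tuple[float, float]]:
        # Local minima lower than every breakpoint between them and the golden leaf.
        out, running = [], ys[g]
        for i in indices:
            if is_min(i) and ys[i] < running:
                out.append((xs[i], ys[i]))
            running = min(running, ys[i])
        return out

    left = records(range(g - 1, 0, -1))
    right = records(range(g + 1, n - 1))
    # Moving down the lineage means decreasing the level of the merge points.
    lineage = sorted(left + right, key=lambda p: -p[1])
    return (xs[g], ys[g]), lineage


if __name__ == "__main__":
    leaf, lineage = golden_lineage([-3, -2, -1, 0, 1, 2, 3, 4],
                                   [0, 5, 2, 6, 3, 7, 1, 8])
    print(leaf, lineage)
```

In the example, the golden leaf is at $(0,6)$ and the visible lineage vertices $\ell(1),\ell(2),\ell(3)$ are at levels 3, 2 and 1, so the edges $e(0),e(1),e(2)$ have lengths 3, 1 and 1.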

Each tree $T_{\infty}\in\mathcal{L}_{\rm plane}^{\infty}$ can be represented as a forest of finite trees attached to the golden lineage $\ell$ as follows

[TABLE]

where for each $i\geq 1$ , $\mathcal{D}_{i}=\Delta_{\ell(i)}\in\mathcal{L}_{\rm plane}^{|}$ denotes the complete subtree of $T_{\infty}$ rooted at $\ell(i)$ that does not include the golden leaf, and $\sigma_{i}\in\{-1,+1\}$ denotes the left-right orientation of $\mathcal{D}_{i}$ with respect to the golden lineage $\ell$ . Figure 48 illustrates this construction.

The representation (224) of a tree $T_{\infty}\in\mathcal{L}_{\rm plane}^{\infty}$ allows one to relate the space $\mathcal{L}_{\rm plane}^{\infty}$ of infinite plane trees built from the leaves down with edge lengths and a designated golden leaf to the notion of a forest of trees attached to the floor line described in Sect. 7.4 of [116]. In addition, the golden lineage construction helps in metrizing the space $\mathcal{L}_{\rm plane}^{\infty}$.

Importantly, for any point $x\in T_{\infty}$ , the descendant tree $\Delta_{x,T}$ is a finite tree in $\mathcal{L}_{\rm plane}$ . Therefore, the definition of generalized dynamical pruning (213) extends naturally to the space $\mathcal{L}_{\rm plane}^{\infty}$ of infinite plane trees built from the leaves down. Applying the generalized dynamical pruning $\mathcal{S}_{t}$ to an infinite tree built from the leaves down, the uppermost point of the golden lineage within $\mathcal{S}_{t}(\varphi,T)$ will become the golden leaf for the pruned tree $\mathcal{S}_{t}(\varphi,T)$ .

Next, we extend the notion of prune-invariance in planar shapes from Def. 35(i) to a subspace $S^{\infty}$ of the space $\mathcal{L}_{\rm plane}^{\infty}$. For a given monotone nondecreasing function $\varphi:\mathcal{L}_{\rm plane}\rightarrow\mathbb{R}^{+}$, consider the generalized dynamical pruning $\mathcal{S}_{t}(\varphi,T_{\infty})$, $T_{\infty}\in S^{\infty}$. We say that a probability measure $\mu$ on $S^{\infty}$ is prune-invariant in planar shapes if

[TABLE]

where $\mu_{t}=(\mathcal{S}_{t})_{*}(\mu)=\mu\circ\mathcal{S}_{t}^{-1}$ is the pushforward measure, and $\Sigma$ is the induced $\sigma$ -algebra.

The above definition of prune-invariance (225) differs significantly from the original Def. 35(i) for finite trees: since $\phi\not\in S^{\infty}$, we do not need to condition on the event $\mathcal{S}_{t}(\varphi,T)\not=\phi$ in the pushforward measure. Importantly, the prune-invariance in (225) coincides with von Neumann's definition of an invariant measure [142], which is fundamental for ergodic theory and dynamical systems. At the same time, the definition of prune-invariance in edge lengths in Def. 35(ii) does not need to be reformulated for the infinite trees built from the leaves down.

The renowned Krylov-Bogolyubov theorem [78] states that for a compact metrizable topological space $\Omega$ with the induced Borel $\sigma$-algebra $\Sigma$, and a continuous function $\mathcal{S}:\Omega\rightarrow\Omega$, there exists an invariant probability measure $\mu$ on $(\Omega,\Sigma)$ satisfying

[TABLE]

where $\mu_{*}=(\mathcal{S})_{*}(\mu)=\mu\circ\mathcal{S}^{-1}$ is the pushforward measure.

Here we will not concentrate on constructing a suitable metric for the space $\mathcal{L}_{\rm plane}^{\infty}$ . However, in the spirit of the Krylov-Bogolyubov theorem, we will show in Thm. 32 that the infinite critical planar binary Galton-Watson tree ${\sf GW}_{\infty}(\lambda)$ built from the leaves down that we construct in Sect. 11.2 is prune invariant under generalized dynamical pruning $\mathcal{S}_{t}$ induced by a monotone nondecreasing function $\varphi:\mathcal{L}_{\rm plane}\rightarrow\mathbb{R}^{+}$ . Additionally, it will be observed that Thm. 32 is a generalization of Thm. 24.

11.2 Infinite exponential critical binary Galton-Watson tree built from the leaves down

Consider a Poisson point process $\{T_{k}\}_{k\in\mathbb{Z}}$ on $\mathbb{R}$ with parameter $\lambda/2$ , enumerated from left to right (where $T_{0}$ is the epoch closest to zero). Let

[TABLE]

In other words, $X_{t}$ is a continuous piecewise linear function with slopes alternating between $\pm 1$ as it crosses the Poisson epochs $\{T_{k}\}_{k\in\mathbb{Z}}$ , i.e., the slope

[TABLE]
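A sampler for $X_t$ on a finite window is sketched below in Python. The display defining $X_t$ is not reproduced here, so the normalization $X_{T_0}=0$ and the choice of slope $+1$ on $(T_0,T_1)$ are assumptions; by the vertical shift invariance noted next, the normalization does not affect the level set tree. Epochs on each half-line are generated from i.i.d. exponential gaps of rate $\lambda/2$ started at the origin, which yields a rate-$\lambda/2$ Poisson process restricted to $[-w,w]$. The function name is hypothetical.

```python
import random
from typing import List, Tuple


def sample_X(lam: float, window: float, rng: random.Random
             ) -> Tuple[List[float], List[float]]:
    """Breakpoints (T_k, X_{T_k}) for the epochs in [-window, window] of a
    rate-lam/2 Poisson process; the slope alternates between +1 and -1 at
    each epoch. Assumptions: X(T_0) = 0 and slope +1 on (T_0, T_1)."""
    def half_line() -> List[float]:
        # Distances from 0 of the epochs on one half-line (exponential gaps).
        out, s = [], rng.expovariate(lam / 2)
        while s <= window:
            out.append(s)
            s += rng.expovariate(lam / 2)
        return out

    xs = sorted([-s for s in half_line()] + half_line())
    if not xs:
        return [], []
    k0 = min(range(len(xs)), key=lambda k: abs(xs[k]))    # index of T_0
    ys = [0.0] * len(xs)
    for k in range(k0 + 1, len(xs)):       # slope on (T_{k-1}, T_k) is (-1)^(k-1-k0)
        slope = 1 if (k - 1 - k0) % 2 == 0 else -1
        ys[k] = ys[k - 1] + slope * (xs[k] - xs[k - 1])
    for k in range(k0 - 1, -1, -1):        # slope on (T_k, T_{k+1}) is (-1)^(k-k0)
        slope = 1 if (k - k0) % 2 == 0 else -1
        ys[k] = ys[k + 1] - slope * (xs[k + 1] - xs[k])
    return xs, ys
```

The local maxima of the sampled path are leaves of ${\sf GW}_{\infty}(\lambda)$ and the local minima are its branching vertices, within the window.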

The level set tree $T_{\infty}=\textsc{level}\big{(}X_{t}\big{)}$ is invariant under shifting $X_{t}$ vertically, or shifting and scaling $X_{t}$ horizontally.

Fix a point $t^{*}\in\mathbb{R}$ and generate $X_{t}$ with a Poisson point process $\{T_{k}\}_{k\in\mathbb{Z}}$ . Then, with probability one, there will be a positive excursion of $X_{t}-X_{t^{*}}$ over an interval that begins or ends at $t^{*}$ . By Thm. 18, the level set tree of this adjacent positive excursion is distributed as ${\sf GW}(\lambda)$ . Therefore, the infinite binary level set tree $T_{\infty}=\textsc{level}\big{(}X_{t}\big{)}$ for $X_{t}$ will be referred to as the infinite planar exponential critical binary Galton-Watson tree built from the leaves down with parameter $\lambda$ , and denoted by ${\sf GW}_{\infty}(\lambda)$ . We also refer to this tree as the infinite exponential critical binary Galton-Watson tree.

In the representation (224) of a tree $T_{\infty}\stackrel{d}{\sim}{\sf GW}_{\infty}(\lambda)$, the golden lineage $\ell$ is distributed as a one-dimensional Poisson process with parameter $\lambda$, the orientation variables $\sigma_{i}$ are i.i.d. Bernoulli with parameter $1/2$, and the complete subtrees $\mathcal{D}_{i}$ are i.i.d. ${\sf GW}(\lambda)$ trees. Finally, the golden lineage $\ell$ and the sequences $\sigma_{i}$ and $\mathcal{D}_{i}$ are all sampled independently of each other.

The following is a variation of Thm. 24 for the infinite critical exponential binary Galton-Watson tree.

Theorem 32.

Let $T_{\infty}\stackrel{{\scriptstyle d}}{{\sim}}{\sf GW}_{\infty}(\lambda)$ with $\lambda>0$ . Then, for any monotone nondecreasing function $\varphi:\mathcal{BL}_{\rm plane}^{|}\rightarrow\mathbb{R}_{+}$ and any $\Delta>0$ we have

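$$T_{\infty}^{\Delta}:=\mathcal{S}_{\Delta}(\varphi,T_{\infty})\stackrel{d}{\sim}{\sf GW}_{\infty}\big(\lambda\,p_{\Delta}(\lambda,\varphi)\big),$$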

where

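$$p_{\Delta}=p_{\Delta}(\lambda,\varphi)={\sf P}\big(\mathcal{S}_{\Delta}(\varphi,T)\neq\phi\big),\qquad T\stackrel{d}{\sim}{\sf GW}(\lambda),$$
and $\phi$ denotes the empty tree.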

That is, the pruned tree $T_{\infty}^{\Delta}$ is also an infinite exponential critical binary Galton-Watson tree with the scaled parameter

$$\mathcal{E}_{\Delta}(\lambda,\varphi)=\lambda\,p_{\Delta}(\lambda,\varphi).$$

Since $T_{\infty}$ is infinite, we need not be concerned with its survival under the pruning operation $\mathcal{S}_{\Delta}$. The survival probability $p_{\Delta}$ in the statement of Thm. 32 is computed for finite trees, so the values of the scaled parameter $\mathcal{E}_{\Delta}(\lambda,\varphi)$ for selected pruning functions are those given in Thm. 25.

Proof.

Let ${\sf par}(x)$ denote the right parent of a point $x$ in $T_{\infty}$, that is, the parent of the first right subtree that one meets when traveling along $T_{\infty}$ from $x$ down to the root. In the Harris path of $T_{\infty}$, there are two points that correspond to $x$ (they merge into a single point when $x$ is a leaf). Consider the rightmost of these points, $r_{x}$, which lies on a downward increment of the Harris path. The vertex ${\sf par}(x)$ corresponds to the nearest local minimum to the right of $r_{x}$. Similarly, let ${\sf par}_{\Delta}(\cdot)$ denote the right parent in $T_{\infty}^{\Delta}$.

Consider a leaf $x\in T_{\infty}^{\Delta}$ , which is also a point in $T_{\infty}$ ; see Fig. 49(a). We now find the distribution of the distance from $x$ to ${\sf par}_{\Delta}(x)$ , i.e., the length of the respective downward segment of the Harris path; see Fig. 49(b). Consider the descendant lineage of $x$ in $T_{\infty}$ , which consists of vertices

[TABLE]

Due to the memorylessness of the exponential distribution and the left-right symmetry of subtree orientations in $T_{\infty}$, the distance from $x$ down to ${\sf par}(x)$ has exponential distribution with rate $\lambda/2$. The point $x$ belongs to one (the left) of the two complete subtrees rooted at ${\sf par}(x)$ in $T_{\infty}$. Observe that ${\sf par}_{\Delta}(x)={\sf par}(x)$ if and only if the subtree that does not contain $x$ (we call it the sibling subtree) has not been pruned out completely, i.e., the intersection of the sibling subtree with $T_{\infty}^{\Delta}$ is not empty. (In the example of Fig. 49(a), we have ${\sf par}_{\Delta}(x)=x_{2}\equiv b$.) The sibling subtree is distributed as ${\sf GW}(\lambda)$. Therefore,

$${\sf P}\big({\sf par}_{\Delta}(x)={\sf par}(x)\big)=p_{\Delta}.$$

Iterating this argument, we have for $k\geq 1$ ,

[TABLE]

Therefore, the distance from a vertex $x$ down to ${\sf par}_{\Delta}(x)$ is a geometric ${\sf Geom}_{1}(p_{\Delta})$ sum of independent exponential random variables with parameter $\lambda/2$. Hence, it is itself an exponential random variable with parameter $\lambda p_{\Delta}/2$. In other words, the downward segment of the Harris path of the pruned tree $T_{\infty}^{\Delta}$ adjacent to the local maximum that corresponds to the leaf $x$ has exponential length with parameter $\lambda p_{\Delta}/2$; see Fig. 49(b).
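The key step is that a ${\sf Geom}_{1}(p)$ sum $S$ of i.i.d. ${\sf Exp}(\mu)$ variables is ${\sf Exp}(p\mu)$. With $\mu=\lambda/2$, this follows from Laplace transforms:
$${\sf E}\,e^{-sS}=\sum_{k\geq 1}(1-p)^{k-1}p\Big({\mu\over\mu+s}\Big)^{k}={p\mu\over \mu+s-(1-p)\mu}={p\mu\over p\mu+s}.$$
The Python sketch below checks this by Monte Carlo. Here $p_{\Delta}$ is treated as a fixed number rather than computed for a particular pruning function $\varphi$, and the function names are hypothetical.

```python
import math
import random


def geometric_exponential_sum(rate, p, rng):
    """Sum of N i.i.d. Exp(rate) variables, where N ~ Geom_1(p),
    i.e. P(N = k) = (1 - p) ** (k - 1) * p for k = 1, 2, ...
    """
    total = rng.expovariate(rate)
    # With probability 1 - p the sibling subtree is pruned out, and the
    # descent to the pruned right parent continues by another Exp(rate) step.
    while rng.random() >= p:
        total += rng.expovariate(rate)
    return total


def check_exponential(lam, p, n_samples=100_000, seed=0):
    """Empirical mean and tail of the geometric sum versus Exp(lam * p / 2).

    Returns (empirical mean, target mean, empirical P(S > t), target P(S > t)),
    where t is the mean of the target exponential law.
    """
    rng = random.Random(seed)
    samples = [geometric_exponential_sum(lam / 2, p, rng) for _ in range(n_samples)]
    target_rate = lam * p / 2
    t = 1.0 / target_rate
    mean = sum(samples) / n_samples
    tail = sum(s > t for s in samples) / n_samples
    return mean, t, tail, math.exp(-target_rate * t)
```

For instance, `check_exponential(2.0, 0.3)` should give an empirical mean close to $1/0.3$ and an empirical tail close to $e^{-1}$.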

The same argument, using left parents, shows that the upward segment of the Harris path of the pruned tree $T_{\infty}^{\Delta}$ adjacent to the local maximum that corresponds to the leaf $x$ has exponential length with parameter $\lambda p_{\Delta}/2$. The lengths of the upward and downward segments are independent; see Fig. 49(b).

Applying the above argument to all leaves of $T_{\infty}^{\Delta}$, we conclude that the Harris path of $T_{\infty}^{\Delta}$ consists of alternating up and down increments with independent lengths, exponentially distributed with parameter $\lambda p_{\Delta}/2$. By Thm. 18 and the construction of Sect. 11.2 (where the increments are exponential with parameter $\lambda/2$), this means that $T_{\infty}^{\Delta}$ is an infinite exponential critical binary Galton-Watson tree with parameter $\lambda p_{\Delta}$. This completes the proof. ∎

Observe that Thm. 24 can be obtained from Thm. 32 by considering finite excursions of $X_{t}$ . Also notice that for the particular case of Horton pruning (Sect. 9.1.2), the statement of Thm. 32 follows from Thm. 17.

11.3 Continuum annihilation

The continuum annihilation dynamics that begins with an infinite exponential potential $\Psi^{\rm exp}_{0}(x)$, $x\in\mathbb{R}$ (see Sect. 10.4), coincides with the generalized dynamical pruning $\mathcal{S}_{t}(\varphi,T_{\infty})$ of the infinite planar critical exponential binary Galton-Watson tree built from the leaves down

[TABLE]

where $\varphi(T)=\textsc{length}(T)$ for $T\in\mathcal{BL}_{\rm plane}^{|}$. Moreover, the key results of Sect. 10.4, Thms. 30 and 31, which describe the growth dynamics of a sink in the continuum annihilation model, in fact describe the length distributions of the pruned-out sections of $T_{\infty}\stackrel{d}{\sim}{\sf GW}_{\infty}(\lambda)$ under the generalized dynamical pruning $\mathcal{S}_{t}(\varphi,T_{\infty})$. The proofs of these results can be rewritten in the infinite-tree style of the proof of Thm. 32.

12 Some open problems

1. Consider the cumulative distribution function ${\sf H}_{n}(x)$ for the height of an exponential critical binary Galton-Watson tree ${\sf GW}(\lambda)$ (Def. 22) conditioned on having $n$ leaves; see (78) of Sect. 5.2.2. Can one derive the limit (87) from equation (84)?

2. For a given sequence $\{T_{k}\}_{k\in\mathbb{Z}_{+}}$ of positive real numbers, construct a coalescent process whose symmetric kernel is a function of the clusters' Horton-Strahler orders, such that the combinatorial part of the coalescent tree is mean self-similar with respect to Horton pruning (Defs. 14 and 16), with Tokunaga coefficients $\{T_{k}\}$. This would complement the analogous branching process construction of Sect. 6.

3. Generalize equation (59) of Flajolet et al. [55] to the critical Tokunaga processes (Sect. 6.5). Formally, consider a tree $T$ that corresponds to a critical Tokunaga process $S^{\rm Tok}(t;c,\gamma)$ (Def. 26). Establish the following generalization of (59): for any given $c>1$, there exists a periodic function $D_{c}(\cdot)$ of period one such that

   [TABLE]

   as $n\rightarrow\infty$, where $R=2c$. We confirmed the validity of (227) numerically; see Fig. 50.

4. For a hierarchical branching process $S(t)$ (Def. 23, Sect. 6.1), describe the correlation structure of its Harris path. A special case is given by Thm. 18, which shows that the Harris path of the exponential critical binary Galton-Watson tree ${\sf GW}(\lambda)$, corresponding to the hierarchical branching process $S(t)\stackrel{d}{\sim}S^{\rm Tok}(t;c,\gamma)$ (Sect. 6.5), is an excursion of the exponential random walk (Sect. 7.6) with parameters $\left\{{1\over 2},\lambda,\lambda\right\}$.

5. Recall that a rescaled Harris path of an exponential critical binary Galton-Watson tree ${\sf GW}(\lambda)$ converges to the excursion of a standard Brownian motion [89, 106]. For a hierarchical branching process $S(t)$ (Def. 23, Sect. 6.1), explore the existence of a proper infinite-tree limit and of the corresponding limiting excursion process.

6. Prove the following extension of Lem. 20. In the setup of the lemma, suppose that for any tree $T$, conditioned on $\textsc{p-shape}(T)$, the edge lengths in $T$ are independent. Show that $f(x)$ is an exponential p.d.f.

7. Can the finite second moment assumption in Prop. 15 be removed? Also, does (169) characterize the exponential distribution (like the characterizations in Appendix B)?

8. In the context of Sect. 7.9, extend the one-dimensional result of Prop. 14 to higher dimensions. Specifically, consider an $n$-dimensional compact differentiable manifold $M=M^{n}$ and a Morse function $f:M\rightarrow\mathbb{R}$. Construct a natural Morse function $f^{(1)}:M\rightarrow\mathbb{R}$ such that

   [TABLE]

9. In the setting of Thm. 23 from Sect. 8, establish the asymptotic ratio-Horton law (Def. 21) for Kingman's coalescent tree and, if possible, prove the asymptotic strong Horton law (Def. 21). Specifically, prove $\lim_{j\rightarrow\infty}{{\mathcal{N}}_{j}\over{\mathcal{N}}_{j+1}}=R$ and, if possible, $\lim_{j\rightarrow\infty}\big({\mathcal{N}}_{j}R^{j}\big)=const$. Is it possible to derive a closed form expression for the Horton exponent $R$?

10. Find a suitable ramification of the generalized dynamical pruning sufficient for describing the evolution of the shock tree in the one-dimensional inviscid Burgers equation (222) and in its multidimensional modification known as the adhesion model [18, 57, 63]. Use this to complement the framework developed in [127, 24, 25, 59].

Appendix A Weak convergence results of Kurtz for density dependent population processes

We first formulate the framework for the convergence result of Kurtz as stated in Theorem 2.1 in Chapter 11 of [50] (Theorem 8.1 in [87]). There, the density dependent population processes are defined as continuous time Markov processes with state spaces in $\mathbb{Z}^{d}$ , and transition intensities represented as follows

$$q^{(n)}(k,k+\ell)=n\,\beta_{\ell}\Big({k\over n}\Big),\qquad (228)$$

where $\ell,k\in\mathbb{Z}^{d}$ , and $\beta_{\ell}$ is a given collection of rate functions.

In Section 5.1 of [5], Aldous observes that the results from Chapter 11 of Ethier and Kurtz [50] can be used to prove the weak convergence of a Marcus-Lushnikov process to the solution of the Smoluchowski system of equations when the Marcus-Lushnikov process can be formulated as a finite dimensional density dependent population process. Specifically, the Marcus-Lushnikov processes corresponding to the multiplicative coalescent and to Kingman's coalescent with monodisperse initial conditions ($n$ singletons) can be represented as finite dimensional density dependent population processes as defined above.

Define $F(x)=\sum\limits_{\ell}\ell\beta_{\ell}(x)$ . Then, Theorem 2.1 in Chapter 11 of [50] (Theorem 8.1 in [87]) states the following law of large numbers. Let $\hat{X}_{n}(t)$ be the Markov process with the intensities $q^{(n)}(k,k+\ell)$ given in (228), and let $X_{n}(t)=n^{-1}\hat{X}_{n}(t)$ . Finally, let $|x|=\sqrt{\sum x_{i}^{2}}$ denote the Euclidean norm in $\mathbb{R}^{d}$ .

Theorem 33.

Suppose for all compact $\mathcal{K}\subset\mathbb{R}^{d}$ ,

$$\sum_{\ell}|\ell|\sup_{x\in\mathcal{K}}\beta_{\ell}(x)<\infty,$$

and there exists $M_{\mathcal{K}}>0$ such that

$$|F(x)-F(y)|\leq M_{\mathcal{K}}|x-y|,\qquad x,y\in\mathcal{K}.$$

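Suppose $\lim\limits_{n\to\infty}X_{n}(0)=x_{0}$, and $X(t)$ satisfies

$$X(t)=x_{0}+\int_{0}^{t}F\big(X(s)\big)\,ds$$

for all $t\geq 0$. Then

$$\lim_{n\to\infty}\sup_{s\leq t}\big|X_{n}(s)-X(s)\big|=0\quad\text{a.s. for all } t\geq 0.$$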
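To see the law of large numbers of Theorem 33 at work, here is a minimal Python sketch of a one-dimensional density dependent population process in the Ethier-Kurtz form, with jumps $k\to k+\ell$ at rate $n\beta_{\ell}(k/n)$, simulated by Gillespie's algorithm and compared with the solution of $X'(t)=F(X(t))$. The rate functions $\beta_{+1}(x)=2x$ and $\beta_{-1}(x)=x+x^{2}$ are hypothetical choices made only for illustration (they are not the coalescent rates discussed in [5]); they are bounded and Lipschitz on compact sets and give the logistic drift $F(x)=x-x^{2}$. The function names are hypothetical.

```python
import math
import random


def simulate_density_dependent(n, x0, t_end, seed=0):
    """Gillespie simulation of a one-dimensional density dependent process.

    Jumps k -> k + l occur at rate n * beta_l(k / n), with the illustrative
    (hypothetical) rate functions
        beta_{+1}(x) = 2 x,   beta_{-1}(x) = x + x**2,
    so that F(x) = sum_l l * beta_l(x) = x - x**2 (logistic drift).
    Returns X_n(t_end) = k / n.
    """
    rng = random.Random(seed)
    k = round(n * x0)
    t = 0.0
    while k > 0:
        x = k / n
        rate_up = n * 2.0 * x
        rate_down = n * (x + x * x)
        total = rate_up + rate_down
        t += rng.expovariate(total)  # waiting time to the next jump
        if t > t_end:
            break
        if rng.random() * total < rate_up:
            k += 1
        else:
            k -= 1
    return k / n


def logistic_ode(x0, t):
    """Solution of X'(t) = X(t) (1 - X(t)) with X(0) = x0."""
    return x0 * math.exp(t) / (1.0 - x0 + x0 * math.exp(t))
```

Running `simulate_density_dependent(n, 0.1, 5.0)` for $n=100$, $1000$, $10000$ and comparing with `logistic_ode(0.1, 5.0)`, which is about $0.943$, shows the simulated density settling onto the deterministic curve with fluctuations that shrink as $n$ grows.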

Appendix B Characterization of exponential random variables

This section collects several characterization results for exponential random variables that we use in this manuscript. We refer the reader to [12, 7] for more on characterizations of exponential random variables.

The following result of K. S. Lau and C. R. Rao [88], which implies a characterization of exponential random variables, is used to establish Lemma 20. See [14] for more on Integrated Cauchy Functional Equations.

Lemma 32 ([88]).

Consider an Integrated Cauchy Functional Equation

[TABLE]

where $\mu(\cdot)$ is a p.d.f. on $[0,\infty)$ and $G(x)>0$ for $x$ in the support of $\mu$ . Then, $G(x)=e^{-\lambda x}$ for some $\lambda\geq 0$ .

The following characterization of exponential random variables follows immediately from Lemma 32.

Lemma 33.

Consider a p.d.f. $g(x)$ defined on $[0,\infty)$ , and satisfying

[TABLE]

Then, $g(x)$ is an exponential density function.

Proof.

Let $G(a)=\int\limits_{a}^{\infty}g(x)\,dx$ . Then, integrating (233), we have for all $a\geq 0$ ,

[TABLE]

where $\mu(y)=1-G^{2}(y)$ is a p.d.f. on $[0,\infty)$. Equation (234) is of the form (232). Hence, by Lem. 32, $G(x)=e^{-\lambda x}$, where $\lambda>0$ since $g$ is a p.d.f. ∎

Next, we recall Parseval's identity, which is used in the proof of the characterization Lemma 34.

Theorem 34 (Parseval’s identity, [138]).

For a pair of cumulative distribution functions $F(x)$ and $G(x)$ with respective characteristic functions $\widehat{f}(s)$ and $\widehat{g}(s)$, the following identity holds for all $s\in\mathbb{R}$:

[TABLE]

We give yet another characterization of the exponential p.d.f. $\phi_{\lambda}(x)=\lambda e^{-\lambda x}{\bf 1}_{\{x\geq 0\}}$ as defined in (69).

Lemma 34.

Consider a p.d.f. $g(x)$ defined on $[0,\infty)$ , and satisfying

[TABLE]

Then, $g(x)=\phi_{\lambda}(x)$ .

Proof.

Observe that $\phi_{\lambda}(x)$ satisfies

[TABLE]

Thus,

[TABLE]

Hence, for the two pairs of independent random variables

[TABLE]

we have

[TABLE]

Therefore, for the characteristic functions $\widehat{\phi}_{\lambda}$ and $\widehat{g}$ , we have

[TABLE]

Observe that (238) can also be obtained from (237) by multiplying both sides by $e^{isx}$ and integrating.

Next, from Parseval's identity (Thm. 34) and (237), we have for all $s\geq 0$,

[TABLE]

Therefore,

[TABLE]

and (238) implies for any $x>0$ ,

[TABLE]

∎

Appendix C Notations

[TABLE]

Appendix D Standard distributions

[TABLE]

Appendix E Tree functions and mappings

[TABLE]

Acknowledgements

First and foremost, we are grateful to Ed Waymire for his continuing advice, encouragement, and support on more levels than one. We would like to thank Amir Dembo for providing valuable feedback, including the very idea of writing this survey; Jim Pitman for his comments and suggesting relevant publications; and Tom Kurtz for his insight regarding infinite dimensional population processes.

We would like to express our appreciation to the colleagues with whom we discussed this work at different stages of its preparation: Maxim Arnold, Krishna Athreya, Bruno Barbosa, Vladimir Belitsky, Yehuda Ben-Zion, Robert M. Burton, Mickael Checkroun, Evgenia Chunikhina, Steve Evans, Efi Foufoula-Georgiou, Andrei Gabrielov, Michael Ghil, Mark Meerschaert, George Molchan, Peter T. Otto, Scott Peckham, Victor Pérez-Abreu, Jorge Ramirez, Andrey Sarantsev, Sunder Sethuraman, Alejandro Tejedor, Enrique Thomann, Donald L. Turcotte, Guochen Xu, Anatoly Yambartsev, and many others. Finally, we thank the participants of the workshop Random Trees: Structure, Self-similarity, and Dynamics that took place during April 23-27, 2018, at the Centro de Investigación en Matemáticas (CIMAT), Guanajuato, México, for sharing their knowledge and research results.

YK would like to express his gratitude to IME - University of São Paulo (USP), São Paulo, Brazil, for hosting him during his 2018-2019 sabbatical.

Bibliography


[1] R. Abraham, J.-F. Delmas, H. He, Pruning Galton-Watson trees and tree-valued Markov processes, Ann. Inst. H. Poincaré Probab. Statist., 48 (3) (2012) 688–705.
[2] M. Abramowitz and I. A. Stegun, Handbook of mathematical functions: with formulas, graphs, and mathematical tables, Courier Corporation, 55 (1964).
[3] D. J. Aldous, The continuum random tree I, The Annals of Probability, 19 (1) (1991) 1–28.
[4] D. J. Aldous, The continuum random tree III, The Annals of Probability, 21 (1) (1993) 248–289.
[5] D. J. Aldous, Deterministic and stochastic models for coalescence (aggregation and coagulation): a review of the mean-field theory for probabilists, Bernoulli, 5 (1999) 3–48.
[6] D. J. Aldous and J. Pitman, Tree-valued Markov chains derived from Galton-Watson processes, Ann. Inst. H. Poincaré Probab. Statist., 34 (5) (1998) 637–686.
[7] B. C. Arnold and J. S. Huang, in Exponential distribution: theory, methods and applications (edited by K. Balakrishnan and A. P. Basu), CRC Press, Taylor & Francis Group (1996).
[8] V. I. Arnold, On the representation of continuous functions of three variables by superpositions of continuous functions of two variables, Matematicheskii Sbornik, 48 (90), no. 1 (1959) 3–74.


