One- versus multi-component regular variation and extremes of Markov trees

Johan Segers

arXiv:1902.02226·math.PR·October 5, 2020

TL;DR

This paper develops a comprehensive theory of multi-component regular variation for Markov trees, analyzing how tail behaviors change with different conditioning variables and establishing connections via a generalized time change formula.

Contribution

It introduces a novel multi-component regular variation framework for Markov trees, extending tail analysis beyond single-component cases and linking tail trees through a generalized formula.

Findings

01. Weak convergence to tail trees under tail assumptions
02. Balance of marginal tails leads to a generalized time change formula
03. Multi-component regular variation applies to broader models beyond Markov trees

Equations

$$(\Theta_{1,2},\Theta_{1,3},\Theta_{1,4})=(M_{1,2},\;M_{1,2}M_{2,3},\;M_{1,2}M_{2,4})$$

$$(\Theta_{3,2},\Theta_{3,1},\Theta_{3,4})=(M_{3,2},\;M_{3,2}M_{2,1},\;M_{3,2}M_{2,4}).$$

$$X_{A}\perp\!\!\!\perp X_{B}\mid X_{S}$$

$$\mathcal{L}(X_{b}/x_{a}\mid X_{a}=x_{a})\stackrel{d}{\longrightarrow}\mu_{e},\qquad x_{a}\to\infty.$$

$$\forall\eta>0,\qquad\lim_{\delta\downarrow 0}\limsup_{x\to\infty}\sup_{\varepsilon\in[0,\delta]}\operatorname{\mathbb{P}}(X_{b}/x>\eta\mid X_{a}=\varepsilon x)=0.$$

$$\mathcal{L}(X/X_{u}\mid X_{u}=t)\stackrel{d}{\longrightarrow}\Theta_{u}=(\Theta_{u,v})_{v\in V},\qquad t\to\infty,$$

$$\forall v\in V\setminus\{u\},\qquad\Theta_{u,v}=\prod_{e\in[u\rightsquigarrow v]}M_{e}.$$

$$\begin{array}{l}\Theta_{1,1}=1\\ \Theta_{1,2}=M_{1,2}\\ \Theta_{1,3}=M_{1,3}\\ \Theta_{1,4}=M_{1,4}\\ \Theta_{1,5}=M_{1,4}M_{4,5}\\ \Theta_{1,6}=M_{1,4}M_{4,6}\\ \Theta_{1,7}=M_{1,4}M_{4,5}M_{5,7}\end{array}$$

$$\lim_{t\to\infty}\operatorname{\mathbb{E}}[g(X_{1:(d-1)}/t)\mid X_{0}=t]=\operatorname{\mathbb{E}}[g(\Theta_{0,1:(d-1)})],$$

$$\lim_{t\to\infty}\operatorname{\mathbb{E}}[f(X_{1:d}/t)\mid X_{0}=t]=\operatorname{\mathbb{E}}[f(\Theta_{0,1:d})].$$

$$\begin{aligned}
&\bigl\lvert\operatorname{\mathbb{E}}[f(X_{1:d}/t)\mid X_{0}=t]-\operatorname{\mathbb{E}}[f(\Theta_{0,1:d})]\bigr\rvert\\
&\quad\leqslant\bigl\lvert\operatorname{\mathbb{E}}[\mathbb{1}(X_{k}/t\geqslant\delta)f(X_{1:d}/t)\mid X_{0}=t]-\operatorname{\mathbb{E}}[\mathbb{1}(\Theta_{k}\geqslant\delta)f(\Theta_{0,1:d})]\bigr\rvert\\
&\qquad{}+\bigl\lvert\operatorname{\mathbb{E}}[\mathbb{1}(X_{k}/t<\delta)f(X_{1:d}/t)\mid X_{0}=t]-\operatorname{\mathbb{E}}[\mathbb{1}(\Theta_{k}<\delta)f(\Theta_{0,1:d})]\bigr\rvert.
\end{aligned}$$

$$\operatorname{\mathbb{E}}\left[\mathbb{1}\left(\frac{X_{k}}{t}\geqslant\delta\right)f\left(\frac{X_{1:d}}{t}\right)\,\middle|\,X_{0}=t\right]=\int_{x_{1:(d-1)}}\mathbb{1}\left(\frac{x_{k}}{t}\geqslant\delta\right)\operatorname{\mathbb{E}}\left[f\left(\frac{x_{1:(d-1)}}{t},\frac{X_{d}}{t}\right)\,\middle|\,X_{k}=x_{k}\right]\operatorname{\mathbb{P}}\left[X_{1:(d-1)}\in\mathrm{d}x_{1:(d-1)}\mid X_{0}=t\right].$$

$$\operatorname{\mathbb{E}}\left[\mathbb{1}\left(\frac{X_{k}}{t}\geqslant\delta\right)f\left(\frac{X_{1:d}}{t}\right)\,\middle|\,X_{0}=t\right]=\int_{y_{1:(d-1)}}g_{t}(y_{1:(d-1)})\operatorname{\mathbb{P}}\left[\frac{X_{1:(d-1)}}{t}\in\mathrm{d}y_{1:(d-1)}\,\middle|\,X_{0}=t\right],$$

$$g_{t}(y_{1:(d-1)})=\mathbb{1}(y_{k}\geqslant\delta)\operatorname{\mathbb{E}}\left[f\left(y_{1:(d-1)},\,y_{k}\frac{X_{d}}{ty_{k}}\right)\,\middle|\,X_{k}=ty_{k}\right].$$

$$g(y_{1:(d-1)})=\mathbb{1}(y_{k}\geqslant\delta)\operatorname{\mathbb{E}}[f(y_{1:(d-1)},\,y_{k}M_{k,d})].$$

$$\lim_{t\to\infty}g_{t}\bigl(y_{1:(d-1)}(t)\bigr)=g(y_{1:(d-1)}).$$

$$\int_{y_{1:(d-1)}}g(y_{1:(d-1)})\operatorname{\mathbb{P}}[\Theta_{1:(d-1)}\in\mathrm{d}y_{1:(d-1)}]=\int_{y_{1:(d-1)}}\mathbb{1}(y_{k}\geqslant\delta)\operatorname{\mathbb{E}}[f(y_{1:(d-1)},\,y_{k}M_{k,d})]\operatorname{\mathbb{P}}[\Theta_{1:(d-1)}\in\mathrm{d}y_{1:(d-1)}].$$

$$\operatorname{\mathbb{E}}[\mathbb{1}(\Theta_{k}\geqslant\delta)f(\Theta_{1:(d-1)},\,\Theta_{k}M_{k,d})]=\operatorname{\mathbb{E}}[\mathbb{1}(\Theta_{k}\geqslant\delta)f(\Theta_{1:d})]$$

$$\operatorname{\mathbb{P}}(X_{k}/t<\delta\mid X_{0}=t)+\operatorname{\mathbb{P}}(\Theta_{k}<\delta).$$

$$\begin{aligned}
&\operatorname{\mathbb{E}}\left[\mathbb{1}\left(\frac{X_{k}}{t}<\delta\right)f\left(\frac{X_{1:d}}{t}\right)\,\middle|\,X_{0}=t\right]-\operatorname{\mathbb{E}}[\mathbb{1}(\Theta_{k}<\delta)f(\Theta_{1:d})]\\
&\quad\leqslant\operatorname{\mathbb{E}}\left[\mathbb{1}\left(\frac{X_{k}}{t}<\delta\right)f\left(\frac{X_{1:d}}{t}\right)\,\middle|\,X_{0}=t\right]-\operatorname{\mathbb{E}}\left[\mathbb{1}\left(\frac{X_{k}}{t}<\delta\right)f\left(\frac{X_{1:(d-1)}}{t},0\right)\,\middle|\,X_{0}=t\right]\\
&\qquad{}+\operatorname{\mathbb{E}}\left[\mathbb{1}\left(\frac{X_{k}}{t}<\delta\right)f\left(\frac{X_{1:(d-1)}}{t},0\right)\,\middle|\,X_{0}=t\right]-\operatorname{\mathbb{E}}[\mathbb{1}(\Theta_{k}<\delta)f(\Theta_{1:(d-1)},0)]\\
&\qquad{}+\operatorname{\mathbb{E}}[\mathbb{1}(\Theta_{k}<\delta)f(\Theta_{1:(d-1)},0)]-\operatorname{\mathbb{E}}[\mathbb{1}(\Theta_{k}<\delta)f(\Theta_{1:d})].
\end{aligned}$$

$$\operatorname{\mathbb{E}}\left[\mathbb{1}\left(\frac{X_{k}}{t}<\delta\right)\left\lvert f\left(\frac{X_{1:d}}{t}\right)-f\left(\frac{X_{1:(d-1)}}{t},0\right)\right\rvert\,\middle|\,X_{0}=t\right]\leqslant\operatorname{\mathbb{E}}\left[\mathbb{1}\left(\frac{X_{k}}{t}<\delta\right)\min\left(1,L\frac{X_{d}}{t}\right)\,\middle|\,X_{0}=t\right].$$

$$\int_{[0,\delta)}\operatorname{\mathbb{E}}\left[\min\left(1,L\frac{X_{d}}{t}\right)\,\middle|\,X_{k}=\varepsilon t\right]\operatorname{\mathbb{P}}\left[\frac{X_{k}}{t}\in\mathrm{d}\varepsilon\,\middle|\,X_{0}=t\right].$$

$$\begin{aligned}
\operatorname{\mathbb{E}}[\min(1,LX_{d}/t)\mid X_{k}=\varepsilon t]
&=\operatorname{\mathbb{E}}[\min(1,LX_{d}/t)\,\mathbb{1}(X_{d}/t\leqslant\eta)\mid X_{k}=\varepsilon t]\\
&\quad{}+\operatorname{\mathbb{E}}[\min(1,LX_{d}/t)\,\mathbb{1}(X_{d}/t>\eta)\mid X_{k}=\varepsilon t]\\
&\leqslant L\eta+\operatorname{\mathbb{P}}(X_{d}/t>\eta\mid X_{k}=\varepsilon t).
\end{aligned}$$

$$L\eta+\sup_{\varepsilon\in[0,\delta)}\operatorname{\mathbb{P}}(X_{d}>\eta t\mid X_{k}=\varepsilon t).$$

$$\limsup_{\delta\downarrow 0}\limsup_{t\to\infty}\operatorname{\mathbb{E}}[\mathbb{1}(X_{k}/t<\delta)\min(1,LX_{d}/t)\mid X_{0}=t]\leqslant L\eta.$$

$$\operatorname{\mathbb{E}}[\mathbb{1}(\Theta_{k}<\delta)\,\lvert f(\Theta_{1:(d-1)},0)-f(\Theta_{1:d})\rvert]\leqslant\operatorname{\mathbb{E}}[\mathbb{1}(\Theta_{k}<\delta)\min(1,L\Theta_{k}M_{k,d})].$$

$$\bigl\lvert\operatorname{\mathbb{E}}[f(X/X_{u})\mid X_{u}>t]-\operatorname{\mathbb{E}}[f(\Theta_{u})]\bigr\rvert\leqslant\frac{1}{\operatorname{\mathbb{P}}(X_{u}>t)}\int_{(t,\infty)}\bigl\lvert\operatorname{\mathbb{E}}[f(X/X_{u})\mid X_{u}=s]-\operatorname{\mathbb{E}}[f(\Theta_{u})]\bigr\rvert\,\operatorname{\mathbb{P}}(X_{u}\in\mathrm{d}s).$$

Full text

Abstract

A Markov tree is a random vector indexed by the nodes of a tree whose distribution is determined by the distributions of pairs of neighbouring variables and a list of conditional independence relations. Upon an assumption on the tails of the Markov kernels associated to these pairs, the conditional distribution of the self-normalized random vector when the variable at the root of the tree tends to infinity converges weakly to a random vector of coupled random walks called tail tree. If, in addition, the conditioning variable has a regularly varying tail, the Markov tree satisfies a form of one-component regular variation. Changing the location of the root, that is, changing the conditioning variable, yields a different tail tree. When the tails of the marginal distributions of the conditioning variables are balanced, these tail trees are connected by a formula that generalizes the time change formula for regularly varying stationary time series. The formula is most easily understood when the various one-component regular variation statements are tied up to a single multi-component statement. The theory of multi-component regular variation is worked out for general random vectors, not necessarily Markov trees, with an eye towards other models, graphical or otherwise.

keywords:

Conditional independence; graphical model; Hüsler–Reiss distribution; max-linear model; Markov tree; multivariate Pareto distribution; Pickands dependence function; regular variation; root change formula; tail measure; tail tree; time change formula.

Johan Segers, UCLouvain, LIDAM/ISBA, Voie du Roman Pays 20, B-1348 Louvain-la-Neuve, Belgium. Email: [email protected]

1 Introduction

Imagine a random vector $X=(X_{1},\ldots,X_{d})$ of nonnegative variables. One of the components, say $X_{i}$ , is known to have exceeded a large threshold. How does this information affect the conditional distribution of the whole vector $X$ ? There could be a causal link from $X_{i}$ to the other variables $X_{j}$ , perhaps via a network of dependence relations, so that tampering with $X_{i}$ would affect the whole system. Another possibility is that a large value of $X_{i}$ is merely the result of a large value of some other variable $X_{j}$ . The latter event, however, could have consequences for still other variables $X_{k}$ .

Depending on which one of the $d$ components is known to have been exceptionally large, the conditional distribution of $X$ is likely to be different. Still, if high values of two variables $X_{i}$ and $X_{j}$ are not unlikely to arrive together, the conditional distribution of $X$ given that $X_{i}$ is large must be connected to the one given that $X_{j}$ is large.

In this paper, these questions are studied for general random vectors using the language of regular variation. The answers are worked out for the particular case that $X$ is a Markov tree. A large value at a particular node is found to spread through the tree via independent increments along the edges. The joint limit distribution is the one of a vector of coupled geometric random walks. The couplings occur through the common edges of different paths starting at the same root node.

Graphical models, of which Markov trees are a special case, bring structure and sparsity to the web of dependence relations between many random variables [23, 38]. Extreme value theory for such models is a fairly recent subject. In [1], a metric that takes the distance along a river into account underlies a spatial model for extremes of river networks. Recursive max-linear models on directed acyclic graphs are proposed in [13] and put to work in [9, 14]. In [17], the density of a multivariate Pareto distribution is factorized through a version of the Hammersley–Clifford theorem. Such factorizations are also the theme in [10], where they form the basis of new inference methods for extremes of graphical models, including the identification of the graphical structure itself. Multivariate Hüsler–Reiss extreme-value copulas based on Gaussian Markov trees and higher-order truncated vines are introduced in [24], who propose composite likelihood methods based on bivariate margins to estimate the parameters.

Multivariate Pareto distributions arise as weak limits of normalized random vectors conditionally on the event that at least one component exceeds a high threshold. Although such conditioning events are covered by Theorem 3.9 below, the focus of this paper is rather on the case where the exceedance is known to have occurred at a specific variable. The message hinted at in the title is that both points of view are mathematically equivalent, but that, at least for Markov trees, the one-component limit is particularly elegant, as will be explained next.

1.1 Tail tree of a Markov tree

For a Markov chain, it was discovered in [35] that, conditionally on the event that the series is large at some time instant, the conditional distribution of the future of the system is that of a random walk, a process called tail chain in [26]. For light-tailed marginal distributions, this random walk is additive, and for heavy-tailed margins it is geometric, i.e., multiplicative, which is the convention used in this paper.

A Markov tree can be viewed as a coupled collection of Markov chains with common stretches. Take for instance the four-variate Markov tree in Figure 1. The nodes of the tree are $\{1,2,3,4\}$ and the three pairs of neighbours are $\{1,2\}$ , $\{2,3\}$ and $\{2,4\}$ . The vector $(X_{1},X_{2},X_{3})$ is a Markov chain, and so is $(X_{1},X_{2},X_{4})$ . These two chains are coupled via the common pair $(X_{1},X_{2})$ . Conditionally on $X_{2}$ , the variables $X_{1}$ , $X_{3}$ and $X_{4}$ are independent, since any path that connects two of the three nodes 1, 3 and 4 passes through node 2. This conditional independence property together with the distributions of the three pairs $(X_{1},X_{2})$ , $(X_{2},X_{3})$ and $(X_{2},X_{4})$ determines the joint distribution of $(X_{1},X_{2},X_{3},X_{4})$ .

For the moment, assume that the four variables have the same, regularly varying tail function. The set-up involving regular variation will be further motivated in Section 1.2. The effect (not necessarily causal) on $X_{2}$ of a large value at $X_{1}$ is via a multiplicative increment $M_{1,2}$ whose distribution is equal to the weak limit of $X_{2}/X_{1}$ conditionally on $X_{1}=t$ as $t\to\infty$ . The existence of this limit is an assumption on the Markov kernel induced by the distribution of the pair $(X_{1},X_{2})$ . Similarly, a large value at $X_{2}$ affects $X_{3}$ and $X_{4}$ via the increments $M_{2,3}$ and $M_{2,4}$ , respectively. The effect of $X_{1}$ on $X_{3}$ is then through the composite increment $M_{1,2}M_{2,3}$ , whereas on $X_{4}$ it is through $M_{1,2}M_{2,4}$ . The conditional independence property ensures that the increments $M_{1,2}$ , $M_{2,3}$ and $M_{2,4}$ are mutually independent. The common edge $(1,2)$ on the paths from node $1$ to node $3$ and from node $1$ to node $4$ induces dependence between the two tail chains $(M_{1,2},M_{1,2}M_{2,3})$ and $(M_{1,2},M_{1,2}M_{2,4})$ via the common increment $M_{1,2}$ . In this paper, the random vector

$$(\Theta_{1,2},\Theta_{1,3},\Theta_{1,4})=(M_{1,2},\;M_{1,2}M_{2,3},\;M_{1,2}M_{2,4})\tag{1}$$

is called the tail tree induced by $X$ with root at node $u=1$ .
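As a rough illustration of this construction, the following Python sketch draws the tail tree with root at node 1 from three independent increments. The log-normal law of the increments, the parameter values and all function names are assumptions made for the example only; the text prescribes no particular distribution for $M_{1,2}$, $M_{2,3}$ and $M_{2,4}$.

```python
import math
import random


def sample_tail_tree_root1(rng, mu=-0.5, sigma=1.0):
    """Draw (Theta_{1,2}, Theta_{1,3}, Theta_{1,4}) for the four-node tree."""
    # One independent multiplicative increment per directed edge.
    # The log-normal law is an illustrative assumption, not taken from the text.
    m12 = math.exp(rng.gauss(mu, sigma))
    m23 = math.exp(rng.gauss(mu, sigma))
    m24 = math.exp(rng.gauss(mu, sigma))
    # Both composite increments reuse m12, the increment on the common edge (1, 2).
    return m12, m12 * m23, m12 * m24


def correlation(xs, ys):
    """Plain sample correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
draws = [sample_tail_tree_root1(rng) for _ in range(20000)]
log13 = [math.log(d[1]) for d in draws]
log14 = [math.log(d[2]) for d in draws]
rho = correlation(log13, log14)
print(f"sample correlation of log Theta_13 and log Theta_14: {rho:.3f}")
```

Under the log-normal assumption, $\log\Theta_{1,3}$ and $\log\Theta_{1,4}$ share the term $\log M_{1,2}$, so their theoretical correlation is $1/2$; the printed sample value should be close to that.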

The tail tree represents a network of stochastic dependence relations that are not necessarily causal. Suppose the Markov tree in Figure 1 represents water levels at four locations on a river network. If water flows from left to right, node 2 represents a point where the stream branches into two channels, as occurs for instance in a river delta. If water flows from right to left, however, node 2 represents the junction of two branches coming from nodes 3 and 4 into a larger stream flowing towards node 1. In the first case, the tail tree describes how a high water level at the upstream node 1 may cause high water levels at various locations in the delta further downstream. In the second case, however, it is nodes 3 and 4 that are situated upstream, and the tail tree models the sources of a high water volume at the downstream site 1. Still other set-ups are possible, such as for instance node 3 being upstream and nodes 1 and 4 being downstream: high water levels at nodes 1 and 4 are then related through a common cause at node 2, which can itself perhaps be traced back to node 3.

Whatever the causal relationships within $X$ , it may make sense to change the conditioning variable. In Figure 1, for instance, suppose it is known that a large value has occurred at node 3 rather than at node 1. Tracing the paths from node 3 to the three other nodes yields the tail tree with root at node $u=3$ :

$$(\Theta_{3,2},\Theta_{3,1},\Theta_{3,4})=(M_{3,2},\;M_{3,2}M_{2,1},\;M_{3,2}M_{2,4}).\tag{2}$$

The tail trees in (1) and (2) have a similar structure. The two edges on the path between the root nodes 1 and 3 have changed direction, however. The edge from node 2 to node 4 is common to both tail trees.
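The effect of moving the root can be sketched symbolically. The Python snippet below, a minimal sketch with hypothetical function names, traces the path from a chosen root to every other node of the four-node tree and prints the product of increments along it. Each pair of neighbours is assumed to be listed once, in arbitrary order.

```python
from collections import deque


def tail_tree_paths(edges, root):
    """Map each non-root node v to the directed edges on the path from root to v."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    paths = {root: []}
    queue = deque([root])
    while queue:
        a = queue.popleft()
        for b in neighbours[a]:
            if b not in paths:
                # Extend the path to the parent a by the edge directed from a to b.
                paths[b] = paths[a] + [(a, b)]
                queue.append(b)
    del paths[root]
    return paths


def as_product(path):
    """Render a path as a product of increments, e.g. 'M_{3,2} M_{2,1}'."""
    return " ".join(f"M_{{{a},{b}}}" for a, b in path)


FIGURE_1_EDGES = [(1, 2), (2, 3), (2, 4)]  # the three pairs of neighbours

for root in (1, 3):
    for v, path in sorted(tail_tree_paths(FIGURE_1_EDGES, root).items()):
        print(f"Theta_{{{root},{v}}} = {as_product(path)}")
```

With root 3, the edges between nodes 3 and 1 appear as $(3,2)$ and $(2,1)$, reversed with respect to root 1, while the edge $(2,4)$ keeps its direction.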

For each pair $\{a,b\}$ of neighbouring nodes, the choice of the root node $u$ determines which of the two increments appears in the tail tree: $M_{a,b}$ from $X_{a}$ to $X_{b}$ or $M_{b,a}$ from $X_{b}$ to $X_{a}$ . The distributions of $M_{a,b}$ and $M_{b,a}$ are connected by an expression that involves the marginal distributions of $X_{a}$ and $X_{b}$ . For stationary and reversible Markov chains, this relation underlies a sufficiency property discovered in [4]. For tail chains of not necessarily reversible Markov chains, it was described in [21, 33] and for tail processes of regularly varying stationary time series in [2] via the time change formula. This formula can be understood most easily through the connection between the tail process and the tail measure [7, 27, 32], and this is also the way in which the root change formula in Corollary 3.2 below will be derived, but then without the assumption of stationarity and for general random vectors, not necessarily Markov trees.

1.2 Regular variation

The language of regularly varying functions and measures provides a rich medium through which to express limit theorems. Recall that a positive, Lebesgue measurable function $f$ defined on a neighbourhood of infinity is regularly varying with index $\tau\in\mathbb{R}$ if $\lim_{t\to\infty}f(\lambda t)/f(t)=\lambda^{\tau}$ for all $\lambda\in(0,\infty)$. If $X$ is a nonnegative random variable with unbounded support, cumulative distribution function $F(x)=\operatorname{\mathbb{P}}(X\leqslant x)$ and tail function $\overline{F}=1-F$, regular variation of $\overline{F}$ with index $-\alpha<0$ is equivalent to weak convergence of the conditional distribution of $X/t$ given that $X>t$ to a Pareto random variable $Y$ with index $\alpha$, i.e., $\operatorname{\mathbb{P}}(Y>y)=y^{-\alpha}$ for all $y\in[1,\infty)$. We write $\mathcal{L}(X/t\mid X>t)\stackrel{d}{\longrightarrow}\operatorname{Pa}(\alpha)$ as $t\to\infty$, where $\mathcal{L}(Z\mid A)$ denotes the conditional distribution of the random object $Z$ given the event $A$, the arrow $\stackrel{d}{\longrightarrow}$ denotes convergence in distribution, and $\operatorname{Pa}(\alpha)$ denotes the said Pareto distribution.
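The equivalence can be checked numerically on a toy example. The sketch below assumes a hypothetical tail function $\overline{F}(x)=x^{-\alpha}(1+1/x)$ for $x\geqslant 2$, which is regularly varying with index $-\alpha$ because the factor $1+1/x$ tends to one. It compares $\operatorname{\mathbb{P}}(X/t>y\mid X>t)=\overline{F}(ty)/\overline{F}(t)$ with the Pareto value $y^{-\alpha}$ for increasing thresholds $t$.

```python
def tail(x, alpha=2.0):
    """Hypothetical tail function x^(-alpha) * (1 + 1/x), valid for x >= 2."""
    return x ** (-alpha) * (1.0 + 1.0 / x)


def conditional_tail(y, t, alpha=2.0):
    """P(X/t > y | X > t) = tail(t*y) / tail(t), for y >= 1."""
    return tail(t * y, alpha) / tail(t, alpha)


def pareto_tail(y, alpha=2.0):
    """Tail function of the Pa(alpha) limit: y^(-alpha) for y >= 1."""
    return y ** (-alpha)


# Approximation error at y = 3 for a sequence of growing thresholds.
errors = {t: abs(conditional_tail(3.0, t) - pareto_tail(3.0)) for t in (1e1, 1e3, 1e6)}
for t, err in errors.items():
    print(f"t = {t:g}: |P(X/t > 3 | X > t) - 3^(-2)| = {err:.2e}")
```

The error shrinks as $t$ grows, in line with the weak convergence to $\operatorname{Pa}(\alpha)$.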

For multivariate distributions, regular variation can be described via multivariate cumulative distribution functions as well, but an approach via convergence of Borel measures is more versatile. Let the state space be $\mathbb{S}=[0,\infty)^{d}$. Generalizations to star-shaped metric spaces or abstract cones as in [7, 18, 25, 34] are left for further work. Let $I\subset\{1,\ldots,d\}$ denote the non-empty set of indices $i$ of variables of which the conditioning event $X_{i}>t$ is of possible interest. The marginal distributions of $X_{i}$ for $i\in I$ are assumed to be regularly varying and the ratios of their tail functions are assumed to converge to positive constants. This set-up is a bit more general than the one of identical margins and comes at little technical or notational cost.

The measures involved may have infinite mass but need to assign finite values to sets that remain bounded away from $\{x\in\mathbb{S}:\forall i\in I,x_{i}=0\}$ or $\{x\in\mathbb{S}:x_{i}=0\}$, depending on the conditioning event. The topology on the space of such measures will be the one proposed in [25], extending [18], and resembles the one of vague convergence of measures, but avoiding the need to consider artificially compactified spaces. Regular variation is defined as convergence of $b(t)\,\operatorname{\mathbb{P}}(X/t\in\,\cdot\,)$ to a limit measure called tail measure. Here, $b(t)>0$ is a scale function tending to infinity and calibrated to the marginal distributions of $X_{i}$ for $i\in I$.

It is instructive to formulate statements in terms of weak convergence of distributions. For a high threshold $t$ tending to infinity and for a component $i\in I$ , consider the asymptotic distribution of the rescaled random vector $X/t$ given that $X_{i}>t$ . Decompose $X/t$ as $(X_{i}/t,X/X_{i})$ . Here, $X_{i}/t$ represents the overall level of $X$ with respect to $t$ whereas $X/X_{i}$ represents a self-normalized version of $X$ . Convergence in distribution of $(X_{i}/t,X/X_{i})$ given $X_{i}>t$ as $t\to\infty$ is a special case of what is called one-component regular variation in [17], explored already in [16, 30] for the bivariate case but allowing for affine normalizations. The random variable $X_{i}/t$ is asymptotically $\operatorname{Pa}(\alpha)$ distributed and independent of $X/X_{i}$ , whose weak limit, denoted by $\Theta_{i}=(\Theta_{i,j})_{j=1}^{d}$ , captures extremal dependence within $X$ given that $X_{i}$ is large. Letting the index $i$ run through $I$ produces multiple such one-component regular variation statements, which, together, are equivalent to what can be called multi-component regular variation. The limit distributions $\Theta_{i}$ that arise for various indices $i$ must be mutually consistent, and the tail measure mentioned at the end of the previous paragraph embraces them all at once.

In Section 3, the focus is on tying together multiple one-component regular variation limits. The theory is worked out for general random vectors, not necessarily Markov trees. A number of results in that section have already been formulated in the literature in one way or another, in slightly different settings. Some of the equivalence relations in Theorem 3.1, for instance, resemble those in [17, Theorem 1.4] and [34, Proposition 3.1]. The model consistency property between limit measures in Theorem 3.1(ii) is formulated in [6, Section 2] for the bivariate case. The root change formula in Corollary 3.4 extends the time change formula for regularly varying stationary time series stemming from [2] and studied extensively in [7, 20]. Multivariate Pareto distributions as in Theorem 3.9 are foreshadowed in [29, Section 6.3] and appear in [11, 31] when $\rho(x)=\max(x_{1},\ldots,x_{d})$ and in [8] for more general functionals $\rho$ . These are just a few connections, and the above list is by no means intended to be complete.

The set-up involving regular variation is intended to serve two purposes. First, to model tail dependence within a vector of random variables which have been transformed to the same, heavy-tailed distribution, such as the unit-Fréchet distribution, as is common in multivariate extreme value theory. Second, to model the joint distribution of a vector of regularly varying random variables, not necessarily identically distributed, but with equivalent tails, such as returns on financial portfolios composed of the same basket of underlying assets. The latter framework is more general than the former and comes at little additional notational cost.

1.3 Outline

For a Markov tree $X$ , convergence as $t\to\infty$ of the conditional distribution of $X/X_{u}$ given that $X_{u}=t$ is proved in Section 2. The main assumption is that, for edges $e=(a,b)$ directed away from the root $u$ , the conditional distribution of $X_{b}/X_{a}$ given $X_{a}=t$ converges as $t\to\infty$ . No regular variation is needed yet.

The tail trees pertaining to different roots $u$ can be linked up thanks to the theory of one- and multi-component regular variation developed in Section 3. The results do not rely on the Markov property and cover quite general random vectors $X$ on $[0,\infty)^{d}$ , as is illustrated briefly for max-linear models. An interesting special case of these are the recursive max-linear structural equation models introduced in [13], featuring a causal structure induced by a directed acyclic graph. Most of the proofs of this section are deferred to the Appendix.

When combined, the results in Sections 2 and 3 serve to uncover the regular variation properties of Markov trees in Section 4. The common special case that the joint distribution of the Markov tree is absolutely continuous with respect to Lebesgue measure is the subject of Section 5. The theory then simplifies considerably and the limit distribution with respect to a single root $u$ is already sufficient to reconstruct the limit distributions with respect to all other possible roots $\bar{u}$.

In Sections 4 and 5, the distributions of the increments of the tail trees are calculated in case the pair distributions are max-stable, not necessarily absolutely continuous. For the Hüsler–Reiss max-stable distribution, the tail tree is multivariate log-normal, constructed from partial sums of independent normal random variables along the edges of the tree.

2 The spectral tail tree of a Markov tree

A (finite) graph is a pair $(V,E)$ where $V$ is a non-empty finite set of vertices or nodes and where $E\subset V\times V$ is a set of edges. Self-loops are excluded, i.e., $(u,u)\not\in E$ for all $u\in V$ . To avoid trivialities, $V$ is assumed to have at least two elements. Two nodes are neighbours if they are joined by an edge. A graph is undirected if $(a,b)\in E$ implies $(b,a)\in E$ . A path from a node $u$ to a node $v$ is a collection $\{e_{1},\ldots,e_{n}\}\subset E$ of edges such that $e_{k}=(u_{k-1},u_{k})$ for all $k=1,\ldots,n$ , for $n+1$ distinct nodes $u_{0},u_{1},\ldots,u_{n}\in V$ such that $u_{0}=u$ and $u_{n}=v$ . An undirected tree $\mathcal{T}=(V,E)$ is an undirected graph such that for any pair of distinct nodes $u$ and $v$ , there exists a unique path from $u$ to $v$ , and this path is then denoted by $[{u}\rightsquigarrow{v}]$ .

Let $\mathcal{T}=(V,E)$ be an undirected tree and let $X=(X_{v})_{v\in V}$ be a random vector indexed by the nodes of the tree. The pair $(X,\mathcal{T})$ is a Markov tree if it satisfies the global Markov property [23]: whenever $A,B,S$ are disjoint, non-empty subsets of $V$ such that $S$ separates $A$ and $B$ (i.e., any path between a node $a\in A$ and a node $b\in B$ passes through some node in $S$ ), the conditional independence relation

$$X_{A}\perp\!\!\!\perp X_{B}\mid X_{S}\tag{3}$$

holds, where $X_{W}$ denotes the random vector $(X_{v})_{v\in W}$ for $W\subset V$ .
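The separation condition in the global Markov property is purely graph-theoretic and can be checked mechanically. The Python sketch below uses a hypothetical helper `separates` and assumes each pair of neighbours is listed once. It removes the nodes in $S$ and tests whether any node of $B$ is still reachable from $A$.

```python
from collections import deque


def separates(edges, S, A, B):
    """Return True if every path from a node in A to a node in B meets S."""
    blocked = set(S)
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    # Breadth-first search from A in the graph with the nodes of S removed.
    seen = set(A) - blocked
    queue = deque(seen)
    while queue:
        a = queue.popleft()
        for b in neighbours.get(a, []):
            if b not in blocked and b not in seen:
                seen.add(b)
                queue.append(b)
    # S separates A and B exactly when no node of B was reached.
    return seen.isdisjoint(B)


FIGURE_1_EDGES = [(1, 2), (2, 3), (2, 4)]
print(separates(FIGURE_1_EDGES, S={2}, A={1}, B={3, 4}))
print(separates(FIGURE_1_EDGES, S={3}, A={1}, B={4}))
```

For the four-node tree, node 2 separates node 1 from nodes 3 and 4, so the property yields $X_{1}\perp\!\!\!\perp(X_{3},X_{4})\mid X_{2}$, whereas node 3 does not separate nodes 1 and 4.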

For an undirected tree $\mathcal{T}=(V,E)$ and a node $u\in V$ , let $\mathcal{T}_{u}=(V,E_{u})$ denote the directed, rooted tree that consists of directing the edges in $E$ outward starting from $u$ . Formally, $E_{u}$ is the subset of $E$ that is obtained by choosing for every pair of edges $(a,b)$ and $(b,a)$ in $E$ the one such that the first node separates the second one from $u$ . If $(a,b)\in E_{u}$ , then $a$ is the (necessarily unique) parent of $b$ in $\mathcal{T}_{u}$ whereas $b$ is a child of $a$ in $\mathcal{T}_{u}$ .
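A possible way to compute $E_{u}$ is a breadth-first traversal from the root, as in the following sketch. The function name and the input format are assumptions: the undirected tree is passed as one unordered pair per edge, rather than as the symmetric set $E$ containing both $(a,b)$ and $(b,a)$.

```python
from collections import deque


def directed_edges(edges, root):
    """Return E_u: every edge oriented away from the root u, in traversal order."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    oriented = []
    visited = {root}
    queue = deque([root])
    while queue:
        a = queue.popleft()
        for b in neighbours[a]:
            if b not in visited:
                # a lies between the root and b, so a is the parent of b.
                visited.add(b)
                oriented.append((a, b))
                queue.append(b)
    return oriented


FIGURE_1_EDGES = [(1, 2), (2, 3), (2, 4)]
E_1 = directed_edges(FIGURE_1_EDGES, root=1)
E_3 = directed_edges(FIGURE_1_EDGES, root=3)
parent_in_T3 = {b: a for a, b in E_3}  # unique parent of each non-root node
print(E_1, E_3, parent_in_T3)
```

Each non-root node appears exactly once as the second entry of an edge in $E_{u}$, which reflects the uniqueness of the parent.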

Let $(X,\mathcal{T})$ be a nonnegative Markov tree, where $\mathcal{T}=(V,E)$ is an undirected tree.

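Condition 2

There exists $u\in V$ with the following two properties.

(i) For every directed edge $e=(a,b)\in E_{u}$, there exists a version of the conditional distribution of $X_{b}$ given $X_{a}$ and a probability measure $\mu_{e}$ on $[0,\infty)$ such that

$$\mathcal{L}(X_{b}/x_{a}\mid X_{a}=x_{a})\stackrel{d}{\longrightarrow}\mu_{e},\qquad x_{a}\to\infty.\tag{4}$$

(ii) For edges $e=(a,b)\in E_{u}$ such that $a\neq u$ and such that there exists an edge $\bar{e}\in[{u}\rightsquigarrow{a}]$ for which $\mu_{\bar{e}}(\{0\})>0$, we have

$$\forall\eta>0,\qquad\lim_{\delta\downarrow 0}\limsup_{x\to\infty}\sup_{\varepsilon\in[0,\delta]}\operatorname{\mathbb{P}}(X_{b}/x>\eta\mid X_{a}=\varepsilon x)=0.\tag{5}$$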

Condition 2(ii) is similar to [26, equation (3.4)] and prevents non-extreme values from causing extreme ones. A similar assumption is [33, equation (2.4)], where [33, Example 7.5] illustrates what can go wrong without it.
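To make part (i) of the condition concrete, consider a toy kernel that is not taken from the paper: $X_{b}=MX_{a}+E$ with $M$ log-normal and $E$ standard exponential, both independent of $X_{a}$. Given $X_{a}=x_{a}$, the ratio $X_{b}/x_{a}=M+E/x_{a}$ converges in distribution to $M$ as $x_{a}\to\infty$. The sketch below estimates $\operatorname{\mathbb{P}}(X_{b}/x_{a}\leqslant 1\mid X_{a}=x_{a})$ by simulation at a low and a high level; all names and parameter values are illustrative. Since the limit law has no atom at zero in this toy kernel, part (ii) is not triggered by such an edge.

```python
import math
import random


def draw_child(x_a, rng):
    """One draw of X_b given X_a = x_a under the toy kernel X_b = M * X_a + E."""
    m = math.exp(rng.gauss(0.0, 0.5))  # multiplicative increment with median 1
    e = rng.expovariate(1.0)           # additive noise, negligible at high levels
    return m * x_a + e


def prob_ratio_below_one(x_a, n=20000, seed=0):
    """Monte Carlo estimate of P(X_b / x_a <= 1 | X_a = x_a)."""
    rng = random.Random(seed)
    hits = sum(draw_child(x_a, rng) / x_a <= 1.0 for _ in range(n))
    return hits / n


p_low = prob_ratio_below_one(1.0)   # conditioning on a non-extreme value
p_high = prob_ratio_below_one(1e6)  # conditioning on a very large value
print(f"x_a = 1: {p_low:.3f}, x_a = 1e6: {p_high:.3f}, limit P(M <= 1) = 0.5")
```

At the high level the estimate should be close to $\operatorname{\mathbb{P}}(M\leqslant 1)=1/2$, while at the low level the additive noise still distorts the ratio noticeably.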

Theorem 2.1

Let $(X,\mathcal{T})$ be a nonnegative Markov tree on $\mathcal{T}=(V,E)$ . Assume Condition 2. Let $(M_{e}:e\in E_{u})$ be a vector of independent random variables such that the law of $M_{e}$ is $\mu_{e}$ for all $e\in E_{u}$ . Then

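$$\mathcal{L}(X/X_{u}\mid X_{u}=t)\stackrel{d}{\longrightarrow}\Theta_{u}=(\Theta_{u,v})_{v\in V},\qquad t\to\infty,\tag{6}$$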

where $\Theta_{u,u}=1$ and

$$\forall v\in V\setminus\{u\},\qquad\Theta_{u,v}=\prod_{e\in[u\rightsquigarrow v]}M_{e}.\tag{7}$$

The random vector $(\Theta_{u,v})_{v\in V}$ is called the tail tree of the Markov tree $(X_{v})_{v\in V}$ , adapting terminology for Markov chains in [26]. In Figure 2, the tail tree is illustrated for a tree with seven nodes. For subvectors $(\Theta_{u,w})_{w\in W}$ where all nodes in $W$ lie on the same path starting at $u$ , the structure of the tail tree is that of a geometric random walk; take for instance $u=1$ and $W=\{1,4,5,7\}$ in Figure 2. The tail tree couples several geometric random walks together through the common edges in the underlying paths: in the same figure, consider for instance the vectors indexed by $\{1,4,5,7\}$ and by $\{1,4,6\}$ , respectively, which share the initial edge $(1,4)$ .
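The product formula (7) lends itself to a short simulation. The sketch below is only an illustration: the edge list is an assumption meant to match the seven-node tree of Figure 2, the increments are taken to be standard log-normal, and `sample_tail_tree` is a hypothetical helper. It walks outward from the root and multiplies the parent's value by a fresh independent increment on each edge.

```python
import math
import random
from collections import deque

# Assumed edge list for the seven-node example, one pair per edge.
FIGURE_2_EDGES = [(1, 2), (1, 3), (1, 4), (4, 5), (4, 6), (5, 7)]


def sample_tail_tree(edges, root, draw_increment):
    """Draw (Theta_{u,v})_{v in V} and the increments (M_e)_{e in E_u} for root u."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    theta = {root: 1.0}  # Theta_{u,u} = 1
    increments = {}
    queue = deque([root])
    while queue:
        a = queue.popleft()
        for b in neighbours[a]:
            if b not in theta:
                # One independent increment per edge directed away from the root.
                increments[(a, b)] = draw_increment()
                theta[b] = theta[a] * increments[(a, b)]
                queue.append(b)
    return theta, increments


rng = random.Random(1)
theta, m = sample_tail_tree(FIGURE_2_EDGES, 1, lambda: math.exp(rng.gauss(0.0, 1.0)))
for v in sorted(theta):
    print(f"Theta_(1,{v}) = {theta[v]:.4f}")
```

Restricted to the nodes 1, 4, 5, 7, the draw is a geometric random walk; the value at node 6 reuses the increment on the edge $(1,4)$, which is how the walks are coupled.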

Proof of Theorem 2.1

Put $d=\lvert{V}\rvert-1\geqslant 1$ . The proof is by induction on $d$ .

If $V$ has only two elements, i.e., $d=1$ , then Condition 2(i) already confirms the convergence stated in (6) and (7). Therefore, we can henceforth assume that $V$ has at least three elements, i.e., $d\geqslant 2$ . Identify $V$ with $\{0,1,\ldots,d\}$ in such a way that the root is $u=0$ and such that if $(a,b)\in E_{u}$ then $a<b$ . Since $X_{0}/X_{0}=1=\Theta_{0,0}$ , we do not need to consider the components $X_{0}$ and $\Theta_{0,0}$ in (6).

Step 1. Let $k$ denote the parent of $d$ in the directed tree $\mathcal{T}_{0}$, that is, $k$ is the unique node in $\{0,1,\ldots,d-1\}$ such that $(k,d)$ is an edge in $E_{0}$. Our way of numbering nodes implies that $d$ cannot be the parent of any other node. Condition 2 is then satisfied also for the nonnegative Markov tree $X_{0:(d-1)}=(X_{0},X_{1},\ldots,X_{d-1})$ on the tree that is obtained from $\mathcal{T}$ by removing node $d$ from $V$ and edges $(k,d)$ and $(d,k)$ from $E$. The induction hypothesis then means that, for every bounded, continuous function $g:[0,\infty)^{d-1}\to\mathbb{R}$, we have

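$$\lim_{t\to\infty}\operatorname{\mathbb{E}}[g(X_{1:(d-1)}/t)\mid X_{0}=t]=\operatorname{\mathbb{E}}[g(\Theta_{0,1:(d-1)})],$$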

the joint distribution of $\Theta_{0,1:(d-1)}=(\Theta_{0,1},\ldots,\Theta_{0,d-1})$ being given by (7).

Let $f:\mathbb{R}^{d}\to[0,1]$ be a Lipschitz function. We will show that

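$$\lim_{t\to\infty}\operatorname{\mathbb{E}}[f(X_{1:d}/t)\mid X_{0}=t]=\operatorname{\mathbb{E}}[f(\Theta_{0,1:d})].$$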

Recall that $k\in\{0,1,\ldots,d-1\}$ denotes the parent node of $d$ . We need to distinguish between two cases: $k=0$ is the root or $k\in\{1,\ldots,d-1\}$ is a non-root vertex. The case $k=0$ is similar to but easier than the case $k\in\{1,\ldots,d-1\}$ and is left to the reader. We assume henceforth that $k\in\{1,\ldots,d-1\}$ .

Step 2. Let $\delta>0$ be such that $\operatorname{\mathbb{P}}(\Theta_{0,k}=\delta)=0$ . We have

[TABLE]

We will show that the expression (9) converges to zero as $t\to\infty$ (Step 3). Moreover, we will find a bound for the limit superior of (10) as $t\to\infty$ . The bound will depend on $\delta$ but will converge to zero as $\delta\downarrow 0$ (Step 4). Together, these properties of (9) and (10) are sufficient to prove the theorem (Step 5).

Step 3: The term (9). The vertex $k$ is the parent of $d$ in $\mathcal{T}_{0}$ , and therefore it separates $d$ from the other vertices. By the conditional independence property (3),

[TABLE]

To explain our notation: the integral is over $x_{1:(d-1)}=(x_{1},\ldots,x_{d-1})$ and is with respect to the conditional distribution of $X_{1:(d-1)}=(X_{1},\ldots,X_{d-1})$ given that $X_{0}=t$ . The integrand involves the conditional expectation of a function of $X_{d}$ given that $X_{k}=x_{k}$ .

We change variables and integrate with respect to the conditional distribution of $X_{1:(d-1)}/t$ given that $X_{0}=t$ : we get

[TABLE]

where the integrand in (11) is given by

[TABLE]

By Condition 2(i), we have $X_{d}/x_{k}\mid X_{k}=x_{k}\stackrel{d}{\longrightarrow}M_{k,d}$ as $x_{k}\to\infty$ . Define

[TABLE]

Recall that $f$ is bounded and (Lipschitz) continuous. By the extended continuous mapping theorem [37, Theorem 18.11], we have, for all vectors $y_{1:(d-1)}$ such that $y_{k}\neq\delta$ and for all functions $y_{1:(d-1)}(\,\cdot\,)$ such that $y_{1:(d-1)}(t)\to y_{1:(d-1)}$ as $t\to\infty$ , the limit relation

[TABLE]

Moreover, $\mathcal{L}(X_{1:(d-1)}/t\mid X_{0}=t)\stackrel{d}{\longrightarrow}\mathcal{L}(\Theta_{1:(d-1)})$ as $t\to\infty$ by the induction hypothesis. By the same extended continuous mapping theorem, the integral (11) converges to

[TABLE]

Recall that $(M_{e})_{e\in E_{u}}$ is a vector of independent random variables such that the law of $M_{e}$ is $\mu_{e}$ for $e\in E_{u}$ . By construction, $M_{k,d}$ and $\Theta_{1:(d-1)}$ are then independent too: each component $\Theta_{0,j}$ of $\Theta_{1:(d-1)}$ is a product of random variables $M_{a,b}$ with $a,b\in\{0,\ldots,d-1\}$ and thus $e=(a,b)\neq(k,d)$ . The above integral may therefore be simplified to

[TABLE]

since $\Theta_{d}=\Theta_{k}M_{k,d}$ . It follows that the limit of (9) as $t\to\infty$ is equal to zero.

Step 4: The term (10). We consider two cases: $\operatorname{\mathbb{P}}(\Theta_{k}=0)=0$ (Step 4.a) and $\operatorname{\mathbb{P}}(\Theta_{k}=0)>0$ (Step 4.b).

Step 4.a: The case $\operatorname{\mathbb{P}}(\Theta_{k}=0)=0$ . Since $0\leqslant f\leqslant 1$ , the integral (10) is bounded by

[TABLE]

By the induction hypothesis, this sum converges to $2\operatorname{\mathbb{P}}(\Theta_{k}<\delta)$ as $t\to\infty$ . The latter probability converges to zero as $\delta\downarrow 0$ because $\operatorname{\mathbb{P}}(\Theta_{k}=0)=0$ in the present case, as required.

Step 4.b: The case $\operatorname{\mathbb{P}}(\Theta_{k}=0)>0$ . We decompose (10) into three terms:

[TABLE]

Let $L>0$ be such that $\lvert{f(y)-f(z)}\rvert\leqslant L\sum_{j=1}^{d}\lvert{y_{j}-z_{j}}\rvert$ for all $y,z\in\mathbb{R}^{d}$ . Furthermore, recall that $0\leqslant f\leqslant 1$ , so that also $\lvert{f(y)-f(z)}\rvert\leqslant 1$ for all $y,z\in\mathbb{R}^{d}$ .

Step 4.b.i: The term (12). The term (12) is bounded by

[TABLE]

The node $k$ separates $d$ from all other nodes. By the global Markov property, the expectation on the right-hand side of (15) is therefore equal to

[TABLE]

Let $\eta\in(0,2/L)$ . The conditional expectation in the integrand in (16) satisfies

[TABLE]

Therefore, the integral in (16) is bounded by

[TABLE]

By Condition 2(ii), we can first take the limit superior as $t\to\infty$ and then the limit superior as $\delta\downarrow 0$ to find that

[TABLE]

Since $\eta$ can be chosen arbitrarily close to zero, we find that the double limit superior above is equal to zero.

Step 4.b.ii: The term (13). By the induction hypothesis, the term (13) converges to zero as $t\to\infty$ .

Step 4.b.iii: The term (14). Since $\Theta_{d}=\Theta_{k}M_{k,d}$ , the term (14) is bounded by

[TABLE]

By the dominated convergence theorem, the expectation on the right-hand side converges to zero as $\delta\downarrow 0$ .

Step 5. The terms (9) and (10) were analyzed in Steps 3 and 4, respectively. In Step 3, it was shown that the term (9) converges to zero as $t\to\infty$ , for any $\delta>0$ such that $\operatorname{\mathbb{P}}(\Theta_{0,k}=\delta)=0$ . In Step 4, it was shown that the limit superior as $t\to\infty$ of the term in (10) is bounded by a quantity depending on $\delta$ which converges to zero as $\delta\downarrow 0$ . Since the expression in (8) does not depend on $\delta$ , its limit as $t\to\infty$ must thus be zero.

This completes the proof of the induction step and thus of the theorem.

Corollary 2.3

In the setting of Theorem 2.1, also $\mathcal{L}(X/X_{u}\mid X_{u}>t)\stackrel{d}{\longrightarrow}\mathcal{L}(\Theta_{u})$ as $t\to\infty$ .

Proof 2.4

For a bounded and continuous function $f:[0,\infty)^{V}\to\mathbb{R}$ , we have

[TABLE]

Given $\varepsilon>0$ , Theorem 2.1 allows us to find $t(\varepsilon)$ sufficiently high such that the absolute value inside the integral is bounded by $\varepsilon$ for all $s\geqslant t(\varepsilon)$ . But then the left-hand side in the previous display is bounded by $\varepsilon$ too, for all $t\geqslant t(\varepsilon)$ . Since $\varepsilon>0$ was arbitrary, the stated convergence in distribution follows.

3 One- versus multi-component regular variation

Let $X=(X_{1},\ldots,X_{d})$ be a random vector of nonnegative variables. Upon an obvious change in notation, Corollary 2.3 concerned weak convergence of $\mathcal{L}(X/X_{i}\mid X_{i}>t)$ as $t\to\infty$ for some $i\in\{1,\ldots,d\}$ . This convergence plus regular variation of the marginal distribution of $X_{i}$ is a special case of what is called one-component regular variation in [17]. The weak limit, $\Theta_{i}=(\Theta_{i,j})_{j=1}^{d}$ , depends on the choice of $i$ . There may be good reasons to consider these limits for several indices $i$ . Let $I\subset\{1,\ldots,d\}$ be the set of all indices $i$ for which such a limit $\Theta_{i}$ exists. How are these random vectors $\Theta_{i}$ related?

In this section, several such one-component statements are combined into a single one which could be called multi-component regular variation. If $I=\{1,\ldots,d\}$ , this is just ordinary multivariate regular variation. As discussed already in Section 1.2, the connections between the limits $\Theta_{i}$ generalize the time change formula for stationary regularly varying time series and can be deduced from their connections to a limiting tail measure.

Let $\SS=[0,\infty)^{d}$ for some positive integer $d$ and let $I\subset\{1,\ldots,d\}$ be non-empty. For $x\in\SS$ , put $x_{I}=(x_{i})_{i\in I}$ . Define $\SS_{0,I}=\{x\in\SS:\max(x_{I})>0\}$ . Let $\mathcal{M}_{0,I}$ denote the collection of Borel measures $\nu$ on $\SS_{0,I}$ with the property that $\nu(B)$ is finite for every Borel set $B$ of $\SS_{0,I}$ that is contained in a set of the form $\{x\in\SS:\max(x_{I})\geqslant\varepsilon\}$ for some $\varepsilon>0$ . Let $\mathcal{C}_{0,I}$ denote the collection of bounded, continuous functions $f:\SS_{0,I}\to\mathbb{R}$ for which there exists $\varepsilon>0$ such that $f(x)=0$ as soon as $\max(x_{I})\leqslant\varepsilon$ . Let $\mathcal{M}_{0,I}$ be equipped with the smallest topology that makes the evaluation mappings $\nu\mapsto\nu(f)=\int f\,\mathrm{d}\nu$ continuous, where $f$ ranges over $\mathcal{C}_{0,I}$ . This is the notion of $\mathcal{M}_{\mathbb{O}}$ convergence in [25], with, in their notation, $\mathbb{C}=\{x\in[0,\infty)^{d}:\forall i\in I,x_{i}=0\}$ and $\mathbb{O}=\SS\setminus\mathbb{C}=\SS_{0,I}$ . The topology just defined is metrizable, turning $\mathcal{M}_{0,I}$ into a complete, separable metric space, with convenient characterizations of relative compactness, a Portmanteau theorem, and a mapping theorem, all very much in the spirit of the notion of vague convergence of Borel measures on locally compact second countable Hausdorff spaces. Convergence of measures with respect to this topology is denoted by the arrow $\stackrel{0}{\longrightarrow}$ . If $I$ is just a singleton, $\{i\}$ say, then notation is simplified from $\SS_{0,\{i\}}$ to $\SS_{0,i}$ and so on. For $\alpha>0$ , let $\operatorname{Pa}(\alpha)$ denote the Pareto distribution on $[1,\infty)$ with shape parameter $\alpha$ , that is, the distribution of a random variable $Z$ such that $\operatorname{\mathbb{P}}(Z>z)=z^{-\alpha}$ for $z\geqslant 1$ .
Product measure is denoted by $\otimes$ .

Theorem 3.1

Let $X=(X_{1},\ldots,X_{d})$ be a random vector in $\SS=[0,\infty)^{d}$ and let $I\subset\{1,\ldots,d\}$ be non-empty. Let $F_{i}(x)=\operatorname{\mathbb{P}}(X_{i}\leqslant x)$ and $\overline{F}_{i}=1-F_{i}$ for $i\in I$ and $x\in[0,\infty)$ . Assume there exists a function $b$ , regularly varying at infinity with index $\alpha>0$ , such that $\lim_{t\to\infty}b(t)\overline{F}_{i}(t)=c_{i}\in(0,\infty)$ for $i\in I$ . The following statements are equivalent:

(a) For every $i\in I$ we have $\mathcal{L}(X/X_{i}\mid X_{i}>t)\stackrel{d}{\longrightarrow}\mathcal{L}(\Theta_{i})$ as $t\to\infty$ for some random vector $\Theta_{i}=(\Theta_{i,j})_{j=1}^{d}$ on $\SS$ .

(b) For every $i\in I$ we have $\mathcal{L}(X_{i}/t,X/X_{i}\mid X_{i}>t)\stackrel{d}{\longrightarrow}\operatorname{Pa}(\alpha)\otimes\mathcal{L}(\Theta_{i})$ as $t\to\infty$ for some random vector $\Theta_{i}$ on $\SS$ .

(c) For every $i\in I$ we have $\mathcal{L}(X/t\mid X_{i}>t)\stackrel{d}{\longrightarrow}\mathcal{L}(Y_{i})$ as $t\to\infty$ for some random vector $Y_{i}=(Y_{i,j})_{j=1}^{d}$ on $\SS$ .

(d) For every $i\in I$ there exists $\nu_{i}\in\mathcal{M}_{0,i}$ such that $b(t)\operatorname{\mathbb{P}}(X/t\in\,\cdot\,)\stackrel{0}{\longrightarrow}\nu_{i}$ as $t\to\infty$ in $\mathcal{M}_{0,i}$ .

(e) There exists $\nu\in\mathcal{M}_{0,I}$ such that $b(t)\operatorname{\mathbb{P}}(X/t\in\,\cdot\,)\stackrel{0}{\longrightarrow}\nu$ as $t\to\infty$ in $\mathcal{M}_{0,I}$ .

In that case, the limiting objects are connected in the following ways: for all $i\in I$ ,

(i) $Y_{i}$ is equal in distribution to $Y_{i,i}\Theta_{i}$ , where $\mathcal{L}(Y_{i,i})=\operatorname{Pa}(\alpha)$ and $Y_{i,i}$ and $\Theta_{i}$ are independent;

(ii) $\nu_{i}$ is equal to the restriction of $\nu$ to $\SS_{0,i}$ ;

(iii) $\operatorname{\mathbb{P}}(Y_{i}\in\,\cdot\,)=c_{i}^{-1}\nu(\,\cdot\,\cap\{x:x_{i}>1\})$ ;

(iv) for every Borel measurable $f:\SS_{0,I}\to[0,\infty]$ , we have

[TABLE]

The proof of Theorem 3.1, together with the proofs of the other theorems in this section, is given in Appendix A.

To highlight the connection with the theory of one-component regular variation in [17], note that the random vector $(X,\boldsymbol{Y})$ taking values in $[1,\infty)\times\mathbb{R}^{d-1}$ in [17, Theorem 1.4] plays the same role as the random vector $(X_{i},X/X_{i})$ in Theorem 3.1 above. The equivalence between (ii) and (iv) in the cited theorem is then the same as the equivalence between (a) and (b) in Theorem 3.1.

Further, note that in Theorem 3.1(a), for $j\in\{1,\ldots,d\}$ such that $\Theta_{i,j}$ is not degenerate at $0$ , we necessarily have $\liminf_{t\to\infty}\operatorname{\mathbb{P}}(X_{j}>t)/\operatorname{\mathbb{P}}(X_{i}>t)>0$ , i.e., the tail of $X_{j}$ is at least as heavy as the one of $X_{i}$ . If also $j\in I$ , we can reverse the roles of $i$ and $j$ to find that the condition that the tails of all variables $X_{i}$ with $i\in I$ are balanced is almost forced.

Apart from the characterizations (a)–(e) in Theorem 3.1, other equivalent ones are possible, for instance, involving sequences rather than functions, with a scaling function inside the probability rather than outside, or with respect to radial and ‘angular’ coordinates $(\rho(X)/t,X/\rho(X))$ for some appropriate functional $\rho$ . See for instance [29, Theorem 6.1] and [25, Theorem 3.1]. The tail measures $\nu$ and $\nu_{i}$ are homogeneous with index $-\alpha$ [25, Theorem 3.1] and, upon a coordinate transformation, can be written as product measures. Since the focus here is on the weak limits $\Theta_{i}$ , these properties are not further elaborated upon. Statement (e) in Theorem 3.1 implies that the vector $X_{I}=(X_{i})_{i\in I}$ is multivariate regularly varying with limit measure $\nu_{I}(\,\cdot\,)=\nu(\{x\in[0,\infty)^{d}:x_{I}\in\,\cdot\,\})$ on $[0,\infty)^{I}\setminus\{0\}$ , which in turn implies, among other things, that it is in the domain of attraction of a multivariate max-stable distribution with Fréchet margins and exponent measure $\nu_{I}$ ; see for instance [28, 29].

A noteworthy special case of (17) is when $f$ is the indicator function of the orthant $\{x\in\SS_{0,I}:\forall j\in J,\,x_{j}>y_{j}\}$ , where $J\subset\{1,\ldots,d\}$ has a non-empty intersection with $I$ and where $y_{j}>0$ for all $j\in J$ . If $i\in I\cap J$ , then $f(x)\mathds{1}\{x_{i}>0\}=f(x)$ , and thus

[TABLE]

A remarkable consequence is that the right-hand side does not depend on the choice of $i\in I\cap J$ . This invariance property is a special case of a more general mutual consistency property of the limit distributions $\mathcal{L}(\Theta_{i})$ for $i\in I$ that is formulated in Corollary 3.2 below.

Corollary 3.2 (Model consistency)

If the equivalent conditions of Theorem 3.1 are fulfilled, then, for Borel measurable $f:\SS_{0,I}\to[0,\infty]$ and for $i,j\in I$ , we have

[TABLE]

Proof 3.3

By (17), both sides in (19) are equal to $\int_{\SS_{0,I}}f(x)\,\mathds{1}\{x_{i}>0,x_{j}>0\}\,\mathrm{d}\nu(x)$ .

Corollary 3.4 (Root-change formula)

If the equivalent conditions of Theorem 3.1 are fulfilled, then, for all Borel measurable $g:\SS_{0,I}\to[0,\infty)$ and for all $i,j\in I$ , we have

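$$c_{j}\operatorname{\mathbb{E}}[g(\Theta_{j})\,\mathds{1}\{\Theta_{j,i}>0\}]=c_{i}\operatorname{\mathbb{E}}[g(\Theta_{i}/\Theta_{i,j})\,\Theta_{i,j}^{\alpha}]. \tag{20}$$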

In particular, $\operatorname{\mathbb{P}}[\Theta_{j,i}>0]=1$ if and only if $\operatorname{\mathbb{E}}[\Theta_{i,j}^{\alpha}]=c_{j}/c_{i}$ , and then, for all $g$ as above,

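$$\operatorname{\mathbb{E}}[g(\Theta_{j})]=\frac{c_{i}}{c_{j}}\operatorname{\mathbb{E}}[g(\Theta_{i}/\Theta_{i,j})\,\Theta_{i,j}^{\alpha}]. \tag{21}$$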

Proof 3.5

In Corollary 3.2, take $f(x)=g(x/x_{j})\,\mathds{1}\{x_{j}>1\}$ . As $\Theta_{j,j}=1$ , the left-hand side of (19) is $c_{j}\operatorname{\mathbb{E}}[\mathds{1}\{\Theta_{j,i}>0\}g(\Theta_{j})]$ . The right-hand side of (19) becomes $c_{i}\operatorname{\mathbb{E}}[\mathds{1}\{\Theta_{i,j}>0\}g(\Theta_{i}/\Theta_{i,j})\Theta_{i,j}^{\alpha}]$ , in which the indicator function is redundant.

The special case $g\equiv 1$ in (20) yields $c_{j}\operatorname{\mathbb{P}}(\Theta_{j,i}>0)=c_{i}\,\operatorname{\mathbb{E}}[\Theta_{i,j}^{\alpha}]$ . If $\operatorname{\mathbb{E}}[\Theta_{i,j}^{\alpha}]=c_{j}/c_{i}$ , we have $\operatorname{\mathbb{P}}(\Theta_{j,i}>0)=1$ , and the indicator function on the left-hand side of (20) can be omitted.

If the limit measure $\nu$ does not assign any mass to the coordinate hyperplane $\{x:x_{i}=0\}$ , the indicator in (17) is redundant and $\nu$ can be expressed entirely in terms of $\mathcal{L}(\Theta_{i})$ . Moreover, whether this occurs or not can be read off from the $\alpha$ -th moments of the components of $\Theta_{i}$ .

Corollary 3.6

In Theorem 3.1, we have, for $i\in I$ ,

[TABLE]

If $\nu(\{x:x_{i}=0\})=0$ for some $i\in I$ , then, for all Borel measurable $f:\SS_{0,I}\to[0,\infty]$ ,

[TABLE]

Moreover, all tail trees $\Theta_{j}$ for $j\in I$ are determined by $\Theta_{i}$ via (21).

Proof 3.7

Choose $i,j\in I$ . In (17), let $f$ be the indicator function of the set $\{x:x_{i}=0\}$ and let the index $i$ in (17) be equal to the index $j$ chosen here. It follows that $\nu(\{x\in\SS_{0,I}:x_{i}=0,x_{j}>0\})$ is zero if $\operatorname{\mathbb{P}}(\Theta_{j,i}=0)=0$ and infinity otherwise. By (20) with $g\equiv 1$ , we have $\operatorname{\mathbb{P}}(\Theta_{j,i}=0)=0$ if and only if $\operatorname{\mathbb{P}}(\Theta_{j,i}>0)=1$ if and only if $\operatorname{\mathbb{E}}[\Theta_{i,j}^{\alpha}]=c_{j}/c_{i}$ . Equation (22) follows. If $\nu(\{x:x_{i}=0\})=0$ , we can omit the indicator $\mathds{1}\{x_{i}>0\}$ on the left-hand side of (17), yielding (23).

Let $f$ be the indicator function of the set $\{x:\exists j\in J,\,x_{j}>y_{j}\}$ , where $J\subset\{1,\ldots,d\}$ is non-empty and where $y_{j}>0$ for all $j\in J$ . If $\nu(\{x:x_{i}=0\})=0$ , then, by (23),

[TABLE]

In contrast to equation (18), equation (24) is true only when $\nu(\{x:x_{i}=0\})=0$ , a prerequisite for which Corollary 3.6 gives a necessary and sufficient condition.

By Corollary 3.6, the case where $\operatorname{\mathbb{E}}[\Theta_{i,j}^{\alpha}]=c_{j}/c_{i}$ for all $j\in I$ leads to considerable simplifications. In fact, the special case $K=\{i\}$ in the next theorem implies that even the weak convergence of $\mathcal{L}(X/X_{j}\mid X_{j}>t)$ for $j\in I$ can then be deduced from the weak convergence of $\mathcal{L}(X/X_{i}\mid X_{i}>t)$ alone.

Theorem 3.8

In Theorem 3.1, a sufficient condition for (a)–(e) to hold is that there exists a non-empty set $K\subset I$ with the following two properties:

(i) For every $i\in K$ we have $\mathcal{L}(X/X_{i}\mid X_{i}>t)\stackrel{d}{\longrightarrow}\mathcal{L}(\Theta_{i})$ as $t\to\infty$ for some random vector $\Theta_{i}$ on $\SS$ .

(ii) For every $j\in I\setminus K$ , there exists $i=i(j)\in K$ such that $\operatorname{\mathbb{E}}[\Theta_{i,j}^{\alpha}]=c_{j}/c_{i}$ .

In that case, also $\mathcal{L}(X/X_{j}\mid X_{j}>t)\stackrel{d}{\longrightarrow}\mathcal{L}(\Theta_{j})$ as $t\to\infty$ for $j\in I\setminus K$ , where the law of $\Theta_{j}$ is given in terms of that of $\Theta_{i}$ with $i=i(j)$ via (21).

The focus so far has been on weak limits of conditional distributions involving a high-threshold exceedance by a specific component. In the spirit of the multivariate peaks-over-thresholds methodology [10, 22], the following result covers, among other possibilities, the case where the conditioning event involves a high-threshold exceedance in at least one of a number of components.

Theorem 3.9

Suppose the conditions of Theorem 3.1 are fulfilled. Let $\rho:\SS\to[0,\infty)$ be continuous and homogeneous of order one, that is, $\rho(\lambda x)=\lambda\rho(x)$ for $\lambda\in[0,\infty)$ and $x\in\SS$ . If $\SS_{\rho}:=\{x\in\SS_{0,I}:\rho(x)>1\}$ is contained in a set of the form $\{x:\max(x_{I})>\varepsilon\}$ for some $\varepsilon>0$ and if $\nu(\SS_{\rho})>0$ , then $b(t)\operatorname{\mathbb{P}}[\rho(X)>t]\to\nu(\SS_{\rho})\in(0,\infty)$ as $t\to\infty$ and $\mathcal{L}(X/t\mid\rho(X)>t)\stackrel{d}{\longrightarrow}\nu(\,\cdot\,\cap\SS_{\rho})/\nu(\SS_{\rho})$ as $t\to\infty$ . The conclusions of Theorem 3.1 thus apply to the random vector $(X,\rho(X))$ in the space $[0,\infty)^{d+1}$ and relative to the index set $I\cup\{d+1\}$ .

Examples of $\rho$ in Theorem 3.9 are $\rho(x)=\max_{i\in I}a_{i}x_{i}$ and $\rho(x)=\sum_{i\in I}a_{i}x_{i}$ , where $a\in[0,\infty)^{I}\setminus\{0\}$ . The special case $I=\{1,\ldots,d\}$ and $\rho(x)=\max(x_{1},\ldots,x_{d})$ produces multivariate Pareto distributions as in [29, Section 6.3] and other references mentioned in Section 1.2. Also covered by Theorem 3.9 is $\rho(x)=\min_{i\in J}a_{i}x_{i}$ for non-empty $J\subset I$ and $a_{j}\in(0,\infty)$ for all $j\in J$ provided $\operatorname{\mathbb{E}}[\min\{a_{j}^{\alpha}\Theta_{i,j}^{\alpha}:j\in J\}]>0$ for some (and hence all) $i\in I$ . If, however, $\nu(\SS_{\rho})=0$ , then $\operatorname{\mathbb{P}}[\rho(X)>t]$ decays more rapidly than $b(t)$ , and more refined models are needed, opening up a whole new world of possibilities.

Example 3.10

Let $X=(X_{1},\ldots,X_{d})$ follow the max-linear model

[TABLE]

where $a_{i,r}\in[0,\infty)$ are scalars such that $\max_{r}a_{i,r}>0$ for all $i$ and where $Z_{1},\ldots,Z_{s}$ are independent and identically distributed nonnegative random variables whose common distribution function $F$ has a regularly varying tail function $\overline{F}=1-F$ with index $-\alpha<0$ . The marginal tails satisfy $\overline{F}_{i}(t)/\overline{F}(t)\to\sum_{r}a_{i,r}^{\alpha}=:c_{i}$ as $t\to\infty$ . If $X_{i}$ exceeds a large threshold $t\to\infty$ , the probability that this was due to $Z_{r}$ is proportional to $a_{i,r}^{\alpha}$ , and then the other factors $Z_{\bar{r}}$ for $\bar{r}\neq r$ are of smaller order than $Z_{r}$ . It follows that (a) in Theorem 3.1 holds where the law of $\Theta_{i}$ is discrete with at most $s$ atoms and is given by

[TABLE]

with $\epsilon_{x}(\,\cdot\,)$ denoting a unit point mass at $x$ . From (26), we find

[TABLE]

It follows that $\operatorname{\mathbb{E}}[\Theta_{i,j}^{\alpha}]=c_{j}/c_{i}$ as soon as $a_{i,r}>0$ for every $r$ such that $a_{j,r}>0$ . Indeed, in that case, if $X_{j}$ is large, then some variable $Z_{r}$ with $r=1,\ldots,s$ such that $a_{j,r}$ is positive was large, which in turn implies that $X_{i}$ is large as well, so that $\operatorname{\mathbb{P}}(\Theta_{j,i}>0)=1$ .
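The discrete law described in this example lends itself to a direct numerical check. The Python sketch below encodes the verbal description above (given that $X_{i}$ is large, factor $r$ is responsible with probability $a_{i,r}^{\alpha}/c_{i}$, in which case $\Theta_{i,j}=a_{j,r}/a_{i,r}$); this reading, the function names, and the coefficient matrix are assumptions made for illustration. Indices are zero-based in the code. The function `root_change` evaluates the right-hand side $(c_{i}/c_{j})\operatorname{\mathbb{E}}[g(\Theta_{i}/\Theta_{i,j})\Theta_{i,j}^{\alpha}]$ appearing in the proof of Corollary 3.4.

```python
def tail_law(a, i, alpha):
    """Atoms of Theta_i as (probability, vector) pairs for coefficients a[i][r]."""
    c_i = sum(x ** alpha for x in a[i])
    atoms = []
    for r, a_ir in enumerate(a[i]):
        if a_ir > 0:
            prob = a_ir ** alpha / c_i               # factor r caused the exceedance
            vec = tuple(row[r] / a_ir for row in a)  # Theta_{i,j} = a[j][r] / a[i][r]
            atoms.append((prob, vec))
    return atoms


def moment(a, i, j, alpha):
    """E[Theta_{i,j}^alpha] under the discrete law returned by tail_law."""
    return sum(p * v[j] ** alpha for p, v in tail_law(a, i, alpha))


def root_change(a, i, j, alpha, g):
    """(c_i / c_j) * E[g(Theta_i / Theta_{i,j}) * Theta_{i,j}^alpha]."""
    c = [sum(x ** alpha for x in row) for row in a]
    total = 0.0
    for p, v in tail_law(a, i, alpha):
        if v[j] > 0:  # atoms with Theta_{i,j} = 0 contribute nothing
            scaled = tuple(x / v[j] for x in v)
            total += p * g(scaled) * v[j] ** alpha
    return c[i] / c[j] * total


# Hypothetical coefficients: d = 2 components, s = 3 factors, alpha = 2.
A = [[1.0, 2.0, 0.5],
     [0.0, 1.0, 3.0]]
ALPHA = 2.0

direct = sum(p * v[0] for p, v in tail_law(A, 1, ALPHA))    # E[Theta_{1,0}]
via_root_0 = root_change(A, 0, 1, ALPHA, lambda v: v[0])    # same, from Theta_0
print(direct, via_root_0)
```

In this hypothetical matrix, every factor loading on component 1 also loads on component 0, so the moment condition holds for the pair $(i,j)=(0,1)$ and the law of $\Theta_{1}$ is recovered from that of $\Theta_{0}$. The converse fails because the first factor loads on component 0 only.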

Example 3.11

Recursive max-linear models on directed acyclic graphs were introduced in [13]. Borrowing some of their notation, consider a directed acyclic graph $\mathcal{D}=(V,E)$ with nodes $V=\{1,\ldots,d\}$ and edges $E=\{(k,i):i\in V,k\in\operatorname{pa}(i)\}$ , where $\operatorname{pa}(i)\subset V$ denotes the possibly empty set of parents of $i$ . Consider a random vector $X=(X_{1},\ldots,X_{d})$ given by the structural equation model

[TABLE]

where the random variables $Z_{1},\ldots,Z_{d}$ are as in Example 3.10 with $s=d$ and where all coefficients $\gamma_{ki}$ and $\gamma_{ii}$ are (strictly) positive; the maximum over the empty set is zero by convention. Then by [14, Theorem 2.2], the random vector $X$ admits the max-linear representation

[TABLE]

with coefficients $b_{ji}$ , for $i,j\in\{1,\ldots,d\}$ , defined as follows: $b_{ii}=\gamma_{ii}$ and $b_{ji}=0$ if $j\in V\setminus(\operatorname{an}(i)\cup\{i\})$ , while

[TABLE]

where $P_{ji}$ is the collection of paths $p=\{e_{1},\ldots,e_{n}\}$ from $j$ to $i$ in $\mathcal{D}$ ; recall the definition of a path in the beginning of Section 2. The representation in (28) is of the form (25) with $s=d$ and with $a_{i,r}=b_{ri}$ for $i,r\in\{1,\ldots,d\}$ . It follows that $a_{i,r}=0$ unless $r=i$ or $r\in\operatorname{an}(i)$ .
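The passage from the structural equations to the max-linear representation can be checked on a small graph. The Python sketch below assumes that the structural equation reads $X_{i}=\max\{\max_{k\in\operatorname{pa}(i)}\gamma_{ki}X_{k},\,\gamma_{ii}Z_{i}\}$ and computes the coefficients by a recursion over parents, which yields the largest product of edge weights over all paths from $j$ to $i$, multiplied by $\gamma_{jj}$. The graph, the weights, and the function names are hypothetical.

```python
def max_linear_coefficients(parents, gamma):
    """Return b[(j, i)] such that X_i = max_j b[(j, i)] * Z_j.

    parents: dict node -> list of parents, keys listed in topological order
    gamma:   dict with gamma[(k, i)] for edges and gamma[(i, i)] for noise weights
    """
    nodes = list(parents)
    b = {}
    for i in nodes:  # parents are processed before their children
        for j in nodes:
            if j == i:
                b[(j, i)] = gamma[(i, i)]
            else:
                # Best path from j to i ends with some edge (k, i), k a parent of i.
                b[(j, i)] = max(
                    (gamma[(k, i)] * b[(j, k)] for k in parents[i]), default=0.0
                )
    return b


def structural_equations(parents, gamma, z):
    """Evaluate X node by node from the assumed structural equation."""
    x = {}
    for i in parents:
        from_parents = max((gamma[(k, i)] * x[k] for k in parents[i]), default=0.0)
        x[i] = max(from_parents, gamma[(i, i)] * z[i])
    return x


# Hypothetical diamond-shaped graph: 1 -> 2 -> 4 and 1 -> 3 -> 4.
PARENTS = {1: [], 2: [1], 3: [1], 4: [2, 3]}
GAMMA = {(1, 1): 1.0, (2, 2): 1.0, (3, 3): 1.0, (4, 4): 1.0,
         (1, 2): 0.5, (1, 3): 2.0, (2, 4): 3.0, (3, 4): 0.25}

b = max_linear_coefficients(PARENTS, GAMMA)
z = {1: 4.0, 2: 1.0, 3: 10.0, 4: 0.5}
x = structural_equations(PARENTS, GAMMA, z)
print(x[4], max(b[(j, 4)] * z[j] for j in PARENTS))
```

Both printed values agree, and `b[(4, 1)]` is zero because node 4 is not an ancestor of node 1.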

The condition that $a_{i,r}>0$ whenever $a_{j,r}>0$ is satisfied as soon as $j\in\operatorname{an}(i)$ : indeed, in that case, we have $\operatorname{an}(j)\cup\{j\}\subset\operatorname{an}(i)$ , so that $b_{rj}>0$ implies $r\in\operatorname{an}(j)\cup\{j\}$ and thus $r\in\operatorname{an}(i)$ , implying $b_{ri}>0$ . Through the structural equation model (27), a large value appearing at a node $j$ will also be felt at any of its descendants $i$ . We get $\operatorname{\mathbb{P}}(\Theta_{j,i}>0)=1$ for $j\in\operatorname{an}(i)$ , which, by Corollary 3.4, means that the root-change formula (21) applies, by which the law of $\Theta_{j}$ can be recovered from the one of $\Theta_{i}$ . Moreover, Theorem 3.8 applies with $K$ equal to the set of leaf nodes, i.e., the nodes without descendants.

In the special case that the directed acyclic graph is also a directed, rooted tree, every node $i$ has either exactly one parent or is equal to the root node, say $u$ . In that case, the collection of paths $P_{ji}$ between $j\in\operatorname{an}(i)$ and $i\in\{1,\ldots,d\}\setminus\{u\}$ is a singleton, $p=[{j}\rightsquigarrow{i}]$ , and the formula for $b_{ji}$ in (28) simplifies to $b_{ji}=\gamma_{jj}\prod_{e\in[{j}\rightsquigarrow{i}]}\gamma_{e}$ . Furthermore, the tail tree $\Theta_{u}$ in (26) starting from the root node $u$ simplifies to the degenerate distribution at the point $\theta_{u}=(\theta_{u,1},\ldots,\theta_{u,d})$ with coordinates $\theta_{u,j}=\prod_{e\in[{u}\rightsquigarrow{j}]}\gamma_{e}\in(0,\infty)$ for $j\in\{1,\ldots,d\}$ . This is of the form in (7) with degenerate increments $M_{e}=\gamma_{e}$ for all $e\in E$ .

4 Regularly varying Markov trees

As in Section 2, let $(X,\mathcal{T})$ be a nonnegative Markov tree on the undirected tree $\mathcal{T}=(V,E)$ . The general theory in Section 3 sheds light on the relation between two tail trees emanating at different roots. For two different nodes $u$ and $\bar{u}$ in $V$ , the sets of directed edges $E_{u}$ and $E_{\bar{u}}$ are the same except for the edges connecting nodes on the path between $u$ and $\bar{u}$ , which are directed in opposite ways in the two edge sets: For every $(a,b)\in[{u}\rightsquigarrow{\bar{u}}]=E_{u}\setminus E_{\bar{u}}$ , we have $(b,a)\in[{\bar{u}}\rightsquigarrow{u}]=E_{\bar{u}}\setminus E_{u}$ and the other way around.

Condition 2 was formulated relative to a single root $u\in V$ . The next condition covers all nodes $u\in U$ in a non-empty subset $U$ of $V$ as possible roots. For such $U$ , let $E_{U}=\bigcup_{u\in U}E_{u}$ denote the set of directed edges that appear in at least one of the directed trees $E_{u}$ .

Condition 4

There exists a non-empty $U\subset V$ with the following two properties:

(i) For every $e=(a,b)\in E_{U}$ , there exists a version of the conditional distribution of $X_{b}$ given $X_{a}$ and a probability measure $\mu_{e}$ on $[0,\infty)$ such that (4) holds.

(ii) For every edge $e=(a,b)\in E_{U}$ for which there exists $u\in U$ such that $e\in E_{u}$ and an edge $\bar{e}\in[{u}\rightsquigarrow{a}]$ such that $\mu_{\bar{e}}(\{0\})>0$ , we have (5).

If $u,v\in U$ in Condition 4 then every node $w\in V$ that is on the path between $u$ and $v$ can be added to $U$ and Condition 4 remains true. Indeed, for such $u,v,w$ , we have $E_{w}\subset E_{u}\cup E_{v}$ , which takes care of (i), and $[{w}\rightsquigarrow{a}]\subset[{u}\rightsquigarrow{a}]\cup[{v}\rightsquigarrow{a}]$ for every node $a\in V$ , which takes care of (ii). The author is grateful to an anonymous reviewer for having pointed this out.

Corollary 4.1

Let $(X,\mathcal{T})$ be a nonnegative Markov tree on the undirected tree $\mathcal{T}=(V,E)$ . Let $U\subset V$ be non-empty. Let $F_{u}(x)=\operatorname{\mathbb{P}}(X_{u}\leqslant x)$ and $\overline{F}_{u}=1-F_{u}$ for all $u\in U$ . Assume that there exists a positive function $b$ , regularly varying at infinity with index $\alpha>0$ , such that $b(t)\overline{F}_{u}(t)\to c_{u}\in(0,\infty)$ as $t\to\infty$ for every $u\in U$ . Assume that Condition 4 holds. Let $(M_{e})_{e\in E_{U}}$ be a vector of independent random variables such that $M_{e}$ has law $\mu_{e}$ for each $e\in E_{U}$ . Then all conclusions of Theorem 3.1 hold with $I=U$ and with $\Theta_{u}$ the tail tree in (7) for $u\in U$ .

Proof 4.2

Condition 4 and Corollary 2.3 imply that assumption (a) in Theorem 3.1 is satisfied for $I=U$ and with $\Theta_{u}$ the tail tree in (7), for every $u\in U$ . All equivalence relations and other properties are then as stated in Theorem 3.1.

Corollary 4.3

In Corollary 4.1, if $a,b\in V$ are neighbours in $E$ and if they both belong to $U$ , then the distributions of $M_{a,b}$ and $M_{b,a}$ mutually determine each other by

[TABLE]

for all Borel measurable $g:(0,\infty)\to[0,\infty]$ .

Proof 4.4

To find (29), apply (20) to the case $d=2$ and the random vector $(X_{a},X_{b})$ . The two limit random vectors $\Theta_{u}$ in Theorem 3.1(a) are $(1,M_{a,b})$ and $(M_{b,a},1)$ when conditioning on $X_{a}>t$ and on $X_{b}>t$ , respectively. Equation (29) implies $\operatorname{\mathbb{P}}(M_{b,a}>z)=(c_{a}/c_{b})\operatorname{\mathbb{E}}[\mathds{1}\{zM_{a,b}<1\}\,M_{a,b}^{\alpha}]$ for all $z\in[0,\infty)$ , so that the distribution of $M_{b,a}$ can be recovered from the one of $M_{a,b}$ .
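The tail formula at the end of this proof can be evaluated directly when $M_{a,b}$ is discrete. The following Python sketch is a minimal illustration; the two-point law, the constants, and the function name are hypothetical.

```python
def reversed_increment_tail(atoms, z, alpha, c_a, c_b):
    """P(M_{b,a} > z) = (c_a / c_b) * E[1{z * M_{a,b} < 1} * M_{a,b}^alpha]
    for a discrete M_{a,b} given as (value, probability) pairs."""
    return c_a / c_b * sum(p * m ** alpha for m, p in atoms if z * m < 1)


# Hypothetical law: M_{a,b} = 1/2 with probability 2/3 and 2 with probability 1/3,
# so that E[M_{a,b}] = 1 = c_b / c_a with alpha = 1 and c_a = c_b = 1.
ATOMS = [(0.5, 2.0 / 3.0), (2.0, 1.0 / 3.0)]

for z in (0.0, 0.5, 2.0):
    print(z, reversed_increment_tail(ATOMS, z, alpha=1.0, c_a=1.0, c_b=1.0))
```

The tail equals one at $z=0$, drops to $1/3$ at $z=1/2$, and vanishes at $z=2$, so in this example $M_{b,a}$ has the same two-point law as $M_{a,b}$: the atom $2$ of $M_{a,b}$ is mapped to $1/2$ and reweighted by $2^{\alpha}$, and vice versa.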

For different roots $u,\bar{u}\in U$ , the tail trees $\Theta_{u}$ and $\Theta_{\bar{u}}$ have the same multiplicative structure. The differences between their distributions lie in the starting nodes of the paths and in the distributions of the multiplicative increments for edges on the paths $[{u}\rightsquigarrow{\bar{u}}]$ and $[{\bar{u}}\rightsquigarrow{u}]$ , since these edges change direction. For such edges of which the nodes belong to $U$ as well, the increment distributions are related by (29). See Figure 3 for an illustration.

For $u,\bar{u}\in U$ , the equality $\operatorname{\mathbb{E}}[\Theta_{u,\bar{u}}^{\alpha}]=c_{\bar{u}}/c_{u}$ has interesting ramifications, see Corollaries 3.4 and 3.6 and Theorem 3.8. If all nodes on the path between $u$ and $\bar{u}$ belong to $U$ as well, then, since

[TABLE]

we have $\operatorname{\mathbb{E}}[\Theta_{u,\bar{u}}^{\alpha}]=c_{\bar{u}}/c_{u}$ as soon as $\operatorname{\mathbb{E}}[M_{a,b}^{\alpha}]=c_{b}/c_{a}$ for every $e=(a,b)\in[{u}\rightsquigarrow{\bar{u}}]$ .

Given the tree structure, the distribution of a Markov tree $X$ on $\mathcal{T}=(V,E)$ is entirely determined by the bivariate distributions $(X_{a},X_{b})$ for $e=(a,b)\in E$ . Markov chains of which all pairs $(X_{i},X_{i+1})$ are max-stable were proposed in [5, Section 4.6] and [36]. When extended to trees, this construction method provides models meeting Condition 4.

Example 4.5

Let the distribution of the random pair $(X,Y)$ on $(0,\infty)^{2}$ be bivariate max-stable with cumulative distribution function

[TABLE]

where $A:[0,1]\to[1/2,1]$ is a Pickands dependence function, that is, a convex function such that $\max(w,1-w)\leqslant A(w)\leqslant 1$ for all $w\in[0,1]$ ; see [15] and the references therein. Both marginal distributions are unit-Fréchet, $F(z,\infty)=F(\infty,z)=\exp(-1/z)$ for $z\in(0,\infty)$ . In particular, the marginal tail functions are regularly varying at infinity with index $-\alpha=-1$ .

Let $A^{\prime}$ be the left-hand derivative of $A$ , which exists everywhere on $(0,1]$ , takes values between $-1$ and $1$ , and is non-decreasing and continuous from the left; define $A^{\prime}(0)$ as the right-hand limit. Since $A$ is convex, it is absolutely continuous, and the set of points in $(0,1)$ where it is not continuously differentiable is at most countable. For $x,y\in(0,\infty)$ such that $A$ is differentiable at $w=x/(x+y)$ , we have

[TABLE]

It follows that $\mathcal{L}(Y/x\mid X=x)\stackrel{d}{\longrightarrow}M$ as $x\to\infty$ , where

[TABLE]

This is part (i) of Condition 2. Further, equation (5) in part (ii) of Condition 2 follows from the monotone regression dependence property of bivariate max-stable distributions established in [12], by which the supremum over $\varepsilon$ in (5) is attained at $\varepsilon=\delta$ and the limit superior as $x\to\infty$ is bounded by $\operatorname{\mathbb{P}}(M\geqslant\eta/\delta)$ , which tends to $0$ as $\delta\downarrow 0$ for every fixed $\eta>0$ .

This construction using bivariate max-stable distributions is in some sense generic. Given a random variable $M$ on $[0,\infty)$ with expectation $\operatorname{\mathbb{E}}(M)\leqslant 1$ , one can define a Pickands dependence function $A$ by $A(w)=1-\operatorname{\mathbb{E}}[\min(1-w,wM)]$ for $w\in[0,1]$ , and then (31) holds. The extension to general exponents $\alpha$ and tail constants $c_{u}$ is straightforward.
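The map from an increment law to a Pickands dependence function can be explored numerically. The Python sketch below evaluates $A(w)=1-\operatorname{\mathbb{E}}[\min(1-w,wM)]$ for a discrete $M$ and checks the defining bounds on a grid. The three laws used are hypothetical examples, and `pickands_from_increment` is an illustrative name.

```python
def pickands_from_increment(atoms, w):
    """A(w) = 1 - E[min(1 - w, w * M)] for a discrete M given as
    (value, probability) pairs with E[M] <= 1."""
    return 1.0 - sum(p * min(1.0 - w, w * m) for m, p in atoms)


# Hypothetical increment laws.
DEGENERATE_AT_ONE = [(1.0, 1.0)]        # gives A(w) = max(w, 1 - w)
DEGENERATE_AT_ZERO = [(0.0, 1.0)]       # gives A(w) = 1
MIXED = [(0.0, 0.5), (2.0, 0.5)]        # E[M] = 1, with an atom at zero

grid = [k / 100 for k in range(101)]
for w in (0.0, 0.25, 0.5, 1.0):
    print(w, pickands_from_increment(MIXED, w))

# Bounds max(w, 1 - w) <= A(w) <= 1, up to rounding, on the whole grid.
bounds_ok = all(
    max(w, 1.0 - w) - 1e-12 <= pickands_from_increment(MIXED, w) <= 1.0 + 1e-12
    for w in grid
)
print(bounds_ok)
```

The lower bound uses $\operatorname{\mathbb{E}}(M)\leqslant 1$: by Jensen's inequality, $\operatorname{\mathbb{E}}[\min(1-w,wM)]\leqslant\min(1-w,w\operatorname{\mathbb{E}}(M))\leqslant\min(1-w,w)$. Convexity of $A$ follows because $w\mapsto\min(1-w,wM)$ is concave.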

5 Absolutely continuous case

If the joint distribution of the Markov tree $X=(X_{v})_{v\in V}$ on $\mathcal{T}=(V,E)$ is absolutely continuous with respect to the Lebesgue measure on $[0,\infty)^{V}$ , the formulations of the conditions and results simplify considerably. Let $f$ denote the joint probability density function of $X$ and let $f_{v}$ , for $v\in V$ , denote the marginal density of $X_{v}$ .

By the Hammersley–Clifford theorem [23, Theorem 3.9], $X$ is a Markov tree as soon as the joint density factorizes as

$$f(x)=\prod_{v\in V}f_{v}(x_{v})\prod_{\{a,b\}}\frac{f_{a,b}(x_{a},x_{b})}{f_{a}(x_{a})\,f_{b}(x_{b})}.$$

The second product is over all unordered pairs of neighbours and $f_{a,b}$ denotes the bivariate density function of $(X_{a},X_{b})$ .

For $t\in(0,\infty)$ such that $f_{a}(t)\in(0,\infty)$ , the density of $\mathcal{L}(X_{b}/t\mid X_{a}=t)$ is $tf_{a,b}(t,ty)/f_{a}(t)$ for $y\in(0,\infty)$ . The following condition replaces Condition 4.

Condition 5

For every $e=(a,b)\in E$ , there exists a probability density function $q_{a,b}$ on $(0,\infty)$ such that

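$$\lim_{t\to\infty}\frac{t\,f_{a,b}(t,ty)}{f_{a}(t)}=q_{a,b}(y)\qquad\text{for almost every }y\in(0,\infty).$$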

Theorem 5.1

Let the random vector $X$ on $[0,\infty)^{V}$ be a Markov tree on the undirected tree $\mathcal{T}=(V,E)$ with joint density function $f$. Assume there exists a positive function $g$, regularly varying at infinity with index $-\alpha-1<-1$, such that $f_{v}(t)/g(t)\to c_{v}\in(0,\infty)$ as $t\to\infty$ for every $v\in V$. If Condition 5 holds, then the conditions of Corollary 4.1 are satisfied with $U=V$, the same constants $c_{u}$, and auxiliary function $b(t)=\alpha/\{t\,g(t)\}$. For all pairs of neighbours $a,b\in V$, the density of $M_{a,b}$ is $q_{a,b}$ and for almost every $y\in(0,\infty)$, we have

$$c_{b}\,q_{b,a}(y)=c_{a}\,y^{-\alpha-2}\,q_{a,b}(y^{-1}). \tag{32}$$

Moreover, $\operatorname{\mathbb{E}}[\Theta_{u,v}^{\alpha}]=c_{v}/c_{u}$ for all $u,v\in V$, so that $b(t)\operatorname{\mathbb{P}}(X/t\in\,\cdot\,)\xrightarrow{0}\nu$ as $t\to\infty$, where $\nu\in\mathcal{M}_{0}$ satisfies

[TABLE]

for every $u\in V$ and for every Borel measurable $f:[0,\infty)^{V}\setminus\{0\}\to[0,\infty]$ , with $\Theta_{u}$ the tail tree in (7). Moreover, all tail trees are connected through (21).

Proof 5.2

The function $f_{v}$ is regularly varying at infinity with index $-\alpha-1$ too. By Karamata’s theorem [3, Proposition 1.5.10], we have $tf_{v}(t)/\overline{F}_{v}(t)\to\alpha$ and thus $\overline{F}_{v}(t)/\{tg(t)\}\to c_{v}/\alpha$ as $t\to\infty$ .

Condition 4 with $U=V$ follows from Condition 5 and Scheffé’s theorem. Part (ii) of Condition 4 is void, since $\mu_{e}(\{0\})=0$ for every $e\in E$ .

If $a,b\in V$ are neighbours, we can apply (29) to $g(y)=\mathds{1}_{(0,z)}(y)$ , where $z\in(0,\infty)$ , to find

[TABLE]

Since this is true for every $z\in(0,\infty)$ , we must have $c_{b}\,q_{b,a}(y)=c_{a}\,y^{-\alpha-2}\,q_{a,b}(y^{-1})$ for almost every $y\in(0,\infty)$ , whence (32).

Since $\mu_{(b,a)}$ does not have an atom at $0$, the identity (29) with $g=\mathds{1}_{(0,\infty)}$ implies that $\operatorname{\mathbb{E}}[M_{(a,b)}^{\alpha}]=c_{b}/c_{a}$. Apply (30) and the observation on the line just below that equation to see that $\operatorname{\mathbb{E}}[\Theta_{u,v}^{\alpha}]=c_{v}/c_{u}$ for all $u,v\in V$. By Corollary 3.4, all tail trees are then connected via (21).

Finally, $\mathcal{M}_{0}$ -convergence to $\nu$ with the stated expression follows from Theorem 3.1 and Corollary 3.6.
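The Karamata step at the start of the proof, and the resulting choice $b(t)=\alpha/\{t\,g(t)\}$, can be checked numerically on a simple marginal. The sketch below assumes a Pareto type II (Lomax) law with survival function $(1+t)^{-\alpha}$ and reference function $g(t)=t^{-\alpha-1}$, so that $c_{v}=\alpha$; the law, the value of $\alpha$ and the function names are illustrative assumptions, not taken from the paper.

```python
ALPHA = 2.0  # hypothetical tail index


def survival(t):
    """Survival function of the assumed Lomax law: (1 + t)^(-alpha)."""
    return (1.0 + t) ** (-ALPHA)


def density(t):
    """Density of the assumed Lomax law: alpha (1 + t)^(-alpha - 1)."""
    return ALPHA * (1.0 + t) ** (-ALPHA - 1.0)


def g(t):
    """Reference function, regularly varying with index -alpha - 1."""
    return t ** (-ALPHA - 1.0)


def b(t):
    """Auxiliary function b(t) = alpha / {t g(t)}."""
    return ALPHA / (t * g(t))


def karamata_ratio(t):
    """t f(t) / survival(t), which should approach alpha as t grows."""
    return t * density(t) / survival(t)


for t in (1e2, 1e4, 1e6):
    # Columns: t, f/g (-> c_v = alpha), t f / Fbar (-> alpha), b * Fbar (-> c_v)
    print(t, density(t) / g(t), karamata_ratio(t), b(t) * survival(t))
```

For this law all three printed ratios approach $\alpha$, in line with $f_{v}(t)/g(t)\to c_{v}$, $tf_{v}(t)/\overline{F}_{v}(t)\to\alpha$ and $b(t)\overline{F}_{v}(t)\to c_{v}$.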

Example 5.3

In Example 4.5, assume that $A$ is twice continuously differentiable on $(0,1)$ and that $A^{\prime}(0)=-1$ and $A^{\prime}(1)=1$ . The distribution of $(X,Y)$ is then absolutely continuous and the conditional density of $Y/x$ given that $X=x$ converges as $x\to\infty$ to the function

$$q(z)=\frac{1}{(1+z)^{3}}\,A^{\prime\prime}\Bigl(\frac{1}{1+z}\Bigr),\qquad z\in(0,\infty).$$

The conditions on $A$ imply that $\int_{0}^{\infty}q(z)\,\mathrm{d}z=\int_{0}^{1}w\,A^{\prime\prime}(w)\,\mathrm{d}w=1$ and $\int_{0}^{\infty}z\,q(z)\,\mathrm{d}z=\int_{0}^{1}(1-w)\,A^{\prime\prime}(w)\,\mathrm{d}w=1$ , so that $q$ is a probability density function with first moment equal to $1$ . Moreover, replacing the function $A$ by the Pickands dependence function $w\mapsto A(1-w)$ amounts to changing $q$ by the function $z\mapsto z^{-3}q(z^{-1})$ , in line with (32) with $c_{a}=c_{b}=1$ and $\alpha=1$ .
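The two integral identities can be verified by the substitution $w=1/(1+z)$ followed by integration by parts. The computation below is a sketch that takes $q(z)=(1+z)^{-3}A^{\prime\prime}(1/(1+z))$, the form consistent with both identities, and uses $A(0)=A(1)=1$, $A^{\prime}(0)=-1$ and $A^{\prime}(1)=1$, with boundary values of $A^{\prime}$ understood as one-sided limits.

```latex
\begin{align*}
\int_0^\infty q(z)\,\mathrm{d}z
  &= \int_0^1 w\,A''(w)\,\mathrm{d}w
   = \bigl[w\,A'(w)\bigr]_0^1 - \int_0^1 A'(w)\,\mathrm{d}w
   = A'(1) - \{A(1)-A(0)\} = 1, \\
\int_0^\infty z\,q(z)\,\mathrm{d}z
  &= \int_0^1 (1-w)\,A''(w)\,\mathrm{d}w
   = \bigl[(1-w)\,A'(w)\bigr]_0^1 + \int_0^1 A'(w)\,\mathrm{d}w
   = -A'(0) + \{A(1)-A(0)\} = 1.
\end{align*}
```

Here $z=(1-w)/w$ and $\mathrm{d}z=-w^{-2}\,\mathrm{d}w$, so that $q(z)\,\mathrm{d}z$ becomes $w\,A^{\prime\prime}(w)\,\mathrm{d}w$ once the orientation of the interval is reversed.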

An interesting example in this respect is the bivariate Hüsler–Reiss distribution [19] with Pickands dependence function

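$$A(w)=(1-w)\,\Phi\Bigl(\lambda+\frac{1}{2\lambda}\log\frac{1-w}{w}\Bigr)+w\,\Phi\Bigl(\lambda+\frac{1}{2\lambda}\log\frac{w}{1-w}\Bigr)$$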

for $0<w<1$. Here $\lambda\in(0,\infty)$ is a parameter and $\Phi(z)=\int_{-\infty}^{z}(2\pi)^{-1/2}\exp(-u^{2}/2)\,\mathrm{d}u$ is the standard normal cumulative distribution function. After tedious calculations, we find that $q$ is given by the density of the lognormal random variable $M=\exp\{2\lambda(Z-\lambda)\}$, where $Z$ is a standard normal random variable. Note indeed that $\operatorname{\mathbb{E}}[M]=1$. Moreover, the density function satisfies $z\,q(z)=z^{-2}q(z^{-1})$ for all $z\in(0,\infty)$, which is (32) with $q_{a,b}=q_{b,a}=q$ and $c_{a}=c_{b}=1$ and $\alpha=1$. This also follows from the symmetry of the Hüsler–Reiss Pickands dependence function, i.e., $A(w)=A(1-w)$ for all $w\in[0,1]$, so that the pair $(X,Y)$ is exchangeable.
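The "tedious calculations" can at least be sanity-checked numerically. The sketch below assumes the Hüsler–Reiss Pickands function in the form $A(w)=(1-w)\Phi(\lambda+\tfrac{1}{2\lambda}\log\tfrac{1-w}{w})+w\,\Phi(\lambda+\tfrac{1}{2\lambda}\log\tfrac{w}{1-w})$ and the relation $q(z)=(1+z)^{-3}A^{\prime\prime}(1/(1+z))$; it compares a finite-difference version of $q$ with the lognormal density of $M=\exp\{2\lambda(Z-\lambda)\}$ and checks the symmetry relation. The value of $\lambda$, the step size and the function names are illustrative assumptions.

```python
import math

LAM = 0.7  # hypothetical dependence parameter


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pickands_hr(w, lam=LAM):
    """Assumed form of the Huesler-Reiss Pickands dependence function, 0 < w < 1."""
    r = math.log((1.0 - w) / w) / (2.0 * lam)
    return (1.0 - w) * Phi(lam + r) + w * Phi(lam - r)


def q_lognormal(z, lam=LAM):
    """Density of M = exp{2 lam (Z - lam)} with Z standard normal."""
    s = 2.0 * lam
    x = (math.log(z) + 2.0 * lam ** 2) / s
    return math.exp(-0.5 * x * x) / (z * s * math.sqrt(2.0 * math.pi))


def q_from_pickands(z, lam=LAM, h=1e-4):
    """q(z) = A''(w) / (1 + z)^3 at w = 1/(1+z), with A'' by central differences."""
    w = 1.0 / (1.0 + z)
    second = (pickands_hr(w + h, lam) - 2.0 * pickands_hr(w, lam)
              + pickands_hr(w - h, lam)) / h ** 2
    return second / (1.0 + z) ** 3


for z in (0.3, 1.0, 2.5):
    # Columns: z, finite-difference q, lognormal q, and both sides of z q(z) = z^-2 q(1/z)
    print(z, q_from_pickands(z), q_lognormal(z),
          z * q_lognormal(z), q_lognormal(1.0 / z) / z ** 2)
```

The finite-difference step `h` trades truncation error against rounding error, so agreement is only expected up to a small tolerance rather than to machine precision.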

*If all neighbouring pairs $(X_{a},X_{b})$ for $(a,b)\in E$ of the Markov tree follow such Hüsler–Reiss max-stable distributions, the joint distribution of the tail tree is multivariate log-normal, since $\log\Theta_{u,v}=\sum_{e\in[{u}\rightsquigarrow{v}]}\log M_{e}$ for all $u,v\in V$, where the random variables $\log M_{e}$ are independent and normally distributed with expectation $-2\lambda_{e}^{2}$ and variance $4\lambda_{e}^{2}$, with dependence parameter $\lambda_{e}\in(0,\infty)$ for all $e\in E$.*
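To make the path-sum structure concrete, the following sketch computes the mean and variance of $\log\Theta_{u,v}$ on a small tree by summing $-2\lambda_{e}^{2}$ and $4\lambda_{e}^{2}$ along the unique path from $u$ to $v$. The four-node tree with edges $\{1,2\}$, $\{2,3\}$, $\{2,4\}$, the values of $\lambda_{e}$ and all function names are hypothetical and serve only as an illustration.

```python
import math
from collections import deque

# Hypothetical Huesler-Reiss parameters lambda_e on the edges of a 4-node tree.
EXAMPLE_LAMBDAS = {
    frozenset({1, 2}): 0.5,
    frozenset({2, 3}): 1.0,
    frozenset({2, 4}): 0.25,
}


def path_edges(adj, u, v):
    """Edges on the unique path from u to v in a tree (breadth-first search)."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        if a == v:
            break
        for nb in adj[a]:
            if nb not in parent:
                parent[nb] = a
                queue.append(nb)
    edges = []
    node = v
    while parent[node] is not None:
        edges.append((parent[node], node))
        node = parent[node]
    return edges[::-1]


def log_theta_params(lambdas, u, v):
    """Mean and variance of log Theta_{u,v} as sums over the path from u to v."""
    adj = {}
    for e in lambdas:
        a, c = tuple(e)
        adj.setdefault(a, []).append(c)
        adj.setdefault(c, []).append(a)
    s = sum(lambdas[frozenset(e)] ** 2 for e in path_edges(adj, u, v))
    return -2.0 * s, 4.0 * s


def mean_theta(lambdas, u, v):
    """E[Theta_{u,v}] = exp(mean + variance / 2) for a lognormal variable."""
    mu, var = log_theta_params(lambdas, u, v)
    return math.exp(mu + 0.5 * var)


print(log_theta_params(EXAMPLE_LAMBDAS, 1, 3))
print(log_theta_params(EXAMPLE_LAMBDAS, 3, 4))
print(mean_theta(EXAMPLE_LAMBDAS, 1, 4))
```

Because the mean is minus half the variance, `mean_theta` returns $1$ for every pair of nodes, matching $\operatorname{\mathbb{E}}[\Theta_{u,v}^{\alpha}]=c_{v}/c_{u}$ with $\alpha=1$ and $c_{u}=c_{v}=1$.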

Appendix A Proofs for Section 3

Proof A.1 (Proof of Theorem 3.1)

*(a) and (b) are equivalent.* Clearly, (b) implies (a). Conversely, assume (a); let us show (b). Let $z\in[1,\infty)$ and let $\theta\in\SS$ be such that $\operatorname{\mathbb{P}}(\Theta_{j}=\theta_{j})=0$ for all $j\in\{1,\ldots,d\}$. We have

[TABLE]

It follows that $\operatorname{\mathbb{P}}(X_{i}/t\leqslant z,X/X_{i}\leqslant\theta\mid X_{i}>t)\to\operatorname{\mathbb{P}}(Z\leqslant z)\operatorname{\mathbb{P}}(\Theta_{i}\leqslant\theta)$ as $t\to\infty$, where $Z$ is a $\operatorname{Pa}(\alpha)$ random variable.

*(b) implies (c) and (i).* Since $(X/t)=(X_{i}/t)(X/X_{i})$, statement (b) and the continuous mapping theorem [37, Theorem 2.3] imply that $\mathcal{L}(X/t\mid X_{i}>t)$ converges weakly to $Y_{i}=Z\Theta_{i}$, where $Z$ is a $\operatorname{Pa}(\alpha)$ random variable independent of $\Theta_{i}$. Since $\Theta_{i,i}=1$ almost surely, we have $Y_{i,i}=Z$.

*(c) implies (a).* Since $X/X_{i}=(X/t)/(X_{i}/t)$, statement (c) and the continuous mapping theorem imply statement (a) with $\Theta_{i}=Y_{i}/Y_{i,i}$.

*(b) implies (d).* Define a Borel measure $\nu_{i}$ on $\SS_{0,i}$ by

[TABLE]

If $B$ is a Borel subset of $\SS_{0,i}$ contained in $\{x\in\SS_{0,i}:x_{i}\geqslant\varepsilon\}$ for some $\varepsilon>0$ , then $\operatorname{\mathbb{P}}(z\Theta_{i}\in B)=0$ as soon as $z<\varepsilon$ , since $\Theta_{i,i}=1$ almost surely. As a consequence, $\nu_{i}(B)\leqslant c_{i}\varepsilon^{-\alpha}$ for such $B$ . It follows that $\nu_{i}\in\mathcal{M}_{0,i}$ .

By linearity of the integral and by monotone convergence, we find that

[TABLE]

for every nonnegative Borel measurable function $f$ on $\SS_{0,i}$ . The same expression is then true for real-valued Borel measurable functions $f$ on $\SS_{0,i}$ for which at least one of the two integrals with $f$ replaced by $\lvert{f}\rvert$ is finite. This includes bounded, Borel measurable functions that vanish on a set of the form $\{x\in\SS_{0,i}:x_{i}\leqslant\varepsilon\}$ for some $\varepsilon>0$ .

Let $f\in\mathcal{C}_{0,i}$ and let $\varepsilon>0$ be such that $f(x)=0$ as soon as $x_{i}\leqslant\varepsilon$ . By (b), we have

[TABLE]

where $Z$ is a $\operatorname{Pa}(\alpha)$ random variable, independent of $\Theta_{i}$ . The limit is equal to

[TABLE]

since $f(z\Theta_{i})=0$ almost surely whenever $z\leqslant\varepsilon$ , as $\Theta_{i,i}=1$ almost surely.

*(d) implies (c).* For $z\in(0,\infty)$, we have $b(t)\operatorname{\mathbb{P}}(X_{i}/t>z)\to c_{i}z^{-\alpha}$ as $t\to\infty$, and thus $\nu_{i}(\{x:x_{i}>z\})=c_{i}z^{-\alpha}$ by (d). For open $G\subset\mathbb{R}^{d}$, the Portmanteau theorem [25, Theorem 2.1(iii)] yields

[TABLE]

By the Portmanteau lemma for weak convergence [37, Lemma 2.2] we obtain (c) where the law of $Y_{i}$ is $\operatorname{\mathbb{P}}(Y_{i}\in\,\cdot\,)=c_{i}^{-1}\nu_{i}(\,\cdot\,\cap\{x:x_{i}>1\})$ .

*(d) implies (e).* For every $z>0$, we have

[TABLE]

Since the limit is finite for every $z>0$ and since it converges to zero as $z\to\infty$ , it follows by the relative compactness criterion in [25, Theorem 2.5] that for every sequence $(t_{n})_{n}$ tending to infinity, there exists a subsequence along which $b(t_{n})\operatorname{\mathbb{P}}(X/t_{n}\in\,\cdot\,)$ converges in $\mathcal{M}_{0,I}$ . To show (e), we then need to show that these subsequence limits must coincide. To do so, we show that for every $f\in\mathcal{C}_{0,I}$ , the limit of $b(t)\operatorname{\mathbb{E}}[f(X/t)]$ exists as $t\to\infty$ . This fixes the value of the integral of such $f$ with respect to all subsequence limits, which then must be the same.

For $\varepsilon>0$ , let $h_{\varepsilon}:[0,\infty)\to[0,1]$ be the piece-wise linear function

[TABLE]

Put $\hbar_{\varepsilon}=1-h_{\varepsilon}$ . Write $I=\{i_{1},\ldots,i_{k}\}$ . Then

[TABLE]

For $f\in\mathcal{C}_{0,I}$ we can find $\varepsilon>0$ such that $f(x)=0$ if $\max(x_{i_{1}},\ldots,x_{i_{k}})\leqslant\varepsilon$ . Then $f(x)\prod_{\ell=1}^{k}\hbar_{\varepsilon}(x_{i_{\ell}})=0$ for all $x$ , and thus $f=\sum_{i\in I}f_{i}$ where, for $\ell\in\{1,\ldots,k\}$ , we have

[TABLE]

Each function $f_{i}$ belongs to $\mathcal{C}_{0,I}$ too but has moreover the property that $f_{i}(x)=0$ as soon as $x_{i}\leqslant\varepsilon/2$ . The restriction of $f_{i}$ to $\SS_{0,i}$ thus belongs to $\mathcal{C}_{0,i}$ . By (d),

[TABLE]

The existence of a limit has thus been shown, and convergence in $\mathcal{M}_{0,I}$ to some measure $\nu$ as stated in (e) follows.

*(e) implies (d), (ii), (iii) and (iv).* A function $f$ in $\mathcal{C}_{0,i}$ can be extended to a function in $\mathcal{C}_{0,I}$, denoted by the same symbol, by putting $f(x)=0$ for $x\in\SS_{0,I}\setminus\SS_{0,i}$. Hence, (e) implies (d), with $\nu_{i}$ as described in (ii).

Statement (iii) follows from (ii) and the description of the law of $Y_{i}$ in terms of $\nu_{i}$ in the proof above of the implication that (d) implies (c).

Similarly, (iv) follows from (ii), equation (33), and Fubini’s theorem.

Proof A.2 (Proof of Theorem 3.8)

It is sufficient to show statement (a) in Theorem 3.1. By property (i) in Theorem 3.8, the weak convergence in Theorem 3.1(a) already holds for all $i\in K$, and we need to show that it also holds for all $j\in I\setminus K$. Choose $j\in I\setminus K$ and let $i=i(j)\in K$ be as in property (ii) of Theorem 3.8.

We will show that $\mathcal{L}(X/X_{j}\mid X_{j}>t)$ converges weakly as $t\to\infty$ to $\Theta_{j}$ whose law is defined in (21). Let $G\subset\SS$ be open and let $\delta>0$. We have

[TABLE]

By Theorem 3.1 applied to $K$, we have $\mathcal{L}(X_{i}/s,X/X_{i}\mid X_{i}>s)\xrightarrow{d}\operatorname{Pa}(\alpha)\otimes\mathcal{L}(\Theta_{i})$ as $s\to\infty$. Let $Z$ be a $\operatorname{Pa}(\alpha)$ random variable, independent of $\Theta_{i}$. By the Portmanteau lemma for weak convergence, we have

[TABLE]

The equality on the second line follows from (ii) and the fact that $Z^{-\alpha}$ is uniformly distributed on $(0,1)$ and independent of $\Theta_{i}$ . Since $\delta>0$ was arbitrary, the monotone convergence theorem yields $\liminf_{t\to\infty}\operatorname{\mathbb{P}}(X/X_{j}\in G\mid X_{j}>t)\geqslant\operatorname{\mathbb{E}}[\mathds{1}\{\Theta_{i}/\Theta_{i,j}\in G\}\,\Theta_{i,j}^{\alpha}]/\operatorname{\mathbb{E}}[\Theta_{i,j}^{\alpha}]$ . Apply the Portmanteau lemma for weak convergence once more to obtain the stated weak convergence.
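The uniformity of $Z^{-\alpha}$ used in this proof is a one-line computation. The display below is a sketch assuming the convention $\operatorname{\mathbb{P}}(Z>z)=z^{-\alpha}$ for $z\geqslant 1$ for the $\operatorname{Pa}(\alpha)$ distribution.

```latex
\[
  \mathbb{P}\bigl(Z^{-\alpha}\le u\bigr)
  = \mathbb{P}\bigl(Z\ge u^{-1/\alpha}\bigr)
  = \bigl(u^{-1/\alpha}\bigr)^{-\alpha}
  = u,
  \qquad u\in(0,1].
\]
```

The second equality uses that $u^{-1/\alpha}\geqslant 1$ for $u\in(0,1]$ and that the Pareto distribution has no atoms.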

Proof A.3 (Proof of Theorem 3.9)

The properties of $\rho$ imply that $\SS_{\rho}$ is open and non-empty and that $0<\nu(\SS_{\rho})<\infty$. The boundary of $\SS_{\rho}$ is $\{x\in\SS_{0,I}:\rho(x_{I})=1\}$, which is a $\nu$-null set, since its $\nu$-measure is bounded by the sum over $i\in I$ of $\nu(\{x\in\SS_{0,I}:\rho(x_{I})=1,x_{i}>0\})$, which is zero by (17). Since $\rho(X_{I})>t$ if and only if $X_{I}/t\in\SS_{\rho}$, the Portmanteau theorem [25, Theorem 2.1(iv)] implies $b(t)\operatorname{\mathbb{P}}[\rho(X_{I})>t]\to\nu(\SS_{\rho})$ as $t\to\infty$.

Let $G\subset\mathbb{R}^{d}$ be open. By (iii) of the same Portmanteau theorem,

[TABLE]

The Portmanteau theorem for weak convergence implies the stated weak convergence of $\mathcal{L}(X/t\mid\rho(X)>t)$ as $t\to\infty$ . This proves statement (c) in Theorem 3.1 for the enlarged random vector $(X,\rho(X))$ .

Acknowledgements

The author is grateful to two anonymous reviewers whose suggestions have led to various improvements throughout the text. The author also wishes to thank Stefka Asenova and Gildas Mazo for inspiring discussions.

Bibliography

[1] Asadi, P., Davison, A. C. and Engelke, S. (2015). Extremes on river networks. The Annals of Applied Statistics 9, 2023–2050.
[2] Basrak, B. and Segers, J. (2009). Regularly varying multivariate time series. Stochastic Processes and Their Applications 119, 1055–1080.
[3] Bingham, N. H., Goldie, C. M. and Teugels, J. L. (1987). Regular Variation. Cambridge University Press, Cambridge.
[4] Bortot, P. and Coles, S. (2000). A sufficiency property arising from the characterization of extremes of a Markov chain. Bernoulli 6, 183–190.
[5] Coles, S. G. and Tawn, J. A. (1991). Modelling extreme multivariate events. Journal of the Royal Statistical Society. Series B (Methodological) 53, 377–392.
[6] Das, B. and Resnick, S. I. (2011). Conditioning on an extreme component: Model consistency with regular variation on cones. Bernoulli 17, 226–252.
[7] Dombry, C., Hashorva, E. and Soulier, P. (2018). Tail measure and spectral tail process of regularly varying time series. The Annals of Applied Probability 28, 3884–3921.
[8] Dombry, C. and Ribatet, M. (2015). Functional regular variations, Pareto processes and peaks over threshold. Statistics and Its Interface 8, 9–17.
