One- versus multi-component regular variation and extremes of Markov trees
Johan Segers

TL;DR
This paper develops a comprehensive theory of multi-component regular variation for Markov trees, analyzing how tail behaviors change with different conditioning variables and establishing connections via a generalized time change formula.
Contribution
It introduces a novel multi-component regular variation framework for Markov trees, extending tail analysis beyond single-component cases and linking tail trees through a generalized formula.
Findings
Weak convergence to tail trees under tail assumptions
Balance of marginal tails leads to a generalized time change formula
Multi-component regular variation applies to broader models beyond Markov trees
Abstract
A Markov tree is a random vector indexed by the nodes of a tree whose distribution is determined by the distributions of pairs of neighbouring variables and a list of conditional independence relations. Upon an assumption on the tails of the Markov kernels associated to these pairs, the conditional distribution of the self-normalized random vector when the variable at the root of the tree tends to infinity converges weakly to a random vector of coupled random walks called tail tree. If, in addition, the conditioning variable has a regularly varying tail, the Markov tree satisfies a form of one-component regular variation. Changing the location of the root, that is, changing the conditioning variable, yields a different tail tree. When the tails of the marginal distributions of the conditioning variables are balanced, these tail trees are connected by a formula that generalizes the time change formula for regularly varying stationary time series. The formula is most easily understood when the various one-component regular variation statements are tied up to a single multi-component statement. The theory of multi-component regular variation is worked out for general random vectors, not necessarily Markov trees, with an eye towards other models, graphical or otherwise.
keywords:
Conditional independence; graphical model; Hüsler–Reiss distribution; max-linear model; Markov tree; multivariate Pareto distribution; Pickands dependence function; regular variation; root change formula; tail measure; tail tree; time change formula.
Johan Segers, UCLouvain, LIDAM/ISBA, Voie du Roman Pays 20, B-1348 Louvain-la-Neuve, Belgium. Email: [email protected]
1 Introduction
Imagine a random vector of nonnegative variables. One of the components, say , is known to have exceeded a large threshold. How does this information affect the conditional distribution of the whole vector ? There could be a causal link from to the other variables , perhaps via a network of dependence relations, so that tampering with would affect the whole system. Another possibility is that a large value of is merely the result of a large value of some other variable . The latter event, however, could have consequences for still other variables .
Depending on which one of the components is known to have been exceptionally large, the conditional distribution of is likely to be different. Still, if high values of two variables and are not unlikely to arrive together, the conditional distribution of given that is large must be connected to the one given that is large.
In this paper, these questions are studied for general random vectors using the language of regular variation. The answers are worked out for the particular case that is a Markov tree. A large value at a particular node is found to spread through the tree via independent increments along the edges. The joint limit distribution is the one of a vector of coupled geometric random walks. The couplings occur through the common edges of different paths starting at the same root node.
Graphical models, of which Markov trees are a special case, bring structure and sparsity to the web of dependence relations between many random variables [23, 38]. Extreme value theory for such models is a fairly recent subject. In [1], a metric that takes the distance along a river into account underlies a spatial model for extremes of river networks. Recursive max-linear models on directed acyclic graphs are proposed in [13] and put to work in [9, 14]. In [17], the density of a multivariate Pareto distribution is factorized through a version of the Hammersley–Clifford theorem. Such factorizations are also the theme in [10], where they form the basis of new inference methods for extremes of graphical models, including the identification of the graphical structure itself. Multivariate Hüsler–Reiss extreme-value copulas based on Gaussian Markov trees and higher-order truncated vines are introduced in [24], who propose composite likelihood methods based on bivariate margins to estimate the parameters.
Multivariate Pareto distributions arise as weak limits of normalized random vectors conditionally on the event that at least one component exceeds a high threshold. Although such conditioning events are covered by Theorem 3.9 below, the focus of this paper is rather on the case where the exceedance is known to have occurred at a specific variable. The message hinted at in the title is that both points of view are mathematically equivalent, but that, at least for Markov trees, the one-component limit is particularly elegant, as will be explained next.
1.1 Tail tree of a Markov tree
For a Markov chain, it was discovered in [35] that, conditionally on the event that the series is large at some time instant, the conditional distribution of the future of the system is that of a random walk, a process called tail chain in [26]. For light-tailed marginal distributions, this random walk is additive, and for heavy-tailed margins it is geometric, i.e., multiplicative, which is the convention used in this paper.
A Markov tree can be viewed as a coupled collection of Markov chains with common stretches. Take for instance the four-variate Markov tree in Figure 1. The nodes of the tree are and the three pairs of neighbours are , and . The vector is a Markov chain, and so is . These two chains are coupled via the common pair . Conditionally on , the variables , and are independent, since any path that connects two of the three nodes 1, 3 and 4 passes through node 2. This conditional independence property together with the distributions of the three pairs , and determines the joint distribution of .
For the moment, assume that the four variables have the same, regularly varying tail function. The set-up involving regular variation will be further motivated in Section 1.2. The effect (not necessarily causal) on of a large value at is via a multiplicative increment whose distribution is equal to the weak limit of conditionally on as . The existence of this limit is an assumption on the Markov kernel induced by the distribution of the pair . Similarly, a large value at affects and via the increments and , respectively. The effect of on is then through the composite increment , whereas on it is through . The conditional independence property ensures that the increments , and are mutually independent. The common edge on the paths from node to node and from node to node induces dependence between the two tail chains and via the common increment . In this paper, the random vector
[TABLE]
is called the tail tree induced by with root at node .
The tail tree represents a network of stochastic dependence relations that are not necessarily causal. Suppose the Markov tree in Figure 1 represents water levels at four locations on a river network. If water flows from left to right, node 2 represents a point where the stream branches into two channels, as occurs for instance in a river delta. If water flows from right to left, however, node 2 represents the junction of two branches coming from nodes 3 and 4 into a larger stream flowing towards node 1. In the first case, the tail tree describes how a high water level at the upstream node 1 may cause high water levels at various locations in the delta further downstream. In the second case, however, it is nodes 3 and 4 that are situated upstream, and the tail tree models the sources of a high water volume at the downstream site 1. Still other set-ups are possible, such as for instance node 3 being upstream and nodes 1 and 4 being downstream: high water levels at nodes 1 and 4 are then related through a common cause at node 2, which can itself perhaps be traced back to node 3.
Whatever the causal relationships within , it may make sense to change the conditioning variable. In Figure 1, for instance, suppose it is known that a large value has occurred at node 3 rather than at node 1. Tracing the paths from node 3 to the three other nodes yields the tail tree with root at node :
[TABLE]
The tail trees in (1) and (2) have a similar structure. The two edges on the path between the root nodes 1 and 3 have changed direction, however. The edge from node 2 to node 4 is common to both tail trees.
For each pair of neighbouring nodes, the choice of the root node determines which of the two increments appears in the tail tree: from to or from to . The distributions of and are connected by an expression that involves the marginal distributions of and . For stationary and reversible Markov chains, this relation underlies a sufficiency property discovered in [4]. For tail chains of not necessarily reversible Markov chains, it was described in [21, 33] and for tail processes of regularly varying stationary time series in [2] via the time change formula. This formula can be understood most easily through the connection between the tail process and the tail measure [7, 27, 32], and this is also the way in which the root change formula in Corollary 3.4 below will be derived, but then without the assumption of stationarity and for general random vectors, not necessarily Markov trees.
1.2 Regular variation
The language of regularly varying functions and measures provides a rich medium through which to express limit theorems. Recall that a positive, Lebesgue measurable function $f$ defined on a neighbourhood of infinity is regularly varying with index $\tau \in \mathbb{R}$ if $\lim_{t \to \infty} f(tx)/f(t) = x^{\tau}$ for all $x > 0$. If $X$ is a nonnegative random variable with unbounded support, cumulative distribution function $F$ and tail function $\bar{F} = 1 - F$, regular variation of $\bar{F}$ with index $-\alpha < 0$ is equivalent to weak convergence of the conditional distribution of $X/t$ given that $X > t$ to a Pareto random variable with index $\alpha$, i.e., $\lim_{t \to \infty} \mathbb{P}(X/t > x \mid X > t) = x^{-\alpha}$ for all $x \ge 1$. We write $\mathcal{L}(X/t \mid X > t) \stackrel{d}{\longrightarrow} \operatorname{Pa}(\alpha)$ as $t \to \infty$, where $\mathcal{L}(Z \mid A)$ denotes the conditional distribution of the random object $Z$ given the event $A$, the arrow $\stackrel{d}{\longrightarrow}$ denotes convergence in distribution, and $\operatorname{Pa}(\alpha)$ denotes the said Pareto distribution.
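The equivalence between regular variation of the tail function and convergence to a Pareto law can be checked numerically. The following is a minimal sketch, not taken from the paper: the tail function `tail` (with $\mathbb{P}(X > x) = 1/(1 + x^{\alpha})$) and the helper `conditional_tail` are hypothetical choices made only for illustration.

```python
def tail(x: float, alpha: float) -> float:
    """Hypothetical tail function P(X > x) = 1 / (1 + x**alpha) for x >= 0.

    It is regularly varying at infinity with index -alpha.
    """
    return 1.0 / (1.0 + x ** alpha)


def conditional_tail(y: float, t: float, alpha: float) -> float:
    """P(X / t > y | X > t) for y >= 1, computed as tail(t * y) / tail(t)."""
    return tail(t * y, alpha) / tail(t, alpha)


alpha = 2.0
for t in (1.0, 10.0, 100.0, 1000.0):
    value = conditional_tail(3.0, t, alpha)
    print(f"t = {t:7.1f}: P(X/t > 3 | X > t) = {value:.6f}")

# Pareto limit P(Y > 3) = 3**(-alpha) for Y distributed as Pa(alpha).
print(f"Pareto limit: {3.0 ** -alpha:.6f}")
```

As the threshold grows, the conditional exceedance probability approaches the Pareto value; for a tail that is exactly Pareto the two would coincide for every threshold.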
For multivariate distributions, regular variation can be described via multivariate cumulative distribution functions as well, but an approach via convergence of Borel measures is more versatile. Let the state space be . Generalizations to star-shaped metric spaces or abstract cones as in [7, 18, 25, 34] are left for further work. Let denote the non-empty set of indices of variables of which the conditioning event is of possible interest. The marginal distributions of for are assumed to be regularly varying and the ratios of their tail functions are assumed to converge to positive constants. This set-up is a bit more general than the one of identical margins and comes at little technical or notational cost.
The measures involved may have infinite mass but need to assign finite values to sets that remain bounded away from or , depending on the conditioning event. The topology on the space of such measures will be the one proposed in [25], extending [18], and resembles the one of vague convergence of measures, but avoiding the need to consider artificially compactified spaces. Regular variation is defined as convergence of to a limit measure called tail measure. Here, is a scale function tending to infinity and calibrated to the marginal distributions of for .
It is instructive to formulate statements in terms of weak convergence of distributions. For a high threshold tending to infinity and for a component , consider the asymptotic distribution of the rescaled random vector given that . Decompose as . Here, represents the overall level of with respect to whereas represents a self-normalized version of . Convergence in distribution of given as is a special case of what is called one-component regular variation in [17], explored already in [16, 30] for the bivariate case but allowing for affine normalizations. The random variable is asymptotically distributed and independent of , whose weak limit, denoted by , captures extremal dependence within given that is large. Letting the index run through produces multiple such one-component regular variation statements, which, together, are equivalent to what can be called multi-component regular variation. The limit distributions that arise for various indices must be mutually consistent, and the tail measure mentioned at the end of the previous paragraph embraces them all at once.
In Section 3, the focus is on tying together multiple one-component regular variation limits. The theory is worked out for general random vectors, not necessarily Markov trees. A number of results in that section have already been formulated in the literature in one way or another, in slightly different settings. Some of the equivalence relations in Theorem 3.1, for instance, resemble those in [17, Theorem 1.4] and [34, Proposition 3.1]. The model consistency property between limit measures in Theorem 3.1(ii) is formulated in [6, Section 2] for the bivariate case. The root change formula in Corollary 3.4 extends the time change formula for regularly varying stationary time series stemming from [2] and studied extensively in [7, 20]. Multivariate Pareto distributions as in Theorem 3.9 are foreshadowed in [29, Section 6.3] and appear in [11, 31] when and in [8] for more general functionals . These are just a few connections, and the above list is by no means intended to be complete.
The set-up involving regular variation is intended to serve two purposes. First, to model tail dependence within a vector of random variables which have been transformed to the same, heavy-tailed distribution, such as the unit-Fréchet distribution, as is common in multivariate extreme value theory. Second, to model the joint distribution of a vector of regularly varying random variables, not necessarily identically distributed, but with equivalent tails, such as returns on financial portfolios composed of the same basket of underlying assets. The latter framework is more general than the former and comes at little additional notational cost.
1.3 Outline
For a Markov tree , convergence as of the conditional distribution of given that is proved in Section 2. The main assumption is that, for edges directed away from the root , the conditional distribution of given converges as . No regular variation is needed yet.
The tail trees pertaining to different roots can be linked up thanks to the theory of one- and multi-component regular variation developed in Section 3. The results do not rely on the Markov property and cover quite general random vectors on , as is illustrated briefly for max-linear models. An interesting special case of these are the recursive max-linear structural equation models introduced in [13], featuring a causal structure induced by a directed acyclic graph. Most of the proofs of this section are deferred to the Appendix.
When combined, the results in Sections 2 and 3 serve to uncover the regular variation properties of Markov trees in Section 4. The common special case that the joint distribution of the Markov tree is absolutely continuous with respect to Lebesgue measure is the subject of Section 5. The theory then simplifies considerably and the limit distribution with respect to a single root is already sufficient to reconstruct the limit distributions with respect to all other possible roots .
In Sections 4 and 5, the distributions of the increments of the tail trees are calculated in case the pair distributions are max-stable, not necessarily absolutely continuous. For the Hüsler–Reiss max-stable distribution, the tail tree is multivariate log-normal, constructed from partial sums of independent normal random variables along the edges of the tree.
2 The spectral tail tree of a Markov tree
A (finite) graph is a pair where is a non-empty finite set of vertices or nodes and where is a set of edges. Self-loops are excluded, i.e., for all . To avoid trivialities, is assumed to have at least two elements. Two nodes are neighbours if they are joined by an edge. A graph is undirected if implies . A path from a node to a node is a collection of edges such that for all , for distinct nodes such that and . An undirected tree is an undirected graph such that for any pair of distinct nodes and , there exists a unique path from to , and this path is then denoted by .
Let be an undirected tree and let be a random vector indexed by the nodes of the tree. The pair is a Markov tree if it satisfies the global Markov property [23]: whenever are disjoint, non-empty subsets of such that separates and (i.e., any path between a node and a node passes through some node in ), the conditional independence relation
[TABLE]
holds, where denotes the random vector for .
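The separation requirement in the global Markov property is a purely graph-theoretic condition and can be checked mechanically. The sketch below is an illustration under stated assumptions: the function name `separates` and its argument names are hypothetical, and the edge list is the four-node tree of Figure 1 as described in Section 1.1 (pairs of neighbours 1–2, 2–3 and 2–4).

```python
from collections import deque


def separates(edges, sep, a_nodes, b_nodes):
    """Return True if every path from a node in `a_nodes` to a node in
    `b_nodes` passes through some node in `sep`.

    The three node sets are assumed to be disjoint, as in the definition.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    blocked = set(sep)
    start = set(a_nodes) - blocked
    seen, queue = set(start), deque(start)
    # Breadth-first search that is not allowed to enter the separating set.
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # Separation holds when no node of b_nodes was reached.
    return seen.isdisjoint(b_nodes)


figure_1_edges = [(1, 2), (2, 3), (2, 4)]
print(separates(figure_1_edges, {2}, {1}, {3, 4}))  # node 2 blocks all paths
print(separates(figure_1_edges, {3}, {1}, {4}))     # node 3 does not
```

For the tree of Figure 1, node 2 separates any two of the nodes 1, 3 and 4, which is the graph condition behind the conditional independence statement of Section 1.1.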
For an undirected tree and a node , let denote the directed, rooted tree that consists of directing the edges in outward starting from . Formally, is the subset of that is obtained by choosing for every pair of edges and in the one such that the first node separates the second one from . If , then is the (necessarily unique) parent of in whereas is a child of in .
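Directing the edges outward from a chosen root amounts to a breadth-first traversal of the tree. A minimal sketch follows; the function name `orient_from_root` is hypothetical, and the example again uses the four-node tree of Figure 1 with one entry per pair of neighbours.

```python
from collections import deque


def orient_from_root(edges, root):
    """Return the directed edges (parent, child) of the tree rooted at `root`.

    `edges` lists each pair of neighbours of an undirected tree; listing a
    pair in both directions is harmless.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    directed = []
    seen = {root}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in neighbours[parent]:
            if child not in seen:
                # `parent` separates `child` from the root, so the edge
                # is directed from parent to child.
                seen.add(child)
                directed.append((parent, child))
                queue.append(child)
    return directed


figure_1_edges = [(1, 2), (2, 3), (2, 4)]
print(orient_from_root(figure_1_edges, 1))  # [(1, 2), (2, 3), (2, 4)]
print(orient_from_root(figure_1_edges, 3))  # [(3, 2), (2, 1), (2, 4)]
```

Moving the root from node 1 to node 3 reverses the two edges on the path between these nodes and leaves the edge from node 2 to node 4 unchanged, in line with the discussion in Section 1.1.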
Let be a nonnegative Markov tree, where is an undirected tree.
Condition 2
There exists with the following two properties.
(i) For every directed edge , there exists a version of the conditional distribution of given and a probability measure on such that
[TABLE]
(ii) For edges such that and such that there exists an edge for which , we have
[TABLE]
Assumption 2(ii) is similar to [26, equation (3.4)] and prevents non-extreme values from causing extreme ones. A similar assumption is [33, equation (2.4)], where an example [33, Example 7.5] illustrates what can go wrong without it.
Theorem 2.1
Let be a nonnegative Markov tree on . Assume Condition 2. Let be a vector of independent random variables such that the law of is for all . Then
[TABLE]
where and
[TABLE]
The random vector is called the tail tree of the Markov tree , adapting terminology for Markov chains in [26]. In Figure 2, the tail tree is illustrated for a tree with seven nodes. For subvectors where all nodes in lie on the same path starting at , the structure of the tail tree is that of a geometric random walk; take for instance and in Figure 2. The tail tree couples several geometric random walks together through the common edges in the underlying paths: in the same figure, consider for instance the vectors indexed by and by , respectively, which share the initial edge .
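The construction of the tail tree as products of independent increments along the paths leaving the root can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the paper's method: the names `sample_tail_tree` and `lognormal_increment` are hypothetical, the log-normal increment law with mean one is an arbitrary choice (loosely motivated by the Hüsler–Reiss case mentioned in Section 1.3), and the tree is the four-node example of Figure 1.

```python
import math
import random
from collections import deque


def sample_tail_tree(edges, root, draw_increment, rng):
    """Draw one realisation of a tail tree rooted at `root`.

    Every directed edge (parent, child) receives one independent increment;
    the value at a node is the product of the increments on the path from
    the root, and the value at the root itself is 1.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    theta = {root: 1.0}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in neighbours[parent]:
            if child not in theta:
                theta[child] = theta[parent] * draw_increment(parent, child, rng)
                queue.append(child)
    return theta


def lognormal_increment(parent, child, rng, sigma=0.5):
    """Hypothetical increment law: log-normal with expectation one."""
    return math.exp(rng.gauss(-0.5 * sigma ** 2, sigma))


rng = random.Random(0)
theta = sample_tail_tree([(1, 2), (2, 3), (2, 4)], 1, lognormal_increment, rng)
print(theta)  # values at nodes 3 and 4 share the increment on edge (1, 2)
```

Because each increment is drawn once per edge and reused by every path through that edge, the coupling between the geometric random walks arises automatically.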
Proof 2.2 (Proof of Theorem 2.1)
Put . The proof is by induction on .
If has only two elements, i.e., , then Condition 2(i) already confirms the convergence stated in (6) and (7). Therefore, we can henceforth assume that has at least three elements, i.e., . Identify with in such a way that the root is and such that if then . Since , we do not need to consider the components and in (6).
Step 1. Let denote the parent of in the directed tree , that is, is the unique node in such that is an edge in . Our way of numbering nodes implies that cannot be the parent of every other node. Condition 2 is then satisfied also for the nonnegative Markov tree on the tree that is obtained from by removing node from and edges and from . The induction hypothesis then means that, for every bounded, continuous function , we have
[TABLE]
the joint distribution of being given by (7).
Let be a Lipschitz function. We will show that
[TABLE]
Recall that denotes the parent node of . We need to distinguish between two cases: is the root or is a non-root vertex. The case is similar to but easier than the case and is left to the reader. We assume henceforth that .
Step 2. Let be such that . We have
[TABLE]
We will show that the expression (9) converges to zero as (Step 3). Moreover, we will find a bound for the limit superior of (10) as . The bound will depend on but will converge to zero as (Step 4). Together, these properties of (9) and (10) are sufficient to prove the theorem (Step 5).
Step 3: The term (9). The vertex is the parent of in , and therefore it separates from the other vertices. By the conditional independence property (3),
[TABLE]
To explain our notation: the integral is over and is with respect to the conditional distribution of given that . The integrand involves the conditional expectation of a function of given that .
We change variables and integrate with respect to the conditional distribution of given that : we get
[TABLE]
where the integrand in (11) is given by
[TABLE]
By Assumption 2(i), we have $X_{d}/x_{k} \mid X_{k} = x_{k} \stackrel{d}{\longrightarrow} M_{k,d}$ as . Define
[TABLE]
Recall that is bounded and (Lipschitz) continuous. By the extended continuous mapping theorem [37, Theorem 18.11], we have, for all vectors such that and for all functions such that as , the limit relation
[TABLE]
Moreover, $\mathcal{L}(X_{1:(d-1)}/t \mid X_{0} = t) \stackrel{d}{\longrightarrow} \mathcal{L}(\Theta_{1:(d-1)})$ as by the induction hypothesis. By the same extended continuous mapping theorem, the integral (11) converges to
[TABLE]
Recall that is a vector of independent random variables such that the law of is for . By construction, and are then independent too: each component of is a product of random variables with and thus . The above integral may therefore be simplified to
[TABLE]
since . It follows that the limit of (9) as is equal to zero.
Step 4: The term (10). We consider two cases: (Step 4.a) and (Step 4.b).
Step 4.a: The case . Since , the integral (10) is bounded by
[TABLE]
By the induction hypothesis, this sum converges to as . The latter probability converges to zero as , as required.
Step 4.b: The case . We decompose (10) into three terms:
[TABLE]
Let be such that for all . Furthermore, recall that , so that also for all .
Step 4.b.i: The term (12). The term (12) is bounded by
[TABLE]
The node separates the nodes [math] and . By the global Markov property, the expectation on the right-hand side of (15) is therefore equal to
[TABLE]
Let . The conditional expectation in the integrand in (16) satisfies
[TABLE]
Therefore, the integral in (16) is bounded by
[TABLE]
By Assumption 2(ii), we can first take the limit superior as and then the limit superior as to find that
[TABLE]
Since can be chosen arbitrarily close to zero, we find that the double limit superior above is equal to zero.
Step 4.b.ii: The term (13). By the induction hypothesis, the term (13) converges to zero as .
Step 4.b.iii: The term (14). Since , the term (14) is bounded by
[TABLE]
By the dominated convergence theorem, the expectation on the right-hand side converges to zero as .
Step 5. The terms (9) and (10) were analyzed in Steps 3 and 4, respectively. In Step 3, it was shown that the term (9) converges to zero as , for any such that . In Step 4, it was shown that the limit superior as of the term in (10) is bounded by a quantity depending on which converges to zero as . Since the expression in (8) does not depend on , its limit as must thus be zero.
This completes the proof of the induction step and thus of the theorem.
Corollary 2.3
In the setting of Theorem 2.1, also $\mathcal{L}(X/X_{u} \mid X_{u} > t) \stackrel{d}{\longrightarrow} \Theta_{u}$ as .
Proof 2.4
For a bounded and continuous function , we have
[TABLE]
Given , Theorem 2.1 allows us to find sufficiently high such that the absolute value inside the integral is bounded by for all . But then the left-hand side in the previous display is bounded by too, for all . Since was arbitrary, the stated convergence in distribution follows.
3 One- versus multi-component regular variation
Let be a random vector of nonnegative variables. Upon an obvious change in notation, Corollary 2.3 concerned weak convergence of as for some . This convergence plus regular variation of the marginal distribution of is a special case of what is called one-component regular variation in [17]. The weak limit, , depends on the choice of . There may be good reasons to consider these limits for several indices . Let be the set of all indices for which such a limit exists. How are these random vectors related?
In this section, several such one-component statements are combined into a single one which could be called multi-component regular variation. If , this is just ordinary multivariate regular variation. As discussed already in Section 1.2, the connections between the limits generalize the time change formula for stationary regularly varying time series and can be deduced from their connections to a limiting tail measure.
Let for some positive integer and let be non-empty. For , put . Define . Let denote the collection of Borel measures on with the property that is finite for every Borel set of that is contained in a set of the form for some . Let denote the collection of bounded, continuous functions for which there exists such that as soon as . Let be equipped with the smallest topology that makes the evaluation mappings continuous, where ranges over . This is the notion of convergence in [25], with, in their notation, and . The topology just defined is metrizable, turning into a complete, separable metric space, with convenient characterizations of relative compactness, a Portmanteau theorem, and a mapping theorem, all very much in the spirit of the notion of vague convergence of Borel measures on locally compact second countable Hausdorff spaces. Convergence of measures with respect to this topology is denoted by the arrow $\stackrel{0}{\longrightarrow}$. If is just a singleton, say, then notation is simplified from to and so on. For , let denote the Pareto distribution on with shape parameter , that is, the distribution of a random variable such that for . Product measure is denoted by .
Theorem 3.1
Let be a random vector in and let be non-empty. Let and for and . Assume there exists a function , regularly varying at infinity with index , such that for . The following statements are equivalent:
(a) For every we have $\mathcal{L}(X/X_{i} \mid X_{i} > t) \stackrel{d}{\longrightarrow} \mathcal{L}(\Theta_{i})$ as for some random vector on .
(b) For every we have $\mathcal{L}(X_{i}/t, X/X_{i} \mid X_{i} > t) \stackrel{d}{\longrightarrow} \operatorname{Pa}(\alpha) \otimes \mathcal{L}(\Theta_{i})$ as for some random vector on .
(c) For every we have $\mathcal{L}(X/t \mid X_{i} > t) \stackrel{d}{\longrightarrow} \mathcal{L}(Y_{i})$ as for some random vector on .
(d) For every there exists such that $b(t) \operatorname{\mathbb{P}}(X/t \in \,\cdot\,) \stackrel{0}{\longrightarrow} \nu_{i}$ as in .
(e) There exists such that $b(t) \operatorname{\mathbb{P}}(X/t \in \,\cdot\,) \stackrel{0}{\longrightarrow} \nu$ as in .
In that case, the limiting objects are connected in the following ways: for all ,
(i) is equal in distribution to , where and and are independent;
(ii) is equal to the restriction of to ;
(iii) ;
(iv) for every Borel measurable , we have
[TABLE]
The proof of Theorem 3.1, together with the proofs of the other theorems in this section, is given in Appendix A.
To highlight the connection with the theory of one-component regular variation in [17], note that the random vector taking values in in [17, Theorem 1.4] plays the same role as the random vector in Theorem 3.1 above. The equivalence between (ii) and (iv) in the cited theorem is then the same as the equivalence between (a) and (b) in Theorem 3.1.
Further, note that in Theorem 3.1(a), for such that is not degenerate at [math], we necessarily have , i.e., the tail of is at least as heavy as the one of . If also , we can reverse the roles of and to find that the condition that the tails of all variables with are balanced is almost forced.
Apart from the characterizations (a)–(e) in Theorem 3.1, other equivalent ones are possible, for instance, involving sequences rather than functions, with a scaling function inside the probability rather than outside, or with respect to radial and ‘angular’ coordinates for some appropriate functional . See for instance [29, Theorem 6.1] and [25, Theorem 3.1]. The tail measures and are homogeneous with index [25, Theorem 3.1] and, upon a coordinate transformation, can be written as product measures. Since the focus here is on the weak limits , these properties are not further elaborated upon. Statement (e) in Theorem 3.1 implies that the vector is multivariate regularly varying with limit measure on , which in turn implies, among other things, that it is in the domain of attraction of a multivariate max-stable distribution with Fréchet margins and exponent measure ; see for instance [28, 29].
A noteworthy special case of (17) is when is the indicator function of the orthant , where has a non-empty intersection with and where for all . If , then , and thus
[TABLE]
A remarkable consequence is that the right-hand side does not depend on the choice of . This invariance property is a special case of a more general mutual consistency property of the limit distributions for that is formulated in Corollary 3.2 below.
Corollary 3.2 (Model consistency)
If the equivalent conditions of Theorem 3.1 are fulfilled, then, for Borel measurable and for , we have
[TABLE]
Proof 3.3
By (17), both sides in (19) are equal to .
Corollary 3.4 (Root-change formula)
If the equivalent conditions of Theorem 3.1 are fulfilled, then, for all Borel measurable and for all , we have
[TABLE]
In particular, if and only if , and then, for all as above,
[TABLE]
Proof 3.5
In Corollary 3.2, take . As , the left-hand side of (19) is . The right-hand side of (19) becomes , in which the indicator function is redundant.
The special case in (20) yields . If , we have , and the indicator function on the left-hand side of (20) can be omitted.
If the limit measure does not assign any mass to the coordinate hyperplane , the indicator in (17) is redundant and can be expressed entirely in terms of . Moreover, whether this occurs or not can be read off from the -th moments of the components of .
Corollary 3.6
In Theorem 3.1, we have, for ,
[TABLE]
If for some , then, for all Borel measurable ,
[TABLE]
Moreover, all tail trees for are determined by via (21).
Proof 3.7
Choose . In (17), let be the indicator function of the set and let the index in (17) be equal to the index chosen here. It follows that is zero if and infinity otherwise. By (20) with , we have if and only if if and only if . Equation (22) follows. If , we can omit the indicator on the left-hand side of (17), yielding (23).
Let be the indicator function of the set , where is non-empty and where for all . If , then, by (23),
[TABLE]
In contrast to equation (18), equation (24) is true only when , a prerequisite for which Corollary 3.6 gives a necessary and sufficient condition.
By Corollary 3.6, the case where for all leads to considerable simplifications. In fact, the special case in the next theorem implies that even the weak convergence of for can then be deduced from the weak convergence of alone.
Theorem 3.8
In Theorem 3.1, a sufficient condition for (a)–(e) to hold is that there exists a non-empty set with the following two properties:
- (i) For every we have $\mathcal{L}(X/X_{i}\mid X_{i}>t)\stackrel{d}{\longrightarrow}\mathcal{L}(\Theta_{i})$ as for some random vector on .
- (ii) For every , there exists such that .
In that case, also $\mathcal{L}(X/X_{j}\mid X_{j}>t)\stackrel{d}{\longrightarrow}\mathcal{L}(\Theta_{j})$ as for , where the law of is given in terms of the one of with via (21).
The focus so far has been on weak limits of conditional distributions involving a high-threshold exceedance by a specific component. In the spirit of the multivariate peaks-over-thresholds methodology [10, 22], the following result covers, among other possibilities, the case where the conditioning event involves a high-threshold exceedance in at least one of a number of components.
Theorem 3.9
Suppose the conditions of Theorem 3.1 are fulfilled. Let be continuous and homogeneous of order one, that is, for and . If is contained in a set of the form for some and if , then as and $\mathcal{L}(X/t\mid\rho(X)>t)\stackrel{d}{\longrightarrow}\nu(\,\cdot\,\cap\mathbb{S}_{\rho})/\nu(\mathbb{S}_{\rho})$ as . The conclusions of Theorem 3.1 thus apply to the random vector in the space and relative to the index set .
Examples of in Theorem 3.9 are and , where . The special case and produces multivariate Pareto distributions as in [29, Section 6.3] and other references mentioned in Section 1.2. Also covered by Theorem 3.9 is for non-empty and for all provided for some (and hence all) . If, however, , then decays more rapidly than , and more refined models are needed, opening up a whole new world of possibilities.
Example 3.10
Let follow the max-linear model
[TABLE]
where are scalars such that for all and where are independent and identically distributed nonnegative random variables whose common distribution function has a regularly varying tail function with index . The marginal tails satisfy as . If exceeds a large threshold , the probability that this was due to is proportional to , and then the other factors for are of smaller order than . It follows that (a) in Theorem 3.1 holds where the law of is discrete with at most atoms and is given by
[TABLE]
with denoting a unit point mass at . From (26), we find
[TABLE]
It follows that as soon as for every such that . Indeed, in that case, if is large, then some variable with such that is positive was large, which in turn implies that is large as well, so that .
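The discrete limit law in this example can be checked numerically. The sketch below is a minimal illustration and not the paper's own code. It assumes the standard max-linear form `X_j = max_k A[j, k] * Z_k` for (25), atoms `(A[j, k] / A[i, k])_j` with weights proportional to `A[i, k] ** alpha` for (26), and the root-change identity in the form `E[f(Theta_j) 1{Theta_{j,i} > 0}] = (c_i / c_j) E[f(Theta_i / Theta_{i,j}) Theta_{i,j}^alpha]` with `c_i = sum_k A[i, k] ** alpha`. The names `spectral_atoms` and `root_change_gap` are hypothetical.

```python
import numpy as np


def spectral_atoms(A, i, alpha):
    """Atoms and weights of the limit law of X / X_i given X_i > t (assumed form).

    A is the (d x r) coefficient matrix of X_j = max_k A[j, k] * Z_k.
    Returns an array with one atom per row and the matching probabilities.
    """
    A = np.asarray(A, dtype=float)
    active = np.flatnonzero(A[i] > 0)            # factors that can make X_i large
    weights = A[i, active] ** alpha
    weights = weights / weights.sum()            # chance that factor k caused the exceedance
    atoms = A[:, active].T / A[i, active][:, None]
    return atoms, weights


def root_change_gap(A, i, j, alpha, f):
    """Left-hand side minus right-hand side of the assumed root-change identity."""
    A = np.asarray(A, dtype=float)
    c = (A ** alpha).sum(axis=1)                 # tail constants up to a common factor
    atoms_j, w_j = spectral_atoms(A, j, alpha)
    lhs = sum(w * f(th) for th, w in zip(atoms_j, w_j) if th[i] > 0)
    atoms_i, w_i = spectral_atoms(A, i, alpha)
    rhs = (c[i] / c[j]) * sum(
        w * f(th / th[j]) * th[j] ** alpha
        for th, w in zip(atoms_i, w_i)
        if th[j] > 0                             # atoms with a zero j-th coordinate contribute nothing
    )
    return lhs - rhs
```

For a coefficient matrix with some zero entries, `root_change_gap` should return a value that is zero up to rounding, which reflects that atoms assigning zero to the other root are exactly the ones dropped by the indicator.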
Example 3.11
Recursive max-linear models on directed acyclic graphs were introduced in [13]. Borrowing some of their notation, consider a directed acyclic graph with nodes and edges , where denotes the possibly empty set of parents of . Consider a random vector given by the structural equation model
[TABLE]
where the random variables are as in Example 3.10 with and where all coefficients and are (strictly) positive; the maximum over the empty set is zero by convention. Then by [14, Theorem 2.2], the random vector admits the max-linear representation
[TABLE]
with coefficients , for , defined as follows: and if , while
[TABLE]
where is the collection of paths from to in ; recall the definition of a path in the beginning of Section 2. The representation in (28) is of the form (25) with and with for . It follows that unless or .
The condition that whenever is satisfied as soon as : indeed, in that case, we have , so that implies and thus , implying . Through the structural equation model (27), a large value appearing at a node will also be felt at any of its descendants . We get for , which, by Corollary 3.4, means that the root-change formula (21) applies, by which the law of can be recovered from the one of . Moreover, Theorem 3.8 applies with equal to the set of leaf nodes, i.e., the nodes without descendants.
In the special case that the directed acyclic graph is also a directed, rooted tree, every node has either exactly one parent or is equal to the root node, say . In that case, the collection of paths between and is a singleton, , and the formula for in (28) simplifies to . Furthermore, the tail tree in (26) starting from the root node simplifies to the degenerate distribution at the point with coordinates for . This is of the form in (7) with degenerate increments for all .
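The path-based coefficients of the max-linear representation can be computed by a single pass in topological order. The sketch below is a hedged illustration: it assumes that (27) has the usual recursive form `X_i = max(max over parents k of edge_w[(k, i)] * X_k, node_w[i] * Z_i)`, so that the coefficient of `Z_j` in `X_i` equals `node_w[j]` times the largest product of edge weights along a path from `j` to `i`. The function name and the dictionary layout are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def max_linear_coefficients(parents, edge_w, node_w):
    """Return b[(j, i)], the coefficient of Z_j in X_i, for every pair of nodes.

    parents : dict mapping every node to the list of its parents (possibly empty)
    edge_w  : dict mapping a directed edge (k, i) to its positive weight
    node_w  : dict mapping node i to the positive weight of its own innovation Z_i
    """
    nodes = list(parents)
    order = TopologicalSorter(parents).static_order()   # parents are visited before children
    b = {}
    for i in order:
        for j in nodes:
            if j == i:
                b[(j, i)] = node_w[i]
            else:
                # Best path from j into i passes through one of the parents of i;
                # the value is zero when j is not an ancestor of i.
                b[(j, i)] = max(
                    (edge_w[(k, i)] * b[(j, k)] for k in parents[i]),
                    default=0.0,
                )
    return b
```

When the graph is a rooted tree, every node has at most one parent, so the maximum runs over a single term and the coefficient reduces to the product of edge weights along the unique path.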
4 Regularly varying Markov trees
As in Section 2, let be a nonnegative Markov tree on the undirected tree . The general theory in Section 3 sheds light on the relation between two tail trees emanating at different roots. For two different nodes and in , the sets of directed edges and are the same except for the edges connecting nodes on the path between and , which are directed in opposite ways in the two edge sets: For every , we have and the other way around.
Condition 2 was formulated relative to a single root . The next condition covers all nodes in a non-empty subset of as possible roots. For such , let denote the set of directed edges that appear in at least one of the directed trees .
Condition 4
There exists a non-empty with the following two properties:
- (i) For every , there exists a version of the conditional distribution of given and a probability measure on such that (4) holds.
- (ii) For every edge for which there exists such that and an edge such that , we have (5).
If in Condition 4 then every node that is on the path between and can be added to and Condition 4 remains true. Indeed, for such , we have , which takes care of (i), and for every node , which takes care of (ii). The author is grateful to an anonymous reviewer for having pointed this out.
Corollary 4.1
Let be a nonnegative Markov tree on the undirected tree . Let be non-empty. Let and for all . Assume that there exists a positive function , regularly varying at infinity with index , such that as for every . Assume that Condition 4 holds. Let be a vector of independent random variables such that has law for each . Then all conclusions of Theorem 3.1 hold with and with the tail tree in (7) for .
Proof 4.2
Condition 4 and Corollary 2.3 imply that assumption (a) in Theorem 3.1 is satisfied for and with the tail tree in (7), for every . All equivalence relations and other properties are then as stated in Theorem 3.1.
Corollary 4.3
In Corollary 4.1, if are neighbours in and if they both belong to , then the distributions of and mutually determine each other by
[TABLE]
for all Borel measurable .
Proof 4.4
To find (29), apply (20) to the case and the random vector . The two limit random vectors in Theorem 3.1(a) are and when conditioning on and on , respectively. Equation (29) implies for all , so that the distribution of can be recovered from the one of .
For different roots , the tail trees and have the same multiplicative structure. The differences between their distributions lie in the starting nodes of the paths and in the distributions of the multiplicative increments for edges on the paths and , since these edges change direction. For such edges of which the nodes belong to as well, the increment distributions are related by (29). See Figure 3 for an illustration.
For , the equality has interesting ramifications, see Corollaries 3.4 and 3.6 and Theorem 3.8. If all nodes on the path between and belong to as well, then, since
[TABLE]
we have as soon as for every .
Given the tree structure, the distribution of a Markov tree on is entirely determined by the bivariate distributions for . Markov chains of which all pairs are max-stable were proposed in [5, Section 4.6] and [36]. When extended to trees, this construction method provides models meeting Condition 4.
Example 4.5
Let the distribution of the random pair on be bivariate max-stable with cumulative distribution function
[TABLE]
where is a Pickands dependence function, that is, a convex function such that for all ; see [15] and the references therein. Both marginal distributions are unit-Fréchet, for . In particular, the marginal tail functions are regularly varying at infinity with index .
Let be the left-hand derivative of , which exists everywhere on , takes values between and , and is non-decreasing and continuous from the left; define as the right-hand limit. Since is convex, it is absolutely continuous, and the set of points in where it is not continuously differentiable is at most countable. For such that is differentiable at , we have
[TABLE]
It follows that $\mathcal{L}(Y/x\mid X=x)\stackrel{d}{\longrightarrow}M$ as , where
[TABLE]
This is part (i) of Condition 2. Further, equation (5) in part (ii) of Condition 2 follows from the monotone regression dependence property of bivariate max-stable distributions established in [12], by which the supremum over in (5) is attained in and the limit superior as is bounded by , which tends to [math] as for every fixed .
This construction using bivariate max-stable distributions is in some sense generic. Given a random variable on with expectation , one can define a Pickands dependence function by for , and then (31) holds. The extension to general exponents and tail constants is straightforward.
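To make the generic construction concrete, the following sketch builds a candidate Pickands dependence function from a discrete nonnegative random variable with mean one. The defining formula was lost in this copy of the text, so the code assumes the parametrization `A(w) = E[max(1 - w, w * M)]`; the paper may use the mirrored version with `w` and `1 - w` swapped. The function name is hypothetical.

```python
import numpy as np


def pickands_from_increment(values, probs):
    """Return A with A(w) = E[max(1 - w, w * M)] for a discrete M with mean one.

    values, probs : support points and probabilities of M (assumed parametrization).
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if np.any(values < 0) or not np.isclose(probs.sum(), 1.0):
        raise ValueError("M must be a nonnegative random variable")
    if not np.isclose(values @ probs, 1.0):
        raise ValueError("M must have expectation one")

    def A(w):
        w = np.asarray(w, dtype=float)
        # Broadcast w against the support of M and average over M.
        inner = np.maximum(1.0 - w[..., None], w[..., None] * values)
        return (inner * probs).sum(axis=-1)

    return A
```

Under this assumption, `A` is convex as an average of maxima of linear functions, and it lies between `max(w, 1 - w)` and one because `M` is nonnegative with mean one, which are the defining properties of a Pickands dependence function.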
5 Absolutely continuous case
If the joint distribution of the Markov tree on is absolutely continuous with respect to the Lebesgue measure on , the formulations of the conditions and results simplify considerably. Let denote the joint probability density function of and let , for , denote the marginal density of .
By the Hammersley–Clifford theorem [23, Theorem 3.9], is a Markov tree as soon as the joint density factorizes as
[TABLE]
The second product is over all unordered pairs of neighbours and denotes the bivariate density function of .
For such that , the density of is for . The following condition replaces Condition 4.
Condition 5
For every , there exists a probability density function on such that
[TABLE]
Theorem 5.1
Let the random vector on be a Markov tree on the undirected tree with joint density function . Assume there exists a positive function , regularly varying at infinity with index , such that as for every . If Condition 5 holds, then the conditions of Corollary 4.1 are satisfied with , the same constants , and auxiliary function . For all pairs of neighbours , the density of is and for almost every , we have
[TABLE]
Moreover, for all , so that $b(t)\,\mathbb{P}(X/t\in\,\cdot\,)\longrightarrow\nu$ as , where satisfies
[TABLE]
for every and for every Borel measurable , with the tail tree in (7). Moreover, all tail trees are connected through (21).
Proof 5.2
The function is regularly varying at infinity with index too. By Karamata’s theorem [3, Proposition 1.5.10], we have and thus as .
Condition 4 with follows from Condition 5 and Scheffé’s theorem. Part (ii) of Condition 4 is void, since for every .
If are neighbours, we can apply (29) to , where , to find
[TABLE]
Since this is true for every , we must have for almost every , whence (32).
Since does not have an atom at [math], the identity (29) with implies that . Apply (30) and the observation on the line just below that equation to see that for all . By Corollary 3.4, all tail trees are then connected via (21).
Finally, -convergence to with the stated expression follows from Theorem 3.1 and Corollary 3.6.
Example 5.3
In Example 4.5, assume that is twice continuously differentiable on and that and . The distribution of is then absolutely continuous and the conditional density of given that converges as to the function
[TABLE]
The conditions on imply that and , so that is a probability density function with first moment equal to . Moreover, replacing the function by the Pickands dependence function amounts to changing by the function , in line with (32) with and .
An interesting example in this respect is the bivariate Hüsler–Reiss distribution [19] with Pickands dependence function
[TABLE]
for . Here is a parameter and is the standard normal cumulative distribution function. After tedious calculations, we find that is given by the density of the lognormal random variable , where is a standard normal random variable. Note indeed that . Moreover, the density function satisfies for all , which is (32) with and and . This also follows from the symmetry of the Hüsler–Reiss Pickands dependence function, i.e., for all , so that the pair is exchangeable.
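The symmetry identity for the lognormal limit density is easy to check numerically. The sketch below assumes that the limit variable has the form `exp(sigma * Z - sigma**2 / 2)` with `Z` standard normal, where `sigma` is a stand-in for whatever function of the Hüsler–Reiss parameter the paper uses, and that with index one and equal tail constants the identity (32) reads `q(1/x) = x**3 * q(x)`. Both the parametrization and the function names are assumptions made for illustration.

```python
import math


def lognormal_density(x, sigma):
    """Density at x > 0 of exp(sigma * Z - sigma**2 / 2), Z standard normal."""
    z = (math.log(x) + 0.5 * sigma ** 2) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def inversion_gap(x, sigma):
    """Difference q(1/x) - x**3 * q(x), which should vanish up to rounding."""
    return lognormal_density(1.0 / x, sigma) - x ** 3 * lognormal_density(x, sigma)
```

The centring `-sigma**2 / 2` is what makes the mean equal to one, and the same centring is what produces the factor `x**3` when the density is evaluated at the reciprocal argument.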
If all neighbouring pairs for of the Markov tree follow such Hüsler–Reiss max-stable distributions, the joint distribution of the tail tree is multivariate log-normal, since for all , where the random variables are independent and normally distributed with expectation and variance , with dependence parameter for all .
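The multiplicative structure of such a log-normal tail tree can be simulated directly. The sketch below is an illustration under stated assumptions: each edge carries an independent increment `exp(s * Z - s**2 / 2)` with an edge-specific scale `s` (a hypothetical stand-in for the paper's function of the dependence parameter), the root coordinate equals one, and the value at any other node is the product of the increments along the path from the root. The function name and the data layout are hypothetical.

```python
from collections import deque

import numpy as np


def simulate_tail_tree(neighbours, sigma, root, n, seed=0):
    """Draw n samples of a log-normal tail tree rooted at `root`.

    neighbours : dict mapping each node to the list of its neighbours in the tree
    sigma      : dict mapping frozenset({a, b}) to the scale of the increment on edge {a, b}
    Returns a dict mapping each node to an array of n simulated values.
    """
    rng = np.random.default_rng(seed)
    theta = {root: np.ones(n)}                 # the root coordinate is identically one
    queue = deque([root])
    while queue:                               # breadth-first walk away from the root
        a = queue.popleft()
        for b in neighbours[a]:
            if b in theta:
                continue
            s = sigma[frozenset((a, b))]
            increment = np.exp(s * rng.standard_normal(n) - 0.5 * s ** 2)
            theta[b] = theta[a] * increment    # multiply along the path from the root
            queue.append(b)
    return theta
```

Taking logarithms turns the products into sums of independent normal variables, so the logarithm of each coordinate is normal with variance equal to the sum of the squared scales along the path, which is the multivariate log-normal structure described above.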
Appendix A Proofs for Section 3
Proof A.1 (Proof of Theorem 3.1)
*(a) and (b) are equivalent.* Clearly, (b) implies (a). Conversely, assume (a); let us show (b). Let and let be such that for all . We have
[TABLE]
It follows that as , where is a random variable.
*(b) implies (c) and (i).* Since , statement (b) and the continuous mapping theorem [37, Theorem 2.3] imply that converges weakly to , where is a random variable independent of . Since almost surely, we have .
*(c) implies (a).* Since , statement (c) and the continuous mapping theorem imply statement (a) with .
*(b) implies (d).* Define a Borel measure on by
[TABLE]
If is a Borel subset of contained in for some , then as soon as , since almost surely. As a consequence, for such . It follows that .
By linearity of the integral and by monotone convergence, we find that
[TABLE]
for every nonnegative Borel measurable function on . The same expression is then true for real-valued Borel measurable functions on for which at least one of the two integrals with replaced by is finite. This includes bounded, Borel measurable functions that vanish on a set of the form for some .
Let and let be such that as soon as . By (b), we have
[TABLE]
where is a random variable, independent of . The limit is equal to
[TABLE]
since almost surely whenever , as almost surely.
*(d) implies (c).* For , we have as , and thus by (d). For open , the Portmanteau theorem [25, Theorem 2.1(iii)] yields
[TABLE]
By the Portmanteau lemma for weak convergence [37, Lemma 2.2] we obtain (c) where the law of is .
*(d) implies (e).* For every , we have
[TABLE]
Since the limit is finite for every and since it converges to zero as , it follows by the relative compactness criterion in [25, Theorem 2.5] that for every sequence tending to infinity, there exists a subsequence along which converges in . To show (e), we then need to show that these subsequence limits must coincide. To do so, we show that for every , the limit of exists as . This fixes the value of the integral of such with respect to all subsequence limits, which then must be the same.
For , let be the piece-wise linear function
[TABLE]
Put . Write . Then
[TABLE]
For we can find such that if . Then for all , and thus where, for , we have
[TABLE]
Each function belongs to too but has moreover the property that as soon as . The restriction of to thus belongs to . By (d),
[TABLE]
The existence of a limit has thus been shown, and convergence in to some measure as stated in (e) follows.
*(e) implies (d), (ii), (iii) and (iv).* A function in can be extended to a function in denoted by the same symbol by putting for . Hence, (e) implies (d), with as described in (ii).
Statement (iii) follows from (ii) and the description of the law of in terms of given above in the proof that (d) implies (c).
Similarly, (iv) follows from (ii), equation (33), and Fubini’s theorem.
Proof A.2 (Proof of Theorem 3.8)
It is sufficient to show statement (a) in Theorem 3.1. By property (i) in Theorem 3.8, the weak convergence in Theorem 3.1(a) already holds for all , and we need to show that it also holds for all . Choose and let be as in property (ii) of Theorem 3.8.
We will show that converges weakly as to whose law is defined in (21). Let be open and let . We have
[TABLE]
By Theorem 3.1 applied to , we have $\mathcal{L}(X_{i}/s,X/X_{i}\mid X_{i}>s)\stackrel{d}{\longrightarrow}\operatorname{Pa}(\alpha)\otimes\mathcal{L}(\Theta_{i})$ as . Let be a random variable, independent of . By the Portmanteau lemma for weak convergence, we have
[TABLE]
The equality on the second line follows from (ii) and the fact that is uniformly distributed on and independent of . Since was arbitrary, the monotone convergence theorem yields . Apply the Portmanteau lemma for weak convergence once more to obtain the stated weak convergence.
Proof A.3 (Proof of Theorem 3.9)
The properties of imply that is open and non-empty and that . The boundary of is , which is a -null set, since its -measure is bounded by the sum over of , which is zero by (17). Since if and only if , the Portmanteau theorem [25, Theorem 2.1(iv)] implies as .
Let be open. By (iii) of the same Portmanteau theorem,
[TABLE]
The Portmanteau theorem for weak convergence implies the stated weak convergence of as . This proves statement (c) in Theorem 3.1 for the enlarged random vector .
Acknowledgements
The author is grateful to two anonymous reviewers whose suggestions have led to various improvements throughout the text. The author also wishes to thank Stefka Asenova and Gildas Mazo for inspiring discussions.
- [1] Asadi, P., Davison, A. C. and Engelke, S. (2015). Extremes on river networks. The Annals of Applied Statistics 9, 2023–2050.
- [2] Basrak, B. and Segers, J. (2009). Regularly varying multivariate time series. Stochastic Processes and Their Applications 119, 1055–1080.
- [3] Bingham, N. H., Goldie, C. M. and Teugels, J. L. (1987). Regular Variation. Cambridge University Press, Cambridge.
- [4] Bortot, P. and Coles, S. (2000). A sufficiency property arising from the characterization of extremes of a Markov chain. Bernoulli 6, 183–190.
- [5] Coles, S. G. and Tawn, J. A. (1991). Modelling extreme multivariate events. Journal of the Royal Statistical Society. Series B (Methodological) 53, 377–392.
- [6] Das, B. and Resnick, S. I. (2011). Conditioning on an extreme component: Model consistency with regular variation on cones. Bernoulli 17, 226–252.
- [7] Dombry, C., Hashorva, E. and Soulier, P. (2018). Tail measure and spectral tail process of regularly varying time series. The Annals of Applied Probability 28, 3884–3921.
- [8] Dombry, C. and Ribatet, M. (2015). Functional regular variations, Pareto processes and peaks over threshold. Statistics and Its Interface 8, 9–17.
