Regression Type Models for Extremal Dependence

Linda Mhalla; Miguel de Carvalho; Valérie Chavez-Demoulin

arXiv:1704.08447·stat.ME·November 28, 2017

TL;DR

This paper introduces a flexible modeling framework for understanding how covariates influence the dependence structure of multivariate extreme values, with applications to temperature extremes.

Contribution

It develops a vector generalized additive model for covariate effects on angular densities in multivariate extremes, including estimation and theoretical properties.

Findings

01. The method accurately recovers covariate-adjusted angular densities in simulations.

02. Application reveals significant dependence dynamics of extreme temperatures between resorts.

03. Estimation procedure is consistent and asymptotically normal.

Abstract

We propose a vector generalized additive modeling framework for taking into account the effect of covariates on angular density functions in a multivariate extreme value context. The proposed methods are tailored for settings where the dependence between extreme values may change according to covariates. We devise a maximum penalized log-likelihood estimator, discuss details of the estimation procedure, and derive its consistency and asymptotic normality. The simulation study suggests that the proposed methods perform well in a wealth of simulation scenarios by accurately recovering the true covariate-adjusted angular density. Our empirical analysis reveals relevant dynamics of the dependence between extreme air temperatures in two alpine resorts during the winter season. Supplementary materials for this article are available online.

Tables

Table 1: Mean integrated absolute error (MIAE) estimates computed from $500$ samples for the covariate-adjusted spectral densities in Examples 1–3; $n_{\mathbf{r}}$ denotes the number of angular observations.

$n_{\mathbf{r}}$	Covariate-adjusted angular density	MIAE
$300$	Logistic	$0.3936$
	Dirichlet	$0.3337$
	Hüsler–Reiss	$0.2463$
$500$	Logistic	$0.3538$
	Dirichlet	$0.2600$
	Hüsler–Reiss	$0.2016$

Table 2: Selected models in each family of angular densities along with their AICs. The link functions $g$ are the logit function for the logistic model and the logarithm function for the Dirichlet and the Hüsler–Reiss models. The functions $\hat{f}$ with subscripts $t$, $z$, and $d$ are fitted smooth functions of time, NAO, and day in season, respectively.

Covariate-adjusted angular density	VGAM	AIC
Logistic	$\hat{\alpha}(t,z,d)=g^{-1}\{\hat{\alpha}_{0}+\hat{f}_{t}(t)+\hat{f}_{z}(z)+\hat{f}_{d}(d)\}$	$-280.15$
Dirichlet	$\hat{\alpha}(z)=g^{-1}\{\hat{\alpha}_{0}+\hat{f}_{z}(z)\}$	$-290.05$
Dirichlet	$\hat{\beta}(t,d)=g^{-1}\{\hat{\beta}_{0}+\hat{f}_{t}(t)+\hat{f}_{d}(d)\}$	$-290.05$
Hüsler–Reiss	$\hat{\lambda}(t,z,d)=g^{-1}\{\hat{\lambda}_{0}+\hat{f}_{t}(t)+\hat{f}_{z}(z)+\hat{f}_{d}(d)\}$	$-275.64$

Table 3: Selected models in each family of angular densities along with their AICs. The link functions $g$ are the logit function for the logistic model and the logarithm function for the Dirichlet and the Hüsler–Reiss models. The functions $\hat{f}$ with subscripts $t$ and $d$ are fitted smooth functions of time and day in season, respectively.

Covariate-adjusted angular density	VGAM	AIC
Logistic	$\hat{\alpha}(d)=g^{-1}\{\hat{\alpha}_{0}+\hat{f}_{d}(d)\}$	$-402.76$
Dirichlet	$\hat{\alpha}\equiv g^{-1}(\hat{\alpha}_{0})$	$-404.95$
Dirichlet	$\hat{\beta}(t,d)=g^{-1}\{\hat{\beta}_{0}+\hat{f}_{t}(t)+\hat{f}_{d}(d)\}$	$-404.95$
Hüsler–Reiss	$\hat{\lambda}(t,d)=g^{-1}\{\hat{\lambda}_{0}+\hat{f}_{t}(t)+\hat{f}_{d}(d)\}$	$-402.98$

Equations

G_{(\mu_{\mathbf{x}},\sigma_{\mathbf{x}},\xi_{\mathbf{x}})}(y)=\exp\bigg[-\bigg\{1+\xi_{\mathbf{x}}\bigg(\frac{y-\mu_{\mathbf{x}}}{\sigma_{\mathbf{x}}}\bigg)\bigg\}_{+}^{-1/\xi_{\mathbf{x}}}\bigg],

V_{H}(\mathbf{y})=\int_{S_{d}}\max\bigg(\frac{w_{1}}{y_{1}},\ldots,\frac{w_{d}}{y_{d}}\bigg)\,\mathrm{d}H(\mathbf{w}).

\int_{S_{d}}w_{j}\,\mathrm{d}H(\mathbf{w})=1,\quad j=1,\ldots,d.

\mu(A_{\mathbf{y}})=V(\mathbf{y}),

\mu(\mathrm{d}\mathbf{y})=-V_{1:d}(\mathbf{y})\,\mathrm{d}\mathbf{y},

(R,\mathbf{W})=\bigg(\lVert\mathbf{Y}\rVert,\frac{\mathbf{Y}}{\lVert\mathbf{Y}\rVert}\bigg),

\mu(\mathrm{d}\mathbf{y})=\mu(\mathrm{d}r\times\mathrm{d}\mathbf{w})=\frac{\mathrm{d}r}{r^{2}}\,\mathrm{d}H(\mathbf{w}).

E_{\mathbf{r}}=\bigg\{(y_{1},\ldots,y_{d})\in(0,\infty)^{d}:\sum_{j=1}^{d}\frac{y_{j}}{r_{j}}>1\bigg\},

R_{i}=\lVert\mathbf{Y}^{i}\rVert>\bigg(\sum_{j=1}^{d}\frac{W_{i,j}}{r_{j}}\bigg)^{-1},\quad\text{where}\quad W_{i,j}=\frac{Y_{j}^{i}}{R_{i}}.

\mu(E_{\mathbf{r}})

\ell_{E_{\mathbf{r}}}(\bm{\theta})=-\mu(E_{\mathbf{r}})+\sum_{i=1}^{n_{\mathbf{r}}}\log\{\mu(\mathrm{d}R_{i}\times\mathrm{d}\mathbf{W}_{i})\},

\ell_{E_{\mathbf{r}}}(\bm{\theta})\equiv\sum_{i=1}^{n_{\mathbf{r}}}\log\{-V_{1:d}(\mathbf{Y}^{i};\bm{\theta})\}.

V_{1:d}(\mathbf{y};\bm{\theta})=-\lVert\mathbf{y}\rVert^{-(d+1)}h\bigg(\frac{y_{1}}{\lVert\mathbf{y}\rVert},\ldots,\frac{y_{d}}{\lVert\mathbf{y}\rVert};\bm{\theta}\bigg)

\ell_{E_{\mathbf{r}}}(\bm{\theta})

\int_{S_{d}}w_{j}\,\mathrm{d}H_{\mathbf{x}}(\mathbf{w})=1,\quad j=1,\ldots,d.

G_{\mathbf{x}}(\mathbf{y})=\exp\bigg\{-\int_{S_{d}}\max\bigg(\frac{w_{1}}{y_{1}},\ldots,\frac{w_{d}}{y_{d}}\bigg)\,\mathrm{d}H_{\mathbf{x}}(\mathbf{w})\bigg\}.

G_{\mathbf{x}}(y\bm{1}_{d})=\exp(-\vartheta_{\mathbf{x}}/y),\quad y>0,

h_{\mathbf{x}}(w)=(1/\alpha_{\mathbf{x}}-1)\{w(1-w)\}^{-1-1/\alpha_{\mathbf{x}}}\{w^{-1/\alpha_{\mathbf{x}}}+(1-w)^{-1/\alpha_{\mathbf{x}}}\}^{\alpha_{\mathbf{x}}-2},\quad w\in(0,1),

h_{\mathbf{x}}(w)=\frac{\alpha_{\mathbf{x}}\beta_{\mathbf{x}}\Gamma(\alpha_{\mathbf{x}}+\beta_{\mathbf{x}}+1)(\alpha_{\mathbf{x}}w)^{\alpha_{\mathbf{x}}-1}\{\beta_{\mathbf{x}}(1-w)\}^{\beta_{\mathbf{x}}-1}}{\Gamma(\alpha_{\mathbf{x}})\Gamma(\beta_{\mathbf{x}})\{\alpha_{\mathbf{x}}w+\beta_{\mathbf{x}}(1-w)\}^{\alpha_{\mathbf{x}}+\beta_{\mathbf{x}}+1}},\quad w\in(0,1),

h_{\mathbf{x}}(w)=\frac{\lambda_{\mathbf{x}}}{w(1-w)^{2}(2\pi)^{1/2}}\exp\bigg\{-\dfrac{\left[2+\lambda_{\mathbf{x}}^{2}\log\left\{w/(1-w)\right\}\right]^{2}}{8\lambda_{\mathbf{x}}^{2}}\bigg\},\quad w\in(0,1),

h_{\mathbf{x}}(\mathbf{w})

h_{i,j_{\mathbf{x}}}(\mathbf{w})

\{(\mathbf{x}^{i},\mathbf{Y}^{i}):\mathbf{Y}^{i}\in E_{\mathbf{r}}\},

h_{\mathbf{x}}\bigg(\frac{Y_{1}^{i}}{\lVert\mathbf{Y}^{i}\rVert},\frac{Y_{2}^{i}}{\lVert\mathbf{Y}^{i}\rVert}\bigg)=h_{\mathbf{x}}(w_{i},1-w_{i})\equiv h_{\mathbf{x}}(w_{i}),\quad\text{for } w_{i}\in[0,1],\ i=1,\ldots,n_{\mathbf{r}},

\bm{\theta}_{\mathbf{x}}

\mathbf{x}

\bm{\eta}(\mathbf{x})\equiv\bm{\eta}=\mathbf{H}_{0}\bm{\beta}_{[0]}+\sum_{k=1}^{q}\mathbf{H}_{k}\mathbf{f}_{k}(\mathbf{x}_{k}).

f_{k,l}(x_{k}^{i})=\sum_{s=1}^{d_{k}}\beta_{[kl]_{s}}B_{s,\tilde{q}}(x_{k}^{i}),\quad k=1,\ldots,q,\quad l=1,\ldots,p,\quad i=1,\ldots,n_{\mathbf{r}},

\bm{\beta}_{[k]}=(\beta_{[k1]_{1}},\ldots,\beta_{[k1]_{\tilde{d}}},\ldots,\beta_{[kp]_{1}},\ldots,\beta_{[kp]_{\tilde{d}}})^{\mathrm{\scriptscriptstyle T}}\in\mathbb{R}^{\tilde{d}p}.

\bm{\eta}=\bm{\beta}_{[0]}+\sum_{k=1}^{q}\mathbf{X}_{[k]}\bm{\beta}_{[k]}=\mathbf{X}_{\text{VAM}}\bm{\beta},

\begin{cases}\bm{\beta}=(\bm{\beta}_{[0]}\ \bm{\beta}_{[1]}\ \cdots\ \bm{\beta}_{[q]})^{\mathrm{\scriptscriptstyle T}}\in\mathbf{B}\subset\mathbb{R}^{p(1+q\tilde{d})},\\ \mathbf{X}_{\text{VAM}}=(\bm{1}_{pn_{\mathbf{r}}\times p}\ \mathbf{X}_{[1]}\ \cdots\ \mathbf{X}_{[q]})\in\mathbb{R}^{pn_{\mathbf{r}}\times\{p(1+q\tilde{d})\}}\end{cases}

Full text

Regression Type Models for Extremal Dependence

Linda Mhalla ${}^{\text{a}}$, Miguel de Carvalho ${}^{\text{b}}$, Valérie Chavez-Demoulin ${}^{\text{c}}$

${}^{\text{a}}$ Geneva School of Economics and Management (GSEM), Université de Genève, Switzerland; ${}^{\text{b}}$ School of Mathematics, University of Edinburgh, UK; ${}^{\text{c}}$ Faculty of Business and Economics (HEC), Université de Lausanne, Switzerland.

Contact: Valérie Chavez-Demoulin ([email protected]), Faculty of Business and Economics (HEC), Université de Lausanne, Switzerland.

Supplementary materials for this article are available online.

Keywords: Angular density; Covariate-adjustment; Penalized log-likelihood; Statistics of multivariate extremes; VGAM.

1 Introduction

In this paper, we address an extension of the standard approach for modeling non-stationary univariate extremes to the multivariate setting. In the univariate context, the limiting distribution for the maximum of a sequence of independent and identically distributed random variables, derived by Fisher and Tippett (1928), is given by a generalized extreme value distribution characterized by three parameters: $\mu$ (location), $\sigma$ (scale), and $\xi$ (shape). To take into account the effect of a vector of covariates $\mathbf{x}$ , one can let these parameters depend on $\mathbf{x}$ , and the resulting generalized extreme value distribution takes the form

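$$G_{(\mu_{\mathbf{x}},\sigma_{\mathbf{x}},\xi_{\mathbf{x}})}(y)=\exp\bigg[-\bigg\{1+\xi_{\mathbf{x}}\bigg(\frac{y-\mu_{\mathbf{x}}}{\sigma_{\mathbf{x}}}\bigg)\bigg\}_{+}^{-1/\xi_{\mathbf{x}}}\bigg],\tag{1}$$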

where $(a)_{+}=\max\{0,a\}$; see Coles (2001, ch. 6), Pauli and Coles (2001), Chavez-Demoulin and Davison (2005), Yee and Stephenson (2007), Wang and Tsai (2009), and Eastoe and Tawn (2009) for related approaches.
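As a minimal sketch of how the covariate-dependent distribution function in (1) can be evaluated, the Python snippet below implements the formula directly. The function `gev_params` and its coefficients are hypothetical and serve only to illustrate letting the parameters depend on a scalar covariate $x$; they are not taken from the paper.

```python
import math

def gev_cdf(y, mu, sigma, xi):
    """GEV distribution function exp[-{1 + xi (y - mu) / sigma}_+^(-1/xi)]."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if xi == 0.0:
        # Gumbel limit of the GEV family as xi -> 0.
        return math.exp(-math.exp(-(y - mu) / sigma))
    t = 1.0 + xi * (y - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_params(x):
    """Hypothetical covariate model (illustration only, not from the paper)."""
    mu = 1.0 + 0.5 * x          # location linear in x
    sigma = math.exp(0.1 * x)   # log-link keeps the scale positive
    xi = 0.2                    # shape held constant
    return mu, sigma, xi

def covariate_gev_cdf(y, x):
    return gev_cdf(y, *gev_params(x))
```

With $\mu=\sigma=\xi=1$ the formula reduces to $\exp(-1/y)$, the unit Fréchet distribution function used for the margins in the multivariate setting.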

In the multivariate context, consider $\mathbf{Y}^{i}=\left(Y^{i}_{1},\ldots,Y^{i}_{d}\right)^{\mathrm{\scriptscriptstyle T}}$ independent and identically distributed random vectors with joint distribution $F$ , and unit Fréchet marginal distribution functions $F_{j}(y)=\exp(-1/y)$ , for $y>0$ . Pickands’ representation theorem (Coles, 2001, Theorem 8.1) states that the law of the standardized componentwise maxima, $\mathbf{M}_{n}=n^{-1}\max\{\mathbf{Y}^{1},\ldots,\mathbf{Y}^{n}\}$ , converges in distribution to a multivariate extreme value distribution, $G_{H}(\mathbf{y})=\exp\left\{-V_{H}(\mathbf{y})\right\},$ with

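$$V_{H}(\mathbf{y})=\int_{S_{d}}\max\bigg(\frac{w_{1}}{y_{1}},\ldots,\frac{w_{d}}{y_{d}}\bigg)\,\mathrm{d}H(\mathbf{w}).\tag{2}$$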

Here $H$ is the so-called angular measure, that is, a positive finite measure on the unit simplex $S_{d}=\left\{(w_{1},\ldots,w_{d})\in[0,\infty)^{d}:w_{1}+\cdots+w_{d}=1\right\}$ that needs to obey

$$\int_{S_{d}}w_{j}\,\mathrm{d}H(\mathbf{w})=1,\quad j=1,\ldots,d.\tag{3}$$

The function $V(\mathbf{y})\equiv V_{H}(\mathbf{y})$ is the so-called exponent measure and is continuous, convex, and homogeneous of order $-1$ , i.e., $V(t\mathbf{y})=t^{-1}V(\mathbf{y})$ for all $t>0$ .

The class of limiting distributions of multivariate extreme values yields an infinite number of possible parametric representations (Coles, 2001, ch. 8), as the validity of a multivariate extreme value distribution is conditional on its angular measure $H$ satisfying the moment constraint (3). Therefore, most literature has focused on the estimation of the extremal dependence structures described by spectral measures or equivalently angular densities (Boldi and Davison, 2007; Einmahl et al., 2009; de Carvalho et al., 2013; Sabourin and Naveau, 2014; Hanson et al., 2017). Related quantities, such as the Pickands dependence function (Pickands, 1981) and the stable tail dependence function (Huang, 1992; Drees and Kaufmann, 1998), were investigated by many authors (Einmahl et al., 2006; Gudendorf and Segers, 2012; Wadsworth and Tawn, 2013; Marcon et al., 2016). A wide variety of parametric models for the spectral density that allow flexible dependence structures were proposed (Kotz and Nadarajah, 2000, sec. 3.4).

However, few papers were able to satisfactorily address the challenging but incredibly relevant setting of modeling nonstationarity at joint extreme levels. Some exceptions include de Carvalho and Davison (2014), who proposed a nonparametric approach, where a family of spectral densities is constructed using exponential tilting. Castro and de Carvalho (2017) developed an extension of this approach based on covariate-varying spectral densities. However, these approaches are limited to replicated one-way ANOVA types of settings. de Carvalho (2016) advocated the use of covariate-adjusted angular densities, and Escobar-Bach et al. (2016) discussed estimation—in the bivariate and covariate-dependent framework—of the Pickands dependence function based on local estimation with a minimum density power divergence criterion. Finally, Mhalla et al. (2017) constructed, in a nonparametric framework, smooth models for predictor-dependent Pickands dependence functions based on generalized additive models.

Our approach is based on a non-linear model for covariate-varying extremal dependences. Specifically, we develop a vector generalized additive model that flexibly allows the extremal dependence to change with a set of covariates, but—keeping in mind that extreme values are scarce—it borrows strength from a parametric assumption. In other words, the goal is to develop a regression model for the extremal dependence through the parametric specification of an extremal dependence structure and then to model the parameters of that structure through a vector generalized additive model (VGAM) (Yee and Wild, 1996; Yee, 2015). One major advantage over existing methods is that our model may be used for handling an arbitrary number of dimensions and covariates of different types, and it is straightforward to implement, as illustrated in the R code (R Development Core Team, 2016) in the Supplementary Materials.

The remainder of this paper is organized as follows. In Section 2 we introduce the proposed model for covariate-adjusted extremal dependences. In Section 3 we develop our penalized likelihood approach and give details on the asymptotic properties of our estimator. In Section 4 we assess the performance of the proposed methods. An application to extreme temperatures in the Swiss Alps is given in Section 5. We close the paper in Section 6 with a discussion.

2 Flexible Covariate-Adjusted Angular Densities

2.1 Statistics of Multivariate Extremes: Preparations and Background

The functions $H$ and $V$ in (2) can be used to describe the structure of dependence between the extremes, as in the case of independence between the extremes, where $V(\mathbf{y})=\sum_{j=1}^{d}1/y_{j}$ , and in the case of perfect extremal dependence, where $V(\mathbf{y})=\max\{1/y_{1},\ldots,1/y_{d}\}$ . As a consequence, if $H$ is differentiable with angular density denoted $h$ , the more mass around the barycenter of $S_{d}$ , $(d^{-1},\ldots,d^{-1})$ , the higher the level of extremal dependence. Further insight into these measures may be obtained by considering the point process $P_{n}=\{n^{-1}\mathbf{Y}^{i}:i=1,\ldots,n\}$ . Following de Haan and Resnick (1977) and Resnick (1987, sec. 5.3), as $n\rightarrow\infty$ , $P_{n}$ converges to a non-homogeneous Poisson point process $P$ defined on $[\mathbf{0},\bm{\infty})\setminus\{\mathbf{0}\}$ with a mean measure $\mu$ that verifies

$$\mu(A_{\mathbf{y}})=V(\mathbf{y}),$$

where $A_{\mathbf{y}}=\mathbb{R}^{d}\setminus\left(\left[-\mathbf{\infty},y_{1}\right]\times\cdots\times\left[-\mathbf{\infty},y_{d}\right]\right)$ .

There are two representations of the intensity measure of the limiting Poisson point process $P$ that will be handy for our purposes. First, it holds that

$$\mu(\mathrm{d}\mathbf{y})=-V_{1:d}(\mathbf{y})\,\mathrm{d}\mathbf{y},\tag{4}$$

with $V_{1:d}$ being the derivative of $V$ with respect to all its arguments (Resnick, 1987, sec. 5.4). Second, another useful factorization of the intensity measure $\mu(\mathrm{d}\mathbf{y})$ , called the spectral decomposition, can be obtained using the following decomposition of the random variable $\mathbf{Y}=(Y_{1},\ldots,Y_{d})^{\mathrm{\scriptscriptstyle T}}$ into radial and angular coordinates,

$$(R,\mathbf{W})=\bigg(\lVert\mathbf{Y}\rVert,\frac{\mathbf{Y}}{\lVert\mathbf{Y}\rVert}\bigg),$$

where $\left\lVert\cdot\right\rVert$ denotes the $L_{1}$ -norm. Indeed, it can be shown that (Beirlant et al., 2004, sec. 8.2.3) the limiting intensity measure factorizes across radial and angular components as follows:

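$$\mu(\mathrm{d}\mathbf{y})=\mu(\mathrm{d}r\times\mathrm{d}\mathbf{w})=\frac{\mathrm{d}r}{r^{2}}\,\mathrm{d}H(\mathbf{w}).\tag{5}$$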

The spectral decomposition (5) allows the separation of the marginal and the dependence parts in the multivariate extreme value distribution $G_{H}$ , with the margins being unit Fréchet and the dependence structure being described by the angular measure $H$ .

The inference approach that we build on in this paper was developed by Coles and Tawn (1991) and is based on threshold excesses; see Huser et al. (2016) for a detailed review of likelihood estimators for multivariate extremes. The set of extreme events is defined as the set of observations with radial components exceeding a high fixed threshold, that is, the observations belonging to the extreme set,

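$$E_{\mathbf{r}}=\bigg\{(y_{1},\ldots,y_{d})\in(0,\infty)^{d}:\sum_{j=1}^{d}\frac{y_{j}}{r_{j}}>1\bigg\},$$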

with $\mathbf{r}=(r_{1},\ldots,r_{d})$ being a large threshold vector. Since the points $n^{-1}\mathbf{Y}^{i}$ are mapped to the origin for non-extreme observations, the threshold $\mathbf{r}$ needs to be sufficiently large for the Poisson approximation to hold. Note that $\mathbf{Y}^{i}\in E_{\mathbf{r}}$ if and only if

$$R_{i}=\lVert\mathbf{Y}^{i}\rVert>\bigg(\sum_{j=1}^{d}\frac{W_{i,j}}{r_{j}}\bigg)^{-1},\quad\text{where}\quad W_{i,j}=\frac{Y_{j}^{i}}{R_{i}}.$$

Hence, the expected number of points of the Poisson process $P$ located in the extreme region $E_{\mathbf{r}}$ is

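$$\mu(E_{\mathbf{r}})=\int_{S_{d}}\int_{(\sum_{j=1}^{d}w_{j}/r_{j})^{-1}}^{\infty}\frac{\mathrm{d}r}{r^{2}}\,\mathrm{d}H(\mathbf{w})=\sum_{j=1}^{d}\frac{1}{r_{j}}\int_{S_{d}}w_{j}\,\mathrm{d}H(\mathbf{w})=\sum_{j=1}^{d}\frac{1}{r_{j}}.\tag{6}$$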
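The radial and angular decomposition and the two equivalent descriptions of the extreme set $E_{\mathbf{r}}$ can be sketched in a few lines of Python. The threshold vector and the sample points below are illustrative values chosen for this sketch, not data from the paper.

```python
def to_pseudo_polar(y):
    """Radial and angular parts (R, W) = (||y||_1, y / ||y||_1) of a positive vector."""
    radius = sum(y)
    return radius, [yj / radius for yj in y]

def in_extreme_set(y, thresholds):
    """Direct definition: y lies in E_r when sum_j y_j / r_j > 1."""
    return sum(yj / rj for yj, rj in zip(y, thresholds)) > 1.0

def in_extreme_set_polar(y, thresholds):
    """Equivalent radial criterion: R > (sum_j W_j / r_j)^(-1)."""
    radius, angles = to_pseudo_polar(y)
    return radius > 1.0 / sum(wj / rj for wj, rj in zip(angles, thresholds))

# Illustrative values only (assumed for this sketch).
thresholds = (50.0, 80.0)
sample = [(10.0, 5.0), (40.0, 30.0), (3.0, 90.0), (20.0, 20.0)]

# Observations retained as extreme; only these enter the Poisson likelihood.
extremes = [y for y in sample if in_extreme_set(y, thresholds)]
```

Note that an observation can be extreme because a single component is large, as with `(3.0, 90.0)`, or because the components are jointly large, as with `(40.0, 30.0)`.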

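Now, we can explicitly formulate the Poisson log-likelihood over the set $E_{\mathbf{r}}$ ,

$$\ell_{E_{\mathbf{r}}}(\bm{\theta})=-\mu(E_{\mathbf{r}})+\sum_{i=1}^{n_{\mathbf{r}}}\log\{\mu(\mathrm{d}R_{i}\times\mathrm{d}\mathbf{W}_{i})\},\tag{7}$$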

where $\bm{\theta}$ represents the $p$-vector of parameters of the measure $\mu$ and $n_{\mathbf{r}}$ represents the number of reindexed observations in the extreme set $E_{\mathbf{r}}$ . Using (6), the first term in (7) can be omitted when maximizing the Poisson log-likelihood, which, using (4), boils down to

$$\ell_{E_{\mathbf{r}}}(\bm{\theta})\equiv\sum_{i=1}^{n_{\mathbf{r}}}\log\{-V_{1:d}(\mathbf{Y}^{i};\bm{\theta})\}.\tag{8}$$

Thanks to the differentiability of the exponent measure $V$ and the support of the angular measure $H$ in the unit simplex $S_{d}$ , we can use the result of Coles and Tawn (1991, Theorem 1) that relates the angular density to the exponent measure via

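$$V_{1:d}(\mathbf{y};\bm{\theta})=-\lVert\mathbf{y}\rVert^{-(d+1)}h\bigg(\frac{y_{1}}{\lVert\mathbf{y}\rVert},\ldots,\frac{y_{d}}{\lVert\mathbf{y}\rVert};\bm{\theta}\bigg)$$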

and reformulate the log-likelihood (8) as follows

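$$\ell_{E_{\mathbf{r}}}(\bm{\theta})\equiv\sum_{i=1}^{n_{\mathbf{r}}}\log\bigg\{\lVert\mathbf{Y}^{i}\rVert^{-(d+1)}h\bigg(\frac{Y_{1}^{i}}{\lVert\mathbf{Y}^{i}\rVert},\ldots,\frac{Y_{d}^{i}}{\lVert\mathbf{Y}^{i}\rVert};\bm{\theta}\bigg)\bigg\}.$$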

2.2 Vector Generalized Additive Models for Covariate-Adjusted Angular Densities

Our starting point for modeling is an extension of (1) to the multivariate setting. Whereas the model in (1) is based on indexing the parameters of the univariate extreme value distribution with a regressor, here we index the parameter ( $H$ ) of a multivariate extreme value distribution ( $G_{H}$ ) with a regressor $\mathbf{x}=(x_{1},\ldots,x_{q})^{\mathrm{\scriptscriptstyle T}}\in\mathcal{X}\subset\mathbb{R}^{q}$ . Our target object of interest is thus given by a family of covariate-adjusted angular measures $H_{\mathbf{x}}$ obeying

$$\int_{S_{d}}w_{j}\,\mathrm{d}H_{\mathbf{x}}(\mathbf{w})=1,\quad j=1,\ldots,d.$$

Of particular interest is the setting where $H_{\mathbf{x}}$ is differentiable, in which case the covariate-adjusted angular density can be defined as $h_{\mathbf{x}}(\mathbf{w})=\mathrm{d}H_{\mathbf{x}}/\mathrm{d}\mathbf{w}$ . This yields a corresponding family of covariate-indexed multivariate extreme value distributions

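$$G_{\mathbf{x}}(\mathbf{y})=\exp\bigg\{-\int_{S_{d}}\max\bigg(\frac{w_{1}}{y_{1}},\ldots,\frac{w_{d}}{y_{d}}\bigg)\,\mathrm{d}H_{\mathbf{x}}(\mathbf{w})\bigg\}.$$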

Other natural objects depending on $G_{\mathbf{x}}$ can be readily defined, such as the covariate-adjusted extremal coefficient, $\vartheta_{\mathbf{x}}$ , which solves

$$G_{\mathbf{x}}(y\bm{1}_{d})=\exp(-\vartheta_{\mathbf{x}}/y),\quad y>0,$$

where $\bm{1}_{d}$ is a $d$-vector of ones. Here, $\vartheta_{\mathbf{x}}$ ranges from 1 to $d$ , and the closer $\vartheta_{\mathbf{x}}$ is to one, the closer we get to the case of complete dependence at that value of the covariate.
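Because the exponent measure is homogeneous of order $-1$, the extremal coefficient equals the exponent measure evaluated at the vector of ones. The short Python sketch below checks the two boundary values, using the independence and complete-dependence exponent measures stated in Section 2.1; the function names are chosen for this illustration.

```python
def v_independence(y):
    """Exponent measure under independence: V(y) = sum_j 1 / y_j."""
    return sum(1.0 / yj for yj in y)

def v_complete_dependence(y):
    """Exponent measure under perfect extremal dependence: V(y) = max_j 1 / y_j."""
    return max(1.0 / yj for yj in y)

def extremal_coefficient(exponent_measure, d):
    """theta = V(1_d), since G(y 1_d) = exp{-V(y 1_d)} = exp{-V(1_d) / y}."""
    return exponent_measure([1.0] * d)

d = 3
theta_indep = extremal_coefficient(v_independence, d)        # upper bound d
theta_dep = extremal_coefficient(v_complete_dependence, d)   # lower bound 1
```

Any valid exponent measure yields a value between these two bounds, which is what makes the extremal coefficient a convenient one-number summary of dependence at a given covariate value.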

Some parametric models (Tawn, 1990; Coles and Tawn, 1991; Hüsler and Reiss, 1989; Cooley et al., 2010) are used below to illustrate the concept of covariate-adjusted angular densities and of a covariate-adjusted extremal coefficient, and we focus on the bivariate and trivariate settings for the sake of illustrating ideas. To develop insight and intuition on these models, see Figures 1 and 2.

Example 1 (Logistic angular surface).

Let

$$h_{\mathbf{x}}(w)=(1/\alpha_{\mathbf{x}}-1)\{w(1-w)\}^{-1-1/\alpha_{\mathbf{x}}}\{w^{-1/\alpha_{\mathbf{x}}}+(1-w)^{-1/\alpha_{\mathbf{x}}}\}^{\alpha_{\mathbf{x}}-2},\quad w\in(0,1),$$

with $\alpha:\mathcal{X}\subset\mathbb{R}^{q}\to(0,1]$ . In Figure 1 (left) we represent the case $\alpha_{x}=\exp\{\eta(x)\}/[1+\exp\{\eta(x)\}]$ , with $\eta(x)=x^{2}-0.5x-1$ and $x\in\mathcal{X}=[0.1,2]$ . This setup corresponds to a transition from a case of relatively high extremal dependence (lower values of $x$ ) to a case where we approach asymptotic independence (higher values of $x$ ).
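As a rough numerical check of Example 1, the sketch below evaluates the logistic angular density at the covariate value $x=0.1$, verifies the moment constraint $\int_{0}^{1}w\,h_{x}(w)\,\mathrm{d}w=1$, and computes the extremal coefficient as $\int_{0}^{1}\max(w,1-w)\,h_{x}(w)\,\mathrm{d}w$. The midpoint rule and the helper names are choices made for this illustration; the closed form $2^{\alpha}$ used for comparison is the standard extremal coefficient of the bivariate logistic model.

```python
import math

def alpha_of_x(x):
    """Example 1: alpha_x = expit(eta(x)) with eta(x) = x^2 - 0.5 x - 1."""
    eta = x * x - 0.5 * x - 1.0
    return 1.0 / (1.0 + math.exp(-eta))

def logistic_angular_density(w, alpha):
    """Bivariate logistic angular density h(w; alpha), 0 < w < 1, 0 < alpha < 1."""
    a = 1.0 / alpha
    return ((a - 1.0) * (w * (1.0 - w)) ** (-1.0 - a)
            * (w ** (-a) + (1.0 - w) ** (-a)) ** (alpha - 2.0))

def midpoint_integral(f, n=20000):
    """Midpoint rule on (0, 1); it never evaluates f at the endpoints."""
    step = 1.0 / n
    return step * sum(f((i + 0.5) * step) for i in range(n))

alpha = alpha_of_x(0.1)  # strong-dependence end of the covariate range
first_moment = midpoint_integral(
    lambda w: w * logistic_angular_density(w, alpha))
theta = midpoint_integral(
    lambda w: max(w, 1.0 - w) * logistic_angular_density(w, alpha))
closed_form = 2.0 ** alpha  # extremal coefficient of the logistic model
```

Since $\alpha_{x}$ increases over $[0.1,2]$, the extremal coefficient $2^{\alpha_{x}}$ moves toward 2 as $x$ grows, in line with the weakening dependence described in the example.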

Example 2 (Dirichlet angular surface).

Let

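$$h_{\mathbf{x}}(w)=\frac{\alpha_{\mathbf{x}}\beta_{\mathbf{x}}\Gamma(\alpha_{\mathbf{x}}+\beta_{\mathbf{x}}+1)(\alpha_{\mathbf{x}}w)^{\alpha_{\mathbf{x}}-1}\{\beta_{\mathbf{x}}(1-w)\}^{\beta_{\mathbf{x}}-1}}{\Gamma(\alpha_{\mathbf{x}})\Gamma(\beta_{\mathbf{x}})\{\alpha_{\mathbf{x}}w+\beta_{\mathbf{x}}(1-w)\}^{\alpha_{\mathbf{x}}+\beta_{\mathbf{x}}+1}},\quad w\in(0,1),$$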

with $\alpha:\mathcal{X}\subset\mathbb{R}^{q}\to(0,\infty)$ and $\beta:\mathcal{X}\subset\mathbb{R}^{q}\to(0,\infty)$ . In Figure 1 (middle) we consider the case $\alpha_{x}=\exp(x)$ and $\beta_{x}=x^{2}$ , with $x\in[0.9,3]$ . Note the different schemes of extremal dependence induced by the different values of the covariate $x$ as well as the asymmetry of the angular surface underlying this model.
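The Dirichlet angular density of Example 2 can be checked in the same spirit. The sketch below evaluates it on the log scale for numerical stability, at the illustrative covariate value $x=1.5$, and verifies that both components satisfy the moment constraint. The integration routine and the choice of $x$ are assumptions of this sketch.

```python
import math

def dirichlet_angular_density(w, alpha, beta):
    """Bivariate Dirichlet angular density h(w; alpha, beta) for 0 < w < 1."""
    log_h = (math.log(alpha) + math.log(beta)
             + math.lgamma(alpha + beta + 1.0)
             - math.lgamma(alpha) - math.lgamma(beta)
             + (alpha - 1.0) * math.log(alpha * w)
             + (beta - 1.0) * math.log(beta * (1.0 - w))
             - (alpha + beta + 1.0) * math.log(alpha * w + beta * (1.0 - w)))
    return math.exp(log_h)

def midpoint_integral(f, n=20000):
    """Midpoint rule on (0, 1); it never evaluates f at the endpoints."""
    step = 1.0 / n
    return step * sum(f((i + 0.5) * step) for i in range(n))

x = 1.5                                 # illustrative covariate value
alpha_x, beta_x = math.exp(x), x ** 2   # Example 2 specification

moment_1 = midpoint_integral(
    lambda w: w * dirichlet_angular_density(w, alpha_x, beta_x))
moment_2 = midpoint_integral(
    lambda w: (1.0 - w) * dirichlet_angular_density(w, alpha_x, beta_x))

# Asymmetry: the density is not symmetric about w = 1/2 when alpha != beta.
left = dirichlet_angular_density(0.3, alpha_x, beta_x)
right = dirichlet_angular_density(0.7, alpha_x, beta_x)
```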

Example 3 (Hüsler–Reiss angular surface).

Let

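$$h_{\mathbf{x}}(w)=\frac{\lambda_{\mathbf{x}}}{w(1-w)^{2}(2\pi)^{1/2}}\exp\bigg\{-\dfrac{\left[2+\lambda_{\mathbf{x}}^{2}\log\left\{w/(1-w)\right\}\right]^{2}}{8\lambda_{\mathbf{x}}^{2}}\bigg\},\quad w\in(0,1),$$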

where $\lambda:\mathcal{X}\subset\mathbb{R}^{q}\to(0,\infty)$ . In Figure 1 (right) we consider the case $\lambda_{x}=\exp(x)$ , with $x\in[0.1,2]$ . Under this specification, lower values of $x$ correspond to lower levels of extremal dependence, whereas higher values of $x$ correspond to higher levels of extremal dependence.

Example 4 (Pairwise beta angular surface).

Let

[TABLE]

where $\mathbf{w}=(w_{1},w_{2},w_{3})\in S_{3}$ and $\alpha,\beta_{i,j}:\mathcal{X}\subset\mathbb{R}^{q}\to(0,\infty)$ for $1\leq i<j\leq 3$ . In Figure 2, we consider the case $\alpha_{\mathbf{x}}=\exp\{\exp(x)\}$ , $\beta_{1,2_{\mathbf{x}}}=\exp(x)$ , $\beta_{1,3_{\mathbf{x}}}=x+1$ , and $\beta_{2,3_{\mathbf{x}}}=x+2$ , with $x\in[0.8,3.3]$ . For the different considered values of $x$ , different strengths of global and pairwise dependences can be observed. The mass is concentrated mostly at the center of the simplex due to a large global dependence parameter $\alpha_{\mathbf{x}}$ , compared to the pairwise dependence parameters.

The previous parametric models provide some examples of covariate-adjusted angular surfaces $h_{\mathbf{x}}$ . But, how can we learn about $h_{\mathbf{x}}$ from the data? Suppose we observe the regression data $\{(\mathbf{x}^{i},\mathbf{Y}^{i})\}_{i=1}^{n}$ , with $(\mathbf{x}^{i},\mathbf{Y}^{i})\in\mathcal{X}\times\mathbb{R}^{d}$ , and where we assume that $\mathbf{Y}^{i}=\left(Y^{i}_{1},\ldots,Y^{i}_{d}\right)^{\mathrm{\scriptscriptstyle T}}$ are independent random vectors with unit Fréchet marginal distributions. Using a similar approach as in Section 2.1, we convert the raw sample into a pseudo-sample of cardinality $n_{\mathbf{r}}$ ,

$$\{(\mathbf{x}^{i},\mathbf{Y}^{i}):\mathbf{Y}^{i}\in E_{\mathbf{r}}\},$$

and use the latter reindexed data to learn about $h_{\mathbf{x}}$ .

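Without loss of generality, we restrict ourselves to the bivariate extreme value framework ( $d=2$ ), so that

$$h_{\mathbf{x}}\bigg(\frac{Y_{1}^{i}}{\lVert\mathbf{Y}^{i}\rVert},\frac{Y_{2}^{i}}{\lVert\mathbf{Y}^{i}\rVert}\bigg)=h_{\mathbf{x}}(w_{i},1-w_{i})\equiv h_{\mathbf{x}}(w_{i}),\quad\text{for } w_{i}\in[0,1],\ i=1,\ldots,n_{\mathbf{r}},$$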

that is, the dimension of the angular observations $w_{i}$ is $M=d-1=1$ . We model $h_{\mathbf{x}}(\cdot)$ using $h(\cdot;\bm{\theta}_{\mathbf{x}})$ , where the parameter underlying the dependence structure

[TABLE]

is specified through a vector generalized additive model (VGAM) (Yee and Wild, 1996). Specifically, we model $h_{\mathbf{x}}(w)$ using a fixed family of parametric extremal dependence structures $h(w;\bm{\theta}_{\mathbf{x}})$ with a covariate-dependent set of parameters $\bm{\theta}_{\mathbf{x}}$ . To learn about $\bm{\theta}_{\mathbf{x}}$ from the pseudo-sample, we use a vector generalized additive model, which takes the form

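$$\bm{\eta}(\mathbf{x})\equiv\bm{\eta}=\mathbf{H}_{0}\bm{\beta}_{[0]}+\sum_{k=1}^{q}\mathbf{H}_{k}\mathbf{f}_{k}(\mathbf{x}_{k}).\tag{11}$$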

Here,

• $\bm{\eta}=\mathbf{g}\left(\bm{\theta}_{\mathbf{x}}\right)=\left(g_{1}(\theta_{1x^{1}}),\ldots,g_{1}(\theta_{1x^{n_{\mathbf{r}}}}),\ldots,g_{p}(\theta_{px^{1}}),\ldots,g_{p}(\theta_{px^{n_{\mathbf{r}}}})\right)^{\mathrm{\scriptscriptstyle T}}$ is the vector of predictors and $g_{l}$ is a link function that ensures that $\theta_{l\cdot}$ is well defined, for $l=1,\ldots,p$ ,

• $\bm{\beta}_{[0]}$ is a $pn_{\mathbf{r}}$-vector of intercepts, with $p$ distinct values each repeated $n_{\mathbf{r}}$ times,

• $\mathbf{x}_{k}=\left(x_{k}^{1},\ldots,x_{k}^{n_{\mathbf{r}}}\right)^{\mathrm{\scriptscriptstyle T}}\in\mathcal{X}_{k}^{{n_{\mathbf{r}}}}$ , for $k=1,\ldots,q$ ,

• $\mathbf{f}_{k}=(\mathbf{f}_{k,1},\ldots,\mathbf{f}_{k,p})^{\mathrm{\scriptscriptstyle T}}$ , where $\mathbf{f}_{k,l}=(f_{k,l}(x^{1}_{k}),\ldots,f_{k,l}(x^{n_{\mathbf{r}}}_{k}))^{\mathrm{\scriptscriptstyle T}},$ and $f_{k,l}:\mathcal{X}_{k}\rightarrow\mathbb{R}$ are smooth functions supported on $\mathcal{X}_{k}$ , for $k=1,\ldots,q$ and $l=1,\ldots,p$ , and

• $\mathbf{H}_{k}$ are $pn_{\mathbf{r}}\times pn_{\mathbf{r}}$ constraint matrices, for $k=0,\ldots,q$ .

The constraint matrices $\mathbf{H}_{k}$ are important quantities in the VGAM (11) that allow the tuning of the effects of the covariates on each of the $pn_{\mathbf{r}}$ components of $\bm{\eta}$ . For example, in Example 4, one might want to impose the same smooth effect of a covariate on each of the $\binom{3}{2}$ pairwise dependence parameters and at the same time restrict the effect of this covariate to be zero on the global dependence parameter. To avoid clutter in the notation, we assume from now on that $\mathbf{H}_{k}=\mathbf{I}_{pn_{\mathbf{r}}\times pn_{\mathbf{r}}}$ , for $k=0,\ldots,q$ .

The smooth functions $f_{k,l}$ are written as linear combinations of $B$ -spline basis functions

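$$f_{k,l}(x_{k}^{i})=\sum_{s=1}^{d_{k}}\beta_{[kl]_{s}}B_{s,\tilde{q}}(x_{k}^{i}),\quad k=1,\ldots,q,\quad l=1,\ldots,p,\quad i=1,\ldots,n_{\mathbf{r}},$$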

where $B_{s,\tilde{q}}$ is the $s$ th $B$ -spline of order $\tilde{q}$ and $d_{k}=\tilde{q}+m_{k}$ , with $m_{k}$ the number of internal equidistant knots for $\mathbf{x}_{k}$ (Yee, 2015, sec. 2.4.5). To ease the notational burden, we suppose without loss of generality that $d_{k}\equiv\tilde{d}$ , for $k=1,\ldots,q$ , and define

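$$\bm{\beta}_{[k]}=(\beta_{[k1]_{1}},\ldots,\beta_{[k1]_{\tilde{d}}},\ldots,\beta_{[kp]_{1}},\ldots,\beta_{[kp]_{\tilde{d}}})^{\mathrm{\scriptscriptstyle T}}\in\mathbb{R}^{\tilde{d}p}.$$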

Therefore, the VGAM (11), with identity constraint matrices $\mathbf{H}_{k}$ , can be written as

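$$\bm{\eta}=\bm{\beta}_{[0]}+\sum_{k=1}^{q}\mathbf{X}_{[k]}\bm{\beta}_{[k]}=\mathbf{X}_{\text{VAM}}\bm{\beta},\tag{12}$$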

where

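$$\begin{cases}\bm{\beta}=(\bm{\beta}_{[0]}\ \bm{\beta}_{[1]}\ \cdots\ \bm{\beta}_{[q]})^{\mathrm{\scriptscriptstyle T}}\in\mathbf{B}\subset\mathbb{R}^{p(1+q\tilde{d})},\\ \mathbf{X}_{\text{VAM}}=(\bm{1}_{pn_{\mathbf{r}}\times p}\ \mathbf{X}_{[1]}\ \cdots\ \mathbf{X}_{[q]})\in\mathbb{R}^{pn_{\mathbf{r}}\times\{p(1+q\tilde{d})\}}\end{cases}$$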

for some $pn_{\mathbf{r}}\times\tilde{d}p$ submatrices $\mathbf{X}_{[k]}$ , $k=1,\ldots,q$ . The vector of parameters to be estimated in the VGAM (12) is $\bm{\beta}$ .
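To make the B-spline representation of a single smooth term concrete, the following sketch builds the basis with the Cox-de Boor recursion and evaluates $f(x)=\sum_{s}\beta_{s}B_{s,\tilde{q}}(x)$. The clamped knot layout, the order, and the number of internal knots are assumptions made for this illustration; stacking the rows into the full matrix $\mathbf{X}_{\text{VAM}}$ for $p$ parameters and $q$ covariates is omitted.

```python
def clamped_knots(lower, upper, n_internal, order):
    """Boundary knots repeated `order` times plus equidistant internal knots."""
    step = (upper - lower) / (n_internal + 1)
    internal = [lower + (i + 1) * step for i in range(n_internal)]
    return [lower] * order + internal + [upper] * order

def bspline_basis(x, knots, order):
    """All B-spline basis functions of the given order at x (Cox-de Boor)."""
    n_intervals = len(knots) - 1
    # Order 1: indicators of the half-open knot intervals [t_s, t_{s+1}).
    basis = [1.0 if knots[s] <= x < knots[s + 1] else 0.0
             for s in range(n_intervals)]
    if x == knots[-1]:
        # Close the last non-empty interval so the upper boundary is covered.
        last = max(s for s in range(n_intervals) if knots[s] < knots[s + 1])
        basis[last] = 1.0
    for k in range(2, order + 1):
        new = []
        for s in range(len(knots) - k):
            left_den = knots[s + k - 1] - knots[s]
            right_den = knots[s + k] - knots[s + 1]
            # Terms with a zero denominator are set to zero by convention.
            left = (x - knots[s]) / left_den * basis[s] if left_den > 0 else 0.0
            right = ((knots[s + k] - x) / right_den * basis[s + 1]
                     if right_den > 0 else 0.0)
            new.append(left + right)
        basis = new
    return basis  # one design-matrix row of length len(knots) - order

def smooth(x, coefficients, knots, order):
    """Evaluate f(x) = sum_s beta_s B_s(x)."""
    return sum(b * c for b, c in zip(bspline_basis(x, knots, order), coefficients))

order, n_internal = 4, 3   # cubic splines with three internal knots (assumed)
knots = clamped_knots(0.1, 2.0, n_internal, order)
row = bspline_basis(1.0, knots, order)
```

The row has `order + n_internal` entries, matching $d_{k}=\tilde{q}+m_{k}$, and its entries sum to one on the covariate range, so a constant coefficient vector reproduces a constant function.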

The specification in (12) makes it possible to simultaneously fit ordinary generalized additive models (Wood, 2017) in each component of the vector of parameters $\bm{\theta}_{\mathbf{x}}$ , hence avoiding any non-orthogonality-related issues that could arise if the $p$ components were to be treated separately (Chavez-Demoulin and Davison, 2005). Finally, if the dimension $M$ of the response vector of angular observations $w_{i}$ is greater than one ( $d>2$ ), then the vector of predictors $\bm{\eta}$ will instead be an $Mpn_{\mathbf{r}}$-vector and the dimensions of the related quantities in (12) will change accordingly.

To give the unfamiliar reader insight on some of the quantities introduced above, we identify these quantities in the examples mentioned previously:

• In Examples 1 and 3, $d=2$ , $M=1$ , $p=1$ , $q=1$ , and $\mathcal{X}=[0.1,2]$ . The difference between the VGAMs modeled in these two examples resides in the form of dependence of $\eta$ on $x$ and the link function $g$ . In Example 1, the parameter $\theta_{x}\in(0,1]$ , $\eta=x^{2}-0.5x-1$ , and the link function $g$ is the logit function, whereas in Example 3 the parameter $\theta_{x}\in(0,\infty)$ , $\eta=x$ , and the link function $g$ is the logarithm function.

• In Example 2, $d=2$ , $M=1$ , $p=2$ , $q=1$ , $\mathcal{X}=[0.9,3]$ , and $\bm{\eta}=(x,x)^{\mathrm{\scriptscriptstyle T}}$ . The vector of parameters for the bivariate Dirichlet angular density $\bm{\theta}_{x}\in(0,\infty)^{2}$ and the link functions $g_{1}$ and $g_{2}$ are the logarithm and the square root functions, respectively.

• In Example 4, $d=3$ , $M=2$ , $p=4$ , $q=1$ , $\mathcal{X}=[0.8,3.3]$ , and $\bm{\eta}=(\exp(x),x,\log(x+1),\log(x+2))^{\mathrm{\scriptscriptstyle T}}$ . The vector of parameters for the pairwise beta angular density $\bm{\theta}_{x}\in(0,\infty)^{4}$ and the link function $g_{l}$ is the logarithm function, for $l=1,\ldots,4$ .
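The mapping from predictors to parameters in the list above can be sketched as follows: each component of $\bm{\eta}$ is passed through the inverse of its link function. The dictionary of inverse links and the helper `parameters` are names introduced for this illustration.

```python
import math

# Inverse link functions g^{-1}, keyed by the name of the link g.
INVERSE_LINKS = {
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "log": math.exp,
    "sqrt": lambda eta: eta ** 2,   # inverse of the square root link, eta >= 0
}

def parameters(eta, links):
    """Componentwise theta = g^{-1}(eta)."""
    return [INVERSE_LINKS[name](value) for value, name in zip(eta, links)]

x = 1.5  # illustrative covariate value inside every example's range

example_1 = parameters([x ** 2 - 0.5 * x - 1.0], ["logit"])   # alpha_x in (0, 1)
example_2 = parameters([x, x], ["log", "sqrt"])               # (exp(x), x^2)
example_3 = parameters([x], ["log"])                          # lambda_x = exp(x)
example_4 = parameters(
    [math.exp(x), x, math.log(x + 1.0), math.log(x + 2.0)],
    ["log"] * 4,
)  # (exp{exp(x)}, exp(x), x + 1, x + 2)
```

Each result reproduces the parameter specification stated in the corresponding example, which is a quick way to confirm that a chosen link and predictor pair is consistent.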

3 Inference and Asymptotic Properties

The log-likelihood derived at the end of Section 2.1, with a covariate-dependent vector of parameters $\bm{\theta}_{\mathbf{x}}$ , is now written as

[TABLE]

where $\mathbf{g}^{-1}$ is the componentwise inverse of $\mathbf{g}$ .

Incorporating a covariate-dependence in the extremal dependence model through a non-linear smooth model adds considerable flexibility in the modeling of the dependence parameter $\bm{\theta}_{\mathbf{x}}$ . The price to pay for this flexibility is reflected in the estimation procedure. The estimation of $\bm{\theta}_{\mathbf{x}}$ , hence of $\bm{\beta}$ , is performed by maximizing the penalized log-likelihood

[TABLE]

where the penalty term can be written as

[TABLE]

with $\mathbf{P}(\bm{\gamma})$ a $p(1+q\tilde{d})\times p(1+q\tilde{d})$ block matrix with a first $p\times p$ block filled with zeros and $q$ blocks, each formed by a $p\tilde{d}\times p\tilde{d}$ matrix $\mathbf{P}_{k}$ that depends only on the knots of the $B$ -spline functions for the covariate $\mathbf{x}_{k}$ . The matrix $\mathbf{P}(\bm{\gamma})$ can be written as $\mathbf{P}(\bm{\gamma})=\tilde{\mathbf{X}}^{\mathrm{\scriptscriptstyle T}}\tilde{\mathbf{X}}$ for some $p(1+q\tilde{d})\times p(1+q\tilde{d})$ real matrix $\tilde{\mathbf{X}}$ . The vectors $\bm{\beta}_{[k]}$ are defined in (12), and $\gamma_{(l)k}$ are termed the smoothing parameters.

The penalty term in (13) controls the wiggliness and the fidelity to the data of the component functions in (11) through the vector $\bm{\gamma}$ of the smoothing parameters $\gamma_{(l)k}$ for $l=1,\ldots,p$ and $k=1,\ldots,q$ . Larger values of $\gamma_{(l)k}$ lead to smoother effects of the covariate $\mathbf{x}_{k}$ on the $l$ th component of $\bm{\eta}$ .

The maximization of the penalized log-likelihood (13) is based on a Newton–Raphson (N–R) algorithm. At each step of the N–R algorithm, a set of smoothing parameters is proposed by outer iteration (Wood, 2017), and a penalized iterative reweighted least squares (PIRLS) algorithm is performed, in an inner iteration, to update the model coefficients estimates. We detail the inner fitting procedure in the following section and the outer iteration in Section 3.2.

3.1 Fitting Algorithm

We suppose that the penalized log-likelihood (13) depends only on the $p(1+q\tilde{d})-$ vector $\bm{\beta}$ and that the vector of smoothing parameters $\gamma$ is proposed (at each iteration of the N–R algorithm) by outer iteration and is therefore fixed in what follows.

The penalized maximum log-likelihood estimator (PMLE) $\hat{\bm{\beta}}$ satisfies the following score equation

[TABLE]

where $\mathbf{u(\bm{\beta})}=\partial\ell(\bm{\beta})/\partial\bm{\eta}\in\mathbb{R}^{pn_{\mathbf{r}}}$ and $\mathbf{X}_{\text{VAM}}$ is as defined in (12). To obtain $\hat{\bm{\beta}}$ , we update $\bm{\beta}^{(a-1)}$ , the $(a-1)$ th estimate of the true $\bm{\beta}_{0}$ , by Newton–Raphson:

[TABLE]

where

[TABLE]

The matrix $\mathbf{W}(\bm{\beta}^{(a-1)})$ is termed the working weight matrix. If the expectation ${\rm{E}}\{\partial^{2}\ell(\bm{\beta})/\partial\bm{\eta}\partial\bm{\eta}^{\mathrm{\scriptscriptstyle T}}\}$ is obtainable, a Fisher scoring algorithm is then preferred, as it ensures the positive definiteness of $\mathbf{W}$ over a larger region of the parameter space $\mathbf{B}$ than in the N–R algorithm. When the working weight matrix is not positive definite, which might happen when the parameter $\bm{\beta}^{(a-1)}$ is far from the true $\bm{\beta}_{0}$ , a Greenstadt (Greenstadt, 1967) modification is applied, and the negative eigenvalues of $\mathbf{W}(\bm{\beta}^{(a-1)})$ are replaced by their absolute values. With the different families of angular densities considered in Examples 1–4, the expected information matrix is not obtainable and is hence replaced by the observed information matrix on which a Greenstadt modification is applied whenever needed. See Yee (2015, Section 9.2) for other remedies and techniques for deriving well-defined working weight matrices.
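The Greenstadt modification described above can be sketched with an eigendecomposition: negative eigenvalues of a symmetric working weight matrix are replaced by their absolute values. The function name `greenstadt` and the small test matrix are illustrative assumptions; this is not the VGAM implementation.

```python
import numpy as np

def greenstadt(W):
    # Symmetrize against round-off, then flip the sign of negative eigenvalues.
    W = 0.5 * (W + W.T)
    eigval, eigvec = np.linalg.eigh(W)
    return eigvec @ np.diag(np.abs(eigval)) @ eigvec.T

# An indefinite 2 x 2 matrix (one positive and one negative eigenvalue).
W = np.array([[2.0, 3.0],
              [3.0, 1.0]])
W_mod = greenstadt(W)
```

Note that eigenvalues equal to zero are left unchanged by this rule, so a singular matrix stays singular and would need a different remedy.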

Let $\mathbf{z}^{(a-1)}:=\mathbf{X}_{\text{VAM}}\bm{\beta}^{(a-1)}+\mathbf{W}(\bm{\beta}^{(a-1)})^{-1}\mathbf{u}(\bm{\beta}^{(a-1)})$ be the $pn_{\mathbf{r}}-$ vector of working responses. Then, (14) can be rewritten in a PIRLS form as

[TABLE]

where $\mathbf{X}_{\text{PVAM}}$ , $\mathbf{y}^{(a-1)}$ , and $\tilde{\mathbf{W}}^{(a-1)}$ are augmented versions of $\mathbf{X}_{\text{VAM}}$ , $\mathbf{z}^{(a-1)}$ and $\mathbf{W}(\bm{\beta}^{(a-1)})$ , respectively, and are defined as

[TABLE]

The algorithm stops when the change in the coefficients $\bm{\beta}$ between two successive iterations is sufficiently small. Convergence of the N–R algorithm is not guaranteed and might not occur if the quadratic approximation of $\ell(\bm{\beta},\mbox{\boldmath$ \gamma $})$ around $\hat{\bm{\beta}}$ is poor. See Yee (2015, 2016) for more details.
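A single inner update can be sketched as a penalized weighted least squares solve. The code below assumes the standard forms: the update $(\mathbf{X}^{\mathrm{T}}\mathbf{W}\mathbf{X}+\mathbf{P})^{-1}\mathbf{X}^{\mathrm{T}}\mathbf{W}\mathbf{z}$, and an augmentation that stacks $\tilde{\mathbf{X}}$ under the design matrix, zeros under the working responses, and an identity block under the weights. These are assumed to correspond to the displayed PIRLS expressions, which are not reproduced here; all function names are hypothetical.

```python
import numpy as np

def pirls_step(X, W, u, P, beta):
    # Working responses z = X beta + W^{-1} u, then one penalized WLS solve.
    z = X @ beta + np.linalg.solve(W, u)
    return np.linalg.solve(X.T @ W @ X + P, X.T @ W @ z)

def pirls_step_augmented(X, W, u, X_tilde, beta):
    # Same update through augmented design, response and weights,
    # using P = X_tilde^T X_tilde.
    n, m = X.shape[0], X_tilde.shape[0]
    z = X @ beta + np.linalg.solve(W, u)
    X_aug = np.vstack([X, X_tilde])
    y_aug = np.concatenate([z, np.zeros(m)])
    W_aug = np.block([[W, np.zeros((n, m))],
                      [np.zeros((m, n)), np.eye(m)]])
    return np.linalg.solve(X_aug.T @ W_aug @ X_aug, X_aug.T @ W_aug @ y_aug)

# Toy inputs (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
W = np.diag(rng.uniform(0.5, 2.0, size=20))
u = rng.normal(size=20)
X_tilde = rng.normal(size=(4, 4))
beta_old = rng.normal(size=4)
beta_new = pirls_step(X, W, u, X_tilde.T @ X_tilde, beta_old)
```

Both functions return the same vector, which also equals the Newton step $\bm{\beta}+(\mathbf{X}^{\mathrm{T}}\mathbf{W}\mathbf{X}+\mathbf{P})^{-1}\{\mathbf{X}^{\mathrm{T}}\mathbf{u}-\mathbf{P}\bm{\beta}\}$; the augmented form is convenient because it reduces the penalized problem to an ordinary weighted least squares fit.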

The plug-in penalized maximum log-likelihood estimator of the covariate-dependent angular density is defined as

[TABLE]

In the following section, we give details about the selection of the smoothing parameters $\gamma$ , which is outer to the PIRLS algorithm.

3.2 Selection of the Smoothing Parameters

To implement the PIRLS algorithm performed at each iteration of the N–R algorithm, a smoothing parameter selection procedure is conducted by minimizing a prediction error estimate given by the generalized cross validation (GCV) score.

Let $\mathbf{A}^{(a-1)}(\mbox{\boldmath$ \gamma $})$ be the influence matrix of the fitting problem at the $a$ th iteration, defined as

[TABLE]

Then, by minimizing the GCV score

[TABLE]

we aim at balancing between goodness of fit and complexity of the model, which is measured by the trace of the influence matrix and termed the effective degrees of freedom (EDF). The EDF of the fitted VGAM (12) are defined as the EDF obtained at convergence, that is, ${\rm{trace}}\left\{\mathbf{A}^{(c-1)}(\mbox{\boldmath$ \gamma $})\right\},$ where $c$ is the iteration at which convergence occurs.
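The trade-off between fit and complexity can be illustrated with a small grid search. The sketch assumes the usual GCV form $n\,\|\mathbf{W}^{1/2}(\mathbf{z}-\mathbf{A}\mathbf{z})\|^{2}/\{n-{\rm trace}(\mathbf{A})\}^{2}$ discussed in Wood (2017) and a single smoothing parameter; the exact score used in the paper may differ in its details, and the toy data and function names are hypothetical.

```python
import numpy as np

def influence_matrix(X, W, P):
    # A = X (X^T W X + P)^{-1} X^T W
    return X @ np.linalg.solve(X.T @ W @ X + P, X.T @ W)

def gcv_score(X, W, z, P):
    # Returns the GCV score and the effective degrees of freedom trace(A).
    n = X.shape[0]
    A = influence_matrix(X, W, P)
    edf = float(np.trace(A))
    resid = z - A @ z
    return n * float(resid @ W @ resid) / (n - edf) ** 2, edf

# Toy fitting problem with a second-order difference penalty.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
W = np.eye(40)
z = X @ np.linspace(-1.0, 1.0, 6) + rng.normal(scale=0.5, size=40)
D = np.diff(np.eye(6), n=2, axis=0)
S = D.T @ D

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = [gcv_score(X, W, z, g * S)[0] for g in grid]
gamma_best = grid[int(np.argmin(scores))]
```

With a zero penalty the EDF equals the number of columns of the design matrix, and it decreases as the smoothing parameter grows.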

Both the fitting algorithm of Section 3.1 and the smoothing parameter selection are implemented in the R package VGAM (Yee, 2017), with the latter relying on the R package mgcv (Wood, 2017).

Model selection between different, not necessarily nested, fitted VGAMs is performed based on the Akaike information criterion (AIC), where the number of parameters of the model is replaced by its EDF to account for penalization. More details on the (conditional) AIC for models with smoothers along with a corrected version of this criterion, which takes into account the smoothing parameter uncertainty, can be found in Wood (2017, sec. 6.11).
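As a small worked example of this criterion, the sketch below computes $-2\ell+2\,\mathrm{EDF}$ for two candidate fits and picks the smaller value. The log-likelihoods and EDF values are made-up numbers used only to show the arithmetic, and the corrected AIC of Wood (2017) is not implemented.

```python
def aic_edf(loglik, edf):
    # AIC with the parameter count replaced by the effective degrees of freedom.
    return -2.0 * loglik + 2.0 * edf

# Hypothetical (log-likelihood, EDF) pairs for two non-nested fits.
candidates = {
    "logistic": (-120.0, 3.2),
    "dirichlet": (-112.0, 6.8),
}
aic = {name: aic_edf(ll, edf) for name, (ll, edf) in candidates.items()}
best = min(aic, key=aic.get)
```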

3.3 Large Sample Properties

In this section we derive the consistency and asymptotic normality of the PMLE $\hat{\bm{\beta}}$ defined in Section 3.1.

Based on the penalized log-likelihood (13), $\hat{\bm{\beta}}$ satisfies the following score equation

[TABLE]

where $\mathbf{m}(\bm{\beta})=\partial\ell(\bm{\beta})/\partial\bm{\beta}$ .

Let $\mathbf{B}_{0}$ be an open neighborhood around the true parameter $\bm{\beta}_{0}$ . Moreover, we define $\mathbf{m}(\mathbf{y},\bm{\beta})=\partial\ell(\mathbf{y};\bm{\beta})/\partial\bm{\beta}$ .

Our asymptotic results hold under the following customary assumptions:

($\text{A}_{1}$) $\mbox{\boldmath$ \gamma $}=\begin{pmatrix}\gamma_{(1)1}&\cdots&\gamma_{(p)1}&\cdots&\gamma_{(1)q}&\cdots&\gamma_{(p)q}\end{pmatrix}^{\mathrm{\scriptscriptstyle T}}=o(n_{\mathbf{r}}^{-1/2})\mathbf{1}_{pq}$ .

($\text{A}_{2}$) Regularity conditions:

• If $\bm{\beta}\neq\tilde{\bm{\beta}}$ , then $\ell(\mathbf{y};\bm{\beta})\neq\ell(\mathbf{y};\tilde{\bm{\beta}})$ , with $\bm{\beta},\tilde{\bm{\beta}}\in\mathbf{B}$ . Moreover, $\rm{E}\{{\rm{sup}}_{\bm{\beta}\in\mathbf{B}}|\ell(\mathbf{Y};\bm{\beta})|\}<\infty$ .

• The true parameter $\bm{\beta}_{0}$ is in the interior of $\mathbf{B}$ .

• For $\mathbf{y}\in(0,\infty)^{d}$ , $\ell(\mathbf{y};\bm{\beta})\in C^{3}(\mathbf{B}_{0})$ .

• $\int{\rm{sup}}_{\bm{\beta}\in\mathbf{B}_{0}}\|\mathbf{m}(\mathbf{y},\bm{\beta})\|\,\mathrm{d}\mathbf{y}<\infty$ and $\int{\rm{sup}}_{\bm{\beta}\in\mathbf{B}_{0}}\|\partial\mathbf{m}(\mathbf{y},\bm{\beta})/\partial\bm{\beta}^{\mathrm{\scriptscriptstyle T}}\|\mathrm{d}\mathbf{y}<\infty$ .

• For $\bm{\beta}\in\mathbf{B}_{0}$ , $\mathbf{i}(\bm{\beta}):={\rm{cov}}\{\mathbf{m}(\mathbf{Y},\bm{\beta})\}=\mathbf{X}_{\text{VAM}}^{\mathrm{\scriptscriptstyle T}}\mathbf{W}(\bm{\beta})\mathbf{X}_{\text{VAM}}$ exists and is positive-definite.

• For each triplet $1\leq q,r,s\leq p(1+q\tilde{d})$ , there exists a function $M_{qrs}:(0,\infty)^{d}\rightarrow\mathbb{R}$ such that, for $\mathbf{y}\in(0,\infty)^{d}$ and $\bm{\beta}\in\mathbf{B}_{0}$ , $|\partial^{3}\ell(\mathbf{y};\bm{\beta})/\partial\bm{\beta}_{qrs}|\leq M_{qrs}(\mathbf{y})$ , and ${\rm{E}}\left\{M_{qrs}(\mathbf{Y})\right\}<\infty$ .

The next theorem characterizes the large sample behavior of our estimator.

Theorem 1.

Under $A_{1}$ and $A_{2}$ , it follows that as $n_{\mathbf{r}}\to\infty$ :

1. $\|\hat{\bm{\beta}}-\bm{\beta}_{0}\|=O_{p}(n_{\mathbf{r}}^{-1/2})$ .

2. $\sqrt{n_{\mathbf{r}}}(\hat{\bm{\beta}}-\bm{\beta}_{0})\overset{\mathrm{d}}{\rightarrow}N(\mathbf{0},\mathbf{i}(\bm{\beta}_{0})^{-1})$ .

These results are derived from a second-order Taylor expansion of the score equation (16) around the true parameter $\bm{\beta}_{0}$ along the same lines as in Vatter and Chavez-Demoulin (2015) and Davison (2003, p. 147). Similar results on the large sample behavior of the corresponding plug-in estimator (15) can be derived using the multivariate delta method. These results are useful to derive and construct approximate confidence intervals for conditional angular densities and to compare nested models based on likelihood ratio tests. Our proviso is similar to that of de Carvalho and Davison (2014) in the sense that asymptotic properties of the estimator $\hat{\bm{\beta}}$ are derived under the assumption of known margins and we sample from the limiting object $h_{\textbf{x}}$ , whereas in practice only a sample of (estimated) pseudo-angles, $\{\widehat{\mathbf{W}}_{i}\}_{i=1}^{n_{\mathbf{r}}}$ , would be available. Asymptotic properties under misspecification of the parametric model set for $h_{\textbf{x}}$ could in principle be derived under additional assumptions on $\bm{\beta}$ and $\mathbf{m}$ , along the same lines as in standard likelihood theory (Knight, 2000). The resulting theory is outside the scope of this work and is deliberately not studied here.
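To show how asymptotic normality translates into approximate confidence intervals, the sketch below treats a single dependence parameter $\alpha\in(0,1)$ modelled through a logit link. The link choice, the estimate, and the standard error are assumptions made for illustration; in practice the standard error would come from the estimated inverse information.

```python
import math

def inverse_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def wald_interval_alpha(eta_hat, se_eta, z=1.959963984540054):
    # 95% Wald interval on the predictor scale, mapped back to alpha.
    return inverse_logit(eta_hat - z * se_eta), inverse_logit(eta_hat + z * se_eta)

def delta_method_se(eta_hat, se_eta):
    # d(alpha)/d(eta) = alpha (1 - alpha) for the logit link.
    alpha = inverse_logit(eta_hat)
    return alpha * (1.0 - alpha) * se_eta

eta_hat, se_eta = 0.2, 0.15  # hypothetical estimate and standard error
lower, upper = wald_interval_alpha(eta_hat, se_eta)
se_alpha = delta_method_se(eta_hat, se_eta)
```

Transforming the interval endpoints keeps the interval inside $(0,1)$, whereas the delta-method standard error gives a symmetric interval on the $\alpha$ scale; the two agree closely when the standard error is small.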

4 Simulation Study

4.1 Data Generating Processes and Preliminary Experiments

We assess the performance of our methods using the bivariate extremal dependence structures presented in Section 2.2—and displayed in Figure 1—as well as the trivariate pairwise beta dependence model from Example 4—depicted in Figure 2. Monte Carlo evidence will be reported in Section 4.2 and in the Supplementary Materials. For now, we concentrate on illustrating the methods over a single-run experiment on these scenarios. For each dependence model from Examples 1–3, we draw a sample $\{(Y^{i}_{1},Y^{i}_{2})\}_{i=1}^{n}$ from the corresponding bivariate extreme value distribution $G_{x}$ with sample size $n=6000$ and where each observation $(Y^{i}_{1},Y^{i}_{2})$ has unit Fréchet margins and is drawn from the chosen dependence model conditional on a fixed value $x^{i}$ of the covariate $x$ . For estimating $h_{x}$ , we only consider the observations with a radial component exceeding its 95% quantile, and we end up with $n_{\mathbf{r}}=300$ extreme (angular) observations. To gain insight into the bias and variance of our covariate-adjusted spectral density estimator, we compute its 95% asymptotic confidence bands based on Theorem 1 and at different values of $w\in(0,1)$ . There are two possible sources of bias in our estimation procedure. First, the limiting extremal dependence structure is estimated at a sub-asymptotic level, i.e., based on angular observations exceeding a finite diagonal threshold level. Then, the penalization of the model likelihood causes a smoothing bias (Wood, 2017) if the smoothing parameters do not vanish at a certain rate (see Section 3.3). The uncertainty due to the choice of the parametric model is deliberately not taken into account, that is, the simulations are performed in a well-specified framework.
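The thresholding step used to obtain the angular observations can be sketched as follows. For simplicity the code draws independent unit Fréchet pairs rather than sampling from $G_{x}$, so it illustrates only the passage from $n=6000$ observations to the $n_{\mathbf{r}}=300$ angles whose radius exceeds the empirical 95% quantile; the function names are hypothetical.

```python
import math
import random

def unit_frechet(rng):
    # Inverse-transform sampling: if U ~ Uniform(0, 1), then -1/log(U) is unit Frechet.
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -1.0 / math.log(u)

def extreme_angles(pairs, prob=0.95):
    # Radius R = Y1 + Y2 and angle W = Y1 / R; keep angles with R above its quantile.
    radii = [y1 + y2 for y1, y2 in pairs]
    threshold = sorted(radii)[round(prob * len(radii)) - 1]
    return [y1 / (y1 + y2) for (y1, y2), r in zip(pairs, radii) if r > threshold]

rng = random.Random(1)
pairs = [(unit_frechet(rng), unit_frechet(rng)) for _ in range(6000)]
angles = extreme_angles(pairs)
```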

Figure 3 displays the estimates of the covariate-adjusted spectral densities from Examples 1, 2, and 3 for various fixed values of the covariate $x$ that induce different extremal dependence strengths. All panels show that for the different extremal dependence schemes (strength and asymmetry), the covariate-adjusted spectral densities are accurately estimated and the true curves fall well within the 95% confidence bands. A systematic slight upward bias is observed when approaching extremal independence. This is due to the residual dependence in the data that we observe at finite threshold levels but that should vanish at an asymptotic level. This issue can be corrected either by taking higher threshold levels or considering angular observations simulated from the true spectral density. Finally, the estimates in the Dirichlet case seem to be a bit more biased, and this might be explained by the fact that both of the two non-orthogonal parameters of the model depend smoothly on the covariate $x$ .

We now consider the case of the trivariate pairwise beta dependence model from Example 4. The construction of the pairwise beta covariate-adjusted spectral density—which extends Cooley et al. (2010)—is such that the corresponding multivariate extreme value distribution cannot be computed in closed form. Hence, we draw a sample $\left\{\left(w_{i,1},w_{i,2},w_{i,3}\right)\right\}_{i=1}^{n_{\mathbf{r}}}$ with sample size $n_{\mathbf{r}}=300$ where each observation $\left(w_{i,1},w_{i,2},w_{i,3}\right)$ is drawn from the pairwise beta model conditional on a fixed value $x_{i}$ of the covariate $x$ , as illustrated in Figure 2. Figure 4 displays the contour plots of the estimates of the covariate-adjusted spectral density from Example 4 at three fixed values of $x$ .

All panels in Figure 4 show that, for the different extremal dependence schemes, i.e., for the different considered values of $x$ , the contour plots of the estimates are remarkably close to the actual contour plots. The estimates are slightly more biased near the edges of the simplex than in the center, reflecting a better estimation of the global dependence parameter compared to the pairwise dependence parameters.

4.2 Monte Carlo Evidence

A Monte Carlo study was conducted by simulating $500$ samples of sizes $6000$ and $10000$ , that is, $n_{\mathbf{r}}=300$ and $n_{\mathbf{r}}=500$ extreme (angular) observations, respectively. As can be seen from Figures 1 and 2 in the Supplementary Materials, our method successfully recovers the corresponding target covariate-adjusted angular densities with a high level of precision over the simulation study. In what follows we focus on documenting how the level of accuracy increases when the number of observations increases by assessing the mean integrated absolute error (MIAE)—which for the bivariate case can be written as

[TABLE]

The results are reported in Table 1.

As expected, an increase in the number of angular observations leads to a reduction of MIAE. Evidence from Table 1 should be supplemented with Figures 1 and 2 in the Supplementary Materials. The latter offer a more granular level of detail than that of Table 1 on the behavior of the estimator over specific values of the covariate and of the unit simplex.
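The error measure can be approximated numerically as sketched below for the logistic family. The sketch assumes that the integrated absolute error is the double integral of $|\hat{h}_{x}(w)-h_{x}(w)|$ over $w\in(0,1)$ and over the covariate range, averaged across Monte Carlo replicates, and it uses the bivariate logistic angular density normalised to integrate to one. Both the normalisation and the function names are assumptions.

```python
def logistic_angular_density(w, alpha):
    # Bivariate logistic angular density on (0, 1), 0 < alpha < 1, total mass 1.
    a = 1.0 / alpha
    return (0.5 * (a - 1.0) * (w * (1.0 - w)) ** (-1.0 - a)
            * (w ** (-a) + (1.0 - w) ** (-a)) ** (alpha - 2.0))

def integrated_absolute_error(alpha_hat, alpha_true, x_lo, x_hi, n_x=50, n_w=400):
    # Midpoint rule in both x and w; alpha_hat and alpha_true are functions of x.
    dx = (x_hi - x_lo) / n_x
    dw = 1.0 / n_w
    total = 0.0
    for i in range(n_x):
        x = x_lo + (i + 0.5) * dx
        for j in range(n_w):
            w = (j + 0.5) * dw
            total += abs(logistic_angular_density(w, alpha_hat(x))
                         - logistic_angular_density(w, alpha_true(x)))
    return total * dx * dw

def miae(errors):
    # Average of the integrated absolute errors over Monte Carlo replicates.
    return sum(errors) / len(errors)
```

For instance, `integrated_absolute_error(lambda x: 0.45 + 0.1 * x, lambda x: 0.5, 0.0, 1.0)` compares a hypothetical fitted curve with a constant true parameter.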

5 Extreme Temperature Analysis

5.1 Data Description, Motivation for the Analysis, and Preprocessing

In this section, we describe an application to modeling the dependence between extreme winter (December–January–February) air temperatures at two sites in the Swiss Alps: Montana, at an elevation of $1427$ m, and Zermatt, at an elevation of $1638$ m. The sites are approximately $37$ km apart.

In the Alpine regions of Switzerland, there is an obvious motivation to focus on extreme climatic events, as their impact on the local population and infrastructure can be very costly. As stated by Beniston (2007), warm winter spells, that is, periods with strong positive temperature exceedances in winter, can exert significant impacts on the natural ecosystems, agriculture, and water supply:

“Temperatures persistently above $0^{\circ}$ C will result in early snow-melt and a shorter seasonal snow cover, early water runoff into river basins, an early start of the vegetation cycle, reduced income for alpine ski resorts and changes in hydro-power supply because of seasonal shifts in the filling of dams (Beniston, 2004).”

In this analysis, we are interested in the dynamics of the dependence between extreme air temperatures in Montana and Zermatt during the winter season. The dynamics of both extreme high and extreme low winter temperatures in these two sites will be assessed and linked to the following explanatory factors: time (in years) ( $t$ ), day within season ( $d$ ), and the NAO (North Atlantic Oscillation) index ( $z$ ); the latter is a normalized pressure difference between Iceland and the Azores that is known to have a major direct influence on the alpine region temperatures, especially during winter (Beniston, 2005). The choice of the studied sites is of great importance in this analysis. Beniston and Rebetez (1996) showed that both cold and warm winters exhibit temperature anomalies that are altitude-dependent, with high-elevation resorts being more representative of free atmospheric conditions and less likely to be contaminated by urban effects. Therefore, to study the “pure” effect of the above-mentioned explanatory covariates on the winter temperature extremal dependence, we choose the two high elevation sites Montana and Zermatt.

The data consist of daily winter temperature minima and maxima measured at $2$ m above ground surface and were obtained from the MeteoSwiss website (www.meteoswiss.admin.ch). The data were available from 1981 to 2016, giving a total of $3190$ winter observations per site. Daily NAO index measurements were obtained from the NOAA (National Centers for Environmental Information), at https://www.ngdc.noaa.gov/ftp.html.

We first transform the minimum temperature data by multiplication by $-1$ and then fit at each site—and to both daily minimum and maximum temperatures—a Generalized Pareto Distribution (GPD) (Coles, 2001, ch. 4)

[TABLE]

to model events above the $95\%$ quantile $u_{95}$ for each of the four temperature time series. In (17), $\sigma>0$ is the scale parameter that depends on $u_{95}$ , and $-\infty<\xi<\infty$ is the shape parameter. As is common with temperature data analysis, we test the effect of time $t$ on the behavior of the threshold exceedances by allowing the scale parameter of the GPD (17) to smoothly vary with $t$ (Chavez-Demoulin and Davison, 2005). Based on the likelihood ratio tests, a model with a non-stationary scale parameter is preferred only in Zermatt for the threshold exceedances of the daily minimum temperatures ( $p$ -value $\approx 0.022$ ). Graphical goodness-of-fit tests for the four GPD models are conducted by comparing the distribution of a test statistic $S$ with the unit exponential distribution (if $Y\sim G_{\sigma,\xi}$ , then $S=-\ln\{1-G_{\sigma,\xi}(Y)\}$ is unit exponentially distributed). Figure 5 displays the resulting qq-plots and confirms the validity of these models.
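The goodness-of-fit transformation can be written out in a few lines. The sketch uses the standard GPD distribution function $1-(1+\xi y/\sigma)^{-1/\xi}$ for an exceedance $y\geq 0$, with the exponential limit as $\xi\to 0$; the parameter values are arbitrary and the function names are hypothetical.

```python
import math

def gpd_cdf(y, sigma, xi):
    # GPD distribution function for an exceedance y >= 0.
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-y / sigma)
    t = 1.0 + xi * y / sigma
    if t <= 0.0:
        return 1.0  # beyond the upper endpoint, which exists when xi < 0
    return 1.0 - t ** (-1.0 / xi)

def exponential_residual(y, sigma, xi):
    # S = -log{1 - G(y)}, computed stably; unit exponential if Y follows the GPD.
    # math.log1p raises ValueError if y lies outside the GPD support.
    if abs(xi) < 1e-12:
        return y / sigma
    return math.log1p(xi * y / sigma) / xi

sigma, xi = 1.5, 0.2  # arbitrary illustrative values
residual = exponential_residual(2.0, sigma, xi)
```

Sorting the residuals of all exceedances and plotting them against unit exponential quantiles gives the kind of qq-plot described in the text.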

The fitted models are then used to transform the data to a common unit Fréchet scale by the probability integral transform, with the empirical distribution function used below $u_{95}$ and the fitted GPD above it. This results in two datasets of bivariate observations (in Montana and Zermatt) with unit Fréchet margins: one for the daily maximum temperatures and the other for the daily minimum temperatures.
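A minimal sketch of this marginal transformation is given below for a stationary GPD fit. It assumes the usual semiparametric estimate of the distribution function (ranks divided by $n+1$ at or below the threshold, a GPD tail scaled by the exceedance probability above it) followed by the map $-1/\log F(y)$; the helper name and the toy inputs are hypothetical.

```python
import bisect
import math

def to_unit_frechet(sample, u, sigma, xi, p_u=0.95):
    ordered = sorted(sample)
    n = len(ordered)
    transformed = []
    for y in sample:
        if y > u:
            # GPD tail above the threshold (assumes y lies inside the GPD support)
            if abs(xi) < 1e-12:
                tail = math.exp(-(y - u) / sigma)
            else:
                tail = (1.0 + xi * (y - u) / sigma) ** (-1.0 / xi)
            prob = 1.0 - (1.0 - p_u) * tail
        else:
            # empirical distribution function, divided by n + 1 to stay below 1
            prob = bisect.bisect_right(ordered, y) / (n + 1.0)
        transformed.append(-1.0 / math.log(prob))
    return transformed

sample = [float(v) for v in range(1, 101)]  # toy data
z = to_unit_frechet(sample, u=95.0, sigma=2.0, xi=0.1)
```

For daily minima, the same function would be applied to the negated series, in line with the sign change described earlier.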

Following the theory developed in Section 2.1, we transform each of the two datasets into a pseudo-dataset of radial and angular components. By retaining the angular observations whose radial component exceeds its $95\%$ quantile in each pseudo-dataset, we end up with two pseudo-samples, each containing $160$ extreme bivariate (angular) observations.

5.2 Covariate-Adjusted Dependence of Extreme Temperatures

In the following analyses of the dynamics of the dependence between extreme temperatures in Montana and Zermatt—and in line with findings from previous analyses of extreme temperatures in Switzerland (Davison and Gholamrezaee, 2011; Davison et al., 2013; Dombry et al., 2013)—we assume asymptotic dependence in both extremely high and extremely low winter temperatures.

Dependence of Extreme High Winter Temperatures

The covariate-adjusted bivariate angular densities presented in Section 2.2 are now fitted to the pseudo-sample of extreme high temperatures. The effects of the explanatory covariates $t$ , $z$ , and $d$ are tested in each of the three angular densities: the logistic model (Example 1) with parameter $\alpha(t,z,d)$ , the Dirichlet model (Example 2) with parameters $\alpha(t,z,d)$ and $\beta(t,z,d)$ , and the Hüsler–Reiss model (Example 3) with parameter $\lambda(t,z,d)$ . Within each family of covariate-adjusted angular densities, likelihood ratio tests (LRT) are performed to select the most adequate VGAM for the dependence parameters. Table 2 shows the best models in each of the three families of angular densities.

All the considered covariates have a significant effect on the strength of dependence between extreme high temperatures in Montana and Zermatt. For the covariate-dependent Dirichlet model, the covariates affect the dependence parameters $\alpha$ and $\beta$ differently. However, these parameters lack interpretability, and Coles and Tawn (1994) mention the quantities $(\alpha+\beta)/2$ and $(\alpha-\beta)/2$ that can be interpreted as the strength and asymmetry of the extremal dependence, respectively. In this case, the best Dirichlet dependence model found in Table 2 is such that both the intensity and the asymmetry of the dependence are affected by time, NAO, and day in season.

The best models in the studied angular density families are then compared by means of the AIC (see Section 3.2) displayed in Table 2. The Dirichlet model with $\alpha(z)$ and $\beta(t,d)$ parameters has the lowest AIC and is hence selected. This suggests the presence of asymmetry in the dependence of extreme high temperatures between Montana and Zermatt. Figure 6 shows the fitted smooth effects of the covariates on the extremal coefficient—constructed via the covariate-adjusted extremal coefficient as in (10)—that lies between $1$ for perfect extremal dependence and $2$ for perfect extremal independence.
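The range of the extremal coefficient can be checked numerically. The sketch below assumes the standard bivariate identity $2\int_{0}^{1}\max(w,1-w)\,h(w)\,\mathrm{d}w$ for an angular density $h$ normalised to integrate to one, which is taken here to agree with (10) at a fixed covariate value, and evaluates it for the logistic family, where the closed form is $2^{\alpha}$. Function names are hypothetical.

```python
def logistic_angular_density(w, alpha):
    # Bivariate logistic angular density on (0, 1), 0 < alpha < 1, total mass 1.
    a = 1.0 / alpha
    return (0.5 * (a - 1.0) * (w * (1.0 - w)) ** (-1.0 - a)
            * (w ** (-a) + (1.0 - w) ** (-a)) ** (alpha - 2.0))

def extremal_coefficient(density, n_w=2000):
    # Midpoint-rule approximation of 2 * integral of max(w, 1 - w) h(w) dw.
    dw = 1.0 / n_w
    total = 0.0
    for j in range(n_w):
        w = (j + 0.5) * dw
        total += max(w, 1.0 - w) * density(w)
    return 2.0 * total * dw

theta_strong = extremal_coefficient(lambda w: logistic_angular_density(w, 0.3))
theta_weak = extremal_coefficient(lambda w: logistic_angular_density(w, 0.5))
```

Smaller values of $\alpha$ give a coefficient closer to $1$ (stronger dependence), while values of $\alpha$ near $1$ push it towards $2$; the midpoint rule becomes less accurate in that regime because the density concentrates near the endpoints.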

A decrease in the extremal coefficient, or equivalently an increase in the extremal dependence between high winter temperatures in Montana and Zermatt, is observed from $1988$ until $2006$ . This change might be explained first by a phase of very pronounced and persistent warm anomalies during the winter season, which occurred countrywide from $1988$ to $1999$ (Jungo and Beniston, 2001), and then by the exceptionally warm $2006/2007$ winter in Europe (Luterbacher et al., 2007). Regarding the NAO effect, as expected, we observe an increase in the extremal dependence during the positive phase of the NAO, which has a geographically global influence on the Alps and results in warmer and milder winters, as described by Beniston (1997). For very negative NAO values (less than $-100$ ), the uncertainty is large because only a small share (8%) of the joint extreme high temperatures corresponds to such values.

The right panel of Figure 6 suggests an increase in the extremal dependence around mid-December. This evidence also seems compatible with the countrywide findings by Beniston (1997), who claims that

“The anomalously warm winters have resulted from the presence of very persistent high pressure episodes which have occurred essentially during periods from late Fall to early Spring.”

Dependence of Extreme Low Winter Temperatures

The effects of the covariates time, NAO, and day in season on the dependence between extreme cold winters in Montana and Zermatt are now tested by fitting the bivariate angular densities of Section 2.2. Within each of the logistic, Dirichlet, and Hüsler–Reiss families, LRTs are performed, and the selected models are displayed in Table 3.

The explanatory covariates have different effects on the extremal dependence, depending on the family of angular densities. The AICs for the fitted models are quite close, and the asymmetric Dirichlet model has the lowest AIC and is hence the retained model. As opposed to the extremal dependence between warm winters in the two mountain sites, the NAO has a non-significant effect on the extremal dependence between cold winters. This might be explained by the fact that high values of the NAO index will affect the frequency of extreme low winter temperatures (less extremes) and hence the marginal behavior of the extremes at both sites, but not necessarily the dependence of the extremes between these sites (Beniston, 2004, sec. 7.3.2).

Figure 7 shows the fitted smooth effects of time and day in season. The extremal dependence between low winter temperatures in Montana and Zermatt is high, regardless of the values taken by the covariates $t$ and $d$ . The range of values of the extremal coefficient observed in Figure 7 is in line with the findings of Davison et al. (2013), where the value of the extremal coefficient for the dependence between extreme low winter temperatures (in Switzerland) is around $1.3$ for pairs of resorts separated by up to $100$ km. Overall, the extremal coefficient is lower in the extreme low winter temperatures than in the extreme high winter temperatures. This could be explained by the fact that minimum winter temperatures are usually observed overnight when the atmosphere is purer and not affected by local sunshine effects and hence is more favorable to the propagation over space of cold winter spells.

A decrease in the extremal dependence is observed from around $2007$ and results in values of the extremal coefficient that are comparable to those obtained under the warm winter spells scenario (see Figure 6). This can be explained by a decrease in the intensity of the joint extreme low temperatures, that is, milder joint extreme low temperatures, occurring during the last years of the analysis, as can be observed in Figure 8. The right panel of Figure 7 highlights a decrease in the extremal dependence when approaching spring. This effect can be explained by the fact that mountains often produce their own local winds (see https://www.morznet.com/morzine/climate/local-climate-in-the-alps). These warm dry winds are mostly noticeable in spring and are called Foehn in the Alps. Local effects obviously lead to a decrease of extremal dependence between the two resorts.

6 Final Remarks

In this paper, we have introduced a sturdy and general approach to model the influence of covariates on the extremal dependence structure. Keeping in mind that extreme values are scarce, our methodology borrows strength from a parametric assumption and benefits directly from the flexibility of VGAMs. Our non-linear approach for covariate-varying extremal dependences can be regarded as a model for conditional extreme value copulas, or equivalently as a model for nonstationary multivariate extremes. An important advantage over existing methods is that our model profits from the VGAM framework, allowing the incorporation of a large number of covariates of different types (continuous, factor, etc.) as well as the possibility for the smooth functions to accommodate different shapes. The fitting procedure is an iterative ridge regression, the implementation of which is based on an ordinary N–R type algorithm that is available in many statistical software packages. An illustration is provided in the R code in the Supplementary Materials.

The method paves the way for novel applications, as it is naturally tailored for assessing how covariates affect dependence between extreme values—and thus it offers a natural approach for modeling conditional risk. Conceptually, the proposed approach is valid in high dimensions. Yet, as for the classical setting without covariates, the number of parameters would increase quickly with the dimension and additional complications would arise. Relying on composite likelihoods (Padoan et al., 2010) instead of the full likelihood seems to represent a promising path for future extensions of the proposed methodology in a high-dimensional context.

Supplementary Materials

The online supplement to this article contains supplementary numerical experiments, R codes for implementing VGAM family functions for different angular density families, as well as the R codes used for the extreme temperature analysis.

Monte Carlo Evidence:

The file contains the results of the Monte Carlo study conducted in Section 4.2. (.pdf file)

Covariate Adjusted Angular Densities:

The file contains R codes for implementing the following angular density VGAM families: the bivariate logistic, the bivariate Dirichlet, the bivariate Hüsler–Reiss, and the trivariate pairwise beta (see Section 2.2). Examples of the use of the implemented VGAM families are provided. (.zip file)

Temperature Data Analysis:

The file contains the datasets obtained from the MeteoSwiss website as well as the R codes for the analysis of the extremal dependence between winter temperatures in Montana and Zermatt. (.zip file)

Acknowledgments

We thank the Editor, Associate Editor, and two anonymous referees for several insightful recommendations that substantially improved the paper. We extend our thanks to the participants of Workshop 2017, EPFL, for discussions and comments, and to Paul Embrechts for his constant encouragement.

Funding

The research was partially funded by FCT (Fundação para a Ciência e a Tecnologia, Portugal) through the project UID/MAT/00006/2013.

Bibliography

Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J., De Waal, D., and Ferro, C. (2004), Statistics of Extremes: Theory and Applications, New York: Wiley.
Beniston, M. (1997), “Variations of Snow Depth and Duration in the Swiss Alps over the Last 50 Years: Links to Changes in Large-scale Climatic Forcings,” Climatic Change, 36, 281–300.
Beniston, M. (2004), Climatic Change and Its Impacts: An Overview Focusing on Switzerland, Advances in Global Change Research, Netherlands: Springer.
Beniston, M. (2005), “Warm Winter Spells in the Swiss Alps: Strong Heat Waves in a Cold Season? A Study Focusing on Climate Observations at the Saentis High Mountain Site,” Geophysical Research Letters, 32, 1–5.
Beniston, M. (2007), “Linking Extreme Climate Events and Economic Impacts: Examples from the Swiss Alps,” Energy Policy, 35, 5384–5392.
Beniston, M. and Rebetez, M. (1996), “Regional Behavior of Minimum Temperatures in Switzerland for the Period 1979–1993,” Theoretical and Applied Climatology, 53, 231–243.
Boldi, M.-O. and Davison, A. C. (2007), “A Mixture Model for Multivariate Extremes,” Journal of the Royal Statistical Society, Series B, 69, 217–229.
Castro, D. and de Carvalho, M. (2017), “Spectral Density Regression for Bivariate Extremes,” Stochastic Environmental Research and Risk Assessment, 31, 1603–1613.
