Hierarchical Transformed Scale Mixtures for Flexible Modeling of Spatial Extremes on Datasets with Many Locations
Likun Zhang, Benjamin A. Shaby, and Jennifer L. Wadsworth

TL;DR
This paper introduces a computationally efficient hierarchical scale mixture model for high-dimensional spatial extremes, preserving tail dependence properties and enabling practical inference on large datasets.
Contribution
It proposes a modified class of scale mixture Gaussian models that retain extremal dependence features while avoiding expensive Gaussian distribution evaluations, facilitating high-dimensional spatial extreme analysis.
Findings
Model captures diverse tail dependence structures.
Efficient inference via parallelized MCMC with Rcpp.
Application reveals complex tail dependence in wildfire risk data.
Abstract
Flexible spatial models that allow transitions between tail dependence classes have recently appeared in the literature. However, inference for these models is computationally prohibitive, even in moderate dimensions, due to the necessity of repeatedly evaluating the multivariate Gaussian distribution function. In this work, we attempt to achieve truly high-dimensional inference for extremes of spatial processes, while retaining the desirable flexibility in the tail dependence structure, by modifying an established class of models based on scale mixtures Gaussian processes. We show that the desired extremal dependence properties from the original models are preserved under the modification, and demonstrate that the corresponding Bayesian hierarchical model does not involve the expensive computation of the multivariate Gaussian distribution function. We fit our model to exceedances of a…
| Monte Carlo SE | 0.0013 | 0.0071 | 0.3570 | 0.0003 | 0.0134 | 0.0002 | 0.0001 | 0.0004 |
| ESS per second | 0.223 | 0.468 | 0.098 | 0.190 | 0.027 | 0.027 | 0.031 | 0.019 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\NewDocumentCommand\evalat
sOmm\IfBooleanTF#1 \mleft. #3 \mright—#4 #3#2—#4
Hierarchical Transformed Scale Mixtures for Flexible Modeling of Spatial Extremes on Datasets with Many Locations
Likun Zhang
Department of Statistics, Pennsylvania State University
Benjamin A. Shaby
Department of Statistics, Colorado State University
Jennifer L. Wadsworth
Department of Mathematics and Statistics, Fylde College, Lancaster University
Abstract
Flexible spatial models that allow transitions between tail dependence classes have recently appeared in the literature. However, inference for these models is computationally prohibitive, even in moderate dimensions, due to the necessity of repeatedly evaluating the multivariate Gaussian distribution function. In this work, we attempt to achieve truly high-dimensional inference for extremes of spatial processes, while retaining the desirable flexibility in the tail dependence structure, by modifying an established class of models based on scale mixtures Gaussian processes. We show that the desired extremal dependence properties from the original models are preserved under the modification, and demonstrate that the corresponding Bayesian hierarchical model does not involve the expensive computation of the multivariate Gaussian distribution function. We fit our model to exceedances of a high threshold, and perform coverage analyses and cross-model checks to validate its ability to capture different types of tail characteristics. We use a standard adaptive Metropolis algorithm for model fitting, and further accelerate the computation via parallelization and Rcpp. Lastly, we apply the model to a dataset of a fire threat index on the Great Plains region of the US, which is vulnerable to massively destructive wildfires. We find that the joint tail of the fire threat index exhibits a decaying dependence structure that cannot be captured by limiting extreme value models.
Keywords: Asymptotic dependence, Asymptotic independence, Censored likelihood, Threshold exceedance
1 Introduction
Modeling the dependence structure in the extremes of spatial processes is of great consequence for risk analysis of extreme events. In this paper, we make a slight alteration to the flexible class of randomly scaled transformed Gaussian process models to sidestep computational bottlenecks normally encountered in the likelihood. Our modification enables high-dimensional inference, while preserving submodels that transition smoothly between extremal dependence classes.
Generally, the probability that two spatially-indexed random variables exceed a high level simultaneously varies by their separation distance, and the particular way in which this joint probability decays must be well-represented in models if one hopes to accurately to assess risks posed by spatial extremal phenomena. Good estimation of how the dependence changes both with distance in space and as one moves farther into the joint tail will enable us to accurately calculate exceedance probabilities of areal quantities, predict at un-observed locations, and, secondarily, get a more realistic picture of marginal quantities.
In classical spatial modeling, Gaussian processes have been widely used due to their mathematical simplicity and tractability for larger datasets. However, the Gaussian density function is very light-tailed, and thus has the potential to underestimate probabilities associated with extreme events; furthermore, Gaussian models stipulate that the dependence among rare events at distinct locations will always diminish such that the probability of observing an extreme at one location, conditional upon an extreme at another location, is zero in the limit. This property is termed asymptotic independence, but models that only exhibit this phenomenon may be too inflexible for applications where the true tail dependence structure is uncertain.
Max-stable processes form an important class of models that exhibit the alternative scenario of asymptotic dependence. They are the natural extension of classical univariate extreme value theory to infinite dimensional settings, and therefore can provide an asymptotically-justified modeling framework for datasets consisting of block-maxima. Counterparts of max-stable processes suitable for threshold exceedances are called generalized Pareto processes (Ferreira and de Haan, 2014; Thibaud and Opitz, 2015). These processes are also asymptotically dependent, but possess the advantage that they bypass many of the computational difficulties of max-stable processes.
Despite the theoretical appeal of limiting max-stable and generalized Pareto processes, there are two main drawbacks to these models: (i) the assumption of asymptotic dependence may be incorrect and (ii) even if the data are asymptotically dependent, they will often not appear to follow such limiting models at sub-asymptotic levels. Both max-stable and generalized Pareto process dependence structures exhibit stability properties, meaning that their dependence structures are invariant to the operations of taking maxima and conditioning upon exceedances of higher thresholds, respectively. If the true data generating process exhibits weakening dependence in the un-observed region of the tail, inference drawn under these models about the far joint tail will over-estimate risk, sometimes substantially.
On account of the limitations of limiting models, it is desirable to find a family of spatial models that can transition between asymptotic dependence and asymptotic independence. In particular, we will be examining a class of marginally transformed Gaussian scale mixture models, which includes those recently proposed by Huser et al. (2017) and Huser and Wadsworth (2019). These models are of great interest due to their appealing theoretical properties and ability to flexibly capture both types of extremal dependence structure. Unfortunately, inference for these models is not feasible for large numbers of observed sites, as calculation of the censored likelihood, the preferred method for fitting joint tail models, entails integration over high-dimensional multivariate Gaussian distribution functions. To increase scalability, we propose an adaptation of the model by adding an independent measurement error term to each component. By adding this nugget effect, the new model circumvents the lengthy computation of the multivariate normal distribution function. Also it can elegantly avoid the integral of the process below the censoring threshold by considering the uncensored process as latent and drawing from it using Gibbs sampling, allowing for truly high-dimensional inference. Furthermore, we show that the modified models retain all the significant asymptotic properties of the original smooth models despite the presence of the measurement errors, which lays a solid theoretical foundation for correctly capturing the sub-asymptotic dependence behavior.
The article is organized as follows. Section 2 provides a brief literature review on the measures of extremal dependence and hybrid spatial extreme models, and further explains the intractability of the existing censored likelihood approaches for inference on these hybrid models. Section 3 describes our new model that alleviates the computational problems, and studies its extremal dependence properties. Section 4 includes a marginal transformation in the hierarchical model and details the inference using Gibbs sampling. Section 5 presents a simulation study that validates the methodology. We apply our model to a dataset of the Fosberg Fire Weather Index (FFWI) on the Great Plains in Section 6. Section 7 concludes with some discussion. Appendix A provides proofs of all the theoretical results. Appendix B includes supplementary diagnostics for the data application.
2 Spatial Dependence for Extremes
For a stochastic process , we write and so forth for simplicity, where denotes the th spatial location. It is useful to summarize the extremal dependence implied by the observed process concisely.
We restrict the scope to the bivariate case, focusing on stationary and isotropic random fields. One example of a bivariate dependence measure is the upper tail dependence coefficient:
[TABLE]
where , , and . Joe (1993) defined the upper tail dependence parameter as the limit . Asymptotic dependence is attained if and only if , while defines asymptotic independence.
For max-stable processes, , where . Max-stable distributions can be associated to a generalized Pareto counterpart, for which for all above a certain level (Rootzén et al., 2018). The fact that does not depend on is a manifestation of the threshold-stability of generalized Pareto processes. In practice, empirical estimates of (1) from data tend to show decreasing with both and , meaning that realistic models should also have this property.
When , i.e., the case of asymptotic independence, further detail about the behavior of is obtained by exploiting the joint tail assumption of Ledford and Tawn (1996):
[TABLE]
where is slowly varying at zero, that is, for any , and is the coefficient of tail dependence of the process . The pair of variables are asymptotically dependent when and . The remaining cases are all asymptotically independent, and the value of characterizes the strength of extremal dependence in the upper joint tail. In the case of a Gaussian process, , where is the correlation at lag . The variables are called positively associated when and negatively associated when . Near independence corresponds to .
Gaussian processes are asymptotically independent for all correlations . They might be considered candidates for modeling the joint tail of asymptotically independent phenomena, but as there is no theory to specifically recommend Gaussian processes in this scenario, it is desirable to consider other models as well. As an alternative, Opitz (2016) captures spatial dependence in asymptotically independent processes by construction of Laplace random fields, defined as mixtures of Gaussian processes with a random variance that is exponentially distributed. Wadsworth and Tawn (2012) proposed the class of inverted max-stable processes, for which the tail decay is specified fully by , although inference is computationally challenging.
In real datasets, it is difficult to conclude definitively whether data exhibit asymptotic independence or asymptotic dependence, and incorrectly assuming an asymptotically independent model can lead to equally severe problems with bias as incorrectly assuming an asymptotically dependent model. Because of this, a recent focus in the literature has been on models that can encompass both scenarios.
2.1 Traversing Asymptotic Independence and Dependence in Spatial Extremes
Wadsworth and Tawn (2012) were the first to introduce hybrid models that combine max-stable and inverted max-stable processes so that asymptotic dependence prevails at short distances, and asymptotic independence at long distances. However, inference for this model is difficult because there are a fairly large number of parameters involved, and the transition between the dependence classes takes place at the boundary of the parameter space.
Recently, several Gaussian scale mixture models were proposed to allow more flexible transitions between dependence classes. Through multiplying an asymptotically independent Gaussian process by a random effect that governs the extremal dependence, these models can be described by a small number of parameters and have non-trivial asymptotically independent and asymptotically dependent submodels. More precisely, suppose is a standard isotropic and stationary Gaussian process with covariance function indexed by a parameter vector , where is the length of the separation vector, so that is the covariance matrix of associated finite-dimensional distributions. The class of Gaussian scale mixture models can be constructed as
[TABLE]
where is a link function, and is a random scaling factor, from distribution indexed by , that can be interpreted as a constant random process over spatial domain with perfect dependence. Impacting simultaneously the whole domain , heavier tailed induces asymptotic dependence in , whereas lighter tailed induces asymptotic independence. Engelke et al. (2019) provide a fuller description of how extremal dependence of relates to the relative marginal tail heaviness of and .
Morris et al. (2017a) uses a space-time model based on skew- process, where is a identity function, is an inverse gamma random variable, and is a Matérn covariance function. On top of the mixture, they added covariate effects and a skew term. Since the inverse gamma distribution is heavy tailed, the skew- process is asymptotically dependent for . Asymptotic independence is achieved only when .
Huser et al. (2017) also used an identity link function, but placed few assumptions on the random scale, and provided more general results on the joint tail decay rates of the mixture processes. They showed that a wide class of Weibull-like tail decay in yields asymptotic independence, while a Pareto-like tail that is regularly varying at infinity gives asymptotic dependence. They also proposed a parametric model that bridges the two asymptotic regimes and provides a simple transition, in which is a two-parameter distribution
[TABLE]
where , and the support is . Since converges to as approaches 0, (4) forms a continuous parametric family on . When , (4) constitutes a class of Weibull-type distributions and thus assures asymptotic independence. When , the variable is Pareto distributed and thus gives asymptotic dependence. This shows that the model provides greater flexibility and can transition from asymptotic dependence to independence via adjusting the value of .
However, the previous two Gaussian scale mixture models both make the transition between the dependence classes at the limit or the boundary of the parameter space. They are also inflexible in their representation of asymptotic dependence structures because there is dominating preference over one dependence class. It may be more desirable to find a model for which the transition takes place in the interior of the parameter space so one we can quantify the uncertainty about the dependence class in a simpler manner. To overcome this, Huser and Wadsworth (2019) proposed a marginally transformed Gaussian scale mixture model, where transforms a standard Gaussian variable to standard Pareto, and itself is Pareto distributed:
[TABLE]
Here the type of asymptotic dependence is determined by the value of . When , is lighter tailed or equivalent to standard Pareto, which induces asymptotic independence; when , the converse is true, which induces asymptotic dependence. Specifically, the upper tail dependence parameter when and 0 otherwise, while the coefficient of tail dependence is
[TABLE]
where is the coefficient of tail dependence for (Huser and Wadsworth, 2019).
The model in (5) provides a smooth transition through asymptotically independent and asymptotically dependent submodels. It has many appealing asymptotic properties. However, inference for models of the form (3) is typically made via censored likelihood. This requires computing an integral where the integrand contains the Gaussian distribution function in dimensions, where is the number of components below a designated high threshold. Such integrals are computationally prohibitive for even moderately-sized datasets. In Section 3 we introduce a slight alteration to this model to make it tractable while preserving all the desired asymptotic results.
2.2 The Censored Likelihood
In multivariate and spatial extremes, the preferred approach to fitting the dependence structure is using a censored likelihood, which prevents observations from the bulk of the distribution from affecting the estimation of the extremal dependence structure. It provides a reasonable compromise between bias and variance compared to alternative approaches, although different censoring schemes have been adopted (Thibaud and Opitz, 2015; Huser et al., 2016).
For a process of the form (3) observed at spatial locations , we obtain the distribution function by conditioning on as
[TABLE]
where , and denotes the -variate Gaussian distribution with zero mean and covariance matrix .
Let be the set of locations with censored observations—that is, the set of locations where the components are below a high threshold; let be the set of locations with uncensored observations. For any index set , denote , as the matrix restricted to the rows in and the columns in , and let be the Schur complement of in . The likelihood is obtained via taking partial derivatives of (6) with respect to :
[TABLE]
Although only one-dimensional integral appears in (7), the integrand includes a -dimensional Gaussian distribution function. When approximating the integral using standard quadrature or Monte Carlo methods, one needs to compute for each sample point taken on . This is only feasible when the number of locations is moderate. Additionally, this calculation will have to be repeated for each time replicate.
To avoid the integrating the process below the threshold, one could instead think of as latent and draw from it using Monte Carlo methods. Consequently there is no need to compute the awkward likelihood (7). However, to update the Markov chain each time, it is now necessary to draw from a high-dimensional truncated distribution, which might again be computationally intensive.
Therefore, we propose to make a slight adjustment to the model in (3). Our new model is markedly more amenable to higher-dimensional inference, yet it keeps hold of the joint tail decay rates attained in the original model (e.g. Huser and Wadsworth, 2019; Huser et al., 2017). Equivalently, our new model has non-trivial asymptotically dependent and asymptotically independent submodels with the transition taking place in the interior of the parameter space in the case of our modified version of (5).
3 Model
3.1 Construction
We alter the models in Section 2.1 by adding an independent measurement error term to each component,
[TABLE]
where , and distribution of and the link function remain the same. That is, we add a simple nugget effect to the smooth process . When drawing the latent processes below the threshold, we can condition on the smooth process and simply update the noisy one. Because these error terms are independent of each other, there is only a univariate integral involved in the full conditional likelihood. Also, when we update the smooth process given the noisy process , no truncation or censoring is present and it is much easier to sample from the corresponding likelihood. Section 4 contains more details on the Markov Chain Monte Carlo (MCMC) updating scheme, where we show how this small alteration can hugely facilitate inference.
3.2 Dependence Properties
We begin with the model (5) from Huser and Wadsworth (2019), modified as in (8). Recall that is a stationary process with standard Pareto margins possessing asymptotic independence; i.e., and
[TABLE]
where is slowly varying at infinity, and for the Gaussian correlation .
Figure 1 illustrates the estimated coefficient of tail dependence as a function of for . For each combination of and , we generate 5,000,000 replicates from model (5) (i.e. ) and model (8) with respectively. For each replicate, we sample from a Gaussian copula with correlation . We then numerically approximate the joint survival probability in (2) to obtain an estimate of . The left panel of Figure 1 clearly shows that the smooth transition from asymptotic independence to asymptotic dependence takes place around , confirming the results from Huser and Wadsworth (2019) with a reasonable bias; the right panel shows that adding a measurement error has little effect on the tail dependence because exhibits similar behavior. This result invites investigation of whether the flexible asymptotic properties in Huser and Wadsworth (2019) are preserved in the altered model. In the following, we generalize the problem from the specific model of Huser and Wadsworth (2019) for the process in (8), to any with a wide class of marginal tail behaviors.
The impact of the additive Gaussian nugget effect on the extremal dependence of depends upon the marginal tail heaviness of : roughly, the heavier the tail of , the less the impact of the noise. Since we take a copula-like approach and employ as the spatial dependence model, we assume its margins, and those of , are identical over space.
We will focus on the dependence of under two broad classes of marginal distribution for : regularly varying tails, and Weibull-like tails. A measurable function is said to be regularly varying at infinity with index if for all ; we write . When , the function is slowly varying.
Regularly varying tails are defined through the survival function being regularly varying at infinity, i.e., , . Weibull-like tails are defined through the survival function
[TABLE]
where , and is termed the Weibull index. We also assume that has a density satisfying , with . The main results are now summarized in Proposition 3.1.
Proposition 3.1**.**
With definitions and notation as above:
If has a regularly varying tail, or Weibull-like with Weibull index , then and . 2. 2.
If has a Weibull-like tail with Weibull index then
[TABLE]
and . Note if then so is . 3. 3.
If has a Weibull-like tail with Weibull index then:
- (a)
If , then 2. (b)
If , then an interval can be given for (see Expression (27)). 3. (c)
If , then .
The proof of Proposition 3.1 is given in Appendix A.
For the process of Huser and Wadsworth (2019), described in equation (5). In this case, always has a regularly varying tail, so Proposition 3.1 part 1 gives , and , with and given in Section 2.1. This means the flexible asymptotic properties in Huser and Wadsworth (2019) are preserved in the altered model.
Another popular model for spatial data is the -process (Røislien and Omre, 2006), which is a Gaussian scale mixture for which the mixing variable follows an inverse gamma distribution. If is a -process, then it is asymptotically dependent with a regularly varying tail, so and by Proposition 3.1. Similarly, the skew- process (Padoan, 2011; Morris et al., 2017b) is regularly varying and asymptotically dependent, so the same conclusions apply.
For the Gaussian scale mixture of Huser et al. (2017), either has regularly varying or Weibull-like tails depending on the distribution of the scaling variable . In particular when in distribution (4), has a Weibull-like tail with Weibull index . As such, parts 1, 2 and 3a of Proposition 3.1 are relevant. In all cases , and for . When then has a regularly varying tail, and is asymptotically dependent with , and
[TABLE]
where is the cdf of the Student- distribution with degrees of freedom.
We note that the process in equation (3) is constructed only for its dependence properties, and there is no “natural” scale on which to express it. For example, considering the process of Huser and Wadsworth (2019) we could also write
[TABLE]
with , , which has the same dependence structure as defined in (3) and (5), since it is obtained through a monotonic marginal transformation. Taking from (11), Proposition 3.1 part 2 gives . Asymptotic dependence of implies asymptotic dependence of , but only bounds on are available.
In practice, if choosing a marginal scale for , there may be a trade-off between theoretical desires and computational practicality. Supposing that we wish to inherit the properties of , a heavy-tailed choice is best.
4 Bayesian Inference
4.1 Hierarchical Model
We define a Bayesian hierarchical model based on the process (8) defined in Section 3.1 and use a MCMC algorithm to fit to the data. For the reasons outlined in Section 2.2, we assume data are censored below a high threshold . In (8), the mixing parameter controls both joint and marginal behavior of the response , which we would prefer to separate. Therefore, motivated by the theory of univariate extremes, we first assume our observations above the same high threshold are generalized Pareto distributed, and we include a marginal transformation in the hierarchical model.
Let denote the observed process. We define a marginal transformation as follows:
[TABLE]
where is the marginal distribution function for process (8), and
[TABLE]
where , is a high threshold, and , with support . Conditioning on the smooth process , which was not truncated, the censored likelihood for an observation can be derived as
[TABLE]
where . Note that there are only univariate calculations required in (14), compared to (7) for which we have to estimate the -dimensional Gaussian distribution functions. In addition, since and are independent conditioning on the smooth process (), the joint likelihood of the whole vector is simply . Likelihoods for independent time replicates are simply multiplied together, and the proportion of censored observations can be treated as a known parameter or an unknown parameter that enters the hierarchical model. See Appendix A.2 for a complete statement of the hierarchical model. The priors for the model parameters are
[TABLE]
where halfCauchy(1) refers to the positively truncated standard Cauchy distribution. We implement this methodology for two different Gaussian scale mixture models: that of Huser and Wadsworth (2019), and that of Huser et al. (2017). The priors for are for the former, and for the latter. The prior for depends on the choice of the covariance function. In our implementations, we adopt the Matérn covariance function with , where is the range parameter, and is the smoothness parameter. The prior is then set to subject to two independent half Cauchy distributions with scale parameter 1.
4.2 Gibbs Sampler
To estimate the posterior distribution of the model parameters, we apply random walk Metropolis (RWM) algorithm using Log-Adaptive Proposals (LAP) as our adaptive tuning strategy (Shaby and Wells, 2010). Since conjugate priors are not available, we use random walk Metropolis-Hastings update steps.
At each MCMC iteration, we first update the smooth process conditioning on the true values for all non-censored sites, current values for and all other model parameters:
[TABLE]
where the likelihood function of conditioning on the random scaling factor is calculated in Appendix A.2. We then update using its conditional posterior distribution
[TABLE]
Since time independence is assumed, we can update and in a parallel fashion across . The other parameters are updated similarly using adaptively-tuned random walk Metropolis-Hastings updates, with the likelihood (14) multiplied by the corresponding priors in (15).
5 Simulation Studies
In this section, we present simulation results and conduct coverage analysis to investigate, firstly, whether the MCMC procedure is able to draw accurate inference on model parameters, and secondly, in the case of the modified version of model (5), to check whether our model captures asymptotic dependence characteristics correctly even when the data-generating model is different from the fitted model.
5.1 Parameter Estimation
To verify the accuracy of inference made by MCMC sampling, we generate data from model (8) in Section 3.1, in both the special case of the Huser et al. (2017) model (4) and the special case of the Huser and Wadsworth (2019) model (5), with the addition of nugget terms. In both cases, we use sites uniformly drawn from the unit square , with the latent Gaussian processes are generated using a Matérn covariance with smoothness parameter . The characteristic length scale parameter is set to in the case of model (4) and in the case of model (5).
For model (4), we use independent temporal replications, and set , (which is fixed during estimation), and nugget variance . This represents a challenging case, with a fairly long tail and nugget that is small relative to the scale of . For model (5), we use independent temporal replications, and set the nugget variance to . Because the latent is transformed to Pareto, is still small compared to the scale of smooth process ; see the Supplementary Material for a more in-depth discussion on the effects of . We consider two different scenarios for the dependence parameter : and , corresponding to asymptotic independence and asymptotic dependence respectively. Finally, in all cases the processes are marginally transformed to generalized Pareto distribution with , where and , the proportion of censored observations, are treated as known parameters.
The attenuation constants used in the LAP algorithm are . The prior for is halfCauchy, and the priors for the other parameters are specified in (15), where so that the prior for is fairly noninformative. We ran each MCMC chain for 400,000 iterations and thinned the results by a factor of 10. The parallelism of updating and is implemented in R via the foreach routine with doParallel package as a backend (Microsoft Corporation and Weston, 2017).
5.2 Coverage Analysis
We now study the coverage properties of the posterior inference based on the MCMC sampler for the posterior credible intervals with 100 simulated datasets drawn from the Huser et al. (2017) and Huser and Wadsworth (2019) models, under each of the scenarios described in the previous section.
Figures 2 and 3 shows the empirical coverage rates of highest posterior density credible intervals of several sizes, along with standard binomial confidence intervals. In all cases, we can see that the sampler performed well in generating posterior inference that is well calibrated, with close to nominal frequentist coverage. The coverage for larger is may be slightly different than nominal for large , but overall the results are quite good.
5.3 Simulation with Mis-specified Models
We now fit our model to data generated from other distributions to validate its ability to capture the tail dependence characteristics under mis-specification. We simulate datasets from models referenced in Section 2.1, and use the sampler described in Section 4.2 to fit model (8) with the Huser and Wadsworth (2019) latent process. The data were generated using four different simulation designs:
- •
Skew- process from Morris et al. (2017a) with (asymptotically dependent);
- •
Gaussian scale mixture from Huser et al. (2017) with (asymptotically dependent);
- •
Gaussian scale mixture from Huser et al. (2017) with (asymptotically independent).
For each simulation design, we simulate a single dataset using locations uniformly distributed on and independent time replicates. The Matérn covariance function with and is again specified for the latent Gaussian processes. For the skew- process, to give a distribution with degrees of freedom, and to simulate moderate skewness. For the last two designs, the were generated as described in (4), with .
To obtain good starting values for the latent smooth process for MCMC, we first marginally transform the simulated data to noisy scale mixture variables independently at each location using the following procedure. Following the semi-parametric procedure of Coles and Tawn (1991), we estimate each marginal distribution as a blend of the generalized Pareto distribution function above a high marginal threshold, and the empirical distribution function below that threshold. Fixing initial values for , we then transform the margins to noisy scale mixtures via . The next step is to run a Metropolis algorithm using the full conditional distribution (see (16)) 100 times and save the last random walk states as initial values for . This procedure is also used for data analysis in Section 6. Finally, with initial values in hand, the datasets from each design were fit using a fully Bayesian approach that simultaneously updates marginal and spatial dependence parameters.
Figure 4 displays the nonparametric and model-based estimates of the upper tail dependence defined in (1). To generate nonparametric estimates of at distance , we look at all pairs of points whose locations are apart (within some small tolerance), and compute the ratio of empirical probabilities . This is similar to an empirical variogram estimator. The nonparametric confidence envelopes are obtained by computing pointwise binomial confidence intervals, pretending that each pair of points is independent from each other pair of points. For parametric estimates, we take samples from the converged MCMC chain, and use parameters from each iteration to simulate pairs of based on our model to generate a smooth estimate for each MCMC iteration. Then, combining across MCMC iterations, we compute the pointwise average curves and their credible bands. The results in Figure 4 demonstrate that our model provides a sensible approximation to the extremal dependence structure of the mis-specified models. Borrowing strength across locations, the parametric estimators of are much more reliable than the nonparametric ones in that they are able to discriminate between the two asymptotic classes via estimating a relatively small number of parameters. Especially for the Huser et al. (2017) models, the dependence strength diminishes gradually as becomes greater, eventually resembling the Gaussian copula. With the extra tail flexibility, our model is able to accurately capture these features.
6 Data Analysis
We consider daily observations of Fosberg Fire Weather Index (FFWI) from 1974 to 2015 at 93 monitoring stations over parts of the Great Plains, mainly from Central Great Plains to South Texas Plains (Dunn et al., 2012). Figure 9 shows the observation locations as black triangles. The FFWI aims to quantify potential wildfire threat. It is a single number summary calculated from temperature, wind speed, and relative humidity; larger index values signify higher flame lengths and more rapid drying (Fosberg, 1978). Due to human activity and changes in the grassland ecosystem, the Great Plains region is becoming an important high-risk region of large wildfires. According to a in-depth study conducted by Donovan et al. (2017), the average total area burned by fires in Great Plains region between 2005 and 2014 was in the millions of hectares per year. In 2017, 809,380 hectares were lost to wildfires in a single week in Texas, Oklahoma, and Kansas alone (Herskovitz, 2017). As a consequence of the prevailing hot and dry air, wildfires in this region are particularly more concentrated in the spring, feasting on grasses made dry by long-term drought. Modeling the tail behavior of FFWI and studying the extremes of the process could have major implications for wildfire planning.
To ensure the independence over time and avoid seasonal effects, we take the maxima of the FFWI values over ten-day intervals during the spring season. Figure 5 shows the 50-year return levels estimated using the block maxima with 10-year sliding windows for 12 randomly-selected stations. There is no clear evidence for a systematic increase or decrease in the return levels. Although treating meteorological data as constant over time is often problematic, particularly for temperature data, the behavior in Figure 5 suggests that an assumption of constant marginal parameters in time is appropriate.
To account for the physical features of the terrain in the Great Plains, we describe the scale parameter in by the trend surface:
[TABLE]
where and are the longitude and latitude of the stations at which the data are observed. We constrain the joint prior of such that the support of is the positive real line. We model the shape parameter as constant over the spatial domain, as suggested by exploratory analysis (see Appendix B.1).
Similar to the procedure in Section 5.3, to obtain starting values for MCMC, we fit generalized Pareto distributions to model events above the 98% quantile, , and empirical distributions to those below , of the time series at each station separately, and then use the fitted models to transform the data to have noisy scale mixture distributions. We then ran the MCMC chain for 50,000 iterations thinned by 10 steps and discarded a burn-in period of 25,000 iterations.
6.1 Model Evaluation
First and foremost, we examine whether the estimate falls within or in accordance to whether the data-generating process is asymptotically independent or dependent. Table 1 reports the posterior means and 95% credible intervals for the model parameters. Trace plots for and can be found in Appendix B.2. For this dataset, the MCMC results show that the range of the mixing parameter is close to the interface between the two dependence class, while demonstrating asymptotic dependence. Nonetheless, the value of being close to 1/2 means that will still decrease with before eventually reaching a positive limit.
To better evaluate the model fit, we randomly hold out 5 stations for validation purposes, and exclude them from the MCMC analysis; see the red points in Figure 9. We then compare the empirical distributions of mean, and maxima for each time point at the 5 held-out stations with those simulated with parameters from MCMC samples. Though we modeled our data as censored observations, the model may still work a bit further into the center of the distribution. Results are displayed in Figure 6 where we only show values exceeding 80% threshold for mean, and 90% threshold for maxima. We can see that the fit displays a good match against the observed mean and maxima in the upper quantiles.
As a comparison, we change to be a process, and re-fit the model using MCMC. We apply proper scoring rules (Gneiting and Raftery, 2007), log scores and continuously-ranked probability scores (CRPS), to compare the quality of probabilistic forecasts. While running MCMC, we interpolate the latent process at the held-out locations for each iteration using the full conditional likelihood. Plugging the predictive draws at the held-out observations into the equation (14), we obtain the log score (simply the log-likelihood) as
[TABLE]
where are the validation stations. The left panel of Figure 7 compares the log scores between two models, showing that the transformed Gaussian scale mixture process clearly outperforms the process. Additionally, we calculate the CPRS (Matheson and Winkler, 1976) for both models,
[TABLE]
where is the marginal distribution estimated using parameters using one MCMC iteration, and is the observed value. The right panel of Figure 7 shows the averaged CRPS for the two models, where our model clearly has better results.
Similar to Figure 4, we show the empirical and model-based values of and for the block maxima of the FFWI in Figure 8, which confirms that our model captures the extremal spatial dependence in the data quite well. The quantity is an alternative dependence measure useful in the situation , and is defined as
[TABLE]
Recall that the posterior mean for is greater than 0.5, which means as . Interestingly, the black dashed curve of the right panel of Figure 8 seems to have a limit less than 1. This is because to attain the correct limit in this case, we would need to compute for values of that are very close to 1, which is very difficult numerically.
6.2 Results
To get an idea of what a realization of the fitted process looks like, the left panel of Figure 9 shows one realization of the latent scale mixture process using parameters from one MCMC iteration. The extreme values are mainly concentrated in two small regions. The right panel shows the same realization, now marginally transformed to the scale of the FFWI values. Since we modeled our data as partially censored observations, the map here only displays the areas where threshold exceedances are observed.
A quantity of great interest is areal exceedance probabilities, which represent the amount of territory simultaneously at extreme risk for wildfire. To obtain a Monte Carlo estimate of these joint probabilities, we use parameters from each MCMC sample to simulate 100 processes (on a grid), and calculate the total area that has FFWI over a designated threshold. Figure 10 shows the results. These curves represent total area at risk for various FFWI thresholds. The curves for the higher thresholds decay faster than those for the lower thresholds, which confirms that extreme events simultaneously occurring across large areas becomes less common when the threshold increases. This also shows that the joint tail of the fire threat index exhibits a weakening dependence structure, with more extreme events being more localized. It is not possible to capture this behavior using limiting extreme value models like max-stable or generalized Pareto processes.
7 Discussion
In this paper, we have proposed a new modeling approach, based on the class of transformed Gaussian scale mixture models, which includes those recently proposed by Huser et al. (2017) and Huser and Wadsworth (2019). We added a measurement error to the mixture, hence avoiding the need to calculate the onerous -dimensional Gaussian distribution function when dealing with the censored likelihood. We also circumvent the need to draw from a high-dimensional truncated distribution by treating the smooth process as latent and updating repeatedly using MCMC. In addition to its computational advantages, the presence of a measurement error term may also make the model more realistic for data collected by real-world instruments. Indeed, nugget effects are ubiquitous in environmental statistics, not just to represent measurements errors, but also as a result of small scale effects that are not included in the large scale model for spatial dependence.
Even with the presence of the measurement error, the model is still able to capture qualitatively different types of sub-asymptotic dependence behavior of spatial processes. In the case of our modification of the Huser and Wadsworth (2019) model, a smooth transition between both extremal dependence paradigms takes place in the interior of the parameter space, which enables inference of the dependence class in a simple manner. We proved that all the appealing asymptotic properties found in the original smooth process are preserved in the modified model, regardless of the size of the measurement error variance.
The model allows inference on spatial extreme-value datasets with relatively large numbers of locations. The computational limitations are similar to those of conventional spatial Gaussian process models. We have defined the model conditionally as a Bayesian hierarchical model, for which standard MCMC techniques can be used to fit the data. Computation is facilitated greatly by parallelizing over time and migrating some basic linear algebra to C/C++ via Rcpp. Even so, the lack of closed form marginal transformations creates a significant (though embarrassingly parallel) computational challenge that scales with the total number of exceedances, rather than the usual case of scaling with the number of spatial locations.
Despite easing computational limitations associated with the Huser et al. (2017) and Huser and Wadsworth (2019) models, our modified versions inherit the same theoretical limitations. Neither model is able to account for the possibility of independence between observations as the distance between sites becomes large, nor are they able to transition from asymptotic dependence at short range to asymptotic independence at longer range. Wadsworth and Tawn (2019) presents an alternative approach to modeling spatial extremes that begins to address these issues.
Another interesting possibility to explore would be to include the nugget term inside the link function . This would result in closed-form marginal distributions in some cases, perhaps making computations easier. However, it would change the dependence structure in ways that would not vanish in the limit, which is not the behavior we were aiming for here, but could be useful nonetheless.
Acknowledgements
We gratefully acknowledge support from NSF grant DMS-1752280 and EPSRC grant EP/P002838/1, along with seed grants from the Institute for CyberScience and the Institute for Energy and the Environment at Pennsylvania State University. Computations for this research were performed on the Pennsylvania State University’s Institute for CyberScience Advanced CyberInfrastructure (ICS-ACI).
Appendix A Technical appendix
A.1 Proof of Proposition 3.1
For the proof of Proposition 3.1, we begin by recalling useful results from the literature.
The first is Breiman’s lemma, see e.g. Breiman (1965) and Cline and Samorodnitsky (1994), and a corollary for sums of a regularly varying and light-tailed random variables.
Lemma A.1** (Breiman).**
Suppose that where , and for some . Then
[TABLE]
Corollary A.1.1**.**
Suppose that , , and is a random variable such that for some . Then
[TABLE]
Proof.
Since , for positive finite , . By assumption , and so applying Breiman’s Lemma to yields
[TABLE]
from which the result follows. ∎
The second result relates to sums of Weibull-tailed variables, i.e., those with survival functions satisfying (10) and the associated density
[TABLE]
The following lemma can be verified directly from Theorem 4.1 of Asmussen et al. (2018), by identifying that the conditions in Section 4 of that paper hold for this subclass when .
Lemma A.2** (Asmussen et al. (2018); Balkema et al. (1993)).**
Let be variables with density (20), with regularly varying functions and parameters , . Then the density and survival function of the convolution satisfy
[TABLE]
where , with determined by solving
[TABLE]
and
[TABLE]
Finally we note also the following two useful inequalities
[TABLE]
[TABLE]
where the two lower bounds are equal. We can now prove Proposition 3.1.
Proof of Proposition 3.1.
- When has a regularly varying tail, Corollary A.1.1 provides that . For the existence of , the variable also has a regularly varying tail. Further,
[TABLE]
so Corollary A.1.1 thus gives , and similarly for the lower bound. Hence . Therefore it follows that . If for then , so the result for follows as well.
When has a Weibull-like tail with Weibull index , then , and the rest of the argument follows as above.
- When , . Breiman’s lemma now yields
[TABLE]
As a consequence, inequality (22) does not lead to a precise asymptotic relationship for , but rather that it is asymptotically bounded within the range
[TABLE]
Combining (24) and (25), we get the stated bound for . It follows also that and , and hence .
- Here we use Lemma A.2, where different values of are found for the three cases , and . To make notation more obvious, we replace with for summation of , and etc., for summation of and , for example. The three cases are considered for the value of ; we always have .
a) For ,
[TABLE]
Therefore
[TABLE]
which implies .
To understand the joint behavior, we again use (22). Note that
[TABLE]
Consequently, and both have , with different . As does not affect the leading order behavior of in (26), the tails of and both have in the exponent, where
[TABLE]
Consequently and so .
b) When , solving equations (21) provides
[TABLE]
and
[TABLE]
For the margins, and maximum , whilst for the minimum, . We can now use both sets of inequalities (22) and (23), to give that the range of is
[TABLE]
Some simplification arises upon noting that , since
[TABLE]
As , i.e., as the nugget effect disappears, the endpoints converge to . As , i.e., as the nugget effect dominates, both endpoints converge to .
c) When , solving equations (21) provides
[TABLE]
and . Using inequality (23),
[TABLE]
noting that and leads to .
∎
A.2 Hierarchical model
Under spatiotemporal setting, we observe , and the temporal dependence is ignored. Then the hierarchical model can be described as
[TABLE]
where corresponds to for the Huser and Wadsworth (2019) model and for the Huser et al. (2017) model, and is a Gaussian process with Matérn covariance . The priors for are and respectively.
In Section 4, we have formulated the full conditional likelihood for a fixed time and location; see Equation (14). Next we are going to work out the conditional joint likelihood for a fixed time . Keeping the notations same as the main text, we denote as the link function that transforms the scale of the latent Gaussian process. For the model of Huser et al. (2017), is an identity function, and thus conditioning on remains to be a multivariate Gaussian random variable. For the Huser and Wadsworth (2019) model, the conditional density needs more careful scrutiny.
Lemma A.3**.**
Let , where . For the Huser and Wadsworth (2019) model, the density conditional on is
[TABLE]
where .
*Proof. *First note that
[TABLE]
Fixing as a constant, we can apply the chain rule to obtain
[TABLE]
Then the Jacobian matrix can be written as
[TABLE]
and
[TABLE]
Denote as the density function of . Then
[TABLE]
Apply the change of variables formula
[TABLE]
and we will get the desired result.
When updating certain variables for a random walk Metropolis-Hastings step, we calculate the corresponding full conditional likelihood for both current values and proposed values. Since entails most parameters and latent variables, we will evaluate this density over and over again. Looking back at its analytic form in (14), the most compute-intensive part is to calculate the marginal transformation as defined in (12), which requires the computation of the marginal quantile function of the noisy process, i.e. . Note that the marginal distribution function can be obtained through a convolution:
[TABLE]
where is the density of , and is the marginal distribution function of the smooth process, whose forms are derived in Huser et al. (2017) and Huser and Wadsworth (2019) for the two models of interest in this paper. Because (30) cannot be further simplified, we compute the improper integral numerically using the QUADPACK algorithms which are implemented within the gsl_integration library in C++. The computations in C++ and R are interfaced using the package Rcpp.
To compute th quantile of , i.e. , which is required for likelihood function evaluations, we first evaluate the distribution function at a fine grid of values. We then perform cubic spline interpolation through the control points to yield a continuous quantile function estimate. Due to the smoothness of the quantile function, numerical experiments showed that this technique suffered no measurable reduction in accuracy relative to the much slower technique of computing quantiles using a numerical root finder.
Appendix B Additional Diagnostics
B.1 Marginal parameters
To examine the trend surfaces of the marginal parameter , we fit univariate generalized Pareto distribution to the spring observations at each station over a high threshold . The estimated parameters are plotted in Figure 11. We can see that there is no obvious spatial pattern for the shape parameter, while there is significant longitudinal effect for the scale parameter. This give grounds for applying linear trend surface to scale, and constant surface to shape (see Equation (17)).
B.2 MCMC results
Batch means (Flegal et al., 2010) is a convenient way to compute Monte Carlo standard errors for MCMC outputs. If one divides a Markov chain into batches of size , the Monte Carlo estimate of based on th batch can be obtained as follows:
[TABLE]
and batch means estimate of the Monte Carlo standard error can be defined as
[TABLE]
where is the overall Monte Carlo estimate. See Figure 13 for batch means standard errors computed periodically for and . Other parameters have similar results. We report the stabilized batch means standard errors in Table 2.
For the data analysis, we used 20 cores from the CyberScience Advanced Cyber Infrastructure at Penn State. On average, each iteration takes approximately 1.76 seconds (CPU time). The effective sample size (ESS) per second is also reported in Table 2, in which the marginal parameters has lower values.
Supplementary Material
In this document, we examine the performance of the MCMC algorithm when , i.e., our proposed model (8) converges to the underlying smooth process from Huser and Wadsworth. One may suspect that the Markov chains will mix very poorly when the value of is small (although when , an alternative MCMC scheme might be possible, wherein components of could be updated from truncated distributions, albeit more complicated ones). We conduct another simulation study for datasets generated from (transition point) and various values (). Other parameter settings remain the same as in Section 5.1. The sites and the smooth processes are simulated using the same random number generating seed for each dataset. We want to see the critical point of for the algorithm to fail.
Figure 14 displays the results from running the MCMC algorithm, with each row showing one case. The red dashed line signifies the true parameter values, and the blue lines indicates the 95% posterior credible intervals. Each MCMC chain was run for 400,000 iterations. Thinning the results by a factor of 10, we show the last 50,000 iterations. We can see that, while the performance of stays stable for different true values, the Markov chain for converges slower when its true values becomes smaller. In the case where , the true value is outside of the 95% credible interval and the chain mixes very poorly. Figure 15 shows trace plots of an at one specific site and time which is censored for all four simulations due to the same collection of sites and underlying smoothing processes. We can see that for larger value of , there are more fluctuations for over the threshold, which might be the reason why the MCMC algorithm works better in this case. The dissatisfying performance when is very small might also have something to do with the particular sampler we used (random walk M-H with a symmetric proposal kernel), given the parameter is so close to the boundary, rather than a problem with the overall modeling scheme.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1(1)
- 2Asmussen et al. (2018) Asmussen, S., Hashorva, E., Laub, P. J. and Taimre, T. (2018), ‘Tail asymptotics of light-tailed Weibull-like sums’, Probability and Mathematical Statistics 37 .
- 3Balkema et al. (1993) Balkema, A. A., Klüppelberg, C. and Resnick, S. I. (1993), ‘Densities with gaussian tails’, Proceedings of the London Mathematical Society 3 (3), 568–588.
- 4Breiman (1965) Breiman, L. (1965), ‘On some limit theorems similar to the arc-sin law’, Theory of Probability & Its Applications 10 (2), 323–331.
- 5Cline and Samorodnitsky (1994) Cline, D. B. and Samorodnitsky, G. (1994), ‘Subexponentiality of the product of independent random variables’, Stochastic Processes and their Applications 49 (1), 75–98.
- 6Coles and Tawn (1991) Coles, S. G. and Tawn, J. A. (1991), ‘Modelling extreme multivariate events’, Journal of the Royal Statistical Society: Series B (Methodological) 53 (2), 377–392.
- 7Donovan et al. (2017) Donovan, V. M., Wonkka, C. L. and Twidwell, D. (2017), ‘Surging wildfire activity in a grassland biome’, Geophysical Research Letters 44 (12), 5986–5993.
- 8Dunn et al. (2012) Dunn, R. J., Willett, K. M., Thorne, P. W., Woolley, E. V., Durre, I., Dai, A., Parker, D. E. and Vose, R. E. (2012), ‘Hadisd: a quality-controlled global synoptic report database for selected variables at long-term stations from 1973–2011’, Climate of the Past Discussions 8 (3).
