Detecting British Columbia Coastal Rainfall Patterns by Clustering Gaussian Processes
Forrest Paton, Paul D. McNicholas

TL;DR
This paper develops a clustering method based on Gaussian process covariance kernels to analyze and identify patterns in British Columbia's coastal rainfall data, revealing insights related to El Niño and La Niña phenomena.
Contribution
It introduces a novel clustering approach for functional data using Gaussian process covariance kernels, applied to coastal rainfall analysis.
Findings
Clustering reveals distinct rainfall patterns linked to climate phenomena.
The method can identify extreme weather events from rainfall data.
Insights are consistent with known climate patterns like El Niño and La Niña.
Abstract
Functional data analysis is a statistical framework where data are assumed to follow some functional form. This method of analysis is commonly applied to time series data, where time, measured continuously or in discrete intervals, serves as the location for a function's value. Gaussian processes are a generalization of the multivariate normal distribution to function space and, in this paper, they are used to shed light on coastal rainfall patterns in British Columbia (BC). Specifically, this work addressed the question over how one should carry out an exploratory cluster analysis for the BC, or any similar, coastal rainfall data. An approach is developed for clustering multiple processes observed on a comparable interval, based on how similar their underlying covariance kernel is. This approach provides interesting insights into the BC data, and these insights can be framed in terms…
| Parameter | Truth | Mean Estimate | Standard Error |
|---|---|---|---|
| 0.33 | 0.33 | 0 | |
| 0.67 | 0.67 | 0 | |
| 1 | 1.23 | 0.02 | |
| 3 | 3.08 | 0.03 | |
| 3 | 2.18 | 0.07 | |
| 1 | 1.43 | 0.09 |
| Parameter | Truth | Mean Estimate | Standard Error |
|---|---|---|---|
| 0.5 | 0.52 | 0.004 | |
| 0.5 | 0.48 | 0.004 | |
| 1 | 1.30 | 0.032 | |
| 2 | 2.10 | 0.018 | |
| 1 | 2.01 | 0.081 | |
| 2 | 2.05 | 0.082 |
| Weather Station | Years in Group Two |
|---|---|
| Vancouver | 1995 |
| Tofino | 1995 |
| Port Hardy | 1992, 1994, 1995, 1998, 2000 |
| Victoria | 1999 |
| Tofino | |
|---|---|
| Parameter | Mean Estimate |
| 0.90 | |
| 0.10 | |
| 0.365 | |
| 0.923 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Detecting British Columbia Coastal Rainfall Patterns by Clustering Gaussian Processes
Forrest Paton and Paul D. McNicholas
(Department of Mathematics and Statistics, McMaster University, Hamilton, Ontario, Canada.)
Abstract
Functional data analysis is a statistical framework where data are assumed to follow some functional form. This method of analysis is commonly applied to time series data, where time, measured continuously or in discrete intervals, serves as the location for a function’s value. Gaussian processes are a generalization of the multivariate normal distribution to function space and, in this paper, they are used to shed light on coastal rainfall patterns in British Columbia (BC). Specifically, this work addressed the question over how one should carry out an exploratory cluster analysis for the BC, or any similar, coastal rainfall data. An approach is developed for clustering multiple processes observed on a comparable interval, based on how similar their underlying covariance kernel is. This approach provides interesting insights into the BC data, and these insights can be framed in terms of El Niño and La Niña; however, the result is not simply one cluster representing El Niño years and another for La Niña years. From one perspective, the results show that clustering annual rainfall can potentially be used to identify extreme weather patterns.
Keywords: British Columbia; Clustering; Coastal Rainfall; El Niño; Extreme weather; Gaussian processes; La Niña; Mixture model.
1 Introduction
In contrast to predictable yearly seasonal changes, El Niño, a well studied teleconnection, does not occur at regular intervals. This is of particular interest for policy makers, businesses, and people who rely on calculable, foreseeable weather patterns. For example, seasonal changes can specifically alter food production plans, such as when to deploy fishing vessels or harvest crops. While El Niño is primarily categorized through warming temperatures in the eastern and equatorial Pacific Ocean, its effects can be seen around the globe through teleconnections. Teleconnections are an environmental phenomena that describe correlated large-scale atmospheric changes over non-contiguous geographic regions. While some teleconnections are well established, others rely on observing statistical irregularities (Gudmundson and Babkina, 2003; Ward et al., 2014); namely, El Niño’s effect on precipitation patterns. Sir Gilbert Walker, a 20th century English scientist, for example, first identified the link between Asian monsoons and Pacific coastal barometer readings (Gudmundson and Babkina, 2003). However, the study of teleconnections, such as precipitation patterns, is notoriously complex because of intricate spatial and temporal correlations. While spatial data is often modelled under the assumption that geographical points near one another share more information than those far apart, work has been done to model both nearby and remote geographical correlations. Hewitt et al. (2018) predict precipitation in Colorado by modelling both locally available data and remote Pacific Ocean sea surface temperatures. Understanding El Niño’s effect on distant precipitation can shed light on these patterns. Specifically, classifying irregular precipitation patterns in the Americas can help understand El Niño’s impact on local weather systems.
A question of particular interest is: which years exhibit distinct rainfall patterns? For prediction, knowing how Pacific Ocean changes affect rainfall can have significant implications (O’Gorman, 2015). A model for clustering yearly rainfall data will also give insight to research on the mechanistic properties of teleconnections. This has overlap in the statistical field of functional data clustering, where data are assumed to follow some functional form. Herein, mixture model-based clustering provides an effective approach to cluster data, and Gaussian Processes provide a model for rainfall. Much recent work has been done modelling climatological data with spatial and time dependence (Armal et al., 2018; Cabral et al., 2019; Gupta et al., 2016). Specifically, the use of Gaussian processes (GPs) gives a probabilistic starting point. A GP is a stochastic process that generalizes a finite-dimensional normal distribution to function space. GPs have been used to successfully solve complex non-linear regression and classification problems (Rasmussen, 2005; Roberts et al., 2013). When multiple functions exist on the same interval, usually compact and finite, it can be useful to classify them into a finite number of mutually exclusive groups. Here the process would be defined on the index set time, specifically the months of the year.
2 Background
2.1 Gaussian Process
Rasmussen (2005) defines a GP as: “…a collection of random variables, any finite number of which have a joint Gaussian distribution”. A GP creates a set of random variables evaluated at . In essence, a GP is a distribution over functions, i.e.,
[TABLE]
where and are the mean function and covariance kernel, respectively, is the function’s index, and is the function’s output, i.e., is a point in . A common GP, and the kernel considered in this paper, is defined with mean function 0 and squared exponential (SE) covariance function
[TABLE]
where and are hyper-parameters that control the shape of the process; specifically, controls the amount of variation in and , the length-scale parameter, controls the correlation. The SE kernel is a widely used (Rasmussen, 2005). One convenient property of the SE kernel is infinite differentiability, which is useful because the first derivative is needed for hyper-parameter estimation. Here, the use of the term hyper-parameter refers to a set of parameters that make up the “non-parametric model”. That is, the hyper-parameters only appear in the model’s prior and, as shown later in (3), are integrated out of the final model (posterior).
Although the SE is the most popular, different kernel functions can be used. Because the kernel is needed to populate a multivariate normal distribution covariance matrix, the kernel is restricted to produce positive semi-definite matrices. While the SE covariance kernel is a popular choice, modelling the shape of the covariance kernel is an open ended problem. One systematic solution can be formed by considering a Bayesian model selection framework as discussed in Rasmussen (2005). Duvenaud et al. (2013) also provide a solution by considering automatic kernel selection through searching over appropriate kernel structures. For the SE covariance kernel, controls the height or the amplitude of the GP and controls the correlation between observations. This kernel is used to construct a matrix, K, which will serve as the covariance matrix in a multivariate normal distribution introduced in the next section:
[TABLE]
While a GP is defined on the entire real line, we only observe a finite number of realizations , and corresponding , where . The vector is commonly called the input and represents the location of the the process, i.e., observation . The vector is referred to as the output, and is the function evaluated at location . This allows for a generalization to a multivariate normal distribution via \mathbf{f}(\mathbf{x})\sim\mathcal{N}\big{(}\textbf{0},\ \textbf{K}\big{)}. This is possible because marginalizing a Gaussian distribution is trivial: the resulting distribution is Gaussian and we can ignore the () pairs that are unobserved or missing. Here changing the kernel affects the shape of the function, effectively controlling the magnitude of which observations and are correlated. Formulating the problem in this way, we can see the kernel is our prior on the function space, and the (marginal) likelihood for the GP comes after conditioning on the realized points. As shown in the next section, the hyper-parameters are often estimated to maximize the likelihood of the GP.
2.2 Likelihood
The previous section introduced the kernel function and how it relates to a prior on function space and how the hyper-parameters affect the correlation between the input and outcome . Now, the likelihood for a GP will be introduced and strategies for choosing the hyper-parameters will be illustrated. From the definition of a GP, , which will be shown formally below. First, let , where K is the covariance matrix constructed from the SE kernel shown in (2).
The likelihood for a GP is conditioned on the observed values to obtain a marginal likelihood, using to denote the normal density function:
[TABLE]
We get the marginal for by using Bayes’ rule and integrating over :
[TABLE]
Taking the natural logarithm of (3) gives
[TABLE]
This likelihood can be broken down into three main components, the data fit term, model complexity term, and a constant term:
[TABLE]
The data fit and complexity component share an interesting tradeoff. For small length-scale values , the model will fit the data well and the data fit component will be small. However, points will not be considered “near” each other, resulting in a high model complexity. Conversely, if is large (suggesting no correlation between points), then the the complexity will be small but the data fit term will be large (Murphy, 2012). This is because the SE kernel will converge to , turning K into a diagonal matrix. Because GPs have these inherent penalty terms for over- and under-fitting, cross validation is generally not used to estimate kernel hyper-parameters.
2.3 Predictive Distribution
GPs are commonly used in supervised regression tasks for their ability to non-parametrically approximate complex functions and solve functional engineering problems (Bin and Wenlai, 2013). It is often of interest to infer the function’s value outside of the paired training data (). To do this, a predictive distribution can be constructed. Let be the unobserved outputs to be inferred at locations . The joint distribution can be derived through probabilistic terms,
[TABLE]
[TABLE]
[TABLE]
Note that (8) is the joint distribution of the observed pairs () and unobserved pairs (, ). The expected value for can be derived using conditional properties which leads to
[TABLE]
a complete derivation is given by Rasmussen (2005). Then, (9) can then be used to compute estimates for the value of the function at location .
2.4 Model-Based Clustering
Clustering, a.k.a. unsupervised classification, is an unsupervised machine learning task which attempts to classify (unlabelled) data points into distinct groups. Commonly, clustering is defined as assigning data into groups such that data in the same cluster are more similar to each other than to data in different clusters. Initially this definition seems intuitive; however, practically there are some problems. Namely, grouping each data point into its own cluster would satisfy this definition. Following several others (e.g., Tiedeman, 1955; Wolfe, 1963), McNicholas (2016a) provides a definition not based on similarity: “a cluster is a unimodal component within an appropriate finite mixture model”, where the word ‘appropriate’ requires consideration and, specifically, that the component densities have the necessary flexibility to fit the data (see McNicholas, 2016a, for further discussion). Whatever definition of a cluster one may prefer, many methods have been developed to tackle this problem of unsupervised learning. Model-based refers to using probability distributions to model the clusters (as opposed to hierarchical clustering, -means clustering, etc.).
2.5 Finite Mixture Model
The finite mixture model is a popular tool for (model-based) clustering — recent reviews are provided by Bouveyron and Brunet-Saumard (2014) and McNicholas (2016b). The density of a -component finite mixture model is
[TABLE]
where , with , is the th mixing proportion is the th component density, and are the component density-specific parameters, with and . The likelihood for in a model-based clustering paradigm, using a Gaussian mixture model, is given by
[TABLE]
where is the density of a multivariate Gaussian distribution with mean and covariance matrix .
2.6 Expectation-Maximization Algorithm
Model-based clustering requires estimating the unknown model parameters from the likelihood in (11). The expectation-maximization (EM) algorithm (Dempster et al., 1977) provides a good starting point for this problem. Each iteration of the EM algorithm starts by computing the expectation of the complete-data log-likelihood (E-step), then maximizes the conditional expectation of the complete-data log-likelihood (M-step). The E- and M-steps are iterated until some stopping rule is met. Consider a Gaussian model-based clustering complete-data likelihood, denoted by , where denotes all the parameters and the complete-data comprise the observed together with the missing labels defined so that and if observation belongs to component and otherwise. Now, for the Gaussian mixture model-based clustering paradigm — corresponding to the likelihood in (11) — we have complete-data likelihood
[TABLE]
In the E-step, we compute
[TABLE]
conditional on the current parameter updates (estimates). Next, in the M-step, the parameters are updated. This amounts to estimating the covariance matrix and mean vector for a Gaussian mixture model. In the M-step, the updates are:
[TABLE]
where . After parameter estimation is completed, the clustering results are expressed through the probabilities , i.e., is the probability that belongs to component . These soft probabilities are often converted into hard classifications via maximum a posteriori probabilities:
[TABLE]
Extensive details on model-based clustering and parameter estimation are given by McNicholas (2016a).
3 Model to Cluster Functional Data
3.1 Model Formulation
We have seen that the log-likelihood for a GP with the observed output vector and corresponding input vector was distributed according to a multivariate Gaussian distribution. When clustering GPs, the goal will be to find clusters that contain processes which have similar paths. The meaning of similar path refers not only to how close two processes’ values are but also to how similar their shapes are (smooth, wiggly, etc.). Now let us define the notation used for the model: the th GP will have output vector , input vector , and
[TABLE]
The density in (18) is the probability density function, i.e., , used as the component density for the finite mixture model
[TABLE]
where denotes the hyper-parameters for the th cluster. Because the likelihood of a GP is a Gaussian distribution with covariance matrix , the complete-data likelihood is given by
[TABLE]
where is the covariance matrix corresponding to cluster and is the Gaussian density with mean and covariance . An SE covariance kernel is used as the prior on the function space, i.e.,
[TABLE]
The goal is to recover the pairs of kernel hyper-parameters and the mixing parameters , and thence to estimate the latent variables .
3.2 GP Parameter Estimation
The first step is to estimate each GP’s kernel hyperparameters. Herein, the kernel hyper-parameters for the th GP is denoted by , . In this step, the maximized kernel hyper-parameters for each GP, and , are estimated. To find these maximized hyper-parameters, an MLE solution is found using gradient ascent, starting with the log-likelihood i.e.,
[TABLE]
The derivative is then taken w.r.t. to the kernel hyper-parameters
[TABLE]
where . The partial derivatives for and are calculated from the first derivatives of the kernel function:
[TABLE]
After finding the gradient for the likelihood, a gradient ascent algorithm is used to find a sufficiently close solution. This algorithm is given by repeating the following updates until a stopping rule is statisfied:
[TABLE]
where superscript denotes iteration . After maximizing the kernel hyper-parameters, we have where denotes the maximized kernel hyper-parameters for the first GP, denotes the hyper-parameters for the second GP, and so on.
3.3 Cluster Parameter Estimation
The model seeks to cluster the processes and make inferences on the latent variables. A modified EM approach is used. First, the mixing proportions and cluster hyper-parameters are initialized randomly, i.e., we initialize , , and , where , , and randomly. Next, each GP’s () responsibilities are calculated for each of the clusters to get responsibilities
[TABLE]
Note that represents the responsibility, or conditional expected value, of the th process belonging to the th cluster, and . After the responsibilities are calculated, the mixing proportions are conditionally maximized on these responsibilities. The update for the th mixing proportion is
[TABLE]
where is the responsibility for cluster and . The cluster-specific kernel hyper-parameters, i.e., and , are then updated, where is the length-scale parameter for cluster and is the height parameter for cluster :
[TABLE]
i.e., we are weighting the maximized hyper-parameters, and , by their respective cluster responsibility .
This scheme, for calculating the responsibilities then updating the cluster parameters, is repeated until some stopping rule is met. In this case, when the change in expected complete-data log likelihood at iteration becomes small, i.e., until
[TABLE]
where is the expectation of the complete-data log likelihood.
3.4 Numerical Issues
At each iteration of gradient ascent, the GP’s likelihood gradient needs to be computed:
[TABLE]
This operation requires inverting a matrix . Inverting large matrices is notoriously computationally unstable, especially when the matrices are not full rank (or sufficiently close) and eigenvalues become very large or very small. One solution is to first decompose the matrix into lower-triangular form, i.e., and then invert via . We use the R (R Core Team, 2018) package FastGP (Gopalan and Bornn, 2016), which implements the package RcppEigen (Bates and Eddelbuettel, 2013) to invert the lower-triangular matrix .
4 Simulation Studies
This section will first look at two cases of simulated data. The hyper-parameters and the mixing proportions will vary based on the simulated sets. The method developed in the previous section will then be applied to recover the hyper-parameters and classify each GP into their respective groups. For the two simulation studies, noiseless squared exponential covariance functions will be used, which in effect means that a perfectly interpolated, noiseless process is observed for the simulation.
4.1 Simulation I
The first simulation starts with generating 30 GPs. The processes are generated on the interval with evenly spaced realizations, i.e., each process has seven values spread evenly on the interval. In all, 10 of the 30 GPs are generated from a multivariate normal distribution using the R package mvtnorm (Genz et al., 2009). Where the covariance matrix was constructed using an SE covariance kernel with hyper-parameters and . The remaining 20 were generated similarly but with a covariance matrix constructed with hyper-parameters and .
After running the algorithm described in Section 3, estimates for the set of hyper-parameters and mixing proportion were recorded (Table 1). The mixing proportion is easily identified and accurately estimated. Using the MAP classification, the algorithm was able to correctly classify each process. Table 1 gives the mean parameter estimates and standard errors. This was done by randomly starting the algorithm 10 times — i.e., initializing the parameters from a random uniform draw — and calculating the mean and standard error from these 10 starts.
Once the processes are coloured by their MAP classification (Figure 1), one can visually see the difference between the two process clusters. The processes (, blue) with the larger length-scale are smoother compared to those generated from the process with length-scale .
The length-scale parameter was also readily recovered in this scenario, producing similar estimates to the true hyper-parameter. The hyper-parameter , which, recall, controls the function’s variance (in , the function’s output), is not near the true parameter value. One reason for this could because this cluster has a comparatively small length-scale , which models the relative correlation between the points.
4.2 Simulation II
The second simulation was carried out by first generating 20 GPs. Ten were generated from an SE covariance kernel with hyper-parameters and . The remaining 10 GPs were generated from a covariance kernel with hyper-parameters and . Similarly to Simulation I, the GPs were generated first by constructing the covariance matrix, then by generating random samples using the R package mvtnorm. In all, equally spaced observed values were recorded for each GP (Figure 2). Based on the plot in Figure 2, there seems to be no clear distinction or natural groups of processes. After coloring the processes by their (correct) classifications (Figure 2), there is still ambiguity about the two groups separation.
Again, the method accurately recovers the mixing parameter and length-scale (Table 2). However, for cluster 1, the length-scale is slightly overestimated and the method inflates to account for the variance in the function’s output. The parameter estimates were calculated by randomly starting the algorithm 10 times and using the mean. Notably, the processes look very similar between groups, so much so that this solution might seem unconvincing if true group labels were unknown.
5 Coastal Rainfall in British Columbia
This section will look at historical monthly precipitation data for coastal regions of British Columbia (BC), Canada. These data are recorded by the Government of Canada and collected from the weather stations: Tofino A, Vancouver International Airport, Port Hardy A, and Victoria International Airport (Figure 3). These data were derived from the following resources available in the public domain.
Total monthly precipitation was recorded from January 1991 to December 2000. The ten years will be treated as independent GPs and each station will be treated separately for clustering purposes. The objective is to find cluster solutions for each station and compare the resulting clusters — because there are only 10 processes per station, cluster solutions were not explored. The data were processed first by removing the seasonality, i.e., the residual precipitation after a ten year monthly average was removed. The data were then centered and scaled such that the mean is [math] and standard deviation is . These analyses will use the SE covariance function discussed earlier. Thus, will be estimated and modelled holding constant. Note that the observed outputs are (artificially) connected by lines for illustrative purposes (Figure 4).
Figure 4 illustrates the scaled Tofino data plotted by year. Next, the maximized hyper-parameter is fitted and results are shown in Figure 6. Because of multi-modal likelihoods in the gradient ascent approach to hyper-parameter fitting, a grid search was used to choose the optimal value. An example likelihood from this is shown in Figure 5.
Instead of applying the EM algorithm to estimate mixture parameters, the relatively small number of processes for each station meant that an exhaustive search could be used. That is, each two-group combination of the ten years was considered. This amounted to maximizing likelihoods and selecting the model with the highest (complete-data) likelihood. Using these results, there seem to be two groups emerging, one group with a smaller length-scale and one group with a larger length-scale parameter. Group two has a larger group length scale parameter () as compared with group one (). Group two is labelled the “irregular group” as, with the exception of Port Hardy, the mixing proportion suggests only about of the data come from this group (see Table 4). In this case, the years with a smaller length-scale indicate monthly rainfall is less correlated month-to-month than those with a larger length-scale.
Vancouver, Tofino, and Port Hardy all had the year 1995 assigned to group two (Table 3). Additionally, Port Hardy had the years 1992, 1994, 1998, and 2000 assigned to group two. The years assigned to group two in at least one weather station (1992, 1994, 1995, 1998, 1999, 2000) share an interesting characteristic related to Pacific Ocean temperatures. Specifically, El Niño and La Niña are events classified using the Oceanic Niño Index (ONI), an index that measures irregular ocean temperature changes over a three month moving average. An El Niño (irregularly warm) event immediately preceded a La Niña (irregularly cold) event twice during the studied time period. Once in 1995 and again in 1998. Both times the year started with warm enough ocean temperatures to classify it as an El Niño period, and by the end of the calendar year the ocean had cooled enough to be classified as La Niña (NOAA, 2019). The other years in the irregular cluster also differed in terms of regular Pacific Ocean temperatures. Generally, years clustered into the irregular group tended to correspond with falling Equatorial Pacific Ocean temperatures as shown in Figure 7A). These irregular years also coincided with years where the middle region of the Pacific Ocean (5S-5N and 170-120W) started the calendar year warmer than it ended, as illustrated in Figure 7B).
From further consideration of the estimated cluster parameters in Table 4, Cluster 2’s years tend towards a larger length-scale compared to Cluster 1. This suggests that in years where El Niño changes to La Niña, rainfall patterns change more smoothly (i.e., are more correlated) across months as opposed to regular weather years.
6 Discussion
A method for clustering functional data has been introduced to cluster coastal rainfall data from BC. First, the hyper-parameters that make up a GP were optimized through a gradient-based maximum likelihood optimizer. Because of computational feasibility, parameter estimates were obtained by considering every two-group combination of years and choosing the maximum likelihood. For the simulation studies, the usual EM algorithm for finite Gaussian mixture models was modified. Instead of maximizing the standard covariance matrix, hyper-parameters for a kernel function that measures correlation in were optimized. The covariance matrix was then constructed from the optimal kernel parameters. Herein, we fix as known; however, if one were to consider data with more processes per station then could be considered. Notably missing, or incomplete, data can easily be handled by the proposed approach, either by using the predictive distribution to impute the missing data or by ignoring the missing values. This is possible because the model makes inference on the underlying hyper-parameters of the kernel, and not the particular index set of the process.
Two simulation studies were performed. When GPs from different distributions had a large difference in their length-scale parameter (i.e., 1 versus 3), parameters were readily recovered. When GPs had similar length-scale parameters, was recovered but tended to shrink towards a common estimate between both clusters. The application to the rainfall data from the coastal region of B.C discovered two groups of years, one which contained “regular” years and the other “irregular” years. The irregular years consisted of years where there was a transition from El Niño to La Niña, or more generally cooling Pacific Ocean temperatures. These results suggest El Niño events have some effect on kernel hyper-parameters. The data were standardized to have a zero mean function, implying correlation between rainfall patterns month-to-month can discriminate some El Niño events (as apposed to magnitude of rainfall).
The most obvious direction for future work is to apply the approach developed herein to other rainfall data. In terms of the BC coastal rainfall data, one could carry out a clustering of all locations combined to see whether some years stand out from others. The BC data could be treated as matrix variate data and clustered accordingly, perhaps after the fashion of Gallaugher and McNicholas (2018). In both cases, it is of interest to observe whether clusters emerge.
Acknowledgements
This work was supported by the Canada Research Chairs program and an E.W.R. Steacie Memorial Fellowship.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Armal et al. (2018) Armal, S., N., Devineni, R., Khanbilvardi (2018) Trends in Extreme Rainfall Frequency in the Contiguous United States: Attribution to Climate Change and Climate Variability Modes. J. Climate 31 , 369–385.
- 2Bates and Eddelbuettel (2013) Bates, D. and Eddelbuettel, D. (2013). Fast and Elegant Numerical Linear Algebra Using the Rcpp Eigen Package. Journal of Statistical Software , 52 (5), 1–24.
- 3Berrocal (2016) Berrocal, V. J. (2016). Identifying trends in the spatial errors of a regional climate model via clustering. Environmetrics , 27 , 90–102.
- 4Bin and Wenlai (2013) Bin, S. and Wenlai, Y. (2013). Application of gaussian process regression to prediction of thermal comfort index. In 2013 IEEE 11th International Conference on Electronic Measurement Instruments , volume 2, pp. 958–961.
- 5Bouveyron and Brunet-Saumard (2014) Bouveyron, C. and Brunet-Saumard, C. (2014). Model-based clustering of high-dimensional data: A review. Computational Statistics and Data Analysis 71 , 52–78.
- 6Cabral et al. (2019) Cabral, R, Ferreira, A, Friederichs, P. (2019). Space–time trends and dependence of precipitation extremes in North‐Western Germany. Environmetrics 2019;e 2605.
- 7Dempster et al. (1977) Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B , 39 , 1–38.
- 8Duvenaud et al. (2013) Duvenaud, D., Lloyd, J., Grosse, R., Tenenbaum, J., Zoubin, G. (2013). Structure Discovery in Nonparametric Regression through Compositional Kernel Search. Proceedings of the 30th International Conference on Machine Learning , 28 , 1166–1174.
