Regularized Spatial Maximum Covariance Analysis
Wen-Ting Wang, Hsin-Cheng Huang

TL;DR
This paper introduces a regularized approach to maximum covariance analysis in climate studies, enhancing interpretability of coupled spatial patterns through smoothness and sparsity constraints, with an efficient algorithm for implementation.
Contribution
It proposes a novel regularization method for MCA that improves pattern interpretability by incorporating spatial smoothness and sparsity, solved via an efficient ADMM algorithm.
Findings
Enhanced interpretability of coupled climate patterns.
Successful application to precipitation and sea surface temperature data.
Demonstrated efficiency of the proposed algorithm.
Abstract
In climate and atmospheric research, many phenomena involve more than one meteorological spatial processes covarying in space. To understand how one process is affected by another, maximum covariance analysis (MCA) is commonly applied. However, the patterns obtained from MCA may sometimes be difficult to interpret. In this paper, we propose a regularization approach to promote spatial features in dominant coupled patterns by introducing smoothness and sparseness penalties while accounting for their orthogonalities. We develop an efficient algorithm to solve the resulting optimization problem by using the alternating direction method of multipliers. The effectiveness of the proposed method is illustrated by several numerical examples, including an application to study how precipitations in east Africa are affected by sea surface temperatures in the Indian Ocean.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStatistical and numerical algorithms · Sparse and Compressive Sensing Techniques · Soil Geostatistics and Mapping
Regularized Spatial Maximum Covariance Analysis
Wen-Ting Wang
Hsin-Cheng Huang
Institute of Statistics, National Chiao Tung University
Institute of Statistical Science, Academia Sinica
Abstract
In climate and atmospheric research, many phenomena involve more than one meteorological spatial processes covarying in space. To understand how one process is affected by another, maximum covariance analysis (MCA) is commonly applied. However, the patterns obtained from MCA may sometimes be difficult to interpret. In this paper, we propose a regularization approach to promote spatial features in dominant coupled patterns by introducing smoothness and sparseness penalties while accounting for their orthogonalities. We develop an efficient algorithm to solve the resulting optimization problem by using the alternating direction method of multipliers. The effectiveness of the proposed method is illustrated by several numerical examples, including an application to study how precipitations in east Africa are affected by sea surface temperatures in the Indian Ocean.
keywords:
Singular value decomposition, Lasso, smoothing splines, orthogonal constraint, alternating direction method of multipliers
††journal: Environmetrics
1 Introduction
Many climate and atmospheric phenomena involve more than one meteorological spatial processes covarying in space. It is of interest to find dominant coupled patterns among these processes. For example, variations of sea surface temperatures (SSTs) in the Indian Ocean may affect precipitations in nearby countries in Africa, particularly over sensitive agricultural regions, and hence threaten the economies and livelihoods of these countries. Consequently, many studies have been conducted on the relationship between the SST and precipitation by analyzing their coupled patterns (e.g. Reason, 2002; Morioka et al., 2012; Omondi et al., 2013). A commonly used method is maximum covariance analysis (MCA), which seeks important spatial patterns that explain the maximum amount of covariance between the two processes by using the singular value decomposition (SVD) of the cross-covariance matrix (Tucker, 1958).
However, the leading coupled patterns obtained by MCA may sometimes be too noisy to be physically interpretable when the signal-to-noise ratio is low. Many approaches have been proposed to improve MCA. For example, Salim et al. (2005) and Salim and Pawitan (2007) proposed penalized likelihood approaches using roughness penalties to promote smoothness of the leading coupled patterns in space. However, these methods tend to capture global features but not localized ones. On the other hand, Witten et al. (2009) considered canonical correlation analysis with an constraint, and Lee et al. (2011) proposed a penalized likelihood method with the SCAD penalty (Fan and Li, 2001) to facilitate sparse patterns. However, these methods cannot be applied to continuous spatial domains with data observed at irregularly spaced locations. Additionally, all these methods ignore the orthogonal constraints in MCA patterns.
In this paper, we propose a regularization approach of MCA that incorporates smoothness and localized features in dominant coupled patterns. The proposed method, called spatial MCA (abbreviated as SpatMCA), is applicable to data measured irregularly in space. In addition, the resulting estimates can be effectively computed using the alternating direction method of multipliers (ADMM) (Boyd et al., 2011).
The remainder of this paper is organized as follows. In Section 2, we introduce the proposed SpatMCA method, including dominant coupled patterns estimation and spatial cross-covariance function estimation. Our ADMM algorithm for computing the SpatMCA estimate is provided in Section 3. Numerical experiments that illustrate the superiority of SpatMCA and an application to study the relationship between sea surface temperature and precipitation datasets are presented in Section 4.
2 The Proposed Method
Consider a sequence of uncorrelated, zero-mean, bivariate -continuous spatial processes on spatial domains and ,
[TABLE]
which have a common spatial covariance function ; for . According to Azaïez and Belgacem (2015), can be decomposed as , where are nonnegative singular values with , and and are the two corresponding sets of orthonormal basis functions. The decomposition is similar to the Karhunen-Loéve expansion (Karhunen, 1947; Loève, 1978) for a univariate spatial process. Suppose we observe data with added noise at the spatial locations for , according to
[TABLE]
where , and and are mutually uncorrelated. Assume , and denote the cross-covariance matrix between and by . Let be the SVD of , where , is a matrix with the -th element , and is a matrix with the -th element . We aim to identify the first dominant spatial coupled patterns and with large for processes and , as well as to estimate .
Let for . The sample cross-covariance matrix of and is . Then the MCA estimates of and obtained by the SVD of are and , the -th left and right singular vectors of , for . Let and be and matrices formed by the first left and right singular vectors of . Then solves the following constrained optimization problem (Lee and Cichocki, 2014):
[TABLE]
where and . However, may suffer from high estimation variability when or is large, is small, or or is large. Consequently, the patterns of may be too noisy to be physically interpretable. Additionally, for continuous spatial domains and , we also need to estimate at locations and , where data may be unavailable.
2.1 Regularized Spatial MCA
To reduce high estimation variability of MCA while controlling bias, our main idea is to introduce some spatial structure. We propose a regularization approach by maximizing the following objective function:
[TABLE]
over and , subject to and , where
[TABLE]
is a roughness penalty, , , , and are nonnegative smoothness parameters, and and are nonnegative sparseness parameters. Since the patterns of and could be very different, we allow and . Note that is the smoothing spline penalty, designed to enhance smoothness of and , and the Lasso penalty (Tibshirani, 1996) is applied to seek sparse patterns by shrinking and toward zero. The combination of the smoothness and sparseness penalties was shown by Wang and Huang (2017) to be effective in obtaining smooth and localized patterns for a univariate spatial process. Denote and as the maximizers of (2). When is larger, become smoother, and vice versa. When is larger, become more localized by forcing more elements of to be zero. Similar results can be applied to and for . On the other hand, when , the estimates reduce to the MCA estimates.
According to the smoothing spline theory (Green and Silverman, 1994), and are natural cubic splines and thin-plate splines for and with knots at and , respectively. Specifically,
[TABLE]
where for ,
[TABLE]
and the coefficients and for satisfy
[TABLE]
Here , , is a matrix with the -th element , and is a matrix with the -th row for . Therefore, and in (3) and (4) can be expressed in terms of and , respectively.
The roughness penalties of and can also be written as
[TABLE]
where is a known matrix determined only by for (Green and Silverman, 1994). Therefore, from (2) and (5), the proposed estimate of can be simplified by maximizing the following objective function:
[TABLE]
subject to and . We call the proposed method based on (6) SpatMCA. Given ,, the estimates of can be directly calculated by (3) and (4). Note that the SpatMCA estimate of (6) reduces to a sparse CCA estimate of Witten et al. (2009) if , for , and the orthogonal constraints of and are dropped.
2.2 Estimation of Cross-Covariance Function
To estimate , we also have to estimate . Given with and , the proposed estimate of is
[TABLE]
where , and \|\bm{M}\|_{F}=\Big{(}\displaystyle\sum_{i,j}m^{2}_{ij}\Big{)}^{1/2} is the Frobenius norm of a matrix . Then, the proposed estimate of is
[TABLE]
2.3 Tuning Parameter Selection
An -fold cross-validation (CV) is applied to select the tuning parameters , , and . First, we randomly decompose the index set into parts that are as close to the same size, , as possible. Let be the sub-matrix of corresponding to the -th part. For , we treat as the validation data, and we obtain the estimate of for based on the remaining data using the proposed method (6), where is a candidate index set. Then the proposed CV criterion is
[TABLE]
where , and is the estimate of from (7) with replaced by .
Owing to the high computation cost to select simultaneously for each , we recommend an effective two-step procedure for selecting them. Specifically, we first select and with by
[TABLE]
and then select and by
[TABLE]
Finally, we select the rank of by computing the CV values of (9) for , evaluated at the four selected tuning parameter values until no further reduction of the CV value is obtained. That is,
[TABLE]
3 Computation Algorithm
Let be a matrix with the -th element . The objective function (6) can be rewritten as
[TABLE]
subject to , where \bm{\Theta}={\left(\begin{array}[]{cc}-\tau_{1u}\bm{\Omega}_{1}&\bm{S}_{12}/2\\ \bm{S}^{\prime}_{12}/2&-\tau_{1v}\bm{\Omega}_{2}\\ \end{array}\right)}. The maximizer of (13), consisting of the orthogonal constraint and the Lasso penalty, is too complex to solve directly. We adopt the ADMM algorithm (originated by Gabay and Mercier, 1976) by decomposing the constrained optimization problems into small subproblems that can be efficiently handled. The readers are referred to Boyd et al. (2011) for more details regarding ADMM.
First, we transform (13) into the following equivalent form by adding parameter matrices and :
[TABLE]
subject to , and a new constraint , where is the -th element of , , and and are and sub-matrices of formed by the first and the last rows of , respectively. The resulting augmented Lagrange function is
[TABLE]
subject to , where and are matrices of Lagrange multipliers, and is a penalty parameter to promote convergence. Then the ADMM steps at the -th iteration have the following closed formed expressions:
[TABLE]
where
[TABLE]
is the -th element of , is the SVD of for , and are and sub-matrices of corresponding to and , and and are and sub-matrices of corresponding to and . Note that must be chosen large enough to ensure that in (14) is positive-definite.
4 Numerical Examples
This section contains several simulation examples in one-dimensional and two-dimensional spatial domains and an application of SpatMCA to a real dataset. We compared the performance of the proposed SpatMCA with three other methods: (1) MCA (); (2) SpatMCA with the smoothness penalties only (); (3) SpatMCA with the sparseness penalties only (), in terms of the following loss function:
[TABLE]
Throughout this section, we applied the proposed SpatMCA method and the ADMM algorithm given by (14)–(18) to compute the SpatMCA estimates with being ten times the maximum singular value of . Additionally, the stopping criterion for the ADMM algorithm is
[TABLE]
4.1 A One-Dimensional Experiment
We generated data from (1) with , ,
[TABLE]
, , equally spaced in , and
[TABLE]
where , , , and are normalization constants such that for . We considered three pairs of , and applied the proposed SpatMCA with and selected by (12). For each case, we applied the 5-fold CV of (12) to select among values of and (including [math] and the other 20 values equally spaced on the log scale from to ) and values of and (including [math] and the other 10 values equally spaced on the log scale from to ).
Figures 1 and 2 show the estimates of and , respectively, for the four methods based on three different combinations of singular values. Each case contains four estimated functions based on four randomly generated datasets. Not surprisingly, the MCA estimates considering no spatial structure are very noisy, particularly when the signal-to-noise ratio is small. Adding only the smoothness penalties (i.e., ) reduces noise, but introduces some bias. On the other hand, adding only the sparseness penalties (i.e, ) does not reduce much noise, despite that the estimated and are forced to be zeros at some locations. Our SpatMCA estimates generally reproduce the targets with little noise for all cases even for the small signal-to-noise ratio, indicating the effectiveness of regularization.
The cross-covariance function estimates for the four methods based on a randomly generated dataset are shown in Figure 3. The proposed SpatMCA can be seen to perform better than the other methods for all cases. Figure 4 shows boxplots of the four methods in terms of the loss function (19) based on simulation replicates, which further confirms the superiority of SpatMCA.
4.2 A Two-Dimensional Experiment
For a two-dimensional experiment, we generated data according to (1) with , , , ,
[TABLE]
for , equally spaced in , and equally spaced in . Here , , and are given by (20), (21), (22) and (23) with , respectively. We considered three pairs of , and applied the proposed SpatMCA with and selected by (12), resulting in different combinations. Similar to the previous subsection, we applied the 5-fold CV of (12) to select among values of and (including [math] and the other 20 values equally spaced on the log scale from to ) and values of and (including [math] and the other 10 values equally spaced on the log scale from to ).
Figures 5 and 6 show the estimates of and , respectively, for the four methods based on randomly selected data generated from three different combinations of singular values. Figure 7 shows the performance of the four methods in terms of the loss function (19) based on 50 simulation replicates. Similar to the one-dimensional example, SpatMCA outperforms all the other methods in all cases.
4.3 An Application to Sea Surface Temperature and Precipitation Datasets
We applied the proposed SpatMCA and MCA to investigate how precipitations in eastern Africa are affected by SSTs in the Indian Ocean and compared the differences between the two methods. The SST data are monthly averages (in degree Celsius) provided by the Met Office Marine Data Bank (available at http://www.metoffice.gov.uk/hadobs/hadisst/). The precipitation data are monthly averages (in mm) provided by the Earth System Research Laboratory, Physical Science Division of the National Oceanic and Atmospheric Administration (available at http://www.esrl.noaa.gov/psd/). Both datasets are on degree latitude by 1 degree longitude equiangular grid cells. As in Omondi et al. (2013), we considered a region of the Indian Ocean between latitudes N and S and between longitudes E and E for the SST dataset, and we chose a region of eastern Africa between N and S and between longitudes E and E for the precipitation dataset. We used the data observed from January 2011 to December 2015. Let and be the vectors of (1) corresponding to SST in the Indian Ocean and precipitation in eastern Africa. In this example, , , and .
First, the SST data and the precipitation data were detrended by subtracting their individual average for a given cell and a given month. Then, the data were randomly split into two parts as the training data and the validation data. We applied SpatMCA to the training data with selected by of , where 21 values of and (including [math] and the other 20 values equally spaced on the log scale from to ) and 21 values of and (including [math] and the other values equally spaced on the log scale from to ) were selected by using 5-fold CV of (10) and (11).
The best CV values with respect to for both methods are shown in Figure 8. Clearly, both methods selected . Figure 9 shows the first dominant coupled patterns of SST and precipitation obtained from SpatMCA and MCA. While both methods produce similar patterns, the SST pattern obtained by MCA is much noisier. Figure 10 shows two time series of the first maximum covariance variables, and , which are the projections of the training data for , onto and , respectively. As shown in the figure, the first maximum covariance variables of SST and precipitation are highly correlated. Indeed, the Pearson correlation coefficient between these two series is 0.59 for SpatMCA and 0.63 for MCA, showing the importance of these patterns.
We further used the validation data to compare the performance between SpatMCA and MCA in terms of the average squared error (ASE), , where is the sample cross-covariance matrix of the validation data, and is a generic estimate of . The resulting ASE for MCA is , which is larger than for SpatMCA. Figure 11 shows the ASEs with respect to for both SpatMCA and MCA, which further demonstrate the superiority of SpatMCA over MCA.
Acknowledgements
This research was supported in part by ROC Ministry of Science and Technology grant MOST 103-2118-M-001-007-MY3.
References
- Azaïez and Belgacem (2015)
M Azaïez and F Ben Belgacem.
Karhunen–loève’s truncation error for bivariate functions.
Computer Methods in Applied Mechanics and Engineering, 290:57–72, 2015.
- Boyd et al. (2011)
Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein.
Distributed optimization and statistical learning via the alternating direction method of multipliers.
Foundations and Trends in Machine Learning, 3:1–124, 2011.
- Fan and Li (2001)
Jianqing Fan and Runze Li.
Variable selection via nonconcave penalized likelihood and its oracle properties.
Journal of the American statistical Association, 96(456):1348–1360, 2001.
- Gabay and Mercier (1976)
Daniel Gabay and Bertrand Mercier.
A dual algorithm for the solution of nonlinear variational problems via finite element approximation.
Computer and Mathematics with Applications, 2:17–40, 1976.
- Green and Silverman (1994)
P.J. Green and B.W. Silverman.
Nonparametric regression and generalized linear model: a roughness penalty approach.
Chapman and Hall/CRC, London, 1994.
- Karhunen (1947)
Kari Karhunen.
Über lineare methoden in der wahrscheinlichkeitsrechnung.
Annales Academiæ Scientiarum Fennicæ Series A, 37:1–79, 1947.
- Lee and Cichocki (2014)
Namgil Lee and Andrzej Cichocki.
Big data matrix singular value decomposition based on low-rank tensor train decomposition.
In Advances in Neural Networks–ISNN 2014, pages 121–130. Springer, Switzerland, 2014.
- Lee et al. (2011)
Woojoo Lee, Donghwan Lee, Youngjo Lee, and Yudi Pawitan.
Sparse canonical covariance analysis for high-throughput data.
Statistical Applications in Genetics and Molecular Biology, 10(1), 2011.
- Loève (1978)
Michel Loève.
Probability theory.
Springer-Verlag, New York, 1978.
- Morioka et al. (2012)
Yushi Morioka, Tomoki Tozuka, Sebastien Masson, Pascal Terray, Jing-Jia Luo, and Toshio Yamagata.
Subtropical dipole modes simulated in a coupled general circulation model.
Journal of Climate, 25(12):4029–4047, 2012.
- Omondi et al. (2013)
P Omondi, JL Awange, LA Ogallo, J Ininda, and E Forootan.
The influence of low frequency sea surface temperature modes on delineated decadal rainfall zones in eastern africa region.
Advances in Water Resources, 54:161–180, 2013.
- Reason (2002)
CJC Reason.
Sensitivity of the southern african circulation to dipole sea-surface temperature patterns in the south indian ocean.
International Journal of Climatology, 22(4):377–393, 2002.
- Salim and Pawitan (2007)
Agus Salim and Yudi Pawitan.
Model-based maximum covariance analysis for irregularly observed climatological data.
Journal of agricultural, biological, and environmental statistics, 12(1):1–24, 2007.
- Salim et al. (2005)
Agus Salim, Yudi Pawitan, and K Bond.
Modelling association between two irregularly observed spatiotemporal processes by using maximum covariance analysis.
Journal of the Royal Statistical Society, Series C, 54(3):555–573, 2005.
- Tibshirani (1996)
Robert Tibshirani.
Regression shrinkage and selection via the lasso.
Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.
- Tucker (1958)
Ledyard R Tucker.
An inter-battery method of factor analysis.
Psychometrika, 23(2):111–136, 1958.
- Wang and Huang (2017)
Wen-Ting Wang and Hsin-Cheng Huang.
Regularized principal component analysis for spatial data.
Journal of Computational and Graphical Statistics, 26:14–25, 2017.
- Witten et al. (2009)
Daniela M Witten, Robert Tibshirani, and Trevor Hastie.
A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.
Biostatistics, 10(3):515–534, 2009.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Azaïez and Belgacem (2015) M Azaïez and F Ben Belgacem. Karhunen–loève’s truncation error for bivariate functions. Computer Methods in Applied Mechanics and Engineering , 290:57–72, 2015.
- 2Boyd et al. (2011) Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning , 3:1–124, 2011.
- 3Fan and Li (2001) Jianqing Fan and Runze Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American statistical Association , 96(456):1348–1360, 2001.
- 4Gabay and Mercier (1976) Daniel Gabay and Bertrand Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computer and Mathematics with Applications , 2:17–40, 1976.
- 5Green and Silverman (1994) P.J. Green and B.W. Silverman. Nonparametric regression and generalized linear model: a roughness penalty approach . Chapman and Hall/CRC, London, 1994.
- 6Karhunen (1947) Kari Karhunen. Über lineare methoden in der wahrscheinlichkeitsrechnung. Annales Academiæ Scientiarum Fennicæ Series A , 37:1–79, 1947.
- 7Lee and Cichocki (2014) Namgil Lee and Andrzej Cichocki. Big data matrix singular value decomposition based on low-rank tensor train decomposition. In Advances in Neural Networks–ISNN 2014 , pages 121–130. Springer, Switzerland, 2014.
- 8Lee et al. (2011) Woojoo Lee, Donghwan Lee, Youngjo Lee, and Yudi Pawitan. Sparse canonical covariance analysis for high-throughput data. Statistical Applications in Genetics and Molecular Biology , 10(1), 2011.
