Regularized Spatial Maximum Covariance Analysis

Wen-Ting Wang; Hsin-Cheng Huang

arXiv:1705.02716·stat.ME·May 9, 2017

TL;DR

This paper introduces a regularized approach to maximum covariance analysis in climate studies, enhancing interpretability of coupled spatial patterns through smoothness and sparsity constraints, with an efficient algorithm for implementation.

Contribution

It proposes a novel regularization method for MCA that improves pattern interpretability by incorporating spatial smoothness and sparsity, solved via an efficient ADMM algorithm.

Findings

1. Enhanced interpretability of coupled climate patterns.

2. Successful application to precipitation and sea surface temperature data.

3. Demonstrated efficiency of the proposed algorithm.

Abstract

In climate and atmospheric research, many phenomena involve more than one meteorological spatial process covarying in space. To understand how one process is affected by another, maximum covariance analysis (MCA) is commonly applied. However, the patterns obtained from MCA may sometimes be difficult to interpret. In this paper, we propose a regularization approach to promote spatial features in dominant coupled patterns by introducing smoothness and sparseness penalties while accounting for their orthogonality. We develop an efficient algorithm to solve the resulting optimization problem by using the alternating direction method of multipliers. The effectiveness of the proposed method is illustrated by several numerical examples, including an application to study how precipitation in East Africa is affected by sea surface temperatures in the Indian Ocean.

Equations

\{(\eta_{1i}(\bm{s}_{1}),\eta_{2i}(\bm{s}_{2})):\bm{s}_{1}\in D_{1},\bm{s}_{2}\in D_{2}\};\quad i=1,\dots,n,

\left(\begin{array}{c}\bm{Y}_{1i}\\ \bm{Y}_{2i}\end{array}\right)=\left(\begin{array}{c}\bm{\eta}_{1i}\\ \bm{\eta}_{2i}\end{array}\right)+\left(\begin{array}{c}\bm{\epsilon}_{1i}\\ \bm{\epsilon}_{2i}\end{array}\right);\quad i=1,\dots,n,

\max_{\bm{U},\bm{V}}\mathrm{tr}(\bm{U}^{\prime}\bm{S}_{12}\bm{V})\quad\mbox{subject to $\bm{U}^{\prime}\bm{U}=\bm{V}^{\prime}\bm{V}=\bm{I}_{K}$},

\mathrm{tr}(\bm{U}^{\prime}\bm{S}_{12}\bm{V})-\sum_{k=1}^{K}\big\{\tau_{1u}J(u_{k})+\tau_{2u}\|\bm{u}_{k}\|_{1}+\tau_{1v}J(v_{k})+\tau_{2v}\|\bm{v}_{k}\|_{1}\big\},

J(u)=\sum_{z_{1}+\dots+z_{d}=2}\int_{\mathbb{R}^{d}}\left(\frac{\partial^{2}u(\bm{s})}{\partial x_{1}^{z_{1}}\cdots\partial x_{d}^{z_{d}}}\right)^{2}d\bm{s},

\hat{u}_{k}(\bm{s}_{1})

\hat{v}_{k}(\bm{s}_{2})

g(r)=\left\{\begin{array}{ll}\displaystyle\frac{1}{16\pi}r^{2}\log{r};&\mbox{if $d=2$,}\\ \displaystyle\frac{\Gamma(d/2-2)}{16\pi^{d/2}}r^{4-d};&\mbox{if }d=1,3,\\ \end{array}\right.

\left(\begin{array}{cc}\bm{G}_{1}&\bm{E}_{1}\\ \bm{E}^{\prime}_{1}&\bm{0}\\ \end{array}\right)\left(\begin{array}{c}\bm{a}_{1}\\ \bm{b}_{1}\end{array}\right)=\left(\begin{array}{c}\hat{\bm{u}}_{k}\\ \bm{0}\end{array}\right)\quad\mbox{and}\quad\left(\begin{array}{cc}\bm{G}_{2}&\bm{E}_{2}\\ \bm{E}^{\prime}_{2}&\bm{0}\\ \end{array}\right)\left(\begin{array}{c}\bm{a}_{2}\\ \bm{b}_{2}\end{array}\right)=\left(\begin{array}{c}\hat{\bm{v}}_{k}\\ \bm{0}\end{array}\right).

J(u_{k})=\bm{u}_{k}^{\prime}\bm{\Omega}_{1}\bm{u}_{k}\quad\mbox{and}\quad J(v_{k})=\bm{v}_{k}^{\prime}\bm{\Omega}_{2}\bm{v}_{k},

\mathrm{tr}(\bm{U}^{\prime}\bm{S}_{12}\bm{V})-\sum_{k=1}^{K}\big\{\tau_{1u}\bm{u}_{k}^{\prime}\bm{\Omega}_{1}\bm{u}_{k}+\tau_{2u}\|\bm{u}_{k}\|_{1}+\tau_{1v}\bm{v}_{k}^{\prime}\bm{\Omega}_{2}\bm{v}_{k}+\tau_{2v}\|\bm{v}_{k}\|_{1}\big\},

\hat{\bm{D}}=\mathop{\mathrm{arg\,min}}_{d_{1},\dots,d_{K}\geq 0}\|\bm{S}_{12}-\hat{\bm{U}}\bm{D}\hat{\bm{V}}^{\prime}\|_{F}^{2}=\mathrm{diag}(\hat{d}_{1},\dots,\hat{d}_{K}),

\hat{C}_{12}(\bm{s}_{1},\bm{s}_{2})=\sum_{k=1}^{K}\hat{d}_{k}\hat{u}_{k}(\bm{s}_{1})\hat{v}_{k}(\bm{s}_{2}).

\mathrm{CV}(K,\tau_{1u},\tau_{2u},\tau_{1v},\tau_{2v})=\frac{1}{M}\sum_{m=1}^{M}\big\|\bm{S}_{12}^{(m)}-\hat{\bm{U}}^{(-m)}_{K,\tau_{1u},\tau_{2u}}\hat{\bm{D}}^{(-m)}_{K,\tau_{1u},\tau_{2u},\tau_{1v},\tau_{2v}}\big(\hat{\bm{V}}^{(-m)}_{K,\tau_{1v},\tau_{2v}}\big)^{\prime}\big\|_{F}^{2},

(\hat{\tau}_{1u}(K),\hat{\tau}_{1v}(K))=\mathop{\mathrm{arg\,min}}_{\{\tau_{1u},\tau_{1v}\}\subset[0,\infty)^{2}}\mathrm{CV}(K,\tau_{1u},0,\tau_{1v},0),

(\hat{\tau}_{2u}(K),\hat{\tau}_{2v}(K))=\mathop{\mathrm{arg\,min}}_{\{\tau_{2u},\tau_{2v}\}\subset[0,\infty)^{2}}\mathrm{CV}(K,\hat{\tau}_{1u}(K),\tau_{2u},\hat{\tau}_{1v}(K),\tau_{2v}).

\hat{K}=\min\big\{K:\mathrm{CV}(K,\hat{\tau}_{1u}(K),\hat{\tau}_{2u}(K),\hat{\tau}_{1v}(K),\hat{\tau}_{2v}(K))\leq\mathrm{CV}(K+1,\hat{\tau}_{1u}(K+1),\hat{\tau}_{2u}(K+1),\hat{\tau}_{1v}(K+1),\hat{\tau}_{2v}(K+1));\;K=1,2,\dots\big\}.

\mathrm{tr}(\bm{G}^{\prime}\bm{\Theta}\bm{G})-\sum_{k=1}^{K}\Big(\tau_{2u}\sum_{i=1}^{p_{1}}|g_{ik}|+\tau_{2v}\sum_{i=p_{1}+1}^{p_{1}+p_{2}}|g_{ik}|\Big),

\mathrm{tr}(\bm{G}^{\prime}\bm{\Theta}\bm{G})-\sum_{k=1}^{K}\Big(\tau_{2u}\sum_{i=1}^{p_{1}}|r_{ik}|+\tau_{2v}\sum_{i=p_{1}+1}^{p_{1}+p_{2}}|r_{ik}|\Big),

L(\bm{G},\bm{R},\bm{Q},\bm{\Gamma}_{1},\bm{\Gamma}_{2})=\mathrm{tr}(\bm{G}^{\prime}\bm{\Theta}\bm{G})-\sum_{k=1}^{K}\Big(\tau_{2u}\sum_{i=1}^{p_{1}}|r_{ik}|+\tau_{2v}\sum_{i=p_{1}+1}^{p_{1}+p_{2}}|r_{ik}|\Big)-\mathrm{tr}(\bm{\Gamma}_{1}^{\prime}(\bm{G}-\bm{R}))-\mathrm{tr}(\bm{\Gamma}_{2}^{\prime}(\bm{G}-\bm{Q}))-\frac{\zeta}{2}\big(\|\bm{G}-\bm{R}\|_{F}^{2}+\|\bm{G}-\bm{Q}\|_{F}^{2}\big),

\bm{G}^{(\ell+1)}=

\bm{R}^{(\ell+1)}=

\bm{Q}^{(\ell+1)}=

\bm{\Gamma}_{1}^{(\ell+1)}=

\bm{\Gamma}_{2}^{(\ell+1)}=

\mathcal{S}_{\tau_{2}}(\gamma_{1ik})=\left\{\begin{array}{ll}\mathrm{sign}(\gamma_{1ik})\max(|\gamma_{1ik}|-\tau_{2u},0);&\mbox{if $i\leq p_{1}$,}\\ \mathrm{sign}(\gamma_{1ik})\max(|\gamma_{1ik}|-\tau_{2v},0);&\mbox{if $i>p_{1}$},\\ \end{array}\right.

\mathrm{Loss}(\hat{C}_{12})=\frac{1}{p_{1}p_{2}}\sum_{i=1}^{p_{1}}\sum_{j=1}^{p_{2}}\big(\hat{C}_{12}(\bm{s}_{1i},\bm{s}_{2j})-C_{12}(\bm{s}_{1i},\bm{s}_{2j})\big)^{2}.

\frac{1}{p_{1}p_{2}}\max\big(\|\bm{G}^{(\ell+1)}-\bm{G}^{(\ell)}\|_{F},\|\bm{G}^{(\ell+1)}-\bm{R}^{(\ell+1)}\|_{F},\|\bm{G}^{(\ell+1)}-\bm{Q}^{(\ell+1)}\|_{F}\big)\leq 10^{-4}.

\left(\begin{array}{c}\bm{\eta}_{1i}\\ \bm{\eta}_{2i}\\ \end{array}\right)\sim N\left(\bm{0},\left(\begin{array}{cc}\bm{I}&\bm{U}\mathrm{diag}(d_{1},d_{2})\bm{V}^{\prime}\\ \bm{V}\mathrm{diag}(d_{1},d_{2})\bm{U}^{\prime}&\bm{I}\\ \end{array}\right)\right),

u_{1}(s_{1})=

Taxonomy

Topics: Statistical and numerical algorithms · Sparse and Compressive Sensing Techniques · Soil Geostatistics and Mapping

Full text

Regularized Spatial Maximum Covariance Analysis

Wen-Ting Wang

Hsin-Cheng Huang

Institute of Statistics, National Chiao Tung University

Institute of Statistical Science, Academia Sinica

Keywords: Singular value decomposition, Lasso, smoothing splines, orthogonal constraint, alternating direction method of multipliers

Journal: Environmetrics

1 Introduction

Many climate and atmospheric phenomena involve more than one meteorological spatial process covarying in space. It is of interest to find dominant coupled patterns among these processes. For example, variations of sea surface temperatures (SSTs) in the Indian Ocean may affect precipitation in nearby countries in Africa, particularly over sensitive agricultural regions, and hence threaten the economies and livelihoods of these countries. Consequently, many studies have been conducted on the relationship between SST and precipitation by analyzing their coupled patterns (e.g., Reason, 2002; Morioka et al., 2012; Omondi et al., 2013). A commonly used method is maximum covariance analysis (MCA), which seeks important spatial patterns that explain the maximum amount of covariance between the two processes by using the singular value decomposition (SVD) of the cross-covariance matrix (Tucker, 1958).

However, the leading coupled patterns obtained by MCA may sometimes be too noisy to be physically interpretable when the signal-to-noise ratio is low. Many approaches have been proposed to improve MCA. For example, Salim et al. (2005) and Salim and Pawitan (2007) proposed penalized likelihood approaches using roughness penalties to promote smoothness of the leading coupled patterns in space. However, these methods tend to capture global features but not localized ones. On the other hand, Witten et al. (2009) considered canonical correlation analysis with an $L_{1}$ constraint, and Lee et al. (2011) proposed a penalized likelihood method with the SCAD penalty (Fan and Li, 2001) to facilitate sparse patterns. However, these methods cannot be applied to continuous spatial domains with data observed at irregularly spaced locations. Additionally, all these methods ignore the orthogonal constraints in MCA patterns.

In this paper, we propose a regularization approach to MCA that incorporates smoothness and localized features in dominant coupled patterns. The proposed method, called spatial MCA (abbreviated as SpatMCA), is applicable to data measured irregularly in space. In addition, the resulting estimates can be computed efficiently using the alternating direction method of multipliers (ADMM) (Boyd et al., 2011).

The remainder of this paper is organized as follows. In Section 2, we introduce the proposed SpatMCA method, including dominant coupled patterns estimation and spatial cross-covariance function estimation. Our ADMM algorithm for computing the SpatMCA estimate is provided in Section 3. Numerical experiments that illustrate the superiority of SpatMCA and an application to study the relationship between sea surface temperature and precipitation datasets are presented in Section 4.

2 The Proposed Method

Consider a sequence of uncorrelated, zero-mean, bivariate $L^{2}$ -continuous spatial processes on spatial domains $D_{1}\subset\mathbb{R}^{d}$ and $D_{2}\subset\mathbb{R}^{d}$ ,

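$$\{(\eta_{1i}(\bm{s}_{1}),\eta_{2i}(\bm{s}_{2})):\bm{s}_{1}\in D_{1},\bm{s}_{2}\in D_{2}\};\quad i=1,\dots,n,$$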

which have a common spatial covariance function $C_{jk}(\bm{s}_{j},\bm{s}_{k})=\mathrm{cov}(\eta_{ji}(\bm{s}_{j}),\eta_{ki}(\bm{s}_{k}))$ for $j,k=1,2$. According to Azaïez and Belgacem (2015), $C_{12}(\bm{s}_{1},\bm{s}_{2})$ can be decomposed as $C_{12}(\bm{s}_{1},\bm{s}_{2})=\sum_{k=1}^{\infty}d_{k}u_{k}(\bm{s}_{1})v_{k}(\bm{s}_{2})$, where $\{d_{k}\}$ are nonnegative singular values with $d_{1}\geq d_{2}\geq\cdots$, and $\{u_{k}(\cdot)\}$ and $\{v_{k}(\cdot)\}$ are the two corresponding sets of orthonormal basis functions. The decomposition is similar to the Karhunen-Loève expansion (Karhunen, 1947; Loève, 1978) for a univariate spatial process. Suppose we observe data $\bm{Y}_{ji}=(Y_{ji}(\bm{s}_{j1}),\dots,Y_{ji}(\bm{s}_{jp_{j}}))^{\prime}$ with added noise $\bm{\epsilon}_{ji}\sim(\bm{0},\sigma^{2}_{j}\bm{I})$ at the $p_{j}$ spatial locations $\bm{s}_{j1},\dots,\bm{s}_{jp_{j}}\in D_{j}$ for $j=1,2$, according to

$$\left(\begin{array}{c}\bm{Y}_{1i}\\ \bm{Y}_{2i}\end{array}\right)=\left(\begin{array}{c}\bm{\eta}_{1i}\\ \bm{\eta}_{2i}\end{array}\right)+\left(\begin{array}{c}\bm{\epsilon}_{1i}\\ \bm{\epsilon}_{2i}\end{array}\right);\quad i=1,\dots,n,\tag{1}$$

where $\bm{\eta}_{ji}=(\eta_{ji}(\bm{s}_{j1}),\dots,\eta_{ji}(\bm{s}_{jp_{j}}))^{\prime}$, and $\bm{\epsilon}_{1i}$, $\bm{\epsilon}_{2i}$ and $(\bm{\eta}_{1i},\bm{\eta}_{2i})$ are mutually uncorrelated. Assume $d_{K+1}=0$, and denote the cross-covariance matrix between $\bm{\eta}_{1i}$ and $\bm{\eta}_{2i}$ by $\bm{\Sigma}_{12}=\mathrm{cov}(\bm{\eta}_{1i},\bm{\eta}_{2i})$. Let $\bm{\Sigma}_{12}=\bm{U}\bm{D}\bm{V}^{\prime}$ be the SVD of $\bm{\Sigma}_{12}$, where $\bm{D}=\mathrm{diag}(d_{1},\dots,d_{K})$, $\bm{U}=(\bm{u}_{1},\dots,\bm{u}_{K})$ is a $p_{1}\times K$ matrix with the $(i,k)$-th element $u_{k}(\bm{s}_{1i})$, and $\bm{V}=(\bm{v}_{1},\dots,\bm{v}_{K})$ is a $p_{2}\times K$ matrix with the $(i,k)$-th element $v_{k}(\bm{s}_{2i})$. We aim to identify the first $L\leq K$ dominant spatial coupled patterns $(u_{1}(\cdot),\dots,u_{L}(\cdot))$ and $(v_{1}(\cdot),\dots,v_{L}(\cdot))$ with large $d_{1},\dots,d_{L}$ for processes $\eta_{1}(\cdot)$ and $\eta_{2}(\cdot)$, as well as to estimate $C_{12}(\cdot,\cdot)$.

Let $\bm{Y}_{j}=(\bm{Y}_{j1},\dots,\bm{Y}_{jn})^{\prime}$ for $j=1,2$ . The sample cross-covariance matrix of $\bm{Y}_{1}$ and $\bm{Y}_{2}$ is $\bm{S}_{12}=\bm{Y}^{\prime}_{1}\bm{Y}_{2}/n$ . Then the MCA estimates of $\bm{u}_{k}$ and $\bm{v}_{k}$ obtained by the SVD of $\bm{S}_{12}$ are $\tilde{\bm{u}}_{k}$ and $\tilde{\bm{v}}_{k}$ , the $k$ -th left and right singular vectors of $\bm{S}_{12}$ , for $k=1,\dots,K$ . Let $\tilde{\bm{U}}=(\tilde{\bm{u}}_{1},\dots,\tilde{\bm{u}}_{K})$ and $\tilde{\bm{V}}=(\tilde{\bm{v}}_{1},\dots,\tilde{\bm{v}}_{K})$ be $p_{1}\times K$ and $p_{2}\times K$ matrices formed by the first $K$ left and right singular vectors of $\bm{S}_{12}$ . Then $(\tilde{\bm{U}},\tilde{\bm{V}})$ solves the following constrained optimization problem (Lee and Cichocki, 2014):

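$$\max_{\bm{U},\bm{V}}\mathrm{tr}(\bm{U}^{\prime}\bm{S}_{12}\bm{V})\quad\mbox{subject to }\bm{U}^{\prime}\bm{U}=\bm{V}^{\prime}\bm{V}=\bm{I}_{K},$$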

where $\bm{U}=(\bm{u}_{1},\dots,\bm{u}_{K})$ and $\bm{V}=(\bm{v}_{1},\dots,\bm{v}_{K})$ . However, $(\tilde{\bm{U}},\tilde{\bm{V}})$ may suffer from high estimation variability when $p_{1}$ or $p_{2}$ is large, $n$ is small, or $\sigma^{2}_{1}$ or $\sigma^{2}_{2}$ is large. Consequently, the patterns of $(\tilde{\bm{U}},\tilde{\bm{V}})$ may be too noisy to be physically interpretable. Additionally, for continuous spatial domains $D_{1}$ and $D_{2}$ , we also need to estimate $(u_{k}(\bm{s}^{*}_{1}),v_{k}(\bm{s}^{*}_{2}))$ at locations $\bm{s}^{*}_{1}{\in D_{1}}$ and $\bm{s}^{*}_{2}{\in D_{2}}$ , where data may be unavailable.
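As a point of reference for the regularized method that follows, plain MCA can be sketched in a few lines of NumPy. This is only an illustration on random toy data; the function name `mca` and the array sizes are assumptions made for the example, not part of the paper.

```python
import numpy as np

def mca(Y1, Y2, K):
    """Plain MCA: the K leading singular triplets of S12 = Y1' Y2 / n."""
    n = Y1.shape[0]
    S12 = Y1.T @ Y2 / n                       # p1 x p2 sample cross-covariance
    U, d, Vt = np.linalg.svd(S12, full_matrices=False)
    return U[:, :K], d[:K], Vt[:K].T, S12

# Toy data (hypothetical sizes): n = 200 samples, p1 = 30 and p2 = 20 locations.
rng = np.random.default_rng(0)
Y1 = rng.standard_normal((200, 30))
Y2 = rng.standard_normal((200, 20))
U_tilde, d_tilde, V_tilde, S12 = mca(Y1, Y2, K=2)

# The attained objective tr(U' S12 V) equals the sum of the K leading singular values.
objective = np.trace(U_tilde.T @ S12 @ V_tilde)
```

With pure-noise inputs such as these, the leading singular vectors carry no spatial structure, which is the situation the penalties in the next subsection are meant to address.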

2.1 Regularized Spatial MCA

To reduce high estimation variability of MCA while controlling bias, our main idea is to introduce some spatial structure. We propose a regularization approach by maximizing the following objective function:

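$$\mathrm{tr}(\bm{U}^{\prime}\bm{S}_{12}\bm{V})-\sum_{k=1}^{K}\big\{\tau_{1u}J(u_{k})+\tau_{2u}\|\bm{u}_{k}\|_{1}+\tau_{1v}J(v_{k})+\tau_{2v}\|\bm{v}_{k}\|_{1}\big\},\tag{2}$$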

over $u_{1}(\cdot),\dots,u_{K}(\cdot)$ and $v_{1}(\cdot),\dots,v_{K}(\cdot)$ , subject to $\bm{U}^{\prime}\bm{U}=\bm{V}^{\prime}\bm{V}=\bm{I}_{K}$ and $\bm{u}^{\prime}_{1}\bm{S}_{12}\bm{v}_{1}\geq\dots\geq\bm{u}^{\prime}_{K}\bm{S}_{12}\bm{v}_{K}$ , where

$$J(u)=\sum_{z_{1}+\dots+z_{d}=2}\int_{\mathbb{R}^{d}}\left(\frac{\partial^{2}u(\bm{s})}{\partial x_{1}^{z_{1}}\cdots\partial x_{d}^{z_{d}}}\right)^{2}d\bm{s},$$

is a roughness penalty, $\|\bm{u}_{k}\|_{1}=\sum_{i=1}^{p_{1}}|u_{k}(\bm{s}_{1i})|$, $\|\bm{v}_{k}\|_{1}=\sum_{i=1}^{p_{2}}|v_{k}(\bm{s}_{2i})|$, $\bm{s}=(x_{1},\dots,x_{d})^{\prime}$, $\tau_{1u}$ and $\tau_{1v}$ are nonnegative smoothness parameters, and $\tau_{2u}$ and $\tau_{2v}$ are nonnegative sparseness parameters. Since the patterns of $u_{k}(\cdot)$ and $v_{k}(\cdot)$ could be very different, we allow $\tau_{1u}\neq\tau_{1v}$ and $\tau_{2u}\neq\tau_{2v}$. Note that $J(\cdot)$ is the smoothing spline penalty, designed to enhance smoothness of $u_{k}(\cdot)$ and $v_{k}(\cdot)$, and the $L_{1}$ Lasso penalty (Tibshirani, 1996) is applied to seek sparse patterns by shrinking $u_{k}(\cdot)$ and $v_{k}(\cdot)$ toward zero. The combination of the smoothness and sparseness penalties was shown by Wang and Huang (2017) to be effective in obtaining smooth and localized patterns for a univariate spatial process. Let $\hat{u}_{1}(\cdot),\dots,\hat{u}_{K}(\cdot)$ and $\hat{v}_{1}(\cdot),\dots,\hat{v}_{K}(\cdot)$ denote the maximizers of (2). When $\tau_{1u}$ is larger, $\{\hat{u}_{k}(\cdot)\}$ become smoother, and vice versa. When $\tau_{2u}$ is larger, $\{\hat{u}_{k}(\cdot)\}$ become more localized because more elements of $\bm{u}_{k}$ are forced to zero. Analogous statements hold for $\tau_{1v}$ and $\tau_{2v}$ with respect to $\{\hat{v}_{k}(\cdot)\}$. On the other hand, when $\tau_{1u}=\tau_{2u}=\tau_{1v}=\tau_{2v}=0$, the estimates reduce to the MCA estimates.

According to smoothing spline theory (Green and Silverman, 1994), $\hat{u}_{k}(\cdot)$ and $\hat{v}_{k}(\cdot)$ are natural cubic splines when $d=1$ and thin-plate splines when $d\in\{2,3\}$, with knots at $\{\bm{s}_{11},\dots,\bm{s}_{1p_{1}}\}$ and $\{\bm{s}_{21},\dots,\bm{s}_{2p_{2}}\}$, respectively. Specifically,

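$$\hat{u}_{k}(\bm{s}_{1})=\sum_{i=1}^{p_{1}}a_{1i}\,g(\|\bm{s}_{1}-\bm{s}_{1i}\|)+b_{10}+\sum_{l=1}^{d}b_{1l}x_{1l},\tag{3}$$

$$\hat{v}_{k}(\bm{s}_{2})=\sum_{i=1}^{p_{2}}a_{2i}\,g(\|\bm{s}_{2}-\bm{s}_{2i}\|)+b_{20}+\sum_{l=1}^{d}b_{2l}x_{2l},\tag{4}$$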

where $\bm{s}_{{j}}=(x_{{{j}}1},\dots,x_{{{j}}d})^{\prime}$ for ${{j}}=1,2$ ,

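$$g(r)=\left\{\begin{array}{ll}\displaystyle\frac{1}{16\pi}r^{2}\log{r};&\mbox{if }d=2,\\ \displaystyle\frac{\Gamma(d/2-2)}{16\pi^{d/2}}r^{4-d};&\mbox{if }d=1,3,\end{array}\right.$$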

and the coefficients ${\bm{a}_{j}}=\left({a}_{j1},\dots,{a}_{jp_{j}}\right)^{\prime}$ and ${\bm{b}_{j}}=\left({b}_{j0},b_{j1},\dots,{b}_{jd}\right)^{\prime}$ for $j=1,2$ satisfy

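$$\left(\begin{array}{cc}\bm{G}_{1}&\bm{E}_{1}\\ \bm{E}^{\prime}_{1}&\bm{0}\end{array}\right)\left(\begin{array}{c}\bm{a}_{1}\\ \bm{b}_{1}\end{array}\right)=\left(\begin{array}{c}\hat{\bm{u}}_{k}\\ \bm{0}\end{array}\right)\quad\mbox{and}\quad\left(\begin{array}{cc}\bm{G}_{2}&\bm{E}_{2}\\ \bm{E}^{\prime}_{2}&\bm{0}\end{array}\right)\left(\begin{array}{c}\bm{a}_{2}\\ \bm{b}_{2}\end{array}\right)=\left(\begin{array}{c}\hat{\bm{v}}_{k}\\ \bm{0}\end{array}\right).$$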

Here $\hat{\bm{u}}_{k}=(\hat{u}_{k}(\bm{s}_{11}),\dots,\hat{u}_{k}(\bm{s}_{1p_{1}}))^{\prime}$ , $\hat{\bm{v}}_{k}=(\hat{v}_{k}(\bm{s}_{21}),\dots,\hat{v}_{k}(\bm{s}_{2p_{2}}))^{\prime}$ , $\bm{G}_{j}$ is a $p_{j}\times p_{j}$ matrix with the $(i,{i^{\prime}})$ -th element $g(\|\bm{s}_{{ji}}-\bm{s}_{j{i^{\prime}}}\|)$ , and $\bm{E}_{j}$ is a $p_{j}\times(d+1)$ matrix with the $i$ -th row $(1,\bm{s}^{\prime}_{ji})$ for $j=1,2$ . Therefore, $\hat{u}_{k}(\cdot)$ and $\hat{v}_{k}(\cdot)$ in (3) and (4) can be expressed in terms of $\hat{\bm{u}}_{k}$ and $\hat{\bm{v}}_{k}$ , respectively.
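The extension of a fitted pattern from the observed locations to a new location can be sketched as follows. The code builds $\bm{G}_{j}$ and $\bm{E}_{j}$ as defined above, solves the block linear system for $(\bm{a}_{j},\bm{b}_{j})$, and evaluates a radial-basis-plus-linear function at a new point. The evaluation formula used in `evaluate` is the standard spline form implied by that linear system and is an assumption of this sketch; the function names, the convention $g(0)=0$, and the toy locations are likewise illustrative.

```python
import numpy as np
from math import gamma, pi

def g(r, d):
    """Radial function g(r) from the text for d = 1, 2, 3 (with g(0) = 0)."""
    r = np.asarray(r, dtype=float)
    if d == 2:
        out = np.zeros_like(r)
        pos = r > 0
        out[pos] = r[pos] ** 2 * np.log(r[pos]) / (16 * pi)
        return out
    return gamma(d / 2 - 2) / (16 * pi ** (d / 2)) * r ** (4 - d)

def spline_coefficients(S, u_hat):
    """Solve [[G, E], [E', 0]] [a; b] = [u_hat; 0] for knots S (p x d)."""
    p, d = S.shape
    R = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)  # pairwise distances
    G = g(R, d)
    E = np.hstack([np.ones((p, 1)), S])                        # i-th row is (1, s_i')
    M = np.block([[G, E], [E.T, np.zeros((d + 1, d + 1))]])
    sol = np.linalg.solve(M, np.concatenate([u_hat, np.zeros(d + 1)]))
    return sol[:p], sol[p:]

def evaluate(S, a, b, s_new):
    """Assumed spline form: sum_i a_i g(||s - s_i||) + b_0 + b_1 x_1 + ... + b_d x_d."""
    r = np.linalg.norm(S - s_new, axis=1)
    return a @ g(r, S.shape[1]) + b[0] + b[1:] @ s_new

# Toy example (hypothetical): 15 irregular locations in two dimensions.
rng = np.random.default_rng(1)
S = rng.uniform(-5, 5, size=(15, 2))
u_hat = np.sin(S[:, 0]) * np.cos(S[:, 1])
a, b = spline_coefficients(S, u_hat)
value = evaluate(S, a, b, np.array([0.3, -1.2]))
```

By construction the evaluated function reproduces `u_hat` at the knots, which is the sense in which the continuous pattern is determined by its values at the observed locations.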

The roughness penalties of $u_{k}(\cdot)$ and $v_{k}(\cdot)$ can also be written as

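$$J(u_{k})=\bm{u}_{k}^{\prime}\bm{\Omega}_{1}\bm{u}_{k}\quad\mbox{and}\quad J(v_{k})=\bm{v}_{k}^{\prime}\bm{\Omega}_{2}\bm{v}_{k},\tag{5}$$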

where $\bm{\Omega}_{j}$ is a known $p_{j}\times p_{j}$ matrix determined only by $\bm{s}_{j1},\dots,\bm{s}_{jp_{j}}$ for $j=1,2$ (Green and Silverman, 1994). Therefore, from (2) and (5), the proposed estimate $(\hat{\bm{U}}_{K,\tau_{1u},\tau_{2u}},\hat{\bm{V}}_{K,\tau_{1v},\tau_{2v}})$ of $(\bm{U},\bm{V})$ can be simplified by maximizing the following objective function:

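$$\mathrm{tr}(\bm{U}^{\prime}\bm{S}_{12}\bm{V})-\sum_{k=1}^{K}\big\{\tau_{1u}\bm{u}_{k}^{\prime}\bm{\Omega}_{1}\bm{u}_{k}+\tau_{2u}\|\bm{u}_{k}\|_{1}+\tau_{1v}\bm{v}_{k}^{\prime}\bm{\Omega}_{2}\bm{v}_{k}+\tau_{2v}\|\bm{v}_{k}\|_{1}\big\},\tag{6}$$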

subject to $\bm{U}^{\prime}\bm{U}=\bm{V}^{\prime}\bm{V}=\bm{I}_{K}$ and $\bm{u}^{\prime}_{1}\bm{S}_{12}\bm{v}_{1}\geq\dots\geq\bm{u}^{\prime}_{K}\bm{S}_{12}\bm{v}_{K}$ . We call the proposed method based on (6) SpatMCA. Given $(\hat{\bm{U}}_{K,\tau_{1u},\tau_{2u}}$ , $\hat{\bm{V}}_{K,\tau_{1v},\tau_{2v}})$ , the estimates of $(u_{1}(\cdot),v_{1}(\cdot)),\dots,(u_{K}(\cdot),v_{K}(\cdot))$ can be directly calculated by (3) and (4). Note that the SpatMCA estimate of (6) reduces to a sparse CCA estimate of Witten et al. (2009) if $\mathrm{var}(\bm{Y}_{j})=\bm{I}_{p_{j}}$ , $\bm{\Omega}_{j}=\bm{I}_{p_{j}}$ for $j=1,2$ , and the orthogonal constraints of $\bm{U}$ and $\bm{V}$ are dropped.

2.2 Estimation of Cross-Covariance Function

To estimate $C_{12}(\cdot,\cdot)$ , we also have to estimate $\bm{D}$ . Given $(\hat{\bm{U}},\hat{\bm{V}})=(\hat{\bm{U}}_{K,\tau_{1u},\tau_{2u}},\hat{\bm{V}}_{K,\tau_{1v},\tau_{2v}})$ with $\hat{\bm{U}}=(\hat{\bm{u}}_{1},\dots,\hat{\bm{u}}_{K})$ and $\hat{\bm{V}}=(\hat{\bm{v}}_{1},\dots,\hat{\bm{v}}_{K})$ , the proposed estimate of $\bm{D}$ is

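$$\hat{\bm{D}}=\mathop{\mathrm{arg\,min}}_{d_{1},\dots,d_{K}\geq 0}\|\bm{S}_{12}-\hat{\bm{U}}\bm{D}\hat{\bm{V}}^{\prime}\|_{F}^{2}=\mathrm{diag}(\hat{d}_{1},\dots,\hat{d}_{K}),\tag{7}$$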

where $\hat{d}_{k}=\max\{\hat{\bm{u}}^{\prime}_{k}\bm{S}_{12}\hat{\bm{v}}_{k},0\};$ $k=1,\dots,K$ , and $\|\bm{M}\|_{F}=\Big{(}\displaystyle\sum_{i,j}m^{2}_{ij}\Big{)}^{1/2}$ is the Frobenius norm of a matrix $\bm{M}$ . Then, the proposed estimate of $C_{12}(\cdot,\cdot)$ is

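$$\hat{C}_{12}(\bm{s}_{1},\bm{s}_{2})=\sum_{k=1}^{K}\hat{d}_{k}\hat{u}_{k}(\bm{s}_{1})\hat{v}_{k}(\bm{s}_{2}).\tag{8}$$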
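The two estimates in this subsection reduce to a few array operations once $\hat{\bm{U}}$ and $\hat{\bm{V}}$ are available. The sketch below uses the closed form $\hat{d}_{k}=\max\{\hat{\bm{u}}^{\prime}_{k}\bm{S}_{12}\hat{\bm{v}}_{k},0\}$ stated above and evaluates the cross-covariance estimate at the observed locations only. The function names are illustrative, and plain singular vectors stand in for SpatMCA estimates.

```python
import numpy as np

def estimate_d(U_hat, V_hat, S12):
    """d_k = max(u_k' S12 v_k, 0) for each column pair (u_k, v_k)."""
    proj = np.einsum("ik,ij,jk->k", U_hat, S12, V_hat)
    return np.maximum(proj, 0.0)

def cross_covariance(U_hat, V_hat, d_hat):
    """p1 x p2 matrix with entries sum_k d_k u_k(s_1i) v_k(s_2j)."""
    return (U_hat * d_hat) @ V_hat.T

# Toy example: singular vectors of a random matrix stand in for SpatMCA estimates.
rng = np.random.default_rng(0)
S12 = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(S12, full_matrices=False)
U_hat, V_hat = U[:, :2], Vt[:2].T
d_hat = estimate_d(U_hat, V_hat, S12)
C12_hat = cross_covariance(U_hat, V_hat, d_hat)
```

In this unregularized stand-in the estimated $\hat{d}_{k}$ coincide with the leading singular values of $\bm{S}_{12}$; with penalized $\hat{\bm{U}}$ and $\hat{\bm{V}}$ they generally would not.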

2.3 Tuning Parameter Selection

An $M$ -fold cross-validation (CV) is applied to select the tuning parameters $\tau_{1u}$ , $\tau_{2u}$ , $\tau_{1v}$ and $\tau_{2v}$ . First, we randomly decompose the index set $\{1,\dots,n\}$ into $M$ parts that are as close to the same size, $n_{M}$ , as possible. Let $(\bm{Y}^{(m)}_{1},\bm{Y}^{(m)}_{2})$ be the sub-matrix of $(\bm{Y}_{1},\bm{Y}_{2})$ corresponding to the $m$ -th part. For $m=1,\dots,M$ , we treat $(\bm{Y}^{(m)}_{1},\bm{Y}^{(m)}_{2})$ as the validation data, and we obtain the estimate $(\hat{\bm{U}}^{(-m)}_{K,\tau_{1u},\tau_{2u}},\hat{\bm{V}}^{(-m)}_{K,\tau_{1v},\tau_{2v}})$ of $(\bm{U},\bm{V})$ for ${\{}\tau_{1u},\tau_{2u},\tau_{1v},\tau_{2v}{\}}\in\mathcal{A}$ based on the remaining data $(\bm{Y}^{(-m)}_{1},\bm{Y}^{(-m)}_{2})$ using the proposed method (6), where $\mathcal{A}\subset[0,\infty)^{4}$ is a candidate index set. Then the proposed CV criterion is

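$$\mathrm{CV}(K,\tau_{1u},\tau_{2u},\tau_{1v},\tau_{2v})=\frac{1}{M}\sum_{m=1}^{M}\big\|\bm{S}_{12}^{(m)}-\hat{\bm{U}}^{(-m)}_{K,\tau_{1u},\tau_{2u}}\hat{\bm{D}}^{(-m)}_{K,\tau_{1u},\tau_{2u},\tau_{1v},\tau_{2v}}\big(\hat{\bm{V}}^{(-m)}_{K,\tau_{1v},\tau_{2v}}\big)^{\prime}\big\|_{F}^{2},\tag{9}$$

where $\bm{S}^{(m)}_{12}=\left(\bm{Y}^{(m)}_{1}\right)^{\prime}\bm{Y}^{(m)}_{2}/n_{M}$, and $\hat{\bm{D}}^{(-m)}_{K,\tau_{1u},\tau_{2u},\tau_{1v},\tau_{2v}}$ is the estimate of $\bm{D}$ from (7) with $(\hat{\bm{U}},\hat{\bm{V}})$ replaced by $(\hat{\bm{U}}^{(-m)}_{K,\tau_{1u},\tau_{2u}},\hat{\bm{V}}^{(-m)}_{K,\tau_{1v},\tau_{2v}})$.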
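A minimal sketch of the CV criterion (9) is given below. The argument `fit` is a hypothetical placeholder for any routine that returns $(\hat{\bm{U}},\hat{\bm{V}})$ from training data; plain MCA is plugged in here so that the example runs on its own. Computing $\hat{\bm{D}}^{(-m)}$ from the training cross-covariance is an assumption of this sketch.

```python
import numpy as np

def mca_fit(Y1, Y2, K):
    """Stand-in fitter (plain MCA); a SpatMCA fitter would take its place."""
    U, _, Vt = np.linalg.svd(Y1.T @ Y2 / Y1.shape[0], full_matrices=False)
    return U[:, :K], Vt[:K].T

def cv_criterion(Y1, Y2, K, fit, M=5, seed=0):
    """Average squared Frobenius error of U D V' against held-out S12^(m)."""
    n = Y1.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), M)
    total = 0.0
    for idx in folds:
        train = np.ones(n, dtype=bool)
        train[idx] = False
        U, V = fit(Y1[train], Y2[train], K)
        S_train = Y1[train].T @ Y2[train] / train.sum()
        d = np.maximum(np.einsum("ik,ij,jk->k", U, S_train, V), 0.0)
        S_val = Y1[idx].T @ Y2[idx] / len(idx)        # validation cross-covariance
        total += np.linalg.norm(S_val - (U * d) @ V.T, "fro") ** 2
    return total / M

# Toy data (hypothetical sizes).
rng = np.random.default_rng(0)
Y1 = rng.standard_normal((100, 12))
Y2 = rng.standard_normal((100, 9))
cv_value = cv_criterion(Y1, Y2, K=2, fit=mca_fit)
```

Passing the fitter as an argument keeps the fold logic independent of how the patterns are estimated, so the same loop could be reused for each candidate tuning-parameter combination.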

Because selecting $\{\tau_{1u},\tau_{2u},\tau_{1v},\tau_{2v}\}$ simultaneously for each $K$ is computationally expensive, we recommend a two-step procedure. Specifically, we first select $\tau_{1u}$ and $\tau_{1v}$ with $\tau_{2u}=\tau_{2v}=0$ by

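$$(\hat{\tau}_{1u}(K),\hat{\tau}_{1v}(K))=\mathop{\mathrm{arg\,min}}_{\{\tau_{1u},\tau_{1v}\}\subset[0,\infty)^{2}}\mathrm{CV}(K,\tau_{1u},0,\tau_{1v},0),\tag{10}$$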

and then select $\tau_{2u}$ and $\tau_{2v}$ by

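$$(\hat{\tau}_{2u}(K),\hat{\tau}_{2v}(K))=\mathop{\mathrm{arg\,min}}_{\{\tau_{2u},\tau_{2v}\}\subset[0,\infty)^{2}}\mathrm{CV}(K,\hat{\tau}_{1u}(K),\tau_{2u},\hat{\tau}_{1v}(K),\tau_{2v}).\tag{11}$$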

Finally, we select the rank $K$ of $\bm{U}\bm{D}\bm{V}^{\prime}$ by computing the CV values of (9) for $K=1,2,\dots$ , evaluated at the four selected tuning parameter values until no further reduction of the CV value is obtained. That is,

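$$\hat{K}=\min\big\{K:\mathrm{CV}(K,\hat{\tau}_{1u}(K),\hat{\tau}_{2u}(K),\hat{\tau}_{1v}(K),\hat{\tau}_{2v}(K))\leq\mathrm{CV}(K+1,\hat{\tau}_{1u}(K+1),\hat{\tau}_{2u}(K+1),\hat{\tau}_{1v}(K+1),\hat{\tau}_{2v}(K+1));\;K=1,2,\dots\big\}.\tag{12}$$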
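The stopping rule for the rank can be sketched as a short loop over CV values that have already been minimized over the tuning parameters for each $K$. The function name and the fallback of returning the largest rank tried, when the CV values never stop decreasing, are assumptions of this sketch.

```python
def select_rank(cv_values):
    """cv_values[K - 1] is the best CV value at rank K, for K = 1, 2, ...

    Returns the smallest K whose CV value is not improved by rank K + 1.
    """
    for K in range(1, len(cv_values)):
        if cv_values[K - 1] <= cv_values[K]:
            return K
    return len(cv_values)  # no increase observed within the ranks tried

# Hypothetical CV values for K = 1, 2, 3, 4.
k_hat = select_rank([3.0, 2.0, 2.5, 1.0])
```

Because the rule stops at the first non-decrease, it returns rank 2 for these values even though rank 4 has a lower CV value.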

3 Computation Algorithm

Let $\bm{G}=(\bm{U}^{\prime},\bm{V}^{\prime})^{\prime}$ be a $(p_{1}+p_{2})\times K$ matrix with the $({i},k)$ -th element $g_{{i}k}$ . The objective function (6) can be rewritten as

$$\mathrm{tr}(\bm{G}^{\prime}\bm{\Theta}\bm{G})-\sum_{k=1}^{K}\Big(\tau_{2u}\sum_{i=1}^{p_{1}}|g_{ik}|+\tau_{2v}\sum_{i=p_{1}+1}^{p_{1}+p_{2}}|g_{ik}|\Big),\tag{13}$$

subject to $\bm{U}^{\prime}\bm{U}=\bm{V}^{\prime}\bm{V}=\bm{I}_{K}$, where $\bm{\Theta}=\left(\begin{array}{cc}-\tau_{1u}\bm{\Omega}_{1}&\bm{S}_{12}/2\\ \bm{S}^{\prime}_{12}/2&-\tau_{1v}\bm{\Omega}_{2}\end{array}\right)$. Problem (13), which combines the orthogonality constraints with the Lasso penalty, is too complex to solve directly. We adopt the ADMM algorithm (originated by Gabay and Mercier, 1976), which decomposes the constrained optimization problem into small subproblems that can be handled efficiently. Readers are referred to Boyd et al. (2011) for more details regarding ADMM.
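The role of $\bm{\Theta}$ can be checked numerically: stacking $\bm{U}$ on top of $\bm{V}$ turns the trace term and the two roughness penalties into a single quadratic form. The sketch below uses random matrices as stand-ins, so every numerical value in it is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2, K = 6, 5, 2
t1u, t1v = 0.3, 0.7                                  # hypothetical smoothness parameters
S12 = rng.standard_normal((p1, p2))
A1 = rng.standard_normal((p1, p1))
A2 = rng.standard_normal((p2, p2))
Om1, Om2 = A1 @ A1.T, A2 @ A2.T                      # stand-in positive semidefinite penalties

Theta = np.block([[-t1u * Om1, S12 / 2],
                  [S12.T / 2, -t1v * Om2]])

U = np.linalg.qr(rng.standard_normal((p1, K)))[0]    # any matrices with orthonormal columns
V = np.linalg.qr(rng.standard_normal((p2, K)))[0]
G = np.vstack([U, V])                                # (p1 + p2) x K

lhs = np.trace(G.T @ Theta @ G)
rhs = (np.trace(U.T @ S12 @ V)
       - t1u * np.trace(U.T @ Om1 @ U)
       - t1v * np.trace(V.T @ Om2 @ V))
```

The two off-diagonal blocks each contribute half of $\mathrm{tr}(\bm{U}^{\prime}\bm{S}_{12}\bm{V})$, which is why $\bm{S}_{12}$ enters $\bm{\Theta}$ divided by two.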

First, we transform (13) into the following equivalent form by adding $(p_{1}+p_{2})\times K$ parameter matrices $\bm{Q}$ and $\bm{R}$ :

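$$\mathrm{tr}(\bm{G}^{\prime}\bm{\Theta}\bm{G})-\sum_{k=1}^{K}\Big(\tau_{2u}\sum_{i=1}^{p_{1}}|r_{ik}|+\tau_{2v}\sum_{i=p_{1}+1}^{p_{1}+p_{2}}|r_{ik}|\Big),$$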

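subject to $\bm{Q}_{1}^{\prime}\bm{Q}_{1}=\bm{Q}_{2}^{\prime}\bm{Q}_{2}=\bm{I}_{K}$ and the new constraint $\bm{G}=\bm{Q}=\bm{R}$, where $r_{ik}$ is the $(i,k)$-th element of $\bm{R}$, $\bm{Q}=(\bm{Q}^{\prime}_{1},\bm{Q}^{\prime}_{2})^{\prime}$, and $\bm{Q}_{1}$ and $\bm{Q}_{2}$ are the $p_{1}\times K$ and $p_{2}\times K$ sub-matrices of $\bm{Q}$ formed by the first $p_{1}$ and the last $p_{2}$ rows of $\bm{Q}$, respectively. The resulting augmented Lagrangian is

$$\begin{array}{rl}L(\bm{G},\bm{R},\bm{Q},\bm{\Gamma}_{1},\bm{\Gamma}_{2})=&\displaystyle\mathrm{tr}(\bm{G}^{\prime}\bm{\Theta}\bm{G})-\sum_{k=1}^{K}\Big(\tau_{2u}\sum_{i=1}^{p_{1}}|r_{ik}|+\tau_{2v}\sum_{i=p_{1}+1}^{p_{1}+p_{2}}|r_{ik}|\Big)\\ &-\,\mathrm{tr}(\bm{\Gamma}_{1}^{\prime}(\bm{G}-\bm{R}))-\mathrm{tr}(\bm{\Gamma}_{2}^{\prime}(\bm{G}-\bm{Q}))\\ &\displaystyle-\,\frac{\zeta}{2}\big(\|\bm{G}-\bm{R}\|_{F}^{2}+\|\bm{G}-\bm{Q}\|_{F}^{2}\big),\end{array}$$

subject to $\bm{Q}_{1}^{\prime}\bm{Q}_{1}=\bm{Q}_{2}^{\prime}\bm{Q}_{2}=\bm{I}_{K}$, where $\bm{\Gamma}_{1}$ and $\bm{\Gamma}_{2}$ are $(p_{1}+p_{2})\times K$ matrices of Lagrange multipliers, and $\zeta\geq 0$ is a penalty parameter to promote convergence. Then the ADMM steps at the $(\ell+1)$-th iteration have the following closed-form expressions: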

[TABLE]

where

$$\mathcal{S}_{\tau_{2}}(\gamma_{1ik})=\left\{\begin{array}{ll}\mathrm{sign}(\gamma_{1ik})\max(|\gamma_{1ik}|-\tau_{2u},0);&\mbox{if }i\leq p_{1},\\ \mathrm{sign}(\gamma_{1ik})\max(|\gamma_{1ik}|-\tau_{2v},0);&\mbox{if }i>p_{1},\end{array}\right.$$

$\gamma_{1ik}$ is the $(i,k)$-th element of $\bm{\Gamma}_{1}$, $\bm{E}_{j}^{(\ell)}\bm{\Lambda}_{j}^{(\ell)}\left(\bm{F}_{j}^{(\ell)}\right)^{\prime}$ is the SVD of $\zeta\bm{G}_{j}^{(\ell+1)}+\bm{\Gamma}^{(\ell)}_{2j}$ for $j=1,2$, $\bm{G}^{(\ell+1)}_{1}$ and $\bm{G}^{(\ell+1)}_{2}$ are the $p_{1}\times K$ and $p_{2}\times K$ sub-matrices of $\bm{G}^{(\ell+1)}$ corresponding to $\bm{U}$ and $\bm{V}$, and $\bm{\Gamma}^{(\ell+1)}_{21}$ and $\bm{\Gamma}^{(\ell+1)}_{22}$ are the $p_{1}\times K$ and $p_{2}\times K$ sub-matrices of $\bm{\Gamma}^{(\ell+1)}_{2}$ corresponding to $\bm{U}$ and $\bm{V}$. Note that $\zeta$ must be chosen large enough to ensure that $\zeta\bm{I}-\bm{\Theta}$ in (14) is positive-definite.
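Since the right-hand sides of (14)–(18) are not reproduced in this text, the sketch below derives one possible set of updates from an augmented Lagrangian of the form (objective) $-\,\mathrm{tr}(\bm{\Gamma}_{1}^{\prime}(\bm{G}-\bm{R}))-\mathrm{tr}(\bm{\Gamma}_{2}^{\prime}(\bm{G}-\bm{Q}))-\frac{\zeta}{2}(\|\bm{G}-\bm{R}\|_{F}^{2}+\|\bm{G}-\bm{Q}\|_{F}^{2})$. It should be read as an illustration that is consistent with the description above (a linear solve involving $\zeta\bm{I}-\bm{\Theta}$, soft-thresholding, and an SVD of $\zeta\bm{G}_{j}+\bm{\Gamma}_{2j}$), not as the authors' exact formulas. The function names, the MCA initialization, the second-difference roughness matrices in the example, and the choice to return the orthonormal blocks of $\bm{Q}$ are all assumptions, and the ordering constraint on $\bm{u}^{\prime}_{k}\bm{S}_{12}\bm{v}_{k}$ is not enforced.

```python
import numpy as np

def soft(x, t):
    """Element-wise soft-thresholding: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def polar(W):
    """E F' from the thin SVD W = E diag(lambda) F' (orthonormal columns)."""
    E, _, Ft = np.linalg.svd(W, full_matrices=False)
    return E @ Ft

def spatmca_admm(S12, Om1, Om2, K, t1u, t2u, t1v, t2v, max_iter=1000, tol=1e-4):
    p1, p2 = S12.shape
    Theta = np.block([[-t1u * Om1, S12 / 2], [S12.T / 2, -t1v * Om2]])
    zeta = 10 * np.linalg.norm(S12, 2)          # ten times the largest singular value
    thr = np.concatenate([np.full(p1, t2u), np.full(p2, t2v)])[:, None]
    U0, _, V0t = np.linalg.svd(S12, full_matrices=False)
    G = np.vstack([U0[:, :K], V0t[:K].T])       # assumed start: plain MCA
    R, Q = G.copy(), G.copy()
    Gam1, Gam2 = np.zeros_like(G), np.zeros_like(G)
    A = 2 * (zeta * np.eye(p1 + p2) - Theta)    # positive definite when Om1, Om2 are PSD
    for _ in range(max_iter):
        G_old = G
        # G-step: stationarity of the concave quadratic in G.
        G = np.linalg.solve(A, zeta * (R + Q) - Gam1 - Gam2)
        # R-step: Lasso part, solved by soft-thresholding row blocks.
        R = soft(zeta * G + Gam1, thr) / zeta
        # Q-step: orthogonality part, solved block-wise by an SVD.
        W = zeta * G + Gam2
        Q = np.vstack([polar(W[:p1]), polar(W[p1:])])
        # Multiplier updates.
        Gam1 = Gam1 + zeta * (G - R)
        Gam2 = Gam2 + zeta * (G - Q)
        crit = max(np.linalg.norm(G - G_old), np.linalg.norm(G - R), np.linalg.norm(G - Q))
        if crit / (p1 * p2) <= tol:
            break
    return Q[:p1], Q[p1:], R

# Toy example with hypothetical second-difference roughness matrices.
rng = np.random.default_rng(0)
p1, p2 = 20, 15
S12 = rng.standard_normal((p1, p2)) / 10
D1 = np.diff(np.eye(p1), n=2, axis=0)
D2 = np.diff(np.eye(p2), n=2, axis=0)
Om1, Om2 = D1.T @ D1, D2.T @ D2
U_hat, V_hat, R_hat = spatmca_admm(S12, Om1, Om2, K=2, t1u=0.1, t2u=0.01, t1v=0.1, t2v=0.01)
```

The returned blocks of $\bm{Q}$ have exactly orthonormal columns, while the returned $\bm{R}$ carries the zeros produced by thresholding. At convergence the two agree up to the tolerance, so which one to report is a design choice.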

4 Numerical Examples

This section contains several simulation examples in one-dimensional and two-dimensional spatial domains and an application of SpatMCA to a real dataset. We compared the performance of the proposed SpatMCA with three other methods: (1) MCA ( $\tau_{1u}=\tau_{1v}=\tau_{2u}=\tau_{2v}=0$ ); (2) SpatMCA with the smoothness penalties only ( $\tau_{2u}=\tau_{2v}=0$ ); (3) SpatMCA with the sparseness penalties only ( $\tau_{1u}=\tau_{1v}=0$ ), in terms of the following loss function:

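$$\mathrm{Loss}(\hat{C}_{12})=\frac{1}{p_{1}p_{2}}\sum_{i=1}^{p_{1}}\sum_{j=1}^{p_{2}}\big(\hat{C}_{12}(\bm{s}_{1i},\bm{s}_{2j})-C_{12}(\bm{s}_{1i},\bm{s}_{2j})\big)^{2}.\tag{19}$$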

Throughout this section, we applied the proposed SpatMCA method and the ADMM algorithm given by (14)–(18) to compute the SpatMCA estimates with $\zeta$ being ten times the maximum singular value of $\bm{S}_{12}$ . Additionally, the stopping criterion for the ADMM algorithm is

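$$\frac{1}{p_{1}p_{2}}\max\big(\|\bm{G}^{(\ell+1)}-\bm{G}^{(\ell)}\|_{F},\|\bm{G}^{(\ell+1)}-\bm{R}^{(\ell+1)}\|_{F},\|\bm{G}^{(\ell+1)}-\bm{Q}^{(\ell+1)}\|_{F}\big)\leq 10^{-4}.$$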

4.1 A One-Dimensional Experiment

We generated data from (1) with $K=2$ , $d=1$ , $n=1000,$

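$$\left(\begin{array}{c}\bm{\eta}_{1i}\\ \bm{\eta}_{2i}\end{array}\right)\sim N\left(\bm{0},\left(\begin{array}{cc}\bm{I}&\bm{U}\mathrm{diag}(d_{1},d_{2})\bm{V}^{\prime}\\ \bm{V}\mathrm{diag}(d_{1},d_{2})\bm{U}^{\prime}&\bm{I}\end{array}\right)\right),$$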

$\bm{\epsilon}_{ji}\sim N(\bm{0},\bm{I})$ , ${p_{j}}=50$ , $(\bm{s}_{j1},\dots,\bm{s}_{jp_{j}})$ equally spaced in $[-7,7]$ , and

[TABLE]

where $\bm{s}_{j}=(x_{j1},\dots,x_{jd})^{\prime}$, and $c_{1}$, $c_{2}$, $c_{3}$ and $c_{4}$ are normalization constants such that $\|\bm{u}_{j}\|_{2}=\|\bm{v}_{j}\|_{2}=1$ for $j=1,2$. We considered three pairs of $(d_{1},d_{2})\in\{(1,0),(0.5,0),(1,0.7)\}$, and applied the proposed SpatMCA with $K\in\{1,2,5\}$ and $\hat{K}$ selected by (12). For each case, we applied the 5-fold CV of (10) and (11) to select $\{\tau_{1u},\tau_{1v},\tau_{2u},\tau_{2v}\}$ among $21$ values of $\tau_{1u}$ and $\tau_{1v}$ (including $0$ and the other 20 values equally spaced on the log scale from $10^{-2}$ to $10$) and $11$ values of $\tau_{2u}$ and $\tau_{2v}$ (including $0$ and the other 10 values equally spaced on the log scale from $10^{-3}$ to $1$).
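Data with the block covariance structure described above can be simulated without forming the full $(p_{1}+p_{2})\times(p_{1}+p_{2})$ matrix. The sketch below draws $\bm{\eta}_{1i}$ as standard normal and builds $\bm{\eta}_{2i}$ so that its covariance is $\bm{I}$ and its cross-covariance with $\bm{\eta}_{1i}$ is $\bm{U}\mathrm{diag}(d_{1},d_{2})\bm{V}^{\prime}$. Because the definitions of the true patterns are not reproduced in this text, the patterns below are hypothetical orthonormal stand-ins, and the function name is illustrative.

```python
import numpy as np

def simulate(U, V, d, n, rng):
    """Draw (Y1, Y2) with cov(eta1, eta2) = U diag(d) V' and unit-variance noise.

    Requires orthonormal columns in U and V and 0 <= d_k <= 1.
    """
    p1, p2 = U.shape[0], V.shape[0]
    d = np.asarray(d, dtype=float)
    eta1 = rng.standard_normal((n, p1))
    # B is symmetric with B B' = I - V diag(d^2) V', so that cov(eta2) = I.
    B = np.eye(p2) - (V * (1.0 - np.sqrt(1.0 - d ** 2))) @ V.T
    eta2 = eta1 @ (U * d) @ V.T + rng.standard_normal((n, p2)) @ B
    Y1 = eta1 + rng.standard_normal((n, p1))   # epsilon_1i ~ N(0, I)
    Y2 = eta2 + rng.standard_normal((n, p2))   # epsilon_2i ~ N(0, I)
    return Y1, Y2

# Hypothetical patterns on 50 equally spaced locations in [-7, 7].
x = np.linspace(-7, 7, 50)
U = np.column_stack([np.exp(-x ** 2 / 2), x * np.exp(-x ** 2 / 2)])
V = np.column_stack([np.exp(-x ** 2 / 8), x * np.exp(-x ** 2 / 8)])
U /= np.linalg.norm(U, axis=0)                 # even and odd columns are orthogonal
V /= np.linalg.norm(V, axis=0)
d = (1.0, 0.7)
Y1, Y2 = simulate(U, V, d, n=1000, rng=np.random.default_rng(0))
```

The construction also covers the boundary case $d_{1}=1$, where the joint covariance matrix is singular and a Cholesky factorization of the full matrix would fail.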

Figures 1 and 2 show the estimates of $u_{k}(\cdot)$ and $v_{k}(\cdot)$, respectively, for the four methods based on three different combinations of singular values. Each case contains four estimated functions based on four randomly generated datasets. Not surprisingly, the MCA estimates, which use no spatial structure, are very noisy, particularly when the signal-to-noise ratio is small. Adding only the smoothness penalties (i.e., $\tau_{2u}=\tau_{2v}=0$) reduces noise but introduces some bias. On the other hand, adding only the sparseness penalties (i.e., $\tau_{1u}=\tau_{1v}=0$) does not reduce much noise, even though the estimated $\{u_{k}(\cdot)\}$ and $\{v_{k}(\cdot)\}$ are forced to be zero at some locations. Our SpatMCA estimates generally reproduce the targets with little noise in all cases, even when the signal-to-noise ratio is small, indicating the effectiveness of regularization.

The cross-covariance function estimates for the four methods based on a randomly generated dataset are shown in Figure 3. The proposed SpatMCA can be seen to perform better than the other methods for all cases. Figure 4 shows boxplots of the four methods in terms of the loss function (19) based on $50$ simulation replicates, which further confirms the superiority of SpatMCA.

4.2 A Two-Dimensional Experiment

For a two-dimensional experiment, we generated data according to (1) with $K=2$ , $n=5,000$ , $p_{1}=25^{2}$ , $p_{2}=20^{2}$ ,

[TABLE]

$\bm{\epsilon}_{ji}\sim N(\bm{0},\bm{I})$ for $j=1,2$, $(\bm{s}_{11},\dots,\bm{s}_{1p_{1}})$ equally spaced in $[-5,5]^{2}$, and $(\bm{s}_{21},\dots,\bm{s}_{2p_{2}})$ equally spaced in $[-7,7]^{2}$. Here $u_{1}(\cdot)$, $v_{1}(\cdot)$, $u_{2}(\cdot)$ and $v_{2}(\cdot)$ are given by (20), (21), (22) and (23) with $d=2$, respectively. We considered three pairs of $(d_{1},d_{2})\in\{(1,0),(0.5,0),(1,0.7)\}$, and applied the proposed SpatMCA with $K\in\{1,2,5\}$ and $\hat{K}$ selected by (12), resulting in $12$ different combinations. Similar to the previous subsection, we applied the 5-fold CV of (10) and (11) to select $\{\tau_{1u},\tau_{1v},\tau_{2u},\tau_{2v}\}$ among $21$ values of $\tau_{1u}$ and $\tau_{1v}$ (including $0$ and the other 20 values equally spaced on the log scale from $10^{-2}$ to $10$) and $11$ values of $\tau_{2u}$ and $\tau_{2v}$ (including $0$ and the other 10 values equally spaced on the log scale from $10^{-3}$ to $1$).
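The candidate grids used in both simulation experiments can be written down directly. The sketch assumes that the single extra value in each grid is zero, which is what allows the smoothness-only and sparseness-only variants to be covered; the variable names are illustrative.

```python
import numpy as np

# 21 candidates for tau_1u and tau_1v: zero plus 20 log-spaced values in [1e-2, 10].
tau1_grid = np.concatenate([[0.0], np.logspace(-2, 1, 20)])

# 11 candidates for tau_2u and tau_2v: zero plus 10 log-spaced values in [1e-3, 1].
tau2_grid = np.concatenate([[0.0], np.logspace(-3, 0, 10)])
```

Under the two-step procedure the first grid is searched over pairs $(\tau_{1u},\tau_{1v})$ and then the second over pairs $(\tau_{2u},\tau_{2v})$, giving $21^{2}+11^{2}$ fits per fold for each $K$ instead of $21^{2}\times 11^{2}$.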

Figures 5 and 6 show the estimates of $u_{k}(\cdot)$ and $v_{k}(\cdot)$ , respectively, for the four methods based on randomly selected data generated from three different combinations of singular values. Figure 7 shows the performance of the four methods in terms of the loss function (19) based on 50 simulation replicates. Similar to the one-dimensional example, SpatMCA outperforms all the other methods in all cases.

4.3 An Application to Sea Surface Temperature and Precipitation Datasets

We applied the proposed SpatMCA and MCA to investigate how precipitation in eastern Africa is affected by SSTs in the Indian Ocean and compared the differences between the two methods. The SST data are monthly averages (in degrees Celsius) provided by the Met Office Marine Data Bank (available at http://www.metoffice.gov.uk/hadobs/hadisst/). The precipitation data are monthly averages (in mm) provided by the Earth System Research Laboratory, Physical Science Division of the National Oceanic and Atmospheric Administration (available at http://www.esrl.noaa.gov/psd/). Both datasets are on $1$ degree latitude by $1$ degree longitude equiangular grid cells. As in Omondi et al. (2013), we considered a region of the Indian Ocean between latitudes $20^{\circ}$ N and $30^{\circ}$ S and between longitudes $20^{\circ}$ E and $120^{\circ}$ E for the SST dataset, and we chose a region of eastern Africa between latitudes $6^{\circ}$ N and $12^{\circ}$ S and between longitudes $20^{\circ}$ E and $42^{\circ}$ E for the precipitation dataset. We used the data observed from January 2011 to December 2015. Let $\bm{\eta}_{1i}$ and $\bm{\eta}_{2i}$ be the vectors of (1) corresponding to SST in the Indian Ocean and precipitation in eastern Africa. In this example, $p_{1}=3,591$, $p_{2}=255$, and $n=60$.

First, the SST data and the precipitation data were detrended by subtracting, for each grid cell and each calendar month, the corresponding average. Then, the data were randomly split into two parts, the training data and the validation data. We applied SpatMCA to the training data with $K$ selected by $\hat{K}$ of (12), where 21 values of $\tau_{1u}$ and $\tau_{1v}$ (including [math] and the other 20 values equally spaced on the log scale from $10^{-1}$ to $10^{6}$) and 21 values of $\tau_{2u}$ and $\tau_{2v}$ (including [math] and the other $20$ values equally spaced on the log scale from $10^{-3}$ to $0.5$) were selected by using the 5-fold CV of (10) and (11).
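
The preprocessing just described can be sketched as follows, reading the detrending step as removal of a per-cell, per-calendar-month mean. This is an illustration on synthetic data, not the authors' code. The function names, the toy seasonal signal, and the 50/50 split proportion are assumptions, since the text does not state how the data were divided.

```python
import numpy as np


def remove_monthly_mean(data, months_per_year=12):
    """Subtract, for each cell, the mean over all years of the same calendar month.

    data: (n, p) array whose rows are consecutive months starting in January.
    """
    n, _ = data.shape
    anomalies = np.empty_like(data, dtype=float)
    for m in range(months_per_year):
        rows = np.arange(m, n, months_per_year)  # all Januaries, all Februaries, ...
        anomalies[rows] = data[rows] - data[rows].mean(axis=0)
    return anomalies


def random_split(n, rng, train_fraction=0.5):
    """Randomly split indices 0..n-1 into training and validation sets.

    train_fraction is a hypothetical choice; the paper does not report it.
    """
    perm = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return np.sort(perm[:n_train]), np.sort(perm[n_train:])


rng = np.random.default_rng(0)
n, p = 60, 4  # 60 months as in the application; 4 cells for the toy example
seasonal = 10.0 * np.sin(2.0 * np.pi * (np.arange(n) % 12) / 12.0)
data = seasonal[:, None] + rng.standard_normal((n, p))

anomalies = remove_monthly_mean(data)
train_idx, valid_idx = random_split(n, rng)
```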

The best CV values with respect to $K$ for both methods are shown in Figure 8. Clearly, both methods selected $\hat{K}=1$. Figure 9 shows the first dominant coupled patterns of SST and precipitation obtained from SpatMCA and MCA. While both methods produce similar patterns, the SST pattern obtained by MCA is much noisier. Figure 10 shows two time series of the first maximum covariance variables, $\{\hat{\bm{u}}^{\prime}_{1}\bm{Y}_{11},\dots,\hat{\bm{u}}^{\prime}_{1}\bm{Y}_{1n}\}$ and $\{\hat{\bm{v}}^{\prime}_{1}\bm{Y}_{21},\dots,\hat{\bm{v}}^{\prime}_{1}\bm{Y}_{2n}\}$, which are the projections of the training data $(\bm{Y}_{j1},\dots,\bm{Y}_{jn})$ for $j=1,2$ onto $\hat{\bm{u}}_{1}$ and $\hat{\bm{v}}_{1}$, respectively. As shown in the figure, the first maximum covariance variables of SST and precipitation are highly correlated. Indeed, the Pearson correlation coefficient between these two series is 0.59 for SpatMCA and 0.63 for MCA, showing the importance of these patterns.
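
For the unregularized MCA baseline, the maximum covariance variables and their correlation can be sketched as below. The sketch assumes the first coupled patterns are the leading singular vectors of the sample cross-covariance matrix, computed with a $1/n$ divisor from centered data. The toy data, sizes, and function name are hypothetical, and SpatMCA itself (which needs the penalized ADMM solver) is not implemented here.

```python
import numpy as np


def mca_first_pattern(Y1, Y2):
    """Classical MCA: leading singular vectors of the sample cross-covariance.

    Y1: (n, p1) and Y2: (n, p2), both assumed column-centered.
    The 1/n divisor is an assumption of this sketch.
    """
    n = Y1.shape[0]
    S12 = Y1.T @ Y2 / n  # p1 x p2 sample cross-covariance
    U, d, Vt = np.linalg.svd(S12, full_matrices=False)
    return U[:, 0], Vt[0], d[0]


# Toy data: two fields driven by one shared latent series z, plus noise.
rng = np.random.default_rng(0)
n, p1, p2 = 60, 30, 12
z = rng.standard_normal(n)
Y1 = np.outer(z, np.ones(p1)) + rng.standard_normal((n, p1))
Y2 = np.outer(z, np.ones(p2)) + rng.standard_normal((n, p2))
Y1 -= Y1.mean(axis=0)
Y2 -= Y2.mean(axis=0)

u1, v1, d1 = mca_first_pattern(Y1, Y2)

# First maximum covariance variables: projections of each field onto its pattern.
series1 = Y1 @ u1
series2 = Y2 @ v1

# Pearson correlation between the two projected series.
corr = np.corrcoef(series1, series2)[0, 1]
```

By construction, the sample covariance of the two projected series equals the leading singular value, so the sign ambiguity of the singular vectors does not affect the correlation.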

We further used the validation data to compare the performance of SpatMCA and MCA in terms of the average squared error (ASE), ${\mathrm{ASE}}=\frac{1}{p_{1}p_{2}}\|\bm{S}^{v}_{12}-\hat{\bm{\Sigma}}_{12}\|^{2}_{F}$, where $\bm{S}^{v}_{12}$ is the sample cross-covariance matrix of the validation data, and $\hat{\bm{\Sigma}}_{12}$ is a generic estimate of $\bm{\Sigma}_{12}$. The resulting ASE for MCA is $2.59\times 10^{-3}$, which is larger than the $2.25\times 10^{-3}$ obtained for SpatMCA. Figure 11 shows the ASEs with respect to $K$ for both SpatMCA and MCA, which further demonstrates the superiority of SpatMCA over MCA.
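
The ASE criterion can be computed directly from its definition, as in the sketch below. Two assumptions are made for illustration: the MCA estimate of $\bm{\Sigma}_{12}$ is taken to be the rank-$K$ truncated SVD of the training cross-covariance matrix, and the cross-covariance uses a $1/n$ divisor. The function names and toy data are hypothetical.

```python
import numpy as np


def cross_cov(Y1, Y2):
    """Sample cross-covariance of column-centered data (1/n divisor assumed)."""
    return Y1.T @ Y2 / Y1.shape[0]


def rank_k_estimate(S12, K):
    """Rank-K truncated SVD of S12, used here as the MCA estimate of Sigma_12."""
    U, d, Vt = np.linalg.svd(S12, full_matrices=False)
    return (U[:, :K] * d[:K]) @ Vt[:K]


def average_squared_error(S12_valid, Sigma12_hat):
    """ASE = ||S12_valid - Sigma12_hat||_F^2 / (p1 * p2)."""
    p1, p2 = S12_valid.shape
    return np.linalg.norm(S12_valid - Sigma12_hat, "fro") ** 2 / (p1 * p2)


# Toy training and validation sets sharing one latent coupled signal.
rng = np.random.default_rng(1)
p1, p2 = 20, 8


def simulate(n):
    z = rng.standard_normal(n)
    Y1 = np.outer(z, np.ones(p1)) + rng.standard_normal((n, p1))
    Y2 = np.outer(z, np.ones(p2)) + rng.standard_normal((n, p2))
    return Y1 - Y1.mean(axis=0), Y2 - Y2.mean(axis=0)


Y1_train, Y2_train = simulate(30)
Y1_valid, Y2_valid = simulate(30)

S12_train = cross_cov(Y1_train, Y2_train)
S12_valid = cross_cov(Y1_valid, Y2_valid)

# ASE of the rank-K MCA estimate for each K, evaluated on the validation data.
ase_by_K = {K: average_squared_error(S12_valid, rank_k_estimate(S12_train, K))
            for K in (1, 2, 5)}
```

Dividing by $p_{1}p_{2}$ puts the error on a per-entry scale, so values are comparable across datasets with different numbers of grid cells.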

Acknowledgements

This research was supported in part by ROC Ministry of Science and Technology grant MOST 103-2118-M-001-007-MY3.

References

Azaïez and Belgacem (2015)

M Azaïez and F Ben Belgacem.

Karhunen–Loève’s truncation error for bivariate functions.

Computer Methods in Applied Mechanics and Engineering, 290:57–72, 2015.

Boyd et al. (2011)

Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein.

Distributed optimization and statistical learning via the alternating direction method of multipliers.

Foundations and Trends in Machine Learning, 3:1–124, 2011.

Fan and Li (2001)

Jianqing Fan and Runze Li.

Variable selection via nonconcave penalized likelihood and its oracle properties.

Journal of the American Statistical Association, 96(456):1348–1360, 2001.

Gabay and Mercier (1976)

Daniel Gabay and Bertrand Mercier.

A dual algorithm for the solution of nonlinear variational problems via finite element approximation.

Computers & Mathematics with Applications, 2:17–40, 1976.

Green and Silverman (1994)

P.J. Green and B.W. Silverman.

Nonparametric regression and generalized linear models: a roughness penalty approach.

Chapman and Hall/CRC, London, 1994.

Karhunen (1947)

Kari Karhunen.

Über lineare Methoden in der Wahrscheinlichkeitsrechnung.

Annales Academiæ Scientiarum Fennicæ Series A, 37:1–79, 1947.

Lee and Cichocki (2014)

Namgil Lee and Andrzej Cichocki.

Big data matrix singular value decomposition based on low-rank tensor train decomposition.

In Advances in Neural Networks–ISNN 2014, pages 121–130. Springer, Switzerland, 2014.

Lee et al. (2011)

Woojoo Lee, Donghwan Lee, Youngjo Lee, and Yudi Pawitan.

Sparse canonical covariance analysis for high-throughput data.

Statistical Applications in Genetics and Molecular Biology, 10(1), 2011.

Loève (1978)

Michel Loève.

Probability theory.

Springer-Verlag, New York, 1978.

Morioka et al. (2012)

Yushi Morioka, Tomoki Tozuka, Sebastien Masson, Pascal Terray, Jing-Jia Luo, and Toshio Yamagata.

Subtropical dipole modes simulated in a coupled general circulation model.

Journal of Climate, 25(12):4029–4047, 2012.

Omondi et al. (2013)

P Omondi, JL Awange, LA Ogallo, J Ininda, and E Forootan.

The influence of low frequency sea surface temperature modes on delineated decadal rainfall zones in eastern Africa region.

Advances in Water Resources, 54:161–180, 2013.

Reason (2002)

CJC Reason.

Sensitivity of the southern African circulation to dipole sea-surface temperature patterns in the South Indian Ocean.

International Journal of Climatology, 22(4):377–393, 2002.

Salim and Pawitan (2007)

Agus Salim and Yudi Pawitan.

Model-based maximum covariance analysis for irregularly observed climatological data.

Journal of Agricultural, Biological, and Environmental Statistics, 12(1):1–24, 2007.

Salim et al. (2005)

Agus Salim, Yudi Pawitan, and K Bond.

Modelling association between two irregularly observed spatiotemporal processes by using maximum covariance analysis.

Journal of the Royal Statistical Society, Series C, 54(3):555–573, 2005.

Tibshirani (1996)

Robert Tibshirani.

Regression shrinkage and selection via the lasso.

Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.

Tucker (1958)

Ledyard R Tucker.

An inter-battery method of factor analysis.

Psychometrika, 23(2):111–136, 1958.

Wang and Huang (2017)

Wen-Ting Wang and Hsin-Cheng Huang.

Regularized principal component analysis for spatial data.

Journal of Computational and Graphical Statistics, 26:14–25, 2017.

Witten et al. (2009)

Daniela M Witten, Robert Tibshirani, and Trevor Hastie.

A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.

Biostatistics, 10(3):515–534, 2009.
