Spatial Blind Source Separation

François Bachoc (IMT); Marc G. Genton (KAUST); Klaus Nordhausen (TU Wien); Anne Ruiz-Gazen (TSE); Joni Virta

arXiv:1812.09187·math.ST·September 1, 2020

TL;DR

This paper introduces a novel spatial blind source separation method using joint diagonalisation of multiple scatter matrices, with theoretical analysis, simulation validation, and real data application.

Contribution

It proposes a new estimator based on joint diagonalisation of multiple scatter matrices, extending previous models and analyzing its asymptotic properties.

Findings

1. The new estimator performs well in simulations.

2. Asymptotic properties are rigorously derived.

3. The method is demonstrated on real spatial data.

Equations

$$X(s)=\Omega Z(s),$$

$$\widehat{M}(f)=n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n}f(s_{i}-s_{j})X(s_{i})X(s_{j})^{\mathrm{\scriptscriptstyle T}},$$

$$\widehat{\Gamma}(f)\widehat{M}(f_{0})\widehat{\Gamma}(f)^{\mathrm{\scriptscriptstyle T}}=I_{p}\quad\mbox{and}\quad\widehat{\Gamma}(f)\widehat{M}(f)\widehat{\Gamma}(f)^{\mathrm{\scriptscriptstyle T}}=\widehat{\Lambda}(f),$$

$$M(f)=n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n}f(s_{i}-s_{j})\mathrm{E}\{X(s_{i})X(s_{j})^{\mathrm{\scriptscriptstyle T}}\}\quad\mbox{and}\quad M(f_{0})=n^{-1}\sum_{i=1}^{n}\mathrm{E}\{X(s_{i})X(s_{i})^{\mathrm{\scriptscriptstyle T}}\}.$$

$$\Gamma(f)M(f_{0})\Gamma(f)^{\mathrm{\scriptscriptstyle T}}=I_{p}\quad\mbox{and}\quad\Gamma(f)M(f)\Gamma(f)^{\mathrm{\scriptscriptstyle T}}=\Lambda(f),$$

$$C(s_{1},s_{2})=\mathrm{cov}\{X(s_{1}),X(s_{2})\}:=\{C_{k,l}(s_{1},s_{2})\}_{k,l=1}^{p},$$

$$C(h)=\sum_{k=1}^{r}\rho_{k}(h)T_{k},$$

$$C_{X}(h)=\sum_{k=1}^{p}K_{k}(h)T_{k},$$

$$\widehat{M}(f_{0})=n^{-1}\sum_{i=1}^{n}X(s_{i})X(s_{i})^{\mathrm{\scriptscriptstyle T}}$$

$$|K_{k}(x)|\leq\frac{A}{1+\|x\|^{d+\alpha}};$$

$$|f(x)|\leq\frac{A}{1+\|x\|^{d+\alpha}};$$

$$\liminf_{n\to\infty}\min_{i=2,\ldots,p}\left[\{\Omega^{-1}M(f)\Omega^{-{\mathrm{\scriptscriptstyle T}}}\}_{i,i}-\{\Omega^{-1}M(f)\Omega^{-{\mathrm{\scriptscriptstyle T}}}\}_{i-1,i-1}\right]>0.$$

$$W(f)_{i}=n^{1/2}\{\widehat{M}(f)_{a,b}-M(f)_{a,b}\}.$$

$$d_{w}[Q_{n},\mathcal{N}\{0,V(f,f_{0})\}]\to 0,$$

$$n^{1/2}\left(\begin{array}{c}\mathrm{vect}\{\widehat{\Gamma}(f)-\Omega^{-1}\}\\ \mathrm{diag}\{\widehat{\Lambda}(f)-\Lambda(f)\}\end{array}\right).$$

$$d_{w}\{Q_{n},\mathcal{N}(0,F_{1})\}\to 0,$$

$$\widehat{\Gamma}\in\underset{\begin{subarray}{c}\Gamma:\ \Gamma\widehat{M}(f_{0})\Gamma^{\mathrm{\scriptscriptstyle T}}=I_{p}\\ \Gamma\mbox{ has rows }\gamma_{1}^{\mathrm{\scriptscriptstyle T}},\ldots,\gamma_{p}^{\mathrm{\scriptscriptstyle T}}\end{subarray}}{\mathrm{argmax}}\ \sum_{l=1}^{k}\sum_{j=1}^{p}\{\gamma_{j}^{\mathrm{\scriptscriptstyle T}}\widehat{M}(f_{l})\gamma_{j}\}^{2}.$$

$$\Gamma\in\underset{\begin{subarray}{c}\Gamma:\ \Gamma M(f_{0})\Gamma^{\mathrm{\scriptscriptstyle T}}=I_{p}\\ \Gamma\mbox{ has rows }\gamma_{1}^{\mathrm{\scriptscriptstyle T}},\ldots,\gamma_{p}^{\mathrm{\scriptscriptstyle T}}\end{subarray}}{\mathrm{argmax}}\ \sum_{l=1}^{k}\sum_{j=1}^{p}\{\gamma_{j}^{\mathrm{\scriptscriptstyle T}}M(f_{l})\gamma_{j}\}^{2}.$$

$$d_{w}\{Q_{n},\mathcal{N}(0,F_{k})\}\to 0,$$

$$\rho(h)=2^{1-\kappa}\Gamma(\kappa)^{-1}(h/\phi)^{\kappa}K_{\kappa}(h/\phi),$$

$$\mathrm{MDI}(\Gamma)=(p-1)^{-1/2}\inf\{\|C\Gamma\Omega-I_{p}\|,\ C\in\mathcal{C}\},$$

$$(I_{p^{2}}-D_{p,p})\Sigma(I_{p^{2}}-D_{p,p}),$$

$$\Sigma(f)_{i,j}=2n^{-1}\mathrm{tr}\{RT(f)_{s,t}RT(f)_{u,v}\}\quad\mbox{and}\quad\Sigma(f,g)_{i,j}=2n^{-1}\mathrm{tr}\{RT(f)_{s,t}RT(g)_{u,v}\}.$$

$$V(f,g)=\begin{pmatrix}\Sigma(f)&\Sigma(f,g)\\ \Sigma(g,f)&\Sigma(g)\end{pmatrix}.$$

$$A_{i,j}=\begin{cases}-1/2&\mbox{for }i=j\in D(p),\\ -\lambda_{I_{p}(i)}\{\lambda_{I_{p}(i)}-\lambda_{J_{p}(i)}\}^{-1}&\mbox{for }i=j\notin D(p),\\ 0&\mbox{otherwise},\end{cases}$$

$$B_{i,j}=\begin{cases}\{\lambda_{I_{p}(i)}-\lambda_{J_{p}(i)}\}^{-1}&\mbox{for }i=j\notin D(p),\\ 0&\mbox{otherwise},\end{cases}$$

$$C_{i,j}=\begin{cases}-\lambda_{i}&\mbox{for }j=d_{p}(i),\\ 0&\mbox{otherwise}\end{cases}\quad\mbox{and}\quad D_{i,j}=\begin{cases}1&\mbox{for }j=d_{p}(i),\\ 0&\mbox{otherwise}.\end{cases}$$

$$G=\begin{pmatrix}A&B\\ C&D\end{pmatrix}.$$

$$(M_{\Omega^{-1}})_{a,b}=\begin{cases}(\Omega^{-1})_{J_{p}(b),J_{p}(a)}&\mbox{if }I_{p}(a)=I_{p}(b),\\ 0&\mbox{if }I_{p}(a)\neq I_{p}(b)\end{cases}\quad\mbox{and}\quad\bar{M}_{\Omega^{-1}}=\begin{pmatrix}M_{\Omega^{-1}}&0\\ 0&I_{p}\end{pmatrix}.$$

$$F_{1}=\bar{M}_{\Omega^{-1}}G\tilde{V}(f)G^{\mathrm{\scriptscriptstyle T}}\bar{M}_{\Omega^{-1}}^{\mathrm{\scriptscriptstyle T}}.$$

Taxonomy

Topics: Blind Source Separation Techniques · Spectroscopy and Chemometric Analyses · Speech and Audio Processing

Full text

Spatial blind source separation

François BACHOC

Institut de Mathématiques de Toulouse, Université Paul Sabatier,

118 route de Narbonne, 31062 Toulouse, France

Marc G. GENTON

Statistics Program, King Abdullah University of Science and Technology,

Thuwal 23955-6900, Saudi Arabia

Klaus NORDHAUSEN

CSTAT, Vienna University of Technology,

Wiedner Hauptstr. 7, A-1040 Vienna, Austria

Anne RUIZ-GAZEN

Toulouse School of Economics, University of Toulouse Capitole,

21 allée de Brienne, 31000 Toulouse, France

Joni VIRTA

Department of Mathematics and Statistics, University of Turku,

20014 Turun yliopisto, Finland,

Department of Mathematics and Systems Analysis, Aalto University,

PL 11000, 00076 AALTO, Finland.

Abstract

Recently a blind source separation model was suggested for spatial data together with an estimator based on the simultaneous diagonalisation of two scatter matrices. The asymptotic properties of this estimator are derived here and a new estimator, based on the joint diagonalisation of more than two scatter matrices, is proposed. The asymptotic properties and merits of the novel estimator are verified in simulation studies. A real data example illustrates the method.

Keywords: Joint diagonalisation; Limiting distribution; Multivariate random field; Spatial scatter matrix.

1 Introduction

There is an abundance of multivariate data measured at spatial locations $s_{1},\ldots,s_{n}$ in a domain $\mathcal{S}^{d}\subseteq\mathbb{R}^{d}$ . Such data exhibit two kinds of dependence: measurements taken closer to each other tend to be more similar than measurements taken further apart, and the variable values within a single location are likely to be correlated.

This complexity makes modelling multivariate spatial data computationally and theoretically difficult due to the large number of parameters required to represent the dependencies. In this work we address this problem through blind source separation, a framework established as independent component analysis for independent and identically distributed data and for stationary and non-stationary time series; see Comon & Jutten (2010) and Nordhausen & Oja (2018). Denoting a $p$ -variate random field as $X(s)=\{X_{1}(s),\ldots,X_{p}(s)\}^{\mathrm{\scriptscriptstyle T}}$ , where T is the transpose operator, we assume that $X(s)$ obeys the spatial blind source separation model introduced in Nordhausen et al. (2015). That is, $X(s)$ at a location $s$ is a linear mixture of an underlying $p$ -variate latent field $Z(s)=\{Z_{1}(s),\ldots,Z_{p}(s)\}^{\mathrm{\scriptscriptstyle T}}$ with independent components,

$$X(s)=\Omega Z(s),\qquad (1.1)$$

where $\Omega$ is an unknown $p\times p$ full rank matrix. For simplicity, throughout this introduction we assume that the random fields $X$ and $Z$ have zero mean functions.

When the observed random field $X$ takes the form (1.1), modeling and computational simplifications can be obtained. Indeed, if no assumption at all is made on $X$ , then the distribution of $X$ is characterized by $p$ covariance functions and by $p(p-1)/2$ cross-covariance functions. In contrast, when it is assumed that $X$ takes the form (1.1), then the distribution of $X$ is characterized by $p$ covariance functions and by a $p\times p$ matrix. As a function is an infinite-dimensional object, it is more difficult to model and estimate than a fixed-dimensional matrix. Thus, when the observed random field $X$ takes the form (1.1), modeling simplifications are available.

When no assumption is made on $X$ , a common practice in geostatistics is to let each of the $p$ covariance functions and each of the $p(p-1)/2$ cross-covariance functions of $X$ be characterized by $q$ parameters. For instance, the case $q=2$ can correspond to a variance and a length scale parameter for an isotropic function. Then, the resulting $qp(p+1)/2$ parameters are usually estimated jointly by optimizing a fit criterion, typically the likelihood (Genton & Kleiber, 2015). This requires performing an optimization in dimension $qp(p+1)/2$ , where the computational cost of an evaluation of the likelihood is $O(p^{3}n^{3})$ . Once the $qp(p+1)/2$ parameters are estimated, the prediction of $X(s)$ for new values of $s$ can be performed at the computational cost $O(p^{3}n^{3})$ .

In contrast, consider that model (1.1) holds for $X$ . We will show in this paper that an estimate of $\Omega^{-1}$ can be obtained. This is carried out by, first, computing scatter matrices with computational cost $O(p^{2}n^{2})$ and, second, performing an optimization in dimension $p^{2}$ where the computational cost of the function to be evaluated is $O(p^{2})$ , see § 4 for details. If each covariance function of $Z$ is characterized by $q$ parameters, each of them can be estimated separately, by optimizing the likelihood in dimension $q$ . The evaluation cost of the likelihood is $O(n^{3})$ . Once the $qp$ covariance parameters are estimated, the prediction of $X(s)$ for new values of $s$ can be performed at cost $O(pn^{3})$ . Indeed, the predictions of $Z_{1}(s),\ldots,Z_{p}(s)$ can be performed separately at cost $O(n^{3})$ and aggregated with negligible cost.
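To make the comparison of optimization dimensions concrete, the following short sketch simply evaluates the counts quoted above. The function name and the values $p=5$ , $q=2$ are illustrative assumptions, not taken from the paper.

```python
def optimisation_dimensions(p, q):
    """Dimensions of the optimisation problems discussed in the text (illustrative helper)."""
    # Unstructured model: q parameters for each of the p covariance functions and
    # p(p-1)/2 cross-covariance functions, all estimated jointly.
    joint_dim = q * p * (p + 1) // 2
    # Blind source separation: one problem of dimension p^2 for the unmixing matrix,
    # followed by p separate likelihood problems of dimension q each.
    return {"joint": joint_dim, "unmixing": p * p, "per_field": q, "n_fields": p}

# Hypothetical example values.
dims = optimisation_dimensions(p=5, q=2)
print(dims)
```

Under model (1.1) the covariance parameters are never optimised jointly: each of the $p$ latent fields contributes its own $q$ -dimensional problem.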

Not all random fields $X$ obey a spatial blind source separation model of the form (1.1). For instance, (1.1) forces the cross-covariance functions of $X$ to be symmetric. Nevertheless, it is a reasonable assumption in a fair number of practical situations (Nordhausen et al., 2015) and brings the computational benefits discussed above. Furthermore, an additional benefit of the form (1.1) is dimension reduction. In blind source separation, often significantly fewer than the full $p$ latent components are needed to capture the essential structure of the original observations and the remaining components can be discarded as noise.

We thus consider the spatial blind source separation model (1.1) in this paper and focus on the estimation of $\Omega^{-1}$ . As discussed above, this estimation makes it possible to estimate the cross-covariance functions of $X$ and to perform prediction. Our approach for estimating $\Omega^{-1}$ is based on the use of local covariance, or scatter, matrices,

$$\widehat{M}(f)=n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n}f(s_{i}-s_{j})X(s_{i})X(s_{j})^{\mathrm{\scriptscriptstyle T}},\qquad (1.2)$$

where $f:\mathbb{R}^{d}\to\mathbb{R}$ is called the kernel function. Nordhausen et al. (2015) obtained estimators $\widehat{\Gamma}(f)$ of $\Omega^{-1}$ through a generalized eigendecomposition of pairs of local covariance matrices with kernels of the form $(f_{0},f_{h})$ , with $f_{h}(s_{i}-s_{j})=I(\|s_{i}-s_{j}\|{\leq}h)$ , for a positive constant $h$ , where $I(\cdot)$ denotes the indicator function and $f_{0}(s)=I(s=0)$ . Their estimators were based on the following definition, with $f=f_{h}$ for some $h>0$ .

Definition 1.1.

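An unmixing matrix estimator $\widehat{\Gamma}(f)$ jointly diagonalizes $\widehat{M}(f_{0})$ and $\widehat{M}(f)$ in the following way

$$\widehat{\Gamma}(f)\widehat{M}(f_{0})\widehat{\Gamma}(f)^{\mathrm{\scriptscriptstyle T}}=I_{p}\quad\mbox{and}\quad\widehat{\Gamma}(f)\widehat{M}(f)\widehat{\Gamma}(f)^{\mathrm{\scriptscriptstyle T}}=\widehat{\Lambda}(f),$$

where $\widehat{\Lambda}(f)$ is a diagonal matrix with diagonal elements in decreasing order.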
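As a minimal sketch of Definition 1.1, the code below computes local covariance matrices for the kernel $f_{0}$ and a ball kernel, then obtains an unmixing matrix by whitening with $\widehat{M}(f_{0})^{-1/2}$ followed by a symmetric eigendecomposition. This is one way to carry out the generalized eigendecomposition, not necessarily the authors' implementation. The function names, the toy data and the radius $h=1$ are assumptions made for illustration.

```python
import numpy as np

def local_covariance(X, S, kernel):
    """n^{-1} sum_i sum_j kernel(s_i - s_j) X(s_i) X(s_j)^T for X of shape (n, p), S of shape (n, d)."""
    diffs = S[:, None, :] - S[None, :, :]      # all pairwise differences s_i - s_j, shape (n, n, d)
    W = kernel(diffs)                          # kernel weights, shape (n, n)
    return X.T @ W @ X / X.shape[0]

def f0(diffs):
    """Kernel I(s = 0); with distinct locations it gives the uncentred covariance matrix."""
    return (np.linalg.norm(diffs, axis=-1) == 0).astype(float)

def ball(h):
    """Ball kernel I(||s|| <= h)."""
    return lambda diffs: (np.linalg.norm(diffs, axis=-1) <= h).astype(float)

def unmix_two(M0, M):
    """Return (Gamma, lam) with Gamma M0 Gamma^T = I and Gamma M Gamma^T = diag(lam)."""
    vals, vecs = np.linalg.eigh(M0)
    M0_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T    # symmetric inverse square root of M0
    lam, U = np.linalg.eigh(M0_inv_sqrt @ M @ M0_inv_sqrt)
    order = np.argsort(lam)[::-1]                          # diagonal elements in decreasing order
    return U[:, order].T @ M0_inv_sqrt, lam[order]

# Hypothetical toy data: n = 200 random locations in the plane, p = 3 variables.
rng = np.random.default_rng(0)
S = rng.uniform(0, 10, size=(200, 2))
X = rng.standard_normal((200, 3))
M0 = local_covariance(X, S, f0)
Mh = local_covariance(X, S, ball(1.0))
Gamma, lam = unmix_two(M0, Mh)
```

The rows of the returned matrix are determined only up to sign, which matches the non-uniqueness of unmixing matrices. The pairwise-difference array uses $O(n^{2}d)$ memory, so this sketch is only suitable for moderate $n$ .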

This method is conceptually close to principal component analysis where latent variables that have maximal variance are found through the diagonalisation of the covariance matrix. However, since the covariance matrix does not capture spatial information, it was extended to the concept of a local covariance matrix in Nordhausen et al. (2015). Analogously, diagonalising local covariance matrices then aims to find latent fields that maximize spatial correlation.

Here, we expand on their work by not restricting the kernel $f$ in Definition 1.1 to be of the “ball” form $f_{h}$ . Furthermore, we derive the asymptotic behavior for the method proposed in Nordhausen et al. (2015) for a large class of kernel functions $f$ .

The idea when constructing these kernel functions is that the mean values of $\widehat{M}(f)$ and $\widehat{M}(f_{0})$ would be diagonal matrices if, in their definition, the mixed components $X$ were replaced by the latent components $Z$ . Hence, a general blind source separation strategy is to undo the mixing in $X$ by finding a matrix $\widehat{\Gamma}(f)$ which simultaneously diagonalizes $\widehat{M}(f)$ and $\widehat{M}(f_{0})$ . This is computationally simple and can always be done exactly using generalized eigenvalue-eigenvector theory. From temporal blind source separation, however, it is well known that when diagonalising only two matrices, the choice of the matrices can have a large impact on the separation efficiency. Therefore, it is a popular strategy to approximately diagonalize more than two matrices with the hope of including more information; see for example Belouchrani et al. (1997), Nordhausen (2014), Miettinen et al. (2014), Matilainen et al. (2015) and Miettinen et al. (2016). Approximate diagonalization then becomes necessary as the matrices commute only at the population level but not when estimated using finite data. There are many algorithms available for this purpose. We use this idea to extend the method of Nordhausen et al. (2015) to jointly diagonalize more than two local covariance matrices. We also derive the asymptotic behaviour of these novel estimators.

2 Spatial blind source separation model

2.1 General assumptions

In the spatial blind source separation model, the following assumptions are made:

Assumption 1.

$\mathrm{E}\{Z(s)\}=0$ for $s\in\mathcal{S}^{d}$ .

Assumption 2.

$\mathrm{cov}\{Z(s)\}=\mathrm{E}\{Z(s)Z(s)^{\mathrm{\scriptscriptstyle T}}\}=I_{p}$ .

Assumption 3.

$\mathrm{cov}\{Z(s_{1}),Z(s_{2})\}=\mathrm{E}\{Z(s_{1})Z(s_{2})^{\mathrm{\scriptscriptstyle T}}\}=D(s_{1},s_{2})$ , where $D$ is a diagonal matrix whose diagonal elements depend only on $s_{1}-s_{2}$ .

Let $\mathrm{cov}\{Z_{k}(s_{i}),Z_{k}(s_{j})\}=K_{k}(s_{i}-s_{j})=D(s_{i},s_{j})_{k,k}$ , where $K_{k}$ denotes the stationary covariance function of $Z_{k}$ , for $k=1,\ldots,p$ .

Assumption 1 is made for convenience and can easily be replaced by assuming a constant unknown mean (see Lemma B.31 in the supplementary material). Assumption 2 says that the components of $Z(s)$ are uncorrelated and implies that the variances of the components are one, which reduces identifiability issues and comes without loss of generality. Assumption 3 says that there is also no spatial cross-dependence between the components. However, even after these assumptions are made, the model is not uniquely defined. The order of the latent fields and also their signs can be changed. This is common for all blind source separation approaches and is not considered a problem in practice.

2.2 Identifiability

The expectations of $\widehat{M}(f)$ and $\widehat{M}(f_{0})$ are respectively

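$$M(f)=n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n}f(s_{i}-s_{j})\mathrm{E}\{X(s_{i})X(s_{j})^{\mathrm{\scriptscriptstyle T}}\}\quad\mbox{and}\quad M(f_{0})=n^{-1}\sum_{i=1}^{n}\mathrm{E}\{X(s_{i})X(s_{i})^{\mathrm{\scriptscriptstyle T}}\}.$$

Thus the empirical procedure of Definition 1.1, operating on $\widehat{M}(f)$ and $\widehat{M}(f_{0})$ , can be associated with the following theoretical procedure, operating on $M(f)$ and $M(f_{0})$ .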

Definition 2.1.

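For any function $f:\mathbb{R}^{d}\to\mathbb{R}$ , an unmixing matrix functional $\Gamma(f)$ is defined as a functional which jointly diagonalizes $M(f)$ and $M(f_{0})$ in the following way

$$\Gamma(f)M(f_{0})\Gamma(f)^{\mathrm{\scriptscriptstyle T}}=I_{p}\quad\mbox{and}\quad\Gamma(f)M(f)\Gamma(f)^{\mathrm{\scriptscriptstyle T}}=\Lambda(f),$$

where $\Lambda(f)$ is a diagonal matrix with diagonal elements in decreasing order.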

We remark that an unmixing matrix $\Gamma(f)$ can be found using the generalized eigenvalue-eigenvector theory. In addition, an unmixing matrix is never unique, since if $\Gamma(f)$ and $\Lambda(f)$ satisfy Definition 2.1, then $S\Gamma(f)$ and $\Lambda(f)$ also satisfy Definition 2.1 for any diagonal matrix $S$ with diagonal elements equal to $-1$ or $1$ . We also remark that $\Lambda(f)$ is not the expectation of $\widehat{\Lambda}(f)$ , in general. Indeed, Definitions 1.1 and 2.1 are based on non-linear functions of $\{\widehat{M}(f),\widehat{M}(f_{0})\}$ and of $\{M(f),M(f_{0})\}$ .

The usual notion of identifiability in blind source separation is that any unmixing functional $\Gamma(f)$ should recover the components of $Z$ up to signs and order of the components. Thus, any unmixing functional $\Gamma(f)$ should coincide with $\Omega^{-1}$ , up to the order and signs of the rows.

Definition 2.2.

We say that the unmixing problem given by $f$ is identifiable if any unmixing functional $\Gamma(f)$ satisfying Definition 2.1 can be written as $PS\Omega^{-1}$ , where $P$ is a permutation matrix and $S$ is a diagonal matrix with diagonal elements equal to $-1$ or $1$ .

The motivation behind identifiability is that, if identifiability holds, then estimating $M(f_{0})$ and $M(f)$ consistently by $\widehat{M}(f_{0})$ and $\widehat{M}(f)$ makes it possible to obtain $\widehat{\Gamma}(f)$ , which will be approximately equal to a matrix of the form $PS\Omega^{-1}$ , with $P$ and $S$ as in Definition 2.2. The following proposition provides a necessary and sufficient condition for identifiability. This proposition is proved in § B.2 of the supplementary material. All the other theoretical results in this paper are also proved in the supplementary material. Let $M^{-{\mathrm{\scriptscriptstyle T}}}$ denote the inverse of the transpose of $M$ .

Proposition 2.3.

The unmixing problem given by $f$ is identifiable if and only if the diagonal elements of $\Omega^{-1}M(f)\Omega^{-{\mathrm{\scriptscriptstyle T}}}$ are distinct.

We remark that identifiability is a joint property of the kernel $f$ and the covariance functions $K_{1},\ldots,K_{p}$ . For instance, consider the situation where $K_{1},\ldots,K_{p}$ are compactly supported and equal to zero at distances larger than $0<r<\infty$ , and where one uses the function $f(s)=I(r_{1}<\|s\|\leq r_{2})$ , with $r\leq r_{1}<r_{2}<\infty$ as kernel. Then identifiability does not hold because $\Omega^{-1}M(f)\Omega^{-{\mathrm{\scriptscriptstyle T}}}$ is equal to the zero matrix. On the other hand, if $f$ is a ball kernel of the form $f(s)=I(\|s\|\leq r_{0})$ with $r_{0}>0$ , then identifiability may hold, for the same covariance functions $K_{1},\ldots,K_{p}$ .
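The ring-kernel example can be checked with a short derivation. It is a sketch that uses only the model $X(s)=\Omega Z(s)$ and the assumptions of § 2.1, not the proof given in the supplementary material.

```latex
\begin{align*}
\mathrm{E}\{X(s_i)X(s_j)^{T}\} &= \Omega\,D(s_i,s_j)\,\Omega^{T},\\
\Omega^{-1}M(f_0)\Omega^{-T} &= n^{-1}\sum_{i=1}^{n}D(s_i,s_i)=I_p,\\
\Omega^{-1}M(f)\Omega^{-T} &= n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n}f(s_i-s_j)\,D(s_i,s_j),\\
\{\Omega^{-1}M(f)\Omega^{-T}\}_{k,k} &= n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n}f(s_i-s_j)\,K_k(s_i-s_j).
\end{align*}
```

For the ring kernel $f(s)=I(r_{1}<\|s\|\leq r_{2})$ with $r\leq r_{1}$ , each product $f(s_{i}-s_{j})K_{k}(s_{i}-s_{j})$ vanishes, because $f$ is non-zero only at distances larger than $r$ , where every $K_{k}$ is zero. All diagonal elements therefore equal zero and cannot be distinct.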

Finally, for any kernel $f$ , a necessary condition for identifiability is that there does not exist $k,l\in\{1,\ldots,p\}$ , $k\neq l$ , such that $K_{k}(s_{i}-s_{j})=K_{l}(s_{i}-s_{j})$ for all $i,j=1,\ldots,n$ . Indeed, if this was the case, then the diagonal elements $k$ and $l$ of $\Omega^{-1}M(f)\Omega^{-{\mathrm{\scriptscriptstyle T}}}$ would be equal, for any kernel $f$ . An extreme example of this issue is $K_{1}=\cdots=K_{p}$ with only Gaussian components. If this is the case, then, for any orthogonal matrix $Q$ , the distribution of the random field $QZ$ is the same as that of the random field $Z$ . Hence, no statistical procedure can be expected to recover the components of $Z$ , even up to signs and permutations, when only observing the transformed random field $X$ .
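The invariance claimed in the extreme Gaussian example follows from a one-line covariance computation, sketched here under the stated assumption $K_{1}=\cdots=K_{p}=K$ :

```latex
\begin{align*}
\mathrm{cov}\{QZ(s_1),QZ(s_2)\}
 &= Q\,D(s_1,s_2)\,Q^{T}
  = K(s_1-s_2)\,Q\,I_p\,Q^{T}
  = K(s_1-s_2)\,I_p
  = \mathrm{cov}\{Z(s_1),Z(s_2)\}.
\end{align*}
```

Since a zero-mean Gaussian random field is determined by its covariance function, $QZ$ and $Z$ have the same distribution for every orthogonal $Q$ .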

2.3 Relationships with other models of multivariate random fields

The spatial blind source separation model is notably different from the usual multivariate models for spatial data, which are often defined starting with their covariance functions contained in a cross-covariance matrix,

$$C(s_{1},s_{2})=\mathrm{cov}\{X(s_{1}),X(s_{2})\}:=\{C_{k,l}(s_{1},s_{2})\}_{k,l=1}^{p},$$

whereas our approach for estimating $\Omega^{-1}$ does not need to model or estimate the covariance functions of the latent fields $Z_{1}(s),\ldots,Z_{p}(s)$ .

In a recent extensive review, Genton & Kleiber (2015) discussed different approaches to define cross-covariance matrix functionals and gave a list of properties and conventions that they should satisfy, for instance stationarity and invariance under rotation. As Genton & Kleiber (2015) pointed out, to create general classes of models with well-defined cross-covariance functionals is a major challenge. Multivariate spatial models are particularly challenging as many parameters need to be fitted. In textbooks such as Wackernagel (2003) usually the following two popular models are described.

In the intrinsic correlation model it is assumed that the stationary covariance matrix $C(h)$ can be written as the product of the variable covariances and the spatial correlations, $C(h)=\rho(h)T$ , for all lags $h$ , where $T$ is a non-negative definite $p\times p$ matrix and $\rho(h)$ a univariate spatial correlation function.

The more popular linear model of coregionalization is a generalization of the intrinsic correlation model, and the covariance matrix then has the form

$$C(h)=\sum_{k=1}^{r}\rho_{k}(h)T_{k},$$

for some positive integer $r\leq p$ with all the $\rho_{k}$ ’s being univariate spatial correlation functions and $T_{k}$ ’s being non-negative definite $p\times p$ matrices, often called coregionalization matrices. Hence, with $r=1$ this reduces to the intrinsic correlation model. The linear model of coregionalization implies a symmetric cross-covariance matrix.

Estimation in the linear model of coregionalization is discussed in several papers. Goulard & Voltz (1992) focused on the coregionalization matrices using an iterative algorithm where the spatial correlation functions are assumed to be known. The algorithm was extended in Emery (2010). Assuming Gaussian random fields, an expectation-maximisation algorithm was suggested in Zhang (2007) and a Bayesian approach was considered in Gelfand et al. (2004).

There is a simple connection between the spatial blind source separation model and the linear model of coregionalization. The covariance matrix $C_{X}(h)$ resulting from a spatial blind source separation model is always symmetric and can be written as

$$C_{X}(h)=\sum_{k=1}^{p}K_{k}(h)T_{k},$$

with $T_{k}=\omega_{k}\omega_{k}^{\mathrm{\scriptscriptstyle T}}$ , $\omega_{k}$ being the ${k}$ th column of $\Omega$ . Thus the spatial blind source separation model is a special case of the linear model of coregionalization with $r=p$ and where all coregionalization matrices $T_{k}$ , ${k}=1,\ldots,p$ , are rank one matrices.
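The rank-one decomposition can be verified numerically at a single lag $h$ . In the sketch below the mixing matrix and the covariance values $K_{k}(h)$ are arbitrary illustrative numbers, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
Omega = rng.standard_normal((p, p))     # hypothetical mixing matrix
K_h = np.array([0.8, 0.5, 0.1])         # hypothetical values K_1(h), ..., K_p(h) at one lag h

# Covariance of X at lag h implied by X(s) = Omega Z(s): Omega D(h) Omega^T.
C_direct = Omega @ np.diag(K_h) @ Omega.T

# Coregionalization form: sum_k K_k(h) T_k with T_k = omega_k omega_k^T,
# where omega_k is the k-th column of Omega.
T = [np.outer(Omega[:, k], Omega[:, k]) for k in range(p)]
C_lmc = sum(K_h[k] * T[k] for k in range(p))
```

Both expressions give the same symmetric matrix, and each $T_{k}$ has rank one.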

3 Asymptotic properties for simultaneous diagonalisation of two matrices

Recall the definition (1.2) of a local covariance matrix and that

$$\widehat{M}(f_{0})=n^{-1}\sum_{i=1}^{n}X(s_{i})X(s_{i})^{\mathrm{\scriptscriptstyle T}}$$

is the covariance estimator. Asymptotic results can be derived for the previous estimators assuming that Assumptions 1 to 3 hold together with the following assumptions:

Assumption 4.

The coordinates $Z_{1},\ldots,Z_{p}$ of $Z$ are stationary Gaussian processes on $\mathbb{R}^{d}$ .

Assumption 5.

A fixed $\Delta>0$ exists so that, for all $n\in\mathbb{N}$ and, for all $i\neq j$ , $i,j=1,\ldots,n$ , $\|s_{i}-s_{j}\|\geq\Delta$ .

Assumption 6.

Fixed $A>0$ and $\alpha>0$ exist such that, for all $x\in\mathbb{R}^{d}$ and, for all $k=1,\ldots,p$ ,

$$|K_{k}(x)|\leq\frac{A}{1+\|x\|^{d+\alpha}}.$$

Assumption 7.

For the same $A>0$ and $\alpha>0$ as in Assumption 6, we have

$$|f(x)|\leq\frac{A}{1+\|x\|^{d+\alpha}}.$$

Assumption 8.

We have

$$\liminf_{n\to\infty}\min_{i=2,\ldots,p}\left[\{\Omega^{-1}M(f)\Omega^{-{\mathrm{\scriptscriptstyle T}}}\}_{i,i}-\{\Omega^{-1}M(f)\Omega^{-{\mathrm{\scriptscriptstyle T}}}\}_{i-1,i-1}\right]>0.$$

Assumption 5 implies that $\mathcal{S}^{d}$ is unbounded as $n\to\infty$ , which means that we address the increasing domain asymptotic framework (Cressie, 1993).

Assumption 7 holds in particular for the function $I(s=0)$ and for the “ball” and “ring” kernels $B(h)(s)=I(\|s\|\leq h)$ with fixed $h\geq 0$ and $R(h_{1},h_{2})(s)=I(h_{1}\leq\|s\|\leq h_{2})$ with fixed $h_{2}\geq h_{1}\geq 0$ .
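The following sketch illustrates why compactly supported indicator kernels satisfy the polynomial decay bound on $f$ . A kernel bounded by one and supported in $\|s\|\leq h_{2}$ satisfies the bound with $A=1+h_{2}^{d+\alpha}$ for any $\alpha>0$ . The constants below, the random test points and the function names are assumptions chosen for illustration; in the paper the same $A$ and $\alpha$ must also work for the covariance functions.

```python
import numpy as np

def ball(h):
    """Ball kernel B(h)(s) = I(||s|| <= h)."""
    return lambda s: float(np.linalg.norm(s) <= h)

def ring(h1, h2):
    """Ring kernel R(h1, h2)(s) = I(h1 <= ||s|| <= h2)."""
    return lambda s: float(h1 <= np.linalg.norm(s) <= h2)

def decay_bound(s, A, alpha):
    """Right-hand side A / (1 + ||s||^(d + alpha)) of the decay condition."""
    d = len(s)
    return A / (1.0 + np.linalg.norm(s) ** (d + alpha))

# Hypothetical constants: dimension d = 2, alpha = 1, radii h1 = 1 and h2 = 2.
d, alpha, h1, h2 = 2, 1.0, 1.0, 2.0
A = 1.0 + h2 ** (d + alpha)            # makes the bound at least 1 wherever the kernels are non-zero

rng = np.random.default_rng(2)
points = rng.uniform(-5, 5, size=(1000, d))
ok = all(abs(f(s)) <= decay_bound(s, A, alpha)
         for f in (ball(h2), ring(h1, h2)) for s in points)
```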

Up to reordering the components of $Z$ , which comes without loss of generality, Assumption 8 is an asymptotic version of the identifiability condition in Proposition 2.3. Under Assumption 8, identifiability in the sense of Definition 2.2 holds for sufficiently large $n$ , from Proposition 2.3.

Proposition 3.1 below gives the consistency of the estimator $\widehat{M}(f)$ , where $f$ satisfies Assumption 7. The proof of this proposition is provided in § B.4 of the supplementary material.

Proposition 3.1.

Suppose $n\to\infty$ and Assumptions 1 to 6 hold and let $f:\mathbb{R}^{d}\to\mathbb{R}$ satisfy Assumption 7. Then $\widehat{M}(f)-M(f)\to 0$ in probability when $n\to\infty$ .

We remark that $M(f)$ depends on $n$ and that we do not assume that the sequence of matrices $M(f)$ converges to a fixed matrix as $n\to\infty$ . Hence, Proposition 3.1 shows that $\widehat{M}(f)-M(f)$ converges to zero, and not that $\widehat{M}(f)$ converges to $M(f)$ .

Next, we show the joint asymptotic normality of $n^{1/2}\{\widehat{M}(f_{0})-M(f_{0})\}$ and $n^{1/2}\{\widehat{M}(f)-M(f)\}$ , seen as sequences of $p^{2}\times 1$ random vectors. Similarly as in Proposition 3.1, we do not need to assume that the sequence of $2p^{2}\times 2p^{2}$ covariance matrices of these two sequences of vectors converges to a fixed matrix. Hence, we will not show that these sequences of random vectors converge jointly to a fixed Gaussian distribution. Instead, we will show that the distances between the distributions of these random vectors and Gaussian distributions converge to zero as $n\to\infty$ . As a distance between distributions, we consider a metric $d_{w}$ generating the topology of weak convergence on the set of Borel probability measures on Euclidean spaces (see, e.g., Dudley (2002), p. 393). The benefit of such a distance is that a sequence of distributions $(\mathcal{L}_{n})_{n\in\mathbb{N}}$ converges to a fixed distribution $\mathcal{L}$ if and only if $d_{w}(\mathcal{L}_{n},\mathcal{L})$ converges to zero. The next proposition provides the asymptotic normality result. It is proved in § B.4 of the supplementary material.

Proposition 3.2.

Suppose the assumptions of Proposition 3.1 hold. Let $W(f)$ be the vector of size $p^{2}\times 1$ , defined for $i=(a-1)p+b$ , $a,b\in\{1,\ldots,p\}$ , by

$$W(f)_{i}=n^{1/2}\{\widehat{M}(f)_{a,b}-M(f)_{a,b}\}.$$

Let $Q_{n}$ be the distribution of $\{W(f)^{\mathrm{\scriptscriptstyle T}},W(f_{0})^{\mathrm{\scriptscriptstyle T}}\}^{\mathrm{\scriptscriptstyle T}}$ . Then, as $n\to\infty$ ,

$$d_{w}[Q_{n},\mathcal{N}\{0,V(f,f_{0})\}]\to 0,$$

where $\mathcal{N}$ denotes the normal distribution and details concerning the matrix $V(f,f_{0})$ are given in Appendix A.2. Furthermore, the largest eigenvalue of $V(f,f_{0})$ is bounded as $n\to\infty$ .

In Proposition 3.2, $V(f,f_{0})$ is a $2p^{2}\times 2p^{2}$ matrix that depends on $n$ and is interpreted as an asymptotic covariance matrix. Also, in Proposition 3.2, the vectors $W(f)$ and $W(f_{0})$ , which are asymptotically Gaussian, are obtained by row vectorization of $n^{1/2}\{\widehat{M}(f)-M(f)\}$ and $n^{1/2}\{\widehat{M}(f_{0})-M(f_{0})\}$ , respectively. Taking $f(s)=I(\|s\|\leq h)$ with $h>0$ in Propositions 3.1 and 3.2 gives the asymptotic properties of the method proposed in Nordhausen et al. (2015).

Remark 3.3.

Propositions 3.1 and 3.2 remain valid when centering the process $X$ by $\bar{X}=n^{-1}\sum_{i=1}^{n}X(s_{i})$ . Indeed, we prove in Lemma B.31 of the supplementary material that the difference between the centered estimator and $\widehat{M}(f)$ is of order $O_{p}(n^{-1})$ .

For a matrix $A$ with rows $l_{1}^{\mathrm{\scriptscriptstyle T}},\ldots,l_{k}^{\mathrm{\scriptscriptstyle T}}$ , let $\mathrm{vect}(A)=(l_{1}^{\mathrm{\scriptscriptstyle T}},\ldots,l_{k}^{\mathrm{\scriptscriptstyle T}})^{\mathrm{\scriptscriptstyle T}}$ be the row vectorization of $A$ and for a matrix $A$ of size $k\times k$ , let ${\mathrm{diag}}(A)=(A_{1,1},\ldots,A_{k,k})^{\mathrm{\scriptscriptstyle T}}$ . Next, Proposition 3.4 shows the joint asymptotic normality of the estimators $\widehat{\Gamma}(f)$ and $\widehat{\Lambda}(f)$ . This proposition is proved in § B.4 of the supplementary material.
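The row vectorization and its index convention $i=(a-1)p+b$ can be illustrated with NumPy, where row-major flattening plays the role of $\mathrm{vect}$ . The example matrix is an arbitrary illustration.

```python
import numpy as np

p = 3
# Fill A so that the (a, b) entry (1-based) equals (a - 1) p + b.
A = np.arange(1, p * p + 1).reshape(p, p)

vect_A = A.ravel(order="C")   # row vectorization: rows of A stacked one after another
diag_A = np.diag(A)           # (A_{1,1}, ..., A_{p,p})

# The 1-based position i = (a - 1) p + b of vect(A) holds the entry A_{a,b}.
index_ok = all(vect_A[(a - 1) * p + b - 1] == A[a - 1, b - 1]
               for a in range(1, p + 1) for b in range(1, p + 1))
```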

Proposition 3.4.

Suppose the assumptions of Proposition 3.1 hold, and assume also that Assumption 8 holds. For $\widehat{\Gamma}(f)$ and $\widehat{\Lambda}(f)$ in Definition 1.1, let $Q_{n}$ be the distribution of

$$n^{1/2}\left(\begin{array}{c}\mathrm{vect}\{\widehat{\Gamma}(f)-\Omega^{-1}\}\\ \mathrm{diag}\{\widehat{\Lambda}(f)-\Lambda(f)\}\end{array}\right).$$

Then, we can choose $\widehat{\Gamma}(f)$ and $\widehat{\Lambda}(f)$ in Definition 1.1 so that when $n\to\infty$ ,

$$d_{w}\{Q_{n},\mathcal{N}(0,F_{1})\}\to 0,$$

where details concerning the matrix ${F_{1}}$ are given in Appendix A.3.

In Proposition 3.4, similarly as before, we consider the sequences of vectors obtained by vectorizing $n^{1/2}\{\widehat{\Gamma}(f)-\Omega^{-1}\}$ and taking the diagonal of $n^{1/2}\{\widehat{\Lambda}(f)-\Lambda(f)\}$ . Again, we do not show that the sequence of joint distributions of these vectors converges to a fixed distribution. Instead, we show that these joint distributions are asymptotically close to Gaussian distributions, with covariance matrices given by ${F_{1}}$ . We remark that ${F_{1}}$ denotes a sequence of $(p^{2}+p)\times(p^{2}+p)$ matrices. We also remark that, in Definition 1.1, $\widehat{\Gamma}(f)$ is not uniquely defined. It is defined up to the signs of its rows. Hence, Proposition 3.4 shows that there exists a choice of the sequence $\widehat{\Gamma}(f)$ in Definition 1.1 such that asymptotic normality holds as $n\to\infty$ .

The performance of the estimators $\widehat{\Gamma}(f)$ and $\widehat{\Lambda}(f)$ depends on the choice of $\widehat{M}(f)$ that should be chosen so that $\widehat{\Lambda}(f)$ has diagonal elements as distinct as possible. This is similar to the time series context as described in Miettinen et al. (2012). To avoid this dependency in the time series context, the joint diagonalisation of more than two matrices has been suggested and we will apply this concept to the spatial context in the following section.

4 Improving the estimation of the spatial blind source separation model by jointly diagonalising more than two matrices

Spatial blind source separation with more than two kernel functions of the form $f_{0},f_{1},\ldots,f_{k}$ , with $k\geq 2$ , can be formulated as

[TABLE]

We can show that, if $k=1$ , the set of $\widehat{\Gamma}$ satisfying (4.1) coincides with the set of $\widehat{\Gamma}(f_{1})$ satisfying Definition 1.1. Experience in time series blind source separation (see for example Miettinen et al., 2016) indicates that the joint diagonalisation of several matrices usually gives a better separation than the diagonalisation of two matrices only. In this paper, we indeed show that using $k\geq 2$ is beneficial both from a theoretical point of view and in practice.

The identifiability notion of Definition 2.2 and Proposition 2.3 can be extended to the case of more than two local covariance matrices. We first remark that the theoretical version of (4.1) is

[TABLE]

We then extend Definition 2.2 and Proposition 2.3 to the case of more than two local covariance matrices.

Definition 4.1.

*We say that the unmixing problem given by $f_{1},\ldots,f_{k}$ is identifiable if any unmixing functional $\Gamma$ satisfying (4.2) can be written as $PS\Omega^{-1}$ , where $P$ is a permutation matrix and $S$ is a diagonal matrix with diagonal elements equal to $-1$ or $1$ . *

Proposition 4.2.

*The unmixing problem given by $f_{1},\ldots,f_{k}$ is identifiable if and only if for every pair $i\neq j$ , $i,j=1,\ldots,p$ , there exists $l=1,\ldots,k$ such that $\{\Omega^{-1}M(f_{l})\Omega^{-{\mathrm{\scriptscriptstyle T}}}\}_{i,i}\neq\{\Omega^{-1}M(f_{l})\Omega^{-{\mathrm{\scriptscriptstyle T}}}\}_{j,j}$ . *

Proposition 4.2 is proved in § B.5 of the supplementary material. We remark that the identifiability condition in Proposition 4.2 is weaker than that in Proposition 2.3, because if the condition in Proposition 2.3 holds with $f$ being one of the $f_{1},\ldots,f_{k}$ , then the condition in Proposition 4.2 holds. This is one of the benefits of jointly diagonalising more than two matrices.
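The comparison between the two identifiability conditions can be illustrated with a minimal Python sketch. It takes hypothetical diagonals of $\Omega^{-1}M(f_{l})\Omega^{-{\mathrm{\scriptscriptstyle T}}}$ and computes, over all pairs $i\neq j$ , the smallest separation achieved by the best kernel for that pair. The function name `separation_gap` and the numerical values are assumptions made for illustration only.

```python
def separation_gap(diagonals):
    """Smallest, over pairs i != j, of the largest |d_l[i] - d_l[j]| over kernels l.

    diagonals[l][i] stands for the i-th diagonal element of
    Omega^{-1} M(f_l) Omega^{-T} (hypothetical values in this sketch).
    A strictly positive gap corresponds to the condition of Proposition 4.2.
    """
    p = len(diagonals[0])
    return min(
        max(abs(d[i] - d[j]) for d in diagonals)
        for i in range(p)
        for j in range(i + 1, p)
    )


# Hypothetical diagonals for two kernels; each kernel alone has a tie.
d1 = [0.5, 0.5, 0.1]
d2 = [0.3, 0.2, 0.2]

gap_f1 = separation_gap([d1])          # zero: sources 1 and 2 are tied under f_1
gap_f2 = separation_gap([d2])          # zero: sources 2 and 3 are tied under f_2
gap_joint = separation_gap([d1, d2])   # positive: every pair is separated by some kernel
```

In this toy case neither kernel alone satisfies the condition of Proposition 2.3, whereas the pair of kernels satisfies that of Proposition 4.2. Comparing the same quantity with a fixed $\delta>0$ for all large $n$ gives the flavour of the asymptotic version of the condition.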

One of the main theoretical contributions of this paper is to provide an asymptotic analysis of the joint diagonalisation of several matrices in the spatial context. Assumption 3, on asymptotic identifiability, can be replaced by the following weaker assumption.

Assumption 4.

A fixed $\delta>0$ and $n_{0}\in\mathbb{N}$ exist so that for all $n\in\mathbb{N}$ , $n\geq n_{0}$ , for every pair $i\neq j$ , $i,j=1,\ldots,p$ , there exists $l=1,\ldots,k$ , such that $|\{\Omega^{-1}M(f_{l})\Omega^{-{\mathrm{\scriptscriptstyle T}}}\}_{i,i}-\{\Omega^{-1}M(f_{l})\Omega^{-{\mathrm{\scriptscriptstyle T}}}\}_{j,j}|\geq\delta$ .

In the next proposition, we prove the consistency of $\widehat{\Gamma}$ . This proposition is proved in § B.6 of the supplementary material.

Proposition 4.3.

*Suppose Assumptions 2.1 to 3.1 hold. Let $k\in\mathbb{N}$ be fixed. Let $f_{1},\ldots,f_{k}:\mathbb{R}^{d}\to\mathbb{R}$ satisfy Assumption 3. Assume that Assumption 4 holds. Let $\widehat{\Gamma}=\widehat{\Gamma}\{\widehat{M}(f_{0}),\widehat{M}(f_{1}),\ldots,\widehat{M}(f_{k})\}$ satisfy (4.1). Then we can choose $\widehat{\Gamma}$ so that $\widehat{\Gamma}\to\Omega^{-1}$ in probability when $n$ goes to infinity. *

In Proposition 4.3, we remark that $\widehat{\Gamma}$ is defined only up to permutation of the rows and multiplications of them by $1$ or $-1$ . Hence, we show that there exists a choice of a sequence $\widehat{\Gamma}$ that converges to $\Omega^{-1}$ . The next proposition provides an asymptotic normality result. It is proved in § B.6 of the supplementary material.

Proposition 4.4.

Assume the same assumptions as in Proposition 4.3. Let $(\widehat{\Gamma}_{n})_{n\in\mathbb{N}}$ be any sequence of $p\times p$ matrices so that for any $n\in\mathbb{N}$ , $\widehat{\Gamma}_{n}=\widehat{\Gamma}_{n}\{\widehat{M}(f_{0}),\widehat{M}(f_{1}),\ldots,\widehat{M}(f_{k})\}$ satisfies (4.1). Then, a sequence of permutation matrices $(P_{n})$ and a sequence of diagonal matrices $(D_{n})$ exist, with diagonal components in $\{-1,1\}$ , so that the distribution $Q_{n}$ of ${n}^{1/2}\mathrm{vect}(\check{\Gamma}_{n}-\Omega^{-1})$ with $\check{\Gamma}_{n}=D_{n}P_{n}\widehat{\Gamma}_{n}$ satisfies, as $n\to\infty$ ,

[TABLE]

*where details concerning the matrix $F_{k}$ are given in Appendix A.4. *

In Proposition 4.4, for any $n\in\mathbb{N}$ , the choice of $\widehat{\Gamma}_{n}$ satisfying (4.1) is not unique. The proposition shows that, for any choice of the sequence of matrices $\widehat{\Gamma}_{n}$ , one can exchange the rows and multiply them by $1$ or $-1$ , to obtain a sequence of matrices $\check{\Gamma}_{n}$ that converges to $\Omega^{-1}$ as $n\to\infty$ . Furthermore, similarly as in Proposition 3.4, we show that the sequence of distributions of ${n}^{1/2}\mathrm{vect}(\check{\Gamma}_{n}-\Omega^{-1})$ is asymptotically close to a sequence of Gaussian distributions. The sequence of $p^{2}\times p^{2}$ covariance matrices of these Gaussian distributions is ${F_{k}}$ .

The idea of joint diagonalisation is not new in spatial data analysis. For example in Xie & Myers (1995), Xie et al. (1995) and De Iaco et al. (2013), in a model-free context, matrix variograms have been jointly diagonalized. However, the unmixing matrix was restricted to be orthogonal, which would therefore not solve the spatial blind source separation model.

While two symmetric matrices can always be simultaneously diagonalized, this is usually not the case for more than two matrices which are estimated based on finite data. Therefore, algorithms are needed for approximate joint diagonalisation. In this paper we use an algorithm which is based on Givens rotations (Clarkson, 1988). Other possible algorithms and their impact on the properties of the estimates are for example discussed in Illner et al. (2015).
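As a rough illustration of approximate joint diagonalisation by Givens rotations, the sketch below implements a standard Jacobi-type sweep for real symmetric matrices in plain Python. It assumes that the matrices have already been whitened, so that only an orthogonal rotation remains to be found, and it maximises the sum of squared diagonal elements. The function names are hypothetical, and the code is not claimed to match the implementation in the JADE package.

```python
import math


def givens_update(A, i, j, c, s):
    """In place, replace A by R^T A R, where R is the Givens rotation with
    R[i][i] = R[j][j] = c, R[i][j] = -s and R[j][i] = s."""
    p = len(A)
    for r in range(p):                      # A <- A R (columns i and j)
        ai, aj = A[r][i], A[r][j]
        A[r][i], A[r][j] = c * ai + s * aj, -s * ai + c * aj
    for r in range(p):                      # A <- R^T A (rows i and j)
        ai, aj = A[i][r], A[j][r]
        A[i][r], A[j][r] = c * ai + s * aj, -s * ai + c * aj


def joint_diagonalise(mats, eps=1e-12, max_sweeps=100):
    """Return (V, D) with V orthogonal and D[l] = V^T mats[l] V nearly diagonal."""
    p = len(mats[0])
    D = [[row[:] for row in M] for M in mats]
    V = [[float(r == q) for q in range(p)] for r in range(p)]
    for _ in range(max_sweeps):
        largest = 0.0
        for i in range(p - 1):
            for j in range(i + 1, p):
                g00 = g01 = g11 = 0.0
                for M in D:
                    h0, h1 = M[i][i] - M[j][j], M[i][j] + M[j][i]
                    g00, g01, g11 = g00 + h0 * h0, g01 + h0 * h1, g11 + h1 * h1
                ton, toff = g00 - g11, 2.0 * g01
                # Jacobi angle: one quarter of the angle of the vector (ton, toff).
                theta = 0.5 * math.atan2(toff, ton + math.hypot(ton, toff))
                c, s = math.cos(theta), math.sin(theta)
                largest = max(largest, abs(s))
                if abs(s) > eps:
                    for M in D:
                        givens_update(M, i, j, c, s)
                    for r in range(p):      # accumulate V <- V R
                        vi, vj = V[r][i], V[r][j]
                        V[r][i], V[r][j] = c * vi + s * vj, -s * vi + c * vj
        if largest <= eps:                  # no rotation needed in a full sweep
            break
    return V, D


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Toy check: two matrices U diag(.) U^T sharing the same orthogonal U.
a, b = 0.7, -0.4
U = matmul(
    [[math.cos(a), -math.sin(a), 0.0], [math.sin(a), math.cos(a), 0.0], [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0], [0.0, math.cos(b), -math.sin(b)], [0.0, math.sin(b), math.cos(b)]],
)
Ut = [list(col) for col in zip(*U)]


def conjugate(diag):
    L = [[diag[r] if r == q else 0.0 for q in range(3)] for r in range(3)]
    return matmul(matmul(U, L), Ut)


mats = [conjugate([3.0, 2.0, 1.0]), conjugate([1.0, 5.0, 2.0])]
V, D = joint_diagonalise(mats)
max_off = max(abs(M[r][q]) for M in D for r in range(3) for q in range(3) if r != q)
```

In the toy check the two matrices are exactly jointly diagonalisable, so the off-diagonal elements should vanish up to rounding. With matrices estimated from finite data the same sweeps only reduce the off-diagonal mass, which is the sense in which the joint diagonalisation is approximate.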

5 Simulations

5.1 Preliminaries

In this section we use simulated data to verify our asymptotic results and to compare the efficiencies of the different local covariance estimates under a varying set of spatial models. All simulations are performed in R (R Core Team, 2019) with the help of the packages geoR (Ribeiro Jr & Diggle, 2016), JADE (Miettinen et al., 2017) and RcppArmadillo (Eddelbuettel & Sanderson, 2014). To generate the simulation data, we have chosen some particular covariance functions for the latent fields. However, our proposed methods do not use this information in any way, but operate solely through the selection of local covariance matrices.

5.2 Asymptotic approximation of the unmixing matrix estimator

We start with a simple simulation to establish the validity of the asymptotic approximation of the unmixing matrix estimator $\widehat{\Gamma}(f)$ for different kernels $f$ and to obtain some preliminary comparative results between the proposed estimators. We consider a centered, three-variate spatial blind source separation model $X({s})={\Omega}{Z}({s})$ where each of the three independent latent fields has a Matérn covariance function with shape and range parameters $(\kappa,\phi)\in\{(6,1.2),(1,1.5),(0.25,1)\}$ , which correspond to the left panel in Fig. 5.1. We recall that the Matérn correlation function is defined by

[TABLE]

where $\kappa>0$ is the shape parameter, $\phi>0$ is the range parameter and $K_{\kappa}$ is the modified Bessel function of the second kind of order $\kappa$ . Our location pattern is constructed in the following way: the first 200 locations are drawn uniformly at random from an origin-centered square $S_{1}$ of side length $200^{1/2}$ units. For the next 200 locations, we scale the side length of the square $S_{1}$ by the factor $2^{1/2}$ to obtain the larger square $S_{2}$ and draw the points uniformly at random on $S_{2}\setminus S_{1}$ . Next, we always scale the side length of the previous square $S_{j}$ by $2^{1/2}$ to obtain $S_{j+1}$ and draw on $S_{j+1}\setminus S_{j}$ as many locations as we already have, thus doubling the number of points every time. This process is continued until we have obtained a total of $3200$ locations. In the simulation we consider the sample sizes $n=100\times 2^{j}$ , for $j=1,\ldots,5$ , each time using the first $n$ of the $3200$ points, that is, all points inside the $j$ th innermost square on the left-hand side of Fig. 5.2. The five samples then correspond to nested samples of points and represent the increasing domain asymptotic scheme implied by Assumption 3.1.
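The construction of the nested location pattern can be sketched in a few lines of Python. The sketch assumes rejection sampling to draw points on $S_{j+1}\setminus S_{j}$ , which is one possible way to obtain uniform draws on that region. The function name, the seed and the use of the standard `random` module are illustrative choices, whereas the simulations of the paper were run in R.

```python
import math
import random


def nested_square_locations(n_first=200, n_total=3200, seed=1):
    """Draw points on nested origin-centred squares, doubling the count each time."""
    rng = random.Random(seed)
    half = math.sqrt(n_first) / 2.0           # half side length of S_1
    points = [(rng.uniform(-half, half), rng.uniform(-half, half))
              for _ in range(n_first)]
    while len(points) < n_total:
        inner = half                          # half side length of the previous square
        half *= math.sqrt(2.0)                # side length scaled by 2^{1/2}
        target = 2 * len(points)              # draw as many points as we already have
        while len(points) < target:
            x, y = rng.uniform(-half, half), rng.uniform(-half, half)
            if max(abs(x), abs(y)) > inner:   # keep only points outside the previous square
                points.append((x, y))
    return points


locations = nested_square_locations()
sample_sizes = [100 * 2 ** j for j in range(1, 6)]        # 200, 400, ..., 3200
nested_samples = {n: locations[:n] for n in sample_sizes}
```

Because the area of each square equals the number of points it contains, the density of locations stays constant while the domain grows, which is the increasing domain regime.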

We expect any successful unmixing estimator $\widehat{\Gamma}$ to satisfy $\widehat{\Gamma}{\Omega}\approx{I}_{p}$ up to sign changes and row permutations. The minimum distance index (Ilmonen et al., 2010b) is defined as,

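$$\mathrm{MDI}(\widehat{\Gamma})=\frac{1}{(p-1)^{1/2}}\inf_{C\in\mathcal{C}}\|C\widehat{\Gamma}\Omega-I_{p}\|,$$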

where $\mathcal{C}$ is the set of all matrices with exactly one non-zero element in each row and column and $\|\cdot\|$ is the Frobenius norm. The minimum distance index measures how close $\widehat{\Gamma}\Omega$ is to the identity matrix up to scaling, order and signs of its rows, and $0\leq\mathrm{MDI}(\widehat{{\Gamma}})\leq 1$ with lower values indicating more efficient estimation. Moreover, for any $\widehat{{\Gamma}}$ such that ${n}^{1/2}\mathrm{vect}(\widehat{{\Gamma}}-{I}_{p})\rightarrow\mathcal{N}({0},{\Sigma})$ for some limiting covariance matrix ${\Sigma}$ , the transformed index $n(p-1)\mathrm{MDI}(\widehat{{\Gamma}})^{2}$ converges to a limiting distribution $\sum_{i=1}^{k}\delta_{i}\chi^{2}_{i}$ where $\chi^{2}_{1},\ldots,\chi^{2}_{k}$ are independent chi-squared random variables with one degree of freedom and $\delta_{1},\ldots,\delta_{k}$ are the $k$ non-zero eigenvalues of the matrix,

[TABLE]

where ${D}_{p,p}=\sum_{j=1}^{p}{E}^{jj}\otimes{E}^{jj}$ and $E^{jj}$ is the $p\times p$ matrix with one as its $(j,j)$ th element and all other elements equal to zero, and $\otimes$ is the usual tensor matrix product. In particular, the expected value of the limiting distribution is the sum of the limiting variances of the off-diagonal elements of $\widehat{{\Gamma}}$ . This provides us with a useful single-number summary to measure the asymptotic efficiency of the method, i.e., the mean value of $n(p-1)\mathrm{MDI}(\widehat{{\Gamma}})^{2}$ over several replications.
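For small $p$ the minimum distance index can be evaluated directly. The sketch below assumes the usual definition $(p-1)^{-1/2}\inf_{C\in\mathcal{C}}\|C\widehat{\Gamma}\Omega-I_{p}\|$ and uses the fact that, for a fixed assignment of rows to coordinates, the optimal row scalings are available in closed form, so that only the permutation has to be searched. The brute-force search over permutations and the function name are illustrative assumptions and only suitable for small dimensions.

```python
import itertools
import math


def minimum_distance_index(G):
    """Minimum distance index of G = Gamma_hat Omega, given as a list of p rows (p >= 2)."""
    p = len(G)
    # Squared entries, normalised so that each row sums to one.
    G2 = [[g * g / sum(x * x for x in row) for g in row] for row in G]
    # Best assignment of the rows of G to the coordinate axes (brute force).
    best = max(sum(G2[perm[i]][i] for i in range(p))
               for perm in itertools.permutations(range(p)))
    return math.sqrt(max(p - best, 0.0) / (p - 1))


# A rescaled, sign-changed permutation matrix: perfect separation.
perfect = [[0.0, -2.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 3.0]]
# Identical rows: no separation at all.
worst = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

mdi_perfect = minimum_distance_index(perfect)
mdi_worst = minimum_distance_index(worst)
```

The two toy inputs attain the lower and upper bounds of the index, respectively, which matches its interpretation as a distance from the identity up to scaling, order and signs of the rows.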

Following the argument of the proof of Proposition B.17 in the supplementary material, our spatial blind source separation estimators are affine equivariant. More precisely, let $\widehat{\Gamma}(I_{p})$ be computed from $\{Z(s_{i})\}_{i=1,\ldots,n}$ according to (4.1) and recall that $\widehat{\Gamma}$ is computed from $\{X(s_{i})\}_{i=1,\ldots,n}$ according to (4.1). Then we have $\widehat{\Gamma}=\widehat{\Gamma}(I_{p})\Omega^{-1}$ , up to sign changes and row permutations. In this sense, $\widehat{\Gamma}{\Omega}$ is invariant to the value of ${\Omega}$ . As the minimum distance index depends on $\widehat{\Gamma}$ only through $\widehat{\Gamma}{\Omega}$ , it is thus without loss of generality that we may consider throughout § 5 only the trivial mixing case ${\Omega}={I}_{3}$ . Taking different $\Omega$ into consideration would give exactly the same results as those provided below.

Recall that the ball and ring kernels are defined as $B(h)(s)=I(\|s\|\leq h)$ and $R(h_{1},h_{2})(s)=I(h_{1}\leq\|s\|\leq h_{2})$ for fixed $h\geq 0$ and $h_{2}\geq h_{1}\geq 0$ . We simulate 2000 replications for each sample size $n$ and estimate the unmixing matrix in each case with three different choices for the local covariance matrix kernels: $B(1),R(1,2)$ and $\{B(1),R(1,2)\}$ , where the argument $s$ is dropped and the brackets $\{\}$ denote the joint diagonalisation of the kernels inside. The latent covariance functions on the left panel of Fig. 5.1 show that the dependencies of the last two fields die off rather quickly, and we would expect that already very local information is sufficient to separate the fields. Moreover, out of all one-unit intervals, the magnitudes of the three covariance functions differ the most from each other in the interval from 1 to 2 and we may reasonably assume that either $R(1,2)$ or $\{B(1),R(1,2)\}$ will be the most efficient choice.
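The ball and ring kernels and the corresponding local covariance matrices $\widehat{M}(f)=n^{-1}\sum_{i}\sum_{j}f(s_{i}-s_{j})X(s_{i})X(s_{j})^{\mathrm{\scriptscriptstyle T}}$ can be written down directly. The following Python sketch assumes centred observations stored as lists and uses a plain double loop, which is adequate only for small $n$ . All names and the toy data are illustrative.

```python
import math


def ball(h):
    """Ball kernel B(h)(s) = I(||s|| <= h)."""
    return lambda s: 1.0 if math.hypot(*s) <= h else 0.0


def ring(h1, h2):
    """Ring kernel R(h1, h2)(s) = I(h1 <= ||s|| <= h2)."""
    return lambda s: 1.0 if h1 <= math.hypot(*s) <= h2 else 0.0


def local_covariance(f, locations, X):
    """n^{-1} sum_i sum_j f(s_i - s_j) X(s_i) X(s_j)^T for centred data X."""
    n, p = len(locations), len(X[0])
    M = [[0.0] * p for _ in range(p)]
    for si, xi in zip(locations, X):
        for sj, xj in zip(locations, X):
            w = f(tuple(a - b for a, b in zip(si, sj)))
            if w == 0.0:
                continue
            for a in range(p):
                for b in range(p):
                    M[a][b] += w * xi[a] * xj[b]
    return [[v / n for v in row] for row in M]


# Toy data: three locations on a line and bivariate observations.
locations = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

M_cov = local_covariance(ball(0.0), locations, X)        # only the pairs i = j contribute
M_ring = local_covariance(ring(1.0, 2.0), locations, X)  # pairs at distance between 1 and 2
```

With distinct locations, the kernel $B(0)$ keeps only the terms with $i=j$ and thus returns the ordinary covariance matrix of centred data, while the ring kernel uses only pairs of distinct locations whose distance lies in the given interval.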

The mean values of $n(p-1)\mathrm{MDI}(\widehat{{\Gamma}})^{2}$ over the 2000 replications are shown as the solid lines in Fig. 5.3, with the dashed lines representing the asymptotic approximated values of the means, towards which they are expected to converge (see Propositions 3.4 and 4.4). As evidenced in Fig. 5.3, this is indeed what happens. For the reasons detailed in the previous paragraph, the kernel $R(1,2)$ is notably a more efficient choice than $B(1)$ . However, the ball kernel still carries some additional information to the ring as their joint diagonalisation, $\{B(1),R(1,2)\}$ , gives the best results out of the three choices, albeit marginally. As the main purpose of the current simulation was to verify the limiting theorems and compare the different choices of kernels, the estimation accuracy of the sources was considered jointly, through the minimum distance index. However, as it is possible that some of the individual sources are more difficult to estimate than others, we have included in § C.2 of the supplementary material a simulation study exploring individual component recovery.

The previous investigation and Fig. 5.3 used only the expected value of the asymptotic distribution. In Fig. C.1 of the supplementary material, we have also plotted the estimated densities of $n(p-1)\mathrm{MDI}(\widehat{{\Gamma}})^{2}$ for all local covariance matrices and a few selected sample sizes and compared with the density of the asymptotic approximation estimated from a sample of 100,000 random variables drawn from the corresponding distributions. Overall, the two densities fit each other rather well, especially for the local covariance matrices involving the ring kernel. This shows that the asymptotic approximation to the distribution of $n(p-1)\mathrm{MDI}(\widehat{{\Gamma}})^{2}$ is good already for small sample sizes.

5.3 The effect of range on the efficiency

The second simulation explores the effect of the range of the latent fields on the asymptotically optimal choice of local covariance matrices. The comparisons between the estimators are made on the basis of the expected values of the asymptotic approximations to the distribution of $n(p-1)\mathrm{MDI}(\widehat{{\Gamma}})^{2}$ (that is, using the equivalent of the dashed lines in Fig. 5.3), meaning that no randomness is involved in this simulation.

We consider three-variate random fields ${X}({s})={\Omega}{Z}({s})$ , where ${\Omega}={I}_{3}$ and the latent fields have Matérn covariance functions with respective shape parameters $\kappa=2,1,0.25$ and a range parameter $\phi\in\{1.0,1.1,1.2,\ldots,30.0\}$ . The three covariance functions are shown for $\phi=1$ in the right panel of Fig. 5.1. The random field is observed at three different point patterns: diamond-shaped, rectangular and random, which was simulated once and held fixed throughout the study. The diamond-shaped point pattern has a radius of $m=30$ and a total of $n=1861$ locations, whereas the rectangular point pattern has a “radius” of $m=15$ with a total of $n=1891$ locations. In both patterns, the horizontal and vertical distance between two neighbouring locations is one unit and examples of the two pattern types are shown in the middle and right panels of Fig. 5.2 with a radius $m=10$ . A rectangular pattern with “radius” $m$ is defined to have the width $2m+1$ and the height $m+1$ . The random point pattern is generated simply by simulating $n=1861$ points uniformly in the rectangle $(-30,30)\times(-30,30)$ . We consider a total of eight different local covariance matrices, $B(r),R(r-1,r)$ for $r=1,3,5$ , and the joint diagonalisations of the previous sets: $\{B(1),B(3),B(5)\}$ and $\{R(0,1),R(2,3),R(4,5)\}$ .
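The diamond-shaped and random point patterns are simple to generate. The sketch below assumes that a diamond of radius $m$ consists of the integer lattice points $(x,y)$ with $|x|+|y|\leq m$ , a reading that reproduces the stated count of $1861$ locations for $m=30$ . The function names and the seed are illustrative.

```python
import random


def diamond_pattern(m):
    """Integer lattice points (x, y) with |x| + |y| <= m (assumed reading of 'radius m')."""
    return [(x, y)
            for x in range(-m, m + 1)
            for y in range(-m, m + 1)
            if abs(x) + abs(y) <= m]


def random_pattern(n, half_width=30.0, seed=1):
    """n points drawn uniformly on a square of half side length half_width."""
    rng = random.Random(seed)
    return [(rng.uniform(-half_width, half_width), rng.uniform(-half_width, half_width))
            for _ in range(n)]


diamond = diamond_pattern(30)
n_diamond = len(diamond)                  # equals 2 m^2 + 2 m + 1 for m = 30
uniform_points = random_pattern(n_diamond)
```

Under this reading a diamond of radius $m$ contains $2m^{2}+2m+1$ points, so the random pattern is generated with the same number of locations as the diamond.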

The results of the simulation are displayed in Fig. 5.4 where the two joint diagonalisations are denoted by having value “J” as the parameter $r$ . Recall that the lower the value on the $y$ -axis, the better that particular method is at estimating the three latent fields. The relative ordering of the different curves is very similar across all three plots, and it seems that the choice of the location pattern does not have a large effect on the results. In all the patterns, the local covariance matrices with either $r=1$ or $r=3$ are the best choices for small values of the range $\phi$ but they quickly deteriorate as $\phi$ increases. The opposite happens for the local covariance matrices with $r=5$ ; they are among the worst for small $\phi$ and relatively improve with increasing $\phi$ . The joint diagonalisation-based choices fall somewhere in-between and are never the best nor the worst choice. However, they yield performance very close to the best choice in the right end of the range-scale and are close to the optimal ones in the left end. Thus, their use could be justified in practice as the “safe choice”. Comparing the two types of local covariance matrices, balls and rings, we observe that in the majority of cases the rings prove superior to the balls.

5.4 Efficiency comparison

To compare a larger number of local covariance matrices and their combinations, we simulate three-variate random fields ${X}({s})={\Omega}{Z}({s})$ , where ${\Omega}={I}_{3}$ and the latent fields have Matérn covariance functions with the shape parameters $\kappa=6,1,0.25$ and the range parameter $\phi=20$ , in kilometers. We consider two different fixed-location patterns fitted inside the map of Finland; see Fig. 5.5. The first location pattern has the locations drawn uniformly from the map and the second location pattern is drawn from a west-skew distribution. Both patterns have a total of $n=1000$ locations and to better distinguish the scale we have added three concentric circles with respective radii of 10, 20, and 30 kilometers in the empty area of the skew map.

We simulate a total of 2000 replications of the above scheme with the fixed maps. In each case we compute the minimum distance index values of the estimates obtained with the local covariance matrix kernels $B(r),R(r-10,r),G(r)$ , where $r=10,20,30,100$ , and the joint diagonalisation of each of the three quadruplets $\{B(10),B(20),B(30),B(100)\}$ , $\{R(0,10),R(10,20),R(20,30),R(90,100)\}$ and $\{G(10),G(20),G(30),G(100)\}$ , adding up to a total of 15 estimators. The Gaussian kernel is parametrized as $G(r)\equiv\exp[-0.5\{\Phi^{-1}(0.95)s/r\}^{2}]$ , where $s$ is the distance and $\Phi^{-1}(x)$ is the quantile function of the standard normal distribution, making $G(r)$ have $90\%$ of its total mass in the radius $r$ ball around its center. Thus, $G(r)$ can be considered a smooth approximation of $B(r)$ . The larger radius kernels $B(100)$ , $R(90,100)$ , $G(100)$ are included in the simulation to investigate what happens when we overestimate the dependency radius. The mean minimum distance index values for the 15 estimators are plotted in Fig. 5.6 and show that for both maps and all local covariance types, increasing the radius yields more accurate separation results all the way up to $r=30$ , whereas for $r=100$ the results again worsen. This observation shows that when using a single local covariance matrix, the choice of the type and the radius are especially important, most likely requiring some expert knowledge on the study. However, this problem is completely averted when we use the joint diagonalisation of several matrices. For both maps and all local covariance types the joint diagonalisation produces results very comparable to the best individual matrices, even though the joint diagonalisations also include the “bad choices”, $r=10,20,100$ .
We also observe a similar behaviour in the first and second simulation studies where, in the absence of knowledge on the optimal choice, the joint diagonalisation either is the most efficient choice or provides a performance very close to the most efficient choice. Thus, we recommend the use of the joint diagonalisation of scatter matrices with a sufficiently large variation of radii for the kernels.
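The parametrization of the Gaussian kernel used in this comparison can be checked numerically. The sketch below evaluates $G(r)$ as a function of the distance $s$ and verifies the $90\%$ mass statement, read here as a statement about the one-dimensional Gaussian profile along the distance axis, which is an interpretive assumption. It relies only on `statistics.NormalDist` from the Python standard library, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def gaussian_kernel(r):
    """G(r)(s) = exp[-0.5 {Phi^{-1}(0.95) s / r}^2] as a function of the distance s."""
    q = NormalDist().inv_cdf(0.95)          # Phi^{-1}(0.95), roughly 1.645
    return lambda s: math.exp(-0.5 * (q * s / r) ** 2)


def ball_kernel(r):
    """B(r) as a function of the distance s."""
    return lambda s: 1.0 if s <= r else 0.0


g10, b10 = gaussian_kernel(10.0), ball_kernel(10.0)
values = [(s, b10(s), g10(s)) for s in (0.0, 5.0, 10.0, 20.0)]

# G(10) is proportional to a normal density with standard deviation 10 / Phi^{-1}(0.95);
# the mass of that density on [-10, 10] is Phi(q) - Phi(-q) with q = Phi^{-1}(0.95).
profile = NormalDist(0.0, 10.0 / NormalDist().inv_cdf(0.95))
mass_within_r = profile.cdf(10.0) - profile.cdf(-10.0)
```

The kernel equals one at distance zero and has dropped to roughly a quarter of that value at $s=r$ , so that, unlike $B(r)$ , it still gives some weight to pairs of locations slightly beyond the radius.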

Finally, a comparison between the two maps reveals that the relative behaviour of the estimators is roughly the same in both maps, but the estimation is generally more difficult in the skew map, revealed by the on average higher minimum distance index values. This is explained by the large number of isolated points which contribute no information to the estimation of the local covariance matrices, making the sample size essentially smaller than $n=1000$ .

6 Data application

To illustrate the benefit of jointly diagonalising more than two scatter matrices from a practical point of view, we reconsider the moss data from the Kola project which are available in the R package StatDA (Filzmoser, 2015) and described in Reimann et al. (2008), for example. The data consist of 594 samples of terrestrial moss collected at different sites in northern Europe on the borders of Norway, Finland and Russia. The corresponding map with sampling locations is given in the online supplement in Fig. D.1. The amounts of 31 chemical elements found in the moss samples were already used as a spatial blind source separation example in Nordhausen et al. (2015) where the covariance matrix and $B(50)$ were simultaneously diagonalized. The goal of that analysis was to reveal interpretable components exhibiting clear spatial patterns. In Nordhausen et al. (2015), the radius of 50 kilometers was carefully chosen by an expert in that analysis and considered best compared to several other radii not mentioned there. The analysis found six meaningful components, which could be used to distinguish underlying natural geological patterns from environmental pollution patterns. These six components had the six largest eigenvalues and are visualized in Fig. D.2 in the online supplement.

We show that the gold standard components can be stably estimated without subject knowledge on the optimal radius by simply jointly diagonalizing a large enough collection of local covariance matrices. To address the compositional nature of the data, we follow the same data preparation steps as in Nordhausen et al. (2015) and then compute five competing spatial blind source separation estimates. The scatters we used in addition to the covariance matrix are detailed in Table 6. Using these methods, we identify the six components with the highest correlations, in absolute values, to the six main components identified in Nordhausen et al. (2015). Table 6 gives the correlations of the six components.

The table shows that when using only two scatters, estimators 1, 2 and 3, some components cannot be easily found. However, when jointly diagonalising more than two scatters, the results are more stable and less dependent on the chosen distances of the scatters as can be seen for estimators 4 and 5.

This is illustrated using the gold standard and estimators 3 and 4 in Fig. A.1 in the Appendix for the first two components. For completeness, § D of the online supplement contains all six components for the three estimators. The first two components represent, according to Nordhausen et al. (2015), areas with different types of industrial contamination and Figure A.1 shows that the gold standard and estimator 4 agree quite well on these, but estimator 3 yields a different map. More precisely, the first component obtained by the gold standard and the estimator 4 highlights a cluster of negative scores around the Monchegorsk and Apatity region, which reveals the mining and processing of alkaline deposits. This cluster is not revealed by estimator 3. Similarly, the second components are similar between the gold standard and the estimator 4, but the component from the estimator 3 differs from these two, especially for the sampling locations in Finland. Thus, using several scatters gives a more stable impression whereas the maps can vary considerably when only two scatters are used, in which case subject expertise becomes more relevant.

7 Discussion

Our proposed methodology can be extended in multiple directions in future work. The assumptions of Gaussian or stationary fields could be relaxed. The spatial and temporal blind source separation methodologies could be combined to obtain spatio-temporal blind source separation. If used for dimension reduction, estimators for the number of latent non-noise fields could be devised using strategies similar to those in Virta & Nordhausen (2019). Additionally, the combination of spatial blind source separation with univariate kriging and univariate modelling warrants investigation.

How to choose the local covariance matrices optimally is also of interest. This is still an open problem for temporal blind source separation methods, such as second-order blind identification (Belouchrani et al., 1997). Several strategies have been suggested, see for example Tang et al. (2005), and many of them could be useful also in selecting the kernels in spatial blind source separation. The estimation accuracy of our proposed method is based on how well separated the eigenvalues of the matrices $M(f_{0})^{-1/2}M(f_{l})M(f_{0})^{-1/2}$ , $l=1,\ldots,k$ , are. Since the connection between the eigenvalues and the unknown covariance functions is complicated, our suggestion, backed up also by the simulations, is to stay on the safe side and jointly use a large number of ring kernels. However, including large numbers of unnecessary kernels can still have the drawback of inducing some noise to the estimates. One way to remove the unneeded kernels would be to first obtain preliminary estimates for the latent fields using a large number of kernels jointly. Then, our asymptotic results could be used to select from a large collection of sets of kernels, the one which achieves the smallest value of $\delta_{1}+\cdots+\delta_{k}$ ; see § 5.2. The final estimates could then be computed with this asymptotically optimal choice of kernels. A similar technique was used in the context of temporal blind source separation in Taskinen et al. (2016).

Acknowledgement

The work of Nordhausen, Ruiz-Gazen and Virta was partly supported by the CRoNoS Cost action. The work of Nordhausen was also partly supported by the Austrian Science Fund. Ruiz-Gazen acknowledges funding from the French National Research Agency (ANR) under the Investments for the Future (Investissements d’Avenir) program. The authors are very grateful for the comments by the referees which helped considerably to improve the manuscript.

Appendix A Appendix

A.1 Notation

Let $y$ and $z$ be the $np\times 1$ vectors defined by $y_{(i-1)p+j}=X_{j}(s_{i})$ and $z_{(i-1)p+j}=Z_{j}(s_{i})$ , for $i=1,\ldots,n$ , $j=1,\ldots,p$ . Let $R=\mathrm{cov}(y)$ and $R_{z}=\mathrm{cov}(z)$ . Let $e_{b}(p)$ be the $b$ th base column vector of $\mathbb{R}^{p}$ for $b=1,\ldots,p$ . For $f:\mathbb{R}^{d}\to\mathbb{R}$ and for $b,l=1,\ldots,p$ , let $T_{b,l}(f)$ be the $np\times np$ matrix, that we see as a block matrix composed of $n^{2}$ blocks of size $p\times p$ , and with block $i,j$ equal to $f(s_{i}-s_{j})(1/2)\{e_{b}(p)e_{l}(p)^{\mathrm{\scriptscriptstyle T}}+e_{l}(p)e_{b}(p)^{\mathrm{\scriptscriptstyle T}}\}$ .

For $b\in\mathbb{N}$ , we let $\mathcal{D}(b)=\{1+(i-1)(b+1);i=1,\ldots,b\}$ . We remark that $\{\mathrm{vect}(M)_{i};i\in\mathcal{D}(b)\}=\{M_{i,i};i=1,\ldots,b\}$ for a $b\times b$ matrix $M$ . Let $\bar{\mathcal{D}}(b)=\{1,\ldots,b^{2}\}\backslash\mathcal{D}(b)$ . We remark that $\{\mathrm{vect}(M)_{i};i\in\bar{\mathcal{D}}(b)\}=\{M_{i,j};i,j=1,\ldots,b,i\neq j\}$ for a $b\times b$ matrix $M$ . For $a\in\{1,\ldots,b^{2}\}$ , let $I_{b}(a)$ and $J_{b}(a)$ be the unique $i,j\in\{1,\ldots,b\}$ so that $a=b(i-1)+j$ . For $i\in\{1,\ldots,b\}$ , let $d_{b}(i)=1+(i-1)(b+1)$ and note that $\{\mathrm{vect}(M)_{d_{b}(i)};i=1,\ldots,b\}=\{M_{i,i};i=1,\ldots,b\}$ for a $b\times b$ matrix $M$ . For a matrix $M$ of size $b\times b$ , recall that $\mathrm{diag}(M)=(M_{1,1},\ldots,M_{b,b})^{\mathrm{\scriptscriptstyle T}}$ and that $\mathrm{tr}(M)$ denotes its trace.
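The index conventions above can be checked on a small example. The following Python sketch implements the row vectorization, the set $\mathcal{D}(b)$ of diagonal positions and the maps $I_{b}$ and $J_{b}$ , keeping the one-based indexing of the text. The function names are hypothetical.

```python
def vect(M):
    """Row vectorization: the rows of M stacked into one list."""
    return [x for row in M for x in row]


def diag_positions(b):
    """The set D(b): one-based positions of the diagonal of a b x b matrix in vect(M)."""
    return [1 + (i - 1) * (b + 1) for i in range(1, b + 1)]


def row_col(a, b):
    """(I_b(a), J_b(a)): the unique one-based (i, j) with a = b (i - 1) + j."""
    return (a - 1) // b + 1, (a - 1) % b + 1


M = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]
v = vect(M)
on_diag = [v[a - 1] for a in diag_positions(3)]
off_diag = [v[a - 1] for a in range(1, 10) if a not in diag_positions(3)]
```

For $b=3$ the diagonal positions are $1$ , $5$ and $9$ , and the remaining six positions collect the off-diagonal elements, in agreement with the two remarks on $\mathcal{D}(b)$ and its complement.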

A.2 Expression of the matrix $V(f,f_{0})$ from Proposition 3.2

Let $f,g:\mathbb{R}^{d}\to\mathbb{R}$ . Using the notation of Appendix A.1, let $\Sigma(f)$ and $\Sigma(f,g)$ be the $p^{2}\times p^{2}$ matrices defined by, for $i=(s-1)p+t$ and $j=(u-1)p+v$ , with $s,t,u,v\in\{1,\ldots,p\}$ ,

[TABLE]

Let

[TABLE]

Then $V(f,f_{0})$ is equal to $V(f,g)$ for $g=f_{0}$ .

A.3 Expression of the matrix ${F_{1}}$ from Proposition 3.4

From Assumption 3, there exists $n_{0}\in\mathbb{N}$ such that for $n\geq n_{0}$ the diagonal elements of $\Omega^{-1}M(f)\Omega^{-{\mathrm{\scriptscriptstyle T}}}$ are strictly decreasing. Write these diagonal elements as $\lambda_{1}>\cdots>\lambda_{p}$ . Using the notation of Appendix A.1, for $n\geq n_{0}$ , let $A$ , $B$ , $C$ and $D$ be respectively the $p^{2}\times p^{2}$ , $p^{2}\times p^{2}$ , $p\times p^{2}$ and $p\times p^{2}$ matrices defined by

[TABLE]

Let

[TABLE]

Let $M_{\Omega^{-1}}$ and $\bar{M}_{\Omega^{-1}}$ be respectively the $p^{2}\times p^{2}$ and $(p^{2}+p)\times(p^{2}+p)$ matrices defined by

[TABLE]

Let $\tilde{V}(f)$ be defined as $V(f_{0},f)$ but with $R$ replaced by $R_{z}$ . Then, for $n\geq n_{0}$ , ${F_{1}}$ is defined as

[TABLE]

A.4 Expression of the matrix ${F_{k}}$ from Proposition 4.4

Let $D(f)=\Omega^{-1}M(f)\Omega^{-{\mathrm{\scriptscriptstyle T}}}$ . For a diagonal matrix $\Lambda$ , let $\Lambda_{r}=\Lambda_{r,r}$ . Let $A_{0},A_{1},\ldots,A_{k}$ and $B$ be $p^{2}\times p^{2}$ matrices defined by, for $n\geq n_{0}$ with the notation of Assumption 4,

[TABLE]

and

[TABLE]

Let $G$ be the $p^{2}\times(k+1)p^{2}$ matrix defined by $G=B(A_{0},A_{1},\ldots A_{k})$ , for $n\geq n_{0}$ . Let $M_{\Omega^{-1}}$ be as in Appendix A.3. Let $\tilde{V}(f_{1},\ldots,f_{k})$ be the $(k+1)p^{2}\times(k+1)p^{2}$ matrix composed of $(k+1)^{2}$ blocks of size $p^{2}\times p^{2}$ with block $(i+1),(j+1)$ defined similarly as $\Sigma(f_{i},f_{j})$ in Appendix A.2, but with $R$ replaced by $R_{z}$ . Then, for $n\geq n_{0}$ , ${F_{k}}$ is defined as

[TABLE]

A.5 Map for data application

Appendix B Proofs

B.1 Introduction

We first prove Proposition 2.3 in Section B.2. Then Section B.3 provides general results for the proofs of Propositions 3.1, 3.2, 3.4, 4.3 and 4.4. Section B.4 provides the proofs of Propositions 3.1, 3.2 and 3.4. Section B.5 provides the proof of Proposition 4.2. Section B.6 provides the proofs of Propositions 4.3 and 4.4.

Propositions 3.1, 3.2, 3.4, 4.3 and 4.4 correspond to Proposition B.7, B.9, B.17, B.27 and B.29, respectively.

B.2 Proof of Proposition 2.3

We let $D(f)=\Omega^{-1}M(f)\Omega^{-{\mathrm{\scriptscriptstyle T}}}$ for $f:\mathbb{R}^{d}\to\mathbb{R}$ . We restate Proposition 2.3 and prove it.

Proposition B.1.

*The unmixing problem given by $f$ is identifiable if and only if the diagonal elements of $\Omega^{-1}M(f)\Omega^{-{\mathrm{\scriptscriptstyle T}}}$ are distinct. *

Proof B.2 (of Proposition B.1 (Proposition 2.3)).

We have that $\Gamma(f)$ is an unmixing functional if and only if

[TABLE]

*Hence the matrix $\Gamma(f)\Omega$ is orthogonal, and its rows provide the eigenvectors of the diagonal matrix $D(f)$ . If the diagonal elements of $D(f)$ are distinct, then the set of one-dimensional eigenspaces of $D(f)$ is unique and thus $\Gamma(f)\Omega=PS$ where $P$ is a permutation matrix and $S$ is a diagonal matrix with diagonal elements equal to $-1$ or $1$ . If there are two diagonal elements of $D(f)$ that are equal, say the first and the second without loss of generality, then consider the $p\times p$ block diagonal matrix $Q$ with first $2\times 2$ block equal to $\{(2^{-1/2},2^{-1/2})^{\mathrm{\scriptscriptstyle T}},(2^{-1/2},-2^{-1/2})^{\top}\}$ and second $(p-2)\times(p-2)$ block equal to $I_{p-2}$ . Then there is an unmixing functional $\Gamma(f)$ such that $\Gamma(f)\Omega=Q$ . In this case, $\Gamma(f)=Q\Omega^{-1}$ , which is not of the form $PS\Omega^{-1}$ . *

B.3 General results

Recall that $d\in\mathbb{N}$ and $p\in\mathbb{N}$ are fixed. $Z_{1},\ldots,Z_{p}$ are $p$ independent stationary Gaussian processes on $\mathbb{R}^{d}$ with zero mean functions, unit variances and covariance functions $K_{1},\ldots,K_{p}$ . We have $Z=(Z_{1},\ldots,Z_{p})^{\mathrm{\scriptscriptstyle T}}$ and $X=(X_{1},\ldots,X_{p})^{\mathrm{\scriptscriptstyle T}}=\Omega Z$ with $\Omega$ a fixed invertible $p\times p$ matrix.

Let $s_{1},\ldots,s_{n}$ be the $n$ observation points in $\mathcal{S}^{d}$ and let $f$ be a kernel function from $\mathbb{R}^{d}$ into $\mathbb{R}$ . We recall

[TABLE]

and

[TABLE]

where $D(s_{i},s_{j})$ is the $p\times p$ diagonal matrix defined by

[TABLE]
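As a computational illustration of the estimator $\widehat{M}(f)=n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n}f(s_{i}-s_{j})X(s_{i})X(s_{j})^{\mathrm{\scriptscriptstyle T}}$, the following minimal sketch evaluates the double sum directly. The function names, the toy sites and values, and the ball-shaped kernel are assumptions made for this example only; the quadratic cost in $n$ is acceptable for a sketch but not for large data sets.

```python
def local_covariance(sites, values, kernel):
    """Compute n^{-1} sum_i sum_j kernel(s_i - s_j) X(s_i) X(s_j)^T.

    sites  : list of n coordinate tuples s_i in R^d
    values : list of n observation vectors X(s_i) in R^p
    kernel : function mapping a difference vector to a real weight
    """
    n, p = len(sites), len(values[0])
    m = [[0.0] * p for _ in range(p)]
    for si, xi in zip(sites, values):
        for sj, xj in zip(sites, values):
            w = kernel(tuple(a - b for a, b in zip(si, sj)))
            if w == 0.0:
                continue
            for a in range(p):
                for b in range(p):
                    m[a][b] += w * xi[a] * xj[b]
    return [[entry / n for entry in row] for row in m]

def f0(h):
    """Kernel f_0(x) = I(x = 0): only the pairs with s_i = s_j contribute."""
    return 1.0 if all(c == 0 for c in h) else 0.0

def ball_kernel(radius):
    """Hypothetical ball kernel I(|h| <= radius), with |.| the sup norm."""
    return lambda h: 1.0 if max(abs(c) for c in h) <= radius else 0.0

# Toy data: four distinct sites in the plane, two-dimensional observations.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
values = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.0], [2.0, 1.0]]

m0 = local_covariance(sites, values, f0)                 # plain second-moment matrix
m1 = local_covariance(sites, values, ball_kernel(1.0))   # local covariance matrix
```

With distinct sites, `f0` reduces the double sum to $n^{-1}\sum_{i}X(s_{i})X(s_{i})^{\mathrm{\scriptscriptstyle T}}$, and any kernel with $f(h)=f(-h)$ yields a symmetric matrix.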

Let $|x|=\max_{i=1,\dots,m}|x_{i}|$ be the sup norm for $x\in\mathbb{R}^{m}$ . Since this norm is equivalent to the Euclidean norm, and since we work under Assumptions 3.1 to 3, we can assume without loss of generality that the following conditions hold.

Condition.

With $\Delta>0$ defined in Assumption 3.1, for all $n\in\mathbb{N}$ and for all $a\neq b$ , $a,b\in\{1,\ldots,n\}$ , we have $|s_{a}-s_{b}|\geq\Delta$ .

Condition.

With $A<+\infty$ and $\alpha>0$ defined in Assumptions 3.1 and 3, for all $s\in\mathbb{R}^{d}$ and for all $k=1,\ldots,p$ , we have

[TABLE]

Condition.

With $A<+\infty$ and $\alpha>0$ defined in Assumptions 3.1 and 3, for all $s\in\mathbb{R}^{d}$ , we have

[TABLE]

For a matrix $M$, denote by $M_{i,j}$ the element from the $i$th row and the $j$th column of $M$. For a vector $V_{n}$ or a matrix $M_{n}$, denote by $(V_{n})_{i}$ the $i$th element of $V_{n}$ and by $(M_{n})_{i,j}$ the element from the $i$th row and the $j$th column of $M_{n}$. The singular values of an $n\times n$ matrix $M$ are denoted by $\rho_{1}(M)\geq\dots\geq\rho_{n}(M)\geq 0$ and, in the case when $M$ is symmetric, the eigenvalues are denoted by $\lambda_{1}(M)\geq\dots\geq\lambda_{n}(M)$. The spectral norm is given by $\rho_{1}(M)$ and $\|M\|_{F}$ denotes the Frobenius norm, with $\|M\|_{F}^{2}=\sum_{i,j}(M_{i,j})^{2}$. For a sequence of random variables $X_{n}$, we write $X_{n}=o_{p}(1)$ when $X_{n}$ converges to $0$ in probability as $n\to\infty$ and we write $X_{n}=O_{p}(1)$ when $X_{n}$ is bounded in probability as $n\to\infty$. Let $e_{i}(k)$ be the $i$th canonical basis column vector of $\mathbb{R}^{k}$. Let $y$ be the $np\times 1$ vector defined by $y_{(i-1)p+j}=X_{j}(s_{i})$, for $i=1,\ldots,n$, $j=1,\ldots,p$.

Lemma B.3.

Under Conditions B.3 and B.3, there exists a finite constant $C<+\infty$ so that for all $n\in\mathbb{N}$ ,

[TABLE]

Proof B.4.

Let $\ell_{a}^{\mathrm{\scriptscriptstyle T}}$ denote the $a$ th row of $\Omega$ . We have

[TABLE]

from Condition B.3. Note also that $\lambda_{1}\{\mathrm{cov}(y)\}=\lambda_{1}\{\mathrm{cov}(\tilde{y})\}$ where $\tilde{y}$ is the $np\times 1$ vector defined by $\tilde{y}_{(j-1)n+i}=X_{j}(s_{i})$ for $i=1,\ldots,n$, $j=1,\ldots,p$. Hence, the lemma is a direct consequence of Lemma 6 in Furrer et al. (2016).
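The equality of the largest eigenvalues follows from a reordering of coordinates. As a short sketch, write $\Pi$ for the $np\times np$ permutation matrix that maps $y$ to $\tilde{y}$ (the symbol $\Pi$ is introduced here for illustration only):

```latex
\[
\tilde{y}=\Pi y,
\qquad
\mathrm{cov}(\tilde{y})=\Pi\,\mathrm{cov}(y)\,\Pi^{\mathrm{T}},
\qquad
\Pi^{\mathrm{T}}\Pi=I_{np}.
\]
```

Since $\Pi$ is orthogonal, $\mathrm{cov}(\tilde{y})$ and $\mathrm{cov}(y)$ are similar matrices and therefore share the same eigenvalues.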

The next theorem provides a general multivariate central limit theorem for quadratic forms of Gaussian vectors. It extends standard central limit theorems in spatial statistics, see, e.g., Bachoc (2014) or Istas & Lang (1997), by allowing cases where the sequence of covariance matrices is non-converging or asymptotically singular. The full proof is given for completeness, although some of the arguments have appeared previously.

Theorem B.5.

Let $(y_{n})$ be a sequence of $n$ -dimensional centered Gaussian vectors. Let $R_{n}$ be the covariance matrix of $y_{n}$ . Assume that for all $n$ , $\lambda_{1}(R_{n})\leq A$ where $A$ is a fixed finite constant. Let $k\in\mathbb{N}$ be fixed and let $(T_{1,n}),\ldots,(T_{k,n})$ be $k$ sequences of deterministic $n\times n$ symmetric matrices. Assume that for $i=1,\ldots,k$ , for $n\in\mathbb{N}$ , $\rho_{1}(T_{i,n})\leq A$ . Let $\Sigma_{n}$ be the $k\times k$ matrix defined for $1\leq i,j\leq k$ , by

[TABLE]

Let $r_{n}$ be the $k$ -dimensional vector defined for $i=1,\ldots,k$ , by

[TABLE]

Let $V_{n}$ be the $k\times 1$ vector defined for $1\leq i\leq k$ , by

[TABLE]

Let $Q_{n}$ be the probability measure of ${n}^{1/2}(V_{n}-r_{n})$ on $\mathbb{R}^{k}$. Let $\mathcal{N}(0,\Sigma_{n})$ be the Gaussian distribution on $\mathbb{R}^{k}$ with mean vector $0$ and covariance matrix $\Sigma_{n}$. Let $d_{w}$ denote a metric generating the topology of weak convergence on the set of Borel probability measures on $\mathbb{R}^{k}$; for specific examples see the discussion in Dudley (2002) p. 393. Then we have, for $n\to\infty$,

[TABLE]

Proof B.6.

Assume that $d_{w}\{Q_{n},\mathcal{N}(0,\Sigma_{n})\}\not\to 0$ when $n\to\infty$ . Then there exists $\epsilon>0$ fixed and a subsequence $n_{m}$ so that $d_{w}\{Q_{n_{m}},\mathcal{N}(0,\Sigma_{n_{m}})\}\geq\epsilon$ . Let $a_{1},\ldots,a_{k}\in\mathbb{R}$ be fixed. Let $S_{n_{m}}=R_{n_{m}}^{1/2}(\sum_{i=1}^{k}a_{i}T_{i,n_{m}})R_{n_{m}}^{1/2}$ . We have

[TABLE]

Hence, we see that $\Sigma_{n_{m}}$ is a non-negative definite matrix, and, from the assumptions on $(R_{n_{m}})$ and $(T_{i,n_{m}})$, that $(\Sigma_{n_{m}})_{i,i}\leq 2A^{4}$. Also, $|(r_{n})_{i}|=|n^{-1}\mathrm{tr}(R_{n}^{1/2}T_{i,n}R_{n}^{1/2})|\leq A^{2}$. Hence, by compactness, and up to extracting a further subsequence, we can assume that $r_{n_{m}}\to r$ and $\Sigma_{n_{m}}\to\Sigma$ when ${n_{m}\to\infty}$. One can readily show that $d_{w}\{\mathcal{N}(0,\Sigma_{n_{m}}),\mathcal{N}(0,\Sigma)\}\to 0$ when ${n_{m}\to\infty}$. Hence, when $n_{m}\to\infty$,

[TABLE]

Let us prove (B.2). We have

[TABLE]

where we have applied the triangle inequality for the metric $d_{w}$ in the last inequality above. Hence $\limsup d_{w}\{Q_{n_{m}},\mathcal{N}(0,\Sigma)\}=\limsup d_{w}\{Q_{n_{m}},\mathcal{N}(0,\Sigma_{n_{m}})\}\geq\epsilon$ . Thus (B.2) is proved.

We remark that the matrix $S_{n_{m}}=R_{n_{m}}^{1/2}(\sum_{i=1}^{k}a_{i}T_{i,n_{m}})R_{n_{m}}^{1/2}$ is symmetric, because $T_{1,n_{m}},\ldots,T_{k,n_{m}}$ are assumed to be symmetric in the theorem. Hence, $S_{n_{m}}$ can be diagonalized and there exist a matrix $P_{n_{m}}$ such that $P_{n_{m}}P_{n_{m}}^{\mathrm{\scriptscriptstyle T}}=I_{n_{m}}$ and a diagonal matrix $D_{n_{m}}$ such that $S_{n_{m}}=P_{n_{m}}D_{n_{m}}P_{n_{m}}^{\mathrm{\scriptscriptstyle T}}$ . Let also $z_{n_{m}}=R_{n_{m}}^{-1/2}y_{n_{m}}$ . Observe that $z_{n_{m}}$ follows the $\mathcal{N}(0,I_{n_{m}})$ distribution. We have

[TABLE]

where $\xi_{n_{m}}$ follows the $\mathcal{N}(0,I_{n_{m}})$ distribution. Hence letting

[TABLE]

we have

[TABLE]

If $\sum_{i,j=1}^{k}a_{i}a_{j}\Sigma_{i,j}=0$, then $\sum_{i,j=1}^{k}a_{i}a_{j}(\Sigma_{n_{m}})_{i,j}\to 0$ when $n_{m}\to\infty$. Hence, $2n_{m}^{-1}\mathrm{tr}(S_{n_{m}}^{2})\to 0$ and so $\mathrm{var}(W_{n_{m}})\to 0$. Hence $W_{n_{m}}\to\mathcal{N}(0,0)=\mathcal{N}(0,\sum_{i,j=1}^{k}a_{i}a_{j}\Sigma_{i,j})$ in distribution when $n_{m}\to\infty$.

Now, if $\sum_{i,j=1}^{k}a_{i}a_{j}\Sigma_{i,j}>0$ , one can show from the Lindeberg-Feller central limit theorem that when $n_{m}\to\infty$ , $W_{n_{m}}\to\mathcal{N}(0,\sum_{i,j=1}^{k}a_{i}a_{j}\Sigma_{i,j})$ in distribution, see also Lemma 2 in Istas & Lang (1997).

Hence, since both of the above convergences in distribution hold for any $a_{1},\ldots,a_{k}$, we have, by the Cramér-Wold theorem, that when $n_{m}\to\infty$, ${n_{m}}^{1/2}(V_{n_{m}}-r_{n_{m}})\to\mathcal{N}(0,\Sigma)$ in distribution. This is in contradiction with (B.2). Hence when ${n\to\infty}$

[TABLE]
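The moment computations used in the proof of Theorem B.5 rest on the standard mean and variance of a Gaussian quadratic form. The following is a sketch under the proof's notation, where $d_{1},\ldots,d_{n_{m}}$ denote the diagonal entries of $D_{n_{m}}$ and $\xi_{1},\ldots,\xi_{n_{m}}$ the components of $\xi_{n_{m}}=P_{n_{m}}^{\mathrm{\scriptscriptstyle T}}z_{n_{m}}$; these component symbols are introduced here for illustration, and the omitted displays may present the same facts differently.

```latex
\[
z_{n_m}^{\mathrm{T}} S_{n_m} z_{n_m}
  = \xi_{n_m}^{\mathrm{T}} D_{n_m} \xi_{n_m}
  = \sum_{i=1}^{n_m} d_i \xi_i^{2},
\]
\[
E\Bigl(\sum_{i=1}^{n_m} d_i \xi_i^{2}\Bigr)
  = \sum_{i=1}^{n_m} d_i
  = \mathrm{tr}(S_{n_m}),
\qquad
\mathrm{var}\Bigl(\sum_{i=1}^{n_m} d_i \xi_i^{2}\Bigr)
  = 2\sum_{i=1}^{n_m} d_i^{2}
  = 2\,\mathrm{tr}(S_{n_m}^{2}).
\]
```

Both identities use only that the $\xi_{i}$ are independent standard Gaussian variables, so that $E(\xi_{i}^{2})=1$ and $\mathrm{var}(\xi_{i}^{2})=2$.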

B.4 Asymptotics when diagonalising two matrices

The next proposition gives the consistency of $\widehat{M}(f)$ .

Proposition B.7.

Let Conditions B.3 and B.3 hold and let $f:\mathbb{R}^{d}\to\mathbb{R}$ satisfy Condition B.3. Then as $n\to\infty$, $\widehat{M}(f)-M(f)\to 0$ in probability.

Proof B.8.

Clearly ${E}\{\widehat{M}(f)\}=M(f)$ . Let $k,l\in\{1,\ldots,p\}$ be fixed. In order to prove the proposition, it is sufficient to show that when $n\to\infty$ , $\mathrm{var}\{\widehat{M}(f)_{k,l}\}\to 0$ .
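The sufficiency of the variance condition is an application of Chebyshev's inequality to each entry. As a brief sketch, for any fixed $\epsilon>0$ (a symbol introduced here for illustration):

```latex
\[
\mathrm{pr}\bigl\{\,|\widehat{M}(f)_{k,l}-M(f)_{k,l}|\geq\epsilon\,\bigr\}
  \leq \frac{\mathrm{var}\{\widehat{M}(f)_{k,l}\}}{\epsilon^{2}}
  \longrightarrow 0
  \qquad (n\to\infty),
\]
```

where unbiasedness, $E\{\widehat{M}(f)_{k,l}\}=M(f)_{k,l}$, is what allows the deviation from $M(f)_{k,l}$ to be controlled by the variance alone. Since $p$ is fixed, entrywise convergence gives convergence of the whole matrix.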

We have,

[TABLE]

Let $T_{k,l}(f)$ be the $np\times np$ matrix, that we see as a block matrix composed of $n^{2}$ blocks of size $p\times p$, and with block $i,j$ equal to $f(s_{i}-s_{j})(1/2)\{e_{k}(p)e_{l}(p)^{\mathrm{\scriptscriptstyle T}}+e_{l}(p)e_{k}(p)^{\mathrm{\scriptscriptstyle T}}\}$. We remark that $T_{k,l}(f)$ is symmetric. With this notation,

[TABLE]

The largest singular value of $T_{k,l}(f)$ is bounded as $n\to\infty$ . Indeed, from Gershgorin’s circle theorem, $\rho_{1}\{T_{k,l}(f)\}$ is no larger than $\max_{i=1,\ldots,np}\sum_{j=1}^{np}|T_{k,l}(f)_{i,j}|$ . This maximum is no larger than $\max_{i=1,\ldots,n}\sum_{j=1}^{n}|f(s_{i}-s_{j})|$ . This last quantity is bounded as $n\to\infty$ from Condition B.3 and from Lemma 4 in Furrer et al. (2016).
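The row-sum argument can be illustrated numerically. The sketch below is not the paper's setting but a simplified stand-in: it assumes equally spaced sites on a line and a hypothetical kernel with polynomial decay, builds the matrix with entries $f(s_{i}-s_{j})$, and compares the maximal absolute row sum with a power-iteration estimate of the spectral norm. All function names are illustrative.

```python
import math

def kernel_matrix(n, spacing, f):
    """Matrix with entries f(s_i - s_j) for n equally spaced sites on a line."""
    return [[f((i - j) * spacing) for j in range(n)] for i in range(n)]

def max_abs_row_sum(a):
    """Gershgorin-type upper bound on the spectral norm of a symmetric matrix."""
    return max(sum(abs(x) for x in row) for row in a)

def spectral_norm_lower_estimate(a, iterations=200):
    """Power iteration; each ratio |Av| / |v| is at most the spectral norm."""
    v = [1.0 + 0.01 * i for i in range(len(a))]
    estimate = 0.0
    for _ in range(iterations):
        w = [sum(x * y for x, y in zip(row, v)) for row in a]
        norm_v = math.sqrt(sum(x * x for x in v))
        norm_w = math.sqrt(sum(x * x for x in w))
        estimate = norm_w / norm_v
        v = [x / norm_w for x in w]
    return estimate

def decaying_kernel(h):
    """Hypothetical kernel with polynomial decay, f(h) = 1 / (1 + h^2)."""
    return 1.0 / (1.0 + h * h)

# The row-sum bound stays below a fixed constant as n grows.
bounds = {n: max_abs_row_sum(kernel_matrix(n, 1.0, decaying_kernel))
          for n in (10, 50, 200)}
estimate_50 = spectral_norm_lower_estimate(kernel_matrix(50, 1.0, decaying_kernel))
```

In this toy configuration the bound increases with $n$ but remains below $1+2\sum_{m\geq 1}(1+m^{2})^{-1}$, which mirrors the role of the minimal spacing and kernel decay conditions in the proof.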

Hence $\rho_{1}\{T_{k,l}(f)\}$ is bounded by a constant $B<+\infty$ . Thus, using Theorem 3.2d.3 in Mathai & Provost (1992) and the fact that $T_{k,l}(f)$ is symmetric,

[TABLE]

with $\lambda_{1}\{\mathrm{cov}(y)\}\leq C$ from Lemma B.3.

The next proposition is a corollary of Theorem B.5 and gives the asymptotic normality of $\widehat{M}(f)$ .

Proposition B.9.

Let, for $k,l=1,\ldots,p$ and $f:\mathbb{R}^{d}\to\mathbb{R}$ , $T_{k,l}(f)$ be defined as in the proof of Proposition B.7. Let $R=\mathrm{cov}(y)$ and let $\Sigma(f)$ be the $p^{2}\times p^{2}$ matrix defined by, for $i=(s-1)p+t$ and $j=(u-1)p+v$ , with $s,t,u,v\in\{1,\ldots,p\}$ ,

[TABLE]

Define, for $g:\mathbb{R}^{d}\to\mathbb{R}$ , $\Sigma(f,g)$ as the $p^{2}\times p^{2}$ matrix defined for $i=(s-1)p+t$ and $j=(u-1)p+v$ , with $s,t,u,v\in\{1,\ldots,p\}$ by

[TABLE]

Let

[TABLE]

Then, $V(f,g)$ is symmetric non-negative definite.

Assume that Conditions B.3 and B.3 hold. Let $f_{1},f_{2}:\mathbb{R}^{d}\to\mathbb{R}$ satisfy Condition B.3. Let for $r=1,2$ , $W(f_{r})$ be the vector of size $p^{2}\times 1$ , defined for $i=(a-1)p+b$ , $a,b\in\{1,\ldots,p\}$ , by $W(f_{r})_{i}={n}^{1/2}\{\widehat{M}(f_{r})_{a,b}-M(f_{r})_{a,b}\}$ .

Let $Q_{n}$ be the distribution of $\{W(f_{1})^{\mathrm{\scriptscriptstyle T}},W(f_{2})^{\mathrm{\scriptscriptstyle T}}\}^{\mathrm{\scriptscriptstyle T}}$ . Then as $n\to\infty$

[TABLE]

Furthermore, $\lambda_{1}\{V(f_{1},f_{2})\}$ is bounded as $n\to\infty$.

Proposition 3.2 is a direct corollary of Proposition B.9 with $f_{1}=f$ and $f_{2}=f_{0}$ . Moreover, Proposition B.9 gives details concerning the matrix $V(f_{1},f_{2})$ .

Proof B.10.

Let $a,b\in\{1,\ldots,p\}$ and $f:\mathbb{R}^{d}\to\mathbb{R}$ . We have seen in the proof of Proposition B.7 that

[TABLE]

Hence

[TABLE]

Let us first prove that $V(f,g)$ is symmetric non-negative definite. Let $W(f)$ be defined as $W(f_{1})$ , but with $f_{1}$ replaced by $f$ . Let $W(g)$ be defined similarly with $f_{1}$ replaced by $g$ . For $a_{1},\ldots,a_{p^{2}},b_{1},\ldots,b_{p^{2}}\in\mathbb{R}$ , we have

[TABLE]

Now for $i,j=1,\ldots,p^{2}$ , let $a,b,u,v\in\{1,\ldots,p\}$ such that $i=(a-1)p+b$ and $j=(u-1)p+v$ . We have, using Theorem 3.2d.3 in Mathai & Provost (1992) and the fact that $T_{a,b}(f)$ and $T_{u,v}(f)$ are symmetric,

[TABLE]

We show similarly that

[TABLE]

and that

[TABLE]

Hence

[TABLE]

Hence, since a square matrix is uniquely defined by its corresponding quadratic forms, it follows that $V(f,g)$ is the covariance matrix of the vector

[TABLE]

Hence, $V(f,g)$ is symmetric non-negative definite.

Let us now prove (B.3). We have, from Lemma B.3 and the proof of Proposition B.7, that $\lambda_{1}(R)$ and $\rho_{1}\{T_{a,b}(f_{r})\}$ are bounded as $n\to\infty$ for $r=1,2$. Hence, (B.3) is a consequence of Theorem B.5. Finally, $\lambda_{1}\{V(f_{1},f_{2})\}$ is bounded as $n\to\infty$ because each component of $V(f_{1},f_{2})$ is bounded as $n\to\infty$.

Our objective is now to prove Proposition 3.4, which is a central limit theorem for $\widehat{\Gamma}(f)-\Omega^{-1}$ and $\widehat{\Lambda}(f)-\Lambda(f)$ .

There is an equivariance property in Definition 1.1 that we will exploit. More precisely, let $D_{0}=D(0,0)=I_{p}$ and let

[TABLE]

For $f:\mathbb{R}^{d}\to\mathbb{R}$ , let

[TABLE]

and

[TABLE]

Let $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ and $\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}$ satisfy the following modification of Definition 1.1:

[TABLE]

where $\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}$ is a diagonal matrix with diagonal elements in decreasing order. Then, we can show that

[TABLE]

satisfy Definition 1.1. The above display is the equivariance property that we will exploit. That is, we will first show a central limit theorem for $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}-I_{p}$ and $\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}-\Lambda(f)$ in Lemma B.15. Then, we will use the equivariance property (B.5) to obtain, directly, a central limit theorem for $\widehat{\Gamma}(f)-\Omega^{-1}$ and $\widehat{\Lambda}(f)-\Lambda(f)$ in Proposition B.17.
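The mechanism behind the equivariance property can be sketched in one line. The computation below assumes that the omitted displays define $\widehat{D}_{0}=\Omega^{-1}\widehat{M}(f_{0})\Omega^{-{\mathrm{T}}}$ and $\widehat{D}(f)=\Omega^{-1}\widehat{M}(f)\Omega^{-{\mathrm{T}}}$, in line with $D(f)=\Omega^{-1}M(f)\Omega^{-{\mathrm{T}}}$; this is an assumption about the hidden definitions. Writing $\Gamma$ and $\Lambda$ as shorthand for $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ and $\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}$:

```latex
\[
(\Gamma\Omega^{-1})\,\widehat{M}(f_{0})\,(\Gamma\Omega^{-1})^{\mathrm{T}}
  = \Gamma\,\widehat{D}_{0}\,\Gamma^{\mathrm{T}}
  = I_{p},
\qquad
(\Gamma\Omega^{-1})\,\widehat{M}(f)\,(\Gamma\Omega^{-1})^{\mathrm{T}}
  = \Gamma\,\widehat{D}(f)\,\Gamma^{\mathrm{T}}
  = \Lambda .
\]
```

Under that assumption, $\Gamma\Omega^{-1}$ diagonalises the pair $\{\widehat{M}(f_{0}),\widehat{M}(f)\}$ with the same diagonal matrix $\Lambda$, so limit results for $\Gamma-I_{p}$ transfer to $\Gamma\Omega^{-1}-\Omega^{-1}$ by right multiplication with $\Omega^{-1}$.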

In the next lemma, we first show a central limit theorem for $\widehat{D}_{0}-I_{p}$ and $\widehat{D}(f)-D(f)$ . Recall that $\mathrm{vect}(M)=(l_{1}^{\mathrm{\scriptscriptstyle T}},\ldots,l_{k}^{\mathrm{\scriptscriptstyle T}})^{\mathrm{\scriptscriptstyle T}}$ where $l_{1}^{\mathrm{\scriptscriptstyle T}},\ldots,l_{k}^{\mathrm{\scriptscriptstyle T}}$ are the $k$ rows of a matrix $M$ . Recall also the notation $f_{0}(x)=I(x=0)$ .

Lemma B.11.

Let Conditions B.3 and B.3 hold. Let $f:\mathbb{R}^{d}\to\mathbb{R}$ satisfy Condition B.3. Let

[TABLE]

Let $\tilde{V}(f)$ be as $V(f_{0},f)$ in Proposition B.9 but where $R$ is replaced by $\mathrm{cov}(z)$ where $z$ is the $np\times 1$ vector defined for $i=1,\ldots,n$, $j=1,\ldots,p$, by $z_{(i-1)p+j}=Z_{j}(s_{i})$. Let $Q_{\mathrm{st},n}$ be the distribution of $Y_{n}$. Then we have

[TABLE]

Furthermore, $\lambda_{1}\{\tilde{V}(f)\}$ is bounded as $n\to\infty$.

Proof B.12.

The proof is identical to the proof of Proposition B.9. We remark that, with the notation of the proof of Proposition B.9, $W(f_{r})={n}^{1/2}\mathrm{vect}\{\widehat{M}(f_{r})-M(f_{r})\}$, for $r=1,2$.

Now, we show, in Lemma B.13, that the transformation given by (B.4), that defines $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ and $\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}$ from $\widehat{D}_{0}$ and $\widehat{D}(f)$ , is asymptotically linear, so to speak. This will allow us to transfer the central limit theorem for $\widehat{D}_{0}-I_{p}$ and $\widehat{D}(f)-D(f)$ into a central limit theorem for $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}-I_{p}$ and $\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}-\Lambda(f)$ , for an appropriate choice of $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ . This argument is similar to the delta method in asymptotic statistics.

We will need the following notation. We let $\mathcal{D}(k)=\{1+(i-1)(k+1);i=1,\ldots,k\}$. We remark that $\{\mathrm{vect}(M)_{i};i\in\mathcal{D}(k)\}=\{M_{i,i};i=1,\ldots,k\}$ for a $k\times k$ matrix $M$. Let $\bar{\mathcal{D}}(k)=\{1,\ldots,k^{2}\}\backslash\mathcal{D}(k)$. We remark that $\{\mathrm{vect}(M)_{i};i\in\bar{\mathcal{D}}(k)\}=\{M_{i,j};i,j=1,\ldots,k,i\neq j\}$ for a $k\times k$ matrix $M$. For $a\in\{1,\ldots,k^{2}\}$, let $I_{k}(a)$ and $J_{k}(a)$ be the unique $i,j\in\{1,\ldots,k\}$ so that $a=k(i-1)+j$. For $i\in\{1,\ldots,k\}$, let $d_{k}(i)=1+(i-1)(k+1)$ and note that $\{\mathrm{vect}(M)_{d_{k}(i)};i=1,\ldots,k\}=\{M_{i,i};i=1,\ldots,k\}$ for a $k\times k$ matrix $M$. For a matrix $M$ of size $k\times k$, recall that $\mathrm{diag}(M)=(M_{1,1},\ldots,M_{k,k})^{\mathrm{\scriptscriptstyle T}}$.
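The index bookkeeping above is easy to check on a small example. The sketch below uses the row-wise vectorization $\mathrm{vect}$ and 1-based positions as in the text; the function names `vect`, `diag_positions` and `row_col` are illustrative choices, not notation from the paper.

```python
def vect(m):
    """Row-wise vectorization: stack the rows of m into one list."""
    return [x for row in m for x in row]

def diag_positions(k):
    """The set D(k) = {1 + (i - 1)(k + 1); i = 1, ..., k}, 1-based."""
    return [1 + (i - 1) * (k + 1) for i in range(1, k + 1)]

def row_col(a, k):
    """The pair (I_k(a), J_k(a)) with a = k (i - 1) + j, all 1-based."""
    i, j = divmod(a - 1, k)
    return i + 1, j + 1

k = 3
m = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]
v = vect(m)

# Positions in D(k) pick out the diagonal of m; the complement picks the rest.
diagonal = [v[a - 1] for a in diag_positions(k)]
off_diagonal = [v[a - 1] for a in range(1, k * k + 1)
                if a not in diag_positions(k)]
```

Here the entries of `m` encode their own row and column, so it is immediate that the positions in $\mathcal{D}(3)$ select $M_{1,1},M_{2,2},M_{3,3}$ and that `row_col` inverts the map $(i,j)\mapsto k(i-1)+j$.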

Lemma B.13.

Let Conditions B.3 and B.3 hold. Let $f:\mathbb{R}^{d}\to\mathbb{R}$ satisfy Condition B.3. Assume that Assumption 3 holds. Remark then that there exists $n_{0}\in\mathbb{N}$ such that for $n\geq n_{0}$ the diagonal elements of $D(f)$ are strictly decreasing. Write these diagonal elements as $\lambda_{1}>\ldots>\lambda_{p}$ . For $n\geq n_{0}$ , let $A$ be the $p^{2}\times p^{2}$ matrix defined by

[TABLE]

Let $B$ be the $p^{2}\times p^{2}$ matrix defined by

[TABLE]

Let $C$ be the $p\times p^{2}$ matrix defined by

[TABLE]

Let $D$ be the $p\times p^{2}$ matrix defined by

[TABLE]

Then, with probability going to one as $n\to\infty$ , there exist $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ and $\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}$ satisfying (B.4). Furthermore, $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ can be chosen so that as $n\to\infty$ ,

[TABLE]

Proof B.14.

Let us assume that $n\geq n_{0}$ throughout the proof. From Proposition B.7, with probability going to one, the eigenvalues of $D_{0}^{-1/2}\widehat{D}(f)D_{0}^{-1/2}$ are distinct. In the rest of the proof, we work on the event where this is the case. Then, choose $\widehat{\Gamma}=\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ and $\widehat{\Lambda}=\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}$ satisfying (B.4) and such that

[TABLE]

We remark that $\widehat{\Gamma}$ and $\widehat{\Lambda}$ indeed exist, since when $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ and $\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}$ satisfy (B.4), one can multiply each row of $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ by $1$ or $-1$ and still satisfy (B.4).

Let

[TABLE]

and

[TABLE]

Assume that $T_{1}-T_{2}\not\to 0$ in probability when $n\to\infty$ . Then there exist $\epsilon>0$ and a subsequence $n_{m}\to\infty$ so that along $n_{m}$

[TABLE]

One can show, as for the proof of Proposition B.7, that $\limsup\lambda_{1}\{D(f)\}<+\infty$ when $n\to\infty$ . Hence, up to extracting a further subsequence, we can assume that when $n_{m}\to\infty$ , $D(f)\to D_{\infty}(f)$ , where $D_{\infty}(f)$ has distinct, decreasing, eigenvalues.

From Lemma 4.3 in Sun & Sun (2002), since $D_{0}^{-1/2}D_{\infty}(f)D_{0}^{-1/2}=D_{\infty}(f)$ is diagonal, there exists a sequence of random orthogonal matrices $U_{n}$ such that $U_{n_{m}}\widehat{D}_{0}^{-1/2}\widehat{D}(f)\widehat{D}_{0}^{-1/2}U_{n_{m}}^{\mathrm{\scriptscriptstyle T}}=\Lambda_{n_{m}}$ is diagonal and goes to $D_{\infty}(f)$ in probability and so that $U_{n_{m}}\to I_{p}$ in probability when $n_{m}\to\infty$ . Hence, the pair $(U_{n_{m}}\widehat{D}_{0}^{-1/2},\Lambda_{n_{m}})$ satisfies (B.4) with probability going to one. Furthermore, all the matrices $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ for which $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ and $\widehat{\Lambda}$ satisfy (B.4) satisfy $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}=SU_{n_{m}}\widehat{D}_{0}^{-1/2}$ where $S$ is a diagonal matrix with diagonal elements equal to $-1$ or $1$ . Hence with probability going to one, we must have $S=I_{p}$ for (B.6) to be also satisfied. Hence, with probability going to one, $\widehat{\Gamma}=U_{n_{m}}\widehat{D}_{0}^{-1/2}$ . Hence we have finally obtained $\widehat{\Gamma}\to I_{p}$ and $|\widehat{\Lambda}-D(f)|\to 0$ in probability when $n_{m}\to\infty$ .

The rest of the proof is similar to those given in Ilmonen et al. (2010a) and Miettinen et al. (2012). By definition of $\widehat{\Gamma}$ and $\widehat{\Lambda}$ , we have

[TABLE]

Hence

[TABLE]

Also, from Lemma B.11, we have ${n_{m}}^{1/2}(\widehat{D}_{0}-I_{p})=O_{p}(1)$ and ${n_{m}}^{1/2}\{\widehat{D}(f)-D(f)\}=O_{p}(1)$ . Thus, we get

[TABLE]

This then yields

[TABLE]

This is in contradiction with (B.7), by definition of $A$, $B$, $C$ and $D$. Hence the proof is finished.

From Lemmas B.11 and B.13, we now obtain a central limit theorem for $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}-I_{p}$ and $\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}-\Lambda(f)$ . Note that $\Lambda(f)=D(f)$ .

Lemma B.15.

Assume the same conditions as in Lemma B.13 and let $n_{0}$ be defined as in Lemma B.13. Let, for $n\geq n_{0}$ ,

[TABLE]

from Lemma B.13. For $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ and $\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}$ satisfying (B.4), let

[TABLE]

Let $Q_{n}$ be the distribution of $X_{n}$ . Let $\tilde{V}(f)$ be defined as in Lemma B.11. Then, we can choose $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ and $\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}$ satisfying (B.4) such that when $n\to\infty$ ,

[TABLE]

Proof B.16.

The lemma is a direct consequence of Lemmas B.11 and B.13. The proof is carried out by contradiction, by taking subsequences along which the bounded sequences of matrices $G$ and $\tilde{V}(f)$ converge, and by applying Slutsky’s lemma.

We now use the equivariance property (B.5) to conclude.

Proposition B.17.

Assume the same conditions as in Lemma B.13. Let $\tilde{V}(f)$ and $G$ be defined as in Lemmas B.11 and B.15. Let $M_{\Omega^{-1}}$ be the $p^{2}\times p^{2}$ matrix defined by

[TABLE]

Let $\bar{M}_{\Omega^{-1}}$ be the matrix of size $(p^{2}+p)\times(p^{2}+p)$ defined by

[TABLE]

For $\Gamma\{\widehat{M}_{0},\widehat{M}(f)\}$ and $\Lambda\{\widehat{M}_{0},\widehat{M}(f)\}$ satisfying Definition 1.1, let

[TABLE]

Let $Q_{n}$ be the distribution of $X_{n}$ . Let, for $n\geq n_{0}$ with $n_{0}$ as in Lemma B.13,

[TABLE]

Then, we can choose $\Gamma\{\widehat{M}_{0},\widehat{M}(f)\},\Lambda\{\widehat{M}_{0},\widehat{M}(f)\}$ , satisfying Definition 1.1, so that when $n\to\infty$ ,

[TABLE]

Proof B.18.

The proof directly follows from Lemma B.15 and from (B.5). Indeed, for $\Gamma\{\widehat{D}_{0},\widehat{D}(f)\}$ and $\Lambda\{\widehat{D}_{0},\widehat{D}(f)\}$ satisfying the central limit theorem in Lemma B.15, we can choose $\Gamma\{\widehat{M}_{0},\widehat{M}(f)\}$ and $\Lambda\{\widehat{M}_{0},\widehat{M}(f)\}$ satisfying Definition 1.1 and such that

[TABLE]

We also remark that $\Lambda(f)=D(f)$.

B.5 Proof of Proposition 4.2

We let $D(f)=\Omega^{-1}M(f)\Omega^{-{\mathrm{\scriptscriptstyle T}}}$ for $f:\mathbb{R}^{d}\to\mathbb{R}$ . We restate Proposition 4.2 and prove it.

Proposition B.19.

The unmixing problem given by $f_{1},\ldots,f_{k}$ is identifiable if and only if for every pair $i\neq j$, $i,j=1,\ldots,p$, there exists $l=1,\ldots,k$ such that $\{\Omega^{-1}M(f_{l})\Omega^{-{\mathrm{\scriptscriptstyle T}}}\}_{i,i}\neq\{\Omega^{-1}M(f_{l})\Omega^{-{\mathrm{\scriptscriptstyle T}}}\}_{j,j}$.

Proof B.20 (of Proposition B.19 (Proposition 4.2)).

We have that $\Gamma$ satisfies (4.2) if and only if

[TABLE]

If the condition of the proposition holds, then only orthogonal matrices of the form $PS$ satisfy (B.8), with the double sum being equal to

[TABLE]

where $P$ is a permutation matrix and $S$ is a diagonal matrix with diagonal elements equal to $-1$ or $1$, see the end of the proof of Lemma B.21 below. Assume now that there exist $i\neq j$, $i,j=1,\ldots,p$, such that for $l=1,\ldots,k$, $\{\Omega^{-1}M(f_{l})\Omega^{-{\mathrm{\scriptscriptstyle T}}}\}_{i,i}=\{\Omega^{-1}M(f_{l})\Omega^{-{\mathrm{\scriptscriptstyle T}}}\}_{j,j}$. Without loss of generality, assume that $i=1$ and $j=2$. Then consider the $p\times p$ block diagonal matrix $Q$ with first $2\times 2$ block equal to $\{(2^{-1/2},2^{-1/2})^{\mathrm{\scriptscriptstyle T}},(2^{-1/2},-2^{-1/2})^{\mathrm{\scriptscriptstyle T}}\}$ and second $(p-2)\times(p-2)$ block equal to $I_{p-2}$. Then one can show that $Q$ satisfies (B.8). In this case, $\Gamma=Q\Omega^{-1}$ satisfies (4.2), and is not of the form $PS\Omega^{-1}$.

B.6 Asymptotics when diagonalising more than two matrices

Lemma B.21.

Let Conditions B.3 and B.3 hold. Let $k\in\mathbb{N}$ be fixed. Let $f_{1},\ldots,f_{k}:\mathbb{R}^{d}\to\mathbb{R}$ satisfy Condition B.3. Assume that Assumption 4 holds. Let $\widehat{\Gamma}=\widehat{\Gamma}\{\widehat{D}_{0},\widehat{D}(f_{1}),\ldots,\widehat{D}(f_{k})\}$ be such that

[TABLE]

Then we can choose $\widehat{\Gamma}$ so that $\widehat{\Gamma}\to I_{p}$ in probability when $n\to\infty$.

Proof B.22.

Let, for $U$ a $p\times p$ orthogonal matrix with rows $u_{1}^{\mathrm{\scriptscriptstyle T}},\ldots,u_{p}^{\mathrm{\scriptscriptstyle T}}$ ,

[TABLE]

Let

[TABLE]

We observe that any orthogonal matrix can be obtained from a matrix in $E_{0}$ , by row permutation and row multiplication by $1$ or $-1$ . Hence, for any $n$ , there exists $\widehat{U}$ so that $\widehat{U}\in\operatornamewithlimits{argmax}_{U\in E_{0}}\widehat{g}(U)$ and $\widehat{U}\widehat{D}_{0}^{-1/2}$ satisfies (B.9).

We now aim at showing that $\widehat{U}\to I_{p}$ in probability as $n\to\infty$ , which will conclude the proof since $\widehat{D}_{0}\to I_{p}$ in probability. Assume that this is not the case. Then, there exists $\epsilon>0$ and a subsequence $(n_{m})_{m\in\mathbb{N}}$ so that for all $m\in\mathbb{N}$ and along $n_{m}$

[TABLE]

The matrices $D(f_{1}),\ldots,D(f_{k})$ are bounded (this can be shown as in Proposition B.7). Hence, by compactness, up to extracting a further subsequence, we have that (B.10) holds along $n_{m}$ and, as $m\to\infty$ and along $n_{m}$, $D(f_{1})\to D_{\infty}(f_{1}),\ldots,D(f_{k})\to D_{\infty}(f_{k})$.

We let

[TABLE]

We have, from Proposition B.7 and as observed in Miettinen et al. (2016), that, as $m\to\infty$ and along $n_{m}$ ,

[TABLE]

in probability. Hence, using a standard M-estimator argument and because $E_{0}$ is compact, if the unique maximum of $g_{\infty}$ on $E_{0}$ is $I_{p}$, we obtain that, as $m\to\infty$ and along $n_{m}$, $\widehat{U}\to I_{p}$ in probability. This contradicts (B.10).

Hence, to conclude the proof, it suffices to show that the unique maximum of $g_{\infty}$ on $E_{0}$ is $I_{p}$ . We have

[TABLE]

Also,

[TABLE]

We next show that the identity matrix $I_{p}$ is the unique maximizer of $g_{\infty}$ in $E_{0}$. To see this, consider an arbitrary orthogonal matrix $U$ which maximizes $g_{\infty}$. From (B.11) we see that $U^{\mathrm{\scriptscriptstyle T}}D_{\infty}(f_{l})U$ is a diagonal matrix for all $l=1,\ldots,k$. Then, by its non-singularity, the matrix $U$ must have a column with a non-zero first element. Denote the first (from the left) such column of $U$ by $u$. We show that all other elements of $u$ must be zero. By the previous observation, $u$ is an eigenvector of all $D_{\infty}(f_{l})$ and we have,

[TABLE]

for some eigenvalues $\psi_{l}\in\mathbb{R}$ , $l=1,\ldots,k$ . Assume then that $u$ has a second non-zero element at some arbitrary position $q\neq 1$ , meaning that both $u_{1},u_{q}\neq 0$ . Then we write

[TABLE]

which in turn implies that $D_{\infty}(f_{l})_{1,1}=D_{\infty}(f_{l})_{q,q}$ for all $l=1,\ldots,k$. By a continuity argument, this is a contradiction with Assumption 4. As the choice of $q$ was arbitrary, the only non-zero element in $u$ is the first. Repeating now the same reasoning for other elements besides the first, we observe that each column of the maximizer $U$ must have a single non-zero element, and by its orthogonality we have $U=PD$ for some permutation matrix $P$ and some diagonal matrix $D$ with diagonal components in $\{-1,1\}$. The only matrix of that form belonging to $E_{0}$ is $I_{p}$ and thus, for all $U\in E_{0}$ with $U\neq I_{p}$, we have $g_{\infty}(U)<g_{\infty}(I_{p})$.
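The role of the separation condition can be illustrated numerically. The sketch below assumes that the criterion has the usual joint-diagonalisation form, namely the sum over $l$ of the squared diagonal entries of $UD_{l}U^{\mathrm{\scriptscriptstyle T}}$ with $D_{l}$ diagonal; this form is an assumption here, since the displays defining the criterion are not reproduced. The diagonal values and function names are made up for the example, and the code compares the identity with a rotation by $\pi/4$ in dimension $p=2$.

```python
import math

def rotation(theta):
    """2x2 rotation matrix; a signed permutation only for multiples of pi / 2."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def criterion(u, diagonals):
    """Sum over l of the squared diagonal entries of U D_l U^T, D_l diagonal."""
    total = 0.0
    for d in diagonals:
        for i in range(len(u)):
            entry = sum(u[i][r] * d[r] * u[i][r] for r in range(len(d)))
            total += entry ** 2
    return total

identity = rotation(0.0)
tilted = rotation(math.pi / 4)

# The pair of indices (1, 2) is separated by the second matrix.
separated = [(1.0, 1.0), (0.8, 0.3)]
# The two diagonal entries are tied in every matrix.
tied = [(1.0, 1.0), (0.5, 0.5)]

gap_separated = criterion(identity, separated) - criterion(tilted, separated)
gap_tied = criterion(identity, tied) - criterion(tilted, tied)

print(gap_separated)  # strictly positive: the rotation lowers the criterion
print(gap_tied)       # numerically zero: the rotation is not penalised
```

Under the assumed criterion, a single matrix that separates a pair of diagonal entries is enough to penalise rotations in the corresponding plane, which matches the statement that one separating $l$ per pair suffices.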

Lemma B.23.

Assume the same setting and conditions as in Lemma B.21. For a diagonal matrix $\Lambda$ , let $\Lambda_{r}=\Lambda_{r,r}$ . Let $A_{0},A_{1},\ldots,A_{k}$ and $B$ be $p^{2}\times p^{2}$ matrices defined by, for $n\geq n_{0}$ with the notation of Assumption 4,

[TABLE]

for $l=1,\ldots,k$ ,

[TABLE]

and

[TABLE]

Then, as $n\to\infty$ , there exists $\widehat{\Gamma}$ satisfying (B.9) so that

[TABLE]

Proof B.24.

Assume that $n\geq n_{0}$ throughout the proof. Let $\widehat{\Gamma}$ satisfy (B.9) and $\widehat{\Gamma}\to I_{p}$ in probability when $n\to\infty$ (the existence follows from Lemma B.21). The proof of the lemma follows the proofs of ii) in Theorem 2 of Miettinen et al. (2016) and Theorem 3 in Virta et al. (2018) and as such, we present below only some key steps.

From Lemma B.11, we have $n^{1/2}(\widehat{D}_{0}-I_{p})=O_{p}(1)$ and $n^{1/2}\{\widehat{D}(f_{l})-D(f_{l})\}=O_{p}(1)$ , for all $l=1,\ldots,k$ . By a continuity argument and our assumptions, we further have $D(f_{1})\to D_{\infty}(f_{1}),\ldots,D(f_{k})\to D_{\infty}(f_{k})$ such that the limit matrices satisfy: there exists a fixed $\delta>0$ so that for every pair $i\neq j$ , $i,j=1,\ldots,p$ , there exists $l=1,\ldots,k$ such that $|D_{\infty}(f_{l})_{i,i}-D_{\infty}(f_{l})_{j,j}|\geq\delta$ . The previous convergence holds up to extracting a subsequence. We omit this step in this proof for concision, but see the proof of Lemma B.13. Finally, the rotation $\widehat{U}$ so that $\widehat{\Gamma}=\widehat{U}\widehat{D}_{0}^{-1/2}$ also satisfies $\widehat{U}\to I_{p}$ in probability.

Then, as in Virta et al. (2018), the maximisation problem,

[TABLE]

yields the estimating equations $n^{1/2}\widehat{Y}=n^{1/2}\widehat{Y}^{\mathrm{\scriptscriptstyle T}}$, where

[TABLE]

where we have used the shorthand $\widehat{R}(f_{l})=\widehat{D}_{0}^{-1/2}\widehat{D}(f_{l})\widehat{D}_{0}^{-1/2}$ and $\mathrm{Diag}(M)$ is equal to the square matrix $M$ but with its off-diagonal elements set to zero. Linearizing the estimating equations asymptotically and vectorizing, we arrive at the following form,

[TABLE]

where $K$ is the $p^{2}\times p^{2}$ commutation matrix satisfying $K^{2}=I_{p^{2}}$, $n^{1/2}\widehat{F}=\sum_{l=1}^{k}n^{1/2}\{\widehat{R}(f_{l})-D(f_{l})\}D_{\infty}(f_{l})=O_{p}(1)$ and $\mathrm{vec}(M)=(c_{1}^{\mathrm{\scriptscriptstyle T}},\ldots,c_{q}^{\mathrm{\scriptscriptstyle T}})^{\mathrm{\scriptscriptstyle T}}$ is the column vectorization where $c_{1},\ldots,c_{q}$ are the ${q}$ columns of a matrix $M$. The orthogonality constraint can be similarly linearized to yield,

[TABLE]

Summing (B.12) and (B.13), we obtain,

[TABLE]

where $\widehat{A}\rightarrow A=(I_{p^{2}}-K)\{(\sum_{l=1}^{k}D_{l}^{2}\otimes I_{p})+(\sum_{l=1}^{k}D_{l}\otimes D_{l})K\}+I_{p^{2}}+K$ in probability, where we use the notation $D_{l}=D_{\infty}(f_{l})$, $l=1,\ldots,k$. Using the fact that $K(A\otimes B)K=B\otimes A$ for any $p\times p$ matrices $A,B$, we get the alternative form,

[TABLE]
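The two properties of the commutation matrix used in this step, that $K$ is an involution and that conjugation by $K$ swaps the factors of a Kronecker product of square matrices, can be verified exactly on integer matrices. The sketch below is illustrative; the helper names and the test matrices `a` and `b` are arbitrary choices.

```python
def commutation_matrix(p):
    """p^2 x p^2 matrix K with K vec(M) = vec(M^T), vec stacking columns."""
    k = [[0] * (p * p) for _ in range(p * p)]
    for i in range(p):
        for j in range(p):
            # M[i][j] sits at position j*p + i of vec(M), i*p + j of vec(M^T).
            k[i * p + j][j * p + i] = 1
    return k

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][r] * b[r][j] for r in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def identity(n):
    """n x n identity matrix with integer entries."""
    return [[int(i == j) for j in range(n)] for i in range(n)]

p = 3
a = [[1, 2, 0], [0, 3, 4], [5, 0, 6]]
b = [[2, 0, 1], [7, 1, 0], [0, 8, 3]]
k = commutation_matrix(p)

k_squared = matmul(k, k)                    # should be the identity of order p^2
swapped = matmul(matmul(k, kron(a, b)), k)  # should coincide with kron(b, a)
```

Because all entries are integers, the comparisons `k_squared == identity(p * p)` and `swapped == kron(b, a)` are exact rather than approximate.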

Continuing as in Virta et al. (2018), each diagonal element of $\widehat{U}$ has a corresponding $1\times 1$ diagonal block equal to 2 in $A$ . Similarly, each pair of $(a,b)$ th and $(b,a)$ th off-diagonal elements in $\widehat{U}$ has a corresponding $2\times 2$ sub-matrix in $A$ of the form,

[TABLE]

where $d_{la}$ is the $a$ th diagonal element of $D_{l}$ . The inverse of the sub-matrix is

[TABLE]

showing that $A$ is invertible as by our assumptions $\sum_{l=1}^{k}(d_{la}-d_{lb})^{2}\neq 0$ for all distinct pairs $a,b=1,\ldots,p$ . Thus, by Slutsky’s theorem, we obtain from (B.14) that,

[TABLE]

showing that $n^{1/2}(\widehat{U}-I_{p})=O_{p}(1)$. Consequently, also $n^{1/2}(\widehat{\Gamma}-I_{p})=O_{p}(1)$.

Finally, we proceed as in the proof of ii) in Theorem 2 of Miettinen et al. (2016) to obtain that, as $n\to\infty$,

[TABLE]

and for $i\neq j$ ,

[TABLE]

Hence, the lemma follows from the definition of $A_{0},A_{1},\ldots,A_{k},B$.

Lemma B.25.

Assume the same settings and conditions as in Lemma B.21. Let $\Sigma(f,g)$ be as in Proposition B.9 with $R$ replaced by $\mathrm{cov}(z)$ where $z$ is the $np\times 1$ vector defined by, for $i=1,\ldots,n$, $j=1,\ldots,p$, $z_{(i-1)p+j}=Z_{j}(s_{i})$. Let $f_{0}(x)=I(x=0)$. Let $\tilde{V}(f_{1},\ldots,f_{k})$ be the $(k+1)p^{2}\times(k+1)p^{2}$ matrix, composed of $(k+1)^{2}$ blocks of size $p^{2}\times p^{2}$ with block $(i+1),(j+1)$ equal to $\Sigma(f_{i},f_{j})$ for $i,j=0,\ldots,k$.

Let $G$ be the $p^{2}\times(k+1)p^{2}$ matrix defined by $G=B(A_{0},A_{1},\ldots,A_{k})$, for $n\geq n_{0}$, with the notation of Lemma B.23. Let $M_{\Omega^{-1}}$ be as in Proposition B.17 and let, for $n\geq n_{0}$,

[TABLE]

Then, $\widehat{\Gamma}=\widehat{\Gamma}\{\widehat{M}_{0},\widehat{M}(f_{1}),\ldots,\widehat{M}(f_{k})\}$ satisfying (B.9), with $\widehat{D}_{0},\widehat{D}(f_{1}),\ldots,\widehat{D}(f_{k})$ replaced by $\widehat{M}_{0},\widehat{M}(f_{1}),\ldots,\widehat{M}(f_{k})$ , can be chosen so that, with $Q_{n}$ the distribution of ${n}^{1/2}\mathrm{vect}(\widehat{\Gamma}-\Omega^{-1})$ , we have as $n\to\infty$

[TABLE]

Proof B.26.

*The proof is the same as that of Proposition B.17. In particular, for $\widehat{\Gamma}\{\widehat{D}_{0},\widehat{D}(f_{1}),\ldots,\widehat{D}(f_{k})\}$ satisfying (B.9), the matrix $\widehat{\Gamma}\{\widehat{D}_{0},\widehat{D}(f_{1}),\ldots,\widehat{D}(f_{k})\}\Omega^{-1}$ satisfies (B.9), with $\widehat{D}_{0},\widehat{D}(f_{1}),\ldots,\widehat{D}(f_{k})$ replaced by $\widehat{M}_{0},\widehat{M}(f_{1}),\ldots,\widehat{M}(f_{k})$.*

Proposition B.27.

*Assume the same settings and conditions as in Lemma B.21. Let $(\widehat{\Gamma}_{n})_{n\in\mathbb{N}}$ be any sequence of $p\times p$ matrices so that for any $n\in\mathbb{N}$, $\widehat{\Gamma}_{n}=\widehat{\Gamma}_{n}\{\widehat{M}_{0},\widehat{M}(f_{1}),\ldots,\widehat{M}(f_{k})\}$ satisfies (B.9), with $\widehat{D}_{0},\widehat{D}(f_{1}),\ldots,\widehat{D}(f_{k})$ replaced by $\widehat{M}_{0},\widehat{M}(f_{1}),\ldots,\widehat{M}(f_{k})$. Then, there exists a sequence of permutation matrices $(P_{n})$ and a sequence of diagonal matrices $(D_{n})$, with diagonal components in $\{-1,1\}$, so that, with $\check{\Gamma}_{n}=D_{n}P_{n}\widehat{\Gamma}_{n}$, the sequence $(\check{\Gamma}_{n})$ satisfies the conclusions of Lemma B.21, with the limit $I_{p}$ replaced by $\Omega^{-1}$.*

Proof B.28.

*With the notation of the proof of Lemma B.21, for $\widehat{\Gamma}_{n}$ satisfying (B.9), there exist $P_{n},D_{n}$, as described in the proposition, so that $D_{n}P_{n}\widehat{\Gamma}_{n}\widehat{D}_{0}^{1/2}\in E_{0}$ and $D_{n}P_{n}\widehat{\Gamma}_{n}$ satisfies (B.9). Hence, with the same argument as in the proof of the last part of Lemma B.21, we have $D_{n}P_{n}\widehat{\Gamma}_{n}\to I_{p}$ in probability as $n\to\infty$. Furthermore, as in the proof of Lemma B.25, we can show that $D_{n}P_{n}\widehat{\Gamma}_{n}\Omega^{-1}$ satisfies the conclusion of this lemma. The proof is concluded by observing that any matrix $\bar{\Gamma}$ satisfies (B.9), with $\widehat{D}_{0},\widehat{D}(f_{1}),\ldots,\widehat{D}(f_{k})$ replaced by $\widehat{M}_{0},\widehat{M}(f_{1}),\ldots,\widehat{M}(f_{k})$, if and only if the corresponding matrix $\bar{\Gamma}\Omega$ satisfies (B.9).*

Proposition B.29.

*Assume the same settings and conditions as in Lemma B.21. Let $(\widehat{\Gamma}_{n})_{n\in\mathbb{N}}$ be any sequence of $p\times p$ matrices so that for any $n\in\mathbb{N}$, $\widehat{\Gamma}_{n}=\widehat{\Gamma}_{n}\{\widehat{M}_{0},\widehat{M}(f_{1}),\ldots,\widehat{M}(f_{k})\}$ satisfies (B.9), with $\widehat{D}_{0},\widehat{D}(f_{1}),\ldots,\widehat{D}(f_{k})$ replaced by $\widehat{M}_{0},\widehat{M}(f_{1}),\ldots,\widehat{M}(f_{k})$. Then, there exists a sequence of permutation matrices $(P_{n})$ and a sequence of diagonal matrices $(D_{n})$, with diagonal components in $\{-1,1\}$, so that, with $\check{\Gamma}_{n}=D_{n}P_{n}\widehat{\Gamma}_{n}$, the sequence $(\check{\Gamma}_{n})$ satisfies the conclusions of Lemma B.25.*

Proof B.30.

*The proof is the same as that of Proposition B.27.*

The results of Propositions 4.3 and 4.4 derive directly from Propositions B.27 and B.29.

Lemma B.31.

Let Conditions B.3 and B.3 hold. Let $f$ satisfy Condition B.3. Let $\bar{X}=n^{-1}\sum_{i=1}^{n}X(s_{i})$ . Let

[TABLE]

Then as $n\to\infty$

[TABLE]

Proof B.32.

Let ${a},l\in\{1,\ldots,p\}$ and let $f_{i,j}=f(s_{i}-s_{j})$ . We have

[TABLE]

Now, for $q=1,\ldots,p$ , ${E}(\bar{X}_{q})=0$ and

[TABLE]

Also, $\max_{i=1,\ldots,n}\sum_{j=1}^{n}|\mathrm{cov}\{X_{q}(s_{i}),X_{q}(s_{j})\}|$ is bounded because of (B.4) and Lemma 4 in Furrer et al. (2016). Hence $\mathrm{var}(\bar{X}_{q})=O(1/n)$ and so $\bar{X}_{q}=O_{p}(n^{-1/2})$ .
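The step from bounded row sums to the $O(1/n)$ rate can be spelled out as follows. This is only a sketch using the quantities named above, with $C$ a hypothetical label for the finite bound on $\max_{i}\sum_{j}|\mathrm{cov}\{X_{q}(s_{i}),X_{q}(s_{j})\}|$:

$$\mathrm{var}(\bar{X}_{q})=\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}\mathrm{cov}\{X_{q}(s_{i}),X_{q}(s_{j})\}\leq\frac{1}{n}\max_{i=1,\ldots,n}\sum_{j=1}^{n}|\mathrm{cov}\{X_{q}(s_{i}),X_{q}(s_{j})\}|\leq\frac{C}{n}.$$

Since ${E}(\bar{X}_{q})=0$, Chebyshev's inequality then gives $\mathrm{pr}(|\bar{X}_{q}|>tn^{-1/2})\leq C/t^{2}$ for every $t>0$, which is the statement $\bar{X}_{q}=O_{p}(n^{-1/2})$.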

Also, let

[TABLE]

Then ${E}(\epsilon_{q})=0$ and

[TABLE]

From Condition B.3 and Lemma 4 in Furrer et al. (2016), there exists a finite constant $H$ so that

[TABLE]

Hence

[TABLE]

as before. Hence $\epsilon_{q}=O_{p}(n^{-1/2})$ . Also, we have seen above that

[TABLE]

*is bounded. Hence, from (B.32), we conclude the proof of the lemma.*

Appendix C Simulation complements

C.1 Asymptotic approximate distribution of the unmixing matrix estimator

In Fig. 5.3 of Section 5.2, we used only the expected value of the asymptotic approximation. In Fig. C.1 we have plotted the estimated densities of $n(p-1)\mathrm{MDI}(\widehat{\Gamma})^{2}$, in solid lines, against the densities of the corresponding asymptotic approximations, in dashed lines, for all local covariance matrices and a few selected sample sizes. The density functions of the asymptotic approximations are estimated from a sample of 100,000 random variables drawn from the corresponding distributions. Overall, the two densities fit each other rather well, especially for the local covariance matrices involving the ring kernel. This shows that the asymptotic distribution of $n(p-1)\mathrm{MDI}(\widehat{\Gamma})^{2}$ is a good approximation of the true distribution already for small sample sizes.
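The index $\mathrm{MDI}$ is not restated in this appendix. The sketch below assumes it denotes the minimum distance index commonly used in the blind source separation literature, which is zero when $\widehat{\Gamma}\Omega$ equals the identity up to row permutation and scaling, and at most one otherwise. The function name `minimum_distance_index` and the brute-force search over permutations (adequate for small $p$ such as $p=3$) are illustrative choices, not the authors' implementation.

```python
import itertools
import numpy as np

def minimum_distance_index(gamma_hat, omega):
    """Minimum distance index between an unmixing estimate and a mixing matrix.

    Assumed form: sqrt{(p - max_P tr(P G_tilde)) / (p - 1)}, where G = gamma_hat @ omega
    and G_tilde holds the squared entries of G normalised so that each row sums to one.
    """
    G = gamma_hat @ omega
    p = G.shape[0]
    G_tilde = G ** 2 / np.sum(G ** 2, axis=1, keepdims=True)
    # Brute-force maximisation over all p! permutations (fine for small p)
    best = max(
        sum(G_tilde[i, perm[i]] for i in range(p))
        for perm in itertools.permutations(range(p))
    )
    # Guard against tiny negative values caused by rounding
    return float(np.sqrt(max(p - best, 0.0) / (p - 1)))

omega = np.array([[2.0, 1.0, 0.0], [0.5, 1.0, -1.0], [1.0, 0.0, 3.0]])
# A perfect estimate up to permutation, sign and scale of the rows
C = np.array([[0.0, -2.0, 0.0], [0.0, 0.0, 0.5], [3.0, 0.0, 0.0]])
mdi_perfect = minimum_distance_index(C @ np.linalg.inv(omega), omega)
# A completely uninformative estimate: every row of gamma_hat @ omega is constant
mdi_worst = minimum_distance_index(np.ones((3, 3)), np.eye(3))
```

Under this assumed definition, `n * (p - 1) * mdi ** 2` computed for each simulated data set would give one draw from the distribution whose density is plotted in solid lines.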

C.2 A simulation study on individual component estimation accuracy

In this simulation, we investigate the estimation accuracies of the individual latent fields under the spatial blind source separation model.

We use the same setting as in the first simulation study in Section 5. That is, let $X({s})={\Omega}{Z}({s})$ where each of the three independent latent fields has a Matérn covariance function with shape and range parameters $(\kappa,\phi)\in\{(6,1.2),(1,1.5),(0.25,1)\}$, illustrated in the left panel in Fig. 5.1. The location pattern is taken to be the same growing pattern of nested squares depicted on the left of Fig. 5.2.

To quantify the estimation accuracies of the individual components, we use the same strategy as in the real data example of Section 6. Let $\widehat{Z}(s)=\{\widehat{Z}_{1}(s),\widehat{Z}_{2}(s),\widehat{Z}_{3}(s)\}^{\mathrm{\scriptscriptstyle T}}=\widehat{\Gamma}X({s})$ contain the estimated components for a single repetition of the simulation. For each of the three true sources, $j=1,2,3$ , we record the maximum absolute sample correlation between $Z_{j}(s)$ and $\widehat{Z}_{l}(s)$ over $l=1,2,3$ . The larger the maximum absolute correlation, the better the source field $j$ was estimated.
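The accuracy measure just described can be sketched in a few lines. The function name `max_abs_correlations` and the toy data are illustrative assumptions; rows of `Z` and `Z_hat` are taken to hold the values of the true and estimated fields at the $n$ sample locations.

```python
import numpy as np

def max_abs_correlations(Z, Z_hat):
    """For each true source (column of Z), return the maximum absolute sample
    correlation with any estimated component (column of Z_hat)."""
    p = Z.shape[1]
    # With rowvar=False the columns are the variables; the top-right p x p block
    # of the joint correlation matrix holds corr(Z_j, Z_hat_l).
    cross_corr = np.corrcoef(Z, Z_hat, rowvar=False)[:p, p:]
    return np.abs(cross_corr).max(axis=1)

# Toy check: estimated components are permuted, sign-flipped, slightly noisy sources
rng = np.random.default_rng(0)
Z = rng.standard_normal((500, 3))
Z_hat = Z[:, [2, 0, 1]] * np.array([-1.0, 1.0, -1.0]) + 0.01 * rng.standard_normal((500, 3))
scores = max_abs_correlations(Z, Z_hat)
```

Because the maximum is taken over absolute correlations, the measure is unaffected by the sign and order ambiguity of the estimated components.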

Due to the affine equivariance of the estimators, the estimated components $\widehat{\Gamma}X({s})$ are invariant to the choice of $\Omega$ , up to their signs and order. More precisely, let $\widehat{\Gamma}(I_{p})$ be computed from $\{Z(s_{i})\}_{i=1,\ldots,n}$ according to (4.1), let $\widehat{Z}_{I_{p}}(s)=\widehat{\Gamma}(I_{p})Z(s)$ and recall that $\widehat{\Gamma}$ is computed from $\{X(s_{i})\}_{i=1,\ldots,n}$ according to (4.1). Then we have

[TABLE]

up to order and signs of the components. Therefore, it is without loss of generality that we may again assume that $\Omega=I_{3}$ . Any other choice of $\Omega$ would lead to exactly the same results.
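The invariance can be checked numerically in a simplified setting. The sketch below is not the paper's simulation design: it assumes a two-matrix estimator (covariance plus one local covariance matrix), a ring kernel of the form $f(s)=I(r_{1}<\|s\|\leq r_{2})$, a unit grid of locations, exponential rather than Matérn covariances, and an arbitrary mixing matrix `Omega`. All function names are hypothetical.

```python
import numpy as np

def ring_local_covariance(X, coords, r_in, r_out):
    """n^{-1} sum_i sum_j f(s_i - s_j) X(s_i) X(s_j)^T for a ring kernel.

    Assumption: f(s) = I(r_in < ||s|| <= r_out); rows of X are X(s_i)^T.
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    W = ((dist > r_in) & (dist <= r_out)).astype(float)
    return X.T @ W @ X / X.shape[0]

def unmixing_estimate(X, coords, r_in, r_out):
    """Simultaneously diagonalise the covariance and one local covariance matrix."""
    M0 = X.T @ X / X.shape[0]  # fields are simulated with zero mean, so no centring
    vals, vecs = np.linalg.eigh(M0)
    M0_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    S = M0_inv_sqrt @ ring_local_covariance(X, coords, r_in, r_out) @ M0_inv_sqrt
    _, U = np.linalg.eigh((S + S.T) / 2)  # eigenvalues in ascending order
    return U.T @ M0_inv_sqrt

rng = np.random.default_rng(1)
grid = np.arange(15.0)
coords = np.array([(a, b) for a in grid for b in grid])  # 15 x 15 unit grid
n = len(coords)
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
# Three independent Gaussian fields with exponential covariances of different ranges
Z = np.column_stack([
    np.linalg.cholesky(np.exp(-dist / r) + 1e-10 * np.eye(n)) @ rng.standard_normal(n)
    for r in (0.5, 1.5, 4.0)
])
Omega = np.array([[2.0, 1.0, 0.0], [0.5, 1.0, -1.0], [1.0, 0.0, 3.0]])
X = Z @ Omega.T  # row i holds X(s_i)^T = {Omega Z(s_i)}^T

Gamma_X = unmixing_estimate(X, coords, 0.0, 1.0)
Gamma_Z = unmixing_estimate(Z, coords, 0.0, 1.0)
Z_hat_from_X = X @ Gamma_X.T
Z_hat_from_Z = Z @ Gamma_Z.T
# Compare absolute values because the components agree only up to sign
max_gap = np.max(np.abs(np.abs(Z_hat_from_X) - np.abs(Z_hat_from_Z)))
```

Sorting the eigenvalues fixes the order of the components in this sketch, so only the signs can differ and `max_gap` should be at the level of rounding error.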

The mean maximum source-wise absolute correlations over 2000 repetitions are shown in Fig. C.2 for a range of sample sizes. We have used two choices of kernels, $R(1,2)$ and $R(7,9)$ . The first one was chosen due to its good performance in the main simulation study and the second one to see how the individual components are estimated under a bad choice of kernel.

The results indicate that the first and third sources are estimated almost equally well, but that the second source is somewhat more difficult to estimate. We postulate that this is because its covariance function is the middle one in the left panel of Fig. 5.1. That is, the first and third sources are unlikely to be confused with each other because their covariance functions are the two extremes. The second source, on the other hand, lies between the other two and can be mistaken for either of them. We also note in the right panel of Fig. C.2 that an inferior choice of kernel leads to worse estimation accuracy for all sources. Finally, in the left panel of Fig. C.2, the maximum absolute correlations converge to one under the appropriate choice of kernel, $R(1,2)$.

Appendix D Further details concerning the real data example

Figure D.1 describes the sampling area from the Kola data and Figures D.2-D.4 visualize the components discussed in Section 6.

Bibliography

Bachoc, F. (2014). Asymptotic analysis of the role of spatial sampling for covariance parameter estimation of Gaussian processes. J. Multivariate Anal. 125, 1–35.
Belouchrani, A., Abed-Meraim, K., Cardoso, J.-F. & Moulines, E. (1997). A blind source separation technique using second-order statistics. IEEE T. Signal Proces. 45, 434–444.
Clarkson, D. B. (1988). Remark AS R71: A remark on algorithm AS 211. The F-G diagonalization algorithm. J. R. Stat. Soc. C-Appl. 37, 147–151.
Comon, P. & Jutten, C. (2010). Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press.
Cressie, N. (1993). Statistics for Spatial Data. John Wiley & Sons, Inc., New York, 2nd ed.
De Iaco, S., Myers, D. E., Palma, M. & Posa, D. (2013). Using simultaneous diagonalization to identify a space–time linear coregionalization model. Math. Geosci. 45, 69–86.
Dudley, R. M. (2002). Real Analysis and Probability. Cambridge University Press.
Eddelbuettel, D. & Sanderson, C. (2014). RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Comput. Stat. Data. An. 71, 1054–1063.
