Unsupervised Deep Slow Feature Analysis for Change Detection in Multi-Temporal Remote Sensing Images

Bo Du; Lixiang Ru; Chen Wu; Liangpei Zhang

arXiv:1812.00645 · cs.CV · September 6, 2019



TL;DR

This paper introduces a novel deep learning-based change detection method called Deep Slow Feature Analysis (DSFA) that effectively highlights changes in multi-temporal remote sensing images by combining deep networks with slow feature analysis.

Contribution

The paper proposes a new DSFA algorithm that integrates deep networks with SFA for improved change detection in complex multi-temporal remote sensing images, outperforming existing methods.

Findings

1. DSFA outperforms state-of-the-art algorithms in experiments.
2. Deep networks enhance feature extraction for change detection.
3. SFA effectively suppresses unchanged components.


Tables (8)

Table I. Change detection results on the Taizhou dataset using OTSU.

OTSU	OA_CHG	OA_UN	OA	Kappa	F1
CVA	0.8439	0.9970	0.9667	0.8890	0.9093
PCA	0.7755	0.9961	0.9525	0.8374	0.8658
MAD	0.8855	0.9474	0.9352	0.8030	0.8148
IRMAD	0.9056	0.9818	0.9667	0.8942	0.9150
USFA	0.7093	0.9922	0.9363	0.7773	0.8148
ISFA	0.8077	0.9991	0.9612	0.8684	0.8918
PCANet	0.8469	0.9992	0.9691	0.8967	0.9155
SDPCANet	0.9151	0.9863	0.9722	0.9115	0.9287
DSFA-64-2	0.8294	0.9982	0.9648	0.8819	0.9032
DSFA-128-2	0.8985	0.9954	0.9763	0.9227	0.9372
DSFA-256-2	0.8450	0.9966	0.9667	0.8888	0.9090

Table II. Change detection results on the Taizhou dataset using Kmeans.

Kmeans	OA_CHG	OA_UN	OA	Kappa	F1
CVA	0.8453	0.9970	0.9670	0.8900	0.9102
PCA	0.7731	0.9964	0.9523	0.8365	0.8649
MAD	0.8827	0.9500	0.9367	0.8066	0.8464
IRMAD	0.9054	0.9818	0.9667	0.8942	0.9149
USFA	0.7166	0.9915	0.9372	0.7814	0.8185
ISFA	0.8074	0.9991	0.9612	0.8683	0.8916
PCANet	0.8469	0.9992	0.9691	0.8967	0.9155
SDPCANet	0.9151	0.9863	0.9722	0.9115	0.9287
DSFA-64-2	0.8316	0.9981	0.9652	0.8830	0.9042
DSFA-128-2	0.9006	0.9951	0.9764	0.9232	0.9377
DSFA-256-2	0.8457	0.9966	0.9668	0.8892	0.9094

Table III. Best change detection results on the Taizhou dataset.

BEST	OA	Kappa	F1
CVA	0.9756	0.9222	0.9373
PCA	0.9633	0.8810	0.9041
MAD	0.9472	0.8298	0.8626
IRMAD	0.9669	0.8945	0.9150
USFA	0.9476	0.8315	0.8640
ISFA	0.9776	0.9287	0.9426
PCANet	0.9691	0.8967	0.9155
SDPCANet	0.9722	0.9115	0.9287
DSFA-64-2	0.9715	0.9070	0.9254
DSFA-128-2	0.9783	0.9304	0.9439
DSFA-256-2	0.9713	0.9072	0.9250

Table IV. Change detection results on the Nanjing dataset using OTSU.

OTSU	OA_CHG	OA_UN	OA	Kappa	F1
CVA	0.8595	0.9168	0.9076	0.6933	0.7487
PCA	0.8625	0.9363	0.9244	0.7398	0.7853
MAD	0.9534	0.8530	0.8691	0.6236	0.6999
IRMAD	0.9530	0.8922	0.9019	0.6987	0.7568
USFA	0.5959	0.9680	0.9084	0.6234	0.6757
ISFA	0.6416	0.9760	0.9224	0.6816	0.7260
PCANet	0.8680	0.9334	0.9229	0.7367	0.7829
SDPCANet	0.8426	0.9397	0.9242	0.7351	0.7806
DSFA-64-2	0.7288	0.9817	0.9412	0.7647	0.7987
DSFA-128-2	0.7465	0.9806	0.9431	0.7747	0.8078
DSFA-256-2	0.7360	0.9793	0.9403	0.7633	0.7980

Table V. Change detection results on the Nanjing dataset using Kmeans.

Kmeans	OA_CHG	OA_UN	OA	Kappa	F1
CVA	0.8578	0.9184	0.9087	0.6958	0.7506
PCA	0.8650	0.9352	0.9240	0.7390	0.7846
MAD	0.9518	0.8557	0.8711	0.6276	0.7028
IRMAD	0.9564	0.8882	0.8991	0.6924	0.7523
USFA	0.5832	0.9692	0.9074	0.6159	0.6685
ISFA	0.6437	0.9760	0.9227	0.6833	0.7275
PCANet	0.8680	0.9334	0.9229	0.7367	0.7829
SDPCANet	0.8426	0.9397	0.9242	0.7351	0.7806
DSFA-64-2	0.7290	0.9817	0.9412	0.7647	0.7987
DSFA-128-2	0.7463	0.9807	0.9432	0.7748	0.8079
DSFA-256-2	0.7361	0.9792	0.9403	0.7632	0.7980

Table VI. Best change detection results on the Nanjing dataset.

BEST	OA	Kappa	F1
CVA	0.9248	0.7178	0.7652
PCA	0.9341	0.7518	0.7925
MAD	0.9227	0.7244	0.7725
IRMAD	0.9229	0.7340	0.7815
USFA	0.9164	0.6997	0.7517
ISFA	0.9336	0.7578	0.7984
PCANet	0.9229	0.7367	0.7829
SDPCANet	0.9242	0.7351	0.7806
DSFA-64-2	0.9450	0.7915	0.8244
DSFA-128-2	0.9439	0.7850	0.8195
DSFA-256-2	0.9409	0.7664	0.8015

Table VII. Change detection results on the River dataset using Kmeans and OTSU.

Method	Kmeans OA_CHG	Kmeans OA_UN	Kmeans OA	Kmeans Kappa	Kmeans F1	OTSU OA_CHG	OTSU OA_UN	OTSU OA	OTSU Kappa	OTSU F1
CVA	0.8168	0.9082	0.8979	0.5868	0.6432	0.8712	0.8770	0.8764	0.5474	0.6135
PCA	0.5899	0.9532	0.9123	0.5531	0.6024	0.5734	0.9560	0.9129	0.5484	0.5971
MAD	0.8022	0.9142	0.9016	0.5927	0.6474	0.8563	0.8864	0.8830	0.5591	0.6223
IRMAD	0.8093	0.9130	0.9013	0.5940	0.6488	0.8271	0.9059	0.8970	0.5872	0.6440
USFA	0.8297	0.8953	0.8879	0.5638	0.6250	0.8400	0.8871	0.8818	0.5514	0.6155
ISFA	0.6127	0.9377	0.9011	0.5267	0.5826	0.6377	0.9314	0.8984	0.5281	0.5856
PCANet	0.8024	0.9487	0.9322	0.6889	0.7273	0.8024	0.9487	0.9322	0.6889	0.7273
SDPCANet	0.5393	0.9850	0.9348	0.6166	0.6507	0.5393	0.9850	0.9348	0.6166	0.6507
DSFA-64-2	0.6164	0.9848	0.9434	0.6796	0.7293	0.6134	0.9851	0.9432	0.6780	0.7102
DSFA-128-2	0.6877	0.9812	0.9482	0.7207	0.7508	0.6864	0.9815	0.9483	0.7206	0.7494
DSFA-256-2	0.6622	0.9777	0.9422	0.6888	0.7283	0.6615	0.9778	0.9422	0.6884	0.7207

Table VIII. Best change detection results on the River dataset.

BEST	OA	Kappa	F1
CVA	0.9264	0.6242	0.6841
PCA	0.9204	0.6075	0.6641
MAD	0.9140	0.5972	0.6481
IRMAD	0.9095	0.5984	0.6510
USFA	0.9180	0.6098	0.6590
ISFA	0.9098	0.5285	0.5879
PCANet	0.9322	0.6889	0.7273
SDPCANet	0.9348	0.6166	0.6507
DSFA-64-2	0.9454	0.7109	0.7419
DSFA-128-2	0.9483	0.7270	0.7566
DSFA-256-2	0.9423	0.7007	0.7344

Equations (100)

\min_{g_{j}} \; \langle \dot{g}_{j}(s)^{2} \rangle_{t}, \quad j \in [1, 2, \dots, M],

\langle g_{j}(s) \rangle_{t} = 0,

\langle g_{j}(s)^{2} \rangle_{t} = 1,

\forall i < j : \; \langle g_{i}(s)\, g_{j}(s) \rangle_{t} = 0,

g_{j}(s) = w_{j}^{T} s,

\langle (w_{j}^{T} \dot{s})^{2} \rangle_{t} = w_{j}^{T} \langle \dot{s} \dot{s}^{T} \rangle_{t} w_{j} = w_{j}^{T} A w_{j},

\langle w_{j}^{T} s \rangle_{t} = 0,

\langle (w_{j}^{T} s)(w_{j}^{T} s) \rangle_{t} = w_{j}^{T} \langle s s^{T} \rangle_{t} w_{j} = w_{j}^{T} B w_{j} = 1,

\langle (w_{i}^{T} s)(w_{j}^{T} s) \rangle_{t} = w_{i}^{T} \langle s s^{T} \rangle_{t} w_{j} = w_{i}^{T} B w_{j} = 0.

\langle (w_{j}^{T} \dot{s})^{2} \rangle_{t} = w_{j}^{T} A w_{j} = \frac{w_{j}^{T} A w_{j}}{w_{j}^{T} B w_{j}} = \frac{\langle (w_{j}^{T} \dot{s})^{2} \rangle_{t}}{\langle (w_{j}^{T} s)(w_{j}^{T} s) \rangle_{t}}.

A W = B W \Lambda,

\min_{w_{j}} \; \frac{1}{n} \sum_{i=1}^{n} (w_{j}^{T} x_{i} - w_{j}^{T} y_{i})^{2},

\frac{1}{2n} \left[ \sum_{i=1}^{n} w_{j}^{T} x_{i} + \sum_{i=1}^{n} w_{j}^{T} y_{i} \right] = 0,

\frac{1}{2n} \left[ \sum_{i=1}^{n} (w_{j}^{T} x_{i})^{2} + \sum_{i=1}^{n} (w_{j}^{T} y_{i})^{2} \right] = 1,

\frac{1}{2n} \left[ \sum_{i=1}^{n} (w_{j}^{T} x_{i})(w_{l}^{T} x_{i}) + \sum_{i=1}^{n} (w_{j}^{T} y_{i})(w_{l}^{T} y_{i}) \right] = 0.

A = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - y_{i})(x_{i} - y_{i})^{T},

B = \frac{1}{2n} \left[ \sum_{i=1}^{n} x_{i} x_{i}^{T} + \sum_{i=1}^{n} y_{i} y_{i}^{T} \right].

\hat{w}_{j} = \frac{w_{j}}{\sqrt{w_{j}^{T} B w_{j}}}.

f_{1}^{1}(X) = s(w_{1}^{1} X + b_{1}^{1}),

X_{\phi} = f(\theta_{1}, X) = s(w_{o}^{1} f_{l}^{1}(X) + b_{o}^{1}),

Y_{\phi} = f(\theta_{2}, Y) = s(w_{o}^{2} f_{l}^{2}(Y) + b_{o}^{2}).

\Sigma_{XX} = \frac{1}{n} \hat{X}_{\phi} \hat{X}_{\phi}^{T} + r I,

\Sigma_{YY} = \frac{1}{n} \hat{Y}_{\phi} \hat{Y}_{\phi}^{T} + r I,

\Sigma_{XY} = \frac{1}{n} (\hat{X}_{\phi} - \hat{Y}_{\phi})(\hat{X}_{\phi} - \hat{Y}_{\phi})^{T}.

A_{\phi} W = B_{\phi} W \Lambda \Leftrightarrow B_{\phi}^{-1} A_{\phi} W = W \Lambda,

\left[ \frac{1}{2} (\Sigma_{XX} + \Sigma_{YY}) \right]^{-1} \Sigma_{XY} W = W \Lambda.

\mathcal{L}(\theta_{1}, \theta_{2}) = \mathrm{tr}\left[ (B_{\phi}^{-1} A_{\phi})^{2} \right],

\nabla_{A} = \frac{\partial \mathcal{L}(\theta_{1}, \theta_{2})}{\partial A_{\phi}} = 2 B_{\phi}^{-1} A_{\phi} B_{\phi}^{-1},

\nabla_{B} = \frac{\partial \mathcal{L}(\theta_{1}, \theta_{2})}{\partial B_{\phi}} = -2 B_{\phi}^{-1} A_{\phi} B_{\phi}^{-1} A_{\phi} B_{\phi}^{-1}.

\frac{\partial A_{\phi}^{ab}}{\partial \hat{X}_{\phi}^{ij}} = \frac{1}{n} \left( \xi_{(a=i)} \hat{X}_{\phi}^{bj} + \xi_{(b=i)} \hat{X}_{\phi}^{aj} \right) - \frac{1}{n} \left( \xi_{(a=i)} \hat{Y}_{\phi}^{bj} + \xi_{(b=i)} \hat{Y}_{\phi}^{aj} \right),


Code & Models

Repositories

rulixiang/dsfanet (TensorFlow)


Full text

Unsupervised Deep Slow Feature Analysis for Change Detection in Multi-Temporal Remote Sensing Images

Bo Du, Lixiang Ru, Chen Wu, and Liangpei Zhang

Manuscript submitted December 2, 2018; revised June 12, 2019. This work was supported in part by the National Natural Science Foundation of China under Grants 61601333, 61822113, and 41871243. Corresponding author: Chen Wu. B. Du is with the School of Computer Science and the Collaborative Innovation Center of Geospatial Technology, Wuhan University, Wuhan, P.R. China. L. Ru is with the School of Computer Science, Wuhan University, Wuhan, P.R. China. C. Wu and L. Zhang are with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, and the School of Computer Science, Wuhan University, Wuhan, P.R. China.

Abstract

Change detection has long been a hotspot in remote sensing. With the increasing availability of multi-temporal remote sensing images, numerous change detection algorithms have been proposed. Among these methods, image transformation methods with feature extraction and mapping can effectively highlight changed information and thus achieve better change detection performance. However, because changes in multi-temporal images are usually complex, existing methods are often not effective enough. In recent years, deep networks have shown excellent performance in many fields, including feature extraction and projection. Therefore, in this paper, based on deep networks and slow feature analysis (SFA) theory, we propose a new change detection algorithm for multi-temporal remote sensing images called Deep Slow Feature Analysis (DSFA). In the DSFA model, two symmetric deep networks are utilized to project the input data of bi-temporal imagery. Then, the SFA module is deployed to suppress the unchanged components and highlight the changed components of the transformed features. Change vector analysis (CVA) pre-detection is employed to find unchanged pixels with high confidence as training samples. Finally, the change intensity is calculated with the chi-square distance, and the changes are determined by thresholding algorithms. The experiments are performed on two real-world datasets and a public hyperspectral dataset. Both visual comparison and quantitative evaluation show that DSFA outperforms the other state-of-the-art algorithms, including other SFA-based and deep learning methods.

Index Terms:

Change detection, Deep network, Slow feature analysis, Remote sensing images.

I Introduction

Change detection is defined as the process of identifying differences in the state of an object or phenomenon by observing it at different times [1]. With the rapid development of remote sensing technology, more and more remote sensing images of the Earth's surface are now available [2, 3, 4]. Multi-temporal remote sensing images covering the same area can help detect land-cover and land-use changes, so change detection can be applied to diverse real-world tasks such as deforestation monitoring, damage assessment, studies of vegetation phenology, and disaster monitoring [5, 6, 7, 8, 9, 10].

Generally, change detection algorithms can be divided into the following categories: 1) image algebra methods, which mainly include image differencing, image ratioing, image regression, and change vector analysis [11, 12], and directly calculate the difference between multi-temporal remote sensing images; 2) image transformation algorithms, which extract effective features of multi-temporal remote sensing images by transforming and combining their feature bands, and mainly include Principal Component Analysis (PCA) [13], Multivariate Alteration Detection (MAD) [14, 15], the Gram-Schmidt transformation (GS) [16], and Independent Component Analysis [17]; 3) classification methods, mainly post-classification and compound classification, which both rely on classification to obtain land-use categories [18, 19, 20, 21]; 4) other advanced methods, including algorithms based on wavelets, Markov random fields, and local gradual descent [22, 23, 24, 25]. Among these categories, image transformation methods have been widely studied and applied. The basic idea of image transformation is to project the original multiband images into a new feature space in which changed and unchanged pixels are better separated. The crucial step in this process is finding an effective projection that extracts the most discriminative features.

Changed pixels in multi-temporal remote sensing images exhibit feature differences in diverse change directions, while the features of unchanged pixels are expected to remain largely invariant [1]. However, owing to atmospheric conditions, illumination, sensor calibration, and similar factors, unchanged pixels usually still show slight differences [26, 27]. Compared with changed pixels, the variations of unchanged pixels usually share a consistent direction. Therefore, by minimizing the feature variation of unchanged pixels, changed pixels can be highlighted and separated. Inspired by this idea, slow feature analysis was applied to detect real changes and achieved satisfactory performance [28, 29].

SFA is a feature learning algorithm that extracts invariant, slowly varying features from input signals [30, 31]. It has been successfully applied to diverse real-world problems, such as human action recognition, dynamic texture recognition, and time series analysis [32, 33, 34, 35]. In change detection, changed and unchanged pixels correspond to quickly and slowly varying features in SFA, respectively. Based on this idea, Wu et al. [28] used SFA to suppress the spectral differences of slowly varying unchanged pixels, so that changed pixels can be highlighted and detected. By solving the SFA problem, the algorithms proposed in [28] obtain projection matrices that map the original data so that the unchanged components are suppressed. These algorithms have shown good performance on several real-world remote sensing images. However, because of their limited representational ability, linear SFA algorithms are sometimes unable to separate changed and unchanged pixels. A potential solution is to project the original features into a higher-dimensional, more complex feature space to increase model complexity and representational ability.

In [36], Wu et al. proposed a kernel slow feature analysis (KSFA) for scene change detection, and the results showed that the nonlinear extension of SFA is effective. However, KSFA was designed only for computing the change probabilities of bi-temporal scene-level features, and some of its details are not suitable for pixel-wise change detection in multispectral imagery. Moreover, KSFA is sensitive to the choice of kernel function: different kernel functions can lead to very different performance.

Deep networks have been shown to be powerful at representing nonlinear functions and can thus project original features into a more complex feature space [37, 38]. Owing to the growing availability of data and computing resources, deep neural networks have seen a resurgence in recent years. Numerous networks have been developed for different tasks, such as classification [39], detection [40], segmentation [41], and feature mapping [38]. In addition, deep networks have recently been applied to learn nonlinear transformations of highly correlated datasets with good results [42].

Therefore, inspired by the use of deep networks to learn nonlinear transformations, we propose a new algorithm called Deep Slow Feature Analysis (DSFA) in this paper. In DSFA, two deep networks extract and represent the features of remote sensing images acquired at different times. The features transformed by the deep networks are then taken as the inputs of SFA to obtain the projection matrix. This projection matrix extracts the most invariant components of the multi-temporal remote sensing images, so the changed pixels are accentuated. We formulate the loss function of the DSFA model so that the transformed features represent the original data better. Because DSFA aims to extract the invariant components of the input features, using unchanged pixels as inputs helps accelerate training and improve the final performance. In practice, however, labeled data are usually rare in remote sensing problems. Therefore, DSFA uses CVA to make a pre-detection and find unchanged pixel pairs as training inputs. Once the deep networks have converged, the transformed features are calculated by passing the original features through the trained networks, and the difference of the transformed features in SFA space is computed. Finally, the change intensity map is calculated with the chi-square distance, and the binary change map is obtained with thresholding algorithms.

The rest of this paper is organized as follows. Section II introduces SFA theory and the details of SFA in change detection. Section III presents the details of the proposed DSFA algorithm. In Section IV, we implement the proposed method and perform experiments on two real-world datasets and a public hyperspectral dataset. Section V discusses several settings of our experiments, and Section VI concludes the paper.

II Slow Feature Analysis

In this section, we introduce the mathematical theory of SFA and describe how SFA is extended to change detection problems. Mathematically, SFA is formulated as follows:

Given a multi-dimensional temporal signal $s(t)=[s_{1}(t),s_{2}(t),\cdots,s_{n}(t)]$, where $n$ is the dimension and $t\in[t_{0},t_{1}]$, the goal of SFA is to find a set of transforming functions $[g_{1}(s),g_{2}(s),\cdots,g_{M}(s)]$ that generate the output signal $z(t)=[g_{1}(s),g_{2}(s),\cdots,g_{M}(s)]$ such that the transformed signal varies as slowly as possible over time. Mathematically, the objective function of SFA is

$$\min_{g_{j}}\ \langle\dot{g}_{j}(s)^{2}\rangle_{t},\quad j\in[1,2,\dots,M], \tag{1}$$

under the following constraints:

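$$\langle g_{j}(s)\rangle_{t}=0, \tag{2}$$

$$\langle g_{j}(s)^{2}\rangle_{t}=1, \tag{3}$$

$$\forall i<j:\ \langle g_{i}(s)\,g_{j}(s)\rangle_{t}=0, \tag{4}$$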

where $\langle g_{j}(s)\rangle_{t}$ denotes the mean of $g_{j}(s)$ over time $t$ and $\dot{g}_{j}(s)$ is the first-order derivative of $g_{j}(s)$. Therefore, the objective of SFA is to minimize the mean of the squared first-order derivative of the transformed signal. Among the constraints, Constraint (2) simplifies the optimization problem, Constraint (3) ensures that each output signal carries some information, and Constraint (4) eliminates the correlation between output signals and forces each signal to carry a different type of information.

In the linear case, each transforming function can be expressed as a linear projection:

$$g_{j}(s)=w_{j}^{T}s, \tag{5}$$

where $w_{j}^{T}$ denotes the transpose of $w_{j}$. The objective function and constraints can then be reformulated as follows:

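$$\langle(w_{j}^{T}\dot{s})^{2}\rangle_{t}=w_{j}^{T}\langle\dot{s}\dot{s}^{T}\rangle_{t}w_{j}=w_{j}^{T}Aw_{j}, \tag{6}$$

$$\langle w_{j}^{T}s\rangle_{t}=0, \tag{7}$$

$$\langle(w_{j}^{T}s)(w_{j}^{T}s)\rangle_{t}=w_{j}^{T}\langle ss^{T}\rangle_{t}w_{j}=w_{j}^{T}Bw_{j}=1, \tag{8}$$

$$\langle(w_{i}^{T}s)(w_{j}^{T}s)\rangle_{t}=w_{i}^{T}\langle ss^{T}\rangle_{t}w_{j}=w_{i}^{T}Bw_{j}=0. \tag{9}$$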

In (6), $A=\langle\dot{s}\dot{s}^{T}\rangle_{t}$ is the expected covariance matrix of the first-order derivative of the input signals. (7) represents Constraint (2) and can be enforced by pre-processing the input data. (8) and (9) denote Constraints (3) and (4), respectively, and $B=\langle ss^{T}\rangle_{t}$ is the expected covariance matrix of the original input signals.

In SFA theory, (8) can be integrated into (6) as follows:

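$$\langle(w_{j}^{T}\dot{s})^{2}\rangle_{t}=w_{j}^{T}Aw_{j}=\frac{w_{j}^{T}Aw_{j}}{w_{j}^{T}Bw_{j}}=\frac{\langle(w_{j}^{T}\dot{s})^{2}\rangle_{t}}{\langle(w_{j}^{T}s)(w_{j}^{T}s)\rangle_{t}}. \tag{10}$$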

And this optimization problem can be solved by the generalized eigenvalue problem:

$$AW=BW\Lambda, \tag{11}$$

where $W$ and $\Lambda$ are the generalized eigenvector matrix and the diagonal matrix of eigenvalues, respectively. According to (10) and (11), the most invariant component of the output signal corresponds to the smallest eigenvalue.
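The generalized eigenvalue formulation can be illustrated on a sampled signal. The following NumPy/SciPy sketch is illustrative only: it approximates the time derivative by first differences, centers the signal to satisfy the zero-mean constraint, and uses a made-up mixture of a slow and a fast sinusoid as input. `scipy.linalg.eigh(A, B)` returns generalized eigenvalues in ascending order with eigenvectors normalized so that $W^{T}BW=I$, so the first column of `W` should recover the slowest feature.

```python
import numpy as np
from scipy.linalg import eigh

def linear_sfa(S):
    """Linear SFA on a sampled signal S of shape (n_dims, n_steps).

    Returns ascending eigenvalues, eigenvectors W solving A W = B W Lambda,
    and the centered signal. A: covariance of the time derivative, B: signal covariance.
    """
    S = S - S.mean(axis=1, keepdims=True)   # zero-mean constraint
    dS = np.diff(S, axis=1)                 # first differences approximate the derivative
    A = dS @ dS.T / dS.shape[1]             # <s' s'^T>_t
    B = S @ S.T / S.shape[1]                # <s s^T>_t
    eigvals, W = eigh(A, B)                 # ascending eigenvalues, W^T B W = I
    return eigvals, W, S

# Toy input: two observed channels that linearly mix a slow and a fast sinusoid.
t = np.linspace(0, 2 * np.pi, 2000)
sources = np.vstack([np.sin(t), np.sin(11 * t)])
mixing = np.array([[1.0, 0.8], [0.5, -1.2]])
signal = mixing @ sources

eigvals, W, centered = linear_sfa(signal)
slowest = W[:, 0] @ centered                # projection with the smallest eigenvalue
corr = abs(np.corrcoef(slowest, np.sin(t))[0, 1])
print(f"|corr(slowest feature, sin t)| = {corr:.4f}")
```

The slowest projection should closely track the slow source, which is the behavior (10) and (11) describe: the smallest eigenvalue corresponds to the most invariant output.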

In pixel-based change detection problems, the input signals are the raw pixels of remote sensing images, which are discrete. Consequently, SFA needs to be reformulated for the discrete case. As shown in Figure 1, the objective of SFA in change detection is to suppress unchanged pixels and highlight changed ones, so that the two can be separated more easily. Mathematically, let $x_{i},y_{i}\in\mathds{R}^{m}$ denote corresponding pixels in bi-temporal remote sensing images, where $m$ is the number of bands. After normalizing the input data, the objective of SFA is reformulated as

$$\min_{w_{j}}\ \frac{1}{n}\sum_{i=1}^{n}(w_{j}^{T}x_{i}-w_{j}^{T}y_{i})^{2}, \tag{12}$$

where $n$ is the total number of pixels. The constraints are rewritten as

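$$\frac{1}{2n}\left[\sum_{i=1}^{n}w_{j}^{T}x_{i}+\sum_{i=1}^{n}w_{j}^{T}y_{i}\right]=0, \tag{13}$$

$$\frac{1}{2n}\left[\sum_{i=1}^{n}(w_{j}^{T}x_{i})^{2}+\sum_{i=1}^{n}(w_{j}^{T}y_{i})^{2}\right]=1, \tag{14}$$

$$\frac{1}{2n}\left[\sum_{i=1}^{n}(w_{j}^{T}x_{i})(w_{l}^{T}x_{i})+\sum_{i=1}^{n}(w_{j}^{T}y_{i})(w_{l}^{T}y_{i})\right]=0. \tag{15}$$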

In the generalized eigenvalue problem of SFA, $A$ and $B$ in $(11)$ are reformulated as follows:

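$$A=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-y_{i})(x_{i}-y_{i})^{T}, \tag{16}$$

$$B=\frac{1}{2n}\left[\sum_{i=1}^{n}x_{i}x_{i}^{T}+\sum_{i=1}^{n}y_{i}y_{i}^{T}\right]. \tag{17}$$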

Once $A$ and $B$ are obtained, the eigenvector matrix $W$ can be solved for. Normalizing $W$ yields the final mapping matrix:

$$\hat{w}_{j}=\frac{w_{j}}{\sqrt{w_{j}^{T}Bw_{j}}}. \tag{18}$$

The change detection result, i.e., the difference between the transformed bi-temporal images, is then calculated as $D_{j}=\hat{w}^{T}x_{j}-\hat{w}^{T}y_{j}$.
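A minimal NumPy/SciPy sketch of this discrete, linear SFA change detection pipeline follows. It is not the authors' implementation: the function name `sfa_change_detection`, the per-band standardization used for "normalizing the input data", and the synthetic test data are assumptions for illustration. Because `scipy.linalg.eigh` already returns $B$-normalized eigenvectors, the explicit rescaling step only mirrors the normalization described in the text.

```python
import numpy as np
from scipy.linalg import eigh

def sfa_change_detection(X, Y):
    """Linear SFA change detection for bi-temporal pixels X, Y of shape (m_bands, n_pixels).

    Returns the ascending eigenvalues, the normalized projection matrix W
    (one column per SFA feature), and the projected difference D = W^T X - W^T Y.
    """
    # Assumption: "normalizing the input data" means per-band zero mean and unit variance.
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    Y = (Y - Y.mean(axis=1, keepdims=True)) / Y.std(axis=1, keepdims=True)
    n = X.shape[1]
    diff = X - Y
    A = diff @ diff.T / n                  # covariance of the bi-temporal difference
    B = (X @ X.T + Y @ Y.T) / (2 * n)      # pooled covariance of both dates
    eigvals, W = eigh(A, B)                # solves A W = B W Lambda, ascending order
    # Rescale columns so that w_j^T B w_j = 1 (eigh output already satisfies this).
    W = W / np.sqrt(np.einsum("ij,ik,kj->j", W, B, W))
    D = W.T @ X - W.T @ Y
    return eigvals, W, D

rng = np.random.default_rng(0)
X = rng.random((4, 500))
Y = 0.9 * X + 0.02 * rng.standard_normal((4, 500))   # mostly unchanged pixels
Y[:, :50] = rng.random((4, 50))                      # the first 50 pixels are changed
eigvals, W, D = sfa_change_detection(X, Y)
print(np.abs(D[:, :50]).mean(), np.abs(D[:, 50:]).mean())
```

By analogy, the post-training step of DSFA described in Section III solves the same generalized eigenvalue problem, but on the network outputs rather than on the raw bands.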

III Methodology

As mentioned above, existing SFA-based change detection algorithms are all linear. To improve the representational ability of the features and the final change detection performance, we propose Deep Slow Feature Analysis (DSFA) in this section. The main structure of DSFA is shown in Figure 2.

As shown in Figure 2, the input of DSFA is pairwise pixels of multi-temporal imagery. DSFA can be roughly divided into two parts: the Deep Network module and the SFA constraint. In the Deep Network module, two symmetric networks composed entirely of fully connected layers project the original input data into a new, more complex high-dimensional feature space. In Figure 2, red nodes denote input-layer nodes, blue nodes denote hidden-layer nodes, and yellow nodes denote output-layer nodes. Each hidden layer of the Deep Network module has the same number of nodes. After the original data are transformed, the SFA constraint is used to suppress the invariant components and highlight the changed components of the transformed features. We formulate the loss function of DSFA so that the parameters of the deep networks can be solved with gradient-based optimization algorithms.

III-A Formulation

Mathematically, DSFA is defined as follows. Assume the original bi-temporal remote sensing images are $X,Y\in\mathds{R}^{m\times n}$, where $m$ and $n$ denote the numbers of feature bands and pixels, respectively. For clarity, let $h_{i}$ denote the number of nodes in the $i$-th hidden layer of the networks, and let $o$ be the number of nodes in the output layer. Given an instance $X$, the output of the first hidden layer is

$$f_{1}^{1}(X)=s(w_{1}^{1}X+b_{1}^{1}), \tag{19}$$

where $w_{1}^{1}\in\mathds{R}^{h_{1}\times m}$ and $b_{1}^{1}\in\mathds{R}^{h_{1}}$ denote the weight matrix and the bias vector, respectively, and $s(\cdot)$ represents the activation function. The outputs of subsequent layers are calculated in the same way. For a network with $l$ hidden layers, the output of the last hidden layer is $f^{1}_{l}(X)=s(w_{l}^{1}f^{1}_{l-1}(X)+b_{l}^{1})$, where $w_{l}^{1}\in\mathds{R}^{h_{l}\times h_{l-1}}$ and $b_{l}^{1}\in\mathds{R}^{h_{l}}$. Then $f^{1}_{l}(X)$ is mapped by the output layer.

The final transformed feature of this network is

$$X_{\phi}=f(\theta_{1},X)=s(w_{o}^{1}f_{l}^{1}(X)+b_{o}^{1}), \tag{20}$$

where $w_{o}^{1}\in\mathds{R}^{o\times h_{l}}$ and $b_{o}^{1}\in\mathds{R}^{o}$ are the weight matrix and bias vector, respectively, and $\theta_{1}$ is the set of all parameters in the network, including $\{w_{1}^{1},\cdots,w_{l}^{1},w_{o}^{1}\}$ and $\{b_{1}^{1},\cdots,b_{l}^{1},b_{o}^{1}\}$. For the other instance $Y$, $Y_{\phi}$ is defined symmetrically with parameters $\theta_{2}$:

$$Y_{\phi}=f(\theta_{2},Y)=s(w_{o}^{2}f_{l}^{2}(Y)+b_{o}^{2}). \tag{21}$$
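The forward pass of the two symmetric networks can be sketched in NumPy as below. The layer widths, the sigmoid activation, the small Gaussian initialization, and the helper names are placeholder assumptions (the text does not fix them here), and training is omitted. Following the text, pixels are stored as columns, so $X\in\mathds{R}^{m\times n}$ and the output has shape $o\times n$; the activation is applied to the output layer as well, as in the expression for $X_{\phi}$.

```python
import numpy as np

def init_network(m, hidden, o, rng):
    """Hypothetical initializer: (weight, bias) pairs for layers m -> hidden... -> o."""
    sizes = [m, *hidden, o]
    return [(0.1 * rng.standard_normal((n_out, n_in)), np.zeros((n_out, 1)))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X, activation=sigmoid):
    """Compute f(theta, X) = s(w_o ... s(w_1 X + b_1) ... + b_o) for X of shape (m, n)."""
    H = X
    for w, b in params:
        H = activation(w @ H + b)   # b has shape (out, 1) and broadcasts over pixels
    return H

rng = np.random.default_rng(0)
m, n, hidden, o = 6, 1000, [128, 128], 10   # placeholder sizes, not the paper's settings
theta_1 = init_network(m, hidden, o, rng)   # network for the first acquisition
theta_2 = init_network(m, hidden, o, rng)   # symmetric network for the second acquisition
X = rng.random((m, n))
Y = rng.random((m, n))
X_phi = forward(theta_1, X)
Y_phi = forward(theta_2, Y)
print(X_phi.shape, Y_phi.shape)
```

The two networks share an architecture but not parameters, which matches the separate parameter sets $\theta_{1}$ and $\theta_{2}$.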

Once the original data are mapped into the new high-dimensional feature space by the deep networks, let $\hat{X}_{\phi}=X_{\phi}-\frac{1}{n}X_{\phi}\mathbf{1}$ and $\hat{Y}_{\phi}=Y_{\phi}-\frac{1}{n}Y_{\phi}\mathbf{1}$ denote the centered $X_{\phi}$ and $Y_{\phi}$, respectively, where $\mathbf{1}\in\mathds{R}^{n\times n}$ is a matrix whose elements are all 1. The covariance matrices of the transformed data are then calculated as

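$$\Sigma_{XX}=\frac{1}{n}\hat{X}_{\phi}\hat{X}_{\phi}^{T}+rI, \tag{22}$$

$$\Sigma_{YY}=\frac{1}{n}\hat{Y}_{\phi}\hat{Y}_{\phi}^{T}+rI, \tag{23}$$

$$\Sigma_{XY}=\frac{1}{n}(\hat{X}_{\phi}-\hat{Y}_{\phi})(\hat{X}_{\phi}-\hat{Y}_{\phi})^{T}, \tag{24}$$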

where $I$ denotes the identity matrix and $r$ is a regularization constant. With $r>0$, $\Sigma_{XX}$ and $\Sigma_{YY}$ are both positive definite and therefore invertible. The generalized eigenvalue problem to be solved in DSFA is then formulated as

$$A_{\phi}W=B_{\phi}W\Lambda\ \Leftrightarrow\ B_{\phi}^{-1}A_{\phi}W=W\Lambda, \tag{25}$$

where $A_{\phi}=\Sigma_{XY}$ and $B_{\phi}=\frac{1}{2}(\Sigma_{XX}+\Sigma_{YY})$. According to (22)-(24), the final form of this problem is

$$\left[\tfrac{1}{2}(\Sigma_{XX}+\Sigma_{YY})\right]^{-1}\Sigma_{XY}W=W\Lambda. \tag{26}$$

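Based on SFA theory, the most invariant component corresponds to the smallest eigenvalue. Thus, the objective of DSFA is designed to minimize the sum of the squares of all eigenvalues, so that the variance of unchanged pixels is suppressed and changed pixels become easier to detect. Since the eigenvalues of $B_{\phi}^{-1}A_{\phi}$ are exactly the generalized eigenvalues $\lambda_{i}$ in $\Lambda$, this sum equals $\mathrm{tr}[(B_{\phi}^{-1}A_{\phi})^{2}]=\sum_{i}\lambda_{i}^{2}$. The loss function of DSFA is therefore formulated as follows: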

$$\mathcal{L}(\theta_{1},\theta_{2})=\mathrm{tr}\left[(B_{\phi}^{-1}A_{\phi})^{2}\right], \tag{27}$$

where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. Using (27), the loss value of DSFA can be calculated, and the network parameters $\theta_{1}$ and $\theta_{2}$ can be obtained with gradient-based optimization algorithms.
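As a numerical sketch of the loss, the code below builds $A_{\phi}$ and $B_{\phi}$ from network outputs and checks that $\mathrm{tr}[(B_{\phi}^{-1}A_{\phi})^{2}]$ equals the sum of squared generalized eigenvalues. It assumes $1/n$-normalized covariance matrices and centering of each output feature over pixels, uses $r=10^{-4}$ (the value selected in Section IV), and replaces the network outputs with random arrays; the helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def dsfa_matrices(X_phi, Y_phi, r=1e-4):
    """Return A_phi = Sigma_XY and B_phi = (Sigma_XX + Sigma_YY) / 2 for (o, n) inputs."""
    o, n = X_phi.shape
    Xh = X_phi - X_phi.mean(axis=1, keepdims=True)   # center each output feature over pixels
    Yh = Y_phi - Y_phi.mean(axis=1, keepdims=True)
    I = np.eye(o)
    S_xx = Xh @ Xh.T / n + r * I
    S_yy = Yh @ Yh.T / n + r * I
    S_xy = (Xh - Yh) @ (Xh - Yh).T / n
    return S_xy, 0.5 * (S_xx + S_yy)

def dsfa_loss(X_phi, Y_phi, r=1e-4):
    """L = tr[(B^{-1} A)^2], using a linear solve instead of an explicit inverse."""
    A, B = dsfa_matrices(X_phi, Y_phi, r)
    C = np.linalg.solve(B, A)
    return np.trace(C @ C)

rng = np.random.default_rng(0)
X_phi = rng.standard_normal((8, 300))                # stand-in for network outputs
Y_phi = X_phi + 0.1 * rng.standard_normal((8, 300))  # mostly unchanged pixel pairs
loss = dsfa_loss(X_phi, Y_phi)
A, B = dsfa_matrices(X_phi, Y_phi)
eigvals = eigh(A, B, eigvals_only=True)
print(loss, np.sum(eigvals ** 2))
```

Both printed values should agree, which is the identity that links the trace-based loss to the "sum of squared eigenvalues" objective.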

III-B Optimization

To calculate the gradient of $\mathcal{L}(\theta_{1},\theta_{2})$ with respect to all $w_{l}^{v}$ and $b_{l}^{v}$, we use the back-propagation algorithm, which requires the gradient of $\mathcal{L}(\theta_{1},\theta_{2})$ with respect to $\hat{X}_{\phi}$ and $\hat{Y}_{\phi}$.

Following [43] and using the fact that $A_{\phi}$ and $B_{\phi}$ are both symmetric, we have

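$$\nabla_{A}=\frac{\partial\mathcal{L}(\theta_{1},\theta_{2})}{\partial A_{\phi}}=2B_{\phi}^{-1}A_{\phi}B_{\phi}^{-1}, \tag{28}$$

$$\nabla_{B}=\frac{\partial\mathcal{L}(\theta_{1},\theta_{2})}{\partial B_{\phi}}=-2B_{\phi}^{-1}A_{\phi}B_{\phi}^{-1}A_{\phi}B_{\phi}^{-1}. \tag{29}$$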
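The closed-form gradients (28) and (29) can be sanity-checked against central finite differences. The NumPy sketch below does this for a small random symmetric $A$ and a symmetric positive definite $B$; it is a verification aid under these assumptions, not part of the DSFA training procedure.

```python
import numpy as np

def loss(A, B):
    C = np.linalg.solve(B, A)   # B^{-1} A
    return np.trace(C @ C)

def analytic_grads(A, B):
    B_inv = np.linalg.inv(B)
    grad_A = 2 * B_inv @ A @ B_inv                  # (28)
    grad_B = -2 * B_inv @ A @ B_inv @ A @ B_inv     # (29)
    return grad_A, grad_B

def numeric_grad(f, M, eps=1e-6):
    """Central finite differences, perturbing one matrix entry at a time."""
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
k = 4
R = rng.standard_normal((k, k))
A = R @ R.T                          # symmetric, like A_phi
S = rng.standard_normal((k, k))
B = S @ S.T + k * np.eye(k)          # symmetric positive definite, like B_phi
grad_A, grad_B = analytic_grads(A, B)
num_A = numeric_grad(lambda M: loss(M, B), A)
num_B = numeric_grad(lambda M: loss(A, M), B)
print(np.max(np.abs(grad_A - num_A)), np.max(np.abs(grad_B - num_B)))
```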

Following the derivation in [42], the gradient of $A_{\phi}$ with respect to each element of $\hat{X}_{\phi}$ is

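$$\frac{\partial A_{\phi}^{ab}}{\partial\hat{X}_{\phi}^{ij}}=\frac{1}{n}\left(\xi_{(a=i)}\hat{X}_{\phi}^{bj}+\xi_{(b=i)}\hat{X}_{\phi}^{aj}\right)-\frac{1}{n}\left(\xi_{(a=i)}\hat{Y}_{\phi}^{bj}+\xi_{(b=i)}\hat{Y}_{\phi}^{aj}\right), \tag{30}$$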

where $\xi_{(e)}$ represents the indicator function. If $e$ is true, then $\xi_{(e)}=1$ , otherwise $\xi_{(e)}=0$ . Similarly, the gradient of $B_{\phi}$ with respect to each element of $\hat{X}_{\phi}$ is computed as follows:

[TABLE]

Combining (28)-(31), the gradient of $\mathcal{L}(\theta_{1},\theta_{2})$ with respect to $\hat{X}_{\phi}^{ij}$ is

[TABLE]

The derivation is not straightforward; its details are presented in Appendix A. The gradient of $\mathcal{L}(\theta_{1},\theta_{2})$ with respect to $\hat{X}_{\phi}$ in matrix form then follows as

[TABLE]

For the other instance, the expression of $\partial\mathcal{L}(\theta_{1},\theta_{2})/\partial\hat{Y}_{\phi}$ is symmetric. Gradient descent algorithms can then be used to minimize the loss and obtain the parameters of the deep network module of DSFA.
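For reference, applying the chain rule to (28) and (29) gives one compact matrix form of this gradient. Assuming $A_{\phi}=\frac{1}{n}(\hat{X}_{\phi}-\hat{Y}_{\phi})(\hat{X}_{\phi}-\hat{Y}_{\phi})^{T}$ and $B_{\phi}=\frac{1}{2n}(\hat{X}_{\phi}\hat{X}_{\phi}^{T}+\hat{Y}_{\phi}\hat{Y}_{\phi}^{T})+rI$, which is consistent with the $1/n$ factor in the element-wise gradient of $A_{\phi}$ above, one obtains $\partial\mathcal{L}/\partial\hat{X}_{\phi}=\frac{2}{n}\nabla_{A}(\hat{X}_{\phi}-\hat{Y}_{\phi})+\frac{1}{n}\nabla_{B}\hat{X}_{\phi}$. This derived expression may be written differently from the result in Appendix A; the NumPy sketch below only checks it against finite differences, treating $\hat{X}_{\phi}$ as a free variable.

```python
import numpy as np

def loss_from_centered(Xh, Yh, r=1e-4):
    """L = tr[(B^{-1} A)^2] as a function of centered features Xh, Yh of shape (o, n)."""
    o, n = Xh.shape
    A = (Xh - Yh) @ (Xh - Yh).T / n
    B = (Xh @ Xh.T + Yh @ Yh.T) / (2 * n) + r * np.eye(o)
    C = np.linalg.solve(B, A)
    return np.trace(C @ C)

def grad_wrt_Xh(Xh, Yh, r=1e-4):
    """Chain-rule gradient dL/dXh = (2/n) grad_A (Xh - Yh) + (1/n) grad_B Xh."""
    o, n = Xh.shape
    A = (Xh - Yh) @ (Xh - Yh).T / n
    B = (Xh @ Xh.T + Yh @ Yh.T) / (2 * n) + r * np.eye(o)
    B_inv = np.linalg.inv(B)
    grad_A = 2 * B_inv @ A @ B_inv                  # (28)
    grad_B = -2 * B_inv @ A @ B_inv @ A @ B_inv     # (29)
    return (2.0 / n) * grad_A @ (Xh - Yh) + (1.0 / n) * grad_B @ Xh

def finite_difference(f, M, eps=1e-6):
    """Central finite differences over every entry of M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
Xh = rng.standard_normal((3, 20))
Yh = Xh + 0.5 * rng.standard_normal((3, 20))
analytic = grad_wrt_Xh(Xh, Yh)
numeric = finite_difference(lambda M: loss_from_centered(M, Yh), Xh)
print(np.max(np.abs(analytic - numeric)))
```

Swapping the roles of $\hat{X}_{\phi}$ and $\hat{Y}_{\phi}$ (and the sign of the $\nabla_{A}$ term) gives the symmetric expression for $\hat{Y}_{\phi}$.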

According to the loss function, the objective of DSFA is to project the difference of pairwise pixels into an invariant difference feature space. Therefore, if unchanged pairwise pixels are used as training samples, the non-linear projection learned by the deep networks will be better at extracting the invariant components. In practice, however, a priori labeled information for change detection is usually hard to obtain. To select unchanged pairwise pixels for training, we use CVA to make a pre-detection: CVA and the Kmeans method are employed to obtain the difference map and the binary change map of the input multi-temporal imagery, respectively. Training samples are then randomly selected from the detected unchanged areas.
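The pre-detection can be sketched in NumPy as follows. The CVA magnitude is the Euclidean norm of each pixel's spectral change vector, and the Kmeans step is a plain two-cluster Lloyd iteration on that magnitude, written out to avoid extra dependencies. The function names, the number of samples, and the synthetic data are hypothetical, and the authors' implementation may differ, for example in how the magnitude is normalized or how Kmeans is initialized.

```python
import numpy as np

def cva_magnitude(X, Y):
    """Per-pixel change-vector magnitude for X, Y of shape (m_bands, n_pixels)."""
    return np.sqrt(np.sum((X - Y) ** 2, axis=0))

def two_means_1d(values, n_iter=100):
    """Two-cluster Lloyd (k-means) on 1-D values; True marks the high ("changed") cluster."""
    c_low, c_high = float(values.min()), float(values.max())
    for _ in range(n_iter):
        is_high = np.abs(values - c_high) < np.abs(values - c_low)
        if is_high.all() or not is_high.any():
            break                                  # degenerate split: keep current centers
        new_low, new_high = values[~is_high].mean(), values[is_high].mean()
        if np.isclose(new_low, c_low) and np.isclose(new_high, c_high):
            break                                  # centers converged
        c_low, c_high = new_low, new_high
    return np.abs(values - c_high) < np.abs(values - c_low)

def select_unchanged_samples(X, Y, n_samples, seed=0):
    """Hypothetical helper: random pixel indices drawn from the CVA/Kmeans unchanged class."""
    changed = two_means_1d(cva_magnitude(X, Y))
    unchanged_idx = np.flatnonzero(~changed)
    rng = np.random.default_rng(seed)
    n_samples = min(n_samples, unchanged_idx.size)
    return rng.choice(unchanged_idx, size=n_samples, replace=False)

rng = np.random.default_rng(1)
X = rng.random((6, 1000))
Y = X + 0.05 * rng.standard_normal((6, 1000))   # unchanged pixels: noise only
Y[:, :100] += 5.0                               # the first 100 pixels are changed
train_idx = select_unchanged_samples(X, Y, n_samples=200)
print(train_idx[:10])
```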

After the training set is obtained and the networks are trained, the original data are passed through the deep networks to obtain the transformed features $X_{\phi}$ and $Y_{\phi}$. Then, the generalized eigenvalue problem is solved to obtain the projection matrix $w_{\phi}$, and the difference between the mapped features is calculated as follows:

[TABLE]

The change intensity of the bi-temporal images can then be calculated. To eliminate scale differences among the feature bands, we use the chi-square distance to measure the intensity of changes, which is calculated as

[TABLE]

In (35), $m$ is the number of feature bands, and $\sigma^{2}$ is the variance of each band, estimated statistically from the data. Thresholding algorithms, such as the OTSU method and the Kmeans method, are then employed to obtain the final binary change map. The complete process of training and generating the binary change map for DSFA is summarized in Algorithm 1.
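A NumPy sketch of the chi-square change intensity followed by OTSU thresholding is given below. It assumes $\sigma_{i}^{2}$ is the variance of the $i$-th row of the projected difference over all pixels, and the 256-bin histogram version of Otsu's method is an illustrative choice, not necessarily the thresholding code used in the experiments. The synthetic difference matrix stands in for the projected DSFA features.

```python
import numpy as np

def chi_square_intensity(D):
    """Change intensity sum_i D_i^2 / sigma_i^2 for a difference D of shape (m_bands, n_pixels)."""
    var = D.var(axis=1, keepdims=True)       # sigma_i^2 for each band
    return np.sum(D ** 2 / var, axis=0)

def otsu_threshold(values, n_bins=256):
    """Threshold that maximizes the between-class variance of a histogram of values."""
    hist, edges = np.histogram(values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)                      # pixel count in bins 0..k
    w1 = w0[-1] - w0                          # pixel count in bins k+1..end
    cum_mass = np.cumsum(hist * centers)
    mu0 = cum_mass / np.maximum(w0, 1)
    mu1 = (cum_mass[-1] - cum_mass) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2      # proportional to between-class variance
    k = np.argmax(between)
    return edges[k + 1]                       # upper edge of the last "unchanged" bin

rng = np.random.default_rng(0)
scales = np.array([[1.0], [2.0], [0.5], [3.0]])   # different band scales
D = scales * rng.standard_normal((4, 2000))
D[:, :200] += 8.0 * scales                        # the first 200 pixels are changed
intensity = chi_square_intensity(D)
threshold = otsu_threshold(intensity)
change_map = intensity >= threshold
truth = np.zeros(2000, dtype=bool)
truth[:200] = True
accuracy = np.mean(change_map == truth)
print(threshold, accuracy)
```

Dividing each band by its variance makes the intensity insensitive to the per-band scales, which is the motivation given for the chi-square distance.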

IV Experiment

To evaluate the performance of DSFA, in this section we implement DSFA in TensorFlow and perform experiments on three multi-temporal remote sensing image datasets: two Enhanced Thematic Mapper (ETM) datasets and a public hyperspectral change detection dataset. The first is the Taizhou dataset, covering the city of Taizhou, China, acquired in 2000 and 2003. The second is the Nanjing dataset, acquired in 2000 and 2002. Both datasets were obtained by the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) sensor with a spatial resolution of 30 m, and six spectral bands (1-5 and 7) are used in our experiments. Band 6, which has a spatial resolution of 60 m, is excluded. The third dataset is the River dataset (available at http://crabwq.github.io/), which consists of two hyperspectral images of size $463\times 241$ acquired in May 2013 and December 2013 in Jiangsu Province, China. Each image in this dataset contains 198 spectral bands after removal of noisy bands.

IV-A Experiment settings

In the DSFA model, the weight and bias matrices of each layer are initialized randomly and then optimized. The remaining values, including the number of layers and nodes in each view and the DSFA regularization parameter $r$ in (22-24), are hyperparameters. We tuned the regularization parameter over the range $[10^{-8},10^{-1}]$ and selected $10^{-4}$ for our proposed model. The influence of $r$ is discussed in Section V.

Several conventional and SFA-based change detection algorithms are also implemented for comparison: CVA, PCA [13], MAD [14], IRMAD [44], USFA [28], ISFA [28], PCANet [45] and SDPCANet [46]. All of them are unsupervised. Before calculating the difference map, the PCA method uses principal component analysis to project the original data into a new, lower-dimensional feature space. MAD is a change detection method based on Canonical Correlation Analysis (CCA), an established theory first proposed in [47]; it uses CCA to maximize the correlation between the features of the multi-temporal images. IRMAD is an iteratively weighted extension of MAD: it first calculates the original MAD variates and, in subsequent iterations, assigns different weights to individual pixels or regions to emphasize the changed parts of the images. USFA and ISFA were proposed in [28]. Based on SFA theory, USFA computes a projection matrix that suppresses the unchanged components of the input data to highlight the changed components, and ISFA is an iteratively weighted extension of USFA that computes its weights in the same way as IRMAD. The PCANet method first uses Gabor wavelets and fuzzy c-means as a pre-detection step to select training samples; a PCANet [48] model is then trained with image patches centered at the pixels of interest, and the change map is finally obtained by classifying the remaining patches with the trained model. SDPCANet extends PCANet by using a context-aware saliency detection method [49] to select more robust and confident training samples in the pre-detection step.

For all of these algorithms, all output feature bands are used to calculate the change intensity.

IV-B Experiments on Taizhou ETM dataset

The study area of the first dataset is Taizhou city, Jiangsu Province, China, and the image size is $400\times 400$. Figure 3 shows the pseudo color and ground truth images of this dataset: (a) and (b) are the pseudo color images acquired in 2000 and 2003, respectively, and (c) is the sampled ground truth image of the changed and unchanged regions, where green pixels represent unchanged regions and red pixels represent changed regions. The background of (c) is the grayscale image of (a) and denotes the unsampled regions. The changed area contains 4227 pixels, and the unchanged area contains 17163 pixels.

In the experiment of DSFA on the Taizhou dataset, 4000 pixels (2.5% of the $400\times 400=160000$ pixels) are randomly selected from the unchanged region of the CVA pre-detected image to train the networks and obtain the SFA projection matrix. Because the networks are randomly initialized, we take the sum of the change intensities of 10 independent DSFA runs as the final change intensity map, and all reported evaluation criteria are computed from this summed map.
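The training sample selection and the summation over runs can be sketched as below. The sketch assumes the CVA magnitude is the per-pixel Euclidean norm of the spectral difference and that a threshold separating CVA-unchanged pixels is already available (for example, from OTSU). `select_training_pixels` and `summed_intensity` are hypothetical helpers, and `run_once` stands in for one complete training-and-inference run of DSFA.

```python
import numpy as np


def cva_magnitude(img1, img2):
    """Change vector analysis: per-pixel Euclidean norm of the spectral difference.

    img1, img2: arrays of shape (height, width, bands).
    """
    return np.linalg.norm(img1.astype(float) - img2.astype(float), axis=-1)


def select_training_pixels(img1, img2, threshold, n_samples, seed=0):
    """Randomly draw n_samples pixel pairs whose CVA magnitude is <= threshold.

    How the CVA threshold is chosen (e.g. OTSU on the magnitude) is assumed here.
    """
    mag = cva_magnitude(img1, img2).ravel()
    unchanged_idx = np.flatnonzero(mag <= threshold)
    rng = np.random.default_rng(seed)
    chosen = rng.choice(unchanged_idx, size=n_samples, replace=False)
    bands = img1.shape[-1]
    return img1.reshape(-1, bands)[chosen], img2.reshape(-1, bands)[chosen], chosen


def summed_intensity(run_once, n_runs=10):
    """Sum the change intensity maps of independent runs (one seed per run).

    run_once is a hypothetical callable: seed -> change intensity map.
    """
    return sum(run_once(seed) for seed in range(n_runs))


# Toy example: 40 x 40 image pair with 6 bands and a changed 10 x 10 block.
rng = np.random.default_rng(2)
img1 = rng.normal(size=(40, 40, 6))
img2 = img1 + 0.1 * rng.normal(size=(40, 40, 6))
img2[:10, :10] += 5.0
x_train, y_train, chosen = select_training_pixels(
    img1, img2, threshold=1.0, n_samples=40)  # 2.5% of the 1600 pixels
```

Summing the maps of several runs is a simple ensemble: pixels that are consistently bright across random initializations accumulate high intensity, while run-specific noise is diluted.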

Figure 4 shows the change intensity maps of the Taizhou dataset obtained by (a) CVA, (b) PCA, (c) MAD, (d) IRMAD, (e) USFA, (f) ISFA, (g) DSFA-64-2, (h) DSFA-128-2, and (i) DSFA-256-2. Since PCANet and SDPCANet are both classification-based methods, they produce no intensity maps. DSFA-$h$-$l$ denotes a DSFA model with $l$ hidden layers of $h$ nodes each. All of these change intensity maps are calculated with all output feature bands, and brighter regions have a higher probability of change. As Figure 4 shows, visually, PCA, ISFA and DSFA-128-2 are the best at differentiating changed and unchanged pixels. The unchanged regions of MAD and IRMAD are gray, which means that these methods do not suppress the unchanged background well. Similarly, CVA and USFA perform poorly in separating changed pixels from the unchanged background. The other DSFA-based methods, DSFA-64-2 and DSFA-256-2, show moderate performance among all these methods. Although the DSFA-based maps visually contain some noise points, these points probably represent truly changed pixels in unsampled regions.
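To illustrate the DSFA-$h$-$l$ naming, here is a sketch of the forward pass of two symmetric fully connected networks, one per acquisition date, with identical architecture but separate parameters. The tanh hidden activation, the weight initialization, and the 10-dimensional output are assumptions chosen for illustration only.

```python
import numpy as np


def init_view(in_dim, h, l, out_dim, rng):
    """Random parameters for one view of a DSFA-h-l network (l hidden layers of h nodes)."""
    dims = [in_dim] + [h] * l + [out_dim]
    return [(rng.normal(0.0, np.sqrt(1.0 / d_in), size=(d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]


def forward(params, x):
    """x: (pixels, bands). tanh on hidden layers and a linear output are assumptions."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b


# Hypothetical DSFA-128-2 on 6-band pixels with a 10-dimensional output.
rng = np.random.default_rng(3)
view_x = init_view(6, 128, 2, 10, rng)   # network for the first date
view_y = init_view(6, 128, 2, 10, rng)   # symmetric network for the second date
pixels_t1 = rng.normal(size=(100, 6))
pixels_t2 = rng.normal(size=(100, 6))
x_phi = forward(view_x, pixels_t1)       # (pixels, features); transpose for (bands, pixels)
y_phi = forward(view_y, pixels_t2)
```

Keeping separate parameters for the two views lets each network absorb date-specific radiometric differences before the shared SFA projection is applied.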

In Table I, we present the accuracy evaluation of the binary change results segmented by the OTSU method. PCANet and SDPCANet are both classification-based methods, so their reported results are their classification outputs and do not need OTSU processing. The evaluation criteria include the overall accuracy of the sampled changed area (OA_CHG), the overall accuracy of the sampled unchanged area (OA_UN), the overall accuracy of all sampled regions (OA), the Kappa coefficient, and the F1 score. The best value of each criterion is highlighted in bold.
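For reference, the five criteria can be computed from the confusion matrix of the sampled pixels, as in the sketch below. It assumes binary maps where `True` means changed and uses the standard Cohen's kappa; `evaluate` is a hypothetical helper name.

```python
import numpy as np


def evaluate(pred, truth, sampled):
    """OA_CHG, OA_UN, OA, Kappa and F1 over sampled pixels (True = changed)."""
    p, t = pred[sampled], truth[sampled]
    tp = np.sum(p & t)     # changed, detected as changed
    tn = np.sum(~p & ~t)   # unchanged, detected as unchanged
    fp = np.sum(p & ~t)    # unchanged, detected as changed
    fn = np.sum(~p & t)    # changed, detected as unchanged
    n = tp + tn + fp + fn
    oa = (tp + tn) / n
    # Chance agreement for Cohen's kappa from the row and column marginals.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "OA_CHG": tp / (tp + fn),
        "OA_UN": tn / (tn + fp),
        "OA": oa,
        "Kappa": (oa - pe) / (1 - pe),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


truth = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], dtype=bool)
pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1], dtype=bool)
sampled = np.array([1] * 10 + [0], dtype=bool)  # the last pixel is unsampled
scores = evaluate(pred, truth, sampled)
# 3 TP, 1 FN, 1 FP, 5 TN -> OA_CHG 0.75, OA_UN ~0.833, OA 0.8, Kappa ~0.583, F1 0.75
```

Restricting the computation to `sampled` mirrors how the ground truth only labels part of each scene; unsampled pixels do not affect any criterion.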

As Table I shows, SDPCANet and PCANet have the best performance on OA_CHG and OA_UN, respectively. In detecting unchanged pixels, IRMAD has the second-worst performance, whereas ISFA performs poorly in detecting changed regions. Notably, DSFA-128-2 outperforms the other algorithms on OA, which indicates that it achieves high accuracy on both the changed and unchanged parts of the images. The other DSFA-based methods also perform very well on OA, especially compared with USFA and ISFA, and all DSFA-based methods have higher Kappa coefficients than USFA and ISFA. The Kappa coefficient and F1 score of DSFA-128-2 are 0.9227 and 0.9372, respectively, which are also much better than those of the other change detection methods. Considering the overall accuracy over all changed and unchanged pixels, the Kappa coefficient, and the F1 score, DSFA-128-2 is the best method, and SDPCANet is the second best, only slightly worse than DSFA-128-2.

The change detection results obtained by the Kmeans method are presented in Table II. As before, the results of PCANet and SDPCANet are their classification outputs and are not processed by Kmeans. As this table shows, none of the methods differs obviously in performance between the two threshold algorithms, which suggests that these methods, including our proposed DSFA-based algorithms, are robust to the choice of threshold method. Accordingly, the results in Table II are very similar to those in Table I. SDPCANet has the best performance on OA_CHG but lower accuracy on OA_UN; conversely, PCANet is the best method in detecting unchanged regions but has low accuracy in detecting changed pixels. Over both changed and unchanged regions, DSFA-128-2 has a detection accuracy of 97.64%, which is still the highest among all methods, and it also has the highest Kappa coefficient and F1 score. In general, all DSFA-based algorithms perform well, and DSFA-128-2 remains the best among all methods.

In Table III, we present the best change detection results on the Taizhou dataset, obtained by traversing all thresholds. Since SDPCANet and PCANet do not need to be post-processed by threshold methods, their results are again based on their classification outputs. This table shows that all DSFA-based methods outperform the other algorithms except CVA and ISFA. Among the DSFA-based methods, DSFA-128-2 has the best performance on all evaluation criteria, and ISFA performs almost the same as DSFA-128-2. It is also worth noting that the best results of USFA and ISFA are much better than those obtained with the OTSU and Kmeans methods, whereas the best results of the DSFA-based methods are very close to those obtained with OTSU and Kmeans. We can therefore conclude that, although the best results of ISFA are very close to those of DSFA, DSFA has much better discriminability, because its change intensity is already well separated by automatic thresholding without searching for the optimal threshold.
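The best result obtained by traversing all thresholds can be sketched as an exhaustive sweep over candidate thresholds. The text does not state which criterion defines the best result; this sketch assumes Kappa and uses evenly spaced candidates between the minimum and maximum intensity, both of which are assumptions. The function names are hypothetical.

```python
import numpy as np


def kappa(pred, truth):
    """Cohen's kappa for binary maps (True = changed)."""
    oa = np.mean(pred == truth)
    pe = pred.mean() * truth.mean() + (1 - pred.mean()) * (1 - truth.mean())
    return 0.0 if np.isclose(pe, 1.0) else (oa - pe) / (1 - pe)


def best_threshold(intensity, truth, sampled, n_steps=256):
    """Sweep evenly spaced thresholds and keep the one with the highest Kappa."""
    vals, labels = intensity[sampled], truth[sampled]
    best_t, best_k = None, -np.inf
    for t in np.linspace(vals.min(), vals.max(), n_steps):
        k = kappa(vals > t, labels)
        if k > best_k:
            best_t, best_k = t, k
    return best_t, best_k


intensity = np.array([0.1, 0.2, 0.3, 0.4, 0.7, 0.8, 0.9])
truth = np.array([0, 0, 0, 0, 1, 1, 1], dtype=bool)
sampled = np.ones(7, dtype=bool)
best_t, best_k = best_threshold(intensity, truth, sampled)
```

Comparing `best_k` with the Kappa obtained at the OTSU or Kmeans threshold gives a direct measure of how much accuracy an automatic threshold leaves on the table, which is the comparison made for USFA, ISFA and DSFA above.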

In Figure 5, we present the binary change maps obtained by the OTSU method for (a) CVA, (b) PCA, (c) MAD, (d) IRMAD, (e) USFA, (f) ISFA, (g) PCANet, (h) SDPCANet, (i) DSFA-64-2, (j) DSFA-128-2 and (k) DSFA-256-2. In this figure, green, red, white, and purple regions represent unchanged pixels detected as unchanged, changed pixels detected as changed, changed pixels detected as unchanged, and unchanged pixels detected as changed, respectively; that is, true negative, true positive, false negative, and false positive samples. As Figure 5 shows, DSFA-128-2 intuitively has the best performance. The results of the MAD-based methods contain more false positive pixels than those of the other algorithms. CVA, PCA and the two SFA-based methods tend to classify changed pixels as unchanged. Compared with DSFA-128-2, PCANet has more false negative regions and SDPCANet has more false positive regions. The other DSFA-based methods, DSFA-64-2 and DSFA-256-2, are prone to classifying some specific changed regions as unchanged.
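The color coding of the binary change maps can be reproduced with a short NumPy function. The exact RGB shades and the gray background for unsampled pixels (as in the ground truth image of Figure 3) are assumptions, and `outcome_map` is a hypothetical name.

```python
import numpy as np

# Legend colors for the binary-map comparison (the exact RGB shades are assumptions).
GREEN, RED, WHITE, PURPLE = (0, 255, 0), (255, 0, 0), (255, 255, 255), (128, 0, 128)


def outcome_map(pred, truth, sampled, background):
    """Color sampled pixels by outcome; unsampled pixels keep the gray background.

    pred, truth, sampled: boolean arrays (H, W); background: uint8 gray image (H, W).
    """
    rgb = np.repeat(background[..., None], 3, axis=-1).astype(np.uint8)
    rgb[sampled & ~pred & ~truth] = GREEN   # true negative
    rgb[sampled & pred & truth] = RED       # true positive
    rgb[sampled & ~pred & truth] = WHITE    # false negative
    rgb[sampled & pred & ~truth] = PURPLE   # false positive
    return rgb


pred = np.array([[True, False], [True, False]])
truth = np.array([[True, True], [False, False]])
sampled = np.array([[True, True], [True, False]])
gray = np.full((2, 2), 90, dtype=np.uint8)
rgb = outcome_map(pred, truth, sampled, gray)
```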

IV-C Experiments on Nanjing ETM dataset

The second experiment is carried out on the Nanjing ETM dataset, which includes two six-band remote sensing images of size $800\times 800$, acquired in 2000 and 2002, respectively. Figure 6 presents the pseudo color images of Nanjing city acquired in (a) 2000 and (b) 2002, and (c) is the ground truth of the sampled changed and unchanged areas. The red part of (c) represents the sampled changed area, which includes 2363 pixels, and the green part is the sampled unchanged area, which includes 12393 pixels.

In the experiment on the Nanjing dataset, we randomly select 8000 pixels from the unchanged area pre-detected by CVA to train our DSFA model. As in the experiment on the Taizhou dataset, the reported evaluation criteria of DSFA are computed from the summed change intensity map of 10 runs.

Figure 7 shows the change intensity maps of the Nanjing dataset obtained by (a) CVA, (b) PCA, (c) MAD, (d) IRMAD, (e) USFA, (f) ISFA, (g) DSFA-64-2, (h) DSFA-128-2, and (i) DSFA-256-2. In this figure, brighter regions have a higher probability of change. As this figure shows, USFA and ISFA have fewer bright areas, which means that they tend to detect far fewer changed pixels than the other change detection algorithms. CVA, MAD and IRMAD have more bright areas, which indicates that these methods are prone to categorizing unchanged pixels as changed. DSFA-128-2 and DSFA-256-2 produce very similar results, and both discriminate well between changed and unchanged pixels. The result of PCA is also very close to that of DSFA-64-2, but in both the distinction between changed and unchanged regions is less obvious. Overall, visually, DSFA-128-2 gives the best change intensity map.

In Table IV, we present the change detection results on the Nanjing dataset using the OTSU method, with the best value of each criterion highlighted in bold. In general, the DSFA-based methods, especially DSFA-128-2, have the best performance among all these methods. DSFA-128-2 outperforms the other algorithms on OA_UN, OA, Kappa coefficient and F1 score, and on these criteria all DSFA-based methods are much better than the others. MAD and IRMAD have the best performance on OA_CHG, which is consistent with their change intensity maps. Similar to MAD and IRMAD, CVA and PCA have very high OA_CHG values but are far worse than the DSFA-based methods on OA_UN, OA and Kappa coefficient. The results of PCANet and SDPCANet are also very similar to those of PCA. Conversely, USFA and ISFA do well in detecting unchanged pixels but have the lowest accuracy on OA_CHG.

Table V shows the evaluation results on the Nanjing dataset using the Kmeans method. Similar to the OTSU results, DSFA is still better than the MAD-based methods in detecting unchanged areas and better than the SFA-based methods in detecting changed areas. Overall, the DSFA-based algorithms have higher overall accuracies, Kappa values and F1 scores than the others. The PCANet-based methods have higher OA_CHG than the DSFA-based methods but worse performance on the other criteria; in general, the PCANet-based methods are the second best.

In Table VI, we present the best result of each change detection method, obtained by traversing all threshold values. The DSFA-based methods are still the best on all criteria. PCA, IRMAD and ISFA have high F1 scores but are much worse on OA and Kappa than the DSFA-based methods. It is also worth noting that the best results of the DSFA-based methods are very close to the results obtained by OTSU and Kmeans, which is evidence of the good discriminability of DSFA's results. In contrast, the thresholded results and the best results of USFA and ISFA differ noticeably, and the best results of CVA, PCA and the MAD-based methods are also much better than their thresholded results in both OA and Kappa coefficient.

Figure 8 shows the binary change maps of (a) CVA, (b) PCA, (c) MAD, (d) IRMAD, (e) USFA, (f) ISFA, (g) PCANet, (h) SDPCANet, (i) DSFA-64-2, (j) DSFA-128-2 and (k) DSFA-256-2, segmented by the OTSU method. The binary change results of DSFA with different network structures are almost the same. Compared with DSFA's results, the results of MAD and IRMAD contain many more purple pixels, which represent false positive samples. Conversely, the results of USFA and ISFA contain more false negative pixels, which are colored white. The results of CVA and PCA are close to DSFA's results but still have fewer true negative and more false positive samples. PCANet and SDPCANet also have a higher false positive rate than the DSFA-based methods.

IV-D Experiments on River dataset

The River dataset consists of two 198-band images with a spatial size of $463\times 241$. The changed regions contain 12566 pixels and the unchanged regions contain 99017 pixels, which together cover all $463\times 241=111583$ pixels. Figure 9 presents the bi-temporal images and the ground truth map of the River dataset, in which changed regions are white and unchanged regions are black.

In Figure 10, we present the change intensity maps of our proposed methods and all comparison methods. PCANet and SDPCANet are classification-based, so there are no corresponding intensity maps for them in this figure. As Figure 10 shows, all DSFA-based methods intuitively have better discriminability than CVA, PCA, and the methods based on MAD and USFA. CVA, PCA and ISFA also separate the changed and unchanged regions better than MAD, IRMAD and USFA. Visually, compared with the ground truth map, the DSFA-based methods have a relatively high false negative rate in the upper-right area of the image, while the other methods have brighter upper-right and lower-left areas, which suggests that they are prone to detecting these areas as changed even though most of them are actually unchanged.

We then apply the OTSU and Kmeans methods to the change intensity maps of the aforementioned methods and evaluate the resulting binary maps with the same criteria. The numerical results, together with those of PCANet and SDPCANet, are presented in Table VII, with the best value of each column highlighted in bold.

As Table VII shows, the DSFA-based methods achieve better performance on OA_UN, OA, Kappa and F1 score. Among all these methods, DSFA-128-2 has the best performance on OA, Kappa and F1 score, and the third-best performance on OA_UN, while DSFA-64-2 and SDPCANet share the highest OA_UN. Although PCANet performs well on OA_CHG and F1 score, its OA_UN, OA and Kappa are much worse than those of the DSFA-based methods. It is also worth noting that, for our proposed methods, the results obtained with Kmeans and OTSU again differ only very slightly, which indicates that DSFA is robust to the choice of threshold method.

The best result of each method, obtained by traversing all possible thresholds, is presented in Table VIII. The DSFA-based methods still have the best performance; in particular, they perform much better on OA, Kappa and F1 score than the other methods. In fact, DSFA-128-2 outperforms all other methods on all criteria. DSFA-64-2 and DSFA-256-2 have the second- and third-best OA and Kappa values, respectively, and are very close to DSFA-128-2 on F1 score. In addition, the best values of the DSFA methods are only slightly better than the results obtained with the threshold methods, which again suggests that the transformed features of DSFA have better discriminability.

In Figure 11, the binary change maps obtained by the different methods are presented. Consistent with Figure 10, the DSFA algorithms have lower accuracy in detecting changes in the upper-right region of the images but much better performance in other regions. The changes in the upper-right region are not apparent and the background is complex, which we believe is the main reason for DSFA's lower accuracy there. In contrast, CVA, PCA, and the MAD-based and SFA-based methods have relatively high false positive rates in both the upper-right and lower-left regions. Note also that SDPCANet has a high false negative rate in the upper-right region, and PCANet tends to categorize the unchanged pixels in the lower-left region as changed. On the whole, the DSFA methods have the best performance both visually and numerically.

IV-E Runtime Analysis

Although DSFA is based on fully connected networks, it is not very time-consuming compared with the other methods. Figure 12 compares the runtimes of IRMAD, ISFA, DSFA-128-2, PCANet and SDPCANet on the three datasets. IRMAD and ISFA are implemented in MATLAB and run on the CPU. PCANet and SDPCANet are also implemented in MATLAB but accelerated with 12 threads. DSFA-128-2 is implemented in Python and run separately on the CPU and the GPU, denoted DSFA-CPU and DSFA-GPU in Figure 12, respectively. The CPU is an Intel Xeon E5 with a clock rate of 2.2 GHz, and the GPU is a single NVIDIA 1080Ti card.

As shown in this figure, ISFA and IRMAD are overall the two fastest methods, followed by DSFA-GPU and DSFA-CPU, while the two PCANet-based methods are the most time-consuming. However, on the River dataset both DSFA-GPU and DSFA-CPU are faster than IRMAD and ISFA, owing to the smaller image size and larger number of spectral bands of this dataset. On the Taizhou and Nanjing datasets, the runtime of DSFA-GPU is very close to those of ISFA and IRMAD. DSFA-CPU is somewhat more time-consuming, but its runtime is still acceptable considering its accuracy improvements over IRMAD and ISFA.

V Discussion

V-A Hyperparameter Analysis

In our experiments, the regularization parameter $r$ in Equation (22-23) is set to $10^{-4}$. In fact, $r$ does not have a significant influence on the final results as long as it is small enough.

In Figure 13, we present the relationship between the final change detection accuracy and $r$ on the three datasets, using DSFA-128-2. When $r<10^{-4}$, the accuracy curves on all three datasets change only negligibly. In contrast, when $r>10^{-4}$, the accuracies are much lower, because a larger $r$ may distort the covariance matrices in Equation (22-23).

V-B Selection of Training Samples

In Figure 14, we present the final accuracies obtained with different training sample selection strategies, using DSFA-128-2 on the River dataset. The Negative and Ground Truth strategies mean that training samples are selected from the changed and unchanged regions of the ground truth image, respectively. The Random strategy means that training samples are selected completely at random from the original imagery, and the CVA strategy means that they are selected from the unchanged regions of the CVA change detection result.

As shown in this figure, the Negative strategy leads to very poor results, since a projection learned from changed pixel pairs conflicts with the main idea of SFA and DSFA. The Random strategy is very slightly better than the CVA and Ground Truth strategies on OA_UN but much worse on the other criteria, because it includes quite a few changed pixel pairs among the training samples, which misleads the training of DSFA. In addition, the results of the CVA strategy are almost the same as those of the Ground Truth strategy, which indicates that DSFA with a simple pre-detection step for generating training samples can achieve essentially the same performance as training with the ground truth. Since labeling ground truth is usually difficult and time-consuming in both research and practical change detection problems, CVA is taken as the pre-detection method in our proposed algorithm.

VI Conclusion

In this paper, we proposed a novel change detection algorithm called DSFA for multi-temporal remote sensing images. In the DSFA model, two deep networks project the bi-temporal input data into a new feature space. SFA is then used to extract the most invariant components of the unchanged pixels and suppress them, thereby highlighting the changed components. We formulated the SFA process and the loss function of the DSFA model and derived the gradient of the loss. The proposed algorithm is unsupervised, meaning that it does not require labeled pixels for training.

We implemented our algorithm and performed experiments on two multispectral datasets and a public hyperspectral dataset. Both the visual and quantitative results show that our method outperforms the other state-of-the-art methods compared, including other SFA-based and deep network algorithms.

Our proposed method currently focuses on differentiating the changed and unchanged regions in bi-temporal remote sensing imagery. Future work will explore DSFA's potential for detecting multi-class changes. In addition, since SFA was originally designed for continuous signals, developing a dedicated DSFA model for change detection in image sequences or video is a promising direction.

Appendix A Derivation of Gradient of loss

Here we present the detailed derivation of the gradient of $\mathcal{L}(\theta_{1},\theta_{2})$ with respect to $\hat{X}_{\phi}$. Based on [43], we have the following equations.

[TABLE]

Based on (36) and the fact that $A_{\phi}$ and $B_{\phi}$ are both symmetric, we obtain:

[TABLE]

Then, using (37), $\nabla_{B}=\partial\mathcal{L}(\theta_{1},\theta_{2})/\partial B_{\phi}$ is calculated as follows:

[TABLE]

Expanding the expression of $A_{\phi}$, we have:

[TABLE]

First, based on the derivation in the appendix of [42], we have:

[TABLE]

Also,

[TABLE]

Substituting (42) and (43) into (41):

[TABLE]

Similarly, with respect to $B_{\phi}$ , we have:

[TABLE]

Putting (44) and (45) together, the gradient of $\mathcal{L}(\theta_{1},\theta_{2})$ with respect to $\hat{X}_{\phi}^{ij}$ is then computed as:

[TABLE]

Obviously, $\nabla_{A}$ and $\nabla_{B}$ are both symmetric matrices. Therefore,

[TABLE]

Finally, we obtain the gradient of $\mathcal{L}(\theta_{1},\theta_{2})$ with respect to $\hat{X}_{\phi}$:

[TABLE]

References


[1] A. Singh, “Review article digital change detection techniques using remotely-sensed data,” International Journal of Remote Sensing, vol. 10, no. 6, pp. 989–1003, 1989.
[2] B. Du, L. Zhang, D. Tao, and D. Zhang, “Unsupervised transfer learning for target detection from hyperspectral images,” Neurocomputing, vol. 120, pp. 72–82, 2013.
[3] L. Zhang, L. Zhang, D. Tao, and X. Huang, “Tensor discriminative locality alignment for hyperspectral image spectral–spatial feature extraction,” IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 242–256, 2013.
[4] B. Du, L. Zhang, L. Zhang, T. Chen, and K. Wu, “A discriminative manifold learning based dimension reduction method for hyperspectral classification,” International Journal of Fuzzy Systems, vol. 14, no. 2, pp. 272–277, 2012.
[5] G. Xian, C. Homer, and J. Fry, “Updating the 2001 National Land Cover Database land cover classification to 2006 by using Landsat imagery change detection methods,” Remote Sensing of Environment, vol. 113, no. 6, pp. 1133–1147, 2009.
[6] G. Xian and C. Homer, “Updating the 2001 National Land Cover Database impervious surface products to 2006 using Landsat imagery change detection methods,” Remote Sensing of Environment, vol. 114, no. 8, pp. 1676–1686, 2010.
[7] P. R. Coppin and M. E. Bauer, “Digital change detection in forest ecosystems with remote sensing imagery,” Remote Sensing Reviews, vol. 13, no. 3-4, pp. 207–234, 1996.
[8] R. E. Kennedy, P. A. Townsend, J. E. Gross, W. B. Cohen, P. Bolstad, Y. Wang, and P. Adams, “Remote sensing change detection tools for natural resource managers: Understanding concepts and tradeoffs in the design of landscape monitoring projects,” Remote Sensing of Environment, vol. 113, no. 7, pp. 1382–1396, 2009.