Locality and Structure Regularized Low Rank Representation for Hyperspectral Image Classification

Qi Wang; Xiang He; Xuelong Li

arXiv:1905.02488·cs.CV·May 8, 2019


TL;DR

This paper introduces a novel regularized low rank representation model that incorporates local geometric structure and spatial-spectral features to improve hyperspectral image classification accuracy.

Contribution

The paper proposes the LSLRR model with locality and structure regularization, enhancing classical LRR for better segmentation and discrimination in hyperspectral data.

Findings

01. LSLRR outperforms state-of-the-art methods on three HSI datasets.

02. The model effectively exploits local and global data structures.

03. Incorporating spatial-spectral features improves classification accuracy.


Tables

Table I. Classification accuracy (%) of the comparison methods and the proposed LSLRR on the Indian Pines dataset

Class	SVM	SVMCK	JRSRC	cdSRC	LRR	LGIDL	LSLRR
1	87.80	70.73	58.54	85.37	19.15	63.41	100
2	78.29	88.17	92.68	91.05	63.04	96.26	95.69
3	64.66	88.35	95.18	91.16	56.36	91.97	94.25
4	77.46	91.08	92.96	94.84	21.13	87.32	98.05
5	91.72	88.74	88.51	92.18	75.17	90.80	94.42
6	97.41	97.72	87.21	99.39	86.15	99.09	98.93
7	64.02	100	72.00	100	47.97	84.01	100
8	98.14	98.14	99.07	100	79.53	96.98	100
9	33.33	38.89	33.33	50.00	11.11	38.89	27.78
10	70.15	88.23	84.91	89.83	72.11	90.17	90.37
11	83.52	96.06	97.56	95.97	83.88	95.74	95.79
12	66.88	81.84	82.02	85.39	37.45	90.26	95.51
13	95.65	90.76	88.04	94.57	80.43	94.02	96.20
14	94.82	98.86	95.69	98.95	90.86	99.21	99.56
15	59.37	81.27	95.10	82.42	24.50	94.24	100
16	94.05	89.29	80.95	94.05	17.86	89.29	97.42
OA	81.67	91.93	92.36	93.61	70.47	94.52	95.63
AA	78.60	86.76	83.99	90.32	54.19	87.60	92.74
kappa	0.7902	0.9076	0.9124	0.9270	0.6545	0.9374	0.9512

Table II. Classification accuracy (%) of the comparison methods and the proposed LSLRR on the Pavia University dataset

Class	SVM	SVMCK	JRSRC	cdSRC	LRR	LGIDL	LSLRR
1	94.11	96.02	95.62	96.57	88.46	96.81	97.33
2	96.94	99.63	99.26	99.44	97.06	99.79	99.98
3	81.44	82.40	88.82	89.27	72.67	89.22	91.98
4	94.37	97.32	91.69	93.27	74.41	98.18	98.73
5	99.30	97.03	99.84	99.92	68.47	100	100
6	86.73	95.63	94.91	96.19	67.54	99.35	99.90
7	86.30	89.47	87.89	92.64	80.36	94.70	96.52
8	84.02	91.14	92.68	94.08	82.68	92.17	94.97
9	99.89	98.00	99.59	99.89	94.56	98.89	99.11
OA	93.05	95.86	96.24	97.02	86.73	97.81	98.52
AA	91.46	94.07	94.47	95.70	80.69	96.57	97.61
kappa	0.9078	0.9524	0.9499	0.9605	0.8186	0.9710	0.9804

Table III. Classification accuracy (%) of the comparison methods and the proposed LSLRR on the Salinas dataset

Class	SVM	SVMCK	JRSRC	cdSRC	LRR	LGIDL	LSLRR
1	99.32	98.59	98.48	88.24	96.12	99.53	99.69
2	99.87	98.14	98.81	99.86	96.07	99.46	99.95
3	99.52	99.04	99.25	91.21	94.25	99.73	99.73
4	98.79	99.40	98.04	87.31	92.37	99.09	99.17
5	97.76	97.41	97.44	99.65	93.87	98.86	99.06
6	99.67	98.94	98.75	99.79	94.42	99.60	99.79
7	99.57	98.44	99.26	98.71	96.62	99.50	99.67
8	88.35	92.82	93.59	97.64	82.73	93.99	94.27
9	99.86	99.02	99.44	99.63	97.88	99.41	99.92
10	95.41	93.38	94.84	96.08	87.32	97.85	99.10
11	96.45	92.32	95.86	78.23	79.70	96.75	99.21
12	99.67	99.73	100	97.87	94.81	99.95	100
13	97.74	96.21	94.48	93.68	92.64	97.82	98.01
14	96.95	92.03	94.39	91.73	72.44	92.62	94.09
15	68.79	88.94	88.20	96.09	61.87	89.04	95.10
16	99.30	96.45	96.97	77.73	86.49	98.66	99.42
OA	92.64	95.46	95.84	96.13	86.81	96.58	97.77
AA	96.06	96.30	96.74	93.34	88.72	97.62	98.51
kappa	0.9179	0.9494	0.9536	0.9524	0.8520	0.9619	0.9752

Table IV. Running time of different HSI classification methods

Methods	OA(%)	AA(%)	Kappa	Time(s)
SVM	81.67	78.60	0.7902	4.23
SVMCK	91.93	86.76	0.9076	6.17
JRSRC	92.36	83.99	0.9124	328.62
cdSRC	93.61	90.32	0.9270	118.86
LRR	70.47	54.19	0.6545	242.37
LGIDL	94.52	87.60	0.9374	382.13
LSLRR	95.63	92.74	0.9512	336.25

Equations

$$\min_{Z,E}\ \operatorname{rank}(Z)+\lambda\lVert E\rVert_{0}\quad\text{s.t.}\quad Y=AZ+E,$$

$$\min_{Z,E}\ \lVert Z\rVert_{*}+\lambda\lVert E\rVert_{1}\quad\text{s.t.}\quad Y=AZ+E,$$

$$\min_{Z,E}\ \lVert Z\rVert_{*}+\lambda\lVert E\rVert_{2,1}\quad\text{s.t.}\quad Y=AZ+E,$$

$$M_{ij}=\lVert x_{i}-x_{j}\rVert_{2}^{2}+\lVert l_{i}-l_{j}\rVert_{2}^{2},$$

$$M_{ij}=\lVert x_{i}-x_{j}\rVert_{2}^{2}+m\lVert l_{i}-l_{j}\rVert_{2}^{2},$$

$$\sum_{i,j}M_{ij}\lvert Z_{ij}\rvert=\lVert M\circ Z\rVert_{1},$$

$$\min_{Z,E}\ \lVert Z\rVert_{*}+\lambda\lVert E\rVert_{2,1}+\alpha\lVert M\circ Z\rVert_{1}\quad\text{s.t.}\quad Y=AZ+E,\ Z\geq 0.$$

$$Z=\begin{bmatrix}Z_{1}^{*} & 0 & 0 & 0\\ 0 & Z_{2}^{*} & 0 & 0\\ 0 & 0 & \ddots & 0\\ 0 & 0 & 0 & Z_{c}^{*}\end{bmatrix},$$

$$\hat{Q}_{ij}=\exp\left(-\frac{\lVert x_{i}-x_{j}\rVert_{2}^{2}+m\lVert l_{i}-l_{j}\rVert_{2}^{2}}{\sigma}\right),$$

$$\min_{Z,E}\ \lVert Z\rVert_{*}+\lambda\lVert E\rVert_{2,1}+\alpha\lVert M\circ Z\rVert_{1}+\beta\lVert Z-Q\rVert_{F}^{2}\quad\text{s.t.}\quad X=\bar{X}Z+E,\ \mathbf{1}_{m}^{T}Z=\mathbf{1}_{m+n}^{T},\ Z\geq 0,$$

$$\min_{Z,E,D}\ \lVert Z\rVert_{*}+\lambda\lVert E\rVert_{2,1}+\alpha\lVert M\circ Z\rVert_{1}+\beta\lVert Z-Q\rVert_{F}^{2}\quad\text{s.t.}\quad X=DZ+E,\ \mathbf{1}_{m}^{T}Z=\mathbf{1}_{m+n}^{T},\ Z\geq 0,$$

$$\operatorname{label}(x_{j})=\arg\max_{l=1,\dots,c}S_{l}(\hat{z}_{j}).$$

$$\min_{H,J,Z,E,D}\ \lVert Z\rVert_{*}+\lambda\lVert E\rVert_{2,1}+\alpha\lVert M\circ Z\rVert_{1}+\beta\lVert Z-Q\rVert_{F}^{2}\quad\text{s.t.}\quad X=DZ+E,\ Z=J,\ H=Z,\ \mathbf{1}_{m}^{T}Z=\mathbf{1}_{m+n}^{T},\ Z\geq 0.$$

$$\min_{H\geq 0,J,Z,E,D}\ \lVert J\rVert_{*}+\lambda\lVert E\rVert_{2,1}+\alpha\lVert M\circ H\rVert_{1}+\beta\lVert Z-Q\rVert_{F}^{2}+\langle Y_{1},X-DZ-E\rangle+\langle Y_{2},Z-J\rangle+\langle Y_{3},H-Z\rangle+\langle Y_{4},\mathbf{1}_{m}^{T}Z-\mathbf{1}_{m+n}^{T}\rangle+\frac{\mu}{2}\left(\lVert X-DZ-E\rVert_{F}^{2}+\lVert Z-J\rVert_{F}^{2}+\lVert H-Z\rVert_{F}^{2}+\lVert\mathbf{1}_{m}^{T}Z-\mathbf{1}_{m+n}^{T}\rVert_{F}^{2}\right),$$

$$H^{k+1}=\arg\min_{H\geq 0}\ \frac{\alpha}{\mu^{k}}\lVert M\circ H^{k}\rVert_{1}+\frac{1}{2}\left\lVert H^{k}-Z^{k}+\frac{Y_{3}^{k}}{\mu^{k}}\right\rVert_{F}^{2}.$$

$$H_{ij}^{k+1}=\max\left[0,\ \Theta_{w_{ij}}\left(Z_{ij}^{k}-\frac{Y_{3,ij}^{k}}{\mu^{k}}\right)\right],$$
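The closed-form update $H_{ij}^{k+1}=\max[0,\Theta_{w_{ij}}(\cdot)]$ can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that $\Theta_{w}$ is the standard soft-thresholding operator and that the thresholds are $w_{ij}=(\alpha/\mu^{k})M_{ij}$, which is what the weighted $\ell_{1}$ term of the $H$ subproblem suggests; the function names `soft_threshold` and `update_H` are hypothetical.

```python
import numpy as np

def soft_threshold(V, W):
    """Elementwise soft-thresholding: sign(v) * max(|v| - w, 0)."""
    return np.sign(V) * np.maximum(np.abs(V) - W, 0.0)

def update_H(Z, Y3, M, alpha, mu):
    """Nonnegative weighted shrinkage for the H subproblem.

    Assumption: thresholds are w_ij = (alpha / mu) * M_ij.
    """
    W = (alpha / mu) * M          # per-entry thresholds (assumed form)
    V = Z - Y3 / mu               # argument of the shrinkage operator
    return np.maximum(0.0, soft_threshold(V, W))

# Toy example: large distances M_ij push the matching entries of H to zero.
Z = np.array([[1.0, -1.0],
              [0.2,  0.5]])
Y3 = np.zeros_like(Z)
M = np.array([[0.5, 0.5],
              [1.0, 0.1]])
H = update_H(Z, Y3, M, alpha=1.0, mu=2.0)
print(H)
```

Entries of `Z` that are negative, or that are smaller than their distance-dependent threshold, come out as exactly zero, which is how the update enforces both sparsity and $H\geq 0$.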

$$J^{k+1}=\arg\min_{J}\ \frac{1}{\mu^{k}}\lVert J^{k}\rVert_{*}+\frac{1}{2}\left\lVert Z^{k}-J^{k}+\frac{Y_{2}^{k}}{\mu^{k}}\right\rVert_{F}^{2}=U S_{1/\mu^{k}}(\Sigma)V^{T},$$
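The expression $U S_{1/\mu^{k}}(\Sigma)V^{T}$ has the form of singular value thresholding. The sketch below shows that operator in NumPy under two assumptions that this listing does not state explicitly: $U\Sigma V^{T}$ is the SVD of $Z^{k}+Y_{2}^{k}/\mu^{k}$, and $S_{\tau}$ shrinks each singular value by $\tau$ and clips at zero. The names `svt` and `update_J` are hypothetical.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: shrink singular values of A by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of U by the shrunken singular values, then recombine.
    return (U * s_shrunk) @ Vt

def update_J(Z, Y2, mu):
    """J update, assuming the SVD is taken of Z + Y2 / mu with tau = 1 / mu."""
    return svt(Z + Y2 / mu, 1.0 / mu)

# Toy example: the small singular value (0.5) is removed, lowering the rank.
Z = np.diag([3.0, 0.5])
J = update_J(Z, np.zeros((2, 2)), mu=1.0)
print(J)
print(np.linalg.matrix_rank(J))
```

Singular values below the threshold vanish, so each $J$ update tends to lower the rank of the representation.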

$$Z^{k+1}=\arg\min_{Z}\ \beta\lVert Z^{k}-Q\rVert_{F}^{2}+\frac{\mu^{k}}{2}\left\lVert Z^{k}-J^{k}+\frac{Y_{2}^{k}}{\mu^{k}}\right\rVert_{F}^{2}+\frac{\mu^{k}}{2}\left\lVert X-DZ^{k}-E^{k}+\frac{Y_{1}^{k}}{\mu^{k}}\right\rVert_{F}^{2}+\frac{\mu^{k}}{2}\left\lVert H^{k}-Z^{k}+\frac{Y_{3}^{k}}{\mu^{k}}\right\rVert_{F}^{2}+\frac{\mu^{k}}{2}\left\lVert\mathbf{1}_{m}^{T}Z^{k}-\mathbf{1}_{m+n}^{T}+\frac{Y_{4}^{k}}{\mu^{k}}\right\rVert_{F}^{2}.$$

$$Z^{k+1}=[W^{k}]^{-1}\left[2\beta Q^{k}+\mu^{k}\left(D^{T}A^{k}+B^{k}+C^{k}+\mathbf{1}_{m}F^{k}\right)\right],$$

$$E^{k+1}=\arg\min_{E}\ \frac{\lambda}{\mu^{k}}\lVert E^{k}\rVert_{2,1}+\frac{1}{2}\left\lVert X-DZ^{k}-E^{k}+\frac{Y_{1}^{k}}{\mu^{k}}\right\rVert_{F}^{2}.$$

$$E^{k+1}(:,i)=\begin{cases}\dfrac{\lVert g_{i}\rVert_{2}-\frac{\lambda}{\mu^{k}}}{\lVert g_{i}\rVert_{2}}\,g_{i}, & \text{if }\frac{\lambda}{\mu^{k}}<\lVert g_{i}\rVert_{2},\\ 0, & \text{otherwise}.\end{cases}$$
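The column-wise rule for $E^{k+1}(:,i)$ is the shrinkage operator associated with the $\ell_{2,1}$ norm. A minimal NumPy sketch follows. It assumes that $g_{i}$ is the $i$-th column of $G=X-DZ^{k}+Y_{1}^{k}/\mu^{k}$, as the $E$ subproblem implies; the names `l21_shrink` and `update_E` are hypothetical, and the small constant guarding the division is an implementation detail added here.

```python
import numpy as np

def l21_shrink(G, tau):
    """Shrink each column of G toward zero by tau in Euclidean norm."""
    norms = np.linalg.norm(G, axis=0)
    # Columns with norm <= tau get scale 0; the guard avoids division by zero.
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return G * scale

def update_E(X, D, Z, Y1, lam, mu):
    """E update, assuming g_i is the i-th column of X - D Z + Y1 / mu."""
    G = X - D @ Z + Y1 / mu
    return l21_shrink(G, lam / mu)

# Toy example: the first column is a large residual, the second a small one.
G = np.array([[3.0, 0.1],
              [4.0, 0.1]])
E = l21_shrink(G, tau=1.0)
print(E)
```

Whole columns are either kept (scaled down) or set to zero, matching the sample-wise noise assumption behind the $\ell_{2,1}$ norm.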

$$D^{k+1}=\arg\min_{D}\ \frac{\mu^{k}}{2}\left\lVert X-D^{k}Z^{k}-E^{k}+\frac{Y_{1}^{k}}{\mu^{k}}\right\rVert_{F}^{2}.$$

$$D^{k+1}=wD^{k}+(1-w)D^{new},$$


Full text

Locality and Structure Regularized Low Rank Representation for Hyperspectral Image Classification

Qi Wang, Xiang He, and Xuelong Li

This work was supported by the National Key R&D Program of China under Grant 2017YFB1002202, National Natural Science Foundation of China under Grant 61773316, Natural Science Foundation of Shaanxi Province under Grant 2018KJXX-024, Fundamental Research Funds for the Central Universities under Grant 3102017AX010, and the Open Research Fund of Key Laboratory of Spectral Imaging Technology, Chinese Academy of Sciences.

Q. Wang is with the School of Computer Science, with the Center for Optical Imagery Analysis and Learning (OPTIMAL) and with the Unmanned System Research Institute (USRI), Northwestern Polytechnical University, Xi’an 710072, Shaanxi, China.

X. He is with the School of Computer Science and the Center for Optical Imagery Analysis and Learning, Northwestern Polytechnical University, Xi’an 710072, Shaanxi, China.

X. Li is with the Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, Shaanxi, P. R. China and with the University of Chinese Academy of Sciences, Beijing 100049, P. R. China.

Abstract

Hyperspectral image (HSI) classification, which aims to assign an accurate label to each hyperspectral pixel, has drawn great interest in recent years. Although low rank representation (LRR) has been used to classify HSI, its ability to segment each class from the whole HSI data has not yet been fully exploited. LRR has a good capacity to capture the underlying low-dimensional subspaces embedded in the original data. However, LRR still has two drawbacks. First, it does not consider the local geometric structure within the data, so the local correlation among neighboring data is easily ignored. Second, the representation obtained by solving LRR is not discriminative enough to separate different data. In this paper, a novel locality and structure regularized low rank representation (LSLRR) model is proposed for HSI classification. To overcome the above limitations, we present a locality constraint criterion (LCC) and a structure preserving strategy (SPS) to improve the classical LRR. Specifically, we introduce a new distance metric, which combines both spatial and spectral features, to explore the local similarity of pixels. Thus, the global and local structures of HSI data can be exploited sufficiently. In addition, we propose a structure constraint that gives the representation a near block-diagonal structure, which helps to determine the final classification labels directly. Extensive experiments have been conducted on three popular HSI datasets, and the experimental results demonstrate that the proposed LSLRR outperforms other state-of-the-art methods.

Index Terms:

Hyperspectral image classification, low rank representation, block-diagonal structure.

I Introduction

Hyperspectral images (HSIs) are acquired by hyperspectral imaging sensors from the same spatial location at different spectral wavelengths. Because the wavelength interval between two neighboring bands is quite small (usually 10 nm), an HSI generally has a very high spectral resolution. An HSI is acquired over hundreds of contiguous wavelengths, covering a wide range from the visible to the infrared spectrum, so it is composed of a great number of spectral bands and contains abundant discriminative information about the observed land surface. Since HSI reflects well the distinct properties of different land materials, HSI classification [1], which assigns a proper label to each pixel of an HSI, has attracted much attention over the past few decades.

Although the rich spectral information of each pixel helps greatly in classifying hyperspectral data, the HSI classification task still faces many challenges. With hundreds of spectral bands, HSI data has a very high dimensionality, which leads to the Hughes phenomenon [2]. In addition, labeling HSI datasets is time-consuming, so most hyperspectral data has very limited training samples, which is another major challenge. To address these problems, a great number of SVM-based approaches have been developed over the past years. The support vector machine (SVM) is a widely used classifier in most classification tasks. Since it can effectively handle high-dimensional data, SVM has achieved great success in HSI classification. SVM with composite kernel (SVMCK) [3] constructs multiple composite kernels that integrate both spectral and spatial information to enhance the classification performance. Specifically, the weighted kernels in [3] can effectively cope with the limited number of labeled samples that HSI usually has. In addition, SVM with graph kernel (SVMGK) [4] develops a recursive graph kernel that considers high-level spatial relationships rather than simple pairwise relations. The graph kernel is easy to compute and is also suitable for small training sets. However, SVM-based methods share a common drawback: their performance is easily influenced by parameter settings.

Motivated by recent developments in subspace segmentation, low rank representation (LRR) has become an effective method for HSI classification. LRR was first proposed for subspace segmentation by Liu et al. in [5]. Owing to its considerable ability to exploit the underlying low-dimensional subspace structures of given data, LRR has attracted extensive attention and achieved great success in various fields, such as face recognition [6], image classification [7], subspace clustering [8], and object detection [9]. In particular, LRR has also been applied successfully to hyperspectral image analysis [10] and has obtained promising performance in the past few years. For instance, Sun et al. [11] presented a structured group low-rank prior, incorporating spatial information, for sparse representation (SR) to classify HSI. Mei et al. [12] proposed to decompose the original hyperspectral data into a low-rank intrinsic spectral signature and sparse noise to alleviate spectral variation, which strongly degrades the performance of hyperspectral analysis. However, common LRR still has some shortcomings. First, although LRR has a great ability to capture the global structure of given data, it ignores the equally crucial local structure. This makes LRR fail to characterize the neighboring relation between pairs of pixels. Second, if all data are located in the union of multiple independent subspaces, the observed data of the same class should lie in the same subspace. Therefore, the ideal representation of the given data would have a class-wise block-diagonal structure. Nevertheless, traditional LRR cannot obtain that structure. Third, most LRR-based methods employ all samples as the dictionary to learn the low-rank representation. However, such a dictionary has too many redundant atoms, which not only increases the computational cost but also decreases the discriminative ability to reveal the latent properties of HSI.

To tackle the aforementioned drawbacks, this paper proposes a novel locality and structure regularized low rank representation (LSLRR) for HSI classification. The main contributions are summarized as follows.

We introduce a new distance metric to measure the similarity of HSI pixels. For HSI classification, spatial information is of great importance for achieving higher classification accuracy. The proposed measurement combines both spectral and spatial features into a unified distance metric, in which the involved parameter can be adjusted to fit HSI datasets whose classes differ in compactness.
We present a novel locality constraint criterion (LCC) for LRR to further exploit the low-dimensional manifold structure of HSI. LRR can effectively capture the global structure of the given data, but the local geometric structure is also significant for most tasks. The proposed LSLRR with LCC characterizes both the global and local structures of HSI to obtain a more reasonable representation.
An effective structure preserving strategy (SPS) is proposed to learn a more discriminative low-rank representation for HSI data. The ideal representation of multi-class data has a class-wise block-diagonal structure, but the original LRR can hardly obtain such a representation. Moreover, the learned representation of the testing set can be used directly to classify HSI.

The remainder of this paper is organized as follows. In Section II, two typical representation-based methods for hyperspectral image analysis are introduced. The proposed LSLRR is then described in detail in Section III. An optimization algorithm for solving LSLRR is derived in Section IV. Section V presents extensive experimental results and the corresponding analyses. Finally, we conclude this paper in Section VI.

II Related Work

Low rank representation (LRR) and sparse representation (SR) are two typical representation-based approaches. This paper mainly focuses on LRR, which has achieved great success in hyperspectral remote sensing [13]. Since SR shares some common features with LRR and has also attracted much attention in recent years, we provide an overview of both SR-based and LRR-based methods for HSI classification in this section. In addition, the dictionary learning techniques involved are also introduced here.

II-A SR-based Methods

Given some data vectors, SR seeks a sparse representation based on a linear combination of atoms in a dictionary. Owing to its great classification performance, SR has been applied widely in hyperspectral analysis. Chen et al. [14] proposed a joint sparsity model that represents the hyperspectral pixels within a patch by the same sparse coefficients. In [15], the sparsity of HSI was exploited by a probabilistic graphical model, which can effectively capture conditional dependences. Zhang et al. [16] developed a nonlocal weighted joint SR model, in which different weights are assigned to spatially neighboring pixels. To solve the problem that SR-based methods usually neglect the representation residuals, Li et al. [17] proposed a robust sparse representation for HSI classification, which is robust to outliers. Moreover, Li et al. [18] presented a new superpixel-level joint sparse model (JSM) for HSI classification, which explores class-level sparsity to combine multiple features of pixels in local regions. A spectral-spatial adaptive SR was developed for HSI compression in [19]; it makes use of both spectral and spatial features and utilizes superpixel segmentation to generate adaptive homogeneous regions. Gan et al. [20] incorporated multiple types of features, which are very helpful for the HSI classification task, into a kernel sparse representation classifier (KSRC). In addition, Fang et al. [21] proposed a multiscale adaptive sparse representation, which effectively integrates contextual features at multiple scales by an adaptive sparse technique. Considering that $\ell_{1}$-based SR may produce unstable representation results, Tang et al. [22] incorporated manifold learning into SR to exploit the local structure and obtain a smooth sample representation. A useful survey of SR-based methods can be found in [23].

II-B LRR-based Methods

Another popular representation-based method is LRR. Different from SR, LRR seeks a low-rank representation of the given data, and many LRR-based approaches have been proposed for hyperspectral image analysis [24]. Du et al. [25] utilized a joint sparse and low rank representation to solve the abundance estimation problem for HSI. In [26], a low-rank constraint is integrated to overcome the drawback of local spectral redundancy and correlation for HSI denoising. Shi et al. [27] proposed a semi-supervised framework for HSI classification, in which LRR reconstruction is employed to decrease the influence of noise and outliers and make domain adaptation more robust. A novel framework combining maximum a posteriori (MAP) estimation and LRR, exploiting the high spectral correlation, is proposed for HSI segmentation in [28]. Considering that the underlying low-dimensional structure in HSI data consists of multiple subspaces rather than a single subspace, Sumarsono et al. [29] adopted LRR as a preprocessing step for supervised and unsupervised classification of HSI. Many studies have demonstrated that contextual information is very beneficial for improving the classification accuracy of HSI. Almost all state-of-the-art work that employs LRR for HSI classification combines both spectral and spatial features. For instance, a new low-rank structured group prior was presented by Sun et al. in [11] to exploit the spatial information between neighboring pixels. Soltani-Farani et al. [30] proposed to add spatial characteristics by partitioning the HSI into several square patches as contextual groups. However, the fixed-size square window neglects the differences between pixels in the same window. He et al. [31] applied a superpixel segmentation algorithm to divide HSI into homogeneous regions with adaptive size, which utilizes contextual features better than fixed-size patches. In addition, a new spectral-spatial HSI classification method using $\ell_{1/2}$ regularized LRR was developed in [32], where the contextual information is efficiently incorporated into the spectral signatures by representing spatially adjacent pixels in a low-rank form.

II-C Dictionary Learning

Since LRR can effectively exploit the global structure of the given data, it is superior to SR in some cases. Even so, LRR and SR have one thing in common: both describe every sample as a linear combination of some atoms in a given dictionary. The selection of the dictionary is therefore fairly important to the performance of LRR. In general, dictionary learning methods can be roughly divided into two categories [33]: (1) learning a dictionary based on a mathematical model. Many traditional models, such as contourlets, wavelets, bandelets, and wavelet packets, can be used to construct an effective dictionary. (2) building a dictionary that behaves well on the training set. The second class of methods has attracted increasing attention. Their major advantage is that they can obtain good experimental results in most practical applications. These state-of-the-art methods include the Method of Optimal Directions (MOD) [34], Union of Orthobases [35], Generalized PCA (GPCA) [36], K-SVD [37], and so on. For HSI classification, some dictionary learning techniques have been proposed. Soltani-Farani et al. [30] presented a spatial-aware dictionary learning method that divides HSI data into contextual neighborhoods and then models the pixels within the same group as a common subspace. Motivated by Learning Vector Quantization (LVQ), Wang et al. [38] proposed a novel dictionary learning method for sparse representation and modeled the spatial context by a Bayesian graph. He et al. [31] applied a joint low rank representation model in every spatial group to learn an appropriate dictionary.

III Locality and Structure Regularized Low Rank Representation (LSLRR)

In this section, we describe the proposed LSLRR in detail. The original LRR formulation is introduced first. Then the two main regularization terms and the dictionary learning scheme are presented. Finally, we derive an optimization algorithm to solve the objective function of LSLRR.

III-A Low Rank Representation

Low rank representation (LRR) is based on the assumption that all data are sufficiently sampled from multiple low-dimensional subspaces embedded in a high-dimensional space. Liu et al. [5] indicate that LRR can effectively explore the underlying low-dimensional structures of the given data. Assume that the data samples $Y\in\mathbb{R}^{d\times n}$ are drawn from a union of subspaces denoted as $\bigcup_{i=1}^{k}S_{i}$, where $S_{1},S_{2},...,S_{k}$ are the low-dimensional subspaces. The LRR model aims to seek the low-rank representation $Z\in\mathbb{R}^{m\times n}$ and the sparse noise $E\in\mathbb{R}^{d\times n}$ based on the given dictionary $A\in\mathbb{R}^{d\times m}$. Specifically, LRR is formulated as the following rank minimization problem

$$\min_{Z,E}\ \operatorname{rank}(Z)+\lambda\lVert E\rVert_{0}\quad\text{s.t.}\quad Y=AZ+E, \tag{1}$$

where $A$ and $E$ are the dictionary matrix and the sparse noise component, respectively. $\lVert\cdot\rVert_{0}$ is the $\ell_{0}$ norm, i.e., the number of nonzero elements. $\lambda$ is the regularization coefficient that balances the rank term and the reconstruction error. It is worth noting that the only difference between SR and LRR is that SR aims to find the sparsest representation while LRR seeks the lowest-rank representation. Nevertheless, LRR can effectively capture the global structure of the data samples.

However, it is difficult to solve the non-convex problem (1) due to the discrete nature of the rank operation and $\ell_{0}$ norm. Therefore, the original minimization problem (1) needs to be relaxed in order to make it solvable. The common convex relaxation of problem (1) is presented as

$$\min_{Z,E}\ \lVert Z\rVert_{*}+\lambda\lVert E\rVert_{1}\quad\text{s.t.}\quad Y=AZ+E, \tag{2}$$

where $\lVert\cdot\rVert_{*}$ is the nuclear norm, defined as the sum of all singular values of a matrix, and $\lVert\cdot\rVert_{1}$ is the $\ell_{1}$ norm, i.e., the sum of the absolute values of all elements. $\lVert Z\rVert_{*}$ and $\lVert E\rVert_{1}$ are the convex envelopes of $\operatorname{rank}(Z)$ and $\lVert E\rVert_{0}$, respectively. Then problem (2) has a nontrivial solution. In fact, the solution of problem (2) is equal to that of problem (1) in the noise-free case [8]. However, in practical applications most data are noisy, or even strongly corrupted. Therefore, for the case where a large number of data samples are grossly corrupted, a robust model [5] is presented as

$$\min_{Z,E}\ \lVert Z\rVert_{*}+\lambda\lVert E\rVert_{2,1}\quad\text{s.t.}\quad Y=AZ+E, \tag{3}$$

where $\lVert\cdot\rVert_{2,1}$ is the $\ell_{2,1}$ norm, defined as $\lVert E\rVert_{2,1}=\sum_{j=1}^{n}\sqrt{\sum_{i=1}^{d}E_{ij}^{2}}$. Compared with the $\ell_{1}$ norm, the $\ell_{2,1}$ norm encourages more columns of $E$ to be zero vectors, which corresponds to the assumption that some samples are clean while others are noisy.
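To make the three norms used in problems (2) and (3) concrete, the following NumPy sketch evaluates them on a toy error matrix. It is only an illustration of the definitions given above; the function names are hypothetical and not taken from the paper.

```python
import numpy as np

def nuclear_norm(Z):
    """Sum of the singular values of Z (convex surrogate of rank)."""
    return np.linalg.svd(Z, compute_uv=False).sum()

def l1_norm(E):
    """Sum of the absolute values of all entries of E."""
    return np.abs(E).sum()

def l21_norm(E):
    """Sum over columns of the Euclidean norm of each column of E."""
    return np.sqrt((E ** 2).sum(axis=0)).sum()

# Toy error matrix with d = 2 bands and n = 3 samples.
# Only the second sample (column) is corrupted.
E = np.array([[0.0, 3.0, 0.0],
              [0.0, 4.0, 0.0]])

print(l1_norm(E))       # 7.0
print(l21_norm(E))      # 5.0
print(nuclear_norm(E))  # approximately 5.0, since E has rank 1
```

The $\ell_{2,1}$ norm charges the corrupted sample once, through the length of its column, rather than entry by entry, which is why it favors errors concentrated in a few columns.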

III-B Locality Constraint Criterion (LCC) for LSLRR

For hyperspectral image (HSI) classification, pixels that are spatial neighbors belong to the same class with high probability. That is, spatial similarity is useful information for improving the classification accuracy of HSI, so it is necessary to incorporate contextual information into the classifier. Furthermore, LRR has a powerful ability to exploit the global structure of HSI data, but it neglects the local manifold structure between adjacent pixels, which is also helpful for classifying HSI. Therefore, we develop a local structure constraint, which utilizes both spectral and spatial similarity, to improve the performance of the original LRR model.

Suppose that the HSI data is denoted as $X=[x_{1},x_{2},...,x_{n}]\in\mathbb{R}^{d\times n}$, where $d$ and $n$ are the number of spectral bands and the number of pixels, respectively, and $x_{i}$ denotes the spectral column vector of the $i$-th pixel of $X$. Similarly, let the spatial feature matrix be $L=[l_{1},l_{2},...,l_{n}]\in\mathbb{R}^{2\times n}$, where $l_{i}$ denotes the position coordinates of the $i$-th pixel. A simple way to compute a distance matrix that combines both spectral and spatial features is

$$M_{ij}=\lVert x_{i}-x_{j}\rVert_{2}^{2}+\lVert l_{i}-l_{j}\rVert_{2}^{2}, \tag{4}$$

where $M_{ij}$ is the distance between the $i$-th and $j$-th pixels. Note that the spectral values of $X$ and the coordinate values of $L$ are normalized to the range [0, 1]. However, this distance metric is not reasonable enough, because the spectral and spatial features have different physical meanings and should not be weighted equally. Therefore, a more accurate similarity metric between two pixels is proposed as

$$M_{ij}=\lVert x_{i}-x_{j}\rVert_{2}^{2}+m\lVert l_{i}-l_{j}\rVert_{2}^{2}, \tag{5}$$

where $m$ is a hyper-parameter that controls the relative weight of the spectral and spatial distances. The compactness of each category differs across HSI datasets, and a large value of $m$ is more appropriate for an HSI dataset whose classes are highly compact. Two pixels with a larger distance should have a smaller similarity. Besides, the low-rank representation $Z$ can be viewed as an affinity matrix, in which $Z_{ij}$ denotes the similarity of the $i$-th and $j$-th samples. As such, to keep the difference between classes and the compactness within classes, the locality constraint is introduced as a penalty term for LRR as follows

$$\sum_{i,j}M_{ij}\lvert Z_{ij}\rvert=\lVert M\circ Z\rVert_{1}, \tag{6}$$

where $\circ$ is the Hadamard product, i.e., the element-wise product of two matrices. Moreover, the locality constraint also takes the sparsity of the low-rank representation matrix $Z$ into account. Because $Z$ stands for the similarity between the dictionary and the original data, all elements of $Z$ should be non-negative. Therefore, the final locality regularization term can be written as $\lVert M\circ Z\rVert_{1}$ with the constraint $Z\geq 0$, and the locality regularized low rank representation (LLRR) model can be formulated as

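$$\min_{Z,E}\ \lVert Z\rVert_{*}+\lambda\lVert E\rVert_{2,1}+\alpha\lVert M\circ Z\rVert_{1}\quad\text{s.t.}\quad Y=AZ+E,\ Z\geq 0. \tag{7}$$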
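The spatial-spectral distance and the locality penalty $\lVert M\circ Z\rVert_{1}$ can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code. It assumes that rows of $M$ index dictionary atoms and columns index samples, so that $M$ has the same shape as $Z$; the function names and the toy data are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between columns of A (d x p) and B (d x q)."""
    diff = A[:, :, None] - B[:, None, :]   # shape (d, p, q)
    return (diff ** 2).sum(axis=0)         # shape (p, q)

def locality_matrix(X_atoms, L_atoms, X_samples, L_samples, m):
    """M_ij = ||x_i - x_j||^2 + m * ||l_i - l_j||^2 (inputs scaled to [0, 1])."""
    spectral = pairwise_sq_dists(X_atoms, X_samples)
    spatial = pairwise_sq_dists(L_atoms, L_samples)
    return spectral + m * spatial

def locality_penalty(M, Z):
    """||M o Z||_1: representation weights on distant pairs cost more."""
    return np.abs(M * Z).sum()

# Toy data: 2 bands, 3 pixels, with the pixels themselves used as the dictionary.
X = np.array([[0.0, 0.1, 1.0],
              [0.0, 0.1, 1.0]])
L = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.5, 1.0]])
M = locality_matrix(X, L, X, L, m=2.0)

Z_local = np.eye(3)                # each pixel represented by itself
Z_spread = np.ones((3, 3)) / 3.0   # weights spread over all pixels
print(locality_penalty(M, Z_local))                                  # 0.0
print(locality_penalty(M, Z_spread) > locality_penalty(M, Z_local))  # True
```

Increasing `m` makes spatially distant pairs more expensive, which corresponds to the remark that a larger $m$ suits datasets whose classes are spatially compact.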

III-C Structure Preserving Strategy (SPS) for LSLRR

The hyperspectral data $X$ is first divided into two parts, $X=[\bar{X},\hat{X}]$, where $\bar{X}$ represents the training data and $\hat{X}$ represents the testing data. The training samples are reordered by class so that $\bar{X}=[\bar{X}_{1},\bar{X}_{2},...,\bar{X}_{c}]\in\mathbb{R}^{d\times m}$, where $\bar{X}_{i}$ is the set of training samples of the i-th class and $c$ denotes the number of classes. Besides, $\hat{X}=[\hat{x}_{1},\hat{x}_{2},...,\hat{x}_{n}]\in\mathbb{R}^{d\times n}$ is the testing feature matrix, whose i-th column is the spectral vector of the i-th testing sample. In this subsection, $m$ and $n$ therefore denote the numbers of training and testing samples, respectively. In the LRR model, we set the data $Y=[\bar{X},\hat{X}]$ and the dictionary $A=\bar{X}$, so that $[\bar{X},\hat{X}]=\bar{X}Z$. Similarly, $Z$ can be written as $[\bar{Z},\hat{Z}]$, where $\bar{Z}$ and $\hat{Z}$ are the low-rank representations of $\bar{X}$ and $\hat{X}$ under the base $\bar{X}$, respectively.

In the general LRR model, all data are used as the dictionary and each sample is an atom of the dictionary, i.e., $X=XZ+E$. After the sparse noise $E$ is removed, the data $X$ can be reconstructed from itself through the low-rank representation $Z$. Furthermore, if the data samples are ordered by class, the ideal representation matrix $Z$ would have a class-wise block-diagonal structure as follows

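$$Z^{*}=\begin{bmatrix}Z^{*}_{1}&0&\cdots&0\\0&Z^{*}_{2}&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&Z^{*}_{c}\end{bmatrix}\qquad(8)$$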

where $c$ is the number of classes. The proposed model $[\bar{X},\hat{X}]=\bar{X}[\bar{Z},\hat{Z}]+E$ has a similar property to the classical LRR model $X=XZ+E$. That is, the representation matrices $\bar{Z}$ and $\hat{Z}$ should also have a class-wise block-diagonal structure of the form (8).
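To make the class-wise block-diagonal structure concrete, the following sketch assembles such a matrix from per-class blocks. It is only an illustration under the assumption that each class contributes one rectangular block; the helper name `class_block_diagonal` is hypothetical.

```python
import numpy as np


def class_block_diagonal(blocks):
    """Place the given 2-D blocks along the diagonal; all other entries are 0."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    Z = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        Z[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return Z


# Two toy classes: 2 atoms representing 2 samples, 1 atom representing 3 samples.
Z_ideal = class_block_diagonal([np.ones((2, 2)), 2.0 * np.ones((1, 3))])
```

Every entry that links an atom of one class to a sample of another class is zero, which is what makes the representation discriminative.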

To make $\bar{Z}$ and $\hat{Z}$ hold the above structure, we introduce a structured auxiliary matrix $Q$ to constrain $Z$. Firstly, $Q$ is also divided into two parts, $\bar{Q}$ and $\hat{Q}$. We obtain $\bar{Z}^{*}_{i}$, $i=1,2,...,c$, by solving the LLRR model (7) with $A=\bar{X}_{i}$. Let $\bar{Q}=diag(\bar{Z}^{*}_{1},\bar{Z}^{*}_{2},...,\bar{Z}^{*}_{c})$, where $diag$ places its arguments as blocks on the diagonal. Note that this step uses the label information of the training data $\bar{X}$, so the class-wise block-diagonal structure of $\bar{Z}$ is easy to preserve. Secondly, it is difficult to enforce the structure (8) on $\hat{Z}$ without prior knowledge of the number of testing samples in each class. However, $\hat{Z}$ contains many zero elements when it has a block-diagonal structure. In addition, as mentioned previously, $Z_{ij}$ represents the similarity of the i-th and j-th samples. We therefore employ the Gaussian similarity function to generate the auxiliary matrix $\hat{Q}$ as follows

[TABLE]

where the parameter $\sigma$ controls the width of the neighborhood. If the distance between the i-th training pixel and the j-th testing pixel is large enough (e.g., larger than $\theta$, where $\theta$ is a maximum-distance parameter), we set $\lVert x_{i}-x_{j}\rVert^{2}_{2}+m\lVert l_{i}-l_{j}\rVert^{2}_{2}=\infty$, so that the corresponding entry of $\hat{Q}$ is zero. Thus, $\hat{Q}$ has many zero elements and $\hat{Z}$ becomes a sparse matrix. Finally, $Q$ is obtained as $Q=[\bar{Q},\hat{Q}]$, and the structure constraint can be written as $\lVert Z-Q\rVert^{2}_{F}$, which makes the low-rank representations $\bar{Z}$ and $\hat{Z}$ approximately block-diagonal.
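The construction of $\hat{Q}$ can be sketched as follows. This is a hedged illustration: the Gaussian is assumed to have the common form $\exp(-d^{2}/(2\sigma^{2}))$, where $d^{2}$ is the weighted squared distance, and the cutoff is assumed to apply to $d$ itself; the exact scaling in the paper's equation may differ. The function name `gaussian_affinity` is hypothetical.

```python
import numpy as np


def gaussian_affinity(dist_sq, sigma, theta):
    """Gaussian similarity with a hard cutoff.

    dist_sq: (m_train, n_test) weighted squared distances
             ||x_i - x_j||^2 + m * ||l_i - l_j||^2.
    sigma:   width of the neighborhood (assumed to enter as 2 * sigma**2).
    theta:   maximum distance; pairs farther apart get similarity 0,
             which matches setting the squared distance to infinity.
    """
    q = np.exp(-dist_sq / (2.0 * sigma ** 2))
    return np.where(dist_sq > theta ** 2, 0.0, q)


dist_sq = np.array([[0.0, 1.0],
                    [9.0, 4.0]])
Q_hat = gaussian_affinity(dist_sq, sigma=1.0, theta=2.5)
```

In this toy case only the pair with squared distance 9 exceeds the cutoff, so only that entry of `Q_hat` is forced to zero.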

Considering that the j-th column of $Z$ represents the similarity between each training pixel and the j-th testing pixel, we enforce the sum of each column of $Z$ to be 1, i.e., $1^{T}_{m}Z=1^{T}_{m+n}$. After incorporating the above two crucial techniques into the classical LRR model, the locality and structure regularized low rank representation can be formulated as

[TABLE]

where $1_{m}$ and $1_{m+n}$ are all-ones vectors of length $m$ and $m+n$, respectively.

III-D Dictionary Learning for LSLRR

Dictionary learning is a crucial step for most classification problems. In LRR, the whole set of samples is usually used as the dictionary. However, when the data samples are corrupted by noise, they cannot reconstruct themselves well from the polluted dictionary. Besides, a high-quality dictionary can significantly improve the performance of classification methods, and learning the low rank representation also becomes easier with a compact dictionary. Here, we learn a discriminative dictionary from the corrupted HSI data.

For the problem (10), the dictionary is randomly selected from the HSI data, and the atoms in $\bar{X}$ are a subset of the whole HSI pixels. In the solving process, the dictionary $\bar{X}$ is fixed. However, if the selected samples are not representative and discriminative for the whole data, or even worse, are grossly corrupted, the obtained low-rank representation $Z$ would be useless. Therefore, we integrate a dictionary learning process into the problem (10) instead of fixing the dictionary atoms. The final objective function can then be written as

[TABLE]

where $\alpha$ and $\beta$ control the weights of the locality and structure constraints, respectively. The proposed method, namely LSLRR, is able to obtain the block-diagonal representation and simultaneously learn a discriminative dictionary. In addition, Fig. 1 illustrates the proposed LSLRR. The given data is first divided into the training set $\bar{X}$ and the testing set $\hat{X}$. Then the low rank representation matrices $\bar{Z}$ for the training set and $\hat{Z}$ for the testing set are obtained based on the dictionary $D$. Besides, $\bar{Z}$ is a block-diagonal matrix, and $\hat{Z}$ is an approximately block-diagonal matrix.

III-E HSI Classification via LSLRR

Hyperspectral pixels belonging to the same class have extremely similar spectral reflectance curves, which is the theoretical basis for classifying HSI. Although HSI data has a great number of bands and the dimensionality is very high, the similarity between neighboring bands is also very high. [39] indicates that many low-dimensional subspaces exist in the HSI data space. Besides, Chakrabarti et al. [40] carried out extensive statistical analyses on real-world HSI data and concluded that the rank of the HSI data matrix is approximately equal to the number of classes. This implies that HSI data satisfy the low-rank property. Pixels of each class occupy a similar position in the whole HSI space, and they make up a low-dimensional subspace. The proposed LSLRR can effectively segment these subspaces embedded in HSI from both global and local aspects. Recall that $\hat{z}_{ij}$ in $\hat{Z}$ stands for the similarity of the i-th training pixel and the j-th testing pixel. The larger the value of $\hat{z}_{ij}$ is, the more likely it is that $x_{i}$ and $x_{j}$ belong to the same class. Therefore, the final classification results can be obtained directly, and there is no need to employ a complex classification algorithm. Specifically, the label of a testing pixel $x_{j}$ is determined as follows. First, compute the sum of the entries in the j-th column of $\hat{Z}$ that correspond to each class. The result is denoted by $S_{l}(\hat{z}_{j})$, $l\in[1,...,c]$. Second, the label of $x_{j}$, denoted by $label(x_{j})$, is determined as

[TABLE]
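The two-step labeling rule described above can be sketched in a few lines. The snippet assumes that the decision picks the class with the largest column sum $S_{l}(\hat{z}_{j})$ and that class labels are 0-based integers; the function name `classify_by_class_sums` is hypothetical.

```python
import numpy as np


def classify_by_class_sums(Z_hat, train_labels, num_classes):
    """Assign each test pixel to the class with the largest summed similarity.

    Z_hat:        (m_train, n_test) representation of the test pixels.
    train_labels: length-m_train class index (0-based) of each training pixel.
    """
    train_labels = np.asarray(train_labels)
    S = np.zeros((num_classes, Z_hat.shape[1]))
    for l in range(num_classes):
        # S_l(z_j): sum of the j-th column restricted to atoms of class l.
        S[l] = Z_hat[train_labels == l].sum(axis=0)
    return S.argmax(axis=0)


Z_hat = np.array([[0.6, 0.1],
                  [0.3, 0.2],
                  [0.1, 0.7]])
predicted = classify_by_class_sums(Z_hat, train_labels=[0, 0, 1], num_classes=2)
```

For the first test pixel the class sums are 0.9 and 0.1, for the second 0.3 and 0.7, so the two pixels are assigned to classes 0 and 1.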

IV Optimization Algorithm for Solving LSLRR

In this section, we derive an optimization algorithm to solve the LSLRR model (11). In recent years, a great number of algorithms [41], [42] have been developed to solve the rank minimization problem. Here, we adopt the efficient inexact Augmented Lagrange Multiplier (IALM) method to solve the proposed LSLRR. Firstly, we introduce two auxiliary variables $H$ and $J$ to make the problem (11) easier to solve. Thus, the problem (11) is converted to the equivalent problem

[TABLE]

Then the corresponding augmented Lagrangian function for (13) can be written as

[TABLE]

where $\langle A,B\rangle=trace(A^{T}B)$, $\mu>0$ is a penalty parameter and $Y_{1}$, $Y_{2}$, $Y_{3}$ and $Y_{4}$ are Lagrange multipliers. An alternating optimization scheme can be applied to solve the problem (14) with its five optimization variables ($H,J,Z,E,D$). The detailed updating schemes are as follows.

Update H: fix $J$, $Z$, $E$, and $D$, and then $H$ can be updated as follows

[TABLE]

The solution for (15) can be computed [43] by

[TABLE]

where $\Theta_{w}(x)=max(x-w,0)+min(x+w,0)$ , $w_{ij}=(\alpha/\mu^{k})M_{ij}$ .
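The shrinkage operator $\Theta_{w}$ defined above acts entry-wise with a separate threshold $w_{ij}$ for every element. A minimal sketch follows; the function name `shrink` and the toy values of `alpha`, `mu` and `M` are assumptions for illustration only, and the matrix the operator is applied to in the $H$ update is not reproduced here.

```python
import numpy as np


def shrink(x, w):
    """Theta_w(x) = max(x - w, 0) + min(x + w, 0), applied element-wise."""
    return np.maximum(x - w, 0.0) + np.minimum(x + w, 0.0)


alpha, mu = 0.8, 2.0                      # toy values
M = np.array([[2.5, 2.5, 2.5]])           # toy distances M_ij
w = (alpha / mu) * M                      # thresholds w_ij = (alpha / mu) * M_ij
shrunk = shrink(np.array([[3.0, -3.0, 0.5]]), w)
```

Entries whose magnitude is below the threshold are set to zero and the others move toward zero by $w_{ij}$, so a larger distance $M_{ij}$ suppresses the corresponding similarity more strongly.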

Update J: fix $H$, $Z$, $E$, and $D$, and then $J$ can be updated as follows

[TABLE]

where $U\Sigma V^{T}$ is the singular value decomposition (SVD) of $Z^{k}+Y^{k}_{2}/\mu^{k}$ , and $S_{\epsilon}(x)=sgn(x)max(|x|-\epsilon,0)$ is the soft-thresholding operator [5].
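The $J$ update is an instance of singular value thresholding: take the SVD of $Z^{k}+Y^{k}_{2}/\mu^{k}$ and soft-threshold the singular values. The sketch below assumes the threshold is $\epsilon=1/\mu^{k}$, which is the usual choice for a nuclear-norm term with unit weight; the function name is hypothetical.

```python
import numpy as np


def singular_value_threshold(A, eps):
    """Return U * S_eps(Sigma) * V^T, where A = U Sigma V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Singular values are non-negative, so S_eps reduces to max(s - eps, 0).
    s_shrunk = np.maximum(s - eps, 0.0)
    return (U * s_shrunk) @ Vt


mu = 0.5
A_toy = np.diag([3.0, 1.0])               # stands in for Z + Y2 / mu
J = singular_value_threshold(A_toy, eps=1.0 / mu)
```

With $\epsilon=2$ the singular values 3 and 1 become 1 and 0, so the result has rank one, illustrating how this step lowers the rank of the representation.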

Update Z: fix $H$, $J$, $E$, and $D$, and then $Z$ can be updated as follows

[TABLE]

Problem (18) is a quadratic minimization problem. It has a closed-form solution, which is obtained by setting the derivative of (18) with respect to $Z$ to zero. The optimal solution for variable $Z$ is

[TABLE]

where $A=X-E+Y_{1}/\mu$ , $B=J-Y_{2}/\mu$ , $C=H+Y_{3}/\mu$ , $F=1^{T}_{m+n}-Y_{4}/\mu$ , and $W=2\beta I+\mu(D^{T}D+2I+1_{m}1^{T}_{m})$ .
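The closed form can be checked numerically. The sketch below assumes that the $Z$ subproblem is $\beta\lVert Z-Q\rVert^{2}_{F}+\frac{\mu}{2}(\lVert A-DZ\rVert^{2}_{F}+\lVert Z-B\rVert^{2}_{F}+\lVert Z-C\rVert^{2}_{F}+\lVert 1^{T}_{m}Z-F\rVert^{2}_{2})$, which is the quadratic whose stationarity condition yields the matrix $W$ stated above, giving $WZ=2\beta Q+\mu(D^{T}A+B+C+1_{m}F)$. The function names and the random test data are hypothetical.

```python
import numpy as np


def update_Z(D, Q, A, B, C, F, beta, mu):
    """Solve W Z = 2*beta*Q + mu*(D^T A + B + C + 1_m F) for Z."""
    m = D.shape[1]
    ones = np.ones((m, 1))
    W = 2 * beta * np.eye(m) + mu * (D.T @ D + 2 * np.eye(m) + ones @ ones.T)
    rhs = 2 * beta * Q + mu * (D.T @ A + B + C + ones @ F)
    # W is symmetric positive definite, so a linear solve avoids an inverse.
    return np.linalg.solve(W, rhs)


def objective(Z, D, Q, A, B, C, F, beta, mu):
    """Assumed quadratic subproblem in Z (see the lead-in)."""
    ones_row = np.ones((1, D.shape[1]))
    fit = (np.sum((A - D @ Z) ** 2) + np.sum((Z - B) ** 2)
           + np.sum((Z - C) ** 2) + np.sum((ones_row @ Z - F) ** 2))
    return beta * np.sum((Z - Q) ** 2) + 0.5 * mu * fit


rng = np.random.default_rng(0)
d, m, N = 5, 3, 7                          # bands, atoms, samples (toy sizes)
D = rng.normal(size=(d, m))
Q, B, C = rng.random((m, N)), rng.random((m, N)), rng.random((m, N))
A = rng.normal(size=(d, N))
F = np.ones((1, N))
args = (D, Q, A, B, C, F, 0.6, 1.5)
Z_star = update_Z(*args)
f_star = objective(Z_star, *args)
f_perturbed = [objective(Z_star + 1e-3 * rng.normal(size=Z_star.shape), *args)
               for _ in range(5)]
```

Since the objective is strictly convex, any perturbation of `Z_star` should increase its value, which gives a quick sanity check of the derivation.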

Update E: fix $H$, $J$, $Z$, and $D$, and then $E$ can be updated as follows

[TABLE]

Denote $G=X-DZ+Y_{1}/\mu$ , then the j-th column of optimal $E$ [5] is

[TABLE]
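The column-wise solution for $E$ follows the standard result used in LRR [5]. The sketch below assumes that the noise term is penalized by $\lambda\lVert E\rVert_{2,1}$, so that each column of $G$ is scaled by $\max(\lVert g_{j}\rVert_{2}-\lambda/\mu,0)/\lVert g_{j}\rVert_{2}$; the function name is hypothetical.

```python
import numpy as np


def l21_shrink_columns(G, tau):
    """Column-wise shrinkage: the proximal operator of tau * ||E||_{2,1}."""
    norms = np.linalg.norm(G, axis=0)
    # Columns with norm <= tau vanish; the tiny floor avoids division by zero.
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return G * scale


lam, mu = 2.0, 2.0
G = np.array([[3.0, 0.1],
              [4.0, 0.0]])
E = l21_shrink_columns(G, tau=lam / mu)
```

The first column has norm 5 and is scaled by 0.8, while the second column has norm 0.1, below the threshold, and is removed entirely, so only strongly corrupted samples keep a non-zero error column.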

Update D: fix $H$, $J$, $Z$, and $E$, and then $D$ can be updated as follows

[TABLE]

Problem (22) is also a quadratic minimization problem. Here, we employ an iteration updating strategy to obtain the optimal solution of dictionary $D$ . Firstly, we initialize the dictionary $D^{0}$ by randomly selecting a part of HSI pixels. Secondly, the updating dictionary $D^{new}$ is obtained by solving the problem (22). Finally, the detailed updating rule is

[TABLE]

where $w$ is a weight parameter. For each iteration, $D^{new}=(X-E+Y^{k}_{1}/\mu^{k})Z^{T}(ZZ^{T})^{-1}$ .
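The dictionary step can be sketched as below. The least-squares solution $D^{new}=(X-E+Y_{1}/\mu)Z^{T}(ZZ^{T})^{-1}$ is taken from the text; the blending rule is an assumption, written here as the convex combination $D=wD^{old}+(1-w)D^{new}$, and may differ from the paper's exact updating rule. The function name is hypothetical, and $ZZ^{T}$ is assumed to be invertible.

```python
import numpy as np


def update_dictionary(D_old, X, E, Y1, Z, mu, w):
    """Blend the previous dictionary with the least-squares solution D_new."""
    T = X - E + Y1 / mu
    # D_new (Z Z^T) = T Z^T  <=>  (Z Z^T) D_new^T = Z T^T  (Z Z^T is symmetric).
    D_new = np.linalg.solve(Z @ Z.T, Z @ T.T).T
    # Assumed blending rule with weight w in [0, 1].
    return w * D_old + (1.0 - w) * D_new


rng = np.random.default_rng(0)
D_true = rng.normal(size=(4, 3))
Z = rng.normal(size=(3, 10))               # full row rank with probability 1
X = D_true @ Z                             # noise-free toy data
zeros = np.zeros_like(X)
D_half = update_dictionary(np.zeros_like(D_true), X, zeros, zeros, Z,
                           mu=1.0, w=0.5)
```

In this noise-free case the least-squares step recovers `D_true` exactly, so blending it with a zero dictionary at `w=0.5` returns half of `D_true`.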

Finally, the overall optimization algorithm for solving the proposed LSLRR (11) is described in Algorithm 1.

V Experiments and Analyses

In this section, comprehensive experiments are conducted to demonstrate the effectiveness of the proposed LSLRR for HSI classification. Several state-of-the-art classification algorithms are used as comparison methods. After the experiments, detailed analyses are given.

V-A Dataset Descriptions

To evaluate the classification performance of the proposed LSLRR model, three popular hyperspectral datasets are used to conduct the verification experiments. The detailed descriptions are shown as follows [44].

Indian Pines: The scene was collected by the AVIRIS sensor over a mostly agricultural region in northwestern Indiana, USA. The dataset is composed of $145\times 145$ pixels with 220 spectral bands whose wavelengths range from 0.4 to 2.5 $\mu m$. After removing some noisy and water-absorption bands, the remaining image has 200 spectral bands, which are used for the classification task. There are 16 classes in this dataset.

Pavia University: This dataset was captured by the ROSIS sensor over the urban area of the University of Pavia, northern Italy, on July 8, 2002. The original dataset consists of 115 spectral bands covering 0.43-0.86 $\mu m$, of which 12 noisy bands are removed and 103 bands are retained. Each band has $610\times 340$ pixels with a spatial resolution of 1.3 meters per pixel. Nine categories of ground cover are considered in the classification experiments.

Salinas: The image was also gathered by the AVIRIS sensor and covers the same 0.4-2.5 $\mu m$ wavelength range as Indian Pines. It has a high spatial resolution of 3.7 meters per pixel. The covered area consists of 512 lines and 217 samples. After discarding some polluted bands, 204 spectral bands remain. The number of ground categories is also 16. This scene mainly consists of bare soils, vegetables, and vineyard fields.

V-B Experimental Setups

Before demonstrating the experimental results, the comparison methods, corresponding parameter settings and evaluation indexes are first introduced as follows.

1). Comparison Algorithms: To verify the superiority of the proposed LSLRR, some state-of-the-art HSI classification methods are considered. They are 1) SVM [45]; 2) SVMCK [3]; 3) JRSRC [46]; 4) cdSRC [47]; 5) LRR [5]; 6) LGIDL [31].

The above competitors can roughly be divided into three categories: SVM-based, SR-based, and LRR-based methods. To be specific, the classic Support Vector Machine (SVM) is a strong classifier that has been widely applied in HSI classification. Another powerful SVM-based method, SVM with composite kernel (SVMCK), achieves promising classification accuracy by incorporating contextual information into the kernels. Furthermore, we also take two SR-based classification algorithms into account. The first one is the joint robust sparse representation classifier (JRSRC), in which the pixels in a neighboring region are represented jointly by common training samples with the same sparse coefficients. An advantage of JRSRC is that it is robust to HSI outliers. The second is the class-dependent sparse representation classifier (cdSRC), which integrates the idea of KNN into SRC in a class-wise manner and characterizes both the Euclidean distance and the correlation information between the training and testing sets. Finally, the LRR-based approaches are the original LRR and LGIDL. Among them, LGIDL employs superpixel segmentation to obtain adaptive spatial correlation regions and yields fairly competitive performance.

2). Parameter Settings: Every method is repeated ten times to avoid the bias due to random sampling. All free parameters of these algorithms are determined via cross validation, using training data only. For the SVM-based comparison methods, we choose the RBF kernel $K(x_{i},x_{j})=exp(-\gamma\lVert x_{i}-x_{j}\rVert^{2})$ as the kernel function of SVM, and the optimal parameters $C$ and $\gamma$ are tuned by grid search. The one vs. one strategy is applied in the implementation of SVM. Specifically, the parameters of SVM are $C=2000$, $\gamma=0.1$ for Indian Pines, $C=1500$, $\gamma=0.08$ for Pavia University, and $C=4000$, $\gamma=0.001$ for Salinas. For SVMCK, we select the mean spectral values of square patches as the spatial feature, and employ the weighted summation kernel to balance the spatial and spectral components. The patch size $T$ and kernel weight $\mu$ for the three datasets are $\{T=15,\ \mu=0.7\}$, $\{T=5,\ \mu=0.8\}$, and $\{T=50,\ \mu=0.4\}$. Moreover, the parameter settings of JRSRC and LGIDL follow [46] and [31], respectively. For the proposed LSLRR, the corresponding parameters are set as $\{\lambda=20,\ \alpha=0.8,\ \beta=0.6,\ m=25\}$, $\{\lambda=10,\ \alpha=0.3,\ \beta=1.2,\ m=15\}$, $\{\lambda=10,\ \alpha=1,\ \beta=0.4,\ m=40\}$ for the three HSI datasets, respectively.

3). Evaluation indexes: We adopt three quantitative metrics, overall accuracy (OA), average accuracy (AA) and the kappa coefficient ($\kappa$), to evaluate the performance of different classification methods. Specifically, OA is the percentage of HSI pixels that are classified correctly, and AA is the mean of the per-class accuracies. However, both OA and AA only involve the errors of commission and do not cover the user accuracy. The kappa coefficient ($\kappa$), a more reasonable measurement, accounts for both the errors of commission and the errors of omission.
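The three indexes can be computed from a confusion matrix, as in the sketch below. It uses the standard definitions (Cohen's kappa for $\kappa$) and assumes that rows index the true class and columns the predicted class; the function name `accuracy_metrics` and the toy matrix are hypothetical.

```python
import numpy as np


def accuracy_metrics(conf):
    """Return (OA, AA, kappa) from a confusion matrix.

    conf[i, j] = number of samples of true class i predicted as class j.
    """
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    oa = np.trace(conf) / total                       # overall accuracy
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))    # mean per-class accuracy
    # Expected agreement by chance, from the row and column marginals.
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa


oa, aa, kappa = accuracy_metrics([[8, 2],
                                  [1, 9]])
```

For this toy matrix, 17 of 20 samples are correct, the chance agreement is 0.5, and $\kappa=(0.85-0.5)/(1-0.5)=0.7$.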

V-C Experimental Results and Analyses

Indian Pines: We randomly select 10% of the labeled samples in each class as the training set, and the rest as the testing set. Table I reports the final classification performance (i.e., the accuracy for each category, OA, AA and kappa coefficient $\kappa$) for the Indian Pines dataset. The corresponding classification maps of each algorithm are shown in Fig. 2. Among the compared algorithms, SVM and LRR are pixel-wise classification methods which only utilize the spectral feature. Other algorithms (SVMCK, JRSRC, LGIDL and LSLRR) combine both spectral and spatial information to classify HSI data. It can be seen from Table I that the classification accuracy of SVM and LRR is far lower (OA decreases by at least 10%) than that of the other methods. This indicates that the contextual feature is of great help for HSI classification. In addition, SVM clearly outperforms LRR, which confirms that the popular SVM is a strong classification algorithm. Regarding the per-class accuracy in Table I, LGIDL achieves the best result for the 2nd class, JRSRC for the 3rd class, and cdSRC for the 6th class. The proposed LSLRR obtains the highest accuracy in most classes. Furthermore, the OA of the proposed LSLRR improves by more than 20% compared with the classical LRR. This is because LCC helps LRR to capture the local feature and SPS makes the solution $\hat{Z}$ close to the ideal block-diagonal matrix. Moreover, Table I also shows that LSLRR achieves the best performance among all comparison methods. Fig. 5 (a) illustrates the classification accuracy of the various methods when different numbers of samples are used as the training set. It can be clearly observed that the classification performance of SVM and LRR is the worst, while the other classification methods all perform well. Among these, the proposed LSLRR yields the best classification results.

Pavia University: 5% of the labeled HSI pixels are chosen as the training set, and the remaining 95% are used for testing. To compare the experimental results quantitatively and visually, Table II and Fig. 3 show the classification performance on Pavia University and the corresponding visual maps of all methods, respectively. As shown in Table II and Fig. 3, only a small number of HSI pixels are classified wrongly by LSLRR, and its classification accuracy is the highest on all three evaluation indexes. Except for the 9th class, LSLRR achieves the best results for the other 8 classes. This indicates that LSLRR is an effective and superior approach to classify HSIs. After incorporating the spatial characteristics into the composite kernels, SVMCK yields better classification results than SVM in almost all classes. Similarly, the OA of the original LRR is the lowest, and most of the wrongly classified samples belong to classes 3, 4, 5, and 6. As seen from Fig. 3 (f), there are many red pixels (class 2) in the blue regions (class 6). By improving LRR with two powerful techniques, LSLRR achieves an accuracy of 99.9% for the 6th class. Compared with LRR, the OA of LSLRR improves by nearly 12%, and the kappa coefficient ($\kappa$) of LSLRR improves by more than 16%. Furthermore, we also investigate the influence of the number of training pixels on classification accuracy for the Pavia University set, and the corresponding curves are shown in Fig. 5 (b). The curve of LSLRR is the highest while that of LRR is the lowest, which shows that LSLRR successfully improves on LRR.

Salinas: Similar to Pavia University, 5% of the pixels are selected to train the classification model and the remaining 95% form the testing set. The classification accuracy of the comparison methods and LSLRR is displayed in Table III. For the purpose of visualization, the classification maps are illustrated in Fig. 4. From the visual maps, most of the wrongly classified pixels are in the dark-blue (class 8) and dark-green (class 15) regions. This is because the land surfaces of the 8th and the 15th classes have homologous properties, and the corresponding spectral reflectance curves are very similar. In addition, it is easy to observe that the proposed LSLRR yields the best accuracy compared with the other methods, which justifies the effectiveness of LSLRR. From Table III, we can see that the classification accuracy of most classes is more than 99% and that no OA is lower than 94%. Moreover, Fig. 5 (c) shows the overall accuracy of the different methods for the Salinas scene versus the percentage of training samples. This clearly shows that LSLRR still obtains the best performance even when only a small number of pixels are used for training.

Fig. 6 shows the OA on the three HSI datasets when the value of parameter $m$ changes. The other parameters follow subsection V-B. $m$ is an important parameter that controls the weight of spatial information in the LCC. From Fig. 6, the optimal value of $m$ is 25, 15, and 40 for Indian Pines, Pavia University, and Salinas, respectively. We measure the spatial similarity by Euclidean distance, which is better suited to classes whose pixels are distributed in a square or circular shape. As is seen from Fig. 6, the shapes of many classes in Pavia University are slender, while Salinas has many pixels whose distribution is more uniform. Therefore, the most appropriate $m$ for Salinas is the largest, and that for Pavia University is the smallest. In summary, a large value of $m$ is more reasonable for an HSI dataset in which each class has higher compactness.

Fig. 7 and Fig. 8 illustrate the overall accuracy on the three HSI datasets under different values of parameters $\alpha$ and $\beta$, respectively. When investigating the influence of $\alpha$ or $\beta$ on classification accuracy, the other parameters are set to their optimal values. The optimal values of $\alpha$ for the three datasets are 0.8, 0.3 and 1.0, respectively. When the locality constraint criterion (LCC) is not added, i.e., $\alpha=0$, the classification accuracy drops considerably compared with the highest OA for all three datasets. For Indian Pines in particular, OA decreases by more than 12%. This indicates that LCC is extremely important for the proposed LSLRR. Furthermore, the optimal values of $\beta$ for the three datasets are 0.6, 1.2 and 0.4, respectively. Similarly, when $\beta=0$, the classification accuracy is very low, and the OA improves quickly as $\beta$ starts to increase from 0. This demonstrates the importance of the structure preserving strategy (SPS). To sum up, both LCC and SPS contribute significantly to the classification accuracy.

V-D Comparison of Running Time

To assess the efficiency of the proposed LSLRR, we use running time to compare the computational cost of all algorithms. The Indian Pines dataset is taken as an example, and 10% of the labeled pixels of each class are used for training. The experiments are conducted in MATLAB R2015a on a PC with an Intel Core i7-3770 3.40GHz CPU and 32 GB RAM. Table IV shows the OA, AA, kappa coefficient and running time of every method. According to it, SVM and SVMCK take the least time, but their classification accuracy is lower than that of JRSRC, cdSRC, LGIDL and LSLRR. JRSRC and LGIDL obtain promising classification performance, but their running time is too long. The proposed LSLRR is computationally acceptable and achieves the highest classification accuracy.

VI Conclusion

In this paper, a novel locality and structure regularized low rank representation (LSLRR) is proposed to classify hyperspectral images. To overcome the drawbacks of traditional low rank representation (LRR), LSLRR introduces two key techniques, the locality constraint criterion (LCC) and the structure preserving strategy (SPS), to improve LRR and make it more suitable for HSI classification. In LSLRR, a new similarity metric combining both spatial and spectral characteristics is first presented. LCC then uses this metric to make HSI pixels with a large distance have a small similarity, which captures the local structure. Besides, SPS makes the solution of LSLRR close to a class-wise block-diagonal matrix. Finally, the classification results can be obtained without any complex classifier. Extensive experiments on three public HSI datasets are carried out to evaluate the performance of the proposed LSLRR, and the experimental results show that LSLRR outperforms other state-of-the-art comparison methods.

Bibliography


[1] Q. Wang, Z. Meng, and X. Li, “Locality adaptive discriminant analysis for spectral–spatial classification of hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 11, pp. 2077–2081, 2017.
[2] G. Hughes, “On the mean accuracy of statistical pattern recognizers,” IEEE Trans. Inf. Theory, vol. 14, no. 1, pp. 55–63, 1968.
[3] G. Camps-Valls, L. Gomez-Chova, J. Muñoz-Marí, J. Vila-Francés, and J. Calpe-Maravilla, “Composite kernels for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93–97, 2006.
[4] G. Camps-Valls, N. Shervashidze, and K. M. Borgwardt, “Spatio-spectral remote sensing image classification with graph kernels,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 741–745, 2010.
[5] G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by low-rank representation,” in Proceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 663–670.
[6] Y. Li, J. Liu, Z. Li, Y. Zhang, H. Lu, S. Ma et al., “Learning low-rank representations with classwise block-diagonal structure for robust face recognition,” in AAAI, 2014, pp. 2810–2816.
[7] L. Li, S. Li, and Y. Fu, “Learning low-rank and discriminative dictionary for image classification,” Image Vision Comput., vol. 32, no. 10, pp. 814–823, 2014.
[8] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 171–184, 2013.