Hyperspectral Image Classification with Deep Metric Learning and Conditional Random Field
Yi Liang, Xin Zhao, Alan J.X. Guo, and Fei Zhu

This work was supported in part by the National Natural Science Foundation of China under Grant 61701337 and the Natural Science Foundation of Tianjin under Grant 18JCQNJC01600. *(Corresponding author: Alan J.X. Guo.)* Y. Liang, X. Zhao, A. Guo, and F. Zhu are with the Center for Applied Mathematics, Tianjin University, China.
Abstract
To improve classification performance in hyperspectral image processing, many works have been developed based on two common strategies, namely spatial-spectral information integration and the use of neural networks. However, both strategies typically require more training data than classical algorithms, aggravating the shortage of labeled samples. In this letter, we propose a novel framework that combines a spectrum-based deep metric learning model with the conditional random field algorithm. The deep metric learning model is supervised by the center loss to produce spectrum-based features that gather more tightly in Euclidean space within classes. The conditional random field with Gaussian edge potentials, which was first proposed for image segmentation tasks, is introduced to give the pixel-wise classification over the hyperspectral image by utilizing both the geographical distances between pixels and the Euclidean distances between the features produced by the deep metric learning model.
The proposed framework is trained on spectral pixels at the deep metric learning stage and relies on partially handcrafted spatial features at the conditional random field stage. This arrangement alleviates the shortage of training data to some extent. Experiments on two real hyperspectral images demonstrate the advantages of the proposed method in terms of both classification accuracy and computational cost.
1 Introduction
Hyperspectral images (HSI) are usually acquired by spaceborne or airborne sensors, recording reflectance or radiance spectra over hundreds of channels. They are usually formatted as data cubes. The height and width of an HSI data cube correspond to the spatial extent of the imaged scene at a given spatial resolution, while the depth is determined by the number of sensor channels. As a crucial task, the classification of HSI pixels has long attracted great attention [1, 2, 3]. Many early methods are based on classical machine learning algorithms and their variants, for instance, principal component analysis (PCA) [4, 5], independent component analysis (ICA) [6], linear discriminant analysis (LDA) [7, 8], support vector machine (SVM) [9], and sparse representation [10, 11].
In recent years, neural networks (NN) have gained popularity in many machine learning applications, owing to their ability to generate abstract representations from the original data. An increasing number of NN-based algorithms have been adapted to HSI classification tasks and have achieved impressive results. Representative earlier models include the stacked autoencoder (SAE) [12] and the deep belief network (DBN) [13]. With the advances in deep learning, various deep models have been applied to HSI classification, demonstrating their power both in processing spatial data and in producing self-learned features. This category of algorithms mainly includes the convolutional neural network (CNN) [14, 15, 16], the recurrent neural network (RNN) [17, 18, 19], and deep metric learning (DML) [20, 21, 22], to name a few.
The conditional random field (CRF) is a probabilistic graphical model that characterizes the contextual information among labels and data [23]. CRFs, which are widely used for image segmentation, have also attracted attention for classifying HSI pixels [24, 25, 26, 27]. In most of these works, the CRFs were appended after the CNNs as a post-processing step that processes the output features extracted by the CNN encoders. For example, in [25], a restricted CRF algorithm is applied to refine the superpixel classification produced by a CNN into the final pixel-wise classification results. In [27], the authors used a CRF to improve the predictions from the CNN outputs and designed a specific deconvolutional network to produce the final classifications.
In this letter, a framework that combines the DML and CRF algorithms is proposed. A DML model supervised with the center loss is employed to extract spectrum-based features from individual pixels. The CRF algorithm then gives the final predictions by modeling both the spatial information and the spectral information carried by the features extracted by the DML model. More precisely, our work has the following advantages:
- The intrinsic relation between DML and CRF helps to improve classification accuracy. To the best of our knowledge, we are the first to introduce a framework that benefits from the underlying connection between DML and CRF.
- Employing a spectrum-based DML model together with a CRF built on handcrafted spatial kernels keeps the framework simple. Compared to CNN models, our framework is trained on spectra only and uses a simpler model structure, thus alleviating the shortage of labeled HSI data that affects CNN models [15, 20].
- In practice, the proposed framework is computationally efficient because it adopts the convolutional CRF (ConvCRF) [28], in which CRF inference is implemented on the GPU by convolution operations.
2 Proposed Framework
As the two main parts of the proposed framework, the DML and CRF algorithms are first introduced separately. Subsequently, an overview of the whole DML-CRF framework is presented.
2.1 Deep Metric Learning
In [20], the center loss proposed for deep metric learning in [29] was first introduced to HSI classification tasks. A three-layer fully connected network was built to extract spectral features from the input data. As illustrated in Fig. 1, the model is jointly supervised by the cross-entropy loss (also called softmax loss) and the center loss. Under this arrangement, the extracted features from the same class gather more tightly in Euclidean space. This model is adopted to encode the spectra in our work.
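Throughout this letter, we use $\mathbf{x}_i$ to denote the $i$-th pixel of the HSI $X$, with label $y_i \in \{1, \dots, K\}$. Let $f$ be the function defined by the neural network, whose values $\mathbf{f}_i = f(\mathbf{x}_i)$ are the extracted features. Use $\mathbf{p}_i = (p_{i,1}, \dots, p_{i,K})$ to denote the predicted probability distribution, which is calculated by applying the softmax function to the extracted features $\mathbf{f}_i$. During the training stage, a joint loss $\mathcal{L} = \mathcal{L}_S + \lambda \mathcal{L}_C$ is engaged, which sums the cross-entropy loss $\mathcal{L}_S$ and the center loss $\mathcal{L}_C$ weighted by the balance weight $\lambda$. As the key part of DML, the center loss measures the Euclidean distances between the produced features and their class centers $\mathbf{c}_{y_i}$ over a mini-batch of $m$ samples, as
$$\mathcal{L}_C = \frac{1}{2} \sum_{i=1}^{m} \left\| \mathbf{f}_i - \mathbf{c}_{y_i} \right\|_2^2,$$
where the class centers are formulated as
$$\mathbf{c}_k = \frac{1}{|\{i : y_i = k\}|} \sum_{i : y_i = k} \mathbf{f}_i, \qquad k = 1, \dots, K.$$
At the testing stage, the pixels $\mathbf{x}_i$ are fed to the neural network $f$. The outputs, which include both the extracted features $\mathbf{f}_i$ and the predicted probability distributions $\mathbf{p}_i$, are collected for the subsequent CRF step.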
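As a rough illustration of the training objective, the following pure-Python sketch computes the class centers, the center loss, and a joint loss that adds the cross-entropy loss to the center loss scaled by a balance weight `lam`, for a toy mini-batch. It is a minimal sketch under stated assumptions, not the implementation of [20] or [29]: the function names and toy values are hypothetical, class labels are 0-indexed, the `logits` stand in for whatever the network feeds into the softmax, and the centers are simply recomputed as per-batch class means, whereas [29] updates them incrementally during training.

```python
import math


def class_centers(features, labels):
    """Return {class: mean feature vector}, i.e. c_k as the mean of f_i with y_i = k."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for d, v in enumerate(f):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def center_loss(features, labels, centers):
    """L_C = 1/2 * sum_i ||f_i - c_{y_i}||_2^2 over the mini-batch."""
    return 0.5 * sum(
        sum((v - c) ** 2 for v, c in zip(f, centers[y]))
        for f, y in zip(features, labels)
    )


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, labels):
    """L_S = -sum_i log p_{i, y_i}, where p_i = softmax(logits_i)."""
    return -sum(math.log(softmax(z)[y]) for z, y in zip(logits, labels))


def joint_loss(features, logits, labels, lam):
    """L = L_S + lam * L_C, with lam playing the role of the balance weight."""
    centers = class_centers(features, labels)
    return cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)


# Toy mini-batch (hypothetical values): two pixels of class 0, one of class 1.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
logits = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
labels = [0, 0, 1]
centers = class_centers(features, labels)  # {0: [2.0, 0.0], 1: [0.0, 2.0]}
print(center_loss(features, labels, centers))  # 1.0
```

Shrinking the spread of same-class features around their center lowers `center_loss`, which is the property the CRF stage later exploits.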
2.2 Conditional Random Field
The CRF algorithm plays an important role in image segmentation, owing to its ability to exploit global context information [23, 30, 28]. In this letter, we use the CRF with Gaussian edge potentials to fuse the spatial-spectral information and to give reasonable pixel-wise predictions over the HSI. The notation in this letter mainly follows the fully connected CRF in [30] and the ConvCRF in [28].
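Let $\mathbf{z} = (z_1, \dots, z_N)$ be a choice of predictions over all $N$ pixels of an HSI, where $z_i \in \{1, \dots, K\}$. The probability of $\mathbf{z}$ is calculated from an energy function $E(\mathbf{z})$ by a Gibbs distribution:
$$P(\mathbf{z}) = \frac{1}{Z} \exp\big(-E(\mathbf{z})\big),$$
where $Z = \sum_{\mathbf{z}'} \exp\big(-E(\mathbf{z}')\big)$ is the partition function [23]. In this algorithm, the energy function has two parts,
$$E(\mathbf{z}) = \sum_{i} \psi_u(z_i) + \sum_{i < j} \psi_p(z_i, z_j),$$
where $\psi_u$ is the unary potential and $\psi_p$ is the pairwise potential. As in most applications of CRF, the unary potential is set to be the cost of pixel $\mathbf{x}_i$ taking label $z_i$, which is
$$\psi_u(z_i) = -\log p_{i, z_i}. \tag{2}$$
The pairwise potential is set to be
$$\psi_p(z_i, z_j) = \mu(z_i, z_j) \left[ w^{(1)} k^{(1)}(i, j) + w^{(2)} k^{(2)}(i, j) \right], \tag{3}$$
where $\mu$ is called the compatibility function and is given by the Potts model $\mu(z_i, z_j) = [z_i \neq z_j]$, $k^{(1)}$ and $k^{(2)}$ are respectively termed the appearance and smoothness kernels, and $w^{(1)}$ and $w^{(2)}$ are linear combination weights. If we denote the position of $\mathbf{x}_i$ as $\mathbf{r}_i$, the appearance kernel in (3) is defined as
$$k^{(1)}(i, j) = \exp\left( -\frac{\|\mathbf{r}_i - \mathbf{r}_j\|^2}{2\theta_\alpha^2} - \frac{\|\mathbf{f}_i - \mathbf{f}_j\|^2}{2\theta_\beta^2} \right), \tag{4}$$
and the smoothness kernel is
$$k^{(2)}(i, j) = \exp\left( -\frac{\|\mathbf{r}_i - \mathbf{r}_j\|^2}{2\theta_\gamma^2} \right).$$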
As stated in [30], the appearance kernel is based on the observation that neighboring pixels with similar features tend to be from the same class, while the smoothness kernel helps to eliminate small isolated regions. It is noteworthy that the pairwise potential integrates both the spectral information and geographical information.
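To make the energy concrete, the sketch below evaluates the unary potential (2), the Potts-weighted pairwise potential (3) with its appearance and smoothness kernels, and the Gibbs distribution by brute-force enumeration on a toy row of three pixels. It is only an illustration under assumptions: the function names and all numbers (features, probabilities, and the parameter tuple (w1, w2, θα, θβ, θγ) = (1, 1, 1, 1, 1)) are hypothetical rather than the defaults used in the experiments, and exhaustive enumeration is feasible here only because there are 2³ = 8 labelings.

```python
import itertools
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def unary(p_i, label):
    """psi_u(z_i) = -log p_{i, z_i}."""
    return -math.log(p_i[label])


def pairwise(i, j, zi, zj, pos, feat, params):
    """Potts compatibility times (w1 * appearance kernel + w2 * smoothness kernel)."""
    if zi == zj:  # Potts model: equal labels cost nothing
        return 0.0
    w1, w2, theta_a, theta_b, theta_g = params
    d_pos = sq_dist(pos[i], pos[j])
    k_app = math.exp(-d_pos / (2 * theta_a**2) - sq_dist(feat[i], feat[j]) / (2 * theta_b**2))
    k_smooth = math.exp(-d_pos / (2 * theta_g**2))
    return w1 * k_app + w2 * k_smooth


def energy(z, probs, pos, feat, params):
    """E(z) = sum_i psi_u(z_i) + sum_{i<j} psi_p(z_i, z_j)."""
    n = len(z)
    e = sum(unary(probs[i], z[i]) for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pairwise(i, j, z[i], z[j], pos, feat, params)
    return e


def gibbs(probs, pos, feat, params, num_classes):
    """P(z) = exp(-E(z)) / Z, computed by enumerating all K**N labelings."""
    labelings = list(itertools.product(range(num_classes), repeat=len(probs)))
    weights = [math.exp(-energy(z, probs, pos, feat, params)) for z in labelings]
    partition = sum(weights)
    return {z: w / partition for z, w in zip(labelings, weights)}


# Toy row of three pixels (hypothetical values); the middle pixel's
# preliminary prediction disagrees with its similar-looking neighbors.
pos = [(0, 0), (0, 1), (0, 2)]
feat = [[0.0], [0.1], [0.0]]
probs = [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]
params = (1.0, 1.0, 1.0, 1.0, 1.0)  # (w1, w2, theta_alpha, theta_beta, theta_gamma)

P = gibbs(probs, pos, feat, params, num_classes=2)
z_star = max(P, key=P.get)  # (0, 0, 0): the CRF relabels the middle pixel
```

Taken pixel by pixel, the preliminary predictions would give (0, 1, 0); the pairwise term makes the smoother labeling (0, 0, 0) the most probable one.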
Mathematically, the final prediction is obtained by
$$\mathbf{z}^{*} = \arg\max_{\mathbf{z}} P(\mathbf{z}),$$
which is, however, hard to compute exactly, since the number of possible labelings grows as $K^N$. Usually, a mean field approximation [30] is used to approximate the result. In [28], the authors assumed that the pairwise potentials take effect only when the Manhattan distance between pixels $\mathbf{x}_i$ and $\mathbf{x}_j$ does not exceed the so-called filter-size $k$. Under this assumption, mean field inference can be implemented on the GPU and computed more efficiently. This inference algorithm is called ConvCRF. Readers may refer to [30, 28] for more detailed definitions and calculations of CRF.
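The mean field iteration with a truncated neighborhood can be sketched in pure Python as follows. This is a CPU-only illustration under assumptions, not the ConvCRF GPU implementation of [28]: the function name, the grid-based data layout, the initialization Q_i = p_i, the fixed iteration count, and the toy parameter values are all hypothetical. Each update uses the usual mean field form for a Potts model, in which pixel i's score for label l is lowered by the sum over neighbors j of K_ij (1 − Q_j(l)), where K_ij = w1 k1(i, j) + w2 k2(i, j) and only neighbors within Manhattan distance k are counted.

```python
import math


def mean_field_window(probs, feats, k, params, iters=5):
    """Approximate CRF inference on an H x W grid with a Manhattan-distance window.

    probs[r][c]: preliminary class probabilities p_i (unary psi_u = -log p_i).
    feats[r][c]: feature vector f_i; the position r_i is the grid coordinate (r, c).
    k: filter-size; pixels farther apart than k (Manhattan) do not interact.
    params: (w1, w2, theta_alpha, theta_beta, theta_gamma).
    Returns the label with the highest approximate marginal at each pixel.
    """
    w1, w2, theta_a, theta_b, theta_g = params
    H, W, K = len(probs), len(probs[0]), len(probs[0][0])
    offsets = [(dr, dc) for dr in range(-k, k + 1) for dc in range(-k, k + 1)
               if 0 < abs(dr) + abs(dc) <= k]

    # Precompute K_ij = w1 * appearance + w2 * smoothness for every in-window neighbor.
    neighbors = [[[] for _ in range(W)] for _ in range(H)]
    for r in range(H):
        for c in range(W):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < H and 0 <= cc < W:
                    d_pos = dr * dr + dc * dc
                    d_feat = sum((a - b) ** 2 for a, b in zip(feats[r][c], feats[rr][cc]))
                    weight = (w1 * math.exp(-d_pos / (2 * theta_a**2) - d_feat / (2 * theta_b**2))
                              + w2 * math.exp(-d_pos / (2 * theta_g**2)))
                    neighbors[r][c].append((rr, cc, weight))

    unary = [[[-math.log(max(p, 1e-12)) for p in probs[r][c]] for c in range(W)] for r in range(H)]
    Q = [[list(probs[r][c]) for c in range(W)] for r in range(H)]  # initialize with p_i
    for _ in range(iters):
        new_Q = [[None] * W for _ in range(H)]
        for r in range(H):
            for c in range(W):
                # Potts model: the sum over l' != lab of Q_j(l') equals 1 - Q_j(lab).
                scores = [-unary[r][c][lab]
                          - sum(wt * (1.0 - Q[rr][cc][lab]) for rr, cc, wt in neighbors[r][c])
                          for lab in range(K)]
                m = max(scores)
                exps = [math.exp(s - m) for s in scores]
                total = sum(exps)
                new_Q[r][c] = [e / total for e in exps]
        Q = new_Q  # synchronous update of all pixels
    return [[max(range(K), key=lambda lab: Q[r][c][lab]) for c in range(W)] for r in range(H)]


# Toy 3 x 3 scene (hypothetical values): the center pixel's preliminary
# prediction favors class 1, while its neighbors confidently favor class 0.
probs = [[[0.8, 0.2]] * 3, [[0.8, 0.2], [0.4, 0.6], [0.8, 0.2]], [[0.8, 0.2]] * 3]
feats = [[[0.0]] * 3 for _ in range(3)]
params = (1.0, 1.0, 1.0, 1.0, 1.0)

print(mean_field_window(probs, feats, k=0, params=params))  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(mean_field_window(probs, feats, k=1, params=params))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

With k = 0 no pairs interact and the output equals the preliminary predictions; enlarging k lets spatial context override the isolated center label, at the price of more kernel evaluations per pixel.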
2.3 A Summary
In general, we first use a DML model to generate the spectrum-based features $\mathbf{f}_i$, as well as the preliminary predictions $\mathbf{p}_i$. Then, the preliminary predictions are reformulated as the unary potentials of CRF by (2). The pairwise potentials, which include the appearance and smoothness kernels, are expressed by (3) using the features $\mathbf{f}_i$ and the corresponding pixel positions $\mathbf{r}_i$. Finally, the ConvCRF algorithm is adopted for CRF inference, producing the final predictions over all the pixels of the HSI.
In essence, it is the intrinsic connection between the center loss of DML and the appearance kernel of CRF that contributes to the performance of the proposed framework. Compared to the features extracted by conventional NN models, the features extracted by DML with center loss gather more tightly in Euclidean space within the same class, i.e., pixels from the same class tend to be encoded as more similar features. Meanwhile, the appearance kernel (4) relies on the Euclidean distances $\|\mathbf{f}_i - \mathbf{f}_j\|$ between features. Consequently, nearby pixels of the same class obtain larger appearance kernel values, so assigning them different labels is penalized more strongly. Compared to CRFs that rely on raw pixel spectra or on features from plain NN models, the center loss in DML therefore justifies the use of the appearance kernel in our framework and enhances the final classification results.
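A small worked example may help illustrate this argument. The sketch below evaluates the appearance kernel for two adjacent pixels of the same class under two hypothetical feature encodings, one loosely and one tightly clustered; the feature values, the function name, and θα = θβ = 1 are illustrative assumptions, not values from the experiments. Because the pairwise potential is paid only when the two labels differ, the larger kernel value under the tighter encoding means a larger penalty for splitting the pair across classes.

```python
import math


def appearance_kernel(r_i, r_j, f_i, f_j, theta_alpha, theta_beta):
    """k1(i, j) = exp(-||r_i - r_j||^2 / (2 theta_alpha^2) - ||f_i - f_j||^2 / (2 theta_beta^2))."""
    d_pos = sum((a - b) ** 2 for a, b in zip(r_i, r_j))
    d_feat = sum((a - b) ** 2 for a, b in zip(f_i, f_j))
    return math.exp(-d_pos / (2 * theta_alpha**2) - d_feat / (2 * theta_beta**2))


# Two horizontally adjacent pixels from the same class (hypothetical features).
r_i, r_j = (0, 0), (0, 1)
loose = appearance_kernel(r_i, r_j, [0.0, 0.0], [2.0, 0.0], 1.0, 1.0)  # features far apart
tight = appearance_kernel(r_i, r_j, [0.0, 0.0], [0.5, 0.0], 1.0, 1.0)  # features close together

print(round(loose, 4), round(tight, 4))  # 0.0821 0.5353
```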
3 Experiments
3.1 Datasets Description
The experiments are carried out on two well-known HSI datasets, namely the Pavia University scene and the Salinas scene, collected by the ROSIS sensor and the AVIRIS sensor, respectively (the datasets are available online: http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes). The Pavia University scene used in the experiments has a size of 610 × 340 × 103. The spatial resolution is 1.3 m, and the spectral bands cover wavelengths from 0.43 to 0.86 μm, with noisy and water absorption bands removed. Regarding the Salinas scene, the size of the image is 512 × 217 × 204 after removing water absorption bands. The spatial resolution is 3.7 m, and the spectra cover a range from 0.4 to 2.5 μm.
3.2 Experimental Settings and Results
To show the advantage of combining DML and CRF, comparative experiments are carried out with DML, NN-CRF, and the proposed DML-CRF framework. Here, the only difference between the NN model and the DML model is the absence of center loss in the former. Moreover, two state-of-the-art methods, namely D-CNN [15] and CSFF (DML-CSFF) [21], are also compared as baselines for HSI classification. Experiments are performed with the deep learning platforms Caffe [31] and PyTorch [32] on a machine equipped with an Intel Xeon CPU and an NVIDIA TitanX GPU.
As for the DML model in DML and DML-CRF, the length of the extracted feature is fixed, and the hyperparameters, such as the learning rate and the balance weight λ, are all set to their default values in the original paper [20]. Regarding the CRF algorithm in NN-CRF and DML-CRF, there are five parameters: $w^{(1)}$, $w^{(2)}$, $\theta_\alpha$, $\theta_\beta$, and $\theta_\gamma$. According to [30], the classification performance of CRF is relatively robust to these five parameters. Therefore, the default settings of these parameters in [30, 28] are used directly. The only hyperparameter that needs to be set is the filter-size $k$ in ConvCRF, which is chosen separately for the Pavia University scene and the Salinas scene. An analysis of these variables is given in Section 3.3. The compared methods D-CNN and CSFF (DML-CSFF) are implemented following their original papers [15, 21].
If not otherwise specified, the training samples used in all the experiments follow the same preprocessing procedure. Each dataset is first normalized to have zero mean and unit variance. The training set is formed by a fixed number of randomly chosen pixels per class. For D-CNN, HSI patches are randomly chosen from each class instead. To mitigate overfitting, virtual samples are generated as linear combinations of pixels from the same class; they are used in the training stages of all the aforementioned neural networks. The classification performance is evaluated by three metrics, namely overall accuracy (OA), average accuracy (AA), and the kappa coefficient (κ). Briefly, OA is the percentage of correctly classified samples over all the testing samples, AA is the average of the per-class classification accuracies, and κ measures the agreement between the predicted labels and the groundtruth labels by the formula
$$\kappa = \frac{p_o - p_e}{1 - p_e}.$$
In this formula, $p_o$ represents the probability that the predicted label agrees with the groundtruth label, which is the overall accuracy (OA), while $p_e$ is the hypothetical probability of chance agreement. Assume that the predicted label distribution outputs label $k$ with probability $\hat{q}_k$ and that the groundtruth label distribution outputs label $k$ with probability $q_k$; $p_e$ is then calculated by
$$p_e = \sum_{k=1}^{K} \hat{q}_k \, q_k.$$
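For reference, the three metrics can be computed from predicted and groundtruth labels as in the following sketch, which builds a confusion matrix and then evaluates OA, AA, and κ = (p_o − p_e)/(1 − p_e). It is a minimal illustration with a hypothetical function name and toy labels; it assumes 0-indexed class labels, averages AA only over classes present in the groundtruth, and does not guard against the degenerate case p_e = 1.

```python
def classification_metrics(y_true, y_pred, num_classes):
    """Return (OA, AA, kappa) for integer class labels in range(num_classes)."""
    n = len(y_true)
    # confusion[t][p] counts pixels with groundtruth t that were predicted as p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    # OA (= p_o): fraction of correctly classified pixels.
    oa = sum(confusion[k][k] for k in range(num_classes)) / n

    # AA: mean of the per-class accuracies over classes present in the groundtruth.
    per_class = [confusion[k][k] / sum(confusion[k])
                 for k in range(num_classes) if sum(confusion[k]) > 0]
    aa = sum(per_class) / len(per_class)

    # p_e = sum_k q_hat_k * q_k, where q_hat_k and q_k are the predicted and
    # groundtruth label frequencies.
    q_true = [sum(confusion[k]) / n for k in range(num_classes)]
    q_pred = [sum(confusion[t][k] for t in range(num_classes)) / n for k in range(num_classes)]
    p_e = sum(qh * q for qh, q in zip(q_pred, q_true))

    kappa = (oa - p_e) / (1 - p_e)
    return oa, aa, kappa


# Toy example (hypothetical labels for eight test pixels, three classes).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
oa, aa, kappa = classification_metrics(y_true, y_pred, num_classes=3)  # 0.75, 0.75, ~0.6
```

Here p_e = 0.5 · 0.5 + 0.375 · 0.25 + 0.125 · 0.25 = 0.375, so κ = (0.75 − 0.375)/(1 − 0.375) = 0.6.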
The classification results, given as the mean and standard deviation over five runs, are reported in TABLE 1. As shown in the first three columns, the absence of either DML or CRF degrades the classification accuracy. Compared to the state-of-the-art methods, the proposed DML-CRF still yields comparable results. DML-CRF outperforms D-CNN by a large margin on both datasets. When compared to CSFF, DML-CRF performs better in all the metrics on the Pavia University scene. On the Salinas scene, DML-CRF surpasses CSFF in terms of AA but is slightly inferior to CSFF in terms of OA and κ. The testing times of the compared methods are given in TABLE 2. We observe that DML-CRF is much faster than CSFF and several times faster than D-CNN, thanks to the GPU implementation of ConvCRF.
In the proposed DML-CRF framework, the parameters of the DML model are trained on spectral data, while the parameters of the CRF algorithm are set directly. Compared to most spatial-spectral algorithms, which use HSI patches as training data, only spectral data are used in the training of DML-CRF. In this sense, the framework alleviates the shortage of HSI training data. Also, DML-CRF employs a simple, spectrum-based DML model, hence it has fewer parameters than spatial-spectral algorithms, which usually stack multiple CNN layers. Typically, a model with fewer trainable parameters is less prone to overfitting and therefore performs better with insufficient training data. In this letter, the training sets of DML-CRF and D-CNN have the same size. The comparison between the classification accuracies of DML-CRF and D-CNN in TABLE 1 partially confirms the hypothesis above.
3.3 Parameter Optimization
This subsection mainly discusses the effects of different choices of the CRF parameters and hyperparameters. For the hyperparameters of the DML model, details on their behavior can be found in [20].
To verify the robustness of CRF to the parameters $w^{(1)}$, $w^{(2)}$, $\theta_\alpha$, $\theta_\beta$, and $\theta_\gamma$, we anchor them at their default values. Under this setting, we perform several experiments, varying a single parameter at a time. The relationships between the parameters and the classification performance are presented in Fig. 2. The results show that the classification performance is relatively robust to these parameters.
The only hyperparameter, the filter-size $k$ in ConvCRF, controls the size of the spatial neighborhood that CRF takes into account. The effect of $k$ on the classification accuracy is shown in Fig. 3. As expected, larger filter-sizes lead to higher accuracies but also require more computation.
4 Conclusion
In this letter, we proposed a framework that combines DML and CRF. The DML model is used to extract features from the pixels of HSIs. The center loss reduces the Euclidean distances between extracted features that share the same class label. The CRF algorithm is then applied to give predictions over the whole HSI by using both the extracted features and their position information. Comparative experiments demonstrated that the absence of either DML or CRF degrades the classification performance. Moreover, the proposed framework provides results comparable to the state-of-the-art methods in both classification accuracy and computational cost. Additional experiments were performed to show the effects of varying the parameters and hyperparameters on the classification accuracy.
References
- [1] D. Lu and Q. Weng, “A survey of image classification methods and techniques for improving classification performance,” International Journal of Remote Sensing, vol. 28, no. 5, pp. 823–870, 2007.
- [2] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton, “Advances in spectral-spatial classification of hyperspectral images,” Proceedings of the IEEE, vol. 101, no. 3, pp. 652–675, 2013.
- [3] W. Li, F. Feng, H. Li, and Q. Du, “Discriminant analysis-based dimension reduction for hyperspectral image classification: A survey of the most recent advances and an experimental comparison of different techniques,” IEEE Geoscience and Remote Sensing Magazine, vol. 6, no. 1, pp. 15–34, Mar. 2018.
- [4] S. Prasad and L. M. Bruce, “Limitations of principal components analysis for hyperspectral target recognition,” IEEE Geoscience and Remote Sensing Letters, vol. 5, no. 4, pp. 625–629, Oct. 2008.
- [5] J. Jiang, J. Ma, C. Chen, Z. Wang, Z. Cai, and L. Wang, “SuperPCA: A superpixelwise PCA approach for unsupervised feature extraction of hyperspectral imagery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 8, pp. 4581–4593, Aug. 2018.
- [6] A. Villa, J. A. Benediktsson, J. Chanussot, and C. Jutten, “Hyperspectral image classification with independent component discriminant analysis,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 12, pp. 4865–4876, Dec. 2011.
- [7] T. V. Bandos, L. Bruzzone, and G. Camps-Valls, “Classification of hyperspectral images with regularized linear discriminant analysis,” IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 3, pp. 862–873, Mar. 2009.
- [8] W. Li, S. Prasad, J. E. Fowler, and L. M. Bruce, “Locality-preserving dimensionality reduction and classification for hyperspectral image analysis,” IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 4, pp. 1185–1198, Apr. 2012.
