TL;DR
This paper investigates effective transfer learning strategies for SAR target recognition using deep CNNs, focusing on what, where, and how to transfer knowledge from source tasks and data to improve performance.
Contribution
It analyzes transfer learning issues specific to SAR images and proposes a transitive transfer method with domain adaptation to enhance recognition accuracy.
Findings
Transfer learning from natural images is less effective for SAR.
Identifies optimal network layers and source tasks for SAR transfer.
Proposes a multi-source domain adaptation approach for SAR recognition.
Table I: MSTAR dataset for 10-category SAR target recognition (number of target chips per class).

| Depression angle | 2S1 | BMP2 | BRDM2 | BTR60 | BTR70 | D7 | T62 | T72 | ZIL131 | ZSU23 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 17° (train) | 299 | 233 | 298 | 256 | 233 | 299 | 299 | 232 | 299 | 299 | 2747 |
| 15° (test) | 274 | 195 | 274 | 195 | 196 | 274 | 273 | 196 | 274 | 274 | 2425 |

Table II: Selected OpenSARShip Cargo types (GRD product, VV polarization).

| Elaborated Type | Cargo | Bulk Carrier | Container Ship | Total |
|---|---|---|---|---|
| train | 100 | 100 | 100 | 300 |
| test | 79 | 132 | 135 | 346 |
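
Table III: Convolution layers of the three networks, given as channels(kernel size); size is the number of parameters (M = million).

| Network | A_ConvNet | H_Net | AlexNet_Conv |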
|---|---|---|---|
| conv1 | 16(5) | 48(5) | 96(11) |
| conv2 | 32(5) | 96(5) | 256(5) |
| conv3 | 64(5) | 128(3) | 384(3) |
| conv4 | 128(6) | 128(3) | 384(3) |
| conv5 | None | 256(3) | 256(3) |
| size | 0.4 M | 0.7 M | 4 M |

Recognition accuracy on OpenSARShip. Net(OpenSAR): network retrained on OpenSARShip; columns 1 to 5: number of layers transferred from the source network and frozen.

| Network | Source task | Net(OpenSAR) | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|---|
| A_ConvNet | MSTAR | 0.8757 | 0.8612 | 0.8612 | 0.8670 | 0.8208 | none |
| H_Net | SAR(recon) | 0.8555 | 0.8641 | 0.8483 | 0.8223 | 0.7818 | 0.68 |
| | MSTAR | | 0.8805 | 0.8728 | 0.8526 | 0.8410 | 0.8324 |
| | SAR(recon)*MSTAR | | 0.8844 | 0.88 | 0.8858 | 0.8902 | 0.8584 |
| AlexNet_Conv | ImageNet | 0.8439 | 0.8901 | 0.8584 | 0.8468 | 0.8584 | 0.7774 |
| | SAR | | 0.8974 | 0.8988 | 0.8883 | 0.8921 | 0.8526 |
| | MSTAR | | 0.9017 | 0.9075 | 0.8859 | 0.8757 | 0.8511 |
| AlexNet_Conv (transitive transfer) | ImageNet*SAR | 0.8439 | 0.8930 | 0.8901 | 0.8718 | 0.8671 | 0.7109 |
| | ImageNet*MSTAR | | 0.8931 | 0.8902 | 0.8815 | 0.8902 | 0.7283 |
| | ImageNet*SAR*MSTAR | | 0.8988 | 0.8931 | 0.8960 | 0.9017 | 0.7486 |
| | SAR*MSTAR | | 0.8988 | 0.9032 | 0.8959 | 0.8872 | 0.8612 |
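
Recognition accuracy on OpenSARShip with fine-tuning and the proposed ITL and STL algorithms.

| Network | AlexNet_Conv | AlexNet_Conv | H_Net | H_Net |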
|---|---|---|---|---|
| Source Tasks | (recon) | |||
| Fine-tune | 89.31% | 89.94% | 86.41% | 88.44% |
| ITL | 89.88% | 90.46% | 87.28% | 89.88% |
| STL | 90.75% | 91.9% | 88.43% | 89.31% |
What, Where and How to Transfer in SAR Target Recognition Based on Deep CNNs
Zhongling Huang, Zongxu Pan, and Bin Lei

This work was supported by the National Natural Science Foundation of China under Grant 61701478 and the Joint Training Program of the University of Chinese Academy of Sciences. Zhongling Huang is with the School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Huairou District, Beijing 101408, China. The authors are with the Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China, and also with the Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China.
Abstract
Deep convolutional neural networks (DCNNs) have recently attracted much attention in remote sensing. Compared with the large-scale annotated datasets available for natural images, the lack of labeled data in remote sensing is an obstacle to training deep networks well, especially in SAR image interpretation. Transfer learning provides an effective way to solve this problem by borrowing knowledge from a source task for the target task. In optical remote sensing applications, a prevalent approach is to fine-tune an existing model pre-trained on a large-scale natural image dataset such as ImageNet. However, this scheme does not achieve satisfactory performance for SAR applications because of the prominent discrepancy between SAR and optical images. In this paper, we discuss in detail three issues that have seldom been studied before: (1) which networks and source tasks are better to transfer to SAR targets, (2) in which layers the transferred features are more generic to SAR targets, and (3) how to transfer effectively to SAR target recognition. Based on this analysis, we propose a transitive transfer method via multi-source data with domain adaptation to decrease the discrepancy between the source data and SAR targets. Several experiments are conducted on OpenSARShip. The results indicate that the universal conclusions about transfer learning in natural images cannot be completely applied to SAR targets, and that the analysis of what and where to transfer in SAR target recognition helps decide how to transfer more effectively.
Index Terms:
SAR target recognition, transfer learning, deep convolutional neural networks, domain adaptation.
I Introduction
Deep learning techniques, which automatically learn effective hierarchical features from large-scale datasets, have been widely used in remote sensing data analysis in recent years. However, the scarcity of labeled data, the biggest obstacle to applying deep learning in remote sensing, persists and significantly restricts further progress. Unlike natural image tasks with millions of labeled samples, remote sensing tasks usually have too little training data to train a deep network well. Instead of training a deep network from scratch on little data, transfer learning, which aims to transfer knowledge from a source domain with a large-scale dataset to the target domain, provides an effective way to train deep networks with limited data. The most straightforward and commonly used technique is to fine-tune a pre-trained network.
Remote sensing data, mainly from optical (multi- and hyper-spectral) and synthetic aperture radar (SAR) sensors, are multi-modal, with different imaging geometries and content. Penatti et al. [1] first showed that deep features learned from everyday objects in natural images generalize to objects in optical remote sensing images. Different convolutional neural networks (CNNs) trained on the natural image dataset ImageNet, such as CaffeNet, AlexNet and VGG, have been transferred to the classification of three-band optical remote sensing images with remarkable performance [2]. Many subsequent studies transfer a variety of successful CNN models pre-trained on ImageNet to various tasks, such as image registration [3], airplane detection [4], scene classification [5, 6], image segmentation [7] and image super-resolution [8], for both hyper-spectral and multi-channel remote sensing images. For optical remote sensing applications, transferring knowledge from natural images is prevalent because natural and optical remote sensing images share the same imaging mechanism and can therefore share some low- and mid-level features, such as those resembling Gabor filters or color blobs. Apart from natural images, remote sensing data obtained from other platforms can also serve as source data. Windrim et al. [9] transferred CNNs trained on certain hyper-spectral images (HSI) to classify other HSI from a different satellite or aerial platform. Similarly, Samat et al. [10] transferred between the training and validation data of hyper-spectral images, using domain adaptation to weaken the difference in their statistical distributions.
Because of their different imaging mechanisms, approaches for interpreting optical remote sensing images generally cannot be used directly to interpret SAR images. While transfer learning has recently begun to attract attention in optical remote sensing, related research on SAR images has not yet caught up. We found only a few studies in which transfer learning is applied to overcome the lack of labeled SAR data for training deep networks. Yang et al. [11] used transfer learning to make the classifier learn the knowledge shared among different target-aspect angles of SAR targets. Malmgren-Hansen et al. [12] proposed a generation approach for SAR data and addressed how to transfer knowledge from simulated SAR data to real data. Huang et al. [13] showed that features learned from a large amount of unlabeled SAR scene images via stacked convolutional auto-encoders are transferable to the SAR target recognition task. To the best of our knowledge, there is not yet adequate evidence indicating whether knowledge from optical images can or cannot be transferred effectively to SAR images. Although several studies have attempted to transfer knowledge from optical to SAR images, even the rationality of this transfer is still under debate. On the supporting side, Kang et al. [14] used the intermediate layers of a network pre-trained on the CIFAR-10 dataset as the feature extractor for classifying TerraSAR-X images, and Wang et al. [15] fine-tuned the VGG-16 model trained on natural images to detect ships in SAR images. On the opposing side, Marmanis et al. [16] argued that initialization with weights learned from optical images has little effect on the classification of SAR data, because the distributions of optical images and SAR data are probably too different for transfer to work, even in low layers.
Considering the particularity of SAR images, especially the different imaging mechanisms of optical and SAR sensors, it is not easy to transfer features directly from successful models, which are usually trained on natural image datasets [17]. The problem of transferring from other datasets to remote sensing data with large variations remains unsolved, and the transferability of trained networks to other imaging modalities needs further investigation [18]. In this paper, we explore transfer learning for SAR target recognition in more depth and try to uncover more of its properties.
The contribution of this paper is to answer the following three questions about transfer learning via CNNs for SAR target recognition.
I-1 What to Transfer
Both the network and the source tasks should be considered in transfer learning for SAR target recognition. A deeper network well trained on a large-scale dataset generally has a stronger ability to extract generic features, and the distance between the source data and target data affects the transferability of features. We explore the influence of different source data (optical images, SAR scene images and a SAR target dataset), different source tasks (classification and reconstruction) and different architectures to show which network, together with which datasets, should be transferred to SAR target recognition. Besides, we propose a transitive transfer method via multi-source data that significantly improves the generality of layer features.
I-2 Where to Transfer
The transferability of features varies from layer to layer in deep CNNs. Some features are general, meaning that they are applicable to other tasks, while others are more specific to a particular task. Generally speaking, transferability decreases from low-level to high-level features. We analyze the generality and specificity of features in different layers, with various source data and SAR target recognition as the target task, to decide which levels of features can be used as off-the-shelf representations for the target task.
I-3 How to Transfer
To transfer features that are specific to a particular source task to the target task, we propose a method based on multi-kernel maximum mean discrepancy for domain adaptation, which combines unsupervised and supervised learning to make the best use of both source and target data, whether labeled or not. The approach increases the generality of features in task-specific layers, yielding a stronger feature representation of the target data and better recognition performance.
I-4 SAR Specific Model
We provide a SAR-specific model, pre-trained on a large-scale SAR land cover and land use dataset, with a strong ability to extract spatial features of SAR images; it is validated to transfer well to SAR targets such as the MSTAR and OpenSARShip datasets [19].
The rest of this paper is organized as follows. After a brief introduction of transfer learning and domain adaptation in Section II, the proposed method is detailed in Section III. Experiments and discussions are presented in Section IV to validate the effectiveness of the proposed method. Finally, the conclusions are drawn in Section V.
II Related Work
In this paper, we are interested in transfer learning for SAR target recognition. We therefore first introduce some typical studies on SAR target recognition with transfer learning methods, followed by related work on transfer learning and domain adaptation.
Malmgren-Hansen et al. [12] first proposed pre-training a CNN model on simulated SAR images of vehicles, densely sampled over different view angles, to learn generic features that can be transferred to real SAR images in automatic target recognition (ATR) applications. However, SAR image simulation is technically demanding, and the technique is not yet mature enough to simulate sufficiently reliable models, which makes it difficult to popularize. On the other hand, Huang et al. [13] found that features learned from unlabeled SAR scene images with stacked convolutional auto-encoders are transferable to SAR targets. Although this is impressive and helpful when labeled SAR targets are scarce but unlabeled SAR images are plentiful, it is still unknown how generic or specific the features from different source tasks are when transferred to SAR targets. The transferability of features needs further exploration, and the discrepancy between the source data and SAR targets should be fully taken into account to improve the performance of transfer learning.
Transfer learning, which usually aims to transfer knowledge from a large dataset known as the source domain to a small dataset called the target domain [20], is widely used in approaches based on deep convolutional neural networks. Yosinski et al. [21] discussed the transferability of features in deep neural networks, taking AlexNet trained on ImageNet as an example. They proposed a method to analyze how transferable features are and found that the generalization of features to other datasets and tasks clearly decreases as layers go deeper, with features becoming specific to a particular dataset or task especially in layers 6 and 7; this finding has been widely applied in subsequent studies [22, 23, 24]. Although co-adaptation of neurons between layers causes optimization difficulties, fine-tuning the transferred features on the target dataset can resolve this issue. Azizpour et al. [22] investigated several factors that influence transferability, including network structure, early stopping, fine-tuning and the similarity between source and target tasks. Among these factors, the similarity between source and target tasks is the most significant in determining whether the learnt representation is generic. Considering large domain discrepancies, Tan et al. [25] proposed a transitive transfer learning method that transfers knowledge even when the source and target domains share few factors directly, using some annotated images as an intermediate domain to bridge them. They then proposed a selective learning algorithm to transfer from face images to airplane images, which are totally different from each other [26]. This also inspires us to ask whether intermediate tasks closer to SAR target recognition are capable of increasing feature generality.
Domain adaptation approaches are often adopted to decrease the domain discrepancy between source and target tasks in transfer learning. Maximum mean discrepancy (MMD), a distance between embeddings of probability distributions in a reproducing kernel Hilbert space proposed by Borgwardt et al. [25], is used as the discrepancy metric between source and target in domain adaptation and transfer learning [27, 26, 28]. Long et al. successively proposed the deep adaptation network [23], residual transfer networks [29] and joint adaptation networks [30], all based on domain adaptation with the multi-kernel MMD metric to reduce the discrepancy between source and target; this work inspired us to focus on learning transferable features for SAR target recognition. Most domain adaptation problems use labeled source data and unlabeled target data that share common or similar categories but have different distributions, as in the Office-31 dataset, which consists of 4,652 images in 31 categories collected from three domains with different environmental variations: Amazon (images downloaded from amazon.com), Webcam (taken by a web camera) and DSLR (taken by a digital SLR camera). In our case, however, the categories of SAR targets to be recognized have usually never been seen before, so the classifier must be retrained. Moreover, methods for transferring among natural images are probably not applicable to SAR targets. In this paper, we explore rules and approaches specific to SAR target recognition.
III Methods
Following the three questions this paper explores, namely what, where and how to transfer in SAR target recognition, we first describe the method for analyzing the transferability of features and then propose our approaches to make full use of the transferred features.
III-A Generic or Specific
The features extracted from different layers of deep convolutional networks can be grouped into two categories: generic features and specific features. Generic features are capable of representing other datasets, whereas specific features are closely tied to the chosen data or task. To analyze the transferability of features in different layers for SAR target recognition, we adopt the method of quantifying the generality versus specificity of features in each layer of a deep CNN [21]. Suppose there are $N$ different source tasks to transfer. Denote the source tasks as $S_1, S_2, \ldots, S_N$, respectively, and the target task as $T$. For a network with $L$ layers, we would like to explore: 1) whether the features from the $n$th layer are generic to the target task or specific to the source task, and 2) from which layer the transferability of features declines dramatically.
First, we train the network on source task $S_i$ from scratch, denoted as $\mathrm{Net}(S_i)$. Then the network is trained on $T$, with the first $n$ layers copied from $\mathrm{Net}(S_i)$ and frozen as a feature extractor for the target task, and layers $n+1$ to $L$, as well as the classification layer, randomly initialized, as shown in Fig. 1. If the performance of this transferred network on $T$, denoted as $P_n(S_i \to T)$, is better than that of the network retrained on $T$, denoted as $P(T)$, the features in layer $n$ are declared general. Otherwise, they are deemed specific to $S_i$. We compare $P_n(S_i \to T)$ and $P_n(S_j \to T)$ to evaluate the degree of generality of the $n$th-layer features from different source tasks $S_i$ and $S_j$, as shown in Fig. 2. The results are given in Section IV.
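The decision rule above can be written as a small Python sketch. It is an illustration only, not the authors' code: `classify_layers` is a hypothetical helper, performance is assumed to be recognition accuracy on $T$, and the accuracy values are made up for the example.

```python
def classify_layers(baseline_acc, transferred_acc):
    """Label the features of each layer n as generic or specific.

    baseline_acc: accuracy of the network retrained from scratch on T.
    transferred_acc: {n: accuracy on T when the first n layers are copied
        from Net(S_i) and frozen, and the remaining layers are retrained}.
    """
    labels = {}
    for n in sorted(transferred_acc):
        # Rule from the text: beating the retrained baseline means "generic".
        labels[n] = "generic" if transferred_acc[n] > baseline_acc else "specific"
    return labels


# Hypothetical accuracies, for illustration only.
baseline = 0.84
transferred = {1: 0.89, 2: 0.86, 3: 0.85, 4: 0.86, 5: 0.78}
print(classify_layers(baseline, transferred))
# {1: 'generic', 2: 'generic', 3: 'generic', 4: 'generic', 5: 'specific'}
```

Running the same function once per source task and comparing the labels (or the raw accuracies) layer by layer gives the $S_i$ versus $S_j$ comparison described above.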
III-B Transitive Transfer via Multi-Source
In this paper, we propose transitive transfer via multi-source datasets. In the field of SAR image interpretation, various kinds of tasks, such as image classification, reconstruction, target detection and recognition, are solved individually. Even similar problems rarely cross paths. Taking target recognition as an example, recognizing targets in optical images versus SAR images, or recognizing different kinds of SAR targets such as airplanes and ships, are usually regarded as separate problems. Deep learning is a powerful tool for these tasks, but training a new network for each task is time-consuming and data-hungry, which is a problem for tasks with limited labeled data. What if knowledge were transferred transitively, task by task, from remotely similar tasks to closely similar ones? Could this enhance and enrich the ability to extract features from the target dataset? In our method, as shown in Fig. 3, given a network $\mathrm{Net}(S_1)$ trained on $S_1$, we simply fine-tune all layers on $S_2$ to fit that source task, obtaining the network $\mathrm{Net}(S_1 * S_2)$. Similarly, we obtain $\mathrm{Net}(S_1 * S_2 * \cdots * S_N)$ with knowledge from source data $S_1, S_2, \ldots, S_N$. We then analyze the transferability of features in each layer as in Section III-A. The results are given in Section IV.
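As a toy sketch of the transitive chain, the Python below replaces the CNN with a one-parameter least-squares model so that it runs with the standard library alone. `make_task`, `fine_tune` and `transitive_transfer` are hypothetical helpers and the synthetic tasks are assumptions; the only point being illustrated is that each stage starts from the parameters produced by the previous stage, as in $\mathrm{Net}(S_1 * S_2 * \cdots * S_N)$.

```python
import random


def make_task(slope, n=20, seed=0):
    """Hypothetical toy task: noiseless samples of y = slope * x."""
    rng = random.Random(seed)
    return [(x, slope * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]


def fine_tune(w, data, lr=0.5, epochs=100):
    """Toy stand-in for fine-tuning all layers: full-batch gradient descent
    on the mean squared error of y ~ w * x, starting from the current w."""
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def transitive_transfer(w_init, sources):
    """Fine-tune on S_1, S_2, ..., S_N in order (distant to similar).

    Returns the chain name, e.g. 'S1*S2*S3', and the final parameter."""
    w, names = w_init, []
    for name, data in sources:
        w = fine_tune(w, data)  # each stage starts from the previous weights
        names.append(name)
    return "*".join(names), w


sources = [("S1", make_task(1.0, seed=1)),
           ("S2", make_task(1.5, seed=2)),
           ("S3", make_task(2.0, seed=3))]
name, w = transitive_transfer(0.0, sources)
print(name, round(w, 3))
```

In this toy the final parameter ends up closest to the last task in the chain, which is why the ordering from distant to similar source tasks matters.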
III-C Transfer Learning with Domain Adaptation
According to the previous analysis, specific features constrain transfer among different tasks. To address this issue, we propose a transitive-transfer-based method with domain adaptation to decrease the discrepancy between the source and target tasks. We first introduce the multi-kernel maximum mean discrepancy (MK-MMD) and then present the two algorithms of the proposed method.
III-C1 Multi-Kernel Maximum Mean Discrepancy (MK-MMD)
Maximum mean discrepancy (MMD) was first proposed by Borgwardt et al. [25] as a metric for comparing distributions based on two sets of samples. In transfer learning, most domain adaptation methods use MMD to narrow the gap between the source and target domains. Suppose the source data and target data follow distributions $p$ and $q$, respectively. For two datasets $X_s = \{x^s_i\}_{i=1}^{n_s}$ and $X_t = \{x^t_j\}_{j=1}^{n_t}$ drawn from $p$ and $q$, their MMD is defined as
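$$\mathrm{MMD}[\mathcal{F}, p, q] = \sup_{f \in \mathcal{F}} \left( \mathbb{E}_{x^s \sim p}\left[f(x^s)\right] - \mathbb{E}_{x^t \sim q}\left[f(x^t)\right] \right) \tag{1}$$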
where $f$ is a function in the unit ball $\mathcal{F}$ of a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$, and $\mathbb{E}_{x \sim p}[f(x)]$ denotes the expectation of $f$ under the distribution $p$. In the RKHS, this expectation is expressed through the embedding of $p$, that is, the expected feature map $\phi(x)$ under $p$, denoted $\mu_p$ for short:
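$$\mu_p = \mathbb{E}_{x \sim p}\left[\phi(x)\right], \qquad \mathbb{E}_{x \sim p}\left[f(x)\right] = \left\langle f, \mu_p \right\rangle_{\mathcal{H}} \tag{2}$$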
As a result, MMD can be regarded as a distance between the embeddings of the probability distributions in an RKHS, which provides a metric between the source and target data. Furthermore, the squared MMD can be written as
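$$\mathrm{MMD}^2[\mathcal{F}, p, q] = \left\| \mu_p - \mu_q \right\|_{\mathcal{H}}^2 = \left\langle \mu_p - \mu_q,\, \mu_p - \mu_q \right\rangle_{\mathcal{H}} = \mathbb{E}_{x^s, x^{s\prime} \sim p}\left[k(x^s, x^{s\prime})\right] - 2\,\mathbb{E}_{x^s \sim p,\, x^t \sim q}\left[k(x^s, x^t)\right] + \mathbb{E}_{x^t, x^{t\prime} \sim q}\left[k(x^t, x^{t\prime})\right] \tag{3}$$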
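where $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ denotes the inner product in the RKHS, and the feature map $\phi$ is associated with the kernel through $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$. Consequently, the empirical estimate of MMD is given by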
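$$\widehat{\mathrm{MMD}}^2[X_s, X_t] = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s}\phi(x^s_i) - \frac{1}{n_t}\sum_{j=1}^{n_t}\phi(x^t_j) \right\|_{\mathcal{H}}^2 = \frac{1}{n_s^2}\sum_{i=1}^{n_s}\sum_{i'=1}^{n_s} k(x^s_i, x^s_{i'}) - \frac{2}{n_s n_t}\sum_{i=1}^{n_s}\sum_{j=1}^{n_t} k(x^s_i, x^t_j) + \frac{1}{n_t^2}\sum_{j=1}^{n_t}\sum_{j'=1}^{n_t} k(x^t_j, x^t_{j'}) \tag{4}$$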
The kernel is usually defined as the convex combination of basis kernels,
$$k = \sum_{u=1}^{m} \beta_u k_u, \qquad \sum_{u=1}^{m} \beta_u = 1, \quad \beta_u \ge 0 \tag{5}$$
and in our method the Gaussian kernel function is selected as the basis kernel.
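A minimal Python sketch of the empirical estimate in Eq. (4), using the convex combination of Gaussian basis kernels from Eq. (5). The basis-kernel form $\exp(-\|a-b\|^2/\gamma_u)$, the bandwidths `gammas` and the weights `betas` are assumptions for illustration; the text does not specify them. This quadratic-time version is for checking the formula rather than for training.

```python
import math


def mk_gaussian_kernel(a, b, gammas, betas):
    """Convex combination of Gaussian basis kernels exp(-||a - b||^2 / gamma)."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sum(beta * math.exp(-d2 / g) for g, beta in zip(gammas, betas))


def mmd2_biased(xs, xt, gammas=(1.0, 2.0, 4.0), betas=(1/3, 1/3, 1/3)):
    """Empirical squared MMD of Eq. (4) between source samples xs and target xt."""
    if any(b < 0 for b in betas) or abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("betas must be non-negative and sum to 1 (Eq. (5))")

    def k(a, b):
        return mk_gaussian_kernel(a, b, gammas, betas)

    ns, nt = len(xs), len(xt)
    k_ss = sum(k(a, b) for a in xs for b in xs) / ns ** 2
    k_tt = sum(k(a, b) for a in xt for b in xt) / nt ** 2
    k_st = sum(k(a, b) for a in xs for b in xt) / (ns * nt)
    return k_ss + k_tt - 2.0 * k_st


src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
tgt = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
print(mmd2_biased(src, src))        # 0.0
print(mmd2_biased(src, tgt) > 0.5)  # True
```

For identical source and target sets the estimate is exactly zero, and it grows as the two sets move apart.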
To make mini-batch stochastic gradient descent (SGD) easier and less time-consuming for CNNs, Gretton et al. [31] proposed an unbiased estimate of MK-MMD with linear complexity, which approximates the double summation with a single summation. Given quad-tuples $z_i = (x^s_{2i-1}, x^s_{2i}, x^t_{2i-1}, x^t_{2i})$ and supposing $n_s = n_t = n$, the squared MMD can be rewritten as
$$\widehat{\mathrm{MMD}}^2_{\mathrm{lin}}[X_s, X_t] = \frac{2}{n}\sum_{i=1}^{n/2} g_k(z_i) \tag{6}$$
where
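$$g_k(z_i) = k(x^s_{2i-1}, x^s_{2i}) + k(x^t_{2i-1}, x^t_{2i}) - k(x^s_{2i-1}, x^t_{2i}) - k(x^s_{2i}, x^t_{2i-1}) \tag{7}$$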
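The linear-time estimate of Eqs. (6) and (7) can be sketched the same way in Python. The Gaussian kernel settings are illustrative assumptions, and samples are indexed from 0, so the quad-tuple for loop step `i` is `(xs[i], xs[i+1], xt[i], xt[i+1])`.

```python
import math


def mk_gaussian_kernel(a, b, gammas=(1.0, 2.0, 4.0), betas=(1/3, 1/3, 1/3)):
    """Multi-kernel Gaussian k = sum_u beta_u * exp(-||a - b||^2 / gamma_u)."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sum(beta * math.exp(-d2 / g) for g, beta in zip(gammas, betas))


def mmd2_linear(xs, xt, k=mk_gaussian_kernel):
    """Linear-time unbiased estimate of squared MMD, Eqs. (6)-(7).

    Uses the first n samples of each set, with n = n_s = n_t made even."""
    n = min(len(xs), len(xt)) // 2 * 2
    if n < 2:
        raise ValueError("need at least two source and two target samples")
    total = 0.0
    for i in range(0, n, 2):
        # Quad-tuple z = (x^s_{2i-1}, x^s_{2i}, x^t_{2i-1}, x^t_{2i}), 0-based here.
        s1, s2, t1, t2 = xs[i], xs[i + 1], xt[i], xt[i + 1]
        total += k(s1, s2) + k(t1, t2) - k(s1, t2) - k(s2, t1)  # g_k(z), Eq. (7)
    return 2.0 * total / n  # (2 / n) * sum over n / 2 quad-tuples, Eq. (6)


src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]]
print(mmd2_linear(src, tgt))
```

Unlike Eq. (4), this estimate is unbiased and can come out slightly negative on small batches; that is expected behaviour, not a bug.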
III-C2 Deep Domain Adaptation Based on Transitive Transfer with Multi-Source
In our method, we choose a variety of source tasks with diverse similarity to the target task and use transitive transfer learning, from distant to similar tasks, to help recognize new types of SAR targets. Given a set of source tasks $S_1, S_2, \ldots, S_N$, arranged in ascending order of similarity to the target task $T$, we pre-train and fine-tune the network as proposed in Section III-B and analyze the transferability of features in each layer to see where the generality drops sharply. Suppose the first $n$ layers have a strong ability to extract general features of the target data; we call them the off-the-shelf layers. Layers $n+1$ to $L$ are more specific than the preceding layers; we call them the adaptation layers. Since $\mathrm{Net}(S_1 * S_2 * \cdots * S_N)$ is fine-tuned on $S_N$ last, we only adapt between the data of $S_N$ and $T$.
In popular domain adaptation methods [23, 30, 32, 33], the source data and target data share the same set of categories but have different probability distributions, and the target data are all unlabeled. The classification loss on the source data and the MMD between source and target data are combined and back-propagated to decrease the discrepancy, after which the target data can be classified directly. In our case, however, the types of SAR targets to be recognized have never been seen before, so the classification layer must be retrained. We therefore propose two algorithms and discuss in Section IV how to choose the appropriate one for different scenarios.
First, an integrated learning algorithm that combines classification and domain adaptation, called ITL, is shown in Fig. 4(a). Given a mini-batch of quad-tuples $z_i = (x^s_{2i-1}, x^s_{2i}, x^t_{2i-1}, x^t_{2i})$ as input to the network, the transfer loss in adaptation layer $l$ is calculated by Eqs. (6) and (7), denoted as
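The text does not give a numeric criterion for where the generality drops, so the Python sketch below assumes one: the off-the-shelf block ends just before the largest accuracy drop between consecutive frozen-layer settings. `split_layers` and the example accuracies are hypothetical, not the authors' procedure.

```python
def split_layers(layer_acc):
    """Split layers into off-the-shelf (1..n) and adaptation (n+1..L) layers.

    layer_acc: {n: accuracy on T with the first n layers transferred and frozen}.
    Assumed rule: n is the layer just before the largest accuracy drop between
    consecutive settings; with no drop, every layer is treated as off-the-shelf.
    """
    layers = sorted(layer_acc)
    drops = [(layer_acc[a] - layer_acc[b], a) for a, b in zip(layers, layers[1:])]
    if not drops or max(drops)[0] <= 0:
        return layers[-1], []
    _, n = max(drops)
    return n, [layer for layer in layers if layer > n]


n, adaptation = split_layers({1: 0.89, 2: 0.86, 3: 0.85, 4: 0.86, 5: 0.78})
print(n, adaptation)  # 4 [5]
```

A threshold on the drop, or a comparison against the retrained baseline, would be equally reasonable choices; the largest-drop rule is used here only because it needs no extra parameter.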
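$$\mathcal{L}_t^{l} = \widehat{\mathrm{MMD}}^2_{\mathrm{lin}}\left[\mathcal{D}_s^{l}, \mathcal{D}_t^{l}\right] \tag{8}$$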
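where $\mathcal{D}_s^{l}$ and $\mathcal{D}_t^{l}$ denote the $l$th-layer hidden representations of the source and target samples, $l = n+1, \ldots, L$. The classification loss of the target data is calculated with the standard Softmax loss, denoted as $\mathcal{L}_c(\theta(X_t), Y_t)$, where $\theta$ represents the category classifier and $Y_t$ the target labels. The network is trained by minimizing the total loss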
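$$\mathcal{L} = \mathcal{L}_c\left(\theta(X_t), Y_t\right) + \alpha \sum_{l=n+1}^{L} \lambda_l\, \mathcal{L}_t^{l} \tag{9}$$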
where $\alpha$ is the trade-off between the transfer loss and the classification loss, and $\lambda_l$ is the weight of the transfer loss in adaptation layer $l$. In the ITL algorithm, the transfer loss in the adaptation layers is added only as a regularizer to classification, and $\alpha$ is a dynamic parameter adjusted during training to keep a good balance between the transfer loss and the classification loss; in particular, at the later stage of training, $\alpha$ should be reduced by 0.1 to get a better trade-off. The setting of $\lambda_l$ depends on the transferability of each adaptation layer. Generally, the learning rate should be smaller for the off-the-shelf layers and larger for the classification layer than for the adaptation layers.
Second, since the transfer loss and the classification loss interact with and constrain each other when combined to optimize the network parameters, we propose a two-step training algorithm, called STL, shown in Fig. 4(b). In the first step, which trains the adaptation layers, the off-the-shelf layers are frozen because they already represent the target data generically; this also lowers the computational cost of optimization. The transfer loss calculated by Eq. (8) is used to train the adaptation layers, aiming to decrease the feature discrepancy in the specific layers. Then the classification loss combined with the transfer loss is minimized to train the classification layer, with minor updates to the off-the-shelf and adaptation layers. In the second step, $\alpha$ is reduced by 0.1 relative to the first step so that the transfer loss plays a subordinate role as a constraint term.
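A framework-free Python sketch of the ITL objective in Eq. (9), assuming the per-layer transfer losses have already been computed (for example with Eq. (8)). The function names are hypothetical, and `alpha` is passed per call so that a training loop can change it over time; in a real network these terms would be autograd tensors so that gradients can flow, which this plain-Python version does not attempt.

```python
import math


def softmax_cross_entropy(logits, label):
    """Standard Softmax loss for one sample (numerically stable log-sum-exp)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def itl_total_loss(logits, labels, layer_mmd2, lambdas, alpha):
    """Eq. (9): L = L_c + alpha * sum_l lambda_l * L_t^l.

    logits, labels: classifier outputs and labels for the target mini-batch.
    layer_mmd2: transfer losses L_t^l of the adaptation layers.
    lambdas: per-layer weights lambda_l; alpha: trade-off for this step.
    """
    l_c = sum(softmax_cross_entropy(z, y) for z, y in zip(logits, labels)) / len(labels)
    l_t = sum(lam * d for lam, d in zip(lambdas, layer_mmd2))
    return l_c + alpha * l_t


print(itl_total_loss([[2.0, 0.5], [0.1, 1.2]], [0, 1], [0.3, 0.1], [1.0, 0.5], alpha=1.0))
```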
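The STL schedule can be summarized as data. This is one reading of the description, written as a hypothetical Python sketch: the parameter-group names, the `minor` learning-rate multiplier and the example α values are assumptions, since the text says only that the off-the-shelf layers are frozen first and later receive minor updates, and that α is reduced in the second step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    lr_scale: dict      # parameter group -> LR multiplier (0.0 = frozen)
    use_cls_loss: bool  # whether the classification loss L_c is included
    alpha: float        # weight of the transfer loss L_t


def stl_stages(alpha_step1, alpha_step2, minor=0.1):
    """Two STL stages; `minor` is a hypothetical small LR multiplier standing
    in for the "minor updates" to the off-the-shelf and adaptation layers."""
    return [
        Stage("step1_adaptation",
              {"off_the_shelf": 0.0, "adaptation": 1.0, "classifier": 0.0},
              use_cls_loss=False, alpha=alpha_step1),
        Stage("step2_classification",
              {"off_the_shelf": minor, "adaptation": minor, "classifier": 1.0},
              use_cls_loss=True, alpha=alpha_step2),
    ]


def stage_loss(stage, l_c, l_t):
    """Objective minimized in a stage, given L_c and the summed transfer loss."""
    return (l_c if stage.use_cls_loss else 0.0) + stage.alpha * l_t


for stage in stl_stages(alpha_step1=1.0, alpha_step2=0.1):
    print(stage.name, stage.lr_scale, stage_loss(stage, l_c=0.7, l_t=0.2))
```

Keeping the schedule as plain data makes it easy to map each group's multiplier onto per-parameter-group learning rates in whichever training framework is used.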
IV Experimental Results and Discussion
IV-A Datasets and Tasks Description
In our experiments, we analyze the transferability of features using different source tasks and networks, and evaluate the proposed method on the target task, OpenSARShip recognition. The candidate source datasets/tasks are ImageNet, TerraSAR-X images and the SAR targets of MSTAR, used for classification or reconstruction. Brief descriptions of these datasets and tasks follow.
IV-A1 ImageNet for Classification
ImageNet is a well-known large-scale dataset of natural images in computer vision, providing the most comprehensive and diverse coverage of the image world [34]. It contains 3.2 million labeled images in 5247 categories, with over 600 images per category on average. Generally, a subset of ImageNet with 1.2 million hand-labeled images in 1000 object classes serves as the benchmark for training deep networks in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [35], where remarkable deep CNNs for image classification such as AlexNet [36], GoogLeNet [37] and ResNet [38] were proposed in 2012, 2015 and 2016, respectively.
IV-A2 TerraSAR-X Images for Classification and Reconstruction
First, we collect over 50,000 unannotated SAR image slices. These slices are randomly cropped from TerraSAR-X scene images covering various landscapes; TerraSAR-X is a German Earth-observation satellite that provides high-quality, precise earth observation data at 3 m resolution in StripMap mode. Using the rich texture information in these unlabeled slices, a deep stacked convolutional auto-encoder is trained to reconstruct them, producing a series of hierarchical convolution layers capable of extracting efficient features.
Besides, a high-resolution annotated SAR land cover dataset [39], collected from TerraSAR-X horizontally polarized (HH), multi-look ground range detected (MGD) products, is used for the SAR land cover classification task in our experiments. The selected SAR images were taken in High Resolution Spotlight mode with a pixel spacing of 1.25 m, at incidence angles between 20° and 50°, in both descending and ascending pass directions. Covering 288 full scenes of urban and non-urban areas all over the world, such as cities in Africa, Asia and Europe as well as some ocean areas, this dataset is hierarchically annotated at 3 levels, with 150 categories and more than 100,000 patches. In our experiments, 7 high-level categories are used for classification: Settlements, Public transportation, Industrial areas, Agricultural land, Natural vegetation, Bare ground and Water bodies, as shown in Fig. 5.
IV-A3 MSTAR for SAR Target Recognition
The Moving and Stationary Target Acquisition and Recognition (MSTAR) public release dataset [40], collected by the Sandia National Laboratory SAR sensor platform, contains 10 categories of military vehicles (T72, BTR70, BMP2, 2S1, BRDM2, BTR60, D7, T62, ZIL131 and ZSU23) at 1 ft resolution in X-band. Target chips acquired at a depression angle of 17° are usually used as training data, and those at 15° as testing data, to evaluate SAR target recognition algorithms. Details of the MSTAR dataset for 10-category SAR target recognition are shown in Table I.
IV-A4 OpenSARShip for SAR Target Recognition
Huang et al. [41] present a SAR ship dataset from Sentinel-1, containing 11346 ship chips from 41 Sentinel-1 SAR images. The dataset provides Single Look Complex (SLC) and Ground Range Detected (GRD) products of the IW mode, with VH and VV polarizations, in four formats: amplitude values, gray-scale visualized data, pseudo-color visualized data and radiometrically calibrated data. OpenSARShip contains 17 types of ships, such as Cargo, Tanker, Passenger and Tug, but with unbalanced numbers per type (for example, 8470 Cargo versus 4 Towing). Cargo has 5 elaborated types, namely Cargo, Container Ship, Bulk Carrier, General Cargo and Other Cargo. To evaluate the method with limited target data and balance the number of training samples per category, we select the elaborated types Cargo, Container Ship and Bulk Carrier of the GRD product (10 m resolution) with VV polarization, keeping only ship chips larger than 70 × 70 pixels to ensure sufficient image information. The details are shown in Table II and Fig. 6.
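The selection and split described above might look like the following Python sketch. The record keys (`type`, `product`, `polarization`, `width`, `height`), the random 100-per-class split and the reading of "larger than 70 × 70" as both sides exceeding 70 pixels are assumptions, not the dataset's actual schema or the authors' procedure; the 100 training chips per type follow Table II.

```python
import random
from collections import defaultdict

SELECTED_TYPES = {"Cargo", "Bulk Carrier", "Container Ship"}


def select_chips(chips, min_size=70):
    """Keep GRD, VV-polarized chips of the three selected types whose width
    and height both exceed min_size pixels (record keys are hypothetical)."""
    return [c for c in chips
            if c["type"] in SELECTED_TYPES
            and c["product"] == "GRD" and c["polarization"] == "VV"
            and c["width"] > min_size and c["height"] > min_size]


def split_per_class(chips, n_train=100, seed=0):
    """Randomly take n_train chips per type for training; the rest are for testing."""
    by_type = defaultdict(list)
    for c in chips:
        by_type[c["type"]].append(c)
    rng = random.Random(seed)
    train, test = [], []
    for ship_type in sorted(by_type):
        items = by_type[ship_type][:]
        rng.shuffle(items)
        train.extend(items[:n_train])
        test.extend(items[n_train:])
    return train, test
```

A fixed per-class training count, rather than a percentage split, keeps the three training classes balanced even though the test counts differ.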
IV-B What to Transfer
In this section, we discuss how different networks and source tasks affect the transferability of features in SAR target recognition, and we then apply the conclusions to our subsequent experiments. Azizpour et al. [22] indicated that over-parameterizing a network by increasing its width (the number of parameters in each layer) and depth (the number of convolution layers) can improve transfer performance on target datasets that are close to the source task, but may harm the transferability of features to distant target tasks. In light of this previous research, it is important to select an appropriate network and source task to transfer from in the SAR target recognition problem.
To recognize the SAR ship targets of the OpenSARShip dataset with only hundreds of labeled images, the first idea would be to transfer layers from a close dataset such as MSTAR. We discuss different networks pre-trained on the MSTAR dataset in IV-B1. In addition, in IV-B2 we explore how other source data or tasks, such as ImageNet classification, SAR image reconstruction, and SAR land cover classification, perform when transferred to SAR target recognition, as well as the transitive transfer method using multiple source tasks.
IV-B1 What Network
Three networks are explored in this section: A_ConvNet [42], which achieves state-of-the-art performance on MSTAR recognition; H_Net [13], which also performs well on MSTAR by using stacked convolutional auto-encoders to learn hierarchical layers from unlabeled SAR images and transferring them to SAR targets; and AlexNet [36], the breakthrough deep neural network for large-scale image classification. Because more than 90% of AlexNet's parameters lie in its fully-connected layers, and given the small scale of SAR target data, we use only the convolution layers of AlexNet, denoted as AlexNet_Conv.
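The "more than 90%" figure can be checked with a short parameter count. This sketch assumes the layer shapes of the original two-group ImageNet AlexNet (Krizhevsky et al., 2012), not the exact configuration listed in Table III, which is not reproduced here; `conv_params` and `fc_params` are illustrative helpers.

```python
def conv_params(out_ch, in_ch_per_group, k):
    """Weights plus biases of a k x k convolution (in_ch_per_group accounts for grouping)."""
    return out_ch * in_ch_per_group * k * k + out_ch

def fc_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out

# Layer shapes of the original two-group ImageNet AlexNet.
CONV = {
    "conv1": conv_params(96, 3, 11),
    "conv2": conv_params(256, 48, 5),   # 2 groups: each filter sees 96 / 2 = 48 channels
    "conv3": conv_params(384, 256, 3),
    "conv4": conv_params(384, 192, 3),  # 2 groups
    "conv5": conv_params(256, 192, 3),  # 2 groups
}
FC = {
    "fc6": fc_params(6 * 6 * 256, 4096),
    "fc7": fc_params(4096, 4096),
    "fc8": fc_params(4096, 1000),
}

conv_total = sum(CONV.values())
fc_total = sum(FC.values())
fc_share = fc_total / (conv_total + fc_total)
print(f"conv: {conv_total:,}  fc: {fc_total:,}  fc share: {fc_share:.1%}")
```

With these shapes the fully-connected layers hold about 96% of roughly 61 million parameters, while the five convolution layers that make up AlexNet_Conv hold only about 2.3 million.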
As depicted in Table III, denotes the channels and the kernel size of in each convolution layer. We denote the networks trained from scratch on MSTAR and OpenSARShip as and , respectively. As Fig. 7 shows, the performance on SAR targets decreases as the network becomes deeper and wider from A_ConvNet to AlexNet_Conv. A_ConvNet is successful on MSTAR because the smaller network offers a feature space appropriate for fitting the limited data. A deeper and wider network must learn a more complex, nonlinear function from the same limited data, which makes the optimal solution difficult to find.
Next, we follow the procedure in Section III-A to analyze the feature transferability of each layer in the three models , , and . Because the hyper-parameters of the conv5 layer in A_ConvNet are specific to classification, only the first four convolution layers are transferred in our experiments. We record the recognition rate on the OpenSARShip test data as , denoting the model trained by transferring and freezing the first layers of the model , where is in . The remaining higher layers, together with the classification layer, are randomly initialized and trained on OpenSARShip. The performance of is shown in Fig. 8 and Table IV. Although A_ConvNet trained from scratch performs better than H_Net and AlexNet_Conv on small-scale datasets such as OpenSARShip and MSTAR, the features in each layer of show low generality to OpenSARShip, as the performance of is not as good as that of . In contrast, the over-parameterized networks and improve the performance on OpenSARShip in transfer learning.
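The freeze-and-retrain protocol can be sketched framework-agnostically. This is a minimal illustration, assuming weights are stored in a plain dict keyed by layer name; `build_transferred_model`, `init_fn`, and the layer names are hypothetical, and a real implementation would disable gradient updates for the frozen layers in its deep learning framework.

```python
def build_transferred_model(source_weights, layer_names, n, init_fn):
    """Transfer and freeze the first n layers of a source model.

    Returns (weights, trainable): layers 1..n are copied from `source_weights`
    and marked frozen; the remaining layers and a new classifier are
    re-initialized with `init_fn` and marked trainable on the target data.
    """
    if not 0 <= n <= len(layer_names):
        raise ValueError(f"n must be in [0, {len(layer_names)}], got {n}")
    weights, trainable = {}, {}
    for i, name in enumerate(layer_names):
        if i < n:
            weights[name] = source_weights[name]  # copied from the source model
            trainable[name] = False               # frozen during target training
        else:
            weights[name] = init_fn(name)         # randomly re-initialized
            trainable[name] = True
    weights["classifier"] = init_fn("classifier")  # new classification layer
    trainable["classifier"] = True
    return weights, trainable

# A_ConvNet case from the text: only the first four convolution layers are transferred.
A_CONVNET_TRANSFERABLE = ["conv1", "conv2", "conv3", "conv4"]
```

Sweeping `n` from 1 to 4, training each returned model on OpenSARShip, and recording its test accuracy yields one curve of the kind plotted in Fig. 8.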
This suggests that, even though it is difficult to find an optimal solution when training a deeper and wider network on a small dataset, the learned features are more general to related tasks, so transferring them helps the related target task reach a better solution.
IV-B2 What Source Data / Tasks
Intuitively, closer data or tasks, such as other kinds of SAR target recognition or SAR land cover classification, should provide more transferable features for SAR target recognition. However, in some cases we do not have enough labeled SAR data to pre-train a deeper network with strongly representative features. On the other hand, abundant unlabeled SAR images can be collected easily. Huang et al. [13] showed that stacked convolutional auto-encoders can be trained to reconstruct large-scale unlabeled SAR scene data, and that the stacked convolutional layers can then be transferred to the SAR target recognition task. Still, it remains to be explored whether the well-known models pre-trained on natural images, which are popular for transfer to other remote sensing tasks, are transferable to SAR targets.
First, we transfer AlexNet_Conv pre-trained on ImageNet (denoted as ), on the SAR land cover dataset (denoted as ), and on MSTAR (denoted as ) to OpenSARShip; the results are shown in Fig. 9 and row 4 of Table IV. Compared with the source tasks of SAR land cover classification and MSTAR target recognition, the first-layer features of generalize well, but the higher layers are much more specific: performance drops significantly when the second to fifth convolution layers are transferred and frozen. Even though the Gabor-like low-level features learned from natural images are effective for representing SAR targets, the features from higher layers are more specific to natural images; the mid-level features of natural images and SAR targets are already distant, and the gap grows in the higher layers. By contrast, the features in and generalize more robustly to SAR targets. In particular, the model trained for SAR land cover classification performs better in higher layers, owing to its large-scale dataset with abundant SAR image information and its similar classification task.
What if we do not have large-scale annotated SAR images to pre-train a deep network? Row 3 of Table IV shows how pre-training on unlabeled SAR images performs in transfer. Because the unlabeled SAR image reconstruction task is distant from OpenSARShip recognition, the features in lose transferability and become specific as early as layer 2, whereas the MSTAR recognition task, which is much more similar to our target task, yields more general features in .
Because annotated SAR data are limited in practice, it is not easy to find a source task that is both similar to SAR targets and backed by a large amount of related data. Since the higher-layer features of ImageNet pre-trained models are specific to natural images, and those of models pre-trained on unlabeled SAR images are specific to the reconstruction task, we explore a transitive transfer method with multiple SAR-related source tasks to enhance the generality of features in deep networks.
The denotes the network obtained by simply fine-tuning the convolution layers of on the MSTAR dataset. Fig. 10 shows the performance of transferring different layers of the pre-trained network to OpenSARShip recognition, compared with the black line, which denotes the performance of . Regions above the black line indicate that the features are general to the target task, and regions below it indicate specificity. Strikingly, distinctly increases the generality of features in the mid and high layers. This indicates that, although unlabeled SAR image reconstruction is a distant source task, the intermediate task of MSTAR classification enhances the transferability of features to other SAR target recognition tasks, provided that the model pre-trained on SAR image reconstruction offers a good basis.
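Reading Fig. 10 amounts to comparing each transferred layer with the reference accuracy. A minimal sketch, assuming accuracies are given as fractions; the function name is hypothetical, and counting a layer that lies exactly on the line as specific is an arbitrary choice made here.

```python
def layer_generality(transfer_acc, baseline_acc):
    """Label each transfer depth as 'general' (above the baseline) or 'specific'.

    transfer_acc: {n: target accuracy when the first n layers are transferred and frozen}
    baseline_acc: accuracy of the reference model (the black line in Fig. 10)
    Returns {n: (label, accuracy - baseline)}.
    """
    result = {}
    for n in sorted(transfer_acc):
        gap = transfer_acc[n] - baseline_acc
        result[n] = ("general" if gap > 0 else "specific", gap)
    return result
```

The signed gap keeps the magnitude visible, so two pre-trained models can be compared layer by layer rather than only by label.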
Since multi-source transitive transfer generalizes features well, we next explore transferring the ImageNet pre-trained model to SAR target recognition. Yosinski et al. [21] pointed out that fragile co-adaptation can affect performance when the first several layers are frozen. Our experiments show that this co-adaptation effect is negligible when training AlexNet_Conv on OpenSARShip, since the fluctuation is tiny, as shown in Fig. 11. is fine-tuned for classification on a subset of the annotated SAR land cover dataset containing 12,000 slices, yielding . Similarly, and are obtained with the MSTAR dataset. As shown in Fig. 12, the transferability of features from layer 2 to layer 4 increases remarkably from to , especially in , where the generality of the fourth-layer features is comparable to that of the lower-level features.
The distances among these tasks are illustrated in Fig. 13. The abundant SAR scene images, acquired by sensors similar to those imaging SAR targets, are suitable for pre-training a deep network that transfers to other SAR-related tasks with limited data, but the image reconstruction task on unlabeled data is distant from the SAR target recognition task, which hurts the transferability of features in the mid and high layers. By comparison, MSTAR classification is close to OpenSARShip recognition, but the MSTAR data are too limited to train a deeper and wider network. Consequently, if a large-scale annotated SAR image dataset can be obtained, a model pre-trained on it will be very useful for SAR target recognition. If not, unlabeled SAR images also help, as transitive transfer 1 in Fig. 13 shows: the MSTAR classification task bridges unlabeled SAR image reconstruction and OpenSARShip recognition and improves the generality of features across layers. Alternatively, to apply models pre-trained on natural images to SAR-related problems, we advise first learning some information from SAR images on top of the pre-trained model, as transitive transfer 2 in Fig. 13 shows; this reduces the specificity of higher-layer features to natural images.
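The two transitive paths in Fig. 13 can be written as ordered stage lists in which each stage starts from the weights produced by the previous one. This is a sketch under assumptions: `train_fn` is a hypothetical callback standing in for pre-training or fine-tuning on a given dataset and task, and the stage labels are descriptive strings only.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, str]  # (dataset, task); descriptive labels only

# Transitive transfer 1: unlabeled SAR reconstruction -> MSTAR -> OpenSARShip.
TRANSITIVE_1: List[Stage] = [
    ("unlabeled SAR scenes", "reconstruction"),
    ("MSTAR", "classification"),
    ("OpenSARShip", "classification"),
]

# Transitive transfer 2: ImageNet -> a SAR-related task (land cover here; MSTAR
# is the other option discussed in the text) -> OpenSARShip.
TRANSITIVE_2: List[Stage] = [
    ("ImageNet", "classification"),
    ("SAR land cover", "classification"),
    ("OpenSARShip", "classification"),
]

def transitive_transfer(model, stages: List[Stage], train_fn: Callable):
    """Run the stages in order, initializing each stage from the previous stage's model."""
    for dataset, task in stages:
        model = train_fn(model, dataset, task)
    return model
```

Representing a path as data makes it straightforward to compare variants, for example swapping the intermediate task, without changing the training loop.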
The analysis in this section reveals that the transferability of features is influenced by the generality of the transferred network and by the distance between the source and target tasks; that is, both the network and the source tasks affect transfer to the SAR target recognition task. Multi-source transitive transfer is an effective way to combine different source datasets, from large-scale to limited and from distant to similar, to obtain more general features, as the network gradually learns more useful knowledge while completing the successive tasks. Despite the large differences between natural images and SAR targets, the low-level features are general and transferable, and injecting more knowledge of SAR images via multi-source transitive transfer can notably increase transferability. The multi-source transitive transfer method can not only adopt a larger network, but also combine the highly generic low-level features learned on ImageNet with the increasingly transferable features in higher layers.
IV-C Where and How to Transfer Effectively
In previous research, Yosinski et al. [21] found that performance drops in the fully-connected layers because of representation specificity when transferring to other natural images. As a result, follow-up studies [23, 43, 32] commonly adapt the features in each of the fully-connected layers when transferring to other natural images. Moreover, Hu et al. [2], Zhao et al. [6], and Marmanis et al. [5] each transfer the high-level features from the first fully-connected layer of AlexNet to remote sensing image classification tasks. However, our previous discussion shows that this conclusion cannot simply be applied to SAR target recognition. In this section, we discuss where to transfer features in different situations and how to transfer more effectively to reduce the discrepancy between the source and target domains.
The features in are good enough to transfer to SAR targets. On the MSTAR dataset, the model achieves an overall accuracy of 99.34% by fine-tuning all layers, better than the state of the art. On OpenSARShip, fine-tuning yields 91.04%, which is 1.73%, 1.1%, 4.23%, and 2.6% better than , , and , respectively. In this part, we mainly focus on the four pre-trained models from AlexNet_Conv and H_Net to see how to make them transfer more effectively to SAR targets. Fig. 14 presents the transferability of each layer in the different scenarios. In the and scenarios, the first four convolution layers show a strong ability to extract generic features of OpenSARShip, but this ability drops rapidly in layer 5. In the and scenarios, by contrast, even though transferring the first four layers performs worse than in the AlexNet_Conv scenarios, the features in layer 5 generalize better. We visualize the layer 4 and layer 5 features of the MSTAR and OpenSARShip datasets in the different scenarios with t-SNE [44], as shown in Fig. 15, where blue dots denote MSTAR and orange dots denote OpenSARShip. The layer 5 features of MSTAR and OpenSARShip in can be easily distinguished, indicating a large difference between the source and target feature distributions. In , however, the source and target feature distributions are less distinguishable, and this is even more noticeable in . These properties inform the choice of strategy for transferring features in SAR target recognition.
We evaluate the ITL and STL algorithms proposed in Section III-C in the different AlexNet_Conv and H_Net scenarios.
IV-C1 AlexNet_Conv
For the STL algorithm, following the previous analysis, we treat layers as the off-the-shelf layers in and because of the strong performance of and , and treat layer 5 as the adaptation layer. After the first step, which updates the adaptation layer, the source and target feature distributions become noticeably more similar, implying that the discrepancy between source and target features has decreased, as shown in Fig. 16. In this step, the off-the-shelf layers are fixed, since their generality allows them to extract off-the-shelf features from OpenSARShip data. Moreover, this step is unsupervised, so all labeled and unlabeled OpenSARShip data can be used to narrow the gap between the feature distributions. Next, the classification layer is trained with labeled OpenSARShip data by combining the cross-entropy loss between the labels and the Softmax outputs with the transfer loss. In this step, the learning rate of layers 1 to 5 is set to so that these layers are only slightly fine-tuned, and the learning rate of the classification layer is set to , which is 100 times larger. The transfer loss constrains the whole network to keep the source-target discrepancy small, and it should be weighted by the trade-off so that it neither dominates the total loss nor prevents the classification loss from decreasing further. In our experiments, we set to 1.5 in the AlexNet_Conv scenarios.
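The second STL step combines the cross-entropy loss with a weighted transfer loss. The sketch below assumes the transfer loss is a multi-kernel MMD (MK-MMD), which the Conclusion names as the basis of the method; the exact formulation is defined in Section III-C, so the equal kernel weights, the fixed Gaussian bandwidths, and the biased quadratic-time estimator used here are illustrative assumptions, and all function names are hypothetical. Plain Python lists stand in for feature tensors.

```python
import math

def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mk_kernel(a, b, bandwidths):
    """Equal-weight average of Gaussian kernels exp(-||a - b||^2 / (2 * sigma^2))."""
    d = _sq_dist(a, b)
    return sum(math.exp(-d / (2.0 * s * s)) for s in bandwidths) / len(bandwidths)

def mk_mmd2(source, target, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased quadratic-time estimate of the squared MK-MMD between two feature sets."""
    def mean_k(X, Y):
        return sum(mk_kernel(x, y, bandwidths) for x in X for y in Y) / (len(X) * len(Y))
    return mean_k(source, source) + mean_k(target, target) - 2.0 * mean_k(source, target)

def stl_step2_loss(cross_entropy, source_feats, target_feats, lam=1.5):
    """Total loss = classification (cross-entropy) loss + lam * transfer (MK-MMD) loss.

    lam = 1.5 is the value reported for the AlexNet_Conv scenarios.
    """
    return cross_entropy + lam * mk_mmd2(source_feats, target_feats)
```

Identical source and target feature sets give a transfer loss of zero, while well-separated sets give a clearly positive value; `lam` then controls how strongly that penalty competes with the classification loss.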
Table V shows the performance of the different algorithms in different scenarios. Compared with common fine-tuning for transfer learning, the STL approach boosts performance by 1.44% and 1.96% on and , respectively. The ITL approach, which combines the transfer loss and the classification loss to fine-tune all layers, does not perform as well as STL but still outperforms simple fine-tuning, improving by 0.57% and 0.48% on and , respectively. This suggests that the transfer loss does help classification, but since the first 4 layers in the AlexNet_Conv scenarios are already sufficient to extract general features of OpenSARShip, the transfer-loss constraint is better kept away from the off-the-shelf layers so as not to degrade their good feature representations. This also verifies the advantage of analyzing the transferability of layers to distinguish the off-the-shelf layers from the adaptation layers.
IV-C2 H_Net
In the H_Net scenarios, the OpenSARShip features in layers 4 and 5 are more likely to share a similar distribution with MSTAR than in the AlexNet_Conv scenarios, but the performance of both transferring and freezing the first 4 layers and fine-tuning all layers is worse, as shown in Tables IV and V. The underlying reason is that the base network of has a stronger ability to extract generic features than and . Consequently, in the lower layers multi-source transitive transfer helps generalize the features to OpenSARShip and improves SAR target recognition, while in the high-level layers the discrepancy dominates transferability. In our experiments, the ITL approach improves performance by 1.15% compared with fine-tuning all layers. For the STL algorithm, considering the decline of feature generalization in higher layers, we set layers 4 and 5 as the adaptation layers in the scenario, improving performance by 2.02% compared with simply fine-tuning all layers. The learning-rate multipliers are set to 0.1, 0.1, 0.1, 0.5, and 1 for the five convolution layers and to 10 for the classification layer. The results are sensitive to the value of the trade-off . In the second step, which combines the transfer loss and the classification loss, must be set to a smaller value to constrain the transfer loss, because the classification loss should have the major effect on fine-tuning.
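The H_Net STL configuration can be summarized as per-layer settings. A minimal sketch: the multipliers and adaptation layers come from the text, while the base learning rate is left as a parameter because its value is not given here, and the helper names are hypothetical.

```python
LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "classifier"]

# Learning-rate multipliers for the H_Net STL setting described above.
LR_MULTIPLIERS = {"conv1": 0.1, "conv2": 0.1, "conv3": 0.1,
                  "conv4": 0.5, "conv5": 1.0, "classifier": 10.0}

# Layers 4 and 5 are the adaptation layers; layers 1-3 are off-the-shelf.
ADAPTATION_LAYERS = {"conv4", "conv5"}

def step1_trainable(layers=LAYERS, adaptation=ADAPTATION_LAYERS):
    """STL step 1: only the adaptation layers are updated (with the transfer loss)."""
    return {name: name in adaptation for name in layers}

def step2_learning_rates(base_lr, multipliers=LR_MULTIPLIERS):
    """STL step 2: every layer is updated, scaled by its learning-rate multiplier."""
    return {name: base_lr * m for name, m in multipliers.items()}
```

Keeping the multipliers separate from the base rate mirrors how the text reports them, so the same table can be reused with whatever base learning rate a given training setup uses.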
When it comes to , we observe in Table V that STL does not perform as well as ITL. Fig. 14 shows that the lower-layer features, especially those in the bottom layers, generalize less well than those in layer 5, indicating that the lower layers can improve their ability to extract good features through fine-tuning rather than being treated as off-the-shelf layers. ITL and STL improve the recognition performance on OpenSARShip by 1.44% and 0.87% in , respectively. Therefore, combining the transfer loss with the classification loss to fine-tune all layers, as in the ITL approach, is the better choice in .
V Conclusion
In this paper, we explore in detail what network and source tasks are better to transfer, in which layers the features are more generic to transfer, and how to transfer effectively in SAR target recognition. We find that transferability depends on the generalization capacity of the network and the distance between the source and target tasks. A small network is appropriate for training with limited labeled SAR targets, but when it is transferred to other SAR target recognition tasks, its feature generality is insufficient to extract a good representation. As a result, both a larger network trained on a large-scale dataset and a source task similar to SAR target recognition are required. If possible, a deep network pre-trained on a large-scale annotated SAR scene dataset is a good source to transfer from, and we have released this resource in [19]. Otherwise, the virtually unlimited unlabeled SAR images are also helpful, especially with the transitive transfer proposed in this paper, which transfers knowledge from a large-scale dataset to a small-scale one that is closer to the SAR target recognition task. We do not recommend applying models pre-trained on natural images directly to SAR targets, because the large difference between the two may result in highly specific features in higher layers. Instead, the mid-level features specific to natural images can be generalized to SAR targets by transitive transfer with SAR-related tasks. To decrease the discrepancy between the source and target domains in the very high layers, we recommend the proposed MK-MMD based transfer method, which separately trains the adaptation layers and slightly updates the off-the-shelf layers; it improves performance over simply fine-tuning all layers when transferring to SAR target recognition.
Acknowledgment
We thank Dr. Corneliu Octavian Dumitru of the German Aerospace Center (DLR) for providing the TerraSAR-X annotated land cover images, and we also thank the Science Service System for the provision of images (Proposals MTH-1118 and LAN-3156).
References
- [1] O. A. B. Penatti, K. Nogueira, and J. A. dos Santos, “Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?” in Proc. 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2015, pp. 44–51.
- [2] F. Hu, G.-S. Xia, J. Hu, and L. Zhang, “Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery,” Remote Sens., vol. 7, no. 11, pp. 14680–14707, 2015. [Online]. Available: http://www.mdpi.com/2072-4292/7/11/14680
- [3] S. Wang, D. Quan, X. Liang, M. Ning, Y. Guo, and L. Jiao, “A deep learning framework for remote sensing image registration,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 145, pp. 148–164, 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0924271617303891
- [4] Z. Chen, T. Zhang, and C. Ouyang, “End-to-end airplane detection using transfer learning in remote sensing images,” Remote Sens., vol. 10, no. 1, 2018. [Online]. Available: http://www.mdpi.com/2072-4292/10/1/139
- [5] D. Marmanis, M. Datcu, T. Esch, and U. Stilla, “Deep learning earth observation classification using imagenet pretrained networks,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 1, pp. 105–109, 2016.
- [6] B. Zhao, B. Huang, and Y. Zhong, “Transfer learning with fully pretrained deep convolution networks for land-use classification,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 9, pp. 1436–1440, 2017.
- [7] G. Fu, C. Liu, R. Zhou, T. Sun, and Q. Zhang, “Classification for high resolution remote sensing imagery using a fully convolutional network,” Remote Sens., vol. 9, no. 5, p. 498, 2017.
- [8] Y. Yuan, X. Zheng, and X. Lu, “Hyperspectral image superresolution by transfer learning,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 5, pp. 1963–1974, 2017.
