A Novel Cost Function for Despeckling using Convolutional Neural Networks

Giampaolo Ferraioli; Vito Pascazio; Sergio Vitale

arXiv:1906.04441·eess.IV·January 17, 2020

TL;DR

This paper introduces a new convolutional neural network-based despeckling method for SAR images, utilizing a novel cost function that considers spatial consistency and noise statistics to improve image clarity.

Contribution

It proposes a new cost function for CNN-based despeckling that enhances SAR image quality by integrating spatial and statistical noise properties.

Findings

01 Improved despeckling performance on simulated SAR data.

02 Effective preservation of image details and structures.

03 Enhanced noise reduction compared to traditional methods.

Abstract

Removing speckle noise from SAR images is still an open issue. It is well known that the interpretation of SAR images is very challenging and that despeckling algorithms are necessary to improve the ability to extract information. An urban environment makes this task harder because of the variety of structures and of object scales. Following the recent spread of deep learning methods in several remote sensing applications, this work proposes a despeckling algorithm based on convolutional neural networks. The network is trained on simulated SAR data. The paper focuses mainly on the implementation of a cost function that takes into account both the spatial consistency of the image and the statistical properties of the noise.

Tables

Table I: Hyper-parameters of the proposed network

Layer	Feature Maps	Kernel Dimension	Learning Rate	Batch Normalization	ReLU
1	64	$3 \times 3$	$2 \cdot 10^{- 6}$	False	False
2-9	64	$3 \times 3$	$2 \cdot 10^{- 6}$	True	True
10	1	$3 \times 3$	$2 \cdot 10^{- 6}$	False	False

Table II: Numerical results: M-index evaluated on clip1 and clip2

Method	clip1	clip2
Proposed	5.59	6.55
PPB	10.65	10.27

Table III: Numerical results: M-index evaluated on real SAR image

Method	M-index
Proposed	8.36
PPB	7.29

Equations

Y = f (X, N) = X \cdot N

p (N) = \frac{L^{L}}{Γ ( L )} N^{L - 1} e^{- N L}

z^{(l)} = w^{(l)} * x^{(l)} + b^{(l)},

z^{(l)} (m, \cdot, \cdot) = \sum_{n = 1}^{N} w^{(l)} (m, n, \cdot, \cdot) * x^{(l)} (n, \cdot, \cdot) + b^{(l)} (m) .

y^{(l)} ≜ f_{l} (x^{(l)}, Φ_{l}) = \begin{cases} \max (0, w^{(l)} * x^{(l)} + b^{(l)}), & l < L \\ w^{(l)} * x^{(l)} + b^{(l)}, & l = L \end{cases}

f (x, Φ) = f_{L} (f_{L - 1} (\dots f_{1} (x, Φ_{1}), \dots, Φ_{L - 1}), Φ_{L})

\hat{X} = f (Y, Φ)

C = λ C_{1} + C_{2}

C_{1} = SID (\frac{Y}{\hat{X}}, \frac{Y}{X}) = SID (\hat{N}, N)

C_{2} = \| \hat{X} - X \|^{2}

Full text

Giampaolo Ferraioli
Dipartimento di Scienze e Tecnologie, Università di Napoli Parthenope, Napoli, Italy

Vito Pascazio
Dipartimento di Ingegneria, Università di Napoli Parthenope, Napoli, Italy

Sergio Vitale
Dipartimento di Ingegneria, Università di Napoli Parthenope, Napoli, Italy

Index Terms: SAR, speckle, CNN, despeckling, deep learning

I Introduction

Over the last decades, remote sensing has grown continuously, providing more and more images of the planet. Extracting useful information from them is still an open issue, all the more so when dealing with SAR sensors. SAR images are affected by a multiplicative noise called speckle, which impairs the performance of tasks such as classification, object detection and segmentation. In recent years a large body of research has grown around this problem and many despeckling algorithms have been proposed. Speckle arises from the interaction of the electromagnetic fields scattered in different directions by a rough surface. Let $Y$ be a SAR image; it can be expressed as [1]:

$$Y = f(X, N) = X \cdot N \tag{1}$$

where $X$ is the noise-free image and $N$ is the multiplicative speckle. Under the hypothesis of fully developed speckle, its distribution is known and, for an intensity image, it is a Gamma distribution [2]:

$$p(N) = \frac{L^{L}}{\Gamma(L)} N^{L-1} e^{-NL} \tag{2}$$

where $L$ is the number of looks of the SAR image (Fig. 1). An ideal despeckling filter removes the noise without introducing artefacts while preserving spatial information. Despeckling filters are usually divided into two categories: local and non-local filters. The former, such as the Lee [3], Enhanced Lee [4] and Kuan [5] filters, rely on the similarity between the target pixel and its adjacent pixels. The latter, such as Probabilistic Patch-Based (PPB) [6], SAR-BM3D [7] and NL-SAR [8], look for similar pixels within a wider search window. With the spread of deep learning solutions in many fields related to image processing, a further family of filters has emerged: in recent years, solutions based on convolutional neural networks (CNNs) have been proposed, such as [9], [10]. Using CNNs for despeckling is challenging because of the lack of a clean reference: once a real SAR image is acquired, there is no way to obtain a speckle-free image to use as reference.

There are two main approaches to overcoming this problem:

• training a network to reproduce an existing despeckling filter, as in [10], where a CNN is proposed to perform multilooking when several acquisitions of the same scene are not available;

• training on simulated data, as in [9].

As in [9], simulated SAR data are used in this work. Clean images are taken from three datasets: UCID, BSD [11] and scraped Google Maps images [12]. The Google Maps dataset is composed of images of urban environments, whereas UCID and BSD contain generic images.

II Proposed Approach

In this work a deep learning solution for despeckling is proposed. It relies on deep convolutional neural networks and on their ability to predict the noise and to provide a filtered image in which spatial and statistical details are preserved.

II-A Convolutional Neural Networks

A CNN is composed of several layers, connected in different ways (cascade, parallel, loop). Each layer can perform a different function: convolution, pooling, or a non-linearity.

A generic layer provides a set of $M$ so-called feature maps. The higher the level of the layer, the more abstract its output and the more representative it is of the overall interaction between layers. The generic $l$-th convolutional layer, for an $N$-band input $\mathbf{x}^{(l)}$, yields an $M$-band output $\mathbf{z}^{(l)}$

$$\mathbf{z}^{(l)} = \mathbf{w}^{(l)} * \mathbf{x}^{(l)} + \mathbf{b}^{(l)}, \tag{3}$$

whose $m$-th component is a combination of 2D convolutions:

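$$\mathbf{z}^{(l)}(m,\cdot,\cdot) = \sum_{n=1}^{N} \mathbf{w}^{(l)}(m,n,\cdot,\cdot) * \mathbf{x}^{(l)}(n,\cdot,\cdot) + \mathbf{b}^{(l)}(m). \tag{4}$$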

The tensor $\mathbf{w}$ is a set of $M$ convolutional $N\times(K\times K)$ kernels, with a $K\times K$ spatial support (receptive field), while $\mathbf{b}$ is an $M$-vector of biases. These parameters, $\Phi_{l}\triangleq\left(\mathbf{w}^{(l)},\mathbf{b}^{(l)}\right)$, are learnt during the training phase. In this work we use a pointwise ReLU activation function $g_{l}(\cdot)\triangleq\max(0,\cdot)$, yielding the intermediate layer outputs

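$$\mathbf{y}^{(l)} \triangleq f_{l}(\mathbf{x}^{(l)}, \Phi_{l}) = \begin{cases} \max(0, \mathbf{w}^{(l)} * \mathbf{x}^{(l)} + \mathbf{b}^{(l)}), & l < L \\ \mathbf{w}^{(l)} * \mathbf{x}^{(l)} + \mathbf{b}^{(l)}, & l = L \end{cases} \tag{5}$$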

whose concatenation gives the overall CNN function

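$$f(\mathbf{x}, \Phi) = f_{L}(f_{L-1}(\dots f_{1}(\mathbf{x}, \Phi_{1}), \dots, \Phi_{L-1}), \Phi_{L}) \tag{6}$$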

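where $\Phi\triangleq(\Phi_{1},\ldots,\Phi_{L})$ is the whole set of parameters to learn. Note that in this section $L$ denotes the number of layers, not the number of looks used in Section I.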
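The layer equations above can be illustrated with a minimal pure-Python sketch of a single convolutional layer. It is not the authors' implementation: the function name `conv_layer`, the nested-list data layout and the absence of padding ("valid" support) are assumptions made for the example, and the kernel is applied without flipping (cross-correlation), as deep learning frameworks commonly do.

```python
def conv_layer(x, w, b, relu=True):
    """Apply one convolutional layer to x.

    x: list of N bands, each an H x W list of lists
    w: M x N x K x K nested list of kernel weights
    b: list of M biases
    Returns M bands of size (H-K+1) x (W-K+1) (no padding, assumed here).
    """
    n_bands, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    out = []
    for m in range(len(w)):
        band = []
        for i in range(height - k + 1):
            row = []
            for j in range(width - k + 1):
                # Sum over the N input bands and the K x K support, plus bias
                acc = b[m]
                for n in range(n_bands):
                    for u in range(k):
                        for v in range(k):
                            acc += w[m][n][u][v] * x[n][i + u][j + v]
                # ReLU on intermediate layers, identity on the last one
                row.append(max(0.0, acc) if relu else acc)
            band.append(row)
        out.append(band)
    return out


# Hypothetical input: one 4x4 band of ones, one 3x3 kernel of ones, bias -10
x = [[[1.0] * 4 for _ in range(4)]]
w = [[[[1.0] * 3 for _ in range(3)]]]
hidden = conv_layer(x, w, [-10.0], relu=True)   # negative sums are clipped to 0
last = conv_layer(x, w, [-10.0], relu=False)    # linear output layer
print(hidden, last)
```

With `relu=True` the function plays the role of an intermediate layer ($l < L$), and with `relu=False` that of the final linear layer ($l = L$).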

In the proposed solution, the network (Fig. 2) is composed of 10 convolutional layers; each of them, except the first and the last, is followed by a Rectified Linear Unit (ReLU) activation to ensure fast convergence. The network takes as input a single-band image $Y$ affected by speckle noise, and its overall output is the filtered version of that image

$$\hat{X} = f(Y, \Phi) \tag{7}$$
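As a rough size check, the hyper-parameters listed in Table I (10 layers, $3\times 3$ kernels, 64 feature maps in the hidden layers, single-band input and output) can be turned into a parameter count and a receptive field. The sketch assumes stride 1, no dilation, one bias per output feature map, and ignores batch-normalization parameters; none of these details are stated in the text.

```python
def conv_params(in_bands, out_bands, kernel=3, bias=True):
    """Number of learnable weights (and biases) in one convolutional layer."""
    return in_bands * out_bands * kernel * kernel + (out_bands if bias else 0)


# (input bands, output feature maps) for the 10 layers listed in Table I
layers = [(1, 64)] + [(64, 64)] * 8 + [(64, 1)]
total_params = sum(conv_params(n, m) for n, m in layers)

# Each stride-1 3x3 layer widens the receptive field by 2 pixels
receptive_field = 1 + 2 * len(layers)

print(total_params, receptive_field)  # prints 296641 21
```

Under these assumptions each output pixel depends on a $21\times 21$ neighbourhood of the input, which is well inside the $65\times 65$ training patches.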

II-B Training

The goal of this work is to provide a network for despeckling urban areas. To this aim, the CNN is trained on the Google Maps dataset, which supplies a set of urban images on which speckle is simulated according to (1) and (2). Moreover, to make the network more robust, a set of generic grayscale images from the UCID and BSD datasets is also taken into account for training.
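The simulation step can be sketched as follows: a unit-mean Gamma-distributed speckle field is drawn and multiplied pixel-wise with the clean image. This is only an illustration; the function name `simulate_speckle`, the flat test patch and the choice of a single look are assumptions, since the number of looks used for training is not stated here.

```python
import random


def simulate_speckle(clean, looks=1, seed=None):
    """Return (noisy, noise) for a 2D list `clean` of non-negative intensities.

    Each noise sample follows a Gamma distribution with shape `looks` and
    scale 1/`looks`, so it has unit mean and variance 1/`looks`.
    """
    rng = random.Random(seed)
    noise = [[rng.gammavariate(looks, 1.0 / looks) for _ in row] for row in clean]
    noisy = [[x * n for x, n in zip(x_row, n_row)]
             for x_row, n_row in zip(clean, noise)]
    return noisy, noise


# Hypothetical 64x64 flat patch of intensity 100 (not taken from the datasets)
clean = [[100.0] * 64 for _ in range(64)]
noisy, noise = simulate_speckle(clean, looks=1, seed=0)

flat = [n for row in noise for n in row]
mean_noise = sum(flat) / len(flat)
print(f"mean of simulated speckle: {mean_noise:.3f}")
```

Because the speckle has unit mean, the noisy patch keeps the average intensity of the clean one, while its pixel-wise fluctuations grow with the underlying intensity.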

Training is performed with Stochastic Gradient Descent (SGD) with momentum, with learning rate $\eta=2\cdot 10^{-6}$, on $30000\times(65\times 65)$ training patches and $12000\times(65\times 65)$ validation patches.
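For reference, one common form of the SGD-with-momentum update is sketched below on plain lists of floats. Only the learning rate comes from the text; the momentum coefficient of 0.9, the function name `sgd_momentum_step` and the specific form of the update (velocity accumulates the scaled negative gradient) are assumptions, and other frameworks use slightly different but related formulations.

```python
def sgd_momentum_step(params, grads, velocity, lr=2e-6, momentum=0.9):
    """Return updated (params, velocity) after one SGD-with-momentum step.

    v <- momentum * v - lr * grad
    p <- p + v
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Toy problem (hypothetical): minimise C(p) = p^2, whose gradient is 2p
params, velocity = [1.0], [0.0]
for _ in range(3):
    grads = [2.0 * p for p in params]
    params, velocity = sgd_momentum_step(params, grads, velocity)
print(params)
```

With such a small learning rate each step moves the parameters only slightly, which is why the momentum term matters for accumulating progress over many iterations.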

The cost function $C(\cdot)$ computes the distance between the output and the reference; according to its value, the parameters $\Phi$ of the network are updated via the SGD optimization process

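$$C = \lambda C_{1} + C_{2} \tag{8}$$
$$C_{1} = \mathrm{SID}\left(\frac{Y}{\hat{X}}, \frac{Y}{X}\right) = \mathrm{SID}(\hat{N}, N) \tag{9}$$
$$C_{2} = \|\hat{X} - X\|^{2} \tag{10}$$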

In this work $C(\cdot)$ is a linear combination of two terms: $C_{2}$ is the mean squared error between the filtered image and the noise-free reference; $C_{1}$ computes a single-band adaptation of the Spectral Information Divergence (SID) [13] between the estimated ratio image $\hat{N}$ and the reference one $N$. Using $C_{2}$ minimizes the spatial distance between $\hat{X}$ and $X$. Minimizing $C_{1}$ makes the network able to predict the speckle noise and preserve its statistical properties. The aim of this cost function is twofold: first, the network has to predict the clean image directly; second, it has to respect the statistical properties of the noise, removing only the speckle and not the spatial details of the noisy image.
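A possible reading of this cost function is sketched below. The exact single-band adaptation of SID is not spelled out in the text, so the sketch assumes one plausible form: each ratio image is normalised to sum to one and the symmetric Kullback-Leibler divergence between the two is taken. The weight `lam=1.0`, the small constant `eps` and the use of the mean (rather than the sum) of squared errors for $C_{2}$ are likewise assumptions, and all images are assumed strictly positive.

```python
import math


def sid(a, b, eps=1e-12):
    """Symmetric KL divergence between two positive images (flat lists),
    each normalised to sum to one (assumed single-band adaptation of SID)."""
    sum_a, sum_b = sum(a), sum(b)
    p = [v / sum_a + eps for v in a]
    q = [v / sum_b + eps for v in b]
    return sum(pi * math.log(pi / qi) + qi * math.log(qi / pi)
               for pi, qi in zip(p, q))


def cost(x_hat, x, y, lam=1.0, eps=1e-12):
    """C = lam * C1 + C2 for flat lists of pixel intensities."""
    n_hat = [yi / (xh + eps) for yi, xh in zip(y, x_hat)]  # estimated speckle
    n_ref = [yi / (xi + eps) for yi, xi in zip(y, x)]      # reference speckle
    c1 = sid(n_hat, n_ref)
    c2 = sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x)) / len(x)
    return lam * c1 + c2


# Hypothetical 4-pixel example
x = [10.0, 20.0, 30.0, 40.0]            # clean reference
n = [0.5, 1.5, 1.0, 2.0]                # speckle
y = [xi * ni for xi, ni in zip(x, n)]   # noisy observation
x_hat = [12.0, 18.0, 33.0, 36.0]        # imperfect estimate
print(cost(x, x, y), cost(x_hat, x, y))
```

A perfect estimate drives both terms to zero, while any deviation is penalised twice: once in the image domain through $C_{2}$ and once in the ratio (noise) domain through $C_{1}$.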

III Experimental results

To assess the performance in an urban environment, the proposed solution is tested on Google Maps images that the network has never seen during training. Figs. 3 and 4 show a comparison with PPB, one of the best-known state-of-the-art despeckling solutions. Although the PPB-filtered images look very clean, the proposed solution better preserves spatial details and gives a result closer to the reference. The network removes the noise while preserving spatial details that tend to disappear with PPB. PPB works well on large-scale objects such as large buildings and roads, but its overall result tends to be over-smoothed, so most small-scale objects are filtered out. The proposed solution generalizes across object scales: it removes the noise while saving spatial details at different scales. Indeed, cars and trees are still visible in Fig. 3, as are the reconstructed roofs in Fig. 4. Since a despeckling solution can be used as a pre-processing step for other tasks such as classification and object detection, preserving objects at different scales plays a very important role in the assessment of performance.

Moreover, numerical results are shown in Tab. II. For the numerical assessment the M-index [14] has been computed: this index takes into account the filtering accuracy both in regularizing homogeneous areas, by computing the Equivalent Number of Looks (ENL), and in preserving structures and details, by computing the homogeneity of the ratio images. An ideal filter would produce an M-index equal to zero, so lower values are better. The values in Tab. II (5.59 and 6.55 for the proposed solution against 10.65 and 10.27 for PPB on clip1 and clip2) confirm the visual comparison.
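The ENL ingredient of this assessment can be illustrated with its standard estimator on an intensity image, the squared mean divided by the variance over a homogeneous area. The sketch below does not compute the M-index itself, whose full definition is given in [14]; the simulated region and its parameters are hypothetical.

```python
import random


def enl(region):
    """Equivalent Number of Looks of a flat list of intensities: mean^2 / variance."""
    n = len(region)
    mean = sum(region) / n
    var = sum((v - mean) ** 2 for v in region) / (n - 1)
    return mean * mean / var


rng = random.Random(0)
looks = 4
# Hypothetical homogeneous area: constant reflectivity 50 under 4-look speckle
region = [50.0 * rng.gammavariate(looks, 1.0 / looks) for _ in range(10000)]
print(f"ENL of unfiltered region: {enl(region):.2f}")
```

On unfiltered data the estimate should be close to the nominal number of looks; after despeckling, a higher ENL over the same area indicates stronger smoothing of homogeneous regions.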

The same considerations hold for real data: Fig. 5 shows results on a real SAR image. Without a reference it is difficult to state the quality of a filter, so together with the filtered images (top row) we also show the ratio between the noisy and the filtered image (bottom row). Even though Tab. III shows a better M-index for PPB (7.29 against 8.36), in this case too the proposed solution preserves details better than PPB, which again tends to produce an over-smoothed image. The ratio images make clear that PPB suppresses many details, while the proposed solution has some difficulty in filtering strong scatterers.
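To make the role of the ratio image concrete, the toy sketch below computes $Y/\hat{X}$ for a tiny patch in which the filter has smoothed away an edge. The values and the helper name `ratio_image` are hypothetical; the point is only that structure removed by the filter reappears in the ratio, whereas an ideal filter would leave a structureless, unit-mean speckle field.

```python
def ratio_image(noisy, filtered, eps=1e-12):
    """Pixel-wise ratio Y / X_hat for two 2D lists of positive intensities."""
    return [[y / (f + eps) for y, f in zip(y_row, f_row)]
            for y_row, f_row in zip(noisy, filtered)]


# Hypothetical 2x2 example: a horizontal edge that the filter has blurred away
noisy = [[10.0, 10.0], [40.0, 40.0]]
filtered = [[20.0, 20.0], [20.0, 20.0]]  # over-smoothed: edge removed
ratio = ratio_image(noisy, filtered)
print(ratio)
```

Here the edge between the two rows, absent from the filtered patch, is clearly visible in the ratio (about 0.5 above and 2 below).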

IV Conclusion and Future Works

In this work a deep convolutional neural network for despeckling in urban areas is proposed. The network is trained and tested on simulated data, with an additional test on a real SAR image. Moreover, the CNN is trained to predict both the clean image and the noise, in order to ensure spatial and statistical consistency in the filtered image. The results are encouraging: the estimated clean images show good detail preservation and do not seem to introduce spatial artefacts in homogeneous areas. In future work, the potential of CNNs for despeckling with unsupervised learning will be explored, in order to avoid the need for a clean reference.

Bibliography

[1] A. Lapini, F. Argenti, and L. Alparone, “A tutorial on speckle reduction in synthetic aperture radar images,” IEEE Geosci. Remote Sens. Mag., vol. 1, no. 3, pp. 6–35, 2013.
[2] R. Touzi, “A review of speckle filtering in the context of estimation theory,” IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 11, pp. 2392–2404, Nov 2002.
[3] J. Lee, “Digital image enhancement and noise filtering by use of local statistics,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-2, no. 2, pp. 165–168, March 1980.
[4] A. Lopes, R. Touzi, and E. Nezry, “Adaptive speckle filters and scene heterogeneity,” IEEE Transactions on Geoscience and Remote Sensing, vol. 28, no. 6, pp. 992–1000, Nov 1990.
[5] D. T. Kuan, A. A. Sawchuk, T. C. Strand, and P. Chavel, “Adaptive noise smoothing filter for images with signal-dependent noise,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-7, no. 2, pp. 165–177, March 1985.
[6] C. A. Deledalle, L. Denis, and F. Tupin, “Iterative weighted maximum likelihood denoising with probabilistic patch-based weights,” IEEE Transactions on Image Processing, vol. 18, no. 12, pp. 2661–2672, Dec 2009.
[7] S. Parrilli, M. Poderico, C. V. Angelino, and L. Verdoliva, “A nonlocal SAR image denoising algorithm based on LLMMSE wavelet shrinkage,” IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 2, pp. 606–616, Feb 2012.
[8] C. Deledalle, L. Denis, F. Tupin, A. Reigber, and M. Jäger, “NL-SAR: A unified nonlocal framework for resolution-preserving (Pol)(In)SAR denoising,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 4, pp. 2021–2038, April 2015.