A CNN-Based Super-Resolution Technique for Active Fire Detection on   Sentinel-2 Data

Massimiliano Gargiulo; Domenico Antonio Giuseppe Dell'Aglio; Antonio; Iodice; Daniele Riccio; and Giuseppe Ruello

arXiv:1906.10413·cs.CV·June 26, 2019

A CNN-Based Super-Resolution Technique for Active Fire Detection on Sentinel-2 Data

Massimiliano Gargiulo, Domenico Antonio Giuseppe Dell'Aglio, Antonio, Iodice, Daniele Riccio, and Giuseppe Ruello

PDF

TL;DR

This paper introduces a CNN-based super-resolution method to enhance Sentinel-2 multispectral data, especially SWIR bands, for improved active fire detection at higher spatial resolution, validated on a fire-affected area.

Contribution

A novel CNN-based super-resolution technique for Sentinel-2 data that improves fire detection accuracy by increasing spatial resolution of key spectral bands.

Findings

01

Outperforms alternative super-resolution methods in accuracy metrics

02

Enhances active fire detection capabilities using super-resolved bands

03

Validated on Vesuvius fire-affected region

Abstract

Remote Sensing applications can benefit from a relatively fine spatial resolution multispectral (MS) images and a high revisit frequency ensured by the twin satellites Sentinel-2. Unfortunately, only four out of thirteen bands are provided at the highest resolution of 10 meters, and the others at 20 or 60 meters. For instance the Short-Wave Infrared (SWIR) bands, provided at 20 meters, are very useful to detect active fires. Aiming to a more detailed Active Fire Detection (AFD) maps, we propose a super-resolution data fusion method based on Convolutional Neural Network (CNN) to move towards the 10-m spatial resolution the SWIR bands. The proposed CNN-based solution achieves better results than alternative methods in terms of some accuracy metrics. Moreover we test the super-resolved bands from an application point of view by monitoring active fire through classic indices. Advantages and…

Tables1

Table 1. Table 1: In the left part of the table: average results in terms of main metrics (at 20-m), typically used in pansharpening and super-resolution context. In the right part of the table: average results in terms of classification metrics.

Methods	SAM	Q-index	ERGAS	HCC	Precision	Recall	IoU
	(0)	(1)	(0)	(1)	(1)	(1)	(1)
NNI	0.001960	0.9182	9.353	0.1355	0.8329	0.5773	0.5309
Bicubic	0.001964	0.9515	7.155	0.471	0.8387	0.5900	0.5471
HPF [26]	0.064590	0.9405	8.150	0.2826	0.7799	0.5991	0.5476
PRACS [25]	0.001979	0.9535	7.057	0.5117	0.7993	0.5987	0.5497
GS2-GLP [25]	0.050190	0.9540	7.043	0.4694	0.8008	0.6131	0.5571
SRNN [13]	0.001963	0.9688	5.943	0.6246	0.8373	0.5649	0.5158
$S R N N_{+}$ (Proposed)	0.001956	0.9743	5.425	0.6334	0.8414	0.5642	0.5157

Equations6

y^{(l)} = w^{(l)} * x^{(l)} + b^{(l)} .

y^{(l)} = w^{(l)} * x^{(l)} + b^{(l)} .

L (Φ^{(n)}) = E [x - \hat{x} (Φ^{(n)})_{1}]

L (Φ^{(n)}) = E [x - \hat{x} (Φ^{(n)})_{1}]

A F I_{1} = \frac{ρ _{12}}{ρ _{8}} A F I_{2} = \frac{ρ _{11}}{ρ _{8}} A F I_{3} = \frac{ρ _{12}}{ρ _{11}}

A F I_{1} = \frac{ρ _{12}}{ρ _{8}} A F I_{2} = \frac{ρ _{11}}{ρ _{8}} A F I_{3} = \frac{ρ _{12}}{ρ _{11}}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

A CNN-Based Super-Resolution Technique for Active Fire Detection on Sentinel-2 Data

M. Gargiulo , D. A. G. Dell’Aglio , A. Iodice , D. Riccio , and G. Ruello

University of Naples “Federico II”, Italy

Abstract— Remote Sensing applications can benefit from a relatively fine spatial resolution multispectral (MS) images and a high revisit frequency ensured by the twin satellites Sentinel-2. Unfortunately, only four out of thirteen bands are provided at the highest resolution of 10 meters, and the others at 20 or 60 meters. For instance the Short-Wave Infrared (SWIR) bands, provided at 20 meters, are very useful to detect active fires. Aiming to a more detailed Active Fire Detection (AFD) maps, we propose a super-resolution data fusion method based on Convolutional Neural Network (CNN) to move towards the 10-m spatial resolution the SWIR bands. The proposed CNN-based solution achieves better results than alternative methods in terms of some accuracy metrics. Moreover we test the super-resolved bands from an application point of view by monitoring active fire through classic indices. Advantages and limits of our proposed approach are validated on specific geographical area (the mount Vesuvius, close to Naples) that was damaged by widespread fires during the summer of 2017.

1 Introduction

The remote sensing products are exploiting more and more in the earth monitoring because of the increasing number of satellites [1]. The European Space Agency has recently launched the twin satellites Sentinel-2 which can acquire global data for different applications such as risk management (floods, subsidence, landslide), land monitoring, water management, soil protection and so forth [2]. Sentinel-2 data are even useful in burnt area and active fire monitoring, using several algorithms [3]. A plethora of these is essentially based on the threshold of spectral indices involving Near-Infrared (NIR) and Short-Wave Infrared (SWIR) bands [4, 5, 6] that Sentinel-2 provides at spatial resolution of 10 m and 20 m, respectively. Therefore, it is common to resort to the 20-m resolution indices, by just downscaling the NIR band from 10 m to 20 m. However, following this approach spatial information from NIR band would be lost. An alternative approach to enhance the AFD method using the Sentinel-2 images is to produce the Active Fire Indices (AFIs) by upscaling the SWIR bands from 20 m to 10 m. Beyond the shadow of a doubt the main issue is the choice of the method to improve the spatial resolution of the SWIR bands. In general, Single Image Super Resolution (SISR) and Super Resolution Data Fusion (SRDF) methods are the two most popular ways to increase the spatial resolution of the images. The SISR methods do not use additional information from other sources, and they rely on spatial features of original image to increase its own resolution. On the other hand, SRDF methods (for instance, pan-sharpening) are based on the idea that the spatial information from other sources is useful to improve the spatial resolution of the original image [7]. In order to produce the 10-m AFIs from Sentinel-2 bands with SRDF methods, the use of all the highest spatial resolution bands is not very benefiting because of their smoke-sensitivity. The major contribution is derived from the NIR which is the only band we consider in the SRDF approaches.

The rest of the paper is organized as follows. Section 2 describes the study area and the picked dataset. Section 3 gives more details about the methodology, focusing on the proposed CNN-based super-resolution method, hereafter $SRNN_{+}$ , on the spectral fire indices and on the considered accuracy metrics. Section 4 summarizes experimental results, placing more attention on the super-resolved SWIR bands both in terms of visual inspection and numerical analysis while conclusions are drawn in Section 5.

2 Study Area and Dataset

The area under investigation is located at the Vesuvius (in Fig. 1), a volcano close to Naples,Italy. We are motivated by the choice of the study area since the presence of a natural park with a huge variety of flora and fauna considering its limited size [8]. At the beginning of July 2017, hundreds of wildfires ignited and damaged across the Italy country, whose the most serious were at Vesuvius. In fact, fires had been interesting the Vesuvius area for several days and the situation quickly became more dangerous due to adverse climatic conditions (winds and dry weather) [9]. The considered dataset is the Sentinel-2 Level-1C product acquired on 12th July 2017. As we can see in Fig. 1, the area under investigation is mainly covered by heavy smoke (Fig. 1-(b)) which reduces the usability of 10-m spectral information.

3 Methodology

3.1 Proposed CNN-based Super-Resolution Fusion

Our goal is to improve the spatial resolution of SWIR bands using a Convolutional Neural Network (CNN). CNNs have attracted an increasing interest in many remote sensing applications, like object detection [10], classification [11], pansharpening [12], and others, because of their capability to approximate complex non-linear functions, benefiting from the reduction in computation time obtained thanks to the GPU usage. On the downside the availability of a large amount of data is required for training. In this work we propose to use a relatively shallow architecture. This is composed by a cascade of $L=3$ convolutional layers. The first two are interleaved by Rectified Linear Unit (ReLU) activations that ensure fast convergence of the training process [11], and a linear activation function is considered in the last layer. The $l$ -th ( $1\leq l\leq 3$ ) convolutional layer, with $N$ -band input $\mathbf{x}^{(l)}$ , yields an $M$ -band output $\mathbf{y}^{(l)}$

[TABLE]

In $l=1$ case, the $\mathbf{x}^{(l)}$ input is equal to the input, instead in $l=3$ case, the $\mathbf{y}^{(l)}$ is the output of the CNN. The tensor $\mathbf{w}$ is a set of $M$ convolutional $N\times(K\times K)$ filters, where a $K\times K$ is the receptive field, while $\mathbf{b}$ is a $M$ -vector bias. These parameters, $\Phi_{l}\triangleq\left(\mathbf{w}^{(l)},\mathbf{b}^{(l)}\right)$ , are refined during the training phase. Further information about the CNN architecture can be found in [13]. In the supervised learning we need to generate a large amount of training samples, i.e. examples of inputs-target pairs. As reported in the pansharpening case [14] the training samples have to ensue the Wald’s protocol, that means to consider as inputs the downsampled PAN-MS pairs and taking as corresponding output the original MS.

This approach has inspired our study where the highest spatial resolution bands play the role of the PAN and the MS is acted by SWIR. In our case we consider the training samples $\mathbf{x}^{(1)}$ = ( $\mathbf{\tilde{x}}$ , $\mathbf{\tilde{z}}$ ) as input to the network, where $\mathbf{\tilde{x}}$ and $\mathbf{\tilde{z}}$ are respectively the lower resolution version of the SWIR bands and of the 10-m bands provided by Sentinel-2, instead we consider the sharpened SWIR bands as output ( $\mathbf{y}^{(3)}$ = $\mathbf{x}$ ). Furthermore, the cost function and the learning optimization algorithm are required in the learning phase. In more details we use the L1-norm as cost function, in place of the L2-norm, to be more effective in error back-propagation when the computed errors are very low [12]. Specifically, the loss is computed on the cost function by averaging over the training examples at each updating step of the learning process:

[TABLE]

where $\mathbf{x}$ represents the target and $\hat{\mathbf{x}}$ the output of the CNN, dependent on the learnable parameters ( $\Phi^{(n)}$ ) . Instead, in this work we use the ADAM optimization method, an adaptive version of the Stochastic Gradient Descent (SGD), and it adapts the learning rates for each parameter of the CNN. This method requires very few tuning [15] and minimizes loss very speedily [16].

3.2 Spectral Fire Indices

The proposed model is evaluated, from the application point of view, by monitoring active fires through the computation of three different spectral indices (in Fig. 2), mainly used to this aim in literature [17, 18, 19] because of their ease computing. The AFIs [17, 18] are defined on Sentinel-2 data as follows:

[TABLE]

where $\rho_{8}$ is the 10-m spatial resolution NIR band, centered at the wavelength of 0.834 $\mu$ m; while $\rho_{11}$ and $\rho_{12}$ are the 20-m SWIR bands, centered at 1.610 $\mu$ m and 2.190 $\mu$ m, respectively. All of these bands represent the radiance data at top of the atmosphere. The choice of these indices is based on their physical properties . Specifically, the conditions $AFI_{1}>1$ and $AFI_{3}>1$ often occur in active fire; while the condition $AFI_{2}<1$ is verified near the fire fronts [19].

3.3 Results Accuracy Metrics

To evaluate the performance when the target image is available (in our case, at 20-m spatial resolution), the proposed method is compared to alternative methods using four reference metrics, commonly used for pansharpening [20]:

Spectral Angular Mapper (SAM) the spectral distortion between pixel of reference image and estimated one [21];

-

Universal Image Quality Index (UIQI, or Q-index), an image quality indicator introduced in [22];

-

Relative Dimensionless Global Error (as known as ERGAS) which reduces to the root mean square error (RMSE) in case of single band [7];

-

High-frequency Correlation Coefficient (HCC), the correlation coefficient between the high-pass components of two images [13].

For a full resolution analysis we consider the active fire monitoring application, and all the methods compete with each others in terms of binary classification. To this end we need to define a ground truth on which computing the main classification metrics. In this context, such ground truth is performed with a differential multi-temporal approach, based on a thresholding of the difference between two cloudily-free realizations of Normalized Difference Vegetation Index (NDVI) in two different date (before and after the fire event). This ground truth (GT) is affected by noise (or small bright pixels) and so we have used a morphological operator (opening) to erase this undesired noisy effect.

Thus, we compare this GT with the active fire maps obtained by thresholding the above-mentioned spectral indices. In our case, the AFIs use the super-resolved bands with the different considered approaches. In particular we consider different thresholds on each of AFIs to match the best detection of the active fires. In order to evaluate the quality of the obtained binary maps, we have considered some metrics, typically used in classification task:

Precision (P) is the ratio between the correctly predicted positive observations and the total predicted positive ones;

-

Recall (R) is the ratio between the correctly predicted positive observations to the all observations in actual class;

-

Intersection over Union (IoU) is the ratio between the overlapping area and the union area. The intersection and the union are computed on the predicted positive observations and the positives from the GT.

It is worthwhile to remember that a high precision corresponds to a low false positive rate. In other words, higher the percentage of correctly predicted positive over the total predicted positive, higher precision. Instead, a high recall corresponds to a low false negative rate, that means higher recall higher detection rate.

4 Results and Discussion

4.1 Training Phase

Given the lack of sufficient available input-output samples in the present context, we start from a pre-trained CNN solution [13] to train the network’s parameters $\Phi^{n}$ . In [13] a super-resolution technique is considered for $\rho_{11}$ band ( $\mathbf{x}=\rho_{11}$ ), and thus we extend to the $\rho_{12}$ band an equivalent solution ( $\mathbf{x}=\rho_{12}$ ). In particular, to create a pre-trained solution for this other band as well we use the identical dataset considered in [13]. After that, we fine-tune the weights of the CNN from Naples in two different dates, close to the target date (specifically June 27th and July 27th). This can be considered as a geographical fine-tuning, because we adapt the weights of the CNN on the geometric features of the study area. Then, we test this fine-tuned solution on the date under investigation (July 12th, 2017). Once left apart the target date for testing, 17 $\times$ 17 patches for training are uniformly extracted from two above-mentioned date in the remaining segments. Overall, 10k patches are collected from the considered dates and randomly partitioned in 80% for training phase and 20% for validation phase. The 8k training patches are grouped in 32-size mini-batches for the implementation of the ADAM-based training. The fine-tuned solution is considered better than the solution from scratch, when a large amount of data for training phase is not available, or when the computing power is not sufficient [23]. Eventually, we minimize the L1-norm cost function, defined in the Methodology section, on the training examples using the ADAM learning algorithm. Thus, we set the ADAM default values $\eta$ = 0.002, $\beta_{1}$ = 0.9, and $\beta_{2}$ = 0.999, as reported in [24]. In this specific case, the training phase requires 200 epochs ( $32\times 200$ weight update) performed in few minutes using GPU cards, while the test can be done in real-time.

4.2 Comparison between Super-Resolution Proposal and SISR/SRDF techniques

In this section, $SRNN_{+}$ is compared to a pre-trained CNN-based method (SRNN), three popular SRDFs adapted to the Sentinel-2/SWIR problem, including GS2-GLP [25], HPF [26] and PRACS [25], and even the SISRs, that are the Nearest Neighbour (NNI) and bicubic interpolation techniques. The numerical results obtained for the area of interest are reported in the left part of the Tab. 1. In the results we consider an average on the SWIR bands. In the top part of the table, the $SRNN_{+}$ is compared to the SISR techniques, and the improvement is very remarkable in the HCC metric which deals with the fact that high frequency components are much affected by the super-resolution and are mostly localized on boundaries. Moving from the top to the bottom of the table, the proposed $SRNN_{+}$ method compares favourably against classical fusion methods, which take information from the additional input band $\rho_{8}$ . In the penultimate row it is given the performance of the pre-trained SRNN model and even in this case the $SRNN_{+}$ performs slightly better results in terms of all the metrics at 20-m resolution. As it can be seen the additional fine-tuning, although having few training patches, provide a further gain. To conclude this section we show in Figg. 4-3 some sample results (at 10-m without reference) which further confirm the effectiveness of the proposed method.

4.3 Comparison between Different AFIs and Maps

Once active fire ( $AF$ ) is detected by considering the followed rules: $AFD=AFI_{k}>\alpha$ , where $k\in\{1,2,3\}$ , the performance is computed in terms of Precision, Recall and IoU and reported on the right-hand side of Tab. 1. The numerical results confirmed the effectiveness of the proposal, and the Fig.5-6 further confirm the superiority of the proposed method. As we can see in Tab. 1, the $SRNN_{+}$ have the best performance in terms of precision metric. In particular, its values are much greater than those relating to the classic techniques, demonstrating that it benefits from the joint information obtained from the visible bands. The low false alarm rate aspect is well visible in Fig.6, where an urban detail included in the study area is shown. The Fig.6 only refers to $AFI_{2}$ , but similar results are provided by the other indices analysed. On the other hand, both in terms of recall and IoU measures, the proposal have worse performance. We suppose this are mainly due to the ground truth used in validation, which probably over-estimates the areas interested by fires. In fact, as we can see in the central column of the figure 5, the $SRNN_{+}$ AFIs better define these areas, resulting lighter and thinner then that ones obtained by other techniques. Furthermore we can observe from visual inspection that the boundaries are more evident considering $\rho_{12}$ and $\rho_{11}$ than $\rho_{12}$ and $\rho_{8}$ . In general, even though this determines a low detection rate on the maps obtained by the $AFI_{3}$ with respect to $AFI_{1}$ .

5 Conclusion

In this work we propose the $SRNN_{+}$ to further enhance the spatial resolution of the Sentinel-2 SWIR bands. For the specific goal (AFD) we fine-tune the weights of the CNN on the geographic study area and then we test the proposed approach both in terms of visual quality assessment and AFD capability. Eventually we show very promising results in terms of super-resolution metrics and even in AFD. The achieved results encourage us to exploit different architectural choices and/or learning strategies, and extending this approach to other remote sensing applications.

Bibliography26

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Neha Joshi, Matthias Baumann, Andrea Ehammer, Rasmus Fensholt, Kenneth Grogan, Patrick Hostert, Martin Jepsen, Tobias Kuemmerle, Patrick Meyfroidt, Edward Mitchard, et al., “A review of the application of optical and radar remote sensing data fusion to land use mapping and monitoring,” Remote Sensing , vol. 8, no. 1, pp. 70, 2016.
2[2] M. Drusch et al., “Sentinel-2: Esa’s optical high-resolution mission for gmes operational services,” Remote Sensing of Environment , vol. 120, no. Supplement C, pp. 25 – 36, 2012, The Sentinel Missions - New Opportunities for Science.
3[3] Astrid Verhegghen, Hugh Eva, Guido Ceccherini, Frederic Achard, Valery Gond, Sylvie Gourlet-Fleury, and Paolo Cerutti, “The potential of Sentinel satellites for burnt area mapping and monitoring in the Congo Basin forests,” Remote Sensing , vol. 8, no. 12, pp. 986, 2016.
4[4] L Cicala, CV Angelino, N Fiscante, and SL Ullo, “Landsat-8 and Sentinel-2 for fire monitoring at a local scale: A case study on Vesuvius,” in 2018 IEEE International Conference on Environmental Engineering (EE) . IEEE, 2018, pp. 1–6.
5[5] José Pereira, Emilio Chuvieco, A Beudoin, and N Desbois, “Remote sensing of burned areas: A review. A review of remote sensing methods for the study of large wildland fires,” Departamento de Geografía, Universidad de Alcalá , pp. 127–184, 01 1997.
6[6] Yogesh Kant and K. V. S. Badarinath, “Studies on land surface temperature over heterogeneous areas using AVHRR data,” International Journal of Remote Sensing , vol. 21, no. 8, pp. 1749–1756, 2000.
7[7] Lucien Wald, Data Fusion. Definitions and Architectures - Fusion of Images of Different Spatial Resolutions , Presses de l’Ecole, Ecole des Mines de Paris, Paris, France, 2002, ISBN 2-911762-38-X.
8[8] Lucio Mascolo, Maurizio Sarti, Ferdinando Nunziata, and Maurizio Migliaccio, “Vesuvius national park monitoring by COSMO-Sky Med Ping Pong data analysis,” in ESA Special Publication , 2013, vol. 713.