TL;DR
This paper introduces a novel deep learning-based method for multiple-image super-resolution, leveraging image fusion and neural networks to improve resolution beyond existing single-image and multi-image techniques.
Contribution
It presents a new deep learning approach that combines multiple-image fusion with low-to-high resolution mapping, outperforming current state-of-the-art methods.
Findings
Outperforms existing SRR methods in experiments
Effective fusion of multiple low-resolution images
Deep learning enhances super-resolution accuracy
Table I: Reconstruction accuracy and processing times for artificially-degraded (AD) images.

| Algorithm | IFC | PSNR | PSNRHF | PSNRLS | SSIM | UIQI | VIF | Time (s) |
|---|---|---|---|---|---|---|---|---|
| SR-DWT [4] | 2.281 | 28.833 | 40.580 | 38.613 | 0.813 | 0.757 | 0.458 | 4 |
| ResNet [15] | 2.517 | 28.773 | 34.038 | 33.470 | 0.823 | 0.749 | 0.453 | 30 |
| GPA [19] | 2.436 | 28.054 | 32.924 | 32.522 | 0.792 | 0.712 | 0.422 | 15 |
| SR-ADE [20] | 2.289 | 27.237 | 32.049 | 31.742 | 0.756 | 0.666 | 0.378 | 17 |
| EvoIM [11] | 3.190 | 31.185 | 39.067 | 38.166 | 0.863 | 0.801 | 0.561 | 4 |
| EvoNetA | 2.979 | 32.929 | 41.522 | 41.437 | 0.919 | 0.864 | 0.596 | 161 |
| EvoNet | 3.256 | 35.065 | 44.839 | 44.645 | 0.948 | 0.902 | 0.661 | 118 |

EvoNetA: image registration performed for ResNet outputs.
Table II: Reconstruction accuracy for real satellite (RS) images with an HR reference, including scores averaged over the three scenes.

| Scene | Algorithm | IFC | PSNR | PSNRHF | PSNRLS | SSIM | UIQI | VIF |
|---|---|---|---|---|---|---|---|---|
| Sydney | SR-DWT [4] | 1.146 | 14.883 | 32.306 | 29.717 | 0.345 | 0.284 | 0.125 |
| | ResNet [15] | 1.070 | 14.533 | 32.609 | 31.029 | 0.292 | 0.176 | 0.105 |
| | GPA [19] | 1.191 | 16.619 | 31.928 | 30.710 | 0.398 | 0.236 | 0.121 |
| | SR-ADE [20] | 1.375 | 17.250 | 30.349 | 29.289 | 0.467 | 0.302 | 0.132 |
| | EvoIM [11] | 1.271 | 16.384 | 34.657 | 32.560 | 0.429 | 0.314 | 0.129 |
| | EvoNet | 1.387 | 16.722 | 34.349 | 32.607 | 0.487 | 0.334 | 0.139 |
| Bushehr | SR-DWT [4] | 1.032 | 15.432 | 36.475 | 34.403 | 0.344 | 0.199 | 0.087 |
| | ResNet [15] | 1.194 | 15.481 | 37.072 | 35.997 | 0.424 | 0.233 | 0.098 |
| | GPA [19] | 1.285 | 14.827 | 35.135 | 34.168 | 0.474 | 0.253 | 0.114 |
| | SR-ADE [20] | 1.185 | 14.704 | 33.804 | 32.963 | 0.458 | 0.218 | 0.102 |
| | EvoIM [11] | 1.134 | 14.470 | 37.237 | 35.956 | 0.362 | 0.227 | 0.098 |
| | EvoNet | 1.261 | 14.528 | 36.878 | 35.739 | 0.433 | 0.261 | 0.109 |
| Bandar Abbas | SR-DWT [4] | 1.031 | 18.697 | 36.021 | 34.542 | 0.419 | 0.221 | 0.092 |
| | ResNet [15] | 1.395 | 19.385 | 38.714 | 37.657 | 0.561 | 0.292 | 0.130 |
| | GPA [19] | 1.419 | 16.414 | 35.736 | 34.900 | 0.551 | 0.292 | 0.140 |
| | SR-ADE [20] | 1.305 | 16.187 | 33.381 | 32.634 | 0.521 | 0.249 | 0.124 |
| | EvoIM [11] | 1.148 | 16.068 | 37.158 | 35.909 | 0.414 | 0.255 | 0.114 |
| | EvoNet | 1.494 | 16.226 | 39.350 | 38.162 | 0.527 | 0.318 | 0.153 |
| Mean scores | SR-DWT [4] | 1.070 | 16.337 | 34.934 | 32.887 | 0.369 | 0.234 | 0.101 |
| | ResNet [15] | 1.220 | 16.467 | 36.132 | 34.894 | 0.426 | 0.234 | 0.111 |
| | GPA [19] | 1.299 | 15.953 | 34.266 | 33.259 | 0.474 | 0.260 | 0.125 |
| | SR-ADE [20] | 1.288 | 16.047 | 32.512 | 31.629 | 0.482 | 0.256 | 0.119 |
| | EvoIM [11] | 1.184 | 15.641 | 36.351 | 34.809 | 0.402 | 0.265 | 0.114 |
| | EvoNet | 1.381 | 15.825 | 36.859 | 35.503 | 0.482 | 0.304 | 0.134 |
Deep Learning for Multiple-Image Super-Resolution
Michal Kawulok
Pawel Benecki
Szymon Piechaczek
Krzysztof Hrynczenko
Daniel Kostrzewa
and Jakub Nalepa
The reported work was funded by the European Space Agency (SuperDeep project, realized by Future Processing). MK and JN were partially supported by the National Science Centre under Grant DEC-2017/25/B/ST6/00474. PB and DK were supported by the Silesian University of Technology, Poland, funds no. BKM-509/RAu2/2017. M. Kawulok, P. Benecki, S. Piechaczek, K. Hrynczenko, D. Kostrzewa, and J. Nalepa are with Future Processing, Gliwice, Poland and with the Silesian University of Technology, Gliwice, Poland (e-mail: [email protected]).
Abstract
Super-resolution reconstruction (SRR) is a process aimed at enhancing the spatial resolution of images, either from a single observation, based on the learned relation between low and high resolution, or from multiple images presenting the same scene. SRR is particularly valuable when it is infeasible to acquire images at the desired resolution, but many images of the same scene are available at lower resolution, which is inherent to a variety of remote sensing scenarios. Recently, we have witnessed substantial improvement in single-image SRR attributed to the use of deep neural networks for learning the relation between low and high resolution. Importantly, deep learning has not been exploited for multiple-image SRR, which benefits from information fusion and in general allows for achieving higher reconstruction accuracy. In this letter, we introduce a new method which combines the advantages of multiple-image fusion with learning the low-to-high resolution mapping using deep networks. The reported experimental results indicate that our algorithm outperforms the state-of-the-art SRR methods, including those that operate from a single image, as well as those that perform multiple-image fusion.
Index Terms:
Super-resolution, deep learning, convolutional neural networks, image processing
I Introduction
Super-resolution reconstruction (SRR) is aimed at generating a high-resolution (HR) image from a single or multiple low-resolution (LR) observations. In many cases, SRR algorithms are the only way to obtain images of sufficient spatial resolution, as HR data may not be available due to high acquisition costs or sensor limitations. Such situations are inherent to remote sensing, in particular to satellite imaging for Earth observation purposes.
The existing approaches towards SRR can be categorized into single-image and multiple-image methods. The former learn the LR-HR relation from a large number of examples. This relation allows us to reconstruct an HR image from an LR scene (unseen during training). Multiple-image SRR is based on information fusion, which benefits from the differences (mainly subpixel shifts) between LR images. In general, these approaches allow for more accurate reconstruction than single-image SRR, as they combine more data extracted from the analyzed scene. The recent advancements in deep learning, especially in deep convolutional neural networks (CNNs), have greatly improved single-image SRR. However, correct fusion of multiple LR images still offers higher reconstruction accuracy. Despite that, to the best of our knowledge, deep learning has not been employed for multiple-image SRR.
In this letter, our contribution lies in combining the advantages of single-image SRR based on deep learning with the benefits of information fusion offered by multiple-image reconstruction (Section II presents the related work). We introduce EvoNet (Section III), which employs a deep residual network, more specifically ResNet [15], to enhance the capabilities of the evolutionary imaging model (EvoIM) [11] for multiple-image SRR. The results of our extensive experimental validation (Section IV), focused on satellite imaging, are highly encouraging: they show that EvoNet renders qualitatively and quantitatively better outcomes than the state-of-the-art techniques for single-image and multiple-image SRR.
II Related Work
In this section, we outline the state of the art in multiple-image SRR (Section II-A), and we present the recent advancements in using deep learning for SRR (Section II-B).
II-A Multiple-image super-resolution reconstruction
Existing techniques for multiple-image SRR are based on the premise that each LR observation in a set has been derived from an original HR image, degraded using an assumed imaging model (IM) that usually includes image warping, blurring, decimation, and contamination with noise. The reconstruction consists in reversing that degradation process, which requires solving an ill-posed optimization problem; therefore, most SRR techniques employ some regularization to provide spatial smoothness of the reconstructed HR image. In one of the earliest approaches, Irani and Peleg performed SRR relying on image registration (hence reducing the IM to subpixel shifts) [10]. A hierarchical subpixel displacement estimation was combined with the Bayesian reconstruction in the gradient projection algorithm (GPA) [19]. Another popular optimization technique applied here is the projection onto convex sets [1], which consists in updating the HR target image iteratively based on the error measured between the LR observations and a downsampled version of the reconstruction outcome, degraded using the assumed IM. Farsiu et al. introduced fast and robust super-resolution (FRSR) [8] based on maximum likelihood estimation coupled with simplified regularization. Importantly, the error is measured in the HR coordinates, thus avoiding the expensive scaling operation. Among other methods, the adaptive Wiener filter [9] and Markov random fields [16] were used to specify the IM. Zhu et al. proposed adaptive detail enhancement (SR-ADE) [20] for reconstructing satellite images, in which a bilateral filter is employed to decompose the input images and amplify the high-frequency detail information.
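The imaging model described above (warping, blurring, decimation, noise) can be illustrated with a minimal Python sketch. This is not the IM used by any of the cited methods: the function name `degrade`, the circular integer shift used as the warp, the box blur, and the Gaussian noise are all assumptions chosen to keep the example short.

```python
import numpy as np

def degrade(hr, shift=(0, 0), factor=2, noise_sigma=0.0, rng=None):
    """Simulate one LR observation from an HR image (hypothetical, simplified IM)."""
    rng = np.random.default_rng() if rng is None else rng
    # Warp: circular integer shift on the HR grid; after decimation by
    # `factor` this corresponds to a subpixel shift on the LR grid.
    warped = np.roll(hr, shift, axis=(0, 1))
    # Blur + decimate: average every factor x factor block (a box blur
    # followed by sampling), cropping rows/columns that do not fit.
    h, w = warped.shape
    h, w = h - h % factor, w - w % factor
    blocks = warped[:h, :w].reshape(h // factor, factor, w // factor, factor)
    lr = blocks.mean(axis=(1, 3))
    # Noise: additive zero-mean Gaussian.
    return lr + rng.normal(0.0, noise_sigma, lr.shape)

# Two LR observations of the same scene that differ by a one-pixel HR shift.
hr = np.arange(16, dtype=float).reshape(4, 4)
lr_a = degrade(hr)
lr_b = degrade(hr, shift=(1, 0))
```

Multiple-image SRR tries to invert this process jointly for all observations, which is why the differing shifts between `lr_a` and `lr_b` carry useful information.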
Recently, we proposed the EvoIM method [11, 12], which employs a genetic algorithm to optimize the hyper-parameters that control the IM used in FRSR [8], and to evolve the convolution kernels instead of the Gaussian blur used in FRSR. We showed that the reconstruction process can be effectively adapted to different imaging conditions—in particular, we used Sentinel-2 images at original resolution as LR inputs, and compared the reconstruction outcome with SPOT images presenting the same region.
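To make the idea of evolving hyper-parameters concrete, the following Python sketch shows a generic genetic algorithm over a real-valued parameter vector. It is not the EvoIM procedure from [11, 12]: the function `evolve`, the truncation selection, the uniform crossover, the Gaussian mutation, and the toy fitness are all assumptions made for illustration.

```python
import numpy as np

def evolve(fitness, dim, pop_size=20, generations=50, sigma=0.1, seed=0):
    """Maximize `fitness` over a `dim`-dimensional vector with a simple GA."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.0, 1.0, (pop_size, dim))
    for _ in range(generations):
        scores = np.array([fitness(p) for p in pop])
        # Selection: keep the fitter half unchanged (elitism).
        elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]
        # Crossover: each child mixes the genes of two random elite parents.
        n_children = pop_size - len(elite)
        parents = elite[rng.integers(0, len(elite), (n_children, 2))]
        mask = rng.random((n_children, dim)) < 0.5
        children = np.where(mask, parents[:, 0], parents[:, 1])
        # Mutation: small Gaussian perturbation of every gene.
        children = children + rng.normal(0.0, sigma, children.shape)
        pop = np.vstack([elite, children])
    scores = np.array([fitness(p) for p in pop])
    return pop[np.argmax(scores)], float(scores.max())

# Toy fitness with a known optimum at (0.3, 0.3); in a real setting the
# fitness would score a reconstruction obtained with the candidate parameters.
best, score = evolve(lambda p: -np.sum((p - 0.3) ** 2), dim=2)
```

In an SRR setting the vector could hold both scalar hyper-parameters and the entries of convolution kernels, with an image-quality metric as the fitness.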
II-B Deep learning for single-image super-resolution
Inspired by earlier approaches based on sparse coding [3], Dong et al. proposed super-resolution CNN (SRCNN) [5], followed by its faster version (FSRCNN) [6], for learning the LR-to-HR mapping from a number of LR–HR image pairs. Despite its relatively simple architecture, SRCNN outperforms the state-of-the-art example-based methods. Liebel and Korner have successfully trained SRCNN with Sentinel-2 images, improving its capacity to enhance satellite data [17]. The same architecture was used to improve the spatial resolution of sea surface temperature maps [7]. Kim et al. addressed certain limitations of SRCNN with a very deep super-resolution network [13], which can be efficiently trained relying on fast residual learning. Domain expertise was exploited using a sparse coding network [18], which achieves high training speed and model compactness. Lai et al. proposed deep Laplacian pyramid networks with progressive upsampling [14], aimed at achieving high processing speed. Recently, generative adversarial networks (GANs) have been actively explored for SRR [15]. GANs are composed of a generator (ResNet in [15]), trained to perform SRR, whose outcome is classified by a discriminator, trained to distinguish between the images reconstructed by the generator and the real HR images (used for reference). In this way, the generator is rewarded for generating images that are hard to distinguish from the real ones, and thus it also learns to avoid artifacts.
III The proposed EvoNet algorithm
A flowchart of the proposed method is presented in Fig. 1. First, each of the LR input images is subject to single-image SRR using ResNet. This step produces a set of images whose dimensions are larger than those of the LR inputs. In parallel, the LR input set undergoes image registration to determine subpixel shifts between the images. The single-image SRR outcomes, alongside the subpixel shifts, allow for composing the initial HR image using the median shift-and-add method (which increases the dimensions again relative to the ResNet outputs). Finally, the initial HR image is subject to the iterative EvoIM process, which produces the final reconstruction outcome.
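The median shift-and-add step can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: the function name, the nearest-node rounding, the sign convention of the shifts, and the default enlargement factor of 2 are assumptions, and pixels that receive no sample are simply left as NaN.

```python
import warnings
import numpy as np

def median_shift_and_add(images, shifts, factor=2):
    """Fuse registered images on a grid `factor` times larger (hypothetical sketch).

    images: list of 2-D arrays of equal shape (e.g. single-image SRR outputs).
    shifts: list of (dy, dx) subpixel shifts, in units of input-image pixels.
    """
    h, w = images[0].shape
    H, W = h * factor, w * factor
    # One layer per input image; NaN marks "no sample at this HR pixel".
    stack = np.full((len(images), H, W), np.nan)
    ys, xs = np.mgrid[0:h, 0:w]
    for k, (img, (dy, dx)) in enumerate(zip(images, shifts)):
        # Scale shifted coordinates to the output grid and snap to the nearest node.
        ty = np.rint((ys + dy) * factor).astype(int)
        tx = np.rint((xs + dx) * factor).astype(int)
        ok = (ty >= 0) & (ty < H) & (tx >= 0) & (tx < W)
        stack[k, ty[ok], tx[ok]] = img[ok]
    with warnings.catch_warnings():
        # Pixels with no contributing sample trigger an all-NaN warning.
        warnings.simplefilter("ignore", RuntimeWarning)
        return np.nanmedian(stack, axis=0)

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])
fused = median_shift_and_add([a, b], [(0.0, 0.0), (0.5, 0.5)])
```

With a half-pixel shift, the samples of `b` land between those of `a` on the finer grid, which is the mechanism by which subpixel shifts contribute new information. A practical implementation would also fill the remaining empty pixels before the iterative refinement.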
III-A Residual neural network applied to the input images
Each LR image is independently enhanced using ResNet to obtain higher-quality input data for further multiple-image fusion. For this purpose, we exploit the architecture described in [15], which is composed of 16 residual blocks with skip connections and is trained with the mean square error (MSE) as the loss function (during training, ResNet is guided to reduce the MSE between each HR image and the reconstruction outcome obtained from the artificially-degraded HR image). For EvoNet, we modify the final layer, which determines the upscaling factor (ours differs from the one used in [15]).
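The training objective described above can be sketched in Python without any deep learning framework. The sketch below only shows how the loss is formed from artificially-degraded HR images; block averaging as the degradation, the factor of 2, and the nearest-neighbour upscaler used as a stand-in for the network are assumptions, not details taken from [15].

```python
import numpy as np

def downscale(hr, factor=2):
    """Artificially degrade an HR image by block averaging (assumed degradation)."""
    h = hr.shape[0] - hr.shape[0] % factor
    w = hr.shape[1] - hr.shape[1] % factor
    return hr[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def mse_loss(model, hr_batch, factor=2):
    """Mean MSE between each HR image and model(degraded HR image)."""
    losses = []
    for hr in hr_batch:
        sr = model(downscale(hr, factor))      # reconstruct from the degraded input
        h, w = sr.shape
        losses.append(np.mean((hr[:h, :w] - sr) ** 2))
    return float(np.mean(losses))

def nearest_upscale(lr, factor=2):
    """Placeholder 'network': repeats every pixel factor x factor times."""
    return np.repeat(np.repeat(lr, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)
batch = [rng.random((8, 8)) for _ in range(4)]
loss = mse_loss(nearest_upscale, batch)
```

Training a real network means adjusting its weights to push this quantity down over many HR images; the placeholder has no weights, so it only demonstrates how the loss is evaluated.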
III-B Multiple-image fusion
The EvoIM process, which we employ for multiple-image fusion, consists in iterative filtering of an HR image composed of registered LR inputs. In EvoNet, we register the original LR images before they are processed with ResNet (the ResNet reconstruction does not introduce any information that may contribute to better assessment of the displacement values). As the dimensions of the ResNet outputs are larger than those of the LR inputs, the computed shift values are multiplied by 2 to compose the initial HR image. Subsequently, EvoIM solves the optimization problem (analogously to the FRSR method). The update step is computed as:
[Equation: EvoIM update step]
where a step-size hyper-parameter controls the update step, a diagonal matrix represents the number of the LR measurements that contributed to the initial HR image, the regularization term is controlled with its own hyper-parameter, and two convolution kernels are applied (in FRSR, the kernel is the Gaussian blur). The hyper-parameters alongside the convolution kernels are optimized during the EvoIM evolutionary training. Importantly, ResNet and EvoIM are trained separately before they are combined within the EvoNet framework.
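For orientation, an update step of the FRSR type [8], on which EvoIM is built, has the general form shown below. The notation is an assumption, since the original symbols are not preserved here, and the exact expression used in EvoIM may differ: $X_n$ is the HR estimate at iteration $n$, $X_0$ the initial HR image composed by shift-and-add, $\beta$ the step size, $A$ the diagonal matrix reflecting how many LR measurements contributed to each pixel of $X_0$, $B$ and $B'$ the two convolution kernels (in FRSR, $B$ is the Gaussian blur and $B' = B^{T}$), $\lambda$ the regularization weight, and $\Gamma$ the gradient of the regularization term (bilateral total variation in FRSR).

```latex
X_{n+1} = X_n - \beta \left\{ B' A^{T} \, \mathrm{sign}\!\left( A B X_n - A X_0 \right) + \lambda \, \Gamma(X_n) \right\}
```

The first term pulls the blurred estimate towards the fused measurements, weighted by how well each pixel is supported, while the second term enforces spatial smoothness.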
IV Experiments
For validation, we used three types of data in the test set: (i) artificially-degraded (AD) images: 10 scenes, each with a set of LR images obtained from an HR image with different subpixel shifts applied before further degradation; (ii) real satellite images of the same region acquired at different resolutions (RS with reference): we used three Sentinel-2 scenes as LR, two of which are matched with SPOT images (presenting Bushehr, Iran, and Bandar Abbas, Iran) and one of which is matched with a Digital Globe WorldView-4 image (Sydney, Australia); and (iii) real satellite images available without any higher-resolution reference (RS without reference, over 20 scenes). For AD and RS with reference, we quantify the reconstruction quality based on the similarity between the reconstruction outcome and the HR reference; for RS without reference, we rely exclusively on subjective qualitative assessment (as no reference is available). The reconstruction outcome is evaluated quantitatively at dimensions larger than those of the input LR images (EvoNet and ResNet enlarge the LR images further, so we downscale these outcomes for fair comparison with the remaining methods). For RS with reference, the reconstruction outcome is compared with the Digital Globe and SPOT images, downscaled to fit its dimensions. In addition to peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), we measure the similarity using more advanced metrics [2]: information fidelity criterion (IFC), visual information fidelity (VIF), universal image quality index (UIQI), and PSNR for images treated with a high-pass filter (PSNRHF) and local standard deviation (PSNRLS). For all these metrics, higher values indicate higher similarity between the reconstruction outcome and the reference image.
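Of the metrics listed above, PSNR has the simplest definition and follows directly from the MSE. The Python sketch below uses the standard formula; the function name and the default peak value of 255 (8-bit images) are assumptions, and the more advanced metrics from [2] are not reproduced here.

```python
import numpy as np

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    ref = np.asarray(reference, dtype=float)
    out = np.asarray(test, dtype=float)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

reference = np.zeros((4, 4))
value = psnr(reference, reference + 1.0)  # MSE of exactly 1
```

Because PSNR is a monotonic function of the MSE, a model trained to minimize the MSE is directly optimized for this metric, which is worth keeping in mind when comparing methods by PSNR alone.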
EvoNet is compared with two single-image SRR methods: SRR based on wavelet transform (SR-DWT) [4] and ResNet [15], and with three multiple-image ones: GPA [19], SR-ADE [20], and EvoIM [11]. EvoIM (also exploited in EvoNet) was trained separately for artificially-degraded images and for real satellite data, as reported in [11], using PSNRHF [2] as the fitness function (there were no overlaps between training and test sets). ResNet was trained using images from the DIV2K dataset (available at https://data.vision.ee.ethz.ch/cvl/DIV2K). We implemented all the investigated algorithms in C++, and we used Python with Keras to implement ResNet. The experiments were run on an Intel i5 3.2 GHz computer with 16 GB RAM, and ResNet was trained on a GTX 1060 6 GB GPU.
In Table I, we report the reconstruction accuracy for AD images alongside the processing times. For fair comparison, all the reconstruction tests were run on a CPU, which explains the long times of ResNet and EvoNet (the GPU was used only for training ResNet). EvoNet allows for the most accurate reconstruction, rendering the best score for every metric, and multiple-image EvoIM renders higher scores than single-image SR-DWT and ResNet for most metrics. Examples of reconstruction are presented in Fig. 2: the outcome of ResNet is more blurred than that of EvoNet, with fewer details visible, and EvoIM produces noticeably more artifacts; overall, EvoNet renders a very plausible outcome, which most resembles the HR image. We have also tried to register the images after they are processed with ResNet; as expected, this decreases the reconstruction accuracy while extending the processing time (see EvoNetA in Table I).
Quantitative results obtained for RS images with reference are reported in Table II (we also show the values averaged over the three images). It can be seen that for Sydney and Bandar Abbas, EvoNet renders the highest scores for most metrics (including IFC and VIF, which were found most meaningful for assessing SRR [2]). For Bushehr, the scores differ less among the methods, and the metrics are not consistent in indicating the most accurate method, possibly because this image contains more plain areas compared with the two remaining scenes. Average PSNR is highest for ResNet, which can be caused by using MSE as the loss function for training (PSNR is based on MSE). All other metrics indicate that EvoNet outperforms the remaining methods. From Fig. 3, it can be seen that the quantitative results are coherent with the visual assessment: all the methods increase the interpretation capacities compared with LR, and the outcome obtained using EvoNet recovers more details than ResNet, without introducing the artifacts visible for EvoIM.
The outcomes obtained for RS images without any HR reference generally confirm our observations discussed for RS images with reference. In Fig. 4, we show an interesting example of reconstruction from Lunar Reconnaissance Orbiter Camera images. It is worth noting that these LR images contain some artifacts in the form of faint vertical stripes, which result from the sensor characteristics (the images were not preprocessed). In this case, not only does EvoNet render the highest reconstruction quality, but it also manages to make these artifacts less apparent compared with EvoIM and ResNet (this can be explained by the fact that ResNet changes the artifacts to be grid-like, which can be further reduced during the fusion).
V Conclusions
In this letter, we proposed a novel method for multiple-image super-resolution which exploits the recent advancements in deep learning. We demonstrated that the ResNet deep CNN, applied to enhance each individual LR image before performing the multiple-image fusion, can substantially improve the final super-resolved image. The reported quantitative and qualitative results indicate that the proposed approach is highly competitive with the state of the art in both single-image and multiple-image SRR.
Our ongoing work is aimed at developing deep architectures for learning the entire process of multiple-image reconstruction, possibly including image registration.
References
- [1] T. Akgun, Y. Altunbasak, and R. M. Mersereau, “Super-resolution reconstruction of hyperspectral images,” IEEE Trans. on Image Process., vol. 14, no. 11, pp. 1860–1875, 2005.
- [2] P. Benecki, M. Kawulok, D. Kostrzewa, and L. Skonieczny, “Evaluating super-resolution reconstruction of satellite images,” Acta Astronautica, vol. 153, pp. 15–25, 2018.
- [3] H. Chavez-Roman and V. Ponomaryov, “Super resolution image generation using wavelet domain interpolation with edge extraction via a sparse representation,” IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1777–1781, Oct 2014.
- [4] H. Demirel and G. Anbarjafari, “Discrete wavelet transform-based satellite image resolution enhancement,” IEEE Trans. on Geoscience and Remote Sensing, vol. 49, no. 6, pp. 1997–2004, 2011.
- [5] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in Proc. ECCV. Springer, 2014, pp. 184–199.
- [6] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Proc. ECCV. Springer, 2016, pp. 391–407.
- [7] A. Ducournau and R. Fablet, “Deep learning for ocean remote sensing: An application of convolutional neural networks for super-resolution on satellite-derived SST data,” in Proc. WPRRS, 2016, pp. 1–6.
- [8] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar, “Fast and robust multiframe super resolution,” IEEE Trans. on Image Process., vol. 13, no. 10, pp. 1327–1344, 2004.
