Underwater Color Restoration Using U-Net Denoising Autoencoder

Yousif Hashisho, Mohamad Albadawi, Tom Krause, and Uwe Freiherr von Lukas

arXiv:1905.09000·cs.CV·February 18, 2020

TL;DR

This paper introduces a U-Net based denoising autoencoder for real-time underwater color restoration, significantly improving visual quality for underwater vehicle perception with a novel single autoencoder approach.

Contribution

It presents the first autoencoder model capable of effective underwater color restoration, balancing accuracy and computational efficiency for real-time applications.

Findings

01. Outperforms state-of-the-art methods in color restoration quality

02. Enables real-time processing suitable for underwater vehicles

03. Uses a novel training dataset construction method

Tables

Table I: Evaluation of UGAN and UDAE using three metrics over $1,040$ images with a resolution of $256\times 256$.

Objective Evaluation
Metrics	MSE	SSIM	MS-SSIM-L1
UDAE	$0.0028$	$0.9653$	$0.0753$
UGAN	$0.0061$	$0.9186$	$0.1415$

Equations

$\mathcal{L}=\alpha\cdot\mathcal{L}^{\text{MS-SSIM}}+(1-\alpha)\cdot\mathcal{L}^{L1},$ (1)

Taxonomy

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net · Denoising Autoencoder

Full text

Underwater Color Restoration Using U-Net Denoising Autoencoder

Yousif Hashisho
Department of Maritime Graphics, Fraunhofer Institute for Computer Graphics Research (IGD), Rostock, Germany

Tom Krause
Department of Maritime Graphics, Fraunhofer Institute for Computer Graphics Research (IGD), Rostock, Germany

Mohamad Albadawi
Department of Maritime Graphics, Fraunhofer Institute for Computer Graphics Research (IGD), Rostock, Germany

Uwe Freiherr von Lukas
Department of Maritime Graphics, Fraunhofer Institute for Computer Graphics Research (IGD); Department of Computer Science, University of Rostock, Rostock, Germany

Abstract

Visual inspection of underwater structures by vehicles, e.g. remotely operated vehicles (ROVs), plays an important role in scientific, military, and commercial sectors. However, the automatic extraction of information using software tools is hindered by the characteristics of water, which degrade the quality of captured videos. As a contribution to restoring the color of underwater images, the Underwater Denoising Autoencoder (UDAE) model is developed using a denoising autoencoder with a U-Net architecture. The proposed network takes into consideration both accuracy and computation cost to enable real-time implementation of underwater visual tasks using an end-to-end autoencoder network. Underwater vehicle perception is improved by reconstructing captured frames, hence obtaining better performance in underwater tasks. Related learning methods use generative adversarial networks (GANs) to generate color-corrected underwater images, and to our knowledge this paper is the first to deal with a single autoencoder capable of producing the same or better results. Moreover, image pairs are constructed for training the proposed network, since such a dataset is hard to obtain from underwater scenery. Finally, the proposed model is compared to a state-of-the-art method.

Index Terms:

autoencoders, underwater image, image restoration, Generative Adversarial Networks, real-time

I Introduction

Marine robots, such as remotely operated vehicles (ROVs), are increasingly used in the scientific, military, and commercial sectors. They are critical for collecting data and performing certain underwater operations. Due to safety and health concerns, human intervention can be risky and limited when executing underwater missions [1]. Thus, underwater vehicles are equipped with camera systems for performing numerous vision tasks. For instance, Choi et al., 2017 [2] operated an ROV manually for inspecting harbour structures and acquiring high-quality videos. Manjunatha et al., 2018 [3] built a robot equipped with a high-definition camera for visual inspection at a specified depth in a water body. However, the automatic extraction of information using software tools is hindered by underwater image degradation caused by the poor water medium and the behaviour of light in it.

Contrast loss and color distortion affect the vision algorithms and ultimately the vehicle's performance in gathering and processing data. An image enhancement technique is also needed to help a human operator navigate the vehicle and carry out underwater tasks. Furthermore, the processing speed should be taken into consideration for a real-time implementation.

This paper proposes the Underwater Denoising Autoencoder (UDAE), a deep learning network based on a single denoising autoencoder [4] using U-Net [5] as the CNN architecture, for improving the quality of underwater imagery and video material. The contributions presented in this paper can be summarized as follows:

• A UDAE network specialized in underwater color restoration is proposed.

• Faster processing speed than the state-of-the-art method is achieved, which improves the real-time capability.

• A new dataset combining different underwater scenarios (turbidity, depth, temperature, attenuation type, etc.) is synthesized for training the proposed network. The synthetic dataset is generated using a generative deep learning method.

• The fully end-to-end proposed model generalizes well to real underwater images with different degradation types.

The rest of the paper is organized as follows: §II reviews related work; §III describes the experiments and the methods followed to restore underwater images; §IV presents corrected underwater images and the performance of the proposed network; finally, §V summarizes the paper.

II Related Work

Numerous attempts have been made with different image improvement methods for restoring the color of raw underwater images. These methods fall into two categories [6]: hardware-based methods [7, 8] and software-based methods [9, 10, 11]. Software-based methods invert the formation of underwater images and construct physical models for image enhancement, in addition to modifying the image pixel values. Hardware-based methods capture multiple images with the help of polarization filters, stereo setups, or specialized hardware devices and use the additional information obtained [12, 6].

Both categories show good performance; however, they are limited to certain scenarios and do not cover the variety of underwater lighting conditions. They are also expensive to implement, since some of them use specialized sensors and multiple images for the enhancement. Recent approaches have focused on Generative Adversarial Networks (GANs) as a new way of achieving better results.

When improving underwater imagery using deep learning models (e.g. GANs), image pairs consisting of clean and distorted underwater images are needed for training the model. It is hard to capture clean underwater images without the attenuation of light and other underwater effects. Thus, several works have been done to synthesize training images.

Li et al., 2018 [13] used two types of networks: the Water Generative Adversarial Network (WaterGAN) for generating realistic underwater images and the Underwater Image Restoration Network for correcting the color. The generator of WaterGAN models the formation of an underwater image in three stages: Attenuation, Scattering, and Camera Model. The learned generator is then used to generate training image samples for the color restoration network. The restoration network first estimates and reconstructs a relative depth map from the input image, and both are then used for color restoration. They showed efficiency for real-time applications; however, their network is limited to certain degradation types because of the way the images are generated. Figure 1 shows the images that were used for training the network, which do not reflect underwater structures. The clean images consist of in-air images, whereas the corruption process is limited to certain degradation types (e.g. a greenish mask).

As an improvement over the aforementioned data generation method, Li et al., 2018 [14] and Fabbri et al., 2018 [15] used CycleGAN [16] for generating underwater images. The synthesized data was then used for training their color restoration models.

The previously mentioned deep learning methods showed good performance in restoring the color. However, in certain scenarios they led to an unrealistic color correction of underwater images, as in Li et al., 2018 [14]. The training dataset lacked the true colors of underwater structures such as coral reefs and fish. Furthermore, a drawback of the color restoration model of Fabbri et al., 2018 [15], the Underwater Generative Adversarial Network (UGAN), is its limited efficiency for real-time implementation with high-resolution images, as the model's architecture makes it computationally costly.

We follow the same procedure as in Fabbri et al., 2018 [15] for generating synthetic images. However, a different set of images is used for the training of CycleGAN. Fabbri et al., 2018 [15] collected clear underwater images and style-transferred the characteristics of degradation from distorted underwater images to them. Our generated dataset is composed of various underwater locations with different degradation types, leading to a better generalization than their network.

III Methodology

Two important aspects are discussed in building the UDAE model. The first aspect is the methodology followed to generate the underwater dataset for training the network. The second one is the architecture of the UDAE model and the benefits of using a denoising autoencoder.

III-A Dataset

A dataset is gathered and filtered to be used for the generation process of the image pairs. This section is divided into two subsections. The first subsection discusses data collection of underwater images, while the second discusses generating data for obtaining underwater image pairs.

III-A1 Data Collection

To train a network capable of restoring the true underwater color from distorted images, clear images without light scattering were gathered. These images were taken from different sources on the Internet. Because clear underwater images are hard to get, they were obtained from the following sources:

• Large fish aquariums such as the ones in museums and touristic towers.

• Underwater images that were captured at a close distance to the structures with artificial light exposure.

• Various images and frames taken from videos that were enhanced and processed by commercial software tools.

The clean images were chosen based on the contrast loss and degradation present in underwater images. After that, distorted images with different attenuation types were gathered from various locations. Some of them were captured by Fraunhofer IGD in the Baltic Sea, while the others were gathered from the Internet and correspond to different locations, depths, temperatures, and other degradation factors.

III-A2 Image Pairs Data Generation

A total of $15,131$ images, both clear and distorted, were collected. The collected images were then filtered, based on a subjective quality evaluation, into two categories: A (clear) containing $7,055$ images and B (distorted) containing $8,076$ images. The two categories are shown in Figure 2. All images were resized to $512\times 512$ using the area interpolation method.
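For an integer downscaling factor, area interpolation amounts to averaging non-overlapping pixel blocks. The following is a minimal pure-Python sketch of that idea for a single-channel image; the function name `area_downsample` is hypothetical, the paper does not state which library or routine was used, and a real pipeline would most likely call an image-library resize function instead.

```python
def area_downsample(image, factor):
    """Downscale a 2-D grayscale image (list of rows) by an integer factor
    by averaging each factor x factor block of pixels."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            # Collect the pixels of one block and replace them by their mean.
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A 4x4 toy image reduced to 2x2.
small = area_downsample(
    [[0, 2, 4, 6],
     [2, 4, 6, 8],
     [1, 1, 3, 3],
     [1, 1, 3, 3]],
    factor=2,
)
```

Averaging whole blocks keeps the mean intensity of each region, which is why area interpolation is a common choice when shrinking images.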

After gathering suitable images, the CycleGAN generative model was used for style transfer. It uses an adversarial loss for learning a mapping from a source domain X to a target domain Y ( $G:X\rightarrow Y$ ) [16]. It was used to transfer the underwater style from the B images to the A images, and the result was the category A′ (Figure 3). The image pairs in A and A′ were then used to train the autoencoder. The training of the CycleGAN model took around $9$ days on $4$ NVIDIA TITAN X GPU devices. Afterwards, $7,055$ image pairs were generated, of which $5,194$ remained after removing failure cases. The failure cases are due to limitations in the style transfer of CycleGAN.

III-B Proposed Network

A denoising autoencoder is used for restoring the color of underwater images. We consider the problem of color restoration as the reconstruction of a corrupted input. Consider that $x$ is the clean image and $\tilde{x}$ is the version of it corrupted by the style transfer $c(\tilde{x}|x)$ . We then try to reconstruct a repaired input by learning a decoding distribution $p_{\theta}(x|z)$ from an encoded distribution $q_{\phi}(z|\tilde{x})$ . Denoising autoencoders are expected to capture implicit invariances in the data and extract the key features from the input images [4, 17]. U-Net is used as the CNN architecture due to its efficiency in computation and training (the parameters learn well even with a small dataset), in addition to its ability to propagate context information to higher resolution layers [5].
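In the notation above, the training criterion of a denoising autoencoder can be written in its standard textbook form as shown below. This is a sketch of the general formulation, not an equation taken from the paper; the expectation over $x$ is assumed to run over the clean training images.

```latex
\min_{\theta,\phi}\;
\mathbb{E}_{x}\,
\mathbb{E}_{\tilde{x}\sim c(\tilde{x}|x)}\,
\mathbb{E}_{z\sim q_{\phi}(z|\tilde{x})}
\bigl[-\log p_{\theta}(x|z)\bigr]
```

With a deterministic encoder and decoder, this reduces to minimizing a reconstruction loss between the clean image $x$ and the network output computed from $\tilde{x}$.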

The proposed UDAE network is illustrated in Figure 4. The same kernel sizes and layers were used as in U-Net [5]. First, a distorted $RGB$ underwater image is fed into the encoder of the denoising autoencoder. In the encoder part, successive convolutions gradually downsample the image to a latent variable. In each downsampling stage, two $3\times 3$ 2-D convolutions are applied, followed by a rectified linear unit (ReLU) and a $2\times 2$ max-pooling with a stride of $2$ . The number of feature maps is doubled in each stage. In the decoder part, upsampling is done sequentially from the latent variable back to the original input size. After each upsampling, the tensor (image) is concatenated with the output of the corresponding symmetric layer on the encoder side, followed by $3$ consecutive convolutions. The feature maps are reduced gradually to $3$ channels. The concatenation of the layer outputs combines the contextual information from the downsampling step [5]. The reconstructed image should bear resemblance to the clean image; therefore, inspired by the work of Zhao et al. [18], the Multi-scale Structural SIMilarity (MS-SSIM) index and the absolute value ( $L1$ ) loss functions were used. The loss function can be expressed as:
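To make the encoder and decoder stages concrete, the sketch below traces the spatial size and channel count through a U-Net-style network. The values are assumptions for illustration only: four downsampling stages and 64 initial feature maps (as in the original U-Net), "same"-padded convolutions that keep the spatial size, and channel counts that halve at each decoder stage. The paper does not list these numbers, and `trace_unet_shapes` is a hypothetical helper.

```python
def trace_unet_shapes(size, base_channels=64, depth=4):
    """Return (stage, spatial_size, channels) tuples for a U-Net-style
    encoder-decoder, assuming 'same'-padded convolutions."""
    if size % (2 ** depth):
        raise ValueError("size must be divisible by 2**depth")
    shapes = []
    channels = base_channels
    # Encoder: convolutions keep the size; 2x2 max-pooling (stride 2) halves
    # it, and the number of feature maps doubles from stage to stage.
    for level in range(depth):
        shapes.append((f"enc{level + 1}", size, channels))
        size //= 2
        channels *= 2
    shapes.append(("latent", size, channels))
    # Decoder: upsampling doubles the size; after the skip concatenation and
    # convolutions the channel count is assumed to halve.
    for level in range(depth, 0, -1):
        size *= 2
        channels //= 2
        shapes.append((f"dec{level}", size, channels))
    shapes.append(("output", size, 3))  # final RGB reconstruction
    return shapes


for stage, side, channels in trace_unet_shapes(256):
    print(f"{stage:>6}: {side}x{side}x{channels}")
```

Under these assumptions a $256\times 256$ input reaches a $16\times 16$ latent representation and is restored to $256\times 256$ with $3$ output channels.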

$\mathcal{L}=\alpha\cdot\mathcal{L}^{\text{MS-SSIM}}+(1-\alpha)\cdot\mathcal{L}^{L1},$ (1)

where $\mathcal{L}$ represents the loss of the reconstructed image and $\alpha$ is set to $0.80$ after conducting several experiments and observing the best reconstruction. The objective of the autoencoder is to minimize the loss function as much as possible. Weight decay is omitted in the proposed network, since the noise present in the input images has a regularization effect similar to weight decay, with faster training dynamics [19]. The TensorFlow framework was used for the training.
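The weighting of the two loss terms can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the authors' training code: images are flat lists of values in $[0, 1]$, the structural term is taken as one minus the similarity index (as in Zhao et al. [18]), and a single-window SSIM computed over the whole image stands in for MS-SSIM, which is multi-scale and windowed. All function names are hypothetical.

```python
def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM over whole images given as flat lists in [0, 1].
    A simplified stand-in for MS-SSIM, not the multi-scale version."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def l1_loss(x, y):
    """Mean absolute difference between two images."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def combined_loss(clean, restored, alpha=0.80):
    """alpha * (1 - SSIM) + (1 - alpha) * L1, with alpha = 0.80 as in the text."""
    structural = 1.0 - global_ssim(clean, restored)
    return alpha * structural + (1.0 - alpha) * l1_loss(clean, restored)


clean = [0.1, 0.5, 0.9, 0.3]
restored = [0.2, 0.4, 0.7, 0.3]
loss_same = combined_loss(clean, clean)      # identical images
loss_diff = combined_loss(clean, restored)   # imperfect reconstruction
```

A perfect reconstruction drives both terms to zero, so the combined loss vanishes only when the output matches the clean image.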

IV Results and Discussion

The training of UDAE took around $1$ day on an NVIDIA Quadro M5000. It was then tested on $1,040$ images with a resolution of $512\times 512$ . The average time per image was $0.01601$ seconds ( $62.45$ fps) on an NVIDIA RTX 2080 Ti. The selected loss function was capable of preserving details when reconstructing the image. SSIM is sensitive to various types of image degradation [20], whereas $L1$ preserves colors and luminance [18]. The proposed network produced good results, as shown in Figure 5, at a speed suitable for real-time implementation.

In certain scenarios where the clean image is only partially clear, such as the one in subfigure 5c, the reconstructed image showed a better recovery from the distorted color than the clean image itself. The reason is that the network in general learned an encoding and decoding distribution capable of reconstructing color-recovered images.

Additionally, the UDAE network was tested on real data, namely underwater videos taken from YouTube, to evaluate its generalization ability. Figure 6 shows samples of the reconstructed images from the following videos: Baltic Sea (https://www.youtube.com/watch?v=Y-SVGO0r6n0), Scuba Diving (https://www.youtube.com/watch?v=OSdrb1XNXZI), and Fish Hunting (https://www.youtube.com/watch?v=aLt7aGFcVkM). The color of input underwater images with different degradation types was restored and the details were preserved.

IV-A Comparison with UGAN

UDAE was compared to the Underwater Generative Adversarial Network (UGAN) [15]. First, both networks were tested on the dataset described in Section III-A ( $1,040$ images), since clean images are available for it and allow an objective evaluation. The testing images are of size $256\times 256$ . Three metrics were used for the evaluation: MSE, SSIM, and MS-SSIM-L1 (eq. 1). On all three metrics, UDAE achieved a better reconstruction score than UGAN (Table I). Note that MSE and MS-SSIM-L1 give a score of $0$ for identical images, while SSIM gives a score of $1$ .
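The size of the gap in Table I can be read off with a little arithmetic. The sketch below only restates the published scores and derives relative differences from them; the helper names are hypothetical and the derived percentages are not figures reported in the paper.

```python
def mse(x, y):
    """Mean squared error between two images given as flat lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def relative_reduction(ours, baseline):
    """Fraction by which an error-type score drops relative to the baseline."""
    return (baseline - ours) / baseline


# Scores copied from Table I (lower is better for MSE and MS-SSIM-L1,
# higher is better for SSIM).
table = {
    "MSE": {"UDAE": 0.0028, "UGAN": 0.0061},
    "SSIM": {"UDAE": 0.9653, "UGAN": 0.9186},
    "MS-SSIM-L1": {"UDAE": 0.0753, "UGAN": 0.1415},
}

mse_reduction = relative_reduction(table["MSE"]["UDAE"], table["MSE"]["UGAN"])
loss_reduction = relative_reduction(
    table["MS-SSIM-L1"]["UDAE"], table["MS-SSIM-L1"]["UGAN"])
ssim_gain = table["SSIM"]["UDAE"] - table["SSIM"]["UGAN"]

print(f"MSE reduced by {mse_reduction:.1%}")
print(f"MS-SSIM-L1 reduced by {loss_reduction:.1%}")
print(f"SSIM increased by {ssim_gain:.4f}")
```

On these numbers, UDAE roughly halves both error-type scores relative to UGAN while raising SSIM by about $0.047$.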

For a fair comparison, both networks were then evaluated on the testing images that the authors of UGAN published in their paper (Figure 7; the images are best compared in digital form). The average processing time was calculated over $1,813$ testing images resized to $256\times 256$ . The average time per image of UGAN was $0.0099$ seconds ( $100.94$ fps), whereas that of UDAE was $0.0043$ seconds ( $230.67$ fps). The processing was conducted on an NVIDIA RTX 2080 Ti. Since clean images were not available, the evaluation was based only on human perception.
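The relationship between the reported per-image times and frame rates is simply a reciprocal, and the ratio of the two times gives the speedup. The sketch below works from the rounded times quoted in the text, so the frame rates it yields differ slightly from the reported ones, which were presumably computed from unrounded measurements. The function name is hypothetical.

```python
def frames_per_second(seconds_per_image):
    """Convert an average per-image processing time into a frame rate."""
    return 1.0 / seconds_per_image


# Average seconds per image at 256x256, as reported in the text.
udae_time = 0.0043
ugan_time = 0.0099

udae_fps = frames_per_second(udae_time)
ugan_fps = frames_per_second(ugan_time)
speedup = ugan_time / udae_time  # how many times faster UDAE is than UGAN

print(f"UDAE: {udae_fps:.1f} fps, UGAN: {ugan_fps:.1f} fps, "
      f"speedup: {speedup:.2f}x")
```

On these figures UDAE processes a frame a little over twice as fast as UGAN on the same GPU.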

UDAE showed good generalization: the color was restored and the details were preserved. UGAN achieved good performance in restoring the colors of some images, such as subfigure 7a; however, UDAE produced better color brightness. Another observation drawn from the images concerns background reconstruction. In many images, UGAN failed to reconstruct the background properly, for instance in images with a plain background as in subfigure 7b, whereas our proposed network was capable of restoring the color of both the foreground and the background without any artifacts. An example of such artifacts is the halo effect visible in the UGAN reconstructed image. As for the high frequencies, the images of subfigure 7c are shown in Figure 8 zoomed in by a factor of $9$ using the bilinear interpolation method.

UDAE outperforms the UGAN network in preserving and reconstructing details. The coral reefs in the image reconstructed by UGAN were blurry and many details were lost. Such details are important for object detection and tracking by underwater vehicles. Some failure cases of our proposed network were observed, such as subfigure 7d. These are left for future work, in which a better dataset with more degradation types would be established for better generalization.

V Conclusion

This paper proposed the Underwater Denoising Autoencoder (UDAE), a new way of restoring the color of underwater images using a single denoising autoencoder with real-time capability. We showed that it is possible to reconstruct underwater images using a network based on a single denoising autoencoder, which gave the same or better results than a network based on a GAN. Moreover, a single autoencoder is better suited for real-time implementation. Additionally, as an improvement over previous networks, UDAE is capable of restoring better color in images and preserving the details.

We believe that there is room for improving the network, in particular its generalization ability. The network was trained on a relatively small dataset; obtaining a larger one with various color distortions would lead to great improvement. The processing speed could also be improved by trying a different CNN baseline or latent space size.

Bibliography

[1] R. B. Wynn, V. A. Huvenne, T. P. Le Bas, B. J. Murton, D. P. Connelly, B. J. Bett, H. A. Ruhl, K. J. Morris, J. Peakall, D. R. Parsons et al., “Autonomous underwater vehicles (AUVs): Their past, present and future contributions to the advancement of marine geoscience,” Marine Geology, vol. 352, pp. 451–468, 2014. [Online]. Available: https://doi.org/10.1016/j.margeo.2014.03.012
[2] J. Choi, Y. Lee, T. Kim, J. Jung, and H.-T. Choi, “Development of a ROV for visual inspection of harbor structures,” in 2017 IEEE Underwater Technology (UT). IEEE, 2017, pp. 1–4. [Online]. Available: https://doi.org/10.1109/UT.2017.7890285
[3] M. Manjunatha, A. A. Selvakumar, V. P. Godeswar, and R. Manimaran, “A low cost underwater robot with grippers for visual inspection of external pipeline surface,” Procedia Computer Science, vol. 133, pp. 108–115, 2018. [Online]. Available: https://doi.org/10.1016/j.procs.2018.07.014
[4] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th International Conference on Machine Learning. ACM, 2008, pp. 1096–1103. [Online]. Available: https://doi.org/10.1145/1390156.1390294
[5] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241. [Online]. Available: https://doi.org/10.1007/978-3-319-24574-4_28
[6] H. Lu, Y. Li, Y. Zhang, M. Chen, S. Serikawa, and H. Kim, “Underwater optical image processing: a comprehensive review,” Mobile Networks and Applications, vol. 22, no. 6, pp. 1204–1211, 2017. [Online]. Available: https://doi.org/10.1007/s11036-017-0863-4
[7] Y. Y. Schechner and N. Karpel, “Recovery of underwater visibility and structure by polarization analysis,” IEEE Journal of Oceanic Engineering, vol. 30, no. 3, pp. 570–587, 2005. [Online]. Available: https://doi.org/10.1109/JOE.2005.850871
[8] T. Treibitz and Y. Y. Schechner, “Active polarization descattering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 3, pp. 385–399, 2009. [Online]. Available: https://doi.org/10.1109/TPAMI.2008.85