Self-Committee Approach for Image Restoration Problems using Convolutional Neural Network

Byeongyong Ahn and Nam Ik Cho

arXiv:1705.04528·cs.CV·June 13, 2017

TL;DR

This paper introduces a self-committee approach using a single CNN to improve image restoration by averaging outputs from multiple transformed inputs, enhancing results without needing multiple networks.

Contribution

The proposed method leverages input transformations to generate multiple outputs from one CNN, improving image restoration performance without additional networks.

Findings

1. Enhanced denoising results with input transforms.

2. Improved super-resolution performance.

3. Single network achieves multi-trial benefits.

Tables

Table I: The 8 FR operations employed to constitute the committee

$k$	Description
1	Original
2	FlipUD
3	Rotation $(90^{\circ})$
4	Rotation $(90^{\circ})$ + FlipUD
5	Rotation $(180^{\circ})$
6	Rotation $(180^{\circ})$ + FlipUD
7	Rotation $(-90^{\circ})$
8	Rotation $(-90^{\circ})$ + FlipUD

Table II: The 6 types of committee that are evaluated

Committee Name	Description	# of Members
SCN-F	Original + Flip ( $K = \{1, 2\}$ )	2
SCN-R	Original + Rotation ( $K = \{1, 3, 5, 7\}$ )	4
SCN-FR	Original + FR ( $K = \{1, \ldots, 8\}$ )	8
SCN-I	Original + Inversion	2
SCN-Full	Original + FR + Inversion	16
SCN-L	Original + Linear (for SR only)	3

Table III: Individual PSNR results (dB) for Gaussian denoising, $\sigma = 30$

Image	DnCNN	SCN-F	SCN-R	SCN-FR	SCN-I	SCN-Full
Cameraman	29.24	29.26	29.28	29.28	29.28	29.30
Lena	31.62	31.66	31.67	31.68	31.66	31.69
Barbara	28.84	28.89	28.93	28.94	28.91	28.96
Boat	29.36	29.38	29.40	29.40	29.38	29.41
Couple	29.20	29.22	29.24	29.25	29.23	29.25
Fingerprint	26.61	26.64	26.66	26.67	26.71	26.73
Hill	29.24	29.26	29.26	29.27	29.26	29.27
House	32.38	32.43	32.43	32.44	32.42	32.45
Jetplane	31.12	31.15	31.17	31.17	31.18	31.19
Man	29.23	29.25	29.26	29.27	29.24	29.26
Montage	31.82	31.89	31.93	31.95	31.87	31.94
Peppers	29.86	29.89	29.91	29.91	29.95	29.98
Average	29.87	29.91	29.93	29.94	29.92	29.95

Table IV: Average PSNR results (dB) for super-resolution

Dataset	Upscaling Factor	SRCNN	SCN-FR	SCN-L	SCN-I	SCN-Full
Set5	2	36.71	36.91	36.72	36.81	36.92
Set5	3	32.83	32.97	32.84	32.89	32.98
Set5	4	30.51	30.63	30.53	30.60	30.64
Set14	2	32.54	32.66	32.55	32.60	32.67
Set14	3	29.34	29.45	29.35	29.39	29.45
Set14	4	27.52	27.58	27.53	27.57	27.59

Equations

f(g(Y)) = g(f(Y))

\hat{X}_{FR,I} = \frac{\sum_{k \in K} g_{k}^{-1}(f(g_{k}(Y)))}{|K|}

f(\alpha Y + \beta) = \alpha f(Y) + \beta

\hat{X}_{L} = \frac{\sum_{\alpha, \beta} \hat{x}_{\alpha, \beta}}{\sum_{\alpha, \beta} 1}

\hat{x}_{\alpha, \beta} = \frac{f(\alpha Y + \beta) - \beta}{\alpha} .

\hat{X}_{I} = \frac{f(Y) + (1 - f(1 - Y))}{2}

\hat{X}_{Full} = \frac{\sum_{\alpha, \beta} \sum_{k \in K} g_{k}^{-1}(f(g_{k}(\alpha Y + \beta))) - \beta}{\sum_{\alpha, \beta} \alpha |K|}

\alpha \in \left\{ \max(X) - \min(X),\ 1,\ \frac{1}{\max(X) - \min(X)} \right\}

\beta = (1 - \alpha)\,\mathrm{mean}(X) .

Taxonomy

Topics: Optical Systems and Laser Technology · Image Processing Techniques and Applications · Photoacoustic and Ultrasonic Imaging

Full text

Self-Committee Approach for Image Restoration Problems using Convolutional Neural Network

Byeongyong Ahn and Nam Ik Cho

B. Ahn and N. I. Cho are with the Dept. of Electrical and Computer Engineering, Seoul National University, 1, Gwanak-ro, Gwanak-Gu, Seoul 151-742, Korea, and are also affiliated with INMC.

Abstract

There have been many discriminative learning methods using convolutional neural networks (CNNs) for image restoration problems, which learn the mapping function from a degraded input to the clean output. In this letter, we propose a self-committee method that obtains enhanced restoration results from multiple trials of a trained CNN with different but related inputs. Specifically, we note that the CNN sometimes finds a different mapping function when the input is transformed by a reversible transform, and thus produces outputs that differ from, but are related to, the output for the original input. Hence, averaging the outputs for several differently transformed inputs can enhance the results, as evidenced by network committee methods. Unlike conventional committee approaches that require several networks, the proposed method needs only a single network. Experimental results show that adding a transform as a committee member always brings additional gain on image denoising and single image super-resolution problems.

Index Terms:

Image Restoration, Convolutional Neural Network, Image Prior, Convolutional Neural Network Committee

I Introduction

Image restoration problems aim to estimate high-quality images from low-resolution or degraded ones, and most of them are ill-posed. Hence, conventional image restoration methods have exploited various kinds of image priors, such as gradient models [1, 2, 3], wavelet models [4, 5], Markov random fields (MRF) [6, 7, 8], sparse representation [9, 10, 11], and the nonlocal self-similarity (NSS) prior [12, 13, 14]. Although these algorithms have shown promising results, they suffer from some drawbacks. First, some of the models are heuristically designed and involve parameters that need to be tuned by a user. Therefore, the performance often depends on the characteristics of the input image and on the chosen parameters. Moreover, these methods find the optimal solution by solving complex optimization problems that are mostly computationally expensive and difficult to parallelize.

In recent years, learning-based methods that can overcome the above problems have been developed. For example, Schmidt et al. [15] proposed a cascade of shrinkage fields (CSF), which unifies the random field-based model and quadratic optimization into a learning framework. Chen et al. [16] proposed a trainable nonlinear reaction diffusion (TNRD) model, which learns the parameters of a diffusion model by gradient descent. In addition, with the rapid progress of graphics processing unit (GPU) programming and parallel processing, deep learning based image restoration methods have attracted great attention. Burger et al. [17] proposed a multilayer perceptron (MLP) based denoising algorithm, which achieves performance competitive with prior model based methods. Dong et al. [18] presented a convolutional neural network (CNN) based image super-resolution method, which is shown to outperform the prior based methods. Kim et al. [19] introduced a skip connection and showed that learning the residual image is more effective. Zhang et al. [20] developed a deep CNN for image denoising, which utilizes residual learning and batch normalization [21]. This network shows state-of-the-art performance on many restoration problems, including Gaussian image denoising, single image super-resolution (SISR), and JPEG image deblocking. Although the deep learning based methods have proven effective in many tasks, they also have some limitations. First, the training can get stuck in a local minimum, and therefore the initial condition of the training affects the performance. Zhao et al. [22] showed that local minima limit the network performance. Second, since the training aims to minimize only the pixel-based error, we do not know which prior or which structure is handled well by the network. In this respect, it has been shown that combining some image priors [23] or using multiple networks can improve the performance of restoration or classification problems [25].

In this letter, we propose a committee approach that works at the inference stage to enhance the performance of CNN based image restoration methods. The idea of “network committees” for a vision task was introduced in [24, 25], and it was shown to achieve the best performance on the MNIST digit classification problem [26]. The main idea of this method is to average the outputs of differently trained networks (called network committees) for the same input, which can alleviate the local minima problem and increase the performance. Our proposed method differs from the conventional committee approaches in how the committee members are constructed. Specifically, we use only a single network, named the base network, and instead of preparing different networks as committee members, we define the members as the outputs of the network for differently transformed inputs. Precisely, we note that the trained network sometimes finds different feature maps for a transformed input, such as a flipped or linearly transformed image, and thus produces a different output (after the inverse transform). Thus we prepare several transforms, pass the transformed inputs through the network, and use the inverse-transformed outputs as committee members. These outputs are averaged to form the final output. The proposed method is named self-committee network (SCN) in the sense that only a single network is used. Experimental results show that the proposed method can improve the performance of CNN based image restoration methods without additional training or fine-tuning.

II Proposed Algorithm

The key ideas of our method are summarized as follows. First, some transformations are applied to an input image, which constructs a group of images for the given input. The group members are individually passed through the network and the outputs are inverse transformed to the original image space. Then the final output is estimated from the group of output images. An example of the proposed SCN framework is illustrated in Fig. 1. In this letter, two kinds of image transformations are considered, each of which yields outputs with almost the same performance but different characteristics.

II-A Flip and/or Rotation (FR)

Training-based image restoration algorithms [15, 16, 17, 20, 18] aim to learn a mapping function $f(Y)\sim X$ for a degraded image $Y$ and its ground truth $X$. From the viewpoint of the human visual system (HVS), it is natural that the mapping function should work the same way for a flipped and/or rotated image, i.e., the FR image of the restored output should be the same as the restored output of the FR input:

$$ f(g(Y)) = g(f(Y)) \tag{1} $$

where $g$ is the FR operation. Most prior based image restoration methods satisfy this condition, because the FR operations do not affect the image prior such as gradient distribution or sparsity.

However, this condition does not hold for CNN based image restoration methods. Although they augment the training data by FR operations [20, 16], this does not force the trained convolution filters to be spatially symmetric, which is needed for FR invariance. Therefore, they produce different results for the FR images, and it is thus worthwhile to construct FR committees, whose specific operations are summarized in Table I. In detail, we make member inputs $\{g_{k}(Y)\}$ and their corresponding member outputs $\{g_{k}^{-1}(f(g_{k}(Y)))\}$. Ciregan et al. [24] showed that averaging the outputs of networks trained from different initial states can improve the performance. Following that study, we also average the member outputs to get the final output

$$ \hat{X}_{FR,I} = \frac{\sum_{k \in K} g_{k}^{-1}(f(g_{k}(Y)))}{|K|} \tag{2} $$

where $K$ is a subset of $\{1,2,\ldots,8\}$ and $\left|K\right|$ is the size of $K$.
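The committee average defined above can be sketched in a few lines of NumPy. This is only an illustrative sketch: `f` stands for any callable that plays the role of the trained network, the names `fr_transforms` and `self_committee_fr` are hypothetical, and the rotation direction (counterclockwise, as in `np.rot90`) and the "rotate, then flip" order are assumptions, since Table I does not specify them.

```python
import numpy as np

def fr_transforms():
    """Return 8 (forward, inverse) pairs ordered like k = 1..8 in Table I.

    Assumption: rotations are counterclockwise quarter turns and the flip
    is applied after the rotation.
    """
    pairs = []
    for quarter_turns in (0, 1, 2, 3):      # 0, 90, 180, -90 degrees
        for flip in (False, True):
            def forward(img, q=quarter_turns, fl=flip):
                out = np.rot90(img, k=q)
                return np.flipud(out) if fl else out

            def inverse(img, q=quarter_turns, fl=flip):
                out = np.flipud(img) if fl else img   # undo the flip first
                return np.rot90(out, k=-q)            # then undo the rotation

            pairs.append((forward, inverse))
    return pairs

def self_committee_fr(f, y, members=range(1, 9)):
    """Average g_k^{-1}(f(g_k(y))) over the chosen member indices k."""
    pairs = fr_transforms()
    outputs = []
    for k in members:
        forward, inverse = pairs[k - 1]
        outputs.append(inverse(f(forward(y))))
    return np.mean(outputs, axis=0)
```

Under these assumptions, `self_committee_fr(f, y, members=(1, 3, 5, 7))` would correspond to SCN-R in Table II, and the default corresponds to SCN-FR. If `f` happens to commute with every FR operation (for example, a purely pixel-wise function), all members coincide and the average returns `f(y)` unchanged, so any gain comes from `f` not being FR-equivariant.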

II-B Linearity

Some image degradation models, such as noise-free blurring or image downsampling, are assumed to be linear, $Y=XHV$, where $H$ is a blur kernel and $V$ is a resizing matrix. Therefore, it is natural that the corresponding restoration problems, i.e., deblurring or SISR, are also linear:

$$ f(\alpha Y + \beta) = \alpha f(Y) + \beta \tag{3} $$

for any scalars $\alpha$ and $\beta$. However, a neural network assumes that the mapping function is non-linear: it contains a bias term in every neuron and non-linear activation functions such as the rectified linear unit (ReLU). As a result, (3) does not hold for neural network based algorithms, which produce different outputs for scaled and/or biased inputs (even after the bias is removed and the scale is restored). Hence, we can prepare a committee whose member inputs use several different $\alpha$ and $\beta$, i.e., we construct the output as

$$ \hat{X}_{L} = \frac{\sum_{\alpha, \beta} \hat{x}_{\alpha, \beta}}{\sum_{\alpha, \beta} 1} \tag{4} $$

where

$$ \hat{x}_{\alpha, \beta} = \frac{f(\alpha Y + \beta) - \beta}{\alpha} . \tag{5} $$
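The linear committee just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: `f` is any callable standing in for the trained network, and the function name `linear_committee` and the `pairs` argument (a list of $(\alpha,\beta)$ tuples chosen by the caller) are hypothetical.

```python
import numpy as np

def linear_committee(f, y, pairs):
    """Average the member estimates (f(alpha*y + beta) - beta) / alpha.

    `pairs` is an iterable of (alpha, beta) tuples; alpha must be nonzero
    so that the input transform can be undone on the output.
    """
    members = []
    for alpha, beta in pairs:
        if alpha == 0:
            raise ValueError("alpha must be nonzero to invert the transform")
        restored = f(alpha * y + beta)              # run the network on the transformed input
        members.append((restored - beta) / alpha)   # map the output back
    return np.mean(members, axis=0)
```

For a mapping that really satisfies the linearity condition, every member equals `f(y)` and the committee changes nothing; a clipping or ReLU-like `f` breaks the condition, so the members differ and the average is a genuinely new estimate.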

However, we cannot freely set $\alpha$ and $\beta$ in the noisy environment $Y=XHV+N$, where $N$ is the noise, because the scaling $\alpha$ changes the noise characteristics. Assuming that the noise distribution is zero-mean and symmetric, we can use just two committee members, $\{(\alpha,\beta)\}=\{(1,0),(-1,1)\}$, in the noisy environment so as not to scale the noise component. Specifically, we obtain the output as

$$ \hat{X}_{I} = \frac{f(Y) + (1 - f(1 - Y))}{2} \tag{6} $$

which maintains the range of input pixel values, on which the network is trained and works best.
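The reason the pair $(\alpha,\beta)=(-1,1)$ leaves the noise level untouched can be spelled out in a few lines. The sketch below assumes the pure denoising setting $Y = X + N$ with pixel values in $[0,1]$ (so $H$ and $V$ are omitted), which is a simplification of the model stated above.

```latex
\begin{aligned}
1 - Y &= 1 - (X + N) = (1 - X) + (-N), \\
-N &\overset{d}{=} N \quad \text{(zero-mean, symmetric noise)}, \\
f(1 - Y) &\approx 1 - X \;\Longrightarrow\; 1 - f(1 - Y) \approx X .
\end{aligned}
```

Under this assumption, the inverted input is itself a noisy observation of the clean image $1 - X$ at the same noise level, and re-inverting the network output gives a second estimate of $X$ that can be averaged with $f(Y)$.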

Since linearity and FR invariance are independent properties, they can also be combined to make a larger committee as

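$$ \hat{X}_{Full} = \frac{\sum_{\alpha, \beta} \sum_{k \in K} g_{k}^{-1}(f(g_{k}(\alpha Y + \beta))) - \beta}{\sum_{\alpha, \beta} \alpha |K|} \tag{7} $$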
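A hedged sketch of the combined committee for the denoising case, where the linear part reduces to the inversion pair $\{(1,0),(-1,1)\}$ and the committee has 16 members (SCN-Full in Table II). The sketch interprets the combination per member: each output is first mapped back through the inverse FR operation and then through the inverse linear transform before averaging. The function name `full_committee`, the assumption that pixel values lie in $[0,1]$, and the rotation and flip conventions are illustrative choices, not taken from the source.

```python
import numpy as np

def full_committee(f, y):
    """Average 16 outputs: 8 FR operations x {original, inverted} input.

    Assumes y has pixel values in [0, 1], so inversion is 1 - y.
    """
    outputs = []
    for quarter_turns in range(4):
        for flip in (False, True):
            for invert in (False, True):
                z = 1.0 - y if invert else y          # (alpha, beta) = (-1, 1) or (1, 0)
                z = np.rot90(z, k=quarter_turns)      # forward FR operation
                z = np.flipud(z) if flip else z
                out = f(z)                            # single shared network
                out = np.flipud(out) if flip else out # inverse FR operation
                out = np.rot90(out, k=-quarter_turns)
                outputs.append(1.0 - out if invert else out)  # undo the inversion
    return np.mean(outputs, axis=0)
```

The network is called 16 times per image, so the inference cost grows linearly with the committee size while the number of trained networks stays at one.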

III Experimental Results

We conducted experiments on two types of image restoration: image denoising and SISR. The performance is evaluated by the peak signal-to-noise ratio (PSNR) [27] and the improved PSNR (IPSNR) compared to the base network. We test the 6 types of committees summarized in Table II.
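For reference, the two metrics can be computed as in the sketch below. PSNR follows its standard definition; IPSNR is interpreted here as the PSNR of the committee output minus the PSNR of the base network output, which is an assumption based on the wording above. The function names and the `peak` parameter (1.0 for images scaled to $[0,1]$, 255 for 8-bit images) are illustrative.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def ipsnr(reference, base_output, committee_output, peak=1.0):
    """PSNR gain (dB) of the committee output over the base network output."""
    return psnr(reference, committee_output, peak) - psnr(reference, base_output, peak)
```

With this reading, the "Average" row of Table III gives an IPSNR of 29.95 - 29.87 = 0.08 dB for SCN-Full over DnCNN at $\sigma=30$.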

III-A Experiments on image denoising network

For Gaussian image denoising, we use DnCNN [20] as the base network because of its promising performance and short run-time on GPU. The test set is shown in Fig. 2; it consists of 12 images that are widely used for testing image denoising. Fig. 3 summarizes the average IPSNR for various noise levels, and Table III shows the PSNR results on all test images with $\sigma=30$.

The results suggest the following:

• Employing additional committee members always improves the performance.

• The information in an image is severely distorted at a high noise level. Therefore, a single network alone can hardly be optimal, and adding committee members is more beneficial at higher noise levels.

In order to analyze the improvement in terms of the feature space, we extracted feature maps from an original image and its inverted version, as illustrated in Fig. 4. As shown in Fig. 4(b), the low-level feature maps of an inverted image are similar to the inversion of the original feature maps. However, the high-level features show somewhat different characteristics. The original feature map and the inverted-image feature map are similar in some cases (the first and third rows), but in other cases they show weak correlation (the second and fourth rows). Moreover, the output for the inverted image is re-inverted to the original image space, and therefore the two feature maps are distinct in the end. This implies that the role of a committee is to expand the feature maps and enable more accurate processing, rather than just to augment the input.

III-B Experiments on a single image super-resolution network

We also test the proposed SCN framework on SISR. To show robustness to the choice of base network, we use SRCNN [18] as the base network. We adopt two test datasets (Set5 and Set14) with three scaling factors (2, 3, and 4). Four committees from Table II are tested: SCN-FR, SCN-I, SCN-L, and SCN-Full. For SCN-L, we set the parameters $\alpha$ and $\beta$ to

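$$ \alpha \in \left\{ \max(X) - \min(X),\ 1,\ \frac{1}{\max(X) - \min(X)} \right\} $$

$$ \beta = (1 - \alpha)\,\mathrm{mean}(X) . $$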

By using these values, we can maintain the mean pixel value and prevent pixel value saturation. Table IV lists the average PSNRs of the different committees and Fig. 5 presents an example. As the results show, the committee is beneficial for various image restoration tasks and network formulations. Since the activation function (ReLU) keeps the linearity over a large range, scaling and shifting the input does not make a notable difference. On the other hand, the inversion reverses the signs of the feature maps and thus draws out information that is discarded by the original network. Hence, SCN-I generally yields a higher PSNR than SCN-L, which is just a scaling-based committee.
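The mean-preservation property of this parameter choice can be checked directly. The derivation below assumes that the mean is taken over the same image that is being transformed, written here as $Z$; the source writes the statistics in terms of $X$, so this identification is an assumption.

```latex
\operatorname{mean}(\alpha Z + \beta)
  = \alpha \operatorname{mean}(Z) + (1 - \alpha)\operatorname{mean}(Z)
  = \operatorname{mean}(Z),
\qquad
\alpha Z + \beta = \operatorname{mean}(Z) + \alpha\,\bigl(Z - \operatorname{mean}(Z)\bigr).
```

The second form shows that, with $\beta = (1-\alpha)\operatorname{mean}(Z)$, the transform only stretches or compresses the contrast around the mean by the factor $\alpha$.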

IV Conclusion

In this letter, we have presented a self-committee method to improve the performance of CNN based image restoration algorithms. Unlike existing approaches that use several differently trained networks as committee members, we use a single network and take its outputs for transformed inputs as the committee members. The transformed inputs induce feature maps that differ from those of the original input, and thus produce outputs with different characteristics. Hence, averaging the outputs from differently transformed inputs can enhance the restoration performance. Experiments show that the proposed method enhances the performance of state-of-the-art image denoising and SISR networks.

Bibliography

[1] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
[2] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, “An iterative regularization method for total variation-based image restoration,” Multiscale Modeling & Simulation, vol. 4, no. 2, pp. 460–489, 2005.
[3] Y. Weiss and W. T. Freeman, “What makes a good model of natural images?” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1–8.
[4] S. G. Chang, B. Yu, and M. Vetterli, “Adaptive wavelet thresholding for image denoising and compression,” IEEE Transactions on Image Processing, vol. 9, no. 9, pp. 1532–1546, 2000.
[5] N. Remenyi, O. Nicolis, G. Nason, and B. Vidakovic, “Image denoising with 2D scale-mixing complex wavelet transforms,” IEEE Transactions on Image Processing, vol. 23, no. 12, pp. 5165–5174, 2014.
[6] S. Roth and M. J. Black, “Fields of experts: A framework for learning image priors,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2005, pp. 860–867.
[7] X. Lan, S. Roth, D. Huttenlocher, and M. J. Black, “Efficient belief propagation with learned higher-order Markov random fields,” in European Conference on Computer Vision (ECCV), 2006, pp. 269–282.
[8] S. Z. Li, Markov random field modeling in image analysis. Springer Science & Business Media, 2009.