Motion Blur removal via Coupled Autoencoder

Kavya Gupta; Brojeshwar Bhowmick; Angshul Majumdar

arXiv:1812.09888·cs.CV·December 27, 2018

Motion Blur removal via Coupled Autoencoder

Kavya Gupta, Brojeshwar Bhowmick, Angshul Majumdar

PDF

TL;DR

This paper introduces a coupled autoencoder approach for motion blur removal, framing deblurring as a transfer learning problem, which enables faster, on-the-fly image restoration with improved quality over existing methods.

Contribution

It presents a novel formulation that recasts deblurring as transfer learning and employs a coupled autoencoder for simultaneous learning of weights and coupling maps.

Findings

01

Outperforms state-of-the-art techniques in image quality

02

Operates faster without costly inverse problems

03

Effective for real-time motion blur removal

Abstract

In this paper a joint optimization technique has been proposed for coupled autoencoder which learns the autoencoder weights and coupling map (between source and target) simultaneously. The technique is applicable to any transfer learning problem. In this work, we propose a new formulation that recasts deblurring as a transfer learning problem, it is solved using the proposed coupled autoencoder. The proposed technique can operate on-the-fly, since it does not require solving any costly inverse problem. Experiments have been carried out on state-of-the-art techniques, our method yields better quality images in shorter operating times.

Figures22

Click any figure to enlarge with its caption.

Tables1

Table 1. Table 1 : Comparison of PSNR and SSIM values

Blur type		Krishnan et al.[8]	Whyte et al.[11]	Pan et al.[9]	Xu et al.[10]	Xu et al.[19]	Ren et al.[1]	Proposed
Uniform blur	(PSNR)	23.7679	23.5257	23.6927	22.9265	25.6540	20.9342	30.8893
Uniform blur	(SSIM)	0.6929	0.6899	0.7015	0.6620	0.7708	0.7312	0.8787
Non Uniform blur	(PSNR)	20.3013	20.4161	19.6594	20.5548	19.9718	20.8226	29.6364
Non Uniform blur	(SSIM)	0.5402	0.5361	0.5345	0.5812	0.5692	0.7328	0.8711

Equations8

W_{D S}, W_{E S}, W_{D T}, W_{E T}, M argmin ∥ X_{S} - W_{D S} φ (W_{E S} X_{S}) ∥_{F}^{2} + ∥ X_{T} - W_{D T} φ (W_{E T} X_{T}) ∥_{F}^{2} + λ ∥ φ (W_{E T} X_{T}) - M φ (W_{E S} X_{S}) ∥_{F}^{2}

W_{D S}, W_{E S}, W_{D T}, W_{E T}, M argmin ∥ X_{S} - W_{D S} φ (W_{E S} X_{S}) ∥_{F}^{2} + ∥ X_{T} - W_{D T} φ (W_{E T} X_{T}) ∥_{F}^{2} + λ ∥ φ (W_{E T} X_{T}) - M φ (W_{E S} X_{S}) ∥_{F}^{2}

W_{D S}, W_{E S}, W_{D T}, W_{E T}, M, Z_{S}, Z_{T} argmin ∥ X_{S} - W_{D S} Z_{S} ∥_{F}^{2} + ∥ X_{T} - W_{D T} Z_{T} ∥_{F}^{2} + λ ∥ Z_{T} - M Z_{S} ∥_{F}^{2} + η ∥ Z_{S} - φ (W_{E S} X_{S}) - B_{S} ∥_{F}^{2} + η ∥ Z_{T} - φ (W_{E T} X_{T}) - B_{T} ∥_{F}^{2}

W_{D S}, W_{E S}, W_{D T}, W_{E T}, M, Z_{S}, Z_{T} argmin ∥ X_{S} - W_{D S} Z_{S} ∥_{F}^{2} + ∥ X_{T} - W_{D T} Z_{T} ∥_{F}^{2} + λ ∥ Z_{T} - M Z_{S} ∥_{F}^{2} + η ∥ Z_{S} - φ (W_{E S} X_{S}) - B_{S} ∥_{F}^{2} + η ∥ Z_{T} - φ (W_{E T} X_{T}) - B_{T} ∥_{F}^{2}

B_{S} \leftarrow φ (W_{E S} X_{S}) + B_{S} - Z_{S}

B_{S} \leftarrow φ (W_{E S} X_{S}) + B_{S} - Z_{S}

B_{T} \leftarrow φ (W_{E T} X_{T}) + B_{T} - Z_{T}

B_{T} \leftarrow φ (W_{E T} X_{T}) + B_{T} - Z_{T}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

MethodsSolana Customer Service Number +1-833-534-1729

Full text

MOTION BLUR REMOVAL VIA COUPLED AUTOENCODER

Abstract

In this paper a joint optimization technique has been proposed for coupled autoencoder which learns the autoencoder weights and coupling map (between source and target) simultaneously. The technique is applicable to any transfer learning problem. In this work, we propose a new formulation that recasts deblurring as a transfer learning problem; it is solved using the proposed coupled autoencoder. The proposed technique can operate on-the-fly; since it does not require solving any costly inverse problem. Experiments have been carried out on state-of-the-art techniques; our method yields better quality images in shorter operating times.

**Index Terms— ** autoencoder,coupling,deblurring,motion blur,split bregman

1 Introduction

Motion blurring is one of the common source of image corruption. It can be caused due to multiple reasons such as lens aberrations, camera shaking, long exposure time etc. The process of recovering sharp images from blurred images is termed as deblurring. This work proposes deblurring as a transfer learning problem, where the source domain is the blurred image and the target domain is the clean image. We solve it via coupled autoencoder. Our technique is generic and applicable for solving any inverse problem in imaging, e.g. denoising, inpainting, super-resolution etc. However, in this work we focus on deblurring. Despite significant progress in image deblurring, which is a well studied topic in computer vision, the existing deblurring algorithms often fail due to non-generalized kernel based approaches and also due to computational complexity. In this paper we recover the sharp image for a given blurred image with the help of a learning framework and hence can be generalized to broad class of blur. We focus on removal of the motion blur - uniform and non-uniform (space variant).

There are plethora of studies such as [1],[2],[3],[4] which use neural networks and CNNs frameworks for solving computer vision tasks. But there are only a few studies [5],[6],[7] which solve inverse problems via autoencoders. In [5], the authors used Sparse Stacked Autoencoders (SSAE) for denoising the images which can also be put in deblurring framework. The motivation for having fast deblurring techniques comes from the requirement of sharp and undistorted images for the purpose of SLAM (Simultaneous Localization and Mapping), visual odometry , optical flow etc. There is a constant problem of motion blur while acquiring images and videos with the cameras fitted to the high speed motion drones. Distorted images will intervene with the mapping of the visual points, hence the pose estimation and tracking will be corrupted. Usual deblurring techniques solve an iterative inverse problem. This yields good quality images, but precludes itself from real-time applications. Our proposed method is light weighted, learning source and target autoencoders with mapping between source and target simultaneously.

2 Literature Review

2.1 Deblurring Techniqes

Image deblurring techniques can be categorized into two types - blind and non-blind techniques. Non-blind techniques require priors about the blur kernel and it’s parameters whereas for blind deblurring techniques we assume that the kernel is unknown. Estimation of accurate kernels is detrimental for good deblurring especially in case of space variant blurs. Single image deblurring techniques jointly estimate the motion kernels and sharp image. [8],[9],[10] use sparsity priors to retreive latent sharp image for better kernel estimations. In [11], blur kernel is estimated in the camera motion space itself.

Recent development has been on devising learning based techniques for learning the degradation models. In [2] authors proposed an image deconvolution neural network for non-blind deconvolution which focuses on removal of uniform blur. In [1] a vectorization based CNN method was proposed which showed improvement on various high and low level vision tasks. [4] proposes a deep learning approach of predicting the motion kernel at patch level using a CNN.

2.2 Coupled Representation Learning

Coupled dictionary learning has a rich literature [12],[13],[14]. It has been applied to a wide range of problems in image synthesis, e.g., single image super-resolution, photo-sketch synthesis, cross spectral (RGB-NIR) face recognition, RGB-depth classification etc. It has also been used for trans-lingual information retrieval [15]. The main idea in coupled dictionary is learning of dictionaries (and corresponding coefficients) for the two domains - source and target, such that the coefficients from one domain can be linearly mapped to the other.

The concept of coupled autoencoder is new; it follows from dictionary learning. The main idea here is to learn an autoencoder for the source and another for the target along with a mapping from the source to the target (semi-coupled) and vice versa (fully coupled). There are only a handful of studies on this. In [7], it learns two deep stacked autoencoders for two domains - source (low resolution image) and target (high resolution image). These are learnt separately. Once the autoencoders are learnt a mapping from the deepest layer of the source autoencoder is learnt to the target autoencoder. The approach is piecemeal and hence sub-optimal. The autoencoders are learnt independently for the different domains and hence there is no feedback from one to the other during the learning stage. The mapping from the source to target is a stand-alone process which has no influence on the source and target autoencoder learning.

In another study [16], it learns coupled shallow autoencoders and stacks them up to form a deep architecture greedily. As in the prior work, their coupled autoencoder learning process is sub-optimal as well. The autoencoders (for source and target) are learnt separately; finally a mapping between the two is learnt, consequently the coupling has no bearing on the autoencoder training.

The only prior work that optimally learns the mapping during the autoencoder training process is [17]. However they use a marginalized denoising autoencoder, which is much simpler to solve compared to the full autoencoder having separate encoding and decoding layers.

3 Proposed Formulation

This is the first work that introduces an optimal formulation for coupled autoencoder. Unlike prior studies, the autoencoders for source and target will be learnt along with the linear mapping between the two. This is optimal in the sense that all the variables influence each other in the learning process - this has been missing in prior works.

Figure 1 shows the schematic diagram. The source autoencoder uses the blurred samples and the target autoencoder uses the corresponding clean samples. The coupling learns to map the representation from the source (blurred) to the target (clean). Mathematically this is expressed as,

[TABLE]

The first term ${\parallel X_{S}-W_{DS}\varphi(W_{ES}X_{S})\parallel}_{F}^{2}$ is the standard autoencoder formulation for the source. Here $W_{DS}$ corresponds to the decoder and $W_{ES}$ is the encoder. The second term ${\parallel X_{T}-W_{DT}\varphi(W_{ET}X_{T})\parallel}_{F}^{2}$ is the autoencoder for the target; $W_{DT}$ and $W_{ET}$ are the decoder and the encoder respectively. ${\parallel\varphi(W_{ET}X_{T})-M\varphi(W_{ES}X_{S})\parallel}_{F}^{2}$ is the coupling term. The linear map M couples the representation from the source to that of the target.

We solve it using the variable splitting technique. The first step is to introduce proxies for the representations, i.e. $Z_{S}=\varphi(W_{ES}X_{S})$ and $Z_{T}=\varphi(W_{ET}X_{T})$ and forming the Lagrangian. A better approach to handle this is to use the Split Bregman technique [18]. A Bregman relaxation variable is introduced such that the value of the hyper-parameter need not be changed; the relaxation variable is updated in every iteration automatically so that it enforces equality at convergence. The Split Bregman formulation is,

[TABLE]

Here $B_{S}$ and $B_{T}$ are Bregman relaxation variables. We invoke the alternating minimization method of multipliers [20] to segregate (2) into the following (simpler) sub-problems.

P1 : $\underset{W_{DS}}{\text{argmin}}{\parallel X_{S}-W_{DS}Z_{S}\parallel}_{F}^{2}$

P2 : $\underset{W_{DT}}{\text{argmin}}{\parallel X_{T}-W_{DT}Z_{T}\parallel}_{F}^{2}$

P3 : $\underset{W_{ES}}{\text{argmin}}{\parallel Z_{S}-\varphi(W_{ES}X_{S})-B_{S}\parallel}_{F}^{2}\\ \hskip 15.00002pt\equiv\underset{W_{ES}}{\text{argmin}}{\parallel\varphi^{-1}(Z_{S}-B_{S})-W_{ES}X_{S}\parallel}_{F}^{2}$

P4 : $\underset{W_{ET}}{\text{argmin}}{\parallel Z_{T}-\varphi(W_{ET}X_{T})-B_{T}\parallel}_{F}^{2}\\ \hskip 15.00002pt\equiv\underset{W_{ET}}{\text{argmin}}{\parallel\varphi^{-1}(Z_{T}-B_{T})-W_{ET}X_{T}\parallel}_{F}^{2}$

P5 : $\underset{Z_{S}}{\text{argmin}}{\parallel X_{S}-W_{DS}Z_{S}\parallel}_{F}^{2}+\lambda{\parallel Z_{T}-MZ_{S}\parallel}_{F}^{2}+\\ \hskip 25.00003pt\eta{\parallel Z_{S}-\varphi(W_{ES}X_{S})-B_{S}\parallel}_{F}^{2}$

P6 : $\underset{Z_{T}}{\text{argmin}}{\parallel X_{T}-W_{DT}Z_{T}\parallel}_{F}^{2}+\lambda{\parallel Z_{T}-MZ_{S}\parallel}_{F}^{2}+\\ \hskip 25.00003pt\eta{\parallel Z_{T}-\varphi(W_{ET}X_{T})-B_{T}\parallel}_{F}^{2}$

P7 : $\underset{M}{\text{argmin}}{\parallel Z_{T}-MZ_{S}\parallel}_{F}^{2}$

Sub-problems P1, P2, P5, P6, P7 are simple least squares problems having an analytic solution in the form of pseudo-inverse. Sub-problems P3 and P4 are originally non-linear least squares problems; but since the activation function is trivial to invert (it is applied element-wise), we can convert P3 and P4 to simple linear least squares problems and solve them using pseudo-inverse. The final step in every iteration is to update the Bregman relaxation variables. This is achieved by simple gradient descent.

[TABLE]

We have used two stopping criteria. Iterations stop when a specified maximum number of iterations is reached or when the objective function converges to a local minimum. By convergence we mean that the value of the objective function does not change much in successive iterations.

4 Results

For the experimental evaluation of our proposed method, we use images from standard image blur dataset [21]- CERTH dataset. It contains 630 undistorted, 220 naturally-blurred and 150 artificially-blurred images in the training set and 619 undistorted, 411 naturally blurred and 450 artificially blurred images in the evaluation set. For performance evaluation we take a small subset of the dataset. We take 50 images for training, 20 images for testing and corrupt them with the motion blur kernels. The algorithm is tested for two scenarios – uniform kernel and non-uniform kernel [22]. The images in the dataset have huge sizes so we did the deblurring patch wise, a protocol followed in [5]. For training and testing we divided the whole image into number of overlapping patches. Each individual patch acts as a sample. We do not do any preprocessing to the images or patches. We use the blurry patches to train the source autoencoder, the corresponding clean patches to train the target autoencoder and a mapping is learnt between the target and source representation. During testing, the blurred patch is given as the input of the source. The corresponding source representation is obtained from the source encoder from which the target representation is obtained by the coupling map. Once the target representation is obtained, the decoder of the target is used to generate the clean (deblurred) patch. Once all the individual patches of the testing image are deblurred we reconstruct the whole image by placing patches at their locations and averaging at the overlapping regions. We do not get blocking artefacts consequently we do not do any post processing to the recovered images.

We use patches of 40x40 with an overlapping of 20x20 for all the images. The input size for the autoencoders hence become 1600 and are learned with 1400 hidden nodes. There are just two parameters $\lambda$ and $\eta$ while training which do not require much tuning. Whereas the testing is parameter free. The testing flow is shown in the Figure 1.

We compare the proposed method with several state-of-the-art techniques with [8],[9],[10],[11],[19] which estimates the kernel and retreive deblurred image by deconvolution and [1] which learns a Vectorized CNN for deblurring. The consolidated results are shown in Table 1. The PSNR and SSIM values are averaged over for all the 20 testing images. We see that we consistently perform better than all the methods on both PSNR and SSIM for both uniform and non-uniform blur. Our experimentation showed that the framework can learn characteristics for different kinds of motion blur and yields good performances on them as well.

For visualizing the results, we show two sets of testing images. Figure 2 and Figure 3 show the result comparison on uniform blur and non-uniform blur removal problem respectively.

5 Conclusion

In this paper, we proposed an optimal formulation of learning coupled autoencoders while simultaneously learning a mapping between source and target autoencoders. Our method is generalized and is applicable for any tranfer learning problem. Through experimental evaluation we showed success of our proposed method on motion blurred images. Our method is computationaly inexpensive and deblur images in seconds.

Bibliography22

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] J. S. Ren and L. Xu, “On vectorization of deep convolutional neural networks for vision tasks,” ar Xiv preprint ar Xiv:1501.07338 , 2015.
2[2] L. Xu, J. S. Ren, C. Liu, and J. Jia, “Deep convolutional neural network for image deconvolution,” in Advances in Neural Information Processing Systems , 2014, pp. 1790–1798.
3[3] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE transactions on pattern analysis and machine intelligence , vol. 38, no. 2, pp. 295–307, 2016.
4[4] J. Sun, W. Cao, Z. Xu, and J. Ponce, “Learning a convolutional neural network for non-uniform motion blur removal,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2015, pp. 769–777.
5[5] K. Cho, “Simple sparsification improves sparse denoising autoencoders in denoising highly corrupted images,” in Proceedings of the 30th international conference on machine learning (ICML-13) , 2013, pp. 432–440.
6[6] X. J. Mao, C. Shen, and Y.B. Yang, “Image restoration using convolutional auto-encoders with symmetric skip connections,” ar Xiv preprint ar Xiv:1606.08921 , 2016.
7[7] K. Zeng, J. Yu, R. Wang, C. Li, and D. Tao, “Coupled deep autoencoder for single image super-resolution,” IEEE transactions on cybernetics , vol. 47, no. 1, pp. 27–37, 2017.
8[8] D. Krishnan, T. Tay, and R. Fergus, “Blind deconvolution using a normalized sparsity measure,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on . IEEE, 2011, pp. 233–240.