TL;DR
This paper introduces weighted proximal methods (WPMs) to improve the computational efficiency of the RED framework by reducing the number of denoiser calls needed for inverse problem solving.
Contribution
It demonstrates that existing RED solvers are special cases of WPMs and proposes variants that significantly decrease computation time.
Findings
WPMs encompass existing RED solvers as special cases.
Variants of WPM reduce denoiser calls and computation time.
Numerical experiments confirm improved efficiency of WPM-based RED solutions.
Abstract
REgularization by Denoising (RED) is an attractive framework for solving inverse problems by incorporating state-of-the-art denoising algorithms as the priors. A drawback of this approach is the high computational complexity of denoisers, which dominate the computation time. In this paper, we apply a general framework called weighted proximal methods (WPMs) to solve RED efficiently. We first show that two recently introduced RED solvers (using the fixed point and accelerated proximal gradient methods) are particular cases of WPMs. Then we show by numerical experiments that slightly more sophisticated variants of WPM can lead to reduced run times for RED by requiring a significantly smaller number of calls to the denoiser.
| Image | FP-MPE | APG | WPM |
|---|---|---|---|
| Butterfly | | | |
| Boats | | | |
| House | | | |
| Parrot | | | |
| Lena | | | |
| Barbara | | | |
| Peppers | | | |
| Leaves | | | |
Solving RED with Weighted Proximal Methods
Tao Hong, Irad Yavneh, and Michael Zibulevsky
T. Hong, I. Yavneh, and M. Zibulevsky are with the Department of Computer Science, Technion-Israel Institute of Technology, Haifa, 32000 Israel (email: {hongtao,irad,mzib}@cs.technion.ac.il).
Index Terms:
Inverse problem, denoising algorithms, RED, weighted proximal methods, weighting.
I Introduction
THE goal of inverse problems is to recover an unknown signal $x \in \mathbb{R}^N$ from an indirect measurement $y \in \mathbb{R}^M$. The measurement is commonly modelled as $y = \mathcal{H}(x) + e$, where $\mathcal{H}$ denotes an abstract operator and $e$ is often assumed to be white Gaussian noise with mean zero and variance $\sigma^2$. In this paper, we assume $\mathcal{H}$ to be a linear operator, $H \in \mathbb{R}^{M \times N}$ with $M \le N$, and focus on natural images. Lacking any prior knowledge about the signal $x$, we may reconstruct $x$ via the maximum likelihood (ML) minimization problem,
$$\hat{x}_{\mathrm{ML}} = \arg\min_{x} \frac{1}{2\sigma^2}\|Hx - y\|_2^2. \tag{1}$$
However, it is well-known that this approach is not generally useful. Even in the simple denoising problem, where $H$ is the identity matrix, ML results in $\hat{x}_{\mathrm{ML}} = y$, that is, we simply recover the noisy image. Furthermore, quite often $M < N$, resulting in infinitely many solutions, and even if this is not the case, $H$ may be highly ill-conditioned. For these reasons, the prevalent approach is to assume that the signal is sampled from some prior distribution, and to employ the maximum a posteriori probability (MAP) estimator, as formulated in Section II. In our setting, MAP will result in adding to the right-hand side of (1) a term $\lambda R(x)$, where $\lambda > 0$ is a parameter and $R(x)$ is a regularization term as discussed below. This approach has been applied with a large variety of priors, such as $\ell_2$-based regularization [1], wavelets [2], total variation [3], kernel regularization [4], sparsity [5], and neural networks [6].
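The two failure modes of ML described above can be seen in a small NumPy sketch. The matrix sizes and random data are illustrative assumptions, not taken from the paper. With fewer measurements than unknowns, adding a null-space vector of $H$ to a least-squares solution leaves the ML objective unchanged, and with $H$ equal to the identity the ML estimate is the noisy measurement itself.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 5                      # toy sizes: fewer measurements than unknowns
H = rng.standard_normal((M, N))
y = rng.standard_normal(M)

def ml_objective(x, H, y, sigma=1.0):
    """ML data-fidelity term ||Hx - y||^2 / (2 sigma^2)."""
    r = H @ x - y
    return float(r @ r) / (2.0 * sigma**2)

# Minimum-norm least-squares solution of Hx = y.
x_ls = np.linalg.lstsq(H, y, rcond=None)[0]

# A null-space direction of H: the last right-singular vector (N > M).
_, _, Vt = np.linalg.svd(H)
null_vec = Vt[-1]

# A very different signal that fits the measurement equally well.
x_other = x_ls + 10.0 * null_vec

# Denoising case (H = I): the ML estimate equals the noisy input.
x_denoise = np.linalg.lstsq(np.eye(M), y, rcond=None)[0]
```

Both `x_ls` and `x_other` attain the same objective value, which is why a prior is needed to select among them.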
Naturally, the most widely studied problem in this framework is image denoising, e.g., [7, 5, 8, 6, 9]. Indeed, recent work suggests that the performance of leading image denoisers is close to a possible ceiling [10, 11, 12]. The availability of such powerful denoising algorithms has motivated researchers to seek ways to employ denoisers as priors for quite general inverse problems. The authors in [13, 14, 15] “manually” adopted priors used in existing denoisers for specific alternative inverse problems. Following this path, several authors proposed a general framework, called Plug-and-Play Priors ($P^3$) [16, 17], for using the abundance of high-performance image denoisers as priors for other inverse problems. These authors formulate inverse problems as an optimization task and employ an Alternating Direction Method of Multipliers (ADMM) algorithm to tackle the corresponding minimization problem [18]. The image denoising algorithm is incorporated in each step of ADMM as an implicit prior.
Motivated by $P^3$, Romano et al. introduced REgularization by Denoising (RED) [19], which defines an optimization problem that includes the denoiser as an explicit prior. Given a differentiable denoiser $f(x)$, RED employs the following prior,
$$R(x) = \frac{1}{2} x^T \big(x - f(x)\big), \tag{2}$$
where $(\cdot)^T$ denotes transpose. Under two assumptions this leads to a convex minimization problem, and standard gradient-based iterative methods are guaranteed to converge to a global minimum. Further details are provided in Section II.
Using state-of-the-art denoisers to construct priors is appealing, as it enables us to exploit the vast progress in denoising algorithms for addressing general inverse problems, and RED is a good framework to achieve this goal due to its flexibility. However, RED may be relatively expensive because at each iteration we must apply the denoising algorithm to evaluate the gradient, and the complexity of denoising algorithms is generally high. Indeed, the numerical experiments in [19] reveal this concern. In that paper the authors propose three solvers for RED, namely, steepest descent (SD), the fixed-point (FP) method and the ADMM scheme. Amongst these, the FP method is the most efficient, but it still needs hundreds of iterations to complete the recovery process.
Recently, [20] employed vector extrapolation to accelerate the FP method for RED, whereas [21] applied an accelerated proximal gradient (APG) algorithm, also known as FISTA [22] or Nesterov’s acceleration [23]. Both approaches are faster than FP for RED, but they still require dozens of iterations. In this paper, we propose a general framework called weighted proximal methods (WPMs) [24]. We show that FP and APG are in fact two particular variants of WPMs, and that by seeking a more effective weighting for WPMs we obtain a faster algorithm.
The rest of this paper is organized as follows. We review the RED framework and the FP and APG solvers in Section II. The general WPM scheme is introduced in Section III, and the choice of weighting is discussed. Numerical experiments on image deblurring and super-resolution tasks are presented in Section IV to demonstrate the efficiency of WPMs, followed by conclusions in Section V.
II REgularization by Denoising (RED)
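The MAP recovery process is formulated as follows:
$$\hat{x}_{\mathrm{MAP}} = \arg\max_{x} P(x \mid y) = \arg\max_{x} \frac{P(y \mid x)\, P(x)}{P(y)} = \arg\min_{x} \big\{ -\log P(y \mid x) - \log P(x) \big\}.$$
Assuming a robust Gibbs-like distribution of $x$, we have
$$P(x) = c \exp\big(-\lambda R(x)\big),$$
where $R(x)$ denotes the so-called prior and $c$ is a scaling parameter. Note that small $R(x)$ corresponds to highly probable signals. If the noise $e$ is sampled from white Gaussian noise with mean zero and variance $\sigma^2$, then we have
$$-\log P(y \mid x) = \frac{1}{2\sigma^2}\|Hx - y\|_2^2 + \text{const}.$$
This leads to the following minimization problem [25],
$$\min_{x} E(x) = \frac{1}{2\sigma^2}\|Hx - y\|_2^2 + \lambda R(x). \tag{3}$$
Substituting the RED prior (2) into (3), we obtain
$$\min_{x} E(x) = \frac{1}{2\sigma^2}\|Hx - y\|_2^2 + \frac{\lambda}{2} x^T \big(x - f(x)\big). \tag{4}$$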
In [19] two assumptions are made regarding the image denoising algorithm $f(x)$ used in RED:
Assumption 1.
For any scalar $c$ arbitrarily close to $1$, $f(cx) = c f(x)$.
Assumption 2.
The spectral radius of the symmetric Jacobian $\nabla f(x)$ is upper bounded by one.
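Given Assumption 1, we have
$$\nabla f(x)\, x = f(x).$$
Hence, the gradient of $R(x)$ is the residual of the denoiser,
$$\nabla R(x) = x - f(x). \tag{5}$$
With (5), the gradient of $E(x)$ becomes
$$\nabla E(x) = \frac{1}{\sigma^2} H^T (Hx - y) + \lambda \big(x - f(x)\big). \tag{6}$$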
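As a sanity check of the gradient expressions (5) and (6), the sketch below evaluates the RED objective and its gradient for a toy linear denoiser $f(x) = Wx$, with $W$ symmetric and eigenvalues in $(0, 0.8)$. This stand-in denoiser, the sizes, and the parameter values are assumptions chosen for illustration because they satisfy both assumptions exactly; they are not the TNRD denoiser used in the experiments. The closed-form gradient is compared with central finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 6, 4
H = rng.standard_normal((M, N))
y = rng.standard_normal(M)
sigma, lam = 0.5, 0.2            # illustrative values only

# Toy denoiser: symmetric local averaging, eigenvalues in (0, 0.8),
# so f(cx) = c f(x) and the Jacobian W has spectral radius below one.
W = 0.4 * np.eye(N) + 0.2 * np.eye(N, k=1) + 0.2 * np.eye(N, k=-1)

def f(x):
    return W @ x

def E(x):
    """RED objective: data term plus (lam / 2) x^T (x - f(x))."""
    r = H @ x - y
    return r @ r / (2 * sigma**2) + 0.5 * lam * x @ (x - f(x))

def grad_E(x):
    """Closed-form gradient: H^T (Hx - y) / sigma^2 + lam (x - f(x))."""
    return H.T @ (H @ x - y) / sigma**2 + lam * (x - f(x))

def numerical_grad(fun, x, eps=1e-6):
    """Central finite-difference gradient, one coordinate at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (fun(x + d) - fun(x - d)) / (2 * eps)
    return g

x0 = rng.standard_normal(N)
max_err = np.max(np.abs(grad_E(x0) - numerical_grad(E, x0)))
```

For a denoiser that violates Assumption 1 or the symmetry of the Jacobian, the same comparison would generally show a mismatch, which is the concern raised in [21].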
Assumption 2 implies convexity of $R(x)$, and therefore of $E(x)$ as well. Hence, any solution of $\nabla E(x) = 0$ yields a global minimum. This is a nonlinear problem, and we therefore resort to iterative solvers. One such solver is the FP method mentioned above, which lags the nonlinear term $f(x)$:
$$\frac{1}{\sigma^2} H^T (H x_{k+1} - y) + \lambda \big(x_{k+1} - f(x_k)\big) = 0. \tag{7}$$
We note that (7) can efficiently be solved for $x_{k+1}$ exactly in the Fourier domain if $H$ is block-circulant, or treated iteratively for a general $H$. The FP method can be accelerated using the APG approach as described in the following algorithm. Further discussion of APG can be found in [21].
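The FP iteration can be sketched in a few lines for a small dense problem. This is a minimal illustration that solves the linear system in (7) directly with `numpy.linalg.solve` rather than in the Fourier domain. The toy linear denoiser, the problem sizes, and the iteration count are assumptions made for the example.

```python
import numpy as np

def fp_red(H, y, denoiser, sigma, lam, x0, n_iter=200):
    """Fixed-point RED: each iteration solves
    (H^T H / sigma^2 + lam I) x_{k+1} = H^T y / sigma^2 + lam f(x_k),
    which costs exactly one denoiser call."""
    N = H.shape[1]
    A = H.T @ H / sigma**2 + lam * np.eye(N)   # system matrix is fixed
    b = H.T @ y / sigma**2
    x = x0.copy()
    for _ in range(n_iter):
        x = np.linalg.solve(A, b + lam * denoiser(x))
    return x

rng = np.random.default_rng(2)
N, M = 8, 6
H = rng.standard_normal((M, N))
y = rng.standard_normal(M)
sigma, lam = 1.0, 0.5

# Toy symmetric linear denoiser with eigenvalues in (0, 0.8) (assumption).
W = 0.4 * np.eye(N) + 0.2 * np.eye(N, k=1) + 0.2 * np.eye(N, k=-1)
denoiser = lambda x: W @ x

x_fp = fp_red(H, y, denoiser, sigma, lam, np.zeros(N))

# Gradient of the RED objective at the result; it should be close to zero.
grad = H.T @ (H @ x_fp - y) / sigma**2 + lam * (x_fp - denoiser(x_fp))
```

Because the system matrix does not change between iterations, a practical implementation would factor it once (or diagonalize it with the FFT when $H$ is block-circulant) and reuse that work.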
III Weighted Proximal Methods
Consider the following composite problem and assume its solution set is nonempty,
$$\min_{x} F(x) = g(x) + h(x), \tag{8}$$
where $g$ and $h$ are convex and differentiable. Denote the proximal operator by
$$\mathrm{prox}_{h}^{B}(x) = \arg\min_{u} \Big\{ h(u) + \frac{1}{2}\|u - x\|_{B}^{2} \Big\}, \tag{9}$$
where $B$ is a symmetric positive definite matrix called the weighting and $\|\cdot\|_B$ denotes the $B$-norm, $\|q\|_B = \sqrt{q^T B q}$. With these, we describe the explicit form of WPMs for (8) in Algorithm 2 [24, Chap. 10.7.5]. Note that when the weighting is a positive multiple of the identity, we recover the proximal gradient (PG) method. Usually, PG is used for (8) when $h$ is nonsmooth [26], whereas here we use it even though $h$ is differentiable. We do this for computational efficiency, knowing that applying the denoiser is the most expensive part of the solution process.
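To illustrate the weighted proximal operator, the sketch below evaluates it in closed form for a quadratic function. It assumes the common definition $\mathrm{prox}_h^B(x) = \arg\min_u \{h(u) + \tfrac{1}{2}(u - x)^T B (u - x)\}$ and the hypothetical choice $h(u) = \tfrac{1}{2}\|Au - b\|_2^2$. The matrices `A`, `B` and all sizes are made up for the example.

```python
import numpy as np

def weighted_prox_quadratic(x, B, A, b):
    """Weighted prox of h(u) = 0.5 * ||A u - b||^2 at the point x.

    Setting the gradient of h(u) + 0.5 * (u - x)^T B (u - x) to zero
    gives the linear system (A^T A + B) u = A^T b + B x.
    """
    return np.linalg.solve(A.T @ A + B, A.T @ b + B @ x)

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((4, n))
b = rng.standard_normal(4)
L = rng.standard_normal((n, n))
B = L @ L.T + n * np.eye(n)        # symmetric positive definite weighting
x = rng.standard_normal(n)

u = weighted_prox_quadratic(x, B, A, b)

def prox_objective(v):
    """The function minimized by the weighted prox at x."""
    d = v - x
    return 0.5 * np.sum((A @ v - b) ** 2) + 0.5 * d @ B @ d
```

A large weighting keeps `u` close to `x`, while a small one lets `u` move toward the minimizer of `h`, so the weighting controls how far each proximal step may travel in each direction.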
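To apply Algorithm 2 to RED, we set $g(x) = \lambda R(x)$ and $h(x) = \frac{1}{2\sigma^2}\|Hx - y\|_2^2$. If $h$ is convex, solving (9) is equivalent to satisfying the first-order optimality condition,
$$\nabla h(u) + B(u - x) = 0. \tag{10}$$
Substituting $a_k h$ for $h$, $B = B_k$, and $x = x_k - a_k B_k^{-1} \nabla g(x_k)$, where $a_k$ is the step-size, at the $k$th iteration into (10) and rearranging, we obtain
$$\Big(\frac{a_k}{\sigma^2} H^T H + B_k\Big) x_{k+1} = \frac{a_k}{\sigma^2} H^T y + B_k x_k - a_k \lambda \big(x_k - f(x_k)\big). \tag{11}$$
In this paper, we use the conjugate gradient (CG) method to approximately solve (11) for $x_{k+1}$.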
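One WPM iteration with an inner CG solve might be sketched as follows. The form of the linear system, the placement of the step-size `a_k`, and the matrix-free callback `apply_B` are assumptions of this sketch, chosen so that a weighting of $\lambda I$ with unit step-size gives back the FP update. The CG routine is a plain textbook version, and the toy denoiser and sizes are illustrative.

```python
import numpy as np

def cg(apply_A, b, x0, n_iter=50, tol=1e-10):
    """Plain conjugate gradient for a symmetric positive definite operator."""
    x = x0.copy()
    r = b - apply_A(x)
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def wpm_step(x_k, H, y, denoiser, apply_B, sigma, lam, a_k=1.0):
    """One WPM iteration: a single denoiser call, then CG on
    (a_k H^T H / sigma^2 + B_k) x = a_k H^T y / sigma^2 + B_k x_k
                                    - a_k lam (x_k - f(x_k))."""
    grad_g = lam * (x_k - denoiser(x_k))            # the only denoiser call
    rhs = a_k * (H.T @ y) / sigma**2 + apply_B(x_k) - a_k * grad_g
    apply_A = lambda v: a_k * (H.T @ (H @ v)) / sigma**2 + apply_B(v)
    return cg(apply_A, rhs, x_k)                    # warm start at x_k

rng = np.random.default_rng(4)
N, M = 8, 6
H = rng.standard_normal((M, N))
y = rng.standard_normal(M)
sigma, lam = 1.0, 0.5
W = 0.4 * np.eye(N) + 0.2 * np.eye(N, k=1) + 0.2 * np.eye(N, k=-1)
denoiser = lambda x: W @ x                          # toy linear denoiser
apply_B = lambda v: lam * v                         # weighting lam * I

x = np.zeros(N)
for _ in range(300):
    x = wpm_step(x, H, y, denoiser, apply_B, sigma, lam)
grad = H.T @ (H @ x - y) / sigma**2 + lam * (x - denoiser(x))
```

The inner CG loop only applies $H$, $H^T$ and the weighting, so its iterations add no denoiser calls.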
Next we discuss possible practical choices for the weighting $B_k$. Note first that if we set $B_k = \lambda I$, where $I$ is the identity matrix, and select the step-size $a_k = 1$, (11) is reduced to (7) and we recover the FP method. Moreover, by using the accelerated version of Algorithm 2 (cf. [24, Chap. 10.7.5]) we get APG [21]. We now propose a more elaborate approach of choosing some approximation to the Hessian of $g$ as the weighting. (Because of the abstract denoiser in $g$, the exact Hessian is not computable.) Specifically, we choose the symmetric-rank-one (SR1) approximation to the Hessian [27, Chap. 6.2], as is used in quasi-Newton methods. The SR1 approximation is described in Algorithm 3. This choice yields faster convergence in our experiments than either FP or APG, as shown below. We henceforth use WPM to denote Algorithm 2 with the weighting chosen by Algorithm 3.
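The reduction of the weighted proximal update to FP can be checked numerically. The sketch assumes the update solves $(a H^T H / \sigma^2 + B)\, x_{k+1} = a H^T y / \sigma^2 + B x_k - a \lambda (x_k - f(x_k))$, where the symbol `a` for the step-size and the toy linear denoiser are choices made for this illustration. With `B = lam * I` and `a = 1` the result coincides with one FP step.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 7, 5
H = rng.standard_normal((M, N))
y = rng.standard_normal(M)
sigma, lam = 0.8, 0.3
I = np.eye(N)

# Toy symmetric linear denoiser (assumption for the example).
W = 0.4 * I + 0.2 * np.eye(N, k=1) + 0.2 * np.eye(N, k=-1)
f = lambda x: W @ x

def wpm_update(x_k, B, a):
    """Weighted proximal update with weighting B and step-size a."""
    lhs = a * H.T @ H / sigma**2 + B
    rhs = a * H.T @ y / sigma**2 + B @ x_k - a * lam * (x_k - f(x_k))
    return np.linalg.solve(lhs, rhs)

def fp_update(x_k):
    """Fixed-point update: the denoiser is evaluated at the old iterate."""
    lhs = H.T @ H / sigma**2 + lam * I
    rhs = H.T @ y / sigma**2 + lam * f(x_k)
    return np.linalg.solve(lhs, rhs)

x_k = rng.standard_normal(N)
x_wpm = wpm_update(x_k, lam * I, 1.0)
x_fp = fp_update(x_k)
```

The terms `B @ x_k` and `-lam * x_k` cancel for this weighting, leaving exactly the FP right-hand side.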
Unlike the traditional SR1, we formulate each $B_k$ from the initial $B_0$ rather than from the previous iterate [27]. Moreover, we scale by as suggested in [28], which we found useful in practice. In the practical implementation of Algorithm 3, we efficiently represent $B_k$ as a matrix-vector multiplication operator rather than as an explicit matrix.
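A matrix-free SR1 weighting rebuilt from a scaled identity at every iteration could look like the sketch below. It uses the textbook SR1 formula [27, Chap. 6.2] and is not a transcription of Algorithm 3. The scaling `tau`, the safeguard constant `r`, and the names `s` (iterate difference) and `yv` (gradient difference) are hypothetical choices for this illustration.

```python
import numpy as np

def sr1_operator(s, yv, tau, r=1e-8):
    """Return the map v -> B v with B = tau*I + u u^T / (u^T s), u = yv - tau*s.

    B is rebuilt from B_0 = tau*I at every call and never formed explicitly.
    The rank-one term is skipped when u^T s is not safely positive, which
    keeps B symmetric positive definite.
    """
    u = yv - tau * s
    denom = u @ s
    if denom <= r * np.linalg.norm(u) * np.linalg.norm(s):
        return lambda v: tau * v
    return lambda v: tau * v + u * ((u @ v) / denom)

rng = np.random.default_rng(7)
n = 6
L = rng.standard_normal((n, n))
G = L @ L.T + np.eye(n)            # Hessian of a toy strongly convex quadratic
s = rng.standard_normal(n)         # step x_{k+1} - x_k
yv = G @ s                         # gradient difference along that step
tau = 0.5 * (s @ yv) / (s @ s)     # hypothetical scaling of B_0

apply_B = sr1_operator(s, yv, tau)
```

The returned operator satisfies the secant condition `apply_B(s) == yv` up to rounding, and each application costs two inner products, so it fits directly into a CG solve.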
In general, the step-size in Algorithm 2 needs to be chosen by some line search process to guarantee monotonically decreasing objective values at each iteration. However, because evaluating the objective value in RED requires calling the denoiser, standard line search methods may dramatically increase the complexity of the algorithm. To maintain a low computational cost, we fix and reduce the step-size by half only if the objective value exhibits a relative growth above some threshold, i.e., , where we use in all our experiments. In practice, we found that we never needed to reduce the step-size.
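The step-size safeguard described above can be sketched as a small helper. The exact growth criterion and threshold used in the paper are not reproduced here: the relative-growth formula and the `threshold` argument are assumptions of this example.

```python
def update_step_size(a, obj_prev, obj_new, threshold):
    """Halve the step-size only if the objective grew, relative to its
    previous value, by more than `threshold`; otherwise keep it."""
    scale = max(abs(obj_prev), 1e-12)      # guard against division by zero
    rel_growth = (obj_new - obj_prev) / scale
    return 0.5 * a if rel_growth > threshold else a

# Hypothetical objective values, for illustration only.
a_small_growth = update_step_size(1.0, 10.0, 10.5, threshold=0.1)
a_large_growth = update_step_size(1.0, 10.0, 12.0, threshold=0.1)
a_decrease = update_step_size(1.0, 10.0, 9.0, threshold=0.1)
```

Compared with a backtracking line search, this rule needs at most one objective evaluation per iteration, which matters because each evaluation involves a denoiser call.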
In this paper we only investigate the SR1 approximation to the Hessian of $g$. We acknowledge that a more accurate Hessian estimate may prove to be even more cost-effective for RED, but leave such investigation to future work. Because we use an approximate Hessian for the weighting, our algorithm is equivalent to a quasi-Newton proximal method. It follows that if both $g$ and $h$ are strongly convex and their gradients are Lipschitz continuous, WPM with SR1 estimation, an appropriate step-size $a_k$, and exact solution of (9) converges linearly; see details in [29]. Because we depart from these strict requirements for efficiency, we cannot claim provable convergence in our implementation. However, in all our experiments WPM converged. Finally, we note that [21] challenges the validity in practice of the underlying assumptions of RED for most denoisers, concluding that (6) is not truly the gradient of (4). Nevertheless, setting (6) to zero, as is the objective of all the algorithms we discuss here, remains a most attractive method for signal recovery.
IV Numerical Experiments
In this section we investigate the performance of solvers for RED. Following [19], we perform our tests on image deblurring and super-resolution tasks and use the trainable nonlinear reaction diffusion (TNRD) [6] method as the abstract denoiser. We remark that one can adopt deep denoising techniques instead of TNRD, since the differentiability requirement of the denoiser is not mandatory in practice [21]. This may possibly lead to improved results in practice, but we do not investigate such options here. Also, since the authors in [19] already show the superiority of RED for image deblurring and super-resolution tasks compared with other popular algorithms, we largely omit such comparisons in this paper and concentrate on computational efficiency. Moreover, the experiments conducted in [20] demonstrated that the FP method converges faster than LBFGS and Nesterov’s acceleration for RED. Therefore, we only compare WPM to FP [19], FP-MPE [20], and APG [21]. All of the experiments are carried out on a laptop with Intel iU CPU @2.50GHz and 8GB RAM.
For image deblurring, the image is degraded by convolving with a point spread function (PSF), uniform blur or a Gaussian blur with a standard deviation , and then adding Gaussian noise with mean zero and . The recovered peak-signal-to-noise ratio (PSNR) versus the number of denoiser evaluations (left column) and running time (right column) when using RED for the “Starfish” image are shown in Figure 2. We find that the performances of FP-MPE and APG are similar, whereas WPM is more efficient than both, requiring fewer denoiser evaluations and less running time to achieve a comparable PSNR. These results also indicate that the denoiser indeed dominates the complexity of solving RED.
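For reference, PSNR can be computed as in the sketch below. The peak value of 255 assumes 8-bit intensities; the paper's exact convention (for example, which channel is evaluated) is not stated in this text, so this is only the standard definition.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return 10.0 * np.log10(peak**2 / mse)

# Toy 4x4 "images": a constant offset of one gray level gives MSE = 1.
ref = np.full((4, 4), 100.0)
value = psnr(ref, ref + 1.0)
```

Higher PSNR means a smaller mean squared error relative to the reference image.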
Next, we test the algorithms on image super-resolution. A low-resolution image is generated by blurring a high-resolution image with a Gaussian kernel with standard deviation , and then downscaling by a factor of 3. To the resulting image we add Gaussian noise with mean zero and , resulting in our deteriorated image. The PSNR of the recovered fine-resolution image versus the number of denoiser evaluations (left) and running time (right) for the “Plants” image are presented in Figure 3. Again, we observe that WPM requires fewer denoiser evaluations and less running time to achieve a comparable PSNR.
Examining the performance of the algorithms further, we run them on eight additional images tested in [19]. For each image, we run the FP method with denoiser evaluations and take the final PSNR as a benchmark. Then we examine how many denoiser evaluations are needed for APG, FP-MPE and WPM, to achieve a similar PSNR. The results are listed in Table I. Evidently, with the exception of “Boats” and “House” in the deblurring task, we observe that WPM requires the smallest number of denoiser evaluations to achieve a comparable PSNR, demonstrating its efficiency for solving RED. Additionally, we present the recovered results of the “Starfish” and “Leaves” images from deblurring with uniform and Gaussian blurs, respectively, and the “Butterfly” image from super-resolution in Figure 1 to visualize the effectiveness of RED solved by WPM.
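The comparison protocol above, counting denoiser evaluations until a benchmark PSNR is reached, can be expressed as a small helper. The tolerance parameter and the PSNR trace in the usage lines are hypothetical; they are not results from the paper.

```python
def evals_to_reach(psnr_trace, benchmark, tol=0.0):
    """Smallest number of denoiser evaluations whose PSNR is within `tol` dB
    of the benchmark, or None if it is never reached.

    psnr_trace[i] is the PSNR after i + 1 denoiser evaluations.
    """
    for i, value in enumerate(psnr_trace):
        if value >= benchmark - tol:
            return i + 1
    return None

# Made-up trace, for illustration only.
trace = [20.0, 25.0, 29.5, 30.1]
n_exact = evals_to_reach(trace, benchmark=30.0)
n_loose = evals_to_reach(trace, benchmark=30.0, tol=0.5)
n_never = evals_to_reach(trace, benchmark=35.0)
```

Reporting evaluations rather than iterations makes the solvers comparable even when an iteration of one method costs more than one denoiser call.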
V Conclusion
In this paper, we propose a general framework for RED called weighted proximal methods (WPMs). By setting $B_k = \lambda I$ and $a_k = 1$, we retrieve the FP and APG methods. However, by choosing the weighting to be an approximation to the Hessian of $g$, we obtain a more efficient algorithm. The experiments on image deblurring and super-resolution tasks demonstrate that WPM with a simple and inexpensive approximation to the Hessian can substantially reduce the overall number of denoiser evaluations in the recovery process, usually resulting in significant speedup. In future work we aim to design better Hessian approximations in order to accelerate the computation further.
References
- [1] M. A. King, R. B. Schwinger, P. W. Doherty, and B. C. Penney, “Two-dimensional filtering of SPECT images using the Metz and Wiener filters,” Journal of Nuclear Medicine, vol. 25, no. 11, pp. 1234–1240, 1984.
- [2] A. Chambolle, R. A. De Vore, N.-Y. Lee, and B. J. Lucier, “Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage,” IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 319–335, 1998.
- [3] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
- [4] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011, pp. 479–486.
- [5] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, 2006.
- [6] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1256–1272, 2017.
- [7] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in Computer Vision and Pattern Recognition, CVPR, IEEE Computer Society Conference on, vol. 2, 2005, pp. 60–65.
- [8] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
