Convolutional Sparse Representations with Gradient Penalties
Brendt Wohlberg

TL;DR
This paper investigates convolutional sparse representations for image denoising, showing that gradient penalties on coefficient maps significantly improve their performance over traditional block-based methods.
Contribution
It introduces gradient penalties into convolutional sparse coding, enhancing noise removal capabilities beyond existing block-based approaches.
Findings
Gradient penalties improve convolutional sparse coding performance.
Convolutional representations with penalties outperform block-based methods in noise removal.
Gradient-penalized models achieve superior image reconstruction quality.
Abstract
While convolutional sparse representations enjoy a number of useful properties, they have received limited attention for image reconstruction problems. The present paper compares the performance of block-based and convolutional sparse representations in the removal of Gaussian white noise. While the usual formulation of the convolutional sparse coding problem is slightly inferior to the block-based representations in this problem, the performance of the convolutional form can be boosted beyond that of the block-based form by the inclusion of suitable penalties on the gradients of the coefficient maps.
Table 1: Denoising performance (PSNR in dB) on each test image, with method parameters individually optimised for each image.

| Method | Image 1 | Image 2 | Image 3 | Image 4 | Image 5 |
|---|---|---|---|---|---|
| BPDN | 29.47 | 32.91 | 30.08 | 31.73 | 30.19 |
| CBPDN | 29.31 | 32.70 | 29.76 | 31.27 | 30.09 |
| CBPDN + Grd | 29.28 | 32.76 | 30.02 | 31.22 | 30.12 |
| CBPDN + STV | 30.17 | 33.01 | 29.90 | 32.09 | 30.34 |
| CBPDN + VTV | 29.60 | 33.04 | 29.96 | 31.63 | 30.31 |
| CBPDN + RTV | 29.28 | 32.84 | 29.76 | 31.29 | 30.19 |
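
Table 2: Change in PSNR (dB) on each test image when $\lambda$ is fixed to zero and only $\mu$ is optimised, relative to optimising over both $\lambda$ and $\mu$. Negative values indicate degraded performance without the $\ell_1$ term.

| Method | Image 1 | Image 2 | Image 3 | Image 4 | Image 5 |
|---|---|---|---|---|---|
| CBPDN + Grd | -2.31 | -3.16 | -2.51 | -1.39 | -0.94 |
| CBPDN + STV | +0.04 | -0.22 | -0.04 | -0.03 | +0.03 |
| CBPDN + VTV | -0.64 | -0.77 | -0.89 | -0.29 | -0.34 |
| CBPDN + RTV | -1.28 | -0.66 | -0.73 | -0.47 | -0.33 |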
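
Table 3: Denoising performance (PSNR in dB) on each test image, with a single set of parameters for all test images, selected on a separate parameter selection image set.

| Method | Image 1 | Image 2 | Image 3 | Image 4 | Image 5 |
|---|---|---|---|---|---|
| BPDN | 29.47 | 32.03 | 29.92 | 31.38 | 30.19 |
| CBPDN | 29.24 | 31.73 | 29.54 | 30.89 | 30.00 |
| CBPDN + STV | 29.90 | 32.36 | 29.86 | 31.68 | 30.29 |
| CBPDN + VTV | 29.54 | 32.35 | 29.86 | 31.34 | 30.25 |
| CBPDN + RTV | 29.16 | 32.49 | 29.76 | 31.25 | 30.19 |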
**Index Terms:** Convolutional Sparse Representations, Convolutional Sparse Coding, Total Variation
1 Introduction
Sparse representations are well-established as a tool for inverse problems in a wide variety of areas, including signal and image processing, computer vision, and machine learning [1]. The standard form is a linear representation $D\mathbf{x} \approx \mathbf{s}$, where $D$ is the dictionary, $\mathbf{x}$ is the representation, and $\mathbf{s}$ is the signal to be represented. When $D$ is a linear transform with a fast transform operator, such as the Discrete Wavelet Transform, these representations can be computed for large images. When $D$ is learned from training data and represented as an explicit matrix, this is not feasible, and the standard approach is to independently compute the representations over a set of overlapping image patches. Convolutional sparse representations are a recent alternative (more accurately, the label convolutional is recent, but the equivalent translation invariant sparse representations are much older [2, Sec. II]) that replace the general linear representation with a sum of convolutions (typically circular convolutions [3]) $\sum_m \mathbf{d}_m * \mathbf{x}_m \approx \mathbf{s}$, where the elements of the dictionary $\{\mathbf{d}_m\}$ are linear filters, and the representation consists of the set of coefficient maps $\{\mathbf{x}_m\}$.
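As a small illustration of the sum-of-convolutions model described above, the following sketch synthesises a signal from a set of filters and coefficient maps using circular convolution computed via the FFT. The function name `conv_synthesis`, the array layout, and the toy sizes are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def conv_synthesis(d, x):
    """Return sum_m d[m] * x[m] with circular convolution, computed via the 2D FFT.

    d: filters, shape (M, Lr, Lc); x: coefficient maps, shape (M, H, W).
    """
    H, W = x.shape[-2:]
    df = np.fft.fft2(d, s=(H, W), axes=(-2, -1))  # zero-pad filters to image size
    xf = np.fft.fft2(x, axes=(-2, -1))
    return np.real(np.fft.ifft2(np.sum(df * xf, axis=0)))

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 3, 3))   # four hypothetical 3 x 3 filters
x = np.zeros((4, 16, 16))
x[1, 5, 7] = 2.0                     # a single non-zero coefficient in map 1
s = conv_synthesis(d, x)             # a scaled copy of filter 1 placed at (5, 7)
```

A single non-zero coefficient places a scaled copy of the corresponding filter at that location, which is why a sparse set of coefficient maps can represent an entire image without dividing it into patches.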
There is growing interest in imaging and image processing applications of the convolutional form [4, 5, 6, 7, 8, 9]. Surprisingly, denoising of Gaussian white noise, arguably the simplest of all imaging inverse problems, has received almost no attention beyond a very brief example providing insufficient detail for reproducibility [10, Sec. 4.4]. The present paper argues that, despite its numerous advantages in many contexts, the convolutional form is not competitive for the Gaussian white noise denoising problem, but that these deficiencies can be mitigated by moving beyond simple $\ell_1$ regularization; the specific form investigated here consists of additional penalties on the gradients of the coefficient maps. (A weighting strategy applied to the $\ell_1$ penalty has also been found to improve the denoising performance of convolutional sparse representations [11, Sec. 8], but that approach is not considered here due to space constraints.)
It is emphasised that these extensions have relevance beyond the specific denoising test problem considered here, in that the improved performance reported on this problem can also be expected to have an impact on more general image reconstruction problems, e.g. when convolutional sparse coding is employed as the prior within the plug-and-play priors framework [12, 13]. There is also evidence that the inclusion of such gradient penalties enhances the performance of convolutional sparse representations in certain image decomposition/restoration problems [7, 9].
2 Convolutional Sparse Coding
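The most widely used form of convolutional sparse coding is Convolutional Basis Pursuit DeNoising (CBPDN), defined as

$$\operatorname*{argmin}_{\{\mathbf{x}_m\}} \; \frac{1}{2} \Big\| \sum_m \mathbf{d}_m * \mathbf{x}_m - \mathbf{s} \Big\|_2^2 + \lambda \sum_m \alpha_m \| \mathbf{x}_m \|_1 \;, \tag{1}$$

where the $\alpha_m$ allow distinct weighting of the $\ell_1$ term for each filter $\mathbf{d}_m$. At present, the most efficient approach to solving this problem [2] is via the Alternating Direction Method of Multipliers (ADMM) [14] framework. An outline of this method is presented here as a basis for the extensions proposed in the following sections.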
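Problem (1) can be written as

$$\operatorname*{argmin}_{\mathbf{x}} \; \frac{1}{2} \| D \mathbf{x} - \mathbf{s} \|_2^2 + \lambda \| \boldsymbol{\alpha} \odot \mathbf{x} \|_1 \;, \tag{2}$$

where $\odot$ is the Hadamard product, $D_m$ is a linear operator such that $D_m \mathbf{x}_m = \mathbf{d}_m * \mathbf{x}_m$, and $D$, $\boldsymbol{\alpha}$, and $\mathbf{x}$ are the block matrices/vectors

$$D = \left(\begin{array}{ccc} D_0 & D_1 & \ldots \end{array}\right) \qquad \boldsymbol{\alpha} = \left(\begin{array}{c} \alpha_0 \mathbf{1} \\ \alpha_1 \mathbf{1} \\ \vdots \end{array}\right) \qquad \mathbf{x} = \left(\begin{array}{c} \mathbf{x}_0 \\ \mathbf{x}_1 \\ \vdots \end{array}\right) \;. \tag{3}$$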
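This problem can be expressed in ADMM standard form as

$$\operatorname*{argmin}_{\mathbf{x}, \mathbf{y}} \; \frac{1}{2} \| D \mathbf{x} - \mathbf{s} \|_2^2 + \lambda \| \boldsymbol{\alpha} \odot \mathbf{y} \|_1 \;\; \text{such that} \;\; \mathbf{x} = \mathbf{y} \;, \tag{4}$$

which can be solved via the ADMM iterations

$$\mathbf{x}^{(j+1)} = \operatorname*{argmin}_{\mathbf{x}} \; \frac{1}{2} \| D \mathbf{x} - \mathbf{s} \|_2^2 + \frac{\rho}{2} \big\| \mathbf{x} - \mathbf{y}^{(j)} + \mathbf{u}^{(j)} \big\|_2^2 \tag{5}$$

$$\mathbf{y}^{(j+1)} = \operatorname*{argmin}_{\mathbf{y}} \; \lambda \| \boldsymbol{\alpha} \odot \mathbf{y} \|_1 + \frac{\rho}{2} \big\| \mathbf{x}^{(j+1)} - \mathbf{y} + \mathbf{u}^{(j)} \big\|_2^2 \tag{6}$$

$$\mathbf{u}^{(j+1)} = \mathbf{u}^{(j)} + \mathbf{x}^{(j+1)} - \mathbf{y}^{(j+1)} \;. \tag{7}$$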
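The solution to (6) is given by the soft thresholding operation [15, Sec. 6.5.2] $\mathbf{y}^{(j+1)} = \mathcal{S}_{\lambda \boldsymbol{\alpha} / \rho}\big(\mathbf{x}^{(j+1)} + \mathbf{u}^{(j)}\big)$, where $\mathcal{S}_{\gamma}(\mathbf{v}) = \operatorname{sign}(\mathbf{v}) \odot \max(0, |\mathbf{v}| - \gamma)$. The only computationally expensive step is (5), which can be solved via the equivalent DFT domain problem

$$\operatorname*{argmin}_{\hat{\mathbf{x}}} \; \frac{1}{2} \big\| \hat{D} \hat{\mathbf{x}} - \hat{\mathbf{s}} \big\|_2^2 + \frac{\rho}{2} \big\| \hat{\mathbf{x}} - (\hat{\mathbf{y}} - \hat{\mathbf{u}}) \big\|_2^2 \;, \tag{8}$$

where $\hat{\mathbf{z}}$ denotes the DFT of variable $\mathbf{z}$. The solution for (8) is given by the $MN \times MN$ linear system (for $M$ filters and an image with $N$ pixels)

$$\big( \hat{D}^H \hat{D} + \rho I \big) \hat{\mathbf{x}} = \hat{D}^H \hat{\mathbf{s}} + \rho \, (\hat{\mathbf{y}} - \hat{\mathbf{u}}) \;. \tag{9}$$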
The key to solving this very large linear system is the observation that it can be decomposed into $N$ independent $M \times M$ linear systems [16], each of which has a system matrix consisting of the sum of rank-one and diagonal terms, so that they can be solved very efficiently by exploiting the Sherman-Morrison formula [17].
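The ADMM outline of this section can be sketched in a few lines of NumPy. This is a minimal sketch rather than the SPORCO implementation: it assumes circular boundary conditions, unit weights $\alpha_m = 1$, a fixed penalty parameter `rho`, and a fixed iteration count, and all function names are hypothetical. The $\mathbf{x}$ step solves one small rank-one-plus-scaled-identity system per frequency with the Sherman-Morrison formula, and the $\mathbf{y}$ step is soft thresholding.

```python
import numpy as np

def soft_threshold(v, gamma):
    """Elementwise soft thresholding: sign(v) * max(0, |v| - gamma)."""
    return np.sign(v) * np.maximum(0.0, np.abs(v) - gamma)

def solve_rank1_plus_identity(df, bf, rho):
    """Solve (a a^H + rho I) x = b independently at every frequency.

    df, bf have shape (M, H, W). At each frequency a = conj(df[:, k, l]),
    so that a a^H is the restriction of D^H D to that frequency.
    """
    a = np.conj(df)
    aHb = np.sum(df * bf, axis=0, keepdims=True)          # a^H b
    aHa = np.sum(np.abs(df) ** 2, axis=0, keepdims=True)  # a^H a
    return (bf - a * (aHb / (rho + aHa))) / rho           # Sherman-Morrison

def synthesise(d, x):
    """Sum of circular convolutions sum_m d[m] * x[m]."""
    df = np.fft.fft2(d, s=x.shape[-2:], axes=(-2, -1))
    return np.real(np.fft.ifft2(np.sum(df * np.fft.fft2(x, axes=(-2, -1)), axis=0)))

def cbpdn_admm(d, s, lmbda, rho=1.0, iters=200):
    """Basic ADMM iterations for CBPDN with alpha_m = 1 (illustrative only)."""
    M = d.shape[0]
    df = np.fft.fft2(d, s=s.shape, axes=(-2, -1))
    sf = np.fft.fft2(s)
    y = np.zeros((M,) + s.shape)
    u = np.zeros_like(y)
    for _ in range(iters):
        # x step: right-hand side D^H s + rho (y - u) in the DFT domain
        bf = np.conj(df) * sf + rho * np.fft.fft2(y - u, axes=(-2, -1))
        xf = solve_rank1_plus_identity(df, bf, rho)
        x = np.real(np.fft.ifft2(xf, axes=(-2, -1)))
        y = soft_threshold(x + u, lmbda / rho)   # y step
        u = u + x - y                            # scaled dual variable update
    return y

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 5, 5))
d /= np.sqrt(np.sum(d ** 2, axis=(1, 2), keepdims=True))   # unit-norm filters
x0 = np.zeros((4, 32, 32))
x0[rng.integers(0, 4, 20), rng.integers(0, 32, 20), rng.integers(0, 32, 20)] = 1.0
s = synthesise(d, x0) + 0.01 * rng.standard_normal((32, 32))
x = cbpdn_admm(d, s, lmbda=0.05)
```

Because the per-frequency system matrix is a scaled identity plus a single outer product, each solve costs $O(M)$ rather than $O(M^3)$, and all $N$ systems can be handled at once with array broadcasting.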
3 Gradient Regularization
An extension of (1) to include an $\ell_2$ penalty on the gradients of the coefficient maps was proposed in [6]. The primary purpose of this extension was as a regularization for an impulse filter intended to represent the low-frequency components of the image, but a small non-zero regularization on the other dictionary filters was found to provide a small improvement to the impulse noise denoising performance [6]. Considering the edge-smoothing effect of gradient regularization, a reasonable alternative to consider is Total Variation (TV) regularization. We consider three different variants:
1. scalar TV [18] applied independently to each coefficient map,
2. vector TV [19] applied jointly to the set of coefficient maps,
3. scalar TV [18] applied to the reconstructed image components rather than to the coefficient maps $\mathbf{x}_m$.
3.1 Scalar TV on Coefficient Map
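The CBPDN problem extended by adding a scalar TV term on each coefficient map can be written as

$$\operatorname*{argmin}_{\{\mathbf{x}_m\}} \; \frac{1}{2} \Big\| \sum_m \mathbf{d}_m * \mathbf{x}_m - \mathbf{s} \Big\|_2^2 + \lambda \sum_m \alpha_m \| \mathbf{x}_m \|_1 + \mu \sum_m \Big\| \sqrt{(\mathbf{g}_0 * \mathbf{x}_m)^2 + (\mathbf{g}_1 * \mathbf{x}_m)^2} \Big\|_1 \;, \tag{10}$$

where $\mathbf{g}_0$ and $\mathbf{g}_1$ are filters that compute the gradients along image rows and columns respectively, and the squares and square root are applied elementwise. The TV term can be written as $\mu \sum_m \big\| \sqrt{(G_0 \mathbf{x}_m)^2 + (G_1 \mathbf{x}_m)^2} \big\|_1$, where linear operators $G_0$ and $G_1$ are defined such that $G_l \mathbf{x}_m = \mathbf{g}_l * \mathbf{x}_m$, and defining (note that the notation $\Gamma_l$ is overloaded, taking on a different definition in each section)

$$\Gamma_l = \left(\begin{array}{ccc} G_l & 0 & \ldots \\ 0 & G_l & \ldots \\ \vdots & \vdots & \ddots \end{array}\right)$$

allows further reduction to $\mu \big\| \sqrt{(\Gamma_0 \mathbf{x})^2 + (\Gamma_1 \mathbf{x})^2} \big\|_1$.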
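Problem (10) can be written in standard ADMM form as

$$\operatorname*{argmin}_{\mathbf{x}, \mathbf{y}_0, \mathbf{y}_1, \mathbf{y}_2} \; \frac{1}{2} \| D \mathbf{x} - \mathbf{s} \|_2^2 + \lambda \| \boldsymbol{\alpha} \odot \mathbf{y}_2 \|_1 + \mu \Big\| \sqrt{\mathbf{y}_0^2 + \mathbf{y}_1^2} \Big\|_1 \;\; \text{such that} \;\; \left(\begin{array}{c} \Gamma_0 \\ \Gamma_1 \\ I \end{array}\right) \mathbf{x} = \left(\begin{array}{c} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \end{array}\right) \;. \tag{18}$$

The resulting $\mathbf{x}$ subproblem has the form

$$\operatorname*{argmin}_{\mathbf{x}} \; \frac{1}{2} \| D \mathbf{x} - \mathbf{s} \|_2^2 + \frac{\rho}{2} \| \Gamma_0 \mathbf{x} - \mathbf{y}_0 + \mathbf{u}_0 \|_2^2 + \frac{\rho}{2} \| \Gamma_1 \mathbf{x} - \mathbf{y}_1 + \mathbf{u}_1 \|_2^2 + \frac{\rho}{2} \| \mathbf{x} - \mathbf{y}_2 + \mathbf{u}_2 \|_2^2 \tag{19}$$

and the solution of the equivalent DFT domain problem is given by

$$\big( \hat{D}^H \hat{D} + \rho I + \rho \hat{\Gamma}_0^H \hat{\Gamma}_0 + \rho \hat{\Gamma}_1^H \hat{\Gamma}_1 \big) \hat{\mathbf{x}} = \hat{D}^H \hat{\mathbf{s}} + \rho \big( \hat{\mathbf{y}}_2 - \hat{\mathbf{u}}_2 + \hat{\Gamma}_0^H (\hat{\mathbf{y}}_0 - \hat{\mathbf{u}}_0) + \hat{\Gamma}_1^H (\hat{\mathbf{y}}_1 - \hat{\mathbf{u}}_1) \big) \;. \tag{20}$$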
Since $\hat{\Gamma}_0^H \hat{\Gamma}_0$ and $\hat{\Gamma}_1^H \hat{\Gamma}_1$ are diagonal (the $\hat{G}_l$ are diagonal, and therefore so are $\hat{\Gamma}_l^H \hat{\Gamma}_l$), they can be grouped together with the $\rho I$ term; the independent linear systems described in Sec. 2 are again composed of rank-one and diagonal terms, and the Sherman-Morrison solution [17] can be directly applied without any substantial increase in computational cost.
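The grouping argument above can be illustrated with a short sketch of the Sherman-Morrison solve for a rank-one term plus a general positive diagonal. The forward-difference gradient filters, the array sizes, and the function name `solve_rank1_plus_diag` are assumptions made for this example; the diagonal `c` collects the scaled identity and the squared magnitudes of the DFTs of the gradient filters.

```python
import numpy as np

def solve_rank1_plus_diag(a, c, b):
    """Solve (a a^H + diag(c)) x = b via the Sherman-Morrison formula.

    The leading axis indexes the M unknowns of each system; trailing axes
    index independent systems (one per frequency). c must be non-zero.
    """
    cb = b / c                                                  # diag(c)^{-1} b
    ca = a / c                                                  # diag(c)^{-1} a
    num = np.sum(np.conj(a) * cb, axis=0, keepdims=True)        # a^H diag(c)^{-1} b
    den = 1.0 + np.sum(np.conj(a) * ca, axis=0, keepdims=True)  # 1 + a^H diag(c)^{-1} a
    return cb - ca * (num / den)

M, H, W, rho = 3, 8, 8, 1.5
# DFTs of circular forward-difference filters: each gradient operator is diagonal here
g0f = np.fft.fft2(np.array([[1.0], [-1.0]]), s=(H, W))
g1f = np.fft.fft2(np.array([[1.0, -1.0]]), s=(H, W))
# Diagonal of rho I + rho G0^H G0 + rho G1^H G1, identical for every coefficient map
c = np.broadcast_to(rho * (1.0 + np.abs(g0f) ** 2 + np.abs(g1f) ** 2), (M, H, W))

rng = np.random.default_rng(1)
df = rng.standard_normal((M, H, W)) + 1j * rng.standard_normal((M, H, W))
b = rng.standard_normal((M, H, W)) + 1j * rng.standard_normal((M, H, W))
xf = solve_rank1_plus_diag(np.conj(df), c, b)
```

Compared with the scaled-identity case, the only extra work is an elementwise division by `c`, which is consistent with the statement that the gradient terms add no substantial computational cost to the linear solve.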
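The $\mathbf{y}$ subproblem for (18) can be decomposed into the independent problems

$$\mathbf{y}_2^{(j+1)} = \operatorname*{argmin}_{\mathbf{y}_2} \; \lambda \| \boldsymbol{\alpha} \odot \mathbf{y}_2 \|_1 + \frac{\rho}{2} \big\| \mathbf{x}^{(j+1)} - \mathbf{y}_2 + \mathbf{u}_2^{(j)} \big\|_2^2 \tag{21}$$

$$\big(\mathbf{y}_0^{(j+1)}, \mathbf{y}_1^{(j+1)}\big) = \operatorname*{argmin}_{\mathbf{y}_0, \mathbf{y}_1} \; \mu \Big\| \sqrt{\mathbf{y}_0^2 + \mathbf{y}_1^2} \Big\|_1 + \frac{\rho}{2} \sum_{l \in \{0,1\}} \big\| \Gamma_l \mathbf{x}^{(j+1)} - \mathbf{y}_l + \mathbf{u}_l^{(j)} \big\|_2^2 \;. \tag{22}$$

The solution for (21) is the same as that for (6), and (22) can be solved by use of the block soft thresholding operation [15, Sec. 6.5.1] applied in the same way as in the ADMM algorithm for the standard isotropic TV denoising problem [20, 21], [22, Sec. 4.1], i.e.

$$\mathbf{y}_l^{(j+1)} = \frac{\mathbf{v}_l}{\sqrt{\mathbf{v}_0^2 + \mathbf{v}_1^2}} \odot \max\Big(0, \sqrt{\mathbf{v}_0^2 + \mathbf{v}_1^2} - \frac{\mu}{\rho}\Big) \;,$$

where $\mathbf{v}_l = \Gamma_l \mathbf{x}^{(j+1)} + \mathbf{u}_l^{(j)}$ for $l \in \{0, 1\}$.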
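The block soft thresholding step used for the isotropic TV term can be sketched as follows. The function name and the small constant guarding against division by zero are choices made for this example; the threshold `gamma` plays the role of the ratio of the TV weight to the ADMM penalty parameter.

```python
import numpy as np

def block_soft_threshold(v0, v1, gamma):
    """Shrink the 2-vector (v0, v1) at each position towards zero by gamma.

    The magnitude sqrt(v0^2 + v1^2) is reduced by gamma (and clipped at
    zero) while the direction of the gradient vector is preserved.
    """
    norm = np.sqrt(v0 ** 2 + v1 ** 2)
    # Guard against division by zero; where norm == 0 the result is zero anyway
    scale = np.maximum(0.0, norm - gamma) / np.maximum(norm, 1e-12)
    return scale * v0, scale * v1

v0 = np.array([3.0, 0.1])
v1 = np.array([4.0, 0.1])
y0, y1 = block_soft_threshold(v0, v1, gamma=1.0)
# First position: magnitude 5 shrinks to 4, giving (2.4, 3.2).
# Second position: magnitude below the threshold, so both components become 0.
```

Unlike elementwise soft thresholding, the two gradient components at each position are shrunk jointly, which is what makes the resulting TV penalty isotropic.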
3.2 Vector TV on Coefficient Maps
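Instead of independently applying scalar TV to each coefficient map, one can treat the set of coefficient maps as a multi-channel image and apply Vector TV [19], originally designed for restoration of colour images. The corresponding extension of the CBPDN problem can be written as

$$\operatorname*{argmin}_{\{\mathbf{x}_m\}} \; \frac{1}{2} \Big\| \sum_m \mathbf{d}_m * \mathbf{x}_m - \mathbf{s} \Big\|_2^2 + \lambda \sum_m \alpha_m \| \mathbf{x}_m \|_1 + \mu \Big\| \sqrt{\sum_m \big[ (\mathbf{g}_0 * \mathbf{x}_m)^2 + (\mathbf{g}_1 * \mathbf{x}_m)^2 \big]} \Big\|_1 \;. \tag{24}$$

Using the $G_l$ as defined in Sec. 3.1, the TV term can be written as

$$\mu \Big\| \sqrt{\sum_m \big[ (G_0 \mathbf{x}_m)^2 + (G_1 \mathbf{x}_m)^2 \big]} \Big\|_1 \;.$$

Defining $I_B = \left(\begin{array}{cccc} I & I & \ldots & I \end{array}\right)$ and

$$\Gamma_l = \left(\begin{array}{ccc} G_l & 0 & \ldots \\ 0 & G_l & \ldots \\ \vdots & \vdots & \ddots \end{array}\right)$$

allows further reduction to $\mu \big\| \sqrt{I_B \big( (\Gamma_0 \mathbf{x})^2 + (\Gamma_1 \mathbf{x})^2 \big)} \big\|_1$.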
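Problem (24) can be written in standard ADMM form as

$$\operatorname*{argmin}_{\mathbf{x}, \mathbf{y}_0, \mathbf{y}_1, \mathbf{y}_2} \; \frac{1}{2} \| D \mathbf{x} - \mathbf{s} \|_2^2 + \lambda \| \boldsymbol{\alpha} \odot \mathbf{y}_2 \|_1 + \mu \Big\| \sqrt{I_B \big( \mathbf{y}_0^2 + \mathbf{y}_1^2 \big)} \Big\|_1 \;\; \text{such that} \;\; \left(\begin{array}{c} \Gamma_0 \\ \Gamma_1 \\ I \end{array}\right) \mathbf{x} = \left(\begin{array}{c} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \end{array}\right) \;.$$

The resulting $\mathbf{x}$ subproblem has the same form as (19) and can be solved in the same way. The $\mathbf{y}_2$ subproblem is the same as (21) and can be solved in the same way, while the $(\mathbf{y}_0, \mathbf{y}_1)$ subproblem, which only differs from (22) in the first term, can be solved by

$$\mathbf{y}_l^{(j+1)} = \mathbf{v}_l \odot I_B^T \left( \frac{\max(0, \mathbf{w} - \mu / \rho)}{\mathbf{w}} \right) \;, \qquad \mathbf{w} = \sqrt{I_B \big( \mathbf{v}_0^2 + \mathbf{v}_1^2 \big)} \;,$$

where $\mathbf{v}_l = \Gamma_l \mathbf{x}^{(j+1)} + \mathbf{u}_l^{(j)}$ for $l \in \{0, 1\}$.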
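The difference between the scalar and vector TV shrinkage steps is only in how the gradient magnitude is computed, as the following sketch shows. The function name and the `(M, H, W)` array layout are assumptions made for this example: the magnitude is taken jointly over both gradient directions and all $M$ coefficient maps at each pixel, and the same scaling is then applied to every map.

```python
import numpy as np

def vector_block_soft_threshold(v0, v1, gamma):
    """Joint shrinkage for vector TV.

    v0, v1: gradient-domain variables of shape (M, H, W). The norm at each
    pixel is computed across both directions and all M maps.
    """
    norm = np.sqrt(np.sum(v0 ** 2 + v1 ** 2, axis=0, keepdims=True))
    scale = np.maximum(0.0, norm - gamma) / np.maximum(norm, 1e-12)
    return scale * v0, scale * v1   # scale broadcasts over the M maps

# Two maps, one pixel: map 0 has gradient (3, 0), map 1 has gradient (0, 4)
v0 = np.array([[[3.0]], [[0.0]]])
v1 = np.array([[[0.0]], [[4.0]]])
y0, y1 = vector_block_soft_threshold(v0, v1, gamma=1.0)
# Joint magnitude is 5, so every component is scaled by 0.8:
# y0 = [2.4, 0.0], y1 = [0.0, 3.2]
```

With independent scalar TV shrinkage the same inputs would instead give magnitudes 2 and 3, since each map would be compared against the threshold separately; the joint norm couples edges that occur at the same location in different coefficient maps.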
3.3 Scalar TV in Image Domain
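The use of TV regularization here is motivated as an exploration of additional or alternative forms of regularization to the standard $\ell_1$ regularization applied to the coefficient maps $\mathbf{x}_m$. An alternative way of introducing TV regularization, however, would be to consider it as a regularization on the components of the reconstructed image, which can be written as

$$\operatorname*{argmin}_{\{\mathbf{x}_m\}} \; \frac{1}{2} \Big\| \sum_m \mathbf{d}_m * \mathbf{x}_m - \mathbf{s} \Big\|_2^2 + \lambda \sum_m \alpha_m \| \mathbf{x}_m \|_1 + \mu \Big\| \sqrt{\Big( \mathbf{g}_0 * \sum_m \mathbf{d}_m * \mathbf{x}_m \Big)^2 + \Big( \mathbf{g}_1 * \sum_m \mathbf{d}_m * \mathbf{x}_m \Big)^2} \Big\|_1 \;. \tag{34}$$

The final TV term can be expressed as

$$\mu \Big\| \sqrt{\Big( \sum_m \mathbf{g}_0 * \mathbf{d}_m * \mathbf{x}_m \Big)^2 + \Big( \sum_m \mathbf{g}_1 * \mathbf{d}_m * \mathbf{x}_m \Big)^2} \Big\|_1 \;.$$

Introducing linear operators $G_{l,m}$ defined such that $G_{l,m} \mathbf{x}_m = \mathbf{g}_l * \mathbf{d}_m * \mathbf{x}_m$, this can be written as

$$\mu \Big\| \sqrt{\Big( \sum_m G_{0,m} \mathbf{x}_m \Big)^2 + \Big( \sum_m G_{1,m} \mathbf{x}_m \Big)^2} \Big\|_1 \;,$$

and defining $\Gamma_l = \left(\begin{array}{ccc} G_{l,0} & G_{l,1} & \ldots \end{array}\right)$ allows further reduction to $\mu \big\lVert \sqrt{(\Gamma_0 \mathbf{x})^2 + (\Gamma_1 \mathbf{x})^2} \big\rVert_1$.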
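Problem (34) can be written in standard ADMM form as

$$\operatorname*{argmin}_{\mathbf{x}, \mathbf{y}_0, \mathbf{y}_1, \mathbf{y}_2} \; \frac{1}{2} \| D \mathbf{x} - \mathbf{s} \|_2^2 + \lambda \| \boldsymbol{\alpha} \odot \mathbf{y}_2 \|_1 + \mu \Big\| \sqrt{\mathbf{y}_0^2 + \mathbf{y}_1^2} \Big\|_1 \;\; \text{such that} \;\; \left(\begin{array}{c} \Gamma_0 \\ \Gamma_1 \\ I \end{array}\right) \mathbf{x} = \left(\begin{array}{c} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \end{array}\right) \;. \tag{41}$$

The resulting $\mathbf{x}$ subproblem corresponding to (5) has the form

$$\operatorname*{argmin}_{\mathbf{x}} \; \frac{1}{2} \| D \mathbf{x} - \mathbf{s} \|_2^2 + \frac{\rho}{2} \| \Gamma_0 \mathbf{x} - \mathbf{y}_0 + \mathbf{u}_0 \|_2^2 + \frac{\rho}{2} \| \Gamma_1 \mathbf{x} - \mathbf{y}_1 + \mathbf{u}_1 \|_2^2 + \frac{\rho}{2} \| \mathbf{x} - \mathbf{y}_2 + \mathbf{u}_2 \|_2^2$$

and the solution of the equivalent DFT domain problem is given by

$$\big( \hat{D}^H \hat{D} + \rho I + \rho \hat{\Gamma}_0^H \hat{\Gamma}_0 + \rho \hat{\Gamma}_1^H \hat{\Gamma}_1 \big) \hat{\mathbf{x}} = \hat{D}^H \hat{\mathbf{s}} + \rho \big( \hat{\mathbf{y}}_2 - \hat{\mathbf{u}}_2 + \hat{\Gamma}_0^H (\hat{\mathbf{y}}_0 - \hat{\mathbf{u}}_0) + \hat{\Gamma}_1^H (\hat{\mathbf{y}}_1 - \hat{\mathbf{u}}_1) \big) \;.$$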
Although the left hand side has the same algebraic form as that of (20), here $\hat{\Gamma}_0^H \hat{\Gamma}_0$ and $\hat{\Gamma}_1^H \hat{\Gamma}_1$ are rank-one rather than diagonal, and therefore cannot be grouped together with the $\rho I$ term as in the solution for (20). In this case the left hand side is rank-three plus a diagonal: while it cannot be solved using the simple Sherman-Morrison approach, there is still an efficient solution via iterated application of the Sherman-Morrison formula, as used to solve the CBPDN problem for a multi-channel image and dictionary [23]. This involves a greater cost in terms of computation time, but there is a corresponding reduction in memory requirements because $\mathbf{y}_0$ and $\mathbf{y}_1$ are only of the size of the image rather than of the size of the set of coefficient maps.
The $\mathbf{y}$ subproblem for (41) has the same form as (21) – (22), and can be solved in the same way.
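Iterated application of the Sherman-Morrison formula can be sketched for a single system as follows. This is a generic illustration, not the routine used in [23] or in SPORCO: the function name is hypothetical, the rank-one terms are arbitrary vectors, and the diagonal is taken to be a scaled identity. Each pass adds one rank-one term to the matrix whose inverse is implicitly maintained.

```python
import numpy as np

def solve_identity_plus_rank_k(avecs, rho, b):
    """Solve (rho I + sum_k a_k a_k^H) x = b by iterated Sherman-Morrison.

    avecs: list of K complex vectors of length M; b: vector of length M.
    """
    x = b / rho                      # A_0^{-1} b, with A_0 = rho I
    z = [a / rho for a in avecs]     # z[j] = A_0^{-1} a_j
    for k, a in enumerate(avecs):
        # Update from A_k to A_{k+1} = A_k + a_k a_k^H
        denom = 1.0 + np.vdot(a, z[k])             # 1 + a_k^H A_k^{-1} a_k
        x = x - z[k] * (np.vdot(a, x) / denom)     # A_{k+1}^{-1} b
        for j in range(k + 1, len(avecs)):
            z[j] = z[j] - z[k] * (np.vdot(a, z[j]) / denom)   # A_{k+1}^{-1} a_j
    return x

rng = np.random.default_rng(2)
M, rho = 6, 0.7
avecs = [rng.standard_normal(M) + 1j * rng.standard_normal(M) for _ in range(3)]
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
x = solve_identity_plus_rank_k(avecs, rho, b)
```

For $K$ rank-one terms the cost is $O(K^2 M)$ per system instead of $O(M)$ for the single rank-one case, which matches the observation that this variant is more expensive in computation time.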
4 Results
The performance of standard block-based sparse coding and the different convolutional sparse coding methods described in Sections 2 and 3 was compared on a Gaussian white noise restoration problem. The standard sparse coding was computed via the Basis Pursuit DeNoising (BPDN) problem (i.e. problem (2) where $D$ is a standard dictionary matrix) and the resulting denoised blocks were aggregated via averaging (weighted by the number of blocks covering each pixel) to obtain a denoised image.
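The block aggregation step described above can be sketched as follows. The block size, the use of every overlapping block with unit stride, and the function names are assumptions made for this example; in a real pipeline the blocks would be replaced by their BPDN reconstructions before aggregation.

```python
import numpy as np

def extract_blocks(img, bsz):
    """Return all overlapping bsz x bsz blocks of img, in raster order."""
    H, W = img.shape
    return np.stack([img[i:i + bsz, j:j + bsz]
                     for i in range(H - bsz + 1)
                     for j in range(W - bsz + 1)])

def aggregate_blocks(blocks, shape, bsz):
    """Average overlapping blocks back into an image.

    Each pixel is the sum of all block values covering it, divided by the
    number of blocks that cover it.
    """
    H, W = shape
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    k = 0
    for i in range(H - bsz + 1):
        for j in range(W - bsz + 1):
            acc[i:i + bsz, j:j + bsz] += blocks[k]
            cnt[i:i + bsz, j:j + bsz] += 1.0
            k += 1
    return acc / cnt

rng = np.random.default_rng(3)
img = rng.uniform(0.0, 1.0, (16, 16))
blocks = extract_blocks(img, 8)          # stand-in for denoised blocks
rec = aggregate_blocks(blocks, img.shape, 8)
```

With unmodified blocks the aggregation reproduces the input exactly, which is a convenient sanity check before inserting a per-block denoiser.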
Two different dictionaries, one standard and one convolutional, were learned from the same set of ten training images (selected from images on Flickr with a Creative Commons license) of pixels each. The convolutional dictionary consisted of 128 filters of size , and was learned via the convolutional dictionary learning algorithm described in [2], while the standard dictionary consisted of 128 vectors of 64 coefficients each (i.e. a vectorised image block), and was learned via a non-convolutional variant of the algorithm used for learning the convolutional dictionary, applied to all image blocks in the training images. The standard dictionary was used for the BPDN experiments and the convolutional dictionary was used for all CBPDN experiments.
A set of five greyscale reference images, depicted in Fig. 1, was constructed by cropping regions of pixels from well-known standard test images. The regions were chosen to contain diversity of content while avoiding large smooth areas, and the size was chosen to be relatively small so that it would be computationally feasible to optimise method parameters via a grid search. The reference images were scaled so that pixel values were in the interval $[0, 1]$, and corresponding test images were constructed by adding Gaussian white noise with a standard deviation of 0.05. Following standard practice [24], [6, Sec. 3], the CBPDN decomposition was applied to highpass filtered images, obtained by subtracting a lowpass component computed by Tikhonov regularization [25, pg. 3] with regularization parameter .
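The test image construction and the lowpass/highpass split can be sketched as follows. This is a simplified stand-in rather than the SPORCO routine: it assumes circular boundary conditions, a gradient-penalised least squares lowpass filter solved in the DFT domain, a synthetic reference image, and a placeholder value for the regularization parameter `lmbda`.

```python
import numpy as np

def tikhonov_split(s, lmbda):
    """Split s into lowpass and highpass components.

    The lowpass component minimises 0.5 ||x - s||^2 + 0.5 lmbda ||grad x||^2,
    which has a closed-form solution in the DFT domain under circular
    boundary conditions.
    """
    g0f = np.fft.fft2(np.array([[1.0], [-1.0]]), s=s.shape)
    g1f = np.fft.fft2(np.array([[1.0, -1.0]]), s=s.shape)
    denom = 1.0 + lmbda * (np.abs(g0f) ** 2 + np.abs(g1f) ** 2)
    low = np.real(np.fft.ifft2(np.fft.fft2(s) / denom))
    return low, s - low

def psnr(ref, img):
    """PSNR in dB for images with pixel values in [0, 1]."""
    return 10.0 * np.log10(1.0 / np.mean((ref - img) ** 2))

rng = np.random.default_rng(4)
ref = rng.uniform(0.0, 1.0, (128, 128))              # synthetic stand-in for a reference image
noisy = ref + 0.05 * rng.standard_normal(ref.shape)  # Gaussian white noise, std 0.05
low, high = tikhonov_split(noisy, lmbda=5.0)         # lmbda is a placeholder value
noisy_psnr = psnr(ref, noisy)
```

For a noise standard deviation of 0.05 on a unit-range image, the PSNR of the noisy input is close to $20 \log_{10}(1/0.05) \approx 26$ dB, which gives a baseline against which the values in the results tables can be read.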
For the first set of experiments, the results of which are displayed in Table 1, the denoising performance of the different methods was individually optimised for each image via a search over a logarithmically spaced grid on the $\lambda$ and $\mu$ parameters. The main points worth noting are:
- BPDN is consistently better than CBPDN by a small margin.
- CBPDN + Grd ($\ell_2$ of gradient regularization, as in [6, Sec. 4]) gives very similar performance to CBPDN, being slightly better on some test images and slightly worse on others.
- CBPDN + STV (see Sec. 3.1) gives the best overall performance on three of the five test images, with performance within a few tenths of a dB of the best in the other cases. It is consistently better than CBPDN, and better than BPDN in all but one of the test cases.
- In a comparison between CBPDN + STV and CBPDN + VTV (see Sec. 3.2), the former is sometimes better by a moderate margin, but when it is worse this is by a very small amount.
- CBPDN + RTV (see Sec. 3.3) is always worse than the other two TV-augmented CBPDN methods, and is sometimes no better than CBPDN.
The computation times per iteration for the different methods were approximately 0.5 s for BPDN and CBPDN, 0.6 s for CBPDN + Grd, 2.2 s for CBPDN + STV and CBPDN + VTV, and 2.4 s for CBPDN + RTV, i.e. the improved performance of the TV methods is obtained at a significant computational cost.
The second set of experiments evaluated the efficacy of the terms augmenting plain CBPDN by comparing the denoising performance at the best choices of both $\lambda$ and $\mu$ (as in Table 1) with the same method with $\lambda$ fixed to zero and optimisation only over $\mu$. (There is no need to perform a corresponding comparison with $\mu$ fixed to zero since this corresponds to the baseline CBPDN method.) The differences between the PSNR values of the methods optimised over both parameters and only optimised over $\mu$ are displayed in Table 2. Note that, for CBPDN + STV, there is a positive difference in two cases and a very small negative difference in two other cases, i.e. for most of the test images, the convolutional representation with only a TV regularization term is competitive with the baseline CBPDN. For all of the other methods the performance is substantially degraded without the $\ell_1$ term.
The final set of experiments considers a more realistic scenario in which ground truth is not available for parameter selection for the test images, making it necessary to choose the $\lambda$ and $\mu$ parameters by optimising over a distinct parameter selection image set. The same $\lambda$ and $\mu$ parameters were selected for all test images by finding the values giving the best average performance for a separate image set, again via a search on a logarithmically spaced grid. The results for this experiment are presented in Table 3. Overall, the relative performances of the different methods do not differ qualitatively from those of the experiments reported in Table 1. (CBPDN + Grd is excluded from this set of experiments since it is clear from the first two sets of experiments that it is not competitive.)
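The parameter selection procedure described above can be sketched as a search over a logarithmically spaced grid that maximises the average PSNR on a separate image set. Everything concrete here is an assumption made for illustration: the grid ranges, the synthetic selection images, and in particular `toy_denoise`, which is a simple stand-in with two parameters and not one of the CBPDN methods.

```python
import itertools
import numpy as np

def psnr(ref, img):
    """PSNR in dB for images with pixel values in [0, 1]."""
    return 10.0 * np.log10(1.0 / np.mean((ref - img) ** 2))

def toy_denoise(img, lmbda, mu):
    """Stand-in denoiser: quadratic smoothing (weight mu) plus a
    soft-thresholded residual (threshold lmbda)."""
    g0f = np.fft.fft2(np.array([[1.0], [-1.0]]), s=img.shape)
    g1f = np.fft.fft2(np.array([[1.0, -1.0]]), s=img.shape)
    denom = 1.0 + mu * (np.abs(g0f) ** 2 + np.abs(g1f) ** 2)
    smooth = np.real(np.fft.ifft2(np.fft.fft2(img) / denom))
    resid = img - smooth
    return smooth + np.sign(resid) * np.maximum(0.0, np.abs(resid) - lmbda)

def grid_search(denoise, pairs, lambdas, mus):
    """Return (best mean PSNR, lambda, mu) over all grid points.

    pairs: list of (reference, noisy) images forming the selection set.
    """
    best = (-np.inf, None, None)
    for lmbda, mu in itertools.product(lambdas, mus):
        score = np.mean([psnr(ref, denoise(noisy, lmbda, mu)) for ref, noisy in pairs])
        if score > best[0]:
            best = (score, lmbda, mu)
    return best

rng = np.random.default_rng(5)
yy, xx = np.mgrid[0:64, 0:64] / 64.0
refs = [0.5 + 0.4 * np.sin(2 * np.pi * xx) * np.cos(2 * np.pi * yy),
        (xx + yy > 1.0).astype(float) * 0.6 + 0.2]
pairs = [(r, r + 0.05 * rng.standard_normal(r.shape)) for r in refs]
lambdas = np.logspace(-3, 0, 7)   # hypothetical grid ranges
mus = np.logspace(-2, 2, 9)
best_psnr, best_lmbda, best_mu = grid_search(toy_denoise, pairs, lambdas, mus)
```

The selected pair would then be applied unchanged to every test image, which is the setting summarised in Table 3.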
5 Conclusions
While a strictly apples-to-apples comparison between BPDN and CBPDN denoising methods is difficult to construct, the careful attempt reported here indicates that BPDN is slightly superior to baseline CBPDN, but that augmentation of the baseline CBPDN functional with the appropriate TV term substantially boosts performance, surpassing that of BPDN in all but one of the five test cases considered here. With respect to the specific form of additional TV term, scalar TV applied independently to each coefficient map is somewhat superior to a joint vector TV term over all of the coefficient maps, and both of these methods are substantially superior to TV applied in the reconstruction domain rather than to the coefficient maps. This indicates that the gain from a TV term on the coefficient maps should not be viewed simply as resulting from denoising via a synthesis of sparse representation and TV image models. It is particularly interesting that the convolutional sparse coding problem with only an STV penalty is competitive in performance with the usual CBPDN form with only an $\ell_1$ penalty. At a more abstract level, these results suggest that penalties that exploit the spatial structure of the coefficient maps are necessary to achieve the true potential of the convolutional model.
Implementations of the algorithms proposed here are included in the Python version of the SPORCO library [26, 25].
References
- [1] J. Mairal, F. Bach, and J. Ponce, “Sparse modeling for image and vision processing,” Foundations and Trends in Computer Graphics and Vision, vol. 8, no. 2-3, pp. 85–283, 2014. doi: 10.1561/0600000058
- [2] B. Wohlberg, “Efficient algorithms for convolutional sparse representations,” IEEE Trans. Image Process., vol. 25, no. 1, pp. 301–315, Jan. 2016. doi: 10.1109/TIP.2015.2495260
- [3] B. Wohlberg, “Boundary handling for convolutional sparse representations,” in Proc. IEEE Conf. Image Process. (ICIP), Sep. 2016, pp. 1833–1837. doi: 10.1109/ICIP.2016.7532675
- [4] S. Gu, W. Zuo, Q. Xie, D. Meng, X. Feng, and L. Zhang, “Convolutional sparse coding for image super-resolution,” in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), Dec. 2015. doi: 10.1109/ICCV.2015.212
- [5] Y. Liu, X. Chen, R. K. Ward, and Z. J. Wang, “Image fusion with convolutional sparse representation,” IEEE Signal Process. Lett., 2016. doi: 10.1109/lsp.2016.2618776
- [6] B. Wohlberg, “Convolutional sparse representations as an image model for impulse noise restoration,” in Proc. IEEE Image Video Multidim. Signal Process. Workshop (IVMSP), Bordeaux, France, Jul. 2016. doi: 10.1109/IVMSPW.2016.7528229
- [7] H. Zhang and V. Patel, “Convolutional sparse coding-based image decomposition,” in British Mach. Vis. Conf. (BMVC), York, UK, Sep. 2016.
- [8] T. M. Quan and W.-K. Jeong, “Compressed sensing reconstruction of dynamic contrast enhanced MRI using GPU-accelerated convolutional sparse coding,” in Proc. IEEE Int. Symp. Biomed. Imaging (ISBI), Apr. 2016, pp. 518–521. doi: 10.1109/isbi.2016.7493321
