Convolutional Sparse Representations with Gradient Penalties

Brendt Wohlberg

arXiv:1705.04407·cs.CV·March 25, 2021


TL;DR

This paper investigates convolutional sparse representations for Gaussian white noise removal. It shows that suitable penalties on the gradients of the coefficient maps, in particular total variation penalties, raise their performance above that of traditional block-based sparse representations.

Contribution

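It extends convolutional sparse coding with total variation penalties on the coefficient maps (scalar, vector, and image-domain variants) and derives ADMM algorithms for each. It shows that the scalar variant improves noise removal beyond the block-based approach.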

Findings

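1. TV penalties on the coefficient maps improve convolutional sparse coding (CBPDN) denoising performance, while an ℓ2 gradient penalty gives little benefit.

2. With a scalar TV penalty on each coefficient map, the convolutional representation outperforms block-based BPDN on four of the five test images.

3. The scalar and vector TV variants achieve higher PSNR than plain CBPDN on all five test images. The cost is about 2.2 s per iteration, compared with about 0.5 s for CBPDN.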

Abstract

While convolutional sparse representations enjoy a number of useful properties, they have received limited attention for image reconstruction problems. The present paper compares the performance of block-based and convolutional sparse representations in the removal of Gaussian white noise. While the usual formulation of the convolutional sparse coding problem is slightly inferior to the block-based representations in this problem, the performance of the convolutional form can be boosted beyond that of the block-based form by the inclusion of suitable penalties on the gradients of the coefficient maps.


Tables

Table 1: Comparison of denoising performance (PSNR in dB) of the different denoising methods for each of the five test images, with parameters individually optimised for each image. Bold values indicate the best performing CBPDN method. An italic value in the BPDN row indicates that BPDN gave the best overall performance for that image.

	Test Image
Method	1	2	3	4	5
BPDN	29.47	32.91	30.08	31.73	30.19
CBPDN	29.31	32.70	29.76	31.27	30.09
CBPDN + Grd	29.28	32.76	30.02	31.22	30.12
CBPDN + STV	30.17	33.01	29.90	32.09	30.34
CBPDN + VTV	29.60	33.04	29.96	31.63	30.31
CBPDN + RTV	29.28	32.84	29.76	31.29	30.19

Table 2: PSNR difference in dB between results for optimisation over $\mu$ only, with $\lambda=0$, and results for optimisation over both $\lambda$ and $\mu$ (Table 1). Negative values indicate that removing the $\ell_{1}$ term degrades performance.

	Test Image
Method	1	2	3	4	5
CBPDN + Grd	-2.31	-3.16	-2.51	-1.39	-0.94
CBPDN + STV	+0.04	-0.22	-0.04	-0.03	+0.03
CBPDN + VTV	-0.64	-0.77	-0.89	-0.29	-0.34
CBPDN + RTV	-1.28	-0.66	-0.73	-0.47	-0.33

Table 3: Comparison of denoising performance (PSNR in dB) of the different denoising methods for each of the five test images, all with the same parameters obtained by optimising over a separate image set. Bold values indicate the best performing CBPDN method. An italic value in the BPDN row indicates that BPDN gave the best overall performance for that image.

	Test Image
Method	1	2	3	4	5
BPDN	29.47	32.03	29.92	31.38	30.19
CBPDN	29.24	31.73	29.54	30.89	30.00
CBPDN + STV	29.90	32.36	29.86	31.68	30.29
CBPDN + VTV	29.54	32.35	29.86	31.34	30.25
CBPDN + RTV	29.16	32.49	29.76	31.25	30.19

Equations

\operatorname*{arg\,min}_{\{\mathbf{x}_{m}\}}\;\frac{1}{2}\Big\lVert\sum_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}-\mathbf{s}\Big\rVert_{2}^{2}+\lambda\sum_{m}\alpha_{m}\left\|\mathbf{x}_{m}\right\|_{1}\;,

\operatorname*{arg\,min}_{\mathbf{x}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{x}\right\|_{1}\;,

D=\begin{pmatrix}D_{0}&D_{1}&\ldots\end{pmatrix}\qquad\boldsymbol{\alpha}=\begin{pmatrix}\alpha_{0}\mathbf{1}\\ \alpha_{1}\mathbf{1}\\ \vdots\end{pmatrix}\qquad\mathbf{1}=\begin{pmatrix}1\\ 1\\ \vdots\end{pmatrix}\qquad\mathbf{x}=\begin{pmatrix}\mathbf{x}_{0}\\ \mathbf{x}_{1}\\ \vdots\end{pmatrix}\;.

\operatorname*{arg\,min}_{\mathbf{x},\mathbf{y}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{y}\right\|_{1}\;\text{ s.t. }\;\mathbf{x}-\mathbf{y}=0\;,

\mathbf{x}^{(j+1)}=\operatorname*{arg\,min}_{\mathbf{x}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\mathbf{x}-\mathbf{y}^{(j)}+\mathbf{u}^{(j)}\big\rVert_{2}^{2}

\mathbf{y}^{(j+1)}=\operatorname*{arg\,min}_{\mathbf{y}}\;\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{y}\right\|_{1}+\frac{\rho}{2}\big\lVert\mathbf{x}^{(j+1)}-\mathbf{y}+\mathbf{u}^{(j)}\big\rVert_{2}^{2}

\mathbf{u}^{(j+1)}=\mathbf{u}^{(j)}+\mathbf{x}^{(j+1)}-\mathbf{y}^{(j+1)}

\operatorname*{arg\,min}_{\hat{\mathbf{x}}}\;\frac{1}{2}\big\lVert\hat{D}\hat{\mathbf{x}}-\hat{\mathbf{s}}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\hat{\mathbf{x}}-(\hat{\mathbf{y}}-\hat{\mathbf{u}})\big\rVert_{2}^{2}\;,

(\hat{D}^{H}\hat{D}+\rho I)\,\hat{\mathbf{x}}=\hat{D}^{H}\hat{\mathbf{s}}+\rho(\hat{\mathbf{y}}-\hat{\mathbf{u}})\;.

\operatorname*{arg\,min}_{\{\mathbf{x}_{m}\}}\;\frac{1}{2}\Big\lVert\sum_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}-\mathbf{s}\Big\rVert_{2}^{2}+\lambda\sum_{m}\alpha_{m}\left\|\mathbf{x}_{m}\right\|_{1}+\mu\sum_{m}\beta_{m}\Big\lVert\sqrt{(\mathbf{g}_{0}\ast\mathbf{x}_{m})^{2}+(\mathbf{g}_{1}\ast\mathbf{x}_{m})^{2}}\Big\rVert_{1}\;,

\Gamma_{l}=\begin{pmatrix}\beta_{0}G_{l}&0&\ldots\\ 0&\beta_{1}G_{l}&\ldots\\ \vdots&\vdots&\ddots\end{pmatrix}

\operatorname*{arg\,min}_{\mathbf{x},\mathbf{y}_{0},\mathbf{y}_{1},\mathbf{y}_{2}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{y}_{2}\right\|_{1}+\mu\Big\lVert\sqrt{\mathbf{y}_{0}^{2}+\mathbf{y}_{1}^{2}}\Big\rVert_{1}\;\text{ s.t. }\;\begin{pmatrix}\Gamma_{0}\mathbf{x}\\ \Gamma_{1}\mathbf{x}\\ \mathbf{x}\end{pmatrix}-\begin{pmatrix}\mathbf{y}_{0}\\ \mathbf{y}_{1}\\ \mathbf{y}_{2}\end{pmatrix}=0\;.

\operatorname*{arg\,min}_{\mathbf{x}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\Gamma_{0}\mathbf{x}-\mathbf{y}_{0}+\mathbf{u}_{0}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\Gamma_{1}\mathbf{x}-\mathbf{y}_{1}+\mathbf{u}_{1}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\mathbf{x}-\mathbf{y}_{2}+\mathbf{u}_{2}\big\rVert_{2}^{2}\;,

\big(\hat{D}^{H}\hat{D}+\rho\hat{\Gamma}_{0}^{H}\hat{\Gamma}_{0}+\rho\hat{\Gamma}_{1}^{H}\hat{\Gamma}_{1}+\rho I\big)\,\hat{\mathbf{x}}=\hat{D}^{H}\hat{\mathbf{s}}+\rho\big((\hat{\mathbf{y}}_{2}-\hat{\mathbf{u}}_{2})+\hat{\Gamma}_{0}^{H}(\hat{\mathbf{y}}_{0}-\hat{\mathbf{u}}_{0})+\hat{\Gamma}_{1}^{H}(\hat{\mathbf{y}}_{1}-\hat{\mathbf{u}}_{1})\big)\;.

\operatorname*{arg\,min}_{\mathbf{y}_{2}}\;\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{y}_{2}\right\|_{1}+\frac{\rho}{2}\big\lVert\mathbf{x}-\mathbf{y}_{2}+\mathbf{u}_{2}\big\rVert_{2}^{2}

\operatorname*{arg\,min}_{\mathbf{y}_{0},\mathbf{y}_{1}}\;\mu\Big\lVert\sqrt{\mathbf{y}_{0}^{2}+\mathbf{y}_{1}^{2}}\Big\rVert_{1}+\frac{\rho}{2}\big\lVert\Gamma_{0}\mathbf{x}-\mathbf{y}_{0}+\mathbf{u}_{0}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\Gamma_{1}\mathbf{x}-\mathbf{y}_{1}+\mathbf{u}_{1}\big\rVert_{2}^{2}\;.

\mathbf{y}_{l}=\frac{\mathbf{z}_{l}}{\sqrt{\mathbf{z}_{0}^{2}+\mathbf{z}_{1}^{2}}}\max\Big(0,\sqrt{\mathbf{z}_{0}^{2}+\mathbf{z}_{1}^{2}}-\frac{\mu}{\rho}\Big)\quad l\in\{0,1\}

\operatorname*{arg\,min}_{\{\mathbf{x}_{m}\}}\;\frac{1}{2}\Big\lVert\sum_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}-\mathbf{s}\Big\rVert_{2}^{2}+\lambda\sum_{m}\alpha_{m}\left\|\mathbf{x}_{m}\right\|_{1}+\mu\Big\lVert\sqrt{\sum_{m}\beta_{m}\big[(\mathbf{g}_{0}\ast\mathbf{x}_{m})^{2}+(\mathbf{g}_{1}\ast\mathbf{x}_{m})^{2}\big]}\Big\rVert_{1}\;.

\mu\Big\lVert\sqrt{\sum_{m}\beta_{m}\big[(G_{0}\mathbf{x}_{m})^{2}+(G_{1}\mathbf{x}_{m})^{2}\big]}\Big\rVert_{1}\;.

\Gamma_{l}=\begin{pmatrix}\sqrt{\beta_{0}}G_{l}&0&\ldots\\ 0&\sqrt{\beta_{1}}G_{l}&\ldots\\ \vdots&\vdots&\ddots\end{pmatrix}

\operatorname*{arg\,min}_{\mathbf{x},\mathbf{y}_{0},\mathbf{y}_{1},\mathbf{y}_{2}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{y}_{2}\right\|_{1}+\mu\Big\lVert\sqrt{I_{B}\mathbf{y}_{0}^{2}+I_{B}\mathbf{y}_{1}^{2}}\Big\rVert_{1}\;\text{ s.t. }\;\begin{pmatrix}\Gamma_{0}\mathbf{x}\\ \Gamma_{1}\mathbf{x}\\ \mathbf{x}\end{pmatrix}-\begin{pmatrix}\mathbf{y}_{0}\\ \mathbf{y}_{1}\\ \mathbf{y}_{2}\end{pmatrix}=0\;.

\mathbf{y}_{l}=\frac{\mathbf{z}_{l}}{\sqrt{I_{B}\mathbf{z}_{0}^{2}+I_{B}\mathbf{z}_{1}^{2}}}\max\Big(0,\sqrt{I_{B}\mathbf{z}_{0}^{2}+I_{B}\mathbf{z}_{1}^{2}}-\frac{\mu}{\rho}\Big)

\operatorname*{arg\,min}_{\{\mathbf{x}_{m}\}}\;\frac{1}{2}\Big\lVert\sum_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}-\mathbf{s}\Big\rVert_{2}^{2}+\lambda\sum_{m}\alpha_{m}\left\|\mathbf{x}_{m}\right\|_{1}+\mu\bigg\lVert\sqrt{\Big(\mathbf{g}_{0}\ast\sum_{m}\beta_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}\Big)^{2}+\Big(\mathbf{g}_{1}\ast\sum_{m}\beta_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}\Big)^{2}}\bigg\rVert_{1}\;.

\mu\bigg\lVert\sqrt{\Big(\sum_{m}\beta_{m}(\mathbf{g}_{0}\ast\mathbf{d}_{m})\ast\mathbf{x}_{m}\Big)^{2}+\Big(\sum_{m}\beta_{m}(\mathbf{g}_{1}\ast\mathbf{d}_{m})\ast\mathbf{x}_{m}\Big)^{2}}\bigg\rVert_{1}\;.

\mu\bigg\lVert\sqrt{\Big(\sum_{m}G_{0,m}\mathbf{x}_{m}\Big)^{2}+\Big(\sum_{m}G_{1,m}\mathbf{x}_{m}\Big)^{2}}\bigg\rVert_{1}\;,

\operatorname*{arg\,min}_{\mathbf{x},\mathbf{y}_{0},\mathbf{y}_{1},\mathbf{y}_{2}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{y}_{2}\right\|_{1}+\mu\Big\lVert\sqrt{\mathbf{y}_{0}^{2}+\mathbf{y}_{1}^{2}}\Big\rVert_{1}\;\text{ s.t. }\;\begin{pmatrix}\Gamma_{0}\mathbf{x}\\ \Gamma_{1}\mathbf{x}\\ \mathbf{x}\end{pmatrix}-\begin{pmatrix}\mathbf{y}_{0}\\ \mathbf{y}_{1}\\ \mathbf{y}_{2}\end{pmatrix}=0\;.

\operatorname*{arg\,min}_{\mathbf{x}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\Gamma_{0}\mathbf{x}-\mathbf{y}_{0}+\mathbf{u}_{0}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\Gamma_{1}\mathbf{x}-\mathbf{y}_{1}+\mathbf{u}_{1}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\mathbf{x}-\mathbf{y}_{2}+\mathbf{u}_{2}\big\rVert_{2}^{2}\;.


Full text


Abstract

While convolutional sparse representations enjoy a number of useful properties, they have received limited attention for image reconstruction problems. The present paper compares the performance of block-based and convolutional sparse representations in the removal of Gaussian white noise. The usual formulation of the convolutional sparse coding problem is slightly inferior to the block-based representations in this problem, but the performance of the convolutional form can be boosted beyond that of the block-based form by the inclusion of suitable penalties on the gradients of the coefficient maps.

**Index Terms:** Convolutional Sparse Representations, Convolutional Sparse Coding, Total Variation

1 Introduction

Sparse representations are well-established as a tool for inverse problems in a wide variety of areas, including signal and image processing, computer vision, and machine learning [1]. The standard form is a linear representation $D\mathbf{x}\approx\mathbf{s}$, where $D$ is the dictionary, $\mathbf{x}$ is the representation, and $\mathbf{s}$ is the signal to be represented. When $D$ is a linear transform with a fast transform operator, such as the Discrete Wavelet Transform, these representations can be computed for large images. When $D$ is learned from training data and represented as an explicit matrix, however, this is not feasible, and the standard approach is to compute the representations independently over a set of overlapping image patches. Convolutional sparse representations are a recent alternative (more accurately, the label "convolutional" is recent, but the equivalent translation-invariant sparse representations are much older [2, Sec. II]). They replace the general linear representation with a sum of convolutions (typically circular convolutions [3]) $\sum_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}\approx\mathbf{s}$. Here the dictionary elements $\mathbf{d}_{m}$ are linear filters, and the representation consists of the set of coefficient maps $\mathbf{x}_{m}$.

There is growing interest in imaging and image processing applications of the convolutional form [4, 5, 6, 7, 8, 9]. Surprisingly, denoising of Gaussian white noise, arguably the simplest of all imaging inverse problems, has received almost no attention. The only example is a very brief one that provides insufficient detail for reproducibility [10, Sec. 4.4]. The present paper argues that, despite its numerous advantages in many contexts, the convolutional form is not competitive for the Gaussian white noise denoising problem. It also argues that this deficiency can be mitigated by moving beyond simple $\ell_{1}$ regularization; the specific form investigated here consists of additional penalties on the gradients of the coefficient maps. (A weighting strategy applied to the $\ell_{1}$ penalty has also been found to improve the denoising performance of convolutional sparse representations [11, Sec. 8], but that approach is not considered here due to space constraints.)

It is emphasised that these extensions have relevance beyond the specific denoising test problem considered here, in that the improved performance reported on this problem can also be expected to have an impact on more general image reconstruction problems, e.g. when convolutional sparse coding is employed as the prior within the plug-and-play priors framework [12, 13]. There is also evidence that the inclusion of such gradient penalties enhances the performance of convolutional sparse representations in certain image decomposition/restoration problems [7, 9].

2 Convolutional Sparse Coding

The most widely used form of convolutional sparse coding is Convolutional Basis Pursuit DeNoising (CBPDN), defined as

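$$\operatorname*{arg\,min}_{\{\mathbf{x}_{m}\}}\;\frac{1}{2}\Big\lVert\sum_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}-\mathbf{s}\Big\rVert_{2}^{2}+\lambda\sum_{m}\alpha_{m}\left\|\mathbf{x}_{m}\right\|_{1}\;, \tag{1}$$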

where the $\alpha_{m}$ allow distinct weighting of the $\ell_{1}$ term for each filter $\mathbf{d}_{m}$. At present, the most efficient approach to solving this problem [2] is via the Alternating Direction Method of Multipliers (ADMM) [14] framework. An outline of this method is presented here as a basis for the extensions proposed in the following sections.
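With circular boundary conditions, the synthesis term $\sum_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}$ in (1) becomes a pointwise product in the DFT domain, and this is what makes the problem tractable at image scale. The following is a minimal NumPy sketch that assumes circular convolution and filters zero-padded at the array origin. The function names `apply_D` and `apply_D_direct` are hypothetical and are not part of SPORCO.

```python
import numpy as np


def apply_D(d, x):
    """Synthesise sum_m d_m * x_m using circular convolution computed in the DFT domain.

    d: (M, h, w) filters; x: (M, H, W) coefficient maps. Returns an (H, W) image.
    """
    M, H, W = x.shape
    dp = np.zeros((M, H, W))
    dp[:, :d.shape[1], :d.shape[2]] = d            # zero-pad each filter to the map size
    Df = np.fft.fft2(dp)                            # fft2 acts on the last two axes
    Xf = np.fft.fft2(x)
    return np.real(np.fft.ifft2(np.sum(Df * Xf, axis=0)))  # sum over filters m


def apply_D_direct(d, x):
    """Reference implementation: the same circular convolutions built from shifted copies."""
    M, H, W = x.shape
    out = np.zeros((H, W))
    for m in range(M):
        for i in range(d.shape[1]):
            for j in range(d.shape[2]):
                # (d * x)[n] = sum_k d[k] x[n - k]; rolling x by +k gives x[n - k]
                out += d[m, i, j] * np.roll(x[m], shift=(i, j), axis=(0, 1))
    return out
```

The DFT route costs $O(MN\log N)$ for $N$ pixels. The direct loop is included only as a check of the circular-convolution convention.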

Problem (1) can be written as

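$$\operatorname*{arg\,min}_{\mathbf{x}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{x}\right\|_{1}\;, \tag{2}$$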

where $\odot$ is the Hadamard product, $D_{m}$ is a linear operator such that $D_{m}\mathbf{x}_{m}=\mathbf{d}_{m}\!\ast\!\mathbf{x}_{m}$ , and $D$ , $\boldsymbol{\alpha}$ , and $\mathbf{x}$ are the block matrices/vectors

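$$D=\begin{pmatrix}D_{0}&D_{1}&\ldots\end{pmatrix}\qquad\boldsymbol{\alpha}=\begin{pmatrix}\alpha_{0}\mathbf{1}\\ \alpha_{1}\mathbf{1}\\ \vdots\end{pmatrix}\qquad\mathbf{1}=\begin{pmatrix}1\\ 1\\ \vdots\end{pmatrix}\qquad\mathbf{x}=\begin{pmatrix}\mathbf{x}_{0}\\ \mathbf{x}_{1}\\ \vdots\end{pmatrix}\;. \tag{3}$$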

This problem can be expressed in ADMM standard form as

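$$\operatorname*{arg\,min}_{\mathbf{x},\mathbf{y}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{y}\right\|_{1}\;\text{ s.t. }\;\mathbf{x}-\mathbf{y}=0\;, \tag{4}$$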

which can be solved via the ADMM iterations

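$$\mathbf{x}^{(j+1)}=\operatorname*{arg\,min}_{\mathbf{x}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\mathbf{x}-\mathbf{y}^{(j)}+\mathbf{u}^{(j)}\big\rVert_{2}^{2} \tag{5}$$
$$\mathbf{y}^{(j+1)}=\operatorname*{arg\,min}_{\mathbf{y}}\;\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{y}\right\|_{1}+\frac{\rho}{2}\big\lVert\mathbf{x}^{(j+1)}-\mathbf{y}+\mathbf{u}^{(j)}\big\rVert_{2}^{2} \tag{6}$$
$$\mathbf{u}^{(j+1)}=\mathbf{u}^{(j)}+\mathbf{x}^{(j+1)}-\mathbf{y}^{(j+1)}\;. \tag{7}$$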

The solution to (6) is given by the soft thresholding operation [15, Sec. 6.5.2] $\mathbf{y}=\mathop{\mathrm{sign}}(\mathbf{z})\odot\max(0,\left|\mathbf{z}\right|-\lambda\boldsymbol{\alpha}/\rho)$ where $\mathbf{z}=\mathbf{x}+\mathbf{u}$ . The only computationally expensive step is (5), which can be solved via the equivalent DFT domain problem

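$$\operatorname*{arg\,min}_{\hat{\mathbf{x}}}\;\frac{1}{2}\big\lVert\hat{D}\hat{\mathbf{x}}-\hat{\mathbf{s}}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\hat{\mathbf{x}}-(\hat{\mathbf{y}}-\hat{\mathbf{u}})\big\rVert_{2}^{2}\;, \tag{8}$$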

where $\hat{\mathbf{z}}$ denotes the DFT of variable $\mathbf{z}$ . The solution for (8) is given by the $MN\times MN$ linear system (for $M$ filters and an image $\mathbf{s}$ with $N$ pixels)

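$$(\hat{D}^{H}\hat{D}+\rho I)\,\hat{\mathbf{x}}=\hat{D}^{H}\hat{\mathbf{s}}+\rho(\hat{\mathbf{y}}-\hat{\mathbf{u}})\;. \tag{9}$$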

The key to solving this very large linear system is the observation that it can be decomposed into $N$ independent $M\times M$ linear systems [16], one per frequency. Each of these has a system matrix that is the sum of a rank-one term and a diagonal term, so they can be solved very efficiently by exploiting the Sherman-Morrison formula [17].
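The steps (5) to (9) can be combined into a short solver. The following is a minimal NumPy sketch of the CBPDN ADMM iterations under stated assumptions: circular boundary conditions, a fixed penalty parameter $\rho$, and a fixed iteration count. Practical solvers typically add adaptive $\rho$ and convergence tests, which are omitted here. The names `solve_x_hat`, `soft_threshold`, and `cbpdn_admm` are hypothetical. At each frequency the row of $\hat{D}$ is $(\hat{d}_{0},\hat{d}_{1},\ldots)$, so $\hat{D}^{H}\hat{D}=\mathbf{a}\mathbf{a}^{H}$ with $\mathbf{a}$ the conjugate of that row, and Sherman-Morrison gives $\hat{\mathbf{x}}=(\mathbf{b}-\mathbf{a}\,\mathbf{a}^{H}\mathbf{b}/(\rho+\mathbf{a}^{H}\mathbf{a}))/\rho$.

```python
import numpy as np


def solve_x_hat(Df, b, rho):
    """Sherman-Morrison solution of (D^H D + rho I) x = b at every frequency.

    Df, b: arrays with the filter index m on axis 0 (e.g. shape (M, H, W) or (M, N)).
    """
    aHb = np.sum(Df * b, axis=0)                  # a^H b with a = conj(d_hat)
    aHa = np.sum(np.abs(Df) ** 2, axis=0)         # a^H a = sum_m |d_hat_m|^2
    return (b - np.conj(Df) * aHb / (rho + aHa)) / rho


def soft_threshold(z, t):
    """Elementwise solution of (6): sign(z) * max(0, |z| - t)."""
    return np.sign(z) * np.maximum(0.0, np.abs(z) - t)


def cbpdn_admm(d, s, lmbda, rho=1.0, alpha=None, iters=200):
    """Minimal ADMM sketch for CBPDN (1). d: (M, h, w) filters; s: (H, W) image.

    Returns the sparse ADMM variable y, an (M, H, W) stack of coefficient maps.
    """
    M = d.shape[0]
    H, W = s.shape
    alpha = np.ones(M) if alpha is None else np.asarray(alpha, dtype=float)
    dp = np.zeros((M, H, W))
    dp[:, :d.shape[1], :d.shape[2]] = d
    Df = np.fft.fft2(dp)                          # DFT of zero-padded filters
    DHs = np.conj(Df) * np.fft.fft2(s)            # D^H s in the DFT domain
    y = np.zeros((M, H, W))
    u = np.zeros_like(y)
    for _ in range(iters):
        # x step (5), solved in the DFT domain as in (8) and (9)
        x = np.real(np.fft.ifft2(solve_x_hat(Df, DHs + rho * np.fft.fft2(y - u), rho)))
        # y step (6): per-filter threshold lmbda * alpha_m / rho
        y = soft_threshold(x + u, (lmbda / rho) * alpha[:, None, None])
        # dual update (7)
        u = u + x - y
    return y
```

The returned `y` is exactly sparse because of the thresholding, while `x` only approaches it as the iterations converge.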

3 Gradient Regularization

An extension of (1) to include an $\ell_{2}$ penalty on the gradients of the coefficient maps was proposed in [6]. The primary purpose of this extension was to regularize an impulse filter intended to represent the low-frequency components of the image. However, a small non-zero regularization on the other dictionary filters was also found to give a small improvement in impulse noise denoising performance [6]. Because $\ell_{2}$ gradient regularization tends to smooth edges, Total Variation (TV) regularization, which better preserves edges, is a natural alternative. We consider three different variants:

1. scalar TV [18] applied independently to each coefficient map,
2. vector TV [19] applied jointly to the set of coefficient maps,
3. scalar TV [18] applied to the reconstructed image components $D_{m}\mathbf{x}_{m}$ rather than to the coefficient maps $\mathbf{x}_{m}$.

3.1 Scalar TV on Coefficient Maps

The CBPDN problem extended by adding a scalar TV term on each coefficient map can be written as

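$$\operatorname*{arg\,min}_{\{\mathbf{x}_{m}\}}\;\frac{1}{2}\Big\lVert\sum_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}-\mathbf{s}\Big\rVert_{2}^{2}+\lambda\sum_{m}\alpha_{m}\left\|\mathbf{x}_{m}\right\|_{1}+\mu\sum_{m}\beta_{m}\Big\lVert\sqrt{(\mathbf{g}_{0}\ast\mathbf{x}_{m})^{2}+(\mathbf{g}_{1}\ast\mathbf{x}_{m})^{2}}\Big\rVert_{1}\;, \tag{10}$$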

where $\mathbf{g}_{0}$ and $\mathbf{g}_{1}$ are filters that compute the gradients along image rows and columns respectively. The TV term can be written as $\mu\sum_{m}\beta_{m}\left\|\sqrt{(G_{0}\mathbf{x}_{m})^{2}+(G_{1}\mathbf{x}_{m})^{2}}\right\|_{1}$, where the linear operators $G_{0}$ and $G_{1}$ are defined such that $G_{l}\mathbf{x}_{m}=\mathbf{g}_{l}\ast\mathbf{x}_{m}$, and defining (note that the $\Gamma_{l}$ notation is overloaded, taking on a different definition in each section)

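$$\Gamma_{l}=\begin{pmatrix}\beta_{0}G_{l}&0&\ldots\\ 0&\beta_{1}G_{l}&\ldots\\ \vdots&\vdots&\ddots\end{pmatrix}$$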

allows further reduction to $\mu\left\|\sqrt{(\Gamma_{0}\mathbf{x})^{2}+(\Gamma_{1}\mathbf{x})^{2}}\right\|_{1}$ .
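To make the reduction concrete, the following is a minimal NumPy sketch that evaluates the scalar TV penalty through the block-diagonal $\Gamma_{l}$. It assumes circular forward differences for $G_{0}$ and $G_{1}$ (the specific difference filters and axis assignment are assumptions) and non-negative $\beta_{m}$, which the identity $\lVert\sqrt{(\beta_{m}a)^{2}+(\beta_{m}b)^{2}}\rVert_{1}=\beta_{m}\lVert\sqrt{a^{2}+b^{2}}\rVert_{1}$ requires. `grad` and `stv_penalty` are hypothetical helper names.

```python
import numpy as np


def grad(x, axis):
    """Circular forward difference along one spatial axis (assumed form of G_l)."""
    return np.roll(x, -1, axis=axis) - x


def stv_penalty(x, beta, mu):
    """mu * || sqrt((Gamma_0 x)^2 + (Gamma_1 x)^2) ||_1 for maps x of shape (M, H, W).

    Gamma_l scales the gradient of map m by beta_m, as in the block-diagonal definition.
    """
    b = np.asarray(beta, dtype=float)[:, None, None]
    g0 = b * grad(x, axis=1)   # Gamma_0 x
    g1 = b * grad(x, axis=2)   # Gamma_1 x
    return mu * np.sum(np.sqrt(g0 ** 2 + g1 ** 2))
```

Constant coefficient maps have zero gradient and therefore incur no penalty.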

Problem (10) can be written in standard ADMM form as

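$$\operatorname*{arg\,min}_{\mathbf{x},\mathbf{y}_{0},\mathbf{y}_{1},\mathbf{y}_{2}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{y}_{2}\right\|_{1}+\mu\Big\lVert\sqrt{\mathbf{y}_{0}^{2}+\mathbf{y}_{1}^{2}}\Big\rVert_{1}\;\text{ s.t. }\;\begin{pmatrix}\Gamma_{0}\mathbf{x}\\ \Gamma_{1}\mathbf{x}\\ \mathbf{x}\end{pmatrix}-\begin{pmatrix}\mathbf{y}_{0}\\ \mathbf{y}_{1}\\ \mathbf{y}_{2}\end{pmatrix}=0\;. \tag{18}$$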

The resulting $\mathbf{x}$ subproblem has the form

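$$\operatorname*{arg\,min}_{\mathbf{x}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\Gamma_{0}\mathbf{x}-\mathbf{y}_{0}+\mathbf{u}_{0}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\Gamma_{1}\mathbf{x}-\mathbf{y}_{1}+\mathbf{u}_{1}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\mathbf{x}-\mathbf{y}_{2}+\mathbf{u}_{2}\big\rVert_{2}^{2}\;, \tag{19}$$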

and the solution of the equivalent DFT domain problem is given by

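$$\big(\hat{D}^{H}\hat{D}+\rho\hat{\Gamma}_{0}^{H}\hat{\Gamma}_{0}+\rho\hat{\Gamma}_{1}^{H}\hat{\Gamma}_{1}+\rho I\big)\,\hat{\mathbf{x}}=\hat{D}^{H}\hat{\mathbf{s}}+\rho\big((\hat{\mathbf{y}}_{2}-\hat{\mathbf{u}}_{2})+\hat{\Gamma}_{0}^{H}(\hat{\mathbf{y}}_{0}-\hat{\mathbf{u}}_{0})+\hat{\Gamma}_{1}^{H}(\hat{\mathbf{y}}_{1}-\hat{\mathbf{u}}_{1})\big)\;. \tag{20}$$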

Since $\hat{\Gamma}_{0}^{H}\hat{\Gamma}_{0}$ and $\hat{\Gamma}_{1}^{H}\hat{\Gamma}_{1}$ are diagonal (the $\hat{G}_{l}$ are diagonal, and therefore so are $\hat{\Gamma}_{l}$ ), they can be grouped together with the $\rho I$ term; the independent linear systems described in Sec. 2 are again composed from rank-one and diagonal terms and the Sherman-Morrison solution [17] can be directly applied without any substantial increase in computational cost.
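At each frequency, the grouped diagonal is $\rho(1+\beta_{m}^{2}(|\hat{g}_{0}|^{2}+|\hat{g}_{1}|^{2}))$ for filter $m$, and Sherman-Morrison for "rank-one plus diagonal" gives $\hat{\mathbf{x}}=\Delta^{-1}\mathbf{b}-\Delta^{-1}\mathbf{a}\,(\mathbf{a}^{H}\Delta^{-1}\mathbf{b})/(1+\mathbf{a}^{H}\Delta^{-1}\mathbf{a})$. The following is a minimal vectorised NumPy sketch that assumes the right-hand side $\mathbf{b}$ of (20) has already been formed. `stv_diagonal` and `solve_diag_plus_rank1` are hypothetical names.

```python
import numpy as np


def stv_diagonal(beta, G0f, G1f, rho):
    """Diagonal of rho * (Gamma0^H Gamma0 + Gamma1^H Gamma1 + I) in the DFT domain, shape (M, N).

    beta: length-M weights; G0f, G1f: length-N DFTs of the gradient filters.
    """
    b = np.asarray(beta, dtype=float)[:, None]
    return rho * (1.0 + b ** 2 * (np.abs(G0f) ** 2 + np.abs(G1f) ** 2)[None, :])


def solve_diag_plus_rank1(Df, delta, rhs):
    """Solve (a a^H + diag(delta)) x = rhs at every frequency n, with a = conj(Df[:, n])."""
    Dinv_b = rhs / delta
    Dinv_a = np.conj(Df) / delta
    num = np.sum(Df * Dinv_b, axis=0)          # a^H Delta^{-1} b
    den = 1.0 + np.sum(Df * Dinv_a, axis=0)    # 1 + a^H Delta^{-1} a
    return Dinv_b - Dinv_a * num / den
```

The cost per frequency remains $O(M)$, which is consistent with the claim that the TV terms add no substantial cost to the $\mathbf{x}$ step.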

The $\mathbf{y}$ subproblem for (18) can be decomposed into the independent problems

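$$\operatorname*{arg\,min}_{\mathbf{y}_{2}}\;\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{y}_{2}\right\|_{1}+\frac{\rho}{2}\big\lVert\mathbf{x}-\mathbf{y}_{2}+\mathbf{u}_{2}\big\rVert_{2}^{2} \tag{21}$$
$$\operatorname*{arg\,min}_{\mathbf{y}_{0},\mathbf{y}_{1}}\;\mu\Big\lVert\sqrt{\mathbf{y}_{0}^{2}+\mathbf{y}_{1}^{2}}\Big\rVert_{1}+\frac{\rho}{2}\big\lVert\Gamma_{0}\mathbf{x}-\mathbf{y}_{0}+\mathbf{u}_{0}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\Gamma_{1}\mathbf{x}-\mathbf{y}_{1}+\mathbf{u}_{1}\big\rVert_{2}^{2}\;. \tag{22}$$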

The solution for (21) is the same as that for (6), and (22) can be solved by use of the block soft thresholding operation [15, Sec. 6.5.1] applied in the same way as in the ADMM algorithm for the standard isotropic TV denoising problem [20, 21], [22, Sec. 4.1], i.e.

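$$\mathbf{y}_{l}=\frac{\mathbf{z}_{l}}{\sqrt{\mathbf{z}_{0}^{2}+\mathbf{z}_{1}^{2}}}\max\Big(0,\sqrt{\mathbf{z}_{0}^{2}+\mathbf{z}_{1}^{2}}-\frac{\mu}{\rho}\Big)\quad l\in\{0,1\} \tag{23}$$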

where $\mathbf{z}_{l}=\Gamma_{l}\mathbf{x}+\mathbf{u}_{l}$ for $l\in\{0,1\}$ .
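The block soft thresholding step above shrinks the joint magnitude of the two gradient components at each pixel and leaves their direction unchanged. The following is a minimal NumPy sketch. It assumes $\mu/\rho\ge 0$ and resolves the $0/0$ case at zero-magnitude pixels explicitly by outputting zero. `block_shrink_stv` is a hypothetical name.

```python
import numpy as np


def block_shrink_stv(z0, z1, mu, rho):
    """y_l = z_l / |z| * max(0, |z| - mu/rho), with |z| = sqrt(z0^2 + z1^2) elementwise."""
    mag = np.sqrt(z0 ** 2 + z1 ** 2)
    # divide by |z| only where it is non-zero; max(0, 0 - mu/rho) = 0 covers the rest
    scale = np.maximum(0.0, mag - mu / rho) / np.where(mag > 0, mag, 1.0)
    return scale * z0, scale * z1
```

For example, with $\mu/\rho=1$ the pair $(3,4)$ has magnitude 5 and shrinks to $(2.4,3.2)$, while $(0.3,0.4)$ is set to zero.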

3.2 Vector TV on Coefficient Maps

Instead of independently applying scalar TV to each coefficient map, one can treat the set of coefficient maps as a multi-channel image and apply Vector TV [19], originally designed for restoration of colour images. The corresponding extension of the CBPDN problem can be written as

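$$\operatorname*{arg\,min}_{\{\mathbf{x}_{m}\}}\;\frac{1}{2}\Big\lVert\sum_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}-\mathbf{s}\Big\rVert_{2}^{2}+\lambda\sum_{m}\alpha_{m}\left\|\mathbf{x}_{m}\right\|_{1}+\mu\Big\lVert\sqrt{\sum_{m}\beta_{m}\big[(\mathbf{g}_{0}\ast\mathbf{x}_{m})^{2}+(\mathbf{g}_{1}\ast\mathbf{x}_{m})^{2}\big]}\Big\rVert_{1}\;. \tag{24}$$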

Using the $G_{l}$ as defined in Sec. 3.1, the TV term can be written as

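$$\mu\Big\lVert\sqrt{\sum_{m}\beta_{m}\big[(G_{0}\mathbf{x}_{m})^{2}+(G_{1}\mathbf{x}_{m})^{2}\big]}\Big\rVert_{1}\;.$$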

Defining $I_{B}=\left(\begin{array}[]{cccc}I&I&\ldots&I\end{array}\right)$ and

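$$\Gamma_{l}=\begin{pmatrix}\sqrt{\beta_{0}}G_{l}&0&\ldots\\ 0&\sqrt{\beta_{1}}G_{l}&\ldots\\ \vdots&\vdots&\ddots\end{pmatrix}$$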

allows further reduction to $\mu\left\|\sqrt{I_{B}(\Gamma_{0}\mathbf{x})^{2}+I_{B}(\Gamma_{1}\mathbf{x})^{2}}\right\|_{1}\;.$

Problem (24) can be written in standard ADMM form as

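$$\operatorname*{arg\,min}_{\mathbf{x},\mathbf{y}_{0},\mathbf{y}_{1},\mathbf{y}_{2}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{y}_{2}\right\|_{1}+\mu\Big\lVert\sqrt{I_{B}\mathbf{y}_{0}^{2}+I_{B}\mathbf{y}_{1}^{2}}\Big\rVert_{1}\;\text{ s.t. }\;\begin{pmatrix}\Gamma_{0}\mathbf{x}\\ \Gamma_{1}\mathbf{x}\\ \mathbf{x}\end{pmatrix}-\begin{pmatrix}\mathbf{y}_{0}\\ \mathbf{y}_{1}\\ \mathbf{y}_{2}\end{pmatrix}=0\;.$$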

The resulting $\mathbf{x}$ subproblem has the same form as (19) and can be solved in the same way. The $\mathbf{y}_{2}$ subproblem is the same as (21) and can be solved in the same way, while the $\mathbf{y}_{0},\mathbf{y}_{1}$ subproblem, which only differs from (22) in the first term, can be solved by

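$$\mathbf{y}_{l}=\frac{\mathbf{z}_{l}}{\sqrt{I_{B}\mathbf{z}_{0}^{2}+I_{B}\mathbf{z}_{1}^{2}}}\max\Big(0,\sqrt{I_{B}\mathbf{z}_{0}^{2}+I_{B}\mathbf{z}_{1}^{2}}-\frac{\mu}{\rho}\Big)$$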

where $\mathbf{z}_{l}=\Gamma_{l}\mathbf{x}+\mathbf{u}_{l}$ for $l\in\{0,1\}$ .
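The only change from the scalar case is that $I_{B}$ sums the squared gradients over all coefficient maps. As a result, every map at a given pixel shares a single shrinkage factor. The following is a minimal NumPy sketch that stores the maps along axis 0 and lets broadcasting play the role of expanding the image-sized magnitude back to the coefficient-map size. The storage layout is an assumption, and `block_shrink_vtv` is a hypothetical name.

```python
import numpy as np


def block_shrink_vtv(z0, z1, mu, rho):
    """Vector-TV shrinkage: the magnitude sqrt(I_B z0^2 + I_B z1^2) is pooled over the M maps.

    z0, z1: (M, H, W) arrays z_l = Gamma_l x + u_l.
    """
    mag = np.sqrt(np.sum(z0 ** 2, axis=0) + np.sum(z1 ** 2, axis=0))   # (H, W): I_B sums over maps
    scale = np.maximum(0.0, mag - mu / rho) / np.where(mag > 0, mag, 1.0)
    return scale[None] * z0, scale[None] * z1                          # same factor for every map
```

Because the magnitude is pooled, a weak gradient in one map can survive thresholding if other maps have strong gradients at the same pixel. Under the scalar form, that weak gradient would be zeroed.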

3.3 Scalar TV in Image Domain

TV regularization was introduced above as an exploration of additional or alternative forms of regularization to the standard $\ell_{1}$ regularization of the coefficient maps $\mathbf{x}$. An alternative way of introducing TV regularization, however, is to regard it as a regularization on the components $D_{m}\mathbf{x}_{m}$ of the reconstructed image, which can be written as

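$$\operatorname*{arg\,min}_{\{\mathbf{x}_{m}\}}\;\frac{1}{2}\Big\lVert\sum_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}-\mathbf{s}\Big\rVert_{2}^{2}+\lambda\sum_{m}\alpha_{m}\left\|\mathbf{x}_{m}\right\|_{1}+\mu\bigg\lVert\sqrt{\Big(\mathbf{g}_{0}\ast\sum_{m}\beta_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}\Big)^{2}+\Big(\mathbf{g}_{1}\ast\sum_{m}\beta_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}\Big)^{2}}\bigg\rVert_{1}\;. \tag{34}$$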

The final TV term can be expressed as

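$$\mu\bigg\lVert\sqrt{\Big(\sum_{m}\beta_{m}(\mathbf{g}_{0}\ast\mathbf{d}_{m})\ast\mathbf{x}_{m}\Big)^{2}+\Big(\sum_{m}\beta_{m}(\mathbf{g}_{1}\ast\mathbf{d}_{m})\ast\mathbf{x}_{m}\Big)^{2}}\bigg\rVert_{1}\;.$$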

Introducing linear operators $G_{l,m}$ defined such that $G_{l,m}\mathbf{x}=\beta_{m}(\mathbf{g}_{l}\ast\mathbf{d}_{m})\ast\mathbf{x}$ , this can be written as

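$$\mu\bigg\lVert\sqrt{\Big(\sum_{m}G_{0,m}\mathbf{x}_{m}\Big)^{2}+\Big(\sum_{m}G_{1,m}\mathbf{x}_{m}\Big)^{2}}\bigg\rVert_{1}\;,$$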

and defining $\Gamma_{l}=\left(\begin{array}[]{ccc}G_{l,0}&G_{l,1}&\ldots\end{array}\right)$ allows further reduction to $\mu\big{\lVert}\sqrt{(\Gamma_{0}\mathbf{x})^{2}+(\Gamma_{1}\mathbf{x})^{2}}\big{\rVert}_{1}\;.$
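In the DFT domain, each $G_{l,m}$ is diagonal with entries $\beta_{m}\hat{g}_{l}\hat{d}_{m}$. At a single frequency, $\hat{\Gamma}_{l}$ is therefore a $1\times M$ row, which is why $\hat{\Gamma}_{l}^{H}\hat{\Gamma}_{l}$ is rank-one in this section. The following minimal NumPy sketch builds those rows and checks that $\Gamma_{l}\mathbf{x}$ equals the gradient of the $\beta$-weighted reconstruction. It assumes circular convolution and, in the check, a forward-difference gradient filter. `rtv_gamma_hat` and `gamma_x` are hypothetical names.

```python
import numpy as np


def rtv_gamma_hat(Df, Gf, beta):
    """Entries beta_m * g_hat_l * d_hat_m of the row Gamma_l-hat at every frequency.

    Df: (M, H, W) DFT of zero-padded filters; Gf: (H, W) DFT of gradient filter g_l.
    """
    return np.asarray(beta, dtype=float)[:, None, None] * Gf[None] * Df


def gamma_x(Df, Gf, beta, x):
    """Gamma_l x = sum_m G_{l,m} x_m, evaluated in the DFT domain; returns an (H, W) image."""
    Xf = np.fft.fft2(x)
    return np.real(np.fft.ifft2(np.sum(rtv_gamma_hat(Df, Gf, beta) * Xf, axis=0)))
```

By associativity of convolution, $\mathbf{g}_{l}\ast\sum_{m}\beta_{m}\mathbf{d}_{m}\ast\mathbf{x}_{m}=\sum_{m}\beta_{m}(\mathbf{g}_{l}\ast\mathbf{d}_{m})\ast\mathbf{x}_{m}$. The two routes therefore agree, and the result is image-sized rather than coefficient-map-sized.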

Problem (34) can be written in standard ADMM form as

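$$\operatorname*{arg\,min}_{\mathbf{x},\mathbf{y}_{0},\mathbf{y}_{1},\mathbf{y}_{2}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\lambda\left\|\boldsymbol{\alpha}\odot\mathbf{y}_{2}\right\|_{1}+\mu\Big\lVert\sqrt{\mathbf{y}_{0}^{2}+\mathbf{y}_{1}^{2}}\Big\rVert_{1}\;\text{ s.t. }\;\begin{pmatrix}\Gamma_{0}\mathbf{x}\\ \Gamma_{1}\mathbf{x}\\ \mathbf{x}\end{pmatrix}-\begin{pmatrix}\mathbf{y}_{0}\\ \mathbf{y}_{1}\\ \mathbf{y}_{2}\end{pmatrix}=0\;. \tag{41}$$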

The resulting $\mathbf{x}$ subproblem corresponding to (5) has the form

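$$\operatorname*{arg\,min}_{\mathbf{x}}\;\frac{1}{2}\big\lVert D\mathbf{x}-\mathbf{s}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\Gamma_{0}\mathbf{x}-\mathbf{y}_{0}+\mathbf{u}_{0}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\Gamma_{1}\mathbf{x}-\mathbf{y}_{1}+\mathbf{u}_{1}\big\rVert_{2}^{2}+\frac{\rho}{2}\big\lVert\mathbf{x}-\mathbf{y}_{2}+\mathbf{u}_{2}\big\rVert_{2}^{2}\;.$$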

and the solution of the equivalent DFT domain problem is given by

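$$\big(\hat{D}^{H}\hat{D}+\rho\hat{\Gamma}_{0}^{H}\hat{\Gamma}_{0}+\rho\hat{\Gamma}_{1}^{H}\hat{\Gamma}_{1}+\rho I\big)\,\hat{\mathbf{x}}=\hat{D}^{H}\hat{\mathbf{s}}+\rho\big((\hat{\mathbf{y}}_{2}-\hat{\mathbf{u}}_{2})+\hat{\Gamma}_{0}^{H}(\hat{\mathbf{y}}_{0}-\hat{\mathbf{u}}_{0})+\hat{\Gamma}_{1}^{H}(\hat{\mathbf{y}}_{1}-\hat{\mathbf{u}}_{1})\big)\;.$$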

Although the left-hand side has the same algebraic form as that of (20), here $\hat{\Gamma}_{0}^{H}\hat{\Gamma}_{0}$ and $\hat{\Gamma}_{1}^{H}\hat{\Gamma}_{1}$ are rank-one rather than diagonal, and therefore cannot be grouped together with the $\rho I$ term as in the solution for (20). In this case the left-hand side is a rank-three term plus a diagonal. The simple Sherman-Morrison approach cannot solve it, but there is still an efficient solution via iterated application of the Sherman-Morrison formula, as used to solve the CBPDN problem for a multi-channel image and dictionary [23]. This involves a greater computational cost, but there is a corresponding reduction in memory requirements because $\mathbf{y}_{0}$ and $\mathbf{y}_{1}$ are only of the size of the image rather than of the size of the set of coefficient maps.
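At one frequency, the system matrix above is $\rho I+\mathbf{a}_{0}\mathbf{a}_{0}^{H}+\mathbf{a}_{1}\mathbf{a}_{1}^{H}+\mathbf{a}_{2}\mathbf{a}_{2}^{H}$. Here $\mathbf{a}_{0}$ is the conjugate of the row of $\hat{D}$, and $\mathbf{a}_{1},\mathbf{a}_{2}$ are $\sqrt{\rho}$ times the conjugates of the rows of $\hat{\Gamma}_{0},\hat{\Gamma}_{1}$. The following is a minimal per-frequency NumPy sketch of iterated Sherman-Morrison: each rank-one term is added in turn, reusing the solves for the earlier terms. The implementation referenced in [23] vectorises over frequencies and may organise the computation differently. `iterated_sm_solve` is a hypothetical name.

```python
import numpy as np


def iterated_sm_solve(a_vecs, rho, b):
    """Solve (rho I + sum_k a_k a_k^H) x = b by applying Sherman-Morrison once per rank-one term.

    a_vecs: (K, M) complex array whose rows are the a_k; b: length-M right-hand side.
    With A_0 = rho I and A_{k+1} = A_k + a_k a_k^H, c_k = A_k^{-1} a_k and gam_k = 1 + a_k^H c_k.
    """
    c, gam = [], []
    for k, ak in enumerate(a_vecs):
        v = ak / rho                                      # A_0^{-1} a_k
        for i in range(k):                                # advance to A_k^{-1} a_k
            v = v - c[i] * np.vdot(a_vecs[i], v) / gam[i]
        c.append(v)
        gam.append(1.0 + np.vdot(ak, v))
    x = b / rho                                           # A_0^{-1} b
    for k in range(len(a_vecs)):                          # advance to A_K^{-1} b
        x = x - c[k] * np.vdot(a_vecs[k], x) / gam[k]
    return x
```

For $K=3$ terms the cost per frequency is $O(K^{2}M)$, which is consistent with the extra computation time reported for the method of Sec. 3.3.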

The $\mathbf{y}$ subproblem for (41) has the same form as (21) – (22), and can be solved in the same way.

4 Results

The performance of standard block-based sparse coding and the different convolutional sparse coding methods described in Sections 2 and 3 was compared on a Gaussian white noise restoration problem. The standard sparse coding was computed via the Basis Pursuit DeNoising (BPDN) problem (i.e. problem (2) where $D$ is a standard dictionary matrix) and the resulting denoised blocks were aggregated via averaging (weighted by the number of blocks covering each pixel) to obtain a denoised image.
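The aggregation step for the block-based baseline can be sketched as follows. This minimal NumPy sketch assumes fully overlapping $b\times b$ blocks at stride 1, supplied in row-major order of their top-left corners. The BPDN coding of each block is not shown, and `aggregate_patches` is a hypothetical name.

```python
import numpy as np


def aggregate_patches(patches, shape, b):
    """Average overlapping b x b patches into an image, dividing each pixel by its coverage count.

    patches: (P, b, b) denoised patches for every top-left position (i, j),
    i in range(H - b + 1), j in range(W - b + 1), in row-major order (stride 1 assumed).
    """
    H, W = shape
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    p = 0
    for i in range(H - b + 1):
        for j in range(W - b + 1):
            acc[i:i + b, j:j + b] += patches[p]
            cnt[i:i + b, j:j + b] += 1
            p += 1
    return acc / cnt
```

If the patches are unmodified blocks of an image, the aggregation returns that image exactly, which is a useful sanity check.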

Two different dictionaries, one standard and one convolutional, were learned from the same set of ten training images (selected from images on Flickr with a Creative Commons license) of $1024\times 1024$ pixels each. The convolutional dictionary consisted of 128 filters of size $8\times 8$ , and was learned via the convolutional dictionary learning algorithm described in [2], while the standard dictionary consisted of 128 vectors of 64 coefficients each (i.e. a vectorised $8\times 8$ image block), and was learned via a non-convolutional variant of the algorithm used for learning the convolutional dictionary, applied to all $8\times 8$ image blocks in the training images. The standard dictionary was used for the BPDN experiments and the convolutional dictionary was used for all CBPDN experiments.

A set of five greyscale reference images, depicted in Fig. 1, was constructed by cropping regions of $256\times 256$ pixels from well-known standard test images. The regions were chosen to contain a diversity of content while avoiding large smooth areas. The size was chosen to be relatively small so that it would be computationally feasible to optimise method parameters via a grid search. The reference images were scaled so that pixel values were in the interval $[0,1]$, and corresponding test images were constructed by adding Gaussian white noise with a standard deviation of 0.05. Following standard practice [24], [6, Sec. 3], the CBPDN decomposition was applied to highpass filtered images. These were obtained by subtracting a lowpass component computed by Tikhonov regularization [25, pg. 3] with regularization parameter $\lambda_{L}=2.0$.
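The lowpass/highpass split can be sketched as follows. The sketch assumes the Tikhonov problem $\operatorname{arg\,min}_{\mathbf{x}}\frac{1}{2}\lVert\mathbf{x}-\mathbf{s}\rVert_{2}^{2}+\frac{\lambda_{L}}{2}(\lVert G_{0}\mathbf{x}\rVert_{2}^{2}+\lVert G_{1}\mathbf{x}\rVert_{2}^{2})$ with circular forward differences. Under that assumption, the solution is a single DFT-domain division. The implementation in SPORCO [25] may differ in scaling and boundary handling (for example, by padding). `tikhonov_split` is a hypothetical name.

```python
import numpy as np


def tikhonov_split(s, lmbda=2.0):
    """Split s into (lowpass, highpass) parts using circular Tikhonov filtering (assumed form).

    Lowpass: x_hat = s_hat / (1 + lmbda * (|g0_hat|^2 + |g1_hat|^2)).
    """
    H, W = s.shape
    g0 = np.zeros((H, W)); g0[0, 0], g0[-1, 0] = -1.0, 1.0   # forward difference along axis 0
    g1 = np.zeros((H, W)); g1[0, 0], g1[0, -1] = -1.0, 1.0   # forward difference along axis 1
    den = 1.0 + lmbda * (np.abs(np.fft.fft2(g0)) ** 2 + np.abs(np.fft.fft2(g1)) ** 2)
    sl = np.real(np.fft.ifft2(np.fft.fft2(s) / den))
    return sl, s - sl
```

The gradient filters have zero response at DC, so the lowpass component keeps the image mean and the highpass component has zero mean.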

For the first set of experiments, the results of which are displayed in Table 1, the denoising performance of the different methods was individually optimised for each image via a search over a logarithmically spaced grid on the $\lambda$ and $\mu$ parameters. The main points worth noting are:

• BPDN is consistently better than CBPDN by a small margin.
• CBPDN + Grd ($\ell_{2}$ regularization of the gradient, as in [6, Sec. 4]) gives very similar performance to CBPDN, being slightly better on some test images and slightly worse on others.
• CBPDN + STV (see Sec. 3.1) gives the best overall performance on three of the five test images, with performance within a few tenths of a dB of the best in the other cases. It is consistently better than CBPDN, and better than BPDN in all but one of the test cases.
• In a comparison between CBPDN + STV and CBPDN + VTV (see Sec. 3.2), the former is sometimes better by a moderate margin, but when it is worse this is by a very small amount.
• CBPDN + RTV (see Sec. 3.3) is always worse than the other two TV-augmented CBPDN methods, and is sometimes no better than CBPDN.
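The per-image parameter selection described above can be sketched as a grid search that maximises PSNR against the reference image, with PSNR computed for a peak value of 1 because the images are scaled to $[0,1]$. In this minimal sketch, `denoise(noisy, lmbda, mu)` stands in for any of the methods and is a hypothetical callable. The grids would be logarithmically spaced, for example `np.logspace(-2, 0, 9)`, but the actual grid ranges are not given in the text.

```python
import itertools

import numpy as np


def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((np.asarray(ref) - np.asarray(est)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)


def grid_search(denoise, noisy, ref, lmbdas, mus):
    """Return (best PSNR, lmbda, mu) over all grid pairs; denoise is a hypothetical callable."""
    best = (-np.inf, None, None)
    for lmbda, mu in itertools.product(lmbdas, mus):
        score = psnr(ref, denoise(noisy, lmbda, mu))
        if score > best[0]:
            best = (score, lmbda, mu)
    return best
```

The final set of experiments uses a variant that averages the score over a separate image set before taking the maximum. It applies the resulting single $(\lambda,\mu)$ pair to all test images.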

The computation times per iteration for the different methods were approximately 0.5 s for BPDN and CBPDN, 0.6 s for CBPDN + Grd, 2.2 s for CBPDN + STV and CBPDN + VTV, and 2.4 s for CBPDN + RTV, i.e. the improved performance of the TV methods is obtained at a significant computational cost.

The second set of experiments evaluated the efficacy of the terms augmenting plain CBPDN by comparing the denoising performance at the best choices of both $\lambda$ and $\mu$ (as in Table 1) with the same method with $\lambda$ fixed to zero and optimisation only over $\mu$ . (There is no need to perform a corresponding comparison with $\mu$ fixed to zero since this corresponds to the baseline CBPDN method.) The differences between the PSNR values of the methods optimised over both parameters and only optimised over $\mu$ are displayed in Table 2. Note that, for CBPDN + STV, there is a positive difference in two cases and a very small negative difference in two other cases, i.e. for most of the test images, the convolutional representation with only a TV regularization term is competitive with the baseline CBPDN. For all of the other methods the performance is substantially degraded without the $\ell_{1}$ term.

The final set of experiments considers a more realistic scenario in which ground truth is not available for parameter selection for the test images, making it necessary to choose the $\lambda$ and $\mu$ parameters by optimising over a distinct parameter selection image set. The same $\lambda$ and $\mu$ parameters were selected for all test images by finding the values giving the best average performance for a separate image set, again via a search on a logarithmically spaced grid. The results for this experiment are presented in Table 3. Overall, the relative performances of the different methods do not differ qualitatively from those of the experiments reported in Table 1. (CBPDN + Grd is excluded from this set of experiments since it is clear from the first two sets of experiments that it is not competitive.)

5 Conclusions

While a strictly apples-to-apples comparison between BPDN and CBPDN denoising methods is difficult to construct, the careful attempt reported here indicates that BPDN is slightly superior to baseline CBPDN, but that augmentation of the baseline CBPDN functional with the appropriate TV term substantially boosts performance, surpassing that of BPDN in all but one of the five test cases considered here. With respect to the specific form of additional TV term, scalar TV applied independently to each coefficient map is somewhat superior to a joint vector TV term over all of the coefficient maps, and both of these methods are substantially superior to TV applied in the reconstruction domain rather than to the coefficient maps, indicating that the gain from a TV term on the coefficient maps should not be viewed simply as resulting from denoising via a synthesis of sparse representation and TV image models. It is particularly interesting that the convolutional sparse coding problem with only an STV penalty is competitive in performance with the usual CBPDN form with only an $\ell_{1}$ penalty. At a more abstract level, these results suggest that penalties that exploit the spatial structure of the coefficient maps are necessary to achieve the true potential of the convolutional model.

Implementations of the algorithms proposed here are included in the Python version of the SPORCO library [26, 25].

Bibliography

[1] J. Mairal, F. Bach, and J. Ponce, “Sparse modeling for image and vision processing,” Foundations and Trends in Computer Graphics and Vision, vol. 8, no. 2-3, pp. 85–283, 2014. doi: 10.1561/0600000058
[2] B. Wohlberg, “Efficient algorithms for convolutional sparse representations,” IEEE Trans. Image Process., vol. 25, no. 1, pp. 301–315, Jan. 2016. doi: 10.1109/TIP.2015.2495260
[3] B. Wohlberg, “Boundary handling for convolutional sparse representations,” in Proc. IEEE Conf. Image Process. (ICIP), Sep. 2016, pp. 1833–1837. doi: 10.1109/ICIP.2016.7532675
[4] S. Gu, W. Zuo, Q. Xie, D. Meng, X. Feng, and L. Zhang, “Convolutional sparse coding for image super-resolution,” in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), Dec. 2015. doi: 10.1109/ICCV.2015.212
[5] Y. Liu, X. Chen, R. K. Ward, and Z. J. Wang, “Image fusion with convolutional sparse representation,” IEEE Signal Process. Lett., 2016. doi: 10.1109/lsp.2016.2618776
[6] B. Wohlberg, “Convolutional sparse representations as an image model for impulse noise restoration,” in Proc. IEEE Image Video Multidim. Signal Process. Workshop (IVMSP), Bordeaux, France, Jul. 2016. doi: 10.1109/IVMSPW.2016.7528229
[7] H. Zhang and V. Patel, “Convolutional sparse coding-based image decomposition,” in British Mach. Vis. Conf. (BMVC), York, UK, Sep. 2016.
[8] T. M. Quan and W.-K. Jeong, “Compressed sensing reconstruction of dynamic contrast enhanced MRI using GPU-accelerated convolutional sparse coding,” in Proc. IEEE Int. Symp. Biomed. Imaging (ISBI), Apr. 2016, pp. 518–521. doi: 10.1109/isbi.2016.7493321