BSGD-TV: A parallel algorithm solving total variation constrained image reconstruction problems

Yushan Gao; Thomas Blumensath

arXiv:1812.01307·cs.IT·December 5, 2018

TL;DR

This paper introduces a parallel algorithm for large-scale total variation constrained image reconstruction, demonstrating faster convergence than existing methods through theoretical proof and numerical experiments.

Contribution

The paper presents a novel parallel algorithm for TV constrained image reconstruction with proven convergence and improved speed over block ADMM.

Findings

1. Faster convergence compared to block ADMM
2. Theoretical proof of convergence
3. Effective for large-scale problems

Equations

$$\mathbf{x}^{\star}=\arg\min_{\mathbf{x}}\ \underbrace{(\mathbf{y}-\mathbf{A}\mathbf{x})^{T}(\mathbf{y}-\mathbf{A}\mathbf{x})}_{f(\mathbf{x})}+\underbrace{2\lambda\,\text{TV}(\mathbf{x})}_{g(\mathbf{x})},$$

$$\text{TV}(\mathbf{x})=\sum_{s,t}\sqrt{(x_{s,t}-x_{s-1,t})^{2}+(x_{s,t}-x_{s,t-1})^{2}},$$

$$\mathbf{x}^{k+1}=\mathbf{x}^{k}+2\mu\mathbf{A}^{T}\Big(\mathbf{y}-\sum_{j=1}^{N}(\mathbf{z}^{j})^{k-1}\Big)=\mathbf{x}^{k}+2\mu\mathbf{A}^{T}(\mathbf{y}-\mathbf{A}\mathbf{x}^{k-1})$$

$$\begin{bmatrix}\mathbf{x}^{k}\\ \mathbf{x}^{k+1}\end{bmatrix}=\mathbf{M}\begin{bmatrix}\mathbf{x}^{k-1}\\ \mathbf{x}^{k}\end{bmatrix}+\begin{bmatrix}\mathbf{0}\\ 2\mu\mathbf{A}^{T}\mathbf{y}\end{bmatrix}.$$

$$\det\left(\begin{bmatrix}-v\mathbf{I}&\mathbf{I}\\ -2\mu\mathbf{A}^{T}\mathbf{A}&\mathbf{I}-v\mathbf{I}\end{bmatrix}\right)=\det\Big(\mathbf{A}^{T}\mathbf{A}-\frac{v-v^{2}}{2\mu}\mathbf{I}\Big)=0$$

$$u=\frac{v-v^{2}}{2\mu},$$

$$v_{1}=\frac{1+\sqrt{1-8\mu u}}{2},\quad v_{2}=\frac{1-\sqrt{1-8\mu u}}{2}.$$

$$f(\mathbf{x}^{k+1})<f(\mathbf{x}^{k})+(\mathbf{x}^{k+1}-\mathbf{x}^{k})^{T}\nabla f(\mathbf{x}^{k-1})+\frac{1}{2\mu}\|\mathbf{x}^{k+1}-\mathbf{x}^{k}\|^{2},$$

$$\hat{\mathbf{x}}^{k+1}=\mathbf{x}^{k}-\mu\nabla f(\mathbf{x}^{k-1})$$

$$\mathbf{x}^{k+1}=\arg\min_{\mathbf{x}}\{2\mu g(\mathbf{x})+\|\mathbf{x}-\hat{\mathbf{x}}^{k+1}\|^{2}\}.$$

$$\mathbf{Q}(\mathbf{x},\mathbf{x}^{k},\mathbf{x}^{k-1})=f(\mathbf{x}^{k})+(\mathbf{x}-\mathbf{x}^{k})^{T}\nabla f(\mathbf{x}^{k-1})+\frac{1}{2\mu}\|\mathbf{x}-\mathbf{x}^{k}\|^{2}+g(\mathbf{x}),$$

$$\mathbf{Q}(\mathbf{x}^{k+1},\mathbf{x}^{k},\mathbf{x}^{k-1})<\mathbf{Q}(\mathbf{x}^{k},\mathbf{x}^{k},\mathbf{x}^{k-1})\equiv f(\mathbf{x}^{k})+g(\mathbf{x}^{k})$$

Taxonomy

Topics: Medical Imaging Techniques and Applications · Medical Image Segmentation Techniques · Sparse and Compressive Sensing Techniques

Full text

BSGD-TV: A parallel algorithm solving total variation constrained image reconstruction problems

Yushan Gao (1), Thomas Blumensath (2)

(1) University of Southampton, UK. (2) University of Southampton, UK.

Abstract

We propose a parallel reconstruction algorithm to solve large scale TV constrained linear inverse problems. We provide a convergence proof and show numerically that our method is significantly faster than the main competitor, block ADMM.

1 Introduction

Our algorithm is inspired by applications in computed tomography (CT), where the efficient inversion of large sparse linear systems is required [1]: $\mathbf{y}\approx\mathbf{A}\mathbf{x}_{true}$ , where $\mathbf{x}_{true}\in\mathbb{R}^{c}$ is the vectorised version of a 3D image that is to be reconstructed and $\mathbf{A}\in\mathbb{R}^{r\times c}$ is an X-ray projection model. $\mathbf{y}\in\mathbb{R}^{r}$ are the vectorised noisy projections. We are interested in minimizing $f(\mathbf{x})+g(\mathbf{x})$ , where $f(\mathbf{x})$ is quadratic and $g(\mathbf{x})$ is convex but non-smooth [2]. For example:

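$$\mathbf{x}^{\star}=\arg\min_{\mathbf{x}}\ \underbrace{(\mathbf{y}-\mathbf{A}\mathbf{x})^{T}(\mathbf{y}-\mathbf{A}\mathbf{x})}_{f(\mathbf{x})}+\underbrace{2\lambda\,\text{TV}(\mathbf{x})}_{g(\mathbf{x})},\tag{1}$$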

where $\lambda$ is a relaxation parameter and $\text{TV}(\mathbf{x})$ is the total variation (TV) of the image $\mathbf{x}$ . For 2D images, it is defined as:

$$\text{TV}(\mathbf{x})=\sum_{s,t}\sqrt{(x_{s,t}-x_{s-1,t})^{2}+(x_{s,t}-x_{s,t-1})^{2}},\tag{2}$$

where $x_{s,t}$ is the intensity of the image pixel in row $s$ and column $t$ .
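As a minimal sketch of the 2D total variation, the snippet below assumes the isotropic form (the square root of the summed squared backward differences, summed over all pixels). The function name `total_variation` and the boundary handling (differences that would reach outside the image are set to zero) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def total_variation(img):
    """Isotropic TV of a 2D array: sum over pixels of the local gradient magnitude."""
    img = np.asarray(img, dtype=float)
    # Backward differences along rows (index s) and columns (index t).
    # Assumption: differences that would reach outside the image are zero.
    d_row = np.zeros_like(img)
    d_col = np.zeros_like(img)
    d_row[1:, :] = img[1:, :] - img[:-1, :]
    d_col[:, 1:] = img[:, 1:] - img[:, :-1]
    return float(np.sum(np.sqrt(d_row**2 + d_col**2)))

# A constant image has no variation; a vertical unit edge crossing
# four rows contributes one unit of variation per row.
flat = np.ones((4, 4))
step = np.zeros((4, 4))
step[:, 2:] = 1.0
tv_flat = total_variation(flat)
tv_step = total_variation(step)
```

The square root is what makes this penalty non-smooth at pixels where both differences vanish, which is why a proximal step rather than a plain gradient step is used for $g(\mathbf{x})$.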

We recently introduced a parallel reconstruction algorithm called coordinate-reduced stochastic gradient descent (CSGD) to minimize the quadratic objective function $f(\mathbf{x})$ [3]. Here we introduce a slight modification that simplifies the step length calculation, and show that the modified version converges to the least squares solution of $f(\mathbf{x})$ . We call this modified algorithm block stochastic gradient descent (BSGD). We combine BSGD with an iterative shrinkage/thresholding (ISTA-type) step [4] to solve Eq.1. The new algorithm, called BSGD-TV, is compared with block ADMM-TV [5], an algorithm sharing the same parallel architecture and the same communication cost. Simulation results show that BSGD-TV is significantly faster, as it requires significantly fewer matrix-vector products than block ADMM-TV.

2 BSGD-TV Algorithm

2.1 Algorithm description

BSGD works on blocks of $\mathbf{x}$ and $\mathbf{y}$ . We assume that $\mathbf{A}$ is divided into $M$ row blocks and $N$ column blocks. Let $\{\mathbf{x}_{J_{j}}\}_{j=1}^{N}$ and $\{\mathbf{y}_{I_{i}}\}_{i=1}^{M}$ be sub-vectors of $\mathbf{x}$ and $\mathbf{y}$ and let $\mathbf{A}_{I_{i}}^{J_{j}}$ be the associated block of matrix $\mathbf{A}$ so that $\mathbf{y}_{I_{i}}\approx\sum_{j=1}^{N}\mathbf{A}_{I_{i}}^{J_{j}}\mathbf{x}_{J_{j}}$ . Our algorithm splits the optimization into blocks, so that each parallel process only computes using a single block $\mathbf{x}_{J_{j}}$ and $\mathbf{y}_{I_{i}}$ for some $J_{j}\in\{J_{j}\}_{j=1}^{N}$ and $I_{i}\in\{I_{i}\}_{i=1}^{M}$ . Each process also requires an estimate of the current residual $\mathbf{r}_{I_{i}}$ and computes a vector $\mathbf{z}^{j}_{I_{i}}$ , both of which are of the same size as $\mathbf{y}_{I_{i}}$ . The main steps (ignoring initialisation) are described in Algo.1.
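The block notation can be illustrated with a small sketch. It assumes contiguous, nearly equal index sets (via `np.array_split`) and interprets $\mathbf{z}^{j}_{I_{i}}$ as the partial product $\mathbf{A}_{I_{i}}^{J_{j}}\mathbf{x}_{J_{j}}$; both choices, and all variable names, are assumptions made for illustration rather than details from Algo.1.

```python
import numpy as np

rng = np.random.default_rng(0)
r, c, M, N = 12, 8, 3, 2            # small hypothetical problem size
A = rng.standard_normal((r, c))
x = rng.standard_normal(c)
y = A @ x                           # noiseless data, so y = A x exactly

# Index sets I_1..I_M (row blocks) and J_1..J_N (column blocks).
row_blocks = np.array_split(np.arange(r), M)
col_blocks = np.array_split(np.arange(c), N)

def block_residual(i):
    """Residual r_{I_i} = y_{I_i} - sum_j A_{I_i}^{J_j} x_{J_j}."""
    I = row_blocks[i]
    # Each term only touches one block of A and one block of x,
    # so the terms could be computed by separate processes.
    z = [A[np.ix_(I, J)] @ x[J] for J in col_blocks]
    return y[I] - np.sum(z, axis=0)

# With noiseless data every block residual vanishes up to rounding.
max_res = max(np.abs(block_residual(i)).max() for i in range(M))
```

Each partial product has the same length as $\mathbf{y}_{I_{i}}$, which matches the statement that $\mathbf{r}_{I_{i}}$ and $\mathbf{z}^{j}_{I_{i}}$ share that size.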

To solve line 9 of Algo.1 efficiently, we adopt the method proposed in [6].

2.2 BSGD Convergence

BSGD without the proximal operator ( $\lambda$ =0 in Eq.1), and with parallelization over all subsets can be shown to converge to the least squares solution. To see this, we write the update of $\mathbf{x}$ as

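$$\mathbf{x}^{k+1}=\mathbf{x}^{k}+2\mu\mathbf{A}^{T}\Big(\mathbf{y}-\sum_{j=1}^{N}(\mathbf{z}^{j})^{k-1}\Big)=\mathbf{x}^{k}+2\mu\mathbf{A}^{T}(\mathbf{y}-\mathbf{A}\mathbf{x}^{k-1}).\tag{3}$$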

In this form, BSGD is similar to gradient descent but uses an old gradient. Assume that there is a fixed point $\mathbf{x}^{\star}$ defined by $\mathbf{x}^{\star}=\mathbf{x}^{\star}+2\mu\mathbf{A}^{T}(\mathbf{y}-\mathbf{A}\mathbf{x}^{\star})$ . Note that the fixed point condition implies that, if $\mathbf{A}$ is full column rank, then $\mathbf{x}^{\star}=(\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{y}$ . Thus the fixed point is the least squares solution. Theorem 2.1 states the conditions on parameter $\mu$ for convergence when all subsets $\{I_{i}\}_{i=1}^{M}$ and $\{J_{j}\}_{j=1}^{N}$ are selected within one epoch.
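The "old gradient" behaviour can be sketched for the fully parallel, unregularised case, where the update reads $\mathbf{x}^{k+1}=\mathbf{x}^{k}+2\mu\mathbf{A}^{T}(\mathbf{y}-\mathbf{A}\mathbf{x}^{k-1})$. The function name, the zero initialisation of both $\mathbf{x}^{0}$ and $\mathbf{x}^{-1}$, the test problem and the step length are assumptions chosen for illustration.

```python
import numpy as np

def delayed_gradient_ls(A, y, mu, n_iter):
    """Iterate x^{k+1} = x^k + 2*mu*A^T (y - A x^{k-1}) from zero vectors."""
    x_prev = np.zeros(A.shape[1])   # plays the role of x^{k-1}
    x_curr = np.zeros(A.shape[1])   # plays the role of x^k
    for _ in range(n_iter):
        # The residual is evaluated at the previous iterate, not the current one.
        x_next = x_curr + 2.0 * mu * A.T @ (y - A @ x_prev)
        x_prev, x_curr = x_curr, x_next
    return x_curr

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 5))    # tall random matrix: full column rank
y = rng.standard_normal(30)

u_max = np.linalg.eigvalsh(A.T @ A).max()
mu = 0.25 / u_max                   # a step length well below 1 / (2 u_max)

x_hat = delayed_gradient_ls(A, y, mu, 5000)
x_ls = np.linalg.lstsq(A, y, rcond=None)[0]
err = np.linalg.norm(x_hat - x_ls)
```

On this small example the iterate agrees with the least squares solution returned by `np.linalg.lstsq`, consistent with the fixed point argument.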

Theorem 2.1.

If $\mu\in(0,\frac{1}{2u_{max}})$ , where $u_{max}$ is the maximum eigenvalue of $\mathbf{A}^{T}\mathbf{A}$ , and $\mathbf{A}$ has full column rank, then BSGD without the TV operator ( $\lambda$ =0 in Eq.1), and with parallelization over all subsets, converges to the least squares solution $\mathbf{x}^{\star}$ .

Proof of Theorem 2.1.

The iteration in Eq.3 can be written as

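$$\begin{bmatrix}\mathbf{x}^{k}\\ \mathbf{x}^{k+1}\end{bmatrix}=\underbrace{\begin{bmatrix}\mathbf{0}&\mathbf{I}\\ -2\mu\mathbf{A}^{T}\mathbf{A}&\mathbf{I}\end{bmatrix}}_{\mathbf{M}}\begin{bmatrix}\mathbf{x}^{k-1}\\ \mathbf{x}^{k}\end{bmatrix}+\begin{bmatrix}\mathbf{0}\\ 2\mu\mathbf{A}^{T}\mathbf{y}\end{bmatrix}.\tag{4}$$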

Standard convergence results for iterative methods of this type with fixed $\mathbf{M}$ require the spectral radius of $\mathbf{M}$ to be less than 1 [7]. Let $v$ be any (possibly complex valued) eigenvalue of $\mathbf{M}$ , i.e. $v$ satisfies $\det(\mathbf{M}-v\mathbf{I})=0$ . It is straightforward to obtain:

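$$\det\left(\begin{bmatrix}-v\mathbf{I}&\mathbf{I}\\ -2\mu\mathbf{A}^{T}\mathbf{A}&\mathbf{I}-v\mathbf{I}\end{bmatrix}\right)=\det\Big(\mathbf{A}^{T}\mathbf{A}-\frac{v-v^{2}}{2\mu}\mathbf{I}\Big)=0.\tag{5}$$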

By Eq.5, we see that eigenvalues $u$ of $\mathbf{A}^{T}\mathbf{A}$ correspond to

$$u=\frac{v-v^{2}}{2\mu}.\tag{6}$$

Eigenvalues of $\mathbf{M}$ are then given by

$$v_{1}=\frac{1+\sqrt{1-8\mu u}}{2},\quad v_{2}=\frac{1-\sqrt{1-8\mu u}}{2}.\tag{7}$$

As the spectral radius of $\mathbf{M}$ is the largest magnitude among its eigenvalues, we require $|v_{1}|<1$ and $|v_{2}|<1$ for every eigenvalue $u$ to ensure the convergence of the algorithm. Since $\mathbf{A}$ has full column rank, $\mathbf{A}^{T}\mathbf{A}$ is a positive definite matrix and thus has only positive, real valued eigenvalues $u$ . Thus $v_{1}$ and $v_{2}$ are real valued if $0<\mu\leq\frac{1}{8u}$ and complex valued if $\mu>\frac{1}{8u}$ . In the real case, $0\leq\sqrt{1-8\mu u}<1$ , so $0<v_{2}\leq v_{1}<1$ . In the complex case, $|v_{1}|^{2}=|v_{2}|^{2}=2\mu u$ , so $|v_{1}|<1$ and $|v_{2}|<1$ if and only if $\mu<\frac{1}{2u}$ . Requiring this for all eigenvalues $u$ gives the acceptable range $(0,\frac{1}{2u_{max}})$ for $\mu$ . ∎
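The step length bound can be checked numerically on a small example. The sketch below builds the $2c\times 2c$ iteration matrix $\mathbf{M}=\begin{bmatrix}\mathbf{0}&\mathbf{I}\\ -2\mu\mathbf{A}^{T}\mathbf{A}&\mathbf{I}\end{bmatrix}$ implied by the delayed update, then compares its spectral radius with the closed-form eigenvalues $v=\frac{1\pm\sqrt{1-8\mu u}}{2}$. The helper names, the random test matrix and the two trial step lengths are assumptions made for illustration.

```python
import numpy as np

def iteration_matrix(A, mu):
    """Block matrix M mapping (x^{k-1}, x^k) to (x^k, x^{k+1})."""
    c = A.shape[1]
    I = np.eye(c)
    return np.block([[np.zeros((c, c)), I],
                     [-2.0 * mu * A.T @ A, I]])

def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return np.abs(np.linalg.eigvals(M)).max()

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 4))    # tall random matrix: full column rank
u = np.linalg.eigvalsh(A.T @ A)     # real, positive eigenvalues of A^T A
u_max = u.max()

# One step length just inside the range (0, 1/(2 u_max)) and one just outside.
rho_inside = spectral_radius(iteration_matrix(A, 0.9 / (2 * u_max)))
rho_outside = spectral_radius(iteration_matrix(A, 1.1 / (2 * u_max)))

# Closed-form eigenvalues for the inside step length.
mu = 0.9 / (2 * u_max)
disc = np.sqrt((1 - 8 * mu * u).astype(complex))
v_formula = np.concatenate([(1 + disc) / 2, (1 - disc) / 2])
rho_formula = np.abs(v_formula).max()
```

For the inside step length the spectral radius stays below 1 and matches the closed-form value, while the outside step length pushes it above 1, in line with the theorem.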

Theorem 2.2 gives a general convergence condition when applying BSGD-TV to solve Eq.1.

Theorem 2.2.

If the constant step length $\mu$ satisfies

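$$f(\mathbf{x}^{k+1})<f(\mathbf{x}^{k})+(\mathbf{x}^{k+1}-\mathbf{x}^{k})^{T}\nabla f(\mathbf{x}^{k-1})+\frac{1}{2\mu}\|\mathbf{x}^{k+1}-\mathbf{x}^{k}\|^{2},\tag{8}$$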

where $f(\mathbf{x})$ is defined in Eq.1, then BSGD-TV converges to the optimal solution of Eq.1.

Proof of Theorem 2.2.

With parallelization over all subsets, BSGD-TV computes

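$$\hat{\mathbf{x}}^{k+1}=\mathbf{x}^{k}-\mu\nabla f(\mathbf{x}^{k-1}),\qquad\mathbf{x}^{k+1}=\arg\min_{\mathbf{x}}\{2\mu g(\mathbf{x})+\|\mathbf{x}-\hat{\mathbf{x}}^{k+1}\|^{2}\}.\tag{9}$$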

We define a function $\mathbf{Q}$ as

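$$\mathbf{Q}(\mathbf{x},\mathbf{x}^{k},\mathbf{x}^{k-1})=f(\mathbf{x}^{k})+(\mathbf{x}-\mathbf{x}^{k})^{T}\nabla f(\mathbf{x}^{k-1})+\frac{1}{2\mu}\|\mathbf{x}-\mathbf{x}^{k}\|^{2}+g(\mathbf{x}),\tag{10}$$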

where $\|\cdot\|^{2}$ is the squared $\ell_{2}$ norm. The fact that $\arg\min_{\mathbf{x}}\{\mathbf{Q}(\mathbf{x},\mathbf{x}^{k},\mathbf{x}^{k-1})\}\equiv\mathbf{x}^{k+1}$ means that:

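$$\mathbf{Q}(\mathbf{x}^{k+1},\mathbf{x}^{k},\mathbf{x}^{k-1})<\mathbf{Q}(\mathbf{x}^{k},\mathbf{x}^{k},\mathbf{x}^{k-1})\equiv f(\mathbf{x}^{k})+g(\mathbf{x}^{k}).\tag{11}$$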

Finally, the definition of $\mathbf{Q}$ and the requirement on the step length $\mu$ in Eq.8 mean that $f(\mathbf{x}^{k+1})+g(\mathbf{x}^{k+1})<\mathbf{Q}(\mathbf{x}^{k+1},\mathbf{x}^{k},\mathbf{x}^{k-1})<f(\mathbf{x}^{k})+g(\mathbf{x}^{k})$ holds, so that BSGD-TV converges to the optimal solution of Eq.1. ∎
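The claim that the minimizer of $\mathbf{Q}$ coincides with the BSGD-TV update can be made explicit by completing the square. The derivation below is a sketch, not part of the original proof. It assumes $\mathbf{Q}(\mathbf{x},\mathbf{x}^{k},\mathbf{x}^{k-1})=f(\mathbf{x}^{k})+(\mathbf{x}-\mathbf{x}^{k})^{T}\nabla f(\mathbf{x}^{k-1})+\frac{1}{2\mu}\|\mathbf{x}-\mathbf{x}^{k}\|^{2}+g(\mathbf{x})$ , the form implied by the step length condition, and writes $\hat{\mathbf{x}}^{k+1}=\mathbf{x}^{k}-\mu\nabla f(\mathbf{x}^{k-1})$ .

```latex
\begin{aligned}
\mathbf{Q}(\mathbf{x},\mathbf{x}^{k},\mathbf{x}^{k-1})
 &= f(\mathbf{x}^{k}) + (\mathbf{x}-\mathbf{x}^{k})^{T}\nabla f(\mathbf{x}^{k-1})
    + \frac{1}{2\mu}\|\mathbf{x}-\mathbf{x}^{k}\|^{2} + g(\mathbf{x}) \\
 &= \frac{1}{2\mu}\big\|\mathbf{x}-\mathbf{x}^{k}+\mu\nabla f(\mathbf{x}^{k-1})\big\|^{2}
    + g(\mathbf{x}) + f(\mathbf{x}^{k}) - \frac{\mu}{2}\|\nabla f(\mathbf{x}^{k-1})\|^{2} \\
 &= \frac{1}{2\mu}\Big(\|\mathbf{x}-\hat{\mathbf{x}}^{k+1}\|^{2} + 2\mu g(\mathbf{x})\Big)
    + \text{terms independent of } \mathbf{x}.
\end{aligned}
```

Since $\frac{1}{2\mu}>0$ and the remaining terms do not depend on $\mathbf{x}$ , minimizing $\mathbf{Q}$ over $\mathbf{x}$ is the same as minimizing $2\mu g(\mathbf{x})+\|\mathbf{x}-\hat{\mathbf{x}}^{k+1}\|^{2}$ , which is the proximal step. Evaluating $\mathbf{Q}$ at $\mathbf{x}=\mathbf{x}^{k+1}$ and adding $g(\mathbf{x}^{k+1})$ to both sides of the step length condition gives the left-hand inequality of the chain above.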

3 Simulations

We show experimentally that the method also converges when only fractions $\alpha$ and $\gamma$ of the subsets $\{\mathbf{x}_{J_{j}}\}_{j=1}^{N}$ and $\{\mathbf{y}_{I_{i}}\}_{i=1}^{M}$ are randomly selected to calculate $\mathbf{g}$ at each iteration. The simulation geometry is shown in Fig.1.

We add Gaussian noise to the projections so that the SNR of $\mathbf{y}$ is 17.7 dB. We define the relative error as $\frac{\|\mathbf{x}_{dif}\|}{\|\mathbf{x}_{true}\|}$ , where $\|\mathbf{x}_{dif}\|$ is the $\ell_{2}$ norm of the difference between the reconstructed image vector and the original vector $\mathbf{x}_{true}$ . Convergence is shown in Fig.2a. We plot relative error against epochs, where an epoch is a normalised iteration count that corrects for the fact that the stochastic version of our algorithm only updates a subset of elements at each iteration.
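The noise level and the error metric can be sketched as follows. The SNR definition used here, $10\log_{10}(\|\mathbf{y}\|^{2}/\|\mathbf{n}\|^{2})$ , and the choice to rescale the noise so that the target is met exactly are assumptions, since the text does not state how the 17.7 dB level was produced. The function names are hypothetical.

```python
import numpy as np

def add_noise_at_snr(y, snr_db, rng):
    """Add white Gaussian noise scaled so that 20*log10(||y|| / ||noise||) = snr_db."""
    noise = rng.standard_normal(y.shape)
    # Assumed SNR definition: ratio of signal energy to noise energy, in dB.
    target_norm = np.linalg.norm(y) / (10.0 ** (snr_db / 20.0))
    noise *= target_norm / np.linalg.norm(noise)
    return y + noise

def relative_error(x_rec, x_true):
    """l2 norm of the reconstruction error divided by the l2 norm of the truth."""
    return np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)

rng = np.random.default_rng(3)
y_clean = rng.standard_normal(1000)          # stand-in for noiseless projections
y_noisy = add_noise_at_snr(y_clean, 17.7, rng)

# Recompute the SNR from the data to confirm the scaling.
achieved_db = 20.0 * np.log10(
    np.linalg.norm(y_clean) / np.linalg.norm(y_noisy - y_clean)
)
```

A relative error of 0 corresponds to a perfect reconstruction, and a value of 1 is what the all-zero image would score.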

BSGD-TV and ADMM-TV are faster than ISTA in terms of epochs. However, ADMM-TV is significantly slower than BSGD-TV, because ADMM-TV requires matrix inversions at each iteration, while BSGD does not. Even when ADMM-TV is implemented using as few conjugate gradient iterations per step as possible, as shown in Fig.2b, BSGD-TV is more computationally efficient in terms of the number of required matrix-vector multiplications [3]. Compared to ISTA and GD, our block method allows these computations to be fully parallelised, which would enable large-scale CT reconstruction even when the compute nodes have limited storage capacity.

4 Conclusion

BSGD-TV is a parallel algorithm for large scale TV constrained CT reconstruction. It is similar to the popular ISTA algorithm but is specially designed for optimisation in distributed networks. The advantage is that individual compute nodes only operate on subsets of $\mathbf{y}$ and $\mathbf{x}$ , which means they can operate with less internal memory. The method converges significantly faster than block-ADMM methods.

Bibliography

[1] X. Guo, “Convergence studies on block iterative algorithms for image reconstruction”, Applied Mathematics and Computation, 273:525–534, 2016.
[2] E. Sidky and X. Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization”, Physics in Medicine & Biology, 53(17):4777–4807, 2008.
[3] Y. Gao and T. Blumensath, “A Joint Row and Column Action Method for Cone-Beam Computed Tomography”, IEEE Transactions on Computational Imaging, 2018.
[4] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems”, SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[5] N. Parikh and S. Boyd, “Block splitting for distributed optimization”, Mathematical Programming Computation, 6(1):77–102, 2014.
[6] A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems”, IEEE Transactions on Image Processing, 18(11):2419–2434, 2009.
[7] Y. Saad, “Iterative methods for sparse linear systems”, SIAM, 82, 2003.
