BSGD-TV: A parallel algorithm solving total variation constrained image reconstruction problems
Yushan Gao, Thomas Blumensath

TL;DR
This paper introduces a parallel algorithm for large-scale total variation constrained image reconstruction, demonstrating faster convergence than existing methods through theoretical proof and numerical experiments.
Contribution
The paper presents a novel parallel algorithm for TV constrained image reconstruction with proven convergence and improved speed over block ADMM.
Findings
Faster convergence compared to block ADMM
Theoretical proof of convergence
Effective for large-scale problems
Abstract
We propose a parallel reconstruction algorithm to solve large scale TV constrained linear inverse problems. We provide a convergence proof and show numerically that our method is significantly faster than the main competitor, block ADMM.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMedical Imaging Techniques and Applications · Medical Image Segmentation Techniques · Sparse and Compressive Sensing Techniques
BSGD-TV: A parallel algorithm solving total variation constrained image reconstruction problems
Yushan Gao1, Thomas Blumensath2.
1 University of Southampton UK. 2University of Southampton UK.
Abstract
We propose a parallel reconstruction algorithm to solve large scale TV constrained linear inverse problems. We provide a convergence proof and show numerically that our method is significantly faster than the main competitor, block ADMM.
1 Introduction
Our algorithm is inspired by applications in computed tomography (CT), where the efficient inversion of large sparse linear systems is required [1]: , where is the vectorised version of a 3D image that is to be reconstructed and is an X-ray projection model. are the vectorised noisy projections. We are interested in minimizing , where is quadratic and is convex but non-smooth [2]. For example:
[TABLE]
where is a relaxation parameter and is the total variation (TV) of the image . For 2D images, it is defined as:
[TABLE]
where is the intensity of image pixel in row and column .
We recently introduced a parallel reconstruction algorithm called coordinate-reduced stochastic gradient descent (CSGD) to minimize quadratic objective function [3]. We here introduce a slight modification by simplifying the step length calculation and show that the modified version converges to the least squares solution of . We will call this modified algorithm block stochastic gradient descend (BSGD). We combine BSGD with an iterative shrinkage/thresholding (ISTA-type) step [4] to solve Eq.1. The new algorithm, called BSGD-TV, is compared with block ADMM-TV [5], an algorithm sharing the same parallel architecture and the same communication cost. Simulation results show that BSGD-TV is significantly faster as it requires significantly fewer matrix vector products compared to block ADMM-TV.
2 BSGD-TV Algorithm
2.1 Algorithm description
BSGD works on blocks of and . We assume that is divided into row blocks and column blocks. Let and be sub-vectors of and and let be the associated block of matrix so that . Our algorithm splits the optimization into blocks, so that each parallel process only computes using a single block and for some and . Each process also requires an estimate of the current residual and computes a vector , both of which are of the same size as . The main steps (ignoring initialisation) are described in Algo.1.
To effectively solve line 9, we here adopt method proposed in [6].
2.2 BSGD Convergence
BSGD without the proximal operator (=0 in Eq.1), and with parallelization over all subsets can be shown to converge to the least squares solution. To see this, we write the update of as
[TABLE]
In this form, BSGD is similar to gradient descent but uses an old gradient. Assume that there is a fixed point defined by . Note that the fixed point condition implies that, if is full column rank, then . Thus the fixed point is the least squares solution. Theorem 2.1 states the conditions on parameter for convergence when all subsets and are selected within one epoch.
Theorem 2.1**.**
If , where is the maximum eigenvalue of and assume is full column rank, then BSGD without the TV operator (=0 in Eq.1), and with parallelization over all subsets converges to the least squares solution .
Proof of Theorem 2.1.
The iteration in Eq.3 can be written as
[TABLE]
Standard convergence results for iterative method of this type with fixed require the spectral radius of to be less than 1 [7]. Let be any (possibly complex valued) eigenvalue of , i.e. satisfies det. It is straightforward to obtain:
[TABLE]
By Eq.5, we see that eigenvalues of correspond to
[TABLE]
Eigenvalues of are then given by
[TABLE]
As the spectral radius of corresponds to the largest magnitude of the eigenvalues of , we require and to ensure the convergence of the algorithm. is a positive definite matrix and thus has only positive, real valued eigenvalues . Thus and are real valued if and complex valued if is . In the complex case, it is easy to see that and if , implying that the acceptable range of is . ∎
Theorem 2.2 gives a general convergence condition when applying BSGD-TV to solve Eq.1.
Theorem 2.2**.**
If the constant step length satisfies
[TABLE]
where is defined in Eq.1, then BSGD-TV converges to the optimal solution of Eq.1.
Proof of Theorem 2.2.
With parallelization over all subsets, BSGD-TV computes
[TABLE]
We define a function as
[TABLE]
where is the squared norm. The fact that means that:
[TABLE]
Finally, the definition of and the requirement on the step length in Eq.8, mean that holds, so that BSGD-TV converges to the fixed point of Eq.1. ∎
3 Simulations
We show experimentally that the method also converges when only a fraction and of subsets of and are randomly selected to calculate the at each iteration. The simulation geometry is shown in Fig.1.
We add Gaussian noise to the projections so that the SNR of is 17.7 dB. We define the relative error as , where is the norm of the difference between reconstructed image vector and the original vector . Convergence is shown in Fig.2a. We plot relative error against epochs, where an epoch is a normalised iteration count that corrects for the fact that the stochastic version of our algorithm only updates a subset of elements at each iteration.
BSGD-TV and ADMM-TV are faster than ISTA in terms of epochs. However, ADMM-TV is significantly slower than BSGD-TV, because ADMM-TV requires matrix inversions at each iteration, while BSGD does not. Even when implementing ADMM-TV using as few conjugate gradient iterations per step as possible, as shown in Fig.2b, BSGD-TV is more computationally efficient in terms of the number of required matrix vector multiplications [3]. Compared to ISTA and GD, our block method allows these computations to be fully parallelised which would enable to reconstruct large scale CT reconstructions while the computation node have limited storage capacity.
4 Conclusion
BSGD-TV is a parallel algorithm for large scale TV constrained CT reconstruction. It is similar to the popular ISTA algorithm but is specially designed for optimisation in distributed networks. The advantage is that individual compute nodes only operate on subsets of and , which means they can operate with less internal memory. The method converges significantly faster than block-ADMM methods.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] X. Guo, “Convergence studies on block iterative algorithms for image reconstruction”, Applied Mathematics and Computation, 273 : 525–534, 2016.
- 2[2] E. Sidky and X.Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization”, Physics in Medicine & \& Biology, 53 (17):4777–4807, 2008.
- 3[3] Y. Gao and T.Blumensath, “A Joint Row and Column Action Method for Cone-Beam Computed Tomography”, IEEE Transactions on Computational Imaging, 2018.
- 4[4] A. Beck and M.Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems”, SIAM journal on imaging sciences, 2 (1):183–202, 2009.
- 5[5] N. Parikh and S. Boyd, “Block splitting for distributed optimization”, Mathematical Programming Computation, 6 (1):77–102, 2014.
- 6[6] A. Beck and M.Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems”, IEEE Transactions on Image Processing, 18 (11):2419–2434, 2009.
- 7[7] Y. Saad, “Iterative methods for sparse linear systems”, siam, 82 , 2003.
