Fixing Nonconvergence of Algebraic Iterative Reconstruction with an Unmatched Backprojector

Yiqiu Dong; Per Christian Hansen; Michiel E. Hochstenbach; Nicolai André Brogaard Riis

arXiv:1902.04282·math.NA·February 14, 2019·SIAM J. Sci. Comput.


TL;DR

This paper addresses convergence issues in algebraic iterative reconstruction methods with unmatched projector/backprojector pairs by proposing a shifted algorithm that guarantees convergence under certain conditions, with practical implementation guidance.

Contribution

It introduces a shifted algorithm for unmatched algebraic reconstruction that ensures convergence and provides eigenvalue estimation techniques for parameter selection.

Findings

01

The shifted algorithm guarantees convergence under specific conditions.

02

Perturbation bounds for the fixed point are established.

03

Numerical tests demonstrate improved convergence in computed tomography.

Abstract

We consider algebraic iterative reconstruction methods with applications in image reconstruction. In particular, we are concerned with methods based on an unmatched projector/backprojector pair; i.e., the backprojector is not the exact adjoint or transpose of the forward projector. Such situations are common in large-scale computed tomography, and we consider the common situation where the method does not converge due to the nonsymmetry of the iteration matrix. We propose a modified algorithm that incorporates a small shift parameter, and we give the conditions that guarantee convergence of this method to a fixed point of a slightly perturbed problem. We also give perturbation bounds for this fixed point. Moreover, we discuss how to use Krylov subspace methods to efficiently estimate the leftmost eigenvalue of a certain matrix to select a proper shift parameter. The modified algorithm…

Tables

Table 1: Estimation of the leftmost eigenvalue $\lambda_{\mathrm{lm}}$ of $BA$ with the methods discussed in Section 4. We use the ASTRA test problem mentioned in the text. The methods stop when we reach convergence with absolute tolerance $10^{-2}$ or after the fixed number of iterations used in the field-of-values-based method (Algorithm 2), denoted fovN, for N = 10, 15, and 20 iterations, respectively. We show the mean number of matrix-vector multiplications (MVMs) as well as the mean and standard deviation of the estimated $\lambda_{\mathrm{lm}}$. The top half of the table shows results for the case when the matrix-vector multiplications are done by the ASTRA function opTomo, while the bottom half is for the case when we explicitly store the matrices. The difference is due to the difference in precision (single vs. double).

25 trials with opTomo that utilizes the GPU
Method	Mean MVM	Mean $\lambda_{\mathrm{lm}}$ (st. dev.)
fov10	1660	$-0.8921$ ($1.167 \cdot 10^{-1}$)
fov15	1960	$-0.9323$ ($7.478 \cdot 10^{-3}$)
fov20	1260	$-0.9354$ ($3.579 \cdot 10^{-3}$)
ks	1041	$-0.9281$ ($2.510 \cdot 10^{-5}$)
eigs	1278	$-0.9281$ ($4.103 \cdot 10^{-5}$)

25 trials with $A$ and $\widehat{A}^{\,T}$ explicitly stored
Method	Mean MVM	Mean $\lambda_{\mathrm{lm}}$ (st. dev.)
fov10	1660	$-0.8920$ ($1.171 \cdot 10^{-1}$)
fov15	1960	$-0.9322$ ($7.647 \cdot 10^{-3}$)
fov20	1260	$-0.9354$ ($3.581 \cdot 10^{-3}$)
ks	1039	$-0.9281$ ($2.334 \cdot 10^{-5}$)
eigs	1261	$-0.9281$ ($3.893 \cdot 10^{-5}$)
jd	1404	$-0.9281$ ($4.624 \cdot 10^{-5}$)

Equations

$$A\,\bar{x}=\bar{b},\qquad A\in\mathbb{R}^{m\times n},$$

$$\mathcal{X}=\mathcal{R}(X),\qquad \mathcal{Y}=\mathcal{R}(Y),\qquad \mathcal{Y}^{\perp}=\mathcal{R}(Y_{0}).$$

$$\forall\,x\in\mathcal{X}:\ P_{\mathcal{X},\mathcal{Y}}\,x=x,\qquad \forall\,y\in\mathcal{Y}:\ P_{\mathcal{X},\mathcal{Y}}\,y=0,\qquad \forall\,z\in\mathbb{R}^{m}:\ P_{\mathcal{X},\mathcal{Y}}\,z\in\mathcal{X},$$

$$P_{\mathcal{X},\mathcal{Y}}=X\bigl(Y_{0}^{T}X\bigr)^{\dagger}\,Y_{0}^{T},$$

$$X_{\mathcal{Y}}^{\dagger}=\begin{cases}X^{\dagger}\,P_{\mathcal{X},\mathcal{Y}}=\bigl(Y_{0}^{T}X\bigr)^{\dagger}\,Y_{0}^{T}, & m\geq n,\\[4pt] P_{\mathcal{Y},\mathcal{N}(X)}\,X^{\dagger}=Y\,(X\,Y)^{\dagger}, & m\leq n.\end{cases}$$

$$x^{k+1}=x^{k}+\omega\,A^{T}(b-A\,x^{k}),\qquad k=0,1,2,\dots,$$

$$x^{k+1}=x^{k}+\omega\,B\,(b-A\,x^{k}),\qquad k=0,1,2,\dots,$$

$$0<\omega<\frac{2\,\mathrm{Re}(\lambda_{j})}{|\lambda_{j}|^{2}}\quad\text{and}\quad\mathrm{Re}(\lambda_{j})>0,$$

$$B\,A\,B\,y=B\,b.$$

$$x^{*}=(B\,A)_{\mathcal{R}(B)}^{\dagger}\,B\,b=B\,(A\,B)_{\mathcal{N}(B)}^{\dagger}\,b=P_{\mathcal{R}(B),\,\mathcal{N}(BA)}\,A_{\mathcal{N}(B)}^{\dagger}\,b,$$

$$\bar{x}^{*}=P_{\mathcal{R}(BA),\,\mathcal{N}(BA)}\,\bar{x}.$$

$$B\,A=U\,\Sigma\,V^{T},\qquad \Sigma\in\mathbb{R}^{p\times p},\qquad U,V\in\mathbb{R}^{n\times p},$$

$$y=(V^{T}B)^{\dagger}\,\Sigma^{-1}U^{T}B\,b,$$

$$x^{*}=B\,(V^{T}B)^{\dagger}\,\Sigma^{-1}U^{T}B\,b=B\,(V^{T}B)^{\dagger}V^{T}\,V\,\Sigma^{-1}U^{T}B\,b=B\,(V^{T}B)^{\dagger}V^{T}\,(B\,A)^{\dagger}B\,b.$$

$$\bar{x}^{*}=B\,A\,(B\,A\,B\,A)^{\dagger}B\,A\,\bar{x}=B\,A\,\bigl(((B\,A)^{T})^{T}B\,A\bigr)^{\dagger}((B\,A)^{T})^{T}\bar{x},$$

$$\|x^{*}-\bar{x}^{*}\|\leq\|P_{\mathcal{R}(B),\,\mathcal{N}(BA)}\|\,\|A_{\mathcal{N}(B)}^{\dagger}\|\,\|e\|.$$

$$\|x^{*}-\bar{x}^{*}\|\leq\|A_{\mathcal{N}(B)}^{\dagger}\|\,\|e\|.$$

$$x^{k+1}=(1-\alpha\,\omega)\,x^{k}+\omega\,B\,(b-A\,x^{k}),\qquad k=0,1,2,\dots.$$

$$x_{\alpha}=\arg\min_{x}\bigl\{\|A\,x-b\|^{2}+\alpha\,\|x\|^{2}\bigr\}=(A^{T}A+\alpha\,I)^{-1}A^{T}b,$$

$$x^{k+1}=x^{k}-\omega\,\bigl(A^{T}(A\,x^{k}-b)+\alpha\,x^{k}\bigr)=(1-\alpha\,\omega)\,x^{k}+\omega\,A^{T}(b-A\,x^{k}).$$

$$\begin{bmatrix}A\\ \sqrt{\alpha}\,I\end{bmatrix},\qquad\begin{bmatrix}B & \sqrt{\alpha}\,I\end{bmatrix},\qquad\begin{bmatrix}b\\ 0\end{bmatrix}.$$

$$0<\omega<2\,\frac{\mathrm{Re}(\lambda_{j})+\alpha}{|\lambda_{j}|^{2}+\alpha\,(\alpha+2\,\mathrm{Re}(\lambda_{j}))}\quad\text{and}\quad\mathrm{Re}(\lambda_{j})+\alpha>0.$$

$$x^{k+1}=T\,x^{k}+\omega\,B\,b,$$

$$C\,x_{\alpha}^{*}=B\,b.$$

$$\begin{bmatrix}0&0\\ 0&I\end{bmatrix}\begin{bmatrix}I&-\omega\,N^{T}CR\\ 0&I-\omega\,R^{T}CR\end{bmatrix}=\begin{bmatrix}0&0\\ 0&I-\omega\,R^{T}CR\end{bmatrix}.$$

$$T\,x=x\ \Leftrightarrow\ C\,x=0\ \Leftrightarrow\ x\in\mathcal{N}(C)\ \Leftrightarrow\ -\alpha\ \text{is an eigenvalue of}\ BA.$$

$$x_{\alpha}^{*}=(1-\alpha\,\omega)\,x_{\alpha}^{*}+\omega\,B\,(b-A\,x_{\alpha}^{*})\quad\Longleftrightarrow\quad(B\,A+\alpha\,I)\,x_{\alpha}^{*}=B\,b$$

$$x_{\alpha}^{*}=(B\,A+\alpha\,I)^{-1}B\,b=B\,(A\,B+\alpha\,I)^{-1}b,\qquad x_{\alpha}^{*}\in\mathcal{R}(B).$$

$$\bar{x}_{\alpha}^{*}=(B\,A+\alpha\,I)^{-1}B\,A\,\bar{x},\qquad\bar{x}_{\alpha}^{*}\in\mathcal{R}(BA),\qquad\bar{x}-\bar{x}_{\alpha}^{*}=\alpha\,(B\,A+\alpha\,I)^{-1}\bar{x}.$$

$$B\,\bigl((A\,B+\alpha\,I)\,y-b\bigr)=0,$$


Taxonomy

Topics: Medical Imaging Techniques and Applications · Advanced MRI Techniques and Applications · Numerical methods in inverse problems

Full text

Funding: This work has been partially funded by Advanced grant 291405 from the European Research Council. The third author has been supported by an NWO Vidi research grant.

Yiqiu Dong, Department of Applied Mathematics and Computer Science, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark (http://www.compute.dtu.dk/~yido/)

Per Christian Hansen, same affiliation as the first author (http://www.compute.dtu.dk/~pcha/)

Michiel E. Hochstenbach, Department of Mathematics and Computer Science, TU Eindhoven, 5600 MB Eindhoven, The Netherlands (http://www.win.tue.nl/~hochsten/)

Nicolai André Brogaard Riis, same affiliation as the first author (http://www.compute.dtu.dk/~nabr/)

Abstract

We consider algebraic iterative reconstruction methods with applications in image reconstruction. In particular, we are concerned with methods based on an unmatched projector/backprojector pair; i.e., the backprojector is not the exact adjoint or transpose of the forward projector. Such situations are common in large-scale computed tomography, and we consider the common situation where the method does not converge due to the nonsymmetry of the iteration matrix. We propose a modified algorithm that incorporates a small shift parameter, and we give the conditions that guarantee convergence of this method to a fixed point of a slightly perturbed problem. We also give perturbation bounds for this fixed point. Moreover, we discuss how to use Krylov subspace methods to efficiently estimate the leftmost eigenvalue of a certain matrix to select a proper shift parameter. The modified algorithm is illustrated with test problems from computed tomography.

Keywords: Unmatched transpose, algebraic iterative reconstruction, perturbation theory, leftmost eigenvalue estimation, computed tomography

AMS subject classifications: 65F10, 65F15, 65F22, 15A18, 15A60

1 Introduction

Algebraic iterative reconstruction techniques [10] — such as the methods by Kaczmarz and Cimmino — play an important role in solving inverse problems. In particular, they are popular in computed tomography (CT) due to their great flexibility with respect to the measurement geometry of the X-ray scanner and their ability to handle very underdetermined problems. This is in contrast to filtered backprojection and similar algorithms [16] that rely on specific geometries and a large amount of data. Algebraic iterative methods are also used successfully in other image reconstruction problems such as image deblurring.

The fundamental mechanism behind these methods is known as semi-convergence [16]. During the initial iterations, the iterates approach the exact (and unattainable) solution to the noise-free problem, while in later stages the iterates converge to the undesired noisy solution. The methods produce filtered, or regularized, solutions and the number of iterations plays the role of a regularization parameter [7].

The algebraic iterative methods, in their basic form as well as their block versions, lend themselves very well to implementations that utilize GPUs and other hardware accelerators, and where the coefficient matrix is never stored; rather, the matrix-vector multiplications are computed on the fly. This has led to an implementation paradigm that is routinely used in software packages for computed tomography (such as ASTRA [22] and TIGRE [1]), namely, to focus on the computational speed of the matrix-vector multiplication. However, this introduces a convergence issue that has largely been ignored.

To set the stage, we formulate the noise-free problem as

$$A\,\bar{x}=\bar{b},\qquad A\in\mathbb{R}^{m\times n},$$

where the vectors $\bar{x}$ and $\bar{b}$ represent the exact image and the noise-free data, while $A$ represents the forward model — known as the (forward) projection in CT where both $m\geq n$ and $m<n$ are common. The multiplication with $A^{T}$ , the transpose of $A$ , is known in CT as the backprojection. These two operations form the computational core of any algebraic iterative method and therefore — to optimize for computational speed — the software developers often choose different discretization schemes, and different model approximations, for these operations [24]. Consequently, in such software the backprojection corresponds to multiplication with a matrix $\widehat{A}^{\,T}$ that is not the exact transpose of $A$ . We refer to this situation as having an unmatched projector/backprojector pair, and we call $\widehat{A}^{\,T}$ the unmatched transpose.

It appears that very little attention has been given to iterations based on such unmatched pairs; see [25] for an early reference and [2], [13] for two more recent ones. The latter paper has introduced a methodology for analyzing such iterations and formulated conditions for convergence, and it has been shown that an unmatched projector/backprojector pair deteriorates the best possible solution at the point of semi-convergence.

In the common situation of an unmatched projector/backprojector pair where the convergence criterion from [2] is not satisfied, the iterations will fail to converge for noise-free data (although some kind of semi-convergence may be observed experimentally for noisy data). In this paper we show that a small and cost-efficient modification of the basic algorithm can guarantee convergence to a solution of a slightly perturbed problem. This ensures that the fast implementations of the forward projections and backprojections can still be used without sacrificing convergence. Moreover, to provide a theoretical foundation we extend the convergence and perturbation analysis from [2] to the modified algorithm.

Our paper has been organized as follows. Section 2 sets the stage by summarizing convergence results for a generic iterative algorithm (proposed in [2]) that allows an unmatched transpose. Section 3 introduces the modified algorithm, gives the associated convergence conditions, and discusses the perturbation theory for the underlying problem. The modified algorithm is based on the introduction of a shift parameter, and in Section 4 we discuss how to efficiently estimate the leftmost eigenvalue of a certain matrix, which defines this shift. Finally, in Section 5 we give numerical examples that illustrate the new method for solving inverse problems, followed by some conclusions in Section 6.

We use the following notations: $\|\cdot\|$ denotes the vector and matrix 2-norm, $\mathcal{R}(A)$ and $\mathcal{N}(A)$ are the range and null space of $A$ , respectively, and we split a complex eigenvalue $\lambda_{j}$ into its real and imaginary parts denoted by $\mathrm{Re}(\lambda_{j})$ and $\mathrm{Im}(\lambda_{j})$ , respectively. For a vector $x$ , we use $x^{H}$ for its conjugate transpose.

In Section 2 we use notations and concepts associated with oblique projections and oblique pseudoinverses to obtain compact expressions that would otherwise be quite lengthy. We refer to [9] for details and geometric interpretations of these quantities in relation to inverse problems. Consider two complementary subspaces $\mathcal{X}$ and $\mathcal{Y}$ of $\mathbb{R}^{m}$ that intersect trivially, and matrices $X$, $Y$, and $Y_{0}$ such that

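$$\mathcal{X}=\mathcal{R}(X),\qquad \mathcal{Y}=\mathcal{R}(Y),\qquad \mathcal{Y}^{\perp}=\mathcal{R}(Y_{0}).$$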

Then the $m\times m$ matrix $P_{\mathcal{X},\mathcal{Y}}$ denotes the oblique projector onto $\mathcal{X}$ along $\mathcal{Y}$ which satisfies

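$$\forall\,x\in\mathcal{X}:\ P_{\mathcal{X},\mathcal{Y}}\,x=x,\qquad \forall\,y\in\mathcal{Y}:\ P_{\mathcal{X},\mathcal{Y}}\,y=0,\qquad \forall\,z\in\mathbb{R}^{m}:\ P_{\mathcal{X},\mathcal{Y}}\,z\in\mathcal{X},$$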

and the projection matrix can be written as

$$P_{\mathcal{X},\mathcal{Y}}=X\bigl(Y_{0}^{T}X\bigr)^{\dagger}\,Y_{0}^{T}, \tag{2}$$

where † denotes the Moore–Penrose pseudoinverse. Moreover, if $X\in\mathbb{R}^{m\times n}$ then the $n\times m$ matrix $X_{\mathcal{Y}}^{\dagger}$ denotes the oblique pseudoinverse of $X$ along $\mathcal{Y}$ , given by

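$$X_{\mathcal{Y}}^{\dagger}=\begin{cases}X^{\dagger}\,P_{\mathcal{X},\mathcal{Y}}=\bigl(Y_{0}^{T}X\bigr)^{\dagger}\,Y_{0}^{T}, & m\geq n,\\[4pt] P_{\mathcal{Y},\mathcal{N}(X)}\,X^{\dagger}=Y\,(X\,Y)^{\dagger}, & m\leq n.\end{cases} \tag{3}$$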

The case $m\geq n$ requires $\mathcal{R}(X)$ and $\mathcal{N}(Y)$ to be complementary, while the case $m\leq n$ requires $\mathcal{N}(X)$ and $\mathcal{R}(Y)$ to be complementary. If $\mathcal{Y}=\mathcal{X}^{\perp}$ then $P_{\mathcal{X},\mathcal{Y}}$ is the orthogonal projector on $\mathcal{X}$ and $X_{\mathcal{Y}}^{\dagger}$ is the ordinary pseudoinverse.
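The projector formula $P_{\mathcal{X},\mathcal{Y}}=X(Y_{0}^{T}X)^{\dagger}Y_{0}^{T}$ can be checked numerically on a tiny example. The following is a minimal NumPy sketch; the subspaces of $\mathbb{R}^{3}$ and the matrices `X`, `Y`, `Y0` are hypothetical choices made for illustration and do not come from the paper.

```python
import numpy as np

# Hypothetical example in R^3 (not from the paper):
# X-space = span{e1, e2}, Y-space = span{(1, 1, 1)}, which are complementary.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
Y = np.array([[1.0], [1.0], [1.0]])

# Columns of Y0 span the orthogonal complement of the Y-space.
Y0 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [-1.0, -1.0]])

# Oblique projector onto the X-space along the Y-space.
P = X @ np.linalg.pinv(Y0.T @ X) @ Y0.T

print("P =\n", P)
print("P X == X:", np.allclose(P @ X, X))
print("P Y == 0:", np.allclose(P @ Y, 0.0))
print("P is idempotent:", np.allclose(P @ P, P))
print("P is symmetric (orthogonal projector):", np.allclose(P, P.T))
```

Because the chosen $\mathcal{Y}$ is not orthogonal to $\mathcal{X}$, the resulting matrix is idempotent but not symmetric, which is what distinguishes an oblique projector from an orthogonal one.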

2 The BA Iteration

When we consider noisy data $b=\bar{b}+e$ , where the vector $e$ represents the perturbation, then it is common to compute a (weighted) least squares solution. In the simplest case with unit weights we can compute the solution by means of the Landweber iteration (or gradient descent method) with initial guess $x^{0}=0$ :

$$x^{k+1}=x^{k}+\omega\,A^{T}(b-A\,x^{k}),\qquad k=0,1,2,\dots, \tag{4}$$

where $\omega$ is a relaxation parameter satisfying $0<\omega<2\,/\,\|A^{T}A\|$ .

To analyze the behavior of this and similar algebraic iterative methods with an unmatched transpose, we follow [2] and consider the BA Iteration defined by

$$x^{k+1}=x^{k}+\omega\,B\,(b-A\,x^{k}),\qquad k=0,1,2,\dots, \tag{5}$$

where different choices of the $n\times m$ matrix $B$ give unmatched-transpose versions of known iterative methods. For example, $B$ can be an unmatched transpose $\widehat{A}^{\,T}$ for Landweber’s method, or $B$ can be an unmatched approximation to $A^{T}\mathrm{diag}(A\,A^{T})^{-1}$ for Cimmino’s method; see [10] for an overview of methods.

The convergence of the BA Iteration is governed by the (complex) eigenvalues $\lambda_{j}$ of the matrix $BA$ : (5) converges if and only if the relaxation parameter $\omega$ and all nonzero $\lambda_{j}$ satisfy

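$$0<\omega<\frac{2\,\mathrm{Re}(\lambda_{j})}{|\lambda_{j}|^{2}}\quad\text{and}\quad\mathrm{Re}(\lambda_{j})>0, \tag{6}$$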

see [2, Prop. 3.2] for details. Note that this specializes to the standard condition when $B=A^{T}$ .
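The convergence condition can be tested directly on small dense matrices by computing the eigenvalues of $BA$. The sketch below is only an illustration: the helper names `ba_converges` and `ba_iteration`, the matrix `A`, the perturbation `E`, and the data `b` are all assumptions introduced here, and forming $BA$ explicitly is feasible only for tiny problems.

```python
import numpy as np

def ba_converges(A, B, omega, tol=1e-12):
    """Check 0 < omega < 2 Re(lam)/|lam|^2 and Re(lam) > 0 for all nonzero eigenvalues of BA."""
    lam = np.linalg.eigvals(B @ A)
    lam = lam[np.abs(lam) > tol]          # only nonzero eigenvalues enter the condition
    if omega <= 0 or np.any(lam.real <= 0):
        return False
    return bool(np.all(omega < 2 * lam.real / np.abs(lam) ** 2))

def ba_iteration(A, B, b, omega, num_iters):
    """Run x^{k+1} = x^k + omega * B (b - A x^k) starting from x^0 = 0."""
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        x = x + omega * (B @ (b - A @ x))
    return x

# Illustrative data (assumed): B is A^T plus a small nonsymmetric perturbation.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
E = 0.05 * np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
B = A.T + E
b = np.array([1.0, 2.0, 3.0])

ok_small = ba_converges(A, B, omega=0.5)   # inside the admissible interval
ok_large = ba_converges(A, B, omega=0.7)   # outside the admissible interval
x = ba_iteration(A, B, b, omega=0.5, num_iters=200)

print("omega = 0.5 admissible:", ok_small)
print("omega = 0.7 admissible:", ok_large)
print("iterate after 200 steps:", x)
print("solution of BA x = B b: ", np.linalg.solve(B @ A, B @ b))
```

In this example both eigenvalues of $BA$ are real and positive, so the condition reduces to $\omega<2/\lambda_{\max}$, and the iterates approach the solution of the unmatched normal equations $BA\,x=B\,b$.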

We will now investigate when the BA Iteration (5) has a unique fixed point. From the definition of the BA Iteration (5) with $x^{0}=0$ it follows that any fixed point $x^{*}$ must satisfy $B\,A\,x^{*}=B\,b$ . Moreover, it is also clear from (5) that all iterates $x^{k}\in\mathcal{R}(B)$ ; in particular this holds for $x^{*}$ . Therefore, we can write any fixed point in the form $x^{*}=B\,y$ for some vector $y\in\mathbb{R}^{m}$ . (This vector $y$ may not be unique, as one can add an arbitrary component $z\in\mathcal{N}(B)$ , but this is irrelevant for what is to follow.) Inserting $x^{*}=B\,y$ we obtain an equation for $y$ :

$$B\,A\,B\,y=B\,b. \tag{7}$$

Here, $B\,A$ is an operator from $\mathcal{R}(B)$ to itself. Recall that $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times m}$ , where both $m\geq n$ and $m<n$ are possible in CT applications.

From (7) it follows there is a unique fixed point $x\in\mathcal{R}(B)$ if and only if one of the following eight equivalent conditions holds.

Proposition 2.1.

Consider the two matrices $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times m}$ with ranks $r_{A},r_{B}\leq\min\{m,n\}$ and with the singular value decompositions (SVDs) $A=U_{A}\Sigma_{A}V_{A}^{T}$ and $B^{T}=U_{B}\Sigma_{B}V_{B}^{T}$ . The following statements are equivalent:

(i) $BA:\mathcal{R}(B)\to\mathcal{R}(B)$ is nonsingular (meaning that $BA\,z=0$ and $z\in\mathcal{R}(B)$ imply that $z=0$);

(ii) For every $b\in\mathbb{R}^{m}$, the equation $BAx=Bb$ has a unique solution $x\in\mathcal{R}(B)$;

(iii) $\mathcal{R}(B)\cap\mathcal{N}(BA)=\{0\}$;

(iv) $\mathcal{N}(BAB)=\mathcal{N}(B)$;

(v) $\mathcal{R}(BAB)=\mathcal{R}(B)$;

(vi) $\mathrm{rank}(BAB)=\mathrm{rank}(B)$;

(vii) $A$ is nonsingular on $\mathcal{R}(B)$ and $B$ is nonsingular on $\mathcal{R}(AB)$;

(viii) $\mathcal{R}(B)\cap\mathcal{N}(A)=\{0\}$ and $\mathcal{R}(AB)\cap\mathcal{N}(B)=\{0\}$.

Proof 2.2.

The equivalences follow relatively straightforwardly from the (dimensions of) nullspaces and ranges of $A$, $B$, $BA$, and $AB$. For (iv) we have $\mathcal{N}(BAB)\supseteq\mathcal{N}(B)$ with equality if and only if (i) holds. For (v) one has $\mathcal{R}(BAB)\subseteq\mathcal{R}(B)$ with equality if and only if (i) holds, where the ranks in (vi) are the dimensions of the subspaces of (v). A nonzero vector in $\mathcal{R}(B)$ cannot be mapped to zero by the consecutive action of $BA$, which is stated in (vii) and (viii).

We assume from now on that there is a unique solution, and the next theorem provides specific expressions for this fixed point. We will see that, even in the absence of noise, the fixed point of (5) is not the exact solution $\bar{x}$ . One way to understand this — following the discussion in [2] — is by the fact that the unmatched normal equations $BA\,x=B\,b$ may be viewed as an oblique projection of $A\,x=b$ , instead of the common orthogonal projection underlying the normal equations $A^{T}A\,x=A^{T}\,b$ .

Theorem 2.3.

Assume that $A$ and $B$ satisfy the criteria in Proposition 2.1. Then the fixed point $x^{*}$ of the BA Iteration (5) with starting vector $x^{0}=0$ can be expressed in three ways:

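$$x^{*}=(B\,A)_{\mathcal{R}(B)}^{\dagger}\,B\,b=B\,(A\,B)_{\mathcal{N}(B)}^{\dagger}\,b=P_{\mathcal{R}(B),\,\mathcal{N}(BA)}\,A_{\mathcal{N}(B)}^{\dagger}\,b, \tag{8}$$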

and for noise-free data $\bar{b}=A\,\bar{x}$ the fixed point is given by

$$\bar{x}^{*}=P_{\mathcal{R}(BA),\,\mathcal{N}(BA)}\,\bar{x}.$$

If $m\geq n$ and $A$ and $B$ have full rank then $x^{*}=(B\,A)^{-1}B\,b$ and $\bar{x}^{*}=\bar{x}$.

Proof 2.4.

By writing the fixed point as $x^{*}=B\,y$ with $y\in\mathbb{R}^{m}$ it follows from (7) that $x^{*}=B\,(B\,A\,B)^{\dagger}B\,b$ , and the first two expressions in (8) are obtained by recognizing that $B\,((B\,A)\,B)^{\dagger}=(B\,A)_{\mathcal{R}(B)}^{\dagger}$ and $(B\,(A\,B))^{\dagger}B=(A\,B)_{\mathcal{N}(B)}^{\dagger}$ ; cf. (3). We now introduce the SVD

$$B\,A=U\,\Sigma\,V^{T},\qquad \Sigma\in\mathbb{R}^{p\times p},\qquad U,V\in\mathbb{R}^{n\times p},$$

where $p=\mathrm{rank}(BA)$ . Inserting this in (7) we get $U\,\Sigma\,V^{T}B\ y=B\,b$ , and by multiplying from the left with $\Sigma^{-1}U^{T}$ we obtain $V^{T}B\ y=\Sigma^{-1}U^{T}B\,b$ . The solution $y$ of minimum norm is given by

$$y=(V^{T}B)^{\dagger}\,\Sigma^{-1}U^{T}B\,b,$$

and hence

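$$x^{*}=B\,(V^{T}B)^{\dagger}\,\Sigma^{-1}U^{T}B\,b=B\,(V^{T}B)^{\dagger}V^{T}\,V\,\Sigma^{-1}U^{T}B\,b=B\,(V^{T}B)^{\dagger}V^{T}\,(B\,A)^{\dagger}B\,b.$$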

By recognizing $P_{\,\mathcal{R}(B),\,\mathcal{N}(BA)}=B\,(V^{T}B)^{\dagger}V^{T}$ as the oblique projector onto $\mathcal{R}(B)$ along $\mathcal{N}(V^{T})=\mathcal{N}(BA)$ , cf. (2), and $A_{\mathcal{N}(B)}^{\dagger}=(B\,A)^{\dagger}B$ as the oblique pseudoinverse of $A$ along $\mathcal{N}(B)$ , cf. (3), we obtain the third expression in (8).

For the special case $b=\bar{b}=A\,\bar{x}$ the fixed point satisfies $\bar{x}^{*}\in\mathcal{R}(BA)$ , and hence we can write it as $\bar{x}^{*}=B\,A\,\bar{y}$ for some vector $\bar{y}\in\mathbb{R}^{n}$ . According to $B\,A\,\bar{x}^{*}=B\,\bar{b}=B\,A\,\bar{x}$ , we obtain

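$$\bar{x}^{*}=B\,A\,(B\,A\,B\,A)^{\dagger}B\,A\,\bar{x}=B\,A\,\bigl(((B\,A)^{T})^{T}B\,A\bigr)^{\dagger}((B\,A)^{T})^{T}\bar{x},$$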

and we recognize $P_{\,\mathcal{R}(BA),\,\mathcal{N}(BA)}=B\,A\bigl(((B\,A)^{T})^{T}B\,A\bigr)^{\dagger}((B\,A)^{T})^{T}$ as the oblique projector onto $\mathcal{R}(BA)$ along $\mathcal{R}((BA)^{T})^{\perp}=\mathcal{N}(BA)$.
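For an underdetermined problem ($m<n$) the noise-free fixed point generally differs from $\bar{x}$, and this can be observed numerically. The sketch below runs the BA Iteration on a hypothetical $2\times 3$ example and compares the result with $B\,A\,(B\,A\,B\,A)^{\dagger}B\,A\,\bar{x}$. The matrices, the vector `x_bar`, the relaxation parameter, and the `rcond` cutoff passed to `np.linalg.pinv` are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical underdetermined example (m = 2, n = 3); B is a perturbed A^T.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.8]])
x_bar = np.array([1.0, 2.0, 3.0])   # assumed exact image
b_bar = A @ x_bar                   # noise-free data

# BA Iteration from x^0 = 0; omega = 0.5 satisfies the convergence condition here
# because the nonzero eigenvalues of BA are real, positive, and below 4.
omega = 0.5
x = np.zeros(3)
for _ in range(300):
    x = x + omega * (B @ (b_bar - A @ x))

# Oblique projector onto R(BA) along N(BA): P = BA (BA BA)^† BA.
M = B @ A
P = M @ np.linalg.pinv(M @ M, rcond=1e-10) @ M   # rcond guards the numerically zero singular value
x_star = P @ x_bar

print("iterate:        ", x)
print("P @ x_bar:      ", x_star)
print("exact solution: ", x_bar)
```

The iterate agrees with the projected vector rather than with `x_bar`, since the component of `x_bar` in $\mathcal{N}(BA)$ is never recovered.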

The sensitivity of the fixed point to perturbations of the right-hand side can be characterized as follows.

Corollary 2.5.

Let $\bar{x}^{*}$ and $x^{*}$ denote the fixed point of the BA Iteration when applied to the noise-free data $\bar{b}$ and the noisy data $b=\bar{b}+e$ , respectively. Then

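$$\|x^{*}-\bar{x}^{*}\|\leq\|P_{\mathcal{R}(B),\,\mathcal{N}(BA)}\|\,\|A_{\mathcal{N}(B)}^{\dagger}\|\,\|e\|.$$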

If $m\geq n$ and $A$ and $B$ have full rank then

$$\|x^{*}-\bar{x}^{*}\|\leq\|A_{\mathcal{N}(B)}^{\dagger}\|\,\|e\|.$$

When $B=A^{T}$ then the oblique pseudoinverse $A_{\mathcal{N}(B)}^{\dagger}$ is the ordinary pseudoinverse $A^{\dagger}$ and we obtain the traditional least-squares perturbation bound.

Proof 2.6.

The first bound is a direct consequence of (8), and the second bound follows from the fact that $P_{\,\mathcal{R}(B),\,\mathcal{N}(BA)}=I$ when $m\geq n=\mathrm{rank}(A)$.
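The full-rank bound can be illustrated numerically by using $A_{\mathcal{N}(B)}^{\dagger}=(B\,A)^{\dagger}B$, which reduces to $(B\,A)^{-1}B$ when $m\geq n$ and both matrices have full rank. In this minimal sketch the matrices, the exact solution `x_bar`, and the noise vector `e` are assumed values, not data from the paper.

```python
import numpy as np

# Illustrative full-rank overdetermined example (assumed data).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
E = 0.05 * np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
B = A.T + E                                # unmatched transpose
x_bar = np.array([1.0, -1.0])
e = np.array([0.01, -0.02, 0.015])         # data perturbation

b_bar = A @ x_bar
b = b_bar + e

# Oblique pseudoinverse of A along N(B) in the full-rank case: (BA)^{-1} B.
A_obl = np.linalg.solve(B @ A, B)

x_bar_star = A_obl @ b_bar                 # fixed point for noise-free data
x_star = A_obl @ b                         # fixed point for noisy data

lhs = np.linalg.norm(x_star - x_bar_star)
rhs = np.linalg.norm(A_obl, 2) * np.linalg.norm(e)

print("||x* - xbar*||          =", lhs)
print("||A_obl||_2 * ||e||     =", rhs)
print("noise-free fixed point equals x_bar:", np.allclose(x_bar_star, x_bar))
```

How loose the bound is depends on how `e` aligns with the dominant singular direction of the oblique pseudoinverse; for ill-conditioned $A$ the factor $\|A_{\mathcal{N}(B)}^{\dagger}\|$ can be very large.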

The conclusion to be drawn from the analysis in this section is that the conditions for the existence of a fixed point of the BA Iteration are rather strict, and that the fixed point is potentially very sensitive to data errors since the matrix $A$ is ill-conditioned in inverse problems. Moreover, it is very difficult to check the existence conditions in a practical application, and it appears that they are very often violated in the available software systems. This motivates the development of a modified iterative method that is always guaranteed to have a fixed point, which we introduce and analyze in the rest of this paper.

3 A Modified Algorithm

In our numerical studies with ASTRA and other software packages for CT, we have found that very often the convergence condition in (6) is violated in that $B\,A$ has one or more eigenvalues with negative real part. As a consequence the iteration has no fixed point and the typical situation is that the iterates $x^{k}$ , after some iterations, start to diverge. We illustrate this in Section 5.

3.1 The Shifted BA Iteration

To remedy this non-convergence issue, we propose a modified version of the BA Iteration that has guaranteed convergence and whose fixed point approximates the exact solution $\bar{x}$ . In addition, the modified method should exhibit semi-convergence properties similar to the BA Iteration. We refer to the modified algorithm as the Shifted BA Iteration, and it guarantees convergence of the iterations for appropriate choices of the two parameters $\alpha>0$ and $\omega>0$ :

$$x^{k+1}=(1-\alpha\,\omega)\,x^{k}+\omega\,B\,(b-A\,x^{k}),\qquad k=0,1,2,\dots. \tag{10}$$

This scheme is motivated by the Tikhonov problem,

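$$x_{\alpha}=\arg\min_{x}\bigl\{\|A\,x-b\|^{2}+\alpha\,\|x\|^{2}\bigr\}=(A^{T}A+\alpha\,I)^{-1}A^{T}b, \tag{11}$$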

for which a gradient descent step takes the form

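$$x^{k+1}=x^{k}-\omega\,\bigl(A^{T}(A\,x^{k}-b)+\alpha\,x^{k}\bigr)=(1-\alpha\,\omega)\,x^{k}+\omega\,A^{T}(b-A\,x^{k}).$$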

Hence, if $B=A^{T}$ then with a properly chosen $\omega$ the iteration (10) converges to a Tikhonov solution $x_{\alpha}$ . Below we study the convergence properties of (10) with $B\neq A^{T}$ .
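The matched case $B=A^{T}$ is easy to verify numerically: the shifted iteration should approach the Tikhonov solution $(A^{T}A+\alpha I)^{-1}A^{T}b$. The following minimal sketch uses assumed values for `A`, `b`, `alpha`, and `omega`, with `omega` chosen below $2/(\|A^{T}A\|+\alpha)$.

```python
import numpy as np

# Assumed illustrative data; A^T A has eigenvalues 1 and 3.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
alpha = 0.1
omega = 0.5          # below 2 / (3 + alpha), so the iteration contracts

# Shifted iteration with the matched transpose B = A^T, starting from zero.
x = np.zeros(2)
for _ in range(300):
    x = (1 - alpha * omega) * x + omega * (A.T @ (b - A @ x))

# Tikhonov solution from the regularized normal equations.
x_tik = np.linalg.solve(A.T @ A + alpha * np.eye(2), A.T @ b)

print("shifted iterate:  ", x)
print("Tikhonov solution:", x_tik)
```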

The matrix that governs the iterations for the Shifted BA Iteration (10) is $B\,A+\alpha\,I$ with $I$ as the identity matrix, whose eigenvalues are $\lambda_{j}+\alpha$ (where $\lambda_{j}$ are the eigenvalues of $BA$). Our key idea is that by a proper choice of the additional positive parameter $\alpha$ we can ensure that all these eigenvalues have a positive real part, thus ensuring convergence. The shift needs to be just large enough that $\mathrm{Re}(\lambda_{j})+\alpha>0$ for those $\lambda_{j}\neq-\alpha$. At the same time, if $\alpha$ is small then the fixed point will be an approximation to the exact solution $\bar{x}$. Hence, in contrast to the BA Iteration the shifted version has a unique fixed point that is always attained for both noisy and noise-free data. Our new approach can therefore be viewed as a modification where both a regularization term and semi-convergence of the iterations are used for noisy data.
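For a small dense problem the required shift can be read off from the leftmost eigenvalue of $BA$. The sketch below is only an illustration: the helper `choose_shift` and its `safety` factor are hypothetical conveniences introduced here, the $2\times 2$ matrices are an exaggerated mismatch chosen so that $BA$ has a clearly negative eigenvalue, and a dense eigenvalue solver is a stand-in for the iterative estimators needed when the matrices are never stored.

```python
import numpy as np

def choose_shift(BA, safety=1.5):
    """Return (alpha, leftmost real part) with alpha = safety * max(0, -leftmost).

    The safety factor is an assumed heuristic, not a rule from the paper.
    """
    lam = np.linalg.eigvals(BA)
    leftmost = float(lam.real.min())
    return safety * max(0.0, -leftmost), leftmost

# Hypothetical pair with a deliberately poor backprojector.
A = np.array([[1.0, 0.0], [0.0, 0.5]])
B = np.array([[1.0, 0.2], [0.2, -0.5]])

alpha, lam_lm = choose_shift(B @ A)
shifted = np.linalg.eigvals(B @ A + alpha * np.eye(2))

print("leftmost eigenvalue real part:", lam_lm)
print("chosen shift alpha:           ", alpha)
print("real parts after the shift:   ", shifted.real)
```

A larger `safety` factor moves the eigenvalues further from the imaginary axis but also moves the fixed point further from $\bar{x}$, so the factor trades robustness against accuracy.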

We note that the Shifted BA Iteration is mathematically equivalent to applying the BA Iteration to the augmented matrices and vector

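$$\begin{bmatrix}A\\ \sqrt{\alpha}\,I\end{bmatrix},\qquad\begin{bmatrix}B & \sqrt{\alpha}\,I\end{bmatrix},\qquad\begin{bmatrix}b\\ 0\end{bmatrix}. \tag{12}$$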
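The equivalence with an augmented BA Iteration can be confirmed on one step. This sketch assumes the identity blocks carry the factor $\sqrt{\alpha}$, which is what makes the product of the augmented matrices equal $B\,A+\alpha\,I$; the matrices, the test vector `x`, and the parameter values are illustrative assumptions.

```python
import numpy as np

# Assumed illustrative data.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
E = 0.05 * np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
B = A.T + E
b = np.array([1.0, 2.0, 3.0])
alpha, omega = 0.1, 0.5
n = A.shape[1]

# Augmented quantities with sqrt(alpha) * I blocks (assumption stated above).
s = np.sqrt(alpha)
A_aug = np.vstack([A, s * np.eye(n)])        # shape (m + n, n)
B_aug = np.hstack([B, s * np.eye(n)])        # shape (n, m + n)
b_aug = np.concatenate([b, np.zeros(n)])

x = np.array([0.3, -0.7])                    # arbitrary current iterate

step_shifted = (1 - alpha * omega) * x + omega * (B @ (b - A @ x))
step_aug = x + omega * (B_aug @ (b_aug - A_aug @ x))

print("shifted step:  ", step_shifted)
print("augmented step:", step_aug)
print("B_aug A_aug == BA + alpha I:",
      np.allclose(B_aug @ A_aug, B @ A + alpha * np.eye(n)))
```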

A similar idea was used in [3, §3.2], for the case $B=A^{T}$ , to perform convergence analysis for the case $\mathrm{rank}(A)<n$ . According to [2, Props. 3.1 and 3.2], with the augmented matrices and vector defined in (12) we obtain the following convergence criterion for the Shifted BA Iteration.

Theorem 3.1.

Let $\lambda_{j}$ denote those eigenvalues of $BA$ that are different from $-\alpha$ . Then the Shifted BA Iteration (10) converges to a fixed point if and only if $\alpha$ and $\omega$ satisfy

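$$0<\omega<2\,\frac{\mathrm{Re}(\lambda_{j})+\alpha}{|\lambda_{j}|^{2}+\alpha\,(\alpha+2\,\mathrm{Re}(\lambda_{j}))}\quad\text{and}\quad\mathrm{Re}(\lambda_{j})+\alpha>0.$$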

Proof 3.2.

Replacing the matrices $B$ and $A$ in the BA Iteration with the augmented ones from (12), we define $C=BA+\alpha I$ and $T=I-\omega C$ . Then $T$ is the iteration matrix for the Shifted BA Iteration (10), i.e.,

$$x^{k+1}=T\,x^{k}+\omega\,B\,b,$$

and any fixed point $x^{*}_{\alpha}$ of (10) satisfies the equation

$$C\,x_{\alpha}^{*}=B\,b.$$

Let $P=P_{\mathcal{R}(C^{T})}$ be the orthogonal projector onto $\mathcal{R}(C^{T})=\mathcal{N}(C)^{\perp}$ . In view of the presence of the projection $P$ , we consider the new coordinate system given by the orthogonal matrix $[\,N\ \,R\,]$ , where $N$ and $R$ are matrices with orthonormal columns spanning $\mathcal{N}(C)$ and $\mathcal{R}(C^{T})$ , respectively. We examine $P\,T$ in the new coordinates where $[\,N\ \,R\,]^{T}P\,[\,N\ \,R\,]\ [\,N\ \,R\,]^{T}\,T\,[\,N\ \,R\,]$ takes the form

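$$\begin{bmatrix}0&0\\ 0&I\end{bmatrix}\begin{bmatrix}I&-\omega\,N^{T}CR\\ 0&I-\omega\,R^{T}CR\end{bmatrix}=\begin{bmatrix}0&0\\ 0&I-\omega\,R^{T}CR\end{bmatrix}.$$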

For the first block row, it holds that

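$$T\,x=x\ \Leftrightarrow\ C\,x=0\ \Leftrightarrow\ x\in\mathcal{N}(C)\ \Leftrightarrow\ -\alpha\ \text{is an eigenvalue of}\ BA.$$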

This shows that the eigenvalue $1$ of $T$ is associated with the eigenspace $\mathcal{N}(C)$. We see that, because of the projector $P$, this eigenvalue, which corresponds to eigenvalues of $BA$ equal to $-\alpha$, is irrelevant for convergence. From the second block row, it suffices to consider $PT$ as an operator $\mathcal{R}(C^{T})\to\mathcal{R}(C^{T})$, where it holds that $PT=T$. Combining these facts, we conclude that it is enough to consider the eigenvalues $1-\omega(\lambda_{j}+\alpha)$ of $T$, where $\lambda_{j}$ is not equal to $-\alpha$.

Then, applying the results in [2, Props. 3.1 and 3.2] to the augmented matrices and vector (12), we obtain the necessary and sufficient condition for convergence with respect to $\alpha$ and $\omega$.
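The criterion of Theorem 3.1 can be turned into a small numerical check. The sketch below computes the upper limit on $\omega$ from the eigenvalues of $BA$ and contrasts the unshifted and shifted iterations. The function names `omega_upper_bound` and `shifted_ba`, the $2\times 2$ matrices (an exaggerated mismatch with one negative eigenvalue), and the values of `alpha` and `omega` are assumptions made for illustration.

```python
import numpy as np

def omega_upper_bound(lam, alpha, tol=1e-12):
    """Smallest upper limit on omega over all eigenvalues lam != -alpha.

    Returns 0.0 when some Re(lam) + alpha <= 0, i.e. no omega is admissible.
    """
    lam = np.asarray(lam, dtype=complex)
    lam = lam[np.abs(lam + alpha) > tol]             # eigenvalues equal to -alpha are skipped
    re_shifted = lam.real + alpha
    if np.any(re_shifted <= 0):
        return 0.0
    denom = np.abs(lam) ** 2 + alpha * (alpha + 2 * lam.real)
    return float(np.min(2 * re_shifted / denom))

def shifted_ba(A, B, b, alpha, omega, num_iters):
    """Run x^{k+1} = (1 - alpha*omega) x^k + omega * B (b - A x^k) from x^0 = 0."""
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        x = (1 - alpha * omega) * x + omega * (B @ (b - A @ x))
    return x

# Hypothetical pair for which BA has one negative eigenvalue.
A = np.array([[1.0, 0.0], [0.0, 0.5]])
B = np.array([[1.0, 0.2], [0.2, -0.5]])
b = np.array([1.0, 1.0])

lam = np.linalg.eigvals(B @ A)
alpha, omega = 0.4, 1.0

omega_max_unshifted = omega_upper_bound(lam, 0.0)    # no admissible omega without a shift
omega_max = omega_upper_bound(lam, alpha)

x_unshifted = shifted_ba(A, B, b, 0.0, omega, 100)   # alpha = 0 gives the plain BA Iteration
x_shifted = shifted_ba(A, B, b, alpha, omega, 500)
x_fixed = np.linalg.solve(B @ A + alpha * np.eye(2), B @ b)

print("eigenvalues of BA:", lam)
print("omega limit without shift:", omega_max_unshifted)
print("omega limit with shift:   ", omega_max)
print("norm of unshifted iterate:", np.linalg.norm(x_unshifted))
print("shifted iterate:", x_shifted, " fixed point:", x_fixed)
```

With the shift in place the iterate settles at the solution of $(B\,A+\alpha\,I)\,x=B\,b$, whereas the unshifted iterate grows without bound for the same `omega`.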

When we have convergence, a fixed point $x_{\alpha}^{*}$ of the Shifted BA Iteration satisfies

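$$x_{\alpha}^{*}=(1-\alpha\,\omega)\,x_{\alpha}^{*}+\omega\,B\,(b-A\,x_{\alpha}^{*})\quad\Longleftrightarrow\quad(B\,A+\alpha\,I)\,x_{\alpha}^{*}=B\,b \tag{16}$$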

and, similar to Theorem 2.3, we can characterize this fixed point as follows.

Theorem 3.3.

Assume that $B\,A+\alpha\,I$ is nonsingular. The fixed point of the Shifted BA Iteration (10) applied to $b$ , with starting vector $x^{0}=0$ and $\alpha>0$ , satisfies

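$$x_{\alpha}^{*}=(B\,A+\alpha\,I)^{-1}B\,b=B\,(A\,B+\alpha\,I)^{-1}b,\qquad x_{\alpha}^{*}\in\mathcal{R}(B). \tag{17}$$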

For noise-free data $\bar{b}=A\,\bar{x}$ the fixed point $\bar{x}_{\alpha}^{*}$ satisfies

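$$\bar{x}_{\alpha}^{*}=(B\,A+\alpha\,I)^{-1}B\,A\,\bar{x},\qquad\bar{x}_{\alpha}^{*}\in\mathcal{R}(BA),\qquad\bar{x}-\bar{x}_{\alpha}^{*}=\alpha\,(B\,A+\alpha\,I)^{-1}\bar{x}.$$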

Proof 3.4.

The first relation follows immediately from (16). To obtain the second relation we write $x_{\alpha}^{*}=B\,y$ and note that $(B\,A+\alpha\,I)\,B\,y=B\,(A\,B+\alpha\,I)\,y$ . Hence $y$ must satisfy

$$B\,\bigl((A\,B+\alpha\,I)\,y-b\bigr)=0,$$

and therefore $y$ has the general form with an arbitrary component in the null space of $B\,(A\,B+\alpha\,I)$ as well as $B$ :

[TABLE]

We note that $A\,B+\alpha\,I$ is nonsingular due to our assumption that $B\,A+\alpha\,I$ is nonsingular, cf. [11, Thm. 1.3.22]. Thus

[TABLE]

and it follows immediately that $x_{\alpha}^{*}\in\mathcal{R}(B)$. The first result for $\bar{x}_{\alpha}^{*}$ follows immediately from (17). To show the second result let $B\,A$ have the eigendecomposition $B\,A=W\,\mathrm{diag}(\lambda_{i})\,W^{-1}$; then $(B\,A+\alpha\,I)^{-1}B\,A=W\,\mathrm{diag}(\lambda_{i}/(\lambda_{i}+\alpha))\,W^{-1}$ from which it follows that $\bar{x}_{\alpha}^{*}\in\mathrm{span}\{w_{1},\ldots,w_{\mathrm{rank}(BA)}\}=\mathcal{R}(BA)$. The third result follows from the relation

[TABLE]

Note that the results in (17) can also be derived by applying the augmented matrices and vector defined in (12) to (8).
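Two identities from Theorem 3.3 lend themselves to a quick numerical check: the exchange $(B\,A+\alpha\,I)^{-1}B=B\,(A\,B+\alpha\,I)^{-1}$ and the noise-free error expression $\bar{x}-\bar{x}_{\alpha}^{*}=\alpha\,(B\,A+\alpha\,I)^{-1}\bar{x}$. The data in this minimal sketch are assumed for illustration only.

```python
import numpy as np

# Assumed illustrative data.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
E = 0.05 * np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
B = A.T + E
alpha = 0.1
x_bar = np.array([1.0, -1.0])
m, n = A.shape

C_n = B @ A + alpha * np.eye(n)      # n x n shifted matrix
C_m = A @ B + alpha * np.eye(m)      # m x m shifted matrix

# Exchange identity: (BA + alpha I)^{-1} B == B (AB + alpha I)^{-1}.
left = np.linalg.solve(C_n, B)
right = B @ np.linalg.inv(C_m)

# Noise-free fixed point and its distance to the exact solution.
x_bar_alpha = np.linalg.solve(C_n, B @ A @ x_bar)
diff = x_bar - x_bar_alpha
expected = alpha * np.linalg.solve(C_n, x_bar)

print("exchange identity holds:", np.allclose(left, right))
print("x_bar - x_bar_alpha:      ", diff)
print("alpha (BA + alpha I)^-1 x:", expected)
```

The second comparison shows that the noise-free error scales with $\alpha$, which is why a shift that is only slightly larger than necessary keeps the fixed point close to $\bar{x}$.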

To summarize, we formulated the convergence conditions for the Shifted BA Iteration in terms of the shift $\alpha$ and the relaxation parameter $\omega$ , and we gave explicit expressions for the fixed point of this iterative method.

3.2 First-Order Perturbation Analysis

Recall that the fixed point $x_{\alpha}^{*}$ of the Shifted BA Iteration is the Tikhonov solution in (11) when $B=A^{T}$ . Following [2] it is natural to give a general perturbation analysis of the Tikhonov problem when different perturbations are introduced in the matrices $A$ and $A^{T}$ in the corresponding normal equations $A^{T}A\,x=A^{T}b$ . A special instance of this analysis is when $B$ is an unmatched transpose of $A$ , and where our analysis lets us bound the difference between the fixed point $x_{\alpha}^{*}$ and the exact solution $\bar{x}$ .

We introduce the perturbed quantities

[TABLE]

with $E_{A}\in\mathbb{R}^{m\times n}$ , $E_{A^{T}}\in\mathbb{R}^{n\times m}$ and $e\in\mathbb{R}^{m}$ . Moreover we define $\tilde{x}_{\alpha}$ as the solution to the Regularized Unmatched Normal Equations

[TABLE]

We want to compare $\tilde{x}_{\alpha}$ to the exact solution $\bar{x}$ . To do this, we introduce the Tikhonov solution $\bar{x}_{\alpha}=(A^{T}A+\alpha\,I)^{-1}A^{T}\bar{b}$ to the unperturbed problem (11).

We then split the error into a perturbation error $\tilde{x}_{\alpha}-\bar{x}_{\alpha}$ and a regularization error $\bar{x}_{\alpha}-\bar{x}$ as follows:

$$\tilde{x}_{\alpha}-\bar{x}=(\tilde{x}_{\alpha}-\bar{x}_{\alpha})+(\bar{x}_{\alpha}-\bar{x}).$$

This approach allows us to study the effect of the matrix and right-hand side perturbations in isolation from the effect that Tikhonov regularization has on a noise-free system.
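The split can be checked on a scalar problem. The sketch below assumes that the Regularized Unmatched Normal Equations have the form "(perturbed transpose times perturbed matrix plus $\alpha$) times $x$ equals perturbed transpose times $b$"; the numbers `eps_A`, `eps_AT`, and `e` are hypothetical perturbations chosen only for illustration.

```python
def tikhonov_scalar(a, a_transpose, rhs, alpha):
    """Solve (a_transpose * a + alpha) x = a_transpose * rhs for scalar data."""
    return a_transpose * rhs / (a_transpose * a + alpha)


a, x_bar, alpha = 2.0, 3.0, 0.1
b_bar = a * x_bar                       # noise-free right-hand side
eps_A, eps_AT, e = 1e-3, -2e-3, 5e-3    # hypothetical perturbations

x_bar_alpha = tikhonov_scalar(a, a, b_bar, alpha)                          # unperturbed
x_tilde_alpha = tikhonov_scalar(a + eps_A, a + eps_AT, b_bar + e, alpha)   # perturbed

perturbation_error = x_tilde_alpha - x_bar_alpha
regularization_error = x_bar_alpha - x_bar
total_error = x_tilde_alpha - x_bar
```

For scalar data the regularization error reduces to $-\alpha\,\bar{x}/(a^{2}+\alpha)$, which depends only on $\alpha$ and the noise-free problem, while the perturbation error collects everything caused by `eps_A`, `eps_AT`, and `e`.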

Theorem 3.5.

With the definitions in (18) and (19) we have the following first-order error bounds obtained by omitting higher-order terms:

[TABLE]

Proof 3.6.

Let $\tilde{x}_{\alpha}=\bar{x}_{\alpha}+\delta x_{\alpha}$ and consider the perturbed system

[TABLE]

Moreover note that from (18) we have

[TABLE]

and

[TABLE]

where we have introduced

[TABLE]

Inserting these equations in (20) we obtain

[TABLE]

and rearranging we get

[TABLE]

Now using that $(A^{T}A+\alpha I)\,\bar{x}_{\alpha}=A^{T}\bar{b}$ and neglecting higher-order terms we get

[TABLE]

which can also be obtained by using the augmented form in the proof of [2, Prop. 2.1], i.e., replacing $A$ , $E$ , and $b$ with

[TABLE]

This leads to the bound

[TABLE]

where we use that, with $\sigma_{i}$ being the $i$ th singular value of $A$ ,

[TABLE]

and

[TABLE]

For the relative error we get

[TABLE]

where we used that

[TABLE]

To complete the analysis we need to bound the regularization error $\bar{x}_{\alpha}-\bar{x}$ associated with the noise-free system. To obtain a useful bound we need to incorporate the fact that we solve a discretized inverse problem. This is done in the following theorem from [5] (see also [7, Thm. 4.5.1]).

Theorem 3.7.

Introduce the SVD of $A$ as $A=\sum_{i=1}^{\min(m,n)}u_{i}\,\sigma_{i}\,v_{i}^{T}$ and assume that the noise-free right-hand side $\bar{b}$ is given by the model

[TABLE]

in which $\nu\geq 0$ is a model parameter that controls the decay of these coefficients. Then

[TABLE]

In practice the Tikhonov regularization parameter $\alpha$ is always less than $\|A\|^{2}$ [7]. This theorem then says that the noise-free problem must satisfy the discrete Picard condition for Tikhonov regularization to produce a useful result — which is, of course, the case for the imaging problems we have in mind.

To summarize our results, the shift parameter $\alpha$ plays the following roles. The regularization error decreases as $\alpha$ decreases, and if the noise-free data satisfies the discrete Picard condition (as we expect), then a small nonzero $\alpha$ has little influence on the regularization error. On the other hand, the perturbation error increases as $\alpha$ decreases. We therefore want to use an $\alpha$ that is just large enough to ensure convergence.
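The behavior of the regularization error can be illustrated on a diagonal problem, where the Tikhonov solution $\bar{x}_{\alpha}=(A^{T}A+\alpha\,I)^{-1}A^{T}\bar{b}$ has the components $\sigma_{i}^{2}/(\sigma_{i}^{2}+\alpha)\,\bar{x}_{i}$. The following Python sketch uses hypothetical singular values and a hypothetical exact solution whose coefficients decay with $\sigma_{i}$, as a stand-in for data that satisfies the discrete Picard condition.

```python
import math


def regularization_error(sigmas, x_bar, alpha):
    """Relative error ||x_alpha - x_bar|| / ||x_bar|| for A = diag(sigmas)."""
    err_sq = 0.0
    for s, x in zip(sigmas, x_bar):
        x_alpha = (s * s / (s * s + alpha)) * x   # Tikhonov filter factor times x_bar_i
        err_sq += (x_alpha - x) ** 2
    return math.sqrt(err_sq) / math.sqrt(sum(x * x for x in x_bar))


sigmas = [1.0, 0.1, 0.01]      # hypothetical singular values
x_bar = [1.0, 0.1, 0.01]       # coefficients decay together with sigma_i
alphas = [1e-1, 1e-2, 1e-3, 1e-4]
errors = [regularization_error(sigmas, x_bar, a) for a in alphas]
```

Each component of the error equals $\alpha/(\sigma_{i}^{2}+\alpha)\,|\bar{x}_{i}|$, so the list `errors` decreases monotonically as $\alpha$ decreases; the sketch says nothing about the perturbation error, which moves in the opposite direction.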

4 Eigenvalue Estimation

The motivation behind the Shifted BA Iteration is to introduce a small positive shift parameter $\alpha$, just large enough to ensure that all the shifted eigenvalues have a positive real part, i.e., $\mathrm{Re}(\lambda_{j})+\alpha>0$. To turn this principle into an efficient working algorithm, we therefore need to be able to estimate the leftmost eigenvalue $\lambda_{\mathrm{lm}}$ of $B\,A$, i.e., the eigenvalue with the minimal real part. If $\mathrm{Re}(\lambda_{\mathrm{lm}})>0$ then we just use the BA Iteration; otherwise we use the Shifted BA Iteration with $\alpha$ slightly larger than $|\mathrm{Re}(\lambda_{\mathrm{lm}})|$. (In view of Theorem 3.1, we might theoretically even take $\alpha$ exactly equal to this quantity, but this is not important in practice, since the computed approximation to the real part of the leftmost eigenvalue will usually be an upper bound.) It is important to note that we only have actions with $A$ and $B$ at our disposal; actions with $A^{T}$ or $B^{T}$ and exact shift-and-invert transformations are not possible.
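The selection rule described above can be written as a small helper. This is a minimal sketch: the function name `choose_shift`, the relative safety margin `rel_margin`, and the tiny positive `floor` used when the estimate is exactly zero are hypothetical choices, not values prescribed in the paper.

```python
def choose_shift(re_leftmost, rel_margin=0.05, floor=1e-12):
    """Return the shift alpha for a given estimate of Re(lambda_lm) of BA.

    A return value of 0.0 means that the unshifted BA Iteration can be used.
    """
    if re_leftmost > 0.0:
        return 0.0
    # Slightly larger than |Re(lambda_lm)|; the margin is an assumption.
    return max((1.0 + rel_margin) * abs(re_leftmost), floor)


alpha_positive_spectrum = choose_shift(0.3)                  # no shift needed
alpha_negative_spectrum = choose_shift(-0.2, rel_margin=0.1)
```

Because the computed estimate is usually an upper bound on $\mathrm{Re}(\lambda_{\mathrm{lm}})$, a margin of this kind is one simple way to make the shifted spectrum safely positive.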

Various approaches have been investigated by Meerbergen and coauthors for the rightmost eigenvalue of a matrix $C$ (or, equivalently, the leftmost of $-C$ ). Several of these are “matrix-free”, which means that only matrix-vector products are necessary. An approach based on Chebyshev polynomials has been proposed in [14]. In [15], the search space is expanded by an approximation to $\exp(C)\,z$ , where $z$ is the current Ritz vector. However, these methods usually take a considerable number of matrix-vector multiplications to expand the search space by one vector.

Since the leftmost eigenvalue is an eigenvalue located at the exterior of the spectrum, Krylov-based methods are often well suited. Stewart’s Krylov–Schur method [20] is one of the most popular methods to compute such eigenvalues. This method is essentially equivalent to implicitly restarted Arnoldi [19], as for instance implemented in MATLAB’s eigs, but has a particularly elegant and easy-to-understand implementation. In our experiments, a custom-made implementation of the Krylov–Schur method proved to be, on average, a factor 1.2–1.3 faster than eigs in terms of matrix-vector multiplications. We give pseudocode for the Krylov–Schur method in Algorithm 1.

Algorithm 1: Krylov–Schur for the leftmost eigenvalue

Input: Minimal and maximal subspace dimensions $\underline{\ell}<\bar{\ell}$, starting vector $v_{1}$, tolerance tol, functions to perform matrix-vector multiplications with $A$ and $B$.

Output: A pair $(\theta,v)$ that approximates the leftmost eigenpair $(\lambda_{\mathrm{lm}},v_{\mathrm{lm}})$, with $\|(B\,A-\theta I)\,v\|\leq\texttt{tol}$.

1: Form the Krylov decomposition $B\,A\,V_{\bar{\ell}}=V_{\bar{\ell}}\,H_{\bar{\ell}}+h_{\bar{\ell}+1,\bar{\ell}}\,v_{\bar{\ell}+1}\,f_{\bar{\ell}}^{H}$

2: for $k=1,2,\dots$

3: Extract Schur pairs $(\theta_{j},c_{j})$ from $H_{\bar{\ell}}$ with $j=1,\dots,\bar{\ell}$, sorted on nondecreasing real part

4: if $|h_{\bar{\ell}+1,\bar{\ell}}\,f_{\bar{\ell}}^{H}\,c_{1}|\leq\texttt{tol}$

5: Accept $\theta=\theta_{1}$ with leftmost Schur vector $v=V_{\bar{\ell}}\,c_{1}$, stop

6: end

7: Truncate decomposition to dimension $\underline{\ell}$ by selecting leftmost Schur vectors

8: Expand the Krylov decomposition to dimension $\bar{\ell}$

9: end

This algorithm uses the Krylov decomposition from [20] which is a generalization of the Arnoldi decomposition and which may have complex factors. In Line 1, the first Krylov decomposition has $f_{\bar{\ell}}=e_{\bar{\ell}}$ , the ${\bar{\ell}}$ th standard basis vector. In subsequent Krylov decompositions this vector is changed, as indicated below. In Line 4, we exploit the fact that

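$$\|(B\,A-\theta_{1}I)\,V_{\bar{\ell}}\,c_{1}\|=|h_{\bar{\ell}+1,\bar{\ell}}\,f_{\bar{\ell}}^{H}\,c_{1}|.$$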

As described in [20], the restart in Lines 7–8 is performed as follows. Suppose $H_{\bar{\ell}}=QSQ^{H}$ is the Schur decomposition with the most relevant Schur vectors (corresponding to the leftmost Ritz values in $S$ ) sorted in the beginning of $Q$ . Then the method is restarted with $V_{\underline{\ell}}:=V_{\bar{\ell}}\,Q_{1:\underline{\ell}}$ instead of $V_{\bar{\ell}}$ ; $S_{\underline{\ell}}$ instead of $H_{\bar{\ell}}$ ; and $Q_{1:\underline{\ell}}^{H}f_{\bar{\ell}}$ instead of $f_{\bar{\ell}}$ . We present numerical experiments with this method in Section 5.2.
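The residual test in Line 4 of Algorithm 1 can be verified numerically with a much simpler setting than the full method. The following Python sketch builds a plain Arnoldi decomposition of dimension 2 (no restarts, so $f_{\bar{\ell}}=e_{\bar{\ell}}$) using only actions with $A$ and $B$, and compares the true residual norm of the leftmost Ritz pair with the cheap estimate. The 3-by-3 matrices and all function names are hypothetical; this is not the Krylov–Schur implementation used in the experiments.

```python
import cmath
import math


def matvec(M, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def arnoldi(op, v1, ell):
    """Arnoldi decomposition op(V_ell) = V_{ell+1} H, with H of size (ell+1) x ell."""
    nrm = math.sqrt(sum(t * t for t in v1))
    V = [[t / nrm for t in v1]]
    H = [[0.0] * ell for _ in range(ell + 1)]
    for j in range(ell):
        w = op(V[j])
        for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
            H[i][j] = sum(a * b for a, b in zip(V[i], w))
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(sum(t * t for t in w))  # assumed nonzero (no breakdown)
        V.append([t / H[j + 1][j] for t in w])
    return V, H


def leftmost_ritz_pair_2x2(H):
    """Leftmost eigenpair (theta, c) of the leading 2 x 2 block of H, with ||c|| = 1."""
    h11, h12, h21, h22 = H[0][0], H[0][1], H[1][0], H[1][1]
    root = cmath.sqrt((h11 - h22) ** 2 + 4.0 * h12 * h21)
    theta = min([(h11 + h22 - root) / 2.0, (h11 + h22 + root) / 2.0],
                key=lambda z: z.real)
    c = [h12, theta - h11]  # eigenvector of the 2 x 2 block, valid when h12 != 0
    nc = math.sqrt(sum(abs(t) ** 2 for t in c))
    return theta, [t / nc for t in c]


# Hypothetical 3 x 3 pair; B is a perturbed transpose of A.
A = [[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
B = [[2.0, 0.1, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.9]]


def apply_BA(v):
    # Only actions with A and B are used; the product B*A is never formed.
    return matvec(B, matvec(A, v))


V, H = arnoldi(apply_BA, [1.0, 0.0, 0.0], ell=2)
theta, c = leftmost_ritz_pair_2x2(H)
ritz_vector = [V[0][k] * c[0] + V[1][k] * c[1] for k in range(3)]
residual = [y - theta * x for y, x in zip(apply_BA(ritz_vector), ritz_vector)]
residual_norm = math.sqrt(sum(abs(t) ** 2 for t in residual))
cheap_estimate = abs(H[2][1] * c[1])   # |h_{3,2} e_2^T c|, no extra products needed
```

The two quantities `residual_norm` and `cheap_estimate` agree up to rounding, which is why the stopping test costs no additional matrix-vector multiplications.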

An alternative approach that uses inexact shift-and-invert operators by carrying out inner iterations is Jacobi–Davidson [18]. This inexact inner-outer type of method may be worthwhile when the leftmost eigenvalue is not well separated from neighboring eigenvalues. In our applications this does not seem to be the case, and Jacobi–Davidson usually performs worse than Krylov–Schur. In addition, it is not obvious how to construct a preconditioner for a shifted version of $B\,A$, which would be very helpful for Jacobi–Davidson.

In our experiments with examples from computed tomography, we find that the matrix $B\,A$ is often close to normal (in fact, even close to symmetric; cf. Section 5.2). For such problems, another alternative approach to approximate the leftmost eigenvalue is the following. Let

$$W(C)=\{\,z^{H}C\,z\;:\;z\in\mathbb{C}^{n},\ \|z\|=1\,\}$$

be the field of values (or numerical range) of $C$ . Then we can expect the quantity

$$\nu(BA)=\min\mathrm{Re}\,W(BA)$$

to be close to the leftmost eigenvalue of $B\,A$ . This $\nu(BA)$ would in principle be relatively easy to approximate, since it is equal to $\frac{1}{2}\min\lambda(BA+(BA)^{T})$ , which results in computing an exterior eigenvalue of a symmetric eigenproblem. This would mean that we can use a symmetric version of Krylov–Schur, which saves roughly half of the reorthogonalization costs.

Unfortunately, in our applications we do not have the action with $A^{T}$ or $B^{T}$ , and the described approach is not an option. However, as an alternative, we can still approximate $\nu(BA)$ by

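$$\nu(H_{\bar{\ell}})=\min\mathrm{Re}\,W(H_{\bar{\ell}})=\tfrac{1}{2}\min\lambda(H_{\bar{\ell}}+H_{\bar{\ell}}^{H}),$$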

where $H_{\bar{\ell}}$ is the matrix in the Krylov decomposition obtained with $B\,A$ after several iterations. The computation of this quantity only requires the known $H_{\bar{\ell}}^{H}$, and therefore bypasses the need for the transposes of $A$ and $B$. Although there are usually no error bounds for this type of approximation, it may in practice be of very good quality. This algorithm is summarized below.

Algorithm 2: A field of values approximation for the leftmost eigenvalue

Input: Minimal and maximal subspace dimensions $\underline{\ell}<\bar{\ell}$, starting vector $v_{1}$, maximum iterations maxit, functions to perform actions with $A$ and $B$.

Output: $\theta$, the leftmost point of a projected field of values, which approximates the leftmost point of $W(BA)$.

1: Form the Krylov decomposition $B\,A\,V_{\bar{\ell}}=V_{\bar{\ell}}\,H_{\bar{\ell}}+h_{\bar{\ell}+1,\bar{\ell}}\,v_{\bar{\ell}+1}\,f_{\bar{\ell}}^{H}$

2: for $k=1,2,\dots,\texttt{maxit}$

3: Extract Schur pairs $(\theta_{j},c_{j})$ from $H_{\bar{\ell}}$ with $j=1,\dots,\bar{\ell}$, sorted on nondecreasing real part

4: Truncate decomposition to dimension $\underline{\ell}$ by selecting leftmost Schur vectors

5: Expand the Krylov decomposition to dimension $\bar{\ell}$

6: end

7: Accept $\theta=\min\mathrm{Re}\,W(H_{\bar{\ell}})$ = leftmost eigenvalue of $\frac{1}{2}(H_{\bar{\ell}}+H_{\bar{\ell}}^{H})$

Note that for a method such as Algorithm 2 there is typically no error estimate available; there is only a user-chosen parameter maxit, which can often be modest (see Section 5). A main advantage of Algorithm 2 over Algorithm 1 is that it may be possible to stop improving the Krylov decomposition earlier, before the eigenpair in Algorithm 1 has converged to the desired tolerance. We test both approaches in the next section.
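Line 7 of Algorithm 2 only needs the smallest eigenvalue of a small Hermitian matrix. The following Python sketch computes it for a real matrix by power iteration on a shifted symmetric part; the 2-by-2 matrix `H_small` is a hypothetical stand-in for $H_{\bar{\ell}}$, and the power iteration is a swapped-in technique chosen to keep the example dependency-free (any symmetric eigensolver would do).

```python
import math


def leftmost_field_of_values(H, iters=200):
    """Approximate min Re W(H), the smallest eigenvalue of (H + H^T)/2, for real H."""
    n = len(H)
    S = [[0.5 * (H[i][j] + H[j][i]) for j in range(n)] for i in range(n)]
    # Gershgorin bound: no eigenvalue of S exceeds 'shift' in absolute value.
    shift = max(sum(abs(s) for s in row) for row in S)
    # Power iteration on shift*I - S, whose dominant eigenvalue is shift - lambda_min(S).
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        w = [shift * v[i] - sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        nrm = math.sqrt(sum(t * t for t in w))
        v = [t / nrm for t in w]
    Sv = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(vi * svi for vi, svi in zip(v, Sv))  # Rayleigh quotient of S


H_small = [[2.0, 1.1], [0.9, -0.1]]   # hypothetical, nearly symmetric
nu = leftmost_field_of_values(H_small)

# Leftmost eigenvalue of H_small itself, from the 2 x 2 characteristic polynomial.
trace = H_small[0][0] + H_small[1][1]
det = H_small[0][0] * H_small[1][1] - H_small[0][1] * H_small[1][0]
lambda_leftmost = 0.5 * (trace - math.sqrt(trace * trace - 4.0 * det))
```

For this nearly symmetric example the symmetric part has eigenvalues $2.4$ and $-0.5$, so `nu` is $-0.5$, close to the leftmost eigenvalue of `H_small` (about $-0.497$). Since the eigenvalues of a matrix lie in its field of values, `nu` cannot exceed `lambda_leftmost`.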

5 Numerical Examples

We present numerical examples with two different test problems, in order to demonstrate the performance of our new algorithm. All computations are carried out in MATLAB.

5.1 Small Illustrative Test Problem

The first test problem is quite small, with $m=n=64$ , such that we can explicitly compute the desired eigenvalues and other quantities that allow us to analyze the algorithms’ performance related to the above theory. The matrix $A$ is full, and it is generated by means of the function regutm from Regularization Tools [8] by which we can generate random test matrices with specified singular values, while the singular vectors have the characteristic spectral behavior of inverse problems [6]. We generate a well-conditioned matrix $A_{\mathrm{well}}$ and a more ill-conditioned matrix $A_{\mathrm{ill}}$ with the following distribution of singular values:

[TABLE]

We then generate a corresponding unmatched transpose $\widehat{A}^{\,T}=A^{T}+E_{A^{T}}$ by adding random Gaussian elements to $A^{T}$ ; all elements are from $\mathcal{N}(0,\sigma_{A}^{2})$ where the variance is chosen such that $\|E_{A^{T}}\|\,/\,\|A\|=0.05$ . Both $A$ and $\widehat{A}^{\,T}$ have full rank.

The exact solution $\bar{x}$ is the one from the shaw test problem [8]; it is smooth with two humps. Then we generate the exact and noisy right-hand sides by

$$\bar{b}=A\,\bar{x},\qquad b=\bar{b}+e,\tag{21}$$

where the random elements of $e$ are Gaussian; all elements are from $\mathcal{N}(0,\sigma_{b}^{2})$ where the variance scales the noise as desired.
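The construction of the unmatched transpose and the noisy data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the random 4-by-4 matrix replaces the regutm test matrices, the vector `x_bar` replaces the shaw solution, the spectral norm is estimated by power iteration, and the relative noise level `rel_noise` is a hypothetical way of fixing $\sigma_{b}$.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def norm2(v):
    return math.sqrt(sum(t * t for t in v))


def spectral_norm(M, iters=500):
    """Estimate ||M||_2 by power iteration on M^T M."""
    Mt = transpose(M)
    v = [1.0 / (i + 1) for i in range(len(M[0]))]
    for _ in range(iters):
        w = matvec(Mt, matvec(M, v))
        nw = norm2(w)
        v = [t / nw for t in w]
    return norm2(matvec(M, v))


def unmatched_transpose(A, rel_size, rng):
    """Return B = A^T + E with Gaussian E scaled so that ||E|| / ||A|| = rel_size."""
    At = transpose(A)
    E = [[rng.gauss(0.0, 1.0) for _ in row] for row in At]
    scale = rel_size * spectral_norm(A) / spectral_norm(E)
    E = [[scale * t for t in row] for row in E]
    B = [[a + t for a, t in zip(row_a, row_e)] for row_a, row_e in zip(At, E)]
    return B, E


def noisy_rhs(A, x_bar, rel_noise, rng):
    """Return (b_bar, b) with b = b_bar + e and ||e|| / ||b_bar|| = rel_noise."""
    b_bar = matvec(A, x_bar)
    e = [rng.gauss(0.0, 1.0) for _ in b_bar]
    scale = rel_noise * norm2(b_bar) / norm2(e)
    return b_bar, [bb + scale * t for bb, t in zip(b_bar, e)]


rng = random.Random(0)   # fixed seed for reproducibility
A = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]
B, E = unmatched_transpose(A, rel_size=0.05, rng=rng)
b_bar, b = noisy_rhs(A, [1.0, 2.0, 2.0, 1.0], rel_noise=0.01, rng=rng)
```

Scaling a matrix of standard normal entries by a factor $s$ is equivalent to drawing the entries from $\mathcal{N}(0,s^{2})$, so the scaling step plays the role of choosing the variance.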

We applied both the BA Iteration and the Shifted BA Iteration with $B=\widehat{A}^{\,T}$ to these problems. For the BA Iteration we use the relaxation parameter $\omega=1.9\,/\,\|\widehat{A}^{\,T}A\|$ and for the Shifted BA Iteration we use $\omega$ equal to the upper bound in (13) with the factor 2 replaced by 1.9. For both systems we used eig to compute the eigenvalues; the spectral radius is $\rho(BA)=1.00$ by construction, and the leftmost eigenvalues are

[TABLE]

Note that for $A_{\mathrm{ill}}$ the leftmost eigenvalue is a complex conjugate pair with a negative real part. Hence we expect the BA Iteration to exhibit non-convergence for the problems with $A_{\mathrm{ill}}$ . To ensure convergence of the Shifted BA Iteration we choose

[TABLE]

The convergence histories for the noisy data ( $e\neq 0$ ) are shown in Figure 1; the similar plots for the noise-free data ( $e=0$ ) are almost identical. We make the following observations:

•

The left plot confirms that the BA Iteration converges only for the well-conditioned matrix for which all eigenvalues have a positive real part, and that it converges to $x^{*}=A_{\mathcal{N}(B)}^{\dagger}b=B\,(AB)^{-1}b$ .

•

The right plot confirms that the Shifted BA Iteration converges for both matrices, and that it converges to $x_{\alpha}^{*}=B\,(AB+\alpha I)^{-1}b$ . For the well-conditioned system the convergence is much faster compared to the ill-conditioned system. Also, the Shifted BA Iteration converges faster than the BA Iteration for the well-conditioned system.

Having confirmed the convergence of the methods, it is also relevant to study how well the methods are able to approximate the exact solution $\bar{x}$ . To illustrate this, Figure 2 shows plots of the reconstruction errors $\|x^{k}-\bar{x}\|\,/\,\|\bar{x}\|$ versus iteration number $k$ . We make several observations:

•

For the well-conditioned matrix $A_{\mathrm{well}}$ and noise-free data ( $e=0$ ) the BA Iteration converges to the exact solution $\bar{x}$ as predicted by Theorem 2.3 with square and full-rank $A$ and $B$ .

•

For the ill-conditioned matrix $A_{\mathrm{ill}}$ the BA Iteration always diverges.

•

For noise-free data ( $e=0$ ) the Shifted BA Iteration converges to a slightly perturbed solution $\bar{x}_{\alpha}^{\ast}$ with

[TABLE]

•

For noisy data ( $e\neq 0$ ) the BA Iteration for $A_{\mathrm{well}}$ , as well as the Shifted BA Iteration for both $A_{\mathrm{well}}$ and $A_{\mathrm{ill}}$ , converge to a fixed point that is quite far from the exact solution. Exactly the same behavior occurs for iterations that use a matching transpose. The main point is that for noisy data the iterations exhibit semi-convergence, where the iterates produce a good approximation to $\bar{x}$ during the initial iterations.

In conclusion, these experiments verify the benefit of using the Shifted BA Iteration, namely, guaranteed convergence while retaining the semi-convergence that all algebraic iterative methods rely on for noisy data.
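Semi-convergence can be reproduced on a very small example. Since the same behavior occurs with a matching transpose, the following Python sketch uses a diagonal matrix with $B=A^{T}$ and no shift; the singular values, exact solution, and noise vector are hypothetical numbers chosen so that the effect is visible.

```python
import math


def relative_errors(sigmas, x_bar, noise, omega, iters):
    """Errors ||x^k - x_bar|| / ||x_bar|| for A = diag(sigmas) and B = A^T."""
    b = [s * x + e for s, x, e in zip(sigmas, x_bar, noise)]   # noisy data
    x = [0.0] * len(sigmas)
    norm_x_bar = math.sqrt(sum(t * t for t in x_bar))
    errors = []
    for _ in range(iters + 1):
        diff = math.sqrt(sum((xi - ti) ** 2 for xi, ti in zip(x, x_bar)))
        errors.append(diff / norm_x_bar)
        # One step x <- x + omega * A^T (b - A x), written componentwise.
        x = [xi + omega * s * (bi - s * xi) for xi, s, bi in zip(x, sigmas, b)]
    return errors


errors = relative_errors(sigmas=[1.0, 0.1, 0.01],
                         x_bar=[1.0, 0.5, 0.1],
                         noise=[0.01, -0.01, 0.01],
                         omega=1.0, iters=20000)
k_best = min(range(len(errors)), key=errors.__getitem__)
```

The error first decreases, reaches its minimum at iteration `k_best`, and then increases again as the component belonging to the smallest singular value picks up the amplified noise. Stopping near `k_best` is what gives a useful reconstruction, whether or not the iteration later converges to a fixed point.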

5.2 Test Problem From the ASTRA Toolbox

The second test problem comes from X-ray computed tomography (CT) using a parallel-beam geometry with 90 projections in the angular range $0^{\circ}$–$180^{\circ}$, and a detector with 80 pixels whose length is equal to the image size. The exact solution $\bar{x}$ represents a $128\times 128$ discretization of the Shepp–Logan phantom. Hence the problem size is $m=7200$ and $n=16384$, and the problem is underdetermined. For such underdetermined CT problems the algebraic iterative methods can give much better results than the “standard” methods based on filtered backprojection [17]. As before, the exact and noisy data are generated according to (21).

The matrices $A$ and $\widehat{A}^{\,T}$ that represent the forward and backprojections come from the ASTRA toolbox [22] used in conjunction with “spot operators” [23]. Specifically, we use the ASTRA function opTomo to compute the matrix-vector multiplications with these matrices. For the forward projection the GPU-version of ASTRA uses the interpolation model, also known as Joseph’s method [12], while the backprojection uses the line model with linear interpolation between detector pixels (as done, e.g., in MATLAB’s iradon). The matrices are not stored; if we store them then the sparsity of $A$ and $\widehat{A}^{\,T}$ is 1.32% and 2.35%, respectively. Measures of nonsymmetry and nonnormality of $BA$ are

[TABLE]

The spectral radius is $\rho(BA)\approx 1.76\cdot 10^{4}$ .

For this test problem we use the algorithms from Section 4 to estimate the leftmost eigenvalue of $BA$ , and we compare the performance of the following strategies:

•

eigs from MATLAB with options maxit = 1500, tol = $10^{-2}$ , SIGMA = ’sr’;

•

ks is the Krylov–Schur method (Algorithm 1) with options maxit = 1500, absolute tolerance tol = $10^{-2}$ , mindim = 30, maxdim = 60, target = ’-inf’ (for the leftmost eigenvalue);

•

jd is the Jacobi–Davidson method with options maxit = 1500, tol = $10^{-2}$ , mindim = 30, maxdim = 60, target = ’-inf’;

•

fovN is the field of values based method (Algorithm 2) with options maxit = N, mindim = 30, maxdim = 60.

Table 1 shows results for two cases:

1. All matrix-vector multiplications are performed on the GPU, using the ASTRA function opTomo. These multiplications are performed in single precision. The Jacobi–Davidson method jd did not converge, and this may be due to the single-precision computations on the GPU.

2. The two matrices $A$ and $\widehat{A}^{\,T}$ are explicitly computed and stored as sparse matrices. Like all the other computations, these computations are performed on the CPU in double precision.

We see that for these CT problems the Krylov–Schur method requires the least computational work, corresponding to about 500 iterations of the Shifted BA Iteration. When solving several CT problems with the same geometry, and hence the same matrices, this is acceptable. Even when an unmatched pair is used only once, this may be an acceptable overhead to ensure convergence and trust in the computed solution.

Figure 3 reports the work involved in the eigenvalue estimation with the Krylov–Schur method, as measured by matrix-vector multiplications (MVMs), for different numbers of image pixels $n=64^{2},128^{2},256^{2},512^{2},1024^{2},2048^{2}$. The number of rows is $m\approx 0.45\,n$. We see that for larger problems the number of MVMs seems to be proportional to $n^{1/3}$.
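To get a feeling for what growth proportional to $n^{1/3}$ means, the following Python sketch computes the predicted growth factor in MVMs between two image sizes. It simply assumes that the observed trend holds exactly between the two sizes, which is an extrapolation and not a result from the paper; the function name is hypothetical.

```python
def predicted_mvm_growth(n_old, n_new, exponent=1.0 / 3.0):
    """Growth factor in MVMs if the count scales like n**exponent."""
    return (n_new / n_old) ** exponent


# Going from a 512 x 512 image to a 2048 x 2048 image multiplies n by 16.
growth = predicted_mvm_growth(512 ** 2, 2048 ** 2)
```

Under this assumption a 16-fold increase in the number of pixels increases the number of MVMs by a factor of about 2.5, although each MVM itself becomes more expensive.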

Figure 4 shows the convergence histories when applying the two iterative algorithms to this problem, with $\alpha$ and $\omega$ chosen as before. We observe the same behavior as before: The BA Iteration diverges, while the Shifted BA Iteration converges to a fixed point. Moreover, the Shifted BA Iteration exhibits semi-convergence as expected, and the minimum reconstruction error — at the point of semi-convergence — does not deteriorate when using the shifted method.

6 Conclusions

We have considered algebraic iterative reconstruction methods with an unmatched backprojector, i.e., the backprojector is not the exact adjoint or transpose of the forward projector. In particular we are concerned with the common situation where the iterative method does not converge, due to the nonsymmetry of the iteration matrix. We propose a modified algorithm that uses a small shift parameter, we define the conditions that guarantee convergence to a fixed point of a slightly perturbed problem, and we give perturbation bounds for this fixed point. We also discuss how to efficiently estimate the leftmost eigenvalue of a certain matrix, which is needed to compute the shift parameter in the modified algorithm. Numerical experiments with artificial test problems as well as a test problem from computed tomography illustrate the use of the new algorithm. Our MATLAB code is available on request.

Acknowledgements

We thank Willem Jan Palenstijn (CWI) for sharing his insight into the ASTRA package. We acknowledge the inspiration from Tommy Elfving and Martin S. Andersen who, independently, suggested the shift as a remedy for the nonconvergence. Finally, we thank two anonymous referees for many valuable comments that helped to improve the paper.

Bibliography


[1] A. Biguri, M. Dosanjh, S. Hancock, and M. Soleimani, TIGRE: a MATLAB-GPU toolbox for CBCT image reconstruction, Biomed. Phys. Eng. Express, 2 (2016), 055010; the software is available from https://github.com/CERN/TIGRE.
[2] T. Elfving and P. C. Hansen, Unmatched projector/backprojector pairs: perturbation and convergence analysis, SIAM J. Sci. Comput., 40 (2018), pp. A573–A591.
[3] T. Elfving, P. C. Hansen, and T. Nikazad, Semiconvergence and relaxation parameters for projected SIRT algorithms, SIAM J. Sci. Comput., 34 (2012), pp. A2000–A2017.
[4] W. Hackbusch, Iterative Solution of Large Sparse Systems of Equations, Springer, New York, 2016.
[5] P. C. Hansen, The discrete Picard condition for discrete ill-posed problems, BIT, 30 (1990), pp. 658–672.
[6] P. C. Hansen, Test matrices for regularization methods, SIAM J. Sci. Comput., 16 (1995), pp. 506–512.
[7] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems, SIAM, Philadelphia, 1998.
[8] P. C. Hansen, Regularization Tools Version 4.0 for Matlab 7.3, Numer. Algorithms, 46 (2007), pp. 189–194.
