Analysis of the Block Coordinate Descent Method for Linear Ill-Posed Problems

Simon Rabanser; Lukas Neumann; Markus Haltmeier

arXiv:1902.04794·math.NA·July 29, 2019·SIAM J. Imaging Sci.

TL;DR

This paper analyzes the convergence of block coordinate descent (BCD) methods for linear inverse problems, demonstrating that under certain conditions, BCD with proper stopping criteria acts as a regularization method, supported by numerical experiments.

Contribution

The paper provides the first convergence analysis of BCD for inverse problems and shows it can serve as a regularization method under specific tensor product operator conditions.

Findings

1. BCD with stopping criteria converges for tensor product operators

2. Numerical experiments compare BCD and full gradient descent

3. Tests include linear and non-linear inverse problems

Abstract

Block coordinate descent (BCD) methods approach optimization problems by performing gradient steps along alternating subgroups of coordinates. This is in contrast to full gradient descent, where a gradient step updates all coordinates simultaneously. BCD has been demonstrated to accelerate the gradient method in many practical large-scale applications. Despite its success, no convergence analysis for inverse problems is known so far. In this paper, we investigate the BCD method for solving linear inverse problems. As the main theoretical result, we show that for operators having a particular tensor product form, the BCD method combined with an appropriate stopping criterion yields a convergent regularization method. To illustrate the theory, we perform numerical experiments comparing the BCD and the full gradient descent method for a system of integral equations. We also present numerical…

Equations

$$y^{\delta}=\mathcal{A}(x[1],\dots,x[B])+z$$

$$x_{k+1}^{\delta}:=x_{k}^{\delta}-s_{k}^{\delta}\mathcal{A}^{*}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr),$$

$$\begin{pmatrix}x_{k+1}^{\delta}[1]\\ x_{k+1}^{\delta}[2]\\ \vdots\\ x_{k+1}^{\delta}[B]\end{pmatrix}=\begin{pmatrix}x_{k}^{\delta}[1]\\ x_{k}^{\delta}[2]\\ \vdots\\ x_{k}^{\delta}[B]\end{pmatrix}-s_{k}^{\delta}\begin{pmatrix}\mathcal{A}_{1}^{*}\\ \mathcal{A}_{2}^{*}\\ \vdots\\ \mathcal{A}_{B}^{*}\end{pmatrix}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr).$$

$$x_{k+1}^{\delta}[b]:=x_{k}^{\delta}[b]-s_{k}^{\delta}\begin{cases}\mathcal{A}_{b}^{*}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr)&\text{if }b=b(k)\\ 0&\text{otherwise},\end{cases}$$

$$\mathcal{P}_{b}=(e_{b}e_{b}^{\mathsf{T}})\otimes\operatorname{Id}_{X}\colon X^{B}\to X^{B}\colon\begin{pmatrix}x[1]\\ \vdots\\ x[b]\\ \vdots\\ x[B]\end{pmatrix}\mapsto\begin{pmatrix}0\\ \vdots\\ x[b]\\ \vdots\\ 0\end{pmatrix},$$

$$x_{k+1}^{\delta}:=x_{k}^{\delta}-s_{k}^{\delta}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr).$$

$$\mathcal{K}_{B}:=\operatorname{Id}_{\mathbb{R}^{B}}\otimes K\colon X^{B}\to Y^{B}\colon\begin{pmatrix}x[1]\\ \vdots\\ x[B]\end{pmatrix}\mapsto\begin{pmatrix}K(x[1])\\ \vdots\\ K(x[B])\end{pmatrix}$$

$$\mathcal{V}_{Y}:=V\otimes\operatorname{Id}_{Y}\colon Y^{B}\to Y^{D}\colon y\mapsto\sum_{b=1}^{B}v_{b}\,y[b].$$

$$\mathcal{V}_{X}x_{k+1}^{\delta}=\mathcal{V}_{X}x_{k}^{\delta}-s_{k}^{\delta}\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr)=\mathcal{V}_{X}x_{k}^{\delta}-s_{k}^{\delta}\|v_{b(k)}\|^{2}\mathcal{Q}^{X}_{b(k)}\mathcal{K}_{D}^{*}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr).$$

$$\mathcal{Q}^{X}_{b}:=\frac{1}{\|v_{b}\|^{2}}(v_{b}v_{b}^{\mathsf{T}})\otimes\operatorname{Id}_{X}\colon X^{D}\to X^{D}.$$

$$r_{k}^{\delta}:=\bigl\|\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|.$$

$$\frac{1}{2}\|\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}-\frac{1}{2}\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}\leq-s_{k}^{\delta}r_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)+\frac{(s_{k}^{\delta})^{2}}{2}\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}.$$

$$0\leq s_{k}^{\delta}\leq\frac{2r_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)}{\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}},$$

$$\frac{1}{2}\|\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}-\frac{1}{2}\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}\leq\bigl\langle\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*},\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x_{k}^{\delta}\bigr\rangle+\frac{(s_{k}^{\delta})^{2}}{2}\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}.$$

$$\begin{aligned}&\bigl\langle\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*},\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x_{k}^{\delta}\bigr\rangle\\ &\quad=s_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl\langle\mathcal{V}_{X}(x_{k}^{\delta}-x^{*}),\mathcal{Q}^{X}_{b(k)}\mathcal{K}_{D}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\rangle\\ &\quad=s_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl\langle\mathcal{K}_{D}\mathcal{V}_{X}(x_{k}^{\delta}-x^{*}),\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\rangle\\ &\quad=s_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl\langle\mathcal{A}(x_{k}^{\delta})-\mathcal{A}(x^{*}),\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\rangle\\ &\quad=s_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl\langle\mathcal{A}(x_{k}^{\delta})-y^{\delta}+y^{\delta}-\mathcal{A}(x^{*}),\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\rangle\\ &\quad\leq s_{k}^{\delta}\|v_{b(k)}\|^{2}\Bigl(-\bigl\|\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}+\delta_{b(k)}\bigl\|\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|\Bigr)\end{aligned}$$

$$\frac{1}{2}\|\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}-\frac{1}{2}\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}\leq s_{k}^{\delta}r_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl(\delta_{b(k)}-r_{k}^{\delta}\bigr)+\frac{(s_{k}^{\delta})^{2}}{2}\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2},$$

$$x_{k+1}^{\delta}$$

$$d_{k}^{\delta}$$

$$\tau>1.$$

$$\sum_{k\in\mathbb{N}}d_{k}^{\delta}s_{k}^{\delta}\|v_{b(k)}\|^{2}(r_{k}^{\delta})^{2}\leq\frac{\|\mathcal{V}_{X}x_{0}-\mathcal{V}_{X}x^{*}\|^{2}}{\gamma_{\rm min}(2-\theta_{\rm max})},$$

$$\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}-\|\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}\geq(2-\theta_{\rm max})\,d_{k}^{\delta}s_{k}^{\delta}\|v_{b(k)}\|^{2}(r_{k}^{\delta})^{2}(1-1/\tau).$$

$$\begin{aligned}&\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}-\|\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}\\ &\quad\geq 2s_{k}^{\delta}\|v_{b(k)}\|^{2}r_{k}^{\delta}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)-(s_{k}^{\delta})^{2}\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}\\ &\quad\geq 2s_{k}^{\delta}\|v_{b(k)}\|^{2}r_{k}^{\delta}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)-s_{k}^{\delta}\theta_{\rm max}A_{k}^{\delta}\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}\\ &\quad=2s_{k}^{\delta}\|v_{b(k)}\|^{2}r_{k}^{\delta}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)-s_{k}^{\delta}\theta_{\rm max}\|v_{b(k)}\|^{2}r_{k}^{\delta}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)\\ &\quad=(2-\theta_{\rm max})\,s_{k}^{\delta}\|v_{b(k)}\|^{2}r_{k}^{\delta}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)\\ &\quad\geq(2-\theta_{\rm max})\,s_{k}^{\delta}\|v_{b(k)}\|^{2}(r_{k}^{\delta})^{2}(1-1/\tau).\end{aligned}$$

$$\|\mathcal{V}_{X}x_{0}-\mathcal{V}_{X}x^{*}\|^{2}\geq(2-\theta_{\rm max})\gamma_{\rm min}\sum_{k\in\mathbb{N}}d_{k}^{\delta}s_{k}^{\delta}\|v_{b(k)}\|^{2}(r_{k}^{\delta})^{2},$$

$$A_{k}^{\delta}=\frac{\|v_{b(k)}\|^{2}r_{k}^{\delta}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)}{\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{V}_{X}^{*}\mathcal{K}_{D}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}}\geq\Bigl(1-\frac{1}{\tau}\Bigr)\frac{\bigl\|\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}}{\bigl\|\mathcal{Q}^{X}_{b(k)}\mathcal{K}_{D}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}}\geq\gamma_{\rm min}\frac{\bigl\|\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}}{\bigl\|\mathcal{K}_{B}^{*}\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}}\geq\frac{\gamma_{\rm min}}{\|\mathcal{K}_{B}^{*}\|^{2}}.$$

$$\sum_{i_{1}=0}^{p-1}\bigl\|\mathcal{Q}^{Y}_{b(pn_{0}+i_{1})}\bigl(y-\mathcal{A}(x_{pn_{0}+i_{1}})\bigr)\bigr\|\leq\sum_{i_{1}=0}^{p-1}\bigl\|\mathcal{Q}^{Y}_{b(pi_{0}+i_{1})}\bigl(y-\mathcal{A}(x_{pi_{0}+i_{1}})\bigr)\bigr\|\quad\text{for all }i_{0}\in\{k_{0},\dots,l_{0}\}.$$

$$\|\xi_{k}-\xi_{l}\|\leq\|\xi_{k}-\xi_{n}\|+\|\xi_{l}-\xi_{n}\|$$

Full text

Analysis of the Block Coordinate Descent Method for Linear Ill-Posed Problems

Simon Rabanser

Department of Mathematics, University of Innsbruck

Technikerstrasse 13, 6020 Innsbruck, Austria

Lukas Neumann

Institute of Basic Sciences in Engineering Science, University of Innsbruck

Technikerstrasse 13, 6020 Innsbruck, Austria

Markus Haltmeier

Department of Mathematics, University of Innsbruck

Technikerstrasse 13, 6020 Innsbruck, Austria

Abstract

Block coordinate descent (BCD) methods approach optimization problems by performing gradient steps along alternating subgroups of coordinates. This is in contrast to full gradient descent, where a gradient step updates all coordinates simultaneously. BCD has been demonstrated to accelerate the gradient method in many practical large-scale applications. Despite its success, no convergence analysis for inverse problems is known so far. In this paper, we investigate the BCD method for solving linear inverse problems. As the main theoretical result, we show that for operators having a particular tensor product form, the BCD method combined with an appropriate stopping criterion yields a convergent regularization method. To illustrate the theory, we perform numerical experiments comparing the BCD and the full gradient descent method for a system of integral equations. We also present numerical tests for a non-linear inverse problem not covered by our theory, namely one-step inversion in multi-spectral X-ray tomography.

Keywords: ill-posed problems, convergence analysis, regularization theory, coordinate descent, multi-spectral CT

MSC2010: 65J20, 44A12, 47J06.

1 Introduction

We consider the solution of inverse problems of the form

$$y^{\delta}=\mathcal{A}(x[1],\dots,x[B])+z \tag{1.1}$$

by block coordinate gradient descent (BCD) methods. Here $\mathcal{A}\colon\mathcal{X}\to\mathcal{Y}$ is a linear forward operator between Hilbert spaces $\mathcal{X}=X_{1}\times\cdots\times X_{B}$ and $\mathcal{Y}$ . Moreover, $x=(x[1],\dots,x[B])\in\mathcal{X}$ is the vector of blocks $x[b]\in X_{b}$ of unknown variables, $y^{\delta}\in\mathcal{Y}$ are the given noisy data, and $z$ denotes the data perturbation that satisfies $\left\|z\right\|\leq\delta$ for some noise level $\delta\geq 0$ .

For many inverse problems, the individual blocks $x[b]$ arise in a natural manner and might correspond to $x[b]=f[b]$, where $f[b]\colon\Omega_{b}\to\mathbb{R}$ are functions modeling unknown spatially varying parameter distributions. The blocks might also be formed by applying domain decomposition $\Omega=\Omega_{1}\cup\Omega_{2}\cup\ldots\cup\Omega_{B}$ to a single function $f\colon\Omega\to\mathbb{R}$, and defining $x[b]=f|_{\Omega_{b}}$ as the restriction of $f$ to $\Omega_{b}$.

1.1 Iterative regularization methods

The characteristic feature of inverse problems is their ill-posedness, which means that the solution of (1.1) is unstable with respect to data perturbations. In such a situation, one has to apply regularization methods to obtain solutions in a stable way. There are at least two basic classes of regularization methods: iterative regularization and variational regularization [6, 23]. In this paper we consider iterative regularization, and we introduce and analyze BCD as a new member of this class of regularization methods.

The most established iterative regularization approaches for inverse problems are the Landweber iteration and its variants [11, 14, 9, 19]

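$$x_{k+1}^{\delta}:=x_{k}^{\delta}-s_{k}^{\delta}\mathcal{A}^{*}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr), \tag{1.2}$$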

where $x_{0}^{\delta}:=x_{0}\in\mathcal{X}$ is an initial guess, $s^{\delta}_{k}$ is the step size and $\mathcal{A}^{*}$ denotes the adjoint of $\mathcal{A}$ . If the step size is taken constant, then (1.2) is the Landweber iteration [14, 9]. Other step size rules yield the steepest descent and the minimal error method [20] or a more recent variant analyzed in [19]. Kaczmarz type variants of (1.2) for systems of ill-posed equations have been analyzed in [5, 8, 7, 13, 15, 16]. Kaczmarz methods make use of a product structure of the image space $\mathcal{Y}$ , and are in this sense dual to BCD methods which exploit the product structure of the pre-image space $\mathcal{X}$ .

We consider the product form $\mathcal{X}=X_{1}\times\cdots\times X_{B}$ , where the forward operator can be written as $\mathcal{A}=[\mathcal{A}_{1},\dots,\mathcal{A}_{B}]$ . As a consequence, the Landweber iteration takes the form

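$$\begin{pmatrix}x_{k+1}^{\delta}[1]\\ x_{k+1}^{\delta}[2]\\ \vdots\\ x_{k+1}^{\delta}[B]\end{pmatrix}=\begin{pmatrix}x_{k}^{\delta}[1]\\ x_{k}^{\delta}[2]\\ \vdots\\ x_{k}^{\delta}[B]\end{pmatrix}-s_{k}^{\delta}\begin{pmatrix}\mathcal{A}_{1}^{*}\\ \mathcal{A}_{2}^{*}\\ \vdots\\ \mathcal{A}_{B}^{*}\end{pmatrix}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr). \tag{1.3}$$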

We see that each iterative update requires computing $B$ separate updates, one for each of the blocks.

1.2 Block coordinate descent (BCD)

In order to simplify the iterative update in (1.3), a natural idea is to update only a single block in each iteration. This results in the BCD iteration

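$$x_{k+1}^{\delta}[b]:=x_{k}^{\delta}[b]-s_{k}^{\delta}\begin{cases}\mathcal{A}_{b}^{*}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr)&\text{if }b=b(k)\\ 0&\text{otherwise},\end{cases} \tag{1.4}$$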

where the control $b(k)\in\left\{1,\dots,B\right\}$ selects the block that is updated in the $k$th iteration. If we apply the BCD iteration to exact data where $\delta=0$, we write $x_{k}$ instead of $x_{k}^{\delta}$. Rigorously studying the iteration (1.4) in the context of ill-posed problems is the main aim of this paper. To guarantee convergence in the noisy case we will slightly modify the update rule of the BCD iteration by including a loping strategy which skips the $k$th iterative step if a certain residual term is sufficiently small (see Definition 2.4). Under the reasonable assumption that the complexity of evaluating $\mathcal{A}$ is essentially $B$ times the complexity $M$ of evaluating $\mathcal{A}^{*}_{b}$, one step of the Landweber method has complexity $\mathcal{O}(2BM)$, whereas one step of the BCD method has complexity $\mathcal{O}((B+1)M)$. For the special form of $\mathcal{A}$ considered in the following section, the complexity of one step of the BCD method even reduces to $\mathcal{O}(2M)$; see Remark 2.2.

Note that the iteration (1.4) arises by applying the block gradient descent method, well known in optimization [3, 18, 22, 24], to the residual functional $\frac{1}{2}\|y^{\delta}-\mathcal{A}(x)\|^{2}$. In a finite dimensional setting, BCD and other coordinate descent type methods are well studied. However, existing convergence results mostly analyze convergence in the objective value. This implies convergence in pre-image space only if the residual functional is strongly convex. Strong convexity does not hold for ill-posed problems. Therefore, existing convergence results and methods cannot be applied to ill-posed inverse problems. Note that removing the strict convexity assumption can also be achieved by coupling the BCD method with a proximal term; see [4] and the references therein.
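As a small illustration of the update (1.4), the following sketch performs BCD steps for a finite dimensional toy problem and records the residual functional. The matrices, the step size and the helper names (`apply_A`, `bcd_step`, `objective`) are hypothetical choices made for this example only; blocks are indexed from 0 in the code.

```python
def matvec(M, x):
    """Dense matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def apply_A(A_blocks, x_blocks):
    """A(x) = sum_b A_b x[b] for A = [A_1, ..., A_B]."""
    out = [0.0] * len(A_blocks[0])
    for A_b, x_b in zip(A_blocks, x_blocks):
        out = [o + v for o, v in zip(out, matvec(A_b, x_b))]
    return out


def objective(A_blocks, x_blocks, y):
    """Residual functional 0.5 * ||y - A(x)||^2."""
    Ax = apply_A(A_blocks, x_blocks)
    return 0.5 * sum((yi - ai) ** 2 for yi, ai in zip(y, Ax))


def bcd_step(A_blocks, x_blocks, y, b, s):
    """One step of (1.4): gradient update of block b, other blocks unchanged."""
    residual = [ai - yi for ai, yi in zip(apply_A(A_blocks, x_blocks), y)]
    grad_b = matvec(transpose(A_blocks[b]), residual)  # A_b^* (A(x) - y)
    new_blocks = [list(x_b) for x_b in x_blocks]
    new_blocks[b] = [xi - s * gi for xi, gi in zip(x_blocks[b], grad_b)]
    return new_blocks


# Toy data (hypothetical): A_1 is 3x2, A_2 is 3x1, exact data y = A(x_true).
A_blocks = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[2.0], [0.0], [1.0]]]
x_true = [[1.0, -1.0], [0.5]]
y = apply_A(A_blocks, x_true)

s = 1.0 / 9.0  # 1 / (sum of squared entries of A), a conservative step size
x = [[0.0, 0.0], [0.0]]
values = [objective(A_blocks, x, y)]
for k in range(20):
    x = bcd_step(A_blocks, x, y, k % 2, s)  # cyclic control
    values.append(objective(A_blocks, x, y))
```

For this step size the recorded objective values do not increase, which is the kind of statement covered by the classical optimization theory. It says nothing by itself about the error $\|x_{k}-x^{*}\|$, which is the quantity of interest in the ill-posed setting.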

To the best of our knowledge, no convergence result for (1.4) in the ill-posed setting is available. As the main contribution in this paper we will present a convergence analysis of BCD applicable to the ill-posed case. We show that under assumptions specified in Section 2, for operators having a particular tensor product form, the BCD iteration yields a regularization method for solving ill-posed linear problems.

1.3 Outline

This paper is organized as follows. In Section 2 we present the main assumptions made in this paper, derive auxiliary results and introduce the loping strategy. In Section 3 we present the convergence analysis. In the exact data case, we show that the BCD iteration converges to a solution $x^{*}$ of the given equation as $k\to\infty$. In the noisy data case we show that the stopping index of the loping BCD iteration is finite and the corresponding iterates converge to $x^{*}$ as $\delta\to 0$. To illustrate the theory, in Section 4 we compare the BCD method with the gradient method for a system of integral equations. Additionally, in Section 5 we consider a non-linear example not covered by our theory, namely one-step inversion in multi-spectral X-ray tomography [21, 12, 1, 2]. The paper concludes with a short discussion presented in Section 6.

2 Preliminaries

In this section we formulate the main assumptions and derive basic results that we will use in the convergence analysis presented in Section 3.

Note that for any Hilbert space $X$ we can write $X^{B}\simeq\mathbb{R}^{B}\otimes X$ . For any $b\in\left\{1,\dots,B\right\}$ we define the projection operators

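$$\mathcal{P}_{b}=(e_{b}e_{b}^{\mathsf{T}})\otimes\operatorname{Id}_{X}\colon X^{B}\to X^{B}\colon\begin{pmatrix}x[1]\\ \vdots\\ x[b]\\ \vdots\\ x[B]\end{pmatrix}\mapsto\begin{pmatrix}0\\ \vdots\\ x[b]\\ \vdots\\ 0\end{pmatrix}, \tag{2.1}$$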

where $e_{b}$ denotes the $b$ th standard basis vector in $\mathbb{R}^{B}$ , defined by $e_{b}[b]=1$ and $e_{b}[b^{\prime}]=0$ for $b^{\prime}\neq b$ . Using (2.1), the BCD method (1.4) can be written in the compact form

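$$x_{k+1}^{\delta}:=x_{k}^{\delta}-s_{k}^{\delta}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr). \tag{2.2}$$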

Here $b(k)\in\{1,\dots,B\}$ is the selected block at the $k$ th iteration, $s^{\delta}_{k}>0$ is the step size, and $x_{0}^{\delta}:=x_{0}\in\mathcal{X}$ is some initial guess. Recall that in the case of exact data we write $x_{k}$ instead of $x_{k}^{\delta}$ .

2.1 Main assumptions

We note that the main difficulty we encountered in the convergence analysis of the BCD method for ill-posed problems is that even for exact data $y=\mathcal{A}(x^{*})$ , the error $\left\|x_{k}-x^{*}\right\|$ is not monotonically decreasing, except for some very special cases. This can be easily verified for linear operators in $\mathbb{R}^{B}$ . On the other hand, the BCD is monotonically decreasing in the objective value, which is used in existing convergence theory for optimization problems [3, 18, 22, 24]. However, this cannot be used directly for the convergence analysis in the ill-posed setting where the value of the residual functional gives no bounds for the error $\left\|x_{k}-x^{*}\right\|$ .

We present a complete convergence analysis under the following assumption, which allows us to separate the difficulties due to the ill-posedness from those due to the non-monotonicity.

Assumption 2.1 (Main conditions for the convergence analysis).

(A1) $\mathcal{X}$, $\mathcal{Y}$ are Hilbert spaces of the form $\mathcal{X}=X^{B}$, $\mathcal{Y}=Y^{D}$ with $D,B\in\mathbb{N}$.

(A2) $\mathcal{A}\colon\mathcal{X}\to\mathcal{Y}$ has the form $\mathcal{A}=V\otimes K$, where

$\blacksquare$ $K\colon X\to Y$ is bounded linear;

$\blacksquare$ $V\in\mathbb{R}^{D\times B}$ has rank $B$ and non-vanishing columns $v_{b}\in\mathbb{R}^{D}$.

(A3) The control $b\colon\mathbb{N}\to\left\{1,\dots,B\right\}$ satisfies $\exists p\in\mathbb{N}\;\forall k\in\mathbb{N}\colon\left\{b(k),\dots,b(k+p-1)\right\}=\left\{1,\dots,B\right\}$.
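Condition (A3) says that every window of $p$ consecutive iterations visits all blocks. The following sketch checks this on a finite range of indices; the function names `covers_all_blocks` and `cyclic_control` are hypothetical, and a finite check of this kind can only refute (A3), not prove it for all $k$.

```python
def covers_all_blocks(control, B, p, horizon):
    """Check {b(k), ..., b(k+p-1)} == {1, ..., B} for k = 1, ..., horizon."""
    full = set(range(1, B + 1))
    return all(
        {control(k + i) for i in range(p)} == full
        for k in range(1, horizon + 1)
    )


def cyclic_control(B):
    """Cyclic control b(k) = ((k - 1) mod B) + 1."""
    return lambda k: ((k - 1) % B) + 1


# A non-cyclic control (hypothetical example) that repeats the pattern 1, 2, 1, 3.
pattern = [1, 2, 1, 3]
patterned_control = lambda k: pattern[(k - 1) % len(pattern)]

cyclic_ok = covers_all_blocks(cyclic_control(3), B=3, p=3, horizon=100)
pattern_p3 = covers_all_blocks(patterned_control, B=3, p=3, horizon=100)
pattern_p4 = covers_all_blocks(patterned_control, B=3, p=4, horizon=100)
```

The cyclic control satisfies the condition with $p=B$. The patterned control fails for $p=3$ (the window $1,2,1$ misses block 3) but satisfies it for $p=4$, which shows that (A3) allows some blocks to be visited more often than others.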

Let us introduce the operators

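$$\mathcal{K}_{B}:=\operatorname{Id}_{\mathbb{R}^{B}}\otimes K\colon X^{B}\to Y^{B}\colon\begin{pmatrix}x[1]\\ \vdots\\ x[B]\end{pmatrix}\mapsto\begin{pmatrix}K(x[1])\\ \vdots\\ K(x[B])\end{pmatrix},$$

$$\mathcal{V}_{Y}:=V\otimes\operatorname{Id}_{Y}\colon Y^{B}\to Y^{D}\colon y\mapsto\sum_{b=1}^{B}v_{b}\,y[b].$$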

In a similar manner we denote $\mathcal{K}_{D}:=\operatorname{Id}_{\mathbb{R}^{D}}\otimes\,K$ and $\mathcal{V}_{X}:=V\otimes\operatorname{Id}_{X}$ . Then we have $\mathcal{A}=\mathcal{V}_{Y}\circ\mathcal{K}_{B}=\mathcal{K}_{D}\circ\mathcal{V}_{X}$ .
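The two factorizations of $\mathcal{A}=V\otimes K$ can be checked on a small finite dimensional example. In the sketch below, `apply_blockwise` plays the role of $\operatorname{Id}\otimes K$ and `apply_mixing` the role of $V\otimes\operatorname{Id}$; these names and the integer test data are hypothetical and only serve the illustration.

```python
def matvec(M, x):
    """Dense matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def apply_blockwise(K, blocks):
    """(Id ⊗ K): apply K to every block."""
    return [matvec(K, blk) for blk in blocks]


def apply_mixing(V, blocks):
    """(V ⊗ Id): output block d equals sum_b V[d][b] * blocks[b]."""
    n = len(blocks[0])
    return [
        [sum(V[d][b] * blocks[b][i] for b in range(len(blocks))) for i in range(n)]
        for d in range(len(V))
    ]


K = [[1, 2], [0, 1], [3, 0]]   # K : R^2 -> R^3
V = [[1, 0], [1, 1], [0, 2]]   # V in R^{D x B} with D = 3, B = 2, rank 2
x = [[1, -1], [2, 3]]          # x in X^B with X = R^2

via_KB_then_VY = apply_mixing(V, apply_blockwise(K, x))  # V_Y(K_B(x))
via_VX_then_KD = apply_blockwise(K, apply_mixing(V, x))  # K_D(V_X(x))
```

Both orders give the same element of $Y^{D}$, here `[[-1, -1, 3], [7, 2, 9], [16, 6, 12]]`. The first order applies $K$ to $B$ blocks and the second to $D$ blocks.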

To overcome the above-mentioned obstacles in the convergence analysis we will study the auxiliary sequence $(\mathcal{V}_{X}x_{k}^{\delta})_{k\in\mathbb{N}}$ which, by linearity, satisfies

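$$\mathcal{V}_{X}x_{k+1}^{\delta}=\mathcal{V}_{X}x_{k}^{\delta}-s_{k}^{\delta}\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr)=\mathcal{V}_{X}x_{k}^{\delta}-s_{k}^{\delta}\|v_{b(k)}\|^{2}\mathcal{Q}^{X}_{b(k)}\mathcal{K}_{D}^{*}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr). \tag{2.3}$$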

Here we have set

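$$\mathcal{Q}^{X}_{b}:=\frac{1}{\|v_{b}\|^{2}}(v_{b}v_{b}^{\mathsf{T}})\otimes\operatorname{Id}_{X}\colon X^{D}\to X^{D}. \tag{2.4}$$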

We will also use the notation $\mathcal{Q}^{Y}_{b}:=\|v_{b}\|^{-2}(v_{b}v_{b}^{\mathsf{T}})\otimes\operatorname{Id}_{Y}$ . As an important auxiliary result we will show monotonicity for $(\mathcal{V}_{X}x_{k}^{\delta})_{k\in\mathbb{N}}$ . This allows us to show that the BCD method combined with a loping strategy is a convergent regularization method. In fact, this is the reason for requiring the forward operator $\mathcal{A}$ to have the particular tensor product form specified in assumption (A2). The convergence analysis in the more general setting is still an open and challenging problem.

Note that the assumption $\operatorname{rank}(V)=B$ is only necessary for the convergence of $(\mathcal{V}_{X}x_{k})_{k\in\mathbb{N}}$ implying convergence of $(x_{k})_{k\in\mathbb{N}}$ . In the case that $V$ has arbitrary rank, the main convergence results still hold true for the semi-norm $\left\|\mathcal{V}_{X}(\,\cdot\,)\right\|$ in place of the norm $\left\|\,\cdot\,\right\|$ .

Remark 2.2 (Numerical complexity).

For the considered form $\mathcal{A}=V\otimes K$ and a cyclic control $b(k)=((k-1)\operatorname{mod}B)+1$ , one cycle of updates with the BCD method for $k\in\left\{\ell B,\dots,(\ell+1)B-1\right\}$ has essentially the same numerical complexity as one iteration with the Landweber iteration. To see this, we implement the BCD method in the following manner:

(S1) Initialization: $\forall b=1,\dots,B$ do

$\blacksquare$ $x_{\rm BCD}[b]\leftarrow x_{0}[b]$

$\blacksquare$ $h_{\rm BCD}[b]\leftarrow K(x_{\rm BCD}[b])$.

(S2) Updates: $\forall i_{0}=1,\dots,N_{\rm cycle}$ $\forall b=1,\dots,B$ do

$\blacksquare$ $x_{\rm BCD}[b]\leftarrow x_{\rm BCD}[b]-s_{k}K^{*}((\mathcal{V}_{Y}^{*}(\mathcal{V}_{Y}h_{\rm BCD}-y^{\delta}))[b])$

$\blacksquare$ $h_{\rm BCD}[b]\leftarrow K(x_{\rm BCD}[b])$.

The complexity of the above procedure is dominated by the evaluation of $K$, $K^{*}$ and the evaluation of $\mathcal{V}_{Y}$, $\mathcal{V}_{Y}^{*}$. Unless $B$ is very large (or evaluating $K$, $K^{*}$ is cheap), for typical inverse problems the dominating parts are $K$, $K^{*}$. This shows that the complexity of one cycle of the BCD iteration is in fact similar to the complexity of one iteration of the Landweber iteration.
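The steps (S1) and (S2) can be sketched for matrices as follows. The function `bcd_cycles`, the evaluation counters, the constant step size and the toy data are hypothetical and chosen for this illustration; the counters only make the cost argument of the remark visible.

```python
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def mix(V, h):
    """V_Y h: output block d equals sum_b V[d][b] * h[b]."""
    return [[sum(V[d][b] * h[b][i] for b in range(len(h))) for i in range(len(h[0]))]
            for d in range(len(V))]


def residual_norm(V, h, y):
    Ah = mix(V, h)
    return math.sqrt(sum((a - yi) ** 2 for Ab, yb in zip(Ah, y) for a, yi in zip(Ab, yb)))


def bcd_cycles(V, K, x0, y, s, n_cycles):
    """BCD with cyclic control for A = V ⊗ K, following (S1) and (S2)."""
    D, B, m = len(V), len(x0), len(K)
    Kt = transpose(K)
    counts = {"K": 0, "K_adj": 0}
    x = [list(xb) for xb in x0]                      # (S1)
    h = []
    for b in range(B):                               # (S1): h[b] = K(x[b])
        h.append(matvec(K, x[b]))
        counts["K"] += 1
    for _ in range(n_cycles):                        # (S2)
        for b in range(B):
            # Residual V_Y h - y from the cached h, without evaluating K.
            res = [[r - yi for r, yi in zip(rb, yb)] for rb, yb in zip(mix(V, h), y)]
            # b-th block of V_Y^* applied to the residual.
            w = [sum(V[d][b] * res[d][i] for d in range(D)) for i in range(m)]
            g = matvec(Kt, w)                        # one evaluation of K^*
            counts["K_adj"] += 1
            x[b] = [xi - s * gi for xi, gi in zip(x[b], g)]
            h[b] = matvec(K, x[b])                   # one evaluation of K
            counts["K"] += 1
    return x, h, counts


V = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
K = [[2.0, 1.0], [0.0, 1.0]]
x_true = [[1.0, -1.0], [0.5, 2.0]]
y = mix(V, [matvec(K, xb) for xb in x_true])         # exact data

x0 = [[0.0, 0.0], [0.0, 0.0]]
initial_residual = residual_norm(V, [matvec(K, xb) for xb in x0], y)
x, h, counts = bcd_cycles(V, K, x0, y, s=1.0 / 30.0, n_cycles=50)
final_residual = residual_norm(V, h, y)
```

One cycle over all $B$ blocks uses $B$ evaluations of $K$ and $B$ evaluations of $K^{*}$, which matches the count for one Landweber step with $\mathcal{A}=\mathcal{V}_{Y}\circ\mathcal{K}_{B}$.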

2.2 Monotonicity

The following lemma is an important auxiliary result, which will be used at several places throughout this article.

Lemma 2.3 (Monotonicity).

Let $x^{*}\in\mathcal{X}$ satisfy $\mathcal{A}(x^{*})=y$ and set

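$$r_{k}^{\delta}:=\bigl\|\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|. \tag{2.5}$$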

Then, the following estimate holds:

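$$\frac{1}{2}\|\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}-\frac{1}{2}\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}\leq-s_{k}^{\delta}r_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)+\frac{(s_{k}^{\delta})^{2}}{2}\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}. \tag{2.6}$$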

In particular, if $\|\mathcal{Q}^{Y}_{b}(y-y^{\delta})\|\leq\delta_{b}$ and $r^{\delta}_{k}\geq\delta_{b(k)}$ and if the step size is chosen such that

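$$0\leq s_{k}^{\delta}\leq\frac{2r_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)}{\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}}, \tag{2.7}$$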

then $\|\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}\leq\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}$ .

Proof.

Equation (2.3) implies

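$$\frac{1}{2}\|\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}-\frac{1}{2}\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}\leq\bigl\langle\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*},\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x_{k}^{\delta}\bigr\rangle+\frac{(s_{k}^{\delta})^{2}}{2}\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}. \tag{2.8}$$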

We have

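$$\begin{aligned}&\bigl\langle\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*},\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x_{k}^{\delta}\bigr\rangle\\ &\quad=s_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl\langle\mathcal{V}_{X}(x_{k}^{\delta}-x^{*}),\mathcal{Q}^{X}_{b(k)}\mathcal{K}_{D}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\rangle\\ &\quad=s_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl\langle\mathcal{K}_{D}\mathcal{V}_{X}(x_{k}^{\delta}-x^{*}),\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\rangle\\ &\quad=s_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl\langle\mathcal{A}(x_{k}^{\delta})-\mathcal{A}(x^{*}),\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\rangle\\ &\quad=s_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl\langle\mathcal{A}(x_{k}^{\delta})-y^{\delta}+y^{\delta}-\mathcal{A}(x^{*}),\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\rangle\\ &\quad\leq s_{k}^{\delta}\|v_{b(k)}\|^{2}\Bigl(-\bigl\|\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}+\delta_{b(k)}\bigl\|\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|\Bigr).\end{aligned}$$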

By combining (2.8) with the above estimate, we obtain

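$$\frac{1}{2}\|\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}-\frac{1}{2}\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}\leq s_{k}^{\delta}r_{k}^{\delta}\|v_{b(k)}\|^{2}\bigl(\delta_{b(k)}-r_{k}^{\delta}\bigr)+\frac{(s_{k}^{\delta})^{2}}{2}\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2},$$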

which is the desired estimate (2.6). If $s^{\delta}_{k}$ is chosen according to (2.7), then the right hand side in inequality (2.6) is less than or equal to 0, which implies $\|\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}\leq\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}$. ∎
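The monotonicity statement can be observed numerically for exact data ($\delta=0$). The sketch below is only an illustration under stated assumptions: matrices stand in for $V$ and $K$, the step size is taken as the midpoint of the interval in (2.7), and all names and data are hypothetical. It uses that, for $\mathcal{A}=V\otimes K$, the $b$th block of $\mathcal{A}^{*}r$ equals $K^{*}w$ with $w=\sum_{d}V[d][b]\,r[d]$, that $r_{k}=\|w\|/\|v_{b}\|$, and that $\|\mathcal{V}_{X}\mathcal{P}_{b}\mathcal{A}^{*}r\|^{2}=\|v_{b}\|^{2}\|K^{*}w\|^{2}$.

```python
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def mix(V, blocks):
    """(V ⊗ Id): output block d equals sum_b V[d][b] * blocks[b]."""
    return [[sum(V[d][b] * blocks[b][i] for b in range(len(blocks)))
             for i in range(len(blocks[0]))] for d in range(len(V))]


def forward(V, K, x):
    """A(x) for A = V ⊗ K."""
    return mix(V, [matvec(K, xb) for xb in x])


def vx_error(V, x, x_star):
    """|| V_X x - V_X x_star ||."""
    diff = [[a - b for a, b in zip(xb, sb)] for xb, sb in zip(x, x_star)]
    return math.sqrt(sum(u * u for blk in mix(V, diff) for u in blk))


V = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
K = [[2.0, 1.0], [0.0, 1.0]]
x_star = [[1.0, -1.0], [0.5, 2.0]]
y = forward(V, K, x_star)                      # exact data
D, B, m = len(V), len(x_star), len(K)

x = [[0.0, 0.0], [0.0, 0.0]]
errors = [vx_error(V, x, x_star)]
for k in range(40):
    b = k % B                                  # cyclic control (0-based)
    Ax = forward(V, K, x)
    res = [[yi - ai for yi, ai in zip(yb, ab)] for yb, ab in zip(y, Ax)]
    w = [sum(V[d][b] * res[d][i] for d in range(D)) for i in range(m)]
    g = matvec(transpose(K), w)                # b-th block of A^*(y - A(x))
    vb2 = sum(V[d][b] ** 2 for d in range(D))
    g2 = sum(gi * gi for gi in g)
    if g2 > 0.0:
        w2 = sum(wi * wi for wi in w)
        s = w2 / (vb2 * g2)                    # midpoint of (2.7) for delta = 0
        x[b] = [xi + s * gi for xi, gi in zip(x[b], g)]
    errors.append(vx_error(V, x, x_star))
```

In this run the list `errors` is non-increasing, in line with the lemma. The lemma makes no such claim for $\|x_{k}-x^{*}\|$ itself.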

2.3 Loping BCD and discrepancy principle

From Lemma 2.3 we see that if the residual term $r^{\delta}_{k}=\|\mathcal{Q}^{Y}_{b(k)}(y^{\delta}-\mathcal{A}(x_{k}^{\delta}))\|$ satisfies (2.5), then the error $\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|$ is decreasing. In the case that (2.5) does not hold, an iterative update might increase the value of $\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|$. Following a similar strategy introduced in [8, 5] for Kaczmarz type iterative methods, we therefore modify (2.2) by introducing a loping strategy as follows.

Definition 2.4 (Loping BCD).

We define the loping BCD method by

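$$x_{k+1}^{\delta}:=x_{k}^{\delta}-d_{k}^{\delta}s_{k}^{\delta}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(\mathcal{A}(x_{k}^{\delta})-y^{\delta}\bigr), \tag{2.9}$$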

where $r^{\delta}_{k}=\|\mathcal{Q}^{Y}_{b(k)}(y^{\delta}-\mathcal{A}(x_{k}^{\delta}))\|$ is as in Equation (2.5), and

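$$d_{k}^{\delta}:=\begin{cases}1&\text{if }r_{k}^{\delta}\geq\tau\delta_{b(k)}\\ 0&\text{otherwise},\end{cases} \tag{2.10}$$

$$\tau>1. \tag{2.11}$$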

In the case of exact data, we have $d^{\delta}_{k}=1$ and the loping BCD iteration reduces to the standard BCD. In the noisy data case the loping parameters $d^{\delta}_{k}$ ensure that no update is made if we cannot guarantee that an update would decrease $\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|$ . Note that the choice of $\tau$ as in (2.11) implies that condition (2.5) is satisfied whenever we have $d^{\delta}_{k}=1$ . For the loping BCD, Lemma 2.3 therefore implies that the error term $\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|$ is in fact monotonically decreasing. Moreover, we can show the following.
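A single loping step can be sketched as follows, again with matrices in place of $V$ and $K$. The function `loping_bcd_step` and the test data are hypothetical; the skipping rule implemented is the one described above, namely no update whenever $r^{\delta}_{k}<\tau\delta$.

```python
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def loping_bcd_step(V, K, x, y_delta, b, s, tau, delta):
    """One loping BCD step for A = V ⊗ K; returns (new iterate, d) with d in {0, 1}."""
    D, B, m = len(V), len(x), len(K)
    h = [matvec(K, xb) for xb in x]
    # Residual y_delta - A(x), block by block.
    res = [[y_delta[d][i] - sum(V[d][c] * h[c][i] for c in range(B)) for i in range(m)]
           for d in range(D)]
    # w = sum_d V[d][b] * res[d]; then r = ||Q^Y_b res|| = ||w|| / ||v_b||.
    w = [sum(V[d][b] * res[d][i] for d in range(D)) for i in range(m)]
    vb = math.sqrt(sum(V[d][b] ** 2 for d in range(D)))
    r = math.sqrt(sum(wi * wi for wi in w)) / vb
    new_x = [list(xb) for xb in x]
    if r < tau * delta:
        return new_x, 0                        # skipped step, iterate unchanged
    g = matvec(transpose(K), w)                # b-th block of A^*(y_delta - A(x))
    new_x[b] = [xi + s * gi for xi, gi in zip(x[b], g)]
    return new_x, 1


V = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
K = [[2.0, 1.0], [0.0, 1.0]]
# Data A(x_true) for x_true = [[1, -1], [0.5, 2]], written out explicitly.
y = [[1.0, -1.0], [4.0, 1.0], [6.0, 4.0]]
x0 = [[0.0, 0.0], [0.0, 0.0]]

updated, d_exact = loping_bcd_step(V, K, x0, y, b=0, s=0.01, tau=2.0, delta=0.0)
skipped, d_noisy = loping_bcd_step(V, K, x0, y, b=0, s=0.01, tau=2.0, delta=100.0)
```

With $\delta=0$ the step is always carried out, so the method reduces to standard BCD. With a noise level that is large compared to the blockwise residual, the iterate is left unchanged.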

Lemma 2.5 (Summability of squared residuals).

Let $x^{*}\in\mathcal{X}$ satisfy $\mathcal{A}(x^{*})=y$ . Then the residuals $r^{\delta}_{k}:=\|\mathcal{Q}^{Y}_{b(k)}(y^{\delta}-\mathcal{A}(x_{k}^{\delta}))\|$ of the loping BCD iteration (2.9), (2.10) satisfy

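$$\sum_{k\in\mathbb{N}}d_{k}^{\delta}s_{k}^{\delta}\|v_{b(k)}\|^{2}(r_{k}^{\delta})^{2}\leq\frac{\|\mathcal{V}_{X}x_{0}-\mathcal{V}_{X}x^{*}\|^{2}}{\gamma_{\rm min}(2-\theta_{\rm max})}, \tag{2.12}$$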

where $s^{\delta}_{k}$, $\gamma_{\rm min}$, $\theta_{\rm max}$ are chosen such that

(S1) $\forall k\in\mathbb{N}\colon d^{\delta}_{k}=1\Rightarrow s^{\delta}_{k}\in(0,2A_{k}^{\delta})$ with $A_{k}^{\delta}:=\frac{\|v_{b(k)}\|^{2}r^{\delta}_{k}(r^{\delta}_{k}-\delta_{b(k)})}{\left\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}(y^{\delta}-\mathcal{A}(x_{k}^{\delta}))\right\|^{2}}$;

(S2) $\forall k\in\mathbb{N}\colon d^{\delta}_{k}=1\Rightarrow\theta_{k}:=s^{\delta}_{k}/A_{k}^{\delta}\leq\theta_{\rm max}<2$;

(S3) $1-1/\tau\geq\gamma_{\rm min}>0$.

Proof.

We first show

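$$\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}-\|\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}\geq(2-\theta_{\rm max})\,d_{k}^{\delta}s_{k}^{\delta}\|v_{b(k)}\|^{2}(r_{k}^{\delta})^{2}(1-1/\tau). \tag{2.13}$$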

If $r^{\delta}_{k}<\tau\delta$, then $d^{\delta}_{k}=0$ and $x_{k+1}^{\delta}=x_{k}^{\delta}$, and therefore (2.13) holds with equality. If $r^{\delta}_{k}\geq\tau\delta$, applying Lemma 2.3, (S2) and (S1) yields

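$$\begin{aligned}&\|\mathcal{V}_{X}x_{k}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}-\|\mathcal{V}_{X}x_{k+1}^{\delta}-\mathcal{V}_{X}x^{*}\|^{2}\\ &\quad\geq 2s_{k}^{\delta}\|v_{b(k)}\|^{2}r_{k}^{\delta}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)-(s_{k}^{\delta})^{2}\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}\\ &\quad\geq 2s_{k}^{\delta}\|v_{b(k)}\|^{2}r_{k}^{\delta}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)-s_{k}^{\delta}\theta_{\rm max}A_{k}^{\delta}\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{A}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}\\ &\quad=2s_{k}^{\delta}\|v_{b(k)}\|^{2}r_{k}^{\delta}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)-s_{k}^{\delta}\theta_{\rm max}\|v_{b(k)}\|^{2}r_{k}^{\delta}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)\\ &\quad=(2-\theta_{\rm max})\,s_{k}^{\delta}\|v_{b(k)}\|^{2}r_{k}^{\delta}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)\\ &\quad\geq(2-\theta_{\rm max})\,s_{k}^{\delta}\|v_{b(k)}\|^{2}(r_{k}^{\delta})^{2}(1-1/\tau).\end{aligned}$$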

This shows (2.13) with $d^{\delta}_{k}=1$ in (2.10).

Summing (2.13) over all $k\in\mathbb{N}$ and using (S3) we obtain

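$$\|\mathcal{V}_{X}x_{0}-\mathcal{V}_{X}x^{*}\|^{2}\geq(2-\theta_{\rm max})\gamma_{\rm min}\sum_{k\in\mathbb{N}}d_{k}^{\delta}s_{k}^{\delta}\|v_{b(k)}\|^{2}(r_{k}^{\delta})^{2},$$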

which shows (2.12) after dividing by $(2-\theta_{\rm max})\gamma_{\rm min}$ . ∎

Remark 2.6.

Note that the conditions for the step sizes in Lemma 2.5 are inspired by [19], where a new step size rule for the gradient method for ill-posed problems has been introduced. From the definitions of $r^{\delta}_{k},d^{\delta}_{k}$ we obtain $r^{\delta}_{k}-\delta_{b(k)}\geq(1-1/\tau)r^{\delta}_{k}$ whenever $d^{\delta}_{k}=1$. Moreover, recall that $\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{V}_{X}^{*}=\|v_{b(k)}\|^{2}\mathcal{Q}^{X}_{b(k)}$. Consequently,

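$$A_{k}^{\delta}=\frac{\|v_{b(k)}\|^{2}r_{k}^{\delta}\bigl(r_{k}^{\delta}-\delta_{b(k)}\bigr)}{\bigl\|\mathcal{V}_{X}\mathcal{P}_{b(k)}\mathcal{V}_{X}^{*}\mathcal{K}_{D}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}}\geq\Bigl(1-\frac{1}{\tau}\Bigr)\frac{\bigl\|\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}}{\bigl\|\mathcal{Q}^{X}_{b(k)}\mathcal{K}_{D}^{*}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}}\geq\gamma_{\rm min}\frac{\bigl\|\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}}{\bigl\|\mathcal{K}_{B}^{*}\mathcal{Q}^{Y}_{b(k)}\bigl(y^{\delta}-\mathcal{A}(x_{k}^{\delta})\bigr)\bigr\|^{2}}\geq\frac{\gamma_{\rm min}}{\|\mathcal{K}_{B}^{*}\|^{2}}.$$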

This implies that we can choose the step sizes bounded from below. In particular, (2.12) holds for any constant step size choice $s_{k}^{\delta}=s_{\star}\in(0,\gamma_{\rm min}/{\|\mathcal{K}_{B}^{*}\|^{2}}]$ .

3 Convergence Analysis of the BCD method

Throughout the following, let Assumption 2.1 be satisfied. Moreover, we assume that the step sizes satisfy $s_{\rm min}\leq s_{k}^{\delta}\leq s_{\rm max}$ for some numbers $s_{\rm min}\leq s_{\rm max}$ independent of the iteration index $k\in\mathbb{N}$ and the noise level $\delta\geq 0$ , and that (S1)-(S3) in Lemma 2.5 hold.

3.1 Convergence for exact data

In this subsection we show convergence of the BCD iteration in the noise free case. The proof closely follows [5, 13].

Theorem 3.1 (Convergence of BCD for exact data).

In the exact data case $\delta=0$, the BCD iteration $(x_{k})_{k\in\mathbb{N}}$ defined by (2.2) satisfies $x_{k}\to x^{+}$, where $x^{+}$ is the solution of $\mathcal{A}(x)=y$ with minimal distance to $x_{0}$.

Proof.

Let $x^{*}\in\mathcal{X}$ satisfy $\mathcal{A}(x^{*})=y$ and define $\xi_{k}:=\mathcal{V}_{X}x_{k}-\mathcal{V}_{X}x^{*}$ . We will show that $(\xi_{k})_{k\in\mathbb{N}}$ is a Cauchy sequence. For $k=k_{0}p+k_{1}$ and $l=l_{0}p+l_{1}$ with $k\leq l$ and $k_{1},l_{1}\in\{0,\dots,p-1\}$ , let $n_{0}\in\{k_{0},\dots,l_{0}\}$ be such that

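$$\sum_{i_{1}=0}^{p-1}\bigl\|\mathcal{Q}^{Y}_{b(pn_{0}+i_{1})}\bigl(y-\mathcal{A}(x_{pn_{0}+i_{1}})\bigr)\bigr\|\leq\sum_{i_{1}=0}^{p-1}\bigl\|\mathcal{Q}^{Y}_{b(pi_{0}+i_{1})}\bigl(y-\mathcal{A}(x_{pi_{0}+i_{1}})\bigr)\bigr\|\quad\text{for all }i_{0}\in\{k_{0},\dots,l_{0}\}. \tag{3.1}$$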

With $n:=pn_{0}+p-1$ we have

$$\|\xi_{k}-\xi_{l}\|\leq\|\xi_{k}-\xi_{n}\|+\|\xi_{l}-\xi_{n}\| \tag{3.2}$$

and

[TABLE]

According to Lemma 2.3, the nonnegative sequence $(\left\|\xi_{k}\right\|)_{k\in\mathbb{N}}$ is monotonically decreasing and therefore converges to some $\varepsilon\geq 0$. Consequently, the last two terms in equations (3.3) and (3.4) converge to $\varepsilon^{2}-\varepsilon^{2}=0$ for $k\to\infty$. In order to show that also $\left\langle\xi_{n}-\xi_{k},\xi_{n}\right\rangle$ and $\left\langle\xi_{n}-\xi_{l},\xi_{n}\right\rangle$ converge to zero, we set $i^{*}:=pn_{0}+i_{1}$. Then using the definition of the BCD method in (2.2) for $i\in\{0,\dots,p-1\}$ we obtain

[TABLE]

with $v_{\rm max}:=\max\left\{\|v_{1}\|,\dots,\|v_{B}\|\right\}$ . Further we obtain

[TABLE]

Substituting the estimate in (3.5), using the inequality $(\sum_{i=0}^{p-1}a_{i})^{2}\leq p\sum_{i=0}^{p-1}a_{i}^{2}$ and (3.1) one concludes

[TABLE]

where we defined $C:=s_{\rm max}v_{\rm max}^{2}(2p+s_{\rm max}\|\mathcal{K}_{B}\|^{2}v_{\rm max}^{2}p)$ . Finally, we have

[TABLE]

Because of Lemma 2.5, the last sum converges to zero for $k=pk_{0}+k_{1}\to\infty$, which implies $\left|\left\langle\xi_{n}-\xi_{k},\xi_{n}\right\rangle\right|\to 0$. Similarly, one shows $\left|\left\langle\xi_{n}-\xi_{l},\xi_{n}\right\rangle\right|\to 0$. Therefore, $(\xi_{k})_{k\in\mathbb{N}}$ is a Cauchy sequence and $\mathcal{V}_{X}x_{k}=\mathcal{V}_{X}x^{*}+\xi_{k}$ tends to an element $\mathcal{V}_{X}x^{+}$ with $x^{+}\in\mathcal{X}$. Because $V$ has rank $B$ and $\|\mathcal{Q}^{Y}_{b(i)}(y-\mathcal{A}(x_{i}))\|\to 0$, the element $x^{+}$ is a solution of $\mathcal{A}(x)=y$. Further,

[TABLE]

Because $\operatorname{ker}(\mathcal{A})^{\perp}$ is closed, it follows that $x^{+}-x_{0}\in\operatorname{ker}(\mathcal{A})^{\perp}$. Since the solution of $\mathcal{A}(x)=y$ with minimal distance to $x_{0}$ is the only solution for which the latter holds true, $x^{+}$ is this solution and we obtain $x_{k}\to x^{+}$. ∎

3.2 Convergence for noisy data

In the noisy data case, we consider the loping version of the BCD. The iteration is terminated when for the first time all $x^{\delta}_{k}$ are equal within a cycle. That is, we stop the iteration at

[TABLE]

To simplify the notation, we assume that $\delta_{b}=\delta$ for all $b\in\left\{1,\dots,B\right\}$ . We first show that the stopping index is always finite.

Proposition 3.2 (Existence of stopping index).

If $\delta>0$ , then the stopping index $k_{*}^{\delta}$ defined in (3.9) is finite, and we have

[TABLE]

Proof.

If for every $k\in\mathbb{N}$ there exists $\ell\in\left\{0,\dots,p-1\right\}$ such that $x_{k+\ell}^{\delta}\neq x_{k}^{\delta}$, then from Lemma 2.5 we obtain

[TABLE]

where $C>0$ is a lower bound of $s^{\delta}_{k}\|v_{b(k)}\|^{2}$ . The right hand side of (3.11) tends to infinity, which gives a contradiction. Consequently, the set $\{k\in\mathbb{N}\mid x_{k}^{\delta}=x_{k+1}^{\delta}=\cdots=x_{k+p-1}^{\delta}\}$ is nonempty and therefore contains a finite minimal element.

To prove (3.10), note that the finiteness of the stopping index and the definition of the loping BCD imply $\|\mathcal{Q}^{Y}_{b(k_{*}^{\delta}+\ell)}(y^{\delta}-\mathcal{A}(x_{k_{*}^{\delta}}^{\delta}))\|<\tau\delta$ for $\ell=0,\dots,p-1$. The assumption (A3) on the control sequence $b(k)$ thus gives (3.10). ∎
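The loping iteration together with a stopping rule of this type can be sketched as follows. This is a hedged illustration, not the authors' implementation: matrices replace $V$ and $K$, the control is cyclic with $p=B$, the step size is $s^{\delta}_{k}=A^{\delta}_{k}$ (that is, $\theta_{k}=1$ in Lemma 2.5), and the loop stops once $p$ consecutive steps have been skipped, which guarantees that all $p$ residual checks were made at the same iterate. The names `block_residual` and `loping_bcd`, the noise vector and the values of $\delta$ and $\tau$ are hypothetical, and the sketch assumes $\delta>0$ and an injective $K^{*}$.

```python
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def forward(V, K, x):
    """A(x) for A = V ⊗ K."""
    h = [matvec(K, xb) for xb in x]
    return [[sum(V[d][b] * h[b][i] for b in range(len(x))) for i in range(len(K))]
            for d in range(len(V))]


def block_residual(V, K, x, y, b):
    """Return r = ||Q^Y_b (y - A(x))|| and g, the b-th block of A^*(y - A(x))."""
    D, m = len(V), len(K)
    Ax = forward(V, K, x)
    w = [sum(V[d][b] * (y[d][i] - Ax[d][i]) for d in range(D)) for i in range(m)]
    vb = math.sqrt(sum(V[d][b] ** 2 for d in range(D)))
    r = math.sqrt(sum(wi * wi for wi in w)) / vb
    return r, matvec(transpose(K), w)


def loping_bcd(V, K, x0, y_delta, delta, tau=2.0, max_iter=50000):
    """Loping BCD with cyclic control; stops after p = B consecutive skipped steps."""
    B = len(x0)
    p = B
    x = [list(xb) for xb in x0]
    skipped = 0
    for k in range(max_iter):
        b = k % B
        r, g = block_residual(V, K, x, y_delta, b)
        g2 = sum(gi * gi for gi in g)
        if r >= tau * delta and g2 > 0.0:
            s = r * (r - delta) / g2           # s = A_k, since ||V_X P_b A^* res||^2 = ||v_b||^2 g2
            x[b] = [xi + s * gi for xi, gi in zip(x[b], g)]
            skipped = 0
        else:
            skipped += 1
            if skipped == p:
                return x, k - p + 1            # first index of the skipped cycle
    return x, None                             # iteration budget exhausted


V = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
K = [[2.0, 1.0], [0.0, 1.0]]
x_true = [[1.0, -1.0], [0.5, 2.0]]
y = forward(V, K, x_true)
noise = [[0.01, -0.01], [0.0, 0.01], [-0.01, 0.0]]   # norm 0.02, below delta
y_delta = [[a + z for a, z in zip(yb, zb)] for yb, zb in zip(y, noise)]

delta, tau = 0.1, 2.0
x_stop, k_stop = loping_bcd(V, K, [[0.0, 0.0], [0.0, 0.0]], y_delta, delta, tau)
stop_residuals = [block_residual(V, K, x_stop, y_delta, b)[0] for b in range(len(V[0]))]
```

At the returned iterate all blockwise residuals in `stop_residuals` lie below $\tau\delta$, which is the discrepancy-type property used in the proof above.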

We call the step size selection $(s^{\delta}_{k})_{k\in\mathbb{N}}$ continuous at $\delta=0$ if for all $k\in\mathbb{N}$ we have

[TABLE]

An example of a continuous step size selection is the constant step size $s_{k}^{\delta}=\gamma_{\rm min}/\|\mathcal{K}_{B}\|^{2}$ . The next auxiliary result shows that the continuity of the step size selection implies continuity of $x_{k}^{\delta}$ at $\delta=0$ .

Lemma 3.3 (Continuity of the BCD iteration at $\delta=0$ ).

Suppose the step selection is continuous at $\delta=0$ , and define

[TABLE]

Then, for all $k\in\mathbb{N}$ , we have

[TABLE]

Moreover, $x_{k+1}^{\delta}\to x_{k+1}$ , as $\delta\to 0$ .

Proof.

We prove Lemma 3.3 by induction. Assume $k\geq 0$ and that (3.13) holds for all $k^{\prime}<k$ . First we note that (3.13) implies $x_{k+1}^{\delta}\to x_{k+1}$ , as $\delta\to 0$ . For the proof of (3.13) we consider two cases. In the first case, $d^{\delta}_{k}=1$ , we have

[TABLE]

In the second case, $d^{\delta}_{k}=0$ , we have $\left\|\mathcal{Q}^{Y}_{b(k)}(y^{\delta}-\mathcal{A}(x_{k}^{\delta}))\right\|<\tau\delta$ . Consequently,

[TABLE]

Now (3.13) follows from the continuity of $\mathcal{A}$ , and the induction hypothesis implying $x_{k}^{\delta}\to x_{k}$ . ∎

Theorem 3.4 (Convergence of the loping BCD for noisy data).

Suppose the step selection $(s^{\delta}_{k})_{k\in\mathbb{N}}$ is continuous at $\delta=0$ . Let $(\delta_{j})_{j\in\mathbb{N}}\in(0,\infty)^{\mathbb{N}}$ converge to $0$ and let $(y_{j})\in\mathcal{Y}^{\mathbb{N}}$ be a sequence of noisy data with $\|\mathcal{Q}_{b}^{Y}(y_{j}-y)\|\leq\delta_{j}$ . Let $(x_{j,k})_{k\in\mathbb{N}}$ be defined by the loping BCD iteration with data $y_{j}$ and stopped at $k_{j}:=k_{*}(\delta_{j},y_{j})$ according to (3.9). Then $(x_{j,k_{j}})_{j\in\mathbb{N}}\to x^{\boldsymbol{\texttt{+}}}$ , where $x^{\boldsymbol{\texttt{+}}}$ is the solution of $\mathcal{A}(x)=y$ with minimal distance to $x_{0}$ .

Proof.

From Lemma 3.3 and the continuity of $\mathcal{A}$ we have, for any $k\in\mathbb{N}$ , that $x_{j,k}\to x_{k}$ and $\mathcal{A}(x_{j,k})\to\mathcal{A}(x_{k})$ as $j\to\infty$ .

To show that $x_{j,k_{j}}\to x^{\boldsymbol{\texttt{+}}}$ , we first assume that $k_{j}$ has a finite accumulation point $k_{*}$ . Without loss of generality we may assume that $k_{j}=k_{*}$ for all $j\in\mathbb{N}$ . From Proposition 3.2 we know that $\|\mathcal{Q}^{Y}_{b}(y_{j}-\mathcal{A}(x_{j,k_{*}}))\|<\tau\delta_{j}$ . By taking the limit $j\to\infty$ , we obtain $y=\mathcal{A}(x_{k_{*}})$ . Consequently, $x_{k_{*}}=x^{\boldsymbol{\texttt{+}}}$ and $x_{j,k_{*}}\to x^{\boldsymbol{\texttt{+}}}$ as $j\to\infty$ . It remains to consider the case where $k_{j}\to\infty$ as $j\to\infty$ . To that end let $\epsilon>0$ . Without loss of generality we assume that $k_{j}$ is monotonically increasing. According to Theorem 3.1 we can choose $n\in\mathbb{N}$ such that $\left\|\mathcal{V}_{X}x_{k_{n}}-\mathcal{V}_{X}x^{\boldsymbol{\texttt{+}}}\right\|<\epsilon/2$ . Equation (3.13) implies that there exists $j_{0}>n$ such that $\left\|\mathcal{V}_{X}x_{j,k_{n}}-\mathcal{V}_{X}x_{k_{n}}\right\|<\epsilon/2$ for all $j\geq j_{0}$ . Together with the monotonicity we obtain

[TABLE]

Because $\mathcal{V}_{Y}$ is non-singular, this shows $x_{j,k_{j}}\to x^{\boldsymbol{\texttt{+}}}$ as $j\to\infty$ . ∎

4 Example: System of linear integral equations

In this section we compare the BCD method to the standard Landweber method for an elementary system of linear integral equations.

4.1 Forward problem

Consider the integration operator $K\colon L^{2}([0,1])\to L^{2}([0,1])$ defined by

$$(Kf)(t):=\int_{0}^{t}f(s)\,\mathrm{d}s\quad\text{for }t\in[0,1].$$

According to the Cauchy-Schwarz inequality, we have

[TABLE]

for all $f\in L^{2}([0,1])$ , which shows that $K$ is a well-defined linear bounded operator. Using the operator $K$ we consider the following forward model applied to a vector of functions $(f[b])_{b=1}^{B}\in(L^{2}([0,1]))^{B}$ .

Definition 4.1.

For $D\geq B\geq 1$ and a given matrix $V=(v_{d,b})_{d,b}\in\mathbb{R}^{D\times B}$ of rank $B$ , we define the forward operator

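$$\mathcal{A}\colon(L^{2}([0,1]))^{B}\to(L^{2}([0,1]))^{D}\colon f\mapsto\Bigl(\sum_{b=1}^{B}v_{d,b}\,K(f[b])\Bigr)_{d=1}^{D}.$$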

According to our general notion we have $\mathcal{A}=V\otimes K$ and the theory presented in the previous section can be applied for solving the inverse problem $\mathcal{A}(f)=g$ . Note that this equation clearly is ill-posed because the range of $K$ is non-closed (and equal to the Sobolev space $H^{1}_{\diamond}([0,1]):=\left\{g\in L^{2}([0,1])\mid g^{\prime}\in L^{2}([0,1])\wedge g(0)=0\right\}$ of all weakly differentiable functions vanishing at $0$ ).

More generally, one could replace the integration operator by any bounded (integral) operator $K\colon L^{2}([0,1])\to L^{2}([0,1])$ with non-closed range.
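A discrete counterpart of this forward model can be sketched in NumPy as follows. The sketch assumes the integration operator $(Kf)(t)=\int_{0}^{t}f(s)\,\mathrm{d}s$, a composite trapezoidal rule on $p+1$ equidistant nodes, and a hypothetical matrix `V_tilde`. The helper name `integration_matrix` is also an assumption of this illustration. The tensor product structure is expressed through the Kronecker product `np.kron`.

```python
import numpy as np


def integration_matrix(p):
    """Trapezoidal discretization of (Kf)(t) = int_0^t f(s) ds at t_i = i / p."""
    h = 1.0 / p
    K = np.tril(np.full((p + 1, p + 1), h))   # weight h for all nodes s_j <= t_i
    K[:, 0] = h / 2                            # half weight at the left end point s_0
    K[np.diag_indices(p + 1)] = h / 2          # half weight at the right end point t_i
    K[0, :] = 0.0                              # the integral over [0, 0] vanishes
    return K


p = 100
t = np.linspace(0.0, 1.0, p + 1)
K = integration_matrix(p)

V_tilde = np.array([[1.0, 0.5], [0.5, 1.0]])   # hypothetical, not the paper's matrix
V = V_tilde / np.linalg.norm(V_tilde, 2)       # normalize by the spectral norm

A = np.kron(V, K)                              # block (d, b) equals v_{d,b} * K
f = np.stack([np.ones_like(t), t])             # two test functions f[1], f[2]
g = (A @ f.ravel()).reshape(2, -1)             # g[d] = sum_b v_{d,b} K f[b]
```

Forming `np.kron(V, K)` explicitly is convenient for small examples; for larger problems one would apply `K` to each block and mix the results with `V`, as in `V @ (f @ K.T)`, which gives the same data.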

4.2 Reconstruction results

For all presented numerical results we use $B=D=2$ and take $V=\tilde{V}/{\|\tilde{V}\|}_{2,2}$ with

[TABLE]

We discretize $K$ with the composite trapezoidal rule using $p=100$ intervals such that the data and the unknowns are elements in $(\mathbb{R}^{p})^{2}$ . The true unknown $f^{*}=(f^{*}[1],f^{*}[2])$ and the noisy data $g^{\delta}=(g^{\delta}[1],g^{\delta}[2])$ are shown in Figure 4.1. The exact data $g=\mathcal{A}f^{*}$ have been computed via numerical integration followed by application of $V$ . Subsequently we computed noisy data by adding random white noise to $g$ with a standard deviation of $0.001$ . The resulting relative data errors are $\left\|g-g^{\delta}\right\|/\left\|g\right\|\simeq 0.015$ , $\left\|\mathcal{Q}_{1}(g-g^{\delta})\right\|/\left\|\mathcal{Q}_{1}g\right\|\simeq 0.011$ and $\left\|\mathcal{Q}_{2}(g-g^{\delta})\right\|/\left\|\mathcal{Q}_{2}g\right\|\simeq 0.012$ , respectively.

Reconstructions using the BCD and Landweber methods from simulated data are shown in Figure 4.2. For each case we have used the maximum constant step size that led to a stable reconstruction. We evaluate the reconstruction error (norm of $f_{k}-f^{*}$ ) in terms of the standard 2-norm $\left\|\,\cdot\,\right\|_{2}$ and in the $V$ -norm $\left\|\,\cdot\,\right\|_{V}$ ,

[TABLE]

respectively. As we can see from the bottom row in Figure 4.2, measured in both norms, the reconstruction error of the BCD is smaller than the error of Landweber iteration.

Figure 4.3 shows reconstruction results for noisy data. Again, the error in the BCD method decreases faster than the one of the Landweber method. The BCD therefore requires fewer cycles than the Landweber method. Moreover, in the middle column of Figure 4.3 we illustrate the need for the loping (or another regularization strategy). Without loping, the BCD iteration as well as the Landweber iteration start to diverge after around 2000 iterations. With loping (for the BCD method) and with the discrepancy principle (for the Landweber method) both iterations stop. (Note that here we only show the error in the $V$ -norm and that the Landweber method is monotonically decreasing in the $2$ -norm when using the discrepancy principle.) Finally, the bottom row in Figure 4.3 shows that the reconstruction error for the BCD iteration is not monotonically decreasing in the standard norm, whereas in the $V$ -norm it is.
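For reference, the Landweber iteration stopped by the discrepancy principle can be sketched as below. This is a generic textbook form under stated assumptions, not the code used for the figures: the operator is a matrix `A`, the step size is $1/\|A\|^{2}$, the iteration stops as soon as $\|Ax_{k}-y^{\delta}\|\leq\tau\delta$, and the value `tau=2.0` as well as the synthetic data are hypothetical.

```python
import numpy as np


def landweber(A, y_delta, delta, tau=2.0, max_iter=100_000):
    """Landweber iteration stopped by the discrepancy principle."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # constant step size 1 / ||A||^2
    x = np.zeros(A.shape[1])
    for k in range(max_iter):
        residual = A @ x - y_delta
        if np.linalg.norm(residual) <= tau * delta:
            return x, k                        # discrepancy principle satisfied
        x -= step * (A.T @ residual)           # full gradient step on all blocks
    return x, max_iter


# Synthetic test problem (made up for illustration).
rng = np.random.default_rng(0)
n = 20
K = np.tril(np.ones((n, n))) / n               # crude discrete integration operator
V = np.array([[1.0, 0.5], [0.5, 1.0]])
A = np.kron(V, K)
x_true = rng.standard_normal(2 * n)
noise = rng.standard_normal(2 * n)
delta = 5e-2
y_delta = A @ x_true + delta * noise / np.linalg.norm(noise)
x_rec, k_stop = landweber(A, y_delta, delta)
final_residual = np.linalg.norm(A @ x_rec - y_delta)
```

In contrast to the loping BCD, each Landweber step here updates all blocks at once and the stopping test uses the full residual rather than block residuals.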

5 A nonlinear test: Multi-spectral X-ray tomography

In this section we apply a nonlinear generalization of the BCD and the Landweber iteration to one-step inversion in multi-spectral X-ray tomography. In particular, for nonlinear operators $\mathcal{A}$ in place of linear ones, we use the following generalization of the BCD iteration

[TABLE]

Note that such problems are not covered by our theoretical analysis. We consider extending our theory to this class of examples a particularly interesting topic of future research.

In the following we denote by $D_{R}\subseteq\mathbb{R}^{2}$ the disc with radius $R<1$ centered at the origin. We define the fan beam Radon transform $R\mu\colon\mathbb{S}^{1}\times\mathbb{S}^{1}\to\mathbb{R}$ of a function $\mu\colon\mathbb{R}^{2}\to\mathbb{R}$ supported in $D_{R}$ by

[TABLE]

It can be easily verified that the fan beam Radon transform $R\colon L^{2}(D_{R})\to L^{2}(\mathbb{S}^{1}\times\mathbb{S}^{1})$ is linear and continuous [17].

5.1 Mathematical modeling

We assume that the tissue is composed of $B$ different materials each of them having a different energy dependent X-ray attenuation coefficient $\mu_{b}(E)$ with $b=1,\dots,B$ . The combined X-ray attenuation coefficient is then given by

[TABLE]

where $f[b]\colon\mathbb{R}^{2}\to[0,1]$ is the fractional density map of the $b$ th material. Our goal is to determine the fractional density maps $f[b]$ from multi-spectral X-ray transmission measurements.

The energy sensitive X-ray transmission measurements result in the intensity [2]

[TABLE]

Here $W\subseteq[0,\infty)$ denotes the energy window where the measurement is made and $s\colon[0,\infty)\to\mathbb{R}$ is the product of X-ray beam spectrum intensity and detector sensitivity. We assume the detector sensitivity to be constant and that the spectrum $s$ is known for energies ranging from $20\text{\,}\mathrm{keV}$ to $120\text{\,}\mathrm{keV}$ covering any energy window. The spectrum used for the numerical results is the same as in [1, 2] and shown in Figure 5.1.

In order to recover multiple material densities, we use multiple energy windows. We choose the same number $B$ of spectral windows as we have different materials. Moreover, to simplify the mathematical formulation we uniformly discretize the energy variable, $E_{0}=20\,\mathrm{keV}<E_{1}<\cdots<E_{N}=120\,\mathrm{keV}$ . The X-ray measurement corresponding to the $b$ th energy window is given by

[TABLE]

Here $W_{b}\subseteq\left\{1,\dots,N\right\}$ model discrete energy windows, $(s_{i})_{i=1}^{N}$ is the discretized beam spectrum intensity, and $\Delta E:=(120\,\mathrm{keV})/N$ . Summarizing the above we define the following forward operator.

Definition 5.1 (Multi-spectral X-ray measurement operator).

The measurement operator $\mathcal{A}$ with respect to the energy windows $W_{1},\dots,W_{B}$ is given by

[TABLE]

We can decompose the operator $\mathcal{A}$ in the form

[TABLE]

where

- $\mathcal{U}\colon L^{2}(D_{R})^{B}\to L^{2}(D_{R})^{N}\colon f\mapsto(\sum_{b=1}^{B}\mu_{i,b}f[b])_{i=1}^{N}$
- $\mathcal{R}\colon L^{2}(D_{R})^{N}\to L^{2}(D_{R})^{N}\colon(\mu_{i})_{i=1}^{N}\mapsto(R\mu_{i})_{i=1}^{N}$
- $\mathcal{E}\colon L^{2}(D_{R})^{N}\to L^{2}(D_{R})^{N}\colon(g_{i})_{i=1}^{N}\mapsto(\exp(-g_{i}))_{i=1}^{N}$
- $\mathcal{V}_{Y}\colon L^{2}(D_{R})^{N}\to L^{2}(D_{R})^{B}\colon(g_{i})_{i=1}^{N}\mapsto(\sum_{i\in W_{b}}s_{i}g_{i})_{b=1}^{B}$ .

The operators $\mathcal{V}_{Y},\mathcal{R},\mathcal{U}$ are linear and bounded. To show the continuity and differentiability of $\mathcal{A}$ we have to verify that $\mathcal{E}$ is continuous and differentiable.

Proposition 5.2 (Continuity and differentiability of $\mathcal{A}$ ).

The operator $\mathcal{A}$ is continuous and Fréchet differentiable. For $f,h\in L^{2}(D_{R})^{B}$ we have

[TABLE]

with

[TABLE]

Proof.

One only has to verify that $g\mapsto\exp(-g)$ is continuous and Fréchet differentiable on $L^{2}(D_{R})$ with derivative given by $\mathcal{E}^{\prime}(g)h=-\exp(-g)h$ . For that purpose, let $\left\|h\right\|_{2}\to 0$ which in particular implies its pointwise convergence. Therefore

[TABLE]

This shows (5.9), and (5.8) follows by the chain rule. ∎
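The decomposition $\mathcal{A}=\mathcal{V}_{Y}\circ\mathcal{E}\circ\mathcal{R}\circ\mathcal{U}$ and the resulting chain rule can be checked numerically in a small discrete setting. The following sketch makes several assumptions: the Radon transform is replaced by an arbitrary stand-in matrix `R` (not a fan beam discretization), the attenuation values `mu`, the spectrum `s`, and the windows are random or hypothetical, and the window sums follow the definition of $\mathcal{V}_{Y}$ given above. Since the pointwise derivative of $g\mapsto\exp(-g)$ is $-\exp(-g)h$, the derivative used here is $\mathcal{A}^{\prime}(f)h=-\mathcal{V}_{Y}\bigl(\exp(-\mathcal{R}\mathcal{U}f)\cdot\mathcal{R}\mathcal{U}h\bigr)$.

```python
import numpy as np


def forward(f, mu, R, S):
    """A(f) = V_Y(E(R(U(f)))) for discretized blocks f of shape (B, n)."""
    attenuation = mu @ f                 # U: (N, B) @ (B, n) -> (N, n)
    sinograms = attenuation @ R.T        # R applied to every energy channel -> (N, m)
    intensities = np.exp(-sinograms)     # E: pointwise exponential
    return S @ intensities               # V_Y: weighted window sums -> (B, m)


def derivative(f, h, mu, R, S):
    """Chain rule: A'(f)h = -V_Y(exp(-R U f) * (R U h))."""
    sinograms = (mu @ f) @ R.T
    return -S @ (np.exp(-sinograms) * ((mu @ h) @ R.T))


# Hypothetical small test configuration.
rng = np.random.default_rng(1)
B, N, n, m = 2, 6, 8, 5
mu = rng.uniform(0.1, 1.0, size=(N, B))  # stand-in attenuation values mu_{i,b}
R = rng.uniform(0.0, 1.0, size=(m, n))   # stand-in linear operator, not a Radon transform
s = rng.uniform(0.5, 1.0, size=N)        # stand-in spectrum values s_i
S = np.zeros((B, N))
S[0, :3] = s[:3]                         # window W_1 = first three energies
S[1, 3:] = s[3:]                         # window W_2 = last three energies

f = rng.uniform(0.0, 1.0, size=(B, n))
h = rng.standard_normal((B, n))
t = 1e-5
# Central finite difference approximation of the directional derivative.
fd = (forward(f + t * h, mu, R, S) - forward(f - t * h, mu, R, S)) / (2 * t)
max_dev = np.max(np.abs(fd - derivative(f, h, mu, R, S)))
```

A small value of `max_dev` indicates that the analytic directional derivative and the finite difference quotient agree, including the sign coming from the exponential.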

In the context of the BCD method, the fractional density maps $f[b]$ play the roles of the blocks $x[b]$ . The form (5.7) of the forward operator $\mathcal{A}$ has some similarity with the form that we used in the theoretical analysis of the BCD method, in the sense that the infinite dimensional smoothing operator is applied to several channels of a function. However, so far we have not been able to perform an analysis accounting for the non-linearity. Additionally, we apply a preconditioning technique as outlined in the following subsection. Extending the convergence analysis of BCD such that it applies to multi-spectral CT is a subject of future research.

5.2 Logarithmic scaling and preconditioning

The energy dependence of the mass-attenuation coefficient of different materials can be quite similar. In order to enhance the dependence on the different materials we propose a logarithmic scaling and preconditioning technique (different from [2]). For simplicity we consider only the case $B=2$ ; the general case can be treated in a similar manner.

The proposed preconditioned logarithmic data take the form

[TABLE]

where $f=(f[1],f[2])$ are the unknowns and $c_{1,1}$ , $c_{1,2}$ , $c_{2,1}$ , $c_{2,2}$ are parameters. Moreover, recall that $\mathcal{A}_{1}(f)$ and $\mathcal{A}_{2}(f)$ are the X-ray intensities defined by (5.6) corresponding to $W_{1},W_{2}\subseteq\left\{1,\dots,N\right\}$ modeling the discrete energy windows. The preconditioned inverse problem consists in solving the system

[TABLE]

where $v_{1},v_{2}$ are data perturbed by noise $(z_{1},z_{2})$ .

In order to solve the equations in (5.11), (5.12) with the BCD method we define the residual functionals

[TABLE]

Application of the BCD method requires the adjoint gradient of $\mathcal{A}_{1}$ and $\mathcal{A}_{2}$ , which we compute next.

Proposition 5.3 (Derivative of the preconditioned residuals).

Let $f,h\in L^{2}(D_{R})^{2}$ . The directional derivatives of $\Phi_{1}$ and $\Phi_{2}$ at $f$ in direction $h$ are given by

[TABLE]

Proof.

This follows from the chain rule. ∎

From Proposition 5.3 we conclude that the partial gradients of the residual functionals $\Phi_{b}$ are given by

[TABLE]

These expressions will be used for the implementations of the BCD as well as the Landweber method applied to the preconditioned system (5.11).

5.3 Numerical implementation

For all our experiments we used fan beam geometry. Each channel of the discrete phantom has size $400\times 400$ . We discretized $R$ using 300 detector positions $\alpha_{k}$ equidistantly distributed on $\mathbb{S}^{1}$ . For each detector position we compute $481$ line integrals for uniformly distributed angles $\beta_{\ell}$ in the interval $[-\pi/3,\pi/3]$ . To actually compute $Rf(\alpha_{k},\beta_{\ell})$ we used the trapezoidal rule and linear interpolation where we discretized the line integral using 400 equidistant sampling points in the interval $[0,2]$ . The adjoint $R^{*}g$ is evaluated using the standard backprojection algorithm with linear interpolation. We used $N=30$ equidistant discrete energy positions from $20\,\mathrm{keV}$ to $120\,\mathrm{keV}$ .

For our numerical studies we apply one-step inversion in multi-spectral CT to reconstruct a head phantom composed of two different material maps derived from the FORBILD head phantom. The phantom is shown in Figure 5.2 and consists of the pair $f=(f[1],f[2])$ , where $f[1]$ corresponds to the fractional density of the brain and $f[2]$ to the fractional density of the bone material. We slightly modified the FORBILD head phantom by inserting a disk with value $1/2$ in both components to demonstrate that the method can actually reconstruct mixed material distributions. The mass attenuation coefficients of the material maps (bone and brain) are taken from NIST tables [10] and are shown in Figure 5.3.

Figure 5.4 shows the data used for image reconstruction. In the first row the original data $\mathcal{A}(f)=(\mathcal{A}_{1}(f),\mathcal{A}_{2}(f))$ according to Definition 5.1 are plotted, where the indices $1$ and $2$ correspond to the energy windows $[20\,\mathrm{keV},70\,\mathrm{keV}]$ and $[70\,\mathrm{keV},120\,\mathrm{keV}]$ , respectively. One can observe that the data for both energy windows look quite similar. This is because of the similar energy dependence of the mass attenuation coefficients for $f[1]$ and $f[2]$ ; compare Figure 5.3. For this reason, we make use of the proposed scaling and preconditioning outlined in Section 5.2. The second row shows the preconditioned data we use for the reconstruction. For comparison, the last row in Figure 5.4 shows the negative logarithm of the X-ray intensities for the full energy window, in each case containing only one of the material maps. We have chosen the constants $c_{1,1}=1$ , $c_{1,2}=-1.35$ , $c_{2,1}=-1$ and $c_{2,2}=2.3$ in such a way that each of the modified data blocks highlights different aspects of the material maps. Note that we have selected the constants for data of a very different phantom in order to avoid inverse crime.

5.4 Numerical results

For the following results we compare the performance of the BCD method with the standard gradient method as reference method. We use a cyclic control $b(k)=(k-1)\bmod B$ and constant step sizes for both methods. Note that for the BCD as well as the Landweber method we included a positivity constraint. Figure 5.5 shows reconstruction results for the bone and brain material map. Due to the applied logarithmic scaling and preconditioning, both methods are able to separate the materials after a reasonable number of iterations. One observes that even the mixed part can be reconstructed.
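The cyclic control and a positivity constraint can be combined as in the following sketch. How the constraint is enforced is not specified here, so the sketch assumes the common choice of projecting the updated block onto the nonnegative functions after each step. The function names and the toy numbers are hypothetical.

```python
import numpy as np


def cyclic_block(k, B):
    """Cyclic control b(k) = (k - 1) mod B, giving a block index in {0, ..., B-1}."""
    return (k - 1) % B


def projected_block_step(f, b, block_gradient, step):
    """Gradient step in block b followed by projection onto nonnegative values."""
    f_new = f.copy()
    f_new[b] = np.maximum(f[b] - step * block_gradient, 0.0)
    return f_new


# Toy illustration with B = 2 blocks of three pixels each.
f0 = np.array([[0.2, 0.5, 0.0], [1.0, 0.3, 0.7]])
gradient = np.array([1.0, -1.0, 0.5])          # made-up partial gradient
b = cyclic_block(1, 2)                         # first iteration updates block 0
f1 = projected_block_step(f0, b, gradient, step=0.4)
```

In this toy example only block 0 changes, and entries that would become negative are set to zero.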

Figure 5.6 shows the relative squared reconstruction errors

[TABLE]

of the bone and the brain map using the Landweber method and the BCD method. The horizontal axes show the number of iterations in the Landweber method and the number of cycles (number of iterations divided by the number of blocks) in the BCD method. A cycle for the BCD method has the same numerical complexity as one iteration for the Landweber method. The BCD method delivers a lower relative error for the brain map; the relative error of the reconstruction for the bone map is similar for both methods.
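A possible way to evaluate such an error curve is sketched below. It assumes that the relative squared reconstruction error of a material map is $\|f_{k}[b]-f[b]\|^{2}/\|f[b]\|^{2}$ and converts BCD iterations into cycles by dividing by the number of blocks. The function names and the toy arrays are hypothetical.

```python
import numpy as np


def relative_squared_error(f_rec, f_true):
    """Assumed error measure: ||f_rec - f_true||^2 / ||f_true||^2."""
    return np.sum((f_rec - f_true) ** 2) / np.sum(f_true ** 2)


def bcd_cycles(num_iterations, num_blocks):
    """Number of cycles = number of BCD iterations divided by the number of blocks."""
    return num_iterations / num_blocks


# Toy example with a made-up 2 x 2 material map.
f_true = np.array([[0.0, 1.0], [1.0, 1.0]])
f_rec = f_true + 0.1
err = relative_squared_error(f_rec, f_true)
cycles = bcd_cycles(232, 2)
```

Plotting the error against cycles rather than iterations puts the BCD method and the Landweber method on a common axis of comparable numerical cost.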

Reconstruction results for noisy data are shown in Figure 5.7. To generate the noisy data, we added Gaussian white noise with standard deviation equal to $2\,\%$ of the maximal value of the exact data. In order to maintain stability of both iterations we stopped the Landweber iteration after $116$ iterations; accordingly, the BCD method is stopped after $116$ cycles. The relative squared reconstruction error is shown in Figure 5.8. Again, the BCD method is roughly a factor two faster than the Landweber method in recovering the brain map. For recovering the bone map, both methods are equally fast. We attribute this different behavior to the particular form of preconditioning. As can be seen from the second row in Figure 5.4, both preconditioned data pairs contain significant parts of the data corresponding to the brain whereas the bone data is mainly contained in the second one. Investigating optimal weights for the preconditioning is an interesting aspect of future work.
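The noise model described above can be sketched as follows. The function name `add_white_noise`, the seed, and the synthetic array standing in for the exact data are assumptions of this illustration; only the rule "standard deviation equal to 2 % of the maximal value of the exact data" is taken from the text.

```python
import numpy as np


def add_white_noise(data, relative_std=0.02, rng=None):
    """Add Gaussian white noise with std = relative_std * max(data)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = relative_std * np.max(data)
    noisy = data + sigma * rng.standard_normal(data.shape)
    return noisy, sigma


# Synthetic stand-in for exact data on a 300 x 481 detector grid.
exact = np.exp(-np.linspace(0.0, 3.0, 300 * 481)).reshape(300, 481)
noisy, sigma = add_white_noise(exact, rng=np.random.default_rng(0))
```

Passing an explicit random generator keeps the noise realization reproducible between runs of the BCD and the Landweber method.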

6 Conclusion

In this paper we analyzed the BCD (block coordinate descent) method for linear inverse problems. For a particular tensor product form we have shown that the BCD method combined with an appropriate loping and stopping strategy is a convergent regularization method for ill-posed inverse problems. The analysis in the present paper applies to operators having the tensor product form $V\otimes K(x)=V(K(x[1]),\dots,{K(x[B])})$ , where $V\in\mathbb{R}^{D\times B}$ and $K\colon X\to Y$ is linear. We presented two examples for numerically solving ill-posed problems with the BCD method. The first one concerns a system of linear integral equations that is covered by our theory. As an outlook we applied the BCD method to an example not covered by our theory, namely one-step inversion in multi-spectral X-ray computed tomography.

Future work will be done to extend our analysis of the BCD method to more general forward operators, in particular non-linear problems including examples like multi-spectral CT. This is challenging as the BCD is not monotone in the reconstruction error $\left\|x_{k}-x^{*}\right\|$ . However, we believe that the technique introduced in this paper of finding a suitable norm where monotonicity holds can be extended to more general situations.

Acknowledgments

The work of Markus Haltmeier has been supported by the Austrian Science Fund (FWF), project P 30747-N32. Simon Rabanser acknowledges support of the Austrian Academy of Sciences (ÖAW) via the DOC Fellowship Programme. The authors thank the anonymous reviewers for valuable comments that helped to significantly improve the manuscript.

Bibliography

[1] H. Atak and P. M. Shikhaliev. Dual energy CT with photon counting and dual source systems: comparative evaluation. Phys. Med. Biol., 60(23):8949, 2015.
[2] R. F. Barber, E. Y. Sidky, T. G. Schmidt, and X. Pan. An algorithm for constrained one-step inversion of spectral CT data. Phys. Med. Biol., 61(10):3784, 2016.
[3] A. Beck and L. Tetruashvili. On the convergence of block coordinate descent type methods. SIAM J. Optim., 23(4):2037–2060, 2013.
[4] J. Bolte, S. Sabach, and M. Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1-2):459–494, 2014.
[5] A. De Cezaro, M. Haltmeier, A. Leitão, and O. Scherzer. On steepest-descent-Kaczmarz methods for regularizing systems of nonlinear ill-posed equations. Appl. Math. Comput., 202(2):596–607, 2008.
[6] H. Engl, M. Hanke, and A. Neubauer. Regularization of inverse problems, volume 375. Springer Science & Business Media, 1996.
[7] M. Haltmeier. Convergence analysis of a block iterative version of the loping Landweber-Kaczmarz iteration. Nonlinear Anal., 71(12):e2912–e2919, 2009.
[8] M. Haltmeier, A. Leitão, and O. Scherzer. Kaczmarz methods for regularizing nonlinear ill-posed equations. I. Convergence analysis. Inverse Probl. Imaging, 1(2):289–298, 2007.
