Regularization Properties of the Krylov Iterative Solvers CGME and LSMR For Linear Discrete Ill-Posed Problems with an Application to Truncated Randomized SVDs

Zhongxiao Jia

arXiv:1812.04762·math.NA·March 20, 2020

TL;DR

This paper analyzes the regularization properties of Krylov solvers CGME and LSMR for large-scale ill-posed problems, comparing their effectiveness with LSQR and improving understanding of their solutions and truncation effects.

Contribution

It establishes the regularization properties of CGME and LSMR, including filtered SVD expansions and accuracy comparisons with LSQR, and improves fundamental results on randomized truncated SVDs.

Findings

01

CGME and LSMR have regularization properties similar to LSQR.

02

The best regularized solutions obtained by CGME are less accurate than those by LSQR, whereas those obtained by LSMR are at least as accurate as those by LSQR.

03

Truncation in randomized SVDs can reduce accuracy, as analyzed in the paper.

Abstract

For the large-scale linear discrete ill-posed problem $min ∥ A x - b ∥$ or $A x = b$ with $b$ contaminated by Gaussian white noise, there are four commonly used Krylov solvers: LSQR and its mathematically equivalent CGLS, the Conjugate Gradient (CG) method applied to $A^{T} A x = A^{T} b$ , CGME, the CG method applied to $min ∥ A A^{T} y - b ∥$ or $A A^{T} y = b$ with $x = A^{T} y$ , and LSMR, the minimal residual (MINRES) method applied to $A^{T} A x = A^{T} b$ . These methods have intrinsic regularizing effects, where the number $k$ of iterations plays the role of the regularization parameter. In this paper, we establish a number of regularization properties of CGME and LSMR, including the filtered SVD expansion of CGME iterates, and prove that the 2-norm filtering best regularized solutions by CGME and LSMR are less accurate than and at least as accurate as those by LSQR, respectively. We also prove that the semi-convergence…

Tables

Table 1: The description of test problems.

Problem	Description	Size $m \times n$
shaw	1D image restoration model	$m = n = 5000$
gravity	1D gravity surveying problem	$m = n = 5000$
baart	1D image deblurring	$m = n = 5000$
phillips	Phillips' famous test problem	$m = n = 5000$
heat	Inverse heat problem	$m = n = 5000$
deriv2	Computation of second derivative	$m = n = 10000$
AtmosphericBlur10	Spatially Invariant Gaussian Blur	$m = n = 65536$
AtmosphericBlur30	Spatially Invariant Gaussian Blur	$m = n = 65536$
GaussianBlur420	Spatially Invariant Atmospheric Turbulence Blur	$m = n = 65536$
GaussianBlur422	Spatially Invariant Atmospheric Turbulence Blur	$m = n = 65536$
VariantGaussianBlur1	Spatially Variant Gaussian Blur	$m = n = 99856$
VariantGaussianBlur2	Spatially Variant Gaussian Blur	$m = n = 99856$
VariantMotionBlur_large	Spatially Variant Motion Blur	$m = n = 65536$
VariantMotionBlur_medium	Spatially Variant Motion Blur	$m = n = 65536$
blur	2D image restoration	$m = n = 22500$
fanbeamtomo	2D fan-beam tomography problem	$61200 \times 14400$
seismictomo	2D seismic tomography	$20000 \times 10000$

Equations

\min_{x\in\mathbb{R}^{n}}\|Ax-b\| \ \mbox{ or } \ Ax=b,\quad A\in\mathbb{R}^{m\times n},\ b\in\mathbb{R}^{m},

Kx=(Kx)(t)=\int_{\Omega}k(s,t)x(t)\,dt=g(s)=g,\quad s\in\Omega\subset\mathbb{R}^{q},

\min_{x\in\mathbb{R}^{n}}\|Lx\| \ \mbox{ subject to } \ \|Ax-b\|\leq\tau\|e\|

A=U\left(\begin{array}{c}\Sigma\\ \mathbf{0}\end{array}\right)V^{T}

x_{naive}=\sum_{i=1}^{n}\frac{u_{i}^{T}b}{\sigma_{i}}v_{i}=\sum_{i=1}^{n}\frac{u_{i}^{T}b_{true}}{\sigma_{i}}v_{i}+\sum_{i=1}^{n}\frac{u_{i}^{T}e}{\sigma_{i}}v_{i}=x_{true}+\sum_{i=1}^{n}\frac{u_{i}^{T}e}{\sigma_{i}}v_{i}

|u_{i}^{T}b_{true}|=\sigma_{i}^{1+\beta},\quad \beta>0,\ i=1,2,\ldots,n,

|u_{k_{0}}^{T}b|\approx|u_{k_{0}}^{T}b_{true}|>|u_{k_{0}}^{T}e|\approx\eta,\quad |u_{k_{0}+1}^{T}b|\approx|u_{k_{0}+1}^{T}e|\approx\eta;

\min\|x\| \ \mbox{ subject to } \ \|A_{k}x-b\|=\min

A Q_{k}

A^{T} P_{k + 1}

B_{k}=\left(\begin{array}{cccc}\alpha_{1}&&&\\ \beta_{2}&\alpha_{2}&&\\ &\beta_{3}&\ddots&\\ &&\ddots&\alpha_{k}\\ &&&\beta_{k+1}\end{array}\right)\in\mathbb{R}^{(k+1)\times k}.

B_{k}=P_{k+1}^{T}AQ_{k}.

\|Ax_{k}^{lsqr}-b\|=\min_{x\in\mathcal{V}_{k}^{R}}\|Ax-b\|

x_{k}^{lsqr}=Q_{k}y_{k}^{lsqr} \ \mbox{ with } \ y_{k}^{lsqr}=\arg\min_{y\in\mathbb{R}^{k}}\|B_{k}y-\beta_{1}e_{1}^{(k+1)}\|=\beta_{1}B_{k}^{\dagger}e_{1}^{(k+1)},

\|x_{naive}-x_{k}^{cgme}\|=\min_{x\in\mathcal{V}_{k}^{R}}\|x_{naive}-x\|

\bar{B}_{k}=P_{k}^{T}AQ_{k}.

x_{k}^{cgme}=Q_{k}y_{k}^{cgme} \ \mbox{ with } \ y_{k}^{cgme}=\beta_{1}\bar{B}_{k}^{-1}e_{1}^{(k)}

\|A^{T}(b-Ax_{k}^{lsmr})\|=\min_{x\in\mathcal{V}_{k}^{R}}\|A^{T}(b-Ax)\|

x_{k}^{lsmr}=Q_{k}y_{k}^{lsmr} \ \mbox{ with } \ y_{k}^{lsmr}=\arg\min_{y\in\mathbb{R}^{k}}\|(B_{k}^{T}B_{k},\alpha_{k+1}\beta_{k+1}e_{k}^{(k)})^{T}y-\alpha_{1}\beta_{1}e_{1}^{(k+1)}\|.

x_{k}^{lsqr}=Q_{k}B_{k}^{\dagger}P_{k+1}^{T}b,

\min\|x\| \ \mbox{ subject to } \ \|P_{k+1}B_{k}Q_{k}^{T}x-b\|=\min

\gamma_{k}^{lsqr}=\|A-P_{k+1}B_{k}Q_{k}^{T}\|,

\gamma_{k}^{lsqr}\geq\sigma_{k+1}.

\sigma_{k+1}\leq\gamma_{k}^{lsqr}<\frac{\sigma_{k}+\sigma_{k+1}}{2}.

\gamma_{k}^{lsqr}

G_{k}

\alpha_{k+1}

\gamma_{k+1}^{lsqr}

x_{k}^{cgme}=Q_{k}\bar{B}_{k}^{-1}P_{k}^{T}b.

\min\|x\| \ \mbox{ subject to } \ \|P_{k}\bar{B}_{k}Q_{k}^{T}x-b\|=\min

P_{k+1}B_{k}Q_{k}^{T}=AQ_{k}Q_{k}^{T}.

Full text

Regularization Properties of the Krylov Iterative Solvers CGME and LSMR For Linear Discrete Ill-Posed Problems with an Application to Truncated Randomized SVDs

This work was supported in part by the National Science Foundation of China (No. 11771249).

Zhongxiao Jia, Department of Mathematical Sciences, Tsinghua University, 100084 Beijing, China.

Abstract

For the large-scale linear discrete ill-posed problem $\min\|Ax-b\|$ or $Ax=b$ with $b$ contaminated by Gaussian white noise, there are four commonly used Krylov solvers: LSQR and its mathematically equivalent CGLS, the Conjugate Gradient (CG) method applied to $A^{T}Ax=A^{T}b$; CGME, the CG method applied to $\min\|AA^{T}y-b\|$ or $AA^{T}y=b$ with $x=A^{T}y$; and LSMR, the minimal residual (MINRES) method applied to $A^{T}Ax=A^{T}b$. These methods have intrinsic regularizing effects, where the number $k$ of iterations plays the role of the regularization parameter. In this paper, we establish a number of regularization properties of CGME and LSMR, including the filtered SVD expansion of CGME iterates, and prove that the 2-norm filtering best regularized solutions by CGME are less accurate than those by LSQR, while those by LSMR are at least as accurate as those by LSQR. We also prove that the semi-convergence of CGME always occurs no later than that of LSQR, and that the semi-convergence of LSMR always occurs no sooner than that of LSQR. As a byproduct, using the analysis approach for CGME, we improve a fundamental result on the accuracy of the truncated rank $k$ approximate SVD of $A$ generated by randomized algorithms, and reveal how the truncation step damages the accuracy. Numerical experiments justify our results on CGME and LSMR.

keywords:

Discrete ill-posed, rank $k$ approximations, semi-convergence, regularized solution, Lanczos bidiagonalization, TSVD regularized solution, CGME, LSMR, LSQR, CGLS

AMS:

65F22, 15A18, 65F10, 65F20, 65R32, 65J20, 65R30


1 Introduction and Preliminaries

Consider the linear discrete ill-posed problem

$$\min_{x\in\mathbb{R}^{n}}\|Ax-b\| \ \mbox{ or } \ Ax=b,\quad A\in\mathbb{R}^{m\times n},\ b\in\mathbb{R}^{m}, \tag{1}$$

where the norm $\|\cdot\|$ is the 2-norm of a vector or matrix, and $A$ is extremely ill conditioned with its singular values decaying to zero without a noticeable gap. We simply assume that $m\geq n$, since the results in this paper hold for both the $m\geq n$ and $m\leq n$ cases. Problem (1) arises in many applications, e.g., from the discretization of the first kind Fredholm integral equation

$$Kx=(Kx)(t)=\int_{\Omega}k(s,t)x(t)\,dt=g(s)=g,\quad s\in\Omega\subset\mathbb{R}^{q}, \tag{2}$$

where the kernel $k(s,t)\in L^{2}({\Omega\times\Omega})$ and $g(s)$ are known functions, while $x(t)$ is the unknown function to be sought. Applications include image deblurring, signal processing, geophysics, computerized tomography, heat propagation, biomedical and optical imaging, groundwater modeling, and many others [1, 9, 10, 24, 35, 36, 37, 39, 47]. The right-hand side $b=b_{true}+e$ is assumed to be contaminated by a Gaussian white noise $e$, caused by measurement, modeling or discretization errors, where $b_{true}$ is noise-free and $\|e\|<\|b_{true}\|$. Because of the presence of noise $e$ and the extreme ill-conditioning of $A$, the naive solution $x_{naive}=A^{\dagger}b$ of (1) generally bears no relation to the true solution $x_{true}=A^{\dagger}b_{true}$, where $\dagger$ denotes the Moore-Penrose inverse of a matrix. Therefore, we must use regularization to extract as good an approximation to $x_{true}$ as possible.

For a Gaussian white noise $e$ , throughout the paper, we always assume that $b_{true}$ satisfies the discrete Picard condition $\|A^{\dagger}b_{true}\|\leq C$ with some constant $C$ for $\|A^{\dagger}\|$ arbitrarily large [1, 13, 20, 21, 22, 24, 36]. Without loss of generality, assume that $Ax_{true}=b_{true}$ . Then a dominating regularization approach is to solve the problem

$$\min_{x\in\mathbb{R}^{n}}\|Lx\| \ \mbox{ subject to } \ \|Ax-b\|\leq\tau\|e\| \tag{3}$$

with $\tau$ slightly larger than one [22, 24], where $L$ is a regularization matrix whose suitable choice is based on a priori information on $x_{true}$.

In this paper, we are concerned with the case $L=I$ in (3), which corresponds to a 2-norm filtering regularization problem. Let

$$A=U\left(\begin{array}{c}\Sigma\\ \mathbf{0}\end{array}\right)V^{T} \tag{4}$$

be the singular value decomposition (SVD) of $A$ , where $U=(u_{1},u_{2},\ldots,u_{m})\in\mathbb{R}^{m\times m}$ and $V=(v_{1},v_{2},\ldots,v_{n})\in\mathbb{R}^{n\times n}$ are orthogonal, $\Sigma={\rm diag}(\sigma_{1},\sigma_{2},\ldots,\sigma_{n})\in\mathbb{R}^{n\times n}$ with the singular values $\sigma_{1}>\sigma_{2}>\cdots>\sigma_{n}>0$ assumed to be simple, the superscript $T$ denotes the transpose of a matrix or vector, and $\mathbf{0}$ denotes a zero matrix. With (4), we have

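$$x_{naive}=\sum_{i=1}^{n}\frac{u_{i}^{T}b}{\sigma_{i}}v_{i}=\sum_{i=1}^{n}\frac{u_{i}^{T}b_{true}}{\sigma_{i}}v_{i}+\sum_{i=1}^{n}\frac{u_{i}^{T}e}{\sigma_{i}}v_{i}=x_{true}+\sum_{i=1}^{n}\frac{u_{i}^{T}e}{\sigma_{i}}v_{i} \tag{5}$$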

and $\|x_{true}\|=\|A^{\dagger}b_{true}\|=\left(\sum_{i=1}^{n}\frac{|u_{i}^{T}b_{true}|^{2}}{\sigma_{i}^{2}}\right)^{1/2}$ .

The discrete Picard condition means that, on average, the Fourier coefficient $|u_{i}^{T}b_{true}|$ decays faster than $\sigma_{i}$ , which results in the following popular model that is used throughout Hansen’s books [22, 24] and the references therein as well as [32, 33]:

$$|u_{i}^{T}b_{true}|=\sigma_{i}^{1+\beta},\quad \beta>0,\ i=1,2,\ldots,n, \tag{6}$$

where $\beta$ is a model parameter that controls the decay rates of $|u_{i}^{T}b_{true}|$ .

The covariance matrix of the Gaussian white noise $e$ is $\eta^{2}I$ , the expected value $\mathcal{E}(\|e\|^{2})=m\eta^{2}$ and $\mathcal{E}(|u_{i}^{T}e|)=\eta,\,i=1,2,\ldots,n$ , so that $\|e\|\approx\sqrt{m}\eta$ and $|u_{i}^{T}e|\approx\eta,\ i=1,2,\ldots,n$ . (5) and (6) show that, for large singular values, $|{u_{i}^{T}b_{true}}|/{\sigma_{i}}$ is dominant relative to $|u_{i}^{T}e|/{\sigma_{i}}$ . Once $|u_{i}^{T}b_{true}|\leq|u_{i}^{T}e|$ from some $i$ onwards, the noise $e$ dominates $|u_{i}^{T}b|$ , and the terms $\frac{|u_{i}^{T}b|}{\sigma_{i}}\approx\frac{|u_{i}^{T}e|}{\sigma_{i}}$ overwhelm $x_{true}$ for small singular values and must be dampened. Therefore, the transition point $k_{0}$ is such that

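$$|u_{k_{0}}^{T}b|\approx|u_{k_{0}}^{T}b_{true}|>|u_{k_{0}}^{T}e|\approx\eta,\quad |u_{k_{0}+1}^{T}b|\approx|u_{k_{0}+1}^{T}e|\approx\eta; \tag{7}$$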

see [24, p.42, 98] and [22, p.70-1].
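As a rough worked illustration of how the model (6) and the transition condition (7) locate $k_{0}$, consider the following sketch. It rests on two assumptions that are not made in the text above: the singular values decay exactly geometrically, $\sigma_{i}=\rho^{-i}$ with $\rho>1$, and $|u_{i}^{T}e|$ equals $\eta$ for every $i$. The numbers $\rho=2$, $\beta=1$ and $\eta=10^{-4}$ are hypothetical.

```latex
% Transition point under the assumed exact decay \sigma_i = \rho^{-i}:
\sigma_{k_0}^{1+\beta} > \eta \geq \sigma_{k_0+1}^{1+\beta}
\;\Longleftrightarrow\;
\rho^{-(1+\beta)k_0} > \eta \geq \rho^{-(1+\beta)(k_0+1)}
\;\Longleftrightarrow\;
k_0 < \frac{\ln(1/\eta)}{(1+\beta)\ln\rho} \leq k_0+1 .

% Hypothetical numbers: \rho = 2,\ \beta = 1,\ \eta = 10^{-4}
\frac{\ln 10^{4}}{2\ln 2} \approx 6.64
\;\Longrightarrow\; k_0 = 6,
\qquad
\sigma_6^{2} = 2^{-12} \approx 2.4\times 10^{-4} > 10^{-4} \geq \sigma_7^{2} = 2^{-14} \approx 6.1\times 10^{-5}.
```

Under these assumptions, a smaller noise level $\eta$ or a slower decay (smaller $\rho$ or $\beta$) pushes $k_{0}$ to larger values, so more SVD components can be recovered before the noise takes over.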

The truncated SVD (TSVD) method [20, 22, 24] is a reliable and commonly used method for solving small to modest sized (3), and it solves a sequence of problems

$$\min\|x\| \ \mbox{ subject to } \ \|A_{k}x-b\|=\min \tag{8}$$

starting with $k=1$ onwards, where $A_{k}=U_{k}\Sigma_{k}V_{k}^{T}$ is a best rank $k$ approximation to $A$ with respect to the 2-norm with $U_{k}=(u_{1},\ldots,u_{k})$ , $V_{k}=(v_{1},\ldots,v_{k})$ and $\Sigma_{k}={\rm diag}(\sigma_{1},\ldots,\sigma_{k})$ ; it holds that $\|A-A_{k}\|=\sigma_{k+1}$ [3, p.12], and $x_{k}^{tsvd}=A_{k}^{\dagger}b$ solves (8), called the TSVD regularized solution. For the Gaussian white noise $e$ it is known from [22, p.70-1] and [24, p.71,86-8,95] that $x_{k_{0}}^{tsvd}$ is the 2-norm filtering best TSVD regularized solution of (1), i.e., $x_{k_{0}}^{tsvd}$ has the minimal 2-norm error $\|x_{true}-x_{k_{0}}^{tsvd}\|=\min_{k=1,2,\ldots,n}\|x_{true}-x_{k}^{tsvd}\|$ . The index $k$ plays the role of the regularization parameter in the TSVD method. It has been observed and justified that $x_{k_{0}}^{tsvd}$ is essentially a 2-norm filtering best possible solution of (1); see [21], [22, p.109-11], [24, Sections 4.2 and 4.4] and [46]. We refer to [32] for general elaborations. As a result, we can take $x_{k_{0}}^{tsvd}$ as the standard reference when assessing the regularization ability of a 2-norm filtering regularization method.
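The TSVD solutions $x_{k}^{tsvd}=A_{k}^{\dagger}b$ and the role of $k$ as regularization parameter can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function name `tsvd_solutions` and the synthetic problem (orthogonal factors drawn at random, $\sigma_{i}=2^{-i}$, $\beta=1$, $\eta=10^{-4}$) are hypothetical choices made only for this example.

```python
import numpy as np

def tsvd_solutions(A, b):
    """Rows of the result are the TSVD solutions x_1, ..., x_n with x_k = A_k^+ b."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U.T @ b) / s                      # u_i^T b / sigma_i
    # Partial sums of coeffs[i] * v_i: row k-1 holds x_k.
    return np.cumsum(coeffs[:, None] * Vt, axis=0)

# Hypothetical synthetic problem (not one of the paper's test problems).
rng = np.random.default_rng(0)
n, beta, eta = 20, 1.0, 1e-4
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 2.0 ** -np.arange(1, n + 1)             # assumed decay sigma_i = 2^{-i}
A = U @ np.diag(sigma) @ V.T
x_true = V @ sigma**beta                        # gives |u_i^T b_true| = sigma_i^{1+beta}
b = A @ x_true + eta * rng.standard_normal(n)   # Gaussian white noise

X = tsvd_solutions(A, b)
errors = np.linalg.norm(X - x_true, axis=1)     # ||x_true - x_k||, k = 1, ..., n
k_best = int(np.argmin(errors)) + 1             # index of the best TSVD solution
```

In this sketch the error first decreases and then grows again as $k$ increases, and the last row of `X` is the naive solution, whose error is far larger than the best one.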

For $A$ large, the TSVD method is generally prohibitively expensive, and only iterative regularization methods are appealing. Krylov iterative solvers have formed a major class of methods [1, 10, 14, 17, 22, 24, 37]. Specifically, the CGLS method [15, 26] and its mathematically equivalent LSQR method [41], the CGME method [3, 4, 6, 17, 18] and the LSMR method [4, 5, 12] have been commonly used. These methods are deterministic 2-norm filtering regularization methods, have general regularizing effects, and exhibit semi-convergence [39, p.89]; see also [3, p.314], [4, p.733], [22, p.135] and [24, p.110]: The iterates first converge to $x_{true}$ , then the noise $e$ starts to deteriorate the iterates so that they start to diverge from $x_{true}$ and instead converge to $x_{naive}$ . The iteration number plays the role of the regularization parameter in iterative regularization methods.

The behavior of ill-posed problems and solvers depends on the decay rate of $\sigma_{j}$. Hoffmann [29] has characterized the degree of ill-posedness of (1) as follows: If $\sigma_{j}=\mathcal{O}(\rho^{-j})$ with $\rho>1$, $j=1,2,\ldots,n$, then (1) is severely ill-posed; if $\sigma_{j}=\mathcal{O}(j^{-\alpha})$, then (1) is mildly or moderately ill-posed for $\frac{1}{2}<\alpha\leq 1$ or $\alpha>1$, respectively. This definition has been widely used [1, 10, 22, 24]. The requirement $\alpha>\frac{1}{2}$ does not appear in [29] and is explicitly added in [30, 32]; it is always met for a linear compact operator equation [19, 22].

Hanke and Hansen [19] point out that a strict proof of the regularizing properties of conjugate gradients is extremely difficult; see also [23]. The regularizing effects of CGLS, LSQR and CGME have been intensively studied; see, e.g., [1, 8, 11, 14, 17, 18, 22, 24, 27, 28, 30, 32, 33, 42, 45]. It has long been known (cf. [19, 22, 23, 24]) that if the singular values of the projection matrices involved in LSQR, called the Ritz values, approximate the large singular values in natural order then LSQR has the same regularization ability as the TSVD method, that is, the two methods can compute 2-norm filtering best regularized solutions with the same accuracy. As we will see clearly, the same results hold for CGME and LSMR when the singular values of projection matrices approximate the large singular values of $A$ and $A^{T}A$ in this order, respectively.

If a 2-norm filtering regularized solution of (1) is as accurate as $x_{k_{0}}^{tsvd}$ , it is called a 2-norm filtering best possible regularized solution. If the 2-norm filtering regularized solution by a regularization method at semi-convergence is such a best possible one, then the solver is said to have the full regularization. Otherwise, the solver has only the partial regularization. This definition is introduced in [30, 32]. In terms of it, a fundamental question posed in [30, 32] is: Do CGLS, LSQR, CGME and LSMR have the full or partial regularization for severely, moderately and mildly ill-posed problems? Actually, this question has been receiving high attention for CGLS and LSQR.

For the cases that $\sigma_{i}$ are simple, the author in [32] has given accurate estimates for the 2-norm distances between the underlying $k$ dimensional Krylov subspace and the $k$ dimensional dominant right singular subspace $span\{V_{k}\}$ of $A$ for severely, moderately and mildly ill-posed problems. On the basis of [32], the author in [33] has proved that, for LSQR, the $k$ Ritz values converge to the $k$ large singular values of $A$ in natural order and Lanczos bidiagonalization always generates a near best rank $k$ approximation until $k=k_{0}$ for severely and moderately ill-posed problems with suitable $\rho>1$ and $\alpha>1$ , meaning that LSQR and CGLS have the full regularization. However, if such desired properties fail to hold, it has been theoretically unknown if LSQR has the full or partial regularization. Nevertheless, numerical experiments on many ill-posed problems have demonstrated that LSQR always has the full regularization [32, 33].

In this paper, we analyze the regularization of CGME and LSMR under the assumption that all the singular values $\sigma_{i}$ are simple. We establish a number of results, and prove that the regularization ability of CGME is generally inferior to that of LSQR, that is, the 2-norm filtering best regularized solutions obtained by CGME at semi-convergence are generally less accurate than those obtained by LSQR. Specifically, we derive the filtered SVD expansion of CGME iterates, by which we prove that the semi-convergence of CGME always occurs no later than that of LSQR and can be much earlier than the latter. In the meantime, we show how to extract a rank $k$ approximation from the rank $k+1$ approximation to $A$ generated in CGME at iteration $k$ , which is as accurate as the rank $k$ approximation in LSQR. Exploiting such rank $k$ approximation, we propose a modified CGME (MCGME) method whose regularization ability is shown to be very comparable to that of LSQR. For LSMR, we present a number of results and prove that its regularization ability is as good as that of LSQR and the two methods compute the 2-norm filtering best regularized solutions with essentially the same accuracy. We also show that the semi-convergence of LSMR always occurs no sooner than that of LSQR.

As a windfall, making use of the analysis approach used for CGME, we improve a fundamental bound, Theorem 9.3 of Halko et al. [16], for the accuracy of the truncated rank $k$ approximation to $A$ generated by randomized algorithms, which have been intensively studied and have been used in numerous disciplines over the years. As remarked by Halko et al. in [16] (cf. Remark 9.1 there), their bound appears “conservative, but a complete theoretical understanding lacks.” Our new bounds for the approximation accuracy are not only unconditionally sharper than theirs but also reveal how the truncation step damages the accuracy of the rank $k$ approximation.

The paper is organized as follows. In Section 2, we review LSQR, CGME and LSMR. In Section 3, we briefly state some results on LSQR in [32, 33] and take LSQR as reference to assess the regularization ability of CGME and LSMR. In Section 4, we derive a number of regularization properties of CGME and propose the MCGME method. In Section 5, we consider the accuracy of the truncated rank $k$ randomized approximation [16] and present sharper bounds. In Section 6, we study the regularization ability of LSMR. In Section 7, we report numerical experiments to confirm our theory. We conclude the paper in Section 8.

Throughout the paper, we denote by $\mathcal{K}_{k}(C,w)=span\{w,Cw,\ldots,C^{k-1}w\}$ the $k$ dimensional Krylov subspace generated by the matrix $\mathit{C}$ and the vector $\mathit{w}$ , and by the bold letter $\mathbf{0}$ the zero matrix with orders clear from the context.

2 The LSQR, CGME and LSMR algorithms

These three algorithms are all based on the Lanczos bidiagonalization process, which computes two orthonormal bases $\{q_{1},q_{2},\dots,q_{k}\}$ and $\{p_{1},p_{2},\dots,p_{k+1}\}$ of $\mathcal{K}_{k}(A^{T}A,A^{T}b)$ and $\mathcal{K}_{k+1}(AA^{T},b)$ for $k=1,2,\ldots,n$ , respectively. We describe the process as Algorithm 1.

Algorithm 1: $k$-step Lanczos bidiagonalization process

1. Take $p_{1}=b/\|b\|\in\mathbb{R}^{m}$, and define $\beta_{1}{q_{0}}=\mathbf{0}$.

2. For $j=1,2,\ldots,k$

(a) $r=A^{T}p_{j}-\beta_{j}{q_{j-1}}$

(b) $\alpha_{j}=\|r\|;\ q_{j}=r/\alpha_{j}$

(c) $z=Aq_{j}-\alpha_{j}{p_{j}}$

(d) $\beta_{j+1}=\|z\|;\ p_{j+1}=z/\beta_{j+1}.$
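The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration rather than a production implementation: the function name `lanczos_bidiag` is hypothetical, and the full reorthogonalization of $r$ and $z$ is an added numerical safeguard that is not part of Algorithm 1 as stated (in exact arithmetic it changes nothing).

```python
import numpy as np

def lanczos_bidiag(A, b, k):
    """k steps of Lanczos bidiagonalization of A started with p_1 = b/||b||.

    Returns P (m x (k+1)), Q (n x k), the lower bidiagonal B ((k+1) x k)
    holding alpha_j on the diagonal and beta_{j+1} below it, and beta_1 = ||b||.
    """
    m, n = A.shape
    P = np.zeros((m, k + 1))
    Q = np.zeros((n, k))
    B = np.zeros((k + 1, k))
    beta1 = np.linalg.norm(b)
    P[:, 0] = b / beta1
    for j in range(k):
        # Step (a): r = A^T p_j - beta_j q_{j-1}; the second term vanishes for j = 1.
        r = A.T @ P[:, j]
        if j > 0:
            r -= B[j, j - 1] * Q[:, j - 1]
        # Added safeguard (not in Algorithm 1): reorthogonalize against earlier q's.
        r -= Q[:, :j] @ (Q[:, :j].T @ r)
        alpha = np.linalg.norm(r)              # step (b)
        Q[:, j] = r / alpha
        z = A @ Q[:, j] - alpha * P[:, j]      # step (c)
        # Added safeguard (not in Algorithm 1): reorthogonalize against earlier p's.
        z -= P[:, :j + 1] @ (P[:, :j + 1].T @ z)
        beta = np.linalg.norm(z)               # step (d)
        P[:, j + 1] = z / beta
        B[j, j], B[j + 1, j] = alpha, beta
    return P, Q, B, beta1

# Hypothetical small example.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 20))
b = rng.standard_normal(30)
P, Q, B, beta1 = lanczos_bidiag(A, b, 8)
```

The sketch does not guard against breakdown (a zero $\alpha_{j}$ or $\beta_{j+1}$), which is acceptable only for generic data such as the random example above.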

Algorithm 1 can be written in the matrix form

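$$AQ_{k}=P_{k+1}B_{k}, \tag{9}$$

$$A^{T}P_{k+1}=Q_{k}B_{k}^{T}+\alpha_{k+1}q_{k+1}(e_{k+1}^{(k+1)})^{T}, \tag{10}$$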

where $e_{k+1}^{(k+1)}$ denotes the $(k+1)$ -th canonical basis vector of $\mathbb{R}^{k+1}$ , $P_{k+1}=(p_{1},p_{2},\ldots,p_{k+1})$ , $Q_{k}=(q_{1},q_{2},\ldots,q_{k})$ and

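$$B_{k}=\left(\begin{array}{cccc}\alpha_{1}&&&\\ \beta_{2}&\alpha_{2}&&\\ &\beta_{3}&\ddots&\\ &&\ddots&\alpha_{k}\\ &&&\beta_{k+1}\end{array}\right)\in\mathbb{R}^{(k+1)\times k}. \tag{11}$$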

It is known from (9) that

$$B_{k}=P_{k+1}^{T}AQ_{k}. \tag{12}$$

Algorithm 1 cannot break down before step $n$ when $\sigma_{i},\ i=1,2,\ldots,n$ , are simple since $b$ is supposed to have nonzero components in the directions of $u_{i},\ i=1,2,\ldots,n$ . The singular values $\theta_{i}^{(k)},\ i=1,2,\ldots,k$ of $B_{k}$ , called the Ritz values of $A$ with respect to the left and right subspaces $span\{P_{k+1}\}$ and $span\{Q_{k}\}$ , are all simple.

Write $\mathcal{V}_{k}^{R}=\mathcal{K}_{k}(A^{T}A,A^{T}b)$ and $\beta_{1}=\|b\|$ . At iteration $k$ , LSQR [41] solves

$$\|Ax_{k}^{lsqr}-b\|=\min_{x\in\mathcal{V}_{k}^{R}}\|Ax-b\|$$

for the iterate

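$$x_{k}^{lsqr}=Q_{k}y_{k}^{lsqr} \ \mbox{ with } \ y_{k}^{lsqr}=\arg\min_{y\in\mathbb{R}^{k}}\|B_{k}y-\beta_{1}e_{1}^{(k+1)}\|=\beta_{1}B_{k}^{\dagger}e_{1}^{(k+1)}, \tag{13}$$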

where $e_{1}^{(k+1)}$ is the first canonical basis vector of $\mathbb{R}^{k+1}$ , and $\|Ax_{k}^{lsqr}-b\|=\|B_{k}y_{k}^{lsqr}-\beta_{1}e_{1}^{(k+1)}\|$ decreases monotonically with respect to $k$ .
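The LSQR iterate can be formed directly from its definition: project onto the Krylov basis, solve the small $(k+1)\times k$ least squares problem $\min\|B_{k}y-\beta_{1}e_{1}^{(k+1)}\|$, and map back with $Q_{k}$. The sketch below does this explicitly for illustration; it is not the recursive short-recurrence implementation used in practice, and the names `lanczos_bidiag` and `lsqr_iterate` as well as the reorthogonalization are assumptions of this example.

```python
import numpy as np

def lanczos_bidiag(A, b, k):
    """k steps of Algorithm 1 with full reorthogonalization (an added safeguard)."""
    m, n = A.shape
    P, Q, B = np.zeros((m, k + 1)), np.zeros((n, k)), np.zeros((k + 1, k))
    beta1 = np.linalg.norm(b)
    P[:, 0] = b / beta1
    for j in range(k):
        r = A.T @ P[:, j]
        if j > 0:
            r -= B[j, j - 1] * Q[:, j - 1]
        r -= Q[:, :j] @ (Q[:, :j].T @ r)
        B[j, j] = np.linalg.norm(r)
        Q[:, j] = r / B[j, j]
        z = A @ Q[:, j] - B[j, j] * P[:, j]
        z -= P[:, :j + 1] @ (P[:, :j + 1].T @ z)
        B[j + 1, j] = np.linalg.norm(z)
        P[:, j + 1] = z / B[j + 1, j]
    return P, Q, B, beta1

def lsqr_iterate(A, b, k):
    """x_k = Q_k y_k with y_k = argmin ||B_k y - beta_1 e_1||; also returns Q_k."""
    _, Q, B, beta1 = lanczos_bidiag(A, b, k)
    rhs = np.zeros(k + 1)
    rhs[0] = beta1                                  # beta_1 * e_1^{(k+1)}
    y, *_ = np.linalg.lstsq(B, rhs, rcond=None)
    return Q @ y, Q

# Hypothetical small example.
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 20))
b = rng.standard_normal(30)
x5, Q5 = lsqr_iterate(A, b, 5)
residuals = [np.linalg.norm(A @ lsqr_iterate(A, b, k)[0] - b) for k in range(1, 7)]
```

Here `residuals` holds $\|Ax_{k}^{lsqr}-b\|$ for $k=1,\ldots,6$, which should be non-increasing in line with the monotonicity stated above.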

CGME [4, 17, 18, 27, 28] is the CG method implicitly applied to $\min\|AA^{T}y-b\|$ or $AA^{T}y=b$ with $x=A^{T}y$ , and it solves the problem

$$\|x_{naive}-x_{k}^{cgme}\|=\min_{x\in\mathcal{V}_{k}^{R}}\|x_{naive}-x\|$$

for the iterate $x_{k}^{cgme}$ . The error norm $\|x_{naive}-x_{k}^{cgme}\|$ decreases monotonically with respect to $k$ . Let $\bar{B}_{k}\in\mathbb{R}^{k\times k}$ be the matrix consisting of the first $k$ rows of $B_{k}$ , i.e.,

$$\bar{B}_{k}=P_{k}^{T}AQ_{k}. \tag{14}$$

Then the CGME iterate

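$$x_{k}^{cgme}=Q_{k}y_{k}^{cgme} \ \mbox{ with } \ y_{k}^{cgme}=\beta_{1}\bar{B}_{k}^{-1}e_{1}^{(k)} \tag{15}$$

and $\|Ax_{k}^{cgme}-b\|=\beta_{k+1}|(e_{k}^{(k)})^{T}y_{k}^{cgme}|$ with $e_{k}^{(k)}$ the $k$-th canonical basis vector of $\mathbb{R}^{k}$.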
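The CGME iterate $x_{k}^{cgme}=Q_{k}y_{k}^{cgme}$ with $y_{k}^{cgme}=\beta_{1}\bar{B}_{k}^{-1}e_{1}^{(k)}$ can be sketched by solving the small lower bidiagonal system explicitly. The example below assumes a square nonsingular $A$ so that $x_{naive}=A^{-1}b$; the names `lanczos_bidiag` and `cgme_iterate`, the reorthogonalization, and the test matrix with singular values between 1 and 10 are hypothetical choices for illustration only.

```python
import numpy as np

def lanczos_bidiag(A, b, k):
    """k steps of Algorithm 1 with full reorthogonalization (an added safeguard)."""
    m, n = A.shape
    P, Q, B = np.zeros((m, k + 1)), np.zeros((n, k)), np.zeros((k + 1, k))
    beta1 = np.linalg.norm(b)
    P[:, 0] = b / beta1
    for j in range(k):
        r = A.T @ P[:, j]
        if j > 0:
            r -= B[j, j - 1] * Q[:, j - 1]
        r -= Q[:, :j] @ (Q[:, :j].T @ r)
        B[j, j] = np.linalg.norm(r)
        Q[:, j] = r / B[j, j]
        z = A @ Q[:, j] - B[j, j] * P[:, j]
        z -= P[:, :j + 1] @ (P[:, :j + 1].T @ z)
        B[j + 1, j] = np.linalg.norm(z)
        P[:, j + 1] = z / B[j + 1, j]
    return P, Q, B, beta1

def cgme_iterate(A, b, k):
    """Return x_k = Q_k y_k, y_k = beta_1 * Bbar_k^{-1} e_1, Q_k and beta_{k+1}."""
    _, Q, B, beta1 = lanczos_bidiag(A, b, k)
    Bbar = B[:k, :k]                       # first k rows of B_k (lower bidiagonal)
    rhs = np.zeros(k)
    rhs[0] = beta1                         # beta_1 * e_1^{(k)}
    y = np.linalg.solve(Bbar, rhs)
    return Q @ y, y, Q, B[k, k - 1]

# Hypothetical square, well-conditioned example.
rng = np.random.default_rng(4)
n = 15
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.linspace(10.0, 1.0, n)) @ V.T
b = rng.standard_normal(n)
x_naive = np.linalg.solve(A, b)
x6, y6, Q6, beta7 = cgme_iterate(A, b, 6)
cgme_errors = [np.linalg.norm(x_naive - cgme_iterate(A, b, k)[0]) for k in range(1, 7)]
```

In this setting the iterate coincides with the orthogonal projection of $x_{naive}$ onto $\mathcal{V}_{k}^{R}$, which is one way to see why the error norm $\|x_{naive}-x_{k}^{cgme}\|$ decreases monotonically.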

LSMR [4, 12] is mathematically equivalent to MINRES [40] applied to the normal equation $A^{T}Ax=A^{T}b$ of (1), and it solves

$$\|A^{T}(b-Ax_{k}^{lsmr})\|=\min_{x\in\mathcal{V}_{k}^{R}}\|A^{T}(b-Ax)\|$$

for the iterate $x_{k}^{lsmr}$ . The residual norm $\|A^{T}(b-Ax_{k}^{lsmr})\|$ of the normal equation decreases monotonically with respect to $k$ , and the iterate

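$$x_{k}^{lsmr}=Q_{k}y_{k}^{lsmr} \ \mbox{ with } \ y_{k}^{lsmr}=\arg\min_{y\in\mathbb{R}^{k}}\|(B_{k}^{T}B_{k},\alpha_{k+1}\beta_{k+1}e_{k}^{(k)})^{T}y-\alpha_{1}\beta_{1}e_{1}^{(k+1)}\|. \tag{16}$$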
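The LSMR iterate can be sketched from its projected formulation: with $x=Q_{k}y$, the normal-equation residual $A^{T}(b-Ax)$ is expressed in the basis $Q_{k+1}$, which gives a $(k+1)\times k$ least squares problem whose matrix stacks $B_{k}^{T}B_{k}$ on top of the row $\alpha_{k+1}\beta_{k+1}(e_{k}^{(k)})^{T}$ and whose right-hand side is $\alpha_{1}\beta_{1}e_{1}^{(k+1)}$. The code below solves that small problem explicitly for illustration; it is not the recursive LSMR implementation, and the names `lanczos_bidiag` and `lsmr_iterate` and the reorthogonalization are assumptions of this example. One extra bidiagonalization step is run only to obtain $\alpha_{k+1}$.

```python
import numpy as np

def lanczos_bidiag(A, b, k):
    """k steps of Algorithm 1 with full reorthogonalization (an added safeguard)."""
    m, n = A.shape
    P, Q, B = np.zeros((m, k + 1)), np.zeros((n, k)), np.zeros((k + 1, k))
    beta1 = np.linalg.norm(b)
    P[:, 0] = b / beta1
    for j in range(k):
        r = A.T @ P[:, j]
        if j > 0:
            r -= B[j, j - 1] * Q[:, j - 1]
        r -= Q[:, :j] @ (Q[:, :j].T @ r)
        B[j, j] = np.linalg.norm(r)
        Q[:, j] = r / B[j, j]
        z = A @ Q[:, j] - B[j, j] * P[:, j]
        z -= P[:, :j + 1] @ (P[:, :j + 1].T @ z)
        B[j + 1, j] = np.linalg.norm(z)
        P[:, j + 1] = z / B[j + 1, j]
    return P, Q, B, beta1

def lsmr_iterate(A, b, k):
    """Return x_k = Q_k y_k, Q_k and the projected residual norm."""
    _, Q, B, beta1 = lanczos_bidiag(A, b, k + 1)   # extra step yields alpha_{k+1}
    Bk = B[:k + 1, :k]
    alpha_next, beta_next = B[k, k], B[k, k - 1]   # alpha_{k+1}, beta_{k+1}
    last_row = np.zeros((1, k))
    last_row[0, -1] = alpha_next * beta_next
    M = np.vstack([Bk.T @ Bk, last_row])           # (k+1) x k projected matrix
    rhs = np.zeros(k + 1)
    rhs[0] = B[0, 0] * beta1                       # alpha_1 * beta_1 * e_1
    y, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return Q[:, :k] @ y, Q[:, :k], np.linalg.norm(M @ y - rhs)

# Hypothetical small example.
rng = np.random.default_rng(5)
A = rng.standard_normal((30, 20))
b = rng.standard_normal(30)
x5, Q5, proj_res = lsmr_iterate(A, b, 5)
```

The returned `proj_res` equals $\|A^{T}(b-Ax_{k}^{lsmr})\|$ up to rounding, since the columns of $Q_{k+1}$ are orthonormal.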

3 Some results on LSQR in [32, 33]

From $\beta_{1}e_{1}^{(k+1)}=P_{k+1}^{T}b$ and (13) we have

$$x_{k}^{lsqr}=Q_{k}B_{k}^{\dagger}P_{k+1}^{T}b, \tag{17}$$

which is the minimum 2-norm solution to the problem that perturbs $A$ in (1) to its rank $k$ approximation $P_{k+1}B_{k}Q_{k}^{T}$ . Recall that $\|A-A_{k}\|=\sigma_{k+1}$ . Analogous to (8), LSQR now solves a sequence of problems

$$\min\|x\| \ \mbox{ subject to } \ \|P_{k+1}B_{k}Q_{k}^{T}x-b\|=\min \tag{18}$$

for $x_{k}^{lsqr}$ starting with $k=1$ onwards, where $A$ in (1) is replaced by a rank $k$ approximation $P_{k+1}B_{k}Q_{k}^{T}$ of it. Therefore, if $P_{k+1}B_{k}Q_{k}^{T}$ is a near best rank $k$ approximation to $A$ with an approximate accuracy $\sigma_{k+1}$ and the singular values $\theta_{i}^{(k)},\ i=1,2,\ldots,k$ of $B_{k}$ approximate the $k$ large $\sigma_{i}$ in natural order for $k=1,2,\ldots,k_{0}$ , then LSQR has the same regularization ability as the TSVD method and thus has the full regularization. See [32] for more elaborations.

The analysis of the TSVD method and the Tikhonov regularization method [22, 24] shows that the core requirement on a regularization method is to acquire the $k_{0}$ dominant SVD components of $A$ and meanwhile suppress the remaining $n-k_{0}$ SVD components. Therefore, the more accurate the rank $k$ approximation to $A$ is and the better the $k$ nonzero singular values of a projection matrix approximate some of the $k_{0}$ large singular values of $A$, the better the regularization ability of the method is, so that the best regularized solution obtained by it is more accurate.

Define

$$\gamma_{k}^{lsqr}=\|A-P_{k+1}B_{k}Q_{k}^{T}\|, \tag{19}$$

which measures the accuracy of the rank $k$ approximation $P_{k+1}B_{k}Q_{k}^{T}$ to $A$ involved in LSQR. Since the best rank $k$ approximation $A_{k}$ satisfies $\|A-A_{k}\|=\sigma_{k+1}$ , we have

$$\gamma_{k}^{lsqr}\geq\sigma_{k+1}. \tag{20}$$

The author in [33] introduces the definition of a near best rank $k$ approximation to $A$ : For LSQR, $P_{k+1}B_{k}Q_{k}^{T}$ is called a near best rank $k$ approximation to $A$ if $\gamma_{k}^{lsqr}$ is closer to $\sigma_{k+1}$ than to $\sigma_{k}$ :

$$\sigma_{k+1}\leq\gamma_{k}^{lsqr}<\frac{\sigma_{k}+\sigma_{k+1}}{2}. \tag{21}$$

Based on the accurate estimates established in [32] for the 2-norm distances between the underlying Krylov subspace $\mathcal{V}_{k}^{R}$ and the $k$ dimensional dominant right singular subspace $span\{V_{k}\}$ for severely, moderately and mildly ill-posed problems, the author [33] has derived accurate estimates for $\gamma_{k}^{lsqr}$ and a number of approximation properties of $\theta_{i}^{(k)},\ i=1,2,\ldots,k$ for the three kinds of ill-posed problems. The results have shown that, for severely and moderately ill-posed problems with suitable $\rho>1$ and $\alpha>1$ and for $k=1,2,\ldots,k_{0}$, $P_{k+1}B_{k}Q_{k}^{T}$ must be a near best rank $k$ approximation to $A$, and the $k$ Ritz values $\theta_{i}^{(k)}$ approximate the large singular values $\sigma_{i}$ of $A$ in natural order. This means that LSQR has the full regularization for these two kinds of problems with suitable $\rho>1$ and $\alpha>1$. However, for moderately ill-posed problems with $\alpha>1$ not large enough and for mildly ill-posed problems, $P_{k+1}B_{k}Q_{k}^{T}$ is generally not a near best rank $k$ approximation, and the $k$ Ritz values $\theta_{i}^{(k)}$ do not approximate the large singular values of $A$ in natural order for some $k\leq k^{*}$.

In particular, the author [33, Theorem 5.1] has proved the following three results:

[TABLE]

with

[TABLE]

This notation and these results will be used later.

4 The regularization of CGME

Note that $P_{k}^{T}b=\beta_{1}e_{1}^{(k)}$ . We obtain

$$x_{k}^{cgme}=Q_{k}\bar{B}_{k}^{-1}P_{k}^{T}b.$$

Therefore, analogous to (8) and (18), CGME solves a sequence of problems

$$\min\|x\| \ \mbox{ subject to } \ \|P_{k}\bar{B}_{k}Q_{k}^{T}x-b\|=\min \tag{31}$$

for the regularized solution $x_{k}^{cgme}$ starting with $k=1$ onwards, where $A$ in (1) is replaced by a rank $k$ approximation $P_{k}\bar{B}_{k}Q_{k}^{T}$ of it.

Just as LSQR, if $P_{k}\bar{B}_{k}Q_{k}^{T}$ is a near best rank $k$ approximation to $A$ and the $k$ singular values of $\bar{B}_{k}$ approximate the large ones of $A$ in natural order for $k=1,2,\ldots,k_{0}$ , then CGME has the full regularization.

By (9), (10) and (12), the rank $k$ approximation involved in LSQR is

$$P_{k+1}B_{k}Q_{k}^{T}=AQ_{k}Q_{k}^{T}.$$

By (19), we have $\gamma_{k}^{lsqr}=\|A(I-Q_{k}Q_{k}^{T})\|.$ For CGME, by (10) and (14), we obtain

[TABLE]

Therefore, $x_{k}^{cgme}$ is the solution to (31) in which the rank $k$ approximation to $A$ is $P_{k}\bar{B}_{k}Q_{k}^{T}=P_{k}P_{k}^{T}A$ , whose approximation accuracy is

[TABLE]

Theorem 1.

For the rank $k$ approximations $P_{k}P_{k}^{T}A=P_{k}\bar{B}_{k}Q_{k}^{T}$ to $A$ , $k=1,2,\ldots,n-1$ , with the definition $\gamma_{0}^{lsqr}=\|A\|$ we have

[TABLE]

*Proof.* We give two proofs of the upper bound in (35). The first is as follows. Since $P_{k+1}P_{k+1}^{T}(I-P_{k+1}P_{k+1}^{T})=\mathbf{0}$, from (10) we obtain

[TABLE]

which is the upper bound in (35) by replacing the index $k+1$ with $k$ .

Taking $k=n$ in (12) and augmenting $P_{n+1}$ such that $P=(P_{n+1},\widehat{P})\in\mathbb{R}^{m\times m}$ is orthogonal, we have

[TABLE]

where all the entries $\alpha_{i}$ and $\beta_{i+1}$ , $i=1,2,\ldots,n$ , of $B_{n}$ are positive, and $Q_{n}\in\mathbb{R}^{n\times n}$ is orthogonal. Then by the orthogonal invariance of the 2-norm we obtain

[TABLE]

with $G_{k}$ defined by (27). It is straightforward to justify that the singular values of $G_{k}\in\mathbb{R}^{(n-k+1)\times(n-k)}$ strictly interlace those of $(\beta_{k+1}e_{1},G_{k})\in\mathbb{R}^{(n-k+1)\times(n-k+1)}$ by noting that $(\beta_{k+1}e_{1},G_{k})^{T}(\beta_{k+1}e_{1},G_{k})$ is an unreduced symmetric tridiagonal matrix, from which and $\|G_{k}\|=\gamma_{k}^{lsqr}$ the lower bound of (35) follows.

Based on (38), we can also give the second proof of the upper bound in (35). Observe from (27) that $(\beta_{k+1}e_{1},G_{k})$ is the matrix obtained by deleting the first row of $G_{k-1}$. Applying the strict interlacing property of singular values to $(\beta_{k+1}e_{1},G_{k})$ and $G_{k-1}$, we obtain $\gamma_{k-1}^{lsqr}=\|G_{k-1}\|>\|(\beta_{k+1}e_{1},G_{k})\|=\gamma_{k}^{cgme}$, which yields the upper bound of (35).

From (38), notice that $(\beta_{k+2}e_{1},G_{k+1})$ is the matrix obtained by deleting the first row of $(\beta_{k+1}e_{1},G_{k})$ and then the first column, which is zero, of the resulting matrix. Applying the strict interlacing property of singular values to $(\beta_{k+2}e_{1},G_{k+1})$ and $(\beta_{k+1}e_{1},G_{k})$ establishes (36).

(35) indicates that $P_{k}P_{k}^{T}A=P_{k}\bar{B}_{k}Q_{k}^{T}$ is definitely a less accurate rank $k$ approximation to $A$ than $AQ_{k}Q_{k}^{T}=P_{k+1}B_{k}Q_{k}^{T}$ in LSQR. (36) shows the strict monotonic decreasing property of $\gamma_{k}^{cgme}$ . Moreover, keep in mind that $\gamma_{k}^{lsqr}\geq\sigma_{k+1}$ . Then a combination of it and the results in Section 3 indicates that, unlike $P_{k+1}B_{k}Q_{k}^{T}$ in LSQR, there is no guarantee that $P_{k}\bar{B}_{k}Q_{k}^{T}$ is a near best rank $k$ approximation to $A$ even for severely and moderately ill-posed problems, because $\gamma_{k}^{cgme}$ simply lies between $\gamma_{k}^{lsqr}$ and $\gamma_{k-1}^{lsqr}$ and there do not exist any sufficient conditions on $\rho>1$ and $\alpha>1$ that enforce $\gamma_{k}^{cgme}$ to be closer to $\gamma_{k}^{lsqr}$ , let alone closer to $\sigma_{k+1}$ . Therefore, based on the accuracy of the rank $k$ approximations in CGME and LSQR, we come to the conclusion that the regularization ability of CGME cannot be superior and is generally inferior to that of LSQR. Furthermore, since there is no guarantee that $P_{k}\bar{B}_{k}Q_{k}^{T}$ is a near best rank $k$ approximation for severely and moderately ill-posed problems with suitable $\rho>1$ and $\alpha>1$ , CGME may not have the full regularization for these two kinds of problems.
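The ordering discussed above, with $\gamma_{k}^{cgme}$ lying between $\gamma_{k}^{lsqr}$ and $\gamma_{k-1}^{lsqr}$ and with $\gamma_{k}^{lsqr}\geq\sigma_{k+1}$, can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the names `lanczos_bidiag` and `rank_k_accuracies`, the reorthogonalization, and the synthetic matrix with singular values $0.8^{i}$ are hypothetical, and a single example does not prove anything.

```python
import numpy as np

def lanczos_bidiag(A, b, k):
    """k steps of Algorithm 1 with full reorthogonalization (an added safeguard)."""
    m, n = A.shape
    P, Q, B = np.zeros((m, k + 1)), np.zeros((n, k)), np.zeros((k + 1, k))
    beta1 = np.linalg.norm(b)
    P[:, 0] = b / beta1
    for j in range(k):
        r = A.T @ P[:, j]
        if j > 0:
            r -= B[j, j - 1] * Q[:, j - 1]
        r -= Q[:, :j] @ (Q[:, :j].T @ r)
        B[j, j] = np.linalg.norm(r)
        Q[:, j] = r / B[j, j]
        z = A @ Q[:, j] - B[j, j] * P[:, j]
        z -= P[:, :j + 1] @ (P[:, :j + 1].T @ z)
        B[j + 1, j] = np.linalg.norm(z)
        P[:, j + 1] = z / B[j + 1, j]
    return P, Q, B, beta1

def rank_k_accuracies(A, b, kmax):
    """gamma_k^{lsqr} and gamma_k^{cgme} for k = 1, ..., kmax (2-norm errors)."""
    P, Q, B, _ = lanczos_bidiag(A, b, kmax)
    g_lsqr, g_cgme = [], []
    for k in range(1, kmax + 1):
        lsqr_approx = P[:, :k + 1] @ B[:k + 1, :k] @ Q[:, :k].T   # P_{k+1} B_k Q_k^T
        cgme_approx = P[:, :k] @ B[:k, :k] @ Q[:, :k].T           # P_k Bbar_k Q_k^T
        g_lsqr.append(np.linalg.norm(A - lsqr_approx, 2))
        g_cgme.append(np.linalg.norm(A - cgme_approx, 2))
    return np.array(g_lsqr), np.array(g_cgme), P, Q, B

# Hypothetical test matrix with geometrically decaying singular values.
rng = np.random.default_rng(6)
m, n, kmax = 40, 25, 8
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 0.8 ** np.arange(1, n + 1)
A = U @ np.diag(sigma) @ V.T
b = rng.standard_normal(m)
g_lsqr, g_cgme, P, Q, B = rank_k_accuracies(A, b, kmax)
```

Comparing `g_lsqr`, `g_cgme` and `sigma` side by side shows how far each rank $k$ approximation is from the best possible accuracy $\sigma_{k+1}$ on this particular example.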

In the following we investigate the approximation behavior of the $k$ singular values $\bar{\theta}_{i}^{(k)}$ of $\bar{B}_{k},\ k=1,2,\ldots,n$ . Before proceeding, it is necessary to have a closer look at Algorithm 1 and distinguish some subtleties when $A$ is rectangular, i.e., $m>n$ , and square, i.e., $m=n$ , respectively.

Keep in mind that Algorithm 1 does not break down before step $n$ . For the rectangular case $m>n$ , Algorithm 1 is exactly what is presented there, all the $\alpha_{k}$ and $\beta_{k+1}$ are positive, $k=1,2,\ldots,n$ , and we generate $P_{n+1}$ and $Q_{n}$ at step $n$ and $\alpha_{n+1}=\beta_{n+2}=0$ . As a consequence, by definition (33), we have

[TABLE]

It is known from (37) that the singular values of $B_{n}$ are identical to the singular values $\sigma_{i},\ i=1,2,\ldots,n$ of $A$ . Therefore, the $n+1$ singular values of $\bar{B}_{n+1}$ are $\sigma_{i},\,i=1,2,\ldots,n$ and zero.

For the square case $m=n$ , however, we must have $\beta_{n+1}=0$ , that is, the last row of $B_{n}$ is zero; otherwise, we would obtain an $n\times(n+1)$ orthonormal matrix $P_{n+1}$ , which is impossible since $P_{n}$ is already an orthogonal matrix. After Algorithm 1 is run to completion, we have

[TABLE]

whose singular values $\bar{\theta}_{i}^{(n)}=\sigma_{i},\,i=1,2,\ldots,n$ .

By the definition (33) of $\bar{B}_{k}$ , from (10) and the above description, for both the rectangular and square cases we obtain

[TABLE]

with $n^{*}=n+1$ for $m>n$ and $n^{*}=n$ for $m=n$ , which are unreduced symmetric tridiagonal matrices. For $m=n$ , the eigenvalues of $AA^{T}$ are just $\sigma_{i}^{2},\ i=1,2,\ldots,n$ , all of which are simple and positive; for $m>n$ , the eigenvalues of $AA^{T}$ are $\sigma_{i}^{2},\,i=1,2,\ldots,n$ plus $m-n$ zeros, denoted by $\sigma_{n+1}^{2}=\cdots=\sigma_{m}^{2}=0$ for our later use. Therefore, by the definition of $n^{*}$ , the eigenvalues of $\bar{B}_{n^{*}}\bar{B}_{n^{*}}^{T}$ are $\sigma_{i}^{2},\ i=1,2,\ldots,n^{*}$ .

Notice that $\bar{B}_{k}\bar{B}_{k}^{T}$ is nothing but the projection matrix of $AA^{T}$ onto the $k$ dimensional Krylov subspace $\mathcal{K}_{k}(AA^{T},b)$ . More precisely, $\bar{B}_{k}\bar{B}_{k}^{T}$ is generated by the $k$ -step symmetric Lanczos tridiagonalization process applied to $AA^{T}$ starting with $p_{1}=b/\|b\|$ , and the eigenvalues of $\bar{B}_{k}\bar{B}_{k}^{T}$ generally approximate extreme eigenvalues of $AA^{T}$ ; see, e.g., [3, 4, 43] for details. Particularly, the smallest eigenvalue $(\bar{\theta}_{k}^{(k)})^{2}$ of $\bar{B}_{k}\bar{B}_{k}^{T}$ generally converges to the smallest eigenvalue $\sigma_{n^{*}}^{2}$ of $AA^{T}$ as $k$ increases, which is $\sigma_{n+1}^{2}=0$ for $m>n$ and $\sigma_{n}^{2}>0$ for $m=n$ . In contrast, for $B_{k}$ , its smallest singular value $\theta_{k}^{(k)}>\sigma_{n}$ unconditionally until $\theta_{n}^{(n)}=\sigma_{n}$ .

We next give a number of close relationships between $\bar{\theta}_{i}^{(k)}$ and $\theta_{i}^{(k)}$ as well as between them and the singular values $\sigma_{i}$ of $A$ , which are crucial to compare the regularizing effects of CGME with those of LSQR.

Theorem 2.

Denote by $\bar{\theta}_{i}^{(k)}$ and $\theta_{i}^{(k)},\ i=1,2,\ldots,k$ the singular values of $\bar{B}_{k}$ and $B_{k}$ , respectively, labeled in decreasing order. Then

[TABLE]

Moreover,

[TABLE]

for $m=n$ and

[TABLE]

for $m>n$ .

Proof. Observe that $\bar{B}_{k}$ consists of the first $k$ rows of $B_{k}$ and all the $\alpha_{k}$ and $\beta_{k+1}$ are positive for $k=1,2,\ldots,n-1$ . Applying the strict interlacing property of singular values to $\bar{B}_{k}$ and $B_{k}$ , we obtain (41).

Note that, for $A$ both rectangular and square, we have $\theta_{i}^{(n)}=\sigma_{i},\ i=1,2,\ldots,n$ . Since $B_{k}$ is obtained by taking the first $k$ columns of $B_{n}$ and deleting the last $n-k$ zero rows of the resulting matrix, applying the strict interlacing property of singular values to $B_{k}$ and $B_{n}$ (cf. [44, p.198, Corollary 4.4]), for $k=1,2,\ldots,n-1$ we have

[TABLE]

Observe that $\bar{B}_{k}\bar{B}_{k}^{T},\,k=1,2,\ldots,n-1,$ are the $k\times k$ leading principal matrices of $\bar{B}_{n^{*}}\bar{B}_{n^{*}}^{T}$ , whose eigenvalues are $\sigma_{i}^{2},\ i=1,2,\ldots,n^{*}$ , and they are unreduced symmetric tridiagonal matrices. Applying the strict interlacing property of eigenvalues to $\bar{B}_{k}\bar{B}_{k}^{T}$ and $\bar{B}_{n^{*}}\bar{B}_{n^{*}}^{T}$ , for $k=1,2,\ldots,n-1$ we obtain

[TABLE]

from which and the definition of $n^{*}$ it follows that

[TABLE]

for $m=n$ and

[TABLE]

for $m>n$ . The above, together with (45) and (41), yields (42)–(44).

From Section 3, (42) and (44) indicate that, unlike the $k$ singular values $\theta_{i}^{(k)}$ of $B_{k}$ , which have been proved to interlace the first $k+1$ large singular values of $A$ and approximate the first $k$ ones in natural order for severely or moderately ill-posed problems with suitable $\rho>1$ or $\alpha>1$ [33], the lower bound for $\bar{\theta}_{k}^{(k)}$ is simply $\sigma_{n}$ for $m=n$ and zero for $m>n$ , and no better lower bound exists. This implies that $\bar{\theta}_{k}^{(k)}$ may be much smaller than $\sigma_{k+1}$ : it can be as small as $\sigma_{n}$ for $m=n$ and arbitrarily small for $m>n$ , independent of $\rho$ or $\alpha$ . In other words, the size of $\rho$ or $\alpha$ has no intrinsic effect on the size of $\bar{\theta}_{k}^{(k)}$ , and no choice of $\rho$ or $\alpha$ can force $\bar{\theta}_{k}^{(k)}$ to lie between $\sigma_{k+1}$ and $\sigma_{k}$ ; that is, the regularizing effects of CGME have intrinsic indeterminacy for severely and moderately ill-posed problems. Therefore, CGME may or may not have the full regularization for these two kinds of problems. On the other hand, even if the $\bar{\theta}_{i}^{(k)}$ approximate the first $k$ large singular values $\sigma_{i}$ in natural order, they are less accurate than the $k$ singular values $\theta_{i}^{(k)}$ of $B_{k}$ because of (42) and (44). Consequently, since the $\theta_{i}^{(k)}$ are always correspondingly larger than the $\bar{\theta}_{i}^{(k)}$ , the regularization ability of CGME cannot be superior and is generally inferior to that of LSQR.
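The interlacing relations discussed above can be checked numerically. The following is a minimal NumPy sketch, assuming that Algorithm 1 is the standard lower Lanczos (Golub–Kahan) bidiagonalization; the helper `golub_kahan`, the synthetic test matrix, and the tolerance `tol` are illustrative choices and not part of the paper. Full reorthogonalization is used so that the computed quantities stay close to their exact-arithmetic counterparts.

```python
import numpy as np

def golub_kahan(A, b, steps):
    """Lower Lanczos bidiagonalization with full reorthogonalization.

    Returns P, Q (steps+1 columns each) and alpha, beta, where
    alpha[j] = alpha_{j+1} and beta[j] = beta_{j+1} in 1-based notation.
    """
    m, n = A.shape
    P, Q = np.zeros((m, steps + 1)), np.zeros((n, steps + 1))
    alpha, beta = np.zeros(steps + 1), np.zeros(steps + 1)
    beta[0] = np.linalg.norm(b)
    P[:, 0] = b / beta[0]
    for j in range(steps + 1):
        q = A.T @ P[:, j] - (beta[j] * Q[:, j - 1] if j > 0 else 0.0)
        q -= Q[:, :j] @ (Q[:, :j].T @ q)              # reorthogonalize against Q
        alpha[j] = np.linalg.norm(q)
        Q[:, j] = q / alpha[j]
        if j < steps:
            p = A @ Q[:, j] - alpha[j] * P[:, j]
            p -= P[:, :j + 1] @ (P[:, :j + 1].T @ p)  # reorthogonalize against P
            beta[j + 1] = np.linalg.norm(p)
            P[:, j + 1] = p / beta[j + 1]
    return P, Q, alpha, beta

# Hypothetical rectangular test problem with simple, decaying singular values.
rng = np.random.default_rng(0)
m, n, k = 40, 30, 6
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 0.8 ** np.arange(n)
A = (U * sigma) @ V.T
b = A @ rng.standard_normal(n) + 1e-3 * rng.standard_normal(m)

P, Q, alpha, beta = golub_kahan(A, b, k)
B_bar_next = np.diag(alpha) + np.diag(beta[1:], -1)   # \bar B_{k+1}, (k+1) x (k+1)
B_k = B_bar_next[:, :k]                               # B_k, (k+1) x k
B_bar_k = B_bar_next[:k, :k]                          # \bar B_k, k x k

theta = np.linalg.svd(B_k, compute_uv=False)               # LSQR quantities
theta_bar = np.linalg.svd(B_bar_k, compute_uv=False)       # CGME quantities
theta_bar_next = np.linalg.svd(B_bar_next, compute_uv=False)

tol = 1e-10  # rounding slack for the comparisons below
print("theta_bar^(k)   :", theta_bar)
print("theta^(k)       :", theta)
print("theta_bar^(k+1) :", theta_bar_next[:k])
print("sigma           :", sigma[:k])
```

In this sketch the printed rows should appear in nondecreasing order from top to bottom, entry by entry, which is the ordering that the text attributes to (41), (44) and the comparison with $\bar{B}_{k+1}$.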

A final note is that, unlike for $m=n$ , CGME may be at risk for $m>n$ since $\bar{\theta}_{k}^{(k)}$ converges to zero rather than $\sigma_{n}$ as $k$ increases and can be arbitrarily small. As a result, the projected problem $\bar{B}_{k}y_{k}^{cgme}=\beta_{1}e_{1}^{(k)}$ may be even worse conditioned than (1), and $\|x_{k}^{cgme}\|=\|y_{k}^{cgme}\|$ may grow without bound as $k$ increases and exceed $\|x_{naive}\|$ for a given (1).

In what follows we establish more results on the regularization of CGME and get more insight into it. It is known, e.g., [22, p.146] that the LSQR iterate $x_{k}^{lsqr}$ takes the following filtered SVD expansion:

[TABLE]

where the filters

[TABLE]

These results have been extensively used to study the regularizing effects of LSQR; see, e.g., [22, 23, 32]. We now prove that the CGME iterate $x_{k}^{cgme}$ also takes a filtered SVD expansion similar to (46) and (47), but its proof is much more involved than that of (46) and (47).

Theorem 3.

The CGME iterate $x_{k}^{cgme}$ has the filtered SVD expansion

[TABLE]

where the filters

[TABLE]

Proof. Let $y_{naive}=(AA^{T})^{\dagger}b$ be the minimal 2-norm solution to $\min_{y}\|AA^{T}y-b\|$ . Recall Algorithm 1. For this minimization problem, starting with $y_{0}^{cgme}=\mathbf{0}$ , at iteration $k$ the CG method extracts $y_{k}^{cgme}$ from the $k$ dimensional Krylov subspace

[TABLE]

It is well known from, e.g., [38], that the residual of $y_{k}^{cgme}$ is

[TABLE]

where $r_{k}(\lambda)$ is the $k$ -th residual, or Ritz, polynomial with the normalization $r_{k}(0)=1$ , whose $k$ roots are the Ritz values $(\bar{\theta}_{j}^{(k)})^{2}$ of $AA^{T}$ with respect to the subspace $span\{P_{k}\}$ ; see (40). Therefore, we have

[TABLE]

From the full SVD (4) of $A$ , write $U=(U_{n},U_{\perp})$ . Then we have $A=U_{n}\Sigma V^{T}$ , the compact SVD of $A$ . It is straightforward to see that

[TABLE]

Therefore, by $y_{naive}=(AA^{T})^{\dagger}b$ , premultiplying both sides of (50) by $(AA^{T})^{\dagger}$ yields

[TABLE]

from which it follows that

[TABLE]

By the SVD (4) of $A$ , we have

[TABLE]

Hence for $k=1,2,\ldots,n$ from (51) and (52) we obtain

[TABLE]

with $f_{i}^{(k,cgme)}$ defined by (49). In terms of $x_{k}^{cgme}=A^{T}y_{k}^{cgme}$ and $A=U_{n}\Sigma V^{T}$ , premultiplying both sides of the above relation by $A^{T}$ and exploiting $U_{n}^{T}U_{n}=I$ , we have

[TABLE]

Then making use of this relation, $A^{T}u_{i}=\sigma_{i}v_{i}$ and (53), we obtain (48).
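The filtered expansion can be compared numerically with the CGME iterate computed from the projected problem $\bar{B}_{k}y_{k}^{cgme}=\beta_{1}e_{1}^{(k)}$ , $x_{k}^{cgme}=Q_{k}y_{k}^{cgme}$ . The sketch below assumes that the filters in (49) take the form $f_{i}^{(k,cgme)}=1-\prod_{j=1}^{k}\big((\bar{\theta}_{j}^{(k)})^{2}-\sigma_{i}^{2}\big)/(\bar{\theta}_{j}^{(k)})^{2}$ , which is what the Ritz polynomial argument in the proof suggests; this form, the helper `golub_kahan`, and the small square test matrix are assumptions made for illustration.

```python
import numpy as np

def golub_kahan(A, b, steps):
    """Lower Lanczos bidiagonalization with full reorthogonalization.

    alpha[j] = alpha_{j+1}, beta[j] = beta_{j+1} in 1-based notation.
    """
    m, n = A.shape
    P, Q = np.zeros((m, steps + 1)), np.zeros((n, steps + 1))
    alpha, beta = np.zeros(steps + 1), np.zeros(steps + 1)
    beta[0] = np.linalg.norm(b)
    P[:, 0] = b / beta[0]
    for j in range(steps + 1):
        q = A.T @ P[:, j] - (beta[j] * Q[:, j - 1] if j > 0 else 0.0)
        q -= Q[:, :j] @ (Q[:, :j].T @ q)
        alpha[j] = np.linalg.norm(q)
        Q[:, j] = q / alpha[j]
        if j < steps:
            p = A @ Q[:, j] - alpha[j] * P[:, j]
            p -= P[:, :j + 1] @ (P[:, :j + 1].T @ p)
            beta[j + 1] = np.linalg.norm(p)
            P[:, j + 1] = p / beta[j + 1]
    return P, Q, alpha, beta

# Hypothetical small, well-conditioned square problem (m = n).
rng = np.random.default_rng(0)
n, k = 12, 5
U0, _ = np.linalg.qr(rng.standard_normal((n, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U0 * 0.8 ** np.arange(n)) @ V0.T
b = A @ rng.standard_normal(n) + 1e-3 * rng.standard_normal(n)

# CGME iterate from the k x k projected problem.
P, Q, alpha, beta = golub_kahan(A, b, k - 1)        # P_k, Q_k
B_bar = np.diag(alpha) + np.diag(beta[1:], -1)      # \bar B_k
rhs = np.zeros(k)
rhs[0] = beta[0]
x_cgme = Q @ np.linalg.solve(B_bar, rhs)

# Same iterate from the filtered SVD expansion (assumed filter form).
theta_bar = np.linalg.svd(B_bar, compute_uv=False)
U, s, Vt = np.linalg.svd(A)
f = 1.0 - np.prod(1.0 - (s[:, None] / theta_bar[None, :]) ** 2, axis=1)
x_filtered = Vt.T @ (f * (U.T @ b) / s)

print("filters:", f)
print("difference:", np.linalg.norm(x_cgme - x_filtered))
```

A filter value outside $[0,1]$ for some $i$ signals that $\sigma_{i}$ lies between two of the $\bar{\theta}_{j}^{(k)}$ in a way that amplifies the corresponding SVD component, which is the mechanism exploited in the subsequent semi-convergence argument.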

Based on Theorems 2–3, we can prove the following important result.

Theorem 4.

Let $k_{cgme}^{*}$ and $k_{lsqr}^{*}$ be iterations at which the semi-convergence of CGME and LSQR occurs, respectively, $k_{0}$ the transition point of the TSVD method. Then

$$k_{cgme}^{*}\leq k_{lsqr}^{*}\leq k_{0},\qquad(54)$$

that is, the semi-convergence of CGME always occurs no later than that of LSQR and the TSVD method.

Proof. The result $k_{lsqr}^{*}\leq k_{0}$ has been proved in [32, Theorem 3.1]. Next we first prove that $k_{cgme}^{*}\leq k_{0}$ .

Recall that the best TSVD solution

[TABLE]

and the fact that a 2-norm filtering best possible solution must capture the $k_{0}$ dominant SVD components of $A$ and suppress the $n-k_{0}$ small SVD components of $A$ .

For CGME, from (42) and (44) we have $\bar{\theta}_{k}^{(k)}<\sigma_{k}$ . Therefore, at iteration $k_{0}+1$ we must have $\bar{\theta}_{k_{0}+1}^{(k_{0}+1)}<\sigma_{k_{0}+1}$ . If the $\bar{\theta}_{i}^{(k)}$ approximate the large $\sigma_{i}$ in natural order for $k=1,2,\ldots,k_{0}$ , then by (49) we have $f_{i}^{(k,cgme)}\rightarrow 1$ for $i=1,2,\ldots,k$ and $f_{i}^{(k,cgme)}\rightarrow 0$ for $i=k+1,\ldots,n$ . On the other hand, by (49) we have $f_{k_{0}+1}^{(k_{0}+1,cgme)}=\mathcal{O}(1)$ . Compared with the best TSVD solution, by (48) the above shows that the CGME iterate $x_{k}^{cgme}$ captures the $k$ dominant SVD components of $A$ and filters out the $n-k$ small ones. As a result, $x_{k}^{cgme}$ improves until iteration $k_{0}$ , and the semi-convergence of CGME occurs at iteration $k_{cgme}^{*}=k_{0}$ .

If the $\bar{\theta}_{j}^{(k)}$ do not converge to the large singular values of $A$ in natural order and $\bar{\theta}_{k}^{(k)}<\sigma_{k_{0}+1}$ for some iteration $k\leq k_{0}$ for the first time, then $x_{k}^{cgme}$ is already deteriorated by the noise $e$ before iteration $k$ : Suppose that $\sigma_{j^{*}}<\bar{\theta}_{k}^{(k)}<\sigma_{k_{0}+1}$ with $j^{*}$ the smallest integer $j^{*}>k_{0}+1$ . Then we can easily justify from (49) that $f_{i}^{(k,cgme)}\in(0,1)$ and tends to zero monotonically for $i=j^{*},j^{*}+1,\ldots,n$ , but

[TABLE]

since the first factor is non-positive and the second factor is positive by noting that $\bar{\theta}_{j}^{(k)}>\sigma_{i}$ , $j=1,2,\ldots,k-1$ for $i=k_{0}+1,\ldots,j^{*}-1$ . As a result, $f_{i}^{(k,cgme)}\geq 1$ for $i=k_{0}+1,\ldots,j^{*}-1$ , showing that $x_{k}^{cgme}$ has been deteriorated by the noise $e$ and the semi-convergence of CGME has occurred at some iteration $k^{*}_{cgme}<k_{0}$ .

Finally, we prove $k_{cgme}^{*}\leq k_{lsqr}^{*}$ . Notice that $\bar{\theta}_{k}^{(k)}<\theta_{k}^{(k)}$ means that the first iteration $k$ such that $\bar{\theta}_{k}^{(k)}<\sigma_{k_{0}+1}$ for CGME is no more than the one such that $\theta_{k}^{(k)}<\sigma_{k_{0}+1}$ for LSQR. Therefore, applying a similar proof to that of the semi-convergence of CGME to (46)–(47), it is direct that the semi-convergence of CGME occurs no later than that of LSQR, i.e., $k_{cgme}^{*}\leq k_{lsqr}^{*}$ .

It is seen from the above proof that, due to $\bar{\theta}_{k}^{(k)}<\theta_{k}^{(k)}$ , the semi-convergence of CGME can occur much earlier than that of LSQR.

We can, informally, deduce more features of CGME. By definition, the optimality of CGME means that

[TABLE]

holds unconditionally for $k=1,2,\ldots,n$ . Since $x_{k}^{cgme}$ and $x_{k}^{lsqr}$ converge to $x_{true}$ until iterations $k_{cgme}^{*}$ and $k_{lsqr}^{*}$ at which the semi-convergence of CGME and LSQR occurs, respectively, it is known that, for $k\leq k_{cgme}^{*}$ and $k\leq k_{lsqr}^{*}$ , $\|x_{true}-x_{k}^{cgme}\|$ and $\|x_{true}-x_{k}^{lsqr}\|$ are negligible relative to $\|x_{naive}-x_{true}\|$ , which is supposed to be very large in the context of discrete ill-posed problems. As a consequence, we have

[TABLE]

Since the first terms in the right-hand sides of (56) and (57) are the same constant, a combination of (55) with (56) and (57) means that

[TABLE]

generally holds until $k=k_{cgme}^{*}$ . That is, $x_{k}^{cgme}$ should be at least as accurate as $x_{k}^{lsqr}$ until the semi-convergence of CGME occurs. Then for $k>k_{cgme}^{*}$ , according to Theorem 4, $x_{k}^{lsqr}$ continues approximating $x_{true}$ as $k$ increases until iteration $k=k_{lsqr}^{*}$ , at which LSQR ultimately computes a more accurate approximation $x_{k_{lsqr}^{*}}^{lsqr}$ to $x_{true}$ than $x_{k_{cgme}^{*}}^{cgme}$ .

Further interesting findings follow. Observe that after Lanczos bidiagonalization is run $k$ steps, we have already obtained $\bar{B}_{k+1}$ , $P_{k+1}$ and $Q_{k+1}$ , but LSQR and CGME exploit only $B_{k},Q_{k}$ and $\bar{B}_{k},Q_{k}$ , respectively. Since $\alpha_{k+1}>0$ for $k\leq n-1$ , applying the strict interlacing property of singular values to $B_{k}$ and $\bar{B}_{k+1}$ , we have

[TABLE]

Note from (44) that $\bar{\theta}_{i}^{(k+1)}<\sigma_{i},\ i=1,2,\ldots,k+1$ . Combining (59) with (44), we see that as approximations to the first large $k$ singular values $\sigma_{i}$ of $A$ , although the $k$ singular values $\bar{\theta}_{i}^{(k)}$ of $\bar{B}_{k}$ are less accurate than the singular values $\theta_{i}^{(k)}$ of $B_{k}$ , the first $k$ singular values $\bar{\theta}_{i}^{(k+1)}$ of $\bar{B}_{k+1}$ are more accurate than the $\theta_{i}^{(k)}$ correspondingly.

Based on the above property and (33), we next show how to extract a best possible rank $k$ approximation to $A$ from the available rank $k+1$ matrix $P_{k+1}\bar{B}_{k+1}Q_{k+1}^{T}=P_{k+1}P_{k+1}^{T}A$ generated by Algorithm 1.

Theorem 5.

Let $\bar{C}_{k}$ be the best rank $k$ approximation to $\bar{B}_{k+1}$ with respect to the 2-norm. Then for $k=1,2,\ldots,n-1$ we have

[TABLE]

where $\bar{\theta}_{k+1}^{(k+1)}$ is the smallest singular value of $\bar{B}_{k+1}$ and $\gamma_{k+1}^{cgme}$ is defined by (34).

Proof. Write $A-P_{k+1}\bar{C}_{k}Q_{k+1}^{T}=A-P_{k+1}\bar{B}_{k+1}Q_{k+1}^{T}+P_{k+1}(\bar{B}_{k+1}-\bar{C}_{k})Q_{k+1}^{T}$ . Then exploiting (33), we obtain

[TABLE]

By the definition of $\bar{C}_{k}$ and (33), it is easily justified that $P_{k+1}\bar{C}_{k}Q_{k+1}^{T}$ is the best rank $k$ approximation to $P_{k+1}\bar{B}_{k+1}Q_{k+1}^{T}=P_{k+1}P_{k+1}^{T}A$ in the 2-norm as $P_{k+1}$ and $Q_{k+1}$ are column orthonormal. Keep in mind that $A_{k}$ is the best rank $k$ approximation to $A$ . Since $P_{k+1}P_{k+1}^{T}A_{k}$ is a rank $k$ approximation to $P_{k+1}P_{k+1}^{T}A$ , we obtain

[TABLE]

Note that the first term in the right-hand side of (63) is just $\gamma_{k+1}^{cgme}$ . Therefore, it follows from (63) that (60) holds.

Since $P_{k+1}$ and $Q_{k+1}$ are column orthonormal and $\bar{C}_{k}$ is the best rank $k$ approximation to $\bar{B}_{k+1}$ , by the orthogonal invariance of the 2-norm we obtain

[TABLE]

which, together with (62), yields (61).

The bound (61) is always smaller than the bound (60) because of $\bar{\theta}_{k+1}^{(k+1)}<\sigma_{k+1}$ from (42) and (44). Indeed, the bound (60) can be conservative since we have amplified $\|P_{k+1}(\bar{B}_{k+1}-\bar{C}_{k})Q_{k+1}^{T}\|$ twice and obtained its bound $\sigma_{k+1}$ , which might be a considerable overestimate. Moreover, as we have explained previously, (42) and (44) show that $\bar{\theta}_{k+1}^{(k+1)}>\sigma_{n}$ may approach $\sigma_{n}$ for $m=n$ and $\bar{\theta}_{k+1}^{(k+1)}>0$ can be close to zero arbitrarily for $m>n$ . By definition (19) of $\gamma_{k}^{lsqr}$ , since $\gamma_{k+1}^{cgme}<\gamma_{k}^{lsqr}$ (cf. the upper bound of (35)), $\gamma_{k}^{lsqr}\geq\sigma_{k+1}>\bar{\theta}_{k+1}^{(k+1)}$ and $\|A-P_{k+1}\bar{C}_{k}Q_{k+1}^{T}\|\geq\sigma_{k+1}$ , the right-hand side of (61) satisfies

[TABLE]

Therefore, $\bar{\theta}_{k+1}^{(k+1)}+\gamma_{k+1}^{cgme}$ is as small as and can even be smaller than $\gamma_{k}^{lsqr}$ , meaning that $P_{k+1}\bar{C}_{k}Q_{k+1}^{T}$ is as accurate as the rank $k$ approximation $P_{k+1}B_{k}Q_{k}^{T}$ in LSQR.

Define $Q_{n+1}=(Q_{n},\mathbf{0})\in\mathbb{R}^{n\times(n+1)}$ , and note from (39) that $\bar{B}_{n+1}=(B_{n},\mathbf{0})$ . Recall that the singular values of $\bar{B}_{n+1}$ and $B_{n}$ are $\bar{\theta}_{i}^{(n+1)},\ i=1,2,\ldots,n+1$ and $\theta_{i}^{(n)},\ i=1,2,\ldots,n$ , respectively, and $\bar{\theta}_{i}^{(n+1)}=\theta_{i}^{(n)}=\sigma_{i},\ i=1,2,\ldots,n$ and $\bar{\theta}_{n+1}^{(n+1)}=0$ . From (37) and the definition of $\bar{C}_{n}$ , since $\bar{B}_{n+1}$ is of rank $n$ , we have

[TABLE]

and

[TABLE]

Based on Theorem 5 and the analysis followed, just as done in CGME and LSQR, we can replace $A$ in (1) by the rank $k$ approximation $P_{k+1}\bar{C}_{k}Q_{k+1}^{T}$ and propose a modified CGME (MCGME) method that solves

[TABLE]

for the regularized solution $x_{k}^{mcgme}=Q_{k+1}y_{k}^{mcgme}$ of (1) with

[TABLE]

starting with $k=1$ onwards. MCGME is expected to have the same regularization ability as LSQR because (i) the $k$ nonzero singular values $\bar{\theta}_{i}^{(k+1)}$ of $\bar{C}_{k}$ are more accurate than the $k$ singular values $\theta_{i}^{(k)}$ of $B_{k}$ as approximations to the first $k$ singular values of $A$ and (ii) $P_{k+1}\bar{C}_{k}Q_{k+1}^{T}$ is a rank $k$ approximation which is as accurate as $P_{k+1}B_{k}Q_{k}^{T}$ in LSQR. Regarding implementations, we comment that the singular values and the left and right singular vectors of $\bar{C}_{k}^{\dagger}$ are already available when $\bar{C}_{k}$ is extracted from the SVD of $\bar{B}_{k+1}$ , whose computational cost is $\mathcal{O}(k^{3})$ flops. As a result, by (65) we can compute $y_{k}^{mcgme}$ at a cost of $\mathcal{O}(k^{2})$ flops. A difference from CGME and LSQR is that MCGME seeks $x_{k}^{mcgme}$ in the $k+1$ dimensional Krylov subspace $\mathcal{K}_{k+1}(A^{T}A,A^{T}b)$ rather than in $\mathcal{K}_{k}(A^{T}A,A^{T}b)$ . Numerical experiments will justify that MCGME has very comparable regularizing effects to LSQR and can obtain the best regularized solutions with very similar accuracy to those by LSQR. We will not consider the by-product MCGME method further in this paper.
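The MCGME construction can be sketched in a few lines of NumPy. The sketch assumes that Algorithm 1 is the standard lower Lanczos bidiagonalization and that (65) reads $y_{k}^{mcgme}=\bar{C}_{k}^{\dagger}\beta_{1}e_{1}^{(k+1)}$ , as the surrounding text suggests; the helper `golub_kahan`, the test problem, and all variable names are hypothetical. It also evaluates the two sides of the bound (61) as described in Theorem 5.

```python
import numpy as np

def golub_kahan(A, b, steps):
    """Lower Lanczos bidiagonalization with full reorthogonalization.

    alpha[j] = alpha_{j+1}, beta[j] = beta_{j+1} in 1-based notation.
    """
    m, n = A.shape
    P, Q = np.zeros((m, steps + 1)), np.zeros((n, steps + 1))
    alpha, beta = np.zeros(steps + 1), np.zeros(steps + 1)
    beta[0] = np.linalg.norm(b)
    P[:, 0] = b / beta[0]
    for j in range(steps + 1):
        q = A.T @ P[:, j] - (beta[j] * Q[:, j - 1] if j > 0 else 0.0)
        q -= Q[:, :j] @ (Q[:, :j].T @ q)
        alpha[j] = np.linalg.norm(q)
        Q[:, j] = q / alpha[j]
        if j < steps:
            p = A @ Q[:, j] - alpha[j] * P[:, j]
            p -= P[:, :j + 1] @ (P[:, :j + 1].T @ p)
            beta[j + 1] = np.linalg.norm(p)
            P[:, j + 1] = p / beta[j + 1]
    return P, Q, alpha, beta

# Hypothetical test problem.
rng = np.random.default_rng(0)
m, n, k = 40, 30, 6
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 0.8 ** np.arange(n)
A = (U0 * sigma) @ V0.T
b = A @ rng.standard_normal(n) + 1e-3 * rng.standard_normal(m)

P, Q, alpha, beta = golub_kahan(A, b, k)           # P_{k+1}, Q_{k+1}
B_bar = np.diag(alpha) + np.diag(beta[1:], -1)     # \bar B_{k+1}
W, s, Zt = np.linalg.svd(B_bar)                    # O(k^3) small SVD

# \bar C_k: keep the k dominant singular triplets of \bar B_{k+1}.
C_bar = (W[:, :k] * s[:k]) @ Zt[:k]

# y = C_bar^+ (beta_1 e_1), formed from the triplets already at hand.
y = Zt[:k].T @ (beta[0] * W[0, :k] / s[:k])
x_mcgme = Q @ y

# Quantities appearing in Theorem 5.
err = np.linalg.norm(A - P @ C_bar @ Q.T, 2)
gamma_next = np.linalg.norm(A - P @ (P.T @ A), 2)  # gamma_{k+1}^{cgme}
print("error:", err, " bound (61):", s[k] + gamma_next, " sigma_{k+1}:", sigma[k])
```

Forming `y` from the stored singular triplets avoids an explicit pseudoinverse, which matches the $\mathcal{O}(k^{2})$ cost mentioned in the text once the small SVD is available.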

$\bar{C}_{k}$ may have some other potential applications. For example, when we are required to compute several largest singular triplets of a large scale matrix $A$ , we can use the nonzero singular values of $\bar{C}_{k}$ to replace the ones of $B_{k}$ as more accurate approximations to the largest singular values of $A$ in Lanczos bidiagonalization type algorithms [34]. In such a way, exploiting the SVD of $\bar{C}_{k}$ , we can also compute more accurate approximate left and right singular vectors of $A$ simultaneously. A development of such modified algorithms is beyond the scope of this paper.

5 The accuracy of truncated rank $k$ approximate SVDs by randomized algorithms

In this section, we deviate from the context of Krylov solvers. Using the analysis approach in the last section, we consider the accuracy of a truncated rank $k$ SVD approximation to $A$ constructed by standard randomized algorithms and their improved variants [16]. This topic has been intensively investigated in recent years; see the survey paper [16] and the references therein. Algorithm 2 is one of the two basic randomized algorithms from [16] for computing an approximate SVD and extracting a truncated rank $k$ approximate SVD from it. A minor difference from the other sections in this paper is that we drop the restrictions that the singular values of $A$ are simple and $m\geq n$ , that is, the singular values of $A$ are $\sigma_{1}\geq\sigma_{2}\geq\cdots\geq\sigma_{\min\{m,n\}}$ .

Algorithm 2: Randomized approximate SVD of $A$

• Input: Given $A\in\mathbb{R}^{m\times n}$ , a target rank $k$ , and an oversampling parameter $p$ satisfying $\ell=k+p\leq\min\{m,n\}$ .

• Output: a truncated rank $k$ approximate SVD $A_{(k)}$ of $A$ .

Stage A

1. Draw an $n\times\ell$ Gaussian random matrix $\Omega$ .

2. Form the $m\times\ell$ matrix $Y=A\Omega$ .

3. Compute the compact QR factorization $Y=PR$ .

Stage B

1. Form $B=P^{T}A$ .

2. Compute the compact SVD of the $\ell\times n$ matrix $B$ : $B=\widetilde{U}\widetilde{\Sigma}\widetilde{V}^{T}$ .

3. Set $\widehat{U}=P\widetilde{U}$ . Compute a rank $\ell$ SVD approximation $PP^{T}A=\widehat{U}\widetilde{\Sigma}\widetilde{V}^{T}$ to $A$ .

4. Let $B_{(k)}=\widetilde{U}_{k}\widetilde{\Sigma}_{(k)}\widetilde{V}_{k}^{T}$ be the best rank $k$ approximation to $B$ with the diagonal $\widetilde{\Sigma}_{(k)}$ being the first $k$ diagonals of $\widetilde{\Sigma}$ , and $\widetilde{U}_{k}$ and $\widetilde{V}_{k}$ the first $k$ columns of $\widetilde{U}$ and $\widetilde{V}$ , respectively. Form a truncated rank $k$ SVD approximation $A_{(k)}=PB_{(k)}=\widehat{U}_{k}\widetilde{\Sigma}_{(k)}\widetilde{V}_{k}^{T}$ to $A$ with $\widehat{U}_{k}=P\widetilde{U}_{k}$ .
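The two stages of Algorithm 2 translate almost line by line into NumPy. The following is a minimal dense-matrix sketch; the function name `randomized_truncated_svd`, its return convention, and the synthetic test matrix are illustrative assumptions, not taken from the paper or from [16].

```python
import numpy as np

def randomized_truncated_svd(A, k, p, rng):
    """Sketch of Algorithm 2: truncated rank-k approximate SVD of A.

    Returns (U_hat_k, s_k, Vt_k, P) with A_(k) = U_hat_k @ diag(s_k) @ Vt_k.
    """
    m, n = A.shape
    ell = k + p
    if ell > min(m, n):
        raise ValueError("k + p must not exceed min(m, n)")
    # Stage A: orthonormal basis P of the sampled range of A.
    Omega = rng.standard_normal((n, ell))      # n x ell Gaussian test matrix
    Y = A @ Omega                              # m x ell sample matrix
    P, _ = np.linalg.qr(Y)                     # compact QR, P is m x ell
    # Stage B: SVD of the small matrix B = P^T A, then truncation to rank k.
    B = P.T @ A                                # ell x n
    U_t, s_t, Vt_t = np.linalg.svd(B, full_matrices=False)
    U_hat_k = P @ U_t[:, :k]                   # first k approximate left singular vectors
    return U_hat_k, s_t[:k], Vt_t[:k], P

# Usage on a hypothetical matrix with geometrically decaying singular values.
rng = np.random.default_rng(0)
m, n, k, p = 60, 40, 5, 4
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U0 * 0.7 ** np.arange(n)) @ V0.T

U_hat_k, s_k, Vt_k, P = randomized_truncated_svd(A, k, p, rng)
A_k = (U_hat_k * s_k) @ Vt_k
print("||A - A_(k)|| =", np.linalg.norm(A - A_k, 2))
```

For large sparse or matrix-free $A$ one would avoid forming dense intermediates; the dense version is kept here only because it mirrors the steps of the algorithm directly.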

For the approximation accuracy of $A_{(k)}$ to $A$ , Halko et al. [16] establish a fundamental result (cf. Theorem 9.3 there):

$$\|A-A_{(k)}\|\leq\sigma_{k+1}+\|(I-PP^{T})A\|.\qquad(66)$$

Assume that the oversampling parameter $p\geq 4$ . Using probability theory, Halko et al. [16] have established a number of bounds for $\|(I-PP^{T})A\|$ in terms of $\sigma_{k+1}$ ; see, e.g., Theorems 10.5–10.8 and Corollaries 10.9–10.10 there. However, concerning (66), they point out in Remark 9.1 that "In the randomized setting, the truncation step appears to be less damaging than the error bound of Theorem 9.3 suggests, but we currently lack a complete theoretical understanding of its behavior." That is to say, the first term $\sigma_{k+1}$ in (66) is generally conservative and an overestimate.

Motivated by the proof of (61) in Theorem 5, we can improve (66) substantially and reveal why (66) is an overestimate. Let

$$\widetilde{\sigma}_{1}\geq\widetilde{\sigma}_{2}\geq\cdots\geq\widetilde{\sigma}_{k+p}\qquad(67)$$

be the singular values of $B=P^{T}A$ defined in Algorithm 2. It is clear from Algorithm 2 that

$$BB^{T}=P^{T}AA^{T}P$$

is a $(k+p)\times(k+p)$ symmetric matrix, which is the projection matrix of $AA^{T}$ onto the subspace $span\{P\}$ in the orthonormal basis of $\{p_{i}\}_{i=1}^{k+p}$ with $P=(p_{1},p_{2},\ldots,p_{k+p})$ , whose eigenvalues are $\widetilde{\sigma}_{i}^{2},\ i=1,2,\ldots,k+p$ . Keep in mind that the eigenvalues of $AA^{T}$ are $\sigma_{i}^{2},\ i=1,2,\ldots,\min\{m,n\}$ and $m-\min\{m,n\}$ zeros, denoted by $\sigma_{\min\{m,n\}+1}^{2}=\cdots=\sigma_{m}^{2}=0$ for later use.

Theorem 6.

For $A\in\mathbb{R}^{m\times n}$ , let $P$ and $A_{(k)}$ be defined as in Algorithm 2, and $\widetilde{\sigma}_{k+1}$ defined as in (67). Then

$$\|A-A_{(k)}\|\leq\widetilde{\sigma}_{k+1}+\|(I-PP^{T})A\|\qquad(68)$$

with

$$\sigma_{m-p+1}\leq\widetilde{\sigma}_{k+1}\leq\sigma_{k+1}.\qquad(69)$$

Proof. Since $P$ is orthonormal, the eigenvalues of $BB^{T}$ interlace those of $AA^{T}$ and satisfy (cf. [44, p.198, Corollary 4.4])

[TABLE]

from which (69) follows.

From Algorithm 2, we can write

[TABLE]

Since $B_{(k)}$ is the best rank $k$ approximation to $B$ , by the column orthonormality of $P$ we obtain

[TABLE]

which proves (68).

Remark 5.1.

This theorem indicates that $\widetilde{\sigma}_{k+1}$ never exceeds $\sigma_{k+1}$ and, for $m,n$ large and $k+p$ small, it may be much smaller than $\sigma_{k+1}$ . Specifically, $\widetilde{\sigma}_{k+1}$ can be as small as $\sigma_{m-p+1}$ . For $m>n$ , whenever $m-p+1>n$ , we have $\sigma_{m-p+1}=0$ . Consequently, the bound (68) is never worse than the bound (66) and is sharper than the latter when $\widetilde{\sigma}_{k+1}<\sigma_{k+1}$ . On the other hand, however, note that $\sigma_{k+1}\leq\|A-A_{(k)}\|$ . Therefore, if $\|(I-PP^{T})A\|<\sigma_{k+1}$ , we must have $\widetilde{\sigma}_{k+1}\approx\sigma_{k+1}$ , that is, $\widetilde{\sigma}_{k+1}$ dominates the bound (68). Summarizing the above, in response to Remark 9.1 in [16], we conclude that the truncation step does damage the approximation accuracy of the truncated rank $k$ approximation when $\|(I-PP^{T})A\|<\sigma_{k+1}$ and it is less damaging when $\|(I-PP^{T})A\|\geq\sigma_{k+1}$ .
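The comparison made in this remark can be illustrated numerically. The sketch below takes the bounds to be $\|A-A_{(k)}\|\leq\sigma_{k+1}+\|(I-PP^{T})A\|$ for (66) and $\|A-A_{(k)}\|\leq\widetilde{\sigma}_{k+1}+\|(I-PP^{T})A\|$ for (68), which is how the surrounding text describes them; the test matrix and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical test matrix with geometrically decaying singular values.
rng = np.random.default_rng(1)
m, n, k, p = 60, 40, 5, 4
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U0 * 0.7 ** np.arange(n)) @ V0.T
sigma = np.linalg.svd(A, compute_uv=False)

# Orthonormal P from a Gaussian sketch, as in Stage A of Algorithm 2.
P, _ = np.linalg.qr(A @ rng.standard_normal((n, k + p)))
B = P.T @ A
U_t, s_t, Vt_t = np.linalg.svd(B, full_matrices=False)   # s_t[i] = sigma-tilde_{i+1}
A_k = P @ ((U_t[:, :k] * s_t[:k]) @ Vt_t[:k])            # truncated rank-k approximation

proj_err = np.linalg.norm(A - P @ B, 2)       # ||(I - P P^T) A||
trunc_err = np.linalg.norm(A - A_k, 2)        # ||A - A_(k)||
bound_66 = sigma[k] + proj_err                # uses sigma_{k+1}
bound_68 = s_t[k] + proj_err                  # uses sigma-tilde_{k+1}

print("sigma_{k+1}       =", sigma[k])
print("sigma-tilde_{k+1} =", s_t[k])
print("||A - A_(k)||     =", trunc_err)
print("bound (66)        =", bound_66)
print("bound (68)        =", bound_68)
```

Varying `p` in this sketch changes the relative size of `proj_err` and `sigma[k]` and thereby shows which of the two regimes described in the remark applies.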

As we have seen, the column space of $P$ constructed by Algorithm 2 aims to capture the $(k+p)$ -dimensional dominant left singular subspace of $A$ . A variant of it is to capture the $(k+p)$ -dimensional dominant right singular subspace of $A$ . Mathematically, it amounts to applying Algorithm 2 to $A^{T}$ and computing a truncated rank $k$ SVD approximation $A_{(k)}$ to $A$ in a similar way. We call this variant Algorithm 3, for which (66) now becomes

[TABLE]

with the orthonormal $P\in\mathbb{R}^{n\times(k+p)}$ .

Note that the eigenvalues of $A^{T}A$ are $\sigma_{i}^{2},\ i=1,2,\ldots,\min\{m,n\}$ and $n-\min\{m,n\}$ zeros, denoted by $\sigma_{\min\{m,n\}+1}^{2}=\cdots=\sigma_{n}^{2}=0$ . Since the eigenvalues of $(AP)^{T}AP$ interlace those of $A^{T}A$ , using the same proof approach as that of Theorem 6, we can establish the following result.

Theorem 7.

For $A\in\mathbb{R}^{m\times n}$ , let $P$ and $A_{(k)}$ be defined as in Algorithm 3, and $\widetilde{\sigma}_{1}\geq\widetilde{\sigma}_{2}\geq\cdots\geq\widetilde{\sigma}_{k+p}$ be the singular values of $AP$ . Then

[TABLE]

with

[TABLE]

We comment that, in the case $m<n$ , whenever $n-p+1>m$ , we have $\sigma_{n-p+1}=0$ , and consequently the bound (71) is unconditionally superior to and can be substantially sharper than the bound (70) for $m,n$ large and $k+p$ small.

Remark 5.2.

If the singular values $\sigma_{i}$ of $A$ are all simple, by the strict interlacing properties of eigenvalues, the singular values of $B$ in Algorithms 2–3 are all simple too, and the lower and upper bounds in (69) and (72) are strict, i.e., $\widetilde{\sigma}_{k+1}<\sigma_{k+1}$ .

Remark 5.3.

(66) and (70) and Theorems 6–7 hold for all the truncated rank $k$ SVD approximations generated by the enhanced variants of Algorithms 2–3 in [16], where the only difference between the variants is the way that $P$ is generated. More generally, Theorems 6–7 are true for arbitrarily given orthonormal $P\in\mathbb{R}^{m\times(k+p)}$ and $P\in\mathbb{R}^{n\times(k+p)}$ with $k+p\leq\min\{m,n\}$ , respectively.

6 The regularization of LSMR

From Algorithm 1 we obtain

[TABLE]

Therefore, from (16), noting that $Q_{k+1}^{T}A^{T}b=\alpha_{1}\beta_{1}e_{1}^{(k+1)}$ , we have

[TABLE]

which means that LSMR solves the problem

[TABLE]

for the regularized solution $x_{k}^{lsmr}$ starting with $k=1$ onwards. In the meantime, it is direct to justify that the TSVD solution $x_{k}^{tsvd}$ solves the problem

[TABLE]

starting with $k=1$ onwards. Therefore, (75) and (76) deal with the normal equation $A^{T}Ax=A^{T}b$ of (1) by replacing $A^{T}A$ with its rank $k$ approximations $Q_{k+1}Q_{k+1}^{T}A^{T}AQ_{k}Q_{k}^{T}$ and $A_{k}^{T}A_{k}$ , respectively.

In view of (75) and (76), we need to accurately estimate the approximation accuracy $\|A^{T}A-Q_{k+1}Q_{k+1}^{T}A^{T}AQ_{k}Q_{k}^{T}\|$ and investigate how the singular values of $Q_{k+1}^{T}A^{T}AQ_{k}$ approximate the $k$ large singular values $\sigma_{i}^{2},\ i=1,2,\ldots,k$ of $A^{T}A$ . We are concerned with some intrinsic relationships between the regularizing effects of LSMR and those of LSQR and compare the regularization ability of the two methods.

By (17), (9), (10), (12) and $P_{k+1}P_{k+1}^{T}b=b$ , the LSQR iterate

[TABLE]

which is the solution to the problem

[TABLE]

that replaces $A^{T}A$ by its rank $k$ approximation $Q_{k}Q_{k}^{T}A^{T}AQ_{k}Q_{k}^{T}=Q_{k}B_{k}^{T}B_{k}Q_{k}^{T}$ in the normal equation $A^{T}Ax=A^{T}b$ . In this sense, the accuracy of such rank $k$ approximation is measured in terms of $\|A^{T}A-Q_{k}Q_{k}^{T}A^{T}AQ_{k}Q_{k}^{T}\|$ for LSQR.

Firstly, we present the following result, which compares the accuracy of two rank $k$ approximations involved in LSMR and LSQR in the sense of solving the normal equation $A^{T}Ax=A^{T}b$ .

Theorem 8.

For the rank $k$ approximations to $A^{T}A$ in (75) and (77), $k=1,2,\ldots,n-1$ , we have

[TABLE]

Proof. For the orthogonal matrix $Q_{n}$ generated by Algorithm 1, noticing that $\alpha_{n+1}=0$ , from (9) and (10) we obtain $Q_{n}^{T}A^{T}AQ_{n}=B_{n}^{T}B_{n}$ and

[TABLE]

where

[TABLE]

is the matrix obtained by deleting the $(k+1)\times k$ leading principal matrix of the symmetric tridiagonal matrix $B_{n}^{T}B_{n}$ and the first $k-1$ zero rows and $k$ zero columns of the resulting matrix, where $G_{k}$ is defined by (27) and $e_{1}^{(n-k)}$ is the first canonical vector of $\mathbb{R}^{n-k}$ .

On the other hand, it is direct to verify that

[TABLE]

where $F_{k}^{\prime}=\left(\alpha_{k+1}\beta_{k+1}e_{2}^{(n-k+1)},F_{k}\right)\in\mathbb{R}^{(n-k+1)\times(n-k+1)}$ with $e_{2}^{(n-k+1)}$ being the second canonical vector of $\mathbb{R}^{n-k+1}$ .

From (87) and (90), we obtain

[TABLE]

Since $G_{k}^{T}G_{k}$ is unreduced symmetric tridiagonal, its eigenvalues are all simple. Observe from (90) that

[TABLE]

Therefore, we know from [7, p.218] that the eigenvalues of $F_{k}^{T}F_{k}$ strictly interlace those of $(G_{k}^{T}G_{k})^{2}$ and are all simple. Furthermore, we see from (27) that $G_{k}$ is of full column rank, which means that the eigenvalues of $F_{k}^{T}F_{k}$ are all positive.

Note that the eigenvalues of $F_{k}F_{k}^{T}$ are those of $F_{k}^{T}F_{k}$ and zero. As a result, the eigenvalues of $F_{k}F_{k}^{T}$ are all simple. According to [7, p.218], we know from (92) that the eigenvalues of $F_{k}^{\prime}(F_{k}^{\prime})^{T}$ strictly interlace those of $F_{k}F_{k}^{T}$ . Therefore, we obtain

[TABLE]

which, from (79) and (91), establishes (78).

This theorem indicates that, as far as solving $A^{T}Ax=A^{T}b$ is concerned, the rank $k$ approximation in LSMR is more accurate than that in LSQR.
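The statement of the theorem can be checked on a small example by forming both rank $k$ approximations of $A^{T}A$ explicitly. This is a minimal sketch, assuming Algorithm 1 is the standard lower Lanczos bidiagonalization; the helper `golub_kahan` and the synthetic test problem are illustrative and not part of the paper.

```python
import numpy as np

def golub_kahan(A, b, steps):
    """Lower Lanczos bidiagonalization with full reorthogonalization.

    alpha[j] = alpha_{j+1}, beta[j] = beta_{j+1} in 1-based notation.
    """
    m, n = A.shape
    P, Q = np.zeros((m, steps + 1)), np.zeros((n, steps + 1))
    alpha, beta = np.zeros(steps + 1), np.zeros(steps + 1)
    beta[0] = np.linalg.norm(b)
    P[:, 0] = b / beta[0]
    for j in range(steps + 1):
        q = A.T @ P[:, j] - (beta[j] * Q[:, j - 1] if j > 0 else 0.0)
        q -= Q[:, :j] @ (Q[:, :j].T @ q)
        alpha[j] = np.linalg.norm(q)
        Q[:, j] = q / alpha[j]
        if j < steps:
            p = A @ Q[:, j] - alpha[j] * P[:, j]
            p -= P[:, :j + 1] @ (P[:, :j + 1].T @ p)
            beta[j + 1] = np.linalg.norm(p)
            P[:, j + 1] = p / beta[j + 1]
    return P, Q, alpha, beta

# Hypothetical test problem.
rng = np.random.default_rng(0)
m, n, k = 40, 30, 6
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U0 * 0.8 ** np.arange(n)) @ V0.T
b = A @ rng.standard_normal(n) + 1e-3 * rng.standard_normal(m)

_, Q, _, _ = golub_kahan(A, b, k)
Qk, Qk1 = Q[:, :k], Q                      # Q_k and Q_{k+1}
M = A.T @ A

# Rank-k approximations of A^T A used by LSMR and LSQR, respectively.
M_lsmr = Qk1 @ (Qk1.T @ M @ Qk) @ Qk.T
M_lsqr = Qk @ (Qk.T @ M @ Qk) @ Qk.T

err_lsmr = np.linalg.norm(M - M_lsmr, 2)
err_lsqr = np.linalg.norm(M - M_lsqr, 2)
print("LSMR approximation error:", err_lsmr)
print("LSQR approximation error:", err_lsqr)
```

The explicit $n\times n$ products are affordable only for small examples; they are used here purely to make the two error norms directly comparable.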

Recall that (19) measures the quality of the rank $k$ approximation involved in LSQR for the regularization problem (18). We now estimate the approximation accuracy of $Q_{k+1}Q_{k+1}^{T}A^{T}AQ_{k}Q_{k}^{T}$ to $A^{T}A$ in terms of $(\gamma_{k}^{lsqr})^{2}$ .

Theorem 9.

For $k=1,2,3,\ldots,n-1$ , let $\gamma_{k}^{lsqr}$ be defined as (19). For $k=2,3,\ldots,n-1$ we have

[TABLE]

with $0<m_{k}<1$ and $\gamma_{0}^{lsqr}=\|A\|$ . For $k=1,2,\ldots,n-2$ , the strict monotonic decreasing property holds:

[TABLE]

Proof. Combining (90) with (21) and (28), for $k=2,3,\ldots,n-1$ we obtain from [48, p.98] and [7, p.218] that

[TABLE]

with $0<m^{\prime}_{k}\leq 1$ and $0<m_{k}<m^{\prime}_{k}$ , from which the lower and upper bounds in (94) follow directly.

For $k=1$ , the equality in (96) is still true. From (28), we have $\alpha_{2}<\gamma_{1}^{lsqr},\ \beta_{2}<\|A\|=\gamma_{0}^{lsqr}$ . Therefore, we obtain

[TABLE]

from which it follows that (94) holds for $k=1$ .

From (87), we see that $F_{k+1}$ is the matrix that first deletes the first column and row of $F_{k}$ and then deletes the first zero column and row of the resulting matrix. Therefore, applying the interlacing property of singular values to $F_{k+1}$ and $F_{k}$ yields

[TABLE]

We next prove that the above " $\leq$ " is the strict " $<$ ". Since $B_{n}^{T}B_{n}=Q_{n}^{T}A^{T}AQ_{n}$ is an unreduced symmetric tridiagonal matrix, its singular values $\sigma_{i}^{2},\ i=1,2,\ldots,n$ are simple. Observe that $F_{k}$ is the matrix obtained by deleting the first $k$ columns of $B_{n}^{T}B_{n}$ and the first $k-1$ zero rows of the resulting matrix. Consequently, the singular values $\zeta_{i}^{(k)},\,i=1,2,\ldots,n-k$ of $F_{k}$ strictly interlace the simple singular values $\sigma_{i}^{2},\ i=1,2,\ldots,n$ of $B_{n}^{T}B_{n}$ and are thus simple for $k=1,2,\ldots,n-1$ . Moreover, the singular values of $F_{k+1}$ strictly interlace those of $F_{k}$ , which means that $\zeta_{1}^{(k+1)}<\zeta_{1}^{(k)}$ , i.e., $\|F_{k+1}\|<\|F_{k}\|$ , which proves (95).

Remark 6.1.

According to the results and analysis in [33], we have $\gamma_{k-1}^{lsqr}/\gamma_{k}^{lsqr}\sim\rho$ for severely ill-posed problems, and $\gamma_{k-1}^{lsqr}/\gamma_{k}^{lsqr}\sim(k/(k-1))^{\alpha}$ for moderately and mildly ill-posed problems. Therefore, the lower and upper bounds of (94) indicate that $\|A^{T}A-Q_{k+1}Q_{k+1}^{T}A^{T}AQ_{k}Q_{k}^{T}\|\sim(\gamma_{k}^{lsqr})^{2}$ .

Finally, let us investigate the relationship between the singular values of rank $k$ approximation matrices in LSMR and LSQR. From (73) and (12), we know that they are the singular values of $(B_{k}^{T}B_{k},\alpha_{k+1}\beta_{k+1}e_{k}^{(k)})^{T}$ and $B_{k}^{T}B_{k}$ , respectively.

Theorem 10.

Let $(\widetilde{\theta}_{1}^{(k)})^{2}>(\widetilde{\theta}_{2}^{(k)})^{2}>\cdots>(\widetilde{\theta}_{k}^{(k)})^{2}$ be the singular values of $(B_{k}^{T}B_{k},\alpha_{k+1}\beta_{k+1}e_{k}^{(k)})^{T}$ . Then for $i=1,2,\ldots,k$ we have

[TABLE]

Proof. Observe that $(B_{k}^{T}B_{k},\alpha_{k+1}\beta_{k+1}e_{k}^{(k)})^{T}$ is the matrix consisting of the first $k$ columns of $B_{n}^{T}B_{n}$ and deleting the last $n-k-1$ zero rows of the resulting matrix. As a result, since $\sigma_{i},\ i=1,2,\ldots,n$ , are simple, the singular values $(\widetilde{\theta}_{i}^{(k)})^{2}$ of $(B_{k}^{T}B_{k},\alpha_{k+1}\beta_{k+1}e_{k}^{(k)})^{T}$ strictly interlace the singular values $\sigma_{i}^{2}$ of $B_{n}^{T}B_{n}$ :

[TABLE]

and are simple, which gives the upper bound in (97).

Note that $(B_{k}^{T}B_{k},\alpha_{k+1}\beta_{k+1}e_{k}^{(k)})^{T}(B_{k}^{T}B_{k},\alpha_{k+1}\beta_{k+1}e_{k}^{(k)})$ has the $k+1$ eigenvalues $(\widetilde{\theta}_{i}^{(k)})^{4}$ and zero, and $(B_{k}^{T}B_{k})^{T}(B_{k}^{T}B_{k})=(B_{k}^{T}B_{k})^{2}$ is its $k\times k$ leading principal submatrix and has $k$ simple eigenvalues $(\theta_{i}^{(k)})^{4}$ . Therefore, $(\theta_{i}^{(k)})^{4}$ strictly interlace $(\widetilde{\theta}_{i}^{(k)})^{4}$ and zero, which proves the lower bound of (97).

On the other hand, we have

[TABLE]

Recall (27) that $\alpha_{k+1}<\gamma_{k}^{lsqr}$ and $\beta_{k+1}<\gamma_{k-1}^{lsqr}$ . By standard perturbation theory, we obtain

[TABLE]

from which it follows that (98) holds.
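The ordering used in this proof can be illustrated on a small hypothetical example. The sketch below assumes that (97) asserts $\theta_{i}^{(k)}<\widetilde{\theta}_{i}^{(k)}<\sigma_{i}$, as the proof describes, and compares the singular values of $B_{k}^{T}B_{k}$, of $(B_{k}^{T}B_{k},\alpha_{k+1}\beta_{k+1}e_{k}^{(k)})^{T}$ and of $B_{n}^{T}B_{n}$. The bidiagonal entries are made up for illustration.

```python
import numpy as np

n, k = 6, 3
alphas = 2.0 ** -np.arange(n)        # hypothetical alpha_1, ..., alpha_n
betas = 0.8 * 2.0 ** -np.arange(n)   # hypothetical beta_2, ..., beta_{n+1}

# Lower bidiagonal B and the unreduced tridiagonal T = B^T B.
B = np.zeros((n + 1, n))
B[np.arange(n), np.arange(n)] = alphas
B[np.arange(1, n + 1), np.arange(n)] = betas
T = B.T @ B

# Singular values are returned in decreasing order.
sigma_sq = np.linalg.svd(T, compute_uv=False)                    # sigma_i^2
theta_sq = np.linalg.svd(T[:k, :k], compute_uv=False)            # LSQR: theta_i^2
theta_tilde_sq = np.linalg.svd(T[:k + 1, :k], compute_uv=False)  # LSMR: tilde(theta)_i^2

for i in range(k):
    print(i + 1, theta_sq[i], theta_tilde_sq[i], sigma_sq[i])
```

Here `T[:k + 1, :k]` is the first $k$ columns of $T$ with the trailing zero rows removed, and `T[:k, :k]` is that matrix with its last row deleted, which is why the three columns of output are ordered from left to right.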

Remark 6.2.

(97) indicates that $\widetilde{\theta}_{i}^{(k)},\ i=1,2,\ldots,k$ approximate the first $k$ large singular values $\sigma_{i}$ more accurately than $\theta_{i}^{(k)}$. Particularly, since $\theta_{k}^{(k)}<\widetilde{\theta}_{k}^{(k)}$, the first iteration step $k$ such that $\widetilde{\theta}_{k}^{(k)}<\sigma_{k_{0}+1}$ must be no smaller than the $k$ such that $\theta_{k}^{(k)}<\sigma_{k_{0}+1}$. A combination of this and the previous analysis on the semi-convergence of CGME and LSQR implies that the semi-convergence of LSMR must occur no sooner than that of LSQR. On the other hand, (98) shows that $\widetilde{\theta}_{i}^{(k)}$ is bounded from above by $\theta_{i}^{(k)}$ as an approximation to $\sigma_{i}$, which, together with (97), implies that $\widetilde{\theta}_{i}^{(k)}$ and $\theta_{i}^{(k)}$ interact and $\theta_{i}^{(k)}$ cannot be considerably more accurate than $\widetilde{\theta}_{i}^{(k)}$ as approximations to the large singular values of $A$ for $i=1,2,\ldots,k$.

Remark 6.3.

A combination of Theorem 8 and the above two remarks means that the regularizing effects of LSMR are not inferior to those of LSQR and the best regularized solutions by LSMR are at least as accurate as those by LSQR, that is, LSMR has the same regularization ability as that of LSQR. Particularly, from the results on LSQR in Section 3, we conclude that LSMR has the full regularization for severely or moderately ill-posed problems with suitable $\rho>1$ or $\alpha>1$ .

A final note is that Huang and Jia [31] have derived the eigendecomposition, i.e., equivalent SVD, filtered expansion of MINRES iterates for $Ax=b$ with $A$ symmetric; see Theorem 3.1 there. The result can be directly adapted to the LSMR iterates $x_{k}^{lsmr}$ by keeping in mind that LSMR is mathematically equivalent to MINRES applied to the specific symmetric positive definite linear system $A^{T}Ax=A^{T}b$ .

7 Numerical experiments

All the computations are carried out in Matlab R2017b on an Intel Core i7-4790k 4.00 GHz processor with 16 GB RAM and the machine precision $\epsilon_{\rm mach}=2.22\times 10^{-16}$ under the Microsoft Windows 8 64-bit system.

We have tested LSQR, CGME, LSMR and MCGME on almost all the 1D and 2D problems from [2, 23, 25] and have observed similar phenomena. For the sake of length, we list only some of them in Table 1, where each problem takes its default parameter(s). We mention that the relatively easy 1D problems are all from [23, 25], where shaw, gravity and baart are severely ill-posed and phillips, heat and deriv2 are moderately ill-posed. The 2D image deblurring problems blur, fanbeamtomo and seismictomo are also from [23, 25], and the other 2D problems are from [2]. We notice that for blur and fanbeamtomo, although the orders $m$ and $n$ are already tens of thousands, their condition numbers $\sigma_{1}/\sigma_{n}$ are only 31.5 and 2472, respectively, which, intuitively, do not satisfy the definition of a discrete ill-posed problem, whose singular values decay and are centered at zero so that the ratio $\sigma_{1}/\sigma_{n}$ is very large. For each test problem, we compute $b_{true}=Ax_{true}$ and add a Gaussian white noise $e$ with zero mean to $b_{true}$ by prescribing the relative noise level

[TABLE]
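As an illustration of this setup, the sketch below scales a Gaussian white noise vector to a prescribed relative noise level. It assumes the common definition $\varepsilon=\|e\|/\|b_{true}\|$, which the elided display presumably states; the function name `add_white_noise` and the test vector are hypothetical and do not come from the paper's Matlab codes.

```python
import numpy as np

def add_white_noise(b_true, eps, rng):
    """Return (b, e) with b = b_true + e, where e is zero-mean Gaussian
    white noise rescaled so that ||e|| / ||b_true|| equals eps."""
    e = rng.standard_normal(b_true.shape)
    e *= eps * np.linalg.norm(b_true) / np.linalg.norm(e)
    return b_true + e, e

rng = np.random.default_rng(0)
b_true = np.linspace(1.0, 2.0, 50)   # stand-in for A @ x_true
b, e = add_white_noise(b_true, 1e-3, rng)
noise_level = np.linalg.norm(e) / np.linalg.norm(b_true)
print(noise_level)
```

Rescaling after sampling makes the realized noise level match $\varepsilon$ up to rounding, rather than only in expectation.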

We use the code lsqr_b.m of [23], where the reorthogonalization is exploited during Lanczos bidiagonalization in order to maintain the numerical orthogonality of $P_{k+1}$ and $Q_{k}$ . We have written the Matlab codes of CGME, LSMR and MCGME based on the same Lanczos bidiagonalization process used in lsqr_b.m.
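For readers who want to experiment, the following is a minimal Python sketch of Lanczos (Golub-Kahan) lower bidiagonalization with full reorthogonalization. It is not the code of lsqr_b.m: the function name, the Gram-Schmidt style of reorthogonalization and the random test matrix are all assumptions made for illustration, and breakdown is assumed not to occur.

```python
import numpy as np

def golub_kahan_bidiag(A, b, k):
    """k steps of lower bidiagonalization of A started with b.
    Returns P (m x (k+1)), Q (n x k) and lower bidiagonal B ((k+1) x k)
    with A @ Q = P @ B up to rounding. Assumes no breakdown."""
    m, n = A.shape
    P = np.zeros((m, k + 1))
    Q = np.zeros((n, k))
    B = np.zeros((k + 1, k))
    P[:, 0] = b / np.linalg.norm(b)
    for i in range(k):
        # alpha_{i+1} q_{i+1} = A^T p_{i+1} - beta_{i+1} q_i
        q = A.T @ P[:, i]
        if i > 0:
            q -= B[i, i - 1] * Q[:, i - 1]
        q -= Q[:, :i] @ (Q[:, :i].T @ q)          # reorthogonalize against Q
        B[i, i] = np.linalg.norm(q)
        Q[:, i] = q / B[i, i]
        # beta_{i+2} p_{i+2} = A q_{i+1} - alpha_{i+1} p_{i+1}
        p = A @ Q[:, i] - B[i, i] * P[:, i]
        p -= P[:, :i + 1] @ (P[:, :i + 1].T @ p)  # reorthogonalize against P
        B[i + 1, i] = np.linalg.norm(p)
        P[:, i + 1] = p / B[i + 1, i]
    return P, Q, B

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
b = rng.standard_normal(8)
P, Q, B = golub_kahan_bidiag(A, b, 4)
orth_P = np.linalg.norm(P.T @ P - np.eye(5))
orth_Q = np.linalg.norm(Q.T @ Q - np.eye(4))
residual = np.linalg.norm(A @ Q - P @ B)
print(orth_P, orth_Q, residual)
```

The three printed quantities measure the loss of orthogonality of $P_{k+1}$ and $Q_{k}$ and the residual of the relation $AQ_{k}=P_{k+1}B_{k}$.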

For all the 1D problems and the 2D seismictomo, we report the results on them for $\varepsilon=10^{-3}$ ; for all the 2D problems except blur and fanbeamtomo, we report the results on them for $\varepsilon=5\times 10^{-3}$ . For several other $\varepsilon\in[10^{-3},5\times 10^{-2}]$ , we have the same findings. For blur and fanbeamtomo, however, we will observe some fundamental distinctions between the convergence features for $\varepsilon$ lying in this practical interval. Figures 2–7 depict the convergence processes of LSQR, CGME, LSMR and MCGME, and we give some key details, including the iterations $k^{*}$ at which the semi-convergence of an algorithm occurs and the relative error of the best regularized solution obtained by each algorithm, which is defined by

[TABLE]

for LSQR. Similar relative errors are defined for CGME, LSMR and MCGME with the superscript “$lsqr$” replaced by “$cgme$”, “$lsmr$” and “$mcgme$”, respectively. In addition, as a comparison standard on the solution accuracy, we depict the semi-convergence process of the TSVD method for blur and seismictomo, and report the relative errors of the best TSVD regularized solutions $x_{k_{0}}^{tsvd}$ with $k_{0}$ the transition point at which the semi-convergence of TSVD occurs. For the other nine larger 2D problems, we cannot compute the SVDs of the matrices because our computer runs out of memory. We mention that for the first six 1D test problems we have found that the best regularized solutions obtained by the TSVD method have the same accuracy as those by LSQR, where the $k_{0}$ are very small relative to $n$ and all the $k^{*}\leq k_{0}$ correspondingly. We omit the results on the 1D problems obtained by the TSVD method.
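The quantities reported in the experiments can be computed as in the sketch below. It assumes the relative error $\|x_{k}-x_{true}\|/\|x_{true}\|$ and takes $k^{*}$ to be the iteration at which this error is smallest. The helper name `best_iterate` and the toy iterates are hypothetical.

```python
import numpy as np

def best_iterate(iterates, x_true):
    """Return (k_star, best_error, errors), where errors[k-1] is the
    relative error of the k-th iterate and k_star (1-based) minimizes it."""
    norm_true = np.linalg.norm(x_true)
    errors = [np.linalg.norm(x - x_true) / norm_true for x in iterates]
    k_star = int(np.argmin(errors)) + 1
    return k_star, errors[k_star - 1], errors

# Toy iterates that first approach x_true and then drift away,
# mimicking semi-convergence.
x_true = np.array([1.0, 2.0, 2.0])
iterates = [
    np.array([0.0, 0.0, 0.0]),
    np.array([1.0, 2.0, 1.0]),
    np.array([1.0, 2.0, 2.3]),
    np.array([1.0, 2.0, 4.0]),
]
k_star, best_error, errors = best_iterate(iterates, x_true)
print(k_star, best_error)
```

This computation requires $x_{true}$, so it serves only to assess the solvers on test problems and is not a practical stopping rule.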

We now comment on the figures and the related details in order.

Firstly, for all the problems in Table 1, the semi-convergence of CGME occurs earlier than LSQR and can be much earlier. This confirms Theorem 4. The much earlier semi-convergence of CGME indicates that $\bar{\theta}_{k}^{(k)}<\sigma_{k_{0}+1}$ occurs much earlier for CGME than $\theta_{k}^{(k)}<\sigma_{k_{0}+1}$ for LSQR.

Secondly, for all the problems, the best regularized solutions $x_{k^{*}}^{cgme}$ are considerably less accurate than the corresponding $x_{k^{*}}^{lsqr}$, except for blur in Figure 5, where the best regularized solution by CGME is almost as accurate as those by LSQR, LSMR and MCGME. For all the 1D problems but baart and the 2D problem fanbeamtomo with $\varepsilon=10^{-3}$, the relative errors of the best regularized solutions by CGME are twice to five times larger than the counterparts by the other three solvers, indicating that the regularization ability of CGME is considerably inferior to that of the other three, given that the relative errors by LSQR, LSMR and MCGME themselves are only roughly $0.01\sim 0.1$; see Figures 1 (a) and 6 (a). These results confirm Theorems 1–2 and the analysis on them. We will make more comments on Figure 5 later.

Thirdly, for each of the problems, by a careful observation and comparison, we have found that $x_{k}^{cgme}$ is more accurate than or at least as accurate as $x_{k}^{lsqr}$ until the semi-convergence of CGME occurs, after which LSQR continues improving its iterates until its own semi-convergence occurs, as is clearly seen from Figures 1–7. These results justify our arguments on (58).

Fourthly, for each of the 2D problems, the best regularized solution $x_{k^{*}}^{lsmr}$ is at least as accurate as $x_{k^{*}}^{lsqr}$ , and the semi-convergence of LSMR always occurs no sooner and actually later than that of LSQR. We notice that the relative error of $x_{k^{*}}^{lsmr}$ is only slightly smaller than that of $x_{k^{*}}^{lsqr}$ , and there is little difference between them. For all the 1D problems, the semi-convergence of LSMR and LSQR occurs exactly at the same iterations, and the best regularized solutions obtained by them have the same accuracy. These results confirm Remark 6.2 and justify that LSMR has the same regularization ability as that of LSQR.

Fifthly, for each of the test problems, MCGME improves CGME substantially. As a matter of fact, for the 1D problems, the best regularized solutions by MCGME have the same accuracy as those by LSQR and LSMR; for the 2D problems, the best regularized solutions $x_{k^{*}}^{mcgme}$ are almost as accurate as $x_{k^{*}}^{lsqr}$ and $x_{k^{*}}^{lsmr}$ .

Sixthly, as we have stated, blur and fanbeamtomo are quite well conditioned. With the relatively small $\varepsilon=10^{-3}$, we observe from Figures 5–6 that there is no semi-convergence phenomenon for LSQR, LSMR and MCGME as well as the TSVD method. This means that $e$ does not play a part in regularization and these methods solve these two problems as if they were ordinary linear systems. Furthermore, it is clear from the figures that the relative errors of regularized solutions obtained by LSQR, LSMR and MCGME stabilize after 30 iterations for blur and 80 iterations for fanbeamtomo, respectively. Figures 5 (a) and 6 (a) seem to indicate that CGME has no semi-convergence phenomenon for the square blur and the given $\varepsilon$ but has one for the rectangular fanbeamtomo. However, this semi-convergence is in disguise and is not caused by the noise $e$: for the rectangular fanbeamtomo, (44), its proof and the analysis on it state that the smallest singular value $\bar{\theta}_{k}^{(k)}$ of $\bar{B}_{k}$ can be arbitrarily small and approaches zero as $k$ increases. As we have elaborated, $(\bar{\theta}_{k}^{(k)})^{2}$ approaches the eigenvalue zero of $AA^{T}$ as $k$ increases. As a result, the projected problem $\bar{B}_{k}y_{k}^{cgme}=\beta e_{1}^{(k)}$ involved in CGME can become even worse conditioned than (1) itself as $k$ increases for $A$ rectangular, so that $\|x_{k}^{cgme}\|$, which equals $\|Q_{k}y_{k}^{cgme}\|=\|y_{k}^{cgme}\|$, and the relative error $\frac{\|x_{k}^{cgme}-x_{true}\|}{\|x_{true}\|}$ tend to infinity as $k$ increases. This can also be seen from (49), where we can easily check that $|f_{k}^{(k,cgme)}|\rightarrow\infty$ as $k$ increases since $\sigma_{k}$ is a constant but $\bar{\theta}_{k}^{(k)}\rightarrow 0$ as $k$ increases.

In contrast, the smallest singular values of the projection matrices are always bounded from below by either $\sigma_{n}$ for LSQR (cf. (43)) and MCGME (cf. (59)) or $\sigma_{n}^{2}$ for LSMR (cf. (97)), regardless of whether $A$ is rectangular or square. This is why CGME seemingly exhibits semi-convergence for $A$ rectangular when the other solvers do not. In the meantime, we see that the best regularized solution by CGME is substantially less accurate than those by the other three algorithms for fanbeamtomo. For the square blur with $\varepsilon=10^{-3}$, we see that the four Krylov solvers and the TSVD method do not exhibit semi-convergence and compute the solutions with very comparable accuracy. These results and analysis tell us that CGME is definitely not a good choice when $A$ is rectangular.

Seventhly, if the relative noise level $\varepsilon$ is increased to $\varepsilon=0.05$, the semi-convergence of LSQR, LSMR and MCGME occurs for fanbeamtomo, as is seen from Figure 6. We have also observed the semi-convergence of the four algorithms and the TSVD method for blur with $\varepsilon=0.05$. We find that the best regularized solutions by LSQR, LSMR and MCGME have very comparable accuracy but CGME computes a less accurate best regularized solution. We omit the corresponding figure. For the test problems, we have also observed that the semi-convergence of the TSVD method occurs much later than that of the four Krylov solvers, i.e., $k^{*}\ll k_{0}$.

8 Conclusions

For a general large-scale ill-posed problem (1), only iterative solvers are computationally viable. Of them, the Krylov solvers LSQR, CGLS, CGME and LSMR have been commonly used. In terms of the accuracy of the rank $k$ approximation to $A$ in LSQR, in this paper we have derived accurate estimates for the accuracy of the rank $k$ approximations to $A$ and $A^{T}A$ that are involved in CGME and LSMR, respectively. We have made detailed analyses on the approximation behavior of the singular values of the projection matrices associated with CGME and LSMR. In the meantime, we have derived the filtered SVD expansion of CGME regularized iterates. In conclusion, we have shown that the regularization of CGME is generally inferior to that of LSQR and the semi-convergence of CGME occurs no later than that of LSQR. We have extracted a best possible rank $k$ approximation to $A$ from the rank $(k+1)$ approximation $P_{k+1}P_{k+1}^{T}A$, and have shown why such an approximation is as accurate as the rank $k$ approximation in LSQR. Based on this analysis, as a by-product, we have proposed a modified CGME (MCGME) method that improves CGME substantially and has the same regularization ability as LSQR.

We have substantially improved a fundamental result, Theorem 9.3 in [16], which gives a bound for the approximation accuracy of the truncated rank $k$ SVD approximation to $A$ generated by randomized algorithms. That bound is a considerable overestimate, and the reason for the overestimate has not been completely understood. Our new bounds are unconditionally superior to theirs and reveal how the truncation step affects the accuracy of the truncated rank $k$ approximation to $A$.

In the meantime, we have proved that LSMR has the same regularization ability as LSQR and the semi-convergence of LSMR occurs no sooner than that of LSQR. Particularly, we have shown that LSMR has the full regularization for severely and moderately ill-posed problems with suitable $\rho>1$ and $\alpha>1$ .

We have made detailed numerical experiments to confirm our regularization results on CGME and LSMR. We have also numerically demonstrated that the best regularized solutions by MCGME are very comparable to those by LSQR.

Bibliography


[1] R. C. Aster, B. Borchers, and C. H. Thurber, Parameter Estimation and Inverse Problems, second ed., Elsevier, New York, 2013.
[2] S. Berisha and J. G. Nagy, Restore tools: Iterative methods for image restoration, 2012, available from http://www.mathcs.emory.edu/~nagy/RestoreTools.
[3] Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, PA, 1996.
[4] Å. Björck, Numerical Methods in Matrix Computations, Texts in Applied Mathematics, vol. 59, Springer, Cham, 2015.
[5] J. Chung and K. Palmer, A hybrid LSMR algorithm for large-scale Tikhonov regularization, SIAM J. Sci. Comput., 37 (5) (2015), pp. S562–S580.
[6] E. J. Craig, The $N$-step iteration procedures, J. Math. Phys., 34 (1955), pp. 64–73.
[7] J. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, PA, 1997.
[8] B. Eicke, A. K. Louis, and R. Plato, The instability of some gradient methods for ill-posed problems, Numer. Math., 58 (1) (1990), pp. 129–134.
