Regularization Properties of the Krylov Iterative Solvers CGME and LSMR For Linear Discrete Ill-Posed Problems with an Application to Truncated Randomized SVDs
Zhongxiao Jia

TL;DR
This paper analyzes the regularization properties of Krylov solvers CGME and LSMR for large-scale ill-posed problems, comparing their effectiveness with LSQR and improving understanding of their solutions and truncation effects.
Contribution
It establishes the regularization properties of CGME and LSMR, including filtered SVD expansions and accuracy comparisons with LSQR, and improves fundamental results on randomized truncated SVDs.
Findings
CGME and LSMR have regularization properties similar to LSQR.
The solutions obtained by CGME and LSMR are less accurate than those by LSQR.
Truncation in randomized SVDs can reduce accuracy, as analyzed in the paper.
Abstract
For the large-scale linear discrete ill-posed problem or with contaminated by Gaussian white noise, there are four commonly used Krylov solvers: LSQR and its mathematically equivalent CGLS, the Conjugate Gradient (CG) method applied to , CGME, the CG method applied to or with , and LSMR, the minimal residual (MINRES) method applied to . These methods have intrinsic regularizing effects, where the number of iterations plays the role of the regularization parameter. In this paper, we establish a number of regularization properties of CGME and LSMR, including the filtered SVD expansion of CGME iterates, and prove that the 2-norm filtering best regularized solutions by CGME and LSMR are less accurate than and at least as accurate as those by LSQR, respectively. We also prove that the semi-convergence…
| Problem | Description | Size of |
|---|---|---|
| shaw | 1D image restoration model | |
| gravity | 1D gravity surveying problem | |
| baart | 1D image deblurring | |
| phillips | phillips’ famous test problem | |
| heat | Inverse heat problem | |
| deriv2 | Computation of second derivative | |
| AtmosphericBlur10 | Spatially Invariant Gaussian Blur | |
| AtmosphericBlur30 | Spatially Invariant Gaussian Blur | |
| GaussianBlur420 | Spatially Invariant Atmospheric | |
| Turbulence Blur | ||
| GaussianBlur422 | Spatially Invariant Atmospheric | |
| Turbulence Blur | ||
| VariantGaussianBlur1 | Spatially Variant Gaussian Blur | |
| VariantGaussianBlur2 | Spatially Variant Gaussian Blur | |
| VariantMotionBlur_large | Spatially Variant Motion Blur | |
| VariantMotionBlur_medium | Spatially Variant Motion Blur | |
| blur | 2D image restoration | |
| fanbeamtomo | 2D fan-beam tomography problem | |
| seismictomo | 2D seismic tomography |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Regularization Properties of the Krylov Iterative Solvers
CGME and LSMR For Linear Discrete Ill-Posed Problems with an Application to Truncated Randomized SVDs††thanks: This work was supported in part by the National Science Foundation of China (No. 11771249)
Zhongxiao Jia Department of Mathematical Sciences, Tsinghua University, 100084 Beijing, China. () [email protected]
Abstract
For the large-scale linear discrete ill-posed problem or with contaminated by Gaussian white noise, there are four commonly used Krylov solvers: LSQR and its mathematically equivalent CGLS, the Conjugate Gradient (CG) method applied to , CGME, the CG method applied to or with , and LSMR, the minimal residual (MINRES) method applied to . These methods have intrinsic regularizing effects, where the number of iterations plays the role of the regularization parameter. In this paper, we establish a number of regularization properties of CGME and LSMR, including the filtered SVD expansion of CGME iterates, and prove that the 2-norm filtering best regularized solutions by CGME and LSMR are less accurate than and at least as accurate as those by LSQR, respectively. We also prove that the semi-convergence of CGME and LSMR always occurs no later and sooner than that of LSQR, respectively. As a byproduct, using the analysis approach for CGME, we improve a fundamental result on the accuracy of the truncated rank approximate SVD of generated by randomized algorithms, and reveal how the truncation step damages the accuracy. Numerical experiments justify our results on CGME and LSMR.
keywords:
Discrete ill-posed, rank approximations, semi-convergence, regularized solution, Lanczos bidiagonalization, TSVD regularized solution, CGME, LSMR, LSQR, CGLS
AMS:
65F22, 15A18, 65F10, 65F20, 65R32, 65J20, 65R30
\slugger
sirevxxxxxxxx–x
1 Introduction and Preliminaries
Consider the linear discrete ill-posed problem
[TABLE]
where the norm is the 2-norm of a vector or matrix, and is extremely ill conditioned with its singular values decaying to zero without a noticeable gap. We simply assume that . Since the results in this paper hold for both the and cases. (1) arises from many applications, e.g., from the discretization of the first kind Fredholm integral equation
[TABLE]
where the kernel and are known functions, while is the unknown function to be sought. Applications include image deblurring, signal processing, geophysics, computerized tomography, heat propagation, biomedical and optical imaging, groundwater modeling, and many others [1, 9, 10, 24, 35, 36, 37, 39, 47]. The right-hand side is assumed to be contaminated by a Gaussian white noise , caused by measurement, modeling or discretization errors, where is noise-free and . Because of the presence of noise and the extreme ill-conditioning of , the naive solution of (1) generally bears no relation to the true solution , where denotes the Moore-Penrose inverse of a matrix. Therefore, we must use regularization to extract a good approximation to as much as possible.
For a Gaussian white noise , throughout the paper, we always assume that satisfies the discrete Picard condition with some constant for arbitrarily large [1, 13, 20, 21, 22, 24, 36]. Without loss of generality, assume that . Then a dominating regularization approach is to solve the problem
[TABLE]
with slightly [22, 24], where is a regularization matrix and its suitable choice is based on a-prior information on .
In this paper, we are concerned with the case in (3), which corresponds to a 2-norm filtering regularization problem. Let
[TABLE]
be the singular value decomposition (SVD) of , where and are orthogonal, with the singular values assumed to be simple, the superscript denotes the transpose of a matrix or vector, and denotes a zero matrix. With (4), we have
[TABLE]
and .
The discrete Picard condition means that, on average, the Fourier coefficient decays faster than , which results in the following popular model that is used throughout Hansen’s books [22, 24] and the references therein as well as [32, 33]:
[TABLE]
where is a model parameter that controls the decay rates of .
The covariance matrix of the Gaussian white noise is , the expected value and , so that and . (5) and (6) show that, for large singular values, is dominant relative to . Once from some onwards, the noise dominates , and the terms overwhelm for small singular values and must be dampened. Therefore, the transition point is such that
[TABLE]
see [24, p.42, 98] and [22, p.70-1].
The truncated SVD (TSVD) method [20, 22, 24] is a reliable and commonly used method for solving small to modest sized (3), and it solves a sequence of problems
[TABLE]
starting with onwards, where is a best rank approximation to with respect to the 2-norm with , and ; it holds that [3, p.12], and solves (8), called the TSVD regularized solution. For the Gaussian white noise it is known from [22, p.70-1] and [24, p.71,86-8,95] that is the 2-norm filtering best TSVD regularized solution of (1), i.e., has the minimal 2-norm error . The index plays the role of the regularization parameter in the TSVD method. It has been observed and justified that is essentially a 2-norm filtering best possible solution of (1); see [21], [22, p.109-11], [24, Sections 4.2 and 4.4] and [46]. We refer to [32] for general elaborations. As a result, we can take as the standard reference when assessing the regularization ability of a 2-norm filtering regularization method.
For large, the TSVD method is generally prohibitively expensive, and only iterative regularization methods are appealing. Krylov iterative solvers have formed a major class of methods [1, 10, 14, 17, 22, 24, 37]. Specifically, the CGLS method [15, 26] and its mathematically equivalent LSQR method [41], the CGME method [3, 4, 6, 17, 18] and the LSMR method [4, 5, 12] have been commonly used. These methods are deterministic 2-norm filtering regularization methods, have general regularizing effects, and exhibit semi-convergence [39, p.89]; see also [3, p.314], [4, p.733], [22, p.135] and [24, p.110]: The iterates first converge to , then the noise starts to deteriorate the iterates so that they start to diverge from and instead converge to . The iteration number plays the role of the regularization parameter in iterative regularization methods.
The behavior of ill-posed problems and solvers depends on the decay rate of . Hoffmann [29] has characterized the degree of ill-posedness of (1) as follows: If with , , then (1) is severely ill-posed; if , then (1) is mildly or moderately ill-posed for or . This definition has been widely used [1, 10, 22, 24]. The requirement does not appear in [29] and is explicitly added in [30, 32], which is always met for a linear compact operator equation [19, 22].
Hanke and Hansen [19] address that a strict proof of the regularizing properties of conjugate gradients is extremely difficult; see also [23]. The regularizing effects of CGLS, LSQR and CGME have been intensively studied; see, e.g., and have been intensively studied [1, 8, 11, 14, 17, 18, 22, 24, 27, 28, 30, 32, 33, 42, 45]. It has long been known (cf. [19, 22, 23, 24]) that if the singular values of the projection matrices involved in LSQR, called the Ritz values, approximate the large singular values in natural order then LSQR has the same regularization ability as the TSVD method, that is, the two methods can compute 2-norm filtering best regularized solutions with the same accuracy. As we will see clearly, the same results hold for CGME and LSMR when the singular values of projection matrices approximate the large singular values of and in this order, respectively.
If a 2-norm filtering regularized solution of (1) is as accurate as , it is called a 2-norm filtering best possible regularized solution. If the 2-norm filtering regularized solution by a regularization method at semi-convergence is such a best possible one, then the solver is said to have the full regularization. Otherwise, the solver has only the partial regularization. This definition is introduced in [30, 32]. In terms of it, a fundamental question posed in [30, 32] is: Do CGLS, LSQR, CGME and LSMR have the full or partial regularization for severely, moderately and mildly ill-posed problems? Actually, this question has been receiving high attention for CGLS and LSQR.
For the cases that are simple, the author in [32] has given accurate estimates for the 2-norm distances between the underlying dimensional Krylov subspace and the dimensional dominant right singular subspace of for severely, moderately and mildly ill-posed problems. On the basis of [32], the author in [33] has proved that, for LSQR, the Ritz values converge to the large singular values of in natural order and Lanczos bidiagonalization always generates a near best rank approximation until for severely and moderately ill-posed problems with suitable and , meaning that LSQR and CGLS have the full regularization. However, if such desired properties fail to hold, it has been theoretically unknown if LSQR has the full or partial regularization. Nevertheless, numerical experiments on many ill-posed problems have demonstrated that LSQR always has the full regularization [32, 33].
In this paper, we analyze the regularization of CGME and LSMR under the assumption that all the singular values are simple. We establish a number of results, and prove that the regularization ability of CGME is generally inferior to that of LSQR, that is, the 2-norm filtering best regularized solutions obtained by CGME at semi-convergence are generally less accurate than those obtained by LSQR. Specifically, we derive the filtered SVD expansion of CGME iterates, by which we prove that the semi-convergence of CGME always occurs no later than that of LSQR and can be much earlier than the latter. In the meantime, we show how to extract a rank approximation from the rank approximation to generated in CGME at iteration , which is as accurate as the rank approximation in LSQR. Exploiting such rank approximation, we propose a modified CGME (MCGME) method whose regularization ability is shown to be very comparable to that of LSQR. For LSMR, we present a number of results and prove that its regularization ability is as good as that of LSQR and the two methods compute the 2-norm filtering best regularized solutions with essentially the same accuracy. We also show that the semi-convergence of LSMR always occurs no sooner than that of LSQR.
As a windfall, making of our analysis approach used for CGME, we improve a fundamental bound, Theorem 9.3 presented in Halko et al. [16], for the accuracy of the truncated rank approximation to generated by randomized algorithms, which have formed a highly intensive topic and have been used in numerous disciplines over the years. As remarked by Halko et al. in [16] (cf. Remark 9.1 there), their bound appears “conservative, but a complete theoretical understanding lacks.” Our new bounds for the approximation accuracy are not only unconditionally sharper than theirs but also can reveal how the truncation step damages the accuracy of the rank approximation.
The paper is organized as follows. In Section 2, we review LSQR, CGME and LSMR. In Section 3, we briefly state some results on LSQR in [32, 33] and take LSQR as reference to assess the regularization ability of CGME and LSMR. In Section 4, we derive a number of regularization properties of CGME and propose the MCGME method. In Section 5, we consider the accuracy of the truncated rank randomized approximation [16] and present sharper bounds. In Section 6, we study the regularization ability of LSMR. In Section 7, we report numerical experiments to confirm our theory. We conclude the paper in Section 8.
Throughout the paper, we denote by the dimensional Krylov subspace generated by the matrix and the vector , and by the bold letter the zero matrix with orders clear from the context.
2 The LSQR, CGME and LSMR algorithms
These three algorithms are all based on the Lanczos bidiagonalization process, which computes two orthonormal bases and of and for , respectively. We describe the process as Algorithm 1.
Algorithm 1: -step Lanczos bidiagonalization process
Take , and define . 2. 2.
For
- (a)
2. (b)
3. (c)
4. (d)
Algorithm 1 can be written in the matrix form
[TABLE]
where denotes the -th canonical basis vector of , , and
[TABLE]
It is known from (9) that
[TABLE]
Algorithm 1 cannot break down before step when , are simple since is supposed to have nonzero components in the directions of . The singular values of , called the Ritz values of with respect to the left and right subspaces and , are all simple.
Write and . At iteration , LSQR [41] solves
[TABLE]
for the iterate
[TABLE]
where is the first canonical basis vector of , and decreases monotonically with respect to .
CGME [4, 17, 18, 27, 28] is the CG method implicitly applied to or with , and it solves the problem
[TABLE]
for the iterate . The error norm decreases monotonically with respect to . Let be the matrix consisting of the first rows of , i.e.,
[TABLE]
Then the CGME iterate
[TABLE]
and with the -th canonical vector of .
LSMR [4, 12] is mathematically equivalent to MINRES [40] applied to the normal equation of (1), and it solves
[TABLE]
for the iterate . The residual norm of the normal equation decreases monotonically with respect to , and the iterate
[TABLE]
3 Some results on LSQR in [32, 33]
From and (13) we have
[TABLE]
which is the minimum 2-norm solution to the problem that perturbs in (1) to its rank approximation . Recall that . Analogous to (8), LSQR now solves a sequence of problems
[TABLE]
for starting with onwards, where in (1) is replaced by a rank approximation of it. Therefore, if is a near best rank approximation to with an approximate accuracy and the singular values of approximate the large in natural order for , then LSQR has the same regularization ability as the TSVD method and thus has the full regularization. See [32] for more elaborations.
The analysis on the TSVD method and the Tikhonov regularization method [22, 24] shows that the core requirement on a regularization method is to acquire the dominant SVD components of and meanwhile suppress the remaining SVD components. Therefore, the more accurate the rank approximation is to and the better approximations are the non-zero singular values of a projection matrix to some of the large singular values of , the better regularization ability of the method has, so that the best regularized solution obtained by it is more accurate.
Define
[TABLE]
which measures the accuracy of the rank approximation to involved in LSQR. Since the best rank approximation satisfies , we have
[TABLE]
The author in [33] introduces the definition of a near best rank approximation to : For LSQR, is called a near best rank approximation to if is closer to than to :
[TABLE]
Based on the accurate estimates established in [32] for the 2-norm distances between the underlying Krylov subspace and the dimensional dominant right singular subspace for severely, moderately and mildly ill-posed problems, the author [33] has derived accurate estimates for and a number of approximation properties of for the three kinds of ill-posed problems. The results have shown that, for severely and moderately ill-posed problems with for suitable and and for , must be a near best rank approximation to , and the Ritz values approximate the large singular values of in natural order. This means that LSQR has the full regularization for these two kinds of problems with suitable and . However, for moderately ill-posed problems with not enough and mildly ill-posed problems, is generally not a near best rank approximation, and the Ritz values do not approximate the large singular values of in natural order for some .
In particular, the author [33, Theorem 5.1] has proved the following three results:
[TABLE]
with
[TABLE]
[TABLE]
These notation and results will be used later.
4 The regularization of CGME
Note that . We obtain
[TABLE]
Therefore, analogous to (8) and (18), CGME solves a sequence of problems
[TABLE]
for the regularized solution starting with onwards, where in (1) is replaced by a rank approximation of it.
Just as LSQR, if is a near best rank approximation to and the singular values of approximate the large ones of in natural order for , then CGME has the full regularization.
By (9), (10) and (12), the rank approximation involved in LSQR is
[TABLE]
By (19), we have For CGME, by (10) and (14), we obtain
[TABLE]
Therefore, is the solution to (31) in which the rank approximation to is , whose approximation accuracy is
[TABLE]
Theorem 1**.**
For the rank approximations to , , with the definition we have
[TABLE]
*Proof. * We give two proofs of the upper bound in (35). The first is as follows. Since , from (10) we obtain
[TABLE]
which is the upper bound in (35) by replacing the index with .
Taking in (12) and augmenting such that is orthogonal, we have
[TABLE]
where all the entries and , , of are positive, and is orthogonal. Then by the orthogonal invariance of the 2-norm we obtain
[TABLE]
with defined by (27). It is straightforward to justify that the singular values of strictly interlace those of by noting that is an unreduced symmetric tridiagonal matrix, from which and the lower bound of (35) follows.
Based on (38), we can also give the second proof of the upper bound in (35). Observe from (27) that is the matrix deleting the first row of . Applying the strict interlacing property of singular values to and , we obtain , which yields the upper bound of (35).
From (38), notice that is the matrix deleting the first row of and the first column, which is zero, of the resulting matrix. Applying the strict interlacing property of singular values to and establishes (36).
(35) indicates that is definitely a less accurate rank approximation to than in LSQR. (36) shows the strict monotonic decreasing property of . Moreover, keep in mind that . Then a combination of it and the results in Section 3 indicates that, unlike in LSQR, there is no guarantee that is a near best rank approximation to even for severely and moderately ill-posed problems, because simply lies between and and there do not exist any sufficient conditions on and that enforce to be closer to , let alone closer to . Therefore, based on the accuracy of the rank approximations in CGME and LSQR, we come to the conclusion that the regularization ability of CGME cannot be superior and is generally inferior to that of LSQR. Furthermore, since there is no guarantee that is a near best rank approximation for severely and moderately ill-posed problems with suitable and , CGME may not have the full regularization for these two kinds of problems.
In the following we investigate the approximation behavior of the singular values of . Before proceeding, it is necessary to have a closer look at Algorithm 1 and distinguish some subtleties when is rectangular, i.e., , and square, i.e., , respectively.
Keep in mind that Algorithm 1 does not break down before step . For the rectangular case , Algorithm 1 is exactly what is presented there, all the and are positive, , and we generate and at step and . As a consequence, by definition (33), we have
[TABLE]
It is known from (37) that the singular values of are identical to the singular values of . Therefore, the singular values of are and zero.
For the square case , however, we must have , that is, the last row of is zero; otherwise, we would obtain an orthonormal matrix , which is impossible since is already an orthogonal matrix. After Algorithm 1 is run to completion, we have
[TABLE]
whose singular values .
By the definition (33) of , from (10) and the above description, for both the rectangular and square cases we obtain
[TABLE]
with for and for , which are unreduced symmetric tridiagonal matrices. For , the eigenvalues of are just , all of which are simple and positive; for , the eigenvalues of are plus zeros, denoted by for our later use. Therefore, by the definition of , the eigenvalues of are .
Notice that is nothing but the projection matrix of onto the dimensional Krylov subspace . More precisely, is generated by the -step symmetric Lanczos tridiagonalization process applied to starting with , and the eigenvalues of generally approximate extreme eigenvalues of ; see, e.g., [3, 4, 43] for details. Particularly, the smallest eigenvalue of generally converges to the smallest eigenvalue of as increases, which is for and for . In contrast, for , its smallest singular value unconditionally until .
We next give a number of close relationships between and as well as between them and the singular values of , which are crucial to compare the regularizing effects of CGME with those of LSQR.
Theorem 2**.**
Denote by and the singular values of and , respectively, labeled in decreasing order. Then
[TABLE]
Moreover,
[TABLE]
for and
[TABLE]
for .
Proof. Observe that consists of the first rows of and all the and are positive for . Applying the strict interlacing property of singular values to and , we obtain (41).
Note that, for both rectangular and square, we have . Since consists of the first columns of and deletes the last zero rows of the resulting matrix, applying the strict interlacing property of singular values to and (cf. [44, p.198, Corollary 4.4]), for we have
[TABLE]
Observe that are the leading principal matrices of , whose eigenvalues are , and they are unreduced symmetric tridiagonal matrices. Applying the strict interlacing property of eigenvalues to and , for we obtain
[TABLE]
from which and the definition of it follows that
[TABLE]
for and
[TABLE]
for . The above, together with (45) and (41), yields (42)–(44).
From Section 3, (42) and (44) indicate that, unlike the singular values of , which have been proved to interlace the first large ones of and approximate the first ones in natural order for the severely or moderately ill-posed problems for suitable or [33], the lower bound for is simply for and zero for , respectively, and there does not exist a better lower bound for it. This implies that may be much smaller than and it can be as small as for and arbitrarily small for , independent of or . In other words, the size of or has no intrinsic effects on the size of , and cannot make lie between and by choosing or , that is, the regularizing effects of CGME have intrinsic indeterminacy for severely and moderately ill-posed problems, independent of the size of and . Therefore, CGME may or may not have the full regularization for these two kinds of problems. On the other hand, even if the approximate the first large singular values in natural order, they are less accurate than the singular values of because of (42) and (44). Consequently, since the are always correspondingly larger than the , the regularization ability of CGME cannot be superior and is generally inferior to that of the LSQR.
A final note is that, unlike for , CGME may be at risk for since the converges to zero other than as increases and can be arbitrarily small, which causes that the projected problem may even be worse conditioned than (1) and may be unbounded as increases and bigger than for a given (1).
In what follows we establish more results on the regularization of CGME and get more insight into it. It is known, e.g., [22, p.146] that the LSQR iterate takes the following filtered SVD expansion:
[TABLE]
where the filters
[TABLE]
These results have been extensively used to study the regularizing effects of LSQR; see, e.g., [22, 23, 32]. We now prove that the CGME iterate also takes a filtered SVD expansion similar to (46) and (47), but its proof is much more involved than that of (46) and (47).
Theorem 3**.**
The CGME iterate has the filtered SVD expansion
[TABLE]
where the filters
[TABLE]
Proof. Let be the minimal 2-norm solution to . Recall Algorithm 1. For this minimization problem, starting with , at iteration the CG method extracts from the dimensional Krylov subspace
[TABLE]
It is well known from, e.g., [38], that the residual of is
[TABLE]
where is the -th residual, or Ritz, polynomial with the normalization , whose roots are the Ritz values of with respect to the subspace ; see (40). Therefore, we have
[TABLE]
From the full SVD (4) of , write . Then we have , the compact SVD of . It is straightforward to see that
[TABLE]
Therefore, by , premultiplying the two hand sides of (50) by yields
[TABLE]
from which it follows that
[TABLE]
By the SVD (4) of , we have
[TABLE]
Hence for from (51) and (52) we obtain
[TABLE]
with defined by (49). In terms of and , premultiplying the two hand sides of the above relation by and exploiting , we have
[TABLE]
Then making use of this relation, and (53), we obtain (48).
Based on Theorems 2–3, we can prove the following important result.
Theorem 4**.**
Let and be iterations at which the semi-convergence of CGME and LSQR occurs, respectively, the transition point of the TSVD method. Then
[TABLE]
that is, the semi-convergence of CGME always occurs no later than that of LSQR and the TSVD method.
Proof. The result has been proved in [32, Theorem 3.1]. Next we first prove that .
Recall that the best TSVD solution
[TABLE]
and the fact that a 2-norm filtering best possible solution must capture the dominant SVD components of and suppress the small SVD components of .
For CGME, from (42) and (44) we have Therefore, at iteration we must have . If the approximate the large in natural order for , then by (49) we have for and for . On the other hand, by (49) we have . Compared with the best TSVD solution, by (48) the above shows that the CGME iterate captures the dominant SVD components of and filters out the small ones. As a result, improves until iteration , and the semi-convergence of CGME occurs at iteration .
If the do not converge to the large singular values of in natural order and for some iteration for the first time, then is already deteriorated by the noise before iteration : Suppose that with the smallest integer . Then we can easily justify from (49) that and tends to zero monotonically for , but
[TABLE]
since the first factor is non-positive and the second factor is positive by noting that , for . As a result, for , showing that has been deteriorated by the noise and the semi-convergence of CGME has occurred at some iteration .
Finally, we prove . Notice that means that the first iteration such that for CGME is no more than the one such that for LSQR. Therefore, applying a similar proof to that of the semi-convergence of CGME to (46)–(47), it is direct that the semi-convergence of CGME occurs no later than that of LSQR, i.e., .
It is seen from the above proof that, due to , the semi-convergence of CGME can occur much earlier than that of LSQR.
We can, informally, deduce more features of CGME. By definition, the optimality of CGME means that
[TABLE]
holds unconditionally for . Since and converge to until iterations and at which the semi-convergence of CGME and LSQR occurs, respectively, it is known that, for and , and are negligible relative to , which is supposed very large in the context of discrete ill-posed problems. As a consequence, we have
[TABLE]
Since the first terms in the right-hand sides of (56) and (57) are the same constant, a combination of (55) with (56) and (57) means that
[TABLE]
generally holds until . That is, should be at least as accurate as until the semi-convergence of CGME occurs. Then for , according to Theorem 4, continues approximating as increases until iteration , at which LSQR ultimately computes a more accurate approximation to than .
We will have more exciting findings. Observe that after Lanczos bidiagonalization is run steps, we have already obtained , and , but LSQR and CGME exploit only and , respectively. Since for , applying the strict interlacing property of singular values to and , we have
[TABLE]
Note from (44) that . Combining (59) with (44), we see that as approximations to the first large singular values of , although the singular values of are less accurate than the singular values of , the first singular values of are more accurate than the correspondingly.
Based on the above property and (33), we next show how to extract a best possible rank approximation to from the available rank matrix generated by Algorithm 1.
Theorem 5**.**
Let be the best rank approximation to with respect to the 2-norm. Then for we have
[TABLE]
where is the smallest singular value of and is defined by (34).
Proof. Write . Then exploiting (33), we obtain
[TABLE]
By the definition of and (33), it is easily justified that is the best rank approximation to in the 2-norm as and are column orthonormal. Keep in mind that is the best rank approximation to . Since is a rank approximation to , we obtain
[TABLE]
Note that the first term in the right-hand side of (63) is just . Therefore, it follow from (63) that (60) holds.
Since and are column orthonormal and is the best rank approximation to , by the orthogonal invariance of 2-norm we obtain
[TABLE]
which, together with (62), yields (61).
The bound (61) is always smaller than the bound (60) because of from (42) and (44). Indeed, the bound (60) can be conservative since we have amplified twice and obtained its bound , which might be a considerable overestimate. Moreover, as we have explained previously, (42) and (44) show that may approach for and can be close to zero arbitrarily for . By definition (19) of , since (cf. the upper bound of (35)), and , the right-hand side of (61) satisfies
[TABLE]
Therefore, is as small as and can even be smaller than , meaning that is as accurate as the rank approximation in LSQR.
Define , and note from (39) that . Recall that the singular values of and are and , respectively, and and . From (37) and the definition of , since is of rank , we have
[TABLE]
and
[TABLE]
Based on Theorem 5 and the analysis followed, just as done in CGME and LSQR, we can replace in (1) by the rank approximation and propose a modified CGME (MCGME) method that solves
[TABLE]
for the regularized solution of (1) with
[TABLE]
starting with onwards. MCGME is expected to have the same regularization ability as LSQR because (i) the nonzero singular values of are more accurate than the singular values of as approximations to the first singular values of and (ii) is a rank approximation which is as accurate as in LSQR. Regarding implementations, we comment that the singular values, and left and right singular vectors of is already available when is extracted from the SVD of , whose computational cost is flops. As a result, by (65) we can compute at cost of flops. A difference from CGME and LSQR is that MCGME seeks in the dimensional Krylov subspace other than in . Numerical experiments will justify that MCGME has very comparable regularizing effects to LSQR and can obtain the best regularized solutions with very similar accuracy to those by LSQR. We will not consider the by-product MCGME method further in this paper.
may have some other potential applications. For example, when we are required to compute several largest singular triplets of a large scale matrix , we can use the nonzero singular values of to replace the ones of as more accurate approximations to the largest singular values of in Lanczos bidiagonaliation type algorithms [34]. In such a way, exploiting the SVD of , we can also compute more accurate approximate left and right singular vectors of simultaneously. A development of such modified algorithms is beyond the scope of this paper.
5 The accuracy of truncated rank approximate SVDs
by randomized algorithms
In this section, we deviate from the context of Krylov solvers. Using the analysis approach in the last section, we consider the accuracy of a truncated rank SVD approximation to constructed by standard randomized algorithms and their improved variants [16]. This topic has been intensively investigated in recent years; see the survey paper [16] and the references therein. Algorithm 2 is one of the two basic randomized algorithms from [16] for computing an approximate SVD and extracting a truncated rank approximate SVD from it. A minor difference from the other sections in this paper is that we drop the restrictions that the singular values of are simple and , that is, the singular values of are .
Algorithm 2: Randomized approximate SVD of
- •
Input: Given , a target rank , and an oversampling parameter satisfying .
- •
Output: a truncated rank approximate SVD of .
Stage A
Draw an Gaussian random matrix . 2. 2.
Form the matrix . 3. 3.
Compute the compact QR factorization .
Stage B
Form . 2. 2.
Compute the compact SVD of the matrix : . 3. 3.
Set . Compute a rank SVD approximation to . 4. 4.
Let be the best rank approximation to with the diagonal being the first diagonals of , and and the first columns of and , respectively. Form a truncated rank SVD approximation to with .
For the approximation accuracy of to , Halko et al. [16] establish a fundamental result (cf. Theorem 9.3 there):
[TABLE]
Assume that the oversampling parameter . Making use of the probability theory, in terms of , Halko et al. [16] have established a number of bounds for ; see, e.g., Theorems 10.5–10.8 and Corollary 10.9–10.10 there. However, concerning (66), they point out in Remark 9.1 that ”In the randomized setting, the truncation step appears to be less damaging than the error bound of Theorem 9.3 suggests, but we currently lack a complete theoretical understanding of its behavior.” That is to say, the first term in (66) is generally conservative and an overestimate.
Motivated by the proof of (61) in Theorem 5, we can improve (66) substantially and reveal why (66) is an overestimate. Let
[TABLE]
be the singular values of defined in Algorithm 2. It is clear from Algorithm 2 that
[TABLE]
is an symmetric matrix, which is the projection matrix of onto the subspace in the orthonormal basis of with , whose eigenvalues are . Keep in mind that the eigenvalues of are and zeros, denoted by for later use.
Theorem 6**.**
For , let and be defined as in Algorithm 2, and defined as in (67). Then
[TABLE]
with
[TABLE]
Proof. Since is orthonormal, the eigenvalues of interlace those of and satisfy (cf. [44, p.198, Corollary 4.4])
[TABLE]
from which (69) follows.
From Algorithm 2, we can write
[TABLE]
Since is the best rank approximation to , by the column orthonormality of we obtain
[TABLE]
which proves (68).
Remark 5.1**.**
This theorem indicates that never exceeds and, for large and small, it may be much smaller than . Specifically, can be as small as . For , whenever , we have . Consequently, the bound (68) is unconditionally superior to the bound (66) and is sharper than the latter when . On the other hand, however, note that . Therefore, if , we must have , that is, dominates the bound (68). Summarizing the above, in response of Remark 9.1 in [16], we conclude that the truncation step does damage the approximation accuracy of the truncated rank approximation when and it is less damaging when .
As we have seen, the column space of constructed by Algorithm 2 aims to capture the -dimensional dominant left singular subspace of . A variant of it is to capture the -dimensional right dominant singular subspace of . Mathematically, it amounts to applying Algorithm 2 to and computes a truncated rank SVD approximation to in a similar way. We call such variant Algorithm 3, for which (66) now becomes
[TABLE]
with the orthonormal .
Note that the eigenvalues of are and zeros, denoted by . Since the eigenvalues of interlace those of , using the same proof approach as that of Theorem 6, we can establish the following result.
Theorem 7**.**
For , let and be defined as in Algorithm 3, and be the singular values of . Then
[TABLE]
with
[TABLE]
We comment that, in the case , whenever , we have , and consequently the bound (71) is unconditionally superior to and can be substantially sharper than the bound (70) for large and small.
Remark 5.2**.**
If the singular values of are all simple, by the strict interlacing properties of eigenvalues, the singular values of in Algorithms 2–3 are all simple too, and the lower and upper bounds in (69) and (72) are strict, i.e., .
Remark 5.3**.**
(66) and (70) and Theorems 6–7 hold for all the truncated rank SVD approximations generated by the enhanced variants of Algorithm 2–3 in [16], where the unique difference between the variants is the way that is generated. More generally, Theorems 6–7 are true for arbitrarily given orthonormal and with , respectively.
6 The regularization of LSMR
From Algorithm 1 we obtain
[TABLE]
Therefore, from (16), noting that , we have
[TABLE]
which means that LSMR solves the problem
[TABLE]
for the regularized solution starting with onwards. In the meantime, it is direct to justify that the TSVD solution solves the problem
[TABLE]
starting with onwards. Therefore, (75) and (76) deal with the normal equation of (1) by replacing with its rank approximations and , respectively.
In view of (75) and (76), we need to accurately estimate the approximation accuracy and investigate how the singular values of approximate the large singular values of . We are concerned with some intrinsic relationships between the regularizing effects of LSMR and those of LSQR and compare the regularization ability of the two methods.
By (17), (9), (10), (12) and , the LSQR iterate
[TABLE]
which is the solution to the problem
[TABLE]
that replaces by its rank approximation in the normal equation . In this sense, the accuracy of such rank approximation is measured in terms of for LSQR.
Firstly, we present the following result, which compares the accuracy of two rank approximations involved in LSMR and LSQR in the sense of solving the normal equation .
Theorem 8**.**
For the rank approximations to in (75) and (77), , we have
[TABLE]
Proof. For the orthogonal matrix generated by Algorithm 1, noticing that , from (9) and (10) we obtain and
[TABLE]
where
[TABLE]
is the matrix by deleting the leading principal matrix of the symmetric tridiagonal matrix and the first zero rows and zero columns of the resulting matrix, where is defined by (27) and are the first canonical vector of .
On the other hand, it is direct to verify that
[TABLE]
where with being the second canonical vector of .
[TABLE]
Since is unreduced symmetric tridiagonal, its eigenvalues are all simple. Observe from (90) that
[TABLE]
Therefore, we know from [7, p.218] that the eigenvalues of strictly interlace those of and are all simple. Furthermore, we see from (27) that is of full column rank, which means that the eigenvalues of are all positive.
Note that the eigenvalues of are those of and zero. As a result, the eigenvalues of are all simple. According to [7, p.218], we know from (92) that the eigenvalues of strictly interlace those of . Therefore, we obtain
[TABLE]
which, from (79) and (91), establishes (78).
This theorem indicates that, as far as solving is concerned, the rank approximation in LSMR is more accurate than that in LSQR.
Recall that (19) measures the quality of the rank approximation involved in LSQR for the regularization problem (18). We now estimate the approximation accuracy of to in terms of .
Theorem 9**.**
For , let be defined as (19). For we have
[TABLE]
with and . For , the strict monotonic decreasing property holds:
[TABLE]
Proof. Combining (90) with (21) and (28), for we obtain from [48, p.98] and [7, p.218] that
[TABLE]
with and , from which the lower and upper bounds in (94) follow directly.
For , the equality in (96) is still true. From (28), we have . Therefore, we obtain
[TABLE]
from which it follows that (94) holds for .
From (87), we see that is the matrix that first deletes the first column and row of and then deletes the first zero column and row of the resulting matrix. Therefore, applying the interlacing property of singular values to and yields
[TABLE]
We next prove that the above ”” is the strict ””. Since is an unreduced symmetric tridiagonal matrix, its singular values are simple. Observe that is the matrix deleting the first columns of and the first zero rows of the resulting matrix. Consequently, the singular values of strictly interlace the simple singular values of and are thus simple for . Moreover, the singular values of strictly interlace those of , which means that , i.e., , which proves (95).
Remark 6.1**.**
According to the results and analysis in [33], we have for severely ill-posed problems, and for moderately and mildly ill-posed problems. Therefore, the lower and upper bounds of (94) indicate that .
Finally, let us investigate the relationship between the singular values of rank approximation matrices in LSMR and LSQR. From (73) and (12), we know that they are the singular values of and , respectively.
Theorem 10**.**
Let be the singular values of . Then for we have
[TABLE]
Proof. Observe that is the matrix consisting of the first columns of and deleting the last zero rows of the resulting matrix. As a result, since , are simple, the singular values of strictly interlace the singular values of :
[TABLE]
and are simple, which means the upper bound (97).
Note that has the eigenvalues and zero, and is its leading principal submatrix and has simple eigenvalues . Therefore, strictly interlace and zero, which proves the lower bound of (97).
On the other hand, we have
[TABLE]
Recall (27) that and . By standard perturbation theory, we obtain
[TABLE]
from which it follows that (98) holds.
Remark 6.2**.**
(97) indicates that approximate the first large singular values more accurately than . Particularly, since , the first iteration step such that must be no smaller than the such that . A combination of this and the previous analysis on the semi-convergence of CGME and LSQR implies that the semi-convergence of LSMR must occur no sooner than that of LSQR. On the other hand, (98) shows that is bounded from the above by as an approximation to , which and (97) imply that and interact and cannot be considerably more accurate than as approximations to the large singular values of for .
Remark 6.3**.**
A combination of Theorem 8 and the above two remarks means that the regularizing effects of LSMR are not inferior to those of LSQR and the best regularized solutions by LSMR are at least as accurate as those by LSQR, that is, LSMR has the same regularization ability as that of LSQR. Particularly, from the results on LSQR in Section 3, we conclude that LSMR has the full regularization for severely or moderately ill-posed problems with suitable or .
A final note is that Huang and Jia [31] have derived the eigendecomposition, i.e., equivalent SVD, filtered expansion of MINRES iterates for with symmetric; see Theorem 3.1 there. The result can be directly adapted to the LSMR iterates by keeping in mind that LSMR is mathematically equivalent to MINRES applied to the specific symmetric positive definite linear system .
7 Numerical experiments
All the computations are carried out in Matlab R2017b on the Intel Core i7-4790k with CPU 4.00 GHz processor and 16 GB RAM with the machine precision under the Miscrosoft Windows 8 64-bit system.
We have tested LSQR, CGME, LSMR and MCGME on almost all the 1D and 2D problems from [2, 23, 25] and have observed similar phenomena. For the sake of length, we list only some of them in Table 1, where each problem takes its default parameter(s). We mention that the relatively easy 1D problems are all from [23, 25], where shaw, gravity and baart are severely ill-posed and phillips, heat and and deriv2 are moderately. The 2D image deblurring problems blur, fanbeamtomo and seismictomo are also from [23, 25], and the other 2D problems are from [2]. We notice that for blur, fanbeamtomo, although the orders and are already tens of thousands, their condition numbers are only 31.5 and 2472, respectively, which, intuitively, do not satisfy the definition of a discrete ill-posed problem whose singular values decay and are centered at zero, so that the ratio is very large. For each test problem, we compute and add a Gaussian white noise with zero mean to by prescribing the relative noise level
[TABLE]
We use the code lsqr_b.m of [23], where the reorthogonalization is exploited during Lanczos bidiagonalization in order to maintain the numerical orthogonality of and . We have written the Matlab codes of CGME, LSMR and MCGME based on the same Lanczos bidiagonalization process used in lsqr_b.m.
For all the 1D problems and the 2D seismictomo, we report the results on them for ; for all the 2D problems except blur and fanbeamtomo, we report the results on them for . For several other , we have the same findings. For blur and fanbeamtomo, however, we will observe some fundamental distinctions between the convergence features for lying in this practical interval. Figures 2–7 depict the convergence processes of LSQR, CGME, LSMR and MCGME, and we give some key details, including the iterations at which the semi-convergence of an algorithm occurs and the relative error of the best regularized solution obtained by each algorithm, which is defined by
[TABLE]
for LSQR. Similar relative errors are defined for CGME, LSMR and MCGME with the superscript “” replaced by “”, “” and “”, respectively. In addition, as a comparison standard on the solution accuracy, we depict the semi-convergence process of the TSVD method for blur and seismictomo, and report the relative errors of the best TSVD regularized solutions with the transition point at which the semi-convergence of TSVD occurs. For the other nine larger 2D problems, we cannot compute the SVDs of the matrices due to out of memory in our computer. We mention that for the first six 1D test problems we have found that the best regularized solutions obtained by TSVD method have the same accuracy as those by LSQR, where the are very small relative to and all the correspondingly. We omit the results on the 1D problems obtained by the TSVD method.
We now comment the figures and the related details in order.
Firstly, for all the problems in Table 1, the semi-convergence of CGME occurs earlier than LSQR and can be much earlier. This confirms Theorem 4. The much earlier semi-convergence of CGME indicates that occurs much earlier for CGME than for LSQR.
Secondly, for all the problems, the best regularized solutions are correspondingly less accurate than considerably except for blur in Figure 5, where the best regularized solution by CGME is almost as accurate as those by LSQR, LSMR and MCGME. For all the 1D problems but baart and the 2D problem fanbeamtomo with , the relative errors of the best regularized solutions by CGME are twice to five times larger than the counterparts by the other three ones, indicating that the regularization ability is considerably inferior to the other three ones, given that the relative errors by LSQR, LSMR and MCGME themselves are only roughly ; see Figures 1 (a) and 6 (a). These results confirm Theorems 1–2 and the analysis on them. We will make more comments on Figure 5 later.
Thirdly, for each of the problems, by a careful observation and comparison, we have found that is more accurate than and at least at least as accurate as until the occurrence of CGME, after which LSQR continues improving iterates until the occurrence of its semi-convergence, as is clearly seen from Figures 1–7. These results justify our arguments on (58).
Fourthly, for each of the 2D problems, the best regularized solution is at least as accurate as , and the semi-convergence of LSMR always occurs no sooner and actually later than that of LSQR. We notice that the relative error of is only slightly smaller than that of , and there is little difference between them. For all the 1D problems, the semi-convergence of LSMR and LSQR occurs exactly at the same iterations, and the best regularized solutions obtained by them have the same accuracy. These results confirm Remark 6.2 and justify that LSMR has the same regularization ability as that of LSQR.
Fifthly, for each of the test problems, MCGME improves CGME substantially. As a matter of fact, for the 1D problems, the best regularized solutions by MCGME have the same accuracy as those by LSQR and LSMR; for the 2D problems, the best regularized solutions are almost as accurate as and .
Sixthly, as we have stated, blur and fanbeamtomo are quite well conditioned. With the relatively small , we observe from Figures 5–6 that there is no semi-convergence phenomenon for LSQR, LSMR and MCGME as well as the TSVD method. This means that does not plays a part in regularization and these methods solve these two problems as if they were ordinary linear systems. Furthermore, it is clear from the figures that the relative errors of regularized solutions obtained by LSQR, LSMR and MCGME stabilize after 30 iterations for blur and 80 iterations for fanbeamtomo, respectively. Figures 5 (a) and 6 (a) seems to indicate that CGME has no semi-convergence phenomenon for the square blur and given but it has for the rectangular fanbeamtomo. However, this semi-convergence is in disguise and is not caused by the noise : For the rectangular fanbeamtomo, (44), its proof and the analysis on it state that the smallest singular value of can be arbitrarily small and approaches zero as increases. As we have elaborated, approaches the eigenvalue zero of as increases. As a result, the projected problem involved in CGME can become even worse conditioned than (1) itself as increases for rectangular, causing that , which equals , and the relative error tends to infinity with respect to . This can also be seen from (49), where we can easily check that as increases since is a constant but as increases.
In contrast, the smallest singular values of the projection matrices are always bounded from below by either for LSQR (cf. (43)) and MCGME (cf. (59)) or for LSMR (cf. (97)), no matter how is rectangular or square. This is why CGME has seemingly semi-convergence phenomenon for rectangular when the other solvers do not have. In the meantime, we see that the best regularized solution by CGME is substantially less accurate than those by the other three algorithms for fanbeamtomo. For the square blur with , we see that the four Krylov solvers and the TSVD method do not exhibit semi-convergence and compute the solutions with very comparable accuracy. These results and analysis tell us that CGME is definitely not a good choice when is rectangular.
Seventhly, if the relative noise level is increased to , the semi-convergence of LSQR, LSMR and MCGME occurs for fanbeamtomo, as is seen from Figure 6. We have also observed the semi-convergence of the four algorithms and the TSVD method for blur with . We find that the best regularized solutions by LSQR, LSMR and MCGME have very comparable accuracy but CGME computes a less accurate best regularized solution. We omit the corresponding figure. For the test problems, we have also observed that the semi-convergence of the TSVD method occurs much later than the four Krylov solvers, i.e., .
8 Conclusions
For a general large-scale ill-posed problem (1), iterative solvers are only computationally viable. Of them, the Krylov solvers LSQR, CGLS, CGME and LSMR have been commonly used. In terms of the accuracy of the rank approximation to in LSQR, in this paper we have derived accurate estimates for the accuracy of the rank approximations to and that are involved in CGME and LSMR, respectively. We have made detailed analyses on the approximation behavior of the singular values of the projection matrices associated with CGME and LSMR. In the meantime, we have derived the filtered SVD expansion of CGME regularized iterates. In conclusion, we have shown that the regularization of CGME is generally inferior to LSQR and the semi-convergence of CGME occurs no later than that of LSQR. We have extracted a best possible rank approximation to from the rank approximation , and have shown why such approximation is as accurate as the rank approximation in LSQR. Based on this analysis, as a by-product, we have proposed a modified CGME (MCGME) method that improves CGME substantially and has the same regularization ability as LSQR.
We have substantially improved a fundamental result, Theorem 9.3 in [16], which gives a bound for the approximation accuracy of the truncated rank SVD approximation to generated by randomized algorithms and lacks a complete understanding to its considerable overestimate. Our new bounds are unconditionally superior to theirs and reveal how the truncation step affects the accuracy of the truncated rank approximation to .
In the meantime, we have proved that LSMR has the same regularization ability as LSQR and the semi-convergence of LSMR occurs no sooner than that of LSQR. Particularly, we have shown that LSMR has the full regularization for severely and moderately ill-posed problems with suitable and .
We have made detailed numerical experiments to confirm our regularization results on CGME and LSMR. We have also numerically demonstrated that the best regularized solutions by MCGME are very comparable to those by LSQR.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] R. C. Aster, B. Borchers, and C. H. Thurber, Parameter Estimation and Inverse Problems , second ed., Elsevier, New York, 2013.
- 2[2] S. Berisha and J. G. Nagy, Restore tools: Iterative methods for image restoration , 2012, available from http://www.mathcs.emory.edu/ ∼ nagy/Restore Tools.
- 3[3] Å. Björck, Numerical Methods for Least Squares Problems , SIAM, Philadelphia, PA, 1996.
- 4[4] , Numerical Methods in Matrix Computations , Texts in Applied Mathematics, vol. 59, Springer, Cham, 2015.
- 5[5] J. Chung and K. Palmer, A hybrid LSMR algorithm for large-scale Tikhonov regularization , SIAM J. Sci. Comput., 37 (5) (2015), pp. S 562–S 580.
- 6[6] E. J. Craig, The N 𝑁 N -step iteration procedures , J. Math. Phys. 34 (1955), pp. 64–73.
- 7[7] J. Demmel, Applied Numerical Linear Algebra , SIAM, Philadelphia, PA, 1997.
- 8[8] B. Eicke, A. K. Lious, and R. Plato, The instability of some gradient methods for ill-posed problems , Numer. Math., 58 (1) (1990), pp. 129–134.
