The convergence of the Generalized Lanczos Trust-Region Method for the Trust-Region Subproblem
Zhongxiao Jia, Fa Wang

TL;DR
This paper develops a comprehensive convergence theory for the generalized Lanczos trust-region (GLTR) method, providing a-priori bounds for key solution errors and residuals in large-scale trust-region subproblems, validated by numerical experiments.
Contribution
It introduces the first a-priori convergence bounds for Lagrangian multipliers and residual norms in the GLTR method for large-scale trust-region subproblems.
Findings
Derived a-priori bounds for Lagrangian multiplier errors
Established convergence rates for residual norms
Numerical results confirm the bounds' accuracy
This work was supported in part by the National Science Foundation of China (No. 11771249).
Zhongxiao Jia (corresponding author), Department of Mathematical Sciences, Tsinghua University, 100084 Beijing, China.
Fa Wang, Department of Mathematical Sciences, Tsinghua University, 100084 Beijing, China.
Abstract
Solving the trust-region subproblem (TRS) plays a key role in numerical optimization and many other applications. The generalized Lanczos trust-region (GLTR) method is a well-known Lanczos-type approach for solving a large-scale TRS. The method projects the original large-scale TRS onto a $k$-dimensional Krylov subspace, whose orthonormal basis is generated by the symmetric Lanczos process, and computes an approximate solution from that subspace. A-priori error bounds for the optimal solution and the optimal objective value exist in the literature, but no a-priori result exists on the convergence of the Lagrangian multipliers of the projected TRS's or on the residual norm of the approximate solution. In this paper, a general convergence theory of the GLTR method is established, and a-priori bounds are derived for the errors of the optimal Lagrangian multiplier, the optimal solution, the optimal objective value and the residual norm of the approximate solution. Numerical experiments demonstrate that our bounds are realistic and accurately predict the convergence rates of the three errors and of the residual norms.
keywords:
trust-region subproblem, GLTR method, a-priori bound, Lagrangian multiplier, Chebyshev polynomial, eigenvalue problem, symmetric Lanczos process, Krylov subspace
AMS:
90C20, 90C30, 65K05, 65F10
1 Introduction
Consider the solution of the trust-region subproblem (TRS)
$$\min_{\|s\|\le\Delta} q(s),\qquad q(s)=\frac{1}{2}s^{T}As+g^{T}s, \tag{1}$$
where $A\in\mathbb{R}^{n\times n}$ is symmetric and nonsingular, the vector $g\in\mathbb{R}^{n}$ is nonzero, $\Delta>0$ is the trust-region radius, and $\|\cdot\|$ is the 2-norm of a matrix or vector. Problem (1) arises in nonlinear numerical optimization [3, 21], where $q(s)$ is a quadratic model of the objective function at the current approximate solution, and $A$ and $g$ are the Hessian and the gradient of the objective function there. It also arises in many other applications, e.g., Tikhonov regularization of ill-posed problems [23, 24], graph partitioning problems [14], the constrained eigenvalue problem [10], and the Levenberg–Marquardt algorithm for solving nonlinear least squares problems [21].
The following results [3, 20] provide a theoretical basis for TRS algorithms and give necessary and sufficient conditions, called the optimality conditions, for a solution of TRS (1).
Theorem 1.
A vector $s_*$ is a solution to (1) if and only if there exists an optimal Lagrangian multiplier $\lambda_*$ such that
[TABLE]
where $\|\cdot\|$ is the 2-norm of a matrix or vector, and the notation $\succeq 0$ indicates that a symmetric matrix is positive semidefinite.
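To make the conditions of Theorem 1 concrete, the following Python sketch (NumPy assumed) measures how far a candidate pair $(s,\lambda)$ is from satisfying them. We write the conditions in their standard form, $\lambda\ge 0$, $\|s\|\le\Delta$, $(A+\lambda I)s=-g$, $\lambda(\Delta-\|s\|)=0$ and $A+\lambda I\succeq 0$, and assume this matches the displayed conditions above. The function name and the test data are hypothetical. The test problem is built backwards from a chosen pair, so that its solution is known exactly.

```python
import numpy as np

def trs_optimality_violations(A, g, delta, s, lam):
    """Violations of the TRS optimality conditions for a candidate (s, lam).

    Conditions (standard form, assumed to match Theorem 1):
    lam >= 0, ||s|| <= delta, (A + lam I) s = -g,
    lam (delta - ||s||) = 0, and A + lam I positive semidefinite.
    Every entry is zero for an exact solution.
    """
    shifted = A + lam * np.eye(A.shape[0])
    norm_s = np.linalg.norm(s)
    return {
        "multiplier_sign": max(0.0, -lam),
        "feasibility": max(0.0, norm_s - delta),
        "stationarity": np.linalg.norm(shifted @ s + g),
        "complementarity": abs(lam * (delta - norm_s)),
        # negative part of the smallest eigenvalue of A + lam I
        "curvature": max(0.0, -np.linalg.eigvalsh(shifted)[0]),
    }

# Hypothetical test problem built backwards from a chosen pair (s_opt, lam_opt):
# s_opt lies on the boundary and A + lam_opt I is positive definite.
A = np.diag([-1.0, 2.0, 3.0])
lam_opt = 2.0
s_opt = np.array([0.6, 0.0, 0.8])            # ||s_opt|| = 1
g = -(A + lam_opt * np.eye(3)) @ s_opt
delta = 1.0
good = trs_optimality_violations(A, g, delta, s_opt, lam_opt)

# A stationary pair whose multiplier is too small: A + 0.5 I is indefinite.
lam_bad = 0.5
s_bad = np.linalg.solve(A + lam_bad * np.eye(3), -g)
bad = trs_optimality_violations(A, g, delta, s_bad, lam_bad)
print(good)
print(bad)
```

The second candidate satisfies the stationarity equation but fails the curvature and feasibility conditions. This is why the positive semidefiniteness condition cannot be dropped.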
TRS algorithms for solving (1) have been studied extensively for decades and can be classified into the following four categories; most of the algorithms in the first three categories are mentioned in [1].
- Accurate methods for dense problems. The Moré-Sorensen method [20] iteratively solves symmetric positive definite linear systems by Cholesky factorizations. It is highly efficient and accurate for small to medium sized dense problems.
- Accurate methods for large sparse problems. The algorithms in [23, 24, 26] iteratively compute the smallest eigenvalue of the bordered matrix $\begin{pmatrix}\alpha & g^{T}\\ g & A\end{pmatrix}$, where $\alpha$ is an adjustable parameter. Another approach, due to [22], solves the TRS via semidefinite programming, and a modification of the Moré-Sorensen method using Taylor series is presented in [9]. The generalized Lanczos trust-region (GLTR) method [8] solves the TRS by a Lanczos-type approach. Other accurate methods include subspace projection methods; see, e.g., [6, 13].
- Approximate methods. Steihaug and Toint independently proposed a truncated conjugate gradient (TCG) method [27, 29]. Yuan [30] proved that, when $q$ is convex, i.e., $A$ is symmetric positive definite, the reduction of $q$ obtained at the point produced by this method is at least half of that obtained at the minimizer. If $A$ is symmetric indefinite, an approximate solution must lie on the trust-region boundary, and TCG solves (1) only approximately.
- Eigenvalue-based methods. The method of Gander, Golub and von Matt [10] reduces TRS (1) to a single quadratic eigenvalue problem, which is linearized to a standard eigenvalue problem of size $2n$. Using a different derivation, Adachi et al. [1] extend the method in [10] to the more general TRS (6) and formulate it as a generalized eigenvalue problem of size $2n$. A solution to (1) can be determined from the rightmost eigenvalue and the associated eigenvector of the resulting matrix. The eigenvalue problem is solved by the QR algorithm for small or moderate $n$ and by iterative projection methods for large $n$ [25].
In applications, rather than simply using the 2-norm, some methods (see, e.g., [1, 8, 22, 26]) focus on the following more general TRS
[TABLE]
where $B$ is symmetric positive definite and $\|s\|_B=\sqrt{s^{T}Bs}$. In light of [23], the matrix $B$ is often constructed to impose a smoothness condition on a solution to (6) for ill-posed problems and to incorporate scaling of the variables in optimization. For instance, it is argued in [3] that a good choice is for some invertible matrix or the Hermitian polar factor [15] of .
Problem (6) is mathematically equivalent to a standard TRS (1) via the following substitutions:
[TABLE]
Therefore, when studying the convergence of the GLTR method, we may assume without loss of generality that $B=I$, the identity matrix, and consider only TRS (1).
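One way to realize such substitutions is through a Cholesky factorization $B=LL^T$ with $y=L^Ts$, $\tilde A=L^{-1}AL^{-T}$ and $\tilde g=L^{-1}g$. This particular choice is our assumption and need not coincide with the displayed substitutions. The Python sketch below (NumPy assumed, hypothetical function names and random data) checks numerically that the two objective values agree and that $\|s\|_B=\|y\|$.

```python
import numpy as np

def to_standard_trs(A, g, B):
    """Map TRS (6) with ||s||_B = sqrt(s^T B s) to a standard TRS (1).

    Assumed substitution: B = L L^T (Cholesky), y = L^T s,
    A_tilde = L^{-1} A L^{-T}, g_tilde = L^{-1} g.
    """
    L = np.linalg.cholesky(B)                 # lower triangular factor
    LinvA = np.linalg.solve(L, A)             # L^{-1} A
    A_tilde = np.linalg.solve(L, LinvA.T)     # L^{-1} A L^{-T} (A is symmetric)
    A_tilde = 0.5 * (A_tilde + A_tilde.T)     # remove rounding asymmetry
    g_tilde = np.linalg.solve(L, g)
    return A_tilde, g_tilde, L

def back_to_original(L, y):
    """Recover s = L^{-T} y from a point y of the standard TRS."""
    return np.linalg.solve(L.T, y)

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, n))
A = 0.5 * (X + X.T)                           # symmetric, possibly indefinite
Y = rng.standard_normal((n, n))
B = Y @ Y.T + n * np.eye(n)                   # symmetric positive definite
g = rng.standard_normal(n)

A_t, g_t, L = to_standard_trs(A, g, B)
s = rng.standard_normal(n)                    # arbitrary trial point
y = L.T @ s
q_original = 0.5 * s @ A @ s + g @ s
q_standard = 0.5 * y @ A_t @ y + g_t @ y
print(q_original, q_standard)                 # equal up to rounding
print(np.sqrt(s @ B @ s), np.linalg.norm(y))  # ||s||_B equals ||y||
```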
The GLTR method and other projection methods avoid the high overhead of computing a series of Cholesky factorizations and have been shown to be efficient for large-scale TRS's; see, e.g., [2, 5, 8]. Let $s_*$ be a solution to TRS (1), and let $s^{(k)}$ be the approximate solution obtained by the GLTR method from the underlying $k$-dimensional Krylov subspace $\mathcal{K}_k(A,g)$. By Theorem 1, each TRS projected onto $\mathcal{K}_k(A,g)$ has an optimal Lagrangian multiplier $\lambda^{(k)}$. This raises four central convergence questions: how fast the three errors (of the multiplier, the solution and the objective value) and the residual norm $\|(A+\lambda^{(k)}I)s^{(k)}+g\|$ of the approximate solution of (3) decrease as $k$ increases. For the solution error and the objective value error, some a-priori bounds have been derived in [31]. However, for the multiplier error $\lambda_*-\lambda^{(k)}$ and for the residual norm, there have been no a-priori bounds showing how fast they converge to zero as $k$ increases. The only known result on $\lambda^{(k)}$ is that it increases monotonically with $k$ and is bounded from above by $\lambda_*$ [18]. Therefore, we always have $\lambda^{(k)}\le\lambda_*$. The residual norm is important in both theory and practice, as it is computable and its size is commonly used to measure the convergence of the GLTR method. We mention that a mixed bound is given in [32, Lemma 3.4]. However, it is easy to check that this mixed bound does not exhibit any decreasing tendency. It can never be small unless the symmetric Lanczos process breaks down, in which case it is trivially zero.
Remarkably, it has recently been shown [1] that, under certain mild conditions, solving (1) is mathematically equivalent to solving a certain matrix eigenvalue problem of size $2n$. This equivalence provides a new approach to solving (1) efficiently. Moreover, it shows that, at iteration $k$, the GLTR method amounts to solving a certain eigenvalue problem of size $2k$. That problem is obtained by projecting the $2n\times 2n$ matrix eigenvalue problem onto a special $2k$-dimensional subspace of $\mathbb{R}^{2n}$ constructed from the Lanczos basis used in the GLTR method. Unlike the GLTR method, this eigenvalue formulation yields, at iteration $k$, the optimal multiplier $\lambda^{(k)}$ and the solution to the projected TRS simultaneously. This key observation is our starting point for studying the convergence of the GLTR method. Note that we are mainly concerned with $\sin\angle(s^{(k)},s_*)$ rather than the error $\|s^{(k)}-s_*\|$. In the context of the matrix eigenvalue problem, the sine is a standard measure of the error between an eigenvector and its approximations [28]. The authors of [1] also measure errors by the sine of the angle in their experiments.
The contributions of this paper are, in turn, the following:
- two a-priori bounds for the multiplier error $\lambda_*-\lambda^{(k)}$, the first such results;
- a bound for $\sin\angle(s^{(k)},s_*)$;
- the first bounds for the residual norm;
- a new sharp bound for the objective value error $q(s^{(k)})-q(s_*)$.

The bound for the objective value error differs from the two presented in [31], and its proof is simpler than those in [31]. The first a-priori bound for the multiplier error, though a considerable overestimate, is the basis for establishing the second, much sharper one. With the bounds for the multiplier error and for $\sin\angle(s^{(k)},s_*)$ or $\|s^{(k)}-s_*\|$, we derive a-priori bounds for the residual norm. To establish the first a-priori bound for the multiplier error and the a-priori bound for $\sin\angle(s^{(k)},s_*)$, we must solve a problem of best uniform polynomial approximation to a certain rational function. We handle it with the generating function of the Chebyshev polynomials of the second kind [4], which yields a suboptimal approximation polynomial. Numerical results demonstrate that our a-priori bounds predict the convergence rates of the three errors and residual norms and estimate their values accurately.
This paper is organized as follows. In section 2, we give some preliminaries and introduce the equivalence between the solution of (1) and a certain matrix eigenvalue problem. We review the GLTR method in section 3. Section 4 is devoted to a-priori bounds for the multiplier error and the objective value error. A-priori bounds for the solution error and the residual norm are presented in section 5. In section 6, we report numerical experiments that confirm that our bounds accurately estimate the convergence rates and behavior of the GLTR method. Finally, we conclude the paper in section 7.
Throughout this paper, we denote by the superscript $T$ the transpose of a matrix or vector, by $\|\cdot\|$ the 2-norm of a matrix or vector, by $I$ the identity matrix with order clear from the context, and by $e_j$ the $j$th column of $I$. All vectors are column vectors and are typeset in lower case letters.
2 Preliminaries
2.1 A solution to TRS (1)
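Suppose that $A=U\Lambda U^{T}$ is the eigendecomposition of $A$, where $U$ is orthogonal and $\Lambda=\mathrm{diag}(\lambda_1,\ldots,\lambda_n)$, with the eigenvalues $\lambda_i$ of $A$ labeled as $\lambda_1\le\lambda_2\le\cdots\le\lambda_n$.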
If , then the solution to TRS (1) is unique and . If (1) has no solution with , then is positive definite and with and . All these correspond to the so-called “easy case” [3, 8, 20, 21] or “nondegenerate case” [13].
If is indefinite and
[TABLE]
the null space of , then we have the following definition [3, 8, 21].
Definition 2 (Hard Case).
TRS (1) is said to be in the hard case if $g$ is orthogonal to the eigenspace corresponding to the smallest eigenvalue $\lambda_1$ of $A$ and the optimal Lagrangian multiplier is $\lambda_*=-\lambda_1$.
The hard case is also called the “degenerate case” [13]. In this case, (1) may have multiple optimal solutions [21, p.87-88], which can be characterized as
[TABLE]
where and , , and the superscript denotes the Moore-Penrose generalized inverse. with is unique if and only if is a simple eigenvalue of and the scalar satisfies
[TABLE]
As we can see, in the hard case, we not only need to solve a singular system but also need to compute the eigenspace of associated with the smallest eigenvalue . The hard case has been studied for years; see, e.g., [7, 8, 20, 21, 22]. An eigensolver is proposed in [1] to detect and handle the hard case theoretically and numerically.
As noted in [3], the hard case rarely occurs in practice, since it requires both that $A$ be indefinite and that $g$ be orthogonal to the eigenspace of $A$ associated with $\lambda_1$. In the sequel, we are only concerned with the easy case.
2.2 The equivalence of the TRS and a matrix eigenvalue problem
Adachi et al. [1] prove that TRS (6) can be treated by solving a certain generalized eigenvalue problem of order $2n$. For $B=I$, the generalized eigenvalue problem in [1] reduces to the standard eigenvalue problem of the augmented matrix
[TABLE]
Let be the eigenvalues of labeled as
[TABLE]
where $\mathrm{Re}(\cdot)$ denotes the real part of a scalar. The following result in [1] establishes a key relationship between the TRS solution and an eigenpair of the augmented matrix (7).
Theorem 3 ([1]).
Let satisfy Theorem 1 with . Then the rightmost eigenvalue of is real and simple, and . Let be the unit length eigenvector of associated with the eigenvalue , i.e.,
[TABLE]
and suppose that . Then the unique TRS solution is
[TABLE]
Remark 2.1.
Adachi et al. [1] have proved that corresponds to the hard case, i.e., and Therefore, in the easy case, is guaranteed, and (10) holds.
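The eigenvalue route of Theorem 3 can be sketched in a few lines of Python (NumPy assumed). We assume the augmented matrix has the block form $\begin{pmatrix}-A & gg^T/\Delta^2\\ I & -A\end{pmatrix}$. For this layout, a direct calculation shows that an eigenvector $(y_1;y_2)$ for an eigenvalue $\lambda$ satisfies $y_1=(A+\lambda I)y_2$ and $(A+\lambda I)y_1=g\,(g^Ty_2)/\Delta^2$. Hence $s=-(\Delta^2/g^Ty_2)\,y_1$ solves $(A+\lambda I)s=-g$ with $\|s\|=\Delta$. The block order in (7) and the exact form of (10) may differ from this layout, so treat the sketch as an illustration only. A bisection on the secular equation serves as an independent reference. All function names and test data are hypothetical.

```python
import numpy as np

def augmented_matrix(A, g, delta):
    """Assumed 2n x 2n layout [[-A, g g^T / delta^2], [I, -A]] (B = I)."""
    n = A.shape[0]
    return np.block([[-A, np.outer(g, g) / delta**2], [np.eye(n), -A]])

def trs_via_eigenproblem(A, g, delta):
    """Boundary TRS solution from the rightmost eigenpair (easy case assumed)."""
    n = A.shape[0]
    evals, evecs = np.linalg.eig(augmented_matrix(A, g, delta))
    j = np.argmax(evals.real)                    # rightmost eigenvalue
    lam = evals[j].real
    y = evecs[:, j]
    y = (y / y[np.argmax(np.abs(y))]).real       # strip the arbitrary complex phase
    y1, y2 = y[:n], y[n:]
    if abs(g @ y2) <= 1e-12 * np.linalg.norm(g) * np.linalg.norm(y2):
        raise ValueError("g^T y2 = 0: hard case, not handled in this sketch")
    return lam, -(delta**2 / (g @ y2)) * y1

def secular_multiplier(A, g, delta, iters=200):
    """Reference lambda_* by bisection on ||(A + lam I)^{-1} g|| = delta."""
    w, U = np.linalg.eigh(A)
    c = U.T @ g
    lo = max(0.0, -w[0])
    hi = lo + np.linalg.norm(g) / delta + 1.0    # here the norm is below delta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(c / (w + mid)) > delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
n, delta = 8, 1.0
Qr, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Qr @ np.diag([-2.0, -1.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0]) @ Qr.T
A = 0.5 * (A + A.T)                              # indefinite, so ||s_*|| = delta
g = rng.standard_normal(n)

lam_eig, s_eig = trs_via_eigenproblem(A, g, delta)
lam_ref = secular_multiplier(A, g, delta)
print(lam_eig, lam_ref, np.linalg.norm(s_eig))
```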
3 The generalized Lanczos trust-region (GLTR) method [8]
For large $n$, an effective approach is to solve a sequence of smaller projected problems
[TABLE]
where the search subspace is a specially chosen $k$-dimensional subspace of $\mathbb{R}^{n}$, and the solution to TRS (11) is used to approximate $s_*$.
The most commonly used subspace is the $k$-dimensional Krylov subspace
$$\mathcal{K}_k(A,g)=\mathrm{span}\{g,Ag,\ldots,A^{k-1}g\}, \tag{12}$$
generated by $A$ and $g$. The GLTR method starts with the TCG method [27, 29]. When $A$ is positive definite and $\|A^{-1}g\|<\Delta$, which corresponds to $\lambda_*=0$, the method returns a converged approximation to $s_*=-A^{-1}g$. In this case, the convergence theory of the standard conjugate gradient method applies directly. The GLTR method switches to the Lanczos method to solve the projected problem (11) accurately whenever negative curvature is encountered or the norm of the TCG iterate exceeds the trust-region radius $\Delta$, which corresponds to an indefinite $A$ or to $\lambda_*>0$. It proceeds in this way until $s^{(k)}$ converges to $s_*$.
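For reference, here is a minimal Python sketch (NumPy assumed, hypothetical names) of the Steihaug-Toint truncated CG iteration that GLTR starts with. It is written from the standard description of the method. The iteration stops on negative curvature, or when a CG step would leave the trust region, and in both cases moves to the boundary along the current search direction. The GLTR method does not stop there; it continues with the Lanczos-based solution of (11).

```python
import numpy as np

def boundary_step(s, p, delta):
    """Largest tau >= 0 with ||s + tau p|| = delta (requires ||s|| <= delta)."""
    a, b, c = p @ p, 2.0 * (s @ p), s @ s - delta**2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def truncated_cg(A, g, delta, tol=1e-12, maxiter=None):
    """Steihaug-Toint truncated CG for min 0.5 s^T A s + g^T s, ||s|| <= delta."""
    n = len(g)
    s = np.zeros(n)
    r = g.copy()                      # gradient A s + g at s = 0
    p = -r
    for _ in range(maxiter or n):
        Ap = A @ p
        curvature = p @ Ap
        if curvature <= 0.0:          # negative curvature: go to the boundary along p
            return s + boundary_step(s, p, delta) * p, "negative curvature"
        alpha = (r @ r) / curvature
        if np.linalg.norm(s + alpha * p) >= delta:   # CG step would leave the region
            return s + boundary_step(s, p, delta) * p, "boundary"
        s = s + alpha * p
        r_new = r + alpha * Ap
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(g):
            return s, "converged"
        p = -r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return s, "max iterations"

g = np.ones(4)
A_spd = np.diag([1.0, 2.0, 4.0, 8.0])
s_in, flag_in = truncated_cg(A_spd, g, delta=10.0)   # interior case, lambda_* = 0
A_ind = np.diag([-1.0, 2.0, 4.0, 8.0])
s_bd, flag_bd = truncated_cg(A_ind, g, delta=1.0)    # indefinite A: ends on the boundary
print(flag_in, s_in)
print(flag_bd, np.linalg.norm(s_bd))
```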
In the sequel, we assume without loss of generality that the TCG method does not solve (11) exactly. The Lanczos method must then be used from the first iteration on to compute the solution to (11) with $\|s^{(k)}\|=\Delta$, meaning that $\lambda^{(k)}>0$ for all $k$.
We now describe the GLTR method. At iteration $k$, the GLTR method mathematically exploits the symmetric Lanczos process to generate an orthonormal basis of $\mathcal{K}_k(A,g)$ defined by (12), which can be written in matrix form
[TABLE]
where $Q_k$ has orthonormal columns and the matrix
[TABLE]
is symmetric tridiagonal. It is called the orthogonal projection matrix of $A$ onto $\mathcal{K}_k(A,g)$ in the orthonormal basis $Q_k$.
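A minimal Python sketch of the Lanczos process in its matrix form follows (NumPy assumed). It builds $Q_k$ and the entries of $T_k$ using complete reorthogonalization, as in the experiments of section 6. It then checks the standard relation $AQ_k=Q_kT_k+\beta_kq_{k+1}e_k^T$ and $Q_k^TAQ_k=T_k$ numerically. We assume this relation is what the displayed matrix form expresses. The function names, the indexing of `beta` and the random test matrix are our own choices.

```python
import numpy as np

def lanczos(A, g, k):
    """k steps of symmetric Lanczos on (A, g) with complete reorthogonalization.

    Returns Q = [q_1, ..., q_{k+1}], alpha (diagonal of T_k) and
    beta = (beta_1, ..., beta_k), where beta_k couples T_k to q_{k+1}.
    """
    n = A.shape[0]
    Q = np.zeros((n, k + 1))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    Q[:, 0] = g / np.linalg.norm(g)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        # complete reorthogonalization: two Gram-Schmidt passes against q_1..q_{j+1}
        for _ in range(2):
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        if beta[j] <= 1e-14 * np.linalg.norm(A):
            raise RuntimeError(f"Lanczos breakdown at step {j + 1}")
        Q[:, j + 1] = w / beta[j]
    return Q, alpha, beta

def tridiagonal(alpha, beta):
    """Symmetric tridiagonal T_k from alpha (length k) and beta (length >= k - 1)."""
    k = len(alpha)
    off = beta[:k - 1]
    return np.diag(alpha) + np.diag(off, 1) + np.diag(off, -1)

rng = np.random.default_rng(1)
n, k = 40, 8
X = rng.standard_normal((n, n))
A = 0.5 * (X + X.T)
g = rng.standard_normal(n)

Q, alpha, beta = lanczos(A, g, k)
Qk, Tk = Q[:, :k], tridiagonal(alpha, beta)
e_k = np.eye(k)[:, -1]
# matrix form of the Lanczos process: A Q_k = Q_k T_k + beta_k q_{k+1} e_k^T
relation_error = np.linalg.norm(A @ Qk - Qk @ Tk - beta[k - 1] * np.outer(Q[:, k], e_k))
print(relation_error, np.linalg.norm(Qk.T @ Qk - np.eye(k)))
```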
We consider vectors of the form
[TABLE]
Let $s^{(k)}$ solve the projected problem
[TABLE]
It then follows from (17) and the Lanczos process that $h^{(k)}$ solves the reduced TRS
[TABLE]
and that $s^{(k)}=Q_kh^{(k)}$.
By Theorem 1, the vector $h^{(k)}$ is a solution to (19) if and only if there exists an optimal Lagrangian multiplier $\lambda^{(k)}$ such that
[TABLE]
Since $T_k$ is tridiagonal, the Moré-Sorensen method can solve (18) efficiently even if $k$ is large, and $s^{(k)}$ is then obtained from $s^{(k)}=Q_kh^{(k)}$. The resulting method is the GLTR method for solving (1). It has been shown in [1] that TRS (19) is always in the easy case, provided that the symmetric Lanczos process does not break down at iteration $k$. Under the assumption that , this means that we always for all , where is the first iteration at which the symmetric Lanczos process breaks down, i.e., .
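The reduced problem involves only the $k\times k$ tridiagonal matrix $T_k$ and the scalar $\|g\|$. The following Python sketch (NumPy assumed) solves it by Newton's method on the secular equation $1/\|h(\lambda)\|-1/\Delta=0$, with $h(\lambda)=-(T_k+\lambda I)^{-1}\|g\|e_1$. This is the equation the Moré-Sorensen method iterates on. The sketch is not the Moré-Sorensen algorithm itself: for brevity it uses a dense eigendecomposition of $T_k$ instead of Cholesky factorizations of $T_k+\lambda I$, and it ignores the tridiagonal structure. Names and test data are hypothetical.

```python
import numpy as np

def solve_tridiagonal_trs(T, gamma, delta, tol=1e-14, maxit=100):
    """Solve min 0.5 h^T T h + gamma e_1^T h subject to ||h|| <= delta.

    Newton's method on psi(lam) = 1/||h(lam)|| - 1/delta. Assumes the easy case,
    which holds for an unreduced T: every eigenvector of an unreduced symmetric
    tridiagonal matrix has a nonzero first component.
    """
    theta, V = np.linalg.eigh(T)
    c = gamma * V[0, :]                          # V^T (gamma e_1)

    def h_of(lam):
        return -V @ (c / (theta + lam))          # -(T + lam I)^{-1} gamma e_1

    if theta[0] > 0.0 and np.linalg.norm(h_of(0.0)) <= delta:
        return h_of(0.0), 0.0                    # interior solution, multiplier 0
    # start left of the root (||h|| > delta); psi is concave and increasing there,
    # so the Newton iterates increase monotonically toward the root
    lam = 0.0 if theta[0] > 0.0 else -theta[0] + 0.5 * abs(c[0]) / delta
    for _ in range(maxit):
        d = c / (theta + lam)
        norm_h = np.linalg.norm(d)
        psi = 1.0 / norm_h - 1.0 / delta
        dpsi = np.sum(d**2 / (theta + lam)) / norm_h**3
        step = psi / dpsi
        lam -= step
        if abs(step) <= tol * max(1.0, abs(lam)):
            break
    return h_of(lam), lam

rng = np.random.default_rng(3)
k = 10
alpha = rng.standard_normal(k)                   # hypothetical diagonal of T_k
beta = rng.uniform(0.5, 1.5, k - 1)              # nonzero off-diagonal: T_k unreduced
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
gamma, delta = 2.0, 1.0
h, lam = solve_tridiagonal_trs(T, gamma, delta)
e1 = np.eye(k)[:, 0]
print(lam, np.linalg.norm(h), np.linalg.norm((T + lam * np.eye(k)) @ h + gamma * e1))
```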
The authors of [8] prove that the residual norm of $\lambda^{(k)}$ and $s^{(k)}$, as approximate solutions of (3), satisfies
[TABLE]
It follows that if the symmetric Lanczos process breaks down for the first time at iteration $k$, then $\lambda^{(k)}=\lambda_*$ and $s^{(k)}=s_*$. This result shows that, until a prescribed convergence tolerance is achieved, the residual norm can be computed cheaply from the last entry of $h^{(k)}$ without explicitly forming $s^{(k)}$.
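Putting the pieces together, the following Python sketch (NumPy assumed) runs a bare-bones GLTR loop: Lanczos with complete reorthogonalization, then the reduced TRS at each $k$. It compares the true residual norm $\|(A+\lambda^{(k)}I)s^{(k)}+g\|$ with the cheap quantity $\beta_k|e_k^Th^{(k)}|$. The two agree because $AQ_k=Q_kT_k+\beta_kq_{k+1}e_k^T$ and $(T_k+\lambda^{(k)}I)h^{(k)}=-\|g\|e_1$. This is our reading of the displayed identity. Breakdown handling, the TCG phase and stopping tests are omitted, and all names are hypothetical.

```python
import numpy as np

def lanczos(A, g, k):
    """Symmetric Lanczos with complete reorthogonalization (no breakdown check)."""
    n = A.shape[0]
    Q, alpha, beta = np.zeros((n, k + 1)), np.zeros(k), np.zeros(k)
    Q[:, 0] = g / np.linalg.norm(g)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        for _ in range(2):
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    return Q, alpha, beta

def solve_tridiagonal_trs(T, gamma, delta, maxit=100):
    """Reduced TRS via Newton on 1/||h(lam)|| - 1/delta (easy case assumed)."""
    theta, V = np.linalg.eigh(T)
    c = gamma * V[0, :]
    h_of = lambda lam: -V @ (c / (theta + lam))
    if theta[0] > 0.0 and np.linalg.norm(h_of(0.0)) <= delta:
        return h_of(0.0), 0.0
    lam = 0.0 if theta[0] > 0.0 else -theta[0] + 0.5 * abs(c[0]) / delta
    for _ in range(maxit):
        d = c / (theta + lam)
        nh = np.linalg.norm(d)
        step = (1.0 / nh - 1.0 / delta) / (np.sum(d**2 / (theta + lam)) / nh**3)
        lam -= step
        if abs(step) <= 1e-14 * max(1.0, lam):
            break
    return h_of(lam), lam

def gltr(A, g, delta, kmax):
    """Iterates (lam_k, s_k, true residual norm, beta_k |e_k^T h_k|), k = 1..kmax."""
    Q, alpha, beta = lanczos(A, g, kmax)
    gamma, n = np.linalg.norm(g), len(g)
    history = []
    for k in range(1, kmax + 1):
        T = np.diag(alpha[:k]) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
        h, lam = solve_tridiagonal_trs(T, gamma, delta)
        s = Q[:, :k] @ h                          # s_k = Q_k h_k
        true_res = np.linalg.norm((A + lam * np.eye(n)) @ s + g)
        cheap_res = beta[k - 1] * abs(h[-1])      # only the last entry of h_k
        history.append((lam, s, true_res, cheap_res))
    return history

rng = np.random.default_rng(4)
n = 60
X = rng.standard_normal((n, n))
A = 0.5 * (X + X.T)
g = rng.standard_normal(n)
history = gltr(A, g, delta=1.0, kmax=12)
for k, (lam, s, true_res, cheap_res) in enumerate(history, start=1):
    print(f"k={k:2d}  lambda_k={lam:.6f}  true={true_res:.3e}  cheap={cheap_res:.3e}")
```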
In the next two sections, we study the convergence of the GLTR method. We establish a-priori bounds for the errors $\lambda_*-\lambda^{(k)}$, $\sin\angle(s^{(k)},s_*)$ and $q(s^{(k)})-q(s_*)$ and for the residual norm, and we prove how they decrease as $k$ increases. We point out that, unlike the error $\|s^{(k)}-s_*\|$ considered in [31, 32], we consider $\sin\angle(s^{(k)},s_*)$.
4 A-priori bounds for $\lambda_*-\lambda^{(k)}$ and $q(s^{(k)})-q(s_*)$
In this section, we establish a-priori bounds for $\lambda_*-\lambda^{(k)}$. It is known from [18] that $\lambda^{(k)}$ increases monotonically with $k$ and is bounded from above by $\lambda_*$. Precisely, suppose that the symmetric Lanczos process breaks down at some iteration $k_{\max}$. Then it holds that
[TABLE]
Under the assumption that , we have for , but there has been no quantitative result on how fast converges to .
Define the matrix
[TABLE]
with defined by (7) and
[TABLE]
with the columns of the orthonormal defined by (13). It is straightforward that
[TABLE]
with defined by (16) and .
Obviously, this block matrix has orthonormal columns, and they span the $2k$-dimensional subspace
[TABLE]
Therefore, is the orthogonal projection matrix of onto in the orthonormal basis and .
Let , be the eigenvalues of , which, similarly to (8), are labeled as
[TABLE]
From Theorem 3 it is known that
[TABLE]
is real and simple.
Let $z^{(k)}=\begin{pmatrix}z^{(k)}_{1}\\ z^{(k)}_{2}\end{pmatrix}$ be the unit length eigenvector of associated with , i.e.,
[TABLE]
Then the vector
[TABLE]
is the Ritz vector of from the subspace and approximates the unit length eigenvector of associated with its rightmost real eigenvalue .
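The projection described above can be checked numerically in Python (NumPy assumed). We assume the augmented matrix has the layout $\begin{pmatrix}-A & gg^T/\Delta^2\\ I & -A\end{pmatrix}$ and that the orthonormal block matrix above is $\mathrm{diag}(Q_k,Q_k)$. Under these assumptions, the projected matrix has the same form with $A$ replaced by $T_k$ and $g$ by $\|g\|e_1$, so its rightmost eigenvalue should coincide with the multiplier $\lambda^{(k)}$ of the reduced TRS. The sketch verifies both facts on a hypothetical diagonal test matrix with a symmetric spectrum. That choice makes every $T_k$ of even order indefinite, so the reduced solution lies on the boundary.

```python
import numpy as np

def lanczos(A, g, k):
    """Symmetric Lanczos with complete reorthogonalization (no breakdown check)."""
    n = A.shape[0]
    Q, alpha, beta = np.zeros((n, k + 1)), np.zeros(k), np.zeros(k)
    Q[:, 0] = g / np.linalg.norm(g)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        for _ in range(2):
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    return Q, alpha, beta

def augmented(A, g, delta):
    """Assumed block layout [[-A, g g^T / delta^2], [I, -A]]."""
    n = A.shape[0]
    return np.block([[-A, np.outer(g, g) / delta**2], [np.eye(n), -A]])

def reduced_multiplier(T, gamma, delta, iters=200):
    """Bisection on ||(T + lam I)^{-1} gamma e_1|| = delta (boundary case)."""
    theta, V = np.linalg.eigh(T)
    c = gamma * V[0, :]
    lo = max(0.0, -theta[0])
    hi = lo + gamma / delta + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(c / (theta + mid)) > delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, k, delta = 50, 6, 1.0
A = np.diag(np.linspace(-5.0, 5.0, n))           # symmetric spectrum
g = np.ones(n)
gamma = np.linalg.norm(g)

Q, alpha, beta = lanczos(A, g, k)
Qk = Q[:, :k]
T = np.diag(alpha) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
Z0 = np.zeros((n, k))
Qhat = np.block([[Qk, Z0], [Z0, Qk]])            # orthonormal basis of a 2k-dim subspace

Mk = Qhat.T @ augmented(A, g, delta) @ Qhat      # projection of the 2n x 2n matrix
Mk_reduced = augmented(T, gamma * np.eye(k)[:, 0], delta)

evals, Z = np.linalg.eig(Mk)
j = np.argmax(evals.real)
ritz_value = evals[j].real
z = Z[:, j]
z = (z / z[np.argmax(np.abs(z))]).real
z /= np.linalg.norm(z)
ritz_vector = Qhat @ z                           # Ritz vector in R^{2n}

lam_k = reduced_multiplier(T, gamma, delta)
print(ritz_value, lam_k, np.linalg.norm(Mk - Mk_reduced))
```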
From the structure (27) of and the definition (30) of , it is easy to show that
[TABLE]
is the left eigenvector of corresponding to the real simple eigenvalue , and from (30) it is straightforward to verify that
[TABLE]
Therefore, by definition (cf. [28, p.186]), the spectral condition number of is
[TABLE]
Similarly, by the structure (7) of and the definition (9) of , the vector is the left eigenvector of associated with the eigenvalue . As a result, the spectral condition number of is
[TABLE]
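As an illustration of this kind of structure, consider the layout $M=\begin{pmatrix}-A & gg^T/\Delta^2\\ I & -A\end{pmatrix}$. Both the layout and the name $M$ are our assumptions. A direct calculation shows that swapping the two blocks of a right eigenvector $(y_1;y_2)$ gives a left eigenvector $(y_2;y_1)$ for the same eigenvalue. For a unit right eigenvector, this gives the spectral condition number $1/(2|y_1^Ty_2|)$. The Python sketch (NumPy assumed) compares this with the generic formula $\|x\|\|y\|/|x^Ty|$, using a left eigenvector computed from $M^T$. It is not a transcription of the displayed formulas.

```python
import numpy as np

def rightmost_unit_eigvec(Mat):
    """Rightmost eigenvalue (assumed real) and a real unit eigenvector."""
    w, V = np.linalg.eig(Mat)
    j = np.argmax(w.real)
    v = V[:, j]
    v = (v / v[np.argmax(np.abs(v))]).real
    return w[j].real, v / np.linalg.norm(v)

rng = np.random.default_rng(2)
n, delta = 8, 1.0
Qr, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Qr @ np.diag([-3.0, -1.0, -0.5, 0.5, 1.0, 2.0, 4.0, 6.0]) @ Qr.T
A = 0.5 * (A + A.T)
g = rng.standard_normal(n)
# assumed block layout [[-A, g g^T / delta^2], [I, -A]]
M = np.block([[-A, np.outer(g, g) / delta**2], [np.eye(n), -A]])

lam, y = rightmost_unit_eigvec(M)        # right eigenvector y = (y1; y2)
y1, y2 = y[:n], y[n:]
x = np.concatenate([y2, y1])             # candidate left eigenvector (blocks swapped)
left_residual = np.linalg.norm(M.T @ x - lam * x)

kappa_swap = 1.0 / abs(2.0 * (y1 @ y2))  # 1 / |x^T y| with ||x|| = ||y|| = 1
lam_t, x_generic = rightmost_unit_eigvec(M.T)
kappa_generic = 1.0 / abs(x_generic @ y)
print(lam, left_residual, kappa_swap, kappa_generic)
```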
By Theorem 3, the unique solution to (19) is
[TABLE]
and the unique solution to TRS (18) is
[TABLE]
Denote by the acute angle between a nonzero vector and . Then
[TABLE]
where is the orthogonal projector onto . In terms of Theorem 3 and (29), we have
[TABLE]
where is the rightmost eigenvalue of .
Let be the orthogonal projector onto . Then is the restriction of to the subspace and its matrix representation is in the orthonormal basis and . The eigenvalues of restricted to are the eigenvalues of , and the eigenvectors are the Ritz vectors of from ; see [25] for details. Therefore, a direct application of Theorem 3.8 in [16] to our context gives the following result.
Lemma 4.
Let and be the rightmost eigenvalues of and , respectively, and suppose that . Then for small it holds that
[TABLE]
where is defined by (43) and .¹
¹ In Theorem 3.8 of [16], the right-hand side of (49) is stated with a different trigonometric function of the same angle. However, the sine and the tangent can obviously be interchanged in the right-hand side when the angle becomes small.
[TABLE]
which converges to defined by (44) when . This is indeed the case, as will be shown in the next section. In the meantime, . As a result, by this lemma, the convergence of to reduces to analyzing how fast decreases as increases.
Notice that
[TABLE]
Therefore, in order to bound and to show how it converges to zero as increases, we need to analyze and separately.
We first consider . Throughout the paper, we denote by the set of polynomials of degree not exceeding . We begin with the following result.
Lemma 5.
The distance between and satisfies
[TABLE]
and
[TABLE]
where
[TABLE]
with being the eigenvalues of . Moreover,
[TABLE]
where is the condition number of .
Proof. By Theorem 1, $s_*$ satisfies the linear system $(A+\lambda_*I)s_*=-g$. Therefore, exploiting the shift invariance $\mathcal{K}_k(A,g)=\mathcal{K}_k(A+\lambda_*I,g)$ and the eigendecomposition $A=U\Lambda U^{T}$, we have
[TABLE]
with the polynomial and .
Note that $A+\lambda_*I$ is symmetric positive definite. Applying a standard estimate (cf. [11, p. 51, Theorem 3.1.1]) to the above polynomial minimization problem, we obtain (54).
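The polynomial estimate behind Lemma 5 can be observed numerically. Since $s_*=-(A+\lambda_*I)^{-1}g$ and $\mathcal{K}_k(A,g)=\mathcal{K}_k(A+\lambda_*I,g)$, the standard Chebyshev argument gives $\sin\angle(s_*,\mathcal{K}_k(A,g))\le 2\rho^k$. Here $\rho=(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ and $\kappa$ is the condition number of $A+\lambda_*I$. This is our restatement of the generic bound, and its constants may differ from those in (54). The Python sketch (NumPy assumed) uses a hypothetical diagonal test problem and computes $\lambda_*$ by bisection.

```python
import numpy as np

def krylov_basis(A, g, k):
    """Orthonormal basis of K_k(A, g) (Arnoldi with reorthogonalization)."""
    V = np.zeros((len(g), k))
    V[:, 0] = g / np.linalg.norm(g)
    for j in range(1, k):
        w = A @ V[:, j - 1]
        for _ in range(2):
            w -= V[:, :j] @ (V[:, :j].T @ w)
        V[:, j] = w / np.linalg.norm(w)
    return V

def trs_solution(A, g, delta, iters=200):
    """Boundary solution (lambda_*, s_*) via bisection on the secular equation."""
    w, U = np.linalg.eigh(A)
    c = U.T @ g
    lo = max(0.0, -w[0])
    hi = lo + np.linalg.norm(g) / delta + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(c / (w + mid)) > delta:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, -U @ (c / (w + lam))

n, delta = 200, 1.0
eigs = np.linspace(-1.0, 10.0, n)                       # hypothetical indefinite spectrum
A = np.diag(eigs)
g = np.ones(n) / np.sqrt(n)
lam_star, s_star = trs_solution(A, g, delta)

kappa = (eigs[-1] + lam_star) / (eigs[0] + lam_star)    # condition number of A + lam_* I
rho = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)

sines, bounds = [], []
for k in range(1, 16):
    V = krylov_basis(A, g, k)
    residual = s_star - V @ (V.T @ s_star)              # part of s_* orthogonal to K_k
    sines.append(np.linalg.norm(residual) / np.linalg.norm(s_star))
    bounds.append(2.0 * rho**k)
for k, (sv, bd) in enumerate(zip(sines, bounds), start=1):
    print(f"k={k:2d}  sin={sv:.3e}  2*rho^k={bd:.3e}")
```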
Relation (10) shows that is the same as up to a scaling. Therefore, replacing in (51) and (52) by and exploiting (54), we have established the following upper bound for .
Theorem 6.
Let be the unit length eigenvector of associated with its rightmost eigenvalue . Then
[TABLE]
where .
As will be seen, estimating is much more involved.
Theorem 7.
With the previous notation, we have
[TABLE]
where and are the largest and smallest eigenvalues of , and
[TABLE]
with
[TABLE]
where .
Proof. Recall that $A=U\Lambda U^{T}$ is the eigendecomposition of $A$, where $U$ is orthogonal and $\Lambda$ is diagonal with the eigenvalues of $A$ on its diagonal.
From and (10), we obtain
[TABLE]
From (9), we have
[TABLE]
Making use of , (59) and the orthogonality of , we then obtain
[TABLE]
Consider the variable transformation
[TABLE]
which maps to in one-to-one correspondence. Then
[TABLE]
is the error of the best (optimal) uniform polynomial approximation from to the rational function over the interval with . To the best of our knowledge, no explicit solution to such an approximation problem is known. However, recall from (50) that . Therefore, it suffices to prove that is of the same order as bound (55), because this means that is at least as small as bound (55) for . To this end, we exploit Chebyshev polynomials of the second kind and one of their fundamental properties to establish the desired bound for , which is indeed as small as bound (55).
Theorem 8.
The approximation error
[TABLE]
and
[TABLE]
where and .
Proof. For any $x\in[-1,1]$ and $|t|<1$, there is the following generating function [4, p. 215]:
[TABLE]
where $U_j(x)$ is the $j$th degree Chebyshev polynomial of the second kind [4, p. 212].
For , it is easily justified that . Therefore, the identity (63) becomes
[TABLE]
from which it follows that
[TABLE]
Taking the th degree polynomial
[TABLE]
and noting that for and for , we have
[TABLE]
From (58), it is straightforward to justify that
[TABLE]
Therefore, from (56), (60) and (65) it follows that (61) and (62) hold.
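The generating function used in the proof, $\sum_{j\ge 0}U_j(x)t^j=1/(1-2xt+t^2)$ for $x\in[-1,1]$ and $|t|<1$, combines with $|U_j(x)|\le j+1$ on $[-1,1]$. Together they show that truncating the series after degree $k$ gives a polynomial in $x$ whose uniform error is at most $\sum_{j>k}(j+1)|t|^j$. The Python sketch below (NumPy assumed) checks this numerically. It does not reproduce the variable transformation or the specific polynomial used in Theorem 8, and the names are hypothetical.

```python
import numpy as np

def chebyshev_u(k, x):
    """U_0(x), ..., U_k(x) via U_0 = 1, U_1 = 2x, U_{j+1} = 2x U_j - U_{j-1}."""
    x = np.asarray(x, dtype=float)
    U = np.empty((k + 1,) + x.shape)
    U[0] = 1.0
    if k >= 1:
        U[1] = 2.0 * x
    for j in range(1, k):
        U[j + 1] = 2.0 * x * U[j] - U[j - 1]
    return U

def truncated_series(k, x, t):
    """Degree-k polynomial p_k(x) = sum_{j=0}^{k} U_j(x) t^j."""
    return np.tensordot(t ** np.arange(k + 1), chebyshev_u(k, x), axes=1)

def tail_bound(k, t):
    """Closed form of sum_{j>k} (j+1)|t|^j, using |U_j(x)| <= j+1 on [-1, 1]."""
    r = abs(t)
    return r ** (k + 1) * ((k + 2) - (k + 1) * r) / (1.0 - r) ** 2

x = np.linspace(-1.0, 1.0, 2001)
t = 0.6                                     # hypothetical parameter with |t| < 1
exact = 1.0 / (1.0 - 2.0 * x * t + t * t)   # generating function
errors = {k: np.max(np.abs(exact - truncated_series(k, x, t))) for k in (5, 10, 20, 40)}
for k, err in errors.items():
    print(f"k={k:2d}  max error={err:.3e}  bound={tail_bound(k, t):.3e}")
```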
Combining Lemma 4, (50), Theorem 6 and Theorem 8, a simple manipulation yields the following bounds for and .
Theorem 9.
Suppose that . Then
[TABLE]
and asymptotically
[TABLE]
where
[TABLE]
with the orthogonal projector onto defined by (28), and and defined by (43) and (66), respectively.
A-priori bound (68) proves, for the first time, that converges to zero as increases. In fact, based on this bound, we can further establish a much sharper bound for . Before proceeding, we derive the following result, which plays a key role in establishing the sharper a-priori bound for .
Theorem 10.
For , the following a-priori bound holds:
[TABLE]
where and .
Proof. Consider the symmetric positive definite linear system
[TABLE]
with , which is (21) for and has the solution . Taking as the starting vector, i.e., taking the zero vector as the initial guess for , the symmetric Lanczos process generates an orthonormal basis of the $k$-dimensional Krylov subspace
[TABLE]
and the symmetric tridiagonal . Define . Then . Applying the symmetric Lanczos method to solving (71), at iteration we obtain the projected problem
[TABLE]
Write its solution as . Then the symmetric Lanczos method computes the approximation of .
Define the error and the residual of (71). Note that the initial residual . Then and
[TABLE]
from which and [19, Theorem 2.11] it follows that the square of -norm error satisfies
[TABLE]
As a result, we obtain
[TABLE]
Notice that the eigenvalues of are the exact eigenvalues of , which means that the smallest and largest eigenvalues of lie in . Since the symmetric Lanczos method is mathematically equivalent to the conjugate gradient method at the same iteration when the same initial guess on is used, applying a standard estimate (cf. [11, Theorem 3.1.1] and [19, Theorem 2.30]) to gives rise to
[TABLE]
Since , the squared initial error
[TABLE]
Exploiting , we obtain
[TABLE]
Substituting the above three relations into (72) yields (70).
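The equivalence invoked in this proof can be checked directly in Python (NumPy assumed). On a symmetric positive definite system with zero initial guess, the Galerkin iterates $Q_jT_j^{-1}(\|b\|e_1)$ from the Lanczos process coincide with the conjugate gradient iterates. Their energy-norm errors obey the standard bound $\|x_*-x_j\|_B\le 2\rho^j\|x_*\|_B$. The matrix $B$, the vector $b$ and the diagonal test data are hypothetical stand-ins for the system (71).

```python
import numpy as np

def cg_iterates(B, b, k):
    """k conjugate gradient steps for B x = b from x_0 = 0."""
    x, r = np.zeros_like(b), b.copy()
    p, out = r.copy(), []
    for _ in range(k):
        Bp = B @ p
        a = (r @ r) / (p @ Bp)
        x = x + a * p
        r_new = r - a * Bp
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
        out.append(x.copy())
    return out

def lanczos_iterates(B, b, k):
    """Galerkin iterates x_j = Q_j T_j^{-1} (||b|| e_1), j = 1..k (full reorthogonalization)."""
    n = len(b)
    Q, alpha, beta = np.zeros((n, k + 1)), np.zeros(k), np.zeros(k)
    Q[:, 0] = b / np.linalg.norm(b)
    out = []
    for j in range(k):
        w = B @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        for _ in range(2):
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
        T = np.diag(alpha[:j + 1]) + np.diag(beta[:j], 1) + np.diag(beta[:j], -1)
        rhs = np.zeros(j + 1)
        rhs[0] = np.linalg.norm(b)
        out.append(Q[:, :j + 1] @ np.linalg.solve(T, rhs))
    return out

# Hypothetical SPD stand-in for system (71): eigenvalues in [1, 50].
n, k = 100, 10
d = np.linspace(1.0, 50.0, n)
B = np.diag(d)
b = np.ones(n)
x_star = b / d

cg, lz = cg_iterates(B, b, k), lanczos_iterates(B, b, k)
max_gap = max(np.linalg.norm(u - v) / np.linalg.norm(x_star) for u, v in zip(cg, lz))

kappa = d[-1] / d[0]
rho = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
energy = lambda e: np.sqrt(e @ (B @ e))                      # B-norm of e
ratios = [energy(x_star - x) / energy(x_star) for x in lz]   # x_0 = 0, so e_0 = x_*
bounds = [2.0 * rho**j for j in range(1, k + 1)]
print(max_gap)
```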
Theorem 11.
Assume that the symmetric Lanczos process breaks down at iteration and for . Then for suitably large we have the asymptotic a-priori bound
[TABLE]
where the factors
[TABLE]
with .
Proof. From (21), we obtain
[TABLE]
and . Therefore, by (19) we have and
[TABLE]
By assumption and (19), we have
[TABLE]
with , and the eigenvalues are the exact eigenvalues of . Similarly to the above derivation, we obtain
[TABLE]
Subtracting the corresponding sides of (76) and (77) yields
[TABLE]
Since and (67) has proved that is nonnegative and tends to zero as increases, we must have , i.e., , for suitably large. Precisely, by (68), a sufficient condition is to choose such that
[TABLE]
Moreover, since , by a continuity argument, we have
[TABLE]
where the quantity in the right hand side has been shown by (72) to be strictly negative for all . Therefore, must become nonpositive for suitably large, that is, the first term in the right hand side of (78) becomes nonpositive as increases. As a result, from (78) we obtain the inequality
[TABLE]
when is suitably large.
Let us analyze . Since for suitably large, exploiting the series expansion of , we obtain
[TABLE]
Therefore, we have
[TABLE]
which is nonnegative provided that is suitably large. Substituting this relation into (79) and dropping the nonnegative higher-order small term in the resulting left-hand side gives rise to
[TABLE]
with and defined by (74) and (75), respectively, which proves (73).
Since is symmetric positive definite and its eigenvalues lie between and , the smallest and largest ones of , respectively, we have . As a result, from the forms of and , it is straightforward to obtain
[TABLE]
independent of iteration .
Relation (73) shows that bounding amounts to bounding and separately. We have established an a-priori bound (70) for the former. Now we investigate . Steihaug [27] has proved that the error of the optimal objective value decreases monotonically with respect to . Zhang et al. [31, Theorem 4.3] have given the following result. Starting from it, we derive a new a-priori bound for , whose proof is much shorter than those in [31].
Lemma 12 ([31]).
Suppose . Then
[TABLE]
for any nonzero .
Theorem 13.
Suppose . Then
[TABLE]
where .
Proof. Relation (81) has shown that
[TABLE]
By definition, we have
[TABLE]
where is the orthogonal projector onto . From the above relation and Lemma 5, it is immediate that
[TABLE]
Substituting it into (83) yields (82).
By comparison, bound (82) is as sharp as (4.24a) and (4.26a) in [31] but has a simpler form than the latter two, and its proof is also simpler.
Substituting bounds (82) and (70) into (73) leads to the following a-priori bound for .
Theorem 14.
Suppose . Then for suitably large we have
[TABLE]
with the factors and defined by (74) and (75), respectively.
This theorem clearly indicates that, apart from the bounded factor, converges at least as fast as . It also shows that bound (86) is much sharper than bound (68), being roughly the square of the latter.
5 A-priori bounds for $\sin\angle(s^{(k)},s_*)$ and the residual norm
Suppose that . Then and have unit length. It is worth noticing that the measures and are equivalent once they become fairly small. In fact, for fairly small we have
[TABLE]
It is seen from (46) and (10) that and are the same as and up to scaling, respectively. As a result, we have
[TABLE]
We take two steps to estimate . Firstly, we bound in terms of with and defined by (9) and (41), respectively. Secondly, we establish an a-priori bound for , showing how it converges to zero as increases. To this end, we need the following result [12, Lemma 2.3].
Lemma 15 ([12]).
Let $u=\begin{pmatrix}u_{1}\\ u_{2}\end{pmatrix}$ and $\tilde{u}=\begin{pmatrix}\tilde{u}_{1}\\ \tilde{u}_{2}\end{pmatrix}$, where , for , and . Then
[TABLE]
With this lemma, we can present the following bound.
Theorem 16.
For the unit length eigenvector of associated with the eigenvalue and defined by (41), we have
[TABLE]
Proof. From (10) and (46), since
[TABLE]
with the unit length vectors and , by definition (41) of and Lemma 15 we obtain
[TABLE]
Bound (89) indicates that how fast converges is determined by how fast tends to zero as increases. In what follows, we derive an a-priori bound for .
As has been seen, and are simple eigenpairs of and , respectively, and is the Ritz pair approximating the eigenpair of . Let be orthogonal. Then the columns of form an orthonormal basis of the orthogonal complement of the subspace spanned by . It follows from the relation that
[TABLE]
where and .
Because the right hand side of (91) is block triangular, the eigenvalues of consist of and the eigenvalues of . Since is simple, is nonsingular. The quantity
[TABLE]
is called the separation of and , and , the smallest singular value of [28].
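The block triangularization (91) and the separation can be illustrated numerically in Python (NumPy assumed). Completing a unit eigenvector $y$ to an orthogonal matrix $[y,Y_\perp]$ makes $Y_\perp^TMy$ vanish. Moreover, the smallest singular value of $C-\lambda I$, with $C=Y_\perp^TMY_\perp$, is positive and never exceeds the distance from $\lambda$ to the other eigenvalues. The augmented-matrix layout, the names and the random data are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, delta = 6, 1.0
Qr, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Qr @ np.diag([-2.0, -0.5, 1.0, 1.5, 3.0, 4.0]) @ Qr.T
A = 0.5 * (A + A.T)
g = rng.standard_normal(n)
# assumed block layout [[-A, g g^T / delta^2], [I, -A]]
M = np.block([[-A, np.outer(g, g) / delta**2], [np.eye(n), -A]])

w, V = np.linalg.eig(M)
j = np.argmax(w.real)                    # rightmost (real, simple) eigenvalue
lam = w[j].real
y = V[:, j]
y = (y / y[np.argmax(np.abs(y))]).real
y /= np.linalg.norm(y)

# complete y to an orthonormal basis [y, Y_perp] of R^{2n}
W, _ = np.linalg.qr(y[:, None], mode="complete")
Y_perp = W[:, 1:]
lower_left = Y_perp.T @ M @ y            # vanishes because M y = lam y
C = Y_perp.T @ M @ Y_perp                # carries the other 2n - 1 eigenvalues

sep = np.linalg.svd(C - lam * np.eye(2 * n - 1), compute_uv=False)[-1]  # sigma_min
gap = np.min(np.abs(np.linalg.eigvals(C) - lam))
print(sep, gap)
```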
Let the columns of be an orthonormal basis of the orthogonal complement of the subspace spanned by and be orthogonal. From (30) we have , from which it follows that
[TABLE]
where and . Note that the eigenvalues of are the Ritz values, other than , of with respect to the subspace defined by (28). As a result, by (29), is a simple eigenvalue of and . Since , and , we must have for suitably large.
In our notation, the following result is established in [17].
Lemma 17 ([17]).
With the previous notation, let , assume that . Then
[TABLE]
Combining (89) and (94) with (67) yields the following result immediately.
Theorem 18.
For the unit length eigenvector of associated with its rightmost eigenvalue , assume that . Then it holds that
[TABLE]
where
[TABLE]
This theorem indicates that converges to at least as fast as .
Finally, we establish a-priori bounds for the residual norm .
Theorem 19.
Suppose . Then for we have
[TABLE]
by dropping the higher order small term .
Proof. From (3), we have
[TABLE]
Therefore, from , , and , noting that , we obtain
[TABLE]
by dropping the higher order small term .
Keep (87) in mind. Substituting bound (86) for and bound (95) for , which is approximately equal to for sufficiently large, into (96), we obtain an approximate a-priori bound for . It shows that is dominated by and tends to zero at least as fast as . Since the resulting bound is not rigorous, we do not write it explicitly.
As a by-product, by exploiting some of the previous results, it is easy to establish an a-priori bound for , as shown below. With it, we will establish a rigorous a-priori bound for .
Theorem 20.
Suppose . Then
[TABLE]
where .
Proof. It follows from [31, Theorem 4.3] and (84) that
[TABLE]
where is the orthogonal projector onto . Therefore, (97) follows from the above relation and (85) directly.
This theorem is the same as (4.18b) in [31]. With it, by substituting bound (86) for and bound (97) for into (96), it is straightforward to obtain the following rigorous a-priori bound for .
Theorem 21.
Suppose , and let . Then for suitably large we have
[TABLE]
with the factors and defined by (74) and (75), respectively.
Clearly, the second term on the right-hand side of (98) soon dominates the bound as increases.
Summarizing the results of these two sections, we conclude that the convergence rates of and are the squares of , and . This means that, when the three errors and are reduced to about the same level, the convergence of and takes roughly half as many iterations as that of and .
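The statement that a squared convergence rate roughly halves the number of iterations can be made concrete with a simple model. Assume, purely for illustration, that one error decays like $C_1\rho^{k}$ and another like $C_2\rho^{2k}$ with $0<\rho<1$. The symbols $\rho$, $C_1$, $C_2$ and the tolerance $\tau$ are generic placeholders, not the paper's notation:

$$
C_1\rho^{k_1}=\tau \;\Longrightarrow\; k_1=\frac{\log(\tau/C_1)}{\log\rho},
\qquad
C_2\rho^{2k_2}=\tau \;\Longrightarrow\; k_2=\frac{\log(\tau/C_2)}{2\log\rho}.
$$

Hence $k_2/k_1\to 1/2$ as $\tau\to 0$, and $k_2=k_1/2$ exactly when $C_1=C_2$. For instance, with $\rho=0.9$, $\tau=10^{-8}$ and $C_1=C_2=1$, the first error needs $k_1=175$ iterations, while the second needs only $k_2=88$.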
6 Numerical examples
In this section, we compare the a-priori bounds derived in this paper with the four errors in the GLTR method: , , and , respectively. To give a full justification of our a-priori bounds, we test TRS’s with having different representative eigenvalue distributions and various condition numbers ’s.
All the experiments were performed on an Intel Core i7 CPU (3.6 GHz) with 8 GB RAM, using MATLAB R2017a under 64-bit Microsoft Windows 10.
Throughout this section, we always take and a fixed trust-region radius , and the vector is a unit-length vector generated by the MATLAB built-in function . Since the uncomputable tends to zero as increases, we take in the denominator of the bound in Theorem 18. We use the MATLAB functions eigs and svds with the stopping tolerance to compute , and , respectively, take them as the “exact” quantities, and then compute . To maintain the numerical orthogonality of the Lanczos basis vectors in finite precision arithmetic, we use the symmetric Lanczos process with complete reorthogonalization.
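Complete reorthogonalization explicitly orthogonalizes each new Lanczos vector against all previous ones. The Python/NumPy sketch below is an illustration, not the authors' MATLAB code. It implements the three-term recurrence with an optional full Gram–Schmidt pass, applied twice, and measures the loss of orthogonality $\|I-Q_k^TQ_k\|_2$. The function names and the log-spaced diagonal test matrix are hypothetical.

```python
import numpy as np


def lanczos_basis(A, b, k, reorth=True):
    """Return the n x k Lanczos basis of the Krylov subspace K_k(A, b).

    With reorth=True, each new vector is orthogonalized against all previous
    ones (complete reorthogonalization), applied twice for robustness.
    Assumes no breakdown (beta > 0) within k steps.
    """
    n = b.size
    Q = np.zeros((n, k))
    q, q_prev, beta = b / np.linalg.norm(b), np.zeros(n), 0.0
    for j in range(k):
        Q[:, j] = q
        w = A @ q - beta * q_prev  # three-term recurrence
        alpha = q @ w
        w -= alpha * q
        if reorth:
            for _ in range(2):
                w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        beta = np.linalg.norm(w)
        q_prev, q = q, w / beta
    return Q


def orthogonality_loss(Q):
    """Spectral norm of I - Q^T Q; zero in exact arithmetic."""
    return np.linalg.norm(np.eye(Q.shape[1]) - Q.T @ Q, 2)


# Hypothetical test matrix: diagonal with log-spaced eigenvalues in [1e-2, 1e2].
rng = np.random.default_rng(1)
A = np.diag(np.logspace(-2, 2, 200))
b = rng.standard_normal(200)

Q_full = lanczos_basis(A, b, 50, reorth=True)
Q_plain = lanczos_basis(A, b, 50, reorth=False)
print(orthogonality_loss(Q_full), orthogonality_loss(Q_plain))
```

In plain Lanczos, orthogonality is typically lost as Ritz values converge, so the second printed value is usually far larger than the first. The exact figure depends on the matrix and the starting vector.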
When assessing our a-priori bounds, note that they may often considerably overestimate the true errors, although in some cases the actual errors and their bounds become close to each other as increases. Moreover, no single kind of bound can be expected to be the sharpest in all cases. Possible overestimates are not surprising, since the bounds are worst-case bounds and the factors in front of or are the largest possible. Our aim is to give a-priori bounds that yield sharp estimates of the asymptotic convergence rates, even if the factors in front of the bounds are large.
Example 1. This example is randomly generated: the symmetric indefinite sparse matrix is generated by the MATLAB function
[TABLE]
where is a vector of ’s eigenvalues, and we take . We construct two ’s by taking two different ’s.
Example 1a. The elements of are evenly distributed in :
[TABLE]
Example 1b. We take the th element of as
[TABLE]
Therefore, the eigenvalues of lie in the union , and their magnitudes increase monotonically at the rate on each subinterval.
In Tables 1–2 and Figures 1–2, we list the results and compare the a-priori bounds with , , and , respectively.
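Outside MATLAB, a test matrix with a prescribed spectrum, as in Example 1, can be sketched by a random orthogonal similarity $A=Q\,\mathrm{diag}(v)\,Q^T$. This is a dense stand-in, assumed here for illustration, for the sparse matrices returned by MATLAB's `sprandsym(n, density, rc)` when `rc` is a vector of eigenvalues. The eigenvalue vector below is hypothetical and is not the one used in Example 1a or 1b.

```python
import numpy as np


def symmetric_with_spectrum(eigs, seed=0):
    """Dense symmetric matrix Q diag(eigs) Q^T with a random orthogonal Q.

    Has the prescribed eigenvalues up to rounding, but is not sparse.
    """
    rng = np.random.default_rng(seed)
    n = len(eigs)
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    Q *= np.sign(np.diag(R))  # fix column signs so Q is Haar-distributed
    return (Q * eigs) @ Q.T   # Q * eigs scales column j by eigs[j]


# Hypothetical indefinite spectrum: 500 evenly spaced values in [-1, 10].
eigs = np.linspace(-1.0, 10.0, 500)
A = symmetric_with_spectrum(eigs)
A = (A + A.T) / 2  # remove rounding-level asymmetry
print(np.allclose(np.linalg.eigvalsh(A), eigs))
```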
Example 2. We take to be diagonal with translated Chebyshev nodes on the diagonal. This problem was also tested in [31]. The zeros of the th Chebyshev polynomial in are given by
[TABLE]
Given an interval , the linear transformation
[TABLE]
maps to . The th translated Chebyshev zeros on are
[TABLE]
which decrease monotonically for and increase monotonically for , respectively, and cluster at and .
Figure 3 and Table 3 show the results.
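The translated Chebyshev nodes of Example 2 can be generated from the standard formula for the zeros of the degree-$n$ Chebyshev polynomial, $x_j=\cos\frac{(2j-1)\pi}{2n}$, $j=1,\dots,n$, followed by the affine map from $[-1,1]$ to $[a,b]$. This sketch assumes that the paper's transformation is this standard affine map; the values of $n$, $a$ and $b$ are hypothetical.

```python
import numpy as np


def translated_chebyshev_zeros(n, a, b):
    """Zeros of the degree-n Chebyshev polynomial mapped from [-1, 1] to [a, b]."""
    j = np.arange(1, n + 1)
    x = np.cos((2 * j - 1) * np.pi / (2 * n))  # zeros on [-1, 1], decreasing in j
    return (a + b) / 2 + (b - a) / 2 * x       # affine map [-1, 1] -> [a, b]


nodes = translated_chebyshev_zeros(1000, -1.0, 10.0)  # hypothetical n, a, b
A = np.diag(nodes)
gaps = np.abs(np.diff(nodes))
print(nodes.min(), nodes.max())
print(gaps[0], gaps[len(gaps) // 2])  # nodes cluster at the endpoints
```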
Example 3. We use the Strakoš matrix [19, p.16], which is used to test the behavior of the symmetric Lanczos method for the eigenvalue problem. The matrix is diagonal with the eigenvalues
[TABLE]
The parameter controls the eigenvalue distribution. The large eigenvalues of are well separated for . We take , and .
Figure 4 and Table 4 show the results.
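The Strakoš eigenvalues are commonly written as $\lambda_i=\lambda_1+\frac{i-1}{n-1}(\lambda_n-\lambda_1)\rho^{\,n-i}$, $i=1,\dots,n$, with $0<\rho\le 1$. The sketch below assumes that [19] uses this parameterization; the parameter values are hypothetical, not those of Example 3. It shows the effect of $\rho<1$: tiny gaps at the lower end of the spectrum and well-separated large eigenvalues.

```python
import numpy as np


def strakos_eigenvalues(n, lam1, lamn, rho):
    """lam_i = lam1 + (i-1)/(n-1) * (lamn - lam1) * rho**(n-i), i = 1..n."""
    i = np.arange(1, n + 1)
    return lam1 + (i - 1) / (n - 1) * (lamn - lam1) * rho ** (n - i)


lam = strakos_eigenvalues(100, 0.1, 100.0, 0.9)  # hypothetical parameters
A = np.diag(lam)
gaps = np.diff(lam)
print(lam[0], lam[-1])      # endpoints are lam1 and lamn
print(gaps[:3], gaps[-3:])  # tiny gaps at the bottom, large gaps at the top
```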
Example 4. We take
[TABLE]
with generated by and . The eigenvalues of are approximately normally distributed. Figure 5 and Table 5 give the results.
We observe from the figures and tables that, for all the test problems, (i) the corresponding bounds accurately predict the convergence rates of , , and , and (ii) the bounds are very close to the actual quantities in most cases, especially for and .
The tables and figures also indicate that (i) the errors and , as well as their bounds, need roughly half as many iterations as and and their bounds to reach approximately the same accuracy, and (ii) the condition number affects the convergence of the GLTR method: the larger is, the more iterations the method needs to reduce each of , , and to approximately the same level.
7 Conclusion
The GLTR method has received considerable attention, both theoretically and numerically. Some a-priori bounds have been obtained for and in the literature, but there has been no quantitative analysis or result on and . Starting with the mathematical equivalence of the solution of TRS (1) and the eigenvalue problem of the augmented matrix , we have established a-priori bounds for , , , and the residual norm . The results show how the three errors and the residual norm decrease as the subspace dimension increases. Numerical results have confirmed that our bounds are realistic and accurately predict the true convergence rates of the three errors and the residual norm in the GLTR method.
References
- [1] S. Adachi, S. Iwata, Y. Nakatsukasa, and A. Takeda, Solving the trust-region subproblem by a generalized eigenvalue problem, SIAM J. Optim., 27 (2017), pp. 269–291.
- [2] R. H. Byrd, R. B. Schnabel, and G. A. Shultz, Approximate solution of the trust region problem by minimization over two-dimensional subspaces, Math. Prog., 40 (1988), pp. 247–263.
- [3] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-region Methods, SIAM, Philadelphia, 2000.
- [4] R. El Attar, Special Functions and Orthogonal Polynomials, Lulu Press, USA, 2006.
- [5] J. B. Erway and P. E. Gill, A subspace minimization method for the trust-region step, SIAM J. Optim., 20 (2010), pp. 1439–1461.
- [6] J. B. Erway, P. E. Gill, and J. D. Griffin, Iterative methods for finding a trust-region step, SIAM J. Optim., 20 (2009), pp. 1110–1131.
- [7] C. Fortin and H. Wolkowicz, The trust region subproblem and semidefinite programming, Optim. Methods Softw., 19 (2004), pp. 41–67.
- [8] N. I. M. Gould, S. Lucidi, M. Roma, and P. L. Toint, Solving the trust-region subproblem using the Lanczos method, SIAM J. Optim., 9 (1999), pp. 504–525.
