The convergence of the Generalized Lanczos Trust-Region Method for the Trust-Region Subproblem

Zhongxiao Jia; Fa Wang

arXiv:1908.02094·math.NA·April 13, 2021

TL;DR

This paper develops a comprehensive convergence theory for the generalized Lanczos trust-region (GLTR) method, providing a-priori bounds for key solution errors and residuals in large-scale trust-region subproblems, validated by numerical experiments.

Contribution

It introduces the first a-priori convergence bounds for Lagrangian multipliers and residual norms in the GLTR method for large-scale trust-region subproblems.

Findings

01. Derived a-priori bounds for Lagrangian multiplier errors

02. Established convergence rates for residual norms

03. Numerical results confirm the bounds' accuracy

Abstract

Solving the trust-region subproblem (TRS) plays a key role in numerical optimization and many other applications. The generalized Lanczos trust-region (GLTR) method is a well-known Lanczos type approach for solving a large-scale TRS. The method projects the original large-scale TRS onto a $k$ dimensional Krylov subspace, whose orthonormal basis is generated by the symmetric Lanczos process, and computes an approximate solution from the underlying subspace. There have been some a-priori error bounds for the optimal solution and the optimal objective value in the literature, but no a-priori result exists on the convergence of Lagrangian multipliers involved in projected TRS's and the residual norm of approximate solution. In this paper, a general convergence theory of the GLTR method is established, and a-priori bounds are derived for the errors of the optimal Lagrangian multiplier, the…

Tables (5)

Table 1: Example 1a.

$\alpha_{1}$	$\alpha_{n}$	$\kappa$	$t$	$\lambda_{opt}$	$q(s_{opt})$
$2.0000$	$-2.0000$	$18.1481$	$0.6198$	$2.2333$	$-1.4770$

Table 2: Example 1b.

$\alpha_{1}$	$\alpha_{n}$	$\kappa$	$t$	$\lambda_{opt}$	$q(s_{opt})$
$2.7183$	$-2.7183$	$29.0828$	$0.6872$	$2.9119$	$-1.7907$

Table 3: Example 2.

$\alpha_{1}$	$\alpha_{n}$	$\kappa$	$t$	$\lambda_{opt}$	$q(s_{opt})$
$5.0000$	$-5.0000$	$34.9455$	$0.7106$	$5.2946$	$-2.9367$

Table 4: Example 3.

$\alpha_{1}$	$\alpha_{n}$	$\kappa$	$t$	$\lambda_{opt}$	$q(s_{opt})$
$8.0000$	$-2.0000$	$11.1518$	$0.5391$	$2.9850$	$-1.9893$

Table 5: Example 4.

$\alpha_{1}$	$\alpha_{n}$	$\kappa$	$t$	$\lambda_{opt}$	$q(s_{opt})$
$1.0000$	$-0.9997$	$6.9000$	$0.4485$	$1.3386$	$-1.1155$

Equations (296)

$\min_{\|s\|\leq\Delta} q(s)=g^{T}s+\frac{1}{2}s^{T}As,$

$\|s_{opt}\|$

$(A+\lambda_{opt}I)s_{opt}$

$\lambda_{opt}(\Delta-\|s_{opt}\|)$

$A+\lambda_{opt}I$

$\min_{\|s\|_{B}\leq\Delta} q(s),$

$A\leftarrow B^{-\frac{1}{2}}AB^{-\frac{1}{2}},\quad g\leftarrow B^{-\frac{1}{2}}g.$

$g\perp\mathcal{N}(A-\alpha_{n}I),$

$s_{opt}=-(A-\alpha_{n}I)^{\dagger}g+\eta u_{n},$

$\eta^{2}=\Delta^{2}-\|(A-\alpha_{n}I)^{\dagger}g\|^{2}\geq 0.$

$M=\begin{pmatrix}-A&\frac{gg^{T}}{\Delta^{2}}\\ I&-A\end{pmatrix}\in\mathbb{R}^{2n\times 2n}.$

$Re(\mu_{1})\geq Re(\mu_{2})\geq\cdots\geq Re(\mu_{2n}),$

$M\begin{pmatrix}y_{1}\\ y_{2}\end{pmatrix}=\mu_{1}\begin{pmatrix}y_{1}\\ y_{2}\end{pmatrix},\quad\left\|\begin{pmatrix}y_{1}\\ y_{2}\end{pmatrix}\right\|=1,$

$s_{opt}=-\frac{\Delta^{2}}{g^{T}y_{2}}y_{1}.$

$\min_{s\in\mathcal{S}_{k},\,\|s\|\leq\Delta} q(s),$

$\mathcal{S}_{k}=\mathcal{K}_{k}(g,A)\doteq span\{g,Ag,A^{2}g,\ldots,A^{k}g\}$

$AQ_{k}$

$Q_{k}^{T}g$

$g$

$T_{k}=Q_{k}^{T}AQ_{k}=\begin{pmatrix}\delta_{0}&\beta_{1}&&&\\ \beta_{1}&\delta_{1}&\ddots&&\\ &\ddots&\ddots&\ddots&\\ &&\ddots&\delta_{k-1}&\beta_{k}\\ &&&\beta_{k}&\delta_{k}\end{pmatrix}\in\mathbb{R}^{(k+1)\times(k+1)}$

$s=Q_{k}h\in\mathcal{S}_{k}.$

$\min_{s\in\mathcal{S}_{k},\,\|s\|\leq\Delta} q(s)=g^{T}s+\frac{1}{2}s^{T}As.$

$\min_{\|h\|\leq\Delta}\phi(h)=\beta_{0}e_{1}^{T}h+\frac{1}{2}h^{T}T_{k}h$

$\|h_{k}\|$

$(T_{k}+\lambda_{k}I)h_{k}$

$\lambda_{k}(\Delta-\|h_{k}\|)$

$T_{k}+\lambda_{k}I$

$\|(A+\lambda_{k}I)s_{k}+g\|=\beta_{k+1}|e_{k+1}^{T}h_{k}|,$

$0\leq\lambda_{0}\leq\lambda_{1}\leq\cdots\leq\lambda_{k_{\max}}=\lambda_{opt}.$

$M_{k}=\widetilde{Q}_{k}^{T}M\widetilde{Q}_{k}$

$\widetilde{Q}_{k}=\begin{pmatrix}Q_{k}&\\ &Q_{k}\end{pmatrix},$

$M_{k}=\begin{pmatrix}-T_{k}&\frac{\beta_{0}^{2}e_{1}e_{1}^{T}}{\Delta^{2}}\\ I&-T_{k}\end{pmatrix}$

$\widetilde{\mathcal{S}}_{k}=\begin{pmatrix}\mathcal{S}_{k}&0\\ 0&\mathcal{S}_{k}\end{pmatrix}\subset\mathbb{R}^{2n}.$

$Re(\mu_{1}^{(k)})$

Taxonomy

Topics: Matrix Theory and Algorithms · Advanced Optimization Algorithms Research · Numerical Methods and Algorithms

Full text

The convergence of the Generalized Lanczos Trust-Region Method for the Trust-Region Subproblem

This work was supported in part by the National Science Foundation of China (No. 11771249).

Zhongxiao Jia (corresponding author), Department of Mathematical Sciences, Tsinghua University, 100084 Beijing, China.

Fa Wang, Department of Mathematical Sciences, Tsinghua University, 100084 Beijing, China.

Abstract

Solving the trust-region subproblem (TRS) plays a key role in numerical optimization and many other applications. The generalized Lanczos trust-region (GLTR) method is a well-known Lanczos type approach for solving a large-scale TRS. The method projects the original large-scale TRS onto a $k$ dimensional Krylov subspace, whose orthonormal basis is generated by the symmetric Lanczos process, and computes an approximate solution from the underlying subspace. There have been some a-priori error bounds for the optimal solution and the optimal objective value in the literature, but no a-priori result exists on the convergence of Lagrangian multipliers involved in projected TRS’s and the residual norm of approximate solution. In this paper, a general convergence theory of the GLTR method is established, and a-priori bounds are derived for the errors of the optimal Lagrangian multiplier, the optimal solution, the optimal objective value and the residual norm of approximate solution. Numerical experiments demonstrate that our bounds are realistic and predict the convergence rates of the three errors and residual norms accurately.

keywords:

trust-region subproblem, GLTR method, a-priori bound, Lagrangian multiplier, Chebyshev polynomial, eigenvalue problem, symmetric Lanczos process, Krylov subspace

AMS:

90C20, 90C30, 65K05, 65F10

1 Introduction

Consider the solution of the trust-region subproblem (TRS)

$$\min_{\|s\|\leq\Delta} q(s)=g^{T}s+\frac{1}{2}s^{T}As, \tag{1}$$

where $A\in\mathbb{R}^{n\times n}$ is symmetric and nonsingular, $g\in\mathbb{R}^{n}$ is nonzero, $\Delta>0$ is the trust-region radius, and the norm $\|\cdot\|$ is the 2-norm of a matrix or vector. Problem (1) arises in nonlinear numerical optimization [3, 21], where $q(s)$ is a quadratic model of $\min f(s)$ at the current approximate solution, $A$ is the Hessian and $g$ is the gradient of $f$ at the current approximate solution. It also arises in many other applications, e.g., Tikhonov regularization of ill-posed problems [23, 24], graph partitioning problems [14], the constrained eigenvalue problem [10], and the Levenberg–Marquardt algorithm for solving nonlinear least squares problems [21].

The following results [3, 20] provide a theoretical basis for a TRS algorithm and give necessary and sufficient conditions, called the optimal conditions, for the solution of TRS (1).

Theorem 1.

A vector $s_{opt}$ is a solution to (1) if and only if there exists the optimal Lagrangian multiplier $\lambda_{opt}\geq 0$ such that

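$$\|s_{opt}\|\leq\Delta, \tag{2}$$
$$(A+\lambda_{opt}I)s_{opt}=-g, \tag{3}$$
$$\lambda_{opt}(\Delta-\|s_{opt}\|)=0, \tag{4}$$
$$A+\lambda_{opt}I\succeq 0, \tag{5}$$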

where $\|\cdot\|$ is the 2-norm of a matrix or vector, and the notation $\succeq 0$ indicates that a symmetric matrix is positive semidefinite.
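The optimality conditions of Theorem 1 can be checked numerically on a tiny problem. The following is a minimal sketch, not the authors' algorithm: it assumes a diagonal $A$ (so that $s(\lambda)=-(A+\lambda I)^{-1}g$ is available componentwise), the easy case with a boundary solution, and plain bisection on $\|s(\lambda)\|=\Delta$. The data `alpha`, `g`, `radius` and the function name `solve_diagonal_trs` are hypothetical.

```python
import math

def solve_diagonal_trs(alpha, g, radius):
    """Boundary solution of min g^T s + 0.5 s^T diag(alpha) s, ||s|| <= radius.

    Assumes the easy case with the solution on the boundary (true for the
    data below: diag(alpha) is indefinite and every g_i is nonzero).
    """
    def step(lam):
        # s(lam) = -(A + lam I)^{-1} g for diagonal A
        return [-gi / (ai + lam) for ai, gi in zip(alpha, g)]

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    lo = max(0.0, -min(alpha))      # A + lam I must be positive definite
    hi = lo + 1.0
    while norm(step(hi)) > radius:  # ||s(lam)|| decreases as lam grows
        hi = lo + 2.0 * (hi - lo)
    for _ in range(200):            # bisection on ||s(lam)|| = radius
        mid = 0.5 * (lo + hi)
        if norm(step(mid)) > radius:
            lo = mid
        else:
            hi = mid
    return hi, step(hi)

# Hypothetical data: indefinite diagonal A, generic g.
alpha = [2.0, 0.5, -1.0]
g = [1.0, 1.0, 1.0]
radius = 1.0

lam, s = solve_diagonal_trs(alpha, g, radius)
s_norm = math.sqrt(sum(x * x for x in s))
# Largest entry of (A + lam I) s + g, which should vanish.
stationarity = max(abs((a + lam) * si + gi) for a, si, gi in zip(alpha, s, g))
# lam * (Delta - ||s||), which should vanish.
complementarity = lam * (radius - s_norm)
print(lam, s_norm, stationarity, complementarity)
```

Here `lam` plays the role of $\lambda_{opt}$; since it exceeds $-\alpha_{n}=1$, the matrix $A+\lambda_{opt}I$ is positive definite and all conditions of the theorem hold up to rounding.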

TRS algorithms for solving (1) have been extensively studied for a few decades and can be classified into the following four categories, in which most of the algorithms in the first three categories are mentioned in [1].

• Accurate methods for dense problems. The Moré-Sorensen method [20] iteratively solves symmetric positive definite linear systems by Cholesky factorizations. It is highly efficient and accurate for small to medium sized dense problems.

• Accurate methods for large sparse problems. Algorithms in [23, 24, 26] iteratively compute the smallest eigenvalue of the matrix $(\begin{smallmatrix}\alpha&g^{T}\\ g&A\end{smallmatrix})$, where $\alpha$ is an adjustable parameter. Another approach due to [22] solves the TRS via semidefinite programming, and a modification of the Moré-Sorensen method using Taylor series is presented in [9]. The generalized Lanczos trust-region (GLTR) method [8] solves the TRS by a Lanczos type approach. Other accurate methods include subspace projection methods; see, e.g., [6, 13].

• Approximate methods. Steihaug and Toint independently propose a Truncated Conjugate Gradient (TCG) method [27, 29], and Yuan [30] proves that the function reduction obtained at the point produced by this method is at least half of that obtained at the function minimizer when the function $q(s)$ is convex, i.e., $A$ is symmetric positive definite. If $A$ is symmetric indefinite, an approximate solution must reach the trust-region boundary and TCG only solves (1) approximately.

• Eigenvalue based methods. The method due to Gander, Golub and von Matt [10] reduces TRS (1) to a single quadratic eigenvalue problem, which is linearized to a standard eigenvalue problem of size $2n$. Using a different derivation, Adachi et al. [1] extend the method in [10] to a more general TRS (6) and formulate it as a generalized eigenvalue problem of size $2n$. A solution to (1) can be determined by the rightmost eigenvalue and the associated eigenvector of the resulting $2n\times 2n$ matrix. The eigenvalue problem is solved by the QR algorithm when $A$ is of small or moderate size and by iterative projection methods when $A$ is large [25].

In applications, rather than simply using the 2-norm, some methods (see, e.g., [1, 8, 22, 26]) focus on the following more general TRS

$$\min_{\|s\|_{B}\leq\Delta} q(s), \tag{6}$$

where $B$ is symmetric positive definite and the norm $\|s\|_{B}=\sqrt{s^{T}Bs}$ . In light of [23], the matrix $B$ is often constructed to impose a smoothness condition on a solution to (6) for the ill-posed problem and to incorporate scaling of variables in optimization. For instance, it is argued in [3] that a good choice is $B=J^{-T}J^{-1}$ for some invertible matrix $J$ or the Hermitian polar factor [15] of $A$ .

Notice that the problem (6) is mathematically equivalent to a standard TRS (1) through the following substitutions

$$A\leftarrow B^{-\frac{1}{2}}AB^{-\frac{1}{2}},\quad g\leftarrow B^{-\frac{1}{2}}g.$$

Therefore, we assume that $B=I$ , the identity matrix, and just consider TRS (1) without loss of generality when considering the convergence of the GLTR method.
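The reduction to $B=I$ can be spelled out in one line of algebra. The following worked step is a sketch under the stated assumption that $B$ is symmetric positive definite; the auxiliary variable $\hat{s}=B^{1/2}s$ is introduced here for illustration and does not appear in the original text.

```latex
% Change of variables \hat{s} = B^{1/2} s, i.e. s = B^{-1/2} \hat{s}:
\|s\|_{B}^{2} = s^{T}Bs = \hat{s}^{T}\hat{s} = \|\hat{s}\|^{2},
\qquad
q(s) = g^{T}s + \tfrac{1}{2}s^{T}As
     = \bigl(B^{-\frac{1}{2}}g\bigr)^{T}\hat{s}
       + \tfrac{1}{2}\,\hat{s}^{T}\bigl(B^{-\frac{1}{2}}AB^{-\frac{1}{2}}\bigr)\hat{s}.
```

Hence minimizing $q$ over $\|s\|_{B}\leq\Delta$ is a standard TRS in $\hat{s}$ with the substituted $A$ and $g$, and a solution of the general problem is recovered as $s=B^{-1/2}\hat{s}$.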

The GLTR method and other projection methods avoid the high overhead of computing a series of Cholesky factorizations and have been shown to be efficient for a large-scale TRS; see, e.g., [2, 5, 8]. Let $s_{opt}$ be a solution to TRS (1) and $s_{k}$ be the approximate solution obtained by the GLTR method from the underlying $k+1$ dimensional Krylov subspace $\mathcal{K}_{k}(g,A)=span\{g,Ag,\ldots,A^{k}g\}$. By Theorem 1, there is an optimal Lagrangian multiplier $\lambda_{k}$ for each TRS projected onto $\mathcal{K}_{k}(g,A)$. There are then four central convergence problems: how fast the three errors $|\lambda_{opt}-\lambda_{k}|$, $\|s_{k}-s_{opt}\|$, $q(s_{k})-q(s_{opt})$ and the residual norm $\|(A+\lambda_{k}I)s_{k}+g\|$ of the approximate solution $(\lambda_{k},s_{k})$ of (3) decrease as $k$ increases. Regarding $\|s_{k}-s_{opt}\|$ and $q(s_{k})-q(s_{opt})$, some a-priori bounds have been derived in [31]. However, for $|\lambda_{opt}-\lambda_{k}|$ and $\|(A+\lambda_{k}I)s_{k}+g\|$, there have been no a-priori bounds that show how they converge to zero as $k$ increases. The only known result on $\lambda_{k}$ is that $\lambda_{k}$ increases monotonically with $k$ and is bounded from above by $\lambda_{opt}$ [18]. Therefore, we always have $|\lambda_{opt}-\lambda_{k}|=\lambda_{opt}-\lambda_{k}\geq 0$. The residual norm is important in both theory and practice, as it is computable and its size is commonly used to measure the convergence of the GLTR method. We mention that a mixed bound is given for $|\lambda_{opt}-\lambda_{k}|$ in [32, Lemma 3.4]. However, it is easy to check that the mixed bound in [32] does not exhibit any decreasing tendency and can never be small unless the symmetric Lanczos process breaks down, in which case the bound is trivially zero.

Remarkably, it has recently been shown that, under certain mild conditions, solving (1) is mathematically equivalent to solving a certain matrix eigenvalue problem of size $2n$ [1]. This equivalence provides a new approach to solving (1) efficiently. In particular, it makes clear that, at iteration $k$, the GLTR method amounts to solving an eigenvalue problem of size $2(k+1)$ obtained by projecting the $2n\times 2n$ matrix eigenvalue problem onto a special $2(k+1)$ dimensional subspace of $\mathbb{R}^{2n}$ constructed from the $\mathcal{K}_{k}(g,A)$ used in the GLTR method. From this projected eigenvalue problem, unlike in the GLTR method itself, one can simultaneously obtain the optimal $\lambda_{k}$ and the solution $s_{k}$ to the projected TRS at iteration $k$. This key observation is our starting point for studying the convergence of the GLTR method. Note that we are mainly concerned with $\sin\angle(s_{k},s_{opt})$ rather than the error $\|s_{k}-s_{opt}\|$. The sine is a standard measure of the error between an eigenvector and its approximations in the context of the matrix eigenvalue problem [28]. The authors of [1] also measure the error between $s_{k}$ and $s_{opt}$ by the sine of the angle $\angle(s_{k},s_{opt})$ in their experiments.

The contributions of this paper are, in order: two a-priori bounds for $\lambda_{opt}-\lambda_{k}$, established for the first time; a bound for $\sin\angle(s_{k},s_{opt})$; bounds for the residual norm $\|(A+\lambda_{k}I)s_{k}+g\|$, also established for the first time; and a new sharp bound for $q(s_{k})-q(s_{opt})$. The bound for $q(s_{k})-q(s_{opt})$ differs from the two presented in [31], and its proof is simpler than those in [31]. The first a-priori bound for $\lambda_{opt}-\lambda_{k}$, though a considerable overestimate, is the basis for establishing the second, much sharper one. With the bounds for $\lambda_{opt}-\lambda_{k}$ and $\sin\angle(s_{k},s_{opt})$ or $\|s_{k}-s_{opt}\|$, we are able to derive a-priori bounds for $\|(A+\lambda_{k}I)s_{k}+g\|$. When establishing the first a-priori bound for $\lambda_{opt}-\lambda_{k}$ and the a-priori bound for $\sin\angle(s_{k},s_{opt})$, we need to solve the problem of the best uniform polynomial approximation to the rational function $\frac{1}{(x-\eta)^{2}}$ with $x\in[-1,1]$ and $\eta>1$. We exploit a generating function of $\frac{1}{(x-\eta)^{2}}$ with Chebyshev polynomials of the second kind [4] to handle this best uniform approximation problem by obtaining a suboptimal approximating polynomial. Numerical results demonstrate that our a-priori bounds predict the convergence rates of the three errors and residual norms and estimate their values accurately.

This paper is organized as follows. In section 2, we give some preliminaries and introduce the equivalence of the solution of (1) and a certain $2n\times 2n$ matrix eigenvalue problem. We review the GLTR method in section 3. Section 4 is devoted to a-priori bounds for $\lambda_{opt}-\lambda_{k}$ and $q(s_{k})-q(s_{opt})$ . A-priori bounds for $\sin\angle(s_{k},s_{opt})$ and $\|(A+\lambda_{k}I)s_{k}+g\|$ are presented in section 5. In section 6, we report numerical experiments to confirm that our bounds estimate the convergence rates and behavior of the GLTR method accurately. Finally, we conclude the paper in section 7.

Throughout this paper, denote by the superscript $T$ the transpose of a matrix or vector, by $\|\cdot\|$ the 2-norm of a matrix or vector, by $I$ the identity matrix with order clear from the context, and by $e_{i}$ the $i$ th column of $I$ . All vectors are column vectors and are typeset in lower case letters.

2 Preliminaries

2.1 A solution to TRS (1)

Suppose that $A=S\Lambda S^{T}$ is the eigendecomposition of $A$ , where $S$ is orthogonal and $\Lambda=diag(\alpha_{1},\alpha_{2},\ldots,\alpha_{n})$ with the $\alpha_{i}$ being the eigenvalues of $A$ labeled as $\alpha_{1}\geq\alpha_{2}\geq\cdots\geq\alpha_{n}$ .

If $A+\lambda_{opt}I\succ 0$ , then the solution $s_{opt}$ to TRS (1) is unique and $s_{opt}=-(A+\lambda_{opt}I)^{-1}g$ . If (1) has no solution $s_{opt}$ with $\|s_{opt}\|=\Delta$ , then $A$ is positive definite and $s_{opt}=-A^{-1}g$ with $\|s_{opt}\|<\Delta$ and $\lambda_{opt}=0$ . All these correspond to the so-called “easy case” [3, 8, 20, 21] or “nondegenerate case” [13].

If $A$ is indefinite and

$$g\perp\mathcal{N}(A-\alpha_{n}I),$$

the null space of $A-\alpha_{n}I$ , then we have the following definition [3, 8, 21].

Definition 2 (Hard Case).

The solution of TRS (1) is a hard case if $g$ is orthogonal to the eigenspace corresponding to the eigenvalue $\alpha_{n}$ of $A$ and the optimal Lagrangian multiplier is $\lambda_{opt}=-\alpha_{n}$ .

The hard case is also called the “degenerate case” [13]. In this case, (1) may have multiple optimal solutions [21, p.87-88], which can be characterized as

$$s_{opt}=-(A-\alpha_{n}I)^{\dagger}g+\eta u_{n},$$

where $u_{n}\in\mathcal{N}(A-\alpha_{n}I)$ and $\|u_{n}\|=1$ , $\|(A-\alpha_{n}I)^{\dagger}g\|\leq\Delta$ , and the superscript ${\dagger}$ denotes the Moore-Penrose generalized inverse. $s_{opt}$ with $\|s_{opt}\|=\Delta$ is unique if and only if $\alpha_{n}$ is a simple eigenvalue of $A$ and the scalar $\eta$ satisfies

$$\eta^{2}=\Delta^{2}-\|(A-\alpha_{n}I)^{\dagger}g\|^{2}\geq 0.$$

As we can see, in the hard case, we not only need to solve a singular system but also need to compute the eigenspace of $A$ associated with the smallest eigenvalue $\alpha_{n}$ . The hard case has been studied for years; see, e.g., [7, 8, 20, 21, 22]. An eigensolver is proposed in [1] to detect and handle the hard case theoretically and numerically.
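A two-dimensional instance makes the hard-case characterization concrete. The sketch below uses hypothetical data ($A=diag(1,-1)$, $g=(1,0)^{T}$, $\Delta=2$), chosen so that $g$ is orthogonal to the eigenvector of $\alpha_{n}=-1$; the pseudo-inverse is formed entrywise, which is valid only because $A$ is diagonal here.

```python
import math

# Hypothetical hard-case data: g has no component along the eigenvector
# e_2 of the smallest eigenvalue alpha_n = -1.
alpha = [1.0, -1.0]
g = [1.0, 0.0]
radius = 2.0

alpha_n = min(alpha)
lam_opt = -alpha_n                      # hard case: lambda_opt = -alpha_n

# (A - alpha_n I)^+ g for diagonal A: invert the nonzero diagonal entries only.
pinv_g = [gi / (ai - alpha_n) if ai != alpha_n else 0.0
          for ai, gi in zip(alpha, g)]
eta = math.sqrt(radius ** 2 - sum(x * x for x in pinv_g))
u_n = [0.0, 1.0]                        # unit eigenvector for alpha_n

def q(s):
    """Quadratic model g^T s + 0.5 s^T A s for diagonal A."""
    return (sum(gi * si for gi, si in zip(g, s))
            + 0.5 * sum(ai * si * si for ai, si in zip(alpha, s)))

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Both signs of eta give a point on the boundary with the same objective.
s_plus = [-p + eta * u for p, u in zip(pinv_g, u_n)]
s_minus = [-p - eta * u for p, u in zip(pinv_g, u_n)]
# Largest entry of (A + lam_opt I) s + g, which should vanish.
stationarity = max(abs((a + lam_opt) * si + gi)
                   for a, si, gi in zip(alpha, s_plus, g))
print(s_plus, s_minus, q(s_plus), q(s_minus))
```

Both `s_plus` and `s_minus` have norm $\Delta$ and satisfy $(A+\lambda_{opt}I)s=-g$ with the singular matrix $A+\lambda_{opt}I$, which illustrates why the hard case can have more than one optimal solution.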

As has been addressed in [3], the hard case rarely occurs in practice, as it requires that both $A$ be indefinite and $g$ be orthogonal to $\mathcal{N}(A-\alpha_{n}I)$ . In the sequel, we are only concerned with the easy case.

2.2 The equivalence of the TRS and a matrix eigenvalue problem

Adachi et al. [1] prove that TRS (6) can be treated by solving a certain generalized eigenvalue problem of order $2n$ . For $B=I$ , the generalized eigenvalue problem in [1] reduces to the standard eigenvalue problem of the augmented matrix

$$M=\begin{pmatrix}-A&\frac{gg^{T}}{\Delta^{2}}\\ I&-A\end{pmatrix}\in\mathbb{R}^{2n\times 2n}. \tag{7}$$

Let $\mu_{1},\mu_{2},\ldots,\mu_{2n}$ be the eigenvalues of $M$ labeled as

$$Re(\mu_{1})\geq Re(\mu_{2})\geq\cdots\geq Re(\mu_{2n}), \tag{8}$$

where $Re(\cdot)$ is the real part of a scalar. The following result in [1] establishes a key relationship between the TRS solution and the eigenpair of $M$ .

Theorem 3 ([1]).

Let $(\lambda_{opt},s_{opt})$ satisfy Theorem 1 with $\|s_{opt}\|=\Delta$. Then the rightmost eigenvalue $\mu_{1}$ of $M$ is real and simple, and $\mu_{1}=\lambda_{opt}$. Let $y=(y_{1}^{T},y_{2}^{T})^{T}$ be the unit length eigenvector of $M$ associated with the eigenvalue $\mu_{1}$, i.e.,

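$$M\begin{pmatrix}y_{1}\\ y_{2}\end{pmatrix}=\mu_{1}\begin{pmatrix}y_{1}\\ y_{2}\end{pmatrix},\quad\left\|\begin{pmatrix}y_{1}\\ y_{2}\end{pmatrix}\right\|=1, \tag{9}$$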

and suppose that $g^{T}y_{2}\neq 0$ . Then the unique TRS solution is

$$s_{opt}=-\frac{\Delta^{2}}{g^{T}y_{2}}y_{1}. \tag{10}$$

Remark 2.1.

Adachi et al. [1] have proved that $g^{T}y_{2}=0$ corresponds to the hard case, i.e., $\lambda_{opt}=-\alpha_{n}$ and $g\perp\mathcal{N}(A-\alpha_{n}I).$ Therefore, in the easy case, $g^{T}y_{2}\neq 0$ is guaranteed, and (10) holds.
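Theorem 3 translates directly into a small dense computation. The following is a minimal sketch using NumPy on a hypothetical $6\times 6$ test problem; it forms the $2n\times 2n$ matrix $M$ explicitly and calls a dense eigensolver, which is only sensible for small $n$ and is not how a large-scale solver would proceed. The function name `trs_via_eigenproblem` and the test data are assumptions for illustration.

```python
import numpy as np

def trs_via_eigenproblem(A, g, radius):
    """Recover (lambda_opt, s_opt) from the rightmost eigenpair of M.

    Assumes the easy case with ||s_opt|| = radius, so that g^T y_2 != 0.
    """
    n = g.size
    M = np.block([[-A, np.outer(g, g) / radius**2],
                  [np.eye(n), -A]])
    mu, Y = np.linalg.eig(M)
    i = np.argmax(mu.real)             # rightmost eigenvalue (real, simple)
    lam = mu[i].real
    y = Y[:, i].real
    y1, y2 = y[:n], y[n:]
    s = -(radius**2 / (g @ y2)) * y1   # the formula is scale invariant in y
    return lam, s

# Hypothetical indefinite test problem with known spectrum.
rng = np.random.default_rng(0)
n = 6
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag([2.0, 1.0, 0.5, -0.5, -1.0, -2.0]) @ U.T
A = 0.5 * (A + A.T)                    # symmetrize against rounding
g = U @ np.ones(n)                     # nonzero component on every eigenvector
radius = 1.0

lam, s = trs_via_eigenproblem(A, g, radius)
residual = np.linalg.norm((A + lam * np.eye(n)) @ s + g)
print(lam, np.linalg.norm(s), residual)
```

Since the smallest eigenvalue of this $A$ is $-2$, the computed multiplier must exceed $2$, and the recovered $s$ should lie on the boundary with a residual at rounding level.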

3 The generalized Lanczos trust-region (GLTR) method [8]

When TRS (1) is large, an effective approach is to iteratively solve a sequence of smaller projected problems

$$\min_{s\in\mathcal{S}_{k},\,\|s\|\leq\Delta} q(s), \tag{11}$$

where $\mathcal{S}_{k}\subset\mathbb{R}^{n}$ is some specially chosen $k+1$ dimensional subspace, and we use the solution $s_{k}$ to TRS (11) to approximate $s_{opt}$ .

The most commonly used $\mathcal{S}_{k}$ is the $k+1$ dimensional Krylov subspace

$$\mathcal{S}_{k}=\mathcal{K}_{k}(g,A)\doteq span\{g,Ag,A^{2}g,\ldots,A^{k}g\} \tag{12}$$

generated by $g$ and $A$ . The GLTR method starts with the TCG method [27, 29]. When $A$ is positive definite and $\|A^{-1}g\|\leq\Delta$ , which corresponds to $\lambda_{opt}=0$ , the method returns a converged approximate solution $s_{k}$ to $s_{opt}=-A^{-1}g$ . In this case, the convergence theory of the standard conjugate gradient method is directly applicable. The GLTR method switches to the Lanczos method to accurately solve the projected problem (11) whenever a negative curvature is present or the solution norm by the TCG method exceeds the trust-region radius $\Delta$ , which corresponds to an indefinite $A$ or $\lambda_{opt}>0$ . It proceeds in such a way until $s_{k}$ converges to $s_{opt}$ .

In the sequel, without loss of generality we always assume that the TCG method does not solve (11) exactly and one must use the Lanczos method starting from the first iteration, so as to compute the solution $s_{k}$ to (11) with $\|s_{k}\|=\Delta$ , meaning that $\lambda_{k}>0$ for $k=0,1,\ldots$ .

In the following, we describe the GLTR method. At iteration $k$ , mathematically, the GLTR method exploits the symmetric Lanczos process to generate an orthonormal basis $\{q_{i}\}_{i=0}^{k}$ of $\mathcal{S}_{k}$ defined by (12), which can be written in matrix form

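$$AQ_{k}=Q_{k}T_{k}+\beta_{k+1}q_{k+1}e_{k+1}^{T}, \tag{13}$$
$$Q_{k}^{T}g=\beta_{0}e_{1},\quad\beta_{0}=\|g\|, \tag{14}$$
$$g=\beta_{0}q_{0}, \tag{15}$$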

where $Q_{k}=(q_{0},q_{1},\ldots,q_{k})$ is orthonormal and the matrix

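$$T_{k}=Q_{k}^{T}AQ_{k}=\begin{pmatrix}\delta_{0}&\beta_{1}&&&\\ \beta_{1}&\delta_{1}&\ddots&&\\ &\ddots&\ddots&\ddots&\\ &&\ddots&\delta_{k-1}&\beta_{k}\\ &&&\beta_{k}&\delta_{k}\end{pmatrix}\in\mathbb{R}^{(k+1)\times(k+1)} \tag{16}$$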

is symmetric tridiagonal, which is called the orthogonal projection matrix of $A$ onto $\mathcal{S}_{k}$ in the orthonormal basis $\{q_{i}\}_{i=0}^{k}$ .

We shall consider vectors of form

$$s=Q_{k}h\in\mathcal{S}_{k}. \tag{17}$$

Let $s_{k}=Q_{k}h_{k}$ solve the projected problem

$$\min_{s\in\mathcal{S}_{k},\,\|s\|\leq\Delta} q(s)=g^{T}s+\frac{1}{2}s^{T}As. \tag{18}$$

It then follows from (17) and the Lanczos process that $h_{k}$ solves the reduced TRS

$$\min_{\|h\|\leq\Delta}\phi(h)=\beta_{0}e_{1}^{T}h+\frac{1}{2}h^{T}T_{k}h \tag{19}$$

and $q(s_{k})=\phi(h_{k})$ .

From Theorem 1, the vector $h_{k}$ is a solution to (19) if and only if there exists the optimal Lagrangian multiplier $\lambda_{k}\geq 0$ such that

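$$\|h_{k}\|\leq\Delta, \tag{20}$$
$$(T_{k}+\lambda_{k}I)h_{k}=-\beta_{0}e_{1}, \tag{21}$$
$$\lambda_{k}(\Delta-\|h_{k}\|)=0, \tag{22}$$
$$T_{k}+\lambda_{k}I\succeq 0. \tag{23}$$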

As $T_{k}$ is tridiagonal, we can use the Moré-Sorensen method to efficiently solve (18) even if $n$ is large and then obtain $s_{k}$ from $s_{k}=Q_{k}h_{k}$. The resulting method is the GLTR method for solving (1). It has been shown in [1] that TRS (19) is always in the easy case provided that the symmetric Lanczos process does not break down at iteration $k$. Under the assumption that $\|s_{k}\|=\|h_{k}\|=\Delta$, this means that we always have $\lambda_{k}>0$ for all $k\leq k_{\max}$, where $k_{\max}$ is the first iteration at which the symmetric Lanczos process breaks down, i.e., $\beta_{k_{\max}+1}=0$.

The authors of [8] prove that the residual norm of $\lambda_{k}$ and $s_{k}$ as approximate solutions of (3) satisfies

$$\|(A+\lambda_{k}I)s_{k}+g\|=\beta_{k+1}|e_{k+1}^{T}h_{k}|, \tag{24}$$

from which it is known that if the symmetric Lanczos process breaks down at iteration $k_{\max}$ for the first time, then $s_{k_{\max}}=s_{opt}$ and $\lambda_{k_{\max}}=\lambda_{opt}$ . This result indicates that we can efficiently measure the residual norm by exploiting the last entry of $h_{k}$ without explicitly forming $s_{k}=Q_{k}h_{k}$ before a prescribed convergence tolerance is achieved.
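The residual identity can be observed on a small example. The sketch below is a simplified stand-in for the GLTR method, not the implementation of [8]: it runs the symmetric Lanczos process with full reorthogonalization, solves the reduced TRS through a dense eigendecomposition of $T_{k}$ and bisection (instead of the Moré-Sorensen method), and compares both sides of the identity. The function names and the test problem are hypothetical, and the Lanczos relation $AQ_{k}=Q_{k}T_{k}+\beta_{k+1}q_{k+1}e_{k+1}^{T}$ is the standard one assumed in the code.

```python
import numpy as np

def lanczos(A, g, k):
    """Orthonormal basis Q_k of K_k(g, A), tridiagonal T_k, beta_0, beta_{k+1}."""
    n = g.size
    Q = np.zeros((n, k + 2))
    delta = np.zeros(k + 1)
    beta = np.zeros(k + 2)
    beta[0] = np.linalg.norm(g)
    Q[:, 0] = g / beta[0]
    for j in range(k + 1):
        w = A @ Q[:, j]
        delta[j] = Q[:, j] @ w
        for _ in range(2):             # full reorthogonalization, done twice
            w = w - Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        beta[j + 1] = np.linalg.norm(w)  # assumes no breakdown (beta > 0)
        Q[:, j + 1] = w / beta[j + 1]
    off = beta[1:k + 1]
    T = np.diag(delta) + np.diag(off, 1) + np.diag(off, -1)
    return Q[:, :k + 1], T, beta[0], beta[k + 1]

def solve_reduced_trs(T, beta0, radius):
    """Solve min beta0 e_1^T h + 0.5 h^T T h, ||h|| <= radius (easy case)."""
    theta, V = np.linalg.eigh(T)
    c = beta0 * V[0, :]                # V^T (beta0 e_1)
    h_of = lambda lam: -V @ (c / (theta + lam))
    if theta[0] > 0 and np.linalg.norm(h_of(0.0)) <= radius:
        return 0.0, h_of(0.0)          # interior solution, lambda = 0
    lo = max(0.0, -theta[0])
    hi = lo + 1.0
    while np.linalg.norm(h_of(hi)) > radius:
        hi = lo + 2.0 * (hi - lo)
    for _ in range(200):               # bisection on ||h(lam)|| = radius
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(h_of(mid)) > radius:
            lo = mid
        else:
            hi = mid
    return hi, h_of(hi)

# Hypothetical indefinite test problem.
rng = np.random.default_rng(0)
n, k, radius = 8, 3, 1.0
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.linspace(-2.0, 2.0, n)) @ U.T
A = 0.5 * (A + A.T)
g = U @ np.arange(1.0, n + 1.0)

Q, T, beta0, beta_next = lanczos(A, g, k)
lam_k, h_k = solve_reduced_trs(T, beta0, radius)
s_k = Q @ h_k
lhs = np.linalg.norm((A + lam_k * np.eye(n)) @ s_k + g)  # explicit residual
rhs = beta_next * abs(h_k[-1])                            # beta_{k+1} |e_{k+1}^T h_k|
print(lam_k, lhs, rhs)
```

The right-hand side needs only the last entry of $h_{k}$ and the next Lanczos coefficient, so the residual can be monitored without forming $s_{k}$, which is the practical point made in the text.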

In the next two sections we consider the convergence of the GLTR method and establish a-priori bounds for the errors $\lambda_{opt}-\lambda_{k}$, $q(s_{k})-q(s_{opt})$, $\sin\angle(s_{k},s_{opt})$ and the residual norm $\|(A+\lambda_{k}I)s_{k}+g\|$. We will prove how they decrease as $k$ increases. We point out that, instead of $\|s_{k}-s_{opt}\|$, which is the quantity studied in [31, 32], we consider the error $\sin\angle(s_{k},s_{opt})$.

4 A-priori bounds for $\lambda_{opt}-\lambda_{k}$ and $q(s_{k})-q(s_{opt})$

We establish a-priori bounds for $\lambda_{opt}-\lambda_{k}$ in this section. It is known from [18] that $\lambda_{k}$ increases monotonically with $k$ and is bounded from above by $\lambda_{opt}$ . Precisely, suppose that the symmetric Lanczos process breaks down at some $k_{\max}\leq n-1$ . Then for $k\leq k_{\max}$ it holds that

$$0\leq\lambda_{0}\leq\lambda_{1}\leq\cdots\leq\lambda_{k_{\max}}=\lambda_{opt}. \tag{25}$$

Under the assumption that $\|s_{k}\|=\|h_{k}\|=\Delta$ , we have $\lambda_{k}>0$ for $k=0,1,\ldots,k_{\max}$ , but there has been no quantitative result on how fast $\lambda_{k}$ converges to $\lambda_{opt}$ .

Define the $2(k+1)\times 2(k+1)$ matrix

$$M_{k}=\widetilde{Q}_{k}^{T}M\widetilde{Q}_{k} \tag{26}$$

with $M$ defined by (7) and

$$\widetilde{Q}_{k}=\begin{pmatrix}Q_{k}&\\ &Q_{k}\end{pmatrix},$$

with the columns of the orthonormal $Q_{k}$ defined by (13). It is straightforward that

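$$M_{k}=\begin{pmatrix}-T_{k}&\frac{\beta_{0}^{2}e_{1}e_{1}^{T}}{\Delta^{2}}\\ I&-T_{k}\end{pmatrix} \tag{27}$$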

with $T_{k}$ defined by (16) and $\beta_{0}=\|g\|$ .

Obviously, $\widetilde{Q}_{k}$ is column orthonormal, and its columns span the $2(k+1)$ dimensional subspace

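$$\widetilde{\mathcal{S}}_{k}=\begin{pmatrix}\mathcal{S}_{k}&0\\ 0&\mathcal{S}_{k}\end{pmatrix}\subset\mathbb{R}^{2n}.$$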

Therefore, $M_{k}$ is the orthogonal projection matrix of $M$ onto $\widetilde{\mathcal{S}}_{k}$ in the orthonormal basis $\{(q_{i}^{T},0)^{T}\}_{i=0}^{k}$ and $\{(0,q_{i}^{T})^{T}\}_{i=0}^{k}$ .
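The block structure of the projected matrix and the monotone behavior of its rightmost eigenvalue can both be observed numerically. The following is a minimal sketch on a hypothetical $8\times 8$ problem: the orthonormal bases of the nested Krylov subspaces are taken from a QR factorization of the Krylov matrix (a shortcut that is adequate only for such a tiny example), and the radius is chosen small enough that every projected solution lies on the boundary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, radius = 8, 0.1
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.linspace(-2.0, 2.0, n)) @ U.T
A = 0.5 * (A + A.T)
g = U @ np.ones(n) / np.sqrt(n)        # unit-norm g, hypothetical data

def rightmost(B):
    """Largest real part among the eigenvalues of B."""
    return np.max(np.linalg.eigvals(B).real)

M = np.block([[-A, np.outer(g, g) / radius**2],
              [np.eye(n), -A]])
lam_opt = rightmost(M)

# Columns 0..k of Q_full form an orthonormal basis of K_k(g, A).
K = np.column_stack([np.linalg.matrix_power(A, j) @ g for j in range(n)])
Q_full, _ = np.linalg.qr(K)

lams, structure_ok = [], []
for k in range(n):
    Q = Q_full[:, :k + 1]
    T = Q.T @ A @ Q
    Z = np.zeros((n, k + 1))
    Q_tilde = np.block([[Q, Z], [Z, Q]])
    M_k = Q_tilde.T @ M @ Q_tilde
    # Expected block form: [[-T_k, beta_0^2 e_1 e_1^T / Delta^2], [I, -T_k]].
    E = np.zeros((k + 1, k + 1))
    E[0, 0] = (g @ g) / radius**2
    M_k_block = np.block([[-T, E], [np.eye(k + 1), -T]])
    structure_ok.append(np.allclose(M_k, M_k_block))
    lams.append(rightmost(M_k))        # plays the role of lambda_k

print(lam_opt)
print(lams)
```

In this run the printed values should increase towards `lam_opt` and agree with it at $k=n-1$, where the subspace is all of $\mathbb{R}^{n}$; how fast the gap closes is exactly what the bounds of this section quantify.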

Let $\mu^{(k)}_{i},\,i=1,2,\ldots,2(k+1)$ , be the eigenvalues of $M_{k}$ , which, similarly to (8), are labeled as

$$Re(\mu_{1}^{(k)})\geq Re(\mu_{2}^{(k)})\geq\cdots\geq Re(\mu_{2(k+1)}^{(k)}). \tag{28}$$

From Theorem 3 it is known that

$$\mu_{1}^{(k)}=\lambda_{k} \tag{29}$$

is real and simple.

Let $z^{(k)}=\left(\begin{array}[]{c}z^{(k)}_{1}\\ z^{(k)}_{2}\\ \end{array}\right)$ be the unit length eigenvector of $M_{k}$ associated with $\mu_{1}^{(k)}$ , i.e.,

[TABLE]

Then the vector

[TABLE]

is the Ritz vector of $M$ from the subspace $\widetilde{\mathcal{S}}_{k}$ and approximates the unit length eigenvector $y=(y_{1}^{T},y_{2}^{T})^{T}$ of $M$ associated with its rightmost real eigenvalue $\mu_{1}=\lambda_{opt}$.

From the structure (27) of $M_{k}$ and the definition (30) of $z^{(k)}$ , it is easy to show that

[TABLE]

is the left eigenvector of $M_{k}$ corresponding to the real simple eigenvalue $\mu_{1}^{(k)}=\lambda_{k}$, and from (30) it is straightforward to verify that

[TABLE]

Therefore, by definition (cf. [28, p.186]), the spectral condition number of $\mu_{1}^{(k)}$ is

[TABLE]

Similarly, by the structure (7) of $M$ and the definition (9) of $y$ , the vector $(y_{2}^{T},y_{1}^{T})^{T}$ is the left eigenvector of $M$ associated with the eigenvalue $\mu_{1}$ . As a result, the spectral condition number of $\mu_{1}$ is

[TABLE]

By Theorem 3, the unique solution $h_{k}$ to (19) is

[TABLE]

and the unique solution $s_{k}$ to TRS (18) is

[TABLE]

Denote by $\angle(u,\mathcal{S}_{k})$ the acute angle between a nonzero vector $u$ and $\mathcal{S}_{k}$ . Then

[TABLE]

where $\pi_{k}$ is the orthogonal projector onto $\mathcal{S}_{k}$ . In terms of Theorem 3 and (29), we have

[TABLE]

where $\mu_{1}$ is the rightmost eigenvalue of $M$ .

Let $\widetilde{\pi}_{k}=\widetilde{Q}_{k}\widetilde{Q}_{k}^{T}$ be the orthogonal projector onto $\widetilde{\mathcal{S}}_{k}$ . Then $\widetilde{\pi}_{k}M\widetilde{\pi}_{k}$ is the restriction of $M$ to the subspace $\widetilde{\mathcal{S}}_{k}$ and its matrix representation is $M_{k}$ in the orthonormal basis $\{(q_{i}^{T},0)^{T}\}_{i=0}^{k}$ and $\{(0,q_{i}^{T})^{T}\}_{i=0}^{k}$ . The eigenvalues of $\widetilde{\pi}_{k}M\widetilde{\pi}_{k}$ restricted to $\widetilde{\mathcal{S}}_{k}$ are the eigenvalues of $M_{k}$ , and the eigenvectors are the Ritz vectors of $M$ from $\widetilde{\mathcal{S}}_{k}$ ; see [25] for details. Therefore, a direct application of Theorem 3.8 in [16] to our context gives the following result.

Lemma 4.

Let $\mu^{(k)}_{1}=\lambda_{k}$ and $\mu_{1}=\lambda_{opt}$ be the rightmost eigenvalues of $M_{k}$ and $M$ , respectively, and suppose that $\|s_{opt}\|=\|s_{k}\|=\Delta$ . Then for $\sin\angle(y,\widetilde{\mathcal{S}}_{k})$ small it holds that

[TABLE]

where $s(\lambda_{k})$ is defined by (43) and $\widetilde{\gamma}_{k}=\|\widetilde{\pi}_{k}M(I-\widetilde{\pi}_{k})\|$. (Footnote 1: Theorem 3.8 of [16] has $\sin\angle(y,\widetilde{\mathcal{S}}_{k})$ in place of the $\tan\angle(y,\widetilde{\mathcal{S}}_{k})$ in the right-hand side of (49), but the sine and tangent can clearly replace each other there when $\sin\angle(y,\widetilde{\mathcal{S}}_{k})$ is small.)

From (41) and (43), we obtain

[TABLE]

which converges to $s(\lambda_{opt})$ defined by (44) when $y^{(k)}\rightarrow y$. This is indeed the case, as will be shown in the next section. Moreover, $\widetilde{\gamma}_{k}\leq\|M\|$. As a result, by this lemma, analyzing the convergence of $\lambda_{k}$ to $\lambda_{opt}$ reduces to analyzing how fast $\sin\angle(y,\widetilde{\mathcal{S}}_{k})$ decreases as $k$ increases.

Notice that

[TABLE]

Therefore, in order to bound $\lambda_{opt}-\lambda_{k}$ and to show how it converges to zero as $k$ increases, we need to analyze $\|(I-\pi_{k})y_{1}\|$ and $\|(I-\pi_{k})y_{2}\|$ separately.

We first consider $\|(I-\pi_{k})y_{1}\|$ . Throughout the paper, we denote by $\bar{P}_{k}$ the set of polynomials of degree not exceeding $k+1$ . We first present the following result.

Lemma 5.

The distance $\|(I-\pi_{k})s_{opt}\|$ between $s_{opt}$ and $\mathcal{S}_{k}=\mathcal{K}_{k}(g,A)$ satisfies

[TABLE]

and

[TABLE]

where

[TABLE]

with $\alpha_{1}\geq\alpha_{2}\geq\cdots\geq\alpha_{n}$ being the eigenvalues of $A$. Moreover,

[TABLE]

where $\kappa=\frac{\alpha_{1}+\lambda_{opt}}{\alpha_{n}+\lambda_{opt}}$ is the condition number of $A+\lambda_{opt}I$ .

Proof. Theorem 1 has shown that $s_{opt}$ satisfies the linear system $(A+\lambda_{opt}I)s_{opt}=-g$. Therefore, exploiting the shift invariance $\mathcal{K}_{k}(g,A)=\mathcal{K}_{k}(g,A+\lambda_{opt}I)$ and the eigendecomposition $A=S\Lambda S^{T}$, we have

[TABLE]

with the polynomial $p_{k}(\lambda)=1+\lambda q(\lambda)\in\bar{P}_{k}$ and $p_{k}(0)=1$ .

Note that $A+\lambda_{opt}I$ is symmetric positive definite. Applying a standard estimate (cf. [11, p.51, Theorem 3.1.1]) to $\epsilon_{1}^{(k)}$, we obtain (54).

Relation (10) shows that $y_{1}$ is the same as $s_{opt}$ up to a scaling. Therefore, replacing $s_{opt}$ in (51) and (52) by $y_{1}$ and exploiting (54), we have established the following upper bound for $\|(I-\pi_{k})y_{1}\|$ .

Theorem 6.

Let $y=(y_{1}^{T},y_{2}^{T})^{T}$ be the unit length eigenvector of $M$ associated with its rightmost eigenvalue $\mu_{1}$. Then

[TABLE]

where $\kappa=\frac{\alpha_{1}+\lambda_{opt}}{\alpha_{n}+\lambda_{opt}}$ .

As it will turn out, an estimation of $\|(I-\pi_{k})y_{2}\|$ is much more involved.

Theorem 7.

With the previous notation, we have

[TABLE]

where $\alpha_{1}$ and $\alpha_{n}$ are the largest and smallest eigenvalues of $A$ , and

[TABLE]

with

[TABLE]

where $\kappa=\frac{\alpha_{1}+\lambda_{opt}}{\alpha_{n}+\lambda_{opt}}$ .

Proof. Recall that $A=S\Lambda S^{T}$ is the eigendecomposition of $A$ , where $S$ is orthogonal and $\Lambda=diag(\alpha_{1},\alpha_{2},\ldots,\alpha_{n})$ with $\alpha_{1}\geq\alpha_{2}\geq\cdots\geq\alpha_{n}$ the eigenvalues.

From $(A+\lambda_{opt}I)s_{opt}=-g$ and (10), we obtain

[TABLE]

From (9), we have

[TABLE]

Making use of $\mathcal{K}_{k}(g,A)=\mathcal{K}_{k}(g,A+\lambda_{opt}I)$ , (59) and the orthogonality of $S$ , we then obtain

[TABLE]

Consider the variable transformation

[TABLE]

which maps $x\in[-1,1]$ to $z\in[\alpha_{n},\alpha_{1}]$ in one-to-one correspondence. Then

[TABLE]

$\epsilon_{2}^{(k)}$ is the error of the best uniform polynomial approximation from $\bar{P}_{k-1}$ to the rational function $\frac{1}{(x-\eta)^{2}}$ over the interval $[-1,1]$ with $\eta>1$ . To the best of our knowledge, no explicit solution to this approximation problem is known. However, recall from (50) that $\sin\angle(y,\widetilde{\mathcal{S}}_{k})\geq\|(I-\pi_{k})y_{1}\|$ . Therefore, it is enough to prove that $\epsilon_{2}^{(k)}$ is of the same order as bound (55) because this means that $\sin\angle(y,\widetilde{\mathcal{S}}_{k})$ is at least as small as bound (55) for $\|(I-\pi_{k})y_{1}\|$ . To this end, exploiting Chebyshev polynomials of the second kind and one of their fundamental properties, we will establish a desired bound for $\epsilon_{2}^{(k)}$ , which is indeed as small as bound (55).

Theorem 8.

The approximation error

[TABLE]

and

[TABLE]

where $t=\eta-\sqrt{\eta^{2}-1}$ and $\kappa=\frac{\alpha_{1}+\lambda_{opt}}{\alpha_{n}+\lambda_{opt}}$ .

Proof. For any $t\in(-1,1)$ and $x\in[-1,1]$ there is the following generating function [4, p.215]:

[TABLE]

where $U_{j}(x)=\sin(j\arccos x)$ is the $j$ th degree Chebyshev polynomial of the second kind [4, p.212].

For $t=\eta-\sqrt{\eta^{2}-1}$ , it is easily justified that $1+t^{2}=2\eta t$ . Therefore, the identity (63) becomes

[TABLE]

from which it follows that

[TABLE]
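For completeness, the identity $1+t^{2}=2\eta t$ used above follows from a one-line expansion, spelled out here under the definition $t=\eta-\sqrt{\eta^{2}-1}$ with $\eta>1$:

$$
t^{2}=2\eta^{2}-1-2\eta\sqrt{\eta^{2}-1}
\quad\Longrightarrow\quad
1+t^{2}=2\eta\left(\eta-\sqrt{\eta^{2}-1}\right)=2\eta t .
$$

Equivalently, $t$ is the root in $(0,1)$ of $t^{2}-2\eta t+1=0$, the other root being $1/t=\eta+\sqrt{\eta^{2}-1}$.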

Taking the $k$ th degree polynomial

[TABLE]

and noting that $-\ln t=|\ln t|$ for $0<t<1$ and $|U_{j}(x)|\leq 1$ for $x\in[-1,1]$ , we have

[TABLE]

From (58), it is straightforward to justify that

[TABLE]

Therefore, from (56), (60) and (65) it follows that (61) and (62) hold.

Combining Lemma 4, (50), Theorem 6 and Theorem 8, by a simple manipulation, we achieve the following bounds for $\sin\angle(y,\widetilde{\mathcal{S}}_{k})$ and $\lambda_{opt}-\lambda_{k}$ .

Theorem 9.

Suppose that $\|s_{opt}\|=\|s_{k}\|=\Delta$ . Then

[TABLE]

and asymptotically

[TABLE]

where

[TABLE]

$\widetilde{\gamma}_{k}=\|\widetilde{\pi}_{k}M(I-\widetilde{\pi}_{k})\|$ with $\widetilde{\pi}_{k}$ the orthogonal projector onto $\widetilde{\mathcal{S}}_{k}$ defined by (28), and $s(\lambda_{k})$ and $t$ are defined by (43) and (66).

A-priori bound (68), for the first time, proves that $\lambda_{opt}-\lambda_{k}$ converges to zero as $k$ increases. As a matter of fact, based on this bound, we can further establish a much sharper bound for $\lambda_{opt}-\lambda_{k}$ . Before proceeding, we first derive the following result, which will play a key role in establishing the sharper a-priori bound for $\lambda_{opt}-\lambda_{k}$ .

Theorem 10.

For $k=0,1,\ldots,k_{max}$ , the following a-priori bound holds:

[TABLE]

where $\kappa=\frac{\alpha_{1}+\lambda_{opt}}{\alpha_{n}+\lambda_{opt}}$ and $\beta_{0}=\|g\|$ .

Proof. Consider the symmetric positive definite linear system

[TABLE]

with $\beta_{0}=\|g\|$ , which is (21) for $k=k_{\max}$ and has the solution $h_{k_{\max}}$ . When taking $e_{1}$ as the starting vector, i.e., taking the zero vector as an initial guess to $h_{k_{\max}}$ , the symmetric Lanczos process generates an orthonormal basis $\{e_{i}\}_{i=1}^{k+1}$ of the $(k+1)$ dimensional Krylov subspace

[TABLE]

and the symmetric tridiagonal matrix $T_{k}+\lambda_{opt}I$ . Define $E_{k}=(e_{1},e_{2},\ldots,e_{k+1})$ . Then $T_{k}+\lambda_{opt}I=E_{k}^{T}(T_{k_{\max}}+\lambda_{opt}I)E_{k}$ . Applying the symmetric Lanczos method to solve (71), at iteration $k\leq k_{\max}$ we obtain the projected problem

[TABLE]

Write its solution as $\tilde{y}_{k}$ . Then the symmetric Lanczos method computes the approximation $\tilde{h}_{k}=E_{k}\tilde{y}_{k}$ of $h_{k_{\max}}$ .

Define the error $\varepsilon_{k}=h_{k_{\max}}-\tilde{h}_{k}$ and the residual $r_{k}=-\beta_{0}e_{1}-(T_{k_{\max}}+\lambda_{opt}I)\tilde{h}_{k}$ of (71). Note that the initial residual $r_{0}=-\beta_{0}e_{1}$ . Then $\|r_{0}\|^{2}=\beta_{0}^{2}$ and

[TABLE]

from which and [19, Theorem 2.11] it follows that the square of $(T_{k_{\max}}+\lambda_{opt}I)$ -norm error satisfies

[TABLE]

As a result, we obtain

[TABLE]

Notice that the eigenvalues of $T_{k_{\max}}$ are the exact eigenvalues of $A$ , which means that the smallest and largest eigenvalues of $T_{k_{\max}}+\lambda_{opt}I$ lie in $[\alpha_{n}+\lambda_{opt},\alpha_{1}+\lambda_{opt}]$ . Since the symmetric Lanczos method is mathematically equivalent to the conjugate gradient method at the same iteration when the same initial guess on $h_{k_{\max}}$ is used, applying a standard estimate (cf. [11, Theorem 3.1.1] and [19, Theorem 2.30]) to $\|\varepsilon_{k}\|_{(T_{k_{\max}}+\lambda_{opt}I)}^{2}$ gives rise to

[TABLE]

Since $r_{0}=-\beta_{0}e_{1}$ , the squared initial error is

[TABLE]

Exploiting $\beta_{0}\|({T_{k_{\max}}+\lambda_{opt}I})^{-1}e_{1}\|=\|h_{k_{\max}}\|=\Delta$ , we obtain

[TABLE]

Substituting the above three relations into (72) yields (70).
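The equivalence with the conjugate gradient (CG) method used in this proof can be illustrated numerically. The sketch below is a toy under stated assumptions: a hypothetical diagonal symmetric positive definite matrix with spectrum in $[1,18]$ stands in for $T_{k_{\max}}+\lambda_{opt}I$, the right-hand side is a vector of ones, and the name `cg_energy_errors` is illustrative. It runs plain CG from the zero initial guess and compares the energy-norm errors after $m$ steps with the classical bound $2\big(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\big)^{m}$ times the initial energy-norm error.

```python
import math

def cg_energy_errors(diag, b, steps):
    """Run CG on diag(diag) x = b from x = 0; return the energy-norm errors
    after 0, 1, ..., steps iterations (fewer if the residual vanishes early)."""
    n = len(diag)
    x_true = [b[i] / diag[i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                      # residual b - A x for the zero initial guess
    p = list(r)                      # first search direction
    rr = sum(v * v for v in r)
    errors = []
    for m in range(steps + 1):
        e = [x_true[i] - x[i] for i in range(n)]
        errors.append(math.sqrt(sum(diag[i] * e[i] ** 2 for i in range(n))))
        if m == steps or rr == 0.0:
            break
        Ap = [diag[i] * p[i] for i in range(n)]
        alpha = rr / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rr_new = sum(v * v for v in r)
        p = [r[i] + (rr_new / rr) * p[i] for i in range(n)]
        rr = rr_new
    return errors

n = 50
diag = [1.0 + 17.0 * i / (n - 1) for i in range(n)]   # eigenvalues in [1, 18]
b = [1.0] * n
kappa = diag[-1] / diag[0]
t = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
errors = cg_energy_errors(diag, b, 15)
bounds = [2.0 * t ** m * errors[0] for m in range(len(errors))]
```

Comparing `errors` with `bounds` entry by entry shows the worst-case character of the estimate: the bound depends only on $\kappa$, whereas the actual CG error also benefits from the distribution of the eigenvalues.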

Theorem 11.

Assume that the symmetric Lanczos process breaks down at iteration $k_{\max}$ and $\|s_{opt}\|=\|s_{k}\|=\Delta$ for $k=0,1,\ldots,k_{\max}$ . Then for $k$ suitably large we have the asymptotic a-priori bound

[TABLE]

where the factors

[TABLE]

with $\beta_{0}=\|g\|$ .

Proof. From (21), we obtain

[TABLE]

and $\|h_{k}\|=\beta_{0}\|(T_{k}+\lambda_{k}I)^{-1}e_{1}\|=\Delta$ . Therefore, by (19) we have $q(s_{k})=\phi(h_{k})$ and

[TABLE]

By assumption and (19), we have

[TABLE]

with $\|h_{k_{\max}}\|=\Delta$ , and the eigenvalues of $T_{k_{\max}}$ are the exact eigenvalues of $A$ . Similarly to the above derivation, we obtain

[TABLE]

Subtracting the corresponding sides of (76) and (77) yields

[TABLE]

Since $\|(T_{k}+\lambda_{opt}I)^{-1}\|\leq\frac{1}{\alpha_{n}+\lambda_{opt}}$ and (68) has proved that $\lambda_{opt}-\lambda_{k}$ is nonnegative and tends to zero as $k$ increases, we must have $(\lambda_{opt}-\lambda_{k})\|(T_{k}+\lambda_{opt}I)^{-1}\|<1$ , which holds whenever $\lambda_{opt}-\lambda_{k}<\alpha_{n}+\lambda_{opt}$ , for $k$ suitably large. Precisely, by (68), a sufficient condition is to choose $k$ such that

[TABLE]

Moreover, since $\lambda_{k}\rightarrow\lambda_{opt}$ , by continuity argument, we have

[TABLE]

where the quantity in the right hand side has been shown by (72) to be strictly negative for all $k=0,1,\ldots,k_{\max}-1$ . Therefore, $e_{1}^{T}(T_{k}+\lambda_{k}I)^{-1}e_{1}-e_{1}^{T}(T_{k_{\max}}+\lambda_{opt}I)^{-1}e_{1}$ must become nonpositive for $k$ suitably large, that is, the first term in the right hand side of (78) becomes nonpositive as $k$ increases. As a result, from (78) we obtain the inequality

[TABLE]

when $k$ is suitably large.

Let us analyze $e_{1}^{T}(T_{k}+\lambda_{k}I)^{-1}e_{1}$ . Since $(\lambda_{opt}-\lambda_{k})\|(T_{k}+\lambda_{opt}I)^{-1}\|<1$ for $k$ suitably large, exploiting the series expansion of $\left((I-(\lambda_{opt}-\lambda_{k})(T_{k}+\lambda_{opt}I)^{-1})\right)^{-1}$ , we obtain

[TABLE]
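The expansion can be spelled out as follows, where $\delta_{k}=\lambda_{opt}-\lambda_{k}$ and $B_{k}=(T_{k}+\lambda_{opt}I)^{-1}$ are shorthand introduced only for this illustration. Since $T_{k}+\lambda_{k}I=(T_{k}+\lambda_{opt}I)(I-\delta_{k}B_{k})$ and $\delta_{k}\|B_{k}\|<1$, the Neumann series gives

$$
(T_{k}+\lambda_{k}I)^{-1}=(I-\delta_{k}B_{k})^{-1}B_{k}=\sum_{j=0}^{\infty}\delta_{k}^{\,j}B_{k}^{\,j+1},
\qquad
e_{1}^{T}(T_{k}+\lambda_{k}I)^{-1}e_{1}=e_{1}^{T}B_{k}e_{1}+\delta_{k}\,e_{1}^{T}B_{k}^{2}e_{1}+\mathcal{O}(\delta_{k}^{2}).
$$

Every term of the series contributes a nonnegative amount to the quadratic form $e_{1}^{T}(\cdot)e_{1}$ because $B_{k}$ is symmetric positive definite and $\delta_{k}\geq 0$.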

Therefore, we have

[TABLE]

which is nonnegative provided that $k$ is suitably large. Substituting this relation into (79) and dropping the nonnegative higher-order small term $\mathcal{O}((\lambda_{opt}-\lambda_{k})^{2})$ in the resulting left-hand side give rise to

[TABLE]

with $\eta_{k1}$ and $\eta_{k2}$ defined by (74) and (75), respectively, which proves (73).

Since $T_{k}+\lambda_{opt}I$ is symmetric positive definite and its eigenvalues lie between $\alpha_{n}+\lambda_{opt}$ and $\alpha_{1}+\lambda_{opt}$ , the smallest and largest ones of $A+\lambda_{opt}I$ , respectively, we have $\frac{1}{(\alpha_{1}+\lambda_{opt})^{2}}\leq e_{1}^{T}(T_{k}+\lambda_{opt}I)^{-2}e_{1}\leq\frac{1}{(\alpha_{n}+\lambda_{opt})^{2}}$ . As a result, from the forms of $\eta_{k1}$ and $\eta_{k2}$ , it is straightforward to obtain

[TABLE]

independent of iteration $k$ .

Relation (73) shows that bounding $\lambda_{opt}-\lambda_{k}$ amounts to bounding $e_{1}^{T}(T_{k_{\max}}+\lambda_{opt}I)^{-1}e_{1}-e_{1}^{T}(T_{k}+\lambda_{opt}I)^{-1}e_{1}$ and $q(s_{k})-q(s_{opt})$ separately. We have established an a-priori bound (70) for the former one. Now we investigate $q(s_{k})-q(s_{opt})$ . Steihaug [27] has proved that the error $q(s_{k})-q(s_{opt})$ of the optimal objective value monotonically decreases with respect to $k$ . Zhang et al. [31, Theorem 4.3] have given the following result. Starting with it, we can derive a new a-priori bound for $q(s_{k})-q(s_{opt})$ , whose proof is much shorter than those in [31].

Lemma 12 ([31]).

Suppose $\|s_{opt}\|=\|s_{k}\|=\Delta$ . Then

[TABLE]

for any nonzero $\tilde{s}\in\mathcal{K}_{k}(g,A)$ .

Theorem 13.

Suppose $\|s_{opt}\|=\|s_{k}\|=\Delta$ . Then

[TABLE]

where $\kappa=\frac{\alpha_{1}+\lambda_{opt}}{\alpha_{n}+\lambda_{opt}}$ .

Proof. Relation (81) has shown that

[TABLE]

By definition, we have

[TABLE]

where $\pi_{k}$ is the orthogonal projector onto $\mathcal{K}_{k}(g,A)$ . From the above relation and Lemma 5, it is immediate that

[TABLE]

Substituting it into (83) yields (82).

By a comparison, we find that bound (82) is as sharp as (4.24a) and (4.26a) in [31] but has a simpler form than the latter two, and its proof is also simpler.

Substituting bound (82) for $q(s_{k})-q(s_{opt})$ and bound (70) into (73) ultimately leads to the following a-priori bound for $\lambda_{opt}-\lambda_{k}$ .

Theorem 14.

Suppose $\|s_{opt}\|=\|s_{k}\|=\Delta$ . Then for $k$ suitably large we have

[TABLE]

with the factors $\eta_{k1}$ and $\eta_{k2}$ defined by (74) and (75), respectively.

This theorem clearly indicates that, except for the bounded factor, $\lambda_{opt}-\lambda_{k}$ converges at least as fast as $\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{2(k+1)}$ , and bound (86) is much sharper than bound (68) and is roughly the square of the latter.

5 A-priori bounds for $\sin\angle(s_{k},s_{opt})$ and $\|(A+\lambda_{k}I)s_{k}+g\|$

Suppose that $\|s_{opt}\|=\|s_{k}\|=\Delta$ . Then $s_{k}/\|s_{opt}\|$ and $s_{opt}/\|s_{opt}\|$ have unit length. It is worthwhile to notice that the measures $\sin\angle(s_{k},s_{opt})$ and $\|s_{k}-s_{opt}\|/\|s_{opt}\|$ are equivalent once they start to become fairly small. In fact, for $\angle(s_{k},s_{opt})$ fairly small we have

[TABLE]
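A quick numerical illustration of this equivalence, as a sketch with hypothetical two-dimensional vectors: for two vectors of equal norm enclosing the angle $\theta$, the relative distance is exactly $2\sin(\theta/2)$, which agrees with $\sin\theta$ up to $\mathcal{O}(\theta^{3})$.

```python
import math

def compare_measures(theta, delta=1.0):
    """Return (sin(theta), ||s_k - s_opt|| / ||s_opt||) for two vectors of norm delta."""
    s_opt = (delta, 0.0)
    s_k = (delta * math.cos(theta), delta * math.sin(theta))
    dist = math.hypot(s_k[0] - s_opt[0], s_k[1] - s_opt[1])
    return math.sin(theta), dist / delta

# One row per angle: (theta, sine of the angle, relative distance).
rows = [(theta,) + compare_measures(theta) for theta in (1e-1, 1e-2, 1e-3)]
```

The gap between the two measures shrinks like $\theta^{3}$, so once the angle is small either quantity can be used to monitor convergence.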

It is seen from (46) and (10) that $s_{k}$ and $s_{opt}$ are the same as $y^{(k)}_{1}$ and $y_{1}$ up to scaling, respectively. As a result, we have

[TABLE]

We take two steps to estimate $\sin\angle(s_{k},s_{opt})$ . Firstly, we bound $\sin\angle(y_{1}^{(k)},y_{1})$ in terms of $\sin\angle(y^{(k)},y)$ with $y$ and $y^{(k)}$ defined by (9) and (41), respectively. Secondly, we establish an a-priori bound for $\sin\angle(y^{(k)},y)$ , showing how it converges to zero as $k$ increases. To this end, we need the following result [12, Lemma 2.3].

Lemma 15 ([12]).

Let $u=\left(\begin{array}[]{c}u_{1}\\ u_{2}\\ \end{array}\right)$ and $\tilde{u}=\left(\begin{array}[]{c}\tilde{u}_{1}\\ \tilde{u}_{2}\\ \end{array}\right)$ where $u_{i}$ , $\tilde{u}_{i}\in\mathbb{C}^{n}$ for $i=1,2$ , and $\|u_{1}\|=\|\tilde{u}_{1}\|=1$ . Then

[TABLE]

With this lemma, we can present the following bound.

Theorem 16.

For the unit length eigenvector $y=(y_{1}^{T},y_{2}^{T})^{T}$ of $M$ associated with the eigenvalue $\lambda_{opt}$ and $y^{(k)}$ defined by (41), we have

[TABLE]

Proof. From (10) and (46), since

[TABLE]

with the unit length vectors $y_{1}^{(k)}/\|y_{1}^{(k)}\|$ and $y_{1}/\|y_{1}\|$ , by definition (41) of $y^{(k)}$ and Lemma 15 we obtain

[TABLE]

Bound (89) indicates that the convergence speed of $\sin\angle(s_{k},s_{opt})$ is governed by how fast $\sin\angle(y^{(k)},y)$ tends to zero as $k$ increases. In what follows, we derive an a-priori bound for $\sin\angle(y^{(k)},y)$ .

As has been seen, $(\mu_{1},y)$ and $(\mu_{1}^{(k)},z^{(k)})$ are simple eigenpairs of $M$ and $M_{k}$ , respectively, and $(\mu_{1}^{(k)},y^{(k)})$ is the Ritz pair approximating the eigenpair $(\mu_{1},y)$ of $M$ . Let $(y,Y_{\perp})$ be orthogonal. Then the columns of $Y_{\perp}$ form an orthonormal basis of the orthogonal complement of the subspace spanned by $y$ . It follows from the relation $My=\mu_{1}y$ that

[TABLE]

where $f^{T}=y^{T}MY_{\perp}$ and $L=Y_{\perp}^{T}MY_{\perp}$ .

Because the right hand side of (91) is block triangular, the eigenvalues of $M$ consist of $\mu_{1}$ and the eigenvalues of $L$ . Since $\mu_{1}$ is simple, $L-\mu_{1}I$ is nonsingular. The quantity

[TABLE]

is called the separation of $\mu_{1}$ and $L$ , and $sep(\mu_{1},L)=\sigma_{\min}(L-\mu_{1}I)$ , the smallest singular value of $L-\mu_{1}I$ [28].

Let the columns of $Z_{\perp}^{(k)}$ be an orthonormal basis of the orthogonal complement of the subspace spanned by $z^{(k)}$ and $(z^{(k)},Z_{\perp}^{(k)})$ be orthogonal. From (30) we have $M_{k}z^{(k)}=\mu_{1}^{(k)}z^{(k)}$ , from which it follows that

[TABLE]

where $f_{k}^{T}=(z^{(k)})^{T}M_{k}Z_{\perp}^{(k)}$ and $C_{k}=(Z_{\perp}^{(k)})^{T}M_{k}Z_{\perp}^{(k)}$ . Note that the eigenvalues of $C_{k}$ are the Ritz values of $M$ with respect to the subspace $\widetilde{\mathcal{S}}_{k}$ defined by (28), other than $\mu_{1}^{(k)}$ . As a result, by (29), $\mu_{1}^{(k)}$ is a simple eigenvalue of $M_{k}$ and $sep(\mu_{1}^{(k)},C_{k})>0$ . Since $\mu_{1}-\mu_{1}^{(k)}=\lambda_{opt}-\lambda_{k}\geq 0$ , $\lambda_{k}\rightarrow\lambda_{opt}$ and $sep(\mu_{1},C_{k})\geq sep(\mu_{1}^{(k)},C_{k})-|\mu_{1}-\mu_{1}^{(k)}|$ , we must have $sep(\mu_{1},C_{k})>0$ for $k$ suitably large.

In our notation, the following result is established in [17].

Lemma 17 ([17]).

With the previous notation, let $\varepsilon_{k}=\sin\angle(y,\widetilde{\mathcal{S}}_{k})$ and assume that $sep(\mu_{1},C_{k})>0$ . Then

[TABLE]

Combining (89) and (94) with (67) yields the following result immediately.

Theorem 18.

For the unit length eigenvector $y=(y_{1}^{T},y_{2}^{T})^{T}$ of $M$ associated with its rightmost eigenvalue $\mu_{1}$ , assume that $sep(\mu_{1},C_{k})>0$ . Then it holds that

[TABLE]

where $\kappa=\frac{\alpha_{1}+\lambda_{opt}}{\alpha_{n}+\lambda_{opt}},$

[TABLE]

and $t=\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}$ (cf. (69) and (66)).

This theorem indicates that $s_{k}$ converges to $s_{opt}$ at least as fast as $\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k+1}$ .
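To make the rate concrete, the following sketch evaluates $\kappa$ and the convergence factor from the rounded values reported for Example 1a in Table 1 ($\alpha_{1}=2$, $\alpha_{n}=-2$, $\lambda_{opt}=2.2333$). The function name is hypothetical, and because the inputs are rounded to four decimals, the computed $\kappa$ only approximately reproduces the tabulated $18.1481$.

```python
import math

def convergence_factor(alpha_1, alpha_n, lam_opt):
    """Return (kappa, t) with kappa = (alpha_1 + lam_opt) / (alpha_n + lam_opt)
    and t = (sqrt(kappa) - 1) / (sqrt(kappa) + 1)."""
    kappa = (alpha_1 + lam_opt) / (alpha_n + lam_opt)
    root = math.sqrt(kappa)
    return kappa, (root - 1.0) / (root + 1.0)

# Rounded inputs taken from Table 1 (Example 1a).
kappa, t = convergence_factor(2.0, -2.0, 2.2333)
```

Up to the rounding of the inputs, the resulting `t` agrees with the value $t=0.6198$ listed in Table 1.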

Finally, we establish a-priori bounds for the residual norm $\|(A+\lambda_{k}I)s_{k}+g\|$ .

Theorem 19.

Suppose $\|s_{opt}\|=\|s_{k}\|=\Delta$ . Then for $k=0,1,\ldots,k_{\max}$ we have

[TABLE]

by dropping the higher order small term $(\lambda_{opt}-\lambda_{k})\|s_{opt}-s_{k}\|$ .

Proof. From (3), we have

[TABLE]

Therefore, from $\|s_{k}\|=\Delta$ , $\lambda_{opt}-\lambda_{k}\geq 0$ , and $\lambda_{opt}\geq 0$ , noting that $\|A+\lambda_{opt}I\|=\alpha_{1}+\lambda_{opt}$ , we obtain

[TABLE]

by dropping the higher order small term $(\lambda_{opt}-\lambda_{k})\|s_{opt}-s_{k}\|$ .
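The two contributions to the residual can be made explicit with a short calculation. This is a sketch that uses only $(A+\lambda_{opt}I)s_{opt}=-g$; the arrangement of terms in the displayed relations of the proof may differ.

$$
(A+\lambda_{k}I)s_{k}+g=(A+\lambda_{opt}I)s_{k}-(\lambda_{opt}-\lambda_{k})s_{k}+g=(A+\lambda_{opt}I)(s_{k}-s_{opt})-(\lambda_{opt}-\lambda_{k})s_{k} .
$$

The first term is controlled by $\|s_{k}-s_{opt}\|$ and the second by $\lambda_{opt}-\lambda_{k}$, which is why the residual norm inherits the slower of the two convergence rates.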

Keep (87) in mind. By substituting bound (86) for $\lambda_{opt}-\lambda_{k}$ and bound (95) for $\sin\angle(s_{k},s_{opt})$ , which is approximately equal to $\|s_{opt}-s_{k}\|/\|s_{opt}\|$ for $k$ sufficiently large, into (96), we obtain an approximate a-priori bound for $\|(A+\lambda_{k}I)s_{k}+g\|$ . This bound illustrates that $\|(A+\lambda_{k}I)s_{k}+g\|$ is dominated by $\|s_{k}-s_{opt}\|$ and tends to zero at least as fast as $\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k+1}$ . Since the resulting bound is not rigorous, we do not write it explicitly.

As a by-product, by exploiting some of the previous results, it is easy to establish an a-priori bound for $\|s_{k}-s_{opt}\|$ , as shown below. With it, we will establish a rigorous a-priori bound for $\|(A+\lambda_{k}I)s_{k}+g\|$ .

Theorem 20.

Suppose $\|s_{opt}\|=\|s_{k}\|=\Delta$ . Then

[TABLE]

where $\kappa=\frac{\alpha_{1}+\lambda_{opt}}{\alpha_{n}+\lambda_{opt}}$ .

Proof. It follows from [31, Theorem 4.3] and (84) that

[TABLE]

where $\pi_{k}$ is the orthogonal projector onto $\mathcal{K}_{k}(g,A)$ . Therefore, (97) follows from the above relation and (85) directly.

This theorem is the same as (4.18b) in [31]. With it, by substituting bound (86) for $\lambda_{opt}-\lambda_{k}$ and bound (97) for $\|s_{k}-s_{opt}\|$ into (96), it is straightforward to obtain the following rigorous a-priori bound for $\|(A+\lambda_{k}I)s_{k}+g\|$ .

Theorem 21.

Suppose $\|s_{opt}\|=\|s_{k}\|=\Delta$ , and let $\|r_{k}\|=\|(A+\lambda_{k}I)s_{k}+g\|$ . Then for $k$ suitably large we have

[TABLE]

with the factors $\eta_{k1}$ and $\eta_{k2}$ defined by (74) and (75), respectively.

Clearly, the second term on the right hand side of (98) soon dominates the bound as $k$ increases.

Summarizing the results obtained in these two sections, we conclude that the convergence rates of $\lambda_{opt}-\lambda_{k}$ and $q(s_{k})-q(s_{opt})$ are the squares of those of $\sin\angle(s_{k},s_{opt})$ , $\|s_{k}-s_{opt}\|$ and $\|(A+\lambda_{k}I)s_{k}+g\|$ . This means that $q(s_{k})$ and $\lambda_{k}$ need roughly half as many iterations as $s_{k}$ and $\|(A+\lambda_{k}I)s_{k}+g\|$ for the three errors and the residual norm to be reduced to about the same level.
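The factor of two in the iteration counts follows directly from the exponents. As a rough sketch that ignores the constant factors in front of the bounds and uses hypothetical names, one can count the smallest $k$ for which $t^{k+1}$, respectively $t^{2(k+1)}$, drops below a given tolerance:

```python
def iterations_needed(t, tol, exponent_scale):
    """Smallest k >= 0 with t ** (exponent_scale * (k + 1)) <= tol, for 0 < t < 1."""
    k = 0
    while t ** (exponent_scale * (k + 1)) > tol:
        k += 1
    return k

t = 0.6198    # convergence factor reported in Table 1 (Example 1a)
tol = 1e-10   # illustrative tolerance, not taken from the experiments
k_linear = iterations_needed(t, tol, 1)    # rate of sin angle(s_k, s_opt) and the residual norm
k_squared = iterations_needed(t, tol, 2)   # rate of lambda_opt - lambda_k and q(s_k) - q(s_opt)
```

In this idealized count, `k_squared + 1` is about half of `k_linear + 1`; in practice the prefactors of the bounds shift both counts.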

6 Numerical examples

In this section, we compare our a-priori bounds in this paper with the four errors in the GLTR method: $\lambda_{opt}-\lambda_{k}$ , $\sin\angle(s_{k},s_{opt})$ , $q(s_{k})-q(s_{opt})$ and $\|(A+\lambda_{k}I)s_{k}+g\|$ , respectively. In order to give a full justification on our a-priori bounds, we test TRS’s with $A$ having different representative eigenvalue distributions and various condition numbers $\kappa$ ’s.

All the experiments were performed on an Intel Core (TM) i7, CPU 3.6GHz, 8 GB RAM using MATLAB 2017A under the Microsoft Windows 10 64 bit.

Throughout this section, we always take $n=10000$ and a fixed trust-region radius $\Delta=1$ , and the vector $g$ is a unit length vector generated by the Matlab built-in function ${\sf randn(n,1)}$ . Since the uncomputable $\varepsilon_{k}$ tends to zero as $k$ increases, we take $\varepsilon_{k}=0$ in the denominator of the bound of Theorem 18. We exploit the Matlab functions eigs and svds with the stopping tolerance $10^{-14}$ to compute $\lambda_{opt}$ , $s_{opt}$ and $\|M\|$ , respectively, use them as the “exact” ones, and then compute $q(s_{opt})$ . To maintain the numerical orthogonality of the Lanczos basis vectors, in finite precision arithmetic, we use the symmetric Lanczos process with complete reorthogonalization.
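For readers who want to reproduce the setup on a small scale, the following is a minimal sketch of the symmetric Lanczos process with complete reorthogonalization. It is written in Python rather than the MATLAB used for the experiments, it uses a hypothetical diagonal stand-in matrix with $n=200$ instead of $n=10000$, and the names `lanczos_full_reorth` and `dot` are illustrative.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lanczos_full_reorth(matvec, g, k):
    """Build up to k + 1 Lanczos vectors for K_k(g, A); return (Q, alphas, betas)."""
    beta0 = math.sqrt(dot(g, g))
    Q = [[x / beta0 for x in g]]
    alphas, betas = [], []
    for j in range(k + 1):
        w = matvec(Q[j])
        alphas.append(dot(w, Q[j]))          # diagonal entry of the tridiagonal matrix
        if j == k:
            break
        # Complete reorthogonalization: two Gram-Schmidt passes against all basis vectors.
        for _ in range(2):
            for v in Q:
                c = dot(w, v)
                w = [wi - c * vi for wi, vi in zip(w, v)]
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12:                     # breakdown: the Krylov subspace is invariant
            break
        betas.append(beta)                   # off-diagonal entry of the tridiagonal matrix
        Q.append([wi / beta for wi in w])
    return Q, alphas, betas

random.seed(0)
n = 200
eigs = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]   # stand-in spectrum in [-2, 2]
g = [random.gauss(0.0, 1.0) for _ in range(n)]
norm_g = math.sqrt(dot(g, g))
g = [x / norm_g for x in g]                           # unit length, as in the experiments
Q, alphas, betas = lanczos_full_reorth(lambda v: [d * x for d, x in zip(eigs, v)], g, 20)
```

The two Gram-Schmidt passes are a common way to keep the basis orthonormal to working precision; whether the original experiments used exactly this variant is not stated in the text.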

When assessing our a-priori bounds, we should note that the bounds may be often large overestimates of the true errors, but that there are cases where the actual errors and their bounds become close to each other when $k$ increases. However one cannot say that a certain kind of bound is the sharpest in all cases. Possible overestimates of our bounds are not surprising, since the bounds are established in the worst case and the factors in front of $\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k+1}$ or $\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{2(k+1)}$ are the largest possible. Our aim consists in giving a-priori bounds which may yield sharp estimates of the asymptotic convergence rates even if those factors in front of the bounds are large.

Example 1. This example is randomly generated, where the symmetric indefinite sparse matrix is generated by the Matlab function

[TABLE]

where ${\sf rc}$ is a vector of $A$ ’s eigenvalues, and we take $density=0.01$ . We construct two $A$ ’s by taking two different ${\sf rc}$ ’s.

Example 1a. The elements of ${\sf rc}$ are evenly distributed over $[-2,2]$ :

[TABLE]

Example 1b. We take the $i$ th element ${\sf rc}(i)$ of ${\sf rc}$ as

[TABLE]

Therefore, the eigenvalues of $A$ lie in the union $[-e,-1.0002]\cup[1.0002,e]$ , and their magnitudes increase monotonically at the rate $e^{2/n}$ on each subinterval.

In Tables 1–2 and Figures 1–2, we list the results and compare the a-priori bounds with $\lambda_{opt}-\lambda_{k}$ , $\sin\angle(s_{k},s_{opt})$ , $q(s_{k})-q(s_{opt})$ and $\|(A+\lambda_{k}I)s_{k}+g\|$ , respectively.

Example 2. We take $A$ to be diagonal with translated Chebyshev nodes on the diagonal. This problem is tested in [31]. The zero nodes of the $n$ th Chebyshev polynomial in $[-1,1]$ are given by

[TABLE]

Given an interval $[a,b]$ , the linear transformation

[TABLE]

maps $x\in[-1,1]$ to $y\in[a,b]$ . The $n$ th translated Chebyshev zero nodes on $[a,b]$ are

[TABLE]

which monotonically decreases for $j=1,2,\ldots,n/2$ and increases for $j=n/2,\ldots,n$ , respectively, and cluster at the endpoints of $[a,b]$ . We take $[a,b]=[-5,5]$ and $A=diag\{t^{[a,b]}_{jn}\},\ j=1,2,\ldots,n$ .
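The clustering of the translated nodes can be checked with a few lines of Python. The sketch assumes the standard zeros $\cos\frac{(2j-1)\pi}{2n}$ of the $n$th Chebyshev polynomial of the first kind and the usual affine map from $[-1,1]$ to $[a,b]$; the function name is hypothetical, and $n=1000$ is used instead of $n=10000$ to keep the example light.

```python
import math

def translated_chebyshev_nodes(n, a, b):
    """Zeros of the n-th Chebyshev polynomial of the first kind, mapped affinely to [a, b]."""
    nodes = []
    for j in range(1, n + 1):
        x = math.cos((2 * j - 1) * math.pi / (2 * n))     # zero in (-1, 1)
        nodes.append(0.5 * (b - a) * x + 0.5 * (a + b))   # affine map to (a, b)
    return nodes

nodes = translated_chebyshev_nodes(1000, -5.0, 5.0)
gap_at_end = nodes[0] - nodes[1]           # spacing next to the right endpoint
gap_in_middle = nodes[499] - nodes[500]    # spacing around the midpoint
```

The spacing near an endpoint is far smaller than the spacing near the middle of the interval, which is the clustering that makes this spectrum a demanding test for Krylov subspace methods.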

In Figure 3 and Table 3, we draw and list the results.

Example 3. We use the Strakoš matrix [19, p.16], which is used to test the behavior of the symmetric Lanczos method for the eigenvalue problem. The matrix $A$ is diagonal with the eigenvalues

[TABLE]

$i=1,2,\ldots,n$ . The parameter $\rho$ controls the eigenvalue distribution. The large eigenvalues of $A$ are well separated for $\rho<1$ . We take $\alpha_{1}=8$ , $\alpha_{n}=-2$ and $\rho=0.99$ .

In Figure 4 and Table 4, we depict and list the results.

Example 4. We take

[TABLE]

with $G$ generated by ${\sf randn(n)}$ and $A:=A/\|A\|$ . The eigenvalues of $A$ exhibit normal distribution characteristics. Figure 5 and Table 5 give the results.

We have observed from the figures and tables that, for all the test problems, (i) the corresponding bounds predict the convergence rates of $\lambda_{opt}-\lambda_{k}$ , $\sin\angle(s_{k},s_{opt})$ , $q(s_{k})-q(s_{opt})$ and $\|(A+\lambda_{k}I)s_{k}+g\|$ accurately and (ii) the bounds are very close to the true errors in most of the cases, especially for $\lambda_{opt}-\lambda_{k}$ and $q(s_{k})-q(s_{opt})$ .

The tables and figures also indicate that (i) the errors $\lambda_{opt}-\lambda_{k}$ and $q(s_{k})-q(s_{opt})$ as well as their bounds use roughly half of the iterations needed for $\sin\angle(s_{k},s_{opt})$ and $\|(A+\lambda_{k}I)s_{k}+g\|$ as well as their bounds to achieve approximately the same tolerance and (ii) the condition number $\kappa$ affects the convergence of the GLTR method: the bigger $\kappa$ is, the more iterations the method needs to reduce each of $\lambda_{opt}-\lambda_{k}$ , $\sin\angle(s_{k},s_{opt})$ , $q(s_{k})-q(s_{opt})$ and $\|(A+\lambda_{k}I)s_{k}+g\|$ to approximately the same level.

7 Conclusion

The GLTR method has been receiving high attention both theoretically and numerically. Some a-priori bounds have been obtained for $q(s_{k})-q(s_{opt})$ and $\|s_{k}-s_{opt}\|$ in the literature, but there has been no quantitative analysis and result on $\lambda_{opt}-\lambda_{k}$ and $\|(A+\lambda_{k}I)s_{k}+g\|$ . Starting with the mathematical equivalence of the solution of TRS (1) and the eigenvalue problem of the augmented matrix $M$ , we have established a-priori bounds for $\lambda_{opt}-\lambda_{k}$ , $\sin\angle(s_{k},s_{opt})$ , $q(s_{k})-q(s_{opt})$ , and the residual norm $\|(A+\lambda_{k}I)s_{k}+g\|$ . The results prove how the three errors and the residual norm decrease as the subspace dimension increases. Numerical results have confirmed that our bounds are realistic and they accurately predict the true convergence rates of the three errors and the residual norm in the GLTR method.

Bibliography

[1] S. Adachi, S. Iwata, Y. Nakatsukasa, and A. Takeda, Solving the trust-region subproblem by a generalized eigenvalue problem, SIAM J. Optim., 27 (2017), pp. 269–291.
[2] R. H. Byrd, R. B. Schnabel, and G. A. Shultz, Approximate solution of the trust region problem by minimization over two-dimensional subspaces, Math. Prog., 40 (1988), pp. 247–263.
[3] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-region Methods, SIAM, Philadelphia, 2000.
[4] R. El Attar, Special Functions and Orthogonal Polynomials, Lulu Press, USA, 2006.
[5] J. B. Erway and P. E. Gill, A subspace minimization method for the trust-region step, SIAM J. Optim., 20 (2010), pp. 1439–1461.
[6] J. B. Erway, P. E. Gill, and J. D. Griffin, Iterative methods for finding a trust-region step, SIAM J. Optim., 20 (2009), pp. 1110–1131.
[7] C. Fortin and H. Wolkowicz, The trust region subproblem and semidefinite programming, Optim. Methods Softw., 19 (2004), pp. 41–67.
[8] N. I. M. Gould, S. Lucidi, M. Roma, and P. L. Toint, Solving the trust-region subproblem using the Lanczos method, SIAM J. Optim., 9 (1999), pp. 504–525.
