Linear systems solvers - recent developments and implications for lattice computations

Andreas Frommer (Department of Mathematics, University of Wuppertal, Germany)

arXiv:hep-lat/9608074·hep-lat·October 28, 2009


TL;DR

This paper reviews recent advances in Krylov subspace methods for solving non-Hermitian linear systems, emphasizing their near-optimal performance for lattice gauge theory computations and highlighting preconditioning as a key area for future improvements.

Contribution

It analyzes the effectiveness of mature Krylov methods like QMR, BiCGStab, and GMRES for Wilson fermion matrices, stressing the importance of preconditioning.

Findings

01

Krylov methods are near-optimal for Wilson fermion matrices

02

Preconditioning is crucial for further improvements

03

Mature methods like QMR, BiCGStab, GMRES are effective

Abstract

We review the numerical analysis' understanding of Krylov subspace methods for solving (non-hermitian) systems of equations and discuss its implications for lattice gauge theory computations using the example of the Wilson fermion matrix. Our thesis is that mature methods like QMR, BiCGStab or restarted GMRES are close to optimal for the Wilson fermion matrix. Consequently, preconditioning appears to be the crucial issue for further improvements.

Equations

$$Mx=b$$

$$r^{m}=p_{m}(M)\,r^{0}$$

$$M=V\Lambda V^{-1}$$

$$\|r^{m}\|\leq\|V\|\cdot\|V^{-1}\|\cdot\|r^{0}\|\cdot\|p_{m}(\Lambda)\|$$

$$\|p_{m}(\Lambda)\|=\max_{\lambda\in\sigma(M)}|p_{m}(\lambda)|$$

$$c_{m}:=\min_{\substack{p_{m}(0)=1\\ \deg(p_{m})\leq m}}\;\max_{\lambda\in\sigma(M)}|p_{m}(\lambda)|,\qquad m=0,1,\dots$$

$$\|r^{m}\|=\min_{\substack{p_{m}(0)=1\\ \deg(p_{m})\leq m}}\|p_{m}(M)\,r^{0}\|$$

$$M=e^{i\Theta}(T+i\sigma I),\quad\text{where } T=T^{\dagger}\text{ and }\sigma,\Theta\in\mathbb{R}$$

$$(r^{m})^{\dagger}\,\tilde{p}_{m}(M)\,\tilde{r}=0$$

$$MJ=JM^{\dagger}$$

$$V_{1}^{-1}MV_{2}^{-1}\,\widehat{x}=\widehat{b}$$

$$\text{solve } V_{2}w=y,\qquad v=Mw,\qquad\text{solve } V_{1}z=v$$

$$M=D-L-U$$

$$V=\left(\tfrac{1}{\omega}D-L\right)D^{-1}\left(\tfrac{1}{\omega}D-U\right)$$

$$M=\begin{pmatrix}D_{1}&-B_{1}\\ -B_{2}&D_{2}\end{pmatrix}$$

$$V_{1}^{-1}MV_{2}^{-1}=\begin{pmatrix}I&0\\ 0&I-B_{2}D_{1}^{-1}B_{1}D_{2}^{-1}\end{pmatrix}$$

$$\begin{array}{l}\text{solve }(\widehat{D}-U)v=x,\\ \text{solve }(\widehat{D}-L)w=(D-2\widehat{D})v+x,\\ y=\widehat{D}(v+w)\end{array}$$

$$B=\begin{pmatrix}0&B_{1}\\ B_{2}&0\end{pmatrix}$$

$$M_{e}=I-\kappa^{2}\cdot B_{2}B_{1}$$

$$Q=\gamma_{5}\cdot M,\qquad Q_{e}=\gamma_{5}\cdot M_{e}$$

$$c_{m}(Q_{e})\leq\left(\frac{1-\frac{a_{e}}{b_{e}}}{1+\frac{a_{e}}{b_{e}}}\right)^{m/2},\qquad c_{m}(Q_{e}^{2})\leq\left(\frac{1-\frac{a_{e}}{b_{e}}}{1+\frac{a_{e}}{b_{e}}}\right)^{m}$$

$$c_{m}(M_{e})\leq\left(1-\frac{\alpha_{e}^{2}}{b_{e}^{2}}\right)^{m/2}=\rho_{m}$$


Full text

Linear systems solvers – recent developments and implications

for lattice computations

A. Frommer

Fachbereich Mathematik, Universität Wuppertal,

42097 Wuppertal, Germany


1 KRYLOV SUBSPACE METHODS

Given a linear system of equations

$$Mx=b \tag{1}$$

with $M\in\mathbb{C}^{n\times n}$ being non-singular, the class of Krylov subspace iterative methods for solving (1) is characterized by the following generic template

$$x^{m}=x^{0}+q_{m-1}(M)\,r^{0},\qquad r^{0}=b-Mx^{0}.$$

Here, $q_{m-1}$ is a polynomial of degree $\leq m-1$. For the residual $r^{m}=b-Mx^{m}$ we therefore get

$$r^{m}=p_{m}(M)\,r^{0}, \tag{2}$$

where $p_{m}$ is the polynomial $p_{m}(t)=1-tq_{m-1}(t)$. In an algorithmic description of virtually any Krylov subspace method, the polynomials $q_{m-1}$ or $p_{m}$ are not explicitly present, but they are crucial to a theoretical analysis of the method. Moreover, the relation (2) is also the key to understanding the condition (‘difficulty’) of the linear system to be solved, and we start by discussing this point.

1.1 Condition

Assume that $M$ is diagonalizable, i.e. we have a decomposition of the form

$$M=V\Lambda V^{-1},$$

where $\Lambda$ is diagonal with its diagonal containing the eigenvalues, and $V$ is the corresponding matrix of (right) eigenvectors. Denoting the spectrum of $M$ by $\sigma(M)$, from (2) we now get $r^{m}=Vp_{m}(\Lambda)V^{-1}r^{0}$ and therefore

$$\|r^{m}\|\leq\|V\|\cdot\|V^{-1}\|\cdot\|r^{0}\|\cdot\|p_{m}(\Lambda)\|. \tag{3}$$

Since $p_{m}(\Lambda)$ is diagonal we have

$$\|p_{m}(\Lambda)\|=\max_{\lambda\in\sigma(M)}|p_{m}(\lambda)|.$$

Considering $\|V\|,\|V^{-1}\|$ and $\|r^{0}\|$ as constants, the best bound any Krylov subspace method can achieve in (3) is the one obtained for the polynomial which minimizes $\|p_{m}(\Lambda)\|$. In this sense, the quantities

$$c_{m}:=\min_{\substack{p_{m}(0)=1\\ \deg(p_{m})\leq m}}\;\max_{\lambda\in\sigma(M)}|p_{m}(\lambda)|,\qquad m=0,1,\dots \tag{4}$$

represent a measure of the condition of the system (1), since no Krylov subspace method can achieve a better bound in (3) than the one which replaces $\|p_{m}(\Lambda)\|$ by $c_{m}$. Finding the optimal polynomial in (4) is a complex approximation problem for which solutions are known only in special cases. However, it is clear that due to the restriction $p_{m}(0)=1$ the numbers $c_{m}$ will tend to zero only slowly if there are many eigenvalues close to 0, particularly if they are distributed quite evenly around [math].

1.2 Optimal methods

A Krylov subspace method is algorithmically feasible if it requires only a limited amount of resources such as storage and computer time per iteration. We express this fact by saying that the method can be implemented using short recurrences, meaning that all quantities needed at iteration $m$ can be computed from those of a small number of previous iterations. Note that each Krylov subspace method will require at least one multiplication with $M$ per iteration to account for the fact that the degree of the polynomial $p_{m}$ will increase as the iteration proceeds. The following theorem [1] shows that optimality and short recurrences can only be achieved for a restricted class of matrices.

Theorem 1.1

A Krylov subspace method which achieves optimality, i.e.

$$\|r^{m}\|=\min_{\substack{p_{m}(0)=1\\ \deg(p_{m})\leq m}}\|p_{m}(M)\,r^{0}\| \tag{5}$$

for every initial residual $r^{0}$ and which can be implemented using short recurrences exists only if $M$ is of the form

$$M=e^{i\Theta}(T+i\sigma I),\quad\text{where } T=T^{\dagger}\text{ and }\sigma,\Theta\in\mathbb{R}.$$

This theorem also holds if $\|\cdot\|$ is replaced by an energy norm of the form $\|x\|_{H}=(x^{\dagger}Hx)^{1/2}$ with $H\in\mathbb{C}^{n\times n}$ hermitian and positive definite. For $M$ hermitian and positive definite the CG method achieves optimality in the energy norm with $H=M$. For $M$ hermitian (but possibly indefinite), MINRES [2] is optimal in the Euclidean norm. The paper [3] gives algorithmic descriptions for optimal methods in the other cases of Theorem 1.1. Note that the above theorem includes matrices of the form $\sigma I+S$ with $S^{\dagger}=-S$ (take $\Theta=-\pi/2$ and $T=iS$, which is hermitian), arising for staggered fermions.

2 NON-HERMITIAN SYSTEMS

The last 10 years have seen tremendous progress in Krylov subspace methods for solving linear systems which, like the Wilson fermion matrix, do not fall into the category covered by Theorem 1.1. See [4, 5] for an overview. For simplicity, such systems will just be termed ‘non-hermitian’ in the sequel. In these cases one must find an adequate compromise between the quality of the Krylov subspace method to use and the resources required by the method.

The first method of this kind is the BiCG method [6]. Here, an additional shadow residual $\tilde{r}$ is selected and the $m$ -th iterate $x^{m}$ is defined by the Galerkin condition

$$(r^{m})^{\dagger}\,\tilde{p}_{m}(M)\,\tilde{r}=0$$

for all polynomials $\tilde{p}_{m}$ of degree $\leq m$. If $M$ is hermitian positive definite and $\tilde{r}=r^{0}$, the method reduces to the CG method. BiCG needs two matrix multiplies (one with $M$ and one with $M^{\dagger}$) per iteration, and the residuals typically undergo quite large variations. Moreover, there are situations where the method breaks down (due to division by zero) without having reached a solution. Although exact breakdowns rarely occur in practice, near breakdowns severely affect the numerical stability.

2.1 QMR

QMR, the quasi minimal residual method of [7], can be regarded as one way to make BiCG more reliable. Like BiCG, it is based upon the non-symmetric Lanczos process to compute an appropriate basis $v_{1},\ldots,v_{m}$ of the Krylov subspace $K_{m}(M,r^{0})=\{p_{l}(M)r^{0},\,\deg p_{l}\leq m-1\}$. The $m$-th residual $r^{m}$ is characterized by the coefficient vector $(\alpha_{1},\ldots,\alpha_{m})$ in $r^{m}=\sum_{i=1}^{m}\alpha_{i}v_{i}$ having minimal norm subject to the condition $r^{m}=p_{m}(M)r^{0},\,\deg p_{m}\leq m,\,p_{m}(0)=1$. If the Lanczos vectors $v_{1},\ldots,v_{m}$ were orthogonal this would imply that $r^{m}$ is minimal. Since for non-hermitian matrices the Lanczos vectors are not orthogonal, minimizing the coefficient vector merely implies a ‘quasi’ minimality of $r^{m}$, whence the name QMR. QMR eliminates one source of breakdowns present in BiCG. Moreover, using a look-ahead strategy in the non-symmetric Lanczos process, almost all other (exact or near) breakdowns are also avoided at the price of extra storage. All these features are implemented in QMRPACK, which is available from netlib. As in BiCG, each iteration costs one multiply with $M$ and one with $M^{\dagger}$. The quite smooth convergence of QMR is also justified by the theoretical analysis.

2.2 $J$ -hermitian matrices

A matrix $M$ is said to be $J$ -hermitian if there exists a matrix $J$ such that

[TABLE]

In this particular case, the non-symmetric Lanczos process can be made less costly, since through the right choice of the ‘shadow residual’ $\widetilde{r}$ all multiplications with $M^{\dagger}$ can be replaced by multiplications with $J$ [8]. Consequently, BiCG and QMR require only one multiply with $M$ and one with $J$ in each iteration. For the Wilson fermion matrix we have $J=\gamma_{5}$, and thus multiplies with $J$ are far cheaper than multiplies with $M$. Exploiting the $\gamma_{5}$-symmetry thus makes QMR (and BiCG) competitive with the other methods discussed in this section, see [9, 10]. At the time of writing this article, including the $J$-hermitian case into QMRPACK was under preparation [11] but not yet completed.

2.3 BiCGStab

The BiCGStab method [12] is another way to stabilize BiCG. Here, multiplies with $M^{\dagger}$ are replaced by multiplies with $M$ in such a way that an additional one-dimensional minimization process is performed during each iteration.

All computational effort, in particular all matrix multiplies, is spent working on the iterates of the system to solve. Typically, BiCGStab produces less varying residuals than BiCG, although the same sources for breakdowns are still present. BiCGStab is quite easy to implement ‘from scratch’. Some variations are described in [13, 14].
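To illustrate how little code a ‘from scratch’ implementation needs, the following is a minimal, unpreconditioned BiCGStab sketch in NumPy. It follows the standard textbook form of the algorithm rather than any particular code used for the experiments reported here. The function name `bicgstab`, the choice of shadow residual equal to $r^{0}$, the stopping rule and the small test matrix $I-0.5\,B$ (a toy stand-in, not a Wilson fermion matrix) are all assumptions made for the example.

```python
import numpy as np

def bicgstab(M, b, tol=1e-10, max_iter=500):
    """Unpreconditioned BiCGStab sketch for M x = b (M complex, non-hermitian)."""
    x = np.zeros(len(b), dtype=complex)
    r = b - M @ x
    r_shadow = r.copy()               # shadow residual, kept fixed (assumed choice)
    rho_old = alpha = omega = 1.0
    v = np.zeros_like(r)
    p = np.zeros_like(r)
    target = tol * np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        rho = np.vdot(r_shadow, r)    # vdot conjugates its first argument
        if rho == 0:
            raise ArithmeticError("breakdown: rho = 0")
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = M @ p                     # first multiply with M
        alpha = rho / np.vdot(r_shadow, v)
        s = r - alpha * v
        if np.linalg.norm(s) <= target:
            return x + alpha * p, k
        t = M @ s                     # second multiply with M (replaces M^dagger)
        omega = np.vdot(t, s) / np.vdot(t, t)   # one-dimensional minimization of ||s - omega t||
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) <= target:
            return x, k
        rho_old = rho
    return x, max_iter

# Toy test problem (hypothetical): M = I - 0.5 B with ||B||_2 = 1.
rng = np.random.default_rng(0)
n = 40
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B /= np.linalg.norm(B, 2)
M = np.eye(n) - 0.5 * B
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x, iters = bicgstab(M, b)
```

Both matrix multiplies per iteration use $M$ itself, so no routine for $M^{\dagger}$ is needed. The sketch makes no attempt to detect near breakdowns.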

2.4 Restarted GMRES

In contrast to the Lanczos process, the Arnoldi process computes an orthogonal basis of $K_{m}(M,r^{0})$ for a general non-hermitian matrix $M$. From the Arnoldi basis it is possible to calculate an optimal iterate $x^{m}$ (such that $r^{m}$ satisfies (5)) by solving a small least squares problem.

The resulting method is called GMRES, the generalized minimal residual method [15]. However, the Arnoldi process does not rely on short recurrences, so that it requires $m$ vectors of storage and $O(m^{2})$ inner products to be computed.

One therefore has to stop GMRES after a certain number ($k$, say) of iterations and restart the process with the current iterate $x^{k}$ as a new initial guess. The resulting method is termed restarted GMRES or GMRES($k$). For $k=1$, a restart is done after every iteration. Hence, GMRES(1) is identical to the familiar MR method [16], where the iterate $x^{m+1}$ is obtained by minimizing the norm of $r^{m+1}(t)=b-M(x^{m}+tr^{m})$ with respect to $t\in\mathbb{C}$. There are situations where GMRES($k$) stagnates without reaching a solution, even for large restart values $k$, but if all eigenvalues of $M$ lie in the right half plane GMRES($k$) is known to converge for all $k$ [5, 15].
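The MR step described above has a closed form: minimizing $\|r^{m}-tMr^{m}\|$ over $t\in\mathbb{C}$ gives $t=(Mr^{m})^{\dagger}r^{m}/\|Mr^{m}\|^{2}$. The following NumPy sketch implements this iteration. The function name `minimal_residual`, the stopping rule and the toy test matrix are assumptions for illustration only.

```python
import numpy as np

def minimal_residual(M, b, tol=1e-10, max_iter=1000):
    """MR iteration (= GMRES(1)) sketch for M x = b; returns x and the residual norms."""
    x = np.zeros(len(b), dtype=complex)
    r = b - M @ x
    history = [np.linalg.norm(r)]
    target = tol * np.linalg.norm(b)
    for _ in range(max_iter):
        Mr = M @ r
        # t minimizes ||b - M (x + t r)|| = ||r - t M r|| over complex t
        t = np.vdot(Mr, r) / np.vdot(Mr, Mr)
        x = x + t * r
        r = r - t * Mr
        history.append(np.linalg.norm(r))
        if history[-1] <= target:
            break
    return x, history

# Toy test problem (hypothetical): M = I - 0.5 B with ||B||_2 = 1.
rng = np.random.default_rng(1)
n = 40
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B /= np.linalg.norm(B, 2)
M = np.eye(n) - 0.5 * B
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x, history = minimal_residual(M, b)
```

Since $t=0$ is always admissible, the residual norms in `history` cannot increase from one step to the next, which is the smooth behavior one expects from a minimal residual method.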

3 PRECONDITIONING

We have seen in Section 1 that the eigenvalue distribution of $M$ determines a bound on the maximal speed of any Krylov subspace method for $M$ . Once we have a method which is close to optimal, the only way of getting further improvement is to change the matrix $M$ to one for which the eigenvalue distribution is more favorable. This is precisely the purpose of preconditioning where the original system $Mx=b$ is changed to

$$V_{1}^{-1}MV_{2}^{-1}\,\widehat{x}=\widehat{b}$$

with $\widehat{b}=V_{1}^{-1}b$ and $\widehat{x}=V_{2}x$ . The matrices $V_{1},V_{2}$ are called the left and right preconditioner, resp., and their product $V=V_{1}V_{2}$ is often referred to as the preconditioner. Note that the spectrum of $V_{1}^{-1}MV_{2}^{-1}$ is identical to that of $V^{-1}M$ , so that the effect of preconditioning on the eigenvalue distribution is determined by $V$ alone but not by its factorization $V=V_{1}V_{2}$ . A preconditioner should approximate $M$ (so that the eigenvalues of $V^{-1}M$ cluster around 1). On the other hand, performing a Krylov subspace method on the preconditioned system requires multiplies with the preconditioned matrix like in $z=V_{1}^{-1}MV_{2}^{-1}y$ which are normally obtained via

$$\text{solve } V_{2}w=y,\qquad v=Mw,\qquad\text{solve } V_{1}z=v.$$

Preconditioning thus introduces additional solves with the matrices $V_{1}$ and $V_{2}$ and this overhead should not be too expensive in order to get an efficient method. A good preconditioner is always a compromise between the latter requirement and the fact that $V$ should well approximate $M$ .

Conceptually, one may distinguish between two types of preconditioners: In problem oriented preconditioners the matrix $V$ is taken as a simpler or reduced (with respect to $M$) representation of the underlying physical problem. Algebraic preconditioners are obtained directly from $M$ without recourse to the application from which $M$ arises. Interestingly, algebraic preconditioners seem to be more successful than problem oriented ones in QCD computations, and we therefore focus on the former.

3.1 SSOR preconditioners

Each matrix $M$ can be decomposed into

$$M=D-L-U,$$

where $D,-L$ and $-U\in\mathbb{C}^{n\times n}$ represent the diagonal, the strictly lower and the strictly upper triangular part of $M$. We assume that $M$ has all diagonal elements $\not=0$, so that $D,D-L$ and $D-U$ are all non-singular. For a given relaxation parameter $\omega\not=0$ the SSOR preconditioner is defined by (see [5], e.g.)

$$V=\left(\tfrac{1}{\omega}D-L\right)D^{-1}\left(\tfrac{1}{\omega}D-U\right).$$

For $\omega=1$ we thus have $V=(D-L)D^{-1}(D-U)=M+LD^{-1}U$ as an approximation to $M$. Systems with the preconditioner $V$ are easy to solve because $\frac{1}{\omega}D-L$ and $\frac{1}{\omega}D-U$ are triangular, so that $x$ in $(\frac{1}{\omega}D-L)x=y$ can be obtained by a simple forward recursion, and similarly by a backward recursion in $(\frac{1}{\omega}D-U)x=y$. Note that the situation becomes more involved if we consider parallelization issues, since recursions are known to parallelize badly.
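The forward and backward recursions can be sketched as follows. This is a dense NumPy illustration of solving $Vx=y$ for the SSOR preconditioner $V=(\frac{1}{\omega}D-L)D^{-1}(\frac{1}{\omega}D-U)$, not an efficient sparse implementation. The function names, the value $\omega=1.2$ and the random test matrix are assumptions.

```python
import numpy as np

def forward_substitution(T, y):
    """Solve T x = y for lower triangular T by a forward recursion."""
    x = np.zeros(len(y), dtype=complex)
    for i in range(len(y)):
        x[i] = (y[i] - T[i, :i] @ x[:i]) / T[i, i]
    return x

def backward_substitution(T, y):
    """Solve T x = y for upper triangular T by a backward recursion."""
    x = np.zeros(len(y), dtype=complex)
    for i in reversed(range(len(y))):
        x[i] = (y[i] - T[i, i + 1:] @ x[i + 1:]) / T[i, i]
    return x

def ssor_solve(M, y, omega=1.0):
    """Return x with V x = y, V = (D/omega - L) D^{-1} (D/omega - U), where M = D - L - U."""
    d = np.diag(M)                    # diagonal of M as a vector
    D = np.diag(d)
    L = -np.tril(M, k=-1)             # strictly lower part, sign convention M = D - L - U
    U = -np.triu(M, k=1)              # strictly upper part
    z = forward_substitution(D / omega - L, y)
    z = d * z                         # multiply by D to undo the D^{-1} in the middle
    return backward_substitution(D / omega - U, z)

# Small consistency check on a hypothetical test matrix with non-zero diagonal.
rng = np.random.default_rng(2)
n = 8
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)) + n * np.eye(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
omega = 1.2
x = ssor_solve(M, y, omega)

D = np.diag(np.diag(M)); L = -np.tril(M, -1); U = -np.triu(M, 1)
V = (D / omega - L) @ np.linalg.inv(D) @ (D / omega - U)
```

Each entry `x[i]` depends on entries computed earlier in the same loop, which is the sequential dependency that makes these recursions hard to parallelize.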

Assume that $M$ is of the particular form

$$M=\begin{pmatrix}D_{1}&-B_{1}\\ -B_{2}&D_{2}\end{pmatrix}.$$

This is the case for the Wilson fermion matrix if we use the standard odd-even ordering (with $D_{1}=D_{2}=I$). If we take $V_{1}=(D-L)D^{-1}$ and $V_{2}=(D-U)$ we get

$$V_{1}^{-1}MV_{2}^{-1}=\begin{pmatrix}I&0\\ 0&I-B_{2}D_{1}^{-1}B_{1}D_{2}^{-1}\end{pmatrix}. \tag{7}$$

For the Wilson fermion matrix the second diagonal block in (7) is commonly called the odd-even reduced system. Our discussion shows that odd-even reduction is nothing else but SSOR preconditioning with respect to the odd-even ordering and with $\omega=1$. This case is exceptional in that it does no harm to calculate the preconditioned matrix explicitly as done in (7), whereas in general this produces too much fill-in to be practicable. If we re-interpret $D,-L,-U$ as block parts of $M$, the above discussion can also be used to derive block SSOR preconditioners. In QCD this can be useful in the context of improved actions where $D$ then is block diagonal with blocks of size $12\times 12$. See also [17, 18].
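The claim that SSOR with $\omega=1$ in the odd-even ordering reproduces the odd-even reduced system can be checked numerically. The sketch below builds a small random matrix with the $2\times 2$ block structure $\bigl(\begin{smallmatrix}D_{1}&-B_{1}\\ -B_{2}&D_{2}\end{smallmatrix}\bigr)$, forms $V_{1}=(D-L)D^{-1}$ and $V_{2}=D-U$, and compares $V_{1}^{-1}MV_{2}^{-1}$ with the block diagonal matrix whose second block is $I-B_{2}D_{1}^{-1}B_{1}D_{2}^{-1}$. The block size and the random entries are assumptions; no lattice structure is modeled.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5                                          # size of each block (hypothetical)
B1 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B2 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
D1 = np.diag(rng.uniform(1.0, 2.0, n))         # non-singular diagonal blocks
D2 = np.diag(rng.uniform(1.0, 2.0, n))
Z = np.zeros((n, n))
I = np.eye(n)

M = np.block([[D1, -B1], [-B2, D2]])
D = np.block([[D1, Z], [Z, D2]])
L = np.block([[Z, Z], [B2, Z]])                # M = D - L - U
U = np.block([[Z, B1], [Z, Z]])

V1 = (D - L) @ np.linalg.inv(D)                # left preconditioner
V2 = D - U                                     # right preconditioner
preconditioned = np.linalg.inv(V1) @ M @ np.linalg.inv(V2)

reduced_block = I - B2 @ np.linalg.inv(D1) @ B1 @ np.linalg.inv(D2)
expected = np.block([[I, Z], [Z, reduced_block]])
```

With $D_{1}=D_{2}=I$ the second block reduces to $I-B_{2}B_{1}$, so only a system of half the original dimension remains to be solved.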

3.2 ILU factorizations

The incomplete LU factorization (ILU) (see [5], e.g.) is another algebraic method to obtain a preconditioner $V=(\widehat{D}-\widehat{L})\widehat{D}^{-1}(\widehat{D}-\widehat{U})$ for $M$ where, again, $\widehat{D},\widehat{L},\widehat{U}$ are diagonal, strictly lower and strictly upper triangular, respectively. These matrices are obtained by performing a variant of Gaussian elimination on $M$, imposing restrictions on the amount of fill-in in the factors $\widehat{D}-\widehat{L}$ and $\widehat{D}-\widehat{U}$, so that $V$ represents only an approximate (incomplete) factorization of $M$. If we allow for no fill-in (i.e. $\widehat{D}-\widehat{L}$ and $\widehat{D}-\widehat{U}$ have the same sparsity structure as $M$) and if $M$ represents a nearest-neighbor coupling on a regular grid, then $\widehat{L}=L$ and $\widehat{U}=U$, so that the only difference to the SSOR preconditioner resides in the diagonal part $\widehat{D}$. For the Wilson fermion matrix with Wilson parameter $r=1$ both preconditioners turn out to be identical. ILU preconditioners are often somewhat more efficient than SSOR preconditioners, but note that they require a start-up phase to compute $\widehat{D}$ (and $\widehat{L}$ and $\widehat{U}$, in general).

3.3 The Eisenstat trick

If we have an SSOR or ILU preconditioner of the form $V_{1}=(\widehat{D}-L)\widehat{D}^{-1}$ and $V_{2}=(\widehat{D}-U)$ , the product $y=V_{1}^{-1}MV_{2}^{-1}x$ can be computed as

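$$\begin{array}{l}\text{solve }(\widehat{D}-U)v=x,\\ \text{solve }(\widehat{D}-L)w=(D-2\widehat{D})v+x,\\ y=\widehat{D}(v+w).\end{array}$$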

As far as flop counts are concerned, the above scheme is as expensive as one multiplication with $M$ itself, except for some additional operations involving diagonal matrices which can usually be neglected. So, due to the Eisenstat trick, the ILU and SSOR preconditioners do not increase the amount of work per iteration, thus making these preconditioners particularly attractive. Note that the Eisenstat trick can also be applied in more general situations, see [19].
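The Eisenstat trick rests on the splitting $M=(\widehat{D}-L)+(\widehat{D}-U)+(D-2\widehat{D})$: with $v=(\widehat{D}-U)^{-1}x$ one has $Mv=(\widehat{D}-L)v+x+(D-2\widehat{D})v$, so that $y=\widehat{D}(\widehat{D}-L)^{-1}Mv=\widehat{D}(v+w)$ with $w=(\widehat{D}-L)^{-1}\bigl((D-2\widehat{D})v+x\bigr)$. The NumPy sketch below checks this scheme against the explicit product $V_{1}^{-1}MV_{2}^{-1}x$. The function name `eisenstat_multiply`, the choice $\widehat{D}=D/1.2$ and the random test matrix are assumptions. Dense `np.linalg.solve` calls stand in for the triangular recursions a real code would use.

```python
import numpy as np

def eisenstat_multiply(D, L, U, D_hat, x):
    """Compute y = V1^{-1} M V2^{-1} x with V1 = (D_hat - L) D_hat^{-1}, V2 = D_hat - U,
    using two triangular solves and no explicit multiplication with M = D - L - U."""
    v = np.linalg.solve(D_hat - U, x)                        # backward recursion in practice
    w = np.linalg.solve(D_hat - L, (D - 2 * D_hat) @ v + x)  # forward recursion in practice
    return D_hat @ (v + w)

# Hypothetical test data: random complex matrix with dominant diagonal.
rng = np.random.default_rng(4)
n = 8
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)) + n * np.eye(n)
D = np.diag(np.diag(M))
L = -np.tril(M, k=-1)
U = -np.triu(M, k=1)
D_hat = D / 1.2                                              # assumed diagonal of the preconditioner
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

y = eisenstat_multiply(D, L, U, D_hat, x)

# Reference: apply V2^{-1}, then M, then V1^{-1} explicitly.
V1 = (D_hat - L) @ np.linalg.inv(D_hat)
V2 = D_hat - U
y_ref = np.linalg.solve(V1, M @ np.linalg.solve(V2, x))
```

The reference computation needs two triangular solves plus one multiplication with $M$, whereas `eisenstat_multiply` needs only the two solves and some diagonal scalings.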

3.4 The influence of orderings

When writing down the equation $Mx=b$ we are free to choose any ordering for the variables, and the change from one ordering to another translates into a transformation of the kind $M\to P^{\dagger}MP$ with $P$ a permutation matrix. For both the SSOR and ILU preconditioners, the spectrum of the preconditioned matrix depends on the ordering chosen (but the Eisenstat trick can always be applied). There is therefore a potential to optimize these preconditioners using the best ordering. Typically, orderings which yield good preconditioners make the recurrences in solving the triangular systems less amenable to parallel implementations. For example, the natural lexicographic ordering of lattice points in the Wilson fermion matrix was shown to yield a high quality ILU preconditioner [20], but it cannot be handled efficiently on a distributed memory parallel computer. In [21] it was shown that a new locally lexicographic ordering can yield up to a factor 2 improvement over odd-even preconditioning on a Quadrics parallel computer.

3.5 Polynomial preconditioning

Another algebraic preconditioner is obtained by taking $V^{-1}=s(M)$ where $s$ is a polynomial such that $s(M)$ approximates $M^{-1}$. So the multiplication with $V^{-1}$ requires $\deg(s)$ multiplies with $M$. Consequently, $\deg(s)+1$ steps on the original system are as expensive as one step on the preconditioned system, and the iterates are from the same Krylov subspace. In this respect polynomial preconditioning therefore offers little advantage, but it was shown in [17] that it can be useful as a means of stabilizing the MR method in certain situations.

4 EXAMPLE: WILSON FERMIONS

The generic form of the Wilson fermion matrix is $M=I-\kappa B$ where $B$ represents the nearest neighbor coupling on the space-time lattice. Taking the even-odd ordering, $B$ has the form

$$B=\begin{pmatrix}0&B_{1}\\ B_{2}&0\end{pmatrix}.$$

The odd-even reduced matrix $M_{e}$ from (7) is

$$M_{e}=I-\kappa^{2}\cdot B_{2}B_{1}.$$

A typical example of the eigenvalue distribution for $M$ and $M_{e}$ (calculated from a confined configuration on a small $4^{4}$ -lattice at $\beta=5.0$ and $\kappa=0.150$ ) is given on top of Fig. 1. Note that all eigenvalues lie in the right half plane so that GMRES( $k$ ) is known to converge for all $k$ . A number $\mu$ is an eigenvalue of $M_{e}$ if and only if it is of the form $\mu=\lambda(2-\lambda)$ where $\lambda$ is an eigenvalue of $M$ . We write $\alpha_{e}>0$ for the smallest real part of an eigenvalue of $M_{e}$ .
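The relation $\mu=\lambda(2-\lambda)$ can be made plausible in a few lines. What follows is only a sketch of one direction, for eigenvectors of $B$ whose second block component is non-zero. The symbols $\nu$, $u_{1}$ and $u_{2}$ are introduced here for the argument and do not appear in the original text.

```latex
% Let B u = \nu u with u = (u_1, u_2)^T in the block ordering of B.
\begin{align*}
  B_{1}u_{2} &= \nu u_{1}, \qquad B_{2}u_{1} = \nu u_{2}
  && \text{(block rows of } Bu=\nu u\text{)} \\
  B_{2}B_{1}u_{2} &= \nu\,B_{2}u_{1} = \nu^{2}u_{2} \\
  M_{e}u_{2} &= (I-\kappa^{2}B_{2}B_{1})\,u_{2} = (1-\kappa^{2}\nu^{2})\,u_{2} \\
  1-\kappa^{2}\nu^{2} &= (1-\kappa\nu)(1+\kappa\nu) = \lambda(2-\lambda),
  \qquad \lambda = 1-\kappa\nu \in \sigma(M).
\end{align*}
```

So every eigenvalue $\lambda=1-\kappa\nu$ of $M=I-\kappa B$ with $u_{2}\neq 0$ yields the eigenvalue $\mu=\lambda(2-\lambda)$ of $M_{e}$.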

Both $M$ and $M_{e}$ are $\gamma_{5}$-symmetric, and we denote the respective symmetrized systems by

$$Q=\gamma_{5}\cdot M,\qquad Q_{e}=\gamma_{5}\cdot M_{e}.$$

$Q$ and $Q_{e}$ are both hermitian and half of their eigenvalues are negative and half are positive. Moreover, the eigenvalue plots given in Fig. 1 show that except for a pair close to zero the eigenvalues are quite evenly distributed in two intervals symmetric to the origin, denoted $[-b_{e},-a_{e}],[a_{e},b_{e}]$ for $Q_{e}$ . Finally, if we consider $M_{e}^{\dagger}M_{e}=Q_{e}^{2}$ , then its eigenvalues are just the squares of those of $Q_{e}$ and are therefore distributed in the interval $[a_{e}^{2},b_{e}^{2}]$ .

With the information of Fig. 1 as a background, we can now start to discuss the condition of the different matrices, i.e. the numbers $c_{m}$ from (4). First of all we realize that odd-even preconditioning really makes the spectrum of $M$ and $Q$ ‘nicer’, since eigenvalues are mapped away from 0 and are more clustered. We thus restrict the subsequent discussion to the even-odd preconditioned matrices. For the hermitian matrices $Q_{e}$ and $Q_{e}^{2}$ , rather good bounds for $c_{m}$ can be derived via the Chebyshev polynomials on $[a_{e},b_{e}]$ , and we obtain

$$c_{m}(Q_{e})\leq\left(\frac{1-\frac{a_{e}}{b_{e}}}{1+\frac{a_{e}}{b_{e}}}\right)^{m/2},\qquad c_{m}(Q_{e}^{2})\leq\left(\frac{1-\frac{a_{e}}{b_{e}}}{1+\frac{a_{e}}{b_{e}}}\right)^{m}.$$

Since MINRES is a feasible optimal method for $Q_{e}$ and CG is an optimal method for $Q^{2}_{e}=M_{e}^{\dagger}M_{e}$, we can also take the above numbers as an approximate measure for the performance of these methods. They indicate that CGNR, the CG method applied to the normal equations $M^{\dagger}Mx=M^{\dagger}b$, would require half as many iterations as MINRES for $Qx=\gamma_{5}b$. Each CGNR iteration needs two matrix multiplies, however, so in terms of matrix multiplies with $Q$ (or $M$), which are the computationally dominating part, both methods should be comparable. Fig. 2 gives some experimental data, where we show the convergence history of MINRES and CGNR plotting the residual norm against the number of matrix multiplies. We see that MINRES actually performs somewhat better than CGNR. The data comes from a confined configuration on a $16^{4}$ lattice at $\beta=6.0$ and $\kappa=0.155$, which yields a relative quark mass of approximately 0.02. (In order to observe substantial differences between different methods it is important to work on ‘difficult’ problems, i.e. with small relative quark masses.)

For the non-hermitian matrix $M_{e}$ it is not possible to give an accurate bound on $c_{m}$ , but we know at least that

$$c_{m}(M_{e})\leq\left(1-\frac{\alpha_{e}^{2}}{b_{e}^{2}}\right)^{m/2}=\rho_{m},$$

and MR (= GMRES(1)) already achieves $\|r^{m}\|\leq\rho_{m}\|r^{0}\|$ [5, 16]. The remaining parts of Fig. 2 give the convergence history for GMRES($k$) for several values of $k$ for the same configuration as before, as well as the corresponding results for BiCG, QMR and BiCGStab. In BiCG and QMR we made use of the savings due to $\gamma_{5}$-symmetry. One immediately notices the more erratic behavior of BiCG and BiCGStab. We also see that increasing $k$ in GMRES($k$) gives significant improvement, but there seems little use in taking $k$ larger than 8. Finally, QMR, BiCG and BiCGStab perform best and quite comparably, which we can interpret as an indication that they are all close to optimal for $M_{e}$. This observation is backed by results from [22] showing that even the full GMRES method did not give substantial improvement over BiCG, QMR or BiCGStab.

Acknowledgement

I am grateful for the continuing excellent cooperation with K. Schilling’s group at Wuppertal and Jülich, particularly to Th. Lippert. The numerical results were obtained together with P. Fiebach from the Department of Mathematics in Wuppertal.

Bibliography

[1] V. Faber and T. Manteuffel, SIAM J. Numer. Anal. 21 (1984) 352.
[2] C. Paige and A. Saunders, SIAM J. Numer. Anal. 16 (1975) 617.
[3] R. Freund, Numer. Math. 57 (1990) 285.
[4] R. Freund, G. Golub and N. Nachtigal, Acta Numerica (1991) 57.
[5] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS, Boston, 1996.
[6] R. Fletcher, pp. 73-89 in Lecture Notes in Mathematics 506 (G. Watson, ed.), Springer, Berlin, 1975.
[7] R. Freund and N. Nachtigal, Numer. Math. 60 (1991) 315.
[8] R. Freund, pp. 33-47 in Proceedings of the Cornelius Lanczos International Centenary Conference (J. Brown et al., eds), SIAM, Philadelphia, 1994.
