A Mixed Precision Eigensolver Based on the Jacobi Algorithm
Zhengbo Zhou

TL;DR
This paper introduces a mixed precision spectral decomposition method for real symmetric matrices using an optimized Jacobi algorithm, which significantly accelerates computation while maintaining accuracy.
Contribution
It proposes a novel mixed precision approach combining low and high precision computations with efficient index selection and orthogonalization techniques.
Findings
The cyclic-by-row index selection is more efficient than classical methods.
Polar decomposition with Newton-Schulz iteration outperforms QR in speed and accuracy.
The new method reduces spectral decomposition time by approximately 30%.
Abstract
The classic method for computing the spectral decomposition of a real symmetric matrix, the Jacobi algorithm, can be accelerated by using mixed precision arithmetic. The Jacobi algorithm is aiming to reduce the off-diagonal entries iteratively using Givens rotations. We investigate how to use the low precision to speed up this algorithm based on the approximate spectral decomposition in low precision. We first study two different index choosing techniques, classical and cyclic-by-row, for the Jacobi algorithm. Numerical testing suggests that cyclic-by-row is more efficient. Then we discuss two different methods of orthogonalizing an almost orthogonal matrix: the QR factorization and the polar decomposition. For polar decomposition, we speed up the Newton iteration by using the one-step Schulz iteration. Based on numerical testing, using the polar decomposition approach (Newton--Schulz…
| Type | Unit Roundoff |
|---|---|
| Single precision | 5.96e-8 |
| Double precision | 1.11e-16 |
| Approach 1 | Approach 2 | |
|---|---|---|
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMatrix Theory and Algorithms · Advanced Optimization Algorithms Research · Iterative Methods for Nonlinear Equations
A Mixed Precision Eigensolver Based on the Jacobi Algorithm
Zhengbo Zhou
\school
Mathematics \facultyScience and Engineering \submitdateSubmit date: 10 September 2022
\beforeabstract
The classic method for computing the spectral decomposition of a real symmetric matrix, the Jacobi algorithm, can be accelerated by using mixed precision arithmetic. The Jacobi algorithm is aiming to reduce the off-diagonal entries iteratively using Givens rotations. We investigate how to use the low precision to speed up this algorithm based on the approximate spectral decomposition in low precision.
We first study two different index choosing techniques, classical and cyclic-by-row, for the Jacobi algorithm. Numerical testing suggests that cyclic-by-row is more efficient. Then we discuss two different methods of orthogonalizing an almost orthogonal matrix: the QR factorization and the polar decomposition. For polar decomposition, we speed up the Newton iteration by using the one-step Schulz iteration. Based on numerical testing, using the polar decomposition approach (Newton–Schulz iteration) is not only faster but also more accurate than using the QR factorization.
A mixed precision algorithm for computing the spectral decomposition of a real symmetric matrix at double precision is provided. In doing so we compute the approximate eigenvector matrix of in single precision using eig and single in MATLAB. We then use the Newton–Schulz iteration to orthogonalize the eigenvector matrix into an orthogonal matrix in double precision. Finally, we apply the cyclic-by-row Jacobi algorithm on and obtain the spectral decomposition of . At this stage, we will see, from the testings, the cyclic-by-row Jacobi algorithm only need less than 10 iterations to converge by utilizing the quadratic convergence. The new mixed precision algorithm requires roughly 30% of the time used by the Jacobi algorithm on its own. \afterabstract
\prefacesection
Acknowledgements It is my pleasure to acknowledge the advices from my supervisors Prof. Nicholas J. Higham and Prof. Françoise Tisseur. I wouldn’t start my research in numerical linear algebra without their guidance and this thesis cannot be completed without their unlimited patience and insightful feedback.
I would also like to thank all my friends for providing emotional support. In addition, two special thanks to Zeyu Wu, for his suggestion on coding aspects, and Anheng Xi, for the translation of several German articles from Jacobi and Schönhage.
Finally, I thank my parents, Guangyuan Zhou and Ling Li, for their unconditional love and for giving me the chance to study abroad. By standing upon the shoulders of my parents, I have seen further than them.
\afterpreface
Chapter 1 Introduction
Modern hardware increasingly supports low precision floating-point arithmetic. Using low precision, we have opportunities to accelerate linear algebra computations. A symmetric eigenproblem is solving
[TABLE]
with and the matrix is symmetric. Equivalently, the problem is aiming to compute the spectral decomposition
[TABLE]
and is the diagonal matrix with the eigenvalues of along the diagonal.
Traditional ways of solving such a problem include the Jacobi algorithm, the power method and the QR algorithm [8, , Section 5–7]. This thesis is concerned with the Jacobi algorithm proposed by Jacobi in 1846 which is an iterative method that reduces the off-diagonal entries at each step. In addition, we investigate how the usage of low precision arithmetic can speed up the Jacobi algorithm.
In 2000, Drmač and Veselić [5, ] provided a way to speed up the Jacobi algorithm on the real symmetric matrix . Let be the approximate eigenvector matrix of , then applying the Jacobi algorithm on can speed the reduction up.
In Chapter 2 we provide essential definitions and examples including norms, orthogonal matrix and the singular value decomposition. We introduce two important classes of the orthogonal matrix, the Givens rotation and the Householder transformation.
The Jacobi algorithm is introduced in Chapter 3. We describe the derivation of the Jacobi algorithm and its convergence rate. In Section 3.2 and 3.3 we study how to choose the pair of index required by the Jacobi algorithm.
The classical Jacobi algorithm, mentioned in Section 3.2, chooses the largest off-diagonal entry as the pair of index and the cyclic-by-row Jacobi algorithm, mentioned in Section 3.3, chooses the indices row by row but restricts to the superdiagonal part of the matrix. Although the classical Jacobi algorithm reduces the off-diagonal entries most optimally, the cyclic-by-row Jacobi algorithm does not require expensive off-diagonal searches. Finally, we conduct numerical tests to examine the speed and conclude that we shall stick with the faster cyclic-by-row Jacobi algorithm.
We provide a procedure to get an approximate eigenvector matrix in Chapter 4. Firstly, compute the spectral decomposition at low precision such that
[TABLE]
where is the unit roundoff in low precision. Then we orthogonalize to such that where is the unit roundoff in double precision (“Unit roundoff” - defined in Section 1.2). We can use as the approximate eigenvector matrix of and precondition into , then the Jacobi algorithm applied on should converge in a few sweeps. In Section 4.2 and 4.3 we review two methods for orthogonalizing a matrix: the Householder QR factorization and the polar decomposition. Particularly, in Section 4.3, we study the Newton iteration approach and its variant Newton–Schulz iteration to compute the unitary factor of the polar decomposition. MATLAB codes are presented and after several numerical testings, among the Householder QR factorization approach, the Newton iteration approach and the Newton–Schulz iteration approach, the latter method is the most accurate and efficient one to use.
Finally, we assemble all the ingredients and produce Algorithm 9 in Chapter 5, a mixed precision Jacobi algorithm. In Section 5.2.2, we review some literature on the quadratic convergence of the cyclic-by-row Jacobi algorithm. We observe from the numerical testings that for any symmetric matrix , after preconditioning, the Jacobi algorithm should converge within 10 iterations. The numerical testings suggest the Jacobi algorithm only needs two iterations to converge and there is a 75% reduction of time using preconditioned Jacobi algorithm compare to the Jacobi algorithm on its own.
1.1 Notations
We use the Householder notations:
- •
Use capital letters for matrices.
- •
Use lower case letters for vectors.
- •
Use lower case Greek letters for scalars.
We use and to represent the eigenvalues and the singular values of . We denote or if the matrix has been specified as the largest singular value of .
We say that the problem is well conditioned if small changes in the data always make small changes in the solution ; otherwise, it is ill-conditioned. We use as a measure to visualize the condition of a problem. (Norms will be introduced in Section 2.1)
We say a matrix is almost orthogonal if where is the identity matrix. A matrix is almost diagonal if and this quantity is zero if is diagonal.
1.2 IEEE standard
The unit roundoff measures the relative error due to approximation in floating-point arithmetic. Throughout this thesis, we will only consider two precisions: single precision and double precision [14, , Section 2.1]. We will denote as the unit roundoff at single precision and as the unit roundoff at double precision. IEEE standard 754-2019 [19, , Table 3.5] shown in Table 1.1 will be used. For simplicity, any matrix with subscript will be at single precision and any matrix with subscript will be at double precision.
1.3 Reproducible Research
In this section, we will present our testing environment and system specifications. With these informations, the reader will have the ability to reproduce any results we have in this thesis.
Testing environment and system specifications:
- •
System OS: Windows 10.
- •
MATLAB version: 9.12.0.2009381 (R2022a) Update 4 [17].
- •
CPU: Intel(R) Core(TM) i5-9600KF CPU @ 3.70GHz.
- •
GPU: NVIDIA GeForce RTX 2060.
- •
All the testings that require the random number generator will initiate after the MATLAB command rng(1,'twister')111MATLAB built-in function to control the random number generator, see https://www.mathworks.com/help/matlab/ref/rng.html..
Chapter 2 Matrix Norms, Orthogonal Matrices and Singular
Value Decomposition
In this chapter, we will introduce several concepts that are important for our analysis. In the first section, vector norms and matrix norms will be presented. Then we will discuss an important class of matrices, orthogonal matrices. Finally, a crucial decomposition, singular value decomposition, will be discussed.
2.1 Norms
Matrix norms are essential for matrix algorithms. For example, the matrix norms provide a way of measuring the difference between two matrices and help us to check if we have the desired solution in finite arithmetic. In this section, we will do our demonstrations in the real world but the complex case is similar. Also, we will only introduce the norm for square matrices, but these theories are all applicable to any general matrices with careful attention to matching dimensions.
2.1.1 Vector Norms
Before introducing matrix norm, a brief illustration of vector norm is necessary.
Definition 2.1** (vector norm).**
A vector norm on is a function such that it satisfies the following properties
for all . 2. 2.
if and only if . 3. 3.
for all and . 4. 4.
for all .
Example 2.2**.**
A useful class of vector norm is the -norm
[TABLE]
One particular example of the -norm is the Euclidean norm or simply -norm defined by taking
[TABLE]
The latter equality is obvious.
2.1.2 Matrix Norms
Let us denote as matrices with real entries.
Definition 2.3** (matrix norm).**
A matrix norm on is a function satisfying the following properties
for all . 2. 2.
if and only if . 3. 3.
for all and . 4. 4.
for all .
One simple matrix norm is the Frobenius norm
[TABLE]
We will use the Frobenius norm intensively in the next section on the Jacobi algorithm.
Another important class of matrix norms is called induced norms or subordinate norms. Given a vector norm defined in Section 2.1.1, the corresponding induced norm is defined as
[TABLE]
Example 2.4**.**
The matrix 2-norm is induced by the vector 2-norm. From the definition of the induced norm,
[TABLE]
for some positive number , and is the corresponding chosen . Since for , hence we have . Also, by , we can pre-multiply at both sides and we have which implies is the eigenvalue of . In case we are choosing such that is maximum, therefore we have the definition of the matrix 2-norm
[TABLE]
where and denotes the largest eigenvalue and singular value of .
Definition 2.5** (consistent norms).**
A norm is consistent if it is submultiplicative
[TABLE]
for all such that the product defines.
Example 2.6**.**
The Frobenius norm and all subordinate (induced) norms are consistent. This is useful for constructing upper bounds.
2.2 Orthogonal Matrices
Recall the definition of the orthogonal matrix,
Definition 2.7** (orthogonal matrices).**
A matrix is said to be orthogonal if
[TABLE]
where denotes the identity matrix of size . If , then we call unitary rather than orthogonal and we denote its conjugate transpose as .
Theorem 2.8**.**
The vector 2-norm is invariant under orthogonal transformations.
Proof.
Given an orthogonal matrix and any vector , we have
[TABLE]
which proves the theorem. ∎
Theorem 2.9**.**
The Frobenius norm and matrix 2-norm are the orthogonally invariant norm. Namely, the following norm equality holds for these two norms
[TABLE]
Proof.
By the definition of the Frobenius norm, we have
[TABLE]
By the properties , (2.4) simplifies to
[TABLE]
Hence we finished the proof for the Frobenius norm.
By the definition of the matrix 2-norm, we have
[TABLE]
Notice that if is orthogonal, then and share the same eigenvalues. Therefore , therefore we have
[TABLE]
Hence we proved the theorem. ∎
2.2.1 Givens Rotations
Definition 2.10** (Givens rotation).**
A Givens rotation has the form
[TABLE]
where and for some .
Given a vector and a Givens rotation , the product rotate through radians clockwise in the plane. Hence it is obvious that we can make either or become zero by manipulating . This idea will be utilized in Section 3.1 in order to eliminate th entry of a symmetric matrix.
Proposition 2.11**.**
Givens rotations are orthogonal.
Proof.
Let be a Givens rotation. By using the trigonometric identity , we have the following
[TABLE]
We can calculate the following matrix products and using (2.7).
[TABLE]
Hence is orthogonal. ∎
Notice that the matrix can be written as
[TABLE]
where is the th column of the identity matrix if and and are defined as
[TABLE]
Proposition 2.12**.**
Suppose and is a Givens rotation. If we define as
[TABLE]
then and are the same except in rows and columns and .
Proof.
Using the notations in (2.8) and (2.9), matrix can be written as
[TABLE]
Hence if and for or , in general. ∎
2.2.2 Householder Transformations
Another important class of orthogonal matrix is the Householder transformations. Unlike the Givens rotation which replaces a specific entry into zero, Householder transformation can change multiple entries into zeros. In this section, we will first discuss how this matrix can set entries to zeros and then a MATLAB implementation will be present together with numerical testing.
Definition 2.13**.**
A Householder transformation (synonyms are Householder matrix and Householder reflector) is an matrix of the form
[TABLE]
The vector is the Householder vector.
Householder transformation has useful properties,
Symmetry:
[TABLE] 2. 2.
Orthogonality:
[TABLE]
Transforming a Vector
The Householder transformation can be used to zero components of a vector. Suppose we have a vector and we want
[TABLE]
where . Since is orthogonal, we require , hence where is the first column of the identity matrix. We can rewrite as
[TABLE]
and since is independent of the scaling of , hence for , we can set and we can choose . We can check our choice by computing : Firstly, we can compute and
[TABLE]
Substitute these two components into (2.10)
[TABLE]
The choice of the sign of depends on the sign of the first entry of . Suppose is dominant by its first entry, namely is close to a multiple of , then can have a small norm due to cancellation and this may result a large relative error in . This relative error can be avoided by setting
[TABLE]
as suggested in many textbooks, such as [14, , Section 19.1] and [9, , Section 5.1.3]. Then we get
[TABLE]
Given , instead of generating the Householder matrix such that it satisfies (2.11), we should only generate the Householder vector and the coefficient . The benefit of doing so can be seen from the Householder QR factorization algorithm in section 4.2.1. Based on the previous discussion, we can conclude these into Algorithm 1.
Algorithm 1**.**
Given , this algorithm computes and such that is orthogonal and .
- [TABLE]
Notice that, line 6 rewrites the form of to avoid subtractive cancellation as suggested by [9, , Algorithm 5.1.1]
[TABLE]
Implementation and Testing
During the MATLAB implementation, we do not need to create a new vector . It is sufficient to just overwrite the values of since the only difference between and in Algorithm 1 is the first entry of these two vectors.
1function [x,b] = house(x)
2n = length(x); sigma = x(2:n)'*x(2:n);
3mu = sqrt(x(1)*x(1) + sigma);
4if x(1) >= 0, x(1) = x(1) + mu;
5else, x(1) = -sigma/(x(1) + mu); end
6b = 2/(x(1)*x(1) + sigma);
7end
To test the above function, we first generate a vector using randn(5,1).
1x' =
2 -1.3499e+00, 3.0349e+00, 7.2540e-01, -6.3055e-02, 7.1474e-01
Using the above function and call [v,b] = house(x), we can construct using the formula . Then we examine the product and we have
1(P * x)' =
2 3.4748e+00 0 1.7694e-16 -9.1073e-18 1.1102e-16
The result satisfies our expectations.
The Householder transformation can quickly introduce zeros into a vector using orthogonal matrices and this is useful when we study the QR factorization.
2.3 Singular Value Decomposition
Theorem 2.14** (singular value decomposition).**
If , , then there exists two orthogonal matrices and such that
[TABLE]
where are all non-negative and arranged in non-ascending order. We denote (2.12) as the singular value decomposition (SVD) of and are the singular values of .
From now, without explicit mention, will be a square matrix of full rank (our input is always square, full rank matrix). Therefore, if we have the singular value decomposition , then are unitary and where .
From the definition, the singular values of can be computed via computing the eigenvalues of , where . Hence, the singular values of are the positive roots of the eigenvalues of . Based on this relationship, we have the following properties.
Corollary 2.15**.**
If of full rank and normal, then .
Proof.
If is normal, then . By spectral theorem, if is normal, then where is unitary and is a diagonal matrix with eigenvalues of on the diagonal. Therefore, the singular values of are
[TABLE]
Combine this Corollary and (2.2), we have: if is normal, then . ∎
2.4 Test Matices
Throughout this thesis, we will perform lots of MATLAB testing on our algorithms, hence we want to generate real symmetric matrices with desired condition number and different eigenvalue distributions.
To accomplish such a goal, we use the MATLAB built-in function gallery111A MATLAB built-in function to generate test matrices. For the full manual, you may refer to https://www.mathworks.com/help/matlab/ref/gallery.html.. If we use gallery('randsvd',n, -kappa) where kappa is a positive integer, then it will generate a symmetric positive definite matrix with . Our code my_randsvd from Appendix B.1 is the modified version of gallery('randsvd') and we deliberately change some eigenvalues to negative such that the output matrix will not necessarily be positive definite. It provides two different eigenvalue distributions.
Mode ’geo’, A = my_randsvd(n, kappa, 'geo'): The output matrix with and the magnitude of the eigenvalues of are geometrically distributed. 2. 2.
Mode ’ari’, A = my_randsvd(n, kappa, 'ari'): The magnitude of the eigenvalues of are arithmetically distributed.
In Chapter 5, we will test our algorithm on matrices with different distributions. For the rest of the thesis, we will use mode ’geo’ by default.
Chapter 3 Jacobi Algorithm
The Jacobi algorithm for eigenvalues are first published by Jacobi in 1846 [16] and became widely used in the 1950s after the computer is invented. In this chapter, we will first derive the Jacobi algorithm and discuss its linear convergence. Then the rest of this chapter will focus on two different Jacobi algorithms: Classical and Cyclic-by-row. MATLAB implementation and testing will be given.
3.1 Derivation and Convergence
Given a symmetric matrix , the idea of the Jacobi algorithm is that at the th step, we use to replace , where is orthogonal. We aim to have the property that the off-diagonal entries of are smaller than the off-diagonal entries of . To do this, we use the Givens rotation matrix defined in section 2.2.1.
Definition 3.1** (off operator).**
We define the quantity
[TABLE]
which is the Frobenius norm of the off-diagonal elements.
The aim of the Jacobi algorithm on can be translated to reduce the quantity . This can be done in 2 steps.
Choosing a pair of index . We assume that . 2. 2.
Overwriting by applying on the matrix in the way of
[TABLE]
where is chosen such that .
Using the Jacobi algorithm on , we produce a sequence of matrices \big{\{}A^{(k)}\big{\}}_{k=0}^{\infty}, where
[TABLE]
This set of iterations satisfy
[TABLE]
In the rest of this section, we will discuss how we can achieve (3.1) using step 2 and how we can construct such a matrix given the choice . Then in the next two sections, we present two different methods of choosing the index
- •
The Classical Jacobi Algorithm.
- •
The Cyclic-by-row Jacobi Algorithm.
Considering the subproblem which only involves four entries of .
[TABLE]
Since is symmetric (symmetry is preserved by orthogonal transformations), we have . Hence we can make further simplification to (3.2)
[TABLE]
To achieve step 2, we require , which means
[TABLE]
This equality is essential for us to find the desired Givens rotation and the construction will be discussed in section 3.1.1.
If we take the Frobenius norm on the first line of (3.2) and set , since the Frobenius norm is orthogonally invariant norm (Theorem 2.9), we have
[TABLE]
Using the notations in proposition 2.12 and the condition , we have
[TABLE]
Therefore, after steps and for some choices of the index pair , we always have
[TABLE]
and this reduction will never terminate until all the off-diagonal entries are zero. However, since each time we introduced a zero, the previous entry we chose will not necessarily remain zero, hence the will require an infinite number of iterations to converge to zero. In Section 3.2, we proved that even if we choose the largest at each iteration, we still require infinite iteration for converges to zero.
3.1.1 Choices of Givens rotation matrix
In this section, we provide a theory to choose and for constructing the Givens rotation matrix we need at each iteration. Firstly, if , there is no need to do further work since and we can simply choose and which lead to an identity matrix. Otherwise, we look at (3.4)
[TABLE]
Notice that and , we have
[TABLE]
By define , we have transformed (3.6) into
[TABLE]
This quadratic equation can be solved and we would like to choose the root of smaller magnitude.
[TABLE]
Using the fact that and , we can define and by
[TABLE]
Here we choose the smaller roots (in magnitude) of , so that we ensure (shown in Figure 3.1), hence the angle of rotation is bounded and this choice turns out to be important for the quadratic convergence as discussed by Schönhage [20, ].
3.1.2 Implementation and Testing
Section 3.1.1 can be summarized in the following algorithm.
Algorithm 2**.**
Given a symmetric matrix and integers such that . This algorithm computes a pair to construct as presented in (2.6), such that if , then .
- [TABLE]
When we do the MATLAB implementation, we need to make a minor change to lines 6 and 8. In practice, we need to change these to
[TABLE]
where we choose the positive sign if and the minus sign if . It is easy to prove these two sets of expressions are the same in exact arithmetic, however, they differ in finite precision (See Appendix A.1) and (3.9) is usually preferable. Then it is straightforward to implement the choice using MATLAB.
1function [c,s] = jacobi_pair(A,p,q)
2if A(p,q) == 0
3 c = 1; s = 0;
4else
5 tau = (A(q,q)-A(p,p))/(2*A(p,q));
6 if tau >= 0
7 t = 1/(tau + sqrt(1+tau*tau));
8 else
9 t = 1/(tau - sqrt(1+tau*tau));
10 end
11 c = 1/sqrt(1+tt); s = tc;
12end
13end
We can test this function by constructing the matrix with given and examine the entry using the following routine,
1clc; clear; close all;
2n = 100; format short e;
3A = my_testmatrices(n);
4pq = [57,64]; % choose a pair of index
5[c,s] = jacobi_pair(A,pq(1),pq(2));
6% construct the desire Givens rotation matrix
7G = eye(n); G([pq(1),pq(2)],[pq(1),pq(2)]) = [c,s;-s,c];
8B = G' * A * G;
9disp('Before'); disp(A(pq(1),pq(2))); disp(A(pq(2),pq(1)));
10disp('After');disp(B(pq(1),pq(2))); disp(B(pq(2),pq(1)));
This routine should give where and .
1Before
2 -9.4140e-01
3 -9.4140e-01
4After
5 -2.2204e-16
6 -1.1102e-16
Before applying the Givens rotations, the entries are and after such transformation we have and small enough for computers to consider them as zeros. Hence we successfully provide a way of constructing Givens rotations that achieve step 2 as discussed earlier in this section and the remaining part of this chapter is to sophistically choose the index pair .
3.2 Classical Jacobi Algorithm
From (3.6), we proved that , hence it is natural to consider the choice such that is maximal. This choice was first proposed by Jacobi [16] in 1846 and the future literature refer to it as the classical Jacobi algorithm. We can summarize this choice into Algorithm 3.
Algorithm 3**.**
Given a symmetric matrix and a positive tolerance , this algorithm overwrites with where is orthogonal.
- [TABLE]
The structure of Algorithm 3 adapts from [3, , Algorithm 2.1] and the stopping criterion is adapted from [9, , Section 8.5.5], [5, , Theorem 1.1] and [4, , Section 1].
3.2.1 Linear Convergence
Notice that since is chosen to be the largest off-diagonal element, hence we have the inequality
[TABLE]
Using (3.6), we have
[TABLE]
Denote as the th Jacobi update of , then we have the iterative bound
[TABLE]
This implies that the classical Jacobi algorithm converges linearly to a diagonal matrix whose entries are the eigenvalues of , and they may appear in any order.
Remark 3.2**.**
Although the term decrease at a rate of , if we consider the steps as one ‘group’ or ‘sweep’ of update, then we actually can expect asymptotic quadratic convergence [23, ] where
[TABLE]
where is the number of sweeps.
3.2.2 Implementation and Testing
Before we implement the Jacobi algorithm, we need two functions (i) the function that calculates the Frobenius norm of the off-diagonal entries and (ii) the function that finds the index of the maximum entry in modulus.
(i) can be done by carefully selecting the diagonal entries and set to zero, and then calculating the Frobenius norm of the new matrix.
1function offA = off(A)
2n = length(A); % get the dimension of the matrix A
3A(1:n+1:n*n) = 0; % set the diagonal entries to zero
4offA = norm(A,"fro"); % calculate the norm
5end
The third line of the code will eliminate the diagonal entries, then the fourth line will give us the desired output, .
(ii) can be implemented using the following function:
1function [p,q] = maxoff(A)
2n = length(A); % dimension of matrix A
3A(1:n+1:n*n) = 0; A = abs(A); % clear the diagonal entries
4[val, idx1] = max(A);
5[~, q] = max(val);
6p = idx1(q);
7end
At the fourth line, I use [val, idx1] = max(A) to find the index, idx1, of the maximum entries of each column and I store the values in val. Then we can apply max() again to find the index of the maximum entry of val, denoted as q. Finally, we locate the column number of the maximum entry by calling idx1(q) as shown in the sixth line.
Making use of these two functions, we can implement the classical Jacobi algorithm in MATLAB. The function jacobi_classical will take two inputs (i) the symmetric matrix and (ii) the tolerance . I choose to output two matrices, and such that and the number of iterations required.
Notice that if we would like to update by , since , the cost would be flops. However, we can utilize proposition 2.12, namely, we only need to update the and th rows and columns of , and this version of the update will only take flops. We can see this reduction in the following code
1[c,s] = jacobi_pair(A,p,q); G = [c s; -s c];
2A([p q],:) = G'*A([p q],:);
3A(:,[p q]) = A(:,[p q])*G;
From the code above, we get both the updated matrix and the orthogonal matrix . Line 2 requires a matrix multiplication between which needs flops, and the same argument applies to line 3. Hence when updating the matrix , we only require flops. Assemble all above, we have
1function [V,A,counter] = jacobi_classical(A,tol)
2counter = 0; n = length(A); V = eye(n); done_rot = true;
3tol1 = tol * norm(A);
4while done_rot
5 if isint(counter/(n*n)), A = (A + A')/2; end % maintain symmetry
6 done_rot = false; [p,q] = maxoff(A);
7 if abs(A(p,q)) > tol1 * sqrt(abs(A(p,p) * A(q,q)))
8 counter = counter + 1; done_rot = true;
9 [c,s] = jacobi_pair(A,p,q);
10 J = [c,s;-s,c];
11 A([p,q],:) = J'*A([p,q],:);
12 A(:,[p,q]) = A(:,[p,q]) * J;
13 V(:,[p,q]) = V(:,[p,q]) * J;
14 else
15 A = diag(diag(A)); % output diagonal matrix
16 break;
17 end
18end
19end
After several iterations, although in exact arithmetic, the matrix should always be symmetric, in floating-point arithmetic, we need to maintain its symmetry as shown in line 5. We can then test our code by the following routine
1clc;clear; format short e
2n = 1e2; A = randn(n); A = A + A'; tol = 2^(-53);
3[V,D,iter] = jacobi_classical(A,tol);
4disp('norm(AV - VD)'); disp(norm(A *V - V * D))
Here we should expect the norm is about which is in our test.
1norm(AV - VD)
2 4.3553e-13
Hence the algorithm works as expected.
3.3 Cyclic-by-row Jacobi Algorithm
The classical Jacobi method requires at each step the searching of for one maximum entry in modulus. For is large, this can be extremely expensive. It will be better if we fixed a sequence of choice , here in cyclic order shown in the following scheme suggested by [10, ] and [7, , Section 1.2]
[TABLE]
We keep using in this fashion until we meet the required tolerance and this procedure can be described in Algorithm 4.
Algorithm 4**.**
Given a symmetric matrix and a positive tolerance , this algorithm overwrites with where is orthogonal.
- [TABLE]
Similarly, the structure of Algorithm 4 adapts from [3, , Algorithm 2.1] and the stopping criterion is adapted from [9, , Section 8.5.5], [5, , Theorem 1.1] and [4, , Section 1].
3.3.1 Implementation and Testing
The MATLAB code is similar to the classical Jacobi algorithm, and the only difference is the way of choosing :
1function [V,A,iter] = jacobi_cyclic(A,tol,maxiter)
2n = length(A); V = eye(n); iter = 0; done_rot = true;
3while done_rot && iter < maxiter
4 done_rot = false;
5 for p = 1:n-1
6 for q = p+1:n
7 if abs(A(p,q)) > tol * sqrt(abs(A(p,p)*A(q,q)))
8 done_rot = true;
9 [c,s] = jacobi_pair(A,p,q);
10 J = [c,s;-s,c];
11 A([p,q],:) = J'*A([p,q],:);
12 A(:,[p,q]) = A(:,[p,q]) * J;
13 V(:,[p,q]) = V(:,[p,q]) * J;
14 end
15 end
16 end
17 if done_rot
18 A = (A + A')/2; iter = iter + 1;
19 else
20 A = diag(diag(A));
21 return;
22 end
23end
24end
Notice that, since the cyclic-by-row Jacobi algorithm does not reduce in the most optimal way, it may require much more iterations than the classical Jacobi algorithm. Hence we restrict the maximum number of iterations by maxiter at line 3.
Using the same routine as presented in Section 3.2.2, we have
1norm(AV - VD)
2 5.9258e-13
The required tolerance is and therefore we indeed have a spectral decomposition of at double precision.
3.4 Comparision
In the previous two sections, we discussed both the classical Jacobi algorithm and the cyclic-by-row Jacobi algorithm. However, we need to choose between them. In this section, we will examine the performance concerning both the dimension and the condition number . We can generate Figure 3.2 using code in Appendix B.3. Notice that, in the code, we call a function call jacobi_eig and this is the assembled version of the Algorithm 3 and 4 shown in Appendix B.2.
From the right figure, we can see the condition number does not affect the time too much. However, from the left figure, we observe that the downside of searching for the maximum off-diagonal entry is significant for large . The cyclic-by-row Jacobi algorithm uses about of the time used by the classical Jacobi algorithm for . Therefore, although the classical Jacobi reduces the in the most optimal way, in practice, we should always stick with the cyclic-by-row Jacobi algorithm.
Chapter 4 Orthogonalization
4.1 Introduction
From Chapter 3, given a symmetric matrix , we can use the Jacobi algorithm to find an spectral decomposition . However, for large , this procedure can still be time-consuming and is not generally as fast as the symmetric QR Algorithm [9, , Section 8.5]. However, the Jacobi algorithm can exploit the situation that is almost diagonal ( is small). In 2000, Drmač and Veselić proposed a way of speeding up this procedure using approximate eigenvectors as preconditioner [5, , Section 1]. They proposed the following
Given a symmetric matrix and its approximate orthogonal eigenvector matrix . 2. 2.
Computing . 3. 3.
Diagonalizing .
Suppose is orthogonal in double precision and step 2 and 3 are carried out at double precision, then the eigenvalues we computed will have a small relative error and more efficient than the pure Jacobi method [5, , Section 2]. In this section, we will provide a way of constructing this approximate vector matrix:
Compute the spectral decomposition at single precision such that
[TABLE]
where is the dimension of the real symmetric matrix and is the unite roundoff at single precision. This can be achieved using eig function in MATLAB via the following routine
1[Q_low,D] = eig(single(A));
2Q_low = double(Q_low); 2. 2.
Orthogonalize the matrix to such that it is orthogonal at double precision,
[TABLE]
In this chapter, we will first review two methods to orthogonalize : the QR factorization and the polar decomposition. We aim to find the orthogonal (at double precision) matrix that minimizes the norm where is the exact matrix of the eigenvectors of . However, in practice, we have no access to the matrix , therefore, we instead find the orthogonal matrix that minimizes . This problem can be transformed to: Given , find the orthogonal matrix such that it satisfies
[TABLE]
Finally, the code in MATLAB will be presented and we will access them to decide which to use in practice.
4.2 The Householder QR Factorization
The QR factorization of is a factorization
[TABLE]
where is orthogonal and is upper triangular. Suppose we have a QR factorization at double precision of ,
[TABLE]
where is orthogonal at double precision, and this is already our desired preconditioner. The QR factorization can be achieved using Householder transformations in Section 2.2.2.
4.2.1 Theory of Householder QR Factorization
The idea can be illustrated using a small matrix :
[TABLE]
where , , are the Householder transformations. Notice that only applies to the bottom right corner of the matrix, and this can be done by embedding inside a larger matrix via
[TABLE]
Hence from the example above, we can generalize the Householder QR factorization for . It will produce a sequence of matrices where is the input and is the upper triangular matrix . Notice, there are stages where the th stage transform to .
At the th stage of the Householder QR factorization, we can carefully partition the matrix into the following form
[TABLE]
where is upper triangular (achieved at the ()th stage) and is the vector that we should focused on the th stage. Choose a Householder transformation such that , where is the first column of and is the first entry of the vector . We then embed matrix into a larger matrix via
[TABLE]
Then let , we have
[TABLE]
which is closer to the upper triangular form. After steps, the matrix will be upper triangular and we denote as and we obtain the QR factorization of
[TABLE]
Since are composed of , Householder matrices, and the identity matrix. Hence it is obvious that are also symmetric and orthogonal. Then we can construct via . This procedure can be summarized in the Algorithm 5.
Algorithm 5**.**
Given , this algorithm computes an orthogonal matrix and an upper triangular matrix such that .
- [TABLE]
Line is directly adapted from (4.3). Line can be viewed by partitioning the orthogonal matrix into four parts and multiplying the matrix structured as (4.2),
[TABLE]
Hence, at each step of the algorithm, we only update the final columns of the matrix . Also, focused on the line 4, if we construct and do a matrix-matrix multiplication between where , then the cost will be about . If we utilize the components we computed from Algorithm 1, the Householder vector and the coefficient , we can reduce the computational cost by doing the matrix-vector products instead. We can rewrite using and
[TABLE]
By this procedure, we transform a matrix-matrix multiplication to
Vector-matrix multiplication: requires flops. 2. 2.
Scalar-vector inner product: requires flops. 3. 3.
Vector-vector outer product: requires flops. 4. 4.
Matrix-matrix subtraction: requires flops.
Therefore, the overall cost will be flops. Compare with the matrix-matrix multiplication which requires flops, this approach is much more efficient.
4.2.2 Implementation and Testing
Based on these analyses in Section 4.2.1, we can implement it into MATLAB.
1function [Q,A] = myqr(A)
2[m,n] = size(A); Q = eye(n);
3if m ~= n, error('Input should be a square matrix.'), end
4for k = 1:n-1
5[v,b] = house(A(k:n,k)); % compute the components of P
6A(k:n,k:n) = A(k:n,k:n) - (bv)(v'*A(k:n,k:n)); % update A
7Q(:,k:n) = Q(:,k:n) - (Q(:,k:n)v)(b*v'); % update Q
8end
Recall from the last section, we discussed that at the th step, updating requires about flops. Similarly, we can update based on (4.4) which only involves columns of . Therefore, line 7 requires flops at the th step. We can then compute the theoretical overall cost,
[TABLE]
To test our code, we are focused on two quantities, and . The former evaluates whether our computed is orthogonal at double precision and the latter evaluates whether we have a QR factorization. In theory, we should have
[TABLE]
We can use the following MATLAB routine, notice that we add the accuracy of the MATLAB built-in function qr as a reference.
1clc; clear; format short e;
2n = 1e2; ud = 2^(-53); A = randn(n);
3[Q,R] = myqr(A); [Q1,R1] = qr(A); % qr factorization
4orthg = norm(Q'*Q - eye(n)); % check orthogonality
5qralg = norm(Q*R - A)/norm(A); % check if we have a qr factorization
6fprintf('Orthogonal? %d \nMy QR accuracy is %d\n',orthg, qralg);
7fprintf('MATLAB qr function accuracy is %d \n', norm(Q1*R1 - A)/norm(A));
8fprintf('n*(machine precision) = %d\n', ud*n);
And we have the output:
1Orthogonal? 3.567685e-15
2My QR accuracy is 1.493857e-15
3MATLAB qr function accuracy is 1.026458e-15
4n*(machine precision) = 1.110223e-14
The first and second outputs are all smaller than therefore our code works well. In addition, the second and the third outputs are the accuracy of functions myqr and qr and we can see these are about the same, therefore our code achieves similar accuracy as the MATLAB built-in function.
Moreover, I am interested in how behaved as increases. From Figure 4.1, we can see does not increase too much when increases and is well bounded by the reference line .
By now, we have a way to orthogonalize a matrix. However, this approach does not utilize the property that our input matrix is an almost orthogonal matrix. Namely, applying myqr to has no difference from applying myqr to a general matrix. In the next section, we will introduce a way to orthogonalize a matrix which exploits the fact that the input is almost orthogonal.
4.3 The Polar Decomposition
In complex analysis, it is known that for any , we can write in polar form, namely . The polar decomposition is its matrix analogue. The polar decomposition can be derived from the singular value decomposition discussed in Section 2.3. Conventionally, we will present our analysis in complex but this can be restricted to real. All the norm discussed in this section will be the orthogonally invariant norm.
Theorem 4.1** (Polar decomposition).**
If , then there exists a unitary matrix and a unique Hermitian positive semidefinite matrix such that . We call the unitary factor.
Proof.
By Theorem 2.14, suppose and , then the SVD of can be written as
[TABLE]
where . Unitarity of can be easily proved by its definition
[TABLE]
is a positive semidefinite due to its eigenvalues are the singular values of which are all non-negative. By forming , we have , therefore can be uniquely determined by [15, , Theorem 1.29]. ∎
Notice that, if is full rank, then is clearly non-singular by construction, therefore can be uniquely determined by . Similar to the QR factorization, we have a decomposition of where one of the factors is unitary, and we expect that we can use this unitary factor as the preconditioner. The next section gives a special property which shows the unitary factor of the polar decomposition of is the one that minimizes the distance .
4.3.1 Best Approximation Property
The unitary factor of the polar decomposition of satisfies the best approximation property.
Theorem 4.2** ([6, , Theorem 1],[15, , Theorem 8.4]).**
Let and is its polar decomposition. Then
[TABLE]
holds for every unitarily invariant norm.
To prove Theorem 4.2, we need the following two results:
Lemma 4.3** ([18, , Lemma 4]).**
Let be Hermitian matrices, and denote by
[TABLE]
be the eigenvalues of and respectively. Then if we denote by the numbers arranged in non-ascending order of magnitude, then we have
[TABLE]
Proof.
See [6, , Theorem 2]. ∎
Theorem 4.4** ([18, , Theorem 5], [15, , Theorem B.3]).**
Let and be the singular values of the complex matrices and respectively. Then
[TABLE]
where and are diagonal matrices with entries the singular values of and arranged in non-ascending order respectively.
Proof.
We have the fact by H. Wielandt mentioned in [6, , p. 113]: For any matrix of order , the Hermitian matrix
[TABLE]
of order . The eigenvalues of are precisely the singular values of and their negatives. We denote the singular values of and we have
[TABLE]
and the eigenvalues of these three Hermitian matrices are
[TABLE]
By Lemma 4.3, if we denote the numbers arranged in non-ascending order of magnitude, then we have
[TABLE]
Here we focus on the proof for -norm, to see the proof considering all the orthogonally invariant norms, refer to [18, , Theorem 1]. By Corollary 2.15
[TABLE]
Proof of Theorem 4.2.
For any unitary matrix , it has singular values all equal to . Hence by Theorem 4.4, if has the singular value decomposition , then
[TABLE]
Remain to prove that . From Theorem 4.1, the Hermitian part can be written as , hence
[TABLE]
which completes the proof. ∎
This property indicates that the unitary factor of the input will be the desired orthogonal matrix and then we can use it as the preconditioner. The rest of this section will focus on how to compute this factor.
4.3.2 SVD approach
The most obvious approach will be the singular value decomposition approach.
For , by Theorem 4.1, we can use the singular value decomposition to compute the polar decomposition using the following procedure:
Compute the singular value decomposition of where . 2. 2.
Form via (4.5), .
The computational cost of computing the singular value decomposition is approximately flops [9, , Figure 8.6.1] with flops to form . This is good generally, but it does not utilize the property that, in our scenario, is almost orthogonal ( is small). The next two sections will discuss a more sophisticated way of computing the unitary factor.
4.3.3 Newton’s Method Approach
Consider the Newton iteration [15, , Section 8.3]
[TABLE]
where denotes the conjugate transpose of the inverse of . The following theorem shows the iteration (4.6) converges to the unitary factor of .
Theorem 4.5** (Newton iteration for the unitary polar factor).**
Suppose is nonsingular, then the iteration (4.6) produces a sequence such that
[TABLE]
where is the unitary factor of the polar decomposition of .
Proof.
See [13, , Section 3.2].
Let the singular value decomposition of be , where and are unitary. Then by Theorem 4.1, we have where and . The trick of the proof is to define a new sequence where
[TABLE]
From iteration (4.6), we have
[TABLE]
Notice that and , hence the iteration becomes
[TABLE]
Since is diagonal with positive diagonal entries (since is full rank), hence the iteration (4.8) is well-defined and will produce a sequence of diagonal matrices and we can write , where . Using this notation, we can rewrite the iteration (4.8) elementwise:
[TABLE]
where . By an argument discussed in [12, , p. 84], we can write
[TABLE]
This implies that the diagonal entries of converge to quadratically as . Since the argument holds for any , we conclude that and use the definition of we have
[TABLE]
as , where is the unitary factor of the polar decomposition of . ∎
The Newton iteration for the unitary factor of a square matrix can also be derived by applying the Newton’s method to the equation . Consider a function , then is unitary if is a zero of the function .
Suppose is an approximate solution to the equation , we can write and substitute into the equation
[TABLE]
Dropping the second order term, we get Newton iteration and this is the Sylvester equation for [15, , Problem 8.18]. We aim to solve and update by , which gives the Newton iteration at the th step:
[TABLE]
Assume that is Hermitian, namely , then we can further simplify the iteration by rewritten (4.10) as and the expression we got for is indeed Hermitian, hence our assumption is valid. Therefore, we have a explicit expression for , , and the iteration (4.10) becomes
[TABLE]
choosing , we have our desired Newton iteration (4.6).
Convergence of Newton Iteration
The convergence of the iteration (4.6) can be seen from the proof of Theorem 4.5.
Theorem 4.6** (Convergence of the iteration (4.6), [15, , Theorem 8.12]).**
Let be non-singular. Then the Newton iterates in (4.6) converges quadratically to the unitary polar factor of with
[TABLE]
where is any unitarily invariant norm.
Proof.
Given , the singular value decomposition of is and the unitary factor is . From (4.9), we have
[TABLE]
This is true for all , therefore we can rewrite this in matrix form
[TABLE]
Recall , then (4.11) is same as . Therefore we can conclude that converges to the identity quadratically and therefore converges to the unitary factor quadratically.
From , by taking the norm on both sides we have
[TABLE]
The final inequality comes from the fact that every unitarily invariant norm is subordinate [2, , Proposition IV 2.4]. Also, the unitary invariance implies the following three equalities,
- •
.
- •
.
- •
Using the same argument as 1, we have .
As a result, (4.12) can be rewritten as
[TABLE]
∎
We can test the iteration (4.6) by the following code
1A = my_testmatrices(10,100); X = A;
2for i = 1:10
3Y = inv(X);
4X = (Y' + X)/2;
5end
6[P,~,V] = svd(A); U = P*V';
7disp(norm(X-U));
With only iterations, the distance between our computed polar factor and the theoretical polar factor (formed by singular value decomposition) is as small as the unit roundoff. However, the disadvantage of this iteration is that it requires the explicit form of the inverse of a matrix at each step, therefore we can introduce another method using one step of the Schulz iteration [1, ] which will remove the requirement of the matrix inverse.
4.3.4 Newton–Schulz Iteration Approach
Given , an iterative method for computing the inverse , proposed by Schulz [21, , p. 58], is defined by
[TABLE]
where is an approximation to . Based on the (4.6), we can use a one-step Schulz iteration to approximate the inverse of which can be written as , where is an approximation to . In our situation, the input is an almost unitary matrix ( is small) and the output is which is unitary. Hence the sequence generated by the Newton iteration, , should always be almost unitary, therefore should be a good approximate to . The idea of Newton–Schulz iteration can be formulated by the Algorithm 6.
Algorithm 6**.**
Given such that , the algorithm computes the unitary factor of defined by Theorem 4.1.
- [TABLE]
Notice that, the one-step Schulz iteration can be embedded into the Newton iteration by substituting the from line into line and we have
[TABLE]
and this is known as the Newton–Schulz iteration [15, , Section 8.3].
Convergence of Newton–Schulz Iteration
We would like to know, the iteration (4.14) converges to under which condition. Using the similar approach as proof of Theorem 4.5, by writing where and are the unitary parts of the singular value decomposition of , we can consider the iteration (4.14) as the scalar form:
[TABLE]
where is the th singular value of . To see the convergence of (4.14), we first find the interval of convergence of (4.15).
Lemma 4.7**.**
If , then the iteration (4.15) converges quadratically to .
Proof.
If , then the iteration converges immediately. To simplify the notation, we rewrite the iteration (4.15) as
[TABLE]
Case 1: If , then which gives . Differentiate once we have and this will always be positive for . Therefore, if we start with , then , namely, start from , then the iteration (4.16) will converges to .
Case 2: If , then by similar approach we have the following
[TABLE]
Therefore, the function will monotonically decreasing from to [math] within the interval and . Hence, if , then will belongs to , then the iteration (4.16) converges to as discussed in case . The properties of the function in both case and case can be seen from the Figure 4.2 which shows that for , the function monotonically increasing from [math] to and for , the function monotonically decreasing from to [math].
Other parts of the real line may still converges to but the interval of converges are too short to consider and control (for example, if we start with , then and which will converges to ).
Remain to show that the convergence is of order . By consider the difference (similar approach as (4.9)), we have
[TABLE]
which implies quadratic convergence. ∎
From the quadratic convergence of the scalar Newton–Schulz iteration, we have the following theorem.
Theorem 4.8** (Convergence of Newton–Schulz iteration (4.14)).**
For of rank which has the polar decomposition , if or equivalently , then the Newton–Schulz iteration
[TABLE]
converges to the unitary factor quadratically as and
[TABLE]
Proof.
The result is obvious follows the proof of Theorem 4.5 and 4.6 and Lemma 4.7. ∎
Implementation and Testing
The Newton–Schulz iteration can be utilized on the almost orthogonal matrix as discussed in Section 4.1 since the eigenvalues of are all near which definitely satisfies the convergence condition for the Newton–Schulz iteration. For , iteration (4.14) can be formalized by Algorithm 7 with restrictions on the singular values of based on Theorem 4.8.
Algorithm 7**.**
Given a matrix with , this algorithm computes the unitary factor of the polar decomposition of using the Newton–Schulz iteration (4.14).
- [TABLE]
Notice that, in line 3, we use the Frobenius norm to examine the orthogonality instead of the 2-norm, this is due to the Frobenius norm being easier and cheaper to compute. We can further simplify the algorithm based on the following analysis.
From now on, the explanation will focus on the 2-norm, but it can be generalized to all unitarily invariant norms. Suppose has a singular value decomposition , then we can rewrite as
[TABLE]
Since where are sorted in non-ascending order. Using Corollary 2.15 and the fact that the singular values of the almost unitary matrix are around one, we have , therefore
[TABLE]
From relation (4.17) and the fact that , using the notations in (4.14), we have
[TABLE]
Similarly, after the second iteration, we have
[TABLE]
We can compare this quantity with the desired tolerance and we can produce Figure 4.3 using Appendix B.5.
From Figure 4.3, we observe that for less than , Algorithm 7 will converge in steps. If the dimension exceeds , by the same analysis, for any , if , then the Newton–Schulz iteration will converge in three steps. In practice, MATLAB is not able to generate such a big matrix. Therefore, we can remove the terminate condition in Algorithm 7 as shown in Algorithm 8,
Algorithm 8**.**
Given a matrix such that and , this algorithm computes the unitary factor of the polar decomposition of using the Newton–Schulz iteration (4.14).
- [TABLE]
Implementation of Algorithm 8 in MATLAB gives
1function A = newton_schulz(A)
2[~,n] = size(A);
3if n >= 52000, iter = 3;
4else, iter = 2; end
5for k = 1:iter
6A = 0.5A(3*eye(n) - A'*A);
7end
For each iteration, there are only 2 matrix-matrix multiplications which cost flops and hence the total cost when applying newton_schulz to will be about flops. Compare to the singular value decomposition approach discussed in Section 4.3.2 which costs about , the Newton–Schulz iteration is usually the preferred method when the input is almost unitary.
We will perform the following test: Given a matrix with fixed condition number, here , we compute its spectral decomposition at single precision . Then we use this as our input of newton_schulz and see how the quantity , where is the output of the function, changes as the dimension changes. Using the code from Appendix B.6, we produce Figure 4.4.
From Figure 4.4, the quantity is well bounded by the tolerance and the result is consistent with our expectations and analyses.
4.4 Numerical Comparison
In this section, we will examine the performance and accuracy of both the QR factorization and the polar decomposition using the Newton–Schulz iteration. Theoretically, for our almost orthogonal matrix , the QR factorization requires about flops and the Newton–Schulz iteration method requires about flops. However, MATLAB computes matrix-matrix products faster, hence Newton–Schulz iteration has the potential to be faster than the QR factorization. We set up our numerical comparison via the following
Generate the symmetric matrix and its approximate spectral decomposition . Here, we fixed . Even though no matter how we change the condition number of , the eigenvector matrix at single precision, , will always have condition number near , recall this is also the reason why we can always apply the Newton–Schulz iteration to the matrix . 2. 2.
For different , applying myqr and newton_schulz to and output which has been numerically proved that it is orthogonal at double precision, then we examine these functions via
- (a)
Accuracy: . 2. (b)
Performance : time used measured by tic,toc.
4.4.1 Accuracy
The accuracy is measured by where is the output of myqr and newton_schulz. From previous testings, both QR factorization and Newton–Schulz iteration are accurate enough to generate an orthogonal matrix at double precision. Therefore, here we are comparing the accuracy between these two methods shown in Figure 4.5. In addition, we add the accuracy of the MATLAB built-in function qr() served as a reference line.
From the figure, we can see that the Newton–Schulz iteration is generally more accurate than the QR factorization method.
4.4.2 Performance
The performance can be measured by the MATLAB built-in function tic and toc. Using similar code from Section 4.4.1, we are able to compare the time used by myqr and newton_schulz and produce Figure 4.6.
From the figure, the Newton–Schulz iteration is generally faster than the QR factorization approach.
Consequently, based on the accuracy and performance, we shall stick with the Newton–Schulz iteration approach as the method to orthogonalize the matrix .
Chapter 5 Mixed Precision Jacobi Algorithm
In this chapter, a mixed precision algorithm will be presented with intensive testings on its convergence, speed and accuracy. As mentioned in the Section 2.4, we will perform the testings for both mode 'geo' and mode 'ari'.
5.1 Algorithm
As discussed from [5, ], we can improve the Jacobi algorithm by using a preconditioner. From the discussions in Chapter 3 and 4, we assemble them all and collapse into the following algorithm,
Algorithm 9**.**
Given a symmetric matrix and a positive tolerance , this algorithm computes the spectral decomposition of using the cyclic-by-row Jacobi algorithm with preconditioning.
- [TABLE]
5.1.1 Implementation and Testing
It is straightforward to implement Algorithm 9 by utilizing the existing codes. MATLAB code can be found in Appendix B.9.
We would like to first access its accuracy concerning both the dimension and the condition number. Using code in Appendix B.10 we can produce Figure 5.1.
As shown from the figure, the Algorithm 9 gives more accurate results for the matrices with geometrically distributed singular values. But regardless of the distributions, the computed and satisfies the condition
[TABLE]
and therefore this algorithm can be used to find the spectral decomposition of a real symmetric matrix in practice. The remaining of this chapter is to find out two things
How many sweeps of Jacobi will be necessary for convergence for a preconditioned real symmetric matrix as described in Algorithm 9? 2. 2.
How much time can be saved by using preconditioning?
5.2 Convergence and Speed
In this section, we will analyze how many sweeps will be sufficient for the cyclic-by-row Jacobi algorithm converges for a preconditioned symmetric matrix .
5.2.1 Preconditioning
Using the notations from Algorithm 9, is the nearest orthogonal matrix compare to the approximate eigenvector matrix , hence is expected to be almost diagonal. Consequently, we expect this eigenvalue problem should be able to be solved more quickly by the Jacobi algorithm [5, , Sec. 1, Corollary 3.2]. Quantitatively, after preconditioning, the magnitude of the off-diagonal entries of should be no more than , which is equivalently saying . In practice, this bound is sharper than the theoretical bound and we can have
[TABLE]
This upper bound can be numerically tested by the following procedure: First, generate a symmetric matrix with controlled dimension and condition number, then we perform steps 1,2 and 3 in the Algorithm 9 to see how much is reduced based on this precondition technique. Using the code in Appendix B.11, we generate Figure 5.2.
From the top-left, top-right and bottom-left parts of the Figure 5.2, we see that for and , the quantity is always smaller or similar to regardless of its singular values distribution. Notice that, as changes from to , the perturbation of the quantity becomes more and more negligible compare to . Together with the bottom-right figure, as the size of passes , the quantity is always well bounded by . Based on these observations, we verify the idea of preconditioning and prove numerically that after preconditioning by , the off-diagonal entries are reduced significantly and this property can be captured by the Jacobi algorithm.
5.2.2 Quadratic Convergence
Based on the preconditioning process, in this section, we will discuss the convergence of the cyclic Jacobi algorithm applied to a preconditioned real symmetric matrix. Then we will deliver some numerical testings concerning different dimensions and condition numbers. Finally, we will test the improvements in speed by using the preconditioners.
Recall that the number of iterations in one sweep is equal to half of the number of off-diagonal entries,
[TABLE]
From the previous derivation, the Jacobi algorithm is proved to converge linearly. However, an improvement by Schönhage in 1961 [20, ] stated that the Jacobi algorithm for symmetric matrices with distinct eigenvalues is converging quadratically. Let us denote be the matrix produced by the th iteration of the Jacobi algorithm and therefore is the original input matrix. Denote be the eigenvalues of and suppose we have
[TABLE]
Suppose we reach the stage that the and the angles of rotation generate by each iteration are smaller than , as controlled in Section 3.1.1, Schönhage’s result said
[TABLE]
which implies quadratic convergence. Later in 1962, Wilkinson provided a sharper bound [23, , Section 3]. Under the same condition as described by Schönhage, Wilkinson proposed
[TABLE]
In 1966, van Kempen proved the quadratic convergence without the assumption of distinct eigenvalues. In his paper [22, , Section 2], instead of define as in (5.2), he denoted as
[TABLE]
Suppose after iterations, , then
[TABLE]
Note that although the above bound is correct, the proof in [22, ] is not correct. This mistake was unveiled by Hari who proposed a sharper bound [11, , Section 2].
Back to our situation, as described in Section 5.2.1, after preconditioning, . Assuming our eigenproblem is well conditioned and the minimum distance between the distinct eigenvalues is large enough such that
[TABLE]
where is defined by (5.3). Then we can expect quadratic convergence and the number of iterations should not exceed five since it will only take sweeps for the quantity reduces from to ().
5.2.3 Numerical Testing
By referring to the function jacobi_precondi in Appendix B.9, we can output the number of iterations required. Therefore, we can investigate the number of sweeps required using the routine from Appendix B.12. The procedure is exactly the same as the testing in Section 5.1.1 and we can produce Figure 5.3.
Suppose our testing matrices are not ill-conditioned, then for the matrices with arithmetically distributed singular values, it usually requires two iterations for the cyclic-by-row Jacobi algorithm to converge since the eigenvalue are relatively far apart and this can result in quadratic convergence as discussed in Section 5.2.2. However, for the matrices with geometrically distributed singular values, the algorithm can take four iterations to converge. By construction, the geometrically distributed singular values has smaller in (5.3) and this can result in slower convergence since it will require more sweep for to satisfies . We can exaggerate this observation by generating a matrix with very close eigenvalues.
1close all; clear; clc; rng(1,'twister');
2A_geo = my_randsvd(1e3,1e16,'geo');
3A_ari = my_randsvd(1e3,1e16,'ari');
The matrix A_geo has , whereas the matrix A_ari has . Clearly, the Algorithm 9 can hardly attained the quadratic rate of convergence on A_geo.
After applying Algorithm 9 on both A_geo and A_ari, the former testing requires 25 sweeps to converge and the latter one only need three sweeps. Therefore, the Jacobi algorithm performs better when the distance between distinct eigenvalues are large since the algorithm can take advantage of quadratic convergence.
The final testing is to see how much time the mixed precision algorithm can save, we can use the methodology in Section 4.4.2 to produce Figure 5.4. For large , the benefit of using preconditioner is significant for both distributions of the singular values.
Also, besides using the figure to visualize the difference in time, we can calculate how much the algorithm is improved for each via
[TABLE]
where and are the time used by the cyclic-by-row Jacobi algorithm without and with preconditioning.
1>> format short
2>> (t_normal_g - t_precond_g)./t_normal_g % geometrically distributed SVs
3ans =
4 0.7150 0.7574 0.7739 0.7747 0.7789
5>> (t_normal_a - t_precond_a)./t_normal_a % Arithmetically distributed SVs
6ans =
7 0.7294 0.7513 0.7543 0.7619 0.7663
The entries are corresponding to . Based on the data, we can conclude that by using the preconditioning technique stated in Algorithm 9, we saved roughly of time.
Chapter 6 Conclusion and Further Work
We studied the classical Jacobi algorithm and the cyclic-by-row Jacobi algorithm and concluded that the cyclic-by-row Jacobi algorithm is much more efficient than the other one. Then, after performing implementation and testing on the QR factorization and the Newton–Schulz iteration, in order to orthogonalize an almost orthogonal matrix, we shall always use the Newton–Schulz iteration. Finally, we followed the preconditioning technique in [5, ] and successfully propose the mixed precision algorithm 9 for the real symmetric eigenproblem. By utilizing the eig function at single precision and preconditioning, we can save roughly 75% of the time compared to applying the Jacobi algorithm alone.
Further works of this thesis include:
Rounding error analysis hasn’t been conducted. 2. 2.
There exist more ways to orthogonalize an almost orthogonal matrix. For example, the Cholesky QR factorization. 3. 3.
Only one low precision level has been studied. These theories can apply to half precision as well. However, it requires more work on exploring a half precision version of eig function in order to become faster than our Algorithm 9 which uses MATLAB built-in single precision. 4. 4.
There are other ways that can be used to speed up the Jacobi algorithm. For example, the threshold Jacobi algorithm [24, , Section 5.12] and the block Jacobi procedure [9, , Section 8.5.6].
Appendix A Supplementary Explanation
A.1 Algorithm 2
We have two different expressions for computing as shown in Table A.1.
These two expressions are the same by simple calculation,
[TABLE]
However, when we implement these into MATLAB, approach 1 will output for , and approach 2’s output gives even for . Namely, by using approach 1, the computed output is less accurate than the output from approach 2.
Appendix B Supplementary Code
B.1 Testing Matrices
By using the following code, we can generate a random real symmetric matrix with predefined condition number.
1function A = my_randsvd(n, kappa, mode)
2classname = 'double';
3if isempty(mode)
4 mode = 'Geo';
5end
6switch mode % Set up vector sigma of singular values.
7 case {'Geometric','geometric','Geo','geo'}
8 % geometrically distributed singular values
9 factor = kappa^(-1/(n-1));
10 sigma = factor.^(0:n-1);
11 case {'Arithmatic', 'arithmatic', 'Ari', 'ari'}
12 % arithmetically distributed singular values
13 sigma = ones(n,1) - (0:n-1)'/(n-1)*(1-1/kappa);
14 otherwise
15 error(message('MATLAB:randsvd:invalidMode'));
16end
17sigma = cast(sigma,classname); signs = sign(rand(1,n-2));
18sigma(2:n-1) = sigma(2:n-1).* signs;
19Q = qmult(n,1,classname);
20A = Q*diag(sigma)*Q';
21A = (A + A')/2;
B.2 Full Jacobi Algorithm Code
1function [V,D,counter] = jacobi_eig(Aini,tol,type,maxiter)
2%JACOBI_EIG Jacobi Eigenvalue Algorithm
3% [A,V,counter] = JACOBI_EIG(Aini,tol,type,maxiter) computes the eigen-
4% decomposition of Aini, Aini = VDV', where V is orthogonal.
5%Input:
6% Aini -------- Input matrix
7% tol --------- Desired tolerance, usually 1.1e-16 (double precision)
8% type -------- Type of Jacobi algorithm, either
9% * 'classical': For classical Jacobi algorithm
10% * 'cyclic': For cyclic-by-row Jacobi algorithm
11% maxiter ----- Maximum number of sweep, specially for cyclic-by-row
12% Jacobi algorithm. If undetermined, the maximum sweep will
13% be set as 10 sweeps.
14%Output:
15% V ------------ Orthogonal matrix
16% D ------------ Resulting matrix after apply Jacobi algorithm.
17% counter ------ Number of iteration
18if nargin <= 2, error('Not enough input arguments.');
19elseif nargin >= 5, error('Too many input arguments.');
20else
21 % select method, 'classical' or 'cyclic'
22 switch type
23 case {'classical','Classical'}
24 method = 1;
25 case {'cyclic','Cyclic'}
26 method = 2;
27 otherwise
28 error("The Jacobi algorithm avaliable are " + ...
29 "'classical' and 'cyclic'")
30 end
31 switch method
32 case 1
33 [V,D,counter] = jacobi_classical(Aini,tol);
34 case 2
35 switch nargin
36 case 4
37 maxiteration = maxiter;
38 [V,D,counter] = jacobi_cyclic(Aini,tol,maxiteration);
39 case 3
40 maxiteration = 10;
41 [V,D,counter] = jacobi_cyclic(Aini,tol,maxiteration);
42 otherwise
43 error('Not enough input arguments.');
44 end
45 end
46end
47end
48
49% classical Jacobi algorithm
50function [V,A,counter] = jacobi_classical(A,tol)
51counter = 0; n = length(A); V = eye(n); done_rot = true;
52tol1 = tol * norm(A);
53while done_rot
54 if isint(counter/(n*n)), A = (A + A')/2; end
55 done_rot = false; [p,q] = maxoff(A);
56 if abs(A(p,q)) > tol1 * sqrt(abs(A(p,p) * A(q,q)))
57 counter = counter + 1; done_rot = true;
58 [c,s] = jacobi_pair(A,p,q);
59 J = [c,s;-s,c];
60 A([p,q],:) = J'*A([p,q],:);
61 A(:,[p,q]) = A(:,[p,q]) * J;
62 V(:,[p,q]) = V(:,[p,q]) * J;
63 else
64 A = diag(diag(A));
65 break;
66 end
67end
68end
69
70% Cyclic-by-row Jacobi algorithm
71function [V,A,iter] = jacobi_cyclic(A,tol,maxiter)
72n = length(A); V = eye(n); iter = 0; done_rot = true;
73while done_rot && iter < maxiter
74 done_rot = false;
75 for p = 1:n-1
76 for q = p+1:n
77 if abs(A(p,q)) > tol * sqrt(abs(A(p,p)*A(q,q)))
78 done_rot = true;
79 [c,s] = jacobi_pair(A,p,q);
80 J = [c,s;-s,c];
81 A([p,q],:) = J'*A([p,q],:);
82 A(:,[p,q]) = A(:,[p,q]) * J;
83 V(:,[p,q]) = V(:,[p,q]) * J;
84 end
85 end
86 end
87 if done_rot
88 A = (A + A')/2; iter = iter + 1;
89 else
90 A = diag(diag(A));
91 return;
92 end
93% fprintf('off(A) = %d, sweep = %d\n', off(A), iter);
94end
95end
96
97% compute the Frobenius norm of the off-diagonal entries of the input matrix
98function num = off(A)
99n = length(A); A(1:n+1:n*n) = 0;
100num = norm(A,'fro');
101end
102
103% find the index of maximum off-diagonal entry of the input matrix A
104function [p,q] = maxoff(A)
105n = length(A); % dimension of matrix A
106A(1:n+1:n*n) = 0; A = abs(A); % clear the diagonal entries
107[val, idx1] = max(A);
108[~, q] = max(val); p = idx1(q);
109end
110
111% calculate the Givens rotation matrix's entries (c,s).
112function [c,s] = jacobi_pair(A,p,q)
113if A(p,q) == 0
114 c = 1; s = 0;
115else
116 tau = (A(q,q)-A(p,p))/(2*A(p,q));
117 if tau >= 0
118 t = 1/(tau + sqrt(1+tau*tau));
119 else
120 t = 1/(tau - sqrt(1+tau*tau));
121 end
122 c = 1/sqrt(1+t*t);
123 s = t*c;
124end
125end
B.3 Code for Figure 3.2
Figure 3.2 can be regenerated using the following routine
1clc; clear; close all; format short e;
2condi1 = 100; tol = 2^(-53);
3rng(1,'twister');
4for n = 1e2:1e2:1e3
5 A = my_randsvd(n,condi1,'geo');
6 fprintf('iteration of n %d/%d\n', n/1e2, 1e3/1e2);
7 tic
8 [V1,D1,iter1] = jacobi_eig(A,tol,'Classical');
9 t_classical_n(n/1e2) = toc;
10 tic
11 [V2,D2,iter2] = jacobi_eig(A,tol,'Cyclic',20);
12 t_cyclic_n(n/1e2) = toc;
13end
14nplot = 1e2:1e2:1e3;
15n = 2.5e2;
16rng(1,'twister');
17for condi = 1e3:1e3:1e5
18 A = my_randsvd(n,condi,'geo');
19 fprintf('iteration of cond %d/%d...',condi/1e3,1e5/1e3);
20 tic
21 [V1,D1,iter1] = jacobi_eig(A,tol,'Classical');
22 t_classical_condi(condi/1e3) = toc;
23 tic
24 [V2,D2,iter2] = jacobi_eig(A,tol,'Cyclic',20);
25 t_cyclic_condi(condi/1e3) = toc;
26 fprintf('complete \n');
27end
28condiplot = 1e3:1e3:1e5;
29close all;
30subplot(1,2,1); hold off;
31plot(nplot,t_classical_n,'-^','LineWidth',2);
32hold on;
33plot(nplot,t_cyclic_n,'-^','LineWidth',2);
34xlabel('Dimension ');
35ylabel('Time Used (sec)');
36title('Fixed ');
37legend('Classical Jacobi','Cyclic Jacobi','Location','northwest','FontSize',20);
38axis square
39subplot(1,2,2); hold off;
40plot(condiplot,t_classical_condi,'-^','LineWidth',1.5);
41hold on;
42plot(condiplot,t_cyclic_condi,'-^','LineWidth',1.5);
43xlabel('');
44ylabel('Time Used (sec)');
45title('Fixed ');
46legend('Classical Jacobi','Cyclic Jacobi','Location','northwest','FontSize',20);
47axis square; axis([0,1e5,0,25])
B.4 Code for Figure 4.1
Figure 4.1 can be regenerated using the following routine
1clc; clear; close all; format short e;
2ud = 2^(-53); n = 1e2:1e2:3e3; condi = 100; rng(1,'twister');
3for i = 1:length(n)
4 fprintf('iteration %d/%d\n', i, length(n));
5 A = gallery('randsvd', n(i), -condi, 3);
6 [Q,~] = myqr(A);
7 orthog(i) = norm(Q'*Q - eye(n(i)));
8end
9semilogy(n,orthog,'-^r',LineWidth=1.5);
10hold on; semilogy(n, n.*ud, '-^k',LineWidth=1.5);
11legend("", '','Location','northwest');
12xlabel('Dimension ')
B.5 Code for Figure 4.3
Figure 4.3 can be regenerated using the following routine
1clc; clear; close all;
2n = 10:10:8e4;
3sq = (sqrt(1+n .* eps(single(1/2))) - 1).^4;
4sq_ref = n .* eps(1/2);
5plot(n,sq); hold on;
6plot(n,sq_ref,'--');
7plot(52400,5.81757e-12,'^k')
8legend('','','Location','northwest');
9xlabel('Dimension ')
B.6 Code for Figure 4.4
Figure 4.4 can be regenerated using the following routine
1clc; clear; close all; format short e; rng(1,'twister');
2ud = 2^(-53);
3n = 1e2:1e2:3e3; condi = 1e2;
4for i = 1:length(n)
5 fprintf('Start iteration %d/%d,\n',i,length(n));
6 A = my_randsvd( n(i), condi, 'geo');
7 [Q,~] = eig(single(A)); Q = double(Q);
8 Q1 = newton_schulz(Q);
9 orthog(i) = norm(Q1'*Q1 - eye(n(i)));
10end
11semilogy(n,orthog,'-^r',LineWidth=1.5);
12hold on;
13semilogy(n, n.*ud, '-^k',LineWidth=1.5);
14legend("",'','Location','northwest');
15xlabel('Dimension (n)'); title('Fix '); axis square
B.7 Code for Figure 4.5
Figure 4.5 can be regenerated using the following routine
1clc; clear; close all; format short e; rng(1,'twister');
2ud = 2^(-53); n = 1e2:1e2:3e3; condi = 1e2;
3for i = 1:length(n)
4 fprintf('iteration %d/%d\n',i,length(n));
5 A = my_randsvd(n(i),condi,'geo');
6 [Q,~] = eig(single(A)); Q = double(Q);
7 Q1 = newton_schulz(Q);
8 [Q2,~] = qr(Q);
9 [Q3,~] = myqr(Q);
10 orthogns(i) = norm(Q1'*Q1 - eye(n(i)));
11 orthogqr(i) = norm(Q2'*Q2 - eye(n(i)));
12 orthogmyqr(i) = norm(Q3'*Q3 - eye(n(i)));
13end
14semilogy(n,orthogns,'-^',LineWidth=1.5); hold on;
15semilogy(n,orthogqr,'-^',LineWidth=1.5);
16semilogy(n,orthogmyqr,'-^',LineWidth=1.5);
17legend("Newton-Schulz", ...
18 "\texttt{qr()} from MATLAB", ...
19 "My own QR code",...
20 'interpreter','latex', ...
21 'Location','northwest', ...
22 'FontSize',20);
23xlabel('Dimension ');
24ylabel('')
25axis([0,3000,0,2e-14])
26axis square
B.8 Code for Figure 4.6
Figure 4.6 can be regenerated using the following routine
1clc; clear; close all; format short e;
2ud = 2^(-53); n = 1e2:1e2:3e3; condi = 100;
3time_ns = zeros(length(n),1); time_qr = time_ns;
4rng(1,'twister'); % RNG
5for i = 1:length(n)
6 fprintf('iteration %d/%d\n',i,length(n));
7 A = my_randsvd(n(i),condi,'geo');
8 [Q,~] = eig(single(A)); Q = double(Q);
9 tic; [Q1] = newton_schulz(Q); time_ns(i) = toc;
10 tic; [Q2,~] = myqr(Q); time_qr(i) = toc;
11end
12semilogy(n,time_ns,'-^',LineWidth=1.5); hold on;
13semilogy(n,time_qr,'-^',LineWidth=1.5);
14legend('Newton--Schulz','QR Factorization','Location','northwest','interpreter','latex','FontSize',20);
15xlabel('Dimension ')
16ylabel('Time (sec)')
17axis square
B.9 Mixed Precision Jacobi Algorithm Code
1function [Q,D,iter] = jacobi_precondi(A,tol,maxiter)
2%JACOBI_PRECONDI(A,tol,maxiter) compute the spectral decomposition of A using cyclic-by-row Jacobi algorithm with preconditioning.
3%% Preliminaries
4switch nargin
5 case 2, max_it = 10; case 3, max_it = maxiter;
6 otherwise, error('Please check the number of inputs');
7end
8[n,m] = size(A);
9if m ~= n, error('Input matrix should be square matrix'), end
10if issymmetric(A) == 0, error('Input matrix should be symmetric'); end
11%% Spectral decomposition at single precision
12[Q_low,~] = eig(single(A)); Q_low = double(Q_low);
13%% Newton--Schulz iteration to orthogonalize
14Q_d = newton_schulz(Q_low);
15%% Cyclic-by-row Jacobi algorithm on preconditioned A
16A_cond = Q_d' * A * Q_d;
17[V,D,iter] = jacobi_eig(A_cond,tol,'Cyclic',max_it);
18%% Output
19Q = Q_d * V;
B.10 Code for Figure 5.1
Figure 5.1 can be regenerated using the following routine
1clc; clear; format short e;
2n = 50:50:1e3; condi = 500; nplot = n;
3rng(1,'twister');
4for i = 1:length(n)
5 fprintf('iteration %d/%d...',i,length(n))
6 A_geo = my_randsvd(n(i),condi,'geo');
7 A_ari = my_randsvd(n(i),condi,'ari');
8 [V1,D1,iter1] = jacobi_precondi(A_geo,eps(1/2));
9 [V2,D2,iter2] = jacobi_precondi(A_ari,eps(1/2));
10 accuracy_n_geo(i) = norm(A_geo * V1 - V1 * D1);
11 accuracy_n_ari(i) = norm(A_ari * V2 - V2 * D2);
12 reference_n(i) = n(i) * norm(A_geo) * eps(1/2);
13 fprintf('complete\n');
14end
15%%
16subplot(1,2,1); hold off;
17semilogy(nplot,accuracy_n_geo,'-^'); hold on;
18semilogy(nplot,accuracy_n_ari,'-^');
19semilogy(nplot,nplot.*eps(1/2),'--k');
20xlabel('Dimension ');
21ylabel('');
22legend('Geometric','Arithmetic','','Location','southeast','FontSize',20);
23title('Fixed ')
24axis square
25%%
26condiplot = 1e2:1e2:1e4; n = 5e2;
27for i = 1:length(condiplot)
28 fprintf('iteration %d/%d...',i,length(condiplot))
29 A_geo = my_randsvd(n,condiplot(i),'geo');
30 A_ari = my_randsvd(n,condiplot(i),'ari');
31 [V1,D1,iter1] = jacobi_precondi(A_geo,eps(1/2));
32 [V2,D2,iter2] = jacobi_precondi(A_ari,eps(1/2));
33 accuracy_condi_geo(i) = norm(A_geo * V1 - V1 * D1);
34 accuracy_condi_ari(i) = norm(A_ari * V2 - V2 * D2);
35 reference_n(i) = n * norm(A_geo) * eps(1/2);
36 fprintf('complete\n');
37end
38%%
39close all;
40subplot(1,2,1); hold off;
41semilogy(nplot,accuracy_n_geo,'-^'); hold on;
42semilogy(nplot,accuracy_n_ari,'-^');
43semilogy(nplot,nplot.*eps(1/2),'--k');
44xlabel('Dimension ');
45ylabel('');
46legend('Geometric','Arithmetic','','Location','southeast','FontSize',20);
47title('Fixed ')
48axis square
49subplot(1,2,2); hold off;
50semilogy(condiplot,accuracy_condi_geo,'-^'); hold on;
51semilogy(condiplot,accuracy_condi_ari,'-^');
52semilogy(condiplot,reference_n,'--k');
53xlabel('');
54ylabel('');
55legend('Geometric','Arithmetic','','Location','northeast','FontSize',20);
56title('Fixed ')
57axis square; axis([0,1e4,0,8e-14]);
B.11 Code for Figure 5.2
1clc; clear all; close all; figure(1);
2n = [50,250,500];
3for i = 1:3
4 subplot(2,2,i)
5 testing_1(n(i));
6 if i == 1
7 axis([0,1e4,0,6e-6]);
8 title('Fixed ');
9 elseif i == 2
10 axis([0,1e4,0,3.8e-5]);
11 title('Fixed ');
12 elseif i == 3
13 axis([0,1e4,0,8e-5]);
14 title('Fixed ');
15 end
16 pause(0.1);
17end
18subplot(2,2,4); hold off
19condi = 1e2; n1 = 10:10:1e3; offA = zeros(1,length(n1)); rng(1,'twister');
20for i = 1:length(n1)
21 fprintf('start %d/%d\n',i,length(n1));
22 A_geo = my_randsvd(n1(i),condi,'geo');
23 A_ari = my_randsvd(n1(i),condi,'ari');
24 [Q1,~] = eig(single(A_geo)); Q1 = double(Q1);
25 Q1_d = newton_schulz(Q1);
26 offA_geo(i) = off(Q1_d' * A_geo * Q1_d)/norm(A_geo);
27 [Q2,~] = eig(single(A_ari)); Q2 = double(Q2);
28 Q2_d = newton_schulz(Q2);
29 offA_ari(i) = off(Q2_d' * A_ari * Q2_d)/norm(A_ari);
30end
31semilogy(n1,offA_geo); hold on;
32semilogy(n1,offA_ari);
33semilogy(n1,n1 .* double(eps(single(1/2))),'--k');
34legend('Geometric','Arithmetic','','Location','southeast');
35xlabel(''); ylabel('off','Interpreter','latex');
36set(gca,'FontSize',20);
37axis square;
38title('Fixed ')
39function testing_1(n)
40condi = 10:10:1e4; offA_geo = zeros(1,length(condi)); offA_ari = offA_geo;
41rng(1,'twister');
42for i = 1:length(condi)
43 fprintf('start %d/%d\n',i,length(condi));
44 A_geo = my_randsvd(n,condi(i),'geo');
45 [Q1,~] = eig(single(A_geo)); Q1 = double(Q1);
46 Q1_d = newton_schulz(Q1);
47 offA_geo(i) = off(Q1_d' * A_geo * Q1_d)/norm(A_geo);
48 A_ari = my_randsvd(n,condi(i),'ari');
49 [Q2,~] = eig(single(A_ari)); Q2 = double(Q2);
50 Q2_d = newton_schulz(Q2);
51 offA_ari(i) = off(Q2_d' * A_ari * Q2_d)/norm(A_ari);
52end
53semilogy(condi,offA_geo,'LineWidth',1.5); hold on;
54semilogy(condi,offA_ari);
55semilogy(condi,n * double(eps(single(1/2))) * ones(1,length(condi)),'--k','LineWidth',1.5);
56legend('Geometric','Arithmetic','');
57xlabel(''); ylabel('off','Interpreter','latex');
58set(gca,'FontSize',20);
59axis square;
60end
B.12 Code for Figure 5.3
Figure 5.3 can be regenerated using the following routine
1clear; clc; rng(1,'twister');
2n = 50:50:1e3; condi = 500; nplot = n;
3for i = 1:length(n)
4 A_geo = my_randsvd(n(i),condi,'geo');
5 [V1,D1,iter1] = jacobi_precondi(A_geo,eps(1/2));
6 iteration_n_geo(i) = iter1;
7 A_ari = my_randsvd(n(i),condi,'ari');
8 [V2,D2,iter2] = jacobi_precondi(A_ari,eps(1/2));
9 iteration_n_ari(i) = iter2;
10 fprintf('iteration %d/%d complete\n',i,length(n));
11end
12condiplot = 1e2:1e2:1e4; n = 5e2;
13for i = 1:length(condiplot)
14 A_geo = my_randsvd(n,condiplot(i),'geo');
15 [V1,D1,iter1] = jacobi_precondi(A_geo,eps(1/2));
16 iteration_condi_geo(i) = iter1;
17 A_ari = my_randsvd(n,condiplot(i),'ari');
18 [V2,D2,iter2] = jacobi_precondi(A_ari,eps(1/2));
19 iteration_condi_ari(i) = iter2;
20 fprintf('iteration %d/%d complete\n',i,length(condiplot));
21end
22close all
23subplot(1,2,1);
24semilogy(nplot,iteration_n_geo,'-^'); hold on;
25semilogy(nplot,iteration_n_ari,'-^');
26xlabel('Dimension ');
27ylabel('Number of iterations til converge');
28title('Fixed ')
29legend('Geometric','Arithmetic','Location','northeast');
30axis square; axis([0,1000,0,4]);
31subplot(1,2,2);
32semilogy(condiplot,iteration_condi_geo,'-^'); hold on;
33semilogy(condiplot,iteration_condi_ari,'-^');
34legend('Geometric','Arithmetic','Location','northeast');
35xlabel('');
36ylabel('Number of iterations til converge');
37title('Fixed ')
38axis square; axis([0,1e4,0,5]);
B.13 Code for Figure 5.4
Figure 5.4 can be regenerated using the following routine
1clc; clear; close all; rng(1,'twister');
2n = 1e2:1e2:5e2; condi = 500; tol = 2^(-53);
3for i = 1:length(n)
4 Ag = my_randsvd(n(i),condi,'geo');
5 Aa = my_randsvd(n(i),condi,'ari');
6 tic; [V1,D1,iter1] = jacobi_precondi(Ag,tol);
7 t_precond_g(i) = toc;
8 tic; [V2,D2,iter2] = jacobi_eig(Ag,tol,'Cyclic');
9 t_normal_g(i) = toc;
10 tic; [Va1,Da1,iter1] = jacobi_precondi(Aa,tol);
11 t_precond_a(i) = toc;
12 tic; [Va2,Da2,iter2] = jacobi_eig(Aa,tol,'Cyclic');
13 t_normal_a(i) = toc;
14 fprintf('Iteration %d/%d complete\n',i,length(n));
15end
16subplot(1,2,1)
17plot(n,t_normal_g,'-^');
18hold on;
19plot(n,t_precond_g,'-^');
20xlabel('Dimension ')
21ylabel('Time Used (sec)')
22legend('Without Preconditioning', 'With Preconditioning','Location','northwest');
23title('Geometrically distributed test matrices')
24axis square
25subplot(1,2,2)
26plot(n,t_normal_a,'-^');
27hold on;
28plot(n,t_precond_a,'-^');
29xlabel('Dimension ')
30ylabel('Time Used (sec)')
31legend('Without Preconditioning', 'With Preconditioning','Location','northwest');
32title('Arithmetically distributed test matrices')
33axis square
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Adi Ben-Israel. An iterative method for computing the generalized inverse of an arbitrary matrix . Math. Comp. , 19(91):452–455, 1965. · doi ↗
- 2[2] Rajendra Bhatia. Matrix Analysis . Springer-Verlag, New York, 1997. xi+347 pp. ISBN 978-1-4612-6857-4. · doi ↗
- 3[3] Philip I. Davies, Nicholas J. Higham, and Françoise Tisseur. Analysis of the Cholesky method with iterative refinement for solving the symmetric definite generalized eigenproblem . SIAM J. Matrix Anal. Appl. , 23(2):472–493, 2001. · doi ↗
- 4[4] James Demmel and Krešimir Veselić. Jacobi’s method is more accurate than QR . SIAM J. Matrix Anal. Appl. , 13(4):1204–1245, 1992. · doi ↗
- 5[5] Zlatko Drmač and Krešimir Veselić. Approximate eigenvectors as preconditioner . Linear Algebra Appl. , 309(1):191–215, 2000. · doi ↗
- 6[6] Ky Fan and A. J. Hoffman. Some metric inequalities in the space of matrices . Proc. Amer. Math. Soc. , 6(1):111–116, 1955.
- 7[7] George E. Forsythe and Peter Henrici. The cyclic Jacobi method for computing the principal values of a complex matrix . Trans. Amer. Math. Soc. , 94(1):1–23, 1960.
- 8[8] Gene H. Golub and Henk A. van der Vorst. Eigenvalue computation in the 20th century . J. Comput. Appl. Math. , 123(1):209–239, 2001. · doi ↗
