A Mixed Precision Eigensolver Based on the Jacobi Algorithm

Zhengbo Zhou

arXiv:2509.00495·math.NA·September 3, 2025

A Mixed Precision Eigensolver Based on the Jacobi Algorithm

Zhengbo Zhou

PDF

Open Access

TL;DR

This paper introduces a mixed precision spectral decomposition method for real symmetric matrices using an optimized Jacobi algorithm, which significantly accelerates computation while maintaining accuracy.

Contribution

It proposes a novel mixed precision approach combining low and high precision computations with efficient index selection and orthogonalization techniques.

Findings

01

The cyclic-by-row index selection is more efficient than classical methods.

02

Polar decomposition with Newton-Schulz iteration outperforms QR in speed and accuracy.

03

The new method reduces spectral decomposition time by approximately 30%.

Abstract

The classic method for computing the spectral decomposition of a real symmetric matrix, the Jacobi algorithm, can be accelerated by using mixed precision arithmetic. The Jacobi algorithm is aiming to reduce the off-diagonal entries iteratively using Givens rotations. We investigate how to use the low precision to speed up this algorithm based on the approximate spectral decomposition in low precision. We first study two different index choosing techniques, classical and cyclic-by-row, for the Jacobi algorithm. Numerical testing suggests that cyclic-by-row is more efficient. Then we discuss two different methods of orthogonalizing an almost orthogonal matrix: the QR factorization and the polar decomposition. For polar decomposition, we speed up the Newton iteration by using the one-step Schulz iteration. Based on numerical testing, using the polar decomposition approach (Newton--Schulz…

Tables2

Table 1. Table 1.1: Unit roundoff defined by IEEE standard 754-2019.

Type	Unit Roundoff
Single precision	$2^{- 24} \approx$ 5.96e-8
Double precision	$2^{- 53} \approx$ 1.11e-16

Table 2. Table A.1: Two different approaches for computing t t in Algorithm 2 .

	Approach 1	Approach 2
$τ \geq 0$	$t = - τ + \sqrt{τ^{2} + 1}$	$t = \frac{1}{τ + \sqrt{1 + τ^{2}}}$
$τ < 0$	$t = - τ - \sqrt{τ^{2} + 1}$	$t = \frac{1}{τ - \sqrt{1 + τ^{2}}}$

Equations266

A x = λ x,

A x = λ x,

A = Q Λ Q^{T}, A = A^{T} \in R^{n \times n}, Q^{T} Q = Q Q^{T} = I,

A = Q Λ Q^{T}, A = A^{T} \in R^{n \times n}, Q^{T} Q = Q Q^{T} = I,

∥ Q_{ℓ}^{T} Q_{ℓ} - I ∥ ≲ n u_{ℓ}, ∥ A Q_{ℓ} - Q_{ℓ} D_{ℓ} ∥ ≲ n u_{ℓ} ∥ A ∥,

∥ Q_{ℓ}^{T} Q_{ℓ} - I ∥ ≲ n u_{ℓ}, ∥ A Q_{ℓ} - Q_{ℓ} D_{ℓ} ∥ ≲ n u_{ℓ} ∥ A ∥,

∥ x ∥_{p} = (i = 1 \sum n ∣ x_{i} ∣^{p})^{1/ p}, p \geq 1.

∥ x ∥_{p} = (i = 1 \sum n ∣ x_{i} ∣^{p})^{1/ p}, p \geq 1.

∥ x ∥_{2} = (i = 1 \sum n ∣ x_{i} ∣^{2})^{1/2} = x^{T} x .

∥ x ∥_{2} = (i = 1 \sum n ∣ x_{i} ∣^{2})^{1/2} = x^{T} x .

∥ A ∥_{F} = (i = 1 \sum n j = 1 \sum n ∣ a_{ij} ∣^{2})^{1/2} = (Tr (A^{T} A))^{1/2} .

∥ A ∥_{F} = (i = 1 \sum n j = 1 \sum n ∣ a_{ij} ∣^{2})^{1/2} = (Tr (A^{T} A))^{1/2} .

∥ A ∥ = ∥ x ∥ = 1 max ∥ A x ∥.

∥ A ∥ = ∥ x ∥ = 1 max ∥ A x ∥.

∥ A ∥_{2} = ∥ x ∥_{2} = 1 max ∥ A x ∥_{2} = x_{μ}^{T} A^{T} A x_{μ} = μ,

∥ A ∥_{2} = ∥ x ∥_{2} = 1 max ∥ A x ∥_{2} = x_{μ}^{T} A^{T} A x_{μ} = μ,

∥ A ∥_{2} = λ_{m a x} (A^{T} A) = σ_{1},

∥ A ∥_{2} = λ_{m a x} (A^{T} A) = σ_{1},

∥ A B ∥ \leq ∥ A ∥∥ B ∥,

∥ A B ∥ \leq ∥ A ∥∥ B ∥,

Q^{T} Q = Q Q^{T} = I_{n},

Q^{T} Q = Q Q^{T} = I_{n},

∥ Q x ∥_{2} = x^{T} Q^{T} Q x = x^{T} x = ∥ x ∥_{2},

∥ Q x ∥_{2} = x^{T} Q^{T} Q x = x^{T} x = ∥ x ∥_{2},

∥ U A V ∥ = ∥ A ∥, A \in R^{n \times n}, U and V are orthogonal.

∥ U A V ∥ = ∥ A ∥, A \in R^{n \times n}, U and V are orthogonal.

∥ U A V ∥_{F}^{2}

∥ U A V ∥_{F}^{2}

\|UAV\|_{F}^{2}=\operatorname{Tr}\big{(}A^{T}U^{T}UA\big{)}=\operatorname{Tr}\big{(}A^{T}A\big{)}=\operatorname{Tr}\big{(}AA^{T}\big{)}=\|A\|_{F}^{2}.

\|UAV\|_{F}^{2}=\operatorname{Tr}\big{(}A^{T}U^{T}UA\big{)}=\operatorname{Tr}\big{(}A^{T}A\big{)}=\operatorname{Tr}\big{(}AA^{T}\big{)}=\|A\|_{F}^{2}.

∥ U A V ∥_{2}^{2} = λ_{m a x} (V^{T} A^{T} U^{T} U A V) = λ_{m a x} (V^{T} A^{T} A V) .

∥ U A V ∥_{2}^{2} = λ_{m a x} (V^{T} A^{T} U^{T} U A V) = λ_{m a x} (V^{T} A^{T} A V) .

∥ U A V ∥_{2}^{2} = λ_{m a x} (A^{T} A) = ∥ A ∥_{2}^{2} .

∥ U A V ∥_{2}^{2} = λ_{m a x} (A^{T} A) = ∥ A ∥_{2}^{2} .

(G (i, k, θ))_{pq} = ⎩ ⎨ ⎧ 1 c s - s 0 if p, q \in / {i, k}, if p = q and p \in {i, k}, if p = i and q = k, if p = k and q = i, otherwise.

(G (i, k, θ))_{pq} = ⎩ ⎨ ⎧ 1 c s - s 0 if p, q \in / {i, k}, if p = q and p \in {i, k}, if p = i and q = k, if p = k and q = i, otherwise.

G_{ik} = G ([i, k], [i, k]) = [c - s s c] \to G_{ik}^{T} G_{ik} = [1001] = G_{ik} G_{ik}^{T} .

G_{ik} = G ([i, k], [i, k]) = [c - s s c] \to G_{ik}^{T} G_{ik} = [1001] = G_{ik} G_{ik}^{T} .

G^{T} G = G G^{T} = I .

G^{T} G = G G^{T} = I .

G (p, q, θ) = [e_{1}, \dots, e_{p - 1}, e_{pq}, e_{p + 1}, \dots, e_{q - 1}, e_{q p}, e_{q + 1}, \dots, e_{n}],

G (p, q, θ) = [e_{1}, \dots, e_{p - 1}, e_{pq}, e_{p + 1}, \dots, e_{q - 1}, e_{q p}, e_{q + 1}, \dots, e_{n}],

[e_{pq}]_{k} = ⎩ ⎨ ⎧ c - s 0 if k = p if k = q if k \neq = p, q, [e_{q p}]_{k} = ⎩ ⎨ ⎧ s c 0 if k = q if k = p if k \neq = p, q .

[e_{pq}]_{k} = ⎩ ⎨ ⎧ c - s 0 if k = p if k = q if k \neq = p, q, [e_{q p}]_{k} = ⎩ ⎨ ⎧ s c 0 if k = q if k = p if k \neq = p, q .

B = G (p, q, θ)^{T} A G (p, q, θ),

B = G (p, q, θ)^{T} A G (p, q, θ),

B = e_{1}^{T} A e_{1} ⋮ e_{pq}^{T} A e_{1} ⋮ e_{q p}^{T} A e_{1} ⋮ e_{n}^{T} A e_{1} \dots \dots \dots \dots e_{1}^{T} A e_{pq} ⋮ e_{pq}^{T} A e_{pq} ⋮ e_{q p}^{T} A e_{pq} ⋮ e_{n}^{T} A e_{pq} \dots \dots \dots \dots e_{1}^{T} A e_{q p} ⋮ e_{pq}^{T} A e_{q p} ⋮ e_{q p}^{T} A e_{q p} ⋮ e_{n}^{T} A e_{q p} \dots \dots \dots \dots e_{1}^{T} A e_{n} ⋮ e_{pq}^{T} A e_{n} ⋮ e_{q p}^{T} A e_{n} ⋮ e_{n}^{T} A e_{n} .

B = e_{1}^{T} A e_{1} ⋮ e_{pq}^{T} A e_{1} ⋮ e_{q p}^{T} A e_{1} ⋮ e_{n}^{T} A e_{1} \dots \dots \dots \dots e_{1}^{T} A e_{pq} ⋮ e_{pq}^{T} A e_{pq} ⋮ e_{q p}^{T} A e_{pq} ⋮ e_{n}^{T} A e_{pq} \dots \dots \dots \dots e_{1}^{T} A e_{q p} ⋮ e_{pq}^{T} A e_{q p} ⋮ e_{q p}^{T} A e_{q p} ⋮ e_{n}^{T} A e_{q p} \dots \dots \dots \dots e_{1}^{T} A e_{n} ⋮ e_{pq}^{T} A e_{n} ⋮ e_{q p}^{T} A e_{n} ⋮ e_{n}^{T} A e_{n} .

P = I - 2 \frac{v v ^{T}}{v ^{T} v}, 0 \neq = v \in R^{n} .

P = I - 2 \frac{v v ^{T}}{v ^{T} v}, 0 \neq = v \in R^{n} .

P^{T} = I^{T} - \frac{2}{v ^{T} v} (v v^{T})^{T} = I - \frac{2}{v ^{T} v} v v^{T} = P .

P^{T} = I^{T} - \frac{2}{v ^{T} v} (v v^{T})^{T} = I - \frac{2}{v ^{T} v} v v^{T} = P .

P^{T} P = P P^{T} = P P

P^{T} P = P P^{T} = P P

= I - \frac{4}{v ^{T} v} v v^{T} + \frac{4}{( v ^{T} v ) ^{2}} v v^{T} v v^{T}

= I - \frac{4}{v ^{T} v} v v^{T} + \frac{4}{v ^{T} v} v v^{T} = I .

y = P x = (I - \frac{2}{v ^{T} v} v v^{T}) x = x - \frac{2 v ^{T} x}{v ^{T} v} v

y = P x = (I - \frac{2}{v ^{T} v} v v^{T}) x = x - \frac{2 v ^{T} x}{v ^{T} v} v

v = \frac{1}{α} x - \frac{1}{α} y, α = \frac{2 v ^{T} x}{v ^{T} v} v,

v = \frac{1}{α} x - \frac{1}{α} y, α = \frac{2 v ^{T} x}{v ^{T} v} v,

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMatrix Theory and Algorithms · Advanced Optimization Algorithms Research · Iterative Methods for Nonlinear Equations

Full text

A Mixed Precision Eigensolver Based on the Jacobi Algorithm

Zhengbo Zhou

\school

Mathematics \facultyScience and Engineering \submitdateSubmit date: 10 September 2022

\beforeabstract

The classic method for computing the spectral decomposition of a real symmetric matrix, the Jacobi algorithm, can be accelerated by using mixed precision arithmetic. The Jacobi algorithm is aiming to reduce the off-diagonal entries iteratively using Givens rotations. We investigate how to use the low precision to speed up this algorithm based on the approximate spectral decomposition in low precision.

We first study two different index choosing techniques, classical and cyclic-by-row, for the Jacobi algorithm. Numerical testing suggests that cyclic-by-row is more efficient. Then we discuss two different methods of orthogonalizing an almost orthogonal matrix: the QR factorization and the polar decomposition. For polar decomposition, we speed up the Newton iteration by using the one-step Schulz iteration. Based on numerical testing, using the polar decomposition approach (Newton–Schulz iteration) is not only faster but also more accurate than using the QR factorization.

A mixed precision algorithm for computing the spectral decomposition of a real symmetric matrix at double precision is provided. In doing so we compute the approximate eigenvector matrix $Q_{\ell}$ of $A$ in single precision using eig and single in MATLAB. We then use the Newton–Schulz iteration to orthogonalize the eigenvector matrix $Q_{\ell}$ into an orthogonal matrix $Q_{d}$ in double precision. Finally, we apply the cyclic-by-row Jacobi algorithm on $Q_{d}^{T}AQ_{d}$ and obtain the spectral decomposition of $A$ . At this stage, we will see, from the testings, the cyclic-by-row Jacobi algorithm only need less than 10 iterations to converge by utilizing the quadratic convergence. The new mixed precision algorithm requires roughly 30% of the time used by the Jacobi algorithm on its own. \afterabstract

\prefacesection

Acknowledgements It is my pleasure to acknowledge the advices from my supervisors Prof. Nicholas J. Higham and Prof. Françoise Tisseur. I wouldn’t start my research in numerical linear algebra without their guidance and this thesis cannot be completed without their unlimited patience and insightful feedback.

I would also like to thank all my friends for providing emotional support. In addition, two special thanks to Zeyu Wu, for his suggestion on coding aspects, and Anheng Xi, for the translation of several German articles from Jacobi and Schönhage.

Finally, I thank my parents, Guangyuan Zhou and Ling Li, for their unconditional love and for giving me the chance to study abroad. By standing upon the shoulders of my parents, I have seen further than them.

\afterpreface

Chapter 1 Introduction

Modern hardware increasingly supports low precision floating-point arithmetic. Using low precision, we have opportunities to accelerate linear algebra computations. A symmetric eigenproblem is solving

[TABLE]

with $x\neq 0$ and the matrix $A\in\mathbb{R}^{n\times n}$ is symmetric. Equivalently, the problem is aiming to compute the spectral decomposition

[TABLE]

and $\varLambda$ is the diagonal matrix with the eigenvalues of $A$ along the diagonal.

Traditional ways of solving such a problem include the Jacobi algorithm, the power method and the QR algorithm [8, ${2000}$ , Section 5–7]. This thesis is concerned with the Jacobi algorithm proposed by Jacobi in 1846 which is an iterative method that reduces the off-diagonal entries at each step. In addition, we investigate how the usage of low precision arithmetic can speed up the Jacobi algorithm.

In 2000, Drmač and Veselić [5, ${2000}$ ] provided a way to speed up the Jacobi algorithm on the real symmetric matrix $A$ . Let $U$ be the approximate eigenvector matrix of $A$ , then applying the Jacobi algorithm on $U^{T}AU$ can speed the reduction up.

In Chapter 2 we provide essential definitions and examples including norms, orthogonal matrix and the singular value decomposition. We introduce two important classes of the orthogonal matrix, the Givens rotation and the Householder transformation.

The Jacobi algorithm is introduced in Chapter 3. We describe the derivation of the Jacobi algorithm and its convergence rate. In Section 3.2 and 3.3 we study how to choose the pair of index required by the Jacobi algorithm.

The classical Jacobi algorithm, mentioned in Section 3.2, chooses the largest off-diagonal entry as the pair of index and the cyclic-by-row Jacobi algorithm, mentioned in Section 3.3, chooses the indices row by row but restricts to the superdiagonal part of the matrix. Although the classical Jacobi algorithm reduces the off-diagonal entries most optimally, the cyclic-by-row Jacobi algorithm does not require expensive off-diagonal searches. Finally, we conduct numerical tests to examine the speed and conclude that we shall stick with the faster cyclic-by-row Jacobi algorithm.

We provide a procedure to get an approximate eigenvector matrix in Chapter 4. Firstly, compute the spectral decomposition at low precision $A\approx Q_{\ell}D_{\ell}Q_{\ell}^{T}$ such that

[TABLE]

where $u_{\ell}$ is the unit roundoff in low precision. Then we orthogonalize $Q_{\ell}$ to $Q_{d}$ such that $\|{Q_{d}^{T}Q_{d}-I}\|\lesssim nu_{d}$ where $u_{d}$ is the unit roundoff in double precision (“Unit roundoff” - defined in Section 1.2). We can use $Q_{d}$ as the approximate eigenvector matrix of $A$ and precondition $A$ into $A_{\text{cond}}=Q_{d}^{T}AQ_{d}$ , then the Jacobi algorithm applied on $A_{\text{cond}}$ should converge in a few sweeps. In Section 4.2 and 4.3 we review two methods for orthogonalizing a matrix: the Householder QR factorization and the polar decomposition. Particularly, in Section 4.3, we study the Newton iteration approach and its variant Newton–Schulz iteration to compute the unitary factor of the polar decomposition. MATLAB codes are presented and after several numerical testings, among the Householder QR factorization approach, the Newton iteration approach and the Newton–Schulz iteration approach, the latter method is the most accurate and efficient one to use.

Finally, we assemble all the ingredients and produce Algorithm 9 in Chapter 5, a mixed precision Jacobi algorithm. In Section 5.2.2, we review some literature on the quadratic convergence of the cyclic-by-row Jacobi algorithm. We observe from the numerical testings that for any symmetric matrix $A\in\mathbb{R}^{n\times n}$ , after preconditioning, the Jacobi algorithm should converge within 10 iterations. The numerical testings suggest the Jacobi algorithm only needs two iterations to converge and there is a 75% reduction of time using preconditioned Jacobi algorithm compare to the Jacobi algorithm on its own.

1.1 Notations

We use the Householder notations:

•

Use capital letters for matrices.

•

Use lower case letters for vectors.

•

Use lower case Greek letters for scalars.

We use $\lambda(A)$ and $\sigma(A)$ to represent the eigenvalues and the singular values of $A$ . We denote $\sigma_{1}(A)$ or $\sigma_{1}$ if the matrix $A$ has been specified as the largest singular value of $A$ .

We say that the $Ax=b$ problem is well conditioned if small changes in the data $A,b$ always make small changes in the solution $x$ ; otherwise, it is ill-conditioned. We use $\kappa(A)=\|{A}\|\|{A^{-1}}\|$ as a measure to visualize the condition of a problem. (Norms will be introduced in Section 2.1)

We say a matrix $A\in\mathbb{R}^{n\times n}$ is almost orthogonal if $\|{A^{*}A-I}\|\lesssim nu_{\ell}$ where $I$ is the $n\times n$ identity matrix. A matrix $A\in\mathbb{R}^{n\times n}$ is almost diagonal if $\|{A-\mathrm{diag}(A)}\|/\|{A}\|\lesssim nu_{\ell}$ and this quantity is zero if $A$ is diagonal.

1.2 IEEE standard

The unit roundoff measures the relative error due to approximation in floating-point arithmetic. Throughout this thesis, we will only consider two precisions: single precision and double precision [14, ${2002}$ , Section 2.1]. We will denote $u_{\ell}$ as the unit roundoff at single precision and $u_{d}$ as the unit roundoff at double precision. IEEE standard 754-2019 [19, ${2019}$ , Table 3.5] shown in Table 1.1 will be used. For simplicity, any matrix with subscript $\ell$ will be at single precision and any matrix with subscript $d$ will be at double precision.

1.3 Reproducible Research

In this section, we will present our testing environment and system specifications. With these informations, the reader will have the ability to reproduce any results we have in this thesis.

Testing environment and system specifications:

•

System OS: Windows 10.

•

MATLAB version: 9.12.0.2009381 (R2022a) Update 4 [17].

•

CPU: Intel(R) Core(TM) i5-9600KF CPU @ 3.70GHz.

•

GPU: NVIDIA GeForce RTX 2060.

•

All the testings that require the random number generator will initiate after the MATLAB command rng(1,'twister')111MATLAB built-in function to control the random number generator, see https://www.mathworks.com/help/matlab/ref/rng.html..

Chapter 2 Matrix Norms, Orthogonal Matrices and Singular

Value Decomposition

In this chapter, we will introduce several concepts that are important for our analysis. In the first section, vector norms and matrix norms will be presented. Then we will discuss an important class of matrices, orthogonal matrices. Finally, a crucial decomposition, singular value decomposition, will be discussed.

2.1 Norms

Matrix norms are essential for matrix algorithms. For example, the matrix norms provide a way of measuring the difference between two matrices and help us to check if we have the desired solution in finite arithmetic. In this section, we will do our demonstrations in the real world but the complex case is similar. Also, we will only introduce the norm for square matrices, but these theories are all applicable to any general matrices with careful attention to matching dimensions.

2.1.1 Vector Norms

Before introducing matrix norm, a brief illustration of vector norm is necessary.

Definition 2.1 (vector norm).

A vector norm on $\mathbb{R}^{n}$ is a function $\|{:}\|\mathbb{R}^{n}\to\mathbb{R}$ such that it satisfies the following properties

$\|{x}\|\geq 0$ for all $x\in\mathbb{R}^{n}$ . 2. 2.

$\|{x}\|=0$ if and only if $x=0$ . 3. 3.

$\|{\lambda x}\|=\lvert\lambda\rvert\|{x}\|$ for all $\lambda\in\mathbb{R}$ and $x\in\mathbb{R}^{n}$ . 4. 4.

$\|{x+y}\|\leq\|{x}\|+\|{y}\|$ for all $x,y\in\mathbb{R}^{n}$ .

Example 2.2.

A useful class of vector norm is the $p$ -norm

[TABLE]

One particular example of the $p$ -norm is the Euclidean norm or simply $2$ -norm defined by taking $p=2$

[TABLE]

The latter equality is obvious.

2.1.2 Matrix Norms

Let us denote $\mathbb{R}^{n\times n}$ as $n\times n$ matrices with real entries.

Definition 2.3 (matrix norm).

A matrix norm on $\mathbb{R}^{n\times n}$ is a function $\|{\cdot}\|:\mathbb{R}^{n\times n}\to\mathbb{R}$ satisfying the following properties

$\|{A}\|\geq 0$ for all $A\in\mathbb{R}^{n\times n}$ . 2. 2.

$\|{A}\|=0$ if and only if $A=0$ . 3. 3.

$\|{\lambda A}\|=\lvert\lambda\rvert\|{A}\|$ for all $\lambda\in\mathbb{R}$ and $A\in\mathbb{R}^{n\times n}$ . 4. 4.

$\|{A+B}\|\leq\|{A}\|+\|{B}\|$ for all $A,B\in\mathbb{R}^{n\times n}$ .

One simple matrix norm is the Frobenius norm

[TABLE]

We will use the Frobenius norm intensively in the next section on the Jacobi algorithm.

Another important class of matrix norms is called induced norms or subordinate norms. Given a vector norm defined in Section 2.1.1, the corresponding induced norm is defined as

[TABLE]

Example 2.4.

The matrix 2-norm is induced by the vector 2-norm. From the definition of the induced norm,

[TABLE]

for some positive number $\mu$ , and $x_{\mu}$ is the corresponding chosen $x$ . Since $\|{x}\|_{2}=\sqrt{x^{T}x}$ for $x\in\mathbb{R}^{n}$ , hence we have $x_{\mu}^{T}A^{T}Ax_{\mu}=\mu^{2}$ . Also, by $\|{x_{\mu}}\|_{2}=x_{\mu}^{T}x_{\mu}=1$ , we can pre-multiply $x_{\mu}$ at both sides and we have $A^{T}Ax_{\mu}=\mu^{2}x_{\mu}$ which implies $\mu$ is the eigenvalue of $A^{T}A$ . In case we are choosing $x_{\mu}$ such that $\mu$ is maximum, therefore we have the definition of the matrix 2-norm

[TABLE]

where $\lambda_{\max}(A)$ and $\sigma_{1}$ denotes the largest eigenvalue and singular value of $A$ .

Definition 2.5 (consistent norms).

A norm is consistent if it is submultiplicative

[TABLE]

for all $A,B$ such that the product $AB$ defines.

Example 2.6.

The Frobenius norm and all subordinate (induced) norms are consistent. This is useful for constructing upper bounds.

2.2 Orthogonal Matrices

Recall the definition of the orthogonal matrix,

Definition 2.7 (orthogonal matrices).

A matrix $Q\in\mathbb{R}^{n\times n}$ is said to be orthogonal if

[TABLE]

where $I_{n}$ denotes the identity matrix of size $n$ . If $Q\in\mathbb{C}^{n\times n}$ , then we call $Q$ unitary rather than orthogonal and we denote its conjugate transpose as $Q^{*}$ .

Theorem 2.8.

The vector 2-norm is invariant under orthogonal transformations.

Proof.

Given an orthogonal matrix $Q\in\mathbb{R}^{n\times n}$ and any vector $x\in\mathbb{R}^{n}$ , we have

[TABLE]

which proves the theorem. ∎

Theorem 2.9.

The Frobenius norm and matrix 2-norm are the orthogonally invariant norm. Namely, the following norm equality holds for these two norms

[TABLE]

Proof.

By the definition of the Frobenius norm, we have

[TABLE]

By the properties $\operatorname{Tr}(AB)=\operatorname{Tr}(BA)$ , (2.4) simplifies to

[TABLE]

Hence we finished the proof for the Frobenius norm.

By the definition of the matrix 2-norm, we have

[TABLE]

Notice that if $V$ is orthogonal, then $A^{T}A$ and $V^{T}A^{T}AV$ share the same eigenvalues. Therefore $\lambda_{\max}(V^{T}A^{T}AV)=\lambda_{\max}(A^{T}A)$ , therefore we have

[TABLE]

Hence we proved the theorem. ∎

2.2.1 Givens Rotations

Definition 2.10 (Givens rotation).

A Givens rotation has the form

[TABLE]

where $c=\cos(\theta)$ and $s=\sin(\theta)$ for some $\theta$ .

Given a vector $y\in\mathbb{R}^{n}$ and a Givens rotation $G(i,j,\theta)$ , the product $G(i,j,\theta)y$ rotate $y$ through $\theta$ radians clockwise in the $(i,j)$ plane. Hence it is obvious that we can make either $y_{i}$ or $y_{j}$ become zero by manipulating $\theta$ . This idea will be utilized in Section 3.1 in order to eliminate $(i,j)$ th entry of a symmetric matrix.

Proposition 2.11.

Givens rotations are orthogonal.

Proof.

Let $G=G(i,k,\theta)$ be a Givens rotation. By using the trigonometric identity $\cos^{2}(\theta)+\sin^{2}(\theta)=1$ , we have the following

[TABLE]

We can calculate the following matrix products $G^{T}G$ and $GG^{T}$ using (2.7).

[TABLE]

Hence $G$ is orthogonal. ∎

Notice that the matrix $G(p,q,\theta)$ can be written as

[TABLE]

where $e_{i}$ is the $i$ th column of the $n\times n$ identity matrix if $i\neq p,q$ and $e_{pq}$ and $e_{qp}$ are defined as

[TABLE]

Proposition 2.12.

Suppose $A\in\mathbb{R}^{n\times n}$ and $G(p,q,\theta)$ is a Givens rotation. If we define $B$ as

[TABLE]

then $B$ and $A$ are the same except in rows and columns $p$ and $q$ .

Proof.

Using the notations in (2.8) and (2.9), matrix $B$ can be written as

[TABLE]

Hence $B_{ij}=e_{i}^{T}Ae_{j}=A_{ij}$ if $i,j\neq p,q$ and for $i,j=p$ or $q$ , $B_{ij}\neq A_{ij}$ in general. ∎

2.2.2 Householder Transformations

Another important class of orthogonal matrix is the Householder transformations. Unlike the Givens rotation which replaces a specific entry into zero, Householder transformation can change multiple entries into zeros. In this section, we will first discuss how this matrix can set entries to zeros and then a MATLAB implementation will be present together with numerical testing.

Definition 2.13.

A Householder transformation (synonyms are Householder matrix and Householder reflector) is an $n\times n$ matrix $P$ of the form

[TABLE]

The vector $v$ is the Householder vector.

Householder transformation has useful properties,

Symmetry:

[TABLE] 2. 2.

Orthogonality:

[TABLE]

Transforming a Vector

The Householder transformation can be used to zero components of a vector. Suppose we have a vector $x\in\mathbb{R}^{n}$ and we want

[TABLE]

where $y(2:n)={0}$ . Since $P$ is orthogonal, we require $\|{y}\|_{2}=\|{x}\|_{2}$ , hence $y=\pm\|{x}\|_{2}e_{1}$ where $e_{1}$ is the first column of the $n\times n$ identity matrix. We can rewrite $v$ as

[TABLE]

and since $P$ is independent of the scaling of $v$ , hence for $x\neq y$ , we can set $\alpha=1$ and we can choose $v=x-y$ . We can check our choice by computing $Px$ : Firstly, we can compute $v^{T}x$ and $v^{T}v$

[TABLE]

Substitute these two components into (2.10)

[TABLE]

The choice of the sign of $y$ depends on the sign of the first entry of $x$ . Suppose $x$ is dominant by its first entry, namely $x$ is close to a multiple of $e_{1}$ , then $v=x-\mathrm{sign}\left({x_{1}}\right)\|{x}\|_{2}e_{1}$ can have a small norm due to cancellation and this may result a large relative error in $2/(v^{T}v)$ . This relative error can be avoided by setting

[TABLE]

as suggested in many textbooks, such as [14, ${2002}$ , Section 19.1] and [9, ${2013}$ , Section 5.1.3]. Then we get

[TABLE]

Given $x\neq{0}\in\mathbb{R}^{n}$ , instead of generating the Householder matrix $P$ such that it satisfies (2.11), we should only generate the Householder vector $v$ and the coefficient $\beta=2/(v^{T}v)$ . The benefit of doing so can be seen from the Householder QR factorization algorithm in section 4.2.1. Based on the previous discussion, we can conclude these into Algorithm 1.

Algorithm 1.

Given $x\neq{0}\in\mathbb{R}^{n}$ , this algorithm computes $v\in\mathbb{R}^{n}$ and $\beta\in\mathbb{R}$ such that $P=I_{n}-\beta vv^{T}$ is orthogonal and $Px=-\mathrm{sign}\left({x_{1}}\right)\|{x}\|_{2}e_{1}$ .

[TABLE]

Notice that, line 6 rewrites the form of $v_{1}$ to avoid subtractive cancellation as suggested by [9, ${2013}$ , Algorithm 5.1.1]

[TABLE]

Implementation and Testing

During the MATLAB implementation, we do not need to create a new vector $v$ . It is sufficient to just overwrite the values of $x$ since the only difference between $v$ and $x$ in Algorithm 1 is the first entry of these two vectors.

⬇

1function [x,b] = house(x)

2n = length(x); sigma = x(2:n)'*x(2:n);

3mu = sqrt(x(1)*x(1) + sigma);

4if x(1) >= 0, x(1) = x(1) + mu;

5else, x(1) = -sigma/(x(1) + mu); end

6b = 2/(x(1)*x(1) + sigma);

7end

To test the above function, we first generate a vector $x\in\mathbb{R}^{5}$ using randn(5,1).

⬇

1x' =

2 -1.3499e+00, 3.0349e+00, 7.2540e-01, -6.3055e-02, 7.1474e-01

Using the above function and call [v,b] = house(x), we can construct $P$ using the formula $P=I_{n}-\beta vv^{T}$ . Then we examine the product $Px$ and we have

⬇

1(P * x)' =

2 3.4748e+00 0 1.7694e-16 -9.1073e-18 1.1102e-16

The result satisfies our expectations.

The Householder transformation can quickly introduce zeros into a vector using orthogonal matrices and this is useful when we study the QR factorization.

2.3 Singular Value Decomposition

Theorem 2.14 (singular value decomposition).

If $A\in\mathbb{R}^{m\times n}$ , $m\geq n$ , then there exists two orthogonal matrices $U\in\mathbb{R}^{m\times m}$ and $V\in\mathbb{R}^{n\times n}$ such that

[TABLE]

where $\sigma_{1},\dots,\sigma_{p}$ are all non-negative and arranged in non-ascending order. We denote (2.12) as the singular value decomposition (SVD) of $A$ and $\sigma_{1},\dots,\sigma_{p}$ are the singular values of $A$ .

From now, without explicit mention, $A$ will be a square matrix of full rank (our input $Q_{\ell}$ is always square, full rank matrix). Therefore, if we have the singular value decomposition $A=U\Sigma V^{T}$ , then $U,V\in\mathbb{R}^{n\times n}$ are unitary and $\Sigma=\mathrm{diag}(\sigma_{1},\dots,\sigma_{n})\in\mathbb{R}^{n\times n}$ where $\sigma_{1}\geq\dots\geq\sigma_{n}>0$ .

From the definition, the singular values of $A$ can be computed via computing the eigenvalues of $A^{T}A=V\varLambda V^{T}$ , where $\varLambda=\Sigma^{2}$ . Hence, the singular values of $A$ are the positive roots of the eigenvalues of $A^{T}A$ . Based on this relationship, we have the following properties.

Corollary 2.15.

If $A\in\mathbb{R}^{n\times n}$ of full rank and normal, then $\sigma(A)=\lvert\lambda(A)\rvert$ .

Proof.

If $A$ is normal, then $A^{T}A=AA^{T}$ . By spectral theorem, if $A$ is normal, then $A=U\varLambda U^{T}$ where $U$ is unitary and $\varLambda$ is a diagonal matrix with eigenvalues of $A$ on the diagonal. Therefore, the singular values of $A$ are

[TABLE]

Combine this Corollary and (2.2), we have: if $A$ is normal, then $\|{A}\|_{2}=\max\{\lvert\lambda(A)\rvert\}$ . ∎

2.4 Test Matices

Throughout this thesis, we will perform lots of MATLAB testing on our algorithms, hence we want to generate real symmetric matrices $A$ with desired condition number and different eigenvalue distributions.

To accomplish such a goal, we use the MATLAB built-in function gallery111A MATLAB built-in function to generate test matrices. For the full manual, you may refer to https://www.mathworks.com/help/matlab/ref/gallery.html.. If we use gallery('randsvd',n, -kappa) where kappa is a positive integer, then it will generate a symmetric positive definite matrix $A\in\mathbb{R}^{n\times n}$ with $\kappa(A)=\mathtt{kappa}$ . Our code my_randsvd from Appendix B.1 is the modified version of gallery('randsvd') and we deliberately change some eigenvalues to negative such that the output matrix will not necessarily be positive definite. It provides two different eigenvalue distributions.

Mode ’geo’, A = my_randsvd(n, kappa, 'geo'): The output matrix $A\in\mathbb{R}^{n\times n}$ with $\kappa(A)=\mathtt{kappa}$ and the magnitude of the eigenvalues of $A$ are geometrically distributed. 2. 2.

Mode ’ari’, A = my_randsvd(n, kappa, 'ari'): The magnitude of the eigenvalues of $A$ are arithmetically distributed.

In Chapter 5, we will test our algorithm on matrices with different distributions. For the rest of the thesis, we will use mode ’geo’ by default.

Chapter 3 Jacobi Algorithm

The Jacobi algorithm for eigenvalues are first published by Jacobi in 1846 [16] and became widely used in the 1950s after the computer is invented. In this chapter, we will first derive the Jacobi algorithm and discuss its linear convergence. Then the rest of this chapter will focus on two different Jacobi algorithms: Classical and Cyclic-by-row. MATLAB implementation and testing will be given.

3.1 Derivation and Convergence

Given a symmetric matrix $A\in\mathbb{R}^{n\times n}$ , the idea of the Jacobi algorithm is that at the $k$ th step, we use $A^{(k+1)}=Q_{k}^{T}A^{(k)}Q_{k}$ to replace $A^{(k)}$ , where $Q_{k}$ is orthogonal. We aim to have the property that the off-diagonal entries of $A^{(k+1)}$ are smaller than the off-diagonal entries of $A^{(k)}$ . To do this, we use the Givens rotation matrix defined in section 2.2.1.

Definition 3.1 (off operator).

We define the quantity

[TABLE]

which is the Frobenius norm of the off-diagonal elements.

The aim of the Jacobi algorithm on $A$ can be translated to reduce the quantity $\textrm{off}(A)$ . This can be done in 2 steps.

Choosing a pair of index $(p,q)$ . We assume that $1\leq p<q\leq n$ . 2. 2.

Overwriting $A$ by applying $G(p,q,\theta)$ on the matrix $A$ in the way of

[TABLE]

where $\theta$ is chosen such that $A^{\prime}_{pq}=A^{\prime}_{qp}=0$ .

Using the Jacobi algorithm on $A$ , we produce a sequence of matrices $\big{\{}A^{(k)}\big{\}}_{k=0}^{\infty}$ , where

[TABLE]

This set of iterations satisfy

[TABLE]

In the rest of this section, we will discuss how we can achieve (3.1) using step 2 and how we can construct such a matrix $G(p,q,\theta)$ given the choice $(p,q)$ . Then in the next two sections, we present two different methods of choosing the index $(p,q)$

•

The Classical Jacobi Algorithm.

•

The Cyclic-by-row Jacobi Algorithm.

Considering the subproblem which only involves four entries of $A$ .

[TABLE]

Since $A$ is symmetric (symmetry is preserved by orthogonal transformations), we have $a_{pq}=a_{qp}$ . Hence we can make further simplification to (3.2)

[TABLE]

To achieve step 2, we require $b_{pq}=b_{qp}=0$ , which means

[TABLE]

This equality is essential for us to find the desired Givens rotation and the construction will be discussed in section 3.1.1.

If we take the Frobenius norm on the first line of (3.2) and set $b_{pq}=b_{qp}=0$ , since the Frobenius norm is orthogonally invariant norm (Theorem 2.9), we have

[TABLE]

Using the notations in proposition 2.12 and the condition $b_{pq}=b_{qp}=0$ , we have

[TABLE]

Therefore, after $k$ steps and for some choices of the index pair $(p,q)$ , we always have

[TABLE]

and this reduction will never terminate until all the off-diagonal entries are zero. However, since each time we introduced a zero, the previous entry we chose will not necessarily remain zero, hence the $\textrm{off}(A^{(k)})$ will require an infinite number of iterations to converge to zero. In Section 3.2, we proved that even if we choose the largest $a_{pq}$ at each iteration, we still require infinite iteration for $\textrm{off}(A^{(k)})$ converges to zero.

3.1.1 Choices of Givens rotation matrix

In this section, we provide a theory to choose $c$ and $s$ for constructing the Givens rotation matrix we need at each iteration. Firstly, if $a_{pq}=0$ , there is no need to do further work since $\textrm{off}(B)=\textrm{off}(A)$ and we can simply choose $c=1$ and $s=0$ which lead to an identity matrix. Otherwise, we look at (3.4)

[TABLE]

Notice that $c=\cos(\theta)$ and $s=\sin(\theta)$ , we have

[TABLE]

By define $t=s/c$ , we have transformed (3.6) into

[TABLE]

This quadratic equation can be solved and we would like to choose the root of smaller magnitude.

[TABLE]

Using the fact that $s^{2}+c^{2}=1$ and $t_{\mathrm{min}}=s/c$ , we can define $c$ and $s$ by

[TABLE]

Here we choose the smaller roots (in magnitude) of $t$ , so that we ensure $\max(t_{\mathrm{min}})=1$ (shown in Figure 3.1), hence the angle of rotation is bounded $|\theta|\leq\pi/4$ and this choice turns out to be important for the quadratic convergence as discussed by Schönhage [20, ${1961}$ ].

3.1.2 Implementation and Testing

Section 3.1.1 can be summarized in the following algorithm.

Algorithm 2.

Given a symmetric matrix $A\in\mathbb{R}^{n\times n}$ and integers $p,q$ such that $1\leq p<q\leq n$ . This algorithm computes a pair $(c,s)$ to construct $G(p,q,\theta)$ as presented in (2.6), such that if $B=G(p,q,\theta)^{T}AG(p,q,\theta)$ , then $b_{pq}=b_{qp}=0$ .

[TABLE]

When we do the MATLAB implementation, we need to make a minor change to lines 6 and 8. In practice, we need to change these to

[TABLE]

where we choose the positive sign if $\tau\geq 0$ and the minus sign if $\tau<0$ . It is easy to prove these two sets of expressions are the same in exact arithmetic, however, they differ in finite precision (See Appendix A.1) and (3.9) is usually preferable. Then it is straightforward to implement the choice using MATLAB.

⬇

1function [c,s] = jacobi_pair(A,p,q)

2if A(p,q) == 0

3 c = 1; s = 0;

4else

5 tau = (A(q,q)-A(p,p))/(2*A(p,q));

6 if tau >= 0

7 t = 1/(tau + sqrt(1+tau*tau));

8 else

9 t = 1/(tau - sqrt(1+tau*tau));

10 end

11 c = 1/sqrt(1+tt); s = tc;

12end

13end

We can test this function by constructing the matrix $G(p,q,\theta)$ with $(p,q)$ given and examine the $(p,q)$ entry using the following routine,

⬇

1clc; clear; close all;

2n = 100; format short e;

3A = my_testmatrices(n);

4pq = [57,64]; % choose a pair of index

5[c,s] = jacobi_pair(A,pq(1),pq(2));

6% construct the desire Givens rotation matrix

7G = eye(n); G([pq(1),pq(2)],[pq(1),pq(2)]) = [c,s;-s,c];

8B = G' * A * G;

9disp('Before'); disp(A(pq(1),pq(2))); disp(A(pq(2),pq(1)));

10disp('After');disp(B(pq(1),pq(2))); disp(B(pq(2),pq(1)));

This routine should give $b_{pq}=b_{qp}=0$ where $p=57$ and $q=64$ .

⬇

1Before

2 -9.4140e-01

3 -9.4140e-01

4After

5 -2.2204e-16

6 -1.1102e-16

Before applying the Givens rotations, the entries are $-9.414\times 10^{-1}$ and after such transformation we have $b_{pq}$ and $b_{qp}$ small enough for computers to consider them as zeros. Hence we successfully provide a way of constructing Givens rotations that achieve step 2 as discussed earlier in this section and the remaining part of this chapter is to sophistically choose the index pair $(p,q)$ .

3.2 Classical Jacobi Algorithm

From (3.6), we proved that $\textrm{off}(B)^{2}=\textrm{off}(A)^{2}-2a_{pq}^{2}$ , hence it is natural to consider the choice $(p,q)$ such that $a_{pq}^{2}$ is maximal. This choice was first proposed by Jacobi [16] in 1846 and the future literature refer to it as the classical Jacobi algorithm. We can summarize this choice into Algorithm 3.

Algorithm 3.

Given a symmetric matrix $A\in\mathbb{R}^{n\times n}$ and a positive tolerance $tol$ , this algorithm overwrites $A$ with $V^{T}AV$ where $V$ is orthogonal.

[TABLE]

The structure of Algorithm 3 adapts from [3, ${2001}$ , Algorithm 2.1] and the stopping criterion is adapted from [9, ${2013}$ , Section 8.5.5], [5, ${2000}$ , Theorem 1.1] and [4, ${1992}$ , Section 1].

3.2.1 Linear Convergence

Notice that since $|a_{pq}|$ is chosen to be the largest off-diagonal element, hence we have the inequality

[TABLE]

Using (3.6), we have

[TABLE]

Denote $A^{(k)}$ as the $k$ th Jacobi update of $A^{(0)}=A$ , then we have the iterative bound

[TABLE]

This implies that the classical Jacobi algorithm converges linearly to a diagonal matrix whose entries are the eigenvalues of $A$ , and they may appear in any order.

Remark 3.2.

Although the term $\textrm{off}(A^{(k)})$ decrease at a rate of $(1-1/N)^{1/2}$ , if we consider the $N=n(n-1)/2$ steps as one ‘group’ or ‘sweep’ of update, then we actually can expect asymptotic quadratic convergence [23, ${1962}$ ] where

[TABLE]

where $K$ is the number of sweeps.

3.2.2 Implementation and Testing

Before we implement the Jacobi algorithm, we need two functions (i) the function that calculates the Frobenius norm of the off-diagonal entries and (ii) the function that finds the index of the maximum entry in modulus.

(i) can be done by carefully selecting the diagonal entries and set to zero, and then calculating the Frobenius norm of the new matrix.

⬇

1function offA = off(A)

2n = length(A); % get the dimension of the matrix A

3A(1:n+1:n*n) = 0; % set the diagonal entries to zero

4offA = norm(A,"fro"); % calculate the norm

5end

The third line of the code will eliminate the diagonal entries, then the fourth line will give us the desired output, $\textrm{off}(A)$ .

(ii) can be implemented using the following function:

⬇

1function [p,q] = maxoff(A)

2n = length(A); % dimension of matrix A

3A(1:n+1:n*n) = 0; A = abs(A); % clear the diagonal entries

4[val, idx1] = max(A);

5[~, q] = max(val);

6p = idx1(q);

7end

At the fourth line, I use [val, idx1] = max(A) to find the index, idx1, of the maximum entries of each column and I store the values in val. Then we can apply max() again to find the index of the maximum entry of val, denoted as q. Finally, we locate the column number of the maximum entry by calling idx1(q) as shown in the sixth line.

Making use of these two functions, we can implement the classical Jacobi algorithm in MATLAB. The function jacobi_classical will take two inputs (i) the symmetric matrix $A^{(0)}\in\mathbb{R}^{n\times n}$ and (ii) the tolerance $tol$ . I choose to output two matrices, $A$ and $V$ such that $\|{A^{(0)}V-VA}\|_{2}\lesssim nu_{d}\|{A^{(0)}}\|_{2}$ and the number of iterations required.

Notice that if we would like to update $A^{(k)}$ by $G_{k}^{T}A^{(k)}G_{k}$ , since $A^{(k)},G_{k}\in\mathbb{R}^{n\times n}$ , the cost would be $\mathcal{O}(n^{3})$ flops. However, we can utilize proposition 2.12, namely, we only need to update the $p$ and $q$ th rows and columns of $A^{(k)}$ , and this version of the update will only take $O(n)$ flops. We can see this reduction in the following code

⬇

1[c,s] = jacobi_pair(A,p,q); G = [c s; -s c];

2A([p q],:) = G'*A([p q],:);

3A(:,[p q]) = A(:,[p q])*G;

From the code above, we get both the updated matrix $A^{(k+1)}$ and the orthogonal matrix $G_{k}$ . Line 2 requires a matrix multiplication between $\mathbb{R}^{2\times 2}\times\mathbb{R}^{2\times n}$ which needs $\mathcal{O}(n)$ flops, and the same argument applies to line 3. Hence when updating the matrix $A^{(k)}$ , we only require $\mathcal{O}(n)$ flops. Assemble all above, we have

⬇

1function [V,A,counter] = jacobi_classical(A,tol)

2counter = 0; n = length(A); V = eye(n); done_rot = true;

3tol1 = tol * norm(A);

4while done_rot

5 if isint(counter/(n*n)), A = (A + A')/2; end % maintain symmetry

6 done_rot = false; [p,q] = maxoff(A);

7 if abs(A(p,q)) > tol1 * sqrt(abs(A(p,p) * A(q,q)))

8 counter = counter + 1; done_rot = true;

9 [c,s] = jacobi_pair(A,p,q);

10 J = [c,s;-s,c];

11 A([p,q],:) = J'*A([p,q],:);

12 A(:,[p,q]) = A(:,[p,q]) * J;

13 V(:,[p,q]) = V(:,[p,q]) * J;

14 else

15 A = diag(diag(A)); % output diagonal matrix

16 break;

17 end

18end

19end

After several iterations, although in exact arithmetic, the matrix $A$ should always be symmetric, in floating-point arithmetic, we need to maintain its symmetry as shown in line 5. We can then test our code by the following routine

⬇

1clc;clear; format short e

2n = 1e2; A = randn(n); A = A + A'; tol = 2^(-53);

3[V,D,iter] = jacobi_classical(A,tol);

4disp('norm(AV - VD)'); disp(norm(A *V - V * D))

Here we should expect the norm $\|{AV-VD}\|_{2}$ is about $nu_{d}\|{A}\|_{2}$ which is $3.0224\times 10^{-13}$ in our test.

⬇

1norm(AV - VD)

2 4.3553e-13

Hence the algorithm works as expected.

3.3 Cyclic-by-row Jacobi Algorithm

The classical Jacobi method requires at each step the searching of $n(n-1)$ for one maximum entry in modulus. For $n$ is large, this can be extremely expensive. It will be better if we fixed a sequence of choice $(p,q)$ , here in cyclic order shown in the following scheme suggested by [10, ${1953}$ ] and [7, ${1960}$ , Section 1.2]

[TABLE]

We keep using $(p,q)$ in this fashion until we meet the required tolerance and this procedure can be described in Algorithm 4.

Algorithm 4.

Given a symmetric matrix $A\in\mathbb{R}^{n\times n}$ and a positive tolerance $tol$ , this algorithm overwrites $A$ with $V^{T}AV$ where $V$ is orthogonal.

[TABLE]

Similarly, the structure of Algorithm 4 adapts from [3, ${2001}$ , Algorithm 2.1] and the stopping criterion is adapted from [9, ${2013}$ , Section 8.5.5], [5, ${2000}$ , Theorem 1.1] and [4, ${1992}$ , Section 1].

3.3.1 Implementation and Testing

The MATLAB code is similar to the classical Jacobi algorithm, and the only difference is the way of choosing $(p,q)$ :

⬇

1function [V,A,iter] = jacobi_cyclic(A,tol,maxiter)

2n = length(A); V = eye(n); iter = 0; done_rot = true;

3while done_rot && iter < maxiter

4 done_rot = false;

5 for p = 1:n-1

6 for q = p+1:n

7 if abs(A(p,q)) > tol * sqrt(abs(A(p,p)*A(q,q)))

8 done_rot = true;

9 [c,s] = jacobi_pair(A,p,q);

10 J = [c,s;-s,c];

11 A([p,q],:) = J'*A([p,q],:);

12 A(:,[p,q]) = A(:,[p,q]) * J;

13 V(:,[p,q]) = V(:,[p,q]) * J;

14 end

15 end

16 end

17 if done_rot

18 A = (A + A')/2; iter = iter + 1;

19 else

20 A = diag(diag(A));

21 return;

22 end

23end

24end

Notice that, since the cyclic-by-row Jacobi algorithm does not reduce $\textrm{off}(A^{(k)})$ in the most optimal way, it may require much more iterations than the classical Jacobi algorithm. Hence we restrict the maximum number of iterations by maxiter at line 3.

Using the same routine as presented in Section 3.2.2, we have

⬇

1norm(AV - VD)

2 5.9258e-13

The required tolerance is $nu_{d}\|{A}\|_{2}\approx 3.0260\times 10^{-13}$ and therefore we indeed have a spectral decomposition of $A$ at double precision.

3.4 Comparision

In the previous two sections, we discussed both the classical Jacobi algorithm and the cyclic-by-row Jacobi algorithm. However, we need to choose between them. In this section, we will examine the performance concerning both the dimension $n$ and the condition number $\kappa(A)$ . We can generate Figure 3.2 using code in Appendix B.3. Notice that, in the code, we call a function call jacobi_eig and this is the assembled version of the Algorithm 3 and 4 shown in Appendix B.2.

From the right figure, we can see the condition number does not affect the time too much. However, from the left figure, we observe that the downside of searching for the maximum off-diagonal entry is significant for large $n$ . The cyclic-by-row Jacobi algorithm uses about $1/30$ of the time used by the classical Jacobi algorithm for $n=1000$ . Therefore, although the classical Jacobi reduces the $\textrm{off}(A)$ in the most optimal way, in practice, we should always stick with the cyclic-by-row Jacobi algorithm.

Chapter 4 Orthogonalization

4.1 Introduction

From Chapter 3, given a symmetric matrix $A\in\mathbb{R}^{n\times n}$ , we can use the Jacobi algorithm to find an spectral decomposition $A=Q\varLambda Q^{T}$ . However, for large $n$ , this procedure can still be time-consuming and is not generally as fast as the symmetric QR Algorithm [9, ${2013}$ , Section 8.5]. However, the Jacobi algorithm can exploit the situation that $A$ is almost diagonal ( $\textrm{off}(A)$ is small). In 2000, Drmač and Veselić proposed a way of speeding up this procedure using approximate eigenvectors as preconditioner [5, ${2000}$ , Section 1]. They proposed the following

Given a symmetric matrix $A\in\mathbb{R}^{n\times n}$ and its approximate orthogonal eigenvector matrix $Q_{\text{app}}$ . 2. 2.

Computing $A^{\prime}=Q_{\text{app}}^{T}AQ_{\text{app}}$ . 3. 3.

Diagonalizing $A^{\prime}$ .

Suppose $Q_{\text{app}}$ is orthogonal in double precision and step 2 and 3 are carried out at double precision, then the eigenvalues we computed will have a small relative error and more efficient than the pure Jacobi method [5, ${2000}$ , Section 2]. In this section, we will provide a way of constructing this approximate vector matrix:

Compute the spectral decomposition at single precision such that

[TABLE]

where $n$ is the dimension of the real symmetric matrix $A$ and $u_{\ell}$ is the unite roundoff at single precision. This can be achieved using eig function in MATLAB via the following routine

⬇

1[Q_low,D] = eig(single(A));

2Q_low = double(Q_low); 2. 2.

Orthogonalize the matrix $Q_{\ell}$ to $Q_{d}$ such that it is orthogonal at double precision,

[TABLE]

In this chapter, we will first review two methods to orthogonalize $Q_{\ell}$ : the QR factorization and the polar decomposition. We aim to find the orthogonal (at double precision) matrix $Q_{d}$ that minimizes the norm $\|{Q_{d}-U}\|_{2}$ where $U$ is the exact matrix of the eigenvectors of $A$ . However, in practice, we have no access to the matrix $U$ , therefore, we instead find the orthogonal matrix $Q_{d}$ that minimizes $\|{Q_{d}-Q_{\ell}}\|_{2}$ . This problem can be transformed to: Given $Q_{\ell}$ , find the orthogonal matrix $Q_{d}$ such that it satisfies

[TABLE]

Finally, the code in MATLAB will be presented and we will access them to decide which to use in practice.

4.2 The Householder QR Factorization

The QR factorization of $A\in\mathbb{R}^{n\times n}$ is a factorization

[TABLE]

where $Q\in\mathbb{R}^{n\times n}$ is orthogonal and $R\in\mathbb{R}^{n\times n}$ is upper triangular. Suppose we have a QR factorization at double precision of $Q_{\ell}$ ,

[TABLE]

where $Q_{d}$ is orthogonal at double precision, and this is already our desired preconditioner. The QR factorization can be achieved using Householder transformations in Section 2.2.2.

4.2.1 Theory of Householder QR Factorization

The idea can be illustrated using a small matrix $A\in\mathbb{R}^{4\times 4}$ :

[TABLE]

where $P_{k}$ , $k=1,2,3$ , are the Householder transformations. Notice that $P_{2}\in\mathbb{R}^{3\times 3}$ only applies to the bottom right corner of the matrix, and this can be done by embedding $P_{2}$ inside a larger $4\times 4$ matrix via

[TABLE]

Hence from the example above, we can generalize the Householder QR factorization for $A\in\mathbb{R}^{n\times n}$ . It will produce a sequence of matrices $\{A_{k}\}_{k=1}^{n}$ where $A_{1}$ is the input $A$ and $A_{n}$ is the upper triangular matrix $R$ . Notice, there are $k-1$ stages where the $k$ th stage transform $A_{k}$ to $A_{k+1}$ .

At the $k$ th stage of the Householder QR factorization, we can carefully partition the matrix $A_{k}$ into the following form

[TABLE]

where $R_{k-1}$ is upper triangular (achieved at the ( $k-1$ )th stage) and $x_{k}$ is the vector that we should focused on the $k$ th stage. Choose a Householder transformation $P_{k}$ such that $P_{k}x_{k}=\mathrm{sign}\left({x_{k,1}}\right)\|{x_{k}}\|_{2}e_{1}$ , where $e_{1}$ is the first column of $I_{n-k+1}$ and $x_{k,1}$ is the first entry of the vector $x_{k}$ . We then embed matrix $P_{k}$ into a larger matrix $\widetilde{P}_{k}\in\mathbb{R}^{n\times n}$ via

[TABLE]

Then let $A_{k+1}=\widetilde{P}_{k}A_{k}$ , we have

[TABLE]

which is closer to the upper triangular form. After $n-1$ steps, the matrix $A_{n}$ will be upper triangular and we denote as $R$ and we obtain the QR factorization of $A$

[TABLE]

Since $\widetilde{P}_{k}$ are composed of $P_{k}$ , Householder matrices, and the identity matrix. Hence it is obvious that $\widetilde{P}_{k}$ are also symmetric and orthogonal. Then we can construct $Q$ via $\widetilde{P}_{1}\cdots\widetilde{P}_{n-1}$ . This procedure can be summarized in the Algorithm 5.

Algorithm 5.

Given $A\in\mathbb{R}^{n\times n}$ , this algorithm computes an orthogonal matrix $Q$ and an upper triangular matrix $R$ such that $A=QR$ .

[TABLE]

Line $4$ is directly adapted from (4.3). Line $5$ can be viewed by partitioning the orthogonal matrix $Q_{k}$ into four parts and multiplying the matrix $\widetilde{P}_{k}$ structured as (4.2),

[TABLE]

Hence, at each step of the algorithm, we only update the final $n-k+1$ columns of the matrix $Q_{k}$ . Also, focused on the line 4, if we construct $P$ and do a matrix-matrix multiplication between $D_{k},P_{k}\in\mathbb{R}^{(n-k+1)\times(n-k+1)}$ where $D_{k}=[x_{k},B_{k}]$ , then the cost will be about $2(n-k+1)^{3}$ . If we utilize the components we computed from Algorithm 1, the Householder vector $v$ and the coefficient $\beta$ , we can reduce the computational cost by doing the matrix-vector products instead. We can rewrite $P_{k}D_{k}$ using $v$ and $\beta$

[TABLE]

By this procedure, we transform a matrix-matrix multiplication to

Vector-matrix multiplication: $v^{T}D_{k}$ requires $2(n-k+1)^{2}$ flops. 2. 2.

Scalar-vector inner product: $\beta v$ requires $O(n-k+1)$ flops. 3. 3.

Vector-vector outer product: $(\beta v)\cdot(v^{T}D_{k})$ requires $(n-k+1)^{2}$ flops. 4. 4.

Matrix-matrix subtraction: $D_{k}-(\beta v)\cdot(v^{T}D_{k})$ requires $O(n-k+1)^{2}$ flops.

Therefore, the overall cost will be $4(n-k+1)^{2}$ flops. Compare with the matrix-matrix multiplication which requires $O((n-k+1)^{3})$ flops, this approach is much more efficient.

4.2.2 Implementation and Testing

Based on these analyses in Section 4.2.1, we can implement it into MATLAB.

⬇

1function [Q,A] = myqr(A)

2[m,n] = size(A); Q = eye(n);

3if m ~= n, error('Input should be a square matrix.'), end

4for k = 1:n-1

5[v,b] = house(A(k:n,k)); % compute the components of P

6A(k:n,k:n) = A(k:n,k:n) - (bv)(v'*A(k:n,k:n)); % update A

7Q(:,k:n) = Q(:,k:n) - (Q(:,k:n)v)(b*v'); % update Q

8end

Recall from the last section, we discussed that at the $k$ th step, updating $A$ requires about $4(n-k+1)^{2}$ flops. Similarly, we can update $Q$ based on (4.4) which only involves $(n-k+1)$ columns of $Q$ . Therefore, line 7 requires $4(n-k+1)n$ flops at the $k$ th step. We can then compute the theoretical overall cost,

[TABLE]

To test our code, we are focused on two quantities, $\|{Q^{T}Q-I}\|_{2}$ and $\|{QR-A}\|_{2}$ . The former evaluates whether our computed $Q$ is orthogonal at double precision and the latter evaluates whether we have a QR factorization. In theory, we should have

[TABLE]

We can use the following MATLAB routine, notice that we add the accuracy of the MATLAB built-in function qr as a reference.

⬇

1clc; clear; format short e;

2n = 1e2; ud = 2^(-53); A = randn(n);

3[Q,R] = myqr(A); [Q1,R1] = qr(A); % qr factorization

4orthg = norm(Q'*Q - eye(n)); % check orthogonality

5qralg = norm(Q*R - A)/norm(A); % check if we have a qr factorization

6fprintf('Orthogonal? %d \nMy QR accuracy is %d\n',orthg, qralg);

7fprintf('MATLAB qr function accuracy is %d \n', norm(Q1*R1 - A)/norm(A));

8fprintf('n*(machine precision) = %d\n', ud*n);

And we have the output:

⬇

1Orthogonal? 3.567685e-15

2My QR accuracy is 1.493857e-15

3MATLAB qr function accuracy is 1.026458e-15

4n*(machine precision) = 1.110223e-14

The first and second outputs are all smaller than $nu_{d}\approx 10^{-13}$ therefore our code works well. In addition, the second and the third outputs are the accuracy of functions myqr and qr and we can see these are about the same, therefore our code achieves similar accuracy as the MATLAB built-in function.

Moreover, I am interested in how $\|{Q^{T}Q-I_{n}}\|_{2}$ behaved as $n$ increases. From Figure 4.1, we can see $\|{Q^{T}Q-I_{n}}\|_{2}$ does not increase too much when $n$ increases and is well bounded by the reference line $nu_{d}$ .

By now, we have a way to orthogonalize a matrix. However, this approach does not utilize the property that our input matrix $Q_{\ell}$ is an almost orthogonal matrix. Namely, applying myqr to $Q_{\ell}$ has no difference from applying myqr to a general matrix. In the next section, we will introduce a way to orthogonalize a matrix which exploits the fact that the input is almost orthogonal.

4.3 The Polar Decomposition

In complex analysis, it is known that for any $\alpha\in\mathbb{C}$ , we can write $\alpha$ in polar form, namely $\alpha=r\mathrm{e}^{\mathrm{i}\theta}$ . The polar decomposition is its matrix analogue. The polar decomposition can be derived from the singular value decomposition discussed in Section 2.3. Conventionally, we will present our analysis in complex but this can be restricted to real. All the norm $\|{\cdot}\|$ discussed in this section will be the orthogonally invariant norm.

Theorem 4.1 (Polar decomposition).

If $A\in\mathbb{C}^{n\times n}$ , then there exists a unitary matrix $U\in\mathbb{C}^{n\times n}$ and a unique Hermitian positive semidefinite matrix $H\in\mathbb{C}^{n\times n}$ such that $A=UH$ . We call $U$ the unitary factor.

Proof.

By Theorem 2.14, suppose $A\in\mathbb{C}^{n\times n}$ and $\operatorname{rank}(A)=r$ , then the SVD of $A$ can be written as

[TABLE]

where $\Sigma_{r}=\mathrm{diag}(\sigma_{1},\dots,\sigma_{r},0,\dots,0)\in\mathbb{R}^{n\times n}$ . Unitarity of $U$ can be easily proved by its definition

[TABLE]

$H$ is a positive semidefinite due to its eigenvalues are the singular values of $A$ which are all non-negative. By forming $A^{*}A$ , we have $A^{*}A=H^{*}U^{*}UH=H^{*}H=H^{2}$ , therefore $H$ can be uniquely determined by $(A^{*}A)^{1/2}$ [15, ${2008}$ , Theorem 1.29]. ∎

Notice that, if $A$ is full rank, then $H$ is clearly non-singular by construction, therefore $U$ can be uniquely determined by $U=AH^{-1}$ . Similar to the QR factorization, we have a decomposition of $A$ where one of the factors is unitary, and we expect that we can use this unitary factor as the preconditioner. The next section gives a special property which shows the unitary factor $U$ of the polar decomposition of $A$ is the one that minimizes the distance $\|{A-U}\|$ .

4.3.1 Best Approximation Property

The unitary factor $U$ of the polar decomposition of $A=UH$ satisfies the best approximation property.

Theorem 4.2 ([6, ${1955}$ , Theorem 1],[15, ${2008}$ , Theorem 8.4]).

Let $A\in\mathbb{C}^{n\times n}$ and $A=UH$ is its polar decomposition. Then

[TABLE]

holds for every unitarily invariant norm.

To prove Theorem 4.2, we need the following two results:

Lemma 4.3 ([18, ${1960}$ , Lemma 4]).

Let $X,Y$ be Hermitian matrices, and denote by

[TABLE]

be the eigenvalues of $X,Y$ and $X-Y$ respectively. Then if we denote by $(\xi-\eta)_{1},\dots,(\xi-\eta)_{n}$ the numbers $\xi_{1}-\eta_{1},\dots,\xi_{n}-\eta_{n}$ arranged in non-ascending order of magnitude, then we have

[TABLE]

Proof.

See [6, ${1955}$ , Theorem 2]. ∎

Theorem 4.4 ([18, ${1960}$ , Theorem 5], [15, ${2008}$ , Theorem B.3]).

Let $\rho_{1}\geq\cdots\geq\rho_{n}$ and $\sigma_{1}\geq\cdots\geq\sigma_{n}$ be the singular values of the complex matrices $A$ and $B$ respectively. Then

[TABLE]

where $\Sigma_{A}$ and $\Sigma_{B}$ are diagonal matrices with entries the singular values of $A$ and $B$ arranged in non-ascending order respectively.

Proof.

We have the fact by H. Wielandt mentioned in [6, ${1955}$ , p. 113]: For any matrix $M$ of order $n\times n$ , the Hermitian matrix

[TABLE]

of order $2n\times 2n$ . The eigenvalues of $\widetilde{M}$ are precisely the singular values of $M$ and their negatives. We denote $\tau_{1}\geq\cdots\geq\tau_{n}$ the singular values of $A-B$ and we have

[TABLE]

and the eigenvalues of these three Hermitian matrices are

[TABLE]

By Lemma 4.3, if we denote $(\rho-\sigma)_{1},\dots,(\rho-\sigma)_{n}$ the numbers $\rho_{1}-\sigma_{1},\dots,\rho_{n}-\sigma_{n}$ arranged in non-ascending order of magnitude, then we have

[TABLE]

Here we focus on the proof for $2$ -norm, to see the proof considering all the orthogonally invariant norms, refer to [18, ${1960}$ , Theorem 1]. By Corollary 2.15

[TABLE]

Proof of Theorem 4.2.

For any unitary matrix $Q$ , it has singular values all equal to $1$ . Hence by Theorem 4.4, if $A$ has the singular value decomposition $A=P\Sigma_{A}V^{*}$ , then

[TABLE]

Remain to prove that $\|{A-U}\|=\|{\Sigma_{A}-I_{n}}\|$ . From Theorem 4.1, the Hermitian part $H$ can be written as $V\Sigma_{A}V^{*}$ , hence

[TABLE]

which completes the proof. ∎

This property indicates that the unitary factor of the input $Q_{\ell}$ will be the desired orthogonal matrix and then we can use it as the preconditioner. The rest of this section will focus on how to compute this factor.

4.3.2 SVD approach

The most obvious approach will be the singular value decomposition approach.

For $A\in\mathbb{C}^{n\times n}$ , by Theorem 4.1, we can use the singular value decomposition to compute the polar decomposition using the following procedure:

Compute the singular value decomposition of $A\in\mathbb{C}^{n\times n}$ where $A=P\Sigma V^{*}$ . 2. 2.

Form $U$ via (4.5), $U=PV^{*}$ .

The computational cost of computing the singular value decomposition is approximately $21n^{3}$ flops [9, ${2013}$ , Figure 8.6.1] with $O(n^{3})$ flops to form $U$ . This is good generally, but it does not utilize the property that, in our scenario, $A$ is almost orthogonal ( $\|{A^{*}A-I_{n}}\|_{2}$ is small). The next two sections will discuss a more sophisticated way of computing the unitary factor.

4.3.3 Newton’s Method Approach

Consider the Newton iteration [15, ${2008}$ , Section 8.3]

[TABLE]

where $(X_{k}^{-*})$ denotes the conjugate transpose of the inverse of $X$ . The following theorem shows the iteration (4.6) converges to the unitary factor of $A$ .

Theorem 4.5 (Newton iteration for the unitary polar factor).

Suppose $A\in\mathbb{C}^{n\times n}$ is nonsingular, then the iteration (4.6) produces a sequence $\{X_{k}\}_{k}$ such that

[TABLE]

where $U$ is the unitary factor of the polar decomposition of $A$ .

Proof.

See [13, ${1986}$ , Section 3.2].

Let the singular value decomposition of $A$ be $P\Sigma Q^{*}$ , where $P$ and $Q$ are unitary. Then by Theorem 4.1, we have $A=UH$ where $U=PQ^{*}$ and $H=Q\Sigma Q^{*}$ . The trick of the proof is to define a new sequence ${D_{k}}$ where

[TABLE]

From iteration (4.6), we have

[TABLE]

Notice that $P^{*}X_{k}Q=D_{k}$ and $P^{*}X_{k}^{-*}Q=(P^{*}X_{k}Q)^{-*}$ , hence the iteration becomes

[TABLE]

Since $D_{0}$ is diagonal with positive diagonal entries (since $A$ is full rank), hence the iteration (4.8) is well-defined and will produce a sequence of diagonal matrices $\{D_{k}\}$ and we can write $D_{k}=\mathrm{diag}(d_{i}^{(k)})$ , where $d_{i}^{(k)}>0$ . Using this notation, we can rewrite the iteration (4.8) elementwise:

[TABLE]

where $i=1,\dots,n$ . By an argument discussed in [12, ${1964}$ , p. 84], we can write

[TABLE]

This implies that the diagonal entries of $D_{k}$ converge to $1$ quadratically as $k\to\infty$ . Since the argument holds for any $i=1,\dots,n$ , we conclude that $\lim_{k\to\infty}D_{k}=I_{n}$ and use the definition of $D_{k}$ we have

[TABLE]

as $k\to\infty$ , where $U$ is the unitary factor of the polar decomposition of $A$ . ∎

The Newton iteration for the unitary factor of a square matrix $A$ can also be derived by applying the Newton’s method to the equation $X^{*}X=I_{n}$ . Consider a function $F(X)=X^{*}X-I_{n}$ , then $X$ is unitary if $X$ is a zero of the function $F(X)$ .

Suppose $Y$ is an approximate solution to the equation $F(\widetilde{X})=0$ , we can write $\widetilde{X}=Y+E$ and substitute $\widetilde{X}$ into the equation $F(\widetilde{X})=0$

[TABLE]

Dropping the second order term, we get Newton iteration $Y^{*}Y+Y^{*}E+E^{*}Y-I_{n}=0$ and this is the Sylvester equation for $E$ [15, ${2008}$ , Problem 8.18]. We aim to solve $E$ and update $Y$ by $Y+E$ , which gives the Newton iteration at the $k$ th step:

[TABLE]

Assume that $X_{k}^{*}E_{k}$ is Hermitian, namely $X_{k}^{*}E_{k}=E_{k}^{*}X_{k}$ , then we can further simplify the iteration by rewritten (4.10) as $X_{k}^{*}E_{k}=(I_{n}-X_{k}^{*}X_{k})/2$ and the expression we got for $X_{k}^{*}E_{k}$ is indeed Hermitian, hence our assumption is valid. Therefore, we have a explicit expression for $E_{k}$ , $E_{k}=(X_{k}^{-*}-X_{k})/2$ , and the iteration (4.10) becomes

[TABLE]

choosing $X_{0}=A$ , we have our desired Newton iteration (4.6).

Convergence of Newton Iteration

The convergence of the iteration (4.6) can be seen from the proof of Theorem 4.5.

Theorem 4.6 (Convergence of the iteration (4.6), [15, ${2008}$ , Theorem 8.12]).

Let $A\in\mathbb{C}^{n\times n}$ be non-singular. Then the Newton iterates $X_{k}$ in (4.6) converges quadratically to the unitary polar factor $U$ of $A$ with

[TABLE]

where $\|{\cdot}\|$ is any unitarily invariant norm.

Proof.

Given $A\in\mathbb{C}^{n\times n}$ , the singular value decomposition of $A$ is $P\Sigma Q^{*}$ and the unitary factor is $U=PQ^{*}$ . From (4.9), we have

[TABLE]

This is true for all $1\leq i\leq n$ , therefore we can rewrite this in matrix form

[TABLE]

Recall $D_{k}=\mathrm{diag}(d_{i}^{(k)})$ , then (4.11) is same as $D_{k+1}-I=D_{k}^{-1}(D_{k}-I)^{2}/2$ . Therefore we can conclude that $D_{k}$ converges to the identity $I$ quadratically and therefore $X_{k}$ converges to the unitary factor $U$ quadratically.

From $D_{k+1}-I=D_{k}^{-1}(D_{k}-I)^{2}/2$ , by taking the norm on both sides we have

[TABLE]

The final inequality comes from the fact that every unitarily invariant norm is subordinate [2, ${1997}$ , Proposition IV 2.4]. Also, the unitary invariance implies the following three equalities,

•

$\|{D_{k+1}-I}\|=\|{PD_{k+1}Q^{*}-PQ^{*}}\|=\|{X_{k+1}-U}\|$ .

•

$\|{D_{k}^{-1}}\|=\|{QD_{k}^{-1}P^{*}}\|=\|{X_{k}^{-1}}\|$ .

•

Using the same argument as 1, we have $\|{D_{k}-I}\|=\|{X_{k}-U}\|$ .

As a result, (4.12) can be rewritten as

[TABLE]

∎

We can test the iteration (4.6) by the following code

⬇

1A = my_testmatrices(10,100); X = A;

2for i = 1:10

3Y = inv(X);

4X = (Y' + X)/2;

5end

6[P,~,V] = svd(A); U = P*V';

7disp(norm(X-U));

With only $10$ iterations, the distance between our computed polar factor and the theoretical polar factor (formed by singular value decomposition) is as small as the unit roundoff. However, the disadvantage of this iteration is that it requires the explicit form of the inverse of a matrix at each step, therefore we can introduce another method using one step of the Schulz iteration [1, ${1965}$ ] which will remove the requirement of the matrix inverse.

4.3.4 Newton–Schulz Iteration Approach

Given $A\in\mathbb{C}^{n\times n}$ , an iterative method for computing the inverse $A^{-1}$ , proposed by Schulz [21, ${1933}$ , p. 58], is defined by

[TABLE]

where $X_{0}$ is an approximation to $A^{-1}$ . Based on the (4.6), we can use a one-step Schulz iteration to approximate the inverse of $X_{k}^{*}$ which can be written as $Y_{1}=Y_{0}(2I-X_{k}^{*}Y_{0})$ , where $Y_{0}$ is an approximation to $X_{k}^{-*}$ . In our situation, the input $X_{0}$ is an almost unitary matrix ( $\|{X_{0}^{*}X_{0}-I}\|_{2}$ is small) and the output is $U$ which is unitary. Hence the sequence generated by the Newton iteration, $\{X_{k}\}$ , should always be almost unitary, therefore $X_{k}$ should be a good approximate to $(X_{k}^{*})^{-1}$ . The idea of Newton–Schulz iteration can be formulated by the Algorithm 6.

Algorithm 6.

Given $A\in\mathbb{C}^{n\times n}$ such that $\|{A^{*}A-I}\|_{2}\approx u_{\ell}$ , the algorithm computes the unitary factor $U$ of $A$ defined by Theorem 4.1.

[TABLE]

Notice that, the one-step Schulz iteration can be embedded into the Newton iteration by substituting the $X_{k}^{-*}$ from line $3$ into line $4$ and we have

[TABLE]

and this is known as the Newton–Schulz iteration [15, ${2008}$ , Section 8.3].

Convergence of Newton–Schulz Iteration

We would like to know, the iteration (4.14) converges to $1$ under which condition. Using the similar approach as proof of Theorem 4.5, by writing $D_{k}=P^{*}X_{k}Q$ where $P$ and $Q$ are the unitary parts of the singular value decomposition of $A=P\Sigma Q^{*}$ , we can consider the iteration (4.14) as the scalar form:

[TABLE]

where $\sigma_{i}$ is the $i$ th singular value of $A$ . To see the convergence of (4.14), we first find the interval of convergence of (4.15).

Lemma 4.7.

If $\sigma_{i}\in(0,\sqrt{3})$ , then the iteration (4.15) converges quadratically to $1$ .

Proof.

If $\sigma_{i}=1$ , then the iteration converges immediately. To simplify the notation, we rewrite the iteration (4.15) as

[TABLE]

Case 1: If $d_{k}\in(0,1)$ , then $(3-d_{k}^{2})/2>1$ which gives $d_{k}(3-d_{k}^{2})/2>d_{k}$ . Differentiate $f(d_{k})$ once we have $f^{\prime}(d_{k})=-d_{k}^{2}+(3-d_{k}^{2})/2$ and this will always be positive for $d_{k}\in(0,1)$ . Therefore, if we start with $d_{k}\in(0,1)$ , then $d_{k}<f(d_{k})=d_{k+1}<f(1)=1$ , namely, start from $d_{0}\in(0,1)$ , then the iteration (4.16) will converges to $1$ .

Case 2: If $d_{k}\in(1,\sqrt{3})$ , then by similar approach we have the following

[TABLE]

Therefore, the function $f$ will monotonically decreasing from $1$ to [math] within the interval $(1,\sqrt{3})$ and $\sup_{d_{k}\in(1,\sqrt{3})}f(d_{k})=1$ . Hence, if $d_{k}\in(0,\sqrt{3})$ , then $f(d_{k})$ will belongs to $(0,1)$ , then the iteration (4.16) converges to $1$ as discussed in case $1$ . The properties of the function $f(d_{k})$ in both case $1$ and case $2$ can be seen from the Figure 4.2 which shows that for $d_{k}\in(0,1)$ , the function $f$ monotonically increasing from [math] to $1$ and for $d_{k}\in(1,\sqrt{3})$ , the function $f$ monotonically decreasing from $1$ to [math].

Other parts of the real line may still converges to $1$ but the interval of converges are too short to consider and control (for example, if we start with $d_{0}=2.2$ , then $d_{1}=f(d_{0})=-2.0240$ and $d_{2}=1.1097\in(0,\sqrt{3})$ which will converges to $1$ ).

Remain to show that the convergence is of order $2$ . By consider the difference $d_{k+1}-1$ (similar approach as (4.9)), we have

[TABLE]

which implies quadratic convergence. ∎

From the quadratic convergence of the scalar Newton–Schulz iteration, we have the following theorem.

Theorem 4.8 (Convergence of Newton–Schulz iteration (4.14)).

For $A\in\mathbb{C}^{n\times n}$ of rank $n$ which has the polar decomposition $A=UH$ , if $\sigma_{i}(A)\in(0,\sqrt{3})$ or equivalently $\|{A}\|_{2}<\sqrt{3}$ , then the Newton–Schulz iteration

[TABLE]

converges to the unitary factor $U$ quadratically as $k\to\infty$ and

[TABLE]

Proof.

The result is obvious follows the proof of Theorem 4.5 and 4.6 and Lemma 4.7. ∎

Implementation and Testing

The Newton–Schulz iteration can be utilized on the almost orthogonal matrix $Q_{\ell}$ as discussed in Section 4.1 since the eigenvalues of $Q_{\ell}$ are all near $1$ which definitely satisfies the convergence condition for the Newton–Schulz iteration. For $A\in\mathbb{C}^{n\times n}$ , iteration (4.14) can be formalized by Algorithm 7 with restrictions on the singular values of $A$ based on Theorem 4.8.

Algorithm 7.

Given a matrix $A\in\mathbb{C}^{n\times n}$ with $\sigma(A)\subset(0,\sqrt{3})$ , this algorithm computes the unitary factor $U$ of the polar decomposition of $A=UH$ using the Newton–Schulz iteration (4.14).

[TABLE]

Notice that, in line 3, we use the Frobenius norm to examine the orthogonality instead of the 2-norm, this is due to the Frobenius norm being easier and cheaper to compute. We can further simplify the algorithm based on the following analysis.

From now on, the explanation will focus on the 2-norm, but it can be generalized to all unitarily invariant norms. Suppose $Q_{\ell}$ has a singular value decomposition $Q_{\ell}=P\Sigma Q^{*}$ , then we can rewrite $\|{Q_{\ell}^{*}Q_{\ell}-I}\|_{2}$ as

[TABLE]

Since $\Sigma=\mathrm{diag}(\sigma_{1},\dots,\sigma_{n})$ where $\sigma_{i}$ are sorted in non-ascending order. Using Corollary 2.15 and the fact that the singular values of the almost unitary matrix are around one, we have $\|{\Sigma^{2}-I}\|_{2}\approx{\sigma_{1}^{2}-1}\approx n\cdot 10^{-8}$ , therefore

[TABLE]

From relation (4.17) and the fact that $\|{Q_{\ell}-U}\|_{2}=\|{\Sigma-I}\|_{2}$ , using the notations in (4.14), we have

[TABLE]

Similarly, after the second iteration, we have

[TABLE]

We can compare this quantity with the desired tolerance $nu_{d}$ and we can produce Figure 4.3 using Appendix B.5.

From Figure 4.3, we observe that for $n$ less than $52000$ , Algorithm 7 will converge in $2$ steps. If the dimension exceeds $52400$ , by the same analysis, for any $Q_{\ell}\in\mathbb{R}^{n\times n}$ , if $n<2172548$ , then the Newton–Schulz iteration will converge in three steps. In practice, MATLAB is not able to generate such a big matrix. Therefore, we can remove the terminate condition in Algorithm 7 as shown in Algorithm 8,

Algorithm 8.

Given a matrix $Q_{\ell}\in\mathbb{C}^{n\times n}$ such that $\|{Q_{\ell}^{*}Q_{\ell}-I}\|_{2}\approx n\cdot 10^{-8}$ and $n<2\times 10^{6}$ , this algorithm computes the unitary factor $U$ of the polar decomposition of $Q_{\ell}=UH$ using the Newton–Schulz iteration (4.14).

[TABLE]

Implementation of Algorithm 8 in MATLAB gives

⬇

1function A = newton_schulz(A)

2[~,n] = size(A);

3if n >= 52000, iter = 3;

4else, iter = 2; end

5for k = 1:iter

6A = 0.5A(3*eye(n) - A'*A);

7end

For each iteration, there are only 2 matrix-matrix multiplications which cost $4n^{3}$ flops and hence the total cost when applying newton_schulz to $A\in\mathbb{C}^{n\times n}$ will be about $8n^{3}+O(n^{2})$ flops. Compare to the singular value decomposition approach discussed in Section 4.3.2 which costs about $21n^{3}$ , the Newton–Schulz iteration is usually the preferred method when the input is almost unitary.

We will perform the following test: Given a matrix $A\in\mathbb{R}^{n\times n}$ with fixed condition number, here $\kappa(A)=100$ , we compute its spectral decomposition at single precision $A=QDQ^{*}$ . Then we use this $Q$ as our input of newton_schulz and see how the quantity $\|{Q_{d}^{*}Q_{d}-I}\|_{2}$ , where $Q_{d}$ is the output of the function, changes as the dimension $n$ changes. Using the code from Appendix B.6, we produce Figure 4.4.

From Figure 4.4, the quantity $\|{Q_{d}^{*}Q_{d}-I}\|_{2}$ is well bounded by the tolerance $nu_{d}$ and the result is consistent with our expectations and analyses.

4.4 Numerical Comparison

In this section, we will examine the performance and accuracy of both the QR factorization and the polar decomposition using the Newton–Schulz iteration. Theoretically, for our almost orthogonal matrix $Q_{\ell}$ , the QR factorization requires about $10/3n^{3}$ flops and the Newton–Schulz iteration method requires about $8n^{3}$ flops. However, MATLAB computes matrix-matrix products faster, hence Newton–Schulz iteration has the potential to be faster than the QR factorization. We set up our numerical comparison via the following

Generate the symmetric matrix $A\in\mathbb{R}^{n\times n}$ and its approximate spectral decomposition $A=Q_{\ell}D_{\ell}Q_{\ell}^{*}$ . Here, we fixed $\kappa(A)=100$ . Even though no matter how we change the condition number of $A$ , the eigenvector matrix at single precision, $Q_{\ell}$ , will always have condition number near $1$ , recall this is also the reason why we can always apply the Newton–Schulz iteration to the matrix $Q_{\ell}$ . 2. 2.

For different $n$ , applying myqr and newton_schulz to $Q_{\ell}$ and output $Q_{d}$ which has been numerically proved that it is orthogonal at double precision, then we examine these functions via

(a)

Accuracy: $\|{Q^{*}Q-I}\|_{2}$ . 2. (b)

Performance : time used measured by tic,toc.

4.4.1 Accuracy

The accuracy is measured by $\|{Q_{d}^{*}Q_{d}-I}\|_{2}$ where $Q_{d}$ is the output of myqr and newton_schulz. From previous testings, both QR factorization and Newton–Schulz iteration are accurate enough to generate an orthogonal matrix at double precision. Therefore, here we are comparing the accuracy between these two methods shown in Figure 4.5. In addition, we add the accuracy of the MATLAB built-in function qr() served as a reference line.

From the figure, we can see that the Newton–Schulz iteration is generally more accurate than the QR factorization method.

4.4.2 Performance

The performance can be measured by the MATLAB built-in function tic and toc. Using similar code from Section 4.4.1, we are able to compare the time used by myqr and newton_schulz and produce Figure 4.6.

From the figure, the Newton–Schulz iteration is generally faster than the QR factorization approach.

Consequently, based on the accuracy and performance, we shall stick with the Newton–Schulz iteration approach as the method to orthogonalize the matrix $Q_{\ell}$ .

Chapter 5 Mixed Precision Jacobi Algorithm

In this chapter, a mixed precision algorithm will be presented with intensive testings on its convergence, speed and accuracy. As mentioned in the Section 2.4, we will perform the testings for both mode 'geo' and mode 'ari'.

5.1 Algorithm

As discussed from [5, ${2000}$ ], we can improve the Jacobi algorithm by using a preconditioner. From the discussions in Chapter 3 and 4, we assemble them all and collapse into the following algorithm,

Algorithm 9.

Given a symmetric matrix $A\in\mathbb{R}^{n\times n}$ and a positive tolerance $tol$ , this algorithm computes the spectral decomposition of $A$ using the cyclic-by-row Jacobi algorithm with preconditioning.

[TABLE]

5.1.1 Implementation and Testing

It is straightforward to implement Algorithm 9 by utilizing the existing codes. MATLAB code can be found in Appendix B.9.

We would like to first access its accuracy concerning both the dimension and the condition number. Using code in Appendix B.10 we can produce Figure 5.1.

As shown from the figure, the Algorithm 9 gives more accurate results for the matrices with geometrically distributed singular values. But regardless of the distributions, the computed $Q$ and $\varLambda$ satisfies the condition

[TABLE]

and therefore this algorithm can be used to find the spectral decomposition of a real symmetric matrix in practice. The remaining of this chapter is to find out two things

How many sweeps of Jacobi will be necessary for convergence for a preconditioned real symmetric matrix $A_{\text{cond}}$ as described in Algorithm 9? 2. 2.

How much time can be saved by using preconditioning?

5.2 Convergence and Speed

In this section, we will analyze how many sweeps will be sufficient for the cyclic-by-row Jacobi algorithm converges for a preconditioned symmetric matrix $A_{\text{cond}}$ .

5.2.1 Preconditioning

Using the notations from Algorithm 9, $Q_{d}$ is the nearest orthogonal matrix compare to the approximate eigenvector matrix $Q_{\ell}$ , hence $A_{\text{cond}}$ is expected to be almost diagonal. Consequently, we expect this eigenvalue problem should be able to be solved more quickly by the Jacobi algorithm [5, ${2000}$ , Sec. 1, Corollary 3.2]. Quantitatively, after preconditioning, the magnitude of the off-diagonal entries of $A_{\text{cond}}$ should be no more than $nu_{\ell}\|{A}\|_{2}$ , which is equivalently saying $\textrm{off}(A_{\text{cond}})\lesssim n^{2}(n-1)u_{\ell}\|{A}\|_{2}$ . In practice, this bound is sharper than the theoretical bound and we can have

[TABLE]

This upper bound can be numerically tested by the following procedure: First, generate a symmetric matrix $A\in\mathbb{R}^{n\times n}$ with controlled dimension and condition number, then we perform steps 1,2 and 3 in the Algorithm 9 to see how much $\textrm{off}(A)$ is reduced based on this precondition technique. Using the code in Appendix B.11, we generate Figure 5.2.

From the top-left, top-right and bottom-left parts of the Figure 5.2, we see that for $n=50,250$ and $500$ , the quantity $\textrm{off}(A_{\text{cond}})/\|{A}\|_{2}$ is always smaller or similar to $nu_{\ell}$ regardless of its singular values distribution. Notice that, as $n$ changes from $50$ to $500$ , the perturbation of the quantity $\textrm{off}(A_{\text{cond}})/\|{A}\|_{2}$ becomes more and more negligible compare to $nu_{\ell}$ . Together with the bottom-right figure, as the size of $A$ passes $100$ , the quantity $\textrm{off}(A_{\text{cond}})/\|{A}\|_{2}$ is always well bounded by $nu_{\ell}$ . Based on these observations, we verify the idea of preconditioning and prove numerically that after preconditioning $A$ by $Q_{d}^{T}AQ_{d}$ , the off-diagonal entries are reduced significantly and this property can be captured by the Jacobi algorithm.

5.2.2 Quadratic Convergence

Based on the preconditioning process, in this section, we will discuss the convergence of the cyclic Jacobi algorithm applied to a preconditioned real symmetric matrix. Then we will deliver some numerical testings concerning different dimensions and condition numbers. Finally, we will test the improvements in speed by using the preconditioners.

Recall that the number of iterations in one sweep is equal to half of the number of off-diagonal entries,

[TABLE]

From the previous derivation, the Jacobi algorithm is proved to converge linearly. However, an improvement by Schönhage in 1961 [20, ${1961}$ ] stated that the Jacobi algorithm for symmetric matrices with distinct eigenvalues is converging quadratically. Let us denote $A^{(k)}\in\mathbb{R}^{n\times n}$ be the matrix produced by the $k$ th iteration of the Jacobi algorithm and therefore $A^{(0)}\in\mathbb{R}^{n\times n}$ is the original input matrix. Denote $\lambda_{1},\dots,\lambda_{n}$ be the eigenvalues of $A^{(0)}$ and suppose we have

[TABLE]

Suppose we reach the stage that the $\textrm{off}(A^{(k)})<{\delta}/{8}$ and the angles of rotation generate by each iteration are smaller than $\pi/4$ , as controlled in Section 3.1.1, Schönhage’s result said

[TABLE]

which implies quadratic convergence. Later in 1962, Wilkinson provided a sharper bound [23, ${1962}$ , Section 3]. Under the same condition as described by Schönhage, Wilkinson proposed

[TABLE]

In 1966, van Kempen proved the quadratic convergence without the assumption of distinct eigenvalues. In his paper [22, ${1966}$ , Section 2], instead of define $\delta$ as in (5.2), he denoted $\delta$ as

[TABLE]

Suppose after $k$ iterations, $\textrm{off}(A^{(k)})<\delta/8$ , then

[TABLE]

Note that although the above bound is correct, the proof in [22, ${1966}$ ] is not correct. This mistake was unveiled by Hari who proposed a sharper bound [11, ${1991}$ , Section 2].

Back to our situation, as described in Section 5.2.1, after preconditioning, $\textrm{off}(A)\lesssim nu_{\ell}\|{A}\|_{2}$ . Assuming our eigenproblem is well conditioned and the minimum distance between the distinct eigenvalues is large enough such that

[TABLE]

where $\delta$ is defined by (5.3). Then we can expect quadratic convergence and the number of iterations should not exceed five since it will only take $4$ sweeps for the quantity $\textrm{off}(A)$ reduces from $0.1$ to $10^{-16}$ ( $0.1^{4}=10^{-16}$ ).

5.2.3 Numerical Testing

By referring to the function jacobi_precondi in Appendix B.9, we can output the number of iterations required. Therefore, we can investigate the number of sweeps required using the routine from Appendix B.12. The procedure is exactly the same as the testing in Section 5.1.1 and we can produce Figure 5.3.

Suppose our testing matrices are not ill-conditioned, then for the matrices with arithmetically distributed singular values, it usually requires two iterations for the cyclic-by-row Jacobi algorithm to converge since the eigenvalue are relatively far apart and this can result in quadratic convergence as discussed in Section 5.2.2. However, for the matrices with geometrically distributed singular values, the algorithm can take four iterations to converge. By construction, the geometrically distributed singular values has smaller $\delta$ in (5.3) and this can result in slower convergence since it will require more sweep for $\textrm{off}(A)$ to satisfies $\textrm{off}(A)<\delta/8$ . We can exaggerate this observation by generating a matrix with very close eigenvalues.

⬇

1close all; clear; clc; rng(1,'twister');

2A_geo = my_randsvd(1e3,1e16,'geo');

3A_ari = my_randsvd(1e3,1e16,'ari');

The matrix A_geo has $\min_{\lambda_{i}\neq\lambda_{j}}\lvert\lambda_{i}-\lambda_{j}\rvert\approx 10^{-18}$ , whereas the matrix A_ari has $\min_{\lambda_{i}\neq\lambda_{j}}\lvert\lambda_{i}-\lambda_{j}\rvert\approx 10^{-3}$ . Clearly, the Algorithm 9 can hardly attained the quadratic rate of convergence on A_geo.

After applying Algorithm 9 on both A_geo and A_ari, the former testing requires 25 sweeps to converge and the latter one only need three sweeps. Therefore, the Jacobi algorithm performs better when the distance between distinct eigenvalues are large since the algorithm can take advantage of quadratic convergence.

The final testing is to see how much time the mixed precision algorithm can save, we can use the methodology in Section 4.4.2 to produce Figure 5.4. For large $n$ , the benefit of using preconditioner is significant for both distributions of the singular values.

Also, besides using the figure to visualize the difference in time, we can calculate how much the algorithm is improved for each $n$ via

[TABLE]

where $t_{\text{normal}}$ and $t_{\text{precondition}}$ are the time used by the cyclic-by-row Jacobi algorithm without and with preconditioning.

⬇

1>> format short

2>> (t_normal_g - t_precond_g)./t_normal_g % geometrically distributed SVs

3ans =

4 0.7150 0.7574 0.7739 0.7747 0.7789

5>> (t_normal_a - t_precond_a)./t_normal_a % Arithmetically distributed SVs

6ans =

7 0.7294 0.7513 0.7543 0.7619 0.7663

The entries are corresponding to $n=100,200,\dots,500$ . Based on the data, we can conclude that by using the preconditioning technique stated in Algorithm 9, we saved roughly $75\%$ of time.

Chapter 6 Conclusion and Further Work

We studied the classical Jacobi algorithm and the cyclic-by-row Jacobi algorithm and concluded that the cyclic-by-row Jacobi algorithm is much more efficient than the other one. Then, after performing implementation and testing on the QR factorization and the Newton–Schulz iteration, in order to orthogonalize an almost orthogonal matrix, we shall always use the Newton–Schulz iteration. Finally, we followed the preconditioning technique in [5, ${2000}$ ] and successfully propose the mixed precision algorithm 9 for the real symmetric eigenproblem. By utilizing the eig function at single precision and preconditioning, we can save roughly 75% of the time compared to applying the Jacobi algorithm alone.

Further works of this thesis include:

Rounding error analysis hasn’t been conducted. 2. 2.

There exist more ways to orthogonalize an almost orthogonal matrix. For example, the Cholesky QR factorization. 3. 3.

Only one low precision level has been studied. These theories can apply to half precision as well. However, it requires more work on exploring a half precision version of eig function in order to become faster than our Algorithm 9 which uses MATLAB built-in single precision. 4. 4.

There are other ways that can be used to speed up the Jacobi algorithm. For example, the threshold Jacobi algorithm [24, ${1965}$ , Section 5.12] and the block Jacobi procedure [9, ${2013}$ , Section 8.5.6].

Appendix A Supplementary Explanation

A.1 Algorithm 2

We have two different expressions for computing $t$ as shown in Table A.1.

These two expressions are the same by simple calculation,

[TABLE]

However, when we implement these into MATLAB, approach 1 will output $c=1,s=0$ for $a_{pq}\approx 10^{-8}$ , and approach 2’s output gives $s\neq 0$ even for $a_{pq}\approx 10^{-14}$ . Namely, by using approach 1, the computed output is less accurate than the output from approach 2.

Appendix B Supplementary Code

B.1 Testing Matrices

By using the following code, we can generate a random real symmetric matrix with predefined condition number.

⬇

1function A = my_randsvd(n, kappa, mode)

2classname = 'double';

3if isempty(mode)

4 mode = 'Geo';

5end

6switch mode % Set up vector sigma of singular values.

7 case {'Geometric','geometric','Geo','geo'}

8 % geometrically distributed singular values

9 factor = kappa^(-1/(n-1));

10 sigma = factor.^(0:n-1);

11 case {'Arithmatic', 'arithmatic', 'Ari', 'ari'}

12 % arithmetically distributed singular values

13 sigma = ones(n,1) - (0:n-1)'/(n-1)*(1-1/kappa);

14 otherwise

15 error(message('MATLAB:randsvd:invalidMode'));

16end

17sigma = cast(sigma,classname); signs = sign(rand(1,n-2));

18sigma(2:n-1) = sigma(2:n-1).* signs;

19Q = qmult(n,1,classname);

20A = Q*diag(sigma)*Q';

21A = (A + A')/2;

B.2 Full Jacobi Algorithm Code

⬇

1function [V,D,counter] = jacobi_eig(Aini,tol,type,maxiter)

2%JACOBI_EIG Jacobi Eigenvalue Algorithm

3% [A,V,counter] = JACOBI_EIG(Aini,tol,type,maxiter) computes the eigen-

4% decomposition of Aini, Aini = VDV', where V is orthogonal.

5%Input:

6% Aini -------- Input matrix

7% tol --------- Desired tolerance, usually 1.1e-16 (double precision)

8% type -------- Type of Jacobi algorithm, either

9% * 'classical': For classical Jacobi algorithm

10% * 'cyclic': For cyclic-by-row Jacobi algorithm

11% maxiter ----- Maximum number of sweep, specially for cyclic-by-row

12% Jacobi algorithm. If undetermined, the maximum sweep will

13% be set as 10 sweeps.

14%Output:

15% V ------------ Orthogonal matrix

16% D ------------ Resulting matrix after apply Jacobi algorithm.

17% counter ------ Number of iteration

18if nargin <= 2, error('Not enough input arguments.');

19elseif nargin >= 5, error('Too many input arguments.');

20else

21 % select method, 'classical' or 'cyclic'

22 switch type

23 case {'classical','Classical'}

24 method = 1;

25 case {'cyclic','Cyclic'}

26 method = 2;

27 otherwise

28 error("The Jacobi algorithm avaliable are " + ...

29 "'classical' and 'cyclic'")

30 end

31 switch method

32 case 1

33 [V,D,counter] = jacobi_classical(Aini,tol);

34 case 2

35 switch nargin

36 case 4

37 maxiteration = maxiter;

38 [V,D,counter] = jacobi_cyclic(Aini,tol,maxiteration);

39 case 3

40 maxiteration = 10;

41 [V,D,counter] = jacobi_cyclic(Aini,tol,maxiteration);

42 otherwise

43 error('Not enough input arguments.');

44 end

45 end

46end

47end

48

49% classical Jacobi algorithm

50function [V,A,counter] = jacobi_classical(A,tol)

51counter = 0; n = length(A); V = eye(n); done_rot = true;

52tol1 = tol * norm(A);

53while done_rot

54 if isint(counter/(n*n)), A = (A + A')/2; end

55 done_rot = false; [p,q] = maxoff(A);

56 if abs(A(p,q)) > tol1 * sqrt(abs(A(p,p) * A(q,q)))

57 counter = counter + 1; done_rot = true;

58 [c,s] = jacobi_pair(A,p,q);

59 J = [c,s;-s,c];

60 A([p,q],:) = J'*A([p,q],:);

61 A(:,[p,q]) = A(:,[p,q]) * J;

62 V(:,[p,q]) = V(:,[p,q]) * J;

63 else

64 A = diag(diag(A));

65 break;

66 end

67end

68end

69

70% Cyclic-by-row Jacobi algorithm

71function [V,A,iter] = jacobi_cyclic(A,tol,maxiter)

72n = length(A); V = eye(n); iter = 0; done_rot = true;

73while done_rot && iter < maxiter

74 done_rot = false;

75 for p = 1:n-1

76 for q = p+1:n

77 if abs(A(p,q)) > tol * sqrt(abs(A(p,p)*A(q,q)))

78 done_rot = true;

79 [c,s] = jacobi_pair(A,p,q);

80 J = [c,s;-s,c];

81 A([p,q],:) = J'*A([p,q],:);

82 A(:,[p,q]) = A(:,[p,q]) * J;

83 V(:,[p,q]) = V(:,[p,q]) * J;

84 end

85 end

86 end

87 if done_rot

88 A = (A + A')/2; iter = iter + 1;

89 else

90 A = diag(diag(A));

91 return;

92 end

93% fprintf('off(A) = %d, sweep = %d\n', off(A), iter);

94end

95end

96

97% compute the Frobenius norm of the off-diagonal entries of the input matrix

98function num = off(A)

99n = length(A); A(1:n+1:n*n) = 0;

100num = norm(A,'fro');

101end

102

103% find the index of maximum off-diagonal entry of the input matrix A

104function [p,q] = maxoff(A)

105n = length(A); % dimension of matrix A

106A(1:n+1:n*n) = 0; A = abs(A); % clear the diagonal entries

107[val, idx1] = max(A);

108[~, q] = max(val); p = idx1(q);

109end

110

111% calculate the Givens rotation matrix's entries (c,s).

112function [c,s] = jacobi_pair(A,p,q)

113if A(p,q) == 0

114 c = 1; s = 0;

115else

116 tau = (A(q,q)-A(p,p))/(2*A(p,q));

117 if tau >= 0

118 t = 1/(tau + sqrt(1+tau*tau));

119 else

120 t = 1/(tau - sqrt(1+tau*tau));

121 end

122 c = 1/sqrt(1+t*t);

123 s = t*c;

124end

125end

B.3 Code for Figure 3.2

Figure 3.2 can be regenerated using the following routine

⬇

1clc; clear; close all; format short e;

2condi1 = 100; tol = 2^(-53);

3rng(1,'twister');

4for n = 1e2:1e2:1e3

5 A = my_randsvd(n,condi1,'geo');

6 fprintf('iteration of n %d/%d\n', n/1e2, 1e3/1e2);

7 tic

8 [V1,D1,iter1] = jacobi_eig(A,tol,'Classical');

9 t_classical_n(n/1e2) = toc;

10 tic

11 [V2,D2,iter2] = jacobi_eig(A,tol,'Cyclic',20);

12 t_cyclic_n(n/1e2) = toc;

13end

14nplot = 1e2:1e2:1e3;

15n = 2.5e2;

16rng(1,'twister');

17for condi = 1e3:1e3:1e5

18 A = my_randsvd(n,condi,'geo');

19 fprintf('iteration of cond %d/%d...',condi/1e3,1e5/1e3);

20 tic

21 [V1,D1,iter1] = jacobi_eig(A,tol,'Classical');

22 t_classical_condi(condi/1e3) = toc;

23 tic

24 [V2,D2,iter2] = jacobi_eig(A,tol,'Cyclic',20);

25 t_cyclic_condi(condi/1e3) = toc;

26 fprintf('complete \n');

27end

28condiplot = 1e3:1e3:1e5;

29close all;

30subplot(1,2,1); hold off;

31plot(nplot,t_classical_n,'-^','LineWidth',2);

32hold on;

33plot(nplot,t_cyclic_n,'-^','LineWidth',2);

34xlabel('Dimension $(n)$ ');

35ylabel('Time Used (sec)');

36title('Fixed $\kappa(A) = 100$ ');

37legend('Classical Jacobi','Cyclic Jacobi','Location','northwest','FontSize',20);

38axis square

39subplot(1,2,2); hold off;

40plot(condiplot,t_classical_condi,'-^','LineWidth',1.5);

41hold on;

42plot(condiplot,t_cyclic_condi,'-^','LineWidth',1.5);

43xlabel(' $\kappa(A)$ ');

44ylabel('Time Used (sec)');

45title('Fixed $n = 250$ ');

46legend('Classical Jacobi','Cyclic Jacobi','Location','northwest','FontSize',20);

47axis square; axis([0,1e5,0,25])

B.4 Code for Figure 4.1

Figure 4.1 can be regenerated using the following routine

⬇

1clc; clear; close all; format short e;

2ud = 2^(-53); n = 1e2:1e2:3e3; condi = 100; rng(1,'twister');

3for i = 1:length(n)

4 fprintf('iteration %d/%d\n', i, length(n));

5 A = gallery('randsvd', n(i), -condi, 3);

6 [Q,~] = myqr(A);

7 orthog(i) = norm(Q'*Q - eye(n(i)));

8end

9semilogy(n,orthog,'-^r',LineWidth=1.5);

10hold on; semilogy(n, n.*ud, '-^k',LineWidth=1.5);

11legend(" $\|Q^TQ - I_n\|$ ", ' $nu_d$ ','Location','northwest');

12xlabel('Dimension $(n)$ ')

B.5 Code for Figure 4.3

Figure 4.3 can be regenerated using the following routine

⬇

1clc; clear; close all;

2n = 10:10:8e4;

3sq = (sqrt(1+n .* eps(single(1/2))) - 1).^4;

4sq_ref = n .* eps(1/2);

5plot(n,sq); hold on;

6plot(n,sq_ref,'--');

7plot(52400,5.81757e-12,'^k')

8legend(' $\|X_2 - U\|_2$ ',' $n\cdot u_d$ ','Location','northwest');

9xlabel('Dimension $(n)$ ')

B.6 Code for Figure 4.4

Figure 4.4 can be regenerated using the following routine

⬇

1clc; clear; close all; format short e; rng(1,'twister');

2ud = 2^(-53);

3n = 1e2:1e2:3e3; condi = 1e2;

4for i = 1:length(n)

5 fprintf('Start iteration %d/%d,\n',i,length(n));

6 A = my_randsvd( n(i), condi, 'geo');

7 [Q,~] = eig(single(A)); Q = double(Q);

8 Q1 = newton_schulz(Q);

9 orthog(i) = norm(Q1'*Q1 - eye(n(i)));

10end

11semilogy(n,orthog,'-^r',LineWidth=1.5);

12hold on;

13semilogy(n, n.*ud, '-^k',LineWidth=1.5);

14legend(" $\|Q_d^*Q_d - I_n\|$ ",' $nu_d$ ','Location','northwest');

15xlabel('Dimension (n)'); title('Fix $\kappa(A) = 100$ '); axis square

B.7 Code for Figure 4.5

Figure 4.5 can be regenerated using the following routine

⬇

1clc; clear; close all; format short e; rng(1,'twister');

2ud = 2^(-53); n = 1e2:1e2:3e3; condi = 1e2;

3for i = 1:length(n)

4 fprintf('iteration %d/%d\n',i,length(n));

5 A = my_randsvd(n(i),condi,'geo');

6 [Q,~] = eig(single(A)); Q = double(Q);

7 Q1 = newton_schulz(Q);

8 [Q2,~] = qr(Q);

9 [Q3,~] = myqr(Q);

10 orthogns(i) = norm(Q1'*Q1 - eye(n(i)));

11 orthogqr(i) = norm(Q2'*Q2 - eye(n(i)));

12 orthogmyqr(i) = norm(Q3'*Q3 - eye(n(i)));

13end

14semilogy(n,orthogns,'-^',LineWidth=1.5); hold on;

15semilogy(n,orthogqr,'-^',LineWidth=1.5);

16semilogy(n,orthogmyqr,'-^',LineWidth=1.5);

17legend("Newton-Schulz", ...

18 "\texttt{qr()} from MATLAB", ...

19 "My own QR code",...

20 'interpreter','latex', ...

21 'Location','northwest', ...

22 'FontSize',20);

23xlabel('Dimension $(n)$ ');

24ylabel(' $\|Q_d^*Q_d - I_n\|$ ')

25axis([0,3000,0,2e-14])

26axis square

B.8 Code for Figure 4.6

Figure 4.6 can be regenerated using the following routine

⬇

1clc; clear; close all; format short e;

2ud = 2^(-53); n = 1e2:1e2:3e3; condi = 100;

3time_ns = zeros(length(n),1); time_qr = time_ns;

4rng(1,'twister'); % RNG

5for i = 1:length(n)

6 fprintf('iteration %d/%d\n',i,length(n));

7 A = my_randsvd(n(i),condi,'geo');

8 [Q,~] = eig(single(A)); Q = double(Q);

9 tic; [Q1] = newton_schulz(Q); time_ns(i) = toc;

10 tic; [Q2,~] = myqr(Q); time_qr(i) = toc;

11end

12semilogy(n,time_ns,'-^',LineWidth=1.5); hold on;

13semilogy(n,time_qr,'-^',LineWidth=1.5);

14legend('Newton--Schulz','QR Factorization','Location','northwest','interpreter','latex','FontSize',20);

15xlabel('Dimension $(n)$ ')

16ylabel('Time (sec)')

17axis square

B.9 Mixed Precision Jacobi Algorithm Code

⬇

1function [Q,D,iter] = jacobi_precondi(A,tol,maxiter)

2%JACOBI_PRECONDI(A,tol,maxiter) compute the spectral decomposition of A using cyclic-by-row Jacobi algorithm with preconditioning.

3%% Preliminaries

4switch nargin

5 case 2, max_it = 10; case 3, max_it = maxiter;

6 otherwise, error('Please check the number of inputs');

7end

8[n,m] = size(A);

9if m ~= n, error('Input matrix should be square matrix'), end

10if issymmetric(A) == 0, error('Input matrix should be symmetric'); end

11%% Spectral decomposition at single precision

12[Q_low,~] = eig(single(A)); Q_low = double(Q_low);

13%% Newton--Schulz iteration to orthogonalize

14Q_d = newton_schulz(Q_low);

15%% Cyclic-by-row Jacobi algorithm on preconditioned A

16A_cond = Q_d' * A * Q_d;

17[V,D,iter] = jacobi_eig(A_cond,tol,'Cyclic',max_it);

18%% Output

19Q = Q_d * V;

B.10 Code for Figure 5.1

Figure 5.1 can be regenerated using the following routine

⬇

1clc; clear; format short e;

2n = 50:50:1e3; condi = 500; nplot = n;

3rng(1,'twister');

4for i = 1:length(n)

5 fprintf('iteration %d/%d...',i,length(n))

6 A_geo = my_randsvd(n(i),condi,'geo');

7 A_ari = my_randsvd(n(i),condi,'ari');

8 [V1,D1,iter1] = jacobi_precondi(A_geo,eps(1/2));

9 [V2,D2,iter2] = jacobi_precondi(A_ari,eps(1/2));

10 accuracy_n_geo(i) = norm(A_geo * V1 - V1 * D1);

11 accuracy_n_ari(i) = norm(A_ari * V2 - V2 * D2);

12 reference_n(i) = n(i) * norm(A_geo) * eps(1/2);

13 fprintf('complete\n');

14end

15%%

16subplot(1,2,1); hold off;

17semilogy(nplot,accuracy_n_geo,'-^'); hold on;

18semilogy(nplot,accuracy_n_ari,'-^');

19semilogy(nplot,nplot.*eps(1/2),'--k');

20xlabel('Dimension $(n)$ ');

21ylabel(' $\|AQ - Q\Lambda\|_2/\|A\|_2$ ');

22legend('Geometric','Arithmetic',' $nu_d\|A\|_2$ ','Location','southeast','FontSize',20);

23title('Fixed $\kappa(A) = 500$ ')

24axis square

25%%

26condiplot = 1e2:1e2:1e4; n = 5e2;

27for i = 1:length(condiplot)

28 fprintf('iteration %d/%d...',i,length(condiplot))

29 A_geo = my_randsvd(n,condiplot(i),'geo');

30 A_ari = my_randsvd(n,condiplot(i),'ari');

31 [V1,D1,iter1] = jacobi_precondi(A_geo,eps(1/2));

32 [V2,D2,iter2] = jacobi_precondi(A_ari,eps(1/2));

33 accuracy_condi_geo(i) = norm(A_geo * V1 - V1 * D1);

34 accuracy_condi_ari(i) = norm(A_ari * V2 - V2 * D2);

35 reference_n(i) = n * norm(A_geo) * eps(1/2);

36 fprintf('complete\n');

37end

38%%

39close all;

40subplot(1,2,1); hold off;

41semilogy(nplot,accuracy_n_geo,'-^'); hold on;

42semilogy(nplot,accuracy_n_ari,'-^');

43semilogy(nplot,nplot.*eps(1/2),'--k');

44xlabel('Dimension $(n)$ ');

45ylabel(' $\|AQ - Q\Lambda\|_2/\|A\|_2$ ');

46legend('Geometric','Arithmetic',' $nu_d$ ','Location','southeast','FontSize',20);

47title('Fixed $\kappa(A) = 500$ ')

48axis square

49subplot(1,2,2); hold off;

50semilogy(condiplot,accuracy_condi_geo,'-^'); hold on;

51semilogy(condiplot,accuracy_condi_ari,'-^');

52semilogy(condiplot,reference_n,'--k');

53xlabel(' $\kappa(A)$ ');

54ylabel(' $\|AQ - Q\Lambda\|_2/\|A\|_2$ ');

55legend('Geometric','Arithmetic',' $nu_d$ ','Location','northeast','FontSize',20);

56title('Fixed $n = 500$ ')

57axis square; axis([0,1e4,0,8e-14]);

B.11 Code for Figure 5.2

⬇

1clc; clear all; close all; figure(1);

2n = [50,250,500];

3for i = 1:3

4 subplot(2,2,i)

5 testing_1(n(i));

6 if i == 1

7 axis([0,1e4,0,6e-6]);

8 title('Fixed $n=50$ ');

9 elseif i == 2

10 axis([0,1e4,0,3.8e-5]);

11 title('Fixed $n = 250$ ');

12 elseif i == 3

13 axis([0,1e4,0,8e-5]);

14 title('Fixed $n = 500$ ');

15 end

16 pause(0.1);

17end

18subplot(2,2,4); hold off

19condi = 1e2; n1 = 10:10:1e3; offA = zeros(1,length(n1)); rng(1,'twister');

20for i = 1:length(n1)

21 fprintf('start %d/%d\n',i,length(n1));

22 A_geo = my_randsvd(n1(i),condi,'geo');

23 A_ari = my_randsvd(n1(i),condi,'ari');

24 [Q1,~] = eig(single(A_geo)); Q1 = double(Q1);

25 Q1_d = newton_schulz(Q1);

26 offA_geo(i) = off(Q1_d' * A_geo * Q1_d)/norm(A_geo);

27 [Q2,~] = eig(single(A_ari)); Q2 = double(Q2);

28 Q2_d = newton_schulz(Q2);

29 offA_ari(i) = off(Q2_d' * A_ari * Q2_d)/norm(A_ari);

30end

31semilogy(n1,offA_geo); hold on;

32semilogy(n1,offA_ari);

33semilogy(n1,n1 .* double(eps(single(1/2))),'--k');

34legend('Geometric','Arithmetic',' $nu_\ell$ ','Location','southeast');

35xlabel(' $\kappa(A)$ '); ylabel('off $(A_{cond})/\|A\|_2$ ','Interpreter','latex');

36set(gca,'FontSize',20);

37axis square;

38title('Fixed $\kappa(A) = 100$ ')

39function testing_1(n)

40condi = 10:10:1e4; offA_geo = zeros(1,length(condi)); offA_ari = offA_geo;

41rng(1,'twister');

42for i = 1:length(condi)

43 fprintf('start %d/%d\n',i,length(condi));

44 A_geo = my_randsvd(n,condi(i),'geo');

45 [Q1,~] = eig(single(A_geo)); Q1 = double(Q1);

46 Q1_d = newton_schulz(Q1);

47 offA_geo(i) = off(Q1_d' * A_geo * Q1_d)/norm(A_geo);

48 A_ari = my_randsvd(n,condi(i),'ari');

49 [Q2,~] = eig(single(A_ari)); Q2 = double(Q2);

50 Q2_d = newton_schulz(Q2);

51 offA_ari(i) = off(Q2_d' * A_ari * Q2_d)/norm(A_ari);

52end

53semilogy(condi,offA_geo,'LineWidth',1.5); hold on;

54semilogy(condi,offA_ari);

55semilogy(condi,n * double(eps(single(1/2))) * ones(1,length(condi)),'--k','LineWidth',1.5);

56legend('Geometric','Arithmetic',' $nu_\ell$ ');

57xlabel(' $\kappa(A)$ '); ylabel('off $(A_{cond})/\|A\|_2$ ','Interpreter','latex');

58set(gca,'FontSize',20);

59axis square;

60end

B.12 Code for Figure 5.3

Figure 5.3 can be regenerated using the following routine

⬇

1clear; clc; rng(1,'twister');

2n = 50:50:1e3; condi = 500; nplot = n;

3for i = 1:length(n)

4 A_geo = my_randsvd(n(i),condi,'geo');

5 [V1,D1,iter1] = jacobi_precondi(A_geo,eps(1/2));

6 iteration_n_geo(i) = iter1;

7 A_ari = my_randsvd(n(i),condi,'ari');

8 [V2,D2,iter2] = jacobi_precondi(A_ari,eps(1/2));

9 iteration_n_ari(i) = iter2;

10 fprintf('iteration %d/%d complete\n',i,length(n));

11end

12condiplot = 1e2:1e2:1e4; n = 5e2;

13for i = 1:length(condiplot)

14 A_geo = my_randsvd(n,condiplot(i),'geo');

15 [V1,D1,iter1] = jacobi_precondi(A_geo,eps(1/2));

16 iteration_condi_geo(i) = iter1;

17 A_ari = my_randsvd(n,condiplot(i),'ari');

18 [V2,D2,iter2] = jacobi_precondi(A_ari,eps(1/2));

19 iteration_condi_ari(i) = iter2;

20 fprintf('iteration %d/%d complete\n',i,length(condiplot));

21end

22close all

23subplot(1,2,1);

24semilogy(nplot,iteration_n_geo,'-^'); hold on;

25semilogy(nplot,iteration_n_ari,'-^');

26xlabel('Dimension $(n)$ ');

27ylabel('Number of iterations til converge');

28title('Fixed $\kappa(A) = 500$ ')

29legend('Geometric','Arithmetic','Location','northeast');

30axis square; axis([0,1000,0,4]);

31subplot(1,2,2);

32semilogy(condiplot,iteration_condi_geo,'-^'); hold on;

33semilogy(condiplot,iteration_condi_ari,'-^');

34legend('Geometric','Arithmetic','Location','northeast');

35xlabel(' $\kappa(A)$ ');

36ylabel('Number of iterations til converge');

37title('Fixed $n = 500$ ')

38axis square; axis([0,1e4,0,5]);

B.13 Code for Figure 5.4

Figure 5.4 can be regenerated using the following routine

⬇

1clc; clear; close all; rng(1,'twister');

2n = 1e2:1e2:5e2; condi = 500; tol = 2^(-53);

3for i = 1:length(n)

4 Ag = my_randsvd(n(i),condi,'geo');

5 Aa = my_randsvd(n(i),condi,'ari');

6 tic; [V1,D1,iter1] = jacobi_precondi(Ag,tol);

7 t_precond_g(i) = toc;

8 tic; [V2,D2,iter2] = jacobi_eig(Ag,tol,'Cyclic');

9 t_normal_g(i) = toc;

10 tic; [Va1,Da1,iter1] = jacobi_precondi(Aa,tol);

11 t_precond_a(i) = toc;

12 tic; [Va2,Da2,iter2] = jacobi_eig(Aa,tol,'Cyclic');

13 t_normal_a(i) = toc;

14 fprintf('Iteration %d/%d complete\n',i,length(n));

15end

16subplot(1,2,1)

17plot(n,t_normal_g,'-^');

18hold on;

19plot(n,t_precond_g,'-^');

20xlabel('Dimension $(n)$ ')

21ylabel('Time Used (sec)')

22legend('Without Preconditioning', 'With Preconditioning','Location','northwest');

23title('Geometrically distributed test matrices')

24axis square

25subplot(1,2,2)

26plot(n,t_normal_a,'-^');

27hold on;

28plot(n,t_precond_a,'-^');

29xlabel('Dimension $(n)$ ')

30ylabel('Time Used (sec)')

31legend('Without Preconditioning', 'With Preconditioning','Location','northwest');

32title('Arithmetically distributed test matrices')

33axis square

Bibliography24

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Adi Ben-Israel. An iterative method for computing the generalized inverse of an arbitrary matrix . Math. Comp. , 19(91):452–455, 1965. · doi ↗
2[2] Rajendra Bhatia. Matrix Analysis . Springer-Verlag, New York, 1997. xi+347 pp. ISBN 978-1-4612-6857-4. · doi ↗
3[3] Philip I. Davies, Nicholas J. Higham, and Françoise Tisseur. Analysis of the Cholesky method with iterative refinement for solving the symmetric definite generalized eigenproblem . SIAM J. Matrix Anal. Appl. , 23(2):472–493, 2001. · doi ↗
4[4] James Demmel and Krešimir Veselić. Jacobi’s method is more accurate than QR . SIAM J. Matrix Anal. Appl. , 13(4):1204–1245, 1992. · doi ↗
5[5] Zlatko Drmač and Krešimir Veselić. Approximate eigenvectors as preconditioner . Linear Algebra Appl. , 309(1):191–215, 2000. · doi ↗
6[6] Ky Fan and A. J. Hoffman. Some metric inequalities in the space of matrices . Proc. Amer. Math. Soc. , 6(1):111–116, 1955.
7[7] George E. Forsythe and Peter Henrici. The cyclic Jacobi method for computing the principal values of a complex matrix . Trans. Amer. Math. Soc. , 94(1):1–23, 1960.
8[8] Gene H. Golub and Henk A. van der Vorst. Eigenvalue computation in the 20th century . J. Comput. Appl. Math. , 123(1):209–239, 2001. · doi ↗

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Taxonomy

A Mixed Precision Eigensolver Based on the Jacobi Algorithm

Chapter 1 Introduction

1.1 Notations

1.2 IEEE standard

1.3 Reproducible Research

Chapter 2 Matrix Norms, Orthogonal Matrices and Singular

2.1 Norms

2.1.1 Vector Norms

Definition 2.1** (vector norm).**

Example 2.2**.**

2.1.2 Matrix Norms

Definition 2.3** (matrix norm).**

Example 2.4**.**

Definition 2.5** (consistent norms).**

Example 2.6**.**

2.2 Orthogonal Matrices

Definition 2.7** (orthogonal matrices).**

Theorem 2.8**.**

Proof.

Theorem 2.9**.**

Proof.

2.2.1 Givens Rotations

Definition 2.10** (Givens rotation).**

Proposition 2.11**.**

Proof.

Proposition 2.12**.**

Proof.

2.2.2 Householder Transformations

Definition 2.13**.**

Transforming a Vector

Algorithm 1**.**

Implementation and Testing

2.3 Singular Value Decomposition

Theorem 2.14** (singular value decomposition).**

Corollary 2.15**.**

Proof.

2.4 Test Matices

Chapter 3 Jacobi Algorithm

3.1 Derivation and Convergence

Definition 3.1** (off operator).**

3.1.1 Choices of Givens rotation matrix

3.1.2 Implementation and Testing

Algorithm 2**.**

3.2 Classical Jacobi Algorithm

Algorithm 3**.**

3.2.1 Linear Convergence

Remark 3.2**.**

3.2.2 Implementation and Testing

3.3 Cyclic-by-row Jacobi Algorithm

Algorithm 4**.**

3.3.1 Implementation and Testing

3.4 Comparision

Chapter 4 Orthogonalization

4.1 Introduction

4.2 The Householder QR Factorization

4.2.1 Theory of Householder QR Factorization

Algorithm 5**.**

4.2.2 Implementation and Testing

4.3 The Polar Decomposition

Theorem 4.1** (Polar decomposition).**

Proof.

4.3.1 Best Approximation Property

Theorem 4.2** ([6, 1955{1955}1955, Theorem 1],[15, 2008{2008}2008, Theorem 8.4]).**

Lemma 4.3** ([18, 1960{1960}1960, Lemma 4]).**

Proof.

Theorem 4.4** ([18, 1960{1960}1960, Theorem 5], [15, 2008{2008}2008, Theorem B.3]).**

Proof.

Proof of Theorem 4.2.

4.3.2 SVD approach

4.3.3 Newton’s Method Approach

Theorem 4.5** (Newton iteration for the unitary polar factor).**

Proof.

Convergence of Newton Iteration

Definition 2.1 (vector norm).

Example 2.2.

Definition 2.3 (matrix norm).

Example 2.4.

Definition 2.5 (consistent norms).

Example 2.6.

Definition 2.7 (orthogonal matrices).

Theorem 2.8.

Theorem 2.9.

Definition 2.10 (Givens rotation).

Proposition 2.11.

Proposition 2.12.

Definition 2.13.

Algorithm 1.

Theorem 2.14 (singular value decomposition).

Corollary 2.15.

Definition 3.1 (off operator).

Algorithm 2.

Algorithm 3.

Remark 3.2.

Algorithm 4.

Algorithm 5.

Theorem 4.1 (Polar decomposition).

Theorem 4.2 ([6, ${1955}$ , Theorem 1],[15, ${2008}$ , Theorem 8.4]).

Lemma 4.3 ([18, ${1960}$ , Lemma 4]).

Theorem 4.4 ([18, ${1960}$ , Theorem 5], [15, ${2008}$ , Theorem B.3]).

Theorem 4.5 (Newton iteration for the unitary polar factor).

Theorem 4.6 (Convergence of the iteration (4.6), [15, ${2008}$ , Theorem 8.12]).

Algorithm 6.

Lemma 4.7.

Theorem 4.8 (Convergence of Newton–Schulz iteration (4.14)).

Algorithm 7.

Algorithm 8.

Algorithm 9.