On choices of formulations of computing the generalized singular value decomposition of a large matrix pair

Jinzhi Huang; Zhongxiao Jia

arXiv:1907.10392·math.NA·April 13, 2021


TL;DR

This paper compares two formulations for computing the GSVD of large matrices, analyzing their numerical stability and accuracy in finite precision arithmetic to guide better computational choices.

Contribution

It provides a detailed perturbation analysis of the two formulations and offers criteria for selecting the more accurate approach in finite precision computations.

Findings

01

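Formulation (1.5) is preferable when $A$ is well conditioned and $B$ is ill conditioned; formulation (1.4) is preferable when $A$ is ill conditioned and $B$ is well conditioned.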

02

Perturbation bounds help determine the preferable formulation.

03

Numerical experiments confirm the theoretical analysis.

Abstract

For the computation of the generalized singular value decomposition (GSVD) of a large matrix pair $(A, B)$ of full column rank, the GSVD is commonly formulated as two mathematically equivalent generalized eigenvalue problems, so that a generalized eigensolver can be applied to one of them and the desired GSVD components are then recovered from the computed generalized eigenpairs. Our concern in this paper is, in finite precision arithmetic, which generalized eigenvalue formulation is numerically preferable to compute the desired GSVD components more accurately. We make a detailed perturbation analysis on the two formulations and show how to make a suitable choice between them. Numerical experiments illustrate the results obtained.


Tables (3)

Table 1: Properties of the test problems with $m=1500$, $p=2000$ and $n=1000$.

Problem	$\kappa(A)$	$\kappa(B)$	$\kappa\left(\begin{bmatrix}\begin{smallmatrix}A\\ B\end{smallmatrix}\end{bmatrix}\right)$	$\|X\|$	$\|X^{-1}\|$	$\sigma_{\max}(A,B)$	$\sigma_{\min}(A,B)$
1a	$1.0e+2$	$1.0e+2$	$7.03$	$5.31$	$1.32$	$58.1$	$1.57e-2$
1b	$1.0e+5$	$1.0e+2$	$5.84$	$4.45$	$1.31$	$65.0$	$2.04e-5$
1c	$1.0e+7$	$1.0e+2$	$9.55$	$7.19$	$1.33$	$65.1$	$1.70e-7$

Table 2: Properties of the test problems with $m=n$ and $p=n+1$.

Problem	$n$	$\kappa(A)$	$\kappa(B)$	$\kappa\left(\begin{bmatrix}\begin{smallmatrix}A\\ B\end{smallmatrix}\end{bmatrix}\right)$	$\sigma_{\max}(A,B)$	$\sigma_{\min}(A,B)$
2a (3elt)	$4720$	$2.8e+3$	$3.0e+3$	$6.35$	$2.89e+3$	$5.00e-4$
2b (delan12)	$4096$	$5.2e+3$	$2.6e+3$	$5.15$	$2.35e+3$	$2.65e-4$
2c (viscopl1)	$4326$	$1.4e+5$	$2.8e+3$	$468$	$1.53e+3$	$9.34e-6$
2d (cavity16)	$4562$	$9.4e+6$	$2.9e+3$	$75.6$	$1.23e+2$	$1.51e-7$
2e (gemat11)	$4929$	$6.0e+7$	$3.1e+3$	$512$	$23.8$	$2.65e-8$
2f (bcsstk16)	$4884$	$4.9e+9$	$3.1e+3$	$78.7$	$73.5$	$2.86e-10$

Table 3: A comparison of (1.4) and (1.5) for computing the GSVDs of test problems 2a-2f.

Problem	better $\sigma$: $pct$ (%)	better $\sigma$: $acc$	better $x$: $pct$ (%)	better $x$: $acc$	better $u$: $pct$ (%)	better $u$: $acc$	better $v$: $pct$ (%)	better $v$: $acc$
2a	$43.37$	$-0.24$	$35.68$	$-0.12$	$35.93$	$-0.12$	$35.91$	$-0.12$
2b	$47.22$	$-0.11$	$36.45$	$-0.07$	$36.62$	$-0.07$	$36.57$	$-0.07$
2c	$83.93$	$+0.89$	$83.38$	$+0.67$	$84.51$	$+0.71$	$84.26$	$+0.71$
2d	$85.60$	$+2.26$	$79.35$	$+2.05$	$79.37$	$+2.04$	$79.35$	$+2.04$
2e	$86.61$	$+1.00$	$97.28$	$+1.12$	$95.94$	$+1.05$	$95.94$	$+1.05$
2f	$99.20$	$+6.60$	$99.20$	$+6.33$	$99.20$	$+6.34$	$99.20$	$+6.34$

Equations (164)

$$\begin{cases}A=UCX^{-1},\\ B=VSX^{-1},\end{cases}\quad\text{with}\quad\begin{cases}C=\mathrm{diag}\{\alpha_{1},\dots,\alpha_{n}\},\\ S=\mathrm{diag}\{\beta_{1},\dots,\beta_{n}\},\end{cases} \tag{1.1}$$

$$\Sigma=CS^{-1}=\mathrm{diag}\{\sigma_{1},\dots,\sigma_{n}\}. \tag{1.2}$$

$$|\sigma_{1}-\tau|\leq|\sigma_{2}-\tau|\leq\dots\leq|\sigma_{\ell}-\tau|<|\sigma_{\ell+1}-\tau|\leq\dots\leq|\sigma_{n}-\tau|. \tag{1.3}$$

$$(\widehat{A},\widehat{B}):=\left(\begin{bmatrix}&A\\ A^{T}&\end{bmatrix},\begin{bmatrix}I&\\ &B^{T}B\end{bmatrix}\right), \tag{1.4}$$

$$(\widetilde{B},\widetilde{A}):=\left(\begin{bmatrix}&B\\ B^{T}&\end{bmatrix},\begin{bmatrix}I&\\ &A^{T}A\end{bmatrix}\right). \tag{1.5}$$

$$\widehat{A}Y=\widehat{B}Y\widehat{\Sigma}\quad\text{and}\quad\widetilde{B}Z=\widetilde{A}Z\widetilde{\Lambda}, \tag{2.1}$$

$$\widehat{\Sigma}=\begin{bmatrix}\Sigma&&\\ &-\Sigma&\\ &&0\end{bmatrix},\qquad Y=\begin{bmatrix}\frac{1}{\sqrt{2}}U&\frac{1}{\sqrt{2}}U&U_{\perp}\\ \frac{1}{\sqrt{2}}W&-\frac{1}{\sqrt{2}}W&0\end{bmatrix} \tag{2.2}$$

$$\widetilde{\Lambda}=\begin{bmatrix}\Lambda&&\\ &-\Lambda&\\ &&0\end{bmatrix},\qquad Z=\begin{bmatrix}\frac{1}{\sqrt{2}}V&\frac{1}{\sqrt{2}}V&V_{\perp}\\ \frac{1}{\sqrt{2}}W^{\prime}&-\frac{1}{\sqrt{2}}W^{\prime}&0\end{bmatrix} \tag{2.3}$$

$$Y^{T}\widehat{B}Y=I_{m+n},\qquad Z^{T}\widetilde{A}Z=I_{p+n}. \tag{2.4}$$

$$(\sigma,y):=\left(\frac{\alpha}{\beta},\ \frac{1}{\sqrt{2}}\begin{bmatrix}u\\ x/\beta\end{bmatrix}\right) \tag{2.5}$$

$$\left(\frac{1}{\sigma},z\right):=\left(\frac{\beta}{\alpha},\ \frac{1}{\sqrt{2}}\begin{bmatrix}v\\ x/\alpha\end{bmatrix}\right) \tag{2.6}$$

$$(\widehat{\bm{A}},\widehat{\bm{B}})=(\widehat{A}+\widehat{E},\widehat{B}+\widehat{F})\quad\text{and}\quad(\widetilde{\bm{B}},\widetilde{\bm{A}})=(\widetilde{B}+\widetilde{F},\widetilde{A}+\widetilde{E}), \tag{2.7}$$

$$\|\widehat{E}\|\leq\|\widehat{A}\|\epsilon,\ \ \|\widehat{F}\|\leq\|\widehat{B}\|\epsilon\quad\text{and}\quad\|\widetilde{F}\|\leq\|\widetilde{B}\|\epsilon,\ \ \|\widetilde{E}\|\leq\|\widetilde{A}\|\epsilon \tag{2.8}$$

$$\max\{\|\widehat{E}\|,\|\widehat{F}\|,\|\widetilde{E}\|,\|\widetilde{F}\|\}\leq\max\{\|A\|^{2},\|B\|^{2},1\}\epsilon. \tag{2.9}$$

$$\chi(\widehat{\sigma},\sigma)=\frac{|\widehat{\sigma}-\sigma|}{\sqrt{1+\widehat{\sigma}^{2}}\sqrt{1+\sigma^{2}}}. \tag{2.10}$$

$$\chi(\widehat{\sigma},\sigma)$$

$$\chi(\widehat{\sigma},\sigma)=\frac{|y^{T}\widehat{B}\widehat{y}\cdot y^{T}\widehat{\bm{A}}\widehat{y}-y^{T}\widehat{A}\widehat{y}\cdot y^{T}\widehat{\bm{B}}\widehat{y}|}{\sqrt{(y^{T}\widehat{A}\widehat{y})^{2}+(y^{T}\widehat{B}\widehat{y})^{2}}\sqrt{(y^{T}\widehat{\bm{A}}\widehat{y})^{2}+(y^{T}\widehat{\bm{B}}\widehat{y})^{2}}}. \tag{2.13}$$

$$|y^{T}\widehat{B}\widehat{y}\cdot y^{T}\widehat{\bm{A}}\widehat{y}-y^{T}\widehat{A}\widehat{y}\cdot y^{T}\widehat{\bm{B}}\widehat{y}|$$

$$\chi(\widehat{\sigma},\sigma)\leq\frac{\sqrt{(y^{T}\widehat{E}\widehat{y})^{2}+(y^{T}\widehat{F}\widehat{y})^{2}}}{\sqrt{(y^{T}\widehat{A}\widehat{y})^{2}+(y^{T}\widehat{B}\widehat{y})^{2}}}\leq\frac{\|y\|\|\widehat{y}\|\sqrt{\|\widehat{E}\|^{2}+\|\widehat{F}\|^{2}}}{\sqrt{(y^{T}\widehat{A}\widehat{y})^{2}+(y^{T}\widehat{B}\widehat{y})^{2}}}.$$

$$\chi(\widehat{\sigma}^{-1},\sigma^{-1})=\chi(\widehat{\sigma},\sigma),$$

$$\delta_{1}=\mathcal{O}(\epsilon)\quad\text{and}\quad\delta_{2}=\mathcal{O}(\epsilon) \tag{2.14}$$

$$\widehat{\Sigma}=\begin{bmatrix}\sigma&\\ &\widehat{\Sigma}_{2}\end{bmatrix},\quad Y=[y,Y_{2}],\quad\widetilde{\Lambda}=\begin{bmatrix}\frac{1}{\sigma}&\\ &\widetilde{\Lambda}_{2}\end{bmatrix},\quad Z=[z,Z_{2}]. \tag{2.15}$$

$$\|s\|\leq\|Y_{2}\|\|h\|\leq\frac{\|Y_{2}\|^{2}\|\widehat{y}\|}{\min_{\mu_{i}\neq\sigma}|\mu_{i}-\widehat{\sigma}|}\|\widehat{\sigma}\widehat{F}-\widehat{E}\|\leq\eta_{1}\|y\|(1+\delta_{1}) \tag{2.16}$$

$$\delta_{1}=\frac{\|s\|}{\|y\|}\leq\frac{\eta_{1}}{1-\eta_{1}}=\mathcal{O}(\epsilon).$$

$$\chi(\widehat{\sigma},\sigma)$$

$$\frac{\|y\|^{2}}{\sqrt{(y^{T}\widehat{A}y)^{2}+(y^{T}\widehat{B}y)^{2}}}=\frac{1}{2}\cdot\frac{\|u\|^{2}+\frac{\|x\|^{2}}{\beta^{2}}}{\sqrt{1+\sigma^{2}}}=\frac{\|x\|^{2}+\beta^{2}}{2\beta}.$$

$$\|X\|\leq\min\{\|A^{\dagger}\|,\|B^{\dagger}\|\}\quad\text{and}\quad\|X^{-1}\|\leq\sqrt{\|A\|^{2}+\|B\|^{2}}, \tag{2.19}$$

$$\frac{1}{\sqrt{\|A\|^{2}+\|B\|^{2}}}\leq\|x\|\leq\min\{\|A^{\dagger}\|,\|B^{\dagger}\|\}. \tag{2.20}$$

$$\|x\|\geq\sigma_{\min}(X)=\|X^{-1}\|^{-1}.$$


Full text

Jinzhi Huang: Department of Mathematical Sciences, Tsinghua University, 100084 Beijing, China. Email: [email protected]

Zhongxiao Jia (corresponding author): Department of Mathematical Sciences, Tsinghua University, 100084 Beijing, China. Email: [email protected]

On choices of formulations of computing the generalized singular value decomposition of a large matrix pair

Supported by the National Natural Science Foundation of China (No.11771249).

Jinzhi Huang

Zhongxiao Jia

Abstract

For the computation of the generalized singular value decomposition (GSVD) of a large matrix pair $(A,B)$ of full column rank, the GSVD is commonly formulated as two mathematically equivalent generalized eigenvalue problems, so that a generalized eigensolver can be applied to one of them and the desired GSVD components are then recovered from the computed generalized eigenpairs. Our concern in this paper is, in finite precision arithmetic, which generalized eigenvalue formulation is numerically preferable to compute the desired GSVD components more accurately. We make a detailed perturbation analysis on the two formulations and show how to make a suitable choice between them. Numerical experiments illustrate the results obtained.

Keywords:

Generalized singular value decomposition, generalized singular value, generalized singular vector, generalized eigenpair, eigensolver, perturbation analysis, condition number

MSC:

65F15 65F35 15A12 15A18 15A42

1 Introduction

The generalized singular value decomposition (GSVD) of a matrix pair $(A,B)$ was first introduced by van Loan van1976generalizing and then developed by Paige and Saunders paige1981towards . It has become a standard decomposition and an important computational tool golub2012matrix , and has been extensively used in a wide range of contexts, e.g., solutions of discrete linear ill-posed problems hansen1998rank , weighted or generalized least squares problems bjorck1996numerical , information retrieval howland2003structure , linear discriminant analysis park2005relationship , and many others betcke2008generalized ; chu1987singular ; golub2012matrix ; kaagstrom1984generalized ; vanhuffel .

Let $A\in\mathbb{R}^{m\times n}$ ( $m\geq n$ ) and $B\in\mathbb{R}^{p\times n}$ ( $p\geq n$ ) be large and possibly sparse matrices of full column rank, i.e., ${\rm rank}(A)={\rm rank}(B)=n$ . The GSVD of $(A,B)$ is

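$$\begin{cases}A=UCX^{-1},\\ B=VSX^{-1},\end{cases}\quad\text{with}\quad\begin{cases}C=\mathrm{diag}\{\alpha_{1},\dots,\alpha_{n}\},\\ S=\mathrm{diag}\{\beta_{1},\dots,\beta_{n}\},\end{cases} \tag{1.1}$$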

where $X=[x_{1},\dots,x_{n}]$ is nonsingular, $U=[u_{1},\dots,u_{n}]$ and $V=[v_{1},\dots,v_{n}]$ are orthonormal, and the positive numbers $\alpha_{i}$ and $\beta_{i}$ satisfy $\alpha_{i}^{2}+\beta_{i}^{2}=1$ , $i=1,\dots,n$ . We call such $(\alpha_{i},\beta_{i},u_{i},v_{i},x_{i})$ a GSVD component of $(A,B)$ with the generalized singular value $\sigma_{i}=\frac{\alpha_{i}}{\beta_{i}}$ , the left generalized singular vectors $u_{i}$ and $v_{i}$ , and the right generalized singular vector $x_{i}$ , $i=1,\dots,n$ . Denote the generalized singular value matrix of $(A,B)$ by

$$\Sigma=CS^{-1}=\mathrm{diag}\{\sigma_{1},\dots,\sigma_{n}\}. \tag{1.2}$$

Throughout this paper, we also refer to a scalar pair $(\alpha_{i},\beta_{i})$ as a generalized singular value of $(A,B)$ . Particularly, we will denote by $\sigma_{\max}(A,B)$ and $\sigma_{\min}(A,B)$ the largest and smallest generalized singular values of $(A,B)$ , respectively. Obviously, the generalized singular values of the pair $(B,A)$ are $\frac{1}{\sigma_{i}},\ i=1,2,\ldots,n$ , the reciprocals of those of $(A,B)$ , and their generalized singular vectors are the same as those of $(A,B)$ .

For a prescribed target $\tau$ , assume that the generalized singular values of $(A,B)$ are labeled by

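$$|\sigma_{1}-\tau|\leq|\sigma_{2}-\tau|\leq\dots\leq|\sigma_{\ell}-\tau|<|\sigma_{\ell+1}-\tau|\leq\dots\leq|\sigma_{n}-\tau|. \tag{1.3}$$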

Specifically, if we are interested in the $\ell$ smallest generalized singular values of $(A,B)$ and/or the associated left and right generalized singular vectors, we assume $\tau=0$ in (1.3), so that the generalized singular values are labeled in increasing order; if we are interested in the $\ell$ largest generalized singular values of $(A,B)$ and/or the corresponding generalized singular vectors, we assume $\tau=+\infty$ in (1.3), so that the generalized singular values are labeled in decreasing order. More generally, once $\tau$ is bigger than the largest generalized singular value, the $\ell$ generalized singular values closest to $\tau$ are the largest ones of $(A,B)$ . In these two cases, the $\ell$ GSVD components $(\alpha,\beta,u,v,x)$ are called the extreme (smallest or largest) GSVD components of $(A,B)$ . Otherwise they are called $\ell$ interior GSVD components of $(A,B)$ if the given $\tau$ is inside the spectrum of the generalized singular values of $(A,B)$ . We will abbreviate any one of the desired GSVD components as $(\sigma,u,v,x)$ or $(\alpha,\beta,u,v,x)$ with the subscripts dropped.

For a large and possibly sparse matrix pair $(A,B)$ , one kind of approach to compute the desired GSVD components works on the pair directly. Zha zha1996 proposes a joint bidiagonalization method to compute the extreme generalized singular values $\sigma$ and the associated generalized singular vectors $u,v,x$ , which is a generalization of Lanczos bidiagonalization type methods jia2003implicitly ; jia2010 for computing a partial ordinary SVD of $A$ when $B=I$ . A main bottleneck of this method is that a large-scale least squares problem with the coefficient matrix $\begin{bmatrix}\begin{smallmatrix}A\\ B\end{smallmatrix}\end{bmatrix}$ must be solved at each step of the joint bidiagonalization. Jia and Yang jiayang2018 have made a further analysis of this method and its variant, and provided more theoretical support for its rationale.

For the computation of GSVD, a natural approach is to apply a generalized eigensolver to the mathematically equivalent generalized eigenvalue problem of the cross product matrix pair $(A^{T}A,B^{T}B)$ to compute the corresponding eigenpairs $(\sigma^{2},x)$ and then recover the desired GSVD components from the computed eigenpairs. However, because of the squaring of the generalized singular values of $(A,B)$ , for $\sigma$ small, the eigenvalues $\sigma^{2}$ of $(A^{T}A,B^{T}B)$ are much smaller. As a consequence, the smallest generalized singular values may be recovered much less accurately and even may have no accuracy jia2006 . Therefore, we will not consider such a formulation in this paper.

Another kind of commonly used approach formulates the GSVD as a generalized eigenvalue problem hochstenbach2009jacobi , where the Jacobi-Davidson method hochstenbach2004 for the ordinary SVD problem has been adapted to a mathematically equivalent formulation of the GSVD so that a suitable generalized eigensolver parlett1998symmetric ; saad2011numerical ; stewart2001matrix can be used. The approach then recovers the desired GSVD components. Concretely, the two formulations proposed in hochstenbach2009jacobi transform the GSVD into the generalized eigenvalue problem of the augmented definite matrix pair

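$$(\widehat{A},\widehat{B}):=\left(\begin{bmatrix}&A\\ A^{T}&\end{bmatrix},\begin{bmatrix}I&\\ &B^{T}B\end{bmatrix}\right), \tag{1.4}$$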

or the augmented definite matrix pair

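$$(\widetilde{B},\widetilde{A}):=\left(\begin{bmatrix}&B\\ B^{T}&\end{bmatrix},\begin{bmatrix}I&\\ &A^{T}A\end{bmatrix}\right). \tag{1.5}$$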

We will give detailed relationships between the GSVD of $(A,B)$ and the generalized eigenpairs of $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ in the next section. One then applies a generalized eigensolver to either of them, computes the corresponding generalized eigenpairs, and recovers the desired GSVD components from those computed generalized eigenpairs.

As will become clear in the next section, the nonzero eigenvalues of $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ are $\pm\sigma_{i}$ and $\pm\frac{1}{\sigma_{i}}$ , $i=1,2,\ldots,n$ , respectively. Therefore, the largest or interior generalized singular values of $(A,B)$ become the largest or interior eigenvalues of $(\widehat{A},\widehat{B})$ , and the smallest or interior generalized singular values are the largest or interior eigenvalues of $(\widetilde{B},\widetilde{A})$ . In principle, we may use a number of projection methods, e.g., Lanczos type methods, to compute the extreme GSVD components via solving the generalized eigenvalue problem of $(\widehat{A},\widehat{B})$ or $(\widetilde{B},\widetilde{A})$ . For a unified account of projection algorithms, we refer to baiedit2000 . For the computation of interior GSVD components of $(A,B)$ , we may employ the Jacobi-Davidson type method proposed in hochstenbach2009jacobi , referred to as JDGSVD, where at each step a linear system, i.e., the correction equation, is solved iteratively and its approximate solution is used to expand the current searching subspaces. The JDGSVD method deals with the generalized eigenvalue problem of (1.4) or (1.5), computes some specific generalized eigenpairs, and recovers the desired GSVD components from the converged generalized eigenpairs.
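The eigenvalue structure just described can be checked numerically on a small dense example. The following is only an illustrative sketch, not part of the paper's experiments: the sizes, the random test matrices, and the helper name `augmented_pair` are assumptions made here, and the reference values $\sigma_{i}$ are taken from the cross product pair $(A^{T}A,B^{T}B)$, which is acceptable only because this toy problem is small and well conditioned.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
m, p, n = 8, 9, 5                      # toy sizes (assumption)
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, n))

# Reference generalized singular values via the cross product pair;
# eigh returns the eigenvalues sigma_i^2 in ascending order.
sigma = np.sqrt(eigh(A.T @ A, B.T @ B, eigvals_only=True))

def augmented_pair(M, N):
    """Hypothetical helper: return ([[0, M], [M^T, 0]], [[I, 0], [0, N^T N]])."""
    r, c = M.shape
    left = np.block([[np.zeros((r, r)), M], [M.T, np.zeros((c, c))]])
    right = np.block([[np.eye(r), np.zeros((r, c))],
                      [np.zeros((c, r)), N.T @ N]])
    return left, right

A_hat, B_hat = augmented_pair(A, B)    # pair of formulation (1.4)
B_til, A_til = augmented_pair(B, A)    # pair of formulation (1.5)

eig_14 = eigh(A_hat, B_hat, eigvals_only=True)   # ascending order
eig_15 = eigh(B_til, A_til, eigvals_only=True)

largest_14 = eig_14[-n:]   # the n positive eigenvalues of (1.4)
largest_15 = eig_15[-n:]   # the n positive eigenvalues of (1.5)
print(np.allclose(largest_14, sigma), np.allclose(largest_15, np.sort(1.0 / sigma)))
```

Dense `eigh` is used here only for verification; for large sparse pairs one would use a projection method as discussed above.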

As far as numerical computations are concerned, an important question arises naturally: which of the mathematically equivalent formulations (1.4) and (1.5) is numerically preferable, so that the desired GSVD components can be computed more accurately? In this paper, rather than propose or develop any numerical algorithm for computing the desired $\ell$ GSVD components, we focus on this question carefully, give a deterministic answer to it, and suggest a definitive choice. We first make a sensitivity analysis on the generalized eigenpairs of (1.4) and (1.5). Based on the results to be obtained, we establish accuracy estimates for the approximate generalized singular values and the left and right generalized singular vectors that are recovered from the approximate generalized eigenpairs obtained. Then by comparing the accuracy of the approximate GSVD components recovered from the approximate generalized eigenpairs of (1.4) and (1.5), we make a correct choice between these two formulations.

This paper is organized as follows. In Section 2 we make a sensitivity analysis on the generalized eigenvalue problems of the structured matrix pairs $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ , respectively, and give error bounds for the generalized singular values $\sigma$ and the generalized eigenvectors of $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ . In Section 3 we carry out a sensitivity analysis on the approximate generalized singular vectors that are recovered from the approximate generalized eigenpairs of $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ . Based on the results and analysis, we conclude that (1.5) is preferable to compute the GSVD more accurately when $A$ is well conditioned and $B$ is ill conditioned, and (1.4) is preferable when $A$ is ill conditioned and $B$ is well conditioned. In Section 4 we propose a few practical choice strategies on (1.4) and (1.5). In Section 5 we report the numerical experiments. We conclude the paper in Section 6.

Throughout this paper, denote by $\|\cdot\|$ the 2-norm of a vector or matrix and $\kappa(C)=\sigma_{\max}(C)/\sigma_{\min}(C)$ the condition number of a matrix $C$ with $\sigma_{\max}(C)$ and $\sigma_{\min}(C)$ being the largest and smallest singular values of $C$ , respectively, and by $C^{T}$ the transpose of $C$ . Denote by $I_{k}$ the identity matrix of order $k$ , by $0_{k}$ and $0_{k\times l}$ the zero matrices of order $k$ and $k\times l$ , respectively. The subscripts are omitted when there is no confusion. We also denote by $\mathcal{R}(C)$ the column space or range of $C$ . For brevity of our analysis and results, without loss of generality, we suppose that $\|A\|$ and $\|B\|$ are comparable in size and, furthermore, $A$ and $B$ have already been scaled so that their 2-norms are of $\mathcal{O}(1)$ , that is, $\|A\|\approx\|B\|\approx 1$ roughly, meaning that $\sigma_{\min}^{-1}(A)=\|A^{\dagger}\|\approx\kappa(A)$ and $\sigma_{\min}^{-1}(B)=\|B^{\dagger}\|\approx\kappa(B)$ roughly and the conditioning of $A$ and $B$ is reflected by $\sigma_{\min}(A)$ and $\sigma_{\min}(B)$ , respectively.

2 Perturbation analysis of generalized eigenvalue problems and the accuracy of generalized singular values

The generalized eigendecompositions of the matrix pairs $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ are closely related to the GSVD of $(A,B)$ in the following way, which is straightforward to verify.

Lemma 2.1.

Let the GSVD of $(A,B)$ be defined by (1.1) with the generalized singular values defined by (1.2). Let $U_{\perp}\in\mathbb{R}^{m\times(m-n)}$ and $V_{\perp}\in\mathbb{R}^{p\times(p-n)}$ be such that $[U,U_{\perp}]$ and $[V,V_{\perp}]$ are orthogonal. Then the matrix pairs $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ defined by (1.4) and (1.5) have the generalized eigendecompositions

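$$\widehat{A}Y=\widehat{B}Y\widehat{\Sigma}\quad\text{and}\quad\widetilde{B}Z=\widetilde{A}Z\widetilde{\Lambda}, \tag{2.1}$$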

respectively, where

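$$\widehat{\Sigma}=\begin{bmatrix}\Sigma&&\\ &-\Sigma&\\ &&0\end{bmatrix},\qquad Y=\begin{bmatrix}\frac{1}{\sqrt{2}}U&\frac{1}{\sqrt{2}}U&U_{\perp}\\ \frac{1}{\sqrt{2}}W&-\frac{1}{\sqrt{2}}W&0\end{bmatrix} \tag{2.2}$$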

with $W=XS^{-1}$ , and

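$$\widetilde{\Lambda}=\begin{bmatrix}\Lambda&&\\ &-\Lambda&\\ &&0\end{bmatrix},\qquad Z=\begin{bmatrix}\frac{1}{\sqrt{2}}V&\frac{1}{\sqrt{2}}V&V_{\perp}\\ \frac{1}{\sqrt{2}}W^{\prime}&-\frac{1}{\sqrt{2}}W^{\prime}&0\end{bmatrix} \tag{2.3}$$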

with $\Lambda=\Sigma^{-1}=SC^{-1}$ and $W^{\prime}=XC^{-1}$ . Moreover, the columns of the eigenvector matrices $Y$ and $Z$ are $\widehat{B}$ - and $\widetilde{A}$ -orthonormal, respectively, i.e.,

$$Y^{T}\widehat{B}Y=I_{m+n},\qquad Z^{T}\widetilde{A}Z=I_{p+n}. \tag{2.4}$$

Lemma 2.1 illustrates that the GSVD component $(\alpha,\beta,u,v,x)$ of $(A,B)$ corresponds to the generalized eigenpair

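$$(\sigma,y):=\left(\frac{\alpha}{\beta},\ \frac{1}{\sqrt{2}}\begin{bmatrix}u\\ x/\beta\end{bmatrix}\right) \tag{2.5}$$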

of the augmented matrix pair $(\widehat{A},\widehat{B})$ with the eigenvector $y$ satisfying $y^{T}\widehat{A}y=\sigma$ and $y^{T}\widehat{B}y=1$ and the generalized eigenpair

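$$\left(\frac{1}{\sigma},z\right):=\left(\frac{\beta}{\alpha},\ \frac{1}{\sqrt{2}}\begin{bmatrix}v\\ x/\alpha\end{bmatrix}\right) \tag{2.6}$$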

of the augmented matrix pair $(\widetilde{B},\widetilde{A})$ with the eigenvector $z$ satisfying $z^{T}\widetilde{B}z=\frac{1}{\sigma}$ and $z^{T}\widetilde{A}z=1$ . Therefore, the GSVD of $(A,B)$ is mathematically equivalent to the generalized eigendecompositions (1.4) and (1.5). In order to obtain some GSVD components $(\alpha,\beta,u,v,x)$ , one can compute the corresponding generalized eigenpairs $(\sigma,y)$ of $(\widehat{A},\widehat{B})$ or $(\frac{1}{\sigma},z)$ of $(\widetilde{B},\widetilde{A})$ by applying a generalized eigensolver to (1.4) or (1.5), and then recover the desired GSVD components.
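As a rough illustration of the recovery step, the sketch below takes the largest eigenpair $(\sigma,y)$ of $(\widehat{A},\widehat{B})$ for a small random pair and reads off $(\alpha,\beta,u,v,x)$ from $y=\frac{1}{\sqrt{2}}[u^{T},x^{T}/\beta]^{T}$. The test matrices and sizes are assumptions, and the recovery of $v$ through $Bx=\beta v$ is one possible choice made here for illustration, not necessarily how a production GSVD solver proceeds.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
m, p, n = 7, 6, 4                      # toy sizes (assumption)
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, n))

A_hat = np.block([[np.zeros((m, m)), A], [A.T, np.zeros((n, n))]])
B_hat = np.block([[np.eye(m), np.zeros((m, n))],
                  [np.zeros((n, m)), B.T @ B]])

# eigh normalizes the eigenvectors so that Y^T B_hat Y = I.
w, Y = eigh(A_hat, B_hat)
sigma, y = w[-1], Y[:, -1]             # largest eigenpair (sigma, y)

# sigma = alpha/beta with alpha^2 + beta^2 = 1.
beta = 1.0 / np.sqrt(1.0 + sigma**2)
alpha = sigma * beta

# y = (1/sqrt(2)) [u; x/beta]  =>  u = sqrt(2) y_top,  x = sqrt(2) beta y_bottom.
u = np.sqrt(2.0) * y[:m]
x = np.sqrt(2.0) * beta * y[m:]
v = B @ x / beta                       # uses B x = beta v

print(np.linalg.norm(A @ x - alpha * u), np.linalg.norm(u), np.linalg.norm(v))
```

The sign of the eigenvector returned by the solver is arbitrary, but $u$ and $x$ change sign together, so the relations $Ax=\alpha u$ and $Bx=\beta v$ are unaffected.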

However, in numerical computations, we can obtain only approximate eigenpairs of (1.4) and (1.5), and thus recover only approximate GSVD components of $(A,B)$ . As a result, when numerically backward stable eigensolvers solve the generalized eigenvalue problems of (1.4) and (1.5) with the computed eigenpairs whose residuals have about the same size, a natural and central concern is: which of the computed eigenpairs of (1.4) and (1.5) will yield more accurate approximations to the desired GSVD components of $(A,B)$ , that is, which of (1.4) and (1.5) is numerically preferable to compute the GSVD components more accurately?

To this end, we need to carefully estimate the accuracy of the computed eigenpairs and that of the recovered GSVD components. Given a backward stable generalized eigensolver applied to (1.4) and (1.5), let $(\widehat{\sigma},\widehat{y})$ and $(\frac{1}{\widetilde{\sigma}},\widetilde{z})$ be the computed approximations to $(\sigma,y)$ and $(\frac{1}{\sigma},z)$ , respectively. Then $(\widehat{\sigma},\widehat{y})$ and $(\frac{1}{\widetilde{\sigma}},\widetilde{z})$ are the exact eigenpairs of some perturbed matrix pairs

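$$(\widehat{\bm{A}},\widehat{\bm{B}})=(\widehat{A}+\widehat{E},\widehat{B}+\widehat{F})\quad\text{and}\quad(\widetilde{\bm{B}},\widetilde{\bm{A}})=(\widetilde{B}+\widetilde{F},\widetilde{A}+\widetilde{E}), \tag{2.7}$$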

respectively, where the perturbations satisfy

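$$\|\widehat{E}\|\leq\|\widehat{A}\|\epsilon,\ \ \|\widehat{F}\|\leq\|\widehat{B}\|\epsilon\quad\text{and}\quad\|\widetilde{F}\|\leq\|\widetilde{B}\|\epsilon,\ \ \|\widetilde{E}\|\leq\|\widetilde{A}\|\epsilon \tag{2.8}$$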

for $\epsilon$ small. In applications, we typically have $\epsilon=\mathcal{O}(\epsilon_{\rm mach})$ or $\epsilon=\mathcal{O}(\epsilon_{\rm mach}^{1/2})$ with $\epsilon_{\rm mach}$ being the machine precision baiedit2000 ; golub2012matrix ; parlett1998symmetric ; saad2011numerical ; stewart2001matrix . Here in (2.7), to distinguish from the exact augmented matrices defined in (1.4) and (1.5), we have used the bold letters to denote the perturbed matrices. Notice that the assumption $\|A\|\approx\|B\|\approx 1$ made in Section 1 means $\|\widehat{A}\|=\|A\|$ , $\|\widetilde{A}\|=\max\{1,\|A\|^{2}\}$ and $\|\widehat{B}\|=\max\{1,\|B\|^{2}\},\ \|\widetilde{B}\|=\|B\|$ . Therefore, the perturbations in (2.8) satisfy

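$$\max\{\|\widehat{E}\|,\|\widehat{F}\|,\|\widetilde{E}\|,\|\widetilde{F}\|\}\leq\max\{\|A\|^{2},\|B\|^{2},1\}\epsilon. \tag{2.9}$$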

In what follows, we will analyze how accurate the computed eigenpairs $(\widehat{\sigma},\widehat{y})$ and $(\frac{1}{\widetilde{\sigma}},\widetilde{z})$ are for a given small $\epsilon$ .

2.1 The accuracy of generalized singular values

Stewart and Sun in the monograph stewart1990matrix use a chordal metric to measure the distance between the approximate and exact eigenvalues of a regular matrix pair. Let $\widehat{\sigma}$ and $\sigma$ be the eigenvalues of $(\widehat{\bm{A}},\widehat{\bm{B}})$ and $(\widehat{A},\widehat{B})$ . Then the chordal distance between them is

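$$\chi(\widehat{\sigma},\sigma)=\frac{|\widehat{\sigma}-\sigma|}{\sqrt{1+\widehat{\sigma}^{2}}\sqrt{1+\sigma^{2}}}. \tag{2.10}$$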

We present the following results.

Theorem 2.2.

Let $(\sigma,y)$ and $(\frac{1}{\sigma},z)$ be simple eigenpairs of $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ , respectively, and their approximations $(\widehat{\sigma},\widehat{y})$ and $(\frac{1}{\widetilde{\sigma}},\widetilde{z})$ be the exact eigenpairs of the perturbed matrix pairs $(\widehat{\bm{A}},\widehat{\bm{B}})=(\widehat{A}+\widehat{E},\widehat{B}+\widehat{F})$ and $(\widetilde{\bm{B}},\widetilde{\bm{A}})=(\widetilde{B}+\widetilde{F},\widetilde{A}+\widetilde{E})$ , respectively, with the perturbations satisfying (2.8). Assume that the approximate eigenvectors $\widehat{y}$ and $\widetilde{z}$ are decomposed in the unnormalized form of $\widehat{y}=y+s$ and $\widetilde{z}=z+t$ with $y^{T}\widehat{B}s=0$ and $z^{T}\widetilde{A}t=0$ . Then the following error bounds hold:

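$$\chi(\widehat{\sigma},\sigma)\leq\frac{\|y\|^{2}}{\sqrt{(y^{T}\widehat{A}y)^{2}+(y^{T}\widehat{B}y)^{2}}}\sqrt{\|\widehat{E}\|^{2}+\|\widehat{F}\|^{2}}\,(1+\delta_{1}), \tag{2.11}$$

$$\chi(\widetilde{\sigma},\sigma)\leq\frac{\|z\|^{2}}{\sqrt{(z^{T}\widetilde{B}z)^{2}+(z^{T}\widetilde{A}z)^{2}}}\sqrt{\|\widetilde{F}\|^{2}+\|\widetilde{E}\|^{2}}\,(1+\delta_{2}), \tag{2.12}$$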

where $\delta_{1}=\frac{\|s\|}{\|y\|}$ and $\delta_{2}=\frac{\|t\|}{\|z\|}$ .

Proof.

By the fact that $\widehat{A}y=\sigma\widehat{B}y$ and $\widehat{\bm{A}}\widehat{y}=\widehat{\sigma}\widehat{\bm{B}}\widehat{y}$ , we have $\sigma=\frac{\widehat{y}^{T}\widehat{A}y}{\widehat{y}^{T}\widehat{B}y}=\frac{y^{T}\widehat{A}\widehat{y}}{y^{T}\widehat{B}\widehat{y}}$ and $\widehat{\sigma}=\frac{y^{T}\widehat{\bm{A}}\widehat{y}}{y^{T}\widehat{\bm{B}}\widehat{y}}$ . Applying these two expressions to (2.10), we obtain

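$$\chi(\widehat{\sigma},\sigma)=\frac{|y^{T}\widehat{B}\widehat{y}\cdot y^{T}\widehat{\bm{A}}\widehat{y}-y^{T}\widehat{A}\widehat{y}\cdot y^{T}\widehat{\bm{B}}\widehat{y}|}{\sqrt{(y^{T}\widehat{A}\widehat{y})^{2}+(y^{T}\widehat{B}\widehat{y})^{2}}\sqrt{(y^{T}\widehat{\bm{A}}\widehat{y})^{2}+(y^{T}\widehat{\bm{B}}\widehat{y})^{2}}}. \tag{2.13}$$

By $\widehat{A}=\widehat{\bm{A}}-\widehat{E}$ and $\widehat{B}=\widehat{\bm{B}}-\widehat{F}$ , the numerator in the above equality satisfies

$$|y^{T}\widehat{B}\widehat{y}\cdot y^{T}\widehat{\bm{A}}\widehat{y}-y^{T}\widehat{A}\widehat{y}\cdot y^{T}\widehat{\bm{B}}\widehat{y}|=|y^{T}\widehat{\bm{B}}\widehat{y}\cdot y^{T}\widehat{E}\widehat{y}-y^{T}\widehat{\bm{A}}\widehat{y}\cdot y^{T}\widehat{F}\widehat{y}|\leq\sqrt{(y^{T}\widehat{\bm{A}}\widehat{y})^{2}+(y^{T}\widehat{\bm{B}}\widehat{y})^{2}}\sqrt{(y^{T}\widehat{E}\widehat{y})^{2}+(y^{T}\widehat{F}\widehat{y})^{2}},$$

applying which to (2.13) gives rise to

$$\chi(\widehat{\sigma},\sigma)\leq\frac{\sqrt{(y^{T}\widehat{E}\widehat{y})^{2}+(y^{T}\widehat{F}\widehat{y})^{2}}}{\sqrt{(y^{T}\widehat{A}\widehat{y})^{2}+(y^{T}\widehat{B}\widehat{y})^{2}}}\leq\frac{\|y\|\|\widehat{y}\|\sqrt{\|\widehat{E}\|^{2}+\|\widehat{F}\|^{2}}}{\sqrt{(y^{T}\widehat{A}\widehat{y})^{2}+(y^{T}\widehat{B}\widehat{y})^{2}}}.$$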

Notice from $\widehat{y}=y+s$ with $s$ satisfying $y^{T}\widehat{B}s=0$ and $y^{T}\widehat{A}s=0$ that $y^{T}\widehat{A}\widehat{y}=y^{T}\widehat{A}(y+s)=y^{T}\widehat{A}y$ and $y^{T}\widehat{B}\widehat{y}=y^{T}\widehat{B}(y+s)=y^{T}\widehat{B}y$ . Moreover, it has $\|\widehat{y}\|\leq\|y\|+\|s\|=\|y\|(1+\delta_{1})$ with $\delta_{1}=\frac{\|s\|}{\|y\|}$ . Applying these facts to the above inequality gives (2.11).

Replacing $\widehat{\sigma}$ , $\widehat{y}$ , $y$ , $(\widehat{A},\widehat{B})$ and $(\widehat{\bm{A}},\widehat{\bm{B}})$ with $\frac{1}{\widetilde{\sigma}}$ , $\widetilde{z}$ , $z$ , $(\widetilde{B},\widetilde{A})$ and $(\widetilde{\bm{B}},\widetilde{\bm{A}})$ , respectively, in (2.11), and exploiting the invariance of the chordal distance under reciprocal, i.e.,

$$\chi(\widehat{\sigma}^{-1},\sigma^{-1})=\chi(\widehat{\sigma},\sigma),$$

we obtain (2.12). ∎

It can be seen from the proof that (2.11) and (2.12) are independent of the scalings of $y$ , $\widehat{y}$ and $z$ , $\widetilde{z}$ . Therefore, our assumption in the theorem on the unnormalized decomposition form of $\widehat{y}$ and $\widetilde{z}$ is without loss of generality and is only for brevity of the presentation.
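The chordal metric and its invariance under taking reciprocals, which the proof relies on, are easy to check numerically. The snippet below is a minimal sketch; the function name `chordal` and the sample values are chosen here for illustration and are restricted to nonzero real arguments.

```python
import math

def chordal(a: float, b: float) -> float:
    """Chordal distance |a - b| / (sqrt(1 + a^2) * sqrt(1 + b^2)) between two reals."""
    return abs(a - b) / (math.sqrt(1.0 + a * a) * math.sqrt(1.0 + b * b))

# Sample pairs (approximate value, exact value); purely illustrative numbers.
pairs = [(0.31, 0.30), (1.0e-6, 2.0e-6), (5.0e3, 5.1e3)]

for s_hat, s in pairs:
    direct = chordal(s_hat, s)
    reciprocal = chordal(1.0 / s_hat, 1.0 / s)
    # The two values agree up to rounding, so the same metric serves both
    # the eigenvalues sigma of (1.4) and the eigenvalues 1/sigma of (1.5).
    print(f"{direct:.6e}  {reciprocal:.6e}")
```

For small $\sigma$ the chordal distance behaves like the absolute error $|\widehat{\sigma}-\sigma|$, while for large $\sigma$ it behaves like the absolute error in $1/\sigma$.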

For the scalars $\delta_{1}$ and $\delta_{2}$ in (2.11) and (2.12), we claim that

$$\delta_{1}=\mathcal{O}(\epsilon)\quad\text{and}\quad\delta_{2}=\mathcal{O}(\epsilon) \tag{2.14}$$

for a sufficiently small $\epsilon$ in (2.8). To show this precisely, without loss of generality, we assume that the eigenvectors $y$ of $(\widehat{A},\widehat{B})$ and $z$ of $(\widetilde{B},\widetilde{A})$ are scaled such that $y^{T}\widehat{B}y=1$ and $z^{T}\widetilde{A}z=1$ . Moreover, let the generalized eigenvalue and eigenvector matrices of $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ defined by (2.1) be partitioned as

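$$\widehat{\Sigma}=\begin{bmatrix}\sigma&\\ &\widehat{\Sigma}_{2}\end{bmatrix},\quad Y=[y,Y_{2}],\quad\widetilde{\Lambda}=\begin{bmatrix}\frac{1}{\sigma}&\\ &\widetilde{\Lambda}_{2}\end{bmatrix},\quad Z=[z,Z_{2}]. \tag{2.15}$$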

Relation (2.4) shows $Y_{2}^{T}\widehat{B}y=0$ , i.e., the columns of $Y_{2}$ form a basis of $(\widehat{B}y)^{\perp}$ , and $s^{T}\widehat{B}y=0$ indicates that we can write $s=Y_{2}h$ for some $h\in\mathbb{R}^{m+n-1}$ . By (2.24) to be proved later, we have

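$$\|s\|\leq\|Y_{2}\|\|h\|\leq\frac{\|Y_{2}\|^{2}\|\widehat{y}\|}{\min_{\mu_{i}\neq\sigma}|\mu_{i}-\widehat{\sigma}|}\|\widehat{\sigma}\widehat{F}-\widehat{E}\|\leq\eta_{1}\|y\|(1+\delta_{1}) \tag{2.16}$$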

with $\eta_{1}=\frac{\|Y_{2}\|^{2}\|\widehat{\sigma}\widehat{F}-\widehat{E}\|}{\min_{\mu_{i}\neq\sigma}|\mu_{i}-\widehat{\sigma}|}$ and $\mu_{i}$ being the eigenvalues of $(\widehat{A},\widehat{B})$ other than $\sigma$ . If $\epsilon$ in (2.8) is sufficiently small such that $\eta_{1}<1$ , then from (2.16) we obtain an explicit bound for $\delta_{1}$ :

$$\delta_{1}=\frac{\|s\|}{\|y\|}\leq\frac{\eta_{1}}{1-\eta_{1}}=\mathcal{O}(\epsilon).$$

In an analogous manner, we can obtain $\delta_{2}=\mathcal{O}(\epsilon)$ .
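For completeness, the elementary step behind the explicit bound for $\delta_{1}$ can be written out as follows. This is only a restatement of the argument above: it starts from the bound $\|s\|\leq\eta_{1}\|y\|(1+\delta_{1})$ in (2.16) and uses nothing beyond the assumption $\eta_{1}<1$.

$$\delta_{1}=\frac{\|s\|}{\|y\|}\leq\eta_{1}(1+\delta_{1})\ \Longrightarrow\ (1-\eta_{1})\,\delta_{1}\leq\eta_{1}\ \Longrightarrow\ \delta_{1}\leq\frac{\eta_{1}}{1-\eta_{1}}.$$

Since $\eta_{1}$ is proportional to $\|\widehat{\sigma}\widehat{F}-\widehat{E}\|\leq|\widehat{\sigma}|\|\widehat{F}\|+\|\widehat{E}\|$, which is $\mathcal{O}(\epsilon)$ by (2.8), the right-hand side is $\mathcal{O}(\epsilon)$ as long as the eigenvalue gap $\min_{\mu_{i}\neq\sigma}|\mu_{i}-\widehat{\sigma}|$ stays bounded away from zero.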

It is worthwhile to point out that some first order expansions are derived for $\chi(\widehat{\sigma},\sigma)$ for a general regular matrix pair in (stewart1990matrix, p.291-4) but the constants in the second order smaller terms are unknown. The proofs of bounds (2.11) and (2.12) have no special requirement on the matrix pairs and thus are directly applicable to a general regular matrix pair by replacing the transpose by the conjugate transpose and the scalars in the denominators by their absolute values. In comparison with those results in (stewart1990matrix, p.291-4), however, our bounds contain explicit second order smaller terms since we have obtained the explicit bounds for $\delta_{1}$ and $\delta_{2}$ .

Exploiting $y=\frac{1}{\sqrt{2}}\begin{bmatrix}\begin{smallmatrix}u\\ x/\beta\end{smallmatrix}\end{bmatrix}$ and $z=\frac{1}{\sqrt{2}}\begin{bmatrix}\begin{smallmatrix}v\\ x/\alpha\end{smallmatrix}\end{bmatrix}$ in Theorem 2.2, and keeping (2.14) in mind, we can present the following results.

Theorem 2.3.

Let $(\sigma,y)$ and $(\frac{1}{\sigma},z)$ be the eigenpairs of $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ corresponding to the GSVD component $(\alpha,\beta,u,v,x)$ of $(A,B)$ . Assume that their approximations $(\widehat{\sigma},\widehat{y})$ and $(\frac{1}{\widetilde{\sigma}},\widetilde{z})$ are the generalized eigenpairs of the perturbed $(\widehat{\bm{A}},\widehat{\bm{B}})$ and $(\widetilde{\bm{B}},\widetilde{\bm{A}})$ , respectively, where the perturbations satisfy (2.9). If $\epsilon$ is sufficiently small, the following error estimates hold:

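$$\chi(\widehat{\sigma},\sigma)\leq\frac{\|x\|^{2}+\beta^{2}}{2\beta}\sqrt{\|\widehat{E}\|^{2}+\|\widehat{F}\|^{2}}\,(1+\delta_{1}), \tag{2.17}$$

$$\chi(\widetilde{\sigma},\sigma)\leq\frac{\|x\|^{2}+\alpha^{2}}{2\alpha}\sqrt{\|\widetilde{F}\|^{2}+\|\widetilde{E}\|^{2}}\,(1+\delta_{2}), \tag{2.18}$$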

where $\delta_{1}=\mathcal{O}(\epsilon)$ and $\delta_{2}=\mathcal{O}(\epsilon)$ .

Proof.

It suffices to prove (2.17), and the proof of (2.18) is similar. From Lemma 2.1, notice that the eigenvector $y$ of $(\widehat{A},\widehat{B})$ satisfies $y^{T}\widehat{A}y=\sigma$ and $y^{T}\widehat{B}y=1$ . From $\sigma=\alpha/\beta$ , $\alpha^{2}+\beta^{2}=1$ and $\|u\|=1$ , we have

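$$\frac{\|y\|^{2}}{\sqrt{(y^{T}\widehat{A}y)^{2}+(y^{T}\widehat{B}y)^{2}}}=\frac{1}{2}\cdot\frac{\|u\|^{2}+\frac{\|x\|^{2}}{\beta^{2}}}{\sqrt{1+\sigma^{2}}}=\frac{\|x\|^{2}+\beta^{2}}{2\beta}.$$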

Applying this and (2.14) to (2.11) yields (2.17). ∎
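The identity used in this proof, namely that $\|y\|^{2}/\sqrt{(y^{T}\widehat{A}y)^{2}+(y^{T}\widehat{B}y)^{2}}$ equals $(\|x\|^{2}+\beta^{2})/(2\beta)$ for a $\widehat{B}$-normalized eigenvector $y$, can be sanity-checked on a small example. The sketch below assumes random test matrices and uses the largest eigenpair only as an arbitrary representative.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
m, p, n = 9, 8, 5                      # toy sizes (assumption)
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, n))

A_hat = np.block([[np.zeros((m, m)), A], [A.T, np.zeros((n, n))]])
B_hat = np.block([[np.eye(m), np.zeros((m, n))],
                  [np.zeros((n, m)), B.T @ B]])

w, Y = eigh(A_hat, B_hat)              # eigenvectors satisfy y^T B_hat y = 1
sigma, y = w[-1], Y[:, -1]

beta = 1.0 / np.sqrt(1.0 + sigma**2)   # from sigma = alpha/beta, alpha^2 + beta^2 = 1
x = np.sqrt(2.0) * beta * y[m:]        # right generalized singular vector

# Left-hand side: the constant appearing in the eigenvalue bound of Theorem 2.2.
lhs = (y @ y) / np.hypot(y @ A_hat @ y, y @ B_hat @ y)
# Right-hand side: the same constant expressed through the GSVD quantities.
rhs = (x @ x + beta**2) / (2.0 * beta)
print(lhs, rhs)
```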

Notice from (2.9) that the perturbation terms in the right hand sides of both (2.17) and (2.18) are no more than the same $\mathcal{O}(\epsilon)$ . Theorem 2.3 illustrates that the accuracy of the approximate generalized singular value $\widehat{\sigma}$ and that of $\widetilde{\sigma}$ are determined by $\beta$ and $\|x\|$ , and by $\alpha$ and $\|x\|$ , respectively. Apparently, a large $\|x\|$ could severely impair the accuracy of both $\widehat{\sigma}$ and $\widetilde{\sigma}$ . Fortunately, the following bounds show that $\|x\|$ must be modest under some mild conditions.

Lemma 2.4.

Let $X$ be the right generalized singular vector matrix of $(A,B)$ as defined in (1.1) and $x$ be an arbitrary column of $X$ . Then

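$$\|X\|\leq\min\{\|A^{\dagger}\|,\|B^{\dagger}\|\}\quad\text{and}\quad\|X^{-1}\|\leq\sqrt{\|A\|^{2}+\|B\|^{2}}, \tag{2.19}$$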

where the superscript ${\dagger}$ denotes the Moore-Penrose generalized inverse of a matrix, and

$$\frac{1}{\sqrt{\|A\|^{2}+\|B\|^{2}}}\leq\|x\|\leq\min\{\|A^{\dagger}\|,\|B^{\dagger}\|\}. \tag{2.20}$$

Proof.

The bounds in (2.19) and the upper bound for $\|x\|$ in (2.20) are from Theorem 2.3 of hansen1989regularization . Note that $x$ is a column of $X$ . Then the lower bound for $\|x\|$ in (2.20) follows from the fact that

$$\|x\|\geq\sigma_{\min}(X)=\|X^{-1}\|^{-1}.$$

∎

Lemma 2.4 indicates that, provided that one of $A$ and $B$ is well conditioned, $\|x\|$ must be modest. In applications, to the best of our knowledge, there seems to be no case where both $A$ and $B$ are simultaneously ill conditioned. Therefore, without loss of generality, we will assume that at least one of $A$ and $B$ is well conditioned. Then we have $\|x\|=\mathcal{O}(1)$ . Under this assumption, the stacked matrix $\begin{bmatrix}\begin{smallmatrix}A\\ B\end{smallmatrix}\end{bmatrix}$ must be well conditioned, too (stewart1990matrix, Theorem 4.4).
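The bounds of Lemma 2.4 can be observed on a small dense example. In the sketch below, which is illustrative only, $X$ is built from the eigenvectors of the cross product pair $(A^{T}A,B^{T}B)$ rescaled so that $\|Ax_{i}\|=\alpha_{i}$ and $\|Bx_{i}\|=\beta_{i}$; the random test matrices, the sizes, and this way of obtaining $X$ are assumptions made for the check, suitable only for small well conditioned problems.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
m, p, n = 30, 40, 10                   # toy sizes (assumption)
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, n))
A /= np.linalg.norm(A, 2)              # scale so that ||A|| = ||B|| = 1
B /= np.linalg.norm(B, 2)

# eigh gives V with V^T (B^T B) V = I and V^T (A^T A) V = diag(sigma^2).
w, V = eigh(A.T @ A, B.T @ B)
beta = 1.0 / np.sqrt(1.0 + w)          # beta_i, since sigma_i = alpha_i / beta_i
X = V * beta                           # column i scaled by beta_i

def smallest_singular_value(M):
    return np.linalg.svd(M, compute_uv=False)[-1]

norm_X = np.linalg.norm(X, 2)
norm_X_inv = np.linalg.norm(np.linalg.inv(X), 2)
upper_X = min(1.0 / smallest_singular_value(A), 1.0 / smallest_singular_value(B))
upper_X_inv = np.sqrt(np.linalg.norm(A, 2) ** 2 + np.linalg.norm(B, 2) ** 2)
column_norms = np.linalg.norm(X, axis=0)

print(norm_X, "<=", upper_X)
print(norm_X_inv, "<=", upper_X_inv)
print(column_norms.min(), ">=", 1.0 / upper_X_inv)
```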

Moreover, Theorem 2.4 of hansen1989regularization shows that provided $\begin{bmatrix}\begin{smallmatrix}A\\ B\end{smallmatrix}\end{bmatrix}$ is well conditioned, the singular values of $A$ and those of $B$ behave like $\alpha_{i}$ and $\beta_{i},i=1,2,\ldots,n$ , correspondingly: the ratios of the singular values of $A$ and $\alpha_{i}$ (resp. those of the singular values of $B$ and $\beta_{i}$ ), when labeled by the same order, are bounded from below and above by $\big{\|}\begin{bmatrix}\begin{smallmatrix}A\\ B\end{smallmatrix}\end{bmatrix}^{\dagger}\big{\|}^{-1}$ and $\big{\|}\begin{bmatrix}\begin{smallmatrix}A\\ B\end{smallmatrix}\end{bmatrix}\big{\|}$ , respectively. As a consequence, it is straightforward to justify the following basic properties, which will play a vital role in analyzing the results in this paper.

Property 2.5.

Assume that at least one of $A$ and $B$ is well conditioned.

•

If both $A$ and $B$ are well conditioned, no $\alpha_{i}$ and $\beta_{i}$ are small. In this case, all the generalized singular values $\sigma_{i}$ of $(A,B)$ are neither large nor small.

•

If $A$ or $B$ is ill conditioned, there must be some small $\alpha_{i}$ or $\beta_{i}$ , that is, some generalized singular values $\sigma_{i}$ must be small or large. Moreover, for $A$ ill conditioned the small generalized singular values satisfy $\sigma_{i}=\alpha_{i}/\beta_{i}=\alpha_{i}(1-\alpha_{i}^{2})^{-\frac{1}{2}}\approx\alpha_{i}$ , and for $B$ ill conditioned the large ones satisfy $\sigma_{i}=(1-\beta_{i}^{2})^{\frac{1}{2}}/\beta_{i}\approx 1/\beta_{i}$ .

•

If $A$ is ill conditioned and $B$ is well conditioned, none of the $\sigma_{i}$ can be large but some of them are small; if $A$ is well conditioned and $B$ is ill conditioned, none of the $\sigma_{i}$ can be small but some of them are large.
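The approximations $\sigma_{i}\approx\alpha_{i}$ for small $\alpha_{i}$ and $\sigma_{i}\approx 1/\beta_{i}$ for small $\beta_{i}$ in Property 2.5 can be checked numerically. The following is a minimal sketch that uses only the relations $\sigma=\alpha/\beta$ and $\alpha^{2}+\beta^{2}=1$ ; the function names are illustrative and not part of the paper.

```python
import math

def sigma_from_alpha(alpha: float) -> float:
    """Return sigma = alpha / beta, where beta = sqrt(1 - alpha^2)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return alpha / math.sqrt(1.0 - alpha * alpha)

def sigma_from_beta(beta: float) -> float:
    """Return sigma = alpha / beta, where alpha = sqrt(1 - beta^2)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    return math.sqrt(1.0 - beta * beta) / beta

for small in (1e-2, 1e-4, 1e-6):
    # Both ratios tend to 1 as the small quantity decreases.
    print(small, sigma_from_alpha(small) / small, sigma_from_beta(small) * small)
```

The relative deviation of each ratio from one is about half the square of the small quantity, which is why the approximations are already very sharp for moderately small $\alpha_{i}$ or $\beta_{i}$ .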

Notice that $\alpha^{2}+\beta^{2}=1$ and $\alpha>0$ , $\beta>0$ . We have

[TABLE]

Therefore, it follows from (2.20) that

[TABLE]

This, together with the assumption $\|A\|\approx\|B\|\approx 1$ , shows that the lower and upper bounds are roughly $\frac{1}{3}$ and 3, respectively, and the ratio is thus very modest. When at least one of $A$ and $B$ is well conditioned, it is clear that the numerators $\|x\|^{2}+\beta^{2}$ and $\|x\|^{2}+\alpha^{2}$ in the constants in front of the perturbation terms in bounds (2.17) and (2.18) are not only modest but also very comparable in size. It is worth noting, however, that the lower and upper bounds in (2.21) show that the ratio $\frac{\|x\|^{2}+\beta^{2}}{\|x\|^{2}+\alpha^{2}}$ is always modest, independent of the conditioning of $A$ and $B$ . Furthermore, relation (2.21) shows that it is the denominators $2\beta$ and $2\alpha$ that decide the size of the constants in front of the perturbation terms in bounds (2.17) and (2.18). As a consequence, in terms of Theorem 2.3 and Property 2.5, we can draw the following conclusions for the accurate computation of $\sigma$ :

•

For $A$ and $B$ well conditioned, both (1.4) and (1.5) work well.

•

If $A$ is well conditioned but $B$ is ill conditioned, (1.5) is preferable to (1.4).

•

If $A$ is ill conditioned but $B$ is well conditioned, (1.4) is better than (1.5).

2.2 The accuracy of generalized eigenvectors

In terms of the angles between the approximate and exact eigenvectors, we present the following accuracy estimates for the approximate eigenvectors of the symmetric definite matrix pairs in (1.4) and (1.5).

Theorem 2.6.

With the notations of Theorem 2.2, the following bounds hold:

[TABLE]

where the $\mu_{i}$ are the eigenvalues of $(\widehat{A},\widehat{B})$ other than $\sigma$ , and the $\nu_{i}$ are the eigenvalues of $(\widetilde{B},\widetilde{A})$ other than $\frac{1}{\sigma}$ .

Proof.

By definition, we have $(\widehat{A}+\widehat{E})\widehat{y}=\widehat{\sigma}(\widehat{B}+\widehat{F})\widehat{y}$ with $\widehat{y}=y+s=y+Y_{2}h$ for some $h\in\mathbb{R}^{m+n-1}$ and the matrix $Y_{2}$ defined as in (2.15). By a simple manipulation, we obtain

[TABLE]

Premultiplying both sides of the above relation by $Y_{2}^{T}$ , and noticing from (2.15) and (2.4) that $Y_{2}^{T}\widehat{A}y=0$ , $Y_{2}^{T}\widehat{B}y=0$ and $Y_{2}^{T}\widehat{A}Y_{2}=\widehat{\Sigma}_{2}$ , $Y_{2}^{T}\widehat{B}Y_{2}=I_{m+n-1}$ , we obtain

[TABLE]

Taking $2$ -norms on both sides of the above equality and exploiting

[TABLE]

with $\mu_{i}$ being the eigenvalues of $(\widehat{A},\widehat{B})$ other than $\sigma$ leads to

[TABLE]

By definition, the sine of the angle between $\widehat{y}=y+s$ and $y$ satisfies

[TABLE]

Substituting $\|s\|=\|Y_{2}h\|\leq\|Y_{2}\|\|h\|$ and (2.24) into (2.25) yields

[TABLE]

Notice that $\widehat{B}$ is positive definite and $Y_{2}$ satisfies $Y_{2}^{T}\widehat{B}Y_{2}=I_{m+n-1}$ . We have

[TABLE]

applying which to (2.26) gives (2.22).

Following the same derivation, we obtain

[TABLE]

with $\nu_{i}$ being eigenvalues of $(\widetilde{B},\widetilde{A})$ other than $\frac{1}{\sigma}$ , i.e., (2.23) holds. ∎

Theorem 2.6 gives accuracy estimates for the approximate eigenvectors of the matrix pairs $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ . It presents the results in the form of the structured matrix pairs and their eigenvalues. For our use in the GSVD context, substituting the definitions of $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ in (1.4) and (1.5) as well as their eigenvectors in (2.2) and (2.3) into Theorem 2.6, we can express the results more clearly in terms of the generalized singular values of $(A,B)$ and the matrices $A$ and $B$ themselves.

Theorem 2.7.

With the notations of Theorem 2.3, the following results hold:

[TABLE]

where the $\sigma_{i}$ are the generalized singular values of $(A,B)$ other than $\sigma$ .

Proof.

Since the eigenvalues of $(\widehat{A},\widehat{B})$ are $\pm\sigma_{1},\pm\sigma_{2},\dots,\pm\sigma_{n}$ and $m-n$ zeros, we have

[TABLE]

where the $\sigma_{i}$ are the generalized singular values of $(A,B)$ other than $\sigma$ .

On the other hand, by definition (1.4) of $\widehat{B}$ , we have

[TABLE]

Applying (2.29) and (2.30) to (2.22), we obtain (2.27).

Notice that the eigenvalues of $(\widetilde{B},\widetilde{A})$ are $\pm\frac{1}{\sigma_{1}},\pm\frac{1}{\sigma_{2}},\dots,\pm\frac{1}{\sigma_{n}}$ and $p-n$ zeros. Following the same derivations as above, we obtain

[TABLE]

which proves (2.28). ∎

Denote $\widehat{\sigma}=\sigma(1+\omega_{1})$ and $\widetilde{\sigma}=\sigma(1+\omega_{2})$ with $\omega_{1}=\frac{\widehat{\sigma}-\sigma}{\sigma}$ and $\omega_{2}=\frac{\widetilde{\sigma}-\sigma}{\sigma}$ . Assume that $\epsilon$ in (2.9) is sufficiently small. Then from (2.10) and (2.17)–(2.18), we have $\omega_{1}=\mathcal{O}(\epsilon)$ and $\omega_{2}=\mathcal{O}(\epsilon)$ . For any generalized singular value $\sigma_{i}\neq\sigma$ of $(A,B)$ , it is straightforward to obtain

[TABLE]

and

[TABLE]

As a consequence, it holds that

[TABLE]

For the minima in the right-hand sides of (2.31) and (2.32), we have the following result.

Theorem 2.8.

Denote $\gamma_{1}=\min_{\sigma_{i}\neq\sigma}\{|1-\frac{\sigma_{i}}{\sigma}|,1\}$ and $\gamma_{2}=\min_{\sigma_{i}\neq\sigma}\{|1-\frac{\sigma}{\sigma_{i}}|,1\}$ with $\sigma_{i}$ being the generalized singular values of $(A,B)$ other than $\sigma$ . Then

$\frac{1}{2}\leq\frac{\gamma_{2}}{\gamma_{1}}\leq 2.$ (2.33)

To prove this theorem, we need the following lemma.

Lemma 2.9.

Define $f(t)=\min\{|1-t|,1\}$ and $g(t)=\min\{|1-\frac{1}{t}|,1\}$ for $t\in(0,1)\cup(1,+\infty)$ . Then

$\frac{1}{2}\leq\frac{g(t)}{f(t)}\leq 2.$ (2.34)

Proof.

We split the admissible values of $t$ into three cases:

•

if $t\in(0,\frac{1}{2})$ , then $f(t)=1-t$ , $g(t)=1$ and $\frac{g(t)}{f(t)}=\frac{1}{1-t}\in(1,2);$

•

if $t\in[\frac{1}{2},1)\cup(1,2]$ , then $f(t)=|1-t|$ , $g(t)=|1-\frac{1}{t}|$ and $\frac{g(t)}{f(t)}=\frac{1}{t}\in[\frac{1}{2},1)\cup(1,2];$

•

if $t\in(2,+\infty)$ , then $f(t)=1$ , $g(t)=1-\frac{1}{t}$ and $\frac{g(t)}{f(t)}=1-\frac{1}{t}\in(\frac{1}{2},1).$

Summarizing the above establishes (2.34). ∎
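The case analysis in this proof can be double-checked numerically. The sketch below evaluates $g(t)/f(t)$ on a logarithmic grid of sample points (the grid itself is an arbitrary choice made for illustration) and confirms that the ratio stays in $[\frac{1}{2},2]$ , with both ends attained at $t=2$ and $t=\frac{1}{2}$ .

```python
def f(t: float) -> float:
    """f(t) = min{|1 - t|, 1} from Lemma 2.9."""
    return min(abs(1.0 - t), 1.0)

def g(t: float) -> float:
    """g(t) = min{|1 - 1/t|, 1} from Lemma 2.9."""
    return min(abs(1.0 - 1.0 / t), 1.0)

def ratio_range(samples):
    """Smallest and largest value of g(t)/f(t) over the samples (t > 0, t != 1)."""
    ratios = [g(t) / f(t) for t in samples]
    return min(ratios), max(ratios)

# Log-spaced samples in [1e-6, 1e6], skipping t = 1 where f(t) = 0.
samples = [10.0 ** (k / 50.0) for k in range(-300, 301) if k != 0]
lowest, highest = ratio_range(samples)
print(lowest, highest)                    # both lie inside [1/2, 2]
print(g(2.0) / f(2.0), g(0.5) / f(0.5))   # the two extreme values of the ratio
```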

Proof of Theorem 2.8.

Denote by $\sigma_{l}$ and $\sigma_{r}$ the generalized singular values of $(A,B)$ that minimize $|1-\frac{\sigma_{i}}{\sigma}|$ and $|1-\frac{\sigma}{\sigma_{i}}|$ over all the generalized singular values $\sigma_{i}$ of $(A,B)$ other than $\sigma$ , respectively. Then $\gamma_{1}$ and $\gamma_{2}$ can be written as

[TABLE]

where the functions $f(\cdot)$ and $g(\cdot)$ are defined by Lemma 2.9. Therefore, the ratio in (2.33) is

[TABLE]

By the definitions of $\sigma_{l}$ and $\sigma_{r}$ , we have

[TABLE]

Combining (2.35) with (2.34), we obtain

[TABLE]

which completes the proof. ∎

Theorem 2.8, together with (2.31) and (2.32), means that the factors $\min_{\sigma_{i}\neq\sigma}\{|1-\frac{\sigma_{i}}{\widehat{\sigma}}|,1\}$ and $\min_{\sigma_{i}\neq\sigma}\{|1-\frac{\widetilde{\sigma}}{\sigma_{i}}|,1\}$ in (2.27) and (2.28) have approximately the same size and both are approximately the relative separation of the desired $\sigma$ from the other generalized singular values of $(A,B)$ . The bigger they are, i.e., the better the desired generalized singular value $\sigma$ is separated from the others, the more accurate the approximate eigenvectors of (1.4) and (1.5) are.

For a given $\epsilon$ , (2.9) tells us that $\sqrt{\|\widehat{E}\|^{2}+\widehat{\sigma}^{2}\|\widehat{F}\|^{2}}$ and $\sqrt{\|\widetilde{E}\|^{2}+\widetilde{\sigma}^{2}\|\widetilde{F}\|^{2}}$ in (2.27) and (2.28) are approximately equal. Therefore, Theorems 2.7–2.8 and $\widehat{\sigma}=\sigma(1+\mathcal{O}(\epsilon))$ , $\widetilde{\sigma}=\sigma(1+\mathcal{O}(\epsilon))$ show that which of $\widehat{y}$ and $\widetilde{z}$ is more accurate critically depends on the sizes of $\frac{\max\{1,\|B^{{\dagger}}\|^{2}\}}{\sigma}$ and $\max\{1,\|A^{{\dagger}}\|^{2}\}$ . Keep in mind that $\|A\|\approx\|B\|\approx 1$ means that $\max\{1,\|A^{{\dagger}}\|^{2}\}\approx\kappa^{2}(A)$ and $\max\{1,\|B^{{\dagger}}\|^{2}\}\approx\kappa^{2}(B)$ . Combining these results with Property 2.5, for a proper choice of (1.4) and (1.5) for computing eigenvectors more accurately, we can draw the following conclusions with the arguments included.

•

If $A$ and $B$ have roughly the same conditioning and both are well conditioned, then $\sigma$ cannot be large or small. In this case, both (1.4) and (1.5) are suitable formulations for computing the generalized eigenvectors $y$ and $z$ with similar accuracy.

•

For $B$ ill conditioned and $A$ well conditioned, assuming that the $\beta_{i}$ are labeled in decreasing order, from Property 2.5, since the pair $(A,B)$ has large generalized singular values $\sigma_{i}\approx 1/\beta_{i}$ but has no small one, it is known that $\|B^{{\dagger}}\|\approx\frac{1}{\min_{i}\beta_{i}}\approx\max_{i}\sigma_{i}=\sigma_{\max}(A,B)$ . Therefore, we have

[TABLE]

for any $\sigma$ . Hence (1.5) is preferable for computing any eigenvector $z$ more accurately.

•

For $B$ well conditioned and $A$ ill conditioned, from Property 2.5, since some generalized singular values $\sigma$ of $(A,B)$ are small but none is large, it is known that $\|A^{{\dagger}}\|\approx\frac{1}{\min_{i}\alpha_{i}}\approx\frac{1}{\min_{i}\sigma_{i}}=\frac{1}{\sigma_{\min}(A,B)}$ . Therefore, we always have

[TABLE]

for any $\sigma$ . This means that (1.4) is preferable for computing any eigenvector $y$ more accurately.

Finally, we notice from Theorem 2.7 that $\widehat{y}$ or $\widetilde{z}$ may have no accuracy at all whenever $\kappa(B)$ or $\kappa(A)$ is as large as $\mathcal{O}(\epsilon_{\rm mach}^{-1/2})$ , even if a backward stable generalized eigensolver is applied to (1.4) or (1.5) and the backward errors are $\mathcal{O}(\epsilon_{\rm mach})$ . For a large matrix pair $(A,B)$ , iterative projection methods are used to compute some specific GSVD components, and stopping criteria are typically $\mathcal{O}(\epsilon_{\rm mach}^{1/2})$ , so that backward errors are $\mathcal{O}(\epsilon_{\rm mach}^{1/2})$ . In this case, $\widehat{y}$ or $\widetilde{z}$ may have no accuracy once $\kappa(B)$ or $\kappa(A)$ is as large as $\mathcal{O}(\epsilon_{\rm mach}^{-1/4})$ .
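To give a feel for these thresholds in IEEE double precision, the following sketch computes the condition number at which the product of the squared condition number and the backward error reaches one. It is a rough illustration only: it assumes that the eigenvector bounds scale like $\kappa^{2}$ times the backward error and ignores the separation factor and all constants; the function name is hypothetical.

```python
import sys

# Unit roundoff scale of IEEE double precision, about 2.22e-16.
eps_mach = sys.float_info.epsilon

def critical_kappa(backward_error: float) -> float:
    """Condition number kappa at which kappa^2 * backward_error equals 1."""
    return backward_error ** -0.5

# Backward stable dense solver: backward error of order eps_mach.
print(critical_kappa(eps_mach))          # about 6.7e7
# Iterative solver stopped at sqrt(eps_mach): backward error of that order.
print(critical_kappa(eps_mach ** 0.5))   # about 8.2e3
```

In other words, with an iterative solver and the typical stopping tolerance, a condition number of only a few thousand may already be enough to lose all accuracy in the eigenvector of the less suitable formulation.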

3 The accuracy of generalized singular vectors

After applying a generalized eigensolver to the matrix pair $(\widehat{A},\widehat{B})$ or $(\widetilde{B},\widetilde{A})$ , the computed eigenvalue $\widehat{\sigma}$ or $\widetilde{\sigma}$ provides an approximation to the desired generalized singular value $\sigma$ directly. However, the situation is complicated and more involved for generalized singular vectors since the generalized eigenvector

[TABLE]

defined by (2.5) or (2.6) is a stack of the normalized left generalized singular vector $u$ or $v$ and the scaled right generalized singular vector

[TABLE]

We must recover approximations to the generalized singular vectors $u,v,x$ from a computed approximate eigenvector $\widehat{y}$ or $\widetilde{z}$ . For the GSVD components of $(A,B)$ , our next task is to determine which of $\widehat{y}$ and $\widetilde{z}$ delivers more accurate approximations to $u,v$ and $x$ when the perturbations $\widehat{E},\widehat{F}$ and $\widetilde{E},\widetilde{F}$ in (2.7) approximately have the same size in norm.

For (1.4), after a generalized eigensolver is run, we write the converged approximate eigenvector as $\widehat{y}=\frac{1}{\sqrt{2}}[\widehat{u}^{T},\widehat{x}^{T}]^{T}$ with $\widehat{u}\in\mathbb{R}^{m}$ normalized to have unit length and $\widehat{x}\in\mathbb{R}^{n}$ . Then $\widehat{u}$ and $\widehat{\beta}\widehat{x}$ provide approximations to the left generalized singular vector $u$ and the right generalized singular vector $x$ , respectively, with the computed $\widehat{\sigma}=\frac{\widehat{\alpha}}{\widehat{\beta}}$ . As for the left generalized singular vector $v$ , since $Bx=\beta v$ , it is natural to take the unit length $\widehat{v}=\frac{B\widehat{x}}{\|B\widehat{x}\|}$ as its approximation.

Analogously, for (1.5), we partition $\widetilde{z}=\frac{1}{\sqrt{2}}[\widetilde{v}^{T},\widetilde{x}^{T}]^{T}$ such that $\widetilde{v}\in\mathbb{R}^{p}$ is normalized to have unit length, $\widetilde{x}\in\mathbb{R}^{n}$ , and that $\widetilde{v}$ and $\widetilde{\alpha}\widetilde{x}$ are approximations to the left generalized singular vector $v$ and the right generalized singular vector $x$ , respectively, where the computed $\frac{1}{\widetilde{\sigma}}=\frac{\widetilde{\beta}}{\widetilde{\alpha}}$ . Since $Ax=\alpha u$ , we take the unit length $\widetilde{u}=\frac{A\widetilde{x}}{\|A\widetilde{x}\|}$ as an approximation to $u$ .
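The recovery procedure for (1.4) can be sketched as follows. This is a minimal illustration in plain Python, assuming a computed eigenpair with arbitrary eigenvector scaling and a dense $B$ stored as a list of rows; the function names are hypothetical. The values $\widehat{\alpha}$ and $\widehat{\beta}$ are obtained from $\widehat{\sigma}=\widehat{\alpha}/\widehat{\beta}$ and $\widehat{\alpha}^{2}+\widehat{\beta}^{2}=1$ . The recovery for (1.5) is analogous, with the roles of $A$ and $B$ and of $\alpha$ and $\beta$ interchanged.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def recover_from_y(y, sigma_hat, m, B):
    """Recover approximate GSVD components from an eigenpair of (1.4).

    y         : computed eigenvector of length m + n (any scaling)
    sigma_hat : computed eigenvalue, approximating sigma
    m         : number of rows of A (length of the u block)
    B         : the p-by-n matrix B as a list of rows
    """
    top, bottom = y[:m], y[m:]
    scale = norm(top)                      # rescale so that u_hat has unit length
    u_hat = [t / scale for t in top]
    x_hat = [b / scale for b in bottom]    # approximates x / beta
    beta_hat = 1.0 / math.sqrt(1.0 + sigma_hat ** 2)
    alpha_hat = sigma_hat * beta_hat
    x_rec = [beta_hat * xi for xi in x_hat]    # approximates x
    Bx = matvec(B, x_hat)
    nb = norm(Bx)
    v_hat = [w / nb for w in Bx]           # unit-length approximation to v
    return alpha_hat, beta_hat, u_hat, v_hat, x_rec

# Scalar example: A = [3], B = [4], so sigma = 3/4, alpha = 0.6, beta = 0.8
# and x = 0.2; the exact eigenvector of (1.4) is [u; x/beta] = [1; 0.25].
print(recover_from_y([2.0, 0.5], 0.75, 1, [[4.0]]))
```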

Previously we have derived error estimates on $\sin\angle(\widehat{y},y)$ and $\sin\angle(\widetilde{z},z)$ for the approximate eigenvectors $\widehat{y}$ and $\widetilde{z}$ . Next we exploit them to estimate the accuracy of the approximate generalized singular vectors $(\widehat{u},\widehat{v},\widehat{\beta}\widehat{x})$ and $(\widetilde{u},\widetilde{v},\widetilde{\alpha}\widetilde{x})$ recovered in the manner described above. To this end, we prove the following lemma, which is a generalization of Theorem 2.3 in jia2003implicitly .

Lemma 3.1.

Assume that $a$ and $b$ are arbitrary nonzero vectors, and let $a^{\prime}$ and $b^{\prime}$ be approximations to them, respectively. Then

[TABLE]

Moreover, it holds that

[TABLE]

where $\varrho=\sqrt{1+\max\left\{\tfrac{\|a\|^{2}}{\|b\|^{2}},\tfrac{\|b\|^{2}}{\|a\|^{2}}\right\}}.$

Proof.

By definition, the sine of the angle between two vectors $a$ and $a^{\prime}$ satisfies

[TABLE]

A similar relation holds with $a$ and $a^{\prime}$ replaced by $b$ and $b^{\prime}$ , respectively. Combining these two relations with the inequality

[TABLE]

proves (3.2). From (3.2), taking the smaller one of $\sin\angle(a^{\prime},a)$ and $\sin\angle(b^{\prime},b)$ yields (3.3). It is also straightforward to obtain

[TABLE]

Combining the above two inequalities gives rise to (3.4). ∎

Taking $a=u$ , $b=x_{\beta}$ and $a^{\prime}=\widehat{u}$ , $b^{\prime}=\widehat{x}$ , bound (3.3) illustrates that at least one of the recovered approximate generalized singular vectors $\widehat{u}$ and $\widehat{x}$ is as accurate as $\widehat{y}$ . Since $\|u\|=1$ , bound (3.4) indicates that if $\|x_{\beta}\|=\mathcal{O}(1)$ then both $\widehat{u}$ and $\widehat{x}$ have the same accuracy as $\widehat{y}$ . But bound (3.4) also states that if $\|x_{\beta}\|$ is very small or large relative to $\|u\|=1$ then one of $\widehat{u}$ and $\widehat{x}$ may have considerably poorer accuracy than $\widehat{y}$ due to the large factor $\varrho$ . Fortunately, a very small $\|x_{\beta}\|$ is unlikely to happen as $\|x\|$ is always modest under the assumption that at least one of $A$ and $B$ is well conditioned, implying that $\|x_{\beta}\|=\frac{\|x\|}{\beta}$ cannot be small as $0<\beta<1$ . On the other hand, when the largest GSVD components of $(A,B)$ are required, a large $\|x_{\beta}\|$ definitely appears if $B$ is ill conditioned since $\beta$ behaves like the singular values of $B$ and is small, as Property 2.5 shows.
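The effect of an imbalance between $\|u\|$ and $\|x_{\beta}\|$ can be seen in a small numerical example. The sketch below uses made-up vectors (not data from the paper) in which the short block $a$ carries all of the error while the long block $b$ is exact. It checks two elementary consequences of stacking, namely that the smaller of the two block sines does not exceed the sine for the stacked vector and that the larger one does not exceed $\varrho$ times it; these are meant to mirror the roles of (3.3) and (3.4).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def sin_angle(a, a_approx):
    """Sine of the angle between a and a_approx, via the projection residual."""
    coef = dot(a, a_approx) / dot(a_approx, a_approx)
    resid = [x - coef * y for x, y in zip(a, a_approx)]
    return norm(resid) / norm(a)

def stacked_sines(a, b, a_approx, b_approx):
    """Sines for the blocks a, b and for the stacked vector, plus the factor rho."""
    s_a = sin_angle(a, a_approx)
    s_b = sin_angle(b, b_approx)
    s_c = sin_angle(a + b, a_approx + b_approx)   # list concatenation stacks blocks
    ratio = max(norm(a) / norm(b), norm(b) / norm(a))
    rho = math.sqrt(1.0 + ratio ** 2)
    return s_a, s_b, s_c, rho

# a plays the role of u (unit length), b the role of a large x_beta.
a, b = [1.0, 0.0], [1.0e4, 0.0]
a_approx, b_approx = [1.0, 1.0e-3], [1.0e4, 0.0]
s_a, s_b, s_c, rho = stacked_sines(a, b, a_approx, b_approx)
print(s_a, s_b, s_c, rho)
```

Here the stacked vector looks accurate to about seven digits, while the short block is accurate to only about three; the loss is essentially the factor $\varrho\approx 10^{4}$ .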

Precisely, based on Lemma 3.1, we can derive quantitative accuracy estimates for the recovered approximate generalized singular vectors, as will be shown below.

Theorem 3.2.

The scaled right generalized singular vector $x_{\beta}$ , defined in (3.1), of $(A,B)$ satisfies

[TABLE]

For the approximate generalized singular vectors $\widehat{u},\widehat{v}$ and $\widehat{x}$ recovered from the approximate eigenvector $\widehat{y}$ of (1.4), it holds that

[TABLE]

Proof.

From $Bx=\beta v$ and $\|v\|=1$ we have

[TABLE]

which shows (3.5).

Take $a=u$ , $b=x_{\beta}$ in Lemma 3.1. Neglecting the first term in the left hand side of (3.2), we obtain

[TABLE]

which proves (3.6).

Neglecting the second term in the left hand side of (3.2) gives (3.7).

As for $\widehat{v}=\frac{B\widehat{x}}{\|B\widehat{x}\|}$ , exploiting $Bx=\beta v$ with $\|v\|=1$ and combining (3.6) with $\|x_{\beta}\|=\frac{\|x\|}{\beta}$ , we have

[TABLE]

which proves (3.8). ∎

As $\|x_{\beta}\|=\frac{\|x\|}{\beta}\geq\frac{1}{\|B\|}\approx 1$ , this theorem shows that the recovered approximate generalized singular vector $\widehat{x}$ is unconditionally as accurate as $\widehat{y}$ , but $\widehat{u}$ and $\widehat{v}$ are guaranteed to be as accurate as $\widehat{y}$ only if $\beta$ is not small. As Property 2.5 indicates, it is the conditioning of $B$ that determines the size of $\beta$ : for $B$ well conditioned, there is no small $\beta$ , so that the recovered approximate generalized singular vectors are guaranteed to be as accurate as $\widehat{y}$ ; for $B$ ill conditioned, some $\beta$ must be small so that $\|x_{\beta}\|$ is large, which corresponds to large generalized singular values $\sigma$ , so that the associated recovered $\widehat{u}$ and $\widehat{v}$ may have poorer accuracy than $\widehat{y}$ .

In an analogous manner, we can prove the following results.

Theorem 3.3.

The scaled right generalized singular vector $x_{\alpha}$ , defined in (3.1), of $(A,B)$ satisfies

[TABLE]

For the approximate generalized singular vectors $\widetilde{u},\widetilde{v}$ and $\widetilde{x}$ recovered from the approximate eigenvector $\widetilde{z}$ of (1.5), it holds that

[TABLE]

The comments on Theorem 3.2 apply to this theorem: $\widetilde{x}$ is always as accurate as $\widetilde{z}$ ; $\widetilde{u}$ and $\widetilde{v}$ are guaranteed to be as accurate as $\widetilde{z}$ only if $\alpha$ is not small, and they may be considerably less accurate than $\widetilde{z}$ when $\|x_{\alpha}\|$ is large, i.e., when $\alpha$ is small. From Property 2.5, it is known that if $A$ is well conditioned then no $\alpha$ is small, but if $A$ is ill conditioned then some $\alpha$ must be small, which corresponds to small generalized singular values $\sigma$ .

Recall the previous fundamental conclusions on the accuracy of $\widehat{y}$ and $\widetilde{z}$ , summarized near the end of Section 2. Substituting the bounds in Theorem 2.7 for $\sin\angle(\widehat{y},y)$ and $\sin\angle(\widetilde{z},z)$ into Theorems 3.2–3.3, we obtain the corresponding error estimates for the approximate generalized singular vectors recovered from the approximate eigenvectors $\widehat{y}$ of (1.4) and $\widetilde{z}$ of (1.5), as summarized below.

Theorem 3.4.

The approximate generalized singular vectors $\widehat{u}$ , $\widehat{v}$ , and $\widehat{x}$ recovered from the approximate eigenvector $\widehat{y}$ of (1.4) satisfy

[TABLE]

Similarly, the approximate generalized singular vectors $\widetilde{u}$ , $\widetilde{v}$ , and $\widetilde{x}$ recovered from the approximate eigenvector $\widetilde{z}$ of (1.5) satisfy

[TABLE]

Combining these bounds with the above analysis and the claims near the end of Section 2, we arrive at the following conclusions on a proper choice between (1.4) and (1.5) for more accurate computations of generalized singular vectors.

•

If $A$ and $B$ are equally conditioned, i.e., both of them are well conditioned, both (1.4) and (1.5) are suitable choices.

•

If $A$ is well conditioned and $B$ is ill conditioned, (1.5) is preferable.

•

If $A$ is ill conditioned and $B$ is well conditioned, (1.4) is preferable.

Comparing these conclusions with those at the end of Section 2.1 for accurate computations of generalized singular values, we find that they coincide exactly. Therefore, we have achieved our ultimate goal of making a proper choice between (1.4) and (1.5): the above conclusions apply to more accurate computations of both generalized singular values $\sigma$ and generalized singular vectors $u,v,x$ .

4 Practical choice strategies on (1.4) and (1.5)

In Sections 2–3 we have made a sensitivity analysis on the generalized singular values and the corresponding generalized singular vectors of $(A,B)$ , which are computed by solving the generalized eigenvalue problems of (1.4) and (1.5). The results have shown that, in order to compute the desired GSVD components of $(A,B)$ more accurately, we should make a suitable choice between (1.4) and (1.5). To be practical in computations, this requires estimating the condition numbers of $A$ and $B$ efficiently and reliably.

For $A$ and $B$ large-scale, note that we do not need to estimate $\kappa(A)$ and $\kappa(B)$ accurately, and rough estimates are enough. Taking $A$ as an example, we describe three approaches to estimate $\kappa(A)$ roughly. As $\|A\|\approx 1$ and $\kappa(A)\approx\sigma^{-1}_{\min}(A)$ , estimating $\kappa(A)$ is equivalent to estimating $\sigma_{\min}(A)$ .

The first approach: if $A$ is large-scale with a special structure such that the matrix-vector multiplication with the matrix $(A^{T}A)^{-1}$ can be implemented at affordable extra cost, then one can perform a $k$ -step symmetric Lanczos method baiedit2000 ; parlett1998symmetric on $(A^{T}A)^{-1}$ and take the square root of the largest approximate eigenvalue as a reasonable estimate of $\kappa(A)$ . In the algorithm, we need to form $A^{T}A$ and compute its Cholesky factorization, which is used to solve lower and upper triangular linear systems at each step of the Lanczos method. The largest eigenvalue and possibly the smallest eigenvalue of $(A^{T}A)^{-1}$ can be well approximated from below and above by the largest and smallest ones of the symmetric tridiagonal matrices generated by the Lanczos process, respectively parlett1998symmetric . With $k\ll n$ , this method outputs a lower bound for $\kappa(A)$ . Since we do not need to estimate $\kappa(A)$ accurately and the Lanczos method generally converges quickly for computing the largest and smallest eigenvalues, we suggest taking a small $k=20$ in practice.

The second approach: when $A$ is a general large matrix, it is unaffordable to apply $(A^{T}A)^{-1}$ . Avron, Druinsky and Toledo avron2013spectral propose a randomized Krylov subspace method to estimate the condition number of a matrix $A$ . In their method, a consistent linear least squares problem, whose solution is generated randomly, is solved iteratively by the LSQR algorithm bjorck1996numerical , and the smallest singular value of $A$ is estimated by $\sigma_{\min}(A)\approx\frac{\|Ae\|}{\|e\|}$ with $e$ being the difference between the approximate solution and the exact one. We refer the reader to avron2013spectral for details.
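It may help to note why an estimate of this form errs on the safe side. For $A$ of full column rank, the smallest singular value is the minimum of the quotient $\|Ae\|/\|e\|$ over all nonzero $e$ , so any particular $e$ gives an upper bound for $\sigma_{\min}(A)$ and hence a lower bound for $\kappa(A)$ . This is an elementary observation added here for orientation, not a statement taken from avron2013spectral :

```latex
\sigma_{\min}(A)=\min_{e\neq 0}\frac{\|Ae\|}{\|e\|}\leq\frac{\|Ae\|}{\|e\|}
\quad\Longrightarrow\quad
\kappa(A)=\frac{\|A\|}{\sigma_{\min}(A)}\geq\frac{\|A\|\,\|e\|}{\|Ae\|}.
```

The estimate is sharp when $e$ is close to the right singular vector of $A$ associated with $\sigma_{\min}(A)$ .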

The third approach: as an alternative to the second approach, one can also perform a $k$ -step Lanczos bidiagonalization type method on $A$ and take the largest and smallest singular values of the resulting small projected matrix as approximations to the largest and smallest singular values of $A$ ; see jia2003implicitly ; jia2010 . We then take their ratio as a rough approximation to $\kappa(A)$ . Again, we take a small $k=20$ in practice. In this way, we can efficiently estimate $\kappa(A)$ .

Having estimated $\kappa(A)$ and $\kappa(B)$ using one of the above approaches, taking the resulting estimates as replacements of $\kappa(A)$ and $\kappa(B)$ , and based on the previous results and analysis, one can make a proper choice of (1.4) and (1.5) according to the following strategy.

•

If $0.5\kappa(B)\leq\kappa(A)\leq 2\kappa(B)$ , which means that $A$ and $B$ are equally well conditioned, then both (1.4) and (1.5) are suitable;

•

If $\kappa(A)>2\kappa(B)$ , which means that $A$ is worse conditioned than $B$ , then (1.4) is adopted;

•

If $\kappa(B)>2\kappa(A)$ , which means that $B$ is worse conditioned than $A$ , then (1.5) is recommended.
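The selection rule can be written as a short function. The sketch below follows the conclusions of Sections 2 and 3, where an ill conditioned $A$ favors (1.4) and an ill conditioned $B$ favors (1.5); the function name, the string return values and the `factor` parameter are illustrative choices, and the inputs are assumed to be rough estimates of $\kappa(A)$ and $\kappa(B)$ obtained by one of the three approaches.

```python
def choose_formulation(kappa_a: float, kappa_b: float, factor: float = 2.0) -> str:
    """Pick a formulation from rough estimates of kappa(A) and kappa(B).

    Returns "(1.4)", "(1.5)" or "either".
    """
    if kappa_a < 1.0 or kappa_b < 1.0:
        raise ValueError("condition numbers are at least 1")
    if kappa_a > factor * kappa_b:
        return "(1.4)"   # A is worse conditioned than B
    if kappa_b > factor * kappa_a:
        return "(1.5)"   # B is worse conditioned than A
    return "either"      # comparable conditioning

print(choose_formulation(1.0e6, 10.0))   # (1.4)
print(choose_formulation(10.0, 1.0e6))   # (1.5)
print(choose_formulation(30.0, 20.0))    # either
```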

5 Numerical experiments

In this section, we report numerical experiments to confirm our theory. We do not aim to develop any algorithms based on (1.4) and (1.5) in this paper. Rather, we simply apply some existing numerically backward stable algorithms to them and compute their generalized eigendecompositions. In the experiments, we use the QZ algorithm, i.e., the Matlab built-in function eig, for the generalized eigenvalue problems (1.4) and (1.5). For each matrix pair $(A,B)$ , we recover all the approximate GSVD components $(\widehat{\alpha},\widehat{\beta},\widehat{u},\widehat{v},\widehat{x})$ and $(\widetilde{\alpha},\widetilde{\beta},\widetilde{u},\widetilde{v},\widetilde{x})$ from the computed eigenpairs of the augmented matrix pairs $(\widehat{A},\widehat{B})$ and $(\widetilde{B},\widetilde{A})$ , respectively, i.e., $(\widehat{\sigma},\widehat{y})$ and $(\frac{1}{\widetilde{\sigma}},\widetilde{z})$ , which are obtained by applying eig to (1.4) and (1.5), respectively. The “exact” GSVD components $(\alpha,\beta,u,v,x)$ are computed by applying the Matlab built-in function gsvd to $(A,B)$ . (Footnote 1: For the right generalized singular vector matrix $X$ in (1.1), gsvd outputs $R=X^{-T}$ in our notation. Hence $X$ is recovered as the transpose of inv( $R$ ), computed with the Matlab built-in function inv.)

We compare the solution accuracy of the GSVD components based on (1.4) and (1.5), and mainly verify three points: (i) if both $A$ and $B$ are well conditioned, then both (1.4) and (1.5) are suitable for computing the GSVD of $(A,B)$ accurately; (ii) if $A$ is ill conditioned and $B$ is well conditioned, then (1.4) is preferable to compute the GSVD accurately; (iii) if $A$ is well conditioned and $B$ is ill conditioned, then (1.5) is a better formulation for computing the GSVD accurately. As mentioned at the beginning of Section 1, the GSVDs of the matrix pairs $(A,B)$ and $(B,A)$ are the same with the generalized singular values being the reciprocals of each other. Under the assumption that at least one of $A$ and $B$ is well conditioned, we can always take one of them to be well conditioned and the other one well conditioned or ill conditioned. Therefore, for definiteness in the experiments, we always take $B$ to be well conditioned but $A$ to be well or ill conditioned. Along the way, we also verify Property 2.5.

All the numerical experiments were performed on an Intel (R) Core (TM) i7-7700 CPU 3.60 GHz with 8 GB RAM, 4 cores and 8 threads, using Matlab R2017a with the machine precision $\epsilon_{\rm mach}=2.22\times 10^{-16}$ under the Microsoft Windows 8 64-bit system.

We measure the accuracy of the computed generalized singular values by their chordal distances from their exact counterparts and measure the accuracy of the computed generalized singular vectors by the sines of the angles between them and their exact counterparts.
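For reference, the chordal distance between two generalized singular values can be computed as in the sketch below, which uses the standard definition of the chordal metric; the function names are illustrative. The second function evaluates the same quantity from the pairs $(\alpha,\beta)$ and $(\widehat{\alpha},\widehat{\beta})$ , which avoids forming very large or very small quotients.

```python
import math

def chordal_distance(sigma: float, sigma_approx: float) -> float:
    """Chordal distance |s - t| / (sqrt(1 + s^2) * sqrt(1 + t^2))."""
    return abs(sigma - sigma_approx) / (
        math.hypot(1.0, sigma) * math.hypot(1.0, sigma_approx)
    )

def chordal_distance_pairs(alpha, beta, alpha_approx, beta_approx):
    """Same distance for sigma = alpha/beta and its approximation, from the pairs."""
    numerator = abs(alpha * beta_approx - beta * alpha_approx)
    return numerator / (math.hypot(alpha, beta) * math.hypot(alpha_approx, beta_approx))

print(chordal_distance(0.75, 1.0))
print(chordal_distance_pairs(0.6, 0.8, 1.0, 1.0))   # same value as above
print(chordal_distance(1.0e8, 2.0e8))               # tiny despite a large absolute gap
```

The chordal measure treats small and large generalized singular values on an equal footing, which is convenient here because $\sigma$ ranges over many orders of magnitude when $A$ or $B$ is ill conditioned.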

Each figure in this section consists of four subfigures: the top left one depicts the accuracy of the computed generalized singular values $\sigma_{i}$ , sorted in descending order; the top right, bottom left and bottom right ones depict the accuracy of the computed right generalized singular vectors $x_{i}$ and the computed left generalized singular vectors $u_{i}$ and $v_{i}$ , respectively.

Experiment 5.1.

We first test three randomly generated problems. For prescribed constants $c_{A}\geq 1$ and $c_{B}\geq 1$ , we generate the random sparse $m\times n$ matrix $A$ and $p\times n$ matrix $B$ by the Matlab commands

[TABLE]

with the density $dens=50\%$ , and $ra=[\tfrac{1}{c_{A}}:\tfrac{1}{n-1}(1-\tfrac{1}{c_{A}}):1]$ and $rb=[\tfrac{1}{c_{B}}:\tfrac{1}{n-1}(1-\tfrac{1}{c_{B}}):1]$ . The largest singular values of such $A$ and $B$ are equal to one, i.e., $\|A\|=\|B\|=1$ , and their condition numbers are $c_{A}$ and $c_{B}$ , respectively. Therefore, by prescribing the values of $c_{A}$ and $c_{B}$ , we control the condition numbers $\kappa(A)$ and $\kappa(B)$ . Table 1 lists the test problems together with their basic properties. Figures 1-3 display the results.
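The prescribed singular value vectors $ra$ and $rb$ are simply $n$ equally spaced points between $1/c$ and $1$ . The following sketch builds such a vector in plain Python, mirroring the colon expression above; the function name is hypothetical, and the construction of the sparse matrices themselves is not reproduced here.

```python
def prescribed_singular_values(n: int, c: float):
    """Return n equally spaced values from 1/c to 1 (requires n >= 2, c >= 1)."""
    if n < 2 or c < 1.0:
        raise ValueError("need n >= 2 and c >= 1")
    lowest = 1.0 / c
    step = (1.0 - lowest) / (n - 1)
    return [lowest + i * step for i in range(n)]

ra = prescribed_singular_values(1000, 1.0e4)
# Largest value is 1 (so the norm is 1) and max/min equals the prescribed c.
print(ra[0], ra[-1], max(ra) / min(ra))
```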

From Table 1 we see that $\kappa(A)=\|A^{\dagger}\|\approx\frac{1}{\sigma_{\min}(A,B)}$ and $\kappa(B)=\|B^{\dagger}\|\approx\sigma_{\max}(A,B)$ , confirming Property 2.5 and the third conclusion near the end of Section 2. We notice that as long as at least one of $A$ and $B$ is well conditioned, so is the stacked matrix $\begin{bmatrix}\begin{smallmatrix}A\\ B\end{smallmatrix}\end{bmatrix}$ .

For problem 1a, both $A$ and $B$ are well conditioned. Figure 1 illustrates that both (1.4) and (1.5) yield equally accurate GSVD components of $(A,B)$ . Apparently, there is no winner between (1.4) and (1.5) for this problem.

For problem 1b, $A$ is moderately ill conditioned and $B$ is well conditioned. As is observed from Figure 2a, the computed generalized singular values based on (1.4) are generally more accurate than those based on (1.5), or at least comparably accurate. Figures 2b-2d show that for most of the generalized singular vectors, (1.4) yields significantly more accurate approximations than (1.5) does. Therefore, (1.4) outperforms (1.5) for this problem.

For problem 1c, where $A$ is quite ill conditioned and $B$ is well conditioned, the advantage of (1.4) over (1.5) is very obvious. As is visually illustrated by Figure 3, for all the GSVD components, (1.4) yields more or even much more accurate approximations than (1.5), and the accuracy is improved by several orders of magnitude. For this problem, (1.4) definitely wins.

For these three problems, we have observed that when both $A$ and $B$ are well conditioned, backward stable algorithms based on the two formulations (1.4) and (1.5) deliver equally accurate approximations to the GSVD components of $(A,B)$ . For the problems where $A$ is ill conditioned and $B$ is well conditioned, (1.4) can produce more and even much more accurate GSVD components than (1.5). Moreover, with $B$ being well conditioned, the worse conditioned $A$ is, the more advantageous (1.4) is over (1.5). As is also observed from Figures 1-3, a suitable choice between (1.4) and (1.5) guarantees that, under the chordal measure, all the generalized singular values $\sigma$ are computed with full accuracy, i.e., at the level of $\epsilon_{\rm mach}$ , which confirms Theorem 2.3 and the subsequent analysis in Section 2.1.

Experiment 5.2.

We test several realistic problems. For each problem, the matrices $A$ and $B$ are normalized from $A_{0}$ and $B_{0}$ , respectively, i.e., $A=\frac{A_{0}}{\|A_{0}\|}$ and $B=\frac{B_{0}}{\|B_{0}\|}$ , where $A_{0}\in\mathbb{R}^{n\times n}$ is a square matrix from the SuiteSparse Matrix Collection davis2011university and

[TABLE]

is the transpose of the $n\times(n+1)$ first order derivative operator in dimension one hansen1998rank . Table 2 lists the test problems together with some of their basic properties, where the names inside the brackets are those of the initial matrices $A_{0}$ , in which “delan12” and “viscopl1” are abbreviations for “delaunay_n12” and “viscoplastic1”, respectively.
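The construction of the test pair can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the sign convention of the difference rows $(-1, 1)$, the use of the spectral norm for $\|\cdot\|$, and the random stand-in for a SuiteSparse matrix $A_{0}$ are all assumptions, and the helper names are hypothetical.

```python
import numpy as np

def first_derivative_operator(n):
    """Return an n x (n+1) first order difference matrix.

    Rows are (-1, 1); the sign convention is an assumption.
    """
    L = np.zeros((n, n + 1))
    idx = np.arange(n)
    L[idx, idx] = -1.0
    L[idx, idx + 1] = 1.0
    return L

def normalize_pair(A0, B0):
    """Scale each matrix to unit spectral norm: A = A0/||A0||, B = B0/||B0||."""
    return A0 / np.linalg.norm(A0, 2), B0 / np.linalg.norm(B0, 2)

n = 5
rng = np.random.default_rng(0)
A0 = rng.standard_normal((n, n))      # stand-in for a SuiteSparse matrix (assumption)
B0 = first_derivative_operator(n).T   # (n+1) x n, full column rank
A, B = normalize_pair(A0, B0)
```

With this scaling both matrices have norm one, so the condition numbers reported for $A$ and $B$ reflect only their smallest singular values.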

We observe from Table 2 that $\kappa(A)\approx\frac{1}{\sigma_{\min}(A,B)}$ holds closely and $\kappa(B)\approx\sigma_{\max}(A,B)$ holds roughly, which is consistent with Property 2.5 and the third conclusion near the end of Section 2.

Table 3 displays some key data that exhibit the advantages of (1.4) over (1.5) in computing the GSVD of $(A,B)$ accurately, where $pct$ denotes the percentage of the computed GSVD components for which (1.4) is more accurate than (1.5), and $acc$ denotes the average difference, in orders of magnitude, between the accuracy of the computed GSVD components based on (1.4) and the accuracy of those based on (1.5), i.e., $acc$ for the generalized singular values $\sigma$ is defined by

[TABLE]

Clearly, the bigger $pct$ and $acc$ are, the more accurate the GSVD components based on (1.4) are, on average, than those based on (1.5). The values $pct\approx 50\%$ and $acc\approx 0$ indicate that, on average, there is little difference, i.e., backward stable eigensolvers applied to the two formulations compute the GSVD with similar accuracy.
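The two measures can be illustrated with a short sketch. The formula used for $acc$ here, the mean of $\log_{10}$ of the error ratio, is only one reading of "average orders of magnitude differences" and is an assumption; the function name and the sample error values are hypothetical.

```python
import math

def pct_and_acc(err_14, err_15):
    """Compare errors of the same GSVD components from (1.4) and (1.5).

    err_14, err_15: sequences of positive errors, component by component.
    Returns (pct, acc): pct is the percentage of components where (1.4)
    is strictly more accurate; acc is the mean of log10(err_15 / err_14),
    an assumed form of the average order-of-magnitude difference.
    """
    if len(err_14) != len(err_15) or len(err_14) == 0:
        raise ValueError("error lists must be non-empty and of equal length")
    k = len(err_14)
    wins = sum(1 for e14, e15 in zip(err_14, err_15) if e14 < e15)
    pct = 100.0 * wins / k
    acc = sum(math.log10(e15 / e14) for e14, e15 in zip(err_14, err_15)) / k
    return pct, acc

# Hypothetical errors for three components.
pct, acc = pct_and_acc([1e-16, 1e-15, 1e-14], [1e-14, 1e-13, 1e-15])
```

In this made-up sample, (1.4) is more accurate for two of the three components and gains about one order of magnitude on average.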

For these six test problems, we have observed phenomena very similar to those in the previous experiment. For problems 2a and 2b, where $A$ and $B$ are equally well conditioned, (1.4) and (1.5) are competitive and there is no obvious winner between them, though (1.5) is slightly better than (1.4). However, for problems 2c-2f, as the matrix $A$ becomes increasingly worse conditioned than $B$, the measures $pct>50\%$ and $acc>0$ increase, with $pct$ approaching $100\%$ and $acc$ growing larger, meaning that more and more GSVD components are computed more accurately, and even much more accurately, based on (1.4) than on (1.5). Therefore, (1.4) outperforms (1.5) for these four problems.

To illustrate the accuracy visually, we depict the results on problems 2d and 2f in Figures 4 and 5, respectively. For problem 2d, the matrix $B$ is well conditioned and $A$ is ill conditioned. We can see from Figure 4 that for the largest $80\%$ of the GSVD components, (1.4) outperforms (1.5) substantially, while for the remaining smallest $20\%$ the two formulations are competitive and yield comparably accurate approximations. In particular, Figure 4 also shows a loss of accuracy of the approximate generalized singular vectors around the $3500$th GSVD component, which is caused by very small relative gaps between the corresponding generalized singular values. For problem 2f, where $B$ is well conditioned and $A$ is even worse conditioned, (1.4) outperforms (1.5) more substantially and the accuracy improvements illustrated by Figure 5 are tremendous. We observe that for almost all (more than $99\%$) of the GSVD components of $(A,B)$, (1.4) yields much more accurate approximations than (1.5) does. In addition, we see from Figure 5 that for the several smallest GSVD components, (1.5) computes the generalized singular values accurately, but the corresponding computed generalized singular vectors have no accuracy at all, while (1.4) works very well. This is not surprising and is in accordance with our comments near the end of Section 2, since $\kappa(A)=\mathcal{O}(\epsilon_{\rm mach}^{-1/2})$.

Finally, for all the test problems, we have observed that, with the suitable formulation chosen and under the chordal measure, the generalized singular values $\sigma$ are always computed with full accuracy, which justifies Theorem 2.3 and the analysis following it in Section 2.1.
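For readers who want to reproduce this kind of accuracy check, the following sketch evaluates the standard chordal distance between an exact and a computed generalized singular value, $|\sigma-\tilde{\sigma}|/\bigl(\sqrt{1+\sigma^{2}}\sqrt{1+\tilde{\sigma}^{2}}\bigr)$. That this is exactly the measure used in the experiments is an assumption, and the function name is hypothetical.

```python
import math

def chordal_distance(sigma, sigma_tilde):
    """Chordal distance between two nonnegative finite numbers.

    Uses |s - t| / (sqrt(1 + s^2) * sqrt(1 + t^2)), the standard
    chordal metric; assumed here to match the paper's measure.
    """
    num = abs(sigma - sigma_tilde)
    den = math.hypot(1.0, sigma) * math.hypot(1.0, sigma_tilde)
    return num / den

d = chordal_distance(2.0, 3.0)          # distance between sigma values
d_inv = chordal_distance(0.5, 1.0 / 3)  # distance between their reciprocals
```

A useful property of this metric is that it is unchanged when both arguments are replaced by their reciprocals, so very large and very small generalized singular values are treated symmetrically.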

Summarizing all the experiments, we conclude that (i) both (1.4) and (1.5) are suitable for problems where both $A$ and $B$ are well conditioned, (ii) (1.4) is preferable for problems where $A$ is ill conditioned and $B$ is well conditioned, and (iii) (1.5) is preferable for problems where $A$ is well conditioned and $B$ is ill conditioned. Therefore, the numerical experiments have fully justified our theory.

6 Conclusions

The GSVD of the matrix pair $(A,B)$ can be formulated as two mathematically equivalent generalized eigenvalue problems of the matrix pairs defined by (1.4) and (1.5), to which a generalized eigensolver can be applied, and the GSVD components are recovered from the computed generalized eigenpairs. However, in numerical computations, the two formulations may behave very differently, and the same generalized eigensolver applied to them may compute GSVD components with quite different accuracy. We have made a detailed sensitivity analysis of the generalized singular values and the generalized singular vectors recovered from the computed eigenpairs of the generalized eigenvalue problems of the matrix pairs defined by (1.4) and (1.5), respectively. The results and analysis have shown that (i) both (1.4) and (1.5) are suitable when both $A$ and $B$ are well conditioned; (ii) (1.4) is preferable when $A$ is ill conditioned and $B$ is well conditioned; (iii) (1.5) is preferable when $A$ is well conditioned and $B$ is ill conditioned. We have also proposed practical strategies for making a suitable choice between (1.4) and (1.5) in computations.
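Conclusions (i)-(iii) can be condensed into a simple decision rule, sketched below. This is only an illustration of the three conclusions and not the choice strategy of Section 4: the function name, the threshold `ratio`, and the handling of the case where both condition numbers are comparable are hypothetical.

```python
def choose_formulation(kappa_A, kappa_B, ratio=10.0):
    """Pick a formulation from condition number estimates of A and B.

    kappa_A, kappa_B: estimates of kappa(A) and kappa(B), each >= 1.
    ratio: hypothetical factor deciding when one matrix counts as
           clearly worse conditioned than the other.
    Returns "(1.4)", "(1.5)", or "either".
    """
    if kappa_A < 1.0 or kappa_B < 1.0 or ratio < 1.0:
        raise ValueError("condition numbers and ratio must be at least 1")
    if kappa_A > ratio * kappa_B:
        return "(1.4)"   # A clearly worse conditioned than B: conclusion (ii)
    if kappa_B > ratio * kappa_A:
        return "(1.5)"   # B clearly worse conditioned than A: conclusion (iii)
    return "either"      # comparable conditioning; covers conclusion (i)

choice = choose_formulation(1e8, 10.0)
```

The conclusions above address the comparable case only when both matrices are well conditioned, so the `"either"` branch should be read with that restriction in mind.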

Illuminating numerical experiments have confirmed our theory and supported our choice strategies on (1.4) and (1.5).

Bibliography

(1) Avron, H., Druinsky, A., Toledo, S.: Spectral condition-number estimation of large sparse matrices. Numer. Linear Algebra Appl. p. e2235 (2019)
(2) Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., van der Vorst, H.A.: Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, Philadelphia (2000)
(3) Betcke, T.: The generalized singular value decomposition and the method of particular solutions. SIAM J. Sci. Comput. 30, 1278–1295 (2008)
(4) Björck, Å.: Numerical Methods for Least Squares Problems. SIAM, Philadelphia (1996)
(5) Chu, K.W.E.: Singular value and generalized singular value decompositions and the solution of linear matrix equations. Linear Algebra Appl. 88, 83–98 (1987)
(6) Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Software 38, 1–25 (2011). Data available online at http://www.cise.ufl.edu/research/sparse/matrices/
(7) Golub, G.H., van Loan, C.F.: Matrix Computations. Johns Hopkins University Press (2012)
(8) Hansen, P.C.: Regularization, GSVD and truncated GSVD. BIT 29, 491–504 (1989)
