The GSVD: Where are the ellipses?, Matrix Trigonometry, and more

Alan Edelman; Yuyang Wang

arXiv:1901.00485·math.NA·November 30, 2020·SIAM J. Matrix Anal. Appl.


TL;DR

This paper develops a geometric and theoretical framework for the Generalized Singular Value Decomposition (GSVD), revealing its natural coordinates, applications, and advantages over traditional eigenproblem approaches, with implications across various scientific fields.

Contribution

It introduces a geometric interpretation of the GSVD, including an ellipse picture and multiaxes, and advocates for its natural application setting over the eigenproblem formulation.

Findings

01

The GSVD provides a natural coordinate system for the Grassmann manifold.

02

The geometric ellipse interpretation clarifies the role of generalized singular vectors.

03

Applications include regularization, genome reconstruction, signal processing, and statistical analysis.

Abstract

This paper provides an advanced mathematical theory of the Generalized Singular Value Decomposition (GSVD) and its applications. We explore the geometry of the GSVD which provides a long sought for ellipse picture which includes a horizontal and a vertical multiaxis. We further propose that the GSVD provides natural coordinates for the Grassmann manifold. This paper proves a theorem showing how the finite generalized singular values do or do not relate to the singular values of $A B^{†}$ . We then turn to the applications arguing that this geometrical theory is natural for understanding existing applications and recognizing opportunities for new applications. In particular the generalized singular vectors play a direct and as natural a mathematical role for certain applications as the singular vectors do for the SVD. In the same way that experts on the SVD often prefer not to cast…

Tables

Table 1: The $C$ and $S$ matrices are naturally simultaneously partitioned into three block columns such that the number of columns $r=(r-r_{b})+(r_{a}+r_{b}-r)+(r-r_{a})$, in left to right order. The row sizes conform to $A$ and $B$, which means that we add rows of zeros to $C,S$ or possibly delete some of the zero cosines/sines to achieve a row count of $m_{1},m_{2}$. The number of non-degenerate angles (not 0 nor $\pi/2$) is the middle number $(r_{a}+r_{b}-r)$.

Property of $C$ and $S$	$C$	$S$
total # columns	$r$
# zero columns in $S$ (left columns):	$r - r_{b} = \#\{c_{i} = 1\} = \#\{s_{i} = 0\}$
# non-zero columns (middle columns):	$r_{a} + r_{b} - r = \#\{0 < c_{i}, s_{i} < 1\}$
# zero columns in $C$ (right columns):	$r - r_{a} = \#\{c_{i} = 0\} = \#\{s_{i} = 1\}$
total # rows	$m_{1}$ = # rows $A$	$m_{2}$ = # rows $B$
# non-zero rows	$r_{a} \leq m_{1}$	$r_{b} \leq m_{2}$
# zero rows	$m_{1} - r_{a}$	$m_{2} - r_{b}$

Table 2: A primer of the properties of GSVD.

$\begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} U C \\ V S \end{bmatrix} H$
$C, S$	$θ$: Principal angle between $\text{span}\left\{\begin{bmatrix} A \\ B \end{bmatrix}\right\}$ and $\text{span}\left\{\begin{bmatrix} I_{n} \\ 0 \end{bmatrix}\right\}$; $\sin θ$: SVD($B H^{†}$); $\cos θ$: SVD($A H^{†}$); $\tan θ$: SVD($B A^{†}$) if $r = r_{a} := \text{rank}(A)$; $\cot θ$: SVD($A B^{†}$) if $r = r_{b} := \text{rank}(B)$
$U$	left singular vectors of $A H^{†}$ ( or $A B^{†}$ if $r = r_{b}$ )
$V$	left singular vectors of $B H^{†}$ ( or $B A^{†}$ if $r = r_{a}$ )

Table 3: The GSVD as portrayed in the documentation of most technical computing languages seems unlikely to inspire the user unfamiliar with the GSVD.

language	GSVD documentation in the corresponding language
matlab (R2018b)	https://www.mathworks.com/help/matlab/ref/gsvd.html [U,V,X,C,S] = gsvd(A,B) returns unitary matrices U and V, a (usually) square matrix X, and nonnegative diagonal matrices C and S so that A = UCX’, B = VSX’, C’C + S’S = I. A and B must have the same number of columns, but may have different numbers of rows. If A is m-by-p and B is n-by-p, then U is m-by-m, V is n-by-n, X is p-by-q, C is m-by-q and S is n-by-q, where q = min(m+n,p). The nonzero elements of S are always on its main diagonal. The nonzero elements of C are on the diagonal diag(C,max(0,q-m)). If m >= q, this is the main diagonal of C.
Mathematica (11.3.0)	https://reference.wolfram.com/language/ref/SingularValueDecomposition.html `>Details and Options`. SingularValueDecomposition[m,a] gives a list of matrices {{u,ua},{w,wa},v} such that m can be written as u.w.Conjugate[Transpose[v]] and a can be written as ua.wa.Conjugate[Transpose[v]].
R (geigen v2.2)	https://www.rdocumentation.org/packages/geigen/versions/2.2/topics/GSVD The matrix $A$ is a $m$ -by- $n$ matrix and the matrix $B$ is a $p$ -by- $n$ matrix. This function decomposes both matrices; if either one is complex than the other matrix is coerced to be complex. The Generalized Singular Value Decomposition of numeric matrices $A$ and $B$ is given as A = UD_1 [0 R]Q’, and B = VD_2[0 R]Q’, where $U$ an $m \times m$ orthogonal matrix $V$ an $p \times p$ orthogonal matrix $Q$ an $n \times n$ orthogonal matrix $R$ an $r$ -by- $r$ upper triangular non singular matrix and the matrix $[0 R]$ is an $r$ -by- $n$ matrix. $D_{1}, D_{2}$ are quasi diagonal matrices and nonnegative and satisfy $D_{1}^{'} D_{1} + D_{2}^{'} D_{2} = I .$ $D_{1}$ is an $m$ -by- $r$ matrix and $D_{2}$ is a $p$ -by- $r$ matrix. For details on this decomposition and the structure of the matrices $D_{1}$ and $D_{2}$ . see http://www.netlib.org/lapack/lug/node36.html.

Table 4: Documentation in Julia 1.4 (and above) with the original pull request https://github.com/JuliaLang/julia/pull/30239 .

language	GSVD documentation in corresponding language
Julia 1.4 (and above)	svd(A, B) -> GeneralizedSVD

Compute the generalized SVD of A and B, returning a GeneralizedSVD factorization object F such that [A;B] = [F.U * F.D1; F.V * F.D2] * F.R0 * F.Q’

* U is a M-by-M orthogonal matrix,
* V is a P-by-P orthogonal matrix,
* Q is a N-by-N orthogonal matrix,
* D1 is a M-by-(K+L) diagonal matrix with 1s in the first K entries,
* D2 is a P-by-(K+L) matrix whose top right L-by-L block is diagonal,
* R0 is a (K+L)-by-N matrix whose rightmost (K+L)-by-(K+L) block is nonsingular upper block triangular,

K+L is the effective numerical rank of the matrix [A; B].

Iterating the decomposition produces the components U, V, Q, D1, D2, and R0.

The generalized SVD is used in applications such as when one wants to compare how much belongs to A vs. how much belongs to B, as in human vs yeast genome, or signal vs noise, or between clusters vs within clusters. (See Edelman and Wang for discussion: https://arxiv.org/abs/1901.00485)

It decomposes [A; B] into [UC; VS]H, where [UC; VS] is a natural orthogonal basis for the column space of [A; B], and H = RQ’ is a natural non-orthogonal basis for the rowspace of [A;B], where the top rows are most closely attributed to the A matrix, and the bottom to the B matrix. The multi-cosine/sine matrices C and S provide a multi-measure of how much A vs how much B, and U and V provide directions in which these are measured.

Equations

$$\begin{bmatrix}a\\ b\end{bmatrix}=\begin{bmatrix}a\\ 0\end{bmatrix}+\begin{bmatrix}0\\ b\end{bmatrix}$$

$$\begin{bmatrix}a\\ b\end{bmatrix}=\begin{bmatrix}uc\\ vs\end{bmatrix}h,$$

$$[a;b]=[a;0]+[0;b].$$

$$\begin{bmatrix}A\\ B\end{bmatrix}=\begin{bmatrix}A\\ 0\end{bmatrix}+\begin{bmatrix}0\\ B\end{bmatrix}.$$

$$\begin{bmatrix}A\\ B\end{bmatrix}=\begin{bmatrix}UC\\ VS\end{bmatrix}H,$$

$$\begin{bmatrix}A\\ B\end{bmatrix}=GH,$$

$$\begin{bmatrix}A\\ B\end{bmatrix}=\sum_{i=1}^{r}\left[i^{\text{th}}\text{ column of }\begin{bmatrix}UC\\ VS\end{bmatrix}\right]\left[i^{\text{th}}\text{ row of }H\right],$$

$$\begin{aligned}\begin{bmatrix}A\\ B\end{bmatrix}&\approx\left(\begin{bmatrix}UC\\ VS\end{bmatrix}I_{r,k}\right)\left(I_{r,k}^{\prime}H\right)\\ &=\sum_{i=1}^{k}\left[i^{\text{th}}\text{ column of }\begin{bmatrix}UC\\ VS\end{bmatrix}\right]\left[i^{\text{th}}\text{ row of }H\right].\end{aligned}$$

$$\cos\theta_{k}\begin{bmatrix}u_{k}\\ 0\end{bmatrix}+\sin\theta_{k}\begin{bmatrix}0\\ v_{k}\end{bmatrix},\quad k=1,2,\dots,n.$$

$$\begin{bmatrix}A\\ B\end{bmatrix}=\underbrace{\begin{bmatrix}UC\\ VS\end{bmatrix}}_{\text{column space as a hyperplane (a canonical basis!)}}\times\underbrace{H}_{\text{coordinates of }[A;B]\text{ in the }[UC;VS]\text{ basis}}$$

$$\text{col}\left(\begin{bmatrix}Y^{\prime}A\\ (Y^{\perp})^{\prime}A\end{bmatrix}\right)=\text{col}(Z^{\prime}A),\text{ and }\text{col}\left(\begin{bmatrix}I_{1}\\ 0\end{bmatrix}\right),$$

$$\text{Energy}(A)=\left\{e\,\|Ae\|^{2}:\|e\|=1\right\}\subset\mathbb{R}^{n},\quad(A\in\mathbb{R}^{m,n})$$

$$\text{Energy}(A,B)=\left\{e\,\frac{\|Ae\|^{2}}{\|Be\|^{2}}:\|e\|=1\right\}\subset\mathbb{R}^{n},\quad(A\in\mathbb{R}^{m_{1},n},B\in\mathbb{R}^{m_{2},n}).$$

$$\left[\sum x_{i}^{2}\right]^{3}=\left[\sum\sigma_{i}^{2}x_{i}^{2}\right]^{2},$$

$$\|x\|^{2}\,\|SHx\|^{4}=\|CHx\|^{4},$$

$$\|x\|^{2}=\frac{\|CHe\|^{4}}{\|SHe\|^{4}},\text{ and }\frac{\|CHx\|}{\|SHx\|}=\frac{\|CHe\|}{\|SHe\|}.$$

$$AB^{\dagger}=(UCH)(VSH)^{\dagger}=UCHH^{\dagger}S^{\dagger}V^{\prime}=U(C/S)V^{\prime},$$

$$Pu_{i}=\begin{cases}u_{i}&\text{if }c_{i}<1\\ 0&\text{if }c_{i}=1.\end{cases}$$

$$A^{\dagger}=V\Sigma^{\dagger}U^{\prime},$$

$$A=U\left[\begin{array}{c|c}I_{r-r_{b}}&0\\ \hline 0&C_{*}\end{array}\right]H,\qquad C_{*}=\operatorname{diag}(c_{r-r_{b}+1},\dots)$$

$$B=V\left[\begin{array}{c|c}0&0\\ \hline 0&S_{*}\end{array}\right]H,\qquad S_{*}=\operatorname{diag}(s_{r-r_{b}+1},\dots)$$

$$B^{\dagger}=H_{*}^{\dagger}S_{*}^{\dagger}V^{\prime}.$$

$$PA=UC_{*}H_{*},$$

$$PAB^{\dagger}=UC_{*}H_{*}H_{*}^{\dagger}S_{*}^{\dagger}V^{\prime}=U\,C_{*}/S_{*}\,V^{\prime}.$$

$$\begin{bmatrix}A_{\epsilon}\\ B_{\epsilon}\end{bmatrix}=\begin{bmatrix}UC(\epsilon)\\ VS(\epsilon)\end{bmatrix}H,$$

$$c_{i}(\epsilon)=\begin{cases}c_{i}&s_{i}>0\\ \cos(\epsilon)&s_{i}=0\end{cases}\text{ and }s_{i}(\epsilon)=\begin{cases}s_{i}&s_{i}>0\\ \sin(\epsilon)&s_{i}=0.\end{cases}$$

$$\text{GSVD}\left(\begin{bmatrix}3&0\\ 0&4\end{bmatrix},\begin{bmatrix}1&1\end{bmatrix}\right)=2.4\text{ and }\infty.$$

$$\text{GSVD}\left(\begin{bmatrix}3&0\\ 0&4\end{bmatrix},\begin{bmatrix}1&1\\ 0&\epsilon\end{bmatrix}\right)=2.4+O(\epsilon^{2})\text{ and }5/\epsilon+O(\epsilon).$$

$$\min_{x}\left\{\|Ax-b\|+\lambda\cdot\|Lx\|\right\}$$

$$x_{\lambda}=\begin{bmatrix}A\\ \lambda L\end{bmatrix}^{\dagger}\begin{bmatrix}b\\ 0\end{bmatrix}.$$


Full text


The GSVD: Where are the ellipses?,

Matrix Trigonometry, and more

Alan Edelman, Department of Mathematics, MIT, Cambridge, MA.

Yuyang Wang, AWS AI Labs, East Palo Alto, CA. Work done prior to joining Amazon.

Abstract

This paper provides an advanced mathematical theory of the Generalized Singular Value Decomposition (GSVD) and its applications. We explore the geometry of the GSVD providing a long sought for picture which includes a horizontal and a vertical multiaxis. We further propose that the GSVD provides natural coordinates for the Grassmann manifold. This paper proves a theorem showing how the finite generalized singular values do or do not relate to the singular values of $AB^{\dagger}$ .

We then turn to applications, arguing that this geometrical theory is natural for understanding existing applications and recognizing opportunities for new applications. In particular the generalized singular vectors play a direct and as natural a mathematical role for certain applications as the singular vectors do for the SVD. In the same way that experts on the SVD often prefer not to cast SVD problems as eigenproblems, we propose that the GSVD, often cast as a generalized eigenproblem, is perhaps best cast in its natural setting.

We illustrate this theoretical approach and the natural multiaxes (with labels from technical domains) in the context of applications where the GSVD arises: Tikhonov regularization (unregularized vs regularized), Genome Reconstruction (humans vs yeast), Signal Processing (signal vs noise), and statistical analysis such as Analysis of variance (ANOVA) and discriminant analysis (between clusters vs within clusters.) With the aid of our ellipse figure, we encourage the labelling of the natural multiaxes in any GSVD problem.

Keywords: GSVD, SVD, ellipse, CS Decomposition, Tikhonov Regularization

AMS subject classifications: 65F22, 15A18, 15A23

1 Introduction

1.1 Prelude

If $a\in\mathbb{R}^{m_{1}}$ and $b\in\mathbb{R}^{m_{2}}$ are two vectors, then the block vector equation in $\mathbb{R}^{m_{1}+m_{2}}$ :

$$\begin{bmatrix}a\\ b\end{bmatrix}=\begin{bmatrix}a\\ 0\end{bmatrix}+\begin{bmatrix}0\\ b\end{bmatrix}$$

may be thought of geometrically as a hypotenuse vector decomposed as the sum of two legs of a right triangle. If $h=\sqrt{\|a\|^{2}+\|b\|^{2}}\neq 0$ is the length of this hypotenuse and $u=a/\|a\|,v=b/\|b\|$ are the unit direction vectors for $a,b$ then we can write

$$\begin{bmatrix}a\\ b\end{bmatrix}=\begin{bmatrix}uc\\ vs\end{bmatrix}h,$$

where $c$ and $s$ are the cosine and sine of the corresponding angles, namely $c=\|a\|/h$ and $s=\|b\|/h$ . This is ordinary planar trigonometry of a right triangle.
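The decomposition above can be checked numerically. The following is a minimal sketch in plain Python; the function name `planar_decomposition` and the sample vectors are illustrative assumptions, not taken from the paper.

```python
import math

def planar_decomposition(a, b):
    """Split the stacked vector [a; b] into [u*c; v*s] * h.

    a and b are lists of floats (the top and bottom blocks).
    Returns (u, v, c, s, h) with c = ||a||/h and s = ||b||/h.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    h = math.hypot(norm_a, norm_b)  # length of the hypotenuse
    if h == 0.0:
        raise ValueError("[a; b] must be non-zero")
    # Unit directions; a zero block gets the zero direction (illustrative choice).
    u = [x / norm_a for x in a] if norm_a > 0 else [0.0] * len(a)
    v = [x / norm_b for x in b] if norm_b > 0 else [0.0] * len(b)
    return u, v, norm_a / h, norm_b / h, h

# Hypothetical sample data: a in R^2, b in R^1.
a, b = [3.0, 4.0], [12.0]
u, v, c, s, h = planar_decomposition(a, b)

# Rebuild [a; b] from the right-triangle pieces [u*c; v*s] * h.
stacked = [ui * c * h for ui in u] + [vi * s * h for vi in v]
```

Here the legs have lengths 5 and 12 and the hypotenuse has length 13, so $c=5/13$ and $s=12/13$, and `stacked` reproduces `a + b` up to rounding.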

For notational convenience, we will sometimes use a semicolon (“;”) to denote the stacking (or vertical concatenation) of vectors and matrices, so that

$$[a;b]=[a;0]+[0;b].$$

We note that $[uc;vs]$ is a unit vector in the direction $[a;b].$ The cotangent $\sigma=c/s$ is a slope which measures whether the vector points primarily in the “$a$” (or top) direction, primarily in the “$b$” (or bottom) direction, or in a mix of the two, according to whether $\sigma$ is large, small, or in between.

The GSVD extends the above ideas to matrices.

1.2 The GSVD

This paper provides a new approach and understanding of the generalized SVD (GSVD) [30, 38, 9] of two matrices $A\in\mathbb{R}^{m_{1},n},B\in\mathbb{R}^{m_{2},n}$ . Generalizing the introductory paragraphs, the GSVD may be understood in the context of a generalized Pythagorean theorem with

$$\begin{bmatrix}A\\ B\end{bmatrix}=\begin{bmatrix}A\\ 0\end{bmatrix}+\begin{bmatrix}0\\ B\end{bmatrix}.$$

We take as our definition of a GSVD, a decomposition of $[A;B]$ with the form

$$\begin{bmatrix}A\\ B\end{bmatrix}=\begin{bmatrix}UC\\ VS\end{bmatrix}H,$$

where $U,V$ are square orthogonal in $\mathbb{R}^{m_{1},m_{1}}$, $\mathbb{R}^{m_{2},m_{2}}$; $C,S$ are 1-diagonal (see Figure 1) such that $C^{\prime}C+S^{\prime}S=I_{r}$, and $H$ has full row rank $r$, where $r$ denotes rank($[A;B]$). The remaining dimensions are implied, namely $C,S$ are in $\mathbb{R}^{m_{1},r}$, $\mathbb{R}^{m_{2},r}$, and $H$ is in $\mathbb{R}^{r,n}$.
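As a concrete instance of this definition, the sketch below checks a GSVD of the pair $A=\begin{bmatrix}3&0\\0&4\end{bmatrix}$, $B=\begin{bmatrix}1&1\end{bmatrix}$, the pair for which the paper reports the generalized singular values 2.4 and $\infty$. The factors `U`, `V`, `C`, `S`, `H` were derived by hand for this illustration and are an assumption of the sketch, as are the helper names; the code only verifies that they satisfy the stated properties.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def close(X, Y, tol=1e-12):
    """Entrywise comparison of two equally sized matrices."""
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# The pair (A, B): m1 = 2, m2 = 1, n = r = 2.
A = [[3.0, 0.0], [0.0, 4.0]]
B = [[1.0, 1.0]]

# Hand-derived factors (an assumption of this sketch, not quoted from the paper).
U = [[0.6, 0.8], [-0.8, 0.6]]          # 2x2 orthogonal
V = [[1.0]]                            # 1x1 orthogonal
C = [[1.0, 0.0], [0.0, 12.0 / 13.0]]   # cosines, m1 x r
S = [[0.0, 5.0 / 13.0]]                # sines, m2 x r
H = [[1.8, -3.2], [2.6, 2.6]]          # r x n, full row rank

I2 = [[1.0, 0.0], [0.0, 1.0]]
CtC = matmul(transpose(C), C)
StS = matmul(transpose(S), S)
pythagoras = [[CtC[i][j] + StS[i][j] for j in range(2)] for i in range(2)]  # C'C + S'S

A_rebuilt = matmul(matmul(U, C), H)    # U C H
B_rebuilt = matmul(matmul(V, S), H)    # V S H
sigma = C[1][1] / S[0][1]              # finite cotangent c_2 / s_2
```

The first column has $c_{1}=1$, $s_{1}=0$ (the infinite generalized singular value), and the second has $c_{2}/s_{2}=(12/13)/(5/13)=2.4$.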

The SVD is so widely used that applications need not be listed. Historically this was not always the case. Fields such as biology, economics, and computer science could be observed learning about the SVD one-by-one with great impact. Perhaps a kind of folklore notion is that the SVD applies any time an array $A$ needs to be quickly compressed to bring the main information out, or whenever $AA^{\prime}$ is lurking. We would love to foster a world where the GSVD finds applications one-by-one in many fields. Perhaps the new folklore is that the GSVD applies when two arrays with a common dimension need to be quickly compressed or whenever two matrices $AA^{\prime}$ and $BB^{\prime}$ are lurking. Of course both the SVD and GSVD underlie more.

Some selected applications of the GSVD include oriented energy analysis [6, 7, 8, 10, 11, 39], (here the GSVD is sometimes called by the more descriptive name QSVD for “quotient” SVD), Tikhonov regularization [21, 14], Linear Discriminant Analysis [31, 24], and more recently in microarray analysis [3]. A review from 1992 and discussion of algorithms may be found in [5].

As a point of mathematical taste, many textbooks today still treat SVDs as a byproduct of exposition on eigenvalues. This is unfortunate, as most of the time considerations of $AA^{\prime}$ or $A^{\prime}\!A$ create unnecessary mathematical baggage best abandoned. The SVD is mature enough to live its own life separate from the symmetric eigenvalue problem. Taking this notion one step further, the GSVD deserves to live separately from generalized eigenvalue problems or the SVD. When a GSVD lurks, it is recommended to abandon old fashioned language and see the true GSVD construction in full mature light. We take this approach in a number of examples in this paper.

1.3 A “GH” decomposition

To clarify and streamline our view of the roles of the pieces of the GSVD, we propose that the GSVD be considered a GH decomposition:

$$\begin{bmatrix}A\\ B\end{bmatrix}=GH,$$

where $G=[UC;VS]$ (for Grassmann or geometric) denotes the information in the $r$ -dimensional hyperplane representing the column space of $[A;B]$ . Specifically the columns of $G$ are a natural orthonormal basis for that hyperplane in $\mathbb{R}^{m_{1}+m_{2}}$ , and the columns of $H$ are the coordinates of the columns of $[A;B]$ in that basis. Of course the $QR$ decomposition of $[A;B]$ has exactly the same properties, with one important difference: the $Q$ is not uniquely defined by the hyperplane, while in the GSVD, the choice is more or less canonical.

We further feel that the factorization into the two matrices $G$ and $H$ emphasizes the outer product rank $r$ form:

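$$\begin{bmatrix}A\\ B\end{bmatrix}=\sum_{i=1}^{r}\left[i^{\text{th}}\text{ column of }\begin{bmatrix}UC\\ VS\end{bmatrix}\right]\left[i^{\text{th}}\text{ row of }H\right],$$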

which can be readily missed in the long form.

In analogy with the SVD or Non-negative Matrix Factorization (NMF) [27], one might consider a simultaneous rank reducing method where only the $k$ rows of $H$ with largest norm are kept.

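In particular, if we multiply $[A;B]$ on the right by $H^{\dagger}I_{r,k}I_{r,k}^{\prime}H$, where $I_{r,k}$ consists of the first $k$ columns of the $r\times r$ identity, we obtain a rank reduced $[A;B]$:

$$\begin{aligned}\begin{bmatrix}A\\ B\end{bmatrix}&\approx\left(\begin{bmatrix}UC\\ VS\end{bmatrix}I_{r,k}\right)\left(I_{r,k}^{\prime}H\right)\\ &=\sum_{i=1}^{k}\left[i^{\text{th}}\text{ column of }\begin{bmatrix}UC\\ VS\end{bmatrix}\right]\left[i^{\text{th}}\text{ row of }H\right].\end{aligned}$$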

We remark that $H^{\dagger}I_{r,k}I_{r,k}^{\prime}H$ is an oblique projector when $H$ is square non-singular, and an orthogonal projector when $H$ is orthogonal.
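The rank reduction and the projector remark can be illustrated on a small case. The sketch below uses $A=\begin{bmatrix}3&0\\0&4\end{bmatrix}$, $B=\begin{bmatrix}1&1\end{bmatrix}$ with hand-derived factors $G=[UC;VS]$ and $H$; these factors, the helper names, and the choice $k=1$ are assumptions made for illustration. Since $H$ is square and non-singular here, $H^{\dagger}=H^{-1}$.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def close(X, Y, tol=1e-12):
    """Entrywise comparison of two equally sized matrices."""
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

AB = [[3.0, 0.0], [0.0, 4.0], [1.0, 1.0]]   # stacked [A; B]
G = [[0.6, 0.8 * 12 / 13],
     [-0.8, 0.6 * 12 / 13],
     [0.0, 5 / 13]]                          # [UC; VS], hand-derived (assumption)
H = [[1.8, -3.2], [2.6, 2.6]]                # coordinates, hand-derived (assumption)

# Closed-form inverse of the 2x2 matrix H.
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
H_inv = [[H[1][1] / det, -H[0][1] / det],
         [-H[1][0] / det, H[0][0] / det]]

k = 1
# E = I_{r,k} I_{r,k}': ones in the first k diagonal entries, zeros elsewhere.
E = [[1.0 if i == j and i < k else 0.0 for j in range(2)] for i in range(2)]
P = matmul(matmul(H_inv, E), H)              # H^{-1} I_{r,k} I_{r,k}' H

reduced = matmul(AB, P)                      # rank reduced [A; B]
# First column of G times first row of H (the k = 1 outer product).
outer = [[G[i][0] * H[0][j] for j in range(2)] for i in range(3)]
```

In this example `P` is idempotent but not symmetric, which is what one expects of an oblique projector, and `reduced` agrees with the single outer product `outer`.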

1.4 More details about $U,V,C,S,H$

The matrices $U,V,C,S,H$ deserve more detailed discussion, as may be found in Appendix A.

To help guide the reader, we offer a table of bases for the fundamental subspaces that appear in the GSVD. It is helpful to keep in mind that the columns of $C$ and $S$ are leftward looking towards the orthogonal $U$ and $V$ matrices in the GSVD factorization, while the rows of $C$ and $S$ are rightward looking towards the full row rank $H$ in the GSVD factorization.

[TABLE]

It is useful to point out that the common nullspace of $A$ and $B$ is killed by $H$, i.e., if $Ax=0$ and $Bx=0$ then $Hx=0$. A vector $x$ that is in only one of the nullspaces is not killed by $H$, but $Hx$ is killed by the zero columns of $C$ or $S$, respectively.

Let $r_{a}=\text{rank}(A),r_{b}=\text{rank}(B),r=\text{rank}[A;B]$ . Table 1 shows the structure of $C$ and $S$ . A very common case has $r=n$ in which case the sizes of $C,S$ match that of $A,B$ .
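The column counts in Table 1 follow directly from the three ranks. A minimal sketch, with the function name `cs_block_columns` chosen here for illustration:

```python
def cs_block_columns(r, r_a, r_b):
    """Return the (left, middle, right) column counts of C and S.

    left   = r - r_b        columns with c_i = 1, s_i = 0
    middle = r_a + r_b - r  columns with 0 < c_i, s_i < 1
    right  = r - r_a        columns with c_i = 0, s_i = 1
    """
    # rank([A; B]) is at least each individual rank and at most their sum.
    if not (max(r_a, r_b) <= r <= r_a + r_b):
        raise ValueError("ranks must satisfy max(r_a, r_b) <= r <= r_a + r_b")
    left, middle, right = r - r_b, r_a + r_b - r, r - r_a
    assert left + middle + right == r
    return left, middle, right

# Hypothetical example: a 2x2 full-rank A stacked on a single non-zero row B.
counts = cs_block_columns(2, 2, 1)
```

For $r=2$, $r_{a}=2$, $r_{b}=1$ this gives one column with $c_{i}=1$, one non-degenerate column, and no column with $c_{i}=0$.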

1.5 Summary

This paper contains a number of insights and results about the GSVD:

• We present an ellipse picture of the GSVD, which requires four dimensions to get a good feel for the general case (Section 2).

• The GSVD generalizes planar trigonometry to matrix trigonometry (Section 3).

• We consider $[UC;VS]$ as natural coordinates for $r$ dimensional hyperplanes (the Grassmann manifold) in $\mathbb{R}^{m}$ given that $m=m_{1}+m_{2}$. We use the Grassmann manifold coordinates to clarify the link between the CS decomposition and the GSVD (other authors have observed vaguely that they are closely related). We view the $H$ matrix as the change of coordinates from canonical coordinates $[UC;VS]$ to the specifics of $[A;B]$ (Section 4).

• We discuss the link between the GSVD and the principal angles between subspaces (Section 5), and related “energy portraits” (Section 6).

• We prove a theorem relating GSVD $(A,B)$ and SVD $(AB^{\dagger})$. They are not generally identical (Section 7).

• We revisit applications in the geometric context, and interpret the GSVD as a multi-dimensional slope and connect applications (Section 8).

Notation

For $i=1,\dots,r$ , let $u_{i}$ denote the normalized $i$ -th column of $UC$ if $c_{i}\neq 0$ , or else define $u_{i}=0$ . Similarly, let $v_{i}$ denote the normalized $i$ -th column of $VS$ if $s_{i}\neq 0$ , or else define $v_{i}=0$ . This notation conveniently avoids issues of different sizes and conventions. For example, $U$ or $V$ may have fewer than $r$ columns. Details of the placement of the $c_{i}$ and $s_{i}$ appear in Figure 1. Suffice it to say for now that $u_{i}$ is the $i$ -th column of the $U$ matrix when $c_{i}>0$ , and $v_{i}$ may be found in the $k$ -th column of the $V$ matrix when $S_{ki}=s_{i}>0$ . The indirection in $V$ is admittedly unfortunate, but in all cases, the non-zero $v_{i}$ by convention are left to right contiguous columns of $V$ that may either start from the left, or end at the right, but in many situations $v_{i}$ is not in the $i$ -th column. We use $A^{\dagger}$ to denote the pseudo-inverse of $A.$ The “slash” and “backslash” are defined as $A\backslash B:=A^{\dagger}B,$ and $A/B:=AB^{\dagger}.$ We also overload the notation $\text{GSVD}(A,B)$ to denote the generalized singular values of $(A,B),$ while $\text{SVD}(A)$ means the singular values of $A$ .

2 Where are The Ellipses?

The SVD ellipse picture for a matrix $A$ (Figure 2) is a very familiar visual for the action of $A$ on the unit ball. We are not aware of any ellipse pictures in the literature nor even a notion that a natural ellipse picture exists for the GSVD or even the CSD (CS Decomposition) [19]. We believe that the lack of a geometric view of the GSVD is part of the reason that the GSVD is not as widely understood or as widely used as it should be.

Regarding an ellipse picture, one might blame some sort of human inability to perceive higher dimensions as a complication, but we show that this is not really the case in Figure 3.

The gap in understanding is underscored by the curiosity expressed online, but without answer, on such sites as MATLAB Central [13] (reproduced here; the authors contacted Mr. Dyas on December 26, 2019 to inform him of the solution of his twenty-year query) and a similar request on the question-and-answer site Quora [34] (not reproduced here).

Subject: Generalized SVD geometry? From: Bob Dyas Date: 29 Feb, 2000 15:31:31

Message: 1 of 1 $\longleftarrow$ indicates no answer in 20 years!

Is there a geometric interpretation of the generalized singular value decomposition? I’m looking for something comparable to the geometry associated with the standard SVD. I understand how U, V and the singular values of the SVD relate to the geometry of the input matrix but I don’t have an intuitive feel for how U, V, X and the generalized singular values relate to the geometry of the two input matrices of the GSVD.

Any help would be appreciated.

Bob Dyas

2.1 Understanding the Ellipse Picture for the GSVD

Figure 3, portrayed in four dimensional space, generically serves to illustrate the GSVD in any dimensions.

Given $A\in\mathbb{R}^{m_{1},n},B\in\mathbb{R}^{m_{2},n}$, we consider the unit sphere (shown in exploded form in Figure 3 as a red circle) in the span of $[A;B]$ (shown as a red plane). In blue and green we have the ellipses that are the “downward” and “leftward” projections of this sphere onto the multiaxes $X$ and $Y$, defined as those vectors for which only the first $m_{1}$ or only the last $m_{2}$ coordinates may be non-zero. (For example, if $m_{1}=m_{2}=2$ in $\mathbb{R}^{4}$, then the $X$ multiaxis consists of vectors of the form $(x_{1},x_{2},0,0)$ and the $Y$ multiaxis consists of vectors of the form $(0,0,x_{3},x_{4})$.)

The $u_{i},v_{i}$ are semi-axes of these ellipses, with lengths $c_{i},s_{i}$ . The vector $[u_{i}c_{i};v_{i}s_{i}]$ is on the (red) unit sphere in the span of $[A;B]$ .

Since we have the equality $[A;B]x=[UC;VS]Hx$ , we see that $H$ is the change of coordinates from the columns of $[A;B]$ to the orthonormal columns of $[UC;VS]$ , and $H^{\dagger}$ goes the other way.

2.2 An in depth look at small dimensional special cases

2.2.1 A red line in $\mathbb{R}^{2}$ , $X$ =the $x$ -axis, $Y$ =the $y$ -axis

$(m_{1}=m_{2}=n=r=1)$

Below we show the possibilities for $[C;S]$ for a line in $\mathbb{R}^{2}$ (drawn in red as the span of $[a;b]$, where $a,b\in\mathbb{R}^{1}$), which may be horizontal ($a\neq 0,b=0$), in general position ($a\neq 0,b\neq 0$), or vertical ($a=0,b\neq 0$). In any event $c$ and $s$ are the cosine and sine of the angle with the horizontal.

2.2.2 A red line in $\mathbb{R}^{3}$ , $X$ =the $xy$ -plane, $Y$ =the $z$ -axis

( $m_{1}=2,m_{2}=n=r=1$ ) Below we show the possibilities for $[C;S]$ for a line in $\mathbb{R}^{3}$ (drawn in red as the span of $[a;b]$, where $a\in\mathbb{R}^{2}$, $b\in\mathbb{R}^{1}$). The $X$ multiaxis is traditionally labeled the $xy$-plane, and the $Y$ multiaxis is the $z$-axis. A line can be in the $xy$-plane, in general position, or along the $z$-axis. The corresponding $[C;S]$ matrix is illustrated. The $c$ is the cosine of the angle between the red line and the $xy$-plane, while the $s$ is the cosine of the angle between the red line and the $z$-axis.

2.2.3 A red line in $\mathbb{R}^{3}$ , $X$ = $x$ -axis, $Y$ =the $yz$ -plane

( $m_{1}=1,m_{2}=2,n=r=1$ ) Below we show the possibilities for $[C;S]$ for a line in $\mathbb{R}^{3}$ (drawn in red as the span of $[a;b]$, where $a\in\mathbb{R}^{1}$, $b\in\mathbb{R}^{2}$). A line can be along the $x$-axis, in general position, or in the $yz$-plane. The corresponding $[C;S]$ matrix is illustrated. The $c$ is the cosine of the angle between the red line and the $x$-axis, while the $s$ is the cosine of the angle between the red line and the $yz$-plane. The shaded $Y$ = $yz$-plane indicates the red line is in that plane.

2.2.4 A red plane in $\mathbb{R}^{3}$ , $X$ =the $xy$ -plane, $Y$ =the $z$ -axis

( $m_{1}=2,m_{2}=1,n=r=2$ ) Below we show the possibilities for $[C;S]$ for a plane in $\mathbb{R}^{3}$ (drawn in red as the span of $[A;B]$, where $A\in\mathbb{R}^{2,2}$, $B\in\mathbb{R}^{1,2}$). A plane can be the $xy$-plane. A plane in general position in $\mathbb{R}^{3}$ intersects the $xy$-plane in a line (shown as a dashed red line) but does not include the $z$-axis. A final possibility for a plane is that it includes the $z$-axis (broken red/green line).

The corresponding $[C;S]$ matrix is illustrated. We have $c_{1}=1$, corresponding to the 0 degree angle between the $xy$-plane and the line in the red plane that lies in it. We have $c_{2}$, which is the cosine of the angle between the $xy$-plane and the line in the red plane at right angles to the aforementioned line. Note that $s_{1}=0$ is not found in the $S$ matrix, since there is room for only one row, which contains $s_{2}$.

Figure 4 below is the ellipse picture in 3 dimensions (3d), which admittedly has too few dimensions to understand the general picture. Nevertheless, one can clearly see the unit circle in the sphere being projected down to an ellipse in the $xy$-plane. We see $c_{1}=1$ and $c_{2}=\cos\theta$ as the lengths of the semi-axes of the ellipse. The $u_{1}$ direction is where the plane representing $\text{span}([A;B])$ intersects the $xy$-plane. The $u_{2}$ direction is orthogonal to $u_{1}$ and also in the $\text{span}([A;B])$ plane. The $u_{2}$ direction is the maximum slope off the $xy$-plane, and $s_{2}=\sin\theta$ is the length of the projection of the unit circle onto the $z$-axis. The orthogonal direction projects to $0$, giving $s_{1}=0$.

2.3 On infinite generalized singular values and horizontal directions

As may become clear upon inspection of the small dimensional cases, it is very possible that we have some $c_{i}=1$ and $s_{i}=0$ so that the generalized singular value $c_{i}/s_{i}$ is infinite. These infinite singular values are associated with horizontal directions $[u_{i};0]$ in the “red” hyperplane, i.e. $[u_{i};0]\in\text{span}([A;B])$. They arise when our hyperplane intersects our $X$ multiaxis in any non-zero direction.

The situation in Section 2.2.4 illustrates that this is typical when we consider a plane in $\mathbb{R}^{3}$ and $X$ is the $xy$-plane. ( $A$ is $2\times 2$ and $B$ is $1\times 2$.) The unit circle in the plane has a vector of length 1, $[u_{1};0]$, that lives on the horizontal $xy$-plane. The orthogonal direction, $[c_{2}u_{2};s_{2}]$, has a projection $[c_{2}u_{2};0]$ on the $xy$-plane that is generically shorter than a unit vector, but still orthogonal to $[u_{1};0]$.
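A horizontal direction can be exhibited directly: any $x$ with $Bx=0$ and $Ax\neq 0$ gives a vector $[Ax;0]$ in the span of $[A;B]$. The sketch below does this for the illustrative pair $A=\begin{bmatrix}3&0\\0&4\end{bmatrix}$, $B=\begin{bmatrix}1&1\end{bmatrix}$; the variable names are assumptions of this sketch.

```python
import math

A = [[3.0, 0.0], [0.0, 4.0]]
B = [[1.0, 1.0]]

# B is a single row (b1, b2), so (b2, -b1) spans its nullspace.
x = [B[0][1], -B[0][0]]

top = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]  # A x
bottom = sum(B[0][j] * x[j] for j in range(2))                   # B x, zero by construction

norm_top = math.hypot(top[0], top[1])
h = math.hypot(norm_top, abs(bottom))   # length of the hypotenuse [A x; B x]
c1, s1 = norm_top / h, abs(bottom) / h  # cosine and sine for this direction
u1 = [t / norm_top for t in top]        # unit horizontal direction, as in [u1; 0]
```

With these inputs $Ax=(3,-4)$, so $u_{1}=(0.6,-0.8)$, $c_{1}=1$, and $s_{1}=0$: the ratio $c_{1}/s_{1}$ is the infinite generalized singular value, and $[u_{1};0]$ is where the red plane meets the $xy$-plane.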

3 Matrix Trigonometry

We claim that the GSVD is the natural generalization of high school trigonometry to what we might call “matrix trigonometry.”

There is so much in Figure 5 that we are all familiar with in the planar case: There is all of trigonometry, and in particular there is $\tan\theta$ which has a special role because $B/A$ is the slope of the line. If $|B|$ is small relative to $|A|,$ we have a shallow slope, and vice versa. The only hint that there is some directionality is the possibility of a $\pm$ sign. To specify directions we sometimes would write a hypotenuse vector in component form: $A\mathbf{i}+B\mathbf{j}$. If we take the components of a unit vector in the direction of the hypotenuse, then the components form a cosine-sine pair: $\cos\theta\,\mathbf{i}+\sin\theta\,\mathbf{j}$.

The ideas of trigonometry, slope, component form and cosine-sine pairs extend to higher dimensions through the GSVD. Instead of one triangle, there are $n$ triangles. Instead of one vector i, there are $n$ vectors in the columns of $U$ . Instead of one vector j, there are $n$ vectors in the columns of $V$ . Instead of a unit length hypotenuse there are $n$ unit length hypotenuses, which can be written in the component form

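$$\cos\theta_{k}\begin{bmatrix}u_{k}\\ 0\end{bmatrix}+\sin\theta_{k}\begin{bmatrix}0\\ v_{k}\end{bmatrix},\quad k=1,2,\dots,n.$$

The $n$ hypotenuses, as we show in Figure 3, live on a unit sphere that projects nicely “down”ward and “left”ward. The $\cos\theta_{k}u_{k}$ are semi-axes of the downward ellipse, and the $\sin\theta_{k}v_{k}$ are semi-axes of the leftward ellipse.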

Just as $b/a$ tells you how small or big $b$ is relative to $a$ , the GSVD tells you how small or big $B$ is relative to $A$ , but now it is in $n$ natural directions. Thus $B$ can be larger than $A$ in some directions, and smaller in others.
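The following Julia sketch illustrates the $n$ unit hypotenuses on a small example. It is not taken from the paper: it assumes $[A;B]$ has full column rank and builds the factors from a QR factorization followed by an SVD of the top block, which is one standard route to a CS-type decomposition. The matrices and the names `Q1`, `Q2`, `hyp` are illustrative choices.

```julia
using LinearAlgebra

A = [1.0 2.0; 0.0 1.0; 1.0 0.0]      # m1 x n with m1 = 3, n = 2
B = [1.0 1.0; 0.0 2.0]               # m2 x n with m2 = 2

F  = qr([A; B])
Q  = Matrix(F.Q)                     # thin Q: orthonormal basis of span([A; B])
Q1 = Q[1:3, :]                       # top block (the "A" rows)
Q2 = Q[4:5, :]                       # bottom block (the "B" rows)

S1 = svd(Q1)                         # Q1 = U * Diagonal(c) * W'
c  = S1.S                            # cosines cos(theta_k)
VS = Q2 * S1.V                       # column k is sin(theta_k) * v_k
s  = [norm(VS[:, k]) for k in 1:2]   # sines sin(theta_k)
H  = S1.Vt * F.R                     # basis converter: [A; B] = [U*C; V*S] * H

hyp = Q * S1.V                       # column k is the hypotenuse [c_k u_k; s_k v_k]
cotangents = c ./ s                  # how large A is relative to B in each direction
```

Each column of `hyp` has unit length, so `c[k]^2 + s[k]^2` equals 1 up to rounding, and `hyp * H` reproduces `[A; B]`.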

There is some temptation to try to say that the GSVD is related to the principal angles of the column space of $A$ and the column space of $B$ . This of course makes no more sense than looking for anything other than right angles between the $x$ -axis and the $y$ -axis in 2d. The interesting angles are between the span of the column space of $[A;B]$ and the canonical axes $[I_{1};0]$ . More details can be found in Section 5.

One quick algebraic way to define the singular values of an $m\times n$ matrix $A$ is to find the diagonal matrix with non-negative entries in the set $\{UAV^{\prime}\}$ where $U$ is $m$ by $m$ orthogonal and $V$ is $n$ by $n$ orthogonal. This is the equivalence class representative definition. Similarly, one can define the generalized singular values of a pair of matrices $(A,B)$ with the same number of columns. The “cosine-sine” format is the pair of (1-)diagonal matrices $(C,S)$ with non-negative entries in the set of matrix pairs $\{(UAH^{-1},VBH^{-1}):U,V\mbox{ orthogonal},H\mbox{ non-singular}\}$ . Often the GSVD is given in “cotangent” format, which is the ratio of cosines to sines.

We summarize the GSVD properties with Table 2.

4 The relationship between the GSVD and the CS Decomposition

It is often written [19, Section 8.7.5] that the GSVD and the CS Decomposition are closely related. The geometric viewpoint highlights the GSVD and the CS decomposition as rooted in representations of points in the Grassmann manifold (linear hyperplanes through the origin) in an $m=m_{1}+m_{2}$ dimensional space using $[UC;VS]$ as natural coordinates.

The simple notion is that the information may be thought of as

[TABLE]

This connection is rooted ultimately in the Cartan decomposition of the Grassmann manifold, one of the finitely many classes of symmetric spaces [22]. The idea is that certain matrix spaces have a “KAK” or compact/abelian/compact decomposition. The SVD is one example as it is orthogonal/diagonal/orthogonal. The CS decomposition is another. This observation may be found in a numerical linear algebra conference presentation [15] and in the quantum computing literature [37].

To be sure, if $[A;B]$ is already orthogonal then so is $H$ . This constitutes the “left half” of the complete CS decomposition. Thus a GSVD is a “left half” of a CS when $[A;B]$ is orthogonal, and the “left half” of a CS is a GSVD. One can also have a basis for the orthogonal complement of span( $[A;B]$ ) to get the “right half.” This captures the isomorphism between the Grassmann manifold $\mathcal{G}_{m,n}$ (i.e., $n$ -dimensional subspaces in $\mathbb{R}^{m}$ ) and $\mathcal{G}_{m,m-n}$ (i.e., $(m-n)$ -dimensional subspaces in $\mathbb{R}^{m}$ ). Thus if one takes the combined SVD’s of orthogonal matrices whose spans are orthogonal complements, one has the CS decomposition and vice versa.

Any which way, the mathematical idea underlying all of this is that there is a fairly canonical representation for generic elements of the Grassmann manifold, together with a matrix connecting back to an orthogonal or arbitrary basis. There is a further symmetry property when the span of $[A;B]$ and its orthogonal complement are taken in conjunction: transposing a full orthogonal matrix reverses the roles of canonical coordinates and basis converter.

Parameter Count

There has been a longstanding tradition in numerical linear algebra to overwrite matrix inputs with the parameters from the factored form. Thus if $A$ is $n\times n$ , the $LU$ factorization has the $n(n-1)/2$ parameters from $L$ and the $n(n+1)/2$ parameters from $U$ . Similarly if $A=QR$ , the $Q$ while appearing naively as an $n\times n$ matrix, actually only has $n(n-1)/2$ parameters, which is exactly what is computed in software [4].

Given an $m\times n$ matrix $[A;B]$ of rank $r$ , and a decomposition of $m$ as $m=m_{1}+m_{2}$ , we can count parameters on both the left and right sides of $[A;B]=[UC;VS]H.$ While tricky, the only facts needed are:

1. Rank Codimension: The codimension of the rank $r$ matrices of size $m\times n$ is $(m-r)(n-r)$ [12, Lemma 3.3].

2. Stiefel Manifold Dimension: The dimension of the Stiefel manifold $\mathcal{V}_{m,n}$ of $n$ ordered orthonormal directions in $\mathbb{R}^{m}$ is $n(m-n)+n(n-1)/2$ [17, Section 2.2].

3. Grassmann Manifold Dimension: The dimension of the Grassmann manifold $\mathcal{G}_{m,n}$ of $n$ -dimensional subspaces in $\mathbb{R}^{m}$ is $n(m-n)$ [17, Section 2.5].
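As a small sanity check on these facts (a standard observation, not a step from the text): an orthonormal $n$ -frame determines an $n$ -dimensional subspace together with a choice of orthonormal basis inside it, which is consistent with

$$\dim\mathcal{V}_{m,n}=\underbrace{n(m-n)}_{\dim\mathcal{G}_{m,n}}+\underbrace{n(n-1)/2}_{\dim O(n)}.$$

Likewise, the $LU$ parameter counts quoted above add up to the $n^{2}$ entries of $A$ :

$$\frac{n(n-1)}{2}+\frac{n(n+1)}{2}=n^{2}.$$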

[TABLE]

To understand the parameter count, we begin with the simple observation that $r_{a}=\min(r,m_{1})$ generically and $r_{b}=\min(r,m_{2})$ , from which we can derive the number of $\theta_{i}$ that are strictly between $0$ and $\pi/2$ as $r_{a}+r_{b}-r$ . The relevant Stiefel manifolds are $\mathcal{V}_{m_{1},r_{a}+r_{b}-r}$ and $\mathcal{V}_{m_{2},r_{a}+r_{b}-r}$ . These correspond exactly to choosing the directions for the axes of the ellipses. Also one must consider $\mathcal{G}_{m_{i}-(r_{a}+r_{b}-r),r-r_{a}}$ for $i=1,2$ as this is the dimension divide between the $0$ degree angles and the $\pi/2$ angles when this has content. This data is summarized below:

[TABLE]

We remark that further fine grain detailed parameter counts are possible including lower rank $A$ and $B$ , but we content ourselves with the table above.

5 Principal angles between subspaces

Section 3 points out that the GSVD of $A$ and $B$ does not contain angle information between the column spaces of $A$ and $B$ . Rather, Figure 3 illustrates that the relevant angles are between the “red space” ( $\text{col}([A;B])$ ) and the “blue space” ( $\text{col}([I_{1};0])$ ).

This suggests that the GSVD can be used to compute principal angles (see Section 6.4.3. of [19]) between the column spaces of $A$ and $B$ when $m_{1}=m_{2}.$ More precisely, it can be accomplished by letting $Z=[Y|Y^{\perp}]$ be any orthogonal matrix where $\text{col}(Y)=\text{col}(B).$ It follows that $\text{GSVD}(Y^{\prime}A,(Y^{\perp})^{\prime}A)$ are the cotangents of the desired principal angles.

This may be seen geometrically as the GSVD computes the cotangents of angles between

$$\text{col}\big([Y^{\prime}A;(Y^{\perp})^{\prime}A]\big)\quad\text{and}\quad\text{col}([I_{1};0]),$$

but we can multiply by the orthogonal matrix $Z$ , which preserves angles, obtaining the angles between $\text{col}(A)$ and $\text{col}(Z[I_{1};0])=\text{col}(B).$

We can conclude that we have a rotated Figure 3 (shown in Figure 10) where the $X$ and $Y$ multiaxes are replaced with $\text{span}(Y)$ and $\text{span}(Y^{\perp}).$
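A minimal Julia sketch of this recipe follows, under assumptions that are ours rather than the paper's: the example matrices are arbitrary, $(Y^{\perp})^{\prime}A$ is square and invertible so that the generalized singular values can be read off as the singular values of $Y^{\prime}A\,((Y^{\perp})^{\prime}A)^{-1}$ (the situation covered by Corollary 7.3), and the reference angles come from the usual cosine formula with orthonormal bases.

```julia
using LinearAlgebra

A = [1.0 0.0; 0.0 1.0; 1.0 1.0; 0.0 2.0]
B = [1.0 0.0; 0.0 1.0; 0.0 0.0; 0.0 0.0]

Y     = Matrix(qr(B).Q)              # orthonormal basis of col(B)
Yperp = nullspace(Matrix(B'))        # orthonormal basis of the orthogonal complement

G1 = Y' * A                          # plays the role of the "top" matrix
G2 = Yperp' * A                      # plays the role of the "bottom" matrix

cotangents  = svdvals(G1 / G2)       # generalized singular values of (G1, G2)
angles_gsvd = acot.(cotangents)      # principal angles, ascending

# Reference: cosines of principal angles from orthonormal bases
QA = Matrix(qr(A).Q)
cosines    = svdvals(Y' * QA)
angles_ref = acos.(min.(cosines, 1.0))
```

Both `angles_gsvd` and `angles_ref` list the principal angles between $\text{col}(A)$ and $\text{col}(B)$ in increasing order.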

6 The Lemniscate Plots from Leuven, Belgium

In a series of early papers most of which date back to the 1980s [6, 7, 8, 10, 11, 39], energy portraits that relate to the SVD and GSVD of a matrix or a pair of matrices are discussed with applications.

The definition of an energy portrait of a single matrix is

$$\mbox{Energy}(A)=\{\,e\,\|Ae\|^{2}:\ \|e\|=1\,\}$$

and for a pair of matrices with the same number of columns

$$\mbox{Energy}(A,B)=\{\,e\,\|Ae\|^{2}/\|Be\|^{2}:\ \|e\|=1\,\}.$$

It is important to point out that the curves in Figure 6 are not ellipses but rather lemniscate-like portraits. They do not even live in the same spaces as the ellipse pictures. The standard SVD ellipse lives in $\mathbb{R}^{m}$ and the GSVD picture in this paper lives in $\mathbb{R}^{m_{1}+m_{2}}$ . By contrast, the energy portraits from Leuven live in $\mathbb{R}^{n}$ .

We provide the Julia code that produces these curves as a reference. Readers are encouraged to try other matrices.

```julia
using Plots

A = [.577699 -.224144; 1.190069 .836516]   # Figure 6 (Left)
e(theta) = [cos(theta), sin(theta)]
r1(theta) = sum(abs2, A * e(theta))
r2(theta) = sum(abs2, A' * e(theta))
theta = pi * (0:.01:2)
plot( theta, r1.(theta), proj=:polar, label="SVD Energy(A)")
plot!(theta, r2.(theta), proj=:polar, label="SVD Energy(A')")
```

```julia
A = [.27 .66; -1.4 1.3]                    # Figure 6 (Right)
B = [1 0; -.5 1.1]
e(theta) = [cos(theta), sin(theta)]
r1(theta) = sum(abs2, A * e(theta))
r2(theta) = sum(abs2, B * e(theta))
theta = pi * (0:.01:2)
plot(theta, r1.(theta) ./ r2.(theta), proj=:polar, label="GSVD Energy(A,B)")
```

For completeness, we thought we would take a closer look at these older plots. To explain in what sense the curves are lemniscates, it is best to eliminate the “e” in the definition and rewrite the energy plots as the zero set of an algebraic equation, thereby connecting the portraits to the field of algebraic geometry.

Theorem 6.1.

If $Vx\ \in\mbox{Energy}(A)$ , then $x$ satisfies the algebraic polynomial equation

$$\Big(\sum_{i=1}^{n}x_{i}^{2}\Big)^{3}=\Big(\sum_{i=1}^{n}\sigma_{i}^{2}x_{i}^{2}\Big)^{2},$$

where $A=U\Sigma V^{\prime}$ . Further if $x\in\mbox{Energy}(A,B)$ , then $x$ satisfies the algebraic polynomial equation

[TABLE]

where $[A;B]=[UC;VS]H$ .

Before proving the theorem we provide a historical analog. We might compare the solution set of $(\sum_{i=1}^{n}x_{i}^{2})^{3}=(\sum_{i=1}^{n}\sigma_{i}^{2}x_{i}^{2})^{2},$ with that of $(\sum_{i=1}^{2}x_{i}^{2})^{2}=(\sum_{i=1}^{2}\sigma_{i}^{2}x_{i}^{2}),$ which is the lemniscate of Booth, whose study traces back to the 5th century Greek philosopher Proclus. The difference is that Booth specialized to $n=2$ and took lower powers of the two quantities (2 and 1 in place of 3 and 2), but in spirit it is a similar algebraic polynomial equation.

Proof 6.2.

Taking $e=Vy$ , we see that $e\|Ae\|^{2}=Vy\|\Sigma y\|^{2}=Vx$ where $x=y\|\Sigma y\|^{2}.$ It is straightforward to check $\|x\|^{6}=\|\Sigma x\|^{4}=\|\Sigma y\|^{12},$ since $\|y\|=1$ which is exactly the result for a single matrix.
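The single-matrix identity can be checked numerically. The Julia sketch below is an illustration rather than part of the proof: it takes $\mbox{Energy}(A)$ to be the set of points $e\|Ae\|^{2}$ with $\|e\|=1$ , as in the proof and in the plotting code, reuses the left matrix of Figure 6, and evaluates $\|x\|^{6}-\|\Sigma x\|^{4}$ along the curve. The name `residual` is ours.

```julia
using LinearAlgebra

A = [.577699 -.224144; 1.190069 .836516]
F = svd(A)                                  # A = U * Diagonal(sigma) * V'
e(theta) = [cos(theta), sin(theta)]

function residual(theta)
    p = e(theta) * sum(abs2, A * e(theta))  # a point of Energy(A)
    x = F.Vt * p                            # coordinates x with p = V * x
    return norm(x)^6 - norm(F.S .* x)^4     # should vanish by Theorem 6.1
end

maxres = maximum(abs, residual.(0:0.01:2pi))
```

The value `maxres` should be at the level of floating-point rounding.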

For the two matrix case, where $A=UCH$ and $B=VSH$ , if $x=e\|Ae\|^{2}/\|Be\|^{2}$ , then

[TABLE]

7 On the $\text{GSVD}(A,B)$ and the $\text{SVD}(AB^{\dagger})$

In this section we relate the finite part (nonzero, noninfinite) of the generalized singular values of $(A,B)$ (denoted as $\text{GSVD}(A,B)$ ) to the singular values of $AB^{\dagger}$ (denoted as $\text{SVD}(AB^{\dagger})$ ) where $B^{\dagger}$ is the pseudoinverse of $B$ . We may use the notation $A/B$ for $AB^{\dagger}$ . An issue arises that may surprise some readers.

7.1 Why is there an issue?

One may expect that there may always be a relation between the GSVD of $A,B$ and the SVD of $AB^{\dagger}$ . For example, in the MATLAB documentation (https://www.mathworks.com/help/matlab/ref/GSVD.html) it is stated that the generalized singular values are the ratios of the diagonal elements of $C$ and $S$ in a given example. One might infer from the documentation that this is always the case.

However it is not generally true when there are infinite singular values, i.e., when $r_{b}<r$ .

Consider a simple example where $A$ is a non-singular $n\times n$ matrix, and $B$ is a nonzero $1\times n$ matrix. In this case $r_{b}=1,r=n$ . The GSVD of $A,B$ is readily verified to have $n-1$ infinite singular values, and the one finite value $\sigma_{\text{GSVD}}=1/\|B/A\|.$ The SVD of $AB^{\dagger}$ is just the length of $AB^{\dagger}=AB^{\prime}/\|B\|^{2}$ or $\sigma_{\text{SVD}}=\|BA^{\prime}\|/\|B\|^{2}.$

When $n=1,A=a,B=b$ , both of these expressions are equal to the absolute ratio $|a/b|$ , ( $r=r_{b}=1$ after all) but for larger $n$ the two matrix expressions are not equal.

An extremely simple special case takes $A=\begin{pmatrix}3&0\\ 0&4\end{pmatrix}$ and $B=(1\ \ 1).$ The two values are $\sigma_{\text{GSVD}}=2.4$ and $\sigma_{\text{SVD}}=2.5$ exactly.
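The two numbers can be checked by hand from the formulas above:

$$BA^{-1}=\begin{pmatrix}\tfrac{1}{3}&\tfrac{1}{4}\end{pmatrix},\qquad\|BA^{-1}\|=\sqrt{\tfrac{1}{9}+\tfrac{1}{16}}=\tfrac{5}{12},\qquad\sigma_{\text{GSVD}}=\tfrac{12}{5}=2.4,$$

$$BA^{\prime}=\begin{pmatrix}3&4\end{pmatrix},\qquad\|BA^{\prime}\|=5,\qquad\|B\|^{2}=2,\qquad\sigma_{\text{SVD}}=\tfrac{5}{2}=2.5.$$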

The issue arises exactly when there are infinite $\sigma$ . If there are no infinite $\sigma$ , $S$ has no zero columns, and we can write

$$AB^{\dagger}=(UCH)(VSH)^{\dagger}=UCHH^{\dagger}S^{\dagger}V^{\prime}=U(C/S)V^{\prime},$$

which is a singular value decomposition of $A/B$ . (We use the property that $H$ has full row rank to conclude $HH^{\dagger}=I_{r}$ and that $C/S$ is an $m_{1}\times m_{2}$ matrix with $c_{i}/s_{i}$ on the main diagonal.)

The problem that arises when some $\sigma=\infty$ is that $B^{\dagger}=(VSH)^{\dagger}=(SH)^{\dagger}V^{\prime}$ does not equal $H^{\dagger}S^{\dagger}V^{\prime}$ when $S$ has any zero columns.

7.2 The significance of horizontal directions and their orthogonal complement in $X$

In Section 2.3, we considered the intersection of span( $[A;B]$ ) with the $X$ multiaxis. An orthogonal basis for this intersection is $[u_{1};0],\ldots,[u_{r-r_{b}};0]$ which correspond exactly to the $c_{i}=1$ .

Working entirely in $X$ as an $m_{1}$ dimensional space, we are interested in the $m_{1}\times m_{1}$ projection matrix $P$ that kills the directions of intersection. Precisely we define $P$ on the orthogonal basis for $\mathbb{R}^{m_{1}}$ :

$$Pu_{i}=\begin{cases}0,&i=1,\ldots,r-r_{b},\\ u_{i},&i=r-r_{b}+1,\ldots,m_{1}.\end{cases}$$

Suppose $N$ is a matrix whose columns are a basis for the null space of $B$ . If we consider $AN$ then the span of the columns of $AN$ is the intersection we are discussing, i.e., the intersection of $X$ with span( $[A;B]$ ). To be sure, either the column of $N$ is in the common null space of $A$ and $B$ , so that the corresponding column of $AN$ is $0$ , or else if one follows through the first $r-r_{b}$ columns of $H^{\dagger}$ in $A=UCH^{\dagger}$ , one sees that we will hit the “ $c_{i}=1$ ” columns in $C$ only, hence we will emerge with a linear combination of $u_{1},\ldots,u_{r-r_{b}}$ .

We can thus describe $P$ as the orthogonal projection onto the left nullspace of $AN$ which is the orthogonal complement of the column space of $AN$ .

7.3 The correct modified theorem requires $PA/B$

We remind the reader of the usual definition of the matrix pseudoinverse in terms of the singular value decomposition:

$$A^{\dagger}=(U\Sigma V^{\prime})^{\dagger}=V\Sigma^{\dagger}U^{\prime},$$

where $\Sigma^{\dagger}$ means taking the inverse of the finite entries in $\Sigma.$ When $A$ has full column rank and $B$ has full row rank, we have $(AB)^{\dagger}=B^{\dagger}A^{\dagger}.$ It is easy to see that $[\bm{0}\ B]^{\dagger}=[\bm{0};B^{\dagger}].$

Theorem 7.1.

Let $N$ be a matrix whose columns are a basis for the nullspace of $B$ , and $P$ be the orthogonal projection onto the left nullspace of $AN$ . The finite non-zero generalized singular values of $(A,B)$ are the same as the non-zero singular values of $PAB^{\dagger}$ .

Proof 7.2.

Setting notation, we have

[TABLE]

so that $B=VS_{*}H_{*}$ , where $S_{*}$ are the rightmost $r_{b}$ non-zero columns of $S$ (indexed by $i=r-r_{b}+1,...,r$ ) and $H_{*}$ are the corresponding rows (the bottom $r_{b}$ ) of $H$ . (To see this note that $B=V[0\ S_{*}][?;H_{*}]$ where the “?” denotes rows that hit the 0 columns in $S$ so we do not care what they are.) We point out that $H_{*}$ has full row rank as the rows of $H_{*}$ are a subset of the full row rank matrix $H$ . We immediately conclude that

$$B^{\dagger}=H_{*}^{\dagger}S_{*}^{\dagger}V^{\prime}.$$

We further claim that

$$PA=UC_{*}H_{*},$$

where $C_{*}$ are the exact corresponding columns of $C$ (the rightmost $r_{b}$ indexed by $i=r-r_{b}+1,...,r$ ), which are the $c_{i}<1$ . To see this, first observe that the definition of $P$ as described in Section 7.2 is $PU=U[0\ \ I_{*}]$ where $I_{*}$ are the rightmost $r_{b}$ columns of the identity indexed by $i=r-r_{b}+1,...,r$ . Thus $PA=U[0\ C_{*}][?;H_{*}]=UC_{*}H_{*}$ , the $0$ indicating the columns of $U$ killed by $P$ .

Now that we have compressed out the immaterial columns, and knowing that $H_{*}H_{*}^{\dagger}=I_{r_{b}}$ by the full row rank condition, we can compute

$$PAB^{\dagger}=UC_{*}H_{*}H_{*}^{\dagger}S_{*}^{\dagger}V^{\prime}=U(C_{*}/S_{*})V^{\prime}.$$

This is a singular value decomposition of $PAB^{\dagger}$ , with $\Sigma=C_{*}/S_{*}$ an $m_{1}\times m_{2}$ diagonal matrix, with the $c_{i}/s_{i}$ in decreasing order on the diagonal and no $s_{i}=0$ .
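Theorem 7.1 can be illustrated on the $2\times 2$ example of Section 7.1. The Julia sketch below is a numerical check, not the proof: it builds $N$ with `nullspace`, forms $P$ as the identity minus the projector onto the column space of $AN$ , and compares the singular values of $AB^{\dagger}$ and $PAB^{\dagger}$ with the closed form $1/\|B/A\|$ quoted in Section 7.1. The variable names are ours.

```julia
using LinearAlgebra

A = [3.0 0.0; 0.0 4.0]
B = [1.0 1.0]

N  = nullspace(B)                       # basis for the nullspace of B
AN = A * N
P  = I - AN * pinv(AN)                  # projection onto the left nullspace of AN

sigma_plain = svdvals(A * pinv(B))      # singular value of A * pinv(B): 2.5
sigma_proj  = svdvals(P * A * pinv(B))  # singular value of P * A * pinv(B): 2.4
sigma_gsvd  = 1 / norm(B / A)           # finite generalized singular value: 2.4
```

Without the projection the value is 2.5; with it, the value agrees with the finite generalized singular value 2.4.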

Corollary 7.3.

If $B$ has full column rank ( $r_{b}=n$ ) or if the weaker condition holds that $r=\text{rank}([A;B])=r_{b}=\text{rank}(B)$ , then $P$ is not needed, i.e., the finite non-zero generalized singular values of $(A,B)$ are the same as the non-zero singular values of $AB^{\dagger}$ .

Proof 7.4.

If $r_{b}=n$ , then $B$ has nothing in the nullspace, $N$ has no columns, and $P$ is obviously $I$ . More generally, if $r_{b}=r$ , then $B$ has nothing in its nullspace that is not also in the nullspace of $A$ , so if $AN$ has any columns at all, it is the zero matrix, so again projection onto the left nullspace is $P=I$ .
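A numerical illustration of the corollary in Julia follows, for a $B$ with full column rank. As an assumption of this sketch, the generalized singular values are obtained as $c_{i}/s_{i}$ from the singular values $c_{i}$ of the top block of an orthonormal basis of span( $[A;B]$ ), with $s_{i}=\sqrt{1-c_{i}^{2}}$ ; the matrices are arbitrary examples.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
B = [1.0 0.0; 0.0 2.0; 1.0 1.0]        # full column rank, so r_b = n = 2

Q = Matrix(qr([A; B]).Q)               # orthonormal basis of span([A; B])
c = svdvals(Q[1:2, :])                 # cosines, in decreasing order
gsv = c ./ sqrt.(1 .- c .^ 2)          # cotangents c_i / s_i

sv = svdvals(A * pinv(B))              # singular values of A * pinv(B)
```

Here `gsv` and `sv` should agree up to rounding, with no projection $P$ involved.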

7.4 Blame the pseudoinverse not the GSVD

The difficulty with $AB^{\dagger}$ may seem like an unfortunate consequence of infinite singular values, but in point of fact, it is related to the discontinuity in the definition of the pseudoinverse. If one takes a bigger picture viewpoint, it is easy to see that infinite singular values are natural limits of finite singular values.

The only truly natural discontinuity in the GSVD is the reduction of rank of $[A;B]$ which reduces the dimensionality of the hyperplane (and the rank of $H$ ).

We mention some limit type results which help understand the nature of the infinite generalized singular values:

Theorem 7.5.

If rank( $[A;B]$ )= $r$ , and $m_{2}\geq r$ , then we can define a continuous curve of matrices $[A_{\epsilon},B_{\epsilon}]$ of the same shape as $[A;B]$ , without infinite generalized singular values when $\epsilon>0$ is small, whose generalized singular values converge continuously as $\epsilon\rightarrow 0$ to the generalized singular values of $[A,B]$ , finite or infinite.

Proof 7.6.

Take

[TABLE]

where

[TABLE]

Corollary 7.7.

If rank( $[A;B]$ )= $r$ , and $m_{2}<r$ , then by row augmenting $B_{\epsilon}$ to contain $r$ rows we can define a continuous curve of matrices $[A_{\epsilon},B_{\epsilon}]$ , without infinite generalized singular values when $\epsilon>0$ is small, whose generalized singular values converge continuously as $\epsilon\rightarrow 0$ to the generalized singular values of $[A,B]$ .

Proof 7.8.

Simply add $r-m_{2}$ rows of zeros to the bottom of $B$ . This does not change the generalized singular values of $[A;B]$ or $U$ , $C$ or $H$ . $S$ is augmented with $r-m_{2}$ rows of zeros and $V$ is augmented with $r-m_{2}$ rows and columns with an identity matrix. Apply the construction in Theorem 7.5 to complete the proof.

Example 7.9.

Consider that

[TABLE]

One might seek nearby matrices with no infinite generalized singular values. This is impossible if we insist that $B$ remain $1\times 2$ but is possible if we augment $B$ with one row, which in this case we can simply take

[TABLE]

Corollary 7.10.

Suppose $[A_{\epsilon},B_{\epsilon}]$ has rank $r$ for $0\leq\epsilon<\epsilon_{0}$ and is a continuous curve, where $B_{\epsilon}$ has rank $r$ for $\epsilon>0$ but may drop rank at $\epsilon=0$ . We then have that the generalized singular values are a continuous function of $[A_{\epsilon},B_{\epsilon}]$ as $\epsilon\rightarrow 0$ .

Proof 7.11.

The only true discontinuity in the GSVD is the potential for a drop in rank of $[A;B]$ . This is avoided in the statement by keeping $[A_{\epsilon},B_{\epsilon}]$ rank $r$ . Thus the limit of the column space is the column space of the limit.

We do remark on the other hand that if $[A_{\epsilon},B_{\epsilon}]$ drops rank, then we can only say that the limit of the column space contains the column space of the limit, which can lead to all kinds of discontinuities in the generalized singular values.

8 GSVD Applications and their Geometric Interpretations

8.1 Geometry of Tikhonov Regularization

8.1.1 The two cosine damping

We show how geometry can add insight to our understanding of Tikhonov Regularization:

$$\min_{x}\ \|Ax-b\|^{2}+\lambda^{2}\|Lx\|^{2}$$

by providing a two cosines view of damping. Specifically, the way Tikhonov regularization reduces the solution or “weights,” is usually understood algebraically in terms of adding a regularizer term that moves the original problem away from some kind of ill-conditioned setting. We will show that, in Figure 7, one cosine comes from the projection from the horizontal (blue) plane to the red plane, the span of $[A;\lambda L]$ . The other cosine comes from the non-canonical basis of the plane: the columns of $[A;\lambda L]$ which elongate with $\lambda$ , hence the coordinates shrink.

While the “calming influence” [19, Section 6.1.26], [5, Section 4.4], [21] of the regularization parameter $\lambda$ has been well studied algebraically, we identify geometrically in (5) the influence as a factor of $\cos^{2}\theta_{\lambda}$ where $\tan\theta_{\lambda}=\lambda\tan\theta_{1}$ so that $\cos^{2}\theta_{\lambda}=1/(1+\lambda^{2}\tan^{2}\theta_{1})$ , where $\theta_{1}$ is the angle that corresponds to $\lambda=1.$ We will compare the $\cos^{2}$ formulation with previous formulations explaining why we find that this formulation feels somewhat more insightful.

Before we start, let us recap Tikhonov regularization. Suppose we have a matrix $A$ , which we will assume has full column rank. The $\lambda=0$ problem (standard least squares) is the computation of $x_{0}=A^{\dagger}b=(A^{\prime}A)^{-1}A^{\prime}b,$ the standard solution to the normal equations $A^{\prime}Ax=A^{\prime}b.$ To regularize we pick a suitable matrix $L$ , and a “regularization parameter” $\lambda$ , and then solve instead $(A^{\prime}A+\lambda^{2}L^{\prime}L)x=A^{\prime}b,$ which is equivalent to computing

$$x_{\lambda}=\begin{bmatrix}A\\ \lambda L\end{bmatrix}^{\dagger}\begin{bmatrix}b\\ 0\end{bmatrix}.$$

From the geometrical point of view, we believe the reformulation in Theorem 8.1 below is more revealing of the “calming effect.” Figure 7 demonstrates the hyperplane onto which $[b;0]$ gets projected for varying $\lambda.$

For every $\lambda$ , we obtain the GSVD as a continuous function of $\lambda$ :

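$$\begin{bmatrix}A\\ \lambda L\end{bmatrix}=\begin{bmatrix}UC_{\lambda}\\ VS_{\lambda}\end{bmatrix}H_{\lambda},$$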

where it is easy to check that $H_{\lambda}$ is square non-singular. It is convenient to use the compact format described in Section A.3 here. Thus we take $U$ to be $m_{1}\times n$ , $C$ and $S$ to be square diagonal $n\times n$ . The exact values in $C$ and $S$ come from the trigonometry with unit hypotenuse, fixed base, and sliding height of a $c,s,1$ triangle at $\lambda=1$ , as shown in the left side of Figure 8. Namely

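$$C_{\lambda}=\frac{C}{\sqrt{C^{2}+\lambda^{2}S^{2}}},\qquad S_{\lambda}=\frac{\lambda S}{\sqrt{C^{2}+\lambda^{2}S^{2}}},$$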

where the operations happen on the diagonal. It also follows that

[TABLE]

The equation $H_{0}=C_{\lambda}H_{\lambda}$ has a nice trigonometric interpretation. As the column vectors of $[A;\lambda L]$ grow in length (these lengths are encoded in $H_{\lambda}$ ), the cosines in $C_{\lambda}$ relate back to the $[A;0]$ columns which are shorter in length. This is depicted in Figure 8.

Theorem 8.1.

The solution $x_{\lambda}$ to the Tikhonov Regularization problem can be written as

$$x_{\lambda}=H_{0}^{-1}C_{\lambda}^{2}H_{0}\,x_{0},$$

where $x_{0}$ is the least squares solution to $Ax=b$ and $A=UH_{0}$ , where $[A;\lambda L]=[UC_{\lambda};VS_{\lambda}]H_{\lambda}.$

Proof 8.2.

Since

[TABLE]

we can calculate

[TABLE]

and use the relation $H_{\lambda}^{-1}=H_{0}^{-1}C_{\lambda}$ to complete the proof.

Comparison and Discussion

The standard application of the GSVD to Tikhonov relates $x_{\lambda}$ to $b$ and thus gives formulas involving the non-physical, non-homogeneous factor of $c/(c^{2}+\lambda^{2}s^{2})$ rather than the homogeneous $c_{\lambda}^{2}=c^{2}/(c^{2}+\lambda^{2}s^{2})$ .

The formulation in Theorem 8.1 diagonalizes the operator that relates $x_{\lambda}$ to $x_{0}.$ We understand that when $x$ are the coordinates of a linear combination of the columns of $[A;B]$ , we have that $H_{0}x$ are the coordinates of that same vector in the natural basis. Thus the interpretation of $H_{0}^{-1}C_{\lambda}^{2}H_{0}$ simply is:

1. Write the vector in the natural coordinate system;

2. Multiply by a cosine squared in every natural direction;

3. Return to the original coordinate system.
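The three steps above can be sketched in Julia and compared with a direct solve of the regularized normal equations. This is an illustration under our own assumptions: $[A;L]$ has full column rank, the $\lambda=1$ factors are computed from a QR factorization followed by an SVD of the top block (not necessarily how the paper computes them), $H_{0}=CH$ , and the damping factor is $c_{\lambda}^{2}=c^{2}/(c^{2}+\lambda^{2}s^{2})$ as in the comparison above. The data are arbitrary.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
L = [1.0 -1.0; 0.0 1.0]
b = [1.0, 0.0, 2.0]
lambda = 0.5

# GSVD-type factors at lambda = 1: A = U * Diagonal(c) * H
F  = qr([A; L])
Q1 = Matrix(F.Q)[1:3, :]
S1 = svd(Q1)
c  = S1.S                              # cosines at lambda = 1
H  = S1.Vt * F.R
H0 = Diagonal(c) * H                   # so that A = U * H0

x0 = A \ b                             # least squares solution (lambda = 0)

# cos^2(theta_lambda) = c^2 / (c^2 + lambda^2 * s^2), with s^2 = 1 - c^2
damp = Diagonal(c .^ 2 ./ (c .^ 2 .+ lambda^2 .* (1 .- c .^ 2)))

# 1. natural coordinates, 2. multiply by cosine squared, 3. return
xlam_gsvd = H0 \ (damp * (H0 * x0))

# Direct solve of the regularized normal equations for comparison
xlam_direct = (A' * A + lambda^2 * (L' * L)) \ (A' * b)
```

The vectors `xlam_gsvd` and `xlam_direct` should coincide up to rounding, and every diagonal entry of `damp` lies in $(0,1]$ , which is the damping.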

8.2 Humans vs Yeast: Comparative Data Modeling

In a series of beautiful applications of the GSVD, Alter et al. [3, 32, 33, 35, 2] propose an approach towards data reconstruction and classification. In their case [3], the $A$ and $B$ are two DNA microarrays, one from humans and the other from yeast. The rows of $A$ and $B$ live in $\mathbb{R}^{n}$ or gene space. The rows of $H$ form a basis for this row (or gene) space, and are denoted genelets. A natural question is whether the genelet is primarily human, primarily yeast, or a mixture. In general, given two matrices with the same number of columns, one wants to classify the basis vectors in the rows of $H$ according to their source.

The GSVD provides a natural solution by creating a single coherent model from the two datasets recording different aspects of interrelated phenomena by simultaneously identifying the similar and dissimilar between the two corresponding column-matched but row-independent matrices. For each of the $r$ rows, we have that $\theta_{i}$ denotes the angle towards $A$ . In Figure 9, we portray this. We note that [3] displays the angles from $-\pi/4$ to $\pi/4$ , but we will stick with the $0$ to $\pi/2$ convention. It is convenient that the rows of $H$ are already sorted from “mostly $A$ ,” to “mostly $B$ .”

Our ellipse picture Figure 3 reveals the geometry readily. The $[u_{i}c_{i};v_{i}s_{i}]$ all appear on the unit ball.

The comparative Data Reconstruction equation is

[TABLE]

where $h_{i}^{\prime}$ is the $i$ -th row of $H$ . (This is exactly Equation (2).) One can preprocess $H$ so that each row is of unit direction as it is only the ratio of $c_{i}$ to $s_{i}$ that matters. Any ill-conditioning of $H$ could be worrisome.

8.3 Signal vs. Noise: A one matrix and one subspace view of the GSVD

The focus on two matrices with the same number of columns is not always the best view of the GSVD. One can take rather a single $m\times n$ matrix $M$ and any $m_{1}$ dimensional reference subspace ${\cal S}$ of $\mathbb{R}^{m}$ . We can then think of the GSVD as an additive decomposition:

$$M=P+Q,$$

where $P=Y_{1}UCH$ and $Q=Y_{2}VSH$ , and the columns of $Y_{1},Y_{2}$ are orthonormal bases for ${\cal S}$ and ${\cal S}^{\perp}$ respectively. Conversely, $[Y_{1}\ Y_{2}]^{\prime}M=[Y_{1}^{\prime}M;Y_{2}^{\prime}M]$ is an ordinary GSVD.

By doing this we have a decomposition of $M=P+Q$ such that $P^{\prime}Q=Q^{\prime}P=0_{n\times n}$ . Geometrically, instead of decomposing into a “top half” and “bottom half,” into a “horizontal” and “vertical” multiaxis subspace, we are rather allowing for general multiaxes subspaces. One might think of this as a rotated view of Figure 3. More specifically, most of this paper would take $Y_{1}=[I;0]$ and $Y_{2}=[0;I]$ , but all that is required is that $Y_{1}$ and $Y_{2}$ are orthogonal complements.

This geometrical insight underlies an additive decomposition signal processing application found in [25, 26] where $P$ and $Q$ play the role of signal + noise.

8.4 Orthonormal Bases for $\{Ax:Bx=0\}$ and Friends

The $U$ matrix of the GSVD provides, in its columns, orthonormal bases for three mutually orthogonal subspaces that arise in many applications:

[TABLE]

The “completion” referred to in the above equation means that taken together, the columns of $U_{1}$ and $U_{2}$ form an orthonormal basis for col( $A$ ). From the perspective of Figure 3, there are the horizontal directions in the red unit sphere, the generic directions, and the directions that are not present.

8.4.1 Clustering Matrices

An important example where the GSVD lurks implicitly or explicitly is clustering. We will consider an $A$ matrix that indicates the clustering, and a $B$ matrix that indicates equality of data between the clusters.

We consider data in $\mathbb{R}^{p}$ and assume a partitioning of $p=p_{1}+\ldots+p_{k}$ , into clusters. The indicator matrix corresponding to the partition of $p$ is :

[TABLE]

which we can normalize by setting

[TABLE]

In the Julia computing language, the indicator matrix can be generated succinctly with A = cat(ones.(Int,partition)...,dims=1:2), where partition denotes the vector $[p_{1},\ldots,p_{k}]$ .

The other useful matrix in this context is the constraint matrix whose nullspace is the all ones vector:

[TABLE]

In Julia, with the LinearAlgebra package, this may be written succinctly as

B = [I -ones(k-1)].

Given an $m\times p$ data matrix $D$ there are a number of “scatter matrices” that arise that allow us to compare between clusters and within clusters. Following roughly the notation in [24], we can partition the data

[TABLE]

Let $d_{j}$ be the $j$ th column of $D$ and let $N_{i}$ denote the column indices in cluster $i$ ; $c_{i}$ is the mean of the columns in cluster $i$ , and $c$ is the mean of all the columns. The within, between, and mixed scatter matrices are defined as

[TABLE]

These scatter matrices are readily calculated through the $U$ matrix of the GSVD. In Julia one can set U, = svd(A,B), where the comma indicates that we are requesting only the $U$ matrix. We then have that,

[TABLE]

“Completion” means that $U_{1}$ and $U_{2}$ form an orthonormal basis for col( $A$ ). The third block is an orthonormal basis for col( $A$ ) $^{\perp}$ . The “between” and “within” terms are statistics jargon. Given a data vector, the first column extracts the normalized mean. The next block gives a basis for clustered vectors that are mean-free which by removing the fine details within cluster provides a way to compare between clusters. The last block provides the within cluster details. The number of columns is the dimension of the space, and in statistics jargon is known as the “degrees of freedom.” (See [29, Chap. 10].)

The scatter matrices can be calculated in terms of $U$ using these formulas

[TABLE]

One recognizes that the matrices in parentheses in the three expressions above are projection matrices and the orthogonality of $U$ guarantees that $S_{w}+S_{b}=S_{m}$ .

8.4.2 One Way ANOVA made simple

A commonly used statistics test is to decide whether a proposed clustering of a vector $v$ is justified. The test takes the average (meaning divide by $k-1$ ) square component in the $U_{2}$ direction and divides it by the average (meaning divide by $p-k$ ) square component in the $U_{3}$ direction. The following Julia code shows how compactly one can reproduce an example from Wikipedia where one can quickly obtain the number computed in Step 5 of https://en.wikipedia.org/wiki/One-way_analysis_of_variance#Example.

```julia
using LinearAlgebra
v = [6,8,4,5,3,4,8,12,9,11,6,8,13,9,11,8,7,12]    # data vector
A = cat(ones.([6,6,6])..., dims=1:2)              # Indicator(6,6,6)
B = [1 0 -1; 0 1 -1]                              # Constraint matrix
U, = svd(A, B)                                    # GSVD
(norm(U[:,2:3]'v) / norm(U[:,4:18]'v))^2 * 15/2   # The F value
```

9.264705882352956

While for this problem the classic approach is fine as an algorithm, for general tests for being in the column space of $A$ but orthogonal to $\{Ax:Bx=0\}$ , the GSVD is worth considering algorithmically and how we are projecting into the non-horizontal directions is worth understanding geometrically.
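As a cross-check of the F value, the same ratio can be computed with explicit projections instead of the $U$ matrix. This Julia sketch is our own verification, not the paper's method: it projects $v$ onto col( $A$ ), removes the component along the normalized all-ones vector to get the between part, and takes the remainder of $v$ as the within part.

```julia
using LinearAlgebra

v = [6,8,4,5,3,4,8,12,9,11,6,8,13,9,11,8,7,12]   # data vector
A = cat(ones.([6,6,6])..., dims=(1,2))           # Indicator(6,6,6)

PA = A * pinv(A)                  # orthogonal projection onto col(A)
u1 = ones(18) / sqrt(18)          # normalized mean direction

between = PA * v - u1 * dot(u1, v)   # clustered part with the mean removed
within  = v - PA * v                 # details inside each cluster

Fvalue = (sum(abs2, between) / 2) / (sum(abs2, within) / 15)
```

By hand, the between sum of squares is 84 and the within sum of squares is 68, so the ratio is $(84/2)/(68/15)=630/68\approx 9.2647$ , matching the output above.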

8.4.3 See a slope? Generalize to a GSVD

In the last line of the above code snippet, the innocent looking

            norm(U[:,2:3]'v)/norm(U[:,4:18]'v)

for an orthogonal matrix $U$ carries a message of generalization if you know how to read it. It is a ratio of components in two orthogonal directions. You can call it a slope, or a cotangent, or a tangent. What we called horizontal and vertical multiaxes in Figure 3 may now be labeled in this coordinate system: the between and within axes, following the aforementioned statistics nomenclature.

The generalization of the vector $v\in\mathbb{R}^{p}$ example of Section 3 is a $p\times n$ matrix $M$ of data, each data item being one row of length $n$ . It is therefore natural geometrically to consider and interpret the GSVD as

[TABLE]

The result is $n$ canonical directions for considering between vs within as naturally as comparing human vs yeast, or signal vs noise as we have seen in previous applications. The multislope, i.e. the generalized singular values (or perhaps we can call this the ANOVA structure) is $0$ in all but at most $k-1$ directions, owing to the number of columns in $U_{2}$ .

8.4.4 Discriminant Analysis Dimension Reduction

Continuing with the idea in Section 8.4.3, we observe that it is natural to reduce out all but the $k-1$ nonzero ANOVA directions by multiplying $M$ on the right by $G=H^{\dagger}I_{r,k-1}$ (or, for that matter, any matrix whose columns span the same subspace of $\mathbb{R}^{n}$ ).

The reduction to $k-1$ columns

[TABLE]

can be rotated back to the standard coordinate system without any change to the nonzero generalized singular values (the ANOVA structure) to yield

[TABLE]

since $UU^{\prime}=I$. We can also reduce the mean by adding back $U_{1}U_{1}^{\prime}G$, producing our final reduction, $MG$.

Our simple summary is that for a data matrix $M$, ANOVA measures the nonzero generalized singular values in $[U_{2}^{\prime};U_{3}^{\prime}]M$, a rotated multiaxis system which gives the ratios of the “between” to the “within”, and these are the same as for the reduced data matrix $MG$ because we are suppressing the directions with $0$ generalized singular values.

This is a geometrical derivation of an idea and algorithm presented by Park and others **[24]** with a minimization approach. In their algorithm $G$ can be derived efficiently as the first $k-1$ columns of the $Q$ from the GSVD, and the authors point out that the GSVD idea is robust even in the case of too little data.

8.5 The Jacobi Ensemble from Random Matrix Theory is a GSVD

Classical random matrix theory centers on the Hermite, Laguerre, and Jacobi ensembles. Historically, they are presented in eigenvalue format, but we have argued that the eigenvalue, SVD, and GSVD formats, respectively, are mathematically more natural, providing simpler derivations and clearer insights. Suppose we have two Gaussian random matrices $A$ ( $m_{1}\times n$ ) and $B$ ( $m_{2}\times n$ ) with $m_{1}\geqslant n$ and $m_{2}\geqslant n$. For example, A=randn(m1,n) and B=randn(m2,n) using Julia notation. The so-called MANOVA matrix (Multivariate Analysis of Variance) is defined to be

$$(A^{\prime}A+B^{\prime}B)^{-1}A^{\prime}A$$

or in the symmetric form $(A^{\prime}A+B^{\prime}B)^{-1/2}A^{\prime}A(A^{\prime}A+B^{\prime}B)^{-1/2}.$ The eigenvalues are the squares of the cosines ( $c_{i}^{2}$ ) and are jointly distributed as **[29]**

[TABLE]

where $a_{1}=\frac{\beta}{2}m_{1},a_{2}=\frac{\beta}{2}m_{2}$ and $p=1+\frac{\beta}{2}(n-1)$ ,

[TABLE]

where $\beta=1$ for real matrices, $\beta=2$ for complex matrices, $\beta=4$ for quaternion matrices, and general $\beta$ is worth considering, as in **[16]**. The eigenvalue distribution is known as the Jacobi ensemble, which was first referred to by name in **[28]**. We refer interested readers to **[18]**, where the geometrical picture (a simplified version of the ellipse in Figure 3) motivates a direct derivation of the joint density of the Jacobi ensemble. Note that the direct derivation in **[18]** fills in a gap stated in Remark 2.3 of **[20]**, where an indirect proof using the Fourier Transform is presented, but a direct proof without the Fourier Transform is desired. An earlier alternative direct proof is due to **[40]**.
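The claim that the MANOVA eigenvalues are the squared cosines of the GSVD can be checked numerically. The following is a small sketch for the real case ($\beta=1$), not a definitive implementation: the sizes, the seed, and the variable names are illustrative assumptions. It relies on svdvals(A, B) from Julia's LinearAlgebra standard library, which returns the ratios $c_{i}/s_{i}$ with $c_{i}^{2}+s_{i}^{2}=1$, so that $c_{i}^{2}=\sigma_{i}^{2}/(1+\sigma_{i}^{2})$.

```julia
using LinearAlgebra, Random

Random.seed!(0)                          # fixed seed, for reproducibility only
m1, m2, n = 7, 9, 4                      # illustrative sizes with m1 >= n, m2 >= n
A = randn(m1, n)
B = randn(m2, n)

# Eigenvalues of the MANOVA matrix, obtained from the equivalent
# symmetric-definite generalized eigenproblem (A'A) x = λ (A'A + B'B) x.
AtA = A' * A
BtB = B' * B
lambda = eigvals(Symmetric(AtA), Symmetric(AtA + BtB))

# Generalized singular values σ_i = c_i / s_i, converted to squared cosines.
sigma = svdvals(A, B)
c2 = sigma .^ 2 ./ (1 .+ sigma .^ 2)

# Largest discrepancy between the two sorted lists.
max_gap = maximum(abs.(sort(lambda) .- sort(c2)))
```

In this sketch the eigenvalue route forms $A^{\prime}A$ and $B^{\prime}B$ explicitly, while the GSVD route works on $A$ and $B$ directly; max_gap should be at the level of rounding error.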

9 Mathematical Software

Suppose one looks up the GSVD in the help pages of one's favorite technical computing language, shown in Table 3 and, for Julia, in Table 10. One gets lost in a sea of matrices whose meaning is very hard to fully appreciate. Surprisingly, we find no standard function for the GSVD in Python (NumPy and SciPy), though there is some discussion on StackOverflow **[1]** and in GitHub NumPy issue #3475 (https://github.com/numpy/numpy/issues/3475) and SciPy issues #743 (https://github.com/scipy/scipy/issues/743) and #1491 (https://github.com/scipy/scipy/issues/1491).

10 Acknowledgments

We thank Orly Alter, Zhaojun Bai, Michael Kirby, Andreas Noack, Chris Paige, Haesun Park, Sri Priya Ponnapalli, Charlie van Loan, Sabine van Huffel, and Joos Vandewalle for interesting conversations about the GSVD theory, software, and feedback from lectures at the 2017 Householder Symposium, and 2018 SIAM Applied Linear Algebra meeting. We thank Sungwoo Jeong for finding two references in the literature that come close to the Grassmann viewpoint for the GSVD **[23, 20]**.

We also wish to acknowledge and remember the late Gene Golub, over a decade since his passing, who so effectively promoted the singular value decomposition. We remember a time, not so long ago, when the SVD was unheard of outside of numerical linear algebra circles, and eigenvalues were all that were known. Then like dominos falling, one field after another, biology, economics, fields of engineering, statistics, computer science, and yes pure mathematics learned about the value of the SVD as a tool, as an algorithm, and even as vocabulary for effective communication. Gene with his PROF SVD (Figure 11) and DR SVD California vanity license plates seemed always nearby when a field was starting to catch on. Today the GSVD is as obscure as the SVD was in the early days. We feel that the GSVD’s time has come. We would be very pleased if one by one other fields would catch on.

Bibliography

[1] GSVD for python generalized singular value decomposition, https://stackoverflow.com/questions/37814024/GSVD-for-python-generalized-singular-value-decomposition.
[2] K. A. Aiello, S. P. Ponnapalli, and O. Alter, Mathematically universal and biologically consistent astrocytoma genotype encodes for transformation and predicts survival phenotype, APL Bioengineering, 2 (2018), p. 031909.
[3] O. Alter, P. O. Brown, and D. Botstein, Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms, Proceedings of the National Academy of Sciences, 100 (2003), pp. 3351–3356.
[4] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users’ Guide, vol. 9, SIAM, 1999, http://www.netlib.org/lapack/lug/node36.html.
[5] Z. Bai, The CSD, GSVD, their applications and computations, Preprint Series 958, Institute for Mathematics and its Applications, University of Minnesota, (1992).
[6] D. Callaerts, Signal separation methods based on singular value decomposition and their application to the real-time extraction of the fetal electrocardiogram from cutaneous recordings, PhD thesis, Katholieke Universiteit Leuven, 1989.
[7] D. Callaerts, B. De Moor, J. Vandewalle, W. Sansen, G. Vantrappen, and J. Janssens, Comparison of SVD methods to extract the foetal electrocardiogram from cutaneous electrode signals, Medical and Biological Engineering and Computing, 28 (1990), p. 217.
[8] D. Chu, L. De Lathauwer, and B. De Moor, A QR-type reduction for computing the SVD of a general matrix product/quotient, Numer. Math., 95 (2003), pp. 101–121.
