The GSVD: Where are the ellipses?, Matrix Trigonometry, and more
Alan Edelman, Yuyang Wang

TL;DR
This paper develops a geometric and theoretical framework for the Generalized Singular Value Decomposition (GSVD), revealing its natural coordinates, applications, and advantages over traditional eigenproblem approaches, with implications across various scientific fields.
Contribution
It introduces a geometric interpretation of the GSVD, including an ellipse picture and multiaxes, and advocates for its natural application setting over the eigenproblem formulation.
Findings
The GSVD provides a natural coordinate system for the Grassmann manifold.
The geometric ellipse interpretation clarifies the role of generalized singular vectors.
Applications include regularization, genome reconstruction, signal processing, and statistical analysis.
Abstract
This paper provides an advanced mathematical theory of the Generalized Singular Value Decomposition (GSVD) and its applications. We explore the geometry of the GSVD which provides a long sought for ellipse picture which includes a horizontal and a vertical multiaxis. We further propose that the GSVD provides natural coordinates for the Grassmann manifold. This paper proves a theorem showing how the finite generalized singular values do or do not relate to the singular values of . We then turn to the applications arguing that this geometrical theory is natural for understanding existing applications and recognizing opportunities for new applications. In particular the generalized singular vectors play a direct and as natural a mathematical role for certain applications as the singular vectors do for the SVD. In the same way that experts on the SVD often prefer not to cast…
| Property of and | ||
|---|---|---|
| total # columns | ||
| # zero columns in (left columns): | # | |
| # non-zero columns (middle columns): | ||
| # zero columns in (right columns): | # | |
| total # rows | = # rows | = # rows |
| # non-zero rows | ||
| # zero rows | ||
| : Principal angle between and : SVD() : SVD() : SVD() if : SVD() if | ||
|---|---|---|
| left singular vectors of ( or if ) | ||
| left singular vectors of ( or if ) | ||
| language | GSVD documentation in the corresponding language |
|---|---|
| matlab (R2018b) | https://www.mathworks.com/help/matlab/ref/gsvd.html ⬇ [U,V,X,C,S] = gsvd(A,B) returns unitary matrices U and V, a (usually) square matrix X, and nonnegative diagonal matrices C and S so that A = U*C*X’ B = V*S*X’ C’*C + S’*S = I A and B must have the same number of columns, but may have different numbers of rows. If A is m-by-p and B is n-by-p, then U is m-by-m, V is n-by-n, X is p-by-q, C is m-by-q and S is n-by-q, where q = min(m+n,p). The nonzero elements of S are always on its main diagonal. The nonzero elements of C are on the diagonal diag(C,max(0,q-m)). If m >= q, this is the main diagonal of C. |
| Mathematica (11.3.0) |
https://reference.wolfram.com/language/ref/SingularValueDecomposition.html
>Details and Options.
SingularValueDecomposition[m,a] gives a list of matrices {{u,ua},{w,wa},v} such that m can be written as u.w.Conjugate[Transpose[v]] and a can be written as ua.wa.Conjugate[Transpose[v]].
|
| R (geigen v2.2) | https://www.rdocumentation.org/packages/geigen/versions/2.2/topics/GSVD The matrix is a -by- matrix and the matrix is a -by- matrix. This function decomposes both matrices; if either one is complex than the other matrix is coerced to be complex. The Generalized Singular Value Decomposition of numeric matrices and is given as A = UD_1 [0 R]Q’, and B = VD_2[0 R]Q’, where an orthogonal matrix an orthogonal matrix an orthogonal matrix an -by- upper triangular non singular matrix and the matrix is an -by- matrix. are quasi diagonal matrices and nonnegative and satisfy is an -by- matrix and is a -by- matrix. For details on this decomposition and the structure of the matrices and . see http://www.netlib.org/lapack/lug/node36.html. |
| language | GSVD documentation in corresponding language |
| Julia 1.4 (and above) | ⬇ svd(A, B) -> GeneralizedSVD Compute the generalized SVD of A and B, returning a GeneralizedSVD factorization object F such that [A;B] = [F.U * F.D1; F.V * F.D2] * F.R0 * F.Q’ * U is a M-by-M orthogonal matrix, * V is a P-by-P orthogonal matrix, * Q is a N-by-N orthogonal matrix, * D1 is a M-by-(K+L) diagonal matrix with 1s in the first K entries, * D2 is a P-by-(K+L) matrix whose top right L-by-L block is diagonal, * R0 is a (K+L)-by-N matrix whose rightmost (K+L)-by-(K+L) block is nonsingular upper block triangular, K+L is the effective numerical rank of the matrix [A; B]. Iterating the decomposition produces the components U, V, Q, D1, D2, and R0. The generalized SVD is used in applications such as when one wants to compare how much belongs to A vs. how much belongs to B, as in human vs yeast genome, or signal vs noise, or between clusters vs within clusters. (See Edelman and Wang for discussion: https://arxiv.org/abs/1901.00485) It decomposes [A; B] into [UC; VS]H, where [UC; VS] is a natural orthogonal basis for the column space of [A; B], and H = RQ’ is a natural non-orthogonal basis for the rowspace of [A;B], where the top rows are most closely attributed to the A matrix, and the bottom to the B matrix. The multi-cosine/sine matrices C and S provide a multi-measure of how much A vs how much B, and U and V provide directions in which these are measured. |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\newsiamremark
remarkRemark \newsiamremarkexampleExample
The GSVD: Where are the ellipses?,
Matrix Trigonometry, and more
Alan Edelman Department of Mathematics, MIT, Cambridge, MA (). [email protected]
Yuyang Wang AWS AI Labs, East Palo Alto, CA (). Work done prior joined Amazon. [email protected]
Abstract
This paper provides an advanced mathematical theory of the Generalized Singular Value Decomposition (GSVD) and its applications. We explore the geometry of the GSVD providing a long sought for picture which includes a horizontal and a vertical multiaxis. We further propose that the GSVD provides natural coordinates for the Grassmann manifold. This paper proves a theorem showing how the finite generalized singular values do or do not relate to the singular values of .
We then turn to applications, arguing that this geometrical theory is natural for understanding existing applications and recognizing opportunities for new applications. In particular the generalized singular vectors play a direct and as natural a mathematical role for certain applications as the singular vectors do for the SVD. In the same way that experts on the SVD often prefer not to cast SVD problems as eigenproblems, we propose that the GSVD, often cast as a generalized eigenproblem, is perhaps best cast in its natural setting.
We illustrate this theoretical approach and the natural multiaxes (with labels from technical domains) in the context of applications where the GSVD arises: Tikhonov regularization (unregularized vs regularized), Genome Reconstruction (humans vs yeast), Signal Processing (signal vs noise), and statistical analysis such as Analysis of variance (ANOVA) and discriminant analysis (between clusters vs within clusters.) With the aid of our ellipse figure, we encourage the labelling of the natural multiaxes in any GSVD problem.
keywords:
GSVD, SVD, ellipse, CS Decomposition, Tikhonov Regularization
{AMS}
65F22, 15A18, 15A23
1 Introduction
1.1 Prelude
If and are two vectors, then the block vector equation in :
[TABLE]
may be thought of geometrically as a hypotenuse vector decomposed as the sum of two legs of a right triangle. If is the length of this hypotenuse and are the unit direction vectors for then we can write
[TABLE]
where and are the cosine and sine of the corresponding angles, namely and . This is ordinary planar trigonometry of a right triangle.
For notational convenience, we will sometimes use a semicolon (“;”) to denote the stacking (or vertical concatenation) of vectors and matrices, so that
[TABLE]
We note that is a unit vector in the direction The cotangent is a slope which provides a measure of whether the vector is primarily in the “” (or top) direction, or the “,” or a mix depending on whether is large, small, or in between.
The GSVD extends the above ideas to matrices.
1.2 The GSVD
This paper provides a new approach and understanding of the generalized SVD (GSVD) [30, 38, 9] of two matrices . Generalizing the introductory paragraphs, the GSVD may be understood in the context of a generalized Pythagorean theorem with
[TABLE]
We take as our definition of a GSVD, a decomposition of with the form
[TABLE]
where are square orthogonal in . ; are 1-diagonal (see Figure 1 ) such that , and has full row rank where denotes rank(). The remaining dimensions are implied, namely are in , and is in .
The SVD is so widely used that applications need not be listed. Historically this was not always the case. Fields such as biology, economics, and computer science could be observed learning about the SVD one-by-one with great impact. Perhaps a kind of folklore notion is that the SVD applies any time an array needs to be quickly compressed to the main information out, or whenever was lurking. We would love to foster a world where the GSVD finds applications one-by-one in many fields. Perhaps the new folklore is that the GSVD applies when two arrays with a common dimension need to be quickly compressed or whenever two matrices and are lurking. Of course both the SVD and GSVD underly more.
Some selected applications of the GSVD include oriented energy analysis [6, 7, 8, 10, 11, 39], (here the GSVD is sometimes called by the more descriptive name QSVD for “quotient” SVD), Tikhonov regularization [21, 14], Linear Discriminant Analysis [31, 24], and more recently in microarray analysis [3]. A review from 1992 and discussion of algorithms may be found in [5].
As a point of mathematical taste, many textbooks today still treat SVDs as a byproduct of exposition on eigenvalues. This is unfortunate, as most of the time considerations of or create unnecessary mathematical baggage best abandoned. The SVD is mature enough to live its own life separate from the symmetric eigenvalue problem. Taking this notion one step further, the GSVD deserves to live separately from generalized eigenvalue problems or the SVD. When a GSVD lurks, it is recommended to abandon old fashioned language and see the true GSVD construction in full mature light. We take this approach in a number of examples in this paper.
1.3 A “GH” decomposition
To clarify and streamline our view of the roles of the pieces of the GSVD, we propose that the GSVD be considered a GH decomposition:
[TABLE]
where (for Grassmann or geometric) denotes the information in the -dimensional hyperplane representing the column space of . Specifically the columns of are a natural orthonormal basis for that hyperplane in , and the columns of are the coordinates of the columns of in that basis. Of course the decomposition of has exactly the same properties, with one important difference: the is not uniquely defined by the hyperplane, while in the GSVD, the choice is more or less canonical.
We further feel that the factorization into the two matrices and emphasizes the outer product rank form:
[TABLE]
which can be readily missed in the long form.
In analogy with the SVD or Non-negative Matrix Factorization (NMF) [27], one might consider a simultaneous rank reducing method where only the rows of with largest norm are kept.
In particular if we multiply on the right by , where is the first columns of the identity, we obtain a rank reduced :
[TABLE]
We remark that is an oblique projector when is square non-singular, and an orthogonal projector when is orthogonal.
1.4 More details about
The matrices deserve more detailed discussion, as may be found in Appendix A.
To help guide the reader, we offer a table of bases for the fundamental subspaces that appear in the GSVD. It is helpful to keep in mind that the columns of and are leftward looking towards the orthogonal and matrices in the GSVD factorization, while the rows of and are rightward looking towards the full row rank in the GSVD factorization.
[TABLE]
It is useful to point out that the common nullspace of and is killed by , i.e., if and then . A vector that is in only one of the nullspaces is not killed by , but is killed by 0 columns in or respectively.
Let . Table 1 shows the structure of and . A very common case has in which case the sizes of match that of .
1.5 Summary
This paper contains a number of insights and results about the GSVD:
- •
We present an ellipse picture of the GSVD, which requires four dimensions to get a good feel for the general case (Section 2).
- •
The GSVD generalizes planar trigonometry to matrix trigonometry (Section 3).
- •
We consider as natural coordinates for dimensional hyperplanes (the Grassmann manifold) in given that . We use the Grassmann manifold coordinates to clarify the link between the CS decomposition and the GSVD (other authors have observed vaguely that they are closely related). We view the matrix as the change of coordinates from canonical coordinates to the specifics of (Section 4).
- •
We discuss the link between the GSVD and the principal angles between subspaces (Section 5), and related “energy portraits” (Section 6).
- •
We prove a theorem relating GSVD and SVD. They are not generally identical (Section 7).
- •
We revisit applications in the geometric context, and interpret the GSVD as a multi-dimensional slope and connect applications (Section 8).
Notation
For , let denote the normalized -th column of if , or else define . Similarly, let denote the normalized -th column of if , or else define . This notation conveniently avoids issues of different sizes and conventions. For example, or may have fewer than columns. Details of the placement of the and appear in Figure 1. Suffice it to say for now that is the -th column of the matrix when , and may be found in the -th column of the matrix when . The indirection in is admittedly unfortunate, but in all cases, the non-zero by convention are left to right contiguous columns of that may either start from the left, or end at the right, but in many situations is not in the -th column. We use to denote the pseudo-inverse of The “slash” and “backslash” are defined as and We also overload the notation to denote the generalized singular values of while means the singular values of .
2 Where are The Ellipses?
The SVD ellipse picture for a matrix (Figure 2) is a very familiar visual for the action of on the unit ball. We are not aware of any ellipse pictures in the literature nor even a notion that a natural ellipse picture exists for the GSVD or even the CSD (CS Decomposition) [19]. We believe that the lack of a geometric view of the GSVD is part of the reason that the GSVD is not as widely understood or as widely used as it should be.
Regarding an ellipse picture, one might blame some sort of human inability to perceive higher dimensions as a complication, but we show that this is not really the case in Figure 3.
The gap in understanding is underscored by the curiosity expressed online, but without answer, on such sites as MATLAB Central [13] (reproduced here111The authors contacted Mr. Dyas on December 26, 2019 to inform him of the solution of his twenty year query.) and a similar request on the question-and-answer site Quora [34] (not reproduced here).
Subject: Generalized SVD geometry? From: Bob Dyas Date: 29 Feb, 2000 15:31:31
Message: 1 of 1 indicates no answer in 20 years!
Is there a geometric interpretation of the generalized singular value decomposition? I’m looking for something comparable to the geometry associated with the standard SVD. I understand how U, V and the singular values of the SVD relate to the geometry of the input matrix but I don’t have an intuitive feel for how U, V, X and the generalized singular values relate to the geometry of the two input matrices of the GSVD.
Any help would be appreciated.
Bob Dyas
2.1 Understanding the Ellipse Picture for the GSVD
Figure 2, portrayed in four dimensional space, generically serves to illustrate the GSVD in any dimensions.
Given , we consider the unit sphere (shown in exploded form in Figure 2 as a red circle) in the span of (shown as a red plane). In blue and green we have the ellipses that show the “downward” and “leftward” projections of these ellipses onto the multiaxes and defined as those vectors whose first or last coordinates may not vanish. (For example if in , then the multiaxis consists of vectors of the form and the multiaxis consists of vectors of the form .
The are semi-axes of these ellipses, with lengths . The vector is on the (red) unit sphere in the span of .
Since we have the equality , we see that is the change of coordinates from the columns of to the orthonormal columns of , and goes the other way.
2.2 An in depth look at small dimensional special cases
2.2.1 A red line in , =the -axis, =the -axis
Below we show the possibilities for for a line in (drawn in red as the span of where and are ) which may be horizontal , general position , or vertical In any event the and are the cosine and sine of the angle with the horizontal.
2.2.2 A red line in , =the -plane, =the -axis
( ) Below we show the possibilities for for a line in (drawn in red as the span of , where , ). The multiaxis is traditionally labeled the -plane, and the is the -axis. A line can be in the -plane, in general position, or along the -axis. The corresponding matrix is illustrated. The is the angle between the red line and the -plane, while the is the angle of the red line and the -axis.
2.2.3 A red line in , =-axis, =the -plane
( ) Below we show the possibilities for for a line in (drawn in red as the span of , where , ). A line can be along the -axis, in general position, or in the -plane. The corresponding matrix is illustrated. The is the angle between the red line and the axis, while the is the angle of the red line and the -plane. The shaded =-plane indicates the red line is in that plane.
2.2.4 A red plane in , =the -plane, =the -axis
( ) Below we show the possibilities for for a plane in (drawn in red as the span of , where , ). A plane can be the -plane. A plane in general position in intersects the -plane in a line (shown as a dashed red line) but does not include the axis. A final possibility for a plane is that it includes the axis (broken red/green line.)
The corresponding matrix is illustrated. We have corresponding to the 0 degree angle from a line in the red plane and the axis. We have which is the cosine of the angle formed from a line at right angles from the aforementioned line and the -plane. Note that is not found in the matrix, since there is room for only one row which contains .
Figure 4 below is the ellipse picture in 3 dimensions (3d), which admittedly has too few dimensions to understand the general picture. Nevertheless, one can clearly see the unit circle in the sphere being projected down to an ellipse on the axis. We see the and as the lengths of the semi-axis of the ellipse. The direction is where the plane representing intersects the -plane. The direction is orthogonal to and also in the plane. The direction is the maximum slope off the -plane, and is the length of the projection of the unit circle onto the -axis. The orthogonal direction projects to [math] giving the .
2.3 On infinite generalized singular values and horizontal directions
As may become clear upon inspection of the small dimensional cases, it is very possible that we have some and so that the generalized singular value is infinite. These infinite singular values are associated with horizontal directions in the “red” hyperplane, i.e. . They arise when our hyperplane intersects our multiaxis in any non-zero direction.
The situation in Section 2.2.4 illustrates that this is typical when we consider a plane in and is the -plane. ( is and is .) The unit circle in the plane has a vector of length 1, , that lives on the horizontal -plane. The orthogonal direction, has a projection on the -plane that is generically shorter than a unit vector, but still orthogonal to .
3 Matrix Trigonometry
We claim that the GSVD is the natural generalization of high school trigonometry to what we might call “matrix trigonometry.”
There is so much in Figure 5 that we are all familiar with in the planar case: There is all of trigonometry, and in particular there is which has a special role because is the slope of the line. If is small relative to we have a shallow slope, and vice versa. The only hint that there is some directionality is the possibility of a sign. To specify directions we sometimes would write a hypotenuse vector in component form: . If we take the components of a unit vector in the direction of the hypotenuse, then the components form a cosine-sine pair: .
The ideas of trigonometry, slope, component form and cosine-sine pairs extend to higher dimensions through the GSVD. Instead of one triangle, there are triangles. Instead of one vector i, there are vectors in the columns of . Instead of one vector j, there are vectors in the columns of . Instead of a unit length hypotenuse there are unit length hypotenuses, which can be written in the component form
[TABLE]
The hypotenuses, as we show in Figure 3, live on a unit sphere that projects nicely “down”ward and “left”ward. The are semi-axes of the downward ellipse; and the on the leftward ellipse.
Just as tells you how small or big is relative to , the GSVD tells you how small or big is relative to , but now it is in natural directions. Thus can be larger than in some directions, and smaller in others.
There is some temptation to try to say that the GSVD is related to the principal angles of the column space of and the column space of . This of course makes no more sense than looking for anything other than right angles between the -axis and the -axis in 2d. The interesting angles are between the span of the column space of and the canonical axes . More details can be found in Section 5.
One quick algebraic way to define the singular values of an matrix is to find the diagonal matrix with non-negative entries in the set where is by orthogonal and is by orthogonal. This is the equivalence class representative definition. Similarly, one can define the generalized singular values of a pair of matrices with the same number of columns. The “cosine-sine” format, is the pair of (1-)diagonal matrices with non-negative entries in the set of matrix pairs . Often the GSVD is given in “cotangent” format, which is the ratio of cosines to sines.
We summarize the GSVD properties with Table 2.
4 The relationship between the GSVD and the CS Decomposition
It is often written [19, Section 8.7.5] that the GSVD and the CS Decomposition are closely related. The geometric viewpoint highlights the GSVD and the CS decomposition as rooted in representations of points in the Grassmann manifold (linear hyperplanes through the origin) in an dimensional space using as natural coordinates.
The simple notion is that the information may be thought of as
[TABLE]
This connection is rooted ultimately in the Cartan decomposition of the Grassmann manifold, one of the finitely many classes of symmetric spaces [22]. The idea is that certain matrix spaces have a “KAK” or compact/abelian/compact decomposition. The SVD is one example as it is orthogonal/diagonal/orthogonal. The CS decomposition is another. This observation may be found in a numerical linear algebra conference presentation [15] and in the quantum computing literature [37].
To be sure if is already orthogonal then so is . This constitutes the “left half” of the complete CS decomposition. Thus a GSVD is a “left half” of a CS, when are orthogonal, and the “left half” of a CS is a GSVD. One can also have a basis for the orthogonal complement of span() to get the “right half.” This captures the isomorphism between the Grassmann manifold (i.e., -dimensional subspace in ) and (i.e., -dimensional subspace in ). Thus if one takes the combined SVD’s of orthogonal matrices whose spans are orthogonal complements, one has the CS decomposition and vice versa.
Any which way, the mathematical idea underlying all is that there is a fairly canonical representation for generic elements of the Grassmann manifold and a matrix connecting back to an orthogonal or arbitrary basis which has a further symmetry property when taking both the span of and its orthogonal complement in conjunction in that transposing a full orthogonal matrix reverses the roles canonical coordinates and basis converter.
Parameter Count
There has been a longstanding tradition in numerical linear algebra to overwrite matrix inputs with the parameters from the factored form. Thus if is , the factorization has the parameters from and the parameters from . Similarly if , the while appearing naively as an matrix, actually only has parameters, which is exactly what is computed in software [4].
Given an matrix of rank , and a decomposition of as , we can count parameters on both the left and right sides of While tricky, the only facts needed are:
Rank Codimension: The codimension of the rank matrices of size is [12, Lemma 3.3]. 2. 2.
Stiefel Manifold Dimension: The dimension of the Stiefel manifold of ordered orthonormal directions in is [17, Section 2.2]. 3. 3.
Grassmann Manifold Dimension: The dimension of the Grassmann manifold of -dimensional subspaces in is [17, Section 2.5].
[TABLE]
To understand the parameter count, we begin with the simple observation that generically and , from which we can derive the number of that are strictly between [math] and as . The relevant Stiefel manifolds are and . These correspond exactly to choosing the directions for the axes of the ellipses. Also one must consider for as this is the dimension divide between the [math] degree angles and the angles when this has content. This data is summarized below:
[TABLE]
We remark that further fine grain detailed parameter counts are possible including lower rank and , but we content ourselves with the table above.
5 Principal angles between subspaces
Section 3 points out that the GSVD of and does not contain angle information between the column spaces of and . Rather, Figure 3 illustrates that the relevant angles are between the “red space” () and the “blue space” ().
This suggests that the GSVD can be used to compute principal angles (see Section 6.4.3. of [19]) between the column spaces of and when More precisely, it can be accomplished by letting be any orthogonal matrix where It follows that are the cotangents of the desired principal angles.
This maybe seen geometrically as the GSVD computes the cotangents of angles between
[TABLE]
but we can multiply by the orthogonal matrix , which preserves angles, obtaining the angles between and
We can conclude that we have a rotated Figure 3 (shown in Figure 10) where the and multiaxes are replaced with and
6 The Lemniscate Plots from Leuven, Belgium
In a series of early papers most of which date back to the 1980s [6, 7, 8, 10, 11, 39], energy portraits that relate to the SVD and GSVD of a matrix or a pair of matrices are discussed with applications.
The definition of an energy portrait of a single matrix is
[TABLE]
and for a pair of matrices with the same number of columns
[TABLE]
It is important to point out that the curves in Figure 6 are not ellipses but rather lemniscate-like portraits. They do not even live in the same spaces as the ellipse pictures. The standard SVD ellipse lives in and the GSVD picture in this paper lives in . By contrast, the energy portraits from Leuven live in .
We provide the Julia codes that produce these curves as a reference. Readers are encouraged to try other matrices.
A = [.577699 -.224144;1.190069 .836516] # Figure 6 (Left) e(theta) = [cos(theta), sin(theta)] r1(theta) = sum(abs2, A*e(theta)) r2(theta) = sum(abs2, A’e(theta)) theta = pi * (0:.01:2) plot( theta, r1.(theta), proj=:polar, label="SVD Energy(A)") plot!(theta, r2.(theta), proj=:polar, label="SVD Energy(A’)")
A = [.27 .66 ; -1.4 1.3] # Figure 6 (Right) B = [1 0; -.5 1.1] e(theta) = [cos(theta), sin(theta)] r1(theta) = sum(abs2, Ae(theta)) r2(theta) = sum(abs2, Be(theta)) theta = pi * (0:.01:2) plot(theta,r1.(theta)./r2.(theta), proj=:polar,label="GSVD Energy(A,B)")
For completeness, we thought we would take a closer look at these older plots. To explain in what sense the curves are lemniscates, it is best to eliminate the “e” in the definition and rewrite the energy plots as the zero set of an algebraic equation, thereby connecting the portraits to the field of algebraic geometry.
Theorem 6.1**.**
If , then satisfies the algebraic polynomial equation
[TABLE]
where . Further if , then satisfies the algebraic polynomial equation
[TABLE]
*where . *
Before proving the theorem we provide a historical analog. We might compare the solution set of with that of which is the lemniscate of Booth whose study traces back to the 5th century Greek philosopher Proclus. The difference being that Booth specialized to and only took first powers of the quantities, but in spirit it is a similar algebraic polynomial equation.
Proof 6.2**.**
Taking , we see that where It is straightforward to check since which is exactly the result for a single matrix.
For the two matrix case, where and , if , then
[TABLE]
7 On the and the )
In this section we relate the finite part (nonzero, noninfinite) of the generalized singular values of (denoted as ) to the singular values of (denoted as ) where is the pseudoinverse of . We may use the notation for . An issue arises that may surprise some readers.
7.1 Why there is an issue?
One may expect that there may always be a relation between the GSVD of and the SVD of . For example, in the matlab documentation222https://www.mathworks.com/help/matlab/ref/GSVD.html it is stated that the generalized singular values are the ratios of the diagonal elements of and in a given example. One might infer from the documentation that this is always the case.
However it is not generally true when there are infinite singular values, i.e., when .
Consider a simple example where is a non-singular matrix, and is a nonzero matrix. In this case . The GSVD of is readily verified to have infinite singular values, and the one finite value The SVD of is just the length of or
When , both of these expressions are equal to the absolute ratio , ( after all) but for larger the two matrix expressions are not equal.
An extremely simple special case takes and The two values are and exactly.
The issue arises exactly when there are infinite . If there are no infinite , has no [math] columns, and we can write
[TABLE]
which is a singular value decomposition of . (We use the property that has full row rank to conclude and that is an matrix with on the main diagonal.)
The problem that arises when some is that does not equal when has any zero columns.
7.2 The significance of horizontal directions and their orthogonal complement in
In Section 2.3, we considered the intersection of span() with the multiaxis. An orthogonal basis for this intersection is which correspond exactly to the .
Working entirely in as an dimensional space, we are interested in the projection matrix that kills the directions of intersection. Precisely we define on the orthogonal basis for :
[TABLE]
Suppose is a matrix whose columns are a basis for the null space of . If we consider then the span of the columns of is the intersection we are discussing, i.e., the intersection of with span(). To be sure either the column of is in the common null space of and , so that the corresponding column of is [math], or else if one follows through the first columns of in , one sees that we will hit the “” columns in only, hence we will emerge a linear combination of .
We can thus describe as the orthogonal projection onto the left nullspace of which is the orthogonal complement of the column space of .
7.3 The correct modified theorem requires
We remind the reader of the usual definition of the matrix pseudoinverse in terms of the singular value decomposition:
[TABLE]
where means taking the inverse of the finite entries in When has full column rank and has full row rank, we have It is easy to see that
Theorem 7.1**.**
*Let be a matrix whose columns are a basis for the nullspace of , and be the orthogonal projection onto the left nullspace of . The finite non-zero generalized singular values of are the same as the non-zero singular values of . *
Proof 7.2**.**
Setting notation, we have
[TABLE]
so that , where are the rightmost non-zero columns of (indexed by ) and are the corresponding rows (the bottom ) of . (To see this note that where the “?” denotes rows that hit the 0 columns in so we do not care what they are.) We point out that has full row rank as the rows of are a subset of the full row rank matrix . We immediately conclude that
[TABLE]
We further claim that
[TABLE]
where are the exact corresponding columns of (the rightmost indexed by ), which are the . To see this, first observe that the definition of as described in Section 7.2. is where are the rightmost columns of the identity indexed by . Thus the [math] indicating the columns of killed by .
Now that we have compressed out the immaterial columns, and knowing that by the full row rank condition, we can compute
[TABLE]
This is a singular value decomposition of , with an diagonal matrix, with the in decreasing order on the diagonal and no .
Corollary 7.3**.**
*If has full column rank () or if the weaker condition holds that , then is not needed, i.e., the finite non-zero generalized singular values of are the same as the non-zero singular values of . *
Proof 7.4**.**
If , then has nothing in the nullspace, has no columns, and is obviously . More generally, if , then has nothing in its nullspace that is not also in the nullspace of , so if has any columns at all, it is the zero matrix, so again projection onto the left nullspace is .
7.4 Blame the pseudoinverse not the GSVD
The difficulty with may seem like an unfortunate consequence of infinite singular values, but in point of fact, it is related to the discontinuity in the definition of the pseudoinverse. If one takes a bigger picture viewpoint, it is easy to see that infinite singular values are natural limits of finite singular values.
The only truly natural discontinuity in the GSVD is the reduction of rank of which reduces the dimensionality of the hyperplane (and the rank of .)
We mention some limit type results which help understand the nature of the infinite generalized singular values:
Theorem 7.5**.**
*If rank()=, and , then we can define a continuous curve of matrices of the same shape as without infinite generalized singular values when is small but whose limit as continuously converges to the generalized singular values of , finite or infinite. *
Proof 7.6**.**
Take
[TABLE]
where
[TABLE]
Corollary 7.7**.**
*If rank()=, and , then we can define a continuous curve of matrices without infinite generalized singular values when is small but whose limit as continuously converges to the generalized singular values of by row augmenting to contain rows. *
Proof 7.8**.**
*Simply add rows of zeros to the bottom of . This does not change the generalized singular values of or , or . is augmented with rows of zeros and is augmented with rows and columns with an identity matrix. Apply the construction in Theorem 7.5 to complete the proof. *
Example 7.9**.**
Consider that
[TABLE]
One might seek nearby matrices with no infinite generalized singular values. This is impossible if we insist that remain but is possible if we augment with one row, which in this case we can simply take
[TABLE]
Corollary 7.10**.**
*Suppose has rank for is a continuous curve, where has rank for but may drop rank at . We then have that the generalized singular values are a continuous function of as . *
Proof 7.11**.**
*The only true discontinuity in the GSVD is the potential for a drop in rank of . This is avoided in the statement by keeping rank . Thus the limit of the column space is the column space of the limit. *
We do remark on the other hand that if drops rank, then we can only say that the limit of the column space contains the column space of the limit, which can lead to all kind of discontinuities in the generalized singular values.
8 GSVD Applications and their Geometric Interpretations
8.1 Geometry of Tikhonov Regularization
8.1.1 The two cosine damping
We show how geometry can add insight to our understanding of Tikhonov Regularization:
[TABLE]
by providing a two cosines view of damping. Specifically, the way Tikhonov regularization reduces the solution or “weights,” is usually understood algebraically in terms of adding a regularizer term that moves the original problem away from some kind of ill-conditioned setting. We will show that, in Figure 7, one cosine comes from the projection from the horizontal (blue) plane to the span of red plane. The other cosine comes from the non-canonical basis of the plane: the columns of which elongate with , hence the coordinates shrink.
While the “calming influence” **[19, Section 6.1.26]**, **[5, Section 4.4]**, **[21]** of the regularization parameter has been well studied algebraically, we identify geometrically in (5) the influence as a factor of where so that , where is the angle that corresponds to We will compare the formulation with previous formulations explaining why we find that this formulation feels somewhat more insightful.
Before we start, let us recap Tikhonov regularization. Suppose we have a matrix , which we will assume has full column rank. The problem (standard least squares) is the computation of the standard solution to the normal equations To regularize we pick a suitable matrix , and a “regularization parameter” , and then solve instead which is equivalent to computing
[TABLE]
From the geometrical point of view, we believe the reformulation in Theorem 8.1 below is more revealing of the “calming effect.” Figure 7 demonstrates the hyperplane onto which gets projected for varying
For every , we obtain the GSVD as a continuous function of :
[TABLE]
where it is easy to check that is square non-singular. It is convenient to use the compact format described in Section A.3 here. Thus we take to be , and to be square diagonal . The exact values in and come from the trigonometry with unit hypotenuse, fixed base, and sliding height of a triangle at , as shown in the left side of Figure 8. Namely
[TABLE]
where the operations happen on the diagonal. It also follows that
[TABLE]
The equation has a nice trigonometric interpretation. As the column vectors of grow in length (these lengths are encoded in ). the cosines in relate back to the columns which are shorter in length. This is depicted in Figure 8.
Theorem 8.1**.**
The solution to the Tikhonov Regularization problem can be written as
[TABLE]
*where is the least squares solution to and , where *
Proof 8.2**.**
Since
[TABLE]
we can calculate
[TABLE]
*and use the relation to complete the proof. *
Comparison and Discussion
The standard application of the GSVD to Tikhonov relates to and thus gives formulas involving the non-physical, non-homogeneous factor of rather than the homogeneous .
The formulation in Theorem 8.1 diagonalizes the operator that relates to We understand that when are the coordinates of a linear combination of the columns of , we have that are the coordinates of that same vector in the natural basis. Thus the interpretation of simply is:
Write the vector in the natural coordinate system; 2. 2.
Multiply by a cosine squared in every natural direction; 3. 3.
Return to the original coordinate sytem.
8.2 Humans vs Yeast: Comparative Data Modeling
In a series of beautiful applications of the GSVD, Alter, et.al. **[3, 32, 33, 35, 2]** propose an approach towards data reconstruction and classification. In their case **[3]**, the and are two DNA microarrays, one from humans and the other from yeast. The rows of and live in or gene space. The rows of form a basis for this row (or gene) space, and are denoted genelets. A natural question is whether the genelet is primarily human, primarily yeast, or a mixture. In general, given two matrices with equal columns, one wants to classify the basis vectors in the rows of according to its source.
The GSVD provides a natural solution by creating a single coherent model from the two datasets recording different aspects of interrelated phenomena by simultaneously identifying the similar and dissimilar between the two corresponding column-matched but row-independent matrices. For each of the rows, we have that denotes the angle towards . In Figure 9, we portray this. We note that **[3]** displays the angles from to , but we will stick with the [math] to convention. It is convenient that the rows of are already sorted from “mostly ,” to “mostly .”
Our ellipse picture Figure 3 reveals the geometry readily. The all appear on the unit ball.
The comparative Data Reconstruction equation is
[TABLE]
where is the -th row of . (This is exactly Equation (2).) One can preprocess so that each row is of unit direction as it is only the ratio of to that matters. Any ill-conditioning of could be worrisome.
8.3 Signal vs. Noise: A one matrix and one subspace view of the GSVD
The focus on two matrices with the same number of columns is not always the best view of the GSVD. One can take rather a single matrix and any dimensional reference subspace of . We can then think of the GSVD as an additive decomposition:
[TABLE]
where and , and the columns of are orthonormal bases for and respectively. Conversely, is an ordinary GSVD.
By doing this we have a decomposition of such that . Geometrically, instead of decomposing into a “top half” and “bottom half,” into a “horizontal” and “vertical” multiaxis subspace, we are rather allowing for general multiaxes subspaces. One might think of this as a rotated view of Figure 3. More specifically, most of this paper would take and , but all that is required is that and are orthogonal complements.
This geometrical insight underlies an additive decomposition signal processing application found in **[25, 26]** where and play the role of signal + noise.
8.4 Orthonormal Bases for and Friends
The matrix of the GSVD provides, in its columns, orthonormal bases for three mutually orthogonal subspaces that arise in many applications:
[TABLE]
The “completion” referred to in the above equation means that taken together, the columns of and form and orthonormal basis for col(). From the perspective of Figure 3, there are the horizontal directions in the red unit sphere, the generic directions, and the directions that are not present.
8.4.1 Clustering Matrices
An important example where the GSVD lurks implicitly or explicitly is clustering. We will consider an matrix that indicates the clustering, and a matrix that indicates equality of data between the clusters.
We consider data in and assume a partitioning of , into clusters. The indicator matrix corresponding to the partition of is :
[TABLE]
which we can normalize by setting
[TABLE]
In the Julia computing language, the indicator matrix can be generated succinctly with A = cat(ones.(Int,partition)...,dims=1:2), where partition denotes the vector .
The other useful matrix in this context is the constraint matrix whose nullspace is the all ones vector:
[TABLE]
*In Julia, with the LinearAlgebra package, this may be written succinctly as *
B = [I -ones(k-1)].
Given an data matrix there are a number of “scatter matrices” that arise that allow us to compare between clusters and within clusters. Following roughly the notation in **[24]**, we can partition the data
[TABLE]
*Let be the *th column of and let denote the column indices in column , is the mean of the columns in cluster , and is the mean of all the columns. The within, between, and mixed scatter matrices are defined as
[TABLE]
These scatter matrices are readily calculated through the matrix for the GSVD, one can then set U, = SVD(A,B), where the comma indicates that we are requesting only the matrix. We then have that,
[TABLE]
[TABLE]
“Completion” means that and form an orthonormal basis for . The third block is an orthonormal basis for . The “between” and “within” terms are statistics jargon. Given a data vector, the first column extracts the normalized mean. The next block gives a basis for clustered vectors that are mean-free which by removing the fine details within cluster provides a way to compare between clusters. The last block provides the within cluster details. The number of columns is the dimension of the space, and in statistics jargon is known as the “degrees of freedom.” (See **[29, Chap. 10]**.)
The scatter matrices can be calculated in terms of using these formulas
[TABLE]
One recognizes that the matrices in parentheses in the three expressions above are projection matrices and the orthogonality of guarantees that .
8.4.2 One Way ANOVA made simple
A commonly used statistics test is to decide whether a proposed clustering of a vector is justified. The test takes the average (meaning divide by ) square component in the direction and divides it by the average (meaning divide by ) square component in the direction. The following Julia code shows how compactly one can reproduce an example from Wikipedia where one can quickly obtain the number computed in Step 5 of https://en.wikipedia.org/wiki/One-way_analysis_of_variance#Example.
using LinearAlgebra v = [6,8,4,5,3,4,8,12,9,11,6,8,13,9,11,8,7,12] # data vector A = cat(ones.([6,6,6])...,dims=1:2) # Indicator(6,6,6) B = [1 0 -1; 0 1 -1] # Constraint matrix U,= SVD(A,B) # GSVD (norm(U[:,2:3]’v)/norm(U[:,4:18]’v))^2 * 15/2 # The F value
9.264705882352956
While for this problem the classic approach is fine as an algorithm, for general tests for being in the column space of but orthogonal to , the GSVD is worth considering algorithmically and how we are projecting into the non-horizontal directions is worth understanding geometrically.
8.4.3 See a slope? Generalize to a GSVD
In the last line of the above code snippet, the innocent looking
norm(U[:,2:3]’v)/norm(U[:,4:18]’v)
for an orthogonal matrix carries a message of generalization if you know how to read it. It is a ratio of components in two orthogonal directions. You can call it a slope, or a cotangent, or a tangent. What we called horizontal and vertical multiaxes in Figure 3 may now be labeled in this coordinate system: the between and within axes, following the aforementioned statistics nomenclature.
The generalization of the vector example of Section 3 is a matrix of data, each data item being one row of length . It is therefore natural geometrically to consider and interpret the GSVD as
[TABLE]
The result is canonical directions for considering between vs within as naturally as comparing human vs yeast, or signal vs noise as we have seen in previous applications. The multislope, i.e. the generalized singular values (or perhaps we can call this the ANOVA structure) is [math] in all but at most directions, owing to the number of columns in .
8.4.4 Discriminant Analysis Dimension Reduction
Continuing with the idea in Section 8.4.3. we observe that it is natural to reduce out all but the nonzero ANOVA directions by multiplying on the right by or (for that matter any matrix whose columns span the same subspace of .).
The reduction to columns
[TABLE]
can be rotated back to the standard coordinate system without any change to the nonzero generalized singular values (the ANOVA structure) to yield
[TABLE]
since . We can reduce the mean also by adding back producing our final reduction,
Our simple summary is that for a data matrix , ANOVA measures the nonzero generalized singular values in , a rotated multiaxis system which gives the ratios of the “between" to the “within", and these are the same as for the reduced data matrix because we are suppressing the directions with [math] generalized singular values.
This is a geometrical derivation of an idea and algorithm presented by Park and others **[24]** with a minimization approach. In their algorithm can be derived efficiently as the first columns of the from the GSVD, and the authors point out that the GSVD idea is robust even in the case of too little data.
8.5 The Jacobi Ensemble from Random Matrix Theory is a GSVD
Classical random matrix theory centers are Hermite, Laguerre, and Jacobi ensembles. Historically, they are presented in eigenvalue format, but we have argued that the eigenvalue, SVD, GSVD formats, respectively, are mathematically more natural providing simpler derivations and clearer insights. Suppose we have two Gaussian random matrices () and () with and . For example, A=randn(m1,n) and B=randn(m2,n) using Julia notation. The so-called MANOVA matrix (Multivariate Analysis of Variance) is defined to be
[TABLE]
or in the symmetric form The eigenvalues are the squares of the cosines () and are jointly distributed as **[29]**
[TABLE]
where and ,
[TABLE]
where for real matrices, for complex matrices, for quaternion matrices, and general is worth considering, as in **[16]** . The eigenvalue distribution is known as the Jacobi ensemble, which was first referred by name in **[28]**. We refer interested readers to **[18]**, where the geometrical picture (a simplified version of the ellipse in Figure 3) motivates a direct derivation of the joint density of the Jacobi ensemble. Note that, the direct derivation in **[18]** fills in a gap stated in Remark 2.3 of **[20]**, where an indirect proof using the Fourier Transform is presented, but a direct proof without the Fourier Transform is desired. An earlier alternative direct proof is due to **[40]**.
9 Mathematical Software
Suppose one looks up the GSVD in the help pages of your favorite technical computing language, shown in Table 3 and the Julia version in Table 10. One gets lost in a sea of matrices whose meaning is very hard to fully appreciate. Surprisingly, we find no standard function for the GSVD in Python (NumPy and SciPy) though there is some discussion on StackOverflow **[1]** and Github Numpy issue #3475333https://github.com/numpy/numpy/issues/3475** and scipy issue #743444https://github.com/scipy/scipy/issues/743** and #1491555https://github.com/scipy/scipy/issues/1491**.
10 Acknowledgments
We thank Orly Alter, Zhaojun Bai, Michael Kirby, Andreas Noack, Chris Paige, Haesun Park, Sri Priya Ponnapalli, Charlie van Loan, Sabine van Huffel, and Joos Vandewalle for interesting conversations about the GSVD theory, software, and feedback from lectures at the 2017 Householder Symposium, and 2018 SIAM Applied Linear Algebra meeting. We thank Sungwoo Jeong for finding two references in the literature that come close to the Grassmann viewpoint for the GSVD **[23, 20]**.
We also wish to acknowledge and remember the late Gene Golub, over a decade since his passing, who so effectively promoted the singular value decomposition. We remember a time, not so long ago, when the SVD was unheard of outside of numerical linear algebra circles, and eigenvalues were all that were known. Then like dominos falling, one field after another, biology, economics, fields of engineering, statistics, computer science, and yes pure mathematics learned about the value of the SVD as a tool, as an algorithm, and even as vocabulary for effective communication. Gene with his PROF SVD (Figure 11) and DR SVD California vanity license plates seemed always nearby when a field was starting to catch on. Today the GSVD is as obscure as the SVD was in the early days. We feel that the GSVD’s time has come. We would be very pleased if one by one other fields would catch on.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] GSVD for python generalized singular value decomposition , https://stackoverflow.com/questions/37814024/GSVD-for-python-generalized-singular-value-decomposition .
- 2[2] K. A. Aiello, S. P. Ponnapalli, and O. Alter , Mathematically universal and biologically consistent astrocytoma genotype encodes for transformation and predicts survival phenotype , APL Bioengineering, 2 (2018), p. 031909.
- 3[3] O. Alter, P. O. Brown, and D. Botstein , Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms , Proceedings of the National Academy of Sciences, 100 (2003), pp. 3351–3356.
- 4[4] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. Mc Kenney, and D. Sorensen , LAPACK Users’ Guide , vol. 9, Siam, 1999, http://www.netlib.org/lapack/lug/node 36.html .
- 5[5] Z. Bai , The csd, gsvd, their applications and computations , Preprint Series 958. Institute for Mathematics and its Applications, University of Minnesota, (1992).
- 6[6] D. Callaerts , Signal separation methods based on singular value decomposition and their application to the real-time extraction of the fetal electrocardiogram from cutaneous recordings , Ph D thesis, Katholieke Universiteit Leuven, 1989.
- 7[7] D. Callaerts, B. De Moor, J. Vandewalle, W. Sansen, G. Vantrappen, and J. Janssens , Comparison of svd methods to extract the foetal electrocardiogram from cutaneous electrode signals , Medical and Biological Engineering and Computing, 28 (1990), p. 217.
- 8[8] D. Chu, L. De Lathauwer, and B. De Moor , A qr-type reduction for computing the svd of a general matrix product/quotient , Numer. Math., 95 (2003), pp. 101–121.
