First-order Perturbation Theory for Eigenvalues and Eigenvectors
Anne Greenbaum, Ren-cang Li, Michael L. Overton

TL;DR
This paper develops and compares two methods for first-order perturbation analysis of eigenvalues and eigenvectors of general square matrices, extending classical results and providing practical verification techniques.
Contribution
It introduces a novel block-diagonalization proof for eigenvector perturbation and discusses extensions, normalization, and computational verification of the theory.
Findings
Two distinct proofs of eigenvector perturbation theorem presented
Extension of perturbation theory to various normalizations
Guidance on computational verification of results
Anne Greenbaum Department of Applied Mathematics, University of Washington.
Ren-cang Li Department of Mathematics, University of Texas at Arlington. Supported in part by National Science Foundation Grants CCF-1527104 and DMS-1719620.
Michael L. Overton Courant Institute of Mathematical Sciences, New York University. Supported in part by National Science Foundation Grant DMS-1620083.
Dedicated to Peter Lancaster and G.W. (Pete) Stewart,
Masters of Analytic Perturbation Theory and Numerical Linear Algebra,
on the Occasion of their 90th and 79th Birthdays
Abstract
We present first-order perturbation analysis of a simple eigenvalue and the corresponding right and left eigenvectors of a general square matrix, not assumed to be Hermitian or normal. The eigenvalue result is well known to a broad scientific community. The treatment of eigenvectors is more complicated, with a perturbation theory that is not so well known outside a community of specialists. We give two different proofs of the main eigenvector perturbation theorem. The first, a block-diagonalization technique inspired by the numerical linear algebra research community and based on the implicit function theorem, has apparently not appeared in the literature in this form. The second, based on complex function theory and on eigenprojectors, as is standard in analytic perturbation theory, is a simplified version of well-known results in the literature. The second derivation uses a convenient normalization of the right and left eigenvectors defined in terms of the associated eigenprojector, but although this dates back to the 1950s, it is rarely discussed in the literature. We then show how the eigenvector perturbation theory is easily extended to handle other normalizations that are often used in practice. We also explain how to verify the perturbation results computationally. We conclude with some remarks about difficulties introduced by multiple eigenvalues and give references to work on perturbation of invariant subspaces corresponding to multiple or clustered eigenvalues. Throughout the paper we give extensive bibliographic commentary and references for further reading.
1 Introduction
Eigenvalue perturbation theory is an old topic dating originally to the work of Rayleigh in the 19th century. Broadly speaking, there are two main streams of research. The most classical is analytic perturbation theory (APT), where one considers the behavior of eigenvalues of a matrix or linear operator that is an analytic function of one or more parameters. Authors of well-known books describing this body of work include Kato [Kat66, Kat76, Kat82, Kat95], Rellich [Rel69], Chatelin [Cha11], Baumgärtel [Bau85] and, in textbook form, Lancaster and Tismenetsky [LT85]. [Footnote 1: The first edition of Kato’s masterpiece Perturbation Theory for Linear Operators was published in 1966 and a revised second edition appeared in 1976. The most recent edition is the 1995 reprinting of the second edition with minor corrections. Most of this book is concerned with linear operators, but the first two chapters treat the finite-dimensional case of matrices, and these appeared as a stand-alone short version in 1982. Since we are only concerned with matrices in this article, our references to Kato’s book are to the 1982 edition, although in any case the equation numbering is consistent across all editions.] Kato [Kat82, p. XII] and Baumgärtel [Bau85, p. 21] explain that it was Rellich who first established in the 1930s that when a Hermitian matrix or self-adjoint linear operator with an isolated eigenvalue $\lambda$ of multiplicity $m$ is subjected to a real analytic perturbation, that is, a convergent power series in a real parameter $\varepsilon$, then (1) it has exactly $m$ eigenvalues converging to $\lambda$ as $\varepsilon \to 0$, (2) these eigenvalues can also be expanded in convergent power series in $\varepsilon$ and (3) the corresponding eigenvectors can be chosen to be mutually orthogonal and may also be written as convergent power series. As Kato notes, these results are exactly what were anticipated by Rayleigh, Schrödinger and others, but to prove them is by no means trivial, even in the finite-dimensional case.
The second stream of research is largely due to the numerical linear algebra (NLA) community. It is mostly restricted to matrices and generally concerns perturbation bounds rather than expansions, describing how to bound the change in the eigenvalues and associated eigenvectors or invariant subspaces when a given matrix is subjected to a perturbation with a given norm and structure. Here there is a wide variety of well-known results due to many of the founders of matrix analysis and numerical linear algebra: Gerschgorin, Hoffman and Wielandt, Mirsky, Lidskii, Ostrowski, Bauer and Fike, Henrici, Davis and Kahan, Varah, Ruhe, Stewart, Elsner, Demmel and others. These are discussed in many books, of which the most comprehensive include those by Wilkinson [Wil65], Stewart and Sun [SS90], Bhatia [Bha97, Bha07] and Stewart [Ste01], as well as Chatelin [Cha12], which actually covers both the APT and the NLA streams of research in some detail. See also the survey by Li [Li14]. An important branch of the NLA stream concerns the pseudospectra of a matrix; see the book by Trefethen and Embree [TE05] and the Pseudospectra Gateway web site [ET].
This paper is inspired by both the APT and the NLA streams of research, and its scope is limited to an important special case: first-order perturbation analysis of a simple eigenvalue and the corresponding right and left eigenvectors of a general square matrix, not assumed to be Hermitian or normal. The eigenvalue result is well known to a broad scientific community. The treatment of eigenvectors is more complicated, with a perturbation theory that is not so well known outside a community of specialists. We give two different proofs of the main eigenvector perturbation theorem. The first, inspired by the NLA research stream and based on the implicit function theorem, has apparently not appeared in the literature in this form. The second, based on complex function theory and on eigenprojectors, as is standard in APT, is largely a simplified version of results in the literature that are well known. The second derivation uses a convenient normalization of the right and left eigenvectors that depends on the perturbation parameter, but although this dates back to the 1950s, it is rarely discussed in the literature. We then show how the eigenvector perturbation theory is easily extended to handle other normalizations that are often used in practice. We also explain how to verify the perturbation results computationally. In the final section, we illustrate the difficulties introduced by multiple eigenvalues with two illuminating examples, and give references to work on perturbation of invariant subspaces corresponding to multiple or clustered eigenvalues.
2 First-order perturbation theory for a simple eigenvalue
Throughout the paper we use $\|\cdot\|$ to denote the vector or matrix 2-norm, $I_n$ to denote the identity matrix of order $n$, the superscript $T$ to denote transpose and $*$ to denote complex conjugate transpose. Greek lower case letters denote complex scalars. Latin lower case letters denote complex vectors, with the exception of $i$ for the imaginary unit and $j, k, \ell, m, n$ for integers. Upper case letters denote complex matrices or, in some cases, sets in the complex plane. We begin with an assumption that also serves to establish our notation.
Assumption 1
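Let $A_0 \in \mathbb{C}^{n\times n}$ have a simple eigenvalue $\lambda_0$ corresponding to right eigenvector $x_0$ (so $A_0 x_0 = \lambda_0 x_0$ with $x_0 \neq 0$) and left eigenvector $y_0$ (so $y_0^* A_0 = \lambda_0 y_0^*$ with $y_0 \neq 0$), normalized so that $y_0^* x_0 = 1$. Let $\tau_0 \in \mathbb{C}$ and let $A(\tau)$ be a complex-valued matrix function of a complex parameter $\tau$ that is analytic in a neighborhood of $\tau_0$, satisfying $A(\tau_0) = A_0$.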
Remark 1
The normalization $y_0^* x_0 = 1$ is always possible since the right and left eigenvectors corresponding to a simple eigenvalue cannot be orthogonal. Note that since $x_0$ and $y_0$ are unique only up to scalings, we may multiply $x_0$ by any nonzero complex scalar $\omega$ provided we also scale $y_0$ by the reciprocal of the conjugate of $\omega$ so that $y_0^* x_0$ remains equal to one. The use of the complex conjugate transpose in $y_0^*$ instead of an ordinary transpose is purely a convention that is often, but not universally, followed. The statement that the matrix $A(\tau)$ is analytic means that each entry of $A(\tau)$ is analytic (equivalently, complex differentiable or holomorphic) in $\tau$ in a neighborhood of $\tau_0$.
The most basic result in eigenvalue perturbation theory follows.
Theorem 1
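(Eigenvalue Perturbation Theorem) Under Assumption 1, $A(\tau)$ has a unique eigenvalue $\lambda(\tau)$ that is analytic in a neighborhood of $\tau_0$, with $\lambda(\tau_0) = \lambda_0$ and with
$$\lambda'(\tau_0) = y_0^* A'(\tau_0)\, x_0, \tag{1}$$
where $\lambda'(\tau_0)$ and $A'(\tau_0)$ are respectively the derivatives of $\lambda(\tau)$ and $A(\tau)$ at $\tau = \tau_0$.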
The proof appears in the next section.
The quantity
$$\chi = \|x_0\|\,\|y_0\|,$$
introduced by [Wil65], is called the eigenvalue condition number for $\lambda_0$. We have $|\lambda'(\tau_0)| \le \chi\,\|A'(\tau_0)\|$. In the real case, $\chi$ is the reciprocal of the cosine of the angle between $x_0$ and $y_0$. In the special case that $A_0$ is Hermitian, its right and left eigenvectors coincide so $\chi = 1$, but in this article we are concerned with general square matrices.
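As a small numerical illustration of the eigenvalue derivative formula (1) and of the bound involving the eigenvalue condition number, the following Python sketch uses a hypothetical 2-by-2 family $A(\tau) = A_0 + \tau E$ with $\tau_0 = 0$. The matrices, the step size and the helper names `eig_near` and `norm2_mat` are our own illustrative choices, not taken from the paper, and the eigenvalues are obtained from the quadratic formula rather than from a library eigensolver.

```python
import cmath
import math

# Hypothetical 2x2 example (not from the paper): A(tau) = A0 + tau*E, tau0 = 0.
A0 = [[1.0, 2.0], [0.0, 3.0]]
E = [[0.0, 0.0], [1.0, 0.0]]    # E plays the role of A'(tau0)

lam0 = 1.0                       # simple eigenvalue of A0
x0 = [1.0, 0.0]                  # right eigenvector: A0 x0 = lam0 x0
y0 = [1.0, -1.0]                 # left eigenvector: y0^* A0 = lam0 y0^*, y0^* x0 = 1


def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def inner(y, x):
    """y^* x (the first argument is conjugated)."""
    return sum(complex(yi).conjugate() * xi for yi, xi in zip(y, x))


def norm2_vec(v):
    return math.sqrt(sum(abs(vi) ** 2 for vi in v))


def norm2_mat(M):
    """Spectral norm of a 2x2 matrix, from the larger eigenvalue of M^* M."""
    fro2 = sum(abs(M[i][j]) ** 2 for i in range(2) for j in range(2))
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = max(fro2 ** 2 - 4.0 * abs(det) ** 2, 0.0)
    return math.sqrt((fro2 + math.sqrt(disc)) / 2.0)


def eig_near(M, target):
    """Eigenvalue of a 2x2 matrix closest to `target` (quadratic formula)."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return min(((tr + root) / 2.0, (tr - root) / 2.0),
               key=lambda z: abs(z - target))


dlam = inner(y0, matvec(E, x0))          # formula (1): y0^* A'(tau0) x0
chi = norm2_vec(x0) * norm2_vec(y0)      # eigenvalue condition number

tau = 1e-6
A_tau = [[A0[i][j] + tau * E[i][j] for j in range(2)] for i in range(2)]
fd = (eig_near(A_tau, lam0) - lam0) / tau   # finite difference quotient

print("formula (1):", dlam, " difference quotient:", fd)
print("|lambda'| =", abs(dlam), " bound chi*||E|| =", chi * norm2_mat(E))
```

For this nonnormal example the left and right eigenvectors differ, so the condition number exceeds one and the bound is not attained.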
In the APT research stream, instead of eigenvectors, the focus is mostly on the eigenprojector corresponding to $\lambda_0$, which can be defined as
$$\Pi_0 = x_0 y_0^*$$
and which satisfies
$$A_0 \Pi_0 = \lambda_0 \Pi_0 = \Pi_0 A_0 \quad\text{and}\quad \Pi_0^2 = \Pi_0.$$
[Footnote 2: In APT, the standard term is “eigenprojection”, while in NLA, “spectral projector” is often used. The somewhat nonstandard term “eigenprojector” is a compromise.]
Note that the eigenprojector does not depend on the normalization used for the eigenvectors $x_0$ and $y_0$ (assuming $y_0^* x_0 = 1$), which simplifies the associated perturbation theory, and note also that $\|\Pi_0\| = \|x_0\|\,\|y_0\|$. Let $\mathrm{tr}$ denote trace and recall the property $\mathrm{tr}(XY) = \mathrm{tr}(YX)$. Clearly, equation (1) is equivalent to
$$\lambda'(\tau_0) = \mathrm{tr}\big(\Pi_0 A'(\tau_0)\big). \tag{3}$$
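The properties of the eigenprojector mentioned above can be checked numerically. The sketch below uses a hypothetical 2-by-2 example of our own choosing (the matrices `A0` and `E`, the scalar `omega` and all helper names are illustrative assumptions). It forms the rank-one matrix $x_0 y_0^*$, checks that it is idempotent and commutes with $A_0$ as an eigenprojector should, confirms that rescaling $x_0$ and $y_0$ consistently leaves it unchanged, and compares the trace form of the eigenvalue derivative with the vector form.

```python
# Hypothetical 2x2 example (our own choice): A0 has the simple eigenvalue lam0 = 1.
A0 = [[1.0, 2.0], [0.0, 3.0]]
E = [[0.0, 0.0], [1.0, 0.0]]    # stands in for A'(tau0)
lam0 = 1.0
x0 = [1.0, 0.0]                 # A0 x0 = lam0 x0
y0 = [1.0, -1.0]                # y0^* A0 = lam0 y0^*, with y0^* x0 = 1


def outer(x, y):
    """Rank-one matrix x y^* (the second argument is conjugated)."""
    return [[x[i] * complex(y[j]).conjugate() for j in range(2)] for i in range(2)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scale(c, P):
    return [[c * P[i][j] for j in range(2)] for i in range(2)]


def dist(P, Q):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(P[i][j] - Q[i][j]) for i in range(2) for j in range(2))


Pi0 = outer(x0, y0)

# Rescale x0 by omega and y0 by 1/conj(omega): y^* x stays equal to one.
omega = 2.0 + 1.0j
x_scaled = [omega * xi for xi in x0]
y_scaled = [yi / omega.conjugate() for yi in y0]
Pi0_rescaled = outer(x_scaled, y_scaled)

# Trace form of the eigenvalue derivative versus the vector form y0^* E x0.
PE = matmul(Pi0, E)
dlam_trace = PE[0][0] + PE[1][1]
Ex0 = [E[0][0] * x0[0] + E[0][1] * x0[1], E[1][0] * x0[0] + E[1][1] * x0[1]]
dlam_vec = sum(complex(yi).conjugate() * vi for yi, vi in zip(y0, Ex0))

print("idempotency defect:", dist(matmul(Pi0, Pi0), Pi0))
print("change under rescaling:", dist(Pi0_rescaled, Pi0))
print("trace form:", dlam_trace, " vector form:", dlam_vec)
```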
Kato [Kat82, p. XIII] explains that the results of Rellich for analytic perturbations of self-adjoint linear operators were extended (by Sz-Nagy, Kato and others) to non-self-adjoint linear operators and therefore non-Hermitian matrices in the early 1950s using complex function theory, so (3), equivalently (1), was known at that time. However, it seems that these results were not well known until the publication of the first edition of Kato’s book in 1966 (although Kato did present a summary of these results for the linear case at a conference on matrix computations [Giv58, p. 104] in 1958). Eq. (1) was independently obtained for the analytic case by Lancaster [Lan64], and for the linear case by Wilkinson [Wil65, p.68–69] and Lidskii [Lid66]. They all used the theory of algebraic functions to obtain their results, exploiting the property that eigenvalues are roots of the characteristic polynomial. A different technique is used by Stewart and Sun [SS90, p. 185] who show that the eigenvalue is differentiable w.r.t. its matrix argument using a proof depending on Gerschgorin circles; the result for a differentiable family then follows from the ordinary chain rule.
We close this section with a brief discussion of multiple eigenvalues. The algebraic multiplicity of $\lambda_0$ is the multiplicity of the factor $(\lambda - \lambda_0)$ in the characteristic polynomial $\det(A_0 - \lambda I)$, while the geometric multiplicity (which is always less than or equal to the algebraic multiplicity) is the number of associated linearly independent right (equivalently, left) eigenvectors. A simple eigenvalue has both algebraic and geometric multiplicity equal to one. More generally, if the algebraic and geometric multiplicity are equal, the eigenvalue is said to be semisimple or nondefective. An eigenvalue whose geometric multiplicity is one is called nonderogatory.
3 First-order perturbation theory for an eigenvector corresponding to a simple eigenvalue
We begin this section with a basic result from linear algebra; see [Ste01, Theorem 1.18 and eq. (3.10)] for a proof.
Lemma 1
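Suppose Assumption 1 holds. There exist matrices $X_1 \in \mathbb{C}^{n\times(n-1)}$, $Y_1 \in \mathbb{C}^{n\times(n-1)}$ and $B_1 \in \mathbb{C}^{(n-1)\times(n-1)}$ satisfying
$$X = [x_0,\ X_1], \quad Y = [y_0,\ Y_1], \quad Y^* X = I_n, \quad Y^* A_0 X = \begin{bmatrix} \lambda_0 & 0 \\ 0 & B_1 \end{bmatrix}. \tag{4}$$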
Note that, from $Y^* X = I_n$, it is immediate that the columns of $X_1$ and $Y_1$ respectively span the null spaces of $y_0^*$ and $x_0^*$. Furthermore, we have
$$X Y^* = x_0 y_0^* + X_1 Y_1^* = I_n.$$
We also have $A_0 X_1 = X_1 B_1$ and $Y_1^* A_0 = B_1 Y_1^*$, so the columns of $X_1$ and $Y_1$ are respectively bases for right and left $(n-1)$-dimensional invariant subspaces of $A_0$, and $X_1 Y_1^* = I_n - x_0 y_0^*$ is the complementary projector to the eigenprojector $x_0 y_0^*$. If we assume that $A_0$ is diagonalizable, i.e., with $n$ linearly independent eigenvectors, then we can take the columns of $X_1$ and of $Y_1$ to respectively be right and left eigenvectors corresponding to the eigenvalues of $A_0$ that differ from $\lambda_0$, which we may denote by $\lambda_1, \ldots, \lambda_{n-1}$ (some of which could coincide, as diagonalizability implies only that the eigenvalues are semisimple, not that they are simple). In this case, we can take $B_1$ to be the diagonal matrix $\mathrm{diag}(\lambda_1, \ldots, \lambda_{n-1})$. More generally, however, $X_1$, $Y_1$ and $B_1$ may be any matrices satisfying (4), ignoring the multiplicities and Jordan structure of the other eigenvalues.
Now let
$$S = X_1 (B_1 - \lambda_0 I)^{-1} Y_1^*. \tag{5}$$
It then follows that
$$S\,x_0 y_0^* = x_0 y_0^*\,S = 0 \quad\text{and}\quad (A_0 - \lambda_0 I) S = S (A_0 - \lambda_0 I) = I - x_0 y_0^*.$$
In the NLA stream of research, $S$ is called the group inverse of $A_0 - \lambda_0 I$ [SS90, p. 240–241], [MS88], [CM91], [GO11, Theorem 5.2]. In the APT research stream, it is called the reduced resolvent matrix of $A_0$ w.r.t. the eigenvalue $\lambda_0$ (see [Kat82, eqs. I.5.28 and II.2.11]). [Footnote 3: The notion of group inverse or reduced resolvent extends beyond the simple eigenvalue context to multiple eigenvalues. If $\lambda_0$ is a defective eigenvalue with a nontrivial Jordan structure, the reduced resolvent matrix of $A_0$ with respect to $\lambda_0$ must take account of “eigennilpotents”. It is the same as the Drazin inverse of $A_0 - \lambda_0 I$, a generalization of the group inverse (see [Cha12, p.98], [CM91] and, for a method to compute the Drazin inverse, [GOS15]).]
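To make the group inverse concrete, here is a minimal Python sketch for a hypothetical diagonalizable 2-by-2 matrix of our own choosing, assuming (5) has the form $S = X_1 (B_1 - \lambda_0 I)^{-1} Y_1^*$. In this case $X_1$ and $Y_1$ each consist of a single eigenvector and $B_1$ is the 1-by-1 matrix holding the other eigenvalue. The checks are the standard defining properties of a group inverse $G$ of a matrix $M$, namely $MGM = M$, $GMG = G$ and $MG = GM$; all names in the code are illustrative.

```python
# Hypothetical diagonalizable 2x2 example (our own choice).
A0 = [[1.0, 2.0], [0.0, 3.0]]
lam0, lam1 = 1.0, 3.0
x0, y0 = [1.0, 0.0], [1.0, -1.0]   # right/left eigenvectors for lam0, y0^* x0 = 1
x1, y1 = [1.0, 1.0], [0.0, 1.0]    # right/left eigenvectors for lam1, y1^* x1 = 1

I2 = [[1.0, 0.0], [0.0, 1.0]]
ZERO = [[0.0, 0.0], [0.0, 0.0]]


def outer(x, y):
    """Rank-one matrix x y^* (the second argument is conjugated)."""
    return [[x[i] * complex(y[j]).conjugate() for j in range(2)] for i in range(2)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def sub(P, Q):
    return [[P[i][j] - Q[i][j] for j in range(2)] for i in range(2)]


def dist(P, Q):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(P[i][j] - Q[i][j]) for i in range(2) for j in range(2))


Pi0 = outer(x0, y0)                                   # eigenprojector x0 y0^*
# Assumed form of (5) with X1 = x1, Y1 = y1 and B1 = [lam1]:
S = [[v / (lam1 - lam0) for v in row] for row in outer(x1, y1)]

M = sub(A0, [[lam0 * I2[i][j] for j in range(2)] for i in range(2)])  # A0 - lam0*I
complement = sub(I2, Pi0)                                            # I - Pi0

print("||M S - (I - Pi0)|| :", dist(matmul(M, S), complement))
print("||S M - (I - Pi0)|| :", dist(matmul(S, M), complement))
print("||M S M - M||       :", dist(matmul(matmul(M, S), M), M))
print("||S M S - S||       :", dist(matmul(matmul(S, M), S), S))
```

The products $MS$ and $SM$ both reproduce the complementary projector, which is why $S$ acts as an inverse of $A_0 - \lambda_0 I$ on the invariant subspace complementary to the eigenvector.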
We now give a first-order perturbation theorem for right and left eigenvectors corresponding to a simple eigenvalue.
Theorem 2
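(Eigenvector Perturbation Theorem) Suppose that Assumption 1 holds and define $X_1$, $Y_1$ and $B_1$ as in Lemma 1 and $S$ as in (5). Then there exist vector-valued functions $x(\tau)$ and $y(\tau)^*$ that are analytic in a neighborhood of $\tau_0$ with $x(\tau_0) = x_0$, $y(\tau_0) = y_0$ and $y(\tau)^* x(\tau) = 1$, satisfying the right and left eigenvector equations
$$A(\tau)\, x(\tau) = \lambda(\tau)\, x(\tau), \qquad y(\tau)^* A(\tau) = \lambda(\tau)\, y(\tau)^*, \tag{6}$$
where $\lambda(\tau)$ is the analytic function from Theorem 1. Furthermore, these can be chosen so that their derivatives, $x'(\tau_0)$ and $(y^*)'(\tau_0)$, satisfy $y_0^* x'(\tau_0) = 0$ and $(y^*)'(\tau_0)\, x_0 = 0$, with
$$x'(\tau_0) = -S A'(\tau_0)\, x_0, \tag{7}$$
$$(y^*)'(\tau_0) = -y_0^* A'(\tau_0)\, S. \tag{8}$$
[Footnote 4: We use the notation $(y^*)'(\tau_0)$ to mean $\frac{d}{d\tau}\big(y(\tau)^*\big)\big|_{\tau=\tau_0}$.]
Note that it is $y(\tau)^*$, not $y(\tau)$, that is analytic with respect to the complex parameter $\tau$. However, $y(\tau)$ is differentiable w.r.t. the real and imaginary parts of $\tau$. Note also that we do not claim that $x(\tau)$ and $y(\tau)$ are unique, even when they are chosen to satisfy (7) and (8). Sometimes, other normalizations of the eigenvectors, not necessarily satisfying (7) and (8), are preferred, as we shall discuss in §3.4.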
It follows from Theorem 2 that
[TABLE]
where , the ordinary matrix condition number of , equivalently of (as ), with the same bound also holding for .
In the diagonalizable case, as already noted above, we can take , so
[TABLE]
with the same bound also holding for . In this case, the formula (7) for the eigenvector derivative was given by Wilkinson [Wil65, p. 70]. He remarked (p. 109) that although his derivation is essentially classical perturbation theory, a simple but rigorous treatment did not seem to be readily available in the literature. Lancaster [Lan64] and Lidskii [Lid66] both showed that the perturbed eigenvector corresponding to a simple eigenvalue may be defined to be differentiable at , but they did not give the first-order perturbation term. The books by Stewart and Sun [SS90, sec. V.2] and Stewart [Ste01, sec. 1.3 and 4.2] give excellent discussions of the issues summarized above as well as many additional related results. The eigenvector derivative formula (7) in Theorem 2 above is succinctly stated just below [Ste01, eq. (3.14), p. 46], where on the same page Theorem 3.11 stating it more rigorously, and providing additional bounds, is also given; see also [Ste01, line 4, p. 48]. The reader is referred to [Ste71] and [Ste73] for a proof. Stewart [Ste71] introduced the idea of establishing the existence of a solution to an algebraic Riccati equation by a fixed point iteration, a technique that was followed up in [Ste73, eq. (1.5), p. 730] and [Dem86, eq. (7.2), p.187]. Alternatively, proofs of Theorem 2 may be derived by various approaches based on the implicit function theorem; see [Mag85, Sun85] and [Sun98, Sec. 2.1]. A related argument appears in [Lax07, Theorem 9.8]. These approaches generally focus on obtaining results for the right eigenvector subject to some normalization; they can also be applied to obtain results for the left eigenvector, and these can be normalized further to obtain the condition . The proof that we give in §3.2 is also based on the implicit function theorem, using a block-diagonalization approach that obtains the perturbation results for the right and left eigenvectors simultaneously, ensuring that . 
Note, however, that a fundamental difficulty with eigenvectors is their lack of uniqueness. In contrast, the eigenprojector is uniquely defined, and satisfies the following perturbation theorem.
Theorem 3
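(Eigenprojector Perturbation Theorem) Suppose Assumption 1 holds and define $X_1$, $Y_1$ and $B_1$ as in Lemma 1 and $S$ as in (5). Then there exists a matrix-valued function $\Pi(\tau)$ that is analytic in a neighborhood of $\tau_0$ with $\Pi(\tau_0) = \Pi_0 = x_0 y_0^*$, satisfying the eigenprojector equations
$$A(\tau)\,\Pi(\tau) = \lambda(\tau)\,\Pi(\tau) = \Pi(\tau)\,A(\tau) \quad\text{and}\quad \Pi(\tau)^2 = \Pi(\tau), \tag{9}$$
and with derivative given by
$$\Pi'(\tau_0) = -\Pi_0 A'(\tau_0)\, S - S A'(\tau_0)\, \Pi_0. \tag{10}$$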
This result is well known in the APT research stream [Kat82, eq. (II.2.13)], and, like the eigenvalue perturbation result, goes back to the 1950s. Furthermore, while it’s easy to see how Theorem 3 can be proved using Theorem 2, it is also the case that Theorem 2 can be proved using Theorem 3, by defining the eigenvectors appropriately in terms of the eigenprojector, as discussed below. This provides a convenient way to define eigenvectors uniquely.
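The eigenprojector derivative can be checked by finite differences. The sketch below is an illustration under stated assumptions: it uses a hypothetical 2-by-2 family $A(\tau) = A_0 + \tau E$ with $\tau_0 = 0$, takes (10) in the form $\Pi'(\tau_0) = -\Pi_0 A'(\tau_0) S - S A'(\tau_0) \Pi_0$, and computes $\Pi(\tau)$ from the elementary 2-by-2 identity $\Pi = (A - \mu I)/(\lambda - \mu)$, where $\mu$ is the other eigenvalue. That identity is a standard fact for distinct eigenvalues and is not the contour-integral construction used in the paper; the function name `eigenprojector` and the complex step `h` are our own choices.

```python
import cmath

# Hypothetical 2x2 example (our own choice): A(tau) = A0 + tau*E, tau0 = 0.
A0 = [[1.0, 2.0], [0.0, 3.0]]
E = [[0.0, 0.0], [1.0, 0.0]]
lam0 = 1.0
Pi0 = [[1.0, -1.0], [0.0, 0.0]]   # x0 y0^* with x0 = (1, 0)^T, y0 = (1, -1)^T
S = [[0.0, 0.5], [0.0, 0.5]]      # X1 (B1 - lam0 I)^{-1} Y1^* for this example


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dist(P, Q):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(P[i][j] - Q[i][j]) for i in range(2) for j in range(2))


def eigenprojector(tau):
    """Pi(tau) for the eigenvalue of A0 + tau*E nearest lam0 (2x2 case only)."""
    A = [[A0[i][j] + tau * E[i][j] for j in range(2)] for i in range(2)]
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    lam, mu = (tr - root) / 2.0, (tr + root) / 2.0
    if abs(mu - lam0) < abs(lam - lam0):
        lam, mu = mu, lam
    # Pi = (A - mu*I) / (lam - mu), valid when lam != mu.
    return [[(A[i][j] - (mu if i == j else 0.0)) / (lam - mu) for j in range(2)]
            for i in range(2)]


# Assumed form of (10): Pi'(tau0) = -Pi0 E S - S E Pi0.
P1 = matmul(matmul(Pi0, E), S)
P2 = matmul(matmul(S, E), Pi0)
dPi_formula = [[-P1[i][j] - P2[i][j] for j in range(2)] for i in range(2)]

# Central difference along a complex direction; Pi(tau) is analytic in tau.
h = 1e-5 * cmath.exp(0.7j)
Pp, Pm = eigenprojector(h), eigenprojector(-h)
dPi_fd = [[(Pp[i][j] - Pm[i][j]) / (2.0 * h) for j in range(2)] for i in range(2)]

print("formula:", dPi_formula)
print("max deviation from central difference:", dist(dPi_fd, dPi_formula))
```

Using a complex step direction is a deliberate choice: since the eigenprojector is analytic in the complex parameter, the difference quotient should approach the same limit from every direction.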
We note that Theorems 1, 2 and 3 simplify significantly when is a Hermitian function of a real parameter , because then the right and left eigenvectors coincide. The results for the Hermitian case lead naturally to perturbation theory for singular values and singular vectors of a general rectangular matrix; see [Ste01, Sec. 3.3.1] and [Sun98, Sec. 3.1].
3.1 Nonrigorous derivation of the formulas in Theorems 1, 2 and 3
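If we assume that for $\tau$ in some neighborhood of $\tau_0$, the matrix $A(\tau)$ has an eigenvalue $\lambda(\tau)$ and corresponding right and left eigenvectors $x(\tau)$ and $y(\tau)$ with $y(\tau)^* x(\tau) = 1$ such that $\lambda(\tau)$, $x(\tau)$ and $y(\tau)^*$ are all analytic functions of $\tau$ satisfying $\lambda(\tau_0) = \lambda_0$, $x(\tau_0) = x_0$, and $y(\tau_0) = y_0$, then differentiating the equation $A(\tau) x(\tau) = \lambda(\tau) x(\tau)$ and setting $\tau = \tau_0$, we find
$$A'(\tau_0)\, x_0 + A_0\, x'(\tau_0) = \lambda'(\tau_0)\, x_0 + \lambda_0\, x'(\tau_0). \tag{11}$$
Multiplying on the left by $y_0^*$ and using $y_0^* A_0 = \lambda_0 y_0^*$ and $y_0^* x_0 = 1$, we obtain the formula for $\lambda'(\tau_0)$:
$$\lambda'(\tau_0) = y_0^* A'(\tau_0)\, x_0.$$
Equation (11) can be written in the form
$$(A_0 - \lambda_0 I)\, x'(\tau_0) = -\big(A'(\tau_0) - \lambda'(\tau_0) I\big)\, x_0. \tag{12}$$
Using Lemma 1, we can write
$$A_0 - \lambda_0 I = X \begin{bmatrix} 0 & 0 \\ 0 & B_1 - \lambda_0 I \end{bmatrix} Y^*,$$
and substituting this into (12) and multiplying on the left by $Y^*$, we find
$$\begin{bmatrix} 0 & 0 \\ 0 & B_1 - \lambda_0 I \end{bmatrix} Y^* x'(\tau_0) = -Y^* \big(A'(\tau_0) - \lambda'(\tau_0) I\big)\, x_0.$$
The first row equation here is $0 = -y_0^* \big(A'(\tau_0) - \lambda'(\tau_0) I\big) x_0$, which is simply the formula for $\lambda'(\tau_0)$. The remaining $n-1$ equations are
$$(B_1 - \lambda_0 I)\, Y_1^* x'(\tau_0) = -Y_1^* \big(A'(\tau_0) - \lambda'(\tau_0) I\big)\, x_0,$$
and since $Y_1^* x_0 = 0$ and $B_1 - \lambda_0 I$ is invertible, we obtain the following formula for $Y_1^* x'(\tau_0)$:
$$Y_1^* x'(\tau_0) = -(B_1 - \lambda_0 I)^{-1} Y_1^* A'(\tau_0)\, x_0.$$
Note that $x'(\tau_0)$ is not completely determined by this formula because each eigenvector $x(\tau)$ is determined only up to a multiplicative constant. If we can choose the scale factor in such a way that $y_0^* x'(\tau_0) = 0$ then, multiplying on the left by $X_1$ and recalling that $X_1 Y_1^* = I - x_0 y_0^*$, we obtain the formula in (7) for $x'(\tau_0)$:
$$x'(\tau_0) = -X_1 (B_1 - \lambda_0 I)^{-1} Y_1^* A'(\tau_0)\, x_0 = -S A'(\tau_0)\, x_0.$$
Similarly, the formula (8) for $(y^*)'(\tau_0)$ can be derived assuming that we can choose $y(\tau)$ so that $(y^*)'(\tau_0)\, x_0 = 0$.
Once formulas (7) and (8) are established, formula (10) follows immediately from
$$\Pi'(\tau) = \big(x(\tau)\, y(\tau)^*\big)' = x'(\tau)\, y(\tau)^* + x(\tau)\, (y^*)'(\tau).$$
Evaluating at $\tau = \tau_0$ and using formulas (7) and (8) for $x'(\tau_0)$ and $(y^*)'(\tau_0)$, we obtain formula (10) for $\Pi'(\tau_0)$.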
In the following subsections, we establish the assumptions used here when $\lambda_0$ is a simple eigenvalue of $A_0$, and thus obtain proofs of Theorems 1, 2, and 3, in two different ways. The first involves finding equations that a similarity transformation must satisfy if it is to take $A(\tau)$ (or, more specifically, $Y^* A(\tau) X$) to a block diagonal form like that in Lemma 1 for $A_0$. The implicit function theorem is then invoked to show that these equations have a unique solution, for $\tau$ in some neighborhood of $\tau_0$, and that the solution is analytic in $\tau$. [Footnote 5: Since the perturbation parameter $\tau$ and the matrix family $A(\tau)$ are complex, we need a version of the implicit function theorem from complex analysis, but in the special case that $\tau$ and $A(\tau)$ are real, we could use a more familiar version from real analysis. In that case, although some of the eigenvalues and eigenvectors of a real matrix may be complex, they occur in complex conjugate pairs and are easily represented using real quantities.] The second uses the argument principle and the residue theorem from complex analysis to establish that, for $\tau$ in a neighborhood of $\tau_0$, each matrix $A(\tau)$ has a simple eigenvalue $\lambda(\tau)$ that is analytic in $\tau$ and satisfies $\lambda(\tau_0) = \lambda_0$. It then follows from Lemma 1 that there is a similarity transformation taking $A(\tau)$ to block diagonal form, but Lemma 1 says nothing about analyticity or even continuity of the associated matrices $X(\tau)$ and $Y(\tau)$. Instead, the similarity transformation is applied to the resolvent and integrated to obtain an expression for the eigenprojector $\Pi(\tau)$ that is shown to be analytic in $\tau$. Finally, left and right eigenvectors satisfying the analyticity conditions along with the derivative formulas (7) and (8) are defined in terms of the eigenprojector.
Note that the assumptions used here do not generally hold when is not a simple eigenvalue of , as discussed in §4.
3.2 First proof of Theorems 1, 2 and 3, using techniques from the NLA research stream
The first proof that we give is inspired by the NLA research stream, but instead of Stewart’s fixed-point iteration technique mentioned previously, we rely on the implicit function theorem [Kra01, Theorem 1.4.11], which we now state in the form that we need.
Theorem 4
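(Implicit Function Theorem) Let $D \subset \mathbb{C} \times \mathbb{C}^m$ be an open set, $h : D \to \mathbb{C}^m$ an analytic mapping, and $(\tau_0, z_0) \in D$ a point where $h(\tau_0, z_0) = 0$ and where the Jacobian matrix $\partial h / \partial z$ is nonsingular. Then the system of equations $h(\tau, z) = 0$ has a unique analytic solution $z = z(\tau)$ in a neighborhood of $\tau_0$ that satisfies $z(\tau_0) = z_0$.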
We now exploit this result in our proof of Theorem 2. The setting of the stage before applying the implicit function theorem follows Demmel’s variant of Stewart’s derivation mentioned above. We obtain a proof of Theorem 1 along the way, and then give a proof of Theorem 3 as an easy consequence.
Using Lemma 1, define
[TABLE]
Here the scalar , the row and column vectors and and the matrix are analytic functions of near , since is. In what follows, we will transform this matrix into a block diagonal matrix by a similarity transformation. We will choose , so that
[TABLE]
[TABLE]
with and , and consequently , , and , analytic in a neighborhood of , with , and hence . This transformation idea traces back to [Ste73, p.730] who designed a with , but in the form given here, it is due to [Dem86, p.187].
We would like to have, for sufficiently close to , the similarity transformation
[TABLE]
where and are also analytic, with and . Since is block diagonal by definition, we need to be block diagonal. Suppressing the dependence on , this last matrix is given by
[TABLE]
For clarity, we introduce the notation for the analytic row vector function . We then seek column and row vector analytic functions and making the off-diagonal blocks of (15) zero, i.e., satisfying
[TABLE]
Taking , , and equal to first and then with equal to and , respectively, in Theorem 4, we note that since , , and the Jacobian matrices
[TABLE]
are nonsingular, there are unique functions and , analytic in a neighborhood of , satisfying (16) and (17) with , , and
[TABLE]
where, using the definition (13), we have
[TABLE]
Thus, (15) is block diagonal and hence (14) holds, with an eigenvalue of , and with
[TABLE]
again suppressing dependence on on the right-hand sides. These functions are analytic in a neighborhood of , satisfying and , with
[TABLE]
proving Theorem 1.
Let be the first column of the identity matrix . Multiplying (14) on the left by and on the right by we obtain, using and ,
[TABLE]
so
[TABLE]
is analytic with and satisfies the desired right eigenvector equation in (6). Likewise, multiplying (14) on the right by and on the left by gives
[TABLE]
so
[TABLE]
is analytic with and satisfies the left eigenvector equation in (6). Furthermore, , as claimed. Finally, differentiating and we have
[TABLE]
so combining these with (18) and (19), recalling that , and using the definition of in (5), we obtain the eigenvector derivative formulas (7), (8). The properties and follow, so Theorem 2 is proved.
Finally, define
[TABLE]
The eigenprojector equations (9) follow immediately. We have
[TABLE]
so Theorem 3 follows from (7) and (8).
3.3 Second proof of Theorems 1, 2 and 3, using techniques from the APT research stream
In this proof, in contrast to the previous one, we focus on proving Theorem 3 first, obtaining the proof of Theorem 1 along the way, and finally obtaining Theorem 2 as a consequence. This proof of Theorem 3 is based on complex function theory, as is standard in APT. However, our derivation is simpler than most of those given in the literature, which usually prove more general results, such as giving complete analytic expansions for the eigenvalue and eigenprojector, while we are concerned only with the first order term. The key to the last part of the proof, yielding Theorem 2, is to use an appropriate eigenvector normalization.
The main tool here is the residue theorem [MH06, p. 293-294, Thm. 8.1 and 8.2]:
Theorem 5
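(Residue Theorem) Let $D$ be a simply connected domain in $\mathbb{C}$ and let $\Gamma$ be a simple closed positively oriented contour that lies in $D$. If $f$ is analytic inside $\Gamma$ and on $\Gamma$, except at the points $\zeta_1, \ldots, \zeta_m$ that lie inside $\Gamma$, then
$$\oint_\Gamma f(\zeta)\, d\zeta = 2\pi i \sum_{j=1}^{m} \mathrm{Res}[f, \zeta_j],$$
where if $f$ has a simple pole at $\zeta_j$, then
$$\mathrm{Res}[f, \zeta_j] = \lim_{\zeta \to \zeta_j} (\zeta - \zeta_j)\, f(\zeta),$$
and if $f$ has a pole of order $k$ at $\zeta_j$, then
$$\mathrm{Res}[f, \zeta_j] = \frac{1}{(k-1)!} \lim_{\zeta \to \zeta_j} \frac{d^{k-1}}{d\zeta^{k-1}} \Big[ (\zeta - \zeta_j)^k f(\zeta) \Big].$$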
Let be the boundary of an open set in the complex plane containing the simple eigenvalue , with no other eigenvalues of in . First note that since , the characteristic polynomial of , does not vanish on , the same will hold for all polynomials with coefficients sufficiently close to those of ; in particular, it will hold for , the characteristic polynomial of , if is sufficiently close to ; say, . From here on, we always assume that . By the argument principle [MH06, p. 328, Thm. 8.8], the number of zeros of inside is
[TABLE]
For , this value is . Since for each , the integrand is a continuous function of , the integral above is as well. Since it is integer-valued, it must be the constant . So, let denote the unique root of in the region , i.e., the unique eigenvalue of in . Note that this means that can be written in the form , where has no roots in . It therefore follows from the residue theorem that
[TABLE]
since
[TABLE]
Since the left-hand side of (20) is an analytic function of , the right-hand side is as well. Thus has a unique eigenvalue in and is an analytic function of .
For not an eigenvalue of , define the resolvent of by
[TABLE]
Lemma 1 states that there exist left and right eigenvectors and associated with and satisfying , along with matrices , and , satisfying
[TABLE]
[TABLE]
Note that we do not claim that and are analytic, or even continuous functions of . It follows that the resolvent of satisfies
[TABLE]
where . Now, is a matrix-valued function of with no poles in , so it follows from the residue theorem (applied to the functions associated with each entry of ) that and therefore from (25) that
[TABLE]
For , this is .
From the definition of the resolvent (21), it follows that since is an analytic function of , the resolvent is as well, provided that is not an eigenvalue of . Differentiating the equation
[TABLE]
with respect to gives
[TABLE]
Considering expression (26) for , it follows that is also analytic, and its derivative at is
[TABLE]
Using (25), and writing for , we obtain
[TABLE]
From the residue theorem, the first term is zero since the integrand has a pole of order 2 at , with . The second term is also zero because the integrand has no poles inside . Since the integrand of the remaining term has a simple pole at with , we have
[TABLE]
where is the same as defined in (5). This proves Theorem 3.
Now we define the eigenvectors in terms of the eigenprojector, using a normalization that goes back to [SN51] (see [Bau85, eq. (7.1.12)]):
[TABLE]
where we use the principal branch of the square root function (and assume that is small enough so that implies that the quantities under the square roots, which are for , are bounded away from the origin and the negative real axis). [Footnote 6: See also [Kat82, eq. (II.3.24)], which uses a related definition but without the square root, resulting in instead of .] Since is analytic, it follows that and are as well. From (28) we have
[TABLE]
and similarly , so the eigenvector equations (6) hold as required, and, since , we obtain
[TABLE]
as claimed in Theorem 2.
To obtain the eigenvalue derivative, we differentiate the equation
[TABLE]
and evaluate it at :
[TABLE]
Multiplying by on the left, this becomes
[TABLE]
proving Theorem 1.
Finally, using (10), we have
[TABLE]
The first three terms are zero since and likewise . So, as , we have
[TABLE]
Similarly, . The properties and follow, so the proof of Theorem 2 is complete.
3.4 Eigenvector normalizations
Theorem 2 does not state formulas for the analytic eigenvector functions and , specifying only that they exist, satisfying , with derivatives given by (7) and (8). Furthermore, the first proof given in §3.2 does not provide formulas for , , showing only that they exist via the implicit function theorem. However, the second proof given in §3.3 does provide formulas for and in terms of the eigenprojector (which is uniquely defined) and the eigenvectors and . This formula (28) may be viewed as a normalization because it provides a way to define the eigenvectors uniquely, and furthermore, it has the property that and are analytic near and satisfy . Let us refer to (28) as “normalization 0”.
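The following Python sketch illustrates how a normalization of this kind can be evaluated and used to check the eigenvector derivative formulas by finite differences. It rests on several assumptions that should be read as ours rather than the paper's: we take normalization 0 to have the form $x(\tau) = \Pi(\tau) x_0 / \sqrt{y_0^* \Pi(\tau) x_0}$ and $y(\tau)^* = y_0^* \Pi(\tau) / \sqrt{y_0^* \Pi(\tau) x_0}$ with the principal square root, we take (7) and (8) to be $x'(\tau_0) = -S A'(\tau_0) x_0$ and $(y^*)'(\tau_0) = -y_0^* A'(\tau_0) S$, and we use a hypothetical 2-by-2 family $A(\tau) = A_0 + \tau E$ with $\tau_0 = 0$, for which the eigenprojector has the closed form $(A - \mu I)/(\lambda - \mu)$. The function names are illustrative.

```python
import cmath

# Hypothetical 2x2 example (our own choice): A(tau) = A0 + tau*E, tau0 = 0.
A0 = [[1.0, 2.0], [0.0, 3.0]]
E = [[0.0, 0.0], [1.0, 0.0]]
lam0 = 1.0
x0 = [1.0, 0.0]
y0 = [1.0, -1.0]
y0c = [complex(v).conjugate() for v in y0]   # entries of the row vector y0^*
S = [[0.0, 0.5], [0.0, 0.5]]                 # group inverse for this example


def eigenprojector(tau):
    """Pi(tau) for the eigenvalue of A0 + tau*E nearest lam0 (2x2 case only)."""
    A = [[A0[i][j] + tau * E[i][j] for j in range(2)] for i in range(2)]
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    lam, mu = (tr - root) / 2.0, (tr + root) / 2.0
    if abs(mu - lam0) < abs(lam - lam0):
        lam, mu = mu, lam
    return [[(A[i][j] - (mu if i == j else 0.0)) / (lam - mu) for j in range(2)]
            for i in range(2)]


def normalized_pair(tau):
    """x(tau) and the row vector y(tau)^* under the assumed normalization 0."""
    Pi = eigenprojector(tau)
    Pix0 = [Pi[i][0] * x0[0] + Pi[i][1] * x0[1] for i in range(2)]      # Pi x0
    y0Pi = [y0c[0] * Pi[0][j] + y0c[1] * Pi[1][j] for j in range(2)]    # y0^* Pi
    s = cmath.sqrt(y0c[0] * Pix0[0] + y0c[1] * Pix0[1])                # principal branch
    return [v / s for v in Pix0], [v / s for v in y0Pi]


# Assumed derivative formulas: x' = -S E x0 and (y^*)' = -y0^* E S.
Ex0 = [E[i][0] * x0[0] + E[i][1] * x0[1] for i in range(2)]
dx_formula = [-(S[i][0] * Ex0[0] + S[i][1] * Ex0[1]) for i in range(2)]
y0E = [y0c[0] * E[0][j] + y0c[1] * E[1][j] for j in range(2)]
dy_formula = [-(y0E[0] * S[0][j] + y0E[1] * S[1][j]) for j in range(2)]

# Central differences along a complex direction.
h = 1e-5 * cmath.exp(0.7j)
xp, yp = normalized_pair(h)
xm, ym = normalized_pair(-h)
dx_fd = [(a - b) / (2.0 * h) for a, b in zip(xp, xm)]
dy_fd = [(a - b) / (2.0 * h) for a, b in zip(yp, ym)]

err_x = max(abs(a - b) for a, b in zip(dx_fd, dx_formula))
err_y = max(abs(a - b) for a, b in zip(dy_fd, dy_formula))
yx = yp[0] * xp[0] + yp[1] * xp[1]           # y(h)^* x(h), expected to equal one

print("x'(0) formula:", dx_formula, " deviation:", err_x)
print("(y^*)'(0) formula:", dy_formula, " deviation:", err_y)
```

Other normalizations would change the difference quotients by a multiple of the eigenvector itself, which is why the comparison is only meaningful once a normalization has been fixed.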
Although the beautifully simple normalization (28) dates to the 1950s, it seems to be rarely used. In this subsection we discuss some other normalizations that are more commonly used in practice. Let us denote the resulting normalized eigenvectors by and , and relate them to and , as defined in (28), by
[TABLE]
where and are two nonzero complex-valued scalar functions of to be defined below. Here we use the complex conjugate of in the definition to be consistent with the conjugated left eigenvector notation. The analyticity of and near depends on that of and . We consider several possible normalizations, continuing to assume that but not necessarily that for . In all cases, the formula
[TABLE]
follows immediately from (29), so to determine the derivatives and , we need only determine , , and , obtaining and from (7), (8), as stated in Theorem 2. Note that the derivatives of the normalized eigenvectors, and , do not necessarily satisfy and , unlike the derivatives and . We now define several different normalizations:
1. (i.e., the first entries of and are one). This is possible for sufficiently close to if and . Suppose this is the case. Then
[TABLE]
and
[TABLE]
Here , and hence , are analytic near .
2. and . This normalization is defined near under the assumption that ; no additional assumption is needed since . Clearly (31) holds as before. In addition, we have
[TABLE]
Again, and are analytic near .
3. and . This normalization is defined near if , which may not be the case when is complex. Suppose this does hold. We have
[TABLE]
[TABLE]
Either sign may be used, resulting in analytic and near .
4. and . In a way this is the most natural choice of normalization, because it is possible without any assumptions on , beyond . The problem, however, is that it does not define the eigenvectors uniquely. Suppose we choose , . Then , , and are not analytic in , but they are differentiable w.r.t. the real and imaginary parts of . However, we could equally well multiply and by any unimodular complex number , so there are infinitely many different choices of , that are smooth w.r.t. the real and imaginary parts of (though not analytic in ). A variant on this normalization is known as real-positive (RP) compatibility [GO11]: it requires to be real and positive with .
More generally we could define the normalization in terms of two functions and . Depending on what choice is made, there are several possible outcomes: there could be unique and that satisfy the two normalization equations, as in cases 1 and 2 above when and ; there could be two choices as in case 3 when ; there could be an infinite number of choices as in case 4; or there could be no and that satisfy the normalization equations.
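To make the role of the scalar functions concrete, here is a short worked derivation for the first normalization. It is a sketch under assumed notation, since the symbols are chosen here for illustration: $\hat{x}(\tau) = \alpha(\tau)\,x(\tau)$ denotes the rescaled right eigenvector, $e_1$ the first coordinate vector, and we assume $e_1^T x_0 \neq 0$. The product rule and the requirement $e_1^T \hat{x}(\tau) = 1$ give

$$
\hat{x}'(\tau_0) = \alpha'(\tau_0)\,x_0 + \alpha(\tau_0)\,x'(\tau_0),
\qquad
\alpha(\tau) = \frac{1}{e_1^T x(\tau)}
\;\Longrightarrow\;
\alpha(\tau_0) = \frac{1}{e_1^T x_0},
\quad
\alpha'(\tau_0) = -\frac{e_1^T x'(\tau_0)}{(e_1^T x_0)^2}.
$$

Substituting,

$$
\hat{x}'(\tau_0) = \frac{1}{e_1^T x_0}\left( x'(\tau_0) - \frac{e_1^T x'(\tau_0)}{e_1^T x_0}\, x_0 \right),
$$

whose first entry is zero, as it must be because the first entry of $\hat{x}(\tau)$ is held constant.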
Perturbation theory for normalized eigenvectors, using several of the normalizations given above, was extensively studied by Meyer and Stewart [MS88] and by Bernasconi, Choirat and Seri [BCS11].
3.5 Computation
Suppose we wish to verify the results of Theorems 1, 2 and 3 computationally — always a good idea! Let us consider how to do this in matlab, where eigenvalues as well as right and left eigenvectors can be conveniently computed by the function eig. Let Assumption 1 hold, assuming for convenience that and for some given matrices and with . Take sufficiently small that only one computed eigenvalue of is close to the eigenvalue of . Then, assuming the eigenvalue is not too badly conditioned, we can easily verify the eigenvalue perturbation formula (1) in Theorem 1 by computing the finite difference quotient and comparing it with (1). Since the exact eigenvalue of is analytic in , it is clear that, mathematically, the difference between the difference quotient and the derivative should be as , but numerically, if is too small, rounding error dominates the computation instead [Ove01, Ch. 11].
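The check described above can be sketched in a few lines. This is a minimal illustration in Python rather than matlab, restricted to a 2×2 family so that no linear algebra library is needed. The matrices `A0` and `A1` and the helper names are hypothetical, and the sketch assumes that (1) takes the standard form $\lambda'(\tau_0) = y_0^* A'(\tau_0) x_0$ with $y_0^* x_0 = 1$.

```python
import cmath

# Hypothetical 2x2 test data (not from the paper): A(tau) = A0 + tau*A1, tau0 = 0.
A0 = [[1.0, 2.0], [0.5, 3.0]]
A1 = [[0.3, -1.0], [2.0, 0.7]]

def family(tau):
    """Return A(tau) = A0 + tau*A1 as a nested list."""
    return [[A0[i][j] + tau * A1[i][j] for j in range(2)] for i in range(2)]

def eigvals2(M):
    """Both eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

def eigvecs2(M, lam):
    """Right eigenvector x and row vector w = y^* for eigenvalue lam,
    scaled so that w x = 1. Assumes both off-diagonal entries are nonzero."""
    (a, b), (c, d) = M
    x = [b, lam - a]
    w = [c, lam - a]
    s = w[0] * x[0] + w[1] * x[1]
    return x, [w[0] / s, w[1] / s]

lam0 = eigvals2(A0)[0]          # the simple eigenvalue we follow
x0, w0 = eigvecs2(A0, lam0)

# Predicted derivative y0^* A1 x0 (with y0^* x0 = 1).
A1x0 = [A1[i][0] * x0[0] + A1[i][1] * x0[1] for i in range(2)]
derivative = w0[0] * A1x0[0] + w0[1] * A1x0[1]

def quotient(tau):
    """Difference quotient using the computed eigenvalue closest to lam0."""
    lam = min(eigvals2(family(tau)), key=lambda z: abs(z - lam0))
    return (lam - lam0) / tau

errors = {tau: abs(quotient(tau) - derivative) for tau in (1e-2, 1e-4, 1e-6)}
for tau, err in errors.items():
    print(f"tau = {tau:.0e}:  |quotient - derivative| = {err:.2e}")
```

For these moderate step sizes the printed error should shrink roughly in proportion to the step. Pushing `tau` much closer to machine precision would eventually let rounding error take over, as noted above.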
Similarly, we can compute right and left eigenvectors and corresponding to , normalize these so that , compute the eigenprojector , and verify that the matrix difference quotient approximates formula (10) for the eigenprojector derivative given by Theorem 3.
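A corresponding sketch for the eigenprojector is given below, again in Python for a hypothetical 2×2 family. It assumes that (10) has the standard form $P'(\tau_0) = -S A'(\tau_0) P_0 - P_0 A'(\tau_0) S$, where $S$ is the group inverse of $A_0 - \lambda_0 I$. For a 2×2 matrix with distinct eigenvalues, $S$ can be written down directly from the projector, which is the shortcut used here. All names are illustrative.

```python
import cmath

# Hypothetical 2x2 test data (not from the paper): A(tau) = A0 + tau*A1, tau0 = 0.
A0 = [[1.0, 2.0], [0.5, 3.0]]
A1 = [[0.3, -1.0], [2.0, 0.7]]

def family(tau):
    return [[A0[i][j] + tau * A1[i][j] for j in range(2)] for i in range(2)]

def eigvals2(M):
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

def projector(M, lam):
    """Eigenprojector x y^* / (y^* x) for a simple eigenvalue lam of M.
    Assumes both off-diagonal entries of M are nonzero."""
    (a, b), (c, d) = M
    x = [b, lam - a]            # right eigenvector
    w = [c, lam - a]            # row vector y^*
    s = w[0] * x[0] + w[1] * x[1]
    return [[x[i] * w[j] / s for j in range(2)] for i in range(2)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

lam0, lam_other = eigvals2(A0)
P0 = projector(A0, lam0)

# 2x2 shortcut: A0 - lam0*I = (lam_other - lam0) * (I - P0),
# so its group inverse is S = (I - P0) / (lam_other - lam0).
S = [[((i == j) - P0[i][j]) / (lam_other - lam0) for j in range(2)]
     for i in range(2)]

SAP = matmul(matmul(S, A1), P0)
PAS = matmul(matmul(P0, A1), S)
predicted = [[-SAP[i][j] - PAS[i][j] for j in range(2)] for i in range(2)]

def quotient_error(tau):
    """Largest entrywise gap between (P(tau) - P0)/tau and the prediction."""
    M = family(tau)
    lam = min(eigvals2(M), key=lambda z: abs(z - lam0))
    P = projector(M, lam)
    return max(abs((P[i][j] - P0[i][j]) / tau - predicted[i][j])
               for i in range(2) for j in range(2))

errors = {tau: quotient_error(tau) for tau in (1e-2, 1e-4, 1e-6)}
for tau, err in errors.items():
    print(f"tau = {tau:.0e}:  max entry error = {err:.2e}")
```

Because the projector does not depend on how the eigenvectors are scaled, no normalization choice enters this check.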
What about the eigenvector derivative formulas? Implementing “normalization 0” defined in (28), we can compute normalizations of and by
[TABLE]
Note that the formulas on the right-hand side of (36), (37) avoid computing the eigenprojector, which can be advantageous if is large. Then we can verify that the difference quotients and approximate the eigenvector derivatives (7) and (8) given by Theorem 2.
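The eigenvector check can be sketched the same way. The scaling used below is an assumption made for illustration and may differ from (36), (37) in detail: the computed right eigenvector is rescaled to equal $P(\tau)x_0$ without forming $P(\tau)$, and the computed left eigenvector is rescaled so that its product with $x_0$ is one. The sketch also assumes that (7) and (8) read $x'(\tau_0) = -S A'(\tau_0) x_0$ and $y'(\tau_0)^* = -y_0^* A'(\tau_0) S$, with $S$ the group inverse of $A_0 - \lambda_0 I$. Test data and names are hypothetical.

```python
import cmath

# Hypothetical 2x2 test data (not from the paper): A(tau) = A0 + tau*A1, tau0 = 0.
A0 = [[1.0, 2.0], [0.5, 3.0]]
A1 = [[0.3, -1.0], [2.0, 0.7]]

def family(tau):
    return [[A0[i][j] + tau * A1[i][j] for j in range(2)] for i in range(2)]

def eigvals2(M):
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

def dot(u, v):
    """Plain unconjugated product; w below already stores the row vector y^*."""
    return u[0] * v[0] + u[1] * v[1]

def raw_eigvecs(M, lam):
    """Unnormalized right eigenvector x and row vector w = y^*.
    Assumes both off-diagonal entries of M are nonzero."""
    (a, b), (c, d) = M
    return [b, lam - a], [c, lam - a]

lam0, lam_other = eigvals2(A0)
x0, w_raw = raw_eigvecs(A0, lam0)
w0 = [wi / dot(w_raw, x0) for wi in w_raw]          # now w0 x0 = 1

# Group inverse of A0 - lam0*I via the 2x2 shortcut S = (I - x0 w0)/(lam_other - lam0).
S = [[((i == j) - x0[i] * w0[j]) / (lam_other - lam0) for j in range(2)]
     for i in range(2)]

A1x0 = [dot(A1[i], x0) for i in range(2)]
dx_pred = [-dot(S[i], A1x0) for i in range(2)]                          # -S A1 x0
w0A1 = [w0[0] * A1[0][j] + w0[1] * A1[1][j] for j in range(2)]
dw_pred = [-(w0A1[0] * S[0][j] + w0A1[1] * S[1][j]) for j in range(2)]  # -y0^* A1 S

def scaled_eigvecs(tau):
    M = family(tau)
    lam = min(eigvals2(M), key=lambda z: abs(z - lam0))
    xt, wt = raw_eigvecs(M, lam)
    x = [xi * dot(wt, x0) / dot(wt, xt) for xi in xt]   # equals P(tau) x0
    w = [wi / dot(wt, x0) for wi in wt]                 # scaled so that w x0 = 1
    return x, w

def quotient_error(tau):
    x, w = scaled_eigvecs(tau)
    ex = max(abs((x[i] - x0[i]) / tau - dx_pred[i]) for i in range(2))
    ew = max(abs((w[i] - w0[i]) / tau - dw_pred[i]) for i in range(2))
    return max(ex, ew)

errors = {tau: quotient_error(tau) for tau in (1e-2, 1e-4, 1e-6)}
for tau, err in errors.items():
    print(f"tau = {tau:.0e}:  max entry error = {err:.2e}")
```

Any scaling for which $y_0^* x(\tau)$ has zero derivative at $\tau_0$ would give the same first derivative, so the check is insensitive to the exact form of the rescaling at first order.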
Now consider normalizations 1 to 4 as defined in § 3.4. For normalization 1 (respectively, normalization 2) the formulas for the normalized eigenvectors and their derivatives given by (29) and (30) together with (31), (32) (respectively (31), (33)) can be verified easily provided the first components of and are not zero. Of course, the index 1 is arbitrary. A better choice is to use , where (respectively ) is the index of an entry of (respectively ) with maximum modulus, but this requires access to and . In the case of normalization 3, the formulas for the eigenvectors and their derivatives given by (29) and (30) together with (34) and (35) can also be verified easily, with a caveat due to the freedom in the choice of sign. Specifically, when and are obtained from the computed vectors and , it is important to ensure that the signs of and (and therefore and ) are consistent; this can be done by choosing the signs of the real parts (or the imaginary parts) of and to be the same, where is the index of an entry of with maximum modulus.
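The sign-consistency step for normalization 3 can be sketched as follows. The sketch assumes that normalization 3 fixes the unconjugated product $x^T x = 1$, which is what leaves exactly a sign ambiguity. The function name `normalize3` and its `ref` argument are hypothetical.

```python
import cmath

def normalize3(x, ref=None):
    """Scale x so that the unconjugated product x^T x equals 1.

    If a reference vector `ref` (already normalized the same way) is given,
    flip the sign so that the real parts of entry k agree, where k is the
    index of the entry of `ref` with maximum modulus.
    """
    s = sum(xi * xi for xi in x)            # x^T x, no conjugation
    if abs(s) < 1e-14:
        raise ValueError("x^T x is (numerically) zero; normalization 3 undefined")
    root = cmath.sqrt(s)
    xhat = [xi / root for xi in x]
    if ref is not None:
        k = max(range(len(ref)), key=lambda i: abs(ref[i]))
        if (xhat[k].real < 0) != (ref[k].real < 0):
            xhat = [-xi for xi in xhat]
    return xhat

# Usage: a computed eigenvector is only determined up to a complex scalar,
# so two calls to an eigensolver may return vectors differing by any factor.
base = [1 + 1j, 2 - 1j]
ref = normalize3(base)
rescaled = [(-2 + 0.5j) * xi for xi in base]
aligned = normalize3(rescaled, ref)
print(max(abs(a - r) for a, r in zip(aligned, ref)))
```

If the real part of the reference entry happens to be zero or tiny, the imaginary part would be the safer quantity to compare, as the text allows.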
As for normalization 4, although we could arbitrarily choose and to be smooth functions w.r.t. the real and imaginary parts of , there is no way to know how to obtain smoothly varying computed eigenvectors from the unnormalized computed eigenvectors and .
Summarizing, the formulas for the derivatives of the eigenvalue and the eigenprojector, given in Theorems 1 and 3, are easily verified numerically, while for eigenvector normalization 0 given by (28) and normalizations 1, 2 and 3 defined in § 3.4, the formulas for the eigenvector derivatives can also usually be verified computationally. However, perhaps surprisingly, there is no panacea when it comes to the eigenvectors. Normalization 0 always requires access to the eigenvectors and , while the only way to ensure that normalizations 1 and 2 are well defined is by providing access to and so that indices and can be used instead of index 1 if necessary. As for normalization 3, it may not be well defined, and even if it is, care must be taken to avoid inconsistent sign choices. Finally, normalization 4 is simply not well defined.
Verification of the eigenvalue, eigenprojector and eigenvector formulas is illustrated by publicly available matlab programs (https://cs.nyu.edu/overton/papers/eigvecpert-mfiles/eigvecvary_demo.zip); the main routine is eigValProjVecVaryDemo.m.
4 Multiple eigenvalues
Theorems 1, 2 and 3 do not generally hold when $\lambda_0$ is a multiple eigenvalue. In this section we consider two illuminating examples where multiple eigenvalues enter the picture. Recall that algebraic and geometric multiplicity, along with the terms semisimple (nondefective) and nonderogatory, were defined at the end of §1. In the following, we use the principal branch of the square root function for definiteness, but any branch would suffice.
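Example 1. Let
$$A(\tau) = \begin{bmatrix} 0 & 1 \\ \tau & 0 \end{bmatrix}$$
with $\tau_0 = 0$. The limit matrix $A_0 = A(0)$ is a Jordan block, with 0 a defective, nonderogatory eigenvalue with algebraic multiplicity 2 and geometric multiplicity 1. The corresponding right and left eigenvectors are $x_0 = [1, 0]^T$ and $y_0 = [0, 1]^T$, but these are mutually orthogonal and cannot be scaled so that $y_0^* x_0 = 1$. For $\tau \neq 0$, $A(\tau)$ has two simple eigenvalues $\lambda_{1,2}(\tau) = \pm\tau^{1/2}$, which are not analytic in any neighborhood of 0. The corresponding right and left eigenvectors are uniquely defined, up to scalings, by
$$x_{1,2}(\tau) = \begin{bmatrix} 1 \\ \pm\tau^{1/2} \end{bmatrix}, \qquad y_{1,2}(\tau) = \begin{bmatrix} \pm\overline{\tau^{1/2}} \\ 1 \end{bmatrix}. \tag{38}$$
The right eigenvectors $x_{1,2}(\tau)$ are not analytic near 0, and they both converge to the unique right eigenvector $x_0$ of $A_0$ as $\tau \to 0$. Likewise $y_{1,2}(\tau)$ are not analytic, and they both converge to $y_0$ as $\tau \to 0$. For $\tau \neq 0$, we can scale the eigenvectors so that $y_j(\tau)^* x_j(\tau) = 1$, but then either $x_j(\tau)$ or $y_j(\tau)$ (or both) must diverge as $\tau \to 0$, for $j = 1, 2$.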
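A small numerical illustration of this behavior follows, assuming Example 1 is the family $A(\tau) = \begin{bmatrix} 0 & 1 \\ \tau & 0 \end{bmatrix}$ with eigenvalues $\pm\tau^{1/2}$. The function name is hypothetical, and only the "+" branch is followed. The sketch confirms the eigenvector formulas by residuals, then shows that the eigenvalue difference quotient grows without bound while the product $y^* x$ shrinks to zero, which is why rescaling to $y^* x = 1$ forces divergence.

```python
import cmath

def example1(tau):
    """Eigen-data of A(tau) = [[0, 1], [tau, 0]] on the '+' square-root branch."""
    lam = cmath.sqrt(tau)               # principal branch
    A = [[0, 1], [tau, 0]]
    x = [1, lam]                        # right eigenvector
    w = [lam, 1]                        # row vector y^* (left eigenvector)
    res_right = max(abs(A[i][0] * x[0] + A[i][1] * x[1] - lam * x[i])
                    for i in range(2))
    res_left = max(abs(w[0] * A[0][j] + w[1] * A[1][j] - lam * w[j])
                   for j in range(2))
    wx = w[0] * x[0] + w[1] * x[1]      # y^* x = 2*lam
    return lam, wx, max(res_right, res_left)

rows = []
for tau in (1e-2, 1e-4, 1e-6):
    lam, wx, res = example1(tau)
    rows.append((tau, abs(lam / tau), abs(wx), res))
    print(f"tau = {tau:.0e}:  |lam/tau| = {abs(lam / tau):.3g}  "
          f"|y^* x| = {abs(wx):.3g}  residual = {res:.1e}")
```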
This example is easily extended to the $n \times n$ case, where $A(\tau)$ is zero except for a single superdiagonal of 1’s and a bottom left entry $\tau$, so that $A_0$ is a single Jordan block with 0 a nonderogatory eigenvalue with algebraic multiplicity $n$. The eigenvalues of $A(\tau)$ are then the $n$th roots of unity times $\tau^{1/n}$. In fact, Lidskii [Lid66] gave a remarkable general perturbation theory for eigenvalues of a linear family $A(\tau) = A_0 + \tau A_1$ for which an eigenvalue of $A_0$ may have any algebraic and geometric multiplicities and indeed any Jordan block structure; see also [Bau85, Sec. 7.4] and [MBO97]. These results are not described in Kato’s books. However, Kato does treat eigenvalue perturbation in detail in the case that the eigenvalues of $A_0$ are semisimple [Kat82, Sec. II.2.3]; see also [LMZ03]. Even in this case, the behavior can be unexpectedly complex, as the following example shows.
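The root-of-unity claim for the $n \times n$ extension can be checked by hand. As a sketch, take $A(\tau)$ to be the $n \times n$ matrix with ones on the superdiagonal and $\tau$ in the bottom left corner, and test the trial vector $x = (1, \lambda, \ldots, \lambda^{n-1})^T$, which is introduced here for illustration:

$$
A(\tau)\,x = \begin{bmatrix} \lambda \\ \lambda^2 \\ \vdots \\ \lambda^{n-1} \\ \tau \end{bmatrix},
\qquad
\lambda\,x = \begin{bmatrix} \lambda \\ \lambda^2 \\ \vdots \\ \lambda^{n-1} \\ \lambda^{n} \end{bmatrix}.
$$

The two agree exactly when $\lambda^n = \tau$, that is, when $\lambda_k = e^{2\pi i k/n}\,\tau^{1/n}$ for $k = 0, 1, \ldots, n-1$. For $n = 2$ this recovers the eigenvalues $\pm\tau^{1/2}$ of Example 1.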
Example 2. [LT85, p. 394] Let
$$A(\tau) = \begin{bmatrix} 0 & \tau \\ \tau^2 & 0 \end{bmatrix}$$
with $\tau_0 = 0$. This time the limit matrix $A_0 = A(0)$ is the zero matrix, with 0 a semisimple eigenvalue with algebraic and geometric multiplicity 2, so we can take any vectors in $\mathbb{C}^2$ as right or left eigenvectors of $A_0$. The eigenvalues of $A(\tau)$ are $\pm\tau^{3/2}$, which are not analytic. The corresponding right and left eigenvectors are uniquely defined, up to scaling, by the same formulas (38) as in the previous example. So again $x_{1,2}(\tau)$ (respectively $y_{1,2}(\tau)$) are not analytic, and both converge to the same vector $[1, 0]^T$ (respectively $[0, 1]^T$) as in the previous example when $\tau \to 0$, although in this case $A_0$ has two linearly independent right (and left) eigenvectors. There is no right eigenvector of $A(\tau)$ that converges to a vector that is linearly independent of $[1, 0]^T$ as $\tau \to 0$, and likewise no left eigenvector that converges to a vector linearly independent of $[0, 1]^T$.
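The following sketch checks these statements numerically. It rests on an assumption: that Example 2 is the family $A(\tau) = \begin{bmatrix} 0 & \tau \\ \tau^2 & 0 \end{bmatrix}$, which is the simplest family consistent with a zero limit matrix and with the eigenvector formulas of the previous example. The function name is hypothetical.

```python
import cmath

def example2(tau, sign):
    """Eigen-data of A(tau) = [[0, tau], [tau**2, 0]] for one branch (sign = +1 or -1)."""
    s = sign * cmath.sqrt(tau)          # +/- tau^(1/2), principal branch
    lam = tau * s                       # +/- tau^(3/2)
    A = [[0, tau], [tau ** 2, 0]]
    x = [1, s]                          # right eigenvector, same form as Example 1
    w = [s, 1]                          # row vector y^* (left eigenvector)
    res_right = max(abs(A[i][0] * x[0] + A[i][1] * x[1] - lam * x[i])
                    for i in range(2))
    res_left = max(abs(w[0] * A[0][j] + w[1] * A[1][j] - lam * w[j])
                   for j in range(2))
    return lam, x, w, max(res_right, res_left)

results = {}
for tau in (1e-2, 1e-4, 1e-6):
    for sign in (+1, -1):
        lam, x, w, res = example2(tau, sign)
        results[(tau, sign)] = (lam, x, w, res)
        # Distance of x from [1, 0] and of w from [0, 1] is |tau|^(1/2) for both branches.
        print(f"tau = {tau:.0e}, sign = {sign:+d}:  |lam| = {abs(lam):.3g}  "
              f"|x - e1| = {abs(x[1]):.3g}  |w - e2| = {abs(w[0]):.3g}  "
              f"residual = {res:.1e}")
```

Both branches approach the same pair of limit vectors at the rate $|\tau|^{1/2}$, even though every vector is an eigenvector of the limit matrix.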
In the examples above, a multiple eigenvalue splits apart under perturbation. To avoid dealing with this complexity, one may study the average behavior of a cluster of eigenvalues, and the corresponding invariant subspace, under perturbation. There is a large body of work on this topic: see the books by Kato [Kat82, Sec. II.2.1], Gohberg, Lancaster and Rodman [GLR06], Stewart [Ste01, Ch. 4] and Stewart and Sun [SS90, Ch. 5], the unpublished technical report by Sun [Sun98, Sec. 2.3], two surveys on Stewart’s many contributions by Ipsen [Ips10] and Demmel [Dem10], and papers such as [BDF08, Dem87, DF01, KK14].
In the case of a Hermitian family $A(\tau)$, the perturbation theory for multiple eigenvalues simplifies greatly; the pioneering results of Rellich were already mentioned in § 1. See [LT85, Sec. 11.7] and [GLR85] for more details.
5 Concluding Remarks
In this paper we have presented two detailed yet accessible proofs of first-order perturbation results for a simple eigenvalue of a matrix and its associated right and left eigenvectors and eigenprojector. We hope this will facilitate the dissemination of these important results to a much broader community of researchers and students than has hitherto been the case. We have also tried to convey the breadth and depth of work in the two principal relevant research streams, Analytic Perturbation Theory and Numerical Linear Algebra. There are, of course, many generalizations that we have not even begun to explore in this article. Just as one example, nonlinear eigenvalue problems, where one replaces by a matrix function with polynomial or more general nonlinear dependence on , arise in many important applications. Perturbation theory for nonlinear eigenvalue problems, representing the APT and NLA communities respectively, may be found in [ACL93] and [BH13]. Finally we remark that our bibliography, while fairly extensive, is in no way intended to be comprehensive.
References
- [ACL93] Alan L. Andrew, K.-W. Eric Chu, and Peter Lancaster. Derivatives of eigenvalues and eigenvectors of matrix functions. SIAM J. Matrix Anal. Appl., 14(4):903–926, 1993.
- [Bau85] H. Baumgärtel. Analytic perturbation theory for matrices and operators, volume 15 of Operator Theory: Advances and Applications. Birkhäuser Verlag, Basel, 1985.
- [BCS11] M. Bernasconi, C. Choirat, and R. Seri. Differentials of eigenvalues and eigenvectors in undamped discrete systems under alternative normalizations. In Proceedings of The World Congress on Engineering, pages 285–287. International Association of Engineers, 2011.
- [BDF08] David Bindel, James Demmel, and Mark Friedman. Continuation of invariant subspaces in large bifurcation problems. SIAM J. Sci. Comput., 30(2):637–656, 2008.
- [BH13] David Bindel and Amanda Hood. Localization theorems for nonlinear eigenvalue problems. SIAM J. Matrix Anal. Appl., 34(4):1728–1749, 2013.
- [Bha97] Rajendra Bhatia. Matrix analysis, volume 169 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1997.
- [Bha07] Rajendra Bhatia. Perturbation bounds for matrix eigenvalues, volume 53 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2007. Reprint of the 1987 original.
- [Cha11] Françoise Chatelin. Spectral approximation of linear operators, volume 65 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2011. With a foreword by P. Henrici. With solutions to exercises by Mario Ahués. Reprint of the 1983 original.
