First-order Perturbation Theory for Eigenvalues and Eigenvectors

Anne Greenbaum; Ren-cang Li; Michael L. Overton

arXiv:1903.00785·math.NA·June 4, 2019


TL;DR

This paper develops and compares two methods for first-order perturbation analysis of eigenvalues and eigenvectors of general square matrices, extending classical results and providing practical verification techniques.

Contribution

It introduces a novel block-diagonalization proof for eigenvector perturbation and discusses extensions, normalization, and computational verification of the theory.

Findings

1. Two distinct proofs of the eigenvector perturbation theorem presented

2. Extension of perturbation theory to various normalizations

3. Guidance on computational verification of results


Equations (160)

$$\lambda^{\prime}(\tau_{0})=y_{0}^{*}A^{\prime}(\tau_{0})x_{0}$$

$$\chi=\|x_{0}\|\|y_{0}\|\geq|y_{0}^{*}x_{0}|=1,$$

$$\Pi_{0}=x_{0}y_{0}^{*}$$

$$A_{0}\Pi_{0}=\lambda_{0}\Pi_{0}=\Pi_{0}A_{0}\quad\text{and}\quad\Pi_{0}^{2}=\Pi_{0}.$$

$$\lambda^{\prime}(\tau_{0})=\mathrm{tr}\left(\Pi_{0}A^{\prime}(\tau_{0})\right).$$

$$X=[x_{0},X_{1}],\quad Y=[y_{0},Y_{1}],\quad Y^{*}X=I_{n},\quad\text{and}\quad Y^{*}A_{0}X=\left[\begin{array}{cc}\lambda_{0}&0\\ 0&B_{1}\end{array}\right].$$

$$I_{n}=XY^{*}=x_{0}y_{0}^{*}+X_{1}Y_{1}^{*}.$$

$$S=X_{1}\big(B_{1}-\lambda_{0}I_{n-1}\big)^{-1}Y_{1}^{*}.$$

$$S\Pi_{0}=\Pi_{0}S=0\quad\text{and}\quad(A_{0}-\lambda_{0}I_{n})S=S(A_{0}-\lambda_{0}I_{n})=\Pi_{1}.$$

$$A(\tau)x(\tau)=\lambda(\tau)x(\tau),\quad y(\tau)^{*}A(\tau)=\lambda(\tau)y(\tau)^{*}$$

$$x^{\prime}(\tau_{0})$$

$$(y^{*})^{\prime}(\tau_{0})$$

$$\frac{\|x^{\prime}(\tau_{0})\|}{\|x_{0}\|}\leq\kappa(X)\,\|(\lambda_{0}I_{n-1}-B_{1})^{-1}\|\,\|A^{\prime}(\tau_{0})\|,$$

$$\frac{\|x^{\prime}(\tau_{0})\|}{\|x_{0}\|}\leq\frac{\kappa(X)\,\|A^{\prime}(\tau_{0})\|}{\min_{j=1,\dots,n-1}|\lambda_{0}-\lambda_{j}|}$$

$$A(\tau)\Pi(\tau)=\lambda(\tau)\Pi(\tau)=\Pi(\tau)A(\tau)\quad\text{and}\quad\Pi(\tau)^{2}=\Pi(\tau),$$

$$\Pi^{\prime}(\tau_{0})=-\Pi_{0}A^{\prime}(\tau_{0})S-SA^{\prime}(\tau_{0})\Pi_{0}.$$

$$A^{\prime}(\tau_{0})x_{0}+A_{0}x^{\prime}(\tau_{0})=\lambda^{\prime}(\tau_{0})x_{0}+\lambda_{0}x^{\prime}(\tau_{0}).$$

$$\lambda^{\prime}(\tau_{0})=y_{0}^{*}A^{\prime}(\tau_{0})x_{0}.$$

$$(A^{\prime}(\tau_{0})-\lambda^{\prime}(\tau_{0})I_{n})x_{0}=-(A_{0}-\lambda_{0}I_{n})x^{\prime}(\tau_{0}).$$

$$A_{0}-\lambda_{0}I_{n}=X\left[\begin{array}{cc}0&0\\ 0&B_{1}-\lambda_{0}I_{n-1}\end{array}\right]Y^{*},$$

$$Y^{*}(A^{\prime}(\tau_{0})-\lambda^{\prime}(\tau_{0})I_{n})x_{0}=-\left[\begin{array}{cc}0&0\\ 0&B_{1}-\lambda_{0}I_{n-1}\end{array}\right]Y^{*}x^{\prime}(\tau_{0}).$$

$$Y_{1}^{*}(A^{\prime}(\tau_{0})-\lambda^{\prime}(\tau_{0})I_{n})x_{0}=-(B_{1}-\lambda_{0}I_{n-1})Y_{1}^{*}x^{\prime}(\tau_{0}),$$

$$Y_{1}^{*}x^{\prime}(\tau_{0})=-(B_{1}-\lambda_{0}I_{n-1})^{-1}Y_{1}^{*}A^{\prime}(\tau_{0})x_{0}.$$

$$x^{\prime}(\tau_{0})=-X_{1}(B_{1}-\lambda_{0}I_{n-1})^{-1}Y_{1}^{*}A^{\prime}(\tau_{0})x_{0}=-SA^{\prime}(\tau_{0})x_{0}.$$

$$[x(\tau)y(\tau)^{*}]^{\prime}=x^{\prime}(\tau)y(\tau)^{*}+x(\tau)[y(\tau)^{*}]^{\prime}.$$

$$\left[\begin{array}{cc}\gamma_{11}(\tau)&c_{12}^{*}(\tau)\\ c_{21}(\tau)&C_{22}(\tau)\end{array}\right]=Y^{*}A(\tau)X=\left[\begin{array}{cc}\lambda_{0}&0\\ 0&B_{1}\end{array}\right]+Y^{*}(A(\tau)-A_{0})X.$$

$$P(\tau)=\left[\begin{array}{cc}1&-q(\tau)^{*}\\ p(\tau)&I_{n-1}\end{array}\right],\quad Q(\tau)=\left[\begin{array}{cc}1&-p(\tau)^{*}\\ q(\tau)&I_{n-1}\end{array}\right],$$

$$D(\tau)=Q(\tau)^{*}P(\tau)=\left[\begin{array}{cc}1+q(\tau)^{*}p(\tau)&0\\ 0&I_{n-1}+p(\tau)q(\tau)^{*}\end{array}\right],$$

$$Q(\tau)^{*}Y^{*}A(\tau)XP(\tau)D(\tau)^{-1}=\left[\begin{array}{cc}\lambda(\tau)&0\\ 0&B(\tau)\end{array}\right],$$

$$\left[\begin{array}{cc}\gamma_{11}+q^{*}c_{21}+c_{12}^{*}p+q^{*}C_{22}p&-q^{*}(\gamma_{11}I_{n-1}-C_{22})+c_{12}^{*}-q^{*}c_{21}q^{*}\\ -(\gamma_{11}I_{n-1}-C_{22})p+c_{21}-pc_{12}^{*}p&C_{22}-pc_{12}^{*}-c_{21}q^{*}+p\gamma_{11}q^{*}\end{array}\right].$$

$$f(\tau,p(\tau))$$


Full text


Anne Greenbaum, Department of Applied Mathematics, University of Washington.

Ren-cang Li, Department of Mathematics, University of Texas at Arlington. Supported in part by National Science Foundation Grants CCF-1527104 and DMS-1719620.

Michael L. Overton, Courant Institute of Mathematical Sciences, New York University. Supported in part by National Science Foundation Grant DMS-1620083.

(Dedicated to Peter Lancaster and G.W. (Pete) Stewart, Masters of Analytic Perturbation Theory and Numerical Linear Algebra, on the Occasion of their 90th and 79th Birthdays)

Abstract

We present first-order perturbation analysis of a simple eigenvalue and the corresponding right and left eigenvectors of a general square matrix, not assumed to be Hermitian or normal. The eigenvalue result is well known to a broad scientific community. The treatment of eigenvectors is more complicated, with a perturbation theory that is not so well known outside a community of specialists. We give two different proofs of the main eigenvector perturbation theorem. The first, a block-diagonalization technique inspired by the numerical linear algebra research community and based on the implicit function theorem, has apparently not appeared in the literature in this form. The second, based on complex function theory and on eigenprojectors, as is standard in analytic perturbation theory, is a simplified version of well-known results in the literature. The second derivation uses a convenient normalization of the right and left eigenvectors defined in terms of the associated eigenprojector, but although this dates back to the 1950s, it is rarely discussed in the literature. We then show how the eigenvector perturbation theory is easily extended to handle other normalizations that are often used in practice. We also explain how to verify the perturbation results computationally. We conclude with some remarks about difficulties introduced by multiple eigenvalues and give references to work on perturbation of invariant subspaces corresponding to multiple or clustered eigenvalues. Throughout the paper we give extensive bibliographic commentary and references for further reading.

1 Introduction

Eigenvalue perturbation theory is an old topic dating originally to the work of Rayleigh in the 19th century. Broadly speaking, there are two main streams of research. The most classical is analytic perturbation theory (APT), where one considers the behavior of eigenvalues of a matrix or linear operator that is an analytic function of one or more parameters. Authors of well-known books describing this body of work include Kato [Kat66, Kat76, Kat82, Kat95],¹ Rellich [Rel69], Chatelin [Cha11], Baumgärtel [Bau85] and, in textbook form, Lancaster and Tismenetsky [LT85]. Kato [Kat82, p. XII] and Baumgärtel [Bau85, p. 21] explain that it was Rellich who first established in the 1930s that when a Hermitian matrix or self-adjoint linear operator with an isolated eigenvalue $\lambda$ of multiplicity $m$ is subjected to a real analytic perturbation, that is, a convergent power series in a real parameter $\kappa$, then (1) it has exactly $m$ eigenvalues converging to $\lambda$ as $\kappa\to 0$, (2) these eigenvalues can also be expanded in convergent power series in $\kappa$ and (3) the corresponding eigenvectors can be chosen to be mutually orthogonal and may also be written as convergent power series. As Kato notes, these results are exactly what were anticipated by Rayleigh, Schrödinger and others, but to prove them is by no means trivial, even in the finite-dimensional case.

¹ The first edition of Kato’s masterpiece Perturbation Theory for Linear Operators was published in 1966 and a revised second edition appeared in 1976. The most recent edition is the 1995 reprinting of the second edition with minor corrections. Most of this book is concerned with linear operators, but the first two chapters treat the finite-dimensional case of matrices, and these appeared as a stand-alone short version in 1982. Since we are only concerned with matrices in this article, our references to Kato’s book are to the 1982 edition, although in any case the equation numbering is consistent across all editions.

The second stream of research is largely due to the numerical linear algebra (NLA) community. It is mostly restricted to matrices and generally concerns perturbation bounds rather than expansions, describing how to bound the change in the eigenvalues and associated eigenvectors or invariant subspaces when a given matrix is subjected to a perturbation with a given norm and structure. Here there is a wide variety of well-known results due to many of the founders of matrix analysis and numerical linear algebra: Gerschgorin, Hoffman and Wielandt, Mirsky, Lidskii, Ostrowski, Bauer and Fike, Henrici, Davis and Kahan, Varah, Ruhe, Stewart, Elsner, Demmel and others. These are discussed in many books, of which the most comprehensive include those by Wilkinson [Wil65], Stewart and Sun [SS90], Bhatia [Bha97, Bha07] and Stewart [Ste01], as well as Chatelin [Cha12], which actually covers both the APT and the NLA streams of research in some detail. See also the survey by Li [Li14]. An important branch of the NLA stream concerns the pseudospectra of a matrix; see the book by Trefethen and Embree [TE05] and the Pseudospectra Gateway web site [ET].

This paper is inspired by both the APT and the NLA streams of research, and its scope is limited to an important special case: first-order perturbation analysis of a simple eigenvalue and the corresponding right and left eigenvectors of a general square matrix, not assumed to be Hermitian or normal. The eigenvalue result is well known to a broad scientific community. The treatment of eigenvectors is more complicated, with a perturbation theory that is not so well known outside a community of specialists. We give two different proofs of the main eigenvector perturbation theorem. The first, inspired by the NLA research stream and based on the implicit function theorem, has apparently not appeared in the literature in this form. The second, based on complex function theory and on eigenprojectors, as is standard in APT, is largely a simplified version of results in the literature that are well known. The second derivation uses a convenient normalization of the right and left eigenvectors that depends on the perturbation parameter, but although this dates back to the 1950s, it is rarely discussed in the literature. We then show how the eigenvector perturbation theory is easily extended to handle other normalizations that are often used in practice. We also explain how to verify the perturbation results computationally. In the final section, we illustrate the difficulties introduced by multiple eigenvalues with two illuminating examples, and give references to work on perturbation of invariant subspaces corresponding to multiple or clustered eigenvalues.

2 First-order perturbation theory for a simple eigenvalue

Throughout the paper we use $\|\cdot\|$ to denote the vector or matrix 2-norm, $I_{n}$ to denote the identity matrix of order $n$ , the superscript $\operatorname{T}$ to denote transpose and $*$ to denote complex conjugate transpose. Greek lower case letters denote complex scalars. Latin lower case letters denote complex vectors, with the exception of $i$ for the imaginary unit and $j,k,\ell,m,n$ for integers. Upper case letters denote complex matrices or, in some cases, sets in the complex plane. We begin with an assumption that also serves to establish our notation.

Assumption 1

Let $A_{0}\in\mathbb{C}^{n\times n}$ have a simple eigenvalue $\lambda_{0}$ corresponding to right eigenvector $x_{0}$ (so $A_{0}x_{0}=\lambda_{0}x_{0}$ with $x_{0}\neq 0$) and left eigenvector $y_{0}$ (so $y_{0}^{*}A_{0}=\lambda_{0}y_{0}^{*}$ with $y_{0}\neq 0$), normalized so that $y_{0}^{*}x_{0}=1$. Let $\tau_{0}\in\mathbb{C}$ and let $A(\tau)$ be a complex-valued matrix function of a complex parameter $\tau$ that is analytic in a neighborhood of $\tau_{0}$, satisfying $A(\tau_{0})=A_{0}$.

Remark 1

The normalization $y_{0}^{*}x_{0}=1$ is always possible since the right and left eigenvectors corresponding to a simple eigenvalue cannot be orthogonal. Note that since $x_{0}$ and $y_{0}$ are unique only up to scalings, we may multiply $x_{0}$ by any nonzero complex scalar $\omega$ provided we also scale $y_{0}$ by the reciprocal of the conjugate of $\omega$ so that $y_{0}^{*}x_{0}$ remains equal to one. The use of the complex conjugate transpose in $y_{0}^{*}$ instead of an ordinary transpose is purely a convention that is often, but not universally, followed. The statement that the matrix $A(\tau)$ is analytic means that each entry of $A(\tau)$ is analytic (equivalently, complex differentiable or holomorphic) in $\tau$ in a neighborhood of $\tau_{0}$.

The most basic result in eigenvalue perturbation theory follows.

Theorem 1

(Eigenvalue Perturbation Theorem) Under Assumption 1, $A(\tau)$ has a unique eigenvalue $\lambda(\tau)$ that is analytic in a neighborhood of $\tau_{0}$, with $\lambda(\tau_{0})=\lambda_{0}$ and with

$$\lambda^{\prime}(\tau_{0})=y_{0}^{*}A^{\prime}(\tau_{0})x_{0}, \tag{1}$$

where $\lambda^{\prime}(\tau_{0})$ and $A^{\prime}(\tau_{0})$ are respectively the derivatives of $\lambda(\tau)$ and $A(\tau)$ at $\tau=\tau_{0}$ .

The proof appears in the next section.
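As a quick numerical illustration of Theorem 1, the sketch below compares $y_{0}^{*}A^{\prime}(\tau_{0})x_{0}$ with a central finite difference of the eigenvalue. The matrices `A0` and `E`, the linear family $A(\tau)=A_{0}+\tau E$ with $\tau_{0}=0$, and the helper `simple_eig` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simple_eig(A, target):
    """Return (lam, x, yh): the eigenvalue of A nearest `target`, a right
    eigenvector x, and the left eigenvector as a row yh = y^*, with yh @ x = 1."""
    w, X = np.linalg.eig(A)
    Yh = np.linalg.inv(X)              # rows of X^{-1} are left eigenvectors
    k = np.argmin(np.abs(w - target))
    return w[k], X[:, k], Yh[k, :]

# Illustrative data (an assumption): A(tau) = A0 + tau*E with tau_0 = 0.
A0 = np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 6.0]])  # eigenvalues 1, 3, 6
E = np.array([[0.5, -1.0, 2.0], [1.0, 0.0, 1.0], [0.0, 3.0, -2.0]])  # plays the role of A'(tau_0)

lam0, x0, y0h = simple_eig(A0, 1.0)
dlam_formula = y0h @ E @ x0            # y0^* A'(tau_0) x0

h = 1e-6                               # step for the central difference in tau
lam_plus = simple_eig(A0 + h * E, lam0)[0]
lam_minus = simple_eig(A0 - h * E, lam0)[0]
dlam_fd = (lam_plus - lam_minus) / (2 * h)

print(dlam_formula, dlam_fd)
```

For this particular choice, $x_{0}$ is a multiple of $e_{1}$ and $y_{0}^{*}$ is proportional to $(1,-1,0.2)$, so the derivative can also be checked by hand.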

The quantity

$$\chi=\|x_{0}\|\|y_{0}\|\geq|y_{0}^{*}x_{0}|=1,$$

introduced by [Wil65], is called the eigenvalue condition number for $\lambda_{0}$ . We have $|\lambda^{\prime}(\tau_{0})|\leq\chi\|A^{\prime}(\tau_{0})\|$ . In the real case, $\chi$ is the reciprocal of the cosine of the angle between $x_{0}$ and $y_{0}$ . In the special case that $A_{0}$ is Hermitian, its right and left eigenvectors coincide so $\chi=1$ , but in this article we are concerned with general square matrices.
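The condition number and the resulting bound on $|\lambda^{\prime}(\tau_{0})|$ can be evaluated directly. The following is a minimal sketch on an assumed non-normal test matrix `A0` with an assumed perturbation direction `E` (neither is from the paper).

```python
import numpy as np

# Illustrative data (an assumption): A(tau) = A0 + tau*E with tau_0 = 0.
A0 = np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 6.0]])
E = np.array([[0.5, -1.0, 2.0], [1.0, 0.0, 1.0], [0.0, 3.0, -2.0]])

w, X = np.linalg.eig(A0)
Yh = np.linalg.inv(X)
k = np.argmin(np.abs(w - 1.0))         # select lambda_0 = 1
x0, y0h = X[:, k], Yh[k, :]            # y0h is the row y0^*, with y0h @ x0 = 1

chi = np.linalg.norm(x0) * np.linalg.norm(y0h)   # eigenvalue condition number
dlam = y0h @ E @ x0                              # lambda'(tau_0)
bound = chi * np.linalg.norm(E, 2)               # chi * ||A'(tau_0)||
cos_angle = abs(y0h @ x0) / chi                  # real case: cosine of the angle between x0 and y0

print(chi, abs(dlam), bound, 1.0 / cos_angle)
```

Because `A0` is not normal, `chi` exceeds one here; for a Hermitian `A0` the same computation would return one up to rounding.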

In the APT research stream, instead of eigenvectors, the focus is mostly on the eigenprojector² corresponding to $\lambda_{0}$, which can be defined as

$$\Pi_{0}=x_{0}y_{0}^{*}$$

and which satisfies

$$A_{0}\Pi_{0}=\lambda_{0}\Pi_{0}=\Pi_{0}A_{0}\quad\text{and}\quad\Pi_{0}^{2}=\Pi_{0}.$$

² In APT, the standard term is “eigenprojection”, while in NLA, “spectral projector” is often used. The somewhat nonstandard term “eigenprojector” is a compromise.

Note that the eigenprojector does not depend on the normalization used for the eigenvectors $x_{0}$ and $y_{0}$ (assuming $y_{0}^{*}x_{0}=1$ ), which simplifies the associated perturbation theory, and note also that $\chi=\|\Pi_{0}\|$ . Let $\mathrm{tr}$ denote trace and recall the property $\mathrm{tr}(XY)=\mathrm{tr}(YX)$ . Clearly, equation (1) is equivalent to

$$\lambda^{\prime}(\tau_{0})=\mathrm{tr}\left(\Pi_{0}A^{\prime}(\tau_{0})\right). \tag{3}$$
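The eigenprojector identities and the trace form of the eigenvalue derivative are easy to confirm numerically. This sketch uses an assumed test matrix `A0` and perturbation direction `E` that are not from the paper.

```python
import numpy as np

# Illustrative data (an assumption): A(tau) = A0 + tau*E with tau_0 = 0.
A0 = np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 6.0]])
E = np.array([[0.5, -1.0, 2.0], [1.0, 0.0, 1.0], [0.0, 3.0, -2.0]])

w, V = np.linalg.eig(A0)
k = np.argmin(np.abs(w - 1.0))                           # select lambda_0 = 1
lam0, x0, y0h = w[k], V[:, k], np.linalg.inv(V)[k, :]    # y0h = y0^*, y0h @ x0 = 1

Pi0 = np.outer(x0, y0h)                                  # eigenprojector x0 y0^*

commutes = np.allclose(A0 @ Pi0, lam0 * Pi0) and np.allclose(Pi0 @ A0, lam0 * Pi0)
idempotent = np.allclose(Pi0 @ Pi0, Pi0)
dlam_trace = np.trace(Pi0 @ E)                           # tr(Pi0 A'(tau_0))
dlam_vectors = y0h @ E @ x0                              # y0^* A'(tau_0) x0
chi = np.linalg.norm(x0) * np.linalg.norm(y0h)

print(commutes, idempotent, dlam_trace, dlam_vectors, np.linalg.norm(Pi0, 2), chi)
```

Note that `Pi0` is unchanged if `x0` is rescaled, since `y0h` is rescaled by the reciprocal factor.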

Kato [Kat82, p. XIII] explains that the results of Rellich for analytic perturbations of self-adjoint linear operators were extended (by Sz-Nagy, Kato and others) to non-self-adjoint linear operators and therefore non-Hermitian matrices in the early 1950s using complex function theory, so (3), equivalently (1), was known at that time. However, it seems that these results were not well known until the publication of the first edition of Kato’s book in 1966 (although Kato did present a summary of these results for the linear case at a conference on matrix computations [Giv58, p. 104] in 1958). Eq. (1) was independently obtained for the analytic case by Lancaster [Lan64], and for the linear case by Wilkinson [Wil65, p.68–69] and Lidskii [Lid66]. They all used the theory of algebraic functions to obtain their results, exploiting the property that eigenvalues are roots of the characteristic polynomial. A different technique is used by Stewart and Sun [SS90, p. 185] who show that the eigenvalue is differentiable w.r.t. its matrix argument using a proof depending on Gerschgorin circles; the result for a differentiable family $A(\tau)$ then follows from the ordinary chain rule.

We close this section with a brief discussion of multiple eigenvalues. The algebraic multiplicity of $\lambda_{0}$ is the multiplicity of the factor $\lambda-\lambda_{0}$ in the characteristic polynomial $\det(A_{0}-\lambda I_{n})$ , while the geometric multiplicity (which is always less than or equal to the algebraic multiplicity) is the number of associated linearly independent right (equivalently, left) eigenvectors. A simple eigenvalue has both algebraic and geometric multiplicity equal to one. More generally, if the algebraic and geometric multiplicity are equal, the eigenvalue is said to be semisimple or nondefective. An eigenvalue whose geometric multiplicity is one is called nonderogatory.

3 First-order perturbation theory for an eigenvector corresponding to a simple eigenvalue

We begin this section with a basic result from linear algebra; see [Ste01, Theorem 1.18 and eq. (3.10)] for a proof.

Lemma 1

Suppose Assumption 1 holds. There exist matrices $X_{1}\in\mathbb{C}^{n\times(n-1)}$ , $Y_{1}\in\mathbb{C}^{n\times(n-1)}$ and $B_{1}\in\mathbb{C}^{(n-1)\times(n-1)}$ satisfying

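$$X=[x_{0},X_{1}],\quad Y=[y_{0},Y_{1}],\quad Y^{*}X=I_{n},\quad\text{and}\quad Y^{*}A_{0}X=\left[\begin{array}{cc}\lambda_{0}&0\\ 0&B_{1}\end{array}\right]. \tag{4}$$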

Note that, from $Y^{*}X=I_{n}$ , it is immediate that the columns of $X_{1}$ and $Y_{1}$ respectively span the null spaces of $y_{0}^{*}$ and $x_{0}^{*}$ . Furthermore, we have

$$I_{n}=XY^{*}=x_{0}y_{0}^{*}+X_{1}Y_{1}^{*}.$$

We also have $A_{0}X_{1}=X_{1}B_{1}$ and $Y_{1}^{*}A_{0}=B_{1}Y_{1}^{*}$ , so the columns of $X_{1}$ and $Y_{1}$ are respectively bases for right and left ( $n-1$ )-dimensional invariant subspaces of $A_{0}$ , and $I_{n}=\Pi_{0}+\Pi_{1}$ where $\Pi_{1}=X_{1}Y_{1}^{*}$ is the complementary projector to $\Pi_{0}$ . If we assume that $A_{0}$ is diagonalizable, i.e., with $n$ linearly independent eigenvectors, then we can take the columns of $X_{1}$ and of $Y_{1}$ to respectively be right and left eigenvectors corresponding to the eigenvalues of $A_{0}$ that differ from $\lambda_{0}$ , which we may denote by $\lambda_{1},\ldots,\lambda_{n-1}$ (some of which could coincide, as diagonalizability implies only that the eigenvalues are semisimple, not that they are simple). In this case, we can take $B_{1}$ to be the diagonal matrix $\mathrm{diag}(\lambda_{1},\ldots,\lambda_{n-1})$ . More generally, however, $X_{1}$ and $Y_{1}$ may be any matrices satisfying (4), ignoring the multiplicities and Jordan structure of the other eigenvalues.
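In the diagonalizable case, the matrices of Lemma 1 can be read off from an eigendecomposition. The sketch below assumes a small test matrix `A0` (not from the paper) and builds $X$, $Y$ and $B_{1}$ by placing the eigenvector for $\lambda_{0}$ first and taking $Y^{*}=X^{-1}$.

```python
import numpy as np

A0 = np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 6.0]])  # assumed test matrix
n = A0.shape[0]

w, V = np.linalg.eig(A0)
order = np.argsort(np.abs(w - 1.0))    # put lambda_0 = 1 first
X = V[:, order]
Y = np.linalg.inv(X).conj().T          # chosen so that Y^* X = I_n

x0, X1 = X[:, 0], X[:, 1:]
y0, Y1 = Y[:, 0], Y[:, 1:]
B1 = Y1.conj().T @ A0 @ X1             # here diag(lambda_1, ..., lambda_{n-1})

block = Y.conj().T @ A0 @ X            # should equal blockdiag(lambda_0, B1)
identity_split = np.outer(x0, y0.conj()) + X1 @ Y1.conj().T

print(np.round(block, 12))
print(np.allclose(identity_split, np.eye(n)))
```

The same construction works with any invertible $X$ whose first column is $x_{0}$ and whose remaining columns span an invariant subspace; eigenvectors are used here only for convenience.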

Now let

$$S=X_{1}\big(B_{1}-\lambda_{0}I_{n-1}\big)^{-1}Y_{1}^{*}. \tag{5}$$

It then follows that

$$S\Pi_{0}=\Pi_{0}S=0\quad\text{and}\quad(A_{0}-\lambda_{0}I_{n})S=S(A_{0}-\lambda_{0}I_{n})=\Pi_{1}.$$

In the NLA stream of research, $S$ is called the group inverse of $A_{0}-\lambda_{0}I_{n}$ [SS90, p. 240–241], [MS88], [CM91], [GO11, Theorem 5.2]. In the APT research stream, it is called the reduced resolvent matrix of $A_{0}$ w.r.t. the eigenvalue $\lambda_{0}$ (see [Kat82, eqs. I.5.28 and II.2.11]).³

³ The notion of group inverse or reduced resolvent extends beyond the simple eigenvalue context to multiple eigenvalues. If $\lambda_{0}$ is a defective eigenvalue with a nontrivial Jordan structure, the reduced resolvent matrix of $A_{0}$ with respect to $\lambda_{0}$ must take account of “eigennilpotents”. It is the same as the Drazin inverse of $A_{0}-\lambda_{0}I_{n}$, a generalization of the group inverse (see [Cha12, p.98], [CM91] and, for a method to compute the Drazin inverse, [GOS15]).
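The defining properties of $S$ can be checked on a small example. This sketch assumes a diagonalizable test matrix `A0` (not from the paper), so that $B_{1}$ may be taken diagonal, and verifies both the identities above and the group-inverse relations for $M=A_{0}-\lambda_{0}I_{n}$.

```python
import numpy as np

A0 = np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 6.0]])  # assumed test matrix
n = A0.shape[0]

w, V = np.linalg.eig(A0)
order = np.argsort(np.abs(w - 1.0))    # put lambda_0 = 1 first
w, X = w[order], V[:, order]
Yh = np.linalg.inv(X)                  # Y^*

lam0 = w[0]
X1, Y1h, B1 = X[:, 1:], Yh[1:, :], np.diag(w[1:])
Pi0 = np.outer(X[:, 0], Yh[0, :])      # x0 y0^*
Pi1 = X1 @ Y1h                         # complementary projector

S = X1 @ np.linalg.inv(B1 - lam0 * np.eye(n - 1)) @ Y1h
M = A0 - lam0 * np.eye(n)

print(np.allclose(S @ Pi0, 0), np.allclose(Pi0 @ S, 0))
print(np.allclose(M @ S, Pi1), np.allclose(S @ M, Pi1))
print(np.allclose(M @ S @ M, M), np.allclose(S @ M @ S, S))   # group-inverse relations
```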

We now give a first-order perturbation theorem for right and left eigenvectors corresponding to a simple eigenvalue.

Theorem 2

(Eigenvector Perturbation Theorem) Suppose that Assumption 1 holds and define $X_{1}$, $Y_{1}$ and $B_{1}$ as in Lemma 1 and $S$ as in (5). Then there exist vector-valued functions $x(\tau)$ and $y(\tau)^{*}$ that are analytic in a neighborhood of $\tau_{0}$ with $x(\tau_{0})=x_{0}$, $y(\tau_{0})=y_{0}$ and $y(\tau)^{*}x(\tau)=1$, satisfying the right and left eigenvector equations

$$A(\tau)x(\tau)=\lambda(\tau)x(\tau),\quad y(\tau)^{*}A(\tau)=\lambda(\tau)y(\tau)^{*}$$

where $\lambda(\tau)$ is the analytic function from Theorem 1. Furthermore, these can be chosen so that their derivatives, $x^{\prime}(\tau)$ and $(y^{*})^{\prime}(\tau)$, satisfy $y_{0}^{*}x^{\prime}(\tau_{0})=0$ and $(y^{*})^{\prime}(\tau_{0})x_{0}=0$, with⁴

$$x^{\prime}(\tau_{0})=-SA^{\prime}(\tau_{0})x_{0}, \tag{7}$$

$$(y^{*})^{\prime}(\tau_{0})=-y_{0}^{*}A^{\prime}(\tau_{0})S. \tag{8}$$

⁴ We use the notation $(y^{*})^{\prime}(\tau_{0})$ to mean $\frac{d}{d\tau}(y(\tau)^{*})|_{\tau=\tau_{0}}$.

Note that it is $y(\tau)^{*}$ , not $y(\tau)$ , that is analytic with respect to the complex parameter $\tau$ . However, $y(\tau)$ is differentiable w.r.t. the real and imaginary parts of $\tau$ . Note also that we do not claim that $x(\tau)$ and $y(\tau)$ are unique, even when they are chosen to satisfy (7) and (8). Sometimes, other normalizations of the eigenvectors, not necessarily satisfying (7) and (8), are preferred, as we shall discuss in §3.4.
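One way to check the eigenvector derivative formulas numerically is sketched below. Since a numerical eigensolver returns eigenvectors with an arbitrary scaling, the computed $x(\tau)$ is rescaled so that $y_{0}^{*}x(\tau)=1$ and the computed $y(\tau)^{*}$ so that $y(\tau)^{*}x_{0}=1$; these choices enforce $y_{0}^{*}x^{\prime}(\tau_{0})=0$ and $(y^{*})^{\prime}(\tau_{0})x_{0}=0$, and give $y(\tau)^{*}x(\tau)=1$ only to first order, which is enough for a first-derivative check. The matrices `A0` and `E`, the linear family $A(\tau)=A_{0}+\tau E$ with $\tau_{0}=0$, and the helper names are assumptions for illustration.

```python
import numpy as np

# Illustrative data (an assumption): A(tau) = A0 + tau*E with tau_0 = 0.
A0 = np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 6.0]])
E = np.array([[0.5, -1.0, 2.0], [1.0, 0.0, 1.0], [0.0, 3.0, -2.0]])

def eig_split(A, target):
    """Eigenvalues, X and Y^* = X^{-1}, ordered so the eigenvalue nearest target is first."""
    w, V = np.linalg.eig(A)
    order = np.argsort(np.abs(w - target))
    X = V[:, order]
    return w[order], X, np.linalg.inv(X)

w, X, Yh = eig_split(A0, 1.0)
lam0, x0, y0h = w[0], X[:, 0], Yh[0, :]
S = X[:, 1:] @ np.diag(1.0 / (w[1:] - lam0)) @ Yh[1:, :]   # B1 diagonal in this example

dx_formula = -S @ E @ x0               # right eigenvector derivative
dyh_formula = -y0h @ E @ S             # derivative of the row vector y^*

def normalized_pair(tau):
    """x(tau) with y0^* x(tau) = 1 and y(tau)^* with y(tau)^* x0 = 1."""
    _, Xt, Yht = eig_split(A0 + tau * E, lam0)
    x, yh = Xt[:, 0], Yht[0, :]
    return x / (y0h @ x), yh / (yh @ x0)

h = 1e-6
(xp, yp), (xm, ym) = normalized_pair(h), normalized_pair(-h)
dx_fd = (xp - xm) / (2 * h)
dyh_fd = (yp - ym) / (2 * h)

print(np.linalg.norm(dx_fd - dx_formula), np.linalg.norm(dyh_fd - dyh_formula))
```

A different normalization of the computed eigenvectors would change the finite-difference result by a multiple of $x_{0}$ (respectively $y_{0}^{*}$), which is why the rescaling step matters.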

It follows from Theorem 2 that

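$$\frac{\|x^{\prime}(\tau_{0})\|}{\|x_{0}\|}\leq\kappa(X)\,\|(\lambda_{0}I_{n-1}-B_{1})^{-1}\|\,\|A^{\prime}(\tau_{0})\|,$$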

where $\kappa(X)=\kappa(Y)=\|X\|\|Y\|$ , the ordinary matrix condition number of $X$ , equivalently of $Y$ (as $Y^{*}=X^{-1}$ ), with the same bound also holding for $\|(y^{*})^{\prime}(\tau_{0})\|/\|y_{0}\|$ .
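The bound can be evaluated on a concrete example. This is a sketch with an assumed test matrix `A0` and perturbation direction `E` (not from the paper), taking $X$ to be an eigenvector matrix with $x_{0}$ in its first column.

```python
import numpy as np

# Illustrative data (an assumption): A(tau) = A0 + tau*E with tau_0 = 0.
A0 = np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 6.0]])
E = np.array([[0.5, -1.0, 2.0], [1.0, 0.0, 1.0], [0.0, 3.0, -2.0]])
n = A0.shape[0]

w, V = np.linalg.eig(A0)
order = np.argsort(np.abs(w - 1.0))    # put lambda_0 = 1 first
w, X = w[order], V[:, order]
Yh = np.linalg.inv(X)                  # Y^*
lam0, x0 = w[0], X[:, 0]
B1 = np.diag(w[1:])

R = np.linalg.inv(lam0 * np.eye(n - 1) - B1)
S = -X[:, 1:] @ R @ Yh[1:, :]          # S = X1 (B1 - lam0 I)^{-1} Y1^*
dx = -S @ E @ x0                       # x'(tau_0)

lhs = np.linalg.norm(dx) / np.linalg.norm(x0)
kappa = np.linalg.norm(X, 2) * np.linalg.norm(Yh, 2)
rhs = kappa * np.linalg.norm(R, 2) * np.linalg.norm(E, 2)

print(lhs, rhs)
```

The bound is typically far from sharp, since it replaces $\|X_{1}\|\|Y_{1}\|$ by $\kappa(X)$ and ignores the direction of $A^{\prime}(\tau_{0})x_{0}$.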

In the diagonalizable case, as already noted above, we can take $B_{1}=\mathrm{diag}(\lambda_{1},\ldots,\lambda_{n-1})$ , so

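$$\frac{\|x^{\prime}(\tau_{0})\|}{\|x_{0}\|}\leq\frac{\kappa(X)\,\|A^{\prime}(\tau_{0})\|}{\min_{j=1,\dots,n-1}|\lambda_{0}-\lambda_{j}|}$$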

with the same bound also holding for $\|(y^{*})^{\prime}(\tau_{0})\|/\|y_{0}\|$ . In this case, the formula (7) for the eigenvector derivative was given by Wilkinson [Wil65, p. 70]. He remarked (p. 109) that although his derivation is essentially classical perturbation theory, a simple but rigorous treatment did not seem to be readily available in the literature. Lancaster [Lan64] and Lidskii [Lid66] both showed that the perturbed eigenvector corresponding to a simple eigenvalue $\lambda_{0}$ may be defined to be differentiable at $\lambda_{0}$ , but they did not give the first-order perturbation term. The books by Stewart and Sun [SS90, sec. V.2] and Stewart [Ste01, sec. 1.3 and 4.2] give excellent discussions of the issues summarized above as well as many additional related results. The eigenvector derivative formula (7) in Theorem 2 above is succinctly stated just below [Ste01, eq. (3.14), p. 46], where on the same page Theorem 3.11 stating it more rigorously, and providing additional bounds, is also given; see also [Ste01, line 4, p. 48]. The reader is referred to [Ste71] and [Ste73] for a proof. Stewart [Ste71] introduced the idea of establishing the existence of a solution to an algebraic Riccati equation by a fixed point iteration, a technique that was followed up in [Ste73, eq. (1.5), p. 730] and [Dem86, eq. (7.2), p.187]. Alternatively, proofs of Theorem 2 may be derived by various approaches based on the implicit function theorem; see [Mag85, Sun85] and [Sun98, Sec. 2.1]. A related argument appears in [Lax07, Theorem 9.8]. These approaches generally focus on obtaining results for the right eigenvector subject to some normalization; they can also be applied to obtain results for the left eigenvector, and these can be normalized further to obtain the condition $y(\tau)^{*}x(\tau)=1$ . 
The proof that we give in §3.2 is also based on the implicit function theorem, using a block-diagonalization approach that obtains the perturbation results for the right and left eigenvectors simultaneously, ensuring that $y(\tau)^{*}x(\tau)=1$ . Note, however, that a fundamental difficulty with eigenvectors is their lack of uniqueness. In contrast, the eigenprojector is uniquely defined, and satisfies the following perturbation theorem.

Theorem 3

(Eigenprojector Perturbation Theorem) Suppose Assumption 1 holds and define $X_{1}$, $Y_{1}$ and $B_{1}$ as in Lemma 1 and $S$ as in (5). Then there exists a matrix-valued function $\Pi(\tau)$ that is analytic in a neighborhood of $\tau_{0}$ with $\Pi(\tau_{0})=\Pi_{0}$, satisfying the eigenprojector equations

$$A(\tau)\Pi(\tau)=\lambda(\tau)\Pi(\tau)=\Pi(\tau)A(\tau)\quad\text{and}\quad\Pi(\tau)^{2}=\Pi(\tau),$$

and with derivative given by

$$\Pi^{\prime}(\tau_{0})=-\Pi_{0}A^{\prime}(\tau_{0})S-SA^{\prime}(\tau_{0})\Pi_{0}. \tag{10}$$

This result is well known in the APT research stream [Kat82, eq. (II.2.13)], and, like the eigenvalue perturbation result, goes back to the 1950s. Furthermore, while it’s easy to see how Theorem 3 can be proved using Theorem 2, it is also the case that Theorem 2 can be proved using Theorem 3, by defining the eigenvectors appropriately in terms of the eigenprojector, as discussed below. This provides a convenient way to define eigenvectors uniquely.
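Because the eigenprojector does not depend on how the eigenvectors are scaled, its derivative can be checked by finite differences without any renormalization step. The sketch below assumes a linear family $A(\tau)=A_{0}+\tau E$ with $\tau_{0}=0$; `A0`, `E` and the helper `projector` are illustrative and not from the paper.

```python
import numpy as np

# Illustrative data (an assumption): A(tau) = A0 + tau*E with tau_0 = 0.
A0 = np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 6.0]])
E = np.array([[0.5, -1.0, 2.0], [1.0, 0.0, 1.0], [0.0, 3.0, -2.0]])
lam0 = 1.0

def projector(tau):
    """Eigenprojector x y^* of A0 + tau*E for the eigenvalue nearest lam0."""
    w, V = np.linalg.eig(A0 + tau * E)
    k = np.argmin(np.abs(w - lam0))
    return np.outer(V[:, k], np.linalg.inv(V)[k, :])

w, V = np.linalg.eig(A0)
Vinv = np.linalg.inv(V)
k = np.argmin(np.abs(w - lam0))
rest = [j for j in range(len(w)) if j != k]
Pi0 = np.outer(V[:, k], Vinv[k, :])
S = V[:, rest] @ np.diag(1.0 / (w[rest] - w[k])) @ Vinv[rest, :]

dPi_formula = -Pi0 @ E @ S - S @ E @ Pi0
h = 1e-6
dPi_fd = (projector(h) - projector(-h)) / (2 * h)

print(np.linalg.norm(dPi_fd - dPi_formula))
```

Since $\mathrm{tr}\,\Pi(\tau)=1$ for all $\tau$ near $\tau_{0}$, the derivative has zero trace, which gives a cheap additional sanity check.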

We note that Theorems 1, 2 and 3 simplify significantly when $A(\tau)$ is a Hermitian function of a real parameter $\tau$ , because then the right and left eigenvectors coincide. The results for the Hermitian case lead naturally to perturbation theory for singular values and singular vectors of a general rectangular matrix; see [Ste01, Sec. 3.3.1] and [Sun98, Sec. 3.1].

3.1 Nonrigorous derivation of the formulas in Theorems 1, 2, and 3

If we assume that for $\tau$ in some neighborhood of $\tau_{0}$ , the matrix $A(\tau)$ has an eigenvalue $\lambda(\tau)$ and corresponding right and left eigenvectors $x(\tau)$ and $y(\tau)$ with $y(\tau)^{*}x(\tau)=1$ such that $\lambda(\tau)$ , $x(\tau)$ and $y(\tau)^{*}$ are all analytic functions of $\tau$ satisfying $\lambda(\tau_{0})=\lambda_{0}$ , $x(\tau_{0})=x_{0}$ , and $y(\tau_{0})=y_{0}$ , then differentiating the equation $A(\tau)x(\tau)=\lambda(\tau)x(\tau)$ and setting $\tau=\tau_{0}$ , we find

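$$A^{\prime}(\tau_{0})x_{0}+A_{0}x^{\prime}(\tau_{0})=\lambda^{\prime}(\tau_{0})x_{0}+\lambda_{0}x^{\prime}(\tau_{0}). \tag{11}$$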

Multiplying on the left by $y_{0}^{*}$ and using $y_{0}^{*}A_{0}=\lambda_{0}y_{0}^{*}$ and $y_{0}^{*}x_{0}=1$ , we obtain the formula for $\lambda^{\prime}(\tau_{0})$ :

$$\lambda^{\prime}(\tau_{0})=y_{0}^{*}A^{\prime}(\tau_{0})x_{0}.$$

Equation (11) can be written in the form

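$$(A^{\prime}(\tau_{0})-\lambda^{\prime}(\tau_{0})I_{n})x_{0}=-(A_{0}-\lambda_{0}I_{n})x^{\prime}(\tau_{0}). \tag{12}$$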

Using Lemma 1, we can write

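$$A_{0}-\lambda_{0}I_{n}=X\left[\begin{array}{cc}0&0\\ 0&B_{1}-\lambda_{0}I_{n-1}\end{array}\right]Y^{*},$$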

and substituting this into (12) and multiplying on the left by $Y^{*}$ , we find

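$$Y^{*}(A^{\prime}(\tau_{0})-\lambda^{\prime}(\tau_{0})I_{n})x_{0}=-\left[\begin{array}{cc}0&0\\ 0&B_{1}-\lambda_{0}I_{n-1}\end{array}\right]Y^{*}x^{\prime}(\tau_{0}).$$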

The first row equation here is $y_{0}^{*}(A^{\prime}(\tau_{0})-\lambda^{\prime}(\tau_{0})I_{n})x_{0}=0$ , which is simply the formula for $\lambda^{\prime}(\tau_{0})$ . The remaining $n-1$ equations are

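$$Y_{1}^{*}(A^{\prime}(\tau_{0})-\lambda^{\prime}(\tau_{0})I_{n})x_{0}=-(B_{1}-\lambda_{0}I_{n-1})Y_{1}^{*}x^{\prime}(\tau_{0}),$$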

and since $Y_{1}^{*}x_{0}=0$ and $B_{1}-\lambda_{0}I_{n-1}$ is invertible, we obtain the following formula for $Y_{1}^{*}x^{\prime}(\tau_{0})$ :

$$Y_{1}^{*}x^{\prime}(\tau_{0})=-(B_{1}-\lambda_{0}I_{n-1})^{-1}Y_{1}^{*}A^{\prime}(\tau_{0})x_{0}.$$

Note that $x^{\prime}(\tau_{0})$ is not completely determined by this formula because each eigenvector $x(\tau)$ is determined only up to a multiplicative constant. If we can choose the scale factor in such a way that $y_{0}^{*}x^{\prime}(\tau_{0})=0$ then, multiplying on the left by $X_{1}$ and recalling that $X_{1}Y_{1}^{*}=I_{n}-x_{0}y_{0}^{*}$ , we obtain the formula in (7) for $x^{\prime}(\tau_{0})$ :

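$$x^{\prime}(\tau_{0})=-X_{1}(B_{1}-\lambda_{0}I_{n-1})^{-1}Y_{1}^{*}A^{\prime}(\tau_{0})x_{0}=-SA^{\prime}(\tau_{0})x_{0}.$$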

Similarly, the formula (8) for $(y^{*})^{\prime}(\tau_{0})$ can be derived assuming that we can choose $y(\tau)$ so that $(y^{*})^{\prime}(\tau_{0})x_{0}=0$ .
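The role of the scale factor in this derivation can be seen numerically. In the sketch below, with an assumed test matrix `A0` and direction `E` (not from the paper), the vector $-SA^{\prime}(\tau_{0})x_{0}$ satisfies the differentiated eigenvector equation, but so does the same vector shifted by any multiple of $x_{0}$; only the unshifted one satisfies $y_{0}^{*}x^{\prime}(\tau_{0})=0$.

```python
import numpy as np

# Illustrative data (an assumption): A(tau) = A0 + tau*E with tau_0 = 0.
A0 = np.array([[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 6.0]])
E = np.array([[0.5, -1.0, 2.0], [1.0, 0.0, 1.0], [0.0, 3.0, -2.0]])
n = A0.shape[0]

w, V = np.linalg.eig(A0)
order = np.argsort(np.abs(w - 1.0))    # put lambda_0 = 1 first
w, X = w[order], V[:, order]
Yh = np.linalg.inv(X)
lam0, x0, y0h = w[0], X[:, 0], Yh[0, :]
S = X[:, 1:] @ np.diag(1.0 / (w[1:] - lam0)) @ Yh[1:, :]

dlam = y0h @ E @ x0                    # lambda'(tau_0)
dx = -S @ E @ x0                       # candidate for x'(tau_0)

def residual(v):
    """Residual of (A' - lambda' I) x0 + (A0 - lambda_0 I) v = 0."""
    return (E - dlam * np.eye(n)) @ x0 + (A0 - lam0 * np.eye(n)) @ v

dx_shifted = dx + 2.0 * x0             # also solves the equation, different scaling
print(np.linalg.norm(residual(dx)), np.linalg.norm(residual(dx_shifted)))
print(y0h @ dx, y0h @ dx_shifted)      # the normalization condition separates the two
```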

Once formulas (7) and (8) are established, formula (10) follows immediately from

$$[x(\tau)y(\tau)^{*}]^{\prime}=x^{\prime}(\tau)y(\tau)^{*}+x(\tau)[y(\tau)^{*}]^{\prime}.$$

Evaluating at $\tau=\tau_{0}$ and using formulas (7) and (8) for $x^{\prime}(\tau_{0})$ and $(y^{*})^{\prime}(\tau_{0})$, we obtain formula (10) for $\Pi^{\prime}(\tau_{0})$.

In the following subsections, we establish the assumptions used here when $\lambda_{0}$ is a simple eigenvalue of $A_{0}$, and thus obtain proofs of Theorems 1, 2, and 3, in two different ways. The first involves finding equations that a similarity transformation must satisfy if it is to take $A(\tau)$ (or, more specifically, $Y^{*}A(\tau)X$) to a block diagonal form like that in Lemma 1 for $A_{0}$. The implicit function theorem⁵ is then invoked to show that these equations have a unique solution, for $\tau$ in some neighborhood of $\tau_{0}$, and that the solution is analytic in $\tau$. The second uses the argument principle and the residue theorem from complex analysis to establish that, for $\tau$ in a neighborhood of $\tau_{0}$, each matrix $A(\tau)$ has a simple eigenvalue $\lambda(\tau)$ that is analytic in $\tau$ and satisfies $\lambda(\tau_{0})=\lambda_{0}$. It then follows from Lemma 1 that there is a similarity transformation taking $A(\tau)$ to block diagonal form, but Lemma 1 says nothing about analyticity or even continuity of the associated matrices $X(\tau)$ and $Y(\tau)^{*}$. Instead, the similarity transformation is applied to the resolvent and integrated to obtain an expression for the eigenprojector $\Pi(\tau)$ that is shown to be analytic in $\tau$. Finally, left and right eigenvectors satisfying the analyticity conditions along with the derivative formulas (7) and (8) are defined in terms of the eigenprojector.

⁵ Since the perturbation parameter $\tau$ and the matrix family $A(\tau)$ are complex, we need a version of the implicit function theorem from complex analysis, but in the special case that $\tau$ and $A(\tau)$ are real, we could use a more familiar version from real analysis. In that case, although some of the eigenvalues and eigenvectors of a real matrix may be complex, they occur in complex conjugate pairs and are easily represented using real quantities.

Note that the assumptions used here do not generally hold when $\lambda_{0}$ is not a simple eigenvalue of $A_{0}$ , as discussed in §4.

3.2 First proof of Theorems 1, 2 and 3, using techniques from the NLA research stream

The first proof that we give is inspired by the NLA research stream, but instead of Stewart’s fixed-point iteration technique mentioned previously, we rely on the implicit function theorem [Kra01, Theorem 1.4.11], which we now state in the form that we need.

Theorem 4

(Implicit Function Theorem) Let $D\subset\mathbb{C}\times\mathbb{C}^{\ell}$ be an open set, $h=(h_{1},\ldots,h_{\ell}):D\rightarrow\mathbb{C}^{\ell}$ an analytic mapping, and $(\tau_{0},z^{0})\in D$ a point where $h(\tau_{0},z^{0})=0$ and where the Jacobian matrix $\left(\frac{\partial h_{j}}{\partial z_{k}}\right)_{j,k=1}^{\ell}$ is nonsingular. Then the system of equations $h(\tau,z)=0$ has a unique analytic solution $z=z(\tau)$ in a neighborhood of $\tau_{0}$ that satisfies $z(\tau_{0})=z^{0}$.

We now exploit this result in our proof of Theorem 2. The setting of the stage before applying the implicit function theorem follows Demmel’s variant of Stewart’s derivation mentioned above. We obtain a proof of Theorem 1 along the way, and then give a proof of Theorem 3 as an easy consequence.

Using Lemma 1, define

[TABLE]

Here the scalar $\gamma_{11}$ , the row and column vectors $c_{12}^{*}$ and $c_{21}$ and the $(n-1)\times(n-1)$ matrix $C_{22}$ are analytic functions of $\tau$ near $\tau_{0}$ , since $A(\tau)$ is. In what follows, we will transform this matrix into a block diagonal matrix by a similarity transformation. We will choose $p(\tau)$ , $q(\tau)$ so that

[TABLE]

with $p(\tau)$ and $q(\tau)^{*}$ , and consequently $P(\tau)$ , $Q(\tau)^{*}$ , and $D(\tau)$ , analytic in a neighborhood of $\tau_{0}$ , with $p(\tau_{0})=q(\tau_{0})=0$ , and hence $P(\tau_{0})=Q(\tau_{0})=D(\tau_{0})=I_{n}$ . This transformation idea traces back to [Ste73, p.730] who designed a $P$ with $p=q$ , but in the form given here, it is due to [Dem86, p.187].

We would like to have, for $\tau$ sufficiently close to $\tau_{0}$ , the similarity transformation

[TABLE]

where $\lambda(\tau)$ and $B(\tau)$ are also analytic, with $\lambda(\tau_{0})=\lambda_{0}$ and $B(\tau_{0})=B_{1}$ . Since $D(\tau)$ is block diagonal by definition, we need $Q(\tau)^{*}Y^{*}A(\tau)XP(\tau)$ to be block diagonal. Suppressing the dependence on $\tau$ , this last matrix is given by

[TABLE]

For clarity, we introduce the notation $w(\tau)$ for the analytic row vector function $q(\tau)^{*}$ . We then seek column and row vector analytic functions $p(\tau)$ and $w(\tau)$ making the off-diagonal blocks of (15) zero, i.e., satisfying

[TABLE]

Taking $\ell=n-1$ , $z^{0}=0$ , and $h$ equal to first $f$ and then $g$ with $s$ equal to $p$ and $w$ , respectively, in Theorem 4, we note that since $f(\tau_{0},0)=0$ , $g(\tau_{0},0)=0$ , and the Jacobian matrices

[TABLE]

are nonsingular, there are unique functions $p$ and $w$ , analytic in a neighborhood of $\tau_{0}$ , satisfying (16) and (17) with $p(\tau_{0})=0$ , $w(\tau_{0})=0$ , and

[TABLE]

where, using the definition (13), we have

[TABLE]

Thus, (15) is block diagonal and hence (14) holds, with $\lambda(\tau)$ an eigenvalue of $A(\tau)$ , and with

[TABLE]

again suppressing dependence on $\tau$ on the right-hand sides. These functions are analytic in a neighborhood of $\tau_{0}$ , satisfying $\lambda(\tau_{0})=\lambda_{0}$ and $B(\tau_{0})=B_{1}$ , with

[TABLE]

proving Theorem 1.

Let $e_{1}$ be the first column of the identity matrix $I_{n}$ . Multiplying (14) on the left by $XQ(\tau)^{-*}$ and on the right by $e_{1}$ we obtain, using $Y^{*}X=I$ and $D(\tau)=Q(\tau)^{*}P(\tau)$ ,

[TABLE]

so

[TABLE]

is analytic with $x(\tau_{0})=x_{0}$ and satisfies the desired right eigenvector equation in (6). Likewise, multiplying (14) on the right by $D(\tau)P(\tau)^{-1}Y^{*}$ and on the left by $e_{1}^{\operatorname{T}}$ gives

[TABLE]

so

[TABLE]

is analytic with $y(\tau_{0})=y_{0}$ and satisfies the left eigenvector equation in (6). Furthermore, $y(\tau)^{*}x(\tau)=1$ , as claimed. Finally, differentiating $x(\tau)$ and $y(\tau)^{*}$ we have

[TABLE]

so combining these with (18) and (19), recalling that $w(\tau)=q(\tau)^{*}$ , and using the definition of $S$ in (5), we obtain the eigenvector derivative formulas (7), (8). The properties $y_{0}^{*}x^{\prime}(\tau_{0})=0$ and $(y^{*})^{\prime}(\tau_{0})x_{0}=0$ follow, so Theorem 2 is proved.

Finally, define

[TABLE]

The eigenprojector equations (9) follow immediately. We have

[TABLE]

so Theorem 3 follows from (7) and (8).

$\Box$

3.3 Second proof of Theorems 1, 2 and 3, using techniques from the APT research stream

In this proof, in contrast to the previous one, we focus on proving Theorem 3 first, obtaining the proof of Theorem 1 along the way, and finally obtaining Theorem 2 as a consequence. This proof of Theorem 3 is based on complex function theory, as is standard in APT. However, our derivation is simpler than most of those given in the literature, which usually prove more general results, such as giving complete analytic expansions for the eigenvalue and eigenprojector, while we are concerned only with the first-order term. The key to the last part of the proof, yielding Theorem 2, is to use an appropriate eigenvector normalization.

The main tool here is the residue theorem [MH06, p. 293-294, Thm. 8.1 and 8.2]:

Theorem 5

(Residue Theorem) Let $D$ be a simply connected domain in $\mathbb{C}$ and let $\Gamma$ be a simple closed positively oriented contour that lies in $D$ . If $f$ is analytic inside $\Gamma$ and on $\Gamma$ , except at the points $\zeta_{1},\ldots,\zeta_{m}$ that lie inside $\Gamma$ , then

[TABLE]

where if $f$ has a simple pole at $\zeta_{\ell}$ , then

[TABLE]

and if $f$ has a pole of order $k$ at $\zeta_{\ell}$ , then

[TABLE]

Let $\Gamma$ be the boundary of an open set $\Delta$ in the complex plane containing the simple eigenvalue $\lambda_{0}$ , with no other eigenvalues of $A_{0}$ in $\Delta\cup\Gamma$ . First note that since $\det(A_{0}-\lambda I_{n})$ , the characteristic polynomial $p_{0}$ of $A_{0}$ , does not vanish on $\Gamma$ , the same will hold for all polynomials with coefficients sufficiently close to those of $p_{0}$ ; in particular, it will hold for $\det(A(\tau)-\lambda I_{n})$ , the characteristic polynomial $p_{\tau}$ of $A(\tau)$ , if $\tau$ is sufficiently close to $\tau_{0}$ ; say, $|\tau-\tau_{0}|\leq\epsilon$ . From here on, we always assume that $|\tau-\tau_{0}|\leq\epsilon$ . By the argument principle [MH06, p. 328, Thm. 8.8], the number of zeros of $p_{\tau}$ inside $\Delta$ is

[TABLE]

For $\tau=\tau_{0}$ , this value is $1$ . Since for each $\zeta\in\Gamma$ , the integrand $\frac{d}{d\zeta}p_{\tau}(\zeta)/p_{\tau}(\zeta)$ is a continuous function of $\tau$ , the integral above is as well. Since it is integer-valued, it must be the constant $1$ . So, let $\lambda(\tau)$ denote the unique root of $p_{\tau}$ in the region $\Delta$ , i.e., the unique eigenvalue of $A(\tau)$ in $\Delta$ . Note that this means that $p_{\tau}(\zeta)$ can be written in the form $(\zeta-\lambda(\tau))q(\zeta)$ , where $q$ has no roots in $\Delta\cup\Gamma$ . It therefore follows from the residue theorem that

[TABLE]

since

[TABLE]

Since the left-hand side of (20) is an analytic function of $\tau$ , the right-hand side is as well. Thus $A(\tau)$ has a unique eigenvalue $\lambda(\tau)$ in $\Delta$ and $\lambda(\tau)$ is an analytic function of $\tau$ .
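The counting argument above can be tried numerically. The following is a minimal sketch, not part of the original proof: it assumes a hypothetical $2\times 2$ family $A(\tau)=A_{0}+\tau\,\Delta A$ with $\tau_{0}=0$, takes $\Gamma$ to be a circle around $\lambda_{0}=1$, and approximates the contour integrals by the trapezoidal rule. The second integral, $\frac{1}{2\pi i}\oint_{\Gamma}\zeta\,p_{\tau}^{\prime}(\zeta)/p_{\tau}(\zeta)\,d\zeta$, is the standard residue-theorem expression for the eigenvalue inside $\Gamma$; we assume it is of the same kind as the identity used in the text. All function and variable names are illustrative.

```python
import cmath
import math

def char_poly(a, zeta):
    """p(zeta) = det(A - zeta*I) for a 2x2 matrix A given as nested lists."""
    return (a[0][0] - zeta) * (a[1][1] - zeta) - a[0][1] * a[1][0]

def char_poly_deriv(a, zeta):
    """dp/dzeta for the 2x2 characteristic polynomial above."""
    return 2 * zeta - (a[0][0] + a[1][1])

def contour_integrals(a, center, radius, m=2000):
    """Trapezoidal approximations, on a positively oriented circle, of
    (1/(2 pi i)) * oint p'/p dzeta        (number of eigenvalues inside) and
    (1/(2 pi i)) * oint zeta p'/p dzeta   (sum of the eigenvalues inside)."""
    count = 0j
    moment = 0j
    for k in range(m):
        w = cmath.exp(2j * math.pi * k / m)
        zeta = center + radius * w
        dzeta = 1j * radius * w * (2 * math.pi / m)
        ratio = char_poly_deriv(a, zeta) / char_poly(a, zeta)
        count += ratio * dzeta
        moment += zeta * ratio * dzeta
    return count / (2j * math.pi), moment / (2j * math.pi)

# Hypothetical example family A(tau) = A0 + tau * dA, with tau0 = 0.
a0 = [[1.0, 2.0], [0.0, 3.0]]
da = [[0.0, 1.0], [1.0, 0.0]]
tau = 0.05
a_tau = [[a0[i][j] + tau * da[i][j] for j in range(2)] for i in range(2)]

# Circle of radius 0.5 around lambda0 = 1; the other eigenvalue (near 3) is outside.
count, moment = contour_integrals(a_tau, center=1.0, radius=0.5)
print(count.real, moment.real)
```

For this particular family the eigenvalues of $A(0.05)$ are $0.95$ and $3.05$, so the first integral should be close to $1$ and the second close to $0.95$.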

For $\zeta$ not an eigenvalue of $A(\tau)$ , define the resolvent of $A(\tau)$ by

[TABLE]

Lemma 1 states that there exist left and right eigenvectors $y_{0}(\tau)$ and $x_{0}(\tau)$ associated with $\lambda(\tau)$ and satisfying $y_{0}(\tau)^{*}x_{0}(\tau)=1$ , along with matrices $X_{1}(\tau)\in\mathbb{C}^{n\times(n-1)}$ , $Y_{1}(\tau)\in\mathbb{C}^{n\times(n-1)}$ and $B_{1}(\tau)\in\mathbb{C}^{(n-1)\times(n-1)}$ , satisfying

[TABLE]

Note that we do not claim that $X(\tau)$ and $Y(\tau)^{*}$ are analytic, or even continuous functions of $\tau$ . It follows that the resolvent of $A(\tau)$ satisfies

[TABLE]

where $S(\zeta;\tau)=X_{1}(\tau)(B_{1}(\tau)-\zeta I_{n-1})^{-1}Y_{1}(\tau)^{*}$ . Now, $S(\zeta;\tau)$ is a matrix-valued function of $\zeta$ with no poles in $\Delta$ , so it follows from the residue theorem (applied to the functions associated with each entry of $S(\zeta;\tau)$ ) that $\int_{\Gamma}S(\zeta;\tau)\,d\zeta=0$ and therefore from (25) that

[TABLE]

For $\tau=\tau_{0}$ , this is $\Pi(\tau_{0})=\Pi_{0}$ .
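As a hedged illustration of the contour-integral representation of the eigenprojector, the sketch below integrates the resolvent of a hypothetical $2\times 2$ matrix around a circle enclosing one simple eigenvalue. It assumes the sign convention $R(\zeta;A)=(A-\zeta I)^{-1}$, which is what the residue computations in the text suggest, so that $\Pi=-\frac{1}{2\pi i}\oint_{\Gamma}R(\zeta;A)\,d\zeta$. The helper names are not from the paper.

```python
import cmath
import math

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def resolvent(a, zeta):
    """R(zeta; A) = (A - zeta*I)^{-1} (assumed sign convention)."""
    return inverse_2x2([[a[0][0] - zeta, a[0][1]],
                        [a[1][0], a[1][1] - zeta]])

def eigenprojector(a, center, radius, m=2000):
    """Pi = -(1/(2 pi i)) * oint R(zeta; A) dzeta, trapezoidal rule on a circle."""
    total = [[0j, 0j], [0j, 0j]]
    for k in range(m):
        w = cmath.exp(2j * math.pi * k / m)
        zeta = center + radius * w
        dzeta = 1j * radius * w * (2 * math.pi / m)
        r = resolvent(a, zeta)
        for i in range(2):
            for j in range(2):
                total[i][j] += r[i][j] * dzeta
    scale = -1 / (2j * math.pi)
    return [[scale * total[i][j] for j in range(2)] for i in range(2)]

# Hypothetical A0 with simple eigenvalue lambda0 = 1, x0 = [1, 0]^T, y0^* = [1, -1].
a0 = [[1.0, 2.0], [0.0, 3.0]]
pi0 = eigenprojector(a0, center=1.0, radius=0.5)
print(pi0)
```

For this matrix, $x_{0}y_{0}^{*}$ is the matrix with rows $[1,\,-1]$ and $[0,\,0]$, and the computed integral should agree with it to roughly rounding error.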

From the definition of the resolvent (21), it follows that since $A(\tau)$ is an analytic function of $\tau$ , the resolvent $R(\zeta;A(\tau))$ is as well, provided that $\zeta$ is not an eigenvalue of $A(\tau)$ . Differentiating the equation

[TABLE]

with respect to $\tau$ gives

[TABLE]

Considering expression (26) for $\Pi(\tau)$ , it follows that $\Pi(\tau)$ is also analytic, and its derivative at $\tau=\tau_{0}$ is

[TABLE]

Using (25), and writing $S_{0}(\zeta)$ for $S(\zeta;\tau_{0})$ , we obtain

[TABLE]

From the residue theorem, the first term is zero since the integrand has a pole of order 2 at $\lambda_{0}$ , with $\mathrm{Res}[(\lambda_{0}-\zeta)^{-2},\lambda_{0}]=0$ . The second term is also zero because the integrand has no poles inside $\Gamma$ . Since the integrand of the remaining term has a simple pole at $\lambda_{0}$ with $\mathrm{Res}[(\lambda_{0}-\zeta)^{-1},\lambda_{0}]=-1$ , we have

[TABLE]

where $S_{0}(\lambda_{0})=S(\lambda_{0};\tau_{0})=X_{1}(\tau_{0})(B_{1}(\tau_{0})-\lambda_{0}I)^{-1}Y_{1}(\tau_{0})^{*}$ is the same as $S$ defined in (5). This proves Theorem 3.

Now we define the eigenvectors in terms of the eigenprojector, using a normalization that goes back to [SN51] (see [Bau85, eq. (7.1.12)]):

[TABLE]

where we use the principal branch of the square root function (and assume that $\epsilon$ is small enough so that $|\tau-\tau_{0}|\leq\epsilon$ implies that the quantities under the square roots, which are $1$ for $\tau=\tau_{0}$ , are bounded away from the origin and the negative real axis). (Footnote 6: See also [Kat82, eq. (II.3.24)], which uses a related definition but without the square root, resulting in $y(\tau)^{*}x_{0}=y_{0}^{*}x(\tau)=1$ instead of $y(\tau)^{*}x(\tau)=1$ .) Since $\Pi(\tau)$ is analytic, it follows that $x(\tau)$ and $y(\tau)^{*}$ are as well. From (28) we have

[TABLE]

and similarly $y(\tau)^{*}A(\tau)=\lambda(\tau)y(\tau)^{*}$ , so the eigenvector equations (6) hold as required, and, since $\Pi(\tau)^{2}=\Pi(\tau)$ , we obtain

[TABLE]

as claimed in Theorem 2.

To obtain the eigenvalue derivative, we differentiate the equation

[TABLE]

and evaluate it at $\tau=\tau_{0}$ :

[TABLE]

Multiplying by $y_{0}^{*}$ on the left, this becomes

[TABLE]

proving Theorem 1.

Finally, using (10), we have

[TABLE]

The first three terms are zero since $Sx_{0}=-X_{1}(\lambda_{0}I-B_{1})^{-1}Y_{1}^{*}x_{0}=0$ and likewise $y_{0}^{*}S=0$ . So, as $\Pi_{0}x_{0}=x_{0}$ , we have

[TABLE]

Similarly, $(y^{*})^{\prime}(\tau_{0})=-y_{0}^{*}A^{\prime}(\tau_{0})S$ . The properties $y_{0}^{*}x^{\prime}(\tau_{0})=0$ and $(y^{*})^{\prime}(\tau_{0})x_{0}=0$ follow, so the proof of Theorem 2 is complete.

$\Box$

3.4 Eigenvector normalizations

Theorem 2 does not state formulas for the analytic eigenvector functions $x(\tau)$ and $y(\tau)^{*}$ , specifying only that they exist, satisfying $y(\tau)^{*}x(\tau)=1$ , with derivatives given by (7) and (8). Furthermore, the first proof given in §3.2 does not provide formulas for $x(\tau)$ , $y(\tau)$ , showing only that they exist via the implicit function theorem. However, the second proof given in §3.3 does provide formulas for $x(\tau)$ and $y(\tau)$ in terms of the eigenprojector $\Pi(\tau)$ (which is uniquely defined) and the eigenvectors $x_{0}$ and $y_{0}$ . This formula (28) may be viewed as a normalization because it provides a way to define the eigenvectors uniquely, and furthermore, it has the property that $x(\tau)$ and $y(\tau)^{*}$ are analytic near $\tau_{0}$ and satisfy $y(\tau)^{*}x(\tau)=1$ . Let us refer to (28) as “normalization 0”.

Although the beautifully simple normalization (28) dates to the 1950s, it seems to be rarely used. In this subsection we discuss some other normalizations that are more commonly used in practice. Let us denote the resulting normalized eigenvectors by $\hat{x}(\tau)$ and $\hat{y}(\tau)$ , and relate them to $x(\tau)$ and $y(\tau)$ , as defined in (28), by

[TABLE]

where $\alpha(\tau)$ and $\beta(\tau)$ are two nonzero complex-valued scalar functions of $\tau$ to be defined below. Here we use the complex conjugate of $\beta$ in the definition to be consistent with the conjugated left eigenvector notation. The analyticity of $\hat{x}(\tau)$ and $\hat{y}(\tau)^{*}$ near $\tau_{0}$ depends on that of $\alpha(\tau)$ and $\beta(\tau)^{*}$ . We consider several possible normalizations, continuing to assume that $y_{0}^{*}x_{0}=1$ but not necessarily that $\hat{y}(\tau)^{*}\hat{x}(\tau)=1$ for $\tau\neq\tau_{0}$ . In all cases, the formula

[TABLE]

follows immediately from (29), so to determine the derivatives $\hat{x}^{\prime}(\tau_{0})$ and $(\hat{y}^{*})^{\prime}(\tau_{0})$ , we need only determine $\alpha(\tau_{0})$ , $\alpha^{\prime}(\tau_{0})$ , $\beta^{*}(\tau_{0})$ and $(\beta^{*})^{\prime}(\tau_{0})$ , obtaining $x^{\prime}(\tau_{0})$ and $(y^{*})^{\prime}(\tau_{0})$ from (7), (8), as stated in Theorem 2. Note that the derivatives of the normalized eigenvectors, $\hat{x}^{\prime}(\tau_{0})$ and $(\hat{y}^{*})^{\prime}(\tau_{0})$ , do not necessarily satisfy $y_{0}^{*}\hat{x}^{\prime}(\tau_{0})=0$ and $(\hat{y}^{*})^{\prime}(\tau_{0})x_{0}=0$ , unlike the derivatives $x^{\prime}(\tau_{0})$ and $(y^{*})^{\prime}(\tau_{0})$ . We now define several different normalizations:

1. $e_{1}^{\operatorname{T}}\hat{x}(\tau)=e_{1}^{\operatorname{T}}\hat{y}(\tau)=1$ (i.e., the first entries of $\hat{x}(\tau)$ and $\hat{y}(\tau)$ are one). This is possible for $\tau$ sufficiently close to $\tau_{0}$ if $e_{1}^{\operatorname{T}}x_{0}\neq 0$ and $e_{1}^{\operatorname{T}}y_{0}\neq 0$ . Suppose this is the case. Then

[TABLE]

and

[TABLE]

Here $\alpha(\tau)$ , $\beta(\tau)^{*}$ and hence $\hat{x}(\tau)$ , $\hat{y}^{*}(\tau)$ are analytic near $\tau_{0}$ .

2. $e_{1}^{\operatorname{T}}\hat{x}(\tau)=1$ and $\hat{y}(\tau)^{*}\hat{x}(\tau)=1$ . This normalization is defined near $\tau_{0}$ under the assumption that $e_{1}^{\operatorname{T}}x_{0}\neq 0$ ; no additional assumption is needed since $y_{0}^{*}x_{0}=1$ . Clearly (31) holds as before. In addition, we have

[TABLE]

Again, $\alpha(\tau)$ and $\beta(\tau)^{*}$ are analytic near $\tau_{0}$ .

3. $\hat{x}(\tau)^{\operatorname{T}}\hat{x}(\tau)=1$ and $\hat{y}(\tau)^{*}\hat{x}(\tau)=1$ . This normalization is defined near $\tau_{0}$ if $x_{0}^{\operatorname{T}}x_{0}\neq 0$ , which may not be the case when $x_{0}$ is complex. Suppose this does hold. We have

[TABLE]

Either sign may be used, resulting in analytic $\alpha(\tau)$ and $\beta^{*}(\tau)$ near $\tau=\tau_{0}$ .

4. $\hat{x}(\tau)^{*}\hat{x}(\tau)=1$ and $\hat{y}(\tau)^{*}\hat{x}(\tau)=1$ . In a way this is the most natural choice of normalization, because it is possible without any assumptions on $x_{0}$ , $y_{0}$ beyond $y_{0}^{*}x_{0}=1$ . The problem, however, is that it does not define the eigenvectors uniquely. Suppose we choose $\alpha(\tau)=1/\|x(\tau)\|$ , $\beta(\tau)=\|x(\tau)\|$ . Then $\alpha$ , $\beta^{*}$ , $\hat{x}$ and $\hat{y}^{*}$ are not analytic in $\tau$ , but they are differentiable w.r.t. the real and imaginary parts of $\tau$ . However, we could equally well multiply $\alpha$ and $\beta$ by any unimodular complex number $e^{i\theta}$ , so there are infinitely many different choices of $\alpha$ , $\beta$ that are smooth w.r.t. the real and imaginary parts of $\tau$ (though not analytic in $\tau$ ). A variant on this normalization is known as real-positive (RP) compatibility [GO11]: it requires $y(\tau)^{*}x(\tau)$ to be real and positive with $\|x(\tau)\|=\|y(\tau)\|=1$ .

More generally we could define the normalization in terms of two functions $\psi(\hat{x}(\tau),\hat{y}(\tau))=1$ and $\omega(\hat{x}(\tau),\hat{y}(\tau))=1$ . Depending on what choice is made, there are several possible outcomes: there could be unique $\alpha(\tau)$ and $\beta(\tau)$ that satisfy the two normalization equations, as in cases 1 and 2 above when $e_{1}^{\operatorname{T}}x_{0}\neq 0$ and $e_{1}^{\operatorname{T}}y_{0}\neq 0$ ; there could be two choices as in case 3 when $x_{0}^{\operatorname{T}}x_{0}\neq 0$ ; there could be an infinite number of choices as in case 4; or there could be no $\alpha(\tau)$ and $\beta(\tau)$ that satisfy the normalization equations.

Perturbation theory for normalized eigenvectors, using several of the normalizations given above, was extensively studied by Meyer and Stewart [MS88] and by Bernasconi, Choirat and Seri [BCS11].
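To make the recipe concrete, here is a short worked sketch for the right eigenvector under normalization 1. It assumes only that the scaling relation has the form $\hat{x}(\tau)=\alpha(\tau)x(\tau)$ and that $e_{1}^{\operatorname{T}}x_{0}\neq 0$ ; it is meant as an illustration of how $\alpha(\tau_{0})$ and $\alpha^{\prime}(\tau_{0})$ enter, and should be compared with the formulas stated for case 1 rather than taken as a restatement of them.

```latex
\begin{align*}
\alpha(\tau) &= \frac{1}{e_{1}^{\operatorname{T}}x(\tau)}, \qquad
\alpha(\tau_{0}) = \frac{1}{e_{1}^{\operatorname{T}}x_{0}}, \qquad
\alpha^{\prime}(\tau_{0}) = -\frac{e_{1}^{\operatorname{T}}x^{\prime}(\tau_{0})}{(e_{1}^{\operatorname{T}}x_{0})^{2}}, \\
\hat{x}^{\prime}(\tau_{0}) &= \alpha^{\prime}(\tau_{0})\,x_{0} + \alpha(\tau_{0})\,x^{\prime}(\tau_{0})
 = \frac{1}{e_{1}^{\operatorname{T}}x_{0}}
   \left( x^{\prime}(\tau_{0}) - \frac{e_{1}^{\operatorname{T}}x^{\prime}(\tau_{0})}{e_{1}^{\operatorname{T}}x_{0}}\,x_{0} \right).
\end{align*}
```

As a consistency check, $e_{1}^{\operatorname{T}}\hat{x}^{\prime}(\tau_{0})=0$ , as it must be since the first entry of $\hat{x}(\tau)$ is identically one. Note also that $y_{0}^{*}\hat{x}^{\prime}(\tau_{0})=-e_{1}^{\operatorname{T}}x^{\prime}(\tau_{0})/(e_{1}^{\operatorname{T}}x_{0})^{2}$ , using $y_{0}^{*}x^{\prime}(\tau_{0})=0$ and $y_{0}^{*}x_{0}=1$ , which is generally nonzero.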

3.5 Computation

Suppose we wish to verify the results of Theorems 1, 2 and 3 computationally, which is always a good idea. Let us consider how to do this in matlab, where eigenvalues as well as right and left eigenvectors can be conveniently computed by the function eig. Let Assumption 1 hold, assuming for convenience that $\tau_{0}=0$ and $A(\tau)=A_{0}+\tau\Delta A$ for some given matrices $A_{0}$ and $\Delta A$ with $\|A_{0}\|=\|\Delta A\|=1$ . Take $|\tau|$ sufficiently small that only one computed eigenvalue $\tilde{\lambda}$ of $\tilde{A}\equiv A(\tau)$ is close to the eigenvalue $\lambda_{0}$ of $A_{0}$ . Then, assuming the eigenvalue is not too badly conditioned, we can easily verify the eigenvalue perturbation formula (1) in Theorem 1 by computing the finite difference quotient $(\tilde{\lambda}-\lambda_{0})/\tau$ and comparing it with (1). Since the exact eigenvalue $\lambda(\tau)$ of $A_{0}+\tau\Delta A$ is analytic in $\tau$ , it is clear that, mathematically, the difference between the difference quotient and the derivative should be $O(|\tau|)$ as $\tau\to 0$ , but numerically, if $|\tau|$ is too small, rounding error dominates the computation instead [Ove01, Ch. 11].
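The eigenvalue check just described can be sketched in a few lines. The text refers to matlab; the following is instead a dependency-free Python illustration for a hypothetical $2\times 2$ example, using closed-form eigenvalues and eigenvectors in place of eig. For simplicity the example matrices are not scaled to unit norm, and the helper names are assumptions made for this sketch.

```python
import cmath

def eigenvalues_2x2(a):
    """Both eigenvalues of a 2x2 matrix, from the quadratic formula."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr - root) / 2, (tr + root) / 2

def eigenvectors_2x2(a, lam):
    """Right eigenvector x and left eigenvector row y* for a simple eigenvalue
    lam, scaled so that y* x = 1. Assumes a[0][1] != 0."""
    x = [a[0][1], lam - a[0][0]]
    ystar = [lam - a[1][1], a[0][1]]
    s = ystar[0] * x[0] + ystar[1] * x[1]
    return x, [ystar[0] / s, ystar[1] / s]

# Hypothetical data: A(tau) = A0 + tau * dA with tau0 = 0 and lambda0 = 1.
a0 = [[1.0, 2.0], [0.0, 3.0]]
da = [[0.0, 0.0], [1.0, 0.0]]

lam0 = min(eigenvalues_2x2(a0), key=lambda z: abs(z - 1.0))
x0, y0star = eigenvectors_2x2(a0, lam0)

# Predicted derivative from Theorem 1: y0^* dA x0.
da_x0 = [da[i][0] * x0[0] + da[i][1] * x0[1] for i in range(2)]
predicted = y0star[0] * da_x0[0] + y0star[1] * da_x0[1]

errors = []
for tau in (1e-2, 1e-4, 1e-6):
    a_tau = [[a0[i][j] + tau * da[i][j] for j in range(2)] for i in range(2)]
    # Select the computed eigenvalue closest to lambda0.
    lam = min(eigenvalues_2x2(a_tau), key=lambda z: abs(z - lam0))
    errors.append(abs((lam - lam0) / tau - predicted))
    print(tau, errors[-1])
```

In this example $\lambda(\tau)=2-\sqrt{1+2\tau}$ , so the predicted derivative is $-1$ and the printed discrepancies should shrink roughly in proportion to $\tau$ over this range; for much smaller $\tau$ rounding error would eventually take over, as noted above.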

Similarly, we can compute right and left eigenvectors $\tilde{x}$ and $\tilde{y}$ corresponding to $\tilde{\lambda}$ , normalize these so that $\tilde{y}^{*}\tilde{x}=1$ , compute the eigenprojector $\widetilde{\Pi}=\tilde{x}\tilde{y}^{*}$ , and verify that the matrix difference quotient $(\widetilde{\Pi}-\Pi_{0})/\tau$ approximates formula (10) for the eigenprojector derivative given by Theorem 3.
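The eigenprojector check can be sketched in the same spirit. This illustration assumes that the derivative formula of Theorem 3 reads $\Pi^{\prime}(\tau_{0})=-\Pi_{0}A^{\prime}(\tau_{0})S-SA^{\prime}(\tau_{0})\Pi_{0}$ , which is consistent with the eigenvector derivatives $x^{\prime}(\tau_{0})=-SA^{\prime}(\tau_{0})x_{0}$ and $(y^{*})^{\prime}(\tau_{0})=-y_{0}^{*}A^{\prime}(\tau_{0})S$ used in the proofs. It again uses a hypothetical $2\times 2$ family, for which $S$ reduces to $x_{1}y_{1}^{*}/(\mu-\lambda_{0})$ with $(\mu,x_{1},y_{1})$ the other eigentriple.

```python
import cmath

def eigen_triples(a):
    """Eigenvalues of a 2x2 matrix with right eigenvectors x and left
    eigenvector rows y*, scaled so that y* x = 1.
    Assumes a[0][1] != 0 and distinct eigenvalues."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    triples = []
    for lam in ((tr - root) / 2, (tr + root) / 2):
        x = [a[0][1], lam - a[0][0]]
        ystar = [lam - a[1][1], a[0][1]]
        s = ystar[0] * x[0] + ystar[1] * x[1]
        triples.append((lam, x, [ystar[0] / s, ystar[1] / s]))
    return triples

def outer(x, ystar):
    """Rank-one matrix x y*."""
    return [[x[i] * ystar[j] for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical data: A(tau) = A0 + tau * dA with tau0 = 0.
a0 = [[1.0, 2.0], [0.0, 3.0]]
da = [[0.0, 0.0], [1.0, 0.0]]

(lam0, x0, y0s), (mu, x1, y1s) = eigen_triples(a0)  # lam0 = 1, mu = 3
pi0 = outer(x0, y0s)
# For n = 2, S = X1 (B1 - lam0 I)^{-1} Y1^* reduces to x1 y1^* / (mu - lam0).
s_mat = [[v / (mu - lam0) for v in row] for row in outer(x1, y1s)]

t1 = matmul(matmul(pi0, da), s_mat)
t2 = matmul(matmul(s_mat, da), pi0)
predicted = [[-(t1[i][j] + t2[i][j]) for j in range(2)] for i in range(2)]

def projector_error(tau):
    """Max entrywise gap between (Pi(tau) - Pi0)/tau and the predicted derivative."""
    a_tau = [[a0[i][j] + tau * da[i][j] for j in range(2)] for i in range(2)]
    lam, x, ys = min(eigen_triples(a_tau), key=lambda t: abs(t[0] - lam0))
    pi_tau = outer(x, ys)
    return max(abs((pi_tau[i][j] - pi0[i][j]) / tau - predicted[i][j])
               for i in range(2) for j in range(2))

for tau in (1e-2, 1e-4, 1e-6):
    print(tau, projector_error(tau))
```

Because the eigenprojector does not depend on how the computed eigenvectors happen to be scaled, no normalization step beyond $\tilde{y}^{*}\tilde{x}=1$ is needed here.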

What about the eigenvector derivative formulas? Implementing “normalization 0” defined in (28), we can compute normalizations of $\tilde{x}$ and $\tilde{y}$ by

[TABLE]

Note that the formulas on the right-hand side of (36), (37) avoid computing the eigenprojector, which can be advantageous if $n$ is large. Then we can verify that the difference quotients $(\tilde{\tilde{x}}-x_{0})/\tau$ and $(\tilde{\tilde{y}}^{*}-y_{0}^{*})/\tau$ approximate the eigenvector derivatives (7) and (8) given by Theorem 2.
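A sketch of the eigenvector check under normalization 0 follows. It assumes that (28) has the form $x(\tau)=\Pi(\tau)x_{0}/\sqrt{y_{0}^{*}\Pi(\tau)x_{0}}$ and $y(\tau)^{*}=y_{0}^{*}\Pi(\tau)/\sqrt{y_{0}^{*}\Pi(\tau)x_{0}}$ , so that with $\widetilde{\Pi}=\tilde{x}\tilde{y}^{*}$ the rescaled vectors are $\tilde{x}(\tilde{y}^{*}x_{0})/\sqrt{(y_{0}^{*}\tilde{x})(\tilde{y}^{*}x_{0})}$ and $(y_{0}^{*}\tilde{x})\tilde{y}^{*}/\sqrt{(y_{0}^{*}\tilde{x})(\tilde{y}^{*}x_{0})}$ , with no projector formed explicitly. The $2\times 2$ data and helper names are hypothetical.

```python
import cmath

def eigen_triples(a):
    """Eigenvalues of a 2x2 matrix with right eigenvectors x and left
    eigenvector rows y*, scaled so that y* x = 1.
    Assumes a[0][1] != 0 and distinct eigenvalues."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    triples = []
    for lam in ((tr - root) / 2, (tr + root) / 2):
        x = [a[0][1], lam - a[0][0]]
        ystar = [lam - a[1][1], a[0][1]]
        s = ystar[0] * x[0] + ystar[1] * x[1]
        triples.append((lam, x, [ystar[0] / s, ystar[1] / s]))
    return triples

def normalization0(x, ys, x0, y0s):
    """Rescale a computed pair (x, y*) with y* x = 1 using the assumed form of (28)."""
    a = ys[0] * x0[0] + ys[1] * x0[1]    # y~^* x0
    b = y0s[0] * x[0] + y0s[1] * x[1]    # y0^* x~
    root = cmath.sqrt(a * b)             # principal branch; a*b is near 1
    return [v * a / root for v in x], [v * b / root for v in ys]

# Hypothetical data: A(tau) = A0 + tau * dA with tau0 = 0.
a0 = [[1.0, 2.0], [0.0, 3.0]]
da = [[0.0, 0.0], [1.0, 0.0]]

(lam0, x0, y0s), (mu, x1, y1s) = eigen_triples(a0)  # lam0 = 1, mu = 3
# For n = 2, S = x1 y1^* / (mu - lam0).
s_mat = [[x1[i] * y1s[j] / (mu - lam0) for j in range(2)] for i in range(2)]

# Predicted derivatives: x' = -S dA x0 and (y^*)' = -y0^* dA S.
da_x0 = [sum(da[i][k] * x0[k] for k in range(2)) for i in range(2)]
dx_pred = [-sum(s_mat[i][k] * da_x0[k] for k in range(2)) for i in range(2)]
y0_da = [sum(y0s[k] * da[k][j] for k in range(2)) for j in range(2)]
dy_pred = [-sum(y0_da[k] * s_mat[k][j] for k in range(2)) for j in range(2)]

def eigenvector_errors(tau):
    """Gaps between the difference quotients and the predicted derivatives."""
    a_tau = [[a0[i][j] + tau * da[i][j] for j in range(2)] for i in range(2)]
    lam, x, ys = min(eigen_triples(a_tau), key=lambda t: abs(t[0] - lam0))
    xn, yn = normalization0(x, ys, x0, y0s)
    ex = max(abs((xn[i] - x0[i]) / tau - dx_pred[i]) for i in range(2))
    ey = max(abs((yn[j] - y0s[j]) / tau - dy_pred[j]) for j in range(2))
    return ex, ey

for tau in (1e-2, 1e-4, 1e-6):
    print(tau, eigenvector_errors(tau))
```

The rescaling depends on $x_{0}$ and $y_{0}$ , which is why this normalization requires access to the unperturbed eigenvectors.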

Now consider normalizations 1 to 4 as defined in § 3.4. For normalization 1 (respectively, normalization 2) the formulas for the normalized eigenvectors and their derivatives given by (29) and (30) together with (31), (32) (respectively (31), (33)) can be verified easily provided the first components of $x_{0}$ and $y_{0}$ are not zero. Of course, the index 1 is arbitrary. A better choice is to use $e_{j}^{\operatorname{T}}\hat{x}(\tau)=e_{k}^{\operatorname{T}}\hat{y}(\tau)=1$ , where $j$ (respectively $k$ ) is the index of an entry of $x_{0}$ (respectively $y_{0}$ ) with maximum modulus, but this requires access to $x_{0}$ and $y_{0}$ . In the case of normalization 3, the formulas for the eigenvectors and their derivatives given by (29) and (30) together with (34) and (35) can also be verified easily, with a caveat due to the freedom in the choice of sign. Specifically, when $\hat{x}(\tau)$ and $\hat{y}(\tau)$ are obtained from the computed vectors $\tilde{x}$ and $\tilde{y}$ , it is important to ensure that the signs of $\hat{x}(\tau)$ and $\hat{x}(\tau_{0})$ (and therefore $\hat{y}(\tau)$ and $\hat{y}(\tau_{0})$ ) are consistent; this can be done by choosing the signs of the real parts (or the imaginary parts) of $e_{j}^{\operatorname{T}}\hat{x}(\tau)$ and $e_{j}^{\operatorname{T}}\hat{x}(\tau_{0})$ to be the same, where $j$ is the index of an entry of $\hat{x}(\tau_{0})$ with maximum modulus.

As for normalization 4, although we could arbitrarily choose $\alpha$ and $\beta^{*}$ to be smooth functions w.r.t. the real and imaginary parts of $\tau$ , there is no way to know how to obtain smoothly varying computed eigenvectors from the unnormalized computed eigenvectors $\tilde{x}$ and $\tilde{y}$ .

Summarizing, the formulas for the derivatives of the eigenvalue and the eigenprojector, given in Theorems 1 and 3, are easily verified numerically, while for eigenvector normalization 0 given by (28) and normalizations 1, 2 and 3 defined in § 3.4, the formulas for the eigenvector derivatives can also usually be verified computationally. However, perhaps surprisingly, there is no panacea when it comes to the eigenvectors. Normalization 0 always requires access to the eigenvectors $x_{0}$ and $y_{0}$ , while the only way to ensure that normalizations 1 and 2 are well defined is by providing access to $x_{0}$ and $y_{0}$ so that indices $j$ and $k$ can be used instead of index 1 if necessary. As for normalization 3, it may not be well defined, and even if it is, care must be taken to avoid inconsistent sign choices. Finally, normalization 4 is simply not well defined.

Verification of the eigenvalue, eigenprojector and eigenvector formulas is illustrated by publicly available matlab programs. (Footnote 7: https://cs.nyu.edu/overton/papers/eigvecpert-mfiles/eigvecvary_demo.zip. The main routine is eigValProjVecVaryDemo.m.)

4 Multiple eigenvalues

Theorems 1, 2 and 3 do not generally hold when $\lambda_{0}$ is a multiple eigenvalue. In this section we consider two illuminating examples where multiple eigenvalues enter the picture. Recall that algebraic and geometric multiplicity, along with the terms semisimple (nondefective) and nonderogatory, were defined at the end of §1. In the following, we use the principal branch of the square root function for definiteness, but any branch would suffice.

Example 1. Let

[TABLE]

with $\tau\to\tau_{0}=0$ . The limit matrix $A_{0}=A(0)$ is a Jordan block, with 0 a defective, nonderogatory eigenvalue with algebraic multiplicity 2 and geometric multiplicity 1. The corresponding right and left eigenvectors are $x_{0}=[1,~{}0]^{\operatorname{T}}$ and $y_{0}=[0,~{}1]^{\operatorname{T}}$ , but these are mutually orthogonal and cannot be scaled so that $y_{0}^{*}x_{0}=1$ . For $\tau\not=0$ , $A(\tau)$ has two simple eigenvalues $\lambda_{1,2}(\tau)=\pm\tau^{1/2}$ , which are not analytic in any neighborhood of 0. The corresponding right and left eigenvectors are uniquely defined, up to scalings, by

[TABLE]

The right eigenvectors $x_{1,2}(\tau)$ are not analytic near 0, and they both converge to the unique right eigenvector $x_{0}$ of $A_{0}$ as $\tau\to 0$ . Likewise $y_{1,2}(\tau)^{*}$ are not analytic, and they both converge to $y_{0}^{*}$ as $\tau\to 0$ . For $\tau\neq 0$ , we can scale the eigenvectors so that $y_{1}(\tau)^{*}x_{1}(\tau)=y_{2}(\tau)^{*}x_{2}(\tau)=1$ , but then either $x_{j}(\tau)$ or $y_{j}(\tau)$ (or both) must diverge as $\tau\to 0$ , for $j=1,2$ .
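The behavior in Example 1 is easy to observe numerically. The sketch below assumes the matrix $A(\tau)$ with rows $[0,\,1]$ and $[\tau,\,0]$ , matching the description of the $n\times n$ extension given in the text, and uses the eigenvector pair $x=[1,\,\sqrt{\tau}]^{\operatorname{T}}$ , $y^{*}=[\sqrt{\tau},\,1]$ for $\lambda=+\sqrt{\tau}$ , which the code checks directly rather than taking from (38). It reports the quotient $|\lambda(\tau)/\tau|$ and the quantity $\|x\|\|y\|/|y^{*}x|$ , which is the product of norms that results once the pair is rescaled so that $y^{*}x=1$ .

```python
import cmath
import math

def example1(tau):
    """Eigenpair of A(tau) = [[0, 1], [tau, 0]] for lambda = +sqrt(tau).
    Returns (lambda, residual of the eigen-equations, ||x|| ||y|| / |y* x|)."""
    s = cmath.sqrt(tau)
    a = [[0.0, 1.0], [tau, 0.0]]
    x = [1.0, s]        # candidate right eigenvector
    ystar = [s, 1.0]    # candidate left eigenvector (as a row)
    # Residuals of A x = lambda x and y* A = lambda y*.
    rx = [a[i][0] * x[0] + a[i][1] * x[1] - s * x[i] for i in range(2)]
    ry = [ystar[0] * a[0][j] + ystar[1] * a[1][j] - s * ystar[j] for j in range(2)]
    residual = max(abs(v) for v in rx + ry)
    inner = ystar[0] * x[0] + ystar[1] * x[1]
    norm_x = math.sqrt(sum(abs(v) ** 2 for v in x))
    norm_y = math.sqrt(sum(abs(v) ** 2 for v in ystar))
    return s, residual, norm_x * norm_y / abs(inner)

results = {}
for tau in (1e-2, 1e-4, 1e-6):
    lam, residual, chi = example1(tau)
    results[tau] = (abs(lam) / tau, residual, chi)
    print(tau, abs(lam) / tau, chi)
```

For real positive $\tau$ the two reported quantities equal $\tau^{-1/2}$ and $(1+\tau)/(2\sqrt{\tau})$ respectively, so both grow without bound as $\tau\to 0$ , in line with the non-analyticity of the eigenvalue and the divergence of the scaled eigenvectors noted above.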

This example is easily extended to the $n\times n$ case, where $A(\tau)$ is zero except for a single superdiagonal of 1’s and a bottom left entry $\tau$ , so that $A_{0}$ is a single Jordan block with 0 a nonderogatory eigenvalue with algebraic multiplicity $n$ . The eigenvalues of $A(\tau)$ are then the $n$ th roots of unity times $\tau^{1/n}$ . In fact, Lidskii [Lid66] gave a remarkable general perturbation theory for eigenvalues of a linear family $A(\tau)$ for which an eigenvalue of $A_{0}$ may have any algebraic and geometric multiplicities and indeed any Jordan block structure; see also [Bau85, Sec. 7.4] and [MBO97]. These results are not described in Kato’s books. However, Kato does treat eigenvalue perturbation in detail in the case that the eigenvalues of $A_{0}$ are semisimple [Kat82, Sec. II.2.3]; see also [LMZ03]. Even in this case, the behavior can be unexpectedly complex, as the following example shows.

Example 2. [LT85, p. 394] Let

[TABLE]

with $\tau\to 0$ . This time the limit matrix $A_{0}=A(0)$ is the zero matrix, with 0 a semisimple eigenvalue with algebraic and geometric multiplicity 2, so we can take any vectors in $\mathbb{C}^{2}$ as right or left eigenvectors of $A_{0}$ . The eigenvalues of $A(\tau)$ are $\lambda_{1,2}(\tau)=\pm\tau^{3/2}$ , which are not analytic. The corresponding right and left eigenvectors are uniquely defined, up to scaling, by the same formulas (38) as in the previous example. So again $x_{1,2}(\tau)$ (respectively $y_{1,2}(\tau)^{*}$ ) are not analytic, and both converge to the same vector $x_{0}$ (respectively $y_{0}^{*}$ ) as in the previous example when $\tau\rightarrow 0$ , although in this case $A_{0}$ has two linearly independent right (and left) eigenvectors. There is no right eigenvector of $A(\tau)$ that converges to a vector that is linearly independent of $x_{0}$ as $\tau\to 0$ , and likewise no left eigenvector that converges to a vector linearly independent of $y_{0}$ .

In the examples above, a multiple eigenvalue splits apart under perturbation. To avoid dealing with this complexity, one may study the average behavior of a cluster of eigenvalues, and the corresponding invariant subspace, under perturbation. There is a large body of work on this topic: see the books by Kato [Kat82, Sec. II.2.1], Gohberg, Lancaster and Rodman [GLR06], Stewart [Ste01, Ch. 4] and Stewart and Sun [SS90, Ch. 5], the unpublished technical report by Sun [Sun98, Sec. 2.3], two surveys on Stewart’s many contributions by Ipsen [Ips10] and Demmel [Dem10], and papers such as [BDF08, Dem87, DF01, KK14].

In the case of a Hermitian family $A(\tau)$ , the perturbation theory for multiple eigenvalues simplifies greatly; the pioneering results of Rellich were already mentioned in § 1. See [LT85, Sec. 11.7] and [GLR85] for more details.

5 Concluding Remarks

In this paper we have presented two detailed yet accessible proofs of first-order perturbation results for a simple eigenvalue of a matrix and its associated right and left eigenvectors and eigenprojector. We hope this will facilitate the dissemination of these important results to a much broader community of researchers and students than has hitherto been the case. We have also tried to convey the breadth and depth of work in the two principal relevant research streams, Analytic Perturbation Theory and Numerical Linear Algebra. There are, of course, many generalizations that we have not even begun to explore in this article. Just as one example, nonlinear eigenvalue problems, where one replaces $A-\lambda I_{n}$ by a matrix function $F(A,\lambda)$ with polynomial or more general nonlinear dependence on $\lambda$ , arise in many important applications. Perturbation theory for nonlinear eigenvalue problems, representing the APT and NLA communities respectively, may be found in [ACL93] and [BH13]. Finally we remark that our bibliography, while fairly extensive, is in no way intended to be comprehensive.

Bibliography

[ACL93] Alan L. Andrew, K.-W. Eric Chu, and Peter Lancaster. Derivatives of eigenvalues and eigenvectors of matrix functions. SIAM J. Matrix Anal. Appl., 14(4):903–926, 1993.
[Bau85] H. Baumgärtel. Analytic perturbation theory for matrices and operators, volume 15 of Operator Theory: Advances and Applications. Birkhäuser Verlag, Basel, 1985.
[BCS11] M. Bernasconi, C. Choirat, and R. Seri. Differentials of eigenvalues and eigenvectors in undamped discrete systems under alternative normalizations. In Proceedings of The World Congress on Engineering, pages 285–287. International Association of Engineers, 2011.
[BDF08] David Bindel, James Demmel, and Mark Friedman. Continuation of invariant subspaces in large bifurcation problems. SIAM J. Sci. Comput., 30(2):637–656, 2008.
[BH13] David Bindel and Amanda Hood. Localization theorems for nonlinear eigenvalue problems. SIAM J. Matrix Anal. Appl., 34(4):1728–1749, 2013.
[Bha97] Rajendra Bhatia. Matrix analysis, volume 169 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1997.
[Bha07] Rajendra Bhatia. Perturbation bounds for matrix eigenvalues, volume 53 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2007. Reprint of the 1987 original.
[Cha11] Françoise Chatelin. Spectral approximation of linear operators, volume 65 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2011. With a foreword by P. Henrici. With solutions to exercises by Mario Ahués. Reprint of the 1983 original.