Rational Minimax Iterations for Computing the Matrix $p$th Root
Evan S. Gawlik

TL;DR
This paper extends rational minimax iteration methods from the matrix square root to the matrix pth root for integers p ≥ 2, analyzing their convergence, stability, and error characteristics.
Contribution
It generalizes Zolotarev's rational minimax iterations to compute matrix pth roots, addressing the lack of recursion for p > 2 and analyzing key properties.
Findings
Iterations exhibit equioscillatory error behavior.
Convergence order and stability are preserved for p > 2.
Numerical examples confirm theoretical predictions.
Abstract
In [E. S. Gawlik, Zolotarev iterations for the matrix square root, arXiv preprint 1804.11000, (2018)], a family of iterations for computing the matrix square root was constructed by exploiting a recursion obeyed by Zolotarev's rational minimax approximants of the function . The present paper generalizes this construction by deriving rational minimax iterations for the matrix root, where is an integer. The analysis of these iterations is considerably different from the case , owing to the fact that when , rational minimax approximants of the function do not obey a recursion. Nevertheless, we show that several of the salient features of the Zolotarev iterations for the matrix square root, including equioscillatory error, order of convergence, and stability, carry over to case . A key role in the analysis is played by the asymptotic…
| Iterations | ||||||
|---|---|---|---|---|---|---|
| Padé- | 0 | 17 | 12 | 6 | 4 | 2 |
| Padé- | 0 | 27 | 7 | 6 | 1 | 0 |
| Minimax- | 0 | 17 | 20 | 2 | 1 | 1 |
| Minimax- | 0 | 34 | 6 | 1 | 0 | 0 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\newsiamremark
remarkRemark
\nobibliography*
\headers
Rational minimax iterations for computing the matrix th rootE. S. Gawlik
Rational Minimax Iterations for Computing the Matrix th Root
Evan S. Gawlik Department of Mathematics, University of Hawaii at Manoa () [email protected]
Abstract
In [\bibentrygawlik2018zolotarev], a family of iterations for computing the matrix square root was constructed by exploiting a recursion obeyed by Zolotarev’s rational minimax approximants of the function . The present paper generalizes this construction by deriving rational minimax iterations for the matrix root, where is an integer. The analysis of these iterations is considerably different from the case , owing to the fact that when , rational minimax approximants of the function do not obey a recursion. Nevertheless, we show that several of the salient features of the Zolotarev iterations for the matrix square root, including equioscillatory error, order of convergence, and stability, carry over to case . A key role in the analysis is played by the asymptotic behavior of rational minimax approximants on short intervals. Numerical examples are presented to illustrate the predictions of the theory.
keywords:
Matrix root, matrix power, rational approximation, minimax, uniform approximation, matrix iteration, Chebyshev approximation, Padé approximation, Newton iteration, Zolotarev
{AMS}
65F30, 65F60, 41A20, 49K35
1 Introduction
In recent years, a growing body of literature has highlighted the usefulness of rational minimax iterations for computing functions of matrices [25, 26, 7, 8, 4]. In these studies, is approximated by a rational function of possessing two properties: closely (and often optimally) approximates in the uniform norm over a subset of the real line, and can be generated from a recursion. A prominent example of such an iteration was introduced by Nakatsukasa and Freund in [26], where it was observed that rational minimax approximants of the function obey a recursion, allowing one to rapidly compute and related decompositions such as the polar decomposition, symmetric eigendecomposition, SVD, and, in subsequent work, the CS decomposition [8]. An analogous recursion for rational minimax approximants of has recently been used to construct iterations for the matrix square root [7], building upon ideas of Beckermann [2]. There, the iterations are referred to as Zolotarev iterations, owing to the role played by explicit formulas for rational minimax approximants of and derived by Zolotarev [31].
The aim of this paper is to introduce a family of rational minimax iterations for computing the principal root of a square matrix , where is an integer. Recall that the principal root of a square matrix having no nonpositive real eigenvalues is the unique solution of whose eigenvalues are contained in [15, Theorem 7.2]. The iterations we propose reduce to the Zolotarev iterations for the matrix square root [7] when , but when , they differ from the Zolotarev iterations in several important ways. Notably, for all integers , the iterations generate a rational function of of which has the property that for scalar inputs, the relative error equioscillates on a certain interval (see Section 2 for our terminology). Remarkably, when , equioscillates often enough to render minimal among all choices of with a fixed numerator and denominator degree [7]. This optimality property is the hallmark of the Zolotarev iterations, and it allows one to appeal to classical results from rational approximation theory to estimate the maximum relative error. When , no such optimality property holds. Much of this paper is devoted to showing that the rational minimax iterations for the root still enjoy many of the same desirable features as the Zolotarev iterations for the square root, despite the absence of optimality in the case . We take care to present our results in such a way that when , the salient features of the Zolotarev iterations are recovered as special cases.
There are a number of connections between the iterations we derive and existing iterations from the literature on the matrix root. We have already mentioned that they reduce to the Zolotarev iterations when . For arbitrary , the two lowest order versions of our rational minimax iterations are scaled variants of the Newton iteration and the inverse Newton iteration [15, Chapter 6], [3, Section 6], [18]. In another limiting case, our iterations reduce to the Padé iterations [21, Section 5]. Relative to these iterations, the rational minimax iterations offer advantages primarily when the matrix has eigenvalues with widely varying magnitudes. As an extreme example, if and is Hermitian positive definite with condition number , convergence is achieved in double-precision arithmetic after just iterations when using our type- rational minimax iteration. In contrast, up to iterations are needed when using the type- Padé iteration. Our numerical experiments indicate that the situation is similar, but less dramatic, for non-normal matrices with eigenvalues away from the positive real axis.
This paper is organized as follows. In Section 2, we review the Zolotarev iterations for the matrix square root by summarizing the contents of [7]. In Section 3, we introduce rational minimax iterations for the matrix root and present our main results: Theorem 3.1, Theorem 3.2, and their corollaries. Proofs of these results are provided separately in Section 4. Finally, Section 5 presents numerical experiments that illustrate the predictions of the theory.
2 Background: Zolotarev iterations for the matrix square root
Let us summarize the Zolotarev iterations for the matrix square root and their key properties [7]. Let denote the set of all rational functions of type – ratios of polynomials of degree to polynomials of degree . We say that a function in has exact type if, after canceling common factors, and have degree exactly and , respectively. The number is called the defect of in . In most of what follows, is a real variable; we use the letter since the behavior of on will play an important role later in the paper.
Given a continuous, increasing bijection and a number , let denote the best type- rational approximant of on :
[TABLE]
It is well-known that the minimization problem above has a unique solution [1, p. 55]. Furthermore, explicit formulas for are known for [31]. Let denote the unique scalar multiple of with the property that
[TABLE]
For and , the Zolotarev iteration of type for computing the square root of a square matrix reads
[TABLE]
It is proven in [7] that in exact arithmetic, and with order of convergence for any with no nonpositive real eigenvalues. In floating point arithmetic, it is necessary to reformulate the iteration to ensure its stability; we detail the stable reformulation of (3-4) later on.
The iteration (3-4) has the remarkable property that it generates an optimal rational approximation of of high degree. Namely, , where
[TABLE]
A simple consequence of this is that if is Hermitian positive definite with eigenvalues in , then
[TABLE]
where
[TABLE]
For more detailed error estimates, including error estimates for non-normal with eigenvalues in , see [7].
3 Minimax iterations for the matrix root
In this paper, we propose an iteration for computing roots of matrices that generalizes (3-4). Given , , and an integer , the iteration reads
[TABLE]
The Zolotarev iterations (3-4) correspond to the cases in (6-7). (Note that we abusively referred to these cases as “the case ” in Section 1).
With the exception of the cases and , explicit formulas for are not known. However, can be computed numerically; see Section 5 for details. Note that the cost of computing is independent of the dimension of , so it is expected to be negligible for problems involving large matrices.
As with the square root iteration (3-4), it is necessary to reformulate the root iteration (6-7) to ensure its stability. This is accomplished by considering the iteration for and implied by (6-7). Exploiting commutativity, we have
[TABLE]
where . (We swapped the order of the first two indices to emphasize that is a rational function of type , not .)
The remainder of this section presents a series of results about the behavior of the iteration (6-7) and its counterpart (8-10). Proofs of these results are given in Section 4.
3.1 Functional iteration
A great deal of information about the behavior of the iteration (6-7) (and hence (8-10)) can be gleaned from a study of the functional iteration
[TABLE]
Indeed, we have in (6-7), and and in (8-10).
The following theorem summarizes the properties of the functional iteration (11-12). In the interest of generality, it focuses on a slight generalization of (11-12) that reduces to (11-12) when the function appearing below is . The theorem makes use of the following terminology. A continuous function is said to equioscillate times on an interval if there exist points at which
[TABLE]
for some . It is well-known that the minimax approximants (1) are uniquely characterized by the property that equioscillates at least times on , where is the defect of in [28, Theorem 24.1]. We will be particularly interested in those functions for which:
- (LABEL:enumisec:pthroot.i)
For every and , has exact type . Furthermore, equioscillates exactly times on , achieves its maximum at , and achieves an extremum at .
The function is satisfies this hypothesis; see Lemma 4.12 for a proof.
Theorem 3.1**.**
Let be a continuous, increasing bijection satisfying (sec:pthroot.i). Let and , and define recursively by
[TABLE]
Then, with and , we have:
- (LABEL:enumisec:pthroot.ii)
For every ,
[TABLE]
and
[TABLE] 2. (LABEL:enumisec:pthroot.iii)
For every , the relative error equioscillates times on , and it achieves its extrema at the endpoints. 3. (LABEL:enumisec:pthroot.iv)
If , is Lipschitz on , and , then monotonically with order of convergence as .
Let us discuss the meaning of this theorem. It states that the iteration (13-14) generates a function with the following curious property: The maximum relative error in on the interval is equal to the maximum relative error in the best rational approximant of on a much smaller interval . Indeed, as increases, the length of remains constant, whereas the length of is by (15), assuming is Lipschitz near . Since rational functions of type can approximate smooth functions on intervals of length with accuracy , we see from (16) that , assuming is smooth enough near . That is, with order of convergence .
For most functions , the iteration (13-14) is not useful, as it (rather circularly) uses (and ) to generate an approximation of . Furthermore, the approximation it generates need not be a rational function of . The function , however, is exceptional, in that the iteration (13-14) – which reduces to (11-12) for this – generates a rational function without requiring the evaluation of any roots.
The following theorem specializes Theorem 3.1 to the case and gives precise information about the constants implicit in the convergence result (sec:pthroot.iv). In it, we use the notation for the rising factorial (the Pochhammer symbol): .
Theorem 3.2**.**
Let , , and with and . Let and be defined by the iteration (11-12), and let and . Then the conclusions (sec:pthroot.ii) and (sec:pthroot.iii) hold with . Furthermore, as , monotonically with
[TABLE]
where
[TABLE]
Note that when and , (18) simplifies to . This is consistent with the results of [7], where it is shown that for these , , and , an asymptotically sharp bound of the form holds with a constant depending on .
3.2 Convergence of the matrix iteration
An immediate consequence of Theorem 3.2 is that the iteration (6-7) converges when is Hermitian positive definite with eigenvalues in .
Corollary 3.3**.**
Let , , and with and . Let be Hermitian positive definite. If the eigenvalues of lie in , then the iteration (6-7) generates a sequence that converges to with order . In particular, we have
[TABLE]
for every , where obeys the recursion
[TABLE]
*and is given by (18). *
A similar result holds for the coupled iteration (8-10).
Corollary 3.4**.**
Let , and be as in Corollary 3.3. Then the coupled iteration (8-10) generates sequences and that converge to and respectively, with order . In particular, we have
[TABLE]
*for every , where obeys the recursion (19). *
Note that the bounds above imply corresponding bounds on the relative errors , , and . For instance,
[TABLE]
When is non-normal and/or has eigenvalues away from the positive real axis, the behavior of the matrix iteration (6-7) (and hence (8-10)) is dictated by the behavior of the scalar iteration (11-12) on complex inputs . This has been analyzed in detail for the case in [8], but for , numerical experiments indicate that the scalar iteration converges in a subset of the complex plane with fractal structure, a typical feature of iterations for the root. We study this behavior numerically in Section 5. It remains an open problem to determine theoretically the convergence region for the iteration (11-12).
3.3 Special cases
For certain values of , , and , the theory above recovers some known results from the literature. We discuss these situations below.
3.3.1 Square roots
When , , and , a remarkable phenomenon occurs, allowing us to draw the connection between Theorem 3.1 and the results of [7] that we alluded to earlier. For these , , and , the function is a rational function of type , where is given by (5). In both the case and the case , we have
[TABLE]
so (sec:pthroot.iii) implies that equioscillates times on . It follows from the theory of rational minimax approximation that is the best rational approximant of of type on :
[TABLE]
In particular,
[TABLE]
for every . This shows that Theorem 3.1 includes [7, Theorem 1] as a special case.
3.3.2 Low-order iterations
When is an integer and or , we recover variants of another family of iterations.
Proposition 3.5**.**
Let be an integer and . We have
[TABLE]
and
[TABLE]
Note that the formula (20) for appears in [24, Theorem 2] and [20]; see also [13, Lemma 3.2] for a related result.
The preceding proposition shows that when , the iteration (6-7) reads
[TABLE]
where
[TABLE]
This is a scaled variant of the popular Newton iteration [15, Equation 7.5] for the matrix root. The scaling heuristic above is reminiscent of one proposed by Hoskins and Walton [17], but theirs is based on type- rational minimax approximants of .
On the other hand, when , the iteration (6-7) reads
[TABLE]
where
[TABLE]
In terms of the matrix , the iteration for becomes
[TABLE]
which is a scaled variant of the inverse Newton iteration [15, Equation (7.12)] for computing .
3.3.3 Padé iterations
We recover one more family of iterations by considering the limit as in (6-7).
Below, we say that a family of rational functions converges coefficientwise to as if the coefficients of the polynomials in the numerator and denominator of , appropriately normalized, approach those of as .
Proposition 3.6**.**
As , converges coefficientwise to the type- Padé approximant of at :
[TABLE]
It follows that the iteration (6-7) reduces formally to
[TABLE]
as . This is precisely the Padé iteration for the matrix root studied by Laszkiewicz and Ziętak [21, Equation (36)]. When , it is the Halley iteration [19, p. 11], [12]. In terms of and , the iteration (25) reads
[TABLE]
where .
For later use, it will be convenient to define
[TABLE]
The Padé iterations (25) and (26-27) are then simply the iterations obtained by setting in the minimax iterations (6-7) and (8-10), respectively.
3.4 Stability of the coupled matrix iteration
As alluded to earlier, the uncoupled matrix iteration (6-7) exhibits numerical instability, whereas the coupled iteration (8-10) does not. We justify the latter claim below.
We recall the following definition. A matrix iteration with fixed point is said to be stable in a neighborhood of if the Fréchet derivative of at has bounded powers at [15, Definition 4.17]. That is, if denotes the Fréchet derivative of at in a direction , then there exists a constant such for every and every , where .
We first address the stability of the coupled Padé iteration (26-27).
Proposition 3.7**.**
Let and with and . The Padé iteration (26-27) is stable in a neighborhood of for any . In particular, with , we have
[TABLE]
*for any , and is idempotent. *
Consider now the coupled minimax iteration (8-10). Theorem 3.1 established that converges to in (10). We argue in Section 5 that when is close to 1, it is numerically prudent to set (and all subsequent iterates) equal to 1, thereby reverting to the Padé iteration (26-27). Since the latter iteration is stable, it follows that the aforementioned modification of (8-10) is stable as well.
4 Proofs
In this section, we prove Theorems 3.1 and 3.2, Corollaries 3.3 and 3.4, and Propositions and 3.5, 3.6, and 3.7.
4.1 Proof of Theorem 3.1
4.1.1 Equioscillation
To prove the claims (sec:pthroot.ii) and (sec:pthroot.iii) in Theorem 3.1, we use an inductive argument. When , (sec:pthroot.iii) holds since the relative error decreases monotonically from to as runs from to . This shows also that , so (15) holds when . Next, we prove two lemmas in preparation for the inductive step.
Lemma 4.1**.**
Let be a continuous, increasing bijection satisfying (sec:pthroot.i). Then the recurrence (14) is equivalent to
[TABLE]
Proof 4.2**.**
Since
[TABLE]
the defining property (2) of implies that
[TABLE]
Also, the assumption (sec:pthroot.i) implies that
[TABLE]
so
[TABLE]
*Since this holds for any , it follows that the recurrence (14) is equivalent to (28). *
Lemma 4.3**.**
Let be a continuous, increasing bijection satisfying (sec:pthroot.i). Let and . Let be any continuous function on with the property that equioscillates times on and achieves its extrema at the endpoints, where and . Define
[TABLE]
*Then equioscillates times on with extrema , and it achieves its extrema at the endpoints. *
Proof 4.4**.**
The assumed equioscillation of on implies that the function equioscillates times on with extrema . If we now define
[TABLE]
then we conclude that equioscillates times on with extrema . Moreover, it achieves its extrema at the endpoints by our assumptions on .
By the same reasoning as above, the function
[TABLE]
has the property that equioscillates times on with extrema , and it achieves its extrema at the endpoints by the assumption (sec:pthroot.i).
Consider now the function
[TABLE]
We claim that equioscillates on with extrema . To see this, we make two observations. First, as runs from to , runs from/to to/from a total of times, achieving its extrema at the endpoints each time. Second, each time runs from/to to/from , equioscillates times with extrema . By counting extrema, we conclude that the composition (29) (minus 1) equioscillates
[TABLE]
times on with extrema .
Finally, consider the function
[TABLE]
In view of the equioscillation of (29), the function equioscillates times on with extrema , and it achieves its extrema at the endpoints. We will complete the proof by showing that . Using the fact that , , and , we have
[TABLE]
Remark 4.5**.**
When , the function
[TABLE]
appearing in the proof above is a rational approximant of the sector function . In fact, the proof above reveals that on each of the segments , , the relative error
[TABLE]
*is real-valued and equioscillates times with extrema . In particular, for , is Zolotarev’s type- best rational approximant of the sign function on [26]. *
We are now ready to prove (sec:pthroot.ii-sec:pthroot.iii). Suppose (sec:pthroot.iii) and (15) hold at step in the iteration (11-12). Then Lemma 4.3 (applied with , , and , so that and ) implies that (sec:pthroot.iii) and (15) hold at step , so in fact they hold for all . It now follows immediately that (16) is equivalent to (28), which, in turn, is equivalent to (14) by Lemma 4.1. This completes the proof of (sec:pthroot.ii-sec:pthroot.iii).
4.1.2 Convergence
We now address the last claim (sec:pthroot.iv) of Theorem 3.1, which concerns the convergence of to [math] in the iteration
[TABLE]
with ,
[TABLE]
and .
Lemma 4.6**.**
*Let , and let be a continuous, increasing bijection satisfying (sec:pthroot.i). If , then is continuous, nonnegative, and nondecreasing on . Furthermore, for every . *
Proof 4.7**.**
It is obvious that is nonnegative and nondecreasing. To show that for every , note that (31) is no larger than the uniform relative error committed by the constant function :
[TABLE]
*for every . This establishes that . The inequality is in fact strict since we assumed (sec:pthroot.i), which implies that the minimizer of the relative error is not a constant function when . It remains to show that is continuous on . We assumed in (sec:pthroot.i) that the minimizer for has defect 0 in for each , so, for each fixed , the map is continuous with respect to the uniform norm at [23]. By considering functions obtained by scaling and translating the input to , we deduce that depends continuously on , again with respect to the uniform norm. Hence, the map is continuous on , and so too is . *
It follows from the above properties of that monotonically in the iteration for every .
4.1.3 Rate of convergence
It remains to show that the order of convergence of to [math] is . As we explained in the paragraph below Theorem 3.1, it suffices to note that when is in a neighborhood of ,
[TABLE]
Indeed, this, together with (16), gives
[TABLE]
assuming is Lipschitz near and . Below, we give more precise information about the constant implicit in (32). We begin with a lemma that shows, in essence, that the uniform error in the best type- rational approximant of a function on a small interval is about times smaller than the uniform error in the type- Padé approximant of .
Lemma 4.8**.**
Let be and positive in a neighborhood of [math]. Assume that the type- Padé approximant of about [math] has defect [math] in , and
[TABLE]
where . For each , let
[TABLE]
Then, as ,
[TABLE]
Proof 4.9**.**
Let
[TABLE]
Among polynomials of degree with unit leading coefficient, the polynomial is the one that deviates least from [math] on . Up to a rescaling, this is precisely the degree- Chebyshev polynomial of the first kind :
[TABLE]
Now let be the type -Padé approximant of
[TABLE]
Since we assumed that the Padé approximant of has defect [math] in , the Taylor coefficients of approach those of as [30, Corollary of Theorem 2a]. It follows that for each sufficiently small,
[TABLE]
for some with as . Thus, for each sufficiently small,
[TABLE]
Hence, as ,
[TABLE]
for every , uniformly in . Multiplying by , we conclude that
[TABLE]
for every , uniformly in . Finally, by the definition of ,
[TABLE]
In fact, this bound is sharp, for the following reason. The relation (34) shows that for sufficiently small, approximately equioscillates, in the sense that there exist points at which alternates in sign and satisfies
[TABLE]
where . The de la Vallée Poussin lower bound [28, Exercise 24.5] then implies that
[TABLE]
Remark 4.10**.**
*The proof above suggests a heuristic for constructing near-best rational minimax approximants on short intervals : one computes the Padé approximant of rather than . *
Remark 4.11**.**
*The near equioscillation of in the proof above can be used to show that is close to : , uniformly in as . The argument is essentially the same as the one used in [29, p. 429-430] to show that Carathéodory-Féjer approximants are close to minimax approximants on small intervals. *
It is now a simple matter to estimate the constant implicit in (32). As , the above lemma gives
[TABLE]
where
[TABLE]
and is the Taylor coefficient of in the difference between and its type- Padé approximant about . A short calculation shows that and , so
[TABLE]
It follows that in the iteration (30), we have
[TABLE]
4.2 Proof of Theorem 3.2
Having proved Theorem 3.1, we now verify that the function satisfies the hypothesis (sec:pthroot.i), and we prove Theorem 3.2.
We begin by establishing a few properties of the minimax approximants . The proof of the following lemma is similar to that in [27, Lemma 2], which studies rational functions of type that minimize the maximum absolute error on rather than the maximum relative error on , . The proof makes use of the following terminology. A Chebyshev system of dimension on an interval is a linearly independent set of continuous functions on with the property that any nontrivial linear combination has at most (distinct) roots in .
Lemma 4.12**.**
Let , , and , . If minimizes
[TABLE]
then has exact type , equioscillates exactly times on , and
[TABLE]
Proof 4.13**.**
Suppose that , where and are polynomials of exact degree and , respectively. Observe that the function
[TABLE]
belongs to the space spanned by
[TABLE]
which is a Chebyshev system on of dimension . Thus, has at most zeros on . In particular, has at most zeros on , so it equioscillates at most times on . But equioscillates at least times on , where . It follows that
[TABLE]
so
[TABLE]
From this we conclude that , , , and equioscillates exactly times on .
Let be the points at which achieves its extrema on . Suppose that or . By considering the graph of , one easily deduces that there exists such that has at least roots in . But
[TABLE]
so has at most roots in . In particular, has at most roots in , a contradiction. It follows that and .
It remains to verify that the signs in (36-37) are correct. Consider the dependence of on the parameters and . Denote this dependence by . By an argument similar to the one made in the proof of Lemma 4.6, the maps and are continuous on and , respectively. These maps also have no zeros, since has a nonzero extremum at for every . Now, for small , the proof of Lemma 4.8 shows that for ,
[TABLE]
*where is the coefficient of in the Taylor expansion of about . In particular, has the same sign as for close to [math], which, as we verify below in (39), is positive. By continuity, for every , and (36-37) follow. *
The preceding lemma shows that the function satisfies the hypothesis (sec:pthroot.i), so Theorem 3.2 will follow if we can show that the constant in the estimate (17) is given by (18). In view of the general estimate (35), it suffices to determine the coefficient of the leading-order term in , where is the Padé approximant (24) of about . This is given by [10, Lemma 3.12]
[TABLE]
Inserting this into (35) and noting that and
[TABLE]
we obtain (18).
4.3 Proof of Corollaries 3.3 and 3.4
To prove Corollaries 3.3 and 3.4, observe that with , we have
[TABLE]
[TABLE]
and
[TABLE]
The results follow from the above equalities and the bounds
[TABLE]
[TABLE]
and
[TABLE]
4.4 Proof of Proposition 3.5
To prove the formula (20) for , it suffices to show that the function
[TABLE]
achieves its global maximum on at both endpoints and has global minimum [math] on . Indeed, if this is the case, then the rescaled function
[TABLE]
has relative error which equioscillates three times on , and so must be the minimizer for . A calculation verifies that has a critical point at , , , is decreasing on , and is increasing on .
The proof of (21) is similar. In this case, a calculation verifies that the function
[TABLE]
has a critical point at , , , is decreasing on , and is increasing on .
4.5 Proof of Proposition 3.6
Trefethen and Gutknecht [30, Theorem 3b] have shown that for any function analytic in a neighborhood of , converges coefficientwise as to the type- Padé approximant of about , provided that the Padé approximant has defect [math] in . Their proof carries over easily to minimizers of the relative error , assuming . Since has defect [math] in [9], Proposition 3.6 follows. The explicit formula (24) for is from [21, p. 954].
4.6 Proof of Proposition 3.7
Since is a Padé approximant of about of type , we have and
[TABLE]
Hence, , , and for any . Thus, with , we obtain
[TABLE]
Setting and , we find that , so is idempotent.
5 Numerical examples
In this section, we present numerical examples and discuss the implementation of the rational minimax iteration (8-10).
5.1 Implementation
Implementing the rational minimax iteration (8-10) requires evaluating the rational function at a matrix argument . With the exception of the special cases detailed in Section 3.3, explicit formulas for this function are not available. Nevertheless, (or, more precisely, its unscaled counterpart ) can be computed numerically using, for instance, the function MiniMaxApproximation from Mathematica’s FunctionApproximations package. We used this function along with Apart to compute in partial fraction form. For close to , the computation of poses numerical difficulties, so we rounded to (thereby reverting to the Padé iteration (26-27)) whenever . We also observed that for close to [math] and , accuracy improved if was computed as , where .
Note that a more robust option for computing minimizers of the maximum absolute error is the Chebfun function minimax [6]. However, Chebfun currently does not support minimization of the maximum relative error .
Algorithm 1 summarizes the implementation of the rational minimax iteration (8-10). For simplicity, it focuses on the type iteration. The type iteration with is similar, but the form of the partial fraction expansion of varies with . In the algorithm, the eigenvalues of with the smallest and largest magnitudes are denoted and , respectively.
The choices of and used in the algorithm are motivated by Corollary 3.4: they ensure that the spectrum of is contained in the annulus . In particular, if is Hermitian positive definite, then the spectrum of is contained in , and Corollary 3.4 is directly applicable. Neither nor need to be computed accurately; our experience suggests that estimates can be used without significantly degrading the algorithm’s performance.
As a termination criterion, we terminated the iterations when
[TABLE]
where is a relative error tolerance. This is a generalization to arbitrary of the termination criterion described in [7, Section 4.3].
Floating point operations
If is and is computed with binary powering in Line 9 of Algorithm 1, then the cost of each iteration in Algorithm 1 is about flops, where [15, p. 72]. In the first iteration, the cost reduces to flops since . If parallelism is exploited, then the matrix inversions in Line 8 can be performed simultaneously, as can Lines 9-10. The effective cost of such a parallel implementation is flops in the first iteration and flops in each remaining iteration. Further savings in computational costs can be achieved when ; see [7, Section 4.2] for details.
5.2 Scalar iteration
Asymptotic convergence rates
To verify the asymptotic convergence rates predicted by Theorem 3.2, we computed , , for various choices of , , , and . Table 1 reports the results for three such choices. (We selected values of , , , and so that the asymptotic regime was reached before convergence to machine precision occurred.) The table demonstrates that the ratios approach the constant given by (18). Note that the entry in the row of the last column should be ignored, since is below machine precision in that instance.
Complex inputs
To study the behavior of the rational function generated by the type- iteration (11-12), we numerically computed the sets
[TABLE]
for various choices of , , , , and . The boundaries of these sets are plotted in Fig. 1. They are plotted in the coordinate plane rather than the usual coordinate plane to facilitate viewing. The shaded regions in the plots correspond to points for which . Numerical evidence indicates that at these points, . Furthermore, the shaded regions have a fractal structure. Both of these phenomena are typical features of iterations for the root when [5].
Fig. 1 gives valuable insight into the behavior of the matrix iteration (6-7) (and, of course, its coupled counterpart (8-10)). Indeed, if is a normal matrix with eigenvalues in , then the iteration (6-7) converges in at most iterations with a relative tolerance in the -norm. As an example, the plot in row 3, column 2 of Fig. 1 demonstrates that contains the set
[TABLE]
when , , and . It follows that the type- iteration (6-7) converges to in at most iterations for any normal matrix with spectrum in the right half plane and .
For comparison, Fig. 2 shows the boundaries of the sets
[TABLE]
where this time is the rational function generated by (11-12) with the initial condition replaced by . By Proposition 3.6, the sets characterize the convergence behavior of the Padé iteration (24) (and its coupled counterpart (26-27)) with the initial iterate scaled by .
Notice that for small (the two rightmost columns of Fig. 2), the sets do not contain scalars with extreme magnitudes ( and ) unless is relatively large. Comparing, for instance, the bottom right plots in Figs. 1 and 2, we see that if is Hermitian positive definite with spectrum in , then the type- rational minimax iteration (11-12) converges in at most iterations, whereas the type- Padé iteration (24) converges in at most . The same observation holds, in fact, for the type- and type- iterations, which are not shown in Figs. 1-2. This is entirely analagous to the behavior observed in the case in [7, Section 5.1]. In fact, with the exception of the low-order iterations, Figs. 1-2 bear a rather strong resemblance to Figs. 1-2 of [7].
It is worth noting that for the low-order iterations, the sets occupy more of the complex plane when is generated from the rational minimax iteration than when is generated from the Padé iteration (see the shaded regions in row 1 of Figs. 1-2). This appears to be a drawback of the low-order rational minimax iterations. The moderate-order and high-order iterations do not suffer as much from this issue; compare the shaded regions in the bottom two rows of Figs. 1-2, which occupy only a small neighborhood of the nonpositive real axis (). The latter observation suggests that for moderate to high and , it is safe to apply Algorithm 1 to matrices with spectrum contained in , where is close to . For matrices with eigenvalues that lie very near but not on the nonpositive real axis, a simple workaround is to compute using any algorithm for the matrix square root, and then compute . One can also compute with , as in [13, 16], but the advantages of minimax approximation over Padé approximation become less pronounced as increases, since has eigenvalues clustered near for large .
5.3 Matrix iteration
To test Algorithm 1, we applied it to a collection of matrices of size from the Matrix Computation Toolbox [14]. We selected those matrices in the toolbox with condition number (where denotes the unit roundoff) and with spectrum contained in the sector . We also included those matrices whose spectrum could be rotated into the aforementioned sector by multiplying by a suitable scalar , . A total of 41 matrices met these criteria.
Fig. 3 plots the relative error in the computed root of for each of the matrices, where . The tests are sorted in order of decreasing , where
[TABLE]
denotes the Frobenius-norm relative condition number of the matrix root of [15, Problem 7.4]. Results for five methods are shown: the rational minimax iterations (8-10) of type and , the Padé iterations (26-27) of type and , and the built-in Matlab function funm. The Padé iterations were implemented using Algorithm 1 with Lines 1-2 replaced by and . The results indicate that the algorithms under consideration behave in a forward stable way, with relative errors mostly lying within a small factor of .
In Table 2, the number of iterations used by each iterative method on the 41 tests are recorded. In analogy with the results of [7], the rational minimax iterations very often converged more quickly than the Padé iterations on these tests.
6 Conclusion
This paper has constructed and analyzed a family of iterations for computing the matrix root using rational minimax approximants of the function . The output of each step of the type- iteration is a rational function of with the property that the scalar function equioscillates times on , where is a parameter depending on . With the exception of the Zolotarev iterations (i.e. and ), this equioscillatory behavior does not render minimal among all choices of with the same numerator and denominator degree. Nevertheless, we have shown that many of the desirable features of the Zolotarev iterations carry over to the general setting. A key role in the analysis was played by the asymptotic behavior of rational minimax approximants on short intervals.
Several topics mentioned in this paper are worth pursuing in more detail. Remark 4.5 leads naturally to a family of rational minimax iterations for the matrix sector function . As , these iterations likely reduce to the Padé iterations for the sector function studied by Laszkiewicz and Ziętak [21, Section 5], so the results therein could inform an analysis of the convergence of the rational minimax iterations on matrices that are non-normal and/or have spectrum away from the positive real axis. Another topic of interest is computing the action of on a vector using rational minimax iterations. Li and Yang [22] address a similar task: computing the action of a spectral filter on using Zolotarev iterations for . It my may be possible to construct a similar algorithm for computing . Finally, the functional iteration (11-12) is of interest in its own right, as it offers a method of rapidly generating rational approximants of with small relative error, a tool that may have applications in, for instance, numerical conformal mapping [11].
Acknowledgments
The author was supported in part by the NSF under grant DMS-1703719.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] N. I. Akhiezer , Theory of approximation , Frederick Ungar Publishing Corporation, 1956.
- 2[2] B. Beckermann , Optimally scaled Newton iterations for the matrix square root , Advances in Matrix Functions and Matrix Equations workshop, Manchester, UK, 2013.
- 3[3] D. A. Bini, N. J. Higham, and B. Meini , Algorithms for the matrix pth root , Numerical Algorithms, 39 (2005), pp. 349–378.
- 4[4] R. Byers and H. Xu , A new scaling for Newton’s iteration for the polar decomposition and its backward stability , SIAM Journal on Matrix Analysis and Applications, 30 (2008), pp. 822–843.
- 5[5] J. R. Cardoso and A. F. Loureiro , Iteration functions for pth roots of complex numbers , Numerical Algorithms, 57 (2011), pp. 329–356.
- 6[6] T. A. Driscoll, N. Hale, and L. N. Trefethen , Chebfun guide , 2014.
- 7[7] E. S. Gawlik , Zolotarev iterations for the matrix square root , ar Xiv preprint 1804.11000, (2018).
- 8[8] E. S. Gawlik, Y. Nakatsukasa, and B. D. Sutton , A backward stable algorithm for computing the CS decomposition via the polar decomposition , SIAM Journal on Matrix Analysis and Applications, 39 (2018), pp. 1448–1469.
