Rational Minimax Iterations for Computing the Matrix $p$th Root

Evan S. Gawlik

arXiv:1903.06268·math.NA·March 18, 2019


TL;DR

This paper extends rational minimax iteration methods from the matrix square root to the matrix $p$th root for integers $p \geq 2$, analyzing their convergence, stability, and error characteristics.

Contribution

It generalizes Zolotarev's rational minimax iterations to compute matrix $p$th roots, addressing the lack of a recursion for $p > 2$, and analyzes their key properties.

Findings

1. Iterations exhibit equioscillatory error behavior.
2. Convergence order and stability are preserved for $p > 2$.
3. Numerical examples confirm theoretical predictions.

Abstract

In [E. S. Gawlik, Zolotarev iterations for the matrix square root, arXiv preprint 1804.11000, (2018)], a family of iterations for computing the matrix square root was constructed by exploiting a recursion obeyed by Zolotarev's rational minimax approximants of the function $z^{1/2}$. The present paper generalizes this construction by deriving rational minimax iterations for the matrix $p^{th}$ root, where $p \geq 2$ is an integer. The analysis of these iterations is considerably different from the case $p = 2$, owing to the fact that when $p > 2$, rational minimax approximants of the function $z^{1/p}$ do not obey a recursion. Nevertheless, we show that several of the salient features of the Zolotarev iterations for the matrix square root, including equioscillatory error, order of convergence, and stability, carry over to the case $p > 2$. A key role in the analysis is played by the asymptotic…

Tables

Table 1: Values of $\{\varepsilon_{k}\}_{k=1}^{3}$ generated by the iteration (30) with $f(z)=z^{1/p}$ for various choices of $m$, $\ell$, $p$, and $\varepsilon_{0}$. In each instance, the ratios $\varepsilon_{k}/\varepsilon_{k-1}^{m+\ell+1}$ approach the constant $C(m,\ell,p)$ given by (18), whose value is recorded in the last row of the table for reference.

	$(m, ℓ, p) = (1, 1, 13)$		$(m, ℓ, p) = (2, 2, 3)$		$(m, ℓ, p) = (3, 3, 5)$	
$k$	$ε_{k}$	$ε_{k} / ε_{k-1}^{m+ℓ+1}$	$ε_{k}$	$ε_{k} / ε_{k-1}^{m+ℓ+1}$	$ε_{k}$	$ε_{k} / ε_{k-1}^{m+ℓ+1}$
$0$	$5.0000 \cdot 10^{-1}$		$9.9999 \cdot 10^{-1}$		$9.0000 \cdot 10^{-1}$	
$1$	$1.4864 \cdot 10^{-1}$	$1.19 \cdot 10^{0}$	$7.8215 \cdot 10^{-1}$	$7.82 \cdot 10^{-1}$	$4.2647 \cdot 10^{-2}$	$8.92 \cdot 10^{-2}$
$2$	$9.5361 \cdot 10^{-3}$	$2.90 \cdot 10^{0}$	$1.4269 \cdot 10^{-2}$	$4.87 \cdot 10^{-2}$	$2.1116 \cdot 10^{-11}$	$8.23 \cdot 10^{-2}$
$3$	$3.0325 \cdot 10^{-6}$	$3.50 \cdot 10^{0}$	$1.4379 \cdot 10^{-11}$	$2.43 \cdot 10^{-2}$	$0.0000 \cdot 10^{0}$	$0.00 \cdot 10^{0}$
$C(m, ℓ, p)$		$3.50 \cdot 10^{0}$		$2.43 \cdot 10^{-2}$		$8.25 \cdot 10^{-2}$

Table 2: Number of iterations used by each iterative method in the tests appearing in Fig. 3.

Iterations	$2$	$3$	$4$	$5$	$\geq 6$
Padé- $(4, 4)$	17	12	6	4	2
Padé- $(8, 8)$	27	7	6	1	0
Minimax- $(4, 4)$	17	20	2	1	1
Minimax- $(8, 8)$	34	6	1	0	0

Equations

r_{m,\ell}(\cdot,\alpha,f) = \operatorname*{arg\,min}_{r\in\mathcal{R}_{m,\ell}} \max_{z\in[f^{-1}(\alpha),1]} \left|\frac{r(z)-f(z)}{f(z)}\right|.

\min_{z\in[f^{-1}(\alpha),1]} \frac{\hat{r}_{m,\ell}(z,\alpha,f)-f(z)}{f(z)} = 0.


(m_{k},\ell_{k}) = \begin{cases} \left(\tfrac{1}{2}(2m)^{k},\ \tfrac{1}{2}(2m)^{k}-1\right) & \text{if } \ell = m-1, \\ \left(\tfrac{1}{2}\left((2m+1)^{k}-1\right),\ \tfrac{1}{2}\left((2m+1)^{k}-1\right)\right) & \text{if } \ell = m. \end{cases}

\|(\widetilde{X}_{k} - A^{1/2})A^{-1/2}\|_{2} \leq E_{m_{k},\ell_{k}}(\sqrt{\cdot},[\alpha^{2},1]),

E_{m,\ell}(f,S) = \min_{r\in\mathcal{R}_{m,\ell}} \max_{z\in S} \left|\frac{r(z)-f(z)}{f(z)}\right|.


g(z_{j}) = \sigma(-1)^{j} \max_{z\in[a,b]} |g(z)|, \quad j = 0,1,\dots,m-1.


\alpha_{k} = \frac{1-\varepsilon_{k}}{1+\varepsilon_{k}}

\varepsilon_{k+1} = E_{m,\ell}\left(f,[f^{-1}(\alpha_{k}),1]\right).

\varepsilon_{k+1} = C(m,\ell,p)\,\varepsilon_{k}^{m+\ell+1} + o(\varepsilon_{k}^{m+\ell+1}),

C(m,\ell,p) = \frac{p^{m+\ell+1}\, m!\, \ell!\, (1/p)_{\ell+1}\, (1-1/p)_{m}}{2^{m+\ell}\, (m+\ell+1)!\, (m+\ell)!}.

\|\widetilde{X}_{k}A^{-1/p} - I\|_{2} \leq \varepsilon_{k},

\varepsilon_{k+1} = E_{m,\ell}\left(\sqrt[p]{\cdot}, \left[\left(\frac{1-\varepsilon_{k}}{1+\varepsilon_{k}}\right)^{p}, 1\right]\right) = C(m,\ell,p)\,\varepsilon_{k}^{m+\ell+1} + o(\varepsilon_{k}^{m+\ell+1}), \quad \varepsilon_{0} = \frac{1-\alpha}{1+\alpha},

\|\widetilde{Y}_{k}A^{-1/p} - I\|_{2}

\|\widetilde{Z}_{k}A^{1/p} - I\|_{2}

\frac{\|\widetilde{X}_{k} - A^{1/p}\|_{2}}{\|A^{1/p}\|_{2}} = \frac{\|(\widetilde{X}_{k}A^{-1/p} - I)A^{1/p}\|_{2}}{\|A^{1/p}\|_{2}} \leq \|\widetilde{X}_{k}A^{-1/p} - I\|_{2} \leq \varepsilon_{k}.

m_{k} + \ell_{k} = (m+\ell+1)^{k} - 1,

\widetilde{f}_{k}(z) = r_{m_{k},\ell_{k}}(z,\alpha,\sqrt{\cdot}), \quad \text{if } p = 2 \text{ and } \ell\in\{m-1,m\}.

\varepsilon_{k} = E_{m,\ell}(\sqrt{\cdot},[\alpha_{k-1}^{2},1]) = E_{m_{k},\ell_{k}}(\sqrt{\cdot},[\alpha^{2},1]), \quad \text{if } \ell\in\{m-1,m\},

\hat{r}_{1,0}(z,\alpha,\sqrt[p]{\cdot}) = \frac{1}{p}\left((p-1)\mu + \frac{z}{\mu^{p-1}}\right), \quad \mu = \left(\frac{\alpha-\alpha^{p}}{(p-1)(1-\alpha)}\right)^{1/p}.

\hat{r}_{0,1}(z,\alpha,\sqrt[p]{\cdot}) = \frac{p}{(p+1)\nu - \nu^{p+1}z}, \quad \nu = \left(\frac{(p+1)(1-\alpha)}{1-\alpha^{p+1}}\right)^{1/p}.


\mu_{k} = \left(\frac{\alpha_{k}-\alpha_{k}^{p}}{(p-1)(1-\alpha_{k})}\right)^{1/p}.


Full text


Rational Minimax Iterations for Computing the Matrix $p$th Root

Evan S. Gawlik, Department of Mathematics, University of Hawaii at Manoa

Abstract

In [7], a family of iterations for computing the matrix square root was constructed by exploiting a recursion obeyed by Zolotarev’s rational minimax approximants of the function $z^{1/2}$ . The present paper generalizes this construction by deriving rational minimax iterations for the matrix $p^{th}$ root, where $p\geq 2$ is an integer. The analysis of these iterations is considerably different from the case $p=2$ , owing to the fact that when $p>2$ , rational minimax approximants of the function $z^{1/p}$ do not obey a recursion. Nevertheless, we show that several of the salient features of the Zolotarev iterations for the matrix square root, including equioscillatory error, order of convergence, and stability, carry over to the case $p>2$ . A key role in the analysis is played by the asymptotic behavior of rational minimax approximants on short intervals. Numerical examples are presented to illustrate the predictions of the theory.

Keywords:

Matrix root, matrix power, rational approximation, minimax, uniform approximation, matrix iteration, Chebyshev approximation, Padé approximation, Newton iteration, Zolotarev

AMS subject classifications:

65F30, 65F60, 41A20, 49K35

1 Introduction

In recent years, a growing body of literature has highlighted the usefulness of rational minimax iterations for computing functions of matrices [25, 26, 7, 8, 4]. In these studies, $f(A)$ is approximated by a rational function $r$ of $A$ possessing two properties: $r$ closely (and often optimally) approximates $f$ in the uniform norm over a subset of the real line, and $r$ can be generated from a recursion. A prominent example of such an iteration was introduced by Nakatsukasa and Freund in [26], where it was observed that rational minimax approximants of the function $\mathrm{sign}(z)=z/(z^{2})^{1/2}$ obey a recursion, allowing one to rapidly compute $\mathrm{sign}(A)$ and related decompositions such as the polar decomposition, symmetric eigendecomposition, SVD, and, in subsequent work, the CS decomposition [8]. An analogous recursion for rational minimax approximants of $z^{1/2}$ has recently been used to construct iterations for the matrix square root [7], building upon ideas of Beckermann [2]. There, the iterations are referred to as Zolotarev iterations, owing to the role played by explicit formulas for rational minimax approximants of $\mathrm{sign}(z)$ and $z^{1/2}$ derived by Zolotarev [31].

The aim of this paper is to introduce a family of rational minimax iterations for computing the principal $p^{th}$ root $A^{1/p}$ of a square matrix $A$ , where $p\geq 2$ is an integer. Recall that the principal $p^{th}$ root of a square matrix $A$ having no nonpositive real eigenvalues is the unique solution of $X^{p}=A$ whose eigenvalues are contained in $\{z\in\mathbb{C}\mid-\pi/p<\arg z<\pi/p\}$ [15, Theorem 7.2]. The iterations we propose reduce to the Zolotarev iterations for the matrix square root [7] when $p=2$ , but when $p>2$ , they differ from the Zolotarev iterations in several important ways. Notably, for all integers $p\geq 2$ , the iterations generate a rational function $r$ of $A$ with the property that for scalar inputs, the relative error $e(z)=(r(z)-z^{1/p})/z^{1/p}$ equioscillates on a certain interval $[a,b]$ (see Section 2 for our terminology). Remarkably, when $p=2$ , $e(z)$ equioscillates often enough to render $\max_{a\leq z\leq b}|e(z)|$ minimal among all choices of $r$ with a fixed numerator and denominator degree [7]. This optimality property is the hallmark of the Zolotarev iterations, and it allows one to appeal to classical results from rational approximation theory to estimate the maximum relative error. When $p>2$ , no such optimality property holds. Much of this paper is devoted to showing that the rational minimax iterations for the $p^{th}$ root still enjoy many of the same desirable features as the Zolotarev iterations for the square root, despite the absence of optimality in the case $p>2$ . We take care to present our results in such a way that when $p=2$ , the salient features of the Zolotarev iterations are recovered as special cases.

There are a number of connections between the iterations we derive and existing iterations from the literature on the matrix $p^{th}$ root. We have already mentioned that they reduce to the Zolotarev iterations when $p=2$ . For arbitrary $p\geq 2$ , the two lowest order versions of our rational minimax iterations are scaled variants of the Newton iteration and the inverse Newton iteration [15, Chapter 6], [3, Section 6], [18]. In another limiting case, our iterations reduce to the Padé iterations [21, Section 5]. Relative to these iterations, the rational minimax iterations offer advantages primarily when the matrix $A$ has eigenvalues with widely varying magnitudes. As an extreme example, if $p=3$ and $A$ is Hermitian positive definite with condition number $\leq 10^{16}$ , convergence is achieved in double-precision arithmetic after just $2$ iterations when using our type- $(6,6)$ rational minimax iteration. In contrast, up to $5$ iterations are needed when using the type- $(6,6)$ Padé iteration. Our numerical experiments indicate that the situation is similar, but less dramatic, for non-normal matrices with eigenvalues away from the positive real axis.

This paper is organized as follows. In Section 2, we review the Zolotarev iterations for the matrix square root by summarizing the contents of [7]. In Section 3, we introduce rational minimax iterations for the matrix $p^{th}$ root and present our main results: Theorem 3.1, Theorem 3.2, and their corollaries. Proofs of these results are provided separately in Section 4. Finally, Section 5 presents numerical experiments that illustrate the predictions of the theory.

2 Background: Zolotarev iterations for the matrix square root

Let us summarize the Zolotarev iterations for the matrix square root and their key properties [7]. Let $\mathcal{R}_{m,\ell}$ denote the set of all rational functions of type $(m,\ell)$ – ratios of polynomials of degree $\leq m$ to polynomials of degree $\leq\ell$ . We say that a function $r(z)=g(z)/h(z)$ in $\mathcal{R}_{m,\ell}$ has exact type $(m^{\prime},\ell^{\prime})$ if, after canceling common factors, $g(z)$ and $h(z)$ have degree exactly $m^{\prime}\leq m$ and $\ell^{\prime}\leq\ell$ , respectively. The number $d=\min\{m-m^{\prime},\ell-\ell^{\prime}\}$ is called the defect of $r$ in $\mathcal{R}_{m,\ell}$ . In most of what follows, $z$ is a real variable; we use the letter $z$ since the behavior of $r$ on $\mathbb{C}$ will play an important role later in the paper.

Given a continuous, increasing bijection $f:[0,1]\rightarrow[0,1]$ and a number $\alpha\in(0,1)$ , let $r_{m,\ell}(z,\alpha,f)$ denote the best type- $(m,\ell)$ rational approximant of $f(z)$ on $[f^{-1}(\alpha),1]$ :

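$$r_{m,\ell}(\cdot,\alpha,f) = \operatorname*{arg\,min}_{r\in\mathcal{R}_{m,\ell}} \max_{z\in[f^{-1}(\alpha),1]} \left|\frac{r(z)-f(z)}{f(z)}\right|. \tag{1}$$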

It is well-known that the minimization problem above has a unique solution [1, p. 55]. Furthermore, explicit formulas for $r_{m,\ell}(\cdot,\alpha,\sqrt{\cdot})$ are known for $\ell\in\{m-1,m\}$ [31]. Let $\hat{r}_{m,\ell}(z,\alpha,f)$ denote the unique scalar multiple of $r_{m,\ell}(z,\alpha,f)$ with the property that

$$\min_{z\in[f^{-1}(\alpha),1]} \frac{\hat{r}_{m,\ell}(z,\alpha,f)-f(z)}{f(z)} = 0. \tag{2}$$

For $m\in\mathbb{N}$ and $\ell\in\{m-1,m\}$ , the Zolotarev iteration of type $(m,\ell)$ for computing the square root of a square matrix $A$ reads

[TABLE]

It is proven in [7] that in exact arithmetic, $X_{k}\rightarrow A^{1/2}$ and $\alpha_{k}\rightarrow 1$ with order of convergence $m+\ell+1$ for any $A$ with no nonpositive real eigenvalues. In floating point arithmetic, it is necessary to reformulate the iteration to ensure its stability; we detail the stable reformulation of (3-4) later on.

The iteration (3-4) has the remarkable property that it generates an optimal rational approximation of $A^{1/2}$ of high degree. Namely, $\widetilde{X}_{k}:=2\alpha_{k}X_{k}/(1+\alpha_{k})=r_{m_{k},\ell_{k}}(A,\alpha,\sqrt{\cdot})$ , where

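$$(m_{k},\ell_{k}) = \begin{cases} \left(\tfrac{1}{2}(2m)^{k},\ \tfrac{1}{2}(2m)^{k}-1\right) & \text{if } \ell = m-1, \\ \left(\tfrac{1}{2}\left((2m+1)^{k}-1\right),\ \tfrac{1}{2}\left((2m+1)^{k}-1\right)\right) & \text{if } \ell = m. \end{cases} \tag{5}$$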

A simple consequence of this is that if $A$ is Hermitian positive definite with eigenvalues in $[\alpha^{2},1]$ , then

$$\|(\widetilde{X}_{k} - A^{1/2})A^{-1/2}\|_{2} \leq E_{m_{k},\ell_{k}}(\sqrt{\cdot},[\alpha^{2},1]),$$

where

$$E_{m,\ell}(f,S) = \min_{r\in\mathcal{R}_{m,\ell}} \max_{z\in S} \left|\frac{r(z)-f(z)}{f(z)}\right|.$$

For more detailed error estimates, including error estimates for non-normal $A$ with eigenvalues in $\mathbb{C}\setminus(-\infty,0]$ , see [7].

3 Minimax iterations for the matrix $p^{th}$ root

In this paper, we propose an iteration for computing $p^{th}$ roots of matrices that generalizes (3-4). Given $\alpha\in(0,1)$ , $m,\ell\in\mathbb{N}_{0}$ , and an integer $p\geq 2$ , the iteration reads

[TABLE]

The Zolotarev iterations (3-4) correspond to the cases $\{(m,\ell,p)\mid m\in\mathbb{N},\,\ell\in\{m-1,m\},\,p=2\}$ in (6-7). (Note that we abusively referred to these cases as “the case $p=2$ ” in Section 1.)

With the exception of the cases $\{(m,\ell,p)\mid m\in\mathbb{N},\,\ell\in\{m-1,m\},\,p=2\}$ and $\{(m,\ell,p)\mid(m,\ell)\in\{(0,0),(1,0),(0,1)\},\,p\geq 2\}$ , explicit formulas for $\hat{r}_{m,\ell}(z,\alpha,\sqrt[p]{\cdot})$ are not known. However, $\hat{r}_{m,\ell}(z,\alpha,\sqrt[p]{\cdot})$ can be computed numerically; see Section 5 for details. Note that the cost of computing $\hat{r}_{m,\ell}(z,\alpha,\sqrt[p]{\cdot})$ is independent of the dimension of $A$ , so it is expected to be negligible for problems involving large matrices.

As with the square root iteration (3-4), it is necessary to reformulate the $p^{th}$ root iteration (6-7) to ensure its stability. This is accomplished by considering the iteration for $Y_{k}=X_{k}^{1-p}A$ and $Z_{k}=X_{k}^{-1}$ implied by (6-7). Exploiting commutativity, we have

[TABLE]

where $h_{\ell,m,p}(z,\alpha)=\hat{r}_{m,\ell}(z,\alpha,\sqrt[p]{\cdot})^{-1}$ . (We swapped the order of the first two indices to emphasize that $h_{\ell,m,p}(z,\alpha)$ is a rational function of type $(\ell,m)$ , not $(m,\ell)$ .)

The remainder of this section presents a series of results about the behavior of the iteration (6-7) and its counterpart (8-10). Proofs of these results are given in Section 4.

3.1 Functional iteration

A great deal of information about the behavior of the iteration (6-7) (and hence (8-10)) can be gleaned from a study of the functional iteration

[TABLE]

Indeed, we have $X_{k}=f_{k}(A)$ in (6-7), and $Y_{k}=f_{k}(A)^{1-p}A$ and $Z_{k}=f_{k}(A)^{-1}$ in (8-10).

The following theorem summarizes the properties of the functional iteration (11-12). In the interest of generality, it focuses on a slight generalization of (11-12) that reduces to (11-12) when the function $f$ appearing below is $f(z)=z^{1/p}$ . The theorem makes use of the following terminology. A continuous function $g(z)$ is said to equioscillate $m$ times on an interval $[a,b]$ if there exist $m$ points $a\leq z_{0}<z_{1}<\dots<z_{m-1}\leq b$ at which

$$g(z_{j}) = \sigma(-1)^{j} \max_{z\in[a,b]} |g(z)|, \quad j = 0,1,\dots,m-1,$$

for some $\sigma\in\{-1,1\}$ . It is well-known that the minimax approximants (1) are uniquely characterized by the property that $\frac{r_{m,\ell}(z,\alpha,f)-f(z)}{f(z)}$ equioscillates at least $m+\ell+2-d$ times on $[f^{-1}(\alpha),1]$ , where $d$ is the defect of $r_{m,\ell}(z,\alpha,f)$ in $\mathcal{R}_{m,\ell}$ [28, Theorem 24.1]. We will be particularly interested in those functions $f$ for which:

(i)

For every $\alpha\in(0,1)$ and $m,\ell\in\mathbb{N}_{0}$ , $r_{m,\ell}(z,\alpha,f)$ has exact type $(m,\ell)$ . Furthermore, $\frac{r_{m,\ell}(z,\alpha,f)-f(z)}{f(z)}$ equioscillates exactly $m+\ell+2$ times on $[f^{-1}(\alpha),1]$ , achieves its maximum at $z=f^{-1}(\alpha)$ , and achieves an extremum at $z=1$ .

The function $f(z)=z^{1/p}$ satisfies this hypothesis; see Lemma 4.12 for a proof.

Theorem 3.1.

Let $f:[0,1]\rightarrow[0,1]$ be a continuous, increasing bijection satisfying (i). Let $\alpha\in(0,1)$ and $m,\ell\in\mathbb{N}_{0}$ , and define $f_{k}(z)$ recursively by

[TABLE]

Then, with $\widetilde{f}_{k}(z)=\frac{2\alpha_{k}}{1+\alpha_{k}}f_{k}(z)$ and $\varepsilon_{k}=\max_{z\in[f^{-1}(\alpha),1]}\left|\frac{\widetilde{f}_{k}(z)-f(z)}{f(z)}\right|$ , we have:

(ii)

For every $k\geq 0$ ,

$$\alpha_{k} = \frac{1-\varepsilon_{k}}{1+\varepsilon_{k}} \tag{15}$$

and

$$\varepsilon_{k+1} = E_{m,\ell}\left(f,[f^{-1}(\alpha_{k}),1]\right). \tag{16}$$

(iii)

For every $k\geq 0$ , the relative error $\frac{\widetilde{f}_{k}(z)-f(z)}{f(z)}$ equioscillates $(m+\ell+1)^{k}+1$ times on $[f^{-1}(\alpha),1]$ , and it achieves its extrema at the endpoints.

(iv)

If $f\in C^{m+\ell+1}([\alpha,1])$ , $f^{-1}$ is Lipschitz on $[\alpha,1]$ , and $(m,\ell)\neq(0,0)$ , then $\varepsilon_{k}\rightarrow 0$ monotonically with order of convergence $m+\ell+1$ as $k\rightarrow\infty$ .
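Theorem 3.1 can be probed numerically in the simplest nontrivial case $(m,\ell)=(1,0)$ with $f(z)=z^{1/p}$, using formula (20) of Proposition 3.5 for $\hat{r}_{1,0}$. The recursion (13-14) is not reproduced in this copy, so the sketch below assumes it takes the form $f_{k+1}(z)=f_{k}(z)\,\hat{r}_{m,\ell}(z f_{k}(z)^{-p},\alpha_{k},\sqrt[p]{\cdot})$, $\alpha_{k+1}=\alpha_{k}/\hat{r}_{m,\ell}(\alpha_{k}^{p},\alpha_{k},\sqrt[p]{\cdot})$, with $f_{0}=1$ and $\alpha_{0}=\alpha$. This is our reading, chosen to be consistent with $X_{k}=f_{k}(A)$ and with the map $g$ in Proposition 3.7. Equioscillation is counted heuristically on a uniform grid, assuming the grid is fine enough to resolve every extremum.

```python
def rhat10(w, mu, p):
    """Formula (20) for rhat_{1,0}(w, alpha, p-th root), written in terms of mu = mu(alpha)."""
    return ((p - 1) * mu + w / mu ** (p - 1)) / p


def alpha_schedule(alpha, p, k):
    """Return alpha_0..alpha_k and mu_0..mu_{k-1} (assumed update alpha_{j+1} = alpha_j / rhat(alpha_j^p))."""
    alphas, mus = [alpha], []
    for _ in range(k):
        al = alphas[-1]
        mu = ((al - al**p) / ((p - 1) * (1 - al))) ** (1.0 / p)
        mus.append(mu)
        alphas.append(al / rhat10(al**p, mu, p))
    return alphas, mus


def rel_error(z, alphas, mus, p):
    """Relative error of f~_k(z) = 2 alpha_k / (1 + alpha_k) * f_k(z) against z^(1/p)."""
    f = 1.0  # f_0 = 1
    for mu in mus:
        f *= rhat10(z / f**p, mu, p)  # assumed recursion f_{j+1} = f_j * rhat(z / f_j^p)
    al = alphas[-1]
    return 2 * al / (1 + al) * f / z ** (1.0 / p) - 1


def count_alternations(errs, rel_tol=1e-4):
    """Grid-based count of alternating-sign points where |err| is within rel_tol of max |err|."""
    peak = max(abs(e) for e in errs)
    signs = []
    for e in errs:
        if abs(e) >= (1 - rel_tol) * peak:
            s = 1 if e > 0 else -1
            if not signs or signs[-1] != s:
                signs.append(s)
    return len(signs)


p, alpha, n = 3, 0.5, 100001
a = alpha**p  # f^{-1}(alpha) for f(z) = z^(1/p)
zs = [a + (1 - a) * i / (n - 1) for i in range(n)]
for k in range(3):
    alphas, mus = alpha_schedule(alpha, p, k)
    errs = [rel_error(z, alphas, mus, p) for z in zs]
    eps_k = (1 - alphas[-1]) / (1 + alphas[-1])  # right-hand side of (15), solved for eps_k
    print(k, count_alternations(errs), max(abs(e) for e in errs), eps_k)
```

For $k=0,1,2$ the count should be $(m+\ell+1)^{k}+1=2^{k}+1$, as claimed in (iii), and the sampled maximum should agree with $(1-\alpha_{k})/(1+\alpha_{k})$, as claimed in (15).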

Let us discuss the meaning of this theorem. It states that the iteration (13-14) generates a function $\widetilde{f}_{k}(z)\approx f(z)$ with the following curious property: The maximum relative error in $\widetilde{f}_{k}(z)$ on the interval $[f^{-1}(\alpha),1]$ is equal to the maximum relative error in the best rational approximant of $f(z)$ on a much smaller interval $[f^{-1}(\alpha_{k-1}),1]$ . Indeed, as $k$ increases, the length of $[f^{-1}(\alpha),1]$ remains constant, whereas the length of $[f^{-1}(\alpha_{k-1}),1]=[f^{-1}(\alpha_{k-1}),f^{-1}(1)]$ is $O(1-\alpha_{k-1})=O(\varepsilon_{k-1})$ by (15), assuming $f^{-1}$ is Lipschitz near $z=1$ . Since rational functions of type $(m,\ell)$ can approximate smooth functions on intervals of length $O(\varepsilon_{k-1})$ with accuracy $O(\varepsilon_{k-1}^{m+\ell+1})$ , we see from (16) that $\varepsilon_{k}=O(\varepsilon_{k-1}^{m+\ell+1})$ , assuming $f$ is smooth enough near $z=1$ . That is, $\varepsilon_{k}\rightarrow 0$ with order of convergence $m+\ell+1$ .

For most functions $f$ , the iteration (13-14) is not useful, as it (rather circularly) uses $f$ (and $f^{-1}$ ) to generate an approximation of $f$ . Furthermore, the approximation it generates need not be a rational function of $z$ . The function $f(z)=z^{1/p}$ , however, is exceptional, in that the iteration (13-14) – which reduces to (11-12) for this $f$ – generates a rational function $f_{k}(z)$ without requiring the evaluation of any $p^{th}$ roots.

The following theorem specializes Theorem 3.1 to the case $f(z)=z^{1/p}$ and gives precise information about the constants implicit in the convergence result (iv). In it, we use the notation $(\beta)_{m}$ for the rising factorial (the Pochhammer symbol): $(\beta)_{m}=\beta(\beta+1)(\beta+2)\cdots(\beta+m-1)$ .

Theorem 3.2.

Let $\alpha\in(0,1)$ , $m,\ell\in\mathbb{N}_{0}$ , and $p\in\mathbb{N}$ with $p\geq 2$ and $(m,\ell)\neq(0,0)$ . Let $f_{k}(z)$ and $\alpha_{k}$ be defined by the iteration (11-12), and let $\widetilde{f}_{k}(z)=\frac{2\alpha_{k}}{1+\alpha_{k}}f_{k}(z)$ and $\varepsilon_{k}=\max_{z\in[\alpha^{p},1]}\left|\frac{\widetilde{f}_{k}(z)-z^{1/p}}{z^{1/p}}\right|$ . Then the conclusions (ii) and (iii) hold with $f(z)=z^{1/p}$ . Furthermore, as $k\rightarrow\infty$ , $\varepsilon_{k}\rightarrow 0$ monotonically with

$$\varepsilon_{k+1} = C(m,\ell,p)\,\varepsilon_{k}^{m+\ell+1} + o(\varepsilon_{k}^{m+\ell+1}),$$

where

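$$C(m,\ell,p) = \frac{p^{m+\ell+1}\, m!\, \ell!\, (1/p)_{\ell+1}\, (1-1/p)_{m}}{2^{m+\ell}\, (m+\ell+1)!\, (m+\ell)!}. \tag{18}$$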

Note that when $p=2$ and $\ell\in\{m-1,m\}$ , (18) simplifies to $C(m,\ell,2)=4^{-(m+\ell)}$ . This is consistent with the results of [7], where it is shown that for these $m$ , $\ell$ , and $p$ , an asymptotically sharp bound of the form $\varepsilon_{k}\leq 4\rho^{-(m+\ell+1)^{k}}$ holds with $\rho$ a constant depending on $\alpha$ .
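Formula (18) is straightforward to evaluate in exact rational arithmetic. The short sketch below (the helper names `rising` and `C` are ours) reproduces the constants recorded in the last row of Table 1 and checks the simplification $C(m,\ell,2)=4^{-(m+\ell)}$ for small $m$.

```python
from fractions import Fraction
from math import factorial


def rising(beta, n):
    """Pochhammer symbol (beta)_n = beta (beta + 1) ... (beta + n - 1)."""
    out = Fraction(1)
    for i in range(n):
        out *= beta + i
    return out


def C(m, l, p):
    """Asymptotic constant C(m, l, p) of formula (18), in exact rational arithmetic."""
    num = (Fraction(p) ** (m + l + 1) * factorial(m) * factorial(l)
           * rising(Fraction(1, p), l + 1) * rising(1 - Fraction(1, p), m))
    den = 2 ** (m + l) * factorial(m + l + 1) * factorial(m + l)
    return num / den


# Constants in the last row of Table 1.
for m, l, p in [(1, 1, 13), (2, 2, 3), (3, 3, 5)]:
    print((m, l, p), C(m, l, p), float(C(m, l, p)))

# Square-root case: C(m, l, 2) should equal 4^{-(m+l)} for l in {m-1, m}.
assert all(C(m, l, 2) == Fraction(1, 4 ** (m + l))
           for m in range(1, 6) for l in (m - 1, m))
```

The exact values are $C(1,1,13)=7/2$, $C(2,2,3)=7/288\approx 2.43\cdot 10^{-2}$, and $C(3,3,5)=33/400=8.25\cdot 10^{-2}$, matching the table.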

3.2 Convergence of the matrix iteration

An immediate consequence of Theorem 3.2 is that the iteration (6-7) converges when $A$ is Hermitian positive definite with eigenvalues in $[\alpha^{p},1]$ .

Corollary 3.3.

Let $\alpha\in(0,1)$ , $m,\ell\in\mathbb{N}_{0}$ , and $p,n\in\mathbb{N}$ with $p\geq 2$ and $(m,\ell)\neq(0,0)$ . Let $A\in\mathbb{C}^{n\times n}$ be Hermitian positive definite. If the eigenvalues of $A$ lie in $[\alpha^{p},1]$ , then the iteration (6-7) generates a sequence $\widetilde{X}_{k}=2\alpha_{k}X_{k}/(1+\alpha_{k})$ that converges to $A^{1/p}$ with order $m+\ell+1$ . In particular, we have

$$\|\widetilde{X}_{k}A^{-1/p} - I\|_{2} \leq \varepsilon_{k},$$

for every $k\geq 0$ , where $\varepsilon_{k}$ obeys the recursion

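$$\varepsilon_{k+1} = E_{m,\ell}\left(\sqrt[p]{\cdot}, \left[\left(\frac{1-\varepsilon_{k}}{1+\varepsilon_{k}}\right)^{p}, 1\right]\right) = C(m,\ell,p)\,\varepsilon_{k}^{m+\ell+1} + o(\varepsilon_{k}^{m+\ell+1}), \quad \varepsilon_{0} = \frac{1-\alpha}{1+\alpha}, \tag{19}$$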

and $C(m,\ell,p)$ is given by (18).

A similar result holds for the coupled iteration (8-10).

Corollary 3.4.

Let $\alpha,m,\ell,p,n$ , and $A$ be as in Corollary 3.3. Then the coupled iteration (8-10) generates sequences $\widetilde{Y}_{k}=(1+\alpha_{k})^{p-1}Y_{k}/(2\alpha_{k})^{p-1}$ and $\widetilde{Z}_{k}=(1+\alpha_{k})Z_{k}/(2\alpha_{k})$ that converge to $A^{1/p}$ and $A^{-1/p}$ respectively, with order $m+\ell+1$ . In particular, we have

[TABLE]

for every $k\geq 0$ , where $\varepsilon_{k}$ obeys the recursion (19).

Note that the bounds above imply corresponding bounds on the relative errors $\|\widetilde{X}_{k}-A^{1/p}\|_{2}/\|A^{1/p}\|_{2}$ , $\|\widetilde{Y}_{k}-A^{1/p}\|_{2}/\|A^{1/p}\|_{2}$ , and $\|\widetilde{Z}_{k}-A^{-1/p}\|_{2}/\|A^{-1/p}\|_{2}$ . For instance,

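$$\frac{\|\widetilde{X}_{k} - A^{1/p}\|_{2}}{\|A^{1/p}\|_{2}} = \frac{\|(\widetilde{X}_{k}A^{-1/p} - I)A^{1/p}\|_{2}}{\|A^{1/p}\|_{2}} \leq \|\widetilde{X}_{k}A^{-1/p} - I\|_{2} \leq \varepsilon_{k}.$$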

When $A$ is non-normal and/or has eigenvalues away from the positive real axis, the behavior of the matrix iteration (6-7) (and hence (8-10)) is dictated by the behavior of the scalar iteration (11-12) on complex inputs $z$ . This has been analyzed in detail for the case $p=2$ in [7], but for $p>2$ , numerical experiments indicate that the scalar iteration converges in a subset of the complex plane with fractal structure, a typical feature of iterations for the $p^{th}$ root. We study this behavior numerically in Section 5. It remains an open problem to determine theoretically the convergence region $\{z\in\mathbb{C}\mid\lim_{k\rightarrow\infty}f_{k}(z)=z^{1/p}\}$ for the iteration (11-12).

3.3 Special cases

For certain values of $m$ , $\ell$ , and $p$ , the theory above recovers some known results from the literature. We discuss these situations below.

3.3.1 Square roots

When $p=2$ , $m\in\mathbb{N}$ , and $\ell\in\{m-1,m\}$ , a remarkable phenomenon occurs, allowing us to draw the connection between Theorem 3.1 and the results of [7] that we alluded to earlier. For these $p$ , $m$ , and $\ell$ , the function $\widetilde{f}_{k}(z)$ is a rational function of type $(m_{k},\ell_{k})$ , where $(m_{k},\ell_{k})$ is given by (5). In both the case $\ell=m-1$ and the case $\ell=m$ , we have

$$m_{k} + \ell_{k} = (m+\ell+1)^{k} - 1,$$

so (iii) implies that $\frac{\widetilde{f}_{k}(z)-f(z)}{f(z)}$ equioscillates $m_{k}+\ell_{k}+2$ times on $[f^{-1}(\alpha),1]$ . It follows from the theory of rational minimax approximation that $\widetilde{f}_{k}(z)$ is the best rational approximant of $\sqrt{z}$ of type $(m_{k},\ell_{k})$ on $[\alpha^{2},1]$ :

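$$\widetilde{f}_{k}(z) = r_{m_{k},\ell_{k}}(z,\alpha,\sqrt{\cdot}), \quad \text{if } p = 2 \text{ and } \ell\in\{m-1,m\}.$$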

In particular,

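$$\varepsilon_{k} = E_{m,\ell}(\sqrt{\cdot},[\alpha_{k-1}^{2},1]) = E_{m_{k},\ell_{k}}(\sqrt{\cdot},[\alpha^{2},1]), \quad \text{if } \ell\in\{m-1,m\},$$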

for every $k\geq 1$ . This shows that Theorem 3.1 includes [7, Theorem 1] as a special case.
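The step from (iii) to optimality is pure counting: an approximant of exact type $(m_{k},\ell_{k})$ is a best approximant when its relative error equioscillates $m_{k}+\ell_{k}+2$ times, and (iii) supplies $(m+\ell+1)^{k}+1$ points. The hypothetical helper below evaluates (5) and confirms that the two counts agree for small $m$ and $k$.

```python
def zolotarev_type(m, l, k):
    """(m_k, l_k) from formula (5), valid for p = 2, k >= 1 and l in {m-1, m}."""
    if l == m - 1:
        half = (2 * m) ** k // 2
        return half, half - 1
    if l == m:
        half = ((2 * m + 1) ** k - 1) // 2
        return half, half
    raise ValueError("formula (5) covers only l = m - 1 and l = m")


for m in range(1, 5):
    for l in (m - 1, m):
        for k in range(1, 5):
            mk, lk = zolotarev_type(m, l, k)
            # points needed for optimality vs. points guaranteed by Theorem 3.1 (iii)
            assert mk + lk + 2 == (m + l + 1) ** k + 1

print(zolotarev_type(1, 0, 3), zolotarev_type(2, 2, 2))  # (4, 3) (12, 12)
```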

3.3.2 Low-order iterations

When $p\geq 2$ is an integer and $(m,\ell)=(1,0)$ or $(0,1)$ , we recover variants of another family of iterations.

Proposition 3.5.

Let $p\geq 2$ be an integer and $\alpha\in(0,1)$ . We have

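$$\hat{r}_{1,0}(z,\alpha,\sqrt[p]{\cdot}) = \frac{1}{p}\left((p-1)\mu + \frac{z}{\mu^{p-1}}\right), \quad \mu = \left(\frac{\alpha-\alpha^{p}}{(p-1)(1-\alpha)}\right)^{1/p}. \tag{20}$$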

and

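$$\hat{r}_{0,1}(z,\alpha,\sqrt[p]{\cdot}) = \frac{p}{(p+1)\nu - \nu^{p+1}z}, \quad \nu = \left(\frac{(p+1)(1-\alpha)}{1-\alpha^{p+1}}\right)^{1/p}.$$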

Note that the formula (20) for $\hat{r}_{1,0}(z,\alpha,\sqrt[p]{\cdot})$ appears in [24, Theorem 2] and [20]; see also [13, Lemma 3.2] for a related result.
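Both formulas can be sanity-checked against the normalization (2) and the equioscillation characterization of minimax approximants: on $[\alpha^{p},1]$, the relative error of $\hat{r}$ should have minimum $0$ and should take the same maximal value at both endpoints. The sketch below samples the error on a uniform grid; the parameter values $\alpha=0.6$, $p=4$ are arbitrary choices of ours.

```python
def r10_hat(z, alpha, p):
    """Formula (20): normalized type-(1,0) minimax approximant of z^(1/p) on [alpha^p, 1]."""
    mu = ((alpha - alpha**p) / ((p - 1) * (1 - alpha))) ** (1.0 / p)
    return ((p - 1) * mu + z / mu ** (p - 1)) / p


def r01_hat(z, alpha, p):
    """Formula (21): normalized type-(0,1) minimax approximant of z^(1/p) on [alpha^p, 1]."""
    nu = ((p + 1) * (1 - alpha) / (1 - alpha ** (p + 1))) ** (1.0 / p)
    return p / ((p + 1) * nu - nu ** (p + 1) * z)


def rel_err_profile(r_hat, alpha, p, n=20001):
    """Sample the relative error r_hat(z) / z^(1/p) - 1 on a uniform grid over [alpha^p, 1]."""
    a = alpha**p
    zs = [a + (1 - a) * i / (n - 1) for i in range(n)]
    return [r_hat(z, alpha, p) / z ** (1.0 / p) - 1 for z in zs]


for r_hat in (r10_hat, r01_hat):
    e = rel_err_profile(r_hat, alpha=0.6, p=4)
    # expected: min(e) ~ 0, and e[0] == e[-1] == max(e) up to rounding
    print(r_hat.__name__, min(e), e[0], e[-1], max(e))
```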

The preceding proposition shows that when $(m,\ell)=(1,0)$ , the iteration (6-7) reads

[TABLE]

where

$$\mu_{k} = \left(\frac{\alpha_{k}-\alpha_{k}^{p}}{(p-1)(1-\alpha_{k})}\right)^{1/p}.$$

This is a scaled variant of the popular Newton iteration [15, Equation 7.5] for the matrix $p^{th}$ root. The scaling heuristic above is reminiscent of one proposed by Hoskins and Walton [17], but theirs is based on type- $(1,0)$ rational minimax approximants of $z^{(p-1)/p}$ .
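A scalar sketch of this scaled Newton iteration follows. It assumes the displayed $(1,0)$ iteration reads $X_{k+1}=X_{k}\,\hat{r}_{1,0}(X_{k}^{-p}A,\alpha_{k},\sqrt[p]{\cdot})$, $\alpha_{k+1}=\alpha_{k}/\hat{r}_{1,0}(\alpha_{k}^{p},\alpha_{k},\sqrt[p]{\cdot})$ with $X_{0}=I$ (our reading; the formulas are not reproduced in this copy). For Hermitian positive definite $A$, this is exactly what the iteration does to each eigenvalue. The switch to $\alpha_{k}=1$ follows the advice in Section 3.4, and the tolerance `switch_tol` is a hypothetical choice. The $\alpha$-recursion alone also exposes the constant from (18), which for $(m,\ell)=(1,0)$ equals $C(1,0,p)=(p-1)/4$.

```python
def mu_k(al, p):
    """Scaling parameter mu_k of the (1,0) iteration; tends to 1 as alpha_k -> 1."""
    return ((al - al**p) / ((p - 1) * (1 - al))) ** (1.0 / p)


def rhat10(w, mu, p):
    """Formula (20), written in terms of mu."""
    return ((p - 1) * mu + w / mu ** (p - 1)) / p


def scaled_newton(a, alpha, p, iters, switch_tol=1e-6):
    """Scalar sketch of the scaled Newton iteration for a^(1/p), with a in [alpha^p, 1]."""
    x, al = 1.0, alpha
    for _ in range(iters):
        if 1 - al < switch_tol:  # hypothetical threshold: fall back to plain Newton (alpha = 1)
            al, mu = 1.0, 1.0
        else:
            mu = mu_k(al, p)
        x, al = x * rhat10(a / x**p, mu, p), al / rhat10(al**p, mu, p)
    return 2 * al / (1 + al) * x  # the rescaled iterate x~_k


# The alpha-recursion alone tracks eps_k = (1 - alpha_k) / (1 + alpha_k).
p, alpha = 3, 0.5
al, eps = alpha, [(1 - alpha) / (1 + alpha)]
for _ in range(3):
    al = al / rhat10(al**p, mu_k(al, p), p)
    eps.append((1 - al) / (1 + al))
ratios = [eps[k] / eps[k - 1] ** 2 for k in range(1, len(eps))]
print(ratios)  # should approach C(1, 0, 3) = (3 - 1) / 4 = 0.5
print(scaled_newton(0.2, alpha, p, 6), 0.2 ** (1 / 3))
```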

On the other hand, when $(m,\ell)=(0,1)$ , the iteration (6-7) reads

[TABLE]

where

[TABLE]

In terms of the matrix $Z_{k}=X_{k}^{-1}$ , the iteration becomes

[TABLE]

which is a scaled variant of the inverse Newton iteration [15, Equation (7.12)] for computing $A^{-1/p}$ .
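An analogous scalar sketch of the scaled inverse Newton iteration is given below. It writes $Z_{k}=X_{k}^{-1}$ and assumes $X_{k+1}=X_{k}\,\hat{r}_{0,1}(X_{k}^{-p}A,\alpha_{k},\sqrt[p]{\cdot})$ together with the same $\alpha$-update as in the $(1,0)$ case (our reading). Formula (21) then gives $Z_{k+1}=\frac{1}{p}\left((p+1)\nu_{k}Z_{k}-\nu_{k}^{p+1}Z_{k}^{p+1}A\right)$, where $\nu_{k}$ is the $\nu$ of Proposition 3.5 evaluated at $\alpha=\alpha_{k}$. The rescaled iterate $\widetilde{Z}_{k}=(1+\alpha_{k})Z_{k}/(2\alpha_{k})$ of Corollary 3.4 approximates $A^{-1/p}$. The names `nu_k`, `rhat01` and `switch_tol` are hypothetical.

```python
def nu_k(al, p):
    """Parameter nu of formula (21) at alpha = alpha_k; tends to 1 as alpha_k -> 1."""
    return ((p + 1) * (1 - al) / (1 - al ** (p + 1))) ** (1.0 / p)


def rhat01(w, nu, p):
    """Formula (21), written in terms of nu."""
    return p / ((p + 1) * nu - nu ** (p + 1) * w)


def scaled_inverse_newton(a, alpha, p, iters, switch_tol=1e-6):
    """Scalar sketch: iterate on z_k = 1 / x_k and return z~_k, an approximation to a^(-1/p)."""
    z, al = 1.0, alpha  # z_0 = 1 / x_0 = 1
    for _ in range(iters):
        if 1 - al < switch_tol:  # hypothetical threshold: plain inverse Newton (alpha = 1)
            al, nu = 1.0, 1.0
        else:
            nu = nu_k(al, p)
        # z_{k+1} = z_k / rhat01(z_k^p a) = ((p + 1) nu z_k - nu^(p + 1) z_k^(p + 1) a) / p
        z, al = ((p + 1) * nu * z - nu ** (p + 1) * z ** (p + 1) * a) / p, al / rhat01(al**p, nu, p)
    return (1 + al) / (2 * al) * z


print(scaled_inverse_newton(0.3, 0.5, 3, 8), 0.3 ** (-1 / 3))
```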

3.3.3 Padé iterations

We recover one more family of iterations by considering the limit as $\alpha\uparrow 1$ in (6-7).

Below, we say that a family of rational functions $\{r_{\alpha}\in\mathcal{R}_{m,\ell}\mid\alpha\in(0,1)\}$ converges coefficientwise to $r_{1}\in\mathcal{R}_{m,\ell}$ as $\alpha\uparrow 1$ if the coefficients of the polynomials in the numerator and denominator of $r_{\alpha}$ , appropriately normalized, approach those of $r_{1}$ as $\alpha\uparrow 1$ .

Proposition 3.6.

As $\alpha\uparrow 1$ , $\hat{r}_{m,\ell}(z,\alpha,\sqrt[p]{\cdot})$ converges coefficientwise to the type- $(m,\ell)$ Padé approximant $P_{m,\ell,p}(z)$ of $z^{1/p}$ at $z=1$ :

[TABLE]

It follows that the iteration (6-7) reduces formally to

[TABLE]

as $\alpha\uparrow 1$ . This is precisely the Padé iteration for the matrix $p^{th}$ root studied by Laszkiewicz and Ziętak [21, Equation (36)]. When $(m,\ell)=(1,1)$ , it is the Halley iteration [19, p. 11], [12]. In terms of $Y_{k}=X_{k}^{1-p}A$ and $Z_{k}=X_{k}^{-1}$ , the iteration (25) reads

[TABLE]

where $Q_{\ell,m,p}(z)=P_{m,\ell,p}(z)^{-1}$ .
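The Padé approximants $P_{m,\ell,p}$ can be generated from the Taylor coefficients $\binom{1/p}{j}$ of $z^{1/p}=(1+x)^{1/p}$, $x=z-1$, by solving the usual linear system for the denominator. The sketch below does this in exact arithmetic; it is a textbook Padé construction, not necessarily the one used in the paper. Assuming the Padé iteration has the form $X_{k+1}=X_{k}P_{m,\ell,p}(X_{k}^{-p}A)$ with $X_{0}=I$, it then runs the $(1,1)$ (Halley) case on a scalar.

```python
from fractions import Fraction


def binom_series(p, n):
    """Taylor coefficients c_0..c_n of (1 + x)^(1/p) at x = 0, i.e. binomial(1/p, j)."""
    s, c, out = Fraction(1, p), Fraction(1), []
    for j in range(n + 1):
        out.append(c)
        c = c * (s - j) / (j + 1)
    return out


def solve(A, b):
    """Gauss-Jordan elimination in exact rational arithmetic (A assumed nonsingular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def pade_root(m, l, p):
    """Type-(m, l) Pade approximant of z^(1/p) at z = 1 in the variable x = z - 1.

    Returns (numerator, denominator) coefficients, lowest degree first, denominator(0) = 1.
    """
    c = binom_series(p, m + l)
    coef = lambda k: c[k] if k >= 0 else Fraction(0)
    # q = 1 + q_1 x + ... + q_l x^l must annihilate the x^{m+1}, ..., x^{m+l} terms of q * c.
    A = [[coef(m + j - i) for i in range(1, l + 1)] for j in range(1, l + 1)]
    b = [-coef(m + j) for j in range(1, l + 1)]
    q = [Fraction(1)] + (solve(A, b) if l > 0 else [])
    num = [sum(q[i] * coef(k - i) for i in range(min(k, l) + 1)) for k in range(m + 1)]
    return num, q


def pade_eval(num, den, z):
    """Evaluate the Pade approximant at z (floating point)."""
    x = z - 1.0
    return (sum(float(u) * x**k for k, u in enumerate(num))
            / sum(float(v) * x**k for k, v in enumerate(den)))


num, den = pade_root(1, 1, 3)  # Halley: (1 + 2x/3) / (1 + x/3) with x = z - 1
x, a = 1.0, 0.2
for _ in range(6):
    x = x * pade_eval(num, den, a / x**3)  # scalar Pade iteration, assumed form
print(num, den, x, a ** (1 / 3))
```

In the variable $z$, the $(1,1)$ case simplifies to $P_{1,1,p}(z)=\frac{(p-1)+(p+1)z}{(p+1)+(p-1)z}$, which is the familiar Halley update.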

For later use, it will be convenient to define

[TABLE]

The Padé iterations (25) and (26-27) are then simply the iterations obtained by setting $\alpha=1$ in the minimax iterations (6-7) and (8-10), respectively.

3.4 Stability of the coupled matrix iteration

As alluded to earlier, the uncoupled matrix iteration (6-7) exhibits numerical instability, whereas the coupled iteration (8-10) does not. We justify the latter claim below.

We recall the following definition. A matrix iteration $X_{k+1}=g(X_{k})$ with fixed point $X_{*}$ is said to be stable in a neighborhood of $X_{*}$ if the Fréchet derivative of $g$ at $X_{*}$ has bounded powers [15, Definition 4.17]. That is, if $L_{g}(A,E)$ denotes the Fréchet derivative of $g$ at $A\in\mathbb{C}^{n\times n}$ in a direction $E\in\mathbb{C}^{n\times n}$ , then there exists a constant $c>0$ such that $\|G^{j}(E)\|\leq c\|E\|$ for every $j$ and every $E\in\mathbb{C}^{n\times n}$ , where $G(E)=L_{g}(X_{*},E)$ .

We first address the stability of the coupled Padé iteration (26-27).

Proposition 3.7.

Let $m,\ell\in\mathbb{N}_{0}$ and $p,n\in\mathbb{N}$ with $(m,\ell)\neq(0,0)$ and $p\geq 2$ . The Padé iteration (26-27) is stable in a neighborhood of $(B,B^{-1})$ for any $B\in\mathbb{C}^{n\times n}$ . In particular, with $g(Y,Z)=(YQ_{\ell,m,p}(YZ)^{p-1},Q_{\ell,m,p}(YZ)Z)$ , we have

[TABLE]

for any $E,F\in\mathbb{C}^{n\times n}$ , and $L_{g}(B,B^{-1};\cdot,\cdot)$ is idempotent.
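Proposition 3.7 can be illustrated, though not proved, in the scalar setting where everything commutes. For the Halley case $(m,\ell)=(1,1)$, the sketch below differentiates the scalar map $g(y,z)=(y\,Q(yz)^{p-1},\,Q(yz)\,z)$, $Q=1/P_{1,1,p}$, by central differences at the fixed point $(b,1/b)$ and checks that the resulting $2\times 2$ Jacobian $J$ satisfies $J^{2}=J$; an idempotent $J$ trivially has bounded powers. A short calculation using $P_{1,1,p}'(1)=1/p$ gives $J=\begin{pmatrix}1/p & -(p-1)b^{2}/p\\ -1/(pb^{2}) & (p-1)/p\end{pmatrix}$. The non-commuting perturbations $E,F$ covered by the proposition are not exercised here.

```python
def pade11(w, p):
    """Type-(1,1) Pade approximant of w^(1/p) at w = 1 (Halley)."""
    return ((p - 1) + (p + 1) * w) / ((p + 1) + (p - 1) * w)


def g(y, z, p):
    """Scalar analogue of the coupled Pade map: (y * Q(yz)^(p-1), Q(yz) * z) with Q = 1 / P."""
    q = 1.0 / pade11(y * z, p)
    return y * q ** (p - 1), q * z


def jacobian(y, z, p, h=1e-6):
    """Central-difference Jacobian [[dg1/dy, dg1/dz], [dg2/dy, dg2/dz]] of g at (y, z)."""
    cols = []
    for dy, dz in ((h, 0.0), (0.0, h)):
        gp, gm = g(y + dy, z + dz, p), g(y - dy, z - dz, p)
        cols.append([(u - v) / (2 * h) for u, v in zip(gp, gm)])
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]


def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


p, b = 3, 1.7
J = jacobian(b, 1.0 / b, p)
J2 = matmul(J, J)
print(J)
print(J2)  # should agree with J up to finite-difference error
```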

Consider now the coupled minimax iteration (8-10). Theorem 3.1 established that $\alpha_{k}$ converges to $1$ in (10). We argue in Section 5 that when $\alpha_{k}$ is close to 1, it is numerically prudent to set $\alpha_{k}$ (and all subsequent iterates) equal to 1, thereby reverting to the Padé iteration (26-27). Since the latter iteration is stable, it follows that the aforementioned modification of (8-10) is stable as well.
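A minimal NumPy sketch of this strategy for $(m,\ell)=(1,0)$ follows. It assumes the coupled iteration (8-10) reads $Y_{k+1}=Y_{k}H_{k}^{p-1}$, $Z_{k+1}=H_{k}Z_{k}$ with $H_{k}=h_{0,1,p}(Y_{k}Z_{k},\alpha_{k})=\hat{r}_{1,0}(Y_{k}Z_{k},\alpha_{k},\sqrt[p]{\cdot})^{-1}$, $Y_{0}=A$, $Z_{0}=I$, and the same $\alpha$-update as before. This is our reading, consistent with the map $g$ of Proposition 3.7 when $\alpha=1$. The switching tolerance is a hypothetical stand-in for the criterion discussed in Section 5. The inverse is formed explicitly only for brevity.

```python
import numpy as np


def coupled_minimax_10(A, alpha, p, iters=10, switch_tol=1e-6):
    """Sketch of the coupled iteration (8-10) for (m, l) = (1, 0); returns (Y~_k, Z~_k)."""
    I = np.eye(A.shape[0])
    Y, Z, al = A.copy(), I.copy(), alpha  # Y_0 = A, Z_0 = I (from X_0 = I, assumed)
    for _ in range(iters):
        if 1 - al < switch_tol:  # hypothetical threshold: revert to the Pade (Newton) iteration
            al, mu = 1.0, 1.0
        else:
            mu = ((al - al**p) / ((p - 1) * (1 - al))) ** (1.0 / p)
        # H = rhat_{1,0}(Y Z, alpha_k)^{-1}, using formula (20)
        H = p * np.linalg.inv((p - 1) * mu * I + (Y @ Z) / mu ** (p - 1))
        Y, Z = Y @ np.linalg.matrix_power(H, p - 1), H @ Z
        al = al / (((p - 1) * mu + al**p / mu ** (p - 1)) / p)
    s = (1 + al) / (2 * al)  # rescaling from Corollary 3.4
    return s ** (p - 1) * Y, s * Z  # approximations to A^{1/p} and A^{-1/p}


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
lam = np.linspace(0.5**3, 1.0, 6)  # eigenvalues in [alpha^p, 1] with alpha = 0.5, p = 3
A = (Q * lam) @ Q.T  # Hermitian positive definite test matrix
Y, Z = coupled_minimax_10(A, alpha=0.5, p=3)
root = (Q * lam ** (1 / 3)) @ Q.T
inv_root = (Q * lam ** (-1 / 3)) @ Q.T
print(np.linalg.norm(Y - root, 2) / np.linalg.norm(root, 2))
print(np.linalg.norm(Z - inv_root, 2) / np.linalg.norm(inv_root, 2))
```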

4 Proofs

In this section, we prove Theorems 3.1 and 3.2, Corollaries 3.3 and 3.4, and Propositions 3.5, 3.6, and 3.7.

4.1 Proof of Theorem 3.1

4.1.1 Equioscillation

To prove the claims (ii) and (iii) in Theorem 3.1, we use an inductive argument. When $k=0$ , (iii) holds since the relative error $\frac{\widetilde{f}_{0}(z)-f(z)}{f(z)}=\frac{2\alpha}{f(z)(1+\alpha)}-1$ decreases monotonically from $\frac{1-\alpha}{1+\alpha}$ to $-\frac{1-\alpha}{1+\alpha}$ as $z$ runs from $f^{-1}(\alpha)$ to $1$ . This shows also that $\varepsilon_{0}=\frac{1-\alpha}{1+\alpha}$ , so (15) holds when $k=0$ . Next, we prove two lemmas in preparation for the inductive step.

Lemma 4.1.

Let $f:[0,1]\rightarrow[0,1]$ be a continuous, increasing bijection satisfying (i). Then the recurrence (14) is equivalent to

[TABLE]

Proof 4.2.

Since

[TABLE]

the defining property (2) of $\hat{r}_{m,\ell}(z,\alpha,f)$ implies that

[TABLE]

Also, the assumption (i) implies that

[TABLE]

so

[TABLE]

Since this holds for any $\alpha\in(0,1)$ , it follows that the recurrence (14) is equivalent to (28).

Lemma 4.3.

Let $f:[0,1]\rightarrow[0,1]$ be a continuous, increasing bijection satisfying (i). Let $\alpha\in(0,1)$ and $m,\ell\in\mathbb{N}_{0}$ . Let $\widetilde{F}(z)$ be any continuous function on $[f^{-1}(\alpha),1]$ with the property that $\frac{\widetilde{F}(z)-f(z)}{f(z)}$ equioscillates $q$ times on $[f^{-1}(\alpha),1]$ and achieves its extrema $\pm\varepsilon$ at the endpoints, where $q\geq 2$ and $0<\varepsilon<1$ . Define

[TABLE]

Then $\frac{H(z)-f(z)}{f(z)}$ equioscillates $(m+\ell+1)(q-1)+1$ times on $[f^{-1}(\alpha),1]$ with extrema $\pm E_{m,\ell}(f,[f^{-1}(\alpha^{\prime}),1])$, and it achieves its extrema at the endpoints.

Proof 4.4.

The assumed equioscillation of $\frac{\widetilde{F}(z)}{f(z)}-1$ on $[f^{-1}(\alpha),1]$ implies that the function $\frac{\widetilde{F}(f^{-1}(z))}{z}-1$ equioscillates $q$ times on $[\alpha,1]$ with extrema $\pm\varepsilon$ . If we now define

[TABLE]

then we conclude that $S(z)-1$ equioscillates $q$ times on $[\alpha,1]$ with extrema $\frac{1-\varepsilon^{2}}{1\pm\varepsilon}-1=\mp\varepsilon$ . Moreover, it achieves its extrema at the endpoints by our assumptions on $\widetilde{F}$ .

By the same reasoning as above, the function

[TABLE]

has the property that $s_{m,\ell}(z,\alpha^{\prime},f)-1$ equioscillates $m+\ell+2$ times on $[\alpha^{\prime},1]$ with extrema $\pm\varepsilon^{\prime}$, and it achieves its extrema at the endpoints by the assumption (i).

Consider now the function

[TABLE]

We claim that $g(z)-1$ equioscillates on $[\alpha,1]$ with extrema $\pm\varepsilon^{\prime}$. To see this, we make two observations. First, as $z$ runs from $\alpha$ to $1$, the quantity $\frac{S(z)}{1+\varepsilon}$ sweeps back and forth between $\frac{1-\varepsilon}{1+\varepsilon}=\alpha^{\prime}$ and $\frac{1+\varepsilon}{1+\varepsilon}=1$ a total of $q-1$ times, attaining these endpoint values at the $q$ extrema of $S(z)-1$. Second, during each sweep of $y=\frac{S(z)}{1+\varepsilon}$ between $\alpha^{\prime}$ and $1$, $s_{m,\ell}(y,\alpha^{\prime},f)-1$ equioscillates $m+\ell+2$ times with extrema $\pm\varepsilon^{\prime}$, two of which occur at the endpoints of the sweep. Since consecutive sweeps share the extremum at their common endpoint, counting extrema shows that the composition (29) (minus 1) equioscillates

$$(q-1)(m+\ell+2)-(q-2)=(m+\ell+1)(q-1)+1$$

times on $[\alpha,1]$ with extrema $\pm\varepsilon^{\prime}$ .

Finally, consider the function

[TABLE]

In view of the equioscillation of (29), the function $h(z)-1$ equioscillates $(m+\ell+1)(q-1)+1$ times on $[f^{-1}(\alpha),1]$ with extrema $\frac{1-\varepsilon^{\prime 2}}{1\pm\varepsilon^{\prime}}-1=\mp\varepsilon^{\prime}$ , and it achieves its extrema at the endpoints. We will complete the proof by showing that $h(z)=\frac{H(z)}{f(z)}$ . Using the fact that $1-\varepsilon^{\prime}=\frac{2\alpha^{\prime\prime}}{1+\alpha^{\prime\prime}}$ , $\widetilde{F}(z)=(1-\varepsilon)F(z)$ , and $r_{m,\ell}(z,\alpha,f)=(1-\varepsilon^{\prime})\hat{r}_{m,\ell}(z,\alpha,f)$ , we have

[TABLE]

Remark 4.5.

When $f(z)=z^{1/p}$ , the function

[TABLE]

appearing in the proof above is a rational approximant of the sector function $\mathrm{sect}_{p}(z)=z/(z^{p})^{1/p}$ . In fact, the proof above reveals that on each of the segments $\{z\in\mathbb{C}\mid e^{-2\pi ij/p}z\in[\alpha^{\prime},1]\}$ , $j=0,1,2,\dots,p-1$ , the relative error

[TABLE]

is real-valued and equioscillates $m+\ell+2$ times with extrema $\pm\varepsilon^{\prime}$. In particular, for $\ell\in\{m-1,m\}$, $s_{m,\ell}(z,\alpha^{\prime},\sqrt{\cdot})$ is Zolotarev’s type-$(2\ell+1,2m)$ best rational approximant of the sign function $\mathrm{sign}(z)=z/(z^{2})^{1/2}$ on $[-1,-\alpha^{\prime}]\cup[\alpha^{\prime},1]$ [26].
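To make the sector function of Remark 4.5 concrete, the following minimal sketch evaluates $\mathrm{sect}_{p}(z)=z/(z^{p})^{1/p}$ with NumPy's principal branch and checks that it is constant, equal to $e^{2\pi ij/p}$, on each ray $e^{2\pi ij/p}[\alpha^{\prime},1]$; for $p=2$ it reduces to the sign function. The function name `sect` and the sample points are our own.

```python
import numpy as np

def sect(z, p):
    """Sector function sect_p(z) = z / (z**p)**(1/p), using principal powers."""
    z = np.asarray(z, dtype=complex)
    return z / (z**p) ** (1.0 / p)

p = 3
x = np.linspace(0.3, 1.0, 5)                  # sample points in [alpha', 1]
for j in range(p):
    w = np.exp(2j * np.pi * j / p)            # j-th p-th root of unity
    print(j, np.round(sect(w * x, p), 12))    # should equal w at every sample point

# For p = 2 the sector function is the sign function on the real line.
print(sect(np.array([-0.5, 0.7]), 2).real)
```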

We are now ready to prove (ii)-(iii). Suppose (iii) and (15) hold at step $k$ in the iteration (11-12). Then Lemma 4.3 (applied with $\widetilde{F}=\widetilde{f}_{k}$, $\varepsilon=\varepsilon_{k}$, and $q=(m+\ell+1)^{k}+1$, so that $\alpha^{\prime}=\alpha_{k}$ and $\alpha^{\prime\prime}=\alpha_{k+1}$) implies that (iii) and (15) hold at step $k+1$, so in fact they hold for all $k$. It now follows immediately that (16) is equivalent to (28), which, in turn, is equivalent to (14) by Lemma 4.1. This completes the proof of (ii)-(iii).
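The extremum count behind the inductive step can be checked mechanically. The sketch below (hypothetical helper names) encodes the counting argument from the proof of Lemma 4.3, $q\mapsto(q-1)(m+\ell+2)-(q-2)$, starts from the two extrema of the constant initial approximant, and confirms that step $k$ yields $(m+\ell+1)^{k}+1$ equioscillation points.

```python
def count_after_step(q, m, ell):
    """Extrema of H after one step: q-1 sweeps with m+ell+2 extrema each,
    where consecutive sweeps share one extremum (q-2 shared points in total)."""
    sweeps = q - 1
    return sweeps * (m + ell + 2) - (sweeps - 1)

def count_at_step(k, m, ell):
    q = 2                      # k = 0: the constant initial approximant equioscillates twice
    for _ in range(k):
        q = count_after_step(q, m, ell)
    return q

for m, ell in [(1, 0), (2, 1), (4, 4)]:
    print((m, ell), [count_at_step(k, m, ell) for k in range(4)])
```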

4.1.2 Convergence

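We now address the last claim (iv) of Theorem 3.1, which concerns the convergence of $\varepsilon_{k}$ to $0$ in the iteration

$$\varepsilon_{0}=\frac{1-\alpha}{1+\alpha},\qquad\varepsilon_{k+1}=G(\varepsilon_{k}),\quad k=0,1,2,\dots,\tag{30}$$

with $\alpha\in(0,1)$,

$$G(\varepsilon)=E_{m,\ell}\left(f,\left[f^{-1}\left(\frac{1-\varepsilon}{1+\varepsilon}\right),1\right]\right),\tag{31}$$

and $(m,\ell)\neq(0,0)$.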

Lemma 4.6.

Let $m,\ell\in\mathbb{N}_{0}$, and let $f:[0,1]\rightarrow[0,1]$ be a continuous, increasing bijection satisfying (i). If $(m,\ell)\neq(0,0)$, then $G$ is continuous, nonnegative, and nondecreasing on $(0,1)$. Furthermore, $G(\varepsilon)<\varepsilon$ for every $\varepsilon\in(0,1)$.

Proof 4.7.

$G$ is nonnegative because it is a uniform error, and it is nondecreasing because increasing $\varepsilon$ decreases $\frac{1-\varepsilon}{1+\varepsilon}$ and therefore enlarges the interval $\left[f^{-1}\left(\frac{1-\varepsilon}{1+\varepsilon}\right),1\right]$, which cannot reduce the minimax error. To show that $G(\varepsilon)<\varepsilon$ for every $\varepsilon\in(0,1)$, note that (31) is no larger than the uniform relative error committed by the constant function $g(z)=1-\varepsilon$:

$$\left|\frac{(1-\varepsilon)-f(z)}{f(z)}\right|=\left|\frac{1-\varepsilon}{f(z)}-1\right|\leq\varepsilon$$

for every $z\in\left[f^{-1}\left(\frac{1-\varepsilon}{1+\varepsilon}\right),1\right]$, since $f(z)$ ranges over $\left[\frac{1-\varepsilon}{1+\varepsilon},1\right]$ there. This establishes that $G(\varepsilon)\leq\varepsilon$. The inequality is in fact strict since we assumed (i), which implies that the minimizer of the relative error is not a constant function when $(m,\ell)\neq(0,0)$. It remains to show that $G$ is continuous on $(0,1)$. We assumed in (i) that the minimizer for $E_{m,\ell}(f,[f^{-1}(\alpha),1])$ has defect 0 in $\mathcal{R}_{m,\ell}$ for each $\alpha\in(0,1)$, so, for each fixed $\alpha\in(0,1)$, the map $g\mapsto r_{m,\ell}(\cdot,\alpha,g)$ is continuous with respect to the uniform norm at $g=f$ [23]. By considering functions $g$ obtained by scaling and translating the input to $f$, we deduce that $r_{m,\ell}(\cdot,\alpha,f)$ depends continuously on $\alpha\in(0,1)$, again with respect to the uniform norm. Hence, the map $\alpha\mapsto E_{m,\ell}(f,[f^{-1}(\alpha),1])$ is continuous on $(0,1)$, and so too is $G$.

It follows from the above properties of $G$ that $\varepsilon_{k}\rightarrow 0$ monotonically in the iteration $\varepsilon_{k+1}=G(\varepsilon_{k})$ for every $\varepsilon_{0}\in(0,1)$ .

4.1.3 Rate of convergence

It remains to show that the order of convergence of $\varepsilon_{k}$ to $0$ is $m+\ell+1$. As we explained in the paragraph below Theorem 3.1, it suffices to note that when $f$ is $C^{m+\ell+1}$ in a neighborhood of $1$,

[TABLE]

Indeed, this, together with (16), gives

[TABLE]

assuming $f^{-1}$ is Lipschitz near $1$ and $f^{-1}(1)=1$ . Below, we give more precise information about the constant implicit in (32). We begin with a lemma that shows, in essence, that the uniform error in the best type- $(m,\ell)$ rational approximant of a function $g(z)$ on a small interval $[-\delta,\delta]$ is about $2^{m+\ell}$ times smaller than the uniform error in the type- $(m,\ell)$ Padé approximant of $g(z)$ .
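The factor $2^{m+\ell}$ comes from the scaled Chebyshev polynomial $2(\delta/2)^{n}T_{n}(z/\delta)$, $n=m+\ell+1$, which is monic and has maximum modulus $2(\delta/2)^{n}$ on $[-\delta,\delta]$, compared with $\delta^{n}$ for $z^{n}$ itself. A minimal NumPy check, with illustrative values of $n$ and $\delta$:

```python
import numpy as np
from numpy.polynomial import chebyshev as C, polynomial as P

def monic_chebyshev_coeffs(n, delta):
    """Power-basis coefficients (low to high) of 2*(delta/2)**n * T_n(z/delta)."""
    t = C.cheb2poly([0] * n + [1])            # T_n in the monomial basis
    return 2 * (delta / 2) ** n * t / delta ** np.arange(n + 1)

n, delta = 5, 0.1                             # n plays the role of m + ell + 1
c = monic_chebyshev_coeffs(n, delta)
z = np.linspace(-delta, delta, 4001)
cheb_dev = np.max(np.abs(P.polyval(z, c)))    # equals 2*(delta/2)**n
mono_dev = np.max(np.abs(z**n))               # equals delta**n
print(c[-1], mono_dev / cheb_dev, 2 ** (n - 1))   # leading coefficient 1, ratio 2**(m+ell)
```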

Lemma 4.8.

Let $g(z)$ be $C^{m+\ell+1}$ and positive in a neighborhood of $0$. Assume that the type-$(m,\ell)$ Padé approximant $p(z)$ of $g(z)$ about $0$ has defect $0$ in $\mathcal{R}_{m,\ell}$, and

[TABLE]

where $c_{g}\in\mathbb{R}$ . For each $\delta>0$ , let

[TABLE]

Then, as $\delta\rightarrow 0$ ,

[TABLE]

Proof 4.9.

Let

[TABLE]

Among polynomials of degree $m+\ell+1$ with unit leading coefficient, the polynomial $z^{m+\ell+1}-q(z)$ is the one that deviates least from $0$ on $[-\delta,\delta]$. Up to a rescaling, this is precisely the degree-$(m+\ell+1)$ Chebyshev polynomial of the first kind $T_{m+\ell+1}(z)$:

$$z^{m+\ell+1}-q(z)=2\left(\frac{\delta}{2}\right)^{m+\ell+1}T_{m+\ell+1}\!\left(\frac{z}{\delta}\right).$$

Now let $R(z)$ be the type-$(m,\ell)$ Padé approximant of

$$\bar{g}(z)=g(z)-c_{g}\,q(z).$$

Since we assumed that the Padé approximant of $g(z)$ has defect $0$ in $\mathcal{R}_{m,\ell}$, the Taylor coefficients of $R(z)$ approach those of $p(z)$ as $\delta\rightarrow 0$ [30, Corollary of Theorem 2a]. It follows that for each $\delta>0$ sufficiently small,

[TABLE]

for some $\bar{c}_{g}$ with $\bar{c}_{g}-c_{g}=o(1)$ as $\delta\rightarrow 0$ . Thus, for each $\delta>0$ sufficiently small,

[TABLE]

Hence, as $\delta\rightarrow 0$ ,

[TABLE]

for every $z\in[-\delta,\delta]$ , uniformly in $z$ . Multiplying by $\frac{1}{g(z)}=\frac{1}{g(0)}+o(1)$ , we conclude that

[TABLE]

for every $z\in[-\delta,\delta]$ , uniformly in $z$ . Finally, by the definition of $r_{\delta}$ ,

[TABLE]

In fact, this bound is sharp, for the following reason. The relation (34) shows that for $\delta$ sufficiently small, $\frac{R(z)-g(z)}{g(z)}$ approximately equioscillates, in the sense that there exist $m+\ell+2$ points $-\delta\leq z_{0}\leq z_{1}\leq\dots\leq z_{m+\ell+1}\leq\delta$ at which $\frac{R(z)-g(z)}{g(z)}$ alternates in sign and satisfies

[TABLE]

where $\gamma=o(\delta^{m+\ell+1})$ . The de la Vallée Poussin lower bound [28, Exercise 24.5] then implies that

[TABLE]

Remark 4.10.

The proof above suggests a heuristic for constructing near-best rational minimax approximants on short intervals $[-\delta,\delta]$: one computes the Padé approximant of $\bar{g}(z)=g(z)-c_{g}z^{m+\ell+1}+2c_{g}(\delta/2)^{m+\ell+1}T_{m+\ell+1}(z/\delta)$ rather than $g(z)$.
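The heuristic of Remark 4.10 is easy to try numerically. The sketch below assumes $g(z)=e^{z}$ (an illustrative choice, not one used in the paper), computes Padé approximants from Taylor coefficients via the standard linear Padé conditions (our own small helper `pade`, not a library routine), and compares the maximum relative error on $[-\delta,\delta]$ of the Padé approximant of $g$ with that of the Padé approximant of $\bar{g}$. For small $\delta$ the ratio should be roughly $2^{m+\ell}$, in line with Lemma 4.8.

```python
import numpy as np
from math import factorial
from numpy.polynomial import chebyshev as C, polynomial as P

def pade(c, m, ell):
    """Type-(m, ell) Padé approximant about 0 from Taylor coefficients c[0..m+ell].
    Returns (num, den), power-basis coefficients from low to high, with den[0] = 1."""
    get = lambda i: c[i] if i >= 0 else 0.0
    den = np.ones(ell + 1)
    if ell > 0:
        A = np.array([[get(k - j) for j in range(1, ell + 1)]
                      for k in range(m + 1, m + ell + 1)])
        den[1:] = np.linalg.solve(A, -np.asarray(c[m + 1:m + ell + 1]))
    num = np.array([sum(den[j] * get(k - j) for j in range(min(k, ell) + 1))
                    for k in range(m + 1)])
    return num, den

m, ell, delta = 1, 1, 0.01
n = m + ell + 1
c = np.array([1.0 / factorial(k) for k in range(n + 1)])   # Taylor coefficients of exp

num, den = pade(c, m, ell)                                 # ordinary Padé approximant p(z)
# c_g: coefficient of z**n in p - g; since deg(num) < n and den[0] = 1 it is -(den*g)_n.
c_g = -sum(den[j] * c[n - j] for j in range(ell + 1))

# q(z) = z**n - 2 (delta/2)**n T_n(z/delta), a polynomial of degree at most n - 1.
t = C.cheb2poly([0] * n + [1])
q = -2 * (delta / 2) ** n * t / delta ** np.arange(n + 1)
q[n] = 0.0                                                 # the z**n terms cancel
c_bar = c[:n] - c_g * q[:n]                                # Taylor coefficients of g_bar = g - c_g q
num_b, den_b = pade(c_bar, m, ell)                         # Padé approximant of g_bar

z = np.linspace(-delta, delta, 4001)
def max_rel_err(nu, de):
    return np.max(np.abs(P.polyval(z, nu) / P.polyval(z, de) / np.exp(z) - 1.0))

ratio = max_rel_err(num, den) / max_rel_err(num_b, den_b)
print(ratio, 2 ** (m + ell))
```

With $m=\ell=1$ and $\delta=0.01$ the ratio should come out a little below 4; higher-order Taylor terms and the division by $g(z)$ keep it from being exactly $2^{m+\ell}$.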

Remark 4.11.

The near equioscillation of $R$ in the proof above can be used to show that $R$ is close to $r_{\delta}$: $R(z)-r_{\delta}(z)=o(\delta^{m+\ell+1})$, uniformly in $z\in[-\delta,\delta]$ as $\delta\rightarrow 0$. The argument is essentially the same as the one used in [29, pp. 429-430] to show that Carathéodory-Fejér approximants are close to minimax approximants on small intervals.

It is now a simple matter to estimate the constant implicit in (32). As $\varepsilon\rightarrow 0$ , the above lemma gives

[TABLE]

where

[TABLE]

and $c_{f,\delta}$ is the Taylor coefficient of $(z-1+\delta)^{m+\ell+1}$ in the difference between $f(z)$ and its type- $(m,\ell)$ Padé approximant about $z=1-\delta$ . A short calculation shows that $\delta=\varepsilon(f^{-1})^{\prime}(1)+o(\varepsilon)=\varepsilon/f^{\prime}(1)+o(\varepsilon)$ and $c_{f}:=c_{f,0}=c_{f,\delta}+o(1)$ , so

[TABLE]

It follows that in the iteration (30), we have

[TABLE]

4.2 Proof of Theorem 3.2

Having proved Theorem 3.1, we now verify that the function $f(z)=z^{1/p}$ satisfies the hypothesis (i), and we prove Theorem 3.2.

We begin by establishing a few properties of the minimax approximants $r_{m,\ell}(z,\alpha,\sqrt[p]{\cdot})$ . The proof of the following lemma is similar to that in [27, Lemma 2], which studies rational functions of type $(\ell+1,\ell)$ that minimize the maximum absolute error on $[0,1]$ rather than the maximum relative error on $[\alpha,1]$ , $\alpha>0$ . The proof makes use of the following terminology. A Chebyshev system of dimension $N$ on an interval $I\subseteq\mathbb{R}$ is a linearly independent set $\{g_{j}(z)\}_{j=1}^{N}$ of continuous functions on $I$ with the property that any nontrivial linear combination $\sum_{j=1}^{N}c_{j}g_{j}(z)$ has at most $N-1$ (distinct) roots in $I$ .

Lemma 4.12.

Let $m,\ell\in\mathbb{N}_{0}$ , $0<a<b<\infty$ , and $p\in\mathbb{N}$ , $p\geq 2$ . If $r\in\mathcal{R}_{m,\ell}$ minimizes

[TABLE]

then $r$ has exact type $(m,\ell)$ , $e(z)$ equioscillates exactly $m+\ell+2$ times on $[a,b]$ , and

[TABLE]

Proof 4.13.

Suppose that $r(z)=g(z)/h(z)$ , where $g(z)$ and $h(z)$ are polynomials of exact degree $m^{\prime}\leq m$ and $\ell^{\prime}\leq\ell$ , respectively. Observe that the function

[TABLE]

belongs to the space $W$ spanned by

[TABLE]

which is a Chebyshev system on $[a,b]$ of dimension $m^{\prime}+\ell^{\prime}+2$ . Thus, $z^{1/p}h(z)e(z)$ has at most $m^{\prime}+\ell^{\prime}+1$ zeros on $[a,b]$ . In particular, $e(z)$ has at most $m^{\prime}+\ell^{\prime}+1$ zeros on $[a,b]$ , so it equioscillates at most $m^{\prime}+\ell^{\prime}+2$ times on $[a,b]$ . But $e(z)$ equioscillates at least $m+\ell+2-d$ times on $[a,b]$ , where $d=\min\{m-m^{\prime},\ell-\ell^{\prime}\}\geq 0$ . It follows that

[TABLE]

so

[TABLE]

From this we conclude that $d=0$ , $m^{\prime}=m$ , $\ell^{\prime}=\ell$ , and $e(z)$ equioscillates exactly $m+\ell+2$ times on $[a,b]$ .

Let $a\leq z_{0}<z_{1}<\dots<z_{m+\ell+1}\leq b$ be the points at which $e(z)$ achieves its extrema on $[a,b]$ . Suppose that $z_{0}>a$ or $z_{m+\ell+1}<b$ . By considering the graph of $e(z)$ , one easily deduces that there exists $c\in\mathbb{R}$ such that $e(z)-c$ has at least $m+\ell+2$ roots in $[a,b]$ . But

[TABLE]

so $z^{1/p}h(z)(e(z)-c)$ has at most $m^{\prime}+\ell^{\prime}+1=m+\ell+1$ roots in $[a,b]$ . In particular, $e(z)-c$ has at most $m+\ell+1$ roots in $[a,b]$ , a contradiction. It follows that $z_{0}=a$ and $z_{m+\ell+1}=b$ .

It remains to verify that the signs in (36-37) are correct. Consider the dependence of $e(z)$ on the parameters $a$ and $b$ . Denote this dependence by $e(z;a,b)$ . By an argument similar to the one made in the proof of Lemma 4.6, the maps $a\mapsto e(a;a,b)$ and $b\mapsto e(a;a,b)$ are continuous on $(0,b)$ and $(a,\infty)$ , respectively. These maps also have no zeros, since $e(z;a,b)$ has a nonzero extremum at $z=a$ for every $0<a<b<\infty$ . Now, for small $\delta>0$ , the proof of Lemma 4.8 shows that for $z\in[1-\delta,1+\delta]$ ,

[TABLE]

where $c_{f}$ is the coefficient of $(z-1)^{m+\ell+1}$ in the Taylor expansion of $P_{m,\ell,p}(z)-z^{1/p}$ about $z=1$. In particular, $e(1-\delta;1-\delta,1+\delta)$ has the same sign as $c_{f}T_{m+\ell+1}(-1)=(-1)^{m+\ell+1}c_{f}$ for $\delta$ close to $0$, which, as we verify below in (39), is positive. By continuity, $e(a;a,b)>0$ for every $0<a<b<\infty$, and (36-37) follow.

The preceding lemma shows that the function $f(z)=z^{1/p}$ satisfies the hypothesis (i), so Theorem 3.2 will follow if we can show that the constant $C(m,\ell,p)$ in the estimate (17) is given by (18). In view of the general estimate (35), it suffices to determine the coefficient $c_{f}$ of the leading-order term $c_{f}(z-1)^{m+\ell+1}$ in $P_{m,\ell,p}(z)-z^{1/p}$, where $P_{m,\ell,p}(z)$ is the Padé approximant (24) of $z^{1/p}$ about $z=1$. This is given by [10, Lemma 3.12]

[TABLE]

Inserting this into (35) and noting that $f^{\prime}(1)=\frac{1}{p}$ and

[TABLE]

we obtain (18).
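The sign claim used in the proof of Lemma 4.12, namely $(-1)^{m+\ell+1}c_{f}>0$, can be spot-checked without the closed form from [10, Lemma 3.12]. The sketch below (our own code) computes the Padé denominator of $(1+w)^{1/p}$, $w=z-1$, from the linear Padé conditions and then reads off $c_{f}$ as the first nonvanishing coefficient of $P_{m,\ell,p}(z)-z^{1/p}$. It checks only a handful of small $(m,\ell,p)$.

```python
import numpy as np

def binom_series(p, N):
    """Taylor coefficients of (1 + w)**(1/p) about w = 0, up to degree N."""
    c = np.empty(N + 1)
    c[0] = 1.0
    for k in range(1, N + 1):
        c[k] = c[k - 1] * (1.0 / p - (k - 1)) / k
    return c

def c_f(m, ell, p):
    """Coefficient of w**(m+ell+1), w = z - 1, in P_{m,ell,p}(z) - z**(1/p),
    computed from the linear Padé conditions with the denominator normalized to den[0] = 1."""
    n = m + ell + 1
    c = binom_series(p, n)
    get = lambda i: c[i] if i >= 0 else 0.0
    den = np.ones(ell + 1)
    if ell > 0:
        A = np.array([[get(k - j) for j in range(1, ell + 1)]
                      for k in range(m + 1, m + ell + 1)])
        den[1:] = np.linalg.solve(A, -c[m + 1:m + ell + 1])
    return -sum(den[j] * c[n - j] for j in range(ell + 1))

for p in (2, 3, 5):
    signs = [np.sign((-1) ** (m + ell + 1) * c_f(m, ell, p))
             for m in range(3) for ell in range(3) if (m, ell) != (0, 0)]
    print(p, signs)   # expected to be all +1 according to the proof of Lemma 4.12
```

For $(m,\ell)=(1,0)$, $P_{1,0,p}$ is the linear Taylor polynomial $1+(z-1)/p$, so $c_{f}=(p-1)/(2p^{2})$, which gives a hand check of the code.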

4.3 Proof of Corollaries 3.3 and 3.4

To prove Corollaries 3.3 and 3.4, observe that with $e_{k}(z)=\frac{\widetilde{f}_{k}(z)-z^{1/p}}{z^{1/p}}$ , we have

[TABLE]

and

[TABLE]

The results follow from the above equalities and the bounds

[TABLE]

and

[TABLE]

4.4 Proof of Proposition 3.5

To prove the formula (20) for $\hat{r}_{1,0}(z,\alpha,\sqrt[p]{\cdot})$ , it suffices to show that the function

[TABLE]

achieves its global maximum on $[\alpha^{p},1]$ at both endpoints and has global minimum $0$ on $[\alpha^{p},1]$. Indeed, if this is the case, then the rescaled function

[TABLE]

has relative error which equioscillates three times on $[\alpha^{p},1]$ , and so must be the minimizer for $E_{1,0}(\sqrt[p]{\cdot},[\alpha^{p},1])$ . A calculation verifies that $\hat{e}(z)$ has a critical point at $z=\mu^{p}$ , $\hat{e}(\mu^{p})=0$ , $\hat{e}(\alpha^{p})=\hat{e}(1)$ , $\hat{e}(z)$ is decreasing on $(\alpha^{p},\mu^{p})$ , and $\hat{e}(z)$ is increasing on $(\mu^{p},1)$ .

The proof of (21) is similar. In this case, a calculation verifies that the function

[TABLE]

has a critical point at $z=1/\nu^{p}$ , $\hat{e}(1/\nu^{p})=0$ , $\hat{e}(\alpha^{p})=\hat{e}(1)$ , $\hat{e}(z)$ is decreasing on $(\alpha^{p},1/\nu^{p})$ , and $\hat{e}(z)$ is increasing on $(1/\nu^{p},1)$ .
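Formula (20) is not reproduced in this section, so as an independent illustration the sketch below rebuilds the type-$(1,0)$ minimizer for $z^{1/p}$ on $[a,1]$ from three-point equioscillation (our own derivation, with a different normalization than (20)). Writing $r(z)=s(c_{0}+z)$, the function $u(z)=(c_{0}+z)z^{-1/p}$ has a single critical point $z^{*}=c_{0}/(p-1)$; choosing $c_{0}$ so that $u(a)=u(1)$ and $s=2/(\max u+\min u)$ makes $r(z)/z^{1/p}-1$ take the values $E,-E,E$ at $a,z^{*},1$. The name `minimax_10` is hypothetical.

```python
import numpy as np

def minimax_10(a, p):
    """Best type-(1,0) relative approximant r(z) = s*(c0 + z) to z**(1/p) on [a, 1].
    Returns (s, c0, zstar, E), where E is the minimax relative error."""
    L = np.log(a)
    # c0 makes u(z) = (c0 + z) z**(-1/p) equal at z = a and z = 1 (expm1 avoids cancellation).
    c0 = -np.expm1((1 - 1 / p) * L) / np.expm1(-L / p)
    zstar = c0 / (p - 1)                      # unique critical point (minimum) of u
    u_max = c0 + 1.0                          # u(a) = u(1)
    u_min = (c0 + zstar) * zstar ** (-1.0 / p)
    s = 2.0 / (u_max + u_min)                 # balances the positive and negative errors
    E = (u_max - u_min) / (u_max + u_min)
    return s, c0, zstar, E

a, p = 0.05, 3
s, c0, zstar, E = minimax_10(a, p)
z = np.linspace(a, 1.0, 20001)
e = s * (c0 + z) / z ** (1.0 / p) - 1.0       # relative error on a fine grid
print(E, e[0], e[-1], e.min())                # e[0] = e[-1] = E and min(e) = -E up to grid effects
```

As $a\to 1$, the normalized coefficients $(sc_{0},s)$ approach those of $P_{1,0,p}(z)=1+(z-1)/p$, which is Proposition 3.6 in this special case.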

4.5 Proof of Proposition 3.6

Trefethen and Gutknecht [30, Theorem 3b] have shown that for any function $f$ analytic in a neighborhood of $1$, $\operatorname*{arg\,min}_{r\in\mathcal{R}_{m,\ell}}\max_{z\in[1-\delta,1]}|r(z)-f(z)|$ converges coefficientwise as $\delta\rightarrow 0$ to the type-$(m,\ell)$ Padé approximant of $f$ about $z=1$, provided that the Padé approximant has defect $0$ in $\mathcal{R}_{m,\ell}$. Their proof carries over easily to minimizers of the relative error $|(r(z)-f(z))/f(z)|$, assuming $f(1)\neq 0$. Since $P_{m,\ell,p}(z)$ has defect $0$ in $\mathcal{R}_{m,\ell}$ [9], Proposition 3.6 follows. The explicit formula (24) for $P_{m,\ell,p}(z)$ is from [21, p. 954].

4.6 Proof of Proposition 3.7

Since $Q_{\ell,m,p}(z)^{-1}=P_{m,\ell,p}(z)$ is a Padé approximant of $f(z)=z^{1/p}$ about $z=1$ of type $(m,\ell)\neq(0,0)$ , we have $Q_{\ell,m,p}(1)=1$ and

[TABLE]

Hence, $Q_{\ell,m,p}(I)=I$ , $L_{Q_{\ell,m,p}}(I,E)=-\frac{1}{p}E$ , and $L_{Q_{\ell,m,p}^{p-1}}(I,E)=-\frac{p-1}{p}E$ for any $E\in\mathbb{C}^{n\times n}$ . Thus, with $g(Y,Z)=(YQ_{\ell,m,p}(ZY)^{p-1},Q_{\ell,m,p}(ZY)Z)$ , we obtain

[TABLE]

Setting $\widetilde{E}=\frac{1}{p}(E-(p-1)BFB)$ and $\widetilde{F}=\frac{1}{p}((p-1)F-B^{-1}EB^{-1})$ , we find that $L_{g}(B,B^{-1};\widetilde{E},\widetilde{F})=L_{g}(B,B^{-1};E,F)$ , so $L_{g}(B,B^{-1};\cdot,\cdot)$ is idempotent.
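A finite-difference check of this computation is sketched below. It assumes the simplest case $(m,\ell)=(1,0)$, for which $Q_{0,1,p}(z)=P_{1,0,p}(z)^{-1}=p/(p-1+z)$, so $Q(1)=1$ and $Q^{\prime}(1)=-\frac{1}{p}$ as required above. Expanding $g$ to first order about $(B,B^{-1})$ with these two facts gives $L_{g}(B,B^{-1};E,F)=(\widetilde{E},\widetilde{F})$; the code compares this with a central difference and applies the map twice to exhibit idempotence. The random test matrix and the helper names are arbitrary choices.

```python
import numpy as np

def Q(X, p):
    """Q(z) = p / (p - 1 + z), the reciprocal of P_{1,0,p}(z) = 1 + (z - 1)/p, at a matrix X."""
    return p * np.linalg.inv((p - 1) * np.eye(X.shape[0]) + X)

def g(Y, Z, p):
    """g(Y, Z) = (Y Q(ZY)^{p-1}, Q(ZY) Z)."""
    QZY = Q(Z @ Y, p)
    return Y @ np.linalg.matrix_power(QZY, p - 1), QZY @ Z

def L_closed(B, Binv, E, F, p):
    """(E~, F~) = ((E - (p-1) B F B)/p, ((p-1) F - B^{-1} E B^{-1})/p)."""
    return (E - (p - 1) * B @ F @ B) / p, ((p - 1) * F - Binv @ E @ Binv) / p

rng = np.random.default_rng(0)
n, p, h = 4, 3, 1e-6
B = 5 * np.eye(n) + 0.5 * rng.standard_normal((n, n))     # well-conditioned test matrix
Binv = np.linalg.inv(B)
E, F = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# Central finite difference of g at (B, B^{-1}) in the direction (E, F).
gp, gm = g(B + h * E, Binv + h * F, p), g(B - h * E, Binv - h * F, p)
L_fd = [(u - v) / (2 * h) for u, v in zip(gp, gm)]
L_cf = L_closed(B, Binv, E, F, p)
L_twice = L_closed(B, Binv, *L_cf, p)                     # apply the linear map a second time
print(max(np.abs(u - v).max() for u, v in zip(L_fd, L_cf)),
      max(np.abs(u - v).max() for u, v in zip(L_twice, L_cf)))
```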

5 Numerical examples

In this section, we present numerical examples and discuss the implementation of the rational minimax iteration (8-10).

5.1 Implementation

Implementing the rational minimax iteration (8-10) requires evaluating the rational function $h_{\ell,m,p}(z,\alpha_{k})=\hat{r}_{m,\ell}(z,\alpha_{k},\sqrt[p]{\cdot})^{-1}$ at a matrix argument $Z_{k}Y_{k}$. With the exception of the special cases detailed in Section 3.3, explicit formulas for this function are not available. Nevertheless, $\hat{r}_{m,\ell}(z,\alpha_{k},\sqrt[p]{\cdot})$ (or, more precisely, its unscaled counterpart $r_{m,\ell}(z,\alpha_{k},\sqrt[p]{\cdot})$) can be computed numerically using, for instance, the function MiniMaxApproximation from Mathematica’s FunctionApproximations package. We used this function along with Apart to compute $h_{\ell,m,p}(z,\alpha_{k})$ in partial fraction form. For $\alpha_{k}$ close to $1$, the computation of $h_{\ell,m,p}(z,\alpha_{k})$ poses numerical difficulties, so we rounded $\alpha_{k}$ to $1$ (thereby reverting to the Padé iteration (26-27)) whenever $\alpha_{k}>0.99$. We also observed that for $\alpha_{k}$ close to $0$ and $\ell=m$, accuracy improved if $r_{m,m}(z,\alpha_{k},\sqrt[p]{\cdot})$ was computed as $R(1/z)$, where $R=\operatorname*{arg\,min}_{r\in\mathcal{R}_{m,m}}\max_{1\leq z\leq\alpha_{k}^{-p}}|(r(z)-z^{-1/p})/z^{-1/p}|$.

Note that a more robust option for computing minimizers of the maximum absolute error $|r(z)-f(z)|$ is the Chebfun function minimax [6]. However, Chebfun currently does not support minimization of the maximum relative error $|(r(z)-f(z))/f(z)|$ .
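Once $h_{\ell,m,p}(z,\alpha_{k})$ is available in partial fraction form, evaluating it at a matrix argument costs one linear solve per pole, and the solves are independent. The sketch below assumes a form $a_{0}+\sum_{j}a_{j}/(z+b_{j})$ with made-up coefficients (not output of MiniMaxApproximation or Apart) and checks the matrix evaluation against the scalar function applied to the eigenvalues. The helper name is hypothetical.

```python
import numpy as np

def eval_partial_fractions(X, a0, a, b):
    """Evaluate r(X) = a0*I + sum_j a[j] * (X + b[j]*I)^{-1}.
    Each term needs one linear solve; the terms could be computed in parallel."""
    I = np.eye(X.shape[0])
    W = sum(aj * np.linalg.solve(X + bj * I, I) for aj, bj in zip(a, b))
    return a0 * I + W

# Made-up coefficients for illustration only (not a computed minimax approximant).
a0, a, b = 0.5, [0.2, 0.1, 0.05], [0.3, 0.03, 0.003]
rng = np.random.default_rng(1)
lam = np.linspace(0.01, 1.0, 5)                    # positive spectrum
V = rng.standard_normal((5, 5)) + 5 * np.eye(5)    # diagonalizing basis
X = V @ np.diag(lam) @ np.linalg.inv(V)

R = eval_partial_fractions(X, a0, a, b)
r_scalar = a0 + sum(aj / (lam + bj) for aj, bj in zip(a, b))
R_eig = V @ np.diag(r_scalar) @ np.linalg.inv(V)   # r applied through the eigendecomposition
print(np.abs(R - R_eig).max())
```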

Algorithm 1 summarizes the implementation of the rational minimax iteration (8-10). For simplicity, it focuses on the type $(m,m)$ iteration. The type $(m,\ell)$ iteration with $\ell\neq m$ is similar, but the form of the partial fraction expansion of $h_{\ell,m,p}(z,\alpha)$ varies with $\ell$ . In the algorithm, the eigenvalues of $A$ with the smallest and largest magnitudes are denoted $\lambda_{\mathrm{min}}(A)$ and $\lambda_{\mathrm{max}}(A)$ , respectively.

The choices of $\alpha_{0}$ and $\tau$ used in the algorithm are motivated by Corollary 3.4: they ensure that the spectrum of $A/\tau$ is contained in the annulus $\{z\in\mathbb{C}\mid\alpha_{0}^{p}\leq|z|\leq 1\}$. In particular, if $A$ is Hermitian positive definite, then the spectrum of $A/\tau$ is contained in $[\alpha_{0}^{p},1]$, and Corollary 3.4 is directly applicable. Neither $\lambda_{\mathrm{min}}(A)$ nor $\lambda_{\mathrm{max}}(A)$ needs to be computed accurately; our experience suggests that estimates can be used without significantly degrading the algorithm’s performance.
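Lines 1-2 of Algorithm 1 are not reproduced here. One choice consistent with the annulus condition, stated as an assumption, is $\tau=|\lambda_{\mathrm{max}}(A)|$ and $\alpha_{0}=(|\lambda_{\mathrm{min}}(A)|/|\lambda_{\mathrm{max}}(A)|)^{1/p}$; the sketch below computes these for a random Hermitian positive definite matrix and checks that the spectrum of $A/\tau$ lands in $[\alpha_{0}^{p},1]$. The helper name `scale_for_annulus` is hypothetical.

```python
import numpy as np

def scale_for_annulus(A, p):
    """Assumed scaling: tau = |lambda_max|, alpha0 = (|lambda_min| / |lambda_max|)**(1/p),
    so that the eigenvalues of A/tau satisfy alpha0**p <= |z| <= 1."""
    mags = np.abs(np.linalg.eigvals(A))
    tau = mags.max()
    alpha0 = (mags.min() / tau) ** (1.0 / p)
    return tau, alpha0

rng = np.random.default_rng(2)
G = rng.standard_normal((6, 6))
A = G @ G.T + 1e-3 * np.eye(6)              # Hermitian positive definite test matrix
p = 3
tau, alpha0 = scale_for_annulus(A, p)
spec = np.linalg.eigvalsh(A / tau)
print(alpha0**p, spec.min(), spec.max())
```

The $p$th root of $A$ is then recovered as $\tau^{1/p}(A/\tau)^{1/p}$.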

As a termination criterion, we terminated the iterations when

[TABLE]

where $\Delta=10^{-15}$ is a relative error tolerance. This is a generalization to arbitrary $p$ of the termination criterion described in [7, Section 4.3].

Floating point operations

If $A$ is $n\times n$ and $(a_{0}I+W)^{p-1}$ is computed with binary powering in Line 9 of Algorithm 1, then the cost of each iteration in Algorithm 1 is about $(6+2m+\beta\log_{2}(p-1))n^{3}$ flops, where $\beta\in[1,2]$ [15, p. 72]. In the first iteration, the cost reduces to $(2+2m+\beta\log_{2}(p-1))n^{3}$ flops since $Z_{0}=I$ . If parallelism is exploited, then the $m$ matrix inversions in Line 8 can be performed simultaneously, as can Lines 9-10. The effective cost of such a parallel implementation is $(4+\beta\log_{2}(p-1))n^{3}$ flops in the first iteration and $(6+\beta\log_{2}(p-1))n^{3}$ flops in each remaining iteration. Further savings in computational costs can be achieved when $p=2$ ; see [7, Section 4.2] for details.
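The factor $\beta\in[1,2]$ reflects binary powering: computing $X^{k}$ this way takes $\lfloor\log_{2}k\rfloor+\nu(k)-1$ matrix products, where $\nu(k)$ is the number of ones in the binary expansion of $k$, and this count lies between $\log_{2}k$ and $2\log_{2}k$. A minimal sketch that counts the products (each of which costs about $2n^{3}$ flops for $n\times n$ matrices); the function name `binary_power` is our own.

```python
import numpy as np

def binary_power(X, k):
    """X**k for an integer k >= 1 by binary powering.
    Returns (X**k, number of matrix products used)."""
    result, base, mults = None, X, 0
    while True:
        if k & 1:                         # this bit contributes the current power of X
            if result is None:
                result = base
            else:
                result = result @ base
                mults += 1
        k >>= 1
        if k == 0:
            return result, mults
        base = base @ base                # square for the next binary digit
        mults += 1

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 5)) / 3
for p in (3, 5, 8, 17):
    Xp, mults = binary_power(X, p - 1)
    print(p, mults, round(mults / np.log2(p - 1), 3))   # the ratio plays the role of beta
```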

5.2 Scalar iteration

Asymptotic convergence rates

To verify the asymptotic convergence rates predicted by Theorem 3.2, we computed $\varepsilon_{k}=\frac{1-\alpha_{k}}{1+\alpha_{k}}$ , $k=1,2,3$ , for various choices of $m$ , $\ell$ , $p$ , and $\varepsilon_{0}$ . Table 1 reports the results for three such choices. (We selected values of $m$ , $\ell$ , $p$ , and $\varepsilon_{0}$ so that the asymptotic regime was reached before convergence to machine precision occurred.) The table demonstrates that the ratios $\varepsilon_{k}/\varepsilon_{k-1}^{m+\ell+1}$ approach the constant $C(m,\ell,p)$ given by (18). Note that the entry in the row $k=3$ of the last column should be ignored, since $\varepsilon_{3}$ is below machine precision in that instance.
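The scalar recursion $\varepsilon_{k+1}=G(\varepsilon_{k})$ can be reproduced for the simplest case $(m,\ell)=(1,0)$, where the minimizer has the three-point equioscillation structure of Lemma 4.12 and can be built in closed form (our own construction; the sketch does not use (18) or Table 1). A Taylor expansion about $z=1$ (also our own) suggests the limit $\varepsilon_{k}/\varepsilon_{k-1}^{2}\to(p-1)/4$ in this case: the interval $[((1-\varepsilon)/(1+\varepsilon))^{p},1]$ has length about $2p\varepsilon$, and the best linear approximation error on an interval of length $L$ is about $|(z^{1/p})^{\prime\prime}(1)|L^{2}/16$. The code checks that the computed ratios settle near that value; `expm1` and `log1p` avoid cancellation when $\varepsilon$ is small.

```python
import numpy as np

def G_10(eps, p):
    """G(eps) for f(z) = z**(1/p) and (m, ell) = (1, 0): the minimax relative error of a
    linear function on [a, 1], a = ((1 - eps)/(1 + eps))**p, via three-point equioscillation."""
    L = p * (np.log1p(-eps) - np.log1p(eps))        # L = log(a), computed accurately
    c0 = -np.expm1((1 - 1 / p) * L) / np.expm1(-L / p)
    zstar = c0 / (p - 1)
    u_max = c0 + 1.0
    u_min = (c0 + zstar) * zstar ** (-1.0 / p)
    return (u_max - u_min) / (u_max + u_min)

p, eps = 3, [0.5]
for k in range(4):
    eps.append(G_10(eps[-1], p))
ratios = [eps[k] / eps[k - 1] ** 2 for k in range(1, len(eps))]
print(eps)
print(ratios, (p - 1) / 4)
```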

Complex inputs

To study the behavior of the rational function $\widetilde{f}_{k}(z)$ generated by the type- $(m,\ell)$ iteration (11-12), we numerically computed the sets

[TABLE]

for various choices of $\delta$ , $\alpha$ , $m$ , $\ell$ , and $p$ . The boundaries of these sets are plotted in Fig. 1. They are plotted in the $(\log_{10}|z|,\arg z)$ coordinate plane rather than the usual $(\operatorname{Re}z,\operatorname{Im}z)$ coordinate plane to facilitate viewing. The shaded regions in the plots correspond to points $z\in\mathbb{C}$ for which $\lim_{k\rightarrow\infty}\widetilde{f}_{k}(z)\neq z^{1/p}$ . Numerical evidence indicates that at these points, $\lim_{k\rightarrow\infty}\widetilde{f}_{k}(z)\in\{e^{2\pi ij/p}z^{1/p}\mid j\in\{1,2,\dots,p-1\}\}$ . Furthermore, the shaded regions have a fractal structure. Both of these phenomena are typical features of iterations for the $p^{th}$ root when $p>2$ [5].
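The phenomenon of convergence to a non-principal root is easy to reproduce with any scalar $p$th root iteration. The sketch below swaps in the classical Newton iteration started from $x_{0}=1$ (not the rational minimax iteration (11-12)) and reports, for each $z$, the index $j$ such that the limit equals $e^{2\pi ij/p}z^{1/p}$. Points just above and just below the negative real axis land on different non-principal roots. The helper names are hypothetical.

```python
import numpy as np

def newton_pth_root(z, p, iters=100):
    """Scalar Newton iteration x <- ((p - 1) x + z x**(1 - p)) / p, started from x0 = 1."""
    x = np.ones_like(z, dtype=complex)
    for _ in range(iters):
        x = ((p - 1) * x + z * x ** (1 - p)) / p
    return x

def branch_index(w, z, p):
    """Index j in {0, ..., p-1} with w close to exp(2*pi*i*j/p) * z**(1/p) (principal root)."""
    ratio = w / np.asarray(z, dtype=complex) ** (1.0 / p)
    return np.mod(np.rint(np.angle(ratio) * p / (2 * np.pi)), p).astype(int)

p = 3
z = np.array([2.0, 0.8 + 0.1j, -1.0 + 1e-3j, -1.0 - 1e-3j])
w = newton_pth_root(z, p)
print(branch_index(w, z, p))   # 0 marks convergence to the principal root
```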

Fig. 1 gives valuable insight into the behavior of the matrix iteration (6-7) (and, of course, its coupled counterpart (8-10)). Indeed, if $A$ is a normal matrix with eigenvalues in $\mathcal{S}(k)$ , then the iteration (6-7) converges in at most $k$ iterations with a relative tolerance $\delta$ in the $2$ -norm. As an example, the plot in row 3, column 2 of Fig. 1 demonstrates that $\mathcal{S}(2)$ contains the set

[TABLE]

when $(m,\ell)=(8,8)$ , $p=3$ , and $\alpha=10^{-10/3}$ . It follows that the type- $(8,8)$ iteration (6-7) converges to $A^{1/3}$ in at most $2$ iterations for any normal matrix $A$ with spectrum in the right half plane and $|\lambda_{\mathrm{max}}(A)/\lambda_{\mathrm{min}}(A)|\leq 10^{10}$ .
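The reduction from matrices to scalars uses normality: if $A=U\Lambda U^{*}$ with $U$ unitary, then $r(A)-A^{1/p}=U(r(\Lambda)-\Lambda^{1/p})U^{*}$, so its 2-norm is $\max_{i}|r(\lambda_{i})-\lambda_{i}^{1/p}|$, and the relative 2-norm error is at most the largest scalar relative error. The sketch below checks this for a random normal matrix, using a simple illustrative rational function (one Newton step from 1) in place of $\widetilde{f}_{k}$.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 3, 6
# Normal test matrix A = U diag(lam) U^H with spectrum in the right half-plane.
lam = rng.uniform(1e-2, 1.0, n) * np.exp(1j * rng.uniform(-1.2, 1.2, n))
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
A = U @ np.diag(lam) @ U.conj().T
root = U @ np.diag(lam ** (1 / p)) @ U.conj().T       # principal p-th root of A

r = lambda x: ((p - 1) + x) / p                        # illustrative stand-in for f_k
X = ((p - 1) * np.eye(n) + A) / p                      # r(A)

abs_matrix = np.linalg.norm(X - root, 2)
abs_scalar = np.max(np.abs(r(lam) - lam ** (1 / p)))
rel_matrix = abs_matrix / np.linalg.norm(root, 2)
rel_scalar = np.max(np.abs(r(lam) - lam ** (1 / p)) / np.abs(lam ** (1 / p)))
print(abs_matrix, abs_scalar, rel_matrix, rel_scalar)
```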

For comparison, Fig. 2 shows the boundaries of the sets

[TABLE]

where this time $\widetilde{f}_{k}(z)$ is the rational function generated by (11-12) with the initial condition $\alpha_{0}=\alpha$ replaced by $\alpha_{0}=1$ . By Proposition 3.6, the sets $\mathcal{T}(k)$ characterize the convergence behavior of the Padé iteration (24) (and its coupled counterpart (26-27)) with the initial iterate scaled by $1/\alpha^{p/2}$ .

Notice that for small $\alpha$ (the two rightmost columns of Fig. 2), the sets $\mathcal{T}(k)$ do not contain scalars with extreme magnitudes ($|z|=\alpha^{p}$ and $|z|=1$) unless $k$ is relatively large. Comparing, for instance, the bottom right plots in Figs. 1 and 2, we see that if $A$ is Hermitian positive definite with spectrum in $[10^{-16},1]$, then the type-$(8,8)$ rational minimax iteration (11-12) converges in at most $2$ iterations, whereas the type-$(8,8)$ Padé iteration (24) converges in at most $5$. The same observation holds, in fact, for the type-$(6,6)$ and type-$(7,7)$ iterations, which are not shown in Figs. 1-2. This is entirely analogous to the behavior observed in the case $p=2$ in [7, Section 5.1]. In fact, with the exception of the low-order iterations, Figs. 1-2 bear a rather strong resemblance to Figs. 1-2 of [7].

It is worth noting that for the low-order iterations, the sets $\{z\in\mathbb{C}\mid\lim_{k\rightarrow\infty}\widetilde{f}_{k}(z)\neq z^{1/p}\}$ occupy more of the complex plane when $\widetilde{f}_{k}(z)$ is generated from the rational minimax iteration than when $\widetilde{f}_{k}(z)$ is generated from the Padé iteration (see the shaded regions in row 1 of Figs. 1-2). This appears to be a drawback of the low-order rational minimax iterations. The moderate-order and high-order iterations do not suffer as much from this issue; compare the shaded regions in the bottom two rows of Figs. 1-2, which occupy only a small neighborhood of the nonpositive real axis ( $|\arg z|=\pi$ ). The latter observation suggests that for moderate to high $m$ and $\ell$ , it is safe to apply Algorithm 1 to matrices with spectrum contained in $\{z\in\mathbb{C}\colon|\arg z|\leq\Theta\}$ , where $\Theta<\pi$ is close to $\pi$ . For matrices with eigenvalues that lie very near but not on the nonpositive real axis, a simple workaround is to compute $A^{1/2}$ using any algorithm for the matrix square root, and then compute $((A^{1/2})^{1/p})^{2}$ . One can also compute $((A^{1/2^{s}})^{1/p})^{2^{s}}$ with $s>1$ , as in [13, 16], but the advantages of minimax approximation over Padé approximation become less pronounced as $s$ increases, since $A^{1/2^{s}}$ has eigenvalues clustered near $1$ for large $s$ .
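The square-root workaround relies on principal branches composing correctly: $|\arg z^{1/2^{s}}|\leq\pi/2^{s}$, and $((z^{1/2^{s}})^{1/p})^{2^{s}}=z^{1/p}$ for $z$ off the nonpositive real axis. A scalar NumPy check on points with arguments close to $\pm\pi$ (the grid is an arbitrary choice):

```python
import numpy as np

p = 3
theta = np.linspace(-np.pi + 1e-3, np.pi - 1e-3, 2001)   # includes arguments close to +-pi
z = np.logspace(-4, 4, 2001) * np.exp(1j * theta)
direct = z ** (1 / p)                                     # principal p-th root

for s in (1, 2, 3):
    w = z ** (1 / 2**s)                                   # arguments shrink to |arg w| <= pi / 2**s
    via_roots = (w ** (1 / p)) ** (2**s)
    print(s, np.abs(np.angle(w)).max() / np.pi,
          np.max(np.abs(via_roots - direct) / np.abs(direct)))
```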

5.3 Matrix iteration

To test Algorithm 1, we applied it to a collection of matrices of size $10\times 10$ from the Matrix Computation Toolbox [14]. We selected those $10\times 10$ matrices in the toolbox with condition number $\leq u^{-1}$ (where $u=2^{-53}$ denotes the unit roundoff) and with spectrum contained in the sector $\{z\in\mathbb{C}\colon|\arg z|<0.9\pi\}$ . We also included those matrices whose spectrum could be rotated into the aforementioned sector by multiplying $A$ by a suitable scalar $e^{i\theta}$ , $\theta\in[0,2\pi]$ . A total of 41 matrices met these criteria.
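One way to find a suitable rotation $e^{i\theta}$ is to locate the widest angular gap in the spectrum and turn its midpoint onto the negative real axis; a rotation exists exactly when that gap exceeds $2(\pi-0.9\pi)=0.2\pi$. The helper below is hypothetical (the paper does not say how the rotated matrices were identified) and works on a precomputed list of eigenvalues.

```python
import numpy as np

def rotation_into_sector(eigs, max_arg=0.9 * np.pi):
    """Return theta with |arg(exp(1j*theta) * lam)| < max_arg for every eigenvalue lam,
    or None if no rotation achieves this."""
    ang = np.sort(np.mod(np.angle(eigs), 2 * np.pi))
    gaps = np.diff(np.append(ang, ang[0] + 2 * np.pi))   # gaps between neighbouring arguments
    j = int(np.argmax(gaps))
    if gaps[j] <= 2 * (np.pi - max_arg):
        return None
    mid = ang[j] + gaps[j] / 2                           # centre of the widest empty arc
    return np.pi - mid                                   # rotate that centre onto the negative axis

cluster = np.exp(1j * np.array([np.pi - 0.05, np.pi, np.pi + 0.05]))   # spectrum near -1
theta = rotation_into_sector(cluster)
print(theta, np.abs(np.angle(np.exp(1j * theta) * cluster)).max() / np.pi)
print(rotation_into_sector(np.exp(2j * np.pi * np.arange(12) / 12)))   # gaps too small
```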

Fig. 3 plots the relative error $\|\widehat{X}-A^{1/p}\|_{\infty}/\|A^{1/p}\|_{\infty}$ in the computed $p^{th}$ root $\widehat{X}$ of $A$ for each of the $41$ matrices, where $p=3$ . The tests are sorted in order of decreasing $\kappa^{(p)}(A)$ , where

[TABLE]

denotes the Frobenius-norm relative condition number of the matrix $p^{th}$ root $X$ of $A$ [15, Problem 7.4]. Results for five methods are shown: the rational minimax iterations (8-10) of type $(4,4)$ and $(8,8)$ , the Padé iterations (26-27) of type $(4,4)$ and $(8,8)$ , and the built-in Matlab function funm. The Padé iterations were implemented using Algorithm 1 with Lines 1-2 replaced by $\tau=1/\sqrt{|\lambda_{\mathrm{min}}(A)\lambda_{\mathrm{max}}(A)|}$ and $\alpha_{0}=1$ . The results indicate that the algorithms under consideration behave in a forward stable way, with relative errors mostly lying within a small factor of $u\kappa^{(p)}(A)$ .
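The displayed formula for $\kappa^{(p)}(A)$ is not reproduced here; the sketch below assumes the standard Kronecker form $\kappa^{(p)}(A)=\big\|\big(\sum_{j=0}^{p-1}(X^{T})^{p-1-j}\otimes X^{j}\big)^{-1}\big\|_{2}\,\|A\|_{F}/\|X\|_{F}$ with $X=A^{1/p}$, which follows from the Fréchet derivative $E\mapsto\sum_{j}X^{j}EX^{p-1-j}$ of $X\mapsto X^{p}$. It forms the $n^{2}\times n^{2}$ matrix explicitly, so it is only sensible for small $n$ such as the $10\times 10$ test matrices, and it computes $X$ by eigendecomposition as a stand-in for a robust $p$th root solver. Both helper names are hypothetical.

```python
import numpy as np

def principal_root_eig(A, p):
    """Principal p-th root via eigendecomposition (assumes A diagonalizable,
    with no eigenvalues on the closed negative real axis)."""
    lam, V = np.linalg.eig(A)
    return V @ np.diag(lam.astype(complex) ** (1 / p)) @ np.linalg.inv(V)

def kappa_pth_root(A, p):
    """Frobenius-norm relative condition number of X = A^{1/p} (Kronecker form)."""
    X = principal_root_eig(A, p)
    powers = [np.linalg.matrix_power(X, j) for j in range(p)]
    K = sum(np.kron(powers[p - 1 - j].T, powers[j]) for j in range(p))
    return (np.linalg.norm(np.linalg.inv(K), 2)
            * np.linalg.norm(A, "fro") / np.linalg.norm(X, "fro"))

print(kappa_pth_root(4.0 * np.eye(3), 3))          # scalar case
print(kappa_pth_root(np.diag([1.0, 8.0]), 3))
```

For $A=cI$ the formula returns $1/p$, the relative condition number of the scalar $p$th root.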

Table 2 records the number of iterations used by each iterative method on the 41 tests. In analogy with the results of [7], the rational minimax iterations very often converged more quickly than the Padé iterations on these tests.

6 Conclusion

This paper has constructed and analyzed a family of iterations for computing the matrix $p^{th}$ root using rational minimax approximants of the function $z^{1/p}$ . The output of each step $k$ of the type- $(m,\ell)$ iteration is a rational function $r$ of $A$ with the property that the scalar function $e(z)=(r(z)-z^{1/p})/z^{1/p}$ equioscillates $(m+\ell+1)^{k}+1$ times on $[\alpha^{p},1]$ , where $\alpha\in(0,1)$ is a parameter depending on $A$ . With the exception of the Zolotarev iterations (i.e. $p=2$ and $\ell\in\{m-1,m\}$ ), this equioscillatory behavior does not render $\max_{\alpha^{p}\leq z\leq 1}|e(z)|$ minimal among all choices of $r$ with the same numerator and denominator degree. Nevertheless, we have shown that many of the desirable features of the Zolotarev iterations carry over to the general setting. A key role in the analysis was played by the asymptotic behavior of rational minimax approximants on short intervals.

Several topics mentioned in this paper are worth pursuing in more detail. Remark 4.5 leads naturally to a family of rational minimax iterations for the matrix sector function $\mathrm{sect}_{p}(A)=A(A^{p})^{-1/p}$. As $\alpha\uparrow 1$, these iterations likely reduce to the Padé iterations for the sector function studied by Laszkiewicz and Ziętak [21, Section 5], so the results therein could inform an analysis of the convergence of the rational minimax iterations on matrices that are non-normal and/or have spectrum away from the positive real axis. Another topic of interest is computing the action of $A^{1/p}$ on a vector $b$ using rational minimax iterations. Li and Yang [22] address a similar task: computing the action of a spectral filter on $b$ using Zolotarev iterations for $\mathrm{sign}(z)$. It may be possible to construct a similar algorithm for computing $A^{1/p}b$. Finally, the functional iteration (11-12) is of interest in its own right, as it offers a method of rapidly generating rational approximants of $z^{1/p}$ with small relative error, a tool that may have applications in, for instance, numerical conformal mapping [11].

Acknowledgments

The author was supported in part by the NSF under grant DMS-1703719.

Bibliography


[1] N. I. Akhiezer, Theory of approximation, Frederick Ungar Publishing Corporation, 1956.
[2] B. Beckermann, Optimally scaled Newton iterations for the matrix square root, Advances in Matrix Functions and Matrix Equations workshop, Manchester, UK, 2013.
[3] D. A. Bini, N. J. Higham, and B. Meini, Algorithms for the matrix pth root, Numerical Algorithms, 39 (2005), pp. 349–378.
[4] R. Byers and H. Xu, A new scaling for Newton’s iteration for the polar decomposition and its backward stability, SIAM Journal on Matrix Analysis and Applications, 30 (2008), pp. 822–843.
[5] J. R. Cardoso and A. F. Loureiro, Iteration functions for pth roots of complex numbers, Numerical Algorithms, 57 (2011), pp. 329–356.
[6] T. A. Driscoll, N. Hale, and L. N. Trefethen, Chebfun guide, 2014.
[7] E. S. Gawlik, Zolotarev iterations for the matrix square root, arXiv preprint 1804.11000, (2018).
[8] E. S. Gawlik, Y. Nakatsukasa, and B. D. Sutton, A backward stable algorithm for computing the CS decomposition via the polar decomposition, SIAM Journal on Matrix Analysis and Applications, 39 (2018), pp. 1448–1469.
