On the conditioning of the matrix-matrix exponentiation

João R. Cardoso; Amir Sadeghi

arXiv:1703.08804·math.NA·March 28, 2017

TL;DR

This paper investigates the conditioning and derivatives of the matrix-matrix exponentiation function, providing new theoretical insights, algorithms for condition number computation, and numerical experiments to understand its stability and sensitivity.

Contribution

It introduces new results on the Fréchet derivative and conditioning of the matrix-matrix exponentiation, including algorithms for computing the condition number and applications to other matrix functions.

Findings

01

Derived new formulas for the Fréchet derivative of the matrix-matrix exponentiation.

02

Proposed an algorithm for computing the relative condition number of A^B.

03

Numerical experiments demonstrate the effectiveness of the proposed methods.

Abstract

If $A$ has no eigenvalues on the closed negative real axis, and $B$ is an arbitrary square complex matrix, the matrix-matrix exponentiation is defined as $A^{B} := e^{\log(A) B}$. This function arises, for instance, in von Neumann's quantum-mechanical entropy, which in turn finds applications in other areas of science and engineering. Since in general $A$ and $B$ do not commute, this bivariate matrix function may not be a primary matrix function as commonly defined, which raises many challenging issues. In this paper, we revisit this function and derive new related results. Particular emphasis is given to its Fréchet derivative and conditioning. We present a general result on the Fréchet derivative of bivariate matrix functions with applications not only to the matrix-matrix exponentiation but also to other functions, such as the second order Fréchet derivatives and some iteration…

Equations

A^{B} := e^{\log(A) B},

{}^{B}A := e^{B \log(A)}.

\lim_{(E,F)\to(0,0)} \frac{\| f(X+E,Y+F) - f(X,Y) - L_{f}(X,Y;E,F) \|}{\| (E,F) \|} = 0.

L_{f}(X,Y;E,F) = \lim_{h\to 0} \frac{f(X+hE,Y+hF) - f(X,Y)}{h}.

\| L_{f}(X,Y) \| := \max_{(E,F)\neq 0} \frac{\| L_{f}(X,Y;E,F) \|}{\| (E,F) \|}.

\kappa_{f}(X,Y) := \frac{\| L_{f}(X,Y) \| \, \| (X,Y) \|}{\| f(X,Y) \|}.

\Gamma(A) = \int_{0}^{\infty} e^{-t} t^{A-I} \, dt,

B(A,B) = \int_{0}^{1} t^{A-I} (1-t)^{B-I} \, dt.

A^{B} = \sum_{k=0}^{\infty} \frac{1}{k!} (\log(A) B)^{k},

A^{B} = \lim_{k\to\infty} \left(I + \frac{1}{k} \log(A) B\right)^{k}.

\frac{dX(t)}{dt} = (\log(A) B)\, X(t), \quad X(0) = I.

D=\mathop{\mathrm{diag}}(\lambda_{1},\lambda_{2})=\left[\begin{array}{cc}a-ib&0\\ 0&a+ib\end{array}\right],\quad V=\left[\begin{array}{rr}i&-i\\ 1&1\end{array}\right],

\log({A})=\left[\begin{array}[]{rr}i&-i\\ 1&1\\ \end{array}\right]\left[\begin{array}[]{cc}\log(\overline{z})&0\\ 0&\log(z)\\ \end{array}\right]\left[\begin{array}[]{rr}i&-i\\ 1&1\\ \end{array}\right]^{-1}=\left[\begin{array}[]{cc}\log(r)&-\theta\\ \theta&\log(r)\\ \end{array}\right].

\log({A}){B}=\left[\begin{array}[]{cc}\alpha\log(r)-\theta\gamma&\beta\log(r)-\theta\delta\\ \gamma\log(r)+\theta\alpha&\delta\log(r)+\theta\beta\\ \end{array}\right].

e^{{M}}=\frac{1}{\Omega}\left[\begin{array}[]{cc}e^{\frac{m_{11}+m_{22}}{2}}\left[\Omega\cosh(\frac{\Omega}{2})+(m_{11}-m_{22})\sinh(\frac{\Omega}{2})\right]&2m_{12}e^{\frac{m_{11}+m_{22}}{2}}\sinh(\frac{\Omega}{2})\\ 2m_{21}e^{\frac{m_{11}+m_{22}}{2}}\sinh(\frac{\Omega}{2})&e^{\frac{m_{11}+m_{22}}{2}}\left[\Omega\cosh(\frac{\Omega}{2})+(m_{22}-m_{11})\sinh(\frac{\Omega}{2})\right]\\ \end{array}\right].

m_{11} = \log(r^{\alpha}) - \theta\gamma, \quad m_{12} = \log(r^{\beta}) - \theta\delta,

m_{21} = \log(r^{\gamma}) + \theta\alpha, \quad m_{22} = \log(r^{\delta}) + \theta\beta.

\sigma(\log(A) B) \subset \{\log(\alpha_{i})\beta_{j} : i,j = 1,\dots,n\}.

{}^{B}A = B A^{B} B^{-1}.

\phi\left(\left[\begin{array}[]{cc}X&E\\ 0&X\end{array}\right]\right)=\left[\begin{array}[]{cc}\phi(X)&L_{\phi}(X,E)\\ 0&\phi(X)\end{array}\right],

f\left(\left[\begin{array}[]{cc}X&E\\ 0&X\end{array}\right],\,\left[\begin{array}[]{cc}Y&F\\ 0&Y\end{array}\right]\right)=\left[\begin{array}[]{cc}f(X,Y)&L_{f}(X,Y;E,F)\\ 0&f(X,Y)\end{array}\right].

f\left(\left[\begin{array}[]{cc}X&X^{\prime}(0)\\ 0&X\end{array}\right],\,\left[\begin{array}[]{cc}Y&Y^{\prime}(0)\\ 0&Y\end{array}\right]\right)=\left[\begin{array}[]{cc}f(X,Y)&\left.\frac{d}{dt}\right|_{t=0}f\left(X(t),Y(t)\right)\\ 0&f(X,Y)\end{array}\right].

U=\left[\begin{array}[]{cc}I&I/\epsilon\\ 0&I\end{array}\right],

\displaystyle f\left(\left[\begin{array}[]{cc}X(0)&\frac{X(\epsilon)-X(0)}{\epsilon}\\ 0&X(0)\end{array}\right],\,\left[\begin{array}[]{cc}Y(0)&\frac{Y(\epsilon)-Y(0)}{\epsilon}\\ 0&Y(0)\end{array}\right]\right)=

\displaystyle\quad\qquad=U\,f\left(U^{-1}\left[\begin{array}[]{cc}X(0)&\frac{X(\epsilon)-X(0)}{\epsilon}\\ 0&X(0)\end{array}\right]\,U,\,U^{-1}\left[\begin{array}[]{cc}Y(0)&\frac{Y(\epsilon)-Y(0)}{\epsilon}\\ 0&Y(0)\end{array}\right]\,U\right)U^{-1}

\displaystyle\quad\qquad=U\,f\left(\left[\begin{array}[]{cc}X(0)&0\\ 0&X(\epsilon)\end{array}\right],\,\left[\begin{array}[]{cc}Y(0)&0\\ 0&Y(\epsilon)\end{array}\right]\right)U^{-1}

\displaystyle\quad\qquad=U\,\left[\begin{array}[]{cc}e^{\log(X(0))\,Y(0)}&0\\ 0&e^{\log(X(\epsilon))\,Y(\epsilon)}\end{array}\right]\,U^{-1}

\displaystyle\quad\qquad=\left[\begin{array}[]{cc}f(X,Y)&\frac{f\left(X(\epsilon),Y(\epsilon)\right)-f(X,Y)}{\epsilon}\\ 0&f\left(X(\epsilon),Y(\epsilon)\right)\end{array}\right],

L_{f}(A,B;E,F)=L_{\exp}\big{(}\log(A)\,B;\log(A)F+L_{\log}(A;E)\,B\big{)},

\displaystyle f\left(\left[\begin{array}[]{cc}A&E\\ 0&A\end{array}\right],\,\left[\begin{array}[]{cc}B&F\\ 0&B\end{array}\right]\right)

L_{f}(t,B;\epsilon,F) = L_{\exp}\big(\log(t)B;\ \log(t)F + B\epsilon/t\big).

g(X,Y)=\left[\begin{array}[]{c}g_{1}(X,Y)\\ g_{2}(X,Y)\end{array}\right],

Full text

On the conditioning of the matrix-matrix exponentiation

João R. Cardoso$^{a}$, Amir Sadeghi$^{b}$ (corresponding author)

$^{a}$ Polytechnic Institute of Coimbra/ISEC, Coimbra, Portugal, and Institute of Systems and Robotics, University of Coimbra, Pólo II, Coimbra, Portugal

$^{b}$ Department of Mathematics, Robat Karim Branch, Islamic Azad University, Tehran, Iran

Abstract

If $A$ has no eigenvalues on the closed negative real axis, and $B$ is an arbitrary square complex matrix, the matrix-matrix exponentiation is defined as $A^{B}:=e^{\log(A)B}$. This function arises, for instance, in von Neumann’s quantum-mechanical entropy, which in turn finds applications in other areas of science and engineering. Since in general $A$ and $B$ do not commute, this bivariate matrix function may not be a primary matrix function as commonly defined, which raises many challenging issues. In this paper, we revisit this function and derive new related results. Particular emphasis is given to its Fréchet derivative and conditioning. We present a general result on the Fréchet derivative of bivariate matrix functions with applications not only to the matrix-matrix exponentiation but also to other functions, such as the second order Fréchet derivatives and some iteration functions arising in matrix iterative methods. The numerical computation of the Fréchet derivative is discussed and an algorithm for computing the relative condition number of $A^{B}$ is proposed. Some numerical experiments are included.

keywords: Matrix-matrix exponentiation, Conditioning, Fréchet Derivative, Matrix exponential, Matrix logarithm

1 Introduction

Let $A$ be an $n\times n$ square complex matrix with no eigenvalues on the closed negative real axis $\mathbb{R}_{0}^{-}$ and let $B$ be an arbitrary square complex matrix of order $n$ . The matrix-matrix exponentiation $A^{B}$ is defined as

$$A^{B}:=e^{\log(A)B}, \tag{1.1}$$

where $e^{X}$ stands for the exponential of the matrix $X$ and $\log(A)$ denotes the principal logarithm of $A$, i.e., the unique solution of the matrix equation $e^{X}=A$ whose eigenvalues lie in the open strip $\{z\in\mathbb{C}:-\pi<\mathop{\mathrm{Im}}z<\pi\}$ of the complex plane; $\mathop{\mathrm{Im}}z$ stands for the imaginary part of $z$.
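The definition can be tried out numerically. The following is a minimal sketch, assuming SciPy's `logm` (principal logarithm) and `expm`; the function name `matrix_matrix_power`, the tolerance `tol` and the test matrices are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.linalg import expm, logm

def matrix_matrix_power(A, B, tol=1e-12):
    """Return A^B := exp(log(A) B), with log the principal logarithm."""
    eigs = np.linalg.eigvals(A)
    # Reject eigenvalues on the closed negative real axis (zero included).
    # The tolerance `tol` is an illustrative choice, not from the paper.
    if np.any((np.abs(eigs.imag) <= tol) & (eigs.real <= tol)):
        raise ValueError("A has an eigenvalue on the closed negative real axis")
    return expm(logm(A) @ B)

A = np.array([[4.0, 1.0], [2.0, 3.0]])   # eigenvalues 2 and 5
B = np.array([[1.0, 2.0], [0.5, -1.0]])
P = matrix_matrix_power(A, B)
print(P)
```

With $B=I$ the function returns $A$, and with $B=-I$ it returns $A^{-1}$, in agreement with the particular cases mentioned in the text.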

For background on the matrix exponential, the matrix logarithm and general matrix functions see [15, 20] and the references therein. Note that although $A^{B}$ includes well-known matrix functions as particular cases (for instance, the matrix inverse and real powers of a matrix; see Lemma 2.1 below), it is not, in general, a primary matrix function as defined in those books. Indeed, there may not exist a scalar single-variable stem function associated with the matrix-matrix exponentiation. However, we can view $A^{B}$ as an extension of the two-variable function $x^{y}=e^{y\log x}$, but the lack of commutativity between $A$ and $B$ makes the extension of this function to matrices quite cumbersome. An interesting attempt to define the concept of bivariate matrix function as an operator is given in the monograph [24]. Although we refer to the matrix-matrix exponentiation as a bivariate matrix function, it does not belong to the class of bivariate matrix functions defined in [24]. Here, $A^{B}$ can be regarded as a function from $\mathbb{C}^{n\times n}\times\mathbb{C}^{n\times n}$ to $\mathbb{C}^{n\times n}$ which assigns to each pair of matrices $(A,B)$ the $n\times n$ square complex matrix $A^{B}$. Another way of defining the concept of matrix-matrix exponentiation would be

$${}^{B}A:=e^{B\log(A)}.$$

In Section 2, some relationships between $A^{B}$ and ${}^{B}A$ are pointed out (see, in particular, (iii) in Lemma 2.2). However, our attention will be mainly focused on $A^{B}$. Analogous results follow in a straightforward way for ${}^{B}A$. A componentwise definition of the matrix-matrix exponentiation is also possible, as used in [11] to deal with some problems in statistics. However, this latter definition is not considered in this work.

One of our goals is to investigate the sensitivity of the function $A^{B}$ to first-order perturbations in $A$ and $B$. A widely used tool for this purpose is the Fréchet derivative, which in turn allows the computation of the condition number of the function. In this work, we derive a general result on the Fréchet derivative of certain bivariate matrix functions (Theorem 3.1), which can easily be used to find an explicit formula for the Fréchet derivative of the matrix-matrix exponentiation in terms of the Fréchet derivatives of the matrix exponential and matrix logarithm. Formulae for the Fréchet derivatives of other bivariate matrix functions, such as iteration functions for the matrix square root (see [15, Sec. 6.4]) and for the matrix arithmetic-geometric mean (see [7]), can also be obtained by applying that result. The same holds for the second order Fréchet derivatives of primary matrix functions.

Given a map $f:\mathbb{C}^{n\times n}\times\mathbb{C}^{n\times n}\rightarrow\mathbb{C}^{n\times n}$ , the Fréchet derivative of $f$ at $(X,Y)$ , with $X,Y\in\mathbb{C}^{n\times n}$ , in the direction of $(E,F)$ , where $E,F\in\mathbb{C}^{n\times n}$ , is a linear operator $L_{f}(X,Y)$ that maps the “direction matrix” $(E,F)$ to $L_{f}(X,Y;E,F)$ such that

$$\lim_{(E,F)\to(0,0)}\frac{\|f(X+E,Y+F)-f(X,Y)-L_{f}(X,Y;E,F)\|}{\|(E,F)\|}=0.$$

The Fréchet derivative of $f$ may not exist at $(X,Y)$ , but if it does it is unique and coincides with the directional (or Gâteaux) derivative of $f$ at $(X,Y)$ in the direction $(E,F)$ . Hence, the existence of the Fréchet derivative guarantees that for any $E,F\in\mathbb{C}^{n\times n}$ ,

$$L_{f}(X,Y;E,F)=\lim_{h\to 0}\frac{f(X+hE,Y+hF)-f(X,Y)}{h}.$$

Any consistent matrix norm $\|.\|$ on $\mathbb{C}^{m\times n}$ induces the operator norm

$$\|L_{f}(X,Y)\|:=\max_{(E,F)\neq 0}\frac{\|L_{f}(X,Y;E,F)\|}{\|(E,F)\|}.$$

The (relative) condition number of $f$ at $(X,Y)$ is defined by

$$\kappa_{f}(X,Y):=\frac{\|L_{f}(X,Y)\|\,\|(X,Y)\|}{\|f(X,Y)\|}.$$

Hence, if an approximation to $L_{f}(X,Y;E,F)$ is known, then there exist numerical schemes to estimate $\|L_{f}(X,Y)\|$ (for instance, the power method on the Fréchet derivative proposed in [23]; see also [15, Alg. 3.20]) and then the condition number $\kappa_{f}(X,Y)$. As far as we know, we are the first to investigate the Fréchet derivative of the matrix-matrix exponentiation and its conditioning. In Section 4, we discuss the efficient computation of $L_{f}(A,B;E,F)$, where $f(A,B):=A^{B}$, and propose a power method for estimating the Frobenius norm of $L_{f}(A,B)$ and then the corresponding condition number $\kappa_{f}(A,B)$. In the numerical experiments carried out in Section 5 for several pairs of matrices $(A,B)$, two iterations of the power method suffice to estimate $\|L_{f}(A,B)\|_{F}$ (where $\|.\|_{F}$ stands for the Frobenius norm) with a relative error smaller than $10^{-3}$.

Here, one uses the same notation to denote both the matrix norm and the induced operator norm. For more information on the Fréchet derivative and its properties see, for instance, [5, Ch. X] and [15, Ch. 3]. Note also that the pair $(E,F)$ corresponds, using matrix terminology, to the block matrix $\left[\begin{array}[]{c}E\\ F\end{array}\right]$ . So the notation $\|(E,F)\|$ used above is clear.
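Since the Fréchet derivative coincides with the directional derivative whenever it exists, it can be approximated by a difference quotient. The sketch below uses a central difference for $f(A,B)=A^{B}$; the step sizes, the helper name `directional_derivative_fd` and the test matrices are illustrative assumptions, and this is only a sanity-check tool, not the method proposed in the paper.

```python
import numpy as np
from scipy.linalg import expm, logm

def f(A, B):
    """Matrix-matrix exponentiation A^B = exp(log(A) B)."""
    return expm(logm(A) @ B)

def directional_derivative_fd(A, B, E, F, h=1e-5):
    """Central-difference estimate of L_f(A, B; E, F)."""
    return (f(A + h * E, B + h * F) - f(A - h * E, B - h * F)) / (2 * h)

A = np.array([[4.0, 1.0], [2.0, 3.0]])   # eigenvalues 2 and 5
B = np.array([[1.0, 2.0], [0.5, -1.0]])
E = np.array([[0.3, -0.1], [0.2, 0.5]])
F = np.array([[-0.4, 0.1], [0.7, 0.2]])

L = directional_derivative_fd(A, B, E, F)
L_small = directional_derivative_fd(A, B, E, F, h=1e-6)   # smaller step
L_scaled = directional_derivative_fd(A, B, 2 * E, 2 * F)  # doubled direction
print(np.linalg.norm(L - L_small), np.linalg.norm(L_scaled - 2 * L))
```

Both printed residuals should be small: the quotient is stable as $h$ decreases, and it scales linearly with the direction $(E,F)$, as expected of a linear operator.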

To our knowledge, the term “matrix-matrix exponentiation” was first coined by Barradas and Cohen in [6], where this function arises in a problem concerning von Neumann’s quantum-mechanical entropy. Some properties of the matrix-matrix exponentiation are addressed in [6], for the particular case when $A$ is a normal matrix. We revisit some of those properties and derive new ones.

A particular case of the matrix-matrix exponentiation is the so-called “scalar-matrix exponentiation”. If $t$ is a complex number not belonging to $\mathbb{R}_{0}^{-}$, we can define $t^{A}$ as the function from $\mathbb{C}\times\mathbb{C}^{n\times n}$ to $\mathbb{C}^{n\times n}$ which assigns to each pair $(t,A)$ the $n\times n$ square complex matrix $t^{A}:=e^{\log(t)A}$. This function appears in the definitions of the matrix Gamma and Beta functions, which in turn can be applied to solving certain matrix differential equations [21, 22]. Gamma and Beta functions in matrix form are defined, respectively, as [22]:

$$\Gamma(A)=\int_{0}^{\infty}e^{-t}\,t^{A-I}\,dt, \tag{1.3}$$

$$B(A,B)=\int_{0}^{1}t^{A-I}(1-t)^{B-I}\,dt. \tag{1.4}$$

Our results apply easily to this particular case.

Notation: $\|.\|$ denotes a subordinate matrix norm and $\|.\|_{F}$ the Frobenius norm; $\mathop{\mathrm{Im}}(z)$ is the imaginary part of the complex number $z$ ; $\sigma(A)$ is the spectrum of the matrix $A$ ; $A^{\ast}$ is the conjugate transpose of $A$ , $L^{\star}_{f}(.)$ is the adjoint of the linear transformation $L_{f}(.)$ .

The organization of the paper is as follows. In Section 2 we revisit some facts about the matrix-matrix exponentiation and add some related results not previously stated in the literature. A formula for the Fréchet derivative of certain bivariate matrix functions is proposed in Section 3. This formula is in turn used to derive a formula for the Fréchet derivative of the matrix-matrix exponentiation. It is also explained how it can be applied to the Fréchet derivative of well-known bivariate functions. Section 4 is devoted to investigating the conditioning of the matrix-matrix exponentiation. In particular, an algorithm for estimating the relative condition number is proposed. Its performance is illustrated by numerical experiments in Section 5. A few conclusions are drawn in Section 6.

2 Basic results

In this section we present some theoretical results on the matrix-matrix exponentiation that can be derived from the properties of the much studied exponential and logarithm matrix functions.

According to the definition (1.1) and some well-known identities valid for the matrix exponential, we have

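$$A^{B}=\sum_{k=0}^{\infty}\frac{1}{k!}\,(\log(A)B)^{k},$$

and

$$A^{B}=\lim_{k\to\infty}\left(I+\frac{1}{k}\log(A)B\right)^{k}.$$

In addition, $A^{B}$ can be regarded as the value at $t=1$ of the solution of the matrix initial value problem

$$\frac{dX(t)}{dt}=(\log(A)B)\,X(t),\qquad X(0)=I.$$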
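These three characterizations are inherited from the matrix exponential of $M=\log(A)B$ and can be compared numerically. The sketch below is only illustrative: the truncation order, the value of $k$ and the ODE tolerances are arbitrary choices, and none of these is a recommended way of computing $A^{B}$.

```python
import numpy as np
from scipy.linalg import expm, logm
from scipy.integrate import solve_ivp

A = np.array([[4.0, 1.0], [2.0, 3.0]])   # eigenvalues 2 and 5
B = np.array([[1.0, 2.0], [0.5, -1.0]])
M = np.real_if_close(logm(A) @ B)        # real here, since A has positive eigenvalues
P = expm(M)                              # reference value of A^B

# Truncated power series: sum_{k<40} M^k / k!
S, term = np.zeros_like(M), np.eye(2)
for k in range(40):
    S = S + term
    term = term @ M / (k + 1)

# Limit formula (I + M/k)^k with k = 2^20 (computed by repeated squaring)
k = 2**20
Lk = np.linalg.matrix_power(np.eye(2) + M / k, k)

# Initial value problem X' = M X, X(0) = I, integrated up to t = 1
sol = solve_ivp(lambda t, x: (M @ x.reshape(2, 2)).ravel(),
                (0.0, 1.0), np.eye(2).ravel(), rtol=1e-10, atol=1e-12)
X1 = sol.y[:, -1].reshape(2, 2)

for name, approx in [("series", S), ("limit", Lk), ("ivp", X1)]:
    print(name, np.linalg.norm(approx - P) / np.linalg.norm(P))
```

The limit formula converges slowly (the error decays like $1/k$), which is visible in the printed relative errors.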

Lemma 2.1.

If $A\in\mathbb{C}^{n\times n}$ has no eigenvalues on $\mathbb{R}_{0}^{-}$, $B$ is any square complex matrix, and $f(A,B)=A^{B}$, then the following properties hold:

(i) $A^{0}=I$ and $I^{B}=I$;

(ii) $A^{\alpha I}=A^{\alpha}$, with $\alpha\in\mathbb{R}$. In particular, $A^{\frac{1}{2}I}=A^{\frac{1}{2}}$ and $A^{-I}=A^{-1}$;

(iii) If the eigenvalues of $\log(A)B$ satisfy $-\pi<\mathop{\mathrm{Im}}(\lambda)<\pi$, then $A^{(BC)}=(A^{B})^{C}$;

(iv) $A^{-B}A^{B}=A^{B}A^{-B}=I$, therefore $(A^{B})^{-1}=A^{-B}$;

(v) $(A^{B})^{\ast}={}^{B^{\ast}}(A^{\ast})$, where $X^{\ast}$ stands for the conjugate transpose of $X$;

(vi) If $S$ is an invertible matrix then $f(SAS^{-1},SBS^{-1})=S\,f(A,B)\,S^{-1}$.

Proof.

An immediate consequence of the properties of the matrix exponential and the matrix logarithm; see [15, 20]. ∎
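The properties of Lemma 2.1 are easy to spot-check numerically. The sketch below does so for one hand-picked pair $(A,B)$; the matrices $A$, $B$, $C$, $S$ and the helper names `f` and `f_left` are illustrative assumptions. A numerical check of this kind is of course no substitute for the proof.

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm, inv

def f(A, B):          # A^B = exp(log(A) B)
    return expm(logm(A) @ B)

def f_left(B, A):     # ^B A = exp(B log(A))
    return expm(B @ logm(A))

A = np.array([[4.0, 1.0], [2.0, 3.0]])                 # eigenvalues 2 and 5
B = np.array([[1.0 + 1.0j, 2.0], [0.5, -1.0 - 0.5j]])
C = np.array([[0.2, -0.3], [0.1, 0.4]])
S = np.array([[1.0, 2.0], [3.0, 5.0]])                 # invertible (det = -1)
I = np.eye(2)

# (iii) requires the eigenvalues of log(A) B to lie in the strip |Im| < pi.
strip_ok = bool(np.all(np.abs(np.linalg.eigvals(logm(A) @ B).imag) < np.pi))

checks = {
    "i": np.allclose(f(A, 0 * I), I) and np.allclose(f(I, B), I),
    "ii": np.allclose(f(A, 0.5 * I), sqrtm(A)) and np.allclose(f(A, -I), inv(A)),
    "iii": strip_ok and np.allclose(f(A, B @ C), f(f(A, B), C)),
    "iv": np.allclose(f(A, -B) @ f(A, B), I),
    "v": np.allclose(f(A, B).conj().T, f_left(B.conj().T, A.conj().T)),
    "vi": np.allclose(f(S @ A @ inv(S), S @ B @ inv(S)), S @ f(A, B) @ inv(S)),
}
print(checks)
```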

The following example shows that explicit formulae for the matrix-matrix exponentiation may involve complicated expressions, even in the $2\times 2$ case with $A$ normal.

Example 1.

Let ${A}=\bigl{[}\begin{smallmatrix}a&b\\ -b&a\\ \end{smallmatrix}\bigr{]}$ be a nonsingular normal matrix and ${B}=\bigl{[}\begin{smallmatrix}\alpha&\beta\\ \gamma&\delta\\ \end{smallmatrix}\bigr{]}$ be an arbitrary matrix. Our aim is to find a closed expression for $A^{B}$ . The eigenvalues and eigenvectors of ${A}$ are displayed in matrices $D$ and $V$ , respectively:

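$$D=\mathop{\mathrm{diag}}(\lambda_{1},\lambda_{2})=\left[\begin{array}{cc}a-ib&0\\ 0&a+ib\end{array}\right],\quad V=\left[\begin{array}{rr}i&-i\\ 1&1\end{array}\right],$$

where $a-ib=\overline{z}=re^{-i\theta}$ and $a+ib=z=re^{i\theta}$ for $-\pi<\theta\leq\pi$. In polar notation, we have $r^{2}=a^{2}+b^{2}$ and $\theta=\arctan(b/a)$ (provided that $a\neq 0$). Therefore, the logarithm of $A$ can be evaluated through the decomposition $\log(A)=V\log(D)V^{-1}$ as follows:

$$\log(A)=\left[\begin{array}{rr}i&-i\\ 1&1\end{array}\right]\left[\begin{array}{cc}\log(\overline{z})&0\\ 0&\log(z)\end{array}\right]\left[\begin{array}{rr}i&-i\\ 1&1\end{array}\right]^{-1}=\left[\begin{array}{cc}\log(r)&-\theta\\ \theta&\log(r)\end{array}\right].$$

Hence, multiplying the matrices $\log(A)$ and $B$,

$$\log(A)B=\left[\begin{array}{cc}\alpha\log(r)-\theta\gamma&\beta\log(r)-\theta\delta\\ \gamma\log(r)+\theta\alpha&\delta\log(r)+\theta\beta\end{array}\right].$$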

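It is known that the exponential of a $2\times 2$ matrix ${M}=\bigl{[}\begin{smallmatrix}m_{11}&m_{12}\\ m_{21}&m_{22}\\ \end{smallmatrix}\bigr{]}$ can be explicitly obtained by the following relation (see [28]):

$$e^{M}=\frac{1}{\Omega}\left[\begin{array}{cc}e^{\frac{m_{11}+m_{22}}{2}}\left[\Omega\cosh(\frac{\Omega}{2})+(m_{11}-m_{22})\sinh(\frac{\Omega}{2})\right]&2m_{12}e^{\frac{m_{11}+m_{22}}{2}}\sinh(\frac{\Omega}{2})\\ 2m_{21}e^{\frac{m_{11}+m_{22}}{2}}\sinh(\frac{\Omega}{2})&e^{\frac{m_{11}+m_{22}}{2}}\left[\Omega\cosh(\frac{\Omega}{2})+(m_{22}-m_{11})\sinh(\frac{\Omega}{2})\right]\end{array}\right],$$

where $\Omega=\sqrt{(m_{11}-m_{22})^{2}+4m_{12}m_{21}}$. Consequently, an explicit formula for $A^{B}$ can be obtained by substituting:

$$m_{11}=\log(r^{\alpha})-\theta\gamma,\quad m_{12}=\log(r^{\beta})-\theta\delta,$$

$$m_{21}=\log(r^{\gamma})+\theta\alpha,\quad m_{22}=\log(r^{\delta})+\theta\beta.$$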
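The closed form for the exponential of a $2\times 2$ matrix used in Example 1 can be checked against a general-purpose routine. The sketch below implements the formula with $\Omega=\sqrt{(m_{11}-m_{22})^{2}+4m_{12}m_{21}}$ and compares it with SciPy's `expm`; the function name `expm_2x2` and the two test matrices are illustrative, and the degenerate case $\Omega=0$ (where the formula must be read as a limit) is simply rejected.

```python
import numpy as np
from scipy.linalg import expm

def expm_2x2(M):
    """Closed-form exponential of a 2x2 matrix (requires Omega != 0)."""
    m11, m12, m21, m22 = M[0, 0], M[0, 1], M[1, 0], M[1, 1]
    # Omega may be real or purely imaginary, so work in complex arithmetic.
    omega = np.sqrt(complex((m11 - m22) ** 2 + 4 * m12 * m21))
    if omega == 0:
        raise ValueError("Omega = 0: the formula needs a limiting form here")
    scale = np.exp((m11 + m22) / 2)
    c, s = np.cosh(omega / 2), np.sinh(omega / 2)
    return (scale / omega) * np.array([
        [omega * c + (m11 - m22) * s, 2 * m12 * s],
        [2 * m21 * s, omega * c + (m22 - m11) * s],
    ])

M1 = np.array([[1.0, 2.0], [-0.5, 0.3]])   # Omega is purely imaginary
M2 = np.array([[0.4, 1.5], [0.7, -0.2]])   # Omega is real
for M in (M1, M2):
    print(np.linalg.norm(expm_2x2(M) - expm(M)))
```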

As mentioned before, some facts about the matrix-matrix exponentiation have been reported in [6], under the assumption of $A$ being normal. One of them is revisited in (i) of the next lemma. However, the relationships between $A^{B}$ and ${}^{B}A$ , and their spectra, stated in (ii) and (iii) of the following lemma are new.

Lemma 2.2.

If ${A}\in\mathbb{C}^{n\times n}$ has no eigenvalues on $\mathbb{R}_{0}^{-}$ , and ${B}$ is any square complex matrix, then the following properties hold:

(i) $A^{B}$ and ${}^{B}A$ have the same spectra;

(ii) If $A$ and $B$ commute, and have spectra $\sigma(A)=\{\alpha_{1},\ldots,\alpha_{n}\}$, $\sigma(B)=\{\beta_{1},\ldots,\beta_{n}\}$, then the spectrum of $A^{B}$ (or ${}^{B}A$) is given by $\{\alpha_{i_{1}}^{\beta_{j_{1}}},\ldots,\alpha_{i_{n}}^{\beta_{j_{n}}}\}$ for some permutations $\{i_{1},\ldots,i_{n}\}$ and $\{j_{1},\ldots,j_{n}\}$ of the set $\{1,\ldots,n\}$;

(iii) $B\,A^{B}={}^{B}A\,B.$

Proof.

(i) See [6, Thm. 3.2]. This follows immediately from the classical result of matrix theory stating that when $X$ and $Y$ are square matrices, the products $XY$ and $YX$ have the same spectra (see, for instance, Theorem 1.3.20 and Problem 9 in [19]).

(ii) Since $A$ and $B$ commute, $\log(A)$ and $B$ also commute. Hence, the result follows from the fact that

$$\sigma(\log(A)B)\subset\{\log(\alpha_{i})\beta_{j}:\ i,j=1,\ldots,n\}.$$

(iii) An immediate consequence of the identity $Ye^{XY}=e^{YX}Y$, which is valid for any square complex matrices $X$ and $Y$ of order $n$; see [15, Cor. 1.34].

∎

One important implication of statement (iii) of Lemma 2.2 is that when $B$ is nonsingular, ${}^{B}A$ can be computed easily from $A^{B}$:

$${}^{B}A=B\,A^{B}\,B^{-1}.$$
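The statements of Lemma 2.2 and the similarity relation above can be illustrated on a small example. In the sketch below the matrices are hand-picked assumptions; for item (ii) the commuting matrix is taken as a polynomial in $A$, namely $B_c=\tfrac{1}{2}A-I$, so that the eigenvalue pairing is known in advance.

```python
import numpy as np
from scipy.linalg import expm, logm, inv

A = np.array([[4.0, 1.0], [2.0, 3.0]])   # eigenvalues 2 and 5
B = np.array([[1.0, 2.0], [0.5, -1.0]])  # nonsingular (det = -2)
logA = logm(A)

P = expm(logA @ B)    # A^B
Q = expm(B @ logA)    # ^B A

# (i) same spectrum
spec_P = np.sort_complex(np.linalg.eigvals(P))
spec_Q = np.sort_complex(np.linalg.eigvals(Q))

# (ii) commuting case: B_c = A/2 - I has eigenvalues 0 and 1.5,
# paired with the eigenvalues 2 and 5 of A, so sigma(A^B_c) = {2^0, 5^1.5}.
Bc = 0.5 * A - np.eye(2)
spec_comm = np.sort_complex(np.linalg.eigvals(expm(logA @ Bc)))

print(spec_P, spec_Q, spec_comm)
print(np.linalg.norm(B @ P - Q @ B))          # (iii)
print(np.linalg.norm(Q - B @ P @ inv(B)))     # similarity relation
```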

A natural way of computing the matrix-matrix exponentiation $A^{B}$ is to first evaluate $\log(A)$ and then the exponential of $\log(A)B$. The matrix exponential and logarithm are much-studied functions and one can find many methods for computing them in the literature. The most popular method for the matrix exponential is the so-called scaling and squaring method combined with Padé approximation, which has been investigated and improved by many authors; see for instance [15, Ch. 10] and the references therein, and also the more recent paper [1], which includes the algorithm on which the expm function of recent versions of MATLAB is based. The MATLAB function logm implements the algorithm provided in [2, 3], which is an improved version of the inverse scaling and squaring method with Padé approximants proposed in [23]. Other methods for approximating these functions include, for instance, the Taylor polynomial based methods for the matrix exponential proposed in [29] and the iterative transformation-free method of [7] for the matrix logarithm.

A topic that needs further research is the development of algorithms for the matrix-matrix exponentiation that are less expensive than the computation of one matrix exponential and one matrix logarithm plus a matrix product. This seems to be a very challenging issue, especially when $B$ does not commute with $A$ . Of course, for some particular cases of the matrix-matrix exponentiation (e.g., the matrix square root, matrix $p$ -th roots, the matrix inverse) there are more efficient methods that do not involve the computation of matrix exponentials and logarithms. This problem becomes easier even in the more general case when $A$ and $B$ commute. This is because both matrices may share the same Schur decomposition which reduces considerably the computational effort.

3 The Fréchet derivative of bivariate matrix functions

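A key result, very useful from both theoretical and computational perspectives, related to the Fréchet derivative of a primary matrix function $\phi:\mathbb{C}^{n\times n}\rightarrow\mathbb{C}^{n\times n}$, states that

$$\phi\left(\left[\begin{array}{cc}X&E\\ 0&X\end{array}\right]\right)=\left[\begin{array}{cc}\phi(X)&L_{\phi}(X,E)\\ 0&\phi(X)\end{array}\right], \tag{3.1}$$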

where $\phi$ is a scalar complex function $2n-1$ times continuously differentiable on an open subset containing the spectrum of $X$ , the matrix $E\in\mathbb{C}^{n\times n}$ is arbitrary and $L_{\phi}(X,E)$ denotes the Fréchet derivative of $\phi$ at $X$ in the direction of $E$ (see [25, Thm. 2.1] and [15, Eq. (3.16)]).
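For $\phi=\exp$ the block identity can be verified directly, since SciPy provides an independent routine for the Fréchet derivative of the matrix exponential (`scipy.linalg.expm_frechet`). The sketch below is an illustration with arbitrary test matrices; it reads the $(1,2)$ block of the exponential of the $2n\times 2n$ block matrix and compares it with that routine.

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

X = np.array([[0.5, 1.0], [-0.3, 0.2]])
E = np.array([[0.3, -0.1], [0.2, 0.5]])
n = X.shape[0]

# Exponential of the block upper triangular matrix [[X, E], [0, X]].
big = expm(np.block([[X, E], [np.zeros((n, n)), X]]))

L_block = big[:n, n:]                              # (1,2) block
L_ref = expm_frechet(X, E, compute_expm=False)     # reference derivative

print(np.linalg.norm(L_block - L_ref))
print(np.linalg.norm(big[:n, :n] - expm(X)))       # diagonal blocks are exp(X)
```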

The next theorem extends the identity (3.1) to certain bivariate matrix functions.

Theorem 3.1.

Let $X=\left[x_{ij}\right]_{i,j},\,Y=\left[y_{ij}\right]_{i,j},\,E,F\in\mathbb{C}^{n\times n}$ and assume that $f(X,Y)\in\mathbb{C}^{n\times n}$ is a bivariate matrix function with partial derivatives $\frac{\partial f}{\partial x_{ij}}$ and $\frac{\partial f}{\partial y_{ij}}$ being continuous functions on an open subset ${\mathcal{S}}\subset\mathbb{C}^{n\times n}\times\mathbb{C}^{n\times n}$ containing $(X,Y)$. If the curves $X(t):=X+tE$ and $Y(t):=Y+tF$ are differentiable at $t=0$, with $\left(X(t),Y(t)\right)\in{\mathcal{S}}$ for all $t$ in a certain neighborhood of $0$, and $f$ maps $2\times 2$ block upper triangular matrices to $2\times 2$ block upper triangular matrices, then

$$f\left(\left[\begin{array}{cc}X&E\\ 0&X\end{array}\right],\,\left[\begin{array}{cc}Y&F\\ 0&Y\end{array}\right]\right)=\left[\begin{array}{cc}f(X,Y)&L_{f}(X,Y;E,F)\\ 0&f(X,Y)\end{array}\right]. \tag{3.2}$$

Proof.

Since the $2n^{2}$ partial derivatives $\frac{\partial f}{\partial x_{ij}}$ and $\frac{\partial f}{\partial y_{ij}}$ exist and are continuous on the open subset ${\mathcal{S}}$ , the Fréchet derivative of $f$ on ${\mathcal{S}}$ exists (see [8, Sec. 3.1]) and thus coincides with the Gâteaux derivative.

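Assuming that $X(t)$ and $Y(t)$ are differentiable at $t=0$ and $\left(X(t),Y(t)\right)\in{\mathcal{S}}$ for all $t$ in a certain neighborhood of $0$, we shall prove below that an identity analogous to the one in [25, Eq. (1.1)] (see also [15, Thm. 3.6]) holds for our function $f$, that is,

$$f\left(\left[\begin{array}{cc}X&X^{\prime}(0)\\ 0&X\end{array}\right],\,\left[\begin{array}{cc}Y&Y^{\prime}(0)\\ 0&Y\end{array}\right]\right)=\left[\begin{array}{cc}f(X,Y)&\left.\frac{d}{dt}\right|_{t=0}f\left(X(t),Y(t)\right)\\ 0&f(X,Y)\end{array}\right].$$

Indeed, denoting

$$U=\left[\begin{array}{cc}I&I/\epsilon\\ 0&I\end{array}\right],$$

with $\epsilon\neq 0$, we have

$$\begin{aligned}
&f\left(\left[\begin{array}{cc}X(0)&\frac{X(\epsilon)-X(0)}{\epsilon}\\ 0&X(0)\end{array}\right],\,\left[\begin{array}{cc}Y(0)&\frac{Y(\epsilon)-Y(0)}{\epsilon}\\ 0&Y(0)\end{array}\right]\right)\\
&\qquad=U\,f\left(U^{-1}\left[\begin{array}{cc}X(0)&\frac{X(\epsilon)-X(0)}{\epsilon}\\ 0&X(0)\end{array}\right]U,\,U^{-1}\left[\begin{array}{cc}Y(0)&\frac{Y(\epsilon)-Y(0)}{\epsilon}\\ 0&Y(0)\end{array}\right]U\right)U^{-1}\\
&\qquad=U\,f\left(\left[\begin{array}{cc}X(0)&0\\ 0&X(\epsilon)\end{array}\right],\,\left[\begin{array}{cc}Y(0)&0\\ 0&Y(\epsilon)\end{array}\right]\right)U^{-1}\\
&\qquad=U\left[\begin{array}{cc}e^{\log(X(0))\,Y(0)}&0\\ 0&e^{\log(X(\epsilon))\,Y(\epsilon)}\end{array}\right]U^{-1}\\
&\qquad=\left[\begin{array}{cc}f(X,Y)&\frac{f\left(X(\epsilon),Y(\epsilon)\right)-f(X,Y)}{\epsilon}\\ 0&f\left(X(\epsilon),Y(\epsilon)\right)\end{array}\right],
\end{aligned}$$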

from which the result follows by evaluating the limit of the above matrices when $\epsilon\rightarrow 0$ . ∎

An explicit formula for the Fréchet derivative of the matrix-matrix exponentiation, in terms of the Fréchet derivatives of the matrix exponential and matrix logarithm, is given in the next corollary.

Corollary 3.1.

Let ${\mathcal{S}}$ be the open subset formed by all pairs $(A,B)$ with $A$ having no eigenvalues on $\mathbb{R}_{0}^{-}$ and $B$ arbitrary. Denoting $f(A,B):=A^{B}$, we have

$$L_{f}(A,B;E,F)=L_{\exp}\big(\log(A)\,B;\ \log(A)F+L_{\log}(A;E)\,B\big), \tag{3.9}$$

where $L_{\exp}$ and $L_{\log}$ stand for the Fréchet derivatives of the matrix exponential and matrix logarithm, respectively.

Proof.

It is easy to check that the conditions of Theorem 3.1 are met. Then the result follows immediately from the identities

[TABLE]

∎
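Formula (3.9) of Corollary 3.1 can be cross-checked numerically against the block identity of Theorem 3.1 and against a finite difference. The sketch below makes two assumptions that are not part of the paper: $L_{\log}$ is obtained from the block identity (3.1) applied to SciPy's `logm` (SciPy has no dedicated routine for it), and $L_{\exp}$ is computed with `scipy.linalg.expm_frechet`. All function names and test matrices are illustrative.

```python
import numpy as np
from scipy.linalg import expm, logm, expm_frechet

def block_upper(X, E):
    """Return the 2n x 2n matrix [[X, E], [0, X]]."""
    n = X.shape[0]
    return np.block([[X, E], [np.zeros((n, n)), X]])

def L_log(A, E):
    """Frechet derivative of log at A in direction E, via identity (3.1)."""
    n = A.shape[0]
    return logm(block_upper(A, E))[:n, n:]

def L_f(A, B, E, F):
    """Formula (3.9): L_exp(log(A) B; log(A) F + L_log(A; E) B)."""
    logA = logm(A)
    return expm_frechet(logA @ B, logA @ F + L_log(A, E) @ B, compute_expm=False)

def L_f_block(A, B, E, F):
    """(1,2) block of f applied to the block matrices, as in Theorem 3.1."""
    n = A.shape[0]
    return expm(logm(block_upper(A, E)) @ block_upper(B, F))[:n, n:]

A = np.array([[4.0, 1.0], [2.0, 3.0]])   # eigenvalues 2 and 5
B = np.array([[1.0, 2.0], [0.5, -1.0]])
E = np.array([[0.3, -0.1], [0.2, 0.5]])
F = np.array([[-0.4, 0.1], [0.7, 0.2]])

f = lambda A, B: expm(logm(A) @ B)
h = 1e-5
L_fd = (f(A + h * E, B + h * F) - f(A - h * E, B - h * F)) / (2 * h)

L1, L2 = L_f(A, B, E, F), L_f_block(A, B, E, F)
print(np.linalg.norm(L1 - L2), np.linalg.norm(L1 - L_fd))
```

The block evaluation works with matrices of order $2n$, which is why the text later describes formulae of this kind as more expensive than (3.9).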

If $A=tI$ , with $t$ not belonging to the closed negative real axis, then $(tI)^{B}=t^{B}$ is the scalar-matrix exponentiation which arises in matrix Beta and Gamma functions defined in (1.4) and (1.3), respectively. The Fréchet derivative in this special case can be written as

$$L_{f}(t,B;\epsilon,F)=L_{\exp}\big(\log(t)B;\ \log(t)F+B\epsilon/t\big).$$

For $B=\alpha\,I$ , with $\alpha\in\mathbb{R}$ , the matrix-matrix exponentiation $f$ reduces to the single variable matrix function $f(A)=A^{\alpha}$ of real powers of $A$ , which has been addressed recently in [16, 17]. Provided that $\alpha$ is not affected by any kind of perturbation (that is, $F=0$ ), (3.9) reduces to formula (2.4) in [16], that has been obtained using other techniques. Note that while (3.9) covers the case when $\alpha$ is perturbed, formula (2.4) in [16] does not.

In addition to the result of the previous corollary, the identity (3.2) provides alternative means for obtaining closed expressions for the Fréchet derivatives of other known bivariate matrix functions. For instance, many iteration functions for approximating the matrix square root ([14], [15, Ch. 6]) or, more generally, for the matrix $p$ -th root [13, 18] are of the form

$$g(X,Y)=\left[\begin{array}{c}g_{1}(X,Y)\\ g_{2}(X,Y)\end{array}\right],$$

where $g_{1}$ and $g_{2}$ satisfy some smoothness requirements. Since

[TABLE]

a closed expression for $L_{g_{i}}(X,Y;E,F)$ ( $i=1,2$ ) follows from (3.2). The same relationship applies to the matrix arithmetic-geometric mean iteration [7, 30] and to find expressions for the second order Fréchet derivatives [5, Ch. X] of primary matrix functions.

Closed formulae for the Fréchet derivatives of the matrix exponential and the matrix logarithm are available in the literature. One of the best known for the matrix exponential is the integral formula

[TABLE]

(see [31] and [15, Ch. 10]). Another formula, involving the vectorization of the Fréchet derivative, is

[TABLE]

where $\mathop{\mathrm{vec}}(.)$ stands for the operator that stacks the columns of $E$ into a long vector of size $n^{2}\times 1$ , and

[TABLE]

with $\psi(x)=(e^{x}-1)/x$ . The symbols $\otimes$ and $\oplus$ denote the Kronecker product and the Kronecker sum, respectively. Other representations for $K_{\exp}(A)$ are available in [15, Eq. (10.3)]; see also [27, 23].
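The vectorized form can be explored numerically without an explicit expression for $K_{\exp}(A)$. The sketch below builds the Kronecker matrix column by column from Fréchet derivatives in the unit directions, and compares it with a Gauss-Legendre quadrature of the standard integral representation $L_{\exp}(A;E)=\int_{0}^{1}e^{A(1-t)}Ee^{At}\,dt$ from the matrix-function literature (see [15, Ch. 10]); that this is exactly the integral formula displayed in the paper is an assumption, as are the function names, the number of nodes and the test matrices.

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

def vec(X):
    """Stack the columns of X into one long vector."""
    return X.flatten(order="F")

def K_from_directions(A):
    """Column j of K is vec(L_exp(A; E_j)), where vec(E_j) is the j-th unit vector."""
    n = A.shape[0]
    K = np.zeros((n * n, n * n))
    for j in range(n * n):
        e = np.zeros(n * n)
        e[j] = 1.0
        Ej = e.reshape((n, n), order="F")
        K[:, j] = vec(expm_frechet(A, Ej, compute_expm=False))
    return K

def K_from_integral(A, nodes=20):
    """Quadrature of int_0^1 exp(A^T t) kron exp(A (1 - t)) dt."""
    x, w = np.polynomial.legendre.leggauss(nodes)
    t, w = (x + 1) / 2, w / 2            # map nodes and weights to [0, 1]
    return sum(wi * np.kron(expm(A.T * ti), expm(A * (1 - ti)))
               for ti, wi in zip(t, w))

A = np.array([[0.5, 1.0], [-0.3, 0.2]])
E = np.array([[0.3, -0.1], [0.2, 0.5]])

K_dir, K_int = K_from_directions(A), K_from_integral(A)
L = expm_frechet(A, E, compute_expm=False)
print(np.linalg.norm(K_dir - K_int), np.linalg.norm(K_dir @ vec(E) - vec(L)))
```

The Kronecker term in the quadrature comes from the identity $\mathop{\mathrm{vec}}(AXB)=(B^{T}\otimes A)\mathop{\mathrm{vec}}(X)$.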

An integral representation of the Fréchet derivative of the matrix logarithm is

[TABLE]

(see [9] and [15, Ch. 11]). Vectorizing (3.15) yields

[TABLE]

where

[TABLE]
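For the logarithm, a similar check is possible. The sketch below assumes the standard integral representation $L_{\log}(A;E)=\int_{0}^{1}\big(t(A-I)+I\big)^{-1}E\,\big(t(A-I)+I\big)^{-1}dt$ (see [15, Ch. 11]); whether this coincides term by term with the formula displayed in the paper is an assumption. It is compared with the value obtained from the block identity (3.1) applied to SciPy's `logm`. The function names, the number of quadrature nodes and the test matrices are illustrative.

```python
import numpy as np
from scipy.linalg import logm

def L_log_block(A, E):
    """L_log(A; E) as the (1,2) block of log([[A, E], [0, A]])."""
    n = A.shape[0]
    return logm(np.block([[A, E], [np.zeros((n, n)), A]]))[:n, n:]

def L_log_integral(A, E, nodes=40):
    """Gauss-Legendre quadrature of the integral representation of L_log."""
    n = A.shape[0]
    I = np.eye(n)
    x, w = np.polynomial.legendre.leggauss(nodes)
    t, w = (x + 1) / 2, w / 2            # map nodes and weights to [0, 1]
    total = np.zeros((n, n))
    for ti, wi in zip(t, w):
        R = np.linalg.inv(ti * (A - I) + I)
        total += wi * (R @ E @ R)
    return total

A = np.array([[4.0, 1.0], [2.0, 3.0]])   # eigenvalues 2 and 5
E = np.array([[0.3, -0.1], [0.2, 0.5]])

Lb, Li = L_log_block(A, E), L_log_integral(A, E)
print(np.linalg.norm(Lb - Li))
```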

Gathering the formulae above, a vectorization of the Fréchet derivative of the matrix-matrix exponentiation $f(A,B)=A^{B}$ can be given by

[TABLE]

where

[TABLE]

Fréchet derivatives allow us to understand how the function $f(A,B)=A^{B}$ behaves when both $A$ and $B$ are subject to small perturbations. Suppose now that $A$ is not perturbed but $B$ is. Then only $B$ is regarded as a variable, and perturbation results similar to those for the matrix exponential are valid, as shown below in Theorem 3.2.

Theorem 3.2.

Assume that $A$ has no eigenvalue on $\mathbb{R}_{0}^{-}$ . For any ${B}_{1},{B}_{2}\in\mathbb{C}^{n\times n}$ , the following relation holds:

[TABLE]

Proof.

From the theory of the matrix exponential, it is straightforward that

[TABLE]

(see [4]). Let us consider ${B}={B}_{1}$ , ${B}_{2}={B_{1}+E}$ and $t=1$ in (3.20). Hence, we have

[TABLE]

Therefore, taking norms, one has

[TABLE]

∎

4 Conditioning of the matrix-matrix exponentiation

From now on, we will consider the Frobenius norm only. However, with appropriate modifications, some results can be adapted to other norms. The key factor for evaluating the condition number $\kappa_{f}(A,B)$ is the norm of the operator $L_{f}(A,B)$. In this section, we first present an upper bound for this norm and then a power method for its estimation. The notation $f(A,B)=A^{B}$ is used again.

Theorem 4.1.

Assume that the conditions of Corollary 3.1 are valid. With respect to the Frobenius norm, the following inequality holds:

[TABLE]

Proof.

For $M:=\log(A)F+L_{\log}(A;E)\,B$ , we have

[TABLE]

By (3.9),

[TABLE]

Hence

[TABLE]

∎

If $\|A-I\|_{F}<1$ , one can find an upper bound for the factor $\|\log(A)\|_{F}$ in the right hand side of (4.1) as follows:

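$\|\log(A)\|_{F}=\left\|\sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{k}(A-I)^{k}\right\|_{F}\leq\sum_{k=1}^{\infty}\frac{\|A-I\|_{F}^{k}}{k}=-\log\left(1-\|A-I\|_{F}\right).$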

In the general case, it is hard to bound $\|\log(A)\|_{F}$ , which can be arbitrarily large. However, since the logarithm grows very slowly, the values attained by $\|\log(A)\|_{F}$ are small in practice. For instance, its largest value for the ten matrices considered in the numerical experiments in Section 5 is about $50$ (see the bottom-left plot in Figure 1).
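For the case $\|A-I\|_{F}<1$, one standard bound of this kind follows from the Mercator series of $\log(I+X)$, namely $\|\log(A)\|_{F}\leq-\log(1-\|A-I\|_{F})$. The short sketch below checks it on a random matrix scaled so that $\|A-I\|_{F}=0.5$; the matrix and its size are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(2)
n = 6
P = rng.standard_normal((n, n))
P *= 0.5 / np.linalg.norm(P, "fro")   # enforce ||A - I||_F = 0.5 < 1
A = np.eye(n) + P

log_norm = np.linalg.norm(logm(A), "fro")
series_bound = -np.log(1.0 - np.linalg.norm(A - np.eye(n), "fro"))
print(log_norm, series_bound)
```

The bound degrades quickly as $\|A-I\|_{F}$ approaches $1$, which is consistent with the remark that no simple bound is available in general.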

For better estimates of $\left\|L_{f}(A,B)\right\|_{F}$ , we propose below a particular power method for the matrix-matrix exponentiation using the framework of [15, Alg. 3.20]. Before stating the detailed steps of the method, we address two important issues raised by its implementation. The first one is the computation of the Fréchet derivative $L_{f}(A,B;E,F)$ and the second one is how to find the adjoint operator $L^{\star}$ with respect to the Euclidean inner product $\langle X,Y\rangle=\mathop{\mathrm{trace}}(Y^{\ast}X)$ . We recall that the matrix of our linear operator $L_{f}(A,B)$ is not square, which means that the adjoint is not as simple to obtain as in the square case, where one just needs to take the conjugate transpose of the argument (see the top of p. 66 in [15]).

Regarding the first issue, and in view of the developments in Section 3, we will use formula (3.9). For the computation of $L_{\exp}$ we consider [1, Alg. 6.4], and for the computation of $\log$ and $L_{\log}$ we use [3, Alg. 5.1] (without the computation of $L_{\log}^{\star}$ ). We can, alternatively, use

[TABLE]

(this should be read as: the Fréchet derivative is the block $(1,2)$ of the resulting matrix in the right-hand side; see (3.3)), but this formula is more expensive than (3.9), even if we exploit the particular structure of the two block matrices in the right-hand side of (4.3). More disadvantages of formulae like (4.3) are mentioned in [2, 3].
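The two routes can be compared numerically. The sketch below assumes that the block formula amounts to evaluating the matrix-matrix exponentiation at the pair of $2n\times 2n$ block upper triangular matrices $\left[\begin{smallmatrix}A&E\\0&A\end{smallmatrix}\right]$ and $\left[\begin{smallmatrix}B&F\\0&B\end{smallmatrix}\right]$ and reading off the $(1,2)$ block; this reading is an assumption, since the display itself is not reproduced here. The test matrices are arbitrary.

```python
import numpy as np
from scipy.linalg import expm, logm, expm_frechet

rng = np.random.default_rng(3)
n = 4
G = rng.standard_normal((n, n))
A = np.eye(n) + 0.1 * G @ G.T      # no eigenvalues on the negative real axis
B, E, F = (rng.standard_normal((n, n)) for _ in range(3))
O = np.zeros((n, n))

# Route 1: formula (3.9), L_exp(log(A)B; log(A)F + L_log(A;E)B).
logA = logm(A)
L_log_E = logm(np.block([[A, E], [O, A]]))[:n, n:]
via_39 = expm_frechet(logA @ B, logA @ F + L_log_E @ B, compute_expm=False)

# Route 2: one evaluation at 2n-by-2n block matrices, then the (1,2) block.
big = expm(logm(np.block([[A, E], [O, A]])) @ np.block([[B, F], [O, B]]))
via_block = big[:n, n:]

rel_gap = np.linalg.norm(via_39 - via_block) / np.linalg.norm(via_39)
diag_gap = np.linalg.norm(big[:n, :n] - expm(logA @ B))
print(rel_gap, diag_gap)
```

Route 2 works with matrices of twice the order, so each logarithm and exponential costs roughly eight times as much as its $n\times n$ counterpart, which illustrates why (3.9) is preferred.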

Now we focus on finding a closed expression for the adjoint operator $L^{\star}_{f}(A,B)$ , where $f(A,B)=A^{B}$ . According to the theory of adjoint operators (see, for instance, [10]), one needs to look for the unique operator

$\begin{array}[]{rccl}L_{f}^{\star}(A,B):&\mathbb{C}^{n\times n}&\longrightarrow&\mathbb{C}^{n\times n}\times\mathbb{C}^{n\times n}\\ &W&\longmapsto&L_{f}^{\star}(A,B;W),\\ \end{array}$

such that

[TABLE]

where $K^{\ast}_{f}(A,B)$ is the conjugate transpose of (3.18). Since

[TABLE]

and, for $Z:=L_{\exp}\left((\log(A)B)^{\ast};W\right)B^{\ast},$ it holds

[TABLE]

one has

[TABLE]

and, consequently,

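$L_{f}^{\star}(A,B;W)=\begin{bmatrix}L_{\log}\left(A^{\ast};Z\right)\\ \log(A)^{\ast}\,L_{\exp}\left((\log(A)B)^{\ast};W\right)\end{bmatrix},\qquad Z=L_{\exp}\left((\log(A)B)^{\ast};W\right)B^{\ast}.\qquad(4.5)$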
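The defining relation of the adjoint, $\langle L_{f}(A,B;E,F),W\rangle=\langle (E,F),L_{f}^{\star}(A,B;W)\rangle$, can be tested numerically. The sketch below assumes the adjoint pair is $\left(L_{\log}(A^{\ast};Z),\ \log(A)^{\ast}Y\right)$ with $Y=L_{\exp}((\log(A)B)^{\ast};W)$ and $Z=YB^{\ast}$, which is what the identities $L_{\exp}^{\star}(X)=L_{\exp}(X^{\ast})$ and $L_{\log}^{\star}(A)=L_{\log}(A^{\ast})$ give when applied to (3.9). Function names and test data are illustrative.

```python
import numpy as np
from scipy.linalg import logm, expm_frechet

def frechet_log(A, E):
    """L_log(A; E) as the (1,2) block of log([[A, E], [0, A]])."""
    n = A.shape[0]
    big = np.block([[A, E], [np.zeros((n, n)), A]])
    return logm(big)[:n, n:]

def L_f(A, B, E, F):
    """Formula (3.9): L_exp(log(A)B; log(A)F + L_log(A;E)B)."""
    logA = logm(A)
    M = logA @ F + frechet_log(A, E) @ B
    return expm_frechet(logA @ B, M, compute_expm=False)

def L_f_adjoint(A, B, W):
    """Assumed adjoint pair (E*, F*) for the trace inner product."""
    logA = logm(A)
    Y = expm_frechet((logA @ B).conj().T, W, compute_expm=False)
    Z = Y @ B.conj().T
    return frechet_log(A.conj().T, Z), logA.conj().T @ Y

rng = np.random.default_rng(5)
n = 4
P = rng.standard_normal((n, n))
A = np.eye(n) + 0.5 * P / np.linalg.norm(P)   # nonsymmetric, eigenvalues near 1
B, E, F, W = (rng.standard_normal((n, n)) for _ in range(4))

# <X, Y> = trace(Y^* X); np.vdot conjugates its first argument.
lhs = np.vdot(W, L_f(A, B, E, F))
E_star, F_star = L_f_adjoint(A, B, W)
rhs = np.vdot(E_star, E) + np.vdot(F_star, F)
gap = abs(lhs - rhs) / max(1.0, abs(lhs))
print(gap)
```

A nonsymmetric $A$ is used on purpose, so that a missing conjugate transpose on $A$ or on $\log(A)B$ would show up as a large gap.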

We are now ready to propose an algorithm to estimate the condition number $\kappa_{f}$ of the matrix-matrix exponentiation with respect to the Frobenius norm.

Algorithm 4.1.

Given $A,B\in\mathbb{C}^{n\times n}$ , with $A$ having no eigenvalues on the closed negative real axis, this algorithm estimates the condition number $\kappa_{f}(A,B)$ defined in (1.2), where $f(A,B)=A^{B}$ , with respect to the Frobenius norm.

Choose nonzero starting matrices $E_{0},F_{0}\in\mathbb{C}^{n\times n}$ and a tolerance tol;
Set $\gamma_{0}=0$ , $\gamma_{1}=1$ and $k=0$ ;
while $\left|\gamma_{k+1}-\gamma_{k}\right|>\mathtt{tol}\,\gamma_{k+1}$

$W_{k+1}=L_{f}(A,B;E_{k},F_{k})$ , with $L_{f}$ given by (3.9);
$Z_{k+1}=L^{\star}_{f}\left(A,B;W_{k+1}\right)$ , with $L_{f}^{\star}$ given by (4.5);
$\gamma_{k+1}=\left\|Z_{k+1}\right\|_{F}/\left\|W_{k+1}\right\|_{F}$ ;
$E_{k+1}=Z_{k+1}(1:n,1:n)$ ; $F_{k+1}=Z_{k+1}(n+1:2n,1:n)$ ;
$k=k+1$ ;

end
$\|L_{f}(A,B)\|_{F}\approx\gamma_{k+1}$ ;
$\kappa_{f}(A,B)=\|(A,B)\|_{F}\,\|L_{f}(A,B)\|_{F}/\left\|A^{B}\right\|_{F}$ .

Cost: $(5\alpha_{1}+\alpha_{2}+2\alpha_{3}+2\alpha_{4})k$ , where $\alpha_{1}$ is the cost of computing one matrix-matrix product (about $2n^{3}$ ), $\alpha_{2}$ is the cost of computing $\log(A)$ , $\alpha_{3}$ is the cost of computing $L_{\log}(A;Z)$ ( $Z$ stands for a given complex matrix of order $n$ ), and $\alpha_{4}$ is the cost of computing $L_{\exp}\left(\log(A)B;Z\right)$ . If $\log(A)$ and $L_{\log}(A;Z)$ are computed by [3, Alg. 5.1], then $(\alpha_{2}+2\alpha_{3})k$ is about $\left(25+\left(19+\frac{13}{3}(s+m)\right)k\right)n^{3}$ , where $s$ is the number of square roots needed in the inverse scaling and squaring procedure and $m$ is the order of the Padé approximants considered; assuming that $L_{\exp}$ is evaluated by [1, Alg. 6.4], $2\alpha_{4}k$ is about $(4w_{m}+12s+32/3)kn^{3}$ , where $w_{m}$ is a number given in [1, Table 6.2], related to the order of the Padé approximants to the matrix exponential, and $s$ is now the number of squarings.
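A minimal Python sketch of the power iteration in Algorithm 4.1 is given below, as an illustration rather than the authors' MATLAB implementation. It makes several assumptions: $L_{\log}$ is obtained from a $2n\times 2n$ block logarithm instead of [3, Alg. 5.1], $L_{\exp}$ comes from SciPy's `expm_frechet`, the adjoint is the pair $\left(L_{\log}(A^{\ast};YB^{\ast}),\ \log(A)^{\ast}Y\right)$ with $Y=L_{\exp}((\log(A)B)^{\ast};W)$, and $\|(A,B)\|_{F}$ is taken to be $\sqrt{\|A\|_{F}^{2}+\|B\|_{F}^{2}}$. The iterates are rescaled at each step and an iteration cap `maxit` is added; neither appears in the stated algorithm. The reference value is the 2-norm of the explicitly formed Kronecker matrix.

```python
import numpy as np
from scipy.linalg import expm, logm, expm_frechet

def frechet_log(A, E):
    """L_log(A; E) as the (1,2) block of log([[A, E], [0, A]])."""
    n = A.shape[0]
    big = np.block([[A, E], [np.zeros((n, n)), A]])
    return logm(big)[:n, n:]

def L_f(A, B, logA, E, F):
    """Formula (3.9): L_exp(log(A)B; log(A)F + L_log(A;E)B)."""
    M = logA @ F + frechet_log(A, E) @ B
    return expm_frechet(logA @ B, M, compute_expm=False)

def L_f_adjoint(A, B, logA, W):
    """Assumed adjoint pair for the trace inner product."""
    Y = expm_frechet((logA @ B).conj().T, W, compute_expm=False)
    return frechet_log(A.conj().T, Y @ B.conj().T), logA.conj().T @ Y

def cond_power_method(A, B, tol=1e-3, maxit=50, seed=0):
    """Estimate ||L_f(A,B)||_F and kappa_f(A,B) by power iteration."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    logA = logm(A)
    E, F = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    gamma_old = 0.0
    for _ in range(maxit):
        W = L_f(A, B, logA, E, F)
        E, F = L_f_adjoint(A, B, logA, W)
        z_norm = np.sqrt(np.linalg.norm(E) ** 2 + np.linalg.norm(F) ** 2)
        gamma = z_norm / np.linalg.norm(W)
        # Rescaling leaves gamma unchanged; it only guards against overflow.
        E, F = E / z_norm, F / z_norm
        if abs(gamma - gamma_old) <= tol * gamma:
            break
        gamma_old = gamma
    pair_norm = np.sqrt(np.linalg.norm(A) ** 2 + np.linalg.norm(B) ** 2)
    kappa = pair_norm * gamma / np.linalg.norm(expm(logA @ B))
    return gamma, kappa

def exact_norm(A, B):
    """||L_f(A,B)||_F as the 2-norm of its n^2-by-2n^2 Kronecker matrix."""
    n = A.shape[0]
    logA = logm(A)
    O = np.zeros((n, n))
    cols = []
    for slot in range(2):
        for j in range(n * n):
            u = np.zeros(n * n)
            u[j] = 1.0
            U = u.reshape(n, n, order="F")
            E, F = (U, O) if slot == 0 else (O, U)
            cols.append(L_f(A, B, logA, E, F).reshape(-1, order="F"))
    return np.linalg.norm(np.column_stack(cols), 2)

rng = np.random.default_rng(4)
n = 4
G = rng.standard_normal((n, n))
A = np.eye(n) + 0.1 * G @ G.T      # no eigenvalues on the negative real axis
B = rng.standard_normal((n, n))
gamma, kappa = cond_power_method(A, B)
reference = exact_norm(A, B)
print(gamma, reference, kappa)
```

Since $\gamma=\|L_{f}^{\star}W\|_{F}/\|W\|_{F}$ never exceeds the operator norm, the estimate is a lower bound on $\|L_{f}(A,B)\|_{F}$ up to rounding, while forming the Kronecker matrix requires $2n^{2}$ derivative evaluations and is only practical for small $n$.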

5 Numerical experiments

We have implemented Algorithm 4.1 in MATLAB, with unit roundoff $u\approx 1.1\times 10^{-16}$ , and tested it on a set of ten pairs of matrices $(A_{j},B_{j}),\ j=1,\ldots,10$ , with sizes ranging from $10\times 10$ to $15\times 15$ . Many pairs include matrices with nonreal entries and/or matrices from MATLAB’s gallery (for instance, the matrices lehmer, dramadah, hilb, cauchy and condex).

The top-left plot of Figure 1 displays the relative errors for the condition number $\kappa_{f}(A_{j},B_{j})$ estimated by Algorithm 4.1, for each pair of matrices. As “exact condition number”, we have considered the value given by our implementation of Algorithm 3.17 in [15]. It is worth noticing that this latter algorithm requires $O(n^{5})$ flops while Algorithm 4.1 involves $O(n^{3})$ flops per iteration. The top-right plot shows that just $2$ iterations of Algorithm 4.1 were needed to meet the prescribed tolerance $\mathtt{tol}=10^{-1}$ . The bottom-left plot illustrates our claim after the proof of Theorem 4.1 about the small norm of the matrix logarithm and, finally, the bottom-right plot shows the values of the relative condition number $\kappa_{f}(A_{j},B_{j})$ , for each $j$ .

6 Conclusions

The Fréchet derivative of the matrix-matrix exponentiation and its conditioning have been investigated for the first time (as far as we know). We have given a general formula for the Fréchet derivative of certain bivariate matrix functions, with applications to well-known bivariate matrix functions, including the matrix-matrix exponentiation. An algorithm based on the power method for estimating the relative condition number has been proposed. Some numerical experiments illustrate our results. Basic results on the matrix-matrix exponentiation have been derived as well.

Bibliography

[1] A. H. Al-Mohy, N. J. Higham, A new scaling and squaring algorithm for the matrix exponential, SIAM J. Matrix Anal. Appl., 31(3), 970–989 (2009).
[2] A. H. Al-Mohy, N. J. Higham, Improved inverse scaling and squaring algorithms for the matrix logarithm, SIAM J. Sci. Comput., 34(4), C153–C169 (2012).
[3] A. H. Al-Mohy, N. J. Higham, S. D. Relton, Computing the Fréchet derivative of the matrix logarithm and estimating the condition number, SIAM J. Sci. Comput., 35(4), C394–C410 (2013).
[4] R. Bellman, Introduction to Matrix Analysis, McGraw-Hill, New York (1960).
[5] R. Bhatia, Matrix Analysis, Springer-Verlag, New York (1997).
[6] I. Barradas, J. E. Cohen, Iterated exponentiation, matrix-matrix exponentiation, and entropy, J. Math. Anal. Appl., 183, 76–88 (1994).
[7] J. R. Cardoso, R. Ralha, Matrix arithmetic-geometric mean and the computation of the logarithm, SIAM J. Matrix Anal. Appl., 37(2), 719–743 (2016).
[8] W. Cheney, Analysis for Applied Mathematics, Graduate Texts in Mathematics 208, Springer-Verlag, New York (2001).
