On the conditioning of the matrix-matrix exponentiation
Jo\~ao R. Cardoso, Amir Sadeghi

TL;DR
This paper investigates the conditioning and derivatives of the matrix-matrix exponentiation function, providing new theoretical insights, algorithms for condition number computation, and numerical experiments to understand its stability and sensitivity.
Contribution
It introduces new results on the Fréchet derivative and conditioning of the matrix-matrix exponentiation, including algorithms for computing the condition number and applications to other matrix functions.
Findings
Derived new formulas for the Fréchet derivative of the matrix-matrix exponentiation.
Proposed an algorithm for computing the relative condition number of A^B.
Numerical experiments demonstrate the effectiveness of the proposed methods.
Abstract
If has no eigenvalues on the closed negative real axis, and is arbitrary square complex, the matrix-matrix exponentiation is defined as . This function arises, for instance, in Von Newmann's quantum-mechanical entropy, which in turn finds applications in other areas of science and engineering. Since in general and do not commute, this bivariate matrix function may not be a primary matrix function as commonly defined, which raises many challenging issues. In this paper, we revisit this function and derive new related results. Particular emphasis is given to its Fr\'echet derivative and conditioning. We present a general result on the Fr\'echet derivative of bivariate matrix functions with applications not only to the matrix-matrix exponentiation but also to other functions, such as the second order Fr\'echet derivatives and some iterationâŠ
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\marginsize
2.5cm2.5cm1.0cm1.0cm
On the conditioning of the matrix-matrix exponentiation
JoĂŁo R. Cardosoa111E-mail address of JoĂŁo R. Cardoso: [email protected], Amir Sadeghi b222Corresponding author (E-mail address: [email protected]) ,
a* Polytechnic Institute of Coimbra/ISEC, Coimbra â Portugal, and
Institute of Systems and Robotics, University of Coimbra, PĂłlo II, Coimbra â Portugal
bDepartment of Mathematics, Robat Karim Branch, Islamic Azad University, Tehran, Iran. *
Abstract
If has no eigenvalues on the closed negative real axis, and is arbitrary square complex, the matrix-matrix exponentiation is defined as . This function arises, for instance, in Von Newmannâs quantum-mechanical entropy, which in turn finds applications in other areas of science and engineering. Since in general and do not commute, this bivariate matrix function may not be a primary matrix function as commonly defined, which raises many challenging issues. In this paper, we revisit this function and derive new related results. Particular emphasis is given to its FrĂ©chet derivative and conditioning. We present a general result on the FrĂ©chet derivative of bivariate matrix functions with applications not only to the matrix-matrix exponentiation but also to other functions, such as the second order FrĂ©chet derivatives and some iteration functions arising in matrix iterative methods. The numerical computation of the FrĂ©chet derivative is discussed and an algorithm for computing the relative condition number of is proposed. Some numerical experiments are included.
keywords: Matrix-matrix exponentiation, Conditioning, Fréchet Derivative, Matrix exponential, Matrix logarithm
1 Introduction
Let be an square complex matrix with no eigenvalues on the closed negative real axis and let be an arbitrary square complex matrix of order . The matrix-matrix exponentiation is defined as
[TABLE]
where stands for the exponential of the matrix and denotes the principal logarithm of , i.e., the unique solution of the matrix equation whose eigenvalues lie on the open strip of the complex plane; stands for the imaginary part of .
For background on matrix exponential, matrix logarithm and general matrix functions see [15, 20] and the references therein. Note that although includes well-known matrix functions as particular cases (for instance, the matrix inverse and real powers of a matrix; see Lemma 2.1 below), it is not, in general, a primary matrix function as defined in those books. Indeed, there may not exist a scalar single variable stem function associated to the matrix-matrix exponentiation. However, we can view as being an extension of the two variable function , but the lack of commutativity between and turns the extension of this function to matrices quite cumbersome. An interesting attempt to define the concept of bivariate matrix function as an operator is given in the monograph [24]. Although we refer to the matrix-matrix exponentiation as being a bivariate matrix function, it does not belong to the class of bivariate matrix functions defined in [24]. Here, can be regarded as a function from to which assigns to each pair of matrices the square complex matrix . Another way of defining the concept of matrix-matrix exponentiation would be
[TABLE]
In Section 2, some relationships between and are pointed out (see, in particular, (iii) in Lemma 2.2). However, our attention will be mainly focused on . Analogue results follow straightforward for . A definition of the matrix-matrix exponentiation in a componentwise fashion is also possible, as used in [11] to deal with some problems is Statistics. However, this latter definition is not considered in this work.
One of our goals is to investigate the sensitivity of the function to perturbations of first order in and . A widely used tool to carry out this is the Fréchet derivative, which in turn allows the computation of the condition number of the function. In this work, we derive a general result on the Fréchet derivative of certain bivariate matrix functions (Theorem 3.1), which can be used to find easily an explicit formula for the Fréchet derivative of the matrix-matrix exponentiation in terms of the Fréchet derivatives of the matrix exponential and matrix logarithm. Formulae for the Fréchet derivatives of other bivariate matrix functions, such as iteration functions to the matrix square root (see [15, Sec. 6.4]) and to the matrix arithmetic-geometric mean (see [7]), can also be obtained from the application of that result. The same holds for the second order Fréchet derivatives of primary matrix functions.
Given a map , the FrĂ©chet derivative of at , with , in the direction of , where , is a linear operator that maps the âdirection matrixâ to such that
[TABLE]
The Fréchet derivative of may not exist at , but if it does it is unique and coincides with the directional (or Gùteaux) derivative of at in the direction . Hence, the existence of the Fréchet derivative guarantees that for any ,
[TABLE]
Any consistent matrix norm on induces the operator norm
[TABLE]
The (relative) condition number of at is defined by
[TABLE]
Hence, if an approximation to is known, then there exist numerical schemes to estimate (for instance, the power method on Fréchet derivative proposed in [23]; see also [15, Alg. 3.20]) and then the condition number . As far as we know, we are the first to investigate the Fréchet derivative of the matrix-matrix exponentiation and its conditioning. In Section 4, we discuss the efficient computation of , where , and propose a power method for estimating the Frobenius norm of and then the corresponding condition number . In the numerical experiments carried out in Section 5 for several pairs of matrices , two iterations of the power method suffices to estimate (where stands for the Frobenius norm), with a relative error smaller than .
Here, one uses the same notation to denote both the matrix norm and the induced operator norm. For more information on the Fréchet derivative and its properties see, for instance, [5, Ch. X] and [15, Ch. 3]. Note also that the pair corresponds, using matrix terminology, to the block matrix \left[\begin{array}[]{c}E\\ F\end{array}\right]. So the notation used above is clear.
To our knowledge, the terminology âmatrix-matrix exponentiationâ was firstly coined by Barradas and Cohen in [6], where this function arises in a problem of Von Newmannâs quantum-mechanical entropy. Some properties of the matrix-matrix exponentiation are addressed in [6], for the particular case when is a normal matrix. We revisit some of those properties and derive new ones.
A particular case of the matrix-matrix exponentiation is the so called âscalar-matrix exponentiationâ. If is a complex number no belonging to , we can define as the function from to which assigns to each pair the square complex matrix . This function appears in the definitions of matrix Gamma and Beta functions, which in turn can be applied to solving certain matrix differential equations [21, 22]. Gamma and Beta functions in matrix form are defined, respectively, as [22]:
[TABLE]
[TABLE]
Our results apply easily to this particular case.
Notation: denotes a subordinate matrix norm and the Frobenius norm; is the imaginary part of the complex number ; is the spectrum of the matrix ; is the conjugate transpose of , is the adjoint of the linear transformation .
The organization of the paper is as follows. In Section 2 we revisit some facts about the matrix-matrix exponentiation and add some related results not previously stated in the literature. A formula for the Fréchet derivative of certain bivariate matrix is proposed in Section 3. This formula is in turn used to derive a formula for the Fréchet derivative of the matrix-matrix exponentiation. It is also explained how it can be applied to the Fréchet derivative of well known bivariate functions. Section 4 is devoted to investigate the conditioning of the matrix-matrix exponentiation. In particular, an algorithm for estimating the relative condition number is propose. Its performance is illustrated by numerical experiments in Section 5. A few conclusions are drawn in Section 6.
2 Basic results
In this section we present some theoretical results on the matrix-matrix exponentiation that can be derived from the properties of the much studied exponential and logarithm matrix functions.
According to the definition (1.1) and some well-known identities valid for the matrix exponential, we have
[TABLE]
and
[TABLE]
In addition, can be considered as the solution of the matrix initial value problem
[TABLE]
Lemma 2.1**.**
If has no eigenvalues on , is any square complex matrix, and , then the following properties hold:
- (i)
* and ; *
** 2. (ii)
*, with . In particular, and *
** 3. (iii)
If the eigenvalues of satisfy , then ;
** 4. (iv)
, therefore ;
** 5. (v)
*, where stands for the conjugate transpose of ; *
** 6. (vi)
If is an invertible matrix then .
Proof.
Immediate consequence from properties of matrix exponential and matrix logarithm. See [15, 20]. â
The following example shows that explicit formulae for matrix-matrix exponentiation may involve complicated expressions, even for the case , with normal.
Example 1**.**
Let {A}=\bigl{[}\begin{smallmatrix}a&b\\ -b&a\\ \end{smallmatrix}\bigr{]} be a nonsingular normal matrix and {B}=\bigl{[}\begin{smallmatrix}\alpha&\beta\\ \gamma&\delta\\ \end{smallmatrix}\bigr{]} be an arbitrary matrix. Our aim is to find a closed expression for . The eigenvalues and eigenvectors of are displayed in matrices and , respectively:
[TABLE]
where and for . It is clear that in the sense of polar notation, we have and (provided that ). Therefore, the logarithm of can be evaluated by the decomposition as following:
[TABLE]
Hence, multiplying the matrices and ,
[TABLE]
It is known that the exponential of an matrix {M}=\bigl{[}\begin{smallmatrix}m_{11}&m_{12}\\ m_{21}&m_{22}\\ \end{smallmatrix}\bigr{]}, can be explicitly obtained by the following relation (see [28]):
[TABLE]
where, . Consequently, an explicit formula for can be obtained via substituting:
[TABLE]
[TABLE]
As mentioned before, some facts about the matrix-matrix exponentiation have been reported in [6], under the assumption of being normal. One of them is revisited in (i) of the next lemma. However, the relationships between and , and their spectra, stated in (ii) and (iii) of the following lemma are new.
Lemma 2.2**.**
If has no eigenvalues on , and is any square complex matrix, then the following properties hold:
- (i)
* and have the same spectra;*
** 2. (ii)
*If and commute, and have spectra , , then the spectrum of (or ) is given by for some permutations and of the set ; *
** 3. (iii)
**
Proof.
- (i)
See [6, Thm. 3.2]. This follows immediately from the classical result of matrix theory stating that when and are square matrices, both products and have the same spectra (see, for instance, Theorem 1.3.20 and Problem 9 in [19]). 2. (ii)
Since and commute, and also commute. Hence, the results follows from the fact that
[TABLE] 3. (iii)
Immediate consequence of the identity , that is valid for any square complex matrices of order ; see [15, Cor. 1.34].
â
One important implication of the statement (iii) of Lemma 2.2 is that when in nonsingular, can be computed easily from :
[TABLE]
A natural way of computing the matrix-matrix exponentiation is to first evaluate and then the exponential of . Matrix exponential and logarithm are much studied functions and one can found many methods for computing them in the literature. The most popular method to the matrix exponential is the so-called scaling and squaring method combined with Padé approximation, that has been investigated and improved by many authors; see for instance [15, Ch. 10] and the references therein and also the more recent paper [1] that includes the algorithm where the expm function of recent versions of MATLAB is based on. The MATLAB function logm implements the algorithm provided in [2, 3], which is an improved version of the inverse scaling and squaring with Padé approximants method proposed in [23]. Other methods for approximating these functions include, for instance, the Taylor polynomial based methods for the matrix exponential proposed in [29] and the iterative transformation-free method of [7] for the matrix logarithm.
A topic that needs further research is the development of algorithms for the matrix-matrix exponentiation that are less expensive than the computation of one matrix exponential and one matrix logarithm plus a matrix product. This seems to be a very challenging issue, especially when does not commute with . Of course, for some particular cases of the matrix-matrix exponentiation (e.g., the matrix square root, matrix -th roots, the matrix inverse) there are more efficient methods that do not involve the computation of matrix exponentials and logarithms. This problem becomes easier even in the more general case when and commute. This is because both matrices may share the same Schur decomposition which reduces considerably the computational effort.
3 The Fréchet derivative of bivariate matrix functions
A key result, very useful from both theoretical and computational perspectives, related with the Fréchet derivative of a primary matrix function , states that
[TABLE]
where is a scalar complex function times continuously differentiable on an open subset containing the spectrum of , the matrix is arbitrary and denotes the Fréchet derivative of at in the direction of (see [25, Thm. 2.1] and [15, Eq. (3.16)]).
Next theorem extends the identity (3.1) to certain bivariate matrix functions.
Theorem 3.1**.**
Let and assume that is a bivariate matrix function with partial derivatives and being continuous functions on an open subset containing . If the curves and are differentiable at , with for all in a certain neighborhood of [math], maps âblock upper triangular matrices to âblock upper triangular, then
[TABLE]
Proof.
Since the partial derivatives and exist and are continuous on the open subset , the Fréchet derivative of on exists (see [8, Sec. 3.1]) and thus coincides with the Gùteaux derivative.
Assuming that and are differentiable at and for all in a certain neighborhood of [math], we shall prove below that an analogue identity to the one in [25, Eq. (1.1)] (see also [15, Thm. 3.6]) holds for our function , that is,
[TABLE]
Indeed, denoting
[TABLE]
with , we have
[TABLE]
from which the result follows by evaluating the limit of the above matrices when . â
An explicit formula for the Fréchet derivative of the matrix-matrix exponentiation, in terms of the Fréchet derivatives of the matrix exponential and matrix logarithm, is given in the next corollary.
Corollary 3.1**.**
Let be the open subset formed by all pairs with having no eigenvalues on and arbitrary. Denoting , it holds
[TABLE]
where and stand for the Fréchet derivatives of the matrix exponential and matrix logarithm, respectively.
Proof.
It easy to check that the conditions of Theorem 3.1 are met. Then the result follows immediately from the identities
[TABLE]
â
If , with not belonging to the closed negative real axis, then is the scalar-matrix exponentiation which arises in matrix Beta and Gamma functions defined in (1.4) and (1.3), respectively. The Fréchet derivative in this special case can be written as
[TABLE]
For , with , the matrix-matrix exponentiation reduces to the single variable matrix function of real powers of , which has been addressed recently in [16, 17]. Provided that is not affected by any kind of perturbation (that is, ), (3.9) reduces to formula (2.4) in [16], that has been obtained using other techniques. Note that while (3.9) covers the case when is perturbed, formula (2.4) in [16] does not.
In addition to the result of the previous corollary, the identity (3.2) provides alternative means for obtaining closed expressions for the Fréchet derivatives of other known bivariate matrix functions. For instance, many iteration functions for approximating the matrix square root ([14], [15, Ch. 6]) or, more generally, for the matrix -th root [13, 18] are of the form
[TABLE]
where and satisfy some smooth requirements. Since
[TABLE]
a closed expression for () follows from (3.2). The same relationship applies to the matrix arithmetic-geometric mean iteration [7, 30] and to find expressions for the second order Fréchet derivatives [5, Ch. X] of primary matrix functions.
Closed formulae for the Fréchet derivatives of matrix exponential and matrix logarithm are available in the literature. One of the most known for the matrix exponential is the integral formula
[TABLE]
(see [31] and [15, Ch. 10]). Another formula, involving the vectorization of the Fréchet derivative, is
[TABLE]
where stands for the operator that stacks the columns of into a long vector of size , and
[TABLE]
with . The symbols and denote the Kronecker product and the Kronecker sum, respectively. Other representations for are available in [15, Eq. (10.3)]; see also [27, 23].
An integral representation of the Fréchet derivative of the matrix logarithm is
[TABLE]
(see [9] and [15, Ch. 11]). Vectorizing (3.15) yields
[TABLE]
where
[TABLE]
Gathering the formulae above, a vectorization of the Fréchet derivative of the matrix-matrix exponentiation can be given by
[TABLE]
where
[TABLE]
Fréchet derivatives allow us to understand how the function behaves when both and are subject to small perturbations. Suppose now that does not suffer any kind of perturbation but does. Now just is regarded as a variable and similar perturbed results to those of matrix exponential are valid, as shown below in Theorem 3.2.
Theorem 3.2**.**
Assume that has no eigenvalue on . For any , the following relation holds:
[TABLE]
Proof.
From the theory of the matrix exponential, it is straightforward that
[TABLE]
(see [4]). Let us consider , and in (3.20). Hence, we have
[TABLE]
Therefore, taking norms, one has
[TABLE]
â
4 Conditioning of the matrix-matrix exponentiation
From now on, we will consider the Frobenius norm only. However, with appropriate modifications, some results can be adapted to other norms. The key factor for evaluating the condition number is the norm of the operator . In this section, we first present an upper bound to such a norm and then a power method for its estimation. The notation is used again.
Theorem 4.1**.**
Assume that the conditions of Corollary 3.1 are valid. With respect to the Frobenius norm, the following inequality holds:
[TABLE]
Proof.
For , we have
[TABLE]
By (3.9),
[TABLE]
Hence
[TABLE]
â
If , one can find an upper bound for the factor in the right hand side of (4.1) as follows:
[TABLE]
In the general case, it is hard to bound , which can be infinitely large. However, since the logarithm function increases in a very slow fashion, in practice the values attained by can be considered small. For instance, its largest value for the ten matrices considered in the numerical experiments in Section 5 is about (see bottom-left plot in Figure 1).
For better estimates to , we propose below a particular power method for the matrix-matrix exponentiation using the framework of [15, Alg. 3.20]. Before stating the detailed steps of the methods, we shall address two important issues raised by its implementation. The first one is the computation of the Fréchet derivative and the second one is how to find the adjoint operator with respect to the Euclidean inner product . We recall that the matrix of our linear operator is not square which means that its expression is not so simple to obtain as in the square case, where one just needs to take the conjugate transpose of the argument (check the top of p. 66 in [15]).
About the first issue, and attending to the developments carried out in Section 3, we will use formula (3.9). For the computation of we consider [1, Alg. 6.4], and for the computation of and we use [3, Alg. 5.1] (without the computation of ). We can, alternatively, use
[TABLE]
(this should be read as: the Fréchet derivative is the block of the resulting matrix in the right-hand side; see (3.3)), but this formula is more expensive than (3.9), even if we exploit the particular structure of the two block matrices in the right-hand side of (4.3). More disadvantages of formulae like (4.3) are mentioned in [2, 3].
Now we focus on finding a closed expression for the adjoint operator , where . According to the theory of adjoint operators (see, for instance, [10]), one needs to look for the unique operator
\begin{array}[]{rccl}L_{f}^{\star}(A,B):&\mathbb{C}^{n\times n}&\longrightarrow&\mathbb{C}^{n\times n}\times\mathbb{C}^{n\times n}\\ &W&\longmapsto&L_{f}^{\star}(A,B;W),\\ \end{array}
such that
[TABLE]
where is the conjugate transpose of (3.18). Since
[TABLE]
and, for it holds
[TABLE]
one has
[TABLE]
and, consequentely,
[TABLE]
We are now ready to propose an algorithm to estimate the condition number of the matrix-matrix exponentiation with respect to the Frobenius norm.
Algorithm 4.1**.**
Given , with having no eigenvalues on the closed negative real axis, this algorithm estimates the condition number defined in (1.2), where , with respect to the Frobenius norm.
- Choose nonzero starting matrices and a tolerance tol;
- Set , and ;
- while* *
- , with given by (3.9);
- , with given by (4.5);
- ;
- ; ;
- ;
- end**
- ;
- .
Cost: , where is the cost of computing one matrix-matrix product (about ), is the cost of computing , corresponds to the computation of ( stands for a given complex matrix of order ), and is the cost for . If and are computed by [3, Alg. 5.1], then is about , where is the number of square roots needed in the inverse scaling and squaring procedure and is the order of Padé approximants considered; assuming that is evaluated by [1, Alg. 6.4], is about , where is a number given in [1, Table 6.2], which is related with the order of Padé approximants to the matrix exponential, and is the number of squarings.
5 Numerical experiments
We have implemented Algorithm 4.1 in MATLAB, with unit roundoff , with a set of ten pairs of matrices , with sizes ranging from to . Many pairs include matrices with nonreal entries and/or matrices from MATLABâs gallery (for instance, the matrices lehmer, dramadah, hilb, cauchy and condex).
The top-left plot displays the relative errors for the condition number estimated by Algorithm 4.1, for each pair of matrices. As âexact condition numberâ, we have considered the value given by our implementation of Algorithm 3.17 in [15]. It is worth noticing that this latter algorithm requires flops while Algorithm 4.1 involves flops. Top-right graphic shows that just iterations in Algorithm 4.1 were needed to meet the prescribed tolerance . The bottom-left plot illustrates our claim after the proof of Theorem 4.1 about the small norm of the matrix logarithm and, finally, the bottom-right plots the values of the relative condition number , for each .
6 Conclusions
The Fréchet derivative of the matrix-matrix exponentiation and its conditioning have been investigated for the first time (as far as we know). We have given a general formula for the Fréchet derivative of certain bivariate matrix functions, with applications to well-know bivariate matrix functions, including the matrix-matrix exponentiation. An algorithm based on the power method for estimating the relative condition number has been proposed. Some numerical experiments illustrate our results. Basic results on the matrix-matrix exponentiation have been derived as well.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] A. H. Al-Mohy, N. J. Higham, A new scaling and squaring algorithm for the matrix exponential, SIAM J. Matrix Anal. Appl., 31(3), 970â989 (2009).
- 2[2] A. H. Al-Mohy, N. J. Higham, Improved inverse scaling and squaring algorithms for the matrix logarithm, SIAM J. Sci. Comput., 34(4), C 153âC 169 (2012).
- 3[3] A. H. Al-Mohy, N. J. Higham and S. D. Relton, Computing the Frechet derivative of the matrix logarithm and estimating the condition number, SIAM J. Sci. Comput., 35(4), C 394âC 410 (2013).
- 4[4] R. Bellman, Introduction to matrix analysis , Mc Graw-Hill, New York (1960).
- 5[5] R. Bhatia, Matrix Analysis , Springer-Verlag, New York (1997).
- 6[6] I. Barradas, J. E. Cohen, Iterated Exponentiation, Matrix-Matrix Exponentiation, and Entropy, J. Math. Anal. Appli., 183, 76â88 (1994).
- 7[7] J. R. Cardoso, R. Ralha, Matrix arithmetic-geometric mean and the computation of the logarithm, SIAM J. Matrix Anal. Appl., 37 (2), 719â743 (2016).
- 8[8] W. Cheney , Analysis for Applied Mathematics , Graduate Texts in Mathematics 208, Springer-Verlag, New York (2001).
