Matrix versions of the Hellinger distance
Rajendra Bhatia, Stephane Gaubert, Tanvi Jain

TL;DR
This paper introduces and analyzes matrix distance functions based on different geometric means, exploring their properties and applications to barycenter computation in positive definite matrices.
Contribution
It extends the concept of matrix Hellinger distances by studying new divergence measures derived from various matrix means, including the Pusz-Woronowicz and log Euclidean means.
Findings
Certain divergences are strictly convex functions.
Characterizations of barycenters with respect to these divergences.
Connections between these divergences and known metrics like Bures-Wasserstein.
Rajendra Bhatia
Ashoka University, Sonepat, Haryana, 131029, India

Stephane Gaubert
INRIA and CMAP, Ecole Polytechnique, CNRS, 91128 Palaiseau, France

Tanvi Jain
Indian Statistical Institute, New Delhi 110016, India
Abstract.
On the space of positive definite matrices we consider distance functions of the form $d(A,B)=\left[\mathrm{tr}\,\mathcal{A}(A,B)-\mathrm{tr}\,\mathcal{G}(A,B)\right]^{1/2},$ where $\mathcal{A}(A,B)$ is the arithmetic mean and $\mathcal{G}(A,B)$ is one of the different versions of the geometric mean. When $\mathcal{G}(A,B)=A^{1/2}B^{1/2}$ this distance is $\|A^{1/2}-B^{1/2}\|_2,$ and when $\mathcal{G}(A,B)=(A^{1/2}BA^{1/2})^{1/2}$ it is the Bures-Wasserstein metric. We study two other cases: $\mathcal{G}(A,B)=A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2},$ the Pusz-Woronowicz geometric mean, and $\mathcal{G}(A,B)=\exp\big(\frac{\log A+\log B}{2}\big),$ the log Euclidean mean. With these choices $d(A,B)$ is no longer a metric, but it turns out that $d^2(A,B)$ is a divergence. We establish some (strict) convexity properties of these divergences. We obtain characterisations of barycentres of positive definite matrices with respect to these distance measures.
Key words and phrases:
Geometric mean, matrix divergence, Bregman divergence, relative entropy, strict convexity, barycentre.
2010 Mathematics Subject Classification:
15B48, 49K35, 94A17, 81P45.
1. Introduction
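Let $p=(p_1,\ldots,p_n)$ and $q=(q_1,\ldots,q_n)$ be two discrete probability distributions; i.e. $p$ and $q$ are $n$-vectors with nonnegative coordinates such that $\sum p_i=\sum q_i=1.$ The Hellinger distance between $p$ and $q$ is the Euclidean norm of the difference between the square roots of $p$ and $q$; i.e.
$$d(p,q)=\|\sqrt{p}-\sqrt{q}\|_2=\Big[\sum_i(\sqrt{p_i}-\sqrt{q_i})^2\Big]^{1/2}=\Big[\sum_i(p_i+q_i)-2\sum_i\sqrt{p_iq_i}\Big]^{1/2}. \tag{1}$$
This distance, and its continuous version, are much used in statistics, where it is customary to take $d_H(p,q)=\frac{1}{\sqrt{2}}\,d(p,q)$ as the definition of the Hellinger distance. We have then
$$d_H(p,q)=\sqrt{\mathrm{tr}\,\mathcal{A}(p,q)-\mathrm{tr}\,\mathcal{G}(p,q)}, \tag{2}$$
where $\mathcal{A}(p,q)$ is the arithmetic mean of the vectors $p$ and $q,$ $\mathcal{G}(p,q)$ is their geometric mean, and $\mathrm{tr}\,x$ stands for $\sum x_i.$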
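The identity between the two forms of the Hellinger distance can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the sample vectors `p` and `q` are illustrative choices, not taken from the paper.

```python
import math

def hellinger_norm(p, q):
    """d_H(p, q) as (1/sqrt 2) times the Euclidean norm of sqrt(p) - sqrt(q)."""
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s / 2.0)

def hellinger_means(p, q):
    """d_H(p, q) as sqrt(tr A(p, q) - tr G(p, q)), where tr x is the sum of coordinates."""
    arith = sum((pi + qi) / 2.0 for pi, qi in zip(p, q))   # tr of the arithmetic mean
    geom = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # tr of the geometric mean
    return math.sqrt(max(arith - geom, 0.0))  # clamp tiny negative rounding errors

# Illustrative probability vectors (assumed data).
p = [0.5, 0.3, 0.2]
q = [0.1, 0.6, 0.3]
print(hellinger_norm(p, q), hellinger_means(p, q))
```

Both functions return the same number up to rounding, which is the scalar identity that the matrix versions below try to imitate.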
A matrix/noncommutative/quantum version would seek to replace the probability vectors $p$ and $q$ by density matrices $A$ and $B$; i.e., positive semidefinite matrices with $\mathrm{tr}\,A=\mathrm{tr}\,B=1.$ In the discussion that follows, the restriction on trace is not needed, and so we let $A$ and $B$ be any two positive semidefinite matrices. On the other hand, a part of our analysis requires $A$ and $B$ to be positive definite. This will be clear from the context. We let $\mathbb{P}$ be the set of $n\times n$ complex positive definite matrices. The notation $A\geq 0$ ($A>0$) means that $A$ is positive semidefinite (positive definite).
Here we run into the essential difference between the matrix and the scalar case. For positive definite matrices $A$ and $B$ there is only one possible arithmetic mean, $\mathcal{A}(A,B)=\frac{A+B}{2}.$ However, the geometric mean $\mathcal{G}(A,B)$ could have different meanings. Each of these leads to a different version of the Hellinger distance on matrices. In this paper we study some of these distances and their properties.
The Euclidean inner product on $n\times n$ matrices is defined as $\langle A,B\rangle=\mathrm{tr}\,A^*B.$ The associated Euclidean norm is
$$\|A\|_2=(\mathrm{tr}\,A^*A)^{1/2}.$$
Recall that the matrices $AB$ and $BA$ have the same eigenvalues. Thus if $A$ and $B$ are positive definite, then $AB$ is not positive definite unless $A$ and $B$ commute. However, the eigenvalues of $AB$ are all positive as they are the same as the eigenvalues of $A^{1/2}BA^{1/2}.$ Also every matrix with positive eigenvalues has a unique square root with positive eigenvalues. If $A,B$ are positive definite, then we denote by $(AB)^{1/2}$ the square root that has positive eigenvalues. Since $(AB)^{1/2}=A^{1/2}(A^{1/2}BA^{1/2})^{1/2}A^{-1/2},$ the matrices $(AB)^{1/2}$ and $(A^{1/2}BA^{1/2})^{1/2}$ are similar, and hence have the same eigenvalues.
The straightforward generalisation of (1) for positive definite matrices is evidently
$$d_1(A,B)=\|A^{1/2}-B^{1/2}\|_2=\left[\mathrm{tr}\,(A+B)-2\,\mathrm{tr}\,A^{1/2}B^{1/2}\right]^{1/2}. \tag{3}$$
Another version could be
$$d_2(A,B)=\left[\mathrm{tr}\,(A+B)-2\,\mathrm{tr}\,(A^{1/2}BA^{1/2})^{1/2}\right]^{1/2}. \tag{4}$$
While it is clear from (3) that $d_1$ is a metric on $\mathbb{P},$ it is not obvious that $d_2$ is a metric. It turns out that
$$d_2(A,B)=\min_{U}\|A^{1/2}-B^{1/2}U\|_2, \tag{5}$$
where the minimum is taken over all unitary matrices $U.$ It follows from this that $d_2$ is a metric. This is called the Bures distance in the quantum information literature and the Wasserstein metric in the literature on optimal transport. It plays an important role in both these subjects. We refer the reader to [18] for a recent exposition, and to [12, 26, 28, 36] for earlier work. The quantity $F(A,B)=\mathrm{tr}\,(A^{1/2}BA^{1/2})^{1/2}$ is called the fidelity between the states $A$ and $B.$ In the special case when $A=uu^*,$ $B=vv^*$ are pure states, we have $F(A,B)=|\langle u,v\rangle|$ and $d_2(A,B)=\sqrt{2}\,(1-|\langle u,v\rangle|)^{1/2}.$ For qubit states this is the distance on the Bloch sphere.
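The two descriptions of the Bures-Wasserstein distance, the trace formula (4) and the minimum over unitaries (5), can be compared numerically. The sketch below assumes NumPy is available; the helper names and the test matrices are illustrative. It uses the standard fact that if $A^{1/2}B^{1/2}=W\,\mathrm{diag}(s)\,V^*$ is a singular value decomposition, then $U=VW^*$ attains the minimum in (5).

```python
import numpy as np

def psd_sqrt(M):
    """Positive semidefinite square root of a Hermitian PSD matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def bures_by_trace(A, B):
    """d_2(A, B) computed from the trace formula (4)."""
    rA = psd_sqrt(A)
    fidelity = np.trace(psd_sqrt(rA @ B @ rA)).real
    return np.sqrt(max(np.trace(A + B).real - 2.0 * fidelity, 0.0))

def bures_by_unitary(A, B):
    """d_2(A, B) as ||A^{1/2} - B^{1/2} U||_2 at an optimal unitary U."""
    rA, rB = psd_sqrt(A), psd_sqrt(B)
    # With A^{1/2} B^{1/2} = W diag(s) Vh, the choice U = Vh^* W^* maximises
    # Re tr(A^{1/2} B^{1/2} U), and therefore minimises the norm below.
    W, s, Vh = np.linalg.svd(rA @ rB)
    U = Vh.conj().T @ W.conj().T
    return np.linalg.norm(rA - rB @ U, "fro"), U

# Illustrative positive definite matrices (assumed data).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, 3.0]])
d_trace = bures_by_trace(A, B)
d_unitary, U = bures_by_unitary(A, B)
print(d_trace, d_unitary)
```

The two values agree up to rounding, and replacing `U` by any other unitary (for instance a rotation) can only increase the Frobenius norm.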
For various reasons, theoretical and practical, the most accepted definition of geometric mean of $A,B$ is the entity
$$A\#B=A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}. \tag{6}$$
This formula was introduced by Pusz and Woronowicz [32]. When $A$ and $B$ commute $A\#B$ reduces to $A^{1/2}B^{1/2}.$ The mean $A\#B$ has been studied extensively for several years and has remarkable properties that make it useful in diverse areas. One of them is its connection with operator inequalities related to monotonicity and convexity theorems for the quantum entropy. See Chapter 4 of [15] for a detailed exposition. Another object of interest has been the log Euclidean mean defined as
$$\mathcal{L}(A,B)=\exp\Big(\frac{\log A+\log B}{2}\Big). \tag{7}$$
This mean too reduces to $A^{1/2}B^{1/2}$ when $A$ and $B$ commute, and has been used in various contexts [7], though it lacks some pleasing properties that $A\#B$ has.
Thus it is natural to consider two more matrix versions of the Hellinger distance, viz,
$$d_3(A,B)=\left[\mathrm{tr}\,(A+B)-2\,\mathrm{tr}\,(A\#B)\right]^{1/2} \tag{8}$$
and
$$d_4(A,B)=\left[\mathrm{tr}\,(A+B)-2\,\mathrm{tr}\,\mathcal{L}(A,B)\right]^{1/2}. \tag{9}$$
In view of what has been discussed, we may expect that $d_3$ and $d_4$ are metrics on $\mathbb{P}.$ However, it turns out that neither of them obeys the triangle inequality. Examples are given in Section 2. Nevertheless, this is compensated by the fact that the squares of $d_3$ and $d_4$ both are divergences, and hence they can serve as good distance measures.
A smooth function $\Phi$ from $\mathbb{P}\times\mathbb{P}$ to the set of nonnegative real numbers, $\mathbb{R}_+$, is called a divergence if
(i) $\Phi(A,B)=0$ if and only if $A=B.$
(ii) The first derivative $D\Phi$ with respect to the second variable vanishes on the diagonal; i.e.,
[TABLE]
(iii) The second derivative $D^2\Phi$ is positive on the diagonal; i.e.,
[TABLE]
See [4], Sections 1.2 and 1.3.
The prototypical example is the Euclidean divergence $\Phi(A,B)=\|A-B\|_2^2.$ The functions $d_1^2$ and $d_2^2$ are also divergences. Another well-known example is the Kullback-Leibler divergence [4]. A special kind of divergence is the Bregman divergence corresponding to a strictly convex differentiable function $\varphi:\mathbb{P}\to\mathbb{R}.$ If $\varphi$ is such a function, then
$$\Phi(A,B)=\varphi(A)-\varphi(B)-D\varphi(B)(A-B)$$
is called the Bregman divergence corresponding to $\varphi.$ Not every divergence arises in this way. In particular, the square of the Hellinger distance, $d^2(p,q),$ on probability vectors is not a Bregman divergence.
Now we describe our main results. We will show that both the functions
$$\Phi_3(A,B)=d_3^2(A,B)\quad\text{and}\quad\Phi_4(A,B)=d_4^2(A,B)$$
are divergences. We will show that $\Phi_3$ and $\Phi_4$ are jointly convex in the variables $A$ and $B$ and strictly convex in each of the variables separately. One consequence of this is that for every $m$-tuple $A_1,\ldots,A_m$ in $\mathbb{P}$ and positive weights $w_1,\ldots,w_m$ with $\sum w_j=1$ the minimisation problem
$$\min_{X>0}\ \sum_{j=1}^m w_j\,d^2(X,A_j) \tag{13}$$
has a unique solution when $d=d_3$ or $d_4.$ When $d=d_1$ the minimum in (13) is attained at the $\frac{1}{2}$-power mean
$$\Big(\sum_{j=1}^m w_jA_j^{1/2}\Big)^2. \tag{14}$$
This is one of the much studied family of classical power means. When $d=d_2$ the minimiser in (13) is the Wasserstein mean [2, 18]. This is the unique solution of the matrix equation
$$X=\sum_{j=1}^m w_j\,(X^{1/2}A_jX^{1/2})^{1/2}. \tag{15}$$
This mean has major applications in optimal transport, statistics, quantum information and other areas. Means with respect to various divergences have also been of interest in information theory. See e.g., [8, 30]. An inspection of (14) and (15) shows a common feature. Both for $d=d_1$ and $d_2$ the minimiser in (13) is the solution of the equation
$$X=\sum_{j=1}^m w_j\,\mathcal{G}(X,A_j), \tag{16}$$
where $\mathcal{G}$ is the version of the geometric mean chosen in the definition of $d.$ That is, $\mathcal{G}(A,B)=A^{1/2}B^{1/2}$ in the case of $d_1$ and $\mathcal{G}(A,B)=(A^{1/2}BA^{1/2})^{1/2}$ in the case of $d_2.$ It turns out that this is also the case for $d_4$ but not for $d_3.$ When $d=d_3$ the minimisation problem (13) has a unique solution which is also the solution of the matrix equation
[TABLE]
This, in general, is different from the solution of the matrix equation
$$X=\sum_{j=1}^m w_j\,(X\#A_j). \tag{18}$$
When $d=d_4$ the problem (13) has a unique solution which is also the solution of the matrix equation
$$X=\sum_{j=1}^m w_j\,\mathcal{L}(X,A_j). \tag{19}$$
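For $d=d_1$ the barycentre can be verified by a short computation. Writing $S=\sum w_jA_j^{1/2}$ and assuming the weights sum to one, the objective $f(X)=\sum w_j\,d_1^2(X,A_j)$ satisfies $f(X)-f(S^2)=\|X^{1/2}-S\|_2^2$, so $S^2$ is the unique minimiser. The sketch below checks this identity numerically; it assumes NumPy, and the matrices, weights and function names are illustrative.

```python
import numpy as np

def psd_sqrt(M):
    """Positive semidefinite square root of a Hermitian PSD matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def d1_squared(A, B):
    """Squared distance d_1^2(A, B) = ||A^{1/2} - B^{1/2}||_2^2."""
    return np.linalg.norm(psd_sqrt(A) - psd_sqrt(B), "fro") ** 2

def objective(X, mats, weights):
    """Weighted sum of squared d_1 distances from X to the given matrices."""
    return sum(w * d1_squared(X, A) for w, A in zip(weights, mats))

def half_power_mean(mats, weights):
    """(sum_j w_j A_j^{1/2})^2, assuming the weights sum to one."""
    S = sum(w * psd_sqrt(A) for w, A in zip(weights, mats))
    return S @ S

# Illustrative data: three positive definite matrices and weights summing to one.
mats = [np.array([[2.0, 1.0], [1.0, 2.0]]),
        np.array([[1.0, 0.0], [0.0, 3.0]]),
        np.array([[3.0, -1.0], [-1.0, 1.0]])]
weights = [0.2, 0.3, 0.5]

M = half_power_mean(mats, weights)
X = np.array([[2.0, 0.5], [0.5, 1.5]])  # an arbitrary competitor
excess = objective(X, mats, weights) - objective(M, mats, weights)
print(excess, np.linalg.norm(psd_sqrt(X) - psd_sqrt(M), "fro") ** 2)
```

The two printed numbers agree, so any competitor `X` pays exactly the squared $d_1$ distance to the power mean.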
In the past few years there has been extensive work on the Cartan mean (also known as the Karcher or Riemannian mean) of positive definite matrices.
This is the solution of the minimisation problem
$$\operatorname*{argmin}_{X>0}\ \sum_{j=1}^m w_j\,\delta^2(X,A_j), \tag{20}$$
where
$$\delta(A,B)=\|\log(A^{-1/2}BA^{-1/2})\|_2 \tag{21}$$
is the Cartan metric on the manifold $\mathbb{P}.$ This mean from classical differential geometry has found several important applications [9, 15, 16, 24, 29].
Our analysis of $d_4$ leads to some interesting facts about quantum relative entropy. We observe that the convex function $\varphi(A)=\mathrm{tr}\,(A\log A-A)$ leads to the Bregman divergence $\Phi(A,B)=\mathrm{tr}\,(A(\log A-\log B)-A+B),$ and the log Euclidean mean is the barycentre with respect to this Bregman divergence. As a related issue, we explore properties of barycentres with respect to general matrix Bregman divergences, and point out similarities and crucial differences between the scalar and matrix case.
Convexity properties of matrix Bregman divergences have been studied in [11, 31], and matrix approximation problems with divergences in [23]. Means with respect to matrix divergences are studied in [22]. In [35] Sra studied a related distance function
$$\delta_S(A,B)=\Big[\log\det\Big(\frac{A+B}{2}\Big)-\frac{1}{2}\log\det(AB)\Big]^{1/2}$$
and showed that this is a metric on $\mathbb{P}.$ Several parallels between this metric and the Cartan metric are pointed out in [35].
2. Convexity and derivative computations
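Inequalities for traces of matrix expressions have a long history. For the different geometric means mentioned in Section 1, we know [17] that
$$\mathrm{tr}\,(A\#B)\leq\mathrm{tr}\,\mathcal{L}(A,B)\leq\mathrm{tr}\,(A^{1/2}B^{1/2})\leq\mathrm{tr}\,(A^{1/2}BA^{1/2})^{1/2}. \tag{22}$$
It follows that
$$d_2(A,B)\leq d_1(A,B)\leq d_4(A,B)\leq d_3(A,B).$$
Since $d_1$ is a metric, this implies that $d_3(A,B)=0$ if and only if $A=B.$ The same is true for $d_4.$ Thus $\Phi_3$ and $\Phi_4$ satisfy the first condition in the definition of a divergence. To prove that $\Phi_3$ is a divergence we need to compute its first and second derivatives. These results are of independent interest.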
Proposition 1.
Let be a positive definite matrix. Let be the map on defined as
[TABLE]
Then the derivative of is given by the formula
[TABLE]
where
Proof.
We will use the integral representation
[TABLE]
where See [14] p.143. Using this we see that the derivative of the function is the linear map
[TABLE]
where is any Hermitian matrix. This shows that
[TABLE]
This proves the proposition.
Theorem 2.
Let and be the first and the second derivatives of Then
[TABLE]
[TABLE]
(In other words, the gradient of $\Phi_3$ at every diagonal point is $0$ and the Hessian is positive.)
Proof.
For a fixed let be the map on defined as When the expression in (23) reduces to
[TABLE]
Recalling that we see that
[TABLE]
This establishes (26). Next note that for the second derivative we have
[TABLE]
From (23) we see that
[TABLE]
By definition
[TABLE]
Hence, from (29) we see that is equal to
[TABLE]
When and this reduces to give
[TABLE]
This proves (27).
Consider maps $f$ defined on $\mathbb{P}$ and taking values in $\mathbb{P}$ or $\mathbb{R}_{++}$ (the set of positive real numbers). We say that $f$ is concave if for all $X,Y$ in $\mathbb{P}$ and $0<t<1$
$$f((1-t)X+tY)\geq(1-t)f(X)+tf(Y). \tag{31}$$
It is strictly concave if the two sides of (31) are equal only if $X=Y.$ A map $f$ from $\mathbb{P}\times\mathbb{P}$ into $\mathbb{P}$ or $\mathbb{R}_{++}$ is called jointly concave if for all $X_1,X_2,Y_1,Y_2$ in $\mathbb{P}$ and $0<t<1$
$$f((1-t)X_1+tX_2,\,(1-t)Y_1+tY_2)\geq(1-t)f(X_1,Y_1)+tf(X_2,Y_2).$$
It is a basic fact in the theory of the geometric mean that $A\#B$ is jointly concave in $A$ and $B$, see [5, 6]. However, it is not strictly jointly concave. Indeed, even the function $\sqrt{ab}$ on $\mathbb{R}_{++}\times\mathbb{R}_{++}$ is not strictly jointly concave (its restriction to the diagonal is linear). Our next theorem says that in each of the variables separately, the geometric mean is strictly concave.
Theorem 3.
For each $A\in\mathbb{P}$ the function
$$f(X)=\mathrm{tr}\,(A\#X)$$
is strictly concave on $\mathbb{P}.$ This implies that the function $X\mapsto A\#X$ is also strictly concave.
Proof.
Suppose
[TABLE]
We have to show that this implies Rewrite the above equality as
[TABLE]
By the concavity of the expression inside the braces is positive semidefinite. The trace of such a matrix is zero if and only if the matrix itself is zero. Hence
[TABLE]
Using the definition (6) this can be written as
[TABLE]
Cancel the factors occurring on both sides, then square both sides, and rearrange terms to get
[TABLE]
This is the same as saying
[TABLE]
The square of a Hermitian matrix is zero only if Hence, we have
[TABLE]
From this it follows that
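Finally, if $X,Y$ are two elements of $\mathbb{P}$ such that $A\#((1-t)X+tY)=(1-t)(A\#X)+t(A\#Y)$, taking traces on both sides, we have $\mathrm{tr}\,A\#((1-t)X+tY)=(1-t)\,\mathrm{tr}\,(A\#X)+t\,\mathrm{tr}\,(A\#Y).$ We have seen that this implies $X=Y$.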
As a consequence, we observe that
[TABLE]
is jointly convex in and and is strictly convex in each of the variables separately.
Now we turn to the analysis of on the same lines as above. The arguments we present in this case are quite different. From (22) we know that
[TABLE]
We also know that
[TABLE]
and
[TABLE]
Together, these three relations lead to the conclusion that
[TABLE]
Thus satisfies condition (10).
By a theorem of Bhagwat and Subramanian [13]
[TABLE]
One of the several remarkable concavity theorems of Carlen and Lieb, [20, 21] says that the expression is jointly concave in when and jointly convex when Using equation (32) we obtain from this the joint concavity of As a consequence is jointly convex in Hence we have proved the following theorem.
Theorem 4.
The function $\Phi_4$ is a divergence on $\mathbb{P}.$
We have shown that $\Phi_3$ and $\Phi_4$ are divergences. But unlike $d_1^2$ and $d_2^2$ they are not the squares of metrics on $\mathbb{P};$ i.e., $d_3$ and $d_4$ are not metrics. The following two examples show that $d_3$ and $d_4$ do not satisfy the triangle inequality.
Let
[TABLE]
Then and This example is a small modification of one suggested to us by Suvrit Sra, to whom we are thankful. Let
[TABLE]
Then and
Next we study some more properties of $\Phi_4$, like its strict convexity in each of the arguments, and its connections with matrix entropy. To put these in context we recall some facts about Bregman divergence.
Let $\varphi:\mathbb{R}_{++}\to\mathbb{R}$ be a smooth strictly convex function and let
$$\Phi(x,y)=\varphi(x)-\varphi(y)-\varphi'(y)(x-y) \tag{33}$$
be the associated Bregman divergence. Then $\Phi$ is strictly convex in the variable $x$ but need not be convex in $y.$ (See, e.g., [23] Section 2.2.)
Given $a_1,\ldots,a_m$ in $\mathbb{R}_{++}$ the minimiser
$$\operatorname*{argmin}_{x>0}\ \sum_{j=1}^m w_j\,\Phi(a_j,x) \tag{34}$$
always turns out to be the arithmetic mean
$$\sum_{j=1}^m w_ja_j,$$
independent of the mother function $\varphi.$
In fact, this property characterises Bregman divergences; see [23, 8]. We can also consider the problem
$$\operatorname*{argmin}_{x>0}\ \sum_{j=1}^m w_j\,\Phi(x,a_j). \tag{35}$$
In this case, a calculation shows that the solution is the quasi-arithmetic mean (the Kolmogorov mean) associated with the function $\varphi'.$ More precisely, the solution of (35), which we may think of as the mean, or the barycentre, of the points $a_1,\ldots,a_m$ with respect to the divergence $\Phi,$ is
$$x=(\varphi')^{-1}\Big(\sum_{j=1}^m w_j\,\varphi'(a_j)\Big). \tag{36}$$
We wish to study the matrix version of the problems (34) and (35). Here we run into a basic difference between the one-variable and the several-variables cases. It is natural to replace the derivative $\varphi'$ in (36) by the gradient $\nabla\varphi$ in the several-variables case. If $\varphi$ is a differentiable strictly convex function defined on an open interval $I$ of $\mathbb{R}$, then its derivative $\varphi'$ is a strictly monotone continuous function, and hence a homeomorphism from $I$ to its image $\varphi'(I)$. In particular, $(\varphi')^{-1}$ is defined. The appropriate generalisation of these facts to the several-variable case requires the notion of a Legendre type function.
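Definition (Section 26 in [33] or Def. 2.8 in [10]).
Suppose $\varphi$ is a convex lower-semicontinuous function from $\mathbb{R}^n$ to $\mathbb{R}\cup\{+\infty\}$, and let $\operatorname{dom}\varphi:=\{x\mid\varphi(x)<+\infty\}$. We say that $\varphi$ is of Legendre type if it satisfies
- (i) $\operatorname{int}\operatorname{dom}\varphi\neq\emptyset$,
- (ii) $\varphi$ is differentiable on $\operatorname{int}\operatorname{dom}\varphi$,
- (iii) $\varphi$ is strictly convex on $\operatorname{int}\operatorname{dom}\varphi$,
- (iv) $\lim_{t\to 0^+}\langle\nabla\varphi(x+t(y-x)),\,y-x\rangle=-\infty$, for all $x\in\operatorname{bd}\operatorname{dom}\varphi$ and $y\in\operatorname{int}\operatorname{dom}\varphi$.
If $\varphi$ is of Legendre type, the gradient mapping $\nabla\varphi$ is a homeomorphism from $\operatorname{int}\operatorname{dom}\varphi$ to $\operatorname{int}\operatorname{dom}\varphi^*$, where $\varphi^*$ denotes the Legendre-Fenchel conjugate of $\varphi$. See Theorem 26.5 in [33].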
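Lemma 5.
If $\varphi$ is of Legendre type, $\Phi$ is the Bregman divergence associated with $\varphi$, and $a_1,\ldots,a_m\in\operatorname{int}\operatorname{dom}\varphi$, then the function
$$x\mapsto\sum_{j=1}^m w_j\,\Phi(x,a_j)$$
achieves its minimum at a unique point, which belongs to $\operatorname{int}\operatorname{dom}\varphi$.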
The proof is given in Appendix A. We shall apply this lemma in the situation where is a convex function defined only on and taking finite values on this set. The map trivially extends to a convex lower-semicontinuous function defined on the whole space of Hermitian matrices—set for , and if . We shall say that the original function defined on is of Legendre type if its extension is of Legendre type.
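Theorem 6.
Let $\varphi$ be a differentiable strictly convex function from $\mathbb{P}$ to $\mathbb{R}$ and let $\Phi$ be the Bregman divergence corresponding to $\varphi.$ Then:
- (i) The minimiser in the problem
$$\operatorname*{argmin}_{X\in\mathbb{P}}\ \sum_{j=1}^m w_j\,\Phi(A_j,X) \tag{37}$$
is the arithmetic mean $\sum_{j=1}^m w_jA_j.$
- (ii) If, in addition, $\varphi$ is of Legendre type, then the problem
$$\operatorname*{argmin}_{X\in\mathbb{P}}\ \sum_{j=1}^m w_j\,\Phi(X,A_j) \tag{38}$$
has a unique solution, and this is given by
$$X=(\nabla\varphi)^{-1}\Big(\sum_{j=1}^m w_j\,\nabla\varphi(A_j)\Big). \tag{39}$$
- (iii) If $f$ is any differentiable strictly convex function from $\mathbb{R}_{++}$ to $\mathbb{R}$ and $\Phi$ is the Bregman divergence on $\mathbb{P}$ corresponding to the function $\varphi(X)=\mathrm{tr}\,f(X)$ on $\mathbb{P}$, then the solution of the minimisation problem (38) is
$$X=(f')^{-1}\Big(\sum_{j=1}^m w_j\,f'(A_j)\Big).$$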
Proof.
[TABLE]
where denotes the arithmetic mean Hence
[TABLE]
Since is strictly convex, for every
[TABLE]
This implies that
[TABLE]
which shows that is the unique minimiser of the problem (37). (ii). Let be the map from to defined as
[TABLE]
Then
[TABLE]
Lemma 5 shows that the minimum of the map on the set is achieved at some point , and by the first order optimality condition, , showing that satisfies (39).
(iii). If $f$ is a differentiable convex function on $\mathbb{R}_{++}$ and $\Phi$ is the Bregman divergence corresponding to $\varphi(X)=\mathrm{tr}\,f(X),$ then $\nabla\varphi(X)=f'(X).$ Hence, to show that the minimisation problem (38) has a solution, it suffices to show that the first order optimality condition
$$f'(X)=\sum_{j=1}^m w_j\,f'(A_j) \tag{41}$$
is satisfied for some $X$ in $\mathbb{P}$. Since $f$ is strictly convex, as noted above, $f'$ is strictly increasing and is a homeomorphism from $\mathbb{R}_{++}$ to the interval $f'(\mathbb{R}_{++})$. The spectrum of each matrix $f'(A_j)$ belongs to $f'(\mathbb{R}_{++})$, and so the spectrum of $\sum w_jf'(A_j)$ also belongs to $f'(\mathbb{R}_{++})$, which implies that (41) is solvable.
The assumption that $\varphi$ is of Legendre type is not needed in the tracial case (statement (iii)). Proposition 11 in Appendix B shows that this assumption cannot be dispensed with in the case of statement (ii).
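The much studied convex function
$$\varphi(x)=x\log x-x \tag{42}$$
on $\mathbb{R}_{++}$ leads to the Bregman divergence
$$\Phi(x,y)=x(\log x-\log y)-x+y.$$
This is called the Kullback-Leibler divergence. Since $\varphi'(x)=\log x,$ the solution of the minimisation problem (35) in this case is
$$x=\prod_{j=1}^m a_j^{w_j},$$
the geometric mean of $a_1,\ldots,a_m.$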
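The two scalar barycentre problems can be illustrated with the Kullback-Leibler divergence. In the sketch below, which uses only the Python standard library, minimising over the second argument of $\Phi$ gives the arithmetic mean, while minimising over the first argument gives the quasi-arithmetic mean $(\varphi')^{-1}(\sum w_j\varphi'(a_j))$, here the weighted geometric mean. The data, the weights (assumed to sum to one) and the function names are illustrative.

```python
import math

def make_bregman(phi, dphi):
    """Scalar Bregman divergence Phi(x, y) = phi(x) - phi(y) - phi'(y) (x - y)."""
    def divergence(x, y):
        return phi(x) - phi(y) - dphi(y) * (x - y)
    return divergence

# Mother function phi(x) = x log x - x, so phi'(x) = log x and (phi')^{-1} = exp.
phi = lambda x: x * math.log(x) - x
kl = make_bregman(phi, math.log)

points = [1.0, 4.0, 16.0]     # illustrative positive numbers
weights = [0.5, 0.25, 0.25]   # positive weights summing to one

def objective_second_slot(y):
    """sum_j w_j Phi(a_j, y): the variable sits in the second argument."""
    return sum(w * kl(a, y) for w, a in zip(weights, points))

def objective_first_slot(x):
    """sum_j w_j Phi(x, a_j): the variable sits in the first argument."""
    return sum(w * kl(x, a) for w, a in zip(weights, points))

arithmetic = sum(w * a for w, a in zip(weights, points))
geometric = math.exp(sum(w * math.log(a) for w, a in zip(weights, points)))
print(arithmetic, geometric)
```

Evaluating the two objectives on a grid around these values confirms that each mean minimises its own problem and that the two minimisers differ.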
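As a matrix analogue of (42) one considers the function on $\mathbb{P}$ defined as
$$\varphi(A)=\mathrm{tr}\,(A\log A-A).$$
The associated Bregman divergence then is
$$\Phi(A,B)=\mathrm{tr}\,\big(A(\log A-\log B)-A+B\big). \tag{45}$$
(See [4], p.12). The quantity
$$S(A|B)=\mathrm{tr}\,A(\log A-\log B)$$
is called the relative entropy and has been of great interest in quantum information. Given $A_1,\ldots,A_m$ in $\mathbb{P},$ their barycentre with respect to the divergence $\Phi,$ i.e., the solution of the minimisation problem (38), is the log Euclidean mean
$$\mathcal{L}(w;A_1,\ldots,A_m)=\exp\Big(\sum_{j=1}^m w_j\log A_j\Big). \tag{47}$$
It is also of interest to compute the variance of the points $A_1,\ldots,A_m$ with respect to $\Phi;$ i.e., the minimum value of the objective function in (38). This is the quantity
$$\sigma^2=\min_{X>0}\ \sum_{j=1}^m w_j\,\Phi(X,A_j).$$
For the divergence in (45), the minimiser $X$ is the log Euclidean mean given in (47). So, using $\log X=\sum w_j\log A_j,$
$$\sigma^2=\mathrm{tr}\,X\log X-\mathrm{tr}\,X\sum_{j=1}^m w_j\log A_j-\mathrm{tr}\,X+\sum_{j=1}^m w_j\,\mathrm{tr}\,A_j.$$
In other words
$$\sigma^2=\mathrm{tr}\,\sum_{j=1}^m w_jA_j-\mathrm{tr}\,\exp\Big(\sum_{j=1}^m w_j\log A_j\Big), \tag{49}$$
the difference between the traces of the arithmetic and the log Euclidean means of $A_1,\ldots,A_m.$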
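The variance formula can be checked numerically. The sketch below assumes NumPy, and assumes the matrix Bregman divergence has the form $\Phi(X,Y)=\mathrm{tr}\,(X(\log X-\log Y)-X+Y)$, which is what the mother function $\mathrm{tr}\,(A\log A-A)$ produces. The matrices, the weights (taken to sum to one) and the helper names are illustrative.

```python
import numpy as np

def funm(M, f):
    """Apply a scalar function f to a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.conj().T

def bregman_entropy(X, Y):
    """Phi(X, Y) = tr( X (log X - log Y) - X + Y ) for positive definite X, Y."""
    log_X, log_Y = funm(X, np.log), funm(Y, np.log)
    return np.trace(X @ (log_X - log_Y) - X + Y).real

def log_euclidean_mean(mats, weights):
    """exp( sum_j w_j log A_j )."""
    return funm(sum(w * funm(A, np.log) for w, A in zip(weights, mats)), np.exp)

def objective(X, mats, weights):
    """sum_j w_j Phi(X, A_j): the variable sits in the first argument."""
    return sum(w * bregman_entropy(X, A) for w, A in zip(weights, mats))

# Illustrative data: three positive definite matrices and weights summing to one.
mats = [np.array([[2.0, 1.0], [1.0, 2.0]]),
        np.array([[1.0, 0.0], [0.0, 3.0]]),
        np.array([[3.0, -1.0], [-1.0, 1.0]])]
weights = [0.2, 0.3, 0.5]

L = log_euclidean_mean(mats, weights)
variance = objective(L, mats, weights)
trace_gap = sum(w * np.trace(A) for w, A in zip(weights, mats)) - np.trace(L)
print(variance, trace_gap)
```

The printed values coincide, and perturbing `L` (for example to `L + 0.1 * np.eye(2)`) only increases the objective, in line with the log Euclidean mean being the barycentre.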
In particular, the divergence can be characterised using (49), as the minimum value
[TABLE]
where is defined by (45). Using this characterisation we can show that the function is strictly convex in each of the variables separately. To this end, we recall the following lemma of convex analysis, showing that the “marginal” of a jointly convex function is convex; compare with Proposition 2.22 of [34] where a similar result (without the strictness conclusion) is provided. inline]SG: added last sentence with a ref to [34] as this is known in convex analysis
Lemma 7.
Let be a jointly convex function which is strictly convex in each of its variables separately. Suppose for each
[TABLE]
exists. Then the function is jointly convex, and is strictly convex in each of the variables separately.
Proof.
Given choose and such that
[TABLE]
and
[TABLE]
Then
[TABLE]
This shows that is jointly convex. Now we show that it is strictly convex in the first variable. Let be any three points with Choose and such that
[TABLE]
and
[TABLE]
Two cases arise. If then
[TABLE]
because of strict convexity of in the second variable. This implies that
[TABLE]
If then by strict convexity of in the first variable,
[TABLE]
and by joint convexity of
[TABLE]
Adding the last two inequalities we get
[TABLE]
Thus is strictly convex in the first variable, and by symmetry it is so in the second variable.
Theorem 8.
For each $A\in\mathbb{P}$ the function $X\mapsto\Phi_4(A,X)$ is strictly convex on $\mathbb{P}.$
Proof.
One of the fundamental, and best known, properties of the relative entropy $S(A|B)$ is that it is a jointly convex function of $A$ and $B.$ (See, e.g., Section IX.6 in [14].) It is also known that if $f$ is a strictly convex function on $\mathbb{R}_{++}$ then the function $X\mapsto\mathrm{tr}\,f(X)$ is strictly convex on $\mathbb{P}.$ (See, e.g., Theorem 4 in [19].) It follows from this that the divergence $\Phi$ in (45) is strictly convex in each of the variables separately. Combining these properties with Lemma 7 and the characterisation of $\Phi_4$ as the minimum value in (50) we obtain Theorem 8.
It might be pertinent to add here that the question of equality in the joint convexity inequality
[TABLE]
has been addressed in [25] and [27]. In [27] Jencova and Ruskai show that the equality holds in (52) if and only if
[TABLE]
On the other hand, Hiai et al [25] show that equality holds in (52) if and only if
[TABLE]
We are thankful to F. Hiai for making us aware of these results.
3. Barycentres
If $f$ is a convex function on an open convex set, then a critical point of $f$ is the global minimum of $f.$ If $f$ is strictly convex, then $f$ can have at most one such critical point. In this section we show that for $d=d_3$ and $d=d_4$ the objective function in (13) has a critical point, and hence in both cases the problem (13) has a unique solution.
Theorem 9.
When $d=d_3$ the minimum in (13) is attained at a unique point $X$ which is the solution of the matrix equation (17)
[TABLE]
This minimiser is the -power mean given by (14) if commutes with every In particular, the minimiser is if
- (i)
all ’s commute, or
- (ii)
**
Proof.
For a fixed positive definite matrix define the map as
[TABLE]
By Proposition 1, we have
[TABLE]
The objective function in (13) is
[TABLE]
Using the definition of we have
[TABLE]
Then using the above expression for we see that
[TABLE]
At the last step above we use the cyclicity of the trace function. Hence the critical point of is the matrix if and only if satisfies the matrix equation
[TABLE]
Taking congruence with on both sides we see that (53) is equivalent to (17). We now show that there exists a positive definite matrix that satisfies (17). Let such that for all and let be the compact set Define the map as
[TABLE]
Since Thus we have We know that This gives By the Brouwer fixed point theorem, we get that has a fixed point in This fixed point is the solution of (17). Suppose commutes with every We show that satisfies (17). Differentiating (24) we get
[TABLE]
Using in (53) and using (54) we get
[TABLE]
This proves the second statement of the theorem. If (i) holds, it follows from (14) that commutes with ’s. The same is trivially true if (ii) holds.
Theorem 10.
When $d=d_4$ the minimum in (13) is attained at a unique point $X$ which satisfies the matrix equation (19)
[TABLE]
Proof.
Start with the integral representation
[TABLE]
This shows that for all and all Hermitian we have
[TABLE]
For a fixed let
[TABLE]
Then
[TABLE]
The log Euclidean mean So, by the chain rule and Dyson’s formula (see [14] p. 311), we have
[TABLE]
This shows that
[TABLE]
using the cyclicity of trace. Using (55) and the cyclicity once again, we obtain
[TABLE]
Hence, for the function
[TABLE]
we have
[TABLE]
The objective function in (13) is
[TABLE]
So, we have
[TABLE]
where
[TABLE]
This shows that if and only if
[TABLE]
Choose an orthonormal basis in which and let in this basis. Then the condition (57) says that
[TABLE]
This shows that is diagonal, and
[TABLE]
Thus as claimed. We should also show that the equation (19) has a unique solution. Let be positive numbers such that for all Let be the compact convex set The function is operator monotone. So for all in we have Hence is in for all This shows that the function maps into itself. By Brouwer’s fixed point theorem has a unique fixed point in This is a solution of (19) and therefore must be unique.
Finally, we remark that in the case of $d_1$ the barycentre is given explicitly by the formula (14). For $d_2,d_3,d_4$ it has been given implicitly as solution of the equations (15),(17),(19), respectively. When $m=2$ and $w_1=w_2=\frac{1}{2}$, the solution of (15) is the Wasserstein mean of $A$ and $B$ defined as
$$\frac{1}{4}\left[A+B+(AB)^{1/2}+(BA)^{1/2}\right].$$
See [18].
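The closed form for two matrices can be tested against the fixed point equation. The sketch below assumes NumPy, takes the two-point Wasserstein mean to be $\frac{1}{4}[A+B+(AB)^{1/2}+(BA)^{1/2}]$ with $(AB)^{1/2}=A^{1/2}(A^{1/2}BA^{1/2})^{1/2}A^{-1/2}$, and checks that it satisfies $X=\frac{1}{2}(X^{1/2}AX^{1/2})^{1/2}+\frac{1}{2}(X^{1/2}BX^{1/2})^{1/2}$. The function names and test matrices are illustrative.

```python
import numpy as np

def funm(M, f):
    """Apply a scalar function f to a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.conj().T

def psd_sqrt(M):
    """Positive semidefinite square root, clamping tiny negative eigenvalues."""
    return funm(M, lambda w: np.sqrt(np.clip(w, 0.0, None)))

def wasserstein_mean_two(A, B):
    """(1/4) [A + B + (AB)^{1/2} + (BA)^{1/2}] for positive definite A, B."""
    rA = psd_sqrt(A)
    rA_inv = funm(A, lambda w: 1.0 / np.sqrt(w))
    # (AB)^{1/2} = A^{1/2} (A^{1/2} B A^{1/2})^{1/2} A^{-1/2}; (BA)^{1/2} is its adjoint.
    root_AB = rA @ psd_sqrt(rA @ B @ rA) @ rA_inv
    return 0.25 * (A + B + root_AB + root_AB.conj().T)

def fixed_point_residual(X, A, B):
    """Frobenius norm of X - (1/2)(X^{1/2} A X^{1/2})^{1/2} - (1/2)(X^{1/2} B X^{1/2})^{1/2}."""
    rX = psd_sqrt(X)
    rhs = 0.5 * psd_sqrt(rX @ A @ rX) + 0.5 * psd_sqrt(rX @ B @ rX)
    return np.linalg.norm(X - rhs, "fro")

# Illustrative positive definite matrices (assumed data).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, 3.0]])
X = wasserstein_mean_two(A, B)
print(X, fixed_point_residual(X, A, B))
```

The residual is at the level of rounding error. In the commuting case the formula reduces to $\big(\frac{A^{1/2}+B^{1/2}}{2}\big)^2$, which agrees with the $\frac{1}{2}$-power mean.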
Acknowledgements: The authors thank F. Hiai and S. Sra for helpful comments and references, and the anonymous referee for a careful reading of the manuscript. The first author is grateful to INRIA and École polytechnique, Palaiseau for visits that facilitated this work, and to CSIR(India) for the award of a Bhatnagar Fellowship.
Appendix A. Proof of Lemma 5
We make a variation of the proof of Theorem 3.12 in [10], dealing with a related problem (the minimisation of over a closed convex set).
Since is of Legendre type, Theorem 3.7(iii) of [10] shows that for all , the map is coercive, meaning that . A sum of coercive functions is coercive, and so the map
[TABLE]
is coercive. The infimum of a coercive lower-semicontinuous function on a closed non-empty set is attained, so there is an element such that . Suppose that belongs to the boundary of . Let us fix an arbitrary , and let , defined for . We have
[TABLE]
Using property (iv) of the definition of Legendre type functions, we get that , which entails that for small enough. Since for all , this contradicts the optimality of . So , which proves Lemma 5.
Appendix B. Examples
In the last statement of Theorem 6, dealing with tracial convex functions, we required to be differentiable and strictly convex on . In the second statement, dealing with the non-tracial case, we made a stronger assumption, requiring to be of Legendre type. We now give an example showing that the Legendre condition cannot be dispensed with. To this end, it is convenient to construct first an example showing the tightness of Lemma 5.
Need for the Legendre condition in Lemma 5
Let us fix , let ,
[TABLE]
and consider the affine transformation . Let , , and
[TABLE]
[TABLE]
Observe that since .
Consider now, for , the map defined on and . Observe that is strictly convex and differentiable. Let denote the Bregman divergence associated with , and let . We claim that $0$ is the unique point of minimum of over . Indeed,
[TABLE]
from which we get
[TABLE]
It follows that if is chosen close enough to , so that . Then, since is convex, we have
[TABLE]
showing the claim.
Consider now the modification of , so that for , and otherwise. The function is strictly convex, lower-semicontinuous, and differentiable on the interior of its domain, but not of Legendre type, and the conclusion of Lemma 5 does not apply to it.
The geometric intuition leading to this example is described in the figure.
Need for the Legendre condition in Theorem 6
We next construct an example showing that the Legendre condition in the second statement of Theorem 6 cannot be dispensed with. Observe that the inverse of the linear operator in (60) is given by
[TABLE]
In particular, it is a nonnegative matrix.
We set , and consider the “quantum” analogue of , i.e.,
[TABLE]
Then,
[TABLE]
is a completely positive map leaving invariant. The analogue of the map is
[TABLE]
where denotes the identity matrix.
We now consider the map defined on the space of Hermitian matrices. The function is differentiable and strictly convex, still assuming that . We set , , and now define to be the Bregman divergence associated with . Let
[TABLE]
We then have the following result.
Proposition 11.
The minimum of the function on the closure of is achieved at the point $0$. Moreover, the equation
[TABLE]
has no solution in .
Proof.
From [3] (Theorem 2.1) or [1] (Theorem 2.3), we have
[TABLE]
where is the polar decomposition of . In particular, if is diagonal and positive semidefinite,
[TABLE]
Then, by a computation similar to the one in the scalar case above, we get
[TABLE]
We conclude, as in (61), that
[TABLE]
where now is the Frobenius scalar product on the space of Hermitian matrices. It follows that $0$ is the unique point of minimum of on .
Moreover, if the equation (62) had a solution , the first order optimality condition for the minimisation of the function over would be satisfied, showing that for all , and by density, , contradicting the fact that $0$ is the unique point of minimum of over .
Note added to the second version: In the earlier version of this paper posted on January 5, 2019 that appeared in Letters in Mathematical Physics, 109 (2019) 1777-1804, we made an unfortunate error. Theorem 9 in that version wrongly claimed that for the case $d=d_3$ the solution of the minimisation problem (13) is also the solution of the matrix equation (18). The mistake in the statement and in the proof has been pointed out in J. Pitrik and D. Virosztek, Quantum Hellinger distances revisited, arXiv: 1903.10455v3. In that paper some more general divergence functions are considered, the barycentre equations are derived, and an example is given to show that the solutions of the matrix equations (17) and (18) need not be the same.
References
[1] T.J. Abatzoglou, Norm derivatives on spaces of operators, Math. Ann. 239 (1979), 129-135.
[2] M. Agueh and G. Carlier, Barycenters in the Wasserstein space, SIAM J. Math. Anal. 43 (2011), 904-924.
[3] J.G. Aiken, J.A. Erdos and J.A. Goldstein, Unitary approximation of positive operators, Illinois J. Math. 24 (1980), 61-72.
[4] S. Amari, Information Geometry and its Applications, Springer (Tokyo), 2016.
[5] T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl. 26 (1979), 203-241.
[6] T. Ando, C.-K. Li and R. Mathias, Geometric means, Linear Algebra Appl. 385 (2004), 305-334.
[7] V. Arsigny, P. Fillard, X. Pennec and N. Ayache, Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM J. Matrix Anal. Appl. 29 (2007), 328-347.
[8] A. Banerjee, S. Merugu, I. S. Dhillon and J. Ghosh, Clustering with Bregman divergences, J. Mach. Learn. Res. 6 (2005), 1705-1749.
