Quantum Hellinger distances revisited
József Pitrik, Dániel Virosztek

TL;DR
This paper introduces generalized quantum Hellinger divergences involving Kubo-Ando means, explores their properties, and characterizes barycenters, clarifying previous claims about their form in non-commuting cases.
Contribution
It extends quantum Hellinger distances by defining a family of divergences with Kubo-Ando means and characterizes their barycenters, correcting prior assumptions for non-commuting operators.
Findings
Generalized divergences are jointly convex and satisfy the data processing inequality.
For the divergence based on the geometric mean, barycenters coincide with weighted multivariate 1/2-power means in the commuting case.
The previously claimed barycenter form does not hold for non-commuting operators.
József Pitrik
MTA-BME Lendület (Momentum) Quantum Information Theory Research Group, and Department of Analysis, Institute of Mathematics
Budapest University of Technology and Economics
H-1521 Budapest, Hungary
http://www.math.bme.hu/~pitrik
Dániel Virosztek
Institute of Science and Technology Austria
Am Campus 1, 3400 Klosterneuburg, Austria
http://pub.ist.ac.at/~dviroszt
Abstract.
This short note aims to study quantum Hellinger distances investigated recently by Bhatia et al. [8] with a particular emphasis on barycenters. We introduce the family of generalized quantum Hellinger divergences that are of the form $\phi(A,B) = \mathrm{Tr}\left((1-c)A + cB - A \sigma B\right)$, where $\sigma$ is an arbitrary Kubo-Ando mean, and $c \in (0,1)$ is the weight of $\sigma$. We note that these divergences belong to the family of maximal quantum $f$-divergences, and hence are jointly convex and satisfy the data processing inequality (DPI). We derive a characterization of the barycenter of finitely many positive definite operators for these generalized quantum Hellinger divergences. We note that the characterization of the barycenter as the weighted multivariate $1/2$-power mean, which was claimed in [8], is true in the case of commuting operators, but it is not correct in the general case.
Key words and phrases:
quantum Hellinger distance, Kubo-Ando mean, weighted multivariate mean, barycenter, data processing inequality, convexity
2010 Mathematics Subject Classification:
Primary: 47A64. Secondary: 15A24, 81Q10.
J. Pitrik was supported by the Hungarian Academy of Sciences Lendület-Momentum grant for Quantum Information Theory, no. 96 141, and by the Hungarian National Research, Development and Innovation Office (NKFIH) via grants no. K119442, no. K124152, and no. KH129601. D. Virosztek was supported by the ISTFELLOW program of the Institute of Science and Technology Austria (project code IC1027FELL01), by the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 846294, and partially supported by the Hungarian National Research, Development and Innovation Office (NKFIH) via grants no. K124152, and no. KH129601.
1. Introduction
1.1. Motivation, goals
Given a measure space $(\Omega, \mathcal{F}, \mu)$ and probability measures $P$ and $Q$ that are absolutely continuous with respect to $\mu$, the classical squared Hellinger distance or Hellinger divergence of $P$ and $Q$ is defined as
$$H^2(P, Q) = \frac{1}{2} \int_\Omega \left( \sqrt{\frac{dP}{d\mu}} - \sqrt{\frac{dQ}{d\mu}} \right)^2 d\mu, \tag{1}$$
where $\frac{dP}{d\mu}$ and $\frac{dQ}{d\mu}$ denote the Radon–Nikodym derivatives [16]. The Hellinger divergence is a special Csiszár-Morimoto $f$-divergence [12, 24] generated by the convex function $f(x) = \frac{1}{2}\left(1 - \sqrt{x}\right)^2$, and it has several possible counterparts in quantum information theory. One of them is the squared Bures distance or Wasserstein metric; see, e.g., the recent works of Bhatia et al. [10], Dinh et al. [13], and Molnár [23]. Another important quantum analogue of the classical Hellinger divergence has been investigated in [8], namely the quantity
$$\phi(A, B) = \mathrm{Tr}\left( \frac{A + B}{2} - A \# B \right), \tag{2}$$
where $A$ and $B$ are density operators representing quantum states, or, more generally, positive operators, and $\#$ is the geometric mean introduced by Pusz and Woronowicz [28], which is a particularly important Kubo-Ando mean [3, 4, 6].
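For finitely supported distributions (counting measure as reference measure, an assumption made only for illustration), the integral in (1) becomes a finite sum, and the identification of the Hellinger divergence as an $f$-divergence can be checked directly. The sketch assumes the normalization $H^2 = \frac{1}{2}\sum_i (\sqrt{p_i} - \sqrt{q_i})^2$; the helper names are hypothetical.

```python
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance with the 1/2 normalization (counting reference measure)."""
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

def f_divergence(p, q, f):
    """Csiszar-Morimoto f-divergence sum_i q_i f(p_i / q_i); assumes every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def f_hellinger(x):
    """Convex generator of the Hellinger divergence, (1 - sqrt(x))^2 / 2."""
    return 0.5 * (1.0 - math.sqrt(x)) ** 2

p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]
print(hellinger_sq(p, q), f_divergence(p, q, f_hellinger))  # the two values agree
```

The agreement is just the identity $q_i \cdot \frac{1}{2}\left(1 - \sqrt{p_i/q_i}\right)^2 = \frac{1}{2}\left(\sqrt{q_i} - \sqrt{p_i}\right)^2$ applied termwise.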
In this note, we introduce a far-reaching generalization of the quantum Hellinger divergence (2), namely, the family of generalized quantum Hellinger divergences of the form
$$\phi(A, B) = \mathrm{Tr}\left( (1-c)A + cB - A \sigma B \right), \tag{3}$$
where $\sigma$ is an arbitrary Kubo-Ando mean, and $c \in (0,1)$ is the weight of $\sigma$. We will note that these divergences belong to the family of maximal quantum $f$-divergences, and hence are jointly convex and satisfy the data processing inequality (DPI). Moreover, we will show an intimate relation between generalized quantum Hellinger divergences and operator valued Bregman divergences (Claim 2). Using this close relation, we verify in Claim 3 that generalized quantum Hellinger divergences are genuine divergences in the sense of [1, Sec. 1.2 & 1.3]. Note that this is not the case for maximal quantum $f$-divergences in general; see Remark 1. As the main result of this paper, we derive a characterization of the barycenter of finitely many positive definite operators for these generalized quantum Hellinger divergences. We will also note that the characterization of the barycenter as the weighted multivariate power mean of order $1/2$, which was claimed in the work of Bhatia et al. [8, Thm. 9], is true in the case of commuting operators, but it is not correct in the general case.
1.2. Basic notions, notation
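Operator monotone functions mapping the positive half-line into itself admit a transparent integral representation by Löwner’s theory. In the seminal paper of Kubo and Ando [4], the following integral representation was considered:
$$f(x) = \int_{[0,\infty]} \frac{x(1+t)}{x+t} \, dm(t) \qquad (x > 0), \tag{4}$$
where $m$ is some positive Radon measure on the extended half-line $[0, \infty]$ (for $t = \infty$ the integrand is understood as its limit $x$). By a simple push-forward of $m$ by the transformation $\varphi: [0,\infty] \to [0,1]$, $t \mapsto \frac{t}{1+t}$ (with $\varphi(\infty) = 1$), we get the following integral representation of positive operator monotone functions on $(0, \infty)$:
$$f(x) = \int_{[0,1]} \left( (1-\lambda) + \lambda x^{-1} \right)^{-1} d\nu(\lambda) \qquad (x > 0), \tag{5}$$
where $\nu = m \circ \varphi^{-1}$, that is, $\nu(E) = m\left( \varphi^{-1}(E) \right)$ for every Borel set $E \subseteq [0,1]$. This representation is also well known and appears, among others, in [15] and [30]. Note that if $m$ is absolutely continuous with respect to the Lebesgue measure and $dm(t) = h(t)\, dt$, then the density of $\nu$ is given by $\frac{d\nu}{d\lambda}(\lambda) = \frac{1}{(1-\lambda)^2}\, h\!\left( \frac{\lambda}{1-\lambda} \right)$ for $\lambda \in (0,1)$.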
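Throughout this note, $\mathcal{H}$ stands for a finite dimensional complex Hilbert space, $\mathcal{B}(\mathcal{H})$ denotes the set of all linear operators on $\mathcal{H}$, and $\mathcal{B}(\mathcal{H})^{sa}$ and $\mathcal{B}(\mathcal{H})^{++}$ stand for the sets of all self-adjoint and positive definite operators, respectively. On $\mathcal{B}(\mathcal{H})^{sa}$ we consider the usual Löwner order induced by positivity. The Fréchet derivative of a map $F: U \to V$ at the point $X \in U$ is denoted by $DF(X)$. Here, $U$ is an open subset of $\mathcal{B}(\mathcal{H})^{sa}$, usually the cone $\mathcal{B}(\mathcal{H})^{++}$ of positive definite operators, and the target space $V$ is usually $\mathbb{R}$ or $\mathcal{B}(\mathcal{H})^{sa}$. Note that in the latter case $DF(X)$ is a linear map from $\mathcal{B}(\mathcal{H})^{sa}$ into itself. The symbol $I$ denotes the identity operator on $\mathcal{H}$.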
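For positive definite operators $A, B \in \mathcal{B}(\mathcal{H})^{++}$, the Kubo-Ando connection generated by the operator monotone function $f: (0, \infty) \to (0, \infty)$ is denoted by $\sigma_f$ and is defined by
$$A \sigma_f B = A^{1/2} f\left( A^{-1/2} B A^{-1/2} \right) A^{1/2}.$$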
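The defining formula of $\sigma_f$ can be evaluated directly once a scalar function can be applied to a positive definite matrix. The pure-Python sketch below restricts, purely for illustration, to real symmetric $2 \times 2$ matrices, where the spectral decomposition has a closed form; `sym_fn` and `ka_mean` are hypothetical helper names. It checks that for $f(x) = \sqrt{x}$ the result $G = A \# B$ satisfies the Riccati equation $G A^{-1} G = B$, which characterizes the geometric mean among positive definite matrices.

```python
import math

def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def sym_fn(m, f):
    """Apply a scalar function f to a real symmetric 2x2 matrix via its eigendecomposition."""
    a, d = m[0][0], m[1][1]
    b = 0.5 * (m[0][1] + m[1][0])                    # symmetrize away rounding noise
    if b == 0.0:                                      # already diagonal
        return [[f(a), 0.0], [0.0, f(d)]]
    mid, rad = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    l1, l2 = mid + rad, mid - rad                     # eigenvalues, l1 > l2
    vx, vy = (l1 - d, b) if a >= d else (b, l1 - a)   # eigenvector for l1, no cancellation
    n = math.hypot(vx, vy)
    vx, vy = vx / n, vy / n
    f1, f2 = f(l1), f(l2)
    # f(M) = f(l2) I + (f(l1) - f(l2)) v v^T
    return [[f2 + (f1 - f2) * vx * vx, (f1 - f2) * vx * vy],
            [(f1 - f2) * vx * vy, f2 + (f1 - f2) * vy * vy]]

def ka_mean(a, b, f):
    """Kubo-Ando connection A sigma_f B = A^{1/2} f(A^{-1/2} B A^{-1/2}) A^{1/2}."""
    a_half = sym_fn(a, math.sqrt)
    a_mhalf = sym_fn(a, lambda t: 1.0 / math.sqrt(t))
    inner = matmul(matmul(a_mhalf, b), a_mhalf)
    return matmul(matmul(a_half, sym_fn(inner, f)), a_half)

A = [[2.0, 1.0], [1.0, 2.0]]
B = [[3.0, 0.0], [0.0, 1.0]]
G = ka_mean(A, B, math.sqrt)                 # geometric mean A # B
A_inv = sym_fn(A, lambda t: 1.0 / t)
riccati = matmul(matmul(G, A_inv), G)        # should reproduce B up to rounding
print(riccati)
```

The same helper with $f(x) = \frac{1}{2} + \frac{x}{2}$ returns $\frac{A+B}{2}$, which is a quick sanity check of the implementation.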
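A Kubo-Ando connection $\sigma_f$ is a mean if and only if $f(1) = 1$. In the sequel, we will restrict our attention to means. We denote by $\mathcal{P}([0,1])$ the set of all Borel probability measures on $[0,1]$ and by $c(\nu) = \int_{[0,1]} \lambda \, d\nu(\lambda)$ the center of mass of $\nu \in \mathcal{P}([0,1])$. There is a natural way to assign a weight parameter to a mean $\sigma_f$, namely, $c = f'(1)$; differentiating (5) at $x = 1$ shows that this equals the center of mass $c(\nu)$ of the representing measure. More details about this weight parameter can be found in [30]; we only mention that for the weighted arithmetic, geometric, and harmonic means generated by
$$f(x) = (1-c) + cx, \qquad f(x) = x^c, \qquad f(x) = \left( (1-c) + c x^{-1} \right)^{-1} \qquad (c \in (0,1)),$$
respectively, we have $f'(1) = c$. That is, this weight parameter coincides with the usual one in the most important special cases.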
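The representation (5) and the weight parameter can be checked numerically. The sketch below assumes the arcsine law $d\nu(\lambda) = \frac{d\lambda}{\pi\sqrt{\lambda(1-\lambda)}}$ and uses the substitution $\lambda = \sin^2\theta$, under which $d\nu = \frac{2}{\pi}\, d\theta$ on $[0, \pi/2]$; this removes the endpoint singularities, so a plain midpoint rule suffices. It should reproduce $f(x) = \sqrt{x}$, a center of mass of $1/2$, and $f'(1) = 1/2$; it also checks $f'(1) = c$ for the three weighted means listed above. The quadrature and all function names are illustrative choices.

```python
import math

N = 2000  # number of midpoint nodes in theta

def arcsine_nodes(n=N):
    """Nodes lambda_k = sin^2(theta_k) with equal weights (2/pi) * h for the arcsine law."""
    h = (math.pi / 2) / n
    return [(math.sin((k + 0.5) * h) ** 2, (2 / math.pi) * h) for k in range(n)]

NODES = arcsine_nodes()

def f_nu(x, nodes=NODES):
    """Right-hand side of (5): integral of ((1 - lam) + lam / x)^(-1) d nu(lam)."""
    return sum(w / ((1 - lam) + lam / x) for lam, w in nodes)

def center_of_mass(nodes=NODES):
    """c(nu) = integral of lam d nu(lam)."""
    return sum(w * lam for lam, w in nodes)

def weight(f, eps=1e-6):
    """Weight parameter f'(1), approximated by a central difference."""
    return (f(1 + eps) - f(1 - eps)) / (2 * eps)

c = 0.3
means = {
    "arithmetic": lambda x: (1 - c) + c * x,
    "geometric": lambda x: x ** c,
    "harmonic": lambda x: 1 / ((1 - c) + c / x),
}
print(f_nu(4.0), center_of_mass(), weight(f_nu))  # approx. 2.0, 0.5, 0.5
print({name: round(weight(f), 8) for name, f in means.items()})  # all approx. 0.3
```

Because the integrand is a smooth, even, $\pi$-periodic function of $\theta$, the midpoint rule converges very fast here; a general measure $\nu$ would need its own quadrature.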
1.3. Convex order
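The convex order is a well-known relation between probability measures; for $\nu_1, \nu_2 \in \mathcal{P}([0,1])$ we say that $\nu_1 \preceq_{cx} \nu_2$ if $\int g \, d\nu_1 \leq \int g \, d\nu_2$ for all convex functions $g: [0,1] \to \mathbb{R}$. It is clear that for all $\nu \in \mathcal{P}([0,1])$ with $c(\nu) = c$ we have $\delta_c \preceq_{cx} \nu \preceq_{cx} (1-c)\delta_0 + c\delta_1$, where $\delta_t$ denotes the Dirac mass concentrated on $t$ (the first relation is Jensen's inequality, the second one follows from $g(\lambda) \leq (1-\lambda) g(0) + \lambda g(1)$). For any fixed $x > 0$ the map $\lambda \mapsto \left( (1-\lambda) + \lambda x^{-1} \right)^{-1}$ is convex. Therefore, if $\nu_1 \preceq_{cx} \nu_2$, then $f_{\nu_1}(x) \leq f_{\nu_2}(x)$ for all $x > 0$, where $f_\nu$ denotes the function represented by $\nu$ via (5), and hence $A \sigma_{f_{\nu_1}} B \leq A \sigma_{f_{\nu_2}} B$ for all $A, B \in \mathcal{B}(\mathcal{H})^{++}$. Consequently, since $(1-c)\delta_0 + c\delta_1$ generates the weighted arithmetic mean, if $c(\nu) = c$, then $(1-c)A + cB - A \sigma_{f_\nu} B$ is always positive; in particular, $\mathrm{Tr}\left( (1-c)A + cB - A \sigma_{f_\nu} B \right) \geq 0$. This quantity is exactly the one we are interested in.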
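The sandwich $\delta_c \preceq_{cx} \nu \preceq_{cx} (1-c)\delta_0 + c\delta_1$ translates into the scalar inequalities "weighted harmonic mean $\leq f_\nu \leq$ weighted arithmetic mean". A small sketch with a hypothetical two-point measure $\nu = \frac{1}{2}(\delta_{0.2} + \delta_{0.8})$, for which $c(\nu) = 1/2$, checks this on a grid; discrete measures are used only because (5) then reduces to a finite sum.

```python
def f_discrete(x, atoms):
    """f_nu(x) from (5) for a discrete measure nu = sum_i m_i * delta_{lam_i}."""
    return sum(m / ((1 - lam) + lam / x) for lam, m in atoms)

nu = [(0.2, 0.5), (0.8, 0.5)]           # hypothetical example measure
c = sum(lam * m for lam, m in nu)       # center of mass, here 0.5
dirac_c = [(c, 1.0)]                    # generates the weighted harmonic mean
two_point = [(0.0, 1 - c), (1.0, c)]    # generates the weighted arithmetic mean

def sandwiched(x, tol=1e-12):
    """Check harmonic(x) <= f_nu(x) <= arithmetic(x) up to rounding."""
    lo, mid, hi = (f_discrete(x, atoms) for atoms in (dirac_c, nu, two_point))
    return lo <= mid + tol and mid <= hi + tol

grid = [0.05 * k for k in range(1, 400)]
print(all(sandwiched(x) for x in grid))  # True
```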
2. Basic properties of quantum Hellinger distances
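We are interested in divergences of the form
$$\phi_\nu(A, B) = \mathrm{Tr}\left( (1-c(\nu))A + c(\nu)B - A \sigma_\nu B \right) \qquad (A, B \in \mathcal{B}(\mathcal{H})^{++}), \tag{7}$$
where $\nu \in \mathcal{P}([0,1])$, $f_\nu$ is the operator monotone function represented by $\nu$ via (5), and $\sigma_\nu = \sigma_{f_\nu}$ is the corresponding Kubo-Ando mean. To avoid trivialities, we assume in the sequel that the support of $\nu$ is not contained in $\{0, 1\}$, and therefore, $f_\nu$ is non-affine; in fact, it is strictly concave.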
If $\nu$ is the arcsine distribution, that is, $d\nu(\lambda) = \frac{1}{\pi\sqrt{\lambda(1-\lambda)}}\, d\lambda$, then
$$\phi_\nu(A, B) = \mathrm{Tr}\left( \frac{A + B}{2} - A \# B \right), \tag{8}$$
where $\#$ is the Pusz-Woronowicz geometric mean [28]. The square root of this quantity (up to an irrelevant multiplicative constant) was considered in [8] as a possible quantum (or matrix) version of the classical Hellinger distance. Therefore, we will call the quantities of the form (7) generalized quantum Hellinger divergences.
We easily get that
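$$\phi_\nu(A, B) = \mathrm{Tr}\, A^{1/2}\, g_\nu\left( A^{-1/2} B A^{-1/2} \right) A^{1/2},$$
where $g_\nu: (0, \infty) \to [0, \infty)$ is defined by
$$g_\nu(x) = 1 - c(\nu) + c(\nu)\, x - f_\nu(x).$$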
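For the arcsine law, $c(\nu) = 1/2$ and $g_\nu(x) = \frac{1}{2} + \frac{x}{2} - \sqrt{x} = \frac{1}{2}(1 - \sqrt{x})^2$. The sketch below (pure Python, real symmetric $2 \times 2$ matrices only, hypothetical helper names) evaluates the divergence both directly as $\mathrm{Tr}\left(\frac{A+B}{2} - A \# B\right)$ and through $g_\nu$, and checks that the two agree, that the value is positive for $A \neq B$, and that it vanishes up to rounding for $A = B$.

```python
import math

def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def sym_fn(m, f):
    """Apply a scalar function f to a real symmetric 2x2 matrix via its eigendecomposition."""
    a, d = m[0][0], m[1][1]
    b = 0.5 * (m[0][1] + m[1][0])
    if b == 0.0:
        return [[f(a), 0.0], [0.0, f(d)]]
    mid, rad = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    l1, l2 = mid + rad, mid - rad
    vx, vy = (l1 - d, b) if a >= d else (b, l1 - a)   # eigenvector for l1
    n = math.hypot(vx, vy)
    vx, vy = vx / n, vy / n
    f1, f2 = f(l1), f(l2)
    return [[f2 + (f1 - f2) * vx * vx, (f1 - f2) * vx * vy],
            [(f1 - f2) * vx * vy, f2 + (f1 - f2) * vy * vy]]

def connection(a, b, f):
    """A^{1/2} f(A^{-1/2} B A^{-1/2}) A^{1/2} (Kubo-Ando formula, any scalar f)."""
    a_half = sym_fn(a, math.sqrt)
    a_mhalf = sym_fn(a, lambda t: 1.0 / math.sqrt(t))
    return matmul(matmul(a_half, sym_fn(matmul(matmul(a_mhalf, b), a_mhalf), f)), a_half)

def trace(m):
    return m[0][0] + m[1][1]

def phi_direct(a, b):
    """Arcsine case: Tr((A + B)/2 - A # B)."""
    return 0.5 * (trace(a) + trace(b)) - trace(connection(a, b, math.sqrt))

def g_arcsine(x):
    """g_nu for the arcsine law: (1 - sqrt(x))^2 / 2."""
    return 0.5 * (1.0 - math.sqrt(x)) ** 2

def phi_via_g(a, b):
    """Tr A^{1/2} g_nu(A^{-1/2} B A^{-1/2}) A^{1/2}."""
    return trace(connection(a, b, g_arcsine))

A = [[2.0, 1.0], [1.0, 2.0]]
B = [[3.0, 0.0], [0.0, 1.0]]
print(phi_direct(A, B), phi_via_g(A, B), phi_direct(A, A))
```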
Remark 1.
We note that $g_\nu$ is operator convex as $f_\nu$ is operator concave, and hence generalized quantum Hellinger divergences belong to the family of maximal quantum $f$-divergences studied, for example, in [17, 19, 22, 26]. This latter divergence class consists of quantities of the form $\mathrm{Tr}\, A^{1/2} g\left( A^{-1/2} B A^{-1/2} \right) A^{1/2}$, where $A, B \in \mathcal{B}(\mathcal{H})^{++}$ and $g: (0, \infty) \to \mathbb{R}$ is operator convex [17, 26]. However, this level of generality may lead to counter-intuitive phenomena. For instance, a maximal quantum $f$-divergence can be negative (see, e.g., [17, Example 4.4]), and it may vanish for pairs of distinct arguments (see, e.g., [17, Example 4.2]). That is, maximal quantum $f$-divergences are not divergences in the sense of [1, Sec. 1.2 & 1.3] in general. In particular, they are not necessarily positive definite. (We call a divergence $D$ positive definite if $D(A, B) \geq 0$ for every $A, B$, and $D(A, B) = 0$ if and only if $A = B$.)
Now we check that generalized quantum Hellinger divergences are intimately related to operator valued Bregman divergences, and hence are reasonable measures of dissimilarity and genuine divergences in the sense of [1, Sec. 1.2 & 1.3].
2.1. The relation with Bregman divergences
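Note that $-f_\nu$ is an operator convex function, and that
$$f_\nu(1) = 1, \qquad f_\nu'(1) = c(\nu).$$
The operator valued Bregman divergence generated by the operator convex function $F = -f_\nu$ reads as follows:
$$\Delta_F(X, Y) = F(X) - F(Y) - DF(Y)[X - Y] \qquad (X, Y \in \mathcal{B}(\mathcal{H})^{++}).$$
In particular,
$$\Delta_{-f_\nu}(X, I) = -f_\nu(X) + f_\nu(I) + Df_\nu(I)[X - I].$$
As $Df_\nu(I)$ coincides with the multiplication by the constant $f_\nu'(1) = c(\nu)$, and $f_\nu(I) = I$, we get that
$$\Delta_{-f_\nu}(X, I) = (1 - c(\nu)) I + c(\nu) X - f_\nu(X) = g_\nu(X).$$
Therefore, we obtain the following claim.
Claim 2.
The generalized quantum Hellinger divergence defined in (7) can be expressed by an operator valued Bregman divergence as follows:
$$\phi_\nu(A, B) = \mathrm{Tr}\, A^{1/2}\, \Delta_{-f_\nu}\left( A^{-1/2} B A^{-1/2}, I \right) A^{1/2}.$$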
For a detailed study of Bregman divergences on matrices we refer to [27].
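The step "$Df_\nu(I)$ is multiplication by $f_\nu'(1)$" holds because $I$ commutes with every direction. A finite-difference sketch for $f = \sqrt{\cdot}$ (so $f'(1) = 1/2$), restricted to real symmetric $2 \times 2$ matrices with a hypothetical helper `sym_fn`, illustrates this, and then checks that the Bregman expression $-f(X) + f(I) + f'(1)(X - I)$ equals $g(X) = \frac{1}{2} I + \frac{1}{2} X - \sqrt{X}$ and is positive definite for the sample $X$.

```python
import math

def sym_fn(m, f):
    """Apply a scalar function f to a real symmetric 2x2 matrix via its eigendecomposition."""
    a, d = m[0][0], m[1][1]
    b = 0.5 * (m[0][1] + m[1][0])
    if b == 0.0:
        return [[f(a), 0.0], [0.0, f(d)]]
    mid, rad = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    l1, l2 = mid + rad, mid - rad
    vx, vy = (l1 - d, b) if a >= d else (b, l1 - a)   # eigenvector for l1
    n = math.hypot(vx, vy)
    vx, vy = vx / n, vy / n
    f1, f2 = f(l1), f(l2)
    return [[f2 + (f1 - f2) * vx * vx, (f1 - f2) * vx * vy],
            [(f1 - f2) * vx * vy, f2 + (f1 - f2) * vy * vy]]

def lin(*terms):
    """Linear combination sum_k s_k * M_k of 2x2 matrices, given as pairs (s_k, M_k)."""
    return [[sum(s * m[i][j] for s, m in terms) for j in range(2)] for i in range(2)]

I2 = [[1.0, 0.0], [0.0, 1.0]]
H = [[0.3, -0.7], [-0.7, 0.2]]      # arbitrary self-adjoint direction
eps = 1e-6

# Central difference for Df(I)[H] with f = sqrt; expected: f'(1) H = H / 2.
plus = sym_fn(lin((1.0, I2), (eps, H)), math.sqrt)
minus = sym_fn(lin((1.0, I2), (-eps, H)), math.sqrt)
deriv = lin((0.5 / eps, plus), (-0.5 / eps, minus))

# Bregman divergence of F = -sqrt at (X, I) versus g(X) = I/2 + X/2 - sqrt(X).
X = [[2.0, 0.5], [0.5, 0.7]]
bregman = lin((-1.0, sym_fn(X, math.sqrt)), (1.0, I2), (0.5, lin((1.0, X), (-1.0, I2))))
g_of_X = sym_fn(X, lambda t: 0.5 + 0.5 * t - math.sqrt(t))
print(deriv, bregman, g_of_X)
```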
Now we are in a position to check that generalized quantum Hellinger divergences are genuine divergences in the sense of Amari [1, Sec. 1.2 & 1.3].
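Claim 3.
For any $\nu \in \mathcal{P}([0,1])$, the map
$$\phi_\nu: \mathcal{B}(\mathcal{H})^{++} \times \mathcal{B}(\mathcal{H})^{++} \to [0, \infty), \qquad (A, B) \mapsto \phi_\nu(A, B)$$
satisfies the following.
- (i) $\phi_\nu(A, B) \geq 0$, and $\phi_\nu(A, B) = 0$ if and only if $A = B$.
- (ii) The first derivative of $\phi_\nu$ in the second variable vanishes at the diagonal, that is, $D\big(\phi_\nu(A, \cdot)\big)(A) = 0$ for all $A \in \mathcal{B}(\mathcal{H})^{++}$.
- (iii) The second derivative of $\phi_\nu$ in the second variable is positive at the diagonal, that is, $D^2\big(\phi_\nu(A, \cdot)\big)(A)[Y, Y] > 0$ for all $A \in \mathcal{B}(\mathcal{H})^{++}$ and $Y \in \mathcal{B}(\mathcal{H})^{sa} \setminus \{0\}$.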
Proof.
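Bregman divergences are clearly divergences (see, e.g., [8, Sec. 1]). That is, for $F = -f_\nu$ and $\Delta_F(X, Y) = F(X) - F(Y) - DF(Y)[X - Y]$ we have
- (i) $\Delta_F(X, Y) \geq 0$, and $\Delta_F(X, Y) = 0$ if and only if $X = Y$;
- (ii) $D\big(\Delta_F(\cdot, Y)\big)(Y) = 0$ for every $Y \in \mathcal{B}(\mathcal{H})^{++}$;
- (iii) $D^2\big(\Delta_F(\cdot, Y)\big)(Y)[Z, Z] = D^2F(Y)[Z, Z]$ is positive for all $Y \in \mathcal{B}(\mathcal{H})^{++}$ and $Z \in \mathcal{B}(\mathcal{H})^{sa} \setminus \{0\}$.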
Now Claim 3 follows from Claim 2. ∎
2.2. Joint convexity, data processing inequality
As generalized quantum Hellinger divergences belong to the family of maximal quantum -divergences, they are jointly convex and they satisfy the data processing inequality, which is particularly important from the quantum information theory viewpoint. For details, see [17, 19, 22, 26]. We recall these important properties for convenience.
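Property 4 (Joint convexity).
The generalized quantum Hellinger divergence $\phi_\nu$ defined in (7) is jointly convex on $\mathcal{B}(\mathcal{H})^{++} \times \mathcal{B}(\mathcal{H})^{++}$.
Property 5 (Data processing inequality).
Let $\mathcal{E}: \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{K})$ be a quantum channel ($\mathcal{K}$ a finite dimensional Hilbert space), that is, a completely positive and trace preserving (CPTP) map. Let $\nu \in \mathcal{P}([0,1])$ be arbitrary. Then
$$\phi_\nu\left( \mathcal{E}(A), \mathcal{E}(B) \right) \leq \phi_\nu(A, B)$$
holds for every $A, B \in \mathcal{B}(\mathcal{H})^{++}$.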
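Properties 4 and 5 can be spot-checked numerically. The sketch below assumes the arcsine case (8), real symmetric $2 \times 2$ matrices, and uses as example channel the pinching $\mathcal{E}(X) = \mathrm{diag}(X)$, which keeps the diagonal and is CPTP with Kraus operators $E_{11}, E_{22}$; the matrices and helper names are hypothetical. A check on a few inputs illustrates the inequalities but does not replace the proofs cited above.

```python
import math

def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def sym_fn(m, f):
    """Apply a scalar function f to a real symmetric 2x2 matrix via its eigendecomposition."""
    a, d = m[0][0], m[1][1]
    b = 0.5 * (m[0][1] + m[1][0])
    if b == 0.0:
        return [[f(a), 0.0], [0.0, f(d)]]
    mid, rad = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    l1, l2 = mid + rad, mid - rad
    vx, vy = (l1 - d, b) if a >= d else (b, l1 - a)   # eigenvector for l1
    n = math.hypot(vx, vy)
    vx, vy = vx / n, vy / n
    f1, f2 = f(l1), f(l2)
    return [[f2 + (f1 - f2) * vx * vx, (f1 - f2) * vx * vy],
            [(f1 - f2) * vx * vy, f2 + (f1 - f2) * vy * vy]]

def trace(m):
    return m[0][0] + m[1][1]

def phi(a, b):
    """Arcsine case: Tr((A + B)/2 - A # B) for real symmetric positive definite 2x2 A, B."""
    a_half = sym_fn(a, math.sqrt)
    a_mhalf = sym_fn(a, lambda t: 1.0 / math.sqrt(t))
    geo = matmul(matmul(a_half, sym_fn(matmul(matmul(a_mhalf, b), a_mhalf), math.sqrt)), a_half)
    return 0.5 * (trace(a) + trace(b)) - trace(geo)

def pinch(m):
    """Pinching channel: keeps the diagonal, removes off-diagonal entries."""
    return [[m[0][0], 0.0], [0.0, m[1][1]]]

def avg(x, y):
    """Midpoint (X + Y)/2."""
    return [[0.5 * (x[i][j] + y[i][j]) for j in range(2)] for i in range(2)]

A1, B1 = [[2.0, 1.0], [1.0, 2.0]], [[3.0, 0.0], [0.0, 1.0]]
A2, B2 = [[1.0, -0.4], [-0.4, 0.5]], [[0.8, 0.3], [0.3, 2.5]]

dpi_holds = phi(pinch(A1), pinch(B1)) <= phi(A1, B1)
jointly_convex = phi(avg(A1, A2), avg(B1, B2)) <= 0.5 * (phi(A1, B1) + phi(A2, B2))
print(dpi_holds, jointly_convex)  # True True
```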
3. Barycenters
The notion of barycenter (or least squares mean) plays a central role in averaging procedures related to various topics in mathematics and mathematical physics. Given a metric space $(M, d)$ and a $k$-tuple $(a_1, \dots, a_k)$ in $M$ with positive weights $w_1, \dots, w_k$ such that $\sum_{j=1}^k w_j = 1$, the barycenter (or Fréchet mean or Karcher mean or Cartan mean) is defined to be
$$\arg\min_{z \in M} \sum_{j=1}^k w_j\, d^2(z, a_j).$$
In our setting, $M = \mathcal{B}(\mathcal{H})^{++}$, and the generalized quantum Hellinger divergence $\phi_\nu$ plays the role of the squared distance $d^2$, although it is not the square of any true metric in general.
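That is, we consider the optimization problem
$$\arg\min_{X \in \mathcal{B}(\mathcal{H})^{++}} \sum_{j=1}^k w_j\, \phi_\nu(A_j, X), \tag{13}$$
where the positive definite operators $A_1, \dots, A_k$ and the weights $w_1, \dots, w_k$ are fixed. By the strict concavity of $f_\nu$, the function
$$\Phi_\nu: \mathcal{B}(\mathcal{H})^{++} \to \mathbb{R}, \qquad X \mapsto \sum_{j=1}^k w_j\, \phi_\nu(A_j, X)$$
is strictly convex on $\mathcal{B}(\mathcal{H})^{++}$; see, e.g., [11, Thm. 2.10]. Therefore, there is a unique solution of (13), and it is necessarily a critical point of the function $\Phi_\nu$. That is, it satisfies
$$D\Phi_\nu(X)[H] = 0 \qquad \text{for all } H \in \mathcal{B}(\mathcal{H})^{sa}. \tag{14}$$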
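Easy computations give that
$$D\Phi_\nu(X)[H] = \sum_{j=1}^k w_j \left( c(\nu)\, \mathrm{Tr}\, H - \mathrm{Tr}\, A_j^{1/2}\, Df_\nu\left( \Gamma_{A_j}(X) \right)\left[ \Gamma_{A_j}(H) \right] A_j^{1/2} \right), \tag{15}$$
where for a positive definite operator $A$ the map $\Gamma_A: \mathcal{B}(\mathcal{H})^{sa} \to \mathcal{B}(\mathcal{H})^{sa}$ is defined by
$$\Gamma_A(Y) = A^{-1/2}\, Y A^{-1/2}.$$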
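By differentiating (5), we have
$$Df_\nu(Y)[K] = \int_{[0,1]} \lambda \left( (1-\lambda)Y + \lambda I \right)^{-1} K \left( (1-\lambda)Y + \lambda I \right)^{-1} d\nu(\lambda)$$
for $Y \in \mathcal{B}(\mathcal{H})^{++}$ and $K \in \mathcal{B}(\mathcal{H})^{sa}$. Consequently, using $\left( (1-\lambda)\Gamma_A(X) + \lambda I \right)^{-1} = A^{1/2} \left( (1-\lambda)X + \lambda A \right)^{-1} A^{1/2}$, we obtain
$$\mathrm{Tr}\, A^{1/2}\, Df_\nu\left( \Gamma_A(X) \right)\left[ \Gamma_A(H) \right] A^{1/2} = \int_{[0,1]} \lambda\, \mathrm{Tr}\, A \left( (1-\lambda)X + \lambda A \right)^{-1} H \left( (1-\lambda)X + \lambda A \right)^{-1} A \; d\nu(\lambda). \tag{18}$$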
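By the linearity and the cyclic property of the trace, we get from (15) and (18) that (14) is equivalent to
$$\mathrm{Tr}\, H \left( c(\nu) I - \sum_{j=1}^k w_j \int_{[0,1]} \lambda \left| A_j \left( (1-\lambda)X + \lambda A_j \right)^{-1} \right|^2 d\nu(\lambda) \right) = 0 \qquad \text{for all } H \in \mathcal{B}(\mathcal{H})^{sa},$$
where $|Y|$ stands for the absolute value of an operator $Y$, that is, $|Y| = (Y^* Y)^{1/2}$. This latter equation amounts to
$$\sum_{j=1}^k w_j \int_{[0,1]} \lambda \left| A_j \left( (1-\lambda)X + \lambda A_j \right)^{-1} \right|^2 d\nu(\lambda) = c(\nu) I.$$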
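The derivative formula behind (15) and (18) can be cross-checked numerically: a finite-difference directional derivative of $\Phi_\nu(X) = \sum_j w_j \phi_\nu(A_j, X)$ should match $c(\nu)\,\mathrm{Tr}\,H - \mathrm{Tr}\,H F(X)$ with $F(X) = \sum_j w_j \int \lambda \left| A_j((1-\lambda)X + \lambda A_j)^{-1} \right|^2 d\nu(\lambda)$. The sketch assumes the arcsine law ($c(\nu) = 1/2$, so $A_j \sigma_\nu X = A_j \# X$), two hypothetical real symmetric $2 \times 2$ matrices with equal weights, and evaluates the $\lambda$-integral with the substitution $\lambda = \sin^2\theta$ and a midpoint rule; all names are illustrative.

```python
import math

def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(x):
    return [[x[j][i] for j in range(2)] for i in range(2)]

def inv(m):
    """Inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def sym_fn(m, f):
    """Apply a scalar function f to a real symmetric 2x2 matrix via its eigendecomposition."""
    a, d = m[0][0], m[1][1]
    b = 0.5 * (m[0][1] + m[1][0])
    if b == 0.0:
        return [[f(a), 0.0], [0.0, f(d)]]
    mid, rad = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    l1, l2 = mid + rad, mid - rad
    vx, vy = (l1 - d, b) if a >= d else (b, l1 - a)   # eigenvector for l1
    n = math.hypot(vx, vy)
    vx, vy = vx / n, vy / n
    f1, f2 = f(l1), f(l2)
    return [[f2 + (f1 - f2) * vx * vx, (f1 - f2) * vx * vy],
            [(f1 - f2) * vx * vy, f2 + (f1 - f2) * vy * vy]]

def trace(m):
    return m[0][0] + m[1][1]

def geo_mean(a, b):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    ah, aih = sym_fn(a, math.sqrt), sym_fn(a, lambda t: 1.0 / math.sqrt(t))
    return matmul(matmul(ah, sym_fn(matmul(matmul(aih, b), aih), math.sqrt)), ah)

MATS = [[[2.0, 1.0], [1.0, 2.0]], [[3.0, 0.0], [0.0, 1.0]]]  # hypothetical A_1, A_2
WEIGHTS = [0.5, 0.5]
C = 0.5  # c(nu) for the arcsine law

def objective(x):
    """Phi_nu(X) = sum_j w_j Tr((1 - c) A_j + c X - A_j # X)."""
    return sum(w * ((1 - C) * trace(a) + C * trace(x) - trace(geo_mean(a, x)))
               for a, w in zip(MATS, WEIGHTS))

def lhs_operator(x, n=400):
    """F(X) = sum_j w_j int lam |A_j((1-lam)X + lam A_j)^{-1}|^2 dnu, lam = sin^2(theta)."""
    h = (math.pi / 2) / n
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n):
        lam = math.sin((k + 0.5) * h) ** 2
        for a, w in zip(MATS, WEIGHTS):
            m = [[(1 - lam) * x[i][j] + lam * a[i][j] for j in range(2)] for i in range(2)]
            t = matmul(a, inv(m))              # A_j M^{-1}
            sq = matmul(transpose(t), t)       # |A_j M^{-1}|^2 for real matrices
            for i in range(2):
                for j in range(2):
                    acc[i][j] += (2 / math.pi) * h * w * lam * sq[i][j]
    return acc

def shifted(x, hdir, s):
    """X + s * H."""
    return [[x[i][j] + s * hdir[i][j] for j in range(2)] for i in range(2)]

X = [[1.5, 0.2], [0.2, 1.0]]
H = [[0.3, -0.4], [-0.4, 0.1]]
eps = 1e-5
numeric = (objective(shifted(X, H, eps)) - objective(shifted(X, H, -eps))) / (2 * eps)
F = lhs_operator(X)
formula = C * trace(H) - sum(H[i][j] * F[j][i] for i in range(2) for j in range(2))
print(numeric, formula)
```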
So we obtained the following characterization of the barycenter.
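Theorem 6.
Let $\nu \in \mathcal{P}([0,1])$, and let $\phi_\nu$ be the generalized quantum Hellinger divergence generated by $\nu$, that is,
$$\phi_\nu(A, B) = \mathrm{Tr}\left( (1-c(\nu))A + c(\nu)B - A \sigma_\nu B \right).$$
Then the barycenter (or Cartan mean or Fréchet mean or Karcher mean) of the positive definite operators $A_1, \dots, A_k$ with positive weights $w_1, \dots, w_k$ ($\sum_{j=1}^k w_j = 1$) with respect to $\phi_\nu$, i.e.,
$$\arg\min_{X \in \mathcal{B}(\mathcal{H})^{++}} \sum_{j=1}^k w_j\, \phi_\nu(A_j, X),$$
coincides with the unique positive definite solution of the matrix equation
$$\sum_{j=1}^k w_j \int_{[0,1]} \lambda \left| A_j \left( (1-\lambda)X + \lambda A_j \right)^{-1} \right|^2 d\nu(\lambda) = c(\nu) I. \tag{21}$$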
4. The commutative case
In this section we show that in the commutative case formula (21) can be greatly simplified (see (28) later); furthermore, the conditions on the generating function can be relaxed. Recall that in the general non-commutative case, the generating function $f_\nu$ was operator monotone (or equivalently, operator concave), and hence smooth ($C^\infty$); see (4) and (5). When dealing with commuting operators, we need concavity only in the classical one-variable sense, and hence we require much less regularity on $f$. For now, we only require that $f: (0, \infty) \to \mathbb{R}$ is a strictly concave function; since its derivative enters the equations below, we also take it to be differentiable.
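Let $\mathcal{A} \subseteq \mathcal{B}(\mathcal{H})$ be a maximal Abelian subalgebra (MASA), and let $\mathcal{A}^{++}$ and $\mathcal{A}^{sa}$ denote the sets of its positive definite and self-adjoint elements, respectively. In this commutative case, the proper analogue of the generalized quantum Hellinger divergence (7) is
$$\phi_f(A, B) = \mathrm{Tr}\left( (1 - f'(1))A + f'(1)B - A^{1/2} f\left( A^{-1/2} B A^{-1/2} \right) A^{1/2} \right) \qquad (A, B \in \mathcal{A}^{++}),$$
where, by commutativity, $A^{1/2} f\left( A^{-1/2} B A^{-1/2} \right) A^{1/2} = A f\left( A^{-1} B \right)$.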
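Note that now there is no underlying measure involved, and the function class from which we choose the $f$'s is much larger than that in the general non-commutative case. Also note that
$$\phi_f(A, B) = \mathrm{Tr}\, A\, g_f\left( A^{-1/2} B A^{-1/2} \right),$$
where $g_f(x) = 1 - f'(1) + f'(1)x - f(x)$. We easily get that for $A, X \in \mathcal{A}^{++}$ and $H \in \mathcal{A}^{sa}$ we have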
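$$\frac{d}{dt} \phi_f(A, X + tH) \Big|_{t=0} = \mathrm{Tr}\left( f'(1) I - f'\left( A^{-1/2} X A^{-1/2} \right) \right) H,$$
and therefore, for $\Phi_f(X) = \sum_{j=1}^k w_j\, \phi_f(A_j, X)$ with $A_1, \dots, A_k \in \mathcal{A}^{++}$,
$$\frac{d}{dt} \Phi_f(X + tH) \Big|_{t=0} = \sum_{j=1}^k w_j\, \mathrm{Tr}\left( f'(1) I - f'\left( A_j^{-1/2} X A_j^{-1/2} \right) \right) H = \mathrm{Tr}\left( f'(1) I - \sum_{j=1}^k w_j f'\left( A_j^{-1/2} X A_j^{-1/2} \right) \right) H.$$
That is, the derivative vanishes if and only if
$$\mathrm{Tr}\left( f'(1) I - \sum_{j=1}^k w_j f'\left( A_j^{-1/2} X A_j^{-1/2} \right) \right) H = 0 \qquad \text{for all } H \in \mathcal{A}^{sa},$$
or equivalently,
$$\sum_{j=1}^k w_j f'\left( A_j^{-1/2} X A_j^{-1/2} \right) = f'(1) I.$$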
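We have obtained the following result.
Proposition 7.
The critical point of the function $\Phi_f: \mathcal{A}^{++} \to \mathbb{R}$, $X \mapsto \sum_{j=1}^k w_j\, \phi_f(A_j, X)$, is the unique solution of the equation
$$\sum_{j=1}^k w_j f'\left( A_j^{-1/2} X A_j^{-1/2} \right) = f'(1) I. \tag{28}$$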
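In the MASA of diagonal matrices (a convenient model of the commutative case, assumed here for illustration), equation (28) decouples into one scalar equation per diagonal entry, $\sum_j w_j f'(x/a_j) = f'(1)$, whose left-hand side is strictly decreasing in $x$ because $f'$ is. The sketch below solves it by bisection on $[\min_j a_j, \max_j a_j]$, where the root must lie, and compares the result with the closed forms of Examples 8 and 10 (the weighted power mean of order $1-p$ for $f(x) = x^p$, and the weighted arithmetic mean for $f(x) = 1 + \log x$). `barycenter_diag` is a hypothetical helper.

```python
def barycenter_entry(a_vals, weights, fprime, iters=200):
    """Solve sum_j w_j f'(x / a_j) = f'(1) for x by bisection."""
    lo, hi = min(a_vals), max(a_vals)
    target = fprime(1.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lhs = sum(w * fprime(mid / a) for a, w in zip(a_vals, weights))
        if lhs > target:   # lhs is decreasing in x, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def barycenter_diag(diags, weights, fprime):
    """Diagonal of the barycenter of commuting diagonal matrices A_j = diag(diags[j])."""
    return [barycenter_entry([d[i] for d in diags], weights, fprime)
            for i in range(len(diags[0]))]

diags = [[1.0, 4.0], [9.0, 1.0]]   # hypothetical A_1 = diag(1, 4), A_2 = diag(9, 1)
weights = [0.3, 0.7]
p = 0.5
power = barycenter_diag(diags, weights, lambda t: p * t ** (p - 1))   # f(x) = x^p
arith = barycenter_diag(diags, weights, lambda t: 1.0 / t)           # f(x) = 1 + log x
print(power, arith)
```

Outside the commutative setting the equations do not decouple, which is one way to see why (21) needs the more involved form.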
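So in the commutative case, the equation (28) characterizing the barycenter is simpler than that in the non-commutative case (21). Note that if all the $A_j$'s are in the same MASA $\mathcal{A}$, then the barycenter is also in $\mathcal{A}$, and hence it has the form described in Proposition 7. One way to show this is to use the data processing inequality (DPI) for the orthogonal projection onto $\mathcal{A}$, which is completely positive and trace preserving, and which is denoted by $\mathbb{E}_{\mathcal{A}}$ to express the analogy with the classical conditional expectation. So let $X_0$ be the unique minimizer of $\Phi_\nu(X) = \sum_{j=1}^k w_j\, \phi_\nu(A_j, X)$. Now
$$\Phi_\nu\left( \mathbb{E}_{\mathcal{A}}(X_0) \right) = \sum_{j=1}^k w_j\, \phi_\nu\left( \mathbb{E}_{\mathcal{A}}(A_j), \mathbb{E}_{\mathcal{A}}(X_0) \right) \leq \sum_{j=1}^k w_j\, \phi_\nu(A_j, X_0) = \Phi_\nu(X_0),$$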
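hence $\mathbb{E}_{\mathcal{A}}(X_0) = X_0$ by the uniqueness of the minimizer, which means that $X_0 \in \mathcal{A}$. We also note that under the assumption $A_j X = X A_j$ for all $j$, (21) clearly coincides with (28) (with $f = f_\nu$), because $f_\nu'(1) = c(\nu)$, and in this case, by the identity
$$f_\nu'(x) = \int_{[0,1]} \frac{\lambda}{\left( (1-\lambda)x + \lambda \right)^2} \, d\nu(\lambda) \qquad (x > 0),$$
we have
$$\int_{[0,1]} \lambda \left| A_j \left( (1-\lambda)X + \lambda A_j \right)^{-1} \right|^2 d\nu(\lambda) = \int_{[0,1]} \lambda \left( (1-\lambda) A_j^{-1/2} X A_j^{-1/2} + \lambda I \right)^{-2} d\nu(\lambda) = f_\nu'\left( A_j^{-1/2} X A_j^{-1/2} \right).$$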
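Example 8.
Let $f(x) = x^p$ for $p \in (0,1)$. Then $\phi_f$ is of the form
$$\phi_f(A, B) = \mathrm{Tr}\left( (1-p)A + pB - A^{1-p} B^p \right),$$
and the barycenter equation (28) reads as
$$\sum_{j=1}^k w_j\, p \left( A_j^{-1/2} X A_j^{-1/2} \right)^{p-1} = p I, \qquad \text{that is,} \qquad X^{1-p} = \sum_{j=1}^k w_j A_j^{1-p}.$$
That is, the barycenter coincides with the weighted power mean of order $1-p$, which is by definition the unique positive definite solution of the equation $X = \sum_{j=1}^k w_j\, X \#_{1-p} A_j$; see [21, Def. 3.2]. This example does not contain new results; the above characterization of the barycenter as a weighted power mean can be found, e.g., in [2] or in [29].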
Remark 9.
By the special choice $p = 1/2$ in Example 8, we get that the claim of Bhatia et al., saying that the barycenter with respect to the quantum Hellinger divergence (2) and the weighted power mean of order $1/2$ coincide [8, Thm. 9], is true in the commutative case.
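Example 10.
Set $f(x) = 1 + \log x$. Then $\phi_f$ is the relative entropy, that is,
$$\phi_f(A, B) = \mathrm{Tr}\left( A \log A - A \log B - A + B \right),$$
and the barycenter equation (28) reads as
$$\sum_{j=1}^k w_j A_j X^{-1} = I, \qquad \text{that is,} \qquad X = \sum_{j=1}^k w_j A_j.$$
That is, the barycenter coincides with the weighted sum of the $A_j$'s. This is well known; see, e.g., the remarks after Theorem 4 in [8].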
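Note that we get Example 10 from Example 8 if we take the limit $p \to 0$. Indeed,
$$\lim_{p \to 0} \frac{1}{p}\, \phi_{x^p}(A, B) = \lim_{p \to 0} \phi_{f_p}(A, B) = \phi_{1 + \log}(A, B),$$
where $f_p(x) = 1 + \frac{x^p - 1}{p}$, and $f_p(x) \to 1 + \log x$ as $p \to 0$ in the locally uniform topology.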
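The limit can be observed numerically in the scalar (one-dimensional) case. The sketch compares $\frac{1}{p}\phi_{x^p}(a,b) = \frac{1}{p}\left((1-p)a + pb - a^{1-p}b^p\right)$ with the relative entropy $a\log a - a\log b - a + b$ for decreasing $p$, and $f_p(x) = 1 + \frac{x^p - 1}{p}$ with $1 + \log x$; `math.expm1` evaluates $x^p - 1$ without cancellation. The values $a = 2$, $b = 5$, $x = 3$ are arbitrary choices.

```python
import math

def phi_power(a, b, p):
    """phi_{x^p}(a, b) = (1 - p) a + p b - a^(1-p) b^p for positive scalars a, b."""
    return (1 - p) * a + p * b - a ** (1 - p) * b ** p

def rel_entropy(a, b):
    """phi_{1+log}(a, b) = a log a - a log b - a + b."""
    return a * math.log(a) - a * math.log(b) - a + b

def f_p(x, p):
    """1 + (x^p - 1)/p, computed stably via expm1."""
    return 1 + math.expm1(p * math.log(x)) / p

a, b = 2.0, 5.0
for p in (1e-1, 1e-2, 1e-3, 1e-4):
    # both differences shrink roughly linearly in p
    print(p, phi_power(a, b, p) / p - rel_entropy(a, b), f_p(3.0, p) - (1 + math.log(3.0)))
```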
5. Remarks
5.1. A note on a paper of Bhatia et al
In our view, Theorem 9 in [8] is not true in general. The proof contains a gap: in the notation of [8], the fact that a certain point is a critical point of one function does not imply that it is a critical point of another one, although formula (54) in [8] is correct.
It is true that for commuting operators, (21) and (28) coincide. However, these equations are different without the assumption of commutativity. To demonstrate the difference, we consider the following example. Let $\nu$ be the arcsine distribution, and let the weights and the positive definite matrices $A_1$ and $A_2$ be given by
[TABLE]
Then numerical optimization performed by Wolfram Mathematica [31] shows that
[TABLE]
[TABLE]
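Note that both $A_1$ and $A_2$ have real entries. Therefore, $\overline{A_j} = A_j$, and hence $\phi_\nu(A_j, \overline{X}) = \phi_\nu(A_j, X)$ holds for every $X \in \mathcal{B}(\mathcal{H})^{++}$ and $j \in \{1, 2\}$, where $\overline{X}$ denotes the entrywise complex conjugate of $X$. Consequently, the strict convexity of the objective function $X \mapsto \sum_j w_j\, \phi_\nu(A_j, X)$ implies that its unique minimizer has real entries. So it is enough to minimize numerically over the cone of positive definite matrices with real entries [31].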
However, the barycenter obtained numerically in (36) does not coincide with the weighted power mean of order $1/2$, as
[TABLE]
Note that after the publication of our manuscript on arXiv.org, a correction of [8] dedicated to this problem was released [9].
5.2. A possible measure of non-commutativity
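Motivated by the observations above, we introduce a function that quantifies the non-commutativity of the positive definite operators $A_1, \dots, A_k$.
Definition 11.
Given $\nu \in \mathcal{P}([0,1])$, positive weights $w_1, \dots, w_k$ with $\sum_{j=1}^k w_j = 1$, and a convenient metric $d$ on $\mathcal{B}(\mathcal{H})^{++}$, the $\nu$-dependent measure of the non-commutativity of $(A_1, \dots, A_k)$ is defined as
$$\mathrm{NC}_\nu(A_1, \dots, A_k) = d\left( X_\nu, Y_\nu \right), \tag{38}$$
where
$$X_\nu = \arg\min_{X \in \mathcal{B}(\mathcal{H})^{++}} \sum_{j=1}^k w_j\, \phi_\nu(A_j, X),$$
i.e., $X_\nu$ is the solution of (21), and $Y_\nu$ is a $\nu$-dependent $w$-weighted mean of $A_1, \dots, A_k$, defined as the unique solution of the matrix equation (28) that we recall here for convenience:
$$\sum_{j=1}^k w_j f_\nu'\left( A_j^{-1/2} Y A_j^{-1/2} \right) = c(\nu) I.$$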
The detailed study of the quantity (38) is beyond the scope of this paper; however, it may be the subject of subsequent works.
Acknowledgements
We are grateful to Milán Mosonyi for drawing our attention to Refs. [8, 17, 19, 22, 25, 26], for comments on earlier versions of this paper, and for several discussions on the topic. We are also grateful to Miklós Pálfia for several discussions; to László Erdős for his essential suggestions on the structure and highlights of this paper, and for his comments on earlier versions; and to the anonymous referee for his/her valuable comments and suggestions.
- [1] S. Amari, Information Geometry and its Applications, Springer (Tokyo), 2016.
- [2] S. Amari, Integration of stochastic models by minimizing α-divergence, Neural Comput. 19 (2007), 2780–2796.
- [3] T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl. 26 (1979), 203–241.
- [4] T. Ando, F. Kubo, Means of positive linear operators, Math. Ann. 246 (1980), 205–224.
- [5] T. Ando, F. Hiai, Operator log-convex functions and operator means, Math. Ann. 350 (2011), 611–630.
- [6] T. Ando, Topics on operator inequalities, Lecture note, Sapporo, 1978.
- [7] R. Bhatia, Matrix Analysis, Springer-Verlag, New York, 1997.
- [8] R. Bhatia, S. Gaubert, T. Jain, Matrix versions of the Hellinger distance, Lett. Math. Phys. 109 (2019), 1777–1804.
