Parameterized Wasserstein mean with its properties
Sejong Kim

TL;DR
This paper introduces a parameterized Wasserstein mean for positive definite matrices, exploring its properties, inequalities, and relations to other means, extending classical results like the Lie-Trotter-Kato formula.
Contribution
It proposes a new mean generalizing the Wasserstein mean, analyzes its properties, bounds, and majorization relations, extending existing mathematical frameworks.
Findings
Established norm inequalities and bounds for the mean.
Extended the Lie-Trotter-Kato formula to this new mean.
Proved log-majorization properties using the Cartan mean.
Abstract
A new least squares mean of positive definite matrices for the divergence associated with the sandwiched quasi-relative entropy has been introduced. It generalizes the well-known Wasserstein mean for covariance matrices of Gaussian distributions with mean zero, so we call it the parameterized Wasserstein mean. In this article we investigate a norm inequality for the parameterized Wasserstein mean, give its bounds with respect to the Loewner order, and show an extended version of the Lie-Trotter-Kato formula for the parameterized Wasserstein mean. Finally, we show log-majorization properties of the parameterized Wasserstein mean by using the Cartan mean.
Keywords: parameterized Wasserstein mean, Cartan mean, sandwiched quasi-relative entropy, log-majorization
1. Introduction
The Fréchet mean (or barycenter) is the natural average obtained as a least squares mean when the space carries a metric structure. It is not easy, however, to decide whether the Fréchet mean exists on a given metric space. It is known from [22] that the Fréchet mean exists uniquely on a Hadamard space, that is, a complete metric space satisfying the semi-parallelogram law. A typical and important example of a Hadamard space is the open convex cone $\mathbb{P}_m$ of $m \times m$ positive definite matrices equipped with the Riemannian trace metric $\delta(A,B) = \|\log A^{-1/2} B A^{-1/2}\|_2$. For an $n$-tuple $\mathbb{A} = (A_1, \dots, A_n)$ of positive definite matrices and a positive probability vector $\omega = (w_1, \dots, w_n)$, the Fréchet mean (also called the Cartan mean or Karcher mean)
$$\Lambda(\omega; \mathbb{A}) = \underset{X \in \mathbb{P}_m}{\arg\min} \sum_{j=1}^{n} w_j \, \delta^2(X, A_j)$$
has been widely studied in theoretical and computational aspects: see [16, 18, 19, 20, 21, 26].
In particular, the Wasserstein metric space of probability measures and its barycenters have recently become important in a variety of research fields: see [3, 23, 24] and their bibliographies. There are several interesting results about Wasserstein barycenters on the set $\mathcal{P}^2(\mathbb{R}^m)$ of all probability measures on the Euclidean space $\mathbb{R}^m$ with finite second moment [1, 2, 11], including the fixed point approach to the Wasserstein mean of Gaussian distributions. For $\mu, \nu \in \mathcal{P}^2(\mathbb{R}^m)$ the $2$-Wasserstein metric is defined as
$$d_W(\mu, \nu) = \left[ \inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb{R}^m \times \mathbb{R}^m} \|x - y\|^2 \, d\pi(x,y) \right]^{1/2},$$
where $\Pi(\mu,\nu)$ denotes the set of all couplings on $\mathbb{R}^m \times \mathbb{R}^m$ with marginals $\mu$ and $\nu$. In particular, the $2$-Wasserstein distance for two Gaussian distributions $\mu$ and $\nu$ with mean $0$ and covariance matrices $A$ and $B$ is formulated as
$$d_W(\mu, \nu) = \left[ \operatorname{tr}(A + B) - 2 \operatorname{tr}\left(A^{1/2} B A^{1/2}\right)^{1/2} \right]^{1/2},$$
where we assume that $A$ and $B$ are positive definite matrices. Note that this metric, denoted by $d(A,B)$ and called the Bures-Wasserstein distance, coincides with the Bures distance of density matrices in quantum information theory and is the matrix version of the Hellinger distance of probability vectors.
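As a small numerical illustration, the Bures-Wasserstein distance between two positive definite matrices can be evaluated directly from the trace formula $d(A,B)^2 = \operatorname{tr}(A+B) - 2\operatorname{tr}(A^{1/2} B A^{1/2})^{1/2}$. The sketch below assumes real symmetric inputs for simplicity; the helper names `spd_power` and `bures_wasserstein` are illustrative and not taken from the paper.

```python
import numpy as np

def spd_power(A, p):
    """A**p for a real symmetric positive definite matrix, via eigendecomposition."""
    A = (A + A.T) / 2.0  # symmetrize to suppress rounding asymmetry
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals**p) @ vecs.T

def bures_wasserstein(A, B):
    """d(A, B) = [tr(A + B) - 2 tr (A^{1/2} B A^{1/2})^{1/2}]^{1/2}."""
    root_A = spd_power(A, 0.5)
    cross = spd_power(root_A @ B @ root_A, 0.5)
    squared = np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(squared, 0.0)))  # clip tiny negative rounding errors

# Commuting example: the distance reduces to the Euclidean distance
# between the square roots of the eigenvalues.
A = np.diag([1.0, 4.0])
B = np.diag([9.0, 16.0])
d_AB = bures_wasserstein(A, B)  # (1-3)^2 + (2-4)^2 = 8, so d_AB = sqrt(8)
```

For commuting matrices the formula collapses to $\sum_i (\sqrt{a_i} - \sqrt{b_i})^2$, which is the Hellinger-type expression mentioned in the text.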
For a given $n$-tuple $\mathbb{A} = (A_1, \dots, A_n)$ of positive definite matrices and a positive probability vector $\omega = (w_1, \dots, w_n)$, the Wasserstein mean $\Omega(\omega; \mathbb{A})$ is the least squares mean for the Bures-Wasserstein distance:
$$\Omega(\omega; \mathbb{A}) = \underset{X \in \mathbb{P}_m}{\arg\min} \sum_{j=1}^{n} w_j \, d^2(X, A_j).$$
It has been shown that such a minimizer exists uniquely, by using non-smooth analysis, convex duality and the theory of optimal transport [1], and by using matrix analysis [7]. Moreover, many interesting properties of the Wasserstein mean of positive definite matrices have been established: an iteration approach to the Wasserstein mean using the optimal transport map [2], a log-majorization property of the Wasserstein mean [6], and several inequalities (in terms of the Loewner order and the operator norm) together with an extended version of the Lie-Trotter-Kato formula for the Wasserstein mean [14].
In recent works the sandwiched quasi-relative entropy has been introduced in [10, 25] as a parameterized version of fidelity:
$$F_t(A, B) = \operatorname{tr}\left( A^{\frac{1-t}{2t}} B A^{\frac{1-t}{2t}} \right)^{t}, \qquad t \in (0,1).$$
Note that the usual fidelity is the case $t = 1/2$, and that $F_t$ is a variant of the relative Rényi entropy. Furthermore, it has been shown in [8] that the sandwiched quasi-relative entropy is strictly concave and that the following minimization problem
$$\underset{X \in \mathbb{P}_m}{\arg\min} \sum_{j=1}^{n} w_j \operatorname{tr}\left[ (1-t) A_j + t X - \left( A_j^{\frac{1-t}{2t}} X A_j^{\frac{1-t}{2t}} \right)^{t} \right]$$
has a unique solution by Brouwer's fixed point theorem. The solution reduces to the Wasserstein mean for $t = 1/2$, so we call it the parameterized Wasserstein mean. In this paper we investigate a norm inequality for the parameterized Wasserstein mean, give bounds for it with respect to the Loewner order, and show that it satisfies an extended version of the Lie-Trotter-Kato formula. Finally, we show log-majorization properties of the parameterized Wasserstein mean by using the Cartan mean.
2. Symmetric weighted geometric mean
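Let $\mathbb{H}_m$ be the real vector space of all $m \times m$ Hermitian matrices. Let $\mathbb{P}_m \subset \mathbb{H}_m$ be the open convex cone of all positive definite matrices. The general linear group $\mathrm{GL}_m$ of all invertible matrices acts on $\mathbb{P}_m$ via congruence transformations $A \mapsto M A M^*$ for $M \in \mathrm{GL}_m$ and $A \in \mathbb{P}_m$. For any $A, B \in \mathbb{H}_m$ we write $A \leq B$ if $B - A$ is positive semi-definite, and $A < B$ if $B - A$ is positive definite. This is a partial order on $\mathbb{H}_m$, known as the Loewner order.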
Let be the simplex of positive probability vectors in convexly spanned by the unit coordinate vectors. Let , , a permutation on -letters, and . For convenience, we denote as
[TABLE]
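Definition 2.1.
A symmetric weighted geometric mean of $n$ positive definite matrices is a map $G: \Delta_n \times \mathbb{P}_m^n \to \mathbb{P}_m$ that satisfies the following properties for $\omega = (w_1, \dots, w_n) \in \Delta_n$, $\mathbb{A} = (A_1, \dots, A_n)$ and $\mathbb{B} = (B_1, \dots, B_n)$ in $\mathbb{P}_m^n$, positive scalars $a_1, \dots, a_n$, a permutation $\sigma$ on $n$ letters, $M \in \mathrm{GL}_m$, and $0 \leq \lambda \leq 1$:
- (P1) (Consistency with scalars) $G(\omega; \mathbb{A}) = A_1^{w_1} \cdots A_n^{w_n}$ if the $A_j$'s commute;
- (P2) (Joint homogeneity) $G(\omega; a_1 A_1, \dots, a_n A_n) = a_1^{w_1} \cdots a_n^{w_n} \, G(\omega; \mathbb{A})$;
- (P3) (Permutation invariance) $G(\omega_\sigma; \mathbb{A}_\sigma) = G(\omega; \mathbb{A})$, where $\omega_\sigma = (w_{\sigma(1)}, \dots, w_{\sigma(n)})$ and $\mathbb{A}_\sigma = (A_{\sigma(1)}, \dots, A_{\sigma(n)})$;
- (P4) (Monotonicity) If $B_j \leq A_j$ for all $1 \leq j \leq n$, then $G(\omega; \mathbb{B}) \leq G(\omega; \mathbb{A})$;
- (P5) (Continuity) The map $G(\omega; \cdot)$ is continuous;
- (P6) (Congruence invariance) $G(\omega; M A_1 M^*, \dots, M A_n M^*) = M \, G(\omega; \mathbb{A}) \, M^*$;
- (P7) (Joint concavity) $G(\omega; \lambda \mathbb{A} + (1-\lambda) \mathbb{B}) \geq \lambda \, G(\omega; \mathbb{A}) + (1-\lambda) \, G(\omega; \mathbb{B})$ for $0 \leq \lambda \leq 1$;
- (P8) (Self-duality) $G(\omega; A_1^{-1}, \dots, A_n^{-1})^{-1} = G(\omega; \mathbb{A})$;
- (P9) (Determinantal identity) $\det G(\omega; \mathbb{A}) = \prod_{j=1}^{n} (\det A_j)^{w_j}$;
- (P10) (Arithmetic-Geometric-Harmonic weighted mean inequalities)
$$\left[ \sum_{j=1}^{n} w_j A_j^{-1} \right]^{-1} \leq G(\omega; \mathbb{A}) \leq \sum_{j=1}^{n} w_j A_j.$$
A map satisfying (P1)-(P10) except (P3) is called an (asymmetric) weighted geometric mean.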
Note that the two-variable weighted geometric mean
$$A \#_t B := A^{1/2} \left( A^{-1/2} B A^{-1/2} \right)^{t} A^{1/2}, \qquad t \in [0,1],$$
is uniquely determined by (P1) and (P6), and also fulfils (P1)-(P10). Moreover, the curve $t \mapsto A \#_t B$ is the unique (up to parameterization) geodesic joining $A$ and $B$ on the Hadamard space $\mathbb{P}_m$ with the Riemannian trace metric.
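To make the two-variable weighted geometric mean concrete, the following sketch evaluates $A \#_t B = A^{1/2}(A^{-1/2} B A^{-1/2})^t A^{1/2}$ and checks two of the listed properties numerically: the determinantal identity (P9) and the arithmetic-geometric inequality from (P10). It assumes real symmetric inputs; the function names are illustrative and the example matrices are arbitrary.

```python
import numpy as np

def spd_power(A, p):
    """A**p for a real symmetric positive definite matrix, via eigendecomposition."""
    A = (A + A.T) / 2.0  # symmetrize to suppress rounding asymmetry
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals**p) @ vecs.T

def geometric_mean(A, B, t):
    """Two-variable weighted geometric mean A #_t B."""
    root = spd_power(A, 0.5)
    inv_root = spd_power(A, -0.5)
    return root @ spd_power(inv_root @ B @ inv_root, t) @ root

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[3.0, 0.0], [0.0, 1.0]])
t = 0.3
G = geometric_mean(A, B, t)

# (P9) with weights (1 - t, t): det(A #_t B) = det(A)^(1-t) * det(B)^t
det_gap = abs(np.linalg.det(G) - np.linalg.det(A) ** (1 - t) * np.linalg.det(B) ** t)

# (P10), upper half: (1 - t) A + t B - A #_t B is positive semi-definite
agm_min_eig = np.linalg.eigvalsh((1 - t) * A + t * B - G).min()
```

The endpoints behave as expected for a geodesic: `geometric_mean(A, B, 0)` returns `A` and `geometric_mean(A, B, 1)` returns `B`, up to rounding.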
There are many different kinds of symmetric weighted geometric means on the open convex cone $\mathbb{P}_m$, including the Ando-Li-Mathias (ALM) mean [4] and the Bini-Meini-Poloni (BMP) mean [9]. Among them a natural and canonical mean is the least squares mean, called the Cartan mean, which is the unique minimizer of the weighted sum of squares of the Riemannian trace metric $\delta$:
$$\Lambda(\omega; \mathbb{A}) = \underset{X \in \mathbb{P}_m}{\arg\min} \sum_{j=1}^{n} w_j \, \delta^2(X, A_j).$$
In [18], Lawson and Lim verified that the Cartan mean satisfies all the properties (P1)-(P10). Computing appropriate derivatives as in [5] shows that the Cartan mean coincides with the unique positive definite solution $X$ of the Karcher equation
$$\sum_{j=1}^{n} w_j \log\left( X^{-1/2} A_j X^{-1/2} \right) = 0.$$
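The Karcher equation $\sum_j w_j \log(X^{-1/2} A_j X^{-1/2}) = 0$ suggests a simple numerical scheme: a Riemannian gradient step $X \leftarrow X^{1/2} \exp\big(\sum_j w_j \log(X^{-1/2} A_j X^{-1/2})\big) X^{1/2}$. The sketch below is not taken from the paper; it uses a unit step size, which works for the well-conditioned example here but is an assumption rather than a guaranteed choice. Inputs are assumed real symmetric, and all function names are illustrative.

```python
import numpy as np

def sym_fun(A, f):
    """Apply a scalar function f to a real symmetric matrix through its eigenvalues."""
    A = (A + A.T) / 2.0  # symmetrize to suppress rounding asymmetry
    vals, vecs = np.linalg.eigh(A)
    return (vecs * f(vals)) @ vecs.T

def karcher_residual(X, mats, weights):
    """Left-hand side of the Karcher equation at X."""
    inv_root = sym_fun(X, lambda v: v ** -0.5)
    return sum(w * sym_fun(inv_root @ A @ inv_root, np.log)
               for w, A in zip(weights, mats))

def cartan_mean(mats, weights, iters=200):
    """Unit-step Riemannian gradient iteration (a sketch, not a robust solver)."""
    X = sum(w * A for w, A in zip(weights, mats))  # start at the arithmetic mean
    for _ in range(iters):
        root = sym_fun(X, np.sqrt)
        X = root @ sym_fun(karcher_residual(X, mats, weights), np.exp) @ root
    return X

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[3.0, 0.0], [0.0, 1.0]])
X = cartan_mean([A, B], [0.5, 0.5])
residual_norm = np.linalg.norm(karcher_residual(X, [A, B], [0.5, 0.5]))
```

For two matrices with equal weights the Cartan mean is the midpoint $A \#_{1/2} B$ of the geodesic, which gives an independent check of the output.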
Recently, Yamazaki [26] has shown a unique characterization of the Cartan mean among other symmetric weighted geometric means, and its generalization to the probability measures with finite second moment for the Riemannian trace metric has been proved in [17].
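Theorem 2.2.
[17, 26] Let the map $G: \Delta_n \times \mathbb{P}_m^n \to \mathbb{P}_m$ be a symmetric weighted geometric mean satisfying
$$\sum_{j=1}^{n} w_j \log A_j \leq 0 \quad \text{implies} \quad G(\omega; \mathbb{A}) \leq I \tag{2.3}$$
for any $\omega = (w_1, \dots, w_n) \in \Delta_n$ and $\mathbb{A} = (A_1, \dots, A_n) \in \mathbb{P}_m^n$. Then $G = \Lambda$. Furthermore, the Cartan mean $\Lambda$ satisfies the property (2.3).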
3. Parameterized Wasserstein means
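Let $\mathbb{A} = (A_1, \dots, A_n) \in \mathbb{P}_m^n$, and let $\omega = (w_1, \dots, w_n) \in \Delta_n$. For any $t \in (0,1)$ the following minimization problem
$$\underset{X \in \mathbb{P}_m}{\arg\min} \sum_{j=1}^{n} w_j \operatorname{tr}\left[ (1-t) A_j + t X - \left( A_j^{\frac{1-t}{2t}} X A_j^{\frac{1-t}{2t}} \right)^{t} \right] \tag{3.4}$$
has been solved in [8], so it gives us a new multivariate matrix mean. We recall the known results in this section, and investigate further consequences in the later sections.
Note that the quantity $F_t(A, X) = \operatorname{tr}\big( A^{\frac{1-t}{2t}} X A^{\frac{1-t}{2t}} \big)^{t}$, called the sandwiched quasi-relative entropy, is a parameterized version of fidelity since $F_{1/2}(A, X) = \operatorname{tr}\big( A^{1/2} X A^{1/2} \big)^{1/2}$ is the usual fidelity. Furthermore, the objective function $f$ in (3.4) is strictly convex and its gradient is given by
$$\nabla f(X) = t \left[ I - \sum_{j=1}^{n} w_j \left( A_j^{\frac{1-t}{t}} \#_{1-t} X^{-1} \right) \right].$$
To prove the existence and uniqueness of the minimizer in (3.4), it is enough to show that the equation $\nabla f(X) = 0$ has a positive definite solution. Note that
$$\nabla f(X) = 0 \quad \Longleftrightarrow \quad X = \sum_{j=1}^{n} w_j \left( X^{1/2} A_j^{\frac{1-t}{t}} X^{1/2} \right)^{t}.$$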
It has been shown in [8] that the map defined by is a self-map on the closed interval , where
and .
We denote as and the smallest and largest eigenvalues of , respectively. By Brouwer’s fixed point theorem, the map has a fixed point. This yields the existence and uniqueness of the minimizer of (3.4).
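Definition 3.1.
Let $\mathbb{A} = (A_1, \dots, A_n) \in \mathbb{P}_m^n$ and $\omega = (w_1, \dots, w_n) \in \Delta_n$. For $t \in (0,1)$, the parameterized Wasserstein mean $\Omega_t(\omega; \mathbb{A})$ is defined as
$$\Omega_t(\omega; \mathbb{A}) := \underset{X \in \mathbb{P}_m}{\arg\min} \sum_{j=1}^{n} w_j \operatorname{tr}\left[ (1-t) A_j + t X - \left( A_j^{\frac{1-t}{2t}} X A_j^{\frac{1-t}{2t}} \right)^{t} \right].$$
Theorem 3.2.
The parameterized Wasserstein mean $\Omega_t(\omega; \mathbb{A})$ is the unique positive definite matrix $X$ satisfying
$$X = \sum_{j=1}^{n} w_j \left( X^{1/2} A_j^{\frac{1-t}{t}} X^{1/2} \right)^{t}, \tag{3.5}$$
equivalently,
$$I = \sum_{j=1}^{n} w_j \left( A_j^{\frac{1-t}{t}} \#_{1-t} X^{-1} \right). \tag{3.6}$$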
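The characterization in Theorem 3.2 can be explored numerically. The sketch below assumes that the fixed-point equation reads $X = \sum_j w_j (X^{1/2} A_j^{(1-t)/t} X^{1/2})^t$; this form is inferred from setting the gradient of the objective to zero and from the later proofs, since the displayed equation is not legible in this copy. The code simply iterates that map. The source establishes existence via Brouwer's fixed point theorem and makes no claim that plain iteration converges, so the iteration is an illustrative assumption. For commuting (here diagonal) matrices the assumed equation has the closed-form solution $(\sum_j w_j A_j^{1-t})^{1/(1-t)}$, which the example uses as a check.

```python
import numpy as np

def spd_power(A, p):
    """A**p for a real symmetric positive definite matrix, via eigendecomposition."""
    A = (A + A.T) / 2.0  # symmetrize to suppress rounding asymmetry
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals**p) @ vecs.T

def fixed_point_map(X, mats, weights, t):
    """K(X) = sum_j w_j (X^{1/2} A_j^{(1-t)/t} X^{1/2})^t  (assumed form of the equation)."""
    root = spd_power(X, 0.5)
    return sum(w * spd_power(root @ spd_power(A, (1 - t) / t) @ root, t)
               for w, A in zip(weights, mats))

def parameterized_wasserstein_mean(mats, weights, t, iters=500):
    """Plain iteration X <- K(X); convergence is not guaranteed by the source."""
    X = sum(w * A for w, A in zip(weights, mats))  # start at the arithmetic mean
    for _ in range(iters):
        X = fixed_point_map(X, mats, weights, t)
    return X

mats = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])]
weights = [0.25, 0.75]
t = 0.3
X = parameterized_wasserstein_mean(mats, weights, t)

# Closed form in the commuting case, derived from the assumed equation.
closed_form = spd_power(sum(w * spd_power(A, 1 - t) for w, A in zip(weights, mats)),
                        1.0 / (1 - t))
residual_norm = np.linalg.norm(fixed_point_map(X, mats, weights, t) - X)
```

Setting `t = 0.5` turns the map into the familiar Wasserstein mean equation $X = \sum_j w_j (X^{1/2} A_j X^{1/2})^{1/2}$.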
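For given $\mathbb{A} = (A_1, \dots, A_n) \in \mathbb{P}_m^n$, $\omega = (w_1, \dots, w_n) \in \Delta_n$ and $k \in \mathbb{N}$, we denote
$$\omega^{(k)} = \frac{1}{k} (\omega, \dots, \omega) \in \Delta_{nk}, \qquad \mathbb{A}^{(k)} = (\mathbb{A}, \dots, \mathbb{A}) \in \mathbb{P}_m^{nk},$$
where the number of blocks in the last expression is $k$.
The following are some properties of the parameterized Wasserstein mean, compared with those of the Cartan mean.
Theorem 3.3.
Properties of the parameterized Wasserstein mean.
- (1) Consistency with scalars: $\Omega_t(\omega; \mathbb{A}) = \left( \sum_{j=1}^{n} w_j A_j^{1-t} \right)^{\frac{1}{1-t}}$ if the $A_j$'s commute.
- (2) Homogeneity: $\Omega_t(\omega; \alpha \mathbb{A}) = \alpha \, \Omega_t(\omega; \mathbb{A})$ for any positive scalar $\alpha$.
- (3) Permutation invariance: $\Omega_t(\omega_\sigma; \mathbb{A}_\sigma) = \Omega_t(\omega; \mathbb{A})$ for any permutation $\sigma$ on $\{1, \dots, n\}$.
- (4) Repetition invariance: $\Omega_t(\omega^{(k)}; \mathbb{A}^{(k)}) = \Omega_t(\omega; \mathbb{A})$ for any $k \in \mathbb{N}$.
- (5) Unitary congruence invariance: $\Omega_t(\omega; U A_1 U^*, \dots, U A_n U^*) = U \, \Omega_t(\omega; \mathbb{A}) \, U^*$ for any unitary $U$.
- (6) Determinantal inequality: $\det \Omega_t(\omega; \mathbb{A}) \geq \prod_{j=1}^{n} (\det A_j)^{w_j}$.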
Moreover, if and only if , where .
Proof.
Most of the items follow directly from Theorem 3.2, so we prove only (1) and (6).
- (1)
Assume that all ’s commute, so they are simultaneously diagonalizable. Set . Then also commutes with all the ’s, and is a solution of the equation (3.6). By uniqueness of the positive definite solution for the equation (3.6), .
- (6)
Let . Then . By the arithmetic-Cartan mean inequality,
[TABLE]
Applying Corollary 7.7.4 (e) in [13] and the determinantal identity of Cartan mean, we have
[TABLE]
Solving for , we obtain the desired inequality.
∎
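Remark 3.4.
Using the strict concavity of the map $A \mapsto \log \det A$ in Theorem 7.6.6 in [13], we can not only prove the determinantal inequality of the parameterized Wasserstein mean, but also obtain the condition under which the determinantal equality holds. Indeed, with $X = \Omega_t(\omega; \mathbb{A})$, applying the map $\log \det$ to the equation (3.6) yields
$$0 = \log \det \left[ \sum_{j=1}^{n} w_j \left( A_j^{\frac{1-t}{t}} \#_{1-t} X^{-1} \right) \right] \geq \sum_{j=1}^{n} w_j \log \det \left( A_j^{\frac{1-t}{t}} \#_{1-t} X^{-1} \right) = (1-t) \left[ \sum_{j=1}^{n} w_j \log \det A_j - \log \det X \right],$$
from which we get the inequality by solving for $\det X$. Moreover, the equality in Theorem 3.3 (6) holds if and only if $A_j^{\frac{1-t}{t}} \#_{1-t} X^{-1} = A_k^{\frac{1-t}{t}} \#_{1-t} X^{-1}$ for all $j$ and $k$. By the definition of the two-variable weighted geometric mean, this is equivalent to $A_j = A_k$ for all $j$ and $k$.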
Lemma 3.5.
Let and with for all and some positive scalars . Then for any .
Proof.
Assume that for all . Let and set . Since the congruence transformation and the map for preserve the Loewner order, we have , and . Then
[TABLE]
So by Theorem 3.2, and hence, . ∎
4. Inequalities of parameterized Wasserstein means
In the following we let and .
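Theorem 4.1.
For $t \in (0,1)$,
$$\| \Omega_t(\omega; \mathbb{A}) \| \leq \left( \sum_{j=1}^{n} w_j \| A_j \|^{1-t} \right)^{\frac{1}{1-t}},$$
where $\| \cdot \|$ denotes the operator norm.
Proof.
Let $X = \Omega_t(\omega; \mathbb{A})$. Then by (3.5), by the triangle inequality for the operator norm, by the fact that $\| A^t \| = \| A \|^t$ for any $A \in \mathbb{P}_m$ and $t \geq 0$, and by the sub-multiplicativity of the operator norm in [13, Section 5.6],
$$\| X \| \leq \sum_{j=1}^{n} w_j \left\| \left( X^{1/2} A_j^{\frac{1-t}{t}} X^{1/2} \right)^{t} \right\| = \sum_{j=1}^{n} w_j \left\| X^{1/2} A_j^{\frac{1-t}{t}} X^{1/2} \right\|^{t} \leq \sum_{j=1}^{n} w_j \| X \|^{t} \| A_j \|^{1-t}.$$
Hence, dividing both sides by $\| X \|^{t}$ and raising to the power $\frac{1}{1-t}$, we obtain the desired inequality. ∎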
Proposition 4.2.
For
[TABLE]
Proof.
Let . Then . Since the function for is convex on from [5, Theorem 1.5.8], we have
[TABLE]
By the simple calculation we obtain the desired inequality. ∎
Remark 4.3.
The arithmetic-Wasserstein mean inequality
[TABLE]
has been already proved in [7], and Proposition 4.2 for also yields the inequality.
Theorem 4.4.
The parameterized Wasserstein mean has the following lower and upper bounds with respect to the Loewner order:
[TABLE]
where the second inequality holds when is invertible.
Proof.
Let . By the two-variable arithmetic-geometric-harmonic mean inequalities we have
[TABLE]
Since the weighted sum is operator monotone,
[TABLE]
Solving the second inequality for , we obtain the upper bound for the parameterized Wasserstein mean. Taking inverse on both sides of the first inequality and applying the arithmetic-harmonic mean inequality, we have
[TABLE]
Solving this for , we obtain the lower bound for the parameterized Wasserstein mean. ∎
The Lie-Trotter-Kato product formula of two bounded operators is not only fundamental in various research areas such as Lie theory and operator algebras, but is also widely used for the Golden-Thompson trace inequality and majorization problems. It has been extended in [15] to multi-variable cases in terms of multi-variable operator means, which we call multivariate Lie-Trotter means. It has been proved there that any multi-variable mean satisfying (P10), the arithmetic-geometric-harmonic mean inequalities, is a multivariate Lie-Trotter mean. Even though the Wasserstein mean does not satisfy the Wasserstein-harmonic mean inequality, it has been proved in [14], by using another lower bound, that the Wasserstein mean is also a multivariate Lie-Trotter mean. As an application of Theorem 4.4 we now show that the parameterized Wasserstein mean is a multivariate Lie-Trotter mean.
Lemma 4.5.
For , let be a continuous map with . Then for any there exists a such that for all .
Proof.
Let . Since is a continuous map with , there exists a such that for all , where . That is,
[TABLE]
since , where denotes the th eigenvalue of in decreasing order. It implies that , so . Thus, . ∎
Theorem 4.6.
The parameterized Wasserstein mean satisfies
[TABLE]
where for , are differentiable curves with for all .
Proof.
Let and let be differentiable curves with for all . By Lemma 4.5 there exists a sufficiently small so that for all and . Then , and for any .
By Theorem 4.4, we have
[TABLE]
Taking logarithms, using the operator monotonicity of the logarithm map, and multiplying all terms by for , we get
[TABLE]
Note that
[TABLE]
Taking the limit as in (4.7), we obtain
[TABLE]
Since the logarithm map is diffeomorphic, we get the desired identity. By the similar argument for , we obtain the conclusion. ∎
The notions of operator convexity and concavity are characterized by Jensen type inequalities in [12]. For every contraction we have
[TABLE]
and
[TABLE]
For such that its inverse is a contraction,
[TABLE]
Theorem 4.7.
Let . Then
- (1)
* implies , and*
- (2)
* implies .*
Proof.
Let .
- (1)
Assume that . Then , and by (4.10)
[TABLE]
Thus, by (3.5) and the above inequality
[TABLE]
- (2)
Assume that . Then by (3.5) and (4.9)
[TABLE]
so . Thus, we obtain (2) by taking inverse on both sides.
∎
For let , the set of all matrices with entries in the field of complex numbers. We define a map as
[TABLE]
Then one can easily see that this map is a unital positive linear map.
Theorem 4.8.
Let . Let be the map satisfying the inequality
[TABLE]
If there exist positive scalars and such that for all , then
[TABLE]
for any , where .
Proof.
For some positive scalars and such that for all , we have that for by Lemma 3.5, and for all . So
[TABLE]
Applying Proposition 2.7.8 in [5] to the positive linear map , we obtain
[TABLE]
Equivalently, by Theorem 3.2
[TABLE]
Taking the congruence transformation by on both sides and applying the inequality (4.12), we obtain
[TABLE]
∎
5. Log-Majorization
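Let $x = (x_1, \dots, x_m)$ and $y = (y_1, \dots, y_m)$ be two $m$-tuples of nonnegative numbers. Let $x^{\downarrow} = (x_1^{\downarrow}, \dots, x_m^{\downarrow})$ be the decreasing rearrangement of $x$. If for all $1 \leq k \leq m$
$$\prod_{i=1}^{k} x_i^{\downarrow} \leq \prod_{i=1}^{k} y_i^{\downarrow},$$
then we say that $x$ is weakly log-majorized by $y$, and write $x \prec_{w \log} y$. If, in addition, equality holds for $k = m$, then we say that $x$ is log-majorized by $y$, and write $x \prec_{\log} y$.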
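The definition translates directly into a check on cumulative products of the decreasingly sorted entries. The sketch below is illustrative only: the function names and the relative tolerance `tol` are assumptions introduced to absorb floating-point rounding and do not come from the paper.

```python
import numpy as np

def weakly_log_majorized(x, y, tol=1e-12):
    """True if prod_{i<=k} x_i(sorted desc) <= prod_{i<=k} y_i(sorted desc) for every k."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    return bool(np.all(np.cumprod(xs) <= np.cumprod(ys) * (1.0 + tol)))

def log_majorized(x, y, tol=1e-12):
    """Weak log-majorization plus equality of the full products."""
    px = float(np.prod(x))
    py = float(np.prod(y))
    return weakly_log_majorized(x, y, tol) and abs(px - py) <= tol * max(px, py, 1.0)

# (2, 2) is log-majorized by (4, 1): 2 <= 4 and 2 * 2 == 4 * 1.
example_log = log_majorized([2.0, 2.0], [4.0, 1.0])
# (1, 1) is only weakly log-majorized by (4, 1): the full products differ.
example_weak = weakly_log_majorized([1.0, 1.0], [4.0, 1.0])
```

In the matrix setting the tuples are eigenvalue vectors, so a statement such as "$A$ is weakly log-majorized by $B$" can be tested by passing `np.linalg.eigvalsh(A)` and `np.linalg.eigvalsh(B)`.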
A standard technique in the theory of log-majorization is the use of antisymmetric tensor powers. For $1 \leq k \leq m$ we denote by $Q_{k,m}$ the set of multi-indices $\alpha = (\alpha_1, \dots, \alpha_k)$ with $1 \leq \alpha_1 < \cdots < \alpha_k \leq m$. Let $A$ be an $m \times m$ matrix and let $\alpha, \beta \in Q_{k,m}$. Then $A[\alpha | \beta]$ denotes the $k \times k$ matrix obtained from $A$ by picking its entries from the rows corresponding to $\alpha$ and the columns corresponding to $\beta$. Recall that $\wedge^k$ is a map assigning each $m \times m$ matrix $A$ to a $\binom{m}{k} \times \binom{m}{k}$ matrix whose $(\alpha, \beta)$th entry for $\alpha, \beta \in Q_{k,m}$ is given by $\det A[\alpha | \beta]$, where the elements of $Q_{k,m}$ are ordered lexicographically (the dictionary order). The antisymmetric tensor powers of positive definite matrices have several useful properties. Note that $\wedge^k (cI) = c^k I$ for any constant $c$, where $I$ is the identity matrix of the appropriate dimension, and
[TABLE]
The map is multiplicative, that is,
and .
So it is clear that for any and , and moreover, it can be extended to the symmetric weighted geometric means such as the ALM (Ando-Li-Mathias) mean, BMP (Bini-Meini-Poloni) mean, and Cartan mean :
[TABLE]
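The antisymmetric tensor power described above can be computed as the $k$-th compound matrix, whose entries are the $k \times k$ minors indexed by lexicographically ordered index sets. The following sketch builds it by brute force and checks two standard facts: multiplicativity (the Cauchy-Binet formula) and, for a symmetric positive definite matrix, that the largest eigenvalue of the compound equals the product of the $k$ largest eigenvalues of the original. The function name `compound` and the test matrices are illustrative choices, not from the paper.

```python
import numpy as np
from itertools import combinations

def compound(A, k):
    """k-th compound matrix: entry (alpha, beta) is the minor det A[alpha | beta]."""
    m = A.shape[0]
    index_sets = list(combinations(range(m), k))  # lexicographic order
    C = np.empty((len(index_sets), len(index_sets)), dtype=A.dtype)
    for r, alpha in enumerate(index_sets):
        for c, beta in enumerate(index_sets):
            C[r, c] = np.linalg.det(A[np.ix_(alpha, beta)])
    return C

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
B = np.array([[1.0, 0.5, 0.0], [0.5, 2.0, 0.3], [0.0, 0.3, 1.5]])
k = 2

# Multiplicativity: compound(AB) = compound(A) compound(B)
mult_gap = np.linalg.norm(compound(A @ B, k) - compound(A, k) @ compound(B, k))

# Largest eigenvalue of the compound vs. product of the k largest eigenvalues of A
top_compound = np.linalg.eigvalsh(compound(A, k)).max()
top_product = np.prod(np.sort(np.linalg.eigvalsh(A))[::-1][:k])
```

This second fact is what makes antisymmetric tensor powers useful for log-majorization: an operator-norm inequality between compounds of every order $k$ gives the inequalities between partial products of eigenvalues.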
It has been shown in [15] that the map satisfying (P10) the arithmetic-geometric-harmonic weighted mean inequalities for given is the multivariate Lie-Trotter mean, as an extended version of the Lie-Trotter-Kato formula:
[TABLE]
where for , are any differentiable curves with for all . In particular, taking for each we obtain
Lemma 5.1.
Let the map satisfy (P10) the arithmetic-geometric-harmonic weighted mean inequalities. Then for given and
[TABLE]
where is the log-Euclidean mean.
Theorem 5.2.
Let and . For ,
[TABLE]
Proof.
The first log-majorization has been proved in [6].
Let . Then . Since the function for is operator concave on from [5, Theorem 4.2.3],
[TABLE]
For the symmetric weighted geometric mean satisfying the monotonicity, (P10) and (5.13), we have
[TABLE]
and moreover,
[TABLE]
Assume that . Then , so
[TABLE]
By the Loewner-Heinz inequality, it implies that for
[TABLE]
Applying the monotonicity and (5.13) of the mean to (5.15), we have
[TABLE]
Taking power on both sides yields
[TABLE]
Letting and using Lemma 5.1, we obtain that .
We have shown that for , implies . This yields that , that is,
[TABLE]
From the determinantal inequality of parameterized Wasserstein mean in Theorem 3.3 (6), we can see that the above inequality still holds for . Hence, the log-Euclidean mean is weakly log-majorized by the parameterized Wasserstein mean . ∎
The following shows the weak log-majorization between the Cartan mean of powers of given positive definite matrices and the power of parameterized Wasserstein mean of given positive definite matrices.
Theorem 5.3.
Let and . For ,
[TABLE]
where for any and .
Proof.
Let . Then . Since the logarithmic function is operator concave by Exercise 4.2.5 in [5], we have
[TABLE]
By Theorem 2.2 , and by the multiplicativity of antisymmetric tensor power and (5.13)
[TABLE]
Assume that for . Taking the congruence transformation by on both sides of (5.16) and applying (4.9) yield
[TABLE]
Taking the congruence transformation by on both sides implies
[TABLE]
We have shown that for , implies that . Let . Then by the homogeneity of parameterized Wasserstein mean in Theorem 3.3 (2)
[TABLE]
It implies that
[TABLE]
that is, . Thus,
[TABLE]
By the determinantal inequality of parameterized Wasserstein mean in Theorem 3.3 (6), we obtain the weak log-majorization between and . ∎
Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (No. NRF-2018R1C1B6001394).
References
- [1] M. Agueh and G. Carlier, Barycenters in the Wasserstein space, SIAM J. Math. Anal. 43 (2011), 904-924.
- [2] P. C. Alvarez-Esteban, E. del Barrio, J. A. Cuesta-Albertos and C. Matran, A fixed point approach to barycenters in Wasserstein spaces, J. Math. Anal. Appl. 441 (2016), 744-762.
- [3] L. Ambrosio, N. Gigli and G. Savaré, Gradient flows in metric spaces and in the space of probability measures, 2nd edition, Birkhäuser, 2008.
- [4] T. Ando, C. K. Li and R. Mathias, Geometric means, Linear Algebra Appl. 385 (2004), 305-334.
- [5] R. Bhatia, Positive Definite Matrices, Princeton Series in Applied Mathematics, Princeton, 2007.
- [6] R. Bhatia, T. Jain and Y. Lim, Inequalities for the Wasserstein mean of positive definite matrices, to appear in Linear Algebra and Its Applications.
- [7] R. Bhatia, T. Jain and Y. Lim, On the Bures-Wasserstein distance between positive definite matrices, to appear in Expositiones Mathematicae.
- [8] R. Bhatia, T. Jain and Y. Lim, Strong convexity of sandwiched entropies and related optimization problems, in preparation.
