Parameterized Wasserstein mean with its properties

Sejong Kim

arXiv:1904.09385·math.FA·August 27, 2019

Open Access

TL;DR

This paper introduces a parameterized Wasserstein mean for positive definite matrices, exploring its properties, inequalities, and relations to other means, extending classical results like the Lie-Trotter-Kato formula.

Contribution

It proposes a new mean generalizing the Wasserstein mean, analyzes its properties, bounds, and majorization relations, extending existing mathematical frameworks.

Findings

01

Established norm inequalities and bounds for the mean.

02

Extended the Lie-Trotter-Kato formula to this new mean.

03

Proved log-majorization properties using the Cartan mean.

Abstract

A new least squares mean of positive definite matrices for the divergence associated with the sandwiched quasi-relative entropy has been introduced. It generalizes the well-known Wasserstein mean for covariance matrices of Gaussian distributions with mean zero, so we call it the parameterized Wasserstein mean. In this article we investigate a norm inequality for the parameterized Wasserstein mean, give its bounds with respect to the Loewner order, and show an extended version of the Lie-Trotter-Kato formula for the parameterized Wasserstein mean. Finally we show the log-majorization properties of the parameterized Wasserstein mean by using the Cartan mean.

Equations

$$G(\omega;\mathbb{A})=\underset{X\in\mathbb{P}_{m}}{\arg\min}\sum_{j=1}^{n}w_{j}\delta^{2}(X,A_{j}),$$

$$W_{2}(\mu,\nu):=\left[\inf_{\pi\in\Pi(\mu,\nu)}\int_{\mathbb{R}^{n}\times\mathbb{R}^{n}}\|x-y\|^{2}\,d\pi(x,y)\right]^{1/2},$$

$$\frac{1}{\sqrt{2}}W_{2}(\mu,\nu)=\left[\operatorname{tr}\left(\frac{A+B}{2}\right)-\operatorname{tr}\left(A^{1/2}BA^{1/2}\right)^{1/2}\right]^{1/2},$$

$$\Omega(\omega;\mathbb{A})=\underset{X\in\mathbb{P}_{m}}{\arg\min}\sum_{j=1}^{n}w_{j}d^{2}(X,A_{j}).$$

$$F_{t}(A,B)=\operatorname{tr}\left(A^{\frac{1-t}{2t}}BA^{\frac{1-t}{2t}}\right)^{t},\quad t\in(0,\infty).$$

$$\underset{X\in\mathbb{P}_{m}}{\arg\min}\sum_{j=1}^{n}w_{j}\left[\operatorname{tr}((1-t)A_{j}+tX)-F_{t}(A_{j},X)\right].$$

$$\begin{aligned}\omega_{\sigma}&:=(w_{\sigma(1)},\dots,w_{\sigma(n)})\in\Delta_{n},\\ \mathbb{A}_{\sigma}&:=(A_{\sigma(1)},\dots,A_{\sigma(n)})\in\mathbb{P}_{m}^{n},\\ M\mathbb{A}M^{*}&:=(MA_{1}M^{*},\dots,MA_{n}M^{*})\in\mathbb{P}_{m}^{n},\\ \mathbb{A}^{-1}&:=(A_{1}^{-1},\dots,A_{n}^{-1})\in\mathbb{P}_{m}^{n}.\end{aligned}$$

$$\mathcal{H}(\omega;\mathbb{A}):=\left(\sum_{i=1}^{n}w_{i}A_{i}^{-1}\right)^{-1}\leq\mathfrak{M}(\omega;\mathbb{A})\leq\sum_{i=1}^{n}w_{i}A_{i}=:\mathcal{A}(\omega;\mathbb{A}).$$

$$\mathfrak{M}(w_{1},w_{2};A,B)=A^{1/2}\left(A^{-1/2}BA^{-1/2}\right)^{w_{2}}A^{1/2}=:A\#_{w_{2}}B$$

$$G(\omega;A_{1},\dots,A_{n})=\underset{X\in\mathbb{P}_{m}}{\arg\min}\sum_{i=1}^{n}w_{i}\delta^{2}(X,A_{i}).$$

$$\sum_{i=1}^{n}w_{i}\log\left(X^{-1/2}A_{i}X^{-1/2}\right)=O.$$

$$\sum_{j=1}^{n}w_{j}\log A_{j}\leq 0\ \Longrightarrow\ \mathfrak{M}(\omega;\mathbb{A})\leq I$$

$$\underset{X\in\mathbb{P}_{m}}{\arg\min}\sum_{j=1}^{n}w_{j}\left[\operatorname{tr}((1-t)A_{j}+tX)-\operatorname{tr}\left(A_{j}^{\frac{1-t}{2t}}XA_{j}^{\frac{1-t}{2t}}\right)^{t}\right]$$

$$\nabla\varphi_{t}(X)=t\left[I-\sum_{j=1}^{n}w_{j}\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)\right].$$

$$\nabla\varphi_{t}(X)=0\ \Longleftrightarrow\ X=\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}.$$

$$\Omega_{t}(\omega;\mathbb{A})=\underset{X\in\mathbb{P}_{m}}{\arg\min}\sum_{j=1}^{n}w_{j}\left[\operatorname{tr}((1-t)A_{j}+tX)-\operatorname{tr}\left(A_{j}^{\frac{1-t}{2t}}XA_{j}^{\frac{1-t}{2t}}\right)^{t}\right].$$

$$\sum_{j=1}^{n}w_{j}\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)=I,$$

$$X=\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}.$$

$$\begin{aligned}\mathbb{A}^{p}&=(\underline{A_{1},\dots,A_{n}},\dots,\underline{A_{1},\dots,A_{n}})\in\mathbb{P}_{m}^{np},\\ \omega^{p}&=\frac{1}{p}(\underline{w_{1},\dots,w_{n}},\dots,\underline{w_{1},\dots,w_{n}})\in\Delta_{np},\end{aligned}$$

$$X\geq G\left(\omega;\left(X^{1/2}A_{1}^{\frac{1-t}{t}}X^{1/2}\right)^{t},\dots,\left(X^{1/2}A_{n}^{\frac{1-t}{t}}X^{1/2}\right)^{t}\right).$$

$$\det X\geq\prod_{j=1}^{n}\det\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{tw_{j}}=(\det X)^{t}\prod_{j=1}^{n}(\det A_{j})^{(1-t)w_{j}}.$$

$$0=\log\det\left[\sum_{j=1}^{n}w_{j}\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)\right]\geq\sum_{j=1}^{n}w_{j}\log\det\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)=(1-t)\sum_{j=1}^{n}w_{j}\log\det A_{j}-(1-t)\log\det X,$$

$$\alpha^{1-t}X^{t}\leq\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}\leq\beta^{1-t}X^{t}.$$

$$\|\Omega_{t}(\omega;\mathbb{A})\|\leq\left(\sum_{j=1}^{n}w_{j}\|A_{j}\|^{1-t}\right)^{\frac{1}{1-t}},$$

$$\|\Omega_{t}(\omega;\mathbb{A})\|=\|X\|=\left\|\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}\right\|\leq\sum_{j=1}^{n}w_{j}\left\|\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}\right\|\leq\|X\|^{t}\left[\sum_{j=1}^{n}w_{j}\|A_{j}\|^{1-t}\right].$$

$$\Omega_{t}(\omega;\mathbb{A})^{\frac{1-t}{t}}\leq\sum_{j=1}^{n}w_{j}A_{j}^{\frac{1-t}{t}}.$$

$$\Omega_{t}(\omega;\mathbb{A})^{\frac{1}{t}}=X^{\frac{1}{t}}=\left[\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}\right]^{\frac{1}{t}}\leq\sum_{j=1}^{n}w_{j}X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}.$$

$$\Omega_{1/2}(\omega;\mathbb{A})\leq\sum_{j=1}^{n}w_{j}A_{j}$$

$$\frac{1}{1-t}I-\frac{t}{1-t}\sum_{j=1}^{n}w_{j}A_{j}^{\frac{t-1}{t}}\leq\Omega_{t}(\omega;\mathbb{A})\leq\left[\frac{1}{1-t}I-\frac{t}{1-t}\sum_{j=1}^{n}w_{j}A_{j}^{\frac{1-t}{t}}\right]^{-1},$$

$$\left[tA_{j}^{-\frac{1-t}{t}}+(1-t)X\right]^{-1}\leq A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\leq tA_{j}^{\frac{1-t}{t}}+(1-t)X^{-1}.$$

Taxonomy

Topics: Mathematical Inequalities and Applications · Statistical Mechanics and Entropy · Geometric Analysis and Curvature Flows

Full text

Parameterized Wasserstein mean with its properties

Sejong Kim

Abstract.

A new least squares mean of positive definite matrices for the divergence associated with the sandwiched quasi-relative entropy has been introduced. It generalizes the well-known Wasserstein mean for covariance matrices of Gaussian distributions with mean zero, so we call it the parameterized Wasserstein mean. In this article we investigate a norm inequality for the parameterized Wasserstein mean, give its bounds with respect to the Loewner order, and show an extended version of the Lie-Trotter-Kato formula for the parameterized Wasserstein mean. Finally we show the log-majorization properties of the parameterized Wasserstein mean by using the Cartan mean.

Keywords: parameterized Wasserstein mean, Cartan mean, sandwiched quasi-relative entropy, log-majorization

1. Introduction

The Fréchet mean (or barycenter) is a natural average arising from the least squares mean when the space has a metric structure. On the other hand, it is not easy to know whether the Fréchet mean exists on a metric space. It has been known from [22], in general, that the Fréchet mean exists uniquely on the Hadamard space, which is the complete metric space satisfying the semi-parallelogram law. A typical and important example of the Hadamard space is the open convex cone $\mathbb{P}_{m}$ of $m\times m$ positive definite matrices equipped with the Riemannian trace metric $\delta(A,B)=\|\log A^{-1/2}BA^{-1/2}\|_{2}$ . For an $n$ -tuple $\mathbb{A}=(A_{1},\dots,A_{n})\in\mathbb{P}_{m}^{n}$ of positive definite matrices and a positive probability vector $\omega=(w_{1},\dots,w_{n})$ the Fréchet mean (also called the Cartan mean, Karcher mean)

$$G(\omega;\mathbb{A})=\underset{X\in\mathbb{P}_{m}}{\arg\min}\sum_{j=1}^{n}w_{j}\delta^{2}(X,A_{j})$$

has been widely studied in theoretical and computational aspects: see [16, 18, 19, 20, 21, 26].

In particular, the Wasserstein metric space of probability measures, together with its barycenters, has recently become important in a variety of research fields: see [3, 23, 24] and their bibliographies. There are several interesting results about Wasserstein barycenters on the set $\mathcal{P}^{2}(\mathbb{R}^{n})$ of all probability measures on the Euclidean space $\mathbb{R}^{n}$ with finite second moment [1, 2, 11], including the fixed point approach to the Wasserstein mean of Gaussian distributions. For $\mu,\nu\in\mathcal{P}^{2}(\mathbb{R}^{n})$ the $L_{2}$-Wasserstein metric is defined as

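$$W_{2}(\mu,\nu):=\left[\inf_{\pi\in\Pi(\mu,\nu)}\int_{\mathbb{R}^{n}\times\mathbb{R}^{n}}\|x-y\|^{2}\,d\pi(x,y)\right]^{1/2},$$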

where $\Pi(\mu,\nu)$ denotes the set of all couplings on $\mathbb{R}^{n}\times\mathbb{R}^{n}$ with marginals $\mu$ and $\nu$. In particular, the $L_{2}$-Wasserstein distance for two Gaussian distributions $\mu$ and $\nu$ with mean zero and covariance matrices $A,B$ is formulated as

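$$\frac{1}{\sqrt{2}}W_{2}(\mu,\nu)=\left[\operatorname{tr}\left(\frac{A+B}{2}\right)-\operatorname{tr}\left(A^{1/2}BA^{1/2}\right)^{1/2}\right]^{1/2},$$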

where we consider that $A$ and $B$ are $m\times m$ positive definite matrices. Note that this metric, denoted as $d(A,B)$ and called the Bures-Wasserstein distance, coincides with the Bures distance of density matrices in quantum information theory and is the matrix version of the Hellinger distance of probability vectors.
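As a numerical illustration, the Bures-Wasserstein distance can be evaluated directly from its closed form $d(A,B)=\left[\operatorname{tr}\frac{A+B}{2}-\operatorname{tr}(A^{1/2}BA^{1/2})^{1/2}\right]^{1/2}$. The sketch below assumes NumPy and Hermitian positive definite inputs; the helper names `spd_power` and `bures_wasserstein` are illustrative and do not come from the paper.

```python
import numpy as np


def spd_power(A, p):
    """A**p for a Hermitian positive definite matrix, via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(A)
    return (eigvecs * eigvals**p) @ eigvecs.conj().T


def bures_wasserstein(A, B):
    """d(A, B) = [tr((A + B)/2) - tr((A^{1/2} B A^{1/2})^{1/2})]^{1/2}."""
    root_a = spd_power(A, 0.5)
    cross = spd_power(root_a @ B @ root_a, 0.5)
    squared = 0.5 * np.trace(A + B).real - np.trace(cross).real
    return float(np.sqrt(max(squared, 0.0)))  # clip tiny negative round-off


# Commuting example: d^2 reduces to (1/2) * sum_i (sqrt(a_i) - sqrt(b_i))^2.
A = np.diag([1.0, 4.0])
B = np.diag([4.0, 9.0])
print(bures_wasserstein(A, B))
```

For commuting matrices the formula reduces to half the squared Euclidean distance between the square roots of the eigenvalues, which is the Hellinger-type expression mentioned above.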

For given $n$ -tuple $\mathbb{A}=(A_{1},\dots,A_{n})\in\mathbb{P}_{m}^{n}$ and a positive probability vector $\omega=(w_{1},\dots,w_{n})$ the Wasserstein mean is the least squares mean for the Bures-Wasserstein distance:

$$\Omega(\omega;\mathbb{A})=\underset{X\in\mathbb{P}_{m}}{\arg\min}\sum_{j=1}^{n}w_{j}d^{2}(X,A_{j}).$$

It has been shown that such a minimizer exists uniquely by using non-smooth analysis, convex duality and the theory of optimal transport [1] and by using matrix analysis [7]. Moreover, lots of interesting properties for the Wasserstein mean of positive definite matrices have been established: an iteration approach to the Wasserstein mean using the optimal transport map [2], a log-majorization property of the Wasserstein mean [6], and several inequalities (in terms of Loewner order and operator norm) and an extended version of Lie-Trotter-Kato formula for the Wasserstein mean [14].

In recent works the sandwiched quasi-relative entropy as a parameterized version of fidelity has been introduced in [10, 25]:

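$$F_{t}(A,B)=\operatorname{tr}\left(A^{\frac{1-t}{2t}}BA^{\frac{1-t}{2t}}\right)^{t},\quad t\in(0,\infty).$$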

Note that the usual fidelity is the case $t=1/2$ and it is a variant of the relative Rényi entropy. Furthermore, it has been shown in [8] that the sandwiched quasi-relative entropy $F_{t}$ is strictly concave and the following minimization problem

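$$\underset{X\in\mathbb{P}_{m}}{\arg\min}\sum_{j=1}^{n}w_{j}\left[\operatorname{tr}((1-t)A_{j}+tX)-F_{t}(A_{j},X)\right]$$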

has a unique solution by Brouwer’s fixed point theorem. So it generalizes the Wasserstein mean, which is the case $t=1/2$, and we call it the parameterized Wasserstein mean. In this paper we investigate a norm inequality for the parameterized Wasserstein mean, give bounds of the parameterized Wasserstein mean with respect to the Loewner order, and show that the parameterized Wasserstein mean satisfies an extended version of the Lie-Trotter-Kato formula. Finally, we show the log-majorization property of the parameterized Wasserstein mean by using the Cartan mean.

2. Symmetric weighted geometric mean

Let $\mathbb{H}_{m}$ be the real vector space of all $m\times m$ Hermitian matrices. Let $\mathbb{P}_{m}\subset\mathbb{H}_{m}$ be the open convex cone of all $m\times m$ positive definite matrices. The general linear group $GL_{m}$ of all $m\times m$ invertible matrices acts on $\mathbb{P}_{m}$ via congruence transformations $\Gamma_{M}(X)=MXM^{*}$ for $M\in GL_{m}$ and $X\in\mathbb{P}_{m}$ . For any $A,B\in\mathbb{H}_{m}$ we write $A\leq B$ if $B-A$ is positive semi-definite, and $A<B$ if $B-A$ is positive definite. This is indeed a partial order on $\mathbb{H}_{m}$ , known as the Loewner order.
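The Loewner order can be tested numerically through the smallest eigenvalue of $B-A$. The following is a minimal sketch assuming NumPy and Hermitian inputs; the function names and the tolerance `tol` are illustrative choices, not part of the paper.

```python
import numpy as np


def loewner_leq(A, B, tol=1e-12):
    """True if B - A is positive semi-definite (A <= B), up to a tolerance."""
    return bool(np.linalg.eigvalsh(B - A).min() >= -tol)


def loewner_lt(A, B, tol=1e-12):
    """True if B - A is positive definite (A < B), up to a tolerance."""
    return bool(np.linalg.eigvalsh(B - A).min() > tol)


# The order is only partial: these two matrices are not comparable.
A = np.diag([1.0, 2.0])
B = np.diag([2.0, 1.0])
print(loewner_leq(A, B), loewner_leq(B, A))
```

The example shows why the order is partial rather than total: neither $A\leq B$ nor $B\leq A$ holds for this pair.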

Let $\Delta_{n}$ be the simplex of positive probability vectors in $\mathbb{R}^{n}$ convexly spanned by the unit coordinate vectors. Let $\mathbb{A}=(A_{1},\dots,A_{n})\in\mathbb{P}_{m}^{n}$ , $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ , $\sigma\in S^{n}$ a permutation on $n$ -letters, and $M\in GL_{m}$ . For convenience, we denote as

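$$\begin{aligned}\omega_{\sigma}&:=(w_{\sigma(1)},\dots,w_{\sigma(n)})\in\Delta_{n},\\ \mathbb{A}_{\sigma}&:=(A_{\sigma(1)},\dots,A_{\sigma(n)})\in\mathbb{P}_{m}^{n},\\ M\mathbb{A}M^{*}&:=(MA_{1}M^{*},\dots,MA_{n}M^{*})\in\mathbb{P}_{m}^{n},\\ \mathbb{A}^{-1}&:=(A_{1}^{-1},\dots,A_{n}^{-1})\in\mathbb{P}_{m}^{n}.\end{aligned}$$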

Definition 2.1.

We define a symmetric weighted geometric mean of positive definite matrices to be a map $\mathfrak{M}:\Delta_{n}\times\mathbb{P}_{m}^{n}\to\mathbb{P}_{m}$ that satisfies the following properties: For $\mathbb{A}=(A_{1},\dots,A_{n}),\mathbb{B}=(B_{1},\dots,B_{n})\in\mathbb{P}_{m}^{n}$ , $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ , $\sigma\in S^{n}$ , $M\in GL_{m}$ , and $\mathbf{a}=(a_{1},\dots,a_{n})\in\mathbb{R}_{++}^{n}$ , where $\mathbb{R}_{++}:=(0,\infty)$ , these are

(P1)

(Consistency with scalars) $\mathfrak{M}(\omega;\mathbb{A})=A_{1}^{w_{1}}\cdots A_{n}^{w_{n}}$ if the $A_{i}$ ’s commute;

(P2)

(Joint homogeneity) $\mathfrak{M}(\omega;a_{1}A_{1},\dots,a_{n}A_{n})=a_{1}^{w_{1}}\cdots a_{n}^{w_{n}}\mathfrak{M}(\omega;\mathbb{A})$ ;

(P3)

(Permutation invariance) $\mathfrak{M}(\omega_{\sigma};\mathbb{A}_{\sigma})=\mathfrak{M}(\omega;\mathbb{A})$ ;

(P4)

(Monotonicity) If $B_{i}\leq A_{i}$ for all $1\leq i\leq n$ , then $\mathfrak{M}(\omega;\mathbb{B})\leq\mathfrak{M}(\omega;\mathbb{A})$ ;

(P5)

(Continuity) The map $\mathfrak{M}(\omega;\cdot)$ is continuous;

(P6)

(Congruence invariance) $\mathfrak{M}(\omega;M\mathbb{A}M^{*})=M\mathfrak{M}(\omega;\mathbb{A})M^{*}$ ;

(P7)

(Joint concavity) $\mathfrak{M}(\omega;\lambda\mathbb{A}+(1-\lambda)\mathbb{B})\geq\lambda\mathfrak{M}(\omega;\mathbb{A})+(1-\lambda)\mathfrak{M}(\omega;\mathbb{B})$ for $0\leq\lambda\leq 1$ ;

(P8)

(Self-duality) $\mathfrak{M}(\omega;\mathbb{A}^{-1})^{-1}=\mathfrak{M}(\omega;\mathbb{A})$ ;

(P9)

(Determinantal identity) $\displaystyle\det\mathfrak{M}(\omega;\mathbb{A})=\prod_{i=1}^{n}(\det A_{i})^{w_{i}}$ ;

(P10)

(Arithmetic-Geometric-Harmonic weighted mean inequalities)

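$$\mathcal{H}(\omega;\mathbb{A}):=\left(\sum_{i=1}^{n}w_{i}A_{i}^{-1}\right)^{-1}\leq\mathfrak{M}(\omega;\mathbb{A})\leq\sum_{i=1}^{n}w_{i}A_{i}=:\mathcal{A}(\omega;\mathbb{A}).$$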

A map $\mathfrak{M}$ satisfying (P1)-(P10) except (P3) is called an (asymmetric) weighted geometric mean.

Note that the two-variable weighted geometric mean

$$\mathfrak{M}(w_{1},w_{2};A,B)=A^{1/2}\left(A^{-1/2}BA^{-1/2}\right)^{w_{2}}A^{1/2}=:A\#_{w_{2}}B$$

is uniquely determined by (P1) and (P6), and also fulfils (P1)-(P10). Moreover, the two-variable weighted geometric mean $A\#_{w_{2}}B$ is the unique (up to parameterization) geodesic on the Hadamard space $\mathbb{P}_{m}$ with the Riemannian trace metric.
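The two-variable weighted geometric mean $A\#_{s}B=A^{1/2}(A^{-1/2}BA^{-1/2})^{s}A^{1/2}$ is easy to evaluate numerically, and properties such as (P9) and (P10) can be spot-checked on examples. The sketch below assumes NumPy; `spd_power` and `geometric_mean` are hypothetical helper names, and the test matrices are arbitrary.

```python
import numpy as np


def spd_power(A, p):
    """A**p for a Hermitian positive definite matrix, via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(A)
    return (eigvecs * eigvals**p) @ eigvecs.conj().T


def geometric_mean(A, B, s):
    """A #_s B = A^{1/2} (A^{-1/2} B A^{-1/2})^s A^{1/2}."""
    root = spd_power(A, 0.5)
    inv_root = spd_power(A, -0.5)
    inner = inv_root @ B @ inv_root
    inner = (inner + inner.conj().T) / 2  # remove round-off asymmetry
    return root @ spd_power(inner, s) @ root


A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, 3.0]])
s = 0.3  # weight w_2 on B, so w_1 = 1 - s on A

G = geometric_mean(A, B, s)
arithmetic = (1 - s) * A + s * B
harmonic = np.linalg.inv((1 - s) * np.linalg.inv(A) + s * np.linalg.inv(B))

# (P10): harmonic <= geometric <= arithmetic in the Loewner order.
print(np.linalg.eigvalsh(G - harmonic).min() >= -1e-10)
print(np.linalg.eigvalsh(arithmetic - G).min() >= -1e-10)
```

A numerical check on one pair of matrices is of course only a sanity test of the inequalities, not a substitute for their proofs.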

There are many different kinds of symmetric weighted geometric means on the open convex cone $\mathbb{P}_{m}$ including the Ando-Li-Mathias (ALM) mean [4] and Bini-Meini-Poloni (BMP) mean [9]. Among them a natural and canonical mean is the least squares mean, called the Cartan mean, which is the unique minimizer of the weighted sum of squares of the Riemannian trace metric $\delta$ :

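$$G(\omega;A_{1},\dots,A_{n})=\underset{X\in\mathbb{P}_{m}}{\arg\min}\sum_{i=1}^{n}w_{i}\delta^{2}(X,A_{i}).$$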

In [18], Lawson and Lim verified that the Cartan mean $G$ satisfies all the properties (P1)-(P10). Computing appropriate derivatives as in [5] yields that the Cartan mean $G(\omega;\mathbb{A})$ coincides with the unique solution $X\in\mathbb{P}_{m}$ of the Karcher equation

$$\sum_{i=1}^{n}w_{i}\log\left(X^{-1/2}A_{i}X^{-1/2}\right)=O.$$

Recently, Yamazaki [26] has shown a unique characterization of the Cartan mean among other symmetric weighted geometric means, and its generalization to the probability measures with finite second moment for the Riemannian trace metric has been proved in [17].

Theorem 2.2.

([17, 26]) Let the map $\mathfrak{M}:\Delta_{n}\times\mathbb{P}_{m}^{n}\to\mathbb{P}_{m}$ be the symmetric weighted geometric mean satisfying

$$\sum_{j=1}^{n}w_{j}\log A_{j}\leq 0\ \Longrightarrow\ \mathfrak{M}(\omega;\mathbb{A})\leq I\tag{2.3}$$

for any $\mathbb{A}=(A_{1},\dots,A_{n})\in\mathbb{P}_{m}^{n}$ and $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ . Then $\mathfrak{M}=G$ . Furthermore, the Cartan mean $G$ satisfies the property (2.3).

3. Parameterized Wasserstein means

Let $\mathbb{A}=(A_{1},\dots,A_{n})\in\mathbb{P}_{m}^{n}$ , and let $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ . For any $t\in(0,1)$ the following minimization problem

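$$\underset{X\in\mathbb{P}_{m}}{\arg\min}\sum_{j=1}^{n}w_{j}\left[\operatorname{tr}((1-t)A_{j}+tX)-\operatorname{tr}\left(A_{j}^{\frac{1-t}{2t}}XA_{j}^{\frac{1-t}{2t}}\right)^{t}\right]\tag{3.4}$$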

has been solved in [8], so it gives us a new multivariate matrix mean. We recall its known results in this section, and investigate more interesting consequences in the later sections.

Note that the quantity $\displaystyle F_{t}(A_{j},X)=\operatorname{tr}\left(A_{j}^{\frac{1-t}{2t}}XA_{j}^{\frac{1-t}{2t}}\right)^{t}$ , called the sandwiched quasi-relative entropy, is a parameterized version of fidelity since $F_{\frac{1}{2}}$ is the usual fidelity. Furthermore, the objective function $\displaystyle\varphi_{t}(X)=\sum_{j=1}^{n}w_{j}\left[\operatorname{tr}((1-t)A_{j}+tX)-\operatorname{tr}\left(A_{j}^{\frac{1-t}{2t}}XA_{j}^{\frac{1-t}{2t}}\right)^{t}\right]$ is strictly convex and its gradient is given by

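$$\nabla\varphi_{t}(X)=t\left[I-\sum_{j=1}^{n}w_{j}\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)\right].$$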
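To make the quantities $F_{t}$ and $\varphi_{t}$ concrete, the following sketch evaluates them numerically. It assumes NumPy and Hermitian positive definite inputs; the function names are illustrative and not taken from the paper.

```python
import numpy as np


def spd_power(A, p):
    """A**p for a Hermitian positive definite matrix, via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(A)
    return (eigvecs * eigvals**p) @ eigvecs.conj().T


def sandwiched_quasi_relative_entropy(A, X, t):
    """F_t(A, X) = tr (A^{(1-t)/(2t)} X A^{(1-t)/(2t)})^t."""
    half = spd_power(A, (1 - t) / (2 * t))
    inner = half @ X @ half
    inner = (inner + inner.conj().T) / 2  # remove round-off asymmetry
    return float(np.trace(spd_power(inner, t)).real)


def objective(weights, mats, X, t):
    """phi_t(X) = sum_j w_j [tr((1-t) A_j + t X) - F_t(A_j, X)]."""
    return sum(
        w * (np.trace((1 - t) * A + t * X).real
             - sandwiched_quasi_relative_entropy(A, X, t))
        for w, A in zip(weights, mats)
    )


A = np.array([[2.0, 1.0], [1.0, 2.0]])
# F_t(A, A) = tr A for every t, so phi_t vanishes when all A_j equal X.
print(sandwiched_quasi_relative_entropy(A, A, 0.3), objective([0.5, 0.5], [A, A], A, 0.3))
```

The identity $F_{t}(A,A)=\operatorname{tr}A$ follows from $A^{\frac{1-t}{2t}}AA^{\frac{1-t}{2t}}=A^{1/t}$, and it gives a quick sanity check of an implementation.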

To prove the existence and uniqueness of the minimization problem (3.4), it is enough to show that the equation $\nabla\varphi_{t}(X)=0$ has a positive definite solution. Note that

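$$\nabla\varphi_{t}(X)=0\ \Longleftrightarrow\ X=\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}.\tag{3.5}$$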

It has been shown in [8] that the map $H:\mathbb{P}_{m}\to\mathbb{P}_{m}$ defined by $\displaystyle H(X)=\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}$ is a self-map on the closed interval $[\alpha I,\beta I]:=\{X\in\mathbb{H}_{m}:\alpha I\leq X\leq\beta I\}$ , where

$\displaystyle\alpha:=\underset{1\leq j\leq n}{\min}\lambda_{\min}(A_{j})$ and $\displaystyle\beta:=\underset{1\leq j\leq n}{\max}\lambda_{\max}(A_{j})$.

We denote as $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ the smallest and largest eigenvalues of $A$ , respectively. By Brouwer’s fixed point theorem, the map $H$ has a fixed point. This yields the existence and uniqueness of the minimizer of (3.4).
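A natural numerical experiment is to iterate the map $H$ starting from the arithmetic mean. This is only a sketch: the source invokes $H$ for existence via Brouwer's theorem and does not claim that the plain iteration $X\leftarrow H(X)$ converges in general, so convergence here is an assumption that should be checked case by case. The code assumes NumPy, and all function names are hypothetical.

```python
import numpy as np


def spd_power(A, p):
    """A**p for a Hermitian positive definite matrix, via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(A)
    return (eigvecs * eigvals**p) @ eigvecs.conj().T


def fixed_point_map(weights, powered, X, t):
    """H(X) = sum_j w_j (X^{1/2} A_j^{(1-t)/t} X^{1/2})^t, with powered[j] = A_j^{(1-t)/t}."""
    root = spd_power(X, 0.5)
    total = np.zeros_like(X)
    for w, P in zip(weights, powered):
        inner = root @ P @ root
        total = total + w * spd_power((inner + inner.conj().T) / 2, t)
    return total


def parameterized_wasserstein_mean(weights, mats, t, iters=500, tol=1e-12):
    """Iterate X <- H(X) from the arithmetic mean (convergence is assumed, not proved)."""
    powered = [spd_power(A, (1 - t) / t) for A in mats]
    X = sum(w * A for w, A in zip(weights, mats))
    for _ in range(iters):
        X_new = fixed_point_map(weights, powered, X, t)
        if np.linalg.norm(X_new - X) <= tol * np.linalg.norm(X):
            return X_new
        X = X_new
    return X


# Commuting example: the mean should equal (sum_j w_j A_j^{1-t})^{1/(1-t)}.
mats = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])]
print(parameterized_wasserstein_mean([0.5, 0.5], mats, 0.5))
```

In the commuting case each eigenvalue evolves independently as $x\mapsto cx^{t}$, which is a contraction in logarithmic coordinates with factor $t$, so the iteration does converge there; this is the case used to sanity-check the sketch.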

Definition 3.1.

Let $\mathbb{A}=(A_{1},\dots,A_{n})\in\mathbb{P}_{m}^{n}$ and $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ . For $t\in(0,1)$ , the parameterized Wasserstein mean $\Omega_{t}(\omega;\mathbb{A})$ is defined as

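$$\Omega_{t}(\omega;\mathbb{A})=\underset{X\in\mathbb{P}_{m}}{\arg\min}\sum_{j=1}^{n}w_{j}\left[\operatorname{tr}((1-t)A_{j}+tX)-\operatorname{tr}\left(A_{j}^{\frac{1-t}{2t}}XA_{j}^{\frac{1-t}{2t}}\right)^{t}\right].$$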

Theorem 3.2.

The parameterized Wasserstein mean $\Omega_{t}(\omega;\mathbb{A})$ is the unique positive definite matrix $X\in\mathbb{P}_{m}$ satisfying that

$$\sum_{j=1}^{n}w_{j}\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)=I,\tag{3.6}$$

equivalently,

$$X=\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}.$$

For given $\mathbb{A}=(A_{1},\dots,A_{n})\in\mathbb{P}_{m}^{n}$ and $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ , we denote as

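$$\begin{aligned}\mathbb{A}^{p}&=(\underline{A_{1},\dots,A_{n}},\dots,\underline{A_{1},\dots,A_{n}})\in\mathbb{P}_{m}^{np},\\ \omega^{p}&=\frac{1}{p}(\underline{w_{1},\dots,w_{n}},\dots,\underline{w_{1},\dots,w_{n}})\in\Delta_{np},\end{aligned}$$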

where the number of blocks in the last expression is $p$ .

The following are some properties of parameterized Wasserstein mean, compared with those of the Cartan mean.

Theorem 3.3.

Properties of parameterized Wasserstein mean.

(1)

(Consistency with scalars) $\displaystyle\Omega_{t}(\omega;\mathbb{A})=\left(\sum_{j=1}^{n}w_{j}A_{j}^{1-t}\right)^{\frac{1}{1-t}}$ if the $A_{j}$’s commute.

(2)

(Homogeneity) $\displaystyle\Omega_{t}(\omega;\alpha\mathbb{A})=\alpha\Omega_{t}(\omega;\mathbb{A})$ for any positive scalar $\alpha$.

(3)

(Permutation invariance) $\displaystyle\Omega_{t}(\omega_{\sigma};\mathbb{A}_{\sigma})=\Omega_{t}(\omega;\mathbb{A})$ for any permutation $\sigma$ on $\{1,\dots,n\}$.

(4)

(Repetition invariance) $\displaystyle\Omega_{t}(\omega^{p};\mathbb{A}^{p})=\Omega_{t}(\omega;\mathbb{A})$ for any $p\in\mathbb{N}$.

(5)

(Unitary congruence invariance) $\displaystyle\Omega_{t}(\omega;U\mathbb{A}U^{*})=U\Omega_{t}(\omega;\mathbb{A})U^{*}$ for any unitary $U$.

(6)

(Determinantal inequality) $\displaystyle\det\Omega_{t}(\omega;\mathbb{A})\geq\prod_{j=1}^{n}(\det A_{j})^{w_{j}}$.

Moreover, $X=\Omega_{t}(\omega;A_{1},\dots,A_{n-1},X)$ if and only if $X=\Omega_{t}(\hat{\omega};A_{1},\dots,A_{n-1})$ , where $\displaystyle\hat{\omega}=\frac{1}{1-w_{n}}(w_{1},\dots,w_{n-1})\in\Delta_{n-1}$ .

Proof.

Most of the items follow from Theorem 3.2, so we prove only (1) and (6).

(1)

Assume that all $A_{j}$ ’s commute, so they are simultaneously diagonalizable. Set $\displaystyle X=\left(\sum_{j=1}^{n}w_{j}A_{j}^{1-t}\right)^{\frac{1}{1-t}}$ . Then $X$ also commutes with all the $A_{j}$ ’s, and is a solution of the equation (3.6). By uniqueness of the positive definite solution for the equation (3.6), $X=\Omega_{t}(\omega;\mathbb{A})$ .

(6)

Let $X=\Omega_{t}(\omega;\mathbb{A})$ . Then $\displaystyle X=\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}$ . By the arithmetic-Cartan mean inequality,

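$$X\geq G\left(\omega;\left(X^{1/2}A_{1}^{\frac{1-t}{t}}X^{1/2}\right)^{t},\dots,\left(X^{1/2}A_{n}^{\frac{1-t}{t}}X^{1/2}\right)^{t}\right).$$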

Applying Corollary 7.7.4 (e) in [13] and the determinantal identity of Cartan mean, we have

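$$\det X\geq\prod_{j=1}^{n}\det\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{tw_{j}}=(\det X)^{t}\prod_{j=1}^{n}(\det A_{j})^{(1-t)w_{j}}.$$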

Solving for $\det X$ , we obtain the desired inequality.
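The last step, solving for $\det X$, can be spelled out as follows. It uses only $\det X>0$, $0<t<1$, and the determinant chain $\det X\geq(\det X)^{t}\prod_{j}(\det A_{j})^{(1-t)w_{j}}$ from the proof of (6).

```latex
\begin{aligned}
\det X \geq (\det X)^{t}\prod_{j=1}^{n}(\det A_{j})^{(1-t)w_{j}}
&\iff (\det X)^{1-t} \geq \prod_{j=1}^{n}(\det A_{j})^{(1-t)w_{j}}
  && \text{(divide by } (\det X)^{t}>0\text{)}\\
&\iff \det X \geq \prod_{j=1}^{n}(\det A_{j})^{w_{j}}
  && \text{(raise to the power } \tfrac{1}{1-t}>0\text{)}.
\end{aligned}
```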

∎

Remark 3.4.

Using the strict concavity of the map $f:\mathbb{P}_{m}\to\mathbb{R},\ f(A)=\log\det A$ in Theorem 7.6.6 in [13], we can not only prove the determinantal inequality of the parameterized Wasserstein mean, but also obtain the condition under which equality holds. Indeed, applying the map $f$ to both sides of the equation (3.6) yields

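$$0=\log\det\left[\sum_{j=1}^{n}w_{j}\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)\right]\geq\sum_{j=1}^{n}w_{j}\log\det\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)=(1-t)\sum_{j=1}^{n}w_{j}\log\det A_{j}-(1-t)\log\det X,$$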

from which we get the inequality by solving for $\det X$. Moreover, equality in Theorem 3.3 (6) holds if and only if $A_{i}^{\frac{1-t}{t}}\#_{1-t}X^{-1}=A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}$ for all $i$ and $j$. By the definition of the two-variable weighted geometric mean this is equivalent to $A_{i}=A_{j}$ for all $i$ and $j$.

Lemma 3.5.

Let $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ and $\mathbb{A}=(A_{1},\dots,A_{n})\in\mathbb{P}_{m}^{n}$ with $0<\alpha I\leq A_{j}\leq\beta I$ for all $j$ and some positive scalars $\alpha,\beta$ . Then $\alpha I\leq\Omega_{t}(\omega;\mathbb{A})\leq\beta I$ for any $1/2\leq t<1$ .

Proof.

Assume that $0<\alpha I\leq A_{j}\leq\beta I$ for all $j=1,\dots,n$ . Let $1/2\leq t<1$ and set $X=\Omega_{t}(\omega;\mathbb{A})$ . Since the congruence transformation and the map $A\mapsto A^{r}$ for $r\in[0,1]$ preserve the Loewner order, we have $\alpha^{\frac{1-t}{t}}X\leq X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\leq\beta^{\frac{1-t}{t}}X$ , and $\alpha^{1-t}X^{t}\leq\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}\leq\beta^{1-t}X^{t}$ . Then

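$$\alpha^{1-t}X^{t}\leq\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}\leq\beta^{1-t}X^{t}.$$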

So $\alpha^{1-t}X^{t}\leq X\leq\beta^{1-t}X^{t}$ by Theorem 3.2, and hence, $\alpha I\leq X\leq\beta I$ . ∎

4. Inequalities of parameterized Wasserstein means

In the following we let $\mathbb{A}=(A_{1},\dots,A_{n})\in\mathbb{P}_{m}^{n}$ and $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ .

Theorem 4.1.

For $t\in(0,1)$

$$\|\Omega_{t}(\omega;\mathbb{A})\|\leq\left(\sum_{j=1}^{n}w_{j}\|A_{j}\|^{1-t}\right)^{\frac{1}{1-t}},$$

where $\|\cdot\|$ denotes the operator norm.

Proof.

Let $X=\Omega_{t}(\omega;\mathbb{A})$ . Then by (3.5), by the triangle inequality for the operator norm, by the fact that $\|A^{t}\|=\|A\|^{t}$ for any $A\in\mathbb{P}_{m}$ and $t\geq 0$ , and by the sub-multiplicativity for the operator norm in [13, Section 5.6]

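$$\|\Omega_{t}(\omega;\mathbb{A})\|=\|X\|=\left\|\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}\right\|\leq\sum_{j=1}^{n}w_{j}\left\|\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}\right\|\leq\|X\|^{t}\left[\sum_{j=1}^{n}w_{j}\|A_{j}\|^{1-t}\right].$$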

Hence, by simplification for $\|X\|$ , we obtain the desired inequality. ∎
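For completeness, the simplification for $\|X\|$ can be written out as below. The abbreviation $c:=\sum_{j=1}^{n}w_{j}\|A_{j}\|^{1-t}$ is introduced here only for readability and is not notation from the paper.

```latex
\begin{aligned}
\|X\| \leq \|X\|^{t}\,c
&\implies \|X\|^{1-t} \leq c
  && \text{(divide by } \|X\|^{t}>0\text{)}\\
&\implies \|X\| \leq c^{\frac{1}{1-t}}
  = \left(\sum_{j=1}^{n}w_{j}\|A_{j}\|^{1-t}\right)^{\frac{1}{1-t}}
  && \text{(since } s\mapsto s^{\frac{1}{1-t}} \text{ is increasing on } (0,\infty)\text{)}.
\end{aligned}
```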

Proposition 4.2.

For $1/2\leq t<1$

$$\Omega_{t}(\omega;\mathbb{A})^{\frac{1-t}{t}}\leq\sum_{j=1}^{n}w_{j}A_{j}^{\frac{1-t}{t}}.$$

Proof.

Let $X=\Omega_{t}(\omega;\mathbb{A})$ . Then $\displaystyle X=\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}$ . Since the function $f(A)=A^{r}$ for $1\leq r\leq 2$ is convex on $\mathbb{P}_{m}$ from [5, Theorem 1.5.8], we have

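$$\Omega_{t}(\omega;\mathbb{A})^{\frac{1}{t}}=X^{\frac{1}{t}}=\left[\sum_{j=1}^{n}w_{j}\left(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}\right)^{t}\right]^{\frac{1}{t}}\leq\sum_{j=1}^{n}w_{j}X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2}.$$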

By the simple calculation we obtain the desired inequality. ∎

Remark 4.3.

The arithmetic-Wasserstein mean inequality

$$\Omega_{1/2}(\omega;\mathbb{A})\leq\sum_{j=1}^{n}w_{j}A_{j}$$

has been already proved in [7], and Proposition 4.2 for $t=1/2$ also yields the inequality.

Theorem 4.4.

The parameterized Wasserstein mean has the following lower and upper bounds with respect to the Loewner order:

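$$\frac{1}{1-t}I-\frac{t}{1-t}\sum_{j=1}^{n}w_{j}A_{j}^{\frac{t-1}{t}}\leq\Omega_{t}(\omega;\mathbb{A})\leq\left[\frac{1}{1-t}I-\frac{t}{1-t}\sum_{j=1}^{n}w_{j}A_{j}^{\frac{1-t}{t}}\right]^{-1},$$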

where the second inequality holds when $\displaystyle I-t\sum_{j=1}^{n}w_{j}A_{j}^{\frac{1-t}{t}}$ is invertible.

Proof.

Let $X=\Omega_{t}(\omega;\mathbb{A})$ . By the two-variable arithmetic-geometric-harmonic mean inequalities we have

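$$\left[tA_{j}^{-\frac{1-t}{t}}+(1-t)X\right]^{-1}\leq A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\leq tA_{j}^{\frac{1-t}{t}}+(1-t)X^{-1}.$$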

Since the weighted sum is operator monotone,

[TABLE]

Solving the second inequality for $X$ , we obtain the upper bound for the parameterized Wasserstein mean. Taking inverse on both sides of the first inequality and applying the arithmetic-harmonic mean inequality, we have

[TABLE]

Solving this for $X$ , we obtain the lower bound for the parameterized Wasserstein mean. ∎

The Lie-Trotter-Kato product formula of two bounded operators is not only fundamental in various research areas such as Lie theory and operator algebra, but is also widely used for the Golden-Thompson trace inequality and majorization problems. It has been extended in [15] to multi-variable cases in terms of multi-variable operator means, which we call multivariate Lie-Trotter means. It has been proved that a multi-variable mean satisfying (P10), the arithmetic-geometric-harmonic mean inequalities, is a multivariate Lie-Trotter mean. Even though the Wasserstein mean does not satisfy the Wasserstein-harmonic mean inequality, it has been proved in [14], by using another lower bound, that the Wasserstein mean is also a multivariate Lie-Trotter mean. As an application of Theorem 4.4 we now show that the parameterized Wasserstein mean is a multivariate Lie-Trotter mean.

Lemma 4.5.

For $\epsilon>0$ , let $\gamma:(-\epsilon,\epsilon)\to\mathbb{P}_{m}$ be a continuous map with $\gamma(0)=I$ . Then for any $t\in(0,1)$ there exists a $\delta>0$ such that $\gamma(s)^{\frac{1-t}{t}}>tI$ for all $s\in(-\delta,\delta)$ .

Proof.

Let $t\in(0,1)$ . Since $\gamma:(-\epsilon,\epsilon)\to\mathbb{P}_{m}$ is a continuous map with $\gamma(0)=I$ , there exists a $\delta>0$ such that $\gamma(s)\in B_{r}(I)=\{A\in\mathbb{H}_{m}:\|A-I\|<r\}$ for all $s\in(-\delta,\delta)$ , where $r:=1-t^{\frac{t}{1-t}}>0$ . That is,

[TABLE]

since $\gamma(s)-I\in\mathbb{H}_{m}$ , where $\lambda_{i}(A)$ denotes the $i$ th eigenvalue of $A\in\mathbb{H}_{m}$ in decreasing order. It implies that $\lambda_{i}(\gamma(s))>1-r=t^{\frac{t}{1-t}}$ , so $\gamma(s)>t^{\frac{t}{1-t}}I$ . Thus, $\gamma(s)^{\frac{1-t}{t}}>tI$ . ∎

Theorem 4.6.

The parameterized Wasserstein mean satisfies

[TABLE]

where for $\epsilon>0$ , $\gamma_{j}:(-\epsilon,\epsilon)\to\mathbb{P}_{m}$ are differentiable curves with $\gamma_{j}(0)=I$ for all $j=1,\dots,n$ .

Proof.

Let $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ and let $\gamma_{1},\dots,\gamma_{n}:(-\epsilon,\epsilon)\to\mathbb{P}_{m}$ be differentiable curves with $\gamma_{j}(0)=I$ for all $j$ . By Lemma 4.5 there exists a sufficiently small $\delta>0$ so that $\gamma_{j}(s)^{\frac{1-t}{t}}>tI$ for all $j$ and $s\in(-\delta,\delta)$ . Then $\displaystyle\sum_{j=1}^{n}w_{j}\gamma_{j}(s)^{\frac{t-1}{t}}<\frac{1}{t}I$ , and $\displaystyle I-t\sum_{j=1}^{n}w_{j}\gamma_{j}(s)^{\frac{t-1}{t}}>0$ for any $s\in(-\delta,\delta)$ .

By Theorem 4.4, we have

[TABLE]

Taking logarithms, using the operator monotonicity of the logarithm map, and multiplying all terms by $1/s$ for $s>0$ , we get

[TABLE]

Note that

[TABLE]

Taking the limit as $s\to 0^{+}$ in (4.7), we obtain

[TABLE]

Since the logarithm map $\log:\mathbb{P}_{m}\to\mathbb{H}_{m}$ is a diffeomorphism, we get the desired identity. By a similar argument for $s<0$, we obtain the conclusion. ∎

The notions of operator convexity and concavity are characterized by Jensen type inequalities in [12]. For every contraction $X$ we have

[TABLE]

and

[TABLE]

For $X\in GL_{m}$ such that its inverse $X^{-1}$ is a contraction,

[TABLE]

Theorem 4.7.

Let $t\in(0,1)$ . Then

(1)

$\Omega_{t}(\omega;\mathbb{A})\geq I$ implies $\displaystyle\mathcal{A}(\omega;A_{1}^{1-t},\dots,A_{n}^{1-t})\geq I$, and

(2)

$\Omega_{t}(\omega;\mathbb{A})\leq I$ implies $\Omega_{t}(\omega;\mathbb{A})\leq\mathcal{H}(\omega;A_{1}^{t-1},\dots,A_{n}^{t-1})$.

Proof.

Let $t\in(0,1)$ .

(1)

Assume that $X=\Omega_{t}(\omega;\mathbb{A})\geq I$ . Then $X^{-1}\leq I$ , and by (4.10)

[TABLE]

Thus, by (3.5) and the above inequality

[TABLE]

(2)

Assume that $X=\Omega_{t}(\omega;\mathbb{A})\leq I$ . Then by (3.5) and (4.9)

[TABLE]

so $\displaystyle X^{-1}\geq\sum_{j=1}^{n}w_{j}A_{j}^{1-t}$ . Thus, we obtain (2) by taking inverse on both sides.

∎

For $1\leq i,j\leq n$ let $A_{ij}\in M_{m}$ , the set of all $m\times m$ matrices with entries in the field of complex numbers. We define a map $\Phi:M_{n}(M_{m})\to M_{m}$ as

[TABLE]

Then one can easily see that $\Phi$ is a unital positive linear map.

Theorem 4.8.

Let $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ . Let $\mathfrak{M}^{\omega}=\mathfrak{M}(\omega;\cdot):\mathbb{P}_{m}^{n}\to\mathbb{P}_{m}$ be the map satisfying the inequality

[TABLE]

If there exist positive scalars $\alpha$ and $\beta$ such that $0<\alpha I\leq A_{j}\leq\beta I$ for all $j$ , then

[TABLE]

for any $1/2\leq t<1$ , where $X=\Omega_{t}(\omega;A_{1},\dots,A_{n})$ .

Proof.

For some positive scalars $\alpha$ and $\beta$ such that $0<\alpha I\leq A_{j}\leq\beta I$ for all $j$ , we have that $\alpha I\leq X=\Omega_{t}(\omega;A_{1},\dots,A_{n})\leq\beta I$ for $1/2\leq t<1$ by Lemma 3.5, and $\alpha I\leq(X^{1/2}A_{j}^{\frac{1-t}{t}}X^{1/2})^{t}\leq\beta I$ for all $j$ . So

[TABLE]

Applying Proposition 2.7.8 in [5] to the positive linear map $\Phi$ , we obtain

[TABLE]

Equivalently, by Theorem 3.2

[TABLE]

Taking the congruence transformation by $X^{-1/2}$ on both sides and applying the inequality (4.12), we obtain

[TABLE]

∎

5. Log-Majorization

Let $\mathbf{x}=(x_{1},\dots,x_{m})$ and $\mathbf{y}=(y_{1},\dots,y_{m})$ be two $m$ -tuples of nonnegative numbers. Let $x_{1}^{\downarrow}\geq x_{2}^{\downarrow}\geq\cdots\geq x_{m}^{\downarrow}$ be the decreasing rearrangement of $x_{1},\dots,x_{m}$ . If for all $1\leq k\leq m$

$$\prod_{i=1}^{k}x_{i}^{\downarrow}\leq\prod_{i=1}^{k}y_{i}^{\downarrow},$$

then we say that $\mathbf{x}$ is weakly log-majorized by $\mathbf{y}$ , and write it as $\displaystyle\mathbf{x}\prec_{w\log}\mathbf{y}$ . In addition, if the equality holds for $k=m$ , then we say that $\mathbf{x}$ is log-majorized by $\mathbf{y}$ , and write it as $\displaystyle\mathbf{x}\prec_{\log}\mathbf{y}$ .
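The definition translates into a comparison of cumulative products of the decreasingly sorted entries. The sketch below assumes NumPy and the standard reading of the condition, $\prod_{i=1}^{k}x_{i}^{\downarrow}\leq\prod_{i=1}^{k}y_{i}^{\downarrow}$ for $1\leq k\leq m$; the function names and the relative tolerances are illustrative.

```python
import numpy as np


def weakly_log_majorized(x, y, rtol=1e-12):
    """True if prod_{i<=k} x_i (sorted decreasingly) <= prod_{i<=k} y_i for every k."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    return bool(np.all(np.cumprod(xs) <= np.cumprod(ys) * (1 + rtol)))


def log_majorized(x, y, rtol=1e-9):
    """Weak log-majorization plus equality of the full products (the case k = m)."""
    full_x = np.prod(np.asarray(x, dtype=float))
    full_y = np.prod(np.asarray(y, dtype=float))
    return weakly_log_majorized(x, y) and bool(np.isclose(full_x, full_y, rtol=rtol))


print(log_majorized([2.0, 2.0], [4.0, 1.0]))         # partial products 2 <= 4, 4 <= 4
print(weakly_log_majorized([4.0, 1.0], [2.0, 2.0]))  # fails already at k = 1
```

In the matrix setting the vectors are eigenvalue tuples, so one would pass `np.linalg.eigvalsh(A)` and `np.linalg.eigvalsh(B)` for positive definite $A$ and $B$.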

A standard technique in the theory of log-majorization is the use of antisymmetric tensor powers. For $1\leq k\leq m$ we denote by $Q_{k,m}$ the set of multi-indices $\alpha=(\alpha_{1},\dots,\alpha_{k})$ with $1\leq\alpha_{1}<\cdots<\alpha_{k}\leq m$. Let $A\in M_{m}$ and let $\alpha,\beta\in Q_{k,m}$. Then $A[\alpha|\beta]$ denotes the matrix obtained from $A$ by picking its entries from the rows corresponding to $\alpha$ and the columns corresponding to $\beta$. Recall that $\Lambda^{k}$ is a map assigning to each $A\in M_{m}$ a $\binom{m}{k}\times\binom{m}{k}$ matrix $\Lambda^{k}A$ whose $(\alpha,\beta)$th entry for $\alpha,\beta\in Q_{k,m}$ is given by $\det A[\alpha|\beta]$, where the elements of $Q_{k,m}$ are ordered lexicographically (the dictionary order). The antisymmetric tensor powers of positive definite matrices have several useful properties. Note that $\Lambda^{k}(cI)=c^{k}I$ for any constant $c$, where $I$ is the identity matrix of the appropriate dimension, and

[TABLE]

The map $\mathbb{P}_{m}\ni A\mapsto\Lambda^{k}A$ is multiplicative, that is,

$\Lambda^{k}(AB)=(\Lambda^{k}A)(\Lambda^{k}B)$ and $(\Lambda^{k}A)^{r}=\Lambda^{k}A^{r},\ r\in(-\infty,\infty)$ .
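The antisymmetric tensor power can be computed directly from its definition through $k\times k$ minors. The sketch below is an illustration under stated assumptions: the helper name `compound`, the random test matrices and the choice $k=2$ are not from the paper. It checks the multiplicativity $\Lambda^{k}(AB)=(\Lambda^{k}A)(\Lambda^{k}B)$ and the standard fact that the largest eigenvalue of $\Lambda^{k}A$ is the product of the $k$ largest eigenvalues of $A$.

```python
from itertools import combinations
import numpy as np

def compound(A, k):
    """k-th antisymmetric tensor power of a square matrix A, built from k x k minors."""
    m = A.shape[0]
    idx = list(combinations(range(m), k))  # Q_{k,m} in lexicographic order
    C = np.empty((len(idx), len(idx)))
    for r, alpha in enumerate(idx):
        for c, beta in enumerate(idx):
            C[r, c] = np.linalg.det(A[np.ix_(alpha, beta)])  # det A[alpha|beta]
    return C

rng = np.random.default_rng(0)
M1, M2 = rng.standard_normal((2, 4, 4))
A = M1 @ M1.T + np.eye(4)  # positive definite test matrices (hypothetical data)
B = M2 @ M2.T + np.eye(4)
k = 2

# Multiplicativity of the k-th antisymmetric tensor power
lhs = compound(A @ B, k)
rhs = compound(A, k) @ compound(B, k)
mult_rel_err = np.linalg.norm(lhs - rhs) / np.linalg.norm(lhs)

# Largest eigenvalue of the compound vs. product of the k largest eigenvalues of A
top = np.linalg.eigvalsh(compound(A, k))[-1]
prod_k = np.prod(np.sort(np.linalg.eigvalsh(A))[::-1][:k])
```

Building all minors costs on the order of $\binom{m}{k}^{2}$ determinant evaluations, so this direct construction is only practical for small $m$.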

So it is clear that $\Lambda^{k}(A\#_{t}B)=(\Lambda^{k}A)\#_{t}(\Lambda^{k}B)$ for any $A,B\in\mathbb{P}_{m}$ and $t\in[0,1]$ , and moreover, it can be extended to the symmetric weighted geometric means $\mathfrak{M}$ such as the ALM (Ando-Li-Mathias) mean, BMP (Bini-Meini-Poloni) mean, and Cartan mean $G$ :

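$\displaystyle\Lambda^{k}\mathfrak{M}(\omega;A_{1},\dots,A_{n})=\mathfrak{M}(\omega;\Lambda^{k}A_{1},\dots,\Lambda^{k}A_{n}).$ (5.13)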

It has been shown in [15] that the map $\mathfrak{M}(\omega;\cdot):\mathbb{P}_{m}^{n}\to\mathbb{P}_{m}$ satisfying (P10) the arithmetic-geometric-harmonic weighted mean inequalities for given $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ is the multivariate Lie-Trotter mean, as an extended version of the Lie-Trotter-Kato formula:

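$\displaystyle\lim_{s\to 0}\mathfrak{M}(\omega;\gamma_{1}(s),\dots,\gamma_{n}(s))^{1/s}=\exp\left(\sum_{j=1}^{n}w_{j}\gamma_{j}^{\prime}(0)\right),$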

where for $\epsilon>0$ , $\gamma_{j}:(-\epsilon,\epsilon)\to\mathbb{P}_{m}$ are any differentiable curves with $\gamma_{j}(0)=I$ for all $j$ . In particular, taking $\gamma_{j}(s)=A_{j}^{s}$ for each $A_{j}\in\mathbb{P}_{m}$ we obtain the following lemma.

Lemma 5.1.

Let the map $\mathfrak{M}(\omega;\cdot):\mathbb{P}_{m}^{n}\to\mathbb{P}_{m}$ satisfy (P10) the arithmetic-geometric-harmonic weighted mean inequalities. Then for given $\mathbb{A}=(A_{1},\dots,A_{n})\in\mathbb{P}_{m}^{n}$ and $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$

$\displaystyle\lim_{s\to 0}\mathfrak{M}(\omega;A_{1}^{s},\dots,A_{n}^{s})^{1/s}=L(\omega;\mathbb{A}),$

where $\displaystyle L(\omega;\mathbb{A})=\exp\left(\sum_{j=1}^{n}w_{j}\log A_{j}\right)$ is the log-Euclidean mean.
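The limit in Lemma 5.1 can be observed numerically in the simplest case $n=2$ , taking $\mathfrak{M}$ to be the two-variable weighted geometric mean $A\#_{w}B$ with weights $(1-w,w)$ . This is only a sketch: the helper names, the $2\times 2$ test matrices and the value $w=0.3$ are hypothetical choices made for illustration.

```python
import numpy as np

def fun(A, f):
    """Apply a scalar function f to a symmetric positive definite matrix A."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * f(vals)) @ vecs.T

def power(A, r):
    """Real power A^r of a positive definite matrix."""
    return fun(A, lambda x: x ** r)

def geo_mean(A, B, w):
    """Two-variable weighted geometric mean A #_w B."""
    Ah, Aih = power(A, 0.5), power(A, -0.5)
    return Ah @ power(Aih @ B @ Aih, w) @ Ah

def log_euclidean(A, B, w):
    """Log-Euclidean mean exp((1 - w) log A + w log B)."""
    return fun((1 - w) * fun(A, np.log) + w * fun(B, np.log), np.exp)

A = np.array([[2.0, 1.0], [1.0, 3.0]])    # non-commuting positive definite pair
B = np.array([[1.0, -0.5], [-0.5, 4.0]])
w = 0.3
L = log_euclidean(A, B, w)

# Distance between (A^s #_w B^s)^{1/s} and the log-Euclidean mean for shrinking s
errors = []
for s in (1.0, 0.1, 0.01):
    approx = power(geo_mean(power(A, s), power(B, s), w), 1.0 / s)
    errors.append(np.linalg.norm(approx - L))
```

The list `errors` records how far the rescaled geometric mean is from $L(\omega;\mathbb{A})$ as $s$ decreases; the lemma asserts that this distance tends to zero.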

Theorem 5.2.

Let $\mathbb{A}=(A_{1},\dots,A_{n})\in\mathbb{P}_{m}^{n}$ and $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ . For $t\in(0,1)$ ,

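$\displaystyle\lambda(G(\omega;\mathbb{A}))\prec_{\log}\lambda(L(\omega;\mathbb{A}))\prec_{w\log}\lambda(\Omega_{t}(\omega;\mathbb{A})).$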

Proof.

The first log-majorization $\lambda(G(\omega;\mathbb{A}))\prec_{\log}\lambda(L(\omega;\mathbb{A}))$ has been proved in [6].

Let $X=\Omega_{t}(\omega;\mathbb{A})$ . Then $\displaystyle I=\sum_{j=1}^{n}w_{j}\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)$ . Since the function $f(A)=A^{s}$ for $0<s<1$ is operator concave on $\mathbb{P}_{m}$ from [5, Theorem 4.2.3],

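$\displaystyle I=\left(\sum_{j=1}^{n}w_{j}\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)\right)^{s}\geq\sum_{j=1}^{n}w_{j}\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)^{s}.$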

For the symmetric weighted geometric mean $\mathfrak{M}$ satisfying the monotonicity, (P10) and (5.13), we have

[TABLE]

and moreover,

[TABLE]

Assume that $\Lambda^{k}X\leq I$ . Then $\Lambda^{k}X^{-1}\geq I$ , so

[TABLE]

By the Loewner-Heinz inequality, it implies that for $0<s<1$

[TABLE]

Applying the monotonicity and (5.13) of the mean $\mathfrak{M}$ to (5.15), we have

[TABLE]

Taking $\frac{1}{(1-t)s}$ power on both sides yields

[TABLE]

Letting $s\to 0$ and using Lemma 5.1, we obtain that $\Lambda^{k}L(\omega;\mathbb{A})\leq I$ .

We have shown that for $1\leq k<m$ , $\Lambda^{k}\Omega_{t}(\omega;\mathbb{A})\leq I$ implies $\Lambda^{k}L(\omega;\mathbb{A})\leq I$ . This yields that $\lambda_{1}^{\downarrow}(\Lambda^{k}L(\omega;\mathbb{A}))\leq\lambda_{1}^{\downarrow}(\Lambda^{k}\Omega_{t}(\omega;\mathbb{A}))$ , that is,

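$\displaystyle\prod_{i=1}^{k}\lambda_{i}^{\downarrow}(L(\omega;\mathbb{A}))\leq\prod_{i=1}^{k}\lambda_{i}^{\downarrow}(\Omega_{t}(\omega;\mathbb{A})),\quad 1\leq k<m.$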

From the determinantal inequality for the parameterized Wasserstein mean in Theorem 3.3 (6), the above inequality also holds for $k=m$ . Hence, the log-Euclidean mean $L(\omega;\mathbb{A})$ is weakly log-majorized by the parameterized Wasserstein mean $\Omega_{t}(\omega;\mathbb{A})$ . ∎
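As an informal sanity check, not part of the original proof, the scalar case $m=1$ can be worked out by hand. The sketch below assumes only the equation $I=\sum_{j}w_{j}(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1})$ used in the proof and the scalar identity $a\#_{1-t}b=a^{t}b^{1-t}$ ; the symbols $a_{j}$ and $x$ are introduced here for illustration.

```latex
% Scalar case m = 1: A_j = a_j > 0 and X = x > 0 (illustrative notation).
\begin{align*}
1 &= \sum_{j=1}^{n} w_j \left(a_j^{\frac{1-t}{t}}\right)^{t} \left(x^{-1}\right)^{1-t}
   = x^{-(1-t)} \sum_{j=1}^{n} w_j a_j^{1-t}, \\
\Omega_t(\omega; a_1,\dots,a_n) &= x
   = \Bigl(\sum_{j=1}^{n} w_j a_j^{1-t}\Bigr)^{\frac{1}{1-t}}, \\
G(\omega; a_1,\dots,a_n) &= L(\omega; a_1,\dots,a_n)
   = \prod_{j=1}^{n} a_j^{w_j}
   \le \Bigl(\sum_{j=1}^{n} w_j a_j^{1-t}\Bigr)^{\frac{1}{1-t}}.
\end{align*}
```

The last inequality is the weighted arithmetic-geometric mean inequality applied to the numbers $a_{j}^{1-t}$ , so in the scalar case Theorem 5.2 reduces to a comparison between the geometric mean and the power mean of order $1-t$ .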

The following theorem establishes the weak log-majorization between the Cartan mean of the $p$ th powers of given positive definite matrices and the $p$ th power of their parameterized Wasserstein mean, where $p=1-t\in(0,1)$ .

Theorem 5.3.

Let $\mathbb{A}=(A_{1},\dots,A_{n})\in\mathbb{P}_{m}^{n}$ and $\omega=(w_{1},\dots,w_{n})\in\Delta_{n}$ . For $t\in(0,1)$ ,

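$\displaystyle\lambda(G(\omega;A_{1}^{1-t},\dots,A_{n}^{1-t}))\prec_{w\log}\lambda(\Omega_{t}(\omega;\mathbb{A}))^{1-t},$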

where $\lambda(A)^{r}:=(\lambda_{1}^{r}(A),\dots,\lambda_{m}^{r}(A))$ for any $A\in\mathbb{P}_{m}$ and $r\in\mathbb{R}$ .

Proof.

Let $X=\Omega_{t}(\omega;\mathbb{A})$ . Then $\displaystyle I=\sum_{j=1}^{n}w_{j}\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)$ . Since the logarithmic function $\log:\mathbb{P}_{m}\to\mathbb{H}_{m}$ is operator concave by Exercise 4.2.5 in [5], we have

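$\displaystyle 0=\log\left(\sum_{j=1}^{n}w_{j}\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right)\right)\geq\sum_{j=1}^{n}w_{j}\log\left(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1}\right).$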

By Theorem 2.2 $G(\omega;A_{1}^{\frac{1-t}{t}}\#_{1-t}X^{-1},\dots,A_{n}^{\frac{1-t}{t}}\#_{1-t}X^{-1})\leq I$ , and by the multiplicativity of antisymmetric tensor power and (5.13)

[TABLE]

Assume that $\Lambda^{k}X\leq I$ for $1\leq k\leq m$ . Taking the congruence transformation by $(\Lambda^{k}X)^{1/2}$ on both sides of (5.16) and applying (4.9) yield

[TABLE]

Taking the congruence transformation by $(\Lambda^{k}X)^{-1/2}$ on both sides implies

[TABLE]

We have shown that for $1\leq k\leq m$ , $\Lambda^{k}\Omega_{t}(\omega;\mathbb{A})\leq I$ implies that $\Lambda^{k}G(\omega;A_{1}^{1-t},\dots,A_{n}^{1-t})\leq I$ . Let $\alpha=\lambda_{1}^{\downarrow}(\Lambda^{k}\Omega_{t}(\omega;\mathbb{A}))^{1/k}$ . Then by the homogeneity of parameterized Wasserstein mean in Theorem 3.3 (2)

[TABLE]

It implies that

[TABLE]

that is, $\Lambda^{k}G(\omega;A_{1}^{1-t},\dots,A_{n}^{1-t})\leq\lambda_{1}^{\downarrow}(\Lambda^{k}\Omega_{t}(\omega;\mathbb{A}))^{1-t}I$ . Thus,

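$\displaystyle\prod_{i=1}^{k}\lambda_{i}^{\downarrow}(G(\omega;A_{1}^{1-t},\dots,A_{n}^{1-t}))\leq\prod_{i=1}^{k}\lambda_{i}^{\downarrow}(\Omega_{t}(\omega;\mathbb{A}))^{1-t}.$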

By the determinantal inequality of parameterized Wasserstein mean in Theorem 3.3 (6), we obtain the weak log-majorization between $G(\omega;A_{1}^{1-t},\dots,A_{n}^{1-t})$ and $\Omega_{t}(\omega;\mathbb{A})^{1-t}$ . ∎
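Theorem 5.3 can be illustrated numerically for commuting, for instance diagonal, matrices. The sketch below rests on two reductions that are assumptions of this example rather than statements from the paper: for commuting matrices the equation $I=\sum_{j}w_{j}(A_{j}^{\frac{1-t}{t}}\#_{1-t}X^{-1})$ becomes $X^{1-t}=\sum_{j}w_{j}A_{j}^{1-t}$ , and the Cartan mean becomes the entrywise weighted geometric mean. The data, the weights and $t=0.7$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, t = 3, 4, 0.7
w = np.array([0.2, 0.3, 0.5])           # weights in the simplex
a = rng.uniform(0.5, 5.0, size=(n, m))  # row j holds the diagonal of A_j

# Assumed commuting reduction: Omega_t^{1-t} = sum_j w_j A_j^{1-t}
omega_pow = w @ a ** (1 - t)
# Cartan mean of the powers A_j^{1-t}: entrywise weighted geometric mean
cartan = np.exp(w @ np.log(a ** (1 - t)))

def top_products(v):
    """Products of the k largest entries of v for k = 1, ..., m."""
    return np.cumprod(np.sort(v)[::-1])

# Weak log-majorization of the eigenvalues of G by those of Omega_t^{1-t}
holds = bool(np.all(top_products(cartan) <= top_products(omega_pow) * (1 + 1e-12)))
```

In this commuting setting the comparison follows entrywise from the weighted arithmetic-geometric mean inequality, so the check is a consistency test of the statement rather than evidence for the general non-commuting case.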

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (No. NRF-2018R1C1B6001394).

Bibliography

[1] M. Agueh and G. Carlier, Barycenters in the Wasserstein space, SIAM J. Math. Anal. 43 (2011), 904-924.
[2] P. C. Alvarez-Esteban, E. del Barrio, J. A. Cuesta-Albertos and C. Matran, A fixed point approach to barycenters in Wasserstein spaces, J. Math. Anal. Appl. 441 (2016), 744-762.
[3] L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows in metric spaces and in the space of probability measures, 2nd edition, Birkhäuser, 2008.
[4] T. Ando, C. K. Li and R. Mathias, Geometric means, Linear Algebra Appl. 385 (2004), 305-334.
[5] R. Bhatia, Positive Definite Matrices, Princeton Series in Applied Mathematics, Princeton, 2007.
[6] R. Bhatia, T. Jain and Y. Lim, Inequalities for the Wasserstein mean of positive definite matrices, to appear in Linear Algebra Appl.
[7] R. Bhatia, T. Jain and Y. Lim, On the Bures-Wasserstein distance between positive definite matrices, to appear in Expo. Math.
[8] R. Bhatia, T. Jain and Y. Lim, Strong convexity of sandwiched entropies and related optimization problems, in preparation.
