Quantum Hellinger distances revisited

J\'ozsef Pitrik; D\'aniel Virosztek

arXiv:1903.10455·math-ph·July 29, 2020


TL;DR

This paper introduces generalized quantum Hellinger divergences involving Kubo-Ando means, explores their properties, and characterizes barycenters, clarifying previous claims about their form in non-commuting cases.

Contribution

It extends quantum Hellinger distances by defining a family of divergences with Kubo-Ando means and characterizes their barycenters, correcting prior assumptions for non-commuting operators.

Findings

01

Generalized divergences are jointly convex and satisfy data processing inequality.

02

Barycenters are characterized as weighted multivariate 1/2-power means in commuting cases.

03

The previously claimed barycenter form does not hold for non-commuting operators.

Abstract

This short note aims to study quantum Hellinger distances investigated recently by Bhatia et al. [Lett. Math. Phys. 109 (2019), 1777-1804] with a particular emphasis on barycenters. We introduce the family of generalized quantum Hellinger divergences, that are of the form $ϕ (A, B) = Tr ((1 - c) A + c B - A σ B),$ where $σ$ is an arbitrary Kubo-Ando mean, and $c \in (0, 1)$ is the weight of $σ .$ We note that these divergences belong to the family of maximal quantum $f$ -divergences, and hence are jointly convex and satisfy the data processing inequality (DPI). We derive a characterization of the barycenter of finitely many positive definite operators for these generalized quantum Hellinger divergences. We note that the characterization of the barycenter as the weighted multivariate $1/2$ -power mean, that was claimed in the work of Bhatia et al. mentioned…

Equations

$$d_{H}^{2}(\rho,\sigma)=\frac{1}{2}\int_{X}\left(\left(\frac{\mathrm{d}\rho}{\mathrm{d}\mu}\right)^{\frac{1}{2}}-\left(\frac{\mathrm{d}\sigma}{\mathrm{d}\mu}\right)^{\frac{1}{2}}\right)^{2}\mathrm{d}\mu,$$

$$d_{H}^{2}(A,B)=\mathrm{Tr}\left(\frac{1}{2}(A+B)-A\#B\right),$$

$$\phi(A,B)=\mathrm{Tr}\left((1-c)A+cB-A\sigma B\right),$$

$$f(x)=\int_{[0,\infty]}\frac{x(1+t)}{x+t}\,\mathrm{d}m(t)\qquad(x>0),$$

$$f_{\mu}(x)=\int_{[0,1]}\frac{x}{(1-\lambda)x+\lambda}\,\mathrm{d}\mu(\lambda)\qquad(x>0),$$

$$A\sigma_{f_{\mu}}B=A^{\frac{1}{2}}f_{\mu}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}}\right)A^{\frac{1}{2}}.$$

$$a_{\lambda}(x)=(1-\lambda)+\lambda x,\quad g_{\lambda}(x)=x^{\lambda},\quad\text{and}\quad h_{\lambda}(x)=\left((1-\lambda)+\lambda x^{-1}\right)^{-1},$$

$$\phi_{\mu}(A,B):=\mathrm{Tr}\left(\left(1-c(\mu)\right)A+c(\mu)B-A\sigma_{f_{\mu}}B\right),$$

$$\phi_{\mu}(A,B)=\mathrm{Tr}\left(\frac{1}{2}(A+B)-A\#B\right),$$

$$\phi_{\mu}(A,B)=\mathrm{Tr}\left\{A\cdot g_{\mu}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}}\right)\right\},$$

$$g_{\mu}(x)=\left(1-c(\mu)\right)+c(\mu)x-f_{\mu}(x).$$

$$g_{\mu}(x)=\left(1-c(\mu)\right)+c(\mu)x+h_{\mu}(x)=h_{\mu}(x)-h_{\mu}(1)-h_{\mu}^{\prime}(1)(x-1).$$

$$H_{h_{\mu}}^{(op)}(X,Y)=h_{\mu}(X)-h_{\mu}(Y)-\mathbf{D}h_{\mu}(Y)[X-Y].$$

$$H_{h_{\mu}}^{(op)}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}},I\right)=h_{\mu}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}}\right)-h_{\mu}(I)-\mathbf{D}h_{\mu}(I)\left[A^{-\frac{1}{2}}BA^{-\frac{1}{2}}-I\right].$$

$$H_{h_{\mu}}^{(op)}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}},I\right)=g_{\mu}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}}\right).$$

$$\phi_{\mu}(A,B)=\mathrm{Tr}\left\{A\cdot H_{h_{\mu}}^{(op)}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}},I\right)\right\}\qquad\left(A,B\in\mathcal{B}(\mathcal{H})^{sa}\right).$$

$$\phi_{\mu}:\mathcal{B}(\mathcal{H})^{++}\times\mathcal{B}(\mathcal{H})^{++}\rightarrow[0,\infty);\quad(A,B)\mapsto\phi_{\mu}(A,B)$$

$$\phi_{\mu}\left(T(A),T(B)\right)\leq\phi_{\mu}(A,B)$$

$$\operatorname*{arg\,min}_{x\in X}\sum_{j=1}^{m}w_{j}\rho^{2}\left(a_{j},x\right).$$

$$\operatorname*{arg\,min}_{X\in\mathcal{B}(\mathcal{H})^{++}}\sum_{j=1}^{m}w_{j}\phi_{\mu}\left(A_{j},X\right),$$

$$X\mapsto\phi_{\mu}(A,X)=\mathrm{Tr}\left(\left(1-c(\mu)\right)A+c(\mu)X-A^{\frac{1}{2}}f_{\mu}\left(A^{-\frac{1}{2}}XA^{-\frac{1}{2}}\right)A^{\frac{1}{2}}\right)$$

$$\mathbf{D}\left(\sum_{j=1}^{m}w_{j}\phi_{\mu}\left(A_{j},\cdot\right)\right)(X_{0})[Y]=0\qquad\left(Y\in\mathcal{B}(\mathcal{H})^{sa}\right).$$

$$\mathbf{D}\left(\sum_{j=1}^{m}w_{j}\phi_{\mu}\left(A_{j},\cdot\right)\right)(X)[Y]=c(\mu)\,\mathrm{Tr}\,Y-\sum_{j=1}^{m}w_{j}\,\mathrm{Tr}\,\mathbf{D}F_{\mu,A_{j}}(X)[Y],$$

$$F_{\mu,A}(X):=A\sigma_{f_{\mu}}X=A^{\frac{1}{2}}f_{\mu}\left(A^{-\frac{1}{2}}XA^{-\frac{1}{2}}\right)A^{\frac{1}{2}}.$$

$$\mathbf{D}f_{\mu}(X)[Y]=\int_{[0,1]}\lambda\left((1-\lambda)X+\lambda I\right)^{-1}Y\left((1-\lambda)X+\lambda I\right)^{-1}\mathrm{d}\mu(\lambda)$$

$$\begin{aligned}\mathbf{D}F_{\mu,A_{j}}(X)[Y]&=\int_{[0,1]}\lambda A_{j}^{\frac{1}{2}}\left((1-\lambda)A_{j}^{-\frac{1}{2}}XA_{j}^{-\frac{1}{2}}+\lambda I\right)^{-1}A_{j}^{-\frac{1}{2}}YA_{j}^{-\frac{1}{2}}\left((1-\lambda)A_{j}^{-\frac{1}{2}}XA_{j}^{-\frac{1}{2}}+\lambda I\right)^{-1}A_{j}^{\frac{1}{2}}\,\mathrm{d}\mu(\lambda)\\&=\int_{[0,1]}\lambda\left((1-\lambda)XA_{j}^{-1}+\lambda I\right)^{-1}Y\left((1-\lambda)A_{j}^{-1}X+\lambda I\right)^{-1}\mathrm{d}\mu(\lambda).\end{aligned}$$

$$\mathrm{Tr}\left[Y\left(c(\mu)I-\sum_{j=1}^{m}w_{j}\int_{[0,1]}\lambda\left|(1-\lambda)A_{j}^{-1}X+\lambda I\right|^{-2}\mathrm{d}\mu(\lambda)\right)\right]=0\qquad\left(Y\in\mathcal{B}(\mathcal{H})^{sa}\right),$$

$$c(\mu)I=\sum_{j=1}^{m}w_{j}\int_{[0,1]}\lambda\left|(1-\lambda)A_{j}^{-1}X+\lambda I\right|^{-2}\mathrm{d}\mu(\lambda).$$


Full text

Quantum Hellinger distances revisited

József Pitrik

MTA-BME Lendület (Momentum) Quantum Information Theory Research Group, and Department of Analysis, Institute of Mathematics

Budapest University of Technology and Economics

H-1521 Budapest, Hungary

http://www.math.bme.hu/~pitrik

and

Dániel Virosztek

Institute of Science and Technology Austria

Am Campus 1, 3400 Klosterneuburg, Austria

http://pub.ist.ac.at/~dviroszt

Abstract.

This short note aims to study quantum Hellinger distances investigated recently by Bhatia et al. [8] with a particular emphasis on barycenters. We introduce the family of generalized quantum Hellinger divergences that are of the form $\phi(A,B)=\mathrm{Tr}\left((1-c)A+cB-A\sigma B\right),$ where $\sigma$ is an arbitrary Kubo-Ando mean, and $c\in(0,1)$ is the weight of $\sigma.$ We note that these divergences belong to the family of maximal quantum $f$ -divergences, and hence are jointly convex, and satisfy the data processing inequality (DPI). We derive a characterization of the barycenter of finitely many positive definite operators for these generalized quantum Hellinger divergences. We note that the characterization of the barycenter as the weighted multivariate $1/2$ -power mean, that was claimed in [8], is true in the case of commuting operators, but it is not correct in the general case.

Key words and phrases:

quantum Hellinger distance, Kubo-Ando mean, weighted multivariate mean, barycenter, data processing inequality, convexity

2010 Mathematics Subject Classification:

Primary: 47A64. Secondary: 15A24, 81Q10.

J. Pitrik was supported by the Hungarian Academy of Sciences Lendület-Momentum grant for Quantum Information Theory, no. 96 141, and by the Hungarian National Research, Development and Innovation Office (NKFIH) via grants no. K119442, no. K124152, and no. KH129601. D. Virosztek was supported by the ISTFELLOW program of the Institute of Science and Technology Austria (project code IC1027FELL01), by the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie Grant Agreement No. 846294, and partially supported by the Hungarian National Research, Development and Innovation Office (NKFIH) via grants no. K124152, and no. KH129601.

1. Introduction

1.1. Motivation, goals

Given a measure space $\left(X,\mathcal{A},\mu\right)$ and probability measures $\rho$ and $\sigma$ that are absolutely continuous with respect to $\mu,$ the classical squared Hellinger distance or Hellinger divergence of $\rho$ and $\sigma$ is defined as

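$$d_{H}^{2}(\rho,\sigma)=\frac{1}{2}\int_{X}\left(\left(\frac{\mathrm{d}\rho}{\mathrm{d}\mu}\right)^{\frac{1}{2}}-\left(\frac{\mathrm{d}\sigma}{\mathrm{d}\mu}\right)^{\frac{1}{2}}\right)^{2}\mathrm{d}\mu,\tag{1}$$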

where $\mathrm{d}\rho/\mathrm{d}\mu$ and $\mathrm{d}\sigma/\mathrm{d}\mu$ denote the Radon–Nikodym derivatives [16]. The Hellinger divergence is a special Csiszár-Morimoto $f$ -divergence [12, 24] generated by the convex function $f(x)=\left(\sqrt{x}-1\right)^{2},$ and it has several possible counterparts in quantum information theory. One of them is the squared Bures distance or Wasserstein metric, see, e.g., the most recent works of Bhatia et al. [10], Dinh et al. [13], and Molnár [23]. Another important quantum analogue of the classical Hellinger divergence has been investigated in [8], namely the quantity

$$d_{H}^{2}(A,B)=\mathrm{Tr}\left(\frac{1}{2}(A+B)-A\#B\right),\tag{2}$$

where $A,B$ are density operators representing quantum states, or even more generally, positive operators, and $\#$ is the geometric mean introduced by Pusz and Woronowicz [28], which is a particularly important Kubo-Ando mean [3, 4, 6].
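The quantity (2) is straightforward to evaluate numerically. The following is a minimal sketch, assuming NumPy and using the standard formula $A\#B=A^{\frac{1}{2}}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}}\right)^{\frac{1}{2}}A^{\frac{1}{2}}$ for the geometric mean; the helper names and the test matrices are illustrative and do not come from the paper.

```python
import numpy as np

def psd_power(M, p):
    """M**p for a Hermitian positive definite matrix, via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * vals**p) @ vecs.conj().T

def geometric_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    A_half, A_inv_half = psd_power(A, 0.5), psd_power(A, -0.5)
    return A_half @ psd_power(A_inv_half @ B @ A_inv_half, 0.5) @ A_half

def hellinger_sq(A, B):
    """d_H^2(A, B) = Tr((A + B)/2 - A # B)."""
    return float(np.trace(0.5 * (A + B) - geometric_mean(A, B)).real)

# Illustrative inputs (not taken from the paper).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, 3.0]])
print(hellinger_sq(A, B), hellinger_sq(A, A))
```

For commuting arguments the geometric mean reduces to the entrywise $\sqrt{ab}$ on the common eigenbasis, which gives an easy sanity check.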

In this note, we introduce a far-reaching generalization of the quantum Hellinger divergence (2), namely, the family of generalized quantum Hellinger divergences of the form

$$\phi(A,B)=\mathrm{Tr}\left((1-c)A+cB-A\sigma B\right),\tag{3}$$

where $\sigma$ is an arbitrary Kubo-Ando mean, and $c\in(0,1)$ is the weight of $\sigma.$ We will note that these divergences belong to the family of maximal quantum $f$ -divergences, and hence are jointly convex, and satisfy the data processing inequality (DPI). Moreover, we will show an intimate relation between generalized quantum Hellinger divergences and operator valued Bregman divergences (Claim 2). By this close relation, we verify in Claim 3, that generalized quantum Hellinger divergences are genuine divergences in the sense of [1, Sec. 1.2 & 1.3]. Note that this is not the case for maximal quantum $f$ -divergences in general, see Remark 1. As the main result of this paper, we derive a characterization of the barycenter of finitely many positive definite operators for these generalized quantum Hellinger divergences. We will also note that the characterization of the barycenter as the weighted multivariate power mean of order $1/2$ , that was claimed in the work of Bhatia et al. [8, Thm. 9], is true in the case of commuting operators, but it is not correct in the general case.

1.2. Basic notions, notation

Operator monotone functions mapping the positive half-line $(0,\infty)$ into itself admit a transparent integral-representation by Löwner’s theory. In the seminal paper of Kubo and Ando [4], the following integral representation was considered:

$$f(x)=\int_{[0,\infty]}\frac{x(1+t)}{x+t}\,\mathrm{d}m(t)\qquad(x>0),\tag{4}$$

where $m$ is some positive Radon measure on the extended half-line $[0,\infty].$ By a simple push-forward of $m$ by the transformation $T:[0,\infty]\rightarrow[0,1];\,t\mapsto\lambda:=\frac{t}{t+1},$ we get the following integral-representation of positive operator monotone functions on $(0,\infty):$

$$f_{\mu}(x)=\int_{[0,1]}\frac{x}{(1-\lambda)x+\lambda}\,\mathrm{d}\mu(\lambda)\qquad(x>0),\tag{5}$$

where $\mu=T_{\#}m,$ that is, $\mu(A)=m\left(T^{-1}(A)\right)$ for every Borel set $A\subseteq[0,1].$ This representation is also well-known and appears — among others — in [15] and [30]. Note that if $m$ is absolutely continuous with respect to the Lebesgue measure and $\mathrm{d}m(t)=\rho(t)\mathrm{d}t,$ then the density of $\mu=T_{\#}m$ is given by $\mathrm{d}\mu(\lambda)=\frac{1}{(1-\lambda)^{2}}\rho\left(\frac{\lambda}{1-\lambda}\right)\mathrm{d}\lambda.$
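The passage from (4) to (5) is a direct substitution. The intermediate steps below are our own worked check, not taken from the paper.

```latex
% Substitute \lambda = t/(t+1), i.e. t = \lambda/(1-\lambda), for \lambda \in [0,1):
\[
1+t=\frac{1}{1-\lambda},
\qquad
x+t=\frac{(1-\lambda)x+\lambda}{1-\lambda},
\qquad\text{hence}\qquad
\frac{x(1+t)}{x+t}=\frac{x}{(1-\lambda)x+\lambda}.
\]
% Jacobian of the substitution, which produces the stated density of \mu:
\[
\frac{\mathrm{d}\lambda}{\mathrm{d}t}=\frac{1}{(1+t)^{2}}=(1-\lambda)^{2},
\qquad\text{so}\qquad
\rho(t)\,\mathrm{d}t=\frac{1}{(1-\lambda)^{2}}\,\rho\!\left(\frac{\lambda}{1-\lambda}\right)\mathrm{d}\lambda .
\]
```

At the endpoint $t=\infty$ the integrand of (4) tends to $x$, which matches the value of the integrand of (5) at $\lambda=1$.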

Throughout this note, $\mathcal{H}$ stands for a finite dimensional complex Hilbert space, $\mathcal{B}(\mathcal{H})$ denotes the set of all linear operators on $\mathcal{H},$ and $\mathcal{B}(\mathcal{H})^{sa}$ and $\mathcal{B}(\mathcal{H})^{++}$ stand for the set of all self-adjoint and positive definite operators, respectively. On $\mathcal{B}(\mathcal{H})^{sa}$ we consider the usual Löwner order induced by positivity. The Fréchet derivative of a map $\psi:\mathcal{B}(\mathcal{H})^{sa}\supseteq\mathcal{U}\rightarrow\mathcal{V}$ at the point $X\in\mathcal{U}$ is denoted by $\mathbf{D}\psi(X)[\cdot].$ Here, $\mathcal{U}$ is an open subset of $\mathcal{B}(\mathcal{H})^{sa},$ usually the cone of positive definite operators, and the target space $\mathcal{V}$ is usually $\mathbb{R}$ or $\mathcal{B}(\mathcal{H})^{sa}.$ Note that in the latter case $\mathbf{D}\psi(X)[\cdot]$ is a linear map from $\mathcal{B}(\mathcal{H})^{sa}$ into itself. The symbol $I$ denotes the identity operator on $\mathcal{H}.$

For positive definite operators $A,B\in\mathcal{B}(\mathcal{H})^{++},$ the Kubo-Ando connection generated by the operator monotone function $f_{\mu}:(0,\infty)\rightarrow(0,\infty)$ is denoted by $A\sigma_{f_{\mu}}B,$ and is defined by

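$$A\sigma_{f_{\mu}}B=A^{\frac{1}{2}}f_{\mu}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}}\right)A^{\frac{1}{2}}.$$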

A Kubo-Ando connection $\sigma_{f_{\mu}}$ is a mean if and only if $f_{\mu}(1)=\mu\left([0,1]\right)=1.$ In the sequel, we will restrict our attention to means. We denote by $\mathcal{P}\left([0,1]\right)$ the set of all Borel probability measures on $[0,1],$ and by $c\left(\mu\right):=\int_{[0,1]}\lambda\mathrm{d}\mu(\lambda)$ the center of mass of $\mu.$ There is a natural way to assign a weight parameter to a mean $\sigma_{f_{\mu}},$ namely, $W\left(\sigma_{f_{\mu}}\right):=f_{\mu}^{\prime}(1)=c\left(\mu\right).$ More details about this weight parameter can be found in [30]; we only mention that for the weighted arithmetic, geometric, and harmonic means generated by

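$$a_{\lambda}(x)=(1-\lambda)+\lambda x,\quad g_{\lambda}(x)=x^{\lambda},\quad\text{and}\quad h_{\lambda}(x)=\left((1-\lambda)+\lambda x^{-1}\right)^{-1},$$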

respectively, we have $W\left(\sigma_{a_{\lambda}}\right)=W\left(\sigma_{g_{\lambda}}\right)=W\left(\sigma_{h_{\lambda}}\right)=\lambda.$ That is, this weight parameter coincides with the usual one in the most important special cases.
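The identity $W(\sigma_{f})=f^{\prime}(1)=\lambda$ for the three classical generating functions can be checked with a finite difference. This is only an illustrative sketch; the function names and the step size `h` are our own choices.

```python
def arithmetic(x, lam):
    """a_lambda(x) = (1 - lambda) + lambda * x."""
    return (1 - lam) + lam * x

def geometric(x, lam):
    """g_lambda(x) = x ** lambda."""
    return x ** lam

def harmonic(x, lam):
    """h_lambda(x) = ((1 - lambda) + lambda / x) ** (-1)."""
    return 1.0 / ((1 - lam) + lam / x)

def derivative_at_one(f, lam, h=1e-6):
    """Central finite-difference approximation of f'(1)."""
    return (f(1 + h, lam) - f(1 - h, lam)) / (2 * h)

lam = 0.3  # illustrative weight
weights = [derivative_at_one(f, lam) for f in (arithmetic, geometric, harmonic)]
print(weights)  # each entry should be close to lam
```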

1.3. Convex order

The convex order is a well-known relation between probability measures; for $\mu,\nu\in\mathcal{P}\left([0,1]\right),$ we say that $\mu\preccurlyeq\nu$ if for all convex functions $u:[0,1]\rightarrow\mathbb{R}$ we have $\int_{[0,1]}u\,\mathrm{d}\mu\leq\int_{[0,1]}u\,\mathrm{d}\nu.$ It is clear that for all $\mu\in\mathcal{P}\left([0,1]\right)$ with $c\left(\mu\right)=\lambda$ we have $\delta_{\lambda}\preccurlyeq\mu\preccurlyeq(1-\lambda)\delta_{0}+\lambda\delta_{1},$ where $\delta_{x}$ denotes the Dirac mass concentrated on $x.$ For any fixed $x>0,$ the map $\lambda\mapsto\frac{x}{(1-\lambda)x+\lambda}$ is convex. Therefore, if $\mu\preccurlyeq\nu,$ then $f_{\mu}(x)\leq f_{\nu}(x)$ for all $x>0,$ and hence $A\sigma_{f_{\mu}}B\leq A\sigma_{f_{\nu}}B$ for all $A,B\in\mathcal{B}(\mathcal{H})^{++}.$ Consequently, if $\nu=\left(1-c\left(\mu\right)\right)\delta_{0}+c\left(\mu\right)\delta_{1},$ then $A\sigma_{f_{\nu}}B-A\sigma_{f_{\mu}}B$ is always positive, in particular, $\operatorname{Tr}\left(A\sigma_{f_{\nu}}B-A\sigma_{f_{\mu}}B\right)\geq 0.$ This quantity is exactly the one we are interested in.
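For $c(\mu)=\frac{1}{2}$ the chain $\delta_{1/2}\preccurlyeq\mu\preccurlyeq\frac{1}{2}\delta_{0}+\frac{1}{2}\delta_{1}$ specializes, with $\mu$ the arcsine distribution, to the familiar harmonic, geometric, arithmetic ordering of operator means. A small numerical illustration, assuming NumPy; the matrices and helper names are hypothetical:

```python
import numpy as np

def psd_power(M, p):
    """M**p for a Hermitian positive definite matrix, via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * vals**p) @ vecs.conj().T

def means(A, B):
    """Unweighted harmonic, geometric and arithmetic means of A and B."""
    harmonic = 2.0 * np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))
    A_half, A_inv_half = psd_power(A, 0.5), psd_power(A, -0.5)
    geometric = A_half @ psd_power(A_inv_half @ B @ A_inv_half, 0.5) @ A_half
    arithmetic = 0.5 * (A + B)
    return harmonic, geometric, arithmetic

def min_eig(M):
    """Smallest eigenvalue of the Hermitian part of M."""
    return float(np.linalg.eigvalsh(0.5 * (M + M.conj().T)).min())

# Illustrative non-commuting inputs (not taken from the paper).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 0.5], [0.5, 3.0]])
H, G, Ar = means(A, B)
print(min_eig(G - H), min_eig(Ar - G), np.trace(Ar - G))
```

The last printed value is the trace of the arithmetic mean minus the geometric mean, which is exactly the Hellinger-type quantity studied in the sequel.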

2. Basic properties of quantum Hellinger distances

We are interested in divergences of the form

$$\phi_{\mu}(A,B):=\mathrm{Tr}\left(\left(1-c(\mu)\right)A+c(\mu)B-A\sigma_{f_{\mu}}B\right),\tag{7}$$

where $\mu\in\mathcal{P}\left([0,1]\right).$ To avoid trivialities, we assume in the sequel that the support of $\mu$ is strictly larger than $\{0,1\},$ and therefore, $f_{\mu}$ is non-affine; in fact, it is strictly concave.

If $\mu$ is the arcsine distribution, that is, $\mathrm{d}\mu(\lambda)=\frac{1}{\pi\sqrt{\lambda(1-\lambda)}}\mathrm{d}\lambda,$ then

$$\phi_{\mu}(A,B)=\mathrm{Tr}\left(\frac{1}{2}(A+B)-A\#B\right),$$

where $\#$ is the Pusz-Woronowicz geometric mean [28]. The square root of this quantity (up to an irrelevant multiplicative constant) was considered in [8] as a possible quantum (or matrix) version of the classical Hellinger distance. Therefore, we will call the quantities of the form (7) generalized quantum Hellinger divergences.
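That the arcsine distribution generates $f_{\mu}(x)=\sqrt{x}$ with $c(\mu)=\frac{1}{2}$ can be confirmed by numerical integration of (5). The sketch below is our own check: it uses the substitution $\lambda=\sin^{2}\theta,$ under which $\mathrm{d}\mu(\lambda)$ becomes $\frac{2}{\pi}\mathrm{d}\theta$ on $[0,\pi/2],$ together with a plain midpoint rule; the function names and the number of nodes `n` are arbitrary.

```python
import math

def f_arcsine(x, n=2000):
    """Midpoint-rule value of f_mu(x) for the arcsine distribution,
    after the substitution lambda = sin(theta)^2."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        s2 = math.sin((k + 0.5) * h) ** 2
        total += x / ((1 - s2) * x + s2)
    return (2 / math.pi) * total * h

def center_of_mass(n=2000):
    """Midpoint-rule value of c(mu), the integral of lambda d mu(lambda)."""
    h = (math.pi / 2) / n
    return (2 / math.pi) * sum(math.sin((k + 0.5) * h) ** 2 for k in range(n)) * h

print(f_arcsine(4.0), math.sqrt(4.0))
print(center_of_mass())
```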

We easily get that

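$$\phi_{\mu}(A,B)=\mathrm{Tr}\left\{A\cdot g_{\mu}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}}\right)\right\},$$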

where $g_{\mu}:(0,\infty)\rightarrow[0,\infty)$ is defined by

$$g_{\mu}(x)=\left(1-c(\mu)\right)+c(\mu)x-f_{\mu}(x).$$

Remark 1.

We note that $g_{\mu}$ is operator convex as $f_{\mu}$ is operator concave, and hence generalized quantum Hellinger divergences belong to the family of maximal quantum $f$ -divergences studied for example in [17, 19, 22, 26]. This latter divergence class consists of quantities of the form $S_{f}(A,B)=\operatorname{Tr}Af\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}}\right),$ where $A,B\in\mathcal{B}(\mathcal{H})^{++},$ and $f:\,(0,\infty)\rightarrow\mathbb{R}$ is operator convex [17, 26]. However, this level of generality may lead to counter-intuitive phenomena. For instance, the maximal quantum $f$ -divergence can be negative (see, e.g., [17, Example 4.4], where $f(x)=x\log{x},$ and $S_{f}\left(I,e^{-1}I\right)=-\mathrm{dim}\left(\mathcal{H}\right)e^{-1}<0$ ); and it may happen that $S_{f}(A,A)>0$ for all $A\in\mathcal{B}(\mathcal{H})^{++}$ (see, e.g., [17, Example 4.2], where $f(x)=x^{2},$ and $S_{f}(A,A)=\operatorname{Tr}A>0$ for all $A\in\mathcal{B}(\mathcal{H})^{++}$ ). That is, maximal quantum $f$ -divergences are not divergences in the sense of [1, Sec. 1.2 & 1.3] in general. In particular, they are not necessarily positive definite. (We call a divergence $D$ positive definite, if $D(A,B)\geq 0$ for every $A,B\in\mathcal{B}(\mathcal{H})^{++},$ and $D(A,B)=0$ if and only if $A=B.$ )
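The two counter-intuitive examples quoted in Remark 1 are easy to reproduce. The following sketch, assuming NumPy, evaluates $S_{f}(A,B)=\operatorname{Tr}Af\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}}\right)$ directly; the dimension and the matrix `A` are illustrative choices, not values from [17].

```python
import numpy as np

def apply_fun(M, f):
    """Apply a scalar function to a Hermitian matrix through its spectrum."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * f(vals)) @ vecs.conj().T

def maximal_f_divergence(A, B, f):
    """S_f(A, B) = Tr A f(A^{-1/2} B A^{-1/2})."""
    A_inv_half = apply_fun(A, lambda v: v ** -0.5)
    return float(np.trace(A @ apply_fun(A_inv_half @ B @ A_inv_half, f)).real)

d = 3
I = np.eye(d)
# f(x) = x log x: the divergence of I and e^{-1} I is negative.
neg = maximal_f_divergence(I, np.exp(-1.0) * I, lambda v: v * np.log(v))
# f(x) = x^2: the divergence of A with itself equals Tr A > 0.
A = np.diag([1.0, 2.0, 3.0])
pos = maximal_f_divergence(A, A, lambda v: v ** 2)
print(neg, -d / np.e, pos, np.trace(A))
```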

Now we check that generalized quantum Hellinger divergences are intimately related to operator valued Bregman divergences, and hence are reasonable measures of dissimilarity and genuine divergences in the sense of [1, Sec. 1.2 & 1.3].

2.1. The relation with Bregman divergences

Note that $h_{\mu}:=-f_{\mu}$ is an operator convex function, and that

$$g_{\mu}(x)=\left(1-c(\mu)\right)+c(\mu)x+h_{\mu}(x)=h_{\mu}(x)-h_{\mu}(1)-h_{\mu}^{\prime}(1)(x-1).$$

The operator valued Bregman divergence generated by the operator convex function $h_{\mu}$ reads as follows:

$$H_{h_{\mu}}^{(op)}(X,Y)=h_{\mu}(X)-h_{\mu}(Y)-\mathbf{D}h_{\mu}(Y)[X-Y].$$

In particular,

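$$H_{h_{\mu}}^{(op)}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}},I\right)=h_{\mu}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}}\right)-h_{\mu}(I)-\mathbf{D}h_{\mu}(I)\left[A^{-\frac{1}{2}}BA^{-\frac{1}{2}}-I\right].$$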

As $\mathbf{D}h_{\mu}(I)$ coincides with the multiplication by the constant $-c\left(\mu\right),$ and $h_{\mu}^{\prime}(I)=-c\left(\mu\right)I,$ we get that

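$$H_{h_{\mu}}^{(op)}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}},I\right)=g_{\mu}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}}\right).$$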

Therefore, we obtain the following claim.

Claim 2.

The generalized quantum Hellinger divergence $\phi_{\mu}$ defined in (7) can be expressed by an operator valued Bregman divergence as follows:

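$$\phi_{\mu}(A,B)=\mathrm{Tr}\left\{A\cdot H_{h_{\mu}}^{(op)}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}},I\right)\right\}\qquad\left(A,B\in\mathcal{B}(\mathcal{H})^{sa}\right).$$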

For a detailed study of Bregman divergences on matrices we refer to [27].
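As an illustration of Claim 2, consider the arcsine distribution, for which $f_{\mu}(x)=\sqrt{x}$ and $c(\mu)=\frac{1}{2}.$ The computation below is our own worked example of the general formulas, with the shorthand $T:=A^{-\frac{1}{2}}BA^{-\frac{1}{2}}$ introduced only for this display.

```latex
% Hellinger case: h_\mu(x) = -\sqrt{x}, so h_\mu(1) = -1 and h_\mu'(1) = -1/2.
\[
h_{\mu}(x)-h_{\mu}(1)-h_{\mu}^{\prime}(1)(x-1)
=-\sqrt{x}+1+\tfrac{1}{2}(x-1)
=\tfrac{1}{2}+\tfrac{1}{2}x-\sqrt{x}
=\tfrac{1}{2}\left(\sqrt{x}-1\right)^{2}
=g_{\mu}(x).
\]
% Consequently, with T := A^{-1/2} B A^{-1/2},
\[
\phi_{\mu}(A,B)=\mathrm{Tr}\left\{A\cdot g_{\mu}(T)\right\}
=\tfrac{1}{2}\,\mathrm{Tr}\left\{A\left(T^{\frac{1}{2}}-I\right)^{2}\right\}\geq 0,
\]
% with equality exactly when T = I, that is, when A = B.
```

In this special case the positive definiteness asserted in Claim 3 (i) is visible directly from the formula.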

Now we are in the position to check that generalized quantum Hellinger divergences are genuine divergences in the sense of Amari [1, Sec. 1.2 & 1.3].

Claim 3.

For any $\mu\in\mathcal{P}\left([0,1]\right),$ the map

$$\phi_{\mu}:\mathcal{B}(\mathcal{H})^{++}\times\mathcal{B}(\mathcal{H})^{++}\rightarrow[0,\infty);\quad(A,B)\mapsto\phi_{\mu}(A,B)$$

satisfies the following.

(i) $\phi_{\mu}(A,B)\geq 0$ and $\phi_{\mu}(A,B)=0$ if and only if $A=B.$

(ii) The first derivative of $\phi_{\mu}$ in the second variable vanishes at the diagonal, that is, $\mathbf{D}\left(\phi_{\mu}(A,\cdot)\right)(A)=0\in\mathrm{Lin}\left(\mathcal{B}(\mathcal{H})^{sa},\mathbb{R}\right)$ for all $A\in\mathcal{B}(\mathcal{H})^{++}.$

(iii) The second derivative of $\phi_{\mu}$ in the second variable is positive at the diagonal, that is, $\mathbf{D}^{2}\left(\phi_{\mu}(A,\cdot)\right)(A)[Y,Y]\geq 0$ for all $Y\in\mathcal{B}(\mathcal{H})^{sa}.$

Proof.

Bregman divergences are clearly divergences (see, e.g., [8, Sec. 1]). That is,

(i) $H_{h_{\mu}}^{(op)}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}},I\right)\geq 0\in\mathcal{B}(\mathcal{H}),$ and $H_{h_{\mu}}^{(op)}\left(A^{-\frac{1}{2}}BA^{-\frac{1}{2}},I\right)=0$ if and only if $A=B,$

(ii) $\mathbf{D}\left(H_{h_{\mu}}^{(op)}\left(A^{-\frac{1}{2}}\,\cdot\,A^{-\frac{1}{2}},I\right)\right)(A)=0\in\mathrm{Lin}\left(\mathcal{B}(\mathcal{H})^{sa}\right)$ for every $A\in\mathcal{B}(\mathcal{H})^{++},$

(iii) $\mathbf{D}^{2}\left(H_{h_{\mu}}^{(op)}\left(A^{-\frac{1}{2}}\,\cdot\,A^{-\frac{1}{2}},I\right)\right)(A)[Y,Y]\geq 0\in\mathcal{B}(\mathcal{H})$ for all $Y\in\mathcal{B}(\mathcal{H})^{sa}.$

Now Claim 3 follows from Claim 2. ∎

2.2. Joint convexity, data processing inequality

As generalized quantum Hellinger divergences belong to the family of maximal quantum $f$ -divergences, they are jointly convex and they satisfy the data processing inequality, which is particularly important from the quantum information theory viewpoint. For details, see [17, 19, 22, 26]. We recall these important properties for convenience.

Property 4 (Joint convexity).

The generalized quantum Hellinger divergence $\phi_{\mu}$ defined in (7) is jointly convex on $\mathcal{B}(\mathcal{H})^{++}\times\mathcal{B}(\mathcal{H})^{++}.$

Property 5 (Data processing inequality).

Let $T:\mathcal{B}(\mathcal{H})\rightarrow\mathcal{B}(\mathcal{H})$ be a quantum channel, that is, a completely positive and trace preserving (CPTP) map. Let $\mu\in\mathcal{P}\left([0,1]\right)$ be arbitrary. Then

$$\phi_{\mu}\left(T(A),T(B)\right)\leq\phi_{\mu}(A,B)$$

holds for every $A,B\in\mathcal{B}(\mathcal{H})^{++}.$
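A quick numerical illustration of Property 5 in the Hellinger case (arcsine $\mu$), assuming NumPy. The channel used here is the pinching onto the diagonal, a standard example of a CPTP map; this choice, the helper names and the matrices are ours and serve only as an example.

```python
import numpy as np

def psd_power(M, p):
    """M**p for a Hermitian positive definite matrix, via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * vals**p) @ vecs.conj().T

def hellinger_sq(A, B):
    """phi_mu(A, B) = Tr((A + B)/2 - A # B) for the arcsine distribution."""
    A_half, A_inv_half = psd_power(A, 0.5), psd_power(A, -0.5)
    gm = A_half @ psd_power(A_inv_half @ B @ A_inv_half, 0.5) @ A_half
    return float(np.trace(0.5 * (A + B) - gm).real)

def pinching(M):
    """Keep only the diagonal of M (completely positive and trace preserving)."""
    return np.diag(np.diag(M))

# Illustrative inputs (not taken from the paper).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 0.5], [0.5, 3.0]])
before = hellinger_sq(A, B)
after = hellinger_sq(pinching(A), pinching(B))
print(before, after)  # the second value should not exceed the first
```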

3. Barycenters

The notion of barycenter (or least squares mean) plays a central role in averaging procedures related to various topics in mathematics and mathematical physics. Given a metric space $\left(X,\rho\right)$ and an $m$ -tuple $a_{1},\dots,a_{m}$ in $X$ with positive weights $w_{1},\dots,w_{m}$ such that $\sum_{j=1}^{m}w_{j}=1,$ the barycenter (or Fréchet mean or Karcher mean or Cartan mean) is defined to be

$$\operatorname*{arg\,min}_{x\in X}\sum_{j=1}^{m}w_{j}\rho^{2}\left(a_{j},x\right).$$

In our setting, $X=\mathcal{B}(\mathcal{H})^{++},$ and the generalized quantum Hellinger divergence $\phi_{\mu}$ plays the role of the squared distance $\rho^{2},$ although it is not the square of any true metric in general.

That is, we consider the optimization problem

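$$\operatorname*{arg\,min}_{X\in\mathcal{B}(\mathcal{H})^{++}}\sum_{j=1}^{m}w_{j}\phi_{\mu}\left(A_{j},X\right),\tag{13}$$

where the positive definite operators $A_{1},\dots,A_{m}$ and the weights $w_{1},\dots,w_{m}$ are fixed. By the strict concavity of $f_{\mu},$ the function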

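$$X\mapsto\phi_{\mu}(A,X)=\mathrm{Tr}\left(\left(1-c(\mu)\right)A+c(\mu)X-A^{\frac{1}{2}}f_{\mu}\left(A^{-\frac{1}{2}}XA^{-\frac{1}{2}}\right)A^{\frac{1}{2}}\right)$$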

is strictly convex on $\mathcal{B}(\mathcal{H})^{++},$ see, e.g., [11, 2.10. Thm.]. Therefore, there is a unique solution $X_{0}$ of (13), and it is necessarily a critical point of the function $X\mapsto\sum_{j=1}^{m}w_{j}\phi_{\mu}\left(A_{j},X\right).$ That is, it satisfies

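$$\mathbf{D}\left(\sum_{j=1}^{m}w_{j}\phi_{\mu}\left(A_{j},\cdot\right)\right)(X_{0})[Y]=0\qquad\left(Y\in\mathcal{B}(\mathcal{H})^{sa}\right).\tag{14}$$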

Easy computations give that

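$$\mathbf{D}\left(\sum_{j=1}^{m}w_{j}\phi_{\mu}\left(A_{j},\cdot\right)\right)(X)[Y]=c(\mu)\,\mathrm{Tr}\,Y-\sum_{j=1}^{m}w_{j}\,\mathrm{Tr}\,\mathbf{D}F_{\mu,A_{j}}(X)[Y],\tag{15}$$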

where for a positive definite operator $A,$ the map $F_{\mu,A}:\mathcal{B}(\mathcal{H})^{++}\rightarrow\mathcal{B}(\mathcal{H})^{++}$ is defined by

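$$F_{\mu,A}(X):=A\sigma_{f_{\mu}}X=A^{\frac{1}{2}}f_{\mu}\left(A^{-\frac{1}{2}}XA^{-\frac{1}{2}}\right)A^{\frac{1}{2}}.\tag{16}$$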

By differentiating (5), we have

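$$\mathbf{D}f_{\mu}(X)[Y]=\int_{[0,1]}\lambda\left((1-\lambda)X+\lambda I\right)^{-1}Y\left((1-\lambda)X+\lambda I\right)^{-1}\mathrm{d}\mu(\lambda)\tag{17}$$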

for $X\in\mathcal{B}(\mathcal{H})^{++},\,Y\in\mathcal{B}(\mathcal{H})^{sa}.$ Consequently,

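$$\begin{aligned}\mathbf{D}F_{\mu,A_{j}}(X)[Y]&=\int_{[0,1]}\lambda A_{j}^{\frac{1}{2}}\left((1-\lambda)A_{j}^{-\frac{1}{2}}XA_{j}^{-\frac{1}{2}}+\lambda I\right)^{-1}A_{j}^{-\frac{1}{2}}YA_{j}^{-\frac{1}{2}}\left((1-\lambda)A_{j}^{-\frac{1}{2}}XA_{j}^{-\frac{1}{2}}+\lambda I\right)^{-1}A_{j}^{\frac{1}{2}}\,\mathrm{d}\mu(\lambda)\\&=\int_{[0,1]}\lambda\left((1-\lambda)XA_{j}^{-1}+\lambda I\right)^{-1}Y\left((1-\lambda)A_{j}^{-1}X+\lambda I\right)^{-1}\mathrm{d}\mu(\lambda).\end{aligned}\tag{18}$$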

By the linearity and the cyclic property of the trace, we get from (15) and (18) that (14) is equivalent to

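$$\mathrm{Tr}\left[Y\left(c(\mu)I-\sum_{j=1}^{m}w_{j}\int_{[0,1]}\lambda\left|(1-\lambda)A_{j}^{-1}X+\lambda I\right|^{-2}\mathrm{d}\mu(\lambda)\right)\right]=0\qquad\left(Y\in\mathcal{B}(\mathcal{H})^{sa}\right),$$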

where $|\cdot|$ stands for the absolute value of an operator, that is, $\left|Z\right|=\left(Z^{*}Z\right)^{\frac{1}{2}}.$ This latter equation amounts to

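$$c(\mu)I=\sum_{j=1}^{m}w_{j}\int_{[0,1]}\lambda\left|(1-\lambda)A_{j}^{-1}X+\lambda I\right|^{-2}\mathrm{d}\mu(\lambda).$$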

So we obtained the following characterization of the barycenter.

Theorem 6.

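Let $\mu\in\mathcal{P}\left([0,1]\right)$ and let $\phi_{\mu}$ be the generalized quantum Hellinger divergence generated by $\mu,$ that is,

$$\phi_{\mu}(A,B)=\mathrm{Tr}\left(\left(1-c(\mu)\right)A+c(\mu)B-A\sigma_{f_{\mu}}B\right).$$

Then the barycenter (or Cartan mean or Fréchet mean or Karcher mean) of the positive definite operators $A_{1},\dots,A_{m}$ with positive weights $w_{1},\dots,w_{m}$ with respect to $\phi_{\mu},$ i.e.,

$$\operatorname*{arg\,min}_{X\in\mathcal{B}(\mathcal{H})^{++}}\sum_{j=1}^{m}w_{j}\phi_{\mu}\left(A_{j},X\right),$$

coincides with the unique positive definite solution of the matrix equation

$$c(\mu)I=\sum_{j=1}^{m}w_{j}\int_{[0,1]}\lambda\left|(1-\lambda)A_{j}^{-1}X+\lambda I\right|^{-2}\mathrm{d}\mu(\lambda).\tag{21}$$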
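The matrix equation of Theorem 6, $c(\mu)I=\sum_{j}w_{j}\int_{[0,1]}\lambda\left|(1-\lambda)A_{j}^{-1}X+\lambda I\right|^{-2}\mathrm{d}\mu(\lambda),$ can be tested numerically for a candidate $X.$ The sketch below is restricted to the arcsine distribution and to real matrices, and it is only one possible way to evaluate the integral: it substitutes $\lambda=\sin^{2}\theta$ and applies a midpoint rule. The function name, the node count `n` and the test data are hypothetical.

```python
import numpy as np

def barycenter_residual(X, As, ws, n=400):
    """c(mu) I - sum_j w_j * integral of lambda |(1-lambda) A_j^{-1} X + lambda I|^{-2}
    for the arcsine distribution (c(mu) = 1/2), with |Z|^{-2} = (Z^T Z)^{-1}
    for real Z. Uses lambda = sin(theta)^2 and the midpoint rule in theta."""
    d = X.shape[0]
    I = np.eye(d)
    h = (np.pi / 2) / n
    acc = np.zeros((d, d))
    for A, w in zip(As, ws):
        AinvX = np.linalg.solve(A, X)  # A^{-1} X
        for k in range(n):
            lam = np.sin((k + 0.5) * h) ** 2
            Z = (1 - lam) * AinvX + lam * I
            acc += w * lam * np.linalg.inv(Z.T @ Z) * (2 / np.pi) * h
    return 0.5 * I - acc

# Commuting (diagonal) illustration: the candidate (sum_j w_j A_j^{1/2})^2.
As = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])]
ws = [0.5, 0.5]
X_candidate = np.diag([4.0, 9.0])
print(np.abs(barycenter_residual(X_candidate, As, ws)).max())
```

In this commuting example the residual vanishes up to quadrature error, whereas the arithmetic mean of the same operators leaves a clearly nonzero residual.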

4. The commutative case

In this section we show that in the commutative case formula (21) can be greatly simplified (see (28) later); furthermore, the conditions on $f$ can be relaxed. Recall that in the general non-commutative case, the generating function $f$ was operator monotone (or equivalently, operator concave), and hence smooth ( $C^{\infty}$ ), see (4) and (5). When dealing with commuting operators, we need concavity only in the classical one-variable sense, and hence we require much less regularity on $f.$ For now, we only require that $f:(0,\infty)\rightarrow\mathbb{R}$ is a strictly concave $C^{1}$ function.

Let $\mathcal{A}\subset\mathcal{B}(\mathcal{H})$ be a maximal Abelian subalgebra (MASA). In this commutative case, the proper analogue of the generalized quantum Hellinger divergence (7) is

[TABLE]

Note that now there is no underlying measure involved and the function class that we choose the functions $f$ from is much larger than that in the general non-commutative case. Also note that

[TABLE]

where $g(x)=f(1)+f^{\prime}(1)(x-1)-f(x).$ We easily get that for $A,X\in\mathcal{A}\cap\mathcal{B}(\mathcal{H})^{++}$ and $Y\in\mathcal{A}\cap\mathcal{B}(\mathcal{H})^{sa}$ we have

[TABLE]

and therefore,

[TABLE]

That is, the derivative $\mathbf{D}\left(\sum_{j=1}^{m}w_{j}\phi_{f}\left(A_{j},\cdot\right)\right)(X)$ vanishes if and only if

[TABLE]

or equivalently,

[TABLE]

We obtained the following

Proposition 7.

The critical point of the function $X\mapsto\sum_{j=1}^{m}w_{j}\phi_{f}\left(A_{j},X\right)$ is the unique solution $X\in\mathcal{A}\cap\mathcal{B}(\mathcal{H})^{++}$ of the equation

[TABLE]

So in the commutative case, the equation characterizing the barycenter (28) is simpler than that in the non-commutative case (21). Note that if all the $A_{j}$ ’s are in the same MASA $\mathcal{A}\subset\mathcal{B}(\mathcal{H})$ , then the barycenter is also in $\mathcal{A},$ and hence it has the form described in Proposition 7. One way to show this is to use the data processing inequality (DPI) for the orthogonal projection onto $\mathcal{A}$ which is completely positive and trace preserving, and which is denoted by $\mathbf{E}_{\mathcal{A}}$ to express the analogy with the classical conditional expectation. So let $X_{0}$ be the unique minimizer of $X\mapsto\sum_{j=1}^{m}w_{j}\phi_{\mu}\left(A_{j},X\right).$ Now

[TABLE]

hence $\mathbf{E}_{\mathcal{A}}\left(X_{0}\right)=X_{0}$ which means that $X_{0}\in\mathcal{A}.$ We also note that under the assumption $A_{j}X=XA_{j}$ for all $j,$ (21) clearly coincides with (28), because $c\left(\mu\right)=f_{\mu}^{\prime}(1),$ and in this case, by the identity

[TABLE]

we have

[TABLE]

Example 8.

Let $f_{t}(x)=x^{t}$ for $t\in(0,1).$ Then $\phi_{f_{t}}$ is of the form

[TABLE]

and the barycenter equation (28) reads as

[TABLE]

That is, the barycenter coincides with the weighted power mean of order $1-t,$ which is by definition the unique positive definite solution of the equation $X=\sum_{j=1}^{m}w_{j}X\#_{1-t}A_{j},$ see [21, Def. 3.2]. This example does not contain new results; the above characterization of the barycenter as weighted power mean can be found, e.g., in [2] or in [29].
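For commuting operators everything reduces to positive scalars on the common eigenbasis, so Example 8 can be illustrated in one dimension. The sketch below assumes the scalar form $\phi_{f_{t}}(a,x)=(1-t)a+tx-a^{1-t}x^{t}$ (obtained from (7) with $f_{t}(x)=x^{t}$ and weight $t$) and the resulting closed form $x^{1-t}=\sum_{j}w_{j}a_{j}^{1-t}$; the data and function names are illustrative.

```python
def power_mean(a, w, t):
    """Weighted power mean of order 1 - t: x^(1-t) = sum_j w_j a_j^(1-t)."""
    return sum(wj * aj ** (1 - t) for aj, wj in zip(a, w)) ** (1 / (1 - t))

def objective(x, a, w, t):
    """sum_j w_j phi_{f_t}(a_j, x) for positive scalars."""
    return sum(wj * ((1 - t) * aj + t * x - aj ** (1 - t) * x ** t)
               for aj, wj in zip(a, w))

# Illustrative data (not taken from the paper).
a, w, t = [1.0, 4.0, 9.0], [0.2, 0.3, 0.5], 0.5
x_star = power_mean(a, w, t)
print(x_star, objective(x_star, a, w, t))
```

Perturbing `x_star` in either direction increases the objective, as expected for the minimizer of a strictly convex function.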

Remark 9.

By the special choice $t=1/2$ in Example 8, we get that the claim of Bhatia et al. saying that the barycenter and the weighted power mean of order $1/2$ coincide [8, Thm. 9] is true in the commutative case.

Example 10.

Set $f(x)=\log{x}.$ Then $\phi_{f}$ is the relative entropy, that is,

[TABLE]

and the barycenter equation (28) reads as

[TABLE]

That is, the barycenter coincides with the weighted sum of the $A_{j}$ ’s. This is well-known, see, e.g., the remarks after Theorem 4 in [8].

Note that we get Example 10 from Example 8 if we take the limit $t\to 0.$ Indeed,

[TABLE]

where $g_{t}(x)=1+t(x-1)-x^{t},$ and $\lim_{t\to 0}\frac{1}{t}\left(1-x^{t}\right)=\log{x}$ in the locally uniform topology.
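The limit behind this remark can be observed numerically: $\frac{1}{t}g_{t}(x)$ approaches $x-1-\log x,$ which is the function $g$ belonging to $f(x)=\log x.$ A minimal sketch with arbitrary sample points:

```python
import math

def g_t_over_t(x, t):
    """(1/t) * g_t(x) with g_t(x) = 1 + t (x - 1) - x^t."""
    return (1 + t * (x - 1) - x ** t) / t

def g_log(x):
    """g(x) = f(1) + f'(1)(x - 1) - f(x) for f = log, i.e. x - 1 - log x."""
    return x - 1 - math.log(x)

for x in (0.2, 1.0, 2.5):
    print(x, g_t_over_t(x, 1e-6), g_log(x))
```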

5. Remarks

5.1. A note on a paper of Bhatia et al

In our view, Theorem 9 in [8] is not true in general. The proof contains a gap, namely, using their notation, the fact that $I$ is a critical point for $g$ does not imply that $X_{0}$ is a critical point for $f,$ although formula (54) in [8] is correct.

It is true that for commuting operators, (21) and (28) coincide. However, these equations are different without the assumption of commutativity. To demonstrate the difference, we take the following example. Let $\mu$ be the arcsine distribution, $\mathrm{d}\mu(\lambda)=\frac{1}{\pi\sqrt{\lambda(1-\lambda)}}\mathrm{d}\lambda,$ let $m=2,w_{1}=w_{2}=\frac{1}{2},$ and

[TABLE]

Then numerical optimization performed by Wolfram Mathematica [31] shows that

[TABLE]

Note that both $A_{1}$ and $A_{2}$ have real entries. Therefore, $A_{j}\#\overline{X}=\overline{A_{j}\#X},$ and hence $\phi_{\mu}\left(A_{j},X\right)=\phi_{\mu}\left(A_{j},\overline{X}\right)$ holds for every $X\in\mathcal{B}(\mathcal{H})^{++}$ and $j\in\{1,2\},$ where $\overline{X}$ denotes the entrywise complex conjugate of $X.$ Consequently, the strict convexity of the functions $X\mapsto\phi_{\mu}\left(A_{j},X\right),\,j\in\{1,2\}$ implies that $\operatorname*{arg\,min}_{X\in\mathcal{B}(\mathcal{H})^{++}}\sum_{j=1}^{2}\frac{1}{2}\phi_{\mu}\left(A_{j},X\right)$ has real entries. So it is enough to minimize numerically over the cone of positive definite $2\times 2$ matrices with real entries [31].
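The conjugation argument can be illustrated numerically, assuming NumPy. The matrices below are hypothetical stand-ins (a real symmetric $A$ and a complex Hermitian $X$), not the ones used in the paper; the sketch checks that $\phi_{\mu}(A,X)=\phi_{\mu}(A,\overline{X})$ and that replacing $X$ by its real part $\frac{1}{2}\left(X+\overline{X}\right)$ does not increase the divergence.

```python
import numpy as np

def psd_power(M, p):
    """M**p for a Hermitian positive definite matrix, via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * vals**p) @ vecs.conj().T

def hellinger_sq(A, B):
    """phi_mu(A, B) = Tr((A + B)/2 - A # B) for the arcsine distribution."""
    A_half, A_inv_half = psd_power(A, 0.5), psd_power(A, -0.5)
    gm = A_half @ psd_power(A_inv_half @ B @ A_inv_half, 0.5) @ A_half
    return float(np.trace(0.5 * (A + B) - gm).real)

# Hypothetical inputs: A real symmetric, X complex Hermitian, both positive definite.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
X = np.array([[2.0, 0.5 + 0.7j], [0.5 - 0.7j, 1.5]])

v = hellinger_sq(A, X)
v_conj = hellinger_sq(A, X.conj())
v_real = hellinger_sq(A, X.real)  # X.real equals (X + conj(X)) / 2
print(v, v_conj, v_real)
```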

However, the barycenter obtained numerically in (36) does not coincide with the weighted power mean of order $1/2$ as

[TABLE]

Note that after the publication of our manuscript on arXiv.org, a correction of [8] dedicated to this problem was released [9].

5.2. A possible measure of non-commutativity

Motivated by the observations above, we introduce a function that quantifies the noncommutativity of the positive definite operators $A_{1},\dots,A_{m}.$

Definition 11.

Given $\mathbf{A}=\left(A_{1},\dots,A_{m}\right)\in\left(\mathcal{B}(\mathcal{H})^{++}\right)^{m},\,\mathbf{w}=\left(w_{1},\dots,w_{m}\right)\in(0,1]^{m}$ with $\sum_{j=1}^{m}w_{j}=1,\,\,\mu\in\mathcal{P}\left([0,1]\right),$ and a convenient metric $\rho$ on $\mathcal{B}(\mathcal{H})^{++},$ the $\left(\mathbf{w},\mu,\rho\right)$ -dependent measure of the non-commutativity of $A_{1},\dots,A_{m}$ is defined as

[TABLE]

where

[TABLE]

i.e., $\mathrm{BC}\left(\mathbf{A},\mathbf{w},\mu\right)$ is the solution of (21), and $\mathrm{M}\left(\mathbf{A},\mathbf{w},\mu\right)$ is a $\mu$ -dependent $\mathbf{w}$ -weighted mean of $A_{1},\dots,A_{m}$ defined as the unique solution of the matrix equation (28) that we recall here for convenience:

[TABLE]

The detailed study of the quantity (38) is beyond the scope of this paper; however, it may be the subject of subsequent works.

Acknowledgements

We are grateful to Milán Mosonyi for drawing our attention to Ref.’s [8, 17, 19, 22, 25, 26], for comments on earlier versions of this paper, and for several discussions on the topic. We are also grateful to Miklós Pálfia for several discussions; to László Erdős for his essential suggestions on the structure and highlights of this paper, and for his comments on earlier versions; and to the anonymous referee for his/her valuable comments and suggestions.

Bibliography

[1] S. Amari, Information Geometry and its Applications, Springer (Tokyo), 2016.
[2] S. Amari, Integration of stochastic models by minimizing $\alpha$-divergence, Neural Comput. 19 (2007), 2780–2796.
[3] T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra Appl. 26 (1979), 203–241.
[4] T. Ando, F. Kubo, Means of positive linear operators, Math. Ann. 246 (1980), 205–224.
[5] T. Ando, F. Hiai, Operator log-convex functions and operator means, Math. Ann. 350 (2011), 611–630.
[6] T. Ando, Topics on operator inequalities, Lecture note, Sapporo, 1978.
[7] R. Bhatia, Matrix Analysis, Springer-Verlag, New York, 1997.
[8] R. Bhatia, S. Gaubert, T. Jain, Matrix versions of the Hellinger distance, Lett. Math. Phys. 109 (2019), 1777–1804.
