Metric properties of homogeneous and spatially inhomogeneous F-divergences

Nicolò De Ponti

arXiv:1902.06305·cs.IT·February 19, 2019

TL;DR

This paper explores the properties of F-divergences derived from entropy-transport problems, demonstrating how certain choices lead to metric properties and including well-known divergences like Jensen-Shannon.

Contribution

It introduces the marginal perspective cost function H, analyzes its metric properties, and connects it to classical divergences and the Matusita divergences within the entropy-transport framework.

Findings

01

H produces symmetric divergences in the entropic case

02

Certain F-divergences like Jensen-Shannon are analyzed for metric properties

03

For p>1, the induced cost H_p is the square of a metric on a cone space

Abstract

In this paper I investigate the construction and the properties of the so-called marginal perspective cost $H$, a function related to Optimal Entropy-Transport problems obtained by a minimizing procedure, involving a cost function $c$ and an entropy function. In the pure entropic case, which corresponds to the choice $c = 0$, the function $H$ naturally produces a symmetric divergence. I consider various examples of entropies and I compute the induced marginal perspective function, which includes some well-known functionals like the Hellinger distance, the Jensen-Shannon divergence and the Kullback-Leibler divergence. I discuss the metric properties of these functions and I highlight the important role of the so-called Matusita divergences. In the entropy-transport case, starting from the power like entropy $F_{p} (s) = (s^{p} - p (s - 1) - 1) / (p (p - 1))$ and the cost $c = d^{2}$ for a given metric $d$, the…

Equations

\mu_{1}=\sum_{i=1}^{m}r_{i}\delta_{x_{i}},\qquad \mu_{2}=\sum_{i=1}^{m}t_{i}\delta_{x_{i}}

D_{F}(\mu_{1}||\mu_{2}):=\sum_{i=1}^{m}F\Big(\frac{r_{i}}{t_{i}}\Big)t_{i}=\sum_{i=1}^{m}\hat{F}(r_{i},t_{i})

H_{F}(\mu_{1}||\mu_{2}):=\inf_{\mu}D_{F}(\mu||\mu_{1})+D_{F}(\mu||\mu_{2}).

\tilde{H}(r,t)=\inf_{\theta>0}F\Big(\frac{\theta}{r}\Big)r+F\Big(\frac{\theta}{t}\Big)t=\inf_{\theta>0}\hat{F}(\theta,r)+\hat{F}(\theta,t).

T_{1}:\Gamma_{0}(\mathbb{R}_{+})\to\Gamma_{0}(\mathbb{R}_{+}),\qquad T_{1}(F)(s):=H(1,s),

D_{T_{1}(F)}(\mu_{1}||\mu_{2})=D_{T_{1}(F)}(\mu_{2}||\mu_{1}).

H(r,t)=(\sqrt{r}-\sqrt{t})^{2}.

H(r,t)=r\ln(r)+t\ln(t)-(r+t)\ln\Big(\frac{r+t}{2}\Big).

U_{p}(s):=\frac{1}{p(p-1)}\big(s^{p}-p(s-1)-1\big),\quad\text{if}\ p\neq 0,1.

H(r,t)=\frac{2}{p}\Big[\mathfrak{M}_{1}(r,t)-\mathfrak{M}_{1-p}(r,t)\Big],

\mathfrak{M}_{p}(r,t):=\Big(\frac{r^{p}+t^{p}}{2}\Big)^{\frac{1}{p}}.

H(r,t)=(r-t)\ln\Big(\frac{r}{t}\Big).

H(r,t)=|r-t|.

\tilde{H}(x_{1},r_{1};x_{2},r_{2})=\inf_{\theta>0}r_{1}F\Big(\frac{\theta}{r_{1}}\Big)+r_{2}F\Big(\frac{\theta}{r_{2}}\Big)+\theta c(x_{1},x_{2}).

H_{p}(x_{1},r;x_{2},t)=\frac{2}{p}\Big[\mathfrak{M}_{1}(r,t)-\mathfrak{M}_{1-p}(r,t)\Big(1+(1-p)\frac{d^{2}(x_{1},x_{2})}{2}\Big)_{+}^{\frac{p}{p-1}}\Big],\qquad p\neq 0,1.

H_{1}(x_{1},r;x_{2},t)=2\Big[\mathfrak{M}_{1}(r,t)-\mathfrak{M}_{0}(r,t)e^{-\frac{d^{2}(x_{1},x_{2})}{2}}\Big],

H_{0}(x_{1},r;x_{2},t)=r\ln{r}+t\ln{t}-(r+t)\ln\Big(\frac{r+t}{2+d^{2}(x_{1},x_{2})}\Big).

(x_{1},r_{1})\sim(x_{2},r_{2})\iff r_{1}=r_{2}=0\ \text{or}\ r_{1}=r_{2},\ x_{1}=x_{2}.

H(r,t)=\frac{(r-t)^{2}}{2(r+t)}.

\mathrm{D}(F):=\big\{s\in[0,+\infty):F(s)<+\infty\big\}.

\mathrm{rec}(F)(r):=\lim_{\alpha\to+\infty}\frac{F(1+\alpha r)}{\alpha},\qquad F^{\prime}_{\infty}:=\mathrm{rec}(F)(1).

\hat{F}(r,t):=\begin{cases}F\big(\frac{r}{t}\big)t&\text{if}\ t>0\\ \mathrm{rec}(F)(r)&\text{if}\ t=0.\end{cases}

F^{\prime}_{0}:=\begin{cases}-\infty&\text{if}\ F(0)=+\infty,\\ \lim_{s\downarrow 0}\frac{F(s)-F(0)}{s}&\text{otherwise},\end{cases}

\mathrm{aff}F_{\infty}:=\begin{cases}+\infty&\text{if}\ F^{\prime}_{\infty}=+\infty,\\ \lim_{s\to\infty}\big(F^{\prime}_{\infty}s-F(s)\big)&\text{otherwise},\end{cases}

F^{*}(\phi):=\sup_{s\geq 0}\{s\phi-F(s)\}.

R(s):=\begin{cases}F\big(\frac{1}{s}\big)s&\text{if}\ s>0\\ F^{\prime}_{\infty}&\text{if}\ s=0,\end{cases}

R(1)=0,\quad R(0)=F^{\prime}_{\infty},\quad R^{\prime}_{\infty}=F(0),\quad R^{\prime}_{0}=-\mathrm{aff}F_{\infty},\quad \mathrm{aff}R_{\infty}=-F^{\prime}_{0}.

\mu_{1}=\sum_{i=1}^{m}r_{i}\delta_{x_{i}},\qquad \mu_{2}=\sum_{i=1}^{m}t_{i}\delta_{x_{i}},

D_{F}(\mu_{1}||\mu_{2}):=\sum_{i=1}^{m}\hat{F}(r_{i},t_{i})=\sum_{i=1}^{m}\hat{R}(t_{i},r_{i}).

\psi\leq-F^{*}(\phi)\iff\phi\leq-R^{*}(\psi).

\mathfrak{M}_{p}(r,t):=\Big(\frac{r^{p}+t^{p}}{2}\Big)^{\frac{1}{p}},

Full text

Metric properties of homogeneous and spatially inhomogeneous $F$ -divergences

N. De Ponti is with the Department of Mathematics, University of Pavia, Pavia 27100, Italy (e-mail: [email protected])

Nicolò De Ponti

Abstract

In this paper I investigate the construction and the properties of the so-called marginal perspective cost $H$, a function related to Optimal Entropy-Transport problems obtained by a minimizing procedure, involving a cost function $c$ and an entropy function. In the pure entropic case, which corresponds to the choice $c=0$, the function $H$ naturally produces a symmetric divergence. I consider various examples of entropies and I compute the induced marginal perspective function, which includes some well-known functionals like the Hellinger distance, the Jensen-Shannon divergence and the Kullback-Leibler divergence. I discuss the metric properties of these functions and I highlight the important role of the so-called Matusita divergences. In the entropy-transport case, starting from the power like entropy $F_{p}(s)=(s^{p}-p(s-1)-1)/(p(p-1))$ and the cost $c=d^{2}$ for a given metric $d$, the main result of the paper ensures that for every $p>1$ the induced marginal perspective cost $H_{p}$ is the square of a metric on the corresponding cone space.

Index Terms:

$f$-divergence, induced marginal perspective cost, Optimal Transport, Optimal Entropy-Transport, triangle inequality, power like entropies, Matusita divergences, Kullback-Leibler divergence, Hellinger distance, total variation.

I Introduction

Given a function $F\in\Gamma_{0}(\mathbb{R}_{+}):=\{f:[0,+\infty)\rightarrow[0,+\infty],f\ \mathrm{convex,\ lower\ semicontinuous\ and}\ f(1)=0\}$ , a finite set $\Omega=\{x_{1},..,x_{m}\}$ , and two probability densities

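$$\mu_{1}=\sum_{i=1}^{m}r_{i}\delta_{x_{i}},\qquad \mu_{2}=\sum_{i=1}^{m}t_{i}\delta_{x_{i}}$$

such that $t_{i}>0$ when $r_{i}>0$ for every $i=1,..,m,$ the $F$-divergence of $\mu_{1}$ from $\mu_{2}$ is defined as

$$D_{F}(\mu_{1}||\mu_{2}):=\sum_{i=1}^{m}F\Big(\frac{r_{i}}{t_{i}}\Big)t_{i}=\sum_{i=1}^{m}\hat{F}(r_{i},t_{i})$$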

where $\hat{F}(r,t):=F\big{(}\frac{r}{t}\big{)}t$ is the perspective function induced by $F$ (here I am using the convention $F\big{(}\frac{0}{0}\big{)}0=0$ ).
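As an illustration of the definition on a finite set, the following Python sketch evaluates an $F$-divergence through the perspective function. The helper names (`perspective`, `f_divergence`) and the sample weights are illustrative assumptions, not part of the paper; the entropy $s\ln(s)-s+1$ is used with the convention $0\ln 0=0$.

```python
import math

def perspective(F, r, t):
    """Perspective function F_hat(r, t) = F(r/t) * t, with F(0/0) * 0 := 0."""
    if t > 0:
        return F(r / t) * t
    if r == 0:
        return 0.0
    # The definition assumes t_i > 0 whenever r_i > 0, so this case is excluded.
    raise ValueError("t must be positive when r is positive")

def f_divergence(F, r, t):
    """F-divergence of mu_1 = sum r_i delta_{x_i} from mu_2 = sum t_i delta_{x_i}."""
    return sum(perspective(F, ri, ti) for ri, ti in zip(r, t))

def total_variation_entropy(s):
    # F(s) = |s - 1|
    return abs(s - 1.0)

def logarithmic_entropy(s):
    # F(s) = s ln(s) - s + 1, with the convention 0 ln 0 = 0
    return 1.0 if s == 0 else s * math.log(s) - s + 1.0

# Hypothetical weights on a two-point set.
r = [0.5, 0.5]
t = [0.25, 0.75]
tv = f_divergence(total_variation_entropy, r, t)   # sum of |r_i - t_i|
kl = f_divergence(logarithmic_entropy, r, t)       # sum of r_i ln(r_i / t_i) here
```

For probability densities the logarithmic entropy reproduces the usual relative entropy $\sum_i r_i\ln(r_i/t_i)$, because the linear terms $-r_i+t_i$ sum to zero.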

Since their introduction by Csiszár [1], Ali and Silvey [2], $F$ -divergences have become a fundamental tool in information theory and statistics. They can be interpreted as a sort of "distance function" on the set of probability distributions, even if they do not generally fulfill the symmetric property and the triangle inequality. I refer to Liese and Vajda [3], [4], and references therein for a systematic presentation of these functionals, including the total variation (for $F(s)=|s-1|$ ), and the $\chi^{\alpha}$ divergences generated by the choice $F(s)=|s-1|^{\alpha}$ (discussed by Vajda in [5]). Another important class of divergences is represented by the so-called Matusita divergences $F(s)=|s^{a}-1|^{\frac{1}{a}}$ [6], which include as a particular case the well known Hellinger distance $F(s)=(\sqrt{s}-1)^{2}$ [7].

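Starting from an $F$-divergence, there is a simple variational way to generate a new symmetric divergence by setting

$$H_{F}(\mu_{1}||\mu_{2}):=\inf_{\mu}D_{F}(\mu||\mu_{1})+D_{F}(\mu||\mu_{2}).$$

This is related to the marginal perspective function $H$, the lower semicontinuous envelope of the function

$$\tilde{H}(r,t)=\inf_{\theta>0}F\Big(\frac{\theta}{r}\Big)r+F\Big(\frac{\theta}{t}\Big)t=\inf_{\theta>0}\hat{F}(\theta,r)+\hat{F}(\theta,t).$$

The function $H$ obtained in this way is jointly convex, lower semicontinuous and it is zero on the diagonal. As a result, one gets a natural map

$$T_{1}:\Gamma_{0}(\mathbb{R}_{+})\to\Gamma_{0}(\mathbb{R}_{+}),\qquad T_{1}(F)(s):=H(1,s),$$

with the additional property

$$D_{T_{1}(F)}(\mu_{1}||\mu_{2})=D_{T_{1}(F)}(\mu_{2}||\mu_{1}).$$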

Using different functions $F\in\Gamma_{0}(\mathbb{R}_{+})$, which I also call entropy functions in the present paper, the minimizing procedure (3) gives rise to well-known statistical functionals.

For the function $F(s)=U_{1}(s):=s\ln(s)-s+1$, the result is the Hellinger distance [7]

$$H(r,t)=(\sqrt{r}-\sqrt{t})^{2}.$$

When $F(s)=U_{0}(s):=s-1-\ln(s)$, one gets the Jensen-Shannon divergence [8]

$$H(r,t)=r\ln(r)+t\ln(t)-(r+t)\ln\Big(\frac{r+t}{2}\Big).$$

The previous examples are taken from the class of the power like entropies $\{U_{p}\}$

$$U_{p}(s):=\frac{1}{p(p-1)}\big(s^{p}-p(s-1)-1\big),\quad\text{if}\ p\neq 0,1.$$

They give rise to the family of functions

$$H(r,t)=\frac{2}{p}\Big[\mathfrak{M}_{1}(r,t)-\mathfrak{M}_{1-p}(r,t)\Big],$$

where the expression is written in terms of the power mean

$$\mathfrak{M}_{p}(r,t):=\Big(\frac{r^{p}+t^{p}}{2}\Big)^{\frac{1}{p}}.$$
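The power-mean expression for the family induced by $U_{p}$ can be checked against the named special cases. The sketch below is only a numerical sanity check under the stated conventions (geometric mean for order $0$, value $0$ for negative order when one argument vanishes); the function names and the test point $(r,t)=(1,4)$ are illustrative assumptions.

```python
import math

def power_mean(q, r, t):
    """Power mean M_q(r, t) of two non-negative numbers."""
    if q == 0:
        return math.sqrt(r * t)            # geometric mean
    if q < 0 and (r == 0 or t == 0):
        return 0.0                          # convention for negative orders
    return ((r ** q + t ** q) / 2.0) ** (1.0 / q)

def marginal_perspective_power(p, r, t):
    """H(r, t) = (2/p) [M_1(r, t) - M_{1-p}(r, t)], for p != 0."""
    return (2.0 / p) * (power_mean(1, r, t) - power_mean(1 - p, r, t))

r, t = 1.0, 4.0
hellinger = (math.sqrt(r) - math.sqrt(t)) ** 2       # expected for p = 1 and p = 1/2
triangular = (r - t) ** 2 / (2.0 * (r + t))          # expected for p = 2
h_one = marginal_perspective_power(1, r, t)
h_half = marginal_perspective_power(0.5, r, t)
h_two = marginal_perspective_power(2, r, t)
```

At this test point the orders $p=1$ and $p=\frac{1}{2}$ both return the Hellinger value, and $p=2$ returns $(r-t)^{2}/(2(r+t))$.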

The entropy $F(s)=s^{2}-2\ln(s)-1$ produces the symmetric Kullback-Leibler divergence [9]

$$H(r,t)=(r-t)\ln\Big(\frac{r}{t}\Big).$$

The marginal perspective function can also be computed starting from non-smooth entropies such as $F(s)=|s-1|$, which induces the celebrated total variation distance

$$H(r,t)=|r-t|.$$

The metric properties of the $F$ -divergences have been investigated by many authors like Csiszar, Endres, Kafka, Osterreicher, Schindelin, Vincze ([10], [11], [12], [13], [14]), to cite only a few. In the pure entropic setting, I generalize a previous result of Osterreicher [13] and I prove that, for the power like entropy $U_{p}$ , the induced function $H$ given by (9) is the square of a metric on $[0,+\infty)$ for every $p\in(-\infty,\frac{1}{2}]\cup[1,+\infty).$

In the pure entropic case, I also characterize the limit of the sequence $T_{1}^{(n)}(F)$ and I prove that the total variation and its positive multiples are the only divergences that are also a distance. Under additional assumptions, the convergence properties of the sequence $T_{a}^{(n)}(F)$ are also studied, where I put $T_{a}(F):=2^{\frac{1}{a}-1}T_{1}(F)$ , $a\in(0,1)$ . I will show that this is strictly related to those divergences $F$ for which $H^{a}$ is a distance, and I will emphasize the central role of the class of Matusita divergences.

Recently, $F$-divergences have been considered by Liero, Mielke, Savaré [15] as penalizing functionals in the formulation of Optimal Entropy-Transport problems, a generalization of Optimal-Transport problems obtained by relaxing the marginal constraints. Given a cost function $c:X_{1}\times X_{2}\rightarrow[0,+\infty)$ and an admissible entropy function $F\in\Gamma_{0}(\mathbb{R}_{+})$, a crucial role in the theory is played by the induced marginal perspective cost $H:X_{1}\times[0,+\infty)\times X_{2}\times[0,+\infty)\rightarrow[0,+\infty]$, the lower semicontinuous envelope of the function

$$\tilde{H}(x_{1},r_{1};x_{2},r_{2})=\inf_{\theta>0}r_{1}F\Big(\frac{\theta}{r_{1}}\Big)+r_{2}F\Big(\frac{\theta}{r_{2}}\Big)+\theta c(x_{1},x_{2}).$$

The function $H$ remains positively $1$ -homogeneous with respect to $(r_{1},r_{2})$ , a property used in [15] in order to derive a "homogeneous formulation" of Optimal Entropy-Transport problems that allows the study of the metric and dynamical aspects of the theory.

When the starting entropy $F$ has a strict minimum at $s=1$ , and the cost $c$ is a symmetric function such that $c(x_{1},x_{2})=0$ if and only if $\ x_{1}=x_{2}$ , I will show that the induced marginal perspective cost $H$ is symmetric, non-negative and $H(x_{1},r_{1};x_{2},r_{2})=0$ if and only if $(x_{1},r_{1})=(x_{2},r_{2})$ or $r_{1}=r_{2}=0$ .

In the presence of a non-zero cost function $c$ , an explicit computation of the induced marginal perspective cost is often unavailable. A special case, central in the study of Optimal Entropy-Transport problems, is given by the choices $X_{1}=X_{2}=X$ , $c=d^{2}$ for a metric $d$ on $X$ , and $F=U_{p}$ . It holds

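$$H_{p}(x_{1},r;x_{2},t)=\frac{2}{p}\Big[\mathfrak{M}_{1}(r,t)-\mathfrak{M}_{1-p}(r,t)\Big(1+(1-p)\frac{d^{2}(x_{1},x_{2})}{2}\Big)_{+}^{\frac{p}{p-1}}\Big],\qquad p\neq 0,1.$$

When $p=1$ or $p=0$, one gets

$$H_{1}(x_{1},r;x_{2},t)=2\Big[\mathfrak{M}_{1}(r,t)-\mathfrak{M}_{0}(r,t)e^{-\frac{d^{2}(x_{1},x_{2})}{2}}\Big],$$

$$H_{0}(x_{1},r;x_{2},t)=r\ln{r}+t\ln{t}-(r+t)\ln\Big(\frac{r+t}{2+d^{2}(x_{1},x_{2})}\Big).$$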
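A small Python sketch may help to see how the three expressions for $H_{p}$, $H_{1}$ and $H_{0}$ fit together. It takes the distance $d=d(x_{1},x_{2})$ as a plain number and checks numerically that the general formula approaches the two special cases as $p\to 1$ and $p\to 0$. The function names and the sample values are illustrative assumptions.

```python
import math

def power_mean(q, r, t):
    """Power mean M_q(r, t); geometric mean for q = 0, zero for q < 0 if rt = 0."""
    if q == 0:
        return math.sqrt(r * t)
    if q < 0 and (r == 0 or t == 0):
        return 0.0
    return ((r ** q + t ** q) / 2.0) ** (1.0 / q)

def xlogx(u):
    # Convention 0 ln 0 = 0
    return 0.0 if u == 0 else u * math.log(u)

def cone_cost(p, r, t, d):
    """H_p(x1, r; x2, t) for the cost c = d^2, where d stands for d(x1, x2)."""
    m1 = power_mean(1, r, t)
    if p == 1:
        return 2.0 * (m1 - power_mean(0, r, t) * math.exp(-d * d / 2.0))
    if p == 0:
        if r + t == 0:
            return 0.0
        return xlogx(r) + xlogx(t) - (r + t) * math.log((r + t) / (2.0 + d * d))
    base = max(1.0 + (1.0 - p) * d * d / 2.0, 0.0)     # positive part ( . )_+
    return (2.0 / p) * (m1 - power_mean(1 - p, r, t) * base ** (p / (p - 1.0)))

r, t, d = 1.0, 4.0, 0.5
costless_p2 = cone_cost(2, r, t, 0.0)        # reduces to (r - t)^2 / (2 (r + t))
far_apart_p2 = cone_cost(2, r, t, 2.0)       # positive part vanishes: (r + t) / 2
gap_near_one = abs(cone_cost(1.0 + 1e-6, r, t, d) - cone_cost(1, r, t, d))
gap_near_zero = abs(cone_cost(1e-6, r, t, d) - cone_cost(0, r, t, d))
```

For $p=2$ the positive part vanishes as soon as $d\geq\sqrt{2}$, in which case the cost reduces to $\mathfrak{M}_{1}(r,t)$ and no longer depends on the distance.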

Our main theorem states that for any $p\geq 1$ the square root of $H_{p}$ satisfies the triangle inequality on the cone space over $X$ . The latter is the space $\mathfrak{C}=Y/{\sim}$ , where $Y=X\times[0,+\infty)$ and

$$(x_{1},r_{1})\sim(x_{2},r_{2})\iff r_{1}=r_{2}=0\ \text{or}\ r_{1}=r_{2},\ x_{1}=x_{2}.$$

Thus, I provide new examples of entropy-transport metrics besides the Gaussian Hellinger-Kantorovich distance ( $p=1$ ) and the related Hellinger-Kantorovich distance studied in [15]. The class of examples includes, for $p=2$ , a transport variant of the Vincze-Le Cam distance [16], [17],

$$H(r,t)=\frac{(r-t)^{2}}{2(r+t)}.$$

This paper is organized as follows.

In Section II, I recall some basic concepts of convex analysis, in particular I discuss the connection between the entropy function and the induced perspective function.

In the third section, I recall the definition of the power means and their main properties. The results in this section will be useful in the study of the marginal perspective cost generated by the power like entropies.

Section IV is devoted to the study of the costless version of the function $H$ . I provide a list of examples of admissible entropy functions, which includes indicator functions, $\chi^{\alpha}$ divergences, Matusita divergences, power like entropies and other two families of convex functions that I have called power-logarithmic entropies and double power entropies. Then, I compute the induced marginal perspective function and I discuss the metric properties of the function obtained starting from some of the previous examples. Finally, I study the convergence properties of the iteration of the minimizing procedure (3) and I will highlight the role of the class of Matusita divergences.

In the fifth section I introduce the notion of homogeneous marginal perspective cost and I discuss its main properties.

In section VI, I present the Optimal Entropy-Transport problem and I briefly motivate the "homogeneous formulation" of this problem, via the homogeneous marginal perspective cost.

In the last section I focus on the marginal perspective cost $H_{p}$ induced by the power like entropy $U_{p}$ and by the cost $c=d^{2}$ , for a given metric $d$ . I prove the main theorem of the paper, which ensures that the function $H_{p}$ is the square of a metric on the corresponding cone space.

For the sake of simplicity, I limit the discussion to finite nonnegative measures over a finite discrete set, but the results can be generalized to finite nonnegative Radon measures over Hausdorff topological spaces (see [15]). I plan to address this case in a future work.

In this paper, a real function $f$ is increasing (resp. decreasing) if for any $r<s$ we have $f(r)\leq f(s)$ (resp. $f(r)\geq f(s)$ ).

II Entropy functions

A function $F:[0,+\infty)\rightarrow[0,+\infty]$ belongs to the class $\Gamma_{0}({\mathbb{R}_{+}})$ of admissible entropy functions if $F$ is convex, lower semicontinuous and $F(1)=0$ . The domain of the function $F$ is the set

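$$\mathrm{D}(F):=\big\{s\in[0,+\infty):F(s)<+\infty\big\}.$$

Given $F\in\Gamma_{0}({\mathbb{R}_{+}})$, the recession function $\mathrm{rec}(F)$ and the recession constant $F^{\prime}_{\infty}$ are defined by

$$\mathrm{rec}(F)(r):=\lim_{\alpha\to+\infty}\frac{F(1+\alpha r)}{\alpha},\qquad F^{\prime}_{\infty}:=\mathrm{rec}(F)(1).$$

The perspective function induced by $F$ is the function $\hat{F}:[0,+\infty)\times[0,+\infty)\rightarrow[0,+\infty]$, given by

$$\hat{F}(r,t):=\begin{cases}F\big(\frac{r}{t}\big)t&\text{if}\ t>0\\ \mathrm{rec}(F)(r)&\text{if}\ t=0.\end{cases}$$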

$\hat{F}$ is jointly convex, lower semicontinuous and $\hat{F}(r,r)=0$ for any $r$ .

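The right derivative $F^{\prime}_{0}$ at $0$ and the asymptotic affine coefficient $\mathrm{aff}F_{\infty}$ are defined by

$$F^{\prime}_{0}:=\begin{cases}-\infty&\text{if}\ F(0)=+\infty,\\ \lim_{s\downarrow 0}\frac{F(s)-F(0)}{s}&\text{otherwise},\end{cases}$$

$$\mathrm{aff}F_{\infty}:=\begin{cases}+\infty&\text{if}\ F^{\prime}_{\infty}=+\infty,\\ \lim_{s\to\infty}\big(F^{\prime}_{\infty}s-F(s)\big)&\text{otherwise},\end{cases}$$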

which are well posed due to the convexity of $F$ .

The Legendre conjugate function $F^{*}:\mathbb{R}\rightarrow(-\infty,+\infty]$ is defined by

$$F^{*}(\phi):=\sup_{s\geq 0}\{s\phi-F(s)\}.$$

$F^{*}$ is the conjugate of the convex function $\tilde{F}:\mathbb{R}\rightarrow[0,+\infty]$ obtained by extending $F$ to $+\infty$ for negative arguments. It is convex and lower semicontinuous. Concerning the behavior of $F^{*}$ , the following Lemma holds ([15], section $2.3$ ):

Lemma 1.

The function $F^{*}$ is an increasing homeomorphism between $(F_{0}^{\prime},F^{\prime}_{\infty})$ and $(-F(0),\mathrm{aff}F_{\infty})$ with $F^{*}(0)=0$ .

The reverse entropy function $R:[0,\infty)\rightarrow[0,\infty]$ is defined by

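$$R(s):=\begin{cases}F\big(\frac{1}{s}\big)s&\text{if}\ s>0\\ F^{\prime}_{\infty}&\text{if}\ s=0,\end{cases}$$

so that $R(s)=\hat{F}(1,s).$ In particular, $R$ is convex, lower semicontinuous and the map $F\mapsto R$ is an involution of $\Gamma_{0}({\mathbb{R}_{+}})$. Moreover, it holds $\hat{F}(r,t)=\hat{R}(t,r)$ and the function $R$ satisfies

$$R(1)=0,\quad R(0)=F^{\prime}_{\infty},\quad R^{\prime}_{\infty}=F(0),\quad R^{\prime}_{0}=-\mathrm{aff}F_{\infty},\quad \mathrm{aff}R_{\infty}=-F^{\prime}_{0}.$$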
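The involution $F\mapsto R$ can be illustrated on a concrete pair: a direct computation gives $s\,U_{1}(1/s)=s-1-\ln(s)=U_{0}(s)$, so the reverse entropy of $U_{1}$ is $U_{0}$. The Python sketch below checks this and the involution property on a few sample points; the helper `reverse_entropy` and the sample points are illustrative assumptions.

```python
import math

def reverse_entropy(F, F_prime_inf=math.inf):
    """Return R with R(s) = s * F(1/s) for s > 0 and R(0) = F'_infinity."""
    def R(s):
        return F(1.0 / s) * s if s > 0 else F_prime_inf
    return R

def U1(s):
    # U_1(s) = s ln(s) - s + 1, with U_1(0) = 1
    return 1.0 if s == 0 else s * math.log(s) - s + 1.0

def U0(s):
    # U_0(s) = s - 1 - ln(s), with U_0(0) = +infinity
    return math.inf if s == 0 else s - 1.0 - math.log(s)

R = reverse_entropy(U1)                       # recession constant of U_1 is +infinity
RR = reverse_entropy(R, F_prime_inf=U1(0))    # uses R'_infinity = F(0)

samples = [0.1, 0.5, 1.0, 2.0, 7.5]
max_gap_reverse = max(abs(R(s) - U0(s)) for s in samples)
max_gap_involution = max(abs(RR(s) - U1(s)) for s in samples)
```

Both gaps are at the level of floating-point rounding, and the boundary values agree with the listed identities: `R(0)` equals $F^{\prime}_{\infty}=+\infty$ and `RR(0)` equals $F(0)=1$.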

Starting from a function $F\in\Gamma_{0}(\mathbb{R}_{+})$ , a finite set $\Omega=\{x_{1},..,x_{m}\}$ , and two probability densities

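$$\mu_{1}=\sum_{i=1}^{m}r_{i}\delta_{x_{i}},\qquad \mu_{2}=\sum_{i=1}^{m}t_{i}\delta_{x_{i}},$$

the $F$-divergence of $\mu_{1}$ from $\mu_{2}$ is given by

$$D_{F}(\mu_{1}||\mu_{2}):=\sum_{i=1}^{m}\hat{F}(r_{i},t_{i})=\sum_{i=1}^{m}\hat{R}(t_{i},r_{i}).$$

The Legendre conjugates of $F$ and $R$ are related by

$$\psi\leq-F^{*}(\phi)\iff\phi\leq-R^{*}(\psi).$$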

III Power means

In this section I study the power means (also called generalized means), a family of functions that includes the well-known arithmetic, geometric and harmonic means. The properties of these functions will be useful later on.

In what follows $r,t$ will denote two non-negative real numbers and $p$ a real parameter, which I suppose for the present not to be $0$. The $p$-power mean between $r$ and $t$ is given by

$$\mathfrak{M}_{p}(r,t):=\Big(\frac{r^{p}+t^{p}}{2}\Big)^{\frac{1}{p}},$$

except when $p<0$ and $r$ or $t$ is zero. In this case $\mathfrak{M}_{p}$ is equal to zero:

$$\mathfrak{M}_{p}(r,t):=0\quad\text{if}\ p<0\ \text{and}\ rt=0.$$

In the case $p=0$ I put

$$\mathfrak{M}_{0}(r,t):=\sqrt{rt},$$

so that $\lim_{p\to 0}\mathfrak{M}_{p}(r,t)=\mathfrak{M}_{0}(r,t).$

It is easy to see that $\mathfrak{M}_{p}(r,r)=r$ for every $p\in\mathbb{R}$ and every $r\geq 0$. The function $\mathfrak{M}_{p}$ is symmetric, i.e. $\mathfrak{M}_{p}(r,t)=\mathfrak{M}_{p}(t,r),$ and positively $1$-homogeneous in the sense that $\mathfrak{M}_{p}(\lambda r,\lambda t)=\lambda\mathfrak{M}_{p}(r,t)$ for every $\lambda\geq 0.$ Moreover, it is not difficult to prove that $\mathfrak{M}_{p}(r,s)\leq\mathfrak{M}_{p}(r,t)$ for every $p$, $r$ and $s\leq t.$

$\mathfrak{M}_{1}$ is the well-known arithmetic mean, $\mathfrak{M}_{0}$ is the geometric mean and $\mathfrak{M}_{-1}$ is called harmonic mean.

The main theorem (see [18] for a proof) regarding the power means is the following:

Theorem 1.

If $p_{1}<p_{2}$ then

$$\mathfrak{M}_{p_{1}}(r,t)\leq\mathfrak{M}_{p_{2}}(r,t),$$

with the case of equality given by $r=t$ , or $p_{2}\leq 0$ and $r\wedge t=0$ .

In particular,

[TABLE]

for any $p\in\mathbb{R}$ , $r,t\in[0,\infty).$
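The conventions of this section (geometric mean at order $0$, value zero for negative orders when one argument vanishes) and the monotonicity in the order stated in Theorem 1 can be illustrated numerically. This is a minimal sketch with an arbitrarily chosen pair $(r,t)=(2,8)$ and an arbitrary list of orders; it is not a proof.

```python
import math

def power_mean(p, r, t):
    """Power mean M_p(r, t) of two non-negative numbers."""
    if p == 0:
        return math.sqrt(r * t)             # limit of M_p as p -> 0
    if p < 0 and (r == 0 or t == 0):
        return 0.0                          # convention when rt = 0 and p < 0
    return ((r ** p + t ** p) / 2.0) ** (1.0 / p)

r, t = 2.0, 8.0
orders = [-3.0, -1.0, 0.0, 0.5, 1.0, 2.0, 5.0]
values = [power_mean(p, r, t) for p in orders]

# Theorem 1: the map p -> M_p(r, t) is increasing.
is_increasing = all(a <= b for a, b in zip(values, values[1:]))

harmonic = power_mean(-1, r, t)
geometric = power_mean(0, r, t)
arithmetic = power_mean(1, r, t)
degenerate = power_mean(-1, 0.0, 3.0)
```

For this pair the harmonic, geometric and arithmetic means are $3.2$, $4$ and $5$, in agreement with the ordering of the orders $-1<0<1$.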

IV Costless marginal perspective

Let $F\in\Gamma_{0}(\mathbb{R}_{+})$ be an admissible entropy function and let $R$ be its reverse entropy. In general, for the induced perspective function one has $\hat{F}\neq\hat{R}$ , so that the $F$ -divergence does not satisfy the symmetric property. In order to replace $F$ with a new "symmetric entropy", a natural procedure is the following: define the marginal perspective function $H_{F}:[0,+\infty)\times[0,+\infty)\rightarrow[0,+\infty]$ as the lower semicontinuous envelope of the function

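$$\tilde{H}_{F}(r_{1},r_{2}):=\inf_{\theta>0}\Big\{r_{1}F\Big(\frac{\theta}{r_{1}}\Big)+r_{2}F\Big(\frac{\theta}{r_{2}}\Big)\Big\}.$$

An equivalent definition can be given in terms of the induced perspective functions $\hat{F}$ or $\hat{R}$ by: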

[TABLE]

The infimum in the definition is a minimum and it occurs in the interval $[r_{1},r_{2}]$ (without loss of generality I am assuming $r_{1}\leq r_{2}$ ): to see this it is enough to notice that the function $\theta\mapsto\hat{F}(\theta,r_{1})+\hat{F}(\theta,r_{2})$ is lower semicontinuous and it is decreasing in $[0,r_{1}]$ and increasing in $[r_{2},+\infty)$ . I will prove in section V (in a more general context), that the function $H_{F}$ is non-negative, symmetric, jointly convex and positively $1$ -homogeneous. Moreover, when the function $F$ has a strict minimum at $1$ , $H_{F}(r,t)=0$ if and only if $r=t$ . It is important to notice, since $H_{F}$ is $1$ -homogeneous, that the study of the function $H_{F}$ is equivalent to the study of the $1$ -variable function $s\mapsto H_{F}(1,s)\in\Gamma_{0}(\mathbb{R}_{+}).$ I will continuously use this fact in the paper.
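Since the minimizer lies between $r_{1}$ and $r_{2}$ and the objective is convex in $\theta$, the marginal perspective function can be approximated numerically by a ternary search on that interval. The Python sketch below does this for strictly positive arguments and compares the output with three closed forms quoted in the introduction; the function names, the iteration count and the test point are illustrative assumptions.

```python
import math

def marginal_perspective(F, r, t, iterations=200):
    """Approximate min over theta of r F(theta/r) + t F(theta/t), for r, t > 0.

    Returns the pair (minimum value, minimizer), searching only in [min(r,t), max(r,t)].
    """
    def objective(theta):
        return r * F(theta / r) + t * F(theta / t)

    lo, hi = min(r, t), max(r, t)
    for _ in range(iterations):
        third = (hi - lo) / 3.0
        left, right = lo + third, hi - third
        if objective(left) <= objective(right):
            hi = right          # a minimizer lies in [lo, right]
        else:
            lo = left           # a minimizer lies in [left, hi]
    theta = (lo + hi) / 2.0
    return objective(theta), theta

def U1(s):
    return s * math.log(s) - s + 1.0

def U0(s):
    return s - 1.0 - math.log(s)

r, t = 1.0, 4.0
hellinger_value, hellinger_theta = marginal_perspective(U1, r, t)
js_value, js_theta = marginal_perspective(U0, r, t)
tv_value, _ = marginal_perspective(lambda s: abs(s - 1.0), r, t)
swapped_value, _ = marginal_perspective(U1, t, r)
scaled_value, _ = marginal_perspective(U1, 3.0 * r, 3.0 * t)

js_closed = r * math.log(r) + t * math.log(t) - (r + t) * math.log((r + t) / 2.0)
```

In this example the minimizer for $U_{1}$ is the geometric mean $\sqrt{rt}=2$ and the one for $U_{0}$ is the arithmetic mean $(r+t)/2=2.5$, while the last two values illustrate the symmetry and the $1$-homogeneity of $H_{F}$.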

IV-A Examples

I consider now different examples of admissible entropy functions $F$ and I compute the expression of the induced marginal perspective $H_{F}$. I will in general suppose $rt>0$, so that I can avoid ambiguous expressions at the boundary of the domain that should be treated carefully.

Example 1.

(Indicator functions) The indicator function of the closed interval with endpoints $a$ and $b$ , $0\leq a\leq 1\leq b\leq+\infty$ , is defined by

$$I_{[a,b]}(s):=\begin{cases}0&\text{if}\ s\in[a,b],\\ +\infty&\text{otherwise}.\end{cases}$$

When $F=I_{[a,b]}$ one obtains

[TABLE]

where $\frac{b}{a}=+\infty$ if $a=0$ and $\frac{a}{b}=0$ if $b=+\infty$ .

Example 2.

( $\chi^{\alpha}$ divergences) Given a parameter $\alpha\geq 1$ , the $\chi^{\alpha}$ divergence is defined as

$$\chi^{\alpha}(s):=|s-1|^{\alpha}.$$

$\chi^{1}=|s-1|$ is the famous total variation entropy.

The entropy function $F=\chi^{\alpha}$ gives rise to the marginal perspective function

[TABLE]

We can recognize the expression of the so-called Puri-Vincze divergence.

Example 3.

(Matusita divergences) For $0<a\leq 1$ the Matusita divergence is given by $M_{a}(s)=|s^{a}-1|^{\frac{1}{a}}$ . Clearly $\chi^{1}=M_{1}.$

When $F=M_{a}$ it is easy to see that

$$H(r,t)=2^{1-\frac{1}{a}}\,|r^{a}-t^{a}|^{\frac{1}{a}}.$$

It is interesting to note that except for the constant factor $2^{1-\frac{1}{a}}$ , the Matusita function $M_{a}$ remains invariant after the minimizing procedure (33). I will come back to this point in section IV-C.
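The invariance of the Matusita function, up to the factor $2^{1-\frac{1}{a}}$, can be verified by a short computation. The following is a sketch under the assumptions $0<r<t$ and $\theta\in[r,t]$ (the interval where the minimum is attained); it uses that the function of $x$ in the second line is convex, because $\frac{1}{a}\geq 1$, and symmetric about the midpoint of $[r^{a},t^{a}]$, so that it is minimized there.

```latex
\begin{aligned}
\hat{M}_{a}(\theta,r) &= r\,\Big|\frac{\theta^{a}}{r^{a}}-1\Big|^{\frac{1}{a}} = |\theta^{a}-r^{a}|^{\frac{1}{a}},\\
\hat{M}_{a}(\theta,r)+\hat{M}_{a}(\theta,t) &= (x-r^{a})^{\frac{1}{a}}+(t^{a}-x)^{\frac{1}{a}},\qquad x:=\theta^{a}\in[r^{a},t^{a}],\\
\min_{x\in[r^{a},t^{a}]}\Big[(x-r^{a})^{\frac{1}{a}}+(t^{a}-x)^{\frac{1}{a}}\Big]
 &= 2\,\Big(\frac{t^{a}-r^{a}}{2}\Big)^{\frac{1}{a}}
 = 2^{1-\frac{1}{a}}\,(t^{a}-r^{a})^{\frac{1}{a}}.
\end{aligned}
```

Taking $r=1$ and $t=s$ gives $H(1,s)=2^{1-\frac{1}{a}}M_{a}(s)$, which is the invariance property mentioned above.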

Example 4.

(Power like entropies) Let $p$ be any real number. I call power-like entropy of order $p$ the function $U_{p}:[0,+\infty)\rightarrow[0,+\infty]$ characterized by

[TABLE]

The function $U_{p}$ can be computed explicitly and one gets:

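$$U_{p}(s)=\begin{cases}\frac{1}{p(p-1)}\big(s^{p}-p(s-1)-1\big)&\text{if}\ p\neq 0,1,\\ s\ln(s)-s+1&\text{if}\ p=1,\\ s-1-\ln(s)&\text{if}\ p=0,\end{cases}$$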

with $\displaystyle U_{p}(0)=1/p$ for $p>0$ and $U_{p}(0)=+\infty$ for $p\leq 0$ . This family of functions, also called Dichotomy Class, was introduced by Liese and Vajda [19],[4].

Given $F=U_{p}$ , we obtain the following expression:

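$$H(r,t)=\begin{cases}\frac{2}{p}\Big[\mathfrak{M}_{1}(r,t)-\mathfrak{M}_{1-p}(r,t)\Big]&\text{if}\ p\neq 0,\\ r\ln(r)+t\ln(t)-(r+t)\ln\Big(\frac{r+t}{2}\Big)&\text{if}\ p=0.\end{cases}$$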
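For $p\neq 0,1$ and $rt>0$, the power-mean form of $H_{U_{p}}$ can be recovered by a direct minimization. The following is a sketch of that computation, where $g$ is an auxiliary notation for the objective function; since $g$ is convex in $\theta$, its critical point is the minimizer.

```latex
\begin{aligned}
g(\theta) &:= r\,U_{p}\Big(\frac{\theta}{r}\Big)+t\,U_{p}\Big(\frac{\theta}{t}\Big)
 = \frac{\theta^{p}\,(r^{1-p}+t^{1-p})-2p\,\theta+(p-1)(r+t)}{p(p-1)},\\
g'(\theta) &= \frac{\theta^{p-1}(r^{1-p}+t^{1-p})-2}{p-1}=0
 \iff \theta^{p-1}=\frac{2}{r^{1-p}+t^{1-p}}
 \iff \theta=\mathfrak{M}_{1-p}(r,t),\\
g\big(\mathfrak{M}_{1-p}(r,t)\big) &= \frac{2(1-p)\,\mathfrak{M}_{1-p}(r,t)+(p-1)(r+t)}{p(p-1)}
 = \frac{2}{p}\Big[\mathfrak{M}_{1}(r,t)-\mathfrak{M}_{1-p}(r,t)\Big].
\end{aligned}
```

The last line uses $\theta^{p}(r^{1-p}+t^{1-p})=2\theta$ at the critical point, so the optimal $\theta$ is itself the power mean of order $1-p$.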

We can recognize some well-known statistical functionals: for example, in the logarithmic entropy case $p=1$ the Hellinger distance appears

$$H(r,t)=(\sqrt{r}-\sqrt{t})^{2}.$$

I have already noticed that the same function is obtained starting from the entropy $U_{\frac{1}{2}}(s)=2(\sqrt{s}-1)^{2}=2M_{\frac{1}{2}}$.

For $p=0$ we have the Jensen-Shannon divergence, a squared distance between measures derived from the Kullback-Leibler divergence ([11]).

The quadratic entropy $U_{2}(s)=\frac{1}{2}(s-1)^{2}$ gives rise to the triangular discrimination

$$H(r,t)=\frac{(r-t)^{2}}{2(r+t)}.$$

Example 5.

(Power-logarithmic entropies) Given a real number $p\geq 1$ , I call power-logarithmic entropy of order $p$ the function $V_{p}:[0,+\infty)\rightarrow[0,+\infty]$

[TABLE]

and $V_{p}(0)=+\infty.$ It is easy to see that $V_{p}\in\mathcal{C}^{\infty}(0,+\infty)$ and $V_{p}(0)=\lim_{s\downarrow 0}V_{p}(s)$ .

Starting from the power-logarithmic entropy of order $p$ one gets:

[TABLE]

As expected, $H_{V_{1}}=H_{U_{0}}$ since $V_{1}=U_{0}$ . When $p=2$ , one obtains the symmetric Kullback-Leibler divergence [9]:

[TABLE]

Example 6.

(Double power entropies) Given two parameters $p,q$ such that $p\geq 1,\ 0<q\leq 1$ and $p\neq q$ , or $p<0,\ q\geq 1$ , the double power entropy of order $p,q$ is given by

[TABLE]

$W_{p,q}$ is a strictly convex function, $W_{p,q}\in\mathcal{C}^{\infty}(0,+\infty)$, and it is extended to $0$ by continuity so that $W_{p,q}(0)=p-q$ when $p,q$ are positive, $W_{p,q}(0)=+\infty$ when $p<0.$

A direct computation shows that:

[TABLE]

For example, when $p=3/2,q=1/2$ one gets

[TABLE]

IV-B Divergences and triangle inequality

As we have previously seen, starting from a function $F\in\Gamma_{0}(\mathbb{R}_{+})$ such that $F(s)=0$ if and only if $s=1$ , the marginal perspective function $H$ is non-negative, symmetric and $H(r,t)=0$ if and only if $r=t$ (if no confusion is possible, from now on I will denote by $H$ the function $H_{F}$ ). In this section I begin the discussion regarding another property that $H$ has to fulfill in order to be a metric on $[0,\infty)$ : the triangle inequality.

When I write " $d$ is a metric on a space $X$ " I mean that $d:X\times X\rightarrow[0,+\infty)$ is a function such that $d(x,y)=0$ if and only if $x=y$ , it is symmetric, i.e. $d(x,y)=d(y,x)$ for every $x,y\in X$ , and it satisfies the triangle inequality in the sense that $d(x,z)\leq d(x,y)+d(y,z)$ for every $x,y,z\in X$ .

Since I will prove that the only divergence that is also a distance is the total variation, I will also discuss when the power $H^{a}$ , $a\in(0,1)$ , is a metric on $[0,+\infty)$ .

The convexity of the function $H$ implies that

[TABLE]

I recall this simple Lemma:

Lemma 2.

Let $(X,d)$ be a metric space and $f:[0,+\infty)\rightarrow[0,+\infty)$ be a concave function such that $f(r)=0$ if and only if $r=0$ . Then $(X,f(d))$ is a metric space.

Proof.

$f(d(x_{1},x_{2}))\geq 0$ and $f(d(x_{1},x_{2}))=0$ if and only if $d(x_{1},x_{2})=0$ which implies $x_{1}=x_{2}$ . It is clear that $f(d)$ is symmetric. Since $f$ is concave and $f(r)>0$ for every $r>0$ it follows that $f$ is increasing and subadditive, thus

[TABLE]

∎
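Lemma 2 can be illustrated on the real line with $d(x,y)=|x-y|$: composing with the concave function $f(r)=\sqrt{r}$ preserves the triangle inequality, while the convex function $f(r)=r^{2}$ does not. The sketch below checks all triples from a small, arbitrarily chosen set of points; the helper name and the points are illustrative assumptions.

```python
import itertools
import math

def satisfies_triangle_inequality(dist, points, tol=1e-12):
    """Check dist(x, z) <= dist(x, y) + dist(y, z) on every triple of points."""
    return all(dist(x, z) <= dist(x, y) + dist(y, z) + tol
               for x, y, z in itertools.product(points, repeat=3))

points = [0.0, 0.3, 1.0, 2.5, 7.0]

def concave_of_metric(x, y):
    # f(d) with f(r) = sqrt(r), concave and vanishing only at 0
    return math.sqrt(abs(x - y))

def convex_of_metric(x, y):
    # f(d) with f(r) = r^2, which is not concave
    return abs(x - y) ** 2

sqrt_ok = satisfies_triangle_inequality(concave_of_metric, points)
square_ok = satisfies_triangle_inequality(convex_of_metric, points)
```

On these points the square fails, for instance, on the triple $(0,1,2.5)$, where $6.25>1+2.25$.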

An easy consequence of the Lemma is that if $H^{a}$ is a metric, then $H^{b}$ is a metric for every $b\in(0,a]$ .

Using the symmetry, the $1$ -homogeneity of the function $H$ together with the property (52), it follows that the triangle inequality for the function $H^{a}$ is equivalent to the following inequality

[TABLE]

A last useful remark is that

[TABLE]

is a necessary condition for the existence of a power $a$ such that $H^{a}$ is a metric.

Regarding the examples previously seen, it was proved by Kafka, Osterreicher and Vincze [12] that $H^{a}_{\chi^{\alpha}}$ is a metric when $a=1/\alpha.$

The Matusita divergences clearly provide the distance $H^{a}_{M_{a}}$ .

When $p>1$, $\lim_{u\downarrow 0}H_{V_{p}}(u,1)=+\infty$, so that, except for the case $p=1$, $H^{a}_{V_{p}}$ is not a metric for any power $a$.

I now turn the attention to the function $H_{p}:=H_{U_{p}}$ . It has the following expression

[TABLE]

that is also valid when $rt=0$ with the convention $0\ln(0)=0$ .

As I have already noticed, $H_{p}$ is the square of a metric on $[0,+\infty)$ for $p=0,p=\frac{1}{2},p=1$. I investigate now the same question for every real number $p.$ This was already done by Osterreicher in the case $p\geq 1$ [13]. Following the same approach I prove:

Theorem 2.

The induced marginal perspective function $H_{p}$ is the square of a metric on $[0,+\infty)$ for any $p\in(-\infty,\frac{1}{2}]\cup[1,+\infty)$ . $\sqrt{H_{p}}$ does not satisfy the triangle inequality if $p\in(\frac{1}{2},1).$
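The statement can be probed numerically on a small grid of points. The Python sketch below computes the largest violation of the triangle inequality for $\sqrt{H_{p}}$ over all triples from the grid, for a few admissible orders and for $p=0.75\in(\frac{1}{2},1)$. The grid, the chosen orders and the function names are illustrative assumptions, and a finite grid is of course only a sanity check, not a proof.

```python
import itertools
import math

def power_mean(q, r, t):
    """Power mean M_q(r, t); geometric mean for q = 0, zero for q < 0 if rt = 0."""
    if q == 0:
        return math.sqrt(r * t)
    if q < 0 and (r == 0 or t == 0):
        return 0.0
    return ((r ** q + t ** q) / 2.0) ** (1.0 / q)

def xlogx(u):
    # Convention 0 ln 0 = 0
    return 0.0 if u == 0 else u * math.log(u)

def H(p, r, t):
    """Marginal perspective function H_p(r, t) induced by the power like entropy U_p."""
    if p == 0:   # Jensen-Shannon case: r ln r + t ln t - (r + t) ln((r + t) / 2)
        return xlogx(r) + xlogx(t) - xlogx(r + t) + (r + t) * math.log(2.0)
    return (2.0 / p) * (power_mean(1, r, t) - power_mean(1 - p, r, t))

def worst_triangle_gap(p, points):
    """Largest value of d(x, z) - d(x, y) - d(y, z) for d = sqrt(H_p)."""
    def d(r, t):
        return math.sqrt(max(H(p, r, t), 0.0))   # clamp rounding noise below zero
    return max(d(x, z) - d(x, y) - d(y, z)
               for x, y, z in itertools.product(points, repeat=3))

points = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
gaps = {p: worst_triangle_gap(p, points) for p in (-1.0, 0.0, 2.0, 3.0, 0.75)}
```

On this grid the gap is non-positive up to rounding for $p\in\{-1,0,2,3\}$, whereas for $p=0.75$ the triple $(0,\frac{1}{4},1)$ already gives $\sqrt{H_{p}(0,1)}\approx 1.080>\sqrt{H_{p}(0,\frac{1}{4})}+\sqrt{H_{p}(\frac{1}{4},1)}\approx 1.041$.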

For the proof of the Theorem I will use the following lemma. It is the first example in the paper of a fact that will be recurrent: the central role of the class of Matusita divergences in the study of the metric properties of the marginal perspective function.

Lemma 3.

Given a number $a\in(0,1]$ and an induced marginal perspective function $H$ , if

[TABLE]

is decreasing in $[0,1),$ then $H^{a}$ satisfies the triangle inequality.

Proof.

Due to the monotonicity of the square root function, one has that

[TABLE]

is decreasing in $[0,1)$ , so that $h^{a}(u)\geq h^{a}(v)$ and $h^{a}(u)\geq h^{a}(\frac{u}{v})$ if $0\leq u<v<1$ . It follows that

[TABLE]

∎

Proof of Theorem 2.

Using now Lemma 3, it remains to show that the function

[TABLE]

is decreasing in $(0,1)$ , where I have used the notation $f_{p}(u):=H_{p}(u,1).$ The derivative of the function $h_{p}$ is the following:

[TABLE]

where I set

[TABLE]

Note that $\phi_{p}(1)=0$ and $\psi_{p}(u)=\sqrt{u}\phi_{p}^{\prime}(u)$ satisfies:

[TABLE]

The function $\psi_{p}$ is such that $\psi_{p}(1)=0$ and

[TABLE]

Now let us suppose $p>1$ : I have to prove that $\phi_{p}$ is positive in $(0,1)$ . This is implied by $\psi_{p}(u)<0$ in $(0,1)$ which is true because $\psi^{\prime}_{p}(u)$ is positive in $(0,1)$ . Similar considerations can be applied to the case $p<0$ and $p\in(0,\frac{1}{2}).$

For $p\in(\frac{1}{2},1)$ one gets $\psi_{p}^{\prime}(u)<0$ in $(0,1)$ so $\psi_{p}$ is positive in $(0,1)$ . This implies that $\phi_{p}$ is negative and so $h_{p}$ is increasing in $(0,1)$ . As a consequence, an analysis of the proof of Lemma 3 shows that the triangle inequality is reversed for these values of $p$ . ∎

Remark 1.

It was proved by Osterreicher and Vajda ([14]) that, if $p\in(\frac{1}{2},1)$ , $H_{p}^{1-p}$ is a metric.

IV-C Marginal perspective function and convergence properties

We have seen that the construction of the marginal perspective function naturally produces a symmetric divergence. In this section I will show that this is not the only feature of the minimization procedure (33): iterating this process I will highlight the important role of the class of Matusita divergences.

I define the space $\Gamma_{0}^{s}(\mathbb{R}_{+})$ as the set of functions $F\in\Gamma_{0}(\mathbb{R}_{+})$ such that $F$ is equal to its reverse entropy $R$.

At the beginning of section IV we have seen how to generate a map $T_{1}:\Gamma_{0}(\mathbb{R}_{+})\rightarrow\Gamma_{0}^{s}(\mathbb{R}_{+})$ : starting from a function $F\in\Gamma_{0}(\mathbb{R}_{+})$ , I define $T_{1}(F)(s):=H_{F}(1,s)$ , where $H_{F}$ is the lower semicontinuous envelope of the function $\tilde{H}_{F}$ obtained by (33). I also denote by $T_{a}:\Gamma_{0}(\mathbb{R}_{+})\rightarrow\Gamma_{0}^{s}(\mathbb{R}_{+})$ the map given by $T_{a}(F):=2^{\frac{1}{a}-1}T_{1}(F)$ for every $a\in(0,1]$ .

It is clear that the two trivial entropies

[TABLE]

are fixed points of the map $T_{a}$ for any $a\in(0,1]$ .

Another important property that follows immediately from the definition is that

where $\displaystyle g(u):=\frac{F^{a}(u)}{1-u^{a}}.$ Since the triangle inequality holds, at least one of the numbers $\frac{g(u)}{g(v)}$ and $\frac{g(u)}{g(\frac{u}{v})}$ is less than or equal to $1$. Choosing $u:=v^{2}$, it follows $g(v^{2})\leq g(v)$ for any $v<1$. By contradiction, let us suppose that there does not exist a positive constant $c$ such that $F(s)>c|s^{a}-1|^{\frac{1}{a}}$; then there exists a sequence $v_{n}\in(0,1)$ such that $g(v_{n})\rightarrow 0$. So, I can find a $\bar{v}\in(0,1)$ such that $D(0,1)=g(0)>g(\bar{v})$. On the other hand, since the sequence $w_{n}$ defined by $w_{0}=\bar{v}$, $w_{n}=w^{2}_{n-1}$ converges to $0$, by continuity of the function $g$ we have that $g(w_{n})\rightarrow g(0)$, which is a contradiction since $g(0)>g(w_{0})$ and $g(w_{n})$ is decreasing.

Now it is easy to show that the metric $D$ is complete: since $H^{a}$ is a metric, $H$ is symmetric and $D(0,1)=F^{a}(0):=c_{2}<+\infty$ . From the convexity of the function $F$ it follows $F^{a}(s)\leq c_{2}|s-1|^{a}$ so that

[TABLE]

The result follows using the fact that $M^{a}_{a},M^{a}_{1}$ are two complete metrics that induce the same convergence. ∎

Recall that, given a metric space $(X,d)$ and the interval $I=[0,1]$ , a curve $\gamma:I\rightarrow X$ is a constant speed geodesic if

$$d(\gamma(s),\gamma(t))=|t-s|\,d(\gamma(0),\gamma(1))\qquad\text{for every }s,t\in I.$$

A metric space $(X,d)$ is a geodesic space if for every pair of points $x,y\in X$ there exists a constant speed geodesic between $x$ and $y$ . A well-known fact is that a complete metric space is a geodesic space if and only if for every pair of points $x,y\in X$ there exists $z\in X$ such that $d(x,z)=d(z,y)=\frac{1}{2}d(x,y)$ . The point $z$ is called a mid-point between $x$ and $y$ .
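The mid-point criterion can be checked numerically on a simple example. The sketch below assumes, purely for illustration, the metric $D(r,t)=|t^{a}-r^{a}|$ on $[0,+\infty)$ ; the function names and the choice of $D$ are hypothetical and not taken from the paper. It verifies that $z=((r^{a}+t^{a})/2)^{1/a}$ is a mid-point and that linear interpolation in the variable $s^{a}$ gives a constant speed curve.

```python
def dist(r: float, t: float, a: float) -> float:
    """Illustrative metric D(r, t) = |t^a - r^a| on [0, +inf); an assumption of this sketch."""
    return abs(t ** a - r ** a)


def curve(r: float, t: float, a: float, lam: float) -> float:
    """Candidate geodesic from r (lam = 0) to t (lam = 1): linear interpolation of s^a."""
    return ((1.0 - lam) * r ** a + lam * t ** a) ** (1.0 / a)


def midpoint(r: float, t: float, a: float) -> float:
    """Point z expected to satisfy D(r, z) = D(z, t) = D(r, t) / 2."""
    return curve(r, t, a, 0.5)


if __name__ == "__main__":
    r, t, a = 1.0, 9.0, 0.5
    z = midpoint(r, t, a)
    # The two half-distances should agree and add up to the full distance.
    print(z, dist(r, z, a), dist(z, t, a), dist(r, t, a))
```

Any other increasing reparametrization of the half-line would work equally well; the power $s^{a}$ is chosen only because it matches the exponent used in this section.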

I am now ready to prove the analogue of Theorem 3 in the case $0<a<1$ , under an additional assumption.

Theorem 4.

Let $F\in\Gamma_{0}^{s}(\mathbb{R}_{+})$ and let us suppose that $H^{a}$ , $a\in(0,1)$ , is a distance and $T_{a}(F)=F$ . Then $F(s)=c|s^{a}-1|^{\frac{1}{a}}$ for a constant $c\in(0,+\infty)$ .

Proof.

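Since $T_{a}(F)=F$ one has that for any $r,t$ there exists $s$ such that

$$H(r,t)=2^{\frac{1}{a}-1}\big(H(r,s)+H(s,t)\big).\tag{67}$$

Using the fact that $H^{a}$ is a metric and the concavity of the function $f(x)=x^{a}$ one gets

$$H^{a}(r,t)\leq H^{a}(r,s)+H^{a}(s,t)\leq 2^{1-a}\big(H(r,s)+H(s,t)\big)^{a}.\tag{68}$$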

Equation (67) implies the equality in the inequality (68), in particular $H^{a}(r,s)=H^{a}(s,t).$

Since $r,t$ are two arbitrary points and $H^{a}$ is a complete metric by Lemma 6, it follows that $(\mathbb{R}_{+},H^{a})$ is a one-dimensional geodesic space, so it must be isometric to $(\mathbb{R}_{+},|\cdot|)$ (for a reference see [20], chapter $2$ ). In particular there exists $\phi:\mathbb{R}_{+}\rightarrow\mathbb{R}_{+}$ increasing and continuous such that I can write $H^{a}(r,t)=|\phi(t)-\phi(r)|$ . From the $1$ -homogeneity of the function $H$ , it follows that $H^{a}(r,t)=r^{a}H^{a}(1,\frac{t}{r})$ for $r>0$ , so that

[TABLE]

Evaluating equation (69) for $t=2r$ I get

[TABLE]

whereas the choice $r=2$ yields

[TABLE]

Considering now the previous equation with $t=2r$ , it follows

[TABLE]

Using now the identities (70) and (72), it follows

[TABLE]

and I can compute $\phi(r)$ as

[TABLE]

so that $\displaystyle H^{a}(1,r)=(r^{a}-1)\frac{H^{a}(2,1)}{2^{a}-1}$ for any $r\geq 1$ , which proves the theorem. ∎
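As a numerical sanity check of the converse direction, one can verify that $F(s)=c|s^{a}-1|^{\frac{1}{a}}$ is indeed left unchanged by $T_{a}$ . The sketch below is hedged: it assumes that, for $F$ equal to its reverse entropy and $s\geq 1$ , $T_{1}(F)(s)$ can be computed as the minimum over $\theta\in[1,s]$ of $F(\theta)+\theta F(\frac{s}{\theta})$ (the expression appearing in the proof of Theorem 5), and it replaces the exact minimization by a grid search. The helper names are hypothetical.

```python
def matusita_entropy(s: float, a: float, c: float = 1.0) -> float:
    """F(s) = c |s^a - 1|^(1/a), the candidate fixed point of Theorem 4."""
    return c * abs(s ** a - 1.0) ** (1.0 / a)


def apply_T_a(F, s: float, a: float, n_grid: int = 20001) -> float:
    """Approximate T_a(F)(s) for s >= 1 and F equal to its reverse entropy.

    Assumption of this sketch: T_1(F)(s) is the minimum over theta in [1, s]
    of F(theta) + theta * F(s / theta); the minimum is taken on a uniform grid.
    """
    thetas = (1.0 + (s - 1.0) * k / (n_grid - 1) for k in range(n_grid))
    t1 = min(F(th) + th * F(s / th) for th in thetas)
    return 2.0 ** (1.0 / a - 1.0) * t1


if __name__ == "__main__":
    a = 0.5
    F = lambda x: matusita_entropy(x, a)
    for s in (1.5, 3.0, 10.0):
        # The two printed values should agree up to the grid error.
        print(s, F(s), apply_T_a(F, s, a))
```

The grid search only bounds the minimum from above, so the computed value can exceed $F(s)$ by a small discretization error but never fall below it.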

Remark 2.

I do not know if the assumption that $H^{a}$ is a metric can be removed in order to obtain the same characterization as in Theorem 3. The difficulty is that the value of the function $s\mapsto 2^{\frac{1}{a}-1}H(r,s)+2^{\frac{1}{a}-1}H(s,t)$ at $s=r$ and $s=t$ is strictly greater than $H(r,t)$ , unless $a=1$ .

In order to obtain that also in the case $0<a<1$ the limit function is a fixed point of the map $T_{a}$ , I need the following Lemma:

Lemma 7.

Let $X$ be a compact space and let $f_{n}:X\rightarrow[0,+\infty]$ be a sequence of lower semicontinuous functions such that $f_{n}(x)\leq f_{n+1}(x)$ for every $n\in\mathbb{N}$ and every $x\in X$ . Then

$$\lim_{n\to\infty}\min_{x\in X}f_{n}(x)=\min_{x\in X}f_{\infty}(x),$$

where I put $f_{\infty}(x):=\lim_{n\to\infty}f_{n}(x).$

Proof.

The functions $f_{n}$ and $f_{\infty}$ are lower semicontinuous over a compact set, so they attain a minimum. Since $f_{n}(x)\leq f_{\infty}(x)$ for every $x\in X$ it is clear that

$$\lim_{n\to\infty}\min_{x\in X}f_{n}(x)\leq\min_{x\in X}f_{\infty}(x).$$

Let us suppose now $a<\min_{x\in X}f_{\infty}(x)$ , so that $a<f_{\infty}(x)$ for every $x\in X$ . Since $\lim_{n}f_{n}(x)=f_{\infty}(x)$ , there exists $n=n(x)$ such that $a<f_{n}(x)$ . Each set $\{a<f_{n}\}$ is open by lower semicontinuity, so the family $\{a<f_{n}\}_{n\in\mathbb{N}}$ is an open cover of $X$ . Let $n_{1},...,n_{j}$ be a finite collection of indices such that

$$X\subset\bigcup_{i=1}^{j}\{a<f_{n_{i}}\}.$$

Let $N:=\max\{n_{1},...,n_{j}\}$ , so that $X\subset\{a<f_{N}\}$ since the sequence $f_{n}$ is increasing. This implies that $a<f_{N}(x)$ for every $x\in X$ , so that $a<\min_{x\in X}f_{N}(x)\leq\lim_{n\to\infty}\min_{x\in X}f_{n}(x)$ . Since $a$ is an arbitrary number less than $\min_{x\in X}f_{\infty}(x)$ , the Lemma follows. ∎
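The role of compactness in Lemma 7 can be illustrated with a toy sequence. In the sketch below the functions $f_{n}(x)=1$ for $x<n$ and $f_{n}(x)=0$ for $x\geq n$ are a hypothetical example chosen for this note: they are lower semicontinuous, increasing in $n$ , and converge pointwise to the constant $1$ . On the compact interval $[0,5]$ the minima converge to $1$ , whereas on the half-line $[0,+\infty)$ every $f_{n}$ still takes the value $0$ .

```python
def f(n: int, x: float) -> float:
    """f_n(x) = 1 if x < n, else 0: lower semicontinuous and nondecreasing in n."""
    return 1.0 if x < n else 0.0


def min_on_interval(n: int, upper: float, n_grid: int = 1001) -> float:
    """Minimum of f_n over a uniform grid of the compact interval [0, upper]."""
    return min(f(n, upper * k / (n_grid - 1)) for k in range(n_grid))


def inf_on_half_line(n: int) -> float:
    """Infimum of f_n over [0, +inf): the value 0 is attained at x = n and f_n >= 0."""
    return f(n, float(n))


if __name__ == "__main__":
    for n in (1, 3, 6, 10):
        print(n, min_on_interval(n, 5.0), inf_on_half_line(n))
```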

I can now state the Theorem about the convergence of the iterations of the map $T_{a}$ .

Theorem 5.

Let $a\in(0,1)$ . Given a function $F\in\Gamma_{0}^{s}(\mathbb{R}_{+})$ , if $H^{a}$ is a metric then the sequence $\{T_{a}^{(n)}(F)\}$ converges pointwise to a fixed point of the map $T_{a}$ . In particular, if the limit function $F^{\infty}$ is such that $(H^{\infty})^{a}$ is a metric, then $F^{\infty}(s)=c|s^{a}-1|^{\frac{1}{a}}$ where $c\in(0,+\infty).$

Proof.

Lemma 4 implies that $T_{a}(F)\geq F$ . By the monotonicity property (63) the sequence $T_{a}^{(n)}(F)$ is increasing so it converges pointwise to a function $F^{\infty}:\mathbb{R}_{+}\rightarrow[0,\infty]$ . Since $H^{a}$ is a metric, $F$ is convex and finite everywhere (thus continuous), as well as $T_{a}^{(n)}(F)$ . I want to show that $F^{\infty}$ is a fixed point of $T_{a}$ :

[TABLE]

where I have denoted by $sc^{-}(f)$ the lower semicontinuous envelope of the function $f$ and I have used Lemma 7 applied to $f_{n}(\theta):=T_{a}^{(n)}(F)(\theta)+\theta T_{a}^{(n)}(F)(\frac{s}{\theta})$ and $X:=[1,s]$ . The conclusion follows from Theorem 4. ∎

Remark 3.

It is not difficult to show that $F^{\infty}$ can be equal to $I_{\{1\}}$ . For example, take $F(s)=|s-1|$ and consider the sequence $T_{a}^{(n)}(F)$ with $a\in(0,1).$
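The example of Remark 3 can be made explicit. Assuming, as in the proof of Theorem 5, that $T_{1}(F)(s)$ is the minimum over $\theta\in[1,s]$ of $F(\theta)+\theta F(\frac{s}{\theta})$ , the choice $F(s)=|s-1|$ gives $F(\theta)+\theta F(\frac{s}{\theta})=s-1$ for every $\theta\in[1,s]$ , so that $T_{a}^{(n)}(F)(s)=2^{n(\frac{1}{a}-1)}|s-1|$ , which diverges for $s\neq 1$ . The sketch below checks this growth numerically for $s\geq 1$ ; the helper names are hypothetical and the minimum is approximated on a coarse grid.

```python
def lift_T_a(F, a: float, n_grid: int = 41):
    """Return the function s -> T_a(F)(s), computed by grid search over theta in [1, s].

    Assumption of this sketch: F equals its reverse entropy, so that
    T_1(F)(s) = min over theta in [1, s] of F(theta) + theta * F(s / theta).
    """
    scale = 2.0 ** (1.0 / a - 1.0)

    def TF(s: float) -> float:
        thetas = (1.0 + (s - 1.0) * k / (n_grid - 1) for k in range(n_grid))
        return scale * min(F(th) + th * F(s / th) for th in thetas)

    return TF


def iterate_T_a(F, a: float, n: int):
    """n-fold composition T_a^(n)(F); the cost grows exponentially with n."""
    G = F
    for _ in range(n):
        G = lift_T_a(G, a)
    return G


if __name__ == "__main__":
    F = lambda s: abs(s - 1.0)
    a, s = 0.5, 3.0
    for n in range(4):
        # Compare the numerical iterate with 2^(n(1/a - 1)) * |s - 1|.
        print(n, iterate_T_a(F, a, n)(s), 2.0 ** (n * (1.0 / a - 1.0)) * abs(s - 1.0))
```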

[TABLE]

is increasing in $(1,+\infty)$ : this is obvious in the interval $(1,b]$ ; consider now two numbers $r,t$ such that $b<r<t$ . I define $s\mapsto l_{r}(s)$ to be the affine function that coincides with $\underline{F}$ at $b$ and such that $l_{r}(r)=c|r^{a}-1|^{\frac{1}{a}}$ , and I notice that the convexity of the function $s\mapsto c|s^{a}-1|^{\frac{1}{a}}$ implies that the slope of $l_{r}$ is greater than or equal to the positive slope of the function $\underline{F}$ in $(b,+\infty)$ . Using again the convexity of the function $c|s^{a}-1|^{\frac{1}{a}}$ and the trivial fact that the quotient

[TABLE]

is increasing in $(b,+\infty)$ , I conclude because

[TABLE]

∎

Theorem 6.

Let $F\in\Gamma_{0}^{s}(\mathbb{R}_{+})$ be a function such that

[TABLE]

Then

[TABLE]

Proof.

For every $\epsilon>0$ there exists $b>1$ such that

[TABLE]

so that

[TABLE]

where $\underline{F},\bar{F}$ are defined in Lemmas 8 and 9. Take now an arbitrary $s\in\mathbb{R}_{+}$ ; from the monotonicity property (63) it follows that

[TABLE]

so that by Lemma 8 and Lemma 9 one gets

[TABLE]

Since $\epsilon$ is arbitrary, the limit of $T_{a}^{(n)}(F)(s)$ exists and is equal to $c|s^{a}-1|^{\frac{1}{a}}$ .

∎

V Marginal perspective cost

V-A Marginal perspective function

In this section I introduce the marginal perspective cost. I will modify the definition of marginal perspective function that we have seen in section IV in order to take into account the presence of a cost function. The construction is motivated by the study of Optimal Entropy-Transport problems (see [15], section $5$ , and section VI of the present paper).

First of all, given a number $c\in[0,+\infty)$ and an admissible entropy function $F$ , the marginal perspective function $H_{c}:[0,\infty)\times[0,\infty)\rightarrow[0,\infty]$ is defined as the lower semicontinuous envelope of the function

[TABLE]

where $R$ is the reverse entropy function of $F$ . Of course, the function $H_{0}$ coincides with the marginal perspective function $H_{F}$ introduced in section IV. When the numbers $r_{1},r_{2}$ are positive, the function $\tilde{H}_{c}$ can be also computed as

[TABLE]

or in terms of the perspective function as

[TABLE]

For $c=+\infty$ I set

[TABLE]

The following lemma, proved in [15] (lemma $5.3$ ), gives a dual characterization of $H_{c}$ :

Lemma 10.

For every $c\geq 0$ the function $H_{c}$ can be represented as

[TABLE]

In particular, the marginal perspective function is lower semicontinuous, convex and positively $1$ -homogeneous with respect to $(r_{1},r_{2})$ , increasing and concave with respect to $c$ . Moreover, $H_{c}$ coincides with $\tilde{H}_{c}$ in the interior of its domain.
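The properties listed in Lemma 10 can be observed numerically in a concrete case. The sketch below rests on two assumptions that are not confirmed by the displayed formulas available here: that for positive $r_{1},r_{2}$ the function $\tilde{H}_{c}$ is the infimum over $\theta>0$ of $r_{1}F(\frac{\theta}{r_{1}})+r_{2}F(\frac{\theta}{r_{2}})+\theta c$ , and that the logarithmic entropy is $F(s)=s\log s-s+1$ . Under these assumptions a direct computation gives the closed form $r_{1}+r_{2}-2\sqrt{r_{1}r_{2}}\,e^{-c/2}$ , which the code compares with a numerical minimization. All function names are hypothetical.

```python
import math


def log_entropy(s: float) -> float:
    """Assumed logarithmic entropy F(s) = s log s - s + 1, extended by F(0) = 1."""
    return 1.0 if s == 0.0 else s * math.log(s) - s + 1.0


def marginal_perspective(r1: float, r2: float, c: float, F=log_entropy, iters: int = 200) -> float:
    """Approximate inf over theta > 0 of r1 F(theta/r1) + r2 F(theta/r2) + theta c, for r1, r2 > 0.

    The objective is convex in theta and increasing for theta >= max(r1, r2),
    so a ternary search on [0, max(r1, r2)] is enough.
    """
    def objective(theta: float) -> float:
        return r1 * F(theta / r1) + r2 * F(theta / r2) + theta * c

    lo, hi = 0.0, max(r1, r2)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi))


def closed_form(r1: float, r2: float, c: float) -> float:
    """Closed form for the logarithmic entropy under the assumptions of this sketch."""
    return r1 + r2 - 2.0 * math.sqrt(r1 * r2) * math.exp(-c / 2.0)


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0, 2.0):
        print(c, marginal_perspective(2.0, 3.0, c), closed_form(2.0, 3.0, c))
```

The printed values should increase with $c$ , in agreement with the monotonicity stated in Lemma 10, and scaling both $r_{1}$ and $r_{2}$ by the same factor should scale the result by that factor.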

V-B Induced marginal perspective cost

When $c=c(x_{1},x_{2})$ is a function $c:X_{1}\times X_{2}\rightarrow[0,+\infty]$ , the induced marginal perspective cost is the function $H:X_{1}\times[0,+\infty)\times X_{2}\times[0,+\infty)\rightarrow[0,+\infty]$ defined as

[TABLE]

A particularly important case is when $X_{1}=X_{2}=X$ and $c$ is induced by a metric $d$ on $X$ .

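Given a metric space $(X,d)$ , I am interested in determining when the function $H$ is the power of a metric on the corresponding cone space. The latter is the space $\mathfrak{C}=Y/{\sim}$ , where $Y=X\times[0,+\infty)$ and

$$(x_{1},r_{1})\sim(x_{2},r_{2})\iff r_{1}=r_{2}=0\ \text{ or }\ \big(r_{1}=r_{2},\ x_{1}=x_{2}\big).$$

It is important to highlight that the space $\mathfrak{C}$ can be endowed with a "natural" metric $d_{\mathfrak{C}}$ (see [20], Prop. $3.6.13$ ):

$$d_{\mathfrak{C}}^{2}\big((x_{1},r_{1}),(x_{2},r_{2})\big):=r_{1}^{2}+r_{2}^{2}-2r_{1}r_{2}\cos\big(d(x_{1},x_{2})\wedge\pi\big).\tag{87}$$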
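The metric character of the cone distance can be probed by random sampling. The sketch below assumes the standard cone formula $d_{\mathfrak{C}}^{2}=r_{1}^{2}+r_{2}^{2}-2r_{1}r_{2}\cos(d(x_{1},x_{2})\wedge\pi)$ and takes, as a hypothetical test case, $X=\mathbb{R}$ with $d(x,y)=|x-y|$ . It reports the largest violation of the triangle inequality found among random triples, which should be non-positive up to rounding.

```python
import math
import random


def cone_dist(x1: float, r1: float, x2: float, r2: float) -> float:
    """Cone distance over the real line with d(x, y) = |x - y| (assumed standard formula)."""
    angle = min(abs(x1 - x2), math.pi)
    sq = r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(angle)
    return math.sqrt(max(sq, 0.0))  # guard against tiny negative rounding errors


def max_triangle_violation(n_samples: int = 2000, seed: int = 0) -> float:
    """Largest value of d(p1, p3) - d(p1, p2) - d(p2, p3) over random triples."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(n_samples):
        pts = [(rng.uniform(0.0, 5.0), rng.uniform(0.0, 3.0)) for _ in range(3)]
        (x1, r1), (x2, r2), (x3, r3) = pts
        gap = cone_dist(x1, r1, x3, r3) - cone_dist(x1, r1, x2, r2) - cone_dist(x2, r2, x3, r3)
        worst = max(worst, gap)
    return worst


if __name__ == "__main__":
    print(max_triangle_violation())
```

A random search of this kind cannot prove the triangle inequality; it only fails to find a counterexample.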

Theorem 7.

Let $F(s)$ be an admissible entropy function with a strict minimum at $s=1$ and let $c$ be a symmetric function such that $c(x_{1},x_{2})=0$ if and only if $x_{1}=x_{2}$ . Then the induced marginal perspective cost $H$ is symmetric, non-negative and $H(x_{1},r_{1};x_{2},r_{2})=0$ if and only if $(x_{1},r_{1})\sim(x_{2},r_{2})$ . In particular, $H$ is a well defined function on the cone $\mathfrak{C}$ .

Proof.

Since $0\in\mathrm{D}(R^{*})$ and $R^{*}(0)=0$ it is clear that $H\geq 0$ . Moreover, when $r_{1}=r_{2}=0$ it follows from the dual representation (84) that $H(x_{1},r_{1};x_{2},r_{2})=0$ . If $(x_{1},r_{1})\sim(x_{2},r_{2})$ and $r_{1}=r_{2}>0$ then $c(x_{1},x_{2})=0$ and the fact that the marginal perspective cost is null follows from the possible choice $\theta=r_{1}$ in the expression (79). Since $c$ is symmetric it is clear that

[TABLE]

It remains to prove that $H=0$ implies $(x_{1},r_{1})\sim(x_{2},r_{2})$ . Lemma 1 and equation (25) tell us that $R^{*}$ is an increasing homeomorphism between $(-\mathrm{aff}F_{\infty},F(0))$ and $(-F^{\prime}_{\infty},-F^{\prime}_{0})$ with $R^{*}(0)=0$ . Since $F$ is a convex function with a strict minimum at $s=1$ , it holds that $\mathrm{aff}F_{\infty}>0,\ F(0)>0,\ F^{\prime}_{\infty}>0,F^{\prime}_{0}<0$ . In particular, there exists a positive number $k>0$ such that the function $R^{*}$ is finite, continuous and strictly increasing in $(-k,k)$ . Hence, it follows again from the representation (84) that $H(x_{1},r_{1};x_{2},r_{2})=0$ and $c(x_{1},x_{2})>0$ imply $r_{1}=r_{2}=0$ . Moreover, when $c(x_{1},x_{2})=0$ we must have $r_{1}=r_{2}$ . Suppose first by contradiction that $0=r_{1}<r_{2}$ (the other case is similar): in the equation (84) we can find $-k<\psi_{1}<0<\psi_{2}<k$ such that $R^{*}(\psi_{1})+R^{*}(\psi_{2})\leq 0$ , contradicting the fact that $H=0$ . Finally, when $H(x_{1},r_{1};x_{2},r_{2})=0$ , $c(x_{1},x_{2})=0$ and $r_{1},r_{2}$ are positive, I can prove that $\tilde{H}_{0}=0$ implies $r_{1}=r_{2}$ : using the expression (80), for every natural number $n$ there exists $\theta_{n}$ such that

[TABLE]

In particular, for $n$ large enough, $\theta_{n}\in[K_{1},K_{2}]$ for some constants $0<K_{1}<1<K_{2}$ , and by extracting a subsequence $\theta_{n_{j}}$ it follows that $\theta_{n_{j}}\rightarrow\bar{\theta}$ . The lower semicontinuity of $F$ forces $\frac{\bar{\theta}}{r_{1}}=\frac{\bar{\theta}}{r_{2}}=1$ so that $r_{1}=r_{2}$ . ∎

If the function $F$ does not have a strict minimum at $s=1$ , the induced marginal perspective cost can be null even if $r_{1}\neq r_{2}$ . To see this, take $F:[0,+\infty)\rightarrow[0,+\infty)$ defined by

[TABLE]

that gives $H_{0}\equiv 0$ , so that $H(x_{1},r_{1};x_{2},r_{2})\equiv 0$ .

VI Entropy-Transport problem

In this section I consider two discrete spaces $X_{1}=\{x_{1}^{1},x_{1}^{2},\ldots,x_{1}^{m}\}$ and $X_{2}=\{x_{2}^{1},x_{2}^{2},\ldots,x_{2}^{n}\}$ and I let $c:X_{1}\times X_{2}\rightarrow[0,+\infty]$ be a proper (i.e. not identically $+\infty$ ) cost function that I will denote by $c_{i,j}:=c(x_{1}^{i},x_{2}^{j}).$ I will also denote by $\mathcal{M}(X_{i})$ the set of finite, nonnegative measures on $X_{i},\ i=1,2$ (I refer to [15] for a more general topological setting).

Given two finite measures $\mu_{i}\in\mathcal{M}(X_{i}),$ which can be identified with vectors $(r_{1},...,r_{m})\in\mathbb{R}_{+}^{m},(t_{1},...,t_{n})\in\mathbb{R}_{+}^{n}$ by

$$\mu_{1}=\sum_{i=1}^{m}r_{i}\delta_{x_{1}^{i}},\qquad\mu_{2}=\sum_{j=1}^{n}t_{j}\delta_{x_{2}^{j}},$$

the classical Optimal-Transport problem between $\mu_{1}$ and $\mu_{2}$ is defined as the minimization of the functional

[TABLE]

with respect to any positive measure $\boldsymbol{\gamma}\in\mathcal{M}(X_{1}\times X_{2}),$ $\boldsymbol{\gamma}=\sum_{i,j}\gamma_{i,j}\delta_{(x_{1}^{i},x_{2}^{j})},$ that satisfies the marginal constraints

$$\sum_{j=1}^{n}\gamma_{i,j}=r_{i}\quad(i=1,\ldots,m),\qquad\sum_{i=1}^{m}\gamma_{i,j}=t_{j}\quad(j=1,\ldots,n),\tag{91}$$

a condition that forces the measures $\mu_{1},\mu_{2}$ to have equal mass, i.e. $\sum_{i}r_{i}=\sum_{j}t_{j}.$
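The equal-mass condition is easy to see on a small discrete plan. In the sketch below the plan `gamma` and the cost matrix are hypothetical numbers chosen for illustration; the row sums play the role of $(r_{1},...,r_{m})$ , the column sums the role of $(t_{1},...,t_{n})$ , and both necessarily add up to the total mass of the plan. The transport cost is computed as $\sum_{i,j}c_{i,j}\gamma_{i,j}$ .

```python
from typing import List, Tuple

Matrix = List[List[float]]


def marginals(gamma: Matrix) -> Tuple[List[float], List[float]]:
    """Row sums (first marginal) and column sums (second marginal) of a discrete plan."""
    rows = [sum(row) for row in gamma]
    cols = [sum(col) for col in zip(*gamma)]
    return rows, cols


def transport_cost(cost: Matrix, gamma: Matrix) -> float:
    """Linear transport cost: sum over i, j of cost[i][j] * gamma[i][j]."""
    return sum(c * g for c_row, g_row in zip(cost, gamma) for c, g in zip(c_row, g_row))


if __name__ == "__main__":
    # Hypothetical plan on m = 3 source points and n = 2 target points.
    gamma = [[0.5, 0.25], [0.0, 0.25], [0.5, 0.5]]
    cost = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
    r, t = marginals(gamma)
    print(r, t, sum(r), sum(t), transport_cost(cost, gamma))
```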

Optimal Entropy-Transport problems arise naturally when one tries to relax the request on the marginals (91). Given a superlinear entropy function $F$ , the Optimal Entropy-Transport problem between $\mu_{1}$ and $\mu_{2}$ is defined as the minimization of the functional

[TABLE]

with respect to any positive measure $\boldsymbol{\gamma}\in\mathcal{M}(X_{1}\times X_{2}),$ $\boldsymbol{\gamma}=\sum_{i,j}\gamma_{i,j}\delta_{(x_{1}^{i},x_{2}^{j})}.$

I notice that the presence of the admissible entropy function $F$ in the cost functional $\mathcal{E}$ penalizes the measures $\boldsymbol{\gamma}$ that do not satisfy the constraints (91) (at least when $F$ has a strict minimum at $1$ ), and it allows one to minimize with respect to any measure $\boldsymbol{\gamma}\in\mathcal{M}(X_{1}\times X_{2}).$

Given a measure $\boldsymbol{\gamma}\in\mathcal{M}(X_{1}\times X_{2})$ such that

[TABLE]

I call marginal perspective cost functional $\mathcal{H}(\mu_{1},\mu_{2}|\boldsymbol{\gamma})$ the quantity

[TABLE]

An important result (Theorem $5.5$ , [15]) tells us that

[TABLE]

The advantages of the $\mathcal{H}$ -formulation of the problem are based on the homogeneity of the marginal perspective cost, which allows another useful formulation of the problem on the cone space, and the intrinsic metric properties of the function $H$ (see [15] for the special case of the Hellinger-Kantorovich distance and the rest of the present paper for other examples).

It is interesting to notice that one can recover the usual pure entropy problem in the case

[TABLE]

In this case, it is not difficult to show (example E. $5$ , [15]) that, given two measures

$$\mu_{1}=\sum_{i=1}^{m}r_{i}\delta_{x_{i}},\qquad\mu_{2}=\sum_{i=1}^{m}t_{i}\delta_{x_{i}},$$

it holds

[TABLE]

where $\gamma_{i}>0,\ i=1,...,m,$ and $f(s)=H_{0}(s,1).$

VII Triangle inequality in the Entropy-Transport case

In this section I deal with the case $X_{1}=X_{2}=X$ , $F=U_{p}$ and $c(x_{1},x_{2})=d^{2}(x_{1},x_{2})$ , where $d:X\times X\rightarrow[0,\infty)$ is a metric on the space $X$ . I denote by $H_{p}$ the induced marginal perspective cost. In the case $p\neq 0,1$ it holds:

[TABLE]

When $p=1$ or $p=0$ one gets:

[TABLE]

From the previous section, taking $X=\{x\}$ , we already know that $H_{p}$ cannot be the square of a metric if $p\in(\frac{1}{2},1)$ . I am going to prove that even for the case $p\leq\frac{1}{2}$ the triangle inequality fails, i.e.

[TABLE]

for given values of $r,s,t,d(x_{1},x_{2}),d(x_{2},x_{3}),d(x_{1},x_{3})$ .

If $0<p\leq\frac{1}{2}$ I choose $r=s=0,t>0$ so that

[TABLE]

The triangle inequality is clearly not satisfied when

[TABLE]

When $p=0$ , I choose again $r=s=0,t>0$ so that

[TABLE]

Once again, the triangle inequality fails for

[TABLE]

If $p<0$ , I choose instead $0<r<s<t$ and

[TABLE]

so that

[TABLE]

It is not difficult to see that the triangle inequality fails when $d(x_{1},x_{3})$ is sufficiently large, because $\mathfrak{M}_{1-p}(r,s)<\mathfrak{M}_{1-p}(r,t)$ and

[TABLE]

when $d(x_{1},x_{3})\rightarrow+\infty$ .

Let us now move to the case $p\geq 1.$

Theorem 8.

Let us suppose $X_{1}=X_{2}=X$ and $c=d^{2}$ for a metric $d$ on $X$ . Then $\sqrt{H_{p}}$ is a metric on the cone $\mathfrak{C}$ for every $p\geq 1$ .

Proof.

The proof is long, so I have divided it into several steps:

Step 1. $\mathit{The\ only\ problem\ is\ the\ triangle\ inequality.}$

It is clear that $H_{p}$ is finite and I can apply Theorem 7 so that it remains to prove that the square root of $H_{p}$ satisfies the triangle inequality.

Step 2. $\mathit{Change\ of\ the\ space\ part\ and\ case\ p=1.}$

I use now Lemma 2 in order to change the expression of the function $H_{p}$ in a more familiar one.

Proposition 1.

$H_{p}$ is the square of a metric on the cone if $\bar{H}_{p}$ is the square of a metric on the cone for every metric $d$ on $X$ , where I put

[TABLE]

Proof.

In order to apply Lemma 2, in the case $p>1$ I define $f_{p}:[0,+\infty)\rightarrow[0,\frac{\pi}{2}]$ ,

[TABLE]

Thus, I have to show that $f_{p}$ is a concave function and $f_{p}(d)=0$ if and only if $d=0$ . The second statement is obvious; for the first one I notice that it is enough to prove that the function is concave when $d\in\big(0,\sqrt{\frac{2}{p-1}}\big)$ . Let us compute the second derivative: I put

[TABLE]

so that

[TABLE]

Thus

[TABLE]

Recalling that $d\in\big(0,\sqrt{\frac{2}{p-1}}\big)$ and $g_{p}(d)\in(0,1)$ , the function $f_{p}$ is concave if and only if

[TABLE]

Since $\frac{p}{p-1}>1$ it holds $\Big(1-(p-1)\frac{d^{2}}{2}\Big)^{\frac{p}{p-1}}\geq 1-p\frac{d^{2}}{2}$ by the Bernoulli inequality, so that

[TABLE]

and (115) follows. In the case $p=1$ I have to check that $f_{1}:[0,+\infty)\rightarrow[0,\frac{\pi}{2})$ defined by

[TABLE]

is concave and $f_{1}(d)=0$ if and only if $d=0$ , which is trivial.

∎
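The Bernoulli step used in the proof of Proposition 1, namely $(1-(p-1)\frac{d^{2}}{2})^{\frac{p}{p-1}}\geq 1-p\frac{d^{2}}{2}$ for $d\in(0,\sqrt{\frac{2}{p-1}})$ , can be checked on a grid. The sketch below is only a numerical illustration with hypothetical helper names; it evaluates the difference of the two sides and reports its minimum.

```python
import math


def bernoulli_gap(p: float, d: float) -> float:
    """Left side minus right side of the Bernoulli step, for p > 1 and 0 < d < sqrt(2/(p-1))."""
    x = (p - 1.0) * d * d / 2.0  # lies in (0, 1) on the admissible range of d
    return (1.0 - x) ** (p / (p - 1.0)) - (1.0 - p * d * d / 2.0)


def min_gap(p: float, n_grid: int = 1000) -> float:
    """Minimum of the gap over an interior grid of (0, sqrt(2/(p-1)))."""
    d_max = math.sqrt(2.0 / (p - 1.0))
    return min(bernoulli_gap(p, d_max * k / n_grid) for k in range(1, n_grid))


if __name__ == "__main__":
    for p in (1.1, 1.5, 2.0, 5.0):
        print(p, min_gap(p))
```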

It is now clear that $\bar{H}_{1}$ is the square of a metric on the cone space, because (87) is the square of a metric on the cone space and $d\wedge\frac{\pi}{2}=(d\wedge\frac{\pi}{2})\wedge\pi$ is a metric if $d$ is a metric.

Step 3. $\mathit{Triangle\ inequality\ for\ large\ values\ of\ d.}$

From now on, I suppose $p>1$ and I have to show that

[TABLE]

is the square of a metric for any metric $d$ on $X$ .

Lemma 11.

The function

[TABLE]

is increasing in $[0,\infty)$ for $p>1$ .

Proof.

Just notice that

[TABLE]

is decreasing in $[0,\infty)$ . The result follows easily. ∎

In view of the Lemma 11, from now on I also assume

[TABLE]

and I have to prove that for every $p>1$ , for every metric $d$ on $X$ and for every $r,s,t\in[0,+\infty),x_{1},x_{2},x_{3}\in X$ the following triangle inequality holds:

[TABLE]

I start with the case $d(x_{1},x_{2})\geq\frac{\pi}{2}$ and $d(x_{2},x_{3})\geq\frac{\pi}{2}$ . Then

[TABLE]

and the triangle inequality (119) follows easily.

In the case $d(x_{1},x_{2})\leq\frac{\pi}{2}$ and $d(x_{2},x_{3})\geq\frac{\pi}{2}$ it holds

[TABLE]

In view of the Lemma 11 the worst case is when $d(x_{1},x_{2})=0$ , so that it is sufficient to prove

[TABLE]

Using now Lemma 1, the right-hand side of the previous inequality is not lower than

[TABLE]

hence I have to prove that

[TABLE]

which is obvious in the case $r\leq s$ , on the other hand if $r>s$ one gets

[TABLE]

and, taking the square of both sides, the inequality to be proved follows trivially.

Now I suppose $d(x_{1},x_{3})\geq\frac{\pi}{2}$ , $d(x_{1},x_{2})<\frac{\pi}{2}$ and $d(x_{2},x_{3})<\frac{\pi}{2}$ . Then

[TABLE]

By the same reasoning as before, it is sufficient to show the inequality

[TABLE]

that follows from the triangle inequality for the cone distance $d_{\mathfrak{C}}$ , since

[TABLE]

if $\frac{\pi}{2}\leq d(x_{1},x_{3})<\pi$ .

Step 4. $\mathit{Triangle\ inequality\ with\ }d<\frac{\pi}{2}\ \mathit{and}\ t\leq s$

Thus, I can assume

[TABLE]

Without loss of generality, I can also assume $r<t$ in the inequality (119), so that I have to deal with three cases: $s\leq r$ , $r<s<t$ , $t\leq s$ . In this step of the proof, I start with the latter case:

Lemma 12.

For any fixed $r,t,x_{1},x_{2},x_{3}$ , the function

[TABLE]

is increasing in $[t,+\infty)$ .

Proof.

The result follows if I prove that for any fixed $x_{1},x_{2}$ the function

[TABLE]

is increasing in $[1,+\infty)$ . This easily follows since

[TABLE]

where the last inequality holds since it is equivalent to the following

[TABLE]

∎

Thus, it is sufficient to show the case $s<t$ .

Step 5. $\mathit{Case\ }r<s<t$

I start with a useful lemma:

Lemma 13.

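Let $A,B,C$ be three non-negative numbers. Then

$$\sqrt{C}\leq\sqrt{A}+\sqrt{B}$$

if and only if for every $\alpha,\beta\in(0,1)$ such that $\alpha+\beta=1$ we have

$$C\leq\frac{A}{\alpha}+\frac{B}{\beta}.$$

Proof.

Let us suppose that the first inequality holds. Then

$$C\leq\big(\sqrt{A}+\sqrt{B}\big)^{2}=\Big(\alpha\frac{\sqrt{A}}{\alpha}+\beta\frac{\sqrt{B}}{\beta}\Big)^{2}\leq\alpha\frac{A}{\alpha^{2}}+\beta\frac{B}{\beta^{2}}=\frac{A}{\alpha}+\frac{B}{\beta},$$

where I have used the Jensen inequality for the convex function $f(x)=x^{2}$ . In order to show that the second inequality implies the first one, I notice that if $A=0$ or $B=0$ the result is clearly true; otherwise I choose $\alpha,\beta$ such that $\frac{\sqrt{A}}{\alpha}=\frac{\sqrt{B}}{\beta}$ . Thus

$$C\leq\frac{A}{\alpha}+\frac{B}{\beta}=\big(\sqrt{A}+\sqrt{B}\big)^{2}.$$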

∎
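The mechanism behind Lemma 13 can be checked numerically. The sketch below reads the two conditions of the lemma as $\sqrt{C}\leq\sqrt{A}+\sqrt{B}$ and $C\leq\frac{A}{\alpha}+\frac{B}{\beta}$ with $\alpha+\beta=1$ (this reading is an assumption of the sketch), and verifies that $\frac{A}{\alpha}+\frac{B}{\beta}$ is never smaller than $(\sqrt{A}+\sqrt{B})^{2}$ , with equality for the weight $\alpha=\frac{\sqrt{A}}{\sqrt{A}+\sqrt{B}}$ chosen in the proof. Function names are hypothetical.

```python
import math


def weighted_bound(A: float, B: float, alpha: float) -> float:
    """A / alpha + B / beta with beta = 1 - alpha and alpha in (0, 1)."""
    return A / alpha + B / (1.0 - alpha)


def optimal_alpha(A: float, B: float) -> float:
    """Weight with sqrt(A) / alpha = sqrt(B) / beta, for A, B > 0."""
    return math.sqrt(A) / (math.sqrt(A) + math.sqrt(B))


def squared_sum_of_roots(A: float, B: float) -> float:
    """(sqrt(A) + sqrt(B))^2, the sharp upper bound for C."""
    return (math.sqrt(A) + math.sqrt(B)) ** 2


if __name__ == "__main__":
    A, B = 2.0, 7.0
    print(squared_sum_of_roots(A, B), weighted_bound(A, B, optimal_alpha(A, B)))
    print(min(weighted_bound(A, B, k / 100.0) for k in range(1, 100)))
```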

In order to simplify the notation, from now on I put $d(x_{1},x_{3})=d_{13},\ d(x_{1},x_{2})=d_{12},\ d(x_{2},x_{3})=d_{23}$ . Then, I can use Lemma 13 and the triangle inequality in the case $p=1$ in order to derive a new inequality. Given $\alpha,\beta\in(0,1)$ such that $\alpha+\beta=1$ , one gets:

[TABLE]

where the last inequality in (127) is valid if and only if (using again Lemma 13):

[TABLE]

I notice that $\cos(d_{13})\leq\cos(d_{12})\wedge\cos(d_{23})$ . Thus, it is enough to prove the last inequality in the case $d_{13}=d_{12}=d_{23}=0$ . Now, I adapt the strategy used in the proof of [11, Lemma 2]: I put $u:=\frac{r}{s}\in(0,1)$ , $\beta u:=\frac{t}{s}\in(1,+\infty)$ , so that $\beta$ is a real number greater than $1$ . Thus, $\frac{1}{\beta}<u<1$ and, denoting by $F(s)$ the function

[TABLE]

it follows

[TABLE]

where

[TABLE]

Lemma 14.

The function

[TABLE]

is increasing in $(\frac{1}{\beta},1)$ with only one zero inside the interval, so that $F$ is minimized when $s=r$ or $s=t$ and the inequality required in the case $r<s<t$ holds.

Proof.

Since $g_{p}$ is continuous in $(0,1)$ and $(1,+\infty)$ , it is enough to show that $g_{p}$ is increasing in $(0,1)$ and $(1,+\infty)$ , and

[TABLE]

The limits are easy to compute expanding the function near $u=1$ . When $u\in(0,1)\cup(1,+\infty)$ it follows:

[TABLE]

The proof is complete if I show that

[TABLE]

for any $p>1$ and any positive $u$ .

I put $v=\frac{u^{1-p}+1}{2}$ , so that I have to prove

[TABLE]

for any $p>1$ and $v\in(\frac{1}{2},+\infty)$ . Finally I put $w=\Big(\frac{2v-1}{v^{2}}\Big)^{\frac{1}{p-1}}\in(0,1)$ and I prove that

[TABLE]

for any $p>1$ and $w\in(0,1)$ . To prove the last inequality, I notice that $h(1)=0$ and $h$ is a decreasing function because

[TABLE]

∎

Step 6. $\mathit{Case\ }s\leq r$

The strategy is to use again Lemma 13 and the triangle inequality for the case $p=1$ , but I have to derive a different inequality with respect to the previous step.

Lemma 15.

I denote with $\theta_{p}:[0,+\infty)\times[0,+\infty)\rightarrow[0,+\infty)$ the function

[TABLE]

Then $\theta_{p}(s,t)\leq\theta_{p}(r,t).$

Proof.

It is sufficient to prove that $\theta_{p}(u,1)$ is increasing in $(0,1)$ . This is easy to prove, indeed

[TABLE]

∎

Let $\alpha,\beta$ be any two numbers in $(0,1)$ such that $\alpha+\beta=1$ . Let us suppose, at first, $\theta_{p}(s,r)\leq\theta_{p}(r,t)$ . Then

[TABLE]

where the first inequality in (133) follows by the triangle inequality for $\sqrt{\bar{H}_{1}}$ and $\sqrt{\mathfrak{M}_{1}}$ , while the second inequality follows since $\theta_{p}(s,r)\leq\theta_{p}(r,t)$ , $\theta_{p}(s,t)\leq\theta_{p}(r,t)$ and $\bar{H}_{1}\leq\mathfrak{M}_{1}$ .

It remains to investigate the case $\theta_{p}(s,r)>\theta_{p}(r,t)$ . Let us suppose

[TABLE]

Then

[TABLE]

where in the first inequality I use the inequality supposed above, in the second I use the hypothesis $\theta_{p}(s,r)>\theta_{p}(r,t)$ , and in the third I reason as in the second step of the inequality (133) in order to replace $\theta_{p}(r,t)$ with $\theta_{p}(s,t)$ .

Finally, the proof is complete if I prove the inequality supposed above. Since the case $r=s$ is trivial, I put $u:=\frac{s}{r}<1$ , $v:=\frac{t}{r}>1$ , so that I can rewrite that inequality in the following equivalent way

[TABLE]

Now I use the estimate

[TABLE]

so that it is sufficient to prove that for any $u\in(0,1)$ and any $v\in(1,+\infty)$

[TABLE]

It is easy to see that the last inequality is true at least if $p\geq\frac{3}{2}$ . For example, one can bound the left hand side with

[TABLE]

and the right hand side with

[TABLE]

Then, standard computations show that:

[TABLE]

If $1<p<\frac{3}{2}$ one needs precise bounds that I have found in [21]. The supremum of the left-hand side of the same inequality in $u$ and $v$ is $\frac{4}{p-1}$ . For its right-hand side one has:

[TABLE]

and again using the results in [21] it is proved that the sharp lower bound for the last expression is $\frac{4}{p-1}$ .

∎
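Theorem 8 can be probed numerically in the case $p=2$ . The sketch below relies on assumptions that are not confirmed by the displayed formulas available here: the entropy is taken to be $U_{2}(s)=\frac{(s-1)^{2}}{2}$ (the power-like entropy of the abstract with $p=2$ ), and $H_{2}$ is taken to be the infimum over $\theta>0$ of $rU_{2}(\frac{\theta}{r})+tU_{2}(\frac{\theta}{t})+\theta d^{2}$ , which a direct computation turns into $\frac{1}{2}\big[r+t-\frac{rt\,(2-d^{2})_{+}^{2}}{r+t}\big]$ . The test space is the real line with $d(x,y)=|x-y|$ , and all names are hypothetical.

```python
import math
import random


def H2(r: float, t: float, d: float) -> float:
    """Closed form of H_p for p = 2 and cost d^2, under the assumptions stated above."""
    if r + t == 0.0:
        return 0.0
    gain = max(2.0 - d * d, 0.0) ** 2
    return max(0.5 * (r + t - r * t * gain / (r + t)), 0.0)


def H2_numeric(r: float, t: float, d: float, iters: int = 200) -> float:
    """Same quantity for r, t > 0 by ternary search on the convex objective in theta."""
    def objective(theta: float) -> float:
        return (theta - r) ** 2 / (2.0 * r) + (theta - t) ** 2 / (2.0 * t) + theta * d * d

    lo, hi = 0.0, max(r, t)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi))


def max_triangle_violation(n_samples: int = 5000, seed: int = 0) -> float:
    """Largest value of sqrt(H13) - sqrt(H12) - sqrt(H23) over random triples of cone points."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(n_samples):
        x1, x2, x3 = (rng.uniform(0.0, 3.0) for _ in range(3))
        r, s, t = (rng.uniform(0.0, 2.0) for _ in range(3))
        lhs = math.sqrt(H2(r, t, abs(x1 - x3)))
        rhs = math.sqrt(H2(r, s, abs(x1 - x2))) + math.sqrt(H2(s, t, abs(x2 - x3)))
        worst = max(worst, lhs - rhs)
    return worst


if __name__ == "__main__":
    print(H2(1.0, 2.0, 0.5), H2_numeric(1.0, 2.0, 0.5))
    print(max_triangle_violation())
```

A non-positive maximal violation is consistent with Theorem 8 but does not replace the proof, since only finitely many configurations on one particular metric space are sampled.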

Acknowledgment. The author thanks Prof. Giuseppe Savaré for many valuable suggestions.

Bibliography

[1] I. Csiszar, “Eine informationstheoretische ungleichung und ihre anwendung auf den beweis der ergodizitat von markoffschen ketten,” Magyar. Tud. Akad. Mat. Kutato Int. Kozl, vol. 8, pp. 85–108, 1963.
[2] S. Ali and S. Silvey, “A general class of coefficients of divergence of one distribution from another,” J. Roy. Stat. Soc. Ser. B, vol. 28, pp. 131–142, 1966.
[3] F. Liese and I. Vajda, “On divergences and informations in statistics and information theory,” IEEE Transactions on Information Theory, vol. 52, pp. 4394–4412, 2006.
[4] I. Vajda, Theory of statistical inference and information. Springer Netherlands, 1989.
[5] ——, “$\chi^{\alpha}$-divergence and generalized fisher’s information,” in Transactions of the Sixth Prague Conference on Information Theory, Statistical Decision Function, Random Processes, 1973, pp. 873–886.
[6] K. Matusita, “Distances and decision rules,” Annals of the Institute of Statistical Mathematics, vol. 16, pp. 305–320, 1964.
[7] E. Hellinger, “Neue begründung der theorie quadratischer formen von unendlichvielen veränderlichen,” J. Reine Angew. Math, vol. 136, 1909.
[8] J. Lin, “Divergence measures based on the shannon entropy,” IEEE Transactions on Information Theory, vol. 37, pp. 145–151, 1991.
