Metric properties of homogeneous and spatially inhomogeneous F-divergences
Nicolò De Ponti

N. De Ponti is with the Department of Mathematics, University of Pavia, Pavia 27100, Italy (e-mail: [email protected])
Abstract
In this paper I investigate the construction and the properties of the so-called marginal perspective cost $H$, a function related to Optimal Entropy-Transport problems obtained by a minimizing procedure involving a cost function $c$ and an entropy function. In the pure entropic case, which corresponds to the choice $c=0$, the function $H$ naturally produces a symmetric divergence. I consider various examples of entropies and I compute the induced marginal perspective function, which includes some well-known functionals like the Hellinger distance, the Jensen-Shannon divergence and the Kullback-Leibler divergence. I discuss the metric properties of these functions and I highlight the important role of the so-called Matusita divergences. In the entropy-transport case, starting from the power like entropy $F_p$ and the cost $c=d^2$ for a given metric $d$, the main result of the paper ensures that for every $p>1$ the induced marginal perspective cost $H_p$ is the square of a metric on the corresponding cone space.
Index Terms:
$F$-divergence, induced marginal perspective cost, Optimal Transport, Optimal Entropy-Transport, triangle inequality, power like entropies, Matusita divergences, Kullback-Leibler divergence, Hellinger distance, total variation.
I Introduction
Given a function , a finite set , and two probability densities
[TABLE]
such that when for every the -divergence of from is defined as
[TABLE]
where $\hat{F}(r,t):=F\big(\frac{r}{t}\big)\,t$ is the perspective function induced by $F$ (here I am using the convention $F\big(\frac{0}{0}\big)\,0=0$).
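The definition above can be tried out on small vectors. The following is a minimal sketch, not the paper's own code: the function names, the two sample entropies ($|s-1|$ and $s\log s-s+1$) and the test vectors are illustrative assumptions, and the case $t=0<r$, which needs the recession constant of $F$, is deliberately left out.

```python
import math

def perspective(F, r, t):
    """Perspective function F_hat(r, t) = F(r / t) * t, with F_hat(0, 0) = 0."""
    if t == 0.0:
        if r == 0.0:
            return 0.0
        # t = 0 < r needs the recession constant of F; outside this sketch.
        raise ValueError("t = 0 with r > 0 is not handled in this sketch")
    return F(r / t) * t

def f_divergence(F, mu, nu):
    """Sum of F_hat(mu_i, nu_i) over the points of the finite set."""
    return sum(perspective(F, m, n) for m, n in zip(mu, nu))

def total_variation_entropy(s):
    return abs(s - 1.0)

def kl_entropy(s):
    # s log s - s + 1, with the usual convention 0 log 0 = 0
    return (s * math.log(s) if s > 0.0 else 0.0) - s + 1.0

# Illustrative probability vectors (hypothetical data).
mu = [0.5, 0.5, 0.0]
nu = [0.25, 0.5, 0.25]
print(f_divergence(total_variation_entropy, mu, nu))
print(f_divergence(kl_entropy, mu, nu))
```

For probability vectors the second value coincides with the classical sum of $\mu_i\log(\mu_i/\nu_i)$, because the linear terms of the entropy cancel.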
Since their introduction by Csiszár [1], Ali and Silvey [2], $F$-divergences have become a fundamental tool in information theory and statistics. They can be interpreted as a sort of "distance function" on the set of probability distributions, even if they do not generally fulfill the symmetry property and the triangle inequality. I refer to Liese and Vajda [3], [4], and references therein for a systematic presentation of these functionals, including the total variation (for $F(s)=|s-1|$), and the $\chi^{\alpha}$ divergences generated by the choice $F(s)=|s-1|^{\alpha}$ (discussed by Vajda in [5]). Another important class of divergences is represented by the so-called Matusita divergences [6], which include as a particular case the well known Hellinger distance [7].
Starting from an $F$-divergence, there is a simple variational way to generate a new symmetric divergence by setting
[TABLE]
This is related to the marginal perspective function $H$, the lower semicontinuous envelope of the function
[TABLE]
The function obtained in this way is jointly convex, lower semicontinuous and it is zero on the diagonal. As a result, one gets a natural map
[TABLE]
with the additional property
[TABLE]
Using different functions $F$, which I also call entropy functions in the present paper, the minimizing procedure (3) gives rise to well-known statistical functionals.
For the function , the result is the Hellinger distance [7]
[TABLE]
When , one gets the Jensen-Shannon divergence [8]
[TABLE]
The previous examples are taken from the class of the power like entropies
[TABLE]
They give rise to the family of functions
[TABLE]
where the expression is written in terms of the power mean
[TABLE]
The entropy produces the symmetric Kullback-Leibler divergence [9]
[TABLE]
The marginal perspective function can also be computed starting from non-smooth entropies such as $F(s)=|s-1|$, which induces the celebrated total variation distance
[TABLE]
The metric properties of the $F$-divergences have been investigated by many authors, including Csiszár, Endres, Kafka, Osterreicher, Schindelin and Vincze ([10], [11], [12], [13], [14]), to cite only a few. In the pure entropic setting, I generalize a previous result of Osterreicher [13] and I prove that, for the power like entropy , the induced function given by (9) is the square of a metric on for every
In the pure entropic case, I also characterize the limit of the sequence and I prove that the total variation and its positive multiples are the only divergences that are also a distance. Under additional assumptions, the convergence properties of the sequence are also studied, where I put , . I will show that this is strictly related to those divergences for which is a distance, and I will emphasize the central role of the class of Matusita divergences.
Recently, $F$-divergences have been considered by Liero, Mielke, Savaré [15] as penalizing functionals in the formulation of Optimal Entropy-Transport problems, a generalization of Optimal-Transport problems obtained by relaxing the marginal constraints. Given a cost function $c$ and an admissible entropy function $F$, a crucial role in the theory is played by the induced marginal perspective cost $H$, the lower semicontinuous envelope of the function
[TABLE]
The function remains positively -homogeneous with respect to , a property used in [15] in order to derive a "homogeneous formulation" of Optimal Entropy-Transport problems that allows the study of the metric and dynamical aspects of the theory.
When the starting entropy has a strict minimum at , and the cost is a symmetric function such that if and only if , I will show that the induced marginal perspective cost is symmetric, non-negative and if and only if or .
In the presence of a non-zero cost function , an explicit computation of the induced marginal perspective cost is often unavailable. A special case, central in the study of Optimal Entropy-Transport problems, is given by the choices , for a metric on , and . It holds
[TABLE]
When or , one gets
[TABLE]
The main theorem states that for any $p>1$ the square root of $H_p$ satisfies the triangle inequality on the cone space over . The latter is the space , where and
[TABLE]
Thus, I provide new examples of entropy-transport metrics besides the Gaussian Hellinger-Kantorovich distance () and the related Hellinger-Kantorovich distance studied in [15]. The class of examples includes, for , a transport variant of the Vincze-Le Cam distance [16], [17],
[TABLE]
This paper is organized as follows.
In Section II, I recall some basic concepts of convex analysis, in particular I discuss the connection between the entropy function and the induced perspective function.
In the third section, I recall the definition of the power means and their main properties. The results in this section will be useful in the study of the marginal perspective cost generated by the power like entropies.
Section IV is devoted to the study of the costless version of the function $H$. I provide a list of examples of admissible entropy functions, which includes indicator functions, $\chi^{\alpha}$ divergences, Matusita divergences, power like entropies and two other families of convex functions that I have called power-logarithmic entropies and double power entropies. Then, I compute the induced marginal perspective function and I discuss the metric properties of the function obtained starting from some of the previous examples. Finally, I study the convergence properties of the iteration of the minimizing procedure (3) and I highlight the role of the class of Matusita divergences.
In the fifth section I introduce the notion of homogeneous marginal perspective cost and I discuss its main properties.
In section VI, I present the Optimal Entropy-Transport problem and I briefly motivate the "homogeneous formulation" of this problem, via the homogeneous marginal perspective cost.
In the last section I focus on the marginal perspective cost induced by the power like entropy and by the cost , for a given metric . I prove the main theorem of the paper, which ensures that the function is the square of a metric on the corresponding cone space.
For the sake of simplicity, I limit the discussion to finite nonnegative measures over a finite discrete set, but the results can be generalized to finite nonnegative Radon measures over Hausdorff topological spaces (see [15]). I plan to address this case in a future work.
In this paper, a real function is increasing (resp. decreasing) if for any we have (resp. ).
II Entropy functions
A function belongs to the class of admissible entropy functions if is convex, lower semicontinuous and . The domain of the function is the set
[TABLE]
Let , the recession function and the recession constant are defined by
[TABLE]
The perspective function induced by is the function , given by
[TABLE]
is jointly convex, lower semicontinuous and for any .
The right derivative at [math], and the asymptotic affine coefficient are defined by
[TABLE]
[TABLE]
which are well posed due to the convexity of .
The Legendre conjugate function is defined by
[TABLE]
is the conjugate of the convex function obtained by extending to for negative arguments. It is convex and lower semicontinuous. Concerning the behavior of , the following Lemma holds ([15], section ):
Lemma 1.
The function is an increasing homeomorphism between and with .
The reverse entropy function is defined by
[TABLE]
so that In particular, is convex, lower semicontinuous and the map is an involution of . Moreover, it holds and the function satisfies
[TABLE]
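The reverse entropy can be illustrated numerically. This is a sketch under the assumption that, for $s>0$, the reverse entropy is $R(s)=sF(1/s)$ (the value at $s=0$ is omitted); the two logarithmic entropies used as test cases and all function names are illustrative choices. Under this assumption $s\log s-s+1$ and $s-1-\log s$ are reverse to each other, and applying the construction twice returns the starting function.

```python
import math

def reverse_entropy(F):
    """Return R with R(s) = s * F(1 / s) for s > 0 (assumed form; s = 0 is omitted)."""
    def R(s):
        if s <= 0.0:
            raise ValueError("this sketch only covers s > 0")
        return s * F(1.0 / s)
    return R

def u1(s):
    # s log s - s + 1
    return s * math.log(s) - s + 1.0

def u0(s):
    # s - 1 - log s
    return s - 1.0 - math.log(s)

R = reverse_entropy(u1)      # expected to agree with u0
RR = reverse_entropy(R)      # expected to agree with u1 (involution)
for s in (0.25, 1.0, 3.0):
    print(s, R(s), u0(s), RR(s), u1(s))
```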
Starting from a function , a finite set , and two probability densities
[TABLE]
the -divergence of from is given by
[TABLE]
The Legendre conjugates of and are related by
[TABLE]
III Power means
In this section I study the power means (also called generalized means), a family of functions that includes the well-known arithmetic, geometric and harmonic means. The properties of these functions will be useful later on.
In what follows $r,t$ will denote two non-negative real numbers and $p$ a real parameter, which I suppose for the present not to be $0$. The $p$-power mean between $r$ and $t$ is given by
[TABLE]
except when $p<0$ and $r$ or $t$ is zero. In this case the $p$-power mean is equal to zero:
[TABLE]
In the case $p=0$ I put
[TABLE]
so that
It is easy to see that for every and every . The function is symmetric, i.e. and positively -homogeneous in the sense that for every Moreover, it is not difficult to prove that for every , and
The case $p=1$ is the well-known arithmetic mean, $p=0$ is the geometric mean and $p=-1$ is called the harmonic mean.
The main theorem (see [18] for a proof) regarding the power means is the following:
Theorem 1.
If then
[TABLE]
with the case of equality given by , or and .
In particular,
[TABLE]
for any ,
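The monotonicity of the power means in the parameter can be checked numerically. The sketch below uses the standard two-point power mean $\big(\tfrac{r^p+t^p}{2}\big)^{1/p}$, with the geometric mean at $p=0$ and the minimum and maximum as limiting cases; the function name and the sample values are illustrative assumptions.

```python
import math

def power_mean(r, t, p):
    """Two-point power mean of non-negative r, t with parameter p."""
    if p == 0:
        return math.sqrt(r * t)                 # geometric mean
    if p == math.inf:
        return max(r, t)                        # limiting case
    if p == -math.inf:
        return min(r, t)                        # limiting case
    if p < 0 and (r == 0.0 or t == 0.0):
        return 0.0                              # convention for negative p
    return ((r ** p + t ** p) / 2.0) ** (1.0 / p)

r, t = 1.0, 4.0
params = [-math.inf, -2, -1, 0, 0.5, 1, 2, math.inf]
values = [power_mean(r, t, p) for p in params]
for p, v in zip(params, values):
    print(p, v)
```

On this example the values increase with the parameter, from the minimum of the two numbers up to their maximum, as Theorem 1 predicts.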
IV Costless marginal perspective
Let $F$ be an admissible entropy function and let $R$ be its reverse entropy. In general, for the induced perspective function one has $\hat{F}(r,t)\neq\hat{F}(t,r)$, so that the $F$-divergence does not satisfy the symmetry property. In order to replace $F$ with a new "symmetric entropy", a natural procedure is the following: define the marginal perspective function $H$ as the lower semicontinuous envelope of the function
[TABLE]
An equivalent definition can be given in terms of the induced perspective functions $\hat{F}$ or $\hat{R}$ by:
[TABLE]
The infimum in the definition is a minimum and it occurs in the interval $[r,t]$ (without loss of generality I am assuming $r\leq t$): to see this it is enough to notice that the function is lower semicontinuous and it is decreasing in and increasing in . I will prove in section V (in a more general context), that the function $H$ is non-negative, symmetric, jointly convex and positively $1$-homogeneous. Moreover, when the function $F$ has a strict minimum at $1$, $H(r,t)=0$ if and only if $r=t$. It is important to notice, since $H$ is $1$-homogeneous, that the study of the function $H$ is equivalent to the study of the $1$-variable function I will use this fact repeatedly in the paper.
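The minimizing procedure can be approximated numerically. The sketch below assumes the convention $H(r,t)=\inf_{\theta>0}\big[rF(\theta/r)+tF(\theta/t)\big]$ for positive $r,t$ (equivalently $\theta\,(R(r/\theta)+R(t/\theta))$ with $R(s)=sF(1/s)$), searches only between the two arguments as the paragraph above suggests, and uses a plain ternary search. The helper names are hypothetical. For the logarithmic entropy $s\log s-s+1$, setting the derivative in $\theta$ to zero gives $\theta=\sqrt{rt}$ and the value $(\sqrt r-\sqrt t)^2$, which is used as a check.

```python
import math

def ternary_min(g, lo, hi, iters=200):
    """Minimal value of a convex function g on [lo, hi] via ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return g((lo + hi) / 2.0)

def marginal_perspective(F, r, t):
    """Numerical H(r, t) for r, t > 0 under the assumed convention."""
    lo, hi = min(r, t), max(r, t)
    return ternary_min(lambda th: r * F(th / r) + t * F(th / t), lo, hi)

def log_entropy(s):
    return s * math.log(s) - s + 1.0

r, t = 1.0, 4.0
numeric = marginal_perspective(log_entropy, r, t)
closed = (math.sqrt(r) - math.sqrt(t)) ** 2
print(numeric, closed)
```

The same routine also illustrates symmetry and vanishing on the diagonal, since swapping the arguments or taking $r=t$ can be tested directly.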
IV-A Examples
I consider now different examples of admissible entropy functions and I compute the expression of the induced marginal perspective $H$. I will in general suppose , so that I can avoid ambiguous expressions at the boundary of the domain that should be treated carefully.
Example 1.
(Indicator functions) The indicator function of the closed interval with endpoints and , , is defined by
[TABLE]
When one obtains
[TABLE]
where if and if .
Example 2.
($\chi^{\alpha}$ divergences) Given a parameter $\alpha\geq 1$, the $\chi^{\alpha}$ divergence is defined as
$$\chi^{\alpha}(s):=|s-1|^{\alpha}.$$
$\chi^{1}$ is the famous total variation entropy.
The entropy function $\chi^{\alpha}$ gives rise to the marginal perspective function
$$H(r,t)=\frac{|r-t|^{\alpha}}{(r+t)^{\alpha-1}}.$$
We can recognize the expression of the so-called Puri-Vincze divergence.
Example 3.
(Matusita divergences) For the Matusita divergence is given by . Clearly
When it is easy to see that
[TABLE]
It is interesting to note that except for the constant factor , the Matusita function remains invariant after the minimizing procedure (33). I will come back to this point in section IV-C.
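The invariance up to a constant can be observed numerically. The sketch below rests on two assumptions that are not spelled out in the surviving text: the Matusita entropy is taken in the form $F(s)=|s^{a}-1|^{1/a}$ with $a\in(0,1]$, and the marginal perspective function is taken as $\inf_{\theta}\big[rF(\theta/r)+tF(\theta/t)\big]$. Under these assumptions $rF(\theta/r)=|\theta^{a}-r^{a}|^{1/a}$, and splitting $t^{a}-r^{a}$ into two equal parts gives the factor $2^{1-1/a}$, which the code compares against a numerical minimization.

```python
def ternary_min(g, lo, hi, iters=200):
    """Minimal value of a convex function g on [lo, hi] via ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return g((lo + hi) / 2.0)

def matusita_entropy(a):
    """Assumed parametrisation: F(s) = |s^a - 1|^(1/a), 0 < a <= 1."""
    return lambda s: abs(s ** a - 1.0) ** (1.0 / a)

def marginal_perspective(F, r, t):
    lo, hi = min(r, t), max(r, t)
    return ternary_min(lambda th: r * F(th / r) + t * F(th / t), lo, hi)

def rescaled_matusita(a, r, t):
    """Derived candidate: 2^(1 - 1/a) * |t^a - r^a|^(1/a)."""
    return 2.0 ** (1.0 - 1.0 / a) * abs(t ** a - r ** a) ** (1.0 / a)

r, t = 1.0, 4.0
for a in (0.25, 0.5, 1.0):
    print(a, marginal_perspective(matusita_entropy(a), r, t), rescaled_matusita(a, r, t))
```

For $a=1$ the factor is $1$, which matches the statement that $|s-1|$ is left unchanged by the procedure.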
Example 4.
(Power like entropies) Let be any real number. I call power-like entropy of order the function characterized by
[TABLE]
The function can be computed explicitly and one gets:
[TABLE]
with for and for . This family of functions, also called Dichotomy Class, was introduced by Liese and Vajda [19],[4].
Given , we obtain the following expression:
[TABLE]
We can recognize some well-known statistical functionals: for example, in the logarithmic entropy case the Hellinger distance appears
[TABLE]
I have already noticed that the same function is obtained starting from the entropy .
For $p=0$ we have the Jensen-Shannon divergence, a squared distance between measures derived from the Kullback-Leibler divergence ([11]).
The quadratic entropy gives rise to the triangular discrimination
[TABLE]
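The link between the power like entropies and the power means can be explored numerically. The sketch assumes the normalisation $\frac{s^{p}-p(s-1)-1}{p(p-1)}$ for $p\neq 0,1$ (with $s\log s-s+1$ at $p=1$ and $s-1-\log s$ at $p=0$) and the convention $H(r,t)=\inf_{\theta}\big[rF(\theta/r)+tF(\theta/t)\big]$. Setting the derivative in $\theta$ to zero gives the minimizer $M_{1-p}(r,t)$ and the candidate closed form $\frac{1}{p}\big(r+t-2M_{1-p}(r,t)\big)$; this derivation and all names below are my own reconstruction rather than a quotation of the paper's formula.

```python
import math

def ternary_min(g, lo, hi, iters=200):
    """Minimal value of a convex function g on [lo, hi] via ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return g((lo + hi) / 2.0)

def power_mean(r, t, q):
    """Power mean of positive r, t (geometric mean for q = 0)."""
    if q == 0:
        return math.sqrt(r * t)
    return ((r ** q + t ** q) / 2.0) ** (1.0 / q)

def power_like_entropy(p):
    """Assumed normalisation of the power like entropy of order p."""
    def U(s):
        if p == 1:
            return s * math.log(s) - s + 1.0
        if p == 0:
            return s - 1.0 - math.log(s)
        return (s ** p - p * (s - 1.0) - 1.0) / (p * (p - 1.0))
    return U

def h_numeric(p, r, t):
    F = power_like_entropy(p)
    return ternary_min(lambda th: r * F(th / r) + t * F(th / t), min(r, t), max(r, t))

def h_closed(p, r, t):
    """Candidate closed form derived from the first-order condition."""
    if p == 0:
        m = (r + t) / 2.0
        return r * math.log(r / m) + t * math.log(t / m)
    return (r + t - 2.0 * power_mean(r, t, 1.0 - p)) / p

pairs = [(1.0, 4.0), (0.3, 2.5)]
orders = [-1.0, 0.0, 0.5, 1.0, 2.0, 3.0]
for p in orders:
    for r, t in pairs:
        print(p, r, t, h_numeric(p, r, t), h_closed(p, r, t))
```

With these conventions the order $p=1$ reduces to $(\sqrt r-\sqrt t)^2$ and the order $p=2$ to $\frac{(r-t)^2}{2(r+t)}$, in line with the Hellinger and triangular discrimination cases mentioned in the text.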
Example 5.
(Power-logarithmic entropies) Given a real number , I call power-logarithmic entropy of order the function
[TABLE]
and It is easy to see that and .
Starting from the power-logarithmic entropy of order one gets:
[TABLE]
As expected, since . When , one obtains the symmetric Kullback-Leibler divergence [9]:
[TABLE]
Example 6.
(Double power entropies) Given two parameters such that and , or , the double power entropy of order is given by
[TABLE]
is a strictly convex function, , and it is extended in [math] by continuity so that when are positive, when
A direct computation shows that:
[TABLE]
For example, when one gets
[TABLE]
IV-B Divergences and triangle inequality
As we have previously seen, starting from a function such that if and only if , the marginal perspective function is non-negative, symmetric and if and only if (if no confusion is possible, from now on I will denote by the function ). In this section I begin the discussion regarding another property that has to fulfill in order to be a metric on : the triangle inequality.
When I write " is a metric on a space " I mean that is a function such that if and only if , it is symmetric, i.e. for every , and it satisfies the triangle inequality in the sense that for every .
Since I will prove that the only divergence that is also a distance is the total variation, I will also discuss when the power , , is a metric on .
The convexity of the function implies that
[TABLE]
I recall this simple Lemma:
Lemma 2.
Let be a metric space and be a concave function such that if and only if . Then is a metric space.
Proof.
and if and only if which implies . It is clear that is symmetric. Since is concave and for every it follows that is increasing and subadditive, thus
[TABLE]
∎
An easy consequence of the Lemma is that if is a metric, then is a metric for every .
Using the symmetry, the -homogeneity of the function together with the property (52), it follows that the triangle inequality for the function is equivalent to the following inequality
[TABLE]
A last useful remark is that
[TABLE]
is a necessary condition for the existence of a power such that is a metric.
Regarding the examples previously seen, it was proved by Kafka, Osterreicher and Vincze [12] that is a metric when
The Matusita divergences clearly provide the distance .
When , so that, except for the case , the power-logarithmic entropy is not a metric for every power .
I now turn the attention to the function . It has the following expression
[TABLE]
that is also valid when with the convention .
As I have already noticed, is the square of a metric on for . I investigate now the same question for every real number This was already done by Osterreicher in the case [13]. Following the same approach I prove:
Theorem 2.
The induced marginal perspective function is the square of a metric on for any . does not satisfy the triangle inequality if
For the proof of the Theorem I will use the following lemma. It is the first example in the paper of a fact that will be recurrent: the central role of the class of Matusita divergences in the study of the metric properties of the marginal perspective function.
Lemma 3.
Given a number and an induced marginal perspective function , if
[TABLE]
is decreasing in then satisfies the triangle inequality.
Proof.
Due to the monotonicity of the square root function, one has that
[TABLE]
is decreasing in , so that and if . It follows that
[TABLE]
∎
Proof of Theorem 2.
Using now Lemma 3, it remains to show that the function
[TABLE]
is decreasing in , where I have used the notation The derivative of the function is the following:
[TABLE]
where I set
[TABLE]
Note that and satisfies:
[TABLE]
The function is such that and
[TABLE]
Now let us suppose : I have to prove that is positive in . This is implied by in which is true because is positive in . Similar considerations can be applied to the case and
For one gets in so is positive in . This implies that is negative and so is increasing in . As a consequence, an analysis of the proof of Lemma 3 shows that the triangle inequality is reversed for these values of . ∎
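A numerical spot check can complement the proof for a few classical orders. The sketch below is not a proof and does not probe the threshold of Theorem 2: it only tests the triangle inequality for the square root of the candidate closed form $\frac{1}{p}\big(r+t-2M_{1-p}(r,t)\big)$ (Jensen-Shannon type expression at $p=0$) on a small grid of positive numbers, for $p\in\{0,1,2\}$. The closed form, the grid and the function names are assumptions made for illustration.

```python
import math

def power_mean(r, t, q):
    """Power mean of positive r, t (geometric mean for q = 0)."""
    if q == 0:
        return math.sqrt(r * t)
    return ((r ** q + t ** q) / 2.0) ** (1.0 / q)

def h_p(r, t, p):
    """Assumed closed form of the marginal perspective function of order p."""
    if p == 0:
        m = (r + t) / 2.0
        value = r * math.log(r / m) + t * math.log(t / m)
    else:
        value = (r + t - 2.0 * power_mean(r, t, 1 - p)) / p
    return max(value, 0.0)  # clamp tiny negative rounding errors

def max_triangle_violation(p, grid):
    """Largest value of sqrt(H(r,t)) - sqrt(H(r,s)) - sqrt(H(s,t)) over the grid."""
    worst = -math.inf
    for r in grid:
        for s in grid:
            for t in grid:
                lhs = math.sqrt(h_p(r, t, p))
                rhs = math.sqrt(h_p(r, s, p)) + math.sqrt(h_p(s, t, p))
                worst = max(worst, lhs - rhs)
    return worst

grid = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
for p in (0, 1, 2):
    print(p, max_triangle_violation(p, grid))
```

The square root matters: for $p=1$ the function itself fails the triangle inequality, for instance with the points $1$, $4$, $9$, where the direct value is $4$ and the two-step sum is $2$.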
Remark 1.
It was proved by Osterreicher and Vajda ([14]) that, if , is a metric.
IV-C Marginal perspective function and convergence properties
We have seen that the construction of the marginal perspective function naturally produces a symmetric divergence. In this section I will show that this is not the only feature of the minimization procedure (33): iterating this process I will highlight the important role of the class of Matusita divergences.
I define the space as the set of functions such that is equal to its reverse entropy .
At the beginning of section IV we have seen how to generate a map : starting from a function , I define , where is the lower semicontinuous envelope of the function obtained by (33). I also denote by the map given by for every .
It is clear that the two trivial entropies
[TABLE]
are fixed points of the map for any .
Another important property that follows immediately from the definition is that
[TABLE]
Due to the difference between the case and the case , I have divided the analysis of the behaviour of the map . Nevertheless, the strategy behind the proofs is in common: I will show that, under suitable conditions, the sequence is monotone and the pointwise limit is a fixed point of the map . I then prove that implies (in the case I mean that ).
I start with a simple Lemma that provides a crucial monotonicity property.
Lemma 4.
For any , if satisfies the triangle inequality then .
Proof.
For any the convexity of the function yields
[TABLE]
The result follows by taking the infimum of the left hand side with respect to . ∎
Lemma 5.
Given a function the sequence is decreasing and it converges pointwise to a fixed point of the map .
Proof.
Since the map is equal to when or , it follows that for any . Thus, for any the sequence is a decreasing sequence bounded from below and thus it has a limit that I denote by , and it is clear that . The limit function is a fixed point of : with the same reasoning as at the beginning of the proof, one gets ; for the reverse inequality I notice that for any and any it holds
[TABLE]
where is the perspective function induced by . The result follows taking the limit with respect to and then minimizing with respect to . ∎
Theorem 3.
The only fixed points of the map are the functions of the form where . In particular, an induced marginal perspective function is a metric on if and only if , .
Proof.
It is clear that the function is a fixed point of for any . I show now that they are the only fixed points: since is a convex function that has the same value when and , implies that for any . Using the homogeneous property of the function , this is equivalent to the fact that for any . In particular, taking any , and , it holds
[TABLE]
By taking the difference of the previous equations, one gets $F\big(\frac{t}{2}\big)=F(2)\big(\frac{t}{2}-1\big)$ for any so that, using again the homogeneity and the symmetry, for any .
In order to conclude the proof I notice that if is a metric then, using Lemma 4, it follows . Since Lemma 5 provides the converse inequality, is a fixed point of and the only fixed points that induce a metric on are the functions of the form with ∎
In order to deal with the case I need some preliminary results and some additional assumptions. I start by proving that every metric of the form , , is a complete metric.
Lemma 6.
Let and let us suppose that is a metric for a number . Then there exists such that
[TABLE]
and is a complete metric.
Proof.
For any I rewrite the distance between and as
[TABLE]
where Since the triangle inequality holds, at least one of the numbers and is less than or equal to . Choosing , it follows for any . By contradiction, let us suppose that there does not exist a positive constant such that ; then there exists a sequence such that . So, I can find a such that . On the other hand, since the sequence defined by , converges to [math], by continuity of the function we have that which is a contradiction since and is decreasing.
Now it is easy to show that the metric is complete: since is a metric, is symmetric and . From the convexity of the function it follows so that
[TABLE]
The result follows using the fact that are two complete metrics that induce the same convergence. ∎
Recall that, given a metric space and the interval , a curve is a constant speed geodesic if
[TABLE]
A metric space is a geodesic space if for every pair of points there exists a constant speed geodesic between and . A well-known fact is that a complete metric space is a geodesic space if and only if for every pair of points there exists such that . The point is called mid-point between and .
I am now ready to prove the analogue of Theorem 3 in the case , under an additional assumption.
Theorem 4.
Let and let us suppose that , , is a distance and . Then for a constant .
Proof.
Since one has that for any there exists such that
[TABLE]
Using the fact that is a metric and the concavity of the function one gets
[TABLE]
Equation (67) implies the equality in the inequality (68), in particular
Since are two arbitrary points and is a complete metric from Lemma 6, it follows that is a one-dimensional geodesic space, so it must be isometric to (for a reference see [20], chapter ). In particular there exists increasing and continuous such that I can write . From the -homogeneity of the function , it follows for , so that
[TABLE]
Evaluating equation (69) for I get
[TABLE]
whereas the choice yields
[TABLE]
Now consider the previous equation with , it follows
[TABLE]
Using now the identities (70) and (72), it follows
[TABLE]
and I can compute as
[TABLE]
so that for any , which proves the theorem. ∎
Remark 2.
I do not know if the assumption that is a metric can be removed in order to obtain the same characterization as in Theorem 3. The difficulty is that the value of the function at and is strictly greater than , unless .
In order to obtain that also in the case the limit function is a fixed point of the map , I need the following Lemma:
Lemma 7.
Let be a compact space and let be a sequence of lower semicontinuous functions such that for every and every . Then
[TABLE]
where I put
Proof.
The functions and are lower semicontinuous over a compact set so that they have a minimum. Since for every it is clear that
[TABLE]
Let us suppose now , so that for every . Since , there exists such that . It follows that the family is an open cover of . Let be a finite collection of indices such that
[TABLE]
Let , so that since are increasing. This implies that for every so that . Since is an arbitrary number less than , the Lemma follows. ∎
I can now state the Theorem about the convergence of the iterations of the map .
Theorem 5.
Let . Given a function , if is a metric then the sequence converges pointwise to a fixed point of the map . In particular, if the limit function is such that is a metric, then where
Proof.
Lemma 4 implies that . By the monotonicity property (63) the sequence is increasing so it converges pointwise to a function . Since is a metric, is convex and finite everywhere (thus continuous), as well as . I want to show that is a fixed point of :
[TABLE]
where I have denoted by the lower semicontinuous envelope of the function and I have used Lemma 7 applied to and . The conclusion follows from Theorem 4. ∎
Remark 3.
It is not difficult to show that can be equal to . For example, take and consider the sequence with
In the final part of this section I want to study the connection between the behaviour of the function in a neighborhood of and the limit function . I start with two lemmas:
Lemma 8.
Let , , and be the function defined by
[TABLE]
Then
Proof.
It is sufficient to consider the case ; by definition we have
[TABLE]
When it is clear that . Moreover, I notice that in the case
[TABLE]
the expression (73) is minimized by , so that for such an . Using now the bound given by Theorem 1, I deduce that the inequalities (74) are certainly satisfied when . The lemma is now an easy consequence of the fact that the sequence , is strictly increasing and it diverges to . ∎
Lemma 9.
Let , , and be the function defined by when and extended linearly outside in such a way that the left derivative of at is the slope of the linear extension in . Then
Proof.
The lemma follows if I prove that
[TABLE]
for every . Indeed (75) implies that is a distance, so that, by Theorem 5, must converge to a function that is a fixed point of . Since for every and every , it holds for every and this implies that for every . Indeed, let us suppose by contradiction that there exists such that and consider the constant such that Since and are fixed points of and they coincide in , there must exist another number , , where they coincide. Iterating the argument it is easy to show that and have to coincide on a sequence of numbers that converges to but this is absurd since for every and the functions and coincide only at
It remains to show that (75) holds. I use Lemma 3: I have to prove that the function
[TABLE]
is increasing in : this is obvious in the interval ; consider now two numbers such that . I define to be the affine function that coincides with at and such that , and I notice that the convexity of the function implies that the slope of is greater than or equal to the positive slope of the function in . Using again the convexity of the function and the trivial fact that the quotient
[TABLE]
is increasing in , I conclude because
[TABLE]
∎
Theorem 6.
Let be a function such that
[TABLE]
Then
[TABLE]
Proof.
For every there exists a such that
[TABLE]
so that
[TABLE]
where are defined in Lemma 8 and 9. Take now an arbitrary , from the monotonicity property (63) it follows
[TABLE]
so that by Lemma 8 and Lemma 9 one gets
[TABLE]
Since is arbitrary, the limit of exists and it is equal to .
∎
V Marginal perspective cost
V-A Marginal perspective function
In this section I introduce the marginal perspective cost. I will modify the definition of marginal perspective function that we have seen in section IV in order to take into account the presence of a cost function. The construction is motivated by the study of optimal entropy-transport problem (see [15], section , and the section VI of the present paper).
First of all, given a number and an admissible entropy function , the marginal perspective function is defined as the lower semicontinuous envelope of the function
[TABLE]
where is the reverse entropy function of . Of course, the function coincides with the marginal perspective function introduced in section IV. When the numbers are positive, the function can be also computed as
[TABLE]
or in terms of the perspective function as
[TABLE]
For I set
[TABLE]
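The effect of a constant cost can be illustrated for one entropy. The sketch assumes the form $H_c(r,t)=\inf_{\theta>0}\big[rF(\theta/r)+tF(\theta/t)+\theta c\big]$ for positive $r,t$ and $c\geq 0$ (equivalently $\theta\,(R(r/\theta)+R(t/\theta)+c)$), and uses the logarithmic entropy $s\log s-s+1$. For this choice the first-order condition gives $\theta=\sqrt{rt}\,e^{-c/2}$ and the value $r+t-2\sqrt{rt}\,e^{-c/2}$; this computation and the helper names are my own illustration, not a formula quoted from the text.

```python
import math

def ternary_min(g, lo, hi, iters=200):
    """Minimal value of a convex function g on [lo, hi] via ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return g((lo + hi) / 2.0)

def log_entropy(s):
    # s log s - s + 1, with 0 log 0 = 0 so that the value at 0 is 1
    return (s * math.log(s) if s > 0.0 else 0.0) - s + 1.0

def marginal_perspective_cost(F, r, t, c):
    """Numerical H_c(r, t) for r, t > 0 and c >= 0 under the assumed form."""
    g = lambda th: r * F(th / r) + t * F(th / t) + th * c
    # With c >= 0 the minimiser cannot exceed max(r, t), so search on [0, max(r, t)].
    return ternary_min(g, 0.0, max(r, t))

def closed_form(r, t, c):
    return r + t - 2.0 * math.sqrt(r * t) * math.exp(-c / 2.0)

r, t = 1.0, 4.0
costs = [0.0, 0.5, 2.0, 10.0]
values = [marginal_perspective_cost(log_entropy, r, t, c) for c in costs]
for c, v in zip(costs, values):
    print(c, v, closed_form(r, t, c))
```

On this example the values increase with the cost and approach $r+t$ for large costs, which is the value obtained when no mass is transported.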
The following lemma, proved in [15] (lemma ), gives a dual characterization of :
Lemma 10.
For every the function can be represented as
[TABLE]
In particular, the marginal perspective function is lower semicontinuous, convex and positively -homogeneous with respect to , increasing and concave with respect to . Moreover, coincides with in the interior of its domain.
V-B Induced marginal perspective cost
When is a function , the induced marginal perspective cost is the function defined as
[TABLE]
A particularly important case is when and is induced by a metric on .
Given a metric space , I am interested in determining when the function is the power of a metric on the corresponding cone space. The latter is the space , where and
[TABLE]
It is important to highlight that the space can be endowed with a "natural" metric (see [20], Prop. ):
[TABLE]
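For orientation, the standard cone metric from the metric-geometry literature can be written down explicitly. The sketch assumes that the "natural" metric meant here is the usual one, $d_C([x,r],[y,t])^2=r^2+t^2-2rt\cos\big(\min(d(x,y),\pi)\big)$; the function name and the sample values are illustrative.

```python
import math

def cone_distance(d_xy, r, t):
    """Standard cone distance between [x, r] and [y, t], given d_xy = d(x, y)."""
    angle = min(d_xy, math.pi)
    squared = r * r + t * t - 2.0 * r * t * math.cos(angle)
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative rounding

print(cone_distance(0.0, 2.0, 5.0))          # same base point: |r - t|
print(cone_distance(10.0, 2.0, 5.0))         # base points far apart: r + t
print(cone_distance(math.pi / 2, 3.0, 4.0))  # right angle: Pythagoras
print(cone_distance(1.0, 0.0, 3.0))          # one point is the vertex
```

The last call shows why all points of zero height can be identified: the distance from the vertex does not depend on the base point.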
Theorem 7.
Let be an admissible entropy function with a strict minimum at and let be a symmetric function such that if and only if . Then the induced marginal perspective cost is symmetric, non-negative and if and only if . In particular, is a well defined function on the cone .
Proof.
Since and it is clear that . Moreover, when it follows from the dual representation (84) that . If and then and the fact that the marginal perspective cost is null follows from the possible choice in the expression (79). Since is symmetric it is clear that
[TABLE]
It remains to prove that implies . Lemma 1 and equation (25) tell us that is an increasing homeomorphism between and with . Since is a convex function with a strict minimum at , it holds . In particular, there exists a positive number such that the function is finite, continuous and strictly increasing in . Hence, it follows again from the representation (84) that and implies . Moreover, when we must have : suppose by contradiction that (the other case is similar), in the equation (84) we find such that , contradicting the fact . Finally, when , and are positive I can prove that using the fact that implies because, using now the expression (80), I know that for every natural there exists such that
[TABLE]
In particular, for large enough, for some constants , and by extracting a subsequence it follows that . The lower semicontinuity of forces so that . ∎
If the function does not have a strict minimum at , the induced marginal perspective cost can be null even if . To see this, take defined by
[TABLE]
that gives , so that .
VI Entropy-Transport problem
In this section I consider two discrete spaces and and I let be a proper (i.e. not identically ) cost function that I will denote by . I will also denote by the set of finite, nonnegative measures on (I refer to [15] for a more general topological setting).
Given two finite measures which can be identified with vectors by
[TABLE]
the classical Optimal-Transport problem between and is defined as the minimization of the functional
[TABLE]
with respect to any positive measure that satisfies the marginal constraints
[TABLE]
a condition that forces the measures to have equal mass, i.e.
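The discrete transport functional and the marginal constraints can be sketched in a few lines. The function names, the convention $0\cdot\infty = 0$ for unused entries, and the sample data below are illustrative assumptions; the sketch only evaluates the cost of a given plan and does not solve the minimization.

```python
import math

def transport_cost(cost, plan):
    """Sum of cost[i][j] * plan[i][j], using the convention 0 * inf = 0."""
    total = 0.0
    for cost_row, plan_row in zip(cost, plan):
        for c, g in zip(cost_row, plan_row):
            if g > 0.0:  # skip empty entries so an infinite cost is harmless there
                total += c * g
    return total

def marginals(plan):
    """Row sums (first marginal) and column sums (second marginal) of a plan."""
    first = [sum(row) for row in plan]
    second = [sum(col) for col in zip(*plan)]
    return first, second

# Hypothetical 2 x 2 example.
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = [[0.5, 0.0], [0.25, 0.25]]

mu1, mu2 = marginals(plan)
total_cost = transport_cost(cost, plan)
```

Both marginals are obtained by summing the same matrix, once by rows and once by columns, so their total masses necessarily agree; this is the equal-mass condition stated above.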
Optimal Entropy-Transport problems arise naturally when one tries to relax the constraint on the marginals (91). Let be a superlinear entropy function. The Optimal Entropy-Transport problem between and is defined as the minimization of the functional
[TABLE]
with respect to any positive measure
I notice that the presence of the admissible entropy functions in the cost functional penalizes the measures that do not satisfy the constraints (91) (at least when have a strict minimum at ), and it allows one to minimize with respect to any measure
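A minimal sketch of how such a relaxed functional can be evaluated on discrete spaces is given below. It assumes the usual form of the Entropy-Transport functional (two entropy terms on the marginals of the plan plus the transport term), uses the logarithmic entropy $F(s) = s\log s - s + 1$ as an example of a superlinear entropy, and requires strictly positive reference measures; all names and data are hypothetical.

```python
import math

def kl_entropy(s):
    """Example entropy F(s) = s log s - s + 1, with F(0) = 1 and strict minimum F(1) = 0."""
    if s == 0.0:
        return 1.0
    return s * math.log(s) - s + 1.0

def entropy_transport(cost, plan, mu1, mu2, F1=kl_entropy, F2=kl_entropy):
    """Entropy penalties on both marginals of the plan, plus the transport cost.

    Assumes every entry of mu1 and mu2 is strictly positive, so the
    densities gamma_i / mu_i are well defined.
    """
    gamma1 = [sum(row) for row in plan]
    gamma2 = [sum(col) for col in zip(*plan)]
    entropy1 = sum(F1(g / m) * m for g, m in zip(gamma1, mu1))
    entropy2 = sum(F2(g / m) * m for g, m in zip(gamma2, mu2))
    transport = sum(
        c * g
        for cost_row, plan_row in zip(cost, plan)
        for c, g in zip(cost_row, plan_row)
        if g > 0.0  # convention 0 * inf = 0
    )
    return entropy1 + entropy2 + transport

cost = [[0.0, 1.0], [1.0, 0.0]]
plan = [[0.5, 0.0], [0.25, 0.25]]

# The plan has exactly these marginals, so only the transport term survives.
matched = entropy_transport(cost, plan, [0.5, 0.5], [0.75, 0.25])

# A mismatched first marginal is allowed but penalized by the entropy term.
mismatched = entropy_transport(cost, plan, [1.0, 0.5], [0.75, 0.25])
```

When the marginals of the plan coincide with the reference measures, both entropy terms vanish and the value reduces to the classical transport cost; otherwise the strict minimum of the entropy at $1$ makes the penalty strictly positive.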
Given a measure such that
[TABLE]
I call marginal perspective cost functional the quantity
[TABLE]
An important result (Theorem , [15]) tells us that
[TABLE]
The advantages of the -formulation of the problem are based on the homogeneity of the marginal perspective cost, which allows another useful formulation of the problem on the cone space, and the intrinsic metric properties of the function (see [15] for the special case of the Hellinger-Kantorovich distance and the rest of the present paper for other examples).
It is interesting to notice that one can recover the usual pure entropy problem in the case
[TABLE]
In this case, it is not difficult to show (example E., [15]) that, given two measures
[TABLE]
it holds
[TABLE]
where and
VII Triangle inequality in the Entropy-Transport case
In this section I deal with the case , and , where is a metric on the space . I denote by the induced marginal perspective cost. In the case it holds:
[TABLE]
When or one gets:
[TABLE]
From the previous section, taking , we already know that cannot be the square of a metric if . I am going to prove that even for the case the triangle inequality fails, i.e.
[TABLE]
for given values of .
If I choose so that
[TABLE]
The triangle inequality is clearly not satisfied when
[TABLE]
When , I choose again so that
[TABLE]
Once again, the triangle inequality fails for
[TABLE]
If , I choose instead and
[TABLE]
so that
[TABLE]
It is not difficult to see that the triangle inequality fails when is sufficiently large, because and
[TABLE]
when .
Let us now move to the case
Theorem 8.
Let us suppose and for a metric on . Then is a metric on the cone for every .
Proof.
The proof is long, so I have divided it into several steps:
Step 1.
It is clear that is finite and I can apply Theorem 7 so that it remains to prove that the square root of satisfies the triangle inequality.
Step 2.
I now use Lemma 2 in order to rewrite the expression of the function in a more familiar form.
Proposition 1.
is the square of a metric on the cone if is the square of a metric on the cone for every metric on , where I put
[TABLE]
Proof.
In order to apply Lemma 2, in the case I define ,
[TABLE]
Thus, I have to show that is a concave function and if and only if . The second statement is obvious; for the first one I notice that it is enough to prove that the function is concave when $d\in\big(0,\sqrt{\frac{2}{p-1}}\big)$. Let us compute the second derivative: I put
[TABLE]
so that
[TABLE]
[TABLE]
[TABLE]
Thus
[TABLE]
Recalling that $d\in\big(0,\sqrt{\frac{2}{p-1}}\big)$ and , the function is concave if and only if
[TABLE]
Since it holds $\Big(1-(p-1)\frac{d^{2}}{2}\Big)^{\frac{p}{p-1}}\geq 1-p\frac{d^{2}}{2}$ by the Bernoulli inequality, so that
[TABLE]
and (115) follows. In the case I have to check that defined by
[TABLE]
is concave and if and only if , which is trivial.
∎
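For the reader's convenience, the Bernoulli step used in the proof above can be spelled out as follows. This is a sketch under the assumption $p>1$, which is what the interval $d\in\big(0,\sqrt{\frac{2}{p-1}}\big)$ requires; the auxiliary symbols $x$ and $r$ are introduced only for this remark.

```latex
% Bernoulli inequality: (1+x)^r \ge 1 + r x  for all  r \ge 1,\ x \ge -1.
\[
x := -(p-1)\frac{d^{2}}{2} \in (-1,0),
\qquad
r := \frac{p}{p-1} > 1,
\]
\[
\Big(1-(p-1)\frac{d^{2}}{2}\Big)^{\frac{p}{p-1}}
 = (1+x)^{r}
 \;\ge\; 1 + r x
 = 1 - p\,\frac{d^{2}}{2}.
\]
```

The condition $x>-1$ is exactly the restriction $d<\sqrt{\frac{2}{p-1}}$, so the inequality holds on the whole interval considered in the proof.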
It is now clear that is the square of a metric on the cone space, because (87) is the square of a metric on the cone space and is a metric if is a metric.
Step 3.
From now on, I suppose and I have to show that
[TABLE]
is the square of a metric for any metric on .
Lemma 11.
The function
[TABLE]
is increasing in for .
Proof.
Just notice that
[TABLE]
is decreasing in . The result follows easily. ∎
In view of Lemma 11, from now on I also assume
[TABLE]
and I have to prove that for every , for every metric on and for every the following triangle inequality holds:
[TABLE]
I start with the case and . Then
[TABLE]
and the triangle inequality follows easily.
In the case and it holds
[TABLE]
In view of Lemma 11, the worst case is when , so that it is sufficient to prove
[TABLE]
Using now Lemma 1, the right-hand side of is not smaller than
[TABLE]
hence I have to prove that
[TABLE]
which is obvious in the case ; on the other hand, if one gets
[TABLE]
and taking the square of both sides is trivially proved.
Now I suppose , and . Then
[TABLE]
By the same reasoning as before, it is sufficient to show the inequality
[TABLE]
that follows from the triangle inequality for the cone distance , since
[TABLE]
if .
Step 4.
Thus, I can assume
[TABLE]
Without loss of generality, I can also assume in the inequality (119), so that I have to deal with three cases: , , . In this step of the proof, I start with the latter case:
Lemma 12.
For any fixed , the function
[TABLE]
is increasing in .
Proof.
The result follows if I prove that for any fixed the function
[TABLE]
is increasing in . This easily follows since
[TABLE]
where the last inequality holds since it is equivalent to the following
[TABLE]
∎
Thus, it is sufficient to show the case .
Step 5.
I start with a useful lemma:
Lemma 13.
Let be three non-negative numbers. Then
[TABLE]
if and only if for every such that we have
[TABLE]
Proof.
Let us suppose . Then
[TABLE]
where I have used the Jensen inequality for the convex function . In order to show that I notice that if or the result is clearly true, otherwise I choose such that . Thus
[TABLE]
∎
In order to simplify the notation, from now on I put . Then, I can use Lemma 13 and the triangle inequality in the case in order to derive a new inequality. Given such that , one gets:
[TABLE]
where the last inequality in (127) is valid if and only if (using again Lemma 13):
[TABLE]
I notice that . Thus, it is enough to prove in the case . Now, I adapt the strategy used in the proof of [11, Lemma 2]: I put , , so that is a real number greater than . Thus, and, denoting by the function
[TABLE]
it follows
[TABLE]
where
[TABLE]
Lemma 14.
The function
[TABLE]
is increasing in with only one zero inside the interval, so that is minimized when or and the inequality holds.
Proof.
Since is continuous in and , it is enough to show that is increasing in and , and
[TABLE]
The limits are easy to compute expanding the function near . When it follows:
[TABLE]
The proof is complete if I show that
[TABLE]
for any and any positive .
I put , so that I have to prove
[TABLE]
for any and . Finally, I put $w=\Big(\frac{2v-1}{v^{2}}\Big)^{\frac{1}{p-1}}\in(0,1)$ and I prove that
[TABLE]
for any and . To prove the last inequality, I notice that and is a decreasing function because
[TABLE]
∎
Step 6.
The strategy is to use again Lemma 13 and the triangle inequality for the case , but I have to derive a different inequality with respect to the previous step.
Lemma 15.
I denote by the function
[TABLE]
Then
Proof.
It is sufficient to prove that is increasing in . This is easy to prove, indeed
[TABLE]
∎
Let be any two numbers in such that . Let us suppose, at first, . Then
[TABLE]
where the first inequality in (133) follows by the triangle inequality for and , while the second inequality follows since , and .
It remains to investigate the case . Let us suppose
[TABLE]
Then
[TABLE]
where in the first inequality I use , in the second I use the hypothesis , in the third I reason as in the second step of the inequality (133) in order to replace with .
Finally, the proof is complete if I prove the inequality . Since the case is trivial, I put , , so that I can rewrite the inequality in the following equivalent way
[TABLE]
Now I use the estimate
[TABLE]
so that it is sufficient to prove that for any and any
[TABLE]
It is easy to see that the last inequality is true at least if . For example, one can bound the left hand side with
[TABLE]
and the right hand side with
[TABLE]
Then, standard computations show that:
[TABLE]
If one needs precise bounds that I have found in [21]. The supremum of the left hand side of is . For the right hand side of one has:
[TABLE]
and again using the results in [21] it is proved that the sharp lower bound for the last expression is .
∎
Acknowledgment. The author thanks Prof. Giuseppe Savaré for many valuable suggestions.
- [1] I. Csiszar, “Eine informationstheoretische ungleichung und ihre anwendung auf den beweis der ergodizitat von markoffschen ketten,” Magyar. Tud. Akad. Mat. Kutato Int. Kozl, vol. 8, pp. 85–108, 1963.
- [2] S. Ali and S. Silvey, “A general class of coefficients of divergence of one distribution from another,” J. Roy. Stat. Soc. Ser. B, vol. 28, pp. 131–142, 1966.
- [3] F. Liese and I. Vajda, “On divergences and informations in statistics and information theory,” IEEE Transactions on Information Theory, vol. 52, pp. 4394–4412, 2006.
- [4] I. Vajda, Theory of statistical inference and information. Springer Netherlands, 1989.
- [5] ——, “$\chi^{\alpha}$-divergence and generalized fisher’s information,” in Transactions of the Sixth Prague Conference on Information Theory, Statistical Decision Function, Random Processes, 1973, pp. 873–886.
- [6] K. Matusita, “Distances and decision rules,” Annals of the Institute of Statistical Mathematics, vol. 16, pp. 305–320, 1964.
- [7] E. Hellinger, “Neue begründung der theorie quadratischer formen von unendlichvielen veränderlichen,” J. Reine Angew. Math, vol. 136, 1909.
- [8] J. Lin, “Divergence measures based on the shannon entropy,” IEEE Transactions on Information Theory, vol. 37, pp. 145–151, 1991.
