Inequalities related to some types of entropies and divergences
Shigeru Furuichi, Nicu\c{s}or Minculete

TL;DR
This paper explores mathematical properties and bounds of various extended entropies and divergences, including Tsallis, biparametrical, and quantum entropies, using inequalities like Hermite-Hadamard.
Contribution
It introduces new bounds and inequalities for extended entropies and divergences, including biparametrical and quantum types, expanding theoretical understanding.
Findings
New bounds for Tsallis quasilinear entropy and divergence.
Bounds for biparametrical extended entropies and divergences.
Inequalities for extended Lin's divergence and characterizations of quantum entropies.
Abstract
The aim of this paper is to discuss new results concerning some kinds of parametric extended entropies and divergences. As a result of our study of the mathematical properties of entropy and divergence, we give new bounds for the Tsallis quasilinear entropy and divergence by applying the Hermite-Hadamard inequality. We also give bounds for the biparametrical extended entropies and divergences which were given in [1]. In addition, we study (r,q)-quasilinear entropies and divergences as an alternative biparametrical extended entropy and divergence, and then we give bounds for them. Finally, we obtain inequalities for an extended Lin's divergence and some characterizations of the Fermi-Dirac entropy and the Bose-Einstein entropy.
Shigeru Furuichi¹ and Nicuşor Minculete²
¹Department of Information Science,
College of Humanities and Sciences, Nihon University,
3-25-40, Sakurajyousui, Setagaya-ku, Tokyo, 156-8550, Japan
²Transilvania University of Braşov, Braşov, 500091, Romania
**Keywords:** Shannon entropy, divergence (relative entropy), Tsallis entropy, Rényi entropy, biparametrical extended entropy and biparametrical extended divergence
**2010 Mathematics Subject Classification:** Primary 46C05, secondary 26D15, 26D10.
1 Introduction
Generalized entropies have been studied by many researchers (we refer the interested reader to [2]). The Rényi [3] and Tsallis [4] entropies are well known as one-parameter generalizations of the Shannon entropy, and have been intensively studied not only in classical statistical physics [5, 6, 7], but also in quantum physics in relation to entanglement [8].
The Tsallis entropy is a natural one-parameter extension of the Shannon entropy, hence it can be applied to known models which describe systems of great interest in atomic physics [9]. However, to the best of our knowledge, the physical relevance of the parameter of the Tsallis entropy has been much debated and has not yet been completely clarified; the parameter is usually regarded as a measure of the non-extensivity of the system under consideration.
One of the authors of the present paper studied the Tsallis entropy and the Tsallis divergence from a mathematical point of view. Firstly, fundamental properties of the Tsallis divergence were discussed in [10]. The uniqueness theorem for the Tsallis entropy and Tsallis divergence was studied in [11]. Following this result, an axiomatic characterization of a biparametrical extended divergence was given in [1]. In [12], information theoretical properties of the Tsallis entropy and some inequalities for conditional and joint Tsallis entropies were derived. In [13], matrix trace inequalities for the Tsallis entropy were studied. And, in [14], the maximum entropy principle for the Tsallis entropy and the minimization of the Fisher information in Tsallis statistics were studied.
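Quite recently, we provided mathematical inequalities for some divergences in [15], since we consider the study of mathematical inequalities important for the development of new entropies. In [16], we defined a further generalized entropy based on the Tsallis and Rényi entropies and studied its mathematical properties by the use of scalar inequalities, in order to develop the theory of entropies. While we applied the Young inequality in [15] and a Jensen type inequality in [16] to obtain inequalities for entropies and divergences, in the present paper we apply the Hermite-Hadamard inequality with the integral relation
$$\int_{1}^{x}\frac{dt}{t^{q}}=\ln_{q}x$$
to obtain some new results, where the $q$-logarithmic function is defined by $\ln_{q}x\equiv\frac{x^{1-q}-1}{1-q}$ ($x>0$, $q\neq 1$). We also study two different kinds of biparametrical extended entropies and divergences in Section 3.
We start from the weighted quasilinear mean (see [17, p.677] for example) for some continuous and strictly monotonic function $\psi:I\to\mathbb{R}$, defined by
$$M_{\psi}(x_{1},\dots,x_{n})\equiv\psi^{-1}\left(\sum_{j=1}^{n}p_{j}\psi(x_{j})\right),$$
where $\sum_{j=1}^{n}p_{j}=1$, $p_{j}>0$, $x_{j}\in I$ for $j=1,\dots,n$. If we take $\psi(x)=x$, then $M_{\psi}(x_{1},\dots,x_{n})$ coincides with the weighted arithmetic mean $\sum_{j=1}^{n}p_{j}x_{j}$. If we take $\psi(x)=\log x$, then $M_{\psi}(x_{1},\dots,x_{n})$ coincides with the weighted geometric mean $\prod_{j=1}^{n}x_{j}^{p_{j}}$. If $\psi(x)=x$ and $x_{j}=\ln_{q}\frac{1}{p_{j}}$, then $M_{\psi}(x_{1},\dots,x_{n})$ is equal to Tsallis entropy [4]:
$$H_{q}(\mathbf{p})\equiv-\sum_{j=1}^{n}p_{j}^{q}\ln_{q}p_{j}=\sum_{j=1}^{n}p_{j}\ln_{q}\frac{1}{p_{j}},\qquad(q\geq 0,\ q\neq 1),$$
where $\mathbf{p}=\{p_{1},\dots,p_{n}\}$ is a probability distribution with $p_{j}>0$ for all $j=1,\dots,n$. Since the $q$-logarithmic function uniformly converges to the usual logarithmic function in the limit $q\to 1$, Tsallis entropy converges to Shannon entropy in the limit $q\to 1$:
$$\lim_{q\to 1}H_{q}(\mathbf{p})=H_{1}(\mathbf{p})\equiv-\sum_{j=1}^{n}p_{j}\log p_{j}.$$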
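The definitions above can be checked numerically. The following is a minimal Python sketch, assuming the standard forms $\ln_{q}x=(x^{1-q}-1)/(1-q)$, $M_{\psi}=\psi^{-1}(\sum_{j}p_{j}\psi(x_{j}))$ and $H_{q}(\mathbf{p})=\sum_{j}p_{j}\ln_{q}(1/p_{j})$; the function names and the sample data are illustrative and not taken from the paper.

```python
import math


def ln_q(x, q):
    """q-logarithm (x**(1-q) - 1)/(1-q); the natural logarithm when q == 1."""
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def quasilinear_mean(xs, ps, psi, psi_inv):
    """Weighted quasilinear mean psi_inv(sum_j p_j * psi(x_j))."""
    return psi_inv(sum(p * psi(x) for p, x in zip(ps, xs)))


def tsallis_entropy(ps, q):
    """H_q(p) = sum_j p_j * ln_q(1/p_j); equals Shannon entropy at q == 1."""
    return sum(p * ln_q(1.0 / p, q) for p in ps)


def identity(t):
    return t


p = [0.5, 0.3, 0.2]   # hypothetical probability distribution
x = [1.0, 4.0, 9.0]   # hypothetical positive numbers

# psi(x) = x gives the weighted arithmetic mean, psi(x) = log x the geometric mean.
arithmetic = quasilinear_mean(x, p, identity, identity)
geometric = quasilinear_mean(x, p, math.log, math.exp)

# With psi(x) = x and x_j = ln_q(1/p_j) the quasilinear mean is the Tsallis entropy.
q = 0.5
h_q = tsallis_entropy(p, q)
h_q_as_mean = quasilinear_mean([ln_q(1.0 / pj, q) for pj in p], p, identity, identity)

# As q -> 1 the Tsallis entropy approaches the Shannon entropy.
shannon = tsallis_entropy(p, 1)
near_shannon = tsallis_entropy(p, 1 + 1e-6)

print(arithmetic, geometric, h_q, shannon, near_shannon)
```

Choosing $q$ close to 1 (here $1+10^{-6}$) is only a numerical stand-in for the limit; it is not a proof of convergence.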
Thus, it is known that Tsallis entropy is one of the generalizations of Shannon entropy. It is also known that Rényi entropy [3] is a generalization of Shannon entropy. Hereafter we use the notations and with and for , as probability distributions. Here, we review a quasilinear entropy [2] as another generalization of Shannon entropy. For a continuous and strictly monotonic function on , the quasilinear entropy is given by
[TABLE]
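If we take $\psi(x)=\log x$ in (4), then $I_{1}^{\log}(\mathbf{p})=H_{1}(\mathbf{p})$. We may also redefine the quasilinear entropy by
$$I_{1}^{\psi}(\mathbf{p})\equiv\log\psi^{-1}\left(\sum_{j=1}^{n}p_{j}\psi\left(\frac{1}{p_{j}}\right)\right)\qquad(5)$$
for a continuous and strictly monotonic function $\psi$ on $(0,\infty)$. If we take $\psi(x)=\log x$ in (5), then we have $I_{1}^{\log}(\mathbf{p})=H_{1}(\mathbf{p})$. The case $\psi(x)=x^{1-q}$ is also useful in practice, since we recapture the Rényi entropy, namely $I_{1}^{x^{1-q}}(\mathbf{p})=R_{q}(\mathbf{p})$, where the Rényi entropy [3] is defined by
$$R_{q}(\mathbf{p})\equiv\frac{1}{1-q}\log\left(\sum_{j=1}^{n}p_{j}^{q}\right).$$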
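As a sanity check on the statement that the quasilinear entropy recaptures the Rényi entropy, here is a small Python sketch. It assumes the redefined form $I_{1}^{\psi}(\mathbf{p})=\log\psi^{-1}\big(\sum_{j}p_{j}\psi(1/p_{j})\big)$ and the standard Rényi entropy $R_{q}(\mathbf{p})=\frac{1}{1-q}\log\sum_{j}p_{j}^{q}$; the helper `power_psi` and the sample distribution are hypothetical.

```python
import math


def quasilinear_entropy(ps, psi, psi_inv):
    """log psi_inv(sum_j p_j * psi(1/p_j)) for psi strictly monotonic on (0, inf)."""
    return math.log(psi_inv(sum(p * psi(1.0 / p) for p in ps)))


def renyi_entropy(ps, q):
    """Renyi entropy log(sum_j p_j**q) / (1 - q), for q != 1."""
    return math.log(sum(p ** q for p in ps)) / (1.0 - q)


def power_psi(q):
    """Return psi(x) = x**(1-q) together with its inverse (hypothetical helper)."""
    return (lambda t: t ** (1.0 - q)), (lambda y: y ** (1.0 / (1.0 - q)))


p = [0.5, 0.3, 0.2]  # hypothetical probability distribution

# psi = log recovers the Shannon entropy.
via_log = quasilinear_entropy(p, math.log, math.exp)
shannon = -sum(pj * math.log(pj) for pj in p)

# psi(x) = x**(1-q) recovers the Renyi entropy, for q below and above 1.
results = {}
for q in (0.7, 2.0):
    psi, psi_inv = power_psi(q)
    results[q] = (quasilinear_entropy(p, psi, psi_inv), renyi_entropy(p, q))

print(via_log, shannon, results)
```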
The generalized entropies involving Tsallis entropies and quasilinear entropies were studied in [19] by the use of a refined Young inequality.
Definition 1.0.1
For a continuous and strictly monotonic function on and two probability distributions and with for all , the quasilinear divergence is defined by
[TABLE]
The quasilinear divergence coincides with the usual divergence if , i.e.,
[TABLE]
We denote by the Rényi divergence [1] defined by
[TABLE]
This is another particular case of the quasilinear divergence, namely, for \psi\big{(}x\big{)}=x^{1-q}, we have
[TABLE]
From [1], we denote by
[TABLE]
the Tsallis divergence, which can be written with the $q$-logarithm as follows:
[TABLE]
The Tsallis divergence converges to the usual divergence (relative entropy, Kullback-Leibler information) as $q\to 1$:
[TABLE]
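The two ways of writing the Tsallis divergence and its limit can be illustrated numerically. The sketch below assumes the standard forms $-\sum_{j}p_{j}\ln_{q}(r_{j}/p_{j})$ and $\big(\sum_{j}p_{j}^{q}r_{j}^{1-q}-1\big)/(q-1)$; the name `r` for the second distribution, the function names, and the data are illustrative assumptions.

```python
import math


def ln_q(x, q):
    """q-logarithm (x**(1-q) - 1)/(1-q); the natural logarithm when q == 1."""
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_divergence(ps, rs, q):
    """q-logarithm form: -sum_j p_j * ln_q(r_j / p_j)."""
    return -sum(p * ln_q(r / p, q) for p, r in zip(ps, rs))


def tsallis_divergence_power_form(ps, rs, q):
    """Power form: (sum_j p_j**q * r_j**(1-q) - 1) / (q - 1), for q != 1."""
    total = sum(p ** q * r ** (1.0 - q) for p, r in zip(ps, rs))
    return (total - 1.0) / (q - 1.0)


def kl_divergence(ps, rs):
    """Usual divergence (relative entropy): sum_j p_j * log(p_j / r_j)."""
    return sum(p * math.log(p / r) for p, r in zip(ps, rs))


p = [0.5, 0.3, 0.2]  # hypothetical distributions with strictly positive entries
r = [0.2, 0.5, 0.3]

# The two forms agree for q != 1, and the divergence is nonnegative.
pairs = {q: (tsallis_divergence(p, r, q), tsallis_divergence_power_form(p, r, q))
         for q in (0.5, 2.0)}

# Near q = 1 the Tsallis divergence is close to the usual divergence.
kl = kl_divergence(p, r)
near_kl = tsallis_divergence(p, r, 1 + 1e-6)

print(pairs, kl, near_kl)
```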
Another divergence (relative entropy) is called -divergence [1], given by
[TABLE]
for . We recall from [1] that
[TABLE]
and
[TABLE]
where . Using the inequality , for every , we deduce that, for we have
[TABLE]
and for we have
[TABLE]
Recall the following definition:
Definition 1.0.2
([16])* For a continuous and strictly monotonic function on and with , the Tsallis quasilinear entropy (-quasilinear entropy) is defined by*
[TABLE]
where is a probability distribution with for all
Notice that if does not depend on the parameter , then we have
[TABLE]
For and with , we define the -exponential function as the inverse function of the -logarithmic function by
[TABLE]
Note that the function is the solution of the differential equation [18], where and
If we take , then we have Furthermore, we have
Proposition 1.0.3
([16])* The Tsallis quasilinear entropy is nonnegative:*
[TABLE]
We note here that the -exponential function gives us the following connection between Rényi entropy and Tsallis entropy [4]:
[TABLE]
We should note here that \exp_{q}H_{q}\big{(}\mathbf{p}\big{)} is always defined, since we have
[TABLE]
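The connection between the Rényi and Tsallis entropies through the $q$-exponential can be checked with a short Python sketch. It assumes the standard forms $\exp_{q}(x)=(1+(1-q)x)^{1/(1-q)}$ and $R_{q}(\mathbf{p})=\log\exp_{q}H_{q}(\mathbf{p})$, and uses the identity $1+(1-q)H_{q}(\mathbf{p})=\sum_{j}p_{j}^{q}>0$ to see that $\exp_{q}H_{q}(\mathbf{p})$ is defined; function names and data are illustrative.

```python
import math


def ln_q(x, q):
    """q-logarithm (x**(1-q) - 1)/(1-q); the natural logarithm when q == 1."""
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(x, q):
    """q-exponential, the inverse of ln_q; requires 1 + (1-q)*x > 0."""
    if q == 1:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        raise ValueError("exp_q is undefined when 1 + (1-q)*x <= 0")
    return base ** (1.0 / (1.0 - q))


def tsallis_entropy(ps, q):
    """H_q(p) = sum_j p_j * ln_q(1/p_j)."""
    return sum(p * ln_q(1.0 / p, q) for p in ps)


def renyi_entropy(ps, q):
    """R_q(p) = log(sum_j p_j**q) / (1 - q), for q != 1."""
    return math.log(sum(p ** q for p in ps)) / (1.0 - q)


p = [0.5, 0.3, 0.2]  # hypothetical probability distribution

checks = {}
for q in (0.5, 2.0):
    h = tsallis_entropy(p, q)
    base = 1.0 + (1.0 - q) * h          # equals sum_j p_j**q, hence positive
    checks[q] = (base, sum(pj ** q for pj in p),
                 math.log(exp_q(h, q)), renyi_entropy(p, q))

round_trip = exp_q(ln_q(2.5, 0.5), 0.5)  # exp_q inverts ln_q
print(checks, round_trip)
```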
Definition 1.0.4
For a continuous and strictly monotonic function on and two probability distributions and with for all , the Tsallis quasilinear divergence is defined by
[TABLE]
We notice that if does not depend on the parameter . We have
[TABLE]
For , the Tsallis quasilinear divergence becomes Tsallis divergence,
[TABLE]
And for , we have
[TABLE]
Proposition 1.0.5
([16])* If is a concave increasing function or a convex decreasing function, then we have nonnegativity of the Tsallis quasilinear divergence:*
[TABLE]
Remark 1.0.6
The following two functions satisfy the sufficient condition in the above proposition:
- (i)
* for with .*
- (ii)
* for with .*
It is notable that the following identity holds:
[TABLE]
2 New bounds for Tsallis quasilinear entropy and divergence
We start with the following lemma.
Lemma 2.0.1
Let be a real number strictly positive.
- (I)
Let .
- (I-i)
If , then
[TABLE]
- (I-ii)
If , then
[TABLE]
- (II)
Let .
- (II-i)
If , then
[TABLE]
- (II-ii)
If , then
[TABLE]
Proof. For the case of , all inequalities (21), (22),(23) and (24) hold trivially. So we assume and in the sequel. We use the following identity [20, Theorem 2.3],
[TABLE]
with , and .
We also use the Hermite-Hadamard inequality for the convex function :
[TABLE]
Since the function on is convex for by , we have
[TABLE]
- (I)
If , then
[TABLE]
which shows the second and third inequalities in (21) and (22).
- (I-i)
If , then we have the first and last inequalities in (21) since and .
- (I-ii)
If , then we have the first and last inequalities in (22) since and .
- (II)
If , then
[TABLE]
which shows the second and third inequalities in (23) and (24).
- (II-i)
If , then we have the first and last inequalities in (23) since and .
- (II-ii)
If , then we have the first and last inequalities in (24) since and .
∎
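The Hermite-Hadamard step used in the proof can be illustrated numerically. The sketch below is not a reconstruction of inequalities (21)-(24); it only assumes that the convex function is $t\mapsto t^{-q}$ with $q>0$ and that $\int_{1}^{x}t^{-q}\,dt=\ln_{q}x$, so that the midpoint and trapezoid values bound $\ln_{q}x$. The function names and the test grid are hypothetical.

```python
import math


def ln_q(x, q):
    """q-logarithm (x**(1-q) - 1)/(1-q); the natural logarithm when q == 1."""
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def hermite_hadamard_bounds(x, q):
    """Return (lower, upper) bounds on ln_q(x) for x > 0 and q > 0.

    Hermite-Hadamard for the convex function f(t) = t**(-q) between 1 and x:
    the midpoint rule underestimates and the trapezoid rule overestimates
    the integral of f over an interval.
    """
    midpoint = (x - 1.0) * ((1.0 + x) / 2.0) ** (-q)
    trapezoid = (x - 1.0) * (1.0 + x ** (-q)) / 2.0
    # For 0 < x < 1 the factor (x - 1) is negative, so the order flips.
    return (midpoint, trapezoid) if x >= 1.0 else (trapezoid, midpoint)


tolerance = 1e-12
all_hold = True
for q in (0.3, 1.0, 2.5):
    for x in (0.1, 0.5, 1.0, 2.0, 10.0):
        lower, upper = hermite_hadamard_bounds(x, q)
        value = ln_q(x, q)
        if not (lower - tolerance <= value <= upper + tolerance):
            all_hold = False

print(all_hold)
```

At $x=1$ both bounds and $\ln_{q}x$ vanish, which matches the remark that this case is trivial.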
To state the following proposition, we recall the quasi-entropy:
[TABLE]
which appeared in [2, Eq.(7.1.1)] as a special case. See also [17, Result 10.15.]. We have the following results as a simple consequence of Lemma 2.0.1.
Proposition 2.0.2
Let be a real number strictly positive and a probability distribution with for all . If , then we have
[TABLE]
If , then we have
[TABLE]
Theorem 2.0.3
Let be a continuous and strictly monotonic function on , with , and let be a probability distribution with for all . If , then we have
[TABLE]
and if , then we have
[TABLE]
where means .
Proof. If the function is strictly increasing, then the function is strictly increasing. Since for every , we deduce Therefore, we have . It implies that . Similarly it is proven for a strictly decreasing function .
If , then from Lemma 2.0.1, for , we have
[TABLE]
which imply inequalities (30). Similarly we deduce the reversed inequalities (31).
∎
Corollary 2.0.4
Let be a continuous and strictly monotonic function on , with and let be a probability distribution with for all . If , then we have
[TABLE]
and if , then we have
[TABLE]
Proof. For from Theorem 2.0.3 it follows that we have the inequalities of the statement, taking into account of and
∎
Corollary 2.0.5
Under the same assumptions as in Corollary 2.0.4, for we have
[TABLE]
and if , then we have
[TABLE]
Proof. For \psi\big{(}x\big{)}=x^{1-q} from Theorem 2.0.3 it follows that we have the relations of the statement.
∎
Next, we obtain an estimation for the Tsallis quasilinear divergence.
Theorem 2.0.6
Let be a concave increasing function or a convex decreasing function, let and be two probability distributions with for all . If , then we have
[TABLE]
If , then we have
[TABLE]
Proof. We first assume that $\psi$ is a concave increasing function. From Jensen’s inequality, we have
[TABLE]
which is equivalent to
[TABLE]
In this case, is also increasing, so we have
[TABLE]
If , then from Lemma 2.0.1 we have
[TABLE]
which is equivalent to the inequalities in (36).
In the case that $\psi$ is a convex decreasing function, we can similarly prove the inequalities in (36). In a similar way, we prove the inequalities in (37) in the remaining case.
∎
3 Biparametrical extended entropies and divergences
In this section, we consider two different kinds of biparametrical extended entropies and divergences. First, we give bounds on those defined in [21, 1] in Subsection 3.1. Second, we give bounds for the $(r,q)$-quasilinear entropy and divergence in Subsection 3.2.
3.1 Biparametrical extended entropy and divergence given in [1, 21]
We recall that Wada and Suyari in [21] have axiomatically defined the biparametrical extended entropy by
[TABLE]
for such that or , where is a probability distribution. Since the equality holds, multiplying by , we deduce the relation
[TABLE]
where and . As a consequence,
[TABLE]
It is easy to see that
[TABLE]
For and from (39) we have
[TABLE]
Consequently
[TABLE]
In [1], the relation between the biparametrical extended entropy and the Tsallis entropy, which is expressed by a convex combination, was given by:
[TABLE]
Proposition 3.1.1
Let be two real numbers strictly positive and a probability distribution with for all . If and then we have
[TABLE]
and if and , then we have
[TABLE]
Proof. If and , then using inequalities (28) and (29), we have
[TABLE]
and
[TABLE]
Using these inequalities with the identity (40), we obtain the inequalities (41) and (40). ∎
Remark 3.1.2
The inequalities from Proposition (41) can be similarly proven by swapping and in (40), (28) and (29), with .
In [1], we established that the biparametrical extended divergence was axiomatically given by
[TABLE]
for two real numbers and such that , where and are two probability distributions. The biparametrical extended divergence is a generalization of the Tsallis divergence, because for in (43), we deduce the following identity:
[TABLE]
Moreover, we have
[TABLE]
and a convex combination between the biparametrical extended divergence and the Tsallis divergence expressed by
[TABLE]
This divergence is nonnegative, i.e. , and has many other properties: symmetry, joint convexity, monotonicity.
Next, we define the quasi-divergence:
[TABLE]
Notice that the quasi-divergence is a generalization of the divergence, since for we obtain .
Lemma 3.1.3
Let be a real number strictly positive and and two probability distributions with for all . If then we have
[TABLE]
and if , then we have
[TABLE]
Proof. If , then from the proof of Lemma 2.0.1, we have the following inequality:
[TABLE]
for all . Consequently for , we obtain
[TABLE]
Multiplying by and passing to the sum, we deduce
[TABLE]
which implies the statement. Similarly we prove for the case .
∎
Using the identity (44) with the inequalities (45) and (46), we obtain the following results.
Proposition 3.1.4
Let be strictly positive real numbers and let and be two probability distributions with for all . If and , then we have
[TABLE]
and if and , then we have
[TABLE]
3.2 A biparametrical extended entropy and divergence defined by the $(r,q)$-logarithmic function
We firstly give the notation. The biparametrical extended logarithmic function (see e.g. [19]) for is defined by
[TABLE]
which uniformly converges to the usual logarithmic function as and . This is a decreasing function with respect to the indices. Correspondingly, the inverse function of is denoted by
[TABLE]
We start with the Tsallis $(r,q)$-quasilinear entropies and Tsallis $(r,q)$-quasilinear divergences as they were defined in [15].
Definition 3.2.1
Let be a continuous and strictly monotonic function on and with . The -quasilinear entropy is defined by
[TABLE]
For we have the following entropic functional:
[TABLE]
This also gives rise to another case of interest:
[TABLE]
which in particular coincides with Arimoto entropy.
Definition 3.2.2
For a continuous and strictly monotonic function on and with and two probability distributions and with for all , the -quasilinear divergence is defined by
[TABLE]
For we have the following relation:
[TABLE]
By a direct calculation we have
[TABLE]
and
[TABLE]
Thus
- (i)
If and , then .
- (ii)
If and , then .
Therefore, we obtain the non-negativity of the biparametrical divergence
[TABLE]
for and , by using Jensen’s inequality.
By analogy with the entropy computation, we obtain the following Arimoto type divergence:
[TABLE]
The non-negativity in the above inequality follows from the fact that for and for each , and its reverse holds for and for each .
Similarly, we apply Lemma 2.0.1 to the biparametric case. Above we defined the $(r,q)$-logarithmic function for $x>0$ by
[TABLE]
Lemma 3.2.3
Let be two strictly positive real numbers. If , , then we have
[TABLE]
If , , then we have
[TABLE]
If , , then we have
[TABLE]
If , , then we have
[TABLE]
Proof. For and we have . Using Lemma 2.0.1 for and we have
[TABLE]
which implies inequality (3.2.3). Similarly, we show the other cases.
∎
Theorem 3.2.4
Let be two strictly positive real numbers. Let be a continuous and strictly monotonic function on and a probability distribution with for all . If , then we have
[TABLE]
and if , then we have
[TABLE]
Proof. If and , then, using inequality (3.2.3) we deduce the statement. Similarly we deduce the reversed inequalities using inequality (3.2.3).
∎
Theorem 3.2.5
Let be two strictly positive real numbers. Let be a concave increasing function or a convex decreasing function, and be two probability distributions with for all . If , then
[TABLE]
If , then
[TABLE]
Proof. From the proof of Theorem 2.0.6, we deduce
[TABLE]
If , then from inequality (3.2.3) we obtain the inequalities (57). For the case that is a convex decreasing function, we can similarly prove the inequality. Similarly we prove the case .
∎
Remark 3.2.6
It is well known that for every . For we have, when , the inequality
[TABLE]
which implies that Multiplying by and passing to the sum from to , we obtain for ,
[TABLE]
In an analogous way, for , we deduce
[TABLE]
Using the above inequality, for , we have Multiplying by and passing to the sum from to , we obtain for ,
[TABLE]
and for we have
[TABLE]
4 Some inequalities for the extended Lin’s divergence
The Tsallis divergence (relative entropy) is rewritten by
[TABLE]
The Jeffreys divergence is defined by
[TABLE]
and the Jensen-Shannon divergence is defined as
[TABLE]
(see e.g. [19]).
In [22, Lemma 7] we proved the general case, with a parameter, of the following inequality for hypodivergences.
[TABLE]
We can prove the following inequality with one parameter .
Theorem 4.0.1
Let and let , with for two probability distributions and . Then for we have
[TABLE]
and for we deduce the inverse inequality.
Proof. Using the arithmetic-geometric mean inequality, we have
[TABLE]
Similarly, using the geometric-harmonic mean inequality, we have
[TABLE]
which implies the second inequality.
∎
Note that the first inequality in Theorem 4.0.1 recovers the inequality (64) when .
Lin’s divergence [23] is given by
[TABLE]
From inequality (64) by passing to the limit when , we have the inequality
[TABLE]
Similarly, we have
[TABLE]
By summing the above relations, we obtain an inequality between the Jeffreys divergence and Jensen-Shannon divergence:
[TABLE]
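The relation between the Jeffreys and Jensen-Shannon divergences can be illustrated with a small Python sketch. It assumes the standard normalizations $J(\mathbf{p}|\mathbf{r})=D(\mathbf{p}|\mathbf{r})+D(\mathbf{r}|\mathbf{p})$, Lin's divergence $K(\mathbf{p}|\mathbf{r})=\sum_{j}p_{j}\log\frac{2p_{j}}{p_{j}+r_{j}}$ and $JS(\mathbf{p}|\mathbf{r})=\frac{1}{2}\big(K(\mathbf{p}|\mathbf{r})+K(\mathbf{r}|\mathbf{p})\big)$, under which $K\leq\frac{1}{2}D$ and $JS\leq\frac{1}{4}J$ hold; the constants in the paper may differ if its normalization does. Names and data are illustrative.

```python
import math


def kl_divergence(ps, rs):
    """Usual divergence sum_j p_j * log(p_j / r_j)."""
    return sum(p * math.log(p / r) for p, r in zip(ps, rs))


def jeffreys_divergence(ps, rs):
    """Symmetrized divergence D(p|r) + D(r|p)."""
    return kl_divergence(ps, rs) + kl_divergence(rs, ps)


def lin_divergence(ps, rs):
    """Lin's divergence: D(p | (p + r)/2) = sum_j p_j * log(2 p_j / (p_j + r_j))."""
    mixture = [(p + r) / 2.0 for p, r in zip(ps, rs)]
    return kl_divergence(ps, mixture)


def jensen_shannon_divergence(ps, rs):
    """Average of the two Lin divergences against the midpoint distribution."""
    return 0.5 * (lin_divergence(ps, rs) + lin_divergence(rs, ps))


p = [0.5, 0.3, 0.2]  # hypothetical distributions with strictly positive entries
r = [0.2, 0.5, 0.3]

k_pr = lin_divergence(p, r)
d_pr = kl_divergence(p, r)
js = jensen_shannon_divergence(p, r)
jeffreys = jeffreys_divergence(p, r)

print(k_pr, d_pr, js, jeffreys)
```

Both checked inequalities follow from the convexity of the usual divergence in its second argument, which is why they hold for any pair of strictly positive distributions and not only for the sample data.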
Proposition 4.0.2
For two probability distributions and , we have
[TABLE]
Proof. Using Lin’s divergence and the usual divergence, we have
[TABLE]
Consider the function given by . The function is concave, because . Therefore, applying Jensen’s inequality, we obtain
[TABLE]
Consequently, the inequality of the statement follows. ∎
5 Some characterizations of the Fermi-Dirac entropy and the Bose-Einstein entropy
In [24], power-law physical phenomena were studied from the viewpoint of Tsallis statistics using the Fermi-Dirac-Tsallis entropy given by
[TABLE]
Similarly, in [15] the Bose-Einstein-Tsallis entropy is defined as
[TABLE]
These entropies are one-parameter extensions of the Fermi-Dirac entropy and the Bose-Einstein entropy defined by
[TABLE]
and
[TABLE]
respectively.
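To make the four entropies concrete, here is a Python sketch under assumed standard forms: $I^{FD}(\mathbf{p})=-\sum_{j}p_{j}\log p_{j}-\sum_{j}(1-p_{j})\log(1-p_{j})$, $I^{BE}(\mathbf{p})=-\sum_{j}p_{j}\log p_{j}+\sum_{j}(1+p_{j})\log(1+p_{j})$, and one-parameter versions obtained by replacing $\log\frac{1}{x}$ with $\ln_{q}\frac{1}{x}$. These forms, the function names, and the data are assumptions made for illustration. The sketch also checks the elementary comparison $I^{FD}\leq I^{BE}$, which follows from $(1-x)\log(1-x)+(1+x)\log(1+x)\geq 0$ on $[0,1)$; it is not claimed to be the exact statement of the theorem below.

```python
import math


def ln_q(x, q):
    """q-logarithm (x**(1-q) - 1)/(1-q); the natural logarithm when q == 1."""
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def fermi_dirac_tsallis(ps, q):
    """Assumed form: sum_j p_j ln_q(1/p_j) + sum_j (1-p_j) ln_q(1/(1-p_j))."""
    return sum(p * ln_q(1.0 / p, q) + (1.0 - p) * ln_q(1.0 / (1.0 - p), q)
               for p in ps)


def bose_einstein_tsallis(ps, q):
    """Assumed form: sum_j p_j ln_q(1/p_j) - sum_j (1+p_j) ln_q(1/(1+p_j))."""
    return sum(p * ln_q(1.0 / p, q) - (1.0 + p) * ln_q(1.0 / (1.0 + p), q)
               for p in ps)


def fermi_dirac(ps):
    """Fermi-Dirac entropy; requires 0 < p_j < 1 for all j."""
    return -sum(p * math.log(p) + (1.0 - p) * math.log(1.0 - p) for p in ps)


def bose_einstein(ps):
    """Bose-Einstein entropy; requires p_j > 0 for all j."""
    return sum(-p * math.log(p) + (1.0 + p) * math.log(1.0 + p) for p in ps)


p = [0.5, 0.3, 0.2]  # hypothetical distribution with 0 < p_j < 1

fd, be = fermi_dirac(p), bose_einstein(p)
fd_near, be_near = fermi_dirac_tsallis(p, 1 + 1e-6), bose_einstein_tsallis(p, 1 + 1e-6)

print(fd, be, fd_near, be_near)
```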
Theorem 5.0.1
Let be a probability distribution satisfying for all . Then we have
[TABLE]
Proof: Mention that . We consider the function , defined by Since , we deduce that the function is increasing. So, which means that Therefore, we obtain the inequality of the statement. ∎
It may be of interest for the readers to give the following alternative proof of Theorem 5.0.1. Alternative proof of Theorem 5.0.1: For a probability distribution with , for all , we deduce two probability distributions , with , for all , and , with , for all . It is easy to see that
[TABLE]
and
[TABLE]
where
Taking the difference between the two above relations, we have the following relation:
[TABLE]
But, for a probability distribution we generally have , so we deduce and . Therefore, we find that
[TABLE]
The last inequality can be proven by . This first inequality is equivalent to , where the sequence is given by and it is increasing for ∎
Similarly we can prove the following result.
Theorem 5.0.2
For a probability distribution with for all and , we have
[TABLE]
Proof. The special case was proven in Theorem 5.0.1. In the sequel we assume . Since
[TABLE]
we consider the function , defined by We find . For and , we have which proves that . Therefore, the function is decreasing, so which means that
[TABLE]
But, since , we obtain , which shows that the inequality of the statement is true. In an analogous way, for , we deduce the inequality of the statement. ∎
Remark 5.0.3
The Fermi-Dirac-Tsallis entropy converges to the Fermi-Dirac entropy, and the Bose-Einstein-Tsallis entropy converges to the Bose-Einstein entropy, when we take the limit $q\to 1$. Therefore, by passing to the limit $q\to 1$ in inequality (71), we obtain inequality (70).
6 Conclusion
In this paper, we have obtained some mathematical inequalities for some entropies and divergences. In Section 3, we studied some mathematical properties of the biparametrical extended entropies given in (50) and in (38). In the same section, we also studied the biparametrical extended divergences given in (52) and in (43). It is natural to be interested in the relations between them. We easily find that
[TABLE]
and
[TABLE]
Since it is quite difficult to find the relation for arbitrary values of the two parameters, we leave this problem for future study.
Acknowledgements
The authors would like to thank the referees for their careful and insightful comments to improve our manuscript. The author (S.F.) was partially supported by JSPS KAKENHI Grant Number 16K05257.
References
- [1] S. Furuichi, An axiomatic characterization of a two-parameter extended relative entropy, J. Math. Phys., 51 (2010), 123302.
- [2] J. Aczél and Z. Daróczy, On measures of information and their characterizations, Academic Press, San Diego, 1975.
- [3] A. Rényi, On measures of entropy and information, In: Proc. 4th Berkeley Symp., Mathematical and Statistical Probability, 1 (1961), 547–561.
- [4] C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics, J. Stat. Phys., 52 (1988), 479–487.
- [5] C. Tsallis, A.K. Rajagopal, A.R. Plastino, I. Andricioaei, J.E. Straub, S. Abe, J. Naudts, M. Czachor, J. Klao, S. Kobe, Y. Okamoto and U.H.E. Hansmann, Nonextensive statistical mechanics and its applications, in S. Abe, Y. Okamoto (eds.), Springer, Berlin, 2001.
- [6] C. Tsallis, Introduction to nonextensive statistical mechanics: Approaching a complex world, Springer, Berlin, 2009.
- [7] C. Tsallis, Entropy. In: Encyclopedia of complexity and systems science, Springer, Berlin, 2009.
- [8] L.-H. Sun, G.-X. Li and Z. Ficek, Continuous variables approach to entanglement creation and processing, Appl. Math. Inf. Sci., 4 (2010), 315–339.
