A two-parameter entropy and its fundamental properties
Supriyo Dutta, Shigeru Furuichi, Partha Guha

TL;DR
This paper introduces a new two-parameter generalized entropy that encompasses Tsallis and Shannon entropies, exploring its fundamental properties and comparing its information-theoretic and geometric characteristics.
Contribution
It proposes a novel two-parameter entropy and analyzes its key properties, extending the understanding of generalized entropies beyond existing models.
Findings
The new entropy reduces to Tsallis and Shannon entropies at specific parameters.
It satisfies sub-additivity, strong sub-additivity, joint convexity, and information monotonicity.
The entropy exhibits distinct information-geometric properties compared to classical entropies.
Abstract
This article proposes a new two-parameter generalized entropy, which can be reduced to the Tsallis and the Shannon entropy for specific values of its parameters. We develop a number of information-theoretic properties of this generalized entropy and divergence, for instance, the sub-additive property, strong sub-additive property, joint convexity, and information monotonicity. This article presents an expository investigation of the information-theoretic and information-geometric characteristics of the new generalized entropy and compares them with the properties of the Tsallis and the Shannon entropy.
Elements of Generalized Tsallis Relative Entropy in Classical Information Theory
Supriyo Dutta (1), Shigeru Furuichi (2), Partha Guha (1)
1 Department of Theoretical Science
S. N. Bose National Centre for Theoretical Sciences
Block - JD, Sector - III, Salt Lake City, Kolkata
West Bengal, India - 700 106
2 Department of Information Science,
College of Humanities and Sciences, Nihon University,
3-25-40, Sakurajyousui, Setagaya-Ku, Tokyo, 156-8550, Japan
Abstract
This article proposes a modification of the Sharma-Mittal entropy and distinguishes it as the generalised Tsallis entropy. This modification enables the Sharma-Mittal entropy to be used in classical information theory. We derive a product rule
[TABLE]
for the two-parameter deformed logarithm. This rule allows us to derive a number of information-theoretic properties of the generalized Tsallis entropy and related entropies, including the sub-additive property, strong sub-additive property, joint convexity, and information monotonicity. This article is an expository investigation of the information-theoretic and information-geometric characteristics of the generalized Tsallis entropy.
1 Introduction
Information geometry [1] has been developed in statistics as a geometric way to analyse dependencies of different orders between random variables. A distinctive feature of information geometry is its dualistic structure of affine connections. In this article, we study the information geometry of a two-parameter generalization of the Tsallis entropy, which also reduces to the Gibbs-Shannon entropy [2, 3, 4, 5]. The Tsallis entropy, on which $q$-thermostatistics is based, generalizes Boltzmann-Gibbs thermostatistics [6]. The properties of the Tsallis entropy and the Tsallis relative entropies have been investigated in detail in the literature [7, 8, 9]. In this article, we explain the information-theoretic and information-geometric structures associated with the generalised Tsallis entropy.
In the literature, a number of variations of the Sharma-Mittal entropy [10, 11] are available. Although it is utilised in thermodynamics, biology, and computer science, the information-theoretic counterpart of this entropy has not been investigated to date. Here, we propose a modification of the definition of the Sharma-Mittal entropy available in [12], which opens up the scope for information-theoretic investigations. The Sharma-Mittal entropy reduces to both the Renyi and the Tsallis entropy in specific limits of its parameters. After our modification, it loses this feature and reduces only to the Tsallis entropy. Therefore, we call it a two-parameter generalization of the Tsallis entropy, or the generalised Tsallis entropy. Similarly, we define the generalised Tsallis relative entropy, or generalised Tsallis divergence. The significant attributes of these two quantities derived in this article are listed below:
1. Pseudo-additivity of generalised Tsallis entropy:
[TABLE]
2. Sub-additive property of generalized Tsallis entropy:
[TABLE]
3. Pseudo-additivity of generalised Tsallis relative entropy:
[TABLE]
4. Joint convexity of generalised Tsallis relative entropy:
[TABLE]
5. Information monotonicity of generalised Tsallis relative entropy:
[TABLE]
This article consists of six sections. The next section redefines the Sharma-Mittal logarithm and establishes a number of its characteristics required for the calculations. Section 3 discusses the generalised Tsallis entropy, including the chain rule for the joint generalised Tsallis entropy. Section 4 is dedicated to the generalised Tsallis relative entropy and its properties. We discuss the information-geometric aspects of the relative entropy in Section 5. We conclude the article with a number of open problems.
2 Preliminary properties of a two parameter deformed logarithm
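A function $f$ is convex [13] if $f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y)$ for all $x, y$ in its domain and all $\lambda \in [0, 1]$. More generally,
$$f\Big(\sum_{i} \lambda_i x_i\Big) \leq \sum_{i} \lambda_i f(x_i), \qquad \lambda_i \geq 0, \quad \sum_{i} \lambda_i = 1.$$
If $f$ is twice differentiable, then $f$ is convex if and only if $f''(x) \geq 0$. The function $f$ is concave if $-f$ is convex. Hence, a function $f$ is concave if
$$f\Big(\sum_{i} \lambda_i x_i\Big) \geq \sum_{i} \lambda_i f(x_i), \qquad \lambda_i \geq 0, \quad \sum_{i} \lambda_i = 1.$$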
Given the probability distribution of a random variable, the Sharma-Mittal entropy [12] of the random variable is defined by
[TABLE]
and , such that,
[TABLE]
Now, recall a few properties of the natural logarithm that are useful in information theory. The function $-\ln x > 0$ for all $x \in (0, 1)$. Also, for all $x \in (0, 1)$ we have $\frac{d}{dx}(-\ln x) = -\frac{1}{x} < 0$, that is, $-\ln x$ is monotonically decreasing. Moreover, $\frac{d^2}{dx^2}(-\ln x) = \frac{1}{x^2} > 0$, which indicates that $-\ln x$ is a convex function for all $x \in (0, 1)$. We now restrict our discussion of the deformed logarithm to the range of its two parameters in which the corresponding deformed function fulfils these characteristics.
Theorem 1.
For , and the function is positive, convex, and monotonically decreasing for all .
Proof.
Note that, for all and for all . If we have and . Differentiating we find , that is is a monotonically decreasing function. Also, . Hence, is a convex function.
For all , we have for all . Therefore, for all and . Define with . Differentiating we get , for all , which indicates is a monotonically decreasing function for all . Again, the double derivative . The assumption of the theorem suggests that . Also, and for all . Combining we get which is sufficient for convexity.
It can be proved that if two functions are positive, convex, and both monotonically non-decreasing (or both non-increasing) on an interval, then their product is convex [13]. Hence, is convex for , and .
Moreover, and for all and both are monotonically decreasing. Therefore, their product is monotonically decreasing for , and . ∎
The convexity of requires . Hence, we redefine as follows
Definition 1.
[TABLE]
with and .
Note that a different notation is used for the redefined deformed logarithm. The importance of restricting the range of will be clarified later. Clearly, and is undefined.
It can be proved that the product rule of the two-parameter deformed logarithm given in equation (8) is
[TABLE]
where [4]. Now, for a joint random variable we have . Equation (10) then implies that the joint Sharma-Mittal entropy is
[TABLE]
These expressions have no known counterpart in Shannon or Tsallis information theory, which prevents us from deriving a chain rule for the Sharma-Mittal entropy. This is a theoretical barrier to the use of the Sharma-Mittal entropy in classical information theory [14]. In contrast, the deformed logarithm of Definition 1 follows the product rule stated in the next lemma.
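The product rule attributed to [4] can be checked numerically. The sketch below assumes the Sharma-Taneja-Mittal form of the deformed logarithm used by Scarfone, $\ln_{\{\kappa,r\}}(x) = x^{r}\,\frac{x^{\kappa}-x^{-\kappa}}{2\kappa}$, with the companion function $u_{\{\kappa,r\}}(x) = x^{r}\,\frac{x^{\kappa}+x^{-\kappa}}{2}$, for which $\ln_{\{\kappa,r\}}(xy) = u_{\{\kappa,r\}}(x)\ln_{\{\kappa,r\}}(y) + \ln_{\{\kappa,r\}}(x)\,u_{\{\kappa,r\}}(y)$. These forms are an assumption standing in for equations (8) and (10), whose exact expressions are not reproduced here; the names `ln_kr`, `u_kr`, and `product_rule_residual` are illustrative.

```python
import math


def ln_kr(x: float, k: float, r: float) -> float:
    """Assumed Sharma-Taneja-Mittal logarithm: x^r * (x^k - x^(-k)) / (2k), with k != 0."""
    return x ** r * (x ** k - x ** (-k)) / (2.0 * k)


def u_kr(x: float, k: float, r: float) -> float:
    """Assumed companion function of the product rule: x^r * (x^k + x^(-k)) / 2."""
    return x ** r * (x ** k + x ** (-k)) / 2.0


def product_rule_residual(x: float, y: float, k: float, r: float) -> float:
    """ln_kr(xy) - [u_kr(x) ln_kr(y) + ln_kr(x) u_kr(y)]; ~0 when the rule holds."""
    lhs = ln_kr(x * y, k, r)
    rhs = u_kr(x, k, r) * ln_kr(y, k, r) + ln_kr(x, k, r) * u_kr(y, k, r)
    return lhs - rhs


if __name__ == "__main__":
    for x, y in [(0.2, 0.7), (0.5, 0.5), (0.9, 0.1)]:
        res = product_rule_residual(x, y, k=0.3, r=0.1)
        print(f"x={x}, y={y}: residual = {res:.2e}")
    # As k -> 0 with r = 0, the deformed logarithm approaches ln x.
    print(ln_kr(0.4, 1e-6, 0.0), math.log(0.4))
```

Residuals at the level of floating-point rounding are the expected outcome. Note that the rule mixes $u$ and $\ln$ factors instead of producing a sum plus a product of logarithms, which is why it does not translate into a Tsallis-type chain rule.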
Lemma 1.
Given any two real numbers we have
[TABLE]
Proof.
[TABLE]
This leads to the result. ∎
The product rule of two parameter deformed logarithm can also be expressed as follows:
Lemma 2.
Given any two real numbers we have
[TABLE]
Proof.
Note that,
[TABLE]
Hence, we find the result. ∎
Now, we consider a few properties of the two parameter deformed logarithm which will be useful later.
Corollary 1.
For any non-zero real number we have , or .
Proof.
Putting in the lemma 2 we find
[TABLE]
which leads to the result. ∎
Corollary 2.
For any two non-zero real numbers and we have
[TABLE]
Proof.
Considering and in lemma 2 we find that
[TABLE]
Now, putting we have the result. ∎
Lemma 3.
For any non-zero real number and any real number we have .
Proof.
[TABLE]
∎
Lemma 4.
The function is a convex function for , and .
Proof.
. Therefore . Now if , that is . ∎
Lemma 5.
For any real number and for the function is a convex function.
Proof.
Simplifying, we get . Therefore, and . For we have . Hence, for all , which indicates that is a convex function. ∎
Now we state the generalized log sum inequality for two parameter deformed logarithm:
Theorem 2.
Let and be non-negative numbers. In addition, and . Then,
[TABLE]
Proof.
[TABLE]
Now lemma 5 suggests that is a convex function. Therefore,
[TABLE]
which completes the proof. ∎
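The one-parameter Tsallis analogue of this log-sum inequality is easy to test numerically. The sketch below is an illustration, not a restatement of Theorem 2: it assumes the $q$-logarithm $\ln_q x = (x^{1-q}-1)/(1-q)$ and the inequality $\sum_i a_i \ln_q(b_i/a_i) \leq a \ln_q(b/a)$ with $a = \sum_i a_i$ and $b = \sum_i b_i$, which holds for $q > 0$ because $t \mapsto (t - t^{q})/(1-q)$ is convex there. At $q = 1$ it reduces to the classical log-sum inequality. The helper names are hypothetical.

```python
import math
import random


def ln_q(x: float, q: float) -> float:
    """Tsallis q-logarithm (x^(1-q) - 1) / (1 - q); the natural logarithm at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def log_sum_gap(a, b, q: float) -> float:
    """a ln_q(b/a) - sum_i a_i ln_q(b_i/a_i) for positive sequences a and b.

    A non-negative value means the q-deformed log-sum inequality holds."""
    a_tot, b_tot = sum(a), sum(b)
    lhs = sum(ai * ln_q(bi / ai, q) for ai, bi in zip(a, b))
    return a_tot * ln_q(b_tot / a_tot, q) - lhs


if __name__ == "__main__":
    rng = random.Random(0)
    for q in (0.5, 1.0, 2.0):
        worst = min(
            log_sum_gap([rng.uniform(0.01, 1) for _ in range(5)],
                        [rng.uniform(0.01, 1) for _ in range(5)], q)
            for _ in range(1000)
        )
        print(f"q={q}: smallest gap over 1000 trials = {worst:.3e}")
```

Equality holds when $b_i/a_i$ is the same for every $i$, for example $a = (0.2, 0.3)$ and $b = (0.4, 0.6)$.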
3 Modified Sharma-Mittal entropy
To proceed further, recall the definitions of the two-parameter deformed logarithm and the Sharma-Mittal entropy in equation (8). Equation (11) shows that the two-parameter deformed logarithm does not yield a chain rule for the Sharma-Mittal entropy. Therefore, we need to modify the definition of the Sharma-Mittal entropy given in equation (8). A two-parameter deformed entropy investigated in [15] is defined by
[TABLE]
Our proposal for a two-parameter deformed entropy is given in Definition 2. Definition 2, together with the function , enables us to establish the chain rule, sub-additivity, and related properties for the two-parameter entropy. This is a significant advantage of Definition 2 over the definition given in equation (19).
Definition 2.
We define the generalized Tsallis entropy for a random variable with probability distribution as
[TABLE]
where with , and , mentioned in definition 1.
Assuming and in equation (19), we recover the expression in definition 2.
The function is undefined for but . Hence, by convention, if for some then we have . Theorem 1 implies that , that is, for any non-zero probability . Therefore, given any random variable we have .
This modification of the Sharma-Mittal entropy is consistent with the Tsallis entropy [12]. Putting , in the expression we find
[TABLE]
Now, for in the expression of we find that
[TABLE]
which is the Tsallis logarithm. Putting in definition 2, we find that
[TABLE]
which is the Tsallis entropy [6, 8].
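A minimal sketch of the Tsallis limit, assuming the standard forms $\ln_q x = (x^{1-q}-1)/(1-q)$ and $S_q(P) = -\sum_i p_i^{q} \ln_q p_i$ from [8]; the two-parameter entropy of Definition 2 itself is not reproduced, since its exact expression is not shown here. It also shows numerically that $S_q$ approaches the Shannon entropy (in nats) as $q \to 1$. Function names are illustrative.

```python
import math


def ln_q(x: float, q: float) -> float:
    """Tsallis q-logarithm (x^(1-q) - 1) / (1 - q); the natural logarithm at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_entropy(p, q: float) -> float:
    """S_q(P) = -sum_i p_i^q ln_q(p_i); zero-probability outcomes contribute 0."""
    return -sum(pi ** q * ln_q(pi, q) for pi in p if pi > 0)


def shannon_entropy(p) -> float:
    """Shannon entropy in nats, with the same 0 log 0 = 0 convention."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]
    for q in (0.5, 0.999, 1.0, 1.001, 2.0):
        print(f"q={q}: S_q = {tsallis_entropy(p, q):.6f}")
    print(f"Shannon: {shannon_entropy(p):.6f}")
```

For $q = 2$ the expression collapses to $1 - \sum_i p_i^2$, which gives $0.625$ for the distribution above.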
There are a number of similar expressions generating the Tsallis entropy. For instance, consider . This expression also produces the Tsallis entropy for . Another expression with a similar property is . In fact, there are many others. Our motivation for adopting the expression in Definition 2 as the generalized Tsallis entropy comes from Lemma 1, the product rule for the deformed logarithm, which establishes a chain-rule-like identity for the generalized Tsallis entropy. The other expressions do not offer this possibility. Details of the chain rule for the generalized Tsallis entropy are discussed below.
Definition 3.
(Joint entropy) Let be a probability distribution of the joint random variable . The generalized Tsallis joint entropy of the joint random variable is defined by
[TABLE]
Similarly, for three random variables , and the joint entropy will be given by
[TABLE]
Recall that the probability distribution of the conditional random variable is given by . In short, . Now, we define the conditional entropy as follows:
Definition 4.
(Conditional entropy) Given a conditional random variable we define the generalized Tsallis conditional entropy as
[TABLE]
As , we can alternatively write down
[TABLE]
This definition can be generalized for three or more random variables. Given three random variables and we have
[TABLE]
In a similar fashion, we can define
[TABLE]
Similarly, the definition of conditional entropy can be extended for any number of random variables for defining .
The definitions of the generalised Tsallis joint entropy and conditional entropy are also consistent with the definitions of the Tsallis joint and conditional entropy [8], respectively. Note that,
[TABLE]
which is the Tsallis joint entropy. In addition,
[TABLE]
which is the Tsallis conditional entropy.
Lemma 6.
Given two independent random variables and the generalised Tsallis conditional entropy can be expressed as
[TABLE]
Proof.
From the definition of conditional entropy we find that
[TABLE]
Also the definition of generalized Tsallis entropy mentioned in definition 2 suggests that that is . Putting it in the above equation we construct
[TABLE]
For independent random variables and we have . Therefore,
[TABLE]
∎
Putting in this result we find,
[TABLE]
for any two independent random variables and . Now we have the following corollaries.
Given any two independent random variables and , Lemma 6 implies that . In fact, this inequality holds for any two random variables and , as we show in the lemma below.
Lemma 7.
Given any two random variables and we have .
Proof.
Consider a function where and . Simplifying, . As and , we have . Differentiating we get and for all . Therefore, is a convex function that is is a concave function.
As , we have . Also, indicates,
[TABLE]
for . Combining we get
[TABLE]
Now, applying the concavity property of we find
[TABLE]
Expanding in the above equation,
[TABLE]
Summing over we find
[TABLE]
Combining this equation with equation (35) we find
[TABLE]
The first and last terms of the above inequality give . ∎
Theorem 3.
(Chain rule for generalised Tsallis entropy) Given any two random variables and the generalised Tsallis joint entropy can be expressed as
[TABLE]
Proof.
The probability of joint random variables can be expressed as . The product rule of mentioned in lemma 1 indicates that
[TABLE]
Applying we find that
[TABLE]
The definition 2 of generalized Tsallis entropy suggests that that is . Therefore, the above equation indicates that
[TABLE]
Multiplying both sides by and summing over and we get
[TABLE]
Now, Definitions 3 and 4 together indicate
[TABLE]
∎
The above theorem clearly indicates that . For two independent random variables and , Lemma 6 and Theorem 3 yield the pseudo-additivity property of the generalised Tsallis entropy:
[TABLE]
Putting in the above equation we find
[TABLE]
which is the pseudo-additivity property of the Tsallis entropy [8].
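In the Tsallis limit, Theorem 3 and the pseudo-additivity property become $S_q(X,Y) = S_q(X) + S_q(Y|X)$ and $S_q(X,Y) = S_q(X) + S_q(Y) + (1-q)S_q(X)S_q(Y)$ for independent $X$ and $Y$ [8]. The sketch below checks both numerically, assuming the Tsallis conditional entropy $S_q(Y|X) = \sum_x p(x)^{q} S_q(Y|X=x)$ and a joint distribution stored as a dictionary keyed by $(x, y)$; this is an illustration of the one-parameter case only, with hypothetical helper names.

```python
from collections import defaultdict


def tsallis_entropy(probs, q: float) -> float:
    """S_q = (1 - sum p^q) / (q - 1), equivalent to -sum p^q ln_q p for q != 1."""
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def marginal_x(joint):
    """Marginal distribution of X from a dict {(x, y): p(x, y)}."""
    px = defaultdict(float)
    for (x, _), p in joint.items():
        px[x] += p
    return dict(px)


def conditional_entropy(joint, q: float) -> float:
    """Tsallis conditional entropy S_q(Y|X) = sum_x p(x)^q S_q(Y | X = x)."""
    total = 0.0
    for x, p_x in marginal_x(joint).items():
        if p_x == 0:
            continue
        cond = [p / p_x for (xx, _), p in joint.items() if xx == x]
        total += p_x ** q * tsallis_entropy(cond, q)
    return total


if __name__ == "__main__":
    q = 1.7
    joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
    lhs = tsallis_entropy(joint.values(), q)
    rhs = tsallis_entropy(marginal_x(joint).values(), q) + conditional_entropy(joint, q)
    print(f"chain rule: S(X,Y) = {lhs:.6f}, S(X) + S(Y|X) = {rhs:.6f}")

    # Pseudo-additivity for independent X and Y.
    px, py = [0.6, 0.4], [0.2, 0.5, 0.3]
    indep = [a * b for a in px for b in py]
    sx, sy = tsallis_entropy(px, q), tsallis_entropy(py, q)
    print(tsallis_entropy(indep, q), sx + sy + (1 - q) * sx * sy)
```

The factor $p(x)^{q}$, rather than $p(x)$, in the conditional entropy is what makes the chain rule exact for $q \neq 1$.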
Corollary 3.
The following chain rule holds for the generalized Tsallis entropy: .
Proof.
We have . Now, applying the product rule mentioned in lemma 1 we find
[TABLE]
Now, the equation and the definitions of joint and conditional entropies indicate . ∎
Corollary 4.
The generalized Tsallis entropy also fulfils the following chain rule: .
Proof.
We also have . Applying the same approach as in Corollary 3 and Theorem 3, we have
[TABLE]
Applying corollary 3 we have
[TABLE]
Now the theorem 3 suggests . Putting it in the above equation we have
[TABLE]
∎
The corollary 4 also suggests that
[TABLE]
In general, Corollaries 3 and 4 can be generalized as
[TABLE]
which indicates
[TABLE]
For any two independent random variables and , equation (44) suggests that
[TABLE]
as . If and are any two random variables, Theorem 3 and Lemma 7 together give the following theorem, which is the sub-additive property of the generalized Tsallis entropy:
Theorem 4.
Given any two random variables and we have
[TABLE]
where and .
This theorem can be further generalized as
[TABLE]
where are random variables.
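For comparison, the Tsallis special case of Theorem 4, $S_q(X,Y) \leq S_q(X) + S_q(Y)$, holds for $q \geq 1$ [8]. For independent variables the gap $S_q(X) + S_q(Y) - S_q(X,Y)$ equals $(q-1)S_q(X)S_q(Y)$ by pseudo-additivity, so it is negative when $q < 1$. The sketch below probes the $q \geq 1$ regime on random joint distributions; it illustrates the one-parameter case under these assumptions and does not test the two-parameter theorem.

```python
import random
from collections import defaultdict


def tsallis_entropy(probs, q: float) -> float:
    """S_q = (1 - sum p^q) / (q - 1) for q != 1."""
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def random_joint(n_x: int, n_y: int, rng: random.Random):
    """Random joint distribution {(x, y): p} with strictly positive entries."""
    weights = {(x, y): rng.random() + 1e-9 for x in range(n_x) for y in range(n_y)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def subadditivity_gap(joint, q: float) -> float:
    """S_q(X) + S_q(Y) - S_q(X, Y); non-negative when sub-additivity holds."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return (tsallis_entropy(px.values(), q) + tsallis_entropy(py.values(), q)
            - tsallis_entropy(joint.values(), q))


if __name__ == "__main__":
    rng = random.Random(0)
    for q in (1.5, 2.0, 3.0):
        worst = min(subadditivity_gap(random_joint(3, 4, rng), q) for _ in range(1000))
        print(f"q={q}: smallest gap = {worst:.3e}")
```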
Lemma 8.
Given any three random variables , and we have .
Proof.
Recall from the proof of lemma 7 that the function , where and is a convex function, as well as . Therefore, as we have
[TABLE]
In addition, indicates
[TABLE]
Also, recall a basic result of conditional probability which is
[TABLE]
Using the concavity property of in the expression below we get
[TABLE]
Multiplying both sides of the above inequality by and summing over and we find
[TABLE]
Note that, . Therefore,
[TABLE]
Combining, we get . ∎
The above inequality leads to the strong sub-additivity property of the generalized Tsallis entropy, stated below.
Theorem 5.
Given any three random variables and we have
[TABLE]
Proof.
The theorem 3 indicates
[TABLE]
Now, applying the chain rules for the generalized Tsallis entropy mentioned in corollary 4 we find
[TABLE]
The chain rule in corollary 3 leads us to
[TABLE]
Now lemma 8 indicates . Therefore,
[TABLE]
Hence the result. ∎
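The Tsallis special case of strong sub-additivity, $S_q(X,Y,Z) + S_q(Z) \leq S_q(X,Z) + S_q(Y,Z)$ for $q \geq 1$ [8], can be probed numerically in the same way. In this sketch $Z$ is taken as the shared variable, which is our labeling choice and may differ from the one in Theorem 5; the joint distribution is a dictionary keyed by $(x, y, z)$, and all helper names are hypothetical.

```python
import random
from collections import defaultdict


def tsallis_entropy(probs, q: float) -> float:
    """S_q = (1 - sum p^q) / (q - 1) for q != 1."""
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def marginal(joint, axes):
    """Marginal over the coordinates listed in `axes` of a dict {(x, y, z): p}."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return out


def random_joint3(shape, rng: random.Random):
    """Random joint distribution of three variables with strictly positive entries."""
    nx, ny, nz = shape
    w = {(x, y, z): rng.random() + 1e-9
         for x in range(nx) for y in range(ny) for z in range(nz)}
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}


def ssa_gap(joint, q: float) -> float:
    """S_q(X,Z) + S_q(Y,Z) - S_q(X,Y,Z) - S_q(Z); non-negative under strong sub-additivity."""
    def s(axes):
        return tsallis_entropy(marginal(joint, axes).values(), q)
    return s((0, 2)) + s((1, 2)) - s((0, 1, 2)) - s((2,))


if __name__ == "__main__":
    rng = random.Random(0)
    for q in (1.5, 2.0, 3.0):
        worst = min(ssa_gap(random_joint3((2, 3, 2), rng), q) for _ in range(500))
        print(f"q={q}: smallest gap = {worst:.3e}")
```

Rearranged, the inequality says $S_q(X|Y,Z) \leq S_q(X|Z)$: in this regime, conditioning on more variables does not increase the Tsallis conditional entropy.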
4 Fundamental properties of generalized Tsallis relative entropy
In Shannon information theory, the relative entropy, or Kullback-Leibler divergence is a measure of difference between two probability distributions. Recall that given two probability distributions and the Kullback-Leibler divergence [14] is defined by
[TABLE]
We generalize it in terms of generalized Tsallis entropy as follows:
Definition 5.
(generalized Tsallis relative entropy) Given two probability distributions and the generalised Tsallis relative entropy is given by
[TABLE]
The equivalence between the two expressions of follows from Corollary 1. In the above definition, the term is essential to establish the pseudo-additive property of the generalized Tsallis relative entropy. However, applying , as earlier, in the above definition does not lead to the usual definition of the Tsallis relative entropy. Instead, putting in we find
[TABLE]
which is the Tsallis relative entropy [8, 7]. Now we discuss a few properties of the generalised Tsallis divergence.
Lemma 9.
(Nonnegativity) For any two probability distributions and the generalised Tsallis relative entropy . The equality holds for .
Proof.
Lemma 4 suggests that is a convex function for all and . Therefore,
[TABLE]
Now, . Note that, if then
[TABLE]
as . ∎
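The Tsallis relative entropy that the generalised divergence reduces to, $D_q(P\|R) = -\sum_i p_i \ln_q(r_i/p_i)$ [7, 8], can be used to illustrate Lemma 9 numerically. The sketch assumes $q > 0$ and $r_i > 0$ wherever $p_i > 0$; it checks nonnegativity, that $D_q(P\|P) = 0$, and that $D_q$ tends to the Kullback-Leibler divergence as $q \to 1$. Function names are illustrative.

```python
import math


def ln_q(x: float, q: float) -> float:
    """Tsallis q-logarithm (x^(1-q) - 1) / (1 - q); the natural logarithm at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_divergence(p, r, q: float) -> float:
    """D_q(P||R) = -sum_i p_i ln_q(r_i / p_i); terms with p_i = 0 contribute 0."""
    return -sum(pi * ln_q(ri / pi, q) for pi, ri in zip(p, r) if pi > 0)


def kl_divergence(p, r) -> float:
    """Kullback-Leibler divergence sum_i p_i ln(p_i / r_i), in nats."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r) if pi > 0)


if __name__ == "__main__":
    p, r = [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
    for q in (0.5, 0.999, 1.0, 2.0):
        print(f"q={q}: D_q = {tsallis_divergence(p, r, q):.6f}")
    print(f"KL: {kl_divergence(p, r):.6f}")
```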
Lemma 10.
(Symmetry) Let and be two probability distributions, such that, and for a permutation and probability distributions and . Then .
Proof.
The permutation only changes the order of the terms in the sum and leaves the sum unaltered. Hence, the proof follows trivially. ∎
Lemma 11.
(Possibility of extension) Let and , then .
Proof.
Define . Expanding the logarithm of we find
[TABLE]
Hence, we find . In addition, we have . Now, applying the Moore-Osgood theorem [16], we find that . Therefore, . Hence, . ∎
Given two probability distributions and we can define a joint probability distribution . Note that, for all and we have . In addition, . Now, we have the following theorem.
Theorem 6.
(Pseudo-additivity) Given probability distributions , , and we have
[TABLE]
Proof.
Recall the product rule of mentioned in the lemma 1. Expanding the logarithm we find
[TABLE]
Multiplying both sides by we find
[TABLE]
Now, applying the definition 5 we find
[TABLE]
∎
Putting and in the above result we find
[TABLE]
which is the Pseudo-additive property of Tsallis relative entropy [8].
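For the Tsallis relative entropy, the pseudo-additive property reads $D_q(P_1 \otimes P_2 \| R_1 \otimes R_2) = D_q(P_1\|R_1) + D_q(P_2\|R_2) + (q-1)\,D_q(P_1\|R_1)\,D_q(P_2\|R_2)$ [7]. A short numerical check, assuming the equivalent closed form $D_q(P\|R) = \big(1 - \sum_i p_i^{q} r_i^{1-q}\big)/(1-q)$ for $q \neq 1$ and product distributions built with `itertools.product`:

```python
from itertools import product


def tsallis_divergence(p, r, q: float) -> float:
    """D_q(P||R) = (1 - sum_i p_i^q r_i^(1-q)) / (1 - q), q != 1; assumes r_i > 0 where p_i > 0."""
    return (1.0 - sum(pi ** q * ri ** (1.0 - q)
                      for pi, ri in zip(p, r) if pi > 0)) / (1.0 - q)


def product_dist(a, b):
    """Joint distribution of two independent variables, flattened in row-major order."""
    return [x * y for x, y in product(a, b)]


if __name__ == "__main__":
    q = 2.5
    p1, r1 = [0.7, 0.3], [0.4, 0.6]
    p2, r2 = [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]
    d1, d2 = tsallis_divergence(p1, r1, q), tsallis_divergence(p2, r2, q)
    d12 = tsallis_divergence(product_dist(p1, p2), product_dist(r1, r2), q)
    print(d12, d1 + d2 + (q - 1) * d1 * d2)
```

The identity follows because $\sum p^{q} r^{1-q}$ factorizes over product distributions; the sign of the cross term is $(q-1)$, the opposite of the $(1-q)$ in the entropy case.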
Theorem 7.
(Joint convexity) Let and for be probability distributions. Construct the new probability distributions , and as convex combinations. Then
[TABLE]
Proof.
Note that,
[TABLE]
Now applying the log-sum inequality stated in theorem 2 we find
[TABLE]
Summing over , we find the result. ∎
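Joint convexity can be illustrated in the Tsallis case with a two-component mixture. For $q \geq 0$, $(p, r) \mapsto -p^{q} r^{1-q}/(1-q)$ is jointly convex (it is the perspective of a convex power function, up to sign and scale), so the gap computed below should be non-negative. This is a sketch under that assumption with two mixture components only, whereas Theorem 7 allows any finite number; the helper names and the lower bound on the random entries are illustrative choices.

```python
import random


def tsallis_divergence(p, r, q: float) -> float:
    """D_q(P||R) = (1 - sum_i p_i^q r_i^(1-q)) / (1 - q) for q != 1; entries assumed positive."""
    return (1.0 - sum(pi ** q * ri ** (1.0 - q) for pi, ri in zip(p, r))) / (1.0 - q)


def random_dist(n: int, rng: random.Random):
    """Random probability vector with entries bounded away from zero."""
    w = [rng.random() + 0.05 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


def mix(a, b, lam: float):
    """Convex combination lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def joint_convexity_gap(p1, r1, p2, r2, lam: float, q: float) -> float:
    """lam D(p1||r1) + (1-lam) D(p2||r2) - D(mix(p) || mix(r)); >= 0 under joint convexity."""
    return (lam * tsallis_divergence(p1, r1, q)
            + (1.0 - lam) * tsallis_divergence(p2, r2, q)
            - tsallis_divergence(mix(p1, p2, lam), mix(r1, r2, lam), q))


if __name__ == "__main__":
    rng = random.Random(0)
    for q in (0.5, 2.0):
        worst = min(
            joint_convexity_gap(*(random_dist(4, rng) for _ in range(4)), rng.random(), q)
            for _ in range(1000)
        )
        print(f"q={q}: smallest gap = {worst:.3e}")
```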
Before proceeding further, we change notation. Let be a random variable with outcomes . We represent a probability distribution by a finite sequence as . Now consider a transition probability matrix such that for all . Let and be two probability distributions. After a transition with , the new probability distributions are and , where , and . Now, we have the following theorem.
Theorem 8.
(Information monotonicity in general) Given probability distributions , and transition probability matrix we have .
Proof.
Modifying the notations in definition 5 we find that
[TABLE]
Now, theorem 2 suggests that
[TABLE]
Hence, we have . ∎
In Theorem 8, if the transition probability matrix has , then partitions the random variable into groups such that , and . Then . Now, Theorem 8 indicates , which is known as information monotonicity.
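Information monotonicity can be observed numerically for the Tsallis relative entropy, which is an $f$-divergence with convex generator $f(t) = (t - t^{q})/(1-q)$ for $q > 0$ and therefore cannot increase under a Markov transition. The sketch assumes a row-stochastic convention, `W[i][j]` being the probability of moving from outcome $i$ to outcome $j$; this convention is ours, since the exact condition on the matrix in the text is not reproduced. A 0/1 matrix that merges outcomes gives the coarse-graining case described above.

```python
import random


def tsallis_divergence(p, r, q: float) -> float:
    """D_q(P||R) = (1 - sum_i p_i^q r_i^(1-q)) / (1 - q) for q != 1; entries assumed positive."""
    return (1.0 - sum(pi ** q * ri ** (1.0 - q) for pi, ri in zip(p, r))) / (1.0 - q)


def random_dist(n: int, rng: random.Random):
    """Random probability vector with entries bounded away from zero."""
    w = [rng.random() + 0.05 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


def random_stochastic(n_in: int, n_out: int, rng: random.Random):
    """Row-stochastic matrix: W[i][j] = probability of moving from outcome i to outcome j."""
    return [random_dist(n_out, rng) for _ in range(n_in)]


def push_forward(p, w):
    """Distribution after the transition: p'_j = sum_i p_i W[i][j]."""
    return [sum(p[i] * w[i][j] for i in range(len(p))) for j in range(len(w[0]))]


if __name__ == "__main__":
    rng = random.Random(0)
    p, r = random_dist(5, rng), random_dist(5, rng)
    w = random_stochastic(5, 3, rng)
    for q in (0.5, 2.0):
        before = tsallis_divergence(p, r, q)
        after = tsallis_divergence(push_forward(p, w), push_forward(r, w), q)
        print(f"q={q}: D before = {before:.4f}, D after = {after:.4f}")
```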
5 Information geometric aspects
We begin by reviewing the main concepts of information geometry. We consider a probability simplex,
[TABLE]
with the distribution described by -independent probabilities . Consider a parametric family of distributions with parameter vector , where is a parameter space. If the parameter space is a differentiable manifold and the mapping is a diffeomorphism we can identify statistical models in the family as points on the manifold . The Fisher-Rao information matrix , where s is the gradient may be used to endow with the following Riemannian metric
[TABLE]
If is discrete, the above integral is replaced with a sum. An equivalent form of (7) for normalized distributions is given by
[TABLE]
In information geometry [17, 18], a function $D(P \| Q)$ between two points $P$ and $Q$ of a manifold $M$ is called a divergence if it fulfils the following conditions:
1. $D(P \| Q) \geq 0$;
2. $D(P \| Q) = 0$ if and only if $P = Q$.
We denote the coordinates of a point by . For two infinitesimally close points and , a Taylor expansion gives, for small ,
[TABLE]
where is a positive-definite matrix. Hence, the Riemannian metric induced by a divergence is given by
[TABLE]
Thus, the divergence provides a means of measuring the degree of separation between two points on a manifold; it is not a metric, since it is not necessarily symmetric. An important divergence in information geometry is the Kullback-Leibler (KL) divergence, or relative entropy.
Therefore, the length of a small line segment is given by
[TABLE]
In this article, we consider
[TABLE]
Now,
[TABLE]
The above calculation indicates where
[TABLE]
The matrix is also called the Fisher information matrix.
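The metric induced by a divergence is its Hessian in the second argument on the diagonal, and this can be approximated by finite differences. Since the explicit generalised divergence is not reproduced here, the sketch uses the Kullback-Leibler and Tsallis divergences as stand-ins, treating the coordinates $r_i$ as unconstrained (an assumption). It recovers $g_{ij} = \delta_{ij}/p_i$ for KL and $g_{ij} = q\,\delta_{ij}/p_i$ for the Tsallis divergence, that is, the Fisher information matrix scaled by $q$. The step size `h` is an illustrative choice.

```python
import math


def kl(p, r):
    """Kullback-Leibler divergence sum_i p_i ln(p_i / r_i)."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))


def tsallis(p, r, q):
    """Tsallis divergence (1 - sum_i p_i^q r_i^(1-q)) / (1 - q), q != 1."""
    return (1.0 - sum(pi ** q * ri ** (1.0 - q) for pi, ri in zip(p, r))) / (1.0 - q)


def induced_metric(div, p, h=1e-4):
    """g_ij = d^2/(dr_i dr_j) div(p, r) at r = p, by central finite differences."""
    n = len(p)

    def shifted(i, si, j, sj):
        # Evaluate div(p, r) with r = p shifted by si*h along i and sj*h along j.
        r = list(p)
        r[i] += si * h
        r[j] += sj * h
        return div(p, r)

    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            g[i][j] = (shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
                       - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h * h)
    return g


if __name__ == "__main__":
    p = [0.2, 0.3, 0.5]
    print("KL:", [[round(x, 3) for x in row] for row in induced_metric(kl, p)])
    g2 = induced_metric(lambda a, b: tsallis(a, b, 2.0), p)
    print("Tsallis q=2:", [[round(x, 3) for x in row] for row in g2])
```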
Theorem 9.
The statistical manifold induced by the generalised Tsallis relative entropy is Hessian.
Proof.
A manifold is called Hessian if there is a function such that . For we have . Integrating twice, we find
[TABLE]
where and are integration constants. For we have , that is . Hence, the statistical manifold is Hessian. ∎
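The integration step can be written out under an assumption of ours: that the induced metric is diagonal with $g_{ii} = c/p_i$ for a constant $c > 0$, as it is for the Kullback-Leibler ($c = 1$) and Tsallis ($c = q$) divergences. The exact metric of the generalised divergence is not reproduced here, so this is a sketch of the pattern, not the proof itself.

```latex
\begin{aligned}
\frac{\partial^2 \psi}{\partial p_i^2} &= \frac{c}{p_i},
  \qquad \frac{\partial^2 \psi}{\partial p_i \, \partial p_j} = 0 \quad (i \neq j), \\
\frac{\partial \psi}{\partial p_i} &= c \ln p_i + a_i, \\
\psi(p) &= c \sum_i \left( p_i \ln p_i - p_i \right) + \sum_i a_i p_i + b .
\end{aligned}
```

Up to affine terms, the potential $\psi$ is $c$ times the negative Shannon entropy, and $g_{ij} = \partial_i \partial_j \psi$ holds by construction.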
6 Conclusion and open problems
In recent years, the idea of entropy has been generalized in the context of thermodynamics, information theory, and dynamical systems with the help of advanced mathematical tools [19, 20], offering a broad scope for mathematical investigations. The Tsallis entropy has been widely utilized in different branches of science and engineering [21, 22, 23, 24, 25]. This article gives a detailed description of the characteristics of the generalised Tsallis relative entropy. We propose a modification of the definition of the Sharma-Mittal entropy such that the new entropy fulfils the chain rule. Similarly, we modify the definitions of the Sharma-Mittal joint entropy, conditional entropy, and relative entropy. We establish a number of characteristics of the generalised Tsallis divergence, which make it suitable for use in classical information theory. We also show that the statistical manifold induced by the generalised Tsallis relative entropy is Hessian. The following problems may be discussed in future work:
1. In Shannon information theory, the mutual information of two random variables and is defined by , which is the Kullback-Leibler divergence between the two probability distributions and . In the case of the generalised Tsallis entropy, one may introduce a mutual information and then investigate its properties. Moreover, the mutual information plays a crucial role in the literature on data-processing inequalities. Hence, a two-parameter deformation of the data-processing inequalities would be an important step in this direction.
2. In Shannon information theory, it is proved that
[TABLE]
where denotes the Shannon entropy. As the generalised Tsallis mutual information has not yet been well established, we may define the mutual entropy as
[TABLE]
Note that here we do not use the term mutual information [8], although this quantity is used as a relative entropy in various applications [26]. In quantum information theory, these identities give rise to quantum discord, a well-known quantum correlation. A few works discuss deformations of quantum discord in terms of the Tsallis [27], Renyi [28], and Sharma-Mittal [29] entropies. There is scope for further investigation in this direction.
Acknowledgement
The author (S.F.) was partially supported by JSPS KAKENHI Grant Number 16K05257.
- [1] Shun-ichi Amari and Hiroshi Nagaoka. Methods of information geometry, volume 191. American Mathematical Soc., 2007.
- [2] A. M. Scarfone and T. Wada. Thermodynamic equilibrium and its stability for microcanonical systems described by the Sharma-Taneja-Mittal entropy. Physical Review E, 72(2):026123, 2005.
- [3] Antonio Scarfone, Hiroshi Matsuzoe, and Tatsuaki Wada. Information geometry of κ-exponential families: Dually-flat, Hessian and Legendre structures. Entropy, 20(6):436, 2018.
- [4] A. M. Scarfone. Legendre structure of the thermostatistics theory based on the Sharma-Taneja-Mittal entropy. Physica A: Statistical Mechanics and its Applications, 365(1):63-70, 2006.
- [5] Antonio M. Scarfone, Hiroki Suyari, and Tatsuaki Wada. Gauss' law of error revisited in the framework of Sharma-Taneja-Mittal information measure. Central European Journal of Physics, 7(3):414-420, 2009.
- [6] Constantino Tsallis. Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52(1-2):479-487, 1988.
- [7] Shigeru Furuichi, Kenjiro Yanagi, and Ken Kuriyama. Fundamental properties of Tsallis relative entropy. Journal of Mathematical Physics, 45(12):4868-4877, 2004.
- [8] Shigeru Furuichi. Information theoretical properties of Tsallis entropies. Journal of Mathematical Physics, 47(2):023302, 2006.
