A two-parameter entropy and its fundamental properties

Supriyo Dutta; Shigeru Furuichi; Partha Guha

arXiv:1908.01696·math-ph·May 2, 2024


TL;DR

This paper introduces a new two-parameter generalized entropy that encompasses Tsallis and Shannon entropies, exploring its fundamental properties and comparing its information-theoretic and geometric characteristics.

Contribution

It proposes a novel two-parameter entropy and analyzes its key properties, extending the understanding of generalized entropies beyond existing models.

Findings

01

The new entropy reduces to Tsallis and Shannon entropies at specific parameters.

02

It satisfies sub-additivity, strong sub-additivity, joint convexity, and information monotonicity.

03

The entropy exhibits distinct information-geometric properties compared to classical entropies.

Abstract

This article proposes a new two-parameter generalized entropy, which can be reduced to the Tsallis and the Shannon entropy for specific values of its parameters. We develop a number of information-theoretic properties of this generalized entropy and divergence, for instance, the sub-additive property, strong sub-additive property, joint convexity, and information monotonicity. This article presents an expository investigation of the information-theoretic and information-geometric characteristics of the new generalized entropy and compares them with the properties of the Tsallis and the Shannon entropy.

Equations

$$(xy)^{r+k}\ln_{\{k,r\}}(xy)=x^{r+k}\ln_{\{k,r\}}(x)+y^{r+k}\ln_{\{k,r\}}(y)+2k\,x^{r+k}y^{r+k}\ln_{\{k,r\}}(x)\ln_{\{k,r\}}(y),$$

$$S_{\{k,r\}}(X,Y)=S_{\{k,r\}}(X)+S_{\{k,r\}}(Y)-2k\,S_{\{k,r\}}(X)S_{\{k,r\}}(Y).$$

$$S_{\{k,r\}}(X_{1},X_{2},\dots X_{n})\leq\sum_{i=1}^{n}S_{\{k,r\}}(X_{i}),$$

$$D_{\{k,r\}}(P^{(1)}\otimes P^{(2)}||Q^{(1)}\otimes Q^{(2)})=D_{\{k,r\}}(P^{(1)}||Q^{(1)})+D_{\{k,r\}}(P^{(2)}||Q^{(2)})-2k\,D_{\{k,r\}}(P^{(1)}||Q^{(1)})D_{\{k,r\}}(P^{(2)}||Q^{(2)}).$$

$$D_{\{k,r\}}(P^{(1)}+\lambda P^{(2)}||Q^{(1)}+\lambda Q^{(2)})\leq D_{\{k,r\}}(P^{(1)}||Q^{(1)})+\lambda D_{\{k,r\}}(P^{(2)}||Q^{(2)}).$$

$$D_{\{k,r\}}(WP||WQ)\leq D_{\{k,r\}}(P||Q).$$

$$f\left(\sum_{i}t_{i}x_{i}\right)\leq\sum_{i}t_{i}f(x_{i}),\quad\text{where }\sum_{i}t_{i}=1\text{ and }0\leq t_{i}\leq 1.$$

$$f\left(\sum_{i}t_{i}x_{i}\right)\geq\sum_{i}t_{i}f(x_{i}),\quad\text{where }\sum_{i}t_{i}=1\text{ and }0\leq t_{i}\leq 1.$$

$$S_{\{k,r\}}(X)=-\sum_{x\in X}p(x)\operatorname{Ln}_{\{k,r\}}(p(x)),\quad\text{where }\operatorname{Ln}_{\{k,r\}}(x)=x^{r}\frac{x^{k}-x^{-k}}{2k},$$

$$\mathcal{R}=\left\{(k,r):-|k|\leq r\leq|k|\text{ when }0\leq|k|<\tfrac{1}{2}\right\}\cup\left\{(k,r):|k|-1\leq r\leq 1-|k|\text{ when }\tfrac{1}{2}\leq|k|<1\right\}.$$

$$\ln_{\{k,r\}}(x)=\frac{x^{k}-x^{-k}}{2kx^{r}}=\frac{x^{2k}-1}{2kx^{r+k}},$$

$$\operatorname{Ln}_{\{k,r\}}(xy)=u_{\{k,r\}}(x)\operatorname{Ln}_{\{k,r\}}(y)+\operatorname{Ln}_{\{k,r\}}(x)u_{\{k,r\}}(y),$$

$$\begin{aligned}S_{\{k,r\}}(X,Y)&=-\sum_{x\in X}\sum_{y\in Y}p(x,y)\operatorname{Ln}_{\{k,r\}}(p(x,y))\\&=-\sum_{x\in X}\sum_{y\in Y}p(x)p(y|x)\operatorname{Ln}_{\{k,r\}}(p(x)p(y|x))\\&=-\sum_{x\in X}\sum_{y\in Y}p(x)p(y|x)u_{\{k,r\}}(p(x))\operatorname{Ln}_{\{k,r\}}(p(y|x))-\sum_{x\in X}\sum_{y\in Y}p(x)p(y|x)\operatorname{Ln}_{\{k,r\}}(p(x))u_{\{k,r\}}(p(y|x)).\end{aligned}$$

$$\begin{aligned}\ln_{\{k,r\}}(x)\ln_{\{k,r\}}(y)&=\frac{x^{2k}-1}{2kx^{r+k}}\cdot\frac{y^{2k}-1}{2ky^{r+k}}=\frac{x^{2k}y^{2k}-x^{2k}-y^{2k}+1}{4k^{2}x^{r+k}y^{r+k}}\\&=\frac{x^{2k}y^{2k}-1+1-x^{2k}-y^{2k}+1}{4k^{2}x^{r+k}y^{r+k}}\\&=\frac{x^{2k}y^{2k}-1}{4k^{2}x^{r+k}y^{r+k}}-\frac{x^{2k}-1}{4k^{2}x^{r+k}y^{r+k}}-\frac{y^{2k}-1}{4k^{2}x^{r+k}y^{r+k}}\\&=\frac{\ln_{\{k,r\}}(xy)}{2k}-\frac{\ln_{\{k,r\}}(x)}{2ky^{r+k}}-\frac{\ln_{\{k,r\}}(y)}{2kx^{r+k}}.\end{aligned}$$

$$\ln_{\{k,r\}}(xy)=\frac{1}{x^{r-k}}\ln_{\{k,r\}}(y)+\frac{1}{y^{r+k}}\ln_{\{k,r\}}(x).$$

$$\begin{aligned}\frac{(xy)^{k}-(xy)^{-k}}{2k}&=\frac{x^{k}y^{k}-x^{k}y^{-k}+x^{k}y^{-k}-x^{-k}y^{-k}}{2k}=\frac{x^{k}(y^{k}-y^{-k})}{2k}+\frac{y^{-k}(x^{k}-x^{-k})}{2k},\\\text{or}\quad\frac{(xy)^{k}-(xy)^{-k}}{2k(xy)^{r}}&=\frac{1}{x^{r-k}}\frac{y^{k}-y^{-k}}{2ky^{r}}+\frac{1}{y^{r+k}}\frac{x^{k}-x^{-k}}{2kx^{r}}.\end{aligned}$$

$$\begin{aligned}\ln_{\{k,r\}}(1)=\ln_{\{k,r\}}\left(x\cdot\frac{1}{x}\right)&=\left(\frac{1}{x}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{1}{x}\right)+x^{r+k}\ln_{\{k,r\}}(x),\\\text{or}\quad\frac{1}{x^{r-k}}\ln_{\{k,r\}}\left(\frac{1}{x}\right)&=-x^{r+k}\ln_{\{k,r\}}(x).\end{aligned}$$

$$\ln_{\{k,r\}}\left(\frac{x}{y}\right)=-\frac{y^{2r}}{x^{r-k}}\ln_{\{k,r\}}(y)+y^{r+k}\ln_{\{k,r\}}(x).$$

$$\ln_{\{k,r\}}\left(\frac{x}{y}\right)=\frac{1}{x^{r-k}}\ln_{\{k,r\}}\left(\frac{1}{y}\right)+y^{r+k}\ln_{\{k,r\}}(x).$$

$$\ln_{\{k,r\}}(x^{a})=\frac{(x^{a})^{k}-(x^{a})^{-k}}{2k(x^{a})^{r}}=a\frac{x^{ak}-x^{-ak}}{2ak\,x^{ar}}=a\ln_{\{ak,ar\}}(x).$$

$$\sum_{i=1}^{n}a_{i}\left(\frac{a_{i}}{b_{i}}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{a_{i}}{b_{i}}\right)\geq a\left(\frac{a}{b}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{a}{b}\right).$$

$$\sum_{i=1}^{n}a_{i}\left(\frac{a_{i}}{b_{i}}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{a_{i}}{b_{i}}\right)=b\sum_{i=1}^{n}\frac{b_{i}}{b}\frac{a_{i}}{b_{i}}\left(\frac{a_{i}}{b_{i}}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{a_{i}}{b_{i}}\right)=b\sum_{i=1}^{n}\frac{b_{i}}{b}f\left(\frac{a_{i}}{b_{i}}\right).$$

$$\sum_{i=1}^{n}a_{i}\left(\frac{a_{i}}{b_{i}}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{a_{i}}{b_{i}}\right)\geq bf\left(\sum_{i=1}^{n}\frac{b_{i}}{b}\frac{a_{i}}{b_{i}}\right)=bf\left(\frac{1}{b}\sum_{i=1}^{n}a_{i}\right)=bf\left(\frac{a}{b}\right)=b\left(\frac{a}{b}\right)^{r-k+1}\ln_{\{k,r\}}\left(\frac{a}{b}\right),$$

$$S_{\alpha,\beta}(X)=\sum_{x\in X}\frac{(p(x))^{\alpha}-(p(x))^{\beta}}{\beta-\alpha}\quad\text{where }\alpha\neq\beta.$$

$$S_{\{k,r\}}(X)=-\sum_{x\in X}(p(x))^{r+k+1}\ln_{\{k,r\}}(p(x))=\sum_{x\in X}(p(x))^{k-r+1}\ln_{\{k,r\}}\left(\frac{1}{p(x)}\right),$$

$$\ln_{\{k,k\}}(x)=\frac{x^{k}-x^{-k}}{2kx^{k}}=\frac{1-x^{-2k}}{2k}.$$

$$\ln_{\{\frac{q-1}{2},\frac{q-1}{2}\}}(x)=\frac{1-x^{1-q}}{q-1}=\frac{x^{1-q}-1}{1-q}=\ln_{q}(x),$$

$$S_{\{\frac{q-1}{2},\frac{q-1}{2}\}}(X)=-\sum_{x\in X}(p(x))^{q}\frac{(p(x))^{1-q}-1}{1-q}=S_{q}(X),$$

$$S_{\{k,r\}}(X,Y)=-\sum_{x\in X}\sum_{y\in Y}(p(x,y))^{k+r+1}\ln_{\{k,r\}}(p(x,y)).$$


Taxonomy

Topics: Statistical Mechanics and Entropy · Mathematical Inequalities and Applications

Full text

Elements of Generalized Tsallis Relative Entropy in Classical Information Theory

Supriyo Dutta 1, Shigeru Furuichi 2, Partha Guha 1

1 Department of Theoretical Science

S. N. Bose National Centre for Theoretical Sciences

Block - JD, Sector - III, Salt Lake City, Kolkata

West Bengal, India - 700 106

2 Department of Information Science,

College of Humanities and Sciences, Nihon University,

3-25-40, Sakurajyousui, Setagaya-Ku, Tokyo, 156-8550, Japan

Email: [email protected], [email protected], [email protected]

Abstract

This article proposes a modification of the Sharma-Mittal entropy and distinguishes it as the generalised Tsallis entropy. This modification enables the Sharma-Mittal entropy to be used in classical information theory. We derive the product rule

$$(xy)^{r+k}\ln_{\{k,r\}}(xy)=x^{r+k}\ln_{\{k,r\}}(x)+y^{r+k}\ln_{\{k,r\}}(y)+2k\,x^{r+k}y^{r+k}\ln_{\{k,r\}}(x)\ln_{\{k,r\}}(y)$$

for the two-parameter deformed logarithm $\ln_{\{k,r\}}(x)=\frac{x^{k}-x^{-k}}{2kx^{r}}$ . It assists us in deriving a number of information-theoretic properties of the generalized Tsallis entropy and the related relative entropy. They include the sub-additive property, strong sub-additive property, joint convexity, and information monotonicity. This article is an expository investigation of the information-theoretic and information-geometric characteristics of the generalized Tsallis entropy.

1 Introduction

Information geometry [1] has been developed in the field of statistics as a geometric way to analyse different order dependencies between random variables. A distinctive feature of information geometry is its dualistic structure of affine connections. In this article, we study the information geometry of a two-parameter generalization of the Tsallis entropy, which also reduces to the Gibbs-Shannon entropy [2, 3, 4, 5]. The Tsallis entropy which is followed by the $\kappa$ -thermostatistics is a generalization of the thermostatistics based on $\kappa$ -entropy [6]. In the literature, the properties of the Tsallis entropy and Tsallis relative entropies are investigated in detail [7, 8, 9]. In this article, we explain the information-theoretic and information-geometric structures associated with the generalised Tsallis entropy.

In the literature, a number of variations of the Sharma-Mittal entropy [10, 11] are available. Although it is utilised in thermodynamics, biology and computer science, the information-theoretic counterpart of this entropy has not been investigated to date. Here, we propose a rectification in the definition of the Sharma-Mittal entropy available in [12], which opens a scope for information-theoretic investigations. The Sharma-Mittal entropy can be reduced to both the Renyi and the Tsallis entropy for specific limits of its parameters. After our modification it loses this feature and is reducible to the Tsallis entropy only. Therefore, we call it a two-parameter generalization of the Tsallis entropy and denote it by $S_{\{k,r\}}(X)$ . Similarly, we define the generalised Tsallis relative entropy or generalised Tsallis divergence $D_{\{k,r\}}$ . The significant attributes of $S_{\{k,r\}}(X)$ and $D_{\{k,r\}}(X||Y)$ derived in this article are listed below:

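1. Pseudo-additivity of generalised Tsallis entropy:

$$S_{\{k,r\}}(X,Y)=S_{\{k,r\}}(X)+S_{\{k,r\}}(Y)-2k\,S_{\{k,r\}}(X)S_{\{k,r\}}(Y).$$

2. Sub-additive property of generalized Tsallis entropy:

$$S_{\{k,r\}}(X_{1},X_{2},\dots X_{n})\leq\sum_{i=1}^{n}S_{\{k,r\}}(X_{i}).$$

3. Pseudo-additivity of generalised Tsallis relative entropy:

$$D_{\{k,r\}}(P^{(1)}\otimes P^{(2)}||Q^{(1)}\otimes Q^{(2)})=D_{\{k,r\}}(P^{(1)}||Q^{(1)})+D_{\{k,r\}}(P^{(2)}||Q^{(2)})-2k\,D_{\{k,r\}}(P^{(1)}||Q^{(1)})D_{\{k,r\}}(P^{(2)}||Q^{(2)}).$$

4. Joint convexity of generalised Tsallis relative entropy:

$$D_{\{k,r\}}(P^{(1)}+\lambda P^{(2)}||Q^{(1)}+\lambda Q^{(2)})\leq D_{\{k,r\}}(P^{(1)}||Q^{(1)})+\lambda D_{\{k,r\}}(P^{(2)}||Q^{(2)}).$$

5. Information monotonicity of generalised Tsallis relative entropy:

$$D_{\{k,r\}}(WP||WQ)\leq D_{\{k,r\}}(P||Q).$$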

This article consists of six sections. The next section redefines the Sharma-Mittal logarithm and establishes a number of its characteristics required for the calculations. Section 3 discusses the generalised Tsallis entropy. The chain rule of joint generalised Tsallis entropy is discussed here. Section 4 is dedicated to generalised Tsallis relative entropy and its properties. We discuss the information geometric aspects of relative entropy in section 5. Then we conclude the article with a number of open problems.

2 Preliminary properties of a two parameter deformed logarithm

A function $f$ is convex [13] if $f((1-\lambda)x_{1}+\lambda x_{2})\leq(1-\lambda)f(x_{1})+\lambda f(x_{2})$ , for all $\lambda\in[0,1]$ . More generally,

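$$f\left(\sum_{i}t_{i}x_{i}\right)\leq\sum_{i}t_{i}f(x_{i}),\quad\text{where }\sum_{i}t_{i}=1\text{ and }0\leq t_{i}\leq 1.$$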

It can be proved that, if $f$ is a twice differentiable convex function then $f^{\prime\prime}(x)\geq 0$ . The function $f$ is concave if $-f(x)$ is convex. Hence, a function $f$ is said to be concave if

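$$f\left(\sum_{i}t_{i}x_{i}\right)\geq\sum_{i}t_{i}f(x_{i}),\quad\text{where }\sum_{i}t_{i}=1\text{ and }0\leq t_{i}\leq 1.$$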

Given probability distribution $\mathcal{P}=\{p(x):x\in X,p(x)\geq 0,\sum_{x\in X}p(x)=1\}$ , the Sharma-Mittal entropy [12] of the random variable $X$ is defined by

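$$S_{\{k,r\}}(X)=-\sum_{x\in X}p(x)\operatorname{Ln}_{\{k,r\}}(p(x)),\quad\text{where }\operatorname{Ln}_{\{k,r\}}(x)=x^{r}\frac{x^{k}-x^{-k}}{2k},\qquad(8)$$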

and $(k,r)\in\mathcal{R}\subset\mathbb{R}^{2}$ , such that,

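$$\mathcal{R}=\left\{(k,r):-|k|\leq r\leq|k|\text{ when }0\leq|k|<\tfrac{1}{2}\right\}\cup\left\{(k,r):|k|-1\leq r\leq 1-|k|\text{ when }\tfrac{1}{2}\leq|k|<1\right\}.$$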

Now, recall a few properties of the natural logarithm which are useful in the literature of information theory. The function $f(x)=-\log(x)>0$ for all $x\in(0,1)$ . Also, for all $x>0$ we have $f^{\prime}(x)=-\frac{1}{x}<0$ , that is $-\log(x)$ is monotonically decreasing. Moreover, $f^{\prime\prime}(x)=\frac{1}{x^{2}}>0$ , which indicates $-\log(x)$ is a convex function for all $x>0$ . Now, we restrict our discussion of the deformed logarithm $\operatorname{Ln}_{\{k,r\}}$ to the range of its parameters $k$ and $r$ such that $\operatorname{Ln}_{\{k,r\}}$ fulfils these characteristics.

Theorem 1.

For $r<0$ , and $0<k\leq 1$ the function $-\operatorname{Ln}_{\{k,r\}}(x)=-x^{r}\frac{x^{k}-x^{-k}}{2k}$ is positive, convex, and monotonically decreasing for all $x\in(0,1]$ .

Proof.

Note that, $f(x)=x^{r}>0$ for all $x>0$ and for all $r<0$ . If $r<0$ we have $r=-|r|$ and $f(x)=\frac{1}{x^{|r|}}$ . Differentiating we find $f^{\prime}(x)=-\frac{|r|}{x^{|r|+1}}<0$ , that is $f(x)$ is a monotonically decreasing function. Also, $f^{\prime\prime}(x)=\frac{|r|(|r|+1)}{x^{|r|+2}}\geq 0$ . Hence, $f(x)$ is a convex function.

For all $k>0$ , we have $x^{-k}\geq x^{k}$ for all $x\in(0,1]$ . Therefore, $-\frac{x^{k}-x^{-k}}{2k}\geq 0$ for all $k\neq 0$ and $0<x\leq 1$ , with equality only at $x=1$ . Define $g(x)=-\frac{x^{k}-x^{-k}}{2k}$ for $x>0$ . Differentiating we get $g^{\prime}(x)=-\frac{x^{k-1}+x^{-k-1}}{2}<0$ , for all $x>0$ , which indicates $g(x)$ is a monotonically decreasing function for all $x>0$ . Again, the double derivative $g^{\prime\prime}(x)=-\frac{(k-1)x^{k-2}-(k+1)x^{-k-2}}{2}$ . The assumption of the theorem suggests that $k-1\leq 0$ . Also, $x^{k-2},(k+1)$ and $x^{-k-2}>0$ for all $x>0$ . Combining we get $g^{\prime\prime}(x)>0$ which is sufficient for convexity.

It can be proved that, if two given functions $f,g:\mathbb{R}\rightarrow\mathbb{R}^{+}$ are convex, and both monotonically non-decreasing (or non-increasing) on an interval, then $fg(x)=f(x)g(x)$ is convex [13]. Hence, $-\operatorname{Ln}_{\{k,r\}}(x)$ is convex for $r<0$ , and $0<k\leq 1$ .

Moreover, $f(x)>0$ and $g(x)>0$ for all $x\in(0,1)$ and both are monotonically decreasing. Therefore, their product $-\operatorname{Ln}_{\{k,r\}}$ is monotonically decreasing for $r<0$ , and $0<k\leq 1$ . ∎
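As an informal numerical sanity check of Theorem 1 (not part of the original proof), the following Python sketch samples $-\operatorname{Ln}_{\{k,r\}}$ on a grid inside $(0,1)$ and tests positivity, monotone decrease, and convexity through finite differences. The function names, grid size, and tolerance are illustrative assumptions.

```python
def neg_Ln(x, k, r):
    """-Ln_{k,r}(x) = -x^r (x^k - x^-k) / (2k), the Sharma-Mittal form."""
    return -(x**r) * (x**k - x**(-k)) / (2.0 * k)

def check_theorem_1(k, r, n=200, tol=1e-9):
    """Grid test on (0, 1): positive, decreasing, convex (second differences >= 0)."""
    xs = [(i + 1) / (n + 1) for i in range(n)]  # interior points of (0, 1)
    g = [neg_Ln(x, k, r) for x in xs]
    positive = all(v > 0 for v in g)
    decreasing = all(g[i] > g[i + 1] for i in range(n - 1))
    convex = all(g[i - 1] - 2 * g[i] + g[i + 1] >= -tol for i in range(1, n - 1))
    return positive, decreasing, convex

# Two parameter pairs satisfying r < 0 and 0 < k <= 1.
theorem_1_holds = all(check_theorem_1(k=0.5, r=-0.5)) and all(check_theorem_1(k=1.0, r=-0.1))
```

A grid test of this kind cannot replace the proof; it only guards against sign or exponent slips when the formula is transcribed.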

The convexity of $-\operatorname{Ln}_{\{k,r\}}(x)$ requires $r<0$ . Hence, we redefine $\operatorname{Ln}_{\{k,r\}}$ as follows

Definition 1.

$$\ln_{\{k,r\}}(x)=\frac{x^{k}-x^{-k}}{2kx^{r}}=\frac{x^{2k}-1}{2kx^{r+k}},$$

with $r>0$ and $0<k\leq\frac{1}{2}$ .

Note that, different notations are used for redefining the deformed logarithm. The importance of restricting the range of $k$ will be clarified later. Clearly, $\ln_{\{k,r\}}(1)=0$ and $\ln_{\{k,r\}}(0)$ is undefined.

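It can be proved that the product rule of the two-parameter deformed logarithm $\operatorname{Ln}_{\{k,r\}}(x)$ mentioned in equation (8) is given by

$$\operatorname{Ln}_{\{k,r\}}(xy)=u_{\{k,r\}}(x)\operatorname{Ln}_{\{k,r\}}(y)+\operatorname{Ln}_{\{k,r\}}(x)u_{\{k,r\}}(y),\qquad(10)$$

where $u_{\{k,r\}}(x)=x^{r}\frac{x^{k}+x^{-k}}{2}$ [4]. Now, for joint random variables $(X,Y)$ we have $p(x,y)=p(x)p(y|x)$ . Equation (10) suggests that the joint Sharma-Mittal entropy is

$$\begin{aligned}S_{\{k,r\}}(X,Y)&=-\sum_{x\in X}\sum_{y\in Y}p(x,y)\operatorname{Ln}_{\{k,r\}}(p(x,y))\\&=-\sum_{x\in X}\sum_{y\in Y}p(x)p(y|x)\operatorname{Ln}_{\{k,r\}}(p(x)p(y|x))\\&=-\sum_{x\in X}\sum_{y\in Y}p(x)p(y|x)u_{\{k,r\}}(p(x))\operatorname{Ln}_{\{k,r\}}(p(y|x))-\sum_{x\in X}\sum_{y\in Y}p(x)p(y|x)\operatorname{Ln}_{\{k,r\}}(p(x))u_{\{k,r\}}(p(y|x)).\end{aligned}\qquad(11)$$

These expressions have no known counterpart in Shannon and Tsallis information theory. This prevents us from deriving a chain rule for the Sharma-Mittal entropy, which is a theoretical barrier restricting its utilization in classical information theory [14]. However, the deformed logarithm of definition 1 follows a product rule discussed in the next lemma: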

Lemma 1.

Given any two real numbers $x,y\neq 0$ we have

$$(xy)^{r+k}\ln_{\{k,r\}}(xy)=x^{r+k}\ln_{\{k,r\}}(x)+y^{r+k}\ln_{\{k,r\}}(y)+2k\,x^{r+k}y^{r+k}\ln_{\{k,r\}}(x)\ln_{\{k,r\}}(y).$$

Proof.

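$$\begin{aligned}\ln_{\{k,r\}}(x)\ln_{\{k,r\}}(y)&=\frac{x^{2k}-1}{2kx^{r+k}}\cdot\frac{y^{2k}-1}{2ky^{r+k}}=\frac{x^{2k}y^{2k}-x^{2k}-y^{2k}+1}{4k^{2}x^{r+k}y^{r+k}}\\&=\frac{x^{2k}y^{2k}-1}{4k^{2}x^{r+k}y^{r+k}}-\frac{x^{2k}-1}{4k^{2}x^{r+k}y^{r+k}}-\frac{y^{2k}-1}{4k^{2}x^{r+k}y^{r+k}}\\&=\frac{\ln_{\{k,r\}}(xy)}{2k}-\frac{\ln_{\{k,r\}}(x)}{2ky^{r+k}}-\frac{\ln_{\{k,r\}}(y)}{2kx^{r+k}}.\end{aligned}$$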

It leads us to the result. ∎
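The product rule of Lemma 1 is easy to confirm numerically. The sketch below is a minimal Python check, assuming positive real arguments; the helper names `ln_kr` and `product_rule_sides` are hypothetical and introduced only for illustration.

```python
import math

def ln_kr(x, k, r):
    """Deformed logarithm of Definition 1: (x^k - x^-k) / (2k x^r)."""
    return (x**k - x**(-k)) / (2.0 * k * x**r)

def product_rule_sides(x, y, k, r):
    """Return (lhs, rhs) of the product rule in Lemma 1."""
    ax = x**(r + k) * ln_kr(x, k, r)   # equals (x^{2k} - 1) / (2k)
    ay = y**(r + k) * ln_kr(y, k, r)   # equals (y^{2k} - 1) / (2k)
    lhs = (x * y)**(r + k) * ln_kr(x * y, k, r)
    rhs = ax + ay + 2.0 * k * ax * ay
    return lhs, rhs

lhs, rhs = product_rule_sides(0.3, 0.7, k=0.25, r=0.4)
product_rule_ok = math.isclose(lhs, rhs, rel_tol=1e-10)
```

Writing $A=x^{r+k}\ln_{\{k,r\}}(x)=\frac{x^{2k}-1}{2k}$ makes the identity transparent: $(1+2kA_{x})(1+2kA_{y})=1+2kA_{xy}$ .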

The product rule of two parameter deformed logarithm can also be expressed as follows:

Lemma 2.

Given any two real numbers $x,y\neq 0$ we have

$$\ln_{\{k,r\}}(xy)=\frac{1}{x^{r-k}}\ln_{\{k,r\}}(y)+\frac{1}{y^{r+k}}\ln_{\{k,r\}}(x).$$

Proof.

Note that,

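$$\begin{aligned}\frac{(xy)^{k}-(xy)^{-k}}{2k}&=\frac{x^{k}y^{k}-x^{k}y^{-k}+x^{k}y^{-k}-x^{-k}y^{-k}}{2k}=\frac{x^{k}(y^{k}-y^{-k})}{2k}+\frac{y^{-k}(x^{k}-x^{-k})}{2k},\\\text{or}\quad\frac{(xy)^{k}-(xy)^{-k}}{2k(xy)^{r}}&=\frac{1}{x^{r-k}}\frac{y^{k}-y^{-k}}{2ky^{r}}+\frac{1}{y^{r+k}}\frac{x^{k}-x^{-k}}{2kx^{r}}.\end{aligned}$$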

Hence, we find the result. ∎

Now, we consider a few properties of the two parameter deformed logarithm which will be useful later.

Corollary 1.

For any non-zero real number $x$ we have $\ln_{\{k,r\}}\left(\frac{1}{x}\right)=-x^{2r}\ln_{\{k,r\}}(x)$ , or $\ln_{\{k,r\}}(x)=-\frac{1}{x^{2r}}\ln_{\{k,r\}}\left(\frac{1}{x}\right)$ .

Proof.

Putting $y=\frac{1}{x}$ in the lemma 2 we find

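$$\begin{aligned}\ln_{\{k,r\}}(1)=\ln_{\{k,r\}}\left(x\cdot\frac{1}{x}\right)&=\left(\frac{1}{x}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{1}{x}\right)+x^{r+k}\ln_{\{k,r\}}(x),\\\text{or}\quad\frac{1}{x^{r-k}}\ln_{\{k,r\}}\left(\frac{1}{x}\right)&=-x^{r+k}\ln_{\{k,r\}}(x),\end{aligned}$$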

which leads to the result. ∎

Corollary 2.

For any two non-zero real numbers $x$ and $y$ we have

$$\ln_{\{k,r\}}\left(\frac{x}{y}\right)=-\frac{y^{2r}}{x^{r-k}}\ln_{\{k,r\}}(y)+y^{r+k}\ln_{\{k,r\}}(x).$$

Proof.

Considering $x\equiv x$ and $y\equiv\frac{1}{y}$ in lemma 2 we find that

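$$\ln_{\{k,r\}}\left(\frac{x}{y}\right)=\frac{1}{x^{r-k}}\ln_{\{k,r\}}\left(\frac{1}{y}\right)+y^{r+k}\ln_{\{k,r\}}(x).$$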

Now putting $\ln_{\{k,r\}}\left(\frac{1}{y}\right)=-y^{2r}\ln_{\{k,r\}}(y)$ we have the result. ∎
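Lemma 2 and Corollaries 1 and 2 can be checked together. The following Python sketch evaluates both sides of each identity for positive $x$ and $y$ ; the function names are hypothetical helpers and the tolerances are arbitrary choices.

```python
import math

def ln_kr(x, k, r):
    """Deformed logarithm of Definition 1: (x^k - x^-k) / (2k x^r)."""
    return (x**k - x**(-k)) / (2.0 * k * x**r)

def check_quotient_identities(x, y, k, r, tol=1e-10):
    """Return booleans for (Lemma 2, Corollary 1, Corollary 2) at the point (x, y)."""
    lx, ly = ln_kr(x, k, r), ln_kr(y, k, r)
    lemma_2 = math.isclose(ln_kr(x * y, k, r),
                           ly / x**(r - k) + lx / y**(r + k),
                           rel_tol=tol, abs_tol=tol)
    corollary_1 = math.isclose(ln_kr(1.0 / x, k, r),
                               -x**(2 * r) * lx,
                               rel_tol=tol, abs_tol=tol)
    corollary_2 = math.isclose(ln_kr(x / y, k, r),
                               -(y**(2 * r) / x**(r - k)) * ly + y**(r + k) * lx,
                               rel_tol=tol, abs_tol=tol)
    return lemma_2, corollary_1, corollary_2

identities_ok = all(check_quotient_identities(0.3, 0.7, k=0.25, r=0.4))
```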

Lemma 3.

For any non-zero real number $x$ and any real number $a$ we have $\ln_{\{k,r\}}(x^{a})=a\ln_{\{ak,ar\}}(x)$ .

Proof.

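$$\ln_{\{k,r\}}(x^{a})=\frac{(x^{a})^{k}-(x^{a})^{-k}}{2k(x^{a})^{r}}=a\frac{x^{ak}-x^{-ak}}{2ak\,x^{ar}}=a\ln_{\{ak,ar\}}(x).$$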

∎
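The power rule of Lemma 3 rescales both parameters by $a$ . A short Python sketch of this identity is given here, under the assumption that $x>0$ and $a\neq 0$ (the factor $ak$ appears in a denominator); `power_rule_gap` is a hypothetical helper name.

```python
def ln_kr(x, k, r):
    """Deformed logarithm of Definition 1: (x^k - x^-k) / (2k x^r)."""
    return (x**k - x**(-k)) / (2.0 * k * x**r)

def power_rule_gap(x, a, k, r):
    """Absolute difference between ln_{k,r}(x^a) and a * ln_{ak,ar}(x)."""
    return abs(ln_kr(x**a, k, r) - a * ln_kr(x, a * k, a * r))

# Largest deviation over a small grid of bases x and exponents a.
max_gap = max(power_rule_gap(x, a, 0.2, 0.3)
              for x in (0.2, 0.9, 1.7)
              for a in (-1.5, 0.5, 3.0))
```

Note that the rescaled pair $(ak,ar)$ may leave the range $0<k\leq\frac{1}{2}$ , $r>0$ of definition 1; the identity itself is purely algebraic.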

Lemma 4.

The function $f(x)=-x^{r+k}\ln_{\{k,r\}}(x)$ is a convex function for $0\leq k\leq\frac{1}{2}$ , and $r>0$ .

Proof.

$f(x)=-x^{r+k}\ln_{\{k,r\}}(x)=\frac{1-x^{2k}}{2k}$ . Therefore $f^{\prime\prime}(x)=(1-2k)x^{(2k-2)}$ . Now $f^{\prime\prime}(x)\geq 0$ if $1-2k\geq 0$ , that is $0\leq k\leq\frac{1}{2}$ . ∎

Lemma 5.

For any real number $x>0$ and for $0<k\leq\frac{1}{2}$ the function $f(x)=x^{r-k+1}\ln_{\{k,r\}}(x)$ is a convex function.

Proof.

Simplifying we get $f(x)=x^{r-k+1}\ln_{\{k,r\}}(x)=\frac{x-x^{1-2k}}{2k}$ . Therefore, $f^{\prime}(x)=\frac{1-(1-2k)x^{-2k}}{2k}$ and $f^{\prime\prime}(x)=\frac{1-2k}{x^{1+2k}}$ . For $k\leq\frac{1}{2}$ we have $1-2k\geq 0$ . Hence, $f^{\prime\prime}(x)\geq 0$ for all $x>0$ , which indicates that $f(x)$ is a convex function. ∎

Now we state the generalized log sum inequality for two parameter deformed logarithm:

Theorem 2.

Let $a_{1},a_{2},\dots a_{n}$ and $b_{1},b_{2},\dots b_{n}$ be non-negative numbers. In addition, $a=\sum_{i=1}^{n}a_{i}$ and $b=\sum_{i=1}^{n}b_{i}$ . Then,

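$$\sum_{i=1}^{n}a_{i}\left(\frac{a_{i}}{b_{i}}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{a_{i}}{b_{i}}\right)\geq a\left(\frac{a}{b}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{a}{b}\right).$$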

Proof.

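Writing $f(x)=x^{r-k+1}\ln_{\{k,r\}}(x)$ we have

$$\sum_{i=1}^{n}a_{i}\left(\frac{a_{i}}{b_{i}}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{a_{i}}{b_{i}}\right)=b\sum_{i=1}^{n}\frac{b_{i}}{b}\frac{a_{i}}{b_{i}}\left(\frac{a_{i}}{b_{i}}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{a_{i}}{b_{i}}\right)=b\sum_{i=1}^{n}\frac{b_{i}}{b}f\left(\frac{a_{i}}{b_{i}}\right).$$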

Now lemma 5 suggests that $f(x)=x^{r-k+1}\ln_{\{k,r\}}(x)$ is a convex function. Therefore,

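$$\sum_{i=1}^{n}a_{i}\left(\frac{a_{i}}{b_{i}}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{a_{i}}{b_{i}}\right)\geq bf\left(\sum_{i=1}^{n}\frac{b_{i}}{b}\frac{a_{i}}{b_{i}}\right)=bf\left(\frac{1}{b}\sum_{i=1}^{n}a_{i}\right)=bf\left(\frac{a}{b}\right)=b\left(\frac{a}{b}\right)^{r-k+1}\ln_{\{k,r\}}\left(\frac{a}{b}\right),$$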

which indicates the proof. ∎
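The generalized log-sum inequality of Theorem 2 can be spot-checked on random data. The Python sketch below assumes strictly positive $a_{i}$ and $b_{i}$ (so that every ratio is defined) and parameters with $0<k\leq\frac{1}{2}$ , $r>0$ ; the sampling range, seed, and helper names are illustrative assumptions.

```python
import random

def ln_kr(x, k, r):
    """Deformed logarithm of Definition 1: (x^k - x^-k) / (2k x^r)."""
    return (x**k - x**(-k)) / (2.0 * k * x**r)

def log_sum_sides(a, b, k, r):
    """Return (lhs, rhs) of the generalized log-sum inequality, lhs >= rhs expected."""
    total_a, total_b = sum(a), sum(b)
    lhs = sum(ai * (ai / bi)**(r - k) * ln_kr(ai / bi, k, r) for ai, bi in zip(a, b))
    rhs = total_a * (total_a / total_b)**(r - k) * ln_kr(total_a / total_b, k, r)
    return lhs, rhs

rng = random.Random(0)  # fixed seed so that the run is reproducible
a = [rng.uniform(0.1, 2.0) for _ in range(6)]
b = [rng.uniform(0.1, 2.0) for _ in range(6)]
lhs, rhs = log_sum_sides(a, b, k=0.3, r=0.8)
inequality_holds = lhs >= rhs - 1e-12
```

Equality is attained when all ratios $a_{i}/b_{i}$ coincide, which is the usual equality case of Jensen's inequality.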

3 Modified Sharma-Mittal entropy

Before proceeding further, recall the definitions of the two-parameter deformed logarithm and the Sharma-Mittal entropy in equation (8). Equation (11) shows that this deformed logarithm does not produce a chain rule for the Sharma-Mittal entropy. Therefore, we need to modify the definition of the Sharma-Mittal entropy given in equation (8). A two-parameter deformed entropy is investigated in [15], which is defined by

$$S_{\alpha,\beta}(X)=\sum_{x\in X}\frac{(p(x))^{\alpha}-(p(x))^{\beta}}{\beta-\alpha}\quad\text{where }\alpha\neq\beta.\qquad(19)$$

Our proposal for a two-parameter deformed entropy is given in definition 2. Definition 2, with the function $\ln_{\{k,r\}}$ , enables us to establish the chain rule, sub-additivity, and related properties for the two-parameter entropy. This is the main advantage of definition 2 over the definition given in equation (19).

Definition 2.

We define the generalized Tsallis entropy for a random variable $X$ with probability distribution $\mathcal{P}=\{p(x)\}_{x\in X}$ as

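$$S_{\{k,r\}}(X)=-\sum_{x\in X}(p(x))^{r+k+1}\ln_{\{k,r\}}(p(x))=\sum_{x\in X}(p(x))^{k-r+1}\ln_{\{k,r\}}\left(\frac{1}{p(x)}\right),$$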

where $\ln_{\{k,r\}}(x)=\frac{x^{k}-x^{-k}}{2kx^{r}}$ with $0<k\leq\frac{1}{2}$ , and $r>0$ , mentioned in definition 1.

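Assuming $\alpha=1-k+r$ and $\beta=1+k+r$ in equation (19), we recover the Sharma-Mittal entropy of equation (8). The expression in definition 2 simplifies to $\sum_{x\in X}\frac{p(x)-(p(x))^{2k+1}}{2k}$ , which corresponds to $\alpha=1$ and $\beta=1+2k$ in equation (19).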

The function $\ln_{\{k,r\}}(x)$ is undefined for $x\leq 0$ but $\lim_{x\rightarrow 0^{+}}x^{r+k+1}\ln_{\{k,r\}}(x)=\lim_{x\rightarrow 0^{+}}\frac{x(x^{2k}-1)}{2k}=0$ . Hence conventionally, if $p(x)=0$ for some $x\in X$ then we have $\left(p(x)\right)^{r+k+1}\ln_{\{k,r\}}(p(x))=0$ . Since $\ln_{\{k,r\}}(p(x))\leq 0$ for $0<p(x)\leq 1$ , we have $-\left(p(x)\right)^{r+k+1}\ln_{\{k,r\}}(p(x))\geq 0$ for any non-zero probability $p(x)$ . Therefore, given any random variable $X$ we have $S_{\{k,r\}}(X)\geq 0$ .

This modification of the Sharma-Mittal entropy is consistent with the Tsallis entropy [12]. Putting $r=k$ in the expression $\ln_{\{k,r\}}(x)=\frac{x^{k}-x^{-k}}{2kx^{r}}$ we find

$$\ln_{\{k,k\}}(x)=\frac{x^{k}-x^{-k}}{2kx^{k}}=\frac{1-x^{-2k}}{2k}.$$

Now, for $k=\frac{q-1}{2}$ in the expression of $\ln_{\{k,k\}}(x)$ we find that

$$\ln_{\{\frac{q-1}{2},\frac{q-1}{2}\}}(x)=\frac{1-x^{1-q}}{q-1}=\frac{x^{1-q}-1}{1-q}=\ln_{q}(x),$$

which is the Tsallis logarithm. Putting $r=k=\frac{q-1}{2}$ in definition 2, we find that

$$S_{\{\frac{q-1}{2},\frac{q-1}{2}\}}(X)=-\sum_{x\in X}(p(x))^{q}\frac{(p(x))^{1-q}-1}{1-q}=S_{q}(X),$$

which is the Tsallis entropy [6, 8].
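The reduction to the Tsallis entropy can be reproduced numerically. The following Python sketch implements the entropy of definition 2 and compares it with the standard Tsallis form $S_{q}=\frac{1-\sum_{x}(p(x))^{q}}{q-1}$ at $k=r=\frac{q-1}{2}$ ; the example distribution, the value $q=1.6$ , and the function names are assumptions chosen for illustration.

```python
import math

def ln_kr(x, k, r):
    """Deformed logarithm of Definition 1: (x^k - x^-k) / (2k x^r)."""
    return (x**k - x**(-k)) / (2.0 * k * x**r)

def entropy_kr(probs, k, r):
    """Entropy of Definition 2; zero-probability outcomes contribute 0 by convention."""
    return -sum(p**(r + k + 1) * ln_kr(p, k, r) for p in probs if p > 0)

def tsallis_entropy(probs, q):
    """Standard Tsallis entropy (1 - sum p^q) / (q - 1), for q != 1."""
    return (1.0 - sum(p**q for p in probs)) / (q - 1.0)

probs = [0.5, 0.3, 0.2]
q = 1.6
k = r = (q - 1) / 2          # k = r = 0.3, inside 0 < k <= 1/2
reduces_to_tsallis = math.isclose(entropy_kr(probs, k, r),
                                  tsallis_entropy(probs, q), rel_tol=1e-10)
```

Because $0<k\leq\frac{1}{2}$ , this substitution covers Tsallis indices $1<q\leq 2$ .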

There are a number of similar expressions generating the Tsallis entropy. For instance, consider $-\sum_{x\in X}\left(p(x)\right)^{2k+1}\ln_{\{k,r\}}(p(x))$ . This expression also produces the Tsallis entropy for $k=r=\frac{q-1}{2}$ . Another expression with similar property is $-\sum_{x\in X}\left(p(x)\right)^{2r+1}\ln_{\{k,r\}}(p(x))$ . In fact there are many others. Our motivation behind considering the expression of definition 2 as an ideal expression of generalized Tsallis entropy comes from lemma 1, which is the product rule for deformed logarithm. It establishes a chain rule like identity for the generalized Tsallis entropy. But, other expressions do not offer this scope. Details of the chain rule for generalized Tsallis entropy will be discussed below.

Definition 3.

(Joint entropy) Let $\mathcal{P}=\{p(x,y)\}_{(x,y)\in(X,Y)}$ be a probability distribution of the joint random variable $(X,Y)$ . The generalized Tsallis joint entropy of the joint random variable $(X,Y)$ is defined by

$$S_{\{k,r\}}(X,Y)=-\sum_{x\in X}\sum_{y\in Y}(p(x,y))^{k+r+1}\ln_{\{k,r\}}(p(x,y)).$$

Similarly, for three random variables $X,Y$ , and $Z$ the joint entropy will be given by

[TABLE]

Recall that the probability distribution of the conditional random variable $Y|X=x$ is given by $p(Y|X=x)=\frac{p(X,Y)}{p(X=x)}$ . In short, $p(x,y)=p(y|x)p(x)$ . Now, we define the conditional entropy as follows:

Definition 4.

(Conditional entropy) Given a conditional random variable $Y|X=x$ we define the generalized Tsallis conditional entropy as

[TABLE]

As $\ln_{\{k,r\}}(x)=-\frac{1}{x^{2r}}\ln_{\{k,r\}}\left(\frac{1}{x}\right)$ by corollary 1, we can alternatively write down

[TABLE]

This definition can be generalized for three or more random variables. Given three random variables $X,Y$ and $Z$ we have

[TABLE]

In a similar fashion, we can define

[TABLE]

Similarly, the definition of conditional entropy can be extended for any number of random variables for defining $S_{\{k,r\}}(X_{1},X_{2},\dots X_{n}|Y_{1},Y_{2},\dots Y_{m})$ .

The definitions of the generalised Tsallis joint entropy and conditional entropy are also consistent with the definitions of the Tsallis joint and conditional entropy [8], respectively. Note that,

[TABLE]

which is the Tsallis joint entropy. In addition,

[TABLE]

which is the Tsallis conditional entropy.

Lemma 6.

Given two independent random variables $X$ and $Y$ the generalised Tsallis conditional entropy can be expressed as

[TABLE]

Proof.

From the definition of conditional entropy we find that

[TABLE]

Also the definition of generalized Tsallis entropy mentioned in definition 2 suggests that $\ln_{\{k,r\}}(p(x))=\frac{(p(x))^{2k}-1}{2k(p(x))^{r+k}}$ that is $(p(x))^{2k}=1+2k(p(x))^{r+k}\ln_{\{k,r\}}(p(x))$ . Putting it in the above equation we construct $S_{\{k,r\}}(Y|X)=$

[TABLE]

For independent random variables $X$ and $Y$ we have $p(y|x)=p(y)$ . Therefore, $S_{\{k,r\}}(Y|X)=$

[TABLE]

∎

Putting $r=k=\frac{q-1}{2}$ in this result we find,

[TABLE]

for any two independent random variables $X$ and $Y$ . Now we have the following corollaries.

Given any two independent random variables $X$ and $Y$ , the lemma 6 suggests that $S_{\{k,r\}}(Y|X)\leq S_{\{k,r\}}(Y)$ . But, this inequality holds for any two random variables $X$ and $Y$ , which we discuss in the lemma below.

Lemma 7.

Given any two random variables $X$ and $Y$ we have $S_{\{k,r\}}(Y|X)\leq S_{\{k,r\}}(Y)$ .

Proof.

Consider a function $f(x)=x^{k+r+1}\ln_{\{k,r\}}(x)$ where $r>0,0<k\leq\frac{1}{2}$ and $0\leq x\leq 1$ . Simplifying, $f(x)=x\frac{x^{2k}-1}{2k}$ . As $0<k\leq\frac{1}{2}$ and $0\leq x\leq 1$ , we have $f(x)\leq 0$ . Differentiating we get $f^{\prime}(x)=\frac{(2k+1)x^{2k}-1}{2k}$ and $f^{\prime\prime}(x)=\frac{(2k+1)2kx^{2k-1}}{2k}=(2k+1)x^{2k-1}\geq 0$ for all $x>0$ . Therefore, $f(x)$ is a convex function that is $-f(x)$ is a concave function.

As $0\leq p(x)\leq 1$ , we have $0\leq\left(p(x)\right)^{2k+1}\leq p(x)\leq 1$ . Also, $0\leq p(y|x)\leq 1$ indicates,

[TABLE]

for $0\leq x\leq 1$ . Combining we get

[TABLE]

Now, applying the concavity property of $-f(x)$ we find

[TABLE]

Expanding $f(p(y|x))$ in the above equation,

[TABLE]

Summing over $Y$ we find

[TABLE]

Combining this equation with equation (35) we find

[TABLE]

The first and the last terms of the above inequality indicate $S_{\{k,r\}}(Y|X)\leq S_{\{k,r\}}(Y)$ . ∎

Theorem 3.

(Chain rule for generalised Tsallis entropy) Given any two random variables $X$ and $Y$ the generalised Tsallis joint entropy can be expressed as

[TABLE]

Proof.

The probability of joint random variables can be expressed as $p(x,y)=p(x)p(y|x)$ . The product rule of $\ln_{\{k,r\}}(x)$ mentioned in lemma 1 indicates that

[TABLE]

Applying $p(x,y)=p(x)p(y|x)$ we find that

[TABLE]

The definition 2 of generalized Tsallis entropy suggests that $\ln_{\{k,r\}}(p(x))=\frac{(p(x))^{2k}-1}{2k(p(x))^{r+k}}$ that is $(p(x))^{2k}=1+2k(p(x))^{r+k}\ln_{\{k,r\}}(p(x))$ . Therefore, the above equation indicates that

[TABLE]

Multiplying both sides by $p(x,y)$ and summing over $X$ and $Y$ we get

[TABLE]

Now, definition 3 and 4 together indicate

[TABLE]

∎
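The chain rule can be illustrated on a small joint distribution. The conditional entropy used in the Python sketch below, $S_{\{k,r\}}(Y|X)=-\sum_{x}(p(x))^{2k+1}\sum_{y}(p(y|x))^{r+k+1}\ln_{\{k,r\}}(p(y|x))$ , is an assumption: it is the form obtained by following the steps of the proof above, since the displayed definition is not available in this text. The joint table and the function names are also illustrative.

```python
import math

def ln_kr(x, k, r):
    """Deformed logarithm of Definition 1: (x^k - x^-k) / (2k x^r)."""
    return (x**k - x**(-k)) / (2.0 * k * x**r)

def entropy_kr(probs, k, r):
    """Entropy of Definition 2; zero-probability outcomes contribute 0."""
    return -sum(p**(r + k + 1) * ln_kr(p, k, r) for p in probs if p > 0)

def conditional_entropy_kr(joint, k, r):
    """Assumed form of S(Y|X): sum over x of p(x)^(2k+1) times the entropy of p(.|x)."""
    total = 0.0
    for row in joint:                      # one row per value of X
        px = sum(row)
        if px > 0:
            total += px**(2 * k + 1) * entropy_kr([p / px for p in row], k, r)
    return total

joint = [[0.10, 0.20], [0.30, 0.40]]       # p(x, y), rows indexed by x
k, r = 0.3, 0.6
s_xy = entropy_kr([p for row in joint for p in row], k, r)
s_x = entropy_kr([sum(row) for row in joint], k, r)
s_y_given_x = conditional_entropy_kr(joint, k, r)
chain_rule_ok = math.isclose(s_xy, s_x + s_y_given_x, rel_tol=1e-10)
```

With the same assumed form, `s_y_given_x` can also be compared against the marginal entropy of $Y$ to illustrate lemma 7.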

The above theorem clearly indicates that $S_{\{k,r\}}(X)\leq S_{\{k,r\}}(X,Y)$ . For two independent random variables $X$ and $Y$ the lemma 6 and theorem 3 produce that the pseudo-additivity property for generalised Tsallis entropy is

[TABLE]

Putting $k=r=\frac{q-1}{2}$ in the above equation we find

[TABLE]

which is the pseudoadditivity property of Tsallis entropy [8].
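Pseudo-additivity for independent variables can be verified directly from definition 2. In the Python sketch below, the joint distribution is built as the product of two marginals; the marginals, the parameter values, and the helper names are illustrative assumptions.

```python
import math

def ln_kr(x, k, r):
    """Deformed logarithm of Definition 1: (x^k - x^-k) / (2k x^r)."""
    return (x**k - x**(-k)) / (2.0 * k * x**r)

def entropy_kr(probs, k, r):
    """Entropy of Definition 2; zero-probability outcomes contribute 0."""
    return -sum(p**(r + k + 1) * ln_kr(p, k, r) for p in probs if p > 0)

def product_distribution(px, py):
    """Joint distribution of two independent variables, flattened to a list."""
    return [a * b for a in px for b in py]

px, py = [0.6, 0.4], [0.2, 0.3, 0.5]
k, r = 0.25, 0.7
s_x, s_y = entropy_kr(px, k, r), entropy_kr(py, k, r)
s_xy = entropy_kr(product_distribution(px, py), k, r)
pseudo_additive = math.isclose(s_xy, s_x + s_y - 2 * k * s_x * s_y, rel_tol=1e-10)
```

Since $S_{\{k,r\}}\geq 0$ and $k>0$ , the correction term $-2k\,S_{\{k,r\}}(X)S_{\{k,r\}}(Y)$ is never positive, so the joint entropy of independent variables cannot exceed the sum of the marginal entropies.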

Corollary 3.

The following chain rule holds for the generalized Tsallis entropy: $S_{\{k,r\}}(X,Y,Z)=S_{\{k,r\}}(X,Y|Z)+S_{\{k,r\}}(Z)$ .

Proof.

We have $p(x,y,z)=p(x,y|z)p(z)$ . Now, applying the product rule mentioned in lemma 1 we find

[TABLE]

Now the equation $\left(p(z)\right)^{2k}=1+2k\left(p(z)\right)^{r+k}\ln_{\{k,r\}}\left(p(z)\right)$ and the definitions of joint and conditional entropies indicate $S_{\{k,r\}}(X,Y,Z)=S_{\{k,r\}}(X,Y|Z)+S_{\{k,r\}}(Z)$ . ∎

Corollary 4.

The generalized Tsallis entropy also fulfils the following chain rule: $S_{\{k,r\}}(X,Y|Z)=S_{\{k,r\}}(X|Z)+S_{\{k,r\}}(Y|X,Z)$ .

Proof.

We also have $p(x,y,z)=p(y|x,z)p(x,z)$ . Applying the similar approach in corollary 3 and theorem 3 we have

[TABLE]

Applying corollary 3 we have

[TABLE]

Now the theorem 3 suggests $S_{\{k,r\}}(X,Z)=S_{\{k,r\}}(Z)+S_{\{k,r\}}(X|Z)$ . Putting it in the above equation we have

[TABLE]

∎

The corollary 4 also suggests that

[TABLE]

In general the corollary 3 and 4 can be generalized as

[TABLE]

which indicates

[TABLE]

For any two independent random variables $X$ and $Y$ equation (44) suggests that

[TABLE]

as $0<k\leq\frac{1}{2}$ . If $X$ and $Y$ are any two random variables theorem 3 and lemma 7 together indicate the following theorem, which is the Sub-additive property of generalized Tsallis entropy:

Theorem 4.

Given any two random variables $X$ and $Y$ we have

[TABLE]

where $r>0$ and $0<k\leq\frac{1}{2}$ .

This theorem can be further generalized as

[TABLE]

where $X_{1},X_{2},\dots X_{n}$ are random variables.
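Sub-additivity, read here as $S_{\{k,r\}}(X,Y)\leq S_{\{k,r\}}(X)+S_{\{k,r\}}(Y)$ for two variables, can be probed on randomly generated joint distributions. The Python sketch below is a Monte Carlo spot check, not a proof; the table sizes, the seed, and the function names are assumptions.

```python
import random

def ln_kr(x, k, r):
    """Deformed logarithm of Definition 1: (x^k - x^-k) / (2k x^r)."""
    return (x**k - x**(-k)) / (2.0 * k * x**r)

def entropy_kr(probs, k, r):
    """Entropy of Definition 2; zero-probability outcomes contribute 0."""
    return -sum(p**(r + k + 1) * ln_kr(p, k, r) for p in probs if p > 0)

def random_joint(nx, ny, rng):
    """Random nx-by-ny joint probability table with strictly positive entries."""
    weights = [[rng.uniform(0.01, 1.0) for _ in range(ny)] for _ in range(nx)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def subadditivity_gap(joint, k, r):
    """S(X) + S(Y) - S(X, Y); expected to be non-negative."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    flat = [p for row in joint for p in row]
    return entropy_kr(px, k, r) + entropy_kr(py, k, r) - entropy_kr(flat, k, r)

rng = random.Random(1)  # fixed seed for reproducibility
min_gap = min(subadditivity_gap(random_joint(3, 4, rng), 0.4, 0.5) for _ in range(200))
```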

Lemma 8.

Given any three random variables $X$ , $Y$ and $Z$ we have $S_{\{k,r\}}(Y|Z)\geq S_{\{k,r\}}(Y|X,Z)$ .

Proof.

Recall from the proof of lemma 7 that the function $f(x)=x^{k+r+1}\ln_{\{k,r\}}(x)$ , where $r>0,0<k\leq\frac{1}{2}$ and $0\leq x\leq 1$ is a convex function, as well as $f(x)\leq 0$ . Therefore, as $0\leq p(y|z)\leq 1$ we have

[TABLE]

In addition, $0\leq p(y|x,z)\leq 1$ indicates

[TABLE]

Also, recall a basic result of conditional probability which is

[TABLE]

Using the concavity property of $-f(x)$ in the expression below we get

[TABLE]

Multiplying both sides of the above inequality by $(p(z))^{2k+1}$ and summing over $Y$ and $Z$ we find

[TABLE]

Note that, $p(x,z)^{2k+1}=(p(z))^{2k+1}(p(x|z))^{2k+1}\leq(p(z))^{2k+1}p(x|z)$ . Therefore,

[TABLE]

Combining we get $S_{\{k,r\}}(Y|Z)\geq S_{\{k,r\}}(Y|X,Z)$ . ∎

The above inequality leads to the strong sub-additivity property of the generalized Tsallis entropy, which is stated below.

Theorem 5.

Given any three random variables $X$, $Y$ and $Z$, we have

$$S_{\{k,r\}}(X,Y,Z)+S_{\{k,r\}}(Z)\leq S_{\{k,r\}}(X,Z)+S_{\{k,r\}}(Y,Z).$$

Proof.

Theorem 3 indicates

[TABLE]

Now, applying the chain rules for the generalized Tsallis entropy mentioned in corollary 4 we find

[TABLE]

The chain rule in corollary 3 leads us to

[TABLE]

Now lemma 8 indicates $S_{\{k,r\}}(Y|Z)-S_{\{k,r\}}(Y|X,Z)\geq 0$ . Therefore,

[TABLE]

Hence, the result. ∎

4 Fundamental properties of generalized Tsallis relative entropy

In Shannon information theory, the relative entropy, or Kullback-Leibler divergence is a measure of difference between two probability distributions. Recall that given two probability distributions $\mathcal{P}=\{p(x)\}_{x\in X}$ and $\mathcal{Q}=\{q(x)\}_{x\in X}$ the Kullback-Leibler divergence [14] is defined by

$$D(\mathcal{P}||\mathcal{Q})=\sum_{x\in X}p(x)\log\frac{p(x)}{q(x)}.$$
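As a concrete reference point, the Kullback-Leibler divergence recalled above can be computed directly for finite distributions. The following is a minimal sketch, assuming the natural logarithm and the usual conventions $0\log 0=0$ and $p\log(p/0)=\infty$; the function name and the example distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D(P||Q) = sum_x p(x) log(p(x)/q(x)) with natural log and 0 log 0 = 0."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue          # convention: 0 * log(0 / q) = 0
        if qx == 0.0:
            return math.inf   # P puts mass where Q has none
        total += px * math.log(px / qx)
    return total

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
print(d_pq, d_qp)
```

The two printed values differ, which illustrates that the divergence is not symmetric in its arguments.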

We generalize it in terms of generalized Tsallis entropy as follows:

Definition 5.

(generalized Tsallis relative entropy) Given two probability distributions $\mathcal{P}=\{p(x)\}_{x\in X}$ and $\mathcal{Q}=\{q(x)\}_{x\in X}$ the generalised Tsallis relative entropy is given by

[TABLE]

The equivalence between the two expressions of $D_{\{k,r\}}(\mathcal{P}||\mathcal{Q})$ follows from corollary 1. In the above definition, the term $\left(\frac{q(x)}{p(x)}\right)^{k+r}$ is essential to establish the pseudo-additive property of the generalized Tsallis relative entropy. However, applying $k=r=\frac{q-1}{2}$ in the above definition, as earlier, does not lead us to the usual definition of the Tsallis relative entropy. Putting $k=r=\frac{1-q}{2}$ in $-\sum_{x\in X}p(x)\left(\frac{q(x)}{p(x)}\right)^{r+k}\ln_{\{k,r\}}\left(\frac{q(x)}{p(x)}\right)$ we find

[TABLE]

which is the Tsallis relative entropy [8, 7]. Now we discuss a few properties of the generalised Tsallis divergence.

Lemma 9.

(Nonnegativity) For any two probability distributions $\mathcal{P}$ and $\mathcal{Q}$, the generalised Tsallis relative entropy satisfies $D_{\{k,r\}}(\mathcal{P}||\mathcal{Q})\geq 0$. Equality holds for $\mathcal{P}=\mathcal{Q}$.

Proof.

Lemma 4 suggests that $-x^{k+r}\ln_{\{k,r\}}(x)$ is a convex function for all $x\geq 0$ and $0\leq k\leq\frac{1}{2}$ . Therefore,

[TABLE]

Now, $\ln_{\{k,r\}}\left(\sum_{x\in X}p(x)\frac{q(x)}{p(x)}\right)=\ln_{\{k,r\}}\left(\sum_{x\in X}q(x)\right)=\ln_{\{k,r\}}(1)=0$ . Note that, if $\mathcal{P}=\mathcal{Q}$ then

[TABLE]

as $\ln_{\{k,r\}}(1)=0$ . ∎

Lemma 10.

(Symmetry) Let $\mathcal{P}^{\prime}=\{p^{\prime}_{i}\}$ and $\mathcal{Q}^{\prime}=\{q^{\prime}_{i}\}$ be two probability distributions such that $p^{\prime}_{i}=p_{\pi(i)}$ and $q^{\prime}_{i}=q_{\pi(i)}$ for a permutation $\pi$ and probability distributions $\mathcal{P}=\{p(x)\}_{x\in X}$ and $\mathcal{Q}=\{q(x)\}_{x\in X}$. Then $D_{\{k,r\}}(\mathcal{P}^{\prime}||\mathcal{Q}^{\prime})=D_{\{k,r\}}(\mathcal{P}||\mathcal{Q})$.

Proof.

The permutation $\pi$ only reorders the terms $p(x)\left(\frac{p(x)}{q(x)}\right)^{r-k}\ln_{\{k,r\}}\left(\frac{p(x)}{q(x)}\right)$ within the sum and therefore leaves $D_{\{k,r\}}(\mathcal{P}||\mathcal{Q})$ unaltered. Hence, the proof follows trivially. ∎

Lemma 11.

(Possibility of extension) Let $\mathcal{P}^{\prime}=\mathcal{P}\cup\{0\}$ and $\mathcal{Q}^{\prime}=\mathcal{Q}\cup\{0\}$ , then $D_{\{k,r\}}(\mathcal{P}^{\prime}||\mathcal{Q}^{\prime})=D_{\{k,r\}}(\mathcal{P}||\mathcal{Q})$ .

Proof.

Define $0\left(\frac{0}{0}\right)^{r+k}\ln_{\{k,r\}}\left(\frac{0}{0}\right)=\lim_{(x,y)\rightarrow(0,0)}x\left(\frac{y}{x}\right)^{r+k}\ln_{\{k,r\}}\left(\frac{y}{x}\right)$. Expanding the logarithm in $x\left(\frac{y}{x}\right)^{r+k}\ln_{\{k,r\}}\left(\frac{y}{x}\right)$ we find

[TABLE]

Hence, we find $\lim_{x\rightarrow 0}\lim_{y\rightarrow 0}x\left(\frac{y}{x}\right)^{r+k}\ln_{\{k,r\}}\left(\frac{y}{x}\right)=0$. In addition, we have $\lim_{y\rightarrow 0}\lim_{x\rightarrow 0}x\left(\frac{y}{x}\right)^{r+k}\ln_{\{k,r\}}\left(\frac{y}{x}\right)=0$. Now applying the Moore-Osgood theorem [16] we find that $\lim_{(x,y)\rightarrow(0,0)}x\left(\frac{y}{x}\right)^{r+k}\ln_{\{k,r\}}\left(\frac{y}{x}\right)=0$. Therefore, $0\left(\frac{0}{0}\right)^{r+k}\ln_{\{k,r\}}\left(\frac{0}{0}\right)=0$. Hence, $D_{\{k,r\}}(\mathcal{P}^{\prime}||\mathcal{Q}^{\prime})=D_{\{k,r\}}(\mathcal{P}||\mathcal{Q})$. ∎
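Lemmas 9 to 11 can be illustrated numerically in the Tsallis special case. The sketch below is not the two-parameter divergence $D_{\{k,r\}}$ itself: it assumes the standard Tsallis relative entropy $D_{q}(\mathcal{P}||\mathcal{Q})=-\sum_{x}p(x)\ln_{q}\frac{q(x)}{p(x)}$ with $\ln_{q}(x)=\frac{x^{1-q}-1}{1-q}$, as in [7], and restricts attention to $0<q\leq 1$. The second distribution is written `r` in the code to avoid a clash with the parameter `q`; all names and numbers are hypothetical.

```python
import math

def ln_q(x, q):
    """Tsallis q-logarithm; equals the natural logarithm at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_relative_entropy(p, r, q):
    """D_q(P||R) = -sum_x p(x) ln_q(r(x)/p(x)); terms with p(x) = 0 contribute 0."""
    return -sum(a * ln_q(b / a, q) for a, b in zip(p, r) if a > 0.0)

p = [0.5, 0.3, 0.2]
r = [0.4, 0.4, 0.2]
q = 0.5

d_pr = tsallis_relative_entropy(p, r, q)            # nonnegativity
d_pp = tsallis_relative_entropy(p, p, q)            # vanishes for P = Q

perm = [2, 0, 1]                                    # symmetry under relabelling
d_perm = tsallis_relative_entropy([p[i] for i in perm], [r[i] for i in perm], q)

d_ext = tsallis_relative_entropy(p + [0.0], r + [0.0], q)   # extension by a zero

kl_limit = tsallis_relative_entropy(p, r, 1.0)      # Kullback-Leibler at q = 1
d_near_one = tsallis_relative_entropy(p, r, 0.999)

print(d_pr, d_pp, d_perm, d_ext, kl_limit, d_near_one)
```

The value at $q=0.999$ should lie close to the Kullback-Leibler value, consistent with the Shannon limit of the Tsallis quantities.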

Given two probability distributions $\mathcal{P}=\{p(x)\}_{x\in X}$ and $\mathcal{Q}=\{q(y)\}_{y\in Y}$ we can define a joint probability distribution $\mathcal{P}\otimes\mathcal{Q}=\{p(x)q(y)\}_{(x,y)\in X\otimes Y}$ . Note that, for all $x\in X$ and $y\in Y$ we have $0\leq p(x)q(y)\leq 1$ . In addition, $\sum_{x\in X}\sum_{y\in Y}p(x)q(y)=1$ . Now, we have the following theorem.

Theorem 6.

(Pseudo-additivity) Given probability distributions $\mathcal{P}^{(1)}=\{p^{(1)}(x)\}_{x\in X}$ , $\mathcal{Q}^{(1)}=\{q^{(1)}(x)\}_{x\in X}$ , $\mathcal{P}^{(2)}=\{p^{(2)}(y)\}_{y\in Y}$ and $\mathcal{Q}^{(2)}=\{q^{(2)}(y)\}_{y\in Y}$ we have

[TABLE]

Proof.

Recall the product rule of $\ln_{\{k,r\}}(xy)$ stated in lemma 1. Expanding the logarithm we find

[TABLE]

Multiplying both sides by $p^{(1)}(x)p^{(2)}(y)$ we find

[TABLE]

Now, applying definition 5 we find $D_{\{k,r\}}(\mathcal{P}^{(1)}\otimes\mathcal{P}^{(2)}||\mathcal{Q}^{(1)}\otimes\mathcal{Q}^{(2)})$

[TABLE]

∎

Putting $k=\frac{q-1}{2}$ and $r=-\frac{q-1}{2}$ in the above result we find $D_{q}(\mathcal{P}^{(1)}\otimes\mathcal{P}^{(2)}||\mathcal{Q}^{(1)}\otimes\mathcal{Q}^{(2)})=$

[TABLE]

which is the pseudo-additive property of the Tsallis relative entropy [8].
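The pseudo-additive property of the Tsallis relative entropy can be checked on a small example. The sketch below assumes the closed form $D_{q}(\mathcal{P}||\mathcal{Q})=\frac{1-\sum_{x}p(x)^{q}q(x)^{1-q}}{1-q}$ for the standard Tsallis relative entropy, for which a direct calculation gives $D_{q}(\mathcal{P}^{(1)}\otimes\mathcal{P}^{(2)}||\mathcal{Q}^{(1)}\otimes\mathcal{Q}^{(2)})=D_{q}^{(1)}+D_{q}^{(2)}+(q-1)D_{q}^{(1)}D_{q}^{(2)}$. It does not evaluate the two-parameter divergence of definition 5. The second distribution is written `r` in the code to avoid a clash with the parameter `q`, and all names and numbers are hypothetical.

```python
def tsallis_relative_entropy(p, r, q):
    """D_q(P||R) = (1 - sum_x p(x)^q r(x)^(1-q)) / (1 - q), for q != 1."""
    return (1.0 - sum(a ** q * b ** (1.0 - q) for a, b in zip(p, r))) / (1.0 - q)

def product(first, second):
    """Product distribution {first(x) * second(y)} over all pairs (x, y)."""
    return [a * b for a in first for b in second]

q = 0.5
p1, r1 = [0.6, 0.4], [0.5, 0.5]
p2, r2 = [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]

d1 = tsallis_relative_entropy(p1, r1, q)
d2 = tsallis_relative_entropy(p2, r2, q)
d12 = tsallis_relative_entropy(product(p1, p2), product(r1, r2), q)
rhs = d1 + d2 + (q - 1.0) * d1 * d2

print(d12, rhs)
```

The two printed numbers should coincide up to floating-point rounding.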

Theorem 7.

(Joint convexity) Let $\mathcal{P}^{(k)}=\{p^{(k)}(x)\}_{x\in X}$ and $\mathcal{Q}^{(k)}=\{q^{(k)}(x)\}_{x\in X}$ be probability distributions for $k=1,2$. Construct new probability distributions $(1-\lambda)\mathcal{P}^{(1)}+\lambda\mathcal{P}^{(2)}=\{(1-\lambda)p^{(1)}(x)+\lambda p^{(2)}(x)\}_{x\in X}$ and $(1-\lambda)\mathcal{Q}^{(1)}+\lambda\mathcal{Q}^{(2)}=\{(1-\lambda)q^{(1)}(x)+\lambda q^{(2)}(x)\}_{x\in X}$ as convex combinations. Then

[TABLE]

Proof.

Note that, $D_{\{k,r\}}((1-\lambda)\mathcal{P}^{(1)}+\lambda\mathcal{P}^{(2)}||(1-\lambda)\mathcal{Q}^{(1)}+\lambda\mathcal{Q}^{(2)})=$

[TABLE]

Now applying the log-sum inequality stated in theorem 2 we find

[TABLE]

Summing over $x$ , we find the result. ∎
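Joint convexity can be illustrated in the same way. The sketch below again assumes the standard Tsallis relative entropy in closed form with $0<q<1$, not the two-parameter divergence, and compares the divergence of the mixtures with the mixture of the divergences for one hypothetical choice of distributions and $\lambda$.

```python
def tsallis_relative_entropy(p, r, q):
    """D_q(P||R) = (1 - sum_x p(x)^q r(x)^(1-q)) / (1 - q), for q != 1."""
    return (1.0 - sum(a ** q * b ** (1.0 - q) for a, b in zip(p, r))) / (1.0 - q)

def mix(first, second, lam):
    """Convex combination (1 - lam) * first + lam * second, entry by entry."""
    return [(1.0 - lam) * a + lam * b for a, b in zip(first, second)]

q = 0.5
lam = 0.3
p1, r1 = [0.7, 0.2, 0.1], [0.3, 0.3, 0.4]
p2, r2 = [0.1, 0.6, 0.3], [0.5, 0.25, 0.25]

# Divergence between the mixed distributions.
lhs = tsallis_relative_entropy(mix(p1, p2, lam), mix(r1, r2, lam), q)
# Mixture of the individual divergences.
rhs = ((1.0 - lam) * tsallis_relative_entropy(p1, r1, q)
       + lam * tsallis_relative_entropy(p2, r2, q))

print(lhs, rhs)
```

A single example does not prove the inequality, but the first printed value should not exceed the second.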

Before proceeding further, we change notation. Let $X$ be a random variable with outcomes $(x_{1},x_{2},\dots x_{n})$. We represent a probability distribution $\mathcal{P}=\{p(x)\}_{x\in X}$ by a finite sequence as $\mathcal{P}=\{p_{i}:p_{i}=p(x_{i}),\sum_{i=1}^{n}p_{i}=1,0\leq p_{i}\leq 1\}$. Now consider a transition probability matrix $W=(w_{j,i})_{m\times n}$ such that $\sum_{j=1}^{m}w_{j,i}=1$ for all $i=1,2,\dots n$. Let $\mathcal{P}=\{p_{i}^{(in)}\}_{i=1}^{n}$ and $\mathcal{Q}=\{q_{i}^{(in)}\}_{i=1}^{n}$ be two probability distributions. After a transition with $W$ the new probability distributions are $W\mathcal{P}=\{p_{j}^{(out)}\}_{j=1}^{m}$ and $W\mathcal{Q}=\{q_{j}^{(out)}\}_{j=1}^{m}$, where $p_{j}^{(out)}=\sum_{i=1}^{n}w_{j,i}p_{i}^{(in)}$ and $q_{j}^{(out)}=\sum_{i=1}^{n}w_{j,i}q_{i}^{(in)}$. Now, we have the following theorem.

Theorem 8.

(Information monotonicity in general) Given probability distributions $\mathcal{P}$ , $\mathcal{Q}$ and transition probability matrix $W$ we have $D_{\{k,r\}}(W\mathcal{P}||W\mathcal{Q})\leq D_{\{k,r\}}(\mathcal{P}||\mathcal{Q})$ .

Proof.

Modifying the notations in definition 5 we find that $D_{\{k,r\}}(W\mathcal{P}||W\mathcal{Q})=$

[TABLE]

Now, theorem 2 suggests that $D_{\{k,r\}}(W\mathcal{P}||W\mathcal{Q})$

[TABLE]

Hence, we have $D_{\{k,r\}}(W\mathcal{P}||W\mathcal{Q})\leq D_{\{k,r\}}(\mathcal{P}||\mathcal{Q})$ . ∎

In theorem 8, if the transition probability matrix $W=(w_{ji})_{m\times n}$ has $m<n$, then $W$ partitions the random variable $X=(x_{1},x_{2},\dots x_{n})$ into $m$ groups $G_{1},G_{2},\dots G_{m}$ such that $X=\cup_{j=1}^{m}G_{j}$ and $G_{k}\cap G_{l}=\emptyset$ for $k\neq l$. Then $p_{j}^{(out)}(G_{j})=\sum_{x_{i}\in G_{j}}p_{i}^{(in)}$. Theorem 8 then indicates $D_{\{k,r\}}(W\mathcal{P}||W\mathcal{Q})\leq D_{\{k,r\}}(\mathcal{P}||\mathcal{Q})$, which is formally known as information monotonicity.
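The action of a transition matrix and the resulting monotonicity can be made concrete with a small example. The sketch below uses the Kullback-Leibler divergence, that is, the Shannon case, as a stand-in for $D_{\{k,r\}}$, and applies two hypothetical $2\times 3$ matrices whose columns sum to one: a generic noisy transition and a grouping matrix with $G_{1}=\{x_{1},x_{2}\}$ and $G_{2}=\{x_{3}\}$.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence with natural log; terms with p(x) = 0 are skipped."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def apply_transition(w, dist):
    """out_j = sum_i w[j][i] * dist[i] for an m x n matrix with unit column sums."""
    return [sum(w_ji * d_i for w_ji, d_i in zip(row, dist)) for row in w]

p_in = [0.5, 0.3, 0.2]
q_in = [0.2, 0.3, 0.5]

# Generic transition matrix: every column sums to 1.
w_noisy = [[0.9, 0.5, 0.2],
           [0.1, 0.5, 0.8]]
# Grouping matrix: x1 and x2 are merged into G1, x3 forms G2.
w_group = [[1.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]

d_in = kl(p_in, q_in)
d_noisy = kl(apply_transition(w_noisy, p_in), apply_transition(w_noisy, q_in))
d_group = kl(apply_transition(w_group, p_in), apply_transition(w_group, q_in))

print(d_in, d_noisy, d_group)
```

Both output divergences should be no larger than the input divergence, and the grouped distribution simply adds the probabilities inside each group.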

5 Information geometric aspects

Let us start by reviewing the main concepts of information geometry. We consider a probability simplex,

[TABLE]

with the distribution $\mathcal{P}$ described by $n$ independent probabilities $(p_{1},p_{2},\dots p_{n})$. Consider a parametric family of distributions $\mathcal{P}({\bf x})$ with parameter vector ${\bf x}=(x_{1},x_{2},\dots x_{n})\in X$, where $X$ is a parameter space. If the parameter space $X$ is a differentiable manifold and the mapping ${\bf x}\mapsto\mathcal{P}({\bf p},{\bf x})$ is a diffeomorphism, we can identify statistical models in the family as points on the manifold $X$. The Fisher-Rao information matrix $E(ss^{T})$, where $s$ is the gradient with components $[s]_{i}=\frac{\partial\log\mathcal{P}({\bf p},{\bf x})}{\partial x_{i}}$, may be used to endow $X$ with the following Riemannian metric

[TABLE]

If $X$ is discrete, the above integral is replaced with a sum. An equivalent form of (7) for normalized distributions is given by

[TABLE]

In information geometry [17, 18], a function $D(\mathcal{P}||\mathcal{Q})$ of two points $\mathcal{P},\mathcal{Q}\in S$ in a manifold $S$ is called a divergence if it fulfils the following conditions:

1. $D(\mathcal{P}||\mathcal{Q})\geq 0$.

2. $D(\mathcal{P}||\mathcal{Q})=0$ if and only if $\mathcal{P}=\mathcal{Q}$.

We denote the coordinates of a point $\mathcal{P}$ by $(p_{1},p_{2},\dots p_{n})$. For two infinitesimally close points $\mathcal{P}$ and $\mathcal{Q}=\mathcal{P}+d\mathcal{P}$, Taylor expansion for small $d\mathcal{P}$ gives

[TABLE]

where $g_{ij}$ is a positive-definite matrix. Hence, the Riemannian metric induced by a divergence $D$ is given by

[TABLE]

Thus the divergence gives us a means of determining the degree of separation between two points on a manifold. It is not a metric, since it is not necessarily symmetric. An important divergence in information geometry is the Kullback-Leibler (KL) divergence, or relative entropy.
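The metric induced by a divergence can be approximated numerically as the matrix of second derivatives of $\mathcal{Q}\mapsto D(\mathcal{P}||\mathcal{Q})$ at $\mathcal{Q}=\mathcal{P}$. The sketch below does this by central differences for the KL divergence, treating $(p_{1},\dots p_{n})$ as unconstrained coordinates, which is an assumption made for illustration. In that case the result should be close to the diagonal matrix with entries $1/p_{i}$. The function names and the step size are hypothetical choices.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence with natural log (positive entries assumed)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def induced_metric(divergence, p, h=1e-4):
    """Central-difference Hessian of q -> divergence(p, q), evaluated at q = p."""
    n = len(p)

    def shifted(i, si, j, sj):
        # Evaluate the divergence with coordinates i and j of q moved by +-h.
        q = list(p)
        q[i] += si * h
        q[j] += sj * h
        return divergence(p, q)

    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4.0 * h * h)
             for j in range(n)]
            for i in range(n)]

p = [0.5, 0.3, 0.2]
g = induced_metric(kl, p)
for row in g:
    print(["%.4f" % value for value in row])
```

Swapping in another divergence with the same call signature gives the corresponding induced matrix, subject to the same finite-difference approximation.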

Therefore, the length of small line segment is given by

[TABLE]

In this article, we consider

[TABLE]

Now,

[TABLE]

The above calculation indicates $G=(g_{ij})_{n\times n}$ where

[TABLE]

The matrix $G$ is also called the Fisher information matrix.

Theorem 9.

The statistical manifold induced by the generalised Tsallis relative entropy is Hessian.

Proof.

A manifold is called Hessian if there is a function $\Psi(u)$ such that $g_{ij}(\mathcal{P})=\partial_{ij}(\Psi)$. For $i=j$ we have $\partial_{ii}(\Psi)=g_{ii}(u)=\frac{1-2k+4r}{u}$. Integrating twice we find

$$\Psi(u)=(1-2k+4r)(u\ln u-u)+c_{1}u+c_{2},$$

where $c_{1}$ and $c_{2}$ are integrating constants. For $i\neq j$ we have $\partial_{ij}(\Psi)=g_{ij}=0$, that is $\Psi(u)=c_{1}u+c_{2}$. Hence, the statistical manifold is Hessian. ∎

6 Conclusion and open problems

In recent years, the idea of entropy has been generalized in the context of thermodynamics, information theory, and dynamical systems with the help of advanced mathematical tools [19, 20]. It offers a broad scope for mathematical investigation. The Tsallis entropy has been widely utilized in different branches of science and engineering [21, 22, 23, 24, 25]. This article gives a detailed description of the characteristics of the generalised Tsallis relative entropy. Here, we propose a modification of the definition of the Sharma-Mittal entropy such that the new entropy fulfils the chain rule. Similarly, we modify the definitions of the Sharma-Mittal joint entropy, conditional entropy, and relative entropy. We establish a number of characteristics of the generalised Tsallis divergence, which make it suitable for use in classical information theory. Also, we show that the statistical manifold induced by the generalised Tsallis relative entropy is Hessian. The following problems may be discussed in future:

1. In Shannon information theory, the mutual information of two random variables $X$ and $Y$ is defined by $I(X;Y)=D(p(x,y)||p(x)p(y))$, which is the Kullback-Leibler divergence between the two probability distributions $p(x,y)$ and $p(x)p(y)$. In the case of the generalised Tsallis entropy, one may introduce the mutual information $I_{\{k,r\}}(X;Y)=D_{\{k,r\}}(p(x,y)||p(x)p(y))$ and then investigate its properties. Moreover, the mutual information has a crucial role in the literature on data-processing inequalities. Hence, a two-parameter deformation of data-processing inequalities will be important in this direction.

2. In Shannon information theory, it is proved that

[TABLE]

where $H$ denotes the Shannon entropy. As the generalised Tsallis mutual information has not yet been well established, we may define the mutual entropy as

[TABLE]

Note that here we do not use the term mutual information [8], although it is used as relative entropy in various applications [26]. In quantum information theory, these identities generate quantum discord, which is a well-known quantum correlation. There are a few works discussing the deformation of quantum discord in terms of the Tsallis [27], Renyi [28], and Sharma-Mittal entropy [29]. There is scope for further investigation in this direction.

Acknowledgement

The author (S.F.) was partially supported by JSPS KAKENHI Grant Number 16K05257.

Bibliography

[1] Shun-ichi Amari and Hiroshi Nagaoka. Methods of information geometry, volume 191. American Mathematical Soc., 2007.
[2] AM Scarfone and T Wada. Thermodynamic equilibrium and its stability for microcanonical systems described by the Sharma-Taneja-Mittal entropy. Physical Review E, 72(2):026123, 2005.
[3] Antonio Scarfone, Hiroshi Matsuzoe, and Tatsuaki Wada. Information geometry of $\kappa$-exponential families: Dually-flat, Hessian and Legendre structures. Entropy, 20(6):436, 2018.
[4] AM Scarfone. Legendre structure of the thermostatistics theory based on the Sharma–Taneja–Mittal entropy. Physica A: Statistical Mechanics and its Applications, 365(1):63–70, 2006.
[5] Antonio M Scarfone, Hiroki Suyari, and Tatsuaki Wada. Gauss’ law of error revisited in the framework of Sharma-Taneja-Mittal information measure. Central European Journal of Physics, 7(3):414–420, 2009.
[6] Constantino Tsallis. Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52(1-2):479–487, 1988.
[7] Shigeru Furuichi, Kenjiro Yanagi, and Ken Kuriyama. Fundamental properties of Tsallis relative entropy. Journal of Mathematical Physics, 45(12):4868–4877, 2004.
[8] Shigeru Furuichi. Information theoretical properties of Tsallis entropies. Journal of Mathematical Physics, 47(2):023302, 2006.
