Optimal Guessing under Nonextensive Framework and associated Moment Bounds

Abhik Ghosh

arXiv:1905.07729·cs.IT·May 21, 2019

TL;DR

This paper extends the classical guessing problem to a non-extensive Tsallis framework, deriving new moment bounds based on a generalized entropy measure, and explores mismatched guessing with robust divergence links.

Contribution

It introduces non-extensive moment bounds for guessing problems using Tsallis entropy and links these bounds to generalized divergence measures, expanding the theoretical understanding of guessing under non-extensive statistics.

Findings

01. Derived non-extensive moment bounds for guessing with side information

02. Established connection between non-extensive bounds and generalized entropy measures

03. Analyzed mismatched guessing bounds linked to robust divergence families

Abstract

We consider the problem of guessing the realization of a random variable under the more general Tsallis non-extensive entropic framework rather than the classical Maxwell-Boltzmann-Gibbs-Shannon framework. We consider both the conditional guessing problem, in the presence of some related side information, and the unconditional one, where no such side information is available. For both types of problem, non-extensive moment bounds on the required number of guesses are derived; here we use the $q$-normalized expectation in place of the usual (linear) expectation to define the non-extensive moments. These moment bounds are seen to be a function of the logarithmic norm entropy measure, a recently developed two-parameter generalization of the Renyi entropy, and hence provide an information-theoretic interpretation of that measure. We have also considered the case of uncertain source distribution…

Equations

$$\mathcal{E}(X)=\mathcal{E}(P_{X})=-\sum_{x\in\mathcal{X}}P_{X}(x)\ln P_{X}(x).$$

$$\mathcal{E}_{\frac{1}{1+\rho}}(P_{X})-\ln(1+\ln|\mathcal{X}|)\leq\frac{1}{\rho}\ln\left(\min_{G}E[G(X)^{\rho}]\right)\leq\mathcal{E}_{\frac{1}{1+\rho}}(P_{X}),$$

$$\mathcal{E}_{\alpha}(P_{X}):=\frac{1}{1-\alpha}\log\left[\sum_{x\in\mathcal{X}}P_{X}(x)^{\alpha}\right],\quad\alpha>0.$$

$$R(P,G)=\frac{1}{\rho}\log E[G(X)^{\rho}]-\frac{1}{\rho}\log E[G_{P}(X)^{\rho}],$$

$$E_{q}[G(X)]=\frac{\sum_{x\in\mathcal{X}}G(x)P(x)^{q}}{\sum_{x\in\mathcal{X}}P(x)^{q}},\quad q\in\mathbb{R},$$

$$E_{q}[G(X)^{\rho}]\geq(1+\ln|\mathcal{X}|)^{-\rho}\frac{\left[\sum_{x\in\mathcal{X}}P_{X}(x)^{\frac{q}{1+\rho}}\right]^{1+\rho}}{\sum_{x\in\mathcal{X}}P_{X}(x)^{q}}.$$

$$\mathcal{RE}(Q,P)=\sum_{x\in\mathcal{X}}Q(x)\ln\frac{Q(x)}{P(x)}.$$

$$\sum_{x\in\mathcal{X}}Q(x)\ln G(x)=\mathcal{E}(Q)-\sum_{x\in\mathcal{X}}Q(x)\ln\frac{1}{Q(x)G(x)}\geq\mathcal{E}(Q)-\ln\sum_{x\in\mathcal{X}}\frac{1}{G(x)},$$

$$\sum_{x\in\mathcal{X}}\frac{1}{G(x)}=\sum_{i=1}^{|\mathcal{X}|}\frac{1}{i}\leq 1+\ln|\mathcal{X}|.$$

$$\sum_{x\in\mathcal{X}}Q(x)\ln G(x)\geq\mathcal{E}(Q)-\ln(1+\ln|\mathcal{X}|).$$

$$\sum_{x\in\mathcal{X}}Q(x)\ln P(x)=-\mathcal{E}(Q)-\mathcal{RE}(Q,P).$$

$$Q^{\ast}(x)=\frac{P(x)^{\frac{q}{1+\rho}}}{\sum_{x'\in\mathcal{X}}P(x')^{\frac{q}{1+\rho}}},\quad x\in\mathcal{X}.$$

$$E_{q}[G(X|Y)^{\rho}]\geq(1+\ln|\mathcal{X}|)^{-\rho}\frac{\sum_{y\in\mathcal{Y}}\left[\sum_{x\in\mathcal{X}}P_{X,Y}(x,y)^{\frac{q}{1+\rho}}\right]^{1+\rho}}{\sum_{y\in\mathcal{Y}}\sum_{x\in\mathcal{X}}P_{X,Y}(x,y)^{q}}.$$

$$E_{q}[G(X|Y)^{\rho}]=\sum_{y\in\mathcal{Y}}P_{q}(\cdot,y)\sum_{x\in\mathcal{X}}P_{q}(x|y)G(x|y)^{\rho},$$

$$G^{\ast}(x|y)<G^{\ast}(x'|y)\Rightarrow P_{q}(x|y)\geq P_{q}(x'|y),\quad\text{for all }x,x'\in\mathcal{X},\ y\in\mathcal{Y}.$$

$$(1+\ln|\mathcal{X}|)^{-\rho}L_{q,\rho}(X|Y)\leq E_{q}[G^{\ast}(X|Y)^{\rho}]\leq L_{q,\rho}(X|Y).$$

$$(1+\ln|\mathcal{X}|)^{-\rho}L_{q,\rho}(X)\leq E_{q}[G^{\ast}(X)^{\rho}]\leq L_{q,\rho}(X),$$

$$L_{q,\rho}(X)=\frac{\left[\sum_{x\in\mathcal{X}}P_{X}(x)^{\frac{q}{1+\rho}}\right]^{1+\rho}}{\sum_{x\in\mathcal{X}}P_{X}(x)^{q}}=\left[\sum_{x\in\mathcal{X}}P_{q}(x)^{\frac{1}{1+\rho}}\right]^{1+\rho}.$$

$$\mathcal{E}_{(\alpha,\beta)}(X)=\mathcal{E}_{(\alpha,\beta)}(P_{X})=\frac{\alpha\beta}{\beta-\alpha}\ln\frac{\left(\sum_{x\in\mathcal{X}}P_{X}(x)^{\alpha}\right)^{1/\alpha}}{\left(\sum_{x\in\mathcal{X}}P_{X}(x)^{\beta}\right)^{1/\beta}},\quad\alpha>0,\ \beta\in\mathbb{R}\setminus\{\alpha\}.$$

$$\ln L_{q,\rho}(X)=\rho\,\mathcal{E}_{(\frac{q}{1+\rho},q)}(X).$$

$$\mathcal{E}_{(\frac{q}{1+\rho},q)}(X)-\ln(1+\ln|\mathcal{X}|)\leq\frac{1}{\rho}\ln E_{q}[G^{\ast}(X)^{\rho}]\leq\mathcal{E}_{(\frac{q}{1+\rho},q)}(X).$$

$$\mathcal{E}_{(\alpha,\beta)}(X|Y)=\mathcal{E}_{(\alpha,\beta)}(P_{X}|P_{Y})=\frac{\alpha}{\beta-\alpha}\ln\frac{\sum_{y\in\mathcal{Y}}\left(\sum_{x\in\mathcal{X}}P_{X,Y}(x,y)^{\alpha}\right)^{\frac{\beta}{\alpha}}}{\sum_{y\in\mathcal{Y}}\sum_{x\in\mathcal{X}}P_{X,Y}(x,y)^{\beta}},\quad\alpha>0,\ \beta\in\mathbb{R}\setminus\{\alpha\}.$$

$$\mathcal{E}_{(\frac{q}{1+\rho},q)}(X|Y)-\ln(1+\ln|\mathcal{X}|)\leq\frac{1}{\rho}\ln E_{q}[G^{\ast}(X|Y)^{\rho}]\leq\mathcal{E}_{(\frac{q}{1+\rho},q)}(X|Y),$$

Taxonomy

Topics: Statistical Mechanics and Entropy · Advanced Statistical Methods and Models · Mathematical Inequalities and Applications

Full text

Optimal Guessing under Nonextensive Framework and associated Moment Bounds

Abhik Ghosh

Interdisciplinary Statistical Research Unit

Indian Statistical Institute, Kolkata, India

*[email protected] *

Abstract

We consider the problem of guessing the realization of a random variable under the more general Tsallis non-extensive entropic framework rather than the classical Maxwell-Boltzmann-Gibbs-Shannon framework. We consider both the conditional guessing problem, in the presence of some related side information, and the unconditional one, where no such side information is available. For both types of problem, non-extensive moment bounds on the required number of guesses are derived; here we use the $q$-normalized expectation in place of the usual (linear) expectation to define the non-extensive moments. These moment bounds are seen to be a function of the logarithmic norm entropy measure, a recently developed two-parameter generalization of the Renyi entropy, and hence provide an information-theoretic interpretation of that measure. We also consider the case of an uncertain source distribution and derive the non-extensive moment bounds for the corresponding mismatched guessing function. These mismatched bounds are seen to be linked with an important robust statistical divergence family known as the relative $(\alpha,\beta)$-entropies; a similar link is discussed between the optimum mismatched guessing and the extremes of these relative entropy measures.

Keywords: Guessing strategy; Uncertain source; $q$-Normalized expectation; Logarithmic norm entropy; Relative $(\alpha,\beta)$-entropy; Logarithmic super divergence.

1 Introduction

The problem of guessing the realization of a random variable is a well-known and important problem in information theory, motivated by the need to decode a cryptic message at the output of a communication channel [2, 21]. Let $X$ be a random variable taking values in a finite set $\mathcal{X}$, with probability mass function (pmf) $P_{X}(x)$. We want to guess its realization by asking the question “Is $X=x$?”, varying $x\in\mathcal{X}$ sequentially, until the answer is “Yes”. For any guessing strategy, let $G(x)$ denote the number of guesses required to reach the correct conclusion given $X=x$. The optimal guessing strategy, obtained by minimizing $E(G(X))$, is to guess in decreasing order of the probabilities $\{P_{X}(x):x\in\mathcal{X}\}$; the minimum possible value of $E(G(X))$ is further related to the Shannon entropy of $X$ [26], defined as

$$\mathcal{E}(X)=\mathcal{E}(P_{X})=-\sum_{x\in\mathcal{X}}P_{X}(x)\ln P_{X}(x).\tag{1}$$
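The guessing rule described above can be illustrated with a small sketch. The helper names (`optimal_guess_order`, `expected_guesses`, `shannon_entropy`) and the example pmf are illustrative assumptions, not taken from the paper.

```python
import math

def optimal_guess_order(pmf):
    """Map each symbol to its guess number G(x): rank in decreasing probability."""
    ranked = sorted(pmf, key=lambda x: pmf[x], reverse=True)
    return {x: i + 1 for i, x in enumerate(ranked)}

def expected_guesses(pmf, guess):
    """Ordinary (linear) expectation E[G(X)] under the pmf."""
    return sum(pmf[x] * guess[x] for x in pmf)

def shannon_entropy(pmf):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

# Hypothetical four-symbol source.
pmf = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
G = optimal_guess_order(pmf)
worst = {x: len(pmf) + 1 - g for x, g in G.items()}  # reversed order, for contrast

print(G)
print(expected_guesses(pmf, G), expected_guesses(pmf, worst))
print(shannon_entropy(pmf))
```

Guessing the most probable symbols first gives a smaller expected number of guesses than the reversed order, which is the comparison the two printed expectations make.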

If, additionally, a correlated random variable $Y$ taking values in a countable set $\mathcal{Y}$ is available, the objective becomes guessing $X$ for a given value $Y=y$ by a guessing strategy $G(X|Y=y)$, and the best candidate turns out to be guessing in order of decreasing probabilities of $X$ given $Y=y$, say $P_{X|Y}(x|y)$; see [21, 2, 3].

Arikan [3] has further extended the above guessing theory by considering the minimization of the moments of $G(X)$ and providing a tight bound in terms of the Renyi entropy measure [25]. In particular, it has been shown in [3] that, for any $\rho>0$,

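$$\mathcal{E}_{\frac{1}{1+\rho}}(P_{X})-\ln(1+\ln|\mathcal{X}|)\leq\frac{1}{\rho}\ln\left(\min_{G}E[G(X)^{\rho}]\right)\leq\mathcal{E}_{\frac{1}{1+\rho}}(P_{X}),\tag{2}$$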

where $\mathcal{E}_{\alpha}(P_{X})$ denotes the Renyi entropy of order $\alpha$ of the distribution $P_{X}$ (or the underlying random variable $X$ ) given by

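$$\mathcal{E}_{\alpha}(P_{X}):=\frac{1}{1-\alpha}\log\left[\sum_{x\in\mathcal{X}}P_{X}(x)^{\alpha}\right],\quad\alpha>0.\tag{3}$$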

Note that $\mathcal{E}_{1}(P_{X})$ is defined only in the limiting sense as $\alpha\rightarrow 1$ and coincides with the Shannon entropy $\mathcal{E}(P_{X})$ given in (1). For conditional guessing given $Y$, moment bounds similar to (2) are also studied in [3]. However, all these results assume that the true source distribution $P_{X}$ or $P_{X|Y}$ is known in the respective cases.
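A numerical check can make the classical bound of [3] concrete. The sketch below assumes natural logarithms throughout and takes the lower-bound constant as $\ln(1+\ln|\mathcal{X}|)$, in line with the later non-extensive bounds at $q=1$; the function names and the example pmf are illustrative.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (nats); Shannon entropy at alpha = 1."""
    if abs(alpha - 1.0) < 1e-12:
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    return math.log(sum(pi ** alpha for pi in p if pi > 0)) / (1.0 - alpha)

def optimal_moment(p, rho):
    """E[G(X)^rho] when guessing in decreasing order of probability."""
    ordered = sorted(p, reverse=True)
    return sum(pi * (i + 1) ** rho for i, pi in enumerate(ordered))

def arikan_sandwich(p, rho):
    """Return (lower, middle, upper) for the classical moment bound."""
    upper = renyi_entropy(p, 1.0 / (1.0 + rho))
    lower = upper - math.log(1.0 + math.log(len(p)))
    middle = math.log(optimal_moment(p, rho)) / rho
    return lower, middle, upper

p = [0.4, 0.3, 0.2, 0.1]  # hypothetical source
for rho in (0.5, 1.0, 2.0):
    print(rho, arikan_sandwich(p, rho))
```

For each `rho` the middle value should lie between the two outer values.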

The above guessing theory has been extended more recently by [27, 15] to the case of uncertain sources, where the guesser only knows that the source distribution comes from a family $\mathcal{P}$ of pmfs over $\mathcal{X}$. Considering the conditional guessing problem with the joint distribution of $(X,Y)$ denoted by $P$, here one needs to minimize the worst (supremum) value of the penalty or redundancy measure

$$R(P,G)=\frac{1}{\rho}\log E[G(X)^{\rho}]-\frac{1}{\rho}\log E[G_{P}(X)^{\rho}],\tag{4}$$

where $G_{P}$ denotes the optimal guessing strategy when the source has distribution $P$. The supremum and minimum of $R(P,G)$ should be taken, respectively, over $P\in\mathcal{P}$ and over all guessing strategies $G$. The final optimum value $[\min_{G}\sup_{P}R(P,G)]$ in this case of uncertain source also satisfies a moment bound similar to (2), with the Renyi entropy replaced by the corresponding relative entropy measure $\mathcal{RE}_{\alpha}(P,Q_{G})$, where $Q_{G}$ is a distribution on $\mathcal{X}$ such that $G_{Q_{G}}=G$; see [27] for details.

Most, if not all, works on guessing are developed with the ordinary (linear) expectation, the basis of Shannon theory related to the classical Maxwell-Boltzmann-Gibbs (MBG) statistical physics. However, several complex systems have more recently been observed where the predictions of the MBG theory fail, leading to more general entropies and corresponding statistical frameworks. A popular extension is the Tsallis entropy [30] and the associated non-extensive statistics, which have been applied successfully to predict the behavior of many complex systems; see, for example, [34, 33, 32, 17, 16, 10] and the references therein. These works lead to a whole new framework of non-extensive statistical physics, which has also been applied to information science, generalizing the classical results of Shannon coding theory; see [9, 11, 28, 24, 7, 20, 8, 22, 35, 36] among many others.

In this paper, we will extend the guessing theory and related moment inequalities under the Tsallis’ non-extensive frameworks, considering suitably defined generalized expectation in place of the usual (linear) expectation. After some debates [31, 34, 33], it is finally well accepted that the “best” choice of constraints under the non-extensive framework should be given in terms of the $q$ -normalized expectation defined as

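$$E_{q}[G(X)]=\frac{\sum_{x\in\mathcal{X}}G(x)P(x)^{q}}{\sum_{x\in\mathcal{X}}P(x)^{q}},\quad q\in\mathbb{R},\tag{5}$$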

for any function $G(\cdot)$ of $X$ (including the guessing function considered above). Note that, these expectations can also be written as the linear expectation with respect to the $q$ -escort distribution $P_{q}=P^{q}/W_{q}(P)$ , with $W_{q}(P)=\sum_{x\in\mathcal{X}}P(x)^{q}$ , which has its own importance and applications in information theory ([23, 5, 29, 1, 4, 5]). So, it is natural to study the guessing inequalities and the optimal guessing results of [3, 27] in terms of the non-extensive $q$ -normalized expectation in (5), which is the main objective of the present paper.
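The equivalence between the $q$-normalized expectation and a linear expectation under the $q$-escort distribution can be checked with a minimal sketch. It assumes a strictly positive pmf (so that negative $q$ is well defined); the function names `escort` and `q_expectation` are illustrative.

```python
def escort(p, q):
    """q-escort distribution P_q = P^q / W_q(P); assumes every p_i > 0."""
    w = sum(pi ** q for pi in p)
    return [pi ** q / w for pi in p]

def q_expectation(values, p, q):
    """q-normalized expectation of a function given by its values on the support."""
    num = sum(g * pi ** q for g, pi in zip(values, p))
    den = sum(pi ** q for pi in p)
    return num / den

p = [0.5, 0.3, 0.2]   # hypothetical pmf
G = [1, 2, 3]         # guessing function values G(x)

for q in (-1.0, 0.5, 1.0, 2.0):
    via_escort = sum(g * e for g, e in zip(G, escort(p, q)))
    print(q, q_expectation(G, p, q), via_escort)
```

At `q = 1.0` both columns reduce to the ordinary expectation of `G`.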

The major contributions of this paper can be summarized as follows.

• We discuss the optimal guessing strategy, both for unconditional and conditional guessing problems, obtained by minimizing the $q$-normalized moments of the number of guesses under Tsallis’ non-extensive framework, and obtain the moment bounds for the resulting optimum guessing function.

• We obtain a lower bound of the non-extensive moments of the number of guesses required to correctly predict a realization of a discrete random variable $X$ with and without additional side-information. The obtained bound is shown to be tight up to a multiplicative constant for the optimum strategy.

• We provide an information theoretic justification of a newly developed two-parameter family of entropy measures, namely the logarithmic norm entropy (LNE), which was developed as a generalization of the Renyi entropy family. In this paper, we prove a direct connection of the LNE measures with the moment bound of the optimum guessing under the non-extensive framework, indicating a new interpretation of these LNE measures. As a by-product, we further extend the LNE measures to define the corresponding conditional entropy as well.

• We also consider the cases where the source distribution is not exactly known and the guessing is to be done based on a mismatched distribution. For both the unconditional and conditional problems, we derive the $q$-normalized moment bounds for the mismatched guessing functions under the non-extensive framework. The bound is again shown to be tight up to a multiplicative constant for the optimum strategy.

• The moment bounds for non-extensive mismatched guessing are further shown to be linked with the relative $(\alpha,\beta)$-entropy measure, also known as the logarithmic super divergence. These divergences were observed to be extremely useful in robust statistical inference [18, 19]; we provide their information theoretic interpretation from mismatched guessing under non-extensivity.

• Finally, we illustrate that the optimum guessing strategy with a mismatched source under non-extensivity can be obtained by minimizing the maximum of the relative $(\alpha,\beta)$-entropies between the mismatched source and all plausible true source distributions. Non-extensive moment bounds for the resulting optimum guessing function are also derived.

2 Optimal Guessing via Non-Extensive Moment Criterion

2.1 Bounds on the Non-Extensive Moments of the Number of Guesses

Consider the problem of guessing the realization of the random variable $X$ with finite support $\mathcal{X}$ along with the notation of Section 1. We start with proving an important inequality on the non-extensive $q$ -normalized moments of $G(X)$ .

Theorem 2.1

For any arbitrary guessing function $G(X)$ , any $\rho>0$ and any $q\in\mathbb{R}$ , we have

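$$E_{q}[G(X)^{\rho}]\geq(1+\ln|\mathcal{X}|)^{-\rho}\frac{\left[\sum_{x\in\mathcal{X}}P_{X}(x)^{\frac{q}{1+\rho}}\right]^{1+\rho}}{\sum_{x\in\mathcal{X}}P_{X}(x)^{q}}.\tag{6}$$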

**Proof:** For simplicity, let us drop the subscript in $P_{X}(x)$ and use the notation $W_{q}(P)=\sum\limits_{x\in\mathcal{X}}P(x)^{q}$. Now, taking an arbitrary distribution $Q$ on $\mathcal{X}$, we have

[TABLE]

by the application of Jensen’s inequality, where $\mathcal{RE}(Q,P)$ denotes the Kullback-Leibler relative entropy measure [14] defined as

$$\mathcal{RE}(Q,P)=\sum_{x\in\mathcal{X}}Q(x)\ln\frac{Q(x)}{P(x)}.$$

Now, in terms of $\mathcal{E}(P)$ from (1), we get

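$$\sum_{x\in\mathcal{X}}Q(x)\ln G(x)=\mathcal{E}(Q)-\sum_{x\in\mathcal{X}}Q(x)\ln\frac{1}{Q(x)G(x)}\geq\mathcal{E}(Q)-\ln\sum_{x\in\mathcal{X}}\frac{1}{G(x)},$$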

by another application of Jensen’s inequality. But, we know

$$\sum_{x\in\mathcal{X}}\frac{1}{G(x)}=\sum_{i=1}^{|\mathcal{X}|}\frac{1}{i}\leq 1+\ln|\mathcal{X}|.$$

Hence, combining above equations, we get

$$\sum_{x\in\mathcal{X}}Q(x)\ln G(x)\geq\mathcal{E}(Q)-\ln(1+\ln|\mathcal{X}|).\tag{9}$$

Further, simple algebra yields

$$\sum_{x\in\mathcal{X}}Q(x)\ln P(x)=-\mathcal{E}(Q)-\mathcal{RE}(Q,P).\tag{10}$$

Now, substituting (9) and (10) in (7), we get

[TABLE]

Finally, by the standard Lagrange multiplier arguments, one can show that, given $P$ , the quantity $\left[(\rho-q+1)\mathcal{E}(Q)-q\mathcal{RE}(Q,P)\right]$ is maximized over $Q$ at the distribution

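$$Q^{\ast}(x)=\frac{P(x)^{\frac{q}{1+\rho}}}{\sum_{x'\in\mathcal{X}}P(x')^{\frac{q}{1+\rho}}},\quad x\in\mathcal{X}.$$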

Hence, a tight bound of $E_{q}[G(X)^{\rho}]$ can be obtained by substituting the above choice of $Q=Q^{\ast}$ in (11), which leads to the desired result (6). $\square$
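The chain of inequalities in this proof can be written out in one place. The following is a sketch of one way to arrange the steps, assuming $Q(x)>0$ for all $x\in\mathcal{X}$ and $\rho>0$; the intermediate displays of the original may be organized differently.

```latex
% Step 1: insert Q and apply Jensen's inequality (convexity of exp),
% then use the two bounds on \sum Q ln G and \sum Q ln P referred to as (9) and (10).
\begin{align*}
E_q[G(X)^{\rho}]
 &= \frac{1}{W_q(P)} \sum_{x\in\mathcal{X}} Q(x)\,
    \exp\!\left\{-\ln\frac{Q(x)}{P(x)^{q}\,G(x)^{\rho}}\right\} \\
 &\geq \frac{1}{W_q(P)} \exp\!\left\{\mathcal{E}(Q)
    + q\sum_{x\in\mathcal{X}} Q(x)\ln P(x)
    + \rho\sum_{x\in\mathcal{X}} Q(x)\ln G(x)\right\} \\
 &\geq \frac{(1+\ln|\mathcal{X}|)^{-\rho}}{W_q(P)}
    \exp\!\left\{(\rho-q+1)\,\mathcal{E}(Q) - q\,\mathcal{RE}(Q,P)\right\}.
\end{align*}

% Step 2: the exponent is maximized at Q = Q^* (Jensen, concavity of ln).
\begin{align*}
(\rho-q+1)\,\mathcal{E}(Q) - q\,\mathcal{RE}(Q,P)
 &= (1+\rho)\sum_{x\in\mathcal{X}} Q(x)\ln\frac{P(x)^{q/(1+\rho)}}{Q(x)} \\
 &\leq (1+\rho)\,\ln\sum_{x\in\mathcal{X}} P(x)^{q/(1+\rho)},
\end{align*}
% with equality when Q(x) is proportional to P(x)^{q/(1+\rho)}.
```

Substituting the maximal value of the exponent gives the factor $\left[\sum_{x}P(x)^{q/(1+\rho)}\right]^{1+\rho}/W_{q}(P)$ that appears in (6).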

Next we consider the conditional guessing problem with notation of Section 1 and derive the lower bound on the non-extensive moment of $G(X|Y)$ which is presented in the following theorem. Here we denote the joint pmf of $(X,Y)$ by $P_{X,Y}(x,y)$ and the marginal pmf of $Y$ by $P_{Y}(y)$ ; note that $P_{X,Y}(x,y)=P_{X|Y}(x|y)P_{Y}(y)$ .

Theorem 2.2

For any arbitrary conditional guessing function $G(X|Y)$ , any $\rho>0$ and any $q\in\mathbb{R}$ , we have

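$$E_{q}[G(X|Y)^{\rho}]\geq(1+\ln|\mathcal{X}|)^{-\rho}\frac{\sum_{y\in\mathcal{Y}}\left[\sum_{x\in\mathcal{X}}P_{X,Y}(x,y)^{\frac{q}{1+\rho}}\right]^{1+\rho}}{\sum_{y\in\mathcal{Y}}\sum_{x\in\mathcal{X}}P_{X,Y}(x,y)^{q}}.$$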

**Proof:** We break the joint probability into conditional and marginal probabilities, and apply Theorem 2.1 to the conditional expectation, to get

[TABLE]

This proves the theorem. $\square$

It is interesting to note another interpretation of these lower bounds obtained in above two theorems through the escort distribution. Let us denote by $P_{q}(x,y)$ , $P_{q}(x|y)$ and $P_{q}(x)$ the escort distributions corresponding to the joint, conditional and marginal pmfs $P_{X,Y}(x,y)$ , $P_{X|Y}(x|y)$ and $P_{X}(x)$ , respectively. Then we can rewrite the main results of the previous two theorems as

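$$E_{q}[G(X)^{\rho}]\geq(1+\ln|\mathcal{X}|)^{-\rho}\left[\sum_{x\in\mathcal{X}}P_{q}(x)^{\frac{1}{1+\rho}}\right]^{1+\rho},\tag{14}$$

$$E_{q}[G(X|Y)^{\rho}]\geq(1+\ln|\mathcal{X}|)^{-\rho}\sum_{y\in\mathcal{Y}}P_{q}(\cdot,y)\left[\sum_{x\in\mathcal{X}}P_{q}(x|y)^{\frac{1}{1+\rho}}\right]^{1+\rho},\tag{15}$$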

where $P_{q}(\cdot,y)=\sum_{x\in\mathcal{X}}P_{q}(x,y)$ is the marginal escort distribution of $Y$. Note that, at $q=1$, all escort distributions coincide with the respective original distributions and $P_{1}(\cdot,y)=P_{Y}(y)$; hence our results coincide with those of [3] at $q=1$. This links our results with the classical ones through the concept of the escort distribution under the non-extensive framework.

Further, the above lower bounds in (14) and (15) are valid for any guessing function, not necessarily the optimal one. In the following subsection, we will define the optimal guessing strategy and develop a complementary upper bound of the non-extensive moments of the optimum number of guesses.

2.2 Optimal Guessing under Non-Extensivity

Let us first consider the conditional guessing problem. We call a guessing strategy $G(X|Y)$ to be optimal under $q$ -non-extensivity if it minimizes the non-extensive moments $E_{q}[G(X|Y)^{\rho}]$ simultaneously for all $\rho>0$ . Note that, in terms of the escort distributions, we can write

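$$E_{q}[G(X|Y)^{\rho}]=\sum_{y\in\mathcal{Y}}P_{q}(\cdot,y)\sum_{x\in\mathcal{X}}P_{q}(x|y)G(x|y)^{\rho},\tag{16}$$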

which is minimized by the guessing function $G^{\ast}(X|Y)$ satisfying

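$$G^{\ast}(x|y)<G^{\ast}(x'|y)\Rightarrow P_{q}(x|y)\geq P_{q}(x'|y),\quad\text{for all }x,x'\in\mathcal{X},\ y\in\mathcal{Y}.\tag{17}$$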

Therefore, the optimal guessing rule $G^{\ast}(X|Y)$ is to guess the values of $X$ , given $Y=y$ , in decreasing order of the $q$ -escort distribution $P_{q}(x|y)$ of the conditional (posterior) pmf $P_{X|Y}(x|y)$ . This optimal guessing rule is unique if and only if $P_{q}(x|y)$ or equivalently $P_{X|Y}(x|y)$ is distinct over $x\in\mathcal{X}$ for any given $Y=y$ ; this is exactly the same uniqueness condition as in the case of classical optimal guessing strategy of [3].
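The rule of guessing in decreasing order of the conditional $q$-escort distribution can be sketched as follows. The nested-dictionary layout `joint[y][x]`, the function names, and the example joint pmf are assumptions made for illustration; the sketch requires strictly positive probabilities.

```python
def conditional_escort(joint, q):
    """P_q(x|y) for a joint pmf given as joint[y][x] = P(x, y) > 0.

    P(x|y)^q / sum_x' P(x'|y)^q equals P(x,y)^q / sum_x' P(x',y)^q,
    because the marginal P_Y(y)^q cancels.
    """
    out = {}
    for y, row in joint.items():
        w = sum(p ** q for p in row.values())
        out[y] = {x: p ** q / w for x, p in row.items()}
    return out

def optimal_conditional_guess(joint, q):
    """G*(x|y): rank of x in decreasing order of P_q(x|y), for each y."""
    guess = {}
    for y, row in conditional_escort(joint, q).items():
        ranked = sorted(row, key=lambda x: row[x], reverse=True)
        guess[y] = {x: i + 1 for i, x in enumerate(ranked)}
    return guess

# Hypothetical joint pmf over X = {a, b, c} and Y = {0, 1}.
joint = {
    0: {"a": 0.30, "b": 0.15, "c": 0.05},
    1: {"a": 0.05, "b": 0.25, "c": 0.20},
}
print(optimal_conditional_guess(joint, 2.0))
print(optimal_conditional_guess(joint, -1.0))
```

Since $t\mapsto t^{q}$ is increasing for $q>0$ and decreasing for $q<0$, the escort ordering agrees with the ordering of $P_{X|Y}(x|y)$ for positive $q$ and reverses it for negative $q$, which is what the two printed guessing functions show.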

Next, note that we already have a lower bound on the moments of the optimal guessing function $G^{\ast}(X|Y)$ from Theorem 2.2. The following theorem presents the corresponding upper bound, which is tight within a multiplicative factor of the lower bound.

Theorem 2.3

For the optimal guessing function $G^{\ast}(X|Y)$ under $q$ -non-extensivity with any $q\in\mathbb{R}$ and for any $\rho>0$ , we have

[TABLE]

**Proof:** We know that the optimal rule $G^{\ast}(x|y)$ satisfies (17) and hence we have

[TABLE]

Therefore, we get

[TABLE]

The second part follows by straightforward algebra using the definitions of escort distributions. $\square$
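For orientation, the upper bound can be obtained by the standard counting argument of [3] applied to the escort distribution. The following sketch is written under that assumption and is not a transcription of the displays of the original proof.

```latex
% For fixed y, every x' guessed no later than x has P_q(x'|y) >= P_q(x|y) by (17),
% so each such x' contributes a ratio of at least 1 to the sum below.
\begin{align*}
G^{\ast}(x|y)
 &= \sum_{x' :\, G^{\ast}(x'|y)\,\leq\, G^{\ast}(x|y)} 1
 \;\leq\; \sum_{x'\in\mathcal{X}}
   \left[\frac{P_q(x'|y)}{P_q(x|y)}\right]^{\frac{1}{1+\rho}}.
\end{align*}

% Raise to the power rho and average over x with weights P_q(x|y).
\begin{align*}
\sum_{x\in\mathcal{X}} P_q(x|y)\,G^{\ast}(x|y)^{\rho}
 &\leq \sum_{x\in\mathcal{X}} P_q(x|y)^{1-\frac{\rho}{1+\rho}}
   \left[\sum_{x'\in\mathcal{X}} P_q(x'|y)^{\frac{1}{1+\rho}}\right]^{\rho}
 = \left[\sum_{x\in\mathcal{X}} P_q(x|y)^{\frac{1}{1+\rho}}\right]^{1+\rho}.
\end{align*}
```

Averaging the last display over $y$ with weights $P_{q}(\cdot,y)$, as in (16), gives an upper bound of the same form as the right-hand side of the lower bound without the factor $(1+\ln|\mathcal{X}|)^{-\rho}$.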

Let us now identify and characterize the bound on the $q$-non-extensive moments of the optimal guessing function $G^{\ast}(X|Y)$, which is given by the right-hand side of (18); let us denote this quantity by $L_{q,\rho}(X|Y)$. This bound is tight up to a multiplicative factor of $(1+\ln|\mathcal{X}|)^{\rho}$, i.e.,

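$$(1+\ln|\mathcal{X}|)^{-\rho}L_{q,\rho}(X|Y)\leq E_{q}[G^{\ast}(X|Y)^{\rho}]\leq L_{q,\rho}(X|Y).\tag{20}$$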

In a similar manner, one can also deduce a similar tight bound for the unconditional guessing problem. The optimal guessing rule $G^{\ast}(X)$ to guess the values of $X$ under $q$-non-extensivity, defined as the simultaneous minimizer of the non-extensive moments $E_{q}[G(X)^{\rho}]$ for all $\rho>0$, is given by the decreasing order of the $q$-escort distribution $P_{q}(x)$ of $X$ and satisfies the moment inequality

$$(1+\ln|\mathcal{X}|)^{-\rho}L_{q,\rho}(X)\leq E_{q}[G^{\ast}(X)^{\rho}]\leq L_{q,\rho}(X),\tag{21}$$

where the bound $L_{q,\rho}(X)$ is the one in Theorem 2.1, i.e.,

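$$L_{q,\rho}(X)=\frac{\left[\sum_{x\in\mathcal{X}}P_{X}(x)^{\frac{q}{1+\rho}}\right]^{1+\rho}}{\sum_{x\in\mathcal{X}}P_{X}(x)^{q}}=\left[\sum_{x\in\mathcal{X}}P_{q}(x)^{\frac{1}{1+\rho}}\right]^{1+\rho}.\tag{22}$$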
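The two-sided moment inequality for the unconditional problem can be checked numerically. This is a minimal sketch for a strictly positive pmf; the names `optimal_q_moment`, `bound_L`, and `sandwich_holds` are illustrative, and the pmf is hypothetical.

```python
import math

def escort(p, q):
    """q-escort distribution of a strictly positive pmf."""
    w = sum(pi ** q for pi in p)
    return [pi ** q / w for pi in p]

def optimal_q_moment(p, q, rho):
    """E_q[G*(X)^rho] with G* ordering symbols by decreasing escort probability."""
    ranked = sorted(escort(p, q), reverse=True)
    return sum(e * (i + 1) ** rho for i, e in enumerate(ranked))

def bound_L(p, q, rho):
    """L_{q,rho}(X) = [sum_x P(x)^(q/(1+rho))]^(1+rho) / sum_x P(x)^q."""
    num = sum(pi ** (q / (1 + rho)) for pi in p) ** (1 + rho)
    return num / sum(pi ** q for pi in p)

def sandwich_holds(p, q, rho, tol=1e-12):
    """Check (1 + ln|X|)^(-rho) * L <= E_q[G*^rho] <= L up to rounding."""
    moment = optimal_q_moment(p, q, rho)
    upper = bound_L(p, q, rho)
    lower = (1 + math.log(len(p))) ** (-rho) * upper
    return lower - tol <= moment <= upper + tol

p = [0.4, 0.3, 0.2, 0.1]
for q in (-1.0, 0.5, 1.0, 2.0):
    for rho in (0.5, 1.0, 3.0):
        print(q, rho, sandwich_holds(p, q, rho))
```

Every combination of `q` and `rho` in the loop should print `True`.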

*Relation of the bounds with a generalization of Renyi entropy:* It is interesting to note that the above bounds $L_{q,\rho}(X)$ and $L_{q,\rho}(X|Y)$ are directly linked with a recent generalized entropy measure, namely the logarithmic norm entropy (LNE) of [13]. The LNE of the distribution $P_{X}$ of $X$ is defined in terms of two parameters $\alpha,\beta$ as

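$$\mathcal{E}_{(\alpha,\beta)}(X)=\mathcal{E}_{(\alpha,\beta)}(P_{X})=\frac{\alpha\beta}{\beta-\alpha}\ln\frac{\left(\sum_{x\in\mathcal{X}}P_{X}(x)^{\alpha}\right)^{1/\alpha}}{\left(\sum_{x\in\mathcal{X}}P_{X}(x)^{\beta}\right)^{1/\beta}},\quad\alpha>0,\ \beta\in\mathbb{R}\setminus\{\alpha\}.\tag{23}$$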

It coincides with the classical Renyi entropy measure (3) if either of the two parameters equals one and hence provides a two parameter generalization of Renyi entropy. The one parameter subclass at $\alpha=\beta$ is defined in the limiting sense and includes the Shannon entropy (1) at $\alpha=\beta=1$ ; see [12, 13] for more details.

By simple algebra, one can see that the bound $L_{q,\rho}(X)$ in the unconditional problem is indeed given by

$$\ln L_{q,\rho}(X)=\rho\,\mathcal{E}_{(\frac{q}{1+\rho},q)}(X).\tag{24}$$

Then, the moment inequality in (21) can be rewritten as

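$$\mathcal{E}_{(\frac{q}{1+\rho},q)}(X)-\ln(1+\ln|\mathcal{X}|)\leq\frac{1}{\rho}\ln E_{q}[G^{\ast}(X)^{\rho}]\leq\mathcal{E}_{(\frac{q}{1+\rho},q)}(X).\tag{25}$$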

This provides a new interesting interpretation of the newly proposed LNE measure through the non-extensive information theory, as well as the corresponding optimal guessing.
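The link between the bound $L_{q,\rho}(X)$ and the LNE, and the reduction of the LNE to the Renyi entropy, can be verified numerically. The sketch below uses the two-norm ratio form of the LNE with natural logarithms and assumes $\alpha\neq\beta$ with both parameters nonzero; the function names and the pmf are illustrative.

```python
import math

def lne(p, alpha, beta):
    """Logarithmic norm entropy; assumes alpha != beta and both nonzero."""
    norm_a = sum(pi ** alpha for pi in p) ** (1.0 / alpha)
    norm_b = sum(pi ** beta for pi in p) ** (1.0 / beta)
    return alpha * beta / (beta - alpha) * math.log(norm_a / norm_b)

def renyi(p, alpha):
    """Renyi entropy of order alpha != 1, in nats."""
    return math.log(sum(pi ** alpha for pi in p)) / (1.0 - alpha)

def log_bound_L(p, q, rho):
    """ln L_{q,rho}(X), computed directly from its definition."""
    return ((1 + rho) * math.log(sum(pi ** (q / (1 + rho)) for pi in p))
            - math.log(sum(pi ** q for pi in p)))

p = [0.4, 0.3, 0.2, 0.1]
q, rho = 2.0, 1.5

# ln L_{q,rho}(X) should equal rho times the LNE with (alpha, beta) = (q/(1+rho), q).
print(log_bound_L(p, q, rho), rho * lne(p, q / (1 + rho), q))

# With one parameter equal to one, the LNE should reduce to the Renyi entropy.
print(lne(p, 0.5, 1.0), renyi(p, 0.5))
print(lne(p, 1.0, 2.0), renyi(p, 2.0))
```

Each printed pair should agree up to floating-point rounding.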

A similar interpretation of the conditional moment bound $L_{q,\rho}(X|Y)$ can also be obtained if we extend the definition of the LNE measure to define the Conditional logarithmic norm entropy (CLNE) measure as

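$$\mathcal{E}_{(\alpha,\beta)}(X|Y)=\mathcal{E}_{(\alpha,\beta)}(P_{X}|P_{Y})=\frac{\alpha}{\beta-\alpha}\ln\frac{\sum_{y\in\mathcal{Y}}\left(\sum_{x\in\mathcal{X}}P_{X,Y}(x,y)^{\alpha}\right)^{\frac{\beta}{\alpha}}}{\sum_{y\in\mathcal{Y}}\sum_{x\in\mathcal{X}}P_{X,Y}(x,y)^{\beta}},\quad\alpha>0,\ \beta\in\mathbb{R}\setminus\{\alpha\}.\tag{26}$$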

Then, we can derive that $\ln L_{q,\rho}(X|Y)=\rho\mathcal{E}_{(\frac{q}{1+\rho},q)}(X|Y)$, and hence the moment inequality (20) can be rewritten as

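$$\mathcal{E}_{(\frac{q}{1+\rho},q)}(X|Y)-\ln(1+\ln|\mathcal{X}|)\leq\frac{1}{\rho}\ln E_{q}[G^{\ast}(X|Y)^{\rho}]\leq\mathcal{E}_{(\frac{q}{1+\rho},q)}(X|Y),$$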

i.e.,

[TABLE]

This final equation generalizes Arikan’s [3] guessing theorem for the non-extensive expectation. Further, along with providing the bound for optimal guessing, we additionally obtain a new two-parameter family of conditional entropy measure in (26) which coincides with the Renyi conditional entropy if either $\alpha$ or $\beta$ equals one. We can further extend this LNE family at $\alpha=\beta$ through continuous limit which yields

[TABLE]

We hope to develop more interesting properties of these new CLNE measures in future works. An immediate property in the context of optimal guessing is obtained by taking limit as $\rho\rightarrow 0^{+}$ in (27) which gives

[TABLE]
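The conditional counterpart can be checked in the same way. The sketch below takes the CLNE as $\frac{\alpha}{\beta-\alpha}\ln\big[\sum_{y}(\sum_{x}P_{X,Y}(x,y)^{\alpha})^{\beta/\alpha}\big/\sum_{y}\sum_{x}P_{X,Y}(x,y)^{\beta}\big]$ and the bound $L_{q,\rho}(X|Y)$ as the ratio appearing in Theorem 2.2; the list-of-rows layout (one row per value of $Y$), the function names, and the joint pmf are illustrative assumptions.

```python
import math

def clne(joint, alpha, beta):
    """Conditional logarithmic norm entropy for joint[y][x] = P(x, y) > 0."""
    num = sum(sum(p ** alpha for p in row) ** (beta / alpha) for row in joint)
    den = sum(p ** beta for row in joint for p in row)
    return alpha / (beta - alpha) * math.log(num / den)

def bound_L_cond(joint, q, rho):
    """L_{q,rho}(X|Y): sum_y [sum_x P^(q/(1+rho))]^(1+rho) / sum_{x,y} P^q."""
    num = sum(sum(p ** (q / (1 + rho)) for p in row) ** (1 + rho) for row in joint)
    den = sum(p ** q for row in joint for p in row)
    return num / den

def optimal_cond_moment(joint, q, rho):
    """E_q[G*(X|Y)^rho], guessing within each row in decreasing order of P^q."""
    w = sum(p ** q for row in joint for p in row)
    total = 0.0
    for row in joint:
        ranked = sorted((p ** q for p in row), reverse=True)
        total += sum(v * (i + 1) ** rho for i, v in enumerate(ranked))
    return total / w

joint = [[0.30, 0.15, 0.05],   # y = 0
         [0.05, 0.25, 0.20]]   # y = 1
q, rho = 2.0, 1.0
n_x = len(joint[0])

L = bound_L_cond(joint, q, rho)
moment = optimal_cond_moment(joint, q, rho)
print(math.log(L), rho * clne(joint, q / (1 + rho), q))
print((1 + math.log(n_x)) ** (-rho) * L, moment, L)
```

The first line should print two matching values, and the second should print three values in increasing order.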

3 The Cases of Uncertain Source Distribution

We have studied the optimal guessing strategy and its non-extensive moments in the previous section, where we assumed that the true joint distribution $P_{X,Y}(x,y)$ is known. Let us now consider the case of an uncertain source, where the true distribution $P_{X,Y}(x,y)$ is not known and it is only known that $P_{X,Y}$ comes from a family of probability distributions $\mathcal{P}$ over $\mathcal{X}\times\mathcal{Y}$. As noted in the introduction, in such a case the optimal guessing strategy needs to be obtained by minimizing the worst (supremum) value of the penalty or redundancy measure $R(P,G)$ defined in (4); however, we will use the $q$-normalized expectations under the non-extensive framework of the present paper.

Let us first focus on the conditional guessing problem, for which the $q$-non-extensive optimal guessing strategy $G^{\ast}(X|Y)$ was studied in the previous section; it guesses the values of $X$ given $Y=y$ in decreasing order of $P_{q}(x|y)$ obtained from $P_{X,Y}$. From now on, let us drop the subscript in $P_{X,Y}$ and denote the optimal strategy $G^{\ast}$ obtained from $P=P_{X,Y}$ by $G^{\ast}_{P}(X|Y)$. However, due to the lack of knowledge, we can only guess based on another (joint) pmf $Q(x,y)$; let us denote the corresponding guessing strategy by $G_{Q}^{\ast}(X|Y)$, which guesses the values of $X$ given $Y=y$ in decreasing order of $Q_{q}(x|y)$, the $q$-escort distribution of the conditional pmf $Q(x|y)=Q(x,y)/\sum_{x'\in\mathcal{X}}Q(x',y)$. We start by deriving bounds for the non-extensive $q$-normalized expectation of the guessing function $G_{Q}^{\ast}(X|Y)$ under the true (but unknown) source distribution $P(x,y)$ in the following two theorems.

Theorem 3.1

Under the non-extensive conditional guessing problem with uncertain source, for any $\rho>0$ and any $q\in\mathbb{R}$ , we have

[TABLE]

where the expectation is taken with respect to the joint distribution $P(x,y)$ .

**Proof:** The proof follows from the definition of $G_{Q}^{\ast}$ and (16) by observing that

[TABLE]

$\square$

Theorem 3.2

Under the non-extensive conditional guessing problem, let $G(X|Y)$ denote any arbitrary guessing strategy and let $\rho>0$ , $q\in\mathbb{R}\setminus\{0\}$ . Then, there is a pmf $Q^{(G)}$ , depending on $G$ , with support $\mathcal{X}\times\mathcal{Y}$ which satisfies

[TABLE]

where the expectation is taken with respect to the joint distribution $P(x,y)$, and $Q_{q}^{(G)}(x|y)$ denotes the $q$-escort distribution of the conditional pmf $Q^{(G)}(x|y)=Q^{(G)}(x,y)/\sum_{x'\in\mathcal{X}}Q^{(G)}(x',y)$.

**Proof:** Let us define, for $\rho>0$, $q\in\mathbb{R}\setminus\{0\}$ and for each $y\in\mathcal{Y}$,

[TABLE]

Note that, clearly $s_{\rho,q}$ is independent of $y\in\mathcal{Y}$ and is finite for all $\rho>0$ and $q\in\mathbb{R}-\{0\}$ . Given the guessing strategy $G(x,y)$ , define the joint pmf $Q^{(G)}$ on $\mathcal{X}\times\mathcal{Y}$ as

[TABLE]

It is easy to verify that $Q^{(G)}$ is a joint pmf with support $\mathcal{X}\times\mathcal{Y}$ and

[TABLE]

Now, using the above formula $Q_{q}^{(G)}(x|y)$ , we get

[TABLE]

Then the theorem follows by noting that $s_{0,1}=\sum\limits_{i=1}^{|\mathcal{X}|}\frac{1}{i}\leq 1+\ln|\mathcal{X}|$ . $\square$

Let us denote the right hand side of (30) by $L_{q,\rho}^{\ast}(P,Q)$ . Then, combining the results from Theorems 3.1 and 3.2, we have the following non-extensive moment bound for mismatched guessing strategy

[TABLE]

where the expectation is taken with respect to the joint distribution $P(x,y)$ . Note that (33) complements (20) for the cases of uncertain source; they coincide when the source is known, i.e., when $Q=P$ .

To get physical interpretation of the above bounds, let us define

[TABLE]

where the second term is as defined in (26) from $P$ . After some algebra, one can simplify this measure to have the form

[TABLE]

For the case when $|\mathcal{Y}|=1$, i.e., the case of no additional information to condition upon, our joint pmfs $P(x,y)$ and $Q(x,y)$ may be thought of as pmfs of $X$ only over $\mathcal{X}$, say $P_{X}(x)$ and $Q_{X}(x)$. In this case the above measure simplifies to

[TABLE]

which is exactly the relative $(\alpha,\beta)$ -entropy studied in [12]. This provides a two-parameter generalization of the relative $\alpha$ -entropy of [15] or equivalently of the Renyi divergence family. For all $\alpha>0$ and $\beta\in\mathbb{R}$ , it has been shown that the relative $(\alpha,\beta)$ -entropy is indeed a proper statistical divergence and hence $\mathcal{RE}_{(\alpha,\beta)}(P,Q)\geq 0$ with equality if and only if $P=Q$ [12]. This particular two parameter divergence family has further importance in robust statistical inference, where it was referred to as the logarithmic super divergence family [18, 19].

Recalling from (4), let us now define our target redundancy measure for the conditional guessing problem under non-extensivity by using $q$ -normalized expectation

[TABLE]

where the expectation is taken with respect to the joint distribution $P(x,y)$ . We can obtain the bound on its value from (33) which is presented in the following theorem.

Theorem 3.3

For the conditional guessing problem under non-extensivity, let $G(X|Y)$ denote any arbitrary guessing strategy and let $\rho>0$, $q\in\mathbb{R}$. Let $Q^{(G)}$ be the pmf associated with $G$ as obtained from Theorem 3.2. Then

[TABLE]

**Proof:** Substituting $Q=Q^{(G)}$ in Theorem 3.1, and using (27) along with the definition of $\mathcal{RE}_{(\alpha,\beta)}(P,Q)$, one can easily deduce

[TABLE]

On the other hand, from Theorem 3.2 and (27), one can conclude

[TABLE]

which completes the proof. $\square$
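The effect of guessing from a mismatched pmf can be explored numerically in the unconditional case ($|\mathcal{Y}|=1$). The sketch below assumes that the non-extensive redundancy is the direct analogue of (4) with $q$-normalized expectations taken under the true pmf, i.e. $\frac{1}{\rho}\ln E_{q}[G(X)^{\rho}]-\frac{1}{\rho}\ln E_{q}[G_{P}^{\ast}(X)^{\rho}]$; the function names and both pmfs are hypothetical.

```python
import math

def escort(p, q):
    """q-escort distribution of a strictly positive pmf."""
    w = sum(pi ** q for pi in p)
    return [pi ** q / w for pi in p]

def guess_from(model, q):
    """Ranks G(x) obtained by ordering symbols by decreasing q-escort of `model`."""
    esc = escort(model, q)
    order = sorted(range(len(model)), key=lambda i: esc[i], reverse=True)
    ranks = [0] * len(model)
    for r, i in enumerate(order):
        ranks[i] = r + 1
    return ranks

def q_moment(p_true, ranks, q, rho):
    """E_q[G(X)^rho] under the true pmf."""
    return sum(e * g ** rho for e, g in zip(escort(p_true, q), ranks))

def redundancy(p_true, ranks, q, rho):
    """Assumed q-analogue of the redundancy (4), in nats."""
    best = q_moment(p_true, guess_from(p_true, q), q, rho)
    return (math.log(q_moment(p_true, ranks, q, rho)) - math.log(best)) / rho

p_true = [0.5, 0.3, 0.15, 0.05]   # true (unknown) source
model = [0.1, 0.2, 0.3, 0.4]      # mismatched pmf used for guessing
q, rho = 2.0, 1.0

print(redundancy(p_true, guess_from(model, q), q, rho))   # positive penalty
print(redundancy(p_true, guess_from(p_true, q), q, rho))  # no mismatch
```

The redundancy is nonnegative by construction, since the matched strategy minimizes the $q$-normalized moment, and it vanishes when the guessing pmf induces the same ordering as the true one.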

Now, an optimal guessing strategy under source mismatch should work well for all possible true distributions $P\in\mathcal{P}$, and hence we should aim to minimize the worst-case redundancy measure given by $\sup\limits_{P\in\mathcal{P}}R_{q}(P,G)$. From Theorem 3.3, it is expected that this optimal guessing strategy can be obtained from a pmf $Q$ that minimizes $\sup\limits_{P\in\mathcal{P}}q\mathcal{RE}_{(\frac{q}{1+\rho},q)}(P,Q)$, or equivalently $\sup\limits_{P\in\mathcal{P}}\mathcal{RE}_{(\frac{q}{1+\rho},q)}(P,Q)$ if $q>0$. We will now rigorously prove that this is indeed the case up to a factor of $\ln(1+\ln|\mathcal{X}|)$. We start with the definition

[TABLE]

where we have assumed that the minimum exists and is attained, say at a pmf $Q^{\ast}$ . Then, we finally get an idea about how to find optimal guessing strategy and a bound of the worst-case redundancy value in terms of $C_{q,\rho}$ which is presented in our final theorem below.

Theorem 3.4

Under the non-extensive conditional guessing problem, let $\rho>0$ and $q\in\mathbb{R}$ be such that $C_{q,\rho}$ exists and is attained at $Q^{\ast}$ . Then, for any arbitrary guessing strategy $G(X|Y)$ , we have

[TABLE]

Conversely, there exists a guessing strategy $\widetilde{G^{\ast}}(X|Y)$ that satisfies

[TABLE]

**Proof:** For any arbitrary guessing strategy $G(X|Y)$, Theorem 3.3 gives

[TABLE]

Taking supremum over $P\in\mathcal{P}$ , we get

[TABLE]

For the converse, note that, $C_{q,\rho}=\sup_{P\in\mathcal{P}}q\mathcal{RE}_{(\frac{q}{1+\rho},q)}(P,Q^{\ast})$ by definition. Take $\widetilde{G^{\ast}}=G_{Q^{\ast}}^{\ast}$ . Then, as in the proof of Theorem 3.3, we get from Theorem 3.1 and (27) that

[TABLE]

Taking supremum over $P\in\mathcal{P}$ , we get

[TABLE]

This completes the proof. $\square$
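For very small alphabets, the worst-case redundancy can be minimized by exhaustive search over guessing orders. This brute-force search is a swapped-in illustration, not the divergence-minimization route used in the theorem; it treats the unconditional case, assumes the $q$-analogue of the redundancy (4), and uses hypothetical function names and pmfs.

```python
import itertools
import math

def escort(p, q):
    """q-escort distribution of a strictly positive pmf."""
    w = sum(pi ** q for pi in p)
    return [pi ** q / w for pi in p]

def q_moment(p, ranks, q, rho):
    """E_q[G(X)^rho] under pmf p for the guessing function given by `ranks`."""
    return sum(e * g ** rho for e, g in zip(escort(p, q), ranks))

def optimal_moment(p, q, rho):
    """Smallest achievable E_q[G(X)^rho]: guess in decreasing escort order."""
    ranked = sorted(escort(p, q), reverse=True)
    return sum(e * (i + 1) ** rho for i, e in enumerate(ranked))

def redundancy(p, ranks, q, rho):
    """Assumed q-analogue of the redundancy (4), in nats."""
    return (math.log(q_moment(p, ranks, q, rho))
            - math.log(optimal_moment(p, q, rho))) / rho

def minimax_guess(family, q, rho):
    """Search all guessing orders for the smallest worst-case redundancy."""
    n = len(family[0])
    best_ranks, best_worst = None, float("inf")
    for ranks in itertools.permutations(range(1, n + 1)):
        worst = max(redundancy(p, ranks, q, rho) for p in family)
        if worst < best_worst:
            best_ranks, best_worst = ranks, worst
    return best_ranks, best_worst

family = [[0.5, 0.3, 0.15, 0.05],
          [0.1, 0.2, 0.3, 0.4]]
q, rho = 2.0, 1.0
print(minimax_guess(family, q, rho))
print(minimax_guess(family[:1], q, rho))  # single known source: zero redundancy
```

The search visits $|\mathcal{X}|!$ orders, so it is only practical for tiny alphabets; its role here is to give a reference value against which a strategy built from a minimizing pmf $Q^{\ast}$ could be compared.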

4 Conclusion

We have studied the guessing problem under the non-extensive framework with the $q$-normalized expectation. Our results generalize the classical guessing results with the usual expectation that formed the basis of Shannon coding theory. Hence, a natural follow-up would be to apply our results to extend the Shannon coding theorem and related theory, which will be helpful for developing and analyzing more complex communication channels and related information theoretic problems. Our work opens up a new direction towards non-extensive information theory, which we hope to study in more detail in future work.

Bibliography

[1] Abe, S. (2003). Geometry of escort distributions. Physical Review E, 68(3), 031101.
[2] Arikan, E. (1994). On the average number of guesses required to determine the value of a random variable. In Proc. 12th Prague Conf. on Information Theory, Statistical Decision Functions and Random Processes, Prague, Czech Republic. 20–23.
[3] Arikan, E. (1996). An inequality on guessing and its application to sequential decoding. IEEE Transactions on Information Theory, 42(1), 99–105.
[4] Beck, C. (2004). Superstatistics, escort distributions, and applications. Physica A, 342(1-2), 139–144.
[5] Bercher, J. F. (2009). Source coding with escort distributions and Renyi entropy bounds. Physics Letters A, 373(36), 3235–3238.
[6] Bercher, J. F. (2011). On escort distributions, q-gaussians and Fisher information. AIP Conference Proceedings, 1305(1), 208–215.
[7] Bialek, W., Nemenman, I., and Tishby, N. (2001). Complexity through nonextensivity. Physica A: Statistical Mechanics and its Applications, 302, 89–99.
[8] Borland, L., Plastino, A. R., and Tsallis, C. (1998). Information gain within nonextensive thermostatistics. Journal of Mathematical Physics, 39(12), 6490–6501.