On the Polarization of Rényi Entropy

Mengfan Zheng; Ling Liu; Cong Ling

arXiv:1907.06423·cs.IT·July 16, 2019


TL;DR

This paper extends polarization theory to Rényi entropy, revealing that sub-channel extremal states can differ under various orders, providing deeper micro-scale insights into polarization phenomena.

Contribution

It introduces polarization analysis based on Rényi entropy, showing that sub-channel extremal states can vary with entropy order, unlike traditional Shannon-based theories.

Findings

1. Sub-channels can have opposite extremal states under different Rényi entropy orders.

2. Polarization phenomena can be analyzed at the micro scale, focusing on probability pairs.

3. The theory broadens understanding of information measures beyond Shannon entropy.

Abstract

Existing polarization theories have mostly been concerned with Shannon's information measures, such as Shannon entropy and mutual information, and some related measures such as the Bhattacharyya parameter. In this work, we extend polarization theories to a more general information measure, namely, the Rényi entropy. Our study shows that under conditional Rényi entropies of different orders, the same synthetic sub-channel may exhibit opposite extremal states. This result reveals more insights into the polarization phenomenon on the micro scale (probability pairs) rather than on the average scale.

Equations

$$H_{\alpha}(X)=\frac{1}{1-\alpha}\log\sum_{x\in\mathcal{X}}P_{X}(x)^{\alpha},$$

$$H_{0}(X)=\log|\mathcal{X}|,$$

$$H_{\infty}(X)=\min_{i}(-\log p_{i})=-\log\max_{i}p_{i},$$

$$H_{\alpha}(X|Y)=\frac{1}{1-\alpha}\log\frac{\sum_{x\in\mathcal{X},y\in\mathcal{Y}}P_{X,Y}(x,y)^{\alpha}}{\sum_{y\in\mathcal{Y}}P_{Y}(y)^{\alpha}},$$

$$H_{\alpha}(X|Y)+H_{\alpha}(Y)=H_{\alpha}(X,Y).$$

$$H_{\alpha}^{*}(U_{1}U_{2}|Y_{1}Y_{2})\geq H_{\alpha}^{*}(U_{1}|Y_{1}Y_{2})+H_{\alpha}^{*}(U_{2}|Y_{1}Y_{2}U_{1}),$$

$$U_{1}=X_{1}\oplus X_{2},\quad U_{2}=X_{2},$$

$$P_{A}(u_{1},u_{2},y_{1},y_{2})\triangleq P_{U_{1},U_{2}}(u_{1},u_{2})P_{Y_{1}|X_{1}}(y_{1}|u_{1}\oplus u_{2})P_{Y_{2}|X_{2}}(y_{2}|u_{2})$$

$$\begin{aligned}P_{U_{1},U_{2}}(0,0)&=P_{X}(0)P_{X}(0),&P_{U_{1},U_{2}}(0,1)&=P_{X}(1)P_{X}(1),\\ P_{U_{1},U_{2}}(1,0)&=P_{X}(1)P_{X}(0),&P_{U_{1},U_{2}}(1,1)&=P_{X}(0)P_{X}(1).\end{aligned}$$

$$\Bigg(\sum_{k=1}^{n}|x_{k}+y_{k}|^{p}\Bigg)^{\frac{1}{p}}\leq\Bigg(\sum_{k=1}^{n}|x_{k}|^{p}\Bigg)^{\frac{1}{p}}+\Bigg(\sum_{k=1}^{n}|y_{k}|^{p}\Bigg)^{\frac{1}{p}},$$

$$S_{1}=\sum_{y_{1},y_{2}}P_{Y_{1},Y_{2}}(y_{1},y_{2})^{\alpha},\quad S_{2}=\sum_{y_{1},y_{2},u_{1}}P_{Y_{1},Y_{2},U_{1}}(y_{1},y_{2},u_{1})^{\alpha},\quad S_{3}=\sum_{u_{1},u_{2},y_{1},y_{2}}P_{A}(u_{1},u_{2},y_{1},y_{2})^{\alpha}.$$

$$\begin{aligned}H_{\alpha}(U_{1}|Y_{1}Y_{2})+H_{\alpha}(U_{2}|Y_{1}Y_{2}U_{1})&=\frac{1}{1-\alpha}\log\frac{S_{2}}{S_{1}}+\frac{1}{1-\alpha}\log\frac{S_{3}}{S_{2}}\\ &=\frac{1}{1-\alpha}\log\frac{S_{3}}{S_{1}}\\ &=H_{\alpha}(U_{1}U_{2}|Y_{1}Y_{2}).\end{aligned}$$

$$H_{\alpha}(X_{1}|Y_{1})=\frac{1}{1-\alpha}\log\frac{S_{3}}{S_{4}},$$

$$S_{4}=\sum_{y_{1},y_{2}}\Big\{\big[P_{Y_{1},X_{1}}(y_{1},0)+P_{Y_{1},X_{1}}(y_{1},1)\big]^{\alpha}\times\big[P_{Y_{2},X_{2}}(y_{2},0)^{\alpha}+P_{Y_{2},X_{2}}(y_{2},1)^{\alpha}\big]\Big\}$$

$$\begin{aligned}S_{2}=\sum_{y_{1},y_{2}}\Big\{&\big[P_{Y_{1},X_{1}}(y_{1},0)P_{Y_{2},X_{2}}(y_{2},0)+P_{Y_{1},X_{1}}(y_{1},1)P_{Y_{2},X_{2}}(y_{2},1)\big]^{\alpha}\\ &+\big[P_{Y_{1},X_{1}}(y_{1},1)P_{Y_{2},X_{2}}(y_{2},0)+P_{Y_{1},X_{1}}(y_{1},0)P_{Y_{2},X_{2}}(y_{2},1)\big]^{\alpha}\Big\}.\end{aligned}$$

$$\begin{aligned}&\Big\{\big[P_{Y_{1},X_{1}}(y_{1},0)P_{Y_{2},X_{2}}(y_{2},0)+P_{Y_{1},X_{1}}(y_{1},1)P_{Y_{2},X_{2}}(y_{2},1)\big]^{\alpha}\\ &\quad+\big[P_{Y_{1},X_{1}}(y_{1},1)P_{Y_{2},X_{2}}(y_{2},0)+P_{Y_{1},X_{1}}(y_{1},0)P_{Y_{2},X_{2}}(y_{2},1)\big]^{\alpha}\Big\}^{\frac{1}{\alpha}}\\ &\geq\Big\{\big[P_{Y_{1},X_{1}}(y_{1},0)P_{Y_{2},X_{2}}(y_{2},0)\big]^{\alpha}+\big[P_{Y_{1},X_{1}}(y_{1},0)P_{Y_{2},X_{2}}(y_{2},1)\big]^{\alpha}\Big\}^{\frac{1}{\alpha}}\\ &\quad+\Big\{\big[P_{Y_{1},X_{1}}(y_{1},1)P_{Y_{2},X_{2}}(y_{2},0)\big]^{\alpha}+\big[P_{Y_{1},X_{1}}(y_{1},1)P_{Y_{2},X_{2}}(y_{2},1)\big]^{\alpha}\Big\}^{\frac{1}{\alpha}}\\ &=\big[P_{Y_{1},X_{1}}(y_{1},0)+P_{Y_{1},X_{1}}(y_{1},1)\big]\times\big[P_{Y_{2},X_{2}}(y_{2},0)^{\alpha}+P_{Y_{2},X_{2}}(y_{2},1)^{\alpha}\big]^{\frac{1}{\alpha}}.\end{aligned}$$

$$H_{\alpha}(U_{2}|Y_{1}Y_{2}U_{1})\leq\frac{1}{1-\alpha}\log\frac{S_{3}}{S_{4}}=H_{\alpha}(X_{1}|Y_{1}).$$

$$E[H_{n+1}|B_{0},B_{1},\ldots,B_{n}]=H_{n}.$$


Full text

On the Polarization of Rényi Entropy

Mengfan Zheng

Imperial College London, United Kingdom

Ling Liu

Huawei Technologies Co. Ltd., Shenzhen, P. R. China

Cong Ling

Imperial College London, United Kingdom

Abstract

Existing polarization theories have mostly been concerned with Shannon’s information measures, such as Shannon entropy and mutual information, and some related measures such as the Bhattacharyya parameter. In this work, we extend polarization theories to a more general information measure, namely, the Rényi entropy. Our study shows that under conditional Rényi entropies of different orders, the same synthetic sub-channel may exhibit opposite extremal states. This result reveals more insights into the polarization phenomenon on the micro scale (probability pairs) rather than on the average scale.

I Introduction

The polarization technique (including channel polarization [1] and source polarization [2]) is one of the most significant breakthroughs in information theory over the past decade. Arıkan showed that as the size of the polar transformation goes to infinity, the conditional entropies of the synthetic sub-channels (or random variable pairs) equal 0 or 1 almost everywhere (a.e.) [1]. Also, their varentropies (the variance of the conditional entropy random variable) asymptotically decrease to zero [3]. These results imply that the sub-channels’ transition probability matrices tend to be either deterministic (noiseless channels) or uniform with respect to any channel input (completely noisy channels). However, are they still close to uniform or deterministic distributions under stricter criteria? Polarization results based on Shannon’s information measures cannot answer this question.

In this work, we study polarization using a more general information measure, i.e., the Rényi entropy [4]. The Rényi entropy is more sensitive to deviations from uniform or deterministic distributions, and has been used in many areas where Shannon entropy may not be a good metric. For example, the collision entropy and min-entropy, both of which are special cases of the Rényi entropy, are convenient metrics for privacy amplification in secret-key agreement [5]. The Rényi entropy of a random variable $X$ is defined as follows.

Definition 1 (Rényi Entropy [4]).

The Rényi entropy of a random variable $X\in\mathcal{X}$ of order $\alpha$ is defined as

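$$H_{\alpha}(X)=\frac{1}{1-\alpha}\log\sum_{x\in\mathcal{X}}P_{X}(x)^{\alpha}.\tag{1}$$

It can be shown that as $\alpha\rightarrow 1$ , the Rényi entropy reduces to the Shannon entropy. Two other special cases of the Rényi entropy which will be discussed later include the max-entropy:

$$H_{0}(X)=\log|\mathcal{X}|,\tag{2}$$

which is the Rényi entropy of order 0, and the min-entropy:

$$H_{\infty}(X)=\min_{i}(-\log p_{i})=-\log\max_{i}p_{i},\tag{3}$$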

which equals the limiting value of $H_{\alpha}(X)$ as $\alpha\rightarrow\infty$ .
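As a quick numerical illustration of Definition 1 and its special cases, the following is a minimal Python sketch, not code from the paper. The function name `renyi_entropy` and the use of NumPy are assumptions made here. Zero-probability symbols are dropped, so the order-0 value is the logarithm of the support size, which coincides with $\log|\mathcal{X}|$ only when every symbol has nonzero probability.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha, in bits, of a probability vector p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # assumption: symbols with zero probability are ignored
    if alpha == 0:
        return float(np.log2(p.size))           # max-entropy
    if alpha == 1:
        return float(-np.sum(p * np.log2(p)))   # Shannon entropy (limit alpha -> 1)
    if np.isinf(alpha):
        return float(-np.log2(p.max()))         # min-entropy (limit alpha -> infinity)
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

if __name__ == "__main__":
    p = [0.8, 0.2]
    for a in (0, 0.5, 1, 2, 10, float("inf")):
        print(a, renyi_entropy(p, a))
```

For a fixed distribution the printed values should not increase with $\alpha$, which matches the ordering max-entropy, Shannon entropy, min-entropy.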

Unlike the conditional Shannon entropy, there is no generally accepted definition of the conditional Rényi entropy yet. In this paper, we adopt the following definition of conditional Rényi entropy in the study of polarization.

Definition 2 (conditional Rényi entropy [6, 7]).

The conditional Rényi entropy of order $\alpha$ of $X$ given $Y$ is defined as

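$$H_{\alpha}(X|Y)=\frac{1}{1-\alpha}\log\frac{\sum_{x\in\mathcal{X},y\in\mathcal{Y}}P_{X,Y}(x,y)^{\alpha}}{\sum_{y\in\mathcal{Y}}P_{Y}(y)^{\alpha}}.\tag{4}$$

Note that this type of conditional Rényi entropy satisfies the chain rule:

$$H_{\alpha}(X|Y)+H_{\alpha}(Y)=H_{\alpha}(X,Y).\tag{5}$$

Together with the fact that $H_{\alpha}$ reduces to the Shannon entropy as $\alpha\rightarrow 1$ , this means that $H_{\alpha}(X|Y)$ reduces to the conditional Shannon entropy as $\alpha\rightarrow 1$ .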
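The chain rule can be checked numerically. The sketch below is illustrative only: it assumes the chain-rule-consistent form $H_{\alpha}(X|Y)=H_{\alpha}(X,Y)-H_{\alpha}(Y)$, written as a single log-ratio, and the helper names `renyi` and `cond_renyi` as well as the example joint distribution are choices made here, not taken from the paper.

```python
import numpy as np

def renyi(p, alpha):
    """Renyi entropy (bits) of any probability array; alpha > 0, alpha != 1."""
    p = np.asarray(p, dtype=float).ravel()
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

def cond_renyi(p_xy, alpha):
    """H_alpha(X|Y) in bits for a joint matrix p_xy[x, y]; alpha > 0, alpha != 1."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_y = p_xy.sum(axis=0)  # marginal of Y
    return float(np.log2(np.sum(p_xy ** alpha) / np.sum(p_y ** alpha)) / (1.0 - alpha))

if __name__ == "__main__":
    # Hypothetical joint distribution: uniform X sent over a BSC with crossover 0.2.
    p_xy = np.array([[0.4, 0.1],
                     [0.1, 0.4]])
    for alpha in (0.5, 2.0, 10.0):
        lhs = cond_renyi(p_xy, alpha) + renyi(p_xy.sum(axis=0), alpha)
        rhs = renyi(p_xy, alpha)
        print(alpha, lhs, rhs)  # the two columns should agree
```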

There has been very limited research on the polarization of conditional Rényi entropies. In [8] it is shown that the following chain rule inequality holds for the polar transformation for $\alpha\leq 1$ ,

$$H_{\alpha}^{*}(U_{1}U_{2}|Y_{1}Y_{2})\geq H_{\alpha}^{*}(U_{1}|Y_{1}Y_{2})+H_{\alpha}^{*}(U_{2}|Y_{1}Y_{2}U_{1}),$$

whenever $U_{1}$ , $U_{2}$ are i.i.d. uniform on $\mathbb{F}_{2}$ . The inequality holds with equality if and only if the channel $W$ is perfect, or the channel $W$ is completely noisy, or $\alpha=1$ . Note that $H^{*}_{\alpha}(X|Y)$ in [8] is defined as $H^{*}_{\alpha}(X|Y)=\frac{\alpha}{1-\alpha}\log\sum_{y\in\mathcal{Y}}P_{Y}(y)\Big[\sum_{x\in\mathcal{X}}P_{X|Y}(x|y)^{\alpha}\Big]^{\frac{1}{\alpha}}$ .

In this paper, we also restrict ourselves to the binary case (i.e., $\mathcal{X}=\mathbb{F}_{2}$ ) as a starting point. Accordingly, all logarithms in this paper are base-2. However, we do not assume $X$ to be uniformly distributed. We prove that the order- $\alpha$ conditional Rényi entropies of the synthetic sub-channels also polarize to 0 and 1, but the fraction of sub-channels polarizing to 1 tends to $H_{\alpha}(X|Y)$ , which depends on $\alpha$ . That is to say, for a given sub-channel, its conditional Rényi entropies of different orders may exhibit opposite extremal states. Intuitively, if a sub-channel is truly noiseless (or truly completely noisy), its conditional Rényi entropies of different orders should all be 0 (or 1). We show both analytically and numerically that this strange phenomenon is caused by a vanishing deviation from truly deterministic or truly uniform distributions. The different extremal states that a sub-channel exhibits for various $\alpha$ reflect the polarization level of the sub-channel at the micro scale, i.e., how close its joint distribution is to truly uniform or truly deterministic.

II Polarization of Rényi Entropy

II-A Polarization of a Basic Polar Transformation

Consider two binary-input discrete memoryless channels (B-DMC) $P_{Y_{1}|X_{1}}$ and $P_{Y_{2}|X_{2}}$ with the same input distribution $P_{X}$ in the channel coding scenario, or consider $(X_{1},Y_{1})$ and $(X_{2},Y_{2})$ as two samples of a memoryless source $(X,Y)\sim P_{X,Y}$ with $X$ being the binary source to be compressed and $Y$ being the side information about $X$ in the source coding scenario. Note that in the former case, $P_{Y_{1}|X_{1}}$ and $P_{Y_{2}|X_{2}}$ can be different, which corresponds to the compound channel setting. Let

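$$U_{1}=X_{1}\oplus X_{2},\quad U_{2}=X_{2},\tag{6}$$

and denote

$$P_{A}(u_{1},u_{2},y_{1},y_{2})\triangleq P_{U_{1},U_{2}}(u_{1},u_{2})P_{Y_{1}|X_{1}}(y_{1}|u_{1}\oplus u_{2})P_{Y_{2}|X_{2}}(y_{2}|u_{2})\tag{7}$$

for short. From (6) we know that

$$\begin{aligned}P_{U_{1},U_{2}}(0,0)&=P_{X}(0)P_{X}(0),&P_{U_{1},U_{2}}(0,1)&=P_{X}(1)P_{X}(1),\\ P_{U_{1},U_{2}}(1,0)&=P_{X}(1)P_{X}(0),&P_{U_{1},U_{2}}(1,1)&=P_{X}(0)P_{X}(1).\end{aligned}$$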

For the basic polar transformation, we have the following polarization result.

Lemma 1.

For $\alpha\geq 0$ , we have

[TABLE]

Let us first recall the Minkowski inequality before proving Lemma 1.

Proposition 1.

The Minkowski inequality states that for $1\leq p\leq\infty$ ,

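$$\Bigg(\sum_{k=1}^{n}|x_{k}+y_{k}|^{p}\Bigg)^{\frac{1}{p}}\leq\Bigg(\sum_{k=1}^{n}|x_{k}|^{p}\Bigg)^{\frac{1}{p}}+\Bigg(\sum_{k=1}^{n}|y_{k}|^{p}\Bigg)^{\frac{1}{p}},\tag{11}$$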

with equality if and only if $\mathbf{x}=(x_{1},x_{2},...,x_{n})$ and $\mathbf{y}=(y_{1},y_{2},...,y_{n})$ are positively linearly dependent, i.e., $\mathbf{x}=\lambda\mathbf{y}$ for some $\lambda\geq 0$ or $\mathbf{y}=0$ . For $0<p<1$ , the inequality in (11) is reversed.
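Both directions of Proposition 1 can be spot-checked numerically. The following sketch is an illustration added here, with a hypothetical helper `p_norm` and randomly drawn test vectors. It restricts attention to vectors with nonnegative entries, which is the case that matters for probabilities and the setting in which the reversed inequality for $0<p<1$ is known to hold.

```python
import numpy as np

def p_norm(v, p):
    """(sum_k |v_k|^p)^(1/p); a true norm only for p >= 1."""
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

rng = np.random.default_rng(0)
x = rng.random(5)  # nonnegative entries
y = rng.random(5)

# p >= 1: ||x + y|| <= ||x|| + ||y||, so this gap is nonnegative.
gap_p2 = p_norm(x, 2.0) + p_norm(y, 2.0) - p_norm(x + y, 2.0)

# 0 < p < 1 with nonnegative entries: the inequality is reversed.
gap_p05 = p_norm(x + y, 0.5) - p_norm(x, 0.5) - p_norm(y, 0.5)

# Positively linearly dependent vectors (y = 3x) give equality up to rounding.
gap_dep = p_norm(x, 2.0) + p_norm(3 * x, 2.0) - p_norm(4 * x, 2.0)

if __name__ == "__main__":
    print(gap_p2, gap_p05, gap_dep)
```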

Proof of Lemma 1.

(I) First, we consider (10). Denote

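$$S_{1}=\sum_{y_{1},y_{2}}P_{Y_{1},Y_{2}}(y_{1},y_{2})^{\alpha},\tag{12}$$

$$S_{2}=\sum_{y_{1},y_{2},u_{1}}P_{Y_{1},Y_{2},U_{1}}(y_{1},y_{2},u_{1})^{\alpha},\tag{13}$$

$$S_{3}=\sum_{u_{1},u_{2},y_{1},y_{2}}P_{A}(u_{1},u_{2},y_{1},y_{2})^{\alpha}.\tag{14}$$

Then we have

$$\begin{aligned}H_{\alpha}(U_{1}|Y_{1}Y_{2})+H_{\alpha}(U_{2}|Y_{1}Y_{2}U_{1})&=\frac{1}{1-\alpha}\log\frac{S_{2}}{S_{1}}+\frac{1}{1-\alpha}\log\frac{S_{3}}{S_{2}}\\ &=\frac{1}{1-\alpha}\log\frac{S_{3}}{S_{1}}\\ &=H_{\alpha}(U_{1}U_{2}|Y_{1}Y_{2}).\end{aligned}$$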

(II) Next, we consider (9).

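$H_{\alpha}(X_{1}|Y_{1})$ can be expressed as

$$H_{\alpha}(X_{1}|Y_{1})=\frac{1}{1-\alpha}\log\frac{S_{3}}{S_{4}},\tag{15}$$

where

$$S_{4}=\sum_{y_{1},y_{2}}\Big\{\big[P_{Y_{1},X_{1}}(y_{1},0)+P_{Y_{1},X_{1}}(y_{1},1)\big]^{\alpha}\times\big[P_{Y_{2},X_{2}}(y_{2},0)^{\alpha}+P_{Y_{2},X_{2}}(y_{2},1)^{\alpha}\big]\Big\}.$$

From (13) and (7) we have

$$\begin{aligned}S_{2}=\sum_{y_{1},y_{2}}\Big\{&\big[P_{Y_{1},X_{1}}(y_{1},0)P_{Y_{2},X_{2}}(y_{2},0)+P_{Y_{1},X_{1}}(y_{1},1)P_{Y_{2},X_{2}}(y_{2},1)\big]^{\alpha}\\ &+\big[P_{Y_{1},X_{1}}(y_{1},1)P_{Y_{2},X_{2}}(y_{2},0)+P_{Y_{1},X_{1}}(y_{1},0)P_{Y_{2},X_{2}}(y_{2},1)\big]^{\alpha}\Big\}.\end{aligned}$$

For $0<\alpha<1$ , by the Minkowski inequality we have

$$\begin{aligned}&\Big\{\big[P_{Y_{1},X_{1}}(y_{1},0)P_{Y_{2},X_{2}}(y_{2},0)+P_{Y_{1},X_{1}}(y_{1},1)P_{Y_{2},X_{2}}(y_{2},1)\big]^{\alpha}\\ &\quad+\big[P_{Y_{1},X_{1}}(y_{1},1)P_{Y_{2},X_{2}}(y_{2},0)+P_{Y_{1},X_{1}}(y_{1},0)P_{Y_{2},X_{2}}(y_{2},1)\big]^{\alpha}\Big\}^{\frac{1}{\alpha}}\\ &\geq\Big\{\big[P_{Y_{1},X_{1}}(y_{1},0)P_{Y_{2},X_{2}}(y_{2},0)\big]^{\alpha}+\big[P_{Y_{1},X_{1}}(y_{1},0)P_{Y_{2},X_{2}}(y_{2},1)\big]^{\alpha}\Big\}^{\frac{1}{\alpha}}\\ &\quad+\Big\{\big[P_{Y_{1},X_{1}}(y_{1},1)P_{Y_{2},X_{2}}(y_{2},0)\big]^{\alpha}+\big[P_{Y_{1},X_{1}}(y_{1},1)P_{Y_{2},X_{2}}(y_{2},1)\big]^{\alpha}\Big\}^{\frac{1}{\alpha}}\\ &=\big[P_{Y_{1},X_{1}}(y_{1},0)+P_{Y_{1},X_{1}}(y_{1},1)\big]\times\big[P_{Y_{2},X_{2}}(y_{2},0)^{\alpha}+P_{Y_{2},X_{2}}(y_{2},1)^{\alpha}\big]^{\frac{1}{\alpha}}.\end{aligned}\tag{16}$$

Raising both sides of (16) to the power $\alpha$ and summing over $y_{1}$ and $y_{2}$ gives $S_{2}\geq S_{4}$ , which means

$$H_{\alpha}(U_{2}|Y_{1}Y_{2}U_{1})=\frac{1}{1-\alpha}\log\frac{S_{3}}{S_{2}}\leq\frac{1}{1-\alpha}\log\frac{S_{3}}{S_{4}}=H_{\alpha}(X_{1}|Y_{1}).$$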

For $\alpha>1$ , the inequality in (16) is reversed. However, since $1-\alpha<0$ , the result remains the same. For $\alpha=1$ , the Rényi entropy reduces to the Shannon entropy, and the polarization result is identical to that in [1]. For $\alpha=0$ , it is obvious from (2) and (4) that (10) holds, while (8) and (9) hold with equality.

Similarly, $H_{\alpha}(U_{2}|Y_{1}Y_{2}U_{1})\leq H_{\alpha}(X_{2}|Y_{2})$ can be proved.

(III) Finally, equality (10) and inequality (9) immediately imply (8).

∎
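The single-step behaviour established in the proof can be observed numerically. The sketch below is not the authors' code: it assumes the log-ratio form of $H_{\alpha}(X|Y)$ implied by the chain rule, represents each channel by its probability pairs $\big(P_{Y,X}(y,0),P_{Y,X}(y,1)\big)$, and uses hypothetical helper names (`minus`, `plus`, `bsc_pairs`). It checks that the two synthesized entropies sum to $H_{\alpha}(X_{1}|Y_{1})+H_{\alpha}(X_{2}|Y_{2})$ and that $H_{\alpha}(U_{2}|Y_{1}Y_{2}U_{1})$ does not exceed either of them.

```python
import numpy as np

def cond_renyi(pairs, alpha):
    """H_alpha(X|Y) in bits; pairs[y] = (P(y, x=0), P(y, x=1)); alpha > 0, alpha != 1."""
    pairs = np.asarray(pairs, dtype=float)
    num = np.sum(pairs ** alpha)
    den = np.sum(pairs.sum(axis=1) ** alpha)
    return float(np.log2(num / den) / (1.0 - alpha))

def minus(a, b):
    """Pairs of (U1; Y1 Y2): P(y1, y2, u1) for u1 in {0, 1}."""
    a0, a1 = a[:, None, 0], a[:, None, 1]
    b0, b1 = b[None, :, 0], b[None, :, 1]
    u1_is_0 = a0 * b0 + a1 * b1   # (x1, x2) in {(0,0), (1,1)}
    u1_is_1 = a1 * b0 + a0 * b1   # (x1, x2) in {(1,0), (0,1)}
    return np.stack([u1_is_0, u1_is_1], axis=-1).reshape(-1, 2)

def plus(a, b):
    """Pairs of (U2; Y1 Y2 U1): P(y1, y2, u1, u2) for u2 in {0, 1}."""
    a0, a1 = a[:, None, 0], a[:, None, 1]
    b0, b1 = b[None, :, 0], b[None, :, 1]
    given_u1_0 = np.stack([a0 * b0, a1 * b1], axis=-1)  # x1 = u2
    given_u1_1 = np.stack([a1 * b0, a0 * b1], axis=-1)  # x1 = 1 - u2
    return np.concatenate([given_u1_0, given_u1_1]).reshape(-1, 2)

def bsc_pairs(eps):
    """Joint pairs of a uniform binary input sent over a BSC(eps)."""
    return 0.5 * np.array([[1 - eps, eps], [eps, 1 - eps]])

if __name__ == "__main__":
    a, b = bsc_pairs(0.2), bsc_pairs(0.05)  # two different channels (compound setting)
    for alpha in (0.5, 2.0, 10.0):
        h_a, h_b = cond_renyi(a, alpha), cond_renyi(b, alpha)
        h_minus = cond_renyi(minus(a, b), alpha)
        h_plus = cond_renyi(plus(a, b), alpha)
        print(alpha, h_minus, h_plus, h_minus + h_plus - (h_a + h_b))
```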

II-B Recursive Polar Transformation

Now consider extending the basic transformation recursively to higher orders. For $N=2^{n}$ with $n$ being an arbitrary integer, the recursive transformation can be expressed as $U^{1:N}=X^{1:N}\mathbf{G}_{N}$ [1], where $\mathbf{G}_{N}=\mathbf{B}_{N}\textbf{F}^{\otimes n}$ with $\mathbf{B}_{N}$ being the bit-reversal matrix and $\textbf{F}=\begin{bmatrix}1&0\\ 1&1\end{bmatrix}$ . Denote $H_{N}(i)=H_{\alpha}(U^{i}|Y^{1:N}U^{1:i-1})$ . We have the following theorem.

Theorem 1.

For any B-DMC $P_{Y|X}$ (or any discrete memoryless source $(X,Y)\sim P_{X,Y}$ over $\mathcal{X}\times\mathcal{Y}$ with $\mathcal{X}=\{0,1\}$ and $\mathcal{Y}$ an arbitrary countable set), any $\alpha\geq 0$ , and any fixed $\delta\in(0,1)$ , as $N\rightarrow\infty$ through powers of 2, the fraction of indices $i\in[N]\triangleq\{1,2,\ldots,N\}$ with $H_{N}(i)\in(1-\delta,1]$ goes to $H_{\alpha}(X|Y)$ , and the fraction with $H_{N}(i)\in[0,\delta)$ goes to $1-H_{\alpha}(X|Y)$ .

Proof.

We follow Arıkan’s martingale approach [1, 9] to complete the proof. First, we introduce the same infinite binary tree as in the proof of [1, Theorem 1], with a root node at level 0 and $2^{n}$ nodes at level $n$ . Then define a random walk $\{B_{n};n\geq 0\}$ in this tree as follows. The random walk starts at the root node with $B_{0}=(0,1)$ , and moves to one of the two child nodes in the next level with equal probability at each integer time. If $B_{n}=(n,i)$ , $B_{n+1}$ equals $(n+1,2i-1)$ or $(n+1,2i)$ with probability $1/2$ each. Denote $H(0,1)=H_{\alpha}(X|Y)$ and $H(n,i)=H_{\alpha}(U^{i}|Y^{1:2^{n}},U^{1:i-1})$ for $n\geq 1$ , $i\in[2^{n}]$ . Define a random process $\{H_{n};n\geq 0\}$ with $H_{n}=H(B_{n})$ . It can be shown that the process $\{H_{n};n\geq 0\}$ is a martingale due to the chain rule equality of (10), i.e.,

$$E[H_{n+1}|B_{0},B_{1},\ldots,B_{n}]=H_{n}.$$

Since $\{H_{n};n\geq 0\}$ is a uniformly integrable martingale, it converges a.e. to an RV $H_{\infty}$ such that $E[|H_{n}-H_{\infty}|]\rightarrow 0$ . Then we have

[TABLE]

Note that

[TABLE]

In Lemma 1, by letting $X_{1}=X_{2}=U^{i}$ , $Y_{1}=Y_{2}=(Y^{1:2^{n}},U^{1:i-1})$ , $U_{1}=U^{2i-1}$ , and $U_{2}=U^{2i}$ , we can see that (19) and (20) force (16) to hold with equality for $i\in[2^{n}]$ a.e. as $n\rightarrow\infty$ . From Proposition 1 we know that (16) holds with equality only in the following two cases:

•

(Case 1) $P_{Y_{1},X_{1}}(y_{1},0)=0$ or $P_{Y_{1},X_{1}}(y_{1},1)=0$ for all effective $y_{1}\in\mathcal{Y}$ .

•

(Case 2) $P_{Y_{2},X_{2}}(y_{2},0)=P_{Y_{2},X_{2}}(y_{2},1)$ for all effective $y_{2}\in\mathcal{Y}$ .

The effective elements of $\mathcal{Y}$ (denoted by $\mathcal{Y}_{e}\subset\mathcal{Y}$ ) mean that

[TABLE]

as $n\rightarrow\infty$ . A more detailed discussion of this notion is given in the next subsection. Since $P_{Y_{1},X_{1}}$ and $P_{Y_{2},X_{2}}$ are identical in the recursive transformation, the joint distributions $P_{\mathbf{Y}^{i},U^{i}}$ ( $i\in[2^{n}]$ ) tend to either Case 1 or Case 2 a.e. as $n\rightarrow\infty$ , where $\mathbf{Y}^{i}=(Y^{1:2^{n}},U^{1:i-1})$ . It is easy to verify that $H_{\alpha}(U^{i}|\mathbf{Y}^{i})$ equals 0 in Case 1 and 1 in Case 2. This shows that $H_{n}$ converges a.e. to either 0 or 1 as $n\rightarrow\infty$ .

The convergence result together with the chain rule equality of (10) implies that the fraction of $\{i:H(n,i)\in(1-\delta,1]\}$ goes to $H_{\alpha}(X|Y)$ as $n\rightarrow\infty$ .

∎

Now we can answer the question raised at the beginning of this paper. From the definition of the max-entropy $H_{0}(X)$ and (5) we know that

[TABLE]

provided that the probabilities $P_{\mathbf{Y}^{i},U^{i}}(\mathbf{y}^{i},u^{i})$ are nonzero. Since $P_{\mathbf{Y}^{i},U^{i}}(\mathbf{y}^{i},u^{i})$ never actually becomes 0 under the polar transformation, the fraction of sub-channels with $H_{0}(U^{i}|\mathbf{Y}^{i})\rightarrow 0$ is always 0, which means that the synthetic sub-channels never polarize to a truly deterministic state. In contrast, the conditional min-entropy,

[TABLE]

is not a constant in general. As will be explained in Section III, only a truly uniform distribution yields $H_{\infty}(X|Y)=1$ . Therefore, as $n\rightarrow\infty$ , a fraction $H_{\infty}(X|Y)$ of the sub-channels will become truly completely noisy. This result is summarized in the following corollary.

Corollary 1.

As $n\rightarrow\infty$ , the fraction of truly completely noisy sub-channels tends to $H_{\infty}(X|Y)$ , while the fraction of truly noiseless sub-channels is 0.

II-C An Example of Effective Elements

The reason we introduce the term effective elements is that, without the word “effective” in the definitions of Case 1 and Case 2, the conditional Rényi entropies of all orders would be 0 in Case 1 and 1 in Case 2. In this subsection, we further discuss this issue. First, let us present an example to show that for a given $\alpha=\alpha_{0}$ and $\alpha^{\prime}=\alpha_{0}+1$ , $H_{\alpha}(X|Y)$ can be arbitrarily close to 1 while $H_{\alpha^{\prime}}(X|Y)$ can be arbitrarily close to 0.

Let $|\mathcal{Y}|=2^{N}\triangleq M$ . Consider a joint distribution $P_{X,Y}$ such that a fraction $\frac{1}{L}$ of probability pairs $\big(P_{X,Y}(0,y),P_{X,Y}(1,y)\big)$ are completely deterministic with accumulated probability of $\frac{1}{N}$ . Without loss of generality, we assume that $P_{X,Y}(0,y_{i})=\frac{L}{NM}$ and $P_{X,Y}(1,y_{i})=0$ for $i\in\mathcal{A}$ , where $\mathcal{A}\subset[2^{N}]$ is the set of deterministic pairs. The remaining $\frac{L-1}{L}$ fraction of probability pairs are completely uniform, i.e., $P_{X,Y}(0,y_{i})=P_{X,Y}(1,y_{i})=\frac{(N-1)L}{2NM(L-1)}$ for $i\in\mathcal{A}^{C}$ . Then

[TABLE]

For a considered $\alpha=\alpha_{0}>1$ , let

[TABLE]

Then we have

[TABLE]

It is clear that as $N\rightarrow\infty$ , $\frac{(N-1)^{\alpha_{0}}}{(N-1)^{\alpha_{0}-\frac{0.5}{\alpha_{0}}}}=(N-1)^{0.5/\alpha_{0}}\rightarrow\infty$ , thus $H_{\alpha_{0}}(X|Y)\rightarrow 1$ .

Now let $\alpha^{\prime}=\alpha_{0}+1$ . From (23) we have

[TABLE]

In this case, as $N\rightarrow\infty$ , $\frac{(N-1)^{\alpha_{0}+1}}{(N-1)^{\frac{\alpha_{0}^{2}-0.5}{\alpha_{0}-1}}}=(N-1)^{\frac{-0.5}{\alpha_{0}-1}}\rightarrow 0$ , thus $H_{\alpha^{\prime}}(X|Y)\rightarrow 0$ .

From (24) we can see that $\frac{1}{L}\rightarrow 0$ as $N\rightarrow\infty$ if $\alpha_{0}>1$ . Also, $\frac{1}{N}\rightarrow 0$ as $N\rightarrow\infty$ . Therefore, we have shown a case when $H_{\alpha}(X|Y)$ and $H_{\alpha+1}(X|Y)$ can be completely opposite asymptotically. Fig. 1 shows the example of $\alpha_{0}=2$ .

We can similarly design such an example for $0<\alpha<1$ . This shows that the extreme cases defined in the proof of Theorem 1 are relative. Even if almost all probability pairs of a sub-channel are uniform (so that it may seem to be of Case 2), when powered by some $\alpha$ , these probability pairs may have little impact on the value of $H_{\alpha}(X|Y)$ (so that the sub-channel is actually of Case 1), just as our example has shown. Thus, although we proved Theorem 1 for any $\alpha\geq 0$ in a unified form, the criterion for judging whether a sub-channel converges to Case 1 or Case 2 depends on $\alpha$ .
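The construction for $\alpha_{0}>1$ can be evaluated numerically. The sketch below is an illustration under stated assumptions, not the paper's computation. For the mixed distribution described above, the ratio of the uniform pairs' contribution to the deterministic pairs' contribution in $\sum_{y}P_{Y}(y)^{\alpha}$ works out to $r=(N-1)^{\alpha}/(L-1)^{\alpha-1}$, so $M$ cancels and $H_{\alpha}(X|Y)=\frac{1}{1-\alpha}\log\frac{1+2^{1-\alpha}r}{1+r}$. The choice of $L$ in `choose_L` is inferred here from the exponents quoted in the text, namely $(L-1)^{\alpha_{0}-1}=(N-1)^{\alpha_{0}-0.5/\alpha_{0}}$, and should be read as an assumption. $N$ is treated as a real-valued parameter, and all function names are hypothetical.

```python
import math

def example_entropy(N, L, alpha):
    """H_alpha(X|Y) in bits for the deterministic/uniform mixture; alpha > 1."""
    # r = (uniform-pair mass)^alpha total / (deterministic-pair mass)^alpha total
    r = (N - 1) ** alpha / (L - 1) ** (alpha - 1)
    return math.log2((1 + 2 ** (1 - alpha) * r) / (1 + r)) / (1 - alpha)

def choose_L(N, alpha0):
    """Assumed choice of L: (L - 1)^(alpha0 - 1) = (N - 1)^(alpha0 - 0.5 / alpha0)."""
    exponent = (alpha0 ** 2 - 0.5) / (alpha0 * (alpha0 - 1))
    return 1 + (N - 1) ** exponent

N = 1e8
alpha0 = 2.0
L = choose_L(N, alpha0)
h_alpha0 = example_entropy(N, L, alpha0)           # approaches 1 as N grows
h_alpha0_plus_1 = example_entropy(N, L, alpha0 + 1)  # approaches 0 as N grows

if __name__ == "__main__":
    print(h_alpha0, h_alpha0_plus_1)
```

The convergence of $H_{\alpha_{0}}(X|Y)$ to 1 is slow, since it is driven by $(N-1)^{0.5/\alpha_{0}}$, which is why a large value of `N` is used.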

II-D Numerical Results

In this subsection we numerically illustrate the polarization of a binary symmetric channel (BSC) with crossover probability 0.2. We calculate the joint distribution of each synthetic sub-channel and then compute its conditional Rényi entropies of order 0.1, 0.5, 1, 2, 10 and 100, respectively. The result is shown in Fig. 2. The channel indices are reordered so that the conditional Shannon entropies increase monotonically. The dashed lines indicate the proportions of extremal sub-channels as $n\rightarrow\infty$ . Although the considered block-length is not long, this figure clearly shows that the polarized sets vary with $\alpha$ . We can also see that a relatively higher conditional Shannon entropy does not necessarily imply a relatively higher conditional Rényi entropy of a different order.
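A small-scale version of this experiment can be sketched as follows. This is not the authors' code and does not reproduce Fig. 2: it assumes a uniform input, uses only $n=3$ (eight sub-channels) so that the exact joint distributions stay small, and keeps the sub-channels in the natural order of the recursion rather than sorting them. The helper names are hypothetical. The average of the sub-channel entropies should equal $H_{\alpha}(X|Y)$ of the underlying channel for every order, as the martingale property requires.

```python
import numpy as np

def cond_renyi(pairs, alpha):
    """H_alpha(X|Y) in bits; pairs[y] = (P(y, x=0), P(y, x=1)); alpha > 0, alpha != 1."""
    pairs = np.asarray(pairs, dtype=float)
    p_y = pairs.sum(axis=1)
    scale = p_y.max()  # the ratio is scale-invariant; rescaling limits underflow
    num = np.sum((pairs / scale) ** alpha)
    den = np.sum((p_y / scale) ** alpha)
    return float(np.log2(num / den) / (1.0 - alpha))

def minus(a, b):
    """Pairs of (U1; Y1 Y2)."""
    a0, a1 = a[:, None, 0], a[:, None, 1]
    b0, b1 = b[None, :, 0], b[None, :, 1]
    return np.stack([a0 * b0 + a1 * b1, a1 * b0 + a0 * b1], axis=-1).reshape(-1, 2)

def plus(a, b):
    """Pairs of (U2; Y1 Y2 U1)."""
    a0, a1 = a[:, None, 0], a[:, None, 1]
    b0, b1 = b[None, :, 0], b[None, :, 1]
    given_u1_0 = np.stack([a0 * b0, a1 * b1], axis=-1)
    given_u1_1 = np.stack([a1 * b0, a0 * b1], axis=-1)
    return np.concatenate([given_u1_0, given_u1_1]).reshape(-1, 2)

def polarize(base, n):
    """Return the 2^n synthetic sub-channels after n recursion levels."""
    level = [base]
    for _ in range(n):
        level = [child for ch in level for child in (minus(ch, ch), plus(ch, ch))]
    return level

def bsc_pairs(eps):
    """Joint pairs of a uniform binary input sent over a BSC(eps)."""
    return 0.5 * np.array([[1 - eps, eps], [eps, 1 - eps]])

base = bsc_pairs(0.2)
channels = polarize(base, 3)

if __name__ == "__main__":
    for alpha in (0.5, 2.0, 10.0):
        hs = [cond_renyi(ch, alpha) for ch in channels]
        print(alpha, np.round(hs, 4), "mean:", np.mean(hs), "base:", cond_renyi(base, alpha))
```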

III A Discussion on Rényi entropy

In this section, we further discuss how small the deviation from a uniform or deterministic distribution should be to make the conditional Rényi entropy achieve 1 or 0. Let $\bar{Q}_{X,Y}$ be a uniform distribution with respect to $X$ , i.e., $\bar{Q}_{X,Y}(0,y)=\bar{Q}_{X,Y}(1,y)=\frac{1}{2}\bar{Q}_{Y}(y)$ for any $y\in\mathcal{Y}$ , and further assume that $\bar{Q}_{Y}(y)=P_{Y}(y)$ . Let $\tilde{Q}_{X,Y}$ be a deterministic distribution with respect to $X$ , i.e., $\tilde{Q}_{X,Y}(0,y)=0,\tilde{Q}_{X,Y}(1,y)=\tilde{Q}_{Y}(y)$ or $\tilde{Q}_{X,Y}(1,y)=0,\tilde{Q}_{X,Y}(0,y)=\tilde{Q}_{Y}(y)$ for any $y\in\mathcal{Y}$ , and further assume that $\tilde{Q}_{Y}(y)=P_{Y}(y)$ . Then we have

[TABLE]

and similarly

[TABLE]

For the uniform distribution case, assume that $P_{X,Y}(0,y_{i})=\frac{1}{2}\bar{Q}_{Y}(y_{i})+\delta_{i},P_{X,Y}(1,y_{i})=\frac{1}{2}\bar{Q}_{Y}(y_{i})-\delta_{i}$ , where $\delta_{i}\ll\bar{Q}_{Y}(y_{i})$ . Denote $|\mathcal{Y}|=M$ . Define

[TABLE]

As a simple example, further assume that $\bar{Q}_{Y}$ is also uniform. Then to ensure that $H_{\alpha}(X|Y)\rightarrow 1$ ,

[TABLE]

is required. As $\alpha\rightarrow\infty$ , only a truly uniform distribution yields $H_{\alpha}(X|Y)=1$ .

For the deterministic distribution case, without loss of generality, assume $\tilde{Q}_{X,Y}(0,y_{i})=0,\tilde{Q}_{X,Y}(1,y_{i})=\tilde{Q}_{Y}(y_{i})$ , and $P_{X,Y}(0,y_{i})=\delta_{i},P_{X,Y}(1,y_{i})=\tilde{Q}_{Y}(y_{i})-\delta_{i}$ for any $y_{i}\in\mathcal{Y}$ , where $\delta_{i}\ll\tilde{Q}_{Y}(y_{i})$ . Denote $|\mathcal{Y}|=M$ . Define

[TABLE]

For $0<\alpha<1$ , the difference between $\delta_{i}$ and $\tilde{Q}_{Y}(y_{i})$ shrinks by the power- $\alpha$ operation. As $\alpha$ approaches 0, $\sum_{i=1}^{M}\delta_{i}^{\alpha}$ gains more influence on $\Delta$ . When $\alpha$ is small enough, we will have

[TABLE]

In the extreme case when $\alpha=0$ and $\delta_{i}>0$ for all $i\in[M]$ , we get $\Delta=1$ , and $H_{\alpha}(X|Y)$ always equals 1. $\Delta$ equals 0 if and only if $\delta_{i}=0$ for all $i\in[M]$ .
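The sensitivity described in this section can be seen in a deliberately simplified setting. The sketch below assumes that $P_{Y}$ is uniform and that every probability pair has the same relative deviation, in which case $H_{\alpha}(X|Y)$ reduces to the Rényi entropy of a single normalized pair $(p,1-p)$. This equal-deviation assumption and the function name `binary_renyi` are introduced here for illustration and are not part of the paper.

```python
import math

def binary_renyi(p, alpha):
    """Renyi entropy (bits) of the pair (p, 1 - p); alpha > 0."""
    q = 1.0 - p
    if alpha == 1:
        return -sum(t * math.log2(t) for t in (p, q) if t > 0)
    return math.log2(p ** alpha + q ** alpha) / (1.0 - alpha)

if __name__ == "__main__":
    orders = (0.01, 0.5, 1, 2, 100)
    # Nearly uniform pair: large orders expose the deviation from uniform.
    print([round(binary_renyi(0.49, a), 5) for a in orders])
    # Nearly deterministic pair: small orders expose the deviation from deterministic.
    print([round(binary_renyi(1e-6, a), 5) for a in orders])
```

In this setting a pair that looks uniform under small orders falls visibly below 1 for large $\alpha$, while a pair that looks deterministic under the Shannon entropy is close to 1 for $\alpha$ near 0.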

IV Concluding Remarks

This work has revealed the polarization phenomenon of conditional Rényi entropies. Much work remains before the results can be applied to specific problems. For example, estimating conditional Rényi entropies accurately can be a hard problem when $N$ grows large, and existing approximation methods for polar code constructions may not be directly applicable. As another example, the polarization rate of conditional Rényi entropies has not been addressed in this paper. We leave these questions for future research.

Bibliography

[1] E. Arıkan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
[2] E. Arıkan, “Source polarization,” in 2010 IEEE International Symposium on Information Theory, 2010, pp. 899–903.
[3] E. Arıkan, “Varentropy decreases under the polar transform,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3390–3400, 2016.
[4] A. Rényi, “On measures of entropy and information,” Hungarian Academy of Sciences Budapest Hungary, Tech. Rep., 1961.
[5] M. Bloch and J. Barros, Physical-layer security. Cambridge University Press, 2011.
[6] P. Jizba and T. Arimitsu, “The world according to Rényi: thermodynamics of multifractal systems,” Annals of Physics, vol. 312, no. 1, pp. 17–59, 2004.
[7] L. Golshani, E. Pasha, and G. Yari, “Some properties of Rényi entropy and Rényi entropy rate,” Information Sciences, vol. 179, no. 14, pp. 2426–2433, 2009.
[8] M. Alsan and E. Telatar, “Polarization improves $E_{0}$,” IEEE Transactions on Information Theory, vol. 60, no. 5, pp. 2714–2719, 2014.
