On the Polarization of Rényi Entropy
Mengfan Zheng, Ling Liu, Cong Ling

TL;DR
This paper extends polarization theory to Rényi entropy, revealing that sub-channel extremal states can differ under various orders, providing deeper micro-scale insights into polarization phenomena.
Contribution
It introduces polarization analysis based on Rényi entropy, showing that sub-channel extremal states can vary with entropy order, unlike traditional Shannon-based theories.
Findings
Sub-channels can have opposite extremal states under different Rényi entropy orders.
Polarization phenomena can be analyzed at the micro scale, focusing on probability pairs.
The theory broadens understanding of information measures beyond Shannon entropy.
Mengfan Zheng
Imperial College London,
United Kingdom
Email: [email protected]
Ling Liu
Huawei Technologies Co. Ltd.
Shenzhen, P. R. China
Email: [email protected]
Cong Ling
Imperial College London,
United Kingdom
Email: [email protected]
Abstract
Existing polarization theories have mostly been concerned with Shannon’s information measures, such as Shannon entropy and mutual information, and some related measures such as the Bhattacharyya parameter. In this work, we extend polarization theories to a more general information measure, namely, the Rényi entropy. Our study shows that under conditional Rényi entropies of different orders, the same synthetic sub-channel may exhibit opposite extremal states. This result reveals more insights into the polarization phenomenon on the micro scale (probability pairs) rather than on the average scale.
I Introduction
The polarization technique (including channel polarization [1] and source polarization [2]) is one of the most significant breakthroughs in information theory over the past decade. Arıkan showed that as the size of the polar transformation goes to infinity, the conditional entropies of the synthetic sub-channels (or random variable pairs) equal 0 or 1 almost everywhere (a.e.) [1]. Also, their varentropies (the variance of the conditional entropy random variable) asymptotically decrease to zero [3]. These results imply that the sub-channels' transition probability matrices tend to be either deterministic (noiseless channels) or uniform with respect to any channel input (completely noisy channels). However, are they still close to uniform or deterministic distributions under stricter criteria? Polarization results using Shannon's information measures fail to answer this question.
In this work, we study polarization using a more general information measure, i.e., the Rényi entropy [4]. The Rényi entropy is more sensitive to deviations from uniform or deterministic distributions, and has been used in many areas where Shannon entropy may not be a good metric. For example, the collision entropy and min-entropy, both of which are special cases of the Rényi entropy, are convenient metrics for privacy amplification in secret-key agreement [5]. The Rényi entropy of a random variable is defined as follows.
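Definition 1 (Rényi Entropy [4]).
The Rényi entropy of a random variable $X\in\mathcal{X}$ of order $\alpha\in(0,1)\cup(1,\infty)$ is defined as
$$H_{\alpha}(X)=\frac{1}{1-\alpha}\log\sum_{x\in\mathcal{X}}P_{X}(x)^{\alpha}. \tag{1}$$
It can be shown that as $\alpha\to 1$, the Rényi entropy reduces to the Shannon entropy. Two other special cases of the Rényi entropy which will be discussed later include the max-entropy:
$$H_{0}(X)=\log\big|\{x\in\mathcal{X}:P_{X}(x)>0\}\big|, \tag{2}$$
which is the Rényi entropy of order 0, and the min-entropy:
$$H_{\infty}(X)=-\log\max_{x\in\mathcal{X}}P_{X}(x), \tag{3}$$
which equals the limiting value of $H_{\alpha}(X)$ as $\alpha\to\infty$.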
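As a minimal numerical sketch of Definition 1 and its special cases, the Rényi entropy in bits can be computed as below. The function name `renyi_entropy` and the explicit handling of the orders 0, 1 and infinity are illustrative choices, not taken from the paper; the standard form $H_\alpha(X)=\frac{1}{1-\alpha}\log_2\sum_x P_X(x)^\alpha$ is assumed.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy (in bits) of a probability vector p for order alpha >= 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                      # zero-probability symbols do not contribute
    if alpha == 0:                    # max-entropy: log of the support size
        return float(np.log2(p.size))
    if np.isclose(alpha, 1.0):        # limit alpha -> 1: Shannon entropy
        return float(-np.sum(p * np.log2(p)))
    if np.isinf(alpha):               # limit alpha -> infinity: min-entropy
        return float(-np.log2(p.max()))
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

# Illustrative distribution (hypothetical values): the entropy is
# non-increasing in the order alpha.
p = [0.7, 0.2, 0.1]
values = [renyi_entropy(p, a) for a in (0, 0.5, 1, 2, np.inf)]
```

For a fixed distribution the values decrease (weakly) as the order grows, which is why the max-entropy and min-entropy bracket the Shannon entropy.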
Unlike the conditional Shannon entropy, there is no generally accepted definition of the conditional Rényi entropy yet. In this paper, we adopt the following definition of conditional Rényi entropy in the study of polarization.
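Definition 2 (Conditional Rényi entropy [6, 7]).
The conditional Rényi entropy of order $\alpha$ of $X$ given $Y$ is defined as
$$H_{\alpha}(X|Y)=\frac{1}{1-\alpha}\log\frac{\sum_{x\in\mathcal{X},y\in\mathcal{Y}}P_{X,Y}(x,y)^{\alpha}}{\sum_{y\in\mathcal{Y}}P_{Y}(y)^{\alpha}}. \tag{4}$$
Note that this type of conditional Rényi entropy satisfies the chain rule:
$$H_{\alpha}(X,Y)=H_{\alpha}(Y)+H_{\alpha}(X|Y). \tag{5}$$
This means that it reduces to the conditional Shannon entropy when $\alpha\to 1$.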
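The chain-rule property can be checked numerically. The sketch below assumes the ratio form of the conditional Rényi entropy used in [6, 7], namely $H_\alpha(X|Y)=\frac{1}{1-\alpha}\log_2\frac{\sum_{x,y}P_{X,Y}(x,y)^\alpha}{\sum_y P_Y(y)^\alpha}$; the function names and the example joint distribution are hypothetical.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy (bits) of a probability array p, for alpha > 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log2(p)))
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

def cond_renyi_entropy(joint, alpha):
    """H_alpha(X|Y) for joint[x, y] = P_{X,Y}(x, y), using the ratio form."""
    joint = np.asarray(joint, dtype=float)
    p_y = joint.sum(axis=0)           # marginal of Y
    if np.isclose(alpha, 1.0):        # Shannon case via the chain rule
        return renyi_entropy(joint, 1.0) - renyi_entropy(p_y, 1.0)
    ratio = np.sum(joint[joint > 0] ** alpha) / np.sum(p_y[p_y > 0] ** alpha)
    return float(np.log2(ratio) / (1.0 - alpha))

# Hypothetical joint distribution of a binary X and a binary Y.
joint = np.array([[0.3, 0.1],
                  [0.2, 0.4]])
chain_gap = [
    renyi_entropy(joint, a) - renyi_entropy(joint.sum(axis=0), a)
    - cond_renyi_entropy(joint, a)
    for a in (0.5, 2.0, 5.0)
]
```

Each entry of `chain_gap` is zero up to floating-point error, and evaluating `cond_renyi_entropy` at an order very close to 1 approaches the conditional Shannon entropy.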
There has been very limited research on the polarization of conditional Rényi entropies. In [8] it is shown that the following chain rule inequality holds for the polar transformation for ,
[TABLE]
whenever , are i.i.d. uniform on . The inequality holds with equality if and only if the channel is perfect, or the channel is completely noisy, or . Note that the conditional Rényi entropy in [8] is defined as $H^{*}_{\alpha}(X|Y)=\frac{\alpha}{1-\alpha}\log\sum_{y\in\mathcal{Y}}P_{Y}(y)\Big[\sum_{x\in\mathcal{X}}P_{X|Y}(x|y)^{\alpha}\Big]^{\frac{1}{\alpha}}$.
In this paper, we also restrict ourselves to the binary case (i.e., $\mathcal{X}=\{0,1\}$) as a starting point. As a result, logarithms in this paper will all be base-2. However, we do not assume $X$ to be uniformly distributed. We prove that the order-$\alpha$ conditional Rényi entropies of the synthetic sub-channels also polarize to 0 and 1, but the fraction of 1 tends to $H_{\alpha}(X|Y)$. That is, for a given sub-channel, its conditional Rényi entropies of different orders may exhibit opposite extremal states. Intuitively, if a sub-channel is truly noiseless (or truly completely noisy), its conditional Rényi entropies of different orders should all be 0 (or 1). We show both analytically and numerically that this strange phenomenon is caused by a vanishing deviation from truly deterministic or truly uniform distributions. The different extremal states that a sub-channel exhibits for various $\alpha$ reflect the polarization level of the sub-channel at the micro scale, i.e., how close its joint distribution is to truly uniform or truly deterministic.
II Polarization of Rényi Entropy
II-A Polarization of a Basic Polar Transformation
Consider two binary-input discrete memoryless channels (B-DMC) and with the same input distribution in the channel coding scenario, or consider and as two samples of a memoryless source with being the binary source to be compressed and being the side information about in the source coding scenario. Note that in the former case, and can be different, which corresponds to the compound channel setting. Let
[TABLE]
and denote
[TABLE]
for short. From (6) we know that
[TABLE]
For the basic polar transformation, we have the following polarization result.
Lemma 1.
For , we have
[TABLE]
Let us first recall the Minkowski inequality before proving Lemma 1.
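Proposition 1.
The Minkowski inequality states that for $p>1$ and nonnegative sequences $a=(a_k)$ and $b=(b_k)$,
$$\Big(\sum_{k}(a_k+b_k)^{p}\Big)^{\frac{1}{p}}\leq\Big(\sum_{k}a_k^{p}\Big)^{\frac{1}{p}}+\Big(\sum_{k}b_k^{p}\Big)^{\frac{1}{p}}, \tag{11}$$
with equality if and only if $a$ and $b$ are positively linearly dependent, i.e., $a=\lambda b$ for some $\lambda\geq 0$ or $b=0$. For $0<p<1$, the inequality in (11) is reversed.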
Proof of Lemma 1.
(I) First, we consider (10). Denote
[TABLE]
Then we have
[TABLE]
(II) Next, we consider (9).
can be expressed as (15), where
[TABLE]
[TABLE]
For , by the Minkowski inequality we have
[TABLE]
Thus, , which means
[TABLE]
For , the inequality in (16) is reversed. However, since , the result remains the same. For , the Rényi entropy reduces to the Shannon entropy, and the polarization result is identical to the result in [1]. For , it is obvious from (2) and (4) that (10) holds, while (8) and (9) hold with equality.
Similarly, can be proved.
(III) Finally, equality (10) and inequality (9) immediately imply (8).
∎
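The chain rule equality used in the proof can be checked numerically for the basic transformation. The sketch below assumes the standard Arıkan kernel $U_1=X_1\oplus X_2$, $U_2=X_2$ and the conditional Rényi entropy written as a difference of joint entropies (the chain rule); the two BSC crossover probabilities, the input distribution and all function names are hypothetical.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy (bits) of a probability array p, for alpha > 0, alpha != 1."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

def cond_renyi_entropy(joint, alpha):
    """H_alpha(X|Y) = H_alpha(X,Y) - H_alpha(Y) for joint[x, y]."""
    return renyi_entropy(joint, alpha) - renyi_entropy(joint.sum(axis=0), alpha)

def basic_transform(p1, p2, alpha):
    """Return (H(U1|Y1,Y2), H(U2|Y1,Y2,U1)) for U1 = X1 xor X2, U2 = X2."""
    q = np.zeros((2, 2, p1.shape[1], p2.shape[1]))   # q[u1, u2, y1, y2]
    for u1 in (0, 1):
        for u2 in (0, 1):
            # X1 = U1 xor U2 and X2 = U2, and the two pairs are independent.
            q[u1, u2] = np.outer(p1[u1 ^ u2], p2[u2])
    h_y = renyi_entropy(q.sum(axis=(0, 1)), alpha)   # H(Y1, Y2)
    h_u1_y = renyi_entropy(q.sum(axis=1), alpha)     # H(U1, Y1, Y2)
    h_all = renyi_entropy(q, alpha)                  # H(U1, U2, Y1, Y2)
    return h_u1_y - h_y, h_all - h_u1_y

def bsc_joint(prob_zero, crossover):
    """Joint P_{X,Y} of a BSC with P(X=0) = prob_zero (illustrative helper)."""
    w = np.array([[1 - crossover, crossover], [crossover, 1 - crossover]])
    return np.array([[prob_zero], [1 - prob_zero]]) * w

p1, p2 = bsc_joint(0.3, 0.1), bsc_joint(0.3, 0.2)
h_minus, h_plus = basic_transform(p1, p2, alpha=2.0)
total_before = cond_renyi_entropy(p1, 2.0) + cond_renyi_entropy(p2, 2.0)
```

Here `h_minus + h_plus` agrees with `total_before` up to rounding, because the transformation is a bijection on $(X_1,X_2)$ and the Rényi entropy is additive over independent pairs.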
II-B Recursive Polar Transformation
Now consider extending the basic transformation recursively to higher orders. For with being an arbitrary integer, the recursive transformation can be expressed as [1], where with being the bit-reversal matrix and . Denote . We have the following theorem.
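For concreteness, the recursive transformation matrix can be built as follows. This is a sketch assuming Arıkan's standard construction from [1], $G_N=B_NF^{\otimes n}$ with $F=\begin{bmatrix}1&0\\1&1\end{bmatrix}$ and $N=2^n$; the function name `polar_generator` is hypothetical.

```python
import numpy as np

def polar_generator(n):
    """Return G_N = B_N F^{(x) n} over GF(2) for N = 2**n, n >= 1."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    Fn = np.array([[1]], dtype=int)
    for _ in range(n):
        Fn = np.kron(Fn, F)                      # n-fold Kronecker power of F
    N = 2 ** n
    # Bit-reversal permutation: index i maps to its n-bit binary reversal.
    rev = [int(format(i, f"0{n}b")[::-1], 2) for i in range(N)]
    B = np.eye(N, dtype=int)[rev]                # permutation matrix B_N
    return (B @ Fn) % 2

G4 = polar_generator(2)
G8 = polar_generator(3)
```

Since $B_N$ and $F^{\otimes n}$ commute and each is its own inverse over GF(2), the resulting matrix satisfies $G_N G_N = I$ modulo 2.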
Theorem 1.
For any B-DMC (or any discrete memoryless source over with and an arbitrary countable set) and any , as through the power of 2, the fraction of indices with goes to , and the fraction with goes to .
Proof.
We follow Arıkan’s martingale approach [1, 9] to complete the proof. First, we introduce the same infinite binary tree as in the proof of [1, Theorem 1], with a root node at level 0 and nodes at level . Then define a random walk in this tree as follows. The random walk starts at the root node with , and moves to one of the two child nodes in the next level with equal probability at each integer time. If , equals or with probability each. Denote and for , . Define a random process with . It can be shown that the process is a martingale due to the chain rule equality of (10), i.e.,
[TABLE]
Since is a uniformly integrable martingale, it converges a.e. to an RV such that . Then we have
[TABLE]
Note that
[TABLE]
In Lemma 1, by letting , , , and , we can see that (19) and (20) force (16) to hold with equality for a.e. as . From Proposition 1 we know that (16) holds with equality only in the following two cases:
- (Case 1) or for all effective .
- (Case 2) for all effective .
The effective elements of (denoted by ) mean that
[TABLE]
as . A more detailed discussion about it will be given in the next subsection. Since and are identical in the recursive transformation, the joint distributions of () tend to either Case 1 or Case 2 a.e. as , where . It is easy to verify that equals 0 in Case 1 and 1 in Case 2. This shows that converges a.e. to 0 and 1 as .
The convergence result together with the chain rule equality of (10) implies that the fraction of goes to as .
∎
Now we can answer the question raised at the beginning of this paper. From the definition of the max-entropy and (5) we know that
[TABLE]
provided that the probabilities are nonzero. Since never really become 0 under the polar transformation, we know that the fraction of sub-channels with is always 0, which means that the synthetic sub-channels never really polarize to a truly deterministic state. On the contrary, the min-conditional-entropy,
[TABLE]
is not a constant in general. As will be explained in Section III, only a truly uniform distribution yields . Therefore, as , a fraction of the sub-channels will become truly completely noisy. This result is summarized in the following corollary.
Corollary 1.
As , the fraction of truly completely noisy sub-channels tends to , while the fraction of truly noiseless sub-channels is 0.
II-C An Example of Effective Elements
The reason we introduce the term "effective elements" is that, without the qualifier "effective" in the definitions of Case 1 and Case 2, the conditional Rényi entropies of all orders would be 0 in Case 1 and 1 in Case 2. In this subsection, we discuss this issue further. First, let us present an example to show that for a given and , can be arbitrarily close to 1 while can be arbitrarily close to 0.
Let . Consider a joint distribution such that a fraction of probability pairs $\big(P_{X,Y}(0,y),P_{X,Y}(1,y)\big)$ are completely deterministic with accumulated probability of . Without loss of generality, we assume that and for , where is the set of deterministic pairs. The remaining fraction of probability pairs are completely uniform, i.e., for . Then
[TABLE]
For a considered , let
[TABLE]
Then we have
[TABLE]
It is clear that as ,, thus .
Now let . From (23) we have
[TABLE]
In this case, as ,, thus .
From (24) we can see that as if . Also, as . Therefore, we have shown a case when and can be completely opposite asymptotically. Fig. 1 shows the example of .
We can similarly design such an example for . This shows that the extreme cases defined in the proof of Theorem 1 are relative. Even if almost all probability pairs of a sub-channel are uniform (so that it may seem to be of Case 2), when powered by some , these probability pairs may have little impact on the value of (so that the sub-channel is actually of Case 1), just as our example has shown. Thus, although we proved Theorem 1 for any in a unified form, the criterion for judging whether a sub-channel converges to Case 1 or Case 2 depends on .
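The dependence on the order can be reproduced with a small numerical sketch. The construction below is a simplified variant chosen for illustration, not necessarily the parameterization used above: a single deterministic probability pair carries almost all of the probability mass, and a large number of uniform pairs share the rest. The helper names and the numbers (one million uniform pairs with total mass 0.01) are assumptions.

```python
import numpy as np

def cond_renyi_entropy(joint, alpha):
    """H_alpha(X|Y) in bits for joint[x, y], alpha > 0, alpha != 1 (ratio form)."""
    p_y = joint.sum(axis=0)
    ratio = np.sum(joint[joint > 0] ** alpha) / np.sum(p_y[p_y > 0] ** alpha)
    return float(np.log2(ratio) / (1.0 - alpha))

def mixed_joint(num_uniform, uniform_mass):
    """One deterministic pair plus `num_uniform` uniform pairs (rows: x, columns: y)."""
    det = np.array([[1.0 - uniform_mass], [0.0]])            # pair (1 - mass, 0)
    uni = np.full((2, num_uniform), uniform_mass / (2 * num_uniform))
    return np.hstack([det, uni])

joint = mixed_joint(num_uniform=10**6, uniform_mass=0.01)
h_half = cond_renyi_entropy(joint, 0.5)   # order below 1: uniform pairs dominate
h_two = cond_renyi_entropy(joint, 2.0)    # order above 1: deterministic pair dominates
```

With these numbers the order-0.5 value is close to 1 while the order-2 value is close to 0, although both are computed from the same joint distribution. Raising probabilities to a power below 1 amplifies the many tiny uniform pairs, whereas a power above 1 suppresses them.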
II-D Numerical Results
In this subsection we numerically demonstrate the polarization of a binary symmetric channel (BSC) with crossover probability 0.2. We calculate the joint distribution of each synthetic sub-channel and then compute its conditional Rényi entropies of order 0.1, 0.5, 1, 2, 10 and 100, respectively. The results are shown in Fig. 2. The channel indices are reordered so that the conditional Shannon entropies increase monotonically. The dashed lines indicate the proportions of extremal sub-channels as . Although the considered block-length is not long, this figure clearly shows that the polarized sets vary with . We can also see that a relatively higher conditional Shannon entropy does not necessarily imply a relatively higher conditional Rényi entropy of a different order.
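A brute-force version of this kind of computation can be sketched for a very short block length. The code below assumes a uniform channel input, $N=8$, and the transform $F^{\otimes n}$ without the bit-reversal permutation (which only reorders the i.i.d. channel outputs and so leaves the sub-channels unchanged); these choices and all names are illustrative and are not the setup behind Fig. 2. Memory grows as $4^N$, so the approach is only practical for $n\leq 3$.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy (bits) of a probability array p, for alpha > 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log2(p)))
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

def subchannel_renyi(n, crossover, alpha):
    """Conditional Rényi entropies H(U_i | Y^N, U^{i-1}), i = 1..N, for a BSC."""
    N = 2 ** n
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, F)
    # All binary vectors of length N; row index has the first bit as MSB.
    idx = np.arange(2 ** N)
    bits = (idx[:, None] >> np.arange(N - 1, -1, -1)) & 1
    x = bits @ G % 2                                     # codeword for every u
    # Hamming distance between each codeword x(u) and each output y.
    dist = (x[:, None, :] != bits[None, :, :]).sum(axis=2)
    joint = (crossover ** dist) * ((1 - crossover) ** (N - dist)) / 2 ** N
    # h[i] = H_alpha(U^i, Y^N): marginalize out the last N - i bits of u.
    h = []
    for i in range(N + 1):
        prefix = joint.reshape(2 ** i, 2 ** (N - i), 2 ** N).sum(axis=1)
        h.append(renyi_entropy(prefix, alpha))
    return np.diff(h)                                    # chain rule differences

entropies = {a: subchannel_renyi(3, 0.2, a) for a in (0.5, 1.0, 2.0)}
```

By the chain rule, the average of the $N$ values for each order equals the conditional Rényi entropy of the underlying BSC, which for a uniform input is the Rényi entropy of the pair (0.2, 0.8). Comparing `np.argsort(entropies[0.5])` with `np.argsort(entropies[2.0])` shows whether the ordering of sub-channels changes with the order.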
III A Discussion on Rényi entropy
In this section, we further discuss how small the deviation from a uniform or deterministic distribution should be to make the conditional Rényi entropy achieve 1 or 0. Let be a uniform distribution with respect to , i.e., for any , and further assume that . Let be a deterministic distribution with respect to , i.e., or for any , and further assume that . Then we have
[TABLE]
and similarly
[TABLE]
For the uniform distribution case, assume that , where . Denote . Define
[TABLE]
As a simple example, further assume that is also uniform. Then to ensure that ,
[TABLE]
is required. As , only truly uniform distribution yields .
For the deterministic distribution case, without loss of generality, assume , and for any , where . Denote . Define
[TABLE]
For , the difference between and shrinks by the power- operation. As approaches 0, gains more influence on . When is small enough, we will have
[TABLE]
In the extreme case when and for all , we get , and always equals 1. equals 0 if and only if for all .
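The sensitivity discussed in this section can be illustrated with a single probability pair. The sketch assumes that every pair has the same conditional distribution $(p, 1-p)$, in which case the conditional Rényi entropy of the sub-channel equals the Rényi entropy of that pair; the deviations 0.01 and $10^{-6}$ and the function name are hypothetical.

```python
import numpy as np

def binary_renyi(p, alpha):
    """Rényi entropy (bits) of the pair (p, 1 - p), with 0 < p < 1 and alpha > 0."""
    if np.isclose(alpha, 1.0):
        return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))
    return float(np.log2(p ** alpha + (1 - p) ** alpha) / (1.0 - alpha))

# Nearly uniform pair: the Shannon entropy is almost 1, but a large order
# exposes the deviation from the uniform distribution.
near_uniform = {a: binary_renyi(0.51, a) for a in (1, 2, 10, 100)}

# Nearly deterministic pair: the Shannon entropy is almost 0, but a small
# order gives the tiny probability much more influence.
near_deterministic = {a: binary_renyi(1e-6, a) for a in (1, 0.5, 0.1, 0.01)}
```

In both dictionaries the Shannon value (order 1) sits at the extreme, while the values at the other orders drift away from it as the order moves toward infinity or toward 0, respectively.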
IV Concluding Remarks
This work has revealed the polarization phenomenon of conditional Rényi entropies. Much work remains before the results can be applied to specific problems. For example, estimating conditional Rényi entropies accurately can be a hard problem when grows large, and existing approximation methods for polar code constructions may not be directly applicable. As another example, the polarization rate of conditional Rényi entropies has not been addressed in this paper. We leave these problems for future research.
- [1] E. Arıkan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
- [2] ——, “Source polarization,” in 2010 IEEE International Symposium on Information Theory, 2010, pp. 899–903.
- [3] ——, “Varentropy decreases under the polar transform,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3390–3400, 2016.
- [4] A. Rényi, “On measures of entropy and information,” Hungarian Academy of Sciences Budapest Hungary, Tech. Rep., 1961.
- [5] M. Bloch and J. Barros, Physical-layer security. Cambridge University Press, 2011.
- [6] P. Jizba and T. Arimitsu, “The world according to Rényi: thermodynamics of multifractal systems,” Annals of Physics, vol. 312, no. 1, pp. 17–59, 2004.
- [7] L. Golshani, E. Pasha, and G. Yari, “Some properties of Rényi entropy and Rényi entropy rate,” Information Sciences, vol. 179, no. 14, pp. 2426–2433, 2009.
- [8] M. Alsan and E. Telatar, “Polarization improves $E_0$,” IEEE Transactions on Information Theory, vol. 60, no. 5, pp. 2714–2719, 2014.
