An Entropy Power Inequality for Discrete Random Variables

Ehsan Nekouei; Mikael Skoglund; Karl Henrik Johansson

arXiv:1905.03015·cs.IT·May 9, 2019

TL;DR

This paper establishes an entropy power inequality for independent discrete random variables: the sum of their entropy powers is at most twice the entropy power of their sum. The proof perturbs the discrete variables with continuous ones and applies the continuous entropy power inequality.

Contribution

The paper derives an entropy power inequality that holds for two arbitrarily distributed, independent discrete random variables. Earlier discrete results covered specific families, such as binomial, uniform, and ultra log-concave distributions.

Findings

01

The inequality is tight when the effective support sets of $X$ and $Y$ are singletons.

02

The proof perturbs the discrete variables with continuous ones and applies the continuous entropy power inequality.

03

Provides a new tool for analyzing discrete entropy.

Abstract

Let $N_{d}[X] = \frac{1}{2 \pi e} e^{2 H[X]}$ denote the entropy power of the discrete random variable $X$, where $H[X]$ denotes the discrete entropy of $X$. In this paper, we show that for two independent discrete random variables $X$ and $Y$, the entropy power inequality $N_{d}[X] + N_{d}[Y] \leq 2 N_{d}[X + Y]$ holds and can be tight. The basic idea behind the proof is to perturb the discrete random variables using suitably designed continuous random variables. Then, the continuous entropy power inequality is applied to the sum of the perturbed random variables and the resulting lower bound is optimized.

Equations

$$N_{c}[U] + N_{c}[V] \leq N_{c}[U + V]$$

$$N_{d}[X] + N_{d}[Y] \leq 2 N_{d}[X + Y]$$

$$h[V] = -\int p_{V}(x) \log p_{V}(x)\, dx, \qquad N_{c}[V] = \frac{1}{2 \pi e} e^{2 h[V]}$$

$$H[X] = -\sum_{x} \Pr(X = x) \log \Pr(X = x), \qquad N_{d}[X] = \frac{1}{2 \pi e} e^{2 H[X]}$$

$$h[M + T] = H[M] + h[T]$$

$$h[X + W_{1}] = H[X] + h[W_{1}], \qquad h[Y + W_{2}] = H[Y] + h[W_{2}]$$

$$h[X + W_{1} + Y + W_{2}] = H[X + Y] + h[W_{1} + W_{2}]$$

$$\frac{N_{d}[X + Y]}{N_{d}[X] + N_{d}[Y]} \geq \sup_{p(x) \in \Lambda} \frac{e^{2 h[W_{1}]}}{e^{2 h[W_{1} + W_{2}]}}$$

$$\sup_{p(x) \in \Lambda} \frac{e^{2 h[W_{1}]}}{e^{2 h[W_{1} + W_{2}]}} = \frac{1}{2}$$

$$\begin{aligned}
h[M + T] &\overset{(a)}{=} -\sum_{i} \int \Pr(M = m_{i}) P_{T}(x - m_{i}) \log\left(\Pr(M = m_{i}) P_{T}(x - m_{i})\right) dx \\
&= -\sum_{i} \Pr(M = m_{i}) \log \Pr(M = m_{i}) \int P_{T}(x - m_{i})\, dx - \sum_{i} \Pr(M = m_{i}) \int P_{T}(x - m_{i}) \log P_{T}(x - m_{i})\, dx \\
&\overset{(b)}{=} -\sum_{i} \Pr(M = m_{i}) \log \Pr(M = m_{i}) - \int P_{T}(x) \log P_{T}(x)\, dx \\
&= H[M] + h[T]
\end{aligned}$$

$$\begin{aligned}
1 &\geq \frac{N_{c}[X + W_{1}] + N_{c}[Y + W_{2}]}{N_{c}[X + W_{1} + Y + W_{2}]} \\
&= \frac{\frac{1}{2 \pi e} e^{2 h[X + W_{1}]} + \frac{1}{2 \pi e} e^{2 h[Y + W_{2}]}}{\frac{1}{2 \pi e} e^{2 h[X + W_{1} + Y + W_{2}]}} \\
&\overset{(a)}{=} \frac{\frac{1}{2 \pi e} e^{2 H[X]} e^{2 h[W_{1}]} + \frac{1}{2 \pi e} e^{2 H[Y]} e^{2 h[W_{2}]}}{\frac{1}{2 \pi e} e^{2 H[X + Y]} e^{2 h[W_{1} + W_{2}]}} \\
&\overset{(b)}{=} \frac{e^{2 h[W_{1}]}}{e^{2 h[W_{1} + W_{2}]}} \cdot \frac{N_{d}[X] + N_{d}[Y]}{N_{d}[X + Y]}
\end{aligned}$$

$$\frac{N_{d}[X + Y]}{N_{d}[X] + N_{d}[Y]} \geq \frac{e^{2 h[W_{1}]}}{e^{2 h[W_{1} + W_{2}]}}$$

$$\frac{e^{2 h[W_{1}]}}{e^{2 h[W_{1} + W_{2}]}} \leq \frac{1}{2}$$

$$\sup_{p(x) \in \Lambda} \frac{e^{2 h[W_{1}]}}{e^{2 h[W_{1} + W_{2}]}} \leq \frac{1}{2}$$

$$p_{\sigma}(x) = \begin{cases} \frac{K(\sigma)}{\sqrt{2 \pi} \sigma} e^{-\frac{x^{2}}{2 \sigma^{2}}} & x \in \left(-\frac{\alpha_{z}}{4}, \frac{\alpha_{z}}{4}\right) \\ 0 & \text{o.w.} \end{cases}$$

$$\begin{aligned}
\sup_{p(x) \in \Lambda} \frac{e^{2 h[W_{1}]}}{e^{2 h[W_{1} + W_{2}]}} &\overset{(a)}{\geq} \frac{e^{2 h[W_{1}^{\sigma}]}}{e^{2 h[W_{1}^{\sigma} + W_{2}^{\sigma}]}} \\
&\overset{(b)}{\geq} \frac{e^{2 h[W_{1}^{\sigma}]}}{e^{2 \times \frac{1}{2} \log\left(2 \pi e\, E\left[(W_{1}^{\sigma} + W_{2}^{\sigma})^{2}\right]\right)}} \\
&\geq \frac{e^{2 h[W_{1}^{\sigma}]}}{2 \pi e\, E\left[(W_{1}^{\sigma} + W_{2}^{\sigma})^{2}\right]}
\end{aligned}$$

$$\begin{aligned}
E\left[(W_{1}^{\sigma} + W_{2}^{\sigma})^{2}\right] &= 2 \int_{-\frac{\alpha_{z}}{4}}^{\frac{\alpha_{z}}{4}} x^{2} \frac{K(\sigma)}{\sqrt{2 \pi} \sigma} e^{-\frac{x^{2}}{2 \sigma^{2}}}\, dx \\
&\leq 2 K(\sigma) \int_{-\infty}^{\infty} \frac{x^{2}}{\sqrt{2 \pi} \sigma} e^{-\frac{x^{2}}{2 \sigma^{2}}}\, dx \\
&= 2 K(\sigma) \sigma^{2}
\end{aligned}$$

$$\begin{aligned}
h[W_{1}^{\sigma}] &= -\log K(\sigma) - K(\sigma) \int_{-\frac{\alpha_{z}}{4}}^{\frac{\alpha_{z}}{4}} \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{x^{2}}{2 \sigma^{2}}} \log\left(\frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{x^{2}}{2 \sigma^{2}}}\right) dx \\
&= -\log K(\sigma) - K(\sigma) \left[\int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{x^{2}}{2 \sigma^{2}}} \log\left(\frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{x^{2}}{2 \sigma^{2}}}\right) dx - 2 \int_{\frac{\alpha_{z}}{4}}^{\infty} \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{x^{2}}{2 \sigma^{2}}} \log\left(\frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{x^{2}}{2 \sigma^{2}}}\right) dx\right] \\
&= -\log K(\sigma) - K(\sigma) \left[-\frac{1}{2} \log\left(2 \pi e \sigma^{2}\right) - 2 \int_{\frac{\alpha_{z}}{4 \sigma}}^{\infty} \frac{1}{\sqrt{2 \pi}} e^{-\frac{x^{2}}{2}} \log\left(\frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{x^{2}}{2}}\right) dx\right] \\
&= -\log K(\sigma) - K(\sigma) \left[-\frac{1}{2} \log\left(2 \pi e \sigma^{2}\right) + 2 \underbrace{\log\left(\sqrt{2 \pi} \sigma\right) \int_{\frac{\alpha_{z}}{4 \sigma}}^{\infty} \frac{1}{\sqrt{2 \pi}} e^{-\frac{x^{2}}{2}}\, dx}_{\eta(\sigma)} + \underbrace{\int_{\frac{\alpha_{z}}{4 \sigma}}^{\infty} \frac{x^{2}}{\sqrt{2 \pi}} e^{-\frac{x^{2}}{2}}\, dx}_{\Phi(\sigma)}\right] \\
&= -\log K(\sigma) - K(\sigma) \left[-\frac{1}{2} \log\left(2 \pi e \sigma^{2}\right) + 2 \eta(\sigma) + \Phi(\sigma)\right]
\end{aligned}$$

Taxonomy

Topics: Wireless Communication Security Techniques · Limits and Structures in Graph Theory · Distributed Sensor Networks and Detection Algorithms

Full text

An Entropy Power Inequality for Discrete Random Variables

Ehsan Nekouei, Mikael Skoglund and Karl H. Johansson are with the School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden ({nekouei,skoglund,kallej}@kth.se). This work is supported by the Knut and Alice Wallenberg Foundation, the Swedish Foundation for Strategic Research and the Swedish Research Council.

Abstract

Let $\mathsf{N}_{\rm d}\left[X\right]=\frac{1}{2\pi{\rm e}}{\rm e}^{2\mathsf{H}\left[X\right]}$ denote the entropy power of the discrete random variable $X$, where $\mathsf{H}\left[X\right]$ denotes the discrete entropy of $X$. In this paper, we show that for two independent discrete random variables $X$ and $Y$, the entropy power inequality $\mathsf{N}_{\rm d}\left[X\right]+\mathsf{N}_{\rm d}\left[Y\right]\leq 2\mathsf{N}_{\rm d}\left[X+Y\right]$ holds and can be tight. The basic idea behind the proof is to perturb the discrete random variables using suitably designed continuous random variables. Then, the continuous entropy power inequality is applied to the sum of the perturbed random variables and the resulting lower bound is optimized.

Index Terms:

Discrete entropy power inequality.

I Introduction

The continuous entropy power inequality [1], [2], [3] asserts that for two independent absolutely continuous random variables (rvs) $U$ and $V$, the following inequality holds

$$\mathsf{N}_{\rm c}\left[U\right]+\mathsf{N}_{\rm c}\left[V\right]\leq\mathsf{N}_{\rm c}\left[U+V\right],\tag{1}$$

where $\mathsf{N}_{\rm c}\left[\cdot\right]=\frac{1}{2\pi{\rm e}}{\rm e}^{2\mathsf{h}\left[\cdot\right]}$ and $\mathsf{h}\left[\cdot\right]$ denote the continuous entropy power and the differential entropy functionals, respectively. In the information theory literature, substantial efforts have been dedicated to obtaining an analogue of (1) for discrete rvs. In general, the discrete counterpart of (1), where the differential entropy is replaced by the discrete entropy, does not hold for discrete rvs. Classes of discrete rvs which satisfy the discrete version of (1) have been studied in the literature. Let $B\left(n,p\right)$ denote a binomial distribution with $n$ trials and success probability $p$. Harremoës and Vignat [4] showed that the discrete version of (1) holds for two binomial rvs distributed according to $B\left(n,p\right)$ and $B\left(m,p\right)$ with $p=\frac{1}{2}$ and $m,n\in\mathbb{N}$. Sharma et al. [5] proved that this result holds for $p\in\left(0,1\right)$ when $m$ and $n$ are sufficiently large.

The authors of [6] showed that the discrete version of (1) holds for two independent and uniformly distributed rvs. A variant of the entropy power inequality for ultra log-concave discrete rvs has been derived in [7] using Rényi’s thinning operation. It is worth mentioning that lower bounds on the entropy of a sum of independent discrete rvs have been investigated extensively in the literature. The interested reader is referred to [8], [9] and references therein for more information on this line of research.

In this paper, we derive a discrete entropy power inequality, which is analogous to the continuous entropy power inequality and holds for the sum of two arbitrarily distributed, independent discrete rvs. More specifically, it is shown that for two independent discrete rvs $X$ and $Y$, we have

$$\mathsf{N}_{\rm d}\left[X\right]+\mathsf{N}_{\rm d}\left[Y\right]\leq 2\mathsf{N}_{\rm d}\left[X+Y\right]\tag{2}$$

regardless of their distributions, where $\mathsf{N}_{\rm d}\left[\cdot\right]=\frac{1}{2\pi{\rm e}}{\rm e}^{2\mathsf{H}\left[\cdot\right]}$ and $\mathsf{H}\left[\cdot\right]$ denote the discrete entropy power and the discrete entropy, respectively.

I-A Notation and Organization of The Paper

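Let $V$ denote a generic continuous random variable taking values on $\mathbb{R}$. The differential entropy of $V$ and its (continuous) entropy power are defined as

$$\mathsf{h}\left[V\right]=-\int p_{V}\left(x\right)\log p_{V}\left(x\right)dx,\qquad\mathsf{N}_{\rm c}\left[V\right]=\frac{1}{2\pi{\rm e}}{\rm e}^{2\mathsf{h}\left[V\right]},$$

where $p_{V}\left(x\right)$ denotes the probability density function (pdf) of $V$. For a generic discrete random variable $X$, its discrete entropy and entropy power are defined as

$$\mathsf{H}\left[X\right]=-\sum_{x}\mathsf{Pr}\left(X=x\right)\log\mathsf{Pr}\left(X=x\right),\qquad\mathsf{N}_{\rm d}\left[X\right]=\frac{1}{2\pi{\rm e}}{\rm e}^{2\mathsf{H}\left[X\right]},$$

where the sum runs over the support set of $X$.

The rest of this paper is organized as follows. The next section presents our main result along with the key steps of its proof. Detailed proofs of the steps are presented in Section III.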

II The Main Result

The following theorem establishes an entropy power inequality for the sum of two independent discrete rvs.

Theorem 1

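Consider two independent discrete rvs $X$ and $Y$. Then, we have

$$\mathsf{N}_{\rm d}\left[X\right]+\mathsf{N}_{\rm d}\left[Y\right]\leq 2\mathsf{N}_{\rm d}\left[X+Y\right].$$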

Moreover, the equality is achieved when the “effective” support sets of $X$ and $Y$ are singletons.

Theorem 1 establishes an upper bound on the sum of the entropy powers of two independent discrete rvs: this sum is at most twice the entropy power of their sum. The inequality is tight when each rv takes a single value with probability one, since all three entropies are then zero. Note that the difference between the two sides of (2) becomes small when the probability mass function of each rv is highly concentrated around one element of its support set.
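As a quick numerical illustration of Theorem 1, the following sketch evaluates both sides of the inequality for a few hand-picked pmfs. The helper names (`entropy_power`, `pmf_of_sum`, `epi_gap`) and the example distributions are assumptions made for this illustration and do not come from the paper; entropies are measured in nats.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Discrete entropy H[X] in nats; pmf maps each atom to its probability."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)


def entropy_power(pmf):
    """Discrete entropy power N_d[X] = exp(2 H[X]) / (2 pi e)."""
    return math.exp(2.0 * entropy(pmf)) / (2.0 * math.pi * math.e)


def pmf_of_sum(pmf_x, pmf_y):
    """pmf of X + Y for independent X and Y (discrete convolution)."""
    out = defaultdict(float)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] += px * py
    return dict(out)


def epi_gap(pmf_x, pmf_y):
    """2 N_d[X+Y] - (N_d[X] + N_d[Y]); nonnegative according to Theorem 1."""
    n_sum = entropy_power(pmf_of_sum(pmf_x, pmf_y))
    return 2.0 * n_sum - (entropy_power(pmf_x) + entropy_power(pmf_y))


# Hypothetical example pmfs, chosen only for illustration.
fair_bit = {0: 0.5, 1: 0.5}
skewed = {0: 0.9, 1: 0.05, 4: 0.05}
point_mass = {3: 1.0}

for name, (px, py) in {
    "fair + fair": (fair_bit, fair_bit),
    "skewed + fair": (skewed, fair_bit),
    "point + point": (point_mass, point_mass),
}.items():
    print(f"{name}: gap = {epi_gap(px, py):.6f}")
```

For two point masses the gap is zero, which matches the tightness statement of the theorem. Such spot checks illustrate the inequality but do not replace the proof.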

II-A Proof of Theorem 1

The proof of Theorem 1 relies on 1) perturbing the discrete rvs by carefully chosen continuous rvs, 2) applying the continuous entropy power inequality to the sum of the perturbed rvs, and 3) optimizing the lower bound obtained in step 2. In this subsection, Theorem 1 is proved using four key lemmas.

Let $M$ denote a discrete rv taking values in $\left\{m_{1},\dots,m_{k}\right\}$ and $\alpha_{m}$ denote the minimum spacing between its atoms, i.e., $\alpha_{m}=\min_{i\neq j}\left|m_{i}-m_{j}\right|$. Also, let $T$ denote a real-valued rv, independent of $M$, with $\left|T\right|<\frac{\alpha_{m}}{2}$ almost surely (a.s.). We assume that $T$ is absolutely continuous with respect to the Lebesgue measure on the real line and has finite differential entropy.

The following lemma derives an expression for the differential entropy of $M+T$ . Its proof is presented in Subsection III-A.

Lemma 1

The differential entropy of $M+T$ can be written as

$$\mathsf{h}\left[M+T\right]=\mathsf{H}\left[M\right]+\mathsf{h}\left[T\right],$$

where $\mathsf{h}\left[\cdot\right]$ and $\mathsf{H}\left[\cdot\right]$ denote the differential entropy and the discrete entropy, respectively.
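Lemma 1 can be checked numerically. The sketch below assumes a triangular density for $T$ on $(-a,a)$, for which $\mathsf{h}\left[T\right]=\frac{1}{2}+\log a$ in nats, and integrates the mixture pdf of $M+T$ with a midpoint rule. The atoms, probabilities, and function names are illustrative choices, not taken from the paper.

```python
import math


def discrete_entropy(probs):
    """Discrete entropy H[M] in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def triangular_pdf(t, a):
    """Triangular density on (-a, a); its differential entropy is 1/2 + log(a)."""
    return (a - abs(t)) / (a * a) if abs(t) < a else 0.0


def entropy_of_sum(atoms, probs, a, n=200_000):
    """Midpoint-rule estimate of h[M + T] from the mixture pdf of M + T."""
    lo, hi = min(atoms) - a, max(atoms) + a
    dx = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        f = sum(p * triangular_pdf(x - m, a) for m, p in zip(atoms, probs))
        if f > 0.0:
            total -= f * math.log(f) * dx
    return total


atoms = [0.0, 1.0, 3.0]   # minimum spacing alpha_m = 1
probs = [0.3, 0.5, 0.2]

# |T| < alpha_m / 2: the mixture components do not overlap and Lemma 1 applies.
a = 0.4
lhs = entropy_of_sum(atoms, probs, a)
rhs = discrete_entropy(probs) + 0.5 + math.log(a)
print(f"a = {a}: h[M+T] = {lhs:.5f}, H[M] + h[T] = {rhs:.5f}")

# |T| can exceed alpha_m / 2: components overlap and the identity fails.
a_wide = 0.8
lhs_wide = entropy_of_sum(atoms, probs, a_wide)
rhs_wide = discrete_entropy(probs) + 0.5 + math.log(a_wide)
print(f"a = {a_wide}: h[M+T] = {lhs_wide:.5f}, H[M] + h[T] = {rhs_wide:.5f}")
```

The second case shows why the assumption $\left|T\right|<\frac{\alpha_{m}}{2}$ matters: once the components overlap, $M$ can no longer be recovered from $M+T$ and the differential entropy of the sum falls below $\mathsf{H}\left[M\right]+\mathsf{h}\left[T\right]$.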

Let $X$ and $Y$ denote independent discrete rvs, and $Z$ denote their sum. Let $\alpha_{x}$, $\alpha_{y}$ and $\alpha_{z}$ denote the minimum spacing of $X$, $Y$ and $Z$, respectively. The next lemma derives an upper bound on the minimum spacing of $Z$. The proof of this result is straightforward and is omitted.

Lemma 2

We have $\alpha_{z}\leq\min\left(\alpha_{x},\alpha_{y}\right)$.

According to this lemma, the minimum spacing between the atoms of $Z$ is not larger than those of $X$ and $Y$ .
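A small example makes Lemma 2 concrete. The sketch below computes the minimum spacings for two hypothetical integer support sets; the helper names and the supports are assumptions for illustration only.

```python
from itertools import product


def min_spacing(support):
    """Smallest gap between distinct atoms; infinite when there is one atom."""
    pts = sorted(set(support))
    if len(pts) < 2:
        return float("inf")
    return min(b - a for a, b in zip(pts, pts[1:]))


def support_of_sum(support_x, support_y):
    """Support set of Z = X + Y when every atom has positive probability."""
    return {x + y for x, y in product(support_x, support_y)}


# Hypothetical supports, chosen only for illustration.
support_x = {0, 3, 7}
support_y = {0, 2}
support_z = support_of_sum(support_x, support_y)

alpha_x = min_spacing(support_x)
alpha_y = min_spacing(support_y)
alpha_z = min_spacing(support_z)
print(alpha_x, alpha_y, alpha_z)  # prints: 3 2 1
```

Here the atoms 2 and 3 of $Z$ are closer together than any two atoms of $X$ or of $Y$, so the inequality in Lemma 2 can be strict.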

Let $W_{1}$ and $W_{2}$ be independent and identically distributed (iid) absolutely continuous rvs which are independent of $X$ and $Y$ and take values in $\left(-\frac{\alpha_{z}}{4},\frac{\alpha_{z}}{4}\right)$. Let $p\left(x\right)$ denote the common probability density function (pdf) of $W_{1}$ and $W_{2}$ (with respect to the Lebesgue measure on the real line) and assume it has finite differential entropy. Consider the rvs $X+W_{1}$ and $Y+W_{2}$ which are obtained by perturbing $X$ and $Y$ using $W_{1}$ and $W_{2}$. From Lemmas 1 and 2, we have

$$\mathsf{h}\left[X+W_{1}\right]=\mathsf{H}\left[X\right]+\mathsf{h}\left[W_{1}\right],\qquad\mathsf{h}\left[Y+W_{2}\right]=\mathsf{H}\left[Y\right]+\mathsf{h}\left[W_{2}\right].$$

Moreover, using Lemma 1 and the fact that $\left|W_{1}+W_{2}\right|<\frac{\alpha_{z}}{2}$ a.s., we have

$$\mathsf{h}\left[X+W_{1}+Y+W_{2}\right]=\mathsf{H}\left[X+Y\right]+\mathsf{h}\left[W_{1}+W_{2}\right].\tag{5}$$

The two equalities for $\mathsf{h}\left[X+W_{1}\right]$ and $\mathsf{h}\left[Y+W_{2}\right]$ above, together with (5), are used to establish an inequality on the entropy power of $X+Y$ in Lemma 3. This lemma is proved in Subsection III-B by applying the continuous entropy power inequality to the sum of the perturbed rvs $X+W_{1}$ and $Y+W_{2}$.

Lemma 3

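Let $\Lambda$ denote the set of pdfs that are defined on $\left(-\frac{\alpha_{z}}{4},\frac{\alpha_{z}}{4}\right)$ and have finite differential entropy. Then, we have

$$\frac{\mathsf{N}_{\rm d}\left[X+Y\right]}{\mathsf{N}_{\rm d}\left[X\right]+\mathsf{N}_{\rm d}\left[Y\right]}\geq\sup_{p\left(x\right)\in\Lambda}\frac{{\rm e}^{2\mathsf{h}\left[W_{1}\right]}}{{\rm e}^{2\mathsf{h}\left[W_{1}+W_{2}\right]}},$$

where $W_{1}$ and $W_{2}$ are two independent absolutely continuous rvs with pdf $p\left(x\right)\in\Lambda$.

The next lemma evaluates the lower bound in Lemma 3. The proof of this lemma is relegated to Subsection III-C.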

Lemma 4

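$$\sup_{p\left(x\right)\in\Lambda}\frac{{\rm e}^{2\mathsf{h}\left[W_{1}\right]}}{{\rm e}^{2\mathsf{h}\left[W_{1}+W_{2}\right]}}=\frac{1}{2}.$$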

The proof of Theorem 1 follows from Lemmas 3 and 4.

III Proofs of Lemmas

III-A Proof of Lemma 1

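Let $P_{T}\left(x\right)$ denote the pdf of $T$. Then, the pdf of $M+T$ can be written as $\sum_{i}\mathsf{Pr}\left(M=m_{i}\right)P_{T}\left(x-m_{i}\right)$. The assumption $\left|T\right|<\frac{\alpha_{m}}{2}$ implies that the size of the support set of $T$ is less than the minimum spacing of $M$. This observation implies that the pdf of $M+T$ is composed of $k$ non-overlapping components. Using the definition of the differential entropy, we have

$$\begin{aligned}
\mathsf{h}\left[M+T\right]&\overset{(a)}{=}-\sum_{i}\int\mathsf{Pr}\left(M=m_{i}\right)P_{T}\left(x-m_{i}\right)\log\left(\mathsf{Pr}\left(M=m_{i}\right)P_{T}\left(x-m_{i}\right)\right)dx\\
&=-\sum_{i}\mathsf{Pr}\left(M=m_{i}\right)\log\mathsf{Pr}\left(M=m_{i}\right)\int P_{T}\left(x-m_{i}\right)dx-\sum_{i}\mathsf{Pr}\left(M=m_{i}\right)\int P_{T}\left(x-m_{i}\right)\log P_{T}\left(x-m_{i}\right)dx\\
&\overset{(b)}{=}-\sum_{i}\mathsf{Pr}\left(M=m_{i}\right)\log\mathsf{Pr}\left(M=m_{i}\right)-\int P_{T}\left(x\right)\log P_{T}\left(x\right)dx\\
&=\mathsf{H}\left[M\right]+\mathsf{h}\left[T\right],
\end{aligned}$$

where $(a)$ follows from the fact that the components of the pdf of $M+T$ are non-overlapping and $(b)$ from the fact that the differential entropy is shift-invariant.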

III-B Proof of Lemma 3

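Using the entropy power inequality for continuous rvs [1], we have

$$\begin{aligned}
1&\geq\frac{\mathsf{N}_{\rm c}\left[X+W_{1}\right]+\mathsf{N}_{\rm c}\left[Y+W_{2}\right]}{\mathsf{N}_{\rm c}\left[X+W_{1}+Y+W_{2}\right]}\\
&=\frac{\frac{1}{2\pi{\rm e}}{\rm e}^{2\mathsf{h}\left[X+W_{1}\right]}+\frac{1}{2\pi{\rm e}}{\rm e}^{2\mathsf{h}\left[Y+W_{2}\right]}}{\frac{1}{2\pi{\rm e}}{\rm e}^{2\mathsf{h}\left[X+W_{1}+Y+W_{2}\right]}}\\
&\overset{(a)}{=}\frac{\frac{1}{2\pi{\rm e}}{\rm e}^{2\mathsf{H}\left[X\right]}{\rm e}^{2\mathsf{h}\left[W_{1}\right]}+\frac{1}{2\pi{\rm e}}{\rm e}^{2\mathsf{H}\left[Y\right]}{\rm e}^{2\mathsf{h}\left[W_{2}\right]}}{\frac{1}{2\pi{\rm e}}{\rm e}^{2\mathsf{H}\left[X+Y\right]}{\rm e}^{2\mathsf{h}\left[W_{1}+W_{2}\right]}}\\
&\overset{(b)}{=}\frac{{\rm e}^{2\mathsf{h}\left[W_{1}\right]}}{{\rm e}^{2\mathsf{h}\left[W_{1}+W_{2}\right]}}\cdot\frac{\mathsf{N}_{\rm d}\left[X\right]+\mathsf{N}_{\rm d}\left[Y\right]}{\mathsf{N}_{\rm d}\left[X+Y\right]},
\end{aligned}$$

where $(a)$ follows from the equalities for $\mathsf{h}\left[X+W_{1}\right]$ and $\mathsf{h}\left[Y+W_{2}\right]$ in Subsection II-A together with (5), and $(b)$ follows from the fact that $W_{1}$ and $W_{2}$ are identically distributed. Hence, we have

$$\frac{\mathsf{N}_{\rm d}\left[X+Y\right]}{\mathsf{N}_{\rm d}\left[X\right]+\mathsf{N}_{\rm d}\left[Y\right]}\geq\frac{{\rm e}^{2\mathsf{h}\left[W_{1}\right]}}{{\rm e}^{2\mathsf{h}\left[W_{1}+W_{2}\right]}}.\tag{6}$$

Inequality (6) holds for any pdf in $\Lambda$, i.e., any pdf defined on $\left(-\frac{\alpha_{z}}{4},\frac{\alpha_{z}}{4}\right)$ with a finite differential entropy. Thus, we have

$$\frac{\mathsf{N}_{\rm d}\left[X+Y\right]}{\mathsf{N}_{\rm d}\left[X\right]+\mathsf{N}_{\rm d}\left[Y\right]}\geq\sup_{p\left(x\right)\in\Lambda}\frac{{\rm e}^{2\mathsf{h}\left[W_{1}\right]}}{{\rm e}^{2\mathsf{h}\left[W_{1}+W_{2}\right]}}.$$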

III-C Proof of Lemma 4

Using the entropy power inequality for continuous rvs, we have

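$$\frac{{\rm e}^{2\mathsf{h}\left[W_{1}\right]}}{{\rm e}^{2\mathsf{h}\left[W_{1}+W_{2}\right]}}\leq\frac{1}{2}$$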

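for all independent and identically distributed rvs $W_{1}$ and $W_{2}$ with the common pdf in $\Lambda$. Thus, we have

$$\sup_{p\left(x\right)\in\Lambda}\frac{{\rm e}^{2\mathsf{h}\left[W_{1}\right]}}{{\rm e}^{2\mathsf{h}\left[W_{1}+W_{2}\right]}}\leq\frac{1}{2}.$$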
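The upper bound of $\frac{1}{2}$ is not attained by every perturbation. As an illustration, the sketch below assumes a uniform perturbation on $\left(-a,a\right)$, a choice that is not used in the paper, and evaluates the ratio ${\rm e}^{2\mathsf{h}\left[W_{1}\right]}/{\rm e}^{2\mathsf{h}\left[W_{1}+W_{2}\right]}$ by numerical integration. The sum of two such uniforms has a triangular density on $\left(-2a,2a\right)$. All function names are hypothetical.

```python
import math


def differential_entropy(pdf, lo, hi, n=200_000):
    """Midpoint-rule estimate of -integral of f log f over (lo, hi), in nats."""
    dx = (hi - lo) / n
    total = 0.0
    for k in range(n):
        f = pdf(lo + (k + 0.5) * dx)
        if f > 0.0:
            total -= f * math.log(f) * dx
    return total


def uniform_pdf(a):
    """pdf of W_1 ~ Uniform(-a, a)."""
    return lambda x: 1.0 / (2.0 * a) if abs(x) < a else 0.0


def sum_of_uniforms_pdf(a):
    """pdf of W_1 + W_2 for iid Uniform(-a, a): triangular on (-2a, 2a)."""
    return lambda x: (2.0 * a - abs(x)) / (4.0 * a * a) if abs(x) < 2.0 * a else 0.0


def perturbation_ratio(a):
    """exp(2 h[W_1]) / exp(2 h[W_1 + W_2]) for the uniform perturbation."""
    h1 = differential_entropy(uniform_pdf(a), -a, a)
    h12 = differential_entropy(sum_of_uniforms_pdf(a), -2.0 * a, 2.0 * a)
    return math.exp(2.0 * (h1 - h12))


ratio = perturbation_ratio(0.25)
print(f"uniform perturbation: ratio = {ratio:.4f}, 1/e = {math.exp(-1.0):.4f}")
```

The ratio equals $1/{\rm e}$ regardless of $a$, strictly below $\frac{1}{2}$, so a uniform perturbation would only give the weaker constant ${\rm e}$ in place of $2$ in Theorem 1. This motivates looking for perturbations that come closer to equality in the continuous entropy power inequality.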

To show the other direction, let $N\left(0,\sigma^{2}\right)$ denote the pdf of a Gaussian rv with zero mean and variance $\sigma^{2}$. Let $p_{\sigma}\left(x\right)$ denote the pdf obtained by truncating $N\left(0,\sigma^{2}\right)$ outside $\left(-\frac{\alpha_{z}}{4},\frac{\alpha_{z}}{4}\right)$, i.e.,

$$p_{\sigma}\left(x\right)=\begin{cases}\frac{K\left(\sigma\right)}{\sqrt{2\pi}\sigma}{\rm e}^{-\frac{x^{2}}{2\sigma^{2}}}&x\in\left(-\frac{\alpha_{z}}{4},\frac{\alpha_{z}}{4}\right)\\ 0&\text{o.w.},\end{cases}$$

where $K\left(\sigma\right)=\left(\int_{-\frac{\alpha_{z}}{4}}^{\frac{\alpha_{z}}{4}}\frac{1}{\sqrt{2\pi}\sigma}{\rm e}^{-\frac{x^{2}}{2\sigma^{2}}}dx\right)^{-1}$ is the normalizing factor. Let $W_{1}^{\sigma}$ and $W_{2}^{\sigma}$ be two independent rvs distributed according to $p_{\sigma}\left(x\right)$. Then, we have

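$$\begin{aligned}
\sup_{p\left(x\right)\in\Lambda}\frac{{\rm e}^{2\mathsf{h}\left[W_{1}\right]}}{{\rm e}^{2\mathsf{h}\left[W_{1}+W_{2}\right]}}&\overset{(a)}{\geq}\frac{{\rm e}^{2\mathsf{h}\left[W_{1}^{\sigma}\right]}}{{\rm e}^{2\mathsf{h}\left[W_{1}^{\sigma}+W_{2}^{\sigma}\right]}}\\
&\overset{(b)}{\geq}\frac{{\rm e}^{2\mathsf{h}\left[W_{1}^{\sigma}\right]}}{{\rm e}^{2\times\frac{1}{2}\log\left(2\pi{\rm e}\,\mathsf{E}\left[\left(W_{1}^{\sigma}+W_{2}^{\sigma}\right)^{2}\right]\right)}}\\
&\geq\frac{{\rm e}^{2\mathsf{h}\left[W_{1}^{\sigma}\right]}}{2\pi{\rm e}\,\mathsf{E}\left[\left(W_{1}^{\sigma}+W_{2}^{\sigma}\right)^{2}\right]},
\end{aligned}$$

where $(a)$ follows from the fact that $p_{\sigma}\left(x\right)$ belongs to $\Lambda$ and $(b)$ follows from the entropy maximizing property of Gaussian distributions. Since $W_{1}^{\sigma}$ and $W_{2}^{\sigma}$ are independent and have zero mean, the variance of $W_{1}^{\sigma}+W_{2}^{\sigma}$ can be upper bounded as

$$\begin{aligned}
\mathsf{E}\left[\left(W_{1}^{\sigma}+W_{2}^{\sigma}\right)^{2}\right]&=2\int_{-\frac{\alpha_{z}}{4}}^{\frac{\alpha_{z}}{4}}x^{2}\frac{K\left(\sigma\right)}{\sqrt{2\pi}\sigma}{\rm e}^{-\frac{x^{2}}{2\sigma^{2}}}dx\\
&\leq 2K\left(\sigma\right)\int_{-\infty}^{\infty}\frac{x^{2}}{\sqrt{2\pi}\sigma}{\rm e}^{-\frac{x^{2}}{2\sigma^{2}}}dx\\
&=2K\left(\sigma\right)\sigma^{2}.
\end{aligned}$$

Moreover, the differential entropy of ${W_{1}^{\sigma}}$ can be written as

$$\begin{aligned}
\mathsf{h}\left[W_{1}^{\sigma}\right]&=-\log K\left(\sigma\right)-K\left(\sigma\right)\int_{-\frac{\alpha_{z}}{4}}^{\frac{\alpha_{z}}{4}}\frac{1}{\sqrt{2\pi}\sigma}{\rm e}^{-\frac{x^{2}}{2\sigma^{2}}}\log\left(\frac{1}{\sqrt{2\pi}\sigma}{\rm e}^{-\frac{x^{2}}{2\sigma^{2}}}\right)dx\\
&=-\log K\left(\sigma\right)-K\left(\sigma\right)\left[\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sigma}{\rm e}^{-\frac{x^{2}}{2\sigma^{2}}}\log\left(\frac{1}{\sqrt{2\pi}\sigma}{\rm e}^{-\frac{x^{2}}{2\sigma^{2}}}\right)dx-2\int_{\frac{\alpha_{z}}{4}}^{\infty}\frac{1}{\sqrt{2\pi}\sigma}{\rm e}^{-\frac{x^{2}}{2\sigma^{2}}}\log\left(\frac{1}{\sqrt{2\pi}\sigma}{\rm e}^{-\frac{x^{2}}{2\sigma^{2}}}\right)dx\right]\\
&=-\log K\left(\sigma\right)-K\left(\sigma\right)\left[-\frac{1}{2}\log\left(2\pi{\rm e}\sigma^{2}\right)-2\int_{\frac{\alpha_{z}}{4\sigma}}^{\infty}\frac{1}{\sqrt{2\pi}}{\rm e}^{-\frac{x^{2}}{2}}\log\left(\frac{1}{\sqrt{2\pi}\sigma}{\rm e}^{-\frac{x^{2}}{2}}\right)dx\right]\\
&=-\log K\left(\sigma\right)-K\left(\sigma\right)\left[-\frac{1}{2}\log\left(2\pi{\rm e}\sigma^{2}\right)+2\underbrace{\log\left(\sqrt{2\pi}\sigma\right)\int_{\frac{\alpha_{z}}{4\sigma}}^{\infty}\frac{1}{\sqrt{2\pi}}{\rm e}^{-\frac{x^{2}}{2}}dx}_{\eta\left(\sigma\right)}+\underbrace{\int_{\frac{\alpha_{z}}{4\sigma}}^{\infty}\frac{x^{2}}{\sqrt{2\pi}}{\rm e}^{-\frac{x^{2}}{2}}dx}_{\Phi\left(\sigma\right)}\right]\\
&=-\log K\left(\sigma\right)-K\left(\sigma\right)\left[-\frac{1}{2}\log\left(2\pi{\rm e}\sigma^{2}\right)+2\eta\left(\sigma\right)+\Phi\left(\sigma\right)\right].
\end{aligned}$$

Using the variance bound and the entropy expression above, we have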

[TABLE]

Note that $\lim_{\sigma\downarrow 0}K\left(\sigma\right)=1$ and $\lim_{\sigma\downarrow 0}\Phi\left(\sigma\right)=0$. The term $\left|\eta\left(\sigma\right)\right|$ can be upper bounded as

[TABLE]

where $(a)$ follows from the fact that $\int_{x}^{\infty}\frac{1}{\sqrt{2\pi}}{\rm e}^{-\frac{1}{2}t^{2}}dt\leq{\rm e}^{-\frac{x^{2}}{2}}$ for $x>0$ [10]. Thus, we have $\lim_{\sigma\downarrow 0}\eta\left(\sigma\right)=0$. The term $K\left(\sigma\right)-1$ can be written as

[TABLE]

which implies that $\lim_{\sigma\downarrow 0}\log\left(2\pi{\rm e}\sigma^{2}\right)\left[K\left(\sigma\right)-1\right]=0$. Thus, we have $\lim_{\sigma\downarrow 0}F\left(\sigma\right)=1$.

For a given $\epsilon>0$, we can find $\sigma_{0}$ small enough such that $F\left(\sigma_{0}\right)\geq 1-\epsilon$ and $p_{\sigma_{0}}\left(x\right)\in\Lambda$. Thus, we have

[TABLE]

for $\sigma_{0}$ sufficiently small. The desired result follows from the fact that $\epsilon>0$ is arbitrary.
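The limiting behavior used in this proof can be observed numerically. The sketch below evaluates the lower bound ${\rm e}^{2\mathsf{h}\left[W_{1}^{\sigma}\right]}/\left(2\pi{\rm e}\cdot 2K\left(\sigma\right)\sigma^{2}\right)$, obtained by combining step $(b)$ with the variance bound, for a truncated Gaussian on $\left(-\frac{\alpha_{z}}{4},\frac{\alpha_{z}}{4}\right)$. The value $\alpha_{z}=1$, the grid size, and the function name are assumptions made for illustration.

```python
import math


def truncated_gaussian_bound(sigma, alpha_z=1.0, n=200_000):
    """exp(2 h[W^sigma]) / (2 pi e * 2 K(sigma) sigma^2) for the truncated Gaussian."""
    c = alpha_z / 4.0
    # Gaussian mass inside (-c, c); K(sigma) is its reciprocal.
    mass = math.erf(c / (sigma * math.sqrt(2.0)))
    k_sigma = 1.0 / mass
    # Midpoint-rule estimate of the differential entropy of p_sigma, in nats.
    dx = 2.0 * c / n
    norm = math.sqrt(2.0 * math.pi) * sigma
    h = 0.0
    for i in range(n):
        x = -c + (i + 0.5) * dx
        f = k_sigma * math.exp(-x * x / (2.0 * sigma * sigma)) / norm
        if f > 0.0:
            h -= f * math.log(f) * dx
    return math.exp(2.0 * h) / (2.0 * math.pi * math.e * 2.0 * k_sigma * sigma ** 2)


for sigma in (0.25, 0.1, 0.05, 0.01):
    print(f"sigma = {sigma}: lower bound = {truncated_gaussian_bound(sigma):.6f}")
```

As $\sigma$ decreases, the truncation removes less and less of the Gaussian mass and the bound approaches $\frac{1}{2}$, in line with Lemma 4.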

Bibliography

[1] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, July 1948.
[2] A. Stam, “Some inequalities satisfied by the quantities of information of Fisher and Shannon,” Information and Control, vol. 2, no. 2, pp. 101–112, 1959.
[3] N. Blachman, “The convolution inequality for entropy powers,” IEEE Transactions on Information Theory, vol. 11, no. 2, pp. 267–271, April 1965.
[4] P. Harremoës and C. Vignat, “An entropy power inequality for the binomial family,” Journal of Inequalities in Pure & Applied Mathematics, vol. 4, no. 5, pp. 1–6, 2003.
[5] N. Sharma, S. Das, and S. Muthukrishnan, “Entropy power inequality for a family of discrete random variables,” in IEEE International Symposium on Information Theory Proceedings, July 2011, pp. 1945–1949.
[6] J. O. Woo and M. Madiman, “A discrete entropy power inequality for uniform distributions,” in IEEE International Symposium on Information Theory, June 2015, pp. 1625–1629.
[7] O. Johnson and Y. Yu, “Monotonicity, thinning, and discrete versions of the entropy power inequality,” IEEE Transactions on Information Theory, vol. 56, no. 11, pp. 5387–5395, Nov. 2010.
[8] S. Haghighatshoar, E. Abbe, and I. E. Telatar, “A new entropy power inequality for integer-valued random variables,” IEEE Transactions on Information Theory, vol. 60, no. 7, pp. 3787–3796, July 2014.