Remarks on the Rényi Entropy of a sum of IID random variables

Benjamin Jaye; Galyna V. Livshyts; Grigoris Paouris; Peter Pivovarov

arXiv:1904.08038 · cs.IT · December 12, 2019

TL;DR

This paper investigates a conjecture regarding the Rényi entropy of sums of IID variables, revealing that the generalized Gaussian distribution does not minimize the entropy as previously conjectured.

Contribution

The study disproves a conjecture by showing that the generalized Gaussian is not the entropy minimizer for sums of independent variables.

Findings

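1. The generalized Gaussian does not minimize the Rényi entropy of a sum of IID random variables; this is shown in the case $d=1$, $p=2$, $n=2$.

2. This disproves a conjecture of Madiman and Wang.

3. The argument is a variational analysis of an equivalent constrained maximization problem.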

Abstract

In this note we study a conjecture of Madiman and Wang which predicted that the generalized Gaussian distribution minimizes the Rényi entropy of the sum of independent random variables. Through a variational analysis, we show that the generalized Gaussian fails to be a minimizer for the problem.

Equations

h_{p}(X)=-\frac{1}{p-1}\log\int_{\mathbb{R}^{d}}f(x)^{p}\,d\mu_{d}(x)=-\frac{1}{p-1}\log\|f\|_{p}^{p},

h(X)=-\int_{\mathbb{R}^{d}}f(x)\log f(x)\,d\mu_{d}(x)

N(X_{1}+X_{2})\geq N(Z_{1}+Z_{2}),

G_{\beta,p}(x)=\alpha\,(1-\beta|x|^{2})_{+}^{1/(p-1)},

h_{p}(X_{1}+\dots+X_{n})\geq h_{p}(Z_{1}+\dots+Z_{n}).

\mathcal{F}=\bigl\{f\in L^{1}(\mathbb{R}^{d})\cap L^{p}(\mathbb{R}^{d}),\,f\geq 0,\,\|f\|_{p}^{p}=M,\,\|f\|_{1}=1\bigr\}

\text{Maximize }\mathcal{I}(f)\stackrel{\operatorname{def}}{=}\int_{\mathbb{R}^{d}}[\mathcal{C}_{n}(f)(x)]^{p}\,d\mu_{d}(x)\ \text{ subject to }f\in\mathcal{F}.

\Lambda=\Lambda(p,M)=\sup\{\mathcal{I}(f):f\in\mathcal{F}\}.

\widetilde{f}=\frac{1}{\lambda^{d}\|f\|_{1}}f\Bigl(\frac{\cdot}{\lambda}\Bigr),\text{ with }\lambda=\Bigl(\frac{\|f\|_{p}^{p}}{M\|f\|_{1}^{p}}\Bigr)^{\frac{1}{d(p-1)}},

\mathcal{I}(\widetilde{f})=\frac{M}{\|f\|_{p}^{p}}\frac{1}{\|f\|_{1}^{p(n-1)}}\mathcal{I}(f).

\|\widetilde{f}\|_{r}^{r}=\frac{1}{\lambda^{d(r-1)}\|f\|_{1}^{r}}\|f\|_{r}^{r}.

\mathcal{C}_{n}(\widetilde{f})(x)=\frac{1}{\|f\|_{1}^{n}\lambda^{d}}\mathcal{C}_{n}(f)\bigl(\frac{x}{\lambda}\bigr)\text{ for any }x\in\mathbb{R}^{d}.

\mathcal{I}(\widetilde{f})=\frac{1}{\lambda^{d(p-1)}\|f\|_{1}^{pn}}\mathcal{I}(f),

\lim_{j\to\infty}\int_{I}v_{j}(s)\,d\mu_{1}(s)=\int_{I}v(s)\,d\mu_{1}(s).

v(r)=\lim_{k\to\infty}\frac{1}{2^{-k}}\int_{I_{k}}v(s)\,d\mu_{1}(s)=\lim_{k\to\infty}\lim_{j\to\infty}\frac{1}{2^{-k}}\int_{I_{k}}v_{j}(s)\,d\mu_{1}(s)

v(r)\geq\limsup_{j\to\infty}v_{j}(r).

v(r)\leq\liminf_{j\to\infty}v_{j}(r).

\bigcup_{j}\Bigl\{|f_{j}|\geq\frac{\delta}{2}\Bigr\}\cup\Bigl\{|f|\geq\frac{\delta}{2}\Bigr\}\subset B,

\int_{\mathbb{R}^{d}\backslash B}|f_{j}(x)-f(x)|^{q}\,d\mu_{d}(x)\leq\delta^{q-1}\bigl(\|f_{j}\|_{1}+\|f\|_{1}\bigr)\leq 2\delta^{q-1}<\frac{\varepsilon}{3}

\int_{B\cap\{|f_{j}-f|<\varkappa\}}|f_{j}(x)-f(x)|^{q}\,d\mu_{d}(x)\leq\mu_{d}(B)\varkappa^{q}<\frac{\varepsilon}{3}

\int_{B\cap\{|f_{j}-f|\geq\varkappa\}}|f_{j}(x)-f(x)|^{q}\,d\mu_{d}(x)\leq\mu_{d}(B\cap\{|f_{j}-f|\geq\varkappa\})^{1-q/p}\|f_{j}-f\|_{p}^{q}\leq 2^{q}M^{q/p}\mu_{d}(B\cap\{|f_{j}-f|\geq\varkappa\})^{1-q/p},

\int_{B\cap\{|f_{j}-f|\geq\varkappa\}}|f_{j}(x)-f(x)|^{q}\,d\mu_{d}(x)<\frac{\varepsilon}{3}\text{ for all }j\geq N.

\Bigl(\int_{\mathbb{R}^{d}}|g_{1}*g_{2}*\cdots*g_{n}(x)|^{p}\,d\mu_{d}(x)\Bigr)^{1/p}\leq\prod_{j=1}^{n}\|g_{j}\|_{(np^{\prime})^{\prime}},

|\mathcal{I}(g_{1})^{1/p}-\mathcal{I}(g_{2})^{1/p}|\leq\Bigl(\int_{\mathbb{R}^{d}}|\mathcal{C}_{n}(g_{1})(x)-\mathcal{C}_{n}(g_{2})(x)|^{p}\,d\mu_{d}(x)\Bigr)^{1/p},

\mathcal{C}_{n}(g_{1})-\mathcal{C}_{n}(g_{2})=\sum_{k=0}^{n-1}\mathcal{C}_{k}(g_{1})*(g_{1}-g_{2})*\mathcal{C}_{n-k-1}(g_{2}),

|\mathcal{I}(g_{1})^{1/p}-\mathcal{I}(g_{2})^{1/p}|\leq\sum_{k=0}^{n-1}\Bigl(\int_{\mathbb{R}^{d}}|\mathcal{C}_{k}(g_{1})*(g_{1}-g_{2})*\mathcal{C}_{n-k-1}(g_{2})(x)|^{p}\,d\mu_{d}(x)\Bigr)^{1/p}.

|\mathcal{I}(g_{1})^{1/p}-\mathcal{I}(g_{2})^{1/p}|\leq\sum_{k=0}^{n-1}\|g_{1}\|_{(np^{\prime})^{\prime}}^{k}\|g_{2}\|_{(np^{\prime})^{\prime}}^{n-k-1}\|g_{1}-g_{2}\|_{(np^{\prime})^{\prime}}.

|\mathcal{I}(f_{j})^{1/p}-\mathcal{I}(f)^{1/p}|\leq C^{\prime}(n,p,M)\|f_{j}-f\|_{(np^{\prime})^{\prime}}\text{ for every }j.

\widetilde{f}=\frac{1}{\|f\|_{1}\lambda^{d}}f\Bigl(\frac{\cdot}{\lambda}\Bigr),\text{ with }\lambda=\Bigl(\frac{\|f\|_{p}^{p}}{M\|f\|_{1}^{p}}\Bigr)^{\frac{1}{d(p-1)}}.

\int_{\mathbb{R}^{d}}f(x)(\mathcal{T}(g)*h)(x)\,d\mu_{d}(x)=\int_{\mathbb{R}^{d}}(f*g)(x)h(x)\,d\mu_{d}(x).

Full text

Remarks on the Rényi Entropy of a sum of IID random variables

Benjamin Jaye

School of Mathematical Sciences, Clemson University

Galyna V. Livshyts

Department of Mathematics, Georgia Tech

Grigoris Paouris

Department of Mathematics, Texas A&M

Peter Pivovarov

Department of Mathematics, University of Missouri

Abstract.

In this note we study a conjecture of Madiman and Wang [MW] which predicted that the generalized Gaussian distribution minimizes the Rényi entropy of the sum of independent random variables. Through a variational analysis, we show that the generalized Gaussian fails to be a minimizer for the problem.

1. Introduction

For $p>1$ , the $p$ -Rényi [Re] entropy of a (continuous) random vector $X$ in $\mathbb{R}^{d}$ distributed with density $f$ is defined by

$$h_{p}(X)=-\frac{1}{p-1}\log\int_{\mathbb{R}^{d}}f(x)^{p}\,d\mu_{d}(x)=-\frac{1}{p-1}\log\|f\|_{p}^{p},$$

where $\mu_{d}$ denotes the $d$ -dimensional Lebesgue measure. As $p\to 1^{+}$ , $h_{p}(X)$ converges to the usual Shannon entropy

$$h(X)=-\int_{\mathbb{R}^{d}}f(x)\log f(x)\,d\mu_{d}(x)$$

(provided that the density of $X$ is sufficiently regular to justify passage of the limit). For the entropy power $N(X)=\exp(2h(X)/d)$ , the fundamental entropy power inequality (EPI) of Shannon [Sh] asserts that for independent random vectors $X_{1}$ and $X_{2}$ ,

$$N(X_{1}+X_{2})\geq N(Z_{1}+Z_{2}),$$

where $Z_{1}$, $Z_{2}$ are independent Gaussians satisfying $N(X_{i})=N(Z_{i})$, $i=1,2$. A firm connection between the EPI, $p$-Rényi entropy and fundamental results like the Brunn-Minkowski and Young’s convolution inequalities goes back to Dembo, Cover and Thomas [DCT]. See Principe [Pr] for more information about where the Rényi entropy arises; see also Bobkov and Marsiglietti [BM] for a related discussion.
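As a numerical illustration of the definition of $h_{p}$ and of the limit $p\to 1^{+}$, the following minimal sketch evaluates the entropy of a one-dimensional Gaussian on a grid. The function names, the grid, and the value of `sigma` are assumptions made for this example only. The closed form $h_{p}=\tfrac12\log(2\pi\sigma^{2})+\tfrac{\log p}{2(p-1)}$ for a Gaussian is a standard computation, not a formula taken from this note.

```python
import numpy as np

def renyi_entropy(density, grid, p):
    """h_p = -(1/(p-1)) * log(int f^p) for a density sampled on a uniform 1-D grid."""
    dx = grid[1] - grid[0]
    return -np.log(np.sum(density ** p) * dx) / (p - 1)

def shannon_entropy(density, grid):
    """h = -int f log f, skipping grid points where the density vanishes."""
    dx = grid[1] - grid[0]
    positive = density[density > 0]
    return -np.sum(positive * np.log(positive)) * dx

def gaussian_renyi_closed_form(sigma, p):
    """Closed form of h_p for a centred Gaussian with standard deviation sigma (d = 1)."""
    return 0.5 * np.log(2 * np.pi * sigma**2) + np.log(p) / (2 * (p - 1))

sigma = 1.5                                   # illustrative choice
x = np.linspace(-30.0, 30.0, 600001)
gauss = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

h2 = renyi_entropy(gauss, x, 2.0)             # 2-Renyi entropy
h_near_1 = renyi_entropy(gauss, x, 1.0 + 1e-4)  # p slightly above 1
h_shannon = shannon_entropy(gauss, x)

print(h2, gaussian_renyi_closed_form(sigma, 2.0))
print(h_near_1, h_shannon)
```

The second printed pair should agree closely, reflecting the convergence of $h_{p}$ to the Shannon entropy.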

Recently, there has been increasing interest in $p$ -Rényi entropy inequalities. Interestingly, the following basic mathematical question is still open: Over all random variables $X$ with $h_{p}(X)$ some fixed quantity, what are the minimizers of the entropy $h_{p}(X+X^{\prime})$ , where $X^{\prime}$ is an independent copy of $X$ ? We learnt about this question from the papers of Madiman, Melbourne, Xu, and Wang [MW, MMX1], who studied unifying entropy power inequalities for the Rényi entropy, which, in the limit $p\to 1^{+}$ recover the statement that, over all probability distributions with $h(X)$ fixed, $h(X+X^{\prime})$ is minimized if (and only if) $X$ is a Gaussian, see e.g. [DCT].

Several closely related questions have been recently addressed involving the $p$-Rényi entropy power $N_{p}(X)=\exp(\tfrac{2}{d}h_{p}(X))$. Bobkov and Chistyakov [BCh2] show that there is a constant $c>0$, depending on $d$ and $p$, such that $N_{p}(\sum_{j=1}^{n}X_{j})\geq c\sum_{j=1}^{n}N_{p}(X_{j})$ for independent random vectors $X_{1},\dots,X_{n}$. A sharper form of the constant was subsequently found by Ram and Sason [RS]. Bobkov and Marsiglietti [BM1] proved that $N_{p}(X_{1}+X_{2})^{\alpha}\geq N_{p}(X_{1})^{\alpha}+N_{p}(X_{2})^{\alpha}$ for independent random vectors $X_{1},X_{2}$ if $\alpha\geq\frac{p+1}{2}$. There has been considerable further recent success extending the EPI to the Rényi setting [BCh, Li, LiMM, MM, RS, Rioul2].

Following [LYZ, MW, MMX1], for $\beta>0$ , consider the Generalized Gaussian

$$G_{\beta,p}(x)=\alpha\,(1-\beta|x|^{2})_{+}^{1/(p-1)},$$

where $\alpha$ is chosen so that $\int_{\mathbb{R}^{d}}G_{\beta,p}(x)d\mu_{d}(x)=1$ . The generalized Gaussian is the distribution with the smallest second moment with a given Rényi entropy, see work of Lutwak, Yang, and Zhang [LYZ], as well as earlier results of Costa, Hero, and Vignat [CHV]. Madiman and Wang made the following bold conjecture (Conjecture IV.3 in [MW]).
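To make the normalization concrete, here is a minimal sketch that builds $G_{\beta,p}$ in dimension $d=1$ and finds $\alpha$ numerically. The function name, grid, and the value $\beta=4$ are illustrative assumptions. For $p=2$ and $d=1$ an elementary integration gives $\alpha=\tfrac{3}{4}\sqrt{\beta}$, which the code compares against.

```python
import numpy as np

def generalized_gaussian_1d(x, beta, p):
    """Return (G, alpha) with G = alpha * (1 - beta x^2)_+^{1/(p-1)} and int G = 1 on the grid x."""
    profile = np.clip(1.0 - beta * x**2, 0.0, None) ** (1.0 / (p - 1.0))
    dx = x[1] - x[0]
    alpha = 1.0 / (np.sum(profile) * dx)      # numerical normalising constant
    return alpha * profile, alpha

beta = 4.0                                    # support is [-1/2, 1/2]
x = np.linspace(-1.0, 1.0, 400001)
G, alpha = generalized_gaussian_1d(x, beta, p=2.0)

print(alpha, 0.75 * np.sqrt(beta))            # numerical vs. closed form for p = 2
```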

Conjecture 1.1 (The Madiman-Wang Conjecture).

If $X_{j}$ , $j=1,\dots,n$ , are independent random variables with densities $f_{j}$ , and $Z_{j}$ are independent random variables distributed with respect to $G_{\beta_{j},p}$ where $\beta_{j}$ is chosen so that $h_{p}(X_{j})=h_{p}(Z_{j})$ , then

$$h_{p}(X_{1}+\dots+X_{n})\geq h_{p}(Z_{1}+\dots+Z_{n}).$$

This conjecture has been confirmed in the case $p=+\infty$ , see [MMX, MMX2].

In this note we will show that unfortunately this conjecture does not hold in the special case when $d=1$ , $p=2$ , $n=2$ and $X_{1}$ and $X_{2}$ are identically distributed, see Section 4. However, we do suspect that a minimizing distribution is a relatively small perturbation of the generalized Gaussian.

Throughout this note we only consider the case where $X_{1},\dots,X_{n}$ are independent copies of a random variable $X$ with density $f$ . The question of finding the minimizer of $h_{p}(X_{1}+\dots+X_{n})$ with $h_{p}(X)$ fixed can then be rephrased as a constrained maximization problem, which we introduce in Section 2. Subsequently, in Section 3 we take the first variation of this maximization problem. We have not been able to develop a satisfactory theory of the associated Euler-Lagrange equation (3.2), but we show in Section 4 that the generalized Gaussian is not a solution to (3.2), and so fails to be a maximizer of the extremal problem. We conclude the paper with some elementary remarks and speculation.

Acknowledgement. The first named author is supported by NSF DMS-1830128, DMS-1800015 and NSF CAREER DMS-1847301. The second named author is supported by the NSF CAREER DMS-1753260. The third named author is supported by the NSF DMS-1812240. The fourth named author is supported by the NSF DMS-1612936. The work was partially supported by the National Science Foundation under Grant No. DMS-1440140 while the authors were in residence at the Mathematical Sciences Research Institute in Berkeley, California, during the Fall 2017 semester.

The authors are especially grateful to the reviewers for valuable comments and suggestions, which helped improve the paper and clarify the exposition.

2. The constrained maximization problem

Denote by $\mathcal{C}_{n}(f)$ the $(n-1)$ -fold convolution of a given function $f$ with itself, that is, $\mathcal{C}_{n}(f)=f*f*\cdots*f$ , where there are $n$ factors of $f$ (and $n-1$ convolutions). Then $\mathcal{C}_{1}(f)=f$ . It will be convenient to set $\mathcal{C}_{0}(f)=\delta_{0}$ , the Dirac delta measure, so that $g*\mathcal{C}_{0}(f)=g$ for any measurable function $g$ .

Throughout the text, we fix $M>0$ , $n\in\mathbb{N}$ and $p\in(1,\infty)$ . We set

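$$\mathcal{F}=\bigl\{f\in L^{1}(\mathbb{R}^{d})\cap L^{p}(\mathbb{R}^{d}),\,f\geq 0,\,\|f\|_{p}^{p}=M,\,\|f\|_{1}=1\bigr\}$$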

and consider the extremal problem

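$$\text{Maximize }\mathcal{I}(f)\stackrel{\operatorname{def}}{=}\int_{\mathbb{R}^{d}}[\mathcal{C}_{n}(f)(x)]^{p}\,d\mu_{d}(x)\ \text{ subject to }f\in\mathcal{F}.\tag{2.1}$$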

Put

$$\Lambda=\Lambda(p,M)=\sup\{\mathcal{I}(f):f\in\mathcal{F}\}.\tag{2.2}$$
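The link between the entropy question and the functional $\mathcal{I}(f)=\int[\mathcal{C}_{n}(f)]^{p}\,d\mu_{d}$ can be explored numerically. The sketch below takes $d=1$, $p=2$, $n=2$ and $M=1$, where $h_{2}(X+X^{\prime})=-\log\mathcal{I}(f)$, and compares three densities with unit integral and unit squared $L^{2}$-norm: a uniform density, a Gaussian, and a generalized Gaussian. The grid, the helper name `collision_functional`, and the choice of test densities are assumptions made for illustration. The comparison says nothing about the true maximizer.

```python
import numpy as np

dx = 1e-3
x = np.arange(-2.0, 2.0 + dx / 2, dx)

def collision_functional(f):
    """I(f) = int (f*f)^2 for d = 1, p = 2, n = 2, using a Riemann-sum convolution."""
    conv = np.convolve(f, f) * dx
    return np.sum(conv ** 2) * dx

# Three densities with int f = 1 and int f^2 = 1 (up to discretisation error).
uniform = (np.abs(x) <= 0.5).astype(float)
sigma = 1.0 / (2.0 * np.sqrt(np.pi))
gauss = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
a = 0.6                                        # half-width giving unit squared L^2 norm
gen_gauss = (3.0 / (4.0 * a)) * np.clip(1.0 - (x / a) ** 2, 0.0, None)

densities = {"uniform": uniform, "gaussian": gauss, "generalized_gaussian": gen_gauss}
I_values = {name: collision_functional(f) for name, f in densities.items()}
h2_of_sum = {name: -np.log(val) for name, val in I_values.items()}

for name in densities:
    print(name, I_values[name], h2_of_sum[name])
```

Among these three, the generalized Gaussian gives the largest value of $\mathcal{I}$ and hence the smallest $h_{2}(X+X^{\prime})$. This is consistent with it being close to optimal, even though it is not itself a maximizer.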

We begin with a simple scaling lemma, which we will use often in what follows.

Lemma 2.1.

Suppose that $f\in L^{1}(\mathbb{R}^{d})\cap L^{p}(\mathbb{R}^{d})$ is non-negative, and $\|f\|_{1}>0$ . The function

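$$\widetilde{f}=\frac{1}{\lambda^{d}\|f\|_{1}}f\Bigl(\frac{\cdot}{\lambda}\Bigr),\text{ with }\lambda=\Bigl(\frac{\|f\|_{p}^{p}}{M\|f\|_{1}^{p}}\Bigr)^{\frac{1}{d(p-1)}},$$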

belongs to $\mathcal{F}$ , and

$$\mathcal{I}(\widetilde{f})=\frac{M}{\|f\|_{p}^{p}}\frac{1}{\|f\|_{1}^{p(n-1)}}\mathcal{I}(f).$$

Proof.

Observe that, for any $r\in[1,\infty)$ ,

$$\|\widetilde{f}\|_{r}^{r}=\frac{1}{\lambda^{d(r-1)}\|f\|_{1}^{r}}\|f\|_{r}^{r}.$$

Plugging in $r=1$ and $r=p$ (and recalling the definition of $\lambda$ ) we see that $\widetilde{f}\in\mathcal{F}$ . Next, observe that

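$$\mathcal{C}_{n}(\widetilde{f})(x)=\frac{1}{\|f\|_{1}^{n}\lambda^{d}}\mathcal{C}_{n}(f)\bigl(\frac{x}{\lambda}\bigr)\text{ for any }x\in\mathbb{R}^{d}.$$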

Whence,

$$\mathcal{I}(\widetilde{f})=\frac{1}{\lambda^{d(p-1)}\|f\|_{1}^{pn}}\mathcal{I}(f),$$

and the proof is complete by recalling the definition of $\lambda$ . ∎
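The scaling in Lemma 2.1 can be checked numerically. The sketch below works in $d=1$ and represents $\widetilde{f}$ by the same sample values as $f$, divided by $\lambda\|f\|_{1}$, on a grid whose spacing is multiplied by $\lambda$, so the discrete identities hold up to rounding. The test function, the grid, and the values of $n$, $p$, $M$ are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def self_convolve(values, n, dx):
    """C_n(f): convolution of n copies of f, sampled on a uniform grid of spacing dx."""
    out = values
    for _ in range(n - 1):
        out = np.convolve(out, values) * dx
    return out

def functional_I(values, n, p, dx):
    """I(f) = int [C_n(f)]^p, approximated by a Riemann sum."""
    return np.sum(self_convolve(values, n, dx) ** p) * dx

d, n, p, M = 1, 3, 2.5, 0.7                    # illustrative parameters
dx = 0.01
x = np.arange(-10.0, 10.0 + dx / 2, dx)
f = np.exp(-np.abs(x)) * (2.0 + np.cos(3.0 * x))   # arbitrary positive test function

norm1 = np.sum(f) * dx                         # ||f||_1
normp_p = np.sum(f ** p) * dx                  # ||f||_p^p
lam = (normp_p / (M * norm1 ** p)) ** (1.0 / (d * (p - 1)))

# f_tilde(y) = f(y / lam) / (lam^d ||f||_1): same samples, rescaled values and spacing.
f_tilde = f / (lam ** d * norm1)
dx_tilde = lam * dx

norm1_tilde = np.sum(f_tilde) * dx_tilde
normp_p_tilde = np.sum(f_tilde ** p) * dx_tilde
I_tilde = functional_I(f_tilde, n, p, dx_tilde)
I_predicted = M / normp_p / norm1 ** (p * (n - 1)) * functional_I(f, n, p, dx)

print(norm1_tilde, normp_p_tilde, I_tilde, I_predicted)
```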

We next prove that (2.1) has a maximizer. A radial function $f$ on $\mathbb{R}^{d}$ is called decreasing if $f(y)\leq f(x)$ whenever $|y|\geq|x|$ .

Proposition 2.2.

The problem (2.1) has a lower-semicontinuous, radially decreasing, maximizer $Q$ .

Proof.

First observe that for any measurable function $f$ , iterating Riesz’s rearrangement inequality [LL, Theorem 3.7] yields $\mathcal{I}(f)\leq\mathcal{I}(f^{*})$ , where $f^{*}$ is the symmetric rearrangement of $f$ ; see [B, Section 3.4] for related multiple convolution rearrangement inequalities and their equality cases. Also, notice that if $f\in\mathcal{F}$ , then $f^{*}\in\mathcal{F}$ .

Take non-negative functions $f_{j}\in\mathcal{F}$ such that $\Lambda=\lim_{j\to\infty}\mathcal{I}(f_{j})$ (recall $\Lambda$ from (2.2)). By replacing $f_{j}$ with its symmetric rearrangement, we may assume that the $f_{j}$ are radial and decreasing. Passing to a subsequence if necessary, we may in addition assume that $f_{j}\to f$ weakly in $L^{p}(\mathbb{R}^{d})$. Consequently, $f$ is radial, decreasing, $f\geq 0$, and $\|f\|_{p}^{p}\leq M$. (To see this, observe that the set of radial decreasing nonnegative functions with norm at most $M^{1/p}$ is a closed convex set in $L^{p}(\mathbb{R}^{d})$, so by Mazur’s Lemma, see e.g. [LL, Theorem 2.13], this set is weakly closed.) By modifying $f$ on a set of measure zero if necessary, we may assume that $f$ is lower semi-continuous. (Footnote 1: If $f$ is discontinuous at $x\in\mathbb{R}^{d}$, then define $f(x)=\sup_{|y|>|x|}f(y)$, i.e. the one-sided radial limit from the right. Then $\{f>\lambda\}$ is open for every $\lambda>0$.)

Claim 2.3.

As $j\to\infty$ , $f_{j}\to f$ $\mu_{d}$ -almost everywhere.

Proof.

For $r>0$ , define $v_{j}(r)=f_{j}(x)$ and $v(r)=f(x)$ whenever $|x|=r$ . Then since $f_{j}$ converges weakly to $f$ in $L^{p}(\mathbb{R}^{d})$ , we have that whenever $I$ is a closed interval of finite Lebesgue measure in $(0,\infty)$ ,

$$\lim_{j\to\infty}\int_{I}v_{j}(s)\,d\mu_{1}(s)=\int_{I}v(s)\,d\mu_{1}(s).$$

Insofar as the function $v$ is non-increasing, it has at most countably many points of discontinuity. If $r>0$ is a point of continuity of $v$, and $I_{k}=[r-2^{-k},r]$, then

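$$v(r)=\lim_{k\to\infty}\frac{1}{2^{-k}}\int_{I_{k}}v(s)\,d\mu_{1}(s)=\lim_{k\to\infty}\lim_{j\to\infty}\frac{1}{2^{-k}}\int_{I_{k}}v_{j}(s)\,d\mu_{1}(s)$$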

but since $v_{j}$ is decreasing we have that $v_{j}(s)\geq v_{j}(r)$ for $s\in I_{k}$ . Thus

$$v(r)\geq\limsup_{j\to\infty}v_{j}(r).$$

Arguing similarly with intervals whose left end-point is $r$ , we also have that

$$v(r)\leq\liminf_{j\to\infty}v_{j}(r).$$

Thus $\lim_{j\to\infty}v_{j}=v$ at every point of continuity of $v$ . If $E$ is a countable set in $(0,\infty)$ , then $E\times\mathbb{S}^{d-1}$ is a Lebesgue null set in $\mathbb{R}^{d}$ , so the claim follows.∎

Notice that, as a consequence of this claim, Fatou’s Lemma ensures that $\|f\|_{1}\leq 1$ . Our next claim is

Claim 2.4.

If $1<q<p$ , then $f_{j}\to f$ strongly in $L^{q}(\mathbb{R}^{d})$ as $j\to\infty$ .

The proof of this claim is a variant of the Vitali convergence theorem (see e.g. Theorem 9.1.6 of [Ros]), but observe that it does not necessarily hold if one were to remove the radially decreasing property of the functions $f_{j}$ (just consider a sequence of translates of a fixed function).

Proof.

Fix $\varepsilon>0,\delta>0$ . Insofar as the functions $f_{j}$ and $f$ are radially decreasing,

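$$\bigcup_{j}\Bigl\{|f_{j}|\geq\frac{\delta}{2}\Bigr\}\cup\Bigl\{|f|\geq\frac{\delta}{2}\Bigr\}\subset B,$$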

where $B$ is the closed ball centered at $0$ of radius $\bigl(\frac{2}{\mu_{d}(B(0,1))\delta}\bigr)^{1/d}$. (Otherwise we would have $\|f_{j}\|_{1}>1$ for some $j$, or $\|f\|_{1}>1$.)

On $\mathbb{R}^{d}\backslash B$ , we have $|f_{j}|<\delta/2$ for every $j$ , and $|f|<\delta/2$ , whence

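$$\int_{\mathbb{R}^{d}\backslash B}|f_{j}(x)-f(x)|^{q}\,d\mu_{d}(x)\leq\delta^{q-1}\bigl(\|f_{j}\|_{1}+\|f\|_{1}\bigr)\leq 2\delta^{q-1}<\frac{\varepsilon}{3}$$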

provided $\delta>0$ is chosen sufficiently small.

Now fix $\varkappa>0$ . Observe that,

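$$\int_{B\cap\{|f_{j}-f|<\varkappa\}}|f_{j}(x)-f(x)|^{q}\,d\mu_{d}(x)\leq\mu_{d}(B)\varkappa^{q}<\frac{\varepsilon}{3}$$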

if $\varkappa$ is chosen sufficiently small. On the other hand, since $B$ has finite measure, one can invoke continuity of measure from above, thus we have that $f_{j}\to f$ in measure on $B$ as $j\to\infty$ . From the inequalities

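$$\int_{B\cap\{|f_{j}-f|\geq\varkappa\}}|f_{j}(x)-f(x)|^{q}\,d\mu_{d}(x)\leq\mu_{d}(B\cap\{|f_{j}-f|\geq\varkappa\})^{1-q/p}\|f_{j}-f\|_{p}^{q}\leq 2^{q}M^{q/p}\mu_{d}(B\cap\{|f_{j}-f|\geq\varkappa\})^{1-q/p},$$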

we infer that there exists $N\in\mathbb{N}$ such that

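$$\int_{B\cap\{|f_{j}-f|\geq\varkappa\}}|f_{j}(x)-f(x)|^{q}\,d\mu_{d}(x)<\frac{\varepsilon}{3}\text{ for all }j\geq N.$$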

Bringing these estimates together, it follows that $\|f_{j}-f\|_{q}^{q}<\varepsilon$ for every $j\geq N$ . ∎

Our next goal is to use this claim in order to show that $\mathcal{I}(f)=\Lambda$ . To this end, observe that repeated application of Young’s convolution inequality [LL] yields that, for any $n$ -tuple of functions $g_{1},\dots,g_{n}$ ,

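$$\Bigl(\int_{\mathbb{R}^{d}}|g_{1}*g_{2}*\cdots*g_{n}(x)|^{p}\,d\mu_{d}(x)\Bigr)^{1/p}\leq\prod_{j=1}^{n}\|g_{j}\|_{(np^{\prime})^{\prime}},\tag{2.3}$$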

where $p^{\prime}=p/(p-1)$ is the Hölder conjugate of $p$ , so $(np^{\prime})^{\prime}=\tfrac{np}{np-p+1}$ . Since $n>1$ , $(np^{\prime})^{\prime}\in(1,p)$ .
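The exponent arithmetic above is easy to confirm with exact rational numbers. The following small sketch uses hypothetical helper names and a few sample values of $n$ and $p$ to check the formula $(np^{\prime})^{\prime}=\tfrac{np}{np-p+1}$ and the inclusion $(np^{\prime})^{\prime}\in(1,p)$ for $n>1$.

```python
from fractions import Fraction

def holder_conjugate(q):
    """Hoelder conjugate q' = q / (q - 1) of an exponent q > 1."""
    return q / (q - 1)

def young_exponent(n, p):
    """The exponent (n p')' appearing in the n-fold Young inequality."""
    return holder_conjugate(n * holder_conjugate(p))

checked = {}
for n in (2, 3, 7):
    for p in (Fraction(3, 2), Fraction(2), Fraction(5)):
        r = young_exponent(n, p)
        # r agrees with n p / (n p - p + 1) and lies strictly between 1 and p.
        checked[(n, p)] = (r == n * p / (n * p - p + 1)) and (1 < r < p)

print(young_exponent(2, Fraction(2)))   # exponent for n = 2, p = 2
print(all(checked.values()))
```

For $n=2$, $p=2$ the exponent is $4/3$, which matches the classical Young inequality $\|g_{1}*g_{2}\|_{2}\leq\|g_{1}\|_{4/3}\|g_{2}\|_{4/3}$.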

To apply this inequality, first use Minkowski’s inequality to observe that,

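$$|\mathcal{I}(g_{1})^{1/p}-\mathcal{I}(g_{2})^{1/p}|\leq\Bigl(\int_{\mathbb{R}^{d}}|\mathcal{C}_{n}(g_{1})(x)-\mathcal{C}_{n}(g_{2})(x)|^{p}\,d\mu_{d}(x)\Bigr)^{1/p},$$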

but,

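$$\mathcal{C}_{n}(g_{1})-\mathcal{C}_{n}(g_{2})=\sum_{k=0}^{n-1}\mathcal{C}_{k}(g_{1})*(g_{1}-g_{2})*\mathcal{C}_{n-k-1}(g_{2}),$$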

and hence

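$$|\mathcal{I}(g_{1})^{1/p}-\mathcal{I}(g_{2})^{1/p}|\leq\sum_{k=0}^{n-1}\Bigl(\int_{\mathbb{R}^{d}}|\mathcal{C}_{k}(g_{1})*(g_{1}-g_{2})*\mathcal{C}_{n-k-1}(g_{2})(x)|^{p}\,d\mu_{d}(x)\Bigr)^{1/p}.$$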

Appealing to (2.3) now yields,

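$$|\mathcal{I}(g_{1})^{1/p}-\mathcal{I}(g_{2})^{1/p}|\leq\sum_{k=0}^{n-1}\|g_{1}\|_{(np^{\prime})^{\prime}}^{k}\|g_{2}\|_{(np^{\prime})^{\prime}}^{n-k-1}\|g_{1}-g_{2}\|_{(np^{\prime})^{\prime}}.$$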

Returning to our sequence $f_{j}$ , it is a consequence of Hölder’s inequality that $\|f_{j}\|_{(np^{\prime})^{\prime}}\leq\|f_{j}\|_{1}^{\theta}\|f_{j}\|_{p}^{1-\theta}$ with some $\theta\in(0,1)$ depending on $n$ and $p$ , so $\|f_{j}\|_{(np^{\prime})^{\prime}}\leq C(M,n,p)$ (and the same inequality holds with $f_{j}$ replaced by $f$ ). Whence there is a constant $C^{\prime}(n,p,M)$ such that

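$$|\mathcal{I}(f_{j})^{1/p}-\mathcal{I}(f)^{1/p}|\leq C^{\prime}(n,p,M)\|f_{j}-f\|_{(np^{\prime})^{\prime}}\text{ for every }j.$$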

Since $(np^{\prime})^{\prime}\in(1,p)$ , Claim 2.4 yields that $f_{j}\to f$ in $L^{(np^{\prime})^{\prime}}$ as $j\to\infty$ . Hence $\mathcal{I}(f)=\lim_{j\to\infty}\mathcal{I}(f_{j})=\Lambda$ . (It follows that $f$ is not identically zero.)
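For the reader's convenience, the interpolation exponent $\theta$ in the Hölder step can be made explicit. The short computation below is a worked sketch and is not taken from the original text. It solves the interpolation identity for $\theta$ and uses $\|f_{j}\|_{1}=1$, $\|f_{j}\|_{p}^{p}=M$.

```latex
\frac{1}{(np')'} = \frac{np-p+1}{np} = \frac{\theta}{1} + \frac{1-\theta}{p}
\quad\Longrightarrow\quad
\theta = \frac{n-1}{n},
\qquad
\|f_{j}\|_{(np')'} \leq \|f_{j}\|_{1}^{(n-1)/n}\,\|f_{j}\|_{p}^{1/n} = M^{1/(np)} .
```

The same bound holds for $f$, since $\|f\|_{1}\leq 1$ and $\|f\|_{p}^{p}\leq M$.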

It remains to show that $f\in\mathcal{F}$ . To this end, we apply Lemma 2.1: Consider the function

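$$\widetilde{f}=\frac{1}{\|f\|_{1}\lambda^{d}}f\Bigl(\frac{\cdot}{\lambda}\Bigr),\text{ with }\lambda=\Bigl(\frac{\|f\|_{p}^{p}}{M\|f\|_{1}^{p}}\Bigr)^{\frac{1}{d(p-1)}}.$$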

Then $\widetilde{f}\in\mathcal{F}$ and $\mathcal{I}(\widetilde{f})=\frac{M}{\|f\|_{p}^{p}}\frac{1}{\|f\|_{1}^{p(n-1)}}\Lambda.$ Consequently, if $\|f\|_{p}^{p}<M$ or $\|f\|_{1}<1$ , then $\mathcal{I}(\widetilde{f})>\Lambda$ , which is absurd. Thus $f\in\mathcal{F}$ and the proof of the proposition is complete. ∎

3. The First Variation

With the existence of a maximizer established, we now turn to its analysis.

To introduce the Euler-Lagrange equation associated to (2.1) it will be convenient to define, for a function $f$ , $\mathcal{T}(f)(x)=f(-x)$ . Observe that, if $f,g,h$ are non-negative measurable functions,

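$$\int_{\mathbb{R}^{d}}f(x)(\mathcal{T}(g)*h)(x)\,d\mu_{d}(x)=\int_{\mathbb{R}^{d}}(f*g)(x)h(x)\,d\mu_{d}(x).\tag{3.1}$$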

Proposition 3.1.

A lower-semicontinuous function $Q\in\mathcal{F}$ is a maximizer of the problem (2.1) if and only if

[TABLE]

Remark 3.2.

Observe that if $Q$ is radially decreasing, then $\mathcal{C}_{n-1}(Q)$ is again radially decreasing for any $n\in\mathbb{N}$ , so $\mathcal{T}(\mathcal{C}_{n-1}(Q))=\mathcal{C}_{n-1}(Q)$ in this case.

Proof.

The sufficiency is easy to show. Integrating both sides of (3.2) against $Q$ , and recalling that $Q\in\mathcal{F}$ , yields

[TABLE]

But using Tonelli’s theorem and (3.1), the left hand side is equal to $\int_{\mathbb{R}^{d}}(\mathcal{C}_{n}(Q)(x))^{p}d\mu_{d}(x)=\mathcal{I}(Q)$ .

Conversely, consider a bounded function $\varphi$ compactly supported in the open set $\{Q>0\}$ . Since $Q$ is lower-semicontinuous, $\inf_{\operatorname{supp}(\varphi)}Q>0$ . Therefore, (insofar as $\varphi$ is bounded) there exists a constant $C>0$ such that

[TABLE]

so in particular, there exists $t_{0}>0$ such that for $|t|\leq t_{0}$ it follows that $Q_{t}\stackrel{\operatorname{def}}{=}Q+t\varphi$ is non-negative. In the notation of Lemma 2.1 with $f=Q_{t}$, we consider the function

[TABLE]

with the corresponding $\lambda>0$ satisfying $\|\widetilde{Q}_{t}\|^{p}_{p}=\|Q\|_{p}^{p}=M$. Of course we also have $\int_{\mathbb{R}^{d}}\widetilde{Q}_{t}(x)\,d\mu_{d}(x)=1$ regardless of $\lambda$ for $|t|<t_{0}$. We conclude that $\widetilde{Q}_{t}$ belongs to $\mathcal{F}$, and therefore

[TABLE]

Moreover, as in Lemma 2.1,

[TABLE]

For $|t|<t_{0}$ , we calculate, using commutativity and associativity of the convolution operator,

[TABLE]

and

[TABLE]

Crudely employing the bound (3.3) in (3.6), we infer that there is a constant $C>0$ , depending on $n$ , $p$ and $t_{0},$ such that for all $|t|<t_{0}$ ,

[TABLE]

Whence, the second order Taylor formula yields that

[TABLE]

for $|t|<t_{0}$ . Integrating the pointwise inequality (3.7) yields

[TABLE]

as $t\to 0$ .

Now, recalling the definition of $\lambda$ , we calculate

[TABLE]

where in the expansion of $\|Q+t\varphi\|_{p}^{p}$ we have again used the inequality (3.3) to obtain the $O(t^{2})$ term.

Plugging the preceding expansion of the scaling factor and the expansion (3.8) into (3.5) yields that, as $t\to 0$,

[TABLE]

From (3.4) it follows that $\lim_{t\to 0}\frac{\mathcal{I}(\widetilde{Q}_{t})-\mathcal{I}(Q)}{t}=0$ , so the second term in the prior expansion must vanish, that is,

[TABLE]

where (3.1) has been used. Since $\varphi$ was any bounded function compactly supported in $\{Q>0\}$ , we conclude that (3.2) holds.∎

4. On the Madiman-Wang conjecture

Proposition 4.1.

The generalized Gaussian is not the extremizer for problem (2.1).

Proof.

Consider the simplest case $d=1$ , $p=2$ , and $n=2$ . We shall show that the function $G(x)=\alpha(1-|x|^{2})_{+}$ does not satisfy the equation

[TABLE]

and so no function of the form $\frac{c}{\lambda}G(\frac{\cdot}{\lambda})$, with $c,\lambda>0$, satisfies (3.2), for any value of $\Lambda$ (recall Remark 3.2). In fact, we shall show that $\mathcal{C}_{3}(G)=G*G*G$ is not a quadratic polynomial near $0$.

For this, observe:

[TABLE]

Thus, $(G*G*G)^{\prime\prime\prime\prime\prime\prime}=(G^{\prime\prime}*G^{\prime\prime}*G^{\prime\prime})$ is the threefold convolution of the above measure. The threefold convolution of $-2\chi_{[-1,1]}$ equals $-8(3-|x|^{2})_{+}$ on $[-1,1]$, and no other term in the convolution $G^{\prime\prime}*G^{\prime\prime}*G^{\prime\prime}$ is quadratic in $|x|$. Therefore, $G*G*G$ has non-vanishing sixth derivative at $0$, but $a+bG$ does have vanishing sixth derivative at $0$. ∎
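The key computation in this proof, that the threefold convolution of $\chi_{[-1,1]}$ equals $3-|x|^{2}$ on $[-1,1]$ (so that the threefold convolution of $-2\chi_{[-1,1]}$ is $-8(3-|x|^{2})$ there), can be checked numerically. The sketch below is an illustration only. The grid spacing and the trapezoid-style endpoint weights are assumptions of this example.

```python
import numpy as np

dx = 1e-3
x = np.arange(-1.0, 1.0 + dx / 2, dx)        # grid on [-1, 1]
chi = np.ones_like(x)                         # indicator of [-1, 1]
chi[0] = chi[-1] = 0.5                        # half weights at the endpoints

# Threefold convolution chi * chi * chi, supported on [-3, 3].
c2 = np.convolve(chi, chi) * dx
c3 = np.convolve(c2, chi) * dx
y = -3.0 + dx * np.arange(c3.size)            # grid carrying c3

inner = np.abs(y) <= 1.0
max_err = np.max(np.abs(c3[inner] - (3.0 - y[inner] ** 2)))
print(max_err)                                # small discretisation error

# The same check for -2 * chi, whose threefold convolution picks up the factor (-2)^3 = -8.
max_err_scaled = np.max(np.abs((-2.0) ** 3 * c3[inner] + 8.0 * (3.0 - y[inner] ** 2)))
print(max_err_scaled)
```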

Remark 4.2.

Moreover, for any dimension $d$ , the random vector $X$ in $\mathbb{R}^{d}$ with i.i.d. coordinates $X_{i}$ , each distributed according to the generalized Gaussian density, does not constitute the extremizer for this problem. Indeed, in this case $h_{p}(X)=dh_{p}(X_{i})$ , and it remains to use Proposition 4.1. Therefore, a random vector with i.i.d. coordinates which are generalized Gaussians is not an extremal case for this question.

5. Any radially decreasing solution of (3.2) is compactly supported

In this section, we discuss the following

Proposition 5.1.

Decreasing radial solutions of (3.2) are compactly supported.

Proof.

Suppose that $Q\in\mathcal{F}$ solves (3.2) and $Q$ is not compactly supported. Since $Q$ is non-negative and radially decreasing, its support is $\mathbb{R}^{d}$ .

The term $G=\mathcal{C}_{n-1}(Q)*(\mathcal{C}_{n}(Q))^{p-1}$ on the left hand side of (3.2) belongs to $L^{r}$ , where $r=\max(1,1/(p-1))$ . Indeed, if $p\geq 2$ then $\int_{\mathbb{R}^{d}}G(x)d\mu_{d}(x)=\int_{\mathbb{R}^{d}}[\mathcal{C}_{n}(Q)(x)]^{p-1}d\mu_{d}(x)$ (recall that $Q\geq 0$ with $\int_{\mathbb{R}^{d}}Q(x)d\mu_{d}(x)=1$ ), but $\int_{\mathbb{R}^{d}}\mathcal{C}_{n}(Q)(x)d\mu_{d}(x)=1$ and

[TABLE]

so $\mathcal{C}_{n}(Q)\in L^{p-1}(\mathbb{R}^{d})$ . If $1<p<2$ , then $t\mapsto t^{1/(p-1)}$ is convex, so by Jensen’s inequality, $G^{1/(p-1)}\leq\mathcal{C}_{n-1}(Q)*(\mathcal{C}_{n}(Q)^{p-1})^{1/(p-1)}=\mathcal{C}_{2n-1}(Q)$ , whence $\|G\|_{1/(p-1)}\leq 1$ in this case.

On the other hand, the right hand side of (3.2) belongs to $L^{r}$ only if $\Lambda=0$ , which is absurd, since $\mathcal{F}$ certainly contains non-zero functions. ∎

6. Remarks

In this section we make some remarks that suggest that although the generalized Gaussian is not an optimal distribution for the problem (2.1), a reasonably small perturbation of the generalized Gaussian could well be.

Beginning with $f_{0}(x)=\mathbf{1}_{[-1,1]}$ , consider the following iteration for $j\geq 1$

[TABLE]

Numerically, this iteration converges pointwise to a solution of the equation (4.1) for some $a,b>0$ satisfying the constraints $f(0)=1$ and $f(1)=0$ (so the support of $f$ is $[-1,1]$ ). The resulting function $f$ can then be re-scaled via the transformation $\frac{c}{\lambda}f(\tfrac{\cdot}{\lambda})$ ( $c,\lambda>0$ ) to have any given positive integral and $L^{2}$ -norm. We do not know if the solution of $\mathcal{C}_{3}(f)=af+b$ is unique (modulo natural invariants in the problem), so we cannot say that this function $f$ corresponds to a solution of the constrained maximization problem (2.1).

We provide the graphs of $f_{1},f_{2},f_{3}$ and $f_{4}$ (see Figure 1 below), and the algebraic expressions for $f_{1}$ , $f_{2}$ and $f_{3}$ on $[-1,1]$ .

[TABLE]

Bibliography

[BCh] S. G. Bobkov, G. P. Chistyakov, Bounds for the maximum of the density of the sum of independent random variables, Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI), 408 (Veroyatnost i Statistika. 18):62-73, 324, 2012.
[BCh2] S. G. Bobkov, G. P. Chistyakov, Entropy power inequality for the Rényi entropy, IEEE Trans. Inform. Theory, 61(2):708-714, February 2015.
[BM1] S. Bobkov, A. Marsiglietti, Variants of the entropy power inequality, IEEE Transactions on Information Theory, 63(12):7747-7752, 2017.
[BM] S. Bobkov, A. Marsiglietti, Asymptotic behavior of Rényi entropy in the central limit theorem, submitted, arXiv:1802.10212.
[B] A. Burchard, Cases of equality in the Riesz rearrangement inequality, Thesis (Ph.D.), Georgia Institute of Technology, 1994, 94 pp.
[CHV] J. Costa, A. Hero, and C. Vignat, On solutions to multivariate maximum alpha-entropy problems, Lecture Notes in Computer Science, vol. 2683, EMMCVPR 2003, Lisbon, 7-9 July 2003, pp. 211-228, 2003.
[DCT] A. Dembo, T. M. Cover and J. A. Thomas, Information theoretic inequalities, IEEE Transactions on Information Theory, vol. 37, no. 6, pp. 1501–1518, Nov. 1991.
[Li] J. Li, Rényi entropy power inequality and a reverse, Studia Mathematica, 242 (2018) 303-319.
