Remarks on the Rényi Entropy of a sum of IID random variables
Benjamin Jaye, Galyna V. Livshyts, Grigoris Paouris, Peter Pivovarov

TL;DR
This paper investigates a conjecture regarding the Rényi entropy of sums of IID variables, revealing that the generalized Gaussian distribution does not minimize the entropy as previously conjectured.
Contribution
The study disproves a conjecture by showing that the generalized Gaussian is not the entropy minimizer for sums of independent variables.
Findings
Generalized Gaussian does not minimize Rényi entropy for sums of IID variables.
Disproves a conjecture by Madiman and Wang.
Uses variational analysis to reach conclusions.
Abstract
In this note we study a conjecture of Madiman and Wang which predicted that the generalized Gaussian distribution minimizes the Rényi entropy of the sum of independent random variables. Through a variational analysis, we show that the generalized Gaussian fails to be a minimizer for the problem.
Benjamin Jaye, School of Mathematical Sciences, Clemson University
Galyna V. Livshyts, Department of Mathematics, Georgia Tech
Grigoris Paouris, Department of Mathematics, Texas A&M
Peter Pivovarov, Department of Mathematics, University of Missouri
1. Introduction
For $p \in (1,\infty)$, the $p$-Rényi [Re] entropy of a (continuous) random vector $X$ in $\mathbb{R}^d$ distributed with density $f$ is defined by
$$h_p(X) = \frac{1}{1-p}\log\Bigl(\int_{\mathbb{R}^d} f(x)^p\, d\mu_d(x)\Bigr),$$
where $\mu_d$ denotes the $d$-dimensional Lebesgue measure. As $p \to 1^+$, $h_p(X)$ converges to the usual Shannon entropy
$$h(X) = -\int_{\mathbb{R}^d} f(x)\log f(x)\, d\mu_d(x)$$
(provided that the density of $X$ is sufficiently regular to justify passage of the limit). For the entropy power $N(X) = e^{2h(X)/d}$, the fundamental entropy power inequality (EPI) of Shannon [Sh] asserts that for independent random vectors $X$ and $Y$,
$$N(X+Y) \ge N(X) + N(Y) = N(Z_X) + N(Z_Y),$$
where $Z_X$, $Z_Y$ are independent Gaussians satisfying $h(Z_X) = h(X)$ and $h(Z_Y) = h(Y)$. A firm connection between the EPI, $p$-Rényi entropy and fundamental results like the Brunn-Minkowski and Young's convolution inequalities goes back to Dembo, Cover and Thomas [DCT]. See Principe [Pr] for more information about where the Rényi entropy arises; see also Bobkov, Marsiglietti [BM] for a related discussion.
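As a small numerical illustration of the limit $p \to 1^+$ (a minimal sketch, not part of the original note), the snippet below evaluates the Rényi entropy of a standard one-dimensional Gaussian by a Riemann sum and compares it with the closed form $h_p = \tfrac12\log(2\pi) + \frac{\log p}{2(p-1)}$, which tends to the Shannon entropy $\tfrac12\log(2\pi e)$. The function names and grid parameters are illustrative assumptions.

```python
import numpy as np

def renyi_entropy(density_values, h, p):
    """Riemann-sum approximation of h_p = log(int f^p) / (1 - p) on a uniform grid."""
    return np.log(np.sum(density_values ** p) * h) / (1.0 - p)

def gaussian_closed_form(p):
    """Renyi entropy of a standard Gaussian in dimension one, for p != 1."""
    return 0.5 * np.log(2.0 * np.pi) + np.log(p) / (2.0 * (p - 1.0))

# Uniform grid wide enough that the Gaussian tails are negligible.
x = np.linspace(-12.0, 12.0, 24001)
h = x[1] - x[0]
gauss = np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

shannon = 0.5 * np.log(2.0 * np.pi * np.e)
for p in (2.0, 1.5, 1.1, 1.001):
    # Columns: p, numerical h_p, closed-form h_p, Shannon entropy.
    print(p, renyi_entropy(gauss, h, p), gaussian_closed_form(p), shannon)
```

The printed values decrease in $p$ and approach the Shannon entropy as $p$ approaches $1$.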
Recently, there has been increasing interest in $p$-Rényi entropy inequalities. Interestingly, the following basic mathematical question is still open: Over all random variables $X$ with $h_p(X)$ some fixed quantity, what are the minimizers of the entropy $h_p(X+X')$, where $X'$ is an independent copy of $X$? We learnt about this question from the papers of Madiman, Melbourne, Xu, and Wang [MW, MMX1], who studied unifying entropy power inequalities for the Rényi entropy, which, in the limit $p \to 1^+$, recover the statement that, over all probability distributions with $h(X)$ fixed, $h(X+X')$ is minimized if (and only if) $X$ is a Gaussian, see e.g. [DCT].
Several closely related questions have been recently addressed involving the $p$-Rényi entropy power . Bobkov and Chistyakov [BCh2] show that there is a constant , depending on and , such that for independent random vectors . A sharper form of the constant was subsequently found by Ram and Sason [RS]. Bobkov and Marsiglietti [BM1] proved that for independent random vectors if . There has been considerable further recent success extending the EPI to the Rényi setting [BCh, Li, LiMM, MM, RS, Rioul2].
Following [LYZ, MW, MMX1], for , consider the Generalized Gaussian
[TABLE]
where is chosen so that . The generalized Gaussian is the distribution with the smallest second moment with a given Rényi entropy, see work of Lutwak, Yang, and Zhang [LYZ], as well as earlier results of Costa, Hero, and Vignat [CHV]. Madiman and Wang made the following bold conjecture (Conjecture IV.3 in [MW]).
Conjecture 1.1 (The Madiman-Wang Conjecture).
If , , are independent random variables with densities , and are independent random variables distributed with respect to where is chosen so that , then
[TABLE]
This conjecture has been confirmed in the case , see [MMX, MMX2].
In this note we will show that unfortunately this conjecture does not hold in the special case when $n=2$, $p=2$, and $X_1$ and $X_2$ are identically distributed, see Section 4. However, we do suspect that a minimizing distribution is a relatively small perturbation of the generalized Gaussian.
Throughout this note we only consider the case where $X_1,\dots,X_n$ are independent copies of a random variable $X$ with density $f$. The question of finding the minimizer of $h_p(X_1+\dots+X_n)$ with $h_p(X)$ fixed can then be rephrased as a constrained maximization problem, which we introduce in Section 2. Subsequently, in Section 3 we take the first variation of this maximization problem. We have not been able to develop a satisfactory theory of the associated Euler-Lagrange equation (3.2), but we show in Section 4 that the generalized Gaussian is not a solution to (3.2), and so fails to be a maximizer of the extremal problem. We conclude the paper with some elementary remarks and speculation.
Acknowledgement. The first named author is supported by NSF DMS-1830128, DMS-1800015 and NSF CAREER DMS-1847301. The second named author is supported by the NSF CAREER DMS-1753260. The third named author is supported by the NSF DMS-1812240. The fourth named author is supported by the NSF DMS-1612936. The work was partially supported by the National Science Foundation under Grant No. DMS-1440140 while the authors were in residence at the Mathematical Sciences Research Institute in Berkeley, California, during the Fall 2017 semester.
The authors are especially grateful to the reviewers for valuable comments and suggestions, which helped improve the paper and clarify the exposition.
2. The constrained maximization problem
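Denote by $f^{*n}$ the $(n-1)$-fold convolution of a given function $f$ with itself, that is, $f^{*n} = f * f * \cdots * f$, where there are $n$ factors of $f$ (and $n-1$ convolutions). Then $f^{*1} = f$. It will be convenient to set $f^{*0} = \delta_0$, the Dirac delta measure, so that $g * f^{*0} = g$ for any measurable function $g$.
Throughout the text, we fix $d, n \in \mathbb{N}$, and $p \in (1,\infty)$. We set
$$\mathcal{F} = \bigl\{ f \in L^1(\mathbb{R}^d)\cap L^p(\mathbb{R}^d) : f \ge 0,\ \|f\|_1 = 1,\ \|f\|_p = 1 \bigr\},$$
and consider the extremal problem
$$\sup\Bigl\{ \int_{\mathbb{R}^d} \bigl(f^{*n}\bigr)^p\, d\mu_d : f \in \mathcal{F} \Bigr\}. \tag{2.1}$$
Put
$$\Lambda = \sup_{f\in\mathcal{F}} \int_{\mathbb{R}^d} \bigl(f^{*n}\bigr)^p\, d\mu_d. \tag{2.2}$$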
We begin with a simple scaling lemma, which we will use often in what follows.
Lemma 2.1.
Suppose that is non-negative, and . The function
[TABLE]
belongs to , and
[TABLE]
Proof.
Observe that, for any ,
[TABLE]
Plugging in and (and recalling the definition of ) we see that . Next, observe that
[TABLE]
Whence,
[TABLE]
and the proof is complete by recalling the definition of . ∎
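The scaling argument can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' computation: it works in dimension $d=1$ with $n=3$, $p=2$, uses a hypothetical test function `profile`, and parametrizes the rescaling as $\tilde f(x) = a f(x/b)$ with $a, b > 0$ chosen so that $\|\tilde f\|_1 = \|\tilde f\|_p = 1$ (the lemma's own parametrization may differ). It then verifies the identity $\int (\tilde f^{*n})^p = a^{np}\, b^{(n-1)p+1} \int (f^{*n})^p$, which follows from $(a f(\cdot/b))^{*n} = a^n b^{\,n-1} f^{*n}(\cdot/b)$ in one dimension.

```python
import numpy as np

def integral(values, h):
    """Riemann sum on a uniform grid with spacing h."""
    return np.sum(values) * h

def conv_power(values, h, n):
    """Samples of f * ... * f (n factors) on a uniform grid."""
    out = values
    for _ in range(n - 1):
        out = np.convolve(out, values) * h
    return out

def profile(x):
    # Hypothetical test function: non-negative, supported on [-1, 1].
    return 5.0 * np.maximum(1.0 - np.abs(x), 0.0) ** 2

n, p = 3, 2.0                       # illustrative choices, d = 1
x = np.linspace(-2.0, 2.0, 2001)
h = x[1] - x[0]
f = profile(x)

norm1 = integral(f, h)              # ||f||_1
normp_p = integral(f ** p, h)       # ||f||_p^p
a = (norm1 / normp_p) ** (1.0 / (p - 1.0))
b = 1.0 / (a * norm1)               # so that a * b * ||f||_1 = 1
f_tilde = a * profile(x / b)

lhs = integral(conv_power(f_tilde, h, n) ** p, h)
rhs = a ** (n * p) * b ** ((n - 1) * p + 1) * integral(conv_power(f, h, n) ** p, h)
print(integral(f_tilde, h), integral(f_tilde ** p, h), lhs, rhs)
```

Both norms of `f_tilde` come out close to one, and `lhs` agrees with `rhs` up to discretization error.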
We next prove that (2.1) has a maximizer. A radial function $f$ on $\mathbb{R}^d$ is called decreasing if $f(x) \ge f(y)$ whenever $|x| \le |y|$.
Proposition 2.2.
The problem (2.1) has a lower-semicontinuous, radially decreasing, maximizer $f \in \mathcal{F}$.
Proof.
First observe that for any non-negative measurable function $f$, iterating Riesz's rearrangement inequality [LL, Theorem 3.7] yields $\int (f^{*n})^p\, d\mu_d \le \int ((f^{\star})^{*n})^p\, d\mu_d$, where $f^{\star}$ is the symmetric rearrangement of $f$; see [B, Section 3.4] for related multiple convolution rearrangement inequalities and their equality cases. Also, notice that if $f \in \mathcal{F}$, then $f^{\star} \in \mathcal{F}$.
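The effect of symmetric rearrangement can be seen in a toy computation. The following sketch is an illustration only, with $d=1$, $n=2$, $p=2$ and a hypothetical two-bump indicator function. It rearranges grid values into a symmetric decreasing profile and compares $\int (f*f)^2$ before and after. For this example the exact values are $4$ and $16/3$.

```python
import numpy as np

def symmetric_rearrangement(values):
    """Discrete symmetric decreasing rearrangement on an odd-length grid."""
    order = np.sort(values)[::-1]            # values from largest to smallest
    mid = len(values) // 2
    k = np.arange(1, mid + 1)
    positions = np.empty(len(values), dtype=int)
    positions[0] = mid                       # largest value at the center
    positions[1::2] = mid + k                # then alternate right ...
    positions[2::2] = mid - k                # ... and left of the center
    out = np.empty_like(order)
    out[positions] = order
    return out

def energy(values, h):
    """Riemann-sum approximation of int (f * f)^2."""
    return np.sum((np.convolve(values, values) * h) ** 2) * h

h = 0.01
f = np.zeros(1201)                           # grid on [-6, 6]
f[600:700] = 1.0                             # indicator of [0, 1)
f[900:1000] = 1.0                            # indicator of [3, 4)
f_star = symmetric_rearrangement(f)

before, after = energy(f, h), energy(f_star, h)
print(before, after)
```

The rearranged function has the same integral, and the functional increases, in line with the inequality above.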
Take non-negative functions $f_j \in \mathcal{F}$ such that $\int (f_j^{*n})^p\, d\mu_d \to \Lambda$ (recall $\Lambda$ from (2.2)). By replacing $f_j$ with its symmetric rearrangement, we may assume that $f_j$ are radial and decreasing. Passing to a subsequence if necessary, we may in addition assume that $f_j \to f$ weakly in $L^p$. Consequently, $f$ is radial, decreasing, $\|f\|_1 \le 1$, and $\|f\|_p \le 1$. (To see this, observe that the set of radial decreasing nonnegative functions with $L^1$ norm at most $1$ is a closed convex set in $L^p$, so by Mazur's Lemma, see e.g. [LL, Theorem 2.13], this set is weakly closed.) By modifying $f$ on a set of measure zero if necessary, we may assume that $f$ is lower semi-continuous. (Footnote 1: If $f$ is discontinuous at $x$, then define $f(x) = \lim_{t \to 1^+} f(tx)$, i.e. the one-sided radial limit from the right. Then $\{f > \lambda\}$ is open for every $\lambda$.)
Claim 2.3.
As $j \to \infty$, $f_j \to f$ $\mu_d$-almost everywhere.
Proof.
For , define and whenever . Then since converges weakly to in , we have that whenever is a closed interval of finite Lebesgue measure in ,
[TABLE]
Insofar as the function is non-decreasing, it has at most countably many points of discontinuity. If is a point of continuity of , and , then
[TABLE]
but since is decreasing we have that for . Thus
[TABLE]
Arguing similarly with intervals whose left end-point is , we also have that
[TABLE]
Thus at every point of continuity of . If is a countable set in , then is a Lebesgue null set in , so the claim follows.∎
Notice that, as a consequence of this claim, Fatou’s Lemma ensures that . Our next claim is
Claim 2.4.
If $q \in (1,p)$, then $f_j \to f$ strongly in $L^q$ as $j \to \infty$.
The proof of this claim is a variant of the Vitali convergence theorem (see e.g. Theorem 9.1.6 of [Ros]), but observe that it does not necessarily hold if one were to remove the radially decreasing property of the functions (just consider a sequence of translates of a fixed function).
Proof.
Fix . Insofar as the functions and are radially decreasing,
[TABLE]
where is the closed ball centered at $0$ of radius $\bigl(\frac{2}{\mu_{d}(B(0,1))\,\delta}\bigr)^{1/d}$. (Otherwise we would have for some , or .)
On , we have for every , and , whence
[TABLE]
provided is chosen sufficiently small.
Now fix . Observe that,
[TABLE]
if is chosen sufficiently small. On the other hand, since has finite measure, one can invoke continuity of measure from above, thus we have that in measure on as . From the inequalities
[TABLE]
we infer that there exists such that
[TABLE]
Bringing these estimates together, it follows that for every . ∎
Our next goal is to use this claim in order to show that . To this end, observe that repeated application of Young’s convolution inequality [LL] yields that, for any -tuple of functions ,
[TABLE]
where is the Hölder conjugate of , so . Since , .
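For orientation, here is a numerical sanity check of the $n$-fold Young inequality. It assumes that (2.3) takes the equal-exponent form $\|f_1 * \cdots * f_n\|_p \le \prod_{j=1}^n \|f_j\|_r$ with $\frac{1}{r} = 1 - \frac{1}{n p'}$. The symbol $r$ is a placeholder chosen here, and the values $n=3$, $p=2$ (so $r = 6/5$) and the random test functions are illustrative. On a uniform grid the discrete analogue holds exactly, so the comparison is not affected by discretization.

```python
import numpy as np

def lq_norm(values, h, q):
    """Grid approximation of the L^q norm with spacing h."""
    return (np.sum(np.abs(values) ** q) * h) ** (1.0 / q)

def convolve_all(functions, h):
    """Samples of f_1 * ... * f_n on a uniform grid."""
    out = functions[0]
    for g in functions[1:]:
        out = np.convolve(out, g) * h
    return out

rng = np.random.default_rng(0)
n, p, h = 3, 2.0, 0.01
p_conj = p / (p - 1.0)                      # Hoelder conjugate of p
r = 1.0 / (1.0 - 1.0 / (n * p_conj))        # assumed exponent in (2.3)

functions = [rng.random(200) for _ in range(n)]   # arbitrary non-negative samples
lhs = lq_norm(convolve_all(functions, h), h, p)
rhs = np.prod([lq_norm(g, h, r) for g in functions])
print(r, lhs, rhs)
```

Since $1 < r < p$, the $L^r$ norm is controlled by the $L^1$ and $L^p$ norms through Hölder's inequality, which is how the bound is used in the text.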
To apply this inequality, first use Minkowski’s inequality to observe that,
[TABLE]
but,
[TABLE]
and hence
[TABLE]
Appealing to (2.3) now yields,
[TABLE]
Returning to our sequence , it is a consequence of Hölder’s inequality that with some depending on and , so (and the same inequality holds with replaced by ). Whence there is a constant such that
[TABLE]
Since , Claim 2.4 yields that in as . Hence . (It follows that is not identically zero.)
It remains to show that . To this end, we apply Lemma 2.1: Consider the function
[TABLE]
Then and Consequently, if or , then , which is absurd. Thus and the proof of the proposition is complete. ∎
3. The First Variation
With the existence of a maximizer established, we now study it through its first variation.
To introduce the Euler-Lagrange equation associated to (2.1) it will be convenient to define, for a function , . Observe that, if are non-negative measurable functions,
[TABLE]
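The displayed identity is presumably the standard adjoint relation for convolution, $\int (f*g)\,h = \int f\,(\tilde g * h)$ with $\tilde g(x) = g(-x)$. This reading, and the symbol $\tilde g$, are assumptions made for this illustration. Under that assumption, a short check on a uniform grid looks as follows. The array `w` plays the role of $h$ to avoid a clash with the grid spacing `h`.

```python
import numpy as np

rng = np.random.default_rng(1)
h = 0.05                                   # grid spacing
f = rng.random(40)
g = rng.random(25)
w = rng.random(len(f) + len(g) - 1)        # test function sampled on the grid of f * g

conv_fg = np.convolve(f, g) * h
# (g~ * w)(x_i) = h * sum_n g[n] * w[i + n], i.e. a cross-correlation of w with g.
reflected_conv = np.correlate(w, g, mode="valid") * h

lhs = np.sum(conv_fg * w) * h              # int (f * g) w
rhs = np.sum(f * reflected_conv) * h       # int f (g~ * w)
print(lhs, rhs)
```

On the grid the two sums agree up to rounding, because both equal $h^2 \sum_{i,n} f_i\, g_n\, w_{i+n}$.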
Proposition 3.1.
A lower-semicontinuous function is a maximizer of the problem (2.1) if and only if
[TABLE]
Remark 3.2.
Observe that if is radially decreasing, then is again radially decreasing for any , so in this case.
Proof.
The sufficiency is easy to show. Integrating both sides of (3.2) against , and recalling that , yields
[TABLE]
But using Tonelli’s theorem and (3.1), the left hand side is equal to .
Conversely, consider a bounded function compactly supported in the open set . Since is lower-semicontinuous, . Therefore, (insofar as is bounded) there exists a constant such that
[TABLE]
so in particular, there exists such that for it follows that is non-negative. In the notation of Lemma 2.1 with , we consider the function
[TABLE]
with the corresponding satisfying . Of course we also have regardless of for . We conclude that belongs to , and therefore
[TABLE]
Moreover, as in Lemma 2.1,
[TABLE]
For , we calculate, using commutativity and associativity of the convolution operator,
[TABLE]
and
[TABLE]
Crudely employing the bound (3.3) in (3.6), we infer that there is a constant , depending on , and such that for all ,
[TABLE]
Whence, the second order Taylor formula yields that
[TABLE]
for . Integrating the pointwise inequality (3.7) yields
[TABLE]
as .
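The first-order term of this expansion can be tested by finite differences. The sketch below is a hedged illustration: it assumes the linear term is $t\, n p \int (f^{*n})^{p-1}\,(f^{*(n-1)} * \varphi)$, which is what differentiating $(f + t\varphi)^{*n}$ at $t=0$ gives, and the values $d=1$, $n=2$, $p=2.5$ and the choices of $f$ and $\varphi$ are made here for illustration.

```python
import numpy as np

def conv_power(values, h, n):
    """Samples of f * ... * f (n factors) on a uniform grid."""
    out = values
    for _ in range(n - 1):
        out = np.convolve(out, values) * h
    return out

def objective(values, h, n, p):
    """Grid approximation of int (f^{*n})^p."""
    return np.sum(conv_power(values, h, n) ** p) * h

n, p = 2, 2.5
x = np.linspace(-1.0, 1.0, 401)
h = x[1] - x[0]
f = np.maximum(1.0 - x ** 2, 0.0)
# Bounded perturbation supported well inside {f > 0}, so f + t * phi stays non-negative.
phi = np.where(np.abs(x) < 0.5, np.sin(3.0 * x) + 0.5, 0.0)

# Predicted derivative at t = 0: n * p * int (f^{*n})^{p-1} (f^{*(n-1)} * phi).
f_n = conv_power(f, h, n)
cross = np.convolve(conv_power(f, h, n - 1), phi) * h
predicted = n * p * np.sum(f_n ** (p - 1.0) * cross) * h

t = 1e-5
finite_difference = (objective(f + t * phi, h, n, p)
                     - objective(f - t * phi, h, n, p)) / (2.0 * t)
print(predicted, finite_difference)
```

The central difference matches the predicted derivative to well within the $O(t^2)$ error expected from the second-order Taylor formula.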
Now, recalling the definition of , we calculate
[TABLE]
where in the expansion of we have again used the inequality (3.3) to obtain the term.
Plugging the two expansions (the preceding display and (3.8)) into (3.5) yields that, as ,
[TABLE]
From (3.4) it follows that , so the second term in the prior expansion must vanish, that is,
[TABLE]
where (3.1) has been used. Since was any bounded function compactly supported in , we conclude that (3.2) holds.∎
4. On the Madiman-Wang conjecture
Proposition 4.1.
The generalized Gaussian is not the extremizer for problem (2.1).
Proof.
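Consider the simplest case $d=1$, $n=2$, and $p=2$. We shall show that the function $g(x) = (1-x^2)_+$ does not satisfy the equation
$$(g*g*g)(x) = a\,g(x) + b \quad \text{for } x \in (-1,1), \tag{4.1}$$
and so no function of the form $c\,g(\lambda x)$, with $c, \lambda > 0$, satisfies (3.2), for any value of the constants involved (recall Remark 3.2). In fact, we shall show that $g*g*g$ is not a quadratic polynomial near $0$.
For this, observe:
$$g'' = -2\chi_{[-1,1]} + 2\delta_{-1} + 2\delta_{1}.$$
Thus, $(g*g*g)^{(6)} = g'' * g'' * g''$ is the threefold convolution of the above measure. The threefold convolution of $-2\chi_{[-1,1]}$ equals $-8(3-x^2)$ on $[-1,1]$, and no other term in the convolution is quadratic in $x$. Therefore, $g*g*g$ has non-vanishing sixth derivative at $0$, but $a\,g+b$ does have vanishing sixth derivative at $0$. ∎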
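The conclusion of the proof can be corroborated numerically. The following is a minimal sketch under the assumption that the relevant case is $d=1$, $n=2$, $p=2$ with the unnormalized profile $g(x) = (1-x^2)_+$, so that the object of interest is $g*g*g$ near $0$. The code samples $g*g*g$ on a grid, fits polynomials of degree 2 and degree 8 on $[-0.9, 0.9]$, and reads off the sixth derivative at $0$ from the degree-8 fit. A hand computation of $g''*g''*g''$ on $(-1,1)$ gives $-24 + 8x^2$, so the value $-24$ is expected. The grid size and fitting window are arbitrary choices.

```python
import numpy as np
from numpy.polynomial import polynomial as P

m = 500                                    # grid points per unit length
h = 1.0 / m
x = h * np.arange(-m, m + 1)               # grid on [-1, 1]
g = np.maximum(1.0 - x ** 2, 0.0)

# Samples of g * g * g on [-3, 3] (Riemann-sum convolution).
g3 = np.convolve(np.convolve(g, g) * h, g) * h
x3 = h * np.arange(-3 * m, 3 * m + 1)

window = np.abs(x3) <= 0.9
xs, ys = x3[window], g3[window]

quadratic_fit = P.polyfit(xs, ys, 2)       # coefficients, lowest degree first
octic_fit = P.polyfit(xs, ys, 8)
quadratic_residual = np.max(np.abs(P.polyval(xs, quadratic_fit) - ys))
octic_residual = np.max(np.abs(P.polyval(xs, octic_fit) - ys))
sixth_derivative_at_0 = 720.0 * octic_fit[6]   # 6! times the x^6 coefficient

print(quadratic_residual, octic_residual, sixth_derivative_at_0)
```

The quadratic fit leaves a visible residual while the degree-8 fit is essentially exact, which is consistent with $g*g*g$ being a genuine polynomial of degree eight, not two, near the origin.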
Remark 4.2.
Moreover, for any dimension , the random vector in with i.i.d. coordinates , each distributed according to the generalized Gaussian density, does not constitute the extremizer for this problem. Indeed, in this case , and it remains to use Proposition 4.1. Therefore, a random vector with i.i.d. coordinates which are generalized Gaussians is not an extremal case for this question.
5. Any radially decreasing solution of (3.2) is compactly supported
In this section, we discuss the following
Proposition 5.1.
Decreasing radial solutions of (3.2) are compactly supported.
Proof.
Suppose that $f$ solves (3.2) and is not compactly supported. Since $f$ is non-negative and radially decreasing, its support is $\mathbb{R}^d$.
The term on the left hand side of (3.2) belongs to , where . Indeed, if then (recall that with ), but and
[TABLE]
so . If , then is convex, so by Jensen’s inequality, , whence in this case.
On the other hand, the right hand side of (3.2) belongs to only if , which is absurd, since certainly contains non-zero functions. ∎
6. Remarks
In this section we make some remarks that suggest that although the generalized Gaussian is not an optimal distribution for the problem (2.1), a reasonably small perturbation of the generalized Gaussian could well be.
Beginning with , consider the following iteration for
[TABLE]
Numerically, this iteration converges pointwise to a solution of the equation (4.1) for some satisfying the constraints and (so the support of is ). The resulting function can then be re-scaled via the transformation () to have any given positive integral and -norm. We do not know if the solution of is unique (modulo natural invariants in the problem), so we cannot say that this function corresponds to a solution of the constrained maximization problem (2.1).
We provide the graphs of and (see Figure 1 below), and the algebraic expressions for , and on .
[TABLE]
References
- [BCh] S. G. Bobkov, G. P. Chistyakov, Bounds for the maximum of the density of the sum of independent random variables, Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI), 408 (Veroyatnost i Statistika. 18):62-73, 324, 2012.
- [BCh2] S. G. Bobkov, G. P. Chistyakov, Entropy power inequality for the Rényi entropy, IEEE Trans. Inform. Theory, 61(2):708-714, February 2015.
- [BM1] S. Bobkov, A. Marsiglietti, Variants of the entropy power inequality, IEEE Transactions on Information Theory, 63(12):7747-7752, 2017.
- [BM] S. Bobkov, A. Marsiglietti, Asymptotic behavior of Rényi entropy in the central limit theorem, submitted, arXiv:1802.10212.
- [B] A. Burchard, Cases of equality in the Riesz rearrangement inequality. Thesis (Ph.D.) Georgia Institute of Technology. 1994. 94 pp.
- [CHV] J. Costa, A. Hero, and C. Vignat, On solutions to multivariate maximum alpha-entropy problems, Lecture Notes in Computer Science, vol. 2683 (EMMCVPR 2003, Lisbon, 7-9 July 2003), pp. 211-228, 2003.
- [DCT] A. Dembo, T. M. Cover and J. A. Thomas, Information theoretic inequalities, IEEE Transactions on Information Theory, vol. 37, no. 6, pp. 1501-1518, Nov. 1991.
- [Li] J. Li, Rényi entropy power inequality and a reverse, Studia Mathematica, 242 (2018), 303-319.
