Unifying the Brascamp-Lieb Inequality and the Entropy Power Inequality
Venkat Anantharam, Varun Jog, Chandra Nair

TL;DR
This paper introduces a new family of entropy functionals that unify and generalize the entropy power inequality and the Brascamp-Lieb inequality, revealing Gaussian extremality and intermediate inequalities.
Contribution
It defines subadditive entropy functionals, proves Gaussian extremality, and derives a generalized inequality that encompasses both EPI and BLI.
Findings
Gaussians are extremal for the new entropy functionals.
A new inequality generalizes both EPI and BLI.
Intermediate inequalities are obtained based on component independence.
Venkat Anantharam, Department of Electrical Engineering and Computer Sciences, UC Berkeley.
Varun Jog, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge.
Chandra Nair, Department of Information Engineering, CUHK.
Abstract
The entropy power inequality (EPI) and the Brascamp-Lieb inequality (BLI) are fundamental inequalities concerning the differential entropies of linear transformations of random vectors. The EPI provides lower bounds for the differential entropy of linear transformations of random vectors with independent components. The BLI, on the other hand, provides upper bounds on the differential entropy of a random vector in terms of the differential entropies of some of its linear transformations. In this paper, we define a family of entropy functionals, which we show are subadditive. We then establish that Gaussians are extremal for these functionals by mimicking the idea in Geng and Nair (2014). As a consequence, we obtain a new entropy inequality that generalizes both the BLI and EPI. By considering a variety of independence relations among the components of the random vectors appearing in these functionals, we also obtain families of inequalities that lie between the EPI and the BLI. (Footnote 1: A version of this paper appeared in the Proceedings of the IEEE International Symposium on Information Theory, 2019.)
1 Introduction
Information inequalities provide some of the most powerful mathematical tools in an information theorist’s toolbox and are therefore a vital part of information theory. Inequalities such as the non-negativity of mutual information and the data processing inequality are so fundamental to information theory that they are inseparable from information-theoretic notation. These basic inequalities, combined with Fano’s inequality, are powerful enough to yield the converse of Shannon’s channel coding theorem. For harder problems in network information theory, it is necessary to develop more nuanced information inequalities. Not surprisingly, it is often the case that discovering new inequalities leads to breakthroughs in network information theory problems. Some examples of information inequalities that spurred such breakthroughs include the entropy power inequality [1, 2], numerous strengthened forms of the entropy power inequality [3, 4, 5], strong data processing inequalities [6], and inequalities that established certain continuity properties of entropy [7].
On a related note, “single-letter characterizations” of a capacity region or outer bounds to a capacity region in network information theory are induced by subadditive functionals that reduce the characterization of the region to one governed by a single channel use. In this paper, we identify a new functional that is sub-additive and for which Gaussian distributions are extremal. Consequently, we obtain a new class of information inequalities that unifies two fundamental inequalities: the entropy power inequality (EPI) and the Brascamp-Lieb inequality (BLI). In what follows, we provide a brief introduction to the EPI and the BLI and state our main results.
As notational conventions in what follows, and denote equality by definition depending on whether the expression being defined is on the left or on the right respectively, while, for an integer , denotes and denotes the identity matrix. We use the notation for the determinant of a square matrix . We use the term “entropy” as synonymous with “differential entropy” in this document. All vectors are assumed to be column vectors, and we will adopt the convention that if is an k-valued vector and is an l-valued vector, then denotes the k+l-valued vector that would normally be written as . Given a random vector , we use the notation to denote the random vector , where . The notation for random vectors , , and indicates that and are conditionally independent given .
Entropy power inequality:
The EPI states that for any independent $\mathbb{R}^n$-valued random variables $X$ and $Y$, the following inequality holds:
$$e^{2h(X+Y)/n} \ge e^{2h(X)/n} + e^{2h(Y)/n}. \tag{1}$$
Here, $h(\cdot)$ refers to the differential entropy function and all the differential entropies in equation (1) are assumed to exist. Equality holds if and only if $X$ and $Y$ are Gaussian random variables with proportional covariance matrices. The EPI was proposed by Shannon [1] and was first proved by Stam [8]. This proof was later simplified by Blachman [2]. A variety of simple and ingenious proofs have been discovered since; see Rioul [9] for a discussion.
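As a sanity check on equation (1), the Gaussian case can be worked out explicitly. The sketch below assumes that $X \sim \mathcal{N}(0, K_X)$ and $Y \sim \mathcal{N}(0, K_Y)$ are independent with positive definite covariance matrices; the symbols $K_X$ and $K_Y$ are introduced here for illustration only.

```latex
% Entropy of an n-dimensional Gaussian with covariance K_X (standard formula):
h(X) = \tfrac{1}{2}\log\big((2\pi e)^n \det K_X\big)
\quad\Longrightarrow\quad
e^{2h(X)/n} = 2\pi e\,(\det K_X)^{1/n}.

% Independence gives X + Y \sim N(0, K_X + K_Y), so the EPI reduces to
\big(\det(K_X + K_Y)\big)^{1/n} \;\ge\; (\det K_X)^{1/n} + (\det K_Y)^{1/n},

% which is Minkowski's determinant inequality, with equality iff K_Y = c K_X for some c > 0.
```

For $n = 1$ both sides equal $\sigma_X^2 + \sigma_Y^2$, so scalar Gaussians always achieve equality, consistent with the proportional-covariance condition.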
The EPI has an equivalent formulation due to Lieb [10], which is that for $\lambda \in (0,1)$ we have:
$$h\big(\sqrt{\lambda}\,X + \sqrt{1-\lambda}\,Y\big) \ge \lambda\, h(X) + (1-\lambda)\, h(Y). \tag{2}$$
Equality holds in the above inequality if and only if $X$ and $Y$ are Gaussian random variables with identical covariance matrices. Note that $\sqrt{\lambda}\,X + \sqrt{1-\lambda}\,Y$ may be interpreted as a linear transformation of an $\mathbb{R}^{2n}$-valued random variable $(X, Y)$ with some independence constraints on its components, namely that $X$ and $Y$ are independent. Another result along such lines is Zamir and Feder's EPI [4] for linear transformations of random vectors with independent components. This EPI has an equivalent formulation, discovered in [9, 11], that is analogous to the one in equation (2): For an $\mathbb{R}^n$-valued random vector $X = (X_1, \dots, X_n)$ with independent scalar components and any matrix $A \in \mathbb{R}^{k \times n}$ satisfying $A A^T = I_k$, we have
$$h(AX) \ge \sum_{j=1}^{n} a_j\, h(X_j), \tag{3}$$
where $a_j$ is the squared norm of the $j$-th column of $A$; i.e., $a_j = \sum_{i=1}^{k} A_{ij}^2$.
Brascamp-Lieb inequality:
The BLI [12] is actually a family of functional inequalities that lies, in some sense, at the intersection of information and functional inequalities. Many well-known and commonly used inequalities are special cases of the BLI, including Hölder’s inequality, the Loomis-Whitney inequality, the Prékopa-Leindler inequality, and sharp forms of Young’s convolution inequalities [13]. In Gardner’s extensive survey [14], the author describes relationships between popular functional and information inequalities using a pyramid-like sketch, where inequalities at the top imply those below. The BLI and its reverse lie at the very apex of this inequality pyramid. A simple statement of the BLI is as follows:
Theorem 1 (Functional form of the BLI).
For , let , be Euclidean spaces, be linear maps, be positive real numbers, and be nonnegative integrable functions on . Define the function via
[TABLE]
Then the supremum of over all nonnegative and integrable is equal to the supremum of when are centered Gaussian functions; i.e., for all , we have for some positive semidefinite .
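To see how Theorem 1 contains familiar inequalities, consider one special case. The notation below ($H$, $H_j$, $B_j$, $c_j$, $f_j$, and the name $\mathrm{BL}$) is assumed for illustration, and the functional is written in a form commonly used in the Brascamp-Lieb literature, which may differ cosmetically from the authors' display.

```latex
% A commonly used form of the Brascamp-Lieb functional (assumed notation):
\mathrm{BL}(f_1,\dots,f_m) =
\frac{\int_{H} \prod_{j=1}^{m} f_j(B_j x)^{c_j}\,dx}
     {\prod_{j=1}^{m}\big(\int_{H_j} f_j(y)\,dy\big)^{c_j}} .

% Special case: H_j = H, B_j = identity, and \sum_j c_j = 1.
% Then \mathrm{BL} \le 1 is exactly H\"older's inequality:
\int_{H} \prod_{j=1}^{m} f_j(x)^{c_j}\,dx
\;\le\; \prod_{j=1}^{m}\Big(\int_{H} f_j(x)\,dx\Big)^{c_j} .
```

In this special case the bound is attained when all the $f_j$ are equal, so in particular identical centered Gaussians attain it, in line with the statement of the theorem.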
Surprisingly, a direct connection exists between the functional form of the BLI and a generalized subadditivity result for entropy. This link was first discovered in Carlen, Lieb, and Loss [15], and has since led to newer proofs and generalizations of the original BLI [16, 17, 18, 19, 20]. The information-theoretic form of the BLI is the following:
Theorem 2 (Information-theoretic form of the BLI, Theorem 2.1 in Carlen and Cordero-Erausquin [16]).
For , let , , , and be as in Theorem 1. For a random variable on with a well-defined differential entropy (see Definition 1) and satisfying , define as
[TABLE]
Then the supremum of over all such random variables is equal to the supremum of over all Gaussian random variables.
This information-theoretic form is completely equivalent to the functional form: For a fixed choice of the and the , the suprema in both problems have a direct relationship and the cases of equality are also in correspondence [16, Theorem 2.1]. A defining feature of the BLI is that it reduces an infinite-dimensional optimization problem to a finite-dimensional optimization problem over a set of positive definite matrices. When the supremum in Theorem 2 is finite, random variables that achieve the supremum are called extremizers, and Gaussian random variables that achieve the supremum are called Gaussian extremizers. (Footnote 2: In [13] a Gaussian extremizer is defined as a distribution that extremizes among the class of Gaussian distributions, but it turns out that this definition is identical to the one used here.)
The existence of extremizers or Gaussian extremizers and the finiteness of are not addressed by Theorem 2, as stated above. However, this is well-understood in the literature [21, 16, 13].
Our contributions:
The classical EPI and the EPI of Zamir and Feder are valid only under certain independence assumptions. To be precise, for an 2n-valued random vector , the EPI requires independence of and and considers the sum of these two vectors, whereas Zamir and Feder’s EPI requires all the components to be independent and considers linear transformations of . It is natural to consider more general “mixed” independence constraints, for instance, independence of for suitable choices of , and establish lower bounds on for a matrix . This is indeed a special case of the setting considered in our work.
Consider an n-valued random vector , where and are mutually independent -valued random variables. Note that . We consider the following function:
[TABLE]
for positive constants and where and for some , and surjective linear transformations from n to . Just as in Theorem 2, our main result in Theorem 3 states that the supremum of over all random variables satisfying the stated independence constraints is the same as the supremum evaluated over Gaussian random variables. In Theorem 4, we identify necessary and sufficient conditions on , , and the , , , and , such that this supremum is finite. We show that the EPI, BLI, and Zamir and Feder’s EPI easily follow from Theorem 3. Theorem 3 also provides a generalization of Zamir and Feder’s result for certain kinds of dependent random variables.
Our main technical contribution is identifying new entropic functionals and proving that they satisfy a certain subadditivity property. The work of Geng and one of the authors [22] highlighted the critical role played by subadditivity in information inequalities. That work demonstrated how subadditivity of information-theoretic functionals, which is established using the chain rule and data processing relations, can be used to determine the capacity of the Gaussian vector broadcast channel. Once subadditivity is ascertained, a technique from functional analysis called the “doubling trick” may be used to establish Gaussian optimality. The doubling trick, attributed to Ball [23], appeared in Lieb [24] to prove that Gaussian kernels have Gaussian optimizers, and in Carlen [25] to show Gaussian optimality in the log-Sobolev inequality. Subadditivity followed by the doubling trick has been used to prove numerous information inequalities in recent years [26, 27, 28, 29, 30, 5].
Related work:
The EPI may be thought of as a limiting special case of the BLI. Gardner [14] showed that the EPI follows from the sharp form of Young’s inequality, which in turn is a special case of the BLI. This proof strategy is further clarified using a more geometric approach by Cordero-Erausquin and Ledoux [18]. The authors of [18] establish the EPI directly from Theorem 2 by carefully choosing the and as a function of a parameter that tends to 0 and yields the EPI in the limit. While these are intriguing connections, they do not suggest concrete approaches for developing information inequalities for random vectors under more general independence constraints.
Various information-theoretic analogues of hypercontractive inequalities and reverse Brascamp-Lieb inequalities in finite alphabet spaces have been studied in [31, 19, 32]. A closely related work is that of Liu et al. [20], where a novel functional inequality called the forward-reverse Brascamp-Lieb inequality is formulated, and it is shown that there exists an analogous information-theoretic version of this inequality. Most relevant to us is the forward-reverse Brascamp-Lieb inequality with linear maps that was introduced in Liu et al. [20]. Define a function of the marginal densities of an n-valued random variable :
[TABLE]
Here, by we mean that the distribution of is identical to that of . Theorem 8 in [20] states that the supremum of is obtained when each is a centered Gaussian random variable, in which case the infimum in the definition in equation (6) is attained when the optimal coupling is a jointly Gaussian random vector. The expressions in equations (5) and (6) look very similar. The main difference is that equation (6) has an infimum over all possible couplings , whereas our definition in equation (5) enforces the unique coupling where the components are mutually independent.
Structure of the paper:
In Section 2, we introduce some preliminaries and set up the notation to be used in the rest of the paper. In Section 3 we state our main result in Theorem 3 and show that the EPI, BLI, and Zamir and Feder’s EPI may be proved as special cases of this result. In Section 4, we prove Theorem 3. In Section 5, we establish necessary and sufficient conditions for the supremum of in the expression in equation (5) to be finite. In Section 6, we provide a concrete example that demonstrates the utility of Theorem 3 in obtaining EPI-like results for dependent random variables. Finally, in Section 7 we conclude the paper and describe some open problems.
2 Preliminaries and notation
Definition 1.
For , let be an n-valued random variable with density that lies in the convex set of probability densities
[TABLE]
Then we define the entropy of as
[TABLE]
The entropy of a $0$-dimensional random variable is defined to be 0.
Remark 2.1.
The integral in equation (7) is well-defined since the integrand is non-negative. The condition in equation (7) implies that the differential entropy integral in equation (8) is well-defined and bounded away from $-\infty$. Also note that the condition in equation (7) is inherited by marginalization; i.e., if satisfies the condition and is a (multidimensional) marginal of , then also satisfies the condition.
Definition 2 (BL datum).
For an integer , define an -transformation as a triple
[TABLE]
where for each , is a surjective linear transformation, and . An -exponent is defined as an -tuple , such that for . A Brascamp-Lieb datum (BL datum) is defined as a pair where is an -transformation and is an -exponent, for an integer .
Definition 3 (EPI datum).
For an integer , define a -partition of as such that are integers and . Let such that for all be a -exponent. An EPI datum is a pair where is a -partition and is a -exponent, for an integer .
Definition 4 (BL-EPI datum).
For an integer , a BL-EPI datum is defined as where is a BL datum for an integer , and is an EPI datum for an integer .
Definition 5.
Let be a BL-EPI datum where is a -partition of . Define to be the set of all n-valued random vectors such that:
1. For , the random vectors take values in and their densities satisfy the condition in equation (7);
2. are independent;
3. and .
Since entropy expressions are not affected by adding constants, the 0-mean assumption in Definition 5 may be made without loss of generality. Define as the set of random variables that satisfy the properties above, while, in addition, each , is Gaussian.
Remark 2.2.
Whether an n-valued random vector lies in or not is a property of its distribution. The finite variance assumption on random variables in implies that the entropies for and for are bounded away from . However, with only the variance assumption in place, it may happen that some of these entropies equal , which happens, for instance, when is a constant. In this paper, we shall be dealing with differences of entropies of the form
[TABLE]
The condition in equation (7) together with the finite variance assumption has the effect of ensuring that the absolute values of the differential entropies are finite, which ensures that the above difference is well-defined for . This is a technical assumption made for ease of presentation. In cases where the expression in equation (9) is not well-defined, we may redefine it to equal the limit
[TABLE]
where for a standard normal independent of and the are standard normal random vectors independent of . With this modification, our results continue to hold for random variables that satisfy all the conditions in Definition 5 except the condition in equation (7).
The following two concepts are required for Theorem 4.
Definition 6.
Let be a BL-EPI datum. Define a subspace as being of -product form if may be written as for subspaces , for .
Definition 7.
Let be a BL-EPI datum. An -product form subspace is called a critical subspace if
[TABLE]
Definition 8.
For a BL-EPI datum , define as
[TABLE]
Similarly, define as the above supremum taken over Gaussian inputs . When the BL-EPI datum is fixed, we shall omit the argument and use the simplified notation and .
3 Main results
We are now in a position to state our main result:
Theorem 3 (Unified EPI and BLI).
Let be a BL-EPI datum. Recall the definition
[TABLE]
Then for any , the following inequality holds:
[TABLE]
Recall that in Definition 8 we introduced the quantity (with a simplified notation):
[TABLE]
Naturally, we have . Thus, if is , then so is . If , then the above result implies , and thus . An equivalent way of stating the above result is asserting . Theorem 3 does not address the following points, which are worth investigating:
1. Finiteness: When is (and therefore ) finite?
2. Extremizability and Gaussian extremizability: Assuming is finite, when do extremizers exist for the supremum in equation (13), and when do Gaussian extremizers exist for the supremum in equation (12)? In particular, does extremizability imply Gaussian extremizability? (Clearly, the reverse implication is true because of Theorem 3.)
3. Uniqueness of extremizers: Assuming extremizers exist, are they unique in some appropriate sense?
The answers to all these questions will depend on the BL-EPI datum . In this paper, we resolve the first question by identifying necessary and sufficient conditions on that ensure finiteness of and . We do not address the latter two questions here. We show the following result:
Theorem 4.
For a BL-EPI datum , we have if and only if the following conditions are satisfied:
[TABLE]
As we show below, Theorem 3 readily implies the EPI, BLI, and Zamir and Feder’s EPI. For this reason, we choose to interpret the inequality in Theorem 3 as a unified version of the Brascamp-Lieb inequality and the entropy power inequality.
Entropy Power Inequality:
We will prove the EPI in Lieb’s form (2) using Theorem 3. Let and be independent d-valued random variables with zero means and bounded variances, and let . The expression corresponds to , , , , , , and . Note that it is enough to prove by explicit calculation. Consider Gaussian random variables and . Plugging in the entropies of these Gaussian random variables and simplifying, we see that we need to evaluate the supremum
[TABLE]
This supremum is seen to be 0 via the concavity of the $\log\det$ function.
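The explicit calculation can be sketched as follows. The covariance symbols $K_1$, $K_2$, the vector names $X_1$, $X_2$, and the weight $\lambda$ are assumed names for illustration; this is a reconstruction of the standard computation, not necessarily the exact expression the authors display.

```latex
% X_1 \sim N(0,K_1), X_2 \sim N(0,K_2) independent in R^d, \lambda \in (0,1).
% The (2\pi e)^d terms cancel because \lambda + (1-\lambda) = 1, leaving
\lambda h(X_1) + (1-\lambda) h(X_2) - h\big(\sqrt{\lambda}X_1 + \sqrt{1-\lambda}X_2\big)
= \tfrac{1}{2}\Big[\lambda \log\det K_1 + (1-\lambda)\log\det K_2
  - \log\det\big(\lambda K_1 + (1-\lambda) K_2\big)\Big].

% Concavity of K \mapsto \log\det K on positive definite matrices gives
\log\det\big(\lambda K_1 + (1-\lambda)K_2\big)
\;\ge\; \lambda\log\det K_1 + (1-\lambda)\log\det K_2,

% so the bracket is at most 0, and it equals 0 when K_1 = K_2.
```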
Brascamp-Lieb Inequality:
When , , and , we recover the setting of the Brascamp-Lieb inequality in its equivalent form of subadditivity of entropy:
[TABLE]
for all n-valued random variables with and .
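A minimal instance of this subadditivity form, under assumed notation, takes dimension 2, the two coordinate projections as the linear maps, and both exponents equal to 1.

```latex
% X = (X_1, X_2) in R^2 with a density; projections A_1 x = x_1, A_2 x = x_2.
h(X_1, X_2) \;\le\; h(X_1) + h(X_2),
% since h(X_1) + h(X_2) - h(X_1, X_2) = I(X_1; X_2) \ge 0.

% Gaussian case, X \sim N(0, K): the same statement is Hadamard's inequality
\det K \;\le\; K_{11} K_{22}.
```

Here the Gaussian reduction turns an entropy inequality into a determinant inequality, which is the finite-dimensional optimization that the BLI promises in general.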
Zamir and Feder’s Inequality:
Let be a matrix satisfying . For , let the squared norm of the -th column of be denoted by ; i.e.,
[TABLE]
Just as we did for the EPI, it is enough to show that by explicitly computing the supremum of over Gaussian . Let be a positive definite matrix. Define a function from the space of positive definite diagonal matrices to as follows:
[TABLE]
If we show that , then Theorem 3 will immediately imply Zamir and Feder’s EPI for random vectors with independent components. Let , so that . Using the Cauchy-Binet formula for the determinant of , we obtain
[TABLE]
where consists of the columns of corresponding to the indices . The right hand side of the above equality may be written explicitly as
[TABLE]
Noting that (again via the Cauchy-Binet formula), we may take logarithms and use Jensen’s inequality to obtain
[TABLE]
We now gather the coefficients of for a fixed . The coefficient of is given by
[TABLE]
Here, the first equality follows by using the Cauchy-Binet formula again, the second equality follows from the orthogonality of the rows of , and the third equality is true because for any vector . A similar calculation can be done to show that the coefficient of is for all , which completes the proof of .
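The chain of steps above can be summarized in one place. The notation is assumed for illustration: $A$ is a $k \times n$ matrix with $A A^T = I_k$, $D = \mathrm{diag}(\sigma_1^2, \dots, \sigma_n^2)$ is the covariance of the Gaussian vector, $A_S$ is the $k \times k$ submatrix of $A$ formed by the columns indexed by $S$, $\alpha_j$ is the $j$-th column of $A$, and $a_j = \|\alpha_j\|^2$.

```latex
% Cauchy-Binet, applied to (A D^{1/2})(A D^{1/2})^T:
\det(A D A^T) = \sum_{S \subseteq [n],\,|S| = k} \det(A_S)^2 \prod_{j \in S} \sigma_j^2 .

% The weights \det(A_S)^2 sum to \det(A A^T) = 1, so Jensen's inequality gives
\log\det(A D A^T) \;\ge\; \sum_{S} \det(A_S)^2 \sum_{j \in S} \log \sigma_j^2
= \sum_{j=1}^{n} \Big(\sum_{S \ni j} \det(A_S)^2\Big) \log \sigma_j^2 .

% Coefficient of \log\sigma_j^2, with A_{-j} denoting A with column j deleted:
\sum_{S \ni j} \det(A_S)^2
= 1 - \det\big(A_{-j} A_{-j}^T\big)
= 1 - \det\big(I_k - \alpha_j \alpha_j^T\big)
= \|\alpha_j\|^2 = a_j ,
% using Cauchy-Binet, then A A^T = I_k, then \det(I - v v^T) = 1 - \|v\|^2.
```

Combining the last two displays gives $\log\det(A D A^T) \ge \sum_j a_j \log \sigma_j^2$, which is the Gaussian case of Zamir and Feder's inequality.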
4 Proof of Theorem 3
Our proof strategy relies on the technique of Geng and Nair [22] which was developed to solve optimization problems of the form A rough sketch of this proof strategy is outlined below:
- Concave envelope: Define the concave envelope of , denoted by , as the smallest concave function that pointwise dominates . It can be seen that
[TABLE]
where the supremum is over finite auxiliary random variables with support .
- Subadditivity of : This step consists of defining on the larger space of pairs of random variables . A straightforward extension often exists for information-theoretic functions . The subadditivity result shows that
[TABLE]
The ingredients for establishing the subadditivity result developed in this paper stem from ideas used to establish converses to coding theorems and outer bounds in network information theory. An argument with a flavor similar to that employed here is outlined in [33].
- Optimizers of : In this step (also known as the doubling trick), we consider two i.i.d. copies of any optimizer of , say , and show that and are also optimizers of . From here, we may use Gaussian characterization results [34] or the central limit theorem [22] to conclude that it is enough to consider only Gaussian optimizers.
- Optimizers of : In this final step, we show that the optimal value for is attained by a single Gaussian distribution; i.e., we may assume without loss of generality that , and thus this Gaussian also maximizes .
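The doubling step in the third bullet can be sketched as follows, with assumed notation: $X_1$ and $X_2$ are i.i.d. copies of an optimizer, and $X_+$, $X_-$ denote their normalized sum and difference. This is an outline of the general technique rather than the authors' detailed argument.

```latex
% Rotate the pair of i.i.d. copies by 45 degrees:
X_+ = \frac{X_1 + X_2}{\sqrt{2}}, \qquad X_- = \frac{X_1 - X_2}{\sqrt{2}} .

% (X_1, X_2) \mapsto (X_+, X_-) is an orthogonal map, so it preserves joint entropy:
h(X_+, X_-) = h(X_1, X_2) = h(X_1) + h(X_2).

% If the equality conditions of subadditivity also force X_+ and X_- to be
% independent, then a Gaussian characterization result (for example Bernstein's
% theorem) implies that X_1 and X_2 are Gaussian.
```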
The crux of the proof is establishing the subadditivity of . Our proof relies on expanding the joint entropy in two separate ways as follows:
- (A)
,
- (B)
.
To highlight the main ideas, we present a proof sketch of the subadditivity result for the EPI using our new technique.
4.1 Proving the EPI via subadditivity
Consider the function
[TABLE]
where . Define the lifting of to the space of pairs of random variables by
[TABLE]
where . Let and be the respective concave envelopes of and its lifting. (Footnote 3: To get from , we can think of the domain of as being the product of the convex set of probability densities on satisfying (7) and the convex set of probability densities on satisfying (7), and take the concave hull on this product space; similarly for getting from . It can be checked that any product distribution on obtained as a mixture of product distributions can be viewed as having the mixing done on the marginals, basically because if where and then summing over on both sides gives and similarly . This justifies why we can write (20) and the analogous expression for .)
We would like to show the subadditivity relation
[TABLE]
Notice that
[TABLE]
and similarly for . For any auxiliary random variable satisfying , applying expansion (A) to each entropy term in equation (18) (conditioned on ) yields
[TABLE]
For simplicity, call the terms in the brackets , , and respectively, even though they actually depend on . Observing that for , we may conclude and . Substituting these inequalities, we arrive at
[TABLE]
We now expand the expression in equation (18) (conditioned on ) using expansion (B) for each entropy term:
[TABLE]
For ease of notation, call the three terms , , and , even though they actually depend on . Similar to inequality (22), we would like to upper bound and by and respectively. However, the conditioning for the entropy terms in each of the is not the same so we cannot directly conclude such a bound. Using the chain rule of mutual information and data-processing relations, we may make the conditioning in and uniform by introducing some extra mutual information terms:
[TABLE]
where the notational conventions and are used even though the respective terms actually depend on . The main step in the preceding equation is justified as follows. First, it is easy to check using the Markov relation that
[TABLE]
Also, we may verify that
[TABLE]
Similar reasoning for gives
[TABLE]
where the notational conventions and are used even though the respective terms actually depend on . Substituting the expressions for and in the expansion in equation (23), we arrive at
[TABLE]
Here, in step we used the Markov chains and . Step follows by noticing that and are non-negative, being mutual information expressions.
Inequalities (22) and (24) may now be used in tandem to conclude
[TABLE]
Taking the supremum over all auxiliary random variables satisfying leads to
[TABLE]
Notice that the above proof not only gives us subadditivity, but also states that if there is equality in equation (25) for some optimal , then . This leads to several independence conditions that can be used to establish Gaussian optimality. We do not sketch this part of the proof here.
In what follows, we develop this outline into a rigorous proof for a more general result in two stages. In Section 4.2 we establish the key subadditivity inequality and the independence relations that follow from the conditions for equality in that inequality, and in Section 4.3 we complete the proof of Theorem 3 by proving Gaussian optimality.
4.2 Subadditivity lemma
4.2.1 Preliminaries
Let be a BL-EPI datum. Let , where . A natural definition for would be
[TABLE]
and one might then work with its concave envelope . However, for technical reasons we consider Gaussian-smoothed random variables in defining as follows:
Definition 9.
Let be mutually independent standard normal random variables on , and let . For , define independent Gaussian random variables , and let . Assume that the random variables , and are mutually independent. For define as
[TABLE]
Let be the concave envelope of . Let be an auxiliary random variable taking values in a finite set such that we have . It is easy to see that the concave envelope has an equivalent definition in terms of such choices of :
[TABLE]
where, on the right hand side of equation (28), we can assume that , and are mutually independent. For a particular choice of , define
[TABLE]
Analogous to , define to be the set of random variables that take values in and satisfy the conditions in Definition 5. More precisely, a random vector is in if and are n-valued random vectors such that the random vectors , are mutually independent, satisfy the condition in equation (7), and condition 3 of Definition 5 holds for . Since the condition in equation (7) is inherited by marginalization, we have that if then and .
We will need to define an extension of to the larger space . Consider a random vector as in the preceding paragraph. Define
[TABLE]
where are mutually independent standard normal distributions of the appropriate dimensions that are independent of . The concave envelope of can be written as:
[TABLE]
where , , , and are mutually independent, with taking values in finite sets and . Figure 1 illustrates the relations between the random variables via a graphical model.
4.2.2 Proof of subadditivity
Lemma 4.1 (Subadditivity lemma).
For any , the function is subadditive; i.e., if then
[TABLE]
Corollary 4.1.
For any , the function tensorizes; i.e., if and if , then
[TABLE]
Proof of Lemma 4.1.
Let be an auxiliary random variable taking values in a finite set , such that . Consider the following expansion, which comes from applying expansion (A) term by term:
[TABLE]
For simplicity, denote the terms in the square brackets by , , and , respectively, even though they actually depend on . Observe that (see Figure 1). Thus, we conclude that and , using the definition in equation (28). Substituting these inequalities, we arrive at
[TABLE]
We now expand in a different way, which comes from applying expansion (B) term by term:
[TABLE]
For ease of notation, call the three terms in the square brackets , , and , respectively, even though each term actually depends on . Similar to inequality (35), we would like to upper bound and by and respectively. However, the conditioning in each of the two differential entropy terms in each , is not the same, so we cannot directly conclude such a bound. Using the chain rule of mutual information and data-processing relations, we may make the conditioning in and uniform by introducing some extra mutual information terms:
[TABLE]
where we write and for simplicity, even though the corresponding terms depend on . The above steps are justified as follows. First, it is easy to check that conditioned on . This means that, for all ,
[TABLE]
Also, we may verify the Markov chain (conditioned on )
[TABLE]
which gives the equality
[TABLE]
Similar reasoning for gives
[TABLE]
where we use the notation and for simplicity, even though the corresponding terms depend on . Substituting the expressions for and in the expansion in equation (37), we arrive at
[TABLE]
Here, in step we used the fact that and the definition in equation (28). Step follows by noticing that the are non-negative, and so are and since they are nonnegative linear combinations of mutual informations.
We can combine inequalities (35) and (38) to get
[TABLE]
Taking the supremum on the left hand side of this inequality over all auxiliary variables taking values in finite sets , such that , yields the claimed subadditivity result. ∎
Proof of Corollary 4.1.
When , we have the inequality
[TABLE]
This is because we can always choose such that and , . The supremum in equation (4.2.1) over this restricted class of auxiliaries is simply , which therefore is a lower bound on . Inequality (40) combined with Lemma 4.1 completes the proof of Corollary 4.1. ∎
Our next lemma serves to some extent as a converse to Corollary 4.1. In particular, we show that if , then and are independent conditioned on the optimal auxiliary , assuming it exists. We point out that this converse requires and to be strictly bounded away from [math], unlike Lemma 4.1. The formal statement is as follows:
Lemma 4.2 (Independence relations).
Fix . Given , suppose that . Suppose that is such that and . Then the following results hold:
- (a) For all , we have that conditioned on ;
- (b) and .
Proof.
Notice that the proof of Lemma 4.1 implies that the optimizing , if it exists, must satisfy . The first two equalities yield the Markov chains (conditioned on )
[TABLE]
However, we have the obvious Markov chains
[TABLE]
Using Lemma A.1, we may conclude that, conditioned on , we have
[TABLE]
Recall that is given by
[TABLE]
Substituting the above independence relations in , we conclude that, conditioned on , we have
[TABLE]
which by Lemma A.2 implies that, conditioned on , we have
[TABLE]
and concludes the proof of (a).
Having proved (a), rewrite equation (34), with for , as
[TABLE]
The above inequality, combined with the assumed equality , immediately yields
[TABLE]
∎
4.2.3 A general subadditivity result
A closer inspection of the proof of Lemma 4.1 reveals that the linear functions mapping to could be replaced with general channels. To be precise, let and for , consider channels from to . Define the function as
[TABLE]
and let be its concave envelope. The function is lifted to pairs of random variables as
[TABLE]
where the channel from to is given by . Let be the concave envelope of .
Claim 4.1.
The function is subadditive; i.e., .
Proof.
Let be an auxiliary random variable taking values in a finite set , such that . Note that
[TABLE]
To verify step (a) it suffices to show that for each . In fact we have equality here because, as is easily verified, we have conditionally independent of given and . To verify the last inequality, observe that are conditionally independent given and are conditionally independent given . Taking a supremum over completes the proof. ∎
We make several remarks. First, observe that only the need to be non-negative; no such condition is necessary for the . However, studying the maximum over of an expression like (9) when some of the are negative is not interesting because the maximum over is , as can be seen by letting the covariance matrix of the component corresponding to any factor with negative tend to [math].
Second, while this proof is very simple compared to that of Lemma 4.1, the independence relations in Lemma 4.2—which are critical to the proof of Gaussian optimality—cannot be directly deduced from the above proof. However, this is not such a big impediment. Instead of , we could consider a slightly modified function defined by
[TABLE]
where is a standard Gaussian that is independent of and . It is not hard to show that the concave envelope of is subadditive; in fact, the same steps as in the proof of Claim 4.1 suffice. Further, including the extra mutual information term allows one to deduce independence relations analogous to those in Lemma 4.2. This approach provides an alternate route to proving Theorem 3.
4.3 Proof of Theorem 3
Having proved the key subadditivity step, the rest of the proof closely follows the steps outlined in [22, Appendix II].
Definition 10.
Let be an block diagonal matrix such that each is an positive definite matrix. For , define
[TABLE]
where denotes ordering in the positive semidefinite partial order.
Lemma 4.3.
There exist random variables and satisfying (1) ; (2) ; and (3) , such that the following holds:
[TABLE]
Proof of Lemma 4.3.
Let be a sequence of random variables such that and as . This sequence of random variables is tight due to the covariance constraint [22, Proposition 17], and thus we may assume without loss of generality that the converge weakly to a random variable as . Since satisfies the necessary regularity conditions as in [22, Proposition 18], we also have for , and for . Hence we may conclude .
Recall that is defined as
[TABLE]
where, for the moment, ranges over positive integers of arbitrary size. The equality in is because we may restrict to the class of optimizers for . We now show that we can fix to be in (46). Let denote the connected subset of positive definite matrices of the form where is an positive definite matrix for . Consider the connected compact subset, , of the -dimensional Euclidean space obtained using the continuous mapping defined by , where . Fenchel’s extension of Carathéodory’s Theorem [35, Theorem 1.3.7] states that any finite convex combination of points in , can be represented as a convex combination of at most points in . Hence for any we can find a pair with taking at most values, such that and . Thus from this point onwards in the proof we define in (46).
Consider any sequence of convex combinations with for all , and such that converges to as . Appealing to the compactness of the -dimensional simplex, we may assume without loss of generality that for all . If any of the equals [math], then noticing that gives us
[TABLE]
where is some constant that does not depend on . In , we used the fact that each is upper-bounded by the entropy of a Gaussian random variable with the same covariance matrix as , and .
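The bound used in this step is the maximum-entropy property of the Gaussian: among densities with a given covariance, the Gaussian has the largest differential entropy. A minimal scalar sketch, using closed-form entropies of two hypothetical comparison laws (uniform and Laplace) chosen only for illustration; the proof itself applies the vector version with covariance matrices.

```python
import math

def gaussian_entropy(variance):
    """Differential entropy (nats) of a scalar Gaussian with the given variance."""
    return 0.5 * math.log(2 * math.pi * math.e * variance)

def uniform_entropy(variance):
    """Differential entropy of a uniform law on an interval with the given variance."""
    width = math.sqrt(12 * variance)  # Var(U[0, w]) = w^2 / 12
    return math.log(width)

def laplace_entropy(variance):
    """Differential entropy of a Laplace law with the given variance."""
    scale = math.sqrt(variance / 2)   # Var(Laplace(b)) = 2 b^2
    return 1 + math.log(2 * scale)

# The Gaussian dominates both alternatives at every variance tested.
gaussian_is_largest = all(
    uniform_entropy(v) < gaussian_entropy(v) and laplace_entropy(v) < gaussian_entropy(v)
    for v in (0.1, 1.0, 7.5)
)
```

The gap between each comparison law and the Gaussian does not depend on the variance, which is consistent with the constant in the bound above being independent of the sequence index.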
It is now clear that the limit as is equal to 0 whenever . Thus, we may assume that , by splitting a component into multiple components if necessary. This implies that for all large enough . Hence, we can find a convergent subsequence such that for each when along this subsequence. We arrive at
[TABLE]
or, in other words, we can find a pair of random variables with such that . This completes the proof. ∎
Lemma 4.4.
Consider random variables such that for some -partition of . Define new random variables and via
[TABLE]
Then .
Proof.
We have the equality
[TABLE]
Further, defining , , , and , we have
[TABLE]
and
[TABLE]
and are equal in distribution. Multiplying the equations in (49) by and those in (4.3) by and subtracting the sum of the latter from the sum of the former, we may conclude that . ∎
Lemma 4.5.
Fix . Let the random variables and be as in Lemma 4.3; i.e., satisfying the equality , and with . Consider two independent and identically distributed copies of , denoted by and . Define new random variables and as follows:
[TABLE]
Also, define . Then the following results hold:
- (a) and are conditionally independent given ;
- (b) and .
Proof.
We have the following sequence of inequalities:
[TABLE]
Here follows from the assumption that . Equality follows from the independence . Equality holds because of Lemma 4.4. Inequality follows from the definition of . Inequality follows from the tensorization result in Lemma 4.1. Finally, inequality follows from the definition in equation (44), and the fact that and have the same covariance as , which is bounded above by in the positive semidefinite partial order.
Since the first and last expressions match, all the inequalities in the above sequence of inequalities must be equalities. In particular, equalities and combined with Lemma 4.2 imply that conditioned on , thus establishing part (a) of the lemma. Lemma 4.2 also gives and . Finally, equality in gives and . This completes the proof of part (b). ∎
Lemma 4.6.
There exists such that and . Furthermore, the random variable is the unique element of the set satisfying .
Proof.
Consider the setting as in Lemma 4.5. Using Lemma 4.5, we have that conditioned on for any . However, we also have conditioned on . The characterization theorem for Gaussian distributions [34] implies that and must be Gaussian with identical covariance matrices, conditioned on . Recall that is independent of , and the covariance matrix of conditioned on is simply the covariance matrix of conditioned on for . Since and may be chosen arbitrarily, we conclude that the covariance matrix of is some fixed for all . Let . Thus,
[TABLE]
To establish uniqueness, first note that it is enough to only consider Gaussian random variables satisfying , since our argument above shows that any that achieves this equality must be Gaussian. Now suppose that and are two distinct random variables such that with . Define such that when and when . Suppose also that takes values 1 and 2 with probability , each. It is easy to check that satisfies the covariance constraint, and that . As in Lemma 4.5, consider two i.i.d. copies of and of . Lemma 4.5 states that conditioned on , we have , for any values of and . Conditioned on and , we have and . This implies , which is impossible since , and thus there cannot be two distinct Gaussian maximizers. ∎
Proof of Theorem 3.
We now complete the proof of Theorem 3. Recall the definition of :
[TABLE]
Clearly, there is nothing to prove if is infinite, so we assume . Let be an arbitrary random vector. By choosing a large enough such that , we may conclude that
[TABLE]
Let , where , be the unique maximizer such that , as in Lemma 4.6. Thus, we have the sequence of inequalities
[TABLE]
Here, inequality follows from the entropy inequality
[TABLE]
for all . The inequality in is true because the random variable defined by for is a Gaussian random variable in . Thus, by the definition of , we must have
[TABLE]
Combining inequalities (51) and (52), we have
[TABLE]
Recall that is given by
[TABLE]
If satisfies certain mild conditions (such as bounded second moments) provided in Lemma A.3, we have that
[TABLE]
This means that we may take the limit in inequality (53) as to conclude
[TABLE]
and conclude the proof of Theorem 3. ∎
5 Conditions for
Theorem 3 shows that it is enough to find necessary and sufficient conditions for to be finite, since . We prove Theorem 4 by finding necessary conditions on the BL-EPI datum for such finiteness in Claim 5.1, and showing that the necessary conditions are also sufficient in Claim 5.2.
Claim 5.1.
If is finite, then the conditions in equations (14) and (15) must be satisfied.
Proof.
The necessity of the condition in equation (15) is seen as follows. Choose for some . It is easy to see that scales as as a function of as . Since is arbitrary, the above expression is finite only if the condition in equation (15) is satisfied.
To show that the condition in equation (14) is necessary, let be a subspace of $\mathbb{R}^n$ of -product form. Consider a Gaussian random variable such that , and is supported on and is supported on . Furthermore, assume and . Taking the limit as and gathering the coefficients of , we see that scales as
[TABLE]
as . Thus, is finite only if the condition in equation (14) is satisfied. ∎
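Both necessity arguments above are scaling arguments. Assuming they rest on the standard dilation identity h(aX) = h(X) + n log|a| for an n-dimensional random vector X (the exact expressions in the proof are not reproduced here), the identity can be checked in closed form for a Gaussian with a hypothetical diagonal covariance:

```python
import math

def gaussian_entropy_diag(variances):
    """Differential entropy (nats) of a Gaussian with diagonal covariance."""
    return 0.5 * sum(math.log(2 * math.pi * math.e * v) for v in variances)

variances = [0.5, 2.0, 3.0]   # hypothetical covariance diagonal, so n = 3
n = len(variances)

max_error = 0.0
for a in (0.01, 1.0, 100.0):
    scaled = [a * a * v for v in variances]           # covariance of a * X
    lhs = gaussian_entropy_diag(scaled)               # h(aX)
    rhs = gaussian_entropy_diag(variances) + n * math.log(a)
    max_error = max(max_error, abs(lhs - rhs))
```

Because each entropy term picks up a multiple of log a proportional to the dimension of its argument, a linear combination of entropies stays bounded as a varies only if the dimension-weighted coefficients balance, which is the kind of condition the claim extracts.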
The proof of sufficiency of the conditions in equations (14) and (15) relies on two lemmas which we prove below.
Lemma 5.1.
Let be a BL-EPI datum. Let be an arbitrary -product form subspace such that for . Let and . Define two BL-EPI data as follows:
- (a) is a BL-EPI datum defined on . For each , define the linear maps by for .
- (b) is a BL-EPI datum defined on . For , the linear maps are defined by
[TABLE]
We also define the linear maps as
[TABLE]
Here denotes the orthogonal projection on to a subspace . Note that is an orthogonal decomposition.
Then the following relation holds:
[TABLE]
Remark 5.1.
Note that it may happen that for some . It may also happen that for some , we have . We do not rule out such cases, and keep our notation the same by instead defining entropy on a 0-dimensional subspace as 0.
Proof of Lemma 5.1.
By definition, the linear transformations in and are surjective. Also, and . This verifies that and are indeed valid BL-EPI data on and , respectively. Every vector may be expressed as . We use the notation where , and similarly for . We have the equality
[TABLE]
For any ,
[TABLE]
Taking the supremum over all completes the proof. ∎
Lemma 5.2.
Suppose that a BL-EPI datum satisfies the conditions in equations (14) and (15), and suppose that is an -product form critical subspace. Then the BL-EPI data and defined as in Lemma 5.1 also satisfy the conditions in equations (14) and (15).
Proof.
Verifying the conditions for is immediate: the condition in equation (14) restricted to product form subspaces of yields the first condition, and the criticality of yields the second condition.
For , it is not hard to verify that is . We may now check the second condition for by observing the equality
[TABLE]
using the criticality of and the fact that . Let be an arbitrary -product form subspace of . Consider the new subspace , which is the direct sum of the subspace with the subspace . Note that is an -product form subspace of $\mathbb{R}^n$. Using the condition in equation (14) for , we have
[TABLE]
Note that , for all . Moreover, . Substituting these equalities in the above inequality, we arrive at
[TABLE]
The criticality of then implies
[TABLE]
and this completes the proof. ∎
We are now in a position to prove the following sufficiency result:
Claim 5.2.
If the conditions in equations (14) and (15) are satisfied, then is finite.
Proof.
The proof proceeds via a double induction on the dimension and the number of linear maps . We first prove the result for and arbitrary , and for and arbitrary . For , it must be that and . The conditions in equations (14) and (15) imply that , because . Thus, equals
[TABLE]
since for all such that , and is a nonzero scalar for each such .
Now fix and let , , , , , and be arbitrary, subject to satisfying the conditions in equations (14) and (15). We write
[TABLE]
where is an matrix for (and is an matrix). Recall that, by assumption, for and .
Let denote the null space of . For every -product form subspace we must have for all . This is because if we have for some , then letting and for , the corresponding -product form subspace will violate the condition , where and for .
We can therefore assume that for . Under this assumption, we will now show that , where denotes the supremum of
[TABLE]
over independent taking values in with positive definite for each for , and where
[TABLE]
We have
[TABLE]
for , and
[TABLE]
It is therefore equivalent to show that the supremum of
[TABLE]
over positive definite for each for is finite.
Let be a singular value decomposition of for . Since by assumption, here is a diagonal matrix with strictly positive diagonal entries, is an orthogonal matrix and is an matrix with orthonormal columns. Note that the span of the columns of equals the range space of .
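The structural facts about the singular value decomposition used here can be checked numerically. The sketch below assumes a hypothetical surjective map given by an m × n matrix `A` with m ≤ n and uses NumPy's thin SVD; the symbol names and sizes are illustrative, and the last check uses the convention that the right singular vectors span the range of the transpose of `A`.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 4                       # hypothetical sizes with m <= n
A = rng.standard_normal((m, n))   # full row rank with probability one

# Thin SVD: A = U @ diag(s) @ Vt, with U of size m x m and Vt of size m x n.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T                          # n x m matrix with orthonormal columns

singular_values_positive = bool(np.all(s > 0))             # surjectivity of A
u_is_orthogonal = np.allclose(U @ U.T, np.eye(m))
v_has_orthonormal_columns = np.allclose(V.T @ V, np.eye(m))
reconstruction_ok = np.allclose(U @ np.diag(s) @ Vt, A)

# Projecting A^T onto span(V) leaves it unchanged, so range(A^T) lies in span(V);
# both spaces have dimension m, hence they coincide.
P = V @ V.T
range_matches = np.allclose(P @ A.T, A.T)
```

Strict positivity of the singular values is what allows the diagonal factor to be absorbed into a fixed additive constant in the log-determinant expressions that follow.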
With denoting for , it is equivalent to show that the supremum of
[TABLE]
over positive definite for each for is finite.
Note that the entries of depend only on , which is fixed, and note that the are fixed. Therefore, with denoting , it is equivalent to show that the supremum of
[TABLE]
over positive definite for each for is finite. Let be the spectral-decomposition of and let denote the eigenvalues of in any order. By assumption these are all strictly positive. Let
[TABLE]
denote the ordered list of all the distinct values among these eigenvalues (note that , so here ).
Starting with and working towards the larger eigenvalues step by step, we can build up each , for , in layer-cake fashion as
[TABLE]
where each for is a positive semidefinite matrix, with a spectral decomposition given by , and each of whose eigenvalues is either 0 or (recalling the convention that ). Thus each corresponds to a subspace of , whose dimension we denote as . Note that and is nonincreasing as decreases, but it can become 0 for ; however we have for at least one choice of . We also have
[TABLE]
Observe that is strictly positive for .
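A layer-cake decomposition of this kind can be sketched numerically. The code below is an illustration under assumptions: it orders the distinct eigenvalues increasingly, takes the level below the smallest eigenvalue to be 0, and writes a positive definite matrix `K` as a sum of layers, each equal to an eigenvalue gap times the orthogonal projection onto the span of the eigenvectors at or above that level. The indexing convention in the text may differ, and the function name `layer_cake` is hypothetical.

```python
import numpy as np

def layer_cake(K, tol=1e-9):
    """Split a positive definite K into layers (gap * projection) summing to K."""
    eigvals, eigvecs = np.linalg.eigh(K)
    levels = []                              # distinct eigenvalues, ascending
    for lam in eigvals:
        if not levels or lam - levels[-1] > tol:
            levels.append(lam)
    layers, previous = [], 0.0
    for level in levels:
        Q = eigvecs[:, eigvals >= level - tol]   # eigenvectors at or above this level
        layers.append((level - previous) * (Q @ Q.T))
        previous = level
    return layers

# Hypothetical example: eigenvalues 1, 1, 3, 4 in a random orthonormal basis.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((4, 4)))
K = basis @ np.diag([1.0, 1.0, 3.0, 4.0]) @ basis.T

layers = layer_cake(K)
layers_sum_to_K = np.allclose(sum(layers), K)
layer_ranks = [int(np.linalg.matrix_rank(layer)) for layer in layers]
```

Each layer has only two eigenvalues (zero and the gap), and the associated subspaces are nested, which is the structure the subsequent argument exploits.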
Let denote the subspace of corresponding to , i.e. the subspace spanned by the eigenvectors of . Then is the subspace corresponding to in the same sense, where , and is the subspace corresponding to in the same sense, where . Note that
[TABLE]
By assumption, for each we therefore have
[TABLE]
where is an -product subspace of .
For each , since , we see that the subspace corresponding to is . In particular, the subspace corresponding to is .
We also note that for each we have
[TABLE]
Since , let us relabel the eigenvectors into (according to decreasing values of the eigenvalues) such that we have
[TABLE]
where we recall that by definition. We can also write
[TABLE]
where for and . Note that .
Now we have
[TABLE]
where . Note that the subspace corresponding to is . Since the range space of is non-decreasing, there exists an orthonormal basis for such that the range space of matches the span of for some appropriate . Thus .
By construction we have . Let where is the orthonormal matrix formed by ’s and is a diagonal matrix with diagonal entries being [math] or , where occurs at the indices corresponding to the membership in .
We now claim that there is a positive constant depending only on (and in particular not depending on the or the choices of the bases for ) such that, for all , we have
[TABLE]
This is a consequence of Lemma B.1 and is established in Corollary B.1.
We therefore have
[TABLE]
From this it follows that
[TABLE]
for a fixed constant . Here, to justify step (a), due to the nested nature of , is a diagonal matrix with entries equal to . We take .
Since and is strictly positive for , and since , we can conclude that
[TABLE]
for all choices of positive definite for each for . This establishes what was desired, when .
We have shown that the claim is true for and all . Assume that the claim is true for all and all . Our goal is to establish the claim for and all . To do so, we induct on . The case of and follows from our calculations above. Now we assume that the claim is true for and all , and show that it also holds for and .
Let be a BL-EPI datum in with . We may assume that for all , since otherwise we could have treated the scenario as a BL-EPI datum in with , which is already covered by the inductive hypothesis. For fixed , , and , consider the function defined on as
[TABLE]
Since is a pointwise supremum of linear functions, is convex. Let be the region of all such that satisfy the conditions in equations (14) and (15). Note that is a compact, convex set. By Claim 5.1, we have that takes values outside . We wish to show that takes finite values everywhere on . Since is convex and is closed, it is enough to show finiteness of at all points on the boundary of . Since for all , a point is a boundary point of if and only if at least one of the following two conditions is satisfied: (1) for some ; or (2) there exists a proper -product form subspace of that is critical. If a boundary point satisfies (1), then our induction assumption (on ) ensures the finiteness of evaluated at that BL-EPI datum, since we could have treated the scenario as a BL-EPI datum in with .
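The convexity claim used above (a pointwise supremum of linear functions is convex) can be illustrated with a small numerical check. The sketch below replaces the supremum by a maximum over a hypothetical finite family of affine maps on a three-dimensional space, which is an assumption made only to keep the example finite; it tests the defining inequality f(tx + (1-t)y) ≤ t f(x) + (1-t) f(y) at random points.

```python
import random

random.seed(1)
# Hypothetical finite family of affine maps d -> <a, d> + b on R^3.
family = [([random.uniform(-1, 1) for _ in range(3)], random.uniform(-1, 1))
          for _ in range(20)]

def f(d):
    """Pointwise maximum of the affine family, a stand-in for the supremum."""
    return max(sum(ai * di for ai, di in zip(a, d)) + b for a, b in family)

# Largest observed violation of the convexity inequality (should be ~0 or negative).
worst_gap = 0.0
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(3)]
    y = [random.uniform(-5, 5) for _ in range(3)]
    t = random.random()
    mid = [t * xi + (1 - t) * yi for xi, yi in zip(x, y)]
    worst_gap = max(worst_gap, f(mid) - (t * f(x) + (1 - t) * f(y)))
```

Convexity is what reduces the finiteness question to boundary points: every point of a compact convex set is a convex combination of boundary points, so a convex function that is finite on the boundary is finite throughout.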
Now consider a boundary point that satisfies (2), assuming that for all . Let be an -product form critical subspace of ; i.e., a subspace that satisfies the equality
[TABLE]
with . Lemma 5.1 shows that given any -product form subspace , it is possible to define BL-EPI data on and in terms of the original BL-EPI datum that satisfy a certain subadditivity property. In particular, if the datum on is denoted by and that on is denoted by , then Lemma 5.1 states that
[TABLE]
Thus, to show that is finite, it is enough to show that and are finite. Lemma 5.2 asserts that since is a critical -product form subspace, the BL-EPI data and satisfy both the conditions in equations (14) and (15). Since , we may use the induction assumption (on the dimension) to assert and , and conclude the proof. ∎
6 A special case
We examine a special case here to see what kinds of new inequalities may result from Theorem 3. Let and be real valued random variables such that . We would like to lower bound the entropy . Note that the regular EPI applied with the independent random vectors and yields the trivial lower bound
[TABLE]
Note also that
[TABLE]
However, it is not possible to use Zamir and Feder’s EPI to provide lower bounds on because of the dependency between and . We show that Theorem 3 may be used to obtain a family of nontrivial lower bounds that account for this dependency.
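For reference, the regular EPI mentioned above states, in its scalar form, that e^{2h(X+Y)} ≥ e^{2h(X)} + e^{2h(Y)} for independent X and Y with densities. The sketch below checks this for a hypothetical pair of independent uniform variables on [0, 1], whose sum has a triangular density; it illustrates the classical inequality only and is not the special case analysed in this section.

```python
import math

def triangular_density(s):
    """Density of X + Y for independent X, Y uniform on [0, 1]."""
    return s if s <= 1.0 else 2.0 - s

def entropy_midpoint(density, lo, hi, steps=200_000):
    """Midpoint-rule approximation of -integral of f log f over [lo, hi], in nats."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        value = density(lo + (k + 0.5) * width)
        if value > 0:
            total -= value * math.log(value) * width
    return total

h_x = h_y = 0.0   # a uniform law on [0, 1] has zero differential entropy
h_sum = entropy_midpoint(triangular_density, 0.0, 2.0)

lhs = math.exp(2 * h_sum)                     # e^{2 h(X + Y)}
rhs = math.exp(2 * h_x) + math.exp(2 * h_y)   # e^{2 h(X)} + e^{2 h(Y)}
epi_holds = lhs >= rhs
```

The inequality is strict here because the summands are not Gaussian; equality in the classical EPI requires Gaussian summands.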
Lemma 6.1.
Let . Consider the inequality
[TABLE]
where is some constant that depends only on . The above inequality holds for all if and only if satisfy the following inequalities:
1. ;
2. ;
3. , and ;
4. , which, combined with condition (1), is equivalent to .
Proof.
We shall use Theorem 4 to show this result. The above inequality is easily seen to be of the form in Theorem 3, where , , , , , and . An exhaustive search of all possible subspaces that are in -product form where is not hard to do. For simplicity, we refer to the axes in $\mathbb{R}^3$ as . Thus, the subspace is simply the subspace spanned by .
1. Equality (1) follows directly from equation (15) of Theorem 4;
2. Inequality (2) follows from equation (14) of Theorem 4, by choosing ;
3. Inequality (3) follows from equation (15) of Theorem 4, by choosing and ;
4. Inequality (4) is obtained from equation (15) of Theorem 4, by a careful choice of , i.e. the subspace spanned by and .
∎
Claim 6.1.
For satisfying the conditions in Lemma 6.1, the following inequality holds:
[TABLE]
where
[TABLE]
Proof.
For , the optimal constant is given by
[TABLE]
Calculating the above supremum for arbitrary is cumbersome, so we assume . The supremum simplifies to
[TABLE]
For a fixed and fixed , it is clear that the optimal choice of maximizes the above expression. Thus, we assume that and obtain
[TABLE]
Let , and noting that , we obtain
[TABLE]
For a fixed , the maximum of the above expression is attained when
[TABLE]
Substituting this value of ,
[TABLE]
Differentiating with respect to , the supremum is seen to be attained when . Substituting this, we get
[TABLE]
This leads to the entropy inequality
[TABLE]
Notice that the mutual information term accounts for the dependency between and . ∎
7 Conclusion
In this paper, we established a new inequality that unifies the BLI and the EPI by establishing subadditivity of certain entropic functionals. There are several interesting research directions that are worth pursuing. We did not address the questions of extremizability and uniqueness of extremizers in this work. One reason for this is that Theorem 3 is established by taking the limit as and go to 0. When and are strictly bounded away from 0, the extremizer of under a covariance constraint exists and is a unique Gaussian distribution. However, these existence and uniqueness properties need not hold in the limit as . In general, such a proof strategy is a powerful tool for proving inequalities, but may not always succeed in identifying necessary and sufficient conditions for equality. For this reason, alternate proof strategies that rely on heat flow based arguments [17, 13, 16] or optimal transport methods [21, 36] are worth exploring as well. After a preprint of this work appeared online, an optimal transport-based proof of Theorem 3 was discovered in Courtade [37]. Shortly thereafter, Courtade and Liu [38] proved Theorem 3 as a limiting case of the forward-reverse Brascamp-Lieb inequality [20] and gave an alternate proof of Theorem 4.
Finally, although our results generalize the BLI and the EPI to vector random variables with more general independence properties, these independence properties are still quite restrictive. For instance, the inequalities we derived do not encompass the monotonicity of entropy power family of results [39, 40, 41]. It would be interesting to generalize our inequalities to include the above family as well. Another (related) direction to pursue would be to establish similar entropy inequalities under weaker independence conditions.
Acknowledgements
The research of VA was supported by the NSF grants CNS-1527846, CCF-1618145, CCF-1901004, CIF-2007965, the NSF Science & Technology Center grant CCF-0939370 (Science of Information), and the William and Flora Hewlett Foundation supported Center for Long Term Cybersecurity at Berkeley. VJ acknowledges support from NSF grants CCF-1841190 and CCF-1907786, and is grateful to the Department of Information Engineering at CUHK for hosting him in July 2018, when a part of this work was done. The research of CN was supported by GRF grants 14303714, 14231916, 14206518 and a discretionary fund of the Vice Chancellor of CUHK.
Appendix A Supporting results for Theorem 3
Lemma A.1.
Let and be random variables taking values in and respectively, such that the following hold: (a) has a strictly positive density on ; (b) ; and (c) . Then .
Proof.
For any , , and , we have that
[TABLE]
where we used the assumed strict positivity of the density of to write the above equations. Fix . For any , we have
[TABLE]
Integrating both sides of the above equality with respect to , we obtain
[TABLE]
Since was chosen arbitrarily, we conclude that . A similar argument shows that . Using equation (58), we conclude that . ∎
Lemma A.2.
Let and be $\mathbb{R}^n$-valued random variables and let be such that . If , then .
Proof.
Using the independence of and , we have that for any ,
[TABLE]
However, using the independence , we also have
[TABLE]
Since has no zeros (’s being independent standard Gaussian random variables), we conclude that
[TABLE]
that is, . ∎
Lemma A.3.
Let be an $\mathbb{R}^n$-valued random variable with density and be independent of . Suppose that for some nonnegative continuous function , satisfying and . (Note that, for instance, satisfies the conditions.) Then the following equality holds:
[TABLE]
Proof.
Our proof relies on the following (lower semi-continuity) result from Posner [42, Theorem 1]: If are Borel probability distributions on a Polish space with and , then
[TABLE]
where denotes the relative entropy of the distribution with respect to the distribution . Picking an arbitrary sequence that converges to 0, let . Using characteristic functions (or otherwise), it is easy to check that converges to in distribution. Let denote the distribution of and denote the distribution of . Let be the distribution corresponding to the density function . Note that
[TABLE]
Therefore, we have
[TABLE]
Here follows from Posner’s result and follows from assumption (2). Hence
[TABLE]
On the other hand, non-negativity of mutual information, , yields . Taking the on both sides of this inequality, we conclude
[TABLE]
Inequalities (68) and (69) yield the equality
[TABLE]
and concludes the proof. ∎
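Assuming the lemma asserts that the differential entropy of the Gaussian-smoothed variable converges to that of the original variable as the noise variance tends to 0, the statement can be checked in closed form when the original variable is itself Gaussian. The sketch below uses a hypothetical scalar Gaussian with variance 2; it illustrates the limit and the monotone approach from above, not the general argument via lower semicontinuity of relative entropy.

```python
import math

def entropy_of_smoothed_gaussian(variance, delta):
    """h(X + sqrt(delta) Z) in nats for independent X ~ N(0, variance), Z ~ N(0, 1)."""
    return 0.5 * math.log(2 * math.pi * math.e * (variance + delta))

variance = 2.0   # hypothetical variance of X
target = entropy_of_smoothed_gaussian(variance, 0.0)   # h(X)

# Entropy gaps h(X + sqrt(delta) Z) - h(X) along delta = 10^-1, ..., 10^-7.
gaps = [entropy_of_smoothed_gaussian(variance, 10.0 ** -k) - target
        for k in range(1, 8)]
```

The gaps are positive, in line with the bound obtained from non-negativity of mutual information, and shrink to zero with the noise variance.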
Appendix B Supporting results for Claim 5.2
Lemma B.1.
Given subspaces for , with , let denote the corresponding -product subspace of , where . Let , with an matrix of rank for as above. Then there is some such that for all choices of where at least one is strictly positive, for all unit vectors (i.e. ), there exists some unit vector for some such that .
Proof.
Suppose to the contrary that we can find a sequence of unit vectors and subspaces that violates the condition, i.e. such that
[TABLE]
By going to a subsequence if necessary we can assume that there exist some choices of for with at least one of the being strictly positive, such that we have for all . Since the space of all -dimensional subspaces of is compact in the usual topology (i.e. as the corresponding Grassmannian), by going to a further subsequence if necessary we can assume that each converges to a limit as , where . Since the set of unit vectors in is compact, by going to a further subsequence if necessary we can assume that converges to a unit vector as . Since we have for all (where ), we must have (where ). We thus have for all unit vectors for all . But this is a contradiction, because is itself in the linear span of such vectors. ∎
Corollary B.1.
There is a positive constant depending only on (and in particular not depending on the or the choices of the bases for ) such that, for all , we have
[TABLE]
where is a positive semidefinite matrix all of whose eigenvalues are either [math] or and where the subspace corresponding to is .
Proof.
Let be as in Lemma B.1. For each unit vector there exists some and a unit vector such that . Since is an orthonormal basis for , this means that there is some such that , where we define and we have used . Recalling that , it follows that
[TABLE]
Since this holds for all unit vectors , this proves the corollary. ∎
- [1] C. E. Shannon. A mathematical theory of communication, I and II. Bell System Technical Journal, 27:379–423, 1948.
- [2] N. Blachman. The convolution inequality for entropy powers. IEEE Transactions on Information Theory, 11(2):267–271, 1965.
- [3] M. Costa. A new entropy power inequality. IEEE Transactions on Information Theory, 31(6):751–760, 1985.
- [4] R. Zamir and M. Feder. A generalization of the entropy power inequality with applications. IEEE Transactions on Information Theory, 39(5):1723–1728, 1993.
- [5] T. A. Courtade. A strong entropy power inequality. IEEE Transactions on Information Theory, 64(4):2173–2192, 2018.
- [6] Y. Polyanskiy and Y. Wu. Strong data-processing inequalities for channels and Bayesian networks. In Convexity and Concentration, pages 211–249. Springer, 2017.
- [7] Y. Polyanskiy and Y. Wu. Wasserstein continuity of entropy and outer bounds for interference channels. IEEE Transactions on Information Theory, 62(7):3992–4002, 2016.
- [8] A. J. Stam. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Information and Control, 2(2):101–112, 1959.
