Comment on the Equality Condition for the I-MMSE Proof of Entropy Power Inequality
Alex Dytso, Ronit Bustin, H. Vincent Poor, Shlomo Shamai (Shitz)

Abstract
The paper establishes the equality condition in the I-MMSE proof of the entropy power inequality (EPI). This is done by establishing an exact expression for the deficit between the two sides of the EPI. Interestingly, a necessary condition for the equality is established by making a connection to the famous Cauchy functional equation.
A. Dytso and H. V. Poor are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA. R. Bustin and S. Shamai (Shitz) are with the Department of Electrical Engineering, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel. The work of A. Dytso and H. V. Poor was supported by the National Science Foundation under Grants CCF-1420575 and CNS-1456793. The work of S. Shamai and R. Bustin was supported by the European Union's Horizon 2020 Research and Innovation Programme, Grant 694630. The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the funding agencies.
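The classical entropy power inequality (EPI) formulated by Shannon in [1] states that for two independent continuous random vectors $\mathbf{X}$ and $\mathbf{Y}$ in $\mathbb{R}^n$,
$$e^{\frac{2}{n}h(\mathbf{X}+\mathbf{Y})} \geq e^{\frac{2}{n}h(\mathbf{X})} + e^{\frac{2}{n}h(\mathbf{Y})}, \tag{1}$$
where equality in (1) is attained if and only if $\mathbf{X}$ and $\mathbf{Y}$ are Gaussian with proportional covariances (i.e., $\mathbf{K}_{\mathbf{X}} = c\,\mathbf{K}_{\mathbf{Y}}$ for some scalar $c > 0$). Via the transformation
$$\mathbf{X} \mapsto \frac{\mathbf{X}}{\sqrt{\lambda}}, \qquad \mathbf{Y} \mapsto \frac{\mathbf{Y}}{\sqrt{1-\lambda}}, \qquad \lambda = \frac{e^{\frac{2}{n}h(\mathbf{X})}}{e^{\frac{2}{n}h(\mathbf{X})} + e^{\frac{2}{n}h(\mathbf{Y})}},$$
the EPI can be shown to be equivalent to Lieb’s inequality [2]
$$h\big(\sqrt{\lambda}\,\mathbf{X} + \sqrt{1-\lambda}\,\mathbf{Y}\big) \geq \lambda\, h(\mathbf{X}) + (1-\lambda)\, h(\mathbf{Y}), \qquad \lambda \in (0,1), \tag{2}$$
where equality in (2) holds if and only if $\mathbf{X}$ and $\mathbf{Y}$ are Gaussian with identical covariances (i.e., $\mathbf{K}_{\mathbf{X}} = \mathbf{K}_{\mathbf{Y}}$).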
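The difference between the two equality conditions can be seen numerically for Gaussian inputs. The following is a sketch, assuming independent Gaussians with $h(\mathcal{N}(\mathbf{m},\mathbf{K})) = \frac{1}{2}\log\big((2\pi e)^n \det \mathbf{K}\big)$ in nats: for such inputs the gap in (1) reduces to Minkowski's determinant inequality and vanishes for proportional covariances, whereas the deficit in (2) vanishes only for identical covariances. The function names and test matrices are hypothetical choices for illustration.

```python
import numpy as np

def gaussian_entropy(K):
    """Differential entropy (nats) of N(m, K): 0.5 * log((2 pi e)^n det K)."""
    n = K.shape[0]
    sign, logdet = np.linalg.slogdet(K)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)

def lieb_deficit(KX, KY, lam):
    """h(sqrt(lam) X + sqrt(1-lam) Y) - lam h(X) - (1-lam) h(Y) for independent Gaussians."""
    K_mix = lam * KX + (1.0 - lam) * KY  # covariance of sqrt(lam) X + sqrt(1-lam) Y
    return (gaussian_entropy(K_mix)
            - lam * gaussian_entropy(KX) - (1.0 - lam) * gaussian_entropy(KY))

def epi_gap(KX, KY):
    """exp(2h(X+Y)/n) - exp(2h(X)/n) - exp(2h(Y)/n) for independent Gaussians."""
    n = KX.shape[0]
    def power(K):
        return np.exp(2.0 * gaussian_entropy(K) / n)
    return power(KX + KY) - power(KX) - power(KY)

# Hypothetical test covariances.
KA = np.array([[2.0, 0.5], [0.5, 1.0]])
KB = 3.0 * KA                # proportional to KA
KC = np.diag([1.0, 4.0])     # not proportional to KA

print(epi_gap(KA, KB), epi_gap(KA, KC))                      # ~0, then > 0
print(lieb_deficit(KA, KA, 0.3), lieb_deficit(KA, KB, 0.3))  # ~0, then > 0
```

Proportional but unequal covariances (KA and KB) close the gap in (1) but not the deficit in (2), which is why the equality case of (2) pins down identical covariances.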
Existing proofs of the EPI follow three distinct approaches: integration over a path of continuous Gaussian perturbations [3, 4, 5, 6, 7]; the sharp version of Young’s inequality together with properties of the Rényi entropy [2, 5, 4]; and a change of variables via the Knothe map [8, 9]. For a comprehensive list of references and a detailed history of the EPI, the reader is referred to [10] and the references therein.
As was recently pointed out in [8], not all available proofs settle the equality case in (1) and (2). In particular, among the proofs via Gaussian perturbations, the case of equality has not yet been established for the proof given in [6], which relies on the so-called I-MMSE relationship [11].
The goal of this paper is to close this gap by establishing the equality case in the proof of the EPI via the I-MMSE relationship. Equality is established by determining an exact expression for the deficit in (2) and showing that the deficit is zero if and only if $\mathbf{X}$ and $\mathbf{Y}$ are Gaussian with identical covariances.
Notation
Deterministic scalar/vector quantities are denoted by lowercase normal/bold letters, matrices by bold uppercase letters, random variables by uppercase letters, and random vectors by bold uppercase letters. For a random vector $\mathbf{X}$, we denote the covariance matrix by $\mathbf{K}_{\mathbf{X}}$; the determinant, transpose, and trace of a matrix $\mathbf{A}$ are denoted by $\det(\mathbf{A})$, $\mathbf{A}^T$, and $\mathrm{tr}(\mathbf{A})$, respectively. The Euclidean norm of a vector $\mathbf{x}$ is denoted by $\|\mathbf{x}\|$, the gradient operator by $\nabla$, and the expectation operator by $\mathbb{E}[\cdot]$.
Assumptions
Throughout the paper, we assume that all random vectors have covariance matrices with bounded entries and continuous, positive, and differentiable probability densities. Therefore, quantities such as entropies, expectations, and conditional expectations are well defined throughout the paper. The interested reader is referred to [8] and [12], where it is shown that these assumptions are sufficient to prove the EPI in (1).
I Preliminary Results
In this section, we present the mathematical tools needed in the rest of the paper.
The first result of this section quantifies the penalty in mean square error incurred by using a sub-optimal estimator in place of the minimum mean square error (MMSE) estimator, i.e., the conditional expectation.
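Lemma 1.
Let $f: \mathbb{R}^n \to \mathbb{R}^n$ be measurable and such that $\mathbb{E}\big[\|f(\mathbf{Y})\|^2\big] < \infty$. Then,
$$\mathbb{E}\big[\|\mathbf{X} - f(\mathbf{Y})\|^2\big] = \mathbb{E}\big[\|\mathbf{X} - \mathbb{E}[\mathbf{X}|\mathbf{Y}]\|^2\big] + \mathbb{E}\big[\|\mathbb{E}[\mathbf{X}|\mathbf{Y}] - f(\mathbf{Y})\|^2\big]. \tag{3}$$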
Proof:
[TABLE]
where (4a) and (4b) are due to the orthogonality principle. This concludes the proof. ∎
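The decomposition in Lemma 1 can be checked exactly on a toy example. The sketch below is scalar and, only to keep the arithmetic exact, uses a randomly generated joint probability mass function on a finite grid (which does not satisfy the density assumptions of this paper); the grid, the estimator $f(y) = 0.7y + 0.2$, and all names are hypothetical. The cross term vanishes by the orthogonality principle, so the error of $f$ splits into the MMSE plus the penalty.

```python
import numpy as np

# Hypothetical joint pmf p[i, j] = P(X = xs[i], Y = ys[j]) on a small grid.
rng = np.random.default_rng(0)
xs = np.array([-1.0, 0.0, 2.0])
ys = np.array([-0.5, 0.5, 1.5, 3.0])
p = rng.random((xs.size, ys.size))
p /= p.sum()

p_y = p.sum(axis=0)                                # marginal pmf of Y
cond_mean = (xs[:, None] * p).sum(axis=0) / p_y    # E[X | Y = ys[j]]

def mse(estimate):
    """E[(X - estimate(Y))^2], with the estimator given as values at ys[j]."""
    return float(((xs[:, None] - estimate[None, :]) ** 2 * p).sum())

f = 0.7 * ys + 0.2                                 # an arbitrary sub-optimal estimator
lhs = mse(f)                                       # E[(X - f(Y))^2]
mmse = mse(cond_mean)                              # E[(X - E[X|Y])^2]
penalty = float(((cond_mean - f) ** 2 * p_y).sum())  # E[(E[X|Y] - f(Y))^2]
print(lhs, mmse + penalty)
```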
The necessary condition for equality in (2) will be shown to be a consequence of the remarkably simple, yet powerful, Cauchy functional equation.
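Lemma 2.
(Cauchy Functional Equation.) Over the space of measurable¹ functions from $\mathbb{R}^n$ to $\mathbb{R}^n$, the equation
$$f(\mathbf{x} + \mathbf{y}) = f(\mathbf{x}) + f(\mathbf{y}), \qquad \forall\, \mathbf{x}, \mathbf{y} \in \mathbb{R}^n, \tag{5}$$
is satisfied if and only if $f(\mathbf{x}) = \mathbf{A}\mathbf{x}$ (i.e., $f$ is linear) for some matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$.
¹In this paper, measurable is meant with respect to the Lebesgue measure.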
Proof:
See [13, Chapter 2]. ∎
The Cauchy functional equation has a rich history, and the interested reader is referred to [13] for a comprehensive summary. It is used next to establish the following property of the conditional expectation.
Lemma 3.
Let be independent random vectors with support and be measurable functions such that . Then, for any
[TABLE]
for some and .
Proof:
The proof of the sufficient condition follows trivially. To show the necessary condition observe that (6a) is equivalent to identifying a set of functions for which
[TABLE]
Since and are fully supported, we have that
[TABLE]
for all . In particular,
[TABLE]
Therefore, by adding the two equations in (9) and using (8), we arrive at
[TABLE]
Next, by letting , it is not difficult to see that (10) corresponds to the Cauchy functional equation in Lemma 2. As a result, we conclude that is an affine function
[TABLE]
for some and . Finally, (11) and (9) imply that functions and are also affine with the same slope. This concludes the proof. ∎
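For readers who want the reduction to Cauchy's equation spelled out, the standard argument for the Pexider equation is sketched below. It assumes that (7) reads $h(\mathbf{x}+\mathbf{y}) = f(\mathbf{x}) + g(\mathbf{y})$ for all $\mathbf{x},\mathbf{y} \in \mathbb{R}^n$ with measurable $f$, $g$, $h$; these three function names are our own labels and may differ from the notation of the proof above.

```latex
% Assumed form of (7): h(x + y) = f(x) + g(y) for all x, y in R^n,
% with f, g, h measurable (the labels f, g, h are ours).
\begin{align*}
  \mathbf{y}=\mathbf{0}:&\quad h(\mathbf{x}) = f(\mathbf{x}) + g(\mathbf{0}), \\
  \mathbf{x}=\mathbf{0}:&\quad h(\mathbf{y}) = f(\mathbf{0}) + g(\mathbf{y}), \\
  \text{add; use } h(\mathbf{0}) = f(\mathbf{0}) + g(\mathbf{0}):&\quad
     h(\mathbf{x}) + h(\mathbf{y}) = h(\mathbf{x}+\mathbf{y}) + h(\mathbf{0}), \\
  \varphi(\mathbf{x}) := h(\mathbf{x}) - h(\mathbf{0}):&\quad
     \varphi(\mathbf{x}+\mathbf{y}) = \varphi(\mathbf{x}) + \varphi(\mathbf{y}), \\
  \text{Lemma 2}:&\quad \varphi(\mathbf{x}) = \mathbf{A}\mathbf{x}
     \;\Rightarrow\; h(\mathbf{x}) = \mathbf{A}\mathbf{x} + h(\mathbf{0}), \\
  \Rightarrow&\quad f(\mathbf{x}) = \mathbf{A}\mathbf{x} + f(\mathbf{0}), \quad
     g(\mathbf{y}) = \mathbf{A}\mathbf{y} + g(\mathbf{0}).
\end{align*}
```

Both $f$ and $g$ come out affine with the same matrix $\mathbf{A}$, which is the "same slope" conclusion of Lemma 3.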
The following well-known property of the conditional expectation will be useful in manipulating some of our expressions.
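Lemma 4.
(Smoothing or Towering Property of the Conditional Expectation.) Let $\mathcal{G}_1$ and $\mathcal{G}_2$ be sigma algebras such that $\mathcal{G}_1 \subseteq \mathcal{G}_2$. Then, for any integrable random vector $\mathbf{X}$,
$$\mathbb{E}\big[\mathbb{E}[\mathbf{X}|\mathcal{G}_2]\,\big|\,\mathcal{G}_1\big] = \mathbb{E}[\mathbf{X}|\mathcal{G}_1]. \tag{12}$$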
Proof:
See [14, Chapter 10]. ∎
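A small exact check of the towering property (12) is sketched below, with $\sigma(Y_1) \subseteq \sigma(Y_1, Y_2)$ playing the roles of the nested sigma algebras. It uses a hypothetical random joint pmf of $(X, Y_1, Y_2)$ over finite alphabets; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical joint pmf p[i, j, k] = P(X = xs[i], Y1 = j, Y2 = k).
rng = np.random.default_rng(1)
xs = np.array([-2.0, 0.5, 1.0, 3.0])
p = rng.random((xs.size, 3, 5))
p /= p.sum()

# Inner conditioning on the finer sigma algebra sigma(Y1, Y2).
p_y1y2 = p.sum(axis=0)
e_x_given_y1y2 = np.einsum("i,ijk->jk", xs, p) / p_y1y2

# Outer conditioning on the coarser sigma(Y1): average the inner estimate over Y2 given Y1.
p_y1 = p_y1y2.sum(axis=1)
lhs = (e_x_given_y1y2 * p_y1y2).sum(axis=1) / p_y1   # E[ E[X | Y1, Y2] | Y1 = j ]

# Direct conditioning on sigma(Y1).
rhs = np.einsum("i,ijk->j", xs, p) / p_y1            # E[X | Y1 = j]
print(np.allclose(lhs, rhs))
```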
The key step of the proof is to establish that equality holds if and only if a certain conditional expectation is a linear (or affine) function. The following result shows that, for an additive Gaussian noise channel, the conditional expectation of the input given the output is an affine function if and only if the input is Gaussian.
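Lemma 5.
Let $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$, where $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\mathbf{X}$ are independent. Then,
$$\mathbb{E}[\mathbf{X}|\mathbf{Y}] = \mathbf{A}\mathbf{Y} + \mathbf{b} \ \text{ for some } \mathbf{A} \text{ and } \mathbf{b} \iff \mathbf{X} \text{ is Gaussian}. \tag{13}$$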
Proof:
Lemma 5 is a well-known result from estimation theory, and the details of the proof can be found in [15]. Here, we only give a sketch of the proof of the necessary condition. To show the necessary condition, one must show that, in the MMSE sense, linear estimators are optimal only for Gaussian random vectors. For simplicity, we only look at the zero mean (i.e., and ) and the scalar case. Let , and let be an estimator that we claim to be optimal with . Then, by the orthogonality principle, we have that
[TABLE]
where (14a) follows from the independence of $X$ and $Z$, and (14b) follows from the derivative expression for the characteristic function, which holds since, by assumption, $X$ has a finite second moment.
Therefore, from (14c) we obtain a differential equation of the form
[TABLE]
The only nontrivial solution to the differential equation in (15) is the characteristic function of a Gaussian distribution, $\phi_X(t) = e^{-\frac{\sigma^2 t^2}{2}}$ for some $\sigma^2 > 0$. This concludes the proof. ∎
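Lemma 5 can be illustrated numerically in the scalar channel $Y = X + Z$ with $Z \sim \mathcal{N}(0,1)$. The sketch below, which assumes unit-variance Gaussian noise and uses plain Riemann sums on a grid, compares the MMSE with the error of the best affine estimator, $\mathrm{var}(X)/(\mathrm{var}(X)+1)$. For a Gaussian input the two coincide; for an equiprobable input on $\{-1,+1\}$ (chosen only because its conditional mean is simple, even though it has no density) the MMSE is strictly smaller, so its conditional expectation cannot be affine. The function names and grid sizes are hypothetical.

```python
import numpy as np

def phi(u):
    """Standard normal density."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def mmse_and_lmmse(x_pts, x_wts, y_grid):
    """Scalar channel Y = X + Z, Z ~ N(0, 1) independent of X.

    X takes the values x_pts with probabilities x_wts (a fine grid stands in
    for a density). Returns (MMSE, error of the best affine estimator).
    """
    dy = y_grid[1] - y_grid[0]
    joint = phi(y_grid[:, None] - x_pts[None, :]) * x_wts[None, :]  # p(x) f(y|x)
    f_y = joint.sum(axis=1)                                          # output density
    cond_mean = (joint * x_pts[None, :]).sum(axis=1) / f_y           # E[X | Y = y]
    ex2 = np.sum(x_wts * x_pts**2)
    var_x = ex2 - np.sum(x_wts * x_pts) ** 2
    mmse = ex2 - np.sum(cond_mean**2 * f_y) * dy                     # E[X^2] - E[E[X|Y]^2]
    lmmse = var_x / (var_x + 1.0)                                    # affine estimator, unit noise
    return mmse, lmmse

y_grid = np.linspace(-15.0, 15.0, 3001)

# Gaussian input N(0, 1), discretized on a fine grid.
xg = np.linspace(-8.0, 8.0, 801)
wg = phi(xg) / phi(xg).sum()
mmse_g, lmmse_g = mmse_and_lmmse(xg, wg, y_grid)

# Non-Gaussian input with the same variance: X uniform on {-1, +1}.
xb = np.array([-1.0, 1.0])
wb = np.array([0.5, 0.5])
mmse_b, lmmse_b = mmse_and_lmmse(xb, wb, y_grid)

print(mmse_g, lmmse_g)   # nearly equal: the affine estimator is optimal
print(mmse_b, lmmse_b)   # MMSE strictly below the affine error
```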
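We define the score function and the Fisher information of a continuous random vector $\mathbf{X}$ with probability density function $f_{\mathbf{X}}$ as
$$\boldsymbol{\rho}_{\mathbf{X}}(\mathbf{x}) = \nabla \log f_{\mathbf{X}}(\mathbf{x}), \qquad J(\mathbf{X}) = \mathbb{E}\big[\|\boldsymbol{\rho}_{\mathbf{X}}(\mathbf{X})\|^2\big]. \tag{16}$$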
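As a numerical illustration of the score function and Fisher information in the scalar case (a sketch; the helper name, the grid, and $\sigma^2 = 2$ are hypothetical choices), the score can be approximated by finite differences of $\log f$ and the Fisher information by a Riemann sum. For $X \sim \mathcal{N}(0,\sigma^2)$ the score is $-x/\sigma^2$, so the result should be close to $1/\sigma^2$.

```python
import numpy as np

def fisher_information(logpdf, grid):
    """Scalar Fisher information E[rho(X)^2], with rho = d/dx log f.

    The score is approximated by central differences of log f on a uniform
    grid, and the expectation by a Riemann sum.
    """
    dx = grid[1] - grid[0]
    log_f = logpdf(grid)
    score = np.gradient(log_f, dx)
    return float(np.sum(score**2 * np.exp(log_f)) * dx)

sigma2 = 2.0

def gauss_logpdf(x):
    """Log-density of N(0, sigma2)."""
    return -0.5 * x**2 / sigma2 - 0.5 * np.log(2.0 * np.pi * sigma2)

grid = np.linspace(-30.0, 30.0, 60001)
J = fisher_information(gauss_logpdf, grid)
print(J, 1.0 / sigma2)
```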
For the Gaussian noise channel, the score function of the output can be related to the conditional expectation.
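Lemma 6.
Let $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$, where $\mathbf{X}$ and $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ are independent. Then,
$$\boldsymbol{\rho}_{\mathbf{Y}}(\mathbf{y}) = \mathbb{E}[\mathbf{X}|\mathbf{Y} = \mathbf{y}] - \mathbf{y}. \tag{17}$$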
Proof:
See [11, Eq.(56)]. ∎
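The score-conditional-mean relation can be checked on an example where everything is explicit. The sketch assumes the unit-variance channel $Y = X + Z$ with $Z \sim \mathcal{N}(0,1)$, in which case the relation reads $\rho_Y(y) = \mathbb{E}[X|Y=y] - y$, and takes $X$ equiprobable on $\{-1,+1\}$ so that $\mathbb{E}[X|Y=y] = \tanh(y)$. The score is computed independently by differentiating $\log f_Y$ numerically; the grid is a hypothetical choice.

```python
import numpy as np

def phi(u):
    """Standard normal density."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

y = np.linspace(-6.0, 6.0, 1201)
dy = y[1] - y[0]

# Output density of Y = X + Z, X uniform on {-1, +1}, Z ~ N(0, 1) independent.
f_y = 0.5 * (phi(y - 1.0) + phi(y + 1.0))

# Score of Y computed numerically from its density (central differences).
score_numeric = np.gradient(np.log(f_y), dy)

# For this input E[X | Y = y] = tanh(y), so the relation predicts tanh(y) - y.
score_relation = np.tanh(y) - y

interior = slice(1, -1)   # np.gradient is only first-order accurate at the endpoints
max_err = np.max(np.abs(score_numeric[interior] - score_relation[interior]))
print(max_err)
```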
We conclude this section by giving an expression for the differential entropy in terms of an integral of the MMSE which is a consequence of the I-MMSE relationship in [11].
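Lemma 7.
For every continuous random vector $\mathbf{X} \in \mathbb{R}^n$,
$$h(\mathbf{X}) = \frac{n}{2}\log(2\pi e) - \frac{1}{2}\int_0^\infty \left( \frac{n}{1+\gamma} - \mathrm{mmse}(\mathbf{X}, \gamma) \right) d\gamma, \tag{18}$$
where $\mathrm{mmse}(\mathbf{X}, \gamma) = \mathbb{E}\big[\|\mathbf{X} - \mathbb{E}[\mathbf{X}|\sqrt{\gamma}\,\mathbf{X} + \mathbf{Z}]\|^2\big]$, $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, and $\mathbf{Z}$ is independent of $\mathbf{X}$.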
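The entropy-MMSE integral can be verified numerically for a scalar Gaussian input $X \sim \mathcal{N}(0,\sigma^2)$, whose MMSE at signal-to-noise ratio $\gamma$ is $\sigma^2/(1+\gamma\sigma^2)$. The sketch below assumes the Verdú–Guo form $h(X) = \frac{1}{2}\log(2\pi e) - \frac{1}{2}\int_0^\infty \big(\frac{1}{1+\gamma} - \mathrm{mmse}(X,\gamma)\big)\,d\gamma$ (nats, $n = 1$), substitutes $\gamma = e^u$ to handle the infinite range, and compares the result with $\frac{1}{2}\log(2\pi e\sigma^2)$. The function name, $\sigma^2 = 3$, and the integration limits are hypothetical choices.

```python
import numpy as np

def entropy_via_mmse(mmse, n=1, u_min=-40.0, u_max=40.0, num=200001):
    """h(X) = n/2 log(2 pi e) - 1/2 int_0^inf (n/(1+g) - mmse(g)) dg, in nats.

    The integral over g in (0, inf) uses the substitution g = exp(u), dg = g du,
    and a Riemann sum in u (the integrand decays exponentially in |u|).
    """
    u = np.linspace(u_min, u_max, num)
    g = np.exp(u)
    integrand = (n / (1.0 + g) - mmse(g)) * g
    integral = np.sum(integrand) * (u[1] - u[0])
    return 0.5 * n * np.log(2.0 * np.pi * np.e) - 0.5 * integral

sigma2 = 3.0

def gauss_mmse(g):
    """MMSE of X ~ N(0, sigma2) from sqrt(g) X + Z, Z ~ N(0, 1)."""
    return sigma2 / (1.0 + g * sigma2)

h_num = entropy_via_mmse(gauss_mmse)
h_exact = 0.5 * np.log(2.0 * np.pi * np.e * sigma2)
print(h_num, h_exact)
```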
II Main Results
The first main result of this section, which is a refinement of the bound in [6], establishes an exact expression for the deficit in (2).
Theorem 1.
For any independent continuous random vectors $\mathbf{X}, \mathbf{Y}$ and any $\lambda \in (0,1)$,
[TABLE]
Proof:
According to Lemma 7, the entropy of a random vector , defined in (19c), is given by
[TABLE]
where the last step follows by taking
[TABLE]
in Lemma 1.
Next, by the mutual independence of , the first expectation in (20) reduces to
[TABLE]
Finally, by combining (20) and (22) we arrive at
[TABLE]
where is defined in (19b). This concludes the proof. ∎
Clearly, the deficit term defined in (19b) is non-negative, which yields Lieb’s inequality in (2).
II-A On the Equality Condition
The following result establishes necessary and sufficient conditions for equality in (2) and gives several equivalent statements of the equality condition.
Theorem 2.
The following statements are equivalent:
[TABLE]
Moreover, equality in (23) holds if and only if $\mathbf{X}$ and $\mathbf{Y}$ are Gaussian with identical covariances.
Proof:
From (19b) it is immediate that if and only if
[TABLE]
This shows equivalence between (23a) and (23b).
The equivalence between (23b) and (23c) follows from the towering property in Lemma 4
[TABLE]
The equivalences among (23c), (23d), (23e), and (23f) are shown in Appendix A.
Next, we show that (23) is satisfied if and only if $\mathbf{X}$ and $\mathbf{Y}$ are Gaussian random vectors with identical covariances. Sufficiency follows by noting that if $\mathbf{X}$ and $\mathbf{Y}$ are Gaussian, then the conditional-mean estimators are linear and are given by
[TABLE]
Therefore, the equality condition in (23b) holds only if
[TABLE]
A short calculation shows that the equality in (26) holds only if $\mathbf{K}_{\mathbf{X}} = \mathbf{K}_{\mathbf{Y}}$.
The necessary condition follows by letting and , in which case the condition in (23c) reduces to
[TABLE]
According to Lemma 3 equality in (27) implies that and (or and ) are affine functions with the same slope. In other words, the conditional expectations are given by
[TABLE]
Moreover, by Lemma 5, the linearity of the conditional expectations implies that $\mathbf{X}$ and $\mathbf{Y}$ are Gaussian random vectors such that
[TABLE]
From (28), it is evident that $\mathbf{X}$ and $\mathbf{Y}$ have identical covariances. This concludes the proof. ∎
III Concluding Remark
In this work, we have established the equality condition for the I-MMSE proof of the EPI. Theorem 2 also establishes an equality condition for the following Fisher information inequality
[TABLE]
This should come as no surprise, since the inequality in (29) is the key step in the proof of the EPI via de Bruijn’s identity [3, 4, 5]. The equality condition in (29) was previously established in [16], by showing that a certain differential equation is satisfied only by Gaussian densities, and in [17], by checking the equality case of the variance drop inequality. In contrast, our proof relies on the Cauchy functional equation and the towering property of the conditional expectation.
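For intuition about the equality case of a Fisher information inequality, consider independent Gaussian inputs. The sketch below assumes the $\lambda$-weighted form $J(\sqrt{\lambda}\mathbf{X} + \sqrt{1-\lambda}\mathbf{Y}) \le \lambda J(\mathbf{X}) + (1-\lambda) J(\mathbf{Y})$ with the scalar (trace) Fisher information, for which $\mathcal{N}(\mathbf{m},\mathbf{K})$ has $J = \mathrm{tr}(\mathbf{K}^{-1})$; this form is our assumption and may differ from the way (29) is written. The gap is then a consequence of the strict convexity of $\mathbf{K} \mapsto \mathrm{tr}(\mathbf{K}^{-1})$ and vanishes exactly when the covariances coincide. The helper names and matrices are hypothetical.

```python
import numpy as np

def gaussian_fisher(K):
    """Trace Fisher information of N(m, K): E||K^{-1}(X - m)||^2 = tr(K^{-1})."""
    return float(np.trace(np.linalg.inv(K)))

def fii_gap(KX, KY, lam):
    """lam J(X) + (1-lam) J(Y) - J(sqrt(lam) X + sqrt(1-lam) Y), independent Gaussians."""
    K_mix = lam * KX + (1.0 - lam) * KY   # covariance of the weighted sum
    return (lam * gaussian_fisher(KX) + (1.0 - lam) * gaussian_fisher(KY)
            - gaussian_fisher(K_mix))

KA = np.array([[2.0, 0.5], [0.5, 1.0]])
KC = np.diag([1.0, 4.0])
print(fii_gap(KA, KA, 0.3), fii_gap(KA, KC, 0.3))   # ~0, then > 0
```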
It is also interesting to observe that the expression for the deficit
[TABLE]
is closely related to the mismatched representation of the relative entropy [18]
[TABLE]
where , and , and where
[TABLE]
is the relative Fisher information distance.
Moreover, in view of the exact characterization of the deficit in (19b), it would be interesting to explore connections to the work in [19]. The authors of [19] provided lower bounds on the deficit for log-concave densities in terms of the Wasserstein distance, using a lower bound from [8].
Appendix A Proof of equivalence between (23d), (23e), and (23f)
The equivalence between (23c) and (23d) follows from Lemma 6.
Next, we show the equivalence between (23d) and (23e). Using Lemma 6, the score function can be written as
[TABLE]
Next, by the towering property of the conditional expectation in Lemma 4
[TABLE]
where the last step follows from Lemma 6. Putting (35) and (36) together, we arrive at
[TABLE]
which establishes equivalence between (23d) and (23e). The expression in (37) is sometimes called a convolution identity of the score function [17].
The equivalence between (23d) and (23f) follows from (37) and the definition of the Fisher information in (16). This concludes the proof.
References
- [1] C. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379–423, 623–656, Jul., Oct. 1948.
- [2] E. H. Lieb, “Proof of an entropy conjecture of Wehrl,” in Inequalities. Springer, 2002, pp. 359–365.
- [3] A. Stam, “Some inequalities satisfied by the quantities of information of Fisher and Shannon,” Information and Control, vol. 2, no. 2, pp. 101–112, 1959.
- [4] N. Blachman, “The convolution inequality for entropy powers,” IEEE Trans. Inf. Theory, vol. 11, no. 2, pp. 267–271, 1965.
- [5] A. Dembo, T. Cover, and J. Thomas, “Information theoretic inequalities,” IEEE Trans. Inf. Theory, vol. 37, no. 6, pp. 1501–1518, Nov. 1991.
- [6] S. Verdú and D. Guo, “A simple proof of the entropy-power inequality,” IEEE Trans. Inf. Theory, vol. 52, no. 5, pp. 2165–2166, 2006.
- [7] D. Guo, S. Shamai, and S. Verdú, “Proof of entropy power inequalities via MMSE,” in Proc. IEEE Int. Symp. Inf. Theory. IEEE, 2006, pp. 1011–1015.
- [8] O. Rioul, “Yet another proof of the entropy power inequality,” arXiv preprint arXiv:1606.05969, 2016.
