Unifying the Brascamp-Lieb Inequality and the Entropy Power Inequality

Venkat Anantharam; Varun Jog; Chandra Nair

arXiv:1901.06619·cs.IT·September 28, 2021

TL;DR

This paper introduces a new family of entropy functionals that unify and generalize the entropy power inequality and the Brascamp-Lieb inequality, revealing Gaussian extremality and intermediate inequalities.

Contribution

It defines subadditive entropy functionals, proves Gaussian extremality, and derives a generalized inequality that encompasses both EPI and BLI.

Findings

01. Gaussians are extremal for the new entropy functionals.

02. A new inequality generalizes both EPI and BLI.

03. Intermediate inequalities are obtained based on component independence.

Abstract

The entropy power inequality (EPI) and the Brascamp-Lieb inequality (BLI) are fundamental inequalities concerning the differential entropies of linear transformations of random vectors. The EPI provides lower bounds for the differential entropy of linear transformations of random vectors with independent components. The BLI, on the other hand, provides upper bounds on the differential entropy of a random vector in terms of the differential entropies of some of its linear transformations. In this paper, we define a family of entropy functionals, which we show are subadditive. We then establish that Gaussians are extremal for these functionals by mimicking the idea in Geng and Nair (2014). As a consequence, we obtain a new entropy inequality that generalizes both the BLI and EPI. By considering a variety of independence relations among the components of the random vectors appearing in…

Equations

$$e^{\frac{2h(X+Y)}{n}}\geq e^{\frac{2h(X)}{n}}+e^{\frac{2h(Y)}{n}}. \tag{1}$$

$$h(\sqrt{\lambda}X+\sqrt{1-\lambda}Y)\geq\lambda h(X)+(1-\lambda)h(Y). \tag{2}$$

$$h(AX)\geq\sum_{j=1}^{n}\alpha_{j}^{2}h(X_{j}),$$

$${\cal F}(f_{1},\dots,f_{m}):=\frac{\int_{E}\prod_{j=1}^{m}f_{j}^{c_{j}}(A_{j}x)\,dx}{\prod_{j=1}^{m}\left(\int_{E_{j}}f_{j}(x_{j})\,dx_{j}\right)^{c_{j}}}.$$

$$f(X):=h(X)-\sum_{j=1}^{m}c_{j}h(A_{j}X).$$

$$f(X):=\sum_{i=1}^{k}d_{i}h(X_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}X), \tag{5}$$

$$F(X_{1},\dots,X_{n}):=\inf_{\{Y\mid Y_{i}\stackrel{d}{=}X_{i},\,i\in[n]\}}\sum_{i=1}^{n}d_{i}h(Y_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}Y). \tag{6}$$

$$\left\{f~\Big|~\int_{\mathbb{R}^{n}}f(x)\log(1+f(x))\,dx<\infty\right\}. \tag{7}$$

$$h(X):=-\int_{\mathbb{R}^{n}}f_{X}(x)\log f_{X}(x)\,dx. \tag{8}$$

$$\mathbf{A}:=(n,\{n_{j}\}_{j\in[m]},\{A_{j}\}_{j\in[m]}),$$

$$\sum_{i=1}^{k}d_{i}h(X_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}X). \tag{9}$$

$$\limsup_{\delta\to 0_{+}}\sum_{i=1}^{k}d_{i}h(\tilde{X}_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}\tilde{X}+\delta Z_{j}),$$

$$\sum_{i=1}^{k}d_{i}\dim(V_{i})=\sum_{j=1}^{m}c_{j}\dim(A_{j}V).$$

$$M(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d}):=\sup_{X\in{\cal P}(\mathbf{r})}\sum_{i=1}^{k}d_{i}h(X_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}X).$$

$$M_{g}:=\sup_{Z\in{\cal P}_{g}(\mathbf{r})}\sum_{i=1}^{k}d_{i}h(Z_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}Z). \tag{12}$$

$$\sum_{i=1}^{k}d_{i}h(X_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}X)\leq M_{g}.$$

$$M:=\sup_{X\in{\cal P}(\mathbf{r})}\sum_{i=1}^{k}d_{i}h(X_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}X). \tag{13}$$

$$\sum_{i=1}^{k}d_{i}\dim(V_{i})\leq\sum_{j=1}^{m}c_{j}\dim(A_{j}V)\quad\text{for all }\mathbf{r}\text{-product form }V,\text{ and}$$

$$\sum_{i=1}^{k}d_{i}r_{i}=\sum_{j=1}^{m}c_{j}n_{j}.$$

$$M_{g}=\sup_{\Sigma_{1},\Sigma_{2}\succeq 0}\lambda\log\det(\Sigma_{1})+(1-\lambda)\log\det\Sigma_{2}-\log\det(\lambda\Sigma_{1}+(1-\lambda)\Sigma_{2}).$$

$$h(X)\leq\sum_{j=1}^{m}c_{j}h(A_{j}X)+M_{g},$$

$$\alpha_{j}^{2}:=\sum_{i=1}^{k}a_{ij}^{2}.$$

$$F(\Lambda)=\log|A\Lambda A^{T}|-\sum_{j=1}^{n}\alpha_{j}^{2}\log\lambda_{j}.$$

$$|BB^{T}|=\sum_{1\leq i_{1}<\dots<i_{k}\leq n}|B_{i_{1}i_{2}\dots i_{k}}||B_{i_{1}i_{2}\dots i_{k}}^{T}|,$$

$$\sum_{1\leq i_{1}<\dots<i_{k}\leq n}\left(\prod_{j=1}^{k}\lambda_{i_{j}}\right)|A_{i_{1}i_{2}\dots i_{k}}|^{2}.$$

$$\log|A\Lambda A^{T}|\geq\sum_{1\leq i_{1}<\dots<i_{k}\leq n}|A_{i_{1}i_{2}\dots i_{k}}|^{2}\log\left(\prod_{j=1}^{k}\lambda_{i_{j}}\right).$$

$$\sum_{1=i_{1}<\dots<i_{k}\leq n}|A_{i_{1}i_{2}\dots i_{k}}|^{2}=1-|A_{2,3,\dots,n}A_{2,3,\dots,n}^{T}|=1-|I_{k}-A_{1}A_{1}^{T}|=\alpha_{1}^{2}.$$

$$S(X)=\sup_{U}s(X\mid U)=\sup_{U}\sum_{u\in{\cal U}}s(X\mid U=u)\,p_{U}(u),$$

$$S(X_{1},X_{2})\leq S(X_{1})+S(X_{2}).$$

$$s(X_{1},Y_{1}):=\lambda h(X_{1})+(1-\lambda)h(Y_{1})-h(\sqrt{\lambda}X_{1}+\sqrt{1-\lambda}Y_{1}),$$

$$s(X_{1:2},Y_{1:2}):=\lambda h(X_{1:2})+(1-\lambda)h(Y_{1:2})-h(\sqrt{\lambda}X_{1:2}+\sqrt{1-\lambda}Y_{1:2}),$$

Full text

Unifying the Brascamp-Lieb Inequality and the Entropy Power Inequality

Venkat Anantharam, Department of Electrical Engineering and Computer Sciences, UC Berkeley.

Varun Jog, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge.

Chandra Nair, Department of Information Engineering, CUHK.

Abstract

The entropy power inequality (EPI) and the Brascamp-Lieb inequality (BLI) are fundamental inequalities concerning the differential entropies of linear transformations of random vectors. The EPI provides lower bounds for the differential entropy of linear transformations of random vectors with independent components. The BLI, on the other hand, provides upper bounds on the differential entropy of a random vector in terms of the differential entropies of some of its linear transformations. In this paper, we define a family of entropy functionals, which we show are subadditive. We then establish that Gaussians are extremal for these functionals by mimicking the idea in Geng and Nair (2014). As a consequence, we obtain a new entropy inequality that generalizes both the BLI and EPI. By considering a variety of independence relations among the components of the random vectors appearing in these functionals, we also obtain families of inequalities that lie between the EPI and the BLI. [Footnote 1: A version of this paper appeared in the Proceedings of the IEEE International Symposium on Information Theory, 2019.]

1 Introduction

Information inequalities provide some of the most powerful mathematical tools in an information theorist’s toolbox and are therefore a vital part of information theory. Inequalities such as the non-negativity of mutual information and the data processing inequality are so fundamental to information theory that they are inseparable from information-theoretic notation. These basic inequalities, combined with Fano’s inequality, are powerful enough to yield the converse of Shannon’s channel coding theorem. For harder problems in network information theory, it is necessary to develop more nuanced information inequalities. Not surprisingly, it is often the case that discovering new inequalities leads to breakthroughs in network information theory problems. Some examples of information inequalities that spurred such breakthroughs include the entropy power inequality [1, 2], numerous strengthened forms of the entropy power inequality [3, 4, 5], strong data processing inequalities [6], and inequalities that established certain continuity properties of entropy [7].

On a related note, “single-letter characterizations” of a capacity region or outer bounds to a capacity region in network information theory are induced by subadditive functionals that reduce the characterization of the region to one governed by a single channel use. In this paper, we identify a new functional that is subadditive and for which Gaussian distributions are extremal. Consequently, we obtain a new class of information inequalities that unifies two fundamental inequalities: the entropy power inequality (EPI) and the Brascamp-Lieb inequality (BLI). In what follows, we provide a brief introduction to the EPI and the BLI and state our main results.

As notational conventions in what follows, $:=$ and $=:$ denote equality by definition depending on whether the expression being defined is on the left or on the right respectively, while, for an integer $n>0$, $[n]$ denotes $\{1,\ldots,n\}$ and $I_{n\times n}$ denotes the $n\times n$ identity matrix. We use the notation $|A|$ for the determinant of a square matrix $A$. We use the term “entropy” as synonymous with “differential entropy” in this document. All vectors are assumed to be column vectors, and we will adopt the convention that if $X$ is an $\mathbb{R}^{k}$-valued vector and $Y$ is an $\mathbb{R}^{l}$-valued vector, then $(X,Y)$ denotes the $\mathbb{R}^{k+l}$-valued vector that would normally be written as $(X^{T},Y^{T})^{T}$. Given a random vector $(Z_{1},\ldots,Z_{n})$, we use the notation $Z_{a:b}$ to denote the random vector $(Z_{a},Z_{a+1},\dots,Z_{b})$, where $1\leq a\leq b\leq n$. The notation $X\rightarrow U\rightarrow Y$ for random vectors $X$, $U$, and $Y$ indicates that $X$ and $Y$ are conditionally independent given $U$.

Entropy power inequality:

The EPI states that for any independent $\mathbb{R}^{n}$-valued random variables $X$ and $Y$, the following inequality holds:

$$e^{\frac{2h(X+Y)}{n}}\geq e^{\frac{2h(X)}{n}}+e^{\frac{2h(Y)}{n}}. \tag{1}$$

Here, $h(\cdot)$ refers to the differential entropy function and all the differential entropies in equation (1) are assumed to exist. Equality holds if and only if $X$ and $Y$ are Gaussian random variables with proportional covariance matrices. The EPI was proposed by Shannon [1] and was first proved by Stam [8]. This proof was later simplified by Blachman [2]. A variety of simple and ingenious proofs have been discovered since; see Rioul [9] for a discussion.
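As a numerical sanity check that is not part of the original paper, the sketch below evaluates the entropy powers $e^{2h/n}$ of independent Gaussian vectors, for which $h$ has a closed form. The helper names (`gaussian_entropy`, `entropy_power`, `epi_gap`, `random_covariance`) are hypothetical and introduced only for illustration. The sketch assumes zero-mean Gaussians, so it illustrates the EPI and its equality case but proves nothing for general distributions.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of N(0, cov) on R^n."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

def entropy_power(cov):
    """exp(2 h / n) for a Gaussian with covariance cov."""
    n = np.atleast_2d(cov).shape[0]
    return np.exp(2.0 * gaussian_entropy(cov) / n)

def epi_gap(cov_x, cov_y):
    """Left side minus right side of the EPI for independent Gaussians X and Y."""
    return entropy_power(cov_x + cov_y) - entropy_power(cov_x) - entropy_power(cov_y)

def random_covariance(n, rng):
    """Hypothetical helper: a random positive definite n x n matrix."""
    g = rng.standard_normal((n, n))
    return g @ g.T + 0.1 * np.eye(n)

rng = np.random.default_rng(0)
cov_x = random_covariance(3, rng)
cov_y = random_covariance(3, rng)
print("generic covariances, gap:", epi_gap(cov_x, cov_y))             # nonnegative
print("proportional covariances, gap:", epi_gap(cov_x, 2.5 * cov_x))  # zero up to rounding
```

For Gaussians the gap reduces to Minkowski's determinant inequality, which is why it vanishes exactly when the two covariance matrices are proportional.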

The EPI has an equivalent formulation due to Lieb [10] which is that for $\lambda\in(0,1)$ we have:

$$h(\sqrt{\lambda}X+\sqrt{1-\lambda}Y)\geq\lambda h(X)+(1-\lambda)h(Y). \tag{2}$$

Equality holds in the above inequality if and only if $X$ and $Y$ are Gaussian random variables with identical covariance matrices. Note that $\sqrt{\lambda}X+\sqrt{1-\lambda}Y$ may be interpreted as a linear transformation of an $\mathbb{R}^{2n}$-valued random variable $Z:=(X,Y)$ with some independence constraints on the components of $Z$, namely $X\perp\!\!\!\perp Y$. Another result along such lines is Zamir and Feder’s EPI [4] for linear transformations of random vectors with independent components. This EPI has an equivalent formulation, discovered in [9, 11], that is analogous to the one in equation (2): For an $\mathbb{R}^{n}$-valued random vector $X:=(X_{1},\dots,X_{n})$ with independent scalar components and any $k\times n$ matrix $A$ satisfying $AA^{T}=I_{k}$, we have

$$h(AX)\geq\sum_{j=1}^{n}\alpha_{j}^{2}h(X_{j}),$$

where $\alpha_{j}^{2}$ is the squared-norm of the $j$ -th column of $A$ ; i.e., $\alpha_{j}^{2}:=\sum_{i=1}^{k}a_{ij}^{2}$ .

Brascamp-Lieb inequality:

The BLI [12] is actually a family of functional inequalities that lies, in some sense, at the intersection of information and functional inequalities. Many well-known and commonly used inequalities are special cases of the BLI, including Hölder’s inequality, the Loomis-Whitney inequality, the Prékopa-Leindler inequality, and sharp forms of Young’s convolution inequalities [13]. In Gardner’s extensive survey [14], the author describes relationships between popular functional and information inequalities using a pyramid-like sketch, where inequalities at the top imply those below. The BLI and its reverse lie at the very apex of this inequality pyramid. A simple statement of the BLI is as follows:

Theorem 1 (Functional form of the BLI).

For $j\in[m]$ , let $E$ , $E_{j}$ be Euclidean spaces, $A_{j}:E\to E_{j}$ be linear maps, $c_{j}$ be positive real numbers, and $f_{j}$ be nonnegative integrable functions on $E_{j}$ . Define the function ${\cal F}$ via

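$${\cal F}(f_{1},\dots,f_{m}):=\frac{\int_{E}\prod_{j=1}^{m}f_{j}^{c_{j}}(A_{j}x)\,dx}{\prod_{j=1}^{m}\left(\int_{E_{j}}f_{j}(x_{j})\,dx_{j}\right)^{c_{j}}}.$$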

Then the supremum of ${\cal F}$ over all nonnegative and integrable $f_{j}$ is equal to the supremum of ${\cal F}$ when $f_{j}$ are centered Gaussian functions; i.e., for all $j\in[m]$ , we have $f_{j}(x_{j})\propto e^{-x_{j}^{T}B_{j}x_{j}}$ for some positive semidefinite $B_{j}$ .

Surprisingly, a direct connection exists between the functional form of the BLI and a generalized subadditivity result for entropy. This link was first discovered in Carlen, Lieb, and Loss [15], and has since led to newer proofs and generalizations of the original BLI [16, 17, 18, 19, 20]. The information-theoretic form of the BLI is the following:

Theorem 2 (Information-theoretic form of the BLI, Theorem 2.1 in Carlen and Cordero-Erausquin [16]).

For $i\in[m]$, let $E$, $E_{i}$, $A_{i}$, and $c_{i}$ be as in Theorem 1. For a random variable $X$ on $E$ with a well-defined differential entropy (see Definition 1) and satisfying $\mathbb{E}\|X\|_{2}^{2}<\infty$, define $f(X)$ as

$$f(X):=h(X)-\sum_{j=1}^{m}c_{j}h(A_{j}X).$$

Then the supremum of $f$ over all such random variables $X$ is equal to the supremum of $f$ over all Gaussian random variables.

This information-theoretic form is completely equivalent to the functional form: For a fixed choice of the $A_{j}$ and the $c_{j}$, the suprema in both problems have a direct relationship and the cases of equality are also in correspondence [16, Theorem 2.1]. A defining feature of the BLI is that it reduces an infinite-dimensional optimization problem to a finite-dimensional optimization problem over a set of positive definite matrices. When the supremum in Theorem 2 is finite, random variables that achieve the supremum are called extremizers, and Gaussian random variables that achieve the supremum are called Gaussian extremizers. [Footnote 2: In [13] a Gaussian extremizer is defined as a distribution that extremizes among the class of Gaussian distributions, but it turns out that this definition is identical to the one used here.]

The existence of extremizers or Gaussian extremizers and the finiteness of the supremum are not addressed by Theorem 2, as stated above. However, these questions are well understood in the literature [21, 16, 13].

Our contributions:

The classical EPI and the EPI of Zamir and Feder are valid only under certain independence assumptions. To be precise, for an $\mathbb{R}^{2n}$-valued random vector $Z$, the EPI requires independence of $Z_{1:n}$ and $Z_{n+1:2n}$ and considers the sum of these two vectors, whereas Zamir and Feder’s EPI requires all the components to be independent and considers linear transformations of $Z$. It is natural to consider more general “mixed” independence constraints, for instance, independence of $Z_{1:k_{1}},Z_{k_{1}+1:k_{2}},\dots,Z_{k_{r}+1:n}$ for suitable choices of $k_{i}$, and establish lower bounds on $h(AZ)$ for a matrix $A$. This is indeed a special case of the setting considered in our work.

Consider an $\mathbb{R}^{n}$-valued random vector $X:=(X_{1},\dots,X_{k})$, where $k\leq n$ and $X_{i}$ are mutually independent $\mathbb{R}^{r_{i}}$-valued random variables. Note that $\sum_{i=1}^{k}r_{i}=n$. We consider the following function:

$$f(X):=\sum_{i=1}^{k}d_{i}h(X_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}X), \tag{5}$$

for positive constants $d_{i}$ and $c_{j}$ where $i\in[k]$ and $j\in[m]$ for some $m\geq 1$, and surjective linear transformations $A_{j}$ from $\mathbb{R}^{n}$ to $\mathbb{R}^{n_{j}}$. Just as in Theorem 2, our main result in Theorem 3 states that the supremum of $f(\cdot)$ over all random variables $X$ satisfying the stated independence constraints is the same as the supremum evaluated over Gaussian random variables. In Theorem 4, we identify necessary and sufficient conditions on $n$, $k$, $m$ and the $r_{i}$, $d_{i}$, $c_{j}$, $n_{j}$ and $A_{j}$, such that this supremum is finite. We show that the EPI, BLI, and Zamir and Feder’s EPI easily follow from Theorem 3. Theorem 3 also provides a generalization of Zamir and Feder’s result for certain kinds of dependent random variables.

Our main technical contribution is identifying new entropic functionals and proving that they satisfy a certain subadditivity property. The work of Geng and one of the authors [22] highlighted the critical role played by subadditivity in information inequalities. That work demonstrated how subadditivity of information-theoretic functionals, which is established using the chain rule and data processing relations, can be used to determine the capacity of the Gaussian vector broadcast channel. Once subadditivity is ascertained, a technique from functional analysis called the “doubling trick” may be used to establish Gaussian optimality. The doubling trick, attributed to Ball [23], appeared in Lieb [24] to prove that Gaussian kernels have Gaussian optimizers, and in Carlen [25] to show Gaussian optimality in the log-Sobolev inequality. Subadditivity followed by the doubling trick has been used to prove numerous information inequalities in recent years [26, 27, 28, 29, 30, 5].

Related work:

The EPI may be thought of as a limiting special case of the BLI. Gardner [14] showed that the EPI follows from the sharp form of Young’s inequality, which in turn is a special case of the BLI. This proof strategy is further clarified using a more geometric approach by Cordero-Erausquin and Ledoux [18]. The authors of [18] establish the EPI directly from Theorem 2 by carefully choosing the $A_{j}$ and $c_{j}$ as a function of a parameter $\epsilon$ that tends to 0 and yields the EPI in the limit. While these are intriguing connections, they do not suggest concrete approaches for developing information inequalities for random vectors under more general independence constraints.

Various information-theoretic analogues of hypercontractive inequalities and reverse Brascamp-Lieb inequalities in finite alphabet spaces have been studied in [31, 19, 32]. A closely related work is that of Liu et al. [20], where a novel functional inequality called the forward-reverse Brascamp-Lieb inequality is formulated, and it is shown that there exists an analogous information-theoretic version of this inequality. Most relevant to us is the forward-reverse Brascamp-Lieb inequality with linear maps that was introduced in Liu et al. [20]. Define a function $F$ of the marginal densities of an $\mathbb{R}^{n}$-valued random variable $X$:

$$F(X_{1},\dots,X_{n}):=\inf_{\{Y\mid Y_{i}\stackrel{d}{=}X_{i},\,i\in[n]\}}\sum_{i=1}^{n}d_{i}h(Y_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}Y). \tag{6}$$

Here, by $Y_{i}\stackrel{{\scriptstyle d}}{{=}}X_{i}$ we mean that the distribution of $Y_{i}$ is identical to that of $X_{i}$ . Theorem 8 in [20] states that the supremum of $F$ is obtained when each $X_{i}$ is a centered Gaussian random variable, in which case the infimum in the definition in equation (6) is attained when the optimal coupling $Y$ is a jointly Gaussian random vector. The expressions in equations (5) and (6) look very similar. The main difference is that equation (6) has an infimum over all possible couplings $Y$ , whereas our definition in equation (5) enforces the unique coupling where the components $Y_{i}$ are mutually independent.

Structure of the paper:

In Section 2, we introduce some preliminaries and set up the notation to be used in the rest of the paper. In Section 3 we state our main result in Theorem 3 and show that the EPI, BLI, and Zamir and Feder’s EPI may be proved as special cases of this result. In Section 4, we prove Theorem 3. In Section 5, we establish necessary and sufficient conditions for the supremum of $f$ in the expression in equation (5) to be finite. In Section 6, we provide a concrete example that demonstrates the utility of Theorem 3 in obtaining EPI-like results for dependent random variables. Finally, in Section 7 we conclude the paper and describe some open problems.

2 Preliminaries and notation

Definition 1.

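For $n>0$, let $X$ be an $\mathbb{R}^{n}$-valued random variable with density $f_{X}$ that lies in the convex set of probability densities

$$\left\{f~\Big|~\int_{\mathbb{R}^{n}}f(x)\log(1+f(x))\,dx<\infty\right\}. \tag{7}$$

Then we define the entropy of $X$ as

$$h(X):=-\int_{\mathbb{R}^{n}}f_{X}(x)\log f_{X}(x)\,dx. \tag{8}$$

The entropy of a $0$-dimensional random variable is defined to be 0.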

Remark 2.1.

The integral in equation (7) is well-defined since the integrand is non-negative. The condition in equation (7) implies that the differential entropy integral in equation (8) is well-defined and lower-bounded away from $-\infty$ . Also note that the condition in equation (7) is inherited by marginalization, i.e. if $f$ satisfies the condition and $g$ is a (multidimensional) marginal of $f$ , then $g$ also satisfies the condition.

Definition 2 (BL datum).

For an integer $m>0$, define an $m$-transformation as a triple

$$\mathbf{A}:=(n,\{n_{j}\}_{j\in[m]},\{A_{j}\}_{j\in[m]}),$$

where for each $j\in[m]$, $A_{j}:\mathbb{R}^{n}\to\mathbb{R}^{n_{j}}$ is a surjective linear transformation, and $n_{j}\geq 0$. An $m$-exponent is defined as an $m$-tuple $\mathbf{c}=\{c_{j}\}_{j\in[m]}$, such that $c_{j}\geq 0$ for $j\in[m]$. A Brascamp-Lieb datum (BL datum) is defined as a pair $(\mathbf{A},\mathbf{c})$ where $\mathbf{A}$ is an $m$-transformation and $\mathbf{c}$ is an $m$-exponent, for an integer $m>0$.

Definition 3 (EPI datum).

For an integer $k>0$ , define a $k$ -partition of $n$ as $\mathbf{r}=\{r_{i}\}_{i\in[k]},$ such that $r_{i}>0$ are integers and $\sum_{i\in[k]}r_{i}=n$ . Let $\mathbf{d}=\{d_{i}\}_{i\in[k]}$ such that $d_{i}\geq 0$ for all $i$ be a $k$ -exponent. An EPI datum is a pair $(\mathbf{r},\mathbf{d})$ where $\mathbf{r}$ is a $k$ -partition and $\mathbf{d}$ is a $k$ -exponent, for an integer $k>0$ .

Definition 4 (BL-EPI datum).

For an integer $n>0$ , a BL-EPI datum is defined as $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ where $(\mathbf{A},\mathbf{c})$ is a BL datum for an integer $m>0$ , and $(\mathbf{r},\mathbf{d})$ is an EPI datum for an integer $k>0$ .

Definition 5.

Let $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ be a BL-EPI datum where $\mathbf{r}$ is a $k$-partition of $n$. Define ${\cal P}(\mathbf{r})$ to be the set of all $\mathbb{R}^{n}$-valued random vectors $X:=(X_{1},X_{2},\dots,X_{k})$ such that:

1. For $i\in[k]$, the random vectors $X_{i}$ take values in $\mathbb{R}^{r_{i}}$ and their densities satisfy the condition in equation (7);

2. $X_{1},X_{2},\dots,X_{k}$ are independent;

3. $\mathbb{E}X=0$ and $\mathbb{E}\left\lVert X\right\rVert_{2}^{2}<\infty$.

Since entropy expressions are not affected by adding constants, the 0-mean assumption in Definition 5 may be made without loss of generality. Define ${\cal P}_{g}(\mathbf{r})\subseteq{\cal P}(\mathbf{r})$ as the set of random variables $X$ that satisfy the properties above, while, in addition, each $X_{i}$ , $i\in[k]$ is Gaussian.

Remark 2.2.

Whether an $\mathbb{R}^{n}$-valued random vector $X$ lies in ${\cal P}(\mathbf{r})$ or not is a property of its distribution. The finite variance assumption on random variables in ${\cal P}(\mathbf{r})$ implies that the entropies $h(X_{i})$ for $i\in[k]$ and $h(A_{j}X)$ for $j\in[m]$ are bounded away from $\infty$. However, with only the variance assumption in place, it may happen that some of these entropies equal $-\infty$, which happens, for instance, when $X$ is a constant. In this paper, we shall be dealing with differences of entropies of the form

$$\sum_{i=1}^{k}d_{i}h(X_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}X). \tag{9}$$

The condition in equation (7) together with the finite variance assumption has the effect of ensuring that the absolute values of the differential entropies are finite, which ensures that the above difference is well-defined for $X\in{\cal P}(\mathbf{r})$. This is a technical assumption made for ease of presentation. In cases where the expression in equation (9) is not well-defined, we may redefine it to equal the limit

$$\limsup_{\delta\to 0_{+}}\sum_{i=1}^{k}d_{i}h(\tilde{X}_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}\tilde{X}+\delta Z_{j}),$$

where $\tilde{X}:=X+\sqrt{\delta}W$ for a standard normal $W$ independent of $X$ and the $Z_{j}$ are standard normal random vectors independent of $(X,W)$ . With this modification, our results continue to hold for random variables that satisfy all the conditions in Definition 5 except the condition in equation (7).

The following two concepts are required for Theorem 4.

Definition 6.

Let $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ be a BL-EPI datum. Define a subspace $V\subseteq\mathbb{R}^{n}$ as being of $\mathbf{r}$-product form if $V$ may be written as $V=V_{1}\times V_{2}\times\dots\times V_{k}$ for subspaces $V_{i}\subseteq\mathbb{R}^{r_{i}}$, for $i\in[k]$.

Definition 7.

Let $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ be a BL-EPI datum. An $\mathbf{r}$-product form subspace $V\subseteq\mathbb{R}^{n}$ is called a critical subspace if

$$\sum_{i=1}^{k}d_{i}\dim(V_{i})=\sum_{j=1}^{m}c_{j}\dim(A_{j}V).$$

Definition 8.

For a BL-EPI datum $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ , define $M(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ as

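$$M(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d}):=\sup_{X\in{\cal P}(\mathbf{r})}\sum_{i=1}^{k}d_{i}h(X_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}X).$$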

Similarly, define $M_{g}(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ as the above supremum taken over Gaussian inputs $X\in{\cal P}_{g}(\mathbf{r})$ . When the BL-EPI datum is fixed, we shall omit the $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ argument and use the simplified notation $M$ and $M_{g}$ .
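To make the Gaussian objective concrete, the following sketch evaluates $\sum_{i}d_{i}h(Z_{i})-\sum_{j}c_{j}h(A_{j}Z)$ for independent zero-mean Gaussian blocks $Z_{i}$. It is an illustration under stated assumptions, not code from the paper: the function names are hypothetical, and the partition $\mathbf{r}$ is read off from the sizes of the supplied covariance blocks. The example uses the Brascamp-Lieb case $k=1$ with two coordinate projections and $c_{1}=c_{2}=1$, where the objective equals $-I(Z_{1};Z_{2})$.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of N(0, cov)."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

def block_diagonal(blocks):
    """Covariance of the stacked vector when the blocks are independent."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    pos = 0
    for b in blocks:
        size = b.shape[0]
        out[pos:pos + size, pos:pos + size] = b
        pos += size
    return out

def gaussian_objective(A_list, c, d, covs):
    """sum_i d_i h(Z_i) - sum_j c_j h(A_j Z) for independent Z_i ~ N(0, covs[i])."""
    full_cov = block_diagonal(covs)
    plus = sum(d_i * gaussian_entropy(K) for d_i, K in zip(d, covs))
    minus = sum(c_j * gaussian_entropy(A @ full_cov @ A.T) for c_j, A in zip(c, A_list))
    return plus - minus

# k = 1, r_1 = n = 2, d_1 = 1; A_1 and A_2 project onto the two coordinates.
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
A_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
value = gaussian_objective(A_list, c=[1.0, 1.0], d=[1.0], covs=[cov])
print(value, 0.5 * np.log(1 - rho ** 2))  # both equal minus the Gaussian mutual information
```

Maximizing `gaussian_objective` over the covariance blocks would approximate $M_{g}$ for a given datum; here the maximum is 0, attained at $\rho=0$.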

3 Main results

We are now in a position to state our main result:

Theorem 3 (Unified EPI and BLI).

Let $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ be a BL-EPI datum. Recall the definition

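$$M_{g}:=\sup_{Z\in{\cal P}_{g}(\mathbf{r})}\sum_{i=1}^{k}d_{i}h(Z_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}Z). \tag{12}$$

Then for any $X\in{\cal P}(\mathbf{r})$, the following inequality holds:

$$\sum_{i=1}^{k}d_{i}h(X_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}X)\leq M_{g}.$$

Recall that in Definition 8 we introduced the quantity (with a simplified notation):

$$M:=\sup_{X\in{\cal P}(\mathbf{r})}\sum_{i=1}^{k}d_{i}h(X_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}X). \tag{13}$$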

Naturally, we have $M\geq M_{g}$ . Thus, if $M_{g}$ is $+\infty$ , then so is $M$ . If $M_{g}<\infty$ , then the above result implies $M\leq M_{g}$ , and thus $M=M_{g}$ . An equivalent way of stating the above result is asserting $M=M_{g}$ . Theorem 3 does not address the following points, which are worth investigating:

1. Finiteness: When is $M_{g}$ (and therefore $M$) finite?

2. Extremizability and Gaussian extremizability: Assuming $M$ is finite, when do extremizers exist for the supremum in equation (13), and when do Gaussian extremizers exist for the supremum in equation (12)? In particular, does extremizability imply Gaussian extremizability? (Clearly, the reverse implication is true because of Theorem 3.)

3. Uniqueness of extremizers: Assuming extremizers exist, are they unique in some appropriate sense?

The answers to all these questions will depend on the BL-EPI datum $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ . In this paper, we resolve the first question by identifying necessary and sufficient conditions on $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ that ensure finiteness of $M$ and $M_{g}$ . We do not address the latter two questions here. We show the following result:

Theorem 4.

For a BL-EPI datum $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ , we have $M(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})<\infty$ if and only if the following conditions are satisfied:

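$$\sum_{i=1}^{k}d_{i}\dim(V_{i})\leq\sum_{j=1}^{m}c_{j}\dim(A_{j}V)\quad\text{for all }\mathbf{r}\text{-product form }V,\text{ and}$$

$$\sum_{i=1}^{k}d_{i}r_{i}=\sum_{j=1}^{m}c_{j}n_{j}.$$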

As we show below, Theorem 3 readily implies the EPI, BLI, and Zamir and Feder’s EPI. For this reason, we choose to interpret the inequality in Theorem 3 as a unified version of the Brascamp-Lieb inequality and the entropy power inequality.
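The two conditions of Theorem 4 (the scaling identity $\sum_{i}d_{i}r_{i}=\sum_{j}c_{j}n_{j}$ and the dimension inequality over $\mathbf{r}$-product form subspaces) can be probed numerically. The sketch below is only a spot check under stated assumptions: the function names are hypothetical, and since the dimension condition quantifies over all product-form subspaces, testing random subspaces can refute finiteness but cannot certify it. The example uses the datum of Lieb's form of the EPI.

```python
import numpy as np

def rank(mat, tol=1e-9):
    """Numerical rank; an empty matrix has rank 0."""
    mat = np.asarray(mat, dtype=float)
    if mat.size == 0:
        return 0
    return int(np.linalg.matrix_rank(mat, tol=tol))

def scaling_condition(r, d, c, n_out, tol=1e-12):
    """Check sum_i d_i r_i == sum_j c_j n_j."""
    return bool(abs(np.dot(d, r) - np.dot(c, n_out)) <= tol)

def dimension_condition(A_list, c, d, bases, tol=1e-9):
    """Check sum_i d_i dim(V_i) <= sum_j c_j dim(A_j V) for V = V_1 x ... x V_k,
    where the columns of bases[i] span V_i inside R^{r_i}."""
    n = sum(B.shape[0] for B in bases)
    cols = sum(B.shape[1] for B in bases)
    V = np.zeros((n, cols))  # columns span the product subspace V
    row = col = 0
    for B in bases:
        V[row:row + B.shape[0], col:col + B.shape[1]] = B
        row += B.shape[0]
        col += B.shape[1]
    lhs = sum(d_i * rank(B) for d_i, B in zip(d, bases))
    rhs = sum(c_j * rank(A @ V) for c_j, A in zip(c, A_list))
    return bool(lhs <= rhs + tol)

# Datum of Lieb's form of the EPI: k = 2, r = (dim, dim), d = (lam, 1 - lam), m = 1, c_1 = 1.
dim, lam = 2, 0.3
A1 = np.hstack([np.sqrt(lam) * np.eye(dim), np.sqrt(1 - lam) * np.eye(dim)])
r, d, c, n_out = [dim, dim], [lam, 1 - lam], [1.0], [dim]
rng = np.random.default_rng(1)
ok = all(
    dimension_condition([A1], c, d,
                        [rng.standard_normal((dim, v1)), rng.standard_normal((dim, v2))])
    for v1 in range(dim + 1) for v2 in range(dim + 1)
)
print(scaling_condition(r, d, c, n_out), ok)
```

Replacing `d` by `[1.0, 1.0]` violates the scaling identity, which is consistent with the supremum being infinite for that choice of exponents.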

Entropy Power Inequality:

We will prove the EPI in Lieb’s form (2) using Theorem 3. Let $X$ and $Y$ be independent $\mathbb{R}^{d}$-valued random variables with zero means and bounded variances, and let $\lambda\in(0,1)$. The expression $\lambda h(X)+(1-\lambda)h(Y)-h(\sqrt{\lambda}X+\sqrt{1-\lambda}Y)$ corresponds to $n=2d$, $k=2$, $r_{1}=r_{2}=d$, $d_{1}=\lambda$, $d_{2}=1-\lambda$, $m=1$, $c_{1}=1$, and $A_{1}=[\sqrt{\lambda}I_{d},\sqrt{1-\lambda}I_{d}]$. By Theorem 3, it is enough to prove $M_{g}=0$, which we do by explicit calculation. Consider Gaussian random variables $Z_{1}\sim{\cal N}(0,\Sigma_{1})$ and $Z_{2}\sim{\cal N}(0,\Sigma_{2})$. Plugging in the entropies of these Gaussian random variables and simplifying, we see that we need to evaluate the supremum

$$M_{g}=\sup_{\Sigma_{1},\Sigma_{2}\succeq 0}\lambda\log\det(\Sigma_{1})+(1-\lambda)\log\det\Sigma_{2}-\log\det(\lambda\Sigma_{1}+(1-\lambda)\Sigma_{2}).$$

This supremum is seen to be 0 via the concavity of the $\log\det$ function.
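The concavity step can be checked numerically. The sketch below (hypothetical helper names, random positive definite matrices, not from the paper) evaluates $\lambda\log\det\Sigma_{1}+(1-\lambda)\log\det\Sigma_{2}-\log\det(\lambda\Sigma_{1}+(1-\lambda)\Sigma_{2})$ over many random draws and confirms that it never exceeds 0, with the value 0 attained when $\Sigma_{1}=\Sigma_{2}$.

```python
import numpy as np

def logdet(mat):
    """log det of a positive definite matrix."""
    sign, value = np.linalg.slogdet(mat)
    if sign <= 0:
        raise ValueError("matrix must be positive definite")
    return value

def lieb_gaussian_gap(cov1, cov2, lam):
    """lam*logdet(cov1) + (1-lam)*logdet(cov2) - logdet(lam*cov1 + (1-lam)*cov2)."""
    mix = lam * cov1 + (1 - lam) * cov2
    return lam * logdet(cov1) + (1 - lam) * logdet(cov2) - logdet(mix)

rng = np.random.default_rng(2)

def random_covariance(dim):
    """Hypothetical helper: a random positive definite dim x dim matrix."""
    g = rng.standard_normal((dim, dim))
    return g @ g.T + 0.1 * np.eye(dim)

gaps = [
    lieb_gaussian_gap(random_covariance(3), random_covariance(3), lam)
    for lam in rng.uniform(0.05, 0.95, size=200)
]
cov = random_covariance(3)
print("largest gap over random draws:", max(gaps))           # never positive
print("gap at equal covariances:", lieb_gaussian_gap(cov, cov, 0.4))  # zero up to rounding
```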

Brascamp-Lieb Inequality:

When $k=1$ , $r_{1}=n$ , and $d_{1}=1$ , we recover the setting of the Brascamp-Lieb inequality in its equivalent form of subadditivity of entropy:

$$h(X)\leq\sum_{j=1}^{m}c_{j}h(A_{j}X)+M_{g},$$

for all $\mathbb{R}^{n}$-valued random variables $X$ with $\mathbb{E}X=0$ and $\mathbb{E}\|X\|_{2}^{2}<\infty$.

Zamir and Feder’s Inequality:

Let $A$ be a $k\times n$ matrix satisfying $AA^{T}=I_{k\times k}$ . For $1\leq j\leq n$ , let the squared norm of the $j$ -th column of $A$ be denoted by $\alpha_{j}^{2}$ ; i.e.,

$$\alpha_{j}^{2}:=\sum_{i=1}^{k}a_{ij}^{2}.$$

Just as we did for the EPI, it is enough to show that $M_{g}\leq 0$ by explicitly computing the supremum of $\sum_{j=1}^{n}\alpha_{j}^{2}h(X_{j})-h(AX)$ over Gaussian $X$ . Let $\Lambda=\operatorname{Diag}(\lambda_{1},\lambda_{2},\dotsc,\lambda_{n})$ be a positive definite matrix. Define a function $F$ from the space of positive definite diagonal matrices to $\mathbb{R}$ as follows:

$$F(\Lambda)=\log|A\Lambda A^{T}|-\sum_{j=1}^{n}\alpha_{j}^{2}\log\lambda_{j}.$$

If we show that $F(\Lambda)\geq 0$ , then Theorem 3 will immediately imply Zamir and Feder’s EPI for random vectors with independent components. Let $B:=A\Lambda^{1/2}$ , so that $A\Lambda A^{T}=BB^{T}$ . Using the Cauchy-Binet formula for the determinant of $BB^{T}$ , we obtain

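$$|BB^{T}|=\sum_{1\leq i_{1}<\dots<i_{k}\leq n}|B_{i_{1}i_{2}\dots i_{k}}||B_{i_{1}i_{2}\dots i_{k}}^{T}|,$$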

where $B_{i_{1}i_{2}\dots i_{k}}$ consists of the $k$ columns of $B$ corresponding to the indices $i_{1},\dots,i_{k}$ . The right hand side of the above equality may be written explicitly as

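$$\sum_{1\leq i_{1}<\dots<i_{k}\leq n}\left(\prod_{j=1}^{k}\lambda_{i_{j}}\right)|A_{i_{1}i_{2}\dots i_{k}}|^{2}.$$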

Noting that $\sum_{1\leq i_{1}<\dots<i_{k}\leq n}|A_{i_{1}i_{2}\dots i_{k}}|^{2}=|AA^{T}|=|I_{k}|=1$ (again via the Cauchy-Binet formula), we may take logarithms and use Jensen’s inequality to obtain

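$$\log|A\Lambda A^{T}|\geq\sum_{1\leq i_{1}<\dots<i_{k}\leq n}|A_{i_{1}i_{2}\dots i_{k}}|^{2}\log\left(\prod_{j=1}^{k}\lambda_{i_{j}}\right).$$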

We now gather the coefficients of $\log{\lambda_{j}}$ for a fixed $j$ . The coefficient of $\log{\lambda_{1}}$ is given by

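$$\sum_{1=i_{1}<\dots<i_{k}\leq n}|A_{i_{1}i_{2}\dots i_{k}}|^{2}=1-|A_{2,3,\dots,n}A_{2,3,\dots,n}^{T}|=1-|I_{k}-A_{1}A_{1}^{T}|=\alpha_{1}^{2}.$$

Here, $A_{1}$ denotes the first column of $A$ and $A_{2,3,\dots,n}$ the matrix formed by the remaining columns. The first equality follows by using the Cauchy-Binet formula again, the second equality follows from the orthonormality of the rows of $A$, since $A_{2,3,\dots,n}A_{2,3,\dots,n}^{T}=AA^{T}-A_{1}A_{1}^{T}=I_{k}-A_{1}A_{1}^{T}$, and the third equality is true because $|I_{k}-uu^{T}|=1-\left\lVert u\right\rVert^{2}$ for any vector $u\in\mathbb{R}^{k}$. A similar calculation can be done to show that the coefficient of $\log{\lambda_{j}}$ is $\alpha_{j}^{2}$ for all $1\leq j\leq n$, which completes the proof of $F(\Lambda)\geq 0$.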
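The ingredients of this argument can be verified numerically for a small example. The sketch below is an illustration with hypothetical names, not code from the paper. It draws a random $k\times n$ matrix $A$ with $AA^{T}=I_{k}$ and checks three things: the squared $k\times k$ minors sum to 1, the minors whose column set contains $j$ sum to $\alpha_{j}^{2}$, and $F(\Lambda)\geq 0$ for a random positive diagonal $\Lambda$.

```python
import numpy as np
from itertools import combinations

def random_row_orthonormal(k, n, rng):
    """A k x n matrix A with A A^T = I_k, built from a reduced QR factorization."""
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))  # q is n x k with orthonormal columns
    return q.T

def minor_weights(A):
    """Map each k-subset S of column indices to the squared minor |A_S|^2."""
    k, n = A.shape
    return {S: np.linalg.det(A[:, list(S)]) ** 2 for S in combinations(range(n), k)}

def zamir_feder_F(A, lam):
    """F(Lambda) = log|A Lambda A^T| - sum_j alpha_j^2 log(lambda_j)."""
    alpha_sq = np.sum(A ** 2, axis=0)
    _, logdet = np.linalg.slogdet(A @ np.diag(lam) @ A.T)
    return logdet - alpha_sq @ np.log(lam)

rng = np.random.default_rng(3)
k, n = 2, 4
A = random_row_orthonormal(k, n, rng)
weights = minor_weights(A)
alpha_sq = np.sum(A ** 2, axis=0)

total = sum(weights.values())  # Cauchy-Binet: equals |A A^T| = 1
per_column = np.array([sum(v for S, v in weights.items() if j in S) for j in range(n)])
lam = rng.uniform(0.1, 5.0, size=n)

print("sum of squared minors:", total)
print("coefficients match alpha_j^2:", np.allclose(per_column, alpha_sq))
print("F(Lambda):", zamir_feder_F(A, lam))  # nonnegative
```

Because the squared minors form a probability distribution over $k$-subsets, Jensen's inequality applies to the logarithm, which is the step the sign of `zamir_feder_F` reflects.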

4 Proof of Theorem 3

Our proof strategy relies on the technique of Geng and Nair [22] which was developed to solve optimization problems of the form $\sup_{\text{Cov}(X)\preceq\Sigma}s(X).$ A rough sketch of this proof strategy is outlined below:

• Concave envelope: Define the concave envelope of $s$ , denoted by $S$ , as the smallest concave function that pointwise dominates $s$ . It can be seen that

[TABLE]

where the supremum is over finite auxiliary random variables $U$ with support ${\cal U}$ .

• Subadditivity of $S$ : This step consists of defining $S$ on the larger space of pairs of random variables $(X_{1},X_{2})$ . A straightforward extension often exists for information-theoretic functions $S$ . The subadditivity result shows that

$$S(X_{1},X_{2})\leq S(X_{1})+S(X_{2}).$$

The ingredients for establishing the subadditivity result developed in this paper stem from ideas used to establish converses to coding theorems and outer bounds in network information theory. An argument with a flavor similar to that employed here is outlined in [33].

• Optimizers of $S$ : In this step (also known as the doubling trick), we consider two i.i.d. copies of any optimizer $X$ of $S(X)$ , say $(X_{1},X_{2})$ , and show that $(X_{1}+X_{2})/\sqrt{2}$ and $(X_{1}-X_{2})/\sqrt{2}$ are also optimizers of $S(X)$ . From here, we may use Gaussian characterization results [34] or the central limit theorem [22] to conclude that it is enough to consider only Gaussian optimizers.

• Optimizers of $s$ : In this final step, we show that the optimal value for $S(X)$ is attained by a single Gaussian distribution; i.e., we may assume without loss of generality that $\left\lvert{\cal U}\right\rvert=1$ , and thus this Gaussian also maximizes $s(X)$ .

The crux of the proof is establishing the subadditivity of $S$ . Our proof relies on expanding the joint entropy $h(X_{1},X_{2})$ in two separate ways as follows:

(A) $h(X_{1},X_{2})=h(X_{1})+h(X_{2})-I(X_{1};X_{2})$ ,

(B) $h(X_{1},X_{2})=h(X_{1}|X_{2})+h(X_{2}|X_{1})+I(X_{1};X_{2})$ .
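Both expansions can be verified in closed form for jointly Gaussian vectors. The following is a minimal sketch, assuming a randomly generated positive definite covariance for $(X_{1},X_{2})$; the helper names `gaussian_entropy` and `conditional_cov` are hypothetical. Conditional entropies are computed from Schur complements, and the mutual information is taken as $I(X_{1};X_{2})=h(X_{1})-h(X_{1}|X_{2})$.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (in nats) of N(0, cov)."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


def conditional_cov(cov, idx_a, idx_b):
    """Schur complement: covariance of X_a given X_b for a jointly Gaussian vector."""
    s_aa = cov[np.ix_(idx_a, idx_a)]
    s_ab = cov[np.ix_(idx_a, idx_b)]
    s_bb = cov[np.ix_(idx_b, idx_b)]
    return s_aa - s_ab @ np.linalg.solve(s_bb, s_ab.T)


rng = np.random.default_rng(1)
G = rng.standard_normal((5, 5))
cov = G @ G.T + 0.1 * np.eye(5)   # joint covariance of (X1, X2), positive definite
i1, i2 = [0, 1], [2, 3, 4]        # X1 has dimension 2, X2 has dimension 3

h_joint = gaussian_entropy(cov)
h1 = gaussian_entropy(cov[np.ix_(i1, i1)])
h2 = gaussian_entropy(cov[np.ix_(i2, i2)])
h1_given_2 = gaussian_entropy(conditional_cov(cov, i1, i2))
h2_given_1 = gaussian_entropy(conditional_cov(cov, i2, i1))
mutual_info = h1 - h1_given_2     # I(X1; X2)

expansion_A = h1 + h2 - mutual_info
expansion_B = h1_given_2 + h2_given_1 + mutual_info
```

Both `expansion_A` and `expansion_B` agree with `h_joint` up to floating-point error; the mutual information enters with opposite signs in the two expansions, which is what the subadditivity proof exploits.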

To highlight the main ideas, we present a proof sketch of the subadditivity result for the EPI using our new technique.

4.1 Proving the EPI via subadditivity

Consider the function

[TABLE]

where $X_{1}\perp\!\!\!\perp Y_{1}$ . Define the lifting of $s$ to the space of pairs of random variables by

[TABLE]

where $X_{1:2}\perp\!\!\!\perp Y_{1:2}$ . Let $S(X_{1},Y_{1})$ and $S(X_{1:2},Y_{1:2})$ be the respective concave envelopes of $s$ and its lifting.

Footnote 3: To get $S(X_{1},Y_{1})$ from $s(X_{1},Y_{1})$ , we can think of the domain of $s(X_{1},Y_{1})$ as being the product of the convex set of probability densities on $x$ satisfying (7) and the convex set of probability densities on $y$ satisfying (7), and take the concave hull on this product space; similarly for getting $S(X_{1:2},Y_{1:2})$ from $s(X_{1:2},Y_{1:2})$ . It can be checked that any product distribution on $(X_{1},Y_{1})$ obtained as a mixture of product distributions can be viewed as having the mixing done on the marginals: if $p(x)q(y)=\sum_{i}\lambda_{i}p_{i}(x)q_{i}(y)$ where $\lambda_{i}\geq 0$ and $\sum_{i}\lambda_{i}=1$ , then integrating over $y$ on both sides gives $p(x)=\sum_{i}\lambda_{i}p_{i}(x)$ , and similarly $q(y)=\sum_{i}\lambda_{i}q_{i}(y)$ . This justifies why we can write (20) and the analogous expression for $S(X_{1:2},Y_{1:2})$ .

We would like to show the subadditivity relation

$$S(X_{1:2},Y_{1:2})\leq S(X_{1},Y_{1})+S(X_{2},Y_{2}).$$

Notice that

[TABLE]

and similarly for $S(X_{1:2},Y_{1:2})$ . For any auxiliary random variable $U$ satisfying $X_{1:2}\rightarrow U\rightarrow Y_{1:2}$ , applying expansion (A) to each entropy term in equation (18) (conditioned on $U$ ) yields

[TABLE]

For simplicity, call the terms in the brackets $T_{1}(U)$ , $T_{2}(U)$ , and $T_{3}(U)$ respectively, even though they actually depend on $p_{U|X_{1:2},Y_{1:2}}$ . Observing that $X_{i}\to U\to Y_{i}$ for $i=1,2$ , we may conclude $T_{1}(U)\leq S(X_{1},Y_{1})$ and $T_{2}(U)\leq S(X_{2},Y_{2})$ . Substituting these inequalities, we arrive at

[TABLE]

We now expand the expression in equation (18) (conditioned on $U$ ) using expansion (B) for each entropy term:

[TABLE]

For ease of notation, call the three terms $R_{1}(U)$ , $R_{2}(U)$ , and $R_{3}(U)=-T_{3}(U)$ , even though they actually depend on $p_{U|X_{1:2},Y_{1:2}}$ . Similar to inequality (22), we would like to upper bound $R_{1}(U)$ and $R_{2}(U)$ by $S(X_{1},Y_{1})$ and $S(X_{2},Y_{2})$ respectively. However, the conditioning for the entropy terms in each of the $R_{i}(U)$ is not the same so we cannot directly conclude such a bound. Using the chain rule of mutual information and data-processing relations, we may make the conditioning in $R_{1}(U)$ and $R_{2}(U)$ uniform by introducing some extra mutual information terms:

[TABLE]

where the notational conventions $\tilde{R}_{1}(U)$ and $I_{1}(U)$ are used even though the respective terms actually depend on $p_{U|X_{1:2},Y_{1:2}}$ . The main step in the preceding equation is justified as follows. First, it is easy to check using the Markov relation $(X_{1},X_{2})\rightarrow U\rightarrow(Y_{1},Y_{2})$ that

[TABLE]

Also, we may verify that

[TABLE]

Similar reasoning for $R_{2}(U)$ gives

[TABLE]

where the notational conventions $\tilde{R}_{2}(U)$ and $I_{2}(U)$ are used even though the respective terms actually depend on $p_{U|X_{1:2},Y_{1:2}}$ . Substituting the expressions for $R_{1}(U)$ and $R_{2}(U)$ in the expansion in equation (23), we arrive at

[TABLE]

Here, in step $(a)$ we used the Markov chains $X_{1}\rightarrow(U,X_{2},Y_{2})\rightarrow Y_{1}$ and $X_{2}\rightarrow(U,X_{1},Y_{1})\rightarrow Y_{2}$ . Step $(b)$ follows by noticing that $I_{1}(U)$ and $I_{2}(U)$ are non-negative, being mutual information expressions.

Inequalities (22) and (24) may now be used in tandem to conclude

[TABLE]

Taking the supremum over all auxiliary random variables $U$ satisfying $X_{1:2}\rightarrow U\rightarrow Y_{1:2}$ leads to

$$S(X_{1:2},Y_{1:2})\leq S(X_{1},Y_{1})+S(X_{2},Y_{2}).$$

Notice that the above proof not only gives us subadditivity, but also shows that if there is equality in equation (25) for some optimal $U^{*}$ , then $I_{1}(U^{*})=I_{2}(U^{*})=T_{3}(U^{*})=0$ . This leads to several independence conditions that can be used to establish Gaussian optimality. We do not sketch this part of the proof here.

In what follows, we develop this outline into a rigorous proof for a more general result in two stages. In Section 4.2 we establish the key subadditivity inequality and the independence relations that follow from the conditions for equality in that inequality, and in Section 4.3 we complete the proof of Theorem 3 by proving Gaussian optimality.

4.2 Subadditivity lemma

4.2.1 Preliminaries

Let $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ be a BL-EPI datum. Let $X:=(X_{1},X_{2},\dots,X_{k})\in{\cal P}(\mathbf{r})$ , where $X_{i}\sim p_{X_{i}}$ . A natural definition for $s(X)$ would be

$$s(X):=\sum_{i=1}^{k}d_{i}h(X_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}X),$$

and one might then work with its concave envelope $S$ . However, for technical reasons we consider Gaussian-smoothed random variables in defining $s$ as follows:

Definition 9.

Let $W_{i}\sim{\cal N}(0,I_{r_{i}\times r_{i}}),i\in[k]$ be mutually independent standard normal random variables on $\mathbb{R}^{r_{i}}$ , and let $W:=(W_{1},\ldots,W_{k})$ . For $j\in[m]$ , define independent Gaussian random variables $Z_{j}\sim{\cal N}(0,I_{n_{j}\times n_{j}})$ , and let $Z:=(Z_{1},Z_{2},\dots,Z_{m})$ . Assume that the random variables $X$ , $W$ and $Z$ are mutually independent. For $\epsilon,\delta\geq 0$ define $s_{\epsilon,\delta}:{\cal P}(\mathbf{r})\to\mathbb{R}$ as

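$$s_{\epsilon,\delta}(X):=\sum_{i=1}^{k}d_{i}h(X_{i}+\sqrt{\delta}W_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}(X+\sqrt{\delta}W)+\sqrt{\epsilon}Z_{j}).$$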

Let $S_{\epsilon,\delta}$ be the concave envelope of $s_{\epsilon,\delta}$ . Let $U$ be an auxiliary random variable taking values in a finite set ${\cal U}$ such that we have $p_{X|U}(\cdot|U)\in{\cal P}(\mathbf{r})$ . It is easy to see that the concave envelope has an equivalent definition in terms of such choices of $U$ :

[TABLE]

where, on the right hand side of equation (28), we can assume that $W$ , $Z$ and $(U,X)$ are mutually independent. For a particular choice of $U$ , define

[TABLE]

Analogous to ${\cal P}(\mathbf{r})$ , define ${\cal P}(2\mathbf{r})$ to be the set of random variables that take values in $\mathbb{R}^{2r_{1}}\times\dots\times\mathbb{R}^{2r_{k}}$ and satisfy the conditions in Definition 5. More precisely, a random vector $(X_{1},X_{2})$ is in ${\cal P}(2\mathbf{r})$ if $X_{1}:=(X_{11},\ldots,X_{k1})$ and $X_{2}:=(X_{12},\ldots,X_{k2})$ are $\mathbb{R}^{n}$-valued random vectors such that the random vectors $(X_{i1},X_{i2})\in\mathbb{R}^{2r_{i}}$ , $i\in[k]$ are mutually independent, satisfy the condition in equation (7), and condition 3 of Definition 5 holds for $(X_{1},X_{2})$ . Since the condition in equation (7) is inherited by marginalization, we have that if $(X_{1},X_{2})\in{\cal P}(2\mathbf{r})$ then $X_{1}\in{\cal P}(\mathbf{r})$ and $X_{2}\in{\cal P}(\mathbf{r})$ .

We will need to define an extension of $S_{\epsilon,\delta}$ to the larger space ${\cal P}(2\mathbf{r})$ . Consider a random vector $(X_{1},X_{2})\in{\cal P}(2\mathbf{r})$ as in the preceding paragraph. Define

[TABLE]

where $(W_{1},W_{2},Z_{1},Z_{2})$ are mutually independent standard normal distributions of the appropriate dimensions that are independent of $(X_{1},X_{2})$ . The concave envelope of $s_{\epsilon,\delta}(X_{1},X_{2})$ can be written as:

[TABLE]

where $W_{1}$ , $W_{2}$ , $Z_{1}$ , $Z_{2}$ and $(U,X_{1},X_{2})$ are mutually independent, with $U$ taking values in finite sets ${\cal U}$ and $p_{X_{1},X_{2}|U}(\cdot,\cdot|U)\in{\cal P}(2\mathbf{r})$ . Figure 1 illustrates the relations between the random variables via a graphical model.

4.2.2 Proof of subadditivity

Lemma 4.1 (Subadditivity lemma).

For any $\epsilon,\delta\geq 0$ , the function $S_{\epsilon,\delta}$ is subadditive; i.e., if $(X_{1},X_{2})\in{\cal P}(2\mathbf{r})$ then

$$S_{\epsilon,\delta}(X_{1},X_{2})\leq S_{\epsilon,\delta}(X_{1})+S_{\epsilon,\delta}(X_{2}).$$

Corollary 4.1.

For any $\epsilon,\delta\geq 0$ , the function $S_{\epsilon,\delta}$ tensorizes; i.e., if $X_{1},X_{2}\in{\cal P}(\mathbf{r})$ and if $X_{1}\perp\!\!\!\perp X_{2}$ , then

$$S_{\epsilon,\delta}(X_{1},X_{2})=S_{\epsilon,\delta}(X_{1})+S_{\epsilon,\delta}(X_{2}).$$

Proof of Lemma 4.1.

Let $U$ be an auxiliary random variable taking values in a finite set ${\cal U}$ , such that $p_{X_{1},X_{2}\mid U}(\cdot,\cdot|U)\in{\cal P}(2\mathbf{r})$ . Consider the following expansion, which comes from applying expansion (A) term by term:

[TABLE]

For simplicity, denote the terms in the square brackets by $T_{1}(U)$ , $T_{2}(U)$ , and $T_{3}(U)$ , respectively, even though they actually depend on $p_{U|X_{1},X_{2}}$ . Observe that $p_{X_{1}|U}(\cdot|U),p_{X_{2}|U}(\cdot|U)\in{\cal P}(\mathbf{r})$ (see Figure 1). Thus, we conclude that $T_{1}(U)\leq S_{\epsilon,\delta}(X_{1})$ and $T_{2}(U)\leq S_{\epsilon,\delta}(X_{2})$ , using the definition in equation (28). Substituting these inequalities, we arrive at

[TABLE]

We now expand $s_{\epsilon,\delta}(X_{1},X_{2}\mid U)$ in a different way, which comes from applying expansion (B) term by term:

[TABLE]

For ease of notation, call the three terms in the square brackets $R_{1}(U)$ , $R_{2}(U)$ , and $R_{3}(U)=-T_{3}(U)$ , respectively, even though each term actually depends on $p_{U|X_{1},X_{2}}$ . Similar to inequality (35), we would like to upper bound $R_{1}(U)$ and $R_{2}(U)$ by $S_{\epsilon,\delta}(X_{1})$ and $S_{\epsilon,\delta}(X_{2})$ respectively. However, the conditioning in each of the two differential entropy terms in each $R_{a}(U)$ , $a=1,2$ is not the same, so we cannot directly conclude such a bound. Using the chain rule of mutual information and data-processing relations, we may make the conditioning in $R_{1}(U)$ and $R_{2}(U)$ uniform by introducing some extra mutual information terms:

[TABLE]

where we write $\tilde{R}_{1}(U)$ and $I_{1}(U)$ for simplicity, even though the corresponding terms depend on $p_{U|X_{1},X_{2}}$ . The above steps are justified as follows. First, it is easy to check that $(X_{i1}+\sqrt{\delta}W_{i1})\perp\!\!\!\perp\{X_{l2}+\sqrt{\delta}W_{l2}\}_{l\neq i}$ conditioned on $(U,X_{i2}+\sqrt{\delta}W_{i2})$ . This means that, for all $1\leq i\leq k$ ,

[TABLE]

Also, we may verify the Markov chain (conditioned on $U$ )

[TABLE]

which gives the equality

[TABLE]

Similar reasoning for $R_{2}(U)$ gives

[TABLE]

where we use the notation $\tilde{R}_{2}(U)$ and $I_{2}(U)$ for simplicity, even though the corresponding terms depend on $p_{U|X_{1},X_{2}}$ . Substituting the expressions for $R_{1}(U)$ and $R_{2}(U)$ in the expansion in equation (37), we arrive at

[TABLE]

Here, in step $(a)$ we used the fact that $p_{X_{1}|U,X_{2}+\sqrt{\delta}W_{2}}(\cdot|U,X_{2}+\sqrt{\delta}W_{2}),p_{X_{2}|U,X_{1}+\sqrt{\delta}W_{1}}(\cdot|U,X_{1}+\sqrt{\delta}W_{1})\in{\cal P}(\mathbf{r})$ and the definition in equation (28). Step $(b)$ follows by noticing that the $c_{j}$ are non-negative, and so are $I_{1}(U)$ and $I_{2}(U)$ since they are nonnegative linear combinations of mutual informations.

We can combine inequalities (35) and (38) to get

[TABLE]

Taking the supremum on the left hand side of this inequality over all auxiliary variables $U$ taking values in finite sets ${\cal U}$ , such that $p_{X_{1},X_{2}\mid U}(\cdot,\cdot|U)\in{\cal P}(2\mathbf{r})$ , yields the claimed subadditivity result. ∎

Proof of Corollary 4.1.

When $X_{1}\perp\!\!\!\perp X_{2}$ , we have the inequality

$$S_{\epsilon,\delta}(X_{1},X_{2})\geq S_{\epsilon,\delta}(X_{1})+S_{\epsilon,\delta}(X_{2}).\tag{40}$$

This is because we can always choose $U:=(U_{1},U_{2})$ such that $(U_{1},X_{1})\perp\!\!\!\perp(U_{2},X_{2})$ and $p_{X_{1}|U_{1}}(\cdot|U_{1})$ , $p_{X_{2}|U_{2}}(\cdot|U_{2})\in{\cal P}(\mathbf{r})$ . The supremum defining $S_{\epsilon,\delta}(X_{1},X_{2})$ , when restricted to this class of auxiliaries, is simply $S_{\epsilon,\delta}(X_{1})+S_{\epsilon,\delta}(X_{2})$ , which is therefore a lower bound on $S_{\epsilon,\delta}(X_{1},X_{2})$ . Inequality (40) combined with Lemma 4.1 completes the proof of Corollary 4.1. ∎

Our next lemma serves to some extent as a converse to Corollary 4.1. In particular, we show that if $S_{\epsilon,\delta}(X_{1},X_{2})=S_{\epsilon,\delta}(X_{1})+S_{\epsilon,\delta}(X_{2})$ , then $X_{1}$ and $X_{2}$ are independent conditioned on the optimal auxiliary $U^{*}$ , assuming it exists. We point out that this converse requires $\epsilon$ and $\delta$ to be strictly positive, unlike Lemma 4.1. The formal statement is as follows:

Lemma 4.2 (Independence relations).

Fix $\epsilon,\delta>0$ . Given $(X_{1},X_{2})\in{\cal P}(2\mathbf{r})$ , suppose that $S_{\epsilon,\delta}(X_{1},X_{2})=S_{\epsilon,\delta}(X_{1})+S_{\epsilon,\delta}(X_{2})$ . Suppose that $U^{*}$ is such that $p_{X_{1},X_{2}|U^{*}}(\cdot,\cdot|U^{*})\in{\cal P}(2\mathbf{r})$ and $s_{\epsilon,\delta}(X_{1},X_{2}\mid U^{*})=S_{\epsilon,\delta}(X_{1},X_{2})$ . Then the following results hold:

(a) For all $u^{*}\in{\cal U}^{*}$ , we have that $X_{1}\perp\!\!\!\perp X_{2}$ conditioned on $U^{*}=u^{*}$ ,

(b) $s_{\epsilon,\delta}(X_{1}|U^{*})=S_{\epsilon,\delta}(X_{1})$ and $s_{\epsilon,\delta}(X_{2}|U^{*})=S_{\epsilon,\delta}(X_{2})$ .

Proof.

Notice that the proof of Lemma 4.1 implies that the optimizing $U^{*}$ , if it exists, must satisfy $I_{1}(U^{*})=I_{2}(U^{*})=T_{3}(U^{*})=0$ . The first two equalities yield the Markov chains (conditioned on $U^{*}=u^{*}$ )

[TABLE]

However, we have the obvious Markov chains

[TABLE]

Using Lemma A.1, we may conclude that, conditioned on $U^{*}$ , we have

[TABLE]

Recall that $T_{3}(U^{*})$ is given by

[TABLE]

Substituting the above independence relations in $T_{3}(U^{*})=0$ , we conclude that, conditioned on $U^{*}$ , we have

[TABLE]

which by Lemma A.2 implies that, conditioned on $U^{*}$ , we have

[TABLE]

and concludes the proof of (a).

Having proved (a), rewrite equation (34), with $U^{*}$ for $U$ , as

[TABLE]

The above inequality, combined with the assumed equality $s_{\epsilon,\delta}(X_{1},X_{2}|U^{*})=S_{\epsilon,\delta}(X_{1})+S_{\epsilon,\delta}(X_{2})$ , immediately yields

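$$s_{\epsilon,\delta}(X_{1}|U^{*})=S_{\epsilon,\delta}(X_{1})\quad\text{and}\quad s_{\epsilon,\delta}(X_{2}|U^{*})=S_{\epsilon,\delta}(X_{2}).$$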

∎

4.2.3 A general subadditivity result

A closer inspection of the proof of Lemma 4.1 reveals that the linear functions mapping $X$ to $A_{j}X$ could be replaced with general channels. To be precise, let $X=(X_{1},X_{2},\dots,X_{k})\in{\cal P}(\mathbf{r})$ and for $j\in[m]$ , consider $m$ channels $p_{Y_{j}|X}$ from $X$ to $Y_{j}$ . Define the function $s:{\cal P}(\mathbf{r})\to\mathbb{R}$ as

[TABLE]

and let $S$ be its concave envelope. The function $s$ is lifted to pairs of random variables $(X_{1},X_{2})\in{\cal P}(2\mathbf{r})$ as

[TABLE]

where the channel from $(X_{1},X_{2})$ to $(Y_{j1},Y_{j2})$ is given by $p_{Y_{j1},Y_{j2}|X_{1},X_{2}}=p_{Y_{j1}|X_{1}}p_{Y_{j2}|X_{2}}$ . Let $S(X_{1},X_{2})$ be the concave envelope of $s(X_{1},X_{2})$ .

Claim 4.1.

The function $S$ is subadditive; i.e., $S(X_{1},X_{2})\leq S(X_{1})+S(X_{2})$ .

Proof.

Let $U$ be an auxiliary random variable taking values in a finite set ${\cal U}$ , such that $p_{X_{1},X_{2}\mid U}(\cdot,\cdot|U)\in{\cal P}(2\mathbf{r})$ . Note that

[TABLE]

To verify step (a) it suffices to show that $h(X_{i2}|U,X_{i1})\leq h(X_{i2}|U,X_{1})$ for each $i\in[k]$ . In fact we have equality here because, as is easily verified, we have $(X_{l1},l\neq i)$ conditionally independent of $X_{i2}$ given $X_{i1}$ and $U$ . To verify the last inequality, observe that $(X_{i1},1\leq i\leq k)$ are conditionally independent given $U$ and $(X_{i2},1\leq i\leq k)$ are conditionally independent given $(U,X_{1})$ . Taking a supremum over $U$ completes the proof. ∎

We make several remarks. First, observe that only the $c_{j}$ need to be non-negative; no such condition is necessary for the $d_{i}$ .

Footnote 4: However, studying the maximum over ${\cal P}(\mathbf{r})$ of an expression like (9) when some of the $d_{i}$ are negative is not interesting because the maximum over ${\cal P}_{g}(\mathbf{r})$ is $\infty$ , as can be seen by letting the covariance matrix of the component corresponding to any factor with negative $d_{i}$ tend to $0$ .

Second, while this proof is very simple compared to that of Lemma 4.1, the independence relations in Lemma 4.2—which are critical to the proof of Gaussian optimality—cannot be directly deduced from the above proof. However, this is not such a big impediment. Instead of $s(X)$ , we could consider a slightly modified function $s_{\epsilon}(X)$ defined by

[TABLE]

where $Z$ is a standard Gaussian that is independent of $X$ and $Y_{j}=A_{j}X$ . It is not hard to show that the concave envelope of $s_{\epsilon}$ is subadditive; in fact, the same steps as in the proof of Claim 4.1 suffice. Further, including the extra mutual information term allows one to deduce independence relations analogous to those in Lemma 4.2. This approach provides an alternate route to proving Theorem 3.

4.3 Proof of Theorem 3

Having proved the key subadditivity step, the rest of the proof closely follows the steps outlined in [22, Appendix II].

Definition 10.

Let $\Sigma:=\text{Diag}(\Sigma_{1},\Sigma_{2},\dots,\Sigma_{k})$ be an $n\times n$ block diagonal matrix such that each $\Sigma_{i}$ is an $r_{i}\times r_{i}$ positive definite matrix. For $\epsilon,\delta>0$ , define

[TABLE]

where $\preceq$ denotes ordering in the positive semidefinite partial order.

Lemma 4.3.

There exist random variables $X^{*}$ and $U^{*}$ satisfying (1) $|{\cal U}^{*}|\leq\sum_{i=1}^{k}\frac{r_{i}(r_{i}+1)}{2}+1$ ; (2) $X^{*}\in{\cal P}(\mathbf{r})$ ; and (3) $\mathbb{E}X^{*}{X^{*}}^{T}\preceq\Sigma$ , such that the following holds:

[TABLE]

Proof of Lemma 4.3.

Let $(X^{(t)},t\geq 1)$ be a sequence of random variables such that $\mathbb{E}X^{(t)}(X^{(t)})^{T}=\widehat{\Sigma}$ and $s_{\epsilon,\delta}(X^{(t)})\uparrow v(\widehat{\Sigma})$ as $t\to\infty$ . This sequence of random variables is tight due to the covariance constraint [22, Proposition 17], and thus we may assume without loss of generality that the $X^{(t)}$ converge weakly to a random variable $X^{\widehat{\Sigma}}$ as $t\to\infty$ . Since $X^{(t)}+\sqrt{\delta}W$ satisfies the necessary regularity conditions as in [22, Proposition 18], we also have $h(X^{(t)}_{i}+\sqrt{\delta}W_{i})\to h(X^{\widehat{\Sigma}}_{i}+\sqrt{\delta}W_{i})$ for $i\in[k]$ , and $h(A_{j}(X^{(t)}+\sqrt{\delta}W)+\sqrt{\epsilon}Z_{j})\to h(A_{j}(X^{\widehat{\Sigma}}+\sqrt{\delta}W)+\sqrt{\epsilon}Z_{j})$ for $j\in[m]$ . Hence we may conclude $s_{\epsilon,\delta}(X^{\widehat{\Sigma}})=v(\widehat{\Sigma})$ .

Recall that $V(\Sigma)$ is defined as

[TABLE]

where, for the moment, $M$ ranges over positive integers of arbitrary size. The equality in $(a)$ is because we may restrict $p_{X|U}(\cdot|U)$ to the class of optimizers $X^{\widehat{\Sigma}}$ for $\widehat{\Sigma}\succeq 0$ . We now show that we can fix $M$ to be $\sum_{i=1}^{k}{{r_{i}+1}\choose 2}+1$ in (46). Let $\mathcal{T}$ denote the connected subset of positive definite matrices $\Sigma$ of the form $\text{Diag}(\Sigma_{1},\dots,\Sigma_{k})$ where $\Sigma_{i}$ is an $r_{i}\times r_{i}$ positive definite matrix for $i\in[k]$ . Consider the connected compact subset, $\mathcal{V}$ , of the $M$ -dimensional Euclidean space obtained using the continuous mapping $\Phi:\mathcal{T}\mapsto\mathbb{R}^{M},$ defined by $\Phi(\Sigma)=\left(\{\Sigma_{i}(j,k)_{1\leq j\leq k\leq r_{i}}\},v(\Sigma)\right)$ , where $M:=\sum_{i=1}^{k}{{r_{i}+1}\choose 2}+1$ . Fenchel’s extension of Carathéodory’s Theorem [35, Theorem 1.3.7] states that any finite convex combination of points in $\mathcal{V}$ , can be represented as a convex combination of at most $M$ points in $\mathcal{V}$ . Hence for any $(U,X^{\Sigma_{U}})$ we can find a pair $(U^{\prime},X^{\Sigma_{U^{\prime}}})$ with $U^{\prime}$ taking at most $M$ values, such that $E(\Sigma_{U})=E(\Sigma_{U^{\prime}})$ and $E(v(\Sigma_{U}))=E(v(\Sigma_{U^{\prime}}))$ . Thus from this point onwards in the proof we define $M:=\sum_{i=1}^{k}{{r_{i}+1}\choose 2}+1$ in (46).
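The value of $M$ is a dimension count: each symmetric $r_{i}\times r_{i}$ block contributes $\binom{r_{i}+1}{2}=r_{i}(r_{i}+1)/2$ entries on or above the diagonal, and $\Phi$ appends the single coordinate $v(\Sigma)$ . A small sketch of this arithmetic, using a hypothetical partition $\mathbf{r}=(1,2,3)$ and a hypothetical helper `caratheodory_bound`:

```python
from math import comb


def caratheodory_bound(r):
    """M = sum_i C(r_i + 1, 2) + 1: free entries of the symmetric blocks, plus one for v."""
    return sum(comb(ri + 1, 2) for ri in r) + 1


r = (1, 2, 3)                      # hypothetical partition, n = 6
M = caratheodory_bound(r)          # 1 + 3 + 6 + 1
M_alt = sum(ri * (ri + 1) // 2 for ri in r) + 1   # form used in Lemma 4.3
```

For this partition both expressions give $M=11$ , matching the cardinality bound stated in Lemma 4.3.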

Consider any sequence of convex combinations $\left(\{\alpha_{l}^{(t)}\}_{l=1}^{M},\{\widehat{\Sigma}_{l}^{(t)}\}_{l=1}^{M}\right)$ with $\sum_{l=1}^{M}\alpha_{l}^{(t)}\widehat{\Sigma}_{l}^{(t)}\preceq\Sigma$ for all $t\geq 1$ , and such that $\sum_{l=1}^{M}\alpha_{l}^{(t)}v(\widehat{\Sigma}_{l}^{(t)})$ converges to $V(\Sigma)$ as $t\to\infty$ . Appealing to the compactness of the $M$ -dimensional simplex, we may assume without loss of generality that $\alpha_{l}^{(t)}\to\alpha^{*}_{l}$ for all $l\in[M]$ . If any of the $\alpha^{*}_{l}$ equals $0$ , then noticing that $\alpha_{l}^{(t)}\widehat{\Sigma}_{l}^{(t)}\preceq\Sigma$ gives us

[TABLE]

where $C_{0}$ is some constant that does not depend on $t$ . In $(a)$ , we used the fact that each $h(X_{i}+\sqrt{\delta}W_{i})$ is upper-bounded by the entropy of a Gaussian random variable with the same covariance matrix as $X_{i}+\sqrt{\delta}W_{i}$ , and $h(A_{j}(X+\sqrt{\delta}W)+\sqrt{\epsilon}Z_{j})\geq h(\sqrt{\epsilon}Z_{j})$ .

It is now clear that the limit $\alpha_{l}^{(t)}v(\widehat{\Sigma}^{(t)}_{l})$ as $t\to\infty$ is equal to 0 whenever $\alpha_{l}^{(t)}\to 0$ . Thus, we may assume that $\min_{l\in[M]}\alpha^{*}_{l}=\alpha_{\min}>0$ , by splitting a component $\alpha_{l}^{(t)}v(\widehat{\Sigma}_{l}^{(t)})$ into multiple components if necessary. This implies that $\widehat{\Sigma}^{(t)}_{l}\preceq\frac{2\Sigma}{\alpha_{\min}}$ for all large enough $t$ . Hence, we can find a convergent subsequence such that $\widehat{\Sigma}^{(t)}_{l}\rightarrow\Sigma_{l}^{*}$ for each $l\in[M]$ when $t\to\infty$ along this subsequence. We arrive at

[TABLE]

or, in other words, we can find a pair of random variables $(X^{*},U^{*})$ with $|{\cal U}^{*}|\leq M$ such that $V(\Sigma)=s_{\epsilon,\delta}(X^{*}|U^{*})$ . This completes the proof. ∎

Lemma 4.4.

Consider random variables $(X_{1},X_{2},U)$ such that $(X_{1},X_{2})\in{\cal P}(2\mathbf{r})$ for some $\mathbf{r}$ -partition of $n>0$ . Define new random variables $X_{+}$ and $X_{-}$ via

$$X_{+}:=\frac{X_{1}+X_{2}}{\sqrt{2}},\qquad X_{-}:=\frac{X_{1}-X_{2}}{\sqrt{2}}.$$

Then $s_{\epsilon,\delta}(X_{1},X_{2}|U)=s_{\epsilon,\delta}(X_{+},X_{-}|U)$ .

Proof.

We have the equality

[TABLE]

Further, defining $W_{i+}:=\frac{W_{i1}+W_{i2}}{\sqrt{2}}$ , $W_{i-}:=\frac{W_{i1}-W_{i2}}{\sqrt{2}}$ , $Z_{j+}:=\frac{Z_{j1}+Z_{j2}}{\sqrt{2}}$ , and $Z_{j-}:=\frac{Z_{j1}-Z_{j2}}{\sqrt{2}}$ , we have

[TABLE]

and

[TABLE]

$(W_{1},W_{2},Z_{1},Z_{2})$ and $(W_{+},W_{-},Z_{+},Z_{-})$ are equal in distribution. Multiplying the equations in (49) by $d_{i}$ and those in (4.3) by $c_{j}$ and subtracting the sum of the latter from the sum of the former, we may conclude that $s_{\epsilon,\delta}(X_{1},X_{2}|U)=s_{\epsilon,\delta}(X_{+},X_{-}|U)$ . ∎
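The mechanism behind Lemma 4.4 is that the map $(x_{1},x_{2})\mapsto((x_{1}+x_{2})/\sqrt{2},(x_{1}-x_{2})/\sqrt{2})$ is orthogonal, so it preserves joint differential entropy (since $h(QX)=h(X)+\log|\det Q|$ ) and maps standard normal noise to standard normal noise. The sketch below illustrates both facts in the Gaussian case only; the covariance is randomly generated and the helper name `gaussian_entropy` is hypothetical.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (in nats) of N(0, cov)."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


r = 3
I = np.eye(r)
# Q maps the stacked vector (x1, x2) to ((x1 + x2)/sqrt(2), (x1 - x2)/sqrt(2)).
Q = np.block([[I, I], [I, -I]]) / np.sqrt(2)

rng = np.random.default_rng(2)
G = rng.standard_normal((2 * r, 2 * r))
cov_12 = G @ G.T + 0.1 * np.eye(2 * r)   # covariance of a (possibly dependent) pair
cov_pm = Q @ cov_12 @ Q.T                # covariance of the rotated pair

h_12 = gaussian_entropy(cov_12)
h_pm = gaussian_entropy(cov_pm)

# Covariance of (W_+, W_-) when (W_1, W_2) is standard normal.
noise_cov_pm = Q @ np.eye(2 * r) @ Q.T
```

Here `h_12` and `h_pm` coincide and `noise_cov_pm` is the identity, which mirrors the two ingredients of the proof: entropy invariance under the rotation, and equality in distribution of the rotated noise.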

Lemma 4.5.

Fix $\epsilon,\delta>0$ . Let the random variables $X^{*}$ and $U^{*}$ be as in Lemma 4.3; i.e., satisfying the equality $V(\Sigma)=s_{\epsilon,\delta}(X^{*}|U^{*})$ , and with $|{\cal U}^{*}|\leq M$ . Consider two independent and identically distributed copies of $(X^{*},U^{*})$ , denoted by $(X_{1},U_{1})$ and $(X_{2},U_{2})$ . Define new random variables $X_{+}$ and $X_{-}$ as follows:

$$X_{+}:=\frac{X_{1}+X_{2}}{\sqrt{2}},\qquad X_{-}:=\frac{X_{1}-X_{2}}{\sqrt{2}}.$$

Also, define $U:=(U_{1},U_{2})$ . Then the following results hold:

(a) $X_{+}$ and $X_{-}$ are conditionally independent given $U$ ,

(b) $V(\Sigma)=s_{\epsilon,\delta}(X_{+}|U)$ and $V(\Sigma)=s_{\epsilon,\delta}(X_{-}|U)$ .

Proof.

We have the following sequence of inequalities:

[TABLE]

Here $(a)$ follows from the assumption that $s_{\epsilon,\delta}(X^{*}|U^{*})=V(\Sigma)$ . Equality $(b)$ follows from the independence $(X_{1},U_{1})\perp\!\!\!\perp(X_{2},U_{2})$ . Equality $(c)$ holds because of Lemma 4.4. Inequality $(d)$ follows from the definition of $S_{\epsilon,\delta}(\cdot)$ . Inequality $(e)$ follows from the subadditivity result in Lemma 4.1. Finally, inequality $(f)$ follows from the definition in equation (44), and the fact that $X_{+}$ and $X_{-}$ have the same covariance as $X^{*}$ , which is bounded above by $\Sigma$ in the positive semidefinite partial order.

Since the first and last expressions match, all the inequalities in the above sequence of inequalities must be equalities. In particular, equalities $(d)$ and $(e)$ combined with Lemma 4.2 imply that $X_{+}\perp\!\!\!\perp X_{-}$ conditioned on $(U_{1},U_{2})$ , thus establishing part (a) of the lemma. Lemma 4.2 also gives $s_{\epsilon,\delta}(X_{+}|U_{1},U_{2})=S_{\epsilon,\delta}(X_{+})$ and $s_{\epsilon,\delta}(X_{-}|U_{1},U_{2})=S_{\epsilon,\delta}(X_{-})$ . Finally, equality in $(f)$ gives $S_{\epsilon,\delta}(X_{+})=V(\Sigma)$ and $S_{\epsilon,\delta}(X_{-})=V(\Sigma)$ . This completes the proof of part (b). ∎

Lemma 4.6.

There exists $G^{*}\sim{\cal N}(0,\Sigma^{*})\in{\cal P}(\mathbf{r})$ such that $\Sigma^{*}\preceq\Sigma$ and $V(\Sigma)=s_{\epsilon,\delta}(G^{*})$ . Furthermore, the random variable $G^{*}$ is the unique element of the set ${\cal P}(\mathbf{r})\cap\{X~{}:~{}\mathbb{E}XX^{T}\preceq\Sigma\}$ satisfying $s_{\epsilon,\delta}(X)=V(\Sigma)$ .

Proof.

Consider the setting as in Lemma 4.5. Using Lemma 4.5, we have that $X_{+}\perp\!\!\!\perp X_{-}$ conditioned on $U=(u_{1},u_{2})$ for any $u_{1},u_{2}\in{\cal U}^{*}$ . However, we also have $X_{1}\perp\!\!\!\perp X_{2}$ conditioned on $U=(u_{1},u_{2})$ . The characterization theorem for Gaussian distributions [34] implies that $X_{1}$ and $X_{2}$ must be Gaussian with identical covariance matrices, conditioned on $U=(u_{1},u_{2})$ . Recall that $(X_{1},U_{1})$ is independent of $(X_{2},U_{2})$ , and the covariance matrix of $X_{i}$ conditioned on $U=(u_{1},u_{2})$ is simply the covariance matrix of $X_{i}$ conditioned on $U_{i}=u_{i}$ for $i\in\{1,2\}$ . Since $u_{1}$ and $u_{2}$ may be chosen arbitrarily, we conclude that the covariance matrix of $X_{1}$ is some fixed $\Sigma^{*}\preceq\Sigma$ for all $u_{1}\in{\cal U}^{*}$ . Let $G^{*}\sim{\cal N}(0,\Sigma^{*})$ . Thus,

[TABLE]

To establish uniqueness, first note that it is enough to only consider Gaussian random variables $X$ satisfying $s_{\epsilon,\delta}(X)=V(\Sigma)$ , since our argument above shows that any $X$ that achieves this equality must be Gaussian. Now suppose that $G_{1}\sim{\cal N}(0,\Sigma_{1})$ and $G_{2}\sim{\cal N}(0,\Sigma_{2})$ are two distinct random variables such that $s_{\epsilon,\delta}(G_{1})=s_{\epsilon,\delta}(G_{2})=V(\Sigma)$ with $\Sigma_{1},\Sigma_{2}\preceq\Sigma$ . Define $(X,U)$ such that $X=G_{1}$ when $U=1$ and $X=G_{2}$ when $U=2$ . Suppose also that $U$ takes the values 1 and 2 with probability $1/2$ each. It is easy to check that $X$ satisfies the covariance constraint, and that $s_{\epsilon,\delta}(X|U)=V(\Sigma).$ As in Lemma 4.5, consider two i.i.d. copies $(X_{1},U_{1})$ and $(X_{2},U_{2})$ of $(X,U)$ . Lemma 4.5 states that conditioned on $(U_{1}=u_{1},U_{2}=u_{2})$ , we have $X_{1}+X_{2}\perp\!\!\!\perp X_{1}-X_{2}$ , for any values of $u_{1}$ and $u_{2}$ . Conditioned on $u_{1}=1$ and $u_{2}=2$ , we have $X_{1}+X_{2}=G_{1}+G_{2}$ and $X_{1}-X_{2}=G_{1}-G_{2}$ . This implies $G_{1}+G_{2}\perp\!\!\!\perp G_{1}-G_{2}$ , which is impossible since $\Sigma_{1}\neq\Sigma_{2}$ , and thus there cannot be two distinct Gaussian maximizers. ∎
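The last step of the uniqueness argument rests on a covariance computation: for independent zero-mean $G_{1}\sim{\cal N}(0,\Sigma_{1})$ and $G_{2}\sim{\cal N}(0,\Sigma_{2})$ , the cross-covariance of $G_{1}+G_{2}$ and $G_{1}-G_{2}$ equals $\Sigma_{1}-\Sigma_{2}$ , and jointly Gaussian vectors are independent only if this cross-covariance vanishes. A minimal sketch with two hypothetical covariance matrices and a hypothetical helper `cross_cov_sum_diff`:

```python
import numpy as np


def cross_cov_sum_diff(sigma1, sigma2):
    """Cross-covariance of (G1 + G2, G1 - G2) for independent zero-mean G1, G2."""
    r = sigma1.shape[0]
    I = np.eye(r)
    zero = np.zeros((r, r))
    joint = np.block([[sigma1, zero], [zero, sigma2]])  # covariance of (G1, G2)
    T = np.block([[I, I], [I, -I]])                     # (G1, G2) -> (G1 + G2, G1 - G2)
    cov = T @ joint @ T.T
    return cov[:r, r:]                                  # off-diagonal block


sigma1 = np.array([[2.0, 0.5], [0.5, 1.0]])
sigma2 = np.array([[1.0, 0.0], [0.0, 3.0]])

equal_case = cross_cov_sum_diff(sigma1, sigma1)     # identical covariances
distinct_case = cross_cov_sum_diff(sigma1, sigma2)  # distinct covariances
```

With identical covariances the cross-covariance is zero, while with distinct covariances it equals `sigma1 - sigma2` and is nonzero, so the sum and difference cannot be independent.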

Proof of Theorem 3.

We now complete the proof of Theorem 3. Recall the definition of $M_{g}$ :

$$M_{g}:=\sup_{X\in{\cal P}_{g}(\mathbf{r})}\;\sum_{i=1}^{k}d_{i}h(X_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}X).$$

Clearly, there is nothing to prove if $M_{g}$ is infinite, so we assume $M_{g}<\infty$ . Let $X\in{\cal P}(\mathbf{r})$ be an arbitrary random vector. By choosing a large enough $\Sigma$ such that $\mathbb{E}XX^{T}\preceq\Sigma$ , we may conclude that

[TABLE]

Let $G^{*}\sim{\cal N}(0,\Sigma^{*})\in{\cal P}(\mathbf{r})$ , where $\Sigma^{*}\preceq\Sigma$ , be the unique maximizer such that $s_{\epsilon,\delta}(G^{*})=V(\Sigma)$ , as in Lemma 4.6. Thus, we have the sequence of inequalities

[TABLE]

Here, inequality $(a)$ follows from the entropy inequality

[TABLE]

for all $j\in[m]$ . The inequality in $(b)$ is true because the random variable $\tilde{G^{*}}$ defined by $\tilde{G^{*}_{i}}:=G^{*}_{i}+\sqrt{\delta}W_{i}$ for $i\in[k]$ is a Gaussian random variable in ${\cal P}_{g}(\mathbf{r})$ . Thus, by the definition of $M_{g}$ , we must have

[TABLE]

Combining inequalities (51) and (52), we have

[TABLE]

Recall that $s_{\epsilon,\delta}(X)$ is given by

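$$s_{\epsilon,\delta}(X)=\sum_{i=1}^{k}d_{i}h(X_{i}+\sqrt{\delta}W_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}(X+\sqrt{\delta}W)+\sqrt{\epsilon}Z_{j}).$$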

If $X$ satisfies certain mild conditions (such as bounded second moments) provided in Lemma A.3, we have that

[TABLE]

This means that we may take the limit in inequality (53) as $\epsilon,\delta\to 0$ to conclude

[TABLE]

and conclude the proof of Theorem 3. ∎
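For Gaussian inputs the limit $\epsilon,\delta\to 0$ can be observed directly, since every entropy is a log-determinant. The sketch below assumes the form $s_{\epsilon,\delta}(X)=\sum_{i}d_{i}h(X_{i}+\sqrt{\delta}W_{i})-\sum_{j}c_{j}h(A_{j}(X+\sqrt{\delta}W)+\sqrt{\epsilon}Z_{j})$ , inferred from the entropy terms appearing in the proof of Lemma 4.3. The datum (matrices `A_list`, weights `c` and `d`, covariance blocks) is hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (in nats) of N(0, cov)."""
    dim = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (dim * np.log(2 * np.pi * np.e) + logdet)


def s_smoothed(sigma_blocks, A_list, c, d, eps, delta):
    """Assumed s_{eps,delta} evaluated at X ~ N(0, Diag(sigma_blocks))."""
    blocks = [S + delta * np.eye(S.shape[0]) for S in sigma_blocks]  # Cov(X_i + sqrt(delta) W_i)
    n = sum(S.shape[0] for S in blocks)
    full = np.zeros((n, n))
    pos = 0
    for S in blocks:                      # assemble the block-diagonal covariance
        size = S.shape[0]
        full[pos:pos + size, pos:pos + size] = S
        pos += size
    plus = sum(di * gaussian_entropy(S) for di, S in zip(d, blocks))
    minus = sum(cj * gaussian_entropy(A @ full @ A.T + eps * np.eye(A.shape[0]))
                for cj, A in zip(c, A_list))
    return plus - minus


# Hypothetical datum: n = 3, r = (1, 2), surjective maps with n_1 = 1 and n_2 = 2.
sigma_blocks = [np.array([[1.0]]), np.array([[2.0, 0.5], [0.5, 1.0]])]
A_list = [np.array([[1.0, 1.0, 0.0]]), np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])]
c, d = (1.0, 0.75), (1.0, 0.75)

s_limit = s_smoothed(sigma_blocks, A_list, c, d, 0.0, 0.0)
gaps = [abs(s_smoothed(sigma_blocks, A_list, c, d, t, t) - s_limit)
        for t in (1e-1, 1e-2, 1e-4, 1e-6)]
```

The entries of `gaps` shrink toward zero as the smoothing parameters decrease, which is the Gaussian instance of the continuity that Lemma A.3 provides for general $X$ .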

5 Conditions for $M(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})<\infty$

Theorem 3 shows that it is enough to find necessary and sufficient conditions for $M_{g}(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ to be finite, since $M=M_{g}$ . We prove Theorem 4 by finding necessary conditions on the BL-EPI datum for such finiteness in Claim 5.1, and showing that the necessary conditions are also sufficient in Claim 5.2.

Claim 5.1.

If $M_{g}(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ is finite, then the conditions in equations (14) and (15) must be satisfied.

Proof.

The necessity of the condition in equation (15) is seen as follows. Choose $Z\sim\lambda{\cal N}(0,I_{n\times n})$ for some $\lambda>0$ . It is easy to see that $\sum_{i=1}^{k}d_{i}h(Z_{i})-\sum_{j=1}^{m}c_{j}h(A_{j}Z)$ scales as $\left(\sum_{i=1}^{k}d_{i}r_{i}-\sum_{j=1}^{m}c_{j}n_{j}\right)\log(\lambda)$ as a function of $\lambda$ as $\lambda\to\infty$ . Since $\lambda$ is arbitrary, the above expression is finite only if the condition in equation (15) is satisfied.

To show that the condition in equation (14) is necessary, let $V$ be a subspace of $\mathbb{R}^{n}$ of $\mathbf{r}$ -product form. Consider a Gaussian random variable $Z:=(Z_{V},Z_{V^{\perp}})$ such that $Z_{V}\perp\!\!\!\perp Z_{V^{\perp}}$ , and $Z_{V}$ is supported on $V$ and $Z_{V^{\perp}}$ is supported on $V^{\perp}$ . Furthermore, assume $Z_{V}\sim{\cal N}(0,\lambda I_{\dim(V)\times\dim(V)})$ and $Z_{V^{\perp}}\sim{\cal N}(0,I_{\dim(V^{\perp})\times\dim(V^{\perp})})$ . Taking the limit as $\lambda\to\infty$ and gathering the coefficients of $\log\lambda$ , we see that $M_{g}(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ scales as

[TABLE]

as $\lambda\to\infty$ . Thus, $M_{g}$ is finite only if the condition in equation (14) is satisfied. ∎
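The scaling argument behind equation (15) can be checked numerically for Gaussians. The following is a minimal sketch, not part of the original proof: it borrows the maps $A_{1},A_{2},A_{3}$ and $\mathbf{r}=(2,1)$ from Section 6, the weights $\mathbf{c}$ and $\mathbf{d}$ are hypothetical values chosen for illustration, and the helper names are our own. It uses the closed form $h=\frac{1}{2}\log\det(2\pi e\,\Sigma)$ for a Gaussian with covariance $\Sigma$.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of N(0, cov) for a positive definite cov."""
    cov = np.atleast_2d(cov)
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

def functional(lam, A_list, c, blocks, d):
    """sum_i d_i h(Z_i) - sum_j c_j h(A_j Z) for Z = lam * N(0, I)."""
    n = sum(len(b) for b in blocks)
    cov = lam**2 * np.eye(n)
    lhs = sum(di * gaussian_entropy(cov[np.ix_(b, b)]) for di, b in zip(d, blocks))
    rhs = sum(cj * gaussian_entropy(A @ cov @ A.T) for cj, A in zip(c, A_list))
    return lhs - rhs

# Illustrative datum: maps from Section 6, weights are assumptions for this example.
A_list = [np.array([[1., 0., 1.], [0., 1., 1.]]),
          np.array([[1., 0., 0.]]),
          np.array([[0., 1., 0.]])]
blocks = [[0, 1], [2]]  # coordinates of Z_1 and Z_2, i.e. r = (2, 1)

def slope(c, d):
    """Coefficient of log(lam), estimated by a difference quotient."""
    f1 = functional(10.0, A_list, c, blocks, d)
    f2 = functional(1000.0, A_list, c, blocks, d)
    return (f2 - f1) / (np.log(1000.0) - np.log(10.0))

# sum_i d_i r_i = 3 = sum_j c_j n_j: the log(lam) term cancels.
balanced = slope(c=[1.0, 0.5, 0.5], d=[1.25, 0.5])
# sum_i d_i r_i = 3.5 but sum_j c_j n_j = 3: the functional grows like 0.5 * log(lam).
unbalanced = slope(c=[1.0, 0.5, 0.5], d=[1.5, 0.5])
```

In the balanced case the functional does not depend on $\lambda$ at all, whereas in the unbalanced case it is unbounded as $\lambda\to\infty$, in line with the necessity argument above.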

The proof of sufficiency of the conditions in equations (14) and (15) relies on two lemmas which we prove below.

Lemma 5.1.

Let $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ be a BL-EPI datum. Let $U:=(U_{1},\dots,U_{k})$ be an arbitrary $\mathbf{r}$ -product form subspace such that $\dim(U_{i})=\tilde{r}_{i}\leq r_{i}$ for $i\in[k]$ . Let $\tilde{\mathbf{r}}:=(\tilde{r}_{1},\dots,\tilde{r}_{k})$ and $\tilde{\mathbf{r}}^{c}:=\mathbf{r}-\tilde{\mathbf{r}}$ . Define two BL-EPI data as follows:

(a) $(\tilde{\mathbf{A}},\mathbf{c},\tilde{\mathbf{r}},\mathbf{d})$ is a BL-EPI datum defined on $U$. For each $j\in[m]$, define the linear maps $\tilde{A}_{j}:U\to(A_{j}U)$ by $\tilde{A}_{j}x=A_{j}x$ for $x\in U$.

(b) $(\tilde{\tilde{\mathbf{A}}},\mathbf{c},\tilde{\mathbf{r}}^{c},\mathbf{d})$ is a BL-EPI datum defined on $U^{\perp}$. For $j\in[m]$, the linear maps $\tilde{\tilde{A}}_{j}:U^{\perp}\to(A_{j}U)^{\perp}$ are defined by

[TABLE]

We also define the linear maps $\Gamma_{j}:U^{\perp}\to(A_{j}U)$ as

[TABLE]

Here $\Pi_{V}$ denotes the orthogonal projection onto a subspace $V$. Note that, for $x\in U^{\perp}$, $A_{j}x=\tilde{\tilde{A}}_{j}x+\Gamma_{j}x$ is an orthogonal decomposition.

Then the following relation holds:

[TABLE]
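The orthogonal decomposition used in Lemma 5.1 can be illustrated numerically. The sketch below assumes, based on the stated domains and codomains, that $\tilde{\tilde{A}}_{j}=\Pi_{(A_{j}U)^{\perp}}A_{j}$ and $\Gamma_{j}=\Pi_{A_{j}U}A_{j}$ on $U^{\perp}$; the matrix `A`, the subspace `U`, and the helper `projector` are hypothetical choices made for this example.

```python
import numpy as np

def projector(basis, tol=1e-12):
    """Orthogonal projector onto the column span of `basis`."""
    u, s, _ = np.linalg.svd(basis, full_matrices=False)
    q = u[:, s > tol]          # orthonormal basis of the column span
    return q @ q.T

A = np.array([[1., 0., 1.], [0., 1., 1.]])   # one surjective map A_j : R^3 -> R^2
U = np.array([[1.], [0.], [0.]])             # U = span{e_1}, product form for r = (2, 1)

P_U_perp = np.eye(3) - projector(U)          # projector onto U^perp
P_AU = projector(A @ U)                      # projector onto A_j U
P_AU_perp = np.eye(2) - P_AU                 # projector onto (A_j U)^perp

# Both maps are written as matrices on R^3 that vanish on U.
A_tt = P_AU_perp @ A @ P_U_perp              # plays the role of \tilde{\tilde{A}}_j
Gamma = P_AU @ A @ P_U_perp                  # plays the role of \Gamma_j

x = P_U_perp @ np.array([0.3, -1.2, 0.7])    # an arbitrary point of U^perp
pieces_sum = A_tt @ x + Gamma @ x
inner = float((A_tt @ x) @ (Gamma @ x))      # the two pieces are orthogonal
rank_A_tt = np.linalg.matrix_rank(A_tt)
dim_AU = np.linalg.matrix_rank(A @ U)
```

Here `pieces_sum` reproduces `A @ x`, the inner product of the two pieces is zero, and the rank of `A_tt` equals $n_{j}-\dim(A_{j}U)$, which is the surjectivity property used when the complementary datum is shown to be valid.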

Remark 5.1.

Note that it may happen that $\dim(U_{i})=0$ for some $i\in[k]$ . It may also happen that for some $j\in[m]$ , we have $\dim((A_{j}U)^{\perp})=0$ . We do not rule out such cases, and keep our notation the same by instead defining entropy on a 0-dimensional subspace as 0.

Proof of Lemma 5.1.

By definition, the linear transformations in $\tilde{\mathbf{A}}$ and $\tilde{\tilde{\mathbf{A}}}$ are surjective. Also, $\sum_{i}\tilde{r}_{i}=\dim(U)$ and $\sum_{i}\tilde{r}^{c}_{i}=\dim(U^{\perp})$. This verifies that $(\tilde{\mathbf{A}},\mathbf{c},\tilde{\mathbf{r}},\mathbf{d})$ and $(\tilde{\tilde{\mathbf{A}}},\mathbf{c},\tilde{\mathbf{r}}^{c},\mathbf{d})$ are indeed valid BL-EPI data on $U$ and $U^{\perp}$, respectively. Every vector $x\in\mathbb{R}^{n}$ may be expressed as $x=\Pi_{U}x+\Pi_{U^{\perp}}x:=\tilde{x}+\tilde{\tilde{x}}$. We use the notation $\tilde{x}=(\tilde{x}_{1},\dots,\tilde{x}_{k})$ where $\tilde{x}_{i}=\Pi_{U_{i}}x_{i}$, and similarly for $\tilde{\tilde{x}}_{i}$. We have the equality

[TABLE]

For any $X\in{\cal P}(\mathbf{r})$ ,

[TABLE]

Taking the supremum over all $X\in{\cal P}(\mathbf{r})$ completes the proof. ∎

Lemma 5.2.

Suppose that a BL-EPI datum $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ satisfies the conditions in equations (14) and (15), and suppose that $U$ is an $\mathbf{r}$ -product form critical subspace. Then the BL-EPI data $(\tilde{\mathbf{A}},\mathbf{c},\tilde{\mathbf{r}},\mathbf{d})$ and $(\tilde{\tilde{\mathbf{A}}},\mathbf{c},\tilde{\mathbf{r}}^{c},\mathbf{d})$ defined as in Lemma 5.1 also satisfy the conditions in equations (14) and (15).

Proof.

Verifying the conditions for $(\tilde{\mathbf{A}},\mathbf{c},\tilde{\mathbf{r}},\mathbf{d})$ is immediate: the condition in equation (14) restricted to $\tilde{\mathbf{r}}$-product form subspaces of $U$ yields the first condition, and the criticality of $U$ yields the second condition.

For $j\in[m]$ , it is not hard to verify that $\dim(\tilde{\tilde{A}}_{j}U^{\perp})$ is $n_{j}-\dim(\tilde{A}_{j}U)$ . We may now check the second condition for $(\tilde{\tilde{\mathbf{A}}},\mathbf{c},\tilde{\mathbf{r}}^{c},\mathbf{d})$ by observing the equality

[TABLE]

using the criticality of $U$ and the fact that $\sum_{i=1}^{k}d_{i}r_{i}=\sum_{j=1}^{m}c_{j}n_{j}$. Let $V$ be an arbitrary $\tilde{\mathbf{r}}^{c}$-product form subspace of $U^{\perp}$. Consider the new subspace $V_{+}=V\oplus U\subset\mathbb{R}^{n}$, which is the direct sum of the subspace $V$ with the subspace $U$. Note that $V_{+}$ is an $\mathbf{r}$-product form subspace of $\mathbb{R}^{n}$. Using the condition in equation (14) for $V_{+}$, we have

[TABLE]

Note that $\dim(V_{+i})=\dim(V_{i})+\dim(U_{i})$, for all $1\leq i\leq k$. Moreover, $\dim(A_{j}V_{+})=\dim(A_{j}U)+\dim(\tilde{\tilde{A}}_{j}V)$. Substituting these equalities in the above inequality, we arrive at

[TABLE]

The criticality of $U$ then implies

[TABLE]

and this completes the proof. ∎

We are now in a position to prove the following sufficiency result:

Claim 5.2.

If the conditions in equations (14) and (15) are satisfied, then $M(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ is finite.

Proof.

The proof proceeds via a double induction on the dimension $n$ and the number of linear maps $m$ . We first prove the result for $n=1$ and arbitrary $m$ , and for $m=1$ and arbitrary $n$ . For $n=1$ , it must be that $\mathbf{r}=\{1\}$ and $\mathbf{d}=\{d_{1}\}$ . The conditions in equations (14) and (15) imply that $d_{1}=\sum_{j=1,n_{j}>0}^{m}c_{j}$ , because $n_{j}>0\Longrightarrow n_{j}=1$ . Thus, $M(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ equals

[TABLE]

since $h(A_{j}X)=h(X)+\log|A_{j}|$ for all $j\in[m]$ such that $n_{j}>0$ , and $A_{j}$ is a nonzero scalar for each such $j$ .
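Written out, the $n=1$ computation is a one-line cancellation. The display below is a sketch under the sign convention used in the proof of Claim 5.1 (the functional $\sum_{i}d_{i}h(X_{i})-\sum_{j}c_{j}h(A_{j}X)$) and the convention of Remark 5.1 that terms with $n_{j}=0$ contribute zero entropy; it is our own restatement rather than a reproduction of the paper's display.

```latex
\begin{align*}
d_{1}h(X)-\sum_{j:\,n_{j}>0}c_{j}h(A_{j}X)
&= d_{1}h(X)-\sum_{j:\,n_{j}>0}c_{j}\bigl(h(X)+\log|A_{j}|\bigr)\\
&= \Bigl(d_{1}-\sum_{j:\,n_{j}>0}c_{j}\Bigr)h(X)-\sum_{j:\,n_{j}>0}c_{j}\log|A_{j}|\\
&= -\sum_{j:\,n_{j}>0}c_{j}\log|A_{j}| .
\end{align*}
```

The final expression does not depend on the distribution of $X$, so the supremum over $X$ is finite.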

Now fix $m=1$ and let $n_{1}>0$ , $k>0$ , $\mathbf{r}$ , $\mathbf{d}$ , $c_{1}$ , and $n=\sum_{i=1}^{k}r_{i}$ be arbitrary, subject to satisfying the conditions in equations (14) and (15). We write

[TABLE]

where $A_{1i}$ is an $n_{1}\times r_{i}$ matrix for $1\leq i\leq k$ (and $A_{1}$ is an $n_{1}\times n$ matrix). Recall that, by assumption, $d_{i}>0$ for $1\leq i\leq k$ and $c_{1}>0$.

Let $\mathcal{N}(A_{1})$ denote the null space of $A_{1}$. For every $\mathbf{r}$-product form subspace $V:=V_{1}\times\ldots\times V_{k}$ we must have $\mathcal{N}(A_{1})\cap V_{i}=\{0\}$ for all $1\leq i\leq k$. This is because if we have $0\neq v_{i}\in\mathcal{N}(A_{1})\cap V_{i}$ for some $1\leq i\leq k$, then letting $V_{i}:=\mbox{span}(\{v_{i}\})$ and $V_{j}=\{0\}$ for $1\leq j\neq i\leq k$, the corresponding $\mathbf{r}$-product form subspace $V:=V_{1}\times\ldots\times V_{k}$ will violate the condition $\sum_{i=1}^{k}d_{i}s_{i}\leq c_{1}\mbox{dim}(A_{1}V)$, where $s_{i}=1=\mbox{dim}(V_{i})$ and $s_{j}=0=\mbox{dim}(V_{j})$ for $1\leq j\neq i\leq k$: the left-hand side equals $d_{i}>0$, while $\mbox{dim}(A_{1}V)=0$.

We can therefore assume that $\mbox{rk}(A_{1i})=r_{i}$ for $1\leq i\leq k$ . Under this assumption, we will now show that $M_{g}<\infty$ , where $M_{g}$ denotes the supremum of

[TABLE]

over independent $X_{i}\sim\mathcal{N}(0,\Sigma_{i})$ taking values in $\mathbb{R}^{r_{i}}$ with $\Sigma_{i}$ positive definite for each $1\leq i\leq k$, and where

[TABLE]

We have

[TABLE]

for $1\leq i\leq k$ , and

[TABLE]

It is therefore equivalent to show that the supremum of

[TABLE]

over $\Sigma_{i}\in\mathbb{R}^{r_{i}\times r_{i}}$ positive definite for each $1\leq i\leq k$ is finite.

Let $A_{1i}=W_{i}\Lambda_{i}U_{i}$ be a singular value decomposition of $A_{1i}$ for $1\leq i\leq k$. Since $\mbox{rk}(A_{1i})=r_{i}$ by assumption, here $\Lambda_{i}$ is a diagonal $r_{i}\times r_{i}$ matrix with strictly positive diagonal entries, $U_{i}$ is an $r_{i}\times r_{i}$ orthogonal matrix and $W_{i}$ is an $n_{1}\times r_{i}$ matrix with orthonormal columns. Note that the span of the columns of $W_{i}$ equals the range space of $A_{1i}$.

With $\tilde{\Sigma}_{i}$ denoting $U_{i}\Sigma_{i}U_{i}^{T}$ for $1\leq i\leq k$ , it is equivalent to show that the supremum of

[TABLE]

over $\tilde{\Sigma}_{i}\in\mathbb{R}^{r_{i}\times r_{i}}$ positive definite for each $1\leq i\leq k$ is finite.

Note that the entries of $\Lambda_{i}$ depend only on $A_{1i}$ , which is fixed, and note that the $d_{i}$ are fixed. Therefore, with $\hat{\Sigma}_{i}$ denoting $\Lambda_{i}\tilde{\Sigma}_{i}\Lambda_{i}$ , it is equivalent to show that the supremum of

[TABLE]

over $\hat{\Sigma}_{i}\in\mathbb{R}^{r_{i}\times r_{i}}$ positive definite for each $1\leq i\leq k$ is finite. Let $\hat{\Sigma}_{i}=\hat{Q}_{i}\Pi_{i}\hat{Q}_{i}^{T}$ be the spectral decomposition of $\hat{\Sigma}_{i}$ and let $\sigma_{i1},\ldots,\sigma_{ir_{i}}$ denote the eigenvalues of $\hat{\Sigma}_{i}$ in any order. By assumption these are all strictly positive. Let

[TABLE]

denote the ordered list of all the distinct values among these eigenvalues (note that $n=\sum_{i=1}^{k}r_{i}$ , so here $1\leq n^{\prime}\leq n$ ).

Starting with $\sigma_{n^{\prime}}$ and working towards the larger eigenvalues step by step, we can build up each $\hat{\Sigma}_{i}$, for $1\leq i\leq k$, in layer-cake fashion as

[TABLE]

where each $\hat{\Sigma}_{il}$ for $1\leq l\leq n^{\prime}$ is a positive semidefinite matrix, with a spectral decomposition given by $\hat{Q}_{i}\Pi_{il}\hat{Q}_{i}^{T}$, and each of whose eigenvalues is either $0$ or $\sigma_{l}-\sigma_{l+1}$ (recalling the convention that $\sigma_{n^{\prime}+1}=0$). Thus each $\hat{\Sigma}_{il}$ corresponds to a subspace of $\mathbb{R}^{r_{i}}$, whose dimension we denote as $s_{il}$. Note that $s_{in^{\prime}}=r_{i}$ and $s_{il}$ is nonincreasing as $l$ decreases, but it can become $0$ for $l<n^{\prime}$; however we have $s_{i1}>0$ for at least one choice of $1\leq i\leq k$. We also have

[TABLE]

Observe that $\log\frac{\sigma_{l}}{\sigma_{l+1}}$ is strictly positive for $1\leq l\leq n^{\prime}-1$ .
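The layer-cake construction can be made concrete for a single block. The sketch below is illustrative only: the function `layer_cake`, the test matrix, and the pooled list `levels` are our own choices, and the telescoped log-determinant identity checked at the end, $\log\det\hat{\Sigma}_{i}=r_{i}\log\sigma_{n^{\prime}}+\sum_{l=1}^{n^{\prime}-1}s_{il}\log\frac{\sigma_{l}}{\sigma_{l+1}}$, is our reading of how the layers are used rather than a quotation of the paper's display.

```python
import numpy as np

def layer_cake(sigma_hat, levels, tol=1e-9):
    """Split a positive definite matrix into layers indexed by `levels`.

    `levels` lists distinct values sigma_1 > ... > sigma_{n'} > 0 that contain
    every eigenvalue of `sigma_hat`; the convention sigma_{n'+1} = 0 is applied.
    Returns the layers and the dimensions s_l of their ranges.
    """
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)
    padded = list(levels) + [0.0]
    layers, dims = [], []
    for l, level in enumerate(levels):
        keep = eigvals >= level - tol            # eigenvectors with eigenvalue >= sigma_l
        q = eigvecs[:, keep]
        layers.append((padded[l] - padded[l + 1]) * (q @ q.T))
        dims.append(int(keep.sum()))
    return layers, dims

rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # a random orthogonal matrix
eigs = np.array([4.0, 4.0, 0.5])                    # eigenvalues of this block
sigma_hat = q @ np.diag(eigs) @ q.T

# Pooled distinct values; 2.0 stands in for an eigenvalue of some other block.
levels = [4.0, 2.0, 0.5]
layers, dims = layer_cake(sigma_hat, levels)

reconstructed = sum(layers)                          # should equal sigma_hat
logdet = np.linalg.slogdet(sigma_hat)[1]
telescoped = dims[-1] * np.log(levels[-1]) + sum(
    dims[l] * np.log(levels[l] / levels[l + 1]) for l in range(len(levels) - 1)
)
```

For this block the dimensions are `[2, 2, 3]`: the last entry equals $r_{i}=3$ and the sequence is nonincreasing as $l$ decreases, as described above.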

Let $\hat{V}_{il}$ denote the subspace of $\mathbb{R}^{r_{i}}$ corresponding to $\hat{\Sigma}_{il}$ , i.e. the subspace spanned by the eigenvectors of $\hat{\Sigma}_{il}$ . Then $\tilde{V}_{il}:=\Lambda_{i}^{-1}\hat{V}_{il}$ is the subspace corresponding to $\tilde{\Sigma}_{il}$ in the same sense, where $\tilde{\Sigma}_{il}:=\Lambda_{i}^{-1}\hat{\Sigma}_{il}\Lambda_{i}^{-1}$ , and $V_{il}:=U_{i}^{T}\tilde{V}_{il}$ is the subspace corresponding to $\Sigma_{il}$ in the same sense, where $\Sigma_{il}:=U_{i}^{T}\tilde{\Sigma}_{il}U_{i}$ . Note that

[TABLE]

By assumption, for each $1\leq l\leq n^{\prime}$ we therefore have

[TABLE]

where $V_{l}:=V_{1l}\times\ldots\times V_{kl}$ is an $\mathbf{r}$ -product subspace of $\mathbb{R}^{n}$ .

For each $1\leq l\leq n^{\prime}$ , since $\sum_{i=1}^{k}A_{1i}\Sigma_{il}A_{1i}^{T}=\sum_{i=1}^{k}W_{i}\hat{\Sigma}_{il}W_{i}^{T}$ , we see that the subspace corresponding to $\sum_{i=1}^{k}W_{i}\hat{\Sigma}_{il}W_{i}^{T}$ is $A_{1}V_{l}$ . In particular, the subspace corresponding to $\sum_{i=1}^{k}W_{i}\hat{\Sigma}_{in^{\prime}}W_{i}^{T}$ is $\mathbb{R}^{n_{1}}=A_{1}V_{n^{\prime}}=A_{1}\mathbb{R}^{n}$ .

We also note that for each $1\leq i\leq k$ we have

[TABLE]

Since $\hat{\Sigma}_{i}=\hat{Q}_{i}\Pi_{i}\hat{Q}_{i}^{T}=\sum_{m=1}^{r_{i}}\sigma_{im}\hat{q}_{im}\hat{q}_{im}^{T}$ , let us relabel the eigenvectors into $b_{im}$ (according to decreasing values of the eigenvalues) such that we have

[TABLE]

where we recall that $\sigma_{n^{\prime}+1}=0$ by definition. We can also write

[TABLE]

where $\tilde{b}_{iu_{i}}:=W_{i}b_{iu_{i}}$ for $1\leq i\leq k$ and $1\leq u_{i}\leq r_{i}$ . Note that $\tilde{b}_{iu_{i}}\in\mathbb{R}^{n_{1}}$ .

Now we have

[TABLE]

where $M_{l}:=\sum_{i=1}^{k}\sum_{u_{i}=1}^{s_{il}}\tilde{b}_{iu_{i}}\tilde{b}_{iu_{i}}^{T}$. Note that the subspace corresponding to $M_{l}$ is $A_{1}V_{l}$. Since the range space of $M_{l}$ is non-decreasing in $l$, there exists an orthonormal basis $\tilde{q}_{1},\ldots,\tilde{q}_{n_{1}}$ for $\mathbb{R}^{n_{1}}$ such that the range space of $M_{l}$ matches the span of $\{\tilde{q}_{i}\}_{i\in S_{l}}$ for some appropriate $S_{l}\subseteq[1:n_{1}]$. Thus $\textrm{dim}(A_{1}V_{l})=|S_{l}|$.

By construction we have $S_{1}\subseteq S_{2}\subseteq\cdots\subseteq S_{n^{\prime}}=[1:n_{1}]$. Let $C_{l}=\sum_{i\in S_{l}}\tilde{q}_{i}\tilde{q}_{i}^{T}=\tilde{Q}\Theta_{l}\tilde{Q}^{T}$ where $\tilde{Q}$ is the orthogonal matrix formed by the $\tilde{q}_{i}$'s and $\Theta_{l}$ is a diagonal matrix with diagonal entries being $0$ or $1$, where $1$ occurs at the indices corresponding to the membership in $S_{l}$.

We now claim that there is a positive constant $\delta^{2}>0$ depending only on $W_{1},\ldots,W_{k}$ (and in particular not depending on the $(\hat{\Sigma}_{i},1\leq i\leq k)$ or the choices of the bases $\{b_{i1},b_{i2},\ldots,b_{ir_{i}}\}$ for $1\leq i\leq k$) such that, for all $1\leq l\leq n^{\prime}$, we have

[TABLE]

This is a consequence of Lemma B.1 and is established in Corollary B.1.

We therefore have

[TABLE]

From this it follows that

[TABLE]

for a fixed constant $\kappa$. Here, to justify step (a), note that due to the nested nature of the sets $S_{l}$, $\sum_{l=1}^{n^{\prime}}(\sigma_{l}-\sigma_{l+1})\Theta_{l}$ is a diagonal matrix with $\mbox{dim}(A_{1}V_{l})-\mbox{dim}(A_{1}V_{l-1})$ entries equal to $\sigma_{l}$. We take $\mbox{dim}(A_{1}V_{0})=0$.

Since $\sum_{i=1}^{k}d_{i}s_{il}\leq c_{1}\mbox{dim}(A_{1}V_{l})$ and $\log\frac{\sigma_{l}}{\sigma_{l+1}}$ is strictly positive for $1\leq l\leq n^{\prime}-1$, and since $\sum_{i=1}^{k}d_{i}r_{i}=c_{1}n_{1}$, we can conclude that

[TABLE]

for all choices of $\hat{\Sigma}_{i}\in\mathbb{R}^{r_{i}\times r_{i}}$ positive definite for each $1\leq i\leq k$. This establishes what was desired, when $m=1$.

We have shown that the claim is true for $n=1$ and all $m$. Assume that the claim is true for all $n<n_{0}$ and all $m$. Our goal is to establish the claim for $n=n_{0}$ and all $m>0$. To do so, we induct on $m$. The case of $n=n_{0}$ and $m=1$ follows from our calculations above. Now we assume that the claim is true for $n=n_{0}$ and all $m<m_{0}$, and show that it also holds for $n=n_{0}$ and $m=m_{0}$.

Let $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ be a BL-EPI datum in $\mathbb{R}^{n_{0}}$ with $m=m_{0}$. We may assume that $n_{j}>0$ for all $j\in[m]$, since otherwise we could have treated the scenario as a BL-EPI datum in $\mathbb{R}^{n_{0}}$ with $m<m_{0}$, which is already covered by the inductive hypothesis. For fixed $\mathbf{A}$, $\mathbf{r}$, and $\mathbf{d}$, consider the function defined on $\mathbf{c}\in\mathbb{R}_{+}^{m_{0}}$ as

[TABLE]

Since $M$ is a pointwise supremum of linear functions, $M$ is convex. Let ${\cal K}$ be the region of all $\mathbf{c}\in\mathbb{R}_{+}^{m_{0}}$ such that $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ satisfies the conditions in equations (14) and (15). Note that ${\cal K}$ is a compact, convex set. By Claim 5.1, we have that $M$ takes the value $+\infty$ outside ${\cal K}$. We wish to show that $M$ takes finite values everywhere on ${\cal K}$. Since $M$ is convex and ${\cal K}$ is compact and convex, every point of ${\cal K}$ is a convex combination of boundary points, so it is enough to show finiteness of $M$ at all points on the boundary of ${\cal K}$. Since $n_{j}>0$ for all $j\in[m]$, a point $\mathbf{c}$ is a boundary point of ${\cal K}$ if and only if at least one of the following two conditions is satisfied: (1) $c_{j_{0}}=0$ for some $j_{0}\in[m]$; or (2) there exists a proper $\mathbf{r}$-product form subspace of $\mathbb{R}^{n_{0}}$ that is critical. If a boundary point satisfies (1), then our induction assumption (on $m$) ensures the finiteness of $M$ evaluated at that BL-EPI datum, since we could have treated the scenario as a BL-EPI datum in $\mathbb{R}^{n_{0}}$ with $m<m_{0}$.

Now consider a boundary point that satisfies (2), assuming that $c_{j}\neq 0$ for all $j\in[m]$. Let $V=(V_{1},\dots,V_{k})$ be an $\mathbf{r}$-product form critical subspace of $\mathbb{R}^{n_{0}}$; i.e., a subspace that satisfies the equality

[TABLE]

with $\dim(V)<n_{0}.$ Lemma 5.1 shows that given any $\mathbf{r}$ -product form subspace $V$ , it is possible to define BL-EPI data on $V$ and $V^{\perp}$ in terms of the original BL-EPI datum $(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ that satisfy a certain subadditivity property. In particular, if the datum on $V$ is denoted by $(\tilde{\mathbf{A}},\mathbf{c},\tilde{\mathbf{r}},\mathbf{d})$ and that on $V^{\perp}$ is denoted by $(\tilde{\tilde{\mathbf{A}}},\mathbf{c},\tilde{\mathbf{r}}^{c},\mathbf{d})$ , then Lemma 5.1 states that

[TABLE]

Thus, to show that $M(\mathbf{A},\mathbf{c},\mathbf{r},\mathbf{d})$ is finite, it is enough to show that $M(\tilde{\mathbf{A}},\mathbf{c},\tilde{\mathbf{r}},\mathbf{d})$ and $M(\tilde{\tilde{\mathbf{A}}},\mathbf{c},\tilde{\mathbf{r}}^{c},\mathbf{d})$ are finite. Lemma 5.2 asserts that since $V$ is a critical $\mathbf{r}$-product form subspace, the BL-EPI data $(\tilde{\mathbf{A}},\mathbf{c},\tilde{\mathbf{r}},\mathbf{d})$ and $(\tilde{\tilde{\mathbf{A}}},\mathbf{c},\tilde{\mathbf{r}}^{c},\mathbf{d})$ satisfy both the conditions in equations (14) and (15). Since $\dim(V),\dim(V^{\perp})<n_{0}$, we may use the induction assumption (on the dimension) to assert $M(\tilde{\mathbf{A}},\mathbf{c},\tilde{\mathbf{r}},\mathbf{d})<\infty$ and $M(\tilde{\tilde{\mathbf{A}}},\mathbf{c},\tilde{\mathbf{r}}^{c},\mathbf{d})<\infty$, and conclude the proof. ∎

6 A special case

We examine a special case here to see what kinds of new inequalities may result from Theorem 3. Let $X_{1},X_{2},$ and $Y$ be real valued random variables such that $(X_{1},X_{2})\perp\!\!\!\perp Y$ . We would like to lower bound the entropy $h(X_{1}+Y,X_{2}+Y)$ . Note that the regular EPI applied with the independent random vectors $(X_{1},X_{2})$ and $(Y,Y)$ yields the trivial lower bound

[TABLE]

Note also that

[TABLE]

However, it is not possible to use Zamir and Feder’s EPI to provide lower bounds on $h(X_{1}+Y,X_{2}+Y)$ because of the dependency between $X_{1}$ and $X_{2}$ . We show that Theorem 3 may be used to obtain a family of nontrivial lower bounds that account for this dependency.

Lemma 6.1.

Let $\alpha,\beta,\delta_{1},\delta_{2}\geq 0$ . Consider the inequality

[TABLE]

where $C(\alpha,\beta,\delta_{1},\delta_{2})$ is some constant that depends only on $\alpha,\beta,$ $\delta_{1},\delta_{2}$ . The above inequality holds for all $(X_{1},X_{2})\perp\!\!\!\perp Y$ if and only if $\alpha,\beta,\delta_{1},\delta_{2}$ satisfy the following inequalities:

1. $2\alpha+\beta=2+\delta_{1}+\delta_{2}$;

2. $\beta\leq 1$;

3. $\alpha\leq 1+\delta_{1}$, and $\alpha\leq 1+\delta_{2}$;

4. $\alpha+\beta\leq 1+\delta_{1}+\delta_{2}$, which, combined with condition (1), is equivalent to $\alpha\geq 1$.

Proof.

We shall use Theorem 4 to show this result. The above inequality is easily seen to be of the form in Theorem 3, where $A_{1}=[1,0,1;0,1,1]$, $A_{2}=[1,0,0]$, $A_{3}=[0,1,0]$, ${\mathbf{r}}=(2,1)$, $d_{1}=\alpha$, and $d_{2}=\beta$. An exhaustive search of all possible subspaces $V$ that are in $\mathbf{r}$-product form where $\mathbf{r}=(2,1)$ is not hard to do. For simplicity, we refer to the axes in $\mathbb{R}^{3}$ as $X_{1},X_{2},Y$. Thus, the subspace $X_{1}$ is simply the subspace spanned by $(1,0,0)$.

1. Equality (1) follows directly from equation (15) of Theorem 4;

2. Inequality (2) follows from equation (14) of Theorem 4, by choosing $V=\phi\times Y$;

3. Inequality (3) follows from equation (14) of Theorem 4, by choosing $V=X_{1}\times\phi$ and $V=X_{2}\times\phi$;

4. Inequality (4) is obtained from equation (14) of Theorem 4, by a careful choice of $V=(X_{1}+X_{2})\times Y$, i.e. the subspace spanned by $(1,1,0)$ and $(0,0,1)$.

∎
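The dimension counts behind the four conditions of Lemma 6.1 can be reproduced with a few rank computations. The sketch below is illustrative and rests on two assumptions that are inferred rather than quoted: that the weights on the maps are $\mathbf{c}=(1,\delta_{1},\delta_{2})$ (consistent with condition (1) and with $A_{2},A_{3}$ picking out $X_{1},X_{2}$), and that the subspace condition of Theorem 4 has the form $\sum_{i}d_{i}\dim(V_{i})\leq\sum_{j}c_{j}\dim(A_{j}V)$, as it is used in Section 5. The function `slack` and the parameter values are hypothetical.

```python
import numpy as np

A_list = [np.array([[1., 0., 1.], [0., 1., 1.]]),   # A_1: (X1 + Y, X2 + Y)
          np.array([[1., 0., 0.]]),                 # A_2: X1
          np.array([[0., 1., 0.]])]                 # A_3: X2

def slack(basis_v1, basis_v2, alpha, beta, delta1, delta2):
    """sum_j c_j dim(A_j V) - sum_i d_i dim(V_i) for V = V_1 x V_2.

    basis_v1: 2D array of shape (2, s1) whose columns span V_1 (s1 may be 0)
    basis_v2: 2D array of shape (1, s2) whose columns span V_2 (s2 may be 0)
    Assumes c = (1, delta1, delta2) and d = (alpha, beta).
    """
    b1 = np.asarray(basis_v1, dtype=float)
    b2 = np.asarray(basis_v2, dtype=float)
    s1, s2 = b1.shape[1], b2.shape[1]
    # Embed V_1 x V_2 in R^3 through a block-diagonal basis matrix.
    basis = np.zeros((3, s1 + s2))
    basis[:2, :s1] = b1
    basis[2:, s1:] = b2
    c = [1.0, delta1, delta2]
    image_dims = [np.linalg.matrix_rank(A @ basis) if basis.shape[1] else 0
                  for A in A_list]
    dim_v1 = np.linalg.matrix_rank(b1) if s1 else 0
    dim_v2 = np.linalg.matrix_rank(b2) if s2 else 0
    return sum(cj * dj for cj, dj in zip(c, image_dims)) - (alpha * dim_v1 + beta * dim_v2)

params = dict(alpha=1.25, beta=0.5, delta1=0.5, delta2=0.5)   # satisfies conditions 1 to 4
zero1, zero2 = np.zeros((2, 0)), np.zeros((1, 0))             # trivial subspaces
e1, e2, diag = np.array([[1.], [0.]]), np.array([[0.], [1.]]), np.array([[1.], [1.]])
y = np.array([[1.]])

slacks = {
    "0 x Y": slack(zero1, y, **params),          # 1 - beta
    "X1 x 0": slack(e1, zero2, **params),        # 1 + delta1 - alpha
    "X2 x 0": slack(e2, zero2, **params),        # 1 + delta2 - alpha
    "(X1+X2) x Y": slack(diag, y, **params),     # 1 + delta1 + delta2 - alpha - beta
    "R^3": slack(np.eye(2), y, **params),        # 2 + delta1 + delta2 - 2*alpha - beta
}
```

A nonnegative slack means the corresponding subspace does not obstruct finiteness; the slack for the full space is exactly zero when condition (1) holds. The subspace $(X_{1}+X_{2})\times Y$ is the interesting case because $A_{1}$ maps both of its basis vectors to multiples of $(1,1)$, so $\dim(A_{1}V)=1$.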

Claim 6.1.

For $\alpha,\beta<1,\delta_{1}=\delta_{2}=\delta$ satisfying the conditions in Lemma 6.1, the following inequality holds:

[TABLE]

where

[TABLE]

Proof.

For $\alpha,\beta,\delta_{1},\delta_{2}$ , the optimal constant $C$ is given by

[TABLE]

Calculating the above supremum for arbitrary $\alpha,\beta,\delta_{1},\delta_{2}$ is cumbersome so we assume $\delta_{1}=\delta_{2}=\delta.$ The supremum simplifies to

[TABLE]

For fixed values of the product $K_{1}K_{2}$ and of $K_{3}$, the above expression is maximized by the choice $K_{1}=K_{2}=\sqrt{K_{1}K_{2}}$. Thus, we assume that $K_{1}=K_{2}=K$ and obtain

[TABLE]

Let $x:=K_{3}/K$ , and noting that $2\alpha-2\delta-1=1-\beta$ , we obtain

[TABLE]

For a fixed $\rho$ , the maximum of the above expression is attained when

[TABLE]

Substituting this value of $x$ ,

[TABLE]

Differentiating with respect to $\rho$ , the supremum is seen to be attained when $\rho=\frac{\beta}{2\alpha+\beta-2}=\frac{\beta}{2\delta}.$ Substituting this, we get

[TABLE]

This leads to the entropy inequality

[TABLE]

Notice that the mutual information term $I(X_{1};X_{2})$ accounts for the dependency between $X_{1}$ and $X_{2}$ .∎

7 Conclusion

In this paper, we established a new inequality that unifies the BLI and the EPI by establishing subadditivity of certain entropic functionals. There are several interesting research directions that are worth pursuing. We did not address the questions of extremizability and uniqueness of extremizers in this work. One reason for this is that Theorem 3 is established by taking the limit as $\epsilon$ and $\delta$ go to 0. When $\epsilon$ and $\delta$ are strictly bounded away from 0, the extremizer of $s_{\epsilon,\delta}(\cdot)$ under a covariance constraint exists and is a unique Gaussian distribution. However, these existence and uniqueness properties need not hold in the limit as $\epsilon,\delta\to 0$ . In general, such a proof strategy is a powerful tool for proving inequalities, but may not always succeed in identifying necessary and sufficient conditions for equality. For this reason, alternate proof strategies that rely on heat flow based arguments [17, 13, 16] or optimal transport methods [21, 36] are worth exploring as well. After a preprint of this work appeared online, an optimal transport-based proof of Theorem 3 was discovered in Courtade [37]. Shortly thereafter, Courtade and Liu [38] proved Theorem 3 as a limiting case of the forward-reverse Brascamp-Lieb inequality [20] and gave an alternate proof of Theorem 4.

Finally, although our results generalize the BLI and the EPI to vector random variables with more general independence properties, these independence properties are still quite restrictive. For instance, the inequalities we derived do not encompass the monotonicity of entropy power family of results [39, 40, 41]. It would be interesting to generalize our inequalities to include the above family as well. Another (related) direction to pursue would be to establish similar entropy inequalities under weaker independence conditions.

Acknowledgements

The research of VA was supported by the NSF grants CNS-1527846, CCF-1618145, CCF-1901004, CIF-2007965, the NSF Science & Technology Center grant CCF-0939370 (Science of Information), and the William and Flora Hewlett Foundation supported Center for Long Term Cybersecurity at Berkeley. VJ acknowledges support from NSF grants CCF-1841190 and CCF-1907786, and is grateful to the Department of Information Engineering at CUHK for hosting him in July 2018, when a part of this work was done. The research of CN was supported by GRF grants 14303714, 14231916, 14206518 and a discretionary fund of the Vice Chancellor of CUHK.

Appendix A Supporting results for Theorem 3

Lemma A.1.

Let $X,Y,$ and $Z$ be random variables taking values in $\mathbb{R}^{n_{X}},\mathbb{R}^{n_{Y}},$ and $\mathbb{R}^{n_{Z}}$ respectively, such that the following hold: (a) $(X,Y,Z)$ has a strictly positive density on $\mathbb{R}^{n_{X}+n_{Y}+n_{Z}}$; (b) $X\rightarrow Y\rightarrow Z$; and (c) $X\rightarrow Z\rightarrow Y$. Then $X\perp\!\!\!\perp(Y,Z)$.

Proof.

For any $x\in\mathbb{R}^{n_{X}}$, $y\in\mathbb{R}^{n_{Y}}$, and $z\in\mathbb{R}^{n_{Z}}$, we have that

[TABLE]

where we used the assumed strict positivity of the density of $(X,Y,Z)$ to write the above equations. Fix $y_{0}\in\mathbb{R}^{n_{Y}}$. For any $z\in\mathbb{R}^{n_{Z}}$, we have

[TABLE]

Integrating both sides of the above equality with respect to $p_{Z}(z)$ , we obtain

[TABLE]

Since $y_{0}$ was chosen arbitrarily, we conclude that $X\perp\!\!\!\perp Y$ . A similar argument shows that $X\perp\!\!\!\perp Z$ . Using equation (58), we conclude that $X\perp\!\!\!\perp(Y,Z)$ . ∎

Lemma A.2.

Let $X_{1}$ and $X_{2}$ be $\mathbb{R}^{n}$-valued random variables and let $(Z_{1},Z_{2})\perp\!\!\!\perp(X_{1},X_{2})$ be such that $(Z_{1},Z_{2})\sim{\cal N}(0,I_{2n\times 2n})$. If $(X_{1}+Z_{1})\perp\!\!\!\perp(X_{2}+Z_{2})$, then $X_{1}\perp\!\!\!\perp X_{2}$.

Proof.

Using the independence of $(X_{1}+Z_{1})$ and $(X_{2}+Z_{2})$, we have that for any $t_{1},t_{2}\in\mathbb{R}^{n}$,

[TABLE]

However, using the independence $(X_{1},X_{2})\perp\!\!\!\perp(Z_{1},Z_{2})$ , we also have

[TABLE]

Since $\phi_{Z_{1},Z_{2}}(\cdot,\cdot)$ has no zeros ( $Z_{i}$ ’s being independent standard Gaussian random variables), we conclude that

[TABLE]

that is, $X_{1}\perp\!\!\!\perp X_{2}$ . ∎
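Written out, the characteristic-function argument runs as follows. This is a sketch in our own notation, with $\phi_{W}$ denoting the characteristic function of $W$; it is meant to convey the structure of the argument and does not reproduce the paper's numbered displays.

```latex
\begin{align*}
\phi_{X_{1}+Z_{1},X_{2}+Z_{2}}(t_{1},t_{2})
&=\phi_{X_{1}+Z_{1}}(t_{1})\,\phi_{X_{2}+Z_{2}}(t_{2})
 =\phi_{X_{1}}(t_{1})\phi_{Z_{1}}(t_{1})\,\phi_{X_{2}}(t_{2})\phi_{Z_{2}}(t_{2}),\\
\phi_{X_{1}+Z_{1},X_{2}+Z_{2}}(t_{1},t_{2})
&=\phi_{X_{1},X_{2}}(t_{1},t_{2})\,\phi_{Z_{1},Z_{2}}(t_{1},t_{2})
 =\phi_{X_{1},X_{2}}(t_{1},t_{2})\,\phi_{Z_{1}}(t_{1})\phi_{Z_{2}}(t_{2}).
\end{align*}
```

The first line uses the assumed independence of the two sums and the independence of each $X_{i}$ from $Z_{i}$; the second uses $(X_{1},X_{2})\perp\!\!\!\perp(Z_{1},Z_{2})$ and the independence of $Z_{1}$ and $Z_{2}$. Since $\phi_{Z_{1}}(t_{1})\phi_{Z_{2}}(t_{2})=e^{-(\|t_{1}\|^{2}+\|t_{2}\|^{2})/2}$ never vanishes, dividing both lines by it gives $\phi_{X_{1},X_{2}}(t_{1},t_{2})=\phi_{X_{1}}(t_{1})\phi_{X_{2}}(t_{2})$.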

Lemma A.3.

Let $X$ be an $\mathbb{R}^{n}$-valued random variable with density $p_{X}(x)$ and $Z\sim{\cal N}(0,I_{n\times n})$ be independent of $X$. Suppose that $\mathbb{E}[\Psi(X)]<\infty$ for some nonnegative continuous function $\Psi:\mathbb{R}^{n}\to\mathbb{R}$, satisfying $\int_{\mathbb{R}^{n}}e^{-\Psi(x)}dx<\infty$ and $\lim_{\delta\to 0}\mathbb{E}[\Psi(X+\sqrt{\delta}Z)]=\mathbb{E}[\Psi(X)]$. (Note that, for instance, $\Psi(X)=\|X\|_{p},p\geq 1$ satisfies the conditions.) Then the following equality holds:

[TABLE]

Proof.

Our proof relies on the following (lower semi-continuity) result from Posner [42, Theorem 1]: If $P_{m},Q_{m}$ are Borel probability distributions on a Polish space with $P_{m}\stackrel{w}{\Rightarrow}P$ and $Q_{m}\stackrel{w}{\Rightarrow}Q$, then

[TABLE]

where $D(P\|Q)$ denotes the relative entropy of the distribution $P$ with respect to the distribution $Q$. Picking an arbitrary sequence $\{\delta_{m}\}_{m\geq 1}$ that converges to $0$, let $X_{m}=X+\sqrt{\delta_{m}}Z.$ Using characteristic functions (or otherwise), it is easy to check that $X_{m}$ converges to $X$ in distribution. Let $P_{m}$ denote the distribution of $X_{m}$ and $P$ denote the distribution of $X$. Let $Q_{m}=Q$ be the distribution corresponding to the density function $Ce^{-\Psi(x)}$. Note that

[TABLE]

Therefore, we have

[TABLE]

Here $(a)$ follows from Posner’s result and $(b)$ follows from assumption (2). Hence

[TABLE]

On the other hand, non-negativity of mutual information, $I(Z;X+\sqrt{\delta_{m}}Z)\geq 0$, yields $h(X+\sqrt{\delta_{m}}Z)\geq h(X).$ Taking the $\liminf$ on both sides of this inequality, we conclude

[TABLE]

Inequalities (68) and (69) yield the equality

[TABLE]

which concludes the proof. ∎
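The conclusion of Lemma A.3, namely that $h(X+\sqrt{\delta}Z)$ decreases to $h(X)$ as $\delta\to 0$, can be observed directly in the Gaussian case, where all entropies have a closed form. The sketch below is only an illustration of the statement, not of the proof; the covariance `cov_x` and the list of `deltas` are hypothetical values.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of N(0, cov) for a positive definite cov."""
    cov = np.atleast_2d(cov)
    n = cov.shape[0]
    return 0.5 * (n * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

cov_x = np.array([[2.0, 0.6], [0.6, 1.0]])   # hypothetical covariance of a Gaussian X
h_x = gaussian_entropy(cov_x)

deltas = [1.0, 1e-1, 1e-2, 1e-4, 1e-6]
# X + sqrt(delta) * Z is Gaussian with covariance cov_x + delta * I.
h_smoothed = [gaussian_entropy(cov_x + delta * np.eye(2)) for delta in deltas]
gaps = [h - h_x for h in h_smoothed]          # h(X + sqrt(delta) Z) - h(X)
```

The gaps are nonnegative, as guaranteed by the non-negativity of mutual information, and shrink to zero with $\delta$.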

Appendix B Supporting results for Claim 5.2

Lemma B.1.

Given subspaces $K_{i}\subseteq\mathbb{R}^{r_{i}}$ for $1\leq i\leq k$ , with $s_{i}:=\mbox{dim}(K_{i})$ , let $K:=K_{1}\times\ldots\times K_{k}$ denote the corresponding $\mathbf{r}$ -product subspace of $\mathbb{R}^{n}$ , where $n:=\sum_{i=1}^{k}r_{i}$ . Let $A_{1}=\left[A_{11}\ldots A_{1k}\right]$ , with $A_{1i}$ an $n_{1}\times r_{i}$ matrix of rank $r_{i}$ for $1\leq i\leq k$ as above. Then there is some $\eta>0$ such that for all choices of $(K_{i},1\leq i\leq k)$ where at least one $s_{i}$ is strictly positive, for all unit vectors $x\in A_{1}K$ (i.e. $x^{T}x=1$ ), there exists some unit vector $v_{i}\in A_{1i}K_{i}$ for some $1\leq i\leq k$ such that $|x^{T}v_{i}|\geq\eta$ .

Proof.

Suppose to the contrary that we can find a sequence $((x(t),(K_{1}(t),\ldots,K_{k}(t))),t\geq 1)$ of unit vectors and subspaces that violates the condition, i.e. such that

[TABLE]

By going to a subsequence if necessary we can assume that there exist some choices of $0\leq s_{i}\leq r_{i}$ for $1\leq i\leq k$ with at least one of the $s_{i}$ being strictly positive, such that we have $\mbox{dim}(K_{i}(t))=s_{i}$ for all $t\geq 1$. Since the space of all $s_{i}$-dimensional subspaces of $\mathbb{R}^{r_{i}}$ is compact in the usual topology (i.e. as the corresponding Grassmannian), by going to a further subsequence if necessary we can assume that each $K_{i}(t)$ converges to a limit $K_{i}$ as $t\to\infty$, where $\mbox{dim}(K_{i})=s_{i}$. Since the set of unit vectors in $\mathbb{R}^{n_{1}}$ is compact, by going to a further subsequence if necessary we can assume that $x(t)$ converges to a unit vector $x\in\mathbb{R}^{n_{1}}$ as $t\to\infty$. Since we have $x(t)\in A_{1}K(t)$ for all $t\geq 1$ (where $K(t):=K_{1}(t)\times\ldots\times K_{k}(t)$), we must have $x\in A_{1}K$ (where $K:=K_{1}\times\ldots\times K_{k}$). We thus have $x^{T}v_{i}=0$ for all unit vectors $v_{i}\in A_{1i}K_{i}$ for all $1\leq i\leq k$. But this is a contradiction, because $x$ is itself in the linear span of such vectors. ∎
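The quantity controlled by Lemma B.1 can be explored empirically. For a subspace $S$ and a unit vector $x$, the largest value of $|x^{T}v|$ over unit vectors $v\in S$ is $\|\Pi_{S}x\|$, so the lemma asserts a uniform lower bound on $\max_{i}\|\Pi_{A_{1i}K_{i}}x\|$. The sketch below samples random subspaces and reports the smallest value seen; the matrices `A_blocks`, the sampling scheme, and the helper names are hypothetical, and a sampled minimum is of course only an empirical estimate, not a proof of the bound.

```python
import numpy as np

def projector(basis, tol=1e-12):
    """Orthogonal projector onto the column span of `basis`."""
    u, s, _ = np.linalg.svd(basis, full_matrices=False)
    q = u[:, s > tol]
    return q @ q.T

def best_alignment(x, images):
    """max_i max{|x^T v| : v a unit vector in the span of images[i]}."""
    return max(np.linalg.norm(projector(img) @ x) for img in images)

rng = np.random.default_rng(1)
# Hypothetical blocks A_11 (3 x 2) and A_12 (3 x 1), of full column rank almost surely.
A_blocks = [rng.standard_normal((3, 2)), rng.standard_normal((3, 1))]

worst = np.inf
for _ in range(200):
    # K_1 is a random line in R^2 and K_2 = R^1.
    K = [rng.standard_normal((2, 1)), np.ones((1, 1))]
    images = [A @ k for A, k in zip(A_blocks, K)]     # bases of the subspaces A_1i K_i
    span = np.hstack(images)                          # columns span A_1 K
    x = span @ rng.standard_normal(span.shape[1])
    x /= np.linalg.norm(x)                            # a unit vector in A_1 K
    worst = min(worst, best_alignment(x, images))
```

The value `worst` stays strictly positive over the sampled configurations, which is the behaviour that the compactness argument above guarantees uniformly.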

Corollary B.1.

There is a positive constant $\delta^{2}>0$ depending only on $W_{1},\ldots,W_{k}$ (and in particular not depending on the $(\hat{\Sigma}_{i},1\leq i\leq k)$ or the choices of the bases $\{b_{i1},b_{i2},\ldots,b_{ir_{i}}\}$ for $1\leq i\leq k$) such that, for all $1\leq l\leq n^{\prime}$, we have

[TABLE]

where $C_{l}$ is a positive semidefinite matrix all of whose eigenvalues are either $0$ or $1$ and where the subspace corresponding to $C_{l}$ is $A_{1}V_{l}$.

Proof.

Let $\eta>0$ be as in Lemma B.1. For each unit vector $x\in A_{1}V_{l}$ there exists some $1\leq i\leq k$ and a unit vector $v_{i}\in A_{1}V_{il}$ such that $|x^{T}v_{i}|\geq\eta$. Since $\{\tilde{b}_{i1},\ldots,\tilde{b}_{is_{il}}\}$ is an orthonormal basis for $A_{1}V_{il}$, this means that there is some $1\leq u_{i}\leq s_{il}$ such that $|x^{T}\tilde{b}_{iu_{i}}|\geq\delta$, where we define $\delta:=\frac{1}{n}\eta$ and we have used $s_{il}\leq r_{i}\leq n$. Recalling that $M_{l}:=\sum_{i=1}^{k}\sum_{u_{i}=1}^{s_{il}}\tilde{b}_{iu_{i}}\tilde{b}_{iu_{i}}^{T}$, it follows that

[TABLE]

Since this holds for all unit vectors $x\in A_{1}V_{l}$ , this proves the corollary. ∎

Bibliography

[1] C. E. Shannon. A mathematical theory of communication, I and II. Bell System Technical Journal, 27:379–423, 1948.
[2] N. Blachman. The convolution inequality for entropy powers. IEEE Transactions on Information Theory, 11(2):267–271, 1965.
[3] M. Costa. A new entropy power inequality. IEEE Transactions on Information Theory, 31(6):751–760, 1985.
[4] R. Zamir and M. Feder. A generalization of the entropy power inequality with applications. IEEE Transactions on Information Theory, 39(5):1723–1728, 1993.
[5] T. A. Courtade. A strong entropy power inequality. IEEE Transactions on Information Theory, 64(4):2173–2192, 2018.
[6] Y. Polyanskiy and Y. Wu. Strong data-processing inequalities for channels and Bayesian networks. In Convexity and Concentration, pages 211–249. Springer, 2017.
[7] Y. Polyanskiy and Y. Wu. Wasserstein continuity of entropy and outer bounds for interference channels. IEEE Transactions on Information Theory, 62(7):3992–4002, 2016.
[8] A. J. Stam. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Information and Control, 2(2):101–112, 1959.
