Exploring the bounds on the positive semidefinite rank

Andrii Riazanov; Mikhail Vyalyi

arXiv:1704.06507·cs.CC·April 24, 2017

TL;DR

This paper investigates the limitations of existing bounds on the positive semidefinite rank of matrices related to polytopes, showing they cannot produce exponential lower bounds on extension complexity, and relates these bounds to the matrix's regular rank.

Contribution

It proves that current bounds on PSD-rank are polynomially bounded by the regular rank, providing new insights into extension complexity limitations.

Findings

01

Existing bounds are upper bounded by polynomial functions of regular rank.

02

No exponential lower bounds on PSD-rank can be derived from current bounds.

03

An upper bound on mutual information based on regular rank is established.

Abstract

The nonnegative and positive semidefinite (PSD-) ranks are closely connected to the nonnegative and positive semidefinite extension complexities of a polytope, which are the minimal dimensions of linear and SDP programs that represent this polytope. Though some exponential lower bounds on the nonnegative and PSD- ranks have recently been proved for the slack matrices of some particular polytopes, there are still no tight bounds for these quantities. We explore some existing bounds on the PSD-rank and prove that they cannot give exponential lower bounds on the extension complexity. Our approach consists in proving that the existing bounds are upper bounded by polynomials in the regular rank of the matrix, which is equal to the dimension of the polytope (up to an additive constant). As one of the implications, we also retrieve an upper bound on the mutual information of an arbitrary…

Equations

$$I(A:B)=H(A)+H(B)-H(A,B)=H(A)-H(A\mid B)=H(B)-H(B\mid A),$$

$$\operatorname{rank}_{psd}M\geq B_{2}(M)=2^{I(A:B)}.$$

$$\operatorname{rank}_{psd}M\geq B_{3}(M)=\max_{\{q_{i}\}_{i=1}^{m}}\frac{1}{\sum_{i,j=1}^{m}q_{i}q_{j}F(M_{i},M_{j})^{2}}$$

$$\operatorname{rank}_{psd}M\geq B_{4}(M)=\sum_{i=1}^{n}\max_{j}M(i,j).$$

$$\operatorname{rank}_{psd}M\geq B_{5}(M)=\sum_{i=1}^{n}\max_{\{q_{j}^{(i)}\}_{j=1}^{m}}\frac{\sum_{k=1}^{m}q_{k}^{(i)}M(i,k)}{\sum_{s,t=1}^{m}q_{s}^{(i)}q_{t}^{(i)}F(M_{s},M_{t})^{2}}$$

$$\sum_{i=1}^{n}m_{ij}^{\varepsilon}=\sum_{i=1}^{r+1}m_{ij}^{\varepsilon}+\sum_{i=r+2}^{n}m_{ij}=\sum_{i=1}^{r+1}m_{ij}(1+\varepsilon\alpha_{i})+\sum_{i=r+2}^{n}m_{ij}=\sum_{i=1}^{n}m_{ij}+\varepsilon\underbrace{\sum_{i=1}^{r+1}\alpha_{i}m_{ij}}_{0}=\sum_{i=1}^{n}m_{ij}.$$

$$m_{ij}=\mathbb{P}[X=x_{i},Y=y_{j}]\geq 0,\qquad\sum_{i=1,j=1}^{n,m}m_{ij}=1.$$

$$p_{i}=\mathbb{P}[X=x_{i}]=\sum_{j=1}^{m}m_{ij},\ i\in\overline{1,n};\qquad q_{j}=\mathbb{P}[Y=y_{j}]=\sum_{i=1}^{n}m_{ij},\ j\in\overline{1,m}.$$

$$I(X:Y)=D_{KL}\bigl(p(X,Y)\,\|\,p(X)p(Y)\bigr)=\sum_{i=1}^{n}\sum_{j=1}^{m}p(x_{i},y_{j})\log_{2}\left(\frac{p(x_{i},y_{j})}{p(x_{i})p(y_{j})}\right)=\sum_{i=1}^{n}\sum_{j=1}^{m}m_{ij}\log_{2}\left(\frac{m_{ij}}{p_{i}q_{j}}\right),$$

$$B_{2}(M)=2^{I(X:Y)}\leq\operatorname{rank}M.$$

$$\begin{aligned}I(M^{\varepsilon})-I(M)&=\sum_{i=1}^{r+1}\sum_{j=1}^{m}\left[m_{ij}^{\varepsilon}\log\left(\frac{m_{ij}^{\varepsilon}}{p_{i}^{\varepsilon}q_{j}^{\varepsilon}}\right)-m_{ij}\log\left(\frac{m_{ij}}{p_{i}q_{j}}\right)\right]\\&=\sum_{i=1}^{r+1}\sum_{j=1}^{m}\left[m_{ij}(1+\varepsilon\alpha_{i})\log\left(\frac{m_{ij}(1+\varepsilon\alpha_{i})}{p_{i}(1+\varepsilon\alpha_{i})q_{j}}\right)-m_{ij}\log\left(\frac{m_{ij}}{p_{i}q_{j}}\right)\right]\\&=\sum_{i=1}^{r+1}\sum_{j=1}^{m}\left[\varepsilon\alpha_{i}m_{ij}\log\left(\frac{m_{ij}}{p_{i}q_{j}}\right)\right]=\varepsilon\cdot\Lambda.\end{aligned}$$

$$I(M)\leq I(\widetilde{M})=I(\widetilde{X}:\widetilde{Y})\leq H(\widetilde{X})\leq\log|\operatorname{supp}(\widetilde{X})|\leq\log r.$$

$$B_{3}(M)\leq(\ln m+1)^{2}r^{2}.$$

$$\begin{aligned}1-\sum_{k=1}^{m}\sqrt{p_{k}q_{k}}&=\frac{1}{2}\left(\sum p_{k}+\sum q_{k}-2\sum\sqrt{p_{k}q_{k}}\right)=\frac{1}{2}\sum|\sqrt{p_{k}}-\sqrt{q_{k}}|^{2}\leq\frac{1}{2}\sum|p_{k}-q_{k}|\\&\Rightarrow\ F(p,q)=\sum_{k=1}^{m}\sqrt{p_{k}q_{k}}\geq 1-\frac{1}{2}\sum|p_{k}-q_{k}|=1-\frac{|p-q|}{2}.\end{aligned}$$

$$B_{3}(M)=\max_{\{q_{i}\}_{i=1}^{m}}\frac{1}{\sum_{i,j}q_{i}q_{j}F(M_{i},M_{j})^{2}}=\frac{1}{\min\limits_{\{q_{i}\}_{i=1}^{m}}\sum_{i,j}q_{i}q_{j}F(M_{i},M_{j})^{2}}.$$

$$\begin{aligned}1=q_{1}+q_{2}+\dots+q_{m}&\leq\frac{1}{\ln m+1}+\frac{1}{2(\ln m+1)}+\dots+\frac{1}{m(\ln m+1)}\\&=\frac{1}{\ln m+1}\left(1+\frac{1}{2}+\frac{1}{3}+\dots+\frac{1}{m}\right)<\frac{1}{\ln m+1}\left(1+\int_{1}^{m}\frac{1}{x}\,dx\right)=1.\end{aligned}$$

$$\begin{aligned}\sum_{i,j=1}^{m}q_{i}q_{j}F(M_{i},M_{j})^{2}&\geq\sum_{i,j=1}^{s}q_{i}q_{j}F(M_{i},M_{j})^{2}\geq\sum_{i,j=1}^{s}q_{s}^{2}F(M_{i},M_{j})^{2}\\&=s^{2}q_{s}^{2}\cdot\frac{\sum_{i,j=1}^{s}F(M_{i},M_{j})^{2}}{s^{2}}\geq\frac{1}{(\ln m+1)^{2}}\cdot\frac{\sum_{i,j=1}^{s}F(M_{i},M_{j})^{2}}{s^{2}}\end{aligned}$$

$$\frac{\sum_{i,j=1}^{s}F(M_{i},M_{j})^{2}}{s^{2}}\geq\left(\frac{\sum_{i,j=1}^{s}F(M_{i},M_{j})}{s^{2}}\right)^{2}\geq\left(\frac{\sum_{i,j=1}^{s}\left(1-\frac{|M_{i}-M_{j}|}{2}\right)}{s^{2}}\right)^{2}=\left(1-\frac{\frac{1}{2}\sum_{i,j=1}^{s}|M_{i}-M_{j}|}{s^{2}}\right)^{2}$$

$$\begin{aligned}S(M^{\varepsilon})-S(M)&=\frac{1}{2m^{2}}\left(\sum_{k=1}^{n}\left[\sum_{i,j=1}^{m}\left(|m_{ki}^{\varepsilon}-m_{kj}^{\varepsilon}|-|m_{ki}-m_{kj}|\right)\right]\right)\\&=\frac{1}{2m^{2}}\left(\sum_{k=1}^{r+1}\left[\sum_{i,j=1}^{m}\left(|m_{ki}-m_{kj}|(1+\varepsilon\alpha_{k})-|m_{ki}-m_{kj}|\right)\right]\right)\\&=\frac{1}{2m^{2}}\left(\sum_{k=1}^{r+1}\left[\sum_{i,j=1}^{m}|m_{ki}-m_{kj}|\,\varepsilon\alpha_{k}\right]\right)=\varepsilon\cdot\Lambda.\end{aligned}$$

$$S(M)\leq 1-\frac{1}{r}.$$

$$Z(M)=Z(B)=\frac{1}{2}\sum_{k=1}^{r}\sum_{i,j=1}^{m}|b_{ki}-b_{kj}|=\sum_{k=1}^{r}\sum_{i=1}^{m}\sum_{j=i}^{m}(b_{ki}-b_{kj}).$$

$$\begin{aligned}Z(M)&=\sum_{k=1}^{r}\bigl((m-1)b_{k1}+(m-3)b_{k2}+\dots-(m-3)b_{k(m-1)}-(m-1)b_{km}\bigr)\\&=(m-1)\sum_{k=1}^{r}b_{k1}+(m-3)\sum_{k=1}^{r}b_{k2}+\dots-(m-3)\sum_{k=1}^{r}b_{k(m-1)}-(m-1)\sum_{k=1}^{r}b_{km}\end{aligned}$$

[Matrices $B^{*}$ and $M^{*}$: explicit matrices with entries $0$ and $1$; their layout is not recoverable from the extracted text.]

Taxonomy

Topics: Advanced Optimization Algorithms Research · Complexity and Algorithms in Graphs · Machine Learning and Algorithms

Full text

Exploring the bounds on the positive semidefinite rank

Andrii Riazanov

Skolkovo Institute of Science and Technology; Moscow Institute of Physics and Technology (State University).

Mikhail Vyalyi

Dorodnicyn Computing Centre, FRC CSC RAS; Moscow Institute of Physics and Technology (State University); National Research University Higher School of Economics. The study has been funded by the Russian Academic Excellence Project ’5-100’.

Abstract

The nonnegative and positive semidefinite (PSD-) ranks are closely connected to the nonnegative and positive semidefinite extension complexities of a polytope, which are the minimal dimensions of linear and SDP programs that represent this polytope. Though some exponential lower bounds on the nonnegative [FMP+12] and PSD- [LRS15] ranks have recently been proved for the slack matrices of some particular polytopes, there are still no tight bounds for these quantities. We explore some existing bounds on the PSD-rank and prove that they cannot give exponential lower bounds on the extension complexity. Our approach consists in proving that the existing bounds are upper bounded by polynomials in the regular rank of the matrix, which is equal to the dimension of the polytope (up to an additive constant). As one of the implications, we also retrieve an upper bound on the mutual information of an arbitrary matrix of a joint distribution, based on its regular rank.

1 Introduction

Linear optimization plays an important role in computer science and mathematics. Though there exist efficient algorithms for linear optimization over convex sets, for polytopes with an exponential number of facets they still take too long in the general case. That is why one may want to represent such a “hard” convex set as a projection (linear map) of some “easier” convex set, for example of an affine slice of the nonnegative orthant or of the cone of positive semidefinite matrices, since linear optimization over slices of both these cones has efficient algorithms. Such representations are called the nonnegative and the positive semidefinite (PSD-) extensions, respectively.

Since many problems of combinatorial optimization can be represented as linear programs over a polytope, studying the extensions of convex polytopes is an important and challenging problem. The natural question is to find the minimal dimension for which there exists an extension of the given polytope. It can be also formulated as determining the smallest dimensions of LP or SDP programs which represent optimization over the given polytope, and such sizes are called the nonnegative and the semidefinite extension complexities, respectively.

In the context of $\mathrm{P}\neq\mathrm{NP}$ we do not expect to find small nonnegative or PSD- extension complexities for NP-hard problems, since that would mean that there exist polynomial algorithms for solving these problems. However, there is still no general approach for proving lower bounds on these quantities, and only a few exponential lower bounds for some particular problems have recently been proved. All such results use the connection between extension complexity and matrix factorizations, which was first discovered in [Yan91] for the nonnegative extension complexity and nonnegative matrix factorizations. This approach was later extended in [GPT13] to the general case of cone factorizations, and the same result for PSD-factorizations was also obtained in [FMP+12]. This instrument gave an opportunity to explore the nonnegative and PSD- extension complexities of polytopes by studying characteristics of their slack matrices called the nonnegative and the PSD- ranks. For example, in the 1980s there were attempts to prove P = NP by providing a polynomial-sized linear program for the NP-hard travelling salesman problem (TSP). However, using the described approach, Yannakakis proved in [Yan91] that any symmetric LP which solves TSP has exponential size, which invalidated all such attempts, since all the presented LPs were symmetric. The extension of this result to any (not only symmetric) LP for TSP was first presented in [FMP+12], where the authors used the connection between the nonnegative rank of a matrix and the nondeterministic communication complexity of its support. In that work, exponential lower bounds on the nonnegative rank were also proved for the CUT and Stable Set polytopes. The first analogous bounds for the PSD-extension complexity were presented in [LRS15] using the sum-of-squares SDP hierarchy.

Since exponential lower bounds were obtained for some particular cases only, it is still a challenging problem to obtain reasonable estimates and bounds for the nonnegative and PSD- ranks. This problem has been widely discussed during the last decade. For instance, exponential bounds on the nonnegative rank, and thus on the nonnegative extension complexity, were proved in [Rot14] for the matching polytope, where the author used an extension of Razborov’s result [Raz90]. We refer the reader to the review [FGP+15] for more details about recent research on the PSD-rank.

There is also a problem of determining the computational complexity of computing the nonnegative and PSD- ranks. Both problems are known to be NP-hard, and recent research [Shi16] shows that the problem of computing the PSD-rank is complete in $\exists\mathbb{R}$ – the existential theory of the reals.

Contribution

In this paper we explore the lower bounds on the PSD-rank introduced in [LWdW16], which we refer to below as bounding functionals (of a matrix). We show that these functionals cannot give exponential bounds on the PSD-rank, and thus on the positive semidefinite extension complexity. Our approach consists in proving that the bounding functionals of the slack matrix are bounded above by a polynomial in the regular rank of this matrix and the logarithm of the matrix size. Since for any polytope $P$ we have $\operatorname{rank}S_{P}=\operatorname{dim}(P)+1$ , this means that the bounds are polynomial in the dimension of the polytope.

As one of the implications of our approach, we achieve the upper bound on the mutual information for an arbitrary matrix of a joint distribution. More precisely, we show that the mutual information is bounded above by the logarithm of the rank of the matrix.

Outline of the paper

This paper is organized as follows. In Sect. 2 we introduce all the necessary notations and explain some connections between the PSD-rank and the quantum communication complexity. In Sect. 3 we present the bounding functionals from [LWdW16] and explain how the lower bound on the PSD-rank can be obtained via the mutual information. Finally, in Sect. 4 the upper bounds on the bounding functionals are proved. In particular, Theorem 4.1 shows that the mutual information of two discrete random variables is bounded above by the logarithm of the regular rank of the matrix of their joint distribution.

2 Preliminaries

2.1 Nonnegative and PSD- matrix factorizations

The nonnegative matrix factorization of the nonnegative matrix $A\in\mathbb{R}^{m\times n}$ is the decomposition $A=BC$ , where $B\in\mathbb{R}^{m\times k}$ , $C\in\mathbb{R}^{k\times n}$ , and $B,C$ are nonnegative matrices. Alternatively, such factorization can be thought of as two sets of vectors $\{b_{i}\}_{i=1}^{m},\ \{c_{j}\}_{j=1}^{n},\ b_{i},c_{j}\in\mathbb{R}^{k}_{+}$ , such that $A(i,j)=\langle b_{i},c_{j}\rangle$ . Then the nonnegative rank of $A$ , denoted $\operatorname{rank_{+}}A$ , is the smallest $k\in\mathbb{N}$ for which such nonnegative factorization of $A$ exists.
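The definition can be illustrated with a minimal sketch. The matrices `B` and `C` below are hypothetical example data (not taken from the paper); the code only checks that both factors are nonnegative and that their product reproduces `A`, which certifies $\operatorname{rank_{+}}A\leq k$ for the inner dimension $k$ of the factorization.

```python
def matmul(B, C):
    """Multiply two matrices given as lists of rows."""
    return [[sum(b * c for b, c in zip(row, col)) for col in zip(*C)]
            for row in B]

def is_nonnegative(X):
    """True if every entry of X is >= 0."""
    return all(x >= 0 for row in X for x in row)

# Hypothetical example: a 3 x 3 matrix built from nonnegative factors.
B = [[1, 0], [0, 1], [1, 1]]      # 3 x 2, nonnegative
C = [[1, 2, 0], [0, 1, 3]]        # 2 x 3, nonnegative
A = matmul(B, C)                  # entries A(i, j) = <b_i, c_j>

factorization_ok = is_nonnegative(B) and is_nonnegative(C)
inner_dim = len(C)                # k = 2, so rank_+ A <= 2
```

Such a check gives only an upper bound on the nonnegative rank; showing that no smaller factorization exists is the hard part.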

Similarly, the positive semidefinite rank $\operatorname{\operatorname{rank}_{psd}}A$ is the minimal integer $r$ for which there exist two sets of complex Hermitian positive semidefinite matrices $\{B_{i}\}_{i=1}^{m},\ \{C_{j}\}_{j=1}^{n},\ B_{i},C_{j}\in{\bf S}^{r}_{+}$ , such that $A(i,j)=\langle B_{i},C_{j}\rangle=\operatorname{Tr}(B_{i}C_{j})$ . Such a factorization is called a positive semidefinite factorization, and it has many applications in combinatorial optimization and communication complexity. If we restrict the matrices in the factorization to be real symmetric positive semidefinite, we obtain the definition of the real PSD-rank $\operatorname{rank}^{\mathbb{R}}_{psd}$ . It can be shown ([LWdW16]) that restricting the matrices to be real can increase $\operatorname{\operatorname{rank}_{psd}}$ by at most a factor of 2, i.e. $\operatorname{\operatorname{rank}_{psd}}\leq\operatorname{rank}^{\mathbb{R}}_{psd}\leq 2\operatorname{\operatorname{rank}_{psd}}$ . Since in our context we only study asymptotic bounds on the ranks, there is no difference between considering $\operatorname{\operatorname{rank}_{psd}}$ or $\operatorname{rank}^{\mathbb{R}}_{psd}$ .

We would like to emphasize that rescaling the nonnegative matrix by multiplying its rows or columns by any positive factors does not change its nonnegative and PSD- ranks. Indeed, multiplication of the $i^{th}$ row of $A$ by $\alpha$ corresponds to the multiplication of $b_{i}$ by the same factor $\alpha$ in the nonnegative factorization. Similarly, it corresponds to the multiplication of $B_{i}$ by $\alpha$ in the PSD-factorization. Obviously, the situation with the columns of $A$ is the same.

2.2 Extension complexity

The nonnegative extension complexity of the polytope $P$ is the smallest number $d$ such that $P$ can be expressed as a projection of an affine slice of the nonnegative $d$ -dimensional orthant $\mathbb{R}_{+}^{d}$ . Similarly, the semidefinite (PSD-) extension complexity of $P$ is the minimum number $r$ for which there exists an affine slice of the cone of complex Hermitian $r\times r$ positive semidefinite matrices ${\bf S}^{r}_{+}$ that projects onto $P$ .

In other words, to optimize over some polytope $P\subseteq\mathbb{R}^{d}$ one may want to represent it as $P=\pi(K\cap L)$ , where $K\subseteq\mathbb{R}^{n}$ is some closed convex cone, $L$ is some affine subspace of $\mathbb{R}^{n}$ , and $\pi$ is a linear map (projection). Such representations are called $K$ -lifts ([GPT13]) or $K$ -extensions. If we choose $K$ from the families of nonnegative orthants $\mathbb{R}^{k}_{+}$ or cones of positive semidefinite matrices ${\bf S}^{r}_{+}$ , the nonnegative and PSD- extension complexities of the given polytope correspond to the minimal $k$ and $r$ for which such representations exist.

2.3 Factorization theorem

As discussed in the Introduction, [Yan91], [GPT13], and [FMP+12] proved that extension complexities and matrix factorizations are interconnected. Here we present the Factorization theorem, which explains the relation between these two notions.

Let $P$ be a polytope in $\mathbb{R}^{d}$ with $n$ vertices and $m$ facets, thus $P=\{x\in\mathbb{R}^{d}\ |\ \langle x,a_{j}\rangle\leq b_{j},\ j\in\overline{1,m}\}$ . Then the slack matrix of the polytope $P$ is defined as the nonnegative matrix $S_{P}\in\mathbb{R}^{n\times m}$ with $S_{P}(i,j)=b_{j}-\langle v_{i},a_{j}\rangle$ , where $v_{i}$ is the $i^{th}$ vertex of $P$ . Then the Factorization theorem can be formulated as follows:

Factorization Theorem.

The nonnegative extension complexity of $P$ is equal to $\operatorname{rank_{+}}S_{P}$ . Similarly, the PSD-extension complexity of $P$ is equal to $\operatorname{\operatorname{rank}_{psd}}S_{P}$ .

This approach allows applying techniques for estimating or bounding such algebraic notions as sizes of matrix factorizations to answer geometrical questions about the complexities of the polytopes.
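As a small worked instance of the slack matrix definition, the sketch below builds $S_{P}$ for the unit square in $\mathbb{R}^{2}$ and computes its regular rank with exact rational arithmetic. The choice of polytope and the helper names `slack_matrix` and `rank` are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def slack_matrix(vertices, facets):
    """S(i, j) = b_j - <v_i, a_j> for facets given as pairs (a_j, b_j)."""
    return [[b - sum(vk * ak for vk, ak in zip(v, a)) for (a, b) in facets]
            for v in vertices]

def rank(M):
    """Rank over the rationals via Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                f = rows[i][col] / rows[rk][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk

# Unit square: vertices and facets <x, a_j> <= b_j (illustrative example).
vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]
facets = [((1, 0), 1), ((-1, 0), 0), ((0, 1), 1), ((0, -1), 0)]
S = slack_matrix(vertices, facets)
square_rank = rank(S)
```

Here the slack matrix is $4\times 4$ and its rank is $3$, in line with $\operatorname{rank}S_{P}=\operatorname{dim}(P)+1$ for a two-dimensional polytope.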

2.4 Quantum communication complexity

In this section, we describe the connection between the quantum communication complexity and $\operatorname{\operatorname{rank}_{psd}}$ . First, we will consider one-way quantum communication protocol.

A quantum state $\rho$ is a positive semidefinite matrix with $\operatorname{Tr}{\rho}=1$ . A measurement $\mathcal{E}$ is a set of positive semidefinite matrices $\{E_{i}\}_{i\in\Omega}$ , indexed by a finite set of nonnegative real numbers $\Omega$ , with the condition $\sum_{i\in\Omega}E_{i}=I$ . Such measurements are also called POVMs (“Positive Operator-Valued Measures”) in the literature. A POVM works in the following way: when we apply the measurement $\mathcal{E}$ to the state $\rho$ , the outcome is $i$ with probability $\operatorname{Tr}(E_{i}\rho)$ .

The communication process is set up as follows: initially, Alice has the integer $x$ , and Bob has $y$ . Then Alice sends an $r\times r$ quantum state $\rho_{x}$ to Bob, who measures it with the POVM $\mathcal{E}_{y}$ and outputs the result. We say that such a protocol computes the nonnegative matrix $M$ in expectation if the expected value of Bob’s output on the input $(x,y)$ is equal to $M(x,y)$ (the entry of the matrix $M$ in the $x^{th}$ row and $y^{th}$ column). Then the quantum communication complexity of the matrix $M$ is the logarithm of the minimal dimension $r$ for which there exists a one-way quantum protocol that computes $M$ in expectation.

Fiorini et al. [FMP+12] and Jain et al. [JSWZ13] proved that the minimal amount of quantum information needed for Alice and Bob to generate the nonnegative matrix $M$ is completely determined by the PSD-rank of this matrix. More precisely, they showed that the quantum communication complexity of $M$ is equal to $\lceil\log\operatorname{\operatorname{rank}_{psd}}M\rceil$ .

3 Bounding functionals on the PSD-rank

In this section, we present some existing general lower bounds on $\operatorname{\operatorname{rank}_{psd}}$ from [LWdW16], which we refer to as bounding functionals. Except for the bound via mutual information, the bounding functionals are introduced here without justification. We refer the reader to the original article for more details on the bounds. For convenience, we preserve the notation for the bounding functionals from the original article.

3.1 Bound via Mutual Information

If $A$ and $B$ are two random variables, then the mutual information is defined as follows:

$$I(A:B)=H(A)+H(B)-H(A,B)=H(A)-H(A\mid B)=H(B)-H(B\mid A),$$

where $H$ is Shannon entropy. The mutual information can be interpreted as the number of bits of information about $A$ that are revealed by the value of $B$ . We will now use Holevo’s theorem [Wat11] to bound the mutual information. It claims that the number of classical bits of information that Alice can communicate to Bob by sending $n$ qubits does not exceed $n$ . From the previous passage we know that we need exactly $\lceil\log\operatorname{\operatorname{rank}_{psd}}M\rceil$ qubits of information to compute the matrix $M$ . Normalizing $M$ and considering it as a matrix of joint distribution $\mathbb{P}(A,B)$ , we then have:

Fact 3.1.

Let $M$ be a matrix of a joint distribution of two discrete random variables $A,B$ with finite support, $M(a,b)=\mathbb{P}[B=b,A=a]$ . Then

$$\operatorname{rank}_{psd}M\geq B_{2}(M)=2^{I(A:B)}.$$

3.2 Bounding functionals from [LWdW16]

For two probability distributions $p=\{p_{i}\}_{i=1}^{n}$ and $q=\{q_{i}\}_{i=1}^{n}$ the fidelity is defined as $F(p,q)=\sum_{i=1}^{n}\sqrt{p_{i}q_{i}}$ .

Recall that the left stochastic matrix is the matrix with nonnegative entries, with each column summing to $1$ . Further in the text we will omit “left” and just use the term “stochastic matrix” instead.

Then we have the following lower bounds:

Fact 3.2.

Let $M\in\mathbb{R}^{n\times m}$ be a stochastic matrix. Then

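$$\operatorname{rank}_{psd}M\geq B_{3}(M)=\max_{\{q_{i}\}_{i=1}^{m}}\frac{1}{\sum_{i,j=1}^{m}q_{i}q_{j}F(M_{i},M_{j})^{2}},$$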

where the $\max$ is taken over all probability distributions $q=\{q_{i}\}_{i=1}^{m}$ , and $M_{i}$ is the $i^{th}$ column of $M$ .
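To make the quantity in Fact 3.2 concrete, here is a minimal sketch that evaluates $1/\sum_{i,j}q_{i}q_{j}F(M_{i},M_{j})^{2}$ for one fixed distribution $q$. Since $B_{3}(M)$ is a maximum over all $q$, the value at any single $q$ is only a lower estimate of $B_{3}(M)$. The example matrices and the helper name `b3_objective` are assumptions made for illustration.

```python
import math

def fidelity(p, q):
    """F(p, q) = sum_i sqrt(p_i * q_i) for two probability vectors."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def b3_objective(M, q):
    """1 / sum_{i,j} q_i q_j F(M_i, M_j)^2 for a column-stochastic M."""
    cols = [list(c) for c in zip(*M)]   # M_i is the i-th column
    m = len(cols)
    denom = sum(q[i] * q[j] * fidelity(cols[i], cols[j]) ** 2
                for i in range(m) for j in range(m))
    return 1.0 / denom

identity = [[1.0, 0.0], [0.0, 1.0]]     # columns with disjoint supports
flat = [[0.5, 0.5], [0.5, 0.5]]         # two identical columns
uniform = [0.5, 0.5]

value_identity = b3_objective(identity, uniform)
value_flat = b3_objective(flat, uniform)
```

For the identity matrix the off-diagonal fidelities vanish and the value at the uniform $q$ is $2$; for two identical columns every fidelity equals $1$ and the value is $1$.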

Fact 3.3.

Let $M\in\mathbb{R}^{n\times m}$ be a stochastic matrix. Then

$$\operatorname{rank}_{psd}M\geq B_{4}(M)=\sum_{i=1}^{n}\max_{j}M(i,j).$$

Fact 3.4.

Let $M\in\mathbb{R}^{n\times m}$ be a stochastic matrix. Then

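$$\operatorname{rank}_{psd}M\geq B_{5}(M)=\sum_{i=1}^{n}\max_{\{q_{j}^{(i)}\}_{j=1}^{m}}\frac{\sum_{k=1}^{m}q_{k}^{(i)}M(i,k)}{\sum_{s,t=1}^{m}q_{s}^{(i)}q_{t}^{(i)}F(M_{s},M_{t})^{2}},$$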

where the $\max$ is taken over all probability distributions $q^{(i)}=\{q^{(i)}_{j}\}_{j=1}^{m}$ , and $M_{i}$ is the $i^{th}$ column of $M$ .

4 Upper bounds on the bounding functionals

All the bounds from section 3 were explored and compared in [LWdW16]. It turned out that in different cases $B_{2},B_{3},B_{4}$ , or $B_{5}$ can give better bounds on $\operatorname{\operatorname{rank}_{psd}}$ than others, and some of them can be tight in some particular cases. However, the key question of whether these functions can give exponential lower bounds on the PSD-rank with respect to the regular rank was not addressed. In this section we answer this question negatively.

In the context of combinatorial optimization, we would like to show that for the polytope of some NP-hard problem the semidefinite extension complexity is exponential in the dimension. By the Factorization theorem (Section 2.3), it suffices to show that the PSD-rank of the corresponding slack matrix is exponential. It is easy to show ([GGK+13]) that the regular rank of the slack matrix equals the dimension of the polytope plus one: $\operatorname{rank}S_{P}=\operatorname{dim}P+1$ . For all the presented bounding functionals we provide upper bounds that are polynomial in the regular rank of the matrix and the logarithm of the matrix size, which means that they cannot be exponential in the dimension.

4.1 Row elimination transformation

We will now describe the row elimination transformation, which will be used for proving the required bounds.

Let $M\in\mathbb{R}^{n\times m}$ be a nonnegative matrix with $\operatorname{rank}M=r<n$ . Without loss of generality, assume that the first $r+1$ rows $\overline{m_{1}},\overline{m_{2}},\dots,\overline{m_{r+1}}$ are non-zero. They are linearly dependent, so there exists a nontrivial set of real numbers $\{\alpha_{i}\}_{i=1}^{r+1}$ , such that $\sum_{i=1}^{r+1}\alpha_{i}\overline{m_{i}}=\overline{0}$ . Since all entries of $M$ are nonnegative and these rows are non-zero, there are both negative and positive numbers among $\{\alpha_{i}\}_{i=1}^{r+1}$ . For such a set of real numbers $\{\alpha_{i}\}$ we denote by $\Delta_{\alpha}$ the closed interval $\Delta_{\alpha}=\left[-\dfrac{1}{\max_{i}\alpha_{i}},\ -\dfrac{1}{\min_{i}\alpha_{i}}\right]$ , which is well defined due to the last remark.

Then we define the matrix $M^{\varepsilon}$ as follows: for $1\leq i\leq(r+1)$ the $i$ -th row of $M^{\varepsilon}$ equals $\overline{m_{i}}(1+\varepsilon\alpha_{i})$ , for $i>(r+1)$ the $i$ -th row of $M^{\varepsilon}$ coincides with the $i$ -th row of $M$ . We call the matrix $M^{\varepsilon}$ $\varepsilon$ -transformation of $M$ .

First of all, note that $(1+\varepsilon\alpha_{i})\geq 0\quad\forall i\in\overline{1,(r+1)}\quad\Leftrightarrow\quad\varepsilon\in\Delta_{\alpha}$ . Moreover, when $\varepsilon$ is equal to one of the ends of $\Delta_{\alpha}$ , at least one of the coefficients $(1+\varepsilon\alpha_{i})$ is equal to zero. It means that for $\varepsilon\in\Delta_{\alpha}$ the matrix $M^{\varepsilon}$ is a nonnegative matrix, and when $\varepsilon$ is either the left or the right end of $\Delta_{\alpha}$ , $M^{\varepsilon}$ has more zero rows than $M$ .

Next, we prove that sums of columns do not change after row elimination transformation. Indeed,

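$$\sum_{i=1}^{n}m_{ij}^{\varepsilon}=\sum_{i=1}^{r+1}m_{ij}^{\varepsilon}+\sum_{i=r+2}^{n}m_{ij}=\sum_{i=1}^{r+1}m_{ij}(1+\varepsilon\alpha_{i})+\sum_{i=r+2}^{n}m_{ij}=\sum_{i=1}^{n}m_{ij}+\varepsilon\underbrace{\sum_{i=1}^{r+1}\alpha_{i}m_{ij}}_{0}=\sum_{i=1}^{n}m_{ij}.$$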

In particular, it means that if $M$ is stochastic, then $M^{\varepsilon}$ is also stochastic for $\varepsilon\in\Delta_{\alpha}$ . Similarly, if $M$ is a matrix of a joint distribution, then $M^{\varepsilon}$ is also a matrix of some joint distribution for $\varepsilon\in\Delta_{\alpha}$ .
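The row elimination transformation can be sketched on a tiny example. The matrix `M` and the dependency `alpha` below are hypothetical: `M` is a $3\times 2$ stochastic matrix of rank $2$ whose rows satisfy $\overline{m_{1}}+\overline{m_{2}}-\overline{m_{3}}=\overline{0}$. Finding `alpha` for a general matrix (a null-space computation) is not shown here.

```python
from fractions import Fraction as Fr

def delta_interval(alpha):
    """Closed interval [-1/max(alpha), -1/min(alpha)].

    Assumes alpha contains both positive and negative numbers.
    """
    return (-1 / max(alpha), -1 / min(alpha))

def eps_transform(M, alpha, eps):
    """Scale row i by (1 + eps * alpha_i) for the first len(alpha) rows."""
    out = []
    for i, row in enumerate(M):
        factor = 1 + eps * alpha[i] if i < len(alpha) else 1
        out.append([factor * x for x in row])
    return out

def column_sums(M):
    return [sum(col) for col in zip(*M)]

# Hypothetical example: rank 2, three non-zero rows, columns sum to 1.
M = [[Fr(1, 2), Fr(0)], [Fr(0), Fr(1, 2)], [Fr(1, 2), Fr(1, 2)]]
alpha = [Fr(1), Fr(1), Fr(-1)]          # m_1 + m_2 - m_3 = 0

left, right = delta_interval(alpha)     # ends of Delta_alpha
M_right = eps_transform(M, alpha, right)
M_left = eps_transform(M, alpha, left)
```

In this example $\Delta_{\alpha}=[-1,1]$; at the right end the third row becomes zero, at the left end the first two rows become zero, and in both cases the column sums are unchanged.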

4.2 Upper bound on $B_{2}$ (Mutual Information)

Let $M\in\mathbb{R}^{n\times m}$ be the matrix of a joint distribution of two discrete random variables $X,Y:$

$$m_{ij}=\mathbb{P}[X=x_{i},Y=y_{j}]\geq 0,\qquad\sum_{i=1,j=1}^{n,m}m_{ij}=1.$$

Let $p_{i},\ i\in\overline{1,n}$ , and $q_{j},\ j\in\overline{1,m}$ , be the marginal probabilities of $X$ and $Y$ respectively:

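$$p_{i}=\mathbb{P}[X=x_{i}]=\sum_{j=1}^{m}m_{ij},\ i\in\overline{1,n};\qquad q_{j}=\mathbb{P}[Y=y_{j}]=\sum_{i=1}^{n}m_{ij},\ j\in\overline{1,m}.$$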

Then the mutual information between $X$ and $Y$ can also be defined as:

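$$I(X:Y)=D_{KL}\bigl(p(X,Y)\,\|\,p(X)p(Y)\bigr)=\sum_{i=1}^{n}\sum_{j=1}^{m}p(x_{i},y_{j})\log_{2}\left(\frac{p(x_{i},y_{j})}{p(x_{i})p(y_{j})}\right)=\sum_{i=1}^{n}\sum_{j=1}^{m}m_{ij}\log_{2}\left(\frac{m_{ij}}{p_{i}q_{j}}\right),$$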

where we set $0\log\dfrac{0}{q}=0$ (the logarithm here and further is to the base 2). We also denote $I(M)=I(X:Y)$ .
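The formula $I(M)=\sum_{i,j}m_{ij}\log_{2}\frac{m_{ij}}{p_{i}q_{j}}$ can be evaluated directly. The following is a minimal sketch using floating-point arithmetic; the two example distributions are illustrative and the function name `mutual_information` is an assumption of this sketch.

```python
import math

def mutual_information(M):
    """I(M) in bits for a joint-distribution matrix M (rows: X, columns: Y)."""
    p = [sum(row) for row in M]              # marginals of X
    q = [sum(col) for col in zip(*M)]        # marginals of Y
    total = 0.0
    for i, row in enumerate(M):
        for j, m in enumerate(row):
            if m > 0:                        # convention 0 * log(0 / q) = 0
                total += m * math.log2(m / (p[i] * q[j]))
    return total

independent = [[0.25, 0.25], [0.25, 0.25]]   # X and Y independent, rank 1
correlated = [[0.5, 0.0], [0.0, 0.5]]        # X = Y, rank 2

i_independent = mutual_information(independent)
i_correlated = mutual_information(correlated)
```

The independent distribution gives $I=0$ and the perfectly correlated one gives $I=1$ bit, so $2^{I}$ equals $1$ and $2$ respectively, which coincides with the ranks of these two matrices.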

Theorem 4.1.

Let $M\in\mathbb{R}^{n\times m}$ be the matrix of a joint distribution of $X$ and $Y$ . Then

$$B_{2}(M)=2^{I(X:Y)}\leq\operatorname{rank}M.$$

Proof.

Denote $r=\operatorname{rank}M$ . We will now transform the original matrix $M$ in such a way, that the mutual information will not decrease, but the new matrix $\widetilde{M}$ will have at most $r$ non-zero rows.

Suppose $M$ has more than $r$ non-zero rows. Then we apply the row elimination transformation and consider the $\varepsilon$ -transformation $M^{\varepsilon}$ of the original matrix. Since we have already shown that it is also a matrix of some joint distribution, we explore how the mutual information changes after such transformations.

First, since the $\varepsilon$ -transformation does not change the sums in the columns of $M$ , we have $q^{\varepsilon}_{j}=q_{j}$ . Then, since $p^{\varepsilon}_{i}$ is the sum of entries in the $i$ -th row, we obtain $p^{\varepsilon}_{i}=p_{i}(1+\varepsilon\alpha_{i})$ .

Note that since $M$ and $M^{\varepsilon}$ coincide on rows with indexes larger than $r+1$ , we may omit the summation over these rows:

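$$\begin{aligned}I(M^{\varepsilon})-I(M)&=\sum_{i=1}^{r+1}\sum_{j=1}^{m}\left[m_{ij}^{\varepsilon}\log\left(\frac{m_{ij}^{\varepsilon}}{p_{i}^{\varepsilon}q_{j}^{\varepsilon}}\right)-m_{ij}\log\left(\frac{m_{ij}}{p_{i}q_{j}}\right)\right]\\&=\sum_{i=1}^{r+1}\sum_{j=1}^{m}\left[m_{ij}(1+\varepsilon\alpha_{i})\log\left(\frac{m_{ij}(1+\varepsilon\alpha_{i})}{p_{i}(1+\varepsilon\alpha_{i})q_{j}}\right)-m_{ij}\log\left(\frac{m_{ij}}{p_{i}q_{j}}\right)\right]\\&=\sum_{i=1}^{r+1}\sum_{j=1}^{m}\left[\varepsilon\alpha_{i}m_{ij}\log\left(\frac{m_{ij}}{p_{i}q_{j}}\right)\right]=\varepsilon\cdot\Lambda.\end{aligned}$$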

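Here $\Lambda=\sum_{i=1}^{r+1}\sum_{j=1}^{m}\alpha_{i}m_{ij}\log\dfrac{m_{ij}}{p_{i}q_{j}}$ does not depend on $\varepsilon$ , so the difference $I(M^{\varepsilon})-I(M)=\varepsilon\cdot\Lambda$ is linear in $\varepsilon$ . Now recall that the $\varepsilon$ -transformation is valid for $\varepsilon\in\Delta_{\alpha}$ , where the left end of $\Delta_{\alpha}$ is negative, and the right end is positive. It means that we can choose an end of the interval $\Delta_{\alpha}$ such that $I(M^{\varepsilon})\geq I(M)$ : the right end if $\Lambda\geq 0$ and the left end otherwise. It only remains to note that with the chosen value of $\varepsilon$ at least one of the first $(r+1)$ rows in $M^{\varepsilon}$ becomes zero.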

To get an upper bound on the mutual information, we apply $\varepsilon$ -transformations with such suitable $\varepsilon$ ’s that the number of non-zero rows strictly decreases and the mutual information does not decrease. At the end of such procedure we obtain the matrix $\widetilde{M}$ with at most $r$ non-zero rows for which $I(M)\leq I(\widetilde{M})$ . Since $\widetilde{M}$ is the matrix of joint distribution, we have $I(\widetilde{M})=I(\widetilde{X}:\widetilde{Y})$ , where the support of $\widetilde{X}$ has cardinality at most $r$ . Using the equality $I(\widetilde{X}:\widetilde{Y})=H(\widetilde{X})-H(\widetilde{X}|\widetilde{Y})$ and the non-negativity of the conditional entropy, we finally have:

[TABLE]

∎

4.3 Upper bound on $B_{3}$

We will show that $B_{3}(M)$ is upper bounded by $poly(\mathrm{rank}(M),\ln m)$ :

Theorem 4.2.

Let $M\in\mathbb{R}^{n\times m}$ be a stochastic matrix, $\operatorname{rank}M=r$ . Then

[TABLE]

We start by proving the following well-known fact:

Lemma 4.1.

For distributions $p,q$ we have $F(p,q)\geq 1-\dfrac{|p-q|}{2}$, where $|p-q|$ is the $\ell_{1}$-norm of the vector $(p-q)$, so that $\dfrac{|p-q|}{2}$ is the statistical distance between the distributions.

Proof.

[TABLE]

∎
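The inequality of Lemma 4.1 is easy to test numerically. The sketch below assumes the classical fidelity $F(p,q)=\sum_{i}\sqrt{p_{i}q_{i}}$ (the definition is not restated in this section); the function names are illustrative only. The inequality follows from $\sqrt{p_{i}q_{i}}\geq\min(p_{i},q_{i})$ together with $\sum_{i}\min(p_{i},q_{i})=1-\frac{|p-q|}{2}$.

```python
import numpy as np

def fidelity(p, q):
    """Classical fidelity, assumed here to be F(p, q) = sum_i sqrt(p_i * q_i)."""
    return float(np.sum(np.sqrt(p * q)))

def statistical_distance(p, q):
    """Half of the l1-distance between two distributions."""
    return 0.5 * float(np.sum(np.abs(p - q)))

def check_lemma_4_1(dim=6, trials=1000, seed=0):
    """Return True if F(p, q) >= 1 - |p - q| / 2 on random pairs of distributions."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        p = rng.dirichlet(np.ones(dim))
        q = rng.dirichlet(np.ones(dim))
        # Small slack absorbs floating-point rounding.
        if fidelity(p, q) < 1.0 - statistical_distance(p, q) - 1e-12:
            return False
    return True
```

A random test of this kind does not replace the proof; it only guards against a misread definition or a sign error.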

Now, we have

[TABLE]

It remains to prove a lower bound on $\underset{q\in\Delta_{m}}{\min}\sum_{i,j}q_{i}q_{j}F(M_{i},M_{j})^{2}$.

We will find a lower bound on this quadratic form for an arbitrary distribution $q$. Without loss of generality, assume $q_{1}\geq q_{2}\geq\dots\geq q_{m}$.

Lemma 4.2.

There exists $s\in\overline{1,m}$ such that $sq_{s}\geq\frac{1}{\ln m+1}$ .

Proof.

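Suppose the opposite: $sq_{s}<\frac{1}{\ln m+1}$ for all $s\in\overline{1,m}$. Then

$$1=\sum_{s=1}^{m}q_{s}<\frac{1}{\ln m+1}\sum_{s=1}^{m}\frac{1}{s}\leq\frac{\ln m+1}{\ln m+1}=1,$$

a contradiction.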

∎
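Lemma 4.2 can also be checked directly: for a distribution sorted in non-increasing order, the largest value of $sq_{s}$ should never fall below $\frac{1}{\ln m+1}$. The following sketch uses hypothetical helper names and finds the index $s$ by brute force.

```python
import math
import numpy as np

def heavy_prefix_index(q):
    """Return (s, s * q_s) maximizing s * q_s, with q sorted non-increasingly and s 1-based."""
    q_sorted = np.sort(np.asarray(q, dtype=float))[::-1]
    products = np.arange(1, len(q_sorted) + 1) * q_sorted
    s = int(np.argmax(products)) + 1
    return s, float(products[s - 1])

def lemma_4_2_threshold(m):
    """The lower bound 1 / (ln m + 1) from Lemma 4.2."""
    return 1.0 / (math.log(m) + 1.0)
```

The harmonic distribution $q_{s}\propto 1/s$ makes all products $sq_{s}$ equal to $1/H_{m}$, where $H_{m}$ is the $m$-th harmonic number, so it shows that the bound is tight up to the difference between $H_{m}$ and $\ln m+1$.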

Then we have

[TABLE]

Now, using the RMS-AM inequality and Lemma 4.1, we get:

[TABLE]

For any stochastic matrix $M\in\mathbb{R}^{n\times m}$ denote by $S(M)=\dfrac{1}{2m^{2}}\sum\limits_{i,j=1}^{m}|M_{i}-M_{j}|$ the arithmetic mean of the statistical distances between the $m$ columns of $M$. It now suffices to show an upper bound on $S(M)$.

Lemma 4.3.

Let $M\in\mathbb{R}^{n\times m}$ be a stochastic matrix with $\mathrm{rank}(M)=r$ . Then there exists a stochastic matrix $\widetilde{M}\in\mathbb{R}^{r\times m}$ such that $S(M)\leq S(\widetilde{M})$ .

Proof.

We apply the row elimination algorithm. Suppose $M$ has more than $r$ non-zero rows, and consider the $\varepsilon$-transformation $M^{\varepsilon}$ of the original matrix. Since the $\varepsilon$-transformation does not change the sum of the entries in any column of the matrix, $M^{\varepsilon}$ is also stochastic. We now explore how $S(M)$ changes after the $\varepsilon$-transformation:

[TABLE]

So the difference $S(M^{\varepsilon})-S(M)$ is linear in $\varepsilon$. Recall again that the $\varepsilon$-transformation is valid for $\varepsilon\in\Delta_{\alpha}$, where the left end of $\Delta_{\alpha}$ is negative and the right end is positive. Hence we can choose an end of the interval $\Delta_{\alpha}$ such that $S(M^{\varepsilon})\geq S(M)$, and with the chosen value of $\varepsilon$ at least one of the first $(r+1)$ rows of $M^{\varepsilon}$ becomes zero. When we apply such $\varepsilon$-transformations repeatedly, the number of non-zero rows strictly decreases and $S$ does not decrease. At the end of this procedure we obtain a matrix $\widetilde{M}$ with at most $r$ non-zero rows for which $S(M)\leq S(\widetilde{M})$. Removing zero rows (keeping some of them if fewer than $r$ non-zero rows remain) does not change $S$ and yields a stochastic matrix in $\mathbb{R}^{r\times m}$.

∎
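The procedure in this proof can be sketched in code. The version below is a minimal illustration, not the authors' implementation: it assumes that the $\varepsilon$-transformation multiplies row $k$ by $(1+\varepsilon\alpha_{k})$, where $\alpha$ is a linear dependence among $r+1$ non-zero rows, and it obtains $\alpha$ from a singular value decomposition. The function names and the tolerance `tol` are hypothetical.

```python
import numpy as np

def mean_stat_distance(M):
    """S(M): mean over ordered column pairs (i, j) of |M_i - M_j| / 2."""
    m = M.shape[1]
    diffs = np.abs(M[:, :, None] - M[:, None, :])  # |m_ki - m_kj| for all k, i, j
    return 0.5 * diffs.sum() / m**2

def eliminate_rows(M, tol=1e-12):
    """Zero out rows until at most rank(M) are non-zero, without decreasing S(M)."""
    M = np.array(M, dtype=float)
    r = np.linalg.matrix_rank(M)
    while True:
        nonzero = np.flatnonzero(M.sum(axis=1) > tol)
        if len(nonzero) <= r:
            return M
        rows = nonzero[: r + 1]
        # alpha with sum_k alpha_k * M[rows[k]] = 0: the last right singular
        # vector of the (m x (r+1)) matrix whose columns are the chosen rows.
        alpha = np.linalg.svd(M[rows].T)[2][-1]
        # S(M^eps) - S(M) is eps times this slope, up to the factor 1 / (2 m^2).
        spread = np.abs(M[rows][:, :, None] - M[rows][:, None, :]).sum(axis=(1, 2))
        slope = float(alpha @ spread)
        # Pick the end of Delta_alpha at which S does not decrease.
        eps = 1.0 / np.max(-alpha) if slope >= 0 else -1.0 / np.max(alpha)
        factors = 1.0 + eps * alpha
        factors[np.argmin(factors)] = 0.0        # the row that vanishes at this end
        factors = np.clip(factors, 0.0, None)    # remove tiny negative rounding errors
        M[rows] *= factors[:, None]
```

Each pass zeroes at least one row, so the loop stops after at most $n-r$ passes. Column sums are preserved because $\sum_{k}\alpha_{k}m_{kj}=0$ for every column $j$, up to floating-point error.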

Lemma 4.4.

Let $M\in\mathbb{R}^{r\times m}$ be a stochastic matrix. Then

$$S(M)\leq 1-\dfrac{1}{r}.$$

Proof.

If $m\leq r$ , then $\dfrac{\dfrac{1}{2}\sum\limits_{i,j=1}^{m}|M_{i}-M_{j}|}{m^{2}}\leq\dfrac{m^{2}-m}{m^{2}}=1-\dfrac{1}{m}\leq 1-\dfrac{1}{r}$ , where we just used $|M_{i}-M_{j}|\leq 2$ .

Now suppose $m>r$ . Denote $Z(M)=\dfrac{1}{2}\sum\limits_{i,j=1}^{m}|M_{i}-M_{j}|=\dfrac{1}{2}\sum\limits_{k=1}^{r}\sum\limits_{i,j=1}^{m}|m_{ki}-m_{kj}|$ .

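We now construct the matrix $B$ by sorting every row of $M$ in non-increasing order. Obviously, $Z(M)=Z(B)$, since this only permutes the terms. Then

$$Z(B)=\dfrac{1}{2}\sum_{k=1}^{r}\sum_{i,j=1}^{m}|b_{ki}-b_{kj}|=\sum_{k=1}^{r}\sum_{1\leq i<j\leq m}(b_{ki}-b_{kj}).$$

Each $b_{ki}$ occurs in this sum $(m-i)$ times with the sign $(+1)$ and $(i-1)$ times with the sign $(-1)$. Hence,

$$Z(M)=\sum_{k=1}^{r}\sum_{i=1}^{m}(m-2i+1)\,b_{ki}.\tag{4}$$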

Clearly, $Z(M)$ takes its maximal value when the sum in the first columns of $B$ is maximal. Since $b_{ki}\leq 1$ and the sums of all the entries in $B$ and $M$ coincide and are equal to $m$, to maximize $Z(M)$ we need to have $m$ ones in total in the first columns of $B$. Write $m=sr+p$ with $0\leq p<r$. If $r=1$, then the matrix $M$ consists of ones only (since it is stochastic), so $S(M)=0$ and the inequality in the lemma is obvious. If $r>1$, then it is easy to show that $(s+1)\leq\lceil\frac{m}{2}\rceil$. Note that exactly the first $\lceil\frac{m}{2}\rceil$ summands in (4) are nonnegative, so to maximize $Z(M)$ the first $(s+1)$ columns of $B$ should be filled with ones; denote the resulting matrix by $B^{*}$.

Such a matrix $B^{*}$ would correspond to the following matrix $M^{*}$:

[TABLE]

Then

[TABLE]

∎

Proof of Theorem 4.2.

Let $s$ be the index given by Lemma 4.2. The first $s$ columns of $M$ form the matrix $M^{\prime}\in\mathbb{R}^{n\times s}$ with $\mathrm{rank}(M^{\prime})=r^{\prime}\leq r$. Using Lemma 4.3, we conclude that there exists $\widetilde{M^{\prime}}\in\mathbb{R}^{r^{\prime}\times s}$ such that $S(M^{\prime})\leq S(\widetilde{M^{\prime}})$. Applying Lemma 4.4 we get $S(M^{\prime})\leq S(\widetilde{M^{\prime}})\leq 1-\dfrac{1}{r^{\prime}}\leq 1-\dfrac{1}{r}$. Then from (3):

[TABLE]

Then from (2) for every distribution $q$ we obtain:

[TABLE]

And finally, using (1),

[TABLE]

∎
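One way to see how the lemmas combine is the following chain. It assumes $B_{3}(M)=\max_{q\in\Delta_{m}}\big(\sum_{i,j}q_{i}q_{j}F(M_{i},M_{j})^{2}\big)^{-1}$ and $F(p,q)=\sum_{k}\sqrt{p_{k}q_{k}}$, so the constants are a sketch under these assumptions and may differ from the statement of Theorem 4.2. With $q_{1}\geq\dots\geq q_{m}$, $s$ as in Lemma 4.2 and $M^{\prime}$ the first $s$ columns of $M$:

$$\sum_{i,j=1}^{m}q_{i}q_{j}F(M_{i},M_{j})^{2}\geq q_{s}^{2}\sum_{i,j=1}^{s}F(M_{i},M_{j})^{2}\geq(sq_{s})^{2}\Big(\frac{1}{s^{2}}\sum_{i,j=1}^{s}F(M_{i},M_{j})\Big)^{2}\geq(sq_{s})^{2}\big(1-S(M^{\prime})\big)^{2}\geq\frac{1}{r^{2}(\ln m+1)^{2}}.$$

The first inequality drops nonnegative terms and uses $q_{i}q_{j}\geq q_{s}^{2}$ for $i,j\leq s$, the second is the RMS-AM inequality, the third is Lemma 4.1, and the last combines Lemma 4.2 with $S(M^{\prime})\leq 1-\frac{1}{r}$. Under these assumptions the bound reads $B_{3}(M)\leq r^{2}(\ln m+1)^{2}$, which is polynomial in $\mathrm{rank}(M)$ and $\ln m$.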

4.4 Upper bound on $B_{4}$

Theorem 4.3.

Let $M\in\mathbb{R}^{n\times m}$ be a stochastic matrix, $\operatorname{rank}M=r$ . Then

[TABLE]

Proof.

Again, we apply the row elimination transformation. Note that since every row of the matrix $M$ is either multiplied by some nonnegative factor $\alpha$ or remains unchanged under this transformation, the maximal element of this row is multiplied by the same factor $\alpha$ or remains unchanged as well.

Suppose $M$ has at least $r+1$ non-zero rows, and without loss of generality, suppose that these are the first $r+1$ rows of $M$. Now consider the $\varepsilon$-transformation $M^{\varepsilon}$ of $M$, and explore how the functional $B_{4}$ changes after such a transformation, taking the last remark into consideration:

[TABLE]

Similarly to the previous proofs, $B_{4}(M^{\varepsilon})$ is linear in $\varepsilon$, and therefore when $\varepsilon$ equals one of the ends of $\Delta_{\alpha}$, the difference $B_{4}(M^{\varepsilon})-B_{4}(M)$ is nonnegative, while $M^{\varepsilon}$ has strictly fewer non-zero rows than $M$. Again, applying such transformations with suitable values of $\varepsilon$, at the end we obtain a matrix $\widetilde{M}$ with at most $r$ non-zero rows for which $B_{4}(M)\leq B_{4}(\widetilde{M})$. It only remains to note that in the formula for $B_{4}(\widetilde{M})$ there are at most $r$ non-zero summands, each at most $1$ (since $\widetilde{M}$ is also stochastic). Therefore, we have $B_{4}(M)\leq B_{4}(\widetilde{M})\leq r$.

∎
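A quick numerical illustration of the bound $B_{4}(M)\leq r$ is given below. It assumes that $B_{4}(M)=\sum_{i}\max_{j}M_{ij}$, which is how the proof above treats the functional (one summand per row, each equal to the maximal element of that row). The test matrices are products of two column-stochastic factors, which is only a special case of a rank-$r$ stochastic matrix; the helper names are hypothetical.

```python
import numpy as np

def b4(M):
    """Assumed form of the functional: B_4(M) = sum_i max_j M_ij."""
    return float(np.max(M, axis=1).sum())

def random_low_rank_stochastic(n, m, r, rng):
    """Column-stochastic n x m matrix of rank at most r, built as a product A @ B."""
    A = rng.dirichlet(np.ones(n), size=r).T   # n x r, columns sum to 1
    B = rng.dirichlet(np.ones(r), size=m).T   # r x m, columns sum to 1
    return A @ B
```

For the identity matrix of size $r$ the functional equals $r$ exactly, so the bound cannot be improved in general.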

4.5 Upper bound on $B_{5}$

Theorem 4.4.

Let $M\in\mathbb{R}^{n\times m}$ be a stochastic matrix, $\operatorname{rank}M=r$ . Then

[TABLE]

Proof.

Simply applying (5) and (6), we get:

[TABLE]

The last inequality is due to Theorem 4.3. ∎

Bibliography

[FGP+15] Hamza Fawzi, João Gouveia, Pablo A. Parrilo, Richard Z. Robinson, and Rekha R. Thomas. Positive semidefinite rank. Mathematical Programming, 153(1):133–177, Jul 2015.
[FMP+12] Samuel Fiorini, Serge Massar, Sebastian Pokutta, Hans Raj Tiwary, and Ronald de Wolf. Linear vs. semidefinite extended formulations. In Proceedings of the 44th symposium on Theory of Computing - STOC’12. Association for Computing Machinery (ACM), 2012.
[GGK+13] João Gouveia, Roland Grappe, Volker Kaibel, Kanstantsin Pashkovich, Richard Z. Robinson, and Rekha R. Thomas. Which nonnegative matrices are slack matrices? Linear Algebra and its Applications, 439(10):2921–2933, Nov 2013.
[GPT13] João Gouveia, Pablo A. Parrilo, and Rekha R. Thomas. Lifts of convex sets and cone factorizations. Mathematics of Operations Research, 38(2):248–264, May 2013.
[JSWZ13] Rahul Jain, Yaoyun Shi, Zhaohui Wei, and Shengyu Zhang. Efficient protocols for generating bipartite classical distributions and quantum states. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1503–1512. Society for Industrial & Applied Mathematics (SIAM), Jan 2013.
[LRS15] James R. Lee, Prasad Raghavendra, and David Steurer. Lower bounds on the size of semidefinite programming relaxations. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing - STOC’15. Association for Computing Machinery (ACM), 2015.
[LWdW16] Troy Lee, Zhaohui Wei, and Ronald de Wolf. Some upper and lower bounds on PSD-rank. Mathematical Programming, 162(1-2):495–521, Jul 2016.
[Raz90] A. A. Razborov. On the distributional complexity of disjointness. In Automata, Languages and Programming, pages 249–253. Springer Nature, 1990.
