This paper proves the Courtade-Kumar conjecture for specific classes of Boolean functions, showing that the mutual information between the function output and the channel output is bounded by $1 - H(p)$, where $H(p)$ is the binary entropy function, for all input sizes $n \geq 2$ and all error probabilities $0 \leq p \leq \frac{1}{2}$.
Contribution
The authors establish the conjecture for certain classes of Boolean functions, extending the understanding of mutual information bounds in binary symmetric channels.
Findings
01
Proved the conjecture for specific Boolean function classes.
02
Mutual information is bounded by 1 - H(p) for all n ≥ 2 and 0 ≤ p ≤ 0.5.
03
Results hold universally across different input sizes and error probabilities.
Abstract
We prove the Courtade-Kumar conjecture for certain classes of $n$-dimensional Boolean functions, $\forall n \geq 2$ and for all values of the error probability of the binary symmetric channel, $\forall\, 0 \leq p \leq \frac{1}{2}$. Let $X = [X_1 \ldots X_n]$ be a vector of independent and identically distributed Bernoulli$(\frac{1}{2})$ random variables, which are the input to a memoryless binary symmetric channel with error probability in the interval $0 \leq p \leq \frac{1}{2}$, and let $Y = [Y_1 \ldots Y_n]$ be the corresponding output. Let $f: \{0,1\}^n \to \{0,1\}$ be an $n$-dimensional Boolean function. Then, the Courtade-Kumar conjecture states that the mutual information $\mathrm{MI}(f(X), Y) \leq 1 - H(p)$, where $H(p)$ is the binary entropy function.
Taxonomy
Topics: Advanced Combinatorial Mathematics · Coding theory and cryptography · Advanced Mathematical Identities
Full text
On the Courtade-Kumar conjecture for certain classes of Boolean functions
We prove the Courtade-Kumar conjecture for certain classes of $n$-dimensional Boolean functions, $\forall n \geq 2$ and for all values of the error probability of the binary symmetric channel, $\forall\, 0 \leq p \leq \frac{1}{2}$. Let $X = [X_1 \ldots X_n]$ be a vector of independent and identically distributed Bernoulli$(\frac{1}{2})$ random variables, which are the input to a memoryless binary symmetric channel with error probability in the interval $0 \leq p \leq \frac{1}{2}$, and let $Y = [Y_1 \ldots Y_n]$ be the corresponding output. Let $f: \{0,1\}^n \to \{0,1\}$ be an $n$-dimensional Boolean function. Then, the Courtade-Kumar conjecture states that the mutual information $\mathrm{MI}(f(X), Y) \leq 1 - H(p)$, where $H(p)$ is the binary entropy function.
Index Terms:
Boolean function, mutual information, Karamata’s theorem, binary entropy function
I Introduction
A recent information-theoretic conjecture, termed the Courtade-Kumar conjecture, was stated in [1]. It gives an upper bound on the mutual information between a Boolean function of a random vector of inputs to a memoryless binary symmetric channel and the vector of the outputs. The mutual information is computed between a Boolean function of $n$ independent and identically distributed Bernoulli random variables, with success probability $q = \frac{1}{2}$, and the output of a memoryless binary symmetric channel, with error probability $0 \leq p \leq \frac{1}{2}$, when this vector of Bernoulli random variables is passed as its input. The conjecture states that this upper bound is equal to $1 - H(p)$, where $H(p)$ denotes the binary entropy function. Several proofs have appeared in the literature for different settings of this conjecture, but the most general case has remained unsolved. We bring further contributions to this effort. Using Karamata’s theorem [2], we prove the Courtade-Kumar conjecture [1] for certain classes of Boolean functions, $\forall n \geq 2$ and $\forall\, 0 \leq p \leq \frac{1}{2}$. These functions represent particular subclasses of lex functions, as introduced by Kumar and Courtade in [3]. In the context of this conjecture, Karamata’s theorem has been used in an earlier version of the preprint [4], which extends the conjecture to the continuous case. The generalization of Karamata’s theorem, named Schur convexity, has been employed in [3].
Our paper is structured as follows: we start the introductory section with the prior results obtained so far in the literature, in the effort to solve the Courtade-Kumar conjecture. We end this section with our contributions. The essence of this paper, the proof of the Courtade-Kumar conjecture for particular classes of Boolean functions, for any dimension $n \geq 2$ and any error probability $0 \leq p \leq \frac{1}{2}$, is given in Section II. We present the conclusions of this study in Section III.
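To make the statement of the conjecture concrete, the following is a minimal brute-force sketch that evaluates $\mathrm{MI}(f(X), Y)$ for every Boolean function on a small number of bits and compares it with $1 - H(p)$. The function names (`binary_entropy`, `mutual_information`, `max_mi_over_all_functions`) are illustrative and do not come from the paper; the enumeration is only feasible for very small $n$, since there are $2^{2^n}$ functions.

```python
import itertools
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, with 0*log2(0) taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mutual_information(f, n, p):
    """MI(f(X); Y) in bits for X uniform on {0,1}^n and Y the BSC(p) output."""
    points = list(itertools.product((0, 1), repeat=n))
    joint = {}  # joint[(z, y)] = P(f(X) = z, Y = y)
    for x in points:
        z = f(x)
        for y in points:
            d = sum(a != b for a, b in zip(x, y))  # Hamming distance
            pxy = p ** d * (1 - p) ** (n - d) / 2 ** n
            joint[(z, y)] = joint.get((z, y), 0.0) + pxy
    pz = {0: 0.0, 1: 0.0}
    for (z, _), v in joint.items():
        pz[z] += v
    py = 1 / 2 ** n  # Y is uniform because X is uniform
    return sum(v * math.log2(v / (pz[z] * py))
               for (z, _), v in joint.items() if v > 0)


def max_mi_over_all_functions(n, p):
    """Largest MI(f(X); Y) over all 2^(2^n) Boolean functions on n bits."""
    points = list(itertools.product((0, 1), repeat=n))
    best = 0.0
    for table in itertools.product((0, 1), repeat=len(points)):
        lookup = dict(zip(points, table))
        best = max(best, mutual_information(lookup.__getitem__, n, p))
    return best


if __name__ == "__main__":
    for p in (0.1, 0.25, 0.4):
        print(p, max_mi_over_all_functions(2, p), 1 - binary_entropy(p))
```

The dictatorship function `lambda x: x[0]` attains the bound, so the maximum printed for each `p` should coincide with $1 - H(p)$ up to rounding.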
I-A Prior work related to the Courtade-Kumar conjecture
The proofs that have made the most progress towards solving the Courtade-Kumar conjecture are [5], [6]. The authors of [5] employ Fourier analysis and the hypercontractivity theorem to prove the bound stated in their Theorem 1, in the case of balanced Boolean functions and $p$ in the range $\frac{1}{2}\cdot(1 - \frac{1}{\sqrt{3}}) \leq p \leq \frac{1}{2}$: $\mathrm{MI}(f(X), Y) \leq \frac{\log(e)}{2}\cdot(1 - 2p)^2 + 9\cdot(1 - \frac{\log(e)}{2})\cdot(1 - 2p)^4$. They show that this new bound performs better than the previously established bound of $(1 - 2p)^2$ of [7], in the case of $\frac{1}{3} \leq p \leq \frac{1}{2}$. In Corollary 1, they prove that the Courtade-Kumar conjecture holds for the dictatorship function, as a special case of equiprobable Boolean functions, when $p \to \frac{1}{2}$. This region is termed the noise interval $p \in [\frac{1}{2} - p_n, \frac{1}{2}]$, where $p_n$ is defined as $p_n = \frac{1}{4}\cdot 2^{-n}$. Related to this result, in Theorem 1.15, the author of [6] proves that the Courtade-Kumar conjecture holds for high noise, that is, $\mathrm{MI}(f(X), Y) \leq 1 - H(p)$ holds for any Boolean function and for any noise $\epsilon \geq 0$ such that $(1 - 2\epsilon)^2 \leq \delta \Leftrightarrow \frac{1}{2} - \frac{\sqrt{\delta}}{2} \leq \epsilon \leq \frac{1}{2} + \frac{\sqrt{\delta}}{2}$, where $\delta > 0$ is a constant of small value. The author of [6] provides an improvement of Theorem 1 derived by Wyner and Ziv in [8], known as Mrs. Gerber’s Lemma, which was employed in [7] for the proof of Theorem 4. This strengthening of Mrs. Gerber’s Lemma is employed in the proof of the Courtade-Kumar conjecture for high noise [6].
An extension of the Courtade-Kumar conjecture to two $n$-dimensional Boolean functions is hypothesized to hold in [9], where it is termed Conjecture 3. It states that, for any Boolean functions $f, g: \{0,1\}^n \to \{0,1\}$, the mutual information $\mathrm{MI}(f(X), g(Y)) \leq 1 - H(p)$. For several specific cases of the joint probability mass function of the binary random variables $f(X)$ and $g(Y)$, the authors analytically prove another conjecture, termed Conjecture 4, which implies Conjecture 3. A similar form of Conjecture 4 of [9] is analytically proved in [10], in a more general context than that of the results of [9]. In Section V of [10], the authors prove that the mutual information $\mathrm{MI}(B, \hat{B}) \leq 1 - H(p)$, for Boolean functions $B = f(X)$ and $\hat{B} = g(Y)$, an estimator of $Y$, with fixed mean $E(B) = E(\hat{B}) = a$ and $P(B = \hat{B} = 0) \geq a^2$. Conjecture 3 of [9] is proved to hold in [11]. The Courtade-Kumar conjecture is generalized to continuous random variables in the preprint [4]. The function $f$ takes as input $n$-dimensional real vectors, when they are correlated Gaussian random vectors and when they are correlated random vectors from the unit sphere. As output, the function produces values from the set $\{0,1\}$.
I-B Our contributions
Theorem 1
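Let $X = [X_1 X_2 \ldots X_n]$ be an $n$-dimensional random vector of independent and identically distributed Bernoulli$(\frac{1}{2})$ random variables and $Y = [Y_1 Y_2 \ldots Y_n]$ the result of sending $X$ through a discrete memoryless binary symmetric channel, without feedback and with the error probability $0 \leq p \leq \frac{1}{2}$. Let $f: \{0,1\}^n \to \{0,1\}$ be an $n$-dimensional Boolean function, which has any of the following properties:
(1) for any $X^{(i)} \in \{0,1\}^n$, $f(X^{(i)}) = 1$, $f(X) = 0$, $\forall X \in \{0,1\}^n$, $X \neq X^{(i)}$;
(2) for any $X^{(i)} \in \{0,1\}^n$, $f(X^{(i)}) = 0$, $f(X) = 1$, $\forall X \in \{0,1\}^n$, $X \neq X^{(i)}$;
(3) $X^{(i)} = [X_r X_{n-r}^{(i)}]$, $\forall X_{n-r}^{(i)} \in \{0,1\}^{n-r}$, that is $i \in \{1,2,\ldots,2^{n-r}\}$, $\forall r \in \{1,2,\ldots,n-1\}$, $f(X^{(i)}) = 1$, $f(X) = 0$, $\forall X \in \{0,1\}^n$, $X \neq X^{(i)}$;
(4) $X^{(i)} = [X_r X_{n-r}^{(i)}]$, $\forall X_{n-r}^{(i)} \in \{0,1\}^{n-r}$, that is $i \in \{1,2,\ldots,2^{n-r}\}$, $\forall r \in \{1,2,\ldots,n-1\}$, $f(X^{(i)}) = 0$, $f(X) = 1$, $\forall X \in \{0,1\}^n$, $X \neq X^{(i)}$.
Let $H(p)$ denote the binary entropy function. Then, $\mathrm{MI}(f(X), Y) \leq 1 - H(p)$, $\forall n \geq 2$, $\forall\, 0 \leq p \leq \frac{1}{2}$.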
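The four classes of Theorem 1 can be enumerated and checked numerically for small $n$. The sketch below is an illustration only: it assumes that the fixed block $X_r$ occupies the first $r$ coordinates, that $0 < p < 1$, and it uses hypothetical helper names (`mi_bits`, `theorem1_functions`, `largest_gap`) that are not part of the paper.

```python
import itertools
import math


def mi_bits(f, n, p):
    """Brute-force MI(f(X); Y) in bits, X uniform on {0,1}^n, Y = BSC(p) output."""
    pts = list(itertools.product((0, 1), repeat=n))
    joint = {}
    for x in pts:
        for y in pts:
            d = sum(a != b for a, b in zip(x, y))
            key = (f(x), y)
            joint[key] = joint.get(key, 0.0) + p ** d * (1 - p) ** (n - d) / 2 ** n
    pz = {z: sum(v for (zz, _), v in joint.items() if zz == z) for z in (0, 1)}
    # P(Y = y) = 1 / 2^n, so v / (pz * py) = v * 2^n / pz
    return sum(v * math.log2(v * 2 ** n / pz[z])
               for (z, _), v in joint.items() if v > 0)


def theorem1_functions(n):
    """Yield (label, f) for classes (1)-(4), as read from Theorem 1."""
    for x0 in itertools.product((0, 1), repeat=n):
        yield "class 1", (lambda x, x0=x0: int(x == x0))
        yield "class 2", (lambda x, x0=x0: int(x != x0))
    for r in range(1, n):
        # Assumption: the fixed part X_r is the first r coordinates.
        for prefix in itertools.product((0, 1), repeat=r):
            yield "class 3", (lambda x, q=prefix: int(x[:len(q)] == q))
            yield "class 4", (lambda x, q=prefix: int(x[:len(q)] != q))


def largest_gap(n, p):
    """Max over the four classes of MI(f(X); Y) - (1 - H(p)); expected <= 0."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return max(mi_bits(f, n, p) - (1 - h) for _, f in theorem1_functions(n))
```

Because class (3) with $r = 1$ is the dictatorship function, `largest_gap` is expected to be zero up to rounding rather than strictly negative.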
II Proof of the Courtade-Kumar conjecture, for certain classes of $n$-dimensional Boolean functions, $\forall n \geq 2$ and $\forall\, 0 \leq p \leq \frac{1}{2}$
Lemma 1
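For any $k \in \{1,2,\ldots,n\}$, let $Y = [y_1 y_2 \ldots y_k] \in \{0,1\}^k$ be fixed and let $X^{(i)} = [x_1^{(i)} x_2^{(i)} \ldots x_k^{(i)}] \in \{0,1\}^k$ range over all the $2^k$ possible values. Then, the following identity holds
$\sum_{i=1}^{2^k} p(Y, X^{(i)}) = \frac{1}{2^k}.$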
Proof:
$X^{(i)}$ ranges from $[00\ldots0]$ to $[11\ldots1]$. For any fixed $Y$, there is one $X^{(i)}$ such that $X^{(i)} = Y$. There are $\binom{k}{1}$ vectors $X^{(i)}$ that differ from $Y$ in one position. There are $\binom{k}{j}$ vectors $X^{(i)}$ that differ from $Y$ in $j$ positions. As a result, the summation of the joint probabilities becomes $\sum_{i=1}^{2^k} p(Y, X^{(i)}) = \sum_{i=1}^{2^k} \prod_{j=1}^{k} p(y_j, x_j^{(i)}) = \sum_{r=0}^{k} \binom{k}{r} \cdot \frac{(1-p)^{k-r} \cdot p^r}{2^k} = \frac{1}{2^k}$.
∎
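Lemma 1 is easy to confirm numerically. The following sketch evaluates the sum both directly, over all $2^k$ vectors $X^{(i)}$, and through the binomial grouping used in the proof; the function names are illustrative.

```python
import itertools
import math


def joint_prob(y, x, p):
    """p(Y = y, X = x) for X uniform on {0,1}^k sent through a BSC(p)."""
    k = len(y)
    d = sum(a != b for a, b in zip(y, x))  # number of flipped positions
    return p ** d * (1 - p) ** (k - d) / 2 ** k


def lemma1_direct(y, p):
    """Sum of p(y, x) over all x in {0,1}^k."""
    return sum(joint_prob(y, x, p)
               for x in itertools.product((0, 1), repeat=len(y)))


def lemma1_binomial(k, p):
    """Same sum, grouped by the number r of positions in which x differs from y."""
    return sum(math.comb(k, r) * (1 - p) ** (k - r) * p ** r
               for r in range(k + 1)) / 2 ** k
```

Both functions should return $\frac{1}{2^k}$ for any fixed $Y$ and any $p$, for example `lemma1_direct((0, 1, 1), 0.3)` is expected to be $\frac{1}{8}$ up to rounding.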
II-A Boolean functions from the classes 1 and 2 of Theorem 1
In order to apply Karamata’s inequality [2], we need to transform the mutual information into an algebraic expression. To this end, we employ concepts from probability mass functions of transformations of random variables [Ch. 5, Section 6 of [12]]. Let $X, Y$ be two $n$-dimensional discrete random vectors, with ensembles $E_X$, $E_Y$, $Z$ a discrete random variable, with ensemble $E_Z$, and $f$ an $n$-dimensional function, such that $Z = f(X)$. Let $T, U$ be two random vectors and $g$ be a multidimensional function, such that $T = g_1(X,Y) = Y$, $U = g_2(X,Y) = X$ and $Z = g_3(X,Y) = f(X)$.
[TABLE]
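Let $N_0$ and $N_1$ denote the numbers of inputs that $f$ maps to 0 and to 1, respectively, and let $\{x_i^{(0)}\}$ and $\{x_k^{(1)}\}$ be these inputs, such that $f(x_i^{(0)}) = 0$ and $f(x_k^{(1)}) = 1$, $\forall i \in \{1,2,\ldots,N_0\}$, $\forall k \in \{1,2,\ldots,N_1\}$. For the first class of functions, $N_1 = 1$, $N_0 = 2^n - 1$. Then, $p_{YZ}(y,1) = p_{XY}(x_1^{(1)}, y)$, $p_{YZ}(y,0) = \frac{1}{2^n} - p_{YZ}(y,1)$, $\forall y \in E_Y = \{0,1\}^n$. For any $x_1^{(1)} \in \{0,1\}^n$, there exists: one vector, that is $m_0 = 1$, $y_{i_0} \in \{0,1\}^n$, such that $y_{i_0} = x_1^{(1)}$; a number $m_1 = \binom{n}{1}$ of vectors $y_{i_1}$, $\forall i_1 \in \{m_0+1, m_0+2, \ldots, m_0+m_1\}$, that differ from $x_1^{(1)}$ in one position; and a number $m_k = \binom{n}{k}$ of vectors $y_{i_k}$, $\forall i_k \in \{(m_0+\ldots+m_{k-1})+1, (m_0+\ldots+m_{k-1})+2, \ldots, (m_0+\ldots+m_{k-1})+m_k\}$, that differ from $x_1^{(1)}$ in $k$ positions, $\forall k \in \{0,1,2,\ldots,n\}$.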
[TABLE]
[TABLE]
From this discussion, we can conclude that the mutual information is identical for all Boolean functions from the class of functions with $N_1 = 1$ and $N_0 = 2^n - 1$.
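The claim that the mutual information does not depend on which single input is mapped to 1 can be illustrated as follows. This is a sketch under the reading $p_{YZ}(y,1) = \frac{(1-p)^{n-k} p^k}{2^n}$, where $k$ is the number of positions in which $y$ differs from the selected input; `joint_profile` and `mi_single_point` are hypothetical names.

```python
import itertools
import math


def joint_profile(x0, p):
    """Sorted values of p_YZ(y, 1) = p_XY(x0, y) over all y in {0,1}^n."""
    n = len(x0)
    values = []
    for y in itertools.product((0, 1), repeat=n):
        k = sum(a != b for a, b in zip(x0, y))  # positions where y differs from x0
        values.append(p ** k * (1 - p) ** (n - k) / 2 ** n)
    return sorted(values)


def mi_single_point(n, p):
    """MI(Y; Z) in bits when exactly one input maps to 1 (N1 = 1)."""
    h_joint = 0.0
    for k in range(n + 1):
        q1 = p ** k * (1 - p) ** (n - k) / 2 ** n  # p_YZ(y, 1)
        q0 = 1 / 2 ** n - q1                        # p_YZ(y, 0)
        for v in (q1, q0):
            if v > 0:
                # comb(n, k) outputs y share the same pair of joint values
                h_joint -= math.comb(n, k) * v * math.log2(v)
    pz1 = 1 / 2 ** n
    h_z = -pz1 * math.log2(pz1) - (1 - pz1) * math.log2(1 - pz1)
    return n + h_z - h_joint  # H(Y) + H(Z) - H(Y, Z), with H(Y) = n
```

Since `joint_profile` depends on the selected input only through Hamming distances, it returns the same list for every `x0` of a given length, which is why a single closed form such as `mi_single_point` covers the whole class.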
Let $q = \{q_i\}$, $p = \{p_i\}$ and $w = \{w_i\}$, $\forall i \in \{1,2,\ldots,2^n\}$, such that, $\forall k \in \{0,1,\ldots,n\}$,
[TABLE]
[TABLE]
Let $a = \frac{1-p}{2^{n-1}}$ and $b = \frac{p}{2^{n-1}}$. We want to prove that
[TABLE]
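We need to transform the element $(-n)\cdot(n-1)$, from the right side of the inequality, into a sum of terms of the type $x \cdot \log x$, such that the number of elements on the right side of the inequality equals that of the left side. That is, we need $2^n \cdot (2^n - 1) - 2 \cdot 2^{n-1} \cdot (2^n - n) = (n-1) \cdot 2^n$ elements. That is, we need to find $x$ such that $(n-1) \cdot 2^n \cdot x \cdot \log x = (-n)\cdot(n-1) \Leftrightarrow x = \frac{1}{2^n}$. The right-hand side sequence has three distinct elements, ordered as $a = \frac{1-p}{2^{n-1}} \geq c = \frac{1}{2^n} \geq b = \frac{p}{2^{n-1}}$. The left-hand side sequence has the elements ordered as $w_{2^n} = \frac{1-p^n}{2^n-1} \geq w_{2^n-1} = \frac{1-(1-p)\cdot p^{n-1}}{2^n-1} \geq \ldots \geq w_i = \frac{1-(1-p)^{n-k}\cdot p^k}{2^n-1} \geq \ldots \geq w_1 = \frac{1-(1-p)^n}{2^n-1}$. Let $X = [x_1 x_2 \ldots x_{2^n \cdot (2^n-1)}]$ and $Y = [y_1 y_2 \ldots y_{2^n \cdot (2^n-1)}]$ be equal to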
[TABLE]
$\Rightarrow X$ and $Y$ are in descending order, which satisfies the first condition of Karamata’s theorem [2]. Let $g: \mathbb{R}_+ \to \mathbb{R}$, $g(x) = x \cdot \log x$. Then, $g$ is a convex function.
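The majorization conditions of Karamata’s theorem can be checked numerically for small $n$. The sketch below rests on an assumption about the two sequences, inferred from the element counts stated in the text: $X$ consists of $K = 2^{n-1}(2^n - n)$ copies of $a$, then $(n-1) \cdot 2^n$ copies of $c$, then $K$ copies of $b$, and $Y$ repeats each $w_i$ exactly $2^n - 1$ times. The names `karamata_sequences`, `majorizes` and `g` are illustrative, and the logarithm is taken in base 2.

```python
import math


def karamata_sequences(n, p):
    """Return (X, Y) in descending order, under the ASSUMED multiplicities."""
    K = 2 ** (n - 1) * (2 ** n - n)
    a = (1 - p) / 2 ** (n - 1)
    b = p / 2 ** (n - 1)
    c = 1 / 2 ** n
    X = [a] * K + [c] * ((n - 1) * 2 ** n) + [b] * K
    Y = []
    # k = number of differing positions; for p <= 1/2, larger k gives larger w
    for k in range(n, -1, -1):
        w = (1 - (1 - p) ** (n - k) * p ** k) / (2 ** n - 1)
        Y += [w] * (math.comb(n, k) * (2 ** n - 1))
    return X, Y


def majorizes(X, Y, tol=1e-9):
    """True if every partial sum of Y is <= that of X and the totals agree."""
    sx = sy = 0.0
    for x, y in zip(X, Y):
        sx += x
        sy += y
        if sy > sx + tol:
            return False
    return len(X) == len(Y) and abs(sx - sy) <= tol


def g(x):
    """Convex function g(x) = x * log2(x), with g(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


if __name__ == "__main__":
    X, Y = karamata_sequences(3, 0.25)
    print(len(X), majorizes(X, Y), sum(map(g, Y)) <= sum(map(g, X)))
```

When `majorizes(X, Y)` holds, Karamata’s theorem gives $\sum g(y_i) \leq \sum g(x_i)$ for the convex function `g`, which is the comparison printed last.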
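II-A1 We prove that $w_{2^n} \leq a$
$\Leftrightarrow \frac{1-p^n}{2^n-1} \leq \frac{1-p}{2^{n-1}} \Leftrightarrow (2^n-1) \cdot p - 2^{n-1} \cdot p^n \leq 2^{n-1} - 1$. Let $f: [0,\frac{1}{2}] \to \mathbb{R}_+$, $f(x) = (2^n-1) \cdot x - 2^{n-1} \cdot x^n$. $f'(x) = 2^n - 1 - 2^{n-1} \cdot n \cdot x^{n-1} \geq n - 2^{n-1} \cdot n \cdot \frac{1}{2^{n-1}} = 0 \Rightarrow f'(x) \geq 0$, $\forall x \in [0,\frac{1}{2}]$, $\Rightarrow$ the function $f$ is increasing. Let $x^*$ be the critical point of $f$. $f'(x) = 0 \Rightarrow (x^*)^{n-1} = \frac{2^n-1}{2^{n-1} \cdot n} \geq \frac{1}{2^{n-1}} \Leftrightarrow x^* \geq \frac{1}{2} \Rightarrow f(x) \leq f(\frac{1}{2})$, $\forall x \in [0,\frac{1}{2}] \Rightarrow (2^n-1) \cdot p - 2^{n-1} \cdot p^n \leq 2^{n-1} - 1 \Rightarrow w_{2^n} \leq a$.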
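As a sanity check of this step, the sketch below evaluates $w_{2^n} = \frac{1-p^n}{2^n-1}$ and $a = \frac{1-p}{2^{n-1}}$ on a grid of $p \in [0, \frac{1}{2}]$, together with the auxiliary function $f$ from the proof. The function names and the grid size are illustrative choices.

```python
def w_top(n, p):
    """Largest element of the left-hand sequence, w_{2^n}."""
    return (1 - p ** n) / (2 ** n - 1)


def a_value(n, p):
    """Largest element of the right-hand sequence, a."""
    return (1 - p) / 2 ** (n - 1)


def f_aux(n, x):
    """Auxiliary function f(x) = (2^n - 1) x - 2^(n-1) x^n from the proof."""
    return (2 ** n - 1) * x - 2 ** (n - 1) * x ** n


def step_holds(n, steps=200):
    """Check w_{2^n} <= a for p = 0, 1/(2*steps), ..., 1/2."""
    grid = [i / (2 * steps) for i in range(steps + 1)]
    return all(w_top(n, p) <= a_value(n, p) + 1e-15 for p in grid)
```

At $p = \frac{1}{2}$ both quantities equal $\frac{1}{2^n}$, so the inequality is tight at the right endpoint of the interval.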
Let $\operatorname{SL}_k$ and $\operatorname{SR}_k$, $\forall k \in \{1,2,\ldots,2^n \cdot (2^n-1)\}$, denote the partial sums computed with the elements of the left-hand sequence of the inequality (II-A) and with the right-hand one, respectively. Let $K = 2^{n-1} \cdot (2^n - n)$. Using the binomial theorem [13], $2^{n-1} \geq 1 + (n-1) \Rightarrow 2^{n-1} \cdot \frac{n-1}{2^{n-1}-1} \leq 2^{n-1} \leq 2^n - 1 \Rightarrow 2^n - 1 \leq 2^{n-1} \cdot (2^n - n)$. Moreover, $w_k \leq w_{2^n} \leq a$, $\forall k \in \{2^n, 2^n-1, \ldots, 1\} \Rightarrow \operatorname{SL}_k = \sum_{j=1}^{k} y_j \leq \operatorname{SR}_k = \sum_{j=1}^{k} x_j = k \cdot x_1 = k \cdot a$, $\forall k \in \{1,2,\ldots,K\}$.
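II-A2 We prove that $2 \cdot w_{2^n} \leq a + c$
$\Leftrightarrow 2 \cdot \frac{1-p^n}{2^n-1} + \frac{p}{2^{n-1}} \leq \frac{3}{2^n}$. Let $f: [0,\frac{1}{2}] \to \mathbb{R}_+$, $f(x) = 2 \cdot \frac{1-x^n}{2^n-1} + \frac{x}{2^{n-1}}$. Using the binomial theorem [13], for $n \geq 3$, $\frac{2 \cdot n}{2^n-1} \leq 1 \Rightarrow \frac{2 \cdot n}{2^n-1} \cdot x^{n-1} \leq \frac{1}{2^{n-1}}$, $\forall\, 0 \leq x \leq \frac{1}{2}$. $f'(x) = \frac{-2 \cdot n \cdot x^{n-1}}{2^n-1} + \frac{1}{2^{n-1}} \Rightarrow f'(x) \geq 0$, $\forall\, 0 \leq x \leq \frac{1}{2} \Rightarrow f$ is increasing $\Rightarrow f(x) \leq f(\frac{1}{2})$, $\forall\, 0 \leq x \leq \frac{1}{2} \Rightarrow 2 \cdot \frac{1-p^n}{2^n-1} + \frac{p}{2^{n-1}} \leq \frac{3}{2^n} \Rightarrow 2 \cdot w_{2^n} \leq a + c$.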
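The scalar inequality of this step can be spot-checked on a grid of $p$. The sketch below does so for a given $n$, with $c = \frac{1}{2^n}$; the helper names and the grid resolution are illustrative assumptions.

```python
def two_w_top(n, p):
    """Left-hand side 2 * w_{2^n} = 2 (1 - p^n) / (2^n - 1)."""
    return 2 * (1 - p ** n) / (2 ** n - 1)


def a_plus_c(n, p):
    """Right-hand side a + c = (1 - p) / 2^(n-1) + 1 / 2^n."""
    return (1 - p) / 2 ** (n - 1) + 1 / 2 ** n


def holds_on_grid(n, steps=400):
    """Check 2 * w_{2^n} <= a + c for p = 0, 1/(2*steps), ..., 1/2."""
    grid = [i / (2 * steps) for i in range(steps + 1)]
    return all(two_w_top(n, p) <= a_plus_c(n, p) + 1e-15 for p in grid)


if __name__ == "__main__":
    for n in range(3, 11):
        print(n, holds_on_grid(n))
```

For $n \geq 3$ the check is expected to succeed on the whole grid. For $n = 2$ it fails at, for example, $p = \frac{3}{8}$, where $2 \cdot w_{2^n} = \frac{110}{192}$ and $a + c = \frac{108}{192}$, which is consistent with the case $n = 2$ being verified directly on the partial sums.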
II-A3 We prove that the inequalities involving the partial sums from Karamata’s theorem hold
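If $n = 2$, it can be easily verified that $\operatorname{SL}_{K+i} \leq 4 \cdot a + i \cdot c = \operatorname{SR}_{K+i}$, $\forall i \in \{1,2,3,4\}$. If $n \geq 3$, using the binomial theorem [13], we have that $K - 2^n \cdot (n-1) \geq 1$, $\forall n \geq 3$; $2 \cdot w_{2^n} \leq a + c \Rightarrow w_j + w_k \leq a + c$, $\forall j,k \in \{2^n, 2^n-1, \ldots, 1\} \Leftrightarrow y_j + y_k \leq x_1 + x_{K+1}$, $\forall j,k \in \{2^n \cdot (2^n-1), 2^n \cdot (2^n-1)-1, \ldots, 1\}$; $K - i \geq 1$, $\forall i \in \{1,2,\ldots,2^n \cdot (n-1)\} \Rightarrow \operatorname{SL}_{K+i} = \operatorname{SL}_{K-i} + y_{K-i+1} + \ldots + y_K + y_{K+1} + \ldots + y_{K+i} = \operatorname{SL}_{K-i} + (y_{K-i+1} + y_{K+1}) + \ldots + (y_K + y_{K+i}) \Rightarrow \operatorname{SL}_{K+i} \leq \operatorname{SR}_{K-i} + i \cdot (x_1 + x_{K+1}) = \operatorname{SR}_{K+i}$, $\forall i \in \{1,2,\ldots,2^n \cdot (n-1)\} \Rightarrow \operatorname{SL}_{K+i} \leq \operatorname{SR}_{K+i}$, $\forall i \in \{1,2,\ldots,2^n \cdot (n-1)\}$.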
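II-A4 We prove that $w_1 \geq b$
$\Leftrightarrow 2^{n-1} \geq 2^{n-1} \cdot (1-p)^n + (2^n-1) \cdot p$. Let $f: [0,\frac{1}{2}] \to \mathbb{R}_+$, $f(x) = 2^{n-1} \cdot (1-x)^n + (2^n-1) \cdot x$. Let $x^*$ be the critical point of $f$. $f'(x) = 2^{n-1} \cdot n \cdot (1-x)^{n-1} \cdot (-1) + (2^n-1)$, $f''(x) = n \cdot (n-1) \cdot 2^{n-1} \cdot (1-x)^{n-2} \geq 0$, $\forall x \in [0,\frac{1}{2}] \Rightarrow f$ is a convex function and $x^*$ is a minimum point $\Rightarrow f(x) \leq f(0) = f(\frac{1}{2}) = 2^{n-1}$, $\forall x \in [0,\frac{1}{2}] \Rightarrow 2^{n-1} \geq 2^{n-1} \cdot (1-p)^n + (2^n-1) \cdot p \Rightarrow w_1 \geq b$.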
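The convexity argument can be illustrated numerically. The sketch below evaluates the auxiliary function $f(x) = 2^{n-1}(1-x)^n + (2^n-1)x$ and compares $w_1 = \frac{1-(1-p)^n}{2^n-1}$ with $b = \frac{p}{2^{n-1}}$ on a grid; the names are illustrative.

```python
def f_convex(n, x):
    """Auxiliary convex function f(x) = 2^(n-1) (1 - x)^n + (2^n - 1) x."""
    return 2 ** (n - 1) * (1 - x) ** n + (2 ** n - 1) * x


def w_bottom(n, p):
    """Smallest element of the left-hand sequence, w_1."""
    return (1 - (1 - p) ** n) / (2 ** n - 1)


def b_value(n, p):
    """Smallest element of the right-hand sequence, b."""
    return p / 2 ** (n - 1)


def step_holds(n, steps=200):
    """Check w_1 >= b for p = 0, 1/(2*steps), ..., 1/2."""
    grid = [i / (2 * steps) for i in range(steps + 1)]
    return all(w_bottom(n, p) >= b_value(n, p) - 1e-15 for p in grid)
```

Both endpoints give $f(0) = f(\frac{1}{2}) = 2^{n-1}$, so a convex $f$ cannot exceed $2^{n-1}$ anywhere on $[0, \frac{1}{2}]$, which is the inequality checked by `step_holds`.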
II-A5 We verify that the final inequalities involving the partial sums from Karamata’s theorem hold
In (II-A1), we proved that $2^n - 1 \leq 2^{n-1} \cdot (2^n - n)$. $K = 2^{n-1} \cdot (2^n - n)$ represents the total number of elements equal to $b$. So far, the partial sum inequalities have been established only for $2^n - 1$ of the elements equal to $b$. We need to show that the remaining elements equal to $b$ also satisfy the partial sum inequalities. We denote the corresponding partial sums as $\{\operatorname{SL}_{2^n \cdot (2^n-1) - 2^n}, \ldots, \operatorname{SL}_{2^n \cdot (2^n-1) - 2^{n-1} \cdot (2^n-n) + 1}\}$ and $\{\operatorname{SR}_{2^n \cdot (2^n-1) - 2^n}, \ldots, \operatorname{SR}_{2^n \cdot (2^n-1) - 2^{n-1} \cdot (2^n-n) + 1}\}$.
Let $M = 2^n \cdot (2^n-1) - (2^n-1)$. $\operatorname{SL}_M = \sum_{j=1}^{M-i} y_j + y_{M-i+1} + \ldots + y_M \leq \operatorname{SR}_M = \sum_{j=1}^{M-i} x_j + x_{M-i+1} + \ldots + x_M$, $\forall i \in \{1,2,\ldots,2^{n-1} \cdot (2^n-n) - (2^n-1)\}$ $\Rightarrow \operatorname{SL}_{M-i} \leq \operatorname{SR}_{M-i} + (b - y_{M-i+1}) + \ldots + (b - y_M) \leq \operatorname{SR}_{M-i}$, $\forall i \in \{1,2,\ldots,2^{n-1} \cdot (2^n-n) - (2^n-1)\}$ $\Rightarrow \operatorname{SL}_{M-i} \leq \operatorname{SR}_{M-i}$, $\forall i \in \{1,2,\ldots,2^{n-1} \cdot (2^n-n) - (2^n-1)\}$. These sums are well defined, because $M - i \geq 1$, $\forall i \in \{1,2,\ldots,2^{n-1} \cdot (2^n-n) - (2^n-1)\}$. Indeed, $\forall i \in \{1,2,\ldots,2^{n-1} \cdot (2^n-n) - (2^n-1)\} \Rightarrow M - i \geq [2^n \cdot (2^n-1) - (2^n-1)] - [2^{n-1} \cdot (2^n-n) - (2^n-1)] \Leftrightarrow M - i \geq 2^n \cdot (2^n-1) - 2^{n-1} \cdot (2^n-n)$.
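The first partial sum that does not contain an element equal to $b$ is given by $i = 2^{n-1} \cdot (2^n-n) - (2^n-1) \Rightarrow M - i = 2^n \cdot (2^n-1) - 2^{n-1} \cdot (2^n-n) = K + 2^n \cdot (n-1)$. As a result, $\operatorname{SL}_{K+2^n \cdot (n-1)} \leq \operatorname{SR}_{K+2^n \cdot (n-1)}$, which we also proved in (II-A3). In conclusion, all the conditions in Karamata’s theorem are satisfied. This yields $\sum_{i=1}^{2^n \cdot (2^n-1)} g(y_i) \leq \sum_{i=1}^{2^n \cdot (2^n-1)} g(x_i) \Leftrightarrow \mathrm{MI}(Y,Z) \leq 1 - H(p)$.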
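The equivalence between the Karamata inequality and the bound on the mutual information can be made explicit numerically. Under the same assumed multiplicities as in the text's element counts ($K$ copies of $a$ and of $b$, $(n-1) \cdot 2^n$ copies of $c$, and each $w_i$ repeated $2^n - 1$ times), a direct expansion of the entropies suggests the identity $\mathrm{MI}(Y,Z) = 1 - H(p) - \frac{1}{2^n}\left(\sum g(x_i) - \sum g(y_i)\right)$ with base-2 logarithms. This identity is a reconstruction, not a formula quoted from the paper; the sketch below compares it with a brute-force computation for a function with a single 1 in its output table.

```python
import itertools
import math


def g(x):
    """g(x) = x * log2(x), with g(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def sum_g_gap(n, p):
    """sum g(x_i) - sum g(y_i) for the ASSUMED Karamata sequences."""
    K = 2 ** (n - 1) * (2 ** n - n)
    a, b, c = (1 - p) / 2 ** (n - 1), p / 2 ** (n - 1), 1 / 2 ** n
    gx = K * (g(a) + g(b)) + (n - 1) * 2 ** n * g(c)
    gy = (2 ** n - 1) * sum(
        math.comb(n, k) * g((1 - (1 - p) ** (n - k) * p ** k) / (2 ** n - 1))
        for k in range(n + 1))
    return gx - gy


def mi_point_indicator(n, p):
    """Brute-force MI(Y; Z) in bits for f = indicator of the all-zeros input."""
    pts = list(itertools.product((0, 1), repeat=n))
    joint = {}
    for x in pts:
        z = int(x == (0,) * n)
        for y in pts:
            d = sum(u != v for u, v in zip(x, y))
            joint[(z, y)] = joint.get((z, y), 0.0) + p ** d * (1 - p) ** (n - d) / 2 ** n
    pz = {z: sum(v for (zz, _), v in joint.items() if zz == z) for z in (0, 1)}
    return sum(v * math.log2(v * 2 ** n / pz[z])
               for (z, _), v in joint.items() if v > 0)


def mi_from_gap(n, p):
    """MI(Y; Z) predicted by the reconstructed identity."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1 - h - sum_g_gap(n, p) / 2 ** n
```

A non-negative `sum_g_gap(n, p)` therefore corresponds exactly to $\mathrm{MI}(Y,Z) \leq 1 - H(p)$ for this class.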
Following the above reasoning, the same result holds for Boolean functions that have one element equal to 0 in their output table and the rest equal to 1, that is, $N_1 = 2^n - 1$ and $N_0 = 1$.
II-B Boolean functions from the classes 3 and 4 of Theorem 1
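For any $r \in \{1,2,\ldots,n-1\}$, let $N_1 = 2^{n-r}$ and $Y^{(k)} = [Y_r^{(k)} Y_{n-r}^{(k)}]$, $\forall k \in \{1,2,\ldots,2^n\}$, and $X^{(i)} = [X_r X_{n-r}^{(i)}]$, $\forall i \in \{1,2,\ldots,2^{n-r}\}$, such that $X^{(i)} \in \{[X_r 00\ldots00], [X_r 00\ldots01], \ldots, [X_r 11\ldots11]\}$. The output table of the Boolean function has $N_1 = 2^{n-r}$ ones, which correspond to the input vectors $X^{(i)} \in \{[X_r 00\ldots00], [X_r 00\ldots01], \ldots, [X_r 11\ldots11]\}$, where $X_r$ is fixed. The rest of the output values are zeros.
From the properties of the binary symmetric channel, we have that $p(Y^{(k)}, X^{(i)}) = p(Y_r^{(k)}, X_r) \cdot p(Y_{n-r}^{(k)}, X_{n-r}^{(i)})$, $\forall k \in \{1,2,\ldots,2^n\}$. According to Lemma 1, $\sum_{i=1}^{2^{n-r}} p(Y_{n-r}^{(k)}, X_{n-r}^{(i)}) = \frac{1}{2^{n-r}}$. Let $q_k = p_{YZ}(Y^{(k)}, 1) = \sum_{i=1}^{2^{n-r}} p(Y^{(k)}, X^{(i)}) = \sum_{i=1}^{2^{n-r}} p(Y_r^{(k)}, X_r) \cdot p(Y_{n-r}^{(k)}, X_{n-r}^{(i)}) = \frac{p(Y_r^{(k)}, X_r)}{2^{n-r}}$. Let $p_k = p_{YZ}(Y^{(k)}, 0) = p_Y(Y^{(k)}) - p_{YZ}(Y^{(k)}, 1) = \frac{1}{2^n} - p_{YZ}(Y^{(k)}, 1)$, $\forall k \in \{1,2,\ldots,2^n\}$. For any $k \in \{1,2,\ldots,2^n\}$, the total number of $Y^{(k)} = [Y_r^{(k)} Y_{n-r}^{(k)}]$ that have the same $Y_r^{(k)}$ is equal to $N_1 = 2^{n-r}$. This produces $N_1 = 2^{n-r}$ identical probability mass values $q_k = \frac{p(Y_r^{(k)}, X_r)}{2^{n-r}}$ and $N_1 = 2^{n-r}$ identical probability mass values $p_k = \frac{1}{2^n} - q_k$. Let the vectors $v = [v_1 v_2 \ldots v_{2^r}]$ and $t = [t_1 t_2 \ldots t_{2^r}]$ denote the distinct values of the vectors $q = [q_1 q_2 \ldots q_{2^n}]$ and $p = [p_1 p_2 \ldots p_{2^n}]$, respectively.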
[TABLE]
For any fixed $X_r \in \{0,1\}^r$, there exists: one vector, that is $m_0 = 1$, $Y_r^{(i_0)} \in \{0,1\}^r$, such that $Y_r^{(i_0)} = X_r$; a number $m_1 = \binom{r}{1}$ of vectors $Y_r^{(i_1)}$, $\forall i_1 \in \{m_0+1, m_0+2, \ldots, m_0+m_1\}$, that differ from $X_r$ in one position; and a number $m_j = \binom{r}{j}$ of vectors $Y_r^{(i_j)}$, $\forall i_j \in \{(m_0+\ldots+m_{j-1})+1, (m_0+\ldots+m_{j-1})+2, \ldots, (m_0+\ldots+m_{j-1})+m_j\}$, that differ from $X_r$ in $j$ positions, $\forall j \in \{0,1,2,\ldots,r\}$. As a result, we obtain
[TABLE]
The last inequality represents the result proved for Boolean functions from the classes 1 and 2, with $n = r$. Equality is obtained for $r = 1$, that is, for the dictatorship function. If $r = 1 \Rightarrow N_1 = 2^{n-1}$, $N_0 = 2^{n-1}$, $v = [\frac{1-p}{2^n}\ \frac{p}{2^n}]$ and $t = [\frac{p}{2^n}\ \frac{1-p}{2^n}] \Rightarrow \mathrm{MI}(Y,Z) = 1 - H(p)$.
Following the above reasoning, the same result holds for Boolean functions that have $N_0 = 2^{n-r}$ elements equal to 0 in their output table and the rest equal to 1, that is, $N_1 = 2^n - 2^{n-r} = 2^{n-r} \cdot (2^r - 1)$, $\forall r \in \{1,2,\ldots,n-1\}$. These Boolean functions satisfy an additional condition: the 0 values from the output table correspond to the input vectors $X^{(i)} = [X_r X_{n-r}^{(i)}] \in \{[X_r 00\ldots00], [X_r 00\ldots01], \ldots, [X_r 11\ldots11]\}$, where $X_r$ is fixed, $\forall i \in \{1,2,\ldots,2^{n-r}\}$.
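The reduction from dimension $n$ to dimension $r$ can be observed numerically: a function that equals 1 exactly on the inputs sharing a fixed block $X_r$ should have the same mutual information as the single-point function on $r$ bits. The sketch below assumes that $X_r$ occupies the first $r$ coordinates and uses illustrative helper names.

```python
import itertools
import math


def mi_bits(f, n, p):
    """Brute-force MI(f(X); Y) in bits, X uniform on {0,1}^n, Y = BSC(p) output."""
    pts = list(itertools.product((0, 1), repeat=n))
    joint = {}
    for x in pts:
        for y in pts:
            d = sum(a != b for a, b in zip(x, y))
            key = (f(x), y)
            joint[key] = joint.get(key, 0.0) + p ** d * (1 - p) ** (n - d) / 2 ** n
    pz = {z: sum(v for (zz, _), v in joint.items() if zz == z) for z in (0, 1)}
    return sum(v * math.log2(v * 2 ** n / pz[z])
               for (z, _), v in joint.items() if v > 0)


def subcube_indicator(prefix):
    """Class 3 function: 1 iff the first len(prefix) bits equal prefix (assumed layout)."""
    r = len(prefix)
    return lambda x: int(x[:r] == prefix)


def point_indicator(x0):
    """Class 1 function: 1 iff the input equals x0."""
    return lambda x: int(x == x0)


if __name__ == "__main__":
    p = 0.2
    print(mi_bits(subcube_indicator((1, 0)), 4, p),
          mi_bits(point_indicator((1, 0)), 2, p))
```

With a prefix of length 1 the function is a dictatorship, and the computed value is expected to equal $1 - H(p)$ up to rounding.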
III Conclusions
In this study, we proved the Courtade-Kumar conjecture for certain subclasses of Boolean lex functions, for all dimensions, $\forall n \geq 2$, and for all values of the error probability, $\forall\, 0 \leq p \leq \frac{1}{2}$. We provided an algebraic proof using Karamata’s theorem as our main tool. We brought further improvement in the effort to establish this conjecture in its most general form. The novelty of our work lies in showing that, for several subclasses of Boolean lex functions, the conjecture holds for all dimensions, $\forall n \geq 2$, and for all values of the error probability, $\forall\, 0 \leq p \leq \frac{1}{2}$. We have tried to apply Karamata’s theorem to other types of Boolean functions, in order to solve the conjecture in its most general form. However, we have been unsuccessful both in applying the theorem directly to the mutual information inequality and in finding a suitable algebraic transformation of the original inequality into an expression that can be proved with Karamata’s theorem. For these functions, the majorization condition from this theorem cannot be verified.
IV Acknowledgments
We would like to thank Thomas Courtade for helpful discussions on lex functions and for indicating two articles which employ Karamata’s theorem and its extension, Schur convexity, namely an earlier version of the preprint [4] and [3], respectively.
Bibliography
[1] T. A. Courtade and G. R. Kumar, “Which Boolean functions maximize mutual information on noisy inputs?” IEEE Transactions on Information Theory, vol. 60, no. 8, pp. 4515–4525, August 2014.
[2] J. Karamata, “Sur une inégalité relative aux fonctions convexes,” Publications de l’Institut Mathématique, vol. 1, no. 1, pp. 145–147, 1932.
[3] G. R. Kumar and T. A. Courtade, “Which Boolean functions are most informative?” in Proceedings of the 2013 IEEE International Symposium on Information Theory (ISIT 2013), 2013.
[4] G. Kindler, R. O’Donnell, and D. Witmer, “Continuous analogues of the most informative function problem,” preprint, arXiv:1506.03167v3.
[5] O. Ordentlich, O. Shayevitz, and O. Weinstein, “An improved upper bound for the most informative Boolean function conjecture,” in Proceedings of the 2016 International Symposium on Information Theory (ISIT 2016), 2016.
[6] A. Samorodnitsky, “On the entropy of a noisy function,” IEEE Transactions on Information Theory, vol. 62, no. 10, pp. 5446–5464, October 2016.
[7] E. Erkip, “The efficiency of information in investment,” Doctor of Philosophy dissertation, Stanford University, Department of Electrical Engineering, August 1996.
[8] A. D. Wyner and J. Ziv, “A theorem on the entropy of certain binary sequences and applications: part I,” IEEE Transactions on Information Theory, vol. IT-19, no. 6, pp. 769–772, November 1973.