Boolean Functions with Biased Inputs: Approximation and Noise Sensitivity
Mohsen Heidari, S. Sandeep Pradhan, Ramji Venkataramanan

TL;DR
This paper analyzes how well Boolean functions can be approximated by simpler classes like juntas and linear functions under biased input distributions, linking approximation quality to Fourier analysis and noise sensitivity.
Contribution
It characterizes optimal approximations and mismatch probabilities for biased inputs using biased Fourier expansion, and connects these to noise sensitivity analysis.
Findings
Optimal approximation strategies for biased inputs are derived.
Mismatch probabilities are expressed via biased Fourier coefficients.
Noise sensitivity is characterized in terms of Fourier analysis.
Abstract
This paper considers the problem of approximating a Boolean function $f$ using another Boolean function from a specified class. Two classes of approximating functions are considered: $k$-juntas, and linear Boolean functions. The $n$ input bits of the function are assumed to be independently drawn from a distribution that may be biased. The quality of approximation is measured by the mismatch probability between $f$ and the approximating function $g$. For each class, the optimal approximation and the associated mismatch probability are characterized in terms of the biased Fourier expansion of $f$. The technique used to analyze the mismatch probability also yields an expression for the noise sensitivity of $f$ in terms of the biased Fourier coefficients, under a general i.i.d. input perturbation model.
Mohsen Heidari
University of Michigan, USA
S. Sandeep Pradhan
University of Michigan, USA
Ramji Venkataramanan
University of Cambridge, UK
This work was supported in part by a grant from the Michigan Cambridge Research Initiative (MCRI) and by NSF grant CCF 1717299.
I Introduction
Given a set of labeled data, we may wish to learn the optimal classifier within a specific class of functions. For example, given $n$-dimensional data with binary labels, one may wish to construct a classifier that depends on only $k$ of the input variables (where $k$ may be much smaller than $n$). Such a parsimonious classifier would be less accurate on the training data than the optimal unconstrained classifier (which uses all $n$ variables), but may be more robust to errors in the data. A useful measure to quantify this trade-off is the probability of mismatch between the optimal unconstrained and constrained classifiers, under some distribution on the input variables.
Motivated by such applications, we consider the problem of approximating a given Boolean function using a simpler Boolean function from a specified class. The input set is equipped with a product distribution, where each of the input bits is drawn independently according to
[TABLE]
The quality of approximation is measured by the mismatch probability , where .
We consider two classes of approximating functions: i) -juntas where the Boolean function depends on at most of the input variables (with ), and ii) linear Boolean functions which are parity functions or negations of a parity on a subset of the input variables. In each case, we characterize the optimal approximation and the associated mismatch probability in terms of the -biased Fourier expansion of the original function .
The standard Fourier expansion [1] of a Boolean function is a multilinear polynomial with real coefficients, where each term in the polynomial corresponds to a parity function on a subset of the input variables. The Fourier expansion has been used to analyze Boolean functions in a wide range of applications, e.g., to characterize the learning complexity [2, 3], noise sensitivity [1, 4, 5], approximation [6], and other information-theoretic properties [7, 8, 9]. The parity functions form a set of orthonormal basis functions when the inputs to the Boolean function are uniformly random.
For , the -biased Fourier expansion [1, Chap. 8] generalizes the standard Fourier expansion by expressing the Boolean function as a linear combination of functions that form an orthonormal basis when the input variables are drawn i.i.d. according to the distribution in (1). -biased Fourier analysis was used in [10] to show that a certain class of Boolean functions could be learnt efficiently using examples drawn from a biased input distribution. It has also been used to study threshold phenomena of random graphs [11]. In this paper, we use the -biased expansion to study optimal approximation of Boolean functions with biased inputs.
The contributions of the paper are as follows.
1. In Section III, we obtain an expression (Lemma 1) for the mismatch probability , where are Boolean functions with statistically dependent binary inputs and , respectively. Taking yields the noise sensitivity of a Boolean function under a general i.i.d. input perturbation model. Lemma 1 also generalizes a bound on the mismatch probability obtained in [12].
2. Next, by taking , Lemma 1 is used to establish the optimal approximation with $k$-juntas (Section IV), and with linear Boolean functions (Section V). We provide examples to illustrate how the optimal approximation within a class depends on the input bias.
We remark that some of the results (such as those in Section IV) hold for product distributions over any finite input alphabet. For concreteness, we focus on the binary input alphabet throughout the paper. We also mention that the worst-case circuit-size complexity of approximating Boolean functions with uniform inputs was analyzed in [13].
Notation: We use to denote the set . The cardinality of a set is denoted by . Given and a sequence of numbers , denote . We use upper case to denote random variables, lower case for realizations, and boldface for vectors.
II The $p$-biased Fourier Expansion
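We consider Boolean functions $f: \{-1,1\}^n \to \{-1,1\}$ with the distribution on the entries of the input being i.i.d. according to (1). With this distribution, an inner product can be defined for the (larger) space of bounded functions with binary inputs and real-valued outputs. For any $f, g: \{-1,1\}^n \to \mathbb{R}$, let
$$\langle f, g \rangle := \mathbb{E}[f(\mathbf{X})\, g(\mathbf{X})]. \tag{2}$$
The $p$-biased Fourier expansion [1, Chap. 8] of a function $f$ is
$$f(\mathbf{x}) = \sum_{S \subseteq [n]} \hat{f}_S\, \phi_S(\mathbf{x}), \tag{3}$$
where
$$\phi_S(\mathbf{x}) := \prod_{i \in S} \frac{x_i - \mu}{\sigma}. \tag{4}$$
Here
[TABLE]
are the mean $\mu$ and standard deviation $\sigma$, respectively, of each of the $X_i$’s. For $S \subseteq [n]$, the $p$-biased Fourier coefficients can be computed as
$$\hat{f}_S = \langle f, \phi_S \rangle = \mathbb{E}[f(\mathbf{X})\, \phi_S(\mathbf{X})], \tag{6}$$
where the entries of $\mathbf{X}$ are i.i.d. according to (1). Under this inner product, the set of functions $\{\phi_S : S \subseteq [n]\}$ is an orthonormal basis. Indeed, using the independence of the $X_i$’s, it can be shown that for any $S, T \subseteq [n]$, the inner product $\langle \phi_S, \phi_T \rangle = 1$ if $S = T$, and $0$ otherwise.
Since (3) is an orthonormal expansion, the inner product between two functions can be expressed in terms of their $p$-biased Fourier coefficients. For any $f, g$,
$$\langle f, g \rangle = \sum_{S \subseteq [n]} \hat{f}_S\, \hat{g}_S. \tag{7}$$
The standard Fourier expansion corresponds to the case where $p = 1/2$. In this case, $\mu = 0$, $\sigma = 1$, and the basis functions are $\phi_S(\mathbf{x}) = \prod_{i \in S} x_i$, $S \subseteq [n]$.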
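As a concrete illustration of the biased expansion, the following minimal sketch computes biased Fourier coefficients by brute-force enumeration for a small number of inputs. It assumes inputs in $\{-1,+1\}$ with $P(X_i = +1) = p$, so that each input has mean $2p-1$ and standard deviation $2\sqrt{p(1-p)}$. This sign convention, the names `biased_fourier` and `majority`, and the choice of a 3-bit majority are illustrative assumptions rather than details taken from the text.

```python
import itertools
import math

def biased_fourier(f, n, p):
    """Brute-force p-biased Fourier coefficients of f: {-1,+1}^n -> R.

    Assumed convention (an illustration, not from the text): P(X_i = +1) = p.
    Returns a dict mapping each subset S (a tuple of indices) to E[f(X) phi_S(X)].
    """
    mu = 2 * p - 1                       # mean of each X_i
    sigma = 2 * math.sqrt(p * (1 - p))   # standard deviation of each X_i
    points = list(itertools.product([-1, 1], repeat=n))
    # Product-measure probability of each input point.
    prob = {x: math.prod(p if xi == 1 else 1 - p for xi in x) for x in points}
    coeffs = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            # phi_S(x) = prod_{i in S} (x_i - mu) / sigma; empty product is 1.
            coeffs[S] = sum(
                prob[x] * f(x) * math.prod((x[i] - mu) / sigma for i in S)
                for x in points
            )
    return coeffs

def majority(x):
    """Majority vote of an odd number of +/-1 inputs."""
    return 1 if sum(x) > 0 else -1

coeffs = biased_fourier(majority, n=3, p=0.3)
parseval = sum(c * c for c in coeffs.values())  # E[f^2], which is 1 for +/-1 outputs
mean_f = coeffs[()]                              # empty-set coefficient is E[f(X)]
```

The empty-set coefficient equals the mean of the function, and for a $\pm 1$-valued function the squared coefficients sum to one. Enumeration costs on the order of $4^n$ operations, so the sketch is only practical for small $n$.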
For and any set , let denote the components of indexed by . We refer to as the projection of onto . This projection is denoted by , and is given by
[TABLE]
The last equality is obtained from (3) by noting that for any set , the conditional expectation . We note that the projection may have real-valued outputs, even when is Boolean.
III Boolean functions of jointly distributed random variables
In this section we investigate Boolean functions, say and , whose inputs are statistically correlated. We derive an expression for the mismatch probability in terms of the biased Fourier coefficients of the functions.
Let be jointly distributed Boolean random variables with joint pmf whose marginals satisfy
[TABLE]
Let denote the correlation coefficient between and . The joint pmf is uniquely determined by the triple . Let be a pair of sequences with entries .
For any Boolean functions , the -biased Fourier expansion of is given by (3)–(4), and the -biased Fourier expansion of is
[TABLE]
where with
[TABLE]
The -biased Fourier coefficients of are
[TABLE]
The following result expresses the probability of mismatch between and in terms of their biased Fourier coefficients.
Lemma 1.
For with ,
[TABLE]
Proof:
Using the -biased Fourier expansion for and the -biased one for , we have
[TABLE]
Here is obtained as follows, using the independence of the pairs across : when , there is at least one index that belongs to only one of these two sets. If and , the term ; similarly if and , then .
Eq. (14) follows by observing that
[TABLE]
∎
For , Lemma 1 shows that the biased Fourier coefficients corresponding to sets of small cardinality play a key role in determining the probability of mismatch. Since and are Boolean, by Parseval’s formula we have
[TABLE]
Suppose that the biased Fourier coefficients of and are both largely concentrated on sets of small cardinality. Then if the coefficients have the same sign on these sets, then (14) shows that the probability of mismatch between and will be small; if the coefficients have opposite signs on these sets, the probability of mismatch will be close to . On the other hand, if the biased Fourier coefficients of are concentrated on sets of large cardinality, then for , the probability of mismatch will be close to .
Noise sensitivity: The noise sensitivity of a Boolean function is defined as , where . It represents the mismatch probability under a perturbation model where the noisy input is assumed to be generated from the original input via a memoryless channel .
By taking , Lemma 1 yields the noise sensitivity for a general bivariate distribution on a pair of Boolean random variables, parametrized by . From (14), the noise sensitivity of can be expressed as
[TABLE]
where and are the -biased and -biased Fourier coefficients, respectively. This generalizes previous characterizations of noise sensitivity [1, 6], which assumed a symmetric perturbation model with .
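A numerical sanity check of this kind of identity can be sketched as follows. The sketch assumes $\pm 1$-valued functions and inputs with $P(X_i = +1) = p$, $P(Y_i = +1) = q$ and correlation coefficient $\rho$, and it assumes the identity takes the form $P(f(\mathbf{X}) \neq g(\mathbf{Y})) = \frac{1}{2} - \frac{1}{2}\sum_{S} \rho^{|S|} \hat{f}_S \hat{g}_S$, which follows from orthonormality of the two bases and independence across coordinates. The sign convention, the symbol names, and the helper names are assumptions made here for illustration.

```python
import itertools
import math

def joint_pmf(p, q, rho):
    """Joint pmf on {-1,+1}^2 with P(X=1)=p, P(Y=1)=q and correlation rho."""
    a = p * q + rho * math.sqrt(p * (1 - p) * q * (1 - q))  # P(X=1, Y=1)
    pmf = {(1, 1): a, (1, -1): p - a, (-1, 1): q - a, (-1, -1): 1 - p - q + a}
    if min(pmf.values()) < -1e-12:
        raise ValueError("(p, q, rho) does not define a valid joint pmf")
    return pmf

def coeffs(f, n, p):
    """Brute-force biased Fourier coefficients, indexed by subsets (tuples)."""
    mu, sigma = 2 * p - 1, 2 * math.sqrt(p * (1 - p))
    pts = list(itertools.product([-1, 1], repeat=n))
    out = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            out[S] = sum(
                f(x)
                * math.prod(p if xi == 1 else 1 - p for xi in x)
                * math.prod((x[i] - mu) / sigma for i in S)
                for x in pts
            )
    return out

def mismatch_direct(f, g, n, p, q, rho):
    """P(f(X) != g(Y)) by enumerating all pairs (x, y)."""
    pmf = joint_pmf(p, q, rho)
    pts = list(itertools.product([-1, 1], repeat=n))
    return sum(
        math.prod(pmf[(xi, yi)] for xi, yi in zip(x, y))
        for x in pts for y in pts if f(x) != g(y)
    )

def mismatch_fourier(f, g, n, p, q, rho):
    """Same probability from the biased coefficients of f (bias p) and g (bias q)."""
    fc, gc = coeffs(f, n, p), coeffs(g, n, q)
    return 0.5 - 0.5 * sum(rho ** len(S) * fc[S] * gc[S] for S in fc)

def majority(x):
    return 1 if sum(x) > 0 else -1

def or_fn(x):
    # Illustrative convention: +1 plays the role of "true".
    return 1 if any(xi == 1 for xi in x) else -1

direct = mismatch_direct(majority, or_fn, 3, 0.3, 0.4, 0.5)
fourier = mismatch_fourier(majority, or_fn, 3, 0.3, 0.4, 0.5)
# Noise sensitivity: the same function evaluated on both inputs.
ns_direct = mismatch_direct(majority, majority, 3, 0.3, 0.3, 0.6)
ns_fourier = mismatch_fourier(majority, majority, 3, 0.3, 0.3, 0.6)
```

Passing the same function for both arguments gives the noise sensitivity under the chosen perturbation model. With $\rho = 1$ and equal marginals the two inputs coincide, and the Fourier expression reduces to zero by Parseval.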
In the following sections, we will use Lemma 1 to obtain the mismatch probability for approximations of Boolean functions. We will apply Lemma 1 taking to be the approximating function, and with (i.e., ).
IV Approximation with $k$-Juntas
In the set of Boolean functions with $n$ input variables, $k$-juntas are Boolean functions whose output depends only on a subset of at most $k$ input variables.
Definition 1.
A Boolean function is a -junta (with ), if there exist and a Boolean function such that
[TABLE]
In this section, we investigate approximation of Boolean functions by -juntas. Given a Boolean function , we wish to find a -junta that minimizes the mismatch probability where the entries of are i.i.d. according to (1). Letting denote the set of all -juntas, the minimum mismatch probability is denoted by
[TABLE]
The following theorem gives an expression for and an optimal -junta function for approximation of . For , we define if , and if .
Theorem 1.
Let be a Boolean function with input i.i.d. according to the distribution in (1). Then the minimum mismatch probability of a $k$-junta approximation of (for ) is
[TABLE]
where is the projection defined in (8), and
[TABLE]
Furthermore, the minimum mismatch probability is achieved by the -junta approximation , where achieves the optimum in (19).
Proof:
We apply Lemma 1 taking to be a -junta, and , i.e., . From (14), for any the mismatch probability satisfies
[TABLE]
where are the -biased Fourier coefficients of and , respectively. Suppose that depends on the inputs , where is a subset of with at most elements. Then, for any . Hence, the mismatch probability in (21) equals
[TABLE]
The last equality in (22) holds because is a Boolean function, hence . Since is an arbitrary subset of with at most elements, (22) implies
[TABLE]
Next we obtain an upper bound on by specifying a -junta approximation of . Fix a subset with , and let . Note that for any we have
[TABLE]
Therefore, using (22), the mismatch probability of this approximation is
[TABLE]
Eq. (25) provides an upper bound on for any such that . Taking , where achieves , we obtain
[TABLE]
Combining (26) and (23) completes the proof. ∎
Remark 1.
The proof shows that for any , the mismatch probability between and is given by (25). The function is the maximum a posteriori probability (MAP) estimator of given . To see this, note that the MAP estimator of given is a Boolean function such that if
[TABLE]
and otherwise. Since is a Boolean function, by the definition of , we have
[TABLE]
Hence, equals the MAP estimator of .
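The MAP interpretation suggests a direct brute-force procedure for small $n$: for each candidate subset, take the sign of the conditional mean of the function given the inputs in that subset. The sketch below does this and cross-checks the result against an exhaustive search over all juntas on that many variables. It assumes inputs in $\{-1,+1\}$ with $P(X_i = +1) = p$, breaks sign ties toward $+1$, and uses hypothetical helper names; all of these are illustrative choices. Only subsets of exactly $k$ variables are scanned, since allowing the approximation to depend on more variables cannot increase the mismatch.

```python
import itertools
import math

def _measure(n, p):
    pts = list(itertools.product([-1, 1], repeat=n))
    prob = {x: math.prod(p if xi == 1 else 1 - p for xi in x) for x in pts}
    return pts, prob

def best_junta(f, n, p, k):
    """Return (mismatch, J, table), where table maps x_J to sign(E[f(X) | X_J = x_J])."""
    pts, prob = _measure(n, p)
    best = None
    for J in itertools.combinations(range(n), k):
        # cell[z] accumulates E[f(X) 1{X_J = z}] for each restriction z.
        cell = {}
        for x in pts:
            z = tuple(x[i] for i in J)
            cell[z] = cell.get(z, 0.0) + prob[x] * f(x)
        # E|E[f(X) | X_J]| equals the sum over z of |E[f(X) 1{X_J = z}]|.
        l1 = sum(abs(v) for v in cell.values())
        mismatch = 0.5 - 0.5 * l1
        if best is None or mismatch < best[0]:
            table = {z: (1 if v >= 0 else -1) for z, v in cell.items()}
            best = (mismatch, J, table)
    return best

def brute_force(f, n, p, k):
    """Minimum mismatch over every Boolean function of every k-subset of inputs."""
    pts, prob = _measure(n, p)
    cells = list(itertools.product([-1, 1], repeat=k))
    best = 1.0
    for J in itertools.combinations(range(n), k):
        for values in itertools.product([-1, 1], repeat=len(cells)):
            h = dict(zip(cells, values))
            m = sum(prob[x] for x in pts if f(x) != h[tuple(x[i] for i in J)])
            best = min(best, m)
    return best

def majority(x):
    return 1 if sum(x) > 0 else -1

mismatch, J, table = best_junta(majority, 3, 0.3, 2)
check = brute_force(majority, 3, 0.3, 2)
```

The exhaustive search grows doubly exponentially in $k$, so it serves only as a check; the sign-of-projection rule needs one pass over the inputs per subset.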
Remark 2.
Eq. (25) shows that the mismatch probability for approximating with is determined by . We can bound the mismatch probability from above and below in terms of , which depends only on the weight of the $p$-biased Fourier coefficients of .
Corollary 1.
With the assumptions of Theorem 1, the minimum mismatch probability satisfies
[TABLE]
where
[TABLE]
Proof:
Since , we have . Thus , which yields the upper bound by substituting in (19). Next, from Jensen’s inequality we have
[TABLE]
This implies that , which establishes the lower bound. ∎
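The two inequalities used in this proof can be checked numerically for a fixed subset $J$ of inputs. Writing $f^{\subseteq J}$ for the projection onto $J$ (notation assumed here), the sketch below compares the exact mismatch $\frac{1}{2}(1 - E|f^{\subseteq J}|)$ of the sign-of-projection approximation with the quantities $\frac{1}{2}(1 - \|f^{\subseteq J}\|_2)$ and $\frac{1}{2}(1 - \|f^{\subseteq J}\|_2^2)$. That the bounds take exactly this form is an assumption based on the two steps of the proof, and the input convention $P(X_i = +1) = p$ and the function names are likewise illustrative.

```python
import itertools
import math

def junta_bounds(f, n, p, J):
    """Return (lower, exact, upper) for approximating f using only the inputs in J."""
    pts = list(itertools.product([-1, 1], repeat=n))
    prob = {x: math.prod(p if xi == 1 else 1 - p for xi in x) for x in pts}
    num, mass = {}, {}
    for x in pts:
        z = tuple(x[i] for i in J)
        num[z] = num.get(z, 0.0) + prob[x] * f(x)   # E[f(X) 1{X_J = z}]
        mass[z] = mass.get(z, 0.0) + prob[x]        # P(X_J = z)
    l1 = sum(abs(v) for v in num.values())                 # first absolute moment of the projection
    l2_sq = sum(num[z] ** 2 / mass[z] for z in num)        # second moment of the projection
    lower = 0.5 * (1 - math.sqrt(l2_sq))   # from Jensen's inequality
    exact = 0.5 * (1 - l1)                 # mismatch of the sign-of-projection rule
    upper = 0.5 * (1 - l2_sq)              # from |projection| <= 1
    return lower, exact, upper

def majority(x):
    return 1 if sum(x) > 0 else -1

lower, exact, upper = junta_bounds(majority, 3, 0.3, (0, 1))
```

The second moment of the projection equals the total squared weight of the biased Fourier coefficients on subsets of $J$, so both bounds depend on the function only through that weight.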
Given , Theorem 1 specifies the optimal -junta approximation for . The problem may be viewed from another perspective: given , find the smallest such that there exists a -junta function whose mismatch probability with is at most . When depends on all input variables, there is a trade-off between and : the lower the tolerance , the larger the required value of . As discussed in Section VI, this formulation can be useful in the context of learning arbitrary Boolean functions to within a specified mismatch probability.
Examples: We examine $k$-junta approximations of the ‘or’ function , and the majority function . The function is defined as if , and otherwise. The majority function is defined as for all . Figure 1 shows the minimum mismatch probability as a function of for the approximation of and using $k$-juntas (i.e., and ). The bounds given in Corollary 1 are also plotted.
Using the symmetry between the inputs, we can show that
[TABLE]
For , the optimal approximation is therefore the constant function (for ). For , the projection does not have a compact closed form expression and is computed as
[TABLE]
V Approximation with linear Boolean functions
A linear Boolean function is either a parity or a negation of a parity. More precisely, a Boolean function is linear if it is of the form for some subset and constant .
Given a Boolean function , we wish to find a linear Boolean function that minimizes the mismatch probability . Let denote the set of linear Boolean functions with input variables. The minimum mismatch probability is denoted by
[TABLE]
where the entries of are i.i.d. according to (1).
For any Boolean function and , let
[TABLE]
where are the mean and standard deviation of the , defined in (5).
Theorem 2.
Let be a Boolean function with input i.i.d. according to the distribution in (1). Then the linear Boolean function minimizes the mismatch probability where
[TABLE]
The minimum mismatch probability is
Proof:
We apply Lemma 1 with (i.e., ), and a linear Boolean function. From (13)–(14), we have
[TABLE]
Since is linear Boolean, , for some , and . Thus
[TABLE]
Substituting in (32), we deduce
[TABLE]
The mismatch probability in (34) is minimized by taking and , where , and . ∎
For uniformly random inputs (), we have , which implies . The optimal linear approximation can be succinctly characterized in this case.
Corollary 2.
If the inputs are uniformly random, then the mismatch probability with is minimized by the linear Boolean function with
[TABLE]
Here is the standard Fourier coefficient for the set .
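For small $n$, the optimal linear approximation can also be found by direct search, which gives a simple way to see how the optimum changes with the input bias. The sketch below assumes inputs in $\{-1,+1\}$ with $P(X_i = +1) = p$ and writes a linear Boolean function as $c \prod_{i \in S} x_i$ with $c \in \{-1,+1\}$. It uses the fact that for $\pm 1$-valued functions the mismatch probability equals $\frac{1}{2}(1 - E[f(\mathbf{X}) g(\mathbf{X})])$. The convention and the function names are assumptions made for illustration.

```python
import itertools
import math

def best_linear(f, n, p):
    """Return (mismatch, S, c) for the best approximation c * prod_{i in S} x_i."""
    pts = list(itertools.product([-1, 1], repeat=n))
    prob = {x: math.prod(p if xi == 1 else 1 - p for xi in x) for x in pts}
    best = None
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            # Correlation of f with the parity on S under the biased measure.
            corr = sum(prob[x] * f(x) * math.prod(x[i] for i in S) for x in pts)
            c = 1 if corr >= 0 else -1
            mismatch = 0.5 - 0.5 * abs(corr)
            # Keep the first subset found among (numerical) ties.
            if best is None or mismatch < best[0] - 1e-12:
                best = (mismatch, S, c)
    return best

def majority(x):
    return 1 if sum(x) > 0 else -1

uniform_best = best_linear(majority, 3, 0.5)
biased_best = best_linear(majority, 3, 0.1)
```

With uniform inputs the correlation with each parity is the standard Fourier coefficient, so the search returns a subset with the largest coefficient magnitude; for the 3-bit majority this is a single input. With $p = 0.1$ under the assumed convention, the search returns the constant function $-1$, which illustrates how a strongly biased input distribution can make a constant the best linear approximation.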
Figure 2 shows for and as a function of . The optimal linear approximation for is found to be a degree 5 linear function for , and the constant function for other values of . For , the optimal linear approximation is a degree function for , the constant function for , and the constant for . (The end points of these intervals are accurate up to 3 decimal places.)
VI Discussion and Future Work
An interesting open question is whether we can efficiently learn the optimal approximation of an unknown function, using a small (polynomial in ) number of samples from the function. These samples may be generated from either uniformly distributed or biased inputs. For example, we may wish to learn the optimal $k$-junta approximation of a function, where is large enough to achieve a desired mismatch probability. It is known that any $k$-junta can be learned with high probability with complexity of order , where [3]. However, this result is for the setting where the learning algorithm uses examples from the $k$-junta function. The question of how to efficiently learn the optimal $k$-junta approximation using examples from the original function is open. Similar questions may be posed for other useful classes of approximating functions such as linear threshold functions.
References
- [1] R. O’Donnell, Analysis of Boolean functions. Cambridge University Press, 2014.
- [2] N. Linial, Y. Mansour, and N. Nisan, “Constant depth circuits, Fourier transform, and learnability,” J. ACM, vol. 40, no. 3, pp. 607–620, 1993.
- [3] E. Mossel, R. O’Donnell, and R. A. Servedio, “Learning functions of $k$ relevant variables,” J. Comput. Syst. Sci., vol. 69, no. 3, pp. 421–434, 2004.
- [4] G. Kalai, “Noise sensitivity and chaos in social choice theory,” tech. rep., Hebrew University, 2005.
- [5] J. Li and M. Médard, “Boolean functions: Noise stability, non-interactive correlation, and mutual information,” in Proc. IEEE ISIT, 2018.
- [6] E. Blais, R. O’Donnell, and K. Wimmer, “Polynomial regression under arbitrary product distributions,” Machine Learning, vol. 80, no. 2-3, pp. 273–294, 2010.
- [7] T. A. Courtade and G. R. Kumar, “Which Boolean functions maximize mutual information on noisy inputs?,” IEEE Trans. Inf. Theory, vol. 60, no. 8, pp. 4515–4525, 2014.
- [8] N. Weinberger and O. Shayevitz, “On the optimal Boolean function for prediction under quadratic loss,” IEEE Trans. Inf. Theory, vol. 63, no. 7, pp. 4202–4217, 2017.
