Almost Optimal Distribution-free Junta Testing

Nader H. Bshouty

arXiv:1901.00717·cs.DS·June 9, 2020


TL;DR

This paper gives a simple two-sided error adaptive algorithm for distribution-free $k$-junta testing that makes $\tilde{O}(k/\epsilon)$ queries, improving on the previous $\tilde{O}(k^{2})/\epsilon$ upper bound.

Contribution

It presents a new two-sided error adaptive algorithm for distribution-free $k$-junta testing with nearly optimal query complexity $\tilde{O}(k/\epsilon)$, improving on the one-sided $\tilde{O}(k^{2})/\epsilon$ algorithm of Chen, Liu, Servedio, Sheng and Xie.

Findings

01

Reduces query complexity from $\tilde{O}(k^{2})/\epsilon$ to $\tilde{O}(k/\epsilon)$.

02

Provides a simpler and more efficient testing algorithm.

03

Achieves near-optimal performance in distribution-free junta testing.

Abstract

We consider the problem of testing whether an unknown $n$-variable Boolean function is a $k$-junta in the distribution-free property testing model, where the distance between functions is measured with respect to an arbitrary and unknown probability distribution over $\{0,1\}^{n}$. Chen, Liu, Servedio, Sheng and Xie showed that distribution-free $k$-junta testing can be performed, with one-sided error, by an adaptive algorithm that makes $\tilde{O}(k^{2})/\epsilon$ queries. In this paper, we give a simple two-sided error adaptive algorithm that makes $\tilde{O}(k/\epsilon)$ queries.

Equations

$${\bf Pr}_{x\in{\cal D},y\in U}[f(x)\not=f(x_{J}\circ y_{\bar{J}})]\geq\epsilon.$$

$${\bf Pr}_{y\in U,x\in{\cal D}}[f(x_{X}\circ 0_{\overline{X}})\not=f(x_{J}\circ y_{X\backslash J}\circ 0_{\overline{X}})]\geq\epsilon/2.$$

$${\bf Pr}_{z\in U,x\in{\cal D}}[f(x_{X}\circ 0_{\overline{X}})\not=f((x_{X}+z_{X})\circ 0_{\overline{X}})\ |\ z_{J}=0_{J}]\geq\epsilon/2.$$

$${\bf Pr}_{u\in{\cal D}}[f(u_{X}\circ 0_{\overline{X}})\not=f(u)]\geq\epsilon/2$$

$$k\left(1-\frac{\epsilon}{2}\right)^{2\ln(15k)/\epsilon}\leq ke^{-\ln(15k)}=\frac{1}{15}.$$

$${\bf Pr}[\mbox{The algorithm does not reject}]$$

$${\bf E}_{x_{X_{\ell}}\in U}[Z(x_{X_{\ell}})]\leq\frac{1}{30}.$$

$${\bf E}_{x_{Y_{\ell,1}}\in U}{\bf E}_{x_{Y_{\ell,0}}\in U}[Z(x_{Y_{\ell,0}}\circ x_{Y_{\ell,1}})]\leq\frac{1}{30}$$

$${\bf Pr}_{x_{Y_{\ell,1}}\in U}\left[{\bf E}_{x_{Y_{\ell,0}}\in U}[Z(x_{Y_{\ell,0}}\circ x_{Y_{\ell,1}})]\geq\frac{2}{15}\right]\leq\frac{1}{4}.$$

$${\bf Pr}[V(b)\ |\ A]$$

$${\bf Pr}[z_{\tau(\ell)}=1]$$

$$|I|\leq k.$$

$${\bf Pr}_{u\in{\cal D}}[f(u_{X}\circ 0_{\overline{X}})\not=f(u)]\leq\epsilon/2.$$

$${\bf Pr}_{u\in{\cal D},y\in U}[f(u_{X}\circ 0_{\overline{X}})=f(u_{I}\circ y_{X\backslash I}\circ 0_{\overline{X}})]\leq 1-\frac{\epsilon}{2}.$$

$${\bf Pr}_{u\in{\cal D},z_{X\backslash I}^{(i)}\in U}[(\forall i)\ f(u_{X}\circ 0_{\overline{X}})$$

$$\tilde{O}\left(\frac{k}{\epsilon}\right).$$

$$O\left(\frac{k}{\epsilon}\ln\frac{k}{\epsilon}\right).$$


Full text

Almost Optimal Distribution-free Junta Testing

**Nader H. Bshouty**

Dept. of Computer Science

Technion, Haifa, 32000

Abstract

We consider the problem of testing whether an unknown $n$ -variable Boolean function is a $k$ -junta in the distribution-free property testing model, where the distance between functions is measured with respect to an arbitrary and unknown probability distribution over $\{0,1\}^{n}$ . Chen, Liu, Servedio, Sheng and Xie [36] showed that the distribution-free $k$ -junta testing can be performed, with one-sided error, by an adaptive algorithm that makes $\tilde{O}(k^{2})/\epsilon$ queries. In this paper, we give a simple two-sided error adaptive algorithm that makes $\tilde{O}(k/\epsilon)$ queries.

1 Introduction

Property testing of Boolean functions was first considered in the seminal works of Blum, Luby and Rubinfeld [11] and Rubinfeld and Sudan [43] and has recently become a very active research area. See, for example, [1, 2, 3, 4, 7, 8, 13, 14, 15, 16, 18, 19, 22, 24, 28, 30, 33, 34, 38, 37, 40, 44] and other works referenced in the surveys [27, 41, 42].

A function $f:\{0,1\}^{n}\to\{0,1\}$ is said to be a $k$-junta if it depends on at most $k$ variables. Juntas have been of particular interest to the computational learning theory community [9, 10, 12, 31, 35, 39]. A problem closely related to learning juntas is the problem of testing juntas: given black-box query access to a Boolean function $f$, distinguish, with high probability, the case that $f$ is a $k$-junta from the case that $f$ is $\epsilon$-far from every $k$-junta.

In the uniform distribution framework, where the distance between two functions is measured with respect to the uniform distribution, Fischer et al. [24] introduced the junta testing problem and gave adaptive and non-adaptive algorithms that make $poly(k)/\epsilon$ queries. Blais in [5] gave a non-adaptive algorithm that makes $\tilde{O}(k^{3/2})/\epsilon$ queries and in [6] an adaptive algorithm that makes $O(k\log k+k/\epsilon)$ queries. On the lower bounds side, Fischer et al. [24] gave an $\Omega(\sqrt{k})$ lower bound. Chockler and Gutfreund [21] gave an $\Omega(k)$ lower bound for adaptive testing and, recently, Sağlam in [44] improved this lower bound to $\Omega(k\log k)$. For non-adaptive testing, Chen et al. [17] gave the lower bound $\tilde{\Omega}(k^{3/2})/\epsilon$.

In distribution-free property testing [29], the distance between Boolean functions is measured with respect to an arbitrary and unknown distribution ${\cal D}$ over $\{0,1\}^{n}$. In this model, the testing algorithm is allowed (in addition to making black-box queries) to draw random $x\in\{0,1\}^{n}$ according to the distribution ${\cal D}$. This model is studied in [20, 23, 26, 32, 36]. For testing $k$-juntas in this model, Chen et al. [36] gave a one-sided adaptive algorithm that makes $\tilde{O}(k^{2})/\epsilon$ queries and proved a lower bound of $\Omega(2^{k/3})$ for any non-adaptive algorithm. The results of Halevy and Kushilevitz [32] give a one-sided non-adaptive algorithm that makes $O(2^{k}/\epsilon)$ queries. The adaptive $\Omega(k\log k)$ uniform-distribution lower bound from [44] trivially extends to the distribution-free model.

In this paper, we close the gap between the adaptive lower and upper bound. We prove

Theorem 1.

For any $\epsilon>0$ , there is a two-sided distribution-free adaptive algorithm for $\epsilon$ -testing $k$ -junta that makes $\tilde{O}(k/\epsilon)$ queries.

Our exact upper bound is $O((k/\epsilon)\log(k/\epsilon))$ and therefore, by Sağlam's [44] lower bound of $\Omega(k\log k)$, our bound is tight for any constant $\epsilon$.

2 Preliminaries

In this section we give some notation, followed by a formal definition of the model and some known preliminary results.

2.1 Notations

We start with some notations. Denote $[n]=\{1,2,\ldots,n\}$ . For $S\subseteq[n]$ and $x=(x_{1},\ldots,x_{n})$ we write $x(S)=\{x_{i}|i\in S\}$ . For $X\subset[n]$ we denote by $\{0,1\}^{X}$ the set of all binary strings of length $|X|$ with coordinates indexed by $i\in X$ . For $x\in\{0,1\}^{n}$ and $X\subseteq[n]$ we write $x_{X}\in\{0,1\}^{X}$ to denote the projection of $x$ over coordinates in $X$ . We denote by $1_{X}$ and $0_{X}$ the all one and all zero strings in $\{0,1\}^{X}$ , respectively. When we write $x_{I}=0$ we mean $x_{I}=0_{I}$ . For $X_{1},X_{2}\subseteq[n]$ where $X_{1}\cap X_{2}=\emptyset$ and $x\in\{0,1\}^{X_{1}},y\in\{0,1\}^{X_{2}}$ we write $x\circ y$ to denote their concatenation, the string in $\{0,1\}^{X_{1}\cup X_{2}}$ that agrees with $x$ over coordinates in $X_{1}$ and agrees with $y$ over $X_{2}$ . For $X\subseteq[n]$ we denote $\overline{X}=[n]\backslash X$ . We say that the Boolean function $f:\{0,1\}^{n}\to\{0,1\}$ is a literal if $f\in\{x_{1},\ldots,x_{n},\bar{x_{1}},\ldots,\bar{x_{n}}\}$ .

Given $f,g:\{0,1\}^{n}\to\{0,1\}$ and a probability distribution ${\cal D}$ over $\{0,1\}^{n}$ , we say that $f$ is $\epsilon$ -close to $g$ with respect to ${\cal D}$ if ${\bf Pr}_{x\in{\cal D}}[f(x)\not=g(x)]\leq\epsilon$ , where $x\in{\cal D}$ means $x$ is chosen from $\{0,1\}^{n}$ according to the distribution ${\cal D}$ . We say that $f$ is $\epsilon$ -far from $g$ with respect to ${\cal D}$ if ${\bf Pr}_{x\in{\cal D}}[f(x)\not=g(x)]\geq\epsilon$ . We say that $f$ is $\epsilon$ -far from every $k$ -junta with respect to ${\cal D}$ if for every $k$ -junta $g$ , $f$ is $\epsilon$ -far from $g$ with respect to ${\cal D}$ . We will use $U$ to denote the uniform distribution over $\{0,1\}^{n}$ .
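As a small illustration of these definitions, the following sketch computes ${\bf Pr}_{x\in{\cal D}}[f(x)\not=g(x)]$ exactly when the distribution is given explicitly as a dictionary. The helper names `distance` and `uniform`, the 0-based coordinates, and the two example functions are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product


def distance(f, g, dist):
    """Pr_{x in D}[f(x) != g(x)] for a distribution given as {point: probability}."""
    return sum(p for x, p in dist.items() if f(x) != g(x))


def uniform(n):
    """The uniform distribution U over {0,1}^n as an explicit dictionary."""
    points = list(product((0, 1), repeat=n))
    return {x: 1.0 / len(points) for x in points}


f = lambda x: x[0] ^ x[1]   # parity of the first two coordinates (a 2-junta)
g = lambda x: x[0]          # a 1-junta

# Under U, f and g disagree exactly when x[1] == 1.
d_uniform = distance(f, g, uniform(3))

# A hypothetical D supported only on points with x[1] == 0, where f and g agree.
d_skewed = distance(f, g, {(0, 0, 0): 0.5, (1, 0, 1): 0.5})
```

The same pair of functions is at distance $1/2$ under $U$ and at distance $0$ under the skewed distribution, which is why closeness has to be stated relative to ${\cal D}$.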

2.2 The Model

In this subsection, we define the model.

We consider the problem of testing juntas in the distribution-free testing model. In this model, the algorithm has access to a Boolean function $f$ via a black box that returns $f(x)$ when a string $x$ is queried, and access to an unknown distribution ${\cal D}$ via an oracle that returns $x\in\{0,1\}^{n}$ chosen randomly according to the distribution ${\cal D}$.

A distribution-free testing algorithm ${\cal A}$ is an algorithm that, given as input a distance parameter $\epsilon$ and the above two oracles, satisfies the following:

1. if $f$ is a $k$-junta then ${\cal A}$ outputs “accept” with probability at least $2/3$;

2. if $f$ is $\epsilon$-far from every $k$-junta with respect to the distribution ${\cal D}$ then ${\cal A}$ outputs “reject” with probability at least $2/3$.

We say that ${\cal A}$ is one-sided if it always accepts when $f$ is a $k$-junta; otherwise, it is called a two-sided algorithm. The query complexity of a distribution-free testing algorithm is the number of queries made on $f$.
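The two oracles of the model can be mocked up as follows. This is only a sketch for experimentation: the class name `Oracles`, its method names, and the representation of ${\cal D}$ as a finite list of points with weights are all assumptions, not part of the paper. The counter reflects the definition above, where only calls to the black box for $f$ count toward the query complexity.

```python
import random


class Oracles:
    """Black-box access to f plus sample access to D, with a query counter."""

    def __init__(self, f, points, weights, seed=0):
        self._f = f
        self._points = points      # support of D (hypothetical finite representation)
        self._weights = weights    # probability weight of each support point
        self._rng = random.Random(seed)
        self.queries = 0           # number of black-box queries made on f

    def query(self, x):
        """Return f(x); every call counts toward the query complexity."""
        self.queries += 1
        return self._f(x)

    def sample(self):
        """Draw x according to D; drawing a sample does not query f."""
        return self._rng.choices(self._points, weights=self._weights, k=1)[0]


oracles = Oracles(lambda x: x[0], [(0, 1), (1, 0)], [0.25, 0.75], seed=1)
x = oracles.sample()
value = oracles.query(x)
```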

2.3 Preliminary Results

In this section, we give some known results that will be used in the sequel.

For a Boolean function $f$ and $X\subset[n]$, we say that $X$ is a relevant set of $f$ if there are $a,b\in\{0,1\}^{n}$ such that $f(a)\not=f(b_{X}\circ a_{\overline{X}})$. When $X=\{i\}$ we say that $x_{i}$ is a relevant variable of $f$. Obviously, if $X$ is a relevant set of $f$ then $x(X)$ contains at least one relevant variable of $f$. In particular, we have

Lemma 2.

If $\{X_{i}\}_{i\in[r]}$ is a partition of $[n]$ then for any Boolean function $f$ the number of relevant sets $X_{i}$ of $f$ is at most the number of relevant variables of $f$ .
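For very small $n$, the definition of a relevant set and the statement of Lemma 2 can be checked by brute force. The sketch below is illustrative only: the function name `is_relevant_set`, the 0-based indices, and the example function and partition are assumptions.

```python
from itertools import product


def is_relevant_set(f, n, X):
    """True if some a, b satisfy f(a) != f(b on X, a elsewhere); brute force over {0,1}^n."""
    for a in product((0, 1), repeat=n):
        for b in product((0, 1), repeat=n):
            mixed = tuple(b[i] if i in X else a[i] for i in range(n))
            if f(a) != f(mixed):
                return True
    return False


n = 4
f = lambda x: x[0] & x[3]          # depends only on coordinates 0 and 3

relevant_vars = [i for i in range(n) if is_relevant_set(f, n, {i})]
partition = [{0, 3}, {1}, {2}]
relevant_sets = [X for X in partition if is_relevant_set(f, n, X)]
```

Here the two relevant variables fall in the same block, so the partition has one relevant set and two relevant variables, consistent with Lemma 2.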

We will use the following folklore result that is formally proved in [36].

Lemma 3.

Let $\{X_{i}\}_{i\in[r]}$ be a partition of $[n]$ . Let $f$ be a Boolean function and $u\in\{0,1\}^{n}$ . If $f(u)\not=f(0)$ then a relevant set $X_{\ell}$ of $f$ with a string $v\in\{0,1\}^{n}$ that satisfies $f(v)\not=f(0_{X_{\ell}}\circ v_{\overline{X_{\ell}}})$ can be found with $\lceil\log_{2}r\rceil$ queries.
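One standard way to obtain the guarantee of Lemma 3 is a binary search over hybrid strings that agree with $u$ on a prefix of the blocks and are zero elsewhere. The sketch below follows that idea; it is not claimed to be the exact procedure of [36]. The name `find_relevant_set`, the 0-based block index it returns, and the assumption that $f(0)$ is already known (passed as `f_zero`) are choices made for this illustration.

```python
def find_relevant_set(f, u, partition, f_zero):
    """Binary search over hybrids between the all-zero string and u.

    Assumes f(u) != f_zero, where f_zero = f(0...0) is already known.
    Returns (l, v) with f(v) != f(0 on block l, v elsewhere); l is 0-based.
    """
    n = len(u)

    def hybrid(j):
        # Agrees with u on the first j blocks and is 0 on all other coordinates.
        keep = set().union(*partition[:j])
        return tuple(u[i] if i in keep else 0 for i in range(n))

    lo, hi = 0, len(partition)   # invariant: f(hybrid(lo)) == f_zero != f(hybrid(hi))
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if f(hybrid(mid)) != f_zero:
            hi = mid
        else:
            lo = mid
    # hybrid(hi) and hybrid(lo) differ only inside block hi - 1, so that block is relevant.
    return hi - 1, hybrid(hi)


calls = []

def f(x):
    calls.append(x)              # record every black-box query
    return x[5]

partition = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
u = (1,) * 8
block, v = find_relevant_set(f, u, partition, f_zero=0)
queries_used = len(calls)
```

With $r=4$ blocks the search makes $\lceil\log_{2}4\rceil=2$ queries and returns the block containing coordinate 5.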

The following is from [6]

Lemma 4.

There exists a one-sided adaptive algorithm, UniformJunta $(f,k,\epsilon,\delta)$ , for $\epsilon$ -testing $k$ -junta that makes $O(((k/\epsilon)+k\log k)\log(1/\delta))$ queries and rejects $f$ with probability at least $1-\delta$ when it is $\epsilon$ -far from every $k$ -junta with respect to the uniform distribution.

The following is from [36].

Lemma 5.

Let ${\cal D}$ be any probability distribution over $\{0,1\}^{n}$ . If $f$ is $\epsilon$ -far from every $k$ -junta with respect to ${\cal D}$ then for any $J\subseteq[n]$ , $|J|\leq k$ we have

$${\bf Pr}_{x\in{\cal D},y\in U}[f(x)\not=f(x_{J}\circ y_{\bar{J}})]\geq\epsilon.$$

Proof.

Let $J\subseteq[n]$ of size $|J|\leq k$ . For every fixed $y\in\{0,1\}^{n}$ the function $f(x_{J}\circ y_{\bar{J}})$ is $k$ -junta and therefore ${\bf Pr}_{x\in{\cal D}}[f(x)\not=f(x_{J}\circ y_{\bar{J}})]\geq\epsilon.$ Therefore

$${\bf Pr}_{x\in{\cal D},y\in U}[f(x)\not=f(x_{J}\circ y_{\bar{J}})]\geq\epsilon.$$

∎

3 The Algorithm

In this section, we prove the correctness of the algorithm and show that it makes $\tilde{O}(k/\epsilon)$ queries. We first give an overview of the algorithm then prove its correctness and analyze its query complexity.

3.1 Overview of the Algorithm

In this subsection we give an overview of the algorithm. We will use the notation in Subsection 2.1 and the definitions and lemmas in Subsection 2.3.

Consider the algorithm in Figure 1. In steps 1-1, the algorithm uniformly at random partitions $[n]$ into $r=2k^{2}$ disjoint sets $X_{1},\ldots,X_{r}$ . Lemma 6 shows that,

Fact 1.

If the function is $k$ -junta then with high probability (w.h.p), each set of variables $x(X_{i})=\{x_{j}|j\in X_{i}\}$ contains at most one relevant variable.

In steps 1-LABEL:EndRep, the algorithm finds

Fact 2.

relevant sets $\{X_{i}\}_{i\in I}$ such that for $X=\cup_{i\in I}X_{i}$ , w.h.p., the function $f(x_{X}\circ 0_{\overline{X}})$ is $\epsilon/2$ -close to $f$ with respect to ${\cal D}$ .

To find such a set, the algorithm, after finding relevant sets $\{X_{i}\}_{i\in I^{\prime}}$, chooses a random string $u\in{\cal D}$ and tests whether $f(u_{X^{\prime}}\circ 0_{\overline{X^{\prime}}})\not=f(u)$, where $X^{\prime}=\cup_{i\in I^{\prime}}X_{i}$. The variable $t(X^{\prime})$ counts for how many random strings $u\in{\cal D}$ we get $f(u_{X^{\prime}}\circ 0_{\overline{X^{\prime}}})=f(u)$. If $t(X^{\prime})$ reaches the value $O((\log k)/\epsilon)$ then, w.h.p., $f(x_{X^{\prime}}\circ 0_{\overline{X^{\prime}}})$ is $\epsilon/2$-close to $f$ with respect to ${\cal D}$ and $X=X^{\prime}$. Otherwise, $f(u_{X^{\prime}}\circ 0_{\overline{X^{\prime}}})\not=f(u)$ and, using Lemma 3, the algorithm finds a new relevant set $X_{\ell}$. This is proved in Lemma 10.

In addition, for each relevant set $X_{\ell}$ , $\ell\in I$ , it finds a string $v^{(\ell)}$ that satisfies $f(v^{(\ell)})\not=f(0_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ . Obviously, if $|I|>k$ then, since each relevant set contains at least one relevant variable, the target is not $k$ -junta and the algorithm rejects. See Lemma 2.

Now one of the key ideas is the following: If $f$ is $k$ -junta then $f(x_{X}\circ 0_{\overline{X}})$ is $k$ -junta. If $f$ is $\epsilon$ -far from every $k$ -junta with respect to ${\cal D}$ then since, by Fact 2, w.h.p., $f(x_{X}\circ 0_{\overline{X}})$ is $\epsilon/2$ -close to $f$ with respect to ${\cal D}$ we have that,

Fact 3.

If $f$ is $\epsilon$ -far from every $k$ -junta with respect to ${\cal D}$ then, w.h.p., $f(x_{X}\circ 0_{\overline{X}})$ is $\epsilon/2$ -far from every $k$ -junta with respect to ${\cal D}$ .

Now, since each $X_{\ell}$ , $\ell\in I$ is relevant set and $f(v^{(\ell)})\not=f(0_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ , for $\ell\in I$ the function $f(x_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ is non-constant. In steps 1-LABEL:ConE, the algorithm tests that,

Fact 4.

w.h.p., for each $\ell\in I$ there is $\tau(\ell)\in X_{\ell}$ such that $f(x_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ is close to some literal in $\{x_{\tau(\ell)},\overline{x_{\tau(\ell)}}\}$ , with respect to the uniform distribution.

This is done using the procedure UniformJunta in Lemma 4.

If $f$ is $k$ -junta then, by Fact 1 and 2, w.h.p., it passes this test (does not output reject). This is Lemma 7. If the algorithm does not pass this test, it rejects. If $f$ is not $k$ -junta and it passes this test, then the statement in Fact 4 is true. This is proved in Lemma 11.

Consider now steps 1-LABEL:Finn. First, let us consider a function $f$ that is $\epsilon$ -far from every $k$ -junta with respect to ${\cal D}$ . Let $J=\{\tau(\ell)\ |\ \ell\in I\}$ where $\tau(\ell)$ is as defined in Fact 4. Since by Fact 3, w.h.p., $f(x_{X}\circ 0_{\overline{X}})$ is $\epsilon/2$ -far from every $k$ -junta with respect to ${\cal D}$ and $|J|=|I|\leq k$ , by Lemma 5, w.h.p.,

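$${\bf Pr}_{y\in U,x\in{\cal D}}[f(x_{X}\circ 0_{\overline{X}})\not=f(x_{J}\circ y_{X\backslash J}\circ 0_{\overline{X}})]\geq\epsilon/2.$$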

So we need to test whether $f(x_{X}\circ 0_{\overline{X}})$ is $\epsilon/2$ -far from $f(x_{J}\circ y_{X\backslash J}\circ 0_{\overline{X}})$ (those are equal in the case when $f$ is $k$ -Junta). This is the last test we would like to do but the problem is that we do not know $J$ , so we cannot use this test as is. So we change it, as is done in [36], to an equivalent test as follows

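$${\bf Pr}_{z\in U,x\in{\cal D}}[f(x_{X}\circ 0_{\overline{X}})\not=f((x_{X}+z_{X})\circ 0_{\overline{X}})\ |\ z_{J}=0_{J}]\geq\epsilon/2.$$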

To be able to draw uniformly random $z_{X}$ with $z_{J}=0_{J}$ , we use Fact 4, that is, the fact that each $f(x_{X_{\ell}}\circ v_{\overline{X_{\ell}}}^{(\ell)})$ is close to one of the literals in $\{x_{\tau(\ell)},\overline{x_{\tau(\ell)}}\}$ . For every $\ell\in I$ , the algorithm draws uniformly random $w:=z_{X_{\ell}}$ and then using the fact that $f(x_{X_{\ell}}\circ v_{\overline{X_{\ell}}}^{(\ell)})$ is close to one of the literals in $\{x_{\tau(\ell)},\overline{x_{\tau(\ell)}}\}$ where $\tau(\ell)\in X_{\ell}$ the algorithm tests in which set $Y_{\ell,0}:=\{j\in X_{\ell}|\ w_{j}=0\}$ or $Y_{\ell,1}:=\{j\in X_{\ell}|\ w_{j}=1\}$ the index $\tau(\ell)$ falls. If $\tau(\ell)\in Y_{\ell,0}$ then the entry $\tau(\ell)$ in $z_{X_{\ell}}$ is zero and if $\tau(\ell)\in Y_{\ell,1}$ then the entry $\tau(\ell)$ in $z_{X_{\ell}}$ is one. In the latter case, the algorithm replaces $z_{X_{\ell}}$ with $\overline{z_{X_{\ell}}}$ (negation of each entry in $z_{X_{\ell}}$ ) which is also uniformly random. This gives a random uniform $z_{X_{\ell}}$ with $z_{\tau(\ell)}=0$ . We do that for every $\ell\in I$ and get a random uniform $z$ with $z_{J}=0$ . This is proved in Lemma 12. Then the algorithm rejects if $f(x_{X}\circ 0_{\overline{X}})\not=f((x_{X}+z_{X})\circ 0_{\overline{X}})$ . If $f(x_{X}\circ 0_{\overline{X}})$ is $\epsilon/2$ -far from every $k$ -junta then, by Lemma 5, $f(x_{X}\circ 0_{\overline{X}})$ is $\epsilon/2$ -far from $f(x_{J}\circ y_{X\backslash J}\circ 0_{\overline{X}})$ , and the algorithm, with one test, rejects with probability at least $\epsilon/2$ . Therefore, by repeating this test $O(1/\epsilon)$ times the algorithm rejects w.h.p. This is proved in Lemma 13.
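The per-block step described above, drawing $w$, deciding whether $\tau(\ell)$ lies in $Y_{\ell,0}$ or $Y_{\ell,1}$, and negating $w$ if needed, can be sketched as follows. This is an illustration under assumptions rather than a transcription of Figure 1: `g` stands for the restricted function $f(x_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ and takes a dictionary from indices in the block to bits, the counters `g0` and `g1` play the role of $G_{\ell,0}$ and $G_{\ell,1}$, the same string $b$ is reused for both comparisons, and `None` stands for "reject".

```python
import random


def draw_z_on_block(g, block, h, rng):
    """Return uniform z on the block with z at the hidden literal index equal to 0.

    Returns None when the counters are not {0, h}, which stands for "reject".
    """
    w = {j: rng.randrange(2) for j in block}
    y0 = {j for j in block if w[j] == 0}
    y1 = set(block) - y0

    def flip(b, ys):
        # Negate the entries of b indexed by ys and keep the others.
        return {j: (1 - b[j]) if j in ys else b[j] for j in block}

    g0 = g1 = 0
    for _ in range(h):
        b = {j: rng.randrange(2) for j in block}
        g0 += g(b) != g(flip(b, y0))   # does negating Y_0 change the value?
        g1 += g(b) != g(flip(b, y1))   # does negating Y_1 change the value?

    if {g0, g1} != {0, h}:
        return None
    if g0 == h:
        return w                        # the literal index is in Y_0, so w is 0 there
    return {j: 1 - w[j] for j in block} # the literal index is in Y_1, so negate w


rng = random.Random(0)
block = [3, 5, 7, 9]
z = draw_z_on_block(lambda x: x[7], block, 5, rng)
```

When `g` is exactly a literal in $x_{7}$, negating the half that contains index 7 changes the value on every trial and negating the other half never does, so the returned string always has a zero at index 7.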

Now we consider $f$ that is $k$ -junta. Obviously, if $f$ is $k$ -junta then $f(x_{X}\circ 0_{\overline{X}})=f((x_{X}+z_{X})\circ 0_{\overline{X}})$ when $z_{J}=0$ and the algorithm accepts. This is because $x(J)$ are the relevant variables in $f(x_{X}\circ 0_{\overline{X}})$ . This is proved in Lemma 8.

3.2 The algorithm for $k$ -Junta

In this subsection, we show that if the target function $f$ is $k$ -junta then the algorithm accepts with probability at least $2/3$ .

We first prove

Lemma 6.

Consider steps 1-1 in the algorithm. If $f$ is a $k$ -junta then, with probability at least $2/3$ , for each $i\in[r]$ , the set $x(X_{i})=\{x_{j}|j\in X_{i}\}$ contains at most one relevant variable of $f$ .

Proof.

Let $x_{i_{1}}$ and $x_{i_{2}}$ be two relevant variables in $f$ . The probability that $x_{i_{1}}$ and $x_{i_{2}}$ are in the same set is equal to $1/r$ . By the union bound, it follows that the probability that some relevant variables $x_{i_{1}}$ and $x_{i_{2}}$ in $f$ are in the same set is at most ${k\choose 2}/r\leq 1/3$ . ∎
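The union bound in this proof can be checked numerically. The sketch below assumes that the random partition is produced by assigning each index independently and uniformly to one of $r=2k^{2}$ blocks, which matches the collision probability $1/r$ used in the proof; the function names and the choice of the first $k$ indices as the relevant variables are hypothetical.

```python
import random
from math import comb


def random_partition(n, r, rng):
    """Assign each index in range(n) independently and uniformly to one of r blocks."""
    blocks = [set() for _ in range(r)]
    for i in range(n):
        blocks[rng.randrange(r)].add(i)
    return blocks


def collision_bound(k):
    """The union bound C(k,2)/r from the proof, with r = 2k^2."""
    return comb(k, 2) / (2 * k * k)


def empirical_collision_rate(n, k, trials, seed=0):
    """Fraction of random partitions in which two relevant variables share a block."""
    rng = random.Random(seed)
    relevant = set(range(k))     # hypothetical choice: the first k variables are relevant
    bad = 0
    for _ in range(trials):
        blocks = random_partition(n, 2 * k * k, rng)
        if any(len(b & relevant) > 1 for b in blocks):
            bad += 1
    return bad / trials


bound = collision_bound(5)
rate = empirical_collision_rate(n=100, k=5, trials=2000)
```

Since ${k\choose 2}/(2k^{2})=(k-1)/(4k)<1/4$, the bound in the proof holds with room to spare.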

We now show that w.h.p. the algorithm reaches the final test in the algorithm

Lemma 7.

If $f$ is $k$ -junta and each $x(X_{i})$ contains at most one relevant variable of $f$ then

1. Each $x(X_{i})$ , $i\in I$ , contains exactly one relevant variable.

2. The algorithm reaches step 1

Proof.

By Lemma 3 and steps LABEL:con1-LABEL:Finddd, for $\ell\in I$ , $f(v^{(\ell)})\not=f(0_{X_{\ell}}\circ v_{\overline{X_{\ell}}}^{(\ell)})$ and therefore $x(X_{\ell})$ contains exactly one relevant variable. Thus, for every $\ell\in I$ , $f(x_{X_{\ell}}\circ v_{\overline{X_{\ell}}}^{(\ell)})$ is a literal.

If the algorithm does not reach step 1, then it either halts in step LABEL:Rej, LABEL:Rej2 or LABEL:ConE. If it halts in step LABEL:Rej then $|I|>k$ and therefore, by Lemma 2, $f$ contains more than $k$ relevant variables and then it is not $k$ -Junta. If it halts in step LABEL:Rej2 then, by Lemma 4, for some $X_{\ell}$ , $\ell\in I$ , $f(x_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ is not $1$ -Junta (literal or constant function) and therefore $X_{\ell}$ contains at least two relevant variables. If it halts in step LABEL:ConE, then $f(b_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})=f(\overline{b_{X_{\ell}}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ and then $f(x_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ is not a literal. In all cases we get a contradiction. ∎

We now give two Lemmas that show that, with probability at least $2/3$ , the algorithm accepts $k$ -junta.

Lemma 8.

If $f$ is $k$ -Junta and each $x(X_{i})$ contains at most one relevant variable of $f$ then the algorithm outputs “accept”.

Proof.

By Lemma 7, the algorithm reaches step 1. We now show that it reaches step 1. Now we need to show that the algorithm does not halt in step LABEL:GGG or LABEL:Finn.

Since $Y_{\ell,0},Y_{\ell,1}$ is a partition of $X_{\ell}$ , $\ell\in I$ and $X_{\ell}$ contains exactly one relevant variable in $x(X_{\ell})$ of $f$ , this variable is either in $x(Y_{\ell,0})$ or in $x(Y_{\ell,1})$ but not in both. Suppose w.l.o.g. it is in $x(Y_{\ell,0})$ and not in $x(Y_{\ell,1})$ . Then $f(x_{Y_{\ell,0}}\circ b_{Y_{\ell,1}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ is a literal and $f(x_{Y_{\ell,1}}\circ b_{Y_{\ell,0}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ is a constant function. This implies that for any $b$ , $f(b_{Y_{\ell,0}}\circ b_{Y_{\ell,1}}\circ v^{(\ell)}_{\overline{X_{\ell}}})\not=f(\overline{b_{Y_{\ell,0}}}\circ b_{Y_{\ell,1}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ and $f(b_{Y_{\ell,1}}\circ b_{Y_{\ell,0}}\circ v^{(\ell)}_{\overline{X_{\ell}}})=f(\overline{b_{Y_{\ell,1}}}\circ b_{Y_{\ell,0}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ . Therefore, $G_{\ell,0}=h$ and $G_{\ell,1}=0$ . Thus the algorithm does not halt in step LABEL:GGG.

Now for every $X_{\ell}$ , $\ell\in I$ , let $\tau(\ell)\in X_{\ell}$ be such that $f(x_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})\in\{x_{\tau(\ell)},\overline{x_{\tau(\ell)}}\}$ . If $\tau(\ell)\in Y_{\ell,0}$ then $G_{\ell,0}=h$ and then by step LABEL:Gl, $z_{\tau(\ell)}=w_{\tau(\ell)}=0$ . If $\tau(\ell)\in Y_{\ell,1}$ then $G_{\ell,1}=h$ and then $z_{\tau(\ell)}=\overline{w_{\tau(\ell)}}=0$ . Therefore for every relevant variable $x_{\tau(\ell)}$ in $\hat{f}=f(x_{X}\circ 0_{\overline{X}})$ we have $z_{\tau(\ell)}=0$ which implies that $f(u_{X}\circ 0_{\overline{X}})=f((u_{X}+z_{X})\circ 0_{\overline{X}})$ and therefore the algorithm does not halt in step LABEL:Finn. ∎

Lemma 9.

If $f$ is $k$ -Junta then the algorithm outputs “accept” with probability at least $2/3$ .

Proof.

The result follows from Lemma 6 and Lemma 8. ∎

3.3 The Algorithm for $\epsilon$ -Far Functions

In this subsection, we prove that if $f$ is $\epsilon$ -far from every $k$ -junta then the algorithm rejects with probability at least $2/3$ .

The first lemma shows that, w.h.p., $f(u_{X}\circ 0_{\overline{X}})$ is $\epsilon/2$ -close to $f$ .

Lemma 10.

If the algorithm reaches step 1 then $t(X)=2\ln(15k)/\epsilon$ and $|I|\leq k$ . If

$${\bf Pr}_{u\in{\cal D}}[f(u_{X}\circ 0_{\overline{X}})\not=f(u)]\geq\epsilon/2$$

then the algorithm reaches step 1 with probability at most $1/15$ .

Proof.

The algorithm does not reach step 1 if and only if it halts in step LABEL:Rej and then $|I|>k$ . The size of $I$ is increased by one each time the condition, $f(u_{X}\circ 0_{\overline{X}})\not=f(u)$ , in step LABEL:con1, is true. Therefore, if the algorithm reaches step 1 then the condition in step LABEL:con1 was true at most $k$ times and $|I|\leq k$ . Then steps LABEL:Find-LABEL:tx0 are executed at most $k$ times. Thus, $t()$ is updated to $0$ at most $k$ times. The loop LABEL:Cho-LABEL:EndRep is repeated $M$ times and $t()$ is updated to $0$ at most $k$ times and therefore there is $X$ for which $t(X)=M/k=2\ln(15k)/\epsilon$ . This implies that when the algorithm reaches step 1, we have $t(X)=2\ln(15k)/\epsilon$ .

The probability that the algorithm reaches step 1 with ${\bf Pr}_{u\in{\cal D}}[f(u_{X}\circ 0_{\overline{X}})\not=f(u)]>\epsilon/2$ is the probability that for one (of the at most $k$ ) $X^{\prime}$ , ${\bf Pr}_{u\in{\cal D}}[f(u_{X^{\prime}}\circ 0_{\overline{X^{\prime}}})\not=f(u)]>\epsilon/2$ and $t(X^{\prime})=2\ln(15k)/\epsilon$ . By the union bound, this probability is less than

$$k\left(1-\frac{\epsilon}{2}\right)^{2\ln(15k)/\epsilon}\leq ke^{-\ln(15k)}=\frac{1}{15}.$$

∎
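The bound used at the end of this proof follows from $(1-\epsilon/2)^{2/\epsilon}\leq e^{-1}$, and it can be sanity-checked numerically. The sketch below is only a check over a small grid of hypothetical values of $k$ and $\epsilon$; the function name `lemma10_bound` is an assumption.

```python
from math import log


def lemma10_bound(k, eps):
    """k * (1 - eps/2)^(2 ln(15k) / eps), the union bound over at most k candidate sets."""
    return k * (1 - eps / 2) ** (2 * log(15 * k) / eps)


grid = [(k, eps) for k in (1, 5, 100) for eps in (0.01, 0.1, 0.5, 1.0)]
values = {pair: lemma10_bound(*pair) for pair in grid}
worst = max(values.values())
```

On this grid the largest value stays below $1/15$, and it approaches $1/15$ as $\epsilon$ tends to zero.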

In the following lemma we show that, w.h.p, each $f(x_{X_{\ell}}\circ v_{\overline{X_{\ell}}}^{(\ell)})$ is close to a literal.

Lemma 11.

Consider steps 1-LABEL:Rej2. If for some $\ell\in I$ , $f(x_{X_{\ell}}\circ v_{\overline{X_{\ell}}}^{(\ell)})$ is $(1/30)$ -far from every literal with respect to the uniform distribution then, with probability at least $1-(2/15)$ , the algorithm rejects.

Proof.

If $f(x_{X_{\ell}}\circ v_{\overline{X_{\ell}}}^{(\ell)})$ is $(1/30)$ -far from every literal with respect to the uniform distribution then it is either (case 1) $(1/30)$ -far from every $1$ -Junta (literal or constant) or (case 2) $(1/30)$ -far from every literal and $(1/30)$ -close to a $0$ -Junta. In case 1, by Lemma 4, with probability at least $1-(1/15)$ , ${\bf UniformJunta}$ $(f(x_{X_{\ell}}\circ v_{\overline{X_{\ell}}}^{(\ell)}),1,1/30,1/15)$ $=$ “reject” and then the algorithm rejects. In case 2, if $f(x_{X_{\ell}}\circ v_{\overline{X_{\ell}}}^{(\ell)})$ is $(1/30)$ -close to some $0$ -Junta then it is either $(1/30)$ -close to $0$ or $(1/30)$ -close to $1$ . Suppose it is $(1/30)$ -close to $0$ . Let $b$ be a random uniform string generated in steps LABEL:ConB. Then $\overline{b}$ is random uniform and for $g(x)=f(x_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ we have

[TABLE]

By the union bound the result follows. ∎

In the next lemma we prove that, w.h.p, the string $z$ generated in steps LABEL:feld-LABEL:Gl satisfies $z_{J}=0$ where $x(J)$ are relevant variables of $f(u_{X}\circ 0_{\overline{X}})$ .

Lemma 12.

Consider steps LABEL:feld-LABEL:Gl. If for every $\ell\in I$ the function $f(x_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ is $(1/30)$ -close to a literal in $\{x_{\tau(\ell)},\bar{x}_{\tau(\ell)}\}$ with respect to the uniform distribution, where $\tau(\ell)\in X_{\ell}$ , and $\{G_{\ell,0},G_{\ell,1}\}=\{0,h\}$ then, with probability at least $1-k(3/4)^{h}$ , we have: For every $\ell\in I$ , $z_{\tau(\ell)}=0$ .

Proof.

Fix some $\ell$ . Suppose $f(x_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ is $(1/30)$ -close to $x_{\tau(\ell)}$ with respect to the uniform distribution. The case when it is $(1/30)$ -close to $\overline{{x}_{\tau(\ell)}}$ is similar. Since $X_{\ell}=Y_{\ell,0}\cup Y_{\ell,1}$ and $Y_{\ell,0}\cap Y_{\ell,1}=\emptyset$ we have that $\tau(\ell)\in Y_{\ell,0}$ or $\tau(\ell)\in Y_{\ell,1}$ , but not both. Suppose $\tau(\ell)\in Y_{\ell,0}$ . The case where $\tau(\ell)\in Y_{\ell,1}$ is similar. Define the random variable $Z(x_{X_{\ell}})=1$ if $f(x_{X_{\ell}}\circ v^{(\ell)}_{\overline{X_{\ell}}})\not=x_{\tau(\ell)}$ and $Z(x_{X_{\ell}})=0$ otherwise. Then

$${\bf E}_{x_{X_{\ell}}\in U}[Z(x_{X_{\ell}})]\leq\frac{1}{30}.$$

Therefore

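$${\bf E}_{x_{Y_{\ell,1}}\in U}{\bf E}_{x_{Y_{\ell,0}}\in U}[Z(x_{Y_{\ell,0}}\circ x_{Y_{\ell,1}})]\leq\frac{1}{30}$$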

and by Markov’s bound

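$${\bf Pr}_{x_{Y_{\ell,1}}\in U}\left[{\bf E}_{x_{Y_{\ell,0}}\in U}[Z(x_{Y_{\ell,0}}\circ x_{Y_{\ell,1}})]\geq\frac{2}{15}\right]\leq\frac{1}{4}.$$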

That is, for a random uniform string $b\in\{0,1\}^{n}$ , with probability at least $3/4$ , $f(x_{Y_{\ell,0}}\circ b_{Y_{\ell,1}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ is $(2/15)$ -close to $x_{\tau(\ell)}$ with respect to the uniform distribution. Now, given that $f(x_{Y_{\ell,0}}\circ b_{Y_{\ell,1}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ is $(2/15)$ -close to $x_{\tau(\ell)}$ with respect to the uniform distribution the probability that $G_{\ell,0}=0$ is the probability that $f(b_{Y_{\ell,0}}\circ b_{Y_{\ell,1}}\circ v^{(\ell)}_{\overline{X_{\ell}}})=f(\overline{b_{Y_{\ell,0}}}\circ b_{Y_{\ell,1}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ for $h$ random uniform strings $b\in\{0,1\}^{n}$ . Let $b^{(1)},\ldots,b^{(h)}$ be $h$ random uniform strings in $\{0,1\}^{n}$ , $V(b)$ be the event $f(b_{Y_{\ell,0}}\circ b_{Y_{\ell,1}}\circ v^{(\ell)}_{\overline{X_{\ell}}})=f(\overline{b_{Y_{\ell,0}}}\circ b_{Y_{\ell,1}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ and $A$ the event that $f(x_{Y_{\ell,0}}\circ b_{Y_{\ell,1}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ is $(2/15)$ -close to $x_{\tau(\ell)}$ with respect to the uniform distribution. Let $g(x_{Y_{\ell,0}})=f(x_{Y_{\ell,0}}\circ b_{Y_{\ell,1}}\circ v^{(\ell)}_{\overline{X_{\ell}}})$ . Then

[TABLE]

Since $\tau(\ell)\in Y_{\ell,0}$ , we have $w_{\tau(\ell)}=0$ . Therefore, by step LABEL:Gl and since $\tau(\ell)\in X_{\ell}$ ,

[TABLE]

Therefore, the probability that $z_{\tau(\ell)}=1$ for some $\ell\in I$ is at most $k(3/4)^{h}$ . ∎

We now show that, w.h.p., the algorithm rejects if $f$ is $\epsilon$-far from every $k$-junta.

Lemma 13.

If $f$ is $\epsilon$ -far from every $k$ -junta with respect to ${\cal D}$ then, with probability at least $2/3$ , the algorithm outputs “reject”.

Proof.

If the algorithm stops in step LABEL:Rej then we are done. Therefore we may assume that

$$|I|\leq k.$$

By Lemma 10, if ${\bf Pr}_{u\in{\cal D}}[f(u_{X}\circ 0_{\overline{X}})\not=f(u)]\geq\epsilon/2$ then, with probability at most $1/15$ , the algorithm reaches step 1. So we may assume that (failure probability $1/15$ )

$${\bf Pr}_{u\in{\cal D}}[f(u_{X}\circ 0_{\overline{X}})\not=f(u)]\leq\epsilon/2.$$

Since $f$ is $\epsilon$ -far from every $k$ -junta with respect to ${\cal D}$ and $f(x_{X}\circ 0_{\overline{X}})$ is $\epsilon/2$ -close to $f$ with respect to ${\cal D}$ we have $f(x_{X}\circ 0_{\overline{X}})$ is $(\epsilon/2)$ -far from every $k$ -junta with respect to ${\cal D}$ . Therefore, by Lemma 5,

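$${\bf Pr}_{u\in{\cal D},y\in U}[f(u_{X}\circ 0_{\overline{X}})=f(u_{I}\circ y_{X\backslash I}\circ 0_{\overline{X}})]\leq 1-\frac{\epsilon}{2}.$$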

By Lemma 11, if some $f(x_{X_{\ell}}\circ v_{\overline{X_{\ell}}}^{(\ell)})$ is $(1/30)$ -far from any literal with respect to the uniform distribution then, with probability at least $1-(2/15)$ , the algorithm rejects. So we may assume (failure probability $2/15$ ) that every $f(x_{X_{\ell}}\circ v_{\overline{X_{\ell}}}^{(\ell)})$ is $(1/30)$ -close to some $x_{\tau(\ell)}$ or $\overline{x_{\tau(\ell)}}$ with respect to the uniform distribution, where $\tau(\ell)\in X_{\ell}$ .

Let $z^{(1)},\ldots,z^{(M^{\prime})}$ be the strings generated in step LABEL:Gl. By Lemma 12, with probability at least $1-M^{\prime}k(3/4)^{h}\geq 1-(1/15)$ , every $z^{(i)}$ generated in step LABEL:Gl satisfies $z^{(i)}_{\tau(\ell)}=0$ for all $\ell\in I$ . Also, since the distribution of $w_{X_{\ell}}$ and $\overline{w_{X_{\ell}}}$ is uniform, the distribution of $z^{(i)}_{X\backslash I}$ and $u_{X\backslash I}+z^{(i)}_{X\backslash I}$ is uniform. We now assume (failure probability $1/15$ ) that $z^{(i)}_{I}=0$ for all $i$ . Therefore, by (3),

[TABLE]

Therefore, the failure probability of an output “reject” is at most $1/15+2/15+1/15+1/15=1/3$ . ∎
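The accounting in this proof can be replayed with exact fractions, together with the constraint $M^{\prime}k(3/4)^{h}\leq 1/15$ that the proof relies on. The value of $h$ used by the algorithm is fixed in Figure 1, which is not reproduced in this text, so the function `smallest_h` below is a hypothetical helper that merely computes the smallest integer satisfying that inequality, with $M^{\prime}=(2\ln 15)/\epsilon$ as stated in Section 3.4.

```python
from fractions import Fraction
from math import ceil, log


def smallest_h(k, eps):
    """Smallest integer h >= 1 with M' * k * (3/4)^h <= 1/15, where M' = 2 ln(15) / eps."""
    m_prime = 2 * log(15) / eps
    return max(1, ceil(log(15 * m_prime * k) / log(4 / 3)))


# Failure probabilities charged in the proof of Lemma 13, in the order they appear.
failures = [Fraction(1, 15), Fraction(2, 15), Fraction(1, 15), Fraction(1, 15)]
total_failure = sum(failures)

h = smallest_h(k=10, eps=0.1)
slack = (2 * log(15) / 0.1) * 10 * (3 / 4) ** h
```

Because $h$ only needs to grow like $\log(M^{\prime}k)$, a choice of this form gives $h=O(\log(k/\epsilon))$.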

3.4 The Query Complexity of the Algorithm

In this section we show that

Lemma 14.

The query complexity of the algorithm is

$$\tilde{O}\left(\frac{k}{\epsilon}\right).$$

Proof.

The condition in step LABEL:con1 requires two queries and is executed at most $M=2k\ln(15k)/\epsilon$ times. This is $2M=O((k\log k)/\epsilon)$ queries. Step LABEL:Find is executed at most $k+1$ times. This is because each time it is executed, the value of $|I|$ is increased by one, and when $|I|=k+1$ the algorithm rejects. By Lemma 3, to find a new relevant set the algorithm makes $O(\log r)=O(\log k)$ queries. This is $O(k\log k)$ queries. Steps LABEL:Uni and LABEL:ConE are executed $|I|\leq k$ times, and by Lemma 4, the total number of queries made is $O(1/(1/30)\log(15))k+2k=O(k)$ .

The final test in the algorithm is repeated $M^{\prime}=(2\ln 15)/\epsilon$ times (step 1) and each time, and for each $\ell\in I$ , (step LABEL:feld01) it repeats $h$ times (step LABEL:feld02) two conditions that takes $2$ queries each (step LABEL:feld03). This takes $4M^{\prime}kh=O((k/\epsilon)\ln(k/\epsilon))$ queries. The number of queries in step LABEL:Finn is $2M^{\prime}=O(1/\epsilon)$ . Therefore the total number of queries is

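$$O\left(\frac{k\log k}{\epsilon}\right)+O(k\log k)+O(k)+O\left(\frac{k}{\epsilon}\ln\frac{k}{\epsilon}\right)+O\left(\frac{1}{\epsilon}\right)=O\left(\frac{k}{\epsilon}\log\frac{k}{\epsilon}\right).$$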

∎
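The counts that the proof states exactly can be tallied numerically. The sketch below is illustrative only: it covers the three terms given with explicit constants ($2M$, $4M^{\prime}kh$ and $2M^{\prime}$) and leaves out the $O(k\log k)$ and $O(k)$ terms, whose constants are not specified. The function name `query_budget` is hypothetical, $h$ is supplied by the caller, and rounding of $M$ and $M^{\prime}$ to integers is ignored.

```python
import math


def query_budget(k: int, eps: float, h: int) -> dict:
    """Upper bounds on the explicitly counted query terms from the proof of Lemma 14."""
    m = 2 * k * math.log(15 * k) / eps       # M: executions of the two-query condition
    m_prime = 2 * math.log(15) / eps         # M': repetitions of the final test
    return {
        "first_condition": 2 * m,            # 2M
        "final_test": 4 * m_prime * k * h,   # 4 M' k h, using |I| <= k
        "last_step": 2 * m_prime,            # 2M'
    }


# h = 32 is an illustrative value chosen for this example, not taken from the paper.
budget = query_budget(k=10, eps=0.1, h=32)
explicit_total = sum(budget.values())
```

For fixed $\epsilon$, the `final_test` term grows like $k\cdot h$, which is why the choice $h=O(\log(k/\epsilon))$ determines the overall bound.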

4 Open Problems

In this paper we proved that for any $\epsilon>0$, there is a two-sided error distribution-free adaptive algorithm for $\epsilon$-testing $k$-juntas that makes $\tilde{O}(k/\epsilon)$ queries. It would also be interesting to find a one-sided error distribution-free adaptive algorithm with such query complexity.

Chen et al. [36] proved the lower bound $\Omega(2^{k/3})$ for any non-adaptive (one round) algorithm. What is the minimal number of rounds one needs to get $poly(k/\epsilon)$ query complexity? Can $O(1)$-round algorithms solve the problem with $poly(k/\epsilon)$ queries?

In the uniform distribution framework, where the distance between two functions is measured with respect to the uniform distribution, Blais in [5] gave a non-adaptive algorithm that makes $\tilde{O}(k^{3/2})/\epsilon$ queries and in [6] an adaptive algorithm that makes $O(k\log k+k/\epsilon)$ queries. On the lower bounds side, Sağlam in [44] gave an $\Omega(k\log k)$ lower bound for adaptive testing and Chen et al. [17] gave an $\tilde{\Omega}(k^{3/2})/\epsilon$ lower bound for non-adaptive testing. Thus in both the adaptive and non-adaptive uniform distribution settings, the query complexity of $k$-junta testing has now been pinned down to within logarithmic factors. It is interesting to study $O(1)$-round algorithms. For example, what is the query complexity of a $2$-round algorithm?

[TABLE]

Acknowledgment. We would like to thank Xi Chen for reading an early version of the paper and for verifying the correctness of the algorithm.

Bibliography


[1] Noga Alon, Tali Kaufman, Michael Krivelevich, Simon Litsyn, and Dana Ron. Testing Reed-Muller codes. IEEE Trans. Information Theory, 51(11):4032–4039, 2005. URL: https://doi.org/10.1109/TIT.2005.856958, doi:10.1109/TIT.2005.856958.
[2] Roksana Baleshzar, Meiram Murzabulatov, Ramesh Krishnan S. Pallavoor, and Sofya Raskhodnikova. Testing unateness of real-valued functions. CoRR, abs/1608.07652, 2016. URL: http://arxiv.org/abs/1608.07652, arXiv:1608.07652.
[3] Aleksandrs Belovs and Eric Blais. A polynomial lower bound for testing monotonicity. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 1021–1032, 2016. URL: https://doi.org/10.1145/2897518.2897567, doi:10.1145/2897518.2897567.
[4] Arnab Bhattacharyya, Swastik Kopparty, Grant Schoenebeck, Madhu Sudan, and David Zuckerman. Optimal testing of Reed-Muller codes. In Property Testing - Current Research and Surveys, pages 269–275. 2010. URL: https://doi.org/10.1007/978-3-642-16367-8_19, doi:10.1007/978-3-642-16367-8_19.
[5] Eric Blais. Improved bounds for testing juntas. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, 11th International Workshop, APPROX 2008, and 12th International Workshop, RANDOM 2008, Boston, MA, USA, August 25-27, 2008. Proceedings, pages 317–330, 2008. URL: https://doi.org/10.1007/978-3-540-85363-3_26, doi:10.1007/978-3-540-85363-3_26.
[6] Eric Blais. Testing juntas nearly optimally. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 151–158, 2009. URL: https://doi.org/10.1145/1536414.1536437, doi:10.1145/1536414.1536437.
[7] Eric Blais, Joshua Brody, and Kevin Matulef. Property testing lower bounds via communication complexity. In Proceedings of the 26th Annual IEEE Conference on Computational Complexity, CCC 2011, San Jose, California, USA, June 8-10, 2011, pages 210–220, 2011. URL: https://doi.org/10.1109/CCC.2011.31, doi:10.1109/CCC.2011.31.
[8] Eric Blais and Daniel M. Kane. Tight bounds for testing k-linearity. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 15th International Workshop, APPROX 2012, and 16th International Workshop, RANDOM 2012, Cambridge, MA, USA, August 15-17, 2012. Proceedings, pages 435–446, 2012. URL: https://doi.org/10.1007/978-3-642-32512-0_37, doi:10.1007/978-3-642-32512-0_37.
