Query-to-Communication Lifting Using Low-Discrepancy Gadgets
Arkadev Chattopadhyay, Yuval Filmus, Sajin Koroth, Or Meir, Toniann Pitassi

TL;DR
This paper proves a new lifting theorem that works for all gadgets with logarithmic length and exponentially small discrepancy, thereby broadening the applicability of query-to-communication complexity reductions.
Contribution
It proves a lifting theorem for all gadgets with logarithmic length and exponentially-small discrepancy, including randomized cases, significantly expanding previous limitations.
Findings
Lifting theorem now applies to all gadgets with logarithmic length and small discrepancy.
First randomized lifting theorem for logarithmic-size gadgets.
Generalizes direct-sum theorems for low-discrepancy functions.
Abstract
Lifting theorems are theorems that relate the query complexity of a function $f$ to the communication complexity of the composed function $f \circ g^n$, for some "gadget" $g$. Such theorems allow transferring lower bounds from query complexity to communication complexity, and have seen numerous applications in recent years. In addition, such theorems can be viewed as a strong generalization of a direct-sum theorem for the gadget $g$. We prove a new lifting theorem that works for all gadgets that have logarithmic length and exponentially-small discrepancy, for both deterministic and randomized communication complexity. Thus, we significantly increase the range of gadgets for which such lifting theorems hold. Our result has two main motivations: First, allowing a larger variety of gadgets may support more…
This work subsumes an earlier work that appeared in ICALP 2019 [CFK+19]. The earlier work proved our main result (Theorem 1.2) only for the special case where the gadget is the inner product function, while this work proves the result for the general case of all low-discrepancy gadgets.
Arkadev Chattopadhyay: School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai, India. [email protected].
Yuval Filmus: Technion Israel Institute of Technology, Haifa, Israel. [email protected]. Taub Fellow, supported by the Taub Foundations. The research was funded by ISF grant 1337/16.
Sajin Koroth: School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, B.C., Canada V5A 1S6. This research was done while Sajin Koroth was partially supported by the Israel Science Foundation (grant No. 1445/16) and by the institutional postdoctoral program of the University of Haifa.
Or Meir: Department of Computer Science, University of Haifa, Haifa 3498838, Israel. [email protected]. Partially supported by the Israel Science Foundation (grant No. 1445/16).
Toniann Pitassi: Department of Computer Science, University of Toronto, Canada. [email protected]. Research supported by NSERC and by NSF CCF grant 1900460.
Abstract
Lifting theorems are theorems that relate the query complexity of a function $f$ to the communication complexity of the composed function $f \circ g^n$, for some "gadget" $g$. Such theorems allow transferring lower bounds from query complexity to communication complexity, and have seen numerous applications in recent years. In addition, such theorems can be viewed as a strong generalization of a direct-sum theorem for the gadget $g$.
We prove a new lifting theorem that works for all gadgets that have logarithmic length and exponentially-small discrepancy, for both deterministic and randomized communication complexity. Thus, we significantly increase the range of gadgets for which such lifting theorems hold.
Our result has two main motivations: First, allowing a larger variety of gadgets may support more applications. In particular, our work is the first to prove a randomized lifting theorem for logarithmic-size gadgets, thus improving some applications of the theorem. Second, our result can be seen as a strong generalization of a direct-sum theorem for functions with low discrepancy.
1 Introduction
1.1 Background
In this work, we prove new lifting theorems for a large family of gadgets. Let $f:\{0,1\}^n\to\{0,1\}$ and $g:\{0,1\}^b\times\{0,1\}^b\to\{0,1\}$ be functions (where $g$ is referred to as a gadget). The block-composed function $f\circ g^n$ is the function that takes $n$ inputs $(x_1,y_1),\ldots,(x_n,y_n)$ for $g$ and computes $f\circ g^n$ as
$$f\circ g^n\big((x_1,y_1),\ldots,(x_n,y_n)\big)=f\big(g(x_1,y_1),\ldots,g(x_n,y_n)\big).$$
Lifting theorems are theorems that relate the communication complexity of $f\circ g^n$ to the query complexity of $f$ and the communication complexity of $g$.
More specifically, consider the following communication problem: Alice gets $x_1,\ldots,x_n$, Bob gets $y_1,\ldots,y_n$, and they wish to compute the output of $f\circ g^n$ on their inputs. The natural protocol for doing so is the following: Alice and Bob jointly simulate a decision tree of optimal height for solving $f$. Any time the tree queries the $i$-th bit, they compute $g$ on $(x_i,y_i)$ by invoking the best possible communication protocol for $g$. A lifting theorem is a theorem that says that this natural protocol is optimal.
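The natural protocol described above can be sketched in a few lines. This is only an illustration under stated assumptions: the tuple encoding of decision trees, the inner-product gadget, the AND example, and all function names are hypothetical choices made here, and each query is charged $b+1$ bits for the trivial protocol in which Alice sends her block and Bob replies with the gadget's output.

```python
from itertools import product

def inner_product_gadget(x, y):
    """Illustrative gadget g: inner product mod 2 of two equal-length bit tuples."""
    return sum(a & b for a, b in zip(x, y)) % 2

def compose(f, g, xs, ys):
    """Block-composed function: apply g to each pair (x_i, y_i), then f to the resulting bits."""
    return f(tuple(g(x, y) for x, y in zip(xs, ys)))

def natural_protocol(tree, g, xs, ys):
    """Simulate a decision tree for f, answering query i by evaluating g(x_i, y_i).

    A tree is ("leaf", output) or ("query", i, subtree_if_0, subtree_if_1).
    Assumed cost model: Alice sends x_i (b bits) and Bob replies with one bit.
    Returns (output, bits_communicated).
    """
    bits = 0
    node = tree
    while node[0] == "query":
        _, i, if_zero, if_one = node
        answer = g(xs[i], ys[i])
        bits += len(xs[i]) + 1
        node = if_one if answer else if_zero
    return node[1], bits

# Hypothetical example: f = AND of two bits, written as a depth-2 decision tree.
AND_TREE = ("query", 0, ("leaf", 0), ("query", 1, ("leaf", 0), ("leaf", 1)))

def and2(z):
    return z[0] & z[1]
```

With $n=2$ blocks of $b=2$ bits each, the simulated tree agrees with the composed function on every input and never spends more than $2\cdot(b+1)=6$ bits, which is the upper bound that a lifting theorem shows to be essentially optimal.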
We note that it is often desirable to consider the case where $f$ is a search problem with an arbitrary range rather than a boolean function (see Section 2 for the definition of search problems). Most of the known results, as well as the results of this work, apply to this general case. However, for simplicity of presentation, we focus for now on the case where $f$ is a boolean function.
Applications of lifting theorems.
One important reason why lifting theorems are interesting is that they create a connection between query complexity and communication complexity. This connection, besides being interesting in its own right, allows us to transfer lower bounds and separations from query complexity (which is a relatively simple model) to communication complexity (which is a significantly richer model).
In particular, the first result of this form, due to Raz and McKenzie [RM99], proved a lifting theorem from deterministic query complexity to deterministic communication complexity when $g$ is the index function. They then used it to prove new lower bounds on communication complexity by lifting query-complexity lower bounds. More recently, Göös, Pitassi and Watson [GPW15] applied that theorem to separate the logarithm of the partition number and the deterministic communication complexity of a function, resolving a long-standing open problem. This too was done by proving such a separation in the setting of query complexity and then lifting it to the setting of communication complexity. This result stimulated a flurry of work on lifting theorems of various kinds, such as: lifting for zero-communication protocols [GLM+16], round-preserving lifting theorems with applications to time-space trade-offs for proof complexity [dRNV16], deterministic lifting theorems with other gadgets [CKLM17, WYY17], lifting theorems from randomized query complexity to randomized communication complexity [GPW17], lifting theorems for DAG-like protocols [GGKS18] with applications to monotone circuit lower bounds, lifting theorems for asymmetric communication problems [CKLM18] with applications to data structures, a lifting theorem for the EQUALITY gadget [LM18], lifting theorems for XOR functions with applications to the log-rank conjecture [HHL18], and lifting theorems for applications to monotone formula complexity, monotone span programs, and proof complexity [GP18, RPRC16, PR17, PR18]. There are also lifting theorems that lift more analytic properties of the function, such as approximate degree, due to Sherstov [She11] and independently due to Shi and Zhu [SZ09].
In almost all known lifting theorems, the function $f$ can be arbitrary while $g$ is usually a specific function (e.g., the index function). This raises the following natural question: for which choices of $g$ can we prove lifting theorems? This question is interesting because usually the applications of lifting theorems work by reducing the composed function $f\circ g^n$ to some other problem of interest, and the choice of the gadget affects the efficiency of such reductions.
In particular, applications of lifting theorems often depend on the size of the gadget, which is the length $b$ of the input to $g$. Both the deterministic lifting theorem of Raz and McKenzie [RM99] and the randomized lifting theorem of Göös et al. [GPW17] use a gadget of very large size (polynomial in $n$). Reducing the gadget size to a constant would have many interesting applications.
In the deterministic setting, the gadget size was recently improved to logarithmic by the independent works of [CKLM17] and [WYY17]. Moreover, [CKLM17, Koz18] showed the lifting to work for a class of gadgets with a certain pseudorandom property rather than just a single specific gadget. A gadget of logarithmic size was also obtained earlier in lifting theorems for more specialized models, such as the work of [GLM+16]. However, the randomized lifting theorem of Göös et al. [GPW17] seemed to work only with a specific gadget of polynomial size.
In this work, we prove a lifting theorem for a large family of gadgets, namely, all functions with logarithmic length and exponentially-small discrepancy (see Theorem 1.2 for details). Our theorem holds both in the deterministic and the randomized setting. This allows for a considerably larger variety of gadgets: in particular, our theorem is the first lifting theorem in the randomized setting that uses logarithmic-size gadgets, it allows lifting with the inner-product gadget (previously known only in the deterministic setting [CKLM17, WYY17]), and it is also the first lifting theorem that shows that a random function can be used as a gadget.
We would like to point out that, although we reduce the gadget size to logarithmic in this work, this is not enough to obtain the interesting applications that a constant-size gadget would have yielded. Nevertheless, our randomized lifting theorem still has some applications. For example, our theorem can be used to simplify the lower bounds of Göös and Jayram [GJ16] on AND-OR trees and MAJORITY trees, since we can now obtain them directly from the randomized query complexity lower bounds rather than going through conical juntas. In addition, our theorem can be used to derive the separation of randomized communication complexity from partition number (due to [GJPW15]) for functions with larger complexity (compared to their input length).
Lifting theorems as a generalization of direct-sum theorems.
Lifting theorems can also be motivated from another angle, which is particularly appealing in our case: lifting theorems can be viewed as a generalization of direct-sum theorems. The direct-sum question is a classical question in complexity theory, which asks whether performing a task on $k$ independent inputs is $k$ times harder than performing it on a single input. When specialized to the setting of communication complexity, a direct-sum theorem is a theorem that says that the communication complexity of computing a function $g$ on $k$ independent inputs is about $k$ times larger than the communication complexity of $g$. A related type of result, which is sometimes referred to as an "XOR lemma", says that the communication complexity of computing the XOR of the outputs of $g$ on $k$ independent inputs is about $k$ times larger than the communication complexity of $g$.
The direct-sum question for communication complexity has been raised in [KRW91], and has since attracted much attention. While we do not have a general direct-sum theorem for all functions, many works have proved direct-sum theorems and XOR lemmas for large families of functions [FKNN95, Sha03, BPSW06, LSS08, Kla10, BBCR10, BRWY13, Bra17] as well as provided counterexamples [FKNN95, GKR14, GKR16b, GKR16a].
Now, observe that lifting theorems are natural generalizations of direct-sum theorems and XOR lemmas: in particular, if we set $f$ to be the identity function or the parity function, we get a direct-sum theorem or an XOR lemma for $g$, respectively. More generally, a lifting theorem says that the communication complexity of computing any function $f$ of the outputs of $g$ on $n$ independent inputs is larger than the complexity of $g$ by a factor that depends on the query complexity of $f$. This is perhaps the strongest and most natural "direct-sum-like theorem" for $g$ that one could hope for.
From this perspective, it is natural to ask which functions $g$ admit such a strong theorem. The previous works of [RM99, GPW17] can be viewed as establishing this theorem only for the index function. The works of [CKLM17, Koz18] have made further progress, establishing this theorem for a class of functions that satisfy a certain hitting property. However, the latter property is somewhat non-standard and ad hoc, and their theorem holds only in the deterministic setting. In this work, we establish such theorems for all functions with low discrepancy, which is a standard and well-studied complexity measure, and we do so in both the deterministic and the randomized setting.
1.2 Our results
In this work, we prove lifting theorems for gadgets of low discrepancy. In what follows, we denote by $D^{dt}$ and $D^{cc}$ the deterministic query complexity and communication complexity of a task respectively, and by $R^{dt}_{\varepsilon}$ and $R^{cc}_{\varepsilon}$ the randomized query complexity and (public-coin) communication complexity with error probability $\varepsilon$ respectively. Given a search problem $S$ and a gadget $g:\{0,1\}^b\times\{0,1\}^b\to\{0,1\}$, it is easy to see that
$$D^{cc}(S\circ g^n)=O\big(D^{dt}(S)\cdot b\big),\qquad R^{cc}_{\varepsilon}(S\circ g^n)=O\big(R^{dt}_{\varepsilon}(S)\cdot b\big).$$
This upper bound is proved using the simple protocol discussed earlier: the parties simulate the optimal decision tree for $S$, and whenever a query is made, the parties compute $g$ on the corresponding input in order to answer the query (which can be done by communicating at most $b+1$ bits). Our main result says that when $g$ has low discrepancy and $b$ is at least logarithmic, this upper bound is roughly tight. In order to state this result, we first recall the definition of discrepancy.
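Definition 1.1.
Let $\Lambda$ be a finite set, let $g:\Lambda\times\Lambda\to\{0,1\}$ be a function, and let $U,V$ be independent random variables that are uniformly distributed over $\Lambda$. Given a combinatorial rectangle $R\subseteq\Lambda\times\Lambda$, the discrepancy of $g$ with respect to $R$, denoted $\mathrm{disc}_R(g)$, is defined as follows:
$$\mathrm{disc}_R(g)=\Big|\Pr\big[g(U,V)=0\text{ and }(U,V)\in R\big]-\Pr\big[g(U,V)=1\text{ and }(U,V)\in R\big]\Big|.$$
The discrepancy of $g$, denoted $\mathrm{disc}(g)$, is defined as the maximum of $\mathrm{disc}_R(g)$ over all combinatorial rectangles $R\subseteq\Lambda\times\Lambda$.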
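For very small gadgets, the discrepancy can be computed by brute force directly from the definition. The following is a minimal sketch, assuming the uniform distribution on both inputs and the standard definition (the maximum over rectangles $A\times B$ of the absolute difference between the mass of 0-inputs and 1-inputs in the rectangle); the function names and the choice of the inner-product gadget on $b=2$ bits are illustrative assumptions, and the enumeration is exponential in the size of the input set.

```python
from itertools import product

def subsets(universe):
    """Yield every subset of `universe` as a tuple."""
    for mask in range(1 << len(universe)):
        yield tuple(u for k, u in enumerate(universe) if (mask >> k) & 1)

def discrepancy(g, universe):
    """Maximum over rectangles A x B of |Pr[g=0, (U,V) in R] - Pr[g=1, (U,V) in R]|,
    where U and V are independent and uniform over `universe`."""
    total = len(universe) ** 2
    all_subsets = list(subsets(universe))
    best = 0.0
    for A in all_subsets:
        for B in all_subsets:
            signed = sum(1 if g(u, v) == 0 else -1 for u in A for v in B)
            best = max(best, abs(signed) / total)
    return best

def inner_product(u, v):
    """Illustrative gadget: inner product mod 2."""
    return sum(a & b for a, b in zip(u, v)) % 2

LAMBDA = list(product((0, 1), repeat=2))  # b = 2, so the input set has 4 elements
disc_ip = discrepancy(inner_product, LAMBDA)
disc_const = discrepancy(lambda u, v: 0, LAMBDA)
```

A constant gadget has discrepancy 1 (take the full rectangle), whereas the inner-product gadget on $b$ bits is known to have discrepancy at most $2^{-b/2}$, which is the kind of exponentially small discrepancy the main theorem asks for.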
Discrepancy is a useful measure for the complexity of $g$, and in particular, it is well-known that for $\varepsilon>0$:
$$D^{cc}(g)\geq R^{cc}_{\varepsilon}(g)\geq\log\frac{1-2\varepsilon}{\mathrm{disc}(g)}$$
(see, e.g., [KN97]). We now state our main result.
Theorem 1.2 (Main theorem).
For every there exists such that the following holds: Let be a search problem that takes inputs from , and let be an arbitrary function such that and such that . Then
[TABLE]
and for every it holds that
[TABLE]
where .
We note that our results are in fact more general, and preserve the round complexity of among other things. See Sections 4 and 5 for more details.
Remark 1.3.
Note that our main theorem can be applied to a random function $g$, since such a function has a very low discrepancy. As noted above, we believe that our theorem is the first theorem to allow the gadget to be a random function.
Unifying deterministic and randomized lifting theorems.
The existing proofs of deterministic lifting theorems and randomized lifting theorems are quite different. While both proofs rely on information-theoretic arguments, they measure information in different ways. In particular, while the randomized lifting theorem of [GPW17] (following [GLM+16]) measures information using min-entropy, the deterministic lifting theorems of [RM99, GPW15, CKLM17, WYY17] (following [EIRS01]) measure information using a notion known as thickness (with [GGKS18] being a notable exception). A natural direction of further research is to investigate if these disparate techniques can be unified. Indeed, a related question was raised by [GLM+16], who asked if min-entropy based techniques could be used to prove (or simplify the existing proof of) Raz-McKenzie style deterministic lifting theorems.
Our work answers this question affirmatively: we prove both the deterministic and randomized lifting theorems using the same strategy. In particular, both proofs measure information using min-entropy. In doing so, we unify both lifting theorems under the same framework.
1.3 Our techniques
We turn to describe the high-level ideas that underlie the proof of our main theorem. Following the previous works, we use a "simulation argument": We show that given a protocol $\Pi$ that solves $S\circ g^n$ with communication complexity $C$, we can construct a decision tree $T$ that solves $S$ with query complexity $O(C/b)$. The decision tree $T$ works by simulating the action of the protocol $\Pi$ (hence the name "simulation argument"). We now describe this simulation in more detail, following the presentation of [GPW17].
The simulation argument.
For simplicity of notation, let us denote $\Lambda=\{0,1\}^b$, so $g$ is a function from a "block" in $\Lambda\times\Lambda$ to $\{0,1\}$. Let $G=g^n:\Lambda^n\times\Lambda^n\to\{0,1\}^n$ be the function that takes $n$ disjoint blocks and computes the outputs of $g$ on all of them. We assume that we have a protocol $\Pi$ that solves $S\circ G$ with complexity $C$, and would like to construct a decision tree $T$ that solves $S$ with complexity $O(C/b)$. The basic idea is that given an input $z\in\{0,1\}^n$, the tree $T$ simulates the action of $\Pi$ on the random inputs $(X,Y)$ that are uniformly distributed over $G^{-1}(z)$. Clearly, it holds that $S\circ G(X,Y)=S(z)$, so this simulation, if done right, outputs the correct answer.
The core issue in implementing such a simulation is the following question: how can simulate the action of on without knowing ? The answer is that as long as the protocol has transmitted less than  bits of information about every block (for some specific ), the distribution of is similar to the uniform distribution in a certain sense (that will be formalized soon). Thus, the tree can pretend that are distributed uniformly, and simulate the action of on such inputs, which can be done without knowing .
This idea can be implemented as long as the protocol has transmitted less than  bits of information about every block . However, at some point, the protocol may transmit more than bits of information about some blocks. Let denote the set of these blocks. At this point, it is no longer true that the distribution of is similar to the uniform distribution. However, it can be shown that the distribution of is similar to the uniform distribution conditioned on . Thus, the tree queries the bits in , and can now continue the simulation of on by pretending that are distributed uniformly conditioned on . The tree proceeds in this way, adding blocks to as necessary, until the protocol ends, at which point halts and outputs the same output as .
It remains to show that the query complexity of is at most . To this end, observe that the query complexity of is exactly the size of the set at the end of the simulation. Moreover, recall that the set is the set of blocks on which the protocol transmitted at least bits of information. Hence, at any given point, the protocol must have transmitted at least bits. On the other hand, we know by assumption that the protocol never transmitted more than bits. This implies that and therefore the query complexity of the tree is at most . This concludes the argument.
Our contribution.
In order to implement the foregoing simulation argument, there are two technical issues that need to be addressed and are relevant at this point:
- The uniform marginals issue: In the above description, we argued that as long as the protocol has not transmitted too much information, the distribution of is "similar to the uniform distribution". The question is how to formalize this idea. This issue was dealt with implicitly in several works in the lifting literature since [RM99], and was made explicit in [GPW17] as the "uniform marginals lemma": if every set of blocks in has sufficient min-entropy, then each of the marginals on its own is close to the uniform distribution. In [GPW17], this lemma was proved for the case where  is the index function, and in [GLM+16] a very similar lemma was proved for the case where is the inner product function.
- The conditioning issue: As we described above, when the protocol transmits too much information about a set of blocks , the tree queries and conditions the distribution of on the event that . In principle, this conditioning may reveal information on , which might reduce their min-entropy and ruin their uniform-marginals property. In order for the simulation argument to work, one needs to show that this cannot happen, and the conditioning will never reveal too much information about .
In the works of [RM99, WYY17, GPW17] this issue was handled by arguments that are tailored to the index and inner product functions. The work of [CKLM17] gave this issue a more general treatment, by identifying an abstract property of that prevents the conditioning from revealing too much information. However, as discussed above, this abstract property is somewhat ad hoc, and only works for deterministic simulation.
Our contribution is dealing with both issues in the general setting where $g$ is an arbitrary low-discrepancy gadget. In order to deal with the first issue, we prove a "uniform marginals" lemma for such gadgets $g$: this is relatively easy, since the proof of [GLM+16] for the inner product gadget turns out to generalize in a straightforward way to arbitrary low-discrepancy gadgets.
The core of this work is in dealing with the conditioning issue. Our main technical lemma says that as long as every set of blocks in has sufficient min-entropy, there are only few possible values of that are "dangerous" (in the sense that they may lead the conditioning to leak too much information). We now modify the simulation such that it discards these dangerous values before performing the conditioning. Since there are only few of those dangerous values, discarding them does not reveal too much information on and , and the simulation can proceed as before.
1.4 Open problems
The main question that arises from this work is how much more general the gadget $g$ can be. As was discussed in Section 1.1, lifting theorems can be viewed as a generalization of direct-sum theorems. In the setting of randomized communication complexity, it is known that the "ability of $g$ to admit a direct-sum theorem" is characterized exactly by a complexity measure called the information cost of $g$ (denoted $\mathrm{IC}(g)$). In particular, the complexity of computing a function $g$ on $k$ independent copies is $\approx k\cdot\mathrm{IC}(g)$ [BBCR10, BR14, Bra17]. This leads to the natural conjecture that a lifting theorem should hold for every gadget $g$ that has sufficiently high information cost.
Conjecture 1.4.
There exists a constant such that the following holds. Let be any search problem that takes inputs from , and let be an arbitrary function such that . Then
[TABLE]
Proving this conjecture would give us a nearly-complete understanding of the lifting phenomenon which, in addition to being interesting in its own right, would likely lead to many applications. In particular, this conjecture implies our result, since it is known that $\log\frac{1}{\mathrm{disc}(g)}$ (roughly) lower bounds the information cost of $g$ [KLL+15].
Conjecture 1.4 is quite ambitious. As intermediate goals, one could attempt to prove such a lifting theorem for other complexity measures that are stronger than discrepancy and weaker than information cost (see [JK10, KLL+15] for several measures of this kind). To begin with, one could consider the well-known corruption bound of [Yao83, BFS86, Raz92]: could we prove a lifting theorem for an arbitrary gadget that has a low corruption bound? A particularly interesting example of such a gadget is the disjointness function. Indeed, proving a lifting theorem for the disjointness gadget would be interesting in its own right and would likely have applications, in addition to being a step toward Conjecture 1.4.
An even more modest intermediate goal is to gain a better understanding of lifting theorems with respect to discrepancy. For starters, our result only holds for gadgets whose discrepancy is exponentially vanishing in the gadget size. (More accurately, our result can be applied to gadgets with larger discrepancy, but then the gadget size has to be larger than logarithmic.) Can we prove a lifting theorem for gadgets with larger discrepancy? In particular, since the randomized communication complexity of $g$ is lower bounded by $\log\frac{1}{\mathrm{disc}(g)}$, the following conjecture comes to mind.
Conjecture 1.5.
There exists a constant such that the following holds. Let be any search problem that takes inputs from , and let be an arbitrary function such that . Then
[TABLE]
Another interesting direction is to consider discrepancy with respect to other distributions. The definition of discrepancy we gave above (Definition 1.1) is a special case of a more general definition, in which the random variables are distributed according to some fixed distribution over . Thus, our result works only when is the uniform distribution. Can we prove a lifting theorem that holds for an arbitrary choice of ? While we have not verified it, we believe that our proof can yield a lifting theorem that works whenever  is a product distribution (after some natural adaptations). However, proving such a lifting theorem for non-product distributions seems to require new ideas. We note that direct-sum theorems for discrepancy have been proved by [Sha03, LSS08], and proving Conjecture 1.4 (and extending it to an arbitrary distribution ) seems like a natural extension of their results.
Yet another interesting direction is to consider the lifting analogue of strong direct product theorems. Such theorems say that when we compute on  independent inputs, then not only that the communication complexity increases by a factor of , but the success probability also drops exponentially in (see, e.g., [Sha03, Kla10, Dru12, BRWY13]). A plausible analogue for lifting theorems is to conjecture that the success probability of computing drops exponentially in the query complexity of . It would be interesting to see a result along these lines.
Finally, there remains the major open problem of the lifting literature: proving a lifting theorem that uses gadgets of constant size.
Organization of the paper. In Section 2, we provide the required preliminaries. In Section 3, we set up the lifting machinery that is used in both the deterministic and the randomized lifting results, including our "uniform marginals lemma" and our main technical lemma. We prove the deterministic part of our main theorem in Section 4, and the randomized part of our main theorem in Section 5.
2 Preliminaries
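We assume familiarity with the basic definitions of communication complexity (see, e.g., [KN97]). For any $n\in\mathbb{N}$, we denote $[n]=\{1,\ldots,n\}$. Given a boolean random variable $X$, we denote the bias of $X$ by
$$\mathrm{bias}(X)=\big|\Pr[X=0]-\Pr[X=1]\big|.$$
Given an alphabet $\Lambda$ and a set $I\subseteq[n]$, we denote by $\Lambda^I$ the set of strings of length $|I|$ which are indexed by $I$. Given a string $x\in\Lambda^n$ and a set $I\subseteq[n]$, we denote by $x_I$ the projection of $x$ to the coordinates in $I$ (in particular, $x_{\emptyset}$ is defined to be the empty string). Given a boolean function $g:\Lambda\times\Lambda\to\{0,1\}$ and a set $I\subseteq[n]$, we denote by $g^I:\Lambda^I\times\Lambda^I\to\{0,1\}^I$ the function that takes as inputs $|I|$ pairs from $\Lambda\times\Lambda$ that are indexed by $I$, and outputs the string in $\{0,1\}^I$ whose $i$-th bit is the output of $g$ on the $i$-th pair. In particular, we denote $g^n=g^{[n]}$, so $g^n$ takes as inputs $x,y\in\Lambda^n$ and outputs the binary string
$$g^n(x,y)=\big(g(x_1,y_1),\ldots,g(x_n,y_n)\big).$$
For every $I\subseteq[n]$, we denote by $g^{\oplus I}:\Lambda^I\times\Lambda^I\to\{0,1\}$ the function that given $x\in\Lambda^I$ and $y\in\Lambda^I$, outputs the parity of the string $g^I(x,y)$.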
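To make the indexing conventions concrete, here is a small sketch of projection to a set of coordinates, of the gadget applied blockwise to a set of indices, and of the parity of that output. The function names, the 0-based coordinates, the dictionary representation of indexed strings, and the inner-product gadget are all assumptions made for illustration.

```python
def inner_product(u, v):
    """Illustrative gadget g: inner product mod 2 of two bit tuples."""
    return sum(a & b for a, b in zip(u, v)) % 2

def project(x, I):
    """Projection of x to the coordinates in I, kept indexed by I (empty dict for the empty set)."""
    return {i: x[i] for i in sorted(I)}

def g_on_blocks(g, x, y, I):
    """Apply g to the pair (x_i, y_i) for every i in I; the output string is indexed by I."""
    return {i: g(x[i], y[i]) for i in sorted(I)}

def g_parity(g, x, y, I):
    """Parity (XOR) of the bits of g_on_blocks(g, x, y, I)."""
    return sum(g_on_blocks(g, x, y, I).values()) % 2

# Three blocks of two bits each (0-indexed here, unlike the 1-indexed text).
x = [(1, 1), (0, 1), (1, 0)]
y = [(1, 0), (1, 1), (1, 1)]
```

In this example the gadget outputs 1 on every block, so the parity over two blocks is 0 and the parity over all three blocks is 1.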
Search problems.
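Given a finite set of inputs $\mathcal{I}$ and a finite set of outputs $\mathcal{O}$, a search problem $S\subseteq\mathcal{I}\times\mathcal{O}$ is a relation between $\mathcal{I}$ and $\mathcal{O}$. Given $z\in\mathcal{I}$, we denote by $S(z)$ the set of outputs $o\in\mathcal{O}$ such that $(z,o)\in S$. Without loss of generality, we may assume that $S(z)$ is always non-empty, since otherwise we can set $S(z)=\{\bot\}$ where $\bot$ is some special failure symbol that does not belong to $\mathcal{O}$.
Intuitively, a search problem $S$ represents the following task: given an input $z\in\mathcal{I}$, find a solution $o\in S(z)$. In particular, if $\mathcal{I}=\mathcal{X}\times\mathcal{Y}$ for some finite sets $\mathcal{X},\mathcal{Y}$, we say that a deterministic protocol $\Pi$ solves $S$ if for every input $(x,y)\in\mathcal{I}$, the output of $\Pi$ is in $S(x,y)$. We say that a randomized protocol $\Pi$ solves $S$ with error $\varepsilon$ if for every input $(x,y)\in\mathcal{I}$, the output of $\Pi$ is in $S(x,y)$ with probability at least $1-\varepsilon$.
We denote the deterministic communication complexity of a search problem $S$ by $D^{cc}(S)$. Given $\varepsilon>0$, we denote by $R^{cc}_{\varepsilon}(S)$ the randomized (public-coin) communication complexity of $S$ with error $\varepsilon$ (i.e., the minimum worst-case complexity of a randomized protocol that solves $S$ with error $\varepsilon$).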
Given a search problem , we denote by the search problem that satisfies for every and that .
2.1 Decision trees
Informally, a decision tree is an algorithm that solves a search problem by querying the individual bits of its input. The tree is computationally unbounded, and its complexity is measured by the number of bits it queried.
Formally, a deterministic decision tree $T$ from $\{0,1\}^n$ to $\mathcal{O}$ is a binary tree in which every internal node is labeled with a coordinate in $[n]$ (which represents a query), every edge is labeled by a bit (which represents the answer to the query), and every leaf is labeled by an output in $\mathcal{O}$. Such a tree computes a function from $\{0,1\}^n$ to $\mathcal{O}$ in the natural way, and with a slight abuse of notation, we denote this function also as $T$. The query complexity of $T$ is the depth of the tree. We say that a tree $T$ solves a search problem $S\subseteq\{0,1\}^n\times\mathcal{O}$ if for every $z\in\{0,1\}^n$ it holds that $T(z)\in S(z)$. The deterministic query complexity of $S$, denoted $D^{dt}(S)$, is the minimal query complexity of a decision tree that solves $S$.
A randomized decision tree $T$ is a random variable that takes deterministic decision trees as values. The query complexity of $T$ is the maximal depth of a tree in the support of $T$. We say that $T$ solves a search problem $S\subseteq\{0,1\}^n\times\mathcal{O}$ with error $\varepsilon$ if for every $z\in\{0,1\}^n$ it holds that
$$\Pr\big[T(z)\in S(z)\big]\geq 1-\varepsilon.$$
The randomized query complexity of $S$ with error $\varepsilon$, denoted $R^{dt}_{\varepsilon}(S)$, is the minimal query complexity of a randomized decision tree that solves $S$ with error $\varepsilon$. Again, when we omit $\varepsilon$, it is assumed to be $\frac{1}{3}$.
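The definitions above translate directly into a small data structure. The sketch below is illustrative only: the class and function names, the representation of a search problem as a function returning the set of acceptable outputs, and the example problem (find a coordinate holding a 1, or report that none exists) are assumptions introduced here. A randomized tree is modeled as a finite list of (probability, tree) pairs.

```python
from fractions import Fraction
from itertools import product

class Leaf:
    def __init__(self, output):
        self.output = output

class Query:
    def __init__(self, index, if_zero, if_one):
        self.index = index      # coordinate queried at this node
        self.if_zero = if_zero  # subtree followed when the answer is 0
        self.if_one = if_one    # subtree followed when the answer is 1

def evaluate(tree, z):
    """Follow the answers to the queries until a leaf is reached."""
    while isinstance(tree, Query):
        tree = tree.if_one if z[tree.index] else tree.if_zero
    return tree.output

def depth(tree):
    """Query complexity of a deterministic tree."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(depth(tree.if_zero), depth(tree.if_one))

def solves(tree, search_problem, inputs):
    """True if the tree's output is an acceptable solution on every input."""
    return all(evaluate(tree, z) in search_problem(z) for z in inputs)

def worst_case_error(weighted_trees, search_problem, inputs):
    """Largest probability, over inputs, that the sampled tree outputs a non-solution."""
    worst = Fraction(0)
    for z in inputs:
        err = sum(p for p, t in weighted_trees if evaluate(t, z) not in search_problem(z))
        worst = max(worst, err)
    return worst

def find_one(z):
    """Hypothetical search problem: output any index i with z_i = 1, or "none"."""
    ones = {i for i, bit in enumerate(z) if bit}
    return ones if ones else {"none"}

INPUTS = list(product((0, 1), repeat=2))
SCAN = Query(0, Query(1, Leaf("none"), Leaf(1)), Leaf(0))
MIXTURE = [(Fraction(3, 4), SCAN), (Fraction(1, 4), Leaf(0))]
```

Here `SCAN` solves the problem deterministically with two queries, while `MIXTURE` errs with probability 1/4 in the worst case and has query complexity 2, since query complexity is the maximal depth over the support.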
2.1.1 Parallel decision-trees
Our lifting theorems have the property that they preserve the round complexity of protocols, which is useful for some applications [dRNV16]. In order to define this property, we need a notion of a decision tree that has an analogue of "round complexity". Such a notion, due to [Val75], is called a parallel decision tree. Informally, a parallel decision tree is a decision tree that works in "rounds", where in each round multiple queries are issued simultaneously. The "round complexity" of the tree is the number of rounds, whereas the query complexity is the total number of queries issued.
Formally, a deterministic parallel decision tree $T$ from $\{0,1\}^n$ to $\mathcal{O}$ is a rooted tree in which every internal node is labeled with a set $I\subseteq[n]$ (representing the queries issued simultaneously at this round) and has degree $2^{|I|}$. The edges going out of such a node are labeled with all the possible assignments in $\{0,1\}^I$, and every leaf is labeled by some output $o\in\mathcal{O}$. As before, such a tree naturally computes a function that is denoted by $T$, and it solves a search problem $S\subseteq\{0,1\}^n\times\mathcal{O}$ if $T(z)\in S(z)$ for all $z\in\{0,1\}^n$. The depth of such a tree is now the analogue of the number of rounds in a protocol. The query complexity of $T$ is defined as the maximum, over all leaves $\ell$, of the sum of the sizes of the sets $I$ that are labeling the vertices on the path from the root to $\ell$. A randomized parallel decision tree is defined analogously to the definition of randomized decision trees above.
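The distinction between rounds and queries can be seen in a short sketch. The class names and the example tree are illustrative assumptions; each internal node stores the set of coordinates queried in that round and one child per assignment to those coordinates, which is how the degree condition is read here.

```python
from itertools import product

class PLeaf:
    def __init__(self, output):
        self.output = output

class PNode:
    def __init__(self, queries, children):
        self.queries = tuple(queries)   # coordinates queried simultaneously in this round
        self.children = dict(children)  # maps each answer tuple to a subtree
        if len(self.children) != 2 ** len(self.queries):
            raise ValueError("need one child per assignment to the queried set")

def evaluate(tree, z):
    while isinstance(tree, PNode):
        answers = tuple(z[i] for i in tree.queries)
        tree = tree.children[answers]
    return tree.output

def rounds(tree):
    """Depth of the tree: the analogue of the number of rounds of a protocol."""
    if isinstance(tree, PLeaf):
        return 0
    return 1 + max(rounds(c) for c in tree.children.values())

def query_complexity(tree):
    """Maximum over root-to-leaf paths of the total number of coordinates queried."""
    if isinstance(tree, PLeaf):
        return 0
    return len(tree.queries) + max(query_complexity(c) for c in tree.children.values())

# Hypothetical example: AND of three bits, querying two bits first and the third if needed.
SECOND_ROUND = PNode((2,), {(0,): PLeaf(0), (1,): PLeaf(1)})
AND3 = PNode((0, 1), {
    (0, 0): PLeaf(0), (0, 1): PLeaf(0), (1, 0): PLeaf(0), (1, 1): SECOND_ROUND,
})
```

The example tree uses two rounds but three queries, whereas an ordinary decision tree for the same function would need three rounds for its three queries.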
2.2 Fourier analysis
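Given a set $S\subseteq[m]$, the character $\chi_{S}$ is the function from $\{0,1\}^{m}$ to $\mathbb{R}$ that is defined by
$$\chi_{S}(z)=(-1)^{\bigoplus_{i\in S}z_{i}}.$$
Here, if $S=\emptyset$ then we define $\bigoplus_{i\in S}z_{i}=0$. Given a function $f:\{0,1\}^{m}\to\mathbb{R}$, its Fourier coefficient $\hat{f}(S)$ is defined as
$$\hat{f}(S)=\frac{1}{2^{m}}\sum_{z\in\{0,1\}^{m}}f(z)\cdot\chi_{S}(z).$$
It is a standard fact of Fourier analysis that $f$ can be written as
$$f(z)=\sum_{S\subseteq[m]}\hat{f}(S)\cdot\chi_{S}(z).\qquad(1)$$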
We have the following useful observation.
Fact 2.1.
Let be a random variable taking values in , and let be its density function. Then, for every set it holds that
[TABLE]
In particular, .
- Proof.
Let . It holds that
[TABLE]
as required. The "in particular" part follows by noting that in the case of $S=\emptyset$, the character $\chi_{S}$ is the constant function $1$, and recalling that the sum of $\mu(z)$ over all $z$'s is $1$. ∎
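The relation between Fourier coefficients of a density and the biases of parities can be checked numerically. The sketch below is only an illustration under an assumed normalization, namely $\hat{f}(S)=2^{-m}\sum_{z}f(z)\chi_{S}(z)$; the names `chi`, `fourier_coefficient`, `all_subsets`, and the toy density `MU` are hypothetical.

```python
from itertools import product


def chi(S, z):
    """Character chi_S(z) = (-1)^(XOR of z_i over i in S); equals 1 for empty S."""
    return -1 if sum(z[i] for i in S) % 2 else 1


def fourier_coefficient(f, S, m):
    """Assumed normalization: f_hat(S) = 2^-m * sum_z f(z) * chi_S(z)."""
    return sum(f[z] * chi(S, z) for z in product((0, 1), repeat=m)) / 2 ** m


def all_subsets(m):
    """Enumerate all subsets of {0, ..., m-1} as sorted tuples."""
    for mask in range(2 ** m):
        yield tuple(i for i in range(m) if mask >> i & 1)


def invert(f, z, m):
    """Reconstruct f(z) from its Fourier coefficients."""
    return sum(fourier_coefficient(f, S, m) * chi(S, z) for S in all_subsets(m))


# Hypothetical density on {0,1}^2 (probabilities summing to 1).
M = 2
MU = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
```

Under this normalization, $2^{m}\cdot\hat{\mu}(S)$ equals the expectation of $\chi_{S}$ over a sample from `MU`, that is, the signed difference between the probabilities that the parity over $S$ is 0 and 1, and the coefficient of the empty set is $2^{-m}$.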
2.3 Probability
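Given two distributions $\mu_{1},\mu_{2}$ over a finite sample space $\Omega$, the statistical distance (or total variation distance) between $\mu_{1}$ and $\mu_{2}$ is
$$\left|\mu_{1}-\mu_{2}\right|=\max_{\mathcal{E}\subseteq\Omega}\left|\mu_{1}(\mathcal{E})-\mu_{2}(\mathcal{E})\right|.$$
It is not hard to see that the maximum is attained when $\mathcal{E}$ consists of all the values $\omega\in\Omega$ such that $\mu_{1}(\omega)>\mu_{2}(\omega)$. We say that $\mu_{1}$ and $\mu_{2}$ are $\varepsilon$-close if $\left|\mu_{1}-\mu_{2}\right|\leq\varepsilon$. The min-entropy of a random variable $X$, denoted $H_{\infty}(X)$, is the largest number $k\in\mathbb{R}$ such that for every value $x$ it holds that
$$\Pr\left[X=x\right]\leq2^{-k}.$$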
Min-entropy has the following easy-to-prove properties.
Fact 2.2.
Let $X$ be a random variable and let $\mathcal{E}$ be an event. Then, $H_{\infty}(X\mid\mathcal{E})\geq H_{\infty}(X)-\log\frac{1}{\Pr\left[\mathcal{E}\right]}$.
Fact 2.3.
Let be random variables taking values from sets respectively. Then, .
We say that a distribution is -flat if it is uniformly distributed over a subset of the sample space of size at least . The following standard fact is useful.
Fact 2.4.
If a random variable has min-entropy , then its distribution is a convex combination of -flat distributions.
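The quantities of this subsection are easy to compute for small explicit distributions. The following Python sketch is an illustration only: distributions are represented as dictionaries from values to probabilities, and the names `statistical_distance`, `min_entropy`, `condition`, and the toy distribution `P` are hypothetical. It uses the standard identity that the statistical distance equals half of the $\ell_{1}$ distance.

```python
from math import log2


def statistical_distance(p, q):
    """Total variation distance, computed as half of the L1 distance."""
    support = set(p) | set(q)
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in support) / 2


def min_entropy(p):
    """Largest k such that every value has probability at most 2^-k."""
    return -log2(max(p.values()))


def condition(p, event):
    """Return (distribution of X given X in event, Pr[X in event])."""
    mass = sum(pr for w, pr in p.items() if w in event)
    return {w: pr / mass for w, pr in p.items() if w in event}, mass


# Hypothetical toy distribution.
P = {"a": 0.5, "b": 0.25, "c": 0.25}
```

For instance, conditioning `P` on the event `{"b", "c"}` (which has probability 1/2) can lower the min-entropy by at most $\log\frac{1}{1/2}=1$ bit, in line with Fact 2.2.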
2.3.1 Vazirani's Lemma
Vazirani's lemma is a useful result which says that a random string is close to being uniformly distributed if the XOR of every set of bits in the string has a small bias. We use the following variant of the lemma due to [GLM+16].
Lemma 2.5 ([GLM+16]).
Let , and let be a random variable taking values in . If for every non-empty set it holds that
[TABLE]
then for every it holds that
[TABLE]
- Proof.
Let be the density function of , and let . By Equation 1 it holds that
[TABLE]
as required. ∎
Lemma 2.5 says that if the bias of is small for every , then is close to being uniformly distributed. It turns out that if the latter assumption holds only for large sets , we can still deduce something useful, namely, that the min-entropy of is high.
Lemma 2.6.
Let be such that , and let be a random variable taking values in . If for every set such that it holds that
[TABLE]
then, .
- Proof.
Observe that if then the bound holds vacuously, so we may assume that . Let be the density function of , and let . By Equality 1 it holds that
[TABLE]
We now bound each of the two terms separately. The term for sets whose size is at least can be upper bounded by using exactly the same calculation as in the proof of Lemma 2.5. In order to upper bound the term for sets whose size is less than , observe that for every and and therefore
[TABLE]
It follows that
[TABLE]
Thus, as required. Note that this bound is a bit stronger than claimed in the lemma: indeed, we only need the "" term in the lemma in order to deal with the case where . ∎
2.3.2 Coupling
Let be two distributions over sample spaces . A coupling of and is a distribution over the sample space whose marginal over the first coordinate is and whose marginal over the second coordinate is . In the case where , the following standard fact allows us to use couplings to study the statistical distance between and .
Fact 2.7.
Let be two distributions over a sample space . The statistical distance between and is equal to the minimum, over all couplings of and , of
[TABLE]
In particular, we can upper bound the statistical distance between and by constructing a coupling in which the probability that is small.
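The coupling characterization can be illustrated by the standard "maximal coupling" construction, which places as much mass as possible on the diagonal. The Python sketch below is only an illustration for finite distributions given as dictionaries; the function names `maximal_coupling` and `mismatch_probability` are hypothetical.

```python
def maximal_coupling(p, q):
    """Return a joint distribution {(x, y): prob} whose marginals are p and q
    and that puts as much mass as possible on the diagonal x == y."""
    support = sorted(set(p) | set(q))
    joint = {}
    overlap = 0.0
    for w in support:
        common = min(p.get(w, 0.0), q.get(w, 0.0))
        if common > 0:
            joint[(w, w)] = common
        overlap += common
    distance = 1.0 - overlap  # this is the statistical distance of p and q
    if distance > 1e-12:
        # Spread the leftover mass of p and of q independently of each other.
        for x in support:
            rest_p = p.get(x, 0.0) - min(p.get(x, 0.0), q.get(x, 0.0))
            for y in support:
                rest_q = q.get(y, 0.0) - min(p.get(y, 0.0), q.get(y, 0.0))
                if rest_p > 0 and rest_q > 0:
                    joint[(x, y)] = joint.get((x, y), 0.0) + rest_p * rest_q / distance
    return joint


def mismatch_probability(joint):
    """Probability that the two coordinates of the coupling differ."""
    return sum(pr for (x, y), pr in joint.items() if x != y)
```

In this construction the probability that the two coordinates differ is exactly one minus the total overlap $\sum_{w}\min(p(w),q(w))$, which coincides with the statistical distance.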
2.4 Prefix-free codes
A set of strings is called a prefix-free code if no string in is a prefix of another string in . Given a string , we denote its length by . We use the following simple corollary of Kraft's inequality.
Fact 2.8.
Let be a finite prefix-free code, and let be a random string taking values from . Then, there exists a string such that .
For completeness, we provide the following simple proof of Fact 2.8 that does not rely on Kraft's inequality.
- Proof.
Let be the maximal length of a string in , and let be a random string in that is sampled according to the following process: sample a string from , choose a uniformly distributed string , and set (where here denotes string concatenation).
By a simple averaging argument, there exists a string such that . Since is a prefix-free code, there exists a unique prefix of that is in . The definition of implies that
[TABLE]
because the only way the string could be sampled is by first sampling and then sampling to be the rest of (again, since is a prefix-free code). Hence, it follows that
[TABLE]
as required. ∎
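The fact can be checked on small examples. The Python sketch below assumes that the conclusion of Fact 2.8 is the existence of a codeword $w$ whose probability is at least $2^{-|w|}$ (this matches how the fact is applied to messages later in the paper, but it is stated here as an assumption). The names `is_prefix_free`, `kraft_sum`, `heavy_codeword`, and the toy distribution `DIST` are hypothetical.

```python
def is_prefix_free(code):
    """True if no string in `code` is a proper prefix of another string in it."""
    words = sorted(set(code))
    # In lexicographic order, every extension of a word follows it directly.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def kraft_sum(code):
    """Sum of 2^-|w| over the code; Kraft's inequality bounds it by 1."""
    return sum(2.0 ** -len(w) for w in set(code))


def heavy_codeword(dist):
    """Return some w with Pr[W = w] >= 2^-|w|, or None if there is none."""
    for w, pr in dist.items():
        if pr >= 2.0 ** -len(w):
            return w
    return None


# Hypothetical distribution over the prefix-free code {0, 10, 110, 111}.
DIST = {"0": 0.3, "10": 0.3, "110": 0.2, "111": 0.2}
```

Since the probabilities sum to 1 while the quantities $2^{-|w|}$ sum to at most 1, some codeword must have probability at least $2^{-|w|}$, so `heavy_codeword` does not return `None` on a distribution over a prefix-free code.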
2.5 Discrepancy
We start by recalling the definition of discrepancy.
Definition 1.1.
Let be a finite set, let be a function, and let be independent random variables that are uniformly distributed over . Given a combinatorial rectangle , the discrepancy of with respect to , denoted , is defined as follows:
[TABLE]
The discrepancy of , denoted , is defined as the maximum of over all combinatorial rectangles .
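For very small gadgets, the discrepancy can be computed by brute force. The Python sketch below assumes the standard definition, in which the discrepancy with respect to a rectangle $A\times B$ is $\left|\Pr[g=0\text{ and }(U,V)\in A\times B]-\Pr[g=1\text{ and }(U,V)\in A\times B]\right|$ for independent uniform $U,V$. The names `discrepancy`, `inner_product`, and `DOMAIN` are hypothetical, and the running time is exponential in the size of the domain.

```python
from itertools import product


def discrepancy(g, domain):
    """Maximum over all rectangles A x B of the discrepancy of g (brute force)."""
    n = len(domain)
    # signed[i][j] is +1 where g = 0 and -1 where g = 1.
    signed = [[1 - 2 * g(u, v) for v in domain] for u in domain]
    best = 0
    for a_mask in range(1, 2 ** n):
        # Column sums of the signed matrix restricted to the rows in A.
        cols = [
            sum(signed[i][j] for i in range(n) if a_mask >> i & 1)
            for j in range(n)
        ]
        # For a fixed A, the best B is all positive or all negative columns.
        pos = sum(c for c in cols if c > 0)
        neg = -sum(c for c in cols if c < 0)
        best = max(best, pos, neg)
    return best / n ** 2


def inner_product(u, v):
    """Inner product modulo 2, a standard example of a low-discrepancy gadget."""
    return sum(a * b for a, b in zip(u, v)) % 2


# Hypothetical toy domain: all strings of two bits.
DOMAIN = list(product((0, 1), repeat=2))
```

A call such as `discrepancy(inner_product, DOMAIN)` then returns the exact discrepancy of the two-bit inner product gadget; a constant function has discrepancy 1.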
Let be a function with discrepancy at most . Such functions satisfy the following "extractor-like" property. In what follows, the parameter controls .
Lemma 2.9.
Let be independent random variables taking values in such that . Then,
[TABLE]
- Proof.
By Fact 2.4, it suffices to consider the case where and have flat distributions. Let be the sets over which are uniformly distributed, and denote . By the assumption on the min-entropies of and , it holds that .
Let be random variables that are uniformly distributed over . Then, and are distributed like $U\mid U\in A$ and $V\mid V\in B$ respectively. It follows that
[TABLE]
as required. ∎
Using Lemma 2.9, we can obtain the following sampling property, which says that with high probability takes a value for which is small. In what follows, the parameter controls , the parameter controls the error probability, and recall that is the parameter that controls the discrepancy of (i.e., ).
Lemma 2.10.
Let . Let be independent random variables taking values in such that
[TABLE]
Then, the probability that takes a value such that
[TABLE]
is less than .
- Proof.
For every , denote
[TABLE]
Using this notation, our goal is to prove that
[TABLE]
We will prove that the probability that is less than , and a similar proof can be used to show that the probability that is less than . The required result will then follow by the union bound.
Let be the set of values such that . Suppose for the sake of contradiction that . It clearly holds that
[TABLE]
On the other hand, it holds that
[TABLE]
This implies that
[TABLE]
By Lemma 2.9, it follows that
[TABLE]
which contradicts Inequality 3. We reached a contradiction, and therefore the probability that is less than , as required. ∎
Recall that is the function from to that outputs the parity of the string . We would like to prove results like Lemmas 2.9 and 2.10 for functions of the form . To this end, we use the following XOR lemma for discrepancy due to [LSS08].
Theorem 2.11 ([LSS08]).
Let . Then,
[TABLE]
By combining Theorem 2.11 with Lemmas 2.9 and 2.10, we obtain the following results.
Corollary 2.12.
Let , and . Let be independent random variables taking values in such that
[TABLE]
Then,
[TABLE]
Corollary 2.13.
Let , and . Let be independent random variables taking values in such that
[TABLE]
Then, the probability that takes a value such that
[TABLE]
is less than .
3 Lifting Machinery
In this section, we set up the machinery we need to prove our main theorem, restated next.
Theorem 1.2.
For every there exists such that the following holds: Let be a search problem that takes inputs from , and let be an arbitrary function such that and such that . Then
[TABLE]
and for every it holds that
[TABLE]
where .
For the rest of this paper, we fix and let be some sufficiently large parameter that will be determined later such that . Let , and let be a function such that and such that . Note that when , the theorem holds trivially, so we may assume that . For convenience, we denote and . Throughout the rest of this section, and will always denote random variables whose domain is .
As explained in the overview of our techniques, our simulation argument is based on the idea that as long as the protocol did not transmit too much information about the inputs, their distribution is similar to the uniform distribution. The following definition, due to [GLM+16], formalizes the notion that the protocol did not transmit too much information about an input .
Definition 3.1.
Let . We say that a random variable is -dense if for every it holds that .
As explained there, whenever the protocol transmits too much information about a bunch of blocks (where ), the simulation queries and conditions the distribution on . The following definitions provide a useful way for implementing this argument: restrictions are used to keep track of which bits of have been queried so far, and the notion of structured variables expresses the desired properties of the distribution of the inputs.
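Definition 3.2.
A restriction $\rho$ is a string in $\{0,1,*\}^{n}$. We say that a coordinate $i\in[n]$ is free in $\rho$ if $\rho_{i}=*$, and otherwise we say that $i$ is fixed. Given a restriction $\rho$, we denote by $\mathrm{free}(\rho)$ and $\mathrm{fix}(\rho)$ the set of free and fixed coordinates of $\rho$ respectively. We say that a string $z\in\{0,1\}^{n}$ is consistent with $\rho$ if $z_{\mathrm{fix}(\rho)}=\rho_{\mathrm{fix}(\rho)}$.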
Intuitively, represents the queries that have been made so far, and represents the coordinates that have not been queried yet.
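Restrictions are simple to model in code. The following Python sketch is only an illustration, assuming that a restriction is a string over the symbols 0, 1 and a star, with the star marking free coordinates; the names `STAR`, `free`, `fixed`, `is_consistent`, and `query` are hypothetical.

```python
STAR = "*"


def free(rho):
    """Coordinates that have not been queried yet."""
    return {i for i, r in enumerate(rho) if r == STAR}


def fixed(rho):
    """Coordinates whose value has already been queried."""
    return {i for i, r in enumerate(rho) if r != STAR}


def is_consistent(z, rho):
    """True if z agrees with rho on every fixed coordinate."""
    return all(z[i] == rho[i] for i in fixed(rho))


def query(rho, z, coords):
    """Return the restriction obtained by fixing `coords` to their values in z."""
    return "".join(z[i] if i in coords else r for i, r in enumerate(rho))
```

For example, starting from the all-star restriction and querying coordinates 0 and 2 of the input `"101"` yields the restriction `"1*1"`, with which `"111"` is consistent and `"011"` is not.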
Definition 3.3 (following [GPW17]).
Let be a restriction, let , and let be independent random variables. We say that and are -structured if there exist such that and are -dense and -dense respectively, , and
[TABLE]
We can now state our version of the uniform marginals lemma of [GPW17], which formalizes the idea that if and are structured then their distribution is similar to the uniform distribution over . In what follows, the parameter controls the statistical distance from the uniform distribution, and recall that is the parameter that controls the discrepancy of (i.e., ).
Lemma 3.4 (Uniform marginals lemma).
There exists a universal constant such that the following holds: Let , let be a restriction, and let be a string that is consistent with . Let be independent random variables that are uniformly distributed over sets respectively, and assume that they are -structured where
[TABLE]
Let be uniformly distributed over . Then, and are -close to and respectively.
Remark 3.5.
Here, as well as in other claims in the paper, we denote by some constant that is large enough to make the proofs go through, and does not depend on any other parameter. The constant can be calculated explicitly, and we only refrain from doing so in order to streamline the presentation. In all the cases where we use this constant, it can be chosen to be at most .
We defer the proof of Lemma 3.4 to Section 3.1, and move to discuss the next issue. Recall that in order for and to be structured, the random variables and have to be dense. However, as the simulation progresses and the protocol transmits information, this property may be violated, and or may cease to be dense. In order to restore the density, we use the following folklore fact.
Proposition 3.6.
Let be a random variable, let , and let be a maximal subset of coordinates such that . Let be a value such that
[TABLE]
Then, the random variable $X_{[n]-I}\mid X_{I}=x_{I}$ is $\delta_{X}$-dense.
- Proof.
Assume for the sake of contradiction that $X_{[n]-I}\mid X_{I}=x_{I}$ is not $\delta_{X}$-dense. Then, there exists a non-empty set $J\subseteq[n]-I$ such that $H_{\infty}(X_{J}\mid X_{I}=x_{I})<\delta_{X}\cdot b\cdot\left|J\right|$. In particular, there exists a value such that
[TABLE]
But this implies that
[TABLE]
which means that
[TABLE]
However, this contradicts the maximality of . ∎
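The density-restoring step can be tried out by brute force on tiny distributions. The Python sketch below is an illustration under stated assumptions: a variable over $n$ blocks is taken to be $\delta$-dense if $H_{\infty}(X_{I})\geq\delta\cdot b\cdot|I|$ for every non-empty set $I$ of blocks, where $b$ is the number of bits per block, and the value $x_{I}$ is taken to be the heaviest value of $X_{I}$. All function names and the toy distribution `SKEWED` are hypothetical.

```python
from itertools import combinations
from math import log2


def marginal(dist, coords):
    """Distribution of the blocks of X indexed by `coords`."""
    out = {}
    for x, pr in dist.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + pr
    return out


def min_entropy(dist):
    return -log2(max(dist.values()))


def violating_sets(dist, n, b, delta):
    """All non-empty I with H_inf(X_I) < delta * b * |I| (assumed threshold)."""
    return [
        coords
        for size in range(1, n + 1)
        for coords in combinations(range(n), size)
        if min_entropy(marginal(dist, coords)) < delta * b * size - 1e-9
    ]


def restore_density(dist, n, b, delta):
    """Fix a largest violating set I to its heaviest value x_I and return
    (I, x_I, distribution of the remaining blocks given X_I = x_I).
    The remaining blocks are re-indexed from 0 in the returned distribution."""
    bad = violating_sets(dist, n, b, delta)
    if not bad:
        return (), (), dict(dist)
    coords = max(bad, key=len)  # a largest set is maximal under inclusion
    marg = marginal(dist, coords)
    x_I = max(marg, key=marg.get)
    rest = [i for i in range(n) if i not in coords]
    conditioned = {}
    for x, pr in dist.items():
        if tuple(x[i] for i in coords) == x_I:
            key = tuple(x[i] for i in rest)
            conditioned[key] = conditioned.get(key, 0.0) + pr / marg[x_I]
    return coords, x_I, conditioned


# Hypothetical example with blocks over {0,1,2,3} (so b = 2) and n = 2:
# the first block is stuck at 0 and the second block is uniform.
SKEWED = {(0, y): 0.25 for y in range(4)}
```

On `SKEWED` with $\delta=0.5$, the only violating set is the first block; after fixing it, the remaining block is uniform and therefore dense again.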
Proposition 3.6 is useful in the deterministic setting, since in this setting the simulation is free to condition the distributions of in any way that maintains their density. However, in the randomized setting, the simulation is more restricted, and cannot condition the inputs on events such as which may have very low probability. In [GPW17], this issue was resolved by observing that the probability space can be partitioned to disjoint events of the form , and that the randomized simulation can use such a partition to achieve the same effect of Proposition 3.6. This leads to the following lemma, which we use as well.
Lemma 3.7 (Density-restoring partition [GPW17]).
Let be a random variable, let denote the support of , and let . Then, there exists a partition
[TABLE]
where every is associated with a set and a value such that:
- $X_{I_{j}}\mid X\in\mathcal{X}^{j}$ is fixed to .
- $X_{[n]-I_{j}}\mid X\in\mathcal{X}^{j}$ is -dense.
Moreover, if we denote , then it holds that
[TABLE]
We turn to discuss the "conditioning issue" that was discussed in the overview of our techniques and its resolution: As mentioned above, the simulation uses Proposition 3.6 and Lemma 3.7 to restore the density of the inputs by conditioning some of the blocks. Specifically, suppose, for example, that is no longer dense. Then, the simulation chooses appropriate and , and conditions on the event . At this point, in order to make and structured again, we need to remove from , so the simulation queries the bits in , and updates the restriction by setting . Now, we have to make sure that . To this end, the simulation conditions on the event . However, the latter conditioning reveals information about , which may have two harmful effects:
- Leaking: As discussed in the overview of our techniques, our analysis of the query complexity assumes that the protocol transmits at most bits of information. It is important not to reveal more information than that, or otherwise our query complexity may increase arbitrarily. On average, we expect that conditioning on the event would reveal only bits of information, which is sufficiently small for our purposes. However, there could be values of and for which much more information is leaked. In this case, we say the conditioning is leaking.
- Sparsifying: Even if the conditioning reveals only bits of information on , this could still ruin the density of if the set is large. In this case, we say that the conditioning is sparsifying.
This is the "conditioning issue", and dealing with it is the technical core of the paper. As explained in the overview of our techniques, the simulation deals with this issue by recognizing in advance which values of are "dangerous", in the sense that they may lead to a bad conditioning, and discarding them before such conditioning may take place. The foregoing discussion leads to the following definition of a dangerous value.
Definition 3.8.
Let be a random variable taking values from . We say that a value is leaking if there exists a set and an assignment such that
[TABLE]
Let , and suppose that is -dense. We say that a value is -sparsifying if there exists a set and an assignment such that the random variable
[TABLE]
is not -dense. We say that a value is -dangerous if it is either leaking or -sparsifying.
We can now state our main technical lemma, which says that has only a small probability to take a dangerous value. This allows the simulation to discard such values and resolve the conditioning issue. In what follows, the parameter controls the error probability, and recall that is the parameter that controls the discrepancy of (i.e., ).
Lemma 3.9 (Main lemma).
There exists a universal constant (see Remark 3.5 for further explanation on this constant) such that the following holds: Let be such that and , and let be -structured random variables. Then, the probability that takes a value that is -dangerous for is at most .
3.1 Proof of the uniform marginals lemma
Recall that the random variables and are -structured if there exist such that and are -dense and -dense respectively, , and . In this section we prove the uniform marginals lemma, restated next.
Lemma 3.4.
There exists a universal constant such that the following holds: Let , let be a restriction, and let be a string that is consistent with . Let be independent random variables that are uniformly distributed over sets respectively, and assume that they are -structured where
[TABLE]
Let be uniformly distributed over . Then, and are -close to and respectively.
In order to prove Lemma 3.4, we first prove the following proposition, which says that the string is close to the uniform distribution in a very strong sense. In what follows, the parameter controls the distance from the uniform distribution, and recall that is the parameter that controls the discrepancy of (i.e., ).
Proposition 3.10 (Generalization of [GLM+16, Lemma 13]).
There exists a universal constant such that the following holds: Let . Let be random variables that are -structured for , and let . Then, for every it holds that
[TABLE]
- Proof.
Let . We use Corollary 2.12 to upper bound the biases of , and then apply Vazirani's lemma to show that it is close to the uniform distribution. Let . By assumption, the variables are -dense and -dense for some for which . Therefore, it holds that
[TABLE]
and Corollary 2.12 implies (with ) that
[TABLE]
Since the latter inequality holds for every , it follows by Lemma 2.5 that
[TABLE]
for every , as required. ∎
We turn to prove the uniform marginals lemma.
- Proof of Lemma 3.4.
Let be the universal constant of Proposition 3.10 and let . Let be uniformly distributed over , and let . We prove that is -close to , and a similar argument works for . Let be any test event. We show that
[TABLE]
Without loss of generality we may assume that , since otherwise we can replace with its complement. Since and are -structured where
[TABLE]
Proposition 3.10 implies that
[TABLE]
Moreover, since , conditioning on cannot decrease the density of by more than (since this conditioning increases any probability by a factor of at most ). Therefore $X\mid\mathcal{E}$ and together are -structured, where
[TABLE]
Hence, Proposition 3.10 implies that
[TABLE]
Now, it holds that
[TABLE]
A similar calculation shows that
[TABLE]
It follows that
[TABLE]
as required. ∎
3.2 Proof of the main technical lemma
In this section we prove our main technical lemma, which upper bounds the probability of a variable to take a dangerous value. We first recall the definition of a dangerous value and the lemma.
Definition 3.8.
Let be a random variable taking values from . We say that a value is leaking if there exists a set and an assignment such that
[TABLE]
Let , and suppose that is -dense. We say that a value is -sparsifying if there exists a set and an assignment such that the random variable
[TABLE]
is not -dense. We say that a value is -dangerous if it is either leaking or -sparsifying.
Lemma 3.9.
There exists a universal constant such that the following holds: Let be such that and , and let be -structured random variables. Then, the probability that takes a value that is -dangerous for is at most .
Let be a universal constant that will be chosen to be sufficiently large to make the inequalities in the proof hold. Let be as in the lemma, and assume that are -structured. For simplicity of the presentation, we assume that all the coordinates of are free; this can be assumed without loss of generality since the fixed coordinates of do not play any part in the lemma. Thus, our goal is to prove an upper bound on the probability that takes a value that is dangerous for . By assumption, there exist some parameters such that and are -dense and -dense respectively, and such that .
We start by discussing the high-level ideas that underlie the proof. We would like to prove an upper bound on the probability that takes a value that is either leaking or sparsifying. Proving the upper bound for leaking values is relatively easy and is similar to the proof of Proposition 3.10: basically, since and are sufficiently dense, the string is multiplicatively close to uniform, which implies that most values are non-leaking.
The more difficult task is to prove the upper bound for sparsifying values. Basically, a value is sparsifying if for some disjoint , conditioning on the value of decreases the min-entropy of by more than bits. Our first step is to apply Bayes' formula to the latter condition, thus obtaining a more convenient condition to which we refer as "skewing": a value is skewing if conditioning on the value of decreases the min-entropy of by more than bits. In other words, the min-entropy of conditioned on should be less than (roughly).
It remains to prove an upper bound on the probability that takes a skewing value. This requires proving a lower bound of roughly on the min-entropy of $g^{I}(x_{I},Y_{I})\mid Y_{J}$ for most $x$'s. By the min-entropy version of Vazirani's lemma (Lemma 2.6), in order to prove this lower bound, it suffices to prove an upper bound on the bias of $g^{S}(x_{S},Y_{S})\mid Y_{J}$ for every set for which (recall that is a large constant such that ).
To this end, we use the "extractor-like" property of : recall that by the discrepancy of (Corollary 2.13), the bias of $g^{S}(x_{S},Y_{S})\mid Y_{J}$ is small for most $x$'s whenever the min-entropy of and $Y_{S}\mid Y_{J}$ is high. Furthermore, recall that the min-entropy of and is high since we assumed that and are dense. The key step is to observe that the min-entropy of $Y_{S}\mid Y_{J}$ is still high, since is large compared to . Thus, the min-entropy of and $Y_{S}\mid Y_{J}$ is high, so the bias of $g^{S}(x_{S},Y_{S})\mid Y_{J}$ is small, and this implies the desired lower bound on the min-entropy of $g^{I}(x_{I},Y_{I})\mid Y_{J}$.
The argument we explained above almost works, except for a small issue: We said that $H_{\infty}(Y_{S}\mid Y_{J})$ is still high, since is large compared to . Here, we implicitly assumed that conditioning on the value of decreases the min-entropy of by roughly bits. This assumption is true for the average value of , but may fail for values of that have a very small probability. In order to deal with such values, we define a parameter which measures the "excess entropy" of , and keep track of it throughout the proof. The key observation is that if we consider a value that has a small probability, then the criterion of "skewing" actually requires the min-entropy of to decrease by roughly . Intuitively, this means that the smaller the probability of , the harder it becomes for to be skewing. After propagating the additional term of throughout our proof, we get that the set can be assumed to satisfy
[TABLE]
This makes the set sufficiently large compared to that we can still deduce that $Y_{S}\mid Y_{J}=y_{J}$ has high min-entropy, which finishes the argument. We now turn to provide the formal proof, starting with a formal definition of the parameter and the criterion of "skewing".
Definition 3.11.
Recall that since is -dense, it holds that for every and . We denote by the (non-negative) number that satisfies
[TABLE]
We say that a value is -skewing if there exist disjoint non-empty sets and a value such that
[TABLE]
Next, we show that every dangerous value must be either leaking or skewing by applying Bayes' formula.
Claim 3.12.
Let be an -dangerous value that is not leaking for . Then is -skewing.
- Proof.
Suppose that is -dangerous for and that it is not leaking. We prove that is -skewing. By our assumption, must be -sparsifying, so there exists a set and an assignment such that the random variable
[TABLE]
is not -dense. Thus, there exists a set and a value such that
[TABLE]
By Bayes' formula, it holds that
[TABLE]
Hence, it follows that
[TABLE]
which implies that
[TABLE]
This means that
[TABLE]
That is, is -skewing, as required. ∎
As explained above, we will upper bound the probability of dangerous values by upper bounding the biases of $g^{S}(x_{S},Y_{S})\mid Y_{J}$ for every . To this end, it is convenient to define the notion of a "biasing value", which is a value for which one of the biases is too large.
Definition 3.13.
We say that a value is biasing (for ) with respect to disjoint sets and an assignment if
[TABLE]
We say that is -biasing (for ) with respect to a set if there exists a set and an assignment that satisfy
[TABLE]
such that is biasing with respect to , , and (if is the empty set, we define ). Finally, we say that is -biasing (for ) if there exists a non-empty set with respect to which is -biasing.
We now apply the min-entropy version of Vazirani's lemma to show that values that are not biasing are not dangerous.
Claim 3.14.
If a value is not -biasing for then it is not -dangerous for .
- Proof.
Suppose that is a value that is not -biasing for . We prove that is not -dangerous for . We start by proving that is not leaking. Let and let . We wish to prove that
[TABLE]
Observe that, by the assumption that is not -biasing, it holds for every non-empty set that
[TABLE]
(this follows by substituting in the definition of -biasing and noting that in this case ). It now follows from Lemma 2.6 that , as required.
We turn to prove that is not -skewing. Let be disjoint sets and let be an assignment. We wish to prove that
[TABLE]
By Lemma 2.6, it suffices to prove that for every set such that it holds that
[TABLE]
To this end, observe that every such set satisfies
[TABLE]
and since by assumption is not -biasing with respect to , the required upper bound on the bias must hold. It follows that is neither leaking nor -skewing, and therefore it is not -dangerous, as required. ∎
Finally, we prove an upper bound on the probability of to take an -biasing value, which together with Claim 3.14 implies Lemma 3.9. As explained above, the idea is to combine the discrepancy of with the observation that and have large min-entropy even conditioned on (which holds since are dense and is large compared to and ).
Proposition 3.15.
The probability that takes a value that is -biasing for is at most .
- Proof.
We begin with upper bounding the probability of to take a value that is -biasing with respect to specific choices of , , and , and the rest of the proof will follow by applying union bounds over all possible choices of , , and . Let be disjoint sets and let be an assignment such that , , and together satisfy Equation 4, i.e.,
[TABLE]
For simplicity, we assume that is non-empty (in the case where is empty, the argument is similar but simpler). Since we assumed that and that is non-empty, and it holds that and therefore
[TABLE]
In other words, it holds that
[TABLE]
By assumption, is -dense, so . By Fact 2.2, it follows that
[TABLE]
Moreover, is -dense and thus
[TABLE]
where the second inequality is made to hold for by choosing to be sufficiently large. It follows by Corollary 2.13 (with and ) that the probability that takes a value for which
[TABLE]
is at most .
We turn to applying the union bounds. First, we show that for every , the probability that takes a value that is -biasing with respect to is at most by taking a union bound over all choices of and . Note that we only need to consider sets for which . It follows that the probability that takes a value that satisfies Equation 6 for some and is at most
[TABLE]
where the last inequality follows since
[TABLE]
The above calculation showed that the probability that takes a value that is -biasing with respect to a fixed set is at most . Taking a union bound over all non-empty sets , the probability that takes a value that is -biasing for is at most
[TABLE]
We have thus shown that the probability that takes a value that is -biasing is at most , as required. ∎
4 The deterministic lifting theorem
In this section, we prove the deterministic part of our main theorem. In fact, we prove the following more general result.
Theorem 4.1 (Deterministic lifting theorem).
For every there exists such that the following holds: Let be such that , let be such that , let be a function such that , and let . Let be a deterministic protocol that takes inputs in and that has communication complexity and round complexity . Then, there exists a deterministic parallel decision tree that on input outputs a transcript of that is consistent with some pair of inputs , and that has query complexity and depth .
Observe that this theorem implies the lower bound of the main theorem: Given a protocol that solves with complexity , we use the theorem to construct a tree that on input outputs the output of on some pair of inputs in . This tree clearly solves , and the query complexity of is . This implies that , or in other words, , as required.
For the rest of this section, fix to be an arbitrary deterministic protocol that takes inputs in , and denote by and its communication complexity and round complexity respectively. The rest of this section is organized as follows: We first describe the construction of the parallel decision tree in Section 4.1. We then prove that the output of is always correct in Section 4.2. Finally, we upper bound the query complexity of in Section 4.3.
4.1 The construction of the tree
Let be the maximum among the universal constants of Proposition 3.10 and the main technical lemma (Lemma 3.9), and let be a universal constant that will be chosen to be sufficiently large to make the inequalities in the proof hold. Let , let , and let . The tree constructs the transcript by simulating the protocol round-by-round, each time adding a single message to . Throughout the simulation, the tree maintains a rectangle of inputs that are consistent with (but not necessarily of all such inputs). In what follows, we denote by and random variables that are uniformly distributed over and respectively. The tree will maintain the invariant that and are -structured, where is a restriction that keeps track of the queries the tree has made so far. In fact, the tree will maintain a more specific invariant: whenever it is Alice's turn to speak, is -dense and is -dense, and whenever it is Bob's turn to speak, the roles of and are reversed.
When the tree starts the simulation, the tree sets the transcript to be the empty string, the restriction to , and the sets to . At this point the invariant clearly holds. We now explain how simulates a single round of the protocol while maintaining the invariant. Suppose that the invariant holds at the beginning of the current round, and assume without loss of generality that it is Alice's turn to speak. The tree performs the following steps:
1. The tree conditions on not taking a value that is -dangerous for (i.e., the tree removes from all the values for which is -dangerous for ).
2. The tree chooses an arbitrary message of Alice with the following property: the probability of Alice sending on input is at least (the existence of will be justified soon). The tree adds to the transcript , and conditions on the event of sending (i.e., the tree sets to be the subset of inputs that are consistent with ).
3. Let be a maximal set that violates the -density of (i.e., ), and let be a value that satisfies . The tree conditions on (i.e., the tree removes from all the values that are inconsistent with that event). By Proposition 3.6, is now -dense.
4. The tree queries , and updates accordingly.
5. The tree conditions on (i.e., the tree sets to be the subset of values for which ). Due to Step 1, the variable must take a value that is not -dangerous, and therefore is necessarily -dense.
After those steps take place, it becomes Bob's turn to speak, and indeed, and are -dense and -dense respectively. Thus, the invariant is maintained. When the protocol stops, the tree outputs the transcript and halts. In order for the foregoing construction to be well-defined, it remains to explain three points:
- First, we should explain why the set remains non-empty after Step 1 (otherwise, the following steps are not well-defined). To this end, recall that and are -structured and observe that can be made larger than by choosing to be sufficiently large (see Section 4.3 for a detailed calculation). Hence, by our main lemma (3.9), the variable has a non-zero probability to take a value that is not -dangerous for , so is non-empty after this step.
- Second, we should explain why the message in Step 2 exists. To see why, observe that the set of possible messages of Alice forms a prefix-free code; otherwise, Bob would not be able to tell when Alice has finished speaking and his turn starts. Hence, by 2.8, it follows that there exists a message with probability at least .
- Third, we should explain why the set remains non-empty after Step 5. To this end, recall that must take a value that is not -dangerous for , and in particular, the value of is necessarily not leaking. This means in particular that the string has non-zero probability to be equal to , so is non-empty after this step.
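The existence argument for the message in Step 2 can be checked on a toy example. The following is a minimal sketch, assuming that 2.8 is the standard consequence of Kraft's inequality: if a random message takes values in a prefix-free set of binary strings, then some message m has probability at least 2^{-|m|}. The function names and the example distribution are hypothetical and do not appear in the paper.

```python
from fractions import Fraction


def is_prefix_free(messages):
    """Return True if no message is a proper prefix of another message."""
    return not any(
        a != b and b.startswith(a) for a in messages for b in messages
    )


def kraft_sum(messages):
    """Sum of 2^{-|m|} over all messages; at most 1 for a prefix-free set."""
    return sum(Fraction(1, 2 ** len(m)) for m in messages)


def heavy_message(dist):
    """Return a message m with Pr[m] >= 2^{-|m|}.

    dist maps each message (a bit string) to its exact probability.
    """
    assert is_prefix_free(dist) and sum(dist.values()) == 1
    for m, p in dist.items():
        if p >= Fraction(1, 2 ** len(m)):
            return m
    # Unreachable: otherwise 1 = sum p(m) < sum 2^{-|m|} <= 1.
    raise AssertionError("no heavy message in a prefix-free code")


# Hypothetical distribution over Alice's messages in one round.
dist = {
    "0": Fraction(1, 8),
    "10": Fraction(3, 8),
    "110": Fraction(1, 4),
    "111": Fraction(1, 4),
}
print(heavy_message(dist))
```

In this example the message "0" is too light (1/8 < 1/2), but "10" qualifies (3/8 >= 1/4), so the search succeeds even though not every message is heavy.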
The depth of the tree.
We now observe that the depth of is equal to the round complexity of . Note that in each round, the tree issues a set of queries simultaneously. Thus, is a parallel decision tree whose depth equals the maximal number of rounds of , as required.
4.2 The correctness of the tree
We now prove that when the decision tree halts, the transcript is consistent with some inputs . Clearly, the transcript is consistent with all the inputs in the rectangle . Thus, it suffices to show that there exist and such that . To this end, recall that when the tree halts, the random variables and are -structured. Since is consistent with , it holds for every and that
[TABLE]
It remains to deal with the free coordinates of . Since can be made larger than by choosing to be sufficiently large (see 4.3 for a detailed calculation), it follows by 3.10 that
[TABLE]
In particular, there exist and such that
[TABLE]
By combining the equation for the fixed coordinates with the equation for the free coordinates, we get that , as required.
4.3 The query complexity of the tree
We conclude by showing that the total number of queries the tree makes is at most . To this end, we define the deficiency of to be
[TABLE]
We will prove that whenever the protocol transmits a message , the deficiency increases by , and that whenever the tree makes a query, the deficiency is decreased by . Since the deficiency is always non-negative, and the protocol transmits at most  bits, it will follow that the tree must make at most queries. More specifically, we prove that in every round, the first two steps from 4.1 increase the deficiency by at most in total, and the rest of the steps decrease the deficiency by at least , and this will imply the desired result.
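The amortized accounting described above can be sketched as a small calculation. This is only an illustration under stated assumptions: the potential (the deficiency) starts at zero, each message of a given length raises it by at most `gain_per_bit` times that length, and each query lowers it by at least `drop_per_query`. Both parameter names and all numeric values are hypothetical stand-ins for the constants of the proof.

```python
from fractions import Fraction


def query_budget(message_lengths, gain_per_bit, drop_per_query):
    """Largest number of queries compatible with a non-negative potential.

    The total increase is at most gain_per_bit * (total bits sent), and
    every query pays at least drop_per_query, so the number of queries is
    at most the floor of the ratio.
    """
    total_gain = gain_per_bit * sum(message_lengths)
    return total_gain // drop_per_query


def is_feasible(rounds, gain_per_bit, drop_per_query):
    """Check that an upper bound on the potential never goes negative.

    rounds is a list of (message_length, number_of_queries) pairs.
    """
    potential = Fraction(0)
    for length, queries in rounds:
        potential += gain_per_bit * length
        potential -= drop_per_query * queries
        if potential < 0:
            return False
    return True


# Three messages of 3, 2 and 5 bits, with hypothetical constants 2 and 4.
print(query_budget([3, 2, 5], gain_per_bit=2, drop_per_query=4))
```

A run that makes more queries than the budget allows would force the potential below zero, which is the contradiction the argument relies on.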
Fix a round of the simulation, and assume without loss of generality that the message is sent by Alice. We start by analyzing Step 1. At this step, the tree conditions on taking values that are not -dangerous for . We show that this step increases the deficiency by at most one bit. By applying our main technical lemma (3.9) with , it follows that the probability that is -dangerous is at most . By 2.2, it follows that conditioning on non-dangerous values decreases by at most one bit, and therefore it increases the deficiency by at most one bit. To see why we can apply the main lemma with , recall that at this point and are -structured, where
[TABLE]
where the last inequality can be made to hold by choosing to be sufficiently large.
Next, in Step 2, the tree conditions on the event of sending the message , which has probability at least . By 2.2, this decreases by at most bits, which increases the deficiency by at most bits. All in all, we showed that the first two steps of the simulation increase the deficiency by at most .
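The bookkeeping in Steps 1 and 2 rests on the fact that conditioning on an event of probability p lowers min-entropy by at most log(1/p) bits, which is presumably what 2.2 states. The following minimal sketch checks this on a toy distribution; the function names and the example are hypothetical.

```python
import math
from fractions import Fraction


def min_entropy(dist):
    """Min-entropy in bits: log2 of 1 / (largest point probability)."""
    return -math.log2(max(dist.values()))


def condition(dist, event):
    """Condition dist on event; return the new distribution and Pr[event]."""
    mass = sum(p for x, p in dist.items() if event(x))
    conditioned = {x: p / mass for x, p in dist.items() if event(x)}
    return conditioned, mass


# Uniform variable over 16 values; remove three "bad" values (mass 13/16).
uniform = {x: Fraction(1, 16) for x in range(16)}
conditioned, mass = condition(uniform, lambda x: x not in {0, 1, 2})

loss = min_entropy(uniform) - min_entropy(conditioned)
# The loss never exceeds log2(1 / Pr[event]), and is below one bit whenever
# the event has probability at least 1/2.
print(loss <= math.log2(1 / mass) + 1e-9, loss <= 1)
```

Since the deficiency subtracts the min-entropy terms, a loss of at most one bit of min-entropy translates into an increase of at most one bit in the deficiency.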
Let be the set of queries chosen in Step 3. We turn to show that the rest of the steps decrease the deficiency by at least . Without loss of generality, assume that (otherwise the latter bound holds vacuously). The rest of the steps apply the following changes to the deficiency:
- Step 3 conditions on the event , which has probability greater than by the definition of . Hence, this conditioning increases the deficiency by less than (by 2.3).
- Step 4 removes the set from . Looking at the definition of deficiency, this change decreases the term of by , decreases the term by at most (by 2.3), and does not change the term (since at this point is fixed to ). All in all, the deficiency is decreased by at least .
- Finally, Step 5 conditions on the event . This event has probability at least by the assumption that is not dangerous (and hence not leaking). Thus, this conditioning increases the deficiency by at most (by 2.3).
Summing all those effects together, we get that the deficiency was decreased by at least
[TABLE]
By choosing to be sufficiently large, we can make sure that is a positive constant independent of and , and therefore the decrease in the deficiency will be at least , as required. To see it, observe that
[TABLE]
Therefore, if we choose , the expression on the right-hand side will be a constant that is strictly smaller than , as required.
5 The randomized lifting theorem
In this section, we prove the randomized part of our main theorem. In fact, we prove the following more general result.
Theorem 5.1 (Randomized lifting theorem).
For every there exists such that the following holds: Let be such that , let be such that , let be a function such that , and let . Let be a randomized (public-coin) protocol that takes inputs in that has communication complexity and round complexity . Then, there exists a randomized parallel decision tree with the following properties:
- On input , the tree outputs a transcript of , whose distribution is -close to the distribution of the transcripts of when given inputs that are uniformly distributed in .
- The tree has query complexity and depth .
We first observe that 5.1 indeed implies the lower bound of our main theorem.
Let be a search problem, and let and . We prove that . Let be an optimal protocol that solves with complexity , and observe that we can assume without loss of generality that (since the players can solve any search problem by sending their whole inputs). By applying the theorem to , we construct a tree that on input samples a transcript of as in the theorem, and outputs the output that is associated with this transcript. It is not hard to see that the output of will be in with probability at least
[TABLE]
and that the query complexity of is . This implies that , or in other words, , as required. ∎
In the rest of this section we prove 5.1. We start the proof by observing that it suffices to prove the theorem for the special case in which the protocol is deterministic. To see why, recall that a randomized public-coin protocol is a distribution over deterministic protocols. Thus, if we prove the theorem for deterministic protocols, we can extend it to randomized protocols as follows: Given a randomized protocol , the tree will start by sampling a deterministic protocol from the distribution , and will then apply the theorem to . It is not hard to verify that such a tree satisfies the requirements of 5.1. Thus, it suffices to consider the case where is deterministic.
For the rest of this section, fix to be an arbitrary deterministic protocol that takes inputs in , and denote by and its communication complexity and round complexity respectively. The rest of this section is organized as follows: We first describe the construction of the parallel decision tree in Section 5.1. We then prove that the transcript that outputs is distributed as required in Section 5.2. Finally, we upper bound the query complexity of in Section 5.3.
5.1 The construction of the tree
The construction of the randomized tree is similar to the construction of the deterministic lifting theorem (Section 4.1), but has the following differences in the simulation:
- In the deterministic construction, the tree chose the message arbitrarily subject to having sufficiently high probability. The reason we could do it is that it did not matter which transcript the tree would output as long as it was consistent in . In the randomized construction, on the other hand, we would like to output a transcript whose distribution is close to the "correct" distribution. Therefore, we change the construction such that the message is chosen randomly according to the distribution of the inputs.
- Since the messages are now sampled according to the distribution of the inputs, we can no longer guarantee that the message has sufficiently high probability. Therefore, the tree may choose messages that have very low probability, and such messages may reveal too much information about the inputs. In order to avoid that, the tree maintains a variable which keeps track of the amount of information that was revealed by the messages. If at any point  becomes too large, the tree halts and declares failure. This modification is important since if we allow the chosen messages to reveal too much information, then they will lead the tree to make too many queries. In particular, the bound on is used in 5.3 to upper bound the query complexity of .
- In the deterministic construction, the tree restored the density of by fixing some set of coordinates to some value (using 3.6). Again, this was possible since it did not matter which transcript the tree would output. In the randomized construction, we cannot do it, since the transcript has to be distributed in a way that is close to correct. In order to resolve this issue, we follow [GPW17] and use their "density-restoring partition" (3.7). Recall that this lemma says that the probability space of can be partitioned into dense parts. The tree now samples one of those parts according to their probabilities and conditions on being in this part. If this conditioning reveals too much information, then the tree halts and declares failure.
We turn to give a formal description of the construction. Let be the maximum among the universal constants of the uniform marginals lemma (3.4) and the main technical lemma (3.9), and let be a universal constant that will be chosen to be sufficiently large to make the inequalities in the proof hold. Let , and as before, , and . As before, the parallel decision tree constructs the transcript by simulating the protocol round-by-round, each time adding a single message to . Throughout the simulation, the tree maintains a rectangle of inputs that are consistent with (but not necessarily of all such inputs). In what follows, we denote by and random variables that are uniformly distributed over and respectively. As before, the tree will maintain the invariant that and are -structured, and that moreover, they are -dense and -dense respectively in Alice's rounds and the other way around in Bob's rounds. As mentioned above, the tree will also maintain a variable from iteration to iteration, which will measure the information revealed so far.
When the tree starts the simulation, the tree sets the transcript to be the empty string, the restriction to , the variable to zero, and the sets to . At this point the invariant clearly holds. We now explain how simulates a single round of the protocol while maintaining the invariant. Suppose that the invariant holds at the beginning of the current round, and assume without loss of generality that it is Alice's turn to speak. The tree performs the following steps:
1. The tree conditions on not taking a value that is -dangerous for (i.e., the tree removes from all the values for which is -dangerous for ).
2. The tree samples a message of Alice according to the distribution induced by . Let be the probability of . The tree adds to the transcript, adds to , and conditions on (i.e., the tree sets to be the subset of inputs that are consistent with ).
3. If , the tree halts and declares error.
4. Let be the density-restoring partition of 3.7 with respect to . The tree chooses a random class in the partition, where the class is chosen with probability . Let be the chosen class, and let and be the set and the value associated with . The tree conditions on the event (i.e., the tree sets to be the subset of inputs such that ). The variable is now -dense by the properties of the density-restoring partition.
5. Recall that
[TABLE]
(see 3.7). If , the tree halts and declares error.
6. The tree queries the coordinates in , and updates accordingly.
7. The tree conditions  on (i.e., the tree sets to be the subset of values for which ). Due to Step 1, the variable must take a value that is not -dangerous, and therefore is necessarily -dense.
After those steps take place, it becomes Bob's turn to speak, and indeed, and are -dense and -dense respectively. Thus, the invariant is maintained. When the protocol stops, the tree outputs the transcript and halts. The proof that the above steps are well-defined is similar to the proof for the deterministic construction and is therefore omitted.
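The density condition that the invariant refers to can be checked by brute force on tiny examples. The sketch below assumes the usual blockwise min-entropy definition from this line of work (the definition is not restated in this section): a variable uniform over a set of n-block strings, each block of b bits, is delta-dense if every non-empty coordinate set I has marginal min-entropy at least delta * b * |I|. All names and parameters are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product


def marginal_min_entropy(support, coords):
    """Min-entropy of the projection to coords of a uniform variable."""
    counts = Counter(tuple(x[i] for i in coords) for x in support)
    return math.log2(len(support) / max(counts.values()))


def violating_sets(support, n, b, delta):
    """All non-empty coordinate sets whose marginal is not dense enough."""
    bad = []
    for size in range(1, n + 1):
        for coords in combinations(range(n), size):
            if marginal_min_entropy(support, coords) < delta * b * size:
                bad.append(coords)
    return bad


def is_dense(support, n, b, delta):
    """True if no coordinate set violates the density requirement."""
    return not violating_sets(support, n, b, delta)


# Two blocks of two bits each.
blocks = ["00", "01", "10", "11"]
full = list(product(blocks, repeat=2))
# Fixing the first block destroys density on every set containing it.
fixed = [x for x in full if x[0] == "00"]

print(is_dense(full, n=2, b=2, delta=0.9))
print(violating_sets(fixed, n=2, b=2, delta=0.9))
```

In the second example the violating sets are exactly those containing the fixed coordinate, which is the kind of set that the simulation queries and moves into the restriction.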
The depth of the tree.
As in the proof of the deterministic lifting theorem, it is not hard to see that the depth of is equal to the round complexity of .
5.2 The correctness of the tree
In this section, we prove the correctness of the construction. For convenience, we first prove the correctness of a modified tree , whose construction is the same as that of except that Step 3 is omitted. Fix an input . We define the following (random) transcripts of the protocol :
- Let be a transcript that outputs when given .
- Let be a transcript that outputs when given .
- Let be a transcript of when given inputs that are uniformly distributed in .
Our end goal is to prove that and are -close. In order to do so, we will first prove that is -close to . We will then prove that is -close to . Together, the two results imply that is -close to , as required.
is close to .
We first prove that is -close to . To this end, we construct a coupling of and such that . Essentially, we construct the coupling by going over the simulation step-by-step and using the uniform marginals lemma to argue that at each step, and are close and can therefore be coupled (and similarly for and ). We start by setting some notation: for every , let us denote by the rectangle from 5.1 at the end of the -th round of the simulation of (if halts before the -th round ends, set to be the rectangle at the end of the simulation). In our proof, we construct, for every :
- A random rectangle that is jointly distributed with with the following property: conditioned on a specific choice of , the pair is uniformly distributed over .
- A coupling of and such that .
Observe that if we can construct such rectangles and couplings, then it follows that and are close. To see it, observe that at any given point during the simulation, all the inputs in the rectangle are consistent with the transcript . Hence, if , it necessarily means that the inputs are consistent with the transcript , so . It follows that
[TABLE]
as required.
It remains to construct the rectangles and the associated couplings. We construct them by induction. Let , and suppose we have already constructed and its associated coupling (here, if we set both and to ). The -th coupling first samples and from the -th coupling. If they are different, then we set arbitrarily and assume that the coupling failed (i.e., and are different). Suppose now that and are equal, and condition on some specific choice of this rectangle. If the tree has already halted by this point, we set . Otherwise, we proceed as follows.
Let be a random pair that is uniformly distributed over , and recall that due to our conditioning, the pair is uniformly distributed over . We construct the rest of the coupling by following the simulation step-by-step. For Step 1, with probability
[TABLE]
we assume that the coupling failed and set arbitrarily. Otherwise, we condition both and on not taking a dangerous value. In order to analyze the probability of failure, recall that at the beginning of this step, are -structured, where
[TABLE]
where the last inequality can be made to hold by choosing to be sufficiently large. Hence, our main technical lemma (3.9) implies that the probability that is -dangerous for is at most
[TABLE]
Moreover, the uniform marginals lemma (3.4) implies that is -close to and therefore the probability that is -dangerous for is at most . Hence, the failure probability at this step is at most . Note that if the coupling does not fail, is conditioned on an event of probability at least , and therefore after the conditioning and are -structured.
For Steps 2 and 4, let and be the message and partition class that are distributed according to the input respectively. Let and be the corresponding message and class of . Since and are -structured, it can again be shown by the uniform marginals lemma that and are -close, and therefore the pair is -close to the pair . This implies that there exists a coupling of and such that the probability that they differ is at most . We sample and from this coupling. If they differ, we assume that the coupling failed, and set arbitrarily. Otherwise, we condition both and on being consistent with the message and the class , and denote by the set and values associated with . Finally, for Step 5, if , then we assume that the coupling fails and set arbitrarily (note that this happens with probability at most ).
At this point, we set , and set to be the set of inputs for which . It is easy to see that this choice satisfies . To analyze the total failure probability of this coupling, observe that by the induction assumption, the failure probability of the -th coupling is at most , and the other failure events discussed above add up to a failure probability of at most
[TABLE]
Hence, the failure probability of the -th coupling is at most , as required.
It remains to show that conditioned on any specific choice of , the pair is uniformly distributed over . In the cases where the coupling fails, we can ensure this property holds by first sampling and then setting . Suppose that the coupling did not fail. Recall that by the induction assumption, it holds that conditioned on the choice of , the pair is uniformly distributed over . Observe that all the -th coupling changes in the distribution of is to condition it on being in . Thus, at the end of the -th coupling, the pair is uniformly distributed over , as required.
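The coupling for Steps 2 and 4 uses the standard fact that two distributions at statistical distance d can be coupled so that the two samples differ with probability exactly d. The following minimal sketch builds such a maximal coupling for finite distributions; the function names and the toy distributions are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def statistical_distance(p, q):
    """Total variation distance between two finite distributions."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2


def maximal_coupling(p, q):
    """Joint distribution with marginals p and q that agrees as often as possible."""
    support = sorted(set(p) | set(q))
    overlap = {x: min(p.get(x, 0), q.get(x, 0)) for x in support}
    # Put the shared mass on the diagonal, where the two samples agree.
    joint = {(x, x): w for x, w in overlap.items() if w > 0}
    d = 1 - sum(overlap.values())
    if d > 0:
        # Spread the leftover mass of p against the leftover mass of q.
        excess_p = {x: p.get(x, 0) - overlap[x] for x in support}
        excess_q = {x: q.get(x, 0) - overlap[x] for x in support}
        for a, ea in excess_p.items():
            for b, eb in excess_q.items():
                if ea > 0 and eb > 0:
                    joint[(a, b)] = ea * eb / d
    return joint


p = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
q = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
joint = maximal_coupling(p, q)
differ = sum(w for (a, b), w in joint.items() if a != b)
print(differ == statistical_distance(p, q))
```

In the proof, the event that the two coupled samples differ is one of the failure events whose probabilities are summed over the rounds.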
is close to .
We turn to prove that is -close to . Let denote the event that the tree halts in Step 3. It is not hard to see that the statistical distance between and is exactly . We show that , and this will conclude the proof of correctness.
Intuitively, the reason that is that the tree halts only if the probability of the transcript up to that point is less than : to see it, observe that the variable measures (roughly) the logarithm of the probability of the transcript up to that point, and recall that the tree halts when . By taking union bound over all possible transcripts, we get that the halting probability is less than .
Unfortunately, the formal proof contains a messier calculation: the reason is that the probabilities of the messages as measured by depend on the choices of the classes in Step 4, so the foregoing intuition only holds for a given choice of these classes. Thus, the formal proof also sums over all the possible choices of classes and conditions on those choices. However, while the resulting calculation is more complicated, the idea is the same.
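The intuition above, for a fixed choice of classes, is a plain union bound: the tree halts only on transcripts whose probability is below a cutoff, and there are few transcripts. The sketch below illustrates this on a toy distribution. The threshold value, the transcript probabilities and the function names are hypothetical, and the sketch ignores the dependence on the partition classes that the formal proof handles.

```python
from fractions import Fraction


def halting_mass(transcript_probs, threshold):
    """Total probability of transcripts with log2(1/p) > threshold."""
    cutoff = Fraction(1, 2 ** threshold)
    return sum(p for p in transcript_probs.values() if p < cutoff)


def union_bound(num_transcripts, threshold):
    """Each halting transcript has mass below 2^-threshold."""
    return Fraction(num_transcripts, 2 ** threshold)


# A protocol with 2 bits of communication has at most 4 transcripts.
probs = {
    "00": Fraction(1, 2),
    "01": Fraction(3, 8),
    "10": Fraction(3, 32),
    "11": Fraction(1, 32),
}
print(halting_mass(probs, 4), union_bound(len(probs), 4))
```

With at most 2^C transcripts and a threshold of the form C plus a slack term, the bound becomes 2 raised to minus the slack, which is how the halting probability is made small.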
In order to facilitate the formal proof, we set up some useful notation. Let be the messages that are chosen in Step 2 of the simulation (so ), and let be the indices of the classes that are chosen in Step 4 (if the tree halts before the -th round, set to the empty string and set ). Observe that the execution of is completely determined by and , and in particular, and determine whether the event happens or not. With some abuse of notation, let us denote the fact that a particular choice of is consistent with by . For any , let us denote and . Observe that at the -th round, the probability in Step 2 is determined by and , and let us denote by $p_{M_{i}\mid\pi_{<i},J_{<i}}$ this probability for a given choice of and . We are now ready to prove the upper bound on . It holds that
[TABLE]
Next, observe that for every choice of , the corresponding value of at the end of the simulation is
[TABLE]
In particular, if , then it holds that , and therefore
[TABLE]
It follows that
[TABLE]
as required. In the calculation above, Equality 9 follows since each sum goes over all the possible choices of , and Inequality 10 follows since has at most  distinct transcripts.
5.3 The query complexity of the tree
The analysis of the query complexity here is similar to the analysis of the deterministic query complexity. The main difference is the following: In the deterministic setting, the increase in the deficiency due to a single message was upper bounded by , and therefore the total increase in the deficiency was upper bounded by . In the randomized case, the increase in the deficiency due to a single message is upper bounded by . Thus, we upper bound the total increase in the deficiency by . Since is never larger than due to Step 3, we conclude that the query complexity is at most . Details follow.
As before, we define the deficiency of to be
[TABLE]
We prove that whenever the protocol transmits a message , the deficiency increases by , and that whenever the tree makes a query, the deficiency is decreased by . Since the deficiency is always non-negative, and is never more than , it will follow that the tree must make at most queries. More specifically, we prove that in every round, the first two steps increase the deficiency by , and the rest of the steps decrease the deficiency by , and this will imply the desired result.
Fix a round of the simulation, and assume without loss of generality that the message is sent by Alice. We start by analyzing Step 1. At this step, the tree conditions on taking values that are not -dangerous for . Using the same calculation as in Section 5.2, it can be shown that the probability of non-dangerous values is at least . Therefore, this step increases the deficiency by at most  bit. Next, in Step 2, the tree conditions on an event of choosing the message , whose probability is by definition. Thus, this step increases the deficiency by at most bits. All in all, we showed that the first two steps of the simulation increase the deficiency by at most bits.
Let be the partition class that is sampled in Step 4, and let be the set and value that are associated with . We turn to show that the rest of the steps decrease the deficiency by . Those steps apply the following changes to the deficiency:
- Step 4 conditions on the event . By 3.7, this conditioning increases the deficiency by at most . Recall that by Step 5, the probability can never be less than . Thus, this step increases the deficiency by at most
[TABLE]
- Step 6 removes the set from . Looking at the definition of deficiency, this change decreases the term of by , decreases the term by at most (2.3), and does not change the term (since at this point is fixed to ). All in all, the deficiency is decreased by at least .
- Finally, Step 7 conditions on the event . This event has probability at least by the assumption that is not dangerous (and hence not leaking). Thus, this conditioning increases the deficiency by at most .
Summing all those effects together, we get that the deficiency was decreased by at least
[TABLE]
By choosing to be sufficiently large, we can make sure that is a positive constant independent of and , and therefore the decrease in the deficiency will be at least , as required. To see it, observe that
[TABLE]
Thus, if we choose such that , the expression on the right-hand side will be a constant that is strictly smaller than . It is not hard to see that we can choose such a value of that satisfies .
Acknowledgement.
We thank Daniel Kane for some very enlightening conversations and suggestions. The authors would also like to thank anonymous referees for comments that improved the presentation of this work. Part of this work was carried out while the authors were visiting the Simons Institute for the Theory of Computing.
References
- [BBCR10] Boaz Barak, Mark Braverman, Xi Chen, and Anup Rao. How to compress interactive communication. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pages 67–76, 2010.
- [BFS86] László Babai, Peter Frankl, and Janos Simon. Complexity classes in communication complexity theory (preliminary version). In 27th Annual Symposium on Foundations of Computer Science, Toronto, Canada, 27-29 October 1986, pages 337–347, 1986.
- [BPSW06] Paul Beame, Toniann Pitassi, Nathan Segerlind, and Avi Wigderson. A strong direct product theorem for corruption and the multiparty communication complexity of disjointness. Computational Complexity, 15(4):391–432, 2006.
- [BR14] Mark Braverman and Anup Rao. Information equals amortized communication. IEEE Trans. Information Theory, 60(10):6058–6069, 2014.
- [Bra17] Mark Braverman. Interactive information complexity. SIAM Review, 59(4):803–846, 2017.
- [BRWY13] Mark Braverman, Anup Rao, Omri Weinstein, and Amir Yehudayoff. Direct products in communication complexity. In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, 26-29 October, 2013, Berkeley, CA, USA, pages 746–755, 2013.
- [CFK+19] Arkadev Chattopadhyay, Yuval Filmus, Sajin Koroth, Or Meir, and Toniann Pitassi. Query-to-communication lifting for BPP using inner product. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, volume 132 of LIPIcs, pages 35:1–35:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
- [CKLM17] Arkadev Chattopadhyay, Michal Koucký, Bruno Loff, and Sagnik Mukhopadhyay. Simulation theorems via pseudorandom properties. CoRR, abs/1704.06807, 2017.
