Query-to-Communication Lifting Using Low-Discrepancy Gadgets

Arkadev Chattopadhyay; Yuval Filmus; Sajin Koroth; Or Meir; Toniann Pitassi

arXiv:1904.13056·cs.CC·October 6, 2021

TL;DR

This paper proves a new lifting theorem that holds for every gadget with low discrepancy, including logarithmic-size gadgets, thereby broadening the applicability of query-to-communication complexity reductions.

Contribution

It proves a lifting theorem for all gadgets with logarithmic length and exponentially-small discrepancy, in both the deterministic and the randomized setting, significantly increasing the range of gadgets for which lifting theorems are known to hold.

Findings

01

Lifting theorem now applies to all gadgets with logarithmic length and small discrepancy.

02

First randomized lifting theorem for logarithmic-size gadgets.

03

Generalizes direct-sum theorems for low-discrepancy functions.

Equations

$f\circ g^{n}((x_{1},y_{1}),\ldots,(x_{n},y_{n}))=f(g(x_{1},y_{1}),g(x_{2},y_{2}),\ldots,g(x_{n},y_{n})).$

$D^{\mathrm{cc}}(\mathcal{S}\circ g^{n})$

$R^{\mathrm{cc}}_{\varepsilon}(\mathcal{S}\circ g^{n})$

${\rm disc}_{R}(g)=\left|\Pr\left[g(U,V)=0\text{ and }(U,V)\in R\right]-\Pr\left[g(U,V)=1\text{ and }(U,V)\in R\right]\right|.$

$D^{\mathrm{cc}}(g)\geq R^{\mathrm{cc}}_{\varepsilon}(g)\geq\log\frac{1-2\cdot\varepsilon}{{\rm disc}(g)}$

$D^{\mathrm{cc}}(\mathcal{S}\circ g^{n})=\Omega(D^{\mathrm{dt}}(\mathcal{S})\cdot b),$

$R^{\mathrm{cc}}_{\varepsilon}(\mathcal{S}\circ g^{n})=\Omega((R^{\mathrm{dt}}_{\varepsilon^{\prime}}(\mathcal{S})-O(1))\cdot b),$

$R^{\mathrm{cc}}_{\varepsilon}(\mathcal{S}\circ g^{n})=\Omega(R^{\mathrm{dt}}_{\varepsilon^{\prime}}(\mathcal{S})\cdot\boldsymbol{IC}(g)).$

$R^{\mathrm{cc}}_{\varepsilon}(\mathcal{S}\circ g^{n})=\Omega(R^{\mathrm{dt}}_{\varepsilon^{\prime}}(\mathcal{S})\cdot\log\frac{1}{{\rm disc}(g)}).$

${\rm bias}(V)\stackrel{\rm def}{=}\left|\Pr\left[V=0\right]-\Pr\left[V=1\right]\right|.$

$g^{n}(x,y)\stackrel{\rm def}{=}(g(x_{1},y_{1}),\ldots,g(x_{n},y_{n})).$

$\Pr\left[T(z)\in\mathcal{S}(z)\right]\geq 1-\varepsilon.$

$\chi_{S}(z)\stackrel{\rm def}{=}(-1)^{\bigoplus_{i\in S}z_{i}}.$

$\hat{f}(S)\stackrel{\rm def}{=}\frac{1}{2^{m}}\sum_{z\in\left\{0,1\right\}^{m}}f(z)\cdot\chi_{S}(z).$

$f(z)=\sum_{S\subseteq\left[m\right]}\hat{f}(S)\cdot\chi_{S}(z).$

$\left|\hat{\mu}(S)\right|=2^{-m}\cdot{\rm bias}(\bigoplus_{i\in S}Z_{i}).$

$\left|\hat{\mu}(S)\right|$

$=2^{-m}\cdot\left|\sum_{z\in\left\{0,1\right\}^{m}}\mu(z)\cdot(-1)^{\bigoplus_{i\in S}z_{i}}\right|$

$=2^{-m}\cdot\left|\sum_{z\in\left\{0,1\right\}^{m}:\bigoplus_{i\in S}z_{i}=0}\mu(z)-\sum_{z\in\left\{0,1\right\}^{m}:\bigoplus_{i\in S}z_{i}=1}\mu(z)\right|$

$=2^{-m}\cdot\left|\Pr\left[\bigoplus_{i\in S}Z_{i}=0\right]-\Pr\left[\bigoplus_{i\in S}Z_{i}=1\right]\right|$

$=2^{-m}\cdot{\rm bias}(\bigoplus_{i\in S}Z_{i}),$

$\left|\mu_{1}-\mu_{2}\right|=\max_{E\subseteq\Omega}\left\{\left|\mu_{1}(E)-\mu_{2}(E)\right|\right\}.$

$\Pr\left[X=x\right]\leq 2^{-k}.$

${\rm bias}(\bigoplus_{i\in S}Z_{i})\leq\varepsilon\cdot(2\cdot m)^{-\left|S\right|}$

$(1-\varepsilon)\cdot\frac{1}{2^{m}}\leq\Pr\left[Z=z\right]\leq(1+\varepsilon)\cdot\frac{1}{2^{m}}.$

$\mu(z)-2^{-m}$

$\hat{\mu}(\emptyset)=2^{-m}$

$\left|\chi_{S}(z)\right|$

$=2^{-m}\cdot\sum_{S\subseteq\left[m\right]:S\neq\emptyset}{\rm bias}(\bigoplus_{i\in S}Z_{i})$

$\leq 2^{-m}\cdot\sum_{S\subseteq\left[m\right]:S\neq\emptyset}\varepsilon\cdot(2\cdot m)^{-\left|S\right|}$

$=\varepsilon\cdot 2^{-m}\cdot\sum_{i=1}^{m}\binom{m}{i}\cdot(2\cdot m)^{-i}$

$\leq\varepsilon\cdot 2^{-m}\cdot\sum_{i=1}^{m}m^{i}\cdot(2\cdot m)^{-i}$

$\leq\varepsilon\cdot 2^{-m}\cdot\sum_{i=1}^{m}2^{-i}$

$\leq\varepsilon\cdot 2^{-m},$

${\rm bias}(\bigoplus_{i\in S}Z_{i})\leq(2\cdot m)^{-\left|S\right|},$

$\mu(z)$

Full text

Query-to-Communication Lifting Using Low-Discrepancy Gadgets

This work subsumes an earlier work that appeared in ICALP 2019 [CFK+19]. The earlier work proved our main result (Theorem 1.2) only for the special case where the gadget $g$ is the inner product function, while this work proves the result for the general case of all low-discrepancy gadgets.

Arkadev Chattopadhyay, School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai, India.

Yuval Filmus, Technion Israel Institute of Technology, Haifa, Israel. Taub Fellow, supported by the Taub Foundations. The research was funded by ISF grant 1337/16.

Sajin Koroth, School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, B.C., Canada V5A 1S6. This research was done while Sajin Koroth was partially supported by the Israel Science Foundation (grant No. 1445/16) and by the institutional postdoctoral program of the University of Haifa.

Or Meir, Department of Computer Science, University of Haifa, Haifa 3498838, Israel. Partially supported by the Israel Science Foundation (grant No. 1445/16).

Toniann Pitassi, Department of Computer Science, University of Toronto, Canada. Research supported by NSERC and by NSF CCF grant 1900460.

Abstract

Lifting theorems are theorems that relate the query complexity of a function $f:\left\{0,1\right\}^{n}\to\left\{0,1\right\}$ to the communication complexity of the composed function $f\circ g^{n}$ , for some “gadget” $g:\left\{0,1\right\}^{b}\times\left\{0,1\right\}^{b}\to\left\{0,1\right\}$ . Such theorems allow transferring lower bounds from query complexity to communication complexity, and have seen numerous applications in recent years. In addition, such theorems can be viewed as a strong generalization of a direct-sum theorem for the gadget $g$ .

We prove a new lifting theorem that works for all gadgets $g$ that have logarithmic length and exponentially-small discrepancy, for both deterministic and randomized communication complexity. Thus, we significantly increase the range of gadgets for which such lifting theorems hold.

Our result has two main motivations: First, allowing a larger variety of gadgets may support more applications. In particular, our work is the first to prove a randomized lifting theorem for logarithmic-size gadgets, thus improving some applications of the theorem. Second, our result can be seen as a strong generalization of a direct-sum theorem for functions with low discrepancy.

1 Introduction

1.1 Background

In this work, we prove new lifting theorems for a large family of gadgets. Let $f\colon\{0,1\}^{n}\to\{0,1\}$ and $g\colon\{0,1\}^{b}\times\{0,1\}^{b}\to\{0,1\}$ be functions (where $g$ is referred to as a gadget). The block-composed function $f\circ g^{n}$ is the function that takes $n$ inputs $(x_{1},y_{1}),\ldots,(x_{n},y_{n})$ for $g$ and computes $f\circ g^{n}$ as,

$f\circ g^{n}((x_{1},y_{1}),\ldots,(x_{n},y_{n}))=f(g(x_{1},y_{1}),g(x_{2},y_{2}),\ldots,g(x_{n},y_{n})).$

Lifting theorems are theorems that relate the communication complexity of $f\circ g^{n}$ to the query complexity of $f$ and the communication complexity of $g$ .

More specifically, consider the following communication problem: Alice gets $x_{1},\ldots,x_{n}$ , Bob gets $y_{1},\ldots,y_{n}$ , and they wish to compute the output of $f\circ g^{n}$ on their inputs. The natural protocol for doing so is the following: Alice and Bob jointly simulate a decision tree of optimal height for solving $f$ . Any time the tree queries the $i$ -th bit, they compute $g$ on $(x_{i},y_{i})$ by invoking the best possible communication protocol for $g$ . A lifting theorem is a theorem that says that this natural protocol is optimal.
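The natural protocol can be sketched in a few lines of Python. This is a minimal illustration and not code from the paper: the tuple encoding of decision trees, the choice of inner product as the gadget, and the cost model (Alice sends $x_{i}$ using $b$ bits and Bob replies with one bit) are all assumptions made for the example.

```python
def inner_product(x, y):
    """Illustrative gadget g: inner product mod 2 of two equal-length bit tuples."""
    return sum(a & b for a, b in zip(x, y)) % 2


def compose(f, g):
    """Return the block-composed function: apply g to each block, then f to the bits."""
    def composed(xs, ys):
        return f(tuple(g(x, y) for x, y in zip(xs, ys)))
    return composed


# Hypothetical tree encoding: ("leaf", value) or ("query", i, subtree_if_0, subtree_if_1).
def natural_protocol(tree, xs, ys, g):
    """Simulate the decision tree, answering query i by computing g(x_i, y_i).

    Assumed cost model: Alice sends x_i (b bits), Bob replies with g(x_i, y_i)
    (1 bit), so every query costs b + 1 bits. Returns (output, bits_communicated).
    """
    bits = 0
    while tree[0] == "query":
        _, i, if_zero, if_one = tree
        bits += len(xs[i]) + 1
        tree = if_one if g(xs[i], ys[i]) else if_zero
    return tree[1], bits


# Depth-2 decision tree computing the AND of two bits.
AND_TREE = ("query", 0, ("leaf", 0), ("query", 1, ("leaf", 0), ("leaf", 1)))
```

With blocks of length $b=2$, running `natural_protocol(AND_TREE, xs, ys, inner_product)` costs at most $2\cdot(b+1)=6$ bits and agrees with `compose(lambda z: z[0] & z[1], inner_product)(xs, ys)` on every input.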

We note that it is often desirable to consider the case where $f$ is a search problem with an arbitrary range rather than a boolean function (see Section 2 for the definition of search problems). Most of the known results, as well as the results of this work, apply to this general case. However, for the simplicity of presentation, we focus for now on the case where $f$ is a boolean function.

Applications of lifting theorems.

One important reason why lifting theorems are interesting is that they create a connection between query complexity and communication complexity. This connection, besides being interesting in its own right, allows us to transfer lower bounds and separations from query complexity (which is a relatively simple model) to communication complexity (which is a significantly richer model).

In particular, the first result of this form, due to Raz and McKenzie [RM99], proved a lifting theorem from deterministic query complexity to deterministic communication complexity when $g$ is the index function. They then used it to prove new lower bounds on communication complexity by lifting query-complexity lower bounds. More recently, Göös, Pitassi and Watson [GPW15] applied that theorem to separate the logarithm of the partition number and the deterministic communication complexity of a function, resolving a long-standing open problem. This too was done by proving such a separation in the setting of query complexity and then lifting it to the setting of communication complexity. This result stimulated a flurry of work on lifting theorems of various kinds, such as: lifting for zero-communication protocols [GLM+16], round-preserving lifting theorems with applications to time-space trade-offs for proof complexity [dRNV16], deterministic lifting theorems with other gadgets [CKLM17, WYY17], lifting theorems from randomized query complexity to randomized communication complexity [GPW17], lifting theorems for DAG-like protocols [GGKS18] with applications to monotone circuit lower bounds, lifting theorems for asymmetric communication problems [CKLM18] with applications to data-structures, a lifting theorem for the EQUALITY gadget [LM18], lifting theorems for XOR functions with applications to the log-rank conjecture [HHL18], and lifting theorems for applications to monotone formula complexity, monotone span programs, and proof complexity [GP18, RPRC16, PR17, PR18]. There are also lifting theorems which lift more analytic properties of the function like approximate degree due to Sherstov [She11] and independently due to Shi and Zhu [SZ09].

In almost all known lifting theorems, the function $f$ can be arbitrary while $g$ is usually a specific function (e.g., the index function). This raises the following natural question: for which choices of $g$ can we prove lifting theorems? This question is interesting because usually the applications of lifting theorems work by reducing the composed function $f\circ g^{n}$ to some other problem of interest, and the choice of the gadget $g$ affects the efficiency of such reductions.

In particular, applications of lifting theorems often depend on the size of the gadget, which is the length $b$ of the input to $g$ . Both the deterministic lifting theorem of Raz and McKenzie [RM99] and the randomized lifting theorem of Göös et al. [GPW17] use a gadget of very large size (polynomial in $n$ ). Reducing the gadget size to a constant would have many interesting applications.

In the deterministic setting, the gadget size was recently improved to logarithmic by the independent works of [CKLM17] and [WYY17]. Moreover, [CKLM17, Koz18] showed the lifting to work for a class of gadgets with a certain pseudorandom property rather than just a single specific gadget. A gadget of logarithmic size was also obtained earlier in lifting theorems for more specialized models, such as the work of [GLM+16]. However, the randomized lifting theorem of Göös et al. [GPW17] seemed to work only with a specific gadget of polynomial size.

In this work, we prove a lifting theorem for a large family of gadgets, namely, all functions $g$ with logarithmic length and exponentially-small discrepancy (see Theorem 1.2 for details). Our theorem holds both in the deterministic and the randomized setting. This allows for a considerably larger variety of gadgets: in particular, our theorem is the first lifting theorem in the randomized setting that uses logarithmic-size gadgets, it allows lifting with the inner-product gadget (previously known only in the deterministic setting [CKLM17, WYY17]), and it is also the first lifting theorem that shows that a random function can be used as a gadget.

We would like to point out that, although we reduce the gadget size to logarithmic in this work, this is not enough to obtain the interesting applications that a constant-sized gadget would have yielded. Nevertheless, our randomized lifting theorem still has some applications. For example, our theorem can be used to simplify the lower bounds of Göös and Jayram [GJ16] on AND-OR trees and MAJORITY trees, since we can now obtain them directly from the randomized query complexity lower bounds rather than going through conical juntas. In addition, our theorem can be used to derive the separation of randomized communication complexity from partition number (due to [GJPW15]) for functions with larger complexity (compared to their input length).

Lifting theorems as a generalization of direct-sum theorems.

Lifting theorems can also be motivated from another angle, which is particularly appealing in our case: lifting theorems can be viewed as a generalization of direct-sum theorems. The direct-sum question is a classical question in complexity theory, which asks whether performing a task on $n$ independent inputs is $n$ times harder than performing it on a single input. When specialized to the setting of communication complexity, a direct-sum theorem is a theorem that says that the communication complexity of computing $g$ on $n$ independent inputs is about $n$ times larger than the communication complexity of $g$ . A related type of result, which is sometimes referred to as an “XOR lemma”, says that the communication complexity of computing the XOR of the outputs of $g$ on $n$ independent inputs is about $n$ times larger than the communication complexity of $g$ .

The direct-sum question for communication complexity has been raised in [KRW91], and has since attracted much attention. While we do not have a general direct-sum theorem for all functions, many works have proved direct-sum theorems and XOR lemmas for large families of functions [FKNN95, Sha03, BPSW06, LSS08, Kla10, BBCR10, BRWY13, Bra17] as well as provided counterexamples [FKNN95, GKR14, GKR16b, GKR16a].

Now, observe that lifting theorems are natural generalizations of direct-sum theorems and XOR lemmas: in particular, if we set $f$ to be the identity function or the parity function, we get a direct sum theorem or an XOR lemma for $g$ , respectively. More generally, a lifting theorem says that the communication complexity of computing any function $f$ of the outputs of $g$ on independent inputs is larger than the complexity of $g$ by a factor that depends on the query complexity of $f$ . This is perhaps the strongest and most natural “direct-sum-like theorem” for $g$ that one could hope for.
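The two special cases can be made concrete with a small Python sketch. It is only an illustration: the helper names and the equality gadget are hypothetical choices for the example, not objects defined in the paper.

```python
def lift(f, g):
    """Return the composed function: g is applied to each block, f to the outputs."""
    def composed(xs, ys):
        return f(tuple(g(x, y) for x, y in zip(xs, ys)))
    return composed


def identity(z):
    """f = identity: the composed task is to compute g on all n blocks (direct sum)."""
    return z


def parity(z):
    """f = parity: the composed task is the XOR of the n gadget outputs (XOR lemma)."""
    return sum(z) % 2


def equality(x, y):
    """Hypothetical gadget used only for illustration."""
    return int(x == y)


direct_sum_task = lift(identity, equality)
xor_task = lift(parity, equality)
```

For example, on `xs = ((0, 1), (1, 1), (0, 0))` and `ys = ((0, 1), (1, 0), (0, 0))` the direct-sum task returns the full vector of gadget outputs, while the XOR task returns only their parity.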

From this perspective, it is natural to ask which functions $g$ admit such a strong theorem. The previous works of [RM99, GPW17] can be viewed as establishing this theorem only for the index function. The works of [CKLM17, Koz18] have made further progress, establishing this theorem for a class of functions that satisfy a certain hitting property. However, the latter property is somewhat non-standard and ad hoc, and their theorem holds only in the deterministic setting. In this work, we establish such theorems for all functions $g$ with low discrepancy, which is a standard and well-studied complexity measure, and we do so in both the deterministic and the randomized setting.

1.2 Our results

In this work, we prove lifting theorems for gadgets of low discrepancy. In what follows, we denote by $D^{\mathrm{dt}}$ and $D^{\mathrm{cc}}$ the deterministic query complexity and communication complexity of a task respectively, and by $R^{\mathrm{dt}}_{\varepsilon}$ and $R^{\mathrm{cc}}_{\varepsilon}$ the randomized query complexity and (public-coin) communication complexity with error probability $\varepsilon$ respectively. Given a search problem $\mathcal{S}$ and a gadget $g:\left\{0,1\right\}^{b}\times\left\{0,1\right\}^{b}\to\left\{0,1\right\}$ , it is easy to see that

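$D^{\mathrm{cc}}(\mathcal{S}\circ g^{n})=O(D^{\mathrm{dt}}(\mathcal{S})\cdot b)\quad\text{and}\quad R^{\mathrm{cc}}_{\varepsilon}(\mathcal{S}\circ g^{n})=O(R^{\mathrm{dt}}_{\varepsilon}(\mathcal{S})\cdot b).$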

This upper bound is proved using the simple protocol discussed earlier: the parties simulate the optimal decision tree for $\mathcal{S}$ , and whenever a query is made, the parties compute $g$ on the corresponding input in order to answer the query (which can be done by communicating at most $b+1$ bits). Our main result says that when $g$ has low discrepancy and $b$ is at least logarithmic, this upper bound is roughly tight. In order to state this result, we first recall the definition of discrepancy.

Definition 1.1.

Let $\Lambda$ be a finite set, let $g:\Lambda\times\Lambda\to\left\{0,1\right\}$ be a function, and let $U,V$ be independent random variables that are uniformly distributed over $\Lambda$ . Given a combinatorial rectangle $R\subseteq\Lambda\times\Lambda$ , the discrepancy of $g$ with respect to $R$ , denoted ${\rm disc}_{R}(g)$ , is defined as follows:

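${\rm disc}_{R}(g)=\left|\Pr\left[g(U,V)=0\text{ and }(U,V)\in R\right]-\Pr\left[g(U,V)=1\text{ and }(U,V)\in R\right]\right|.$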

The discrepancy of $g$ , denoted ${\rm disc}(g)$ , is defined as the maximum of ${\rm disc}_{R}(g)$ over all combinatorial rectangles $R\subseteq\Lambda\times\Lambda$ .
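For very small gadgets, the definition can be evaluated directly by enumerating all rectangles. The following Python sketch does this by brute force; it assumes $\Lambda=\left\{0,1\right\}^{b}$ and is only feasible for tiny $b$, since the number of rectangles is doubly exponential in $b$. The function names are chosen for this example.

```python
from itertools import product


def subsets(universe):
    """Yield every subset of `universe` as a tuple."""
    for mask in product([False, True], repeat=len(universe)):
        yield tuple(u for u, keep in zip(universe, mask) if keep)


def discrepancy(g, b):
    """Brute-force disc(g) for g on {0,1}^b x {0,1}^b with U, V uniform.

    For a rectangle A x B, disc_R(g) equals
    |#{(u,v) in R: g=0} - #{(u,v) in R: g=1}| / |Lambda|^2.
    """
    domain = list(product([0, 1], repeat=b))
    total = len(domain) ** 2
    best = 0.0
    for A in subsets(domain):
        for B in subsets(domain):
            signed = sum(1 if g(u, v) == 0 else -1 for u in A for v in B)
            best = max(best, abs(signed) / total)
    return best


def inner_product(x, y):
    """Inner product mod 2, a standard example of a low-discrepancy gadget."""
    return sum(s & t for s, t in zip(x, y)) % 2
```

A constant function has discrepancy 1 (take $R=\Lambda\times\Lambda$), whereas for the inner product on $b$ bits the well-known bound ${\rm disc}\leq 2^{-b/2}$ applies, which `discrepancy(inner_product, 2)` can be used to check.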

Discrepancy is a useful measure for the complexity of $g$ , and in particular, it is well-known that for $\varepsilon>0$ :

$D^{\mathrm{cc}}(g)\geq R^{\mathrm{cc}}_{\varepsilon}(g)\geq\log\frac{1-2\cdot\varepsilon}{{\rm disc}(g)}$

(see, e.g., [KN97]). We now state our main result.

Theorem 1.2 (Main theorem).

For every $\eta>0$ there exists $c=O(\frac{1}{\eta^{2}}\cdot\log\frac{1}{\eta})$ such that the following holds: Let $\mathcal{S}$ be a search problem that takes inputs from $\left\{0,1\right\}^{n}$ , and let $g:\left\{0,1\right\}^{b}\times\left\{0,1\right\}^{b}\to\left\{0,1\right\}$ be an arbitrary function such that ${\rm disc}(g)\leq 2^{-\eta\cdot b}$ and such that $b\geq c\cdot\log n$ . Then

$D^{\mathrm{cc}}(\mathcal{S}\circ g^{n})=\Omega(D^{\mathrm{dt}}(\mathcal{S})\cdot b),$

and for every $\varepsilon>0$ it holds that

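$R^{\mathrm{cc}}_{\varepsilon}(\mathcal{S}\circ g^{n})=\Omega((R^{\mathrm{dt}}_{\varepsilon^{\prime}}(\mathcal{S})-O(1))\cdot b),$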

where $\varepsilon^{\prime}=\varepsilon+2^{-\eta\cdot b/8}$ .

We note that our results are in fact more general, and preserve the round complexity of $\mathcal{S}$ among other things. See Sections 4 and 5 for more details.
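To get a feeling for the quantities in the theorem, the following Python helpers evaluate them numerically. This is a sketch under stated assumptions: logarithms are taken to base 2, and the constant hidden in $c=O(\frac{1}{\eta^{2}}\cdot\log\frac{1}{\eta})$ is not specified in the statement, so the condition $b\geq c\cdot\log n$ is deliberately not checked.

```python
import math


def discrepancy_lower_bound(disc, eps):
    """log2((1 - 2*eps) / disc): the discrepancy lower bound on R^cc_eps(g).

    Assumption: base-2 logarithm.
    """
    return math.log2((1 - 2 * eps) / disc)


def lifted_error(eps, eta, b):
    """eps' = eps + 2^(-eta*b/8), the error parameter on the query side."""
    return eps + 2 ** (-eta * b / 8)


def satisfies_discrepancy_hypothesis(disc, eta, b):
    """Check the hypothesis disc(g) <= 2^(-eta*b) of the main theorem."""
    return disc <= 2 ** (-eta * b)
```

For instance, a gadget of length $b=64$ with discrepancy $2^{-32}$ satisfies the hypothesis with $\eta=\frac{1}{2}$, and in that case $\varepsilon^{\prime}$ exceeds $\varepsilon$ by $2^{-4}$.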

Remark 1.3.

Note that our main theorem can be applied to a random function $g:\left\{0,1\right\}^{b}\times\left\{0,1\right\}^{b}\to\left\{0,1\right\}$ , since such a function has a very low discrepancy. As noted above, we believe that our theorem is the first theorem to allow the gadget to be a random function.

Unifying deterministic and randomized lifting theorems.

The existing proofs of deterministic lifting theorems and randomized lifting theorems are quite different. While both proofs rely on information-theoretic arguments, they measure information in different ways. In particular, while the randomized lifting theorem of [GPW17] (following [GLM+16]) measures information using min-entropy, the deterministic lifting theorems of [RM99, GPW15, CKLM17, WYY17] (following [EIRS01]) measure information using a notion known as thickness (with [GGKS18] being a notable exception). A natural direction of further research is to investigate if these disparate techniques can be unified. Indeed, a related question was raised by [GLM+16], who asked if min-entropy based techniques could be used to prove (or simplify the existing proof of) Raz–McKenzie style deterministic lifting theorems.

Our work answers this question affirmatively: we prove both the deterministic and randomized lifting theorems using the same strategy. In particular, both proofs measure information using min-entropy. In doing so, we unify both lifting theorems under the same framework.

1.3 Our techniques

We turn to describe the high-level ideas that underlie the proof of our main theorem. Following the previous works, we use a “simulation argument”: We show that given a protocol $\Pi$ that solves $\mathcal{S}\circ g^{n}$ with communication complexity $C$ , we can construct a decision tree $T$ that solves $\mathcal{S}$ with query complexity $O(\frac{C}{b})$ . The decision tree $T$ works by simulating the action of the protocol $\Pi$ (hence the name “simulation argument”). We now describe this simulation in more detail, following the presentation of [GPW17].

The simulation argument.

For simplicity of notation, let us denote $\Lambda=\left\{0,1\right\}^{b}$ , so $g$ is a function from a “block” in $\Lambda\times\Lambda$ to $\left\{0,1\right\}$ . Let $G=g^{n}:\Lambda^{n}\times\Lambda^{n}\to\left\{0,1\right\}^{n}$ be the function that takes $n$ disjoint blocks and computes the outputs of $g$ on all of them. We assume that we have a protocol $\Pi$ that solves $\mathcal{S}\circ G$ with complexity $C$ , and would like to construct a decision tree $T$ that solves $\mathcal{S}$ with complexity $O(\frac{C}{b})$ . The basic idea is that given an input $z\in\left\{0,1\right\}^{n}$ , the tree $T$ simulates the action of $\Pi$ on the random inputs $(X,Y)$ that are uniformly distributed over $G^{-1}(z)$ . Clearly, it holds that $\mathcal{S}\circ G(X,Y)=\mathcal{S}(z)$ , so this simulation, if done right, outputs the correct answer.

The core issue in implementing such a simulation is the following question: how can $T$ simulate the action of $\Pi$ on $(X,Y)\in G^{-1}(z)$ without knowing $z$ ? The answer is that as long as the protocol $\Pi$ has transmitted less than $\varepsilon\cdot b$ bits of information about every block $(X_{i},Y_{i})$ (for some specific $\varepsilon>0$ ), the distribution of $(X,Y)$ is similar to the uniform distribution in a certain sense (that will be formalized soon). Thus, the tree $T$ can pretend that $(X,Y)$ are distributed uniformly, and simulate the action of $\Pi$ on such inputs, which can be done without knowing $z$ .

This idea can be implemented as long as the protocol has transmitted less than $\varepsilon\cdot b$ bits of information about every block $(X_{i},Y_{i})$ . However, at some point, the protocol may transmit more than $\varepsilon\cdot b$ bits of information about some blocks. Let $I\subseteq\left[n\right]$ denote the set of these blocks. At this point, it is no longer true that the distribution of $(X,Y)$ is similar to the uniform distribution. However, it can be shown that the distribution of $(X,Y)$ is similar to the uniform distribution conditioned on $g^{I}(X_{I},Y_{I})=z_{I}$ . Thus, the tree $T$ queries the bits in $z_{I}$ , and can now continue the simulation of $\Pi$ on $(X,Y)\in G^{-1}(z)$ by pretending that $(X,Y)$ are distributed uniformly conditioned on $g^{I}(X_{I},Y_{I})=z_{I}$ . The tree proceeds in this way, adding blocks to $I$ as necessary, until the protocol $\Pi$ ends, at which point $T$ halts and outputs the same output as $\Pi$ .

It remains to show that the query complexity of $T$ is at most $O(\frac{C}{b})$ . To this end, observe that the query complexity of $T$ is exactly the size of the set $I$ at the end of the simulation. Moreover, recall that the set $I$ is the set of blocks on which the protocol transmitted at least $\varepsilon\cdot b$ bits of information. Hence, at any given point, the protocol must have transmitted at least $\varepsilon\cdot b\cdot\left|I\right|$ bits. On the other hand, we know by assumption that the protocol never transmitted more than $C$ bits. This implies that $\varepsilon\cdot b\cdot\left|I\right|\leq C$ and therefore the query complexity of the tree $T$ is at most $\left|I\right|\leq\frac{C}{\varepsilon\cdot b}=O(\frac{C}{b})$ . This concludes the argument.
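The counting step above can be illustrated with a toy Python model. It is a deliberate simplification and not the paper's construction: the transcript is modelled as a list of hypothetical `(block, bits)` pairs in which every transmitted bit is charged to exactly one block, whereas the actual proof measures information through min-entropy.

```python
def simulate_queries(transcript, b, eps):
    """Toy accounting for the simulation argument.

    `transcript` is a list of (block_index, bits) pairs (a hypothetical model).
    A block joins the set I, and the tree queries it, once at least eps * b bits
    have been charged to it. Returns (queried_blocks, total_bits).
    """
    leaked = {}
    queried = []
    total_bits = 0
    for block, bits in transcript:
        total_bits += bits
        leaked[block] = leaked.get(block, 0) + bits
        if leaked[block] >= eps * b and block not in queried:
            queried.append(block)  # the tree queries z_block at this point
    return queried, total_bits
```

In this model every queried block accounts for at least $\varepsilon\cdot b$ of the transmitted bits, so the number of queries is at most $\frac{C}{\varepsilon\cdot b}$, where $C$ is the total number of bits in the transcript.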

Our contribution.

In order to implement the foregoing simulation argument, there are two technical issues that need to be addressed and are relevant at this point:

•

The uniform marginals issue: In the above description, we argued that as long as the protocol has not transmitted too much information, the distribution of $(X,Y)$ is “similar to the uniform distribution”. The question is how to formalize this idea. This issue was dealt with implicitly in several works in the lifting literature since [RM99], and was made explicit in [GPW17] as the “uniform marginals lemma”: if every set of blocks in $(X,Y)$ has sufficient min-entropy, then each of the marginals $X,Y$ on its own is close to the uniform distribution. In [GPW17], they proved this lemma for the case where $g$ is the index function, and in [GLM+16] a very similar lemma was proved for the case where $g$ is the inner product function.

•

The conditioning issue: As we described above, when the protocol transmits too much information about a set of blocks $I\subseteq\left[n\right]$ , the tree $T$ queries $z_{I}$ and conditions the distribution of $(X,Y)$ on the event that $g^{I}(X_{I},Y_{I})=z_{I}$ . In principle, this conditioning may reveal information on $(X_{\left[n\right]-I},Y_{\left[n\right]-I})$ , which might reduce their min-entropy and ruin their uniform-marginals property. In order for the simulation argument to work, one needs to show that this cannot happen, and the conditioning will never reveal too much information about $(X_{\left[n\right]-I},Y_{\left[n\right]-I})$ .

In the works of [RM99, WYY17, GPW17] this issue was handled by arguments that are tailored to the index and inner product functions. The work of [CKLM17] gave this issue a more general treatment, by identifying an abstract property of $g$ that prevents the conditioning from revealing too much information. However, as discussed above, this abstract property is somewhat ad hoc, and only works for deterministic simulation.

Our contribution is dealing with both issues in the general setting where $g$ is an arbitrary low-discrepancy gadget. In order to deal with the first issue, we prove a “uniform marginals” lemma for such gadgets $g$ : this is relatively easy, since the proof of [GLM+16] for the inner product gadget turns out to generalize in a straightforward way to arbitrary low-discrepancy gadgets.
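The simplest instance of the uniform-marginals phenomenon can be checked numerically. The Python sketch below considers a single block with $(X,Y)$ uniform over $g^{-1}(z)$ and computes the statistical distance of the marginal $X$ from uniform. It is only a toy case: the lemma discussed above concerns many blocks and general distributions with sufficient min-entropy, and the function names here are chosen for the example.

```python
from fractions import Fraction
from itertools import product


def inner_product(x, y):
    """Inner product mod 2 of two equal-length bit tuples."""
    return sum(s & t for s, t in zip(x, y)) % 2


def x_marginal_distance(g, b, z):
    """Statistical distance between the uniform distribution on {0,1}^b and the
    X-marginal of (X, Y) drawn uniformly from g^{-1}(z), for a single block.

    Assumes g^{-1}(z) is non-empty.
    """
    domain = list(product([0, 1], repeat=b))
    # counts[x] = number of y with g(x, y) = z
    counts = {x: sum(1 for y in domain if g(x, y) == z) for x in domain}
    preimage_size = sum(counts.values())
    uniform = Fraction(1, len(domain))
    # Statistical distance equals half of the L1 distance.
    return sum(abs(Fraction(c, preimage_size) - uniform) for c in counts.values()) / 2
```

For the inner product gadget and $z=1$, the all-zeros string never appears as $X$ while all other strings are equally likely, so the distance shrinks as $b$ grows.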

The core of this work is in dealing with the conditioning issue. Our main technical lemma says that as long as every set of blocks in $(X,Y)$ has sufficient min-entropy, there are only few possible values of $X,Y$ that are “dangerous” (in the sense that they may lead the conditioning to leak too much information). We now modify the simulation such that it discards these dangerous values before performing the conditioning. Since there are only few of those dangerous values, discarding them does not reveal too much information on $X$ and $Y$ , and the simulation can proceed as before.

1.4 Open problems

The main question that arises from this work is: how much more general can the gadget $g$ be? As was discussed in Section 1.1, lifting theorems can be viewed as a generalization of direct-sum theorems. In the setting of randomized communication complexity, it is known that the “ability of $g$ to admit a direct-sum theorem” is characterized exactly by a complexity measure called the information cost of $g$ (denoted $\boldsymbol{IC}(g)$ ). In particular, the complexity of computing a function $g$ on $n$ independent copies is $\approx n\cdot\boldsymbol{IC}(g)$ [BBCR10, BR14, Bra17]. This leads to the natural conjecture that a lifting theorem should hold for every gadget $g$ that has sufficiently high information cost.

Conjecture 1.4.

There exists a constant $c>0$ such that the following holds. Let $\mathcal{S}$ be any search problem that takes inputs from $\left\{0,1\right\}^{n}$ , and let $g:\left\{0,1\right\}^{b}\times\left\{0,1\right\}^{b}\to\left\{0,1\right\}$ be an arbitrary function such that $\boldsymbol{IC}(g)\geq c\cdot\log n$ . Then

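$R^{\mathrm{cc}}_{\varepsilon}(\mathcal{S}\circ g^{n})=\Omega(R^{\mathrm{dt}}_{\varepsilon^{\prime}}(\mathcal{S})\cdot\boldsymbol{IC}(g)).$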

Proving this conjecture would give us a nearly-complete understanding of the lifting phenomenon which, in addition to being interesting in its own right, would likely lead to many applications. In particular, this conjecture implies our result, since it is known that $\log\frac{1}{{\rm disc}(g)}$ (roughly) lower bounds the information cost of $g$ [KLL+15].

Conjecture 1.4 is quite ambitious. As intermediate goals, one could attempt to prove such a lifting theorem for other complexity measures that are stronger than discrepancy and weaker than information cost (see [JK10, KLL+15] for several measures of this kind). To begin with, one could consider the well-known corruption bound of [Yao83, BFS86, Raz92]: could we prove a lifting theorem for an arbitrary gadget $g$ that has a low corruption bound? A particularly interesting example for such a gadget is the disjointness function. Indeed, proving a lifting theorem for the disjointness gadget would be interesting in its own right and would likely have applications, in addition to being a step toward Conjecture 1.4.

An even more modest intermediate goal is to gain a better understanding of lifting theorems with respect to discrepancy. For starters, our result only holds for gadgets whose discrepancy is exponentially vanishing in the gadget size. (More accurately, our result can be applied to gadgets with larger discrepancy, but then the gadget size has to be larger than logarithmic.) Can we prove a lifting theorem for gadgets $g$ with larger discrepancy? In particular, since the randomized communication complexity of $g$ is lower bounded by $\Omega(\log\frac{1}{{\rm disc}(g)})$ , the following conjecture comes to mind.

Conjecture 1.5.

There exists a constant $c>0$ such that the following holds. Let $\mathcal{S}$ be any search problem that takes inputs from $\left\{0,1\right\}^{n}$ , and let $g:\left\{0,1\right\}^{b}\times\left\{0,1\right\}^{b}\to\left\{0,1\right\}$ be an arbitrary function such that $\log\frac{1}{{\rm disc}(g)}\geq c\cdot\log n$ . Then

[TABLE]

Another interesting direction is to consider discrepancy with respect to other distributions. The definition of discrepancy we gave above (Definition 1.1) is a special case of a more general definition, in which the random variables $(U,V)$ are distributed according to some fixed distribution $\mu$ over $\Lambda\times\Lambda$ . Thus, our result works only when $\mu$ is the uniform distribution. Can we prove a lifting theorem that holds for an arbitrary choice of $\mu$ ? While we have not verified it, we believe that our proof can yield a lifting theorem that works whenever $\mu$ is a product distribution (after some natural adaptations). However, proving such a lifting theorem for non-product distributions seems to require new ideas. We note that direct-sum theorems for discrepancy have been proved by [Sha03, LSS08], and proving Conjecture 1.4 (and extending it to an arbitrary distribution $\mu$ ) seems like a natural extension of their results.

Yet another interesting direction is to consider the lifting analogue of strong direct product theorems. Such theorems say that when we compute $g$ on $n$ independent inputs, then not only that the communication complexity increases by a factor of $n$ , but the success probability also drops exponentially in $n$ (see, e.g., [Sha03, Kla10, Dru12, BRWY13]). A plausible analogue for lifting theorems is to conjecture that the success probability of computing $\mathcal{S}\circ g^{n}$ drops exponentially in the query complexity of $\mathcal{S}$ . It would be interesting to see a result along these lines.

Finally, there remains the major open problem of the lifting literature: proving a lifting theorem that uses gadgets of constant size.

Organization of the paper. In Section 2, we provide the required preliminaries. In Section 3, we set up the lifting machinery that is used in both the deterministic and the randomized lifting results, including our “uniform marginals lemma” and our main technical lemma. We prove the deterministic part of our main theorem in Section 4, and the randomized part of our main theorem in Section 5.

2 Preliminaries

We assume familiarity with the basic definitions of communication complexity (see, e.g., [KN97]). For any $n\in\mathbb{N}$ , we denote $\left[n\right]\stackrel{{\scriptstyle\rm{def}}}{{=}}\left\{1,\ldots,n\right\}$ . Given a boolean random variable $V$ , we denote the bias of $V$ by

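$$\mathrm{bias}(V)\stackrel{{\scriptstyle\rm{def}}}{{=}}\left|\Pr\left[V=0\right]-\Pr\left[V=1\right]\right|.$$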

Given an alphabet $\Lambda$ and a set $I\subseteq\left[n\right]$ , we denote by $\Lambda^{I}$ the set of strings of length $\left|I\right|$ which are indexed by $I$ . Given a string $x\in\Lambda^{n}$ and a set $I\subseteq\left[n\right]$ , we denote by $x_{I}$ the projection of $x$ to the coordinates in $I$ (in particular, $x_{\emptyset}$ is defined to be the empty string). Given a boolean function $g:\mathcal{X}\times\mathcal{Y}\to\left\{0,1\right\}$ and a set $I\subseteq\left[n\right]$ , we denote by $g^{I}:\mathcal{X}^{I}\times\mathcal{Y}^{I}\to\left\{0,1\right\}^{I}$ the function that takes as inputs $\left|I\right|$ pairs from $\mathcal{X}\times\mathcal{Y}$ that are indexed by $I$ , and outputs the string in $\left\{0,1\right\}^{I}$ whose $i$ -th bit is the output of $g$ on the $i$ -th pair. In particular, we denote $g^{n}\stackrel{{\scriptstyle\rm{def}}}{{=}}g^{\left[n\right]}$ , so $g^{n}$ takes as inputs $x\in\mathcal{X}^{n},y\in\mathcal{Y}^{n}$ and outputs the binary string

$$g^{n}(x,y)\stackrel{{\scriptstyle\rm{def}}}{{=}}\left(g(x_{1},y_{1}),\ldots,g(x_{n},y_{n})\right).$$

For every $I\subseteq\left[n\right]$ , we denote by $g^{\oplus I}:\mathcal{X}^{I}\times\mathcal{Y}^{I}\to\left\{0,1\right\}$ the function that given $x\in\mathcal{X}^{I}$ and $y\in\mathcal{Y}^{I}$ , outputs the parity of the string $g^{I}(x,y)$ .
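To make the notation concrete, the following minimal Python sketch evaluates $g^{I}$ , $g^{n}$ and $g^{\oplus I}$ on a small input. The inner-product gadget, the block length $b=2$ and the helper names are illustrative assumptions and are not taken from the paper.

```python
from typing import Callable, Dict, Iterable, Tuple

Block = Tuple[int, ...]            # an element of Lambda = {0,1}^b
Gadget = Callable[[Block, Block], int]

def inner_product(x: Block, y: Block) -> int:
    """Hypothetical example gadget: inner product modulo 2."""
    return sum(a & b for a, b in zip(x, y)) % 2

def g_I(g: Gadget, x: Dict[int, Block], y: Dict[int, Block],
        I: Iterable[int]) -> Dict[int, int]:
    """g^I: the i-th output bit is g applied to the i-th pair, for i in I."""
    return {i: g(x[i], y[i]) for i in I}

def g_xor_I(g: Gadget, x: Dict[int, Block], y: Dict[int, Block],
            I: Iterable[int]) -> int:
    """Parity of the string g^I(x, y); equals 0 when I is empty."""
    return sum(g_I(g, x, y, I).values()) % 2

# n = 3 blocks of b = 2 bits each, indexed by [n] = {1, 2, 3}.
x = {1: (1, 0), 2: (1, 1), 3: (0, 1)}
y = {1: (1, 1), 2: (1, 1), 3: (1, 0)}
z = g_I(inner_product, x, y, [1, 2, 3])          # this is g^n(x, y)
parity = g_xor_I(inner_product, x, y, [1, 2, 3])
```

Strings indexed by $I$ are represented here as dictionaries keyed by the coordinates in $I$ , which keeps the indexing of $x_{I}$ explicit.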

Search problems.

Given a finite set of inputs $\mathcal{I}$ and a finite set of outputs $\mathcal{O}$ , a search problem $\mathcal{S}$ is a relation between $\mathcal{I}$ and $\mathcal{O}$ . Given $z\in\mathcal{I}$ , we denote by $\mathcal{S}(z)$ the set of outputs $o\in\mathcal{O}$ such that $(z,o)\in\mathcal{S}$ . Without loss of generality, we may assume that $\mathcal{S}(z)$ is always non-empty, since otherwise we can set $\mathcal{S}(z)=\left\{\bot\right\}$ where $\bot$ is some special failure symbol that does not belong to $\mathcal{O}$ .

Intuitively, a search problem $\mathcal{S}$ represents the following task: given an input $z\in\mathcal{I}$ , find a solution $o\in\mathcal{S}(z)$ . In particular, if $\mathcal{I}=\mathcal{X}\times\mathcal{Y}$ for some finite sets $\mathcal{X},\mathcal{Y}$ , we say that a deterministic protocol $\Pi$ solves $\mathcal{S}$ if for every input $(x,y)\in\mathcal{I}$ , the output of $\Pi$ is in $\mathcal{S}(x,y)$ . We say that a randomized protocol $\Pi$ solves $\mathcal{S}$ with error $\varepsilon$ if for every input $(x,y)\in\mathcal{I}$ , the output of $\Pi$ is in $\mathcal{S}(x,y)$ with probability at least $1-\varepsilon$ .

We denote the deterministic communication complexity of a search problem $\mathcal{S}$ with $D^{\mathrm{cc}}(\mathcal{S})$ . Given $\varepsilon>0$ , we denote by $R^{\mathrm{cc}}_{\varepsilon}(\mathcal{S})$ the randomized (public-coin) communication complexity of $\mathcal{S}$ with error $\varepsilon$ (i.e., the minimum worst-case complexity of a randomized protocol that solves $\mathcal{S}$ with error $\varepsilon$ ).

Given a search problem $\mathcal{S}\subseteq\left\{0,1\right\}^{n}\times\mathcal{O}$ , we denote by $\mathcal{S}\circ g^{n}\subseteq(\mathcal{X}^{n}\times\mathcal{Y}^{n})\times\mathcal{O}$ the search problem that satisfies for every $x\in\mathcal{X}^{n}$ and $y\in\mathcal{Y}^{n}$ that $\mathcal{S}\circ g^{n}(x,y)=\mathcal{S}(g^{n}(x,y))$ .

2.1 Decision trees

Informally, a decision tree is an algorithm that solves a search problem $\mathcal{S}\subseteq\left\{0,1\right\}^{n}\times\mathcal{O}$ by querying the individual bits of its input. The tree is computationally unbounded, and its complexity is measured by the number of bits it queries.

Formally, a deterministic decision tree $T$ from $\left\{0,1\right\}^{n}$ to $\mathcal{O}$ is a binary tree in which every internal node is labeled with a coordinate in $\left[n\right]$ (which represents a query), every edge is labeled by a bit (which represents the answer to the query), and every leaf is labeled by an output in $\mathcal{O}$ . Such a tree computes a function from $\left\{0,1\right\}^{n}$ to $\mathcal{O}$ in the natural way, and with a slight abuse of notation, we denote this function also as $T$ . The query complexity of $T$ is the depth of the tree. We say that a tree $T$ solves a search problem $\mathcal{S}\subseteq\left\{0,1\right\}^{n}\times\mathcal{O}$ if for every $z\in\left\{0,1\right\}^{n}$ it holds that $T(z)\in\mathcal{S}(z)$ . The deterministic query complexity of $\mathcal{S}$ , denoted $D^{\mathrm{dt}}(\mathcal{S})$ , is the minimal query complexity of a decision tree that solves $\mathcal{S}$ .
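The formal definition can be mirrored directly in code. The sketch below is only an illustration: the class names and the toy search problem `find_one` (output a coordinate that holds a 1, or a failure symbol) are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union

@dataclass
class Leaf:
    output: object          # an output in O

@dataclass
class Node:
    query: int              # a coordinate in [n], 1-indexed
    zero: "Tree"            # subtree followed when the queried bit is 0
    one: "Tree"             # subtree followed when the queried bit is 1

Tree = Union[Leaf, Node]

def evaluate(tree, z):
    """The function computed by the tree on input z in {0,1}^n."""
    while isinstance(tree, Node):
        tree = tree.one if z[tree.query - 1] else tree.zero
    return tree.output

def depth(tree):
    """Query complexity of the tree."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(depth(tree.zero), depth(tree.one))

def solves(tree, S, n):
    """Check T(z) in S(z) for every z in {0,1}^n."""
    return all(evaluate(tree, z) in S(z) for z in product((0, 1), repeat=n))

def find_one(z):
    """Toy search problem: any coordinate holding a 1, or 'none'."""
    ones = {i for i, bit in enumerate(z, start=1) if bit == 1}
    return ones or {"none"}

T = Node(1, zero=Node(2, zero=Leaf("none"), one=Leaf(2)), one=Leaf(1))
```

Here `solves(T, find_one, 2)` checks the defining condition by brute force over all inputs, and `depth(T)` is the query complexity of this particular tree.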

A randomized decision tree $T$ is a random variable that takes deterministic decision trees as values. The query complexity of $T$ is the maximal depth of a tree in the support of $T$ . We say that $T$ solves a search problem $\mathcal{S}\subseteq\left\{0,1\right\}^{n}\times\mathcal{O}$ with error $\varepsilon$ if for every $z\in\left\{0,1\right\}^{n}$ it holds that

$$\Pr\left[T(z)\in\mathcal{S}(z)\right]\geq 1-\varepsilon.$$

The randomized query complexity of $\mathcal{S}$ with error $\varepsilon$ , denoted $R^{\mathrm{dt}}_{\varepsilon}(\mathcal{S})$ , is the minimal query complexity of a randomized decision tree that solves $\mathcal{S}$ with error $\varepsilon$ . Again, when we omit $\varepsilon$ , it is assumed to be $\frac{1}{3}$ .

2.1.1 Parallel decision-trees

Our lifting theorems have the property that they preserve the round complexity of protocols, which is useful for some applications [dRNV16]. In order to define this property, we need a notion of a decision tree that has an analogue of “round complexity”. Such a notion, due to [Val75], is called a parallel decision tree. Informally, a parallel decision tree is a decision tree that works in “rounds”, where in each round multiple queries are issued simultaneously. The “round complexity” of the tree is the number of rounds, whereas the query complexity is the total number of queries issued.

Formally, a deterministic parallel decision tree $T$ from $\left\{0,1\right\}^{n}$ to $\mathcal{O}$ is a rooted tree in which every internal node is labeled with a set $I\subseteq\left[n\right]$ (representing the queries issued simultaneously at this round) and has degree $2^{\left|I\right|}$ . The edges going out of such a node are labeled with all the possible assignments in $\left\{0,1\right\}^{I}$ , and every leaf is labeled by some output $o\in\mathcal{O}$ . As before, such a tree naturally computes a function that is denoted by $T$ , and it solves a search problem $\mathcal{S}\subseteq\left\{0,1\right\}^{n}\times\mathcal{O}$ if $T(z)\in\mathcal{S}(z)$ for all $z\in\left\{0,1\right\}^{n}$ . The depth of such a tree is now the analogue of the number of rounds in a protocol. The query complexity of $T$ is defined as the maximum, over all leaves $\ell$ , of the sum of the sizes of the sets $I$ that are labeling the vertices on the path from the root to $\ell$ . A randomized parallel decision tree is defined analogously to the definition of randomized decision trees above.
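The distinction between the number of rounds and the query complexity can be illustrated with a small sketch. The representation below and the example tree (majority of three bits, computed in two rounds) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Tuple, Union

@dataclass
class PLeaf:
    output: object

@dataclass
class PNode:
    queries: Tuple[int, ...]                    # the set I, as 1-indexed coordinates
    children: Dict[Tuple[int, ...], "PTree"]    # one child per assignment in {0,1}^I

PTree = Union[PLeaf, PNode]

def evaluate(tree, z):
    while isinstance(tree, PNode):
        answers = tuple(z[i - 1] for i in tree.queries)
        tree = tree.children[answers]
    return tree.output

def rounds(tree):
    """Depth of the tree: the analogue of round complexity."""
    if isinstance(tree, PLeaf):
        return 0
    return 1 + max(rounds(c) for c in tree.children.values())

def query_complexity(tree):
    """Maximum over leaves of the total size of the sets on the root-to-leaf path."""
    if isinstance(tree, PLeaf):
        return 0
    return len(tree.queries) + max(query_complexity(c) for c in tree.children.values())

def read_third():
    return PNode((3,), {(0,): PLeaf(0), (1,): PLeaf(1)})

# Round 1 queries {1, 2} together; round 2 queries {3} only if the answers disagree.
majority3 = PNode((1, 2), {
    (0, 0): PLeaf(0),
    (1, 1): PLeaf(1),
    (0, 1): read_third(),
    (1, 0): read_third(),
})
```

For this tree `rounds(majority3)` is 2 while `query_complexity(majority3)` is 3, since the longest path issues two queries in the first round and one in the second.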

2.2 Fourier analysis

Given a set $S\subseteq\left[m\right]$ , the character $\chi_{S}$ is the function from $\left\{0,1\right\}^{m}$ to $\mathbb{R}$ that is defined by

$$\chi_{S}(z)\stackrel{{\scriptstyle\rm{def}}}{{=}}(-1)^{\bigoplus_{i\in S}z_{i}}.$$

Here, if $S=\emptyset$ then we define $\bigoplus_{i\in S}z_{i}=0$ . Given a function $f:\left\{0,1\right\}^{m}\to\mathbb{R}$ , its Fourier coefficient $\hat{f}(S)$ is defined as

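$$\hat{f}(S)\stackrel{{\scriptstyle\rm{def}}}{{=}}\frac{1}{2^{m}}\sum_{z\in\left\{0,1\right\}^{m}}f(z)\cdot\chi_{S}(z).$$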

It is a standard fact of Fourier analysis that $f$ can be written as

$$f(z)=\sum_{S\subseteq\left[m\right]}\hat{f}(S)\cdot\chi_{S}(z).\qquad(1)$$

We have the following useful observation.

Fact 2.1.

Let $Z$ be a random variable taking values in $\left\{0,1\right\}^{m}$ , and let $\mu:\left\{0,1\right\}^{m}\to\mathbb{R}$ be its density function. Then, for every set $S\subseteq\left[m\right]$ it holds that

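$$\hat{\mu}(S)=2^{-m}\cdot\left(\Pr\left[\bigoplus_{i\in S}Z_{i}=0\right]-\Pr\left[\bigoplus_{i\in S}Z_{i}=1\right]\right).$$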

In particular, $\hat{\mu}(\emptyset)=2^{-m}$ .

Proof.

Let $S\subseteq\left[m\right]$ . It holds that

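$$\hat{\mu}(S)=\frac{1}{2^{m}}\sum_{z\in\left\{0,1\right\}^{m}}\mu(z)\cdot\chi_{S}(z)=\frac{1}{2^{m}}\left(\sum_{z:\chi_{S}(z)=1}\mu(z)-\sum_{z:\chi_{S}(z)=-1}\mu(z)\right)=2^{-m}\cdot\left(\Pr\left[\bigoplus_{i\in S}Z_{i}=0\right]-\Pr\left[\bigoplus_{i\in S}Z_{i}=1\right]\right),$$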

as required. The “in particular” part follows by noting that in the case of $S=\emptyset$ , the character $\chi_{S}$ is the constant function $1$ , and recalling that the sum of $\mu(z)$ over all $z$ ’s is $1$ . ∎
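Fact 2.1 is easy to check numerically on a small example. The sketch below assumes the normalization $\hat{f}(S)=2^{-m}\sum_{z}f(z)\chi_{S}(z)$ , which is the one consistent with $\hat{\mu}(\emptyset)=2^{-m}$ ; the particular density `mu` and the helper names are made up for the illustration.

```python
from fractions import Fraction
from itertools import combinations, product

def chi(S, z):
    """Character chi_S(z) = (-1)^(XOR of z_i for i in S); S uses 0-indexed coordinates."""
    return -1 if sum(z[i] for i in S) % 2 else 1

def fourier_coefficient(f, S, m):
    total = sum((f[z] * chi(S, z) for z in product((0, 1), repeat=m)), Fraction(0))
    return total / 2 ** m

def parity_gap(mu, S):
    """Pr[XOR_{i in S} Z_i = 0] - Pr[XOR_{i in S} Z_i = 1] for Z with density mu."""
    p0 = sum((p for z, p in mu.items() if sum(z[i] for i in S) % 2 == 0), Fraction(0))
    return p0 - (1 - p0)

m = 2
mu = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4),
      (1, 0): Fraction(1, 4), (1, 1): Fraction(0)}
all_sets = [S for r in range(m + 1) for S in combinations(range(m), r)]

# Fact 2.1: each Fourier coefficient equals 2^{-m} times the parity gap.
fact_holds = all(fourier_coefficient(mu, S, m) == parity_gap(mu, S) / 2 ** m
                 for S in all_sets)

# Fourier inversion: mu(z) is recovered from its coefficients.
inversion_holds = all(
    sum(fourier_coefficient(mu, S, m) * chi(S, z) for S in all_sets) == mu[z]
    for z in mu
)
```

Exact rational arithmetic is used so that the equalities can be tested without rounding error.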

2.3 Probability

Given two distributions $\mu_{1},\mu_{2}$ over a finite sample space $\Omega$ , the statistical distance (or total variation distance) between $\mu_{1}$ and $\mu_{2}$ is

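$$\left|\mu_{1}-\mu_{2}\right|\stackrel{{\scriptstyle\rm{def}}}{{=}}\max_{\mathcal{E}\subseteq\Omega}\left|\mu_{1}(\mathcal{E})-\mu_{2}(\mathcal{E})\right|.$$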

It is not hard to see that the maximum is attained when $\mathcal{E}$ consists of all the values $\omega\in\Omega$ such that $\mu_{1}(\omega)>\mu_{2}(\omega)$ . We say that $\mu_{1}$ and $\mu_{2}$ are $\varepsilon$ -close if $\left|\mu_{1}-\mu_{2}\right|\leq\varepsilon$ . The min-entropy of a random variable $X$ , denoted $H_{\infty}(X)$ , is the largest number $k\in\mathbb{R}$ such that for every value $x$ it holds that

$$\Pr\left[X=x\right]\leq 2^{-k}.$$

Min-entropy has the following easy-to-prove properties.

Fact 2.2.

Let $X$ be a random variable and let $\mathcal{E}$ be an event. Then, $H_{\infty}(X\mid\mathcal{E})\geq H_{\infty}(X)-\log\frac{1}{\Pr\left[\mathcal{E}\right]}$ .

Fact 2.3.

Let $X_{1},X_{2}$ be random variables taking values from sets $\mathcal{X}_{1},\mathcal{X}_{2}$ respectively. Then, $H_{\infty}(X_{1})\geq H_{\infty}(X_{1},X_{2})-\log\left|\mathcal{X}_{2}\right|$ .
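As a sanity check, Facts 2.2 and 2.3 can be verified on a small joint distribution. The distribution below and the helper names are arbitrary choices for illustration, and min-entropy is computed as $-\log_{2}$ of the largest point probability.

```python
import math

def min_entropy(dist):
    """H_inf of a distribution given as {value: probability}."""
    return -math.log2(max(dist.values()))

def marginal_first(joint):
    out = {}
    for (x1, _x2), p in joint.items():
        out[x1] = out.get(x1, 0.0) + p
    return out

def condition(joint, event):
    """Distribution of the pair conditioned on `event`, plus Pr[event]."""
    p_event = sum(p for w, p in joint.items() if event(w))
    return {w: p / p_event for w, p in joint.items() if event(w)}, p_event

# Joint distribution of (X1, X2) over {0,1} x {0,1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
TOL = 1e-9

# Fact 2.3: H_inf(X1) >= H_inf(X1, X2) - log |X_2|, with |X_2| = 2.
fact_2_3 = min_entropy(marginal_first(joint)) >= min_entropy(joint) - math.log2(2) - TOL

# Fact 2.2 with X = (X1, X2) and the event E = {X2 = 1}.
conditioned, p_event = condition(joint, lambda w: w[1] == 1)
fact_2_2 = min_entropy(conditioned) >= min_entropy(joint) - math.log2(1 / p_event) - TOL
```

A small tolerance is used because the second inequality is close to tight for this event.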

We say that a distribution is $k$ -flat if it is uniformly distributed over a subset of the sample space of size at least $2^{k}$ . The following standard fact is useful.

Fact 2.4.

If a random variable $X$ has min-entropy $k$ , then its distribution is a convex combination of $k$ -flat distributions.

2.3.1 Vazirani’s Lemma

Vazirani’s lemma is a useful result which says that a random string is close to being uniformly distributed if the XOR of every set of bits in the string has a small bias. We use the following variant of the lemma due to [GLM+16].

Lemma 2.5 ([GLM+16]).

Let $\varepsilon>0$ , and let $Z$ be a random variable taking values in $\left\{0,1\right\}^{m}$ . If for every non-empty set $S\subseteq\left[m\right]$ it holds that

[TABLE]

then for every $z\in\left\{0,1\right\}^{m}$ it holds that

[TABLE]

Proof.

Let $\mu:\left\{0,1\right\}^{m}\to\mathbb{R}$ be the density function of $Z$ , and let $z\in\left\{0,1\right\}^{m}$ . By Equation 1 it holds that

[TABLE]

as required. ∎

Lemma 2.5 says that if the bias of $\bigoplus_{i\in S}Z_{i}$ is small for every $S$ , then $Z$ is close to being uniformly distributed. It turns out that if the latter assumption holds only for large sets $S$ , we can still deduce something useful, namely, that the min-entropy of $Z$ is high.

Lemma 2.6.

Let $t\in\mathbb{N}$ be such that $t\geq 1$ , and let $Z$ be a random variable taking values in $\left\{0,1\right\}^{m}$ . If for every set $S\subseteq\left[m\right]$ such that $\left|S\right|\geq t$ it holds that

[TABLE]

then, $H_{\infty}(Z)\geq m-t\log m-1$ .

Proof.

Observe that if $m=1$ then the bound holds vacuously, so we may assume that $m\geq 2$ . Let $\mu:\left\{0,1\right\}^{m}\to\mathbb{R}$ be the density function of $Z$ , and let $z\in\left\{0,1\right\}^{m}$ . By Equation 1 it holds that

[TABLE]

We now bound each of the two terms separately. The term for sets $S$ whose size is at least $t$ can be upper bounded by $1$ using exactly the same calculation as in the proof of Lemma 2.5. In order to upper bound the term for sets whose size is less than $t$ , observe that $\mathrm{bias}(\oplus_{i\in S}Z_{i})\leq 1$ for every $S\subseteq\left[m\right]$ and therefore

[TABLE]

It follows that

[TABLE]

Thus, $H_{\infty}(Z)\geq m-t\cdot\log m$ as required. Note that this bound is a bit stronger than claimed in the lemma: indeed, we only need the “ $-1$ ” term in the lemma in order to deal with the case where $m=1$ . ∎

2.3.2 Coupling

Let $\mu_{1},\mu_{2}$ be two distributions over sample spaces $\Omega_{1},\Omega_{2}$ . A coupling of $\mu_{1}$ and $\mu_{2}$ is a distribution $\nu$ over the sample space $\Omega_{1}\times\Omega_{2}$ whose marginal over the first coordinate is $\mu_{1}$ and whose marginal over the second coordinate is $\mu_{2}$ . In the case where $\Omega_{1}=\Omega_{2}=\Omega$ , the following standard fact allows us to use couplings to study the statistical distance between $\mu_{1}$ and $\mu_{2}$ .

Fact 2.7.

Let $\mu_{1},\mu_{2}$ be two distributions over a sample space $\Omega$ . The statistical distance between $\mu_{1}$ and $\mu_{2}$ is equal to the minimum, over all couplings $\nu$ of $\mu_{1}$ and $\mu_{2}$ , of

$$\Pr_{(X,Y)\leftarrow\nu}\left[X\neq Y\right].$$

In particular, we can upper bound the statistical distance between $\mu_{1}$ and $\mu_{2}$ by constructing a coupling $\nu$ in which the probability that $X\neq Y$ is small.
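The following sketch illustrates Fact 2.7 on a small example by building a coupling whose mismatch probability equals the statistical distance. The construction used here (keep the common mass on the diagonal and spread the excess mass proportionally) is a standard one and is an assumption of this example, not a construction taken from the paper.

```python
from fractions import Fraction

def statistical_distance(mu1, mu2):
    """Maximum over events, attained on {w : mu1(w) > mu2(w)}."""
    support = set(mu1) | set(mu2)
    return sum((max(mu1.get(w, 0) - mu2.get(w, 0), 0) for w in support), Fraction(0))

def maximal_coupling(mu1, mu2):
    support = set(mu1) | set(mu2)
    d = statistical_distance(mu1, mu2)
    nu = {}
    for w in support:
        common = min(mu1.get(w, 0), mu2.get(w, 0))
        if common > 0:
            nu[(w, w)] = common                      # X = Y on the shared mass
    if d > 0:
        for a in support:
            extra1 = max(mu1.get(a, 0) - mu2.get(a, 0), 0)
            for b in support:
                extra2 = max(mu2.get(b, 0) - mu1.get(b, 0), 0)
                if extra1 > 0 and extra2 > 0:        # forces a != b
                    nu[(a, b)] = extra1 * extra2 / d
    return nu

mu1 = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
mu2 = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
nu = maximal_coupling(mu1, mu2)
mismatch = sum(p for (a, b), p in nu.items() if a != b)
```

In this example the mismatch probability of `nu` coincides with `statistical_distance(mu1, mu2)`, matching the minimum in Fact 2.7.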

2.4 Prefix-free codes

A set of strings $C\subseteq\left\{0,1\right\}^{*}$ is called a prefix-free code if no string in $C$ is a prefix of another string in $C$ . Given a string $w\in\left\{0,1\right\}^{*}$ , we denote its length by $\left|w\right|$ . We use the following simple corollary of Kraft’s inequality.

Fact 2.8.

Let $C\subseteq\left\{0,1\right\}^{*}$ be a finite prefix-free code, and let $W$ be a random string taking values from $C$ . Then, there exists a string $w\in C$ such that $\Pr\left[W=w\right]\geq\frac{1}{2^{\left|w\right|}}$ .

For completeness, we provide the following simple proof of Fact 2.8 that does not rely on Kraft’s inequality.

Proof.

Let $n$ be the maximal length of a string in $C$ , and let $W^{\prime}$ be a random string in $\left\{0,1\right\}^{n}$ that is sampled according to the following process: sample a string $w$ from $W$ , choose a uniformly distributed string $z\in\left\{0,1\right\}^{n-\left|w\right|}$ , and set $W^{\prime}=w\circ z$ (where here $\circ$ denotes string concatenation).

By a simple averaging argument, there exists a string $w^{\prime}\in\left\{0,1\right\}^{n}$ such that $\Pr\left[W^{\prime}=w^{\prime}\right]\geq\frac{1}{2^{n}}$ . Since $C$ is a prefix-free code, there exists a unique prefix $w$ of $w^{\prime}$ that is in $C$ . The definition of $W^{\prime}$ implies that

$$\Pr\left[W^{\prime}=w^{\prime}\right]=\Pr\left[W=w\right]\cdot 2^{-(n-\left|w\right|)},$$

because the only way the string $w^{\prime}$ could be sampled is by first sampling $w$ and then sampling $z$ to be the rest of $w^{\prime}$ (again, since $C$ is a prefix-free code). Hence, it follows that

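$$\Pr\left[W=w\right]=2^{n-\left|w\right|}\cdot\Pr\left[W^{\prime}=w^{\prime}\right]\geq 2^{n-\left|w\right|}\cdot\frac{1}{2^{n}}=\frac{1}{2^{\left|w\right|}},$$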

as required. ∎
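Fact 2.8 can be checked directly on a concrete code. The code $C$ and the distribution below are invented for the illustration; the sketch simply lists the codewords $w$ with $\Pr\left[W=w\right]\geq 2^{-\left|w\right|}$ .

```python
from fractions import Fraction

def is_prefix_free(code):
    """No string in the code is a proper prefix of another string in the code."""
    return not any(u != w and w.startswith(u) for u in code for w in code)

def heavy_codewords(dist):
    """All codewords w with Pr[W = w] >= 2^{-|w|}."""
    return sorted(w for w, p in dist.items() if p >= Fraction(1, 2 ** len(w)))

# A distribution of W over the prefix-free code C = {0, 10, 110, 111}.
dist = {
    "0": Fraction(1, 8),
    "10": Fraction(1, 8),
    "110": Fraction(3, 8),
    "111": Fraction(3, 8),
}
witnesses = heavy_codewords(dist)
```

Note that the short codewords fail the inequality here, so the witness promised by the fact is one of the longer codewords.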

2.5 Discrepancy

We start by recalling the definition of discrepancy.

Definition 1.1.

Let $\Lambda$ be a finite set, let $g:\Lambda\times\Lambda\to\left\{0,1\right\}$ be a function, and let $U,V$ be independent random variables that are uniformly distributed over $\Lambda$ . Given a combinatorial rectangle $R\subseteq\Lambda\times\Lambda$ , the discrepancy of $g$ with respect to $R$ , denoted ${\rm disc}_{R}(g)$ , is defined as follows:

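$${\rm disc}_{R}(g)\stackrel{{\scriptstyle\rm{def}}}{{=}}\left|\Pr\left[g(U,V)=0\text{ and }(U,V)\in R\right]-\Pr\left[g(U,V)=1\text{ and }(U,V)\in R\right]\right|.$$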

The discrepancy of $g$ , denoted ${\rm disc}(g)$ , is defined as the maximum of ${\rm disc}_{R}(g)$ over all combinatorial rectangles $R\subseteq\Lambda\times\Lambda$ .
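For very small gadgets, the discrepancy can be computed by brute force over all rectangles. The sketch below assumes the standard reading of the definition, namely that ${\rm disc}_{R}(g)$ is the absolute value of $\sum_{(u,v)\in R}(-1)^{g(u,v)}$ divided by $\left|\Lambda\right|^{2}$ ; the inner-product gadget on $b=2$ bits is an illustrative choice.

```python
from fractions import Fraction
from itertools import combinations, product

def nonempty_subsets(items):
    items = list(items)
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]

def disc_R(g, Lam, A, B):
    """Discrepancy of g with respect to the rectangle R = A x B (uniform U, V)."""
    signed = sum((-1) ** g(u, v) for u in A for v in B)
    return Fraction(abs(signed), len(Lam) ** 2)

def disc(g, Lam):
    """Maximum of disc_R(g) over all rectangles; exponential time, tiny Lam only."""
    subsets = nonempty_subsets(Lam)
    return max(disc_R(g, Lam, A, B) for A in subsets for B in subsets)

def inner_product(x, y):
    return sum(a & b for a, b in zip(x, y)) % 2

Lam = list(product((0, 1), repeat=2))     # Lambda = {0,1}^2
value = disc(inner_product, Lam)
```

The enumeration ranges over all pairs of non-empty subsets, so it is only feasible when $\left|\Lambda\right|$ is a small constant.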

Let $g:\Lambda\times\Lambda\to\left\{0,1\right\}$ be a function with discrepancy at most $\left|\Lambda\right|^{-\eta}$ . Such functions $g$ satisfy the following “extractor-like” property. In what follows, the parameter $\lambda$ controls $\mathrm{bias}\left(g(X,Y)\right)$ .

Lemma 2.9.

Let $X,Y$ be independent random variables taking values in $\Lambda$ such that $H_{\infty}(X)+H_{\infty}(Y)\geq(2-\eta+\lambda)\cdot\log\left|\Lambda\right|$ . Then,

$$\mathrm{bias}\left(g(X,Y)\right)\leq\left|\Lambda\right|^{-\lambda}.$$

Proof.

By Fact 2.4, it suffices to consider the case where $X$ and $Y$ have flat distributions. Let $A,B\subseteq\Lambda$ be the sets over which $X,Y$ are uniformly distributed, and denote $R\stackrel{{\scriptstyle\rm{def}}}{{=}}A\times B$ . By the assumption on the min-entropies of $X$ and $Y$ , it holds that $\left|R\right|\geq\left|\Lambda\right|^{2-\eta+\lambda}$ .

Let $U,V$ be random variables that are uniformly distributed over $\Lambda$ . Then, $X$ and $Y$ are distributed like $U\mid U\in A$ and $V\mid V\in B$ respectively. It follows that

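$$\mathrm{bias}\left(g(X,Y)\right)=\left|\Pr\left[g(U,V)=0\mid(U,V)\in R\right]-\Pr\left[g(U,V)=1\mid(U,V)\in R\right]\right|=\frac{{\rm disc}_{R}(g)}{\Pr\left[(U,V)\in R\right]}\leq\left|\Lambda\right|^{-\eta}\cdot\frac{\left|\Lambda\right|^{2}}{\left|R\right|}\leq\left|\Lambda\right|^{-\lambda},$$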

as required. ∎

Using Lemma 2.9, we can obtain the following sampling property, which says that with high probability $X$ takes a value $x$ for which $\mathrm{bias}\left(g(x,Y)\right)$ is small. In what follows, the parameter $\lambda$ controls $\mathrm{bias}\left(g(x,Y)\right)$ , the parameter $\gamma$ controls the error probability, and recall that $\eta$ is the parameter that controls the discrepancy of $g$ (i.e., ${\rm disc}(g)\leq 2^{-\eta\cdot b}$ ).

Lemma 2.10.

Let $\gamma,\lambda>0$ . Let $X,Y$ be independent random variables taking values in $\Lambda$ such that

[TABLE]

Then, the probability that $X$ takes a value $x\in\Lambda$ such that

$$\mathrm{bias}\left(g(x,Y)\right)>\left|\Lambda\right|^{-\lambda}$$

is less than $\left|\Lambda\right|^{-\gamma}$ .

Proof.

For every $x\in\Lambda$ , denote

[TABLE]

Using this notation, our goal is to prove that

[TABLE]

We will prove that the probability that $p_{X}>\frac{1}{2}+\frac{1}{2}\left|\Lambda\right|^{-\lambda}$ is less than $\frac{1}{2}\cdot\left|\Lambda\right|^{-\gamma}$ , and a similar proof can be used to show that the probability that $p_{X}<\frac{1}{2}-\frac{1}{2}\left|\Lambda\right|^{-\lambda}$ is less than $\frac{1}{2}\cdot\left|\Lambda\right|^{-\gamma}$ . The required result will then follow by the union bound.

Let $\mathcal{E}\subseteq\Lambda$ be the set of values $x$ such that $p_{x}>\frac{1}{2}+\frac{1}{2}\left|\Lambda\right|^{-\lambda}$ . Suppose for the sake of contradiction that $\Pr\left[X\in\mathcal{E}\right]\geq\frac{1}{2}\cdot\left|\Lambda\right|^{-\gamma}$ . It clearly holds that

[TABLE]

On the other hand, it holds that

[TABLE]

This implies that

[TABLE]

By Lemma 2.9, it follows that

[TABLE]

which contradicts Inequality 3. We reached a contradiction, and therefore the probability that $p_{X}>\frac{1}{2}+\frac{1}{2}\left|\Lambda\right|^{-\lambda}$ is less than $\frac{1}{2}\cdot\left|\Lambda\right|^{-\gamma}$ , as required. ∎

Recall that $g^{\oplus I}$ is the function from $\mathcal{X}^{I}\times\mathcal{Y}^{I}$ to $\left\{0,1\right\}$ that outputs the parity of the string $g^{I}(x,y)$ . We would like to prove results like Lemmas 2.9 and 2.10 for functions of the form $g^{\oplus I}$ . To this end, we use the following XOR lemma for discrepancy due to [LSS08].

Theorem 2.11 ([LSS08]).

Let $m\in\mathbb{N}$ . Then,

[TABLE]

By combining Theorem 2.11 with Lemmas 2.9 and 2.10, we obtain the following results.

Corollary 2.12.

Let $\lambda>0$ , $n\in\mathbb{N}$ and $S\subseteq\left[n\right]$ . Let $X,Y$ be independent random variables taking values in $\Lambda^{S}$ such that

[TABLE]

Then,

[TABLE]

Corollary 2.13.

Let $\gamma,\lambda>0$ , $n\in\mathbb{N}$ and $S\subseteq\left[n\right]$ . Let $X,Y$ be independent random variables taking values in $\Lambda^{S}$ such that

[TABLE]

Then, the probability that $X$ takes a value $x\in\Lambda^{S}$ such that

[TABLE]

is less than $\left|\Lambda\right|^{-\gamma\cdot\left|S\right|}$ .

3 Lifting Machinery

In this section, we set up the machinery we need to prove our main theorem, restated next.

Theorem 1.2.

For every $\eta>0$ there exists $c=O(\frac{1}{\eta^{2}}\cdot\log\frac{1}{\eta})$ such that the following holds: Let $\mathcal{S}$ be a search problem that takes inputs from $\left\{0,1\right\}^{n}$ , and let $g:\left\{0,1\right\}^{b}\times\left\{0,1\right\}^{b}\to\left\{0,1\right\}$ be an arbitrary function such that ${\rm disc}(g)\leq 2^{-\eta\cdot b}$ and such that $b\geq c\cdot\log n$ . Then

[TABLE]

and for every $\varepsilon>0$ it holds that

[TABLE]

where $\varepsilon^{\prime}=\varepsilon+2^{-\eta\cdot b/8}$ .

For the rest of this paper, we fix $\eta>0$ and let $c\in\mathbb{N}$ be some sufficiently large parameter that will be determined later such that $c=O(\frac{1}{\eta^{2}}\cdot\log\frac{1}{\eta})$ . Let $n\in\mathbb{N}$ , and let $g:\left\{0,1\right\}^{b}\times\left\{0,1\right\}^{b}\to\left\{0,1\right\}$ be a function such that ${\rm disc}(g)\leq 2^{-\eta\cdot b}$ and such that $b\geq c\cdot\log n$ . Note that when $n=1$ , the theorem holds trivially, so we may assume that $n\geq 2$ . For convenience, we denote $\Lambda\stackrel{{\scriptstyle\rm{def}}}{{=}}\left\{0,1\right\}^{b}$ and $G\stackrel{{\scriptstyle\rm{def}}}{{=}}g^{n}$ . Throughout the rest of this section, $X$ and $Y$ will always denote random variables whose domain is $\Lambda^{n}$ .

As explained in the discussion of our techniques in the introduction, our simulation argument is based on the idea that as long as the protocol did not transmit too much information about the inputs, their distribution is similar to the uniform distribution. The following definition, due to [GLM+16], formalizes the notion that the protocol did not transmit too much information about an input $X$ .

Definition 3.1.

Let $\delta_{X}>0$ . We say that a random variable $X$ is $\delta_{X}$ -dense if for every $I\subseteq\left[n\right]$ it holds that $H_{\infty}(X_{I})\geq\delta_{X}\cdot b\cdot\left|I\right|$ .
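For intuition, density can be tested by brute force when $n$ and $b$ are tiny. The sketch below assumes that $X$ is uniform over an explicitly listed set, uses 0-indexed coordinates, and encodes blocks as integers in $\left\{0,\ldots,2^{b}-1\right\}$ ; all names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

def min_entropy_of_projection(support, I):
    """H_inf(X_I) for X uniform over `support` (a list of distinct n-tuples)."""
    counts = Counter(tuple(x[i] for i in I) for x in support)
    return -math.log2(max(counts.values()) / len(support))

def is_dense(support, delta, b, tol=1e-9):
    """Check H_inf(X_I) >= delta * b * |I| for every non-empty I."""
    n = len(support[0])
    return all(
        min_entropy_of_projection(support, I) >= delta * b * len(I) - tol
        for r in range(1, n + 1)
        for I in combinations(range(n), r)
    )

b = 2
Lam = range(2 ** b)                          # blocks encoded as integers 0..3
full = [(u, v) for u in Lam for v in Lam]    # X uniform over Lambda^2
diagonal = [(u, u) for u in Lam]             # the two blocks are perfectly correlated
```

The uniform distribution over $\Lambda^{2}$ is $1$ -dense, whereas the correlated example is $\frac{1}{2}$ -dense but not $0.6$ -dense: each block alone has full min-entropy, but the pair has only $b$ bits of min-entropy.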

As explained there, whenever the protocol transmits too much information about a bunch of blocks $(X_{I},Y_{I})$ (where $I\subseteq\left[n\right]$ ), the simulation queries $z_{I}$ and conditions the distribution on $g(X_{I},Y_{I})=z_{I}$ . The following definitions provide a useful way for implementing this argument: restrictions are used to keep track of which bits of $z$ have been queried so far, and the notion of structured variables expresses the desired properties of the distribution of the inputs.

Definition 3.2.

A restriction $\rho$ is a string in $\left\{0,1,*\right\}^{n}$ . We say that a coordinate $i\in\left[n\right]$ is free in $\rho$ if $\rho_{i}=*$ , and otherwise we say that $i$ is fixed. Given a restriction $\rho\in\left\{0,1,*\right\}^{n}$ , we denote by $\mathrm{free}(\rho)$ and $\mathrm{fix}(\rho)$ the set of free and fixed coordinates of $\rho$ respectively. We say that a string $z\in\left\{0,1\right\}^{n}$ is consistent with $\rho$ if $z_{\mathrm{fix}(\rho)}=\rho_{\mathrm{fix}(\rho)}$ .

Intuitively, $\mathrm{fix}(\rho)$ represents the queries that have been made so far, and $\mathrm{free}(\rho)$ represents the coordinates that have not been queried yet.
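Restrictions are simple to represent in code. The following sketch is illustrative only: the tuple encoding, the symbol `STAR` and the helper `fix_coordinates` (which records newly queried bits) are assumptions of the example.

```python
STAR = "*"

def free(rho):
    """Coordinates (1-indexed) that are still free in rho."""
    return {i for i, r in enumerate(rho, start=1) if r == STAR}

def fix(rho):
    """Coordinates (1-indexed) that are fixed in rho."""
    return {i for i, r in enumerate(rho, start=1) if r != STAR}

def is_consistent(z, rho):
    """z agrees with rho on every fixed coordinate."""
    return all(z[i - 1] == rho[i - 1] for i in fix(rho))

def fix_coordinates(rho, z, I):
    """Return a copy of rho in which rho_I is set to z_I."""
    return tuple(z[i - 1] if i in I else r for i, r in enumerate(rho, start=1))

rho = (STAR, 1, STAR, 0)
z = (0, 1, 1, 0)
rho_after_query = fix_coordinates(rho, z, {3})
```

After the call, coordinate 3 has moved from the free set to the fixed set, which mirrors the bookkeeping of queried bits described above.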

Definition 3.3 (following [GPW17]).

Let $\rho\in\left\{0,1,*\right\}^{n}$ be a restriction, let $\tau>0$ , and let $X,Y$ be independent random variables. We say that $X$ and $Y$ are $(\rho,\tau)$ -structured if there exist $\delta_{X},\delta_{Y}>0$ such that $X_{\mathrm{free}(\rho)}$ and $Y_{\mathrm{free}(\rho)}$ are $\delta_{X}$ -dense and $\delta_{Y}$ -dense respectively, $\delta_{X}+\delta_{Y}\geq\tau$ , and

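$$g^{\mathrm{fix}(\rho)}\left(X_{\mathrm{fix}(\rho)},Y_{\mathrm{fix}(\rho)}\right)=\rho_{\mathrm{fix}(\rho)}.$$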

We can now state our version of the uniform marginals lemma of [GPW17], which formalizes the idea that if $X$ and $Y$ are structured then their distribution is similar to the uniform distribution over $G^{-1}(z)$ . In what follows, the parameter $\gamma$ controls the statistical distance from the uniform distribution, and recall that $\eta$ is the parameter that controls the discrepancy of $g$ (i.e., ${\rm disc}(g)\leq 2^{-\eta\cdot b}$ ).

Lemma 3.4 (Uniform marginals lemma).

There exists a universal constant $h$ such that the following holds: Let $\gamma>0$ , let $\rho$ be a restriction, and let $z\in\left\{0,1\right\}^{n}$ be a string that is consistent with $\rho$ . Let $X,Y$ be independent random variables that are uniformly distributed over sets $\mathcal{X},\mathcal{Y}\subseteq\Lambda^{n}$ respectively, and assume that they are $(\rho,\tau)$ -structured where

[TABLE]

Let $(X^{\prime},Y^{\prime})$ be uniformly distributed over $G^{-1}(z)\cap(\mathcal{X}\times\mathcal{Y})$ . Then, $X$ and $Y$ are $2^{-\gamma\cdot b}$ -close to $X^{\prime}$ and $Y^{\prime}$ respectively.

Remark 3.5.

Here, as well as in other claims in the paper, we denote by $h$ some constant that is large enough to make the proofs go through, and does not depend on any other parameter. The constant $h$ can be calculated explicitly, and we only refrain from doing so in order to streamline the presentation. In all the cases where we use this constant, it can be chosen to be at most $50$ .

We defer the proof of Lemma 3.4 to Section 3.1, and move to discuss the next issue. Recall that in order for $X$ and $Y$ to be structured, the random variables $X_{\mathrm{free}(\rho)}$ and $Y_{\mathrm{free}(\rho)}$ have to be dense. However, as the simulation progresses and the protocol transmits information, this property may be violated, and $X_{\mathrm{free}(\rho)}$ or $Y_{\mathrm{free}(\rho)}$ may cease to be dense. In order to restore the density, we use the following folklore fact.

Proposition 3.6.

Let $X$ be a random variable, let $\delta_{X}>0$ , and let $I\subseteq\left[n\right]$ be a maximal subset of coordinates such that $H_{\infty}(X_{I})<\delta_{X}\cdot b\cdot\left|I\right|$ . Let $x_{I}\in\Lambda^{I}$ be a value such that

$$\Pr\left[X_{I}=x_{I}\right]>2^{-\delta_{X}\cdot b\cdot\left|I\right|}.$$

Then, the random variable $X_{\left[n\right]-I}\mid X_{I}=x_{I}$ is $\delta_{X}$ -dense.

Proof.

Assume for the sake of contradiction that $X_{\left[n\right]-I}\mid X_{I}=x_{I}$ is not $\delta_{X}$ -dense. Then, there exists a non-empty set $J\subseteq\left[n\right]-I$ such that $H_{\infty}(X_{J}\mid X_{I}=x_{I})<\delta_{X}\cdot b\cdot\left|J\right|$ . In particular, there exists a value $x_{J}\in\Lambda^{J}$ such that

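$$\Pr\left[X_{J}=x_{J}\mid X_{I}=x_{I}\right]>2^{-\delta_{X}\cdot b\cdot\left|J\right|}.$$

But this implies that

$$\Pr\left[X_{I}=x_{I}\text{ and }X_{J}=x_{J}\right]=\Pr\left[X_{I}=x_{I}\right]\cdot\Pr\left[X_{J}=x_{J}\mid X_{I}=x_{I}\right]>2^{-\delta_{X}\cdot b\cdot\left|I\right|}\cdot 2^{-\delta_{X}\cdot b\cdot\left|J\right|}=2^{-\delta_{X}\cdot b\cdot\left|I\cup J\right|},$$

which means that

$$H_{\infty}(X_{I\cup J})<\delta_{X}\cdot b\cdot\left|I\cup J\right|.$$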

However, this contradicts the maximality of $I$ . ∎
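The argument can be traced on a toy example. The sketch below assumes that $X$ is uniform over an explicitly listed set, uses 0-indexed coordinates, picks a violating set of largest cardinality (which is in particular maximal), and takes $x_{I}$ to be a most likely value of $X_{I}$ , which then has probability greater than $2^{-\delta_{X}\cdot b\cdot\left|I\right|}$ . All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

EPS = 1e-9

def projection_counts(support, I):
    return Counter(tuple(x[i] for i in I) for x in support)

def min_entropy(support, I):
    """H_inf(X_I) for X uniform over `support`."""
    return -math.log2(max(projection_counts(support, I).values()) / len(support))

def violating_sets(support, coords, delta, b):
    """Non-empty I within coords such that H_inf(X_I) < delta * b * |I|."""
    return [I for r in range(1, len(coords) + 1)
            for I in combinations(coords, r)
            if min_entropy(support, I) < delta * b * len(I) - EPS]

def restore_density(support, n, delta, b):
    """Condition on X_I = x_I for a maximal violating I; returns (I, x_I, new support)."""
    bad = violating_sets(support, tuple(range(n)), delta, b)
    if not bad:
        return (), (), list(support)           # X is already dense
    I = max(bad, key=len)                       # largest violating set is maximal
    x_I, _count = projection_counts(support, I).most_common(1)[0]
    conditioned = [x for x in support if tuple(x[i] for i in I) == x_I]
    return I, x_I, conditioned

# n = 3 blocks of b = 1 bit; the first block is stuck at 0, the others are uniform.
n, b, delta = 3, 1, 0.5
support = [(0, u, v) for u in (0, 1) for v in (0, 1)]
I, x_I, conditioned = restore_density(support, n, delta, b)
rest = tuple(i for i in range(n) if i not in I)
still_violating = violating_sets(conditioned, rest, delta, b)
```

In this example only the first block violates density, and after conditioning on its value the remaining two blocks have no violating subset, as the proposition predicts.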

Proposition 3.6 is useful in the deterministic setting, since in this setting the simulation is free to condition the distributions of $X,Y$ in any way that maintains their density. However, in the randomized setting, the simulation is more restricted, and cannot condition the inputs on events such as $X_{I}=x_{I}$ which may have very low probability. In [GPW17], this issue was resolved by observing that the probability space can be partitioned into disjoint events of the form $X_{I}=x_{I}$ , and that the randomized simulation can use such a partition to achieve the same effect as Proposition 3.6. This leads to the following lemma, which we use as well.

Lemma 3.7 (Density-restoring partition [GPW17]).

Let $X$ be a random variable, let $\mathcal{X}$ denote the support of $X$ , and let $\delta_{X}>0$ . Then, there exists a partition

[TABLE]

where every $\mathcal{X}^{j}$ is associated with a set $I_{j}\subseteq\left[n\right]$ and a value $x_{j}\in\Lambda^{I_{j}}$ such that:

•

$X_{I_{j}}\mid X\in\mathcal{X}^{j}$ is fixed to $x_{j}$ .

•

$X_{\left[n\right]-I_{j}}\mid X\in\mathcal{X}^{j}$ is $\delta_{X}$ -dense.

Moreover, if we denote $p_{\geq j}\stackrel{{\scriptstyle\rm{def}}}{{=}}\Pr\left[X\in\mathcal{X}^{j}\cup\ldots\cup\mathcal{X}^{\ell}\right]$ , then it holds that

[TABLE]

We turn to discuss the “conditioning issue” that was discussed in the overview of our techniques in the introduction, and its resolution. As mentioned above, the simulation uses Proposition 3.6 and Lemma 3.7 to restore the density of the inputs by conditioning some of the blocks. Specifically, suppose, for example, that $X_{\mathrm{free}(\rho)}$ is no longer dense. Then, the simulation chooses appropriate $I\subseteq\mathrm{free}(\rho)$ and $x_{I}\in\Lambda^{I}$ , and conditions $X$ on the event $X_{I}=x_{I}$ . At this point, in order to make $X$ and $Y$ structured again, we need to remove $I$ from $\mathrm{free}(\rho)$ , so the simulation queries the bits in $z_{I}$ , and updates the restriction $\rho$ by setting $\rho_{I}=z_{I}$ . Now, we have to make sure that $g(X_{I},Y_{I})=z_{I}$ . To this end, the simulation conditions $Y$ on the event $g(x_{I},Y_{I})=z_{I}$ . However, the latter conditioning reveals information about $Y$ , which may have two harmful effects:

•

Leaking: As discussed in the overview of our techniques in the introduction, our analysis of the query complexity assumes that the protocol transmits at most $C$ bits of information. It is important not to reveal more information than that, or otherwise our query complexity may increase arbitrarily. On average, we expect that conditioning on the event $g(x_{I},Y_{I})=z_{I}$ would reveal only $\left|I\right|$ bits of information, which is sufficiently small for our purposes. However, there could be values of $x_{I}$ and $z_{I}$ for which much more information is leaked. In this case, we say the conditioning is leaking.

•

Sparsifying: Even if the conditioning reveals only $\left|I\right|$ bits of information on $Y$ , this could still ruin the density of $Y$ if the set $I$ is large. In this case, we say that the conditioning is sparsifying.

This is the “conditioning issue”, and dealing with it is the technical core of the paper. As explained in the overview of our techniques in the introduction, the simulation deals with this issue by recognizing in advance which values of $X$ are “dangerous”, in the sense that they may lead to a bad conditioning, and discarding them before such conditioning may take place. The foregoing discussion leads to the following definition of a dangerous value.

Definition 3.8.

Let $Y$ be a random variable taking values from $\Lambda^{n}$ . We say that a value $x\in\Lambda^{n}$ is leaking if there exists a set $I\subseteq\left[n\right]$ and an assignment $z_{I}\in\left\{0,1\right\}^{I}$ such that

[TABLE]

Let $\delta_{Y},\varepsilon>0$ , and suppose that $Y$ is $\delta_{Y}$ -dense. We say that a value $x\in\Lambda^{n}$ is $\varepsilon$ -sparsifying if there exists a set $I\subseteq\left[n\right]$ and an assignment $z_{I}\in\left\{0,1\right\}^{I}$ such that the random variable

[TABLE]

is not $(\delta_{Y}-\varepsilon)$ -dense. We say that a value $x\in\Lambda^{n}$ is $\varepsilon$ -dangerous if it is either leaking or $\varepsilon$ -sparsifying.

We can now state our main technical lemma, which says that $X$ has only a small probability to take a dangerous value. This allows the simulation to discard such values and resolve the conditioning issue. In what follows, the parameter $\gamma$ controls the error probability, and recall that $\eta$ is the parameter that controls the discrepancy of $g$ (i.e., ${\rm disc}(g)\leq 2^{-\eta\cdot b}$ ).

Lemma 3.9 (Main lemma).

There exists a universal constant $h$ (see 3.5 for further explanation on the constant $h$ ) such that the following holds: Let $0<\gamma,\varepsilon,\tau\leq 1$ be such that $\tau\geq 2+\frac{h}{c\cdot\varepsilon}-\eta+\gamma$ and $\varepsilon\geq\frac{4}{b}$ , and let $X,Y$ be $(\rho,\tau)$ -structured random variables. Then, the probability that $X_{\mathrm{free}(\rho)}$ takes a value that is $\varepsilon$ -dangerous for $Y_{\mathrm{free}(\rho)}$ is at most $2^{-\gamma\cdot b}$ .

3.1 Proof of the uniform marginals lemma

Recall that the random variables $X$ and $Y$ are $(\rho,\tau)$ -structured if there exist $\delta_{X},\delta_{Y}>0$ such that $X_{\mathrm{free}(\rho)}$ and $Y_{\mathrm{free}(\rho)}$ are $\delta_{X}$ -dense and $\delta_{Y}$ -dense respectively, $\delta_{X}+\delta_{Y}\geq\tau$ , and $g^{\mathrm{fix}(\rho)}\left(X_{\mathrm{fix}(\rho)},Y_{\mathrm{fix}(\rho)}\right)=\rho_{\mathrm{fix}(\rho)}$ . In this section we prove the uniform marginals lemma, restated next.

Lemma 3.4.

There exists a universal constant $h$ such that the following holds: Let $\gamma>0$ , let $\rho$ be a restriction, and let $z\in\left\{0,1\right\}^{n}$ be a string that is consistent with $\rho$ . Let $X,Y$ be independent random variables that are uniformly distributed over sets $\mathcal{X},\mathcal{Y}\subseteq\Lambda^{n}$ respectively, and assume that they are $(\rho,\tau)$ -structured where

[TABLE]

Let $(X^{\prime},Y^{\prime})$ be uniformly distributed over $G^{-1}(z)\cap(\mathcal{X}\times\mathcal{Y})$ . Then, $X$ and $Y$ are $2^{-\gamma\cdot b}$ -close to $X^{\prime}$ and $Y^{\prime}$ respectively.

In order to prove 3.4, we first prove the following proposition, which says that the string $g^{\mathrm{free}(\rho)}(X_{\mathrm{free}(\rho)},Y_{\mathrm{free}(\rho)})$ is close to the uniform distribution in a very strong sense. In what follows, the parameter $\gamma$ controls the distance from the uniform distribution, and recall that $\eta$ is the parameter that controls the discrepancy of $g$ (i.e., ${\rm disc}(g)\leq 2^{-\eta\cdot b}$ ).

Proposition 3.10 (Generalization of [GLM+16, Lemma 13]).

There exists a universal constant $h$ such that the following holds: Let $\gamma>0$ . Let $X,Y$ be random variables that are $(\rho,\tau)$ -structured for $\tau\geq 2+\frac{h}{c}-\eta+\gamma$ , and let $I\stackrel{{\scriptstyle\rm{def}}}{{=}}\mathrm{free}(\rho)$ . Then, for every $z_{I}\in\left\{0,1\right\}^{I}$ it holds that

[TABLE]

Proof.

Let $h\stackrel{{\scriptstyle\rm{def}}}{{=}}8$ . We use 2.12 to upper bound the biases of $g^{I}(X_{I},Y_{I})$ , and then apply Vazirani’s lemma to show that it is close to the uniform distribution. Let $S\subseteq I$ . By assumption, the variables $X_{I},Y_{I}$ are $\delta_{X}$ -dense and $\delta_{Y}$ -dense for some $\delta_{X},\delta_{Y}$ for which $\delta_{X}+\delta_{Y}\geq 2+\frac{8}{c}-\eta+\gamma$ . Therefore, it holds that

[TABLE]

and 2.12 implies (with $\gamma=\gamma+\frac{2}{c}$ ) that

[TABLE]

Since the latter inequality holds for every $S\subseteq I$ , it follows by 2.5 that

[TABLE]

for every $z_{I}\in\left\{0,1\right\}^{I}$ , as required. ∎
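The step from small biases to near-uniformity rests on a standard Fourier identity: for a random variable $Z$ over $\left\{0,1\right\}^{m}$, the deviation $\left|\Pr\left[Z=z\right]-2^{-m}\right|$ is at most $2^{-m}$ times the sum of the absolute biases of all non-empty parities of $Z$. The sketch below checks this numerically on a small explicit distribution. It illustrates the general principle only; the exact statement and constants of 2.5 are not reproduced here, and the function names are hypothetical.

```python
from itertools import product


def bias(dist, S):
    """E[(-1)^(XOR of Z_i for i in S)], for dist mapping bit-tuples to probabilities."""
    return sum(p * (-1) ** (sum(z[i] for i in S) % 2) for z, p in dist.items())


def max_deviation_from_uniform(dist, m):
    """Largest pointwise distance between dist and the uniform distribution on {0,1}^m."""
    return max(abs(dist.get(z, 0.0) - 2.0 ** -m) for z in product((0, 1), repeat=m))


def fourier_bound(dist, m):
    """2^-m times the sum of |bias(S)| over all non-empty S."""
    total = 0.0
    for mask in product((0, 1), repeat=m):
        S = [i for i in range(m) if mask[i]]
        if S:
            total += abs(bias(dist, S))
    return total * 2.0 ** -m
```

If every non-empty parity has bias at most $\beta$, the bound is at most $\beta$, so each string has probability within $\beta$ of $2^{-m}$.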

We turn to prove the uniform marginals lemma.

Proof of 3.4

Let $h^{\prime}$ be the universal constant of 3.10 and let $h\stackrel{{\scriptstyle\rm{def}}}{{=}}h^{\prime}+2$ . Let $(X^{\prime},Y^{\prime})$ be uniformly distributed over $G^{-1}(z)\cap(\mathcal{X}\times\mathcal{Y})$ , and let $I\stackrel{{\scriptstyle\rm{def}}}{{=}}\mathrm{free}(\rho)$ . We prove that $X$ is $2^{-\gamma\cdot b}$ -close to $X^{\prime}$ , and a similar argument works for $Y$ . Let $\mathcal{E}\subseteq\mathcal{X}$ be any test event. We show that

[TABLE]

Without loss of generality we may assume that $\Pr\left[X\in\mathcal{E}\right]\geq\frac{1}{2}$ , since otherwise we can replace $\mathcal{E}$ with its complement. Since $X$ and $Y$ are $(\rho,\tau)$ -structured where

[TABLE]

3.10 implies that

[TABLE]

Moreover, since $\Pr\left[X\in\mathcal{E}\right]\geq\frac{1}{2}$ , conditioning on $\mathcal{E}$ cannot decrease the density of $X$ by more than $\frac{1}{b}$ (since this conditioning increases any probability by a factor of at most $2$ ). Therefore $X\mid\mathcal{E}$ and $Y$ together are $(\rho,\tau-\frac{1}{b})$ -structured, where

[TABLE]

Hence, 3.10 implies that

[TABLE]

Now, it holds that

[TABLE]

A similar calculation shows that

[TABLE]

It follows that

[TABLE]

as required. ∎

3.2 Proof of the main technical lemma

In this section we prove our main technical lemma, which upper bounds the probability of a variable to take a dangerous value. We first recall the definition of a dangerous value and the lemma.

Definition 3.8.

Let $Y$ be a random variable taking values from $\Lambda^{n}$ . We say that a value $x\in\Lambda^{n}$ is leaking if there exists a set $I\subseteq\left[n\right]$ and an assignment $z_{I}\in\left\{0,1\right\}^{I}$ such that

$$\Pr\left[g^{I}(x_{I},Y_{I})=z_{I}\right]<2^{-\left|I\right|-1}.$$

Let $\delta_{Y},\varepsilon>0$ , and suppose that $Y$ is $\delta_{Y}$ -dense. We say that a value $x\in\Lambda^{n}$ is $\varepsilon$ -sparsifying if there exists a set $I\subseteq\left[n\right]$ and an assignment $z_{I}\in\left\{0,1\right\}^{I}$ such that the random variable

$$Y_{\left[n\right]-I}\mid g^{I}(x_{I},Y_{I})=z_{I}$$

is not $(\delta_{Y}-\varepsilon)$ -dense. We say that a value $x\in\Lambda^{n}$ is $\varepsilon$ -dangerous if it is either leaking or $\varepsilon$ -sparsifying.

Lemma 3.9.

There exists a universal constant $h$ such that the following holds: Let $0<\gamma,\varepsilon,\tau\leq 1$ be such that $\tau\geq 2+\frac{h}{c\cdot\varepsilon}-\eta+\gamma$ and $\varepsilon\geq\frac{4}{b}$ , and let $X,Y$ be $(\rho,\tau)$ -structured random variables. Then, the probability that $X_{\mathrm{free}(\rho)}$ takes a value that is $\varepsilon$ -dangerous for $Y_{\mathrm{free}(\rho)}$ is at most $2^{-\gamma\cdot b}$ .

Let $h$ be a universal constant that will be chosen to be sufficiently large to make the inequalities in the proof hold. Let $\gamma,\varepsilon,\tau,\rho$ be as in the lemma, and assume that $X,Y$ are $(\rho,\tau)$ -structured. For simplicity of the presentation, we assume that all the coordinates of $\rho$ are free; this can be assumed without loss of generality since the fixed coordinates of $\rho$ do not play any part in the lemma. Thus, our goal is to prove an upper bound on the probability that $X$ takes a value that is dangerous for $Y$ . By assumption, there exist some parameters $\delta_{X},\delta_{Y}>0$ such that $X$ and $Y$ are $\delta_{X}$ -dense and $\delta_{Y}$ -dense respectively, and such that $\delta_{X}+\delta_{Y}\geq 2+\frac{h}{c\cdot\varepsilon}-\eta+\gamma$ .

We start by discussing the high-level ideas that underlie the proof. We would like to prove an upper bound on the probability that $X$ takes a value that is either leaking or sparsifying. Proving the upper bound for leaking values is relatively easy and is similar to the proof of 3.10: basically, since $X_{I}$ and $Y_{I}$ are sufficiently dense, the string $g^{I}(X_{I},Y_{I})$ is multiplicatively close to uniform, which implies that most values $x_{I}$ are non-leaking.

The more difficult task is to prove the upper bound for sparsifying values. Basically, a value $x$ is sparsifying if for some disjoint $I,J\subseteq\mathrm{free}(\rho)$ , conditioning on the value of $g^{I}(x_{I},Y_{I})$ decreases the min-entropy of $Y_{J}$ by more than $\varepsilon\cdot b\cdot\left|J\right|$ bits. Our first step is to apply Bayes’ formula to the latter condition, thus obtaining a more convenient condition to which we refer as “skewing”: a value $x$ is skewing if conditioning on the value of $Y_{J}$ decreases the min-entropy of $g^{I}(x_{I},Y_{I})$ by more than $\varepsilon\cdot b\cdot\left|J\right|$ bits — in other words, the min-entropy of $g^{I}(x_{I},Y_{I})$ conditioned on $Y_{J}$ should be less than $\left|I\right|-\varepsilon\cdot b\cdot\left|J\right|$ (roughly).

It remains to prove an upper bound on the probability that $X$ takes a skewing value. This requires proving a lower bound of roughly $\left|I\right|-\varepsilon\cdot b\cdot\left|J\right|$ on the min-entropy of $g^{I}(x_{I},Y_{I})\mid Y_{J}$ for most $x$ ’s. By the min-entropy version of Vazirani’s lemma (2.6), in order to prove this lower bound, it suffices to prove an upper bound on the bias of $g^{S}(x_{S},Y_{S})\mid Y_{J}$ for every set $S\subseteq I$ for which $\left|S\right|\gtrapprox\varepsilon\cdot c\cdot\left|J\right|$ (recall that $c$ is a large constant such that $b\geq c\cdot\log n$ ).

To this end, we use the “extractor-like” property of $g^{S}$ : recall that by the discrepancy of $g$ (2.13), the bias of $g^{S}(x_{S},Y_{S})\mid Y_{J}$ is small for most $x$ ’s whenever the min-entropy of $X_{S}$ and $Y_{S}\mid Y_{J}$ is high. Furthermore, recall that the min-entropy of $X_{S}$ and $Y_{S}$ is high since we assumed that $X$ and $Y$ are dense. The key step is to observe that the min-entropy of $Y_{S}\mid Y_{J}$ is still high, since $S$ is large compared to $J$ . Thus, the min-entropy of $X_{S}$ and $Y_{S}\mid Y_{J}$ is high, so the bias of $g^{S}(x_{S},Y_{S})\mid Y_{J}$ is small, and this implies the desired lower bound on the min-entropy of $g^{I}(x_{I},Y_{I})\mid Y_{J}$ .

The argument we explained above almost works, except for a small issue: We said that $H_{\infty}(Y_{S}\mid Y_{J})$ is still high, since $S$ is large compared to $J$ . Here, we implicitly assumed that conditioning on the value of $Y_{J}$ decreases the min-entropy of $Y_{S}$ by roughly $\left|J\right|\cdot b$ bits. This assumption is true for the average value of $Y_{J}$ , but may fail for values of $Y_{J}$ that have a very small probability. In order to deal with such values, we define a parameter $e_{y_{J}}$ which measures the “excess entropy” of $y_{J}$ , and keep track of it throughout the proof. The key observation is that if we consider a value $y_{J}$ that has a small probability, then the criterion of “skewing” actually requires the min-entropy of $g^{I}(x_{I},Y_{I})$ to decrease by roughly $\varepsilon\cdot b\cdot\left|J\right|+e_{y_{J}}$ . Intuitively, this means that the smaller the probability of $y_{J}$ , the harder it becomes for $x$ to be skewing. After propagating the additional term of $e_{y_{J}}$ throughout our proof, we get that the set $S$ can be assumed to satisfy

[TABLE]

This makes the set $S$ sufficiently large compared to $e_{y_{J}}$ that we can still deduce that $Y_{S}\mid Y_{J}=y_{J}$ has high min-entropy, which finishes the argument. We now turn to provide the formal proof, starting with a formal definition of the parameter $e_{y_{J}}$ and the criterion of “skewing”.

Definition 3.11.

Recall that since $Y$ is $\delta_{Y}$ -dense, it holds that $\Pr\left[Y_{J}=y_{J}\right]\leq 2^{-\delta_{Y}\cdot b\cdot\left|J\right|}$ for every $J\subseteq\left[n\right]$ and $y_{J}\in\Lambda^{J}$ . We denote by $e_{y_{J}}\in\mathbb{R}$ the (non-negative) number that satisfies

$$\Pr\left[Y_{J}=y_{J}\right]=2^{-\delta_{Y}\cdot b\cdot\left|J\right|-e_{y_{J}}}.$$

We say that a value $x\in\Lambda^{n}$ is $\varepsilon$ -skewing if there exist disjoint non-empty sets $I,J\subseteq\left[n\right]$ and a value $y_{J}\in\Lambda^{J}$ such that

[TABLE]

Next, we show that every dangerous value must be either leaking or skewing by applying Bayes’ formula.

Claim 3.12.

Let $x\in\Lambda^{n}$ be an $\varepsilon$ -dangerous value that is not leaking for $Y$ . Then $x$ is $\varepsilon$ -skewing.

Proof.

Suppose that $x$ is $\varepsilon$ -dangerous for $Y$ and that it is not leaking. We prove that $x$ is $\varepsilon$ -skewing. By our assumption, $x$ must be $\varepsilon$ -sparsifying, so there exists a set $I\subseteq\left[n\right]$ and an assignment $z_{I}\in\left\{0,1\right\}^{I}$ such that the random variable

$$Y_{\left[n\right]-I}\mid g^{I}(x_{I},Y_{I})=z_{I}$$

is not $(\delta_{Y}-\varepsilon)$ -dense. Thus, there exists a set $J\subseteq\left[n\right]-I$ and a value $y_{J}\in\Lambda^{J}$ such that

[TABLE]

By Bayes’ formula, it holds that

[TABLE]

Hence, it follows that

[TABLE]

which implies that

[TABLE]

This means that

[TABLE]

That is, $x$ is $\varepsilon$ -skewing, as required. ∎
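The Bayes step of this proof can be sketched as the following worked calculation. It assumes three things read off from the surrounding text rather than from the original displayed equations: that the violation of $(\delta_{Y}-\varepsilon)$ -density means $\Pr\left[Y_{J}=y_{J}\mid g^{I}(x_{I},Y_{I})=z_{I}\right]>2^{-(\delta_{Y}-\varepsilon)\cdot b\cdot\left|J\right|}$ , that a non-leaking $x$ satisfies $\Pr\left[g^{I}(x_{I},Y_{I})=z_{I}\right]\geq 2^{-\left|I\right|-1}$ , and that $e_{y_{J}}$ is defined by $\Pr\left[Y_{J}=y_{J}\right]=2^{-\delta_{Y}\cdot b\cdot\left|J\right|-e_{y_{J}}}$ . The exact constants in the paper's definition of skewing may differ.

```latex
\begin{align*}
\Pr\left[g^{I}(x_{I},Y_{I})=z_{I}\mid Y_{J}=y_{J}\right]
 &= \frac{\Pr\left[Y_{J}=y_{J}\mid g^{I}(x_{I},Y_{I})=z_{I}\right]
          \cdot\Pr\left[g^{I}(x_{I},Y_{I})=z_{I}\right]}
         {\Pr\left[Y_{J}=y_{J}\right]} \\
 &> \frac{2^{-(\delta_{Y}-\varepsilon)\cdot b\cdot\left|J\right|}\cdot 2^{-\left|I\right|-1}}
         {2^{-\delta_{Y}\cdot b\cdot\left|J\right|-e_{y_{J}}}} \\
 &= 2^{\varepsilon\cdot b\cdot\left|J\right|+e_{y_{J}}-\left|I\right|-1}.
\end{align*}
```

Under these assumptions, the min-entropy of $g^{I}(x_{I},Y_{I})$ conditioned on $Y_{J}=y_{J}$ is less than $\left|I\right|+1-\varepsilon\cdot b\cdot\left|J\right|-e_{y_{J}}$ , which matches the informal description of skewing given in the overview.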

As explained above, we will upper bound the probability of dangerous values by upper bounding the biases of $g^{S}(x_{S},Y_{S})\mid Y_{J}$ for every $S\subseteq I$ . To this end, it is convenient to define the notion of a “biasing value”, which is a value $x$ for which one of the biases is too large.

Definition 3.13.

We say that a value $x\in\Lambda^{n}$ is biasing (for $Y$ ) with respect to disjoint sets $S,J\subseteq\left[n\right]$ and an assignment $y_{J}\in\Lambda^{J}$ if

[TABLE]

We say that $x$ is $\varepsilon$ -biasing (for $Y$ ) with respect to a set $S\subseteq\left[n\right]$ if there exists a set $J\subseteq\left[n\right]-S$ and an assignment $y_{J}\in\Lambda^{J}$ that satisfy

[TABLE]

such that $x$ is biasing with respect to $S$ , $J$ , and $y_{J}$ (if $J$ is the empty set, we define $e_{y_{J}}=0$ ). Finally, we say that $x$ is $\varepsilon$ -biasing (for $Y$ ) if there exists a non-empty set $S$ with respect to which $x$ is $\varepsilon$ -biasing.

We now apply the min-entropy version of Vazirani’s lemma to show that values that are not biasing are not dangerous.

Claim 3.14.

If a value $x\in\Lambda^{n}$ is not $\varepsilon$ -biasing for $Y$ then it is not $\varepsilon$ -dangerous for $Y$ .

Proof.

Suppose that $x\in\Lambda^{n}$ is a value that is not $\varepsilon$ -biasing for $Y$ . We prove that $x$ is not $\varepsilon$ -dangerous for $Y$ . We start by proving that $x$ is not leaking. Let $I\subseteq\left[n\right]$ and let $z_{I}\in\left\{0,1\right\}^{I}$ . We wish to prove that

$$\Pr\left[g^{I}(x_{I},Y_{I})=z_{I}\right]\geq 2^{-\left|I\right|-1}.$$

Observe that, by the assumption that $x$ is not $\varepsilon$ -biasing, it holds for every non-empty set $S\subseteq I$ that

[TABLE]

(this follows by substituting $J=\emptyset$ in the definition of $\varepsilon$ -biasing and noting that in this case $e_{y_{J}}=0$ ). It now follows from 2.6 that $\Pr\left[g^{I}(x_{I},Y_{I})=z_{I}\right]\geq 2^{-\left|I\right|-1}$ , as required.

We turn to prove that $x$ is not $\varepsilon$ -skewing. Let $I,J\subseteq\left[n\right]$ be disjoint sets and let $y_{J}\in\Lambda^{J}$ be an assignment. We wish to prove that

[TABLE]

By 2.6, it suffices to prove that for every set $S\subseteq I$ such that $\left|S\right|\geq\frac{\varepsilon\cdot b\cdot\left|J\right|+e_{y_{J}}+2}{\log n}$ it holds that

[TABLE]

To this end, observe that every such set $S$ satisfies

[TABLE]

and since by assumption $x$ is not $\varepsilon$ -biasing with respect to $S$ , the required upper bound on the bias must hold. It follows that $x$ is neither leaking nor $\varepsilon$ -skewing, and therefore it is not $\varepsilon$ -dangerous, as required. ∎

Finally, we prove an upper bound on the probability of $X$ to take an $\varepsilon$ -biasing value, which together with 3.14 implies 3.9. As explained above, the idea is to combine the discrepancy of $g$ with the observation that $X_{S}$ and $Y_{S}$ have large min-entropy even conditioned on $Y_{J}=y_{J}$ (which holds since $X,Y$ are dense and $S$ is large compared to $\left|J\right|$ and $e_{y_{J}}$ ).

Proposition 3.15.

The probability that $X$ takes a value $x$ that is $\varepsilon$ -biasing for $Y$ is at most $2^{-\gamma\cdot b}$ .

Proof.

We begin with upper bounding the probability of $X$ to take a value that is $\varepsilon$ -biasing with respect to specific choices of $S$ , $J$ , and $y_{J}$ , and the rest of the proof will follow by applying union bounds over all possible choices of $S$ , $J$ , and $y_{J}$ . Let $S,J\subseteq\left[n\right]$ be disjoint sets and let $y_{J}\in\Lambda^{J}$ be an assignment such that $S$ , $J$ , and $y_{J}$ together satisfy Equation 4, i.e.,

[TABLE]

For simplicity, we assume that $J$ is non-empty (in the case where $J$ is empty, the argument is similar but simpler). Since we assumed that $\varepsilon\geq\frac{4}{b}$ and that $J$ is non-empty, it holds that $\frac{1}{2}\cdot c\cdot\varepsilon\cdot\left|J\right|\geq\frac{2}{\log n}$ and therefore

[TABLE]

In other words, it holds that

[TABLE]

By assumption, $Y$ is $\delta_{Y}$ -dense, so $H_{\infty}(Y_{S})\geq\delta_{Y}\cdot b\cdot\left|S\right|$ . By 2.2, it follows that

[TABLE]

Moreover, $X$ is $\delta_{X}$ -dense and thus

[TABLE]

where the second inequality is made to hold by choosing $h$ to be sufficiently large. It follows by 2.13 (with $\lambda=\frac{3}{c\cdot\varepsilon}$ and $\gamma=\gamma+\frac{5}{c\cdot\varepsilon}$ ) that the probability that $X_{S}$ takes a value $x_{S}\in\Lambda^{S}$ for which

[TABLE]

is at most $2^{-\left(\gamma+\frac{5}{c\cdot\varepsilon}\right)\cdot b\cdot\left|S\right|}$ .

We turn to applying the union bounds. First, we show that for every $S\subseteq\left[n\right]$ , the probability that $X$ takes a value that is $\varepsilon$ -biasing with respect to $S$ is at most $2^{-(\gamma+\frac{2}{c\cdot\varepsilon})\cdot b\cdot\left|S\right|}$ by taking a union bound over all choices of $J$ and $y_{J}$ . Note that we only need to consider sets $J\subseteq\left[n\right]$ for which $\left|J\right|\leq\frac{1}{c\cdot\varepsilon}\cdot\left|S\right|$ . It follows that the probability that $X_{S}$ takes a value that satisfies Equation 6 for some $J$ and $y_{J}$ is at most

[TABLE]

where the last inequality follows since

[TABLE]

The above calculation showed that the probability that $X$ takes a value that is $\varepsilon$ -biasing with respect to a fixed set $S$ is at most $2^{-(\gamma+\frac{2}{c\cdot\varepsilon})\cdot b\cdot\left|S\right|}$ . Taking a union bound over all non-empty sets $S\subseteq\left[n\right]$ , the probability that $X$ takes a value that is $\varepsilon$ -biasing for $Y$ is at most

[TABLE]

We have thus shown that the probability that $X$ takes a value that is $\varepsilon$ -biasing is at most $2^{-\gamma\cdot b}$ , as required. ∎

4 The deterministic lifting theorem

In this section, we prove the deterministic part of our main theorem. In fact, we prove the following more general result.

Theorem 4.1 (Deterministic lifting theorem).

For every $\eta>0$ there exists $c=O(\frac{1}{\eta^{2}})$ such that the following holds: Let $n\in\mathbb{N}$ be such that $n\geq 2$ , let $\Lambda\stackrel{{\scriptstyle\rm{def}}}{{=}}\left\{0,1\right\}^{b}$ be such that $b\geq c\cdot\log n$ , let $g:\Lambda\times\Lambda\to\left\{0,1\right\}$ be a function such that ${\rm disc}(g)\leq 2^{-\eta\cdot b}$ , and let $G=g^{n}$ . Let $\Pi$ be a deterministic protocol that takes inputs in $\Lambda^{n}\times\Lambda^{n}$ and that has communication complexity $C$ and round complexity $r$ . Then, there exists a deterministic parallel decision tree $T$ that on input $z\in\left\{0,1\right\}^{n}$ outputs a transcript $\pi$ of $\Pi$ that is consistent with some pair of inputs $(x,y)\in G^{-1}(z)$ , and that has query complexity $O(\frac{C}{b})$ and depth $r$ .

Observe that this theorem implies the lower bound of the main theorem: Given a protocol $\Pi$ that solves $\mathcal{S}\circ G$ with complexity $C$ , we use the theorem to construct a tree $T$ that on input $z$ outputs the output of $\Pi$ on some pair of inputs in $G^{-1}(z)$ . This tree $T$ clearly solves $\mathcal{S}$ , and the query complexity of $T$ is $O(\frac{C}{b})$ . This implies that $D^{\mathrm{dt}}(\mathcal{S})=O\left(D^{\mathrm{cc}}(\mathcal{S}\circ G)/b\right)$ , or in other words, $D^{\mathrm{cc}}(\mathcal{S}\circ G)=\Omega\left(D^{\mathrm{dt}}(\mathcal{S})\cdot b\right)$ , as required.

For the rest of this section, fix $\Pi$ to be an arbitrary deterministic protocol that takes inputs in $\Lambda^{n}\times\Lambda^{n}$ , and denote by $C$ and $r$ its communication complexity and round complexity respectively. The rest of this section is organized as follows: We first describe the construction of the parallel decision tree $T$ in Section 4.1. We then prove that the output of $T$ is always correct in Section 4.2. Finally, we upper bound the query complexity of $T$ in Section 4.3.

4.1 The construction of $T$

Let $h^{\prime}$ be the maximum among the universal constants of 3.10 and the main technical lemma (3.9), and let $h$ be a universal constant that will be chosen to be sufficiently large to make the inequalities in the proof hold. Let $\varepsilon\stackrel{{\scriptstyle\rm{def}}}{{=}}\frac{h}{c\cdot\eta}$ , let $\delta\stackrel{{\scriptstyle\rm{def}}}{{=}}1-\frac{\eta}{4}+\frac{\varepsilon}{2}$ , and let $\tau\stackrel{{\scriptstyle\rm{def}}}{{=}}2\cdot\delta-\varepsilon$ . The tree $T$ constructs the transcript $\pi$ by simulating the protocol $\Pi$ round-by-round, each time adding a single message to $\pi$ . Throughout the simulation, the tree maintains a rectangle $\mathcal{X}\times\mathcal{Y}\subseteq\Lambda^{n}\times\Lambda^{n}$ of inputs that are consistent with $\pi$ (but not necessarily of all such inputs). In what follows, we denote by $X$ and $Y$ random variables that are uniformly distributed over $\mathcal{X}$ and $\mathcal{Y}$ respectively. The tree will maintain the invariant that $X$ and $Y$ are $(\rho,\tau)$ -structured, where $\rho$ is a restriction that keeps track of the queries the tree has made so far. In fact, the tree will maintain a more specific invariant: whenever it is Alice’s turn to speak, $X_{\mathrm{free}(\rho)}$ is $(\delta-\varepsilon)$ -dense and $Y_{\mathrm{free}(\rho)}$ is $\delta$ -dense, and whenever it is Bob’s turn to speak, the roles of $X$ and $Y$ are reversed.
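As a sanity check on these parameter choices, the following worked calculation (a sketch derived only from the definitions of $\varepsilon$ , $\delta$ and $\tau$ above) shows that $\tau$ does not depend on $\varepsilon$ , that the two densities in the invariant add up to exactly $\tau$ , and what "choosing $h$ to be sufficiently large" amounts to for the condition $\tau\geq 2+\frac{h^{\prime}}{c\cdot\varepsilon}-\eta$ . Conditions that also involve $\gamma$ need additional slack and are not covered by this sketch.

```latex
\begin{align*}
\tau &= 2\cdot\delta-\varepsilon
      = 2\cdot\left(1-\frac{\eta}{4}+\frac{\varepsilon}{2}\right)-\varepsilon
      = 2-\frac{\eta}{2}, \\
(\delta-\varepsilon)+\delta &= 2\cdot\delta-\varepsilon = \tau, \\
\frac{h^{\prime}}{c\cdot\varepsilon} &= \frac{h^{\prime}\cdot\eta}{h}
\quad\Longrightarrow\quad
\left(\tau\geq 2+\frac{h^{\prime}}{c\cdot\varepsilon}-\eta
\iff \frac{\eta}{2}\geq\frac{h^{\prime}\cdot\eta}{h}
\iff h\geq 2\cdot h^{\prime}\right).
\end{align*}
```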

When the tree $T$ starts the simulation, the tree sets the transcript $\pi$ to be the empty string, the restriction $\rho$ to $\left\{*\right\}^{n}$ , and the sets $\mathcal{X},\mathcal{Y}$ to $\Lambda^{n}$ . At this point the invariant clearly holds. We now explain how $T$ simulates a single round of the protocol while maintaining the invariant. Suppose that the invariant holds at the beginning of the current round, and assume without loss of generality that it is Alice’s turn to speak. The tree $T$ performs the following steps:

1. The tree conditions $X_{\mathrm{free}(\rho)}$ on not taking a value that is $\varepsilon$ -dangerous for $Y_{\mathrm{free}(\rho)}$ (i.e., the tree removes from $\mathcal{X}$ all the values $x$ for which $x_{\mathrm{free}(\rho)}$ is $\varepsilon$ -dangerous for $Y_{\mathrm{free}(\rho)}$ ).

2. The tree $T$ chooses an arbitrary message $M$ of Alice with the following property: the probability of Alice sending $M$ on input $X$ is at least $2^{-\left|M\right|}$ (the existence of $M$ will be justified soon). The tree adds $M$ to the transcript $\pi$ , and conditions $X$ on the event of sending $M$ (i.e., the tree sets $\mathcal{X}$ to be the subset of inputs that are consistent with $M$ ).

3. Let $I\subseteq\mathrm{free}(\rho)$ be a maximal set that violates the $\delta$ -density of $X_{\mathrm{free}(\rho)}$ (i.e., $H_{\infty}(X_{I})<\delta\cdot b\cdot\left|I\right|$ ), and let $x_{I}\in\Lambda^{I}$ be a value that satisfies $\Pr\left[X_{I}=x_{I}\right]>2^{-\delta\cdot b\cdot\left|I\right|}$ . The tree conditions $X$ on $X_{I}=x_{I}$ (i.e., the tree removes from $\mathcal{X}$ all the values that are inconsistent with that event). By 3.6, $X_{\mathrm{free}(\rho)-I}$ is now $\delta$ -dense.

4. The tree queries $z_{I}$ , and updates $\rho$ accordingly.

5. The tree conditions $Y$ on $g^{I}(x_{I},Y_{I})=\rho_{I}$ (i.e., the tree sets $\mathcal{Y}$ to be the subset of values $y$ for which $g^{I}(x_{I},y_{I})=\rho_{I}$ ). Due to Step 1, the variable $X_{\mathrm{free}(\rho)}$ must take a value that is not $\varepsilon$ -dangerous, and therefore $Y_{\mathrm{free}(\rho)}$ is necessarily $(\delta-\varepsilon)$ -dense.
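The density-restoration step (Step 3) can be illustrated by brute force on toy instances. The sketch below assumes that $X$ is uniform over an explicit list `support` of $n$ -block tuples, and it picks a violating set of largest cardinality, which is in particular maximal under inclusion. The function names are hypothetical, and the exhaustive search over subsets is only meant for very small $n$ .

```python
from collections import Counter
from itertools import combinations
from math import log2


def min_entropy_of_marginal(support, I):
    """H_inf(X_I) for X uniform over the list `support` of n-block tuples."""
    counts = Counter(tuple(x[i] for i in I) for x in support)
    return -log2(max(counts.values()) / len(support))


def restore_density(support, free, b, delta):
    """Return (I, x_I, new_support) in the spirit of Step 3.

    I is a largest subset of `free` with H_inf(X_I) < delta * b * |I|,
    x_I is a most likely value of X_I, and new_support keeps only the
    inputs consistent with X_I = x_I. If no set violates the density
    requirement, I and x_I are empty and the support is unchanged.
    """
    for size in range(len(free), 0, -1):
        for I in combinations(free, size):
            if min_entropy_of_marginal(support, I) < delta * b * size:
                counts = Counter(tuple(x[i] for i in I) for x in support)
                # The most common value has probability > 2^(-delta*b*|I|).
                x_I, _ = counts.most_common(1)[0]
                kept = [x for x in support if tuple(x[i] for i in I) == x_I]
                return I, x_I, kept
    return (), (), list(support)
```

The returned set `I` is exactly the set of coordinates that the tree would then query in Step 4.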

After those steps take place, it becomes Bob’s turn to speak, and indeed, $X_{\mathrm{free}(\rho)}$ and $Y_{\mathrm{free}(\rho)}$ are $\delta$ -dense and $(\delta-\varepsilon)$ -dense respectively. Thus, the invariant is maintained. When the protocol $\Pi$ stops, the tree $T$ outputs the transcript $\pi$ and halts. In order for the foregoing construction to be well-defined, it remains to explain three points:

•

First, we should explain why the set $\mathcal{X}$ remains non-empty after Step 1 (otherwise, the following steps are not well-defined). To this end, recall that $X$ and $Y$ are $(\rho,\tau)$ -structured and observe that $\tau$ can be made larger than $2+\frac{h^{\prime}}{c\cdot\varepsilon}-\eta$ by choosing $h$ to be sufficiently large (see Section 4.3 for a detailed calculation). Hence, by our main lemma (3.9), the variable $X_{\mathrm{free}(\rho)}$ has a non-zero probability to take a value that is not $\varepsilon$ -dangerous for $Y_{\mathrm{free}(\rho)}$ , so $\mathcal{X}$ is non-empty after this step.

•

Second, we should explain why the message $M$ in Step 2 exists. To see why, observe that the set of possible messages of Alice forms a prefix-free code, since otherwise Bob would not be able to tell when Alice has finished speaking and his turn starts. Hence, by 2.8, it follows that there exists a message $M$ with probability at least $2^{-\left|M\right|}$ .

•

Third, we should explain why the set $\mathcal{Y}$ remains non-empty after Step 5. To this end, recall that $X$ must take a value that is not $\varepsilon$ -dangerous for $Y$ , and in particular, the value of $X$ is necessarily not leaking. This means, in particular, that the string $g^{I}(x_{I},Y_{I})$ has non-zero probability to be equal to $\rho_{I}$ , so $\mathcal{Y}$ is non-empty after this step.
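The existence of the message $M$ in the second point follows from a Kraft-type counting argument: for a prefix-free set of messages, $\sum_{M}2^{-\left|M\right|}\leq 1=\sum_{M}\Pr\left[M\right]$ , so some message must satisfy $\Pr\left[M\right]\geq 2^{-\left|M\right|}$ . The sketch below searches for such a message in an explicit distribution. The representation of messages as bit strings and the function names are assumptions made for illustration.

```python
from fractions import Fraction


def is_prefix_free(messages):
    """True if no message is a proper prefix of (or equal copy clash with) another."""
    return not any(a != b and b.startswith(a) for a in messages for b in messages)


def likely_message(prob):
    """Return a message M with Pr[M] >= 2^(-|M|).

    `prob` maps bit strings (a prefix-free set) to Fraction probabilities
    that sum to 1. Exact rational arithmetic avoids rounding issues.
    """
    if not is_prefix_free(list(prob)):
        raise ValueError("messages must form a prefix-free code")
    for message, p in prob.items():
        if p >= Fraction(1, 2 ** len(message)):
            return message
    raise ValueError("no such message; probabilities must sum to 1")
```

For example, with messages `"0"`, `"10"`, `"11"` of probabilities $\frac{1}{10}$ , $\frac{3}{10}$ , $\frac{6}{10}$ , the message `"0"` is too unlikely for its length, but `"10"` qualifies.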

The depth of $T$ .

We now observe that the depth of $T$ is equal to the round complexity of $\Pi$ . Note that in each round, the tree $T$ issues a set of queries $I$ simultaneously. Thus, $T$ is a parallel decision tree whose depth equals the maximal number of rounds of $\Pi$ , as required.

4.2 The correctness of $T$

We now prove that when the decision tree $T$ halts, the transcript $\pi$ is consistent with some inputs $(x,y)\in G^{-1}(z)$ . Clearly, the transcript $\pi$ is consistent with all the inputs in the rectangle $\mathcal{X}\times\mathcal{Y}$ . Thus, it suffices to show that there exist $x\in\mathcal{X}$ and $y\in\mathcal{Y}$ such that $G(x,y)=z$ . To this end, recall that when the tree halts, the random variables $X$ and $Y$ are $(\rho,\tau)$ -structured. Since $\rho$ is consistent with $z$ , it holds for every $x\in\mathcal{X}$ and $y\in\mathcal{Y}$ that

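$$g^{\mathrm{fix}(\rho)}\left(x_{\mathrm{fix}(\rho)},y_{\mathrm{fix}(\rho)}\right)=\rho_{\mathrm{fix}(\rho)}=z_{\mathrm{fix}(\rho)}.$$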

It remains to deal with the free coordinates of $\rho$ . Since $\tau$ can be made larger than $2+\frac{h^{\prime}}{c}-\eta$ by choosing $h$ to be sufficiently large (see 4.3 for a detailed calculation), it follows by 3.10 that

[TABLE]

In particular, there exist $x\in\mathcal{X}$ and $y\in\mathcal{Y}$ such that

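$$g^{\mathrm{free}(\rho)}\left(x_{\mathrm{free}(\rho)},y_{\mathrm{free}(\rho)}\right)=z_{\mathrm{free}(\rho)}.$$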

By combining the equation for the fixed coordinates with the equation for the free coordinates, we get that $G(x,y)=z$ , as required.

4.3 The query complexity of $T$

We conclude by showing that the total number of queries the tree $T$ makes is at most $O(\frac{C}{b})$ . To this end, we define the deficiency of $X,Y$ to be

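$$2\cdot b\cdot\left|\mathrm{free}(\rho)\right|-H_{\infty}(X_{\mathrm{free}(\rho)})-H_{\infty}(Y_{\mathrm{free}(\rho)}).$$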

We will prove that whenever the protocol transmits a message $M$ , the deficiency increases by $O(\left|M\right|)$ , and that whenever the tree $T$ makes a query, the deficiency is decreased by $\Omega(b)$ . Since the deficiency is always non-negative, and the protocol transmits at most $C$ bits, it will follow that the tree must make at most $O(\frac{C}{b})$ queries. More specifically, we prove that in every round, the first two steps from 4.1 increase the deficiency by at most $\left|M\right|+1$ in total, and the rest of the steps decrease the deficiency by at least $\Omega(\left|I\right|\cdot b)$ , and this will imply the desired result.
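The deficiency bookkeeping can be made concrete on toy instances. The sketch below assumes that the deficiency is $2\cdot b\cdot\left|\mathrm{free}(\rho)\right|-H_{\infty}(X_{\mathrm{free}(\rho)})-H_{\infty}(Y_{\mathrm{free}(\rho)})$ (the three terms referred to in the step-by-step analysis of this section), with $X$ and $Y$ uniform over explicit lists of inputs. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def min_entropy(support, coords):
    """H_inf of the marginal on `coords` of a variable uniform over `support`."""
    counts = Counter(tuple(v[i] for i in coords) for v in support)
    return -log2(max(counts.values()) / len(support))


def deficiency(xs, ys, free, b):
    """2*b*|free| - H_inf(X_free) - H_inf(Y_free) for X, Y uniform over xs, ys."""
    return 2 * b * len(free) - min_entropy(xs, free) - min_entropy(ys, free)
```

With $b=2$ and $n=2$ , the full rectangle $\Lambda^{2}\times\Lambda^{2}$ has deficiency $0$ , and conditioning $X$ on an event of probability $\frac{1}{2}$ (for instance, a one-bit message) raises the deficiency to $1$ .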

Fix a round of the simulation, and assume without loss of generality that the message is sent by Alice. We start by analyzing Step 1. At this step, the tree conditions $X_{\mathrm{free}(\rho)}$ on taking values that are not $\varepsilon$ -dangerous for $Y_{\mathrm{free}(\rho)}$ . We show that this step increases the deficiency by at most one bit. By applying our main technical lemma (3.9) with $\gamma=\frac{1}{b}$ , it follows that the probability that $X_{\mathrm{free}(\rho)}$ is $\varepsilon$ -dangerous is at most $\frac{1}{2}$ . By 2.2, it follows that conditioning on non-dangerous values decreases $H_{\infty}(X_{\mathrm{free}(\rho)})$ by at most one bit, and therefore it increases the deficiency by at most one bit. To see why we can apply the main lemma with $\gamma=\frac{1}{b}$ , recall that at this point $X$ and $Y$ are $(\rho,\tau)$ -structured, where

[TABLE]

where the last inequality can be made to hold by choosing $h$ to be sufficiently large.

Next, in Step 2, the tree conditions $X$ on the event of sending the message $M$ , which has probability at least $2^{-\left|M\right|}$ . By 2.2, this decreases $H_{\infty}(X_{\mathrm{free}(\rho)})$ by at most $\left|M\right|$ bits, which increases the deficiency by at most $\left|M\right|$ bits. All in all, we showed that the first two steps of the simulation increase the deficiency by at most $\left|M\right|+1$ .

Let $I$ be the set of queries chosen in Step 3. We turn to show that the rest of the steps decrease the deficiency by at least $\Omega(b\cdot\left|I\right|)$ . Without loss of generality, assume that $I\neq\emptyset$ (otherwise the latter bound holds vacuously). The rest of the steps apply the following changes to the deficiency:

•

Step 3 conditions $X$ on the event $X_{I}=x_{I}$ , which has probability greater than $2^{-\delta\cdot b\cdot\left|I\right|}$ by the definition of $x_{I}$ . Hence, this conditioning increases the deficiency by less than $\delta\cdot b\cdot\left|I\right|$ (by 2.3).

•

Step 4 removes the set $I$ from $\mathrm{free}(\rho)$ . Looking at the definition of deficiency, this change decreases the term of $2\cdot b\cdot\left|\mathrm{free}(\rho)\right|$ by $2\cdot b\cdot\left|I\right|$ , decreases the term $H_{\infty}(Y_{\mathrm{free}(\rho)})$ by at most $b\cdot\left|I\right|$ (by 2.3), and does not change the term $H_{\infty}(X_{\mathrm{free}(\rho)})$ (since at this point $X_{I}$ is fixed to $x_{I}$ ). All in all, the deficiency is decreased by at least $b\cdot\left|I\right|$ .

•

Finally, Step 5 conditions $Y$ on the event $g^{I}(x_{I},Y_{I})=\rho_{I}$ . This event has probability at least $2^{-\left|I\right|-1}$ by the assumption that $X$ is not dangerous (and hence not leaking). Thus, this conditioning increases the deficiency by at most $\left|I\right|+1$ (by 2.3).

Summing all those effects together, we get that the deficiency was decreased by at least

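$$b\cdot\left|I\right|-\delta\cdot b\cdot\left|I\right|-\left(\left|I\right|+1\right)\geq\left(1-\delta-\frac{2}{b}\right)\cdot b\cdot\left|I\right|.$$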

By choosing $c$ to be sufficiently large, we can make sure that $1-\delta-\frac{2}{b}$ is a positive constant independent of $b$ and $n$ , and therefore the decrease in the deficiency will be at least $\Omega(b\cdot\left|I\right|)$ , as required. To see it, observe that

[TABLE]

Therefore, if we choose $c>\frac{2\cdot(h+4)}{\eta^{2}}$ , the expression on the right-hand side will be a constant that is strictly smaller than $1$ , as required.

5 The randomized lifting theorem

In this section, we prove the randomized part of our main theorem. In fact, we prove the following more general result.

Theorem 5.1 (Randomized lifting theorem).

For every $\eta>0$ there exists $c=O(\frac{1}{\eta^{2}}\cdot\log\frac{1}{\eta})$ such that the following holds: Let $n\in\mathbb{N}$ be such that $n\geq 2$ , let $\Lambda\stackrel{{\scriptstyle\rm{def}}}{{=}}\left\{0,1\right\}^{b}$ be such that $b\geq c\cdot\log n$ , let $g:\Lambda\times\Lambda\to\left\{0,1\right\}$ be a function such that ${\rm disc}(g)\leq 2^{-\eta\cdot b}$ , and let $G=g^{n}$ . Let $\Pi$ be a randomized (public-coin) protocol that takes inputs in $\Lambda^{n}\times\Lambda^{n}$ and that has communication complexity $C\leq 2\cdot b\cdot n$ and round complexity $r$ . Then, there exists a randomized parallel decision tree $T$ with the following properties:

•

On input $z\in\left\{0,1\right\}^{n}$ , the tree outputs a transcript $\pi$ of $\Pi$ , whose distribution is $2^{-\frac{\eta}{8}\cdot b}$ -close to the distribution of the transcripts of $\Pi$ when given inputs that are uniformly distributed in $G^{-1}(z)$ .

•

The tree $T$ has query complexity $O(\frac{C}{b}+1)$ and depth $r$ .

We first observe that Theorem 5.1 indeed implies the lower bound of our main theorem.

Proof of Theorem 1.2 from Theorem 5.1.

Let $\mathcal{S}:\left\{0,1\right\}^{n}\to\mathcal{O}$ be a search problem, and let $\varepsilon>0$ and $\varepsilon^{\prime}\stackrel{{\scriptstyle\rm{def}}}{{=}}\varepsilon+2^{-\frac{\eta}{8}\cdot b}$ . We prove that $R^{\mathrm{cc}}_{\varepsilon}(\mathcal{S}\circ G)=\Omega\left(R^{\mathrm{dt}}_{\varepsilon^{\prime}}(\mathcal{S})\cdot b\right)$ . Let $\Pi$ be an optimal protocol that solves $\mathcal{S}\circ G$ with complexity $C\stackrel{{\scriptstyle\rm{def}}}{{=}}R^{\mathrm{cc}}_{\varepsilon}(\mathcal{S}\circ G)$ , and observe that we can assume without loss of generality that $C\leq 2\cdot b\cdot n$ (since the players can solve any search problem by sending their whole inputs). By applying the theorem to $\Pi$ , we construct a tree $T$ that on input $z$ samples a transcript of $\Pi$ as in the theorem, and outputs the output that is associated with this transcript. It is not hard to see that the output of $T$ will be in $\mathcal{S}(z)$ with probability at least

$$1-\varepsilon-2^{-\frac{\eta}{8}\cdot b}=1-\varepsilon^{\prime},$$

and that the query complexity of $T$ is $O(\frac{C}{b}+1)$ . This implies that $R^{\mathrm{dt}}_{\varepsilon^{\prime}}(\mathcal{S})=O\left(R^{\mathrm{cc}}_{\varepsilon}(\mathcal{S}\circ G)/b+1\right)$ , or in other words, $R^{\mathrm{cc}}_{\varepsilon}(\mathcal{S}\circ G)=\Omega\left(\left(R^{\mathrm{dt}}_{\varepsilon^{\prime}}(\mathcal{S})-O(1)\right)\cdot b\right)$ , as required. ∎
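The error accounting in this proof can be checked on a toy example. The following is a minimal sketch rather than anything taken from the paper: the two distributions over transcripts and the set `bad` are hypothetical, and the helper names are ours. It illustrates that if the tree's transcript distribution is $\gamma$ -close to that of the protocol, then the tree errs with probability at most $\varepsilon+\gamma$ .

```python
from fractions import Fraction

def total_variation(p, q):
    """Statistical distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(t, Fraction(0)) - q.get(t, Fraction(0))) for t in support) / 2

def error_probability(dist, bad):
    """Probability mass that `dist` places on the transcripts in `bad`."""
    return sum(prob for t, prob in dist.items() if t in bad)

# Hypothetical toy distributions over three transcripts.
protocol = {"00": Fraction(1, 2), "01": Fraction(3, 8), "11": Fraction(1, 8)}
tree = {"00": Fraction(7, 16), "01": Fraction(3, 8), "11": Fraction(3, 16)}
bad = {"11"}  # transcripts whose associated output is incorrect

gamma = total_variation(protocol, tree)      # closeness of the two distributions
eps = error_probability(protocol, bad)       # error of the protocol
eps_prime = error_probability(tree, bad)     # error of the tree

# The tree's error exceeds the protocol's error by at most the distance.
assert eps_prime <= eps + gamma
```

In the proof, $\gamma$ plays the role of $2^{-\frac{\eta}{8}\cdot b}$ , which is why the decision-tree error parameter is $\varepsilon^{\prime}=\varepsilon+2^{-\frac{\eta}{8}\cdot b}$ .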

In the rest of this section we prove Theorem 5.1. We start the proof by observing that it suffices to prove the theorem for the special case in which the protocol $\Pi$ is deterministic. To see why, recall that a randomized public-coin protocol is a distribution over deterministic protocols. Thus, if we prove the theorem for deterministic protocols, we can extend it to randomized protocols as follows: Given a randomized protocol $\Pi$ , the tree $T$ will start by sampling a deterministic protocol $\Pi_{\mathrm{det}}$ from the distribution $\Pi$ , and will then apply the theorem to $\Pi_{\mathrm{det}}$ . It is not hard to verify that such a tree $T$ satisfies the requirements of Theorem 5.1. Thus, it suffices to consider the case where $\Pi$ is deterministic.

For the rest of this section, fix $\Pi$ to be an arbitrary deterministic protocol that takes inputs in $\Lambda^{n}\times\Lambda^{n}$ , and denote by $C$ and $r$ its communication complexity and round complexity respectively. The rest of this section is organized as follows: We first describe the construction of the parallel decision tree $T$ in Section 5.1. We then prove that the transcript that $T$ outputs is distributed as required in Section 5.2. Finally, we upper bound the query complexity of $T$ in Section 5.3.

5.1 The construction of $T$

The construction of the randomized tree $T$ is similar to the construction of the deterministic lifting theorem (Section 4.1), but has the following differences in the simulation:

•

In the deterministic construction, the tree chose the message $M$ arbitrarily subject to having sufficiently high probability. The reason we could do it is that it did not matter which transcript the tree would output as long as it was consistent in $G^{-1}(z)$ . In the randomized construction, on the other hand, we would like to output a transcript whose distribution is close to the “correct” distribution. Therefore, we change the construction such that the message $M$ is chosen randomly according to the distribution of the inputs.

•

Since the messages are now sampled according to the distribution of the inputs, we can no longer guarantee that the message $M$ has sufficiently high probability. Therefore, the tree may choose messages $M$ that have very low probability, and such messages may reveal too much information about the inputs. In order to avoid that, the tree maintains a variable $K$ which keeps track of the amount of information that was revealed by the messages. If at any point $K$ becomes too large, the tree halts and declares failure. This modification is important since if we allow the chosen messages to reveal too much information, then they will lead the tree to make too many queries. In particular, the bound on $K$ is used in Section 5.3 to upper bound the query complexity of $T$ .

•

In the deterministic construction, the tree restored the density of $X$ by fixing some set of coordinates $I$ to some value $x_{I}$ (using Proposition 3.6). Again, this was possible since it did not matter which transcript the tree would output. In the randomized construction, we cannot do it, since the transcript has to be distributed in a way that is close to being correct. In order to resolve this issue, we follow [GPW17] and use their “density-restoring partition” (Lemma 3.7). Recall that this lemma says that the probability space of $X$ can be partitioned into dense parts. The tree now samples one of those parts according to their probabilities and conditions $X$ on being in this part. If this conditioning reveals too much information, then the tree halts and declares failure.

We turn to give a formal description of the construction. Let $h^{\prime}$ be the maximum among the universal constants of the uniform marginals lemma (Lemma 3.4) and the main technical lemma (Lemma 3.9), and let $h$ be a universal constant that will be chosen to be sufficiently large to make the inequalities in the proof hold. Let $\varepsilon\stackrel{{\scriptstyle\rm{def}}}{{=}}\frac{h\cdot\log c}{c\cdot\eta}$ , and as before, $\delta\stackrel{{\scriptstyle\rm{def}}}{{=}}1-\frac{\eta}{4}+\frac{\varepsilon}{2}$ , and $\tau\stackrel{{\scriptstyle\rm{def}}}{{=}}2\cdot\delta-\varepsilon$ . As before, the parallel decision tree $T$ constructs the transcript $\pi$ by simulating the protocol $\Pi$ round-by-round, each time adding a single message to $\pi$ . Throughout the simulation, the tree maintains a rectangle $\mathcal{X}\times\mathcal{Y}\subseteq\Lambda^{n}\times\Lambda^{n}$ of inputs that are consistent with $\pi$ (but not necessarily of all such inputs). In what follows, we denote by $X$ and $Y$ random variables that are uniformly distributed over $\mathcal{X}$ and $\mathcal{Y}$ respectively. As before, the tree will maintain the invariant that $X$ and $Y$ are $(\rho,\tau)$ -structured, and that moreover, they are $(\delta-\varepsilon)$ -dense and $\delta$ -dense respectively in Alice’s rounds and the other way around in Bob’s rounds. As mentioned above, the tree will also maintain a variable $K$ from iteration to iteration, which will measure the information revealed so far.

When the tree $T$ starts the simulation, the tree sets the transcript $\pi$ to be the empty string, the restriction $\rho$ to $\left\{*\right\}^{n}$ , the variable $K$ to zero, and the sets $\mathcal{X},\mathcal{Y}$ to $\Lambda^{n}$ . At this point the invariant clearly holds. We now explain how $T$ simulates a single round of the protocol while maintaining the invariant. Suppose that the invariant holds at the beginning of the current round, and assume without loss of generality that it is Alice’s turn to speak. The tree $T$ performs the following steps:

1. The tree conditions $X_{\mathrm{free}(\rho)}$ on not taking a value that is $\varepsilon$ -dangerous for $Y_{\mathrm{free}(\rho)}$ (i.e., the tree removes from $\mathcal{X}$ all the values $x$ for which $x_{\mathrm{free}(\rho)}$ is $\varepsilon$ -dangerous for $Y_{\mathrm{free}(\rho)}$ ).

2. The tree samples a message $M$ of Alice according to the distribution induced by $X$ . Let $p_{M}$ be the probability of $M$ . The tree adds $M$ to the transcript, adds $\log\frac{1}{p_{M}}$ to $K$ , and conditions $X$ on $M$ (i.e., the tree sets $\mathcal{X}$ to be the subset of inputs that are consistent with $M$ ).

3. If $K>C+b$ , the tree halts and declares error.

4. Let $\mathcal{X}_{\mathrm{free}(\rho)}=\mathcal{X}^{1}\cup\ldots\cup\mathcal{X}^{\ell}$ be the density-restoring partition of Lemma 3.7 with respect to $X_{\mathrm{free}(\rho)}$ . The tree chooses a random class in the partition, where the class $\mathcal{X}^{j}$ is chosen with probability $\Pr\left[X_{\mathrm{free}(\rho)}\in\mathcal{X}^{j}\right]$ . Let $\mathcal{X}^{j}$ be the chosen class, and let $I_{j}$ and $x_{I_{j}}$ be the set and the value associated with $\mathcal{X}^{j}$ . The tree conditions $X$ on the event $X_{\mathrm{free}(\rho)}\in\mathcal{X}^{j}$ (i.e., the tree sets $\mathcal{X}$ to be the subset of inputs $x$ such that $x_{\mathrm{free}(\rho)}\in\mathcal{X}^{j}$ ). The variable $X_{\mathrm{free}(\rho)-I_{j}}$ is now $\delta$ -dense by the properties of the density-restoring partition.

5. Recall that

$$p_{\geq j}\stackrel{{\scriptstyle\rm{def}}}{{=}}\Pr\left[X_{\mathrm{free}(\rho)}\in\mathcal{X}^{j}\cup\ldots\cup\mathcal{X}^{\ell}\right]$$

(see Lemma 3.7). If $p_{\geq j}<\frac{1}{8}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}$ , the tree halts and declares error.

6. The tree queries the coordinates in $I_{j}$ , and updates $\rho$ accordingly.

7. The tree conditions $Y$ on $g^{I_{j}}(x_{I_{j}},Y_{I_{j}})=\rho_{I_{j}}$ (i.e., the tree sets $\mathcal{Y}$ to be the subset of values $y$ for which $g^{I_{j}}(x_{I_{j}},y_{I_{j}})=\rho_{I_{j}}$ ). Due to Step 1, the variable $X_{\mathrm{free}(\rho)}$ must take a value that is not $\varepsilon$ -dangerous, and therefore $Y_{\mathrm{free}(\rho)}$ is necessarily $(\delta-\varepsilon)$ -dense.

After those steps take place, it becomes Bob’s turn to speak, and indeed, $X_{\mathrm{free}(\rho)}$ and $Y_{\mathrm{free}(\rho)}$ are $\delta$ -dense and $(\delta-\varepsilon)$ -dense respectively. Thus, the invariant is maintained. When the protocol $\Pi$ stops, the tree $T$ outputs the transcript $\pi$ and halts. The proof that the above steps are well-defined is similar to the proof for the deterministic construction and is therefore omitted.
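The bookkeeping of Steps 2 and 3 can be sketched in code. The following is a minimal toy illustration, not the construction itself: it ignores the dangerous values, the density-restoring partition, and the queries, and only shows how a message is sampled according to the distribution induced by a uniform $X$ and how $K$ accumulates $\log\frac{1}{p_{M}}$ . The function names and the representation of a protocol as a list of message functions are assumptions made for this sketch.

```python
import math
import random
from collections import Counter

def sample_message(inputs, message_of, rng):
    """Toy Step 2: sample a message with the probability induced by a uniform
    X over `inputs`. Returns the message, its probability p_M, and the inputs
    that remain consistent with it."""
    counts = Counter(message_of(x) for x in inputs)
    messages = sorted(counts)
    weights = [counts[m] for m in messages]
    m = rng.choices(messages, weights=weights, k=1)[0]
    p_m = counts[m] / len(inputs)
    consistent = [x for x in inputs if message_of(x) == m]
    return m, p_m, consistent

def simulate_rounds(inputs, message_fns, C, b, rng):
    """Toy Steps 2-3 for every round: add log2(1/p_M) to K after each message
    and halt with an error as soon as K > C + b."""
    K = 0.0
    transcript = []
    for message_of in message_fns:
        m, p_m, inputs = sample_message(inputs, message_of, rng)
        transcript.append(m)
        K += math.log2(1.0 / p_m)
        if K > C + b:
            return transcript, K, "error"
    return transcript, K, "ok"

# Two balanced one-bit messages over 16 inputs: each has p_M = 1/2, so K = 2.
transcript, K, status = simulate_rounds(
    list(range(16)),
    [lambda x: x & 1, lambda x: (x >> 1) & 1],
    C=2, b=1, rng=random.Random(0),
)
```

A balanced message costs exactly its length in bits, whereas a message of probability $p_{M}$ far below $2^{-\left|M\right|}$ costs much more, which is the situation that the threshold $C+b$ guards against.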

The depth of $T$ .

As in the proof of the deterministic lifting theorem, it is not hard to see that the depth of $T$ is equal to the round complexity of $\Pi$ .

5.2 The correctness of $T$

In this section, we prove the correctness of the construction. For convenience, we first prove the correctness of a modified tree $T^{*}$ , whose construction is the same as that of $T$ except that Step 3 is omitted. Fix an input $z\in\left\{0,1\right\}^{n}$ . We define the following (random) transcripts of the protocol $\Pi$ :

•

Let $\pi$ be a transcript that $T$ outputs when given $z$ .

•

Let $\pi^{*}$ be a transcript that $T^{*}$ outputs when given $z$ .

•

Let $\pi^{\prime}$ be a transcript of $\Pi$ when given inputs $(X^{\prime},Y^{\prime})$ that are uniformly distributed in $G^{-1}(z)$ .

Our end goal is to prove that $\pi$ and $\pi^{\prime}$ are $2^{-\frac{\eta}{8}\cdot b}$ -close. In order to do so, we will first prove that $\pi^{*}$ is $(\frac{1}{2}\cdot 2^{-\frac{\eta}{8}\cdot b})$ -close to $\pi^{\prime}$ . We will then prove that $\pi$ is $2^{-b}$ -close to $\pi^{*}$ . Together, by the triangle inequality and the fact that $2^{-b}\leq\frac{1}{2}\cdot 2^{-\frac{\eta}{8}\cdot b}$ in our range of parameters, the two results imply that $\pi$ is $2^{-\frac{\eta}{8}\cdot b}$ -close to $\pi^{\prime}$ , as required.

$\pi^{*}$ is close to $\pi^{\prime}$ .

We first prove that $\pi^{*}$ is $(\frac{1}{2}\cdot 2^{-\frac{\eta}{8}\cdot b})$ -close to $\pi^{\prime}$ . To this end, we construct a coupling of $\pi^{*}$ and $\pi^{\prime}$ such that $\Pr\left[\pi^{*}\neq\pi^{\prime}\right]\leq\frac{1}{2}\cdot 2^{-\frac{\eta}{8}\cdot b}$ . Essentially, we construct the coupling by going over the simulation step-by-step and using the uniform marginals lemma to argue that at each step, $X$ and $X^{\prime}$ are close and can therefore be coupled (and similarly for $Y$ and $Y^{\prime}$ ). We start by setting some notation: for every $i\in\left[r\right]$ , let us denote by $\mathcal{X}_{i}\times\mathcal{Y}_{i}$ the rectangle $\mathcal{X}\times\mathcal{Y}$ from Section 5.1 at the end of the $i$ -th round of the simulation of $T^{*}$ (if $T^{*}$ halts before the $i$ -th round ends, set $\mathcal{X}_{i}\times\mathcal{Y}_{i}$ to be the rectangle $\mathcal{X}\times\mathcal{Y}$ at the end of the simulation). In our proof, we construct, for every $i\in\left[r\right]$ :

•

A random rectangle $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}$ that is jointly distributed with $X^{\prime},Y^{\prime}$ with the following property: conditioned on a specific choice of $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}$ , the pair $(X^{\prime},Y^{\prime})$ is uniformly distributed over $(\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime})\cap G^{-1}(z)$ .

•

A coupling of $\mathcal{X}_{i}\times\mathcal{Y}_{i}$ and $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}$ such that $\Pr\left[\mathcal{X}_{i}\times\mathcal{Y}_{i}\neq\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}\right]\leq\frac{1}{2}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{i}{2\cdot n\cdot b}$ .

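Observe that if we can construct such rectangles and couplings, then it follows that $\pi^{*}$ and $\pi^{\prime}$ are close. To see it, observe that at any given point during the simulation, all the inputs in the rectangle $\mathcal{X}\times\mathcal{Y}$ are consistent with the transcript $\pi^{*}$ . Hence, if $\mathcal{X}_{r}\times\mathcal{Y}_{r}=\mathcal{X}_{r}^{\prime}\times\mathcal{Y}_{r}^{\prime}$ , it necessarily means that the inputs $(X^{\prime},Y^{\prime})$ are consistent with the transcript $\pi^{*}$ , so $\pi^{*}=\pi^{\prime}$ . Since $r\leq C\leq 2\cdot b\cdot n$ , it follows that

$$\Pr\left[\pi^{*}\neq\pi^{\prime}\right]\leq\Pr\left[\mathcal{X}_{r}\times\mathcal{Y}_{r}\neq\mathcal{X}_{r}^{\prime}\times\mathcal{Y}_{r}^{\prime}\right]\leq\frac{1}{2}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{r}{2\cdot n\cdot b}\leq\frac{1}{2}\cdot 2^{-\frac{\eta}{8}\cdot b},$$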

as required.

It remains to construct the rectangles $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}$ and the associated couplings. We construct them by induction. Let $i\in\left[r\right]$ , and suppose we have already constructed $\mathcal{X}_{i-1}^{\prime}\times\mathcal{Y}_{i-1}^{\prime}$ and its associated coupling (here, if $i=1$ we set both $\mathcal{X}_{i-1}\times\mathcal{Y}_{i-1}$ and $\mathcal{X}_{i-1}^{\prime}\times\mathcal{Y}_{i-1}^{\prime}$ to $\Lambda^{n}\times\Lambda^{n}$ ). The $i$ -th coupling first samples $\mathcal{X}_{i-1}\times\mathcal{Y}_{i-1}$ and $\mathcal{X}_{i-1}^{\prime}\times\mathcal{Y}_{i-1}^{\prime}$ from the $(i-1)$ -th coupling. If they are different, then we set $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}$ arbitrarily and assume that the coupling failed (i.e., $\mathcal{X}_{i}\times\mathcal{Y}_{i}$ and $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}$ are different). Suppose now that $\mathcal{X}_{i-1}\times\mathcal{Y}_{i-1}$ and $\mathcal{X}_{i-1}^{\prime}\times\mathcal{Y}_{i-1}^{\prime}$ are equal, and condition on some specific choice of this rectangle. If the tree $T^{*}$ has already halted by this point, we set $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}=\mathcal{X}_{i-1}^{\prime}\times\mathcal{Y}_{i-1}^{\prime}$ . Otherwise, we proceed as follows.

Let $(X,Y)$ be a random pair that is uniformly distributed over $\mathcal{X}_{i-1}\times\mathcal{Y}_{i-1}$ , and recall that due to our conditioning, the pair $(X^{\prime},Y^{\prime})$ is uniformly distributed over $(\mathcal{X}_{i-1}^{\prime}\times\mathcal{Y}_{i-1}^{\prime})\cap G^{-1}(z)$ . We construct the rest of the coupling by following the simulation step-by-step. For Step 1, with probability

[TABLE]

we assume that the coupling failed and set $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}$ arbitrarily. Otherwise, we condition both $X$ and $X^{\prime}$ on not taking a dangerous value. In order to analyze the probability of failure, recall that at the beginning of this step, $(X,Y)$ are $(\rho,\tau)$ -structured, where

[TABLE]

where the last inequality can be made to hold by choosing $h$ to be sufficiently large. Hence, our main technical lemma (Lemma 3.9) implies that the probability that $X_{\mathrm{free}(\rho)}$ is $\varepsilon$ -dangerous for $Y_{\mathrm{free}(\rho)}$ is at most

[TABLE]

Moreover, the uniform marginals lemma (Lemma 3.4) implies that $X^{\prime}$ is $\left(\frac{1}{8}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}\right)$ -close to $X$ and therefore the probability that $X_{\mathrm{free}(\rho)}^{\prime}$ is $\varepsilon$ -dangerous for $Y_{\mathrm{free}(\rho)}$ is at most $2\cdot\frac{1}{8}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}$ . Hence, the failure probability at this step is at most $\frac{1}{4}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}$ . Note that if the coupling does not fail, $X$ is conditioned on an event of probability at least $\frac{1}{2}$ , and therefore after the conditioning $X$ and $Y$ are $(\rho,\tau-\frac{1}{b})$ -structured.

For Steps 2 and 4, let $M$ and $\mathcal{X}^{j}$ be the message and partition class that are distributed according to the input $X$ respectively. Let $M^{\prime}$ and ${\mathcal{X}^{j}}^{\prime}$ be the corresponding message and class of $X^{\prime}$ . Since $X$ and $Y$ are $(\rho,\tau-\frac{1}{b})$ -structured, it can again be shown by the uniform marginals lemma that $X$ and $X^{\prime}$ are $\left(\frac{1}{8}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}\right)$ -close, and therefore the pair $(M,\mathcal{X}^{j})$ is $\left(\frac{1}{8}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}\right)$ -close to the pair $(M^{\prime},{\mathcal{X}^{j}}^{\prime})$ . This implies that there exists a coupling of $(M,\mathcal{X}^{j})$ and $(M^{\prime},{\mathcal{X}^{j}}^{\prime})$ such that the probability that they differ is at most $\frac{1}{8}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}$ . We sample $(M,\mathcal{X}^{j})$ and $(M^{\prime},{\mathcal{X}^{j}}^{\prime})$ from this coupling. If they differ, we assume that the coupling failed, and set $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}$ arbitrarily. Otherwise, we condition both $X$ and $X^{\prime}$ on being consistent with the message $M$ and the class $\mathcal{X}^{j}$ , and denote by $I_{j},x_{I_{j}}$ the set and values associated with $\mathcal{X}^{j}$ . Finally, for Step 5, if $p_{\geq j}<\frac{1}{8}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}$ , then we assume that the coupling fails and set $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}$ arbitrarily (note that this happens with probability at most $\frac{1}{8}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}$ ).

At this point, we set $\mathcal{X}_{i}^{\prime}=\mathcal{X}^{j}$ , and set $\mathcal{Y}_{i}^{\prime}$ to be the set of inputs $y\in\mathcal{Y}_{i-1}$ for which $g^{I_{j}}(x_{I_{j}},y_{I_{j}})=z_{I_{j}}$ . It is easy to see that this choice satisfies $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}=\mathcal{X}_{i}\times\mathcal{Y}_{i}$ . To analyze the total failure probability of this coupling, observe that by the induction assumption, the failure probability of the $(i-1)$ -th coupling is at most $\frac{1}{2}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{i-1}{2\cdot n\cdot b}$ , and the other failure events discussed above add to that a failure probability of at most

$$\left(\frac{1}{4}+\frac{1}{8}+\frac{1}{8}\right)\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}=\frac{1}{2}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}.$$

Hence, the failure probability of the $i$ -th coupling is at most $\frac{1}{2}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{i}{2\cdot n\cdot b}$ , as required.
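The induction on the failure probability is a simple accumulation, which the following sketch replays with exact arithmetic. It is only an illustration: all quantities are measured in units of $2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}$ , and the Step 5 contribution is taken to be $\frac{1}{8}$ of that unit, matching the halting threshold of Step 5 in the construction. The constant and function names are ours.

```python
from fractions import Fraction

# Failure contributions of a single round, in units of
# u = 2^(-eta*b/8) / (2*n*b), as bounded in the text.
STEP_1 = Fraction(1, 4)         # X or X' takes a dangerous value
STEPS_2_AND_4 = Fraction(1, 8)  # the coupled (message, class) pairs differ
STEP_5 = Fraction(1, 8)         # p_{>=j} falls below the halting threshold

def coupling_failure_bound(i):
    """Upper bound, in units of u, on the failure probability of the i-th coupling."""
    bound = Fraction(0)  # before the first round both rectangles are equal
    for _ in range(i):
        bound += STEP_1 + STEPS_2_AND_4 + STEP_5
    return bound
```

For every $i$ the accumulated bound is $\frac{i}{2}$ units, which is the bound required of the $i$ -th coupling.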

It remains to show that conditioned on any specific choice of $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}$ , the pair $(X^{\prime},Y^{\prime})$ is uniformly distributed over $(\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime})\cap G^{-1}(z)$ . In the cases where the coupling fails, we can ensure this property holds by first sampling $(X^{\prime},Y^{\prime})$ and then setting $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}=\left\{(X^{\prime},Y^{\prime})\right\}$ . Suppose that the coupling did not fail. Recall that by the induction assumption, it holds that conditioned on the choice of $\mathcal{X}_{i-1}^{\prime}\times\mathcal{Y}_{i-1}^{\prime}$ , the pair $(X^{\prime},Y^{\prime})$ is uniformly distributed over $(\mathcal{X}_{i-1}^{\prime}\times\mathcal{Y}_{i-1}^{\prime})\cap G^{-1}(z)$ . Observe that all the $i$ -th coupling changes in the distribution of $(X^{\prime},Y^{\prime})$ is to condition it on being in $\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime}$ . Thus, at the end of the $i$ -th coupling, the pair $(X^{\prime},Y^{\prime})$ is uniformly distributed over $(\mathcal{X}_{i}^{\prime}\times\mathcal{Y}_{i}^{\prime})\cap G^{-1}(z)$ , as required.
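The coupling argument above repeatedly uses the standard fact that two distributions at statistical distance $\gamma$ admit a coupling in which the two samples differ with probability exactly $\gamma$ . The following minimal sketch builds such a coupling for finite distributions. It is a generic textbook construction, not code from the paper; the function name and the toy distributions are hypothetical.

```python
from fractions import Fraction

def maximal_coupling(p, q):
    """Return a joint distribution {(a, b): prob} whose marginals are p and q
    and in which Pr[a != b] equals the statistical distance of p and q."""
    joint = {}
    excess_p, excess_q = {}, {}
    for t in sorted(set(p) | set(q)):
        pt, qt = p.get(t, Fraction(0)), q.get(t, Fraction(0))
        common = min(pt, qt)
        if common > 0:
            joint[(t, t)] = common        # shared mass goes on the diagonal
        if pt > common:
            excess_p[t] = pt - common     # mass that p has beyond q
        if qt > common:
            excess_q[t] = qt - common     # mass that q has beyond p
    leftover = sum(excess_p.values())     # equals the statistical distance
    for a, pa in excess_p.items():
        for b, qb in excess_q.items():
            joint[(a, b)] = pa * qb / leftover  # spread the excess off the diagonal
    return joint

# Hypothetical toy distributions at statistical distance 1/4.
p = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
q = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
joint = maximal_coupling(p, q)
prob_differ = sum(w for (a, b), w in joint.items() if a != b)
```

In the proof, the pairs $(M,\mathcal{X}^{j})$ and $(M^{\prime},{\mathcal{X}^{j}}^{\prime})$ play the roles of the two samples, and the event that they differ is counted as a failure of the coupling.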

$\pi$ is close to $\pi^{*}$ .

We turn to prove that $\pi$ is $2^{-b}$ -close to $\pi^{*}$ . Let $\mathcal{E}$ denote the event that the tree $T$ halts in Step 3. It is not hard to see that the statistical distance between $\pi$ and $\pi^{*}$ is exactly $\Pr\left[\mathcal{E}\right]$ . We show that $\Pr\left[\mathcal{E}\right]<2^{-b}$ , and this will conclude the proof of correctness.

Intuitively, the reason that $\Pr\left[\mathcal{E}\right]<2^{-b}$ is that the tree halts only if the probability of the transcript up to that point is less than $2^{-C-b}$ : to see it, observe that the variable $K$ measures (roughly) the logarithm of the probability of the transcript up to that point, and recall that the tree halts when $K>C+b$ . By taking union bound over all possible transcripts, we get that the halting probability is less than $2^{-b}$ .
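The union bound in this intuition can be illustrated on a toy distribution. The sketch below is our own example, with hypothetical numbers: there are at most $2^{C}$ transcripts, so the total mass of the transcripts whose probability is less than $2^{-C-b}$ is less than $2^{C}\cdot 2^{-C-b}=2^{-b}$ .

```python
from fractions import Fraction

def low_probability_mass(dist, C, b):
    """Total probability of the transcripts whose probability is below 2^-(C+b)."""
    threshold = Fraction(1, 2 ** (C + b))
    return sum(p for p in dist.values() if p < threshold)

C, b = 2, 3  # at most 2^C = 4 transcripts; halting threshold 2^-(C+b) = 1/32
dist = {
    "00": Fraction(61, 64),
    "01": Fraction(1, 64),
    "10": Fraction(1, 64),
    "11": Fraction(1, 64),
}
mass = low_probability_mass(dist, C, b)  # three transcripts fall below 1/32
```

Here the low-probability transcripts carry mass $\frac{3}{64}$ , which is indeed less than $2^{-b}=\frac{1}{8}$ .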

Unfortunately, the formal proof contains a messier calculation: the reason is that the probabilities of the messages as measured by $K$ depend on the choices of the classes $\mathcal{X}^{j}$ in Step 4, so the foregoing intuition only holds for a given choice of these classes. Thus, the formal proof also sums over all the possible choices of classes $\mathcal{X}^{j}$ and conditions on those choices. However, while the resulting calculation is more complicated, the idea is the same.

In order to facilitate the formal proof, we set up some useful notation. Let $M_{1},\ldots,M_{r}$ be the messages that are chosen in Step 2 of the simulation (so $\pi=(M_{1},\ldots,M_{r})$ ), and let $J=(j_{1},\ldots,j_{r})$ be the indices of the classes that are chosen in Step 4 (if the tree halts before the $i$ -th round, set $M_{i}$ to the empty string and set $j_{i}=1$ ). Observe that the execution of $T$ is completely determined by $\pi$ and $J$ , and in particular, $\pi$ and $J$ determine whether the event $\mathcal{E}$ happens or not. With some abuse of notation, let us denote the fact that a particular choice of $(\pi,J)$ is consistent with $\mathcal{E}$ by $(\pi,J)\in\mathcal{E}$ . For any $i\in\left[r\right]$ , let us denote $\pi_{<i}=(M_{1},\ldots,M_{i-1})$ and $J_{<i}=(j_{1},\ldots,j_{i-1})$ . Observe that at the $i$ -th round, the probability $p_{M_{i}}$ in Step 2 is determined by $\pi_{<i}$ and $J_{<i}$ , and let us denote by $p_{M_{i}\mid\pi_{<i},J_{<i}}$ this probability for a given choice of $\pi_{<i}$ and $J_{<i}$ . We are now ready to prove the upper bound on $\Pr\left[\mathcal{E}\right]$ . It holds that

[TABLE]

Next, observe that for every choice of $(\pi,J)$ , the corresponding value of $K$ at the end of the simulation is

[TABLE]

In particular, if $(\pi,J)\in\mathcal{E}$ , then it holds that $K>C+b$ , and therefore

[TABLE]

It follows that

[TABLE]

as required. In the calculation above, Equality 9 follows since each sum goes over all the possible choices of $j_{i}$ , and Inequality 10 follows since $\Pi$ has at most $2^{C}$ distinct transcripts.

5.3 The query complexity of $T$

The analysis of the query complexity here is similar to the analysis of the deterministic query complexity. The main difference is the following: In the deterministic setting, the increase in the deficiency due to a single message $M$ was upper bounded by $\left|M\right|$ , and therefore the total increase in the deficiency was upper bounded by $C$ . In the randomized case, the increase in the deficiency due to a single message $M$ is upper bounded by $\log\frac{1}{p_{M}}$ . Thus, we upper bound the total increase in the deficiency by $K$ . Since $K$ is never larger than $C+b$ due to Step 3, we conclude that the query complexity is at most $O(\frac{C+b}{b})=O(\frac{C}{b}+1)$ . Details follow.
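The potential argument outlined above can be replayed on a toy run. The sketch below is an illustration under stated assumptions: `kappa` is a hypothetical constant standing for the one hidden in the $\Omega(\cdot)$ notation, the per-round bounds are applied as if they were tight, and the function name is ours. It checks that a non-negative deficiency forces the number of queries to be at most the total increase divided by `kappa * b`.

```python
def check_deficiency_accounting(rounds, b, kappa):
    """Replay the deficiency accounting on a toy run.

    `rounds` is a list of (log_inv_p, num_queries) pairs: the value
    log2(1/p_M) of the round's message and the number of queries it triggers.
    Returns (K, total_queries, query_bound).
    """
    deficiency = 0.0
    K = 0.0
    total_queries = 0
    for log_inv_p, num_queries in rounds:
        deficiency += log_inv_p + 1            # first two steps: at most log(1/p_M) + 1
        deficiency -= kappa * b * num_queries  # remaining steps: at least kappa * b per query
        if deficiency < 0:
            raise ValueError("deficiency became negative: such a run cannot occur")
        K += log_inv_p
        total_queries += num_queries
    query_bound = (K + len(rounds)) / (kappa * b)
    return K, total_queries, query_bound

# Hypothetical run with b = 8 and kappa = 1/2: three messages, two queries.
K, total_queries, query_bound = check_deficiency_accounting(
    [(3.0, 1), (2.0, 0), (5.0, 1)], b=8, kappa=0.5
)
```

In this run $K=10$ and the two queries stay below the bound of $3.25$ . A run that queries more than its accumulated deficiency allows raises an error, reflecting that the deficiency can never be negative.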

As before, we define the deficiency of $X,Y$ to be

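$$2\cdot b\cdot\left|\mathrm{free}(\rho)\right|-H_{\infty}(X_{\mathrm{free}(\rho)})-H_{\infty}(Y_{\mathrm{free}(\rho)}).$$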

We prove that whenever the protocol transmits a message $M$ , the deficiency increases by $O(\log\frac{1}{p_{M}})$ , and that whenever the tree $T$ makes a query, the deficiency is decreased by $\Omega(b)$ . Since the deficiency is always non-negative, and $K$ is never more than $C+b$ , it will follow that the tree must make at most $O(\frac{C+b}{b})$ queries. More specifically, we prove that in every round, the first two steps increase the deficiency by at most $\log\frac{1}{p_{M}}+1$ , and the rest of the steps decrease the deficiency by $\Omega(\left|I_{j}\right|\cdot b)$ , and this will imply the desired result.
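To make the deficiency concrete, the following minimal sketch computes it for toy rectangles, taking the deficiency to be $2\cdot b\cdot\left|\mathrm{free}(\rho)\right|$ minus the two min-entropies of $X_{\mathrm{free}(\rho)}$ and $Y_{\mathrm{free}(\rho)}$ . The parameters ( $b=2$ , $n=2$ ), the sets, and the function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import product

def min_entropy(samples, coords):
    """Min-entropy of the projection to `coords` of a variable that is
    uniformly distributed over the list `samples`."""
    counts = Counter(tuple(s[i] for i in coords) for s in samples)
    return -math.log2(max(counts.values()) / len(samples))

def deficiency(xs, ys, free, b):
    """2*b*|free| minus the min-entropies of X and Y on the free coordinates."""
    return 2 * b * len(free) - min_entropy(xs, free) - min_entropy(ys, free)

b = 2                       # each block is an element of {0, ..., 3}
free = [0, 1]               # both coordinates are still free
xs = list(product(range(2 ** b), repeat=2))  # X uniform over all of Lambda^2
ys = list(product(range(2 ** b), repeat=2))  # Y uniform over all of Lambda^2

before = deficiency(xs, ys, free, b)

# Condition X on an event of probability 1/2 (first block in {0, 1}).
xs_half = [x for x in xs if x[0] < 2]
after = deficiency(xs_half, ys, free, b)
```

The full rectangle has deficiency $0$ , and conditioning $X$ on an event of probability $\frac{1}{2}$ raises it to $1$ , in line with the bound of $\log\frac{1}{p}$ bits used throughout this section.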

Fix a round of the simulation, and assume without loss of generality that the message is sent by Alice. We start by analyzing Step 1. At this step, the tree conditions $X_{\mathrm{free}(\rho)}$ on taking values that are not $\varepsilon$ -dangerous for $Y_{\mathrm{free}(\rho)}$ . Using the same calculation as in Section 5.2, it can be shown that the probability of non-dangerous values is at least $\frac{1}{2}$ . Therefore, this step increases the deficiency by at most $1$ bit. Next, in Step 2, the tree conditions $X$ on the event of choosing the message $M$ , whose probability is $p_{M}$ by definition. Thus, this step increases the deficiency by at most $\log\frac{1}{p_{M}}$ bits. All in all, we showed that the first two steps of the simulation increase the deficiency by at most $\log\frac{1}{p_{M}}+1$ bits.

Let $\mathcal{X}^{j}$ be the partition class that is sampled in Step 4, and let $I_{j},x_{j}$ be the set and value that are associated with $\mathcal{X}^{j}$ . We turn to show that the rest of the steps decrease the deficiency by $\Omega(b\cdot\left|I_{j}\right|)$ . Those steps apply the following changes to the deficiency:

•

Step 4 conditions $X$ on the event $X_{\mathrm{free}(\rho)}\in\mathcal{X}^{j}$ . By Lemma 3.7, this conditioning increases the deficiency by at most $\delta\cdot b\cdot\left|I\right|+\log\frac{1}{p_{\geq j}}$ . Recall that by Step 5, the probability $p_{\geq j}$ can never be less than $\frac{1}{8}\cdot 2^{-\frac{\eta}{8}\cdot b}\cdot\frac{1}{2\cdot n\cdot b}$ . Thus, this step increases the deficiency by at most

[TABLE]

•

Step 6 removes the set $I$ from $\mathrm{free}(\rho)$ . Looking at the definition of deficiency, this change decreases the term $2\cdot b\cdot\left|\mathrm{free}(\rho)\right|$ by $2\cdot b\cdot\left|I\right|$ , decreases the term $H_{\infty}(Y_{\mathrm{free}(\rho)})$ by at most $b\cdot\left|I\right|$ (by Fact 2.3), and does not change the term $H_{\infty}(X_{\mathrm{free}(\rho)})$ (since at this point $X_{I}$ is fixed to $x_{I}$ ). All in all, the deficiency is decreased by at least $b\cdot\left|I\right|$ .

•

Finally, Step 7 conditions $Y$ on the event $g^{I}(x_{I},Y_{I})=\rho_{I}$ . This event has probability at least $2^{-\left|I\right|-1}$ by the assumption that $X$ is not dangerous (and hence not leaking). Thus, this conditioning increases the deficiency by at most $\left|I\right|+1$ .

Summing all those effects together, we get that the deficiency was decreased by at least

[TABLE]

By choosing $c$ to be sufficiently large, we can make sure that $1-\delta-\frac{\eta}{8}-\frac{7}{c}$ is a positive constant independent of $b$ and $n$ , and therefore the decrease in the deficiency will be at least $\Omega(b\cdot\left|I\right|)$ , as required. To see it, observe that

[TABLE]

Thus, if we choose $c$ such that $\frac{c}{\log c}>\frac{h+14}{2\cdot\eta^{2}}$ , the expression on the right-hand side will be a constant that is strictly smaller than $1$ . It is not hard to see that we can choose such a value of $c$ that satisfies $c=O(\frac{1}{\eta^{2}}\cdot\log\frac{1}{\eta})$ .
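The last claim can be sanity-checked numerically. In the sketch below, `A` stands for the right-hand side $\frac{h+14}{2\cdot\eta^{2}}$ ; since the value of the universal constant $h$ is not specified, the values of `A` used here are hypothetical. The code searches for the smallest integer $c$ with $\frac{c}{\log c}>A$ (logarithms to base 2) and compares it with $2\cdot A\cdot\log A$ .

```python
import math

def smallest_c(A):
    """Smallest integer c >= 2 with c / log2(c) > A, found by linear search."""
    c = 2
    while c / math.log2(c) <= A:
        c += 1
    return c

def reference_bound(A):
    """The value 2 * A * log2(A), rounded up, which satisfies the same
    inequality whenever log2(A) > 1 + log2(log2(A))."""
    return math.ceil(2 * A * math.log2(A))

# Hypothetical values of A = (h + 14) / (2 * eta^2).
results = {A: (smallest_c(A), reference_bound(A)) for A in (5, 10, 100, 1000)}
```

For each of these values the smallest admissible $c$ is at most $2\cdot A\cdot\log A$ , and with $A=O(\frac{1}{\eta^{2}})$ this is consistent with the bound $c=O(\frac{1}{\eta^{2}}\cdot\log\frac{1}{\eta})$ .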

Acknowledgement.

We thank Daniel Kane for some very enlightening conversations and suggestions. The authors would also like to thank anonymous referees for comments that improved the presentation of this work. Part of this work was carried out while the authors were visiting the Simons Institute for the Theory of Computing.

Bibliography

[BBCR10] Boaz Barak, Mark Braverman, Xi Chen, and Anup Rao. How to compress interactive communication. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pages 67–76, 2010.
[BFS86] László Babai, Peter Frankl, and Janos Simon. Complexity classes in communication complexity theory (preliminary version). In 27th Annual Symposium on Foundations of Computer Science, Toronto, Canada, 27-29 October 1986, pages 337–347, 1986.
[BPSW06] Paul Beame, Toniann Pitassi, Nathan Segerlind, and Avi Wigderson. A strong direct product theorem for corruption and the multiparty communication complexity of disjointness. Computational Complexity, 15(4):391–432, 2006.
[BR14] Mark Braverman and Anup Rao. Information equals amortized communication. IEEE Trans. Information Theory, 60(10):6058–6069, 2014.
[Bra17] Mark Braverman. Interactive information complexity. SIAM Review, 59(4):803–846, 2017.
[BRWY13] Mark Braverman, Anup Rao, Omri Weinstein, and Amir Yehudayoff. Direct products in communication complexity. In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, 26-29 October, 2013, Berkeley, CA, USA, pages 746–755, 2013.
[CFK+19] Arkadev Chattopadhyay, Yuval Filmus, Sajin Koroth, Or Meir, and Toniann Pitassi. Query-to-communication lifting for BPP using inner product. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, volume 132 of LIPIcs, pages 35:1–35:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
[CKLM17] Arkadev Chattopadhyay, Michal Koucký, Bruno Loff, and Sagnik Mukhopadhyay. Simulation theorems via pseudorandom properties. CoRR, abs/1704.06807, 2017.


This work subsumes an earlier work that appeared in ICALP 2019 [CFK+19].
