Conditional Lower Bounds for Space/Time Tradeoffs

Isaac Goldstein; Tsvi Kopelowitz; Moshe Lewenstein; Ely Porat

arXiv:1706.05847·cs.DS·July 26, 2017


TL;DR

This paper investigates conditional space lower bounds for data structures, revealing that many problems with known polynomial time lower bounds also exhibit space-time tradeoffs, thus deepening understanding of their computational complexity.

Contribution

It introduces a novel framework for polynomial space conjectures, establishing space lower bounds and tradeoffs for well-studied problems based on hardness assumptions.

Findings

01 Many problems with polynomial time lower bounds also have space hardness.

02 Tradeoffs between space and query time can be smooth or singular.

03 Matching upper bounds are presented for several space hardness conjectures.



1 Bar-Ilan University, {goldshi,moshe,porately}@cs.biu.ac.il

2 University of Waterloo, [email protected]


Isaac Goldstein (This research is supported by the Adams Foundation of the Israel Academy of Sciences and Humanities.)

Tsvi Kopelowitz (Part of this work took place while the second author was at University of Michigan. This work is supported in part by the Canada Research Chair for Algorithm Design, NSF grants CCF-1217338, CNS-1318294, and CCF-1514383.)

Moshe Lewenstein (This work was partially supported by an ISF grant #1278/16.)

Ely Porat

Abstract

In recent years much effort has been concentrated towards achieving polynomial time lower bounds on algorithms for solving various well-known problems. A useful technique for showing such lower bounds is to prove them conditionally based on well-studied hardness assumptions such as 3SUM, APSP, SETH, etc. This line of research helps to obtain a better understanding of the complexity inside P.

A related question asks to prove conditional space lower bounds on data structures that are constructed to solve certain algorithmic tasks after an initial preprocessing stage. This question received little attention in previous research even though it has potential strong impact.

In this paper we address this question and show that surprisingly many of the well-studied hard problems that are known to have conditional polynomial time lower bounds are also hard when concerning space. This hardness is shown as a tradeoff between the space consumed by the data structure and the time needed to answer queries. The tradeoff may be either smooth or admit one or more singularity points.

We reveal interesting connections between different space hardness conjectures and present matching upper bounds. We also apply these hardness conjectures to both static and dynamic problems and prove their conditional space hardness.

We believe that this novel framework of polynomial space conjectures can play an important role in expressing polynomial space lower bounds of many important algorithmic problems. Moreover, it seems that it can also help in achieving a better understanding of the hardness of their corresponding problems in terms of time.

1 Introduction

1.1 Background

Lately there has been a concentrated effort to understand the time complexity within P, the class of decision problems solvable by polynomial time algorithms. The main goal is to explain why certain problems have time complexity that seems to be non-optimal. For example, all known efficient algorithmic solutions for the 3SUM problem, where we seek to determine whether there are three elements $x,y,z$ in input set $S$ of size $n$ such that $x+y+z=0$ , take $\tilde{O}(n^{2})$ time (the $\tilde{O}$ and $\tilde{\Omega}$ notations suppress polylogarithmic factors). However, the only real lower bound that we know is the trivial $\Omega(n)$ . Likewise, we know how to solve the all pairs shortest path, APSP, problem in $\tilde{O}(n^{3})$ time but we cannot even determine whether it is impossible to obtain an $\tilde{O}(n^{2})$ time algorithm. One may note that it follows from the time-hierarchy theorem that there exist problems in P with complexity $\Omega(n^{k})$ for every fixed $k$ . Nevertheless, such a separation for natural practical problems seems to be hard to achieve.

The collaborative effort to understand the internals of P has concentrated on identifying some basic problems that are conjectured to be hard to solve more efficiently (by polynomial factors) than their current known complexity. These problems serve as a basis to prove conditional hardness of other problems by using reductions. The reductions are reminiscent of NP-complete reductions but differ in that they are restricted to be of time complexity strictly smaller (by a polynomial factor) than the problem that we are reducing to. Examples of such hard problems include the well-known 3SUM problem, the fundamental APSP problem, (combinatorial) Boolean matrix multiplication, etc. Recently, conditional time lower bounds have been proven based on the conjectured hardness of these problems for graph algorithms [4, 42], edit distance [13], longest common subsequence (LCS) [3, 15], dynamic algorithms [5, 36], jumbled indexing [11], and many other problems [1, 2, 6, 7, 14, 25, 31, 34, 40].

1.2 Motivation

In stark contrast to polynomial time lower bounds, little effort has been devoted to finding polynomial space conditional lower bounds. An example of a space lower bound appears in the work of Cohen and Porat [19] and Pǎtraşcu and Roditty [38] where lower bounds are shown on the size of a distance oracle for sparse graphs based on a conjecture about the best possible data structure for a set intersection problem (which we call set disjointness in order to distinguish it from its reporting variant).

A more general question is: for algorithmic problems, what conditional lower bounds on a space/time tradeoff can be shown based on the set disjointness (intersection) conjecture? Even more generally, what space/time tradeoffs can be achieved based on the other algorithmic problems that are assumed to be hard (in the time sense)? Also, what are the relations between these identified “hard” problems in the space/time tradeoff sense? These are the questions which form the basis and framework of this paper.

Throughout this paper we show connections between different hardness assumptions, show some matching upper bounds and propose several conjectures based on this accumulated knowledge. Moreover, we conjecture that there is a strong correlation between polynomial hardness in time and space. We note that in order to discuss space it is often more natural to consider data structure variants of problems and this is the approach we follow in this paper.

1.3 Our Results

Set Disjointness.

In the SetDisjointness problem mentioned before, it is required to preprocess a collection of $m$ sets $S_{1},\cdots,S_{m}\subset U$ , where $U$ is the universe of elements and the total number of elements in all sets is $N$ . For a query, a pair of integers $(i,j)$ $(1\leq i,j\leq m)$ is given and we are asked whether $S_{i}\cap S_{j}$ is empty or not. A folklore conjecture, which appears in [18, 38], suggests that to achieve a constant query time the space of the data structure constructed in the preprocessing stage needs to be $\tilde{\Omega}(N^{2})$ . We call this conjecture the SetDisjointness conjecture. This conjecture does not say anything about the case where we allow higher query time. Therefore, we suggest a stronger conjecture which admits a full tradeoff between the space consumed by the data structure (denoted by $S$ ) and the query time (denoted by $T$ ). This is what we call the Strong SetDisjointness conjecture. This conjecture states that for solving SetDisjointness with a query time $T$ our data structure needs $\tilde{\Omega}(N^{2}/T^{2})$ space. A matching upper bound exists for this problem by generalizing ideas from [18] (see also [32]). Our new SetDisjointness conjecture can be used to admit more expressive space lower bounds for a full tradeoff between space and query time.

3SUM Indexing.

One of the basic and frequently used hardness conjectures is the celebrated 3SUM conjecture. This conjecture has been used for about 20 years to show many conditional time lower bounds on various problems. However, we focus on what can be said about its space behavior. To do this, it is natural to consider a data structure version of 3SUM which allows one to preprocess the input set $S$ . Then, the query is an external number $z$ for which we need to answer whether there are $x,y\in S$ such that $x+y=z$ . It was pointed out by Chan and Lewenstein [16] that all known algorithms for 3SUM actually work within this model as well. We call this problem 3SUM-Indexing. On one hand, this problem can easily be solved using $O(n^{2})$ space by sorting $x+y$ for all $x,y\in S$ and then searching for $z$ in $\tilde{O}(1)$ time. On the other hand, by just sorting $S$ we can answer queries by a well-known linear time algorithm. The big question is whether we can use truly less than $n^{2}$ space while still answering queries in $\tilde{O}(1)$ time. Can it be done even if we allow $\tilde{O}(n^{1-\Omega(1)})$ query time? This leads us to our two new hardness conjectures. The 3SUM-Indexing conjecture states that when using $\tilde{O}(1)$ query time we need $\tilde{\Omega}(n^{2})$ space to solve 3SUM-Indexing. In the Strong 3SUM-Indexing conjecture we say that even when using $\tilde{O}(n^{1-\Omega(1)})$ query time we need $\tilde{\Omega}(n^{2})$ space to solve 3SUM-Indexing.

3SUM Indexing and Set Disjointness.

We prove connections between the SetDisjointness conjectures and the 3SUM-Indexing conjectures. Specifically, we show that the Strong 3SUM-Indexing conjecture implies the Strong SetDisjointness conjecture, while the SetDisjointness conjecture implies the 3SUM-Indexing conjecture. This gives some evidence towards establishing the difficulty within the 3SUM-Indexing conjectures. The usefulness of these conjectures should not be underestimated. As many problems are known to be 3SUM-hard, these new conjectures can play an important role in achieving space lower bounds on their corresponding data structure variants. Moreover, it is interesting to point out the difference between SetDisjointness, which admits a smooth tradeoff between space and query time, and 3SUM-Indexing, which admits a big gap between the two trivial extremes. This may explain why we are unable to show full equivalence between the hardness conjectures of the two problems. Moreover, it can suggest a separation between problems with smooth space-time behavior and others which have no such tradeoff but rather two “far” extremes.

Generalizations.

Following the discussion on the SetDisjointness and the 3SUM-Indexing conjectures we investigate their generalizations.

I. k-Set Disjointness and (k+1)-SUM Indexing.

The first generalization is a natural parametrization of both problems. In the SetDisjointness problem we query about the emptiness of the intersection between two sets, while in the 3SUM-Indexing problem we ask, given a query number $z$ , whether two numbers of the input $S$ sum up to $z$ . In the parameterized versions of these problems we are interested in the emptiness of the intersection between k sets and ask if k numbers sum up to a number given as a query. These generalized variants are called k-SetDisjointness and (k+1)-SUM-Indexing respectively. For each problem we give corresponding space lower bounds conjectures which generalize those of SetDisjointness and 3SUM-Indexing. These conjectures also have corresponding strong variants which are accompanied by matching upper bounds. We prove that the k-SetDisjointness conjecture implies (k+1)-SUM-Indexing conjecture via a novel method using linear equations.

II. k-Reachability.

A second generalization is the problem we call k-Reachability. In this problem we are given as an input a directed sparse graph $G=(V,E)$ for preprocessing. Afterwards, for a query, given as a pair of vertices $u,v$ , we wish to return whether there is a path from $u$ to $v$ consisting of at most $k$ edges. We provide an upper bound on this problem for every fixed $k\geq 1$ . The upper bound admits a tradeoff between the space of the data structure (denoted by $S$ ) and the query time (denoted by $T$ ), which is $ST^{2/(k-1)}=O(n^{2})$ . We argue that this upper bound is tight. That is, we conjecture that if a query takes $T$ time, the space must be $\tilde{\Omega}(\frac{n^{2}}{T^{2/(k-1)}})$ . We call this conjecture the k-Reachability conjecture.
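As a point of reference for the query semantics, the following is a minimal sketch of the "answer from scratch" extreme of k-Reachability: a breadth-first search truncated at depth $k$, using no preprocessing beyond the adjacency lists. The function name and the dictionary-based graph representation are illustrative assumptions, not taken from the paper.

```python
from collections import deque

def reachable_within_k(adj, u, v, k):
    """Return True if v can be reached from u using at most k edges.

    adj maps each vertex to an iterable of its out-neighbours
    (hypothetical representation chosen for this sketch).
    """
    if u == v:
        return True  # the empty path uses 0 <= k edges
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue  # do not extend paths beyond k edges
        for y in adj.get(x, ()):
            if y not in dist:
                if y == v:
                    return True
                dist[y] = dist[x] + 1
                queue.append(y)
    return False
```

Because BFS discovers vertices in order of distance, truncating at depth $k$ is enough to decide the query; the cost is linear in the part of the graph explored, which is the baseline any space/time tradeoff tries to beat.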

We give three indications towards the correctness of this conjecture. First, we prove that the base case, where $k=2$ , is equivalent to the SetDisjointness problem. This is why this problem can be thought of as a generalization of SetDisjointness.

Second, if we consider non-constant $k$ then the smooth tradeoff surprisingly disappears and we get “extreme behavior” as $\tilde{\Omega}(\frac{n^{2}}{T^{2/(k-1)}})$ eventually becomes $\tilde{\Omega}(n^{2})$ . This means that to answer reachability queries for non-constant path length, we can either store all answers in advance using $n^{2}$ space or simply answer queries from scratch using a standard graph traversal algorithm. The general problem where the length of the path from $u$ to $v$ is unbounded is sometimes referred to as the problem of constructing efficient reachability oracles. Pǎtraşcu in [37] leaves it as an open question whether a data structure with less than $\tilde{\Omega}(n^{2})$ space can answer reachability queries efficiently. Moreover, Pǎtraşcu proved that for constant time query, truly superlinear space is needed. Our k-Reachability conjecture points to this direction, while admitting full space-time tradeoff for constant $k$ .

The third indication for the correctness of the k-Reachability conjecture comes from a connection to distance oracles. A distance oracle is a data structure that can be used to quickly answer queries about the shortest path between two given nodes in a preprocessed undirected graph. As mentioned above, the SetDisjointness conjecture was used to exclude some possible tradeoffs for sparse graphs. Specifically, Cohen and Porat [19] showed that obtaining an approximation ratio smaller than 2 with constant query time requires $\tilde{\Omega}(n^{2})$ space. Using a somewhat stronger conjecture Pǎtraşcu and Roditty [38] showed that a (2,1)-distance oracle for unweighted graphs with $m=O(n)$ edges requires $\tilde{\Omega}(n^{1.5})$ space. Later, this result was strengthened by Pǎtraşcu et al. [39]. However, these results do not exclude the possibility of compact distance oracles if we allow higher query time. For stretch-2 and stretch-3 in sparse graphs, Agarwal et al. [9, 10] achieved a space-time tradeoff of $S\times T=O(n^{2})$ and $S\times T^{2}=O(n^{2})$ , respectively. Agarwal [8] also showed many other results for stretch-2 and below. We use our k-Reachability conjecture to prove that for stretch-less-than-(1+2/k) distance oracles $S\times T^{2/(k-1)}$ is bounded from below by $\tilde{\Omega}(n^{2})$ . This result is interesting in light of Agarwal [8] where a stretch-(5/3) oracle was presented which achieves a space-time tradeoff of $S\times T=O(n^{2})$ . This would match our lower bound for $k=3$ if our lower bound held not only for stretch-less-than-(5/3) but also for stretch-(5/3) oracles. Consequently, we see that there is strong evidence for the correctness of the k-Reachability conjecture.

Moreover, these observations show that on one hand k-Reachability is a generalization of SetDisjointness which is closely related to 3SUM-Indexing. On the other hand, k-Reachability is related to distance oracles which solve the famous APSP problem using smaller space by sacrificing the accuracy of the distance between the vertices. Therefore, the k-Reachability conjecture seems to be a conjecture corresponding to the APSP hardness conjecture, while also admitting some connection with the celebrated 3SUM hardness conjecture.

SETH and Orthogonal Vectors.

After considering space variants of the 3SUM and APSP conjectures it is natural to consider space variants for the Strong Exponential Time Hypothesis (SETH) and the closely related conjecture of orthogonal vectors. SETH asserts that for any $\epsilon>0$ there is an integer $k>3$ such that k-SAT cannot be solved in $2^{(1-\epsilon)n}$ time. The orthogonal vectors time conjecture states that there is no algorithm that for every $c\geq 1$ , finds if there are at least two orthogonal vectors in a set of $n$ Boolean vectors of length $c\log{n}$ in $\tilde{O}(n^{2-\Omega(1)})$ time. We discuss the space variants of these conjectures in Section 7. However, we are unable to connect these conjectures and the previous ones. This is perhaps not surprising as the connection between SETH and the other conjectures even in the time perspective is very loose (see, for example, discussions in [5, 25]).
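To make the orthogonal vectors problem concrete, here is a minimal sketch of the quadratic-time baseline that the time conjecture says cannot be improved by a polynomial factor. Vectors are represented as tuples of 0/1 entries; the function name is an illustrative assumption.

```python
from itertools import combinations

def has_orthogonal_pair(vectors):
    """Return True if two distinct vectors in the list have dot product 0.

    Each vector is a sequence of 0/1 entries of equal length.
    This is the naive check over all pairs.
    """
    return any(
        all(a * b == 0 for a, b in zip(u, v))
        for u, v in combinations(vectors, 2)
    )
```

With $n$ vectors of length $c\log{n}$ this examines $O(n^{2})$ pairs, each in $O(\log n)$ time.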

Boolean Matrix Multiplication.

Another problem which receives a lot of attention in the context of conditional time lower bounds is calculating Boolean Matrix Multiplication (BMM). We give a data structure variant of this well-known problem. We then demonstrate the connection between this problem and the problems of SetDisjointness and k-Reachability.

Applications.

Finally, armed with the space variants of many well-known conditional time lower bounds, we apply these conditional space lower bounds to some static and dynamic problems. This gives interesting space lower bound results on these important problems, which sometimes also admit a clear space-time tradeoff. We believe that this is just a glimpse of the space lower bounds that can be achieved based on our new framework and that many other interesting results are expected to follow this promising route.

Figure 1 in Appendix 0.A presents a sketch of the results in this paper.

2 Set Intersection Hardness Conjectures

We first give formal definitions of the SetDisjointness problem and its enumeration variant:

Problem 1 (SetDisjointness Problem)

Preprocess a family $F$ of $m$ sets, all from universe $U$ , with total size $N=\sum_{S\in F}|S|$ so that given two query sets $S,S^{\prime}\in F$ one can determine if $S\cap S^{\prime}=\emptyset$ .

Problem 2 (SetIntersection Problem)

Preprocess a family $F$ of $m$ sets, all from universe $U$ , with total size $N=\sum_{S\in F}|S|$ so that given two query sets $S,S^{\prime}\in F$ one can enumerate the set $S\cap S^{\prime}$ .

Conjectures.

The SetDisjointness problem was regarded as a problem that admits space hardness. The hardness conjecture of the SetDisjointness problem has received several closely related formulations. One such formulation, given by Pǎtraşcu and Roditty [38], is as follows:

Conjecture 1

SetDisjointness Conjecture [Formulation 1]. Any data structure for the SetDisjointness problem where $|U|=\log^{c}m$ for a large enough constant $c$ and with a constant query time must use $\tilde{\Omega}(m^{2})$ space.

Another formulation is implicitly suggested in Cohen and Porat [18]:

Conjecture 2

SetDisjointness Conjecture [Formulation 2]. Any data structure for the SetDisjointness problem with constant query time must use $\tilde{\Omega}(N^{2})$ space.

There is an important distinction between the two formulations, which is related to the sparsity of SetDisjointness instances. This distinction follows from the following upper bound: store an $m\times m$ matrix of the answers to all possible queries, and then queries will cost constant time. The first formulation of the SetDisjointness conjecture states that if we want constant (or polylogarithmic) query time, then this is the best we can do. At first glance this makes the second formulation, whose bounds are in terms of $N$ and not $m$ , look rather weak. In particular, why would we ever be interested in a data structure that uses $O(N^{2})$ space when we can use one with $O(m^{2})$ space? The answer is that the two conjectures are the same if the sets are very sparse, and so at least in terms of $N$ , if one were to require a constant query time then by the second formulation the space must be at least $\Omega(N^{2})$ (which happens in the very sparse case).

Nevertheless, we present a more general conjecture, which in particular captures a tradeoff curve between the space usage and query time. This formulation captures the difficulty that is commonly believed to arise from the SetDisjointness problem, and matches the upper bounds of Cohen and Porat [18] (see also [32]).

Conjecture 3

Strong SetDisjointness Conjecture. Any data structure for the SetDisjointness problem that answers queries in $T$ time must use $S=\tilde{\Omega}(\frac{N^{2}}{T^{2}})$ space.

For example, a natural question to ask is “what is the smallest query time possible with linear space?”. This question is addressed, at least from a lower bound perspective, by the Strong SetDisjointness conjecture.
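The shape of the conjectured curve $S\cdot T^{2}=\tilde{\Omega}(N^{2})$ can be illustrated by a standard heavy/light construction that meets it from above. The sketch below is an assumption about how such an upper bound can be obtained (the text only cites [18] and [32] for it); the class name and the `threshold` parameter, which plays the role of $T$, are hypothetical.

```python
from itertools import combinations

class DisjointnessIndex:
    """Heavy/light sketch: O(N + (N/T)^2) space, O(T) query time."""

    def __init__(self, sets, threshold):
        self.sets = [frozenset(s) for s in sets]
        # A set is heavy if it has more than `threshold` elements;
        # there can be at most N / threshold heavy sets.
        heavy = [i for i, s in enumerate(self.sets) if len(s) > threshold]
        self.heavy = set(heavy)
        # Precompute the answer for every pair of heavy sets.
        self.heavy_disjoint = {
            (i, j): self.sets[i].isdisjoint(self.sets[j])
            for i, j in combinations(heavy, 2)
        }

    def disjoint(self, i, j):
        if i == j:
            return len(self.sets[i]) == 0
        if i in self.heavy and j in self.heavy:
            return self.heavy_disjoint[(min(i, j), max(i, j))]
        # At least one set is light, so the smaller one has at most
        # `threshold` elements; probe each of them in the other set.
        small, large = sorted((i, j), key=lambda t: len(self.sets[t]))
        return all(x not in self.sets[large] for x in self.sets[small])
```

Setting the threshold to $T$ gives a table of at most $(N/T)^{2}$ entries plus the sets themselves, so for $T\leq\sqrt{N}$ the space is $O(N^{2}/T^{2})$; with linear space this suggests query time around $\sqrt{N}$, which is the regime the question above asks about.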

Conjecture 4

Strong SetIntersection Conjecture. Any data structure for the SetIntersection problem that answers queries in $O(T+op)$ time, where $op$ is the size of the output of the query, must use $S=\tilde{\Omega}(\frac{N^{2}}{T})$ space.

3 3SUM-Indexing Hardness Conjectures

In the classic 3SUM problem we are given an integer array $A$ of size $n$ and we wish to decide whether there are 3 distinct integers in $A$ which sum up to zero. Gajentaan and Overmars [23] showed that an equivalent formulation of this problem receives 3 integer arrays $A_{1}$ , $A_{2}$ , and $A_{3}$ , each of size $n$ , and the goal is to decide if there is a triplet $x_{1}\in A_{1},x_{2}\in A_{2}$ , and $x_{3}\in A_{3}$ that sum up to zero.

We consider the data structure variant of this problem which is formally defined as follows:

Problem 3 (3SUM-Indexing Problem)

Preprocess two integer arrays $A_{1}$ and $A_{2}$ , each of length $n$ , so that given a query integer $z$ we can decide whether there are $x\in A_{1}$ and $y\in A_{2}$ such that $z=x+y$ .

It is straightforward to maintain all possible $O(n^{2})$ sums of pairs in quadratic space, and then answer a query in $\tilde{O}(1)$ time. On the other extreme, if one does not wish to utilize more than linear space then one can sort the arrays separately during preprocessing time, and then a query can be answered in $\tilde{O}(n)$ time by scanning both of the sorted arrays in parallel and in opposite directions.
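The two extremes just described can be sketched as follows. The function names are illustrative; the quadratic variant stores the sorted pairwise sums and binary-searches them, and the linear variant keeps only the two sorted arrays and scans them from opposite ends.

```python
from bisect import bisect_left

def build_quadratic(a1, a2):
    """Quadratic-space index: all pairwise sums, sorted."""
    return sorted({x + y for x in a1 for y in a2})

def query_quadratic(sums, z):
    """Binary search for z among the stored sums."""
    pos = bisect_left(sums, z)
    return pos < len(sums) and sums[pos] == z

def build_linear(a1, a2):
    """Linear-space index: just the two sorted arrays."""
    return sorted(a1), sorted(a2)

def query_linear(index, z):
    """Scan a1 upwards and a2 downwards looking for x + y == z."""
    s1, s2 = index
    lo, hi = 0, len(s2) - 1
    while lo < len(s1) and hi >= 0:
        total = s1[lo] + s2[hi]
        if total == z:
            return True
        if total < z:
            lo += 1   # need a larger sum: advance in a1
        else:
            hi -= 1   # need a smaller sum: retreat in a2
    return False
```

The conjectures below ask whether anything substantially better than these two endpoints is possible.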

We introduce two conjectures with regards to the 3SUM-Indexing problem, which serve as natural candidates for proving polynomial space lower bounds.

Conjecture 5

3SUM-Indexing Conjecture: There is no solution for the 3SUM-Indexing problem with truly subquadratic space and $\tilde{O}(1)$ query time.

Conjecture 6

Strong 3SUM-Indexing Conjecture: There is no solution for the 3SUM-Indexing problem with truly subquadratic space and truly sublinear query time.

Notice that one can solve the classic 3SUM problem using a data structure for 3SUM-Indexing by preprocessing $A_{1}$ and $A_{2}$ , and answering $n$ 3SUM-Indexing queries on all of the values in $A_{3}$ .
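A minimal sketch of this reduction follows. Since a 3SUM solution satisfies $x_{1}+x_{2}=-x_{3}$, each value of $A_{3}$ is negated before being used as a query. The set of pairwise sums stands in for an arbitrary 3SUM-Indexing structure and is an assumption made only to keep the example self-contained.

```python
def three_sum_via_indexing(a1, a2, a3):
    """Decide 3SUM on (a1, a2, a3) with n indexing queries.

    `pair_sums` is a stand-in for any 3SUM-Indexing data structure
    built on a1 and a2; membership plays the role of a query.
    """
    pair_sums = {x + y for x in a1 for y in a2}
    # x1 + x2 + x3 == 0  iff  x1 + x2 == -x3
    return any(-x3 in pair_sums for x3 in a3)
```

Any indexing structure with space $S$ and query time $T$ therefore yields a 3SUM algorithm running in $n\cdot T$ time after preprocessing.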

Next, we prove theorems that show tight connections between the 3SUM-Indexing conjectures and the SetDisjointness conjectures. We note that the proofs of the first two theorems are similar to the proofs of [31], but with space interpretation.

Theorem 3.1

The Strong 3SUM-Indexing Conjecture implies the Strong SetDisjointness Conjecture.

Proof

A family $\mathcal{H}$ of hash functions from $[u]\rightarrow[m]$ is called linear if for any $h\in\mathcal{H}$ and any $x,x^{\prime}\in[u]$ , $h(x)+h(x^{\prime})=h(x+x^{\prime})+c_{h}\;(\operatorname{mod}{}m)$ , where $c_{h}$ is some integer that depends only on $h$ . $\mathcal{H}$ is called almost linear if for any $h\in\mathcal{H}$ and any $x,x^{\prime}\in[u]$ , either $h(x)+h(x^{\prime})=h(x+x^{\prime})+c_{h}\;(\operatorname{mod}{}m)$ , or $h(x)+h(x^{\prime})=h(x+x^{\prime})+c_{h}+1\;(\operatorname{mod}{}m)$ .
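As a toy illustration of the linearity condition only, multiplication by a random constant modulo $m$ satisfies $h(x)+h(x^{\prime})=h(x+x^{\prime})\;(\operatorname{mod}{}m)$ with $c_{h}=0$. This is not the family of Dietzfelbinger used in the proof, and no claim is made here about balance or pair-wise independence; the helper name is hypothetical.

```python
import random

def make_linear_hash(m, rng=random):
    """Return (h, c_h) with h(x) = a*x mod m and c_h = 0 (toy family)."""
    a = rng.randrange(1, m)  # requires m >= 2

    def h(x):
        return (a * x) % m

    return h, 0

def is_linear_on(h, c_h, m, x, y):
    """Check h(x) + h(y) == h(x + y) + c_h (mod m) for one pair."""
    return (h(x) + h(y)) % m == (h(x + y) + c_h) % m
```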

Given a hash function $h\in\mathcal{H}$ we say that a value $i\in[m]$ is heavy for set $S=\{x_{1},\ldots,x_{n}\}\subset[u]$ if $|\{x\in S:h(x)=i\}|>\frac{3n}{m}$ . $\mathcal{H}$ is called almost balanced if for any set $S=\{x_{1},\ldots,x_{n}\}\subset[u]$ , the expected number of elements from $S$ that are hashed to heavy values is $O(m)$ . Kopelowitz et al. showed in [31] that a family of hash functions obtained from the construction of Dietzfelbinger [20] is almost-linear, almost-balanced, and pair-wise independent. In order to reduce clutter in the proof here we assume the existence of linear, almost-balanced, and pair-wise independent families of hash functions. Using the family of hash functions of Dietzfelbinger [20] will only affect multiplicative constants.
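The definition of a heavy value can be checked directly. The sketch below, with illustrative names, computes the heavy values of a set under a given hash function and the elements that land on them, which are the elements the reduction later handles separately.

```python
from collections import Counter

def heavy_values(S, h, m):
    """Values i with more than 3n/m elements of S hashed to them."""
    n = len(S)
    counts = Counter(h(x) for x in S)
    return {i for i, c in counts.items() if c > 3 * n / m}

def elements_on_heavy_values(S, h, m):
    """Elements of S whose hash value is heavy."""
    heavy = heavy_values(S, h, m)
    return [x for x in S if h(x) in heavy]
```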

We reduce an instance of the 3SUM-Indexing problem to an instance of the SetDisjointness problem as follows. Let $R=n^{\gamma}$ for some constant $0<\gamma<1$ . Let $Q=(5n/R)^{2}$ . Without loss of generality we assume that $\sqrt{Q}$ is an integer. We pick a random hash function $h_{1}:U\rightarrow[R]$ from a family that is linear and almost-balanced. Using $h_{1}$ we create $R$ buckets $\mathcal{B}_{1},\ldots,\mathcal{B}_{R}$ such that $\mathcal{B}_{i}=\{x\in A_{1}:h_{1}(x)=i\}$ , and another $R$ buckets $\mathcal{C}_{1},\ldots,\mathcal{C}_{R}$ such that $\mathcal{C}_{i}=\{x\in A_{2}:h_{1}(x)=i\}$ . Since $h_{1}$ is almost-balanced, the expected number of elements from $A_{1}$ and $A_{2}$ that are mapped to buckets of size greater than $3n/R$ is $O(R)$ . We use $O(R)$ space to maintain this list explicitly, together with a lookup table for the elements in $A_{1}$ and $A_{2}$ .

Next, we pick a random hash function $h_{2}:U\rightarrow[Q]$ where $h_{2}$ is chosen from a pair-wise independent and linear family. For each bucket we create $\sqrt{Q}$ shifted sets as follows: for each $0\leq j<\sqrt{Q}$ let $\mathcal{B}_{i,j}=\{h_{2}(x)-j\cdot\sqrt{Q}\,(\operatorname{mod}{}Q)\,|\,x\in\mathcal{B}_{i}\}$ and $\mathcal{C}_{i,j}=\{-h_{2}(x)+j\,(\operatorname{mod}{}Q)\,|\,x\in\mathcal{C}_{i}\}$ . These sets are all preprocessed into a data structure for the SetDisjointness problem.

Next, we answer a 3SUM-Indexing query $z$ by utilizing the linearity of $h_{1}$ and $h_{2}$ , which implies that if there exist $x\in A_{1}$ and $y\in A_{2}$ such that $x+y=z$ then $h_{1}(x)+h_{1}(y)=h_{1}(z)+c_{h_{1}}\,(\operatorname{mod}{}R)$ and $h_{2}(x)+h_{2}(y)=h_{2}(z)+c_{h_{2}}\,(\operatorname{mod}{}Q)$ .

Thus, if $x\in\mathcal{B}_{i}$ then $y$ must be in $\mathcal{C}_{h_{1}(z)+c_{h_{1}}-i(\operatorname{mod}{}R)}$ . For each $i\in[R]$ we would like to intersect $\mathcal{B}_{i}$ with $\mathcal{C}_{h_{1}(z)+c_{h_{1}}-i(\operatorname{mod}{}R)}$ in order to find candidate pairs of $x$ and $y$ . Denote by $h_{2}^{\uparrow}(z)=\lfloor\frac{h_{2}(z)+c_{h_{2}}}{\sqrt{Q}}\rfloor$ and $h_{2}^{\downarrow}(z)=h_{2}(z)+c_{h_{2}}(\operatorname{mod}{}\sqrt{Q})$ . Due to the linearity of $h_{2}$ , if the sets $\mathcal{B}_{i}$ and $\mathcal{C}_{h_{1}(z)+c_{h_{1}}-i(\operatorname{mod}{}R)}+z$ are not disjoint then the sets $\mathcal{B}_{i,h_{2}^{\uparrow}(z)}$ and $\mathcal{C}_{h_{1}(z)+c_{h_{1}}-i(\operatorname{mod}{}R),h_{2}^{\downarrow}(z)}$ are not disjoint (but the reverse is not necessarily true). Thus, if $\mathcal{B}_{i,h_{2}^{\uparrow}(z)}\cap\mathcal{C}_{h_{1}(z)+c_{h_{1}}-i(\operatorname{mod}{}R),h_{2}^{\downarrow}(z)}=\emptyset$ then there is no candidate pair in $\mathcal{B}_{i}$ and $\mathcal{C}_{h_{1}(z)+c_{h_{1}}-i(\operatorname{mod}{}R)}+z$ . However, if $\mathcal{B}_{i,h_{2}^{\uparrow}(z)}\cap\mathcal{C}_{h_{1}(z)+c_{h_{1}}-i(\operatorname{mod}{}R),h_{2}^{\downarrow}(z)}\neq\emptyset$ then it is possible that this is due to a 3SUM-Indexing solution, but we may have false positives. Notice that the number of set pairs whose intersection we need to examine is $O(R)$ since $z$ is given. Once we pick $i$ ( $R$ choices) the rest is implicit.
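The bucket and shifted-set construction, together with the query mapping, can be sketched end to end under simplifying assumptions: the hash functions are the toy linear maps $h_{1}(x)=bx \bmod R$ and $h_{2}(x)=ax \bmod Q$ (so $c_{h_{1}}=c_{h_{2}}=0$), Python's built-in set disjointness test stands in for the SetDisjointness data structure, oversized buckets are not treated separately, and candidates are verified immediately instead of repeating with several choices of $h_{2}$. All names are illustrative.

```python
import math

def build_index(a1, a2, R, Q, a=1, b=1):
    """Bucket a1, a2 by h1 and build the shifted sets B_{i,j}, C_{i,j}."""
    root = math.isqrt(Q)
    if root * root != Q:
        raise ValueError("Q must be a perfect square")
    h1 = lambda x: (b * x) % R   # toy linear hash, c_{h1} = 0
    h2 = lambda x: (a * x) % Q   # toy linear hash, c_{h2} = 0
    B = [[] for _ in range(R)]
    C = [[] for _ in range(R)]
    for x in a1:
        B[h1(x)].append(x)
    for y in a2:
        C[h1(y)].append(y)
    # B_{i,j} = { h2(x) - j*sqrt(Q) mod Q : x in B_i }
    B_shift = [[{(h2(x) - j * root) % Q for x in B[i]} for j in range(root)]
               for i in range(R)]
    # C_{i,j} = { -h2(y) + j mod Q : y in C_i }
    C_shift = [[{(-h2(y) + j) % Q for y in C[i]} for j in range(root)]
               for i in range(R)]
    return {"R": R, "Q": Q, "root": root, "h1": h1, "h2": h2,
            "B": B, "C": C, "B_shift": B_shift, "C_shift": C_shift}

def query(idx, z):
    """Decide whether some x in a1 and y in a2 satisfy x + y == z."""
    root = idx["root"]
    w = idx["h2"](z) % idx["Q"]          # h2(z) + c_{h2} mod Q
    up, down = divmod(w, root)           # h2-up(z) and h2-down(z)
    target = idx["h1"](z)                # h1(z) + c_{h1} mod R
    for i in range(idx["R"]):
        i2 = (target - i) % idx["R"]     # the only bucket that can pair with B_i
        if idx["B_shift"][i][up].isdisjoint(idx["C_shift"][i2][down]):
            continue                     # certainly no solution in this pair
        ys = set(idx["C"][i2])           # candidate: verify inside the buckets
        if any(z - x in ys for x in idx["B"][i]):
            return True
    return False
```

If $x+y=z$ then $h_{2}(x)-h_{2}^{\uparrow}(z)\sqrt{Q}\equiv -h_{2}(y)+h_{2}^{\downarrow}(z)\;(\operatorname{mod}{}Q)$, so a true solution always produces an intersecting pair of shifted sets, and the verification step filters out false positives.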

Fix $z$ and let $k=h_{2}(z)$ . Since $h_{2}$ is pair-wise independent and linear then for any pair $x,y\in U$ where $x\neq y$ we have that if $x+y\neq z$ then $\Pr[h_{2}(x)+h_{2}(y)=k+c_{h_{2}}(\operatorname{mod}{}Q)]=\Pr[h_{2}(x+y)=h_{2}(z)+c_{h_{2}}(\operatorname{mod}{}Q)]=\frac{1}{Q}$ . Since each bucket contains at most $3n/R$ elements, the probability of a false positive due to two buckets $\mathcal{B}_{i}$ and $\mathcal{C}_{j}$ is not greater than $(\frac{3n}{R})^{2}\frac{1}{Q}=\frac{9}{25}$ . In order to reduce the probability of a false positive to be polynomially small, we repeat the process with $O(\log n)$ different choices of $h_{2}$ functions (but using the same $h_{1}$ ). This blows up the number of sets by a factor of $O(\log n)$ , but not the universe. If the sets intersect under all $O(\log n)$ choices of $h_{2}$ then we can spend $O(n/R)$ time to find $x$ and $y$ within buckets $\mathcal{B}_{i}$ and $\mathcal{C}_{j}$ , which are either a 3SUM-Indexing solution (and the algorithm halts), or a false positive, which only occurs with probability $1/\mbox{poly}(n)$ .

To summarize, we create a total of $O(R\sqrt{Q}\log n)$ sets, each of size at most $3n/R$ . Thus, the total size of the SetDisjointness instance is $N=\tilde{O}(n^{2}/R)$ . For a query, we perform $\tilde{O}(R)$ queries on the SetDisjointness structure, and spend another $O(R\cdot\frac{n}{R}\cdot\frac{1}{poly(n)})=O(1)$ expected time to verify that we did not hit a false positive. Furthermore, we spend $O(R)$ time to check possible solutions containing one of the expected $O(R)$ elements from buckets with too many elements by using the lookup tables. If we denote by $T(N)$ and $S(N)$ the query time and space usage, respectively, of the SetDisjointness data structure on $N$ elements (in our case $N=\tilde{O}(n^{2-\gamma})$ ), then the query time of the reduction becomes $t_{\textsf{3SI}{}}=\tilde{O}(R\cdot T(n^{2}/R))$ time and the space usage is $s_{\textsf{3SI}{}}=\tilde{O}(S(n^{2}/R)+O(n))$ . Since we may assume that $S(N)=\Omega(N)$ , we have that $s_{\textsf{3SI}{}}=\tilde{O}(S(N))$ .

By the Strong 3SUM-Indexing Conjecture, either $s_{\textsf{3SI}{}}=\tilde{\Omega}(n^{2})$ or $t_{\textsf{3SI}{}}=\tilde{\Omega}(n)$ , which means that either $S(N)=\tilde{\Omega}(N^{\frac{2}{2-\gamma}})$ or $T(N)=\tilde{\Omega}(N^{\frac{1-\gamma}{2-\gamma}})$ . For any constant $\epsilon>0$ , if the SetDisjointness data structure uses $\tilde{\Theta}(N^{\frac{2}{2-\gamma}-\epsilon})$ space, then $S(N)\cdot(T(N))^{2}=\tilde{\Omega}(N^{\frac{2}{2-\gamma}-\epsilon+\frac{2-2\gamma}{2-\gamma}})=\tilde{\Omega}(N^{2-\epsilon})$ . Since this holds for any $\epsilon>0$ it must be that $S(N)\cdot(T(N))^{2}=\tilde{\Omega}(N^{2})$ . ∎
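The exponent bookkeeping in the last step can be spelled out as follows, taking $N=n^{2-\gamma}$ up to polylogarithmic factors (a simplifying assumption for this worked calculation):

```latex
\begin{align*}
N = n^{2-\gamma} &\;\Longrightarrow\; n = N^{\frac{1}{2-\gamma}},\\
s_{\textsf{3SI}} = \tilde{\Omega}(n^{2}) &\;\Longrightarrow\; S(N) = \tilde{\Omega}\bigl(N^{\frac{2}{2-\gamma}}\bigr),\\
R\cdot T(N) = \tilde{\Omega}(n) &\;\Longrightarrow\; T(N) = \tilde{\Omega}(n^{1-\gamma}) = \tilde{\Omega}\bigl(N^{\frac{1-\gamma}{2-\gamma}}\bigr),\\
\frac{2}{2-\gamma} + 2\cdot\frac{1-\gamma}{2-\gamma} &= \frac{4-2\gamma}{2-\gamma} = 2.
\end{align*}
```

So a structure whose space falls short of $N^{\frac{2}{2-\gamma}}$ by a factor of $N^{\epsilon}$ is forced into the query-time branch, and the product $S(N)\cdot(T(N))^{2}$ has exponent $2-\epsilon$.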

Theorem 3.2

The Strong 3SUM-Indexing Conjecture implies the Strong SetIntersection Conjecture.

Proof

The proof follows the same structure as the proof of Theorem 3.1, but here we set $Q=(n^{1+\delta}/R)$ , where $\delta>0$ is a constant. Furthermore, we preprocess the buckets using a SetIntersection data structure, and if two sets intersect then instead of repeating the whole process with different choices of $h_{2}$ (in order to reduce the probability of a false positive), we use the SetIntersection data structure to report all of the elements in an intersection, and verify them all directly.

As before, set $z$ and let $k=h_{2}(z)$ . Since $h_{2}$ is pair-wise independent and linear then for any pair $x,y\in U$ where $x\neq y$ we have that if $x+y\neq z$ then $\Pr[h_{2}(x)+h_{2}(y)=k+c_{h_{2}}(\operatorname{mod}{}R)]=\Pr[h_{2}(x+y)=h_{2}(z)+c_{h_{2}}(\operatorname{mod}{}R)]=\frac{1}{Q}$ . We now bound the expected output size from all of the intersections. Since each pair of buckets implies at most $(\frac{3n}{R})^{2}$ pairs of elements, the expected size of their intersection is $E[|h_{2}(\mathcal{B}_{i})-k\cap h_{2}(\mathcal{C}_{j})|]=(\frac{3n}{R})^{2}\frac{1}{Q}=O(\frac{n^{1-\delta}}{R})$ . Thus, the expected size of the output of all of the $O(R)$ intersections is $O(R\frac{n}{Rn^{\delta}})=O(n^{1-\delta})$ . For each pair in an intersection we can verify in constant time if together with $z$ they form a solution.

To summarize, we create a total of $O(R\sqrt{Q})$ sets, each of size at most $3n/R$ . Thus, the total size of the SetIntersection instance is $N=O(R\sqrt{Q}\cdot n/R)=O(n\sqrt{Q})$ . For a query, we perform $\tilde{O}(R)$ queries on the SetIntersection structure. Furthermore, we spend $O(R)$ time to check possible solutions containing one of the expected $O(R)$ elements from buckets with too many elements by using the lookup tables. If we denote by $T(N)$ and $S(N)$ the query time and space usage, respectively, of the SetIntersection data structure on $N$ elements (in our case $N=\tilde{O}(R\sqrt{Q}n/R)=\tilde{O}(n^{\frac{3+\delta-\gamma}{2}})$ ), then the query time of the reduction becomes $t_{\textsf{3SI}{}}=\tilde{O}(R\cdot T(N)+n^{1-\delta})$ and the space usage is $s_{\textsf{3SI}{}}=\tilde{O}(S(N)+O(n))$ . Since we may assume that $S(N)=\Omega(N)$ , we have that $s_{\textsf{3SI}{}}=\tilde{O}(S(N))$ .

By the Strong 3SUM-Indexing conjecture, either $s_{\textsf{3SI}{}}=\tilde{\Omega}(n^{2})$ or $t_{\textsf{3SI}{}}=\tilde{\Omega}(n)$ , which means that either $S(N)=\tilde{\Omega}(N^{\frac{4}{3+\delta-\gamma}})$ or $T(N)=\tilde{\Omega}(N^{\frac{2-2\gamma}{3+\delta-\gamma}})$ . For any constant $\epsilon>0$ , if the SetIntersection data structure uses $\tilde{\Theta}(N^{\frac{4}{3+\delta-\gamma}-\epsilon})$ space, then $S(N)\cdot T(N)=\tilde{\Omega}(N^{\frac{4}{3+\delta-\gamma}-\epsilon+\frac{2-2\gamma}{3+\delta-\gamma}})=\tilde{\Omega}(N^{2-\frac{2\delta}{3+\delta-\gamma}-\epsilon})$ . Since this holds for any $\epsilon>0$ and any $\delta>0$ it must be that $S(N)\cdot T(N)=\tilde{\Omega}(N^{2})$ . ∎

Theorem 3.3

The SetDisjointness Conjecture implies the 3SUM-Indexing Conjecture.

Proof

Given an instance of SetDisjointness, we construct an instance of 3SUM-Indexing as follows. Denote by $M$ the value of the largest element in the SetDisjointness instance. Notice that we may assume that $M\leq N$ (otherwise we can use a straightforward renaming). For every element $x\in U$ that is contained in at least one of the sets we create two integers $x_{A}$ and $x_{B}$ , which are represented by $2\lceil\log{m}\rceil+\lceil\log{N}\rceil+3$ bits each (recall that $m$ is the number of sets).

The $\lceil\log{N}\rceil$ least significant bits in $x_{A}$ represent the value of $x$ . The following bit is a zero. The following $\lceil\log{m}\rceil$ bits in $x_{A}$ represent the index of the set containing $x$ , and the rest of the $2+\lceil\log{m}\rceil$ are all set to zero. The $\lceil\log{N}\rceil$ least significant bits in $x_{B}$ represent the value of $M-x$ . The following $2+\lceil\log{m}\rceil$ are all set to zero. The following $\lceil\log{m}\rceil$ bits in $x_{B}$ represent the index of the set containing $x$ , and the last bit is set to zero. Finally, the integer $x_{A}$ is added to $A_{1}$ of the 3SUM-Indexing instance, while the integer $x_{B}$ is added to $A_{2}$ .

We have created two sets of $n\leq M$ integers. We then preprocess them to answer 3SUM-Indexing queries. Now, to answer a SetDisjointness query on sets $S_{i}$ and $S_{j}$ , we query the 3SUM-Indexing data structure with an integer $z$ which is determined as follows. The $\lceil\log{N}\rceil$ least significant bits in $z$ represent the value of $M$ . The following bit is a zero. The following $\lceil\log{m}\rceil$ bits represent the index $i$ and are followed by a zero. The next $\lceil\log{m}\rceil$ bits represent the index $j$ and the last bit is set to zero.

It is straightforward to verify that there exists a solution to the 3SUM-Indexing problem on $z$ if and only if the sets $S_{i}$ and $S_{j}$ are not disjoint. Therefore, if there is a solution to the 3SUM-Indexing problem with less than $\tilde{\Omega}(n^{2})$ space and constant query time then there is a solution for the SetDisjointness problem which refutes the SetDisjointness Conjecture. ∎
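The bit layout used in this proof can be illustrated with a minimal Python sketch. The function names, the input format (a list of Python sets of non-negative integers), and the brute-force `three_sum_query` stand-in are assumptions made for illustration only. The widths `wN` and `wm` play the roles of $\lceil\log N\rceil$ and $\lceil\log m\rceil$, and are computed from $M$ and $m$ directly.

```python
def build_3sum_indexing(sets):
    """Encode a SetDisjointness instance as two integer arrays A1 and A2.

    Returns (A1, A2, make_query); make_query(i, j) builds the query integer z.
    """
    m = len(sets)
    M = max(max(s) for s in sets if s)       # largest element value
    wN = M.bit_length()                      # stands in for ceil(log N)
    wm = max(1, (m - 1).bit_length())        # stands in for ceil(log m)

    # Layout, least significant bits first:
    # [value: wN bits][0][index slot of x_A: wm bits][0][index slot of x_B: wm bits][0]
    off_a = wN + 1
    off_b = wN + 1 + wm + 1

    A1, A2 = [], []
    for idx, s in enumerate(sets):
        for x in s:
            A1.append(x | (idx << off_a))          # x_A: value x, set index in slot A
            A2.append((M - x) | (idx << off_b))    # x_B: value M - x, set index in slot B

    def make_query(i, j):
        return M | (i << off_a) | (j << off_b)

    return A1, A2, make_query


def three_sum_query(A1, A2, z):
    """Brute-force stand-in for a 3SUM-Indexing structure: is z = a + b?"""
    a2 = set(A2)
    return any((z - a) in a2 for a in A1)
```

The zero bit after the value field absorbs the carry of $x+(M-x')$, so the low bits of a sum equal $M$ exactly when $x=x'$.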

4 Parameterized Generalization: k-Set Intersection and (k+1)-SUM

Two parameterized generalizations of the SetDisjointness and 3SUM-Indexing problems are formally defined as follows:

Problem 4 (k-SetDisjointness Problem)

Preprocess a family $F$ of $m$ sets, all from universe $U$ , with total size $N=\sum_{S\in F}|S|$ so that given $k$ query sets $S_{1},S_{2},\dots,S_{k}\in F$ one can quickly determine if $\cap_{i=1}^{k}S_{i}=\emptyset$ .

Problem 5 ((k+1)-SUM-Indexing Problem)

Preprocess $k$ integer arrays $A_{1},A_{2},\dots,A_{k}$ , each of length $n$ , so that given a query integer $z$ we can decide if there is $x_{1}\in A_{1},x_{2}\in A_{2},\dots,x_{k}\in A_{k}$ such that $z=\sum_{i=1}^{k}x_{i}$ .

It turns out that a natural generalization of the data structure of Cohen and Porat [18] leads to a data structure for k-SetDisjointness, as shown in the following lemma.

Lemma 1

There exists a data structure for the k-SetDisjointness problem where the query time is $T$ and the space usage is $S=O((N/T)^{k})$ .

Proof

We call the $f$ largest sets in $F$ large sets. The rest of the sets are called small sets. In the preprocessing stage we explicitly maintain a $k$ -dimensional table with the answers for all k-SetDisjointness queries where all $k$ sets are large sets. The space needed for such a table is $S=f^{k}$ . Moreover, for each set (large or small) we maintain a look-up table that supports disjointness queries (with this set) in constant time. Since there are $f$ large sets and the total number of elements is $N$ , the size of each of the small sets is at most $N/f$ .

Given a k-SetDisjointness query, if all of the query sets are large then we look up the answer in the $k$ -dimensional table. If at least one of the sets is small then, using a brute-force search, we look up each of the at most $O(N/f)$ elements of that small set in each of the other $k-1$ sets. Thus, the total query time is bounded by $O(kN/f)$ , and the space usage is $S=O(f^{k})$ . The rest follows. ∎
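The large/small split in this proof can be sketched as follows. The class name, the constructor parameters, and the use of a dictionary for the $k$-dimensional table are illustrative assumptions. The preprocessing here is deliberately naive, since only the query path and the $f^{k}$ table size mirror the lemma.

```python
from itertools import product


class KSetDisjointness:
    """Sketch of the large/small-set structure described in Lemma 1."""

    def __init__(self, family, k, f):
        # A frozenset per set gives constant expected-time membership lookups.
        self.sets = [frozenset(s) for s in family]
        self.k = k
        # The f largest sets are the "large" sets.
        order = sorted(range(len(self.sets)), key=lambda i: -len(self.sets[i]))
        self.large = set(order[:f])
        # k-dimensional table over the large sets: f^k entries.
        self.table = {}
        for combo in product(sorted(self.large), repeat=k):
            common = set(self.sets[combo[0]]).intersection(
                *(self.sets[i] for i in combo[1:]))
            self.table[combo] = (len(common) == 0)

    def disjoint(self, query):
        """Return True iff the k query sets have an empty common intersection."""
        assert len(query) == self.k
        if all(i in self.large for i in query):
            return self.table[tuple(query)]
        # Some query set is small: scan it and probe every query set.
        small = min((i for i in query if i not in self.large),
                    key=lambda i: len(self.sets[i]))
        return not any(all(x in self.sets[j] for j in query)
                       for x in self.sets[small])
```

Choosing `f` close to $N/T$ gives query time about $T$ for constant $k$, with a table of roughly $(N/T)^{k}$ entries.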

Notice that for the case of $k=2$ in Lemma 1 we obtain the same tradeoff of Cohen and Porat [18] for SetDisjointness. The following conjecture suggests that the upper bound of Lemma 1 is the best possible.

Conjecture 7

Strong k-SetDisjointness Conjecture. Any data structure for the k-SetDisjointness problem that answers queries in $T$ time must use $S=\tilde{\Omega}(\frac{N^{k}}{T^{k}})$ space.

Similarly, a natural generalization of the Strong 3SUM-Indexing conjecture is the following.

Conjecture 8

Strong (k+1)-SUM-Indexing Conjecture. There is no solution for the (k+1)-SUM-Indexing problem with $\tilde{O}(n^{k-\Omega(1)})$ space and truly sublinear query time.

We also consider some weaker conjectures, similar to the SetDisjointness and 3SUM-Indexing conjectures.

Conjecture 9

k-SetDisjointness Conjecture. Any data structure for the k-SetDisjointness problem that answers queries in constant time must use $\tilde{\Omega}(N^{k})$ space.

Conjecture 10

(k+1)-SUM-Indexing Conjecture. There is no solution for the (k+1)-SUM-Indexing problem with $\tilde{O}(n^{k-\Omega(1)})$ space and constant query time.

Similar to Theorem 3.3, we prove the following relationship between the k-SetDisjointness conjecture and the (k+1)-SUM-Indexing conjecture.

Theorem 4.1

The k-SetDisjointness conjecture implies the (k+1)-SUM-Indexing conjecture.

Proof

Given an instance of k-SetDisjointness, we construct an instance of (k+1)-SUM-Indexing as follows. Denote by $M$ the value of the largest element in the SetDisjointness instance. Notice that we may assume that $M\leq N$ (otherwise we use a straightforward renaming). For every element $x\in U$ that is contained in at least one of the sets we create $k$ integers $x_{1},x_{2},...,x_{k}$ , where each integer is represented by $k\lceil\log{m}\rceil+(k-1)\lceil\log{N}\rceil+2k-1$ bits.

For integer $x_{i}$ , if $i>1$ the $(k-1)\lceil\log{N}\rceil+k-1$ least significant bits are all set to zero, except for the bits in indices $(i-2)(\lceil\log{N}\rceil+1)+1,...,(i-1)(\lceil\log{N}\rceil+1)$ that represent the value of $x$ . If $i=1$ the value of the bits in the indices $(j-1)(\lceil\log{N}\rceil+1)+1,...,j(\lceil\log{N}\rceil+1)$ is set to $M-x$ for all $1\leq j\leq k-1$ . The $k\lceil\log{m}\rceil+k$ following bits are all set to zero, except for the bits in indices $(i-1)(\lceil\log{m}\rceil+1)+1,...,i(\lceil\log{m}\rceil+1)$ which represent the index of the set containing $x$ .

We now create an instance of (k+1)-SUM-Indexing where the $j$ th input array $A_{j}$ is the set of integers $x_{j}$ for all $x\in U$ that is contained in at least one set of our family. Thus, the size of each array is at most $N$ . Now, given a k-SetDisjointness query $(i_{1},i_{2},...,i_{k})$ we must decide if $S_{i_{1}}\cap S_{i_{2}}\cap...\cap S_{i_{k}}=\emptyset$ . To answer this query we will query the instance of (k+1)-SUM-Indexing we have created with an integer $z$ whose binary representation is as follows: In the $(k-1)\lceil\log{N}\rceil+k-1$ least significant bits the value of the bits in the indices $(j-1)(\lceil\log{N}\rceil+1)+1,...,j(\lceil\log{N}\rceil+1)$ is set to $M$ for all $1\leq j\leq k-1$ . In the $k\lceil\log{m}\rceil+k$ following bits, the bits at locations $(j-1)(\lceil\log{m}\rceil+1)+1,...,j(\lceil\log{m}\rceil+1)$ represent $i_{j}$ (for $1\leq j\leq k$ ). The rest of the bits are padding zero bits (in between representations of various $i_{j}$ s and $M$ s).

If $S_{i_{1}}\cap S_{i_{2}}\cap...\cap S_{i_{k}}\neq\emptyset$ then by our construction it is straightforward to verify that the (k+1)-SUM-Indexing query on $z$ will return that there is a solution. If $S_{i_{1}}\cap S_{i_{2}}\cap...\cap S_{i_{k}}=\emptyset$ then at least for one $j\in[k-1]$ the sum of values in the bits in indices $(j-1)(\lceil\log{N}\rceil+1)+1,...,j(\lceil\log{N}\rceil+1)$ in the $(k-1)\lceil\log{N}\rceil+k-1$ least significant bits will not be $M$ . This is because we can view each block of $\lceil\log{N}\rceil+1$ bits in the $(k-1)\lceil\log{N}\rceil+k-1$ least significant bits as solving a linear equation. This equation is of the form $M-x_{1}+x_{i}=M$ for every block $i-1$ where $2\leq i\leq k$ . The solution of each of these equations is $x_{1}=x_{i}$ for all $2\leq i\leq k$ . Consequently, a solution can be found only if there is a specific $x$ which is contained in all of the $k$ sets. Therefore, we get a correct answer to a k-SetDisjointness query by answering a (k+1)-SUM-Indexing query.

Consequently, if for some specific constant $k$ there is a solution to the (k+1)-SUM-Indexing problem with less than $\tilde{\Omega}(n^{k})$ space and constant query time, then with this reduction we refute the k-SetDisjointness conjecture. ∎
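The block structure of the integers $x_{1},\dots,x_{k}$ and of the query $z$ can be made concrete with a short Python sketch. The function names and the brute-force `ksum_query` checker are assumptions for illustration. Block widths are derived from $M$ and $m$ and already include the one padding bit per block, and elements are assumed to be non-negative integers.

```python
from itertools import product


def build_ksum_indexing(sets, k):
    """Encode a k-SetDisjointness instance as k integer arrays (sketch)."""
    m = len(sets)
    M = max(max(s) for s in sets if s)
    wN = M.bit_length() + 1                  # value block width, one padding bit included
    wm = max(1, (m - 1).bit_length()) + 1    # index block width, one padding bit included
    low = (k - 1) * wN                       # total width of the k-1 value blocks

    def value_block(j, v):                   # j-th value block, 1 <= j <= k-1
        return v << ((j - 1) * wN)

    def index_block(i, idx):                 # i-th index block, 1 <= i <= k
        return idx << (low + (i - 1) * wm)

    arrays = [[] for _ in range(k)]
    for idx, s in enumerate(sets):
        for x in s:
            # x_1 carries M - x in every value block.
            x1 = sum(value_block(j, M - x) for j in range(1, k))
            arrays[0].append(x1 + index_block(1, idx))
            # x_i for i > 1 carries x in value block i - 1 only.
            for i in range(2, k + 1):
                arrays[i - 1].append(value_block(i - 1, x) + index_block(i, idx))

    def make_query(indices):
        z = sum(value_block(j, M) for j in range(1, k))
        return z + sum(index_block(i, idx)
                       for i, idx in enumerate(indices, start=1))

    return arrays, make_query


def ksum_query(arrays, z):
    """Brute-force stand-in for a (k+1)-SUM-Indexing structure."""
    return any(sum(combo) == z for combo in product(*arrays))
```

Each value block then encodes one equation $M-x_{1}+x_{i}=M$, and the padding bit keeps a block's sum (at most $2M$) from spilling into the next block.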

5 Directed Reachability Oracles as a Generalization of Set Disjointness Conjecture

An open question which was stated by Pǎtraşcu in [37] asks if it is possible to preprocess a sparse directed graph in less than $\Omega(n^{2})$ space so that Reachability queries (given two query vertices $u$ and $v$ decide whether there is a path from $u$ to $v$ or not) can be answered efficiently. A partial answer, given in [37], states that for constant query time truly superlinear space is necessary. In the undirected case the question is trivial and one can answer queries in constant time using linear space. This is also possible for planar directed graphs (see Holm et al. [27]).

We now show that Reachability oracles for sparse graphs can serve as a generalization of the SetDisjointness conjecture. We define the following parameterized version of Reachability. In the k-Reachability problem the goal is to preprocess a directed sparse graph $G=(V,E)$ so that given a pair of distinct vertices $u,v\in V$ one can quickly answer whether there is a path from $u$ to $v$ consisting of at most $k$ edges. We prove that 2-Reachability and SetDisjointness are tightly connected.

Lemma 2

There is a linear time reduction from SetDisjointness to 2-Reachability and vice versa which preserves the size of the instance.

Proof

Given a graph $G=(V,E)$ as an instance for 2-Reachability, we construct a corresponding instance of SetDisjointness as follows. For each vertex $v$ we create the sets $V_{in}=\{u|(u,v)\in E\}$ and $V_{out}=\{u|(v,u)\in E\}\cup\{v\}$ . We have $2n$ sets and $2m+n$ elements in all of them ( $|V|=n$ and $|E|=m$ ). Now, a query $u,v$ is reduced to determining if the sets $U_{out}$ and $V_{in}$ are disjoint or not. Notice that the construction is done in linear time and preserves the size of the instance. In the opposite direction, we are given $m$ sets $S_{1},S_{2},...,S_{m}$ having $N$ elements in total $e_{1},e_{2},...,e_{N}$ . We can create an instance of 2-Reachability in the following way. For each set $S_{i}$ we create a vertex $v_{i}$ . Moreover, for each element $e_{j}$ we create a vertex $u_{j}$ . Then, for each element $e_{j}$ in a set $S_{i}$ we create two directed edges $(v_{i},u_{j})$ and $(u_{j},v_{i})$ . These vertices and edges define a directed graph, which is preprocessed for 2-Reachability queries. It is straightforward to verify that $S_{i}$ and $S_{j}$ intersect if and only if there is a path of at most $2$ edges from $v_{i}$ to $v_{j}$ . Moreover, the construction is done in linear time and preserves the size of the instance. ∎
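Both directions of this reduction are short enough to sketch in Python. The function names and the representation of vertices as integers $0,\dots,n-1$ are assumptions for illustration, and the set lookups are plain brute force rather than a space-efficient structure.

```python
def reachability_to_sets(n, edges):
    """2-Reachability instance -> SetDisjointness instance (sketch)."""
    v_in = [set() for _ in range(n)]
    v_out = [{v} for v in range(n)]          # V_out also contains v itself
    for (u, v) in edges:
        v_out[u].add(v)
        v_in[v].add(u)
    return v_in, v_out


def reach_within_2(v_in, v_out, u, v):
    """For u != v: u reaches v within 2 edges iff U_out and V_in intersect."""
    return not v_out[u].isdisjoint(v_in[v])


def sets_to_reachability(sets):
    """SetDisjointness instance -> directed graph for 2-Reachability (sketch)."""
    elements = sorted(set().union(*sets))
    # Set S_i becomes vertex i; element e_j becomes vertex len(sets) + j.
    elem_vertex = {e: len(sets) + t for t, e in enumerate(elements)}
    edges = []
    for i, s in enumerate(sets):
        for e in s:
            edges.append((i, elem_vertex[e]))    # edge (v_i, u_j)
            edges.append((elem_vertex[e], i))    # edge (u_j, v_i)
    return len(sets) + len(elements), edges
```

Composing the two functions maps a SetDisjointness instance to a graph and back, which gives a quick way to test that the answers agree.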

Furthermore, we consider k-Reachability for $k\geq 3$ . First we show an upper bound on the tradeoff between space and query time for solving k-Reachability.

Lemma 3

There exists a data structure for k-Reachability with $S$ space and $T$ query time such that $ST^{2/(k-1)}=O(n^{2})$ .

Proof

Let $\alpha>0$ be an integer parameter to be set later. Given a directed graph $G=(V,E)$ , we call vertex $v\in V$ a heavy vertex if $deg(v)>\alpha$ and a vertex $u\in V$ a light vertex if $deg(u)\leq\alpha$ . Notice that the number of heavy vertices is at most $n/\alpha$ . For all heavy vertices in $V$ we maintain a matrix containing the answers to any k-Reachability query between two heavy vertices. This uses $O(n^{2}/\alpha^{2})$ space.

Next, we recursively construct a data structure for (k-1)-Reachability. Given a query $u,v$ , if both vertices are heavy then the answer is obtained from the matrix. Otherwise, either $u$ or $v$ is a light vertex. Without loss of generality, say $u$ is a light vertex. We consider each vertex $w\in N_{out}(u)$ ( $N_{out}(u)=\{v|(u,v)\in E\}$ ) and query the (k-1)-Reachability data structure with the pair $w,v$ . Since $u$ is a light vertex, there are no more than $\alpha$ queries. One of the queries returns a positive answer if and only if there exists a path of length at most $k$ from $u$ to $v$ .

Denote by $S(k,n)$ the space used by our k-Reachability oracle on a graph with $n$ vertices and denote by $Q(k,n)$ the corresponding query time. In our construction we have $S(k,n)=n^{2}/\alpha^{2}+S(k-1,n)$ and $Q(k,n)=\alpha Q(k-1,n)+O(1)$ . For $k=1$ it is easy to construct a linear space data structure using hashing so that queries can be answered in constant time. Thus, $S=S(k,n)=O((k-1)n^{2}/\alpha^{2})$ and $T=Q(k,n)=O(\alpha^{k-1})$ . ∎
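The recursive heavy/light construction can be sketched in Python as follows. The class and method names are hypothetical, and two details are assumptions not fixed by the proof: `deg` is taken to count both incoming and outgoing edges, and when only $v$ is light the query recurses over the in-neighbors of $v$. The heavy tables are built with a depth-bounded BFS, which is a naive preprocessing choice.

```python
from collections import deque


class KReachability:
    """Sketch of the heavy/light oracle from Lemma 3 (illustrative only)."""

    def __init__(self, n, edges, k, alpha):
        self.k, self.edge_set = k, set(edges)
        self.out = [[] for _ in range(n)]
        self.inn = [[] for _ in range(n)]
        for u, v in self.edge_set:
            self.out[u].append(v)
            self.inn[v].append(u)
        # Assumption: deg(v) counts incoming plus outgoing edges.
        self.heavy = {v for v in range(n)
                      if len(self.out[v]) + len(self.inn[v]) > alpha}
        # One heavy-by-heavy table per level 2..k; level 1 is the edge hash set.
        self.table = {j: self._heavy_table(j) for j in range(2, k + 1)}

    def _heavy_table(self, j):
        table = set()
        for s in self.heavy:
            dist = {s: 0}
            queue = deque([s])
            while queue:
                x = queue.popleft()
                if dist[x] == j:             # do not expand beyond j edges
                    continue
                for y in self.out[x]:
                    if y not in dist:
                        dist[y] = dist[x] + 1
                        queue.append(y)
            table.update((s, t) for t in dist if t in self.heavy and t != s)
        return table

    def query(self, u, v, j=None):
        """For u != v: is there a path from u to v with at most j edges?"""
        j = self.k if j is None else j
        if j == 1:
            return (u, v) in self.edge_set
        if u in self.heavy and v in self.heavy:
            return (u, v) in self.table[j]
        if u not in self.heavy:              # u is light: at most alpha out-neighbors
            return any(w == v or self.query(w, v, j - 1) for w in self.out[u])
        # Otherwise v is light: at most alpha in-neighbors.
        return any(w == u or self.query(u, w, j - 1) for w in self.inn[v])
```

Each level multiplies the number of recursive probes by at most `alpha`, which is where the $O(\alpha^{k-1})$ query bound comes from.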

Notice that for the case of $k=2$ the upper bounds from Lemma 3 exactly match the tradeoff of the Strong SetDisjointness Conjecture ( $ST^{2}=\tilde{O}(n^{2})$ ). We expand this conjecture by considering the tightness of our upper bound for k-Reachability, which then leads to some interesting consequences with regard to distance oracles.

Conjecture 11

Directed k-Reachability Conjecture. Any data structure for the k-Reachability problem with query time $T$ must use $S=\tilde{\Omega}(\frac{n^{2}}{T^{2/(k-1)}})$ space.

Notice that when $k$ is non-constant then by our upper bound $\tilde{\Omega}(n^{2})$ space is necessary independent of the query time. This fits nicely with what is currently known about the general question of Reachability oracles: either we spend $n^{2}$ space and answer queries in constant time or we do no preprocessing and then answer queries in linear time. This leads to the following conjecture.

Conjecture 12

Directed Reachability Hypothesis. Any data structure for the Reachability problem must either use $\tilde{\Omega}(n^{2})$ space, or linear query time.

The conjecture states that in the general case of Reachability there is no full tradeoff between space and query time. We believe the conjecture is true even if the path is limited to lengths of some non-constant number of edges.

6 Distance Oracles and Directed Reachability

There are known lower bounds for constant query time distance oracles based on the SetDisjointness hypothesis. Specifically, Cohen and Porat [18] showed that stretch-less-than-2 oracles need $\Omega(n^{2})$ space for constant queries. Pǎtraşcu et al. [39] showed a conditional space lower bound of $\Omega(m^{5/3})$ for constant-time stretch-2 oracles. Applying the Strong SetDisjointness conjecture to the same argument as in [18] we can prove that for stretch-less-than-2 oracles the tradeoff between $S$ (the space for the oracle) and $T$ (the query time) is given by $S\times T^{2}=\Omega(n^{2})$ .

Recent efforts have been directed toward constructing compact distance oracles where we allow non-constant query time. For stretch-2 and stretch-3 Agarwal et al. [10] [9] achieve a space-time tradeoff of $S\times T=O(n^{2})$ and $S\times T^{2}=O(n^{2})$ , respectively, for sparse graphs. Agarwal [8] also showed many other results for stretch-2 and below. Specifically, Agarwal showed that for any integer $k$ a stretch-(1+1/k) oracle exhibits the following space-time tradeoff: $S\times T^{1/k}=O(n^{2})$ . Agarwal also showed a stretch-(1+1/(k+0.5)) oracle that exhibits the following tradeoff: $S\times T^{1/(k+1)}=O(n^{2})$ . Finally, Agarwal gave a stretch-(5/3) oracle that achieves a space-time tradeoff of $S\times T=O(n^{2})$ . Unfortunately, no lower bounds are known for non-constant query time.

Conditioned on the directed k-Reachability conjecture we prove the following lower bound.

Lemma 4

Assume the directed k-Reachability conjecture holds. Then stretch-less-than- $(1+2/k)$ distance oracles with query time $T$ must use $S\times T^{2/(k-1)}=\tilde{\Omega}(n^{2})$ space.

Proof

Given a graph $G=(V,E)$ that we want to preprocess for k-Reachability, we create a layered graph with $k$ layers where each layer consists of a copy of all vertices of $V$ . Each pair of neighboring layers is connected by a copy of all edges in $E$ . We omit all directions from the edges. For every fixed integer $k$ , the layered graph has $O(|V|)$ vertices and $O(|E|)$ edges. Next, notice that if we construct a distance oracle that can distinguish between pairs of vertices of distance at most $k$ and pairs of vertices of distance at least $k+2$ , then we can answer k-Reachability queries. Consequently, assuming the k-Reachability conjecture we have that $S\times T^{2/(k-1)}=\Omega(n^{2})$ for stretch-less-than- $(1+2/k)$ distance oracles (for $k=2$ this is exactly the result we get by the SetDisjointness hypothesis). ∎

Notice, that the stretch-(5/3) oracle shown by Agarwal [8] achieves a space-time tradeoff of $S\times T=O(n^{2})$ . Our lower bound is very close to this upper bound since it applies for any distance oracle with stretch-less-than- $(5/3)$ , by setting $k=3$ .

7 SETH and Orthogonal Vectors Space Conjectures

Solving SAT using $O(2^{n})$ time, where $n$ is the number of variables in the formula, can be easily done using only $O(n)$ space. However, the question is how we can use space in the case that we have only a partial assignment of $R$ variables and we would like to quickly figure out whether this partial assignment can be completed to a full satisfying assignment or not. On one end, by using just $O(n)$ space we can answer queries in $O(2^{n-R})$ time. On the other end, we can save the answers to all possible queries using $O(2^{R})$ space. It is not clear if there is some sort of a tradeoff in between these two. A related problem is the problem of Orthogonal Vectors (OV). In this problem one is given a collection of $n$ vectors of length $O(\log{n})$ and needs to answer whether there are two of them which are orthogonal to one another. A reduction from SETH to OV was shown in [41]. By this reduction, given a k-CNF formula of $n$ variables one can transform it using $O(2^{\epsilon n})$ time to $O(2^{\epsilon n})$ instances of OV in which the vectors are of length $2f(k,\epsilon)\log{n}$ (for any $\epsilon>0$ , where $f(k,\epsilon)n$ is the number of clauses of each sparse formula represented by one instance of OV). This reduction leads to the following conjecture regarding OV, which is based on SETH: There is no algorithm that, for every $c\geq 1$ , solves the OV problem on $n$ boolean vectors of length $c\log{n}$ in $\tilde{O}(n^{2-\Omega(1)})$ time.

We can consider a data structure variant of the OV problem, which we call OV indexing. Given a list of $n$ boolean vectors of length $c\log{n}$ we should preprocess them and create a suitable data structure. Then, we answer queries of the following form: Given a vector $v$ , is there a vector in the list which is orthogonal to $v$ ?
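To make the query semantics concrete, here is a minimal Python sketch of the trivial baseline for OV indexing, which stores the vectors as bitmasks and scans all of them per query. The names `pack` and `OVIndexBruteForce` are hypothetical. This is only the linear-space, linear-query-time end of the spectrum, not a data structure with any nontrivial tradeoff.

```python
def pack(bits):
    """Pack a list of 0/1 entries into an integer bitmask."""
    mask = 0
    for b in bits:
        mask = (mask << 1) | b
    return mask


class OVIndexBruteForce:
    """Baseline for OV indexing: store the vectors, scan them at query time."""

    def __init__(self, vectors):
        self.masks = [pack(v) for v in vectors]

    def query(self, v):
        q = pack(v)
        # Two Boolean vectors are orthogonal iff they share no 1-coordinate.
        return any((m & q) == 0 for m in self.masks)
```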

We state the following conjecture which is the space variant of the well-studied OV (time) conjecture:

Conjecture 13

Orthogonal Vectors Indexing Hypothesis: There is no algorithm that, for every $c\geq 1$ , solves the OV indexing problem with $\tilde{O}(n^{2-\Omega(1)})$ space and truly sublinear query time.

We note that we believe that the last conjecture is true even if we allow superpolynomial preprocessing time. Moreover, it seems that it also may be true even for some constant $c$ slightly larger than 2.

8 Space Requirements for Boolean Matrix Multiplication

Boolean Matrix Multiplication (BMM) is one of the most fundamental problems in Theoretical Computer Science. The question of whether computing the Boolean product of two Boolean matrices of size $n\times n$ is possible in $O(n^{2})$ time is one of the most intriguing open problems. Moreover, finding a combinatorial algorithm for BMM taking $O(n^{3-\epsilon})$ time for some $\epsilon>0$ is considered to be impossible to do with current algorithmic techniques.

We focus on the following data structure version of BMM: preprocess two $n\times n$ Boolean matrices $A$ and $B$ , such that given a query $(i,j)$ we can quickly return the value of $c_{i,j}$ where $C=\{c_{i,j}\}$ is the Boolean product of $A$ and $B$ . Since storing all possible answers to queries will require $\Theta(n^{2})$ space in the worst case, we focus on the more interesting scenario where we have only $O(n^{2-\Omega(1)})$ space to store the outcome of the preprocessing stage. In case the input matrices are dense (the number of ones and the number of zeroes are both $\Theta(n^{2})$ ) it seems that this can be hard to achieve as storing the input matrices alone will take $\Theta(n^{2})$ space. So we consider a complexity model, which we call the read-only input model, in which storing the input is for free (say on read-only memory), and the space usage of the data structure is only related to the additional space used. We now demonstrate that BMM in the read-only input model is equivalent to SetDisjointness.

Lemma 5

BMM in the read-only input model and SetDisjointness are equivalent.

Proof

Given an instance of SetDisjointness let $e_{1},...,e_{N}$ denote the elements in an input instance. We construct an instance of BMM as follows. Assume without loss of generality that no set is empty, and so $m\leq N$ . Row $i$ in matrix $A$ represents a set $S_{i}$ while each column $j$ represents element $e_{j}$ . An entry $a_{i,j}$ equals $1$ if $e_{j}\in S_{i}$ and equals zero otherwise. We also set $B=A^{T}$ , and pad each of the matrices with zeroes so their size will be $N\times N$ . Clearly, $c_{i,j}$ in matrix $C$ , which is the product of $A$ and $B$ , indicates whether $S_{i}\cap S_{j}\neq\emptyset$ .

In the opposite direction, given two matrices $A$ and $B$ having $m$ ones we view each row $i$ of $A$ as a characteristic vector of a set $S_{i}$ (the elements in the set correspond to the ones in that row) and each column $j$ of $B$ as a characteristic vector of a set $S_{j+n}$ (the elements in the set correspond to the ones in that column). Thus, the instance of SetDisjointness that has been created consists of $2n$ sets with $O(m)$ elements. The value of an element $c_{i,j}$ in the product of $A$ and $B$ can be determined by the intersection of $S_{i}$ and $S_{j+n}$ . ∎
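Both directions of the equivalence can be sketched in Python. The function names are illustrative assumptions. For brevity the matrices are padded only to the larger of the number of sets and the number of distinct elements, rather than to $N\times N$, and `bmm_entry` is a brute-force evaluation of a single product entry.

```python
def sets_to_bmm(sets):
    """SetDisjointness -> BMM: A is the set/element incidence matrix, B = A^T."""
    elements = sorted(set().union(*sets))
    col = {e: j for j, e in enumerate(elements)}
    size = max(len(sets), len(elements))         # pad with zeroes to a square matrix
    A = [[0] * size for _ in range(size)]
    for i, s in enumerate(sets):
        for e in s:
            A[i][col[e]] = 1
    B = [list(row) for row in zip(*A)]           # transpose of A
    return A, B


def bmm_entry(A, B, i, j):
    """Brute-force c_{i,j} of the Boolean product of A and B."""
    return int(any(A[i][t] and B[t][j] for t in range(len(B))))


def bmm_to_sets(A, B):
    """BMM -> SetDisjointness: rows of A and columns of B become sets."""
    n = len(A)
    rows = [{t for t in range(n) if A[i][t]} for i in range(n)]
    cols = [{t for t in range(n) if B[t][j]} for j in range(n)]
    return rows + cols                           # S_i for rows, S_{j+n} for columns
```

With `S = bmm_to_sets(A, B)`, the entry $c_{i,j}$ is 1 exactly when `S[i]` and `S[j + n]` intersect.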

Another interesting connection between BMM and the other problems discussed in this paper is the connection to the problem of calculating the transitive closure of a graph, which is the general directed reachability mentioned above. It is well-known that BMM and transitive closure are equivalent in terms of time as shown by Fischer and Meyer [22]. But what happens if we consider space? It is easy to see that BMM can be reduced to transitive-closure (directed reachability) even in terms of space. However, the opposite direction is not clear as the reduction for time involves recursive squaring, which cannot be implemented efficiently in terms of space.

Another fascinating variant of BMM is the one in which an $n\times n$ matrix $A$ is input for preprocessing and afterwards we need to calculate the result of multiplying it by a given query vector $v$ . This can be seen as the space variant of the celebrated OMV (online matrix-vector) problem discussed by Henzinger et al. [25]. It is interesting to see if one can make use of a data structure so that $n$ consecutive vector queries can be answered in $\tilde{O}(n^{3-\Omega(1)})$ time.

9 Applications

We now provide applications of our rich framework for proving conditional space lower bounds. In the following subsections we consider both static and dynamic problems.

9.1 Static Problems

9.1.1 Edge Triangles

The first example we consider concerns triangles. In a problem that is called edge triangles detection, we are given a graph $G=(V,E)$ to preprocess and then we are given an edge $(u,v)$ as a query and need to answer whether $(u,v)$ belongs to a triangle. In a reporting variant of this problem, called edge triangles, we need not only to answer if $(u,v)$ belongs to a triangle but also report all triangles it belongs to. This problem was considered in [12].

It can be easily shown that these problems are equivalent to SetDisjointness and SetIntersection. We just construct a set $S_{v}$ per each vertex $v$ containing all its neighbors. Querying if there is a triangle containing the edge $(u,v)$ is equivalent to asking if $S_{v}\cap S_{u}$ is empty or not. Considering the reporting variant, reporting all triangles containing $(u,v)$ is thus equivalent to finding all the elements in $S_{v}\cap S_{u}$ . Therefore, we get the following results:

Theorem 9.1

Assume the Strong SetDisjointness conjecture. Suppose there is a data structure for the edge triangles detection problem for a graph $G=(V,E)$ , with $S$ space and query time $T$ . Then $S=\tilde{\Omega}(|E|^{2}/T^{2})$ .

Theorem 9.2

Assume the Strong SetIntersection conjecture. Suppose there is a data structure for the edge triangles problem for a graph $G=(V,E)$ , with $S$ space and query time $O(T+op)$ , where $op$ is the size of the output of the query. Then $S=\tilde{\Omega}(|E|^{2}/T)$ .
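The correspondence between edge triangles and neighbor-set intersections behind both theorems can be sketched in a few lines of Python. The function names are illustrative, the graph is assumed to be simple and undirected, and the set operations are plain brute force.

```python
def neighbor_sets(n, edges):
    """Build S_v, the set of neighbors of v, for every vertex v."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_in_triangle(adj, u, v):
    """Detection variant: (u, v) lies on a triangle iff S_u and S_v intersect."""
    return not adj[u].isdisjoint(adj[v])


def edge_triangles(adj, u, v):
    """Reporting variant: one triangle per element of the intersection."""
    return sorted((u, v, w) for w in adj[u] & adj[v])
```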

9.1.2 Histogram Indexing

A histogram, also called a Parikh vector, of a string $T$ over alphabet $\Sigma$ is a $|\Sigma|$ -length vector containing the character count of $T$ . For example, for $T=aaccbacab$ the histogram is $v(T)=(4,2,3)$ . In the histogram indexing problem we preprocess an $N$ -length string $T$ to support the following queries: given a query histogram $v$ , return whether there is a substring $T^{\prime}$ of $T$ such that $v(T^{\prime})=v$ .
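The definitions above can be checked with a small Python sketch. The function names are hypothetical, and the query routine is a brute-force sliding window with no preprocessing, so it illustrates the problem rather than any of the data structures discussed here. Treating the all-zero histogram as matched by the empty substring is an assumption.

```python
from collections import Counter


def histogram(s, alphabet):
    """Parikh vector of s: character counts in the order given by alphabet."""
    counts = Counter(s)
    return tuple(counts[c] for c in alphabet)


def has_substring_with_histogram(text, alphabet, v):
    """Is there a substring of text whose histogram equals v? (brute force)"""
    length = sum(v)
    if length == 0:
        return True          # assumption: the empty substring matches the zero vector
    if length > len(text):
        return False
    target = dict(zip(alphabet, v))
    window = Counter(text[:length])
    for start in range(len(text) - length + 1):
        if start > 0:
            # Slide the window one position to the right.
            window[text[start - 1]] -= 1
            window[text[start + length - 1]] += 1
        if all(window[c] == target[c] for c in alphabet):
            return True
    return False
```

For the example string, `histogram("aaccbacab", "abc")` gives `(4, 2, 3)`.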

This problem has received much attention in recent years. The case where the alphabet size is 2 (binary alphabet) was especially studied. A simple algorithm for this case solves the problem in $O(N^{2})$ preprocessing time and constant query time. There was a concentrated effort to reduce the quadratic preprocessing time for some years. However, an algorithm with preprocessing time that is $O(N^{2-\epsilon})$ for some $\epsilon>0$ was unknown until a recent breakthrough by Chan and Lewenstein [16]. They showed an algorithm with $O(N^{1.859})$ preprocessing time and constant query time. For alphabet size $\ell$ they obtained an algorithm with $\tilde{O}(N^{2-\delta})$ preprocessing time and $\tilde{O}(N^{2/3+\delta(\ell+13)/6})$ query time for $0\leq\delta\leq 1$ . Regarding space complexity, it is well known how to solve histogram indexing for binary alphabet using linear space and constant query time. For alphabet size $\ell$ , Kociumaka et al. [30] presented a data structure with $\tilde{O}(N^{2-\delta})$ space and $\tilde{O}(N^{\delta(2\ell-1)})$ query time. Chan and Lewenstein [16] improved their result and showed a solution by a data structure using $\tilde{O}(N^{2-\delta})$ space with only $\tilde{O}(N^{\delta(\ell+1)/2})$ query time.

Amir et al. [11] proved a conditional lower bound on the tradeoff between the preprocessing and query time of the histogram indexing problem. Very recently, their lower bound was improved and generalized by Goldstein et al. [24]. Following the reduction by Goldstein et al. [24] and utilizing our framework for conditional space lower bounds, we obtain the following lower bound on the tradeoff between the space and query time of histogram indexing:

Theorem 9.3

Assume the Strong 3SUM-Indexing conjecture holds. The histogram indexing problem for a string of length $N$ and constant alphabet size $\ell\geq 3$ cannot be solved with $O(N^{2-\frac{2(1-\alpha)}{\ell-1-\alpha}-\Omega(1)})$ space and $O(N^{1-\frac{1+\alpha(\ell-3)}{\ell-1-\alpha}-\Omega(1)})$ query time, for any $0\leq\alpha\leq 1$ .

Proof

We use the same reduction as in [24]. This time it will be used to reduce an instance of 3SUM-Indexing (on $2n$ numbers) to histogram indexing, instead of reducing from an instance of 3SUM. The space consumed by the reduction is dominated by the space needed to prepare a histogram indexing instance with string length $N=O(n^{\frac{\ell-2-\alpha}{\ell-3}})$ for histogram queries. The number of histogram queries we do for each query number $z$ of the 3SUM-Indexing instance is $O(n^{\alpha})$ . The query time is dominated by the time required by these queries. Let $S(N,\ell)$ denote the space required by a data structure for histogram indexing on $N$ -length string over alphabet size $\ell$ and let $Q(N,\ell)$ denote the query time for the same parameters. Assuming the Strong 3SUM-Indexing conjecture and following our reduction, we cannot have both $S(N,\ell)=O(n^{2-\Omega(1)})$ and $Q(N,\ell)=O(n^{1-\alpha-\Omega(1)})$ . Plugging in the value of $n$ in terms of $N$ we get the required lower bound. ∎

If we substitute $\delta=\frac{2(1-\alpha)}{\ell-1-\alpha}$ into the previous theorem, we get that if the Strong 3SUM-Indexing conjecture is true we cannot have a solution for histogram indexing with $\tilde{O}(N^{2-\delta})$ space and $\tilde{O}(N^{\delta(\ell-2)/2})$ query time. This lower bound is very close to the upper bound obtained by Chan and Lewenstein [16], as there is only a gap of $\frac{3}{2}\delta$ in the power of $N$ in the query time. Moreover, as the value of $\delta$ approaches 0 (so the value of $\alpha$ approaches 1) the upper bound and the lower bound get even closer to each other. This is very interesting, as it means that to get a truly subquadratic space solution for histogram indexing for alphabet size greater than 2, we have to spend polynomial query time. This is in stark contrast to the simple linear space solution for histogram indexing over binary alphabets that supports queries in constant time.
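The substitution can be checked directly. The following derivation is a sketch that uses only the query-time exponent of Theorem 9.3 and the upper-bound exponent $\delta(\ell+1)/2$ quoted above:

```latex
\begin{align*}
\delta &= \frac{2(1-\alpha)}{\ell-1-\alpha}, \\
1-\frac{1+\alpha(\ell-3)}{\ell-1-\alpha}
  &= \frac{(\ell-2)-\alpha(\ell-2)}{\ell-1-\alpha}
   = \frac{(1-\alpha)(\ell-2)}{\ell-1-\alpha}
   = \frac{\delta(\ell-2)}{2}, \\
\frac{\delta(\ell+1)}{2}-\frac{\delta(\ell-2)}{2} &= \frac{3}{2}\,\delta .
\end{align*}
```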

Following reductions presented in [31] from SetIntersection or SetDisjointness to several other problems, we are able to show that, based on the Strong SetDisjointness conjecture, the same problems admit space/query time lower bounds. For the sake of completeness, we reproduce these reductions in the next three subsections and show that they yield the space lower bounds as needed.

9.1.3 Distance Oracles for Colors

Let $P$ be a set of points in some metric with distance function $d(\cdot,\cdot)$ , where each point $p\in P$ has some associated colors $C(p)\subset[\ell]$ . For $c\in[\ell]$ we denote by $P(c)$ the set of points from $P$ with color $c$ . We generalize $d$ so that the distance between a point $p$ and a color $c$ is denoted by $d(p,c)=\min_{q\in P(c)}\{d(p,q)\}$ . In the (Approximate) Distance Oracles for Vertex-Labeled Graphs problem [17, 26] we are interested in preprocessing $P$ so that given a query of a point $q$ and a color $c$ we can return $d(q,c)$ (or some approximation). We further generalize $d$ so that the distance between two colors $c$ and $c^{\prime}$ is denoted by $d(c,c^{\prime})=\min_{p\in P(c)}\{d(p,c^{\prime})\}$ . In the Distance Oracle for Colors problem we are interested in preprocessing $P$ so that given two query colors $c$ and $c^{\prime}$ we can return $d(c,c^{\prime})$ . In the Approximate Distance Oracle for Colors problem we are interested in preprocessing $P$ and some constant $\alpha>1$ so that given two query colors $c$ and $c^{\prime}$ we can return some value $\hat{d}$ such that $d(c,c^{\prime})\leq\hat{d}\leq\alpha d(c,c^{\prime})$ .

We show evidence of the hardness of the Distance Oracle for Colors problem and the Approximate Distance Oracle for Colors problem by focusing on the 1-D case.

Theorem 9.4

Assume the Strong SetDisjointness conjecture. Suppose there is a 1-D Distance Oracle for Colors with constant stretch $\alpha\geq 1$ for an input array of size $N$ with $S$ space and query time $T$ . Then $S=\tilde{\Omega}(N^{2}/T^{2})$ .

Proof

We reduce SetDisjointness to the Colored Distance problem as follows. For each set $S_{i}$ we define a unique color $c_{i}$. For an element $e\in U$ ($U$ is the universe of the elements in our sets) let $|e|$ denote the number of sets containing $e$ and notice that $\sum_{e\in U}|e|=N$. Since each element in $U$ appears in at most $m$ sets, we partition $U$ into $\Theta(\log m)$ parts where the $i^{th}$ part $P_{i}$ contains all of the elements $e\in U$ such that $2^{i-1}<|e|\leq 2^{i}$. An array $X_{i}$ is constructed from $P_{i}=\{e_{1},\ldots,e_{|P_{i}|}\}$ by assigning an interval $I_{j}=[f_{j},\ell_{j}]$ in $X_{i}$ to each $e_{j}\in P_{i}$ such that no two intervals overlap. Every interval $I_{j}$ contains all the colors of sets that contain $e_{j}$. This implies that $|I_{j}|=|e_{j}|\leq 2^{i}$. Furthermore, for each $e_{j}$ and $e_{j+1}$ we separate $I_{j}$ from $I_{j+1}$ with a dummy color $d$ listed $2^{i}+1$ times at locations $[\ell_{j}+1,f_{j+1}-1]$.

We can now simulate a SetDisjointness query on subsets $(S_{i},S_{j})$ by performing a colored distance query on colors $c_{i}$ and $c_{j}$ in each of the $\Theta(\log m)$ arrays. There exists a $P_{i}$ for which the two points returned from the query are at distance strictly less than $2^{i}+1$ if and only if there is an element in $U$ that is contained in both $S_{i}$ and $S_{j}$ . The space usage is $\tilde{O}(S)$ and the query time is $\tilde{O}(T)$ . The rest follows directly from the Strong SetDisjointness conjecture.

Finally, notice that the lower bound also holds for the approximate case, as for any constant $\alpha$ the reduction can overcome the $\alpha$ approximation by separating intervals using $\alpha 2^{i}+1$ listings of $d$ . ∎
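The reduction in the proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `build_arrays`, `color_distance`, `disjoint` and `DUMMY` are hypothetical, colors are the set indices, and `color_distance` is a naive linear scan standing in for an actual 1-D Distance Oracle for Colors (it assumes the two query colors are distinct).

```python
from collections import defaultdict

DUMMY = -1  # hypothetical dummy color d used to separate intervals


def build_arrays(sets):
    """Build one color array per frequency class, following the proof.

    sets: list of Python sets; set number i receives color i.
    Returns a dict mapping the class index i to the array X_i.
    """
    containing = defaultdict(list)  # element -> colors of the sets containing it
    for color, s in enumerate(sets):
        for e in s:
            containing[e].append(color)

    arrays = defaultdict(list)
    for colors in containing.values():
        # class i is defined by 2^(i-1) < |e| <= 2^i
        i = (len(colors) - 1).bit_length()
        arrays[i].extend(colors)                  # the interval of this element
        arrays[i].extend([DUMMY] * (2 ** i + 1))  # 2^i + 1 copies of the dummy
    return dict(arrays)


def color_distance(array, c1, c2):
    """Naive stand-in for the oracle: min distance between colors c1 and c2."""
    best = float("inf")
    last = {c1: None, c2: None}
    for pos, c in enumerate(array):
        if c == c1 or c == c2:
            other = c2 if c == c1 else c1
            if last[other] is not None:
                best = min(best, pos - last[other])
            last[c] = pos
    return best


def disjoint(arrays, i, j):
    """Simulate a SetDisjointness query with one oracle query per array."""
    for k, arr in arrays.items():
        if color_distance(arr, i, j) < 2 ** k + 1:
            return False  # both colors fall inside one interval
    return True
```

Two occurrences inside the same interval are at distance at most $2^{i}-1$, while occurrences in different intervals are at distance at least $2^{i}+2$ because of the separator, which is why the threshold $2^{i}+1$ distinguishes the two cases.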

9.1.4 Document Retrieval Problems with Multiple Patterns

In the Document Retrieval problem [35] we are interested in preprocessing a collection of documents $X=\{D_{1},\cdots,D_{k}\}$ where $N=\sum_{D\in X}|D|$ , so that given a pattern $P$ we can quickly report all of the documents that contain $P$ . Typically, we are interested in run time that depends on the number of documents that contain $P$ and not in the total number of occurrences of $P$ in the entire collection of documents. In the Two Patterns Document Retrieval problem we are given two patterns $P_{1}$ and $P_{2}$ during query time, and wish to report all of the documents that contain both $P_{1}$ and $P_{2}$ . We consider two versions of the Two Patterns Document Retrieval problem. In the decision version we are only interested in detecting if there exists a document that contains both patterns. In the reporting version we are interested in enumerating all documents that contain both patterns.

All known solutions for the Two Patterns Document Retrieval problem with nontrivial preprocessing use at least $\Omega(\sqrt{N})$ time per query [18, 28, 29, 35]. In a recent paper, Larsen, Munro, Nielsen, and Thankachan [33] show lower bounds for the Two Patterns Document Retrieval problem conditioned on the hardness of Boolean matrix multiplication.

It is straightforward to see that the appropriate versions of the Two Patterns Document Retrieval problem solve the corresponding versions of the SetDisjointness and SetIntersection problems. In particular, this can be obtained by creating an alphabet $\Sigma=F$ (one character for each set), and for each $e\in U$ creating a document that contains the characters corresponding to the sets that contain $e$. The intersection between $S_{i}$ and $S_{j}$ directly corresponds to all the documents that contain both the character of $S_{i}$ and the character of $S_{j}$. Thus, all of the lower bound tradeoffs for intersection problems are lower bound tradeoffs for the Two Patterns Document Retrieval problem.
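A minimal sketch of this reduction, assuming single-character patterns and using hypothetical names (`sets_to_documents`, `two_pattern_report`); the second function is a naive scan standing in for a real Two Patterns Document Retrieval structure:

```python
def sets_to_documents(sets):
    """One document per universe element; character i stands for set S_i."""
    docs = {}
    for i, s in enumerate(sets):
        for e in s:
            docs.setdefault(e, []).append(i)
    return docs


def two_pattern_report(docs, a, b):
    """Naive stand-in: report the documents containing both characters a and b."""
    return sorted(e for e, chars in docs.items() if a in chars and b in chars)


def two_pattern_decide(docs, a, b):
    """Decision version: does any document contain both characters?"""
    return any(a in chars and b in chars for chars in docs.values())
```

The documents reported for characters `i` and `j` are exactly the elements of $S_{i}\cap S_{j}$, and the total document length equals the total size of the sets.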

Theorem 9.5

Assume the Strong SetDisjointness conjecture. Suppose there is a data structure for the decision version of the Two Patterns Document Retrieval problem for a collection of documents $X$ where $N=\sum_{D\in X}|D|$ , with $S$ space and query time $T$ . Then $S=\tilde{\Omega}(N^{2}/T^{2})$ .

Theorem 9.6

Assume the Strong SetIntersection conjecture. Suppose there is a data structure for the reporting version of the Two Patterns Document Retrieval problem for a collection of documents $X$ where $N=\sum_{D\in X}|D|$ , with $S$ space and query time $O(T+op)$ where $op$ is the size of the output. Then $S=\tilde{\Omega}(N^{2}/T)$ .

9.1.5 Forbidden Pattern Document Retrieval

In the Forbidden Pattern Document Retrieval problem [21] we are also interested in preprocessing the collection of documents but this time given a query $P^{+}$ and $P^{-}$ we are interested in reporting all of the documents that contain $P^{+}$ and do not contain $P^{-}$ . Here too we consider a decision version and a reporting version.

All known solutions for the Forbidden Pattern Document Retrieval problem with nontrivial preprocessing use at least $\Omega(\sqrt{N})$ time per query [21, 29]. In a recent paper, Larsen, Munro, Nielsen, and Thankachan [33] show lower bounds for the Forbidden Pattern Document Retrieval problem conditioned on the hardness of Boolean matrix multiplication.

Theorem 9.7

Assume the Strong 3SUM-Indexing conjecture. Suppose there is a data structure for the decision version of the Forbidden Pattern Document Retrieval problem for a collection of documents $X$ where $N=\sum_{D\in X}|D|$ , with $S$ space and query time $T$ . Then $S=\tilde{\Omega}(N^{2}/T^{4})$ .

Proof

We will make use of the hard instance of SetDisjointness that was used in order to prove Theorem 3.1, and reduce this specific hard instance to the decision version of the Forbidden Pattern Document Retrieval problem. Recall that the size of this hard instance is $\tilde{O}(n^{2-\gamma})$ , the universe size is $O(n^{2-2\gamma})$ , the number of sets is $\tilde{O}(n)$ , and we need to perform $\tilde{O}(n^{\gamma})$ SetDisjointness queries in order to answer one 3SUM-Indexing query.

Similar to the proof of Theorem 9.5 we set $\Sigma=F$ (one character for each set). However, this time for each $e$ we create a document that contains all the characters corresponding to sets $\mathcal{B}_{i,j}$ that contain $e$ and all the characters corresponding to sets $\mathcal{C}_{i,j}$ that do not contain $e$ .

The reason that we prove our lower bound based on the Strong 3SUM-Indexing conjecture and not on the Strong SetDisjointness conjecture is that the size of our instance can become rather large relative to $N$ (as the number of sets that do not contain an element can be extremely large).

Thus, the size of the Forbidden Pattern Document Retrieval instance is $N=\Theta(n^{3-2\gamma})$, and the number of queries to answer is $\Theta(n^{\gamma})$. Notice that the size of the instance forces $\gamma$ to be strictly larger than $1/2$. By the Strong 3SUM-Indexing conjecture, either $S=s_{\textsf{3SI}}=\tilde{\Omega}(n^{2})=\tilde{\Omega}(N^{\frac{2}{3-2\gamma}})$ or $O(n^{\gamma}T)\geq t_{\textsf{3SI}}\geq\tilde{\Omega}(n)$, and so $T\geq\tilde{\Omega}(N^{\frac{1-\gamma}{3-2\gamma}})$. For any constant $\epsilon>0$, if the Forbidden Pattern Document Retrieval data structure uses $\tilde{\Theta}(N^{\frac{2}{3-2\gamma}-\epsilon})$ space, then $S\cdot T^{4}=\tilde{\Omega}(N^{\frac{2}{3-2\gamma}-\epsilon+\frac{4-4\gamma}{3-2\gamma}})=\tilde{\Omega}(N^{2-\epsilon})$. Since this holds for any $\epsilon>0$ it must be that $S\cdot T^{4}=\tilde{\Omega}(N^{2})$. ∎
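The document construction used in this proof can be illustrated as follows. This is a sketch under assumptions: the two families are passed in as plain lists `b_sets` and `c_sets` (standing in for the sets $\mathcal{B}_{i,j}$ and $\mathcal{C}_{i,j}$ of the hard instance, whose exact definition is not reproduced here), all function names are hypothetical, and the query routine is a naive scan rather than a real Forbidden Pattern Document Retrieval structure.

```python
def build_documents(universe, b_sets, c_sets):
    """One document per element e, containing the characters of the B-sets
    that contain e and the characters of the C-sets that do NOT contain e."""
    docs = {}
    for e in universe:
        chars = {("B", i) for i, s in enumerate(b_sets) if e in s}
        chars |= {("C", j) for j, s in enumerate(c_sets) if e not in s}
        docs[e] = chars
    return docs


def forbidden_pattern_exists(docs, plus, minus):
    """Naive stand-in for the decision version: is there a document that
    contains `plus` and does not contain `minus`?"""
    return any(plus in chars and minus not in chars for chars in docs.values())


def intersects(docs, i, j):
    """B_i and C_j share an element e iff the document of e contains the
    character of B_i and lacks the character of C_j."""
    return forbidden_pattern_exists(docs, ("B", i), ("C", j))
```

Each document can contain up to one character per set, which is why the instance size is governed by the product of the universe size and the number of sets rather than by the total size of the sets.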

Notice that if we only allow linear space then we obtain a query time lower bound of $\Omega(N^{\frac{1}{4}-o(1)})$ .
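Both the exponent arithmetic in the proof and the linear-space consequence follow from short calculations, sketched here using only the quantities stated in Theorem 9.7 and its proof:

```latex
\begin{align*}
\frac{2}{3-2\gamma}+\frac{4(1-\gamma)}{3-2\gamma}
  &= \frac{6-4\gamma}{3-2\gamma} = 2, \\
S=\tilde{O}(N)\ \text{and}\ S\cdot T^{4}=\tilde{\Omega}(N^{2})
  &\;\Longrightarrow\; T^{4}=\tilde{\Omega}(N)
  \;\Longrightarrow\; T=\tilde{\Omega}(N^{1/4}).
\end{align*}
```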

Theorem 9.8

Assume the Strong 3SUM-Indexing conjecture. Suppose there is a data structure for the reporting version of the Forbidden Pattern Document Retrieval problem for a collection of documents $X$ where $N=\sum_{D\in X}|D|$ , with $S$ space and query time $O(T+op)$ where $op$ is the size of the output. Then $S=\tilde{\Omega}(N^{2}/T)$ .

Proof

Our proof is similar to the proof of Theorem 9.7, only this time we use the hard instance of SetIntersection from Theorem 3.2. So the number of queries is $\tilde{O}(n^{\gamma})$ , the size of the universe is $O(n^{1+\delta-\gamma})$ , the number of sets is $\tilde{O}(n^{\frac{1+\delta+\gamma}{2}})$ , and the total size of the output is $\Theta(n^{2-\delta})$ .

Thus, the size of the Forbidden Pattern Document Retrieval instance is $N=\Theta(n^{1+\delta-\gamma}n^{\frac{1+\delta+\gamma}{2}})=\Theta(n^{\frac{3+3\delta-\gamma}{2}})$, and the number of queries to answer is $\Theta(n^{\gamma})$. Notice that the size of the instance forces $3\delta-\gamma<1$. By the Strong 3SUM-Indexing conjecture, either $S=s_{\textsf{3SI}}=\tilde{\Omega}(n^{2})=\tilde{\Omega}(N^{\frac{4}{3+3\delta-\gamma}})$ or $O(n^{\gamma}T)\geq t_{\textsf{3SI}}\geq\tilde{\Omega}(n)$, and so $T\geq\tilde{\Omega}(N^{\frac{2-2\gamma}{3+3\delta-\gamma}})$. For any constant $\epsilon>0$, if the Forbidden Pattern Document Retrieval data structure uses $\tilde{\Theta}(N^{\frac{4}{3+3\delta-\gamma}-\epsilon})$ space, then $S\cdot T=\tilde{\Omega}(N^{\frac{4}{3+3\delta-\gamma}-\epsilon+\frac{2-2\gamma}{3+3\delta-\gamma}})=\tilde{\Omega}(N^{2-\frac{6\delta}{3+3\delta-\gamma}-\epsilon})$. Since this holds for any $\epsilon>0$ and since we can make $\frac{6\delta}{3+3\delta-\gamma}$ as small as we like, it must be that $S\cdot T=\tilde{\Omega}(N^{2})$. ∎

9.2 Dynamic Problems

We show space lower bounds on dynamic problems. Lower bounds for these problems from the time perspective were considered by Abboud and Vassilevska-Williams [5]. The first dynamic problem we consider is st-SubConn, which is defined as follows. Given an undirected graph $G=(V,E)$, two fixed vertices $s$ and $t$ and a set $S\subseteq V$, answer whether $s$ and $t$ are connected using vertices from $S$ only. Vertices can be added to or removed from $S$.

The SetDisjointness problem can be reduced to st-SubConn. Given an instance of SetDisjointness we create an undirected graph $G=(V,E)$ as follows. We first create two unique vertices $s$ and $t$. Then, for each set $S_{i}$ we create two vertices $v_{i}$ and $u_{i}$, and for each element $e_{j}$ we create a vertex $w_{j}$. Moreover, we define $E=\{(v_{i},w_{j})|e_{j}\in S_{i}\}\cup\{(u_{i},w_{j})|e_{j}\in S_{i}\}\cup\{(s,v_{i})|1\leq i\leq m\}\cup\{(u_{i},t)|1\leq i\leq m\}$. Initially the set $S$ contains $s$, $t$ and all the vertices $w_{j}$. Given a query $(i,j)$ asking about the emptiness of $S_{i}\cap S_{j}$, we add $v_{i}$ and $u_{j}$ to the set $S$. Then, we ask if $s$ and $t$ are connected. If so, we know that $S_{i}\cap S_{j}$ is not empty, as the only way to get from $s$ to $t$ is through $v_{i}$, then some vertex representing a common element of $S_{i}$ and $S_{j}$, and then $u_{j}$. If $s$ and $t$ are not connected then the intersection is empty. After the query we remove the two vertices we have added so that other queries can be handled properly. By this construction we get the following result:

Theorem 9.9

Assume the Strong SetDisjointness conjecture. Suppose there is a data structure for the st-SubConn problem for a graph $G=(V,E)$, with $S$ space and update and query time $T$. Then $S=\tilde{\Omega}(|E|^{2}/T^{2})$.
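The construction behind this theorem can be sketched as follows. The names `build_graph`, `connected` and `intersects` are hypothetical, and `connected` is a plain breadth-first search standing in for an actual dynamic st-SubConn data structure; the sketch only illustrates that each SetDisjointness query costs two insertions, one connectivity query and two deletions.

```python
from collections import deque


def build_graph(sets):
    """Adjacency sets of G: s - v_i - w_e - u_i - t for every e in S_i."""
    adj = {"s": set(), "t": set()}

    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for i, s_i in enumerate(sets):
        add_edge("s", ("v", i))
        add_edge(("u", i), "t")
        for e in s_i:
            add_edge(("v", i), ("w", e))
            add_edge(("u", i), ("w", e))
    return adj


def connected(adj, active):
    """Naive st-SubConn query: BFS from s to t through active vertices only."""
    seen = {"s"}
    queue = deque(["s"])
    while queue:
        x = queue.popleft()
        if x == "t":
            return True
        for y in adj[x]:
            if y in active and y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def intersects(adj, active, i, j):
    """Activate v_i and u_j, ask the connectivity query, then undo the updates."""
    active.add(("v", i))
    active.add(("u", j))
    answer = connected(adj, active)
    active.discard(("v", i))
    active.discard(("u", j))
    return answer
```

The initial active set consists of `"s"`, `"t"` and every vertex of the form `("w", e)`; the number of edges of the graph is proportional to the total size of the sets plus the number of sets.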

There are other dynamic problems that st-SubConn can be efficiently reduced to, as shown by Abboud and Vassilevska-Williams [5]. These include the following 3 problems:

Problem 6

(s,t)-Reachability (st-Reach). Maintain a directed graph $G=(V,E)$ subject to edge insertions and deletions, so that queries about the reachability of fixed vertices $s$ and $t$ can be answered quickly.

Problem 7

Bipartite Perfect Matching (BPMatch). Preprocess and maintain an undirected bipartite graph $G=(V,E)$ subject to edge insertions and deletions, so that we can quickly answer if the graph has a perfect matching.

Problem 8

Strong Connectivity (SC). Preprocess and maintain a directed graph $G=(V,E)$ subject to edge insertions and deletions, so that we can quickly answer if the graph is strongly connected.

Using our last theorem and the reductions by Abboud and Vassilevska-Williams [5], noting that they do not affect the space usage, we get the following:

Theorem 9.10

Assume the Strong SetDisjointness conjecture. Suppose there is a data structure for the st-Reach/BPMatch/SC problem for a graph $G=(V,E)$, with $S$ space and update and query time $T$. Then $S=\tilde{\Omega}(|E|^{2}/T^{2})$.

We can get a better lower bound for these 3 problems on sparse graphs based on the Directed Reachability conjecture. Given a sparse graph $G=(V,E)$ as an instance of directed reachability we can reduce it to an instance of st-Reach by just adding two special nodes $s$ and $t$ to the graph. Then, we can answer queries of the form “Is $v$ reachable from $u$?” by inserting two edges $(s,u)$ and $(v,t)$ and asking if $t$ is reachable from $s$. After the query we can restore the initial state by deleting these two edges. Thus, by using the reductions from st-Reach to BPMatch and SC as shown in [5], we get the following hardness result:

Theorem 9.11

Assume the Directed Reachability conjecture. Any data structure for the st-Reach/BPMatch/SC problem on sparse graphs cannot have $\tilde{O}(n^{1-\Omega(1)})$ update and query time and $\tilde{O}(n^{2-\Omega(1)})$ space.
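The reduction from directed reachability to st-Reach can be sketched as below. The class `StReach` is a hypothetical, naive stand-in for a dynamic st-Reach data structure (it recomputes reachability by depth-first search on every query); `is_reachable` shows how one reachability query is simulated by attaching $s$ to the source of the query and the target of the query to $t$, asking whether $t$ is reachable from $s$, and then undoing the two insertions.

```python
def reachable(adj, src, dst):
    """DFS reachability in a directed graph stored as a dict of successor sets."""
    stack, seen = [src], {src}
    while stack:
        x = stack.pop()
        if x == dst:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


class StReach:
    """Naive stand-in for a dynamic st-Reach structure with fixed s and t."""

    def __init__(self, edges):
        self.adj = {}
        for a, b in edges:
            self.insert(a, b)

    def insert(self, a, b):
        self.adj.setdefault(a, set()).add(b)

    def delete(self, a, b):
        self.adj[a].discard(b)

    def query(self):
        return reachable(self.adj, "s", "t")


def is_reachable(ds, u, v):
    """Is v reachable from u? Two insertions, one query, two deletions."""
    ds.insert("s", u)
    ds.insert(v, "t")
    answer = ds.query()
    ds.delete("s", u)
    ds.delete(v, "t")
    return answer
```

Only two extra vertices and two temporary edges are added, so the reduction keeps the graph sparse and does not change the asymptotic space usage.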

Appendix

Appendix A Sketch of the Main Results

Bibliography


[1] Amir Abboud, Arturs Backurs, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Or Zamir. Subtree isomorphism revisited. In Proc. of 27th ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 1256–1271, 2016.
[2] Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. If the current clique algorithms are optimal, so is Valiant’s parser. 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 98–117, 2015.
[3] Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Quadratic-time hardness of LCS and other sequence similarity measures. 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 59–78, 2015.
[4] Amir Abboud, Fabrizio Grandoni, and Virginia Vassilevska Williams. Subcubic equivalences between graph centrality problems, APSP and diameter. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 1681–1697, 2015.
[5] Amir Abboud and Virginia Vassilevska Williams. Popular conjectures imply strong lower bounds for dynamic problems. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pages 434–443, 2014.
[6] Amir Abboud, Virginia Vassilevska Williams, and Oren Weimann. Consequences of faster alignment of sequences. In Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I, pages 39–51, 2014.
[7] Amir Abboud, Virginia Vassilevska Williams, and Huacheng Yu. Matching triangles and basing hardness on an extremely popular conjecture. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 41–50, 2015.
[8] Rachit Agarwal. The space-stretch-time tradeoff in distance oracles. In Algorithms - ESA 2014 - 22th Annual European Symposium on Algorithms, Wroclaw, Poland, September 8-10, 2014. Proceedings, pages 49–60, 2014.