New Results for the Complexity of Resilience for Binary Conjunctive Queries with Self-Joins

Cibele Freire; Wolfgang Gatterbauer; Neil Immerman; Alexandra Meliou

arXiv:1907.01129·cs.DB·June 17, 2020


TL;DR

This paper investigates the computational complexity of resilience in binary conjunctive queries with self-joins, providing new hardness results, structural characterizations, and a dichotomy for specific cases, advancing understanding of deletion problems in database queries.

Contribution

It introduces novel structural properties and complexity classifications for resilience in self-join queries, extending previous results and offering a dichotomy for certain restricted cases.

Findings

01

Identifies new structural properties affecting complexity.

02

Provides NP-hardness results for various query structures.

03

Establishes a dichotomy for single-self-join binary queries in which the repeated relation occurs in exactly two atoms.

Abstract

The resilience of a Boolean query is the minimum number of tuples that need to be deleted from the input tables in order to make the query false. A solution to this problem immediately translates into a solution for the more widely known problem of deletion propagation with source-side effects. In this paper, we give several novel results on the hardness of the resilience problem for binary conjunctive queries with self-joins (i.e., conjunctive queries with relations of maximal arity 2) with one repeated relation. Unlike in the self-join-free case, the concept of triad is not enough to fully characterize the complexity of resilience. We identify new structural properties, namely chains, confluences and permutations, which lead to various NP-hardness results. We also give novel involved reductions to network flow to show certain cases are in P. Overall, we give a dichotomy…

Tables

Table 1. Annotations specifying the relevant classes of queries.

Symbol	Query class
●	all self-join conjunctive queries
◐	single-self-join (ssj) binary conjunctive queries
$◐^{:}$	ssj binary conjunctive queries with exactly 2 $R$-atoms
$◐^{∵}$	ssj binary conjunctive queries with exactly 3 $R$-atoms



Full text


Cibele Freire (Wellesley College), Wolfgang Gatterbauer (Northeastern University, Boston), Neil Immerman (University of Massachusetts Amherst), and Alexandra Meliou (University of Massachusetts Amherst)

Abstract.

The resilience of a Boolean query on a database is the minimum number of tuples that need to be deleted from the input tables in order to make the query false. A solution to this problem immediately translates into a solution for the more widely known problem of deletion propagation with source-side effects. In this paper, we give several novel results on the hardness of the resilience problem for conjunctive queries with self-joins, and, more specifically, we present a dichotomy result for the class of single-self-join binary queries with exactly two repeated relations occurring in the query. Unlike in the self-join free case, the concept of triad is not enough to fully characterize the complexity of resilience. We identify new structural properties, namely chains, confluences and permutations, which lead to various NP-hardness results. We also give novel involved reductions to network flow to show certain cases are in P. Although restricted, our results provide important insights into the problem of self-joins that we hope can help solve the general case of all conjunctive queries with self-joins in the future.

CCS concepts: Theory of computation → Database theory; Information systems → Relational database model

1. Introduction

Various problems in database research, such as causality, explanations, and deletion propagation, examine how interventions in the input to a query impact the query’s output. An intervention constitutes a change (update, addition, or deletion) to the input tuples. In this paper, we study the resilience of a Boolean query with respect to tuple deletions. Resilience is a variant of deletion propagation that focuses on Boolean queries: it corresponds to the minimum number of tuples whose deletion causes the query to evaluate to false. In previous work (Freire et al., 2015), we provided a full characterization of the complexity of resilience for the family of self-join-free conjunctive queries (sj-free CQs) with functional dependencies. In this paper, we augment the previous results to account for a restricted class of self-joins.

Self-joins have long plagued the complexity study of many problems in database theory research: for example, on the topic of consistent query answering, Kolaitis and Pema (Kolaitis and Pema, 2012) proved a dichotomy into PTIME and coNP-complete cases for the family of queries with only two atoms and no self-joins. Koutris and Suciu (Koutris and Suciu, 2014) extended the dichotomy to the larger class of self-join-free conjunctive queries, where each atom has as primary key either a single attribute or all the attributes. Koutris and Wijsen (Koutris and Wijsen, 2017, 2018b) further extended the dichotomy to the full class of sj-free Boolean CQs, and queries with negated atoms (Koutris and Wijsen, 2018a). To the best of our knowledge, there is no known result on this problem for a query family that permits self-joins. As another example, complexity results on the problem of query-based pricing (Koutris et al., 2015) are also restricted to the class of sj-free CQs. On the closely related topic of deletion propagation with view side-effects, Kimelfeld et al. (Kimelfeld et al., 2012) used a characteristic of the query structure (head domination) to formalize a complexity dichotomy for the family of sj-free CQs, and indicated that self-joins can significantly harden approximation in the problem of deletion propagation. Extensions to the cases of functional dependencies (Kimelfeld, 2012) and multi-tuple deletions (Kimelfeld et al., 2013) also focused on the same query class. These examples offer strong indication that self-joins introduce significant hurdles in the study of a variety of problems, and progress in cases that account for self-joins is rare. [Footnote 1: While some prior work on related problems does allow for self-joins (Buneman et al., 2002; Cong et al., 2012; Amarilli et al., 2017), the complexity characterizations in those results are not specific to the queries, but rather to high-level operators (e.g., join, projection, etc.).]
In contrast, our work provides results that are fine-grained and identify elements of the query structure that render the resilience problem NP-complete or PTIME-computable.

In this paper, we give several novel results on the hardness of the resilience problem for CQs with self-joins. Some of our results hold for any CQ with self-joins, but we later focus on the class of binary CQs (those where relations are either unary or binary). We provide various complexity results for binary CQs where only one relation name can be repeated, which we call single-self-join (ssj) queries. We analyze ssj binary queries in general, and for the case with at most 2 occurrences of the repeated relation we prove that a P versus NP-complete dichotomy exists. We further provide a unifying criterion for hardness (a “proof template”), and we conjecture that it subsumes and generalizes the criterion of triads from sj-free queries, and that it provides a sufficient criterion of hardness for any CQ.

Contributions and outline.

• Contrasting with current knowledge about the resilience of CQs without self-joins (summarized in Section 2), we demonstrate how self-joins complicate the problem and invalidate several aspects and intuitions from the self-join-free case (Section 3).

• We establish foundations for tackling the resilience problem for conjunctive queries with self-joins by identifying important conditions on the minimality and connectedness of queries and by revising the fundamental notion of query domination (Section 4).

• We prove that resilience for queries that contain a triad (a structure that characterizes hardness in the sj-free case (Freire et al., 2015)) remains NP-complete in the presence of self-joins (Section 5.2).

• By narrowing our target class to the class of binary conjunctive queries (those where relations are either unary or binary) and single-self-join queries (i.e., only one relation can appear in multiple atoms of the query), we identify a new structure that implies hardness, thus expanding the NP-complete class compared to the sj-free case (Section 6).

• We identify and define the fundamental structures of chains, confluences, and permutations, and use them to prove a complete dichotomy between NP-complete and PTIME cases for the class of single-self-join binary conjunctive queries where exactly two atoms in a query correspond to the same relation (Section 7).

• We prove several involved results using the chains, confluences, and permutations structures in the case of single-self-join binary conjunctive queries where exactly 3 atoms correspond to the same relation. While a complete dichotomy for this class remains elusive, our work creates a roadmap and identifies remaining open problems (Section 8).

• We provide the novel concept of Independent Join Paths. This general “proof template” aims to (i) provide a sufficient criterion of hardness for any CQ, (ii) subsume the prior hardness criterion of triads for sj-free CQs, and (iii) provide a hint for an approach that could possibly automate the search for hardness reductions (Section 9).

Some of our results apply to the general class of self-join CQs, while others apply to more restricted query families. We annotate our theoretical results with the symbols detailed in Table 1 to indicate the relevant assumptions.

2. Background and Prior Results

This section introduces our notation, defines the resilience of a query, and summarizes prior complexity results for sj-free queries.

Standard database notations. We use boldface to denote tuples or ordered sets (e.g., $\bm{\mathbf{x}}=(x_{1},\ldots,x_{k})$) and use both subscripts and superscripts as indices (e.g., $a^{1}$ and $a_{1}$). We fix a relational vocabulary $\bm{\mathbf{R}}=(R_{1},\ldots,R_{\ell})$, and denote by $\texttt{arity}(R_{i})$ the arity of a relation $R_{i}$. We call unary and binary those relations with arity 1 or 2, respectively. We call “binary queries” those queries that contain only unary or binary relations. A database instance over $\bm{\mathbf{R}}$ is $D=(R_{1}^{D},\ldots,R_{\ell}^{D})$, where each $R_{i}^{D}$ is a finite relation. We call the elements of $R_{i}^{D}$ tuples and write $R_{i}$ instead of $R_{i}^{D}$ when $D$ is clear from the context. With some abuse of notation we also denote $D$ as the set of all tuples, i.e., $D=\bigcup_{i}R_{i}$, where the union is understood to be a disjoint union (thus each tuple belongs to only one relation). The active domain $\texttt{dom}(D)$ is the set of all constants occurring in $D$. The size of the database instance is $n=|D|$, i.e., the number of tuples in the database. [Footnote 2: Notice that other work sometimes uses $\texttt{dom}(D)$ as the size of the database. Our different definition has no implication on our complexity results but simplifies the discussions of our reductions.]

A conjunctive query (CQ) is a first-order formula $q(\bm{\mathbf{y}})=\exists\bm{\mathbf{x}}\,(g_{1}\wedge\ldots\wedge g_{m})$ where the variables $\bm{\mathbf{x}}=(x_{1},\ldots,x_{k})$ are called existential variables, $\bm{\mathbf{y}}=(y_{1},\ldots,y_{c})$ are called the head variables (or free variables), and each atom (also called subgoal) $g_{i}$ represents a relation $g_{i}=R(\bm{\mathbf{z}}_{i})$ where $\bm{\mathbf{z}}_{i}\subseteq\bm{\mathbf{x}}\cup\bm{\mathbf{y}}$. [Footnote 3: WLOG, we assume that $\bm{\mathbf{z}}_{i}$ is a tuple of only variables and don’t write the constants. Selections can always be directly pushed into the database before executing the query. In other words, for any constant in the query, we can first apply a selection on each relation and then consider the modified query with a column removed.] A self-join-free CQ (sj-free CQ) is one where no relation symbol occurs more than once and thus every atom represents a different relation. In turn, a self-join CQ is one where at least one relation symbol is repeated, and a single-self-join (ssj) CQ is one where only one relation symbol can be repeated in a query. We write $\textup{{var}}(g_{j})$ for the set of variables occurring in atom $g_{j}$. As usual, we abbreviate a non-Boolean query in Datalog notation by $q(\bm{\mathbf{y}}){\,:\!\!-\,}g_{1},\ldots,g_{m}$ where $q$ has head variables $\bm{\mathbf{y}}$ and $g_{1},\ldots,g_{m}$ represents the body of the query.

Unless otherwise stated, a query in this paper denotes a Boolean CQ $q$ (i.e., $\bm{\mathbf{y}}=\emptyset$ ). We write $D\models q$ to denote that the query $q$ evaluates to true over the database instance $D$ , and $D\not\models q$ to denote that $q$ evaluates to false. For a Boolean query $q$ , we write $q(\bm{\mathbf{x}})$ to indicate that $\bm{\mathbf{x}}$ represents the set of all existentially quantified variables. We write $[k]$ as short notation for the set $\{1,\ldots,k\}$ .

Additional notations.

We call a valuation of all existential variables that is permitted by $D$ and that makes $q$ true (i.e., $D\models q[\bm{\mathbf{w}}/\bm{\mathbf{x}}]$) a witness $\bm{\mathbf{w}}$. [Footnote 4: Note that our notion of witness slightly differs from the one commonly seen in provenance literature where a “witness” refers to a subset of the input database records that is sufficient to ensure that a given output tuple appears in the result of a query (Cheney et al., 2009).] The set of witnesses is then

$$\texttt{witnesses}(D,q)=\bigl\{\bm{\mathbf{w}}\,\bigm|\,D\models q[\bm{\mathbf{w}}/\bm{\mathbf{x}}]\bigr\}.$$

Since every witness implies exactly one set of at most $m$ tuples from $D$ that make the query true, we will slightly abuse the notation and also refer to this set of tuples as “witnesses.” For example, consider the query $q_{\textup{{chain}}}{\,:\!\!-\,}R(x,y),R(y,z)$ with $\bm{\mathbf{x}}=(x,y,z)$ over the database $D=\{t_{1}:R(1,2),t_{2}:R(2,3),t_{3}:R(3,3)\}$ . Then one can easily see that

$$\texttt{witnesses}(D,q_{\textup{{chain}}})=\{(1,2,3),\,(2,3,3),\,(3,3,3)\}$$

and their respective tuples are $\{t_{1},t_{2}\}$ , $\{t_{2},t_{3}\}$ , and $\{t_{3}\}$ .
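The witness computation in this example can be reproduced with a short brute-force sketch. The encoding is an assumption made here for illustration: a query is a list of `(relation, variables)` atoms, a database is a set of `(relation, constants)` tuples, and the helper names `witnesses` and `witness_tuples` are hypothetical.

```python
from itertools import product

def witnesses(db, query, variables):
    """Enumerate valuations of `variables` under which every atom of `query` is a tuple of `db`."""
    domain = sorted({c for _, args in db for c in args})  # active domain dom(D)
    found = []
    for values in product(domain, repeat=len(variables)):
        valuation = dict(zip(variables, values))
        if all((rel, tuple(valuation[v] for v in args)) in db for rel, args in query):
            found.append(values)
    return found

def witness_tuples(query, variables, witness):
    """Return the set of database tuples that one witness maps the atoms to."""
    valuation = dict(zip(variables, witness))
    return {(rel, tuple(valuation[v] for v in args)) for rel, args in query}

# D = {t1: R(1,2), t2: R(2,3), t3: R(3,3)} and q_chain :- R(x,y), R(y,z)
D = {("R", (1, 2)), ("R", (2, 3)), ("R", (3, 3))}
q_chain = [("R", ("x", "y")), ("R", ("y", "z"))]
xs = ("x", "y", "z")

W = witnesses(D, q_chain, xs)
print(W)  # [(1, 2, 3), (2, 3, 3), (3, 3, 3)]
print(witness_tuples(q_chain, xs, (3, 3, 3)))  # {('R', (3, 3))}
```

The last call shows why a witness corresponds to a set of at most $m$ tuples: both atoms of the witness $(3,3,3)$ map to the single tuple $t_{3}$.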

In line with prior work (Freire et al., 2015; Meliou et al., 2010), relations may be specified as exogenous, meaning that tuples from these relations cannot be deleted. [Footnote 5: In other words, tuples in these atoms provide context and are outside the scope of possible “interventions” in the spirit of causality (Halpern and Pearl, 2005).] We specify the atoms corresponding to exogenous relations with a superscript “x”. The remaining atoms are endogenous.

Complexity theory. We write $S\leq T$ to mean $S\leq_{\textrm{fo}}T$. [Footnote 6: First-order reductions are not required, but it is the case that all reductions defined in the paper are expressible in first-order.] We say that two problems have equivalent complexity ($S\equiv T$) iff they are inter-reducible, i.e., $S\leq T$ and $T\leq S$.

2.1. Query resilience

In this paper, we focus on the problem of resilience, a variant of the problem of deletion propagation focusing on Boolean queries: Given $D\models q$ , what is the minimum number of endogenous tuples that have to be removed from $D$ to make the query false? A large minimum set implies that the query is more “resilient” and requires the deletion of more tuples to change the query output. In order to study the complexity of resilience, we focus on the decision problem:

Definition 1 (Resilience Decision).

Given a query $q$, a database $D$, and an integer $k$, we say that $(D,k)\in\texttt{RES}(q)$ if and only if $D\models q$ and there exists a set $\Gamma$ of at most $k$ endogenous tuples such that $D-\Gamma\not\models q$. We define $\rho(D,q)$ as the size of a minimum contingency set for input $D$ and $q$.

In other words, $(D,k)\in\texttt{RES}(q)$ means that there is a set of $k$ or fewer endogenous tuples whose removal makes the query false. We refer to such a set of tuples $\Gamma$ as a “contingency set.” Observe that, for a fixed $q$ , we can talk about data complexity and $\texttt{RES}(q)\in\mbox{{\rm NP}}$ when $q$ is computable in PTIME.
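To make the decision problem concrete, the following sketch computes the resilience by exhaustive search over candidate contingency sets. It is exponential in $|D|$ and only meant to illustrate the definition on tiny instances, not the network-flow algorithms discussed later. The function names and the tuple encoding are assumptions introduced here.

```python
from itertools import combinations, product

def satisfies(db, query, variables):
    """True if D |= q, i.e. at least one witness exists."""
    domain = sorted({c for _, args in db for c in args})
    for values in product(domain, repeat=len(variables)):
        val = dict(zip(variables, values))
        if all((rel, tuple(val[v] for v in args)) in db for rel, args in query):
            return True
    return False

def resilience(db, query, variables, exogenous=frozenset()):
    """Size of a minimum contingency set, assuming D |= q.
    Returns None if no set of endogenous tuples falsifies the query."""
    endogenous = sorted(t for t in db if t[0] not in exogenous)
    for k in range(1, len(endogenous) + 1):
        for gamma in combinations(endogenous, k):
            if not satisfies(db - set(gamma), query, variables):
                return k
    return None

def in_res(db, query, variables, k, exogenous=frozenset()):
    """Decide whether (D, k) is in RES(q)."""
    if not satisfies(db, query, variables):
        return False
    rho = resilience(db, query, variables, exogenous)
    return rho is not None and rho <= k

D = {("R", (1, 2)), ("R", (2, 3)), ("R", (3, 3))}
q_chain = [("R", ("x", "y")), ("R", ("y", "z"))]
xs = ("x", "y", "z")

print(resilience(D, q_chain, xs))  # 2
print(in_res(D, q_chain, xs, 1))   # False
```

On this instance, $R(3,3)$ must be deleted because it forms a witness on its own, and one more deletion is needed to break the witness $(1,2,3)$.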

A central result of the prior work on resilience (Freire et al., 2015) is that the complexity of resilience of an sj-free CQ can be exactly characterized via a natural property of its dual hypergraph $\mathcal{H}(q)$ . The hypergraph of an sj-free query $q$ is usually defined with its vertices being the variables of $q$ and the hyperedges being the atoms (Abiteboul et al., 1995). The dual hypergraph, $\mathcal{H}(q)$ , has vertex set $V=\{g_{1},\ldots,g_{m}\}$ , and each variable $x_{i}\in\textup{{var}}(q)$ determines the hyperedge consisting of all those atoms in which $x_{i}$ occurs: $\;e_{i}=\{g_{j}\,|\,x_{i}\in\textup{{var}}(g_{j})\}$ . A path in the graph is an alternating sequence of vertices and edges, $g_{1},x_{1},g_{2},x_{2},\ldots,g_{n-1},$ $x_{n-1},g_{n}$ , such that for all $i$ , $x_{i}\in\textup{{var}}(g_{i})\cap\textup{{var}}(g_{i+1})$ , i.e., the hyperedge $x_{i}$ joins vertices $g_{i}$ and $g_{i+1}$ . We explicitly list the hyperedges in the path, because more than one hyperedge may join the same pair of vertices. Since we only consider dual hypergraphs, we use the shorter term “hypergraph” from now on.
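The dual hypergraph construction can be sketched in a few lines. The triangle-shaped query below is an assumption chosen for illustration (three binary atoms $R$, $S$, $T$ over variables $x$, $y$, $z$); since the query is sj-free, each atom is identified by its relation name.

```python
def dual_hypergraph(query):
    """Map each variable to its hyperedge: the set of atoms (by relation name) containing it."""
    hyperedges = {}
    for rel, args in query:
        for v in args:
            hyperedges.setdefault(v, set()).add(rel)
    return hyperedges

# Assumed example query: R(x,y), S(y,z), T(z,x)
triangle = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]
H = dual_hypergraph(triangle)
for variable in sorted(H):
    print(variable, sorted(H[variable]))
# x ['R', 'T']
# y ['R', 'S']
# z ['S', 'T']
```

Here the vertices are the atoms and each variable contributes one hyperedge, so the sequence R, y, S is a path of length one in this hypergraph.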

Example 2 (Hypergraphs).

We illustrate the prior results with the following 4 queries and their hypergraphs shown in Fig. 1:

[TABLE]

In the remainder of this section, we summarize the intuition behind three main constructs—triads, domination, and linear queries—that lead to the result presented in Theorem 7. Then, in Section 3 we provide an exposition of how self-joins alter or completely invalidate these prior constructs.

2.2. Domination

We may mark some relations in an input database as exogenous; the remaining relations are endogenous. However, some relations are “implicitly” exogenous. For example, the relation $W$ in $q_{T}$ is given as endogenous, but is never needed in minimum contingency sets. We next define a syntactic property, called domination, that captures when endogenous relations are implicitly exogenous.

Definition 3 (Domination).

If a query $q$ has endogenous atoms $A,B$ such that $\textup{{var}}(A)\!\subset\!\textup{{var}}(B)$ , we say that $A$ dominates $B$ .

For example, $A(x)$ dominates $W(x,y,z)$ in $q_{T}$. Whenever a contingency set contains tuples from $W$, they can always be replaced with an equal or smaller number of tuples from $A$.
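The syntactic check of Definition 3 is easy to mechanize for sj-free queries. In the sketch below, the two query bodies are assumptions reconstructed from the atoms named in the surrounding text ($A(x)$ and $W(x,y,z)$ for the first, and an $A$, $R$, $S$, $T$ cycle for the second); the function name is hypothetical.

```python
def dominated_relations(query, exogenous=frozenset()):
    """Relations B for which some other endogenous atom A has var(A) strictly inside var(B).
    Assumes an sj-free query, so relation names identify atoms."""
    endo = [(rel, set(args)) for rel, args in query if rel not in exogenous]
    return {b for a, va in endo for b, vb in endo if a != b and va < vb}

# Assumed stand-in containing A(x) and W(x,y,z)
q_t = [("A", ("x",)), ("B", ("y",)), ("C", ("z",)), ("W", ("x", "y", "z"))]
# Assumed stand-in in which A(x) shares x with R(x,y) and T(z,x)
q_rats = [("A", ("x",)), ("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]

print(sorted(dominated_relations(q_t)))     # ['W']
print(sorted(dominated_relations(q_rats)))  # ['R', 'T']
```

The strict-subset comparison follows Definition 3 as written; using a non-strict comparison would make two atoms over the same variable set dominate each other.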

Proposition 4 (Domination for resilience (Meliou et al., 2010)).

Let $q$ be an sj-free CQ and $q^{\prime}$ the query resulting from labeling some dominated atoms as exogenous. Then $\texttt{RES}(q)\equiv\texttt{RES}(q^{\prime})$ .

When studying resilience, we follow the convention that all dominated atoms are made exogenous, and we consider that the normal form of a query. As we have seen, $A$ dominates $W$ in $q_{\textup{{T}}}$ . Similarly, the atom $A$ dominates both $R$ and $T$ in $q_{\textup{{rats}}}$ . We thus transform the queries so that the dominated atoms are exogenous. Exogenous atoms have the superscript “x”.

[TABLE]

Proposition 4 implies that $\texttt{RES}(q_{\textup{{rats}}})\equiv\texttt{RES}(q_{\textup{{rats}}}^{\prime})$ .

2.3. Triads and hardness

We showed in (Freire et al., 2015) that $\texttt{RES}(q_{\triangle})$ and $\texttt{RES}(q_{\textup{{T}}})$ from Example 2 are NP-complete. While $q_{\triangle}$ and $q_{\textup{{T}}}$ appear to be quite different, they share a key common structural property which alone is responsible for hardness for sj-free CQs.

Definition 5 (Triad).

A triad is a set of three endogenous atoms, $\mathcal{T}=\{S_{0},S_{1},S_{2}\}$ such that for every pair $i,j$ , there is a path from $S_{i}$ to $S_{j}$ in $\mathcal{H}(q)$ that uses no variable occurring in the other atom of $\mathcal{T}$ .

Intuitively, a triad is a triple of points with “robust connectivity.” Observe that atoms $R,S,T$ form a triad in $q_{\triangle}$ and atoms $A,B,C$ form a triad in $q_{\textup{{T}}}$ (see Fig. 1). For example, there is a path from $R$ to $S$ in $q_{\triangle}$ (across hyperedge $y$ ) that uses only variables (here $y$ ) that are not contained in the other atom ( $y\not\in\textup{{var}}(T)$ ). We showed that triads are responsible for hardness (see Appendix B for proof):

Lemma 6 (Triads make $\texttt{RES}(q)$ hard (Freire et al., 2015)).

Let $q$ be an sj-free CQ where all “dominated” atoms are exogenous. If $q$ has a triad, then $\texttt{RES}(q)$ is NP-complete.
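Definition 5 can be tested by a direct search: for every triple of endogenous atoms and every pair within it, look for a path that avoids all variables of the third atom. The sketch below is a straightforward reading of the definition, not an optimized procedure, and the three example queries are assumptions made for illustration.

```python
from itertools import combinations

def has_path(query, src, dst, forbidden):
    """Search from atom index src to dst, stepping only between atoms
    that share at least one variable outside `forbidden`."""
    seen, stack = {src}, [src]
    while stack:
        i = stack.pop()
        if i == dst:
            return True
        for j, (_, args) in enumerate(query):
            if j not in seen and (set(query[i][1]) & set(args)) - forbidden:
                seen.add(j)
                stack.append(j)
    return False

def triads(query, exogenous=frozenset()):
    """Return all triads as triples of relation names (assumes an sj-free query)."""
    endo = [i for i, (rel, _) in enumerate(query) if rel not in exogenous]
    result = []
    for triple in combinations(endo, 3):
        ok = True
        for a, b in combinations(triple, 2):
            (c,) = set(triple) - {a, b}
            if not has_path(query, a, b, set(query[c][1])):
                ok = False
                break
        if ok:
            result.append(tuple(query[i][0] for i in triple))
    return result

triangle = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]
tripod = [("A", ("x",)), ("B", ("y",)), ("C", ("z",)), ("W", ("x", "y", "z"))]
chain = [("A", ("x",)), ("R", ("x", "y")), ("S", ("y", "z"))]

print(triads(triangle))                           # [('R', 'S', 'T')]
print(triads(tripod, exogenous=frozenset({"W"})))  # [('A', 'B', 'C')]
print(triads(chain))                              # []
```

Note that a path may pass through exogenous atoms (here $W$); only the three atoms of the triad itself must be endogenous.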

2.4. Linear queries

A query $q$ is linear if its atoms can be arranged in a linear order s.t. each variable occurs in a contiguous sequence of atoms. Geometrically, a query is linear if all of the vertices of its hypergraph can be drawn along a straight line and all of its hyperedges can be drawn as convex regions (thus the variables form intervals on a line of relations). For example $q_{\textup{{lin}}}$ is linear (see Fig. 1(d)).
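Linearity can be checked naively by trying every ordering of the atoms, which is adequate for the small queries considered here. The example queries are assumptions (a three-atom chain and a triangle), and `linear_order` is a hypothetical helper name.

```python
from itertools import permutations

def linear_order(query):
    """Return an ordering of atom indices in which every variable occupies
    a contiguous run of atoms, or None if no such ordering exists."""
    variables = {v for _, args in query for v in args}
    for order in permutations(range(len(query))):
        ok = True
        for v in variables:
            positions = [p for p, i in enumerate(order) if v in query[i][1]]
            if positions[-1] - positions[0] + 1 != len(positions):
                ok = False
                break
        if ok:
            return order
    return None

chain = [("A", ("x",)), ("R", ("x", "y")), ("S", ("y", "z"))]
triangle = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]

print(linear_order(chain))     # (0, 1, 2)
print(linear_order(triangle))  # None
```

In the triangle, whichever atom is placed in the middle lacks the variable shared by the two outer atoms, so no ordering works.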

It was shown in (Meliou et al., 2010) that for any sj-free CQ that is linear, $\texttt{RES}(q)$ may be computed in a natural way using network flow. Thus all such queries are easy.

If all sj-free CQs without a triad were linear, then this would complete the dichotomy theorem for resilience. While this is not the case, we completed the proof of Theorem 7, by showing that every triad-free sj-free CQ may be transformed to a linear query of equivalent resilience.

2.5. Dichotomy Theorem

Now we can present the full characterization of the complexity of sj-free CQs proved in (Freire et al., 2015).

Theorem 7 (Dichotomy of resilience for sj-free CQs (Freire et al., 2015)).

Let $q$ be an sj-free CQ and let $q^{\prime}$ be the result of making all “dominated” atoms exogenous. If $q^{\prime}$ has a triad, then $\texttt{RES}(q)$ is NP-complete, otherwise it is in PTIME.

3. Self-joins change everything

Queries with self-joins are far more complicated than sj-free queries for at least 4 reasons: (1) For the sj-free case, triads alone were shown to determine hardness. Triads need at least 3 existential variables and at least 3 subgoals. Section 3.1 shows that already 2 atoms or 2 variables can be enough for hardness; (2) Linear sj-free queries can be solved using a natural reduction to network flow. For self-join queries, linear queries can be hard. Furthermore, Section 3.3 shows that we may need more elaborate reductions to network flow, even when they are easy. (3) The previous definition of domination does not lead to the desired properties in the presence of self-joins. Section 3.2 explains why dominated atoms may still be relevant when computing the minimum contingency set. (4) Our previous crucial concept of the dual hypergraph is no longer sufficient to characterize queries when relations appear multiple times. The position at which a variable appears in a subgoal may influence the complexity of resilience, including whether an atom has repeated variables, e.g., “ $R(x,y),R(y,y)$ .”

In the cases where the variable position is relevant and we are restricted to binary queries, we naturally represent queries as labeled directed graphs. This representation captures all relevant structural information of the binary queries, especially the relative position of variables, which the hypergraph representation does not reflect.

Definition 1 (Binary graph).

Let $q{\,:\!\!-\,}A_{1},$ $\ldots,A_{m}$ be a binary CQ. Its binary graph has vertex set $V=\textup{{var}}(q)$ and labeled edge sets defined by atoms $A_{1},$ $\ldots,A_{m}$ , i.e. atom $A(x,y)$ translates into labeled edge $x\xrightarrow{A}y$ . For unary atoms, the edge will be a loop.
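As an illustration of this definition, the sketch below turns a binary CQ into its vertex list and labeled edge list. The example query, with one unary atom and one atom with a repeated variable, is made up for this purpose.

```python
def binary_graph(query):
    """Build the binary graph: variables are vertices, and each atom becomes
    a labeled edge (source, relation, target). Unary atoms become loops."""
    vertices = sorted({v for _, args in query for v in args})
    edges = []
    for rel, args in query:
        if len(args) == 1:
            edges.append((args[0], rel, args[0]))
        elif len(args) == 2:
            edges.append((args[0], rel, args[1]))
        else:
            raise ValueError(f"{rel} is not unary or binary")
    return vertices, edges

# Assumed example: A(x), R(x,y), R(y,y)
q = [("A", ("x",)), ("R", ("x", "y")), ("R", ("y", "y"))]
V, E = binary_graph(q)
print(V)  # ['x', 'y']
print(E)  # [('x', 'A', 'x'), ('x', 'R', 'y'), ('y', 'R', 'y')]
```

Unlike the dual hypergraph, this representation distinguishes $R(x,y)$ from $R(y,x)$ and records that $R(y,y)$ is a loop.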

3.1. Basic hard queries: $q_{\textup{{vc}}}$ and $q_{\textup{{chain}}}$

We start by proving hardness for two queries that will play an important role in our later results. The first $q_{\textup{{vc}}}$ (for “vertex cover”) has only 2 variables and 3 atoms. The second $q_{\textup{{chain}}}$ (since it “chains” two binary relations together) has only 2 atoms and 3 variables:

[TABLE]

Figure 2 shows graphical representations of both queries while illustrating the differences between the dual hypergraph and the binary graph of a binary CQ.

Recall that in the sj-free case, a query needs a triad to be hard and all linear queries are easy. In particular, an sj-free query must have at least 3 variables and 3 atoms to be hard.

Proposition 2 ($q_{\textup{{vc}}}$).

$\texttt{RES}(q_{\textup{{vc}}})$ is NP-complete.

Proposition 3 ($q_{\textup{{chain}}}$).

$\texttt{RES}(q_{\textup{{chain}}})$ is NP-complete.

3.2. SJ-Free domination no longer works

We saw from Proposition 4 that in sj-free CQs, making all dominated atoms exogenous leaves the query resilience unchanged. In the presence of self-joins, this is no longer true.

Example 4.

Query $q_{\textup{{rats}}}^{\textrm{sj}_{1}}{\,:\!\!-\,}A(x),R(x,y),R(y,z),R(z,x)$ is a self-join variation of $q_{\textup{{rats}}}$ with $S,T$ replaced by $R$ ’s. Similar to $q_{\textup{{rats}}}$ , we have $\textup{{var}}(A)\subseteq\textup{{var}}(R(x,y))$ , so $A$ dominates $R$ by Definition 3. Thus $R$ should become exogenous when searching for the minimal contingency set. But this is not the case. Consider the database instance

$$D=\{A(1),\,A(5),\,R(1,2),\,R(2,3),\,R(3,1),\,R(5,1),\,R(2,5)\}$$

Our query has 3 witnesses over this database: $(1,2,3)$ , $(1,2,5)$ , and $(5,1,2)$ . If $R$ was made exogenous, the only possible minimum contingency set would be $\Gamma=\{A(1),A(5)\}$ . However, if $R$ is considered as endogenous, there is a smaller contingency set, with only $R(1,2)$ .
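The claims of this example can be checked mechanically. The sketch below uses the instance $D=\{A(1),A(5),R(1,2),R(2,3),R(3,1),R(5,1),R(2,5)\}$ and enumerates all minimum contingency sets by brute force, once with $R$ exogenous and once with $R$ endogenous. Function names and the tuple encoding are assumptions introduced here.

```python
from itertools import combinations, product

def witnesses(db, query, variables):
    """Enumerate valuations under which every atom of `query` is a tuple of `db`."""
    domain = sorted({c for _, args in db for c in args})
    found = []
    for values in product(domain, repeat=len(variables)):
        val = dict(zip(variables, values))
        if all((rel, tuple(val[v] for v in args)) in db for rel, args in query):
            found.append(values)
    return found

def min_contingency_sets(db, query, variables, exogenous=frozenset()):
    """All contingency sets of minimum size, found by exhaustive search."""
    endogenous = sorted(t for t in db if t[0] not in exogenous)
    for k in range(1, len(endogenous) + 1):
        hits = [set(g) for g in combinations(endogenous, k)
                if not witnesses(db - set(g), query, variables)]
        if hits:
            return hits
    return []

D = {("A", (1,)), ("A", (5,)),
     ("R", (1, 2)), ("R", (2, 3)), ("R", (3, 1)), ("R", (5, 1)), ("R", (2, 5))}
q = [("A", ("x",)), ("R", ("x", "y")), ("R", ("y", "z")), ("R", ("z", "x"))]
xs = ("x", "y", "z")

print(witnesses(D, q, xs))  # [(1, 2, 3), (1, 2, 5), (5, 1, 2)]
print(min_contingency_sets(D, q, xs, exogenous=frozenset({"R"})))  # both A tuples
print(min_contingency_sets(D, q, xs))  # [{('R', (1, 2))}]
```

The tuple $R(1,2)$ participates in all three witnesses, which is why treating $R$ as exogenous overestimates the resilience on this instance.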

This example shows that domination as defined in Definition 3 no longer implies that a relation can be made exogenous in the self-join case. This immediately raises the question of whether there is a set of conditions which implies that a relation can be made exogenous in the self-join setting, i.e., whether there is a self-join version of domination. Additionally, does $q_{\textup{{rats}}}^{\textrm{sj}_{1}}$ have a triad? The answer to both is yes, as we will see in Section 4.3 and Section 5.1, respectively.

3.3. Easy queries that use flow in a trickier way

As mentioned in the discussion of Theorem 7, resilience for linear sj-free CQ can be computed directly from network flow. As we have just seen, in the presence of self-joins, some linear queries are hard. For those that are easy, network flow can still help us compute resilience, but the arguments become trickier.

The following queries are two such examples, where modified versions of network flow are used to show resilience is easy in these cases.

[TABLE]

Proposition 5 ($q_{\textrm{conf}}^{AC}$).

$\texttt{RES}(q_{\textrm{conf}}^{AC})$ is in P.

Proposition 6 ($q_{\textrm{3perm-R}}^{A}$).

$\texttt{RES}(q_{\textrm{3perm-R}}^{A})$ is in P.

4. New general observations and plan of attack

We next give 3 new general observations before we describe our plan of attack in the remainder of the paper.

4.1. Minimal queries

Given queries $q_{1}$ and $q_{2}$ , we say that $q_{1}$ is contained in $q_{2}$ ( $q_{1}\subseteq q_{2}$ ) if answers to $q_{1}$ over any database instance $D$ are always a subset of the answers to $q_{2}$ over $D$ . We say $q_{1}$ is equivalent to $q_{2}$ ( $q_{1}\equiv q_{2}$ ) if $q_{1}\subseteq q_{2}$ and $q_{2}\subseteq q_{1}$ (Abiteboul et al., 1995). We say a conjunctive query $q$ is minimal if for every other conjunctive query $q^{\prime}$ such that $q\equiv q^{\prime}$ , $q^{\prime}$ has at least as many atoms as $q$ . For every query $q$ , there exists a minimal equivalent CQ $q^{\prime}$ that can be obtained from $q$ by removing zero or more atoms (Chandra and Merlin, 1977).

From now on, we focus only on minimal queries. This is WLOG, since any non-minimal query can always be minimized as a pre-processing step. The reason is that our hardness evaluation relies on identifying certain subqueries (or patterns) in a query that make this query hard. However, if a pattern is in a subquery that is removed during minimization, then, this pattern has no effect on the resilience of the query.
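Minimization as a pre-processing step can be sketched with the classical homomorphism test for Boolean CQs without constants: an atom can be dropped whenever the full query maps homomorphically into the remaining atoms. The sketch below is a brute-force illustration under the assumption that all atoms are treated alike (it ignores exogenous labels), and the example queries are made up.

```python
from itertools import product

def has_homomorphism(source, target):
    """True if some variable mapping sends every atom of `source` to an atom of `target`."""
    src_vars = sorted({v for _, args in source for v in args})
    tgt_vars = sorted({v for _, args in target for v in args})
    target_atoms = set(target)
    for image in product(tgt_vars, repeat=len(src_vars)):
        h = dict(zip(src_vars, image))
        if all((rel, tuple(h[v] for v in args)) in target_atoms for rel, args in source):
            return True
    return False

def minimize(query):
    """Repeatedly drop an atom while the query stays equivalent to the original."""
    atoms = list(query)
    changed = True
    while changed:
        changed = False
        for i in range(len(atoms)):
            rest = atoms[:i] + atoms[i + 1:]
            # rest is a subset of atoms, so only the direction atoms -> rest needs checking
            if rest and has_homomorphism(atoms, rest):
                atoms, changed = rest, True
                break
    return atoms

redundant = [("R", ("x", "y")), ("R", ("x", "z"))]
q_chain = [("R", ("x", "y")), ("R", ("y", "z"))]

print(minimize(redundant))  # [('R', ('x', 'z'))]
print(minimize(q_chain))    # [('R', ('x', 'y')), ('R', ('y', 'z'))]
```

The first query collapses to a single atom (map $y$ to $z$), whereas $q_{\textup{chain}}$ is already minimal.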

4.2. Query components

A connected component of $q$ (or “component” in short) is a non-empty subset of atoms that are connected via existential variables. A query $q$ is disconnected if its atoms can be partitioned into two or more components that do not share any existential variables. For example,

[TABLE]

The resilience of a query is determined by taking the minimum of the resiliences of each of its components. As in Section 2.1, $\rho(D,q)$ stands for the resilience of query $q$ over database $D$, which is the size of the minimum contingency set for $(D,q)$.

Lemma 1 (● Query components).

Let $q{\,:\!\!-\,}q_{1},\ldots,q_{k}$ be a query that consists of $k$ components $q_{i}$, $i\in[k]$. Then $\rho(D,q)=\min_{i}\rho(D,q_{i})$.
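The lemma can be illustrated on a small disconnected query. The sketch below splits a query into components by shared variables and compares the brute-force resilience of the whole query with the minimum over its components. The query, the database, and the helper names are assumptions made for illustration, and all tuples are treated as endogenous.

```python
from itertools import combinations, product

def components(query):
    """Group atoms into connected components via shared variables."""
    remaining = list(range(len(query)))
    comps = []
    while remaining:
        stack, comp = [remaining.pop(0)], []
        while stack:
            i = stack.pop()
            comp.append(i)
            linked = [j for j in remaining if set(query[i][1]) & set(query[j][1])]
            for j in linked:
                remaining.remove(j)
            stack.extend(linked)
        comps.append([query[i] for i in sorted(comp)])
    return comps

def holds(db, query):
    """True if the Boolean query has at least one witness over db."""
    variables = sorted({v for _, args in query for v in args})
    domain = sorted({c for _, args in db for c in args})
    for values in product(domain, repeat=len(variables)):
        val = dict(zip(variables, values))
        if all((rel, tuple(val[v] for v in args)) in db for rel, args in query):
            return True
    return False

def resilience(db, query):
    """Minimum number of deletions that falsify the query (exhaustive search)."""
    for k in range(len(db) + 1):
        for gamma in combinations(sorted(db), k):
            if not holds(db - set(gamma), query):
                return k

# Assumed example: q :- A(x), R(x,y), B(z)
q = [("A", ("x",)), ("R", ("x", "y")), ("B", ("z",))]
D = {("A", (1,)), ("R", (1, 2)), ("R", (1, 3)), ("B", (7,)), ("B", (8,))}

parts = components(q)
print(parts)  # [[('A', ('x',)), ('R', ('x', 'y'))], [('B', ('z',))]]
print(resilience(D, q), [resilience(D, c) for c in parts])  # 1 [1, 2]
```

Falsifying any one component falsifies the whole conjunction, so the cheapest component determines the resilience: here deleting $A(1)$ suffices.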

We can now show that the complexity of a query is determined by the hardest of its components if the query is minimal:

Lemma 2 (● Query components complexity).

Let $q$ be a minimal query that consists of $k$ query components. $\texttt{RES}(q)$ is NP-complete if there is at least one component $i\in[k]$ for which $\texttt{RES}(q_{i})$ is NP-complete. Conversely, if $\texttt{RES}(q_{i})$ is in P for all $i$ , then $\texttt{RES}(q)$ is in P.

In the remainder of the paper we assume queries to be connected.

4.3. SJ-domination

As discussed in Section 3.2, we need to consider the position of the variables in the attribute list of each atom in a sj-query. We write $\texttt{pos}^{q}_{g}(i)=x$ to express that the $i$ -th attribute of atom $g$ is variable $x$ for a query $q$ and omit $q$ when $q$ is clear from the context.

Definition 3 ( Domination with Self-Joins).

Let relations $A$ and $B$ be endogenous relations in query $q$ . We say that $A$ dominates $B$ if there exists a function

$f:[\texttt{arity}(A)]\rightarrow[\texttt{arity}(B)]$

such that for each $B$ atom $g_{B}$ , there exists an $A$ atom $h_{A}$ satisfying $\texttt{pos}_{h_{A}}(i)=\texttt{pos}_{g_{B}}(f(i))$ , $\forall i\in[\texttt{arity}(A)]$ . In other words, each $B$ atom that occurs in $q$ has a corresponding $A$ atom, and each of these pairs has matching variables according to the function $f$ .

Notice that when $B$ appears only once, the definition of domination is equivalent to the sj-free definition: $\textup{{var}}(A)\subseteq\textup{{var}}(B)$ .

Example 4.

To illustrate the new self-join domination, consider the following queries:

[TABLE]

By following the definition above, $A$ doesn’t dominate $R$ in $q_{1}$ but it does in $q_{2}$ , whereas $S$ is dominated in both queries. Notice that in $q_{2}$ , a tuple $R(a,b)$ will always join with tuple $A(b)$ so we can always choose $A(b)$ instead to be in the contingency set. The same is not true for $q_{1}$ , where a tuple $R(a,b)$ could join with $A(a)$ or $A(b)$ .
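The definition of sj-domination can be checked mechanically by trying every position map $f$ . The sketch below is a minimal illustration under several assumptions: the function name `dominates` and the queries `qa` and `qb` are hypothetical (they are not the queries $q_{1}$ and $q_{2}$ above), and the requirement that both relations be endogenous is not checked.

```python
from itertools import product

def dominates(query, a_rel, b_rel):
    """True if relation a_rel dominates relation b_rel in query (Definition 3).

    query is a list of atoms (relation_name, tuple_of_variables).
    Positions are 0-based here, so f maps range(arity(A)) to range(arity(B)).
    """
    a_atoms = [vs for rel, vs in query if rel == a_rel]
    b_atoms = [vs for rel, vs in query if rel == b_rel]
    if not a_atoms or not b_atoms:
        return False
    arity_a, arity_b = len(a_atoms[0]), len(b_atoms[0])
    for f in product(range(arity_b), repeat=arity_a):
        # every B-atom g needs some A-atom h with h[i] == g[f(i)] for all i
        if all(
            any(all(h[i] == g[f[i]] for i in range(arity_a)) for h in a_atoms)
            for g in b_atoms
        ):
            return True
    return False

# Hypothetical queries: in qa an R-tuple may meet A on either attribute,
# in qb every R-atom meets A on its second attribute.
qa = [("A", ("x",)), ("R", ("x", "y")), ("R", ("y", "z")), ("A", ("z",))]
qb = [("R", ("x", "y")), ("A", ("y",)), ("R", ("y", "z")), ("A", ("z",))]

print(dominates(qa, "A", "R"), dominates(qb, "A", "R"))  # False True
```

When the dominated relation occurs only once, the search succeeds exactly when all variables of some $A$ -atom occur in the single $B$ -atom, in line with the sj-free condition $\textup{{var}}(A)\subseteq\textup{{var}}(B)$ .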

Proposition 5 ( $\CIRCLE$ Domination for resilience with Self-Join).

Let $q$ be a CQ and $q^{\prime}$ the result of labeling some dominated relations exogenous. Then $\texttt{RES}(q)\equiv\texttt{RES}(q^{\prime})$ .

4.4. Outline of our plan of attack

To obtain a dichotomy result for the resilience of binary queries in the presence of a single-self-join, we proceed as follows. (1) Section 5 shows that triads in any conjunctive queries with self-joins still imply hardness (Theorem 6) and furthermore, when triads are absent, the endogenous atoms are linearly connected. We call such queries pseudo-linear (Theorem 7). We conjecture that pseudo-linear queries may be transformed to linear queries of equivalent resilience (Conjecture 8). In any case, it suffices to study the criteria for hardness of pseudo-linear queries. (2) Section 6 generalizes the hardness pattern behind $q_{\textup{{vc}}}$ to a more general class of hard ssj binary queries that contain “paths” between repeated atoms. (3) We then focus on the complexity of the resilience of ssj binary CQs with at most a single repetition of a single relation. Section 7 gives a complete characterization of the complexity for the cases of 2 occurrences of a repeated relation. This is a dichotomy theorem: we show that for all such queries, $q$ , $\texttt{RES}(q)$ is either NP-complete or $\texttt{RES}(q)$ is reducible to network flow and is thus in P. Section 8 presents the remaining challenges that must be overcome in order to characterize all queries with 3 occurrences of a repeated relation. In Section 9, we present a “template” for hardness proofs that we believe will help us make progress in the general self-join case.

5. Non-linear Queries: NP-Complete

In this section we prove that queries containing triads remain hard in the presence of self-joins (Theorem 6). We then show that for any query that does not contain a triad, its endogenous atoms are arranged linearly. We call such a query pseudo-linear. Thus, we conclude that either a query contains a triad in which case its resilience problem is NP-complete, or it is pseudo-linear. In the following sections, we can thus safely restrict our attention to pseudo-linear queries.

Definition 1 (Self-join variation of a CQ).

Let $q$ be a sj-free CQ and let $q^{\textrm{sj}}$ result from $q$ by replacing some atoms $S_{i}(\overline{v})$ from $q$ with the atom $R_{i}(\overline{v})$ , where the relation $R_{i}$ occurs elsewhere in $q$ . We say that $q^{\textrm{sj}}$ is a self-join variation of $q$ .

Example 2 (Self-join variations).

Consider sj-free query $q_{\triangle}$ . The following are all its possible self-join variations:

[TABLE]

We first observe that the resilience of self-join variations of a query can only be harder than their sj-free counterpart:

Lemma 3 ( $\CIRCLE$ SJ Can Only Make Resilience Harder).

Let $q$ be an sj-free CQ and let $q^{\textrm{sj}}$ be a self-join variation of $q$ . If $q^{\textrm{sj}}$ is minimal, then $\texttt{RES}(q)\leq\texttt{RES}(q^{\textrm{sj}})$ .

We need to rely on the fact that $q^{\textrm{sj}}$ is minimal for the result to hold, as we see in the example below:

Example 4.

Consider query $q{\,:\!\!-\,}R(x,y),S(z,y),T(z,w),A(x,w)$ , and observe that $\texttt{RES}(q)$ is NP-complete because $q$ contains a triad. A possible self-join variation is $q^{\textrm{sj}}{\,:\!\!-\,}R(x,y),R(z,y),R(z,w),R(x,w)$ . Note that $q^{\textrm{sj}}$ is not minimal, and is equivalent to $R(x,y)$ . So $\texttt{RES}(q^{\textrm{sj}})$ is trivially in P.
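The non-minimality claimed in this example can be confirmed with a small homomorphism search. The following is a sketch of the textbook minimization procedure for Boolean CQs (drop an atom whenever the query maps homomorphically into the remainder); the function names are assumptions, and the search is exhaustive, so it is only suitable for queries with a handful of variables.

```python
from itertools import product

def has_homomorphism(source, target):
    """True if some variable mapping sends every atom of source to an atom of target."""
    src_vars = sorted({v for _, vs in source for v in vs})
    tgt_vars = sorted({v for _, vs in target for v in vs})
    target_atoms = set(target)
    for image in product(tgt_vars, repeat=len(src_vars)):
        h = dict(zip(src_vars, image))
        if all((rel, tuple(h[v] for v in vs)) in target_atoms for rel, vs in source):
            return True
    return False

def minimize(query):
    """Repeatedly drop an atom while the query still maps into what remains."""
    core = list(query)
    changed = True
    while changed:
        changed = False
        for atom in core:
            smaller = [a for a in core if a != atom]
            if smaller and has_homomorphism(core, smaller):
                core, changed = smaller, True
                break
    return core

q = [("R", ("x", "y")), ("S", ("z", "y")), ("T", ("z", "w")), ("A", ("x", "w"))]
q_sj = [("R", ("x", "y")), ("R", ("z", "y")), ("R", ("z", "w")), ("R", ("x", "w"))]

print(len(minimize(q)), len(minimize(q_sj)))  # 4 1
```

The sj-free query keeps all four atoms, while its self-join variation collapses to a single $R$ -atom, which is why Lemma 3 needs the minimality assumption.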

By Lemma 3, the resilience of the self-join variations of $q_{\triangle}$ is NP-complete. Recall from Definition 5 that a triad is a set of three endogenous atoms, so we can say that the self-join variations of $q_{\triangle}$ all have triads. However, it does not immediately follow from Lemma 3 that every sj-query with a triad is hard. The missing cases are those where an sj-query includes a triad but is not a self-join variation of an sj-free query with a triad. We next explore this situation.

5.1. Self-join variations of $q_{\textup{{rats}}}$ and $q_{\textrm{brats}}$

Recall two important sj-free queries:

[TABLE]

$\texttt{RES}(q_{\triangle})$ is NP-complete because it contains the triad, $R,S,T$ . However, $q_{\textup{{rats}}}$ and $q_{\textrm{brats}}$ are easy because $A$ dominates $R,T$ and $B$ dominates $S$ so they only have two endogenous atoms each and thus no triad.

This is not the case for some self-join variations of $q_{\textup{{rats}}}$ and $q_{\textrm{brats}}$ . Below we list two examples of variations that contain triads.

[TABLE]

In these examples, relation $R$ is now more robust and not dominated by $A$ or $B$ . Therefore, they still contain triads consisting of their three $R$ -atoms. The presence of a triad is a strong indication that these queries are hard, but we cannot use Lemma 3 to show this because their sj-free counterparts, $q_{\textup{{rats}}}$ and $q_{\textrm{brats}}$ , are easy. We now proceed to show that they are hard.

Proposition 5.

Let $q$ be a self-join variation of $q_{\textup{{rats}}}$ or $q_{\textrm{brats}}$ . If $q$ has a triad, then $\texttt{RES}(q)$ is NP-complete.

Using Proposition 5, we now generalize the fact that triads make sj-free queries hard (Lemma 6) to the same result for general CQs.

5.2. Triads Make Queries Hard

Theorem 6 ( $\CIRCLE$ SJ-queries with triads).

If $q$ has a triad, then $\texttt{RES}(q)$ is NP-complete.

Proof Sketch.

We argue that there are only two cases to consider when a sj-query $q$ has a triad. Consider that $q$ is a self-join variation of a sj-free query $q^{\prime}$ . The first case is when the resilience of $q^{\prime}$ is hard, i.e., $\texttt{RES}(q^{\prime})$ is NP-complete. Then, we can use Lemma 3 to show that $\texttt{RES}(q)$ is NP-complete. The second case is when $\texttt{RES}(q^{\prime})$ is in P. Then, we argue that there is a reduction $\texttt{RES}(q^{\prime\prime})\leq\texttt{RES}(q)$ , where $q^{\prime\prime}$ is a self-join variation of either $q_{\textup{{rats}}}$ or $q_{\textrm{brats}}$ that contains a triad. By Proposition 5, $\texttt{RES}(q^{\prime\prime})$ is NP-complete, which implies that $\texttt{RES}(q)$ is also NP-complete. ∎

Thus, if a query contains a triad, it is hard. In the next section, we discuss queries that do not contain triads and how they are similar to linear queries, since their endogenous atoms are linearly connected.

5.3. No Triad Means Pseudo-Linear

In (Freire et al., 2015), we proved that if a sj-free CQ $q$ has no triad, then $q$ may be transformed to a sj-free CQ $q^{\prime}$ which is linear and such that $\texttt{RES}(q)\leq\texttt{RES}(q^{\prime})$ . Since linear sj-free CQs are easy, it follows that $q$ is easy.

This argument no longer works in the presence of self-joins because linear queries can be easy or hard. However, we can extend the theorem from (Freire et al., 2015) to show the following:

Theorem 7 ( $\CIRCLE$ No Triad Means Pseudo-Linear).

Let $q$ be a CQ with no triad. Then all endogenous atoms in $q$ are connected linearly.

We conjecture that pseudo-linearity is equivalent to linearity when considering resilience. What makes a query pseudo-linear, instead of linear or containing a triad, is the presence of some exogenous atoms. However, the exogenous atoms of a query are mostly only connecting the endogenous atoms, and also, if necessary, ensuring that $q$ is a minimal query, so we believe they can be modified to obtain a linear query without altering the complexity of a query.

Conjecture 8 ( $\CIRCLE$ No Triad Means Linear).

Let $q$ be a CQ with no triad. Then we can transform $q$ to a linear CQ $q^{\prime}$ with $\texttt{RES}(q)\equiv\texttt{RES}(q^{\prime})$ .

6. Paths are hard

Section 3.1 presented two linear queries that are hard, unlike in the sj-free case where all linear queries are easy. Note that these queries are binary and, in both, only one relation is part of a self-join. In other words, these are single-self-join binary queries.

We now identify a pattern characteristic of $q_{\textup{{vc}}}$ that we call a path. The main result of this section is that every ssj binary query containing a path is hard. We start by showing the case where the self-join relation is unary.

Theorem 1 ( $\LEFTcircle$ Unary path).

Let $q$ be a minimal ssj-CQ. If $q$ contains distinct atoms $R(x)$ and $R(y)$ , then $\texttt{RES}(q)$ is NP-complete.

Proof sketch.

Let $R(x)$ and $R(y)$ be the first two occurrences of the relation $R$ in $q$ . Since $q$ is connected, $R(x)$ and $R(y)$ are connected by at least one non-self-join relation, $S$ (see Fig. 4(a)). We prove that $\texttt{RES}(q_{\textup{{vc}}})\leq\texttt{RES}(q)$ . Details are in Appendix A, but it is not hard to see that any database $D\models q_{\textup{{vc}}}$ can be transformed to a database $D^{\prime}\models q$ that exactly preserves resilience. Here $R^{\prime},S^{\prime}$ in $D^{\prime}$ come from $A$ and $R$ in $D$ , and all the other atoms of $q$ (including any additional occurrences of the self-join relation, $R$ , to the right of $R(y)$ ) are covered by multiple, extra values which complete the joins but are never chosen in minimum contingency sets. Note that this proof doesn’t make any assumption about the existence or not of triads. ∎

When the self-join relation is binary, if two consecutive atoms, $R(x,y)$ , $R(z,w)$ , are disjoint, then we call this a binary path. “Overlapping” consecutive atoms with shared variables, such as $R(x,y)$ , $R(y,z)$ in $q_{\textup{{chain}}}$ , can also cause hardness and are studied in later sections.

Theorem 2 ( $\LEFTcircle$ Binary path).

Let $q$ be a minimal ssj-CQ. If $q$ has distinct consecutive sj atoms $R(x,y),R(z,w)$ with $\{x,y\}\cap\{z,w\}=\emptyset$ , then $\texttt{RES}(q)$ is NP-complete.

Proof Sketch.

Given $R(x,y),R(z,w)$ as in the statement of the theorem, there must be an atom $S(u,v)$ , with $S\neq R$ , on the path between them, with $u\in\{x,y\}$ and $v\not\in\{x,y\}$ . Now, as in the proof of Theorem 1, we reduce $\texttt{RES}(q_{\textup{{vc}}})$ to $\texttt{RES}(q)$ . We map any database $D\models q_{\textup{{vc}}}$ to a database $D^{\prime}\models q$ , where $R^{\prime}$ contains $\bigl\{(a,a)\,\bigm|\,A(a)\in D\bigr\}$ plus multiple extra values for any other atoms of the relation $R$ in $q$ to the left of $R(x,y)$ or to the right of $R(z,w)$ , and $S^{\prime}=\bigl\{(a,b)\,\bigm|\,R(a,b)\in D\bigr\}$ . As in the unary case, there is no assumption about the linearity of the query. Details are in Appendix A. ∎
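The database mapping in this proof sketch can be tried out on a toy instance. The sketch below assumes that the vertex-cover query has the form $A(x),R(x,y),A(y)$ (as the relation names in the proof suggest) and uses a hypothetical binary-path query $R(x,y),S(y,z),R(z,w)$ with no further atoms, so the padding with extra values is not needed. All function names are illustrative, and resilience is computed by exhaustive search.

```python
from itertools import combinations

def witnesses(query, db):
    """All witnesses, each as a frozenset of (relation, tuple) facts."""
    found = set()

    def extend(i, binding, used):
        if i == len(query):
            found.add(frozenset(used))
            return
        rel, variables = query[i]
        for tup in db.get(rel, ()):
            new = dict(binding)
            if all(new.setdefault(v, c) == c for v, c in zip(variables, tup)):
                extend(i + 1, new, used + [(rel, tup)])

    extend(0, {}, [])
    return found

def resilience(query, db):
    """Minimum contingency set size by brute force (toy inputs only)."""
    ws = witnesses(query, db)
    facts = sorted({fact for w in ws for fact in w})
    for k in range(len(facts) + 1):
        for gamma in combinations(facts, k):
            if all(w & set(gamma) for w in ws):
                return k

def to_binary_path_db(db_vc):
    """R' = {(a,a) | A(a) in D} and S' = {(a,b) | R(a,b) in D}."""
    return {"R": {(a, a) for (a,) in db_vc["A"]}, "S": set(db_vc["R"])}

q_vc = [("A", ("x",)), ("R", ("x", "y")), ("A", ("y",))]   # assumed shape of q_vc
q_bp = [("R", ("x", "y")), ("S", ("y", "z")), ("R", ("z", "w"))]  # hypothetical query

# A triangle graph: nodes in A, edges in R.
db_vc = {"A": {(1,), (2,), (3,)}, "R": {(1, 2), (2, 3), (1, 3)}}
db_bp = to_binary_path_db(db_vc)

print(resilience(q_vc, db_vc), resilience(q_bp, db_bp))  # 2 2
```

Each witness of the vertex-cover query over $D$ corresponds to exactly one witness of the path query over $D^{\prime}$ , and deleting a tuple $(a,a)$ from $R^{\prime}$ plays the role of removing node $a$ .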

Unary and binary paths are the simplest of the hard patterns. By Theorems 1 and 2, they always force their queries to be hard.

Since we have established that an sj-query either has a triad or is pseudo-linear (Theorem 7) and because we have proved that triads imply hardness (Theorem 6), we can now focus on the pseudo-linear queries.

In the next sections we study the more subtle pseudo-linear ssj binary queries, which do not contain paths.

7. Queries with exactly two $R$ -atoms

In this section we cover the complexity of pseudo-linear ssj binary queries with exactly two atoms referring to the same relation. We will refer to this relation as $R$ . As always, we assume that our query is minimal and connected, and from now on also assume that $q$ does not contain a triad or a path as described in Theorem 1 and Theorem 2; otherwise we would already know that $\texttt{RES}(q)$ is NP-complete. Even in this restricted setting, we will see that there is a surprisingly rich variety of structures, requiring different strategies to determine their complexity.

Because there are no paths, $R$ must be a binary relation and the two $R$ -atoms must have at least one variable in common.

• Chains have one common variable and join in different attributes, e.g., $R(x,y),R(y,z)$ ;

• Confluences have one common variable and join in the same attribute, e.g., $R(x,y),R(z,y)$ ;

• Permutations share both variables but join in different attributes, e.g., $R(x,y),R(y,x)$ ;

• Queries with repeated variables (REP) have repeated variables in at least one $R$ -atom, e.g., $R(x,x),R(x,y),B(y)$ .

Figure 5 shows the binary graphs for each of these patterns, which help visualize the subtle variations in how the $R$ -atoms can join. We consider each of these possibilities in turn and characterize their complexity.
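The four patterns can be told apart by looking only at the variable tuples of the two $R$ -atoms. The sketch below is one possible reading of the definitions above; the function name `classify_pair` is an assumption, and so is the choice to test for repeated variables first (a pair may both contain a repeated variable and be disjoint).

```python
def classify_pair(atom1, atom2):
    """Classify two binary R-atoms, given as tuples of variable names."""
    if atom1 == atom2:
        raise ValueError("the two atoms must be distinct")
    # REP: some atom uses the same variable in both attributes
    if len(set(atom1)) < 2 or len(set(atom2)) < 2:
        return "REP"
    shared = set(atom1) & set(atom2)
    if not shared:
        return "path"          # disjoint atoms, covered by Theorem 2
    if len(shared) == 2:
        return "permutation"   # same variables in swapped positions
    v = next(iter(shared))
    # one shared variable: same attribute -> confluence, different -> chain
    return "confluence" if atom1.index(v) == atom2.index(v) else "chain"

for pair in [
    (("x", "y"), ("y", "z")),
    (("x", "y"), ("z", "y")),
    (("x", "y"), ("y", "x")),
    (("x", "x"), ("x", "y")),
    (("x", "y"), ("z", "w")),
]:
    print(pair, classify_pair(*pair))
```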

7.1. 2-Chains

The chain query is the simplest possible minimal sj-query with two atoms and we proved earlier that its resilience is NP-complete (Proposition 3). In this section we prove that the chain structure is quite robust and that any of its expansions remains NP-complete.

We call “expansions” of $q_{\textup{{chain}}}$ any query obtained by adding new relations to it, i.e. relations that do not self-join. We start by presenting the expansions obtained by adding unary relations and then generalize that to any expansion.

Figure 6(a) shows how unary relations can be added to $q_{\textup{{chain}}}$ . Each one can appear by itself or combined with others. While the proof involves several subcases, the important take-away is that all 8 of these expansions are hard.

Proposition 1 (Chains with unary relations).

Any expansion of $q_{\textup{{chain}}}$ with unary relations is NP-complete.

Proof Sketch.

We prove these expansions are hard by a reduction from 3SAT. The same idea used to prove that $\texttt{RES}(q_{\textup{{chain}}})$ is hard will work here as long as we adapt the variable and clause gadgets to deal with the existence of the unary relations. Lemmas 3, 4 and 5 in Appendix A contain the details. ∎

Now we can generalize this hardness result to any chain expansion using a reduction idea similar to the ones used for the proofs of Theorems 1 and 2 for paths.

Proposition 2 ( $\LEFTcircle^{:}$ Chains).

If a query $q$ contains a 2-chain as its only self-join, then $\texttt{RES}(q)$ is NP-complete.

7.2. 2-Confluences

Confluences are defined by relation $R$ joining only in the same attribute. We refer to this pattern as $q_{\textup{{conf}}}$ (Fig. 6(b)).

Note that as a stand-alone query $q_{\textup{{conf}}}$ is not minimal, so we need other atoms connected to both $x$ and $z$ . An example of a minimal query containing a confluence is $q_{\textup{{conf}}}^{AC}$ ${\,:\!\!-\,}$ $A(x),R(x,y),R(z,y),C(z)$ .

We next show that the standard flow algorithm without any modifications works correctly for linear queries with no self-join other than one $2$ -confluence, thus generalizing the idea of Proposition 5.

Proposition 3 ( $\LEFTcircle^{:}$ $q_{\textup{{conf}}}$ ).

$\texttt{RES}(q)$ for any linear query $q$ with $q_{\textup{{conf}}}$ as its only self-join pattern can be solved in P by standard network flow.

In Proposition 3 we assume that $q$ is linear, thus guaranteeing that every path in $q$ from $x$ to $z$ involves the variable $y$ , and therefore we are able to create a network flow to solve the problem. Note that this is not true in general for pseudo-linear queries containing $q_{\textup{{conf}}}$ . For example, consider $c\!f_{p}{\,:\!\!-\,}R(x,y)H^{\textup{x}}(x,z)R(z,y)$ . It is easy to see that $c\!f_{p}$ is pseudo-linear but we have $\texttt{RES}(c\!f_{p})\equiv\texttt{RES}(q_{\textup{{vc}}})$ . Thus, we cover all possible cases for $q_{\textup{{conf}}}$ by observing,

Proposition 4 ( $\LEFTcircle^{:}$ ).

Let $q$ be a pseudo-linear query with $q_{\textup{{conf}}}$ as its only self-join pattern. If $q$ contains an exogenous path from $x$ to $z$ not involving the variable $y$ , then $\texttt{RES}(q)$ is NP-complete; otherwise it is in P.
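The syntactic test in Proposition 4 amounts to a reachability check over the variables of the query. The sketch below is one possible reading of "exogenous path from $x$ to $z$ not involving $y$ ": two variables are adjacent if they co-occur in an exogenous atom, and $y$ is never used as an intermediate node. The function name and this interpretation are assumptions, and the preconditions of the proposition (pseudo-linear, $q_{\textup{{conf}}}$ as the only self-join pattern) are not checked.

```python
from collections import deque

def has_exogenous_path(query, exogenous, start, goal, avoid):
    """BFS over variables linked by exogenous atoms, never stepping on `avoid`."""
    adjacency = {}
    for rel, variables in query:
        if rel in exogenous:
            usable = [v for v in variables if v != avoid]
            for u in usable:
                adjacency.setdefault(u, set()).update(usable)
    seen, frontier = {start}, deque([start])
    while frontier:
        u = frontier.popleft()
        if u == goal:
            return True
        for v in adjacency.get(u, ()):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return False

# cf_p :- R(x,y), H^x(x,z), R(z,y), with H exogenous
cf_p = [("R", ("x", "y")), ("H", ("x", "z")), ("R", ("z", "y"))]
# q_conf^AC :- A(x), R(x,y), R(z,y), C(z), with no exogenous atoms
q_conf_ac = [("A", ("x",)), ("R", ("x", "y")), ("R", ("z", "y")), ("C", ("z",))]

print(has_exogenous_path(cf_p, {"H"}, "x", "z", "y"))       # True
print(has_exogenous_path(q_conf_ac, set(), "x", "z", "y"))  # False
```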

7.3. 2-Permutations

We call two $R$ -atoms sharing both variables a permutation. The smallest pattern that has this property is $R(x,y),R(y,x)$ (Fig. 5). We show that permutations have both NP-complete and PTIME instances.

Easy permutations. We start with two easy permutations.

[TABLE]

Proposition 5.

$\texttt{RES}(q_{\textup{perm}})$ and $\texttt{RES}(q_{\textup{perm}}^{A})$ are in P.

Proof.

Given a database $D_{1}$ satisfying $q_{\textup{perm}}$ , each tuple that is part of a witness for $D_{1},q_{\textup{perm}}$ is part of exactly one witness. Therefore the size of a minimum contingency set for $D_{1},q_{\textup{perm}}$ is exactly the number of witnesses.

Given a database $D_{2}$ satisfying $q_{\textup{perm}}^{A}$ , for each join $(a,b)$ we have two possible choices: a minimum contingency set $\Gamma$ contains either $A(a)$ or one of $R(a,b)$ and $R(b,a)$ , but never both of these $R$ -tuples. Therefore we can reduce $\texttt{RES}(q_{\textup{perm}}^{A})$ to vertex cover in a bipartite graph, which is in P. ∎
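The reduction to bipartite vertex cover can be sketched as follows, assuming that $q_{\textup{perm}}^{A}$ has the form $A(x),R(x,y),R(y,x)$ . One side of the graph holds the $A$ -tuples, the other holds the unordered pairs $\{a,b\}$ with both $R(a,b)$ and $R(b,a)$ present, and every witness becomes an edge. By König's theorem the minimum vertex cover equals the maximum matching, computed here with simple augmenting paths. The function name and the graph encoding are assumptions of this sketch, not the construction from the paper.

```python
def resilience_perm_a(a_rel, r_rel):
    """Resilience of the assumed query A(x), R(x,y), R(y,x) via bipartite matching.

    a_rel: set of 1-tuples, r_rel: set of pairs.
    """
    # edge A(a) -- {a,b} for every witness (a,b)
    edges = {}
    for (a, b) in r_rel:
        if (b, a) in r_rel and (a,) in a_rel:
            edges.setdefault(a, set()).add(frozenset((a, b)))

    match = {}  # pair node -> A node currently matched to it

    def augment(a, visited):
        for pair in edges[a]:
            if pair not in visited:
                visited.add(pair)
                if pair not in match or augment(match[pair], visited):
                    match[pair] = a
                    return True
        return False

    # maximum matching size = minimum vertex cover size (Koenig's theorem)
    return sum(augment(a, set()) for a in edges)

r = {(1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2)}
print(resilience_perm_a({(1,), (2,), (3,)}, r))  # 3
print(resilience_perm_a({(1,)}, r))              # 1
```

In the first call the six witnesses form a 6-cycle between three $A$ -nodes and three pair-nodes, so three deletions are needed; in the second call deleting $A(1)$ alone falsifies the query.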

Hard permutations. Surprisingly, adding another unary atom to $q_{\textup{perm}}^{A}$ , thus bounding it on both ends, leads to a hard query.

[TABLE]

It is still true that for any pair $R(a,b),R(b,a)$ participating in a join, a minimum contingency set will only contain one tuple from the pair. This might lead to the wrong conclusion that network flow could solve this problem. We will next show that this is incorrect.

Proposition 6.

$\texttt{RES}(q_{\textup{perm}}^{AB})$ is NP-complete.

The criterion. The main structural difference between the hard and easy permutations defined above is whether or not there are relations that “bound” the permutation on both ends, i.e. whether there are endogenous relations $S,T$ , such that $S$ contains variable $x$ but not $y$ , and $T$ contains variable $y$ but not $x$ . Thus, the hard permutation, $q_{\textrm{perm}}^{AB}$ , is bound, but the easy ones, $q_{\textrm{perm}},q_{\textrm{perm}}^{A}$ , are not bound. Using this characterization, we identify when 2-permutations are hard.

Proposition 7 ( $\LEFTcircle^{:}$ ).

Let $q$ be a pseudo-linear query with $R(x,y),R(y,x)$ as its only self-join. If $q$ is bound, then $\texttt{RES}(q)$ is NP-complete; otherwise, $\texttt{RES}(q)$ is in P.

7.4. Queries with REP

We call queries with repeated variables (or REP in short) those where atoms contain the same variable twice, e.g. occurrences of $R(x,x)$ . Note that this is only relevant for the case where $R$ is part of a self-join, otherwise it could be considered as $R(x)$ .

There are only three patterns to consider when we are restricted to two $R$ -atoms: either one or both atoms have repeated variables. The following queries are the smallest examples of this class of queries:

[TABLE]

Notice that queries $z_{1}$ and $z_{2}$ satisfy the condition for hardness of binary paths (Theorem 2), since their set of variables is disjoint. Therefore, we can conclude that $\texttt{RES}(z_{1})$ and $\texttt{RES}(z_{2})$ are NP-complete, as well as any expansion of those queries. We show that any REP queries that contain $z_{3}$ are in P.

Proposition 8 ( $\LEFTcircle^{:}$ ).

Any pseudo-linear query $q$ with exactly two $R$ -atoms that contains $z_{3}$ is in P.

7.5. The dichotomy

Combining our results so far, with at most two occurrences of the self-join relation, we have proved a complete characterization of the complexity of resilience:

Theorem 9 ( $\LEFTcircle^{:}$ Two-Atom Dichotomy).

Consider $q$ an ssj-CQ, with at most two occurrences of the self-join relation. If $q$ has any of the following

(1) triad

(2) path

(3) chain

(4) bounded permutation

(5) confluence with exogenous path

then $\texttt{RES}(q)$ is NP-complete. Otherwise, $\texttt{RES}(q)$ is PTIME via a reduction to network flow. In addition there is a PTIME algorithm that on input $q$ determines which case occurs.

8. Queries with exactly three $R$ -atoms

In Theorem 9 we completely characterized the complexity of resilience of all CQs with at most one repetition of a single relation, thus extending the dichotomy for sj-free CQs into the land of self-joins.

In this section, we present an overview of what can happen when we allow a third $R$ -atom to self-join. Since we only have to consider pseudo-linear queries that do not have a path, all three $R$ -atoms must connect to each other directly or through the third $R$ -atom. Even though this is still a restrictive setting, we will see that it brings non-trivial complications to the characterization. We will present some complexity results; but also some remaining open problems.

8.1. 3-Chains

We obtain a 3-chain by adding an extra $R$ -atom to a 2-chain in a way such that the new atom joins in a different attribute from the other two.

[TABLE]

Analogous to the 2-chain case, 3-chains are always hard. In fact this holds for 4-chains, 5-chains, etc.

Proposition 1 ( $\LEFTcircle$ ).

For all $k\geq 2$ , if $q$ contains a $k$ -chain as its only self-join, then $\texttt{RES}(q)$ is NP-complete.

8.2. 3-Confluences

Adding a third $R$ -atom to a 2-confluence and making sure that it joins in the same attribute with one of the two existing $R$ -atoms produces a 3-confluence.

[TABLE]

As in the 2-confluence case, $q_{3\textrm{conf}}$ is not minimal, so other atoms are required to make it minimal. Here are a few examples of minimal queries containing $q_{3\textrm{conf}}$ .

[TABLE]

These queries are very similar but one of them is hard, while the other one is easy.

Proposition 2.

$\texttt{RES}(q_{3\textrm{conf}}^{AC})$ is NP-complete.

Proposition 3 ( $\LEFTcircle^{\because}$ ).

Any variation of $q_{3\textrm{conf}}^{AC}$ obtained by including unary relations is NP-complete.

Proof.

We define a reduction from Max 2SAT similar to the one used for $q_{3\textrm{conf}}^{AC}$ by adding the appropriate tuples to obtain the same set of joins. The contingency set doesn’t change with the new tuples and therefore the properties of the reduction hold. ∎

Proposition 4.

$\texttt{RES}(q_{3\textrm{conf}}^{TS})$ is in P.

Open problem. There is a third variant of 3-confluences which somewhat mixes queries $q_{3\textrm{conf}}^{AC}$ and $q_{3\textrm{conf}}^{TS}$ (Fig. 7).

[TABLE]

The complexity of $\texttt{RES}(q_{3\textrm{conf}}^{AS})$ remains unknown.

8.3. 3-Chain-Confluence

With 3 $R$ -atoms, it is possible that different patterns occur at the same time. This makes the queries harder to analyze, since the result of these interactions might diverge from what we expect when we see each pattern in isolation.

In this section we present some queries where a 2-chain and a 2-confluence occur at the same time.

[TABLE]

The resilience of these queries is hard but they require different reductions. If $x$ is bound, then we can use a reduction from $\texttt{RES}(q_{\textup{{chain}}})$ . Otherwise we need a reduction from Max 2SAT.

Proposition 5.

$\texttt{RES}(q_{\textrm{3cc}}^{AC})$ and $\texttt{RES}(q_{\textrm{3cc}}^{AS})$ are NP-complete.

Proposition 6.

$\texttt{RES}(q_{\textrm{3cc}}^{C})$ is NP-complete.

Open Problem. In this category of queries with chain and confluence, we don’t know the complexity of $q_{\textrm{3cc}}^{S}{\,:\!\!-\,}R(x,y)R(y,z)R(w,z)S(w,z)$ .

8.4. 3-Permutation plus R

It is not possible to obtain two permutations in a query with only 3 $R$ -atoms. In fact, there are only two ways that a new $R$ -atom can be connected to a permutation: either by joining with $x$ or $y$ , and those are equivalent.

[TABLE]

Similar to the $q_{\textrm{3conf}}$ case, $q_{\textrm{3perm-R}}$ is not a minimal query, so additional atoms are necessary. We list the main examples of how this query can be made minimal and discuss the complexity of their resilience.

First we start with a query we have already seen and another one that is a slight variation on the first (Fig. 3(b)).

[TABLE]

We proved in Proposition 6 that $\texttt{RES}(q_{\textrm{3perm-R}}^{A})$ is in P by using network flow. A similar argument proves that $\texttt{RES}(q_{\textrm{3perm-R}}^{S_{wx}})$ is also in P.

Proposition 7.

$\texttt{RES}(q_{\textrm{3perm-R}}^{S_{wx}})$ is in P.

The next query we will see is $q_{\textrm{3perm-R}}^{S_{xy}}$ . Although very similar to $q_{\textrm{3perm-R}}^{A}$ and $q_{\textrm{3perm-R}}^{S_{wx}}$ , $\texttt{RES}(q_{\textrm{3perm-R}}^{S_{xy}})$ is hard. It is surprising that such a small difference can already change the complexity of the resilience problem. Moreover, the proof requires a new reduction instead of a reduction similar to the one used in Proposition 6.

[TABLE]

Proposition 8.

$\texttt{RES}(q_{\textrm{3perm-R}}^{S_{xy}})$ is NP-complete.

The following are further examples of hard queries; these are somewhat related to $q_{\textrm{perm}}^{AB}$ .

[TABLE]

Proposition 9.

$\texttt{RES}(q_{\textrm{3perm-R}}^{AC})$ , $\texttt{RES}(q_{\textrm{3perm-R}}^{AB})$ and $\texttt{RES}(q_{\textrm{3perm-R}}^{S_{xy}BC})$ are NP-complete.

Open Problems. Despite the similarities with the queries presented in this section, we were not able to determine the complexity of the following queries:

[TABLE]

8.5. Queries with REP

If all three occurrences of $R$ have repeated variables, then we are in the path case.

[TABLE]

Proposition 10.

$\texttt{RES}(z_{4})$ and $\texttt{RES}(z_{5})$ are NP-complete.

Open problems. We don’t know the complexity of other queries that fall in this category of having three $R$ -atoms with REP but the following open ones are intriguing.

[TABLE]

Query $z_{6}$ has a similar structure to $q_{\textup{{chain}}}$ but a similar reduction doesn’t seem to work. Similarly, a reduction from $\texttt{RES}(q_{\textrm{perm}}^{AB})$ doesn’t work for $z_{7}$ .

9. Independent Join Paths: a unifying hardness criterion

Motivation. We now define a particular “template” for hardness reductions which we call Independent Join Paths or IJPs. The idea is that if we can construct a particular database that fulfills 5 criteria for a query $q$ , then we can conclude safely that $\texttt{RES}(q)$ is NP-complete.

This recent development is exciting for several reasons: 1) In our earlier attempts to prove hardness for queries, we amassed a plethora of different hardness proofs, with little immediate intuition of how one hardness proof immediately helps facilitate the hardness proof of another query. Now we expect that the task can be simplified to the task of searching for any particular database that serves as “proof” of hardness based on a generalized reduction from Vertex Cover. 2) We were able to look at our existing hardness proofs and post-hoc identify some part in some gadget that formed an IJP. In other words, IJPs were already present in our hardness proofs (we give examples in Appendix C). Thus IJPs are really a unifying common denominator for all hard queries known so far. 3) The search for hardness proofs could now, in theory, be automated. While we have not yet explored this idea, we give the intuition in Appendix C. 4) The hardness based on IJPs is not restricted to the particular fragment of CQs that we have analyzed in this paper; rather they are a universal criterion. Even the original criterion of triads for sj-free CQs can be subsumed under IJPs. 5) We conjecture that the inability to form IJPs for those queries that are in PTIME can be deduced from the structure of a query, and future work will discover the reason.

The intuition of IJPs. We have already seen that paths between two subgoals $g_{1}$ and $g_{n}$ that refer to the same relation are a sufficient condition for hardness under “certain circumstances”. Recall our simplest example for a path implying hardness: $q_{\textup{{vc}}}{\,:\!\!-\,}R(x),S(x,y),R(y)$ . The intuition of our construction is now as follows: Take any instance of the minimum vertex cover (VC) problem for a graph $G(V,E)$ (see Fig. 8(a)). Replace every arc with a path of 3 arcs to create $G^{\prime}$ (see Fig. 8(b)). Then $G$ has a VC of size $k$ iff $G^{\prime}$ has a VC of size $k+|E|$ . Similarly, if we replace each arc with 5 instead of 3 arcs, then the corresponding size for $G^{\prime}$ is $k+2|E|$ . The key property we need for this to work is the fact that 3 arcs form a particular path with the following “OR-property” (see Fig. 8(c)): As long as at least one end point of the path is removed, a minimum VC needs exactly one additional node per path.
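The arc-replacement argument can be checked by brute force on a small graph. The sketch below is only an illustration: the function names and the triangle example are assumptions, and the exhaustive vertex-cover search is exponential.

```python
from itertools import combinations

def min_vertex_cover(edges):
    """Size of a minimum vertex cover, by exhaustive search (tiny graphs only)."""
    nodes = list({v for edge in edges for v in edge})
    for k in range(len(nodes) + 1):
        for cover in combinations(nodes, k):
            chosen = set(cover)
            if all(u in chosen or v in chosen for u, v in edges):
                return k

def subdivide(edges, arcs=3):
    """Replace every arc (u, v) by a path of `arcs` arcs through fresh nodes."""
    new_edges = []
    for idx, (u, v) in enumerate(edges):
        chain = [u] + [("mid", idx, i) for i in range(arcs - 1)] + [v]
        new_edges += list(zip(chain, chain[1:]))
    return new_edges

triangle = [(1, 2), (2, 3), (1, 3)]
k = min_vertex_cover(triangle)

print(k)                                         # 2
print(min_vertex_cover(subdivide(triangle, 3)))  # 5 = k + |E|
print(min_vertex_cover(subdivide(triangle, 5)))  # 8 = k + 2|E|
```

For the triangle, the subdivided graphs are cycles with 9 and 15 nodes, whose minimum vertex covers have sizes 5 and 8, in agreement with $k+|E|$ and $k+2|E|$ .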

Formalization of IJPs. We next use this idea to define a particular canonical database instance which we call “Independent Join Path.” We conjecture that whenever a query has such a canonical database, then resilience is hard by a proof that generalizes the idea from above. We give the formal definition here and provide intuition for each of the conditions in Appendix C. In the following, we write $\bm{\mathbf{x}}_{\bm{\mathbf{j}}}$ to denote the subvector of $\bm{\mathbf{x}}$ that retains only the entries indexed by $\bm{\mathbf{j}}$ . For example, if $\bm{\mathbf{x}}=(1,2,3,4,5)$ and $\bm{\mathbf{j}}=(2,4,5)$ , then $\bm{\mathbf{x}}_{\bm{\mathbf{j}}}=(2,4,5)$ .

Definition 1 (Independent Join Path).

A database $D$ forms an Independent Join Path for query $q$ if the following conditions hold:

(1) There is a relation $R$ containing at least two tuples $R(\bm{\mathbf{a}})$ and $R(\bm{\mathbf{b}})$ with $\bm{\mathbf{a}}\not\subseteq\bm{\mathbf{b}}$ and $\bm{\mathbf{b}}\not\subseteq\bm{\mathbf{a}}$ .

(2) In $D$ , $R(\bm{\mathbf{a}})$ and $R(\bm{\mathbf{b}})$ each participate in exactly one witness $\bm{\mathbf{w}}_{a},\bm{\mathbf{w}}_{b}$ of $D\models q$ . Both $\bm{\mathbf{w}}_{a}$ and $\bm{\mathbf{w}}_{b}$ have exactly $m$ tuples, where $m$ is the number of atoms in $q$ .

(3) There is no endogenous relation $S$ containing a tuple $S(\bm{\mathbf{c}})$ with $\bm{\mathbf{c}}\subset\bm{\mathbf{a}}$ or $\bm{\mathbf{c}}\subset\bm{\mathbf{b}}$ .

(4) If there is an exogenous relation $T^{\mathrm{x}}$ containing a tuple $T^{\mathrm{x}}(\bm{\mathbf{d}})$ with $\bm{\mathbf{d}}=\bm{\mathbf{a}}_{\bm{\mathbf{j}}}$ for some $\bm{\mathbf{j}}$ , then $T^{\mathrm{x}}$ also contains $T^{\mathrm{x}}(\bm{\mathbf{e}})$ with $\bm{\mathbf{e}}=\bm{\mathbf{b}}_{\bm{\mathbf{j}}}$ .

(5) Let $c$ be the resilience of $q$ on $D$ : $\rho(q,D)=c$ . Then the resilience is $c-1$ in all 3 cases of removing either $R(\bm{\mathbf{a}})$ , or $R(\bm{\mathbf{b}})$ , or both.
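Conditions (2) and (5) lend themselves to a direct brute-force check on a candidate database. The sketch below does this for $q_{\textup{{vc}}}{\,:\!\!-\,}R(x),S(x,y),R(y)$ on a path of three arcs; conditions (1), (3) and (4) are not checked, all tuples are treated as endogenous, and the function names are assumptions of this illustration.

```python
from itertools import combinations

def witnesses(query, db):
    """All witnesses, each as a frozenset of (relation, tuple) facts."""
    found = set()

    def extend(i, binding, used):
        if i == len(query):
            found.add(frozenset(used))
            return
        rel, variables = query[i]
        for tup in db.get(rel, ()):
            new = dict(binding)
            if all(new.setdefault(v, c) == c for v, c in zip(variables, tup)):
                extend(i + 1, new, used + [(rel, tup)])

    extend(0, {}, [])
    return found

def resilience(query, db):
    """Minimum contingency set size by brute force (toy inputs only)."""
    ws = witnesses(query, db)
    facts = sorted({fact for w in ws for fact in w})
    for k in range(len(facts) + 1):
        for gamma in combinations(facts, k):
            if all(w & set(gamma) for w in ws):
                return k

def check_conditions_2_and_5(query, db, rel, a, b):
    """Check IJP conditions (2) and (5) for the tuples rel(a) and rel(b)."""
    ws = witnesses(query, db)
    for t in (a, b):
        containing = [w for w in ws if (rel, t) in w]
        # exactly one witness, and it uses one distinct tuple per atom
        if len(containing) != 1 or len(containing[0]) != len(query):
            return False
    c = resilience(query, db)

    def without(*tuples):
        reduced = {r: set(ts) for r, ts in db.items()}
        reduced[rel] -= set(tuples)
        return reduced

    return all(
        resilience(query, without(*removed)) == c - 1
        for removed in ((a,), (b,), (a, b))
    )

q_vc = [("R", ("x",)), ("S", ("x", "y")), ("R", ("y",))]
three_arcs = {"R": {(1,), (2,), (3,), (4,)}, "S": {(1, 2), (2, 3), (3, 4)}}
two_arcs = {"R": {(1,), (2,), (3,)}, "S": {(1, 2), (2, 3)}}

print(check_conditions_2_and_5(q_vc, three_arcs, "R", (1,), (4,)))  # True
print(check_conditions_2_and_5(q_vc, two_arcs, "R", (1,), (3,)))    # False
```

The path with two arcs fails condition (5): removing one end point leaves a witness that still costs one deletion, so the resilience does not drop.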

Conjecture 2 (IJPs imply hardness).

If there is a database $D$ that forms an IJP for a query $q$ , then $\texttt{RES}(q)$ is NP-complete.

The conjecture. For the fragment of CQs we are considering in this paper, we have been able to simplify some hardness proofs, which at times use very different constructions (reductions from VC, 3-SAT, Max 2-SAT), by looking at our existing hardness proofs and identifying IJPs in our existing gadgets.

We conjecture that the existence of IJPs for a query is also a necessary condition for hardness, that there is an algorithm to verify whether a query can form IJPs or not, and that the fact that a query cannot form IJPs (such as linear SJ-free CQs) translates immediately into a PTIME algorithm for solving $\texttt{RES}(q)$ .

10. Related work

In prior work (Freire et al., 2015), we identified the concept of a triad, a novel structure that allowed us to fully characterize the complexity of resilience (and consequently of deletion propagation) for the class of self-join-free conjunctive queries with potential functional dependencies. Our work in this paper considers self-joins, which have long plagued the study of many problems in database theory; results for such queries have been few and far between.

Deletion propagation and view updates. The problem of resilience is a special case of deletion propagation, focusing on Boolean queries. Deletion propagation generally refers to non-Boolean queries. Given a non-Boolean query $q$ and database $D$ , the typical goal is to determine the minimum number of tuples that must be removed from $D$ , so that a tuple $\bm{\mathbf{t}}$ is no longer in the query result (Buneman et al., 2002; Dayal and Bernstein, 1982) (source side-effects). Variants of deletion propagation consider side-effects in the query result rather than the source (Kimelfeld et al., 2012; Kimelfeld, 2012), and multi-tuple deletions (Cong et al., 2012; Kimelfeld et al., 2013). Resilience and deletion propagation are special cases of the view update problem (Bancilhon and Spyratos, 1981; Cong et al., 2012; Cosmadakis and Papadimitriou, 1984; Dayal and Bernstein, 1982; Fagin et al., 1983; Gottlob et al., 1988; Keller, 1985), which consists of finding the set of operations that should be applied to the database in order to obtain a certain modification in the view.

Causality and explanations. Database causality is geared towards providing explanations for query results, but typically relies on the concept of responsibility (Meliou et al., 2010, 2011), which is harder than resilience. The idea of interventions appears in other explanation settings, but there it often applies to queries instead of the data (Roy and Suciu, 2014; Wu and Madden, 2013; Roy et al., 2015). Finally, the problem of explaining missing query results (Chapman and Jagadish, 2009; Herschel and Hernández, 2010; Huang et al., 2008; Herschel et al., 2009; Tran and Chan, 2010) is analogous to deletion propagation, but in this case we want to add tuples to the view rather than remove them.

Provenance and view updates. Data provenance studies formalisms that can characterize the relation between the input and the output of a given query (Buneman et al., 2001; Cheney et al., 2009; Cui et al., 2000; Green et al., 2007). “Why-provenance” is the provenance type most closely related to resilience. The motivation behind Why-provenance is to find the “witnesses” for the query answer, i.e., the tuples or group of tuples in the input that can produce the answer. Resilience, in contrast, searches for a minimum set of input tuples whose removal makes the query false.

11. Final Remarks

In this paper, we studied the problem of resilience for conjunctive queries with self-joins. We identified fundamental query structures that impact hardness, and proved a complete dichotomy for the restricted class of single-self-join binary CQs where exactly two atoms can correspond to the same relation.

We also present results towards a dichotomy for the case of binary CQs with a single self-join relation that appears in $3$ atoms, and identify some open problems and challenges towards completing the dichotomy for this class (Section 8).

Our work also presents a roadmap for tackling the analysis of more extended query families. Section 9 points towards a possible generalization of our results to all classes of self-join queries, by using a unifying criterion that we call Independent Join Paths.

Overall, our work in this paper contributes important progress in the theoretical analysis of self-joins, which has long been stalled for many related problems. We hope that our results, even though they apply to a restricted class, will provide the foundations to help solve the general case for CQs with self-joins in the future.

Appendix A Detailed proofs

A.1. Proofs for Section 3.1

Proof of Proposition 2.

A database with unary $R$ and binary $S$ is simply a directed graph. For a directed graph $G=(V,E)$ , we can create a database instance $D_{G}$ where for each node $v_{i}\in V$ , we add tuple $R(v_{i})$ in $D_{G}$ , and for each edge $(v_{i},v_{j})\in E$ , we add tuple $S(v_{i},v_{j})$ in $D_{G}$ . Furthermore, $D_{G}\models q_{\textup{{vc}}}$ iff graph $G$ has at least one edge. Note that any vertex cover $C$ of $G$ has a corresponding set of tuples $\Gamma_{C}$ in $D_{G}$ , and it is easy to see that $D_{G}-\Gamma_{C}\not\models q_{\textup{{vc}}}$ .

More precisely,

[TABLE]

Therefore, $\texttt{RES}(q_{\textup{{vc}}})$ is NP-complete.

∎
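The correspondence between vertex covers and contingency sets can be checked by brute force on small graphs. The sketch below is illustrative only: it assumes that $q_{\textup{{vc}}}$ has the form $R(x),S(x,y),R(y)$ (this section does not restate the query), and the helper names `build_db`, `satisfies_qvc`, `resilience` and `min_vertex_cover` are hypothetical.

```python
from itertools import combinations

def build_db(vertices, edges):
    """D_G: one R-tuple per node and one S-tuple per edge."""
    return {("R", (v,)) for v in vertices} | {("S", (u, v)) for (u, v) in edges}

def satisfies_qvc(db):
    """True iff some S(x, y) has both R(x) and R(y) present.
    Assumption: q_vc :- R(x), S(x, y), R(y)."""
    return any(
        rel == "S" and ("R", (t[0],)) in db and ("R", (t[1],)) in db
        for rel, t in db
    )

def resilience(db):
    """Smallest number of tuples whose deletion falsifies q_vc (brute force)."""
    tuples = list(db)
    for k in range(len(tuples) + 1):
        for gamma in combinations(tuples, k):
            if not satisfies_qvc(db - set(gamma)):
                return k

def min_vertex_cover(vertices, edges):
    """Size of a smallest vertex cover of G (brute force)."""
    for k in range(len(vertices) + 1):
        for cover in combinations(vertices, k):
            if all(u in cover or v in cover for (u, v) in edges):
                return k

# A directed triangle: both quantities coincide.
V, E = [1, 2, 3], [(1, 2), (2, 3), (3, 1)]
print(resilience(build_db(V, E)), min_vertex_cover(V, E))
```

Both searches are exponential and only meant for sanity checks on toy instances; deleting an $S$-tuple is allowed in `resilience`, but it never beats deleting an endpoint's $R$-tuple.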

Proof of Proposition 3.

We reduce 3SAT to $\texttt{RES}(q_{\textup{{chain}}})$ . Let $\psi$ be a 3CNF formula with $n$ variables $x,y,z,\ldots,v_{n}$ and $m$ clauses $C_{1},\ldots,C_{m}$ . We map any such $\psi$ to a pair $(D_{\psi},k_{\psi})$ where $D_{\psi}$ is a database satisfying $q_{\textup{{chain}}}$ , $k_{\psi}=(n+5)m$ and

[TABLE]

Figure 10 shows part of $D_{\psi}$ consisting of the gadgets for $x,y,z,C_{1}$ where in this example, $C_{1}=(x\lor\overline{y}\lor z)$ . The nodes correspond to tuples in $D_{\psi}$ and there is a directed edge between any two nodes that together form a witness of $q_{\textup{{chain}}}$ in $D_{\psi}$ . The variable gadgets are cycles of length $2m$ whose minimum contingency sets are the set of $m$ blue nodes, indicating the variable is assigned true, or the set of $m$ red nodes, indicating the variable is assigned false. The 9-node clause gadgets have minimum contingency sets of size 5 when the clause is assigned true, and 6 otherwise. ∎

A.2. Proofs for Section 3.3

Proof of Proposition 5.

We first argue that $R$ -tuples are not the optimal choice for a contingency set. Let $\Gamma$ be a minimum contingency set containing tuple $R(1,2)$ .

Case 1: $D$ contains only $A(1)$ or $C(1)$ but not both. WLOG, suppose it contains only $A(1)$ . We can then obtain a contingency set $\Gamma^{\prime}=(\Gamma-R(1,2))\cup A(1)$ of size $k$ . The argument is symmetric if it contains only $C(1)$ .

Case 2: $D$ contains both $A(1)$ and $C(1)$ . Consider $\Gamma^{\prime}=(\Gamma\cup A(1))-R(1,2)$ and $\Gamma^{\prime\prime}=(\Gamma\cup C(1))-R(1,2)$ , and suppose that neither of those is a contingency set. Then we have $A(i),R(i,2),R(1,2),C(1)$ in $D-\Gamma^{\prime}$ and $A(1),R(1,2),R(j,2),C(j)$ in $D-\Gamma^{\prime\prime}$ . However, the existence of those witnesses implies that $D-\Gamma$ has the witness $A(i),R(i,2),R(j,2),C(j)$ contradicting the fact that $\Gamma$ is a contingency set. Therefore, at least one of $\Gamma^{\prime},\Gamma^{\prime\prime}$ must be a contingency set and we can replace $R(1,2)$ by $A(1)$ or $C(1)$ .

Since $R$ can be made exogenous, solving resilience for this query is the same as solving vertex cover in a bipartite graph, and therefore is in P. ∎
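Once $R$ is treated as exogenous, the remaining problem is a minimum vertex cover in a bipartite graph, which by König's theorem has the same size as a maximum matching. The following is a minimal sketch of that last step, assuming the query has the shape $A(x),R(x,y),R(z,y),C(z)$ suggested by the witnesses used in the proof; `witness_edges` and `max_matching` are hypothetical helper names, and the matching routine is the standard augmenting-path (Kuhn) algorithm rather than anything prescribed by the paper.

```python
def witness_edges(A, C, R):
    """Pairs (i, j) such that A(i), R(i, y), R(j, y), C(j) is a witness for some y.
    Assumption: this is the query shape implied by the proof above."""
    return {(i, j) for (i, y) in R for (j, y2) in R
            if y == y2 and i in A and j in C}

def max_matching(edges):
    """Size of a maximum matching in the bipartite graph given by `edges`.
    Left nodes are A-values, right nodes are C-values."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, []).append(j)
    match = {}  # right node j -> left node i currently matched to it

    def augment(i, seen):
        # Try to match left node i, re-routing earlier matches if needed.
        for j in adj.get(i, []):
            if j in seen:
                continue
            seen.add(j)
            if j not in match or augment(match[j], seen):
                match[j] = i
                return True
        return False

    return sum(augment(i, set()) for i in adj)

# Three A-tuples that all meet the single C-tuple C(4) at y = 2:
A, C = {1, 3, 5}, {4}
R = {(1, 2), (3, 2), (5, 2), (4, 2)}
print(max_matching(witness_edges(A, C, R)))  # deleting C(4) alone suffices
```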

Proof of Proposition 6.

For a linear sj-free query, we can represent its resilience problem as a network flow making each endogenous tuple an edge of weight 1. Each flow is a witness and the min-cuts are exactly the minimum contingency sets (see (Meliou et al., 2010) for details). It is not clear what to do with repeated relations because there is no obvious way to add to a standard network flow algorithm an extra constraint that two or more edges represent the same tuple, and can thus be removed together at the reduced cost of only 1.

To handle $q_{\textrm{3perm-R}}^{A}$ , consider an input database $D$ with $A$ and $R$ tuples. We refer to $R$ -tuples that have an inverse as 2-way tuples, and the ones that don’t as 1-way tuples. We construct a flow graph by creating 1-weight edges $(a_{l},a_{r})$ for all tuples $A(a)$ , and 1-weight edges $(\langle ab\rangle_{l},\langle ab\rangle_{r})$ for pairs $\{a,b\}$ of 2-way tuples. There are $\infty$ -weight edges $(s,a_{l})$ for all tuples $A(a)$ , where $s$ is the source, $\infty$ -weight edges $(x_{r},\langle uv\rangle_{l})$ if and only if $x\in\{u,v\}$ or there is a 1-way tuple $R(x,u)$ or $R(x,v)$ , and $\infty$ -weight edges $(\langle ab\rangle_{r},t)$ for pairs $\{a,b\}$ of 2-way tuples, where $t$ is the target. Note that 1-way tuples are never the optimal choice, since we can always pick an $A$ -tuple instead, so they have infinite weight in the flow graph. Below we refer to the tuple that the edges represent, instead of the edge itself.

We show that from the min-cut, $M$ , of the flow graph, we can construct a minimum contingency set, $\Gamma$ , as follows: $\Gamma$ contains all the $A(a)$ ’s from $M$ . For each edge $\{a,b\}\in M$ , we add one of $R(a,b)$ or $R(b,a)$ to $\Gamma$ as follows: If $A(a)\in(D-M)$ but $A(b)\not\in(D-M)$ then we add $R(a,b)$ to $\Gamma$ . Symmetrically, if $A(b)\in(D-M)$ but $A(a)\not\in(D-M)$ then we add $R(b,a)$ to $\Gamma$ ; otherwise, arbitrarily add one or the other.

We claim that the resulting $\Gamma$ is a minimum contingency set. Because it comes from a min-cut, it suffices to show that $\Gamma$ is a contingency set, i.e., $D-\Gamma\not\models q_{\textrm{3perm-R}}^{A}$ . Suppose, for the sake of contradiction, that $D-\Gamma$ has a witness $A(a)$ , $R(a,b)$ , $R(b,a)$ , $R(a,b)$ , i.e., some tuple, $R(a,b)$ , occurs twice in the join. This is impossible because, since $A(a)\not\in M$ , at least one of $R(a,b)$ or $R(b,a)$ must be in $\Gamma$ .

The other possible witness is $A(c),R(c,a),R(a,b),R(b,a)$ . Note that if $R(c,a)$ is a 1-way tuple, then this witness would be a flow contradicting the fact that $M$ is a cut. Thus, $R(c,a)$ is a 2-way tuple. Since $A(c),\{c,a\}$ can’t be a flow, the pair $\{c,a\}$ must be in $M$ .

Since $R(c,a)$ was not chosen in $\Gamma$ , it must be that $A(a)\in(D-M)$ . This means that there is still a flow from $A(a)$ to $\{a,b\}$ , so $M$ was not a cut. ∎

A.3. Proofs for Section 4.2

Proof of Lemma 1.

First observe that disconnected components join as a cross-product, so for a query to be made false it is enough that at least one of its query components is made false. Hence, for each query component $q_{i}$ , if $D-\Gamma_{i}\not\models q_{i}$ , then $D-\Gamma_{i}\not\models q$ , which then implies $\rho(q,D)=\min_{i}\rho(q_{i},D)$ . ∎
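The identity $\rho(q,D)=\min_{i}\rho(q_{i},D)$ can be illustrated with a small brute-force evaluator. This is only a sketch: the query encoding (a list of `(relation, variables)` atoms), the toy database, and the function names `holds` and `resilience` are assumptions made for the example, and every tuple is treated as endogenous.

```python
from itertools import combinations

def holds(query, db, binding=None):
    """Backtracking check that db satisfies the Boolean CQ `query`.
    query: list of (relation, variables); db: set of (relation, values)."""
    binding = binding or {}
    if not query:
        return True
    (rel, variables), rest = query[0], query[1:]
    for r, values in db:
        if r != rel or len(values) != len(variables):
            continue
        new = dict(binding)
        # Extend the binding; fail if a variable is already bound differently.
        if all(new.setdefault(v, a) == a for v, a in zip(variables, values)):
            if holds(rest, db, new):
                return True
    return False

def resilience(query, db):
    """Minimum number of tuples to delete so that the query becomes false."""
    tuples = list(db)
    for k in range(len(tuples) + 1):
        for gamma in combinations(tuples, k):
            if not holds(query, db - set(gamma)):
                return k

# Two disconnected components q1 and q2 over disjoint relations.
q1 = [("A", ("x",)), ("R", ("x", "y"))]
q2 = [("S", ("u", "v")), ("B", ("v",))]
db = {("A", (1,)), ("A", (2,)), ("R", (1, 5)), ("R", (2, 6)),
      ("S", (7, 8)), ("S", (9, 8)), ("B", (8,))}
print(resilience(q1 + q2, db), min(resilience(q1, db), resilience(q2, db)))
```

In this instance $q_{1}$ needs two deletions while $q_{2}$ needs only one (the tuple $B(8)$), so the cross-product query is falsified with a single deletion.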

Proof of Lemma 2.

This is easy to see because the resilience problem for $q$ consists of the union of the $k$ independent resilience problems for its components. If $\texttt{RES}(q_{i})$ is NP-complete, then we can take a database that has the relevant instance of $q_{i}$ and all the other components can be extremely resilient, so the minimum contingency set is always a subset of $q_{i}$ ’s component. Conversely, if each $\texttt{RES}(q_{i})$ is in P, then to solve the minimum contingency set, we find the minimum contingency set of each component, and the global minimum is simply the minimum of these minima. ∎

A.4. Proofs for Section 4.3

Proof of Proposition 5.

We show that tuples from dominated relations don’t need to be used in minimum contingency sets. Assume $q$ is a connected query and let $\Gamma$ be a minimum contingency set of $q$ in $D$ .

Suppose that relation $A$ dominates relation $B$ and there is some tuple $B(\mathbf{t})$ that is in $\Gamma$ . Tuple $B(\mathbf{t})$ can participate in joins as one or more of the $B$ -atoms in $q$ . Let’s call those atoms $B_{i}$ , for $i\in[k]$ . Our definition of domination guarantees that there exists an atom $A_{j}$ for each atom $B_{i}$ such that the projection of $\mathbf{t}$ onto $\textup{{var}}(A_{j})$ always produces the same tuple $\mathbf{p}$ . Then we can replace $B(\mathbf{t})$ by $A(\mathbf{p})$ and we remove at least as many witnesses if $D\models q$ .

As a result we show the complexity of $\texttt{RES}(q)$ is the same if $B$ is made exogenous and therefore $\texttt{RES}(q)\equiv\texttt{RES}(q^{\prime})$ . ∎

A.5. Proofs for Section 5

Proof of Lemma 3.

Let $D\models q$ be a database. We map $D$ to $D^{\prime}$ by marking all the tuples according to which variables they refer to in witnesses of $q$ . For each witness $j$ assigning the variables of $q$ to domain values ( $\texttt{dom}(D)$ ), we add the tuples $T(j(v_{1})_{v_{1}},\ldots j(v_{k})_{v_{k}})$ to $D^{\prime}$ , where $T(\overline{v})$ occurs in $q^{\textrm{sj}}$ . In particular, if $S_{i}$ was replaced by $R_{i}$ to obtain $q^{\textrm{sj}}$ , $S_{i}(j(v_{1}),\ldots j(v_{k}))\in D$ results in adding the tuple $R_{i}(j(v_{1})_{v_{1}},\ldots j(v_{k})_{v_{k}})$ to $D^{\prime}$ .

For example, consider that atom $S(x,y,z)$ was replaced by atom $R(x,y,z)$ . If $S(a,b,c)$ is part of a witness $j$ , we have $j(x)=a,j(y)=b,j(z)=c$ . Then $R(j(x)_{x},j(y)_{y},j(z)_{z})=R(a_{x},b_{y},c_{z})$ is included in $D^{\prime}$ .
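The marking step can be sketched in a few lines. The encoding below is an assumption made for illustration: a subscripted value $a_{x}$ is represented as the pair `(a, "x")`, a witness is a dictionary from variables to domain values, and `mark_witness` is a hypothetical helper name.

```python
def mark_witness(sj_atoms, witness):
    """Tuples added to D' for one witness.
    sj_atoms: atoms of q^sj as (relation, variables);
    witness: dict mapping each variable of q to its domain value."""
    return {(rel, tuple((witness[v], v) for v in variables))
            for rel, variables in sj_atoms}

# The example from the text: S(x, y, z) was replaced by R(x, y, z).
marked = mark_witness([("R", ("x", "y", "z"))], {"x": "a", "y": "b", "z": "c"})
print(marked)
```

Because every value carries the name of the variable it was assigned to, two atoms over the same relation produce distinct tuples even when the witness maps their variables to the same domain value, which is why the new self-joins have no effect.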

Since the variables mark the tuples in $D^{\prime}$ , the new self-joins have no effect: if the subscripted variables are $\overline{v}$ in a tuple of $R_{i}$ in $D^{\prime}$ , then it came from a tuple of $S_{i}$ in $D$ . It then follows that there is a 1:1 correspondence of contingency sets for $(D,q)$ and $(D^{\prime},q^{\textrm{sj}})$ . We need the minimality of $q^{\textrm{sj}}$ , because if there were an assignment where $D-\Gamma^{\prime}\models q^{\textrm{sj}}$ when $D-\Gamma\not\models q$ , this would correspond to a reassignment of the variables, $\textup{{var}}(q^{\textrm{sj}})$ to a proper subset, so that some $R_{i}$ would be doing “double duty”. This would mean that a proper subset of $q^{\textrm{sj}}$ implies $q^{\textrm{sj}}$ , i.e., $q^{\textrm{sj}}$ is not minimal.

∎

A.6. Proofs for Section 5.1

Proof of Proposition 5.

The proofs essentially follow the same strategy used to reduce 3SAT to $\texttt{RES}(q_{\triangle})$ with a few adjustments to handle the self-joining relation and also the variable order, which is relevant in some cases. See Lemma 1 and Lemma 2 for the details. ∎

Lemma 1.

$\texttt{RES}(q_{\textup{{rats}}}^{\textrm{sj}_{1}})$ and $\texttt{RES}(q_{\textup{{rats}}}^{\textrm{sj}_{2}})$ are NP-complete.

Proof of Lemma 1.

We first show that $\texttt{RES}(q_{\textup{{rats}}}^{\textrm{sj}_{1}})$ is NP-complete by a reduction from 3SAT, similar to the one used to prove $\texttt{RES}(q_{\triangle})$ is NP-complete (Proposition 1).

Let $\psi$ be a 3CNF formula with $n$ variables $v_{1},\ldots,v_{n}$ and $m$ clauses $C_{0},\ldots,C_{m-1}$ . Our reduction will map any such $\psi$ to a pair $(D^{1}_{\psi},k_{\psi})$ where $D^{1}_{\psi}$ is a database satisfying $q_{\textup{{rats}}}^{\textrm{sj}_{1}}$ , and

[TABLE]

In our construction, if $\psi\in 3\mbox{{\rm\sc SAT}}$ , then the size of each minimum contingency set for $q_{\textup{{rats}}}^{\textrm{sj}_{1}}$ in $D^{1}_{\psi}$ will be $k_{\psi}=6mn$ , whereas if $\psi\not\in 3\mbox{{\rm\sc SAT}}$ , then the size of all contingency sets for $q_{\textup{{rats}}}^{\textrm{sj}_{1}}$ in $D^{1}_{\psi}$ will be greater than $k_{\psi}$ .

We construct $D^{1}_{\psi}$ by taking $D_{\psi}$ from the proof of Proposition 1, and adding the following tuples for each witness $\langle a,b,c\rangle$ in $D_{\psi},q_{\triangle}$ :

[TABLE]

Notice that for each witness $\langle a,b,c\rangle$ in $D_{\psi}$ we thus create 3 witnesses, $\langle a,b,c\rangle$ , $\langle b,c,a\rangle$ , $\langle c,a,b\rangle$ in $D_{\psi}^{1}$ but they all use the same $R$ -tuples.

We know from Proposition 1 that some $R$ -tuples participate in 2 witnesses (triangles) and some only in 1 within a variable gadget. Thus, in $D_{\psi}^{1}$ these numbers are 6 witnesses or 3 witnesses. Observe that $A$ -tuples participate in at most 2 witnesses each, so it is never better to choose an $A$ -tuple instead of an $R$ -tuple. Therefore it follows that the same choice of tuples for the minimum contingency set for $D_{\psi},q_{\triangle}$ will also work for $D^{1}_{\psi},q_{\textup{{rats}}}^{\textrm{sj}_{1}}$ by choosing the corresponding $R$ -tuples in $D^{1}_{\psi}$ based on the $R,S,T$ -tuples chosen from $D_{\psi}$ .

For $q_{\textup{{rats}}}^{\textrm{sj}_{2}}$ the reduction is similar, but the final atom – $R(x,z)$ instead of $R(z,x)$ – must be handled. The solution is that for each witness $\langle a,b,c\rangle$ in $D_{\psi},q_{\triangle}$ , we add the following tuples to $D^{2}_{\psi}$ :

[TABLE]

Now, each witness from $D_{\psi},q_{\triangle}$ leads to 6 witnesses in $D^{2}_{\psi},q_{\textup{{rats}}}^{\textrm{sj}_{2}}$ – the three from the above proof plus their reversals. Thus, the $R$ -tuples for solid edges from Figure 16 are used in 6 witnesses each, whereas $A$ -tuples are in at most 4 witnesses each. Thus, based on the minimum contingency sets for $D_{\psi},q_{\triangle}$ , we create minimum contingency sets for $D^{2}_{\psi},q_{\textup{{rats}}}^{\textrm{sj}_{2}}$ by including the corresponding $R$ -tuples and their reversals. ∎

Lemma 2.

$\texttt{RES}(q_{\textrm{brats}}^{\textrm{sj}_{1}})$ , $\texttt{RES}(q_{\textrm{brats}}^{\textrm{sj}_{2}})$ and $\texttt{RES}(q_{\textrm{brats}}^{\textrm{sj}_{3}})$ are NP-complete.

Proof of Lemma 2.

The same idea used above to prove that $\texttt{RES}(q_{\textup{{rats}}}^{\textrm{sj}_{1}})$ is hard, works for query $q_{\textrm{brats}}^{\textrm{sj}_{1}}$ . When defining $D^{1}_{\psi}$ for this case, we just need to add the appropriate $B$ -tuples:

[TABLE]

Since $B$ -tuples have the same properties as the $A$ -tuples, they are never better choices than $R$ -tuples and we can obtain a minimum contingency set with only $R$ -tuples, as we saw in Lemma 1 above. Similar reductions thus follow for $\texttt{RES}(q_{\textrm{brats}}^{\textrm{sj}_{2}})$ and $\texttt{RES}(q_{\textrm{brats}}^{\textrm{sj}_{3}})$ . ∎

A.7. Proofs for Section 5.2

Proof of Theorem 6.

This mostly follows from the fact that triads make sj-free queries hard and adding self-joins to a hard query keeps it hard (Lemma 6, Lemma 3).

The case we haven’t covered yet is where the triad in $q$ involves self-join relations which would be dominated and thus exogenous in the corresponding sj-free query. Examples are self-join variations of $q_{\textup{{rats}}}$ and $q_{\textrm{brats}}$ which are hard even though – because of domination – their sj-free cases are easy (Proposition 5).

We now follow and extend the proof of Lemma 6 when $q$ has a triad, ${\mathcal{T}}=(S_{0},S_{1},S_{2})$ , even though if ${\mathcal{T}}$ did not include a self join, one or more of its members would be dominated. In Case 1, $\textup{{var}}(S_{i})$ , $i=0,1,2$ , are pairwise disjoint. Here the reduction from $\texttt{RES}(q_{\triangle})$ to $\texttt{RES}(q)$ goes through exactly as in the proof of Lemma 6. We can choose a single relevant variable for each $S_{i}$ , so no domination is possible. Any minimum contingency set consists of elements of $S_{0}(\langle ab\rangle)$ , $S_{1}(\langle bc\rangle)$ or $S_{2}(\langle ca\rangle)$ , and the reduction from $\texttt{RES}(q_{\triangle})$ goes through.

In Case 2, where $\textup{{var}}(S_{i})$ are not pairwise disjoint, we have to consider a partition of the variables into 7 pieces (Eqn. 6 from the proof of the hard part of the dichotomy). As argued there, there is still a 1:1 correspondence between witnesses of $(D,q_{\triangle})$ and witnesses of $(D^{\prime},q)$ .

If there are no (endogenous) relations containing just the $a$ , $b$ or $c$ variables, then the reduction from $\texttt{RES}(q_{\triangle})$ goes through. If there is a relation containing just $a$ , then we instead use the same reduction but from the appropriate self-join variation of $q_{\textup{{rats}}}$ . If there are relations containing just $a$ and $b$ but not $c$ , then we get a reduction from the appropriate self-join variation of $q_{\textrm{brats}}$ . If there are relations for $a$ , $b$ and $c$ , then these form an sj-free triad and thus we already know that $\texttt{RES}(q)$ is hard. ∎

A.8. Proofs for Section 5.3

Proof of Theorem 7.

We are given $q$ , a CQ with no triad. Let $n$ be the number of groups of endogenous atoms in $q$ , where we put two atoms in the same group iff they contain exactly the same variables, so $A(x,y)$ and $R(y,x)$ belong in the same group, but $B(x)$ and $R(x,z)$ do not. We refer to the groups of endogenous atoms as $G_{1},G_{2},\ldots,G_{n}$ .

Since $q$ is connected but has no triad, for any pair $G_{i},G_{j}$ , either these atoms are connected directly in $\mathcal{H}(q)$ , or they are connected via at least one other group $G_{k}$ , but both cases cannot occur. If they are connected directly, then they must appear consecutively in an order of the endogenous atoms. Otherwise, $G_{k}$ must be placed between them. Note that in the latter case, removing the variables of $G_{k}$ separates the atoms of $q$ into two connected components, one containing $G_{i}$ and the other containing $G_{j}$ , so we call $G_{k}$ the separator of $G_{i},G_{j}$ .

Now, for any set $A,B,C$ of endogenous atoms from different groups, when $A$ and $B$ are already placed along the line, say with $B$ to the right of $A$ , then it is easy to see where $C$ must go. If $A$ is the separator, $C$ goes to the left of $A$ , if $B$ is the separator, $C$ goes to the right of $B$ and if $C$ is the separator, then it goes between $A$ and $B$ , and that’s what guarantees the endogenous atoms are linearly connected. Looking at Figure 9, we see that the endogenous atoms of $q$ are arranged linearly.

∎

A.9. Proofs for Section 6

Proof of Theorem 1 (Unary Path).

We define a reduction from $\texttt{RES}(q_{\textup{{vc}}})$ . Given a database $D$ we want to define a database $D^{\prime}$ such that

[TABLE]

We can assume that $A(x)$ and $A(y)$ are consecutive occurrences of $A$ so let $p$ be a subquery of $q$ consisting of a path from $A(x)$ to $A(y)$ with no intervening occurrences of $A$ . Thus, $q=q_{\ell}A(x)pA(y)q_{r}$ . Since $A$ is the only sj relation, the relations that occur in $p$ occur only in $p$ .

For each atom $R_{i}(v_{1},v_{2})$ occurring in $p$ , we define

[TABLE]

where

[TABLE]

In other words, $x$ maps to $a$ , $y$ maps to $b$ , and any other variable $v$ maps to $\langle ab\rangle_{v}$ . Thus, we have made a faithful copy of $D$ capturing $q_{\textup{{vc}}}$ . For the other atoms, $S_{j}(v_{1},v_{2})$ , not in $p$ , let

[TABLE]

where $m(v,a,b)$ matches with $t(v,a,b)$ as well as with a set of $n$ new values, where $n=|\texttt{dom}(D)|$ . It follows that there is always a minimum contingency set for $(D^{\prime},q)$ with only $A$ -tuples, in particular, the sets $\bigl\{A(a)\,\bigm|\,V(a)\in\Gamma\bigr\}$ for $\Gamma$ any minimum contingency set for $(D,q_{\textup{{vc}}})$ . ∎

Proof of Theorem 2 (Binary Path).

Similar to the unary case, we define a reduction from $\texttt{RES}(q_{\textup{{vc}}})$ . Given a database $D$ we want to define a database $D^{\prime}$ such that

[TABLE]

Consider $q=q_{\ell}R(x,y)pR(z,w)q_{r}$ , and that $p$ is a subquery of $q$ consisting of a path from $R(x,y)$ to $R(z,w)$ with no intervening occurrences of $R$ . By assumption, there is no path of just $R$ ’s from $R(x,y)$ to $R(z,w)$ , so we may assume that $R(x,y)$ and $R(z,w)$ have such an $R$ -free path, $p$ , between them.

In order to define the reduction, we define an equivalence relation, $\equiv$ , on the variables occurring in $q$ , namely $u\equiv v$ iff $q$ has an $R$ -path from $u$ to $v$ , i.e., there is a path of $R$ -atoms occurring in $q$ that takes us from $u$ to $v$ . (For example, for the query $R(x,y),S(u,z),R(z,w),Q(w,x),R(x,v)$ , the equivalence classes of $\equiv$ are $\{x,y,v\},\{z,w\},\{u\}$ .) Note that by assumption, for the equivalence relation defined by $q$ , $x\not\equiv z$ .
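The equivalence classes of $\equiv$ are the connected components of the graph whose edges are the $R$-atoms, so they can be computed with a union-find structure. The sketch below is illustrative; the query encoding and the name `r_path_classes` are assumptions, not part of the paper.

```python
def r_path_classes(query, sj_relation="R"):
    """Equivalence classes of variables connected by paths of `sj_relation` atoms.
    query: list of (relation, variables)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for rel, variables in query:
        roots = [find(v) for v in variables]  # also registers every variable
        if rel == sj_relation:
            for r in roots[1:]:
                parent[find(r)] = find(roots[0])

    classes = {}
    for v in parent:
        classes.setdefault(find(v), set()).add(v)
    return {frozenset(c) for c in classes.values()}

# The example query from the text.
q = [("R", ("x", "y")), ("S", ("u", "z")), ("R", ("z", "w")),
     ("Q", ("w", "x")), ("R", ("x", "v"))]
print(r_path_classes(q))
```

Only atoms of the self-join relation trigger a union; the atoms $S(u,z)$ and $Q(w,x)$ merely register their variables, which is why $u$ ends up in a class of its own.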

For any atom $S_{i}(v_{1},v_{2})$ occurring in $R(x,y)pR(z,w)$ , we define

[TABLE]

where

[TABLE]

Additionally, for atoms $T_{j}(v_{1},v_{2})$ occurring in $q_{\ell},q_{r}$ , let

[TABLE]

where $m(v,a,b)$ matches with $t^{\prime}(v,a,b)$ as well as with a set of $n$ new values, where $n=|\texttt{dom}(D)|$ .

We have that all $R$ -tuples in $D^{\prime}$ will have the same value as first and second attributes, so $R$ can be seen as corresponding to relation $A$ in $D$ . Similar to the unary case, we have made a copy of $D$ capturing $q_{\textup{{vc}}}$ and there is always a minimum contingency set for $(D^{\prime},q)$ with only $R$ -tuples, in particular, the sets $\bigl\{R(a,a)\,\bigm|\,V(a)\in\Gamma\bigr\}$ for $\Gamma$ any minimum contingency set for $(D,q_{\textup{{vc}}})$ . ∎

A.10. Proofs for Section 7.1

These are the expansions of $q_{\textup{{chain}}}$ with unary relations:

[TABLE]

We next show all of them are hard queries.

Lemma 3.

$\texttt{RES}(q_{\textup{{chain}}}^{\textup{{b}}})$ is NP-complete.

Proof of Lemma 3.

For this case we are going to use almost the same reduction as the one used for $\texttt{RES}(q_{\textup{{chain}}})$ , just with the added $B$ -tuples. Then we argue that there is always a min $\Gamma$ that only uses $R$ -tuples.

Let $\psi$ be a 3CNF formula with $n$ variables $v_{1},\ldots,v_{n}$ and $m$ clauses $C_{1},\ldots,C_{m}$ . Our reduction will map any such $\psi$ to a pair $(D_{\psi},k_{\psi})$ where $D_{\psi}$ is a database satisfying $q_{\textup{{chain}}}^{\textup{{b}}}$ , and

[TABLE]

In our construction, if $\psi\in 3\mbox{{\rm\sc SAT}}$ , then the size of each minimum contingency set for $q_{\textup{{chain}}}^{\textup{{b}}}$ in $D_{\psi}$ will be $k_{\psi}=(n+5)m$ , whereas if $\psi\not\in 3\mbox{{\rm\sc SAT}}$ , then the size of all contingency sets for $q_{\textup{{chain}}}^{\textup{{b}}}$ in $D_{\psi}$ will be greater than $k_{\psi}$ .

First, include in $D_{\psi}$ all the same $R$ -tuples included in the proof of Proposition 3. In addition to that add the following $B$ -tuples:

(1) Variable gadget: For each variable $v_{i}$ and each $j\in[m]$ insert the following two tuples into the database: $B(v_{i}^{j})$ and $B(\overline{v_{i}^{j}})$ .

(2) Clause gadget: For each clause $j\in[m]$ insert the following 6 tuples into the database: $B(a_{j})$ , $B(b_{j})$ , $B(c_{j})$ , $B(a_{j}^{\prime})$ , $B(b_{j}^{\prime})$ , $B(c_{j}^{\prime})$ .

By adding those tuples, we obtain the same structure and witnesses of the reduction for $\texttt{RES}(q_{\textup{{chain}}})$ . Now suppose that $t=B(d)$ is in a minimum contingency set $\Gamma$ . If $d=v_{i}^{j}$ (or $\overline{v_{i}^{j}}$ ) for some $i,j$ , we know that $t$ must join with $t^{\prime}=R(\overline{v_{i}^{j-1}},v_{i}^{j})$ (or $R(\overline{v_{i}^{j}},v_{i}^{j})$ ) by our construction. Thus, we can exchange $t$ for $t^{\prime}$ and obtain contingency set $\Gamma^{\prime}$ . Similarly, if $d\in\{a,b,c,a^{\prime},b^{\prime},c^{\prime}\}$ , then $t$ must join with tuple $R(d,*)$ , since there is only one tuple of that kind for each possible value of $d$ .

This shows that there is a minimum contingency set for $D_{\psi}$ without $B$ -tuples, and the properties of the reduction in Proposition 3 also hold in this case. ∎

Lemma 4.

$\texttt{RES}(q_{\textup{{chain}}}^{\textup{{a}}})$ , $\texttt{RES}(q_{\textup{{chain}}}^{\textup{{c}}})$ , $\texttt{RES}(q_{\textup{{chain}}}^{\textup{{ab}}})$ and $\texttt{RES}(q_{\textup{{chain}}}^{\textup{{bc}}})$ are NP-complete.

Proof of Lemma 4.

We again define a reduction from 3SAT, using gadgets similar to the one in Proposition 3. The variable gadget remains such that a minimum cover will choose either blue nodes (variable is set to true), or red nodes (variable is set to false). The clause gadget (black nodes) is chosen so as to enforce a clause: if one or more of the outermost black nodes are chosen, then the minimum cover is 5, otherwise 6.

We next reduce 3SAT to $\texttt{RES}(q_{\textup{{chain}}}^{\textup{{a}}})$ . Let $\psi$ be a 3CNF formula with $n$ variables $v_{1},\ldots,v_{n}$ and $m$ clauses $C_{1},\ldots,C_{m}$ . Our reduction will map any such $\psi$ to a pair $(D_{\psi},k_{\psi})$ where $D_{\psi}$ is a database satisfying $q_{\textup{{chain}}}^{\textup{{a}}}$ , and

[TABLE]

In our construction, if $\psi\in 3\mbox{{\rm\sc SAT}}$ , then the size of each minimum contingency set for $q_{\textup{{chain}}}^{\textup{{a}}}$ in $D_{\psi}$ will be $k_{\psi}=(n+5)m$ , whereas if $\psi\not\in 3\mbox{{\rm\sc SAT}}$ , then the size of all contingency sets for $q_{\textup{{chain}}}^{\textup{{a}}}$ in $D_{\psi}$ will be greater than $k_{\psi}$ .

(1) Variable gadget: For each variable $v_{i}$ and each $j\in[m]$ insert the following tuples into the database: $R(v_{i}^{j},\overline{v_{i}^{j}})$ , $R(\overline{v_{i}^{j}},v_{i}^{j+1})$ and $A(v_{i}^{j})$ , $A(\overline{v_{i}^{j}})$ . If $j+1>m$ , then make the superscript 1. The resulting witnesses between the tuples form a cycle of length $2m$ . The minimum contingency sets are to either choose all tuples $R(v_{i}^{j},\overline{v_{i}^{j}})$ representing a variable to have assignment true, or all tuples $R(\overline{v_{i}^{j}},v_{i}^{j+1})$ representing a variable to have assignment false. Note that any $A$ -tuple only joins once, therefore it is better to choose an $R$ -tuple, since all of these join at least twice.

(2) Clause gadget: For each clause $j\in[m]$ insert the following tuples into the database: $R(a_{j},b_{j})$ , $R(b_{j},c_{j})$ , $R(c_{j},a_{j})$ , $R(a_{j}^{\prime},a_{j})$ , $R(b_{j}^{\prime},b_{j})$ , $R(c_{j}^{\prime},c_{j})$ , $A(a_{j})$ , $A(b_{j})$ , $A(c_{j})$ , $A(a_{j}^{\prime})$ , $A(b_{j}^{\prime})$ , $A(c_{j}^{\prime})$ . The resulting witnesses form a triangle. If any of the $R(*^{\prime},*)$ tuples is removed, then the remaining witnesses can be destroyed by choosing only 2 more tuples; otherwise we need 3. Similar to the variable gadget, $A$ -tuples are not an optimal choice because they only participate in one witness each.

(3) Connecting the gadgets: For each variable $i$ that appears in clause $j$ at position 1, add the following tuples: $R(a_{j}^{\prime\prime},a_{j}^{\prime})$ and $A(a_{j}^{\prime\prime})$ . If $v_{i}$ appears as positive add tuple $R(a_{j}^{\prime\prime},v_{i}^{j})$ , if it appears as negative add tuple $R(a_{j}^{\prime\prime},\overline{v_{i}^{j}})$ . Analogously use $b_{j}^{\prime},b_{j}^{\prime\prime}$ or $c_{j}^{\prime},c_{j}^{\prime\prime}$ instead of $a_{j}^{\prime},a_{j}^{\prime\prime}$ for positions 2 and 3 instead of position 1.

Observe that if the clause is not satisfied, then we need to choose the $A$ -tuples (orange squares in Fig. 11), and not choose the outer black nodes ( $R$ -tuples) in the clause gadget, resulting in choosing 6 tuples in total in order to delete all the witnesses, otherwise we just need 5 tuples.
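The claim that a variable gadget needs exactly $m$ deletions can be checked by brute force for small $m$. The sketch below builds the $R$- and $A$-tuples of a single variable gadget as listed in item (1) and enumerates its witnesses. It assumes that $q_{\textup{{chain}}}^{\textup{{a}}}$ has the shape $A(x),R(x,y),R(y,z)$ (the query definitions are not restated in this section); the encoding `("v", j)` for $v^{j}$ and `("nv", j)` for $\overline{v^{j}}$ and all function names are hypothetical.

```python
from itertools import combinations

def variable_gadget(m):
    """R- and A-tuples of one variable gadget with m copies."""
    R, A = set(), set()
    for j in range(1, m + 1):
        nxt = 1 if j + 1 > m else j + 1  # the superscript wraps around to 1
        R |= {(("v", j), ("nv", j)), (("nv", j), ("v", nxt))}
        A |= {("v", j), ("nv", j)}
    return R, A

def witnesses(R, A):
    """Witnesses of A(x), R(x, y), R(y, z), each as a set of tuples.
    Assumption: this is the shape of q_chain^a."""
    return [{("A", x), ("R", (x, y)), ("R", (y2, z))}
            for (x, y) in R for (y2, z) in R if y == y2 and x in A]

def resilience(R, A):
    """Smallest set of tuples hitting every witness (brute force)."""
    ws = witnesses(R, A)
    tuples = list(set().union(*ws))
    for k in range(len(tuples) + 1):
        for gamma in combinations(tuples, k):
            if all(w & set(gamma) for w in ws):
                return k

for m in (1, 2, 3):
    R, A = variable_gadget(m)
    print(m, len(witnesses(R, A)), resilience(R, A))
```

For each $m$ the gadget has $2m$ witnesses and every tuple lies in at most two of them, so at least $m$ deletions are needed; the "true" choice of all tuples $R(v^{j},\overline{v^{j}})$ meets this bound.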

The reduction for $q_{\textup{{chain}}}^{\textup{{ab}}}$ is very similar to the one presented above. First, use the same $D_{\psi}$ just adding the appropriate $B$ -tuples, i.e., $B$ -tuples that preserve the witnesses.

Now note that for any $t=B(d)\in D_{\psi}$ , there is only one $R$ -tuple such that $t^{\prime}=R(d,*)$ , therefore $t$ must join with $t^{\prime}$ . Therefore, any occurrence of a $B$ -tuple in a contingency set can be exchanged for its corresponding $R$ -tuple, and we are guaranteed this reduction has the same properties as the one for $q_{\textup{{chain}}}^{\textup{{a}}}$ . ∎

Lemma 5.

$\texttt{RES}(q_{\textup{{chain}}}^{\textup{{ac}}})$ and $\texttt{RES}(q_{\textup{{chain}}}^{\textup{{abc}}})$ are NP-complete.

Proof of Lemma 5.

We define a reduction from 3SAT. As in the previous cases, the variable gadget remains such that a minimum cover will choose either blue nodes (variable is set to true), or red nodes (variable is set to false). The clause gadget (center black nodes) is chosen so as to enforce a clause: if one or more of the outermost joins (black edges) are deleted by choosing the corresponding $A$ -tuple (orange square), then the minimum cover for the black subgraph is 2, otherwise 3.

We next reduce 3SAT to $\texttt{RES}(q_{\textup{{chain}}}^{\textup{{ac}}})$ . Let $\psi$ be a 3CNF formula with $n$ variables $v_{1},\ldots,v_{n}$ and $m$ clauses $C_{1},\ldots,C_{m}$ . Our reduction will map any such $\psi$ to a pair $(D_{\psi},k_{\psi})$ where $D_{\psi}$ is a database satisfying $q_{\textup{{chain}}}^{\textup{{ac}}}$ , and

[TABLE]

In our construction, if $\psi\in 3\mbox{{\rm\sc SAT}}$ , then the size of each minimum contingency set for $q_{\textup{{chain}}}^{\textup{{ac}}}$ in $D_{\psi}$ will be $k_{\psi}=(n+5)m$ , whereas if $\psi\not\in 3\mbox{{\rm\sc SAT}}$ , then the size of all contingency sets for $q_{\textup{{chain}}}^{\textup{{ac}}}$ in $D_{\psi}$ will be greater than $k_{\psi}$ .

(1)

Variable gadget: For each variable $v_{i}$ and each $j\in[m]$ insert the following tuples into the database: $R(v_{i}^{j},\overline{v_{i}^{j}})$ , $R(\overline{v_{i}^{j}},v_{i}^{j+1})$ and $A(v_{i}^{j})$ , $A(\overline{v_{i}^{j}})$ and $C(v_{i}^{j})$ , $C(\overline{v_{i}^{j}})$ . If $j+1>m$ , then make the superscript 1. The resulting witnesses between the tuples form a cycle of length $2m$ . The minimum contingency sets are to either choose all tuples $R(v_{i}^{j},\overline{v_{i}^{j}})$ representing a variable to have assignment true, or all tuples $R(\overline{v_{i}^{j}},v_{i}^{j+1})$ representing a variable to have assignment false. If we only consider those tuples, note that $A$ - and $C$ -tuples participate in only one witness, so the optimal choice is to delete $R$ -tuples. 2. (2)

Clause gadget: For each clause $j\in[m]$ insert the following tuples into the database: $R(a_{j},b_{j})$ , $R(b_{j},c_{j})$ , $R(c_{j},a_{j})$ , $R(a_{j}^{\prime},a_{j})$ , $R(b_{j}^{\prime},b_{j})$ , $R(c_{j}^{\prime},c_{j})$ , $A(a_{j})$ , $A(b_{j})$ , $A(c_{j})$ , $A(a_{j}^{\prime})$ , $A(b_{j}^{\prime})$ , $A(c_{j}^{\prime})$ , $C(a_{j})$ , $C(b_{j})$ , $C(c_{j})$ . The resulting witnesses form a triangle. If any one of the $A(*^{\prime})$ tuples is removed, then the remaining witnesses can be destroyed by choosing only 2 tuples; otherwise we need 3. We later argue that these tuples only need to be $R$ -tuples.

(3)

Connecting the gadgets: For each variable $i$ that appears in clause $j$ at position 1, add the following tuples: $R(a_{j}^{\prime},*_{j}^{a}),R(*_{j}^{a},a_{j}^{\prime\prime})$ and $C(a_{j}^{\prime\prime})$ . If $v_{i}$ appears positively, add tuple $R(\overline{v_{i}^{j}},a_{j}^{\prime\prime})$ ; if it appears negatively, add tuple $R(v_{i}^{j},a_{j}^{\prime\prime})$ . Analogously, use $b_{j}^{\prime},b_{j}^{\prime\prime}$ or $c_{j}^{\prime},c_{j}^{\prime\prime}$ instead of $a_{j}^{\prime},a_{j}^{\prime\prime}$ for positions 2 and 3.
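The two gadget claims above (the variable cycle needs $m$ deleted $R$ -tuples, and the clause triangle needs 3 tuples, or 2 once an $A(*^{\prime})$ tuple is gone) can be checked by brute force on small instances. The following is a minimal sketch, not part of the original proof: the function names, the node labels such as `v1` and `~v1`, and the choice $m=3$ are illustrative assumptions, and the query is read as $q_{\textup{{chain}}}^{\textup{{ac}}}{\,:\!\!-\,}A(x),R(x,y),R(y,z),C(z)$ .

```python
from itertools import product

def chain_ac_witnesses(A, R, C):
    """Witnesses of A(x), R(x,y), R(y,z), C(z), each as a set of tuples."""
    return [frozenset({("A", x), ("R", x, y), ("R", y, z), ("C", z)})
            for (x, y), (y2, z) in product(R, R)
            if y == y2 and x in A and z in C]

def min_hitting_set(witnesses):
    """Smallest set of tuples meeting every witness (iterative deepening)."""
    def search(remaining, budget):
        if not remaining:
            return set()
        if budget == 0:
            return None
        for t in remaining[0]:      # some tuple of this witness must go
            sub = search([w for w in remaining if t not in w], budget - 1)
            if sub is not None:
                return sub | {t}
        return None
    k = 0
    while True:
        found = search(list(witnesses), k)
        if found is not None:
            return found
        k += 1

def variable_gadget(m):
    """Tuples of item (1) for one variable (hypothetical node labels)."""
    pos = [f"v{j}" for j in range(1, m + 1)]
    neg = [f"~v{j}" for j in range(1, m + 1)]
    R = set()
    for j in range(m):
        R.add((pos[j], neg[j]))               # R(v^j, ~v^j)
        R.add((neg[j], pos[(j + 1) % m]))     # R(~v^j, v^{j+1}), wrapping to 1
    nodes = set(pos) | set(neg)
    return nodes, R, set(nodes)               # A, R, C

m = 3
A, R, C = variable_gadget(m)
var_ws = chain_ac_witnesses(A, R, C)
var_gamma = min_hitting_set(var_ws)
print(len(var_ws), len(var_gamma))            # 2m witnesses, m deleted tuples

# Clause gadget of item (2) for a single clause (subscript j dropped).
R_cl = {("a", "b"), ("b", "c"), ("c", "a"), ("a'", "a"), ("b'", "b"), ("c'", "c")}
A_cl = {"a", "b", "c", "a'", "b'", "c'"}
C_cl = {"a", "b", "c"}
alone = len(min_hitting_set(chain_ac_witnesses(A_cl, R_cl, C_cl)))
helped = len(min_hitting_set(chain_ac_witnesses(A_cl - {"a'"}, R_cl, C_cl)))
print(alone, helped)                          # 3 without help, 2 once A(a') is gone
```

The search is exponential in the budget, so it is only meant for gadget-sized inputs like these.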

With our gadget, if the clause cannot be satisfied, then we need to choose all the $C$ -tuples (orange diamonds in Fig. 12), since deleting each of them removes two witnesses. In that case, in order to delete the remaining witnesses we need to delete 3 further tuples, namely the 3 black nodes in the triangle, for a total of 6 deleted tuples.

We now need to argue that, besides the tuples depicted in Fig. 12, we do not need other $A$ - or $C$ -tuples for a minimum contingency set. Assume there is a tuple $t=A(d)$ in a minimum contingency set $\Gamma$ . Given that $d\notin\{a_{j}^{\prime},b_{j}^{\prime},c_{j}^{\prime}\}$ , our construction guarantees that there is only one $R$ -tuple of the form $t^{\prime}=R(d,-)$ ; therefore we can take $\Gamma^{\prime}=\Gamma-t+t^{\prime}$ , and $\Gamma^{\prime}$ is also a minimum contingency set. Similarly, if there is a tuple $t=C(d)$ in $\Gamma$ , and assuming $d\notin\{a_{j}^{\prime\prime},b_{j}^{\prime\prime},c_{j}^{\prime\prime}\}$ , there is only one $R$ -tuple $t^{\prime}=R(-,d)$ , and therefore the same follows.

For $q_{\textup{{chain}}}^{\textup{{abc}}}$ use almost the same construction as above. We just add the appropriate $B$ -tuples and show that there is a minimum contingency set that does not contain those.

Consider $D_{\psi}$ as initially defined for $q_{\textup{{chain}}}^{\textup{{ac}}}$ . Now we include the appropriate $B$ -tuples:

(1)

Variable gadget: For each variable $v_{i}$ and each $j\in[m]$ insert the following tuples into the database: $B(v_{i}^{j})$ , $B(\overline{v_{i}^{j}})$ .

(2)

Clause gadget: For each clause $j\in[m]$ insert the following tuples into the database: $B(a_{j})$ , $B(b_{j})$ , $B(c_{j})$ .

(3)

Connecting the gadgets: For each variable $i$ that appears in clause $j$ at position 1, add tuple $B(*_{j}^{a})$ . Analogously $B(*_{j}^{b})$ and $B(*_{j}^{c})$ for positions 2 and 3, respectively.

By adding those $B$ -tuples we obtain the same witnesses we saw in the reduction for $\texttt{RES}(q_{\textup{{chain}}}^{\textup{{ac}}})$ . With this construction we guarantee that for any tuple $t=B(d)$ , there is either only one tuple $R(d,-)$ or only one tuple $R(-,d)$ , which means we can always choose one of those $R$ -tuples instead and obtain another minimum contingency set without $B$ -tuples. ∎

Proof of Proposition 2.

Suppose that $R(x,y),R(y,z)$ are the unique $R$ -atoms in $q$ . Assume first that there are no unary atoms $A(x),B(y),C(z)$ . We define a reduction from $\texttt{RES}(q_{\textup{{chain}}})$ to $\texttt{RES}(q)$ as follows:

Consider a database $D$ with $D\models q_{\textup{{chain}}}$ . We may assume that there are no loops $R(a,a)\in D$ , since those would have to be in any $\Gamma$ . We define a new database $D^{\prime}$ such that for each atom $S_{i}(v_{1},v_{2})$ or $A(v)$ occurring in $q$ , we define

[TABLE]

where

[TABLE]

Now we want to show

[TABLE]

Notice that this mapping from $D$ to $D^{\prime}$ preserves the witnesses in $D,q_{\textup{{chain}}}$ . Moreover, there are no new witnesses created where variables $x,y,z$ are mapped to values that did not correspond to witnesses before. Since $q$ is pseudo-linear, no endogenous atom of $q$ contains both $x$ and $z$ . Therefore, any minimum contingency set for $D,q_{\textup{{chain}}}$ is also a minimum contingency set for $D^{\prime},q$ . This completes our reduction.

Now, if any subset of unary relations $A(x),B(y),C(z)$ does appear in $q$ , then we define a reduction from the appropriate unary expansion of $q_{\textup{{chain}}}$ . The same mapping used above to define $D^{\prime}$ from $D$ preserves all minimum contingency sets, as desired. ∎

A.11. Proofs for Section 7.2

Proof of Proposition 3.

For $q{\,:\!\!-\,}q_{\ell},R(x,y),q_{m},R(z,y),q_{r}$ , let $D$ be any database satisfying $q$ and let $j$ be a witness of $D$ satisfying $q$ . Note that if $y$ occurs in $q_{\ell}$ then, by linearity, it must be as an atom $F(x,y)$ immediately to the left of $R(x,y)$ . Furthermore, any such atom may be considered exogenous because it is never better to choose $F(a,b)$ over $R(a,b)$ . Furthermore, if $x$ occurs in $q_{m}$ , then it would be via an atom $F(x,y)$ immediately to the right of $R(x,y)$ . If so, we can assume it is immediately to the left of $R(x,y)$ . In particular, we may assume that neither $x$ nor $z$ occurs in $q_{m}$ .

We can write $j=A(a,b)R(a,b)B(b)R(c,b)C(b,c)$ where $A(a,b)$ , $B(b)$ , $C(b,c)$ stand for the atoms of $q_{\ell}(a,b),q_{m}(b),q_{r}(b,c)$ , respectively.

Let $N_{D}$ be a network flow for $D,q$ ignoring the fact that $q$ has a self-join. Thus $N_{D}$ has duplicate edges for its $R$ -tuples, i.e., for each $R(a,b)\in D$ there are two edges, $R_{\ell}(a,b),R_{r}(a,b)$ in $N_{D}$ . Assume that each edge corresponding to an endogenous, resp. exogenous, tuple has weight 1, resp. $\infty$ .

Let $M$ be a min cut for $N_{D}$ . Let $\Gamma_{M}$ be the corresponding set of atoms of $D$ , where any edges $R_{\ell}(a,b),R_{r}(a,b)$ are replaced by the atom $R(a,b)$ . Observe that since there is no flow through $N_{D}-M$ , $\Gamma_{M}$ is a contingency set for $(D,q)$ .

We claim that in fact $\Gamma_{M}$ is a minimum contingency set for $(D,q)$ . The key idea is the following:

Lemma 6.

Let $M$ be a minimal cut of $N_{D}$ . Then $M$ does not include more than one instance of any $R$ tuple.

Proof.

Suppose to the contrary, that $M$ is a minimal cut for $N_{D}$ and contains both $R_{\ell}(a,b)$ and $R_{r}(a,b)$ . Since $M$ is minimal, it follows that $N_{D}-(M-\{R_{\ell}(a,b)\})$ and $N_{D}-(M-\{R_{r}(a,b)\})$ both contain flows:

$f_{1}=A(a,b)R_{\ell}(a,b)B(b)R(c,b)C(b,c)$ and

$f_{2}=A(a^{\prime},b)R(a^{\prime},b)B(b)R_{r}(a,b)C(b,a)$ . But then $N_{D}-M$ contains the flow

$f=A(a^{\prime},b)R(a^{\prime},b)B(b)R(c,b)C(b,c)$ , contradicting the fact that $M$ is a cut. See Fig. 13 for a depiction in the graph. ∎

Now, let $\Gamma$ be any contingency set. We claim that $\Gamma$ is the same size as some cut of $N_{D}$ . To see this, let us first let $S$ be the result of replacing each atom $R(a,b)\in\Gamma$ with both possible edges, $R_{\ell}(a,b),R_{r}(a,b)$ in $N_{D}$ . Since $\Gamma$ is a contingency set, it follows that $S$ is a cut of $N_{D}$ . Now, let $S^{\prime}$ be a minimal subset of $S$ that is still a cut, where some of the extra $R$ -edges, i.e., either $R_{\ell}(a,b)$ or $R_{r}(a,b)$ have been removed.

By the proof of Lemma 6, we know that $S^{\prime}$ has only one edge for each atom $R(a,b)\in\Gamma$ . Thus, $|S^{\prime}|=|\Gamma|$ as claimed. It follows that the size of a min cut of $N_{D}$ is the same as the size of a minimum contingency set for $(D,q)$ . ∎
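The min-cut argument above can be illustrated on the simplest confluence instance. The sketch below assumes the query $q{\,:\!\!-\,}A(x),R(x,y),R(z,y),C(z)$ (so $q_{\ell}$ and $q_{r}$ are single unary atoms and $q_{m}$ is empty), builds $N_{D}$ with the duplicated edges $R_{\ell}(a,b),R_{r}(a,b)$ , and compares the max-flow value with a brute-force minimum contingency set. The helper names, the node encoding, and the sample database are illustrative assumptions, and the hand-written Edmonds-Karp routine stands in for any max-flow solver.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts capacity map; returns the flow value."""
    res = defaultdict(lambda: defaultdict(int))
    for u, out in cap.items():
        for v, c in out.items():
            res[u][v] += c
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:          # BFS for an augmenting path
            u = queue.popleft()
            for v, c in list(res[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        total += push

def confluence_network(A, R, C):
    """N_D for A(x), R(x,y), R(z,y), C(z); every tuple is a weight-1 edge."""
    cap = defaultdict(dict)
    for a in A:
        cap["s"][("x", a)] = 1                    # A(a)
    for a, b in R:
        cap[("x", a)][("y", b)] = 1               # R_l(a,b)
        cap[("y", b)][("z", a)] = 1               # R_r(a,b)
    for c in C:
        cap[("z", c)]["t"] = 1                    # C(c)
    return cap

def confluence_witnesses(A, R, C):
    return [frozenset({("A", x), ("R", x, y), ("R", z, y), ("C", z)})
            for (x, y) in R for (z, y2) in R
            if y2 == y and x in A and z in C]

def min_hitting_set(witnesses):
    """Smallest set of tuples meeting every witness (iterative deepening)."""
    def search(remaining, budget):
        if not remaining:
            return set()
        if budget == 0:
            return None
        for t in remaining[0]:
            sub = search([w for w in remaining if t not in w], budget - 1)
            if sub is not None:
                return sub | {t}
        return None
    k = 0
    while True:
        found = search(list(witnesses), k)
        if found is not None:
            return found
        k += 1

A, C = {1, 2}, {1, 2, 3}
R = {(1, 10), (2, 10), (3, 10), (1, 11), (2, 11)}
cut = max_flow(confluence_network(A, R, C), "s", "t")
resilience = len(min_hitting_set(confluence_witnesses(A, R, C)))
print(cut, resilience)                            # both values should agree
```

On this sample database both quantities are 2 (for example, deleting $A(1)$ and $A(2)$ ), which matches the claim that the min cut of $N_{D}$ has the size of a minimum contingency set.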

A.12. Proofs for Section 7.3

Proof of Proposition 6.

We define a reduction from 3SAT to $\texttt{RES}(q_{\textup{perm}}^{AB})$ , see Figure 14. Similar to the previous cases, we want to create variable gadgets such that a minimum cover will choose either blue nodes (variable is set to true), or red nodes (variable is set to false), and a clause gadget (black nodes) such that if the clause is satisfied, then the minimum cover is 5, otherwise 6.

Let $\psi$ be a 3CNF formula with $n$ variables $v_{1},\ldots,v_{n}$ and $m$ clauses $C_{1},\ldots,C_{m}$ . Our reduction will map any such $\psi$ to a pair $(D_{\psi},k_{\psi})$ where $D_{\psi}$ is a database satisfying $q_{\textup{perm}}$ , and

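$$\psi\in 3\mbox{{\rm\sc SAT}}\;\Leftrightarrow\;(D_{\psi},k_{\psi})\in\texttt{RES}(q_{\textup{perm}}^{AB})$$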

In our construction, if $\psi\in 3\mbox{{\rm\sc SAT}}$ , then the size of each minimum contingency set for $q_{\textup{perm}}^{AB}$ in $D_{\psi}$ will be $k_{\psi}=(3n+5)m$ , whereas if $\psi\not\in 3\mbox{{\rm\sc SAT}}$ , then the size of all contingency sets for $q_{\textup{perm}}^{AB}$ in $D_{\psi}$ will be greater than $k_{\psi}$ .

(1)

Variable gadget: For each variable $v_{i}$ and each $j\in[m]$ insert the following tuples into the database: $A(v_{i}^{j})$ , $B(v_{i}^{j})$ , $A(\overline{v_{i}^{j}})$ , $B(\overline{v_{i}^{j}})$ and $R(v_{i}^{j},\overline{v_{i}^{j}})$ , $R(\overline{v_{i}^{j}},v_{i}^{j})$ , $R(v_{i}^{j+1},\overline{v_{i}^{j}})$ , $R(\overline{v_{i}^{j}},v_{i}^{j+1})$ . If $j+1>m$ , then make the superscript 1.

We want to join those tuples such that the minimum contingency sets are to either choose all tuples $A(v_{i}^{j}),B(v_{i}^{j})$ representing a variable to have assignment true, or all tuples $A(\overline{v_{i}^{j}}),B(\overline{v_{i}^{j}})$ representing a variable to have assignment false, plus some $R$ -tuples. To obtain that property, we need the following additional tuples: $A(*_{i}^{j})$ , $B(*_{i}^{j})$ , $A(\overline{*_{i}^{j}})$ , $B(\overline{*_{i}^{j}})$ and $R(*_{i}^{j},v_{i}^{j})$ , $R(v_{i}^{j},*_{i}^{j})$ , $R(\overline{*_{i}^{j}},\overline{v_{i}^{j}})$ , $R(\overline{v_{i}^{j}},\overline{*_{i}^{j}})$ .

With this construction we guarantee that we can “cover” the variable gadget by choosing either all positive $A,B$ -tuples plus the $m$ tuples $R(\overline{*_{i}^{j}},\overline{v_{i}^{j}})$ , or all negative $A,B$ -tuples plus the $m$ tuples $R(*_{i}^{j},v_{i}^{j})$ . In both cases, we choose $3m$ tuples.

(2)

Clause gadget: For each clause $j\in[m]$ insert the following tuples into the database: $A(a_{j})$ , $B(a_{j})$ , $A(b_{j})$ , $B(b_{j})$ , $A(c_{j})$ , $B(c_{j})$ , $R(a_{j},b_{j})$ , $R(b_{j},a_{j})$ , $R(b_{j},c_{j})$ , $R(c_{j},b_{j})$ , $R(c_{j},a_{j})$ , $R(a_{j},c_{j})$ and $A(a_{j}^{\prime})$ , $B(a_{j}^{\prime})$ , $A(b_{j}^{\prime})$ , $B(b_{j}^{\prime})$ , $A(c_{j}^{\prime})$ , $B(c_{j}^{\prime})$ , $R(a_{j},a_{j}^{\prime})$ , $R(a_{j}^{\prime},a_{j})$ , $R(b_{j},b_{j}^{\prime})$ , $R(b_{j}^{\prime},b_{j})$ , $R(c_{j},c_{j}^{\prime})$ , $R(c_{j}^{\prime},c_{j})$ and

For this gadget, we have 3 options to choose only 5 tuples in order to delete all the witnesses. For example: $A(a_{j})$ , $B(a_{j})$ , $A(b_{j})$ , $B(b_{j}),R(c_{j},c_{j}^{\prime})$ .

(3)

Connecting the gadgets: For each variable $i$ that appears in clause $j$ at position 1, add the following tuples: $R(v_{i}^{j},a_{j}),R(a_{j},v_{i}^{j})$ if $v_{i}$ appears positively, and $R(\overline{v_{i}^{j}},a_{j}),R(a_{j},\overline{v_{i}^{j}})$ if it appears negatively. Analogously, use $b_{j}$ or $c_{j}$ instead of $a_{j}$ for positions 2 and 3.
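The sizes quoted for the two gadgets ( $3m$ tuples per variable gadget, 5 tuples per clause gadget) can be checked by brute force. The sketch below is an illustration under stated assumptions: the query is read as $q_{\textup{perm}}^{AB}{\,:\!\!-\,}A(x),R(x,y),R(y,x),B(y)$ , node labels such as `v1`, `~v1`, `*1` are hypothetical stand-ins for $v_{i}^{j},\overline{v_{i}^{j}},*_{i}^{j}$ , and the helper names are not from the paper.

```python
def perm_ab_witnesses(A, R, B):
    """Witnesses of A(x), R(x,y), R(y,x), B(y), each as a set of tuples."""
    return [frozenset({("A", x), ("R", x, y), ("R", y, x), ("B", y)})
            for (x, y) in R if (y, x) in R and x in A and y in B]

def min_hitting_set(witnesses):
    """Smallest set of tuples meeting every witness (iterative deepening)."""
    def search(remaining, budget):
        if not remaining:
            return set()
        if budget == 0:
            return None
        for t in remaining[0]:
            sub = search([w for w in remaining if t not in w], budget - 1)
            if sub is not None:
                return sub | {t}
        return None
    k = 0
    while True:
        found = search(list(witnesses), k)
        if found is not None:
            return found
        k += 1

def symmetric(pairs):
    """R containing both directions of every listed pair."""
    return {(u, v) for u, v in pairs} | {(v, u) for u, v in pairs}

def variable_gadget(m):
    """Item (1) for one variable: the cycle plus the helper '*' nodes."""
    pairs = []
    for j in range(1, m + 1):
        nxt = j % m + 1                       # superscript wraps around to 1
        pairs += [(f"v{j}", f"~v{j}"), (f"v{nxt}", f"~v{j}"),
                  (f"*{j}", f"v{j}"), (f"~*{j}", f"~v{j}")]
    R = symmetric(pairs)
    nodes = {u for pair in R for u in pair}
    return nodes, R, set(nodes)               # A and B hold every gadget node

m = 2
var_ws = perm_ab_witnesses(*variable_gadget(m))
var_size = len(min_hitting_set(var_ws))
print(var_size)                               # expected to equal 3m

# Clause gadget of item (2) for a single clause (subscript j dropped).
R_cl = symmetric([("a", "b"), ("b", "c"), ("c", "a"),
                  ("a", "a'"), ("b", "b'"), ("c", "c'")])
nodes_cl = {u for pair in R_cl for u in pair}
cl_ws = perm_ab_witnesses(nodes_cl, R_cl, nodes_cl)
cl_size = len(min_hitting_set(cl_ws))
example = {("A", "a"), ("B", "a"), ("A", "b"), ("B", "b"), ("R", "c", "c'")}
print(cl_size, all(w & example for w in cl_ws))
```

The last line also confirms that the 5-tuple example given in item (2) hits every witness of the isolated clause gadget.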

After connecting the variable gadgets with the clause gadgets, the witnesses are formed such that if a clause cannot be satisfied, then we need to pick all $A$ - and $B$ -tuples from the clause gadget (the black triangle), totaling 6 tuples. Otherwise, we can delete all witnesses by picking 5 tuples, namely 2 pairs of $A,B$ -tuples and one $R$ -tuple. ∎

Proof of Proposition 7.

There are 2 cases.

Case 1: $q$ is not bound. We can write $q=q_{\ell}(x),G(x,y)$ where $q_{\ell}(x)$ does not contain the variable $y$ . $G(x,y)$ includes $R(x,y),R(y,x)$ and may include exogenous atoms containing the variable $y$ . Think of $G(x,y)$ as the rightmost group in Figure 9.

For any database, $D\models q$ , $\texttt{RES}(D,q)$ is equivalent to the following Network Flow. As usual, each endogenous atom from the pseudo-linear $q_{\ell}(x)$ becomes a 1-weight edge and each exogenous atom is an $\infty$ -weight edge. Whenever $\{R(c,d),R(d,c)\}\subseteq D$ , we add $\infty$ -weight edges from the rightmost output of $q_{\ell}(c)$ and $q_{\ell}(d)$ to $\{c,d\}$ and a 1-weight edge from $\{c,d\}$ to the terminal node, $t$ .
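As an illustration of the Case 1 network, the sketch below takes the smallest unbound permutation, assumed here to be $q{\,:\!\!-\,}A(x),R(x,y),R(y,x)$ with $q_{\ell}(x)=A(x)$ , and checks on one sample database that the max-flow value equals the size of a brute-force minimum contingency set. The node encoding, the constant `INF` standing in for $\infty$ , the helper names, and the sample database are assumptions made for the example.

```python
from collections import defaultdict, deque

INF = 10 ** 9                                     # stands in for infinite weight

def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts capacity map; returns the flow value."""
    res = defaultdict(lambda: defaultdict(int))
    for u, out in cap.items():
        for v, c in out.items():
            res[u][v] += c
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in list(res[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        total += push

def perm_network(A, R):
    """s -A(c)-> out(c) -inf-> {c,d} -1-> t for every two-way pair {c,d}."""
    cap = defaultdict(dict)
    for c in A:
        cap["s"][("out", c)] = 1                  # endogenous atom A(c)
    for c, d in R:
        if (d, c) in R:
            pair = ("pair", frozenset((c, d)))
            cap[("out", c)][pair] = INF
            cap[("out", d)][pair] = INF
            cap[pair]["t"] = 1                    # one of R(c,d), R(d,c)
    return cap

def perm_witnesses(A, R):
    return [frozenset({("A", x), ("R", x, y), ("R", y, x)})
            for (x, y) in R if (y, x) in R and x in A]

def min_hitting_set(witnesses):
    """Smallest set of tuples meeting every witness (iterative deepening)."""
    def search(remaining, budget):
        if not remaining:
            return set()
        if budget == 0:
            return None
        for t in remaining[0]:
            sub = search([w for w in remaining if t not in w], budget - 1)
            if sub is not None:
                return sub | {t}
        return None
    k = 0
    while True:
        found = search(list(witnesses), k)
        if found is not None:
            return found
        k += 1

A = {1, 2, 3}
R = {(1, 2), (2, 1), (1, 3), (3, 1), (1, 4), (4, 1), (2, 3)}
cut = max_flow(perm_network(A, R), "s", "t")
resilience = len(min_hitting_set(perm_witnesses(A, R)))
print(cut, resilience)                            # both values should agree
```

The one-way tuple $R(2,3)$ contributes no pair node, since it cannot take part in any witness of this query.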

Case 2: $q$ is bound. We can write $q=q_{\ell}(x),G(x,y),q_{r}(y)$ where $G(x,y)$ includes $R(x,y),R(y,x)$ and may include an essentially exogenous atom $D(x,y)$ if that occurs in $q$ . The relevant issues are that removing $G(x,y)$ separates $q_{\ell}(x)$ from $q_{r}(y)$ and these contain at least one endogenous atom each.

We define a reduction from $\texttt{RES}(q_{\textrm{perm}}^{AB})$ to $\texttt{RES}(q)$ . We say that variable $z\ \text{isLike}\ x$ , if $z$ occurs in $q_{\ell}(x)$ . Otherwise, $z$ $\text{isLike}\ y$ .

Now consider a database $D$ with $D\models q_{\textrm{perm}}^{AB}$ . We define a new database $D^{\prime}$ such that for each atom $S_{i}(v_{1},v_{2})$ or $A(v)$ occurring in $q$ , we define

[TABLE]

where

[TABLE]

It is clear that the witnesses and minimum contingency sets of $D\models q_{\textrm{perm}}^{AB}$ are exactly preserved in $D^{\prime}\models q$ . ∎

A.13. Proof for Section 7.4

Proof of Proposition 8.

First consider $q=z_{3}$ . Given a database $D$ such that $D\models z_{3}$ , witnesses can be of two forms:

[TABLE]

From that, we can conclude that no tuple $R(a,b)$ with $a\neq b$ needs to be in a contingency set, since we can choose either $R(a,a)$ or $A(b)$ instead. Thus, we can construct a network flow that doesn’t include tuples $R(a,b)$ and solve resilience for $z_{3}$ . Note that when we consider any expansion of $z_{3}$ that is pseudo-linear, we always have that $R(a,b)$ with $a\neq b$ is not needed in a minimum contingency set. This property together with the assumption that query $q$ is pseudo-linear, allows for a construction of a network flow to solve resilience. Therefore, $\texttt{RES}(q)$ is in P. ∎

A.14. Proof for Section 7.5

Proof of Theorem 9.

If $q$ has a triad, then $\texttt{RES}(q)$ is NP-complete by Theorem 6. By Theorem 7, we only need to consider the cases where $q$ is pseudo-linear.

In this case, if $q$ has a path (Theorem 1, Theorem 2), then $\texttt{RES}(q)$ is NP-complete. Paths cover all the queries where $R$ -atoms do not share a variable, including cases with variable repetition. It remains to characterize the complexity of the queries where $R$ -atoms share at least one variable. Note that chain, permutation, and confluence are the only three possible patterns for a query with exactly two $R$ -atoms and no variable repetition.

If $q$ has a chain, then $\texttt{RES}(q)$ is NP-complete (Proposition 2). If $q$ has a permutation, then $\texttt{RES}(q)$ is NP-complete when the permutation is bounded, and it is in P when the permutation is unbounded (Proposition 7). These are the only two possible ways a permutation can occur. If $q$ has a confluence, then $\texttt{RES}(q)$ is NP-complete when there is an exogenous path, and it is in P otherwise (Proposition 4).

Now we only have left the case where $q$ has variable repetition and the $R$ -atoms share a variable, which implies $\texttt{RES}(q)$ is in P (Proposition 8).

Since we have exhausted all the cases to consider, we have shown that there is a dichotomy for the class of ssj binary queries with only two $R$ -atoms. ∎

A.15. Proofs for Section 8.1

Proof of Proposition 1.

We define a reduction from $\texttt{RES}(q_{\textup{{chain}}})$ to $\texttt{RES}(q)$ , using a strategy similar to the proof of Theorem 1. ∎

A.16. Proofs for Section 8.2

Proof of Proposition 2.

We reduce Max 2-SAT to $\texttt{RES}(q_{3\textrm{conf}}^{AC})$ . Given a 2CNF formula, $\varphi$ , with $n$ variables and $m$ clauses, and a number $r<m$ , we produce a database, $D$ , and bound $k$ , such that $\varphi$ has an assignment satisfying at least $r$ clauses iff $(D,k)\in\texttt{RES}(q_{3\textrm{conf}}^{AC})$ . The construction is drawn in Figure 15. A sample variable gadget for variable $x$ is shown. The two minimum contingency sets consist of $2s$ $x$ nodes, plus 2 helper nodes in the two crossover gadgets or $2s$ $\overline{x}$ nodes, plus 2 helper nodes, corresponding to variable $x$ being true or false, respectively. The reason for the crossover is so that each variable can be instantiated via diamonds and hexagons corresponding to the atoms $A,C$ , respectively.

The clause gadgets for clauses of size 1 and size 2 are also drawn. Clauses of size 1 need no nodes chosen when they are true and one node otherwise. Clauses of size 2 need 1 node chosen when they are true and 2 when they are false. Let $d$ be the number of clauses of size 2 in $\varphi$ . Saying that at least $r$ clauses of $\varphi$ are true means that at most $m-r$ clauses are false. Thus, the size of the minimum contingency set is $k=n(2s+2)+d+m-r$ . ∎

Proof of Proposition 4.

First observe that any contingency set contains only $R$ -tuples, since $S,T$ are dominated and therefore exogenous. For any tuple $R(a,b)\in D$ , if $S(a,b),T(a,b)\in D$ , then $R(a,b)$ must be in all contingency sets, since those 3 tuples form a witness. Let $\Gamma_{TS}$ be the set of all such tuples. We then proceed to create a flow with tuples $D^{\prime}=D-\Gamma_{TS}$ and we claim that $\Gamma=\Gamma_{TS}\cup C$ is a min contingency set for $(q_{3\textrm{conf}}^{TS},D)$ , where $C$ is a min cut found by flow.

Let $C$ be a min cut and suppose there is a $\Gamma^{\prime}$ such that $D^{\prime}-\Gamma^{\prime}\not\models q_{3\textrm{conf}}^{TS}$ and $|C|>|\Gamma^{\prime}|$ . That implies that there are at least 2 witnesses that can be broken by deleting one tuple but the min cut chose to delete 2 edges. Consider the tuple $R(a,b)$ and these witnesses to be

[TABLE]

Note that with this set of tuples we also have witness

[TABLE]

which cannot be deleted by deleting $R(a,b)$ , contradicting the assumption that it was possible. ∎

A.17. Proofs for Section 8.3

Proof of Proposition 5.

Reduction from $\texttt{RES}(q_{\textup{{chain}}})$ . ∎

Proof of Proposition 6.

Reduction from Max 2SAT, similar to the one used for $q_{\textrm{3conf}}^{AC}$ . ∎

A.18. Proofs for Section 8.4

Proof of Proposition 7.

This is similar to Proposition 6. The difference is that while $A(a)$ “dominates” the 1-way tuple $R(a,b)$ in $q_{\textrm{3perm-R}}^{A}$ , it is not the case that $S(e_{1},a)$ would dominate $R(a,b)$ because there might be many $e_{i}$ ’s such that $S(e_{i},a)\in D$ , in which case it might be advantageous to choose one $R(a,b)$ instead of many $S(e_{i},a)$ ’s.

We thus modify the flow graph to include all the $S(e,a)$ edges at cost 1 each on the left, all the $\{a,b\}$ pairs at cost 1 each on the right. We include $\infty$ -weight edges from any $S(e,a)$ to $\{a,b\}$ plus cost 1 edges from $S(e,a)$ to $\{b,c\}$ for any 1-way edges $R(a,b)$ .

Let $M$ be a min-cost flow and form $\Gamma$ by including all the $S(e,a)$ ’s and 1-way $R(a,b)$ ’s from $M$ together with one of $R(a,b)$ or $R(b,a)$ whenever $\{a,b\}\in M$ . Similar to Proposition 6, the rule for which to choose is that if some $S(e,a)\in(D-M)$ but no $S(f,b)\in(D-M)$ , then add $R(a,b)$ to $\Gamma$ . Symmetrically, if $S(e,b)\in(D-M)$ but no $S(f,a)\in(D-M)$ , then add $R(b,a)$ to $\Gamma$ ; otherwise, arbitrarily add one or the other.

The same argument as in Proposition 6 shows that the resulting $\Gamma$ is a minimum contingency set. ∎

Proof of Proposition 8.

We reduce 3SAT to $\texttt{RES}(q_{\textrm{3perm-R}}^{S_{xy}})$ . The idea for the variable gadgets is that for a database that contains the tuples $T_{x_{i}}=\{S(x_{i},\overline{x_{i}})$ , $R(x_{i},\overline{x_{i}})$ , $S(\overline{x_{i}},{x_{i}})$ , $R(\overline{x_{i}},{x_{i}})\}$ , we must choose exactly one of $R(x_{i},\overline{x_{i}})$ or $R(\overline{x_{i}},{x_{i}})$ , the first of which corresponds to assigning $x$ the value 1, and the second of which, the value 0. In full detail, the $x$ gadget consists of a chain of these choices, i.e., the union of $T_{x_{i}}$ , $i=1,\ldots,m$ , together with all the tuples $R(x_{i},x_{i+1})$ , $R(x_{i+1},x_{i})$ , $R(\overline{x_{i}},\overline{x_{i+1}})$ , $R(\overline{x_{i+1}},\overline{x_{i}})$ . For a minimum contingency set over this gadget we may choose all of the $R(x_{i},x_{i+1})$ and $R(x_{i},\overline{x_{i}})$ edges (corresponding to $x$ being assigned 1), or all the $R(\overline{x_{i}},\overline{x_{i+1}})$ and $R(\overline{x_{i}},{x_{i}})$ edges (corresponding to $x$ being assigned 0).

The clause gadget is similar. If $C_{i}$ is $(x\lor\overline{y}\lor z)$ , then the clause can eliminate two, but not all three pointers to the edges $\{x_{i},x_{i+1}\}$ , $\{\overline{y_{i}},\overline{y_{i+1}}\}$ , $\{z_{i},z_{i+1}\}$ after removing 8 tuples. To simplify the explanation, let $P(a,b)=\{R(a,b),R(b,a)\}$ and $F(a,b)=P(a,b)\cup\{S(a,b),S(b,a)\}$ for elements $a,b\in D$ . The $C_{i}$ clause gadget contains the union of the following sets of tuples: $F(a_{i},b_{i})$ , $F(b_{i},c_{i})$ , $F(c_{i},a_{i})$ , $F(a_{i},x_{i})$ , $F(b_{i},\overline{y_{i}})$ , $F(c_{i},z_{i})$ , $P(a_{i},a_{i}^{\prime})$ , $P(b_{i},b_{i}^{\prime})$ , $P(c_{i},c_{i}^{\prime})$ . The idea is that for each full pair, $F(e,f)$ , exactly one of $R(e,f)$ or $R(f,e)$ must be chosen in the minimum contingency set $\Gamma$ . $C_{i}$ is designed so that a contingency set of size 8 exists iff at least one pair from $P(x_{i},x_{i+1})$ , $P(\overline{y_{i}},\overline{y_{i+1}})$ , $P(z_{i},z_{i+1})$ has been previously chosen, i.e., iff the clause $C_{i}$ is true. ∎

Proof of Proposition 9.

We reduce $\texttt{RES}(q_{\textrm{perm}}^{AB})$ to $\texttt{RES}(q_{\textrm{3perm-R}}^{AC})$ . Given a database $D\models q_{\textrm{perm}}^{AB}$ , construct $D^{\prime}\models q_{\textrm{3perm-R}}^{AC}$ as

[TABLE]

It then follows, that it is always at least as good to put $A(a^{\prime})$ into $\Gamma$ , rather than $R(a^{\prime},a)$ . Thus, the minimum contingency sets for $(D^{\prime},q_{\textrm{3perm-R}}^{AC})$ correspond exactly to the minimum contingency sets for $(D,q_{\textrm{perm}}^{AB})$ .

For $\texttt{RES}(q_{\textrm{3perm-R}}^{AB})$ , even though $q_{\textrm{perm}}^{AB}\rightarrow q_{\textrm{3perm-R}}^{AB}$ , there is no obvious reduction between $\texttt{RES}(q_{\textrm{perm}}^{AB})$ and $\texttt{RES}(q_{\textrm{3perm-R}}^{AB})$ . However, the same reduction from 3SAT to $\texttt{RES}(q_{\textrm{perm}}^{AB})$ in Proposition 6 also works for $\texttt{RES}(q_{\textrm{3perm-R}}^{AB})$ .

For $\texttt{RES}(q_{\textrm{3perm-R}}^{S_{xy}BC})$ , we can define a reduction from $\texttt{RES}(q_{\textrm{perm}}^{AB})$ . ∎

A.19. Proofs for Section 8.5

Proof of Proposition 10.

For $\texttt{RES}(z_{4})$ , a reduction from $\texttt{RES}(q_{\textup{{vc}}})$ is enough. Note that tuples $R(a,b)$ with $a\neq b$ do not need to be in a contingency set.

For $\texttt{RES}(z_{5})$ , a reduction from Max 2SAT, similar to the one used in Proposition 2, can be used to show NP-hardness. ∎

Appendix B Relevant proofs from sj-free case

Proof of Proposition 4.

Let $\Gamma$ be a minimum contingency set of $q$ in $D$ . Suppose that atom $A$ dominates atom $B$ but there is some tuple $B(\bm{\mathbf{t}})\in\Gamma$ . Let $\bm{\mathbf{p}}$ be the projection of $\bm{\mathbf{t}}$ onto $\textup{{var}}(A)$ . Then we can replace $B(\bm{\mathbf{t}})$ by $A(\bm{\mathbf{p}})$ and we remove at least as many witnesses that $D\models q$ . It follows, as desired, that the complexity of $\texttt{RES}(q)$ is unchanged if $B$ is exogenous, i.e., $\texttt{RES}(q)\equiv\texttt{RES}(q^{\prime})$ . ∎
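The replacement step in this proof can be illustrated on a toy instance. The sketch assumes the two-atom query $q{\,:\!\!-\,}A(x),B(x,y)$ , in which $A$ dominates $B$ because $\textup{{var}}(A)\subseteq\textup{{var}}(B)$ ; the query, the database, and the helper names are illustrative assumptions rather than anything taken from the paper.

```python
def witnesses(A, B):
    """Witnesses of A(x), B(x,y), each as a set of tuples."""
    return [frozenset({("A", x), ("B", x, y)}) for (x, y) in B if x in A]

def is_contingency(gamma, ws):
    """True if deleting gamma destroys every witness."""
    return all(w & gamma for w in ws)

def swap_dominated(gamma):
    """Replace each B(x,y) by A(x), its projection onto var(A)."""
    return {("A", t[1]) if t[0] == "B" else t for t in gamma}

A = {1, 2}
B = {(1, "p"), (1, "q"), (2, "p")}
ws = witnesses(A, B)

gamma = {("B", 1, "p"), ("B", 1, "q"), ("B", 2, "p")}   # uses only B-tuples
swapped = swap_dominated(gamma)

print(is_contingency(gamma, ws), is_contingency(swapped, ws))
print(len(gamma), len(swapped))     # the swapped set is never larger
```

Several $B$ -tuples can project onto the same $A$ -tuple, which is why the swapped set may even shrink.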

Proposition 1 (Triangle $q_{\triangle}$ is hard).

$\texttt{RES}(q_{\triangle})$ is NP-complete.

Proof of Proposition 1.

We reduce 3SAT to $\texttt{RES}(q_{\triangle})$ . It will then follow that $\texttt{RES}(q_{\triangle})$ is NP-complete. Let $\psi$ be a 3CNF formula with $n$ variables $v_{1},\ldots,v_{n}$ and $m$ clauses $C_{0},\ldots,C_{m-1}$ . Our reduction will map any such $\psi$ to a pair $(D_{\psi},k_{\psi})$ where $D_{\psi}$ is a database satisfying $q_{\triangle}$ , and

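$$\psi\in 3\mbox{{\rm\sc SAT}}\;\Leftrightarrow\;(D_{\psi},k_{\psi})\in\texttt{RES}(q_{\triangle})\qquad(3)$$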

In our construction, if $\psi\in 3\mbox{{\rm\sc SAT}}$ , then the size of each minimum contingency set for $q_{\triangle}$ in $D_{\psi}$ will be $k_{\psi}=6mn$ , whereas if $\psi\not\in 3\mbox{{\rm\sc SAT}}$ , then the size of all contingency sets for $q_{\triangle}$ in $D_{\psi}$ will be greater than $k_{\psi}$ .

Note $D_{\psi}\models q_{\triangle}$ iff it contains three pairs $R(a,b)$ , $S(b,c)$ , $T(c,a)$ . We visualize $R(a,b)$ as a red edge, $S(b,c)$ as a green edge and $T(c,a)$ as a blue edge. Thus each witness $(a,b,c)$ that $D_{\psi}\models q_{\triangle}$ is an RGB triangle. (Notice that the edge direction $a\rightarrow b$ drawn in Figure 16 corresponds to the variable order in $R$ , and analogously for $S$ and $T$ .) The job of a contingency set for $q_{\triangle}$ is to remove all RGB triangles.
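To make the RGB-triangle view concrete, the following sketch enumerates the witnesses of $q_{\triangle}{\,:\!\!-\,}R(x,y),S(y,z),T(z,x)$ on a small hand-made database and checks that a chosen set of deleted edges removes all of them. The database and the function name are illustrative assumptions; they are not the gadget $D_{\psi}$ of the reduction.

```python
def rgb_triangles(R, S, T):
    """All witnesses (a, b, c) with R(a,b), S(b,c), T(c,a) in the database."""
    return [(a, b, c) for (a, b) in R for (b2, c) in S
            if b2 == b and (c, a) in T]

R = {(1, 2), (5, 6)}                      # red edges
S = {(2, 3), (2, 4), (6, 7)}              # green edges
T = {(3, 1), (4, 1), (7, 5), (7, 1)}      # blue edges; T(7,1) closes no triangle

before = sorted(rgb_triangles(R, S, T))
print(before)

# Deleting the shared red edge R(1,2) and the green edge S(6,7)
# leaves no RGB triangle, so these two tuples form a contingency set.
after = rgb_triangles(R - {(1, 2)}, S - {(6, 7)}, T)
print(after)
```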

$D_{\psi}$ contains one circular gadget $G_{i}$ for each variable $v_{i}$ . The circle consists of $12m$ solid edges, half of them marked $v_{i}$ and the other half marked $\overline{v_{i}}$ (see 16(a) and 16(b)). Note that there are $12m$ RGB triangles and they can be minimally broken by choosing the $6m$ $v_{i}$ edges or the $6m$ $\overline{v_{i}}$ edges. Any other way would require more edges removed. Thus, each minimum contingency set for $D_{\psi}$ corresponds to a truth assignment to the variables of $\psi$ . And there will be a minimum contingency set of size $k_{\psi}=6mn$ iff $\psi\in 3\mbox{{\rm\sc SAT}}$ .

We complete the construction of $D_{\psi}$ by adding one RGB triangle for each clause $C_{j}$ . For example, suppose $C_{j}=v_{1}\lor\overline{v_{2}}\lor v_{3}$ . The RGB triangle we add consists of a red edge marked $v_{1}$ , a green edge marked $\overline{v_{2}}$ and a blue edge marked $v_{3}$ (see 16(c)). Note that if the chosen assignment satisfies $C_{j}$ , then all $v_{1}$ edges are removed, or all $\overline{v_{2}}$ edges are removed, or all $v_{3}$ edges are removed. Thus the $C_{j}$ triangle is automatically removed.

How do we create $C_{j}$ ’s RGB triangle? Remember that we have chosen $G_{i}$ to contain 2 segments for each clause. We use the $j$ th odd-numbered segment of $G_{i}$ to produce the $v_{i}$ or $\overline{v_{i}}$ used in the clause- $j$ triangle. The even numbered segments are not used: they serve as buffers to prevent spurious RGB triangles from being created (In 16(b) we mark these even segments with frowns: they are sad because they are never used).

More precisely, the red $v_{1}$ -edge from $G_{1}$ is $(a^{1}_{4j+1},b^{1}_{4j+1})$ , the green $\overline{v_{2}}$ -edge from $G_{2}$ is $(b^{2}_{4j+1},c^{2}_{4j+1})$ , and the blue $v_{3}$ -edge from $G_{3}$ is $(c^{3}_{4j+1},a^{3}_{4j+2})$ (see 16(c)).

Now to make this an RGB triangle in $D_{\psi}$ , we identify the two $a$ -vertices, the two $b$ -vertices and the two $c$ -vertices. In other words, $G_{1}$ ’s $a$ -vertex $a^{1}_{4j+1}$ is equal to $G_{3}$ ’s $a$ -vertex $a^{3}_{4j+2}$ , i.e., they are the same element of the domain of $D_{\psi}$ . We have thus constructed $C_{j}$ ’s RGB triangle (see 16(c)).

The key idea is that these identifications can only create this single new RGB triangle because there is no other way to get back to $G_{1}$ from $G_{2}$ in two steps. All other identifications involve different segments and so are at least six steps away. Recall that this is the reason why the even-numbered segments in the $G_{i}$ ’s are not used: this ensures that no additional RGB triangles are created.

Thus, as desired, Equation 3 holds and we have reduced $3\mbox{{\rm\sc SAT}}$ to $\texttt{RES}(q_{\triangle})$ . ∎

Proposition 2 (Tripod $q_{\textup{{T}}}$ is hard).

$\texttt{RES}(q_{\textup{{T}}})$ is NP-complete.

Proof of Proposition 2.

We reduce $\texttt{RES}(q_{\triangle})$ to $\texttt{RES}(q_{\textup{{T}}})$ . It will then follow that $\texttt{RES}(q_{\textup{{T}}})$ is NP-complete. Let $(D,k)$ be an instance of $\texttt{RES}(q_{\triangle})$ . We construct an instance $(D^{\prime},k)$ of $\texttt{RES}(q_{\textup{{T}}})$ by constructing relations $A,B,C$ as copies of $R,S,T$ from $D$ . Define $D^{\prime}=(A,B,C,W)$ as follows:

[TABLE]

Here, $\langle ab\rangle$ stands for a new unique domain value resulting from the concatenation of domain values $a$ and $b$ . Observe that there is a 1:1 correspondence between the witnesses of $D\models q_{\triangle}$ and the witnesses of $D^{\prime}\models q_{\textup{{T}}}$ . Thus, every contingency set for $q_{\triangle}$ in $D$ corresponds to a contingency set of the same size for $q_{\textup{{T}}}$ in $D^{\prime}$ . Furthermore no minimum $\Gamma^{\prime}$ from $D^{\prime}$ needs to choose tuples from $W$ . If $\bm{\mathbf{t}}=W(\langle ab\rangle,\langle bc\rangle,\langle ac\rangle)$ were in $\Gamma^{\prime}$ , then we could replace it by $A(\langle ab\rangle)$ , which suffices to remove all the witnesses removed by $\bm{\mathbf{t}}$ . As we will explain later, $A$ “dominates” $W$ (3). It follows that $(D,k)\in\texttt{RES}(q_{\triangle})\Leftrightarrow(D^{\prime},k)\in\texttt{RES}(q_{\textup{{T}}})$ . ∎
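The reduction can be sketched in code. Because the displayed definition of $D^{\prime}$ is not reproduced here, the sketch assumes $q_{\textup{{T}}}{\,:\!\!-\,}A(x),B(y),C(z),W(x,y,z)$ with $A=\{\langle ab\rangle\mid R(a,b)\in D\}$ , $B=\{\langle bc\rangle\mid S(b,c)\in D\}$ , $C=\{\langle ca\rangle\mid T(c,a)\in D\}$ and $W=\{(\langle ab\rangle,\langle bc\rangle,\langle ca\rangle)\mid a,b,c\in\mathrm{dom}(D)\}$ ; these definitions, the encoding of $\langle ab\rangle$ as a Python pair, and all helper names are assumptions made for illustration.

```python
def triangle_witnesses(R, S, T):
    return [frozenset({("R", a, b), ("S", b, c), ("T", c, a)})
            for (a, b) in R for (b2, c) in S if b2 == b and (c, a) in T]

def tripod_database(R, S, T):
    """Assumed construction of D' = (A, B, C, W); <ab> is the pair (a, b)."""
    dom = {v for rel in (R, S, T) for tup in rel for v in tup}
    A, B, C = set(R), set(S), set(T)
    W = {((a, b), (b, c), (c, a)) for a in dom for b in dom for c in dom}
    return A, B, C, W

def tripod_witnesses(A, B, C, W):
    return [frozenset({("A", x), ("B", y), ("C", z), ("W", x, y, z)})
            for (x, y, z) in W if x in A and y in B and z in C]

def min_hitting_set(witnesses):
    """Smallest set of tuples meeting every witness (iterative deepening)."""
    def search(remaining, budget):
        if not remaining:
            return set()
        if budget == 0:
            return None
        for t in remaining[0]:
            sub = search([w for w in remaining if t not in w], budget - 1)
            if sub is not None:
                return sub | {t}
        return None
    k = 0
    while True:
        found = search(list(witnesses), k)
        if found is not None:
            return found
        k += 1

R = {(1, 2), (5, 6)}
S = {(2, 3), (2, 4), (6, 7)}
T = {(3, 1), (4, 1), (7, 5)}

tri = triangle_witnesses(R, S, T)
trip = tripod_witnesses(*tripod_database(R, S, T))
print(len(tri), len(trip))                        # same number of witnesses
print(len(min_hitting_set(tri)), len(min_hitting_set(trip)))
```

On this sample both databases have three witnesses and resilience 2, consistent with the 1:1 correspondence described above.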

Proof of Lemma 6.

Let $q$ be a query with triad ${\mathcal{T}}=\{S_{0},S_{1},S_{2}\}$ . We build a reduction from $\texttt{RES}(q_{\triangle})$ to $\texttt{RES}(q)$ . Given any $D$ that satisfies $q_{\triangle}$ we will produce a database $D^{\prime}$ that satisfies $q$ such that for all $k$ :

$$(D,k)\in\texttt{RES}(q_{\triangle})\;\Leftrightarrow\;(D^{\prime},k)\in\texttt{RES}(q)\qquad(4)$$

We will assume that no variable is shared by all three elements of ${\mathcal{T}}$ (we can ignore any such variable by setting it to a constant). Our proof splits into two cases:

Case 1: $\textup{{var}}(S_{0}),\textup{{var}}(S_{1}),\textup{{var}}(S_{2})$ are pairwise disjoint. Our reduction is similar to the reduction from $q_{\triangle}$ to $q_{\textup{{T}}}$ (2).

We first define the triad relations in $D^{\prime}$ :

[TABLE]

Thus, each tuple of, for example, $S_{0}$ consists of identical entries with value $\langle ab\rangle$ for each pair $R(a,b)\in D$ . Thus, $S_{0},S_{1},S_{2}$ mirror $R,S,T$ , respectively.

To define all the other atoms $A_{i}$ of $D^{\prime}$ , we first partition the variables of $q$ into 4 disjoint sets: $\textup{{var}}(q)=\textup{{var}}(S_{0})\cup\textup{{var}}(S_{1})\cup\textup{{var}}(S_{2})\cup V_{3}$ . Now for each atom $A_{i}$ , arrange its variables in these four groups. Then define the atom $A^{\prime}_{i}$ of $D^{\prime}$ as follows:

[TABLE]

For example, all the variables $v\in\textup{{var}}(S_{0})$ are assigned the value $\langle ab\rangle$ and all the variables $v\in V_{3}$ are assigned $\langle abc\rangle$ .

By the definition of triad, there is a path from $S_{0}$ to $S_{1}$ not using any edges (variables) from $\textup{{var}}(S_{2})$ . Thus, any witness that $D^{\prime}\models q$ which includes occurrences of $\langle ab\rangle$ and $\langle b^{\prime}c^{\prime}\rangle$ must have $b=b^{\prime}$ .

Similarly, a path from $S_{1}$ to $S_{2}$ guarantees that $c$ is preserved and a path from $S_{2}$ to $S_{0}$ guarantees that $a$ is preserved. It follows that the witnesses that $D^{\prime}\models q$ are essentially identical to the witnesses that $D\models q_{\triangle}(x,y,z)$ (See Fig. 17).

Furthermore, any minimum contingency set only needs tuples from $S_{0},S_{1}$ or $S_{2}$ . For example, if a tuple contains $\langle ab\rangle$ or $\langle abc\rangle$ , then it can be replaced by a tuple from $S_{0}$ . Thus the sizes of minimum contingency sets are preserved, i.e., Equation 4 holds, as desired. Thus $\texttt{RES}(q)$ is NP-complete.

Case 2: $\textup{{var}}(S_{i})\cap\textup{{var}}(S_{j})\neq\emptyset$ for some $i\neq j$ : We generalize the construction from Case 1 as follows. Partition $\textup{{var}}(S_{i})$ into those unshared, those shared with $S_{i-1}$ , and those shared with $S_{i+1}$ (Addition is mod 3).

We then assign the relations of the triad as follows:

[TABLE]

Since none of the $S_{i}$ ’s is dominated, in each case both possible values occur, e.g., $a$ and $b$ both occur in the tuples of $S_{0}$ . Thus, as in Case 1, $S_{0},S_{1},S_{2}$ capture $R,S,T$ , respectively. We now partition $\textup{{var}}(q)$ into 7 sets as follows. The key idea is that for each assignment of $x,y,z$ to values $a,b,c$ in $D$ , we will make assignments according to that partition.

[TABLE]

We then define each other atom $A$ in $D^{\prime}$ to be the following set of tuples, where the only difference between atoms is which of the 7 members of the partition of variables occurs in $\textup{{var}}(A)$ .

[TABLE]

By the definition of triad, there is a path from $S_{0}$ to $S_{1}$ not using any edges (variables) from $S_{2}$ , i.e., none from $\textup{{var}}(S_{2})\cup V_{4}\cup V_{6}$ . Thus, any witness including occurrences of some of $\langle ab\rangle,b^{\prime},\langle b^{\prime\prime}c\rangle$ must have $b=b^{\prime}=b^{\prime\prime}$ . Thus, as in Case 1, the witnesses of $D^{\prime}\models q$ are essentially identical to the witnesses of $D\models q_{\triangle}$ and we have reduced $\texttt{RES}(q_{\triangle})$ to $\texttt{RES}(q)$ . ∎

Appendix C Independent Join Paths: details

We give more details on the concept of Independent Join Paths. We start with some intuition by providing examples (Section C.1), state our conjecture, and finish by pointing out how this concept could possibly allow an automated search for hardness proofs (Section C.2), a prospect we are especially excited about.

C.1. IJP Examples

We give here examples of IJPs for various queries and earlier hardness reductions, and provide the intuition for our 4 conditions.

Standard paths. The first example shows that IJPs contain standard paths (Theorem 1) as a special case.

Example 1 ( $q_{\textup{{vc}}}$ ).

Consider our simplest example for an SJ-path implying hardness: $q_{\textup{{vc}}}$ from Fig. 2(a). The following database of 3 tuples forms an IJP:

[TABLE]

(1) We have $R(1)$ and $R(2)$ with $\{1\}\not\subseteq\{2\}$ and $\{2\}\not\subseteq\{1\}$ .

(2) $R(1)$ and $R(2)$ each participate in only one witness, which in this case is the same one.

(3) $R$ being unary, there cannot be any other relation with a strict subset of the constants.

(4) No exogenous relation.

(5) The resilience $\rho(q_{\textup{{vc}}},D)=1$ , but becomes 0 after removing either $R(1)$ or $R(2)$ or both.

Triads. The second example shows that any query with a triad can form IJPs. We illustrate with our favorite triangle query.

Example 2 ( $q_{\triangle}$ ).

Consider the triangle query as the simplest example of a non-linear SJ-free query containing a triad (see Fig. 1(a)). The following database of 7 tuples forms an IJP:

[TABLE]

(1) We have $R(1,2)$ and $R(4,5)$ with $\{1,2\}\not\subseteq\{4,5\}$ and $\{4,5\}\not\subseteq\{1,2\}$ .

(2) $R(1,2)$ only participates in witness $w_{1}=(1,2,3)$ , and $R(4,5)$ only participates in witness $w_{2}=(4,5,3)$ .

(3) No other relation has a strict subset of the constants from $R$ .

(4) No exogenous relation.

(5) The resilience $\rho(q_{\triangle},D)=2$ , but becomes 1 after removing either $R(1,2)$ , or $R(4,5)$ , or both.

Figure 18 illustrates the 3 joins forming the IJP. The connection to our idea from Fig. 8(b) now becomes clearer. Also notice that this IJP forms the basic element of our prior hardness proof for triads.
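Conditions (1) and (2) of this example can be checked mechanically. The following is a minimal sketch, assuming the triangle query has the form $q_{\triangle}{\,:\!\!-\,}R(x,y),S(y,z),T(z,x)$ and using a hypothetical 7-tuple instance: the paper's table is not reproduced here, so the tuple `R(4, 2)` and the third witness it induces are assumptions chosen only to agree with the witnesses $w_{1}=(1,2,3)$ and $w_{2}=(4,5,3)$ named above. The helper names are illustrative.

```python
# Hypothetical 7-tuple instance for q_triangle :- R(x,y), S(y,z), T(z,x).
# It is NOT taken from the paper's table; it is only chosen to be consistent
# with the witnesses w1 = (1,2,3) and w2 = (4,5,3) mentioned in the text.
R = {(1, 2), (4, 5), (4, 2)}
S = {(2, 3), (5, 3)}
T = {(3, 1), (3, 4)}


def triangle_witnesses(R, S, T):
    """Return all valuations (x, y, z) with R(x,y), S(y,z), T(z,x) in the database."""
    return sorted(
        (x, y, z)
        for (x, y) in R
        for (y2, z) in S
        if y2 == y and (z, x) in T
    )


def witnesses_using_r(witnesses, r_tuple):
    """Witnesses whose R-atom is instantiated by r_tuple."""
    return [w for w in witnesses if (w[0], w[1]) == r_tuple]


witnesses = triangle_witnesses(R, S, T)

# Condition (1): the constants of the two endpoint tuples do not contain each other.
endpoints_incomparable = not ({1, 2} <= {4, 5}) and not ({4, 5} <= {1, 2})

# Condition (2): each endpoint tuple participates in exactly one witness.
w_of_r12 = witnesses_using_r(witnesses, (1, 2))
w_of_r45 = witnesses_using_r(witnesses, (4, 5))
```

On this assumed instance the enumeration finds three witnesses, in line with the "3 joins" of Figure 18, and each of the two endpoint tuples appears in exactly one of them.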

More complicated IJPs. The third example uses a more complicated IJP.

Example 3 (more complicated gadget).

Consider the query

[TABLE]

Then the following database forms an IJP:

[TABLE]

(1) We have $A(9)$ and $A(13)$ .

(2) $A(9)$ only participates in witness $w_{1}=(9,8,7)$ and $A(13)$ only participates in witness $w_{2}=(13,12,11)$ .

(3) No other relation has a strict subset of the constants from $A$ .

(4) No exogenous relation.

(5) The resilience $\rho(q_{\textup{{vc}}},D)=4$ with

[TABLE]

but becomes 3 after ( $i$ ) removing $A(9)$ with

[TABLE]

or ( $ii$ ) removing $A(13)$ with

[TABLE]

or ( $iii$ ) removing both with

[TABLE]

Figure 19 illustrates how these 21 tuples create 8 different joins, representing the IJP. It turns out that this IJP is “hidden” and can be spotted by the careful reader in the crossover part of the variable gadget used in Proposition 2.

Condition 4. We next give one example that illustrates why we need condition 4 of our definition for IJPs. In particular, this query is an example in which two relations (instead of only one) are repeated. We know through a dedicated proof that the complexity of this query is in PTIME. We illustrate a “failed attempt” to create an IJP and point out the problems that would arise if we ignored condition 4.

Example 4 (Independent paths).

Consider the following query $q{\,:\!\!-\,}$ ${A}^{\textup{x}}(x),R(x),S(x,y),S(z,y),R(z),{B}^{\textup{x}}(z)$ which contains two repeated relations. We investigate the canonical database

[TABLE]

and its ability to form an IJP.

(1) We have $R(1)$ and $R(3)$ .

(2) $R(1)$ and $R(3)$ participate in only one witness $w=(1,2,3)$ .

(3) No other relation has a strict subset of the constants from $A$ .

(4) Condition 3 requires that ${B}^{\textup{x}}(1)$ and ${A}^{\textup{x}}(3)$ be added to the database, which is currently not the case, and which we ignore for a moment.

(5) The resilience is 1, and becomes 0 if any tuple is removed.

The crucial condition 4 forces us to add ${B}^{\textup{x}}(1)$ and ${A}^{\textup{x}}(3)$ to the database, and then conditions 2 and 5 no longer hold. Adding these tuples forms 2 more joins $\{R(1),{A}^{\textup{x}}(1),S(1,2),S(1,2),R(1),{B}^{\textup{x}}(1)\}$ and $\{R(3),{A}^{\textup{x}}(3),S(3,2),S(3,2),R(3),{B}^{\textup{x}}(3)\}$ , which require both tuples $R(1)$ and $R(3)$ to be removed to make the query false.

In other words, the canonical database is not enough to succeed with the reduction from VC (recall Fig. 8(b)): any two edges incoming to and outgoing from vertex $a$ create additional joins.
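The effect of adding the two exogenous tuples can be made concrete with a small enumeration. This is a sketch under stated assumptions: the canonical database is taken to be the one induced by the witness $w=(1,2,3)$ , namely ${A}^{\textup{x}}(1),R(1),S(1,2),S(3,2),R(3),{B}^{\textup{x}}(3)$ (the paper's table is not reproduced here), and joins are compared by their sets of endogenous tuples only. The function name and the tuple encoding are illustrative.

```python
from itertools import product


def endogenous_witness_sets(A, R, S, B):
    """Distinct sets of endogenous tuples (from R and S) used by witnesses of
    q :- A(x), R(x), S(x,y), S(z,y), R(z), B(z), where A and B are exogenous."""
    edges = set()
    for (x, y), (z, y2) in product(S, S):
        if y == y2 and x in A and x in R and z in R and z in B:
            edges.add(frozenset({("R", x), ("S", x, y), ("S", z, y), ("R", z)}))
    return edges


# Assumed canonical database for the witness (x, y, z) = (1, 2, 3).
R = {1, 3}
S = {(1, 2), (3, 2)}

before = endogenous_witness_sets({1}, R, S, {3})        # A = {1}, B = {3}
after = endogenous_witness_sets({1, 3}, R, S, {1, 3})   # B(1) and A(3) added

# Number of distinct joins (by endogenous tuples) that contain R(1).
joins_with_r1_before = sum(1 for e in before if ("R", 1) in e)
joins_with_r1_after = sum(1 for e in after if ("R", 1) in e)
```

Under these assumptions there is one join before the addition and three afterwards, matching the "2 more joins" above, and $R(1)$ now participates in two of them, so condition 2 fails.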

C.2. Toward an automated proof construction

At its core, each IJP can be considered as a set of “canonical databases” or witnesses, which have been appropriately “aligned.” We give the intuition with the triangle query $q_{\triangle}$ from Example 2 and Fig. 18.

Example 5.

Assume we construct three disjoint canonical databases:

[TABLE]

The total number of constants used is 9, three for each of the three joins.

We can now look at all the possible ways in which these $n=9$ constants can be partitioned into nonempty subsets. The number of such partitions is given by the Bell number, which is 21147 for $n=9$ . Exhaustive enumeration over these 21147 cases will also lead to the partition

[TABLE]

which is isomorphic to the IJP from Fig. 18.

Our Definition 1 now provides a procedure to test that the resulting database indeed forms an IJP.
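The count of 21147 and the exhaustive enumeration can be reproduced with a few lines of code. The sketch below computes Bell numbers via the Bell triangle and enumerates all set partitions recursively; the function names are illustrative and not part of the paper.

```python
def bell_number(n):
    """Return the n-th Bell number, computed with the Bell triangle."""
    row = [1]  # row 0 of the triangle; its first entry is B_0
    for _ in range(n):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


def set_partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


constants = list(range(1, 10))  # the 9 constants of the three disjoint joins
number_of_partitions = sum(1 for _ in set_partitions(constants))
```

Each generated partition corresponds to one way of identifying constants across the three canonical databases; blocks with more than one element are constants that get merged.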

The more general procedure is now as follows:

(1) for an increasing number of joins $k=1,2,3,\ldots$

(2) for all possible partitions

(3) for all pairs of tuples of the same relation that are not dominated

(4) if an exogenous tuple contains a subset of the constants, then possibly add a second tuple

(5) calculate the minimal VC of the resulting hypergraph under the 4 cases $\{(0,0),(0,1),(1,0),(1,1)\}$ , where 0 and 1 mean that a tuple is present or absent, respectively.
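Step (5) of this procedure can be sketched as follows, assuming each witness is represented as a hyperedge over tuple identifiers and the minimal VC is found by brute force (feasible only for small gadgets). The function names and the example hypergraph are hypothetical; the hypergraph is a 3-witness triangle instance chosen to agree with Example 2, not the paper's own table.

```python
from itertools import combinations


def min_hitting_set_size(edges):
    """Size of a smallest set of tuples that intersects every hyperedge (brute force)."""
    edges = [frozenset(e) for e in edges]
    universe = sorted(set().union(*edges)) if edges else []
    for k in range(len(universe) + 1):
        for candidate in combinations(universe, k):
            chosen = set(candidate)
            if all(chosen & e for e in edges):
                return k
    raise ValueError("an empty hyperedge cannot be hit")


def four_case_profile(edges, t1, t2):
    """Minimal VC size for the four presence/absence cases of tuples t1 and t2.
    Following the convention in the text, 0 means present and 1 means absent."""
    profile = {}
    for c1 in (0, 1):
        for c2 in (0, 1):
            removed = {t for t, c in ((t1, c1), (t2, c2)) if c == 1}
            # Deleting a tuple from the database destroys every witness containing it.
            remaining = [e for e in edges if not (set(e) & removed)]
            profile[(c1, c2)] = min_hitting_set_size(remaining)
    return profile


# Hypothetical hypergraph: three triangle witnesses over named tuples.
edges = [
    {"R12", "S23", "T31"},
    {"R42", "S23", "T34"},
    {"R45", "S53", "T34"},
]
profile = four_case_profile(edges, "R12", "R45")
```

For this assumed gadget the profile is 2 when both endpoint tuples are present and 1 in the other three cases, which is the pattern stated in item (5) of Example 2.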

Bibliography

Abiteboul et al. (1995) Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley. https://dl.acm.org/doi/10.5555/551350
Amarilli et al. (2017) Antoine Amarilli, Mikaël Monet, and Pierre Senellart. 2017. Conjunctive Queries on Probabilistic Graphs: Combined Complexity. In PODS. 217–232. https://doi.org/10.1145/3034786.3056121
Bancilhon and Spyratos (1981) F. Bancilhon and N. Spyratos. 1981. Update Semantics of Relational Views. ACM TODS 6, 4 (1981), 557–575. https://doi.org/10.1145/319628.319634
Buneman et al. (2001) Peter Buneman, Sanjeev Khanna, and Wang Chiew Tan. 2001. Why and Where: A Characterization of Data Provenance. In ICDT. 316–330. https://doi.org/10.1007/3-540-44503-X_20
Buneman et al. (2002) Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan. 2002. On Propagation of Deletions and Annotations Through Views. In PODS. 150–158. https://doi.org/10.1145/543613.543633
Chandra and Merlin (1977) Ashok K. Chandra and Philip M. Merlin. 1977. Optimal Implementation of Conjunctive Queries in Relational Data Bases. In STOC. 77–90. https://doi.org/10.1145/800105.803397
Chapman and Jagadish (2009) Adriane Chapman and H. V. Jagadish. 2009. Why not?. In SIGMOD. 523–534. https://doi.org/10.1145/1559845.1559901
