Reasoning about disclosure in data integration in the presence of source constraints

Michael Benedikt, Pierre Bourhis (CRIStAL, CNRS, SPIRALS), Louis Jachiet (CRIStAL, CNRS, SPIRALS), Michaël Thomazo (DI-ENS, ENS Paris, CNRS, PSL, VALDA)

arXiv:1906.00624·cs.LO·December 15, 2020

TL;DR

This paper investigates how source constraints influence privacy disclosure analysis in data integration systems, providing bounds on what an attacker can infer considering source semantics and constraints.

Contribution

It introduces a formal framework for analyzing disclosure in data integration with source constraints, highlighting their significant impact on privacy assessments.

Findings

01. Source constraints significantly affect disclosure analysis.

02. The paper establishes bounds on source-aware disclosure.

03. Constraints can either limit or enable information disclosure.

Abstract

Data integration systems allow users to access data sitting in multiple sources by means of queries over a global schema, related to the sources via mappings. Data sources often contain sensitive information, and thus an analysis is needed to verify that a schema satisfies a privacy policy, given as a set of queries whose answers should not be accessible to users. Such an analysis should take into account not only knowledge that an attacker may have about the mappings, but also what they may know about the semantics of the sources. In this paper, we show that source constraints can have a dramatic impact on disclosure analysis. We study the problem of determining whether a given data integration system discloses a source query to an attacker in the presence of constraints, providing both lower and upper bounds on source-aware disclosure analysis.

Tables

Table 1: Complexity of disclosure. An entry such as PSpace ${}^{U=C3}_{L={\mathsf{QEntail}}}$ means that the corresponding problem is PSpace-complete, where the upper bound is given by Corollary 3 (U=C3) and the lower bound is inherited from entailment. We omit bounds inferred from inclusion (of $\mathcal{M}$ or $\Sigma_{{\mathsf{Source}}}$).

	Unbounded arity				Bounded arity
	${\mathsf{ProjMap}}$	${\mathsf{AtomMap}}$	${\mathsf{GuardedMap}}$	${\mathsf{CQMap}}$	${\mathsf{ProjMap}}$	${\mathsf{AtomMap}}$	${\mathsf{GuardedMap}}$	${\mathsf{CQMap}}$
${\mathsf{IncDep}}$	PSpace ${}^{U=C3}_{L={\mathsf{QEntail}}}$	ExpTime ${}_{L=T7}$	2ExpTime ${}_{L=T6}$	2ExpTime	NP ${}_{L={\mathsf{QEntail}}}$	NP	ExpTime ${}_{L=T6}$	2ExpTime ${}_{L=T8}$
${\mathsf{LTGD}}$	ExpTime ${}_{L=T7}$	ExpTime ${}^{U=T4}$	2ExpTime	2ExpTime	NP	NP ${}^{U=T4}$	ExpTime	2ExpTime
${\mathsf{GTGD}}$	2ExpTime ${}_{L=T6}$	2ExpTime	2ExpTime	2ExpTime	ExpTime ${}_{L=T6}$	ExpTime	ExpTime ${}^{U=C2}$	2ExpTime
${\mathsf{FGTGD}}$	2ExpTime	2ExpTime	2ExpTime	2ExpTime ${}^{U=C1}$	2ExpTime ${}_{L={\mathsf{QEntail}}}$	2ExpTime	2ExpTime	2ExpTime ${}^{U=C1}$

Equations

\begin{array}{rcl}{\mathsf{IsOpen}}(b,t)&\rightarrow&{\mathsf{OpenHours}}(b,t)\\ {\mathsf{PatBdlg}}(p,b)\wedge{\mathsf{IsOpen}}(b,t)&\rightarrow&{\mathsf{VisitingHours}}(p,t)\\ {\mathsf{DocSpec}}(d,s)\wedge{\mathsf{DocBldg}}(d,b)&\rightarrow&{\mathsf{DocList}}(d,s,b)\end{array}

PatDoc (p, d) \to \exists s PatSpec (p, s) \land DocSpec (d, s)

PatBdlg (p, b) \to \exists d PatDoc (p, d) \land DocBldg (d, b)

T (x_{1} \dots x_{n}) \to IsCrit (x_{i})

CritRewrite (Σ_{Source}) \cup CritRewrite (M) \cup IsCrit (M)

CritRewrite_{\textsc{PTime}} (Σ_{Source}) \cup CritRewrite_{\textsc{PTime}} (M) \cup IsCrit (M)

Children_{\forall} (c, c_{α}, c_{β}, ac, ac_{α}, ac_{β}, y_{0}, y_{1}, r) .

Children_{\forall} (c, c_{α}, c_{β}, ac_{c}, c_{Crit}, c_{Crit}, y_{0}, y_{1}, r) \to ac_{c} = c_{Crit} .

HOCWQ (D_{Crit}^{G (M)}, Σ_{Source} \cup Σ_{M} (M), p, G (M))

Children_{\exists} (c_{root},
y_{0}^{1, k}, y_{1}^{1, k}, c_{root}, x, y_{0}^{0}, y_{1}^{0})

Children_{\forall} (c, α, β, ac, ac_{α}, ac_{β}, y_{0}^{1, k}, y_{1}^{1, k}, r, z, y_{0}, y_{1})
\to \exists α_{α}, α_{β}, ac_{α_{α}}, ac_{α_{β}}
Children_{\exists} (α, α_{α}, α_{β}, ac_{α}, ac_{α_{α}}, ac_{α_{β}}, y_{0}^{1, k}, y_{1}^{1, k}, r, z, y_{0}, y_{1})

Children_{\forall} (c, α, β, x, z, z, y_{0}^{k}, y_{1}^{k}, r, z, y_{0}, y_{1})

Children_{\exists} (c, α, β, x, z, ac_{β}, y_{0}^{k}, y_{1}^{k}, r, z, y_{0}, y_{1})

Children_{Q} (c, α, β, ac, ac_{α}, ac_{β}, y_{0}^{1, k}, y_{1}^{1, k}, r, z, y_{0}, y_{1})
\to GenAddr (c, α, y_{0}^{1, k}, y_{1}^{1, k}, r, z, y_{0}, y_{1})

Children_{Q} (c, α, β, y_{0}^{1, k}, y_{1}^{1, k}, r, z, y_{0}, y_{1})
\to GenAddr (c, β, y_{0}^{1, k}, y_{1}^{1, k}, r, z, y_{0}, y_{1})

GenAddr (c_{p}, c_{n}, a_{1}, \dots, a_{i}, \dots, a_{k + i}, \dots, a_{2 k}, r, z, y_{0}, y_{1})
\to GenAddr (c_{p}, c_{n}, a_{1}, \dots, a_{k + i}, \dots, a_{i}, \dots, a_{2 k}, r, z, y_{0}, y_{1})

GenAddr (c_{p}, c_{n}, a_{1}, \dots, a_{2 k}, z, y_{0}, y_{1}) \to \exists v, v_{prev}, v_{next}
Cell (c_{p}, c_{n}, a_{1}, \dots, a_{k}, v, v_{prev}, v_{next}, r, z, y_{0}, y_{1})

Cell (c_{p}, c_{n}, y^{1, k}, l_{i} (x), v_{prev}^{'}, v_{nxt}^{'}, r, z, y_{0}, y_{1})
\to Cell_{i}^{c} (c_{n}, y^{1, k}, x)

Cell (c_{p}, c_{n}, y_{b_{1}}, \dots, y_{b_{j}}, y_{0}, y_{1}, v, v_{prev}, l_{i} (x), r, z, y_{0}, y_{1})
\land Cell_{i}^{c} (c_{n}, y_{b_{1}}, \dots, y_{b_{j}}, y_{1}, y_{0}, z)

Cell (c_{p}, c_{n}, y_{0}, l_{1} (x), v_{prev}, v_{next}, c_{p}, z, y_{0}, y_{1})

Cell (c_{p}, c_{n}, \dots, y_{1}, \dots, a_{n}, l_{2} (x), v_{prev}, v_{next}, c_{p}, z, y_{0}, y_{1})

Children_{Q} (c, α, β, z, ac_{α}, ac_{β}, y_{0}^{k}, y_{1}^{k}, c_{root}, z, y_{0}, y_{1})
\to succ_{α} (c, α)

Cell (c_{p}, c_{n}, b^{1, k}, l_{w} (x), v_{prev}, v_{next}, r, z, y_{0}, y_{1})

Full text

Reasoning about Disclosure in Data Integration in the Presence of Source Constraints

Michael Benedikt$^{1}$, Pierre Bourhis$^{2}$, Louis Jachiet$^{2}$ and Michaël Thomazo$^{3}$

$^{1}$University of Oxford

$^{2}$CNRS CRIStAL, Université Lille, Inria Lille

$^{3}$Inria, DI ENS, ENS, CNRS, PSL University

{pierre.bourhis, louis.jachiet}@univ-lille.fr

1 Introduction

In data integration, users are shielded from the heterogeneity of multiple data sources by querying via a global schema, which provides a unified vocabulary. The relationship between the sources and the user-facing schema is specified declaratively via mapping rules. In data integration systems based on knowledge representation techniques, users pose queries against the global schema, and these queries are answered using data in the sources and background knowledge. The computation of the answers involves reasoning based on the query, the mappings, and any additional semantic information that is known about the global schema.

Data integration brings with it the danger of disclosing information that data owners wish to keep confidential. In declarative data integration, detection of privacy violations is complex: although explicit access to source information may be masked by the global schema, an attacker can infer source facts via reasoning with schema and mapping information.

Example 1.

We consider an information integration setting for a hospital, which internally stores the following data:

[TABLE]

The hospital publishes the following data: ${\mathsf{OpenHours}}(b,t)$ giving opening times $t$ for building $b$ , ${\mathsf{VisitingHours}}(p,t)$ giving times $t$ when a given patient $p$ can be visited, and ${\mathsf{DocList}}(d,s,b)$ listing the doctors $d$ with their specialty $s$ and their building $b$ . Formally the data being exposed is given by the following mappings:

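$$\begin{array}{rcl}{\mathsf{IsOpen}}(b,t)&\rightarrow&{\mathsf{OpenHours}}(b,t)\\ {\mathsf{PatBdlg}}(p,b)\wedge{\mathsf{IsOpen}}(b,t)&\rightarrow&{\mathsf{VisitingHours}}(p,t)\\ {\mathsf{DocSpec}}(d,s)\wedge{\mathsf{DocBldg}}(d,b)&\rightarrow&{\mathsf{DocList}}(d,s,b)\end{array}$$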

Prior work Benedikt et al. (2018) has studied disclosure in knowledge-based data integration, with an emphasis on the role of semantic information on the global schema – in the form of ontological rules that relate the global schema vocabulary. The presence of an ontology can assist in privacy, since distinctions in the source data may become indistinguishable in the ontology. More dangerous from the point of view of protecting information is semantic information about sources. For example, the sources in a data integration setting will generally overlap: that is, they will satisfy referential integrity constraints, saying that data items in one source link to items in another source. Such constraints should be assumed as public knowledge, and with that knowledge the attacker may be able to infer information that was intended to be secret.

Example 2.

*Continuing Example 1, suppose that we know that each patient has a doctor specialized in their condition, which can be formalized as:*

$${\mathsf{PatDoc}}(p,d)\rightarrow\exists s~{\mathsf{PatSpec}}(p,s)\wedge{\mathsf{DocSpec}}(d,s)$$

And that we also know that when a patient is in a building, they must have a doctor there:

$${\mathsf{PatBdlg}}(p,b)\rightarrow\exists d~{\mathsf{PatDoc}}(p,d)\wedge{\mathsf{DocBldg}}(d,b)$$

Due to the presence of these constraints, there can be a disclosure of the relationship of patient to specialty ${\mathsf{PatSpec}}(p,s)$. Indeed, an attacker can see the ${\mathsf{VisitingHours}}$ for $p$, and from this, along with ${\mathsf{OpenHours}}$, they can sometimes infer the building $b$ where $p$ is treated (e.g. if $b$ has a unique set of open hours). From this they may be able to infer, using ${\mathsf{DocList}}$, the specialty that $p$ has been treated for – for example, if all the doctors in $b$ share a specialty.

In this work, we perform a detailed examination of the role of source constraints in disclosing information in the context of data integration. We focus on mappings from the sources given by universal Horn rules, where the global schema comes with no constraints. Since our disclosure problem requires reasoning over all sources satisfying the constraints, we need a constraint formalism that admits effective reasoning. We will look at a variety of well-studied rule-based formalisms, with the simplest being referential constraints, and the most complex being the frontier-guarded rules Baget et al. (2011). While decidability of our disclosure problems will follow from prior work Benedikt et al. (2016), we will need new tools to analyze the complexity of the problem. In Section 3, we give reductions of disclosure problems to the query entailment problem that is heavily-studied in knowledge representation. While a naïve application of the reduction allows us only to conclude very pessimistic bounds, a more fine-grained analysis, combined with some recent results on CQ entailment, will allow us to get much better bounds, in some cases ensuring tractability. In Section 4, we complement these results with lower bounds. Both the upper and lower bounds revolve around a complexity analysis for reasoning with guarded existential rules and a restricted class of equality rules, where the rule head compares a variable and a distinguished constant. We believe this exploration of limited equality rules can be productive for other reasoning problems.

Overall we get a complete picture of the complexity of disclosure in the presence of source constraints for many natural classes: see Table 1 in Section 6 for a summary of our bounds. Full proofs are available at https://hal.inria.fr/hal-02145369.

2 Preliminaries

We adopt standard notions from function-free first-order logic over a vocabulary of relational names. An instance is a finite set of facts. By a query we always mean a conjunctive query (CQ), which is a first-order formula of the form $\exists\vec{x}~{}\bigwedge A_{i}$ , where each $A_{i}$ is an atom. The arity of a CQ is the number of its free variables, and CQs of arity 0 are Boolean.

Data integration.

Assume that the relational names in the vocabulary are split into two disjoint subsets: source and global schema. The arity of such a schema is the maximal arity of its relational names. We consider a given set $\mathcal{M}$ of mapping rules between source relations and global schema relations $\mathcal{T}$. We focus on rules $\phi(\vec{x},\vec{y})\rightarrow\mathcal{T}(\vec{x})$ where $\phi$ is a conjunctive query, there are no repeated variables in $\mathcal{T}(\vec{x})$, and each global schema relation $\mathcal{T}$ is associated with exactly one rule. Such rules are sometimes called “GAV mappings” in the database literature Lenzerini (2002), and the unique $\phi$ associated to a global relation $\mathcal{T}$ is referred to as the definition of $\mathcal{T}$. The rules are guarded ($\mathcal{M}\in{\mathsf{GuardedMap}}$) if for every rule, there exists an atom in the antecedent $\phi$ that contains all the variables of $\phi$. The rules are atomic ($\mathcal{M}\in{\mathsf{AtomMap}}$) if each $\phi$ consists of a single atom, and they are projection maps ($\mathcal{M}\in{\mathsf{ProjMap}}$) if each $\phi$ is a single atom with no repeated variables. Given an instance $\mathcal{D}$ for the source relations, the image of $\mathcal{D}$ under mapping $\mathcal{M}$, denoted $\mathcal{M}(\mathcal{D})$, is the instance for the global schema consisting of all facts $\{\mathcal{T}(\vec{c})\mid\mathcal{D}\models\exists\vec{y}~\phi(\vec{c},\vec{y})\}$, where $\phi$ is the definition of $\mathcal{T}$.
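The image $\mathcal{M}(\mathcal{D})$ can be computed by evaluating each definition as a conjunctive query. The following is a minimal Python sketch, not taken from the paper: atoms are encoded as `(relation, arguments)` pairs, all terms in rule bodies are assumed to be variables, and the sample source facts (`b1`, `alice`, `drX`, and so on) are hypothetical data chosen only to exercise the mappings of Example 1.

```python
def homomorphisms(body, facts):
    """Yield each assignment of body variables to constants that sends every body atom to a fact."""
    def extend(i, env):
        if i == len(body):
            yield env
            return
        pred, args = body[i]
        for fact_pred, fact_args in facts:
            if fact_pred != pred or len(fact_args) != len(args):
                continue
            candidate = dict(env)
            # setdefault binds an unbound variable, or returns its existing binding
            if all(candidate.setdefault(v, c) == c for v, c in zip(args, fact_args)):
                yield from extend(i + 1, candidate)
    yield from extend(0, {})


def image(mappings, facts):
    """Compute M(D): all facts T(c) such that D satisfies the definition of T at c."""
    out = set()
    for target, head_vars, body in mappings:
        for env in homomorphisms(body, facts):
            out.add((target, tuple(env[v] for v in head_vars)))
    return out


# Mappings of Example 1, written as (global relation, head variables, body atoms).
HOSPITAL_MAPPINGS = [
    ("OpenHours", ("b", "t"), [("IsOpen", ("b", "t"))]),
    ("VisitingHours", ("p", "t"), [("PatBdlg", ("p", "b")), ("IsOpen", ("b", "t"))]),
    ("DocList", ("d", "s", "b"), [("DocSpec", ("d", "s")), ("DocBldg", ("d", "b"))]),
]

# Hypothetical source instance D.
SOURCE = {
    ("IsOpen", ("b1", "9am")),
    ("PatBdlg", ("alice", "b1")),
    ("DocSpec", ("drX", "cardio")),
    ("DocBldg", ("drX", "b1")),
}

VIEW = image(HOSPITAL_MAPPINGS, SOURCE)
```

Here `VIEW` plays the role of the instance the attacker sees; the building variable `b` of the second mapping is projected away, which is what hides the patient-to-building link in the absence of constraints.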

Source constraints.

We consider restrictions on the sources in the form of rules. A tuple-generating dependency (TGD) is a universally quantified sentence of the form $\varphi(\mathbf{x},\mathbf{z})\rightarrow\exists\mathbf{y}\psi(\mathbf{x},\mathbf{y})$ , where the body $\varphi(\mathbf{x},\mathbf{z})$ and the head $\psi(\mathbf{x},\mathbf{y})$ are conjunctions of atoms such that each term is either a constant or a variable in $\mathbf{x}\cup\mathbf{z}$ and $\mathbf{x}\cup\mathbf{y}$ , respectively. Variables $\mathbf{x}$ , common to the head and body, are called the frontier variables. A frontier-guarded TGD ( ${\mathsf{FGTGD}}$ ) is a TGD in which there is an atom of the body that contains every frontier variable. We focus on ${\mathsf{FGTGD}}$ s because they have been heavily studied in the database and knowledge representation community, and it is known that many computational problems involving ${\mathsf{FGTGD}}$ s are decidable Baget et al. (2011). In particular this is true of the query entailment problem, which asks, given a finite collection of facts $\mathcal{F}$ , a finite set $\Sigma$ of sentences, and a CQ $Q$ , whether $\mathcal{F}\wedge\Sigma$ entails $Q$ . We use ${\mathsf{QEntail}}(\mathcal{F},\Sigma,Q)$ to denote an instance of this problem and also say that “ $\mathcal{F}$ entails $Q$ w.r.t. constraints $\Sigma$ ”. A special case of ${\mathsf{FGTGD}}$ s are Guarded TGDs ( ${\mathsf{GTGD}}$ s), in which there is an atom containing all body variables. These specialize further to linear TGDs ( ${\mathsf{LTGD}}$ s), whose body consists of a single atom; and even further to inclusion dependencies ( ${\mathsf{IncDep}}$ s), a linear TGD with a single atom in the head, in which no variable occurs multiple times in the body, and no variable occurs multiple times in the head. Even ${\mathsf{IncDep}}$ s occur quite commonly: for example, the source constraints of Example 2 can be rewritten as ${\mathsf{IncDep}}$ s. 
The most specialized class we study is that of unary ${\mathsf{IncDep}}$s (${\mathsf{UID}}$s), which are ${\mathsf{IncDep}}$s with at most one frontier variable.
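The containments between these classes can be made concrete with a small classifier. This is an illustrative sketch only: the encoding of a TGD as a pair of atom lists and the function name `classify` are assumptions introduced here, and constants are not handled. It returns the most specific class among those defined above.

```python
def classify(body, head):
    """Return the most specific class of the TGD body -> exists head.

    body and head are lists of (relation, tuple_of_variables) atoms.
    """
    body_vars = {v for _, args in body for v in args}
    head_vars = {v for _, args in head for v in args}
    frontier = body_vars & head_vars

    # Frontier-guarded: some body atom contains every frontier variable.
    if not any(frontier <= set(args) for _, args in body):
        return "TGD"
    # Guarded: some body atom contains every body variable.
    if not any(body_vars <= set(args) for _, args in body):
        return "FGTGD"
    # Linear: the body is a single atom.
    if len(body) != 1:
        return "GTGD"

    def repeat_free(args):
        return len(set(args)) == len(args)

    # Inclusion dependency: single head atom, no repeated variable in body or head.
    if len(head) != 1 or not repeat_free(body[0][1]) or not repeat_free(head[0][1]):
        return "LTGD"
    # Unary inclusion dependency: at most one frontier variable.
    return "UID" if len(frontier) <= 1 else "IncDep"


# First constraint of Example 2, as written: linear, but with a two-atom head.
EXAMPLE_2 = classify(
    [("PatDoc", ("p", "d"))],
    [("PatSpec", ("p", "s")), ("DocSpec", ("d", "s"))],
)
```

As written, the constraint of Example 2 is classified as an `LTGD` because its head has two atoms; this is consistent with the remark that it can be *rewritten* as inclusion dependencies, which requires splitting the head.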

Queries and disclosure.

The sensitive information in a data integration setting is given by a CQ $p$ over the source schema, which we refer to as the policy. Intuitively, disclosure of sensitive information occurs in a source instance $\mathcal{D}$ whenever the attacker can infer from the image $\mathcal{M}(\mathcal{D})$ that $p$ holds of a tuple in $\mathcal{D}$. Formally, we say an instance $\mathcal{V}$ for the global schema is realizable, with respect to mappings $\mathcal{M}$ and source constraints $\Sigma_{{\mathsf{Source}}}$, if there is some source instance $\mathcal{D}$ that satisfies $\Sigma_{{\mathsf{Source}}}$ such that $\mathcal{M}(\mathcal{D})=\mathcal{V}$. For a realizable $\mathcal{V}$, such instances $\mathcal{D}$ are the possible source instances for $\mathcal{V}$. A query result $p(\vec{t})$ is disclosed at $\mathcal{V}$ if $p(\vec{t})$ holds on all possible source instances for $\mathcal{V}$. A query $p$ admits a disclosure (for mappings $\mathcal{M}$ and source constraints $\Sigma_{{\mathsf{Source}}}$) if there is some realizable instance $\mathcal{V}$ and binding $\vec{t}$ for the free variables of $p$ for which $p(\vec{t})$ is disclosed. In this terminology, the conclusion of Example 2 was that policy ${\mathsf{PatSpec}}(p,s)$ admits a disclosure with respect to the constraints and mappings. For a class of constraints $\mathcal{C}$, a class of mappings ${\mathsf{Map}}$, a class of policies ${\mathsf{Policy}}$, we write ${\mathsf{Disclose_{C}}}(\mathcal{C},{\mathsf{Map}})$ to denote the problem of determining whether a policy (a CQ, unless otherwise stated) admits a disclosure for a set of mappings in ${\mathsf{Map}}$ and a set of source constraints in $\mathcal{C}$. Given $\Sigma_{{\mathsf{Source}}}$, $\mathcal{M}$ and a CQ $p$, the corresponding instance of this problem is denoted by ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}},\mathcal{M},p)$.
In this paper we will focus on disclosure for queries and constraints without constants, although our techniques extend to the setting with constants, as long as distinct constants are not assumed to be unequal.
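The definitions of this subsection can be restated compactly as follows. The name $\mathsf{Poss}(\mathcal{V})$ for the set of possible source instances is introduced here for convenience and is not notation from the paper.

$$\mathsf{Poss}(\mathcal{V})=\{\mathcal{D}\mid\mathcal{D}\models\Sigma_{{\mathsf{Source}}}\text{ and }\mathcal{M}(\mathcal{D})=\mathcal{V}\}$$

$$p(\vec{t})\text{ is disclosed at }\mathcal{V}\iff\mathcal{D}\models p(\vec{t})\text{ for all }\mathcal{D}\in\mathsf{Poss}(\mathcal{V})$$

$$p\text{ admits a disclosure}\iff\text{there are }\mathcal{V},\vec{t}\text{ with }\mathsf{Poss}(\mathcal{V})\neq\emptyset\text{ and }p(\vec{t})\text{ disclosed at }\mathcal{V}$$

The requirement $\mathsf{Poss}(\mathcal{V})\neq\emptyset$ is exactly realizability; without it, an unrealizable $\mathcal{V}$ would vacuously disclose every query.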

3 Reducing Disclosure to Query Entailment

Our first goal is to provide a reduction from ${\mathsf{Disclose_{C}}}({\mathsf{TGD}},{\mathsf{Map}})$ to a finite collection of standard query entailment problems. For simplicity we will restrict to Boolean queries $p$ in stating the results, but it is straightforward to extend the reductions and results to the non-Boolean case. We first recall a prior reduction of ${\mathsf{Disclose_{C}}}({\mathsf{TGD}},{\mathsf{Map}})$ to a more complex problem, the hybrid open and closed world query answering problem Lutz et al. (2013, 2015); Franconi et al. (2011), denoted ${\mathsf{HOCWQ}}$ . ${\mathsf{HOCWQ}}$ takes as input a set of facts $\mathcal{F}$ , a collection of constraints $\Sigma$ , a Boolean query $Q$ , and additionally a subset ${\cal C}$ of the vocabulary. A possible world for such ${\mathsf{HOCWQ}}(\mathcal{F},\Sigma,Q,{\cal C})$ is any instance $\mathcal{D}$ containing $\mathcal{F}$ , satisfying $\Sigma$ , and such that for each relation $C\in{\cal C}$ , the $C$ -facts in $\mathcal{D}$ are the same as the $C$ -facts in $\mathcal{F}$ . ${\mathsf{HOCWQ}}(\mathcal{F},\Sigma,Q,{\cal C})$ holds if $Q$ holds in every possible world. Note that the query entailment problem is a special case of ${\mathsf{HOCWQ}}$ , where ${\cal C}$ is empty.

Given a set of mapping rules $\mathcal{M}$ of the form $\phi(\vec{x},\vec{y})\rightarrow\mathcal{T}(\vec{x})$, we let ${\cal G}(\mathcal{M})$ be the set of global schema predicates, and let $\Sigma_{\mathcal{M}}(\mathcal{M})$ be the mapping rules, considered as bi-directional constraints between global schema predicates and sources.

We now recall one of the main results of Benedikt et al. (2016):

Theorem 1.

There is an instance $\mathcal{D}^{\prime}$ computable in linear time from $\Sigma_{{\mathsf{Source}}},\mathcal{M},p$ , such that ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}},\mathcal{M},p)$ holds if and only if ${\mathsf{HOCWQ}}(\mathcal{D}^{\prime},\Sigma_{{\mathsf{Source}}}\cup\Sigma_{\mathcal{M}}(\mathcal{M}),p,{\cal G}(\mathcal{M}))$ holds.

In fact, the arguments in Benedikt et al. (2016) show that $\mathcal{D}^{\prime}$ can be taken to be a very simple instance, the critical instance over the global schema ${\cal G}(\mathcal{M})$, denoted $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$. Here $\mathcal{D}_{{\mathsf{Crit}}}^{\mathcal{S}}$, for $\mathcal{S}$ a set of predicates, denotes the instance that mentions only a single element $c_{{\mathsf{Crit}}}$ and contains, for each relation $R$ in $\mathcal{S}$ of arity $n$, the fact $R(c_{{\mathsf{Crit}}},\ldots,c_{{\mathsf{Crit}}})$.
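For concreteness, the critical instance is trivial to build. The sketch below assumes a schema given as a dictionary from relation names to arities and uses the string `"c_crit"` to stand for $c_{{\mathsf{Crit}}}$; both choices are illustrative.

```python
def critical_instance(schema, c_crit="c_crit"):
    """Return one fact R(c_crit, ..., c_crit) for each relation R of the schema."""
    return {(rel, (c_crit,) * arity) for rel, arity in schema.items()}


# Global schema of Example 1 with its arities.
GLOBAL_SCHEMA = {"OpenHours": 2, "VisitingHours": 2, "DocList": 3}
D_CRIT = critical_instance(GLOBAL_SCHEMA)
```

The instance has exactly one fact per global relation, so its size is linear in the schema, in line with the linear-time bound of Theorem 1.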

Corollary 1.

${\mathsf{Disclose_{C}}}({\mathsf{FGTGD}},{\mathsf{CQMap}})$ *is in 2ExpTime.*

Proof.

The non-classical aspect of ${\mathsf{HOCWQ}}$ comes into play with rules of $\Sigma_{\mathcal{M}}(\mathcal{M})$ of the form $\phi(\vec{x},\vec{y})\rightarrow\mathcal{T}(\vec{x})$. But in the context of $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$, these can be rewritten as single-constant equality rules (${\mathsf{SCEQrule}}$s) $\phi(\vec{x},\vec{y})\rightarrow\bigwedge_{i}x_{i}=c_{{\mathsf{Crit}}}$. Such rules remain in the Guarded Negation Fragment of first-order logic, which also subsumes ${\mathsf{FGTGD}}$s, while having a query entailment problem in 2ExpTime Bárány et al. (2015). ∎
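As a worked instance of this rewriting (an illustration added here, not an equation from the paper), consider the second mapping rule of Example 1. Since ${\mathsf{VisitingHours}}$ is closed and contains only the fact ${\mathsf{VisitingHours}}(c_{{\mathsf{Crit}}},c_{{\mathsf{Crit}}})$, the rule acts as the single-constant equality rule

$${\mathsf{PatBdlg}}(p,b)\wedge{\mathsf{IsOpen}}(b,t)\rightarrow p=c_{{\mathsf{Crit}}}\wedge t=c_{{\mathsf{Crit}}}$$

Only the exported variables $p$ and $t$ are forced to equal $c_{{\mathsf{Crit}}}$; the projected variable $b$ is left unconstrained by this rule.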

We now want to conduct a finer-grained analysis, looking for cases that give lower complexity. To do this we will transform further into a classical query entailment problem. This will require a transformation of our query $p$ , a transformation of our source constraints and mappings into a new set of constraints, and a transformation of the instance $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$ . The idea of the transformation is that we remove the ${\mathsf{SCEQrule}}$ s that are implicit in the ${\mathsf{HOCWQ}}$ problem, replacing them with constraints and queries that reflect all the possible impacts the rules might have on identifying two variables.

We first describe the transformation of the query and the constraints. They will involve introducing a new unary predicate ${\mathsf{IsCrit}}(x)$; informally this states that $x$ is equal to $c_{{\mathsf{Crit}}}$. Consider a CQ $Q=\exists\vec{y}~\bigwedge A_{i}$. An annotation of $Q$ is a subset of $Q$’s variables. Given an annotation ${\mathsf{Annot}}$ of $Q$, we let $Q_{\mathsf{Annot}}$ be the query obtained from $Q$ by performing the following operation for each $v$ in ${\mathsf{Annot}}$: for every occurrence $j$ of $v$ except the first one, replace $v$ with a fresh variable $v_{j}$, and add the conjuncts ${\mathsf{IsCrit}}(v_{j})$ as well as ${\mathsf{IsCrit}}(v)$ to $Q_{\mathsf{Annot}}$. A critical-instance rewriting of a CQ $Q$ is a CQ obtained by applying the above process to $Q$ for some annotation. We write $Q_{\mathsf{Annot}}\in{\mathsf{CritRewrite}}(Q)$ to indicate that $Q_{\mathsf{Annot}}$ is such a rewriting.

To transform the mapping rules and constraints to a new set of constraints using ${\mathsf{IsCrit}}(x)$, we lift the notion of critical-instance rewriting to TGDs in the obvious way: a critical-instance rewriting of a TGD $\sigma$ (either in $\Sigma_{{\mathsf{Source}}}$ or $\Sigma_{\mathcal{M}}(\mathcal{M})$) is any TGD formed by applying the above process to the body of $\sigma$. We write $\sigma_{\mathsf{Annot}}\in{\mathsf{CritRewrite}}(\Sigma)$ to indicate that $\sigma_{\mathsf{Annot}}$ is a critical-instance rewriting for a $\sigma\in\Sigma$, and similarly for mappings. For example, the second mapping rule in Example 1 has several rewritings; one of them will change the rule body to ${\mathsf{PatBdlg}}(p,b)\wedge{\mathsf{IsOpen}}(b^{\prime},t)\wedge{\mathsf{IsCrit}}(b)\wedge{\mathsf{IsCrit}}(b^{\prime})$.
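The annotation process can be sketched in a few lines of Python. The encoding of atoms as `(relation, arguments)` pairs and the naming of fresh variables as `v#j` are assumptions made for illustration (the sketch assumes `#` does not occur in the original variable names).

```python
from itertools import chain, combinations


def annotate(atoms, annot):
    """Apply the critical-instance rewriting for one annotation (a set of variables)."""
    count = {}
    rewritten, crit = [], []
    for pred, args in atoms:
        new_args = []
        for v in args:
            if v not in annot:
                new_args.append(v)
                continue
            count[v] = count.get(v, 0) + 1
            # The first occurrence keeps its name; later ones get a fresh variable.
            name = v if count[v] == 1 else f"{v}#{count[v]}"
            new_args.append(name)
            crit.append(("IsCrit", (name,)))
        rewritten.append((pred, tuple(new_args)))
    return rewritten + crit


def crit_rewrite(atoms):
    """Enumerate the rewritings for all annotations: one per subset of variables."""
    variables = sorted({v for _, args in atoms for v in args})
    subsets = chain.from_iterable(
        combinations(variables, r) for r in range(len(variables) + 1)
    )
    return [annotate(atoms, set(s)) for s in subsets]


# Body of the second mapping rule of Example 1.
BODY = [("PatBdlg", ("p", "b")), ("IsOpen", ("b", "t"))]
ONE_REWRITING = annotate(BODY, {"b"})
ALL_REWRITINGS = crit_rewrite(BODY)
```

Annotating only `b` splits its two occurrences and marks both as critical, which corresponds to the rewriting of the second mapping rule mentioned in the text. A body with three variables already yields $2^{3}=8$ rewritings, which is the source of the exponential blow-up in the general reduction.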

Our transformed constraints will additionally use the set of constraints ${\mathsf{IsCrit}}(\mathcal{M})$ , including all rules:

$$\mathcal{T}(x_{1}\ldots x_{n})\rightarrow{\mathsf{IsCrit}}(x_{i})$$

where $\mathcal{T}$ ranges over the global schema and $1\leq i\leq n$. Informally ${\mathsf{IsCrit}}(\mathcal{M})$ states that all elements in the mapping image must be $c_{{\mathsf{Crit}}}$. We also need to transform the instance, using a source instance with “witnesses for the target facts”. Consider a fact $\mathcal{T}(c_{{\mathsf{Crit}}}\ldots c_{{\mathsf{Crit}}})$ in $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$ formed by applying a mapping rule $\bigwedge_{i}A_{i}(\vec{x}_{i},\vec{y}_{i})\rightarrow\mathcal{T}(\vec{x})$ in $\mathcal{M}$. The set of witness tuples for $\mathcal{T}(\vec{x})$ is the set $A_{i}(\vec{c})$, where $\vec{c}$ contains $c_{{\mathsf{Crit}}}$ in each position containing a variable $x_{j}$ and contains a constant $c_{y_{j}}$ in every position containing a variable $y_{j}$. That is, the witness tuples are witnesses for the fact $\mathcal{T}(c_{{\mathsf{Crit}}}\ldots c_{{\mathsf{Crit}}})$, where each existential witness is chosen fresh. Let ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ be the instance formed by taking the witness tuples for every fact $\mathcal{T}(c_{{\mathsf{Crit}}}\ldots c_{{\mathsf{Crit}}})\in\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$.
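The construction of the witness instance can be sketched as follows. This is an illustrative Python version under stated assumptions: mappings are encoded as `(global relation, head variables, body atoms)` triples, and freshness of the existential witnesses is obtained by naming each constant after its rule and variable (`c_<relation>_<variable>`), a convention chosen here rather than taken from the paper.

```python
def hide(mappings, c_crit="c_crit"):
    """Build the witness tuples for every fact T(c_crit, ..., c_crit)."""
    facts = set()
    for target, head_vars, body in mappings:
        for pred, args in body:
            facts.add((
                pred,
                # Exported variables become c_crit; projected ones get a fresh constant.
                tuple(c_crit if v in head_vars else f"c_{target}_{v}" for v in args),
            ))
    return facts


# Mappings of Example 1.
HOSPITAL_MAPPINGS = [
    ("OpenHours", ("b", "t"), [("IsOpen", ("b", "t"))]),
    ("VisitingHours", ("p", "t"), [("PatBdlg", ("p", "b")), ("IsOpen", ("b", "t"))]),
    ("DocList", ("d", "s", "b"), [("DocSpec", ("d", "s")), ("DocBldg", ("d", "b"))]),
]
WITNESSES = hide(HOSPITAL_MAPPINGS)
```

For Example 1, the only non-critical constant is the witness for the building projected away by the ${\mathsf{VisitingHours}}$ mapping.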

We are now ready to state the reduction of the disclosure problem to query entailment:

Theorem 2.

${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}},\mathcal{M},p)$ *holds exactly when there is a $p_{\mathsf{Annot}}\in{\mathsf{CritRewrite}}(p)$ such that ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ entails $p_{\mathsf{Annot}}$ w.r.t. constraints:*

$${\mathsf{CritRewrite}}(\Sigma_{{\mathsf{Source}}})\cup{\mathsf{CritRewrite}}(\mathcal{M})\cup{\mathsf{IsCrit}}(\mathcal{M})$$

Note that Theorem 2 does not give a polynomial time reduction: both ${\mathsf{CritRewrite}}(\Sigma_{{\mathsf{Source}}})$ and ${\mathsf{CritRewrite}}(\mathcal{M})$ can contain exponentially many rewritings, and further there can be exponentially many rewritings in ${\mathsf{CritRewrite}}(p)$ .

However, the algorithm does give us a better bound in the case of Guarded TGDs with bounded arity.

Corollary 2.

If we bound the arity of schema relations, then ${\mathsf{Disclose_{C}}}({\mathsf{GTGD}},{\mathsf{GuardedMap}})$ is in ExpTime.

Proof.

First, by introducing additional intermediate relations and source constraints, we can assume that $\mathcal{M}$ contains only projection mappings. Thus we can guarantee that ${\mathsf{CritRewrite}}(\mathcal{M})$ just contains the rules in $\mathcal{M}$ . By introducing intermediate relations and additional source constraints, we can also assume that each ${\mathsf{GTGD}}\in\Sigma_{{\mathsf{Source}}}$ has a body with at most two atoms. Since the arity of relations is fixed, the size of such $1$ - or $2$ -atom bodies is fixed as well. From this we see that the number of constraints in any ${\mathsf{CritRewrite}}(\sigma)$ is polynomial. The reduction in Theorem 2 thus gives us exponentially many ${\mathsf{GTGD}}$ entailment problems of polynomial size. Since entailment over Guarded TGDs with bounded arity is in ExpTime Calì et al. (2013), we can conclude. ∎

3.1 Refinements of the Reduction to Identify Lower Complexity Cases

In order to lower the complexity to ExpTime without bounding the arity, we refine the construction of the function ${\mathsf{CritRewrite}}(\sigma)$ in the case where $\sigma$ is a linear TGD, providing a function ${\mathsf{CritRewrite}}_{\textsc{PTime}}(\sigma)$ that constructs only polynomially many rewritten constraints.

Let $\sigma=B(\vec{x})\rightarrow\exists\vec{y}~{}H(\vec{z})$ be a linear TGD with relation $B$ of arity $k$ , and suppose $\vec{x}$ contains $d$ distinct free variables $V=\{v_{1}\ldots v_{d}\}$ . Let $P$ be the set of pairs $(e,f)$ with $e<f\leq k$ such that the same variable $v_{i}$ sits at positions $e$ and $f$ in $\vec{x}$ . We order $P$ as $(e_{0},f_{0})\ldots(e_{h},f_{h})$ ; for each $(e,f)$ that is not the initial pair $(e_{0},f_{0})$ , we let $(e,f)^{-}$ be its predecessor in the linear order.

We let $B_{e,f}$ denote new predicates of arity $k$ for each $(e,f)\in P$ . Let $\vec{w}$ be a set of $k$ distinct variables, and $\vec{w}^{i=j}$ be formed from $\vec{w}$ by replacing $w_{j}$ with $w_{i}$ . We begin the construction of ${\mathsf{CritRewrite}}_{\textsc{PTime}}(\sigma)$ with the constraints: $B(\vec{w}^{e_{0}=f_{0}})\rightarrow B_{e_{0},f_{0}}(\vec{w}^{e_{0}=f_{0}})$ and $B(\vec{w})\wedge{\mathsf{IsCrit}}(w_{e_{0}})\wedge{\mathsf{IsCrit}}(w_{f_{0}})\rightarrow B_{e_{0},f_{0}}(\vec{w})$ .

For each $(e,f)$ with a predecessor $(e,f)^{-}=(e^{\prime},f^{\prime})$ , we add to ${\mathsf{CritRewrite}}_{\textsc{PTime}}(\sigma)$ the following constraints: $B_{e^{\prime},f^{\prime}}(\vec{w}^{e=f})\rightarrow B_{e,f}(\vec{w}^{e=f})$ and $B_{e^{\prime},f^{\prime}}(\vec{w})\wedge{\mathsf{IsCrit}}(w_{e})\wedge{\mathsf{IsCrit}}(w_{f})\rightarrow B_{e,f}(\vec{w})$ .

Letting $(e_{h},f_{h})$ be the final pair in $P$, we add to ${\mathsf{CritRewrite}}_{\textsc{PTime}}(\sigma)$ the constraint $B_{e_{h},f_{h}}(\vec{x}^{\prime})\rightarrow\exists\vec{y}~H(\vec{z})$ where $\vec{x}^{\prime}$ is obtained from $\vec{x}$ by replacing all but the first occurrence of each variable $v$ by a fresh variable.
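The chain of auxiliary predicates described above can be sketched in Python. Several choices here are assumptions for illustration: positions are 0-indexed rather than 1-indexed, the pairs of $P$ are ordered lexicographically, the new predicate $B_{e,f}$ is named `B_e_f`, fresh variables are named `v#i`, and a TGD whose body has no repeated variable is returned unchanged (a case the text does not spell out).

```python
def crit_rewrite_ptime(body, head):
    """Polynomial-size rewriting of a linear TGD  body -> exists head.

    body is one atom (relation, variables); head is a list of atoms.
    Returns a list of rules, each a (body_atoms, head_atoms) pair.
    """
    pred, xs = body
    k = len(xs)
    # P: pairs of positions e < f holding the same variable.
    pairs = [(e, f) for e in range(k) for f in range(e + 1, k) if xs[e] == xs[f]]
    if not pairs:
        return [([body], head)]  # assumption: nothing to rewrite

    w = tuple(f"w{i}" for i in range(k))

    def merged(e, f):
        """The tuple w with position f replaced by w_e."""
        return tuple(w[e] if i == f else w[i] for i in range(k))

    rules, prev = [], pred
    for e, f in pairs:
        cur = f"{pred}_{e}_{f}"
        # Positions e and f agree because they hold the same value ...
        rules.append(([(prev, merged(e, f))], [(cur, merged(e, f))]))
        # ... or because both values are critical.
        rules.append(
            ([(prev, w), ("IsCrit", (w[e],)), ("IsCrit", (w[f],))], [(cur, w)])
        )
        prev = cur

    # Final rule: keep first occurrences, rename the others to fresh variables.
    seen, x_prime = set(), []
    for i, v in enumerate(xs):
        x_prime.append(f"{v}#{i}" if v in seen else v)
        seen.add(v)
    rules.append(([(prev, tuple(x_prime))], head))
    return rules


# Hypothetical linear TGD  B(x, x, y) -> exists z H(x, z).
RULES = crit_rewrite_ptime(("B", ("x", "x", "y")), [("H", ("x", "z"))])
```

The output contains $2|P|+1$ rules, and $|P|\leq k(k-1)/2$, so the rewriting is polynomial in the arity $k$, in contrast to the exponentially many annotations of the general construction.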

If $\Sigma_{{\mathsf{Source}}}$ consists of ${\mathsf{LTGD}}$ s, we let ${\mathsf{CritRewrite}}_{\textsc{PTime}}(\Sigma_{{\mathsf{Source}}})$ be the result of applying this process to every $\sigma\in\Sigma_{{\mathsf{Source}}}$. Similarly, if $\mathcal{M}$ consists of atomic mappings (implying that the associated rules are ${\mathsf{LTGD}}$ s), then we let ${\mathsf{CritRewrite}}_{\textsc{PTime}}(\mathcal{M})$ be the result of applying the process above to the rule going from source relation to global schema relation associated to each $m\in\mathcal{M}$. Then we have:

Theorem 3.

When $\Sigma_{{\mathsf{Source}}}$ consists of ${\mathsf{LTGD}}$ s, ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}},\mathcal{M},p)$ holds exactly when there is a $p_{\mathsf{Annot}}\in{\mathsf{CritRewrite}}(p)$ such that ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ entails $p_{\mathsf{Annot}}$ w.r.t. the constraints

[TABLE]

We can combine this result with recent work on fine-grained complexity of ${\mathsf{GTGD}}$ s to improve the doubly exponential upper bound of Corollary 1 for linear TGD source constraints and atomic mappings:

Theorem 4.

${\mathsf{Disclose_{C}}}({\mathsf{LTGD}},{\mathsf{AtomMap}})$ *is in ExpTime. If the arity of relations in the source schema is bounded, then the complexity drops to NP, while if further the policy is atomic, the problem is in PTime.*

Proof.

It is sufficient to get an ExpTime algorithm for the entailment problem produced by Theorem 3, since then we can apply it to each $p_{\mathsf{Annot}}$ in ExpTime. The constraints in ${\mathsf{CritRewrite}}_{\textsc{PTime}}(\Sigma_{{\mathsf{Source}}})\cup{\mathsf{CritRewrite}}_{\textsc{PTime}}(\mathcal{M})$ are Guarded TGDs that are not necessarily ${\mathsf{LTGD}}$ s. But the bodies of these guarded TGDs consist of a guard predicate and atoms over a fixed “side signature”, namely the unary predicate ${\mathsf{IsCrit}}$ . It is known that the query entailment for ${\mathsf{IncDep}}$ s and guarded TGDs with a fixed side signature is in ExpTime, with the complexity dropping to NP (resp. PTime) when the arity is fixed (resp. fixed and the query is atomic) Amarilli and Benedikt (2018a). ∎

Can we do better than ExpTime? We can note that when the constraints $\sigma\in\Sigma_{{\mathsf{Source}}}$ are ${\mathsf{IncDep}}$ s, ${\mathsf{CritRewrite}}(\sigma)$ consists only of $\sigma$ ; similarly if a mapping $m\in\mathcal{M}$ is a projection, then ${\mathsf{CritRewrite}}(m)$ consists only of $m$ . This gives us a good upper bound in one of the most basic cases:

Corollary 3.

${\mathsf{Disclose_{C}}}({\mathsf{IncDep}},{\mathsf{ProjMap}})$ *is in PSpace. If further a bound is fixed on the arity of relations in the source schema, then the problem becomes NP, dropping to PTime when the policy is atomic.*

Proof.

Our algorithm guesses a $p_{\mathsf{Annot}}$ in ${\mathsf{CritRewrite}}(p)$ and checks the entailment of Theorem 2. This gives an entailment problem for ${\mathsf{IncDep}}$ s, known to be in PSpace in general, in NP for bounded arity, and in PTime for bounded arity and atomic queries Johnson and Klug (1984). ∎

3.2 Obtaining Tractability

Thus far we have seen cases where the complexity drops to PSpace in the general case, to NP in the bounded arity case, and to PTime for atomic queries. We now present a case where we obtain tractability for arbitrary queries and arity. Recall that a ${\mathsf{UID}}$ is an ${\mathsf{IncDep}}$ where at most one variable is exported. ${\mathsf{UID}}$ s are quite common, capturing referential integrity when data is identified by a single attribute. We can show that restricting to ${\mathsf{UID}}$ s while having only projection maps leads to tractability:

Theorem 5.

${\mathsf{Disclose_{C}}}({\mathsf{UID}},{\mathsf{ProjMap}})$ *is in PTime.*

Proof.

The first step is to refine the reduction of Theorem 2 to get an entailment problem with only ${\mathsf{UID}}$ s, over an instance consisting of a single unary fact ${\mathsf{IsCrit}}(c_{{\mathsf{Crit}}})$. The main issue is to avoid the constraints in $\Sigma_{\mathcal{M}}(\mathcal{M})$, corresponding to the mapping rules. The intuition for this is that on $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$, the only impact of the backward and forward implications of $\Sigma_{\mathcal{M}}(\mathcal{M})$ is to create new facts among the source relations. In these new facts only $c_{{\mathsf{Crit}}}$ is propagated. Rather than creating ${\mathsf{SCEQrule}}$ s (implicitly what happens in the ${\mathsf{HOCWQ}}$ reduction) or generating classical constraints where the impact of the equalities is “baked in” (as in the critical-instance rewritings of Theorems 2 and 3), we truncate the source relations to the positions where non-visible elements occur, while generating ${\mathsf{UID}}$ s on these truncated relations that simulate the impact of going back and forth using $\Sigma_{\mathcal{M}}(\mathcal{M})$.

The second step is to show that query entailment with ${\mathsf{UID}}$ s over the instance consisting only of ${\mathsf{IsCrit}}(c_{{\mathsf{Crit}}})$ is in PTime. This can be seen as an extension of the PTime inference algorithm for ${\mathsf{UID}}$ s Cosmadakis et al. (1990). The idea behind this result is to analyze the classical “chase procedure” for query entailment with TGDs Fagin et al. (2005). In the case of ${\mathsf{UID}}$ s over a unary fact, the shape of the chase model is very restricted; roughly speaking, it is a tree where only a single fact connects two values. Based on this, we can simplify the query dramatically, making it into an acyclic query where any two variables co-occur in at most one predicate. Once query simplification is performed, we can reduce query entailment to polynomially many entailment problems involving individual atoms in the query. These in turn can be solved using the ${\mathsf{UID}}$ inference procedure of Cosmadakis et al. (1990). ∎
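To give some intuition for why inference among ${\mathsf{UID}}$ s is cheap, the following sketch treats the simplest setting only: implication among ${\mathsf{UID}}$ s alone, where the sound and complete rules are reflexivity and transitivity, so that implication is reachability between (relation, position) pairs. This is an illustration and not the procedure of the cited work; the encoding of a ${\mathsf{UID}}$ $R[i]\subseteq S[j]$ as a pair `((R, i), (S, j))` and the name `uid_implied` are assumptions made here.

```python
from collections import deque

def uid_implied(uids, source, target):
    """Decide whether source ⊆ target follows from the given UIDs.

    uids: iterable of ((R, i), (S, j)) pairs, each read as R[i] ⊆ S[j].
    source, target: (relation, position) pairs.
    """
    graph = {}
    for lhs, rhs in uids:
        graph.setdefault(lhs, []).append(rhs)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:  # reflexivity is covered by the initial node
            return True
        for nxt in graph.get(node, []):  # transitivity: follow one more UID
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

The search visits each (relation, position) pair at most once, so the running time is linear in the number of dependencies.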

4 Lower Bounds

We now focus on providing lower bounds for ${\mathsf{Disclose_{C}}}(\mathcal{C},{\mathsf{Map}})$, showing in particular that the upper bounds provided in Section 3 cannot be substantially improved. For many classes of constraints it is easy to see that the complexity of disclosure inherits the lower bounds for the classical entailment problem for the class. From this we get a number of matching lower bounds; e.g. 2ExpTime for ${\mathsf{GTGD}}$ constraints, PSpace for ${\mathsf{IncDep}}$ constraints. But note that in some cases the upper bounds we have provided for disclosure in Section 3 are higher than the complexity of entailment over the source constraints. For example, for ${\mathsf{IncDep}}$ s we have provided only a 2ExpTime upper bound for guarded mappings (from Corollary 1), and only an exponential bound for atomic mappings (from Theorem 4). This suggests that the form of the mappings influences the complexity as well, as we now show.

Most of our proofs for hardness above the entailment bound for source constraints rely on the encoding of a Turing machine. Source constraints are used to generate the underlying structures (tree of configurations, tape of a Turing machine) while mappings are used to ensure consistency (a universal configuration is accepting if and only if all its successor configurations are accepting, the content of the tape is consistently represented,…). To illustrate our approach, we sketch the proof of the following result.

Theorem 6.

${\mathsf{Disclose_{C}}}({\mathsf{IncDep}},{\mathsf{GuardedMap}})$ *and* ${\mathsf{Disclose_{C}}}({\mathsf{GTGD}},{\mathsf{ProjMap}})$ *are 2ExpTime-hard, and are ExpTime-hard even in bounded arity.*

Proof.

Recall that Theorem 1 relates disclosure to a ${\mathsf{HOCWQ}}$ problem on $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$. Also recall from Section 3 the intuition that such a problem amounts to a classical entailment problem for a CQ over a very simple instance, using the source dependencies and ${\mathsf{SCEQrule}}$ s of the form $\phi(\vec{x})\rightarrow x=c_{{\mathsf{Crit}}}$, where $\phi$ will be the body of a mapping. We will sketch how to simulate an alternating ExpSpace Turing machine $\mathcal{M}$ by a ${\mathsf{QEntail}}$ problem with ${\mathsf{IncDep}}$ s and guarded ${\mathsf{SCEQrule}}$ s. This can in turn be simulated using our ${\mathsf{HOCWQ}}$ problem.

We first build a tree of configurations using ${\mathsf{IncDep}}$ s, such that each node has a type (existential or universal) and is the parent of two nodes (called $\alpha$ -successor and $\beta$ -successor) of the opposite type. This tree structure is represented, together with additional information, by atoms such as:

[TABLE]

Intuitively, this states that $c$ is a universal configuration, parent of $c_{\alpha}$ and $c_{\beta}$. The value $ac$ (resp. $ac_{\alpha}$, resp. $ac_{\beta}$) is the acceptance bit for $c$ (resp. $c_{\alpha}$, resp. $c_{\beta}$), which will be made equal to $c_{{\mathsf{Crit}}}$ if and only if the configuration represented by $c$ (resp. $c_{\alpha}$, resp. $c_{\beta}$) is accepting. $\vec{y_{0}},\vec{y_{1}}$ will be used to represent cell addresses, while $r$ is the identifier of the root of the configuration tree. The initial instance is such an atom, where the first position and the last position are the same constant, $\vec{y_{0}}$ is a vector of $n$ $0$’s, $\vec{y_{1}}$ is a vector of $n$ $1$’s, and all other arguments are distinct constants.

We use ${\mathsf{SCEQrule}}$ s to propagate acceptance information up in the tree. For instance, a universal configuration is accepting if both its successors are accepting. This is simulated by the following ${\mathsf{SCEQrule}}$ :

[TABLE]

To simulate $\mathcal{M}$, we need access to an exponential number of cells for each configuration. We identify a cell by the configuration it belongs to and an address, which is a vector, generated by ${\mathsf{IncDep}}$ s, of length $n$ whose arguments are either $0$ or $1$. The atom for representing a cell is thus ${\mathsf{Cell}}(c_{p},c,\vec{addr},\vec{v},\vec{v}_{prev},\vec{v}_{next})$, where $c_{p}$ is the parent configuration of $c$, which is the configuration to which the represented cell belongs, $\vec{addr}$ is the address of the cell, $\vec{v}$ its content, $\vec{v}_{prev}$ the content of the previous cell, and $\vec{v}_{next}$ the content of the next cell. Note that this representation is redundant, and we need to use ${\mathsf{SCEQrule}}$ s to ensure its consistency.

Note that $\vec{v}$ is a tuple whose length is the size of $(\Sigma\cup\{\flat\})\times(Q\cup\{\bot\})$. Each position corresponds to an element of that set, and the content of a represented cell is the element which corresponds to the unique position in which $c_{{\mathsf{Crit}}}$ appears.
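The encoding of cell contents can be pictured as a one-hot vector in which $c_{{\mathsf{Crit}}}$ marks the chosen element. The sketch below is only an illustration: the ordering of positions, the placeholder strings `"blank"` and `"no_head"` standing for $\flat$ and $\bot$, the null names `null_i`, and the function names are all assumptions made here.

```python
from itertools import product

def cell_alphabet(tape_symbols, states, blank="blank", no_head="no_head"):
    """Enumerate (Σ ∪ {blank}) × (Q ∪ {no_head}) in a fixed (assumed) order."""
    return list(product(list(tape_symbols) + [blank], list(states) + [no_head]))

def encode_cell(content, alphabet, crit="c_crit"):
    """Vector with crit at the position of `content` and distinct nulls elsewhere."""
    if content not in alphabet:
        raise ValueError("unknown cell content")
    return tuple(crit if entry == content else f"null_{i}"
                 for i, entry in enumerate(alphabet))

def decode_cell(vector, alphabet, crit="c_crit"):
    """Return the element whose position holds crit; it must be unique."""
    hits = [alphabet[i] for i, value in enumerate(vector) if value == crit]
    if len(hits) != 1:
        raise ValueError("cell content is not uniquely determined")
    return hits[0]
```

With two tape symbols and two states, the vector has $(2+1)\times(2+1)=9$ positions, exactly one of which holds the critical value in a consistent cell.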

We now explain how to build the representation of the initial tape, and simulate the transition function. Both steps are done by unifying some nulls with $c_{{\mathsf{Crit}}}$. W.l.o.g., we assume that the initial tape contains a symbol $l$ in the first cell, that the head of $\mathcal{M}$ points to this cell in state $s$, and that $(l,s)$ corresponds to the first bit of $\vec{v}$. We thus use a ${\mathsf{SCEQrule}}$ to set this bit to $c_{{\mathsf{Crit}}}$ in the first cell of the first configuration. We then set (w.l.o.g.) the second bit of all the other cells of that configuration to $c_{{\mathsf{Crit}}}$ (assuming this represents $(\flat,\bot)$).

To simulate the transitions, we note that the content of a cell in a configuration depends only on the content of the same cell in the parent configuration, along with the content of the parent’s previous and next cells. We thus add a ${\mathsf{SCEQrule}}$ that checks for the presence of $c_{{\mathsf{Crit}}}$ specifying the content of three consecutive cells in a configuration, and unifies a null with $c_{{\mathsf{Crit}}}$ to specify the content of the corresponding cell of a child configuration.

The argument above uses ${\mathsf{IncDep}}$ s and ${\mathsf{GuardedMap}}$ s, but we can simplify the mappings to ${\mathsf{ProjMap}}$ using ${\mathsf{GTGD}}$ s. ∎

A simple variation of the construction used for PSpace-hardness of entailment with ${\mathsf{IncDep}}$ s Casanova et al. (1984) shows that our upper bounds for ${\mathsf{IncDep}}$ source constraints and atomic maps are tight. The case of ${\mathsf{LTGD}}$ source constraints and projection maps can be done via reduction to that of ${\mathsf{IncDep}}$ source constraints and atomic maps:

Theorem 7.

${\mathsf{Disclose_{C}}}({\mathsf{IncDep}},{\mathsf{AtomMap}})$ *and* ${\mathsf{Disclose_{C}}}({\mathsf{LTGD}},{\mathsf{ProjMap}})$ *are both ExpTime-hard.*

The above results, coupled with the argument that the lower bounds for entailment are inherited by disclosure, show tightness of all upper bounds from Table 1 in the unbounded arity case. Another variation of the encoding in Theorem 6 shows that with no restriction on the mappings one cannot do better than the 2ExpTime upper bound of Corollary 1, even for ${\mathsf{IncDep}}$ constraints in bounded arity:

Theorem 8.

${\mathsf{Disclose_{C}}}({\mathsf{IncDep}},{\mathsf{CQMap}})$ *is 2ExpTime-hard in bounded arity.*

The theorem above, again combined with results showing that the lower bounds for entailment are inherited, suffices to show tightness of all upper bounds from Table 1 in the case of bounded arity.

We can also show that our tractability result for ${\mathsf{UID}}$ constraints and projection maps does not extend when either the maps or the constraints are broadened. Informally, this is because with these extensions we can generate an instance on which CQ querying is NP-hard.

5 Related Work

Disclosure analysis has been approached from many angles. We do not compare with the vast amount of work that analyzes probabilistic mechanisms for releasing information, providing probabilistic guarantees on disclosure Dwork (2006). Our work focuses on the impact of reasoning on mapping-based mechanisms used in knowledge-based information integration, which are deterministic; thus one would prefer, and can hope for, deterministic guarantees on disclosure. We deal here with the analysis of disclosure, while there is a complementary literature on how to enforce privacy Biskup and Weibert (2008); Bonatti et al. (1995); Bonatti and Sauro (2013); Studer and Werner (2014).

The problem of whether information is disclosed on a particular instance (a variation of the ${\mathsf{HOCWQ}}$ problem introduced in Section 3) has been studied in both the knowledge representation Lutz et al. (2013, 2015); Franconi et al. (2011); Ahmetaj et al. (2016); Amendola et al. (2018) and database communities Abiteboul and Duschka (1998). The corresponding schema-level problem was defined in Benedikt et al. (2016), which allows arbitrary constraints relating the source and the global schema. However, results are provided only for constraints in guarded logics, which do not subsume the case of mappings given here. Our results clarify some issues in prior work: Benedikt et al. (2016) claimed that disclosure with ${\mathsf{IncDep}}$ source constraints and atomic maps is in PSpace, while our Theorem 7 shows that the problem is ExpTime-hard. Our notion of disclosure corresponds to the complement of Benedikt et al. (2018)’s “data-independent compliance”. The formal framework of Benedikt et al. (2018) is orthogonal to ours. On the one hand, source constraints are absent; on the other hand a more powerful mapping language is considered, with existentials in the head of rules, while constraints on the global schema, given by ontological axioms, are now allowed. Benedikt et al. (2018) assume that the attacker has an interface for posing queries against the global schema, with the queries being answered under entailment semantics. In general, the semantic information on the global schema makes disclosure harder, since the outputs of different mapping rules may be indistinguishable by an attacker who only sees the results of reasoning. In contrast, source constraints make disclosure of secrets easier, since they provide additional information to the attacker.

6 Summary and Conclusion

We have isolated the complexity of information disclosure from a schema in the presence of commonly-studied sets of source constraints. A summary of many combinations of mappings $\mathcal{M}$ and source constraints $\Sigma_{{\mathsf{Source}}}$ is given in Table 1: note that all problems are complete for the complexity classes listed. We have shown tractability in the case of ${\mathsf{UID}}$ s and projection maps (omitted in the tables), while showing that lifting the restriction leads to intractability. But we leave open a finer-grained analysis of complexity for frontier-one constraints with more general mappings. Our results depend on a fine-grained analysis of reasoning with TGDs and ${\mathsf{SCEQrule}}$ s, a topic we think is of independent interest.

Acknowledgements

This work was partially funded by CNRS Momentum project “Managing Data without Leak”.

Appendix A Detailed Proofs from Section 3: Upper Bounds for Disclosure

A.1 Proof of Theorem 2: Correctness of the Basic Reduction from Disclosure to Classical Entailment

Recall the statement of Theorem 2, which applies the algorithms ${\mathsf{CritRewrite}}(\Sigma_{{\mathsf{Source}}})$ to TGDs and ${\mathsf{CritRewrite}}(\mathcal{M})$ to mappings.

${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}},\mathcal{M},p)$ holds exactly when there is a $p_{\mathsf{Annot}}\in{\mathsf{CritRewrite}}(p)$ such that ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ entails $p_{\mathsf{Annot}}$ w.r.t. constraints:

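${\mathsf{CritRewrite}}(\Sigma_{{\mathsf{Source}}})\cup{\mathsf{CritRewrite}}(\mathcal{M})\cup{\mathsf{IsCrit}}(\mathcal{M})$.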

By Theorem 1 we know that ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}},\mathcal{M},p)$ is equivalent to ${\mathsf{HOCWQ}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})},\Sigma_{{\mathsf{Source}}}\cup\Sigma_{\mathcal{M}}(\mathcal{M}),p,{\cal G}(\mathcal{M}))$ .

This will immediately allow us to prove one direction of the equivalence. Suppose each of our entailments fails. From this, we see using Sagiv and Yannakakis [1980] that ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ does not entail the disjunction of $p_{\mathsf{Annot}}$ . Thus we have an instance $\mathcal{D}$ extending ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ with facts that may include the ${\mathsf{IsCrit}}$ predicate, where $\mathcal{D}$ satisfies all the rewritten constraints and no rewritten query $p_{\mathsf{Annot}}$ . Note that since $\mathcal{D}$ satisfies the constraints of ${\mathsf{CritRewrite}}(\mathcal{M})$ as well as ${\mathsf{IsCrit}}(\mathcal{M})$ , we know that the element $c_{{\mathsf{Crit}}}$ , if it occurs in ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ , will be labeled with ${\mathsf{IsCrit}}$ .

Form an instance $\mathcal{D}^{\prime}$ by unifying all elements $e$ in $\mathcal{D}$ satisfying ${\mathsf{IsCrit}}$ into a single element $c_{{\mathsf{Crit}}}$ , making $c_{{\mathsf{Crit}}}$ inherit any fact that such an $e$ participates in. That is, we choose $\mathcal{D}^{\prime}$ so that if $h$ is the mapping taking any element satisfying ${\mathsf{IsCrit}}$ to $c_{{\mathsf{Crit}}}$ and fixing every other element, then $h$ is a homomorphism from $\mathcal{D}$ onto $\mathcal{D}^{\prime}$ . We can easily verify that $\mathcal{D}^{\prime}$ satisfies the original source constraints $\Sigma_{{\mathsf{Source}}}$ . For each homomorphism $\lambda^{\prime}$ of the body of $\sigma^{\prime}\in\Sigma_{{\mathsf{Source}}}$ into $\mathcal{D}^{\prime}$ , there is a homomorphism $\lambda$ of some $\sigma\in{\mathsf{CritRewrite}}(\sigma^{\prime})$ into $\mathcal{D}$ . We know $\sigma$ is satisfied in $\mathcal{D}$ , and taking the $h$ -image of the tuples that witness this gives us the required witnesses for $\sigma^{\prime}$ in $\mathcal{D}^{\prime}$ . Now let $\mathcal{D}^{\prime}_{0}$ be the restriction of $\mathcal{D}^{\prime}$ to the source relations. We argue that the mapping image of $\mathcal{D}^{\prime}_{0}$ under $\mathcal{M}$ is exactly $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$ . To see that the image of $\mathcal{D}^{\prime}_{0}$ must include all the facts in $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$ , note that $\mathcal{D}$ includes all facts of ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ , which contains witnesses for each such fact. Thus the $h$ -image, namely $\mathcal{D}^{\prime}$ , contains witnesses for each such fact as well. Conversely, suppose the image of $\mathcal{D}^{\prime}_{0}$ includes a fact $F(\vec{d})$ ; we will argue that $F(\vec{d})$ is in $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$ . 
Since $\mathcal{D}$ satisfies ${\mathsf{IsCrit}}(\mathcal{M})$, any such fact in $\mathcal{D}$ must have all $d_{i}$ satisfying ${\mathsf{IsCrit}}$. Thus in $\mathcal{D}^{\prime}_{0}$ each such fact must be of the form $F(c_{{\mathsf{Crit}}}\ldots c_{{\mathsf{Crit}}})$. Thus the $\mathcal{M}$-image of $\mathcal{D}^{\prime}_{0}$ is exactly $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$.

Finally, we claim that $\mathcal{D}^{\prime}$ satisfies $\neg p$. If it satisfied $p$, then $\mathcal{D}$ would satisfy $p_{\mathsf{Annot}}$ for some annotation ${\mathsf{Annot}}$, a contradiction. Putting this all together, we see that $\mathcal{D}^{\prime}$ contradicts ${\mathsf{HOCWQ}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})},\Sigma_{{\mathsf{Source}}}\cup\Sigma_{\mathcal{M}}(\mathcal{M}),p,{\cal G}(\mathcal{M}))$.

Before turning to the other direction, we will explain some other results that will be necessary. The first is the chase procedure for checking entailment of a query $Q$ from a set of constraints $\Sigma$ and a set of facts $\mathcal{D}$. This proceeds by building a sequence of instances $\mathcal{D}=\mathcal{D}_{0}\ldots\mathcal{D}_{i}\ldots$ where each $\mathcal{D}_{i+1}$ is formed from $\mathcal{D}_{i}$ by “firing a rule” $\sigma\in\Sigma$ in $\mathcal{D}_{i}$. Firing $\sigma$ in $\mathcal{D}_{i}$ means finding a homomorphism $\lambda$ from the body of $\sigma$ into $\mathcal{D}_{i}$, and adding facts to extend $\lambda$ to the head, using fresh values for all existentially quantified variables. Such a homomorphism $\lambda$ is called a trigger for the rule firing. The chase of $\mathcal{D}$ under $\Sigma$, denoted ${\mathsf{Chase}}_{\Sigma}(\mathcal{D})$, is any instance formed as the union of such a sequence having the additional property that every rule that could fire in some $\mathcal{D}_{i}$ fires in some later $\mathcal{D}_{j}$. The significance of the chase for query entailment is the following result Fagin et al. [2005]:

Theorem 9.

For an instance $\mathcal{D}$ , set of TGDs $\Sigma$ , and UCQ $Q$ , we have ${\mathsf{QEntail}}(\mathcal{D},\Sigma,Q)$ if and only if some chase model for $\mathcal{D}$ under $\Sigma$ satisfies $Q$ .
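The chase and its use for query entailment can be sketched as follows. This is a minimal illustration under several assumptions made here: atoms are `(predicate, argument_tuple)` pairs, variables are strings starting with `?`, a rule is skipped when its head is already satisfied (a restricted-chase variant), and the number of rounds is bounded by a hypothetical `max_rounds` parameter, so the check is sound but may miss entailments when the chase does not terminate within the bound.

```python
from itertools import count

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def homomorphisms(atoms, facts, subst=None):
    """Yield every extension of subst mapping all atoms into facts."""
    subst = dict(subst or {})
    if not atoms:
        yield subst
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        new, ok = dict(subst), True
        for term, value in zip(args, fact_args):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from homomorphisms(rest, facts, new)

def chase(facts, rules, max_rounds=10):
    """Fire rules (body_atoms, head_atoms) round by round, up to max_rounds."""
    facts = set(facts)
    fresh = count()
    for _ in range(max_rounds):
        added = set()
        for body, head in rules:
            for trigger in list(homomorphisms(body, facts)):
                # Skip triggers whose head is already witnessed.
                if next(homomorphisms(head, facts | added, trigger), None) is not None:
                    continue
                ext = dict(trigger)
                for _, args in head:
                    for term in args:
                        if is_var(term) and term not in ext:
                            ext[term] = f"_n{next(fresh)}"  # fresh null
                added |= {(p, tuple(ext.get(t, t) for t in args)) for p, args in head}
        if not added:
            return facts
        facts |= added
    return facts

def entails(facts, rules, query, max_rounds=10):
    """Check whether the (bounded) chase satisfies the conjunctive query."""
    return next(homomorphisms(query, chase(facts, rules, max_rounds)), None) is not None
```

For instance, from the fact $R(a,b)$ and the rule $R(x,y)\rightarrow\exists z~S(y,z)$, the sketch derives one fact $S(b,n)$ with a fresh null $n$, so the query $R(u,v)\wedge S(v,w)$ is entailed while $S(a,w)$ is not.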

We will also need a variation of the chase for the problem ${\mathsf{HOCWQ}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})},\Sigma_{{\mathsf{Source}}}\cup\Sigma_{\mathcal{M}}(\mathcal{M}),p,{\cal G}(\mathcal{M}))$ , taken from Benedikt et al. [2016]. The visible chase is a sequence of source instances $\mathcal{D}_{0}\ldots\mathcal{D}_{n}\ldots$ that begins with $\mathcal{D}_{0}={\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ . $\mathcal{D}_{i+1}$ is formed from $\mathcal{D}_{i}$ by “chasing and merging”. The chase step applies the usual chase procedure described above to $\mathcal{D}_{i}$ with constraints $\Sigma_{{\mathsf{Source}}}$ , creating new facts that possibly contain fresh values. In a merge step, we take a mapping $m\in\mathcal{M}$ and a homomorphism $\lambda$ of the body of $m$ into $\mathcal{D}_{i}$ , and for each free variable $x$ of $m$ , we replace $\lambda(x)$ by $c_{{\mathsf{Crit}}}$ in all facts in which it appears. We say that this is a merge step with $m,\lambda$ on $\mathcal{D}_{i}$ . Since the process is monotone, it must reach a fixpoint, which we refer to as the visible chase of $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$ , denoted ${\mathsf{VisChase}}(\Sigma_{{\mathsf{Source}}},\mathcal{M})$ .
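A single merge step of the visible chase can be sketched as below. The sketch assumes facts are `(predicate, argument_tuple)` pairs and that the homomorphism $\lambda$ is given as a dictionary; the function name `merge_step` and the constant name `c_crit` are hypothetical.

```python
def merge_step(facts, free_vars, hom, crit="c_crit"):
    """Replace hom(x) by crit, for each free variable x of the mapping, in all facts.

    facts: set of (predicate, argument_tuple) pairs.
    free_vars: the free (exported) variables of the mapping m.
    hom: dict sending the variables of the body of m to values of the instance.
    """
    to_merge = {hom[x] for x in free_vars}
    return {(pred, tuple(crit if value in to_merge else value for value in args))
            for pred, args in facts}
```

Note that the replacement is global: a value merged with the critical constant is rewritten in every fact where it occurs, not only in the facts matched by the mapping body.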

Proposition 1.

(Benedikt et al. [2016]) ${\mathsf{HOCWQ}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})},\Sigma_{{\mathsf{Source}}}\cup\Sigma_{\mathcal{M}}(\mathcal{M}),p,{\cal G}(\mathcal{M}))$ holds exactly when ${\mathsf{VisChase}}(\Sigma_{{\mathsf{Source}}},\mathcal{M})$ satisfies $p$.

We now prove the other direction, assuming that ${\mathsf{HOCWQ}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})},\Sigma_{{\mathsf{Source}}}\cup\Sigma_{\mathcal{M}}(\mathcal{M}),p,{\cal G}(\mathcal{M}))$ fails, but one of the entailments holds. By Theorem 9, this means that some chase of ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ under the constraints ${\mathsf{CritRewrite}}(\Sigma_{{\mathsf{Source}}})\cup{\mathsf{CritRewrite}}(\mathcal{M})\cup{\mathsf{IsCrit}}(\mathcal{M})$ satisfies $p_{\mathsf{Annot}}$ for some annotation ${\mathsf{Annot}}$. Let $\mathcal{D}^{\prime}_{0}\ldots\mathcal{D}^{\prime}_{n}\ldots$ denote such a chase sequence for ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ under ${\mathsf{CritRewrite}}(\Sigma_{{\mathsf{Source}}})\cup{\mathsf{CritRewrite}}(\mathcal{M})\cup{\mathsf{IsCrit}}(\mathcal{M})$. We form another sequence $\mathcal{D}_{0}\ldots\mathcal{D}_{n}\ldots$, with $\mathcal{D}_{0}=\mathcal{D}^{\prime}_{0}$, maintaining the invariant that there is a homomorphism $h_{i}$ from $\mathcal{D}^{\prime}_{i}$ to $\mathcal{D}_{i}$ mapping every element satisfying ${\mathsf{IsCrit}}$ to $c_{{\mathsf{Crit}}}$. The inductive step is performed as follows:

•

For every chase step with a rule $\sigma^{\prime}$ of ${\mathsf{CritRewrite}}(\Sigma_{{\mathsf{Source}}})$ applied in $\mathcal{D}^{\prime}_{i}$, having trigger $\lambda^{\prime}$, we know that $\sigma^{\prime}\in{\mathsf{CritRewrite}}(\sigma)$ for some $\sigma\in\Sigma_{{\mathsf{Source}}}$. We can apply the corresponding rule $\sigma$ in $\mathcal{D}_{i}$, with a trigger $\lambda$ that maps a variable $x$ to the $h_{i}$-image of $\lambda^{\prime}(x)$. Thus $\lambda^{\prime}$ composed with $h_{i}$ is $\lambda$.

•

For every chase step in $\mathcal{D}^{\prime}_{i}$ with a rule $\sigma^{\prime}\in{\mathsf{CritRewrite}}(m)$ for $m\in\mathcal{M}$ and a trigger $\lambda$, we apply a merge step in $\mathcal{D}_{i}$ with $m$ and $\lambda$.

Since the chase satisfies $p_{\mathsf{Annot}}$, some $\mathcal{D}^{\prime}_{n}$ must satisfy $p_{\mathsf{Annot}}$. Since $\mathcal{D}_{n}$ contains the image of $\mathcal{D}^{\prime}_{n}$ under the homomorphism $h_{n}$, and $h_{n}$ maps $p_{\mathsf{Annot}}$ to $p$, we see that $\mathcal{D}_{n}$ must satisfy $p$. But $\mathcal{D}_{n}$ is a subinstance of the visible chase for our ${\mathsf{HOCWQ}}$ problem. Thus the assumption that ${\mathsf{HOCWQ}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})},\Sigma_{{\mathsf{Source}}}\cup\Sigma_{\mathcal{M}}(\mathcal{M}),p,{\cal G}(\mathcal{M}))$ fails and Proposition 1 imply that $p$ cannot hold in $\mathcal{D}_{n}$, a contradiction.

A.2 Simplifying Mappings

In this section, we will see that we can simplify mappings to projection maps at the cost of moving to a richer class of source constraints.

Given a problem ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}},\mathcal{M},p)$, we build $\Sigma_{{\mathsf{Source}}}^{\prime}$ and $\mathcal{M}^{\prime}$ in the following way. $\Sigma_{{\mathsf{Source}}}^{\prime}$ consists of $\Sigma_{{\mathsf{Source}}}$ together with the following: for each mapping $\phi(\vec{x},\vec{y})\rightarrow T(\vec{x})$ we create a predicate $R_{\phi}(\vec{x},\vec{y})$ and we add to $\Sigma_{{\mathsf{Source}}}^{\prime}$ the two constraints $\phi(\vec{x},\vec{y})\rightarrow R_{\phi}(\vec{x},\vec{y})$ and $R_{\phi}(\vec{x},\vec{y})\rightarrow\phi(\vec{x},\vec{y})$. $\mathcal{M}^{\prime}$ consists of the mappings $R_{\phi}(\vec{x},\vec{y})\rightarrow T_{\phi}(\vec{x})$.
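The transformation can be sketched in code as follows. This is an illustration under assumptions made here: atoms are `(predicate, argument_tuple)` pairs whose arguments are all variables, a mapping is a pair `(body_atoms, (target, exported_vars))`, rules are `(body_atoms, head_atoms)` pairs, the fresh predicates are named `R_phi0, R_phi1, ...`, and the target relation keeps the name it had in the original mapping.

```python
def simplify_mappings(source_constraints, mappings):
    """Turn arbitrary CQ mappings into projection maps over fresh source predicates.

    Returns (new_source_constraints, new_mappings).
    """
    new_constraints = list(source_constraints)
    new_mappings = []
    for idx, (body, (target, exported)) in enumerate(mappings):
        # Collect the non-exported variables of the body in order of appearance.
        others = []
        for _, args in body:
            for var in args:
                if var not in exported and var not in others:
                    others.append(var)
        r_atom = (f"R_phi{idx}", tuple(exported) + tuple(others))
        # phi(x, y) -> R_phi(x, y) and R_phi(x, y) -> phi(x, y)
        new_constraints.append((list(body), [r_atom]))
        new_constraints.append(([r_atom], list(body)))
        # Projection map: R_phi(x, y) -> T(x)
        new_mappings.append(([r_atom], (target, tuple(exported))))
    return new_constraints, new_mappings
```

Each mapping contributes two source constraints and one projection map, so the size of the new problem is linear in the size of the original one.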

Proposition 2.

We have ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}},\mathcal{M},p)$ if and only if ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}}^{\prime},\mathcal{M}^{\prime},p)$ .

Proof.

To prove the proposition, it is sufficient to prove that $p$ holds on ${\mathsf{VisChase}}(\Sigma_{{\mathsf{Source}}},\mathcal{M})$ if and only if $p$ holds on ${\mathsf{VisChase}}(\Sigma_{{\mathsf{Source}}}^{\prime},\mathcal{M}^{\prime})$ (see Proposition 1). Let $\Pi(\mathcal{D})$ be the instance obtained by removing all the facts $R_{\phi}(\vec{x},\vec{y})$ in $\mathcal{D}$ .

We recall that the visible chase works iteratively: at each step a database $\mathcal{D}_{i+1}$ is created from $\mathcal{D}_{i}$ by chasing all facts and then merging some values with $c_{{\mathsf{Crit}}}$. For the sake of simplicity we suppose that each step is composed of either one rule firing or one merging.

•

We start by proving that ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}},\mathcal{M},p)$ implies ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}}^{\prime},\mathcal{M}^{\prime},p)$ .

Let $\mathcal{D}_{0},\dots$ be a sequence corresponding to ${\mathsf{VisChase}}(\Sigma_{{\mathsf{Source}}},\mathcal{M})$. We build a sequence $\mathcal{D}^{\prime}_{0},\dots$ corresponding to ${\mathsf{VisChase}}(\Sigma_{{\mathsf{Source}}}^{\prime},\mathcal{M}^{\prime})$. We build $\mathcal{D}^{\prime}_{0},\dots$ such that for all $i$ there exists $j$ such that $\mathcal{D}_{i}=\Pi(\mathcal{D}^{\prime}_{j})$, and $h(x)=c_{{\mathsf{Crit}}}$ implies $x=c_{{\mathsf{Crit}}}$.

We prove by induction:

–

$\mathcal{D}_{0}$ is composed of witnesses of $\mathcal{M}$ and $\mathcal{D}^{\prime}_{0}$ of witnesses of $\mathcal{M}^{\prime}$ . We build $\mathcal{D}^{\prime}_{1},\dots,\mathcal{D}^{\prime}_{j}$ such that each $\mathcal{D}^{\prime}_{i}$ is obtained by firing the $i$ -th rule $R_{\phi}(\vec{x},\vec{y})\rightarrow\phi(\vec{x},\vec{y})$ .

–

Let us suppose that $\mathcal{D}_{i}=\Pi(\mathcal{D}^{\prime}_{j})$ and $\mathcal{D}_{i+1}$ is obtained by firing a rule $\sigma$ ; $\sigma$ could have been fired on $\mathcal{D}^{\prime}_{j}$ and thus we can build $\mathcal{D}^{\prime}_{j+1}$ such that $\mathcal{D}_{i+1}=\Pi(\mathcal{D}^{\prime}_{j+1})$ .

–

When $\mathcal{D}_{i+1}$ is obtained by merging values, this means that $\phi(\vec{x},\vec{y})$ holds in $\mathcal{D}_{i}$ and thus $\phi(\vec{x},\vec{y})$ holds in $\Pi(\mathcal{D}^{\prime}_{j})$. Therefore we can use the rule $\phi(\vec{x},\vec{y})\rightarrow R_{\phi}(\vec{x},\vec{y})$ followed by a unification on $R_{\phi}$. Hence we can build $\mathcal{D}^{\prime}_{j+1}=\mathcal{D}^{\prime}_{j}\cup\{R_{\phi}(\vec{x},\vec{y})\}$ and $\mathcal{D}^{\prime}_{j+2}$ such that $\mathcal{D}_{i+1}=\Pi(\mathcal{D}^{\prime}_{j+2})$.

•

For the direction ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}}^{\prime},\mathcal{M}^{\prime},p)$ implies ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}},\mathcal{M},p)$, we start by noticing that, without loss of generality, we can suppose that the sequence $\mathcal{D}^{\prime}_{0},\dots$ of ${\mathsf{VisChase}}(\Sigma_{{\mathsf{Source}}}^{\prime},\mathcal{M}^{\prime})$ starts by firing each rule $R_{\phi}(\vec{x},\vec{y})\rightarrow\phi(\vec{x},\vec{y})$ (it is always possible to generate more facts). We then create $\mathcal{D}_{0},\dots$ such that for all sufficiently large $i$ there exists $j$ such that $\mathcal{D}_{j}=h(\Pi(\mathcal{D}^{\prime}_{i}))$.

–

Once all rules $R_{\phi}(\vec{x},\vec{y})\rightarrow\phi(\vec{x},\vec{y})$ have been fired, we see that we obtain an instance isomorphic to $\mathcal{D}_{0}$ .

–

When $\mathcal{D}_{j}=h(\Pi(\mathcal{D}^{\prime}_{i}))$ and $\mathcal{D}^{\prime}_{i+1}$ is obtained through a merge step, we had $\mathcal{D}^{\prime}_{i}\models R_{\phi}(\vec{x},\vec{y})$. We easily see by induction that this means that we had $\mathcal{D}_{j}\models h(\phi(\vec{x},\vec{y}))$, and thus that we can also perform the merge step on $\mathcal{D}_{j}$.

–

When $\mathcal{D}^{\prime}_{i+1}$ is obtained through a rule, it is either a rule in $\Sigma_{{\mathsf{Source}}}$ that we can reproduce in $\mathcal{D}_{j}$ or it is a rule $\phi(\vec{x},\vec{y})\rightarrow R_{\phi}(\vec{x},\vec{y})$ . In this latter case, we don’t have anything to do as $R_{\phi}(\vec{x},\vec{y})$ will be discarded by $\Pi$ .

Now, we also see that $j$ will grow as $i$ grows since except for rules $\phi(\vec{x},\vec{y})\rightarrow R_{\phi}(\vec{x},\vec{y})$ , our $j$ increases. Therefore at the limit we have that ${\mathsf{VisChase}}(\Sigma_{{\mathsf{Source}}}^{\prime},\mathcal{M}^{\prime})\models p$ implies ${\mathsf{VisChase}}(\Sigma_{{\mathsf{Source}}},\mathcal{M})\models p$ .

∎

Corollary 4.

${\mathsf{Disclose_{C}}}({\mathsf{GTGD}},{\mathsf{GuardedMap}})$ reduces to ${\mathsf{Disclose_{C}}}({\mathsf{GTGD}},{\mathsf{ProjMap}})$ .

A.3 More Details for the Proof of Corollary 2

We recall the statement of Corollary 2:

If we fix the maximal arity of relations in the schema, then ${\mathsf{Disclose_{C}}}({\mathsf{GTGD}},{\mathsf{GuardedMap}})$ is in ExpTime.

We now fill in the details of the proof sketch in the body.

Reducing to ${\mathsf{ProjMap}}$ .

Using Corollary 4, we can reduce the problem to ${\mathsf{Disclose_{C}}}({\mathsf{GTGD}},{\mathsf{ProjMap}})$ . We now show that this latter problem is in ExpTime.

Reducing to two atoms in the body of TGDs.

Given a set of ${\mathsf{GTGD}}$ s $\Sigma_{{\mathsf{Source}}}$ and a set of maps $\mathcal{M}\in{\mathsf{IncDep}}$ , we now reduce ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}},\mathcal{M},p)$ to ${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}}^{\prime},\mathcal{M},p)$ where each ${\mathsf{GTGD}}$ in $\Sigma_{{\mathsf{Source}}}^{\prime}$ has at most two conjuncts in the rule body.

$\Sigma_{{\mathsf{Source}}}^{\prime}$ is built by applying the following process to each ${\mathsf{GTGD}}$ $\phi(\vec{x})\rightarrow\exists\vec{y}~{}R(\vec{t})\in\Sigma_{{\mathsf{Source}}}$ . The constraint $\phi(\vec{x})\rightarrow\exists\vec{y}~{}R(\vec{t})$ is guarded, therefore we can select a guarding conjunct $G_{\phi}(\vec{x})$ such that $\phi(\vec{x})=G_{\phi}(\vec{x})\land Q_{1}(\vec{x})\land\dots\land Q_{k}(\vec{x})$ . When $k\leq 1$ we simply add $\phi(\vec{x})\rightarrow\exists\vec{y}~{}R(\vec{t})$ to $\Sigma_{{\mathsf{Source}}}^{\prime}$ . When $k>1$ , we rewrite this constraint by introducing $k$ fresh predicates $R_{1},\dots,R_{k}$ and producing the following constraints: $G_{\phi}(\vec{x})\land Q_{1}(\vec{x})\rightarrow R_{1}(\vec{x})$ and, for $1\leq i\leq k-1$ , $R_{i}(\vec{x})\land Q_{i+1}(\vec{x})\rightarrow R_{i+1}(\vec{x})$ . Finally we also add $R_{k}(\vec{x})\rightarrow\exists\vec{y}~{}R(\vec{t})$ . It is easy to see that this new problem is equivalent: each constraint in $\Sigma_{{\mathsf{Source}}}$ is implied by its corresponding constraints in $\Sigma_{{\mathsf{Source}}}^{\prime}$ , and if we look at the result of the visible chase, the only facts derived from an $R_{k}(\vec{x})$ are facts $R(\vec{y})$ such that $\phi(\vec{x})$ holds.
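The body-splitting step can be illustrated with a minimal sketch. It assumes atoms are opaque strings and that the auxiliary predicate names produced from the hypothetical `prefix` argument are fresh for each rewritten rule; the function name `split_body` is ours and not part of the paper.

```python
from typing import List, Tuple

Atom = str                       # atoms are opaque strings in this sketch
Rule = Tuple[List[Atom], Atom]   # (body atoms, head)

def split_body(guard: Atom, side_atoms: List[Atom], head: Atom,
               frontier: str, prefix: str = "R") -> List[Rule]:
    """Rewrite guard & Q_1 & ... & Q_k -> head into rules with <= 2 body atoms.

    `prefix` names the auxiliary predicates R_1..R_k; it is assumed to be
    fresh (unused elsewhere) for every rule that is rewritten.
    """
    k = len(side_atoms)
    if k <= 1:
        # Already at most two body atoms: keep the rule unchanged.
        return [([guard] + side_atoms, head)]

    def aux(i: int) -> Atom:
        return f"{prefix}_{i}({frontier})"

    rules: List[Rule] = [([guard, side_atoms[0]], aux(1))]
    for i in range(1, k):
        # R_i(x) & Q_{i+1}(x) -> R_{i+1}(x)
        rules.append(([aux(i), side_atoms[i]], aux(i + 1)))
    # R_k(x) -> head
    rules.append(([aux(k)], head))
    return rules
```

For a rule with a guard and three side atoms, the sketch yields four rules, each with at most two body atoms.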

Rewriting in PTime.

Now that maps are ${\mathsf{ProjMap}}$ s and each ${\mathsf{GTGD}}$ has at most two atoms in its body, we can apply the rewriting presented in Theorem 2. Notice that each ${\mathsf{GTGD}}$ will be rewritten to a bounded number of ${\mathsf{GTGD}}$ s, and the rewriting of the maps will be trivial. Since query entailment with ${\mathsf{GTGD}}$ s is in ExpTime when the arity is bounded, we can conclude the proof.

A.4 Proof of Theorem 3: More Efficient Reduction to Entailment for ${\mathsf{LTGD}}$ Source Constraints and Atomic Mappings

Recall the statement of Theorem 3, which concerns the application of the rewriting algorithms ${\mathsf{CritRewrite}}_{\textsc{PTime}}(\Sigma_{{\mathsf{Source}}})$ for ${\mathsf{LTGD}}$ source constraints $\Sigma_{{\mathsf{Source}}}$ , and the algorithm ${\mathsf{CritRewrite}}_{\textsc{PTime}}(\mathcal{M})$ for atomic mappings $\mathcal{M}$ :

${\mathsf{Disclose}}(\Sigma_{{\mathsf{Source}}},\mathcal{M},p)$ holds exactly when there is a $Q_{\mathsf{Annot}}\in{\mathsf{CritRewrite}}(Q)$ such that ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ entails $Q_{\mathsf{Annot}}$ w.r.t. the constraints

[TABLE]

Let $\Sigma_{{\mathsf{simple}}}={\mathsf{CritRewrite}}_{\textsc{PTime}}(\Sigma_{{\mathsf{Source}}})\cup{\mathsf{CritRewrite}}_{\textsc{PTime}}(\mathcal{M})\cup{\mathsf{IsCrit}}(\mathcal{M})$ and $\Sigma_{\textsc{PTime}}$ be the constraints posed in Theorem 3. By Theorem 2, it is enough to show that query entailment involving $\Sigma_{\textsc{PTime}}$ is equivalent to entailment involving $\Sigma_{{\mathsf{simple}}}$ .

In one direction, suppose that $I$ is a counterexample to entailment involving $\Sigma_{{\mathsf{simple}}}$ . We fire the rules generating atoms $B_{e,f}$ to get instance $I^{\prime}$ . We claim that the constraints of $\Sigma_{\textsc{PTime}}$ hold. Clearly, the rules generating atoms $B_{e,f}$ hold. Further, by construction, for any $e,f$ if $B_{e,f}$ holds exactly when there is an annotation We now consider the rule $B_{e_{h},f_{h}}(\vec{x})\rightarrow\exists\vec{z}~{}H(\vec{z})$ . Considering a $\vec{c}$ such that $B_{e_{h},f_{h}}(\vec{c})$ holds, we want to claim that there is an annotation ${\mathsf{Annot}}$ such that $B_{\mathsf{Annot}}(\vec{c})$ holds.

Recall that each $e_{i},f_{i}$ is associated with some variable $v$ that occurs as both $x_{e_{i}}$ and $x_{f_{i}}$ in $B(\vec{x})$ . If $B_{e_{i},f_{i}}(\vec{c})$ holds, we know that either $c_{e_{i}}=c_{f_{i}}$ or ${\mathsf{IsCrit}}(c_{e_{i}})\wedge{\mathsf{IsCrit}}(c_{f_{i}})$ holds. If the latter happens, then we add the variable $v$ to our annotation. We can then verify that $B_{\mathsf{Annot}}(\vec{c})$ holds.

Since we are assuming that the corresponding constraint of $\Sigma_{{\mathsf{simple}}}$ holds in $I$ , we can conclude that $I^{\prime},\vec{c}\models\exists\vec{z}~{}H(\vec{z})$ . From this we see that $I^{\prime}$ is a counterexample to the entailment involving $\Sigma_{\textsc{PTime}}$ .

In the other direction, let $I^{\prime}$ be a counterexample to the entailment for the constraints in $\Sigma_{\textsc{PTime}}$ . We claim that the constraints of $\Sigma_{{\mathsf{simple}}}$ hold of $I^{\prime}$ . For constraints corresponding to source constraints with no repeated variables in the body, this is easy to verify, so we concentrate on constraints deriving from source constraints that do have repeated variables in the body.

Each of these constraints is of the form $B_{\mathsf{Annot}}(\vec{x})\rightarrow\exists\vec{z}~{}H(\vec{z})$ for some annotation ${\mathsf{Annot}}$ . Fix a $\vec{c}$ such that $B_{\mathsf{Annot}}(\vec{c})$ holds. We claim that $B_{e,f}(\vec{c})$ holds for all $(e,f)\in P$ . We prove this by induction on the position of $(e,f)$ in the ordering of pairs in $P$ . Each $(e,f)$ corresponds to some variable $v$ that is repeated. If $v$ is in ${\mathsf{Annot}}$ , then $B_{\mathsf{Annot}}(\vec{c})$ implies that ${\mathsf{IsCrit}}(c_{e})\wedge{\mathsf{IsCrit}}(c_{f})$ hold. Using the corresponding rule and the induction hypothesis we conclude that $B_{e,f}(\vec{c})$ holds. If $v$ is not in ${\mathsf{Annot}}$ then $B_{\mathsf{Annot}}(\vec{c})$ implies that $c_{e}=c_{f}$ . Using the other rule generating $B_{e,f}$ in $\Sigma_{\textsc{PTime}}$ , as well as the induction hypothesis, we conclude that $B_{e,f}(\vec{c})$ holds. This completes the inductive proof that $B_{e,f}(\vec{c})$ holds. Now using the corresponding constraint of $\Sigma_{\textsc{PTime}}$ we conclude that $I^{\prime},\vec{c}\models\exists\vec{z}~{}H(\vec{z})$ . Since the constraints of $\Sigma_{{\mathsf{simple}}}$ hold, $I^{\prime}$ is also a counterexample to the entailment involving $\Sigma_{{\mathsf{simple}}}$ .

A.5 More details in proof of Theorem 4: upper bounds for ${\mathsf{LTGD}}$ source constraints and atomic maps

Recall the statement of Theorem 4:

The problem ${\mathsf{Disclose_{C}}}({\mathsf{LTGD}},{\mathsf{AtomMap}})$ is in ExpTime. If the arity of relations in the source schema is bounded, then the complexity drops to NP. If further the query is atomic, the problem is in PTime.

We now give more details on the proof. As mentioned in the body, it is sufficient to get an ExpTime algorithm for the entailment problem produced by Theorem 3, since then we can apply it to each $p_{\mathsf{Annot}}$ in ExpTime. The constraints in ${\mathsf{CritRewrite}}_{\textsc{PTime}}(\Sigma_{{\mathsf{Source}}})\cup{\mathsf{CritRewrite}}_{\textsc{PTime}}(\mathcal{M})$ are Guarded TGDs that are not necessarily ${\mathsf{LTGD}}$ s. But the bodies of these guarded TGDs consist of a guard predicate and atoms over a fixed “side signature”, namely the unary predicate ${\mathsf{IsCrit}}$ . We can now apply the linearization technique, originating in Gottlob et al. [2014] and refined in Amarilli and Benedikt [2018a]. Given a side signature ${\cal S}_{{\mathsf{Side}}}$ , this is an algorithm that converts an entailment problem involving a set of non-full ${\mathsf{IncDep}}$ s and Guarded TGDs using ${\cal S}_{{\mathsf{Side}}}$ , producing an equivalent entailment problem involving the same query, but only ${\mathsf{LTGD}}$ s. Further:

•

The algorithm runs in ExpTime in general, and in PTime when the arity of the relations in the input is fixed.

•

The algorithm does not increase the arity of the signature, and thus the size of each output ${\mathsf{LTGD}}$ is polynomially-bounded in the input.

See also Appendix G of Amarilli and Benedikt [2018b] for a longer exposition of the linearization technique. Thus for general arity, we can use this algorithm to get an entailment problem with the same query, a data set exponentially bounded in the input data $I^{\prime}$ and a set of ${\mathsf{LTGD}}$ s, each polynomially-sized in the inputs. By applying a standard first-order query-rewriting algorithm to the query, we reduce this problem to the evaluation of a union of conjunctive queries (UCQ) $Q^{\prime}$ on $I^{\prime}$ . The size of each conjunct in $Q^{\prime}$ is polynomially-bounded in the inputs, and so each conjunct $C$ can be evaluated in time $|I^{\prime}|^{|C|}$ , giving an ExpTime algorithm in total.
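The $|I^{\prime}|^{|C|}$ bound comes from brute-force enumeration of candidate homomorphisms. The following is a minimal sketch of such a naive evaluator, assuming atoms whose arguments are all variables and an instance given as a set of `(predicate, tuple)` facts; the function names are illustrative and this is not the algorithm of any cited system.

```python
from itertools import product

def cq_holds(atoms, facts):
    """Naive CQ evaluation by enumerating all variable assignments.

    atoms: list of (pred, tuple of variable names)
    facts: set of (pred, tuple of values)
    Tries |domain|^|variables| assignments, matching the bound in the text.
    """
    variables = sorted({x for _, args in atoms for x in args})
    domain = sorted({c for _, args in facts for c in args})
    for values in product(domain, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all((pred, tuple(h[x] for x in args)) in facts
               for pred, args in atoms):
            return True
    return False

def ucq_holds(disjuncts, facts):
    """A UCQ holds when at least one of its CQs holds."""
    return any(cq_holds(atoms, facts) for atoms in disjuncts)
```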

For fixed arity, we apply the same algorithm to get an entailment problem using ${\mathsf{IncDep}}$ s of bounded arity, which is known from Johnson and Klug [1984] to be solvable in NP. Further, when the query is atomic, entailment with ${\mathsf{IncDep}}$ s is in PTime.

A.6 Proof of Theorem 5: Disclosure for ${\mathsf{UID}}$ Source Constraints and ${\mathsf{ProjMap}}$ is PTime

We prove that when the source constraints are ${\mathsf{UID}}$ s and the mappings are projections, disclosure analysis is in PTime. By Theorem 1, it suffices to show that the problem ${\mathsf{HOCWQ}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})},\Sigma_{{\mathsf{Source}}}\cup\Sigma_{\mathcal{M}}(\mathcal{M}),p,{\cal G}(\mathcal{M}))$ is in PTime. We will thus first reduce this problem to a problem ${\mathsf{QEntail}}(\mathcal{D},\Sigma,p)$ where $\Sigma$ is composed of ${\mathsf{UID}}$ constraints and $\mathcal{D}$ is composed of a single unary fact ${\mathsf{IsCrit}}(c_{{\mathsf{Crit}}})$ .

Reachable predicates.

We define the entailment graph over a set of ${\mathsf{IncDep}}$ constraints $\Sigma$ . In this graph, nodes correspond to predicates and there is an edge $P\rightarrow R$ for each constraint $P(\vec{x})\rightarrow R(\vec{y})$ . Given an initial set of facts $\mathcal{D}$ , one can compute the set ${\mathsf{Reachable}}(\Sigma,\mathcal{D})$ of entailed predicates. This set is defined as the set of predicates reachable in the entailment graph starting from the predicates appearing in $\mathcal{D}$ .
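The set ${\mathsf{Reachable}}(\Sigma,\mathcal{D})$ is a plain graph reachability computation. Here is a minimal sketch, assuming each ${\mathsf{IncDep}}$ is abstracted to a pair `(body_pred, head_pred)` and each fact to a pair `(pred, args)`; this encoding and the function name are assumptions made for illustration.

```python
from collections import deque

def reachable(constraints, facts):
    """Predicates reachable in the entailment graph from those in `facts`.

    constraints: iterable of (body_pred, head_pred), one per IncDep
    facts: iterable of (pred, args)
    """
    graph = {}
    for body, head in constraints:
        graph.setdefault(body, set()).add(head)
    seen = {pred for pred, _ in facts}
    queue = deque(seen)
    while queue:
        p = queue.popleft()
        for r in graph.get(p, ()):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen
```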

Visible position graph.

In studying tuple-generating dependencies, one often associates a set of dependencies with a graph whose edges represent the flow of data from one relation to another via the dependencies. See, for example, the position graph used in defining the class of weakly acyclic sets of TGDs in Fagin et al. [2005].

We develop another such graph, the visible position graph associated with a set of source constraints and mappings. The nodes are the pairs $(P,i)$ where $P$ is a predicate and $1\leq i\leq ar(P)$ , and there is an edge $(P,i)\rightarrow(R,j)$ when we have an ${\mathsf{IncDep}}$ (either a source constraint or a mapping rule) $P(\vec{x})\rightarrow\exists\vec{y}~{}R(\vec{t})$ with $x_{i}=t_{j}$ . We refer to a node in this graph as a position. A position $(P,i)$ of a relation in the source schema is said to be visible if there is a path from $(P,i)$ to a node $(R,j)$ such that $R$ belongs to the global schema. Any other position is said to be invisible. We see that when a position $(P,i)$ is visible then for any fact $P(\vec{c})$ that holds in a possible world for ${\mathsf{HOCWQ}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})},\Sigma_{{\mathsf{Source}}}\cup\Sigma_{\mathcal{M}}(\mathcal{M}),p,{\cal G}(\mathcal{M}))$ we must have $c_{i}=c_{{\mathsf{Crit}}}$ .

Note that if we have $P(\vec{x})\rightarrow\exists\vec{y}~{}R(\vec{t})$ , $x_{i}$ is exported to $t_{j}$ , and position $j$ of $R$ is visible, then position $i$ of $P$ is visible as well.
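Visible positions can be computed by a backward traversal of the visible position graph, starting from the positions of global schema predicates. A minimal sketch follows, assuming the graph is given as a list of edges between `(predicate, index)` pairs with 1-based indices; the function name and encoding are hypothetical.

```python
def visible_positions(edges, global_preds):
    """Source positions that have a path to a global-schema position.

    edges: iterable of ((P, i), (R, j)), one per exported variable
    global_preds: set of predicate names of the global schema
    """
    reverse = {}
    for src, dst in edges:
        reverse.setdefault(dst, set()).add(src)
    # Start from every global-schema position that is the target of an edge.
    frontier = [dst for _, dst in edges if dst[0] in global_preds]
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        for src in reverse.get(node, ()):
            if src not in seen:
                seen.add(src)
                frontier.append(src)
    # Report only positions of source relations.
    return {pos for pos in seen if pos[0] not in global_preds}
```

The backward direction mirrors the observation above: visibility of position $j$ of $R$ propagates to every position $i$ of $P$ exported into it.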

Reduction to entailment.

Let $\Sigma=\Sigma_{{\mathsf{Source}}}\cup\Sigma_{\mathcal{M}}(\mathcal{M})$ and $\mathcal{D}_{0}=\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$ . We will reduce the problem ${\mathsf{HOCWQ}}(\mathcal{D}_{0},\Sigma,p,{\cal G}(\mathcal{M}))$ to the problem ${\mathsf{QEntail}}(\mathcal{D}^{\prime}_{0},\tilde{\Sigma},\tilde{p})$ , where $\tilde{\Sigma}=\Sigma_{reach}\cup\Sigma_{1}\cup\Sigma_{c_{{\mathsf{Crit}}}}$ is a set of ${\mathsf{UID}}$ s, and $\tilde{p}$ is a CQ. Our reduction proceeds as follows:

•

We transform the schema for sources creating a predicate $\tilde{P}$ for each source predicate $P$ , where the arity of $\tilde{P}$ is the arity of $P$ minus the number of positions $(P,i)$ that are visible.

•

$\mathcal{D}^{\prime}_{0}=\{{\mathsf{IsCrit}}(c_{{\mathsf{Crit}}})\}$ .

•

$\Sigma_{reach}$ is built as the set of constraints ${\mathsf{IsCrit}}(w)\rightarrow\exists\vec{x}~{}P(\vec{x})$ where $\vec{x}$ are fresh distinct variables and $P\in{\mathsf{Reachable}}({\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}),\Sigma)$ .

•

$\Sigma_{1}$ is formed from the set of constraints $P(\vec{x})\rightarrow\exists\vec{y}~{}R(\vec{t})\in\Sigma$ such that there is an exported variable lying in an invisible position of $P(\vec{x})$ . For each such constraint, $\Sigma_{1}$ contains the constraint $\tilde{P}(\vec{x}^{*})\rightarrow\exists\vec{y^{*}}~{}\tilde{R}(\vec{t}^{*})$ where $\vec{x}^{*}$ denotes the projection of $\vec{x}$ to the invisible positions of $P$ , and similarly for $\vec{y^{*}}$ and $\vec{t}^{*}$ .

•

$\Sigma_{c_{{\mathsf{Crit}}}}$ is formed from constraints $P(\vec{x})\rightarrow\exists\vec{y}~{}R(\vec{t})\in\Sigma$ such that $P\in{\mathsf{Reachable}}({\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}),\Sigma)$ and there is an exported variable $x$ lying in a visible position of $P(\vec{x})$ , exported to an invisible position of $R$ . For each such constraint, $\Sigma_{c_{{\mathsf{Crit}}}}$ includes the constraint ${\mathsf{IsCrit}}(x)\rightarrow\exists\vec{y}^{*}~{}\tilde{R}(\vec{t}^{*})$ where $\vec{y}^{*}$ denotes the projection of $\vec{y}$ to the invisible positions of $R$ , and similarly for $\vec{t}^{*}$ .

•

The query $\tilde{p}$ is built from $p$ by first replacing each conjunct $P(\vec{x})$ with its corresponding predicate $\tilde{P}(\tilde{\vec{x}})$ , projecting out the visible positions. After this, for every variable $x$ that occurred in $p$ within both a visible and an invisible position, $x$ is replaced by $v$ , while we add a conjunct ${\mathsf{IsCrit}}(v)$ .
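The construction of $\tilde{p}$ in the last step can be sketched as follows. This is an illustrative reading, assuming the query is a list of `(predicate, variables)` atoms, visible positions are given as a set of 1-based `(predicate, index)` pairs, the prefix `"~"` stands for the tilde, and `crit_var` is a variable name not occurring in $p$.

```python
def rewrite_query(atoms, visible, crit_var="v"):
    """Project out visible positions and mark shared variables with IsCrit.

    atoms: list of (pred, tuple of variables) forming the CQ p
    visible: set of (pred, i) with 1-based i, the visible positions
    crit_var: assumed fresh; replaces variables occurring in both a
              visible and an invisible position
    """
    in_visible, in_invisible = set(), set()
    for pred, args in atoms:
        for i, x in enumerate(args, start=1):
            (in_visible if (pred, i) in visible else in_invisible).add(x)
    shared = in_visible & in_invisible

    out = []
    for pred, args in atoms:
        kept = tuple(crit_var if x in shared else x
                     for i, x in enumerate(args, start=1)
                     if (pred, i) not in visible)
        out.append(("~" + pred, kept))
    if shared:
        out.append(("IsCrit", (crit_var,)))
    return out
```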

Correctness of the reduction.

The correctness of the reduction is captured in the following result:

Proposition 3.

For any source constraints $\Sigma_{{\mathsf{Source}}}$ consisting of ${\mathsf{IncDep}}$ s and $\mathcal{M}$ consisting of projection mappings, there is a disclosure over a schema $\mathcal{S}$ with constraints $\Sigma_{{\mathsf{Source}}}$ , mappings $\mathcal{M}$ , and secret query $p$ if and only if ${\mathsf{QEntail}}(\mathcal{D}^{\prime}_{0},\tilde{\Sigma},\tilde{p})$ holds.

Proof.

We start with the argument for the left to right direction. We let $\mathcal{D}^{\prime}$ be a counterexample to the entailment ${\mathsf{QEntail}}(\mathcal{D}^{\prime}_{0},\tilde{\Sigma},\tilde{p})$ . By Theorem 9, we can assume that $\mathcal{D}^{\prime}$ is formed by applying the chase procedure to $\mathcal{D}^{\prime}_{0}$ . In particular, each fact in $\mathcal{D}^{\prime}$ can be assumed to use a predicate in ${\mathsf{Reachable}}({\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}),\Sigma)$ .

We show that there is an instance $\mathcal{D}$ that is a counterexample to

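${\mathsf{HOCWQ}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})},\Sigma_{{\mathsf{Source}}}\cup\Sigma_{\mathcal{M}}(\mathcal{M}),p,{\cal G}(\mathcal{M}))$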

and thus (by Theorem 1) we cannot have a disclosure. We form $\mathcal{D}$ by filling out each visible position with $c_{{\mathsf{Crit}}}$ . We claim that $\mathcal{D}$ satisfies each source constraint $\sigma=P(\vec{x})\rightarrow\exists\vec{y}~{}R(\vec{t})$ . Suppose that $P(\vec{c})$ holds in $\mathcal{D}$ . Then $\tilde{P}(\vec{c}^{\prime})$ holds in $\mathcal{D}^{\prime}$ , where $\vec{c}^{\prime}$ projects $\vec{c}$ on to the invisible positions.

•

First, suppose there is a variable $x$ in an invisible position of $P(\vec{x})$ exported to an invisible position in $R(\vec{t})$ . Then since $\mathcal{D}^{\prime}$ satisfies $\Sigma_{1}$ , we know that for some $\vec{d}$ , $\tilde{R}(\vec{d})$ holds in $\mathcal{D}^{\prime}$ . By the definition of $\mathcal{D}$ , we have that $R(\vec{d}^{*})$ holds, where $\vec{d}^{*}$ fills out each visible position with $c_{{\mathsf{Crit}}}$ . We can see that $R(\vec{d}^{*})$ is the required witness for $P(\vec{c})$ .

•

Next, suppose there is a variable $x$ in a visible position $j$ of $P(\vec{x})$ exported to an invisible position in $R(\vec{t})$ . Then we must have $c_{j}=c_{{\mathsf{Crit}}}$ . Since $P$ is in ${\mathsf{Reachable}}({\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}),\Sigma)$ and $\mathcal{D}^{\prime}$ satisfies $\Sigma_{c_{{\mathsf{Crit}}}}$ , we have $\tilde{R}(\vec{e})$ holding in $\mathcal{D}^{\prime}$ for some $\vec{e}$ , and hence $R(\vec{f})$ holding in $\mathcal{D}$ for some tuple where $c_{{\mathsf{Crit}}}$ fills all the visible positions. Thus $\sigma$ holds in this case as well.

•

Finally, note that a variable at an invisible position cannot be exported to a visible position. Therefore the only remaining case is the case where no variable has been exported. Since $P$ is reachable, $R$ is also reachable; therefore there is a constraint ${\mathsf{IsCrit}}(x)\rightarrow\exists\vec{y}^{*}R(\vec{y}^{*})\in\Sigma_{reach}$ and thus $\tilde{R}(\vec{d}^{*})$ holds in $\mathcal{D}^{\prime}$ .

We next claim that the image of $\mathcal{D}$ under $\mathcal{M}$ agrees with $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$ .

•

For every global schema predicate $G$ , $G(c_{{\mathsf{Crit}}}\ldots c_{{\mathsf{Crit}}})$ occurs in the image of $\mathcal{D}$ under $\mathcal{M}$ . This follows easily from the fact that $\mathcal{D}^{\prime}$ contains $\mathcal{D}^{\prime}_{0}$ .

•

If $G(\vec{c})$ holds in the $\mathcal{M}$ -image, then because each visible position was filled out with $c_{{\mathsf{Crit}}}$ , we must have each $c_{i}=c_{{\mathsf{Crit}}}$ . Thus the result follows.

Note that from the preceding claims, we know that $\mathcal{D}$ is a possible world for ${\mathsf{HOCWQ}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})},\Sigma_{{\mathsf{Source}}}\cup\Sigma_{\mathcal{M}}(\mathcal{M}),p,{\cal G}(\mathcal{M}))$ . Finally, we claim that $\mathcal{D}$ does not satisfy $p$ .

•

Suppose $\mathcal{D}\models p$ with homomorphism $h$ as a witness. Since $\mathcal{D}$ is a possible world for ${\mathsf{HOCWQ}}(\mathcal{D}_{0},\Sigma,p,{\cal G}(\mathcal{M}))$ , for any variable $v$ occurring in a visible position, $h(v)=c_{{\mathsf{Crit}}}$ . Let $h^{\prime}$ be formed from the restriction of $h$ to variables that occur in $\tilde{p}$ , by mapping the additional variable $v$ to $c_{{\mathsf{Crit}}}$ . Note that in $\mathcal{D}^{\prime}$ , ${\mathsf{IsCrit}}(c_{{\mathsf{Crit}}})$ holds. From this, we see that $h^{\prime}$ is a homomorphism witnessing that $\mathcal{D}^{\prime}\models\tilde{p}$ . This is a contradiction to the fact that $\mathcal{D}^{\prime}$ is a counterexample to the entailment.

We now have argued that $\mathcal{D}$ is a counterexample to ${\mathsf{HOCWQ}}(\mathcal{D}_{0},\Sigma,p,{\cal G}(\mathcal{M}))$ , which completes the proof of the left to right direction.

For the other direction, suppose that $\mathcal{D}$ is a counterexample to ${\mathsf{HOCWQ}}(\mathcal{D}_{0},\Sigma,p,{\cal G}(\mathcal{M}))$ . Note that for any fact $R(\vec{c})$ over the source relations in $\mathcal{D}$ , for any visible position $i$ of $R$ , we must have $c_{i}=c_{{\mathsf{Crit}}}$ . Form $\mathcal{D}^{\prime}$ by projecting each fact in $\mathcal{D}$ to the invisible positions of the relation. We will argue that $\mathcal{D}^{\prime}$ is a counterexample to the entailment produced by the reduction.

•

$\mathcal{D}$ must contain ${\mathsf{IsCrit}}(c_{{\mathsf{Crit}}})$ , therefore $\mathcal{D}^{\prime}$ extends $\mathcal{D}^{\prime}_{0}$ .

•

The fact that $\mathcal{D}$ was a solution to ${\mathsf{HOCWQ}}(\mathcal{D}_{0},\Sigma,p,{\cal G}(\mathcal{M}))$ also guarantees that for all reachable predicates $P$ we have $\mathcal{D}\models\exists\vec{x}~{}{P}(\vec{x})$ and thus $\mathcal{D}^{\prime}\models\exists\vec{x}^{*}~{}\tilde{P}(\vec{x}^{*})$ . Hence all constraints in $\Sigma_{reach}$ are satisfied.

•

Let us show that the constraints in $\Sigma_{1}$ are satisfied: fix a constraint $\sigma^{\prime}\in\Sigma_{1}=\tilde{P}(\vec{x}^{*})\rightarrow\exists\vec{y}^{\prime}~{}\tilde{R}(\tilde{\vec{t}})$ , derived from source constraint $\sigma=P(\vec{x})\rightarrow\exists\vec{y}~{}R(\vec{t})$ . Fix a fact $F^{\prime}=\tilde{P}(\vec{c}^{*})$ in $\mathcal{D}^{\prime}$ . By definition of $\mathcal{D}^{\prime}$ , $\vec{c}^{*}$ extends to a $\vec{c}$ satisfying $P$ in $\mathcal{D}$ . Thus, since $\mathcal{D}\models\Sigma$ , there is a fact $G=R(\vec{d})$ that holds in $\mathcal{D}$ with $d_{i}=c_{j}$ whenever $t_{i}=x_{j}$ . We can project to the invisible positions to get a fact $G^{\prime}=\tilde{R}(d_{j_{1}}\ldots d_{j_{n}})$ in $\mathcal{D}^{\prime}$ . We claim that $G^{\prime}$ is a witness for the satisfaction of $\sigma^{\prime}$ with respect to $F^{\prime}$ . Consider any variable $x$ exported from $F^{\prime}$ to position $j^{\prime}$ of $G^{\prime}$ where $x$ is mapped to value $c$ in $\vec{c}^{*}$ . Then in $\sigma$ , $x$ was exported to the corresponding invisible position $j$ in $R(\vec{y})$ , and from this we see that $d_{j}=c$ as required.

•

Now consider a constraint $\sigma^{\prime}\in\Sigma_{c_{{\mathsf{Crit}}}}={\mathsf{IsCrit}}(x)\rightarrow\exists\vec{y}^{*}~{}\tilde{R}(\tilde{\vec{t}^{*}})$ . Since ${\mathsf{IsCrit}}(x)$ holds only for $x=c_{{\mathsf{Crit}}}$ in $\mathcal{D}$ , we only have to verify that $\tilde{R}(\vec{e}^{*})$ holds for some $\vec{e}$ such that $e_{\ell}=c_{{\mathsf{Crit}}}$ (where $\ell$ is the position of $x$ in $\vec{y}^{*}$ ). Let us suppose that $\sigma^{\prime}$ was derived from source constraint $\sigma=P(\vec{x})\rightarrow\exists\vec{y}~{}R(\vec{t})$ where $j$ is the position of the exported variable in $\vec{x}$ and $i$ is the position of the exported variable in $\vec{t}$ . By the definition of $\Sigma_{c_{{\mathsf{Crit}}}}$ , we know that $P$ is a reachable predicate, and hence $P(\vec{d})$ must hold for some $\vec{d}$ in $\mathcal{D}$ , and since position $j$ is visible we have $d_{j}=c_{{\mathsf{Crit}}}$ . Because $\mathcal{D}\models\sigma$ , $R(\vec{e})$ holds in $\mathcal{D}$ for some $\vec{e}$ such that $e_{i}=c_{{\mathsf{Crit}}}$ , and thus $\tilde{R}(\vec{e}^{*})$ is the required witness for $\sigma^{\prime}$ .

•

Finally, we argue that $\mathcal{D}^{\prime}$ does not satisfy $\tilde{p}$ . Suppose by way of contradiction that $\mathcal{D}^{\prime}$ satisfies $\tilde{p}$ via homomorphism $h^{\prime}$ . Note that the variables of $p$ that do not occur in $\tilde{p}$ are those that occur only in visible positions within an atom of $p$ . We extend $h^{\prime}$ to a mapping $h$ from the variables of $p$ to $\mathcal{D}$ by mapping each such variable $x$ to $c_{{\mathsf{Crit}}}$ . We argue that $h$ is a homomorphism of $p$ to $\mathcal{D}$ . Consider an atom $R(\vec{t},\vec{t}^{\prime})$ of $p$ , where $\vec{t}$ corresponds to the invisible positions. Suppose first that the corresponding atom of $\tilde{p}$ is of the form $\tilde{R}(\vec{t}^{*})$ where $\vec{t}^{*}$ is obtained from $\vec{t}$ by replacing any variable shared with a visible position by $v$ . We know that $\tilde{R}(h(t^{*}_{1})\ldots h(t^{*}_{j}))$ holds in $\mathcal{D}^{\prime}$ because $h$ is a homomorphism. Thus $R(h(t^{*}_{1}),\ldots h(t^{*}_{j}),\vec{e})$ holds in $\mathcal{D}$ for some $\vec{e}$ . By the properties of visible positions and the fact that $\mathcal{D}$ is a possible world for ${\mathsf{HOCWQ}}(\mathcal{D}_{0},\Sigma,p,{\cal G}(\mathcal{M}))$ , we see that each $e_{i}=c_{{\mathsf{Crit}}}$ . Thus $h$ not only preserves the atom $\tilde{R}(\vec{t}^{*})$ , but it also preserves the additional atom ${\mathsf{IsCrit}}(v)$ , since ${\mathsf{IsCrit}}(c_{{\mathsf{Crit}}})$ holds in $\mathcal{D}^{\prime}$ . Thus $h$ is a homomorphism, contradicting the fact that $\mathcal{D}$ is a counterexample to ${\mathsf{HOCWQ}}(\mathcal{D}_{0},\Sigma,p,{\cal G}(\mathcal{M}))$ .

Since $\mathcal{D}^{\prime}$ extends $\mathcal{D}^{\prime}_{0}$ , satisfies the constraints $\tilde{\Sigma}$ , and does not satisfy the query $\tilde{p}$ , it is a counterexample to the entailment, completing this direction of the argument. ∎

Overview of PTime algorithm for entailment with ${\mathsf{UID}}$ s over a single fact.

At this point we have restricted to a CQ entailment problem for a set of ${\mathsf{UID}}$ s and a single fact. It was claimed in Kikot et al. [2011] that there is a polynomial time query rewriting for ${\mathsf{UID}}$ s, and from this it would easily follow that our entailment problem is in PTime (query evaluation is in PTime when there is a single fact). However, later work (footnote on page 38 of Bienvenu et al. [2018]) refers to flaws in this argument, and says that polynomial rewritability is open. We therefore give a direct proof that such entailment problems are in PTime. This will proceed via several steps:

•

A reduction to the case of “binary schemas”: those where the arity of each predicate is at most $2$ .

•

Query simplification, which will reduce the query to a connected acyclic query.

•

Reduction to atomic entailment.

Reduction to binary schemas.

We begin by using verbatim an idea of Kikot et al. [2011], reducing to the same problem but when the input schema is binary. We do this via a standard reduction of general arity reasoning to binary reasoning, introducing predicates $R_{i}(t,v)$ for every relation $R$ of arity $n\geq 1$ and each $1\leq i\leq n$ ; informally these state that $v$ is the value in position $i$ of $n$ -tuple $t$ . We also introduce a predicate $R_{\exists}(t)$ for each predicate $R$ ; informally this states that there is some tuple $t$ in the predicate $R$ . We translate each ${\mathsf{UID}}$ $B(\vec{x})\rightarrow\exists\vec{y}~{}H(\vec{t})$ exporting a variable $x_{i}$ from position $i$ to position $j$ to a ${\mathsf{UID}}$ $B_{i}(t,x_{i})\rightarrow\exists t^{\prime}~{}H_{j}(t^{\prime},x_{i})$ . For each ${\mathsf{UID}}$ $H(\vec{x})\rightarrow B(\vec{y})$ that is not exporting a variable, we create a rule $H_{\exists}(t)\rightarrow\exists t^{\prime}~{}B_{\exists}(t^{\prime})$ . We also create rules $R_{i}(t,x)\rightarrow R_{\exists}(t)$ and $R_{\exists}(t)\rightarrow\exists x~{}R_{i}(t,x)$ for each predicate $R$ and $1\leq i\leq n$ where $n$ is the arity of $R$ . The query $p$ is transformed into $p^{\prime}$ where each conjunct $R(x_{1},\dots,x_{n})$ is transformed into the conjunction $R_{1}(t,x_{1})\land\dots\land R_{n}(t,x_{n})\land R_{\exists}(t)$ , for a fresh variable $t$ . Finally the database over the binary schema is built in the following way: for each fact $R(\vec{v})$ of the initial database, we create a fresh value $t$ and we add the fact $R_{i}(t,v_{i})$ for $1\leq i\leq n$ where $n$ is the arity of $R$ , and we also add $R_{\exists}(t)$ . Further details can be found in Kikot et al. [2011]. Note that, in the resulting problem, each frontier-0 rule produced has a body with an atom over a unary predicate.

Proposition 4.

The transformation above preserves query entailment.
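As an illustration of the encoding of facts and query conjuncts in the reduction to binary schemas, here is a minimal sketch. It assumes atoms are `(predicate, arguments)` pairs, writes $R_{\exists}$ as `R_exists`, and generates tuple identifiers from a hypothetical `fresh_prefix` that is assumed not to clash with existing values or variables.

```python
from itertools import count

def to_binary(atoms, fresh_prefix="t"):
    """Encode each atom R(a_1..a_n) as R_1(t,a_1),...,R_n(t,a_n),R_exists(t).

    atoms: list of (pred, tuple of terms); works for facts and for query
           conjuncts alike, since both are handled the same way.
    fresh_prefix: prefix of the fresh tuple identifiers (assumed unused).
    """
    fresh = count()
    out = []
    for pred, args in atoms:
        t = f"{fresh_prefix}{next(fresh)}"
        for i, a in enumerate(args, start=1):
            out.append((f"{pred}_{i}", (t, a)))
        out.append((f"{pred}_exists", (t,)))
    return out
```

Each input atom gets its own fresh identifier, so two facts over the same relation never share a tuple identifier.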

Special form of the chase: annotated chase forest.

In the case of ${\mathsf{UID}}$ s the chase process applied to our single-fact instance $\mathcal{D}_{0}$ produces an instance ${\mathsf{Chase}}_{\Sigma}(\mathcal{D}_{0})$ that will be infinite. However, it has a special shape that we can exploit. For the remainder of this section, by ${\mathsf{Chase}}_{\Sigma}(\mathcal{D}_{0})$ we mean an instance formed from a restricted chase sequence, in which a witness to a TGD $\phi(\vec{x})\rightarrow\exists\vec{y}~{}H(\vec{t})$ is added to instance $\mathcal{D}_{i}$ for binding $\vec{c}$ to $\vec{x}$ only if $\mathcal{D}_{i},\vec{c}\models\phi(\vec{x})\wedge\neg\exists\vec{y}~{}H(\vec{t})$ . It is known from Fagin et al. [2005] that in Theorem 9 it suffices to consider such instances. The annotated chase is a node- and edge-labelled forest formed from ${\mathsf{Chase}}_{\Sigma}(\mathcal{D}_{0})$ as follows:

•

the nodes are the values of ${\mathsf{Chase}}(\mathcal{D})$

•

the node label of a value $v$ is the collection of unary predicates holding at $v$

•

an edge labeled by fact $F$ mentioning $v_{1}$ and $v_{2}$ connects a value $v_{1}$ to a value $v_{2}$ if $F$ holds in ${\mathsf{Chase}}(\mathcal{D})$ and $v_{2}$ is generated in the chase step that produces $F$ .

We can see that this graph is a forest where the roots are $c_{{\mathsf{Crit}}}$ (the value where ${\mathsf{IsCrit}}(c_{{\mathsf{Crit}}})$ holds) as well as the roots of some other trees, generated from frontier-0 dependencies and thus rooted at elements $t$ where $R_{\exists}(t)$ holds for some $R$ . Further, since the chase is restricted, we can see that this graph has the unique adjoining label property: for each $v_{1}$ , for each predicate $P$ , there cannot be two nodes $v_{2},v^{\prime}_{2}$ adjacent to $v_{1}$ such that the edge $e$ from $v_{1}$ to $v_{2}$ and the edge $e^{\prime}$ from $v_{1}$ to $v^{\prime}_{2}$ are both labelled with the same predicate and have $v_{1}$ in the same position. Furthermore, the restricted chase also ensures that the forest is composed of at most one tree per predicate, since all the roots that are produced need to be different.

First query simplification: eliminating forking pairs.

Given a CQ $Q$ , a pair of distinct atoms $A_{1}$ and $A_{2}$ sharing the same predicate and a variable at the same position (i.e. $A_{1}=R(x,z)$ and $A_{2}=R(x,y)$ or $A_{1}=R(z,x)$ and $A_{2}=R(y,x)$ ) is a forking pair of $Q$ . We say that a query $Q$ is non-forking when there are no forking pairs.

Proposition 5.

If a CQ $Q$ has a forking pair $A_{1}=R(x,z)$ and $A_{2}=R(x,y)$ and $Q^{\prime}$ is the query $Q$ where the variable $z$ is replaced with $y$ , then ${\mathsf{QEntail}}(\mathcal{D}_{0},\Sigma,Q)={\mathsf{QEntail}}(\mathcal{D}_{0},\Sigma,Q^{\prime})$ .

Proof.

Let $\mathcal{D}={\mathsf{Chase}}_{\Sigma}(\mathcal{D}_{0})$ . If $Q^{\prime}$ holds in $\mathcal{D}$ , then clearly the same holds of $Q$ . Conversely suppose $Q$ holds in $\mathcal{D}$ via homomorphism $h$ , and suppose $h(y)\neq h(z)$ . This gives us a violation of the unique adjoining label property. Hence $h(y)=h(z)$ , and $h$ also witnesses that $Q^{\prime}$ holds in $\mathcal{D}$ . ∎

Applying the proposition above, we can assume that $Q$ is non-forking. Without loss of generality, we can also assume that $Q$ is connected (otherwise we can test the entailment of each connected part).
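The rewriting licensed by Proposition 5 can be sketched in a few lines of Python. This is only an illustration under assumed conventions: a CQ is modelled as a set of atoms, each atom being a pair `(predicate, tuple_of_variables)`, and the function names are hypothetical rather than taken from the paper.

```python
from itertools import combinations

def find_forking_pair(atoms):
    """Return (keep, drop) for some forking pair, or None if the CQ is non-forking."""
    for (p1, args1), (p2, args2) in combinations(sorted(atoms), 2):
        if p1 != p2 or len(args1) != 2 or len(args2) != 2:
            continue
        for pos in (0, 1):
            other = 1 - pos
            # same predicate, same variable at position `pos`, different variable elsewhere
            if args1[pos] == args2[pos] and args1[other] != args2[other]:
                return args1[other], args2[other]
    return None

def eliminate_forking_pairs(atoms):
    """Repeatedly merge the two distinct variables of a forking pair.

    Each round removes one variable from the query, so the loop terminates."""
    atoms = set(atoms)
    while (pair := find_forking_pair(atoms)) is not None:
        keep, drop = pair
        atoms = {(p, tuple(keep if v == drop else v for v in args))
                 for p, args in atoms}
    return atoms
```

For example, on the query $R(x,z)\wedge R(x,y)\wedge S(z,w)$ the sketch identifies $z$ with $y$ and returns $R(x,y)\wedge S(y,w)$ .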

Second simplification: reducing to acyclic queries

The CQ-graph of a CQ $Q$ is the node- and edge-labelled graph whose nodes are the variables of $Q$ and whose edges are labelled with atoms of $Q$ such that:

•

an edge between variables labelled with $x$ and $y$ is labelled with the binary atoms containing both $x$ and $y$ ;

•

a node $x$ is labelled with the set of unary predicates in $Q$ containing $x$ .

The CQ-graph is said to be embedded in some annotated chase forest $T$ if there is a homomorphism $h$ from the CQ-graph to $T$ preserving edges, i.e. if there is an edge from $x$ to $y$ labeled with $R(a,b)$ then $T$ should contain an edge from $h(x)$ to $h(y)$ labeled with $R(h(a),h(b))$ , and preserving node labels, i.e. if there is a predicate $P(x)$ on the node $x$ then there should be $P(h(x))$ in $T$ . The homomorphism $h$ is called an embedding of $Q$ in $T$ .

It is immediate from the completeness of the chase procedure that for any annotated chase forest $T$ for ${\mathsf{Chase}}_{\Sigma}(\mathcal{D})$ , a query is entailed if and only if its CQ-graph is embedded in $T$ . Our reduction to the case of a CQ with acyclic CQ-graph will depend heavily on the following observation:

Proposition 6.

Any embedding of a connected and non-forking CQ $Q$ into an annotated chase forest for ${\mathsf{Chase}}_{\Sigma}(\mathcal{D}_{0})$ must be injective.

Proof.

Let $Q$ be a connected and non-forking CQ and let $h$ be an embedding. Let us prove by induction on the size of the path between $x$ and $y$ that $h(x)\neq h(y)$ when $x\neq y$ .

Two neighboring nodes cannot be sent to the same value. For a path of size $2$ , if we have $z$ such that $x,z,y$ forms a path in the CQ-graph of $Q$ then $h(x)$ has to be different than $h(y)$ otherwise the label from $x$ to $z$ and from $z$ to $y$ would be the same and there would be a forking pair in $Q$ .

Let $x=p_{1},p_{2},\dots,p_{k}=y$ with $k\geq 4$ be a path in the CQ-graph between $x$ and $y$ . By induction the $h(p_{i})$ for $i<k$ are all distinct and thus the distance between $h(x)$ and $h(p_{k-1})$ is at least $k-2$ hence $h(y)$ is at least at distance $k-3>0$ of $h(x)$ . ∎

Our reduction to the acyclic case follows immediately:

Corollary 5.

If a connected non-forking CQ $Q$ is entailed by $\Sigma$ over $\mathcal{D}_{0}$ , then the CQ-graph of $Q$ is acyclic.

Proof.

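Since $Q$ is entailed, its CQ-graph has an embedding into the annotated chase forest, which is injective by Proposition 6. The image of the CQ-graph through this injective homomorphism is contained in a forest, so the CQ-graph itself is acyclic. ∎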
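Corollary 5 yields a cheap syntactic filter: a connected non-forking CQ whose CQ-graph contains a cycle can be rejected immediately. The following Python sketch tests acyclicity with a union-find structure. The atom representation (pairs `(predicate, tuple_of_variables)`) and the function name are assumptions made for illustration, and treating an atom such as $R(x,x)$ as a cycle is likewise an assumption of this sketch.

```python
def cq_graph_is_acyclic(atoms):
    """Check that the undirected CQ-graph of a set of unary/binary atoms has no cycle."""
    parent = {}

    def find(v):
        # union-find lookup with path halving
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = set()
    for _, args in atoms:
        if len(args) == 2:
            if args[0] == args[1]:
                return False          # self-loop: treated as a cycle (assumption)
            # all binary atoms over {x, y} label the same undirected edge
            edges.add(frozenset(args))

    for edge in edges:
        x, y = tuple(edge)
        rx, ry = find(x), find(y)
        if rx == ry:
            return False              # x and y already connected: adding the edge closes a cycle
        parent[rx] = ry
    return True
```

Several atoms over the same pair of variables contribute a single edge, matching the definition of the CQ-graph, where one edge carries all binary atoms containing both endpoints.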

Determining entailment for acyclic connected graphs.

We now give the final step in our algorithm, which deals with deciding entailment of a connected, non-forking query $Q$ , which by Corollary 5 must have an acyclic CQ-graph. Given an acyclic connected undirected graph and any vertex $v$ of the graph, we can direct it to be a tree with $v$ as the root. Thus for such a $Q$ having $n$ variables, the tree arrangements are the $n$ possible ways to root the CQ-graph of the query $Q$ . We are particularly interested in arrangements of $Q$ where the directionality from parent to child reflects the entailment structure relative to $\Sigma$ between atoms in the query. A tree arrangement $\mathcal{A}$ of $Q$ is faithfully entailed if for every variable $y$ in $Q$ with parent $x$ in the tree, there is an atom $A$ containing $x$ and not containing $y$ such that $A\wedge\Sigma$ entails $\exists y~{}B_{x,y}$ , where $B_{x,y}$ is the conjunction of all atoms whose variables are contained in $\{x,y\}$ ; in the case that $y$ is the root, we require $\Sigma$ alone to entail $\exists y~{}B_{x,y}$ .
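The $n$ tree arrangements of an acyclic connected CQ-graph can be enumerated by a breadth-first traversal from each variable. The sketch below is a minimal illustration, assuming the graph is given as a set of nodes and a list of undirected edges; the function name and the parent-map output format are hypothetical choices, not notation from the paper.

```python
from collections import deque

def tree_arrangements(nodes, edges):
    """Map each possible root to the parent function of the corresponding rooted tree."""
    adjacent = {v: set() for v in nodes}
    for x, y in edges:
        adjacent[x].add(y)
        adjacent[y].add(x)

    arrangements = {}
    for root in nodes:
        parent = {root: None}          # the root has no parent
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adjacent[u]:
                if w not in parent:    # first visit fixes the unique parent in a tree
                    parent[w] = u
                    queue.append(w)
        arrangements[root] = parent
    return arrangements
```

Each arrangement costs time linear in the size of the query, so enumerating all of them stays polynomial.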

In a faithfully-entailed tree arrangement, the conjunction of atoms holding at the root of the tree entails the existence of the whole tree. We can further find a single atom that entails the whole tree. A root-generating atom of a tree arrangement is an atom $A$ (not necessarily in $Q$ ) containing the root variable $r$ , such that $A\wedge\Sigma$ generates all atoms mentioning $r$ .

Proposition 7.

A faithfully entailed tree arrangement for $Q$ must have a root-generating atom.

Proof.

We know that $Q$ must hold in the chase of the initial fact under $\Sigma$ , and by Proposition 6 we know that there is an injective homomorphism $h$ from $Q$ to the chase. Consider the point in the chase process where value $h(r)$ is first generated. This occurs by firing some rule with an atom, where the head has either a binary atom $A(x,y)$ or a unary atom $B(x)$ . We consider the case where the atom is binary, and where the generated atom is $A(h(r),s)$ . In this case the fact $A(h(r),s)$ must generate every fact containing $r$ . Thus we can take the atom $A(r,w)$ , where $w$ is a fresh variable, as a root-generating atom. The case of unary atoms and the case where $r$ is in the second position of the fact is similar. ∎

Given a tree arrangement $T$ of $Q$ and variable $x$ of $Q$ , $T_{x}$ denotes the restriction of $T$ to the variables that are descendants of $x$ in $T$ .

The main idea of our PTime algorithm is that it suffices to descend through the tree arrangement, checking some entailments for each parent-child pair in isolation.

Proposition 8.

There is a PTime algorithm taking as input a variable $x$ in a CQ $Q$ , a tree arrangement of $Q$ , and an atom $A$ containing $x$ such that the existential quantification of $A$ is entailed by $\Sigma$ , and determining whether $T_{x}$ is faithfully entailed and $A$ is a root-generating atom.

Proof.

We first check whether $A$ is a root-generating atom, using a PTime inference algorithm for ${\mathsf{UID}}$s (Cosmadakis et al. [1990]). We then consider each child $y$ of $x$ in the tree arrangement. We know that there is exactly one conjunct $B$ containing $x$ and $y$ . We check whether $A$ entails $\exists y~{}B$ , and then call the algorithm recursively for $y$ and $B$ . If each recursive call succeeds, the algorithm succeeds. ∎
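The recursion in this proof can be sketched as follows. The two entailment tests are deliberately left abstract: `generates_all` and `entails` are hypothetical callbacks standing in for the PTime ${\mathsf{UID}}$ inference procedure, which is not implemented here. The dictionaries `children` and `edge_atom` are likewise assumed encodings of the tree arrangement.

```python
def check_subtree(x, atom, children, edge_atom, generates_all, entails):
    """Descend through the arrangement, testing each parent-child pair in isolation.

    x             -- current variable
    atom          -- candidate root-generating atom for x
    children      -- dict: variable -> list of its children in the arrangement
    edge_atom     -- dict: (parent, child) -> the unique conjunct B containing both
    generates_all -- oracle (assumed): does atom with Sigma generate all atoms mentioning x?
    entails       -- oracle (assumed): does atom with Sigma entail "exists y. B"?
    """
    if not generates_all(atom, x):
        return False
    for y in children.get(x, []):
        b = edge_atom[(x, y)]
        if not entails(atom, b, y):
            return False
        # B plays the role of the generating atom for the subtree rooted at y
        if not check_subtree(y, b, children, edge_atom, generates_all, entails):
            return False
    return True
```

Every variable is visited once and every parent-child pair triggers a single oracle call, so the number of oracle calls is linear in the size of the arrangement.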

From the prior proposition we get a PTime algorithm for the arrangement as a whole:

Proposition 9.

There is a PTime algorithm that takes a tree arrangement of a CQ $Q$ and an atom $A$ containing the root of the arrangement, and determines whether the whole tree arrangement can be faithfully entailed and $A$ is a root-generating atom.

Proof.

We first need to check that $A$ is entailed, which amounts to checking that $\Sigma\models{\mathsf{IsCrit}}(c)\rightarrow\exists y~{}A$ . As before, this can be done using the algorithm of Cosmadakis et al. [1990]. We then utilize the algorithm of Proposition 8. ∎

Note that Proposition 9 gives a polynomial time algorithm for checking whether a tree arrangement can be faithfully entailed. We can apply the algorithm of the proposition with every possible unary and binary atom $A$ containing the root variable. In the binary case, we consider all atoms containing the root variable and an additional fresh variable.
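The set of candidate atoms $A$ to try is small. A possible enumeration is sketched below, assuming the signature is given as a dictionary from predicate names to arities (1 or 2); the function name and the name of the fresh variable are illustrative assumptions.

```python
def candidate_root_atoms(signature, root, fresh="w"):
    """Yield every unary or binary atom over the signature that contains the root variable."""
    for predicate, arity in sorted(signature.items()):
        if arity == 1:
            yield (predicate, (root,))
        else:
            # the root may sit in either position, next to a fresh variable
            yield (predicate, (root, fresh))
            yield (predicate, (fresh, root))
```

There are at most two candidates per predicate, so running the algorithm of Proposition 9 on each candidate and each of the $n$ arrangements keeps the overall procedure polynomial.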

Putting it all together.

Putting together our reduction to ${\mathsf{UID}}$ -entailment (Proposition 3), our schema simplification (Proposition 4), the query simplifications (the reduction to connected CQs, Proposition 5, and Corollary 5), and our PTime algorithm for simplified queries (Proposition 9), we obtain the proof of Theorem 5.

Appendix B Detailed Proofs from Section 4: Lower Bounds for Disclosure

B.1 Proof of the first part of Theorem 6: 2ExpTime-hardness for ${\mathsf{IncDep}}$ and ${\mathsf{GuardedMap}}$ without arity bound

Recall the first part of Theorem 6:

${\mathsf{Disclose_{C}}}({\mathsf{IncDep}},{\mathsf{GuardedMap}})$ is 2ExpTime-hard.

Recall that Theorem 1 relates disclosure to a ${\mathsf{HOCWQ}}$ problem on a very simple instance. Also recall from Section 3 the intuition that such a problem amounts to a classical entailment problem for a CQ over a very simple instance, using the source dependencies and ${\mathsf{SCEQrule}}$s, which are of the form $\phi(\vec{x})\rightarrow x=c_{{\mathsf{Crit}}}$ , where $\phi$ will be the body of a mapping. We show here how to simulate the run of an alternating ExpSpace Turing machine $\mathscr{T}$ without explicitly using ${\mathsf{SCEQrule}}$s, instead using inclusion dependencies as source constraints coupled with guarded mappings. An alternating Turing machine $\mathscr{T}$ is a $6$ -tuple $(Q,\Sigma,\delta_{\alpha},\delta_{\beta},q_{0},g)$ where:

•

$Q$ is the finite set of states

•

$\Sigma$ is the finite tape alphabet

•

$\delta_{\alpha}$ and $\delta_{\beta}$ are functions from $Q\times\Sigma$ to $Q\times\Sigma\times\{L,R\}$

•

$q_{0}\in Q$ is the initial state

•

$g$ is a function from $Q$ to $\{accept,reject,\forall,\exists\}$ that specifies the type of each state.

We assume that $\mathscr{T}$ always alternates between existential and universal states, and that there is a unique final state, which can be reached only if the head is in the first cell and that cell contains a specific symbol. All of these assumptions can be made without loss of generality. If $\mathscr{T}$ is in a configuration whose state $q$ is such that $g(q)=accept$ , the configuration is said to be accepting. If $\mathscr{T}$ is in a configuration whose state $q$ is such that $g(q)=\forall$ , the configuration is said to be accepting if both its $\alpha$ -successor and its $\beta$ -successor (obtained after applying $\delta_{\alpha}$ and $\delta_{\beta}$ , respectively) are accepting. If $\mathscr{T}$ is in a configuration whose state $q$ is such that $g(q)=\exists$ , the configuration is said to be accepting if its $\alpha$ -successor or its $\beta$ -successor is accepting. A more thorough introduction to Turing machines can be found in Papadimitriou [1994].
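The acceptance condition can be read as a recursive evaluation of the tree of configurations. The following Python sketch illustrates it on abstract configurations. It is a simplification under stated assumptions: configurations are opaque values, `g` is applied to a configuration rather than to its state, `alpha` and `beta` return the two successors, and the `fuel` parameter (not part of the definition) bounds the unfolding depth so that the function always terminates.

```python
def accepting(config, g, alpha, beta, fuel):
    """Bounded unfolding of the acceptance condition of an alternating machine."""
    kind = g(config)
    if kind == "accept":
        return True
    if kind == "reject" or fuel == 0:
        return False
    a = accepting(alpha(config), g, alpha, beta, fuel - 1)
    b = accepting(beta(config), g, alpha, beta, fuel - 1)
    # universal configurations need both successors, existential ones need at least one
    return (a and b) if kind == "forall" else (a or b)
```

A usage example on a toy tree of five configurations, where configuration 0 is universal and configuration 1 is existential:

```python
kinds = {0: "forall", 1: "exists", 2: "accept", 3: "reject", 4: "accept"}
alpha = {0: 1, 1: 3}
beta = {0: 2, 1: 4}
print(accepting(0, kinds.get, alpha.get, beta.get, fuel=5))
```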

We first present the reduction, and show its correctness in the next subsection.

B.2 The Reduction

We will create constraints and mappings that will serve to perform the following tasks:

•

generate addresses for cells of $\mathscr{T}$ in such a way that one can check whether two addresses are consecutive in a guarded way. The same addresses will be used for all the configurations. This will be done by a mapping creating $k$ copies of two individuals that represent $0$ and $1$ , along with inclusion dependencies that perform permutations and generate $2^{k}$ addresses;

•

encode the content of a cell, the position of the head, and the state of the head: for each cell, we store a vector whose length is the size of $(\Sigma\cup\{\flat\})\times(Q\cup\bot)$ . Each position corresponds to an element $(l,s)$ of that set; we will arrange that the position contains $c_{{\mathsf{Crit}}}$ if and only if the cell contains $l$ , and either the head is over that cell and is in state $s$ , or the head is not over that cell and $s=\bot$ . All values are first freshly instantiated by inclusion dependencies, and mappings are then responsible for unifying the correct positions with $c_{{\mathsf{Crit}}}$ ;

•

ensure that the tape that is associated with a successor of a configuration can be obtained by a transition of the Turing machine: this is also performed by using a mapping to enforce the correct positions of the cell to be unified with $c_{{\mathsf{Crit}}}$ ;

•

check that configurations are accepting: this is the case either when the corresponding tape is in a final accepting state, or when it is in an existential state and one of the two successor configurations is accepting, or it is in a universal state, and both successor configurations are accepting.

Let us describe the source signature. For each predicate, we will explain what feature of the ATM $\mathscr{T}$ it should represent in the appropriate instance generated by the constraints. By “the appropriate instance”, we mean the visible chase of the initial instance over the source constraints and mappings: this was introduced after Theorem 9, and it was noted that it is the canonical instance for the source and targets to consider for disclosure.

We use $\mathbf{y^{1,k}}$ to represent a tuple $(y^{1},\ldots,y^{k})$ , and $\mathbf{y^{k}}$ to represent the tuple $(y,\ldots,y)$ of size $k$ .

•

${\mathsf{Children}}_{\forall}(c,c_{\alpha},c_{\beta},ac,ac_{\alpha},ac_{\beta},\mathbf{y_{0}^{1,k}},\mathbf{y_{1}^{1,k}},r,z,y_{0},y_{1})$ . The intended meaning is that a configuration $c$ is universal and has as children $c_{\alpha}$ and $c_{\beta}$ , and that the acceptance bit of $c$ is $ac$ , of $c_{\alpha}$ is $ac_{\alpha}$ and of $c_{\beta}$ is $ac_{\beta}$ . The last four positions are placeholders: $r$ for the root of the tree of configurations, $z$ for $c_{{\mathsf{Crit}}}$ , $y_{0}$ for a value representing $0$ and $y_{1}$ for a value representing $1$ .

•

${\mathsf{Children}}_{\exists}(c,c_{\alpha},c_{\beta},ac,ac_{\alpha},ac_{\beta},\mathbf{y_{0}^{1,k}},\mathbf{y_{1}^{1,k}},r,z,y_{0},y_{1})$ : same intended meaning, except that $c$ is existential.

•

${\mathsf{Cell}}(c_{p},c_{n},\mathbf{y^{1,k}},\mathbf{v},\mathbf{v_{prev}},\mathbf{v_{next}},r,z,y_{0},y_{1})$ with intended meaning that the cell of address $\mathbf{y^{1,k}}$ of the tape represented by $c_{n}$ has a content represented by $\mathbf{v}$ , while the previous cell has a content represented by $\mathbf{v_{prev}}$ and the next cell has a content represented by $\mathbf{v_{next}}$ . The last four positions are placeholders for the root of the tree of configurations, $c_{{\mathsf{Crit}}}$ , a value representing $0$ and a value representing $1$ .

•

${\mathsf{Cell}}_{i}^{c}(c,\mathbf{y^{1,k}},x,z,y_{0},y_{1})$ with intended meaning that the cell of address $\mathbf{y^{1,k}}$ in configuration $c$ contains $x$ at the $i^{\mathrm{th}}$ position of the representation of its content. ${\mathsf{Cell}}_{i}^{p}$ and ${\mathsf{Cell}}_{i}^{n}$ play similar roles for the cell before and after the cell of address $\mathbf{y^{1,k}}$ .

•

${\mathsf{GenAddr}}$ is an auxiliary predicate used to generate an exponential number of addresses.

•

${\mathsf{succ}}_{\alpha}(c_{p},c_{n})$ states that $c_{n}$ is the $\alpha$ -successor of $c_{p}$ (and similarly for $\beta$ )

Below we will always use the symbol $\mathscr{Q}$ to range over $\{\forall,\exists\}$ .

The structure generated by the inclusion dependencies is represented in Figure 1. Atoms are represented by geometric shapes, inside of which are their arguments (some are omitted to ease the reading). The ${\mathsf{Children}}_{\mathscr{Q}}$ atoms form a tree-shaped structure, and induce a tree structure on the configuration identifiers: for instance, $c$ is the parent of $\alpha$ and $\beta$ . ${\mathsf{Cell}}$ atoms are associated with a configuration identifier (for instance, those represented are associated with $\beta$ ), and also carry the parent configuration identifier to ensure guardedness of the mappings used in the following reduction. Note that the elements used to describe the cell’s addresses ( $y_{0}$ and $y_{1}$ ) also appear in the ${\mathsf{Children}}_{\mathscr{Q}}$ atoms, to ensure guardedness.

Initialization.

We first define a mapping $\mathcal{T}_{{\mathsf{Init}}}(x)$ , introducing some elements in the visible chase. The definition of this mapping is:

[TABLE]

Generation of the tree of configuration.

$\alpha$ -successors have themselves $\alpha$ - and $\beta$ -successors, and are existential if their parent is universal:

[TABLE]

And similarly for ${\mathsf{Children}}_{\exists}$ and for the $\beta$ -successor.

Universal and existential acceptance condition.

If both successors of a universal configuration $n$ are accepting, so is $n$ . We create a mapping $\mathcal{T}_{\forall}(x)$ with definition:

[TABLE]

If the $\alpha$ -successor of an existential configuration $n$ is accepting, so is $n$ . We create a mapping $\mathcal{T}_{\exists,\alpha}(x)$ with definition:

[TABLE]

We create a similar mapping $\mathcal{T}_{\exists,\beta}$ for the $\beta$ -successor.

Tape representation and consistency of tapes.

We now focus on the representation of the tape and its consistency. We generate $2^{k}$ addresses and associated values:

[TABLE]

${\mathsf{GenAddr}}$ will generate addresses to represent the tape associated with its fifth argument. To emphasize this, we use the letter $n$ (as node) at this position, while the fourth argument contains its parent configuration, denoted by $p$ .

[TABLE]
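To make the address space concrete, the following Python sketch enumerates the $2^{k}$ addresses as $k$ -tuples over the two individuals $y_{0}$ and $y_{1}$ , and computes the address that follows a given one. It illustrates only the intended set of addresses and the notion of consecutive addresses, not the inclusion dependencies that generate them. The string names and the most-significant-bit-first convention are assumptions of the sketch.

```python
from itertools import product

def addresses(k, zero="y0", one="y1"):
    """All 2^k addresses, as k-tuples over the individuals standing for 0 and 1."""
    return list(product((zero, one), repeat=k))

def successor(address, zero="y0", one="y1"):
    """Binary increment (most significant bit first); None for the last address."""
    bits = list(address)
    for i in reversed(range(len(bits))):
        if bits[i] == zero:
            bits[i] = one          # flip the lowest 0 to 1 ...
            return tuple(bits)
        bits[i] = zero             # ... after resetting the trailing 1s to 0
    return None
```

For $k=3$ this yields eight addresses, and the address following $(y_{0},y_{1},y_{1})$ is $(y_{1},y_{0},y_{0})$ .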

For each address, we initialize its content (as well as the content of the previous and next cells) by fresh values $\mathbf{v},\mathbf{v_{prev}},\mathbf{v_{next}}$ .

[TABLE]

Note that the values $\mathbf{v}$ , $\mathbf{v_{prev}}$ and $\mathbf{v_{next}}$ are vectors of length the size of $(\Sigma\cup\{\flat\})\times(Q\cup\bot)$ . In particular, we use the notation $\mathbf{l_{i}}(x)$ to represent a vector of same length, composed of fresh variables, except for the position $i$ , that contains $x$ .

We now use mappings to force some of these values to be equal to $c_{{\mathsf{Crit}}}$ . Each position of $\mathbf{v}$ represents an element of $(\Sigma\cup\{\flat\})\times(Q\cup\bot)$ , and we will enforce exactly one of these positions to contain $c_{{\mathsf{Crit}}}$ . If the head of the Turing machine is on the cell represented, then the position of $\mathbf{v}$ corresponding to $(a,q)$ , where $a$ is the letter in the cell and $q$ the state of the Turing machine, will contain $c_{{\mathsf{Crit}}}$ . Otherwise, the position of $\mathbf{v}$ corresponding to $(a,\bot)$ will contain $c_{{\mathsf{Crit}}}$ .
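The intended shape of a content vector $\mathbf{v}$ can be illustrated by a small Python sketch. It builds a vector with one position per element of $(\Sigma\cup\{\flat\})\times(Q\cup\bot)$ , placing the critical constant at the position matching the cell and fresh nulls everywhere else. The string constants, the ordering of positions, and the function names are assumptions made for illustration; the sketch also assumes the blank symbol is not already listed in the alphabet.

```python
from itertools import count, product

BLANK, NO_HEAD, CRIT = "blank", "no_head", "c_Crit"

def positions(alphabet, states):
    """One position per pair (letter or blank, state or 'head absent')."""
    return list(product(sorted(alphabet) + [BLANK], sorted(states) + [NO_HEAD]))

def encode_cell(letter, state, alphabet, states, fresh):
    """Vector for a cell holding `letter`; `state` is None when the head is elsewhere.

    `fresh` is an iterator of integers used to name fresh nulls."""
    target = (letter, NO_HEAD if state is None else state)
    return [CRIT if p == target else f"n{next(fresh)}"
            for p in positions(alphabet, states)]
```

With two letters and two states the vector has $3\times 3=9$ positions, exactly one of which holds the critical constant.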

As we store the content of a cell in several atoms, we must ensure that the tape associated with a configuration is consistent, by checking that $\mathbf{v_{next}}$ is consistent with $\mathbf{v}$ from the next cell. To ensure guardedness, we first introduce auxiliary predicates ${\mathsf{Cell}}_{i}^{c},{\mathsf{Cell}}_{i}^{p}$ and ${\mathsf{Cell}}_{i}^{n}$ that define the content of the $i^{\mathrm{th}}$ bit of the value of the current, previous and next cells:

[TABLE]

We now introduce the definition of a mapping $\mathcal{T}_{{\mathsf{data}}_{n}}(x)$ which ensures the consistency of the tape content (note that the first atom is a guard):

[TABLE]

$\mathcal{T}_{{\mathsf{data}}_{p}}(x)$ is defined similarly to deal with the previous cell.

We enforce the tape of the initial configuration to have the head of the Turing machine on the first cell (and assume w.l.o.g. that this is represented by the first position of $\mathbf{v}$ containing $c_{{\mathsf{Crit}}}$ ) and all the other cells containing $\flat$ (and we assume w.l.o.g. that this is represented by the second position of $\mathbf{v}$ containing $c_{{\mathsf{Crit}}}$ ). We thus create the mappings $\mathcal{T}_{tape_{i}}(x)$ , for the first cell, having definition:

[TABLE]

and we introduce the mappings $\mathcal{T}_{tape_{o}}(x)$ , for all the other cells, having definition:

[TABLE]

Note that this data is associated with the children of the root (as $p$ is both in the fourth and last minus three positions of the atoms), and not with the root itself, due to the choice of keeping in ${\mathsf{Cell}}$ the identifier of the parent of the considered configuration.

We then check that the tape associated with the $\alpha$ -successor of a configuration is indeed obtained by applying an $\alpha$ -transition. This is done by noticing that the value of each cell of the $\alpha$ -successor is deterministically defined by the value of the cell and its two neighbors in the original configuration (the neighbors are necessary to know whether the head of the Turing machine is now in the considered cell). To ensure guardedness, we first define a predicate marking $\alpha$ -successors (and similarly for $\beta$ -successors):

[TABLE]
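The locality observation used here, namely that the new content of a cell depends only on the cell and its two neighbours, can be sketched in Python for a single deterministic transition function. The representation is an assumption of the sketch: a cell content is a pair `(letter, state)` with `state` set to `None` when the head is elsewhere, `delta` maps `(state, letter)` to `(new_state, written_letter, move)`, and a head that would move off the tape is simply dropped.

```python
def next_cell(prev, cur, nxt, delta):
    """New content of a cell, given the old contents of the cell and its neighbours.

    prev / nxt are None at the left / right border of the tape."""
    letter, state = cur
    if state is not None:
        # the head is on this cell: it writes a letter and moves away
        _, written, _ = delta[(state, letter)]
        return (written, None)
    if prev is not None and prev[1] is not None:
        new_state, _, move = delta[(prev[1], prev[0])]
        if move == "R":
            return (letter, new_state)      # head arrives from the left
    if nxt is not None and nxt[1] is not None:
        new_state, _, move = delta[(nxt[1], nxt[0])]
        if move == "L":
            return (letter, new_state)      # head arrives from the right
    return (letter, None)                   # head is not nearby: content unchanged

def step(tape, delta):
    """Apply next_cell to every cell of a finite tape."""
    n = len(tape)
    return [next_cell(tape[i - 1] if i > 0 else None,
                      tape[i],
                      tape[i + 1] if i < n - 1 else None,
                      delta)
            for i in range(n)]
```

Since each cell has only finitely many possible contents, the function `next_cell` has a finite table, which is what allows the reduction to use one mapping per triple of contents.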

Let us consider a cell of address $\mathbf{b^{1,k}}$ in $c_{p}$ . We assume that its content is represented by $i$ , while the content of its left (resp. right) neighbor is represented by $j$ (resp. $k$ ). We represent the fact that this implies that the content of the cell of address $\mathbf{b^{1,k}}$ is $w$ in the $\alpha$ -successor of $c_{p}$ by the following mapping $\mathcal{T}^{\alpha}_{i,j,k\rightarrow w}(x)$ :

[TABLE]

Note that the above formulation requires the content of the previous and of the next cells, which makes this mapping not applicable when $\mathbf{b^{1,k}}$ is the address of either the first or the last cell. We thus add rules to specifically deal with these two cases (they look at the content of the current and next cell when $\mathbf{b^{1,k}}$ is a vector of $y_{0}$ , and at the content of the current and previous cell when $\mathbf{b^{1,k}}$ is a vector of $y_{1}$ ). Note that there are only polynomially many such mappings to be built. Finally, we create a mapping $\mathcal{T}_{accept}(x)$ enforcing that configurations whose tape is in an accepting state (which we assume w.l.o.g. corresponds to the case where the first cell contains the $l^{\mathrm{th}}$ bit) are declared as accepting.

[TABLE]

The policy query is

[TABLE]

We will show that this policy query is disclosed if and only if the original Turing machine accepts on the empty tape.

B.3 Proof of Correctness

We show that the policy is disclosed if and only if $\mathscr{T}$ accepts on the empty tape. By Theorem 1, the policy is disclosed if and only if the corresponding ${\mathsf{HOCWQ}}$ problem has a positive answer. Further, this holds if and only if the policy query holds on the result of the visible chase (introduced after Theorem 9). We thus focus on showing the equivalence of the acceptance of the empty tape by $\mathscr{T}$ and the satisfaction of the policy in the visible chase.

Let us start by describing some relationships between the visible chase of $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$ and the run of $\mathscr{T}$ . As $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$ contains $\mathcal{T}_{{\mathsf{Init}}}(c_{{\mathsf{Crit}}})$ , there is in the visible chase the atom

[TABLE]

where all individuals but $c_{{\mathsf{Crit}}}$ are nulls.

Definition 1 (Tape Representation).

Let $T$ be a tape (with head position and state included) of $\mathscr{T}$ . A representation of $T$ is a set of atoms

[TABLE]

where $\mathbf{a}$ ranges over the binary representations of the addresses of $T$ , and such that for any cell of $T$ the following holds:

•

for any $\mathbf{a}$ , $\mathbf{v}$ contains fresh nulls except for the bit that represents the content of $T$ at address $\mathbf{a}$ , where it contains $c_{{\mathsf{Crit}}}$

•

for any $\mathbf{a}$ except the representation of the leftmost cell, $\mathbf{v_{prev}}$ contains fresh nulls except for the bit that represents the content of $T$ at address $\mathbf{a}-1$ ; in this bit it contains $c_{{\mathsf{Crit}}}$ ( $\mathbf{v_{prev}}$ exclusively contains fresh nulls for the leftmost cell)

•

for any $\mathbf{a}$ except the representation of the rightmost cell, $\mathbf{v_{next}}$ contains fresh nulls except for the bit that represents the content of $T$ at address $\mathbf{a}+1$ ; in this bit it contains $c_{{\mathsf{Crit}}}$ ( $\mathbf{v_{next}}$ exclusively contains fresh nulls for the rightmost cell)

In that case, $c_{n}$ is called a representative of $T$ .

Lemma 1.

$c_{\alpha}$ and $c_{\beta}$ , as defined above Definition 1, are representatives of the initial tape.

Proof.

We show the result for $c_{\alpha}$ , the same reasoning being applicable to $c_{\beta}$ . As the atom

[TABLE]

belongs to the visible chase, atoms of the shape

[TABLE]

for any vector $a_{1},\ldots,a_{k}$ with $a_{i}\in\{y_{0},y_{1}\}$ for any $i$ , are generated, where all nulls from $\mathbf{v},\mathbf{v_{prev}}$ and $\mathbf{v_{next}}$ are fresh (thanks to the rules involving ${\mathsf{GenAddr}}$ ). As the first argument and the ante-ante-penultimate argument of such an atom are equal, the definition of $\mathcal{T}_{{\mathsf{tape}}_{i}}(x)$ maps to the atom of address $y_{0},\ldots,y_{0}$ , and the body of $\mathcal{T}_{{\mathsf{tape_{o}}}}(x)$ maps to all the other atoms. Applying $\mathcal{T}_{{\mathsf{data}}_{n}}$ and $\mathcal{T}_{{\mathsf{data}}_{p}}$ then ensures that $c_{\alpha}$ is a representative for the initial tape, as no other mapping may merge a term of these atoms. ∎

Lemma 2.

If $c_{p}$ is a representative of a tape $T$ and if the visible chase contains

[TABLE]

then $\alpha$ (resp. $\beta$ ) is a representative of the tape $T_{\alpha}$ (resp. $T_{\beta}$ ) obtained by applying the $\alpha$ -transition (resp. $\beta$ -transition) applicable to $T$ .

Proof.

We show the result for the $\alpha$ -successor, the same reasoning being applicable for the $\beta$ -successor. As the visible chase contains

[TABLE]

it also contains atoms of the shape:

[TABLE]

for any vector $a_{1},\ldots,a_{k}$ , where all nulls from $\mathbf{v},\mathbf{v_{prev}}$ and $\mathbf{v_{next}}$ are fresh. Note that $c_{p}$ is necessarily distinct from $c_{root}$ (as it is the representative of a tape). Hence neither $\mathcal{T}_{{\mathsf{tape}}_{i}}(x)$ nor $\mathcal{T}_{{\mathsf{tape_{o}}}}(x)$ may unify a term with $c_{{\mathsf{Crit}}}$ . As $c_{p}$ is a representative of $T$ , for any address, if the $i^{\mathrm{th}}$ bit of $\mathbf{v}$ represents the actual value in $T$ at address $ad$ , then the visible chase contains ${\mathsf{Cell}}_{i}^{c}(c_{p},\mathbf{b^{1,k}},c_{{\mathsf{Crit}}})$ where $\mathbf{b^{1,k}}$ is the binary encoding of $ad$ . Similarly, ${\mathsf{Cell}}_{i}^{n}(c_{p},\mathbf{b^{1,k}},c_{{\mathsf{Crit}}})$ and ${\mathsf{Cell}}_{i}^{p}(c_{p},\mathbf{b^{1,k}},c_{{\mathsf{Crit}}})$ also belong to the visible chase where applicable. Then for all addresses, an application of the relevant mapping of the shape $\mathcal{T}^{\alpha}_{i,j,k\rightarrow w}(x)$ merges the null at the position representing the content of $T_{\alpha}$ with $c_{{\mathsf{Crit}}}$ . Applying $\mathcal{T}_{{\mathsf{data}}_{n}}$ and $\mathcal{T}_{{\mathsf{data}}_{p}}$ then ensures that $\alpha$ is a representative for $T_{\alpha}$ . ∎

Wrapping up the previous two lemmas, we get that there is a tree structure in the visible chase that corresponds exactly to the tree of configurations of the run of $\mathscr{T}$ : the two individuals $c_{\alpha}$ and $c_{\beta}$ are representatives of the initial configuration, and their children (which are the individuals at the second and third positions of the ${\mathsf{Children}}$ atom in which they appear at the first position) are representatives of the configurations that can be reached with an $\alpha$ or $\beta$ transition. It remains to check that the arguments representing the accepting status of a configuration are correctly set, which is the topic of the following lemma.

Lemma 3.

If $c_{p}$ is the representative of a tape, there is in the visible chase an atom of the shape

[TABLE]

if and only if $\mathscr{T}$ accepts on $T$ .

Proof.

Let $T$ be a tape of representative $c_{p}$ . There are four cases in which $\mathscr{T}$ accepts on $T$ :

•

the state of $\mathscr{T}$ is final in $T$ : this is the case if and only if $\mathcal{T}_{{\mathsf{accept}}}$ merges the fourth argument of ${\mathsf{Children}}_{\mathscr{Q}}(c_{p},\alpha,\beta,ac,ac_{\alpha},ac_{\beta},\mathbf{y_{0}^{k}},\mathbf{y_{1}^{k}},c_{root},c_{{\mathsf{Crit}}},y_{0},y_{1})$ with $c_{{\mathsf{Crit}}}$

•

the state of $\mathscr{T}$ is universal in $T$ and both its successors are accepting: by induction assumption (on the number of transitions that need to be applied to prove acceptance of a tape), both accepting bits of $\alpha$ and $\beta$ are unified with $c_{{\mathsf{Crit}}}$ , and thus the accepting bit of $c_{p}$ is unified with $c_{{\mathsf{Crit}}}$ thanks to $\mathcal{T}_{\forall}$

•

the state of $\mathscr{T}$ is existential in $T$ and its $\alpha$ -successor is accepting: by induction assumption, the accepting bit of $\alpha$ is unified with $c_{{\mathsf{Crit}}}$ , and thus the accepting bit of $c_{p}$ is unified with $c_{{\mathsf{Crit}}}$ thanks to $\mathcal{T}_{\exists,\alpha}$

•

similar case, with the $\beta$ -successor.

∎

Let us now use the above lemmas to show that $\mathscr{T}$ accepts if and only if the policy query $p$ holds in the visible chase. If $\mathscr{T}$ accepts, let us consider an accepting run of $\mathscr{T}$ . From Lemmas 1 and 2, we can build a configuration tree that contains a representative for all the tapes that are involved in this run. From Lemma 3, the accepting bits of the representative are set adequately, and the policy query holds in the visible chase.

Conversely, let us consider a visible chase sequence such that the policy query holds in its result. Let us first remark that the argument of the policy query appearing in the first position is equal to the argument in the ante-ante-penultimate position. This implies that none of the witnesses of mappings other than $\mathcal{T}_{\mathsf{Init}}$ need to be applied in order to entail the policy query, which can be seen from the following three facts: (i) none of the positions that may contain a configuration identifier may be unified with $c_{{\mathsf{Crit}}}$ ; (ii) all mappings contain configuration identifiers; (iii) only the witness associated with $\mathcal{T}_{\mathsf{Init}}$ may generate an atom of the shape

[TABLE]

This implies that in the visible chase sequence entailing the policy query, we start by introducing $\alpha$ and $\beta$ as in Lemma 1. Let us now consider the smallest set $S$ of configuration representatives that fulfills the following conditions:

•

$\alpha$ and $\beta$ are in $S$

•

if $c$ is in $S$ and the tape associated with $c$ is in a universal state, then both successors of $c$ are in $S$

•

if $c$ is in $S$ and the tape associated with $c$ is in an existential state, then a successor of $c$ having its acceptance bit equal to $c_{{\mathsf{Crit}}}$ is in $S$ .

By the previous lemmas, there exists an accepting run of $\mathscr{T}$ going exactly through the represented configurations.

B.4 Second Part of Proof of Theorem 6: ExpTime-hardness for Inclusion Dependencies and Guarded Maps in Bounded Arity

Recall the statement of the second part of Theorem 6:

${\mathsf{Disclose_{C}}}({\mathsf{IncDep}},{\mathsf{GuardedMap}})$ is ExpTime-hard even in bounded arity.

In the proof of the first part of Theorem 6, we used predicates of unbounded arity only to generate exponentially many cell addresses. Here, we use only $k$ addresses, and can encode their content through $k$ predicates ${\mathsf{Cell}}_{1}$ to ${\mathsf{Cell}}_{k}$ . Apart from this, the proof follows the same line of argumentation as in the first part.

Let us describe the source signature.

•

${\mathsf{Children}}_{\forall}(c,c_{\alpha},c_{\beta},ac,ac_{\alpha},ac_{\beta},r,z)$ states that a configuration $c$ is universal and has as children $c_{\alpha}$ and $c_{\beta}$ , and that the acceptance bit of $c$ is $ac$ , of $c_{\alpha}$ is $ac_{\alpha}$ and of $c_{\beta}$ is $ac_{\beta}$ . The last two positions are placeholders for the root of the tree of configurations, and $c_{{\mathsf{Crit}}}$ .

•

${\mathsf{Children}}_{\exists}(c,c_{\alpha},c_{\beta},ac,ac_{\alpha},ac_{\beta},r,z)$ : same meaning, except that $c$ is existential.

•

${\mathsf{Cell}}^{l}(c_{p},c_{n},\mathbf{v},z)$ states that the cell of address $l$ of the tape represented by $c_{n}$ has a content represented by $\mathbf{v}$ . The last position is a placeholder for $c_{{\mathsf{Crit}}}$ .

•

${\mathsf{Cell}}_{i}^{l}(c,x,z)$ states that the cell of address $l$ in configuration $c$ contains $x$ at the $i^{\mathrm{th}}$ position of the representation of its content.

•

${\mathsf{succ}}_{\alpha}(c_{p},c_{n})$ states that $c_{n}$ is the $\alpha$ -successor of $c_{p}$ (and similarly for $\beta$ )

The symbol $\mathscr{Q}$ always ranges over $\{\forall,\exists\}$ .

Initialization.

We first define a mapping $\mathcal{T}_{{\mathsf{Init}}}(x)$ , introducing some elements in the visible chase, whose definition is:

[TABLE]

Generation of the tree of configuration.

$\alpha$ -successors have themselves $\alpha$ - and $\beta$ -successors, and are existential if their parent is universal:

[TABLE]

And similarly for ${\mathsf{Children}}_{\exists}$ and for the $\beta$ -successor.

Universal and existential acceptance condition.

If both successors of a universal configuration $n$ are accepting, so is $n$ . We create a mapping $\mathcal{T}_{\forall}(x)$ having definition:

[TABLE]

If the $\alpha$ -successor of an existential configuration $n$ is accepting, so is $n$ . We create a mapping $\mathcal{T}_{\exists,\alpha}(x)$ having definition:

[TABLE]

We create a similar mapping $\mathcal{T}_{\exists,\beta}$ for the $\beta$ -successor.
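The two acceptance rules amount to a bottom-up evaluation of the tree of configurations. The following Python sketch is only an illustration of that reading: the `Config` class and the `accepts` function are hypothetical names that do not appear in the reduction, and the acceptance bit of a configuration without successors is supplied directly rather than derived from a tape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    universal: bool                    # True: universal configuration, False: existential
    alpha: Optional["Config"] = None   # alpha-successor, if any
    beta: Optional["Config"] = None    # beta-successor, if any
    accepting_leaf: bool = False       # acceptance bit when there are no successors


def accepts(c: Config) -> bool:
    """Propagate acceptance upwards, as the mappings for universal and
    existential configurations do for the acceptance bits."""
    if c.alpha is None or c.beta is None:
        return c.accepting_leaf
    a, b = accepts(c.alpha), accepts(c.beta)
    # Universal: both successors must accept. Existential: one suffices.
    return (a and b) if c.universal else (a or b)
```

In the reduction itself, "accepting" is expressed by the acceptance bit being merged with $c_{{\mathsf{Crit}}}$ rather than by a Boolean value.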

Tape representation and consistency of tapes.

We now focus on the representation of the tape and its consistency. For each configuration, we generate $k$ cells whose content is initialized freshly:

[TABLE]

and similarly for existential configurations and for the $\beta$ -successors. Note that the values $\mathbf{v}$ , $\mathbf{v_{prev}}$ and $\mathbf{v_{next}}$ are again vectors whose length is the size of $(\Sigma\cup\{\flat\})\times(Q\cup\{\bot\})$ . We again use the notation $\mathbf{l_{i}}(x)$ to represent a vector of the same length, composed of fresh variables, except for the position $i$ , which contains $x$ .
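As a small illustration of the notation $\mathbf{l_{i}}(x)$ , the sketch below builds such a vector. The helper names `fresh_var` and `l_vector`, the 1-based indexing, and the string representation of variables are assumptions made for this example only.

```python
import itertools

_fresh = itertools.count()


def fresh_var() -> str:
    """Return a variable name that has not been produced before."""
    return f"_f{next(_fresh)}"


def l_vector(i: int, x: str, length: int) -> list[str]:
    """Vector of `length` entries that are all fresh variables,
    except position i (1-based), which holds x."""
    if not 1 <= i <= length:
        raise ValueError("position out of range")
    return [x if pos == i else fresh_var() for pos in range(1, length + 1)]
```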

To ensure guardedness, we first introduce auxiliary predicates ${\mathsf{Cell}}_{i}^{c},{\mathsf{Cell}}_{i}^{p}$ and ${\mathsf{Cell}}_{i}^{n}$ that define the content of the $i^{\mathrm{th}}$ bit of the value of the current, previous and next cells:

[TABLE]

We enforce that the tape of the initial configuration has the head of the Turing machine on the first cell (and assume w.l.o.g. that this is represented by the first position of $\mathbf{v}$ containing $c_{{\mathsf{Crit}}}$ ) with all the other cells containing $\flat$ . We also assume w.l.o.g. that a cell containing $\flat$ is represented by the second position of $\mathbf{v}$ containing $c_{{\mathsf{Crit}}}$ . We thus create the mappings $\mathcal{T}_{tape_{i}}(x)$ , for the first cell, having definition:

[TABLE]

and $\mathcal{T}^{l}_{tape_{o}}(x)$ , for all the other cells ( $2\leq l\leq n$ ), with definition:

[TABLE]

Note that this data is associated with the children of the root (as $c_{p}$ is both in the first and the penultimate positions of the atoms), and not with the root itself, due to the choice of keeping in ${\mathsf{Cell}}^{l}$ the identifier of the parent of the considered configuration.

We then check that the tape associated with the $\alpha$ -successor of a configuration is indeed obtained by applying an $\alpha$ -transition. This is done by noticing that the value of each cell of the $\alpha$ -successor is determined by the value of the cell and its two neighbors in the original configuration (the neighbors are necessary to know whether the head of the Turing machine is now in the considered cell). To ensure guardedness, we first define a predicate marking $\alpha$ -successors (and similarly for $\beta$ -successors):

[TABLE]

Let us consider a cell of address $l$ in $c_{p}$ . We assume that its content is represented by $i$ , while the content of its left (resp. right) neighbor is represented by $j$ (resp. $k$ ). We represent the fact that this implies that the content of the cell of address $l$ is $w$ in the $\alpha$ -successor of $c_{p}$ by the following mapping $\mathcal{T}^{\alpha}_{i,j,k\rightarrow w}(x)$ :

[TABLE]
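The locality observation used here (each cell of the successor depends only on the cell and its two neighbors in the parent configuration) can be sketched in Python. This is an illustration under stated assumptions, not the encoding of the reduction: a cell is a pair `(symbol, state)` with `state` set to `None` when the head is elsewhere, `delta` is a hypothetical dictionary for one fixed choice of transition (say $\alpha$ ), and the head is assumed to move left or right at every step.

```python
from typing import Optional

Cell = tuple[str, Optional[str]]  # (symbol, state); state is None if the head is absent


def next_cell(left: Optional[Cell], cur: Cell, right: Optional[Cell], delta) -> Cell:
    """Content of one cell in the successor, computed from a window of three cells.
    `left` / `right` are None for the first / last cell of the tape."""
    sym, state = cur
    if state is not None:
        # The head is here: it writes a symbol and moves away.
        new_sym, _move, _new_state = delta[(state, sym)]
        return (new_sym, None)
    if left is not None and left[1] is not None:
        _written, move, new_state = delta[(left[1], left[0])]
        if move == "R":
            return (sym, new_state)   # the head arrives from the left
    if right is not None and right[1] is not None:
        _written, move, new_state = delta[(right[1], right[0])]
        if move == "L":
            return (sym, new_state)   # the head arrives from the right
    return (sym, None)                # the cell is unaffected


def step(tape: list[Cell], delta) -> list[Cell]:
    """Apply next_cell to every address of the tape."""
    n = len(tape)
    return [
        next_cell(tape[l - 1] if l > 0 else None,
                  tape[l],
                  tape[l + 1] if l + 1 < n else None,
                  delta)
        for l in range(n)
    ]
```

Each triple of window contents together with its result corresponds to one choice of $i,j,k\rightarrow w$ in the mapping above.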

As in the non-bounded case, the first (resp. last) cell should be dealt with separately, as there is no content in the (non-existent) previous (resp. next) cell. And we finally create a mapping $\mathcal{T}_{accept}(x)$ enforcing that configurations whose tape is in an accepting state (which we assume w.l.o.g. corresponds to the case where the first cell contains the $l_{f}^{\mathrm{th}}$ bit) are declared as accepting.

[TABLE]

The policy is

[TABLE]

We can verify that this policy is disclosed if and only if the original Turing machine accepts on the empty tape, using a similar reasoning to the unbounded case.

B.5 Final Part of Proof of Theorem 6: Reduction from ${\mathsf{GTGD}}$ and ${\mathsf{ProjMap}}$ to ${\mathsf{IncDep}}$ and ${\mathsf{GuardedMap}}$

Theorem 6 states a 2ExpTime lower bound for general arity and an ExpTime lower bound in bounded arity for two different cases. The first case was when the source constraints are ${\mathsf{IncDep}}$ s and the mappings are guarded. The previous sections of the appendix have gone through the proofs of this case in detail. We now finish the proof of Theorem 6 showing:

${\mathsf{Disclose_{C}}}({\mathsf{GTGD}},{\mathsf{ProjMap}})$ is 2ExpTime-hard, and is ExpTime-hard even in bounded arity.

The proof of both of these assertions follows directly from Corollary 4 (the general reduction of maps presented in section A.2).

We have seen that ${\mathsf{IncDep}}$ and ${\mathsf{GuardedMap}}$ reduce to ${\mathsf{GTGD}}$ and ${\mathsf{ProjMap}}$ ; therefore the lower bound for ${\mathsf{Disclose}}({\mathsf{GTGD}},{\mathsf{ProjMap}},p)$ follows from the lower bound for ${\mathsf{Disclose}}({\mathsf{IncDep}},{\mathsf{GuardedMap}},p)$ .

B.6 Proof of Theorem 7: ExpTime-hardness for Inclusion Dependencies and Atomic Maps, and for ${\mathsf{LTGD}}$ s with Projection Maps

Recall the statement of Theorem 7:

${\mathsf{Disclose_{C}}}({\mathsf{IncDep}},{\mathsf{AtomMap}})$ and ${\mathsf{Disclose_{C}}}({\mathsf{LTGD}},{\mathsf{ProjMap}})$ are both ExpTime-hard.

We first focus on the case of ${\mathsf{Disclose_{C}}}({\mathsf{IncDep}},{\mathsf{AtomMap}})$ . We adapt the construction used for PSpace-hardness of entailment with ${\mathsf{IncDep}}$ s (Casanova et al. [1984]) to show ExpTime-hardness for ${\mathsf{IncDep}}$ source constraints and atomic maps. We start with an alternating Turing machine $\mathcal{M}$ (rather than a deterministic one, as in Casanova et al. [1984]) and an input $x$ , and consider the problem asking whether there exists a halting computation of $\mathcal{M}$ that uses at most $|x|$ cells. As in the original reduction, we use inclusion dependencies to simulate the transition relation of $\mathcal{M}$ . The adaptation lies in the additional use of a fresh position holding a configuration identifier, and the generation of a tree of configurations, as in the reduction presented in Theorem 6.

Let us describe the signature:

•

${\mathsf{Config}}^{\mathscr{Q}}(c,ac,\mathbf{v},z)$ states that the configuration $c$ has quantification $\mathscr{Q}$ , accepting bit $ac$ , and a tape represented by $\mathbf{v}$ . The last argument will always hold $c_{{\mathsf{Crit}}}$ in the visible chase;

•

${\mathsf{Transition}}^{\mathscr{Q}}_{t_{\alpha},t_{\beta}}(c,ac,\mathbf{v},\alpha,ac_{\alpha},\beta,ac_{\beta},z)$ names two successor configurations $\alpha$ and $\beta$ , with acceptance bits $ac_{\alpha}$ and $ac_{\beta}$ , which are obtained from $c$ by applying transitions $t_{\alpha}$ and $t_{\beta}$ .

Let us turn to the description of $\mathbf{v}$ and subsequently $t_{\alpha}$ . $\mathbf{v}$ represents the content of the tape: for each position of the tape, there is an argument for each pair in $\Sigma\times(Q\cup\{\bot\})$ . Intuitively, this argument is equal to $c_{{\mathsf{Crit}}}$ if and only if the position contains the corresponding letter and head, and a fresh null otherwise.

We introduce a mapping that initializes the tape:

[TABLE]

As in the proof of Theorem 6, we propagate the acceptance information using mappings. For a universal state, we use a mapping with definition:

[TABLE]

For an existential state, we use two mappings with definitions:

[TABLE]

and

[TABLE]

As before, we notice that the state of a cell after applying a transition is deterministically defined by its content as well as the content of its left and right neighbor. The following inclusion dependency states that from any configuration, we can try to apply all possible transitions to generate the $\alpha$ - and $\beta$ -successors:

[TABLE]

We now generate the tape associated with the $\alpha$ -transition (and similarly for the $\beta$ -transition):

[TABLE]

where $\vec{\mathscr{Q}}$ denotes the dual quantifier. Let us describe the vector $\mathbf{v_{\alpha}}\oplus\mathbf{v^{\prime}_{\alpha}}$ . Suppose $t_{\alpha}$ is the transition that checks whether position $i$ contains $a$ , position $i+1$ contains $b$ and the head in state $s$ , and position $i+2$ contains $c$ ; changes $b$ to $b^{\prime}$ , moves the head to the right and goes into state $s^{\prime}$ . Then $\mathbf{v_{\alpha}}\oplus\mathbf{v^{\prime}_{\alpha}}$ is defined as follows:

•

any argument that corresponds to a position distinct from $i+1$ or $i+2$ is chosen equal to the argument at the same position in $\mathbf{v}$ ;

•

the argument that corresponds to $(i+1,(b^{\prime},\bot))$ now contains the value of $\mathbf{v}$ at position $((i+1),(b,s))$ , and all other variables appearing in an argument corresponding to position $(i+1)$ are existentially quantified;

•

the argument that corresponds to $((i+2),(c,s^{\prime}))$ now contains the value of $\mathbf{v}$ at position $((i+2),(c,\bot))$ , and all other variables appearing in an argument corresponding to position $(i+2)$ are existentially quantified.
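The construction of $\mathbf{v_{\alpha}}\oplus\mathbf{v^{\prime}_{\alpha}}$ can be illustrated with a small Python sketch. Everything below is a hypothetical rendering for intuition only: the tape vector is a dictionary indexed by `(position, (letter, head))`, positions are 0-based, `CRIT` stands for $c_{{\mathsf{Crit}}}$ , fresh nulls stand for existentially quantified variables, and the sketch assumes that the slot for $(c,s^{\prime})$ at position $i+2$ is filled from the slot of $\mathbf{v}$ for the letter $c$ without the head.

```python
import itertools

CRIT = "c_crit"
_nulls = itertools.count()


def fresh_null() -> str:
    return f"_n{next(_nulls)}"


def encode_tape(tape, alphabet, states):
    """tape: list of (letter, state-or-None). One slot per position and per pair
    (letter, head); a slot holds CRIT iff it matches the cell content."""
    slots = [(a, q) for a in alphabet for q in list(states) + [None]]
    return {(pos, slot): (CRIT if slot == content else fresh_null())
            for pos, content in enumerate(tape) for slot in slots}


def apply_right_move(v, i, b, s, c, b_new, s_new):
    """Transition reading b with the head in state s at position i+1 and c at
    position i+2; writes b_new, moves right, enters s_new. Applicability is
    deliberately not checked."""
    out = {}
    for (pos, slot), val in v.items():
        if pos == i + 1:
            # Only the slot (b_new, no head) inherits a value; the rest is fresh.
            out[(pos, slot)] = v[(pos, (b, s))] if slot == (b_new, None) else fresh_null()
        elif pos == i + 2:
            # Only the slot (c, s_new) inherits a value; the rest is fresh.
            out[(pos, slot)] = v[(pos, (c, None))] if slot == (c, s_new) else fresh_null()
        else:
            out[(pos, slot)] = val  # untouched positions are copied
    return out
```

When the transition is not applicable, the inherited values are nulls rather than `CRIT`, so the resulting vector no longer describes a tape whose cells are all marked with the critical value.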

Note that here we have a distinction with the previous reduction: we do not check that a transition is applicable before applying it, as this would be out of the capabilities of ${\mathsf{IncDep}}$ . However, the same argument as in Casanova et al. [1984] proves that a configuration reached from simulating a non-applicable transition cannot lead to an accepting state. We choose as a policy:

[TABLE]

Proposition 10.

The policy is disclosed if and only if there is an accepting computation that uses at most $|x|$ cells.

The lower bound for ${\mathsf{Disclose_{C}}}({\mathsf{LTGD}},{\mathsf{ProjMap}})$ follows by reduction:

Proposition 11.

There is a polynomial time reduction from ${\mathsf{Disclose_{C}}}({\mathsf{IncDep}},{\mathsf{AtomMap}})$ to ${\mathsf{Disclose_{C}}}({\mathsf{LTGD}},{\mathsf{ProjMap}})$ .

Proof.

Given a mapping $\phi(\vec{x})\rightarrow\exists\vec{y}~{}H(\vec{t})$ where there may be repeated variables in the head atom, we replace it by a projection mapping

[TABLE]

where $H^{\prime}$ is a new predicate whose arity is the number of distinct variables in $H(\vec{t})$ . $H^{\prime}(\vec{t}^{\prime})$ has the same variables as $H$ , but with no repetition. For example, if the head of the original rule is $H(x,x,y)$ , then the new rule has head $H^{\prime}(x,y)$ .

We additionally add the source constraint:

[TABLE]

It is easy to see that this transformation preserves disclosure. ∎
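The renaming step in this proof, which removes repeated variables from a head atom, can be sketched as follows. The function name `dedup_head` and the convention of priming the predicate name are assumptions for illustration; the sketch covers only the head rewriting and not the added source constraint.

```python
def dedup_head(pred: str, args: list[str]) -> tuple[str, list[str]]:
    """Replace a head atom pred(args) by a new atom over the distinct
    variables of args, keeping the order of first occurrence."""
    distinct: list[str] = []
    for a in args:
        if a not in distinct:
            distinct.append(a)
    return pred + "'", distinct
```

For the head $H(x,x,y)$ from the example in the proof, this yields the predicate name `H'` with arguments `x, y`.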

B.7 Proof of Theorem 8: lower bounds for ${\mathsf{IncDep}}$ s in bounded arity

Recall the statement of Theorem 8:

${\mathsf{Disclose_{C}}}({\mathsf{IncDep}},{\mathsf{Map}})$ is 2ExpTime-hard in bounded arity.

This proof will be very similar to the proof of Theorem 6. We will provide a reduction from an alternating ExpSpace Turing machine to ${\mathsf{IncDep}}$ and ${\mathsf{SCEQrule}}$ s. We show how to simulate the run of an alternating ExpSpace Turing machine $\mathcal{M}$ with inclusion dependencies and ${\mathsf{SCEQrule}}$ s.

The main difference between the proof of Theorem 6 and the proof here is that in Theorem 6 each cell carried $n$ bits $b_{1}\dots b_{n}$ specifying the address of the cell. In this version we cannot use this trick, as we are using a reduction where all predicates have bounded arity. For each configuration, the tape will be represented in the leaves of a full binary tree of depth $n$ . For a cell $c$ , the $n$ bits specifying the address of $c$ will be scattered across the $n$ predicates in its lineage, each holding one bit of the address: an internal node has two descendants each carrying four values $b,\vec{b},y_{0},y_{1}$ . We will have $b=y_{0}$ and $\vec{b}=y_{1}$ when the node represents the addresses where the $i$ -th bit is $0$ and $b=y_{1}$ , $\vec{b}=y_{0}$ when it is $1$ .
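To make the scattering of address bits concrete, the following sketch lists, for a given address, the pair of values carried at each depth of the lineage. The function name `address_lineage` and the string constants standing for $y_{0}$ and $y_{1}$ are assumptions for this example.

```python
def address_lineage(address_bits: str, y0: str = "y0", y1: str = "y1"):
    """For an address given as a string of bits, return one pair (b, b_bar)
    per depth: (y0, y1) when the bit is 0 and (y1, y0) when it is 1."""
    pairs = []
    for bit in address_bits:
        if bit == "0":
            pairs.append((y0, y1))
        elif bit == "1":
            pairs.append((y1, y0))
        else:
            raise ValueError("address must consist of the characters 0 and 1")
    return pairs
```

Each predicate in the lineage thus has fixed arity, while the full address is recovered only by following the path from the root to the leaf.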

Let us describe our source signature:

•

${\mathsf{Children}}_{\forall}(c,c_{\alpha},c_{\beta},ac,ac_{\alpha},ac_{\beta},r,y_{0},y^{bis}_{0},y_{1},y^{bis}_{1})$ states that a configuration $c$ is universal and has children $c_{\alpha}$ and $c_{\beta}$ , and that the acceptance bit of $c$ is $ac$ , of $c_{\alpha}$ is $ac_{\alpha}$ and of $c_{\beta}$ is $ac_{\beta}$ . The last four positions are placeholders for $r$ the root of the tree of configurations, two values $y_{0}=y^{bis}_{0}$ representing $0$ and two values $y_{1}=y^{bis}_{1}$ representing $1$ .

•

${\mathsf{Children}}_{\exists}(c,c_{\alpha},c_{\beta},ac,ac_{\alpha},ac_{\beta},r,y_{0},y^{bis}_{0},y_{1},y^{bis}_{1})$ : same meaning, except that $c$ is existential.

•

for $i\in 1..n$ , ${\mathsf{Address}}_{i}(c_{p},c_{n},b_{i},\vec{b}_{i},y_{0},y_{1})$ corresponds to a node of depth $i$ in the binary tree representing the tape of a configuration. In this predicate $c_{p}$ is the parent of the node, $c_{n}$ is the current node, $b_{i}$ will be equal to $y_{0}$ when the node is the first child of $c_{p}$ and equal to $y_{1}$ otherwise. $\vec{b}_{i}$ will be the complement of $b_{i}$ (i.e. $y_{0}=b_{i}$ implies $y_{1}=\vec{b}_{i}$ and $y_{1}=b_{i}$ implies $y_{0}=\vec{b}_{i}$ ).

•

${\mathsf{Cell}}^{c}(c,\vec{v})$ states that the cell at position $c$ contains the data represented by $\vec{v}$ . ${\mathsf{Cell}}^{p}$ and ${\mathsf{Cell}}^{n}$ play similar roles for the previous cell and the next cell.

Critical element.

We create a mapping $T_{c_{{\mathsf{Crit}}}}(x)$ defined as ${\mathsf{IsCrit}}(x)$ . The relation ${\mathsf{IsCrit}}$ will allow us to test whether a variable is equal to $c_{{\mathsf{Crit}}}$ .

Initialization.

We first define a mapping $\mathcal{T}_{init}()$ introducing some elements in the visible chase, whose definition is:

[TABLE]

Generating the tree of configuration.

$\alpha$ -successors have themselves $\alpha$ - and $\beta$ -successors, and are existential if their parent is universal:

[TABLE]

And similarly for ${\mathsf{Children}}_{\exists}$ and for the $\beta$ -successor.

Universal and Existential Acceptance Condition.

If both successors of a universal configuration $n$ are accepting, so is $n$ . We create a mapping $\mathcal{T}_{\forall}(x)$ with definition:

[TABLE]

If the $\alpha$ -successor of an existential configuration $n$ is accepting, so is $n$ . We create a mapping $\mathcal{T}_{\exists,\alpha}(x)$ with definition:

[TABLE]

We create a similar mapping $\mathcal{T}_{\exists,\beta}$ for the $\beta$ -successor.

Generating the tape cells.

We now focus on the representation of the tape and its consistency. We generate $2^{k}$ addresses and associated values:

[TABLE]

And for $i\in 1..n-1$ we have:

[TABLE]

Finally for $n$ we have:

[TABLE]

Initialization of the tape.

For the case 0, we use the pattern $l_{1}$ and introduce a mapping $\mathcal{T}_{tape0}(x)$ defined as:

[TABLE]

For all other cases, with a first $1$ at the $i$ -th bit, we use the pattern $l_{0}$ and introduce $\mathcal{T}_{tape~{}i}(x)$ :

[TABLE]

Ensuring the coherence between ${\mathsf{Cell}}^{c}$ and ${\mathsf{Cell}}^{p}$ .

We need to check the coherence between ${\mathsf{Cell}}^{c}$ at an address and ${\mathsf{Cell}}^{p}$ at the previous address. As usual, when $v$ is at the address $\vec{b}10^{j}$ , ${\mathsf{Cell}}^{c}$ needs to be checked against the ${\mathsf{Cell}}^{p}$ at the address $\vec{b}01^{j}$ . We introduce the mapping $\mathcal{T}_{prev~{}j}(x)$ :

[TABLE]
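The address arithmetic behind $\mathcal{T}_{prev~{}j}$ is the usual binary decrement: an address of the form $\vec{b}10^{j}$ has predecessor $\vec{b}01^{j}$ . A minimal Python sketch, with the hypothetical function name `previous_address` and addresses written as bit strings with the most significant bit first:

```python
def previous_address(addr: str) -> str:
    """Map an address of the form b 1 0^j to b 0 1^j."""
    k = addr.rfind("1")          # position of the last 1; the suffix after it is 0^j
    if k == -1:
        raise ValueError("the all-zero address has no predecessor")
    j = len(addr) - k - 1
    return addr[:k] + "0" + "1" * j
```

One mapping per value of $j$ is needed because the position of the last $1$ determines which bits of the two paths agree and which are flipped.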

Encoding transitions.

As in previous reductions, we encode the transitions of $\delta_{\epsilon}$ as a set of $(i,j,k)\rightarrow w$ (where $i$ is the value of the cell, $j$ is the value at the cell before and $k$ at the cell after and $w$ is the written value).

For our transition, we need to write $w$ at the same address as the one where $l$ lies, in the $\epsilon$ -child of the configuration of $c$ . To be at the same address, we need to check that the path follows the same bits (which we denote here by $b_{n}\dots b_{1}$ ). We use the mapping $\mathcal{T}_{i,j,k\rightarrow w}^{\epsilon}(x)$ defined as:

[TABLE]

Encoding final states.

Whenever the current state is $q_{accept}$ , we need to enforce that the $ac$ bit is set. To this end, we introduce the following mapping for each value $k\in\{q_{accept}\}\times\Sigma$ marking a final state:

[TABLE]

Policy.

The policy query is

[TABLE]

We can verify that the policy query is disclosed if and only if the original Turing machine accepts on the empty tape.

B.8 Maximality of our Tractability Conditions

Recall that Theorem 5 shows that we can get tractability by simultaneously restricting our constraints to be ${\mathsf{UID}}$ s and our mappings to be ${\mathsf{ProjMap}}$ s. Recall also that a ${\mathsf{UID}}$ is an ${\mathsf{IncDep}}$ with at most one exported variable. Here we show that these restrictions are maximal in the following sense: if we increase from ${\mathsf{UID}}$ s to ${\mathsf{LTGD}}$ s with frontier one we get intractability. We also get intractability if we stick with ${\mathsf{UID}}$ s but we allow the mappings to be atomic. Let ${\mathsf{Fr1}}{\mathsf{LTGD}}$ denote the ${\mathsf{LTGD}}$ s with at most one exported variable. In fact, we will show something stronger (here $\emptyset$ denotes no constraints):

Theorem 10.

${\mathsf{Disclose_{C}}}(\emptyset,{\mathsf{AtomMap}})$ and ${\mathsf{Disclose_{C}}}({\mathsf{Fr1}}{\mathsf{LTGD}},{\mathsf{ProjMap}})$ are both NP-hard.

In order to prove our results, we will rely again on Proposition 1, which states that testing for disclosure is equivalent to evaluating the policy query on the result of the visible chase process. The process starts with the instance ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ , which has source witnesses for each tuple in $\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})}$ . It proceeds by alternating traditional chase steps and merge steps, which are applications of a ${\mathsf{SCEQrule}}$ . It is well known that query evaluation is NP-hard on arbitrary instances. But the constraints that we are considering in this section do not allow us to generate arbitrary instances as results of the visible chase. In this section we will exhibit an instance $\mathcal{D}$ on which query evaluation is NP-hard, but where $\mathcal{D}$ can be the result of the visible chase using ${\mathsf{AtomMap}}$ s but no constraints, or of a visible chase using ${\mathsf{Fr1}}{\mathsf{LTGD}}$ constraints and ${\mathsf{ProjMap}}$ s.

The instance $\mathcal{D}$ .

$\mathcal{D}$ will have one relation $R$ with $6$ atoms. We present the content of $R$ below. Empty cells are filled with fresh nulls, $c$ is the only value shared by two tuples, and the $n_{i}$ are nulls that are shared inside a tuple:

[TABLE]

Note that this is a single-shared value instance: only one value, namely $c$ , is shared among multiple tuples. Such instances can be produced as the result of the visible chase over atomic mappings with no constraints. In this case:

[TABLE]

They can also be produced as the result of the visible chase over one projection mapping $A(x)\rightarrow T(x)$ with $6$ ${\mathsf{Fr1}}{\mathsf{LTGD}}$ s:

[TABLE]

The remainder of the argument is to show that CQ evaluation is NP-hard over this instance, via reduction from satisfiability of a propositional circuit (Circuit SAT).

General idea of the reduction.

The reduction that we provide will create a query $Q$ for each instance $\mathcal{I}$ of Circuit SAT. Without loss of generality, we suppose that $\mathcal{I}$ is composed of wires $w_{1},\dots,w_{k}$ , of negation gates $N_{1},\dots N_{l}$ and of binary OR gates $O_{1},\dots,O_{m}$ . Wires that are not the output of any gate are the inputs of the circuit. We will suppose that the output corresponds to the wire $w_{1}$ .

We will build the query $Q$ to contain conjuncts for each wire, each negation gate and each binary OR. Furthermore we will create a variable $v_{i}$ for each wire $w_{i}$ .

For the sake of readability, we present the conjuncts graphically, with each row representing an $R$ atom. A row with entries $t_{j_{1}}\ldots t_{j_{k}}$ represents an atom $R(\vec{w})$ where $w_{i}$ is a fresh existentially quantified variable when the cell is empty and the variable $t_{j_{i}}$ in the cell otherwise.

Wires.

For each wire $w_{i}$ , we will force the value of its associated variable $v_{i}$ to be either $c$ (when the wire carries the value true) or $n_{1}$ (when the wire carries false).

For each wire $i$ , we will have a conjunct:

[TABLE]

For the variable $v_{1}$ corresponding to the output wire we also add a conjunct:

[TABLE]

Negation.

For each negation gate $N_{k}$ , whose input is the wire $i$ and output is the wire $j$ , we will have the following conjuncts:

[TABLE]

Computing binary OR.

For each binary OR gate $O_{\ell}$ whose inputs are the wires $v_{i}$ and $v_{j}$ and whose output is $v_{k}$ , we introduce the following conjuncts:

[TABLE]

Proof that this reduction captures Circuit-SAT.

Let us suppose that the circuit is satisfied. Towards showing that the query is satisfied in the instance $\mathcal{D}$ , we first build a binding for the variables that are shared among several of the conjunct groups above, which are exactly the “wire variables” $v_{i}$ . We do this by setting $v_{i}=c$ when $w_{i}=\top$ and $v_{i}=n_{1}$ when $w_{i}=\bot$ . We now show that this binding extends to a valuation making the query true. Since all the other variables are not shared between the conjunct groups, it suffices to show satisfiability of each conjunct group in isolation.

•

We see that all the conjuncts corresponding to wires are satisfied (even the special conjunct corresponding to the output).

•

Consider a negation gate $N_{k}$ whose input is $v_{i}$ and output is $v_{j}$ . When $w_{i}=\top$ and thus $v_{i}=c$ , we can set $r_{k}=n_{5}$ , $p_{k}=c$ and satisfy all 3 conjuncts. When $w_{i}=\bot$ and thus $v_{i}=n_{1}$ , we can set $r_{k}=c$ and $p_{k}=n_{6}$ and satisfy all 3 conjuncts.

•

Consider an OR gate $O_{\ell}$ whose inputs are $v_{i}$ and $v_{j}$ , and whose output is $v_{k}$ . There are four cases:

–

when $w_{i}=w_{j}=\top$ and thus $v_{i}=v_{j}=c$ we can set $x_{\ell}=y_{\ell}=n_{4}$

–

when $w_{i}=\bot$ and $w_{j}=\top$ and thus $v_{i}=n_{1}$ , $v_{j}=c$ we can set $x_{\ell}=c$ , $y_{\ell}=n_{3}$

–

when $w_{j}=\bot$ and $w_{i}=\top$ and thus $v_{j}=n_{1}$ , $v_{i}=c$ we can set $x_{\ell}=n_{2}$ , $y_{\ell}=c$

–

when $w_{i}=w_{j}=\bot$ and thus $v_{i}=v_{j}=n_{1}$ we can set $x_{\ell}=y_{\ell}=c$

In all cases, our conjuncts are satisfied.

Conversely, let us show that when the query is satisfied in our instance $\mathcal{D}$ , then the circuit is satisfiable. Let $h$ be a homomorphism from the query variables to values. Since we have wire conjuncts constraining $v_{i}$ for each wire $w_{i}$ , we can see that $h(v_{i})=n_{1}$ or $h(v_{i})=c$ . We now consider the circuit assignment such that $w_{i}=\top$ when $h(v_{i})=c$ and $w_{i}=\bot$ when $h(v_{i})=n_{1}$ . Let us show that this assignment witnesses the satisfiability of the circuit.

•

The output wire is already constrained such that $h(v_{1})\in\{n_{1},c\}$ , but it also has a special conjunct, so the only remaining possibility for $h(v_{1})$ is $c$ , and thus the output wire is set to $\top$ .

•

For each negation gate whose input is $w_{i}$ and output is $w_{j}$ :

–

when $h(v_{i})=n_{1}$ then the conjunct holding $v_{i}$ and $r_{k}$ (i.e. the first row in the graphical representation) forces that $h(r_{k})=c$ . The conjunct holding $r_{k}$ and $p_{k}$ forces $p_{k}$ to be a fresh null or $n_{6}$ . But since $p_{k}$ appears in the column $\lnot^{2}b$ and in the column $d$ , we can only have $p_{k}=n_{6}$ and thus $v_{j}=c$ .

–

when $h(v_{i})=c$ then the conjunct holding $v_{i}$ and $r_{k}$ forces that $h(r_{k})=n_{5}$ or $h(r_{k})=n_{4}$ . Then the conjunct holding $r_{k}$ and $p_{k}$ forces $p_{k}$ to be either a fresh null (when $h(r_{k})=n_{4}$ ) or $c$ (when $h(r_{k})=n_{5}$ ). Since $p_{k}$ appears in the column $\lnot^{2}b$ and in the column $d$ , $p_{k}$ cannot be a fresh null; we conclude that $p_{k}=c$ , and thus $v_{j}=n_{1}$ .

In both cases, the semantics of the negation gate is respected.

•

Consider each OR gate whose inputs are $w_{i}$ , $w_{j}$ and output is $w_{k}$ . First we have:

–

when $h(v_{i})=n_{1}$ then necessarily $h(x_{\ell})=c$

–

when $h(v_{i})=c$ then $h(x_{\ell})=n_{2}$ or $h(x_{\ell})=n_{4}$ or $h(x_{\ell})=n_{5}$

–

when $h(v_{j})=n_{1}$ then necessarily $h(y_{\ell})=c$

–

when $h(v_{j})=c$ then $h(y_{\ell})=n_{3}$ or $h(y_{\ell})=n_{4}$ .

Therefore we see that:

–

when $h(v_{i})=n_{1}=h(v_{j})$ then necessarily $h(x_{\ell})=h(y_{\ell})=c$ and thus $h(v_{k})=n_{1}$

–

when $h(v_{i})=c$ and $h(v_{j})=n_{1}$ then $h(x_{\ell})=n_{2}$ and thus $h(v_{k})=c$

–

when $h(v_{i})=n_{1}$ and $h(v_{j})=c$ then necessarily $h(y_{\ell})=n_{3}$ and thus $h(v_{k})=c$

–

when $h(v_{j})=c=h(v_{i})$ then $h(x_{\ell})=n_{4}=h(y_{\ell})$ and thus $h(v_{k})=c$

In all cases, the semantics of the OR gate is respected.

All in all, we have seen that the circuit is satisfiable if and only if the query has a solution on the visible chase.

B.9 Lower Bounds Inherited from Entailment

In the body of the paper we claimed that in several cases, we could show that the complexity of disclosure for a class was at least as hard as the complexity of query entailment for the class. We do not claim that there is a generic reduction from query entailment to disclosure. There is a simple reduction from entailment for special classes of instances to disclosure. More specifically, disclosure is easily seen to subsume entailment on instances of the form ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ . But one needs to see that entailment on these specialized instances is as hard as entailment in general; this requires a separate argument for each class.

There are three cases of “lower bounds from entailment” that are used in the body of the paper: those whose lower bound is annotated with ${\mathsf{QEntail}}$ in Table 1. We give the details of each argument below.

B.9.1 In Bounded Arity, Disclosure with ${\mathsf{IncDep}}$ Source Constraints and Projection Maps is NP-hard

We begin by showing that disclosure for ${\mathsf{IncDep}}$ source constraints and projection maps inherits the NP-hardness that is known for query entailment with ${\mathsf{IncDep}}$ s. We do this via a direct reduction from $3$ -coloring. We make use again of the characterization of disclosure using the visible chase.

Let us take a graph $G=(V,E)$ that is an input to $3$ -coloring. In our reduction, the schema, the constraints and the mapping will not depend on the graph. Only the query will depend on the graph.

We will have a single source relation $OK(x,y,z)$ and one mapping $OK(x,y,z)\rightarrow M()$ to create canonical values for $(x_{0},y_{0},z_{0})$ in ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ , which is the initial instance in the visible chase. Then we will use two ${\mathsf{IncDep}}$ constraints to create all permutations for these values: $OK(x,y,z)\rightarrow OK(x,z,y)$ and $OK(x,y,z)\rightarrow OK(y,x,z)$ .

Because of the mapping, ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ will have three values $x_{0},y_{0},z_{0}$ with $OK(x_{0},y_{0},z_{0})$ . Then the constraints ensure that the canonical model contains the six permutations of arguments for $OK$ : $OK(x_{0},y_{0},z_{0})$ , $OK(x_{0},z_{0},y_{0})$ , $OK(y_{0},x_{0},z_{0})$ , $OK(y_{0},z_{0},x_{0})$ , $OK(z_{0},x_{0},y_{0})$ , $OK(z_{0},y_{0},x_{0})$ .

In our query, $|V|$ variables will capture the coloring of each node; we write $v(n)$ for the variable associated with node $n$ . For each $(f,t)\in E$ the query will include a conjunct $\exists c~{}~{}OK(v(f),v(t),c)$ .

We sketch the correctness of this reduction. The three values $x_{0},y_{0},z_{0}$ in ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ encode the three possible colors in a coloring. The conjunct $\exists c~{}~{}OK(v(f),v(t),c)$ forbids the nodes $f$ and $t$ to be mapped to the same value ( $x_{0}$ , $y_{0}$ or $z_{0}$ ) as $\exists c~{}~{}OK(v,v,c)$ has no solution in ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ . Therefore, if we have disclosure, we have a $3$ -coloring.

Conversely, if we have a $3$ -coloring, we can find a solution for the query in the visible chase.
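The reduction can be checked on small graphs with the following Python sketch. It is an illustration only: the function names `chase_ok` and `query_holds` are hypothetical, the three canonical values are written as strings, and the query is evaluated by brute force rather than by any algorithm from the paper.

```python
from itertools import product


def chase_ok(seed=("x0", "y0", "z0")):
    """Close the single fact OK(seed) under the two inclusion dependencies
    OK(x,y,z) -> OK(x,z,y) and OK(x,y,z) -> OK(y,x,z)."""
    facts = {seed}
    changed = True
    while changed:
        changed = False
        for (x, y, z) in list(facts):
            for new in ((x, z, y), (y, x, z)):
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts


def query_holds(nodes, edges, facts):
    """Evaluate the conjunction, over all edges (f, t), of
    'exists c. OK(v(f), v(t), c)' by trying every assignment of the v(n)."""
    values = sorted({val for fact in facts for val in fact})
    for assignment in product(values, repeat=len(nodes)):
        v = dict(zip(nodes, assignment))
        if all(any((v[f], v[t], c) in facts for c in values) for f, t in edges):
            return True
    return False
```

On a triangle the query holds over the chased facts, while on the complete graph with four nodes it does not, matching the $3$ -colorability of the two graphs.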

B.9.2 In General Arity, Disclosure with ${\mathsf{IncDep}}$ Source Constraints and Projection Maps is PSpace-hard

Here we give a direct reduction from the implication problem for ${\mathsf{IncDep}}$ s, or equivalently, the entailment problem for a single-atom instance and an atomic query. This is known to be PSpace-hard Casanova et al. [1984]. Given a problem $\Sigma\vDash R_{1}(\vec{X})\subseteq R_{2}(\vec{X})$ (where $\vec{X}$ has no repeated variables and $\Sigma$ is composed of IDs), we introduce a fresh predicate $shadowR_{1}$ and we reduce it to the disclosure problem with query $shadowR_{1}(\vec{X})\land R_{2}(\vec{X})$ on the constraints $\Sigma$ plus $shadowR_{1}(\vec{X})\rightarrow R_{1}(\vec{X})$ and the mapping $V():=\exists\vec{X}~{}~{}shadowR_{1}(\vec{X})$ . (Footnote 1: The original proof given here was faulty; many thanks to Balder ten Cate for noticing it and suggesting a fix.)

B.9.3 In Bounded Arity, Disclosure with ${\mathsf{FGTGD}}$ Source Constraints and Projection Maps is 2ExpTime-hard

The last place where we claim that disclosure is at least as hard as entailment is for ${\mathsf{FGTGD}}$ source constraints and projection maps in bounded arity. Here we will proceed by modifying the reduction used in Theorem 8. In this proof we used mappings for two distinct purposes. The initialization mapping $\mathcal{T}_{init}()$ was used to generate some values in the initial instance of the visible chase. In the proof, this mapping is an atomic map but not a projection map; but we can easily change this to use a projection map and an ${\mathsf{LTGD}}$ .

The remaining maps are used to ensure that certain values get merged with $c_{{\mathsf{Crit}}}$ in the visible chase. Put another way, they are used to enforce certain ${\mathsf{SCEQrule}}$ s. But with the mappings $\mathcal{T}_{init}()$ and $T_{c_{{\mathsf{Crit}}}}(x)$ , we can ensure that the initial instance of the visible chase includes exactly one element satisfying ${\mathsf{IsCrit}}$ . Once we have done this, we can mimic a ${\mathsf{SCEQrule}}$

[TABLE]

by a source constraint

[TABLE]

This must be a ${\mathsf{FGTGD}}$ , since the frontier has size one. Transforming the mappings according to this methodology, while leaving the query the same as in Theorem 8 gives us a modification of the hardness proof using ${\mathsf{FGTGD}}$ source constraints and projection maps, as required.

Appendix C Refinements of our results

Atomic queries.

We have focused in the body of the paper on policy queries given as general CQs. But almost all of our lower bounds can be seen to hold for atomic queries. The only exceptions are stated in Theorem 4 and Corollary 3, where we claim PTime membership in bounded arity when restricting to atomic queries. Note that the NP-hardness bounds for general CQs corresponding to these upper bounds do not follow from our custom reductions, but using the simple reduction from entailment of CQs for the corresponding classes (e.g. ${\mathsf{IncDep}}$ s).

Non-Boolean queries.

In this appendix we have provided details of our upper bounds, assuming for simplicity that the queries $p$ are Boolean. But the proofs all extend to the non-Boolean case, as we now explain. To see this we need to go back to Theorem 1. We restate the theorem in a slightly different variant:

Theorem 11.

Benedikt et al. [2016] When source constraints are TGDs and mapping rules are given by CQ definitions, then if a disclosure of a CQ (Boolean or non-Boolean) occurs, then the source instance which witnesses this can be taken to be $\mathcal{D}_{{\mathsf{Crit}}}^{\cal S}$ .

The statement differs slightly from that of Theorem 1, since this version talks about getting an instance that agrees with $\mathcal{D}_{{\mathsf{Crit}}}^{\cal S}$ on the mapping images, rather than having one that extends ${\mathsf{Hide}}_{\mathcal{M}}(\mathcal{D}_{{\mathsf{Crit}}}^{{\cal G}(\mathcal{M})})$ and satisfies the constraints.

The important point is that the result holds for non-Boolean queries as well as Boolean queries. Note that for a non-Boolean query $p(\vec{x})$, all the facts that an attacker will see in the mapping image of $\mathcal{D}_{{\mathsf{Crit}}}^{\cal S}$ will contain only the value $c_{{\mathsf{Crit}}}$. Thus the only query answers that can be disclosed to the attacker will involve the value $c_{{\mathsf{Crit}}}$. Inspection of each of the upper bound reductions will show that to detect such disclosures, it suffices to pre-process the query to add conjuncts ${\mathsf{IsCrit}}(x_{i})$ for each variable $x_{i}$, treating the result as a Boolean query.
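The pre-processing step just described can be sketched in a few lines of Python. This is a minimal illustration and not the paper's implementation: the representation of a CQ as a list of (relation, arguments) pairs and the helper name `booleanize` are assumptions made here, and the $x_{i}$ are read as the free variables $\vec{x}$ of $p$.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: an atom is (relation name, tuple of variable names).
Atom = Tuple[str, Tuple[str, ...]]


def booleanize(free_vars: Sequence[str], atoms: Sequence[Atom]) -> List[Atom]:
    """Add one IsCrit(x) conjunct per free variable x.

    The returned list of atoms is meant to be read as a Boolean CQ:
    every variable, including the former free ones, is existentially
    quantified.
    """
    guards = [("IsCrit", (x,)) for x in free_vars]
    return list(atoms) + guards


# p(x, z) = exists y. R(x, y) and S(y, z)
query_atoms = [("R", ("x", "y")), ("S", ("y", "z"))]
boolean_query = booleanize(["x", "z"], query_atoms)
```

For an atomic query the result is the original atom plus at most as many unary atoms as the atom has positions, so on a bounded arity schema the number of added conjuncts is bounded as well.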

Note that this transformation converts atomic queries to queries consisting of a single atom and an additional set of unary atoms. However, this will not impact the PTime claims in Theorem 4 and Corollary 3. For example, in Theorem 4 we need only note that for atomic queries on a bounded arity schema, we will get an atomic query with a bounded number of additional atoms of the form ${\mathsf{IsCrit}}(x_{i})$. Entailment of such queries over a bounded arity schema with ${\mathsf{IncDep}}$s is still in PTime.

Dependencies with multiple atoms in the head.

In some of our upper bound proofs, we assumed for simplicity that the dependencies had a single atom in the head, even when the classes in question (e.g. ${\mathsf{GTGD}}$s) do not impose this. In our results that are stated for general arity, this assumption can be made without loss of generality, since one can simplify the heads by introducing intermediate predicates. In bounded arity, one must take some care, since one cannot polynomially reduce to the case of a single atom in the head. All of our results for bounded arity do in fact hold as stated, without any additional restrictions on the head. We explain how the argument needs to be customized for the most subtle case, Theorem 4.
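To make the head simplification concrete, here is a minimal sketch of one standard way to introduce an intermediate predicate. It is an assumption that this matches the construction the authors have in mind; the rule encoding (a pair of atom lists, with head variables not occurring in the body read as existential) and the names `split_head` and `Aux` are hypothetical.

```python
from typing import List, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[List[Atom], List[Atom]]  # (body atoms, head atoms)


def split_head(body: List[Atom], head: List[Atom], aux_name: str) -> List[Rule]:
    """Replace body -> H1 & ... & Hn by body -> Aux(v) and Aux(v) -> Hi.

    v lists every variable occurring in the head, so the fresh predicate
    has arity equal to the number of distinct head variables.
    """
    if len(head) <= 1:
        return [(body, head)]
    head_vars: List[str] = []
    for _, args in head:
        for v in args:
            if v not in head_vars:
                head_vars.append(v)
    aux: Atom = (aux_name, tuple(head_vars))
    rules: List[Rule] = [(body, [aux])]
    rules += [([aux], [atom]) for atom in head]
    return rules


# R(x, y) -> exists z, w. S(x, z) & T(z, w, y)
rules = split_head(
    [("R", ("x", "y"))],
    [("S", ("x", "z")), ("T", ("z", "w", "y"))],
    "Aux",
)
```

In this example the fresh predicate `Aux` has arity 4 while the original relations have arity at most 3. The intermediate predicate can thus exceed any fixed arity bound, which is one way to see why this simplification is not directly available in the bounded arity setting.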

Recall that the bounded arity case of Theorem 4 starts with the critical-instance rewriting, which reduces to reasoning with Guarded TGDs having a fixed side signature, the unary predicate ${\mathsf{IsCrit}}(x)$. The linearization of Amarilli and Benedikt [2018a, 2018b], applied in this context, proceeds in two steps. First we generate all derived rules of the form:

[TABLE]

Notice that these are full dependencies: there are no existentials in the head. This generation can be done inductively, via the dynamic programming steps in Amarilli and Benedikt [2018b]: one inductive step composes a derived rule with one of the original non-full dependencies, and a second step composes two derived full rules. This can be applied directly to the case of rules with multiple atoms in the head.

After this is done, the second step of linearization moves to an extended signature described as follows: for every relation $R$ of arity $k$ in the original signature (without ${\mathsf{IsCrit}}$), and for each set of positions $P$ of $R$, we introduce a predicate $R^{P}$ of arity $k$. Informally, $R^{P}(\vec{x})$ stands in for $R(\vec{x})\wedge\bigwedge_{i\in P}{\mathsf{IsCrit}}(x_{i})$. We lift every original dependency:

[TABLE]

to a linear TGD:

[TABLE]

where $P_{j}$ contains the positions corresponding to exported variables in $P$. We lift every derived full dependency of the form:

[TABLE]

to a linear TGD:

[TABLE]

Finally, we have linear TGDs asserting that the semantics of $R^{P}$ becomes stronger as one adds to the set of positions $P$:

$R^{P^{\prime}}(\vec{x})\rightarrow R^{P}(\vec{x})$

for $P\subset P^{\prime}$.
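The extended signature and the final family of linear TGDs can be enumerated mechanically. The sketch below is illustrative only: it assumes a schema given as a hypothetical dictionary from relation names to arities, represents $R^{P}$ as a pair (relation, set of positions), and reads the informal semantics of $R^{P}$ as implying that $R^{P^{\prime}}$ entails $R^{P}$ whenever $P$ is a proper subset of $P^{\prime}$.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Pred = Tuple[str, FrozenSet[int]]  # (relation name, set of critical positions)


def position_sets(arity: int) -> List[FrozenSet[int]]:
    """All subsets of the positions 1..arity."""
    positions = range(1, arity + 1)
    return [
        frozenset(c)
        for r in range(arity + 1)
        for c in combinations(positions, r)
    ]


def extended_predicates(schema: Dict[str, int]) -> List[Pred]:
    """One predicate R^P per relation R and per set of positions P of R."""
    return [(rel, P) for rel, arity in schema.items() for P in position_sets(arity)]


def weakening_rules(schema: Dict[str, int]) -> List[Tuple[Pred, Pred]]:
    """Pairs (R^P', R^P) with P a proper subset of P', read as R^P' -> R^P."""
    rules = []
    for rel, arity in schema.items():
        sets = position_sets(arity)
        for big in sets:
            for small in sets:
                if small < big:  # proper subset test on frozensets
                    rules.append(((rel, big), (rel, small)))
    return rules


schema = {"R": 2, "S": 1}
preds = extended_predicates(schema)
rules = weakening_rules(schema)
```

A relation of arity $k$ contributes $2^{k}$ predicates, so for a fixed arity bound the extended signature is only a constant factor larger than the original one.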

We rewrite the query to the extended signature in the analogous way. The correctness of this transformation is given by an argument identical to that in the single-headed case in Amarilli and Benedikt [2018b].

Note that this transformation is in PTime when the arity is fixed. It reduces us to an entailment problem with ${\mathsf{LTGD}}$s, still with bounded arity, but with multiple atoms in the head. Such an entailment problem can be shown to be in NP using a simple variation of the algorithm for ${\mathsf{IncDep}}$s of Johnson and Klug [1984].

Bibliography

1. Abiteboul and Duschka [1998] Serge Abiteboul and Olivier Duschka. Complexity of answering queries using materialized views. In PODS, pages 254–263, 1998.
2. Ahmetaj et al. [2016] Shqiponja Ahmetaj, Magdalena Ortiz, and Mantas Šimkus. Polynomial datalog rewritings for expressive description logics with closed predicates. In IJCAI, pages 878–885, 2016.
3. Amarilli and Benedikt [2018a] Antoine Amarilli and Michael Benedikt. When Can We Answer Queries Using Result-Bounded Data Interfaces? In PODS, pages 281–293, 2018.
4. Amarilli and Benedikt [2018b] Antoine Amarilli and Michael Benedikt. When Can We Answer Queries Using Result-Bounded Data Interfaces? In arXiv, 2018. Available at https://arxiv.org/pdf/1706.07936.pdf.
5. Amendola et al. [2018] Giovanni Amendola, Nicola Leone, Marco Manna, and Pierfrancesco Veltri. Enhancing existential rules by closed-world variables. In IJCAI, pages 1676–1682, 2018.
6. Baget et al. [2011] Jean-François Baget, Michel Leclère, Marie-Laure Mugnier, and Eric Salvat. On rules with existential variables: Walking the decidability line. Artif. Intell., 175(9-10), 2011.
7. Bárány et al. [2015] Vince Bárány, Balder ten Cate, and Luc Segoufin. Guarded negation. J. ACM, 62(3):356–367, 2015.
8. Benedikt et al. [2016] Michael Benedikt, Pierre Bourhis, Balder ten Cate, and Gabriele Puppis. Querying visible and invisible information. In LICS, pages 297–306, 2016.
