The Bag Semantics of Ontology-Based Data Access
Acknowledgements: This work was supported by the Royal Society under a University Research
Fellowship, the EPSRC projects ED3 and DBOnto, and the Research Council of
Norway via the Sirius SFI.
Charalampos Nikolaou
Egor V. Kostylev
George Konstantinidis
Mark Kaminski
Bernardo Cuenca Grau
Ian Horrocks
Department of Computer Science, University of Oxford, UK
Abstract
Ontology-based data access (OBDA) is a popular approach
for integrating and
querying multiple data sources by means of a shared ontology.
The ontology is linked to the sources
using mappings, which assign views over the data to ontology predicates.
Motivated by the need for OBDA systems supporting
database-style aggregate queries, we propose a
bag semantics for
OBDA, where duplicate tuples in the views defined by the mappings
are retained, as is the case in standard databases.
We show that bag semantics
makes conjunctive query answering in OBDA coNP-hard in data complexity.
To regain tractability, we consider a rather general class of queries and show
its rewritability to a generalisation of the relational calculus to bags.
1 Introduction
Ontology-based data access (OBDA) is an increasingly
popular approach to enable uniform access to
multiple data sources with diverging schemas Poggi et al. (2008).
In OBDA, an ontology
provides a unifying conceptual model for
the data sources together with
domain knowledge.
The ontology is linked to
each source by global-as-view (GAV)
mappings Lenzerini (2002), which
assign views over the data
to ontology predicates.
Users
access the data by means of
queries formulated using
the vocabulary of the ontology;
query answering amounts to
computing the certain answers to the query over the union of
the ontology and the materialisation of the views
defined by the mappings.
The formalism of choice for representing ontologies in
OBDA is
the description logic
DL-LiteR Calvanese et al. (2007), which underpins OWL 2 QL Motik et al. (2012). DL-LiteR was
designed to
ensure that
queries
against the ontology are first-order rewritable; that is, they
can be
reformulated as a set of relational queries
over the sources Calvanese et al. (2007).
Example 1**.**
A company stores data about departments and their
employees in several databases. The
sales department uses the schema
SalEmployee(id,name,salary,loc,mngr),
where attributes id, name,
salary, loc, and mngr
stand for employee ID within the department, their name, salary, location, and name of their manager.
In turn, the IT department stores data using the schema
ITEmployee(id,surname,salary,city), where
managers
are not
specified.
To integrate employee data, the company relies on an ontology
with TBox Tex, which
defines
unary predicates such as SalEmp, ITEmp, and
Mngr,
and binary predicates
such as hasMngr
relating employees to their managers.
The following mappings determine the extension of the
predicates based
on the data, where each atti represents the
attributes occurring only in the source:
[TABLE]
TBox Tex
specifies the meaning of its
vocabulary using inclusions
(i) SalEmp⊑Emp and
ITEmp⊑Emp, which say that
both sales and IT employees are company employees;
(ii) ∃hasMngr−⊑Mngr, specifying the range of the
hasMngr relation; and
(iii) Emp⊑∃hasMngr, requiring that
employees have a (possibly unspecified) manager.
Such inclusions influence query answering:
when asking for the names of all company employees,
the system will retrieve
all relevant sales and IT employees; this is achieved via query
rewriting, where the query is reformulated as the union of
queries over the sales and IT databases.
◊
OBDA has received a great deal of attention in recent years.
Researchers have studied the limits of
first-order rewritability in ontology languages Calvanese et al. (2007); Artale et al. (2009),
established bounds on the size of rewritings Gottlob et al. (2014); Kikot et al. (2014),
developed optimisation techniques Kontchakov et al. (2014), and implemented
systems well-suited for real-world applications Calvanese et al. (2017, 2011).
An important observation about
the conventional
semantics of OBDA is that it is
set-based: the materialisation of the views
defined by the mappings is formalised as
a virtual ABox consisting of a set of facts over the
ontology predicates. This treatment is, however,
in contrast with the semantics of database views, which is based on bags
(multisets) and where duplicate tuples are retained by default.
The distinction between set and bag semantics in databases is
very significant in practice; in particular, it influences the
evaluation of aggregate queries, which combine various
aggregation functions such as Min, Max, Sum, Count or
Avg with the grouping functionality provided in SQL by the
GroupBy construct.
Example 2**.**
Consider the query asking for the number of employees named
Lee. Assume there are two different
employees named Lee, which are represented as different tuples in the
sales database (e.g., tuples with the same employee name, but different ID).
Under the conventional semantics of OBDA, the virtual
ABox would
contain a single fact SalEmp(Lee); hence,
the query would wrongly return one, even under
the semantics for counting aggregate queries in
Calvanese et al. (2008); Kostylev and Reutter (2015).
The correct count can be obtained by considering the extension of
SalEmp as a bag with multiple occurrences of
Lee.
◊
The goal of this paper is to propose and study
a bag semantics for OBDA which is compatible with the
semantics of standard databases and can
provide a suitable foundation for the future study of aggregate queries.
We focus on conjunctive query (CQ) answering over DL-LiteR ontologies under
bag semantics, and our main contributions are
as follows.
1.
We propose the ontology language DL-LiteRbag and its restriction DL-Litecorebag, where
ABoxes consist of a bag of facts, thus providing a
faithful representation of the views defined by OBDA mappings. We define
the semantics
of query answering in this setting
and show that it is compatible with the conventional set-based
semantics.
2.
We show that, in contrast to the set case,
ontologies may not have a universal model (i.e., a single model over which
all CQs can be correctly evaluated), and
bag query answering becomes coNP-hard in data complexity even if we restrict ourselves to DL-Litecorebag ontologies.
3.
To regain tractability, we study the class
of rooted CQs Bienvenu et al. (2012), where each
connected component of the query graph is required to contain
an individual or an answer variable.
This is a very general class, which arguably captures
most practical OBDA queries.
We show that rooted CQs over DL-Litecorebag ontologies
not only admit a universal model and enjoy favourable computational properties, but also
allow for rewritings that can be directly evaluated over the bag
ABox of the ontology.
Proofs of all results are deferred to the appendix.
2 Preliminaries
**Syntax of Ontologies**
We fix a vocabulary consisting
of countably infinite and pairwise disjoint sets of individuals I (i.e., constants), variables X,
atomic concepts C (unary predicates) and
atomic roles R (binary predicates).
A role is an atomic role P∈R or its inverse P−.
A concept is an atomic concept in C or
an expression ∃R, where R is a role.
An inclusion is an expression of the form S1⊑S2 with S1 and S2 either both concepts or both roles.
A disjointness axiom is an expression of the form Disj(S1,S2) with S1 and S2
either both concepts or both roles.
A concept assertion is of the form A(a) with a∈I
and A∈C. A role assertion is of the form
P(a,b) with a,b∈I
and P∈R.
A DL-LiteR TBox is a finite set of inclusions and
disjointness axioms. An ABox
is a finite set of concept and role assertions.
A DL-LiteR ontology is a pair ⟨T,A⟩
with T a DL-LiteR TBox and A an ABox.
The ontology language DL-Litecore restricts
DL-LiteR by disallowing inclusions and disjointness axioms for roles.
**Semantics of Ontologies** An interpretation I is a pair
⟨ΔI,⋅I⟩, where
the domain ΔI is a non-empty set,
and the interpretation function ⋅I
maps each a∈I to aI∈ΔI
such that aI≠bI for all distinct a,b∈I (we adopt the
unique name assumption for convenience; dropping it does not affect results,
modulo minor changes of definitions),
each A∈C to a subset AI of
ΔI
and each P∈R to a subset PI
of ΔI×ΔI.
The
interpretation function extends to concepts and roles as follows:
(R−)I={(u,v)∣(v,u)∈RI} and
(∃R)I={u∈ΔI∣(u,v)∈RI for some v∈ΔI}.
An interpretation I
satisfies ABox A if aI∈AI for all
A(a)∈A and (aI,bI)∈PI
for all
P(a,b)∈A;
I satisfies TBox T if S1I⊆S2I for all S1⊑S2 in T and S1I∩S2I=∅ for all
Disj(S1,S2) in T;
I is a model of ontology ⟨T,A⟩ if it satisfies T and A. An ontology is satisfiable if it has a
model.
**Queries** A conjunctive query (CQ)
q(x) with answer variables x is a formula ∃y.ϕ(x,y), where x,
y are (possibly empty) repetition-free tuples of variables and ϕ(x,y) is a conjunction of atoms of the form A(t),
P(t1,t2) or z=t, where A∈C,
P∈R, z∈x∪y, and t,t1,t2∈x∪y∪I.
If x is inessential, then we write q instead of q(x).
If x is the empty tuple ⟨⟩, then q is Boolean.
A union of CQs (UCQ) is a disjunction of CQs with the same answer
variables.
The equality atoms in a CQ q(x)=∃y.ϕ(x,y)
yield an equivalence relation ∼ on terms
x∪y∪I, and we write t~ for the equivalence class of
a term t.
The Gaifman graph of q(x)
has a node t~ for each
t∈x∪y∪I in ϕ,
and an edge {t~1,t~2} for each atom in ϕ over t1 and t2. We assume that all CQs are safe:
for each z∈x∪y, the class z~ contains a term mentioned in an atom
of ϕ(x,y) that is not an equality.
The
certain answers
qK
to a (U)CQ q(x) over a DL-LiteR ontology K are the set of all tuples a of individuals such that q(a) holds in every
model of K.
A class of queries Q1 is rewritable to a class Q2 for an ontology language O if for any
q1∈Q1 and TBox T in O, there is
q2∈Q2 such that, for any ABox A in O with
⟨T,A⟩ satisfiable,
q1⟨T,A⟩ equals the answers to q2 in (the least model of) A.
Checking a∈q⟨T,A⟩ for a tuple a, (U)CQ
q, and DL-LiteR ontology ⟨T,A⟩ is an NP-complete problem
with AC0 data complexity (i.e., when T and q are fixed)
Calvanese et al. (2007). The latter follows from the rewritability of UCQs to themselves for DL-LiteR.
**Bags**
A bag over a set M
is a function Ω:M→N0∞, where N0∞ is the set of
nonnegative integers and infinity.
The value Ω(c) is the multiplicity of c in Ω.
A bag Ω is finite if
there are finitely many c∈M
with
Ω(c)>0 and there is no c with
Ω(c)=∞. The empty bag ∅ over M
is the bag such that ∅(c)=0 for all c∈M.
Given bags Ω1 and Ω2 over M, let
Ω1⊆Ω2 if
Ω1(c)≤Ω2(c) for each c∈M.
The intersection ∩, max union ∪, arithmetic union ⊎, and difference − are the binary operations
defined for bags Ω1 and Ω2 over the same set M as follows:
for every
c∈M, (Ω1∩Ω2)(c)=min{Ω1(c),Ω2(c)},
(Ω1∪Ω2)(c)=max{Ω1(c),Ω2(c)}, (Ω1⊎Ω2)(c)=Ω1(c)+Ω2(c), and (Ω1−Ω2)(c)=max{0,Ω1(c)−Ω2(c)}; difference is
well-defined only when Ω2 is finite.
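For finite bags, these four operations coincide with the multiset operators of Python's `collections.Counter`, so the definitions can be checked on small examples. The following is a minimal sketch under that assumption; infinite multiplicities are not modelled, and the names `omega1`, `omega2`, and `is_subbag` are illustrative only.

```python
from collections import Counter

# Finite bags as Counters: keys are elements of M, values are multiplicities.
omega1 = Counter({"a": 3, "b": 1})
omega2 = Counter({"a": 2, "c": 4})

intersection = omega1 & omega2      # pointwise minimum
max_union = omega1 | omega2         # pointwise maximum
arithmetic_union = omega1 + omega2  # pointwise sum
difference = omega1 - omega2        # pointwise max(0, m1 - m2)


def is_subbag(b1, b2):
    """Return True if b1(c) <= b2(c) for every element c."""
    return all(b2[c] >= m for c, m in b1.items())
```

Note that max union and arithmetic union already differ on the element `a`, which has multiplicity 3 in the former and 5 in the latter.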
3 DL-LiteR with Bag Semantics
In this section we present a bag semantics for DL-LiteR ontologies, define
the associated query answering problem, and establish its intractability in data complexity.
We formalise ABoxes as bags of facts (rather than sets)
in order to faithfully represent
the materialised views over source data defined by OBDA mappings.
Definition 3**.**
A bag ABox is a finite bag
over the set of concept and role assertions.
A DL-LiteRbag ontology is a pair ⟨T,A⟩
of a DL-LiteR TBox
T and a bag ABox A; the ontology is
DL-Litecorebag if T is a DL-Litecore TBox.
The semantics of DL-LiteRbag is based on bag interpretations I, with
atomic concepts and roles mapped to bags of domain elements and pairs of
elements, respectively, and where the
interpretation function is extended to complex concepts and roles
in the natural way; in particular,
a concept ∃P is interpreted
as the bag projection
of PI
to the first component, where
each occurrence of a pair (u,v) in PI
contributes to the
multiplicity of domain element u in (∃P)I.
Definition 4**.**
A bag interpretation I is a pair ⟨ΔI,⋅I⟩
defined the same as in the set case with the exception that AI and PI are bags (not sets) over ΔI and ΔI×ΔI, respectively.
The interpretation function extends to concepts and roles as follows:
(P−)I maps
each (u,v)∈ΔI×ΔI to
PI(v,u), and
(∃R)I
maps each u∈ΔI to
∑v∈ΔIRI(u,v).
The definition of semantics of ontologies is as expected.
Definition 5**.**
A bag interpretation I=⟨ΔI,⋅I⟩
satisfies a bag ABox A if
A(A(a))≤AI(aI) for each concept assertion A(a) in A
and A(P(a,b))≤PI(aI,bI)
for each role assertion P(a,b).
Satisfaction of T is defined
as in the set case, except that ⊆ and ∩ are applied to bags instead of sets.
Bag interpretation I
is a bag model of the DL-LiteRbag ontology ⟨T,A⟩, written I⊨b⟨T,A⟩,
if it satisfies both T and A. The ontology is satisfiable if it has a bag model.
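Definitions 4 and 5 can be illustrated on a finite bag interpretation. The sketch below is only an illustration: the encoding of concepts as either a name or a pair `("exists", role)`, the helper names, and the concrete multiplicities are assumptions of this sketch, and only concept inclusions are checked (disjointness axioms are omitted).

```python
from collections import Counter


def role_ext(interp, role):
    """Bag extension of a role name such as 'P' or of its inverse 'P-'."""
    if role.endswith("-"):
        return Counter({(v, u): m for (u, v), m in interp[role[:-1]].items()})
    return interp[role]


def concept_ext(interp, concept):
    """Bag extension of an atomic concept 'A' or of ('exists', role)."""
    if isinstance(concept, tuple):
        ext = Counter()
        for (u, _v), m in role_ext(interp, concept[1]).items():
            ext[u] += m  # every occurrence of (u, v) contributes to u
        return ext
    return interp[concept]


def satisfies_inclusion(interp, lhs, rhs):
    """Bag inclusion: lhs^I(u) <= rhs^I(u) for every domain element u."""
    left, right = concept_ext(interp, lhs), concept_ext(interp, rhs)
    return all(right[u] >= m for u, m in left.items())


def satisfies_abox(interp, abox):
    """Each assertion must occur in the interpretation at least as often as in the ABox."""
    return all(interp[pred][args] >= m for (pred, args), m in abox.items())


# Assumed illustrative values; "w" stands for an element that is not an individual.
interp = {
    "SalEmp": Counter({"Lee": 3}),
    "ITEmp": Counter({"Lee": 2}),
    "Emp": Counter({"Lee": 3}),
    "Mngr": Counter({"Hill": 2, "w": 1}),
    "hasMngr": Counter({("Lee", "Hill"): 2, ("Lee", "w"): 1}),
}
abox = Counter({("SalEmp", "Lee"): 3, ("ITEmp", "Lee"): 2,
                ("hasMngr", ("Lee", "Hill")): 2})
tbox = [("SalEmp", "Emp"), ("ITEmp", "Emp"),
        (("exists", "hasMngr-"), "Mngr"), ("Emp", ("exists", "hasMngr"))]
```

Dropping the pair `("Lee", "w")` from `hasMngr` would violate the inclusion Emp⊑∃hasMngr, because Lee would then occur 3 times in Emp but only 2 times in the projection of hasMngr.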
Example 6**.**
Let Kex=⟨Tex,Aex⟩ be a
DL-LiteRbag
ontology where Tex is as
in Example 1 and
Aex contains
SalEmp(Lee) with multiplicity 3, and
ITEmp(Lee) and
hasMngr(Lee,Hill), both with
multiplicity 2 (all other assertions have multiplicity 0).
Let Iex be the bag interpretation mapping
individuals to themselves and with the following non-zero values:
[TABLE]
where w is a fresh element.
We can check that Iex⊨bKex.
◊
We now define the notion of query answering under bag semantics.
We first define the answers qI of a CQ q(x) over a bag interpretation I.
Intuitively, qI is a bag of tuples of individuals such that
each valid embedding λ of the body of q into I
contributes separately to the multiplicity of the tuple λ(x) in qI; in turn,
the contribution of each specific λ is the product of the
multiplicities of the images of the query atoms under λ. The latter is in accordance with
the interpretation of joins in the bag relational algebra and SQL, where the multiplicity of a tuple in a join
is the product of the
multiplicities of the joined tuples (e.g., see García-Molina et al. (2009)).
Definition 7**.**
Let q(x)=∃y.ϕ(x,y) be a CQ. The
bag answers qI to q over a bag interpretation
I=⟨ΔI,⋅I⟩ are defined as the bag over tuples of individuals from
I of the same size as x such that, for every such tuple a,
qI(a) = ∑λ∈Λ ∏S(t) in ϕ(x,y) SI(λ(t)),
where Λ is the set of all valuations λ:x∪y∪I→ΔI such that λ(x)=aI, λ(a)=aI for each a∈I, and λ(z)=λ(t) for each z=t in ϕ(x,y).
If q is Boolean, then qI is defined only
for the empty tuple ⟨⟩. Also,
conjunction ϕ(x,y) may contain repeated atoms, and hence can be seen as a bag of atoms; while
repeated atoms are redundant in the set case, they are essential in the bag
setting Chaudhuri and Vardi (1993)
and thus the definition of
qI(a)
treats each copy of
a query atom
S(t) separately.
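The sum-of-products reading of bag answers can be sketched for finite bag interpretations as follows. This is a naive illustration rather than an efficient procedure: it assumes CQs without equality atoms, encodes variables as strings starting with `?`, interprets individuals as themselves, and uses assumed multiplicities; the function name `bag_answers` is hypothetical.

```python
from collections import Counter
from itertools import product


def bag_answers(answer_vars, atoms, interp, domain, individuals):
    """Bag answers to a CQ without equality atoms over a finite bag interpretation.

    atoms is a list (hence a bag) of (predicate, terms) pairs; terms starting
    with '?' are variables, all other terms are individuals.
    """
    variables = sorted({t for _pred, terms in atoms for t in terms
                        if t.startswith("?")})
    answers = Counter()
    for values in product(sorted(domain), repeat=len(variables)):
        lam = dict(zip(variables, values))
        # Product of the multiplicities of the images of all atoms under lam.
        contribution = 1
        for pred, terms in atoms:
            image = tuple(lam.get(t, t) for t in terms)
            contribution *= interp.get(pred, Counter())[image]
        answer = tuple(lam[x] for x in answer_vars)
        # Only tuples of individuals are answers.
        if contribution and all(a in individuals for a in answer):
            answers[answer] += contribution
    return answers


# Assumed illustrative interpretation; "w" is not an individual.
interp = {
    "hasMngr": Counter({("Lee", "Hill"): 2, ("Lee", "w"): 1}),
    "Mngr": Counter({("Hill",): 2, ("w",): 1}),
}
domain = {"Lee", "Hill", "w"}
individuals = {"Lee", "Hill"}

projection = bag_answers(["?x"], [("hasMngr", ("?x", "?y"))],
                         interp, domain, individuals)
join = bag_answers(["?x"], [("hasMngr", ("?x", "?y")), ("Mngr", ("?y",))],
                   interp, domain, individuals)
single = bag_answers([], [("Mngr", ("?y",))], interp, domain, individuals)
repeated = bag_answers([], [("Mngr", ("?y",)), ("Mngr", ("?y",))],
                       interp, domain, individuals)
```

In this example the Boolean query with a repeated atom yields multiplicity 2·2+1·1=5 for the empty tuple, whereas the same query with a single atom yields 2+1=3, which illustrates why repeated atoms are not redundant under bag semantics.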
The following definition of certain answers, capturing open-world query answering, is a reformulation of
the definition in Kostylev and Reutter (2015) for counting queries. It is a natural extension of the set notion to bags:
a query answer is
certain for a given multiplicity if it occurs with at least that multiplicity in every bag model of the ontology.
Definition 8**.**
The bag certain answers qK to a query q over a
DL-LiteRbag ontology K are the bag ⋂I⊨bKqI.
We study the problem BagCert[Q,O] of checking, given a
query q from a class of CQs Q, ontology K=⟨T,A⟩ from an ontology language O,
tuple a over I, and number k∈N0∞, whether
qK(a)≥k; data complexity of BagCert is studied
under the assumption that T and q are fixed. Following
Grumbach and Milo (1996), we assume that the multiplicities of assertions in A
and k (if not infinity) are given in unary.
Example 9**.**
Let qex(x)=∃y.hasMngr(x,y) and Kex be as in
Example 6. Then qexKex(Lee)=3. Indeed, on
the one hand, qexIex(Lee)=3 for Iex in
Example 6. On the other, for any bag model I of Kex,
qexI(Lee)=Σu∈ΔIhasMngrI(LeeI,u)≥3, because Aex(SalEmp(Lee))=3 and Tex
contains inclusions SalEmp⊑Emp and
Emp⊑∃hasMngr .
◊
The bag semantics can be seen as a
generalisation of the
set semantics of DL-Lite:
first, satisfiability under bag semantics reduces to
the set case; second, certain answers
under bag and set semantics coincide if multiplicities are ignored.
Proposition 10**.**
Let ⟨T,A⟩ be a DL-LiteR ontology and ⟨T,A′⟩ be
a DL-LiteRbag ontology with the same TBox such that
{S(t)∣A′(S(t))≥1}=A. Then, the following holds:
1.
⟨T,A⟩ is satisfiable if and only if ⟨T,A′⟩ is satisfiable;
2.
for each CQ q and tuple a of individuals from I,
a∈q⟨T,A⟩ if and only if
q⟨T,A′⟩(a)≥1.
An important property of satisfiable DL-LiteR ontologies K is the
existence of so-called universal models for CQs, that is, models I such that the
certain answers to every CQ q over K can be obtained by evaluating q
over I Calvanese et al. (2007). This notion extends naturally to bags.
Definition 11**.**
A bag model I of a DL-LiteRbag ontology K is universal
for a class of queries Q if qK=qI for any q∈Q.
Unfortunately, in contrast to the set case, even DL-Litecorebag ontologies may
not admit a universal bag model for all CQs.
Proposition 12**.**
There exists a satisfiable DL-Litecorebag ontology that has no universal bag
model for the class of all CQs.
The lack of a universal model suggests that CQ answering under bag semantics is
harder than in the set case. Indeed, this problem is coNP-hard in data
complexity, which is in stark contrast to
the AC0 upper bound in the set case.
Theorem 13**.**
BagCert[CQs, DL-Litecorebag] is coNP-hard in data complexity.
4 Universal Models for Rooted Queries
Theorem 13 suggests that bag semantics
is generally not well-suited for OBDA. Our approach to overcome
this negative result
is to consider a restricted class of CQs, introduced in the context of query optimisation in
DLs Bienvenu et al. (2012), called rooted: in a rooted CQ, each existential
variable is connected in the Gaifman graph to an
individual or an answer variable.
Rooted CQs capture most practical queries; for example, they
include all connected non-Boolean CQs.
Definition 14**.**
A CQ q(x) is
rooted if each connected component of
its Gaifman graph has
a node with a term in
x∪I.
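Rootedness is a purely syntactic condition and can be checked by computing the connected components of the Gaifman graph. The sketch below assumes a CQ without equality atoms, encodes variables as strings starting with `?`, and uses a small union-find structure; the function name `is_rooted` and the encoding are assumptions of this illustration.

```python
def is_rooted(answer_vars, atoms):
    """Check rootedness of a CQ given as a list of (predicate, terms) pairs."""
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    # Terms occurring in the same atom belong to the same component.
    for _pred, terms in atoms:
        for t in terms:
            parent[find(t)] = find(terms[0])

    # Components containing an answer variable or an individual.
    anchored = {find(t) for t in list(parent)
                if t in answer_vars or not t.startswith("?")}
    return all(find(t) in anchored for t in list(parent))
```

For instance, ∃y.hasMngr(x,y)∧Mngr(y) with answer variable x is rooted, while the Boolean query ∃y.Mngr(y) is not.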
In contrast to arbitrary CQs, any satisfiable DL-Litecorebag ontology admits a universal bag model for
rooted CQs. Although we define such a model, called canonical, in a
fully
declarative way, it can be intuitively seen as the result of applying a variant
of the restricted chase procedure Calì et al. (2013) extended to bags. Starting
from the ABox, the procedure successively “repairs” violations of T by
extending the interpretation of concepts and roles in a minimal way.
To formalise canonical models, we need two auxiliary
notions. First,
the concept closure cclT[u,I] of an element
u∈ΔI in a bag interpretation I=⟨ΔI,⋅I⟩ over a TBox T is the bag of concepts such that, for any concept C,
cclT[u,I](C) is the maximum value of C0I(u) amongst all concepts C0 satisfying T⊨C0⊑C.
Second, the union I∪J of bag interpretations
I=⟨ΔI,⋅I⟩ and J=⟨ΔJ,⋅J⟩ with aI=aJ for all a∈I is the
bag interpretation ⟨ΔI∪ΔJ,⋅I∪J⟩ with aI∪J=aI for a∈I and SI∪J=SI∪SJ for S∈C∪R.
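The concept closure can be sketched as a maximum over entailed subsumees. The code below makes two simplifying assumptions that are not part of the definition: entailment T⊨C0⊑C is approximated by the reflexive-transitive closure of the concept inclusions in T (disjointness axioms are ignored), and concept extensions, including those of concepts of the form ∃R, are supplied precomputed in a dictionary. All names and multiplicities are illustrative.

```python
from collections import Counter


def subsumees(tbox, concept):
    """All C0 reachable to `concept` through the inclusions in tbox (reflexive, transitive)."""
    found, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in tbox:
            if rhs == current and lhs not in found:
                found.add(lhs)
                frontier.append(lhs)
    return found


def concept_closure(tbox, extensions, u, concept):
    """Maximum multiplicity of u among the extensions of all subsumees of concept."""
    return max(extensions.get(c0, Counter())[u]
               for c0 in subsumees(tbox, concept))


tbox = [("SalEmp", "Emp"), ("ITEmp", "Emp"), ("Emp", "∃hasMngr")]
extensions = {
    "SalEmp": Counter({"Lee": 3}),
    "ITEmp": Counter({"Lee": 2}),
    "∃hasMngr": Counter({"Lee": 2}),
}
```

With these values the closure of Lee for ∃hasMngr is 3, although Lee occurs only twice in the extension of ∃hasMngr itself; the gap of 1 is what a chase step would have to repair.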
Definition 15**.**
The canonical bag model C(K) of a DL-Litecorebag ontology K=⟨T,A⟩ is the bag interpretation ⋃i≥0Ci(K) with the bag interpretations Ci(K)=⟨ΔCi(K),⋅Ci(K)⟩ defined as follows:
ΔC0(K)=I, aC0(K)=a for each a∈I, and SC0(K)(a)=A(S(a)) for each S∈C∪R and individuals a;
for each i>0, ΔCi(K) is
[TABLE]
where wu,Rj are fresh domain elements, called anonymous,
aCi(K)=a for all a∈I, and, for all A∈C, P∈R, and elements u, v,
[TABLE]
It is easily seen that C(K) satisfies K whenever K is satisfiable.
We next show that it is universal for rooted CQs.
Theorem 16**.**
The canonical bag model C(K) of a satisfiable DL-Litecorebag ontology K
is universal for rooted CQs.
Example 17**.**
Consider an ontology Kr=⟨Tr,Ar⟩ with
[TABLE]
The canonical model C(Kr) interprets (all with multiplicity 1) Emp by Lee,
Mngr by Hill and wLee,hasMngr1, and
hasMngr by (Lee,wLee,hasMngr1).
Note that C(Kr) is not universal for all CQs: for instance,
qnrC(Kr)(⟨⟩)=2 for non-rooted
qnr=∃y.Mngr(y), but
qnrInr(⟨⟩)=1 for
the model Inr interpreting
Emp by Lee, hasMngr by
(Lee,Hill), and Mngr by Hill.
◊
We conclude this section by showing an important
property of rooted CQs, which justifies their
favourable computational properties.
As in the set case for arbitrary CQs,
given a satisfiable DL-Litecorebag ontology K and a rooted CQ q,
qK can be computed over a small sub-interpretation of
C(K).
Theorem 18**.**
Let K be a satisfiable DL-Litecorebag ontology with
C(K)=⋃i≥0Ci(K) and
q be a rooted CQ having n atoms. Then, qC(K)=qCn(K).
5 Rewritability of Rooted Queries
Rewritability is key for OBDA, and we next establish to what extent rooted CQs
over bag semantics are rewritable.
The first idea would be to use the analogy with the set case and rewrite to unions of CQs.
There are two corresponding operations for bags: max union ∪ and
arithmetic union ⊎. So we may consider max unions
qmax=q1(x)∨⋯∨qn(x) or
arithmetic unions qar=q1(x)⩒⋯⩒qn(x) of CQs qi(x), 1≤i≤n, with the following
semantics, for any interpretation I: qmaxI=q1I∪⋯∪qnI and qarI=q1I⊎⋯⊎qnI, respectively.
Our first result is negative: rewriting to either of these classes is not possible even for DL-Litecorebag.
Proposition 19**.**
The class of rooted CQs is rewritable neither to max nor to arithmetic unions
of CQs for
DL-Litecorebag.
Next we show that rooted queries are rewritable to
BALGε1-queries: the class directly corresponding to the algebra
BALGε1 for bags
Grumbach et al. (1996); Grumbach and Milo (1996); Libkin and
Wong (1997). Since
BALGε1⊂LogSpace Grumbach and Milo (1996), where
BALGε1 is the complexity class for BALGε1 algebra evaluation,
rewritability to BALGε1-queries is highly desirable.
Intuitively, in addition to projection ∃, join ∧, and unions ∨ (max) and ⩒ (arithmetic),
BALGε1 also allows for
difference ∖. Domain-dependent
queries, inexpressible in
algebraic query languages, are precluded
by restrictions on
the use of variables.
Definition 20**.**
A BALGε1-query q(x) with answer variables x is
one of the following, where qi are BALGε1-queries:
S(t), for S∈C∪R, t a tuple over x∪I mentioning all x;
q1(x1)∧q2(x2), for x=x1∪x2;
q0(x0)∧(x=t), for x∈x0, t∈X∪I, x=x0∪({t}∖I);
∃y.q0(x,y); q1(x)∨q2(x); q1(x)⩒q2(x); q1(x)∖q2(x).
The semantics of BALGε1-queries is defined
as follows.
Definition 21**.**
The bag answers qI to a BALGε1-query q(x) over
a bag interpretation I=⟨ΔI,⋅I⟩ is the
bag of tuples over I of the same size as x inductively defined as follows, for each tuple
a and the corresponding mapping λ such that
λ(x)=aI and λ(a)=aI for all a∈I:
SI(λ(t)), if q(x)=S(t);
q1I(λ(x1))×q2I(λ(x2)), if
q(x)=q1(x1)∧q2(x2);
q0I(λ(x0)), if q(x)=q0(x0)∧(x=t) and λ(x)=λ(t);
0, if q(x)=q0(x0)∧(x=t) and λ(x)≠λ(t);
∑λ′:y→ΔIq0I(aI,λ′(y)), if q(x)=∃y.q0(x,y);
(q1I op q2I)(aI), if q(x)=q1(x) op′ q2(x), where op is ∪, ⊎, or − and op′ is ∨, ⩒, or ∖, respectively.
The data complexity of BALGε1-query evaluation
is obtained by showing that
BALGε1-queries can be mapped to the
BALGε1 algebra of Grumbach and Milo (1996).
Proposition 22**.**
Given a fixed BALGε1-query q(x), the problem of checking whether qC(⟨∅,A⟩)(a)≥k for a bag ABox A, tuple
a, and k∈N0∞ is
AC0 reducible to
BALGε1.
Our rewriting algorithm is inspired by the algorithm in
Kikot et al. (2012) for the set case of DL-LiteR.
Before going into details, we provide a high-level description.
The key observation is that the set of valuations of a CQ q(x)=∃y.ϕ(x,y) over the bag canonical
model C(K) can be partitioned into subsets, each of which
is characterised by variables z⊆y that are
sent to anonymous elements of C(K). Hence, we can rewrite q(x) for
each of these subsets separately and then take an arithmetic union of the resulting queries, provided these queries are guaranteed to give the same answers as the corresponding subsets of valuations.
Our rewriting proceeds along the following steps.
Step 1. First, each z is checked for realisability, that is, whether the subquery induced by z can
indeed be folded into the anonymous forest-shaped part of C(K). This can
be done without the ABox, looking only at the atoms of q that link z
to other terms of q (these linking atoms exist because q is
rooted). Non-realisable z can be disregarded.
Step 2. For every realisable z, CQ q(x) is
replaced (for this z in the arithmetic union) by a CQ qz(x) obtained
from q by replacing each maximal connected component of the subquery induced by z by just one linking atom.
This transformation is equivalence-preserving, because the
anonymous part of C(K) does not involve multiplicities
other than 0 and 1.
Step 3. Finally, each resulting qz(x) is
rewritten to a BALGε1-query qˉz(x) by “chasing back”
each unary atom and each binary atom mentioning a variable in z with the
TBox; for the binary atoms it is also guaranteed, by means of difference, that
the variable in z is indeed mapped to the anonymous part, thus avoiding
double-counting in the arithmetic union.
For the rest of this section, let us fix a rooted CQ q(x)=∃y.ϕ(x,y) and a DL-Litecorebag TBox T. We start by formalising
Step 1.
Definition 23**.**
Given an ontology K with a TBox T and
variables z⊆y, let [q,z]C(K) be the bag of tuples over I such that, for each tuple a of individuals,
[q,z]C(K)(a) = ∑λ∈Λz ∏S(t) in ϕ(x,y) SC(K)(λ(t)),
where Λz is the set of valuations
λ:x∪y∪I→ΔC(K) such that
λ(x)=a, λ(a)=a for each a∈I,
λ(x)=λ(t) for each x=t in ϕ(x,y), λ(z)
is an anonymous element for each z∈z, and λ(y)∈I for each y∈y∖z.
Hence, the bag answers to q can be partitioned as follows:
qC(K) = ⨄z⊆y [q,z]C(K). (1)
Variables z⊆y are equality-consistent if ϕ(x,y)
has no equality
z=t
with z∈z and t∉z.
If z is not equality-consistent, then [q,z]C(K)=∅ and these z can be disregarded in (1). Next, we show which other z can be
ignored.
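The first filtering step, discarding subsets z that are not equality-consistent, can be sketched as a plain enumeration. The code assumes that equality atoms are given as pairs of terms and treats them symmetrically, which is an assumption of this illustration; the function name `equality_consistent_subsets` is hypothetical, and the enumeration is exponential in the number of existential variables.

```python
from itertools import combinations


def equality_consistent_subsets(exist_vars, equalities):
    """List every z ⊆ y such that no equality atom relates a member of z to a non-member."""
    subsets = []
    for size in range(len(exist_vars) + 1):
        for combo in combinations(exist_vars, size):
            z = set(combo)
            # Both sides of each equality must be inside z or both outside z.
            if all((s in z) == (t in z) for s, t in equalities):
                subsets.append(z)
    return subsets
```

For example, with existential variables y1, y2 and the equality y1=Lee, only the subsets ∅ and {y2} remain, since y1 is forced to be mapped to an individual.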
Definition 24**.**
Given equality-consistent z⊆y,
variables
z′⊆z are maximally connected in the anonymous
part (ma-connected) if z~⊆z′ for the equivalence class z~ of any z∈z′ and
the equivalence classes z~′ are a maximal subset of
z~ connected in the Gaifman graph of q via nodes in z~.
Next we introduce several notations for ma-connected z′⊆z with equality-consistent z⊆y. First, let ϕz′ be the sub-conjunction of ϕ(x,y) that consists of all atoms mentioning at least one
variable in z′ (these sub-conjunctions are disjoint for different z′).
Second, since q is rooted,
ϕz′ contains an atom αz′ of the form
P(t,z) or P(z,t)
with z∈z′ and
t∉z
(note that this definition may be non-deterministic).
Third, let
[TABLE]
where tz′ are all such terms t, a is an individual in tz′ if it
exists or a fresh individual otherwise, and x′=tz′∩X
(this definition may also be non-deterministic because of a).
Notice that qz′a is a Boolean CQ with possible equalities of individuals and inequalities, and we can define
the bag answers of such a query q′
over a bag interpretation I in the same way as for usual CQs in Definition 7
with the extra requirement that each contributing valuation λ
should satisfy
λ(x)≠λ(t) for each
inequality x≠t of q′ (and equalities of individuals are handled as usual equalities).
Definition 25**.**
Given equality-consistent variables z⊆y,
ma-connected z′⊆z are realisable by TBox T if
[TABLE]
where, for a fresh individual b,
A′ is the bag ABox having either only the assertion P(a,b) (with multiplicity 1), when αz′=P(t,z), or only
P(b,a), when αz′=P(z,t).
This definition does not depend on the choice of αz′ and a.
Indeed, if there are two atoms P1(t1,z1) and P2(t2,z2) satisfying the
definition of αz′, then either P1=P2 and both pairs
(t1,z1) and (t2,z2) are mapped by a valuation of qz′a to
the same tuple, or z′ are not realisable regardless of the
choice of αz′.
Similarly, if tz′ contains two individuals a, a′, then
qz′a has the equality a=a′,
and hence z′ are not realisable regardless of this
choice.
Intuitively, z′ are realisable if their corresponding subquery qz′a is satisfied by
the tree-shaped model induced by the TBox from a connection
αz′ of z′ and the rest of the query.
This definition does not essentially involve multiplicities, because all tuples of anonymous
elements in the canonical model have multiplicity at most 1, and, hence, if qz′a matches a part of the canonical model, it does so in a unique way.
Thus, checking realisability is decidable using
standard set-based techniques.
Definition 26**.**
Variables z⊆y are
realisable by TBox T if
they are equality-consistent and each non-empty ma-connected subset of z
is realisable by T.
We proceed to Step 2. For realisable z⊆y, let
qz(x) be the
CQ ∃y′.ϕz(x,y′) such that ϕz(x,y′) is obtained from ϕ(x,y) by replacing ϕz′, for each ma-connected z′⊆z, with
[TABLE]
where tz′ is as in qz′a, and y′ is the subset of y remaining in ϕz.
In other words, qz contains, for each z′,
just one atom αz′ and equalities identifying tz′ instead of
conjunction ϕz′ in q.
The following lemma justifies Steps 1 and 2. It says that in partitioning (1) we
only need to iterate over tuples z that are realisable by T
and can also replace q with qz for each z.
Lemma 27**.**
For any ontology K with TBox T and z⊆y with qz(x)=∃y′.ϕz(x,y′),
1.
if z is realisable by T then [q,z]C(K)=[qz,z∩y′]C(K);
2.
if z is not realisable by T then [q,z]C(K)=∅.
For Step 3, it suffices to rewrite each CQ qz(x)=∃y′.ϕz(x,y′) to a BALGε1-query qˉz(x)=∃yz.ψz(x,yz), for yz=y′∖z, which
is guaranteed to give [qz,z∩y′]C(K) as the bag answers
on the ABox in any ontology K with TBox T.
To this end, we use the following notation: for
t∈X∪I,
let ζA(t)=A(t) for A∈C, while ζ∃P(t)=∃y.P(t,y) and ζ∃P−(t)=∃y.P(y,t) for P∈R, where y is a variable different from t.
Then, formula
ψz(x,yz) is obtained from
ϕz(x,y′)
by
replacing all atoms mentioning a term t∈I∪x∪yz
or a variable z∈z as follows:
each A(t) with ⋁T⊨C⊑A ζC(t);
each P(t,z) with (⋁T⊨C⊑∃P ζC(t)) ∖ ζ∃P(t);
each P(z,t) with (⋁T⊨C⊑∃P− ζC(t)) ∖ ζ∃P−(t).
Note that ϕz(x,y′) does not contain any atoms of the
form A(z) for z∈z, so ψz(x,yz) does not
mention variables z. Also, atoms over roles without variables z stay intact, because T contains no role inclusions.
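The replacement of an atom P(t,z) can be read as counting how many anonymous P-successors the term t must have. The following Python sketch illustrates this reading on a bag ABox stored as a `Counter`. It assumes that ⋁ acts as a maximum over multiplicities, that ∖ is truncated bag difference, and that ∃ sums multiplicities over the quantified variable; the function names and the data layout are hypothetical and not part of the formal development.

```python
from collections import Counter

def zeta(abox, concept, t):
    """Multiplicity of a basic concept at term t in a bag ABox.

    A concept is ("atomic", A), ("exists", P) or ("exists_inv", P);
    the ABox maps (predicate, argument tuple) to a multiplicity.
    """
    kind, name = concept
    if kind == "atomic":
        return abox[(name, (t,))]
    pos = 0 if kind == "exists" else 1
    return sum(m for (pred, args), m in abox.items()
               if pred == name and len(args) == 2 and args[pos] == t)

def anonymous_successors(abox, subsumees, target, t):
    """Value of (max over C in subsumees of zeta_C(t)) minus zeta_target(t),
    truncated at zero (assumed reading of the bag difference)."""
    best = max((zeta(abox, c, t) for c in subsumees), default=0)
    return max(0, best - zeta(abox, target, t))

# Illustrative data: A(a) three times, R(a,b) twice, with T entailing A ⊑ ∃R.
abox = Counter({("A", ("a",)): 3, ("R", ("a", "b")): 2})
subsumees = [("atomic", "A"), ("exists", "R")]  # concepts C with T ⊨ C ⊑ ∃R
print(anonymous_successors(abox, subsumees, ("exists", "R"), "a"))
```

On this data the individual a has three copies of A but only two explicit R-successors, so one anonymous successor is needed.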
Finally, the rewriting of
q(x) over T
is the BALGε1-query
[TABLE]
Example 28.
Consider TBox Tr from Example 17 and the rooted CQ
qr(x)=∃y.hasMngr(x,y)∧Mngr(y).
The query \bar{q}^{\textit{r}}(x)=\bar{q}^{\textit{r}}_{\langle\rangle}(x)\mathbin{\dot{\vee}}\bar{q}^{\textit{r}}_{y}(x), where qˉ⟨⟩r(x) and qˉyr(x) are
[TABLE]
is a rewriting of qr over Tr, since ⟨⟩ and y are realisable.
◊
The following theorem establishes the correctness of our approach and
leads to the main rewritability result.
Theorem 29.
For any rooted CQ q and DL-Litecorebag ontology K=⟨T,A⟩ we have that
qC(K)=qˉC(⟨∅,A⟩).
Corollary 30.
The class of rooted CQs is rewritable to BALGε1-queries for
DL-Litecorebag.
We conclude this section by establishing
the complexity of rooted query answering. The bounds
follow as an easy consequence of Theorem 18, Proposition 22,
and Corollary 30.
Theorem 31.
BagCert[rooted CQs,DL-Litecorebag] is NP-complete and in
LogSpace in data complexity.
However,
the next theorem implies
that rooted queries are not
BALGε1-rewritable for
unrestricted DL-LiteRbag TBoxes.
Theorem 32.
BagCert[rooted CQs,DL-LiteRbag] is coNP-hard in data
complexity.
6 Related work
Query answering under bag semantics has
received significant attention
in the database literature Libkin and
Wong (1994); Grumbach et al. (1996); Grumbach and Milo (1996); Libkin and
Wong (1997).
These works study the relative expressive power of bag algebra primitives and the relationship
with set-based algebras, and establish the
data complexity of query answering. Such problems have also recently been studied in the setting
of the Semantic Web and SPARQL 1.1 in Kaminski et al. (2016); Angles and
Gutierrez (2016).
Bag semantics
in the context of Description Logics has been studied in Jiang (2010), where
the author proposes a bag semantics for ALC and provides a tableau
algorithm.
In contrast to our work, these results
are restricted to ontology satisfiability and do not
encompass CQ answering.
CQ answering under bag semantics is closely related to
answering Count aggregate queries. The semantics of aggregate queries for database
settings with incomplete information, such as inconsistent databases and data
exchange, have been studied in Arenas et al. (2003); Libkin (2006); Afrati and Kolaitis (2008). As pointed out in Kostylev and Reutter (2015), these
techniques are not directly
applicable to ontologies.
The practical solution in Calvanese et al. (2008) is to give epistemic
semantics to
aggregate queries, where the query is evaluated over
ABox facts entailed by the ontology; thus,
the anonymous part of the
ontology models is essentially ignored, and
the
semantics easily leads to counter-intuitive answers.
To remedy these issues, Kostylev and Reutter (2015)
propose a certain answer semantics for Count aggregate queries over
ontologies and prove tight complexity bounds for
DL-LiteR and DL-Litecore. Similarly to our work, their
semantics is open-world and
considers all models of the ontology for query evaluation, which leads to more intuitive answers.
The main difference resides in the definition of
the ontology language, where they consider
set ABoxes and adopt conventional
set-based semantics for TBox axioms.
Although DL-LiteRbag is closely related to the logic in Kostylev and Reutter (2015),
the two settings do not coincide even for set ABoxes.
For example, if A comprises only assertions R(a,b) and R(a,c) and
T comprises axiom ∃R⊑B,
then the query over ⟨T,A⟩ that counts the number of
individuals a in concept B returns 1 in the setting
of Kostylev and Reutter (2015), while the corresponding
DL-LiteRbag query returns 2.
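The difference between the two settings can be traced with a small computation. The Python sketch below assumes that, under bag semantics, the multiplicity of ∃R at an element is the sum of the multiplicities of its outgoing R-tuples, and that the axiom ∃R⊑B forces B to have at least that multiplicity at the element. The helper names are illustrative only.

```python
from collections import Counter

def exists_multiplicity(facts, role, u):
    """Sum of the multiplicities of all role tuples that start at u."""
    return sum(m for (pred, args), m in facts.items()
               if pred == role and args[0] == u)

def min_count_bag(facts, role, u):
    """Least multiplicity of B at u forced by the axiom ∃role ⊑ B (bag reading)."""
    return exists_multiplicity(facts, role, u)

def count_set(facts, role, u):
    """Set reading: u belongs to B at most once, however many successors it has."""
    return 1 if exists_multiplicity(facts, role, u) >= 1 else 0

# The ABox of the example: R(a,b) and R(a,c), each occurring once.
abox = Counter({("R", ("a", "b")): 1, ("R", ("a", "c")): 1})
print(min_count_bag(abox, "R", "a"), count_set(abox, "R", "a"))
```

The two R-successors of a each contribute one copy of a to B in the bag reading, whereas the set reading collapses them into a single membership.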
7 Conclusion and Future Work
We have studied OBDA under bag semantics and
identified a general class of rewritable queries over
DL-Litecorebag ontologies.
As our framework already covers the class of Count aggregate queries,
in future work we plan to extend it to capture further aggregate functions
and more expressive ontologies.
Appendix A Appendix
In this appendix we give the complete proofs omitted in the main part of the paper.
See 10
Proof.
Let K=⟨T,A⟩ and
K′=⟨T,A′⟩ for any A′ satisfying
requirement {S(t)∣A′(S(t))≥1}=A.
First assume that K has a model
I=⟨ΔI,⋅I⟩. We prove that there exists a
bag model of K′.
To this end, consider the bag interpretation I′=⟨ΔI,⋅I′⟩ such that, for any u,v∈ΔI and a∈I,
[TABLE]
Bag interpretation I′ satisfies A′ and all axioms in T, so it is a
bag model of K′. Therefore, K′ is satisfiable, as required.
To complete the proof of statement 1, suppose that K′
has a bag model I′=⟨ΔI′,⋅I′⟩. We construct
an interpretation I=⟨ΔI′,⋅I⟩ of
K in a similar way. For u,v∈ΔI and a∈I, let
[TABLE]
As in the previous case, I is a model of K.
For the forward direction, let a∈qK
for a tuple of individuals a, but, for the sake of
contradiction, qK′(a)=0. The
latter means that there exists a bag model I′ of K′ such that qI′(a)=0. Consider the interpretation I constructed on the basis
of I′ as in the second part of the proof of statement 1. On the one hand, it
is a model of K. On the other hand, I⊭q(a) by construction. This contradicts the fact that a∈qK. Therefore, our assumption was wrong
and qK′(a)≥1.
For the backward direction, we proceed similarly.
Let a be a tuple of individuals and assume that
qK′(a)≥1 holds but a∉qK.
The latter implies that K has a model I such that I⊭q(a). But this means that the model I′ of K′ constructed in the proof of
statement 1 on the basis of I is such that qI′(a)=0, which
contradicts our assumption that qK′(a)≥1.
∎
See 12
Proof.
Consider a variant of our running example where
T={Emp⊑∃hasMngr,∃hasMngr−⊑Mngr} and A contains
Emp(Lee) and Mngr(Hill) once.
Consider bag interpretations I1,I2 defined as
[TABLE]
Both I1 and I2 are
bag models of K=⟨T,A⟩.
Moreover, for q1=hasMngr(Lee,Hill) and
q2=∃x.Mngr(x), we have
q1I1(⟨⟩)=1,
q1I2(⟨⟩)=0,
q2I1(⟨⟩)=1, and q2I2(⟨⟩)=2;
thus, neither model is universal for both {q1,q2}. Suppose there
is a universal model I for {q1,q2}. Then, since
q1I(⟨⟩) must be zero,
(Lee,Hill) does not occur
in hasMngrI;
since Emp(Lee) is an assertion of A and
Emp⊑∃hasMngr∈T, we have
⟨Lee,w′⟩∈hasMngrI for some
w′∈ΔI distinct from Hill; since
∃hasMngr−⊑Mngr∈T, it follows that
w′∈MngrI, and hence q2I(⟨⟩)≥2,
contradicting universality of I.
∎
Following the unary representation of bags Grumbach and Milo [1996], we represent
a bag Ω over a set S using expression {∣⋅∣} within which
we repeat all elements of S as many times as their multiplicity in Ω.
For convenience, we shall also write a bag {∣a,a,b,b,b∣} in the
more compressed form {∣a,b∣}2,3 where instead of repeating an
element, we list a single occurrence and denote its multiplicity with a number
in the appropriate subscript position of that bag.
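As an informal illustration of the two notations, a bag can be modelled in Python with `collections.Counter`. The helper names below are hypothetical; the sketch only shows that the unary listing and the compressed form carry the same information.

```python
from collections import Counter

def bag_from_unary(elements):
    """Build a bag from its unary listing, e.g. a, a, b, b, b."""
    return Counter(elements)

def compressed(bag):
    """Return the compressed form: the distinct elements in sorted order
    and the list of their multiplicities in the same order."""
    items = sorted(bag.items())
    return [e for e, _ in items], [m for _, m in items]

bag = bag_from_unary(["a", "a", "b", "b", "b"])
print(compressed(bag))
```

Here the pair of lists corresponds to the elements a, b and the subscript 2,3 of the compressed notation.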
See 13
Proof.
We prove that there exists a DL-Litecorebag TBox T and a Boolean CQ q such
that checking whether q⟨T,A⟩(⟨⟩)≥k for
an input
bag ABox
A and k∈N0∞ is coNP-hard.
To prove this claim, we follow Kostylev and Reutter [2015] and reduce
non-3-colourability of undirected
graphs (a coNP-complete problem) to query answering over DL-Litecorebag
ontologies. We show that if G=⟨V,E⟩ is an undirected and
connected
graph with no self-loops, then G is not 3-colourable if and only if
q⟨T,AG⟩(⟨⟩)≥3×∣V∣+2
where T is the TBox
{Vertex⊑∃hasColour,∃hasColour−⊑ACol},
AG is an ABox constructed based on G, and q is the Boolean query
[TABLE]
Let I⊇V∪{a,r,g,b}. ABox AG is defined so that it
contains the following assertions:
Vertex(u) for each u∈V,
Edge(u,v), Edge(v,u) for each (u,v)∈E,
ACol(r) (∣V∣+1 times), ACol(g) (∣V∣ times), ACol(b) (∣V∣ times),
for colours r, g, b, and
Vertex(a), Edge(a,a), and hasColour(a,r), for the auxiliary vertex
a.
Individual a corresponds to an auxiliary vertex for the purposes
of the reduction, whereas individuals r, g, and b play the role of
colours.
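The construction of AG can be sketched as follows. The Python fragment is only an illustration: it assumes that the graph is given as a list of vertices and a list of undirected edges, each listed once, that no vertex is named a, r, g, or b, and that every assertion other than the ACol ones occurs exactly once. The function name and the data layout are hypothetical.

```python
from collections import Counter

def build_abox(vertices, edges):
    """Bag ABox A_G as a Counter from (predicate, arguments) to multiplicity."""
    n = len(vertices)
    abox = Counter()
    for u in vertices:
        abox[("Vertex", (u,))] = 1
    for (u, v) in edges:
        # Each undirected edge is encoded in both directions.
        abox[("Edge", (u, v))] = 1
        abox[("Edge", (v, u))] = 1
    # Pre-defined colour copies: |V|+1 for r, |V| for g and for b.
    abox[("ACol", ("r",))] = n + 1
    abox[("ACol", ("g",))] = n
    abox[("ACol", ("b",))] = n
    # Auxiliary vertex a with a self-loop, coloured r.
    abox[("Vertex", ("a",))] = 1
    abox[("Edge", ("a", "a"))] = 1
    abox[("hasColour", ("a", "r"))] = 1
    return abox

triangle = build_abox(["v1", "v2", "v3"], [("v1", "v2"), ("v2", "v3"), ("v1", "v3")])
acol_total = sum(m for (pred, _), m in triangle.items() if pred == "ACol")
print(acol_total)
```

For any graph the ACol assertions add up to 3×∣V∣+1, which is the baseline that the threshold 3×∣V∣+2 of the reduction exceeds by one.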
The usage of Vertex and Edge is clear; they encode G.
Role hasColour plays the role of a colour assignment to the vertices of G;
this is also imposed by axiom
Vertex⊑∃hasColour.
Concept ACol provides a sufficient number of pre-defined colour copies, which
favours 3-colour assignments based on the colours r, g, and b. Any proper
assignment of G uses each of these colours at most ∣V∣ times.
However, if any assignment is not proper and exhausts the number of available
colours (i.e., by assigning multiple colours to the same vertex) or uses an
additional colour, these will have to be added to concept ACol due to the
axiom
∃hasColour−⊑ACol,
effectively increasing its minimum cardinality. This behaviour is the one that
we exploit in the following reduction.
We next show that
G is not 3-colourable if and only if
q⟨T,AG⟩(⟨⟩)≥3×∣V∣+2.
“⇒”
Let G be non-3-colourable.
Consider a model I of ⟨T,AG⟩ (which exists since
⟨T,AG⟩ is satisfiable) such that, if
γ:V→{r,g,b}
is an assignment of colours to the vertices of G and u≠a, then
hasColourI((uI,cI))=1 if and only if
γ(u)=c with c∈{r,g,b}.
Since G is not 3-colourable, for every assignment γ there exists
at least one edge (u,v)∈E with γ(u)=γ(v)=c.
Consequently, for all models I defined on the basis of γ,
hasColourI contains tuples
(uI,cI) and (vI,cI), and hence, the subquery of q
[TABLE]
has at least two matches, each one contributing multiplicity 1; one match
corresponds to valuation
{x/uI,y/vI,z/cI}
and one to valuation
{x/aI,y/aI,z/rI}.
Observe also that atom ACol(w) contributes at least multiplicity
3×∣V∣+1.
Therefore,
qI(⟨⟩)≥2×(3×∣V∣+1)≥3×∣V∣+2
for every model I that follows a 3-colour assignment, and hence
3×∣V∣+2 is a certain multiplicity with respect to all these models, as
required.
Clearly, the same statement holds for all of
the models that add additional elements in Vertex, Edge, or assign multiple
colours to some vertices exceeding the number of available colours.
What is left to consider is those models that assign additional colours to
vertices and not just one among r, g, and b. For such colour assignments,
G might turn out to be colourable. Suppose G is 4-colourable (if it is not,
then the above discussion carries over) and let p∈I.
Then, there
exists a model that follows a 4-colour assignment
γ:V→{r,g,b,p} such that
γ(u)≠γ(v) for every
(u,v)∈E.
Therefore, for that model we would get one match with multiplicity 1 for
subquery q1(x,y,z), that is, for valuation {x/aI,y/aI,z/rI}.
On the other hand, given the observations above, that model would have to
include element p in the extension of ACol at least once, effectively
increasing the cardinality of ACol to 3×∣V∣+2. Therefore, the
evaluation of q over that model would always give at least 3×∣V∣+2
empty tuples.
Clearly, the same holds for models that make use of further colours.
Therefore, q⟨T,AG⟩(⟨⟩)≥3×∣V∣+2.
“⇐”
Let G be 3-colourable. It suffices to show that there exists a model I for
which qI(⟨⟩)=m with m<3×∣V∣+2.
Since G is 3-colourable, there is an assignment
γ:V→{r,g,b}
such that, for every
(u,v)∈E, γ(u)≠γ(v).
Consider an interpretation Iγ defined as follows:
[TABLE]
Interpretation Iγ is defined based on the contents of V, E, and
the 3-colour assignment γ.
It is easy to verify that Iγ is a model of ⟨T,AG⟩.
Next, we show that
qIγ(⟨⟩)=3×∣V∣+1.
First, we observe that subquery q1(x,y,z)
matches exactly once (i.e., under valuation {x/da,y/da,z/dr}).
This holds because γ is a proper 3-colouring of G and, for every
(u,v)∈E, γ(u)≠γ(v).
Note also that there are three valuations for atom ACol(w)
contributing multiplicity 3×∣V∣+1 in total. Consequently,
qIγ(⟨⟩)=3×∣V∣+1, as desired.
∎
Remark 1.
When the UNA is dropped, we can modify the definition of ABox satisfaction and
show that a similar reduction holds for establishing coNP-hardness of query
answering for DL-Litecorebag ontologies. Under this new definition, a bag
interpretation I satisfies an ABox A if:
for each concept assertion A(a) in A, we have
∑a0∈I:a0I=aIA(A(a0))≤AI(aI), and
for each role assertion P(a,b) in A, we have
∑a0,b0∈I:a0I=aI,b0I=bIA(P(a0,b0))≤PI(aI,bI).
Observe that under the UNA, the definition of ABox satisfaction
(Definition 5) is a special case of the above, hence,
Theorem 13 is still valid under this new definition.
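The modified satisfaction condition can be checked mechanically. The following Python sketch groups the ABox multiplicities by the interpreted tuple and compares each sum with the multiplicity in the interpretation; the function name and the representation of interpretations are assumptions made for illustration.

```python
from collections import Counter

def satisfies_abox(abox, interp_ind, interp_facts):
    """Check ABox satisfaction without the UNA.

    abox:         Counter from (predicate, tuple of individuals) to multiplicity
    interp_ind:   dict mapping each individual to a domain element
    interp_facts: Counter from (predicate, tuple of elements) to multiplicity
    """
    needed = Counter()
    for (pred, args), m in abox.items():
        # Assertions whose individuals are interpreted identically are summed.
        needed[(pred, tuple(interp_ind[a] for a in args))] += m
    return all(interp_facts[key] >= m for key, m in needed.items())

abox = Counter({("A", ("a",)): 1, ("A", ("b",)): 2})
merged = {"a": "d", "b": "d"}  # a and b denote the same element
print(satisfies_abox(abox, merged, Counter({("A", ("d",)): 3})))
print(satisfies_abox(abox, merged, Counter({("A", ("d",)): 2})))
```

When a and b are interpreted by the same element, that element needs at least three copies of A; when they are interpreted by distinct elements, the check reduces to the condition used under the UNA.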
We now discuss the modifications that are necessary for reducing
non-3-colourability of undirected and connected graphs without self-loops to
query answering in DL-Litecorebag ontologies without making the UNA.
For this, we need to make sure that the auxiliary vertex a is not interpreted
by the same element as any of the vertices of G, and that no two of
the colours r,g,b are interpreted by the same element. To ensure this, we
employ atomic concepts Va, VG, Red, Green, and Blue, which will hold
the auxiliary vertex a, the vertices of G, and the three colours r, g, b,
respectively. Then, we make sure that no interpretation mixes their roles by
introducing pairwise disjointness axioms:
Disj(Red,Blue), Disj(Red,Green), Disj(Blue,Green),
and Disj(Va,VG). Last, we modify AG to have the additional
assertions
Va(a), Red(r), Green(g), Blue(b), and VG(u), for every vertex u∈V.
Following exactly the argumentation used in Theorem 13, we
can show that the above reduction works if the UNA is dropped.
An enumerated bag (e-bag, for short) Θ over a set M is a set of pairs [c:m] with c∈M and m∈N, where N is the set of positive integers, such that, for all m∈N with m≥2, if [c:m]∈Θ then [c:m−1]∈Θ. There is a straightforward one-to-one correspondence between bags and e-bags, and we denote by Ωe the enumerated version of a bag Ω. This notion generalises to bag interpretations: the e-bag interpretation Ie corresponding to a bag interpretation I=⟨ΔI,⋅I⟩ is the pair ⟨ΔI,⋅Ie⟩ such that aIe=aI for each individual a and SIe=(SI)e for any S∈C∪R. The interpretation function extends to inverse roles in the same way.
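The correspondence between bags and e-bags can be illustrated with a few lines of Python. The sketch represents a bag as a `Counter` and an e-bag as a set of pairs (c, m); the function names are hypothetical.

```python
from collections import Counter

def enumerate_bag(bag):
    """Enumerated version of a bag: one pair (c, m) for each copy m = 1..k of c."""
    return {(c, m) for c, k in bag.items() for m in range(1, k + 1)}

def is_ebag(pairs):
    """Check downward closure: (c, m) with m >= 2 requires (c, m - 1)."""
    return all(m >= 1 and (m == 1 or (c, m - 1) in pairs) for c, m in pairs)

def to_bag(pairs):
    """Recover the bag from an e-bag by counting the pairs for each element."""
    return Counter(c for c, _ in pairs)

omega = Counter({"a": 2, "b": 1})
omega_e = enumerate_bag(omega)
print(sorted(omega_e), is_ebag(omega_e), to_bag(omega_e) == omega)
```

A set such as {("a", 2)} is not an e-bag, since the copy numbered 1 is missing.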
An enumerated homomorphism (e-homomorphism) from an e-bag interpretation Ie=⟨ΔI,⋅Ie⟩ to an e-bag interpretation Je=⟨ΔJ,⋅Je⟩ is a family (h,hS,…), S∈C∪R, of functions
[TABLE]
such that
h(aIe)=aJe for each a∈I,
hA([u:m])=[h(u):ℓ] for all A∈C and [u:m]∈AIe, where ℓ∈N is some number (which can be different for different A and [u:m]),
hP([(u,v):m])=[(h(u),h(v)):ℓ] for
all P∈R and [(u,v):m]∈PIe, where
ℓ∈N is some number.
To handle some cases uniformly, we sometimes write hP−([(v,u):m]) instead of hP([(u,v):m]), for P∈R.
Intuitively, an e-homomorphism is a usual homomorphism that additionally establishes correspondence for each enumerated tuple of elements in each relation in Ie.
An e-homomorphism (h,hS,…)
from Ie=⟨ΔI,⋅Ie⟩ to Je=⟨ΔJ,⋅Je⟩
is predicate-injective on individuals I if, for each u such that there exists a∈I with h(u)=aJe,
hA([u:m])≠hA([u:ℓ]) for all A∈C and all [u:m],[u:ℓ]∈AIe with m≠ℓ,
hR([(u,v1):m])≠hR([(u,v2):ℓ])
for
all roles R and all [(u,v1):m],[(u,v2):ℓ]∈RIe with v1≠v2 or m≠ℓ.
Lemma 33.
For any DL-Litecorebag ontology K and any bag model I of K there exists an e-homomorphism from Ce(K) to Ie that is predicate-injective on I.
Proof.
Let C(K)=⋃i≥0Ci(K) with Ci(K)=⟨ΔCi(K),⋅Ci(K)⟩.
We first define a witnessing predicate-injective e-homomorphism (h,hS,…) for the elements in ΔC0(K), that is, on the
(interpretations of the) individuals, then extend it to elements introduced in
C1(K), and finally recursively define it on all other elements.
For the first step, consider an individual a∈I and the element u=aC0(K). We set h(u)=aI. Then, consider any atomic concept A
such that AC0(K)(u)=k, k∈N, that is, such that
[u:m]∈AC0e(K) for all m∈N with m≤k.
By the definition of C0(K),
A(A(a))=k. Since I is a model of A, we have that AI(aI)≥k. In other words, [h(u):m]∈AIe for all m≤k, and
we can set hA([u:m])=[h(u):m] for all m.
Consider now individuals a,b∈I with corresponding elements u=aC0(K) and v=bC0(K) and an atomic role P such that
PC0(K)(u,v)=k, k∈N, that is, such that [(u,v):m]∈PC0e(K) for all m∈N with m≤k.
By the definition of C0(K), we have that A(P(a,b))=k. Since I
is a model of A, we have that PI(aI,bI)≥k. In other words,
[(h(u),h(v)):m]∈PIe for all m≤k, and,
similarly to the concept case, we set hP([(u,v):m])=[(h(u),h(v)):m] for all m.
For the second step, consider an individual a∈I with its
interpretation u=aC0(K) and a role P∈R such that
cclT[u,C0(K)](∃P)=k for k∈N, but
(∃P)C0(K)(u)=l<k (the case where P is not an atomic
role is analogous).
Then,
δ=cclT[u,C0(K)](∃P)−(∃P)C0(K)(u)>0, hence ΔC1(K)=ΔC0(K)∪{wu,P1,…,wu,Pδ} where wu,Pj are fresh anonymous
elements.
Moreover, PC1(K) contains PC0(K) plus tuples
(u,wu,P1),…,(u,wu,Pδ).
We next show that h can be extended to all anonymous elements
wu,Pj introduced at this step as a result of some u and role P with
the above properties such that (h,hS,…), S∈C∪R, is predicate injective on I.
Because
cclT[u,C0(K)](∃P)=k and
(∃P)C0(K)(u)=l<k, there exists a sequence of
concepts C0,…,Cn with Cn=∃P
such that
Ci−1⊑Ci∈T for all i∈[1,n] and
C0C0(K)(u)=k.
Since (h,hS,…) is predicate injective on I at the first step and
h(u)=aI, we have
C0I(aI)≥k. Because I is a model of K, it satisfies all
axioms in T, hence, CiI(aI)≥k, and as a result
(∃P)I(aI)≥k. In other words,
PIe contains at least k pairs
[(aI,zi):mi], i∈[1,k].
Observe that, by the first step, for
every pair
[(u,v1):m],[(u,v2):m′]∈PC0e(K)
with
v1≠v2 or m≠m′, we have
hP([(u,v1):m])≠hP([(u,v2):m′]).
Because PC0e(K) contains l such distinct tuples and
k=δ+l,
there are at least δ pairs
[(aI,r1):n1],…,[(aI,rδ):nδ] in PIe
for which there is no
[(u,v):m]∈PC0e(K) that maps to them under hP.
Therefore, we can extend h such that h(wu,Pj)=rj and
set hP so that
hP([(u,wu,Pj):1])=[(aI,rj):nj].
Suppose now that there exists wu,Pj such that h(wu,Pj)=bI with
b∈I. Since PC1(K)(u,wu,Pj)=1, we have
[(u,wu,Pj):1]∈PC1e(K) and
[(wu,Pj,u):1]∈(P−)C1e(K), hence,
the requirement for hP− w.r.t. wu,Pj is trivially satisfied.
Finally, consider an element u=aC0(K) such that
cclT[u,C0(K)](A)>AC0(K)(u) with A∈C.
In such a case,
AC1(K)(u) is set to cclT[u,C0(K)](A).
Given the above discussion, it is trivial to verify that
hA satisfies the required condition on the pairs
[u:m]∈AC1e(K).
As a result of all the above, we have shown that (h,hS,…) is predicate
injective on I at the second step as well.
Last, observe that for all i>1, and for all S∈C∪R,
extensions SCi(K) contain SCi−1(K) plus tuples t
mentioning only anonymous elements, for which we know by definition that
SCi(K)(t)=1. Therefore, h can be trivially extended to
these anonymous elements so that (h,hS,…) is predicate injective on
I at step i.
∎
Since a Boolean CQ q can be seen as a bag of atoms, we can consider its corresponding Boolean enumerated CQ (e-CQ), which is the e-bag qe. We call the elements of qe enumerated atoms (e-atoms).
For the following definition, it is convenient to partition a Boolean CQ q to
the subqueries qS each of which consists of all atoms in q over
atomic concept or role S (with corresponding multiplicities)
and subquery q= consisting of all equalities in q.
An enumerated valuation (e-valuation) of a Boolean e-CQ
qe, for q()=∃y.ϕ(y), over an e-bag
interpretation Ie=⟨ΔI,⋅Ie⟩ is
a family (ν,νS,…), S∈C∪R, of functions
[TABLE]
such that
ν(a)=aIe for each a∈I,
ν(y)=ν(t) for all equality e-atoms [y=t:m]∈q=e,
νA([A(t):m])=[ν(t):ℓ] for all A∈C and [A(t):m]∈qAe, where ℓ∈N is some
number, and
νP([P(t1,t2):m])=[(ν(t1),ν(t2)):ℓ]
for all P∈R and [P(t1,t2):m]∈qPe, where
ℓ∈N is some number.
Similarly to the case of e-homomorphisms, we sometimes write
νP−([P−(t1,t2):m]) instead of νP([P(t1,t2):m]), for P∈R.
Intuitively, a Boolean CQ can be seen as a bag interpretation with terms
(variables and individuals) in the domain. Then, an e-valuation is just an
e-bag homomorphism from the enumerated version of this special bag
interpretation to a normal e-bag interpretation.
It is straightforward to check that the number of e-valuations of a Boolean
e-CQ qe over an e-bag interpretation Ie is precisely the
multiplicity qI(⟨⟩) of the empty tuple in the evaluation of
q over I.
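This counting claim can be checked on small instances. The Python sketch below assumes a Boolean CQ without equalities, given as a list of atoms in which repeated atoms are listed repeatedly, and an interpretation in which every individual denotes itself. It computes the multiplicity of the empty tuple as a sum over valuations of products of multiplicities, and separately counts e-valuations by letting each e-atom pick one enumerated copy. All names are hypothetical.

```python
from collections import Counter
from itertools import product

def image(nu, terms):
    """Apply a valuation to a tuple of terms; individuals map to themselves."""
    return tuple(nu.get(t, t) for t in terms)

def bag_multiplicity(atoms, variables, domain, interp):
    """Sum over all valuations of the product of atom multiplicities."""
    total = 0
    for values in product(domain, repeat=len(variables)):
        nu = dict(zip(variables, values))
        prod = 1
        for pred, terms in atoms:
            prod *= interp[(pred, image(nu, terms))]
        total += prod
    return total

def count_e_valuations(atoms, variables, domain, interp):
    """Count pairs of a valuation and a choice of one copy per e-atom."""
    count = 0
    for values in product(domain, repeat=len(variables)):
        nu = dict(zip(variables, values))
        choices = [range(1, interp[(pred, image(nu, terms))] + 1)
                   for pred, terms in atoms]
        count += sum(1 for _ in product(*choices))
    return count

interp = Counter({("hasMngr", ("Lee", "Hill")): 2, ("Mngr", ("Hill",)): 3,
                  ("hasMngr", ("Lee", "w")): 1, ("Mngr", ("w",)): 1})
atoms = [("hasMngr", ("Lee", "y")), ("Mngr", ("y",))]
domain = ["Lee", "Hill", "w"]
print(bag_multiplicity(atoms, ["y"], domain, interp),
      count_e_valuations(atoms, ["y"], domain, interp))
```

Both functions return the same number on this instance, because choosing one enumerated copy per e-atom independently yields exactly the product of the multiplicities for each valuation.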
The following lemma says that if two e-valuations over the bag canonical model
coincide on all the (enumerated copies of the) atoms of a rooted CQ that
involve terms evaluating to (the interpretations of) individuals, then they are
the same e-valuation.
Lemma 34.
Let q be a rooted Boolean CQ and K be a DL-Litecorebag ontology. If two
e-valuations (ν1,νS′1,…) and (ν2,νS′2,…) of
qe over Ce(K) are different, then there exist an individual a∈I, e-atom [S(t):m]∈qe and number i∈{1,2} such that νi(a)∈νi(t) and νS1([S(t):m])≠νS2([S(t):m]).
Proof.
Let e-valuations (ν1,νS′1,…) and (ν2,νS′2,…) of qe over Ce(K) be different, but, for the sake of
contradiction, νS1([S(t):m])=νS2([S(t):m])
for all a∈I, [S(t):m]∈qe and i∈{1,2}
such that νi(a)∈νi(t). Since the e-valuations are different,
there exists
[S(t):m]∈qe such that νS1([S(t):m])≠νS2([S(t):m]). Moreover, by assumption t consists of only variables. Suppose that S(t) is P(x1,x2), where P∈R (we do so without loss of generality, because the case of A(x) for A∈C can be handled in the same way).
Boolean CQ q is rooted, so there exists a sequence
[TABLE]
of e-atoms such that t0∈I, [Rk(tk−1,tk):mk] is either [P(x1,x2):m] or [P−(x2,x1):m], and for each j=1,…,k either [Rj(tj−1,tj):mj] is in qe, if Rj is an atomic role, or [Pj(tj,tj−1):mj] is in qe, if Rj=Pj−.
We claim that
[TABLE]
for all j=1,…,k (which, in particular, contradicts our assumption on [P(x1,x2):m]). To prove this claim, suppose for the sake of contradiction that it is not the case, and let j∈{1,…,k} be the smallest number such that (2) does not hold.
By assumption, we know that νi(tj−1)≠νi(a) for both i=1,2
and any a∈I (therefore, j≠1, because t0∈I). However,
since j is the smallest number, ν1(tj−1)=ν2(tj−1). So, the
element u=ν1(tj−1) in the bag canonical model C(K)=⋃i≥0Ci(K) was introduced not in C0(K), which implies, by
construction, that (∃Rj)C(K)(u)≤1. In fact, since
(ν1,νS′1,…) is an e-valuation, (∃Rj)C(K)(u)=1, that is, there exists just one v∈ΔC(K) such that
RjC(K)(u,v)≥1, and, moreover, RjC(K)(u,v)=1. In
other words, it holds that [(u,v):1]∈RjCe(K), but
[(u,v):2]∈/RjCe(K).
Since (ν1,νS′1,…) and (ν2,νS′2,…) are
e-valuations, νRj1 and νRj2 send [Rj(tj−1,tj):mj] to some enumerated pairs in RjCe(K), which, by
assumption, are different. However, we also know that ν1(tj−1)=ν2(tj−1), so the only possibility for both
νRj1([Rj(tj−1,tj):mj]) and
νRj2([Rj(tj−1,tj):mj]) is [(u,v):1].
Therefore, our assumption on the existence of j was wrong and
(2) indeed holds for all j. In particular, it holds for j=k, which contradicts the fact that νP1([P(x1,x2):m])≠νP2([P(x1,x2):m]). Therefore, our assumption on (ν1,νS′1,…) and (ν2,νS′2,…) was wrong, and the lemma
is proven.
∎
Having Lemmas 33 and 34 at
hand, we are ready to prove that for DL-Litecorebag ontologies rooted queries
can be evaluated over the bag canonical model.
See 16
Proof.
First, note that it is enough to consider only Boolean rooted CQs, because the
required property for a non-Boolean rooted CQ q(x) follows from the
property for all Boolean CQs obtained from q(x) by replacing variables
x by individuals from I.
For a Boolean rooted CQ q it is enough to show that for any DL-Litecorebag
ontology K, any bag model I of K and any e-valuation (ν,νS,…) of q over Ce(K) there exists a unique e-valuation (ν′,νS′,…) of q over Ie. By Lemma 33 we
know that
there exists an e-homomorphism (h,hS,…) from Ce(K) to Ie that is predicate-injective on I.
Therefore, we can take the composition (ν,νS,…)∘(h,hS,…)=(ν∘h,νS∘hS,…) as (ν′,νS′,…);
indeed, the result of this composition is an e-valuation of q over Ie
and, by Lemma 34, this result is
unique throughout e-valuations of q over Ce(K).
∎
See 18
Proof.
Let q be the CQ q(x)=∃y.ϕ(x,y).
First note that because CQs are safe and equalities between individuals are not
allowed, ϕ(x,y) contains at least one atom, thus, n≥1.
Observe that Cn(K) is a subinterpretation of C(K), hence, from the
monotonicity property of CQs, we have qCn(K)⊆qC(K).
To prove the inverse inclusion, we show that interpretations
Ck(K) with k>n do not
contribute to the bag answers qC(K), and as a result, they can be
disregarded.
In other words, we prove that for every tuple of individuals
a and every valuation
λ:x∪y∪I→ΔC(K)
with
λ(x)=a
such that there exist a number k>n
and an atom Sk(tk) in ϕ(x,y) with
SkCk(K)(λ(tk))>SkCn(K)(λ(tk)), it holds that
∏S(t) in ϕ(x,y)SCk(K)(λ(t))=0.
By definition of canonical models, for k>n≥1,
interpretation Ck(K) differs from Ck−1(K) in that it contains
a number of tuples not present in Ck−1(K) having multiplicity 1 and
mentioning only anonymous elements.
Hence, inequality
SkCk(K)(λ(tk))>SkCn(K)(λ(tk))
effectively means that we are considering only valuations that send an atom of
ϕ(x,y) to a tuple of anonymous elements of C(K) added after
step n.
Suppose by contradiction that there are a and λ satisfying the
above criteria but
∏S(t) in ϕ(x,y)SCk(K)(λ(t))≥1.
This means that λ satisfies all equalities of q and for every atom
S(t) of q, SCk(K)(λ(t))≥1.
Because q is rooted, every connected component of the Gaifman graph of
q has a node, that is, an equivalence class, that mentions a free variable or
an individual.
Consider the component of q that contains atom Sk(tk) and the
equivalence class t~ of this component that contains a free variable or
an individual.
Because CQs are safe by definition, this component
contains an atom P(t′) mentioning a term in t~.
As a result, λ(t′) contains at least one individual, which,
given that λ(tk) is a tuple of anonymous elements,
implies that P(t′) and Sk(tk) are different atoms.
By definition of canonical models, we know that C1(K) is the
subinterpretation of C(K) containing tuples with at least one individual,
hence we derive that
PC1(K)(λ(t′))≥1.
But then, since the image of P(t′) under λ falls into
C1(K) while the image of Sk(tk) under λ falls into
Ck(K) but not into Cn(K) (which implies the same for all
subinterpretations of Cn(K)),
and both atoms belong to the same connected component, it means that
ϕ(x,y) contains conjunction
⋀j=1kSj(tj)
such that
(i) atom S1(t1) is connected with P(t′),
(ii)
tj∩tj+1≠∅, for 1≤j<k, and
(iii)
SjCj(K)(λ(tj))≥1 and
SjCj−1(K)(λ(tj))=0, for j∈[1,k].
In other words, the image of each one of the atoms under λ
falls respectively onto tuples created in
C1(K),C2(K),…,Ck(K).
But then, this means that q contains at least k atoms, which is a
contradiction given that k>n.
∎
See 19
Proof.
First we prove the claim for max unions of CQs.
Consider the DL-Litecorebag TBox
T={A⊑∃R, ∃R−⊑B},
the rooted CQ
q(x)=∃y.R(x,y)∧B(y), and
the DL-Litecorebag ABox
A={∣A(a),A(a),A(a),R(a,b),R(a,b),B(b),B(b),B(b)∣}
and let
K=⟨T,A⟩.
Then, C(K) is such that
[TABLE]
Evaluating q over C(K), we get qC(K)(a)=7 for the
individual a.
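The value 7 can be reproduced with a small computation. The Python sketch below is specialised to a TBox of the shape {A⊑∃R, ∃R−⊑B} and is not a general chase: it assumes that an individual with more copies of A than outgoing R-tuples receives one fresh anonymous R-successor per missing copy, each with multiplicity 1, and that B at an element is raised to the total multiplicity of its incoming R-tuples. Function names and the naming of anonymous elements are hypothetical.

```python
from collections import Counter

def chase_example_tbox(abox, a="A", r="R", b="B"):
    """Bag canonical model for a TBox of the shape {a ⊑ ∃r, ∃r⁻ ⊑ b} only."""
    model = Counter(abox)
    individuals = sorted({x for (_, args) in abox for x in args})
    for u in individuals:
        have = sum(m for (p, args), m in list(model.items())
                   if p == r and args[0] == u)
        # One fresh anonymous successor per missing copy of ∃r at u.
        for j in range(1, model[(a, (u,))] - have + 1):
            model[(r, (u, f"w_{u}_{j}"))] = 1
    for v in sorted({args[1] for (p, args) in model if p == r}):
        incoming = sum(m for (p, args), m in list(model.items())
                       if p == r and args[1] == v)
        model[(b, (v,))] = max(model[(b, (v,))], incoming)
    return model

def answer(model, x, r="R", b="B"):
    """Multiplicity of x in q(x) = ∃y. r(x,y) ∧ b(y)."""
    return sum(m * model[(b, (args[1],))]
               for (p, args), m in list(model.items())
               if p == r and args[0] == x)

abox = Counter({("A", ("a",)): 3, ("R", ("a", "b")): 2, ("B", ("b",)): 3})
print(answer(chase_example_tbox(abox), "a"))
```

The match through b contributes 2×3 and the match through the single anonymous successor contributes 1×1, which gives 7 in total.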
Suppose now that there exists a rewriting of q to a max union of CQs and let
q′(x)=q1(x)∨⋯∨qn(x) be such a rewriting where
q1,…,qn are CQs. This means that ⋃i=1nqiC(⟨∅,A⟩)=qC(K) or, alternatively,
that there exists i∈[1,n] with
qiC(⟨∅,A⟩)=qC(K).
Observe that A contains three distinct assertions with multiplicities 3,
2, and 3. Therefore, whenever there is a valuation for the terms of
qi that maps an atom of qi to one of these assertions, the multiplicity
is either 2 or 3. As a result and because qi is a CQ, any valuation of
qi contributes to qiC(⟨∅,A⟩)(a) a
multiplicity that is a multiple of 2 or 3.
Since 7 is prime, there can be no valuation contributing a multiplicity of
7. However, 7 can be expressed as the sums 2+2+3 or 2×2+3.
For the former sum, this means that there exist three distinct valuations
contributing to qiC(⟨∅,A⟩)(a) multiplicities
2, 2, and 3, respectively, which is
clearly impossible given the fact that to get 2, query qi must be set
equal to ∃y.R(x,y), which excludes the possibility of getting a
multiplicity of 3.
For the latter sum, this means that there exist two distinct valuations
contributing to qiC(⟨∅,A⟩)(a) multiplicities
4 and 3, respectively, which is again impossible given the fact that to get
4, query qi must be set equal to
∃y.∃z.R(x,y)∧R(x,z)
(another possibility would have been to use the same variable for y and z,
but the argumentation stays the same), which excludes the possibility of
getting a multiplicity of 3.
We now prove the claim for arithmetic unions of CQs by building on the
observations made in the proof above.
Consider the DL-Litecorebag TBox
T′=T∪{ C⊑∃P, ∃P−⊑D},
the rooted CQ
q′(x,z)=∃y.∃u.R(x,y)∧B(y)∧P(z,u)∧D(u), and
the DL-Litecorebag ABox
A′=A∪{∣C(a),P(a,b),D(b)∣}8,8,8
and let
K′=⟨T′,A′⟩.
Then, C(K′) is such that
[TABLE]
First, observe that T′ contains T plus a copy of the axioms in T with
their predicates renamed. Hence, T′ can be seen as having two disconnected
parts.
Second, q′ has two rooted connected components the first of which, say
q1(x), is query q from the previous part of the proof, while the second,
say q2(z), is an isomorphic query of q with the predicates renamed
according to the one-to-one mapping f={(A,C),(R,P),(B,D)}.
Based on these observations, we draw the following conclusions:
(i) the multiplicity of a tuple (c1,c2) of individuals in
(q′)C(K′) is the result of multiplying numbers q1C(K′)(c1)
and
q2C(K′)(c2);
(ii) a rewriting of q′ into an arithmetic union of CQs exists if and
only if a rewriting for q1 and q2 exists;
(iii) the rewritings of q1 and q2 should have the same number of
CQs, which should be identical up to renaming of variables and predicates
based on f.
Consider now the evaluation of q′ over C(K′). This leads to a bag
containing just tuple (a,a) with multiplicity
(q′)C(K′)((a,a))=q1C(K′)(a)×q2C(K′)(a)=7×64.
Let also q1′(x) and q2′(z) be the rewritings for q1(x) and q2(z),
respectively. Given the discussion in the first part of the proof, to get
multiplicity 7 for q1C(K′)(a), we have only two ways: either as the
sum of 2+2+3 or as the sum 4+3.
For the former, q1′(x) should be equal to
\exists y.\,R(x,y)\mathbin{\dot{\vee}}\exists y.\,R(x,y)\mathbin{\dot{\vee}}A(x), while for the
latter, q1′(x) should be equal to
\exists y_{1}.\,\exists y_{2}.\,R(x,y_{1})\wedge R(x,y_{2})\mathbin{\dot{\vee}}A(x).
By construction, both of these queries when evaluated over
C(⟨∅,A′⟩) return the correct multiplicity for
q1C(K′)(a) and there are no other queries with this property.
However, evaluating over C(⟨∅,A′⟩) the versions of these queries obtained by
renaming variables and predicates according to f, that is, the queries
\exists u.\,P(z,u) \mathbin{\dot\vee} \exists u.\,P(z,u) \mathbin{\dot\vee} C(z)
and
\exists u_{1}.\,\exists u_{2}.\,P(z,u_{1})\wedge P(z,u_{2}) \mathbin{\dot\vee} C(z),
we get multiplicities 24 and 72, respectively, both of which differ
from
q2C(K′)(a)=64.
∎
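The arithmetic behind this counterexample can be replayed with a small sketch. The concrete counts below are assumptions inferred from the sums quoted in the proof (2+2+3 and 4+3 for q1, 24 and 72 for q2); the helper name `candidate_multiplicities` is hypothetical and not part of the original construction.

```python
def candidate_multiplicities(role_count: int, concept_count: int) -> tuple[int, int]:
    """Multiplicities produced by the two candidate rewriting shapes.

    role_count    -- total multiplicity of role facts leaving the individual
    concept_count -- multiplicity of the concept assertion for the individual
    """
    # Shape 1: (exists y. R(x,y)) + (exists y. R(x,y)) + A(x), with additive union
    additive_shape = role_count + role_count + concept_count
    # Shape 2: (exists y1, y2. R(x,y1) and R(x,y2)) + A(x)
    product_shape = role_count * role_count + concept_count
    return additive_shape, product_shape


# Assumed counts, chosen so that the numbers quoted in the proof are reproduced.
q1_shapes = candidate_multiplicities(role_count=2, concept_count=3)
q2_shapes = candidate_multiplicities(role_count=8, concept_count=8)
target_q2 = 64  # required multiplicity for q2 according to the proof
```

Under these assumed counts both shapes give 7 for q1, while the renamed shapes give 24 and 72 for q2, neither of which equals 64.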
Let S be a finite set of concept and role symbols.
We say that a bag interpretation
⟨ΔI,⋅I⟩
is finite relative to S if,
for every S∈S, bag SI is finite.
Finite bag interpretations relative to a finite set S correspond to
bag database instances I over bag schemas S with
domains ΔI as these were defined in Grumbach and Milo [1996].
Hence, in the following, given a finite set S and a bag interpretation
I that is finite relative to S, we denote by II the corresponding
bag database instance.
Also, given a bag database instance II and a BALG^1_ε algebra expression
E, we denote by E(II) the bag corresponding to the evaluation of E over
II.
Proposition 35.
Let I be any bag interpretation that is finite relative to a finite set
S.
For each BALG^1_ε-query q there is a BALG^1_ε algebra expression E
such that
qI=E(II).
Proof.
We refer to Grumbach and Milo [1996] for the definition of the BALG^1_ε
operators.
For each BALG^1_ε-query q, we define a BALG^1_ε algebra expression
Eq by induction on the structure of q as follows, where, for each
tuple of terms \mathbf{t} over X∪I and each term t∈\mathbf{t}∪I,
ref(t,\mathbf{t}) is defined as t if t∈I, and otherwise as the
first position in \mathbf{t} containing t:
If q(x)=S(t) for S∈C∪R and
t=⟨t1,…,t∣t∣⟩ a tuple over
x∪I, then
[TABLE]
If q(x)=q0(x0)∧(x=t) for x∈x0,
t∈X∪I, and x=x0∪({t}∖I), then
Eq=σref(x,x0)=ref(t,x0)(Eq0) if t∈x0∪I and x=x0
(we assume w.l.o.g. that the order of variables in x and
x0 is the same), and
Eq=π1,…,∣x0∣,ref(x,x0)(Eq0)
if t∈X∖x0 and x=x0t (we assume
w.l.o.g. that t is added as the last variable to x0).
If q(x)=q1(x1)∧q2(x2), for
x=x1∪x2, then
[TABLE]
If q(x)=∃y.q0(x,y), then
Eq=π1,…,∣x∣(Eq0) (we assume w.l.o.g. that in q0
variables in y come after variables in x).
If q(x)=q1(x)∨q2(x), then
Eq=Eq1∪Eq2.
If q(\mathbf{x})=q_{1}(\mathbf{x}) \mathbin{\dot\vee} q_{2}(\mathbf{x}), then
Eq=Eq1⊎Eq2.
If q(x)=q1(x)∖q2(x), then
Eq=Eq1−Eq2.
It is straightforward to check that, for each BALG^1_ε-query q and each
bag interpretation I, we have qI=Eq(II).
∎
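The last four cases of the translation can be illustrated with a minimal sketch that models bags as Python `Counter` objects. This is only an illustration under stated assumptions, not the algebra of Grumbach and Milo itself: the function names are hypothetical, and positions are 0-based here whereas the proof uses 1-based positions.

```python
from collections import Counter

# A bag maps tuples to positive multiplicities; absent tuples have multiplicity 0.

def max_union(e1: Counter, e2: Counter) -> Counter:
    """Case q1 v q2: pointwise maximum of multiplicities."""
    return e1 | e2

def additive_union(e1: Counter, e2: Counter) -> Counter:
    """Case q1 (dotted v) q2: pointwise sum of multiplicities."""
    return e1 + e2

def difference(e1: Counter, e2: Counter) -> Counter:
    """Case q1 \\ q2: pointwise subtraction, truncated at 0."""
    return e1 - e2

def project(e: Counter, positions: list[int]) -> Counter:
    """Case exists y. q0: keep the given positions and sum multiplicities."""
    out: Counter = Counter()
    for tup, mult in e.items():
        out[tuple(tup[i] for i in positions)] += mult
    return out


# Hypothetical data: R(a,b) twice, R(a,c) once, A(a) three times.
R = Counter({("a", "b"): 2, ("a", "c"): 1})
A = Counter({("a",): 3})
exists_R = project(R, [0])
```

Here `exists_R` assigns multiplicity 3 to `("a",)`, so the max union with `A` yields 3, the additive union yields 6, and the difference `A` minus `exists_R` is empty.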
See 22
Proof.
The complexity class BALG^1_ε is defined in Grumbach and Milo [1996] by the
problem of checking, given a bag database instance I, a tuple of individuals
b, a number n≥0, and a fixed BALG^1_ε algebra expression E,
whether the multiplicity of b in bag E(I) is exactly n.
We next reduce the problem of checking
qC(⟨∅,A⟩)(a)≥k to the above
problem. Without loss of generality we assume that k∈N∪{0}, since
inequality
qC(⟨∅,A⟩)(a)≥k is always false
whenever k=∞.
For this consider the following definitions:
Let I be the extension of
C(⟨∅,A⟩) such that
[TABLE]
Note that I is finite relative to the finite set S consisting of
the predicate symbols in A and symbol S, hence II is a bag database
instance with schema S.
Let b=a and n=0.
Let E be the algebra expression corresponding to query S(x)∖q(x) according to Proposition 35.
Then, qC(⟨∅,A⟩)(a)≥k if and only if
the multiplicity of a in E(II) is 0.
Indeed, suppose
qC(⟨∅,A⟩)(a)<k. Then
(S(x)∖q(x))I(a)>0, thus by
Proposition 35 the multiplicity of a in E(II)
is greater than 0; the other direction is analogous.
The above many-one reduction can be seen to be computable, for each A,
a, and k, by a Boolean circuit whose depth depends only on q. We
conclude that the language
{⟨A,a,k⟩∣qC(⟨∅,A⟩)(a)≥k} is
contained in BALG^1_ε under LogSpace-uniform AC^0 reductions, as
required.
∎
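The core of this reduction, turning a threshold test into a test for multiplicity 0, can be sketched as follows. The sketch assumes that the auxiliary symbol S holds the tuple a with multiplicity k and nothing else; that content is an assumption made here for illustration, since the defining table is not reproduced above. The function name `at_least_k` is hypothetical.

```python
from collections import Counter

def at_least_k(q_answers: Counter, a: tuple, k: int) -> bool:
    """Decide q(a) >= k by checking that a has multiplicity 0 in S \\ q.

    q_answers -- bag of answers of q over the ABox-only canonical model
    a         -- the answer tuple under test
    k         -- a finite threshold (the case k = infinity is handled separately)
    """
    s = Counter({a: k})          # assumed extension of the fresh symbol S
    remainder = s - q_answers    # bag difference: max(k - q(a), 0)
    return remainder[a] == 0


answers = Counter({("a",): 3})   # hypothetical answer bag
```

With this answer bag, thresholds up to 3 are accepted and a threshold of 4 is rejected, mirroring the equivalence stated in the proof.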
In the following, for a CQ possibly with inequalities
q(x)=∃y.ϕ(x,y)
and a bag interpretation I,
we call a valuation
λ:x∪y∪I→ΔI
a homomorphism from q to I if
for every atom P(t) of ϕ(x,y), it holds that
PI(λ(t))≥1.
Proposition 36.
For any rooted CQ
q(x)=∃y.ϕ(x,y),
DL-Litecorebag ontology K,
and equality-consistent subset z⊆y,
we have
[TABLE]
where
(i) zˉ are all the terms of ϕ(x,y) not
appearing in z,
(ii) ϕzˉ is the subconjunction of
ϕ(x,y) that consists of atoms and equalities mentioning only
terms in zˉ,
(iii) ϕz′ is the subconjunction of
ϕ(x,y) that consists of atoms and equalities mentioning a
variable in z′, and
(iv) tz′ are all the terms of zˉ appearing
in ϕz′.
Proof.
First observe that by Definition 24 of ma-connected subsets
of z the following hold:
(i) if a variable z belongs to a ma-connected subset z′ of
z, then z′ contains all variables in z~ plus all variables
in the equivalence classes that are reachable from z~ in the Gaifman
graph of q through nodes in z~;
(ii) any two ma-connected subsets of z do not have any atom or
equality in common.
Combining the above observations with the fact that q is rooted, we derive
that the query appearing on the right-hand side of
equation (3), call it q′, contains
exactly query q plus a number of equalities between terms connecting
ϕzˉ with the subconjunction corresponding to a ma-connected
subset of z.
Let a be a tuple of individuals and Λz be the set of
homomorphisms for [q,z]C(K) and a.
First, suppose that Λz=∅, that is,
[q,z]C(K)(a)=0. This means that there is no homomorphism
λ from q to C(K) with λ(x)=a that
satisfies the equalities of q. But then, the same is true for q′, hence
[q′,z]C(K)(a)=0 as well.
Suppose now that there is a homomorphism λ satisfying the equalities of
q. This means that λ satisfies the equalities of q′ as well except
possibly the extra ones.
We next prove that λ satisfies these extra equalities for each
non-empty ma-connected subset z′ of z as well.
For this, consider such a subset z′.
Because q is rooted,
ϕz′ should be connected with
ϕzˉ, hence, ϕz′ contains atoms
Pi(ti,zi) (resp., Pi(zi,ti)) such that ti∈tz′,
zi∈z′, and i∈[1,n]
(note that if ϕzˉ is empty the above still holds, because if
we assume the opposite, then
⋀ma-connected z′⊆zϕz′,
as a rooted query, should contain a distinguished variable or an equality
mentioning an individual, but since for every z′ we have
z′⊆z⊆y and z is
equality-consistent, we derive a contradiction in both cases).
Suppose, for the sake of contradiction, that there is a pair i,j
such that
λ(ti)≠λ(tj).
Because λ(ti),λ(tj)∈I,
by definition of canonical models, we have
λ(zi)=wλ(ti),Pi and
λ(zj)=wλ(tj),Pj.
But then, because z′ is ma-connected, ϕz′ contains an atom
that is sent by λ to a tuple (w1,w2) such that w1 is either
wλ(ti),Pi or an anonymous element generated by wλ(ti),Pi
and w2 is either wλ(tj),Pj or an anonymous element generated by wλ(tj),Pj.
Given that the anonymous elements of canonical models are characterised by the
individual and the role that generated them, and there can be no tuple having
anonymous elements generated from different combinations, the above situation is
impossible.
Hence, we have not only λ(ti)=λ(tj) for every i,j∈[1,n],
but also Pi=Pj.
From the former it follows that the extra equalities in q′ are satisfied by
all homomorphisms in Λz.
∎
Proposition 37.
Let q(x)=∃y.ϕ(x,y)
be a rooted CQ,
K=⟨T,A⟩ a DL-Litecorebag ontology, and
z an equality-consistent subset of y.
For all non-empty, ma-connected, and realisable by T subsets z′ of
z, we have
qz′aC(⟨T,A′⟩)(⟨⟩)=1
where individual a and ABox A′ are picked according to
Section 5.
Proof.
Consider a non-empty, ma-connected, and realisable by T subset z′ of
z.
Let us first inspect query qz′a.
For x′=tz′∩X,
a an individual in tz′ if it exists or a fresh individual
otherwise, we have
qz′a()=∃x′.∃z′.ϕz′∧⋀t∈tz′(t=a)∧⋀z∈z′(z=a)
where, for t∈tz′ and z∈z′, A′ is the bag ABox
having either only assertion P(a,b) (with multiplicity 1), when
αz′=P(t,z), or only assertion P(b,a),
when αz′=P(z,t).
Since z′ is realisable by T,
by Definition 25 we have
(qz′a)C(⟨T,A′⟩)(⟨⟩)≥1. Suppose that
(qz′a)C(⟨T,A′⟩)(⟨⟩)>1.
Observe that every concept/role extension under C(⟨T,A′⟩)
is a set, hence, the multiplicity of a tuple in any such extension is 1.
This means that all homomorphisms from qz′a to C(⟨T,A′⟩) that satisfy the
equalities and inequalities of qz′a contribute multiplicity 1 for
⟨⟩ in bag (qz′a)C(⟨T,A′⟩), hence, there exist at least two such
homomorphisms.
Because z′ is ma-connected, the subgraph of the Gaifman graph of q
induced by the set of the equivalence classes of z′ is connected, and as
a result, the Gaifman graph of ϕz′ in qz′a contains a single
connected component.
But then, because qz′a contains equalities between all terms in tz′ and individual a, all homomorphisms from qz′a to C(⟨T,A′⟩)
send all atoms containing a term in tz′ to the assertion of
A′. Hence, ϕz′ is essentially rooted.
Therefore, in order to have two homomorphisms from qz′a to C(⟨T,A′⟩), it
is necessary that qz′a contains an atom P(x,y) such that it can be sent
to two different tuples, say (u,v1) and (u,v2) in PC(⟨T,A′⟩).
However, observe that for every element u of ΔC(⟨T,A′⟩) and for every
role P, C(⟨T,A′⟩) may contain at most one P-successor for u, which
implies that every atom P(x,y) in qz′a is sent to a tuple in PC(⟨T,A′⟩)
in a unique way.
Combining all of the above, we conclude that there is only one homomorphism from
qz′a to C(⟨T,A′⟩) satisfying the equalities and inequalities of qz′a,
which proves the claim.
∎
See 27
Proof.
Let Λz be the set of valuations
λ:x∪y∪I→ΔC(K)
corresponding to Definition 23 for [q,z]C(K).
1.
Because variables z are realisable by T, by
Definition 26 we have that z are
equality-consistent and all non-empty ma-connected subsets z′ of z are realisable by T.
The former implies that there is no equality
z=t with z∈z and t∉z in any ϕz′.
The latter implies that for any non-empty ma-connected subset z′ with
x′=tz′∩X,
a an individual in tz′ if it exists or a fresh individual
otherwise, and
query
qz′a()=∃x′.∃z′.ϕz′∧⋀t∈tz′(t=a)∧⋀z∈z′(z=a),
we have
(qz′a)C(⟨T,A′⟩)(⟨⟩)≥1
where, for tz′∈tz′ and zz′∈z′,
A′ is the bag ABox
having either only assertion P(a,b) (with multiplicity 1), when
αz′=P(tz′,zz′), or only assertion P(b,a),
when αz′=P(zz′,tz′)
(note that by the proof of Proposition 36, atom
αz′ always exists).
This means that there exists a homomorphism ν from ϕz′ to
C(⟨T,A′⟩)
such that ν maps a term t of qz′a to a if and only if t∈tz′.
Consider now a valuation λ∈Λz, the subconjunction
ϕz′ of q, and atom αz′=P(tz′,zz′) (resp.,
αz′=P(zz′,tz′)) of ϕz′.
By Definition 23,
λ maps term tz′ to individuals and variable zz′ to
anonymous elements.
Therefore, whenever λ is a homomorphism from q to C(K), it means
that PC(K) contains a tuple
(a′,wa′,P) for some individual a′. But then, this means that
C(K) contains an isomorphic copy of C(⟨T,A′⟩) modulo the individuals a
and b in A′.
Hence, inequality
(qz′a′)C(K)(⟨⟩)≥1
holds as well, which, due to the above observations, implies that either a′
is the
only individual contained in ϕz′ and a=a′, or ϕz′
does not mention any individual and the choice of a above is irrelevant.
By Proposition 37, we have
(qz′a)C(⟨T,A′⟩)(⟨⟩)=1, hence,
(qz′a′)C(K)(⟨⟩)=1 as well, which implies that the
images of
all atoms in ϕz′ under λ have multiplicity 1.
Based on this fact, we derive the following equivalence for every ma-connected
subset z′ of z.
[TABLE]
We now inspect the form of qz.
Let
zˉ be all terms of ϕ(x,y) not appearing in z,
yzˉ=X∩zˉ,
ϕzˉ the subconjunction of ϕ(x,y) consisting
of atoms and equalities mentioning only terms in zˉ,
and
zz=⋃ma-connected z′⊆z{zz′∣zz′ appears in αz′}
(note that zz=z∩y′). Then,
qz takes the following form:
[TABLE]
Notice that, for all non-empty ma-connected subsets z′ of z, the
query on the left-hand side of equation (4) corresponds to a
conjunction w.r.t. z′ in the query on the right-hand side of
equation (3) of Proposition 36,
while both map their common variables in z to anonymous elements
when evaluated over C(K).
By Definition 24 all ma-connected subsets of
z are pairwise disjoint, while their union makes up z.
Hence, considering equation (4) for all ma-connected subsets
of z and combining it with (3)
and (5),
we immediately derive
[q,z]C(K)=[qz,z]C(K).
2.
Since z is not realisable by T, by Definition 26
we have that either
ϕ(x,y) contains an equality z=t with z∈z and t∉z
or that
there is a non-empty ma-connected subset z′⊆z
which is not realisable by T.
For the former, by Definition 23
all λ∈Λz are such that
λ(z)≠λ(t), hence, equality z=t is not satisfied by any
of them, which results in
[q,z]C(K)=∅.
For the latter, this means that,
for x′=tz′∩X,
a an individual in tz′ if it exists or a fresh individual
otherwise, and
query
qz′a()=∃x′.∃z′.ϕz′∧⋀t∈tz′(t=a)∧⋀z∈z′(z=a),
we have
(qz′a)C(⟨T,A′⟩)(⟨⟩)=0
where, for t∈tz′ and z∈z′, A′ is the bag ABox
having either only assertion P(a,b) (with multiplicity 1), when
αz′=P(t,z), or only assertion P(b,a), when αz′=P(z,t).
We distinguish two cases:
(i)
either there is no homomorphism ν from ϕz′ to C(⟨T,A′⟩) or
(ii)
there is one but it violates some equality or inequality of qz′a.
Assume there exists λ∈Λz that is a
homomorphism from ϕz′ to C(K) (otherwise we trivially have
[q,z]C(K)=∅).
From the existence of λ and atom αz′ in
ϕz′, we have that C(K) contains an isomorphic copy of
C(⟨T,A′⟩), modulo individuals a and b (see also the proof of statement 1.).
But then, considering case (i), if there is no homomorphism ν from
ϕz′ to C(⟨T,A′⟩), this means there is no homomorphism λ
from ϕz′ to C(K) either, which is a contradiction (this is
easily seen by the fact that there is a homomorphism from the image of
ϕz′ under λ to C(⟨T,A′⟩) that preserves individuals and
the fact that the composition of homomorphisms is another homomorphism).
Considering case (ii), it means that
either there are two different individuals in the query connected with
variables in z′ or that ν maps z to a.
But then, the former means that ϕz′ contains atoms P1(a,z1)
and P2(a′,z2) such that a≠a′, with P1,P2 not necessarily
distinct, which contradicts Proposition 36, which
requires all t∈tz′ to be mapped to the same individual by
λ∈Λz.
On the other hand, if ν(z)=a, then λ(z)=a if a∈tz′, and otherwise λ(z)=a′ for some individual a′ in
ΔC(K). In either case, λ(z)∈I, from which we conclude
that λ∉Λz, which is a contradiction.
Therefore, there is a subquery ϕz′ in q for which there is no
λ∈Λz that is a homomorphism from ϕz′ to
C(K) and satisfies the equalities of ϕz′, from which it
follows that
[q,z]C(K)=∅.
∎
For a DL-Litecorebag TBox T, a concept C, a role R, and a term t,
where ζC0(t) is defined in Section 5, we
define the expressions ηC(t) and θ∃R(t) as follows:
[TABLE]
Lemma 38.
For a DL-Litecorebag ontology K=⟨T,A⟩ the following hold:
1.
For a query of the form
qC(x)=ζC(x) and an individual a, we have
qCC(K)(a)=ηCC(⟨∅,A⟩)(a).
2.
For a role R∈R, queries of the form
qR(x,y)=R(x,y) and
qR−(x,y)=R(y,x), and a pair of individuals a, we have
qRC(K)(a)=qRC(⟨∅,A⟩)(a) and
qR−C(K)(a)=qR−C(⟨∅,A⟩)(a).
3.
For a role R, a query of the form q(x)=R(x,x), and an individual a,
we have
qC(K)(a)=qC(⟨∅,A⟩)(a).
Proof.
1.
Recalling the definition of canonical models, it is immediate to verify that
CC(K)(a)=CC1(K)=cclT[a,C0(K)](C) holds.
Observe that the latter quantity is equal to
⋃T⊨C0⊑C(C0C0(K)(a)).
Last, notice that
C0C0(K)(a)=A(C0(a)) holds
whenever C0∈C, and
C0C0(K)(a)=∑c∈IA(R(a,c)) holds
whenever C0=∃R with R∈R (the case is analogous when C0=∃R− with R∈R).
Recalling the semantics of CQs, this means that
C0C0(K)(a)=ζCC(⟨∅,A⟩)(a) holds.
Combining this with
CC(K)(a)=⋃T⊨C0⊑C(C0C0(K)(a)) and
the definition in equation (6), we derive
CC(K)(a)=ηCC(⟨∅,A⟩)(a), which is
equivalent to
qCC(K)(a)=ηCC(⟨∅,A⟩)(a).
2.
Since T does not contain role inclusion axioms,
for every i≥0 in the definition of the canonical model for K,
every role P, and every pair of individuals a, we have
PCi(K)(a)=PC0(K)(a).
Moreover, by definition of canonical models, we have
C(K)=⋃i≥0Ci(K), from which and the definition of
∪ for bag interpretations, we get
PC(K)=⋃i≥0PCi(K).
Combining this with the fact that
PCi(K)(a)=PC0(K)(a), we derive
PC(K)(a)=PC0(K)(a).
Last, because PC0(K)(a)=A(P(a)), we get
qPC(K)(a)=qPC(⟨∅,A⟩)(a),
which proves the claim.
3.
The claim follows from the facts that
T does not contain any role inclusion axioms
and that canonical interpretations do not add to role extensions tuples
with repeated elements.
∎
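The computation in the first case of the proof can be sketched directly over an ABox: the multiplicity of an individual in a concept C is the maximum, over all basic concepts C0 entailed to be subsumed by C, of the multiplicity of the individual in C0 as read off the ABox. The encoding below (tagged pairs for basic concepts, dictionaries for assertions) and the function names are assumptions made for illustration; the list of subsumees is taken as given rather than derived from a TBox.

```python
def base_multiplicity(c0, a, concept_assertions, role_assertions):
    """Multiplicity of individual a in basic concept c0, read from the ABox.

    c0 is ("atomic", A), ("exists", R) or ("exists_inv", R).
    concept_assertions maps (A, a) to a multiplicity.
    role_assertions maps (R, subject, object) to a multiplicity.
    """
    kind, name = c0
    if kind == "atomic":
        return concept_assertions.get((name, a), 0)
    if kind == "exists":      # sum over c of the multiplicity of R(a, c)
        return sum(m for (r, s, _), m in role_assertions.items()
                   if r == name and s == a)
    if kind == "exists_inv":  # sum over c of the multiplicity of R(c, a)
        return sum(m for (r, _, o), m in role_assertions.items()
                   if r == name and o == a)
    raise ValueError(f"unknown basic concept kind: {kind}")


def eta(subsumees, a, concept_assertions, role_assertions):
    """Max union over all basic concepts C0 with T |= C0 subsumed by C."""
    return max((base_multiplicity(c0, a, concept_assertions, role_assertions)
                for c0 in subsumees), default=0)


# Hypothetical ABox: A(a) twice; R(a,b) three times; R(a,c) once; R(d,a) five times.
concepts = {("A", "a"): 2}
roles = {("R", "a", "b"): 3, ("R", "a", "c"): 1, ("R", "d", "a"): 5}
```

For instance, if the subsumees of C are A and ∃R, the sketch returns max(2, 3+1) = 4 for the individual a.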
Lemma 39.
For a rooted CQ
q(x)=∃y.ϕ(x,y), a DL-Litecorebag ontology
K=⟨T,A⟩, and variables z⊆y that are
realisable by T, we have
[qz,z]C(K)=(qˉz)C(⟨∅,A⟩).
Proof.
Denote by zˉ all terms of ϕ(x,y) not appearing in
z, by ϕzˉ the subconjunction of ϕ(x,y) that consists of all atoms and equalities mentioning only terms in zˉ, and set yzˉ=y∖z.
Then, recalling the definition of ϕz′ in Section 5,
q is written as
[TABLE]
For a ma-connected subset z′ of z and for P∈R, let
Rz′=P, if αz′=P(t,z), and
Rz′=P−, if αz′=P(z,t).
Last, denote term t and variable z
appearing in αz′ by tz′ and zz′,
respectively, and let
zz=⋃ma-connected z′⊆z{zz′∣zz′ appears in αz′}.
Based on this, query qz takes the form
[TABLE]
Last, denote by
apply(ϕzˉ,η)
the formula obtained from
ϕzˉ such that each occurrence of an atom A(t) in
ϕzˉ, where A∈C and t∈I∪X,
is replaced with ηA(t) (defined in equation (6)).
Recalling equation (7) that defines θ∃R(t) for a
role R and a term t, query qˉz takes the form
[TABLE]
Consider now Definition 23 and the set of valuations
accounting for bag
[qz,z]C(K).
All such valuations map a variable in ϕzˉ to an individual.
Because apply(ϕzˉ,η) replaces unary atoms
A(t) in ϕzˉ with ηA(t) leaving any binary atoms
intact, by Lemma 38 (Case 1 for unary atoms and Case 2), it follows that the evaluation of ϕzˉ over C(K)
coincides with the evaluation of apply(ϕzˉ,η) over
C(⟨∅,A⟩).
It remains to be shown that, for each ma-connected subset z′ of z, equality
[∃zz′.αz′,zz′]C(K)=(θ∃Rz′(tz′))C(⟨∅,A⟩) holds.
Then, the claim will follow because qz and qˉz share
the same equalities.
To see this last equivalence, observe that for an individual a∈I
(or for a=tz′ if tz′∈I),
the multiplicity for
[∃zz′.αz′,zz′]C(K)(a)
corresponds to the number of anonymous elements associated with a in
extension ∃Rz′ under C(K).
Recalling the definition of canonical models and the proof of
Lemma 38,
number
[∃zz′.αz′,zz′]C(K)(a) can be
written successively as follows:
[TABLE]
∎
See 29
Proof.
Let q be of the form q(x)=∃y.ϕ(x,y).
From (1) and then by Lemmas 27
and 39, we have
[TABLE]
∎
See 30
Proof.
The claim is an immediate consequence of Theorems 16
and 29,
and the fact that the rewriting of a CQ q depends only on q and the TBox.
∎
See 31
Proof.
For combined complexity, NP-hardness comes from reducing CQ query
answering in DL-Litecore to CQ query answering for rooted queries in
DL-Litecorebag. It is known that given a CQ q in DL-Litecore, an ontology K=⟨T,A⟩, and a tuple of individuals a, the problem of
deciding whether a∈qK is NP-complete
even when all variables in q are free (i.e., q is rooted). Then,
NP-hardness follows from Proposition 10.
We now discuss membership in NP.
BagCert[rooted CQs, DL-Litecorebag] decides, for any given
ontology K=⟨T,A⟩, rooted query
q(x)=∃y.ϕ(x,y),
tuple of individuals a, and number k∈N0∞,
whether qK(a)≥k.
Without loss of generality, in the following we assume that k is a positive
integer, since for k being 0 or ∞,
BagCert[rooted CQs, DL-Litecorebag] is always true or false,
respectively.
By Theorem 16 and the definition of universal models, we have
qK(a)=qC(K)(a). Hence, in the following, we focus on the
canonical model of K.
By the semantics of bag query answering, we
might have a number of valuations from the terms of q to ΔC(K)
that contribute to the multiplicity of qC(K)(a).
In the worst case, each valuation may contribute 1 to this multiplicity,
which means that to verify whether
qC(K)(a)≥k holds, we need at most k different valuations
λ1,…,λk
that send the atoms of q to tuples in C(K),
bind x to a, and
satisfy the equalities of q.
By Theorem 18 the images of q under any of these
valuations fall into Cn(K) where n is the number of atoms of q.
Based on these observations, to prove membership in NP, we describe how we
can obtain a tuple
⟨J,λ1,…,λk⟩
where J is a subinterpretation of Cn(K)
and each
λi:x∪y∪I→ΔJ
satisfies λi(x)=a,
and verify that the multiplicity of qJ(a) with respect to
λ1,…,λk is at least k.
Then, we also prove that
⟨J,λ1,…,λk⟩
has size N and
that verification can be done in time T
such that both N and T are some polynomials with respect to the size of
K, q, and number k.
To obtain
⟨J,λ1,…,λk⟩
we guess
(g1) an interpretation
J=C1(K)∪⋃i=1kJi where each Ji is a
subinterpretation of ⋃j=2nCj(K) and
(g2) k different valuations λ1,…,λk such that
λi:x∪y∪I→ΔJ
and
λi(x)=a.
For the verification,
we
(v1) check which of the valuations λ1,…,λk satisfy the
equalities of q letting Λ= be the corresponding subset and
(v2) compute quantity
m=∑λ∈Λ=∏S(t) in ϕ(x,y)SJ(λ(t))
for checking whether m≥k.
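Steps (v1) and (v2) can be sketched as a short verification routine. The data layout (atoms as predicate and argument pairs, valuations as dictionaries defined on every term of q, the interpretation as nested dictionaries) and the name `certificate_multiplicity` are assumptions for illustration only; individuals are expected to be mapped to themselves by each valuation.

```python
from math import prod

def certificate_multiplicity(atoms, equalities, valuations, interp):
    """Compute m as the sum, over valuations satisfying the equalities,
    of the product of the multiplicities of the images of all atoms.

    atoms      -- list of (predicate, tuple_of_terms)
    equalities -- list of (term, term) pairs
    valuations -- list of dicts mapping every term of q to a domain element
    interp     -- dict: predicate -> dict: tuple of elements -> multiplicity
    """
    # Guessed valuations must be pairwise different; drop accidental repeats.
    distinct = {frozenset(lam.items()): lam for lam in valuations}.values()
    total = 0
    for lam in distinct:
        # (v1) keep only valuations satisfying the equalities of q
        if any(lam[s] != lam[t] for s, t in equalities):
            continue
        # (v2) multiply the multiplicities of the images of all atoms
        total += prod(
            interp.get(pred, {}).get(tuple(lam[t] for t in args), 0)
            for pred, args in atoms
        )
    return total


# Hypothetical query: exists y. R(x,y) and A(y), with x bound to a.
atoms = [("R", ("x", "y")), ("A", ("y",))]
interp = {"R": {("a", "b"): 2, ("a", "c"): 1}, "A": {("b",): 3, ("c",): 1}}
valuations = [{"x": "a", "y": "b"}, {"x": "a", "y": "c"}]
m = certificate_multiplicity(atoms, [], valuations, interp)
```

In this toy instance the two valuations contribute 2×3 and 1×1, so m = 7 and any threshold k up to 7 would be accepted by the check m≥k.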
We now elaborate on the guessing of J.
The guessed J is such that ΔJ is a finite set comprising all
individuals appearing in A and a number of anonymous elements of the form
wu,Rj where j is a positive number, u an element of ΔJ, and
R a role in T.
Further J contains finite bag extensions for every concept or role S
appearing in T or A.
The part of J corresponding to C1(K) can be trivially computed from
the assertions of A and the axioms of T. To avoid an exponential
computation, however, we guess the remaining interpretations J1,…,Jk
using a non-deterministic algorithm having n−1 steps for each Ji.
Initially, Ji is set to C1(K).
At each step,
the algorithm picks
a tuple t from an extension SJi with
S∈C∪R and
a concept D appearing in T
such that the following conditions are satisfied:
if S∈C, then
T⊨S⊑D,
t=wu,Rj,
and DJi(wu,Rj)=0
where
wu,Rj∈ΔJi;
if S∈R, then
T⊨∃S−⊑D,
t=(u,wu,Rj),
and
DJi(wu,Rj)=0
where
u,wu,Rj∈ΔJi
(resp., T⊨∃S⊑D, t=(wu,R−j,u),
wu,R−j∈ΔJi).
Then,
if D∈C, it sets DJi(wu,Rj)=1;
if D=∃P with P∈R, it sets
PJi((wu,Rj,wwu,Rj,P1))=1
and adds
wwu,Rj,P1 to ΔJi; and
if D=∃P− with P∈R, it sets
PJi((wwu,Rj,P−1,wu,Rj))=1
and adds
wwu,Rj,P−1 to ΔJi.
It can be readily verified that each Ji can be any subinterpretation of
Cn(K) that always includes C1(K) and potentially tuples
t1,…,tl that would have been created respectively in
C1(K),…,Cl(K) such that
tj∩tj+1=∅
with 1≤j<l and l∈[2,n].
What remains to be shown is that the size, N, of tuple
⟨J,λ1,…,λk⟩
and time, T, needed to verify that this tuple is a certificate
for qC(K)(a)≥k are polynomials
in the size of K, q, and number k.
Consider first size N.
It is easy to see that C1(K) has a size that is
polynomial in the size of A and T. The remaining parts of
J1,…,Jk are linear in the size of q by construction. Therefore,
J is of polynomial size with respect to K, q, and number k.
As for the size of λ1,…,λk, because q has n atoms, it
contains at most 2n terms, hence
each valuation can be represented by 2n pairs (t,d)
where t is a term in q and d∈ΔJ.
Therefore, overall, N is polynomial in the size of A, q, and number k.
Consider now quantity T. Step (v1) takes time Θ(n).
Step (v2) takes polynomial time in the size of q, A,
and number k. To see this, first observe that retrieval of the multiplicities
involved in a product for a specific valuation takes O(n×∣J∣) time
where ∣J∣ is the sum of the cardinalities of all extensions in J.
Each such number l
is determined by the maximum multiplicity in A and can be represented in
binary using logl bits.
Second, multiplication of n such numbers can be done in
polynomial time, while the result, m, can be represented using
nlogl bits. Since ∣J∣ is a polynomial determined by the input,
overall verification can be done in polynomial time.
This proves that
BagCert[rooted CQs, DL-Litecorebag]∈NP.
Since we have also shown that it is NP-hard, we conclude that
BagCert[rooted CQs, DL-Litecorebag] is NP-complete.
For data complexity, it suffices to prove that for any fixed DL-Litecorebag TBox
T and any fixed rooted CQ q,
the problem of checking whether
q⟨T,A⟩(a)≥k
is in LogSpace
for an input ABox A, a tuple of individuals a, and a k∈N0∞.
The claim follows from Theorem 16 and
Theorem 29, which in combination, allow us to decide whether
q⟨T,A⟩(a)≥k by computing the rewriting qˉ
of q in constant time and then deciding whether
qˉC(⟨∅,A⟩)(a)≥k holds.
By Proposition 22, the latter problem is AC^0
reducible to BALG^1_ε, which is known to be strictly included in
LogSpace,
hence, the claim follows immediately.
Computation of qˉ can be done in constant time because of the following:
(i) the number of all possible subsets z of y
participating in the realisability check by T is constant;
(ii) computing all ma-connected subsets of z can be done in
constant time;
(iii) checking whether each ma-connected subset of z is
realisable by T can be done in constant time by employing
Theorem 18 for bounding the depth of C(⟨T,A′⟩) to a
constant number (in particular, to the number of atoms of q); and
(iv) constructing the rewriting of the original query for each
realisable ma-connected subset of z can be done in constant time.
∎
See 32
Proof.
We prove that there exists a DL-LiteRbag TBox T and a rooted CQ q
such that checking whether q⟨T,A⟩(⟨⟩)≥k
for an
input bag
ABox A and k∈N0∞ is coNP-hard.
To prove this claim, we give a reduction similar to the one in the proof of
Theorem 13.
We show that if G=⟨V,E⟩ is an undirected and connected graph
with no self-loops, then
G is not 3-colourable if and only if
q⟨T,AG⟩(r)≥3×∣V∣+2
where
T is the TBox
{Vertex⊑∃hasColour,hasColour⊑Assign},
AG is an ABox constructed based on G, and q(w) is the rooted query
[TABLE]
Let I⊇V∪{a,r,g,b}. ABox AG is defined so that it
contains the following assertions:
Vertex(u) for each u∈V,
Edge(u,v), Edge(v,u) for each (u,v)∈E,
Assign(u,r), Assign(u,g), Assign(u,b) for each u∈V,
Vertex(a), Edge(a,a), hasColour(a,r), Assign(a,r) for an
auxiliary vertex a∉V,
Reachable(a,a), Reachable(a,u) and Reachable(u,a) for every u∈V,
Reachable(u,v) and Reachable(v,u) for every u,v∈V with u≠v.
Role hasColour plays the role of a colour assignment to the vertices of G;
this is also imposed by axiom Vertex⊑∃hasColour.
Role Assign provides a pre-defined list of colours for every vertex of G
that favours 3-colour assignments based on the colours r, g, and b. Any
proper assignment of G uses each of the colours at most ∣V∣ times.
However, if any assignment is not proper and exhausts the number of available
colours (i.e., by assigning multiple colours to the same vertex) or uses an
additional colour, these will have to be added to role Assign due to the
axiom
hasColour⊑Assign.
Role Reachable plays the role of an accessibility relation of an individual
from any other individual. This property is used for counting the total number
of available colours among all vertices.
We next show that
G is not 3-colourable if and only if
q⟨T,AG⟩(r)≥3×∣V∣+2.
“⇒”
Let G be non-3-colourable.
Consider a model I of ⟨T,AG⟩ (which exists since
⟨T,AG⟩ is satisfiable) such that, if
γ:V→{r,g,b}
is an assignment of colours to the vertices of G, then, for u≠a,
hasColourI((uI,cI))=1 if and only if γ(u)=c with c∈{r,g,b}.
Since G is not 3-colourable, for all assignments γ there exists
at least one edge (u,v)∈E with γ(u)=γ(v)=c. Without
loss of generality assume that c=r.
Consequently, for all models I, hasColourI contains tuples
(uI,cI) and (vI,cI), and hence, subquery
[TABLE]
matches at least two times, each one contributing multiplicity equal to 1;
one match corresponds to valuation
ν1={x/uI,y/vI,z/cI,w/rI}
and one to
ν2={x/aI,y/aI,z/rI,w/rI}
(note that we are considering only valuations ν with ν(w)=rI).
Extending the above query to q(w), we observe that ν1 can be extended
with variables k and l in 3×∣V∣−2 ways. To see this, observe that
every node in V is related to ∣V∣ other nodes in ReachableI of
which ∣V∣−1 are related to at least 3 colours in AssignI while the
other one, namely aI, is related to at least 1.
Similarly, ν2 can be extended with variables k and l in 3×∣V∣+1 ways.
Therefore, q has at least 6×∣V∣−1 matches for every model I
following a proper 3-colour assignment, and hence, since 6×∣V∣−1≥3×∣V∣+2 whenever ∣V∣≥1, 3×∣V∣+2 is a
certain multiplicity for r, as required.
Clearly, the same statement holds for all of the
models that add additional elements in Vertex, Edge, or assign multiple
colours to some vertices exceeding the number of available colours.
What is left to consider is those models that assign additional colours to
vertices and not just one among r, g, and b. For such colour assignments,
G might turn out to be colourable. Suppose G is 4-colourable (if it is not,
then the above discussion carries over) and let p∈I. Then, there
exists a model that follows a 4-colour assignment
γ:V→{r,g,b,p} such that
γ(u)≠γ(v) for every
(u,v)∈E.
Therefore, for that model we would get just one match for subquery q1(x,y,z,w)
corresponding to valuation
ν2. On the other hand, given the observations above, that model would have
associated at least one vertex to colour p in the extension of
hasColourI, and hence in AssignI, effectively
increasing by one the number of colours to which that vertex is associated.
Therefore, extending the above subquery to q, we observe that ν2 can be
extended with variables k and l in at least 3×∣V∣+2 ways.
Clearly, the same holds for models that make use of further colours.
Therefore, q⟨T,AG⟩(r)≥3×∣V∣+2.
“⇐”
Let G be 3-colourable. It suffices to show that
there exists a model I for which qI(r)=m with
m<3×∣V∣+2.
Since G is 3-colourable, there is an assignment
γ:V→{r,g,b}
such that, for every (u,v)∈E, γ(u)≠γ(v).
Consider an interpretation Iγ defined as follows:
[TABLE]
Interpretation Iγ is defined based on the contents of V, E, and
the 3-colour assignment γ.
It is easy to verify that Iγ is a model of ⟨T,AG⟩.
Next, we show that qIγ(r)=3×∣V∣+1.
First, we observe that subquery q1(x,y,z,w)
matches exactly once, i.e., under valuation
ν={x/da,y/da,z/dr,w/dr}.
This holds because γ is a proper
3-colouring of G and, for every (u,v)∈E,
γ(u)≠γ(v).
Note also that extending the above subquery to q(w), valuation ν can be
extended with variables k and l in 3×∣V∣+1 ways. Consequently,
qIγ(r)=3×∣V∣+1.
∎
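The counts 3×∣V∣−2 and 3×∣V∣+1 used in the proof can be checked on the Reachable and Assign assertions of AG alone. The sketch below assumes that the extension of a match with variables k and l amounts to following one Reachable edge and then one Assign edge; this join shape is inferred from the counting argument, since the query itself is not reproduced above. The function name and the vertex names are hypothetical.

```python
def extension_counts(vertices, aux="a", colours=("r", "g", "b")):
    """Count, per start node, the pairs (k, l) with Reachable(start, k) and Assign(k, l)."""
    assert aux not in vertices, "the auxiliary vertex must be fresh"
    # Reachable assertions of AG
    reachable = {(u, v) for u in vertices for v in vertices if u != v}
    reachable |= {(aux, aux)}
    reachable |= {(aux, u) for u in vertices} | {(u, aux) for u in vertices}
    # Assign assertions of AG: three colours per vertex, only r for the auxiliary vertex
    assign = {(u, c) for u in vertices for c in colours} | {(aux, colours[0])}

    def ways(start):
        return sum(1
                   for (s, k) in reachable if s == start
                   for (k2, _) in assign if k2 == k)

    return {x: ways(x) for x in list(vertices) + [aux]}


counts = extension_counts(["v1", "v2", "v3", "v4"])
```

For four vertices, every vertex of V yields 3×4−2 = 10 extensions and the auxiliary vertex yields 3×4+1 = 13, so one match of each kind gives 6×4−1 = 23 in total. Models may add further Assign tuples, which is why the proof speaks of lower bounds.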
Remark 2.
When the UNA is dropped, we can use an argument similar to the one given in
Remark 1 to reduce the problem of non-3-colourability
of undirected graphs to that of query answering over DL-LiteRbag ontologies.