The Bag Semantics of Ontology-Based Data Access

Charalampos Nikolaou, Egor V. Kostylev, George Konstantinidis, Mark Kaminski, Bernardo Cuenca Grau, Ian Horrocks

arXiv:1705.07105·cs.AI·May 22, 2017

TL;DR

This paper introduces a bag semantics for ontology-based data access (OBDA) that preserves duplicates in query views, aligning OBDA more closely with traditional database semantics, but increases computational complexity.

Contribution

It proposes a bag semantics model for OBDA, analyzes its computational complexity, and identifies conditions for query rewriting to maintain tractability.

Findings

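1. Bag semantics makes conjunctive query answering coNP-hard in data complexity.

2. Rooted conjunctive queries over $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontologies can be rewritten to a generalisation of the relational calculus to bags.

3. The bag semantics provides a foundation for database-style aggregate queries in OBDA.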

Abstract

Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign views over the data to ontology predicates. Motivated by the need for OBDA systems supporting database-style aggregate queries, we propose a bag semantics for OBDA, where duplicate tuples in the views defined by the mappings are retained, as is the case in standard databases. We show that bag semantics makes conjunctive query answering in OBDA coNP-hard in data complexity. To regain tractability, we consider a rather general class of queries and show its rewritability to a generalisation of the relational calculus to bags.

Taxonomy

Topics: Semantic Web and Ontologies · Service-Oriented Architecture and Web Services · Advanced Database Systems and Queries

Full text

The Bag Semantics of Ontology-Based Data Access

This work was supported by the Royal Society under a University Research Fellowship, the EPSRC projects ED3 and DBOnto, and the Research Council of Norway via the Sirius SFI.

Charalampos Nikolaou

Egor V. Kostylev

George Konstantinidis

Mark Kaminski

Bernardo Cuenca Grau

Ian Horrocks

Department of Computer Science, University of Oxford, UK

1 Introduction

Ontology-based data access (OBDA) is an increasingly popular approach to enable uniform access to multiple data sources with diverging schemas Poggi et al. (2008).

In OBDA, an ontology provides a unifying conceptual model for the data sources together with domain knowledge. The ontology is linked to each source by global-as-view (GAV) mappings Lenzerini (2002), which assign views over the data to ontology predicates. Users access the data by means of queries formulated using the vocabulary of the ontology; query answering amounts to computing the certain answers to the query over the union of ontology and the materialisation of the views defined by the mappings. The formalism of choice for representing ontologies in OBDA is the description logic $\textit{DL-Lite}_{\cal R}$ Calvanese et al. (2007), which underpins OWL 2 QL Motik et al. (2012). $\textit{DL-Lite}_{\cal R}$ was designed to ensure that queries against the ontology are first-order rewritable; that is, they can be reformulated as a set of relational queries over the sources Calvanese et al. (2007).

Example 1.

A company stores data about departments and their employees in several databases. The sales department uses the schema $\mathsf{SalEmployee}(\mathsf{id},\mathsf{name},\mathsf{salary},\mathsf{loc},\mathsf{mngr})$ , where attributes $\mathsf{id}$ , $\mathsf{name}$ , $\mathsf{salary}$ , $\mathsf{loc}$ , and $\mathsf{mngr}$ stand for employee ID within the department, their name, salary, location, and name of their manager. In turn, the IT department stores data using the schema $\mathsf{ITEmployee}(\mathsf{id},\mathsf{surname},\mathsf{salary},\mathsf{city})$ , where managers are not specified. To integrate employee data, the company relies on an ontology with TBox ${\cal T}_{{\textit{ex}}}$ , which defines unary predicates such as $\mathsf{SalEmp}$ , $\mathsf{ITEmp}$ , and $\mathsf{Mngr}$ , and binary predicates such as $\mathsf{hasMngr}$ relating employees to their managers. The following mappings determine the extension of the predicates based on the data, where each $\mathit{att}_{i}$ represents the attributes occurring only in the source:

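$$\begin{array}{lcl}
\mathsf{SalEmployee}(\mathsf{name},\mathit{att}_{1}) & \rightarrow & \mathsf{SalEmp}(\mathsf{name}),\\
\mathsf{SalEmployee}(\mathsf{name},\mathsf{mngr},\mathit{att}_{2}) & \rightarrow & \mathsf{hasMngr}(\mathsf{name},\mathsf{mngr}),\\
\mathsf{SalEmployee}(\mathsf{mngr},\mathit{att}_{3}) & \rightarrow & \mathsf{Mngr}(\mathsf{mngr}),\\
\mathsf{ITEmployee}(\mathsf{surname},\mathit{att}_{4}) & \rightarrow & \mathsf{ITEmp}(\mathsf{surname}).
\end{array}$$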

TBox ${\cal T}_{{\textit{ex}}}$ specifies the meaning of its vocabulary using the following inclusions:

(i) $\mathsf{SalEmp}\sqsubseteq\mathsf{Emp}$ and $\mathsf{ITEmp}\sqsubseteq\mathsf{Emp}$ , which say that both sales and IT employees are company employees;

(ii) $\exists\mathsf{hasMngr}^{-}\sqsubseteq\mathsf{Mngr}$ , specifying the range of the $\mathsf{hasMngr}$ relation, and

(iii) $\mathsf{Emp}\sqsubseteq\exists\mathsf{hasMngr}$ , requiring that employees have a (maybe unspecified) manager.

Such inclusions influence query answering: when asking for the names of all company employees, the system will retrieve all relevant sales and IT employees; this is achieved via query rewriting, where the query is reformulated as the union of queries over the sales and IT databases. $\lozenge$

OBDA has received a great deal of attention in recent years. Researchers have studied the limits of first-order rewritability in ontology languages Calvanese et al. (2007); Artale et al. (2009), established bounds on the size of rewritings Gottlob et al. (2014); Kikot et al. (2014), developed optimisation techniques Kontchakov et al. (2014), and implemented systems well-suited for real-world applications Calvanese et al. (2017, 2011).

An important observation about the conventional semantics of OBDA is that it is set-based: the materialisation of the views defined by the mappings is formalised as a virtual ABox consisting of a set of facts over the ontology predicates. This treatment is, however, in contrast with the semantics of database views, which is based on bags (multisets) and where duplicate tuples are retained by default. The distinction between set and bag semantics in databases is very significant in practice; in particular, it influences the evaluation of aggregate queries, which combine various aggregation functions such as $\mathsf{Min}$ , $\mathsf{Max}$ , $\mathsf{Sum}$ , $\mathsf{Count}$ or $\mathsf{Avg}$ with the grouping functionality provided in SQL by the $\mathsf{GroupBy}$ construct.

Example 2.

Consider the query asking for the number of employees named Lee. Assume there are two different employees named Lee, which are represented as different tuples in the sales database (e.g., tuples with the same employee name, but different ID). Under the conventional semantics of OBDA, the virtual ABox would contain a single fact $\mathsf{SalEmp}(\mathit{Lee})$ ; hence, the query would wrongly return one, even under the semantics for counting aggregate queries in Calvanese et al. (2008); Kostylev and Reutter (2015). The correct count can be obtained by considering the extension of $\mathsf{SalEmp}$ as a bag with multiple occurrences of $\mathit{Lee}$ . $\lozenge$
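The counting discrepancy in Example 2 can be reproduced with a small, purely illustrative script. It is a sketch under assumptions: the table contents (IDs, salaries, locations) are hypothetical, and the two subqueries merely stand in for a set-based and a bag-based materialisation of the view for $\mathsf{SalEmp}$; they are not the mapping machinery of any particular OBDA system.

```python
import sqlite3

# Hypothetical source data: two distinct sales employees who share the name Lee.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalEmployee "
    "(id INTEGER, name TEXT, salary INTEGER, loc TEXT, mngr TEXT)"
)
conn.executemany(
    "INSERT INTO SalEmployee VALUES (?, ?, ?, ?, ?)",
    [(1, "Lee", 50000, "Oxford", "Hill"), (2, "Lee", 52000, "London", "Hill")],
)

# Set-based view: duplicates produced by projecting onto name are dropped.
set_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT name FROM SalEmployee) "
    "WHERE name = 'Lee'"
).fetchone()[0]

# Bag-based view: the projection keeps one copy of Lee per source tuple.
bag_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT name FROM SalEmployee) "
    "WHERE name = 'Lee'"
).fetchone()[0]

print(set_count, bag_count)  # 1 2
```

Only the bag-based view yields the count of two that the example expects.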

The goal of this paper is to propose and study a bag semantics for OBDA which is compatible with the semantics of standard databases and can provide a suitable foundation for the future study of aggregate queries. We focus on conjunctive query (CQ) answering over $\textit{DL-Lite}_{\cal R}$ ontologies under bag semantics, and our main contributions are as follows.

1. We propose the ontology language $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ and its restriction $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ , where ABoxes consist of a bag of facts, thus providing a faithful representation of the views defined by OBDA mappings. We define the semantics of query answering in this setting and show that it is compatible with the conventional set-based semantics.

2. We show that, in contrast to the set case, ontologies may not have a universal model (i.e., a single model over which all CQs can be correctly evaluated), and bag query answering becomes coNP-hard in data complexity even if we restrict ourselves to $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontologies.

3. To regain tractability, we study the class of rooted CQs Bienvenu et al. (2012), where each connected component of the query graph is required to contain an individual or an answer variable. This is a very general class, which arguably captures most practical OBDA queries. We show that rooted CQs over $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontologies not only admit a universal model and enjoy favourable computational properties, but also allow for rewritings that can be directly evaluated over the bag ABox of the ontology.

Proofs of all results are deferred to the appendix.

2 Preliminaries

**Syntax of Ontologies.** We fix a vocabulary consisting of countably infinite and pairwise disjoint sets of individuals ${\mathbf{I}}$ (i.e., constants), variables ${\mathbf{X}}$ , atomic concepts ${\mathbf{C}}$ (unary predicates) and atomic roles ${\mathbf{R}}$ (binary predicates). A role is an atomic role $P\in{\mathbf{R}}$ or its inverse $P^{-}$ . A concept is an atomic concept in ${\mathbf{C}}$ or an expression $\exists R$ , where $R$ is a role. An inclusion is an expression of the form $S_{1}\sqsubseteq S_{2}$ with $S_{1}$ and $S_{2}$ either both concepts or both roles. A disjointness axiom is an expression of the form $\mathsf{Disj}(S_{1},S_{2})$ with $S_{1}$ and $S_{2}$ either both concepts or both roles. A concept assertion is of the form $A(a)$ with $a\in{\mathbf{I}}$ and $A\in{\mathbf{C}}$ . A role assertion is of the form $P(a,b)$ with $a,b\in{\mathbf{I}}$ and $P\in{\mathbf{R}}$ . A $\textit{DL-Lite}_{\cal R}$ TBox is a finite set of inclusions and disjointness axioms. An ABox is a finite set of concept and role assertions. A $\textit{DL-Lite}_{\cal R}$ ontology is a pair $\langle{\cal T},{\cal A}\rangle$ with ${\cal T}$ a $\textit{DL-Lite}_{\cal R}$ TBox and ${\cal A}$ an ABox. The ontology language $\textit{DL-Lite}_{\textit{core}}$ restricts $\textit{DL-Lite}_{\cal R}$ by disallowing inclusions and disjointness axioms for roles.

**Semantics of Ontologies.** An interpretation ${\cal I}$ is a pair $\langle\Delta^{{\cal I}},\cdot^{{\cal I}}\rangle$ , where the domain $\Delta^{{\cal I}}$ is a non-empty set, and the interpretation function $\cdot^{{\cal I}}$ maps each $a\in\mathbf{I}$ to $a^{{\cal I}}\in\Delta^{{\cal I}}$ such that $a^{\cal I}\neq b^{\cal I}$ for all distinct $a,b\in{\mathbf{I}}$ , each $A\in{\mathbf{C}}$ to a subset $A^{{\cal I}}$ of $\Delta^{{\cal I}}$ and each $P\in{\mathbf{R}}$ to a subset $P^{{\cal I}}$ of $\Delta^{{\cal I}}\times\Delta^{{\cal I}}$ . (We adopt the unique name assumption for convenience; dropping it does not affect the results, modulo minor changes to the definitions.) The interpretation function extends to concepts and roles as follows: $(R^{-})^{{\cal I}}=\{(u,v)\mid(v,u)\in R^{{\cal I}}\}$ and $(\exists R)^{{\cal I}}=\{u\in\Delta^{{\cal I}}\mid(u,v)\in R^{{\cal I}}\text{ for some }v\in\Delta^{{\cal I}}\}$ .

An interpretation ${\cal I}$ satisfies ABox ${\cal A}$ if $a^{{\cal I}}\in A^{{\cal I}}$ for all $A(a)\in{\cal A}$ and $(a^{{\cal I}},b^{{\cal I}})\in P^{{\cal I}}$ for all $P(a,b)\in{\cal A}$ ; ${\cal I}$ satisfies TBox ${\cal T}$ if $S_{1}^{{\cal I}}\subseteq S_{2}^{{\cal I}}$ for all $S_{1}\sqsubseteq S_{2}$ in ${\cal T}$ and $S_{1}^{{\cal I}}\cap S_{2}^{{\cal I}}=\emptyset$ for all $\mathsf{Disj}(S_{1},S_{2})$ in ${\cal T}$ ; ${\cal I}$ is a model of ontology $\langle{\cal T},{\cal A}\rangle$ if it satisfies ${\cal T}$ and ${\cal A}$ . An ontology is satisfiable if it has a model.

**Queries.** A conjunctive query (CQ) $q(\mathbf{x})$ with answer variables $\mathbf{x}$ is a formula $\exists\mathbf{y}.\,\phi(\mathbf{x},\mathbf{y})$ , where $\mathbf{x}$ , $\mathbf{y}$ are (possibly empty) repetition-free tuples of variables and $\phi(\mathbf{x},\mathbf{y})$ is a conjunction of atoms of the form $A(t)$ , $P(t_{1},t_{2})$ or $z=t$ , where $A\in{\mathbf{C}}$ , $P\in{\mathbf{R}}$ , $z\in\mathbf{x}\cup\mathbf{y}$ , and $t,t_{1},t_{2}\in\mathbf{x}\cup\mathbf{y}\cup{\mathbf{I}}$ . If $\mathbf{x}$ is inessential, then we write $q$ instead of $q(\mathbf{x})$ . If $\mathbf{x}$ is the empty tuple $\langle\rangle$ , then $q$ is Boolean. A union of CQs (UCQ) is a disjunction of CQs with the same answer variables.

The equality atoms in a CQ $q(\mathbf{x})=\exists\mathbf{y}.\,\phi(\mathbf{x},\mathbf{y})$ yield an equivalence relation $\sim$ on terms $\mathbf{x}\cup\mathbf{y}\cup{\mathbf{I}}$ , and we write $\tilde{t}$ for the equivalence class of a term $t$ . The Gaifman graph of $q(\mathbf{x})$ has a node $\tilde{t}$ for each $t\in\mathbf{x}\cup\mathbf{y}\cup{\mathbf{I}}$ in $\phi$ , and an edge $\{\tilde{t}_{1},\tilde{t}_{2}\}$ for each atom in $\phi$ over $t_{1}$ and $t_{2}$ . We assume that all CQs are safe: for each $z\in\mathbf{x}\cup\mathbf{y}$ , the class $\tilde{z}$ contains a term mentioned in an atom of $\phi(\mathbf{x},\mathbf{y})$ that is not an equality.
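The Gaifman graph also underlies the rooted CQs mentioned in the introduction, where every connected component must contain an individual or an answer variable. The following is a minimal sketch of such a check, not an algorithm from the paper: the function name `is_rooted` and the encoding of atoms as `(predicate, terms)` pairs are assumptions made for illustration, and a union-find structure stands in for the explicit graph, since only connected components matter.

```python
def is_rooted(answer_vars, individuals, atoms, equalities):
    """Check rootedness of a CQ (hypothetical encoding).

    atoms:      list of (predicate, terms) pairs for non-equality atoms
    equalities: list of (z, t) pairs, one per equality atom z = t
    """
    parent = {}

    def find(t):
        # Union-find lookup with path halving; registers unseen terms.
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    def union(s, t):
        parent[find(s)] = find(t)

    # Equalities merge terms into a single node; the other atoms add edges.
    # Both are treated as merges because only connectivity is needed.
    for z, t in equalities:
        union(z, t)
    for _pred, terms in atoms:
        find(terms[0])
        for s, t in zip(terms, terms[1:]):
            union(s, t)

    terms_seen = list(parent)
    rooted_components = {
        find(t) for t in terms_seen if t in answer_vars or t in individuals
    }
    return all(find(t) in rooted_components for t in terms_seen)


# q(x) = exists y. hasMngr(x, y): the single component contains answer variable x.
rooted_example = is_rooted({"x"}, set(), [("hasMngr", ("x", "y"))], [])

# q() = exists x, y. hasMngr(x, y): Boolean, no individual, hence not rooted.
unrooted_example = is_rooted(set(), set(), [("hasMngr", ("x", "y"))], [])

print(rooted_example, unrooted_example)  # True False
```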

The certain answers $q^{{\cal K}}$ to a (U)CQ $q(\mathbf{x})$ over a $\textit{DL-Lite}_{\cal R}$ ontology ${\cal K}$ are the set of all tuples $\mathbf{a}$ of individuals such that $q(\mathbf{a})$ holds in every model of ${\cal K}$ . A class of queries $\mathcal{Q}_{1}$ is rewritable to a class $\mathcal{Q}_{2}$ for an ontology language $\mathcal{O}$ if for any $q_{1}\in\mathcal{Q}_{1}$ and TBox ${\cal T}$ in $\mathcal{O}$ , there is $q_{2}\in\mathcal{Q}_{2}$ such that, for any ABox ${\cal A}$ in $\mathcal{O}$ with $\langle{\cal T},{\cal A}\rangle$ satisfiable, $q_{1}^{\langle{\cal T},{\cal A}\rangle}$ equals the answers to $q_{2}$ in (the least model of) ${\cal A}$ . Checking $\mathbf{a}\in q^{\langle{\cal T},{\cal A}\rangle}$ for a tuple $\mathbf{a}$ , (U)CQ $q$ , and $\textit{DL-Lite}_{\cal R}$ ontology $\langle{\cal T},{\cal A}\rangle$ is an NP-complete problem with $\textsc{AC}^{0}$ data complexity (i.e., when ${\cal T}$ and $q$ are fixed) Calvanese et al. (2007). The latter follows from the rewritability of UCQs to themselves for $\textit{DL-Lite}_{\cal R}$ .

**Bags.** A bag over a set $M$ is a function $\Omega:M\to\mathbb{N}^{\infty}_{0}$ , where $\mathbb{N}^{\infty}_{0}$ is the set of nonnegative integers extended with infinity. The value $\Omega(c)$ is the multiplicity of $c$ in $\Omega$ . A bag $\Omega$ is finite if there are finitely many $c\in M$ with $\Omega(c)>0$ and there is no $c$ with $\Omega(c)=\infty$ . The empty bag $\emptyset$ over $M$ is the bag such that $\emptyset(c)=0$ for all $c\in M$ . Given bags $\Omega_{1}$ and $\Omega_{2}$ over $M$ , let $\Omega_{1}\subseteq\Omega_{2}$ if $\Omega_{1}(c)\leq\Omega_{2}(c)$ for each $c\in M$ .

The intersection $\cap$ , max union $\cup$ , arithmetic union $\uplus$ , and difference $-$ are the binary operations defined for bags $\Omega_{1}$ and $\Omega_{2}$ over the same set $M$ as follows: for every $c\in M$ , $(\Omega_{1}\cap\Omega_{2})(c)=\min\{\Omega_{1}(c),\Omega_{2}(c)\}$ , $(\Omega_{1}\cup\Omega_{2})(c)=\max\{\Omega_{1}(c),\Omega_{2}(c)\}$ , $(\Omega_{1}\uplus\Omega_{2})(c)=\Omega_{1}(c)+\Omega_{2}(c)$ , and $(\Omega_{1}-\Omega_{2})(c)=\max\{0,\Omega_{1}(c)-\Omega_{2}(c)\}$ ; difference is well-defined only when $\Omega_{2}$ is finite.
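For finite bags, these operations coincide with the multiset operators of Python's `collections.Counter`, which gives a quick way to experiment with them. The sketch below is illustrative only: the bags `omega1` and `omega2` are made-up examples, `is_subbag` is a hypothetical helper for $\subseteq$, and infinite multiplicities are not modelled.

```python
from collections import Counter

# Two finite bags over M = {a, b, c}; absent elements have multiplicity 0.
omega1 = Counter({"a": 3, "b": 1})
omega2 = Counter({"a": 2, "c": 4})

intersection = omega1 & omega2      # min of the two multiplicities
max_union = omega1 | omega2         # max of the two multiplicities
arithmetic_union = omega1 + omega2  # sum of the two multiplicities
difference = omega1 - omega2        # max(0, m1 - m2)


def is_subbag(o1, o2):
    """Bag inclusion: every multiplicity in o1 is bounded by that in o2."""
    return all(m <= o2[c] for c, m in o1.items())


print(intersection["a"], max_union["a"], arithmetic_union["a"], difference["a"])
# 2 3 5 1
```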

3 $\textit{DL-Lite}_{\cal R}$ with Bag Semantics

In this section we present a bag semantics for $\textit{DL-Lite}_{\cal R}$ ontologies, define the associated query answering problem, and establish its intractability in data complexity.

We formalise ABoxes as bags of facts (rather than sets) in order to faithfully represent the materialised views over source data defined by OBDA mappings.

Definition 3.

A bag ABox is a finite bag over the set of concept and role assertions. A $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ ontology is a pair $\langle{\cal T},{\cal A}\rangle$ of a $\textit{DL-Lite}_{\cal R}$ TBox ${\cal T}$ and a bag ABox ${\cal A}$ ; the ontology is $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ if ${\cal T}$ is a $\textit{DL-Lite}_{\textit{core}}$ TBox.

The semantics of $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ is based on bag interpretations ${\cal I}$ , with atomic concepts and roles mapped to bags of domain elements and pairs of elements, respectively, and where the interpretation function is extended to complex concepts and roles in the natural way; in particular, a concept $\exists P$ is interpreted as the bag projection of $P^{{\cal I}}$ to the first component, where each occurrence of a pair $(u,v)$ in $P^{{\cal I}}$ contributes to the multiplicity of domain element $u$ in $(\exists P)^{{\cal I}}$ .

Definition 4.

A bag interpretation ${\cal I}$ is a pair $\langle\Delta^{{\cal I}},\cdot^{{\cal I}}\rangle$ defined the same as in the set case with the exception that $A^{{\cal I}}$ and $P^{{\cal I}}$ are bags (not sets) over $\Delta^{{\cal I}}$ and $\Delta^{{\cal I}}\times\Delta^{{\cal I}}$ , respectively. The interpretation function extends to concepts and roles as follows: $(P^{-})^{{\cal I}}$ maps each $(u,v)\in\Delta^{{\cal I}}\times\Delta^{{\cal I}}$ to $P^{{\cal I}}(v,u)$ , and $(\exists R)^{{\cal I}}$ maps each $u\in\Delta^{{\cal I}}$ to $\sum_{v\in\Delta^{{\cal I}}}R^{{\cal I}}(u,v)$ .
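The two extension rules can be sketched for finite bag interpretations as follows. The helper names `inverse` and `exists` and the sample multiplicities for $\mathsf{hasMngr}$ are assumptions chosen for illustration; roles are encoded as `Counter` objects over pairs.

```python
from collections import Counter


def inverse(role):
    """(P^-)^I maps each pair (u, v) to P^I(v, u)."""
    return Counter({(v, u): m for (u, v), m in role.items()})


def exists(role):
    """(exists R)^I maps each u to the sum over v of R^I(u, v)."""
    projected = Counter()
    for (u, _v), m in role.items():
        projected[u] += m
    return projected


# Illustrative role: Lee has manager Hill twice and an unnamed manager w once.
has_mngr = Counter({("Lee", "Hill"): 2, ("Lee", "w"): 1})

print(exists(has_mngr)["Lee"])            # 3
print(exists(inverse(has_mngr))["Hill"])  # 2
```

Each occurrence of a pair contributes to the projection, so Lee occurs three times in the bag for $\exists\mathsf{hasMngr}$ even though only two distinct managers are involved.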

The definition of semantics of ontologies is as expected.

Definition 5.

A bag interpretation ${\cal I}=\langle\Delta^{{\cal I}},\cdot^{{\cal I}}\rangle$ satisfies a bag ABox ${\cal A}$ if ${\cal A}(A(a))\leq A^{{\cal I}}(a^{\cal I})$ for each concept assertion $A(a)$ in ${\cal A}$ and ${\cal A}(P(a,b))\leq P^{{\cal I}}(a^{\cal I},b^{\cal I})$ for each role assertion $P(a,b)$ . Satisfaction of ${\cal T}$ is defined as in the set case, except that $\subseteq$ and $\cap$ are applied to bags instead of sets. Bag interpretation ${\cal I}$ is a bag model of the $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ ontology $\langle{\cal T},{\cal A}\rangle$ , written ${\cal I}\models^{\textnormal{{b}}}\langle{\cal T},{\cal A}\rangle$ , if it satisfies both ${\cal T}$ and ${\cal A}$ . The ontology is satisfiable if it has a bag model.

Example 6.

Let ${\cal K}_{{\textit{ex}}}=\langle{\cal T}_{{\textit{ex}}},{\cal A}_{{\textit{ex}}}\rangle$ be a $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ ontology where ${\cal T}_{{\textit{ex}}}$ is as in Example 1 and ${\cal A}_{{\textit{ex}}}$ contains $\mathsf{SalEmp}(\mathit{Lee})$ with multiplicity $3$ , and $\mathsf{ITEmp}(\mathit{Lee})$ and $\mathsf{hasMngr}(\mathit{Lee},\mathit{Hill})$ both with multiplicity $2$ (all other assertions have multiplicity $0$ ). Let ${\cal I}_{{\textit{ex}}}$ be the bag interpretation mapping individuals to themselves and with the following non-zero values:

[TABLE]

where $w$ is a fresh element. We can check that ${\cal I}_{{\textit{ex}}}\models^{\textnormal{{b}}}{\cal K}_{{\textit{ex}}}$ . $\lozenge$

We now define the notion of query answering under bag semantics. We first define the answers $q^{{\cal I}}$ of a CQ $q(\mathbf{x})$ over a bag interpretation ${\cal I}$ . Intuitively, $q^{{\cal I}}$ is a bag of tuples of individuals such that each valid embedding $\lambda$ of the body of $q$ into ${\cal I}$ contributes separately to the multiplicity of the tuple $\lambda(\mathbf{x})$ in $q^{{\cal I}}$ ; in turn, the contribution of each specific $\lambda$ is the product of the multiplicities of the images of the query atoms under $\lambda$ . The latter is in accordance with the interpretation of joins in the bag relational algebra and SQL, where the multiplicity of a tuple in a join is the product of the multiplicities of the joined tuples (e.g., see García-Molina et al. (2009)).

Definition 7.

Let $q(\mathbf{x})=\exists\mathbf{y}.\,\phi(\mathbf{x},\mathbf{y})$ be a CQ. The bag answers $q^{{\cal I}}$ to $q$ over a bag interpretation ${\cal I}=\langle\Delta^{{\cal I}},\cdot^{{\cal I}}\rangle$ are defined as the bag over tuples of individuals from ${\mathbf{I}}$ of the same size as $\mathbf{x}$ such that, for every such tuple $\mathbf{a}$ ,

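$$q^{{\cal I}}(\mathbf{a})=\sum_{\lambda\in\Lambda}\ \prod_{S(\mathbf{t})\text{ in }\phi(\mathbf{x},\mathbf{y})}S^{{\cal I}}(\lambda(\mathbf{t})),$$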

where $\Lambda$ is the set of all valuations $\lambda:\mathbf{x}\cup\mathbf{y}\cup{\mathbf{I}}\to\Delta^{\cal I}$ such that $\lambda(\mathbf{x})=\mathbf{a}^{\cal I}$ , $\lambda(a)=a^{\cal I}$ for each $a\in{\mathbf{I}}$ , and $\lambda(z)=\lambda(t)$ for each $z=t$ in $\phi(\mathbf{x},\mathbf{y})$ .

If $q$ is Boolean then $q^{{\cal I}}$ are defined only for the empty tuple $\langle\rangle$ . Also, conjunction $\phi(\mathbf{x},\mathbf{y})$ may contain repeated atoms, and hence can be seen as a bag of atoms; while repeated atoms are redundant in the set case, they are essential in the bag setting Chaudhuri and Vardi (1993) and thus the definition of $q^{{\cal I}}(\mathbf{a})$ treats each copy of a query atom $S(\mathbf{t})$ separately.
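As a rough illustration of how each valuation contributes a product of atom multiplicities, the following Python sketch evaluates a CQ over a finite bag interpretation by brute force. The function `bag_answer` and its encoding are assumptions made for this example: atoms are `(predicate, terms)` pairs, individuals are interpreted by themselves, variable names are assumed not to clash with individual names, and equality atoms are omitted for brevity.

```python
from itertools import product
from math import prod

def bag_answer(atoms, answer_vars, exist_vars, interp, domain, a):
    """Multiplicity of tuple a: sum over valuations of the product of
    the multiplicities of the images of the query atoms."""
    total = 0
    for values in product(domain, repeat=len(exist_vars)):
        lam = dict(zip(answer_vars, a))
        lam.update(zip(exist_vars, values))
        # Terms that are not variables are individuals and map to themselves.
        total += prod(
            interp.get(pred, {}).get(tuple(lam.get(t, t) for t in terms), 0)
            for pred, terms in atoms
        )
    return total

# Hypothetical finite bag interpretation.
domain = ["Lee", "Hill", "w"]
interp = {"hasMngr": {("Lee", "Hill"): 2, ("Lee", "w"): 1}}

# q(x) = ∃y. hasMngr(x, y)
single = bag_answer([("hasMngr", ("x", "y"))], ["x"], ["y"],
                    interp, domain, ("Lee",))

# The same query with the atom repeated: each copy is counted separately.
doubled = bag_answer([("hasMngr", ("x", "y"))] * 2, ["x"], ["y"],
                     interp, domain, ("Lee",))
```

With this interpretation, `single` sums $2+1$ while `doubled` sums $2\cdot 2+1\cdot 1$, which shows why repeated atoms are not redundant under bag semantics.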

The following definition of certain answers, capturing open-world query answering, is a reformulation of the definition in Kostylev and Reutter (2015) for counting queries. It is a natural extension of the set notion to bags: a query answer is certain for a given multiplicity if it occurs with at least that multiplicity in every bag model of the ontology.

Definition 8.

The bag certain answers $q^{{\cal K}}$ to a query $q$ over a $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ ontology ${\cal K}$ are the bag $\bigcap\nolimits_{{\cal I}\models^{\textnormal{{b}}}{\cal K}}q^{{\cal I}}$ .
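Bag intersection takes the minimum multiplicity of each tuple. The sketch below illustrates this on a finite sample of answer bags; the bags in `sample_answers` are hypothetical and do not come from the paper. Since the definition ranges over all bag models, intersecting a finite sample can only give an upper bound on the certain multiplicities.

```python
from collections import Counter
from functools import reduce
from operator import and_

def bag_intersection(bags):
    """Pointwise minimum of multiplicities (Counter's & operator)."""
    return reduce(and_, bags)

# Hypothetical answer bags q^I for three models I of some ontology.
sample_answers = [
    Counter({("Lee",): 3}),
    Counter({("Lee",): 4}),
    Counter({("Lee",): 3, ("Hill",): 1}),
]

upper_bound = bag_intersection(sample_answers)
```

A tuple that is missing from even one of the sampled bags, such as `("Hill",)` here, drops out of the intersection altogether.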

We study the problem $\textsc{BagCert}[\mathcal{Q},\mathcal{O}]$ of checking, given a query $q$ from a class of CQs $\mathcal{Q}$ , ontology ${\cal K}=\langle{\cal T},{\cal A}\rangle$ from an ontology language $\mathcal{O}$ , tuple $\mathbf{a}$ over ${\mathbf{I}}$ , and number $k\in\mathbb{N}^{\infty}_{0}$ , whether $q^{{\cal K}}(\mathbf{a})\geq k$ ; data complexity of BagCert is studied under the assumption that ${\cal T}$ and $q$ are fixed. Following Grumbach and Milo (1996), we assume that the multiplicities of assertions in ${\cal A}$ and $k$ (if not infinity) are given in unary.

Example 9.

Let $q_{{\textit{ex}}}(x)=\exists y.\,\mathsf{hasMngr}(x,y)$ and ${\cal K}_{{\textit{ex}}}$ be as in Example 6. Then $q_{\textit{ex}}^{{\cal K}_{{\textit{ex}}}}(\mathit{Lee})=3$ . Indeed, on the one hand, $q_{\textit{ex}}^{{\cal I}_{\textit{ex}}}(\mathit{Lee})=3$ for ${\cal I}_{\textit{ex}}$ in Example 6. On the other, for any bag model ${\cal I}$ of ${\cal K}_{{\textit{ex}}}$ , $q_{\textit{ex}}^{\cal I}(\mathit{Lee})=\Sigma_{u\in\Delta^{\cal I}}\mathsf{hasMngr}^{{\cal I}}(\mathit{Lee}^{\cal I},u)\geq 3$ , because ${\cal A}_{{\textit{ex}}}(\mathsf{SalEmp}(\mathit{Lee}))=3$ and ${\cal T}_{{\textit{ex}}}$ contains inclusions $\mathsf{SalEmp}\sqsubseteq\mathsf{Emp}$ and $\mathsf{Emp}\sqsubseteq\exists\mathsf{hasMngr}$ . $\lozenge$

The bag semantics can be seen as a generalisation of the set semantics of DL-Lite: first, satisfiability under bag semantics reduces to the set case; second, certain answers under bag and set semantics coincide if multiplicities are ignored.

Proposition 10.

Let $\langle{\cal T},{\cal A}\rangle$ be a $\textit{DL-Lite}_{\cal R}$ ontology and $\langle{\cal T},{\cal A}^{\prime}\rangle$ be a $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ ontology with the same TBox such that $\{S(\mathbf{t})\mid{\cal A}^{\prime}(S(\mathbf{t}))\geq 1\}={\cal A}$ . Then, the following holds:

1. $\langle{\cal T},{\cal A}\rangle$ is satisfiable if and only if $\langle{\cal T},{\cal A}^{\prime}\rangle$ is satisfiable;

2. for each CQ $q$ and tuple $\mathbf{a}$ of individuals from ${\mathbf{I}}$ , $\mathbf{a}\in q^{\langle{\cal T},{\cal A}\rangle}$ if and only if $q^{\langle{\cal T},{\cal A}^{\prime}\rangle}(\mathbf{a})\geq 1$ .

An important property of satisfiable $\textit{DL-Lite}_{\cal R}$ ontologies ${\cal K}$ is the existence of so-called universal models for CQs, that is, models ${\cal I}$ such that the certain answers to every CQ $q$ over ${\cal K}$ can be obtained by evaluating $q$ over ${\cal I}$ Calvanese et al. (2007). This notion extends naturally to bags.

Definition 11.

A bag model ${\cal I}$ of a $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ ontology ${\cal K}$ is universal for a class of queries $\mathcal{Q}$ if $q^{{\cal K}}=q^{{\cal I}}$ for any $q\in\mathcal{Q}$ .

Unfortunately, in contrast to the set case, even $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontologies may not admit a universal bag model for all CQs.

Proposition 12.

There exists a satisfiable $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology that has no universal bag model for the class of all CQs.

The lack of a universal model suggests that CQ answering under bag semantics is harder than in the set case. Indeed, this problem is coNP-hard in data complexity, which is in stark contrast to the $\textsc{AC}^{0}$ upper bound in the set case.

Theorem 13.

$\textsc{BagCert}[\textup{CQs},\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}]$ is coNP-hard in data complexity.

4 Universal Models for Rooted Queries

Theorem 13 suggests that bag semantics is generally not well-suited for OBDA. Our approach to overcome this negative result is to consider a restricted class of CQs, introduced in the context of query optimisation in DLs Bienvenu et al. (2012), called rooted: in a rooted CQ, each existential variable is connected in the Gaifman graph to an individual or an answer variable. Rooted CQs capture most practical queries; for example, they include all connected non-Boolean CQs.

Definition 14.

A CQ $q(\mathbf{x})$ is rooted if each connected component of its Gaifman graph has a node with a term in $\mathbf{x}\cup{\mathbf{I}}$ .
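Rootedness is a purely syntactic condition and can be checked with a simple connectivity test. The following Python sketch is one possible way to do so, assuming that the nodes of the Gaifman graph are the terms of the query and that two terms are adjacent when they occur in the same atom; the function name `is_rooted` and the encoding of atoms as `(predicate, terms)` pairs are assumptions of this example.

```python
def is_rooted(atoms, answer_vars, individuals):
    """Check that every connected component of the Gaifman graph
    contains an answer variable or an individual."""
    parent = {}

    def find(t):
        # Union-find lookup with path halving.
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for _, terms in atoms:
        for t in terms:
            find(t)
        # Terms sharing an atom belong to the same component.
        for t in terms[1:]:
            parent[find(t)] = find(terms[0])

    roots = set(answer_vars) | set(individuals)
    rooted = {find(t) for t in list(parent) if t in roots}
    return all(find(t) in rooted for t in list(parent))

# q(x) = ∃y. hasMngr(x, y) is rooted; ∃y. Mngr(y) is not.
rooted_example = is_rooted([("hasMngr", ("x", "y"))], ["x"], [])
non_rooted_example = is_rooted([("Mngr", ("y",))], [], [])
```

Equality atoms, if present, could be passed in as additional binary atoms so that they also contribute edges.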

In contrast to arbitrary CQs, any satisfiable $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology admits a universal bag model for rooted CQs. Although we define such a model, called canonical, in a fully declarative way, it can be intuitively seen as the result of applying a variant of the restricted chase procedure Calì et al. (2013) extended to bags. Starting from the ABox, the procedure successively “repairs” violations of ${\cal T}$ by extending the interpretation of concepts and roles in a minimal way.

To formalise canonical models, we need two auxiliary notions. First, the concept closure $\mathsf{ccl}_{{\cal T}}[u,{\cal I}]$ of an element $u\in\Delta^{\cal I}$ in a bag interpretation ${\cal I}=\langle\Delta^{{\cal I}},\cdot^{{\cal I}}\rangle$ over a TBox ${\cal T}$ is the bag of concepts such that, for any concept $C$ , $\mathsf{ccl}_{{\cal T}}[u,{\cal I}](C)$ is the maximum value of $C_{0}^{\cal I}(u)$ amongst all concepts $C_{0}$ satisfying ${\cal T}\models C_{0}\sqsubseteq C$ . Second, the union ${\cal I}\cup{\cal J}$ of bag interpretations ${\cal I}=\langle\Delta^{{\cal I}},\cdot^{{\cal I}}\rangle$ and ${\cal J}=\langle\Delta^{{\cal J}},\cdot^{{\cal J}}\rangle$ with $a^{\cal I}=a^{\cal J}$ for all $a\in{\mathbf{I}}$ is the bag interpretation $\langle\Delta^{{\cal I}}\cup\Delta^{{\cal J}},\cdot^{{\cal I}\cup{\cal J}}\rangle$ with $a^{{\cal I}\cup{\cal J}}=a^{\cal I}$ for $a\in{\mathbf{I}}$ and $S^{{\cal I}\cup{\cal J}}=S^{\cal I}\cup S^{\cal J}$ for $S\in{\mathbf{C}}\cup{\mathbf{R}}$ .
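The concept closure can be pictured as a maximum over all entailed subsumees. The sketch below makes the simplifying assumption that ${\cal T}\models C_{0}\sqsubseteq C$ is decided by the reflexive and transitive closure of the inclusions listed in the TBox (which ignores, for instance, unsatisfiable concepts); concepts are encoded as plain strings, and all names are hypothetical.

```python
def subsumees(concept, inclusions):
    """All C0 with C0 ⊑ concept derivable by reflexivity and transitivity."""
    found, frontier = {concept}, [concept]
    while frontier:
        c = frontier.pop()
        for lhs, rhs in inclusions:
            if rhs == c and lhs not in found:
                found.add(lhs)
                frontier.append(lhs)
    return found

def concept_closure(concept, values_at_u, inclusions):
    """ccl_T[u, I](C): the maximum of C0^I(u) over all C0 with T |= C0 ⊑ C."""
    return max(values_at_u.get(c0, 0) for c0 in subsumees(concept, inclusions))

# Inclusions SalEmp ⊑ Emp and Emp ⊑ ∃hasMngr, as mentioned in Example 9.
inclusions = [("SalEmp", "Emp"), ("Emp", "exists_hasMngr")]

# Hypothetical multiplicities of concepts at a single element u.
values_at_u = {"SalEmp": 3, "exists_hasMngr": 2}

closure = concept_closure("exists_hasMngr", values_at_u, inclusions)
```

In this example the element already has two `hasMngr` successors, but the closure value is driven by the larger multiplicity of `SalEmp`.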

Definition 15.

The canonical bag model $\mathcal{C}({\cal K})$ of a $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology ${\cal K}=\langle{\cal T},{\cal A}\rangle$ is the bag interpretation $\bigcup_{i\geq 0}\mathcal{C}_{i}({\cal K})$ with the bag interpretations $\mathcal{C}_{i}({\cal K})=\langle\Delta^{\mathcal{C}_{i}({\cal K})},\cdot^{\mathcal{C}_{i}({\cal K})}\rangle$ defined as follows:

- $\Delta^{\mathcal{C}_{0}({\cal K})}={\mathbf{I}}$ , $a^{\mathcal{C}_{0}({\cal K})}=a$ for each $a\in{\mathbf{I}}$ , and $S^{\mathcal{C}_{0}({\cal K})}(\mathbf{a})={\cal A}(S(\mathbf{a}))$ for each $S\in{\mathbf{C}}\cup{\mathbf{R}}$ and individuals $\mathbf{a}$ ;

- for each $i>0$ , $\Delta^{\mathcal{C}_{i}({\cal K})}$ is

[TABLE]

where $w^{j}_{u,R}$ are fresh domain elements, called anonymous, $a^{\mathcal{C}_{i}({\cal K})}=a$ for all $a\in{\mathbf{I}}$ , and, for all $A\in{\mathbf{C}}$ , $P\in{\mathbf{R}}$ , and elements $u$ , $v$ ,

[TABLE]

It is easily seen that $\mathcal{C}({\cal K})$ satisfies ${\cal K}$ whenever ${\cal K}$ is satisfiable. We next show that it is universal for rooted CQs.

Theorem 16.

The canonical bag model $\mathcal{C}({\cal K})$ of a satisfiable $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology ${\cal K}$ is universal for rooted CQs.

Example 17.

Consider an ontology ${\cal K}_{\textit{r}}=\langle{\cal T}_{\textit{r}},{\cal A}_{\textit{r}}\rangle$ with

[TABLE]

The canonical model $\mathcal{C}({\cal K}_{\textit{r}})$ interprets (all with multiplicity 1) $\mathsf{Emp}$ by $\mathit{Lee}$ , $\mathsf{Mngr}$ by $\mathit{Hill}$ and $w^{1}_{\mathit{Lee},\mathsf{hasMngr}}$ , and $\mathsf{hasMngr}$ by $(\mathit{Lee},w^{1}_{\mathit{Lee},\mathsf{hasMngr}})$ . Note that $\mathcal{C}({\cal K}_{\textit{r}})$ is not universal for all CQs: for instance, $q_{\textit{nr}}^{\mathcal{C}({\cal K}_{\textit{r}})}(\langle\rangle)=2$ for non-rooted $q_{\textit{nr}}=\exists y.\,\mathsf{Mngr}(y)$ , but $q_{\textit{nr}}^{{\cal I}_{\textit{nr}}}(\langle\rangle)=1$ for the model ${\cal I}_{\textit{nr}}$ interpreting $\mathsf{Emp}$ by $\mathit{Lee}$ , $\mathsf{hasMngr}$ by $(\mathit{Lee},\mathit{Hill})$ , and $\mathsf{Mngr}$ by $\mathit{Hill}$ . $\lozenge$
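The numbers in Example 17 can be replayed with a few lines of Python. The sketch encodes only the non-zero multiplicities stated in the example; the element name `w1_Lee_hasMngr` stands for $w^{1}_{\mathit{Lee},\mathsf{hasMngr}}$, and the helper names are hypothetical.

```python
from collections import Counter

# Interpretation of Mngr and hasMngr in the canonical model C(K_r) ...
canonical = {
    "Mngr": Counter({"Hill": 1, "w1_Lee_hasMngr": 1}),
    "hasMngr": Counter({("Lee", "w1_Lee_hasMngr"): 1}),
}
# ... and in the alternative model I_nr.
other = {
    "Mngr": Counter({"Hill": 1}),
    "hasMngr": Counter({("Lee", "Hill"): 1}),
}

def q_nr(interp):
    """Non-rooted Boolean query ∃y. Mngr(y): sum over all elements."""
    return sum(interp["Mngr"].values())

def q_rooted(interp, x):
    """Rooted query q(x) = ∃y. hasMngr(x, y)."""
    return sum(m for (s, _), m in interp["hasMngr"].items() if s == x)

non_rooted_counts = (q_nr(canonical), q_nr(other))
rooted_counts = (q_rooted(canonical, "Lee"), q_rooted(other, "Lee"))
```

The two models disagree on the non-rooted query but agree on the rooted one, in line with the observation that the canonical model is universal only for rooted CQs.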

We conclude this section by showing an important property of rooted CQs, which justifies their favourable computational properties. As in the set case for arbitrary CQs, given a satisfiable $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology ${\cal K}$ and a rooted CQ $q$ , $q^{{\cal K}}$ can be computed over a small sub-interpretation of $\mathcal{C}({\cal K})$ .

Theorem 18.

Let ${\cal K}$ be a satisfiable $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology with $\mathcal{C}({\cal K})=\bigcup_{i\geq 0}\mathcal{C}_{i}({\cal K})$ and $q$ be a rooted CQ having $n$ atoms. Then, $q^{\mathcal{C}({\cal K})}=q^{\mathcal{C}_{n}({\cal K})}$ .

5 Rewritability of Rooted Queries

Rewritability is key for OBDA, and we next establish to what extent rooted CQs over bag semantics are rewritable.

A first idea is to use the analogy with the set case and rewrite to unions of CQs. There are two corresponding operations for bags: max union $\cup$ and arithmetic union $\uplus$ . So we may consider max unions $q_{\textit{max}}=q_{1}(\mathbf{x})\lor\dots\lor q_{n}(\mathbf{x})$ or arithmetic unions $q_{\textit{ar}}=q_{1}(\mathbf{x})\mathbin{\dot{\vee}}\cdots\mathbin{\dot{\vee}}q_{n}(\mathbf{x})$ of CQs $q_{i}(\mathbf{x})$ , $1\leq i\leq n$ , with the following semantics, for any interpretation ${\cal I}$ : $q_{\textit{max}}^{{\cal I}}=q_{1}^{{\cal I}}\cup\cdots\cup q_{n}^{{\cal I}}$ and $q_{\textit{ar}}^{{\cal I}}=q_{1}^{{\cal I}}\uplus\cdots\uplus q_{n}^{{\cal I}}$ , respectively. Our first result is negative: rewriting to either of these classes is not possible even for $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ .

Proposition 19.

The class of rooted CQs is rewritable neither to max nor to arithmetic unions of CQs for $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ .

Next we show that rooted queries are rewritable to $\textsc{BALG}^{1}_{\varepsilon}$ -queries: the class directly corresponding to the algebra $\textsc{BALG}^{1}_{\varepsilon}$ for bags Grumbach et al. (1996); Grumbach and Milo (1996); Libkin and Wong (1997). Since $\textsc{BALG}^{1}_{\varepsilon}\subset\textsc{LogSpace}$ Grumbach and Milo (1996), where $\textsc{BALG}^{1}_{\varepsilon}$ is the complexity class for $\textsc{BALG}^{1}_{\varepsilon}$ algebra evaluation, rewritability to $\textsc{BALG}^{1}_{\varepsilon}$ -queries is highly desirable.

Intuitively, in addition to projection $\exists$ , join $\wedge$ , and unions $\vee$ and $\mathbin{\dot{\vee}}$ , $\textsc{BALG}^{1}_{\varepsilon}$ also allows for difference $\setminus$ . Domain-dependent queries, inexpressible in algebraic query languages, are precluded by restrictions on the use of variables.
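The three binary bag operations can be tried out directly with Python's `collections.Counter`, whose `|`, `+`, and `-` operators compute pointwise maximum, sum, and truncated difference. The sketch assumes that bag difference is the usual truncated subtraction of multiplicities; the answer bags `q1` and `q2` are hypothetical.

```python
from collections import Counter

# Hypothetical answer bags of two queries over the same interpretation.
q1 = Counter({("a",): 2, ("b",): 1})
q2 = Counter({("a",): 1, ("b",): 3})

max_union = q1 | q2    # pointwise maximum of multiplicities
arith_union = q1 + q2  # pointwise sum of multiplicities
difference = q1 - q2   # pointwise subtraction, truncated at zero
```

Note that `difference` drops the tuple `("b",)` entirely, because its multiplicity in `q2` exceeds that in `q1`.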

Definition 20.

A $\textsc{BALG}^{1}_{\varepsilon}$ -query $q(\mathbf{x})$ with answer variables $\mathbf{x}$ is one of the following, where $q_{i}$ are $\textsc{BALG}^{1}_{\varepsilon}$ -queries:

- $S(\mathbf{t})$ , for $S\in{\mathbf{C}}\cup{\mathbf{R}}$ , $\mathbf{t}$ a tuple over $\mathbf{x}\cup{\mathbf{I}}$ mentioning all $\mathbf{x}$ ;

- $q_{1}(\mathbf{x}_{1})\wedge q_{2}(\mathbf{x}_{2})$ , for $\mathbf{x}=\mathbf{x}_{1}\cup\mathbf{x}_{2}$ ;

- $q_{0}(\mathbf{x}_{0})\wedge(x=t)$ , for $x\in\mathbf{x}_{0}$ , $t\in{\mathbf{X}}\cup{\mathbf{I}}$ , $\mathbf{x}=\mathbf{x}_{0}\cup(\{t\}\setminus{\mathbf{I}})$ ;

- $\exists\mathbf{y}.\,q_{0}(\mathbf{x},\mathbf{y})$ ; $q_{1}(\mathbf{x})\vee q_{2}(\mathbf{x})$ ; $q_{1}(\mathbf{x})\mathbin{\dot{\vee}}q_{2}(\mathbf{x})$ ; $q_{1}(\mathbf{x})\setminus q_{2}(\mathbf{x})$ .

The semantics of $\textsc{BALG}^{1}_{\varepsilon}$ -queries is defined as follows.

Definition 21.

The bag answers $q^{{\cal I}}$ to a $\textsc{BALG}^{1}_{\varepsilon}$ -query $q(\mathbf{x})$ over a bag interpretation ${\cal I}=\langle\Delta^{{\cal I}},\cdot^{{\cal I}}\rangle$ is the bag of tuples over ${\mathbf{I}}$ of the same size as $\mathbf{x}$ inductively defined as follows, for each tuple $\mathbf{a}$ and the corresponding mapping $\lambda$ such that $\lambda(\mathbf{x})=\mathbf{a}^{\cal I}$ and $\lambda(a)=a^{\cal I}$ for all $a\in{\mathbf{I}}$ :

- $S^{\cal I}(\lambda(\mathbf{t}))$ , if $q(\mathbf{x})=S(\mathbf{t})$ ;

- $q_{1}^{{\cal I}}(\lambda(\mathbf{x}_{1}))\times q_{2}^{{\cal I}}(\lambda(\mathbf{x}_{2}))$ , if $q(\mathbf{x})=q_{1}(\mathbf{x}_{1})\wedge q_{2}(\mathbf{x}_{2})$ ;

- $q_{0}^{{\cal I}}(\lambda(\mathbf{x}_{0}))$ , if $q(\mathbf{x})=q_{0}(\mathbf{x}_{0})\wedge(x=t)$ and $\lambda(x)=\lambda(t)$ ;

- $0$ , if $q(\mathbf{x})=q_{0}(\mathbf{x}_{0})\wedge(x=t)$ and $\lambda(x)\neq\lambda(t)$ ;

- $\sum\nolimits_{\lambda^{\prime}:\,\mathbf{y}\to\Delta^{\cal I}}q_{0}^{{\cal I}}(\mathbf{a}^{\cal I},\lambda^{\prime}(\mathbf{y}))$ , if $q(\mathbf{x})=\exists\mathbf{y}.\,q_{0}(\mathbf{x},\mathbf{y})$ ;

- $(q_{1}^{{\cal I}}\,\mathtt{op}\,q_{2}^{{\cal I}})(\mathbf{a}^{\cal I})$ , if $q(\mathbf{x})=q_{1}(\mathbf{x})\,\mathtt{op}^{\prime}\,q_{2}(\mathbf{x})$ , where $\mathtt{op}$ is $\cup$ , $\uplus$ , or $-$ and $\mathtt{op}^{\prime}$ is $\vee$ , $\mathbin{\dot{\vee}}$ , or $\setminus$ , respectively.

The data complexity of $\textsc{BALG}^{1}_{\varepsilon}$ -query evaluation is obtained by showing that $\textsc{BALG}^{1}_{\varepsilon}$ -queries can be mapped to the $\textsc{BALG}^{1}_{\varepsilon}$ algebra of Grumbach and Milo (1996).

Proposition 22.

Given a fixed $\textsc{BALG}^{1}_{\varepsilon}$ -query $q(\mathbf{x})$ , the problem of checking whether $q^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(\mathbf{a})\geq k$ for a bag ABox ${\cal A}$ , tuple $\mathbf{a}$ , and $k\in\mathbb{N}^{\infty}_{0}$ is $\textsc{AC}^{0}$ reducible to $\textsc{BALG}^{1}_{\varepsilon}$ .

Our rewriting algorithm is inspired by the algorithm in Kikot et al. (2012) for the set case of $\textit{DL-Lite}_{\cal R}$ . Before going into details, we provide a high-level description.

The key observation is that the set of valuations of a CQ $q(\mathbf{x})=\exists\mathbf{y}.\,\phi(\mathbf{x},\mathbf{y})$ over the bag canonical model $\mathcal{C}({\cal K})$ can be partitioned into subsets, each of which is characterised by variables $\mathbf{z}\subseteq\mathbf{y}$ that are sent to anonymous elements of $\mathcal{C}({\cal K})$ . Hence, we can rewrite $q(\mathbf{x})$ for each of these subsets separately and then take an arithmetic union of the resulting queries, provided these queries are guaranteed to give the same answers as the corresponding subsets of valuations.

Our rewriting proceeds along the following steps.

Step 1. First, each $\mathbf{z}$ is checked for realisability, that is, whether the subquery induced by $\mathbf{z}$ can indeed be folded into the anonymous forest-shaped part of $\mathcal{C}({\cal K})$ . This can be done without the ABox, looking only at the atoms of $q$ that link $\mathbf{z}$ to other terms of $q$ (these linking atoms exist because $q$ is rooted). Non-realisable $\mathbf{z}$ can be disregarded.

Step 2. For every realisable $\mathbf{z}$ , CQ $q(\mathbf{x})$ is replaced (for this $\mathbf{z}$ in the arithmetic union) by a CQ $q_{\mathbf{z}}(\mathbf{x})$ obtained from $q$ by replacing each maximal connected component of the subquery induced by $\mathbf{z}$ by just one linking atom. This transformation is equivalence-preserving, because the anonymous part of $\mathcal{C}({\cal K})$ does not involve multiplicities other than 0 and 1.

Step 3. Finally, each resulting $q_{\mathbf{z}}(\mathbf{x})$ is rewritten to a $\textsc{BALG}^{1}_{\varepsilon}$ -query $\bar{q}_{\mathbf{z}}(\mathbf{x})$ by “chasing back” each unary atom and each binary atom mentioning a variable in $\mathbf{z}$ with the TBox; for the binary atoms it is also guaranteed, by means of difference, that the variable in $\mathbf{z}$ is indeed mapped to the anonymous part, thus avoiding double-counting in the arithmetic union.

For the rest of this section, let us fix a rooted CQ $q(\mathbf{x})=\exists\mathbf{y}.\,\phi(\mathbf{x},\mathbf{y})$ and a $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ TBox ${\cal T}$ . We start by formalising Step 1.

Definition 23.

Given an ontology ${\cal K}$ with a TBox ${\cal T}$ and variables $\mathbf{z}\subseteq\mathbf{y}$ , let $[q,\mathbf{z}]^{\mathcal{C}({\cal K})}$ be the bag of tuples over ${\mathbf{I}}$ such that, for each tuple $\mathbf{a}$ of individuals,

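$$[q,\mathbf{z}]^{\mathcal{C}({\cal K})}(\mathbf{a})=\sum_{\lambda\in\Lambda_{\mathbf{z}}}\ \prod_{S(\mathbf{t})\text{ in }\phi(\mathbf{x},\mathbf{y})}S^{\mathcal{C}({\cal K})}(\lambda(\mathbf{t})),$$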

where $\Lambda_{\mathbf{z}}$ is the set of valuations $\lambda:\mathbf{x}\cup\mathbf{y}\cup{\mathbf{I}}\to\Delta^{\mathcal{C}({\cal K})}$ such that $\lambda(\mathbf{x})=\mathbf{a}$ , $\lambda(a)=a$ for each $a\in{\mathbf{I}}$ , $\lambda(x)=\lambda(t)$ for each $x=t$ in $\phi(\mathbf{x},\mathbf{y})$ , $\lambda(z)$ is an anonymous element for each $z\in\mathbf{z}$ , and $\lambda(y)\in{\mathbf{I}}$ for each $y\in\mathbf{y}\setminus\mathbf{z}$ .

Hence, the bag answers to $q$ can be partitioned as follows:

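$$q^{\mathcal{C}({\cal K})}=\biguplus_{\mathbf{z}\subseteq\mathbf{y}}[q,\mathbf{z}]^{\mathcal{C}({\cal K})}.\qquad(1)$$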

Variables $\mathbf{z}\subseteq\mathbf{y}$ are equality-consistent if $\phi(\mathbf{x},\mathbf{y})$ has no equality $z=t$ with $z\in\mathbf{z}$ and $t\notin\mathbf{z}$ . If $\mathbf{z}$ is not equality-consistent, then $[q,\mathbf{z}]^{\mathcal{C}({\cal K})}=\emptyset$ and these $\mathbf{z}$ can be disregarded in (1). Next, we show which other $\mathbf{z}$ can be ignored.
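The enumeration of candidate sets $\mathbf{z}\subseteq\mathbf{y}$ and the equality-consistency filter can be sketched as follows. This is an illustrative Python fragment, not the procedure of the paper: equalities are encoded as pairs of terms and are treated symmetrically (an assumption of this sketch), and the helper names are hypothetical.

```python
from itertools import chain, combinations

def subsets(ys):
    """All subsets of the existential variables, as tuples."""
    return chain.from_iterable(combinations(ys, r) for r in range(len(ys) + 1))

def equality_consistent(z, equalities):
    """No equality may relate a variable in z to a term outside z."""
    z = set(z)
    return not any((s in z) != (t in z) for s, t in equalities)

# Hypothetical query with existential variables y1, y2 and equality y1 = y2.
ys = ["y1", "y2"]
equalities = [("y1", "y2")]

candidates = [z for z in subsets(ys) if equality_consistent(z, equalities)]
```

Only the subsets that keep `y1` and `y2` on the same side survive the filter; the remaining ones contribute the empty bag to the partition and can be skipped.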

Definition 24.

Given equality-consistent $\mathbf{z}\subseteq\mathbf{y}$ , variables $\mathbf{z}^{\prime}\subseteq\mathbf{z}$ are maximally connected in the anonymous part (ma-connected) if $\tilde{z}\subseteq\mathbf{z}^{\prime}$ for the equivalence class $\tilde{z}$ of any $z\in\mathbf{z}^{\prime}$ and the equivalence classes $\tilde{\mathbf{z}}^{\prime}$ are a maximal subset of $\tilde{\mathbf{z}}$ connected in the Gaifman graph of $q$ via nodes in $\tilde{\mathbf{z}}$ .

Next we introduce several notations for ma-connected $\mathbf{z}^{\prime}\subseteq\mathbf{z}$ with equality-consistent $\mathbf{z}\subseteq\mathbf{y}$ . First, let $\phi_{\mathbf{z}^{\prime}}$ be the sub-conjunction of $\phi(\mathbf{x},\mathbf{y})$ that consists of all atoms mentioning at least one variable in $\mathbf{z}^{\prime}$ (these sub-conjunctions are disjoint for different $\mathbf{z}^{\prime}$ ). Second, since $q$ is rooted, $\phi_{\mathbf{z}^{\prime}}$ contains an atom $\alpha_{\mathbf{z}^{\prime}}$ of the form $P(t,z)$ or $P(z,t)$ with $z\in\mathbf{z}^{\prime}$ and $t\notin\mathbf{z}$ (note that this definition may be non-deterministic). Third, let

[TABLE]

where $\mathbf{t}_{\mathbf{z}^{\prime}}$ are all such terms $t$ , $a$ is an individual in $\mathbf{t}_{\mathbf{z}^{\prime}}$ if it exists or a fresh individual otherwise, and $\mathbf{x}^{\prime}=\mathbf{t}_{\mathbf{z}^{\prime}}\cap{\mathbf{X}}$ (this definition may also be non-deterministic because of $a$ ). Notice that $q^{a}_{\mathbf{z}^{\prime}}$ is a Boolean CQ with possible equalities of individuals and inequalities. We define the bag answers of such a query $q^{\prime}$ over a bag interpretation ${\cal I}$ in the same way as for usual CQs in Definition 7, with the extra requirement that each contributing valuation $\lambda$ satisfies $\lambda(x)\neq\lambda(t)$ for each inequality $x\not=t$ of $q^{\prime}$ (equalities of individuals are handled as usual equalities).

Definition 25.

Given equality-consistent variables $\mathbf{z}\subseteq\mathbf{y}$ , ma-connected $\mathbf{z}^{\prime}\subseteq\mathbf{z}$ are realisable by TBox ${\cal T}$ if

[TABLE]

where, for a fresh individual $b$ , ${\cal A}^{\prime}$ is the bag ABox having either only the assertion $P(a,b)$ (with multiplicity 1), when $\alpha_{\mathbf{z}^{\prime}}=P(t,z)$ , or only $P(b,a)$ , when $\alpha_{\mathbf{z}^{\prime}}=P(z,t)$ .

This definition does not depend on the choice of $\alpha_{\mathbf{z}^{\prime}}$ and $a$ . Indeed, if there are two atoms $P_{1}(t_{1},z_{1})$ and $P_{2}(t_{2},z_{2})$ satisfying the definition of $\alpha_{\mathbf{z}^{\prime}}$ , then either $P_{1}=P_{2}$ and both pairs $(t_{1},z_{1})$ and $(t_{2},z_{2})$ are mapped by a valuation of $q^{a}_{\mathbf{z}^{\prime}}$ to the same tuple, or $\mathbf{z}^{\prime}$ are not realisable regardless of the choice of $\alpha_{\mathbf{z}^{\prime}}$ . Similarly, if $\mathbf{t}_{\mathbf{z}^{\prime}}$ contains two individuals $a$ , $a^{\prime}$ , then $q^{a}_{\mathbf{z}^{\prime}}$ has the equality $a=a^{\prime}$ , and hence $\mathbf{z}^{\prime}$ are not realisable regardless of this choice.

Intuitively, $\mathbf{z}^{\prime}$ are realisable if their corresponding subquery $q^{a}_{\mathbf{z}^{\prime}}$ is satisfied by the tree-shaped model induced by the TBox from a connection $\alpha_{\mathbf{z}^{\prime}}$ of $\mathbf{z}^{\prime}$ and the rest of the query. This definition does not essentially involve multiplicities, because all tuples of anonymous elements in the canonical model have multiplicity at most 1, and, hence, if $q^{a}_{\mathbf{z}^{\prime}}$ matches a part of the canonical model, it does so in a unique way. Thus, checking realisability is decidable using standard set-based techniques.

Definition 26.

Variables $\mathbf{z}\subseteq\mathbf{y}$ are realisable by TBox ${\cal T}$ if they are equality-consistent and each non-empty ma-connected subset of $\mathbf{z}$ is realisable by ${\cal T}$ .

We proceed to Step 2. For realisable $\mathbf{z}\subseteq\mathbf{y}$ , let $q_{\mathbf{z}}(\mathbf{x})$ be the CQ $\exists\mathbf{y}^{\prime}.\,\phi_{\mathbf{z}}(\mathbf{x},\mathbf{y}^{\prime})$ such that $\phi_{\mathbf{z}}(\mathbf{x},\mathbf{y}^{\prime})$ is obtained from $\phi(\mathbf{x},\mathbf{y})$ by replacing $\phi_{\mathbf{z}^{\prime}}$ , for each ma-connected $\mathbf{z}^{\prime}\subseteq\mathbf{z}$ , with

[TABLE]

where $\mathbf{t}_{\mathbf{z}^{\prime}}$ is as in $q^{a}_{\mathbf{z}^{\prime}}$ , and $\mathbf{y}^{\prime}$ is the subset of $\mathbf{y}$ remaining in $\phi_{\mathbf{z}}$ . In other words, for each $\mathbf{z}^{\prime}$ , $q_{\mathbf{z}}$ contains just one atom $\alpha_{\mathbf{z}^{\prime}}$ and equalities identifying the terms $\mathbf{t}_{\mathbf{z}^{\prime}}$ in place of the conjunction $\phi_{\mathbf{z}^{\prime}}$ of $q$ .

The following lemma justifies Steps 1 and 2. It says that in partitioning (1) we only need to iterate over tuples $\mathbf{z}$ that are realisable by ${\cal T}$ and can also replace $q$ with $q_{\mathbf{z}}$ for each $\mathbf{z}$ .

Lemma 27.

For any ontology ${\cal K}$ with TBox ${\cal T}$ and $\mathbf{z}\subseteq\mathbf{y}$ with $q_{\mathbf{z}}(\mathbf{x})=\exists\mathbf{y}^{\prime}.\,\phi_{\mathbf{z}}(\mathbf{x},\mathbf{y}^{\prime})$ ,

1. if $\mathbf{z}$ is realisable by ${\cal T}$ then $[q,\mathbf{z}]^{\mathcal{C}({\cal K})}=[q_{\mathbf{z}},\mathbf{z}\cap\mathbf{y}^{\prime}]^{\mathcal{C}({\cal K})}$ ;

2. if $\mathbf{z}$ is not realisable by ${\cal T}$ then $[q,\mathbf{z}]^{\mathcal{C}({\cal K})}=\emptyset$ .

For Step 3, it suffices to rewrite each CQ $q_{\mathbf{z}}(\mathbf{x})=\exists\mathbf{y}^{\prime}\!.\,\phi_{\mathbf{z}}(\mathbf{x},\mathbf{y}^{\prime})$ to a $\textsc{BALG}^{1}_{\varepsilon}$ -query $\bar{q}_{\mathbf{z}}(\mathbf{x})=\exists\mathbf{y}_{\mathbf{z}}.\,\psi_{\mathbf{z}}(\mathbf{x},\mathbf{y}_{\mathbf{z}})$ , for $\mathbf{y}_{\mathbf{z}}=\mathbf{y}^{\prime}\setminus\mathbf{z}$ , which is guaranteed to give $[q_{\mathbf{z}},\mathbf{z}\cap\mathbf{y}^{\prime}]^{\mathcal{C}({\cal K})}$ as the bag answers on the ABox in any ontology ${\cal K}$ with TBox ${\cal T}$ . To this end, we use the following notation: for $t\in{\mathbf{X}}\cup{\mathbf{I}}$ , let $\zeta_{A}(t)=A(t)$ for $A\in{\mathbf{C}}$ , while $\zeta_{\exists P}(t)=\exists y.\,P(t,y)$ and $\zeta_{\exists P^{-}}(t)=\exists y.\,P(y,t)$ for $P\in{\mathbf{R}}$ , where $y$ is a variable different from $t$ . Then, formula $\psi_{\mathbf{z}}(\mathbf{x},\mathbf{y}_{\mathbf{z}})$ is obtained from $\phi_{\mathbf{z}}(\mathbf{x},\mathbf{y}^{\prime})$ by replacing all atoms mentioning a term $t\in{\mathbf{I}}\cup\mathbf{x}\cup\mathbf{y}_{\mathbf{z}}$ or a variable $z\in\mathbf{z}$ as follows:

- each $A(t)$ with $\bigvee\nolimits_{{\cal T}\models C\sqsubseteq A}\zeta_{C}(t)$ ;

- each $P(t,z)$ with $\big(\bigvee\nolimits_{{\cal T}\models C\sqsubseteq\exists P}\zeta_{C}(t)\big)\setminus\zeta_{\exists P}(t)$ ;

- each $P(z,t)$ with $\big(\bigvee\nolimits_{{\cal T}\models C\sqsubseteq\exists P^{-}}\zeta_{C}(t)\big)\setminus\zeta_{\exists P^{-}}(t)$ .

Note that $\phi_{\mathbf{z}}(\mathbf{x},\mathbf{y}^{\prime})$ does not contain any atoms of the form $A(z)$ for $z\in\mathbf{z}$ , so $\psi_{\mathbf{z}}(\mathbf{x},\mathbf{y}_{\mathbf{z}})$ does not mention variables $\mathbf{z}$ . Also, atoms over roles without variables $\mathbf{z}$ stay intact, because ${\cal T}$ contains no role inclusions.
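To see how the difference in the rule for $P(t,z)$ isolates the anonymous successors, consider the following Python sketch, which evaluates the replacement of a single atom directly over a bag ABox. It assumes that the concepts $C$ with ${\cal T}\models C\sqsubseteq\exists P$ are supplied as a precomputed list and that difference is truncated subtraction; the encoding of concepts as `(kind, name)` pairs and all function names are hypothetical.

```python
def zeta(concept, t, abox):
    """Multiplicity of zeta_C(t) over the ABox for a basic concept C."""
    kind, name = concept
    if kind == "atomic":        # A(t)
        return abox.get((name, (t,)), 0)
    if kind == "exists":        # ∃y. P(t, y)
        return sum(m for (p, args), m in abox.items()
                   if p == name and len(args) == 2 and args[0] == t)
    if kind == "exists_inv":    # ∃y. P(y, t)
        return sum(m for (p, args), m in abox.items()
                   if p == name and len(args) == 2 and args[1] == t)
    raise ValueError(f"unknown concept kind: {kind}")

def anonymous_successors(role, t, subsumees, abox):
    """Evaluate (max union of zeta_C(t) over subsumees) minus zeta_{∃P}(t)."""
    best = max(zeta(c, t, abox) for c in subsumees)
    return max(best - zeta(("exists", role), t, abox), 0)

# Hypothetical bag ABox and the subsumees of ∃hasMngr under Emp ⊑ ∃hasMngr.
abox = {("Emp", ("Lee",)): 3, ("hasMngr", ("Lee", "Hill")): 2}
subsumees = [("atomic", "Emp"), ("exists", "hasMngr")]

count = anonymous_successors("hasMngr", "Lee", subsumees, abox)
```

Two of the three managers required for `Lee` are already named in the ABox, so only the remaining one is attributed to the anonymous part; without the subtraction, the named managers would be counted again in the arithmetic union.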

Finally, the rewriting of $q(\mathbf{x})$ over ${\cal T}$ is the $\textsc{BALG}^{1}_{\varepsilon}$ -query

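$$\bar{q}(\mathbf{x})=\dot{\bigvee}_{\mathbf{z}\subseteq\mathbf{y},\ \mathbf{z}\text{ realisable by }{\cal T}}\ \bar{q}_{\mathbf{z}}(\mathbf{x}).$$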

Example 28.

Consider TBox ${\cal T}_{\textit{r}}$ from Example 17 and the rooted CQ $q^{\textit{r}}(x)=\exists y.\,{\sf hasMngr}(x,y)\wedge{\sf Mngr}(y)$ . The query $\bar{q}^{\textit{r}}(x)=\bar{q}^{\textit{r}}_{\langle\rangle}(x)\mathbin{\dot{\vee}}\bar{q}^{\textit{r}}_{y}(x)$ , where $\bar{q}^{\textit{r}}_{\langle\rangle}(x)$ and $\bar{q}^{\textit{r}}_{y}(x)$ are

[TABLE]

is a rewriting of $q^{\textit{r}}$ over ${\cal T}_{\textit{r}}$ , since $\langle\rangle$ and $y$ are realisable. $\lozenge$

The following theorem establishes the correctness of our approach and leads to the main rewritability result.

Theorem 29.

For any rooted CQ $q$ and $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology ${\cal K}=\langle{\cal T},{\cal A}\rangle$ we have that $q^{\mathcal{C}({\cal K})}=\bar{q}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}$ .

Corollary 30.

The class of rooted CQs is rewritable to $\textsc{BALG}^{1}_{\varepsilon}$ -queries for $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ .

We conclude this section by establishing the complexity of rooted query answering. The bounds follow as an easy consequence of Theorem 18, Proposition 22, and Corollary 30.

Theorem 31.

$\textsc{BagCert}[\textup{rooted CQs},\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}]$ is NP-complete and in LogSpace in data complexity.

However, the next theorem implies that rooted queries are not $\textsc{BALG}^{1}_{\varepsilon}$ -rewritable for unrestricted $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ TBoxes.

Theorem 32.

$\textsc{BagCert}[\textup{rooted CQs},\textit{DL-Lite}_{\cal R}^{\textit{bag}}]$ is coNP-hard in data complexity.

6 Related work

Query answering under bag semantics has received significant attention in the database literature Libkin and Wong (1994); Grumbach et al. (1996); Grumbach and Milo (1996); Libkin and Wong (1997). These works study the relative expressive power of bag algebra primitives, the relationship with set-based algebras, and establish the data complexity of query answering. Such problems have also been recently studied in the setting of Semantic Web and SPARQL 1.1 in Kaminski et al. (2016); Angles and Gutierrez (2016).

Bag semantics in the context of Description Logics has been studied in Jiang (2010), where the author proposes a bag semantics for $\mathcal{ALC}$ and provides a tableaux algorithm. In contrast to our work, their results are restricted to ontology satisfiability and do not encompass CQ answering.

CQ answering under bag semantics is closely related to answering $\sf Count$ aggregate queries. The semantics of aggregate queries for database settings with incomplete information, such as inconsistent databases and data exchange, have been studied in Arenas et al. (2003); Libkin (2006); Afrati and Kolaitis (2008). As pointed out in Kostylev and Reutter (2015), these techniques are not directly applicable to ontologies. The practical solution in Calvanese et al. (2008) is to give epistemic semantics to aggregate queries, where the query is evaluated over ABox facts entailed by the ontology; thus, the anonymous part of the ontology models is essentially ignored, and the semantics easily leads to counter-intuitive answers. To remedy these issues, Kostylev and Reutter (2015) propose a certain answer semantics for $\sf Count$ aggregate queries over ontologies and prove tight complexity bounds for $\textit{DL-Lite}_{\cal R}$ and $\textit{DL-Lite}_{\textit{core}}$ . Similarly to our work, their semantics is open-world and considers all models of the ontology for query evaluation, which leads to more intuitive answers. The main difference resides in the definition of the ontology language, where they consider set ABoxes and adopt conventional set-based semantics for TBox axioms. Although $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ is closely related to the logic in Kostylev and Reutter (2015), the two settings do not coincide even for set ABoxes. For example, if ${\cal A}$ comprises only assertions $R(a,b)$ and $R(a,c)$ and ${\cal T}$ comprises axiom $\exists R\sqsubseteq B$ , then the query over $\langle{\cal T},{\cal A}\rangle$ that counts the number of individuals $a$ in concept $B$ returns $1$ in the setting of Kostylev and Reutter (2015), while the corresponding $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ query returns $2$ .
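The difference between the two settings in this example can be checked with a small computation. The following is a minimal sketch, assuming that under bag semantics the multiplicity of $a$ in $\exists R$ is the sum of the multiplicities of all pairs $(a,\cdot)$ in $R$; the helper names `exists_role_bag` and `exists_role_set` are hypothetical and introduced only for illustration.

```python
from collections import Counter

# ABox from the example: R(a, b) and R(a, c), each asserted once.
R = Counter({("a", "b"): 1, ("a", "c"): 1})

def exists_role_bag(role):
    """Bag reading of 'exists R' (assumed): sum the multiplicities of all pairs (u, v) per u."""
    result = Counter()
    for (u, _v), mult in role.items():
        result[u] += mult
    return result

def exists_role_set(role):
    """Set reading of 'exists R': u belongs to it as soon as some pair (u, v) occurs."""
    return {u for (u, _v), mult in role.items() if mult > 0}

# The axiom 'exists R ⊑ B' forces B to contain at least these occurrences of a.
min_B_bag = exists_role_bag(R)["a"]
min_B_set = len(exists_role_set(R) & {"a"})
```

Under these assumptions, `min_B_bag` is the count obtained in the bag setting and `min_B_set` the count obtained when the TBox axiom is read under set semantics.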

7 Conclusion and Future Work

We have studied OBDA under bag semantics and identified a general class of rewritable queries over $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontologies. As our framework already covers the class of $\sf Count$ aggregate queries, in future work we plan to extend it to capture further aggregate functions and more expressive ontologies.

Appendix A Appendix

In this appendix we give the complete proofs omitted in the main part of the paper.

See 10

Proof.

Let ${\cal K}=\langle{\cal T},{\cal A}\rangle$ and ${\cal K}^{\prime}=\langle{\cal T},{\cal A}^{\prime}\rangle$ for any ${\cal A}^{\prime}$ satisfying requirement $\{S(\mathbf{t})\mid{\cal A}^{\prime}(S(\mathbf{t}))\geq 1\}={\cal A}$ .

First assume that ${\cal K}$ has a model ${\cal I}=\langle\Delta^{{\cal I}},\cdot^{{\cal I}}\rangle$ . We prove that there exists a bag model of ${\cal K}^{\prime}$ . To this end, consider the bag interpretation ${\cal I}^{\prime}=\langle\Delta^{{\cal I}},\cdot^{{\cal I}^{\prime}}\rangle$ such that, for any $u,v\in\Delta^{{\cal I}}$ and $a\in{\mathbf{I}}$ ,

[TABLE]

Bag interpretation ${\cal I}^{\prime}$ satisfies ${\cal A}^{\prime}$ and all axioms in ${\cal T}$, so it is a bag model of ${\cal K}^{\prime}$. Therefore, ${\cal K}^{\prime}$ is satisfiable, as required.

To complete the proof of statement 1, suppose that ${\cal K}^{\prime}$ has a bag model ${\cal I}^{\prime}=\langle\Delta^{{\cal I}^{\prime}},\cdot^{{\cal I}^{\prime}}\rangle$ . We construct an interpretation ${\cal I}=\langle\Delta^{{\cal I}^{\prime}},\cdot^{{\cal I}}\rangle$ of ${\cal K}$ in a similar way. For $u,v\in\Delta^{{\cal I}}$ and $a\in{\mathbf{I}}$ , let

[TABLE]

Same as in the previous case, ${\cal I}$ is a model of ${\cal K}$ .

We now turn to the second statement. For the forward direction, let $\mathbf{a}\in q^{{\cal K}}$ for a tuple of individuals $\mathbf{a}$ and suppose, for the sake of contradiction, that $q^{{\cal K}^{\prime}}(\mathbf{a})=0$. The latter means that there exists a bag model ${\cal I}^{\prime}$ of ${\cal K}^{\prime}$ such that $q^{{\cal I}^{\prime}}(\mathbf{a})=0$. Consider the interpretation ${\cal I}$ constructed on the basis of ${\cal I}^{\prime}$ as in the second part of the proof of statement 1. On the one hand, it is a model of ${\cal K}$. On the other hand, ${\cal I}\not\models q(\mathbf{a})$ by construction. This contradicts the fact that $\mathbf{a}\in q^{{\cal K}}$. Therefore, our assumption was wrong and $q^{{\cal K}^{\prime}}(\mathbf{a})\geq 1$.

For the backward direction, we proceed similarly. Let $\mathbf{a}$ be a tuple of individuals and assume that $q^{{\cal K}^{\prime}}(\mathbf{a})\geq 1$ holds but $\mathbf{a}\not\in q^{{\cal K}}$. The latter implies that ${\cal K}$ has a model ${\cal I}$ such that ${\cal I}\not\models q(\mathbf{a})$. But this means that the model ${\cal I}^{\prime}$ of ${\cal K}^{\prime}$ constructed on the basis of ${\cal I}$ as in the proof of statement 1 is such that $q^{{\cal I}^{\prime}}(\mathbf{a})=0$, which contradicts our assumption that $q^{{\cal K}^{\prime}}(\mathbf{a})\geq 1$. ∎

See 12

Proof.

Consider a variant of our running example where ${\cal T}=\{\mathsf{Emp}\sqsubseteq\exists\mathsf{hasMngr},\exists\mathsf{hasMngr}^{-}\sqsubseteq\mathsf{Mngr}\}$ and ${\cal A}$ contains $\mathsf{Emp}(\mathit{Lee})$ and $\mathsf{Mngr}(\mathit{Hill})$ once. Consider bag interpretations ${\cal I}_{1},{\cal I}_{2}$ defined as

[TABLE]

Both ${\cal I}_{1}$ and ${\cal I}_{2}$ are bag models of ${\cal K}=\langle{\cal T},{\cal A}\rangle$. Moreover, for $q_{1}=\mathsf{hasMngr}(\mathit{Lee},\mathit{Hill})$ and $q_{2}=\exists x.\,\mathsf{Mngr}(x)$, we have $q_{1}^{{\cal I}_{1}}(\langle\rangle)=1$, $q_{1}^{{\cal I}_{2}}(\langle\rangle)=0$, $q_{2}^{{\cal I}_{1}}(\langle\rangle)=1$, and $q_{2}^{{\cal I}_{2}}(\langle\rangle)=2$; thus, neither model is universal for $\{q_{1},q_{2}\}$. Suppose there is a universal model ${\cal I}$ for $\{q_{1},q_{2}\}$. Then, since $q_{1}^{{\cal I}}(\langle\rangle)$ must be zero, $(\mathit{Lee},\mathit{Hill})$ does not occur in $\mathsf{hasMngr}^{{\cal I}}$; since $\mathsf{Emp}(\mathit{Lee})$ is an assertion of ${\cal A}$ and $\mathsf{Emp}\sqsubseteq\exists\mathsf{hasMngr}\in{\cal T}$, we have $(\mathit{Lee},w^{\prime})\in\mathsf{hasMngr}^{{\cal I}}$ for some $w^{\prime}\in\Delta^{{\cal I}}$ distinct from $\mathit{Hill}$; since $\exists\mathsf{hasMngr}^{-}\sqsubseteq\mathsf{Mngr}\in{\cal T}$, it follows that $w^{\prime}\in\mathsf{Mngr}^{{\cal I}}$. As $\mathsf{Mngr}(\mathit{Hill})$ is also an assertion of ${\cal A}$, we obtain $q_{2}^{{\cal I}}(\langle\rangle)\geq 2$, whereas universality of ${\cal I}$ requires $q_{2}^{{\cal I}}(\langle\rangle)\leq q_{2}^{{\cal I}_{1}}(\langle\rangle)=1$, a contradiction. ∎
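The argument can be illustrated with a short computation. The sketch below is only an illustration under stated assumptions: the two interpretations `I1` and `I2` are a hypothetical reconstruction chosen to agree with the multiplicities stated in the proof (the element `"w"` stands for an anonymous individual), and the minimum is taken over these two models only, whereas certain answers range over all bag models.

```python
from collections import Counter

# Hypothetical bag models of K, consistent with the multiplicities in the proof.
I1 = {
    "Emp": Counter({("Lee",): 1}),
    "hasMngr": Counter({("Lee", "Hill"): 1}),
    "Mngr": Counter({("Hill",): 1}),
}
I2 = {
    "Emp": Counter({("Lee",): 1}),
    "hasMngr": Counter({("Lee", "w"): 1}),
    "Mngr": Counter({("Hill",): 1, ("w",): 1}),
}

def q1(model):
    """q1 = hasMngr(Lee, Hill): multiplicity of the single ground atom."""
    return model["hasMngr"][("Lee", "Hill")]

def q2(model):
    """q2 = exists x. Mngr(x): sum of the multiplicities over all witnesses x."""
    return sum(model["Mngr"].values())

models = [I1, I2]
min_q1 = min(q1(m) for m in models)
min_q2 = min(q2(m) for m in models)
# A universal model would have to attain both minima at the same time.
attained_together = any(q1(m) == min_q1 and q2(m) == min_q2 for m in models)
```

Here each minimum is attained by a different model, which mirrors the reason why no single model can serve both queries.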

Following the unary representation of bags Grumbach and Milo [1996], we represent a bag $\Omega$ over a set $S$ using expression $\{|\cdot|\}$ within which we repeat all elements of $S$ as many times as their multiplicity in $\Omega$ . For convenience, we shall also write a bag $\{|a,a,b,b,b|\}$ in the more compressed form $\{|a,b|\}_{2,3}$ where instead of repeating an element, we list a single occurrence and denote its multiplicity with a number in the appropriate subscript position of that bag.
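As an informal illustration of the two notations, the following sketch represents a finite bag with Python's `collections.Counter`. The function names `unary_form` and `compressed_form` are hypothetical, and the sketch assumes finite multiplicities and sortable elements.

```python
from collections import Counter

def unary_form(bag):
    """Unary representation: every element repeated as often as its multiplicity."""
    return sorted(bag.elements())

def compressed_form(bag):
    """Compressed representation: one occurrence per element plus the list of
    multiplicities in the same order, mirroring {|a,b|}_{2,3}."""
    elements = sorted(bag)
    return elements, [bag[e] for e in elements]

bag = Counter(["a", "a", "b", "b", "b"])
unary = unary_form(bag)
elements, multiplicities = compressed_form(bag)
```

For the bag $\{|a,a,b,b,b|\}$, `elements` and `multiplicities` correspond to the listed elements and the subscript of the compressed form, respectively.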

See 13

Proof.

We prove that there exists a $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ TBox ${\cal T}$ and a Boolean CQ $q$ such that checking whether $q^{\langle{\cal T},{\cal A}\rangle}(\langle\rangle)\geq k$ for an input bag ABox ${\cal A}$ and $k\in\mathbb{N}^{\infty}_{0}$ is coNP-hard. To prove this claim, we follow Kostylev and Reutter [2015] and reduce non 3-colourability of undirected graphs (a coNP-complete problem) to query answering over $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontologies. We show that if $G=\langle V,E\rangle$ is an undirected and connected graph with no self-loops, then $G$ is not 3-colourable if and only if $q^{\langle{\cal T},{\cal A}_{G}\rangle}(\langle\rangle)\geq 3\times|V|+2$ where ${\cal T}$ is the TBox $\{Vertex\sqsubseteq\exists hasColour,\exists hasColour^{-}\sqsubseteq ACol\}$ , ${\cal A}_{G}$ is an ABox constructed based on $G$ , and $q$ is the Boolean query

[TABLE]

Let ${\mathbf{I}}\supseteq V\cup\{a,r,g,b\}$ . ABox ${\cal A}_{G}$ is defined so that it contains the following assertions:

– $Vertex(u)$ for each $u\in V$,

– $Edge(u,v)$, $Edge(v,u)$ for each $(u,v)\in E$,

– $ACol(r)$ ($|V|+1$ times), $ACol(g)$ ($|V|$ times), $ACol(b)$ ($|V|$ times), for colours $r$, $g$, $b$, and

– $Vertex(a)$, $Edge(a,a)$, and $hasColour(a,r)$, for the auxiliary vertex $a$.

Individual $a$ corresponds to an auxiliary vertex for the purposes of the reduction, whereas individuals $r$, $g$, and $b$ play the role of colours. The usage of $Vertex$ and $Edge$ is clear; they encode $G$. Role $hasColour$ plays the role of a colour assignment to the vertices of $G$; this is also imposed by axiom $Vertex\sqsubseteq\exists hasColour$. Concept $ACol$ provides a sufficient number of pre-defined colour copies that favours 3-colour assignments based on the colours $r$, $g$, and $b$. Any proper assignment of $G$ uses each of these colours at most $|V|$ times. However, if an assignment is not proper and exhausts the number of available colours (i.e., by assigning multiple colours to the same vertex) or uses an additional colour, these will have to be added to concept $ACol$ due to the axiom $\exists hasColour^{-}\sqsubseteq ACol$, effectively increasing its minimum cardinality. This behaviour is the one that we exploit in the following reduction.
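The construction of ${\cal A}_{G}$ described above can be sketched in code as follows. This is a minimal illustration, not part of the original proof: the function name `build_abox` and the encoding of assertions as tuples are hypothetical choices, and the sketch assumes that vertex names are distinct from the reserved individuals `a`, `r`, `g`, `b` and that each undirected edge is listed once.

```python
from collections import Counter

def build_abox(vertices, edges):
    """Build the bag ABox A_G as a Counter mapping assertions to multiplicities."""
    n = len(vertices)
    abox = Counter()
    for u in vertices:
        abox[("Vertex", u)] += 1
    for (u, v) in edges:
        # Each undirected edge is asserted in both directions.
        abox[("Edge", u, v)] += 1
        abox[("Edge", v, u)] += 1
    # Pre-defined colour copies: |V|+1 for r, |V| each for g and b.
    abox[("ACol", "r")] += n + 1
    abox[("ACol", "g")] += n
    abox[("ACol", "b")] += n
    # Auxiliary vertex a with a self-loop and colour r.
    abox[("Vertex", "a")] += 1
    abox[("Edge", "a", "a")] += 1
    abox[("hasColour", "a", "r")] += 1
    return abox

triangle = build_abox(["v1", "v2", "v3"],
                      [("v1", "v2"), ("v2", "v3"), ("v1", "v3")])
acol_total = sum(m for fact, m in triangle.items() if fact[0] == "ACol")
```

For a graph with $|V|$ vertices, `acol_total` equals $3\times|V|+1$, which is the baseline cardinality of $ACol$ that the reduction relies on.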

We next show that $G$ is not 3-colourable if and only if $q^{\langle{\cal T},{\cal A}_{G}\rangle}(\langle\rangle)\geq 3\times|V|+2$ .

“ $\Rightarrow$ ” Let $G$ be non-3-colourable. Consider a model ${\cal I}$ of $\langle{\cal T},{\cal A}_{G}\rangle$ (which exists since $\langle{\cal T},{\cal A}_{G}\rangle$ is satisfiable) such that, if $\gamma:V\to\{r,g,b\}$ is an assignment of colours to the vertices of $G$ and $u\not=a$, then $hasColour^{\cal I}((u^{\cal I},c^{\cal I}))=1$ if and only if $\gamma(u)=c$ with $c\in\{r,g,b\}$. Since $G$ is not 3-colourable, for every assignment $\gamma$ there exists at least one edge $(u,v)\in E$ with $\gamma(u)=\gamma(v)=c$. Consequently, for all models ${\cal I}$ defined on the basis of $\gamma$, $hasColour^{\cal I}$ contains tuples $(u^{\cal I},c^{\cal I})$ and $(v^{\cal I},c^{\cal I})$, and hence, the subquery of $q$

[TABLE]

has at least two matches, each one contributing multiplicity $1$; one match corresponds to valuation $\{x/u^{\cal I},y/v^{\cal I},z/c^{\cal I}\}$ and one to valuation $\{x/a^{\cal I},y/a^{\cal I},z/r^{\cal I}\}$. Observe also that atom $ACol(w)$ contributes at least multiplicity $3\times|V|+1$. Therefore, $q^{\cal I}(\langle\rangle)\geq 2\times(3\times|V|+1)$ for every model ${\cal I}$ following a 3-colour assignment, and hence, $3\times|V|+2$ is a certain multiplicity with respect to all these models, as required. Clearly, the same statement holds for all of the models that add additional elements to $Vertex$ or $Edge$, or that assign multiple colours to some vertices, exceeding the number of available colours. What is left to consider is those models that assign additional colours to vertices and not just one among $r$, $g$, and $b$. For such colour assignments, $G$ might turn out to be colourable. Suppose $G$ is 4-colourable (if it is not, then the above discussion carries over) and let $p\in{\mathbf{I}}$. Then, there exists a model that follows a 4-colour assignment $\gamma:V\to\{r,g,b,p\}$ such that $\gamma(u)\not=\gamma(v)$ for every $(u,v)\in E$. Therefore, for that model we would get one match with multiplicity $1$ for subquery $q_{1}(x,y,z)$ (that is, for valuation $\{x/a^{\cal I},y/a^{\cal I},z/r^{\cal I}\}$). On the other hand, given the observations above, that model would have to include element $p$ in the extension of $ACol$ at least once, effectively increasing the cardinality of $ACol$ to at least $3\times|V|+2$. Therefore, the evaluation of $q$ over that model would always give at least $3\times|V|+2$ empty tuples. Clearly, the same holds for models that make use of further colours. Therefore, $q^{\langle{\cal T},{\cal A}_{G}\rangle}(\langle\rangle)\geq 3\times|V|+2$.

“ $\Leftarrow$ ” Let $G$ be 3-colourable. It suffices to show that there exists a model ${\cal I}$ for which $q^{\cal I}(\langle\rangle)=m$ with $m<3\times|V|+2$ . Since $G$ is 3-colourable, there is an assignment $\gamma:V\to\{r,g,b\}$ such that, for every $(u,v)\in E$ , $\gamma(u)\not=\gamma(v)$ . Consider an interpretation ${\cal I}_{\gamma}$ defined as follows:

[TABLE]

Interpretation ${\cal I}_{\gamma}$ is defined based on the contents of $V$ , $E$ , and the 3-colour assignment $\gamma$ . It is easy to verify that ${\cal I}_{\gamma}$ is a model of $\langle{\cal T},{\cal A}_{G}\rangle$ . Next, we show that $q^{{\cal I}_{\gamma}}(\langle\rangle)=3\times|V|+1$ . First, we observe that subquery $q_{1}(x,y,z)$ matches exactly once (i.e., under valuation $\{x/d_{a},y/d_{a},z/d_{r}\}$ ). This holds because $\gamma$ is a proper 3-colouring of $G$ and, for every $(u,v)\in E$ , $\gamma(u)\not=\gamma(v)$ . Note also that there are three valuations for atom $ACol(w)$ contributing multiplicity $3\times|V|+1$ in total. Consequently, $q^{{\cal I}_{\gamma}}(\langle\rangle)=3\times|V|+1$ , as desired.

∎
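The counting argument in the proof can be replayed on small graphs. The sketch below is an illustration under explicit assumptions: the query is taken to be $\exists x,y,z,w.\,Edge(x,y)\wedge hasColour(x,z)\wedge hasColour(y,z)\wedge ACol(w)$, reconstructed from the description of subquery $q_{1}(x,y,z)$ and atom $ACol(w)$, and the model is assumed to contain exactly the facts of ${\cal A}_{G}$ plus one $hasColour$ fact per vertex as given by the colouring. The function name `evaluate_q` is hypothetical.

```python
from collections import Counter

def evaluate_q(vertices, edges, colouring):
    """Multiplicity of the empty tuple for the assumed query
    q = exists x,y,z,w. Edge(x,y) & hasColour(x,z) & hasColour(y,z) & ACol(w)
    over the model induced by `colouring` (dict: vertex -> colour in r, g, b)."""
    n = len(vertices)
    edge = Counter({("a", "a"): 1})
    for (u, v) in edges:
        edge[(u, v)] += 1
        edge[(v, u)] += 1
    has_colour = Counter({("a", "r"): 1})
    for u in vertices:
        has_colour[(u, colouring[u])] += 1
    acol = Counter({"r": n + 1, "g": n, "b": n})

    # Sum over valuations of x, y, z of the product of atom multiplicities.
    q1_matches = 0
    for (x, y), m_edge in edge.items():
        for z in acol:
            q1_matches += m_edge * has_colour[(x, z)] * has_colour[(y, z)]
    # The atom ACol(w) is independent of x, y, z and multiplies the count.
    return q1_matches * sum(acol.values())

V = ["v1", "v2", "v3"]
E = [("v1", "v2"), ("v2", "v3"), ("v1", "v3")]
proper = evaluate_q(V, E, {"v1": "r", "v2": "g", "v3": "b"})
improper = evaluate_q(V, E, {"v1": "r", "v2": "r", "v3": "b"})
```

On the triangle, the proper colouring yields $3\times|V|+1$, while the improper one exceeds the threshold $3\times|V|+2$, in line with the two directions of the proof.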

Remark 1.

When the UNA is dropped, we can modify the definition of ABox satisfaction and show that a similar reduction holds for establishing coNP-hardness of query answering for $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontologies. Under this new definition, a bag interpretation ${\cal I}$ satisfies an ABox ${\cal A}$ if:

– for each concept assertion $A(a)$ in ${\cal A}$, we have $\sum_{a_{0}\in{\mathbf{I}}:a_{0}^{\cal I}=a^{\cal I}}{\cal A}(A(a_{0}))\leq A^{{\cal I}}(a^{\cal I})$, and

– for each role assertion $P(a,b)$ in ${\cal A}$, we have $\sum_{a_{0},b_{0}\in{\mathbf{I}}:a_{0}^{\cal I}=a^{\cal I},b_{0}^{\cal I}=b^{\cal I}}{\cal A}(P(a_{0},b_{0}))\leq P^{{\cal I}}(a^{\cal I},b^{\cal I})$.

Observe that under the UNA, the definition of ABox satisfaction (Definition 5) is a special case of the above; hence, Theorem 13 is still valid under this new definition. We now discuss the modifications that are necessary for reducing non-3-colourability of undirected and connected graphs without self-loops to query answering in $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontologies without making the UNA. For this, we need to make sure that the auxiliary vertex $a$ is not interpreted by the same element as any of the vertices of $G$, and that no two of the colours $r,g,b$ are interpreted by the same element. To ensure this, we employ atomic concepts $V_{a}$, $V_{G}$, $Red$, $Blue$, and $Green$, which will hold the auxiliary vertex $a$, the vertices of $G$, and the three colours, respectively. Then, we make sure that no interpretation mixes their roles by introducing pairwise disjointness axioms: ${\sf Disj}(Red,Blue)$, ${\sf Disj}(Red,Green)$, ${\sf Disj}(Blue,Green)$, and ${\sf Disj}(V_{a},V_{G})$. Last, we modify ${\cal A}_{G}$ to have the additional assertions $V_{a}(a)$, $Red(r)$, $Green(g)$, $Blue(b)$, and $V_{G}(u)$, for every vertex $u\in V$. Following exactly the argumentation used in Theorem 13, we can show that the above reduction works if the UNA is dropped.

An enumerated bag (e-bag, for short) $\Theta$ over a set $M$ is a set of pairs $[c{:}\,m]$ with $c\in M$ and $m\in\mathbb{N}$, where $\mathbb{N}$ is the set of positive integers, such that if $[c{:}\,m]\in\Theta$ with $m>1$ then $[c{:}\,m-1]\in\Theta$. There is a straightforward one-to-one correspondence between bags and e-bags, and we denote by ${\Omega}^{\texttt{e}}$ the enumerated version of a bag $\Omega$. This notion generalises to bag interpretations: the e-bag interpretation ${{\cal I}}^{\texttt{e}}$ corresponding to a bag interpretation ${\cal I}=\langle\Delta^{{\cal I}},\cdot^{{\cal I}}\rangle$ is the pair $\langle\Delta^{{\cal I}},\cdot^{{{\cal I}}^{\texttt{e}}}\rangle$ such that $a^{{{\cal I}}^{\texttt{e}}}=a^{\cal I}$ for each individual $a$ and $S^{{{\cal I}}^{\texttt{e}}}={(S^{\cal I})}^{\texttt{e}}$ for any $S\in\mathbf{C}\cup\mathbf{R}$. The interpretation function extends to inverse roles in the same way.
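The correspondence between bags and e-bags can be made concrete with a small sketch. It assumes finite multiplicities, represents a pair $[c{:}\,m]$ as the Python tuple `(c, m)`, and uses the hypothetical helper names `to_ebag`, `from_ebag`, and `is_ebag`.

```python
from collections import Counter

def to_ebag(bag):
    """Enumerate a bag: element c with multiplicity k yields (c, 1), ..., (c, k)."""
    return {(c, m) for c, k in bag.items() for m in range(1, k + 1)}

def from_ebag(ebag):
    """Recover the bag: the multiplicity of c is the largest index paired with c."""
    bag = Counter()
    for c, m in ebag:
        bag[c] = max(bag[c], m)
    return bag

def is_ebag(pairs):
    """Check the closure condition: (c, m) with m > 1 requires (c, m - 1)."""
    return all(m >= 1 and (m == 1 or (c, m - 1) in pairs) for c, m in pairs)

bag = Counter({"a": 2, "b": 3})
ebag = to_ebag(bag)
```

Converting a bag to its e-bag and back returns the original bag, and a set such as `{("a", 2)}` is rejected by `is_ebag` because the copy with index 1 is missing.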

An enumerated homomorphism (e-homomorphism) from an e-bag interpretation ${{\cal I}}^{\texttt{e}}=\langle\Delta^{{\cal I}},\cdot^{{{\cal I}}^{\texttt{e}}}\rangle$ to an e-bag interpretation ${{\cal J}}^{\texttt{e}}=\langle\Delta^{{\cal J}},\cdot^{{{\cal J}}^{\texttt{e}}}\rangle$ is a family $(h,h_{S},\ldots)$ , $S\in\mathbf{C}\cup\mathbf{R}$ , of functions

[TABLE]

such that

– $h(a^{{{\cal I}}^{\texttt{e}}})=a^{{{\cal J}}^{\texttt{e}}}$ for each $a\in{\mathbf{I}}$,

– $h_{A}([u{:}\,m])=[h(u){:}\,\ell]$ for all $A\in{\mathbf{C}}$ and $[u{:}\,m]\in A^{{{\cal I}}^{\texttt{e}}}$, where $\ell\in\mathbb{N}$ is some number (which can be different for different $A$ and $[u{:}\,m]$),

– $h_{P}([(u,v){:}\,m])=[(h(u),h(v)){:}\,\ell]$ for all $P\in{\mathbf{R}}$ and $[(u,v){:}\,m]\in P^{{{\cal I}}^{\texttt{e}}}$, where $\ell\in\mathbb{N}$ is some number.

To handle some cases uniformly, we sometimes write $h_{P^{-}}([(v,u){:}\,m])$ instead of $h_{P}([(u,v){:}\,m])$ , for $P\in\mathbf{R}$ .

Intuitively, an e-homomorphism is a usual homomorphism that additionally establishes correspondence for each enumerated tuple of elements in each relation in ${{\cal I}}^{\texttt{e}}$ .

An e-homomorphism $(h,h_{S},\ldots)$ from ${{\cal I}}^{\texttt{e}}=\langle\Delta^{{\cal I}},\cdot^{{{\cal I}}^{\texttt{e}}}\rangle$ to ${{\cal J}}^{\texttt{e}}=\langle\Delta^{{\cal J}},\cdot^{{{\cal J}}^{\texttt{e}}}\rangle$ is predicate-injective on individuals ${\mathbf{I}}$ if, for each $u$ such that there exists $a\in{\mathbf{I}}$ with $h(u)=a^{{{\cal J}}^{\texttt{e}}}$ ,

– $h_{A}([u{:}\,m])\neq h_{A}([u{:}\,\ell])$ for all $A\in{\mathbf{C}}$ and all $[u{:}\,m],[u{:}\,\ell]\in A^{{{\cal I}}^{\texttt{e}}}$ with $m\neq\ell$,

– $h_{R}([(u,v_{1}){:}\,m])\neq h_{R}([(u,v_{2}){:}\,\ell])$ for all roles $R$ and all $[(u,v_{1}){:}\,m],[(u,v_{2}){:}\,\ell]\in R^{{{\cal I}}^{\texttt{e}}}$ with $v_{1}\neq v_{2}$ or $m\neq\ell$.

Lemma 33.

For any $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology ${\cal K}$ and any bag model ${\cal I}$ of ${\cal K}$ there exists an e-homomorphism from $\mathcal{C}^{\texttt{e}}({\cal K})$ to ${{\cal I}}^{\texttt{e}}$ that is predicate-injective on ${\mathbf{I}}$ .

Proof.

Let $\mathcal{C}({\cal K})=\bigcup_{i\geq 0}\mathcal{C}_{i}({\cal K})$ with $\mathcal{C}_{i}({\cal K})=\langle\Delta^{\mathcal{C}_{i}({\cal K})},\cdot^{\mathcal{C}_{i}({\cal K})}\rangle$ . We first define a witnessing predicate-injective e-homomorphism $(h,h_{S},\ldots)$ for the elements in $\Delta^{\mathcal{C}_{0}({\cal K})}$ , that is, on the (interpretations of the) individuals, then extend it to elements introduced in ${\mathcal{C}_{1}({\cal K})}$ , and finally recursively define it on all other elements.

For the first step, consider an individual $a\in{\mathbf{I}}$ and the element $u=a^{\mathcal{C}_{0}({\cal K})}$. We set $h(u)=a^{\cal I}$. Then, consider any atomic concept $A$ such that $A^{\mathcal{C}_{0}({\cal K})}(u)=k$, $k\in\mathbb{N}$, that is, such that $[u{:}\,m]\in A^{\mathcal{C}_{0}^{\texttt{e}}({\cal K})}$ for all $m\in\mathbb{N}$ with $m\leq k$. By the definition of $\mathcal{C}_{0}({\cal K})$, ${\cal A}(A(a))=k$. Since ${\cal I}$ is a model of ${\cal A}$, we have that $A^{\cal I}(a^{\cal I})\geq k$. In other words, $[h(u){:}\,m]\in A^{{{\cal I}}^{\texttt{e}}}$ for all $m\leq k$, and we can set $h_{A}([u{:}\,m])=[h(u){:}\,m]$ for all such $m$. Consider now individuals $a,b\in{\mathbf{I}}$ with corresponding elements $u=a^{\mathcal{C}_{0}({\cal K})}$ and $v=b^{\mathcal{C}_{0}({\cal K})}$ and an atomic role $P$ such that $P^{\mathcal{C}_{0}({\cal K})}(u,v)=k$, $k\in\mathbb{N}$, that is, such that $[(u,v){:}\,m]\in P^{\mathcal{C}_{0}^{\texttt{e}}({\cal K})}$ for all $m\in\mathbb{N}$ with $m\leq k$. By the definition of $\mathcal{C}_{0}({\cal K})$, we have that ${\cal A}(P(a,b))=k$. Since ${\cal I}$ is a model of ${\cal A}$, we have that $P^{\cal I}(a^{\cal I},b^{\cal I})\geq k$. In other words, $[(h(u),h(v)){:}\,m]\in P^{{{\cal I}}^{\texttt{e}}}$ for all $m\leq k$, and, similarly to the concept case, we set $h_{P}([(u,v){:}\,m])=[(h(u),h(v)){:}\,m]$ for all such $m$.

For the second step, consider an individual $a\in{\mathbf{I}}$ with its interpretation $u=a^{\mathcal{C}_{0}({\cal K})}$ and a role $P\in{\mathbf{R}}$ such that $\mathsf{ccl}_{{\cal T}}[u,\mathcal{C}_{0}({\cal K})](\exists P)=k$ for $k\in\mathbb{N}$ , but $(\exists P)^{\mathcal{C}_{0}({\cal K})}(u)=l<k$ (the case where $P$ is not an atomic role is analogous). Then, $\delta=\mathsf{ccl}_{{\cal T}}[u,\mathcal{C}_{0}({\cal K})](\exists P)-(\exists P)^{\mathcal{C}_{0}({\cal K})}(u)>0$ , hence $\Delta^{\mathcal{C}_{1}({\cal K})}=\Delta^{\mathcal{C}_{0}({\cal K})}\cup\{w^{1}_{u,P},\dots,w^{\delta}_{u,P}\}$ where $w^{j}_{u,P}$ are fresh anonymous elements. Moreover, $P^{\mathcal{C}_{1}({\cal K})}$ contains $P^{\mathcal{C}_{0}({\cal K})}$ plus tuples $(u,w^{1}_{u,P}),\dots,(u,w^{\delta}_{u,P})$ . We next show that $h$ can be extended to all anonymous elements $w^{j}_{u,P}$ introduced at this step as a result of some $u$ and role $P$ with the above properties such that $(h,h_{S},\dots)$ , $S\in{\mathbf{C}}\cup{\mathbf{R}}$ , is predicate injective on ${\mathbf{I}}$ . Because $\mathsf{ccl}_{{\cal T}}[u,\mathcal{C}_{0}({\cal K})](\exists P)=k$ and $(\exists P)^{\mathcal{C}_{0}({\cal K})}(u)=l<k$ , there exists a sequence of concepts $C_{0},\ldots,C_{n}$ with $C_{n}=\exists P$ such that $C_{i-1}\sqsubseteq C_{i}\in{\cal T}$ for all $i\in[1,n]$ and $C_{0}^{\mathcal{C}_{0}({\cal K})}(u)=k$ . Since $(h,h_{S},\dots)$ is predicate injective on ${\mathbf{I}}$ at the first step and $h(u)=a^{\cal I}$ , we have $C_{0}^{{\cal I}}(a^{\cal I})\geq k$ . Because ${\cal I}$ is a model of ${\cal K}$ , it satisfies all axioms in ${\cal T}$ , hence, $C_{i}^{{\cal I}}(a^{\cal I})\geq k$ , and as a result $(\exists P)^{{\cal I}}(a^{\cal I})\geq k$ . In other words, $P^{{{\cal I}}^{\texttt{e}}}$ contains at least $k$ pairs $[(a^{\cal I},z_{i}){:}\,m_{i}]$ , $i\in[1,k]$ . 
Observe that from the first step and every pair $[(u,v_{1}){:}\,m],[(u,v_{2}){:}\,m^{\prime}]\in P^{\mathcal{C}_{0}^{\texttt{e}}({\cal K})}$ with $v_{1}\not=v_{2}$ or $m\not=m^{\prime}$ , we have $h_{P}([(u,v_{1}){:}\,m])\not=h_{P}([(u,v_{2}){:}\,m^{\prime}])$ . Because $P^{\mathcal{C}_{0}^{\texttt{e}}({\cal K})}$ contains $l$ such distinct tuples and $k=\delta+l$ , there are at least $\delta$ pairs $[(a^{\cal I},r_{1}){:}\,n_{1}],\dots,[(a^{\cal I},r_{\delta}){:}\,n_{\delta}]$ in $P^{{{\cal I}}^{\texttt{e}}}$ for which there is no $[(u,v){:}\,m]\in P^{\mathcal{C}_{0}^{\texttt{e}}({\cal K})}$ that maps to them under $h_{P}$ . Therefore, we can extend $h$ such that $h(w^{j}_{u,P})=r_{j}$ and set $h_{P}$ so that $h_{P}([(u,w^{j}_{u,P}){:}\,1])=[(a^{\cal I},r_{j}){:}\,n_{j}]$ . Suppose now that there exists $w^{j}_{u,P}$ such that $h(w^{j}_{u,P})=b^{\cal I}$ with $b\in{\mathbf{I}}$ . Since $P^{\mathcal{C}_{1}({\cal K})}(u,w^{j}_{u,P})=1$ , we have $[(u,w^{j}_{u,P}){:}\,1]\in P^{\mathcal{C}_{1}^{\texttt{e}}({\cal K})}$ and $[(w^{j}_{u,P},u){:}\,1]\in(P^{-})^{\mathcal{C}_{1}^{\texttt{e}}({\cal K})}$ , hence, the requirement for $h_{P^{-}}$ w.r.t. $w^{j}_{u,P}$ is trivially satisfied. Finally, consider an element $u=a^{\mathcal{C}_{0}({\cal K})}$ such that $\mathsf{ccl}_{{\cal T}}[u,\mathcal{C}_{0}({\cal K})](A)>A^{\mathcal{C}_{0}({\cal K})}(u)$ with $A\in{\mathbf{C}}$ . In such a case, $A^{\mathcal{C}_{1}({\cal K})}(u)$ is set to $\mathsf{ccl}_{{\cal T}}[u,\mathcal{C}_{0}({\cal K})](A)$ . Given the above discussion, it is trivial to verify that $h_{A}$ satisfies the required condition on the pairs $[u{:}\,m]\in A^{\mathcal{C}_{1}^{\texttt{e}}({\cal K})}$ . As a result of all the above, we have shown that $(h,h_{S},\dots)$ is predicate injective on ${\mathbf{I}}$ at the second step as well.

Last, observe that for all $i>1$ , and for all $S\in{\mathbf{C}}\cup{\mathbf{R}}$ , extensions $S^{\mathcal{C}_{i}({\cal K})}$ contain $S^{\mathcal{C}_{i-1}({\cal K})}$ plus tuples $\mathbf{t}$ mentioning only anonymous elements, for which we know by definition that $S^{\mathcal{C}_{i}({\cal K})}(\mathbf{t})=1$ . Therefore, $h$ can be trivially extended to these anonymous elements so that $(h,h_{S},\dots)$ is predicate injective on ${\mathbf{I}}$ at step $i$ . ∎

Since a Boolean CQ $q$ can be seen as a bag of atoms, we can consider its corresponding Boolean enumerated CQ (e-CQ), which is the e-bag ${q}^{\texttt{e}}$. We call the elements of ${q}^{\texttt{e}}$ enumerated atoms (e-atoms). For the following definition, it is convenient to partition a Boolean CQ $q$ into subqueries $q_{S}$, each of which consists of all atoms in $q$ over an atomic concept or role $S$ (with corresponding multiplicities), and a subquery $q_{=}$ consisting of all equalities in $q$. An enumerated valuation (e-valuation) of a Boolean e-CQ ${q}^{\texttt{e}}$, for $q()=\exists\mathbf{y}.\,\phi(\mathbf{y})$, over an e-bag interpretation ${{\cal I}}^{\texttt{e}}=\langle\Delta^{{\cal I}},\cdot^{{{\cal I}}^{\texttt{e}}}\rangle$ is a family $(\nu,\nu_{S},\ldots)$, $S\in\mathbf{C}\cup\mathbf{R}$, of functions

[TABLE]

such that

– $\nu(a)=a^{{{\cal I}}^{\texttt{e}}}$ for each $a\in{\mathbf{I}}$,

– $\nu(y)=\nu(t)$ for all equality e-atoms $[y=t{:}\,m]\in{q}^{\texttt{e}}_{=}$,

– $\nu_{A}([A(t){:}\,m])=[\nu(t){:}\,\ell]$ for all $A\in{\mathbf{C}}$ and $[A(t){:}\,m]\in{q}^{\texttt{e}}_{A}$, where $\ell\in\mathbb{N}$ is some number, and

– $\nu_{P}([P(t_{1},t_{2}){:}\,m])=[(\nu(t_{1}),\nu(t_{2})){:}\,\ell]$ for all $P\in{\mathbf{R}}$ and $[P(t_{1},t_{2}){:}\,m]\in{q}^{\texttt{e}}_{P}$, where $\ell\in\mathbb{N}$ is some number.

Similarly to the case of e-homomorphisms, we sometimes write $\nu_{P^{-}}([P^{-}(t_{1},t_{2}){:}\,m])$ instead of $\nu_{P}([P(t_{1},t_{2}){:}\,m])$ , for $P\in\mathbf{R}$ .

Intuitively, a Boolean CQ can be seen as a bag interpretation with terms (variables and individuals) in the domain. Then, an e-valuation is just an e-bag homomorphism from the enumerated version of this special bag interpretation to a normal e-bag interpretation. It is straightforward to check that the number of e-valuations of a Boolean e-CQ ${q}^{\texttt{e}}$ over an e-bag interpretation ${{\cal I}}^{\texttt{e}}$ is precisely the multiplicity $q^{{\cal I}}(\langle\rangle)$ of the empty tuple in the evaluation of $q$ over ${\cal I}$ .
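The claim that the number of e-valuations equals the multiplicity of the empty tuple can be checked on small examples. The following sketch is illustrative only: it assumes a finite domain, no equality atoms, individuals interpreted as themselves, and multiplicity defined as the sum over valuations of the product of atom multiplicities. The function names and the encoding of atoms as `(predicate, terms)` pairs are hypothetical.

```python
from collections import Counter
from itertools import product

def multiplicity(atoms, variables, domain, interp):
    """Sum over valuations of the product of atom multiplicities.
    `atoms` lists (predicate, terms) pairs; a repeated atom is listed repeatedly."""
    total = 0
    for values in product(domain, repeat=len(variables)):
        nu = dict(zip(variables, values))
        prod = 1
        for pred, terms in atoms:
            image = tuple(nu.get(t, t) for t in terms)  # constants map to themselves
            prod *= interp[pred][image]
        total += prod
    return total

def count_e_valuations(atoms, variables, domain, interp):
    """Count families (nu, nu_S, ...): a valuation nu plus, for every atom
    occurrence, a choice of one enumerated copy [nu(t) : l] of its image."""
    count = 0
    for values in product(domain, repeat=len(variables)):
        nu = dict(zip(variables, values))
        choices = [
            range(1, interp[pred][tuple(nu.get(t, t) for t in terms)] + 1)
            for pred, terms in atoms
        ]
        count += sum(1 for _ in product(*choices))
    return count

# Toy bag interpretation and the Boolean query: exists y. P(a, y) & A(y).
interp = {
    "P": Counter({("a", "u"): 2, ("a", "v"): 1}),
    "A": Counter({("u",): 3, ("v",): 1}),
}
atoms = [("P", ("a", "y")), ("A", ("y",))]
by_product = multiplicity(atoms, ["y"], ["a", "u", "v"], interp)
by_counting = count_e_valuations(atoms, ["y"], ["a", "u", "v"], interp)
```

Both functions agree on this example because, for a fixed valuation, the enumerated copies of each atom image can be chosen independently, so their number is the product of the multiplicities.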

The following lemma says that if two e-valuations over the bag canonical model coincide on all the (enumerated copies of the) atoms of a rooted CQ that involve terms evaluating to (the interpretations of) individuals, then they are the same e-valuation.

Lemma 34.

Let $q$ be a rooted Boolean CQ and ${\cal K}$ be a $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology. If two e-valuations $(\nu^{1},\nu^{1}_{S^{\prime}},\ldots)$ and $(\nu^{2},\nu^{2}_{S^{\prime}},\ldots)$ of ${q}^{\texttt{e}}$ over $\mathcal{C}^{\texttt{e}}({\cal K})$ are different, then there exist an individual $a\in{\mathbf{I}}$ , e-atom $[S(\mathbf{t}){:}\,m]\in{q}^{\texttt{e}}$ and number $i\in\{1,2\}$ such that $\nu^{i}(a)\in\nu^{i}(\mathbf{t})$ and $\nu^{1}_{S}([S(\mathbf{t}){:}\,m])\neq\nu^{2}_{S}([S(\mathbf{t}){:}\,m])$ .

Proof.

Let e-valuations $(\nu^{1},\nu^{1}_{S^{\prime}},\ldots)$ and $(\nu^{2},\nu^{2}_{S^{\prime}},\ldots)$ of ${q}^{\texttt{e}}$ over $\mathcal{C}^{\texttt{e}}({\cal K})$ be different, but, for the sake of contradiction, $\nu^{1}_{S}([S(\mathbf{t}){:}\,m])=\nu^{2}_{S}([S(\mathbf{t}){:}\,m])$ for all $a\in{\mathbf{I}}$ , $[S(\mathbf{t}){:}\,m]\in{q}^{\texttt{e}}$ and $i\in\{1,2\}$ such that $\nu^{i}(a)\in\nu^{i}(\mathbf{t})$ . Since the e-valuations are different, there exists $[S(\mathbf{t}){:}\,m]\in{q}^{\texttt{e}}$ such that $\nu^{1}_{S}([S(\mathbf{t}){:}\,m])\neq\nu^{2}_{S}([S(\mathbf{t}){:}\,m])$ . Moreover, by assumption $\mathbf{t}$ consists of only variables. Suppose that $S(\mathbf{t})$ is $P(x_{1},x_{2})$ , where $P\in{\mathbf{R}}$ (we do it without loss of generality, because the case of $A(x)$ for $A\in{\mathbf{C}}$ can be handled in the same way).

Boolean CQ $q$ is rooted, so there exists a sequence

$[R_{1}(t_{0},t_{1}){:}\,m_{1}],\ \ldots,\ [R_{k}(t_{k-1},t_{k}){:}\,m_{k}]$

of e-atoms such that $t_{0}\in{\mathbf{I}}$ , $[R_{k}(t_{k-1},t_{k}){:}\,m_{k}]$ is either $[P(x_{1},x_{2}){:}\,m]$ or $[P^{-}(x_{2},x_{1}){:}\,m]$ , and for each $j=1,\ldots,k$ either $[R_{j}(t_{j-1},t_{j}){:}\,m_{j}]$ is in ${q}^{\texttt{e}}$ , if $R_{j}$ is an atomic role, or $[P_{j}(t_{j},t_{j-1}){:}\,m_{j}]$ is in ${q}^{\texttt{e}}$ , if $R_{j}=P_{j}^{-}$ .

We claim that

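$\nu^{1}_{R_{j}}([R_{j}(t_{j-1},t_{j}){:}\,m_{j}])=\nu^{2}_{R_{j}}([R_{j}(t_{j-1},t_{j}){:}\,m_{j}])\qquad(2)$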

for all $j=1,\ldots,k$ (which, in particular, contradicts our assumption on $[P(x_{1},x_{2}){:}\,m]$). To prove this claim, suppose for the sake of contradiction that it is not the case, and let $j\in\{1,\ldots,k\}$ be the smallest number such that (2) does not hold. By assumption, we know that $\nu^{i}(t_{j-1})\neq\nu^{i}(a)$ for both $i=1,2$ and any $a\in{\mathbf{I}}$ (therefore, $j\neq 1$, because $t_{0}\in{\mathbf{I}}$). However, since $j$ is the smallest such number, $\nu^{1}(t_{j-1})=\nu^{2}(t_{j-1})$. So, the element $u=\nu^{1}(t_{j-1})$ in the bag canonical model $\mathcal{C}({\cal K})=\bigcup_{i\geq 0}\mathcal{C}_{i}({\cal K})$ was not introduced in $\mathcal{C}_{0}({\cal K})$, which implies, by construction, that $(\exists R_{j})^{\mathcal{C}({\cal K})}(u)\leq 1$. In fact, since $(\nu^{1},\nu^{1}_{S^{\prime}},\ldots)$ is an e-valuation, $(\exists R_{j})^{\mathcal{C}({\cal K})}(u)=1$, that is, there exists just one $v\in\Delta^{\mathcal{C}({\cal K})}$ such that $R_{j}^{\mathcal{C}({\cal K})}(u,v)\geq 1$, and, moreover, $R_{j}^{\mathcal{C}({\cal K})}(u,v)=1$. In other words, it holds that $[(u,v){:}\,1]\in R_{j}^{\mathcal{C}^{\texttt{e}}({\cal K})}$, but $[(u,v){:}\,2]\notin R_{j}^{\mathcal{C}^{\texttt{e}}({\cal K})}$. Since $(\nu^{1},\nu^{1}_{S^{\prime}},\ldots)$ and $(\nu^{2},\nu^{2}_{S^{\prime}},\ldots)$ are e-valuations, $\nu^{1}_{R_{j}}$ and $\nu^{2}_{R_{j}}$ send $[R_{j}(t_{j-1},t_{j}){:}\,m_{j}]$ to some enumerated pairs in $R_{j}^{\mathcal{C}^{\texttt{e}}({\cal K})}$, which, by assumption, are different. However, we also know that $\nu^{1}(t_{j-1})=\nu^{2}(t_{j-1})$, so the only possibility for both $\nu^{1}_{R_{j}}([R_{j}(t_{j-1},t_{j}){:}\,m_{j}])$ and $\nu^{2}_{R_{j}}([R_{j}(t_{j-1},t_{j}){:}\,m_{j}])$ is $[(u,v){:}\,1]$. Therefore, our assumption on the existence of $j$ was wrong and (2) indeed holds for all $j$.
In particular, it holds for $j=k$ , which contradicts the fact that $\nu^{1}_{P}([P(x_{1},x_{2}){:}\,m])\neq\nu^{2}_{P}([P(x_{1},x_{2}){:}\,m])$ . Therefore, our assumption on $(\nu^{1},\nu^{1}_{S^{\prime}},\ldots)$ and $(\nu^{2},\nu^{2}_{S^{\prime}},\ldots)$ was wrong, and the lemma is proven. ∎

Having Lemmas 33 and 34 at hand, we are ready to prove that for $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontologies rooted queries can be evaluated over the bag canonical model.

See 16

Proof.

First, note that it is enough to consider only Boolean rooted CQs, because the required property for a non-Boolean rooted CQ $q(\mathbf{x})$ follows from the property for all Boolean CQs obtained from $q(\mathbf{x})$ by replacing variables $\mathbf{x}$ by individuals from ${\mathbf{I}}$ .

For a Boolean rooted CQ $q$ it is enough to show that for any $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology ${\cal K}$ , any bag model ${\cal I}$ of ${\cal K}$ and any e-valuation $(\nu,\nu_{S},\ldots)$ of $q$ over $\mathcal{C}^{\texttt{e}}({\cal K})$ there exists a unique e-valuation $(\nu^{\prime},\nu^{\prime}_{S},\ldots)$ of $q$ over ${{\cal I}}^{\texttt{e}}$ . By Lemma 33 we know that there exists an e-homomorphism $(h,h_{S},\ldots)$ from $\mathcal{C}^{\texttt{e}}({\cal K})$ to ${{\cal I}}^{\texttt{e}}$ that is predicate-injective on ${\mathbf{I}}$ . Therefore, we can take the composition $(\nu,\nu_{S},\ldots)\circ(h,h_{S},\ldots)=(\nu\circ h,\nu_{S}\circ h_{S},\ldots)$ as $(\nu^{\prime},\nu^{\prime}_{S},\ldots)$ ; indeed, the result of this composition is an e-valuation of $q$ over ${{\cal I}}^{\texttt{e}}$ and, by Lemma 34, this result is unique throughout e-valuations of $q$ over $\mathcal{C}^{\texttt{e}}({\cal K})$ . ∎

See 18

Proof.

Let $q$ be the CQ $q(\mathbf{x})=\exists\mathbf{y}.\,\phi(\mathbf{x},\mathbf{y})$ . First note that because CQs are safe and equalities between individuals are not allowed, $\phi(\mathbf{x},\mathbf{y})$ contains at least one atom, thus, $n\geq 1$ . Observe that $\mathcal{C}_{n}({\cal K})$ is a subinterpretation of $\mathcal{C}({\cal K})$ , hence, from the monotonicity property of CQs, we have $q^{\mathcal{C}_{n}({\cal K})}\subseteq q^{\mathcal{C}({\cal K})}$ . To prove the inverse inclusion, we show that interpretations $\mathcal{C}_{k}({\cal K})$ with $k>n$ do not contribute to the bag answers $q^{\mathcal{C}({\cal K})}$ , and as a result, they can be disregarded. In other words, we prove that for every tuple of individuals $\mathbf{a}$ and every valuation $\lambda:\mathbf{x}\cup\mathbf{y}\cup{\mathbf{I}}\to\Delta^{\mathcal{C}({\cal K})}$ with $\lambda(\mathbf{x})=\mathbf{a}$ such that there exist a number $k>n$ and an atom $S_{k}({\mathbf{t}}_{k})$ in $\phi(\mathbf{x},\mathbf{y})$ with $S_{k}^{\mathcal{C}_{k}({\cal K})}(\lambda({\mathbf{t}}_{k}))>S_{k}^{\mathcal{C}_{n}({\cal K})}(\lambda({\mathbf{t}}_{k}))$ , it holds that $\prod_{S(\mathbf{t})\text{ in }\phi(\mathbf{x},\mathbf{y})}S^{\mathcal{C}_{k}({\cal K})}(\lambda(\mathbf{t}))=0$ . By definition of canonical models, for $k>n\geq 1$ , interpretation $\mathcal{C}_{k}({\cal K})$ differs from $\mathcal{C}_{k-1}({\cal K})$ in that it contains a number of tuples not present in $\mathcal{C}_{k-1}({\cal K})$ having multiplicity 1 and mentioning only anonymous elements. Hence, inequality $S_{k}^{\mathcal{C}_{k}({\cal K})}(\lambda({\mathbf{t}}_{k}))>S_{k}^{\mathcal{C}_{n}({\cal K})}(\lambda({\mathbf{t}}_{k}))$ effectively means that we are considering only valuations that send an atom of $\phi(\mathbf{x},\mathbf{y})$ to a tuple of anonymous elements of $\mathcal{C}({\cal K})$ added after step $n$ . 
Suppose by contradiction that there are $\mathbf{a}$ and $\lambda$ satisfying the above criteria but $\prod_{S(\mathbf{t})\text{ in }\phi(\mathbf{x},\mathbf{y})}S^{\mathcal{C}_{k}({\cal K})}(\lambda(\mathbf{t}))\geq 1$ . This means that $\lambda$ satisfies all equalities of $q$ and for every atom $S(\mathbf{t})$ of $q$ , $S^{\mathcal{C}_{k}({\cal K})}(\lambda(\mathbf{t}))\geq 1$ . Because $q$ is rooted, every connected component of the Gaifman graph of $q$ has a node, that is, an equivalence class, that mentions a free variable or an individual. Consider the component of $q$ that contains atom $S_{k}({\mathbf{t}}_{k})$ and the equivalence class $\tilde{t}$ of this component that contains a free variable or an individual. Because CQs are safe by definition, this component contains an atom $P({\mathbf{t}}^{\prime})$ mentioning a term in $\tilde{t}$ . As a result, $\lambda({\mathbf{t}}^{\prime})$ contains at least one individual, which, given that $\lambda({\mathbf{t}}_{k})$ is a tuple of anonymous elements, implies that $P({\mathbf{t}}^{\prime})$ and $S_{k}({\mathbf{t}}_{k})$ are different atoms. By definition of canonical models, we know that $\mathcal{C}_{1}({\cal K})$ is the subinterpretation of $\mathcal{C}({\cal K})$ containing tuples with at least one individual, hence we derive that $P^{\mathcal{C}_{1}({\cal K})}(\lambda({\mathbf{t}}^{\prime}))\geq 1$ . 
But then, since the image of $P({\mathbf{t}}^{\prime})$ under $\lambda$ falls into $\mathcal{C}_{1}({\cal K})$ while the image of $S_{k}({\mathbf{t}}_{k})$ under $\lambda$ falls into $\mathcal{C}_{k}({\cal K})$ but not into $\mathcal{C}_{n}({\cal K})$ (which implies the same for all subinterpretations of $\mathcal{C}_{n}({\cal K})$ ), and both atoms belong to the same connected component, it means that $\phi(\mathbf{x},\mathbf{y})$ contains conjunction $\bigwedge_{j=1}^{k}S_{j}({\mathbf{t}}_{j})$ such that (i) atom $S_{1}({\mathbf{t}}_{1})$ is connected with $P({\mathbf{t}}^{\prime})$ , (ii) ${\mathbf{t}}_{j}\cap{\mathbf{t}}_{j+1}\not=\emptyset$ , for $1\leq j<k$ , and (iii) $S_{j}^{\mathcal{C}_{j}({\cal K})}(\lambda({\mathbf{t}}_{j}))\geq 1$ and $S_{j}^{\mathcal{C}_{j-1}({\cal K})}(\lambda({\mathbf{t}}_{j}))=0$ , for $j\in[1,k]$ . In other words, the image of each one of the atoms under $\lambda$ falls respectively onto tuples created in $\mathcal{C}_{1}({\cal K}),\mathcal{C}_{2}({\cal K}),\dots,\mathcal{C}_{k}({\cal K})$ . But then, this means that $q$ contains at least $k$ atoms, which is a contradiction given that $k>n$ . ∎

See 19

Proof.

First we prove the claim for max unions of CQs. Consider the $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ TBox ${\cal T}=\{A\sqsubseteq\exists R,\ \exists R^{-}\sqsubseteq B\}$ , the rooted CQ $q(x)=\exists y.\,R(x,y)\wedge B(y)$ , and the $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ABox ${\cal A}=\{|A(a),A(a),A(a),R(a,b),R(a,b),B(b),B(b),B(b)|\}$ and let ${\cal K}=\langle{\cal T},{\cal A}\rangle$ . Then, $\mathcal{C}({\cal K})$ is such that

[TABLE]

Evaluating $q$ over $\mathcal{C}({\cal K})$ , we get $q^{\mathcal{C}({\cal K})}(a)=7$ for the individual $a$ . Suppose now that there exists a rewriting of $q$ to a max union of CQs and let $q^{\prime}(x)=q_{1}(x)\vee\cdots\vee q_{n}(x)$ be such a rewriting where $q_{1},\dots,q_{n}$ are CQs. This means that $\bigcup_{i=1}^{n}q_{i}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}=q^{\mathcal{C}({\cal K})}$ or, alternatively, that there exists $i\in[1,n]$ with $q_{i}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}=q^{\mathcal{C}({\cal K})}$ . Observe that ${\cal A}$ contains three distinct assertions with multiplicities $3$ , $2$ , and $3$ . Therefore, whenever there is a valuation for the terms of $q_{i}$ that maps an atom of $q_{i}$ to one of these assertions, the multiplicity is either $2$ or $3$ . As a result and because $q_{i}$ is a CQ, any valuation of $q_{i}$ contributes to $q_{i}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(a)$ a multiplicity that is a multiple of $2$ or $3$ . Since $7$ is prime, there can be no valuation contributing a multiplicity of $7$ . However, $7$ can be expressed as the sums $2+2+3$ or $2\times 2+3$ . For the former sum, this means that there exist three distinct valuations contributing to $q_{i}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(a)$ multiplicities $2$ , $2$ , and $3$ , respectively, which is clearly impossible given the fact that to get $2$ , query $q_{i}$ must be set equal to $\exists y.\,R(x,y)$ , which excludes the possibility of getting a multiplicity of $3$ . 
For the latter sum, this means that there exist two distinct valuations contributing to $q_{i}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(a)$ multiplicities $4$ and $3$ , respectively, which is again impossible given the fact that to get $4$ , query $q_{i}$ must be set equal to $\exists y.\,\exists z.\,R(x,y)\wedge R(x,z)$ (another possibility would have been to use the same variable for $y$ and $z$ , but the argumentation stays the same), which excludes the possibility of getting a multiplicity of $3$ .

We now prove the claim for arithmetic unions of CQs by building on the observations made in the proof above. Consider the $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ TBox ${\cal T}^{\prime}={\cal T}\cup\{\ C\sqsubseteq\exists P,\ \exists P^{-}\sqsubseteq D\}$ , the rooted CQ $q^{\prime}(x,z)=\exists y.\,\exists u.\,R(x,y)\wedge B(y)\wedge P(z,u)\wedge D(u)$ , and the $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ABox ${\cal A}^{\prime}={\cal A}\cup\{|C(a),P(a,b),D(b)|\}_{8,8,8}$ and let ${\cal K}^{\prime}=\langle{\cal T}^{\prime},{\cal A}^{\prime}\rangle$ . Then, $\mathcal{C}({\cal K}^{\prime})$ is such that

[TABLE]

First, observe that ${\cal T}^{\prime}$ contains ${\cal T}$ plus a copy of the axioms in ${\cal T}$ with their predicates renamed. Hence, ${\cal T}^{\prime}$ can be seen as having two disconnected parts. Second, $q^{\prime}$ has two rooted connected components the first of which, say $q_{1}(x)$ , is query $q$ from the previous part of the proof, while the second, say $q_{2}(z)$ , is an isomorphic query of $q$ with the predicates renamed according to the one-to-one mapping $f=\{(A,C),(R,P),(B,D)\}$ . Based on these observations, we draw the following conclusions: (i) the multiplicity of a tuple $(c_{1},c_{2})$ of individuals in $(q^{\prime})^{\mathcal{C}({\cal K}^{\prime})}$ is the result of multiplying numbers $q_{1}^{\mathcal{C}({\cal K}^{\prime})}(c_{1})$ and $q_{2}^{\mathcal{C}({\cal K}^{\prime})}(c_{2})$ ; (ii) a rewriting of $q^{\prime}$ into an arithmetic union of CQs exists if and only if a rewriting for $q_{1}$ and $q_{2}$ exists; (iii) the rewritings of $q_{1}$ and $q_{2}$ should have the same number of CQs, which should be identical up to renaming of variables and predicates based on $f$ . Consider now the evaluation of $q^{\prime}$ over $\mathcal{C}({\cal K}^{\prime})$ . This leads to a bag containing just tuple $(a,a)$ with multiplicity $(q^{\prime})^{\mathcal{C}({\cal K}^{\prime})}((a,a))=q_{1}^{\mathcal{C}({\cal K}^{\prime})}(a)\times q_{2}^{\mathcal{C}({\cal K}^{\prime})}(a)=7\times 64$ . Let also $q^{\prime}_{1}(x)$ and $q^{\prime}_{2}(z)$ be the rewritings for $q_{1}(x)$ and $q_{2}(z)$ , respectively. Given the discussion in the first part of the proof, to get multiplicity $7$ for $q_{1}^{\mathcal{C}({\cal K}^{\prime})}(a)$ , we have only two ways: either as the sum of $2+2+3$ or as the sum $4+3$ . 
For the former, $q^{\prime}_{1}(x)$ should be equal to $\exists y.\,R(x,y)\mathbin{\dot{\vee}}\exists y.\,R(x,y)\mathbin{\dot{\vee}}A(x)$ , while for the latter, $q^{\prime}_{1}(x)$ should be equal to $\exists y_{1}.\,\exists y_{2}.\,R(x,y_{1})\wedge R(x,y_{2})\mathbin{\dot{\vee}}A(x)$ . By construction, both of these queries when evaluated over $\mathcal{C}(\langle\emptyset,{\cal A}^{\prime}\rangle)$ return the correct multiplicity for $q_{1}^{\mathcal{C}({\cal K}^{\prime})}(a)$ and there are no other queries with this property. 
However, evaluating their identical versions up to renaming of variables and predicates based on $f$ over $\mathcal{C}(\langle\emptyset,{\cal A}^{\prime}\rangle)$ , that is, queries $\exists u.\,P(z,u)\mathbin{\dot{\vee}}\exists u.\,P(z,u)\mathbin{\dot{\vee}}C(z)$ and $\exists u_{1}.\,\exists u_{2}.\,P(z,u_{1})\wedge P(z,u_{2})\mathbin{\dot{\vee}}C(z)$ , we get, respectively, multiplicity $24$ and $72$ , both of which are different from $q_{2}^{\mathcal{C}({\cal K}^{\prime})}(a)=64$ . ∎
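
The multiplicities used in this proof can be checked mechanically. The following Python sketch is an illustration and not part of the original argument. It assumes that bags are represented as dictionaries from tuples to multiplicities, that $\mathcal{C}({\cal K})$ extends ${\cal A}$ with a single anonymous $R$-successor of $a$ (called `w` here, a hypothetical name), and that an arithmetic union is evaluated by adding the answers of its disjuncts. The helpers `cq_multiplicity` and `arithmetic_union` are likewise hypothetical names.

```python
from itertools import product
from math import prod


def cq_multiplicity(atoms, binding, interp):
    """Bag multiplicity of a CQ answer: sum, over all valuations of the
    existential variables, of the product of the atom multiplicities."""
    domain = sorted({e for bag in interp.values() for tup in bag for e in tup})
    exist_vars = sorted({v for _, args in atoms for v in args if v not in binding})
    total = 0
    for values in product(domain, repeat=len(exist_vars)):
        val = {**binding, **dict(zip(exist_vars, values))}
        total += prod(
            interp.get(pred, {}).get(tuple(val[v] for v in args), 0)
            for pred, args in atoms
        )
    return total


def arithmetic_union(cqs, binding, interp):
    """Arithmetic union of CQs: multiplicities of the disjuncts are added."""
    return sum(cq_multiplicity(cq, binding, interp) for cq in cqs)


# ABox A with multiplicities 3, 2, 3 (first part of the proof).
abox = {"A": {("a",): 3}, "R": {("a", "b"): 2}, "B": {("b",): 3}}
# Assumed canonical model of <T, A>: one anonymous R-successor w of a.
canonical = {
    "A": {("a",): 3},
    "R": {("a", "b"): 2, ("a", "w"): 1},
    "B": {("b",): 3, ("w",): 1},
}
q = [("R", ("x", "y")), ("B", ("y",))]
q_over_canonical = cq_multiplicity(q, {"x": "a"}, canonical)

# The two candidate rewritings of q_1, evaluated over the ABox alone.
sum_2_2_3 = [[("R", ("x", "y"))], [("R", ("x", "y"))], [("A", ("x",))]]
sum_4_3 = [[("R", ("x", "y1")), ("R", ("x", "y2"))], [("A", ("x",))]]
r1 = arithmetic_union(sum_2_2_3, {"x": "a"}, abox)
r2 = arithmetic_union(sum_4_3, {"x": "a"}, abox)

# Renamed part of A' (multiplicities 8, 8, 8). Assumption: the canonical
# model adds no anonymous elements here, so q_2 is evaluated over this bag.
abox_renamed = {"C": {("a",): 8}, "P": {("a", "b"): 8}, "D": {("b",): 8}}
q2 = [("P", ("z", "u")), ("D", ("u",))]
renamed_2_2_3 = [[("P", ("z", "u"))], [("P", ("z", "u"))], [("C", ("z",))]]
renamed_4_3 = [[("P", ("z", "u1")), ("P", ("z", "u2"))], [("C", ("z",))]]
target = cq_multiplicity(q2, {"z": "a"}, abox_renamed)
s1 = arithmetic_union(renamed_2_2_3, {"z": "a"}, abox_renamed)
s2 = arithmetic_union(renamed_4_3, {"z": "a"}, abox_renamed)
print(q_over_canonical, r1, r2, target, s1, s2)
```

Under these assumptions the script prints `7 7 7 64 24 72`: both candidate rewritings reproduce the multiplicity $7$ for $q_{1}$, while their renamed copies yield $24$ and $72$ instead of the required $64$.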

Let $\cal S$ be a finite set of concept and role symbols. We say that a bag interpretation $\langle\Delta^{\cal I},\cdot^{\cal I}\rangle$ is finite relative to $\cal S$ if, for every $S\in\cal S$ , bag $S^{\cal I}$ is finite. Finite bag interpretations relative to a finite set $\cal S$ correspond to bag database instances $I$ over bag schemas $\cal S$ with domains $\Delta^{\cal I}$ as these were defined in Grumbach and Milo [1996]. Hence, in the following, given a finite set $\cal S$ and a bag interpretation ${\cal I}$ that is finite relative to $\cal S$ , we denote by $I_{\cal I}$ the corresponding bag database instance. Also, given a bag database instance $I_{\cal I}$ and a $\textsc{BALG}^{1}_{\varepsilon}$ algebra expression $E$ , we denote by $E(I_{\cal I})$ the bag corresponding to the evaluation of $E$ over $I_{\cal I}$ .

Proposition 35.

Let ${\cal I}$ be any bag interpretation that is finite relative to a finite set $\cal S$ . For each $\textsc{BALG}^{1}_{\varepsilon}$ -query $q$ there is a $\textsc{BALG}^{1}_{\varepsilon}$ algebra expression $E$ such that $q^{{\cal I}}=E(I_{\cal I})$ .

Proof.

We refer to Grumbach and Milo [1996] for the definition of the $\textsc{BALG}^{1}_{\varepsilon}$ operators. For each $\textsc{BALG}^{1}_{\varepsilon}$ -query $q$ , we define a $\textsc{BALG}^{1}_{\varepsilon}$ algebra expression $E_{q}$ by induction on the structure of $q$ as follows, where, for each tuple of terms $\mathbf{t}$ over ${\mathbf{X}}\cup{\mathbf{I}}$ and $t\in\mathbf{t}\cup{\mathbf{I}}$ , $\mathsf{ref}(t,\mathbf{t})$ is defined as $t$ if $t\in{\mathbf{I}}$ , and otherwise as the first position in $\mathbf{t}$ containing $t$ :

•

If $q(\mathbf{x})=S(\mathbf{t})$ for $S\in{\mathbf{C}}\cup{\mathbf{R}}$ and $\mathbf{t}=\langle t_{1},\dots,t_{|\mathbf{t}|}\rangle$ a tuple over $\mathbf{x}\cup{\mathbf{I}}$ , then

[TABLE]

•

If $q(\mathbf{x})=q_{0}(\mathbf{x}_{0})\land(x=t)$ for $x\in\mathbf{x}_{0}$ , $t\in{\mathbf{X}}\cup{\mathbf{I}}$ , and $\mathbf{x}=\mathbf{x}_{0}\cup(\{t\}\setminus{\mathbf{I}})$ , then

–

$E_{q}=\sigma_{\mathsf{ref}(x,\mathbf{x}_{0})=\mathsf{ref}(t,\mathbf{x}_{0})}(E_{q_{0}})$ if $t\in\mathbf{x}_{0}\cup{\mathbf{I}}$ and $\mathbf{x}=\mathbf{x}_{0}$ (we assume w.l.o.g. that the order of variables in $\mathbf{x}$ and $\mathbf{x}_{0}$ is the same), and

–

$E_{q}=\pi_{1,\dots,|\mathbf{x}_{0}|,\mathsf{ref}(x,\mathbf{x}_{0})}(E_{q_{0}})$ if $t\in{\mathbf{X}}\setminus\mathbf{x}_{0}$ and $\mathbf{x}=\mathbf{x}_{0}t$ (we assume w.l.o.g. that $t$ is added as the last variable to $\mathbf{x}_{0}$ ).

•

If $q(\mathbf{x})=q_{1}(\mathbf{x}_{1})\land q_{2}(\mathbf{x}_{2})$ , for $\mathbf{x}=\mathbf{x}_{1}\cup\mathbf{x}_{2}$ , then

[TABLE]

•

If $q(\mathbf{x})=\exists\mathbf{y}.q_{0}(\mathbf{x},\mathbf{y})$ , then $E_{q}=\pi_{1,\dots,|\mathbf{x}|}(E_{q_{0}})$ (we assume w.l.o.g. that in $q_{0}$ variables in $\mathbf{y}$ come after variables in $\mathbf{x}$ ).

•

If $q(\mathbf{x})=q_{1}(\mathbf{x})\lor q_{2}(\mathbf{x})$ , then $E_{q}=E_{q_{1}}\cup E_{q_{2}}$ .

•

If $q(\mathbf{x})=q_{1}(\mathbf{x})\mathbin{\dot{\vee}}q_{2}(\mathbf{x})$ , then $E_{q}=E_{q_{1}}\uplus E_{q_{2}}$ .

•

If $q(\mathbf{x})=q_{1}(\mathbf{x})\setminus q_{2}(\mathbf{x})$ , then $E_{q}=E_{q_{1}}-E_{q_{2}}$ .

It is straightforward to check that, for each $\textsc{BALG}^{1}_{\varepsilon}$ -query $q$ and each bag interpretation ${\cal I}$ , we have $q^{{\cal I}}=E_{q}(I_{\cal I})$ . ∎
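
The last three cases of the translation have direct counterparts on multisets. As a hedged illustration (not the construction of Grumbach and Milo [1996]), Python's `collections.Counter` can stand in for a bag: `|` takes the maximum of multiplicities (max union), `+` adds them (arithmetic union), and `-` subtracts them while dropping non-positive counts (difference). The bags `q1` and `q2` below are made-up answer bags, and the assumption is that the bag difference is truncated at zero.

```python
from collections import Counter

# Hypothetical answer bags of two queries over the same answer variables.
q1 = Counter({("a",): 3, ("b",): 1})
q2 = Counter({("a",): 2, ("c",): 4})

max_union = q1 | q2          # multiplicity is max(m1, m2)
arithmetic_union = q1 + q2   # multiplicity is m1 + m2
difference = q1 - q2         # multiplicity is max(m1 - m2, 0)

print(dict(max_union))
print(dict(arithmetic_union))
print(dict(difference))
```

Here the tuple `("a",)` receives multiplicity 3 under max union, 5 under arithmetic union, and 1 under difference.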

See 22

Proof.

The complexity class $\textsc{BALG}^{1}_{\varepsilon}$ is defined in Grumbach and Milo [1996] by the problem of checking, given a bag database instance $I$ , a tuple of individuals $\mathbf{b}$ , a number $n\geq 0$ , and a fixed $\textsc{BALG}^{1}_{\varepsilon}$ algebra expression $E$ , whether the multiplicity of $\mathbf{b}$ in bag $E(I)$ is exactly $n$ . We next reduce the problem of checking $q^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(\mathbf{a})\geq k$ to the above problem. Without loss of generality we assume that $k\in\mathbb{N}\cup\{0\}$ , since inequality $q^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(\mathbf{a})\geq k$ is always false whenever $k=\infty$ . For this consider the following definitions:

–

Let ${\cal I}$ be the extension of $\mathcal{C}(\langle\emptyset,{\cal A}\rangle)$ such that

[TABLE]

Note that ${\cal I}$ is finite relative to the finite set $\cal S$ consisting of the predicate symbols in ${\cal A}$ and symbol $S$ , hence $I_{\cal I}$ is a bag database instance with schema $\cal S$ .

–

Let $\mathbf{b}=\mathbf{a}$ and $n=0$ .

–

Let $E$ be the algebra expression corresponding to query $S(\mathbf{x})\setminus q(\mathbf{x})$ according to Proposition 35.

Then, $q^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(\mathbf{a})\geq k$ if and only if the multiplicity of $\mathbf{a}$ in $E(I_{\cal I})$ is 0. Indeed, suppose $q^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(\mathbf{a})<k$ . Then $(S(\mathbf{x})\setminus q(\mathbf{x}))^{{\cal I}}(\mathbf{a})>0$ , thus by Proposition 35 the multiplicity of $\mathbf{a}$ in $E(I_{\cal I})$ is greater than 0; the other direction is analogous.

The above many-one reduction can be seen to be computable, for each ${\cal A}$ , $\mathbf{a}$ , and $k$ , by a Boolean circuit whose depth depends only on $q$ . We conclude that the language $\{\langle{\cal A},\mathbf{a},k\rangle\mid q^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(\mathbf{a})\geq k\}$ is contained in $\textsc{BALG}^{1}_{\varepsilon}$ under LogSpace-uniform $\textsc{AC}^{0}$ reductions, as required. ∎
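
The core of the reduction can be illustrated with a small sketch. It assumes, in line with the argument above, that the fresh predicate $S$ holds exactly the tuple $\mathbf{a}$ with multiplicity $k$ and that the bag difference is truncated at zero. The function name `at_least_k` and the sample numbers are hypothetical.

```python
from collections import Counter


def at_least_k(q_answers, a, k):
    """Decide q(a) >= k by testing whether a has multiplicity 0 in S \\ q."""
    s = Counter({a: k})            # assumed extension of the fresh predicate S
    remainder = s - q_answers      # bag difference, truncated at zero
    return remainder[a] == 0


# Hypothetical bag of answers in which tuple ("a",) has multiplicity 7.
q_answers = Counter({("a",): 7})
print(at_least_k(q_answers, ("a",), 5))
print(at_least_k(q_answers, ("a",), 7))
print(at_least_k(q_answers, ("a",), 8))
```

With an answer multiplicity of 7, the check succeeds for $k=5$ and $k=7$ and fails for $k=8$, matching the equivalence between $q(\mathbf{a})\geq k$ and multiplicity $0$ in the difference.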

In the following, for a CQ possibly with inequalities $q(\mathbf{x})=\exists\mathbf{y}.\,\phi(\mathbf{x},\mathbf{y})$ and a bag interpretation ${\cal I}$ , we call a valuation $\lambda:\mathbf{x}\cup\mathbf{y}\cup{\mathbf{I}}\to\Delta^{\cal I}$ a homomorphism from $q$ to ${\cal I}$ if for every atom $P(\mathbf{t})$ of $\phi(\mathbf{x},\mathbf{y})$ , it holds $P^{\cal I}(\lambda(\mathbf{t}))\geq 1$ .

Proposition 36.

For any rooted CQ $q(\mathbf{x})=\exists\mathbf{y}.\,\phi(\mathbf{x},\mathbf{y})$ , $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology ${\cal K}$ , and equality-consistent subset $\mathbf{z}\subseteq\mathbf{y}$ , we have

[TABLE]

where (i) $\mathbf{\bar{z}}$ are all the terms of $\phi(\mathbf{x},\mathbf{y})$ not appearing in $\mathbf{z}$ , (ii) $\phi_{\mathbf{\bar{z}}}$ is the subconjunction of $\phi(\mathbf{x},\mathbf{y})$ that consists of atoms and equalities mentioning only terms in $\mathbf{\bar{z}}$ , (iii) $\phi_{\mathbf{z}^{\prime}}$ is the subconjunction of $\phi(\mathbf{x},\mathbf{y})$ that consists of atoms and equalities mentioning a variable in $\mathbf{z}^{\prime}$ , and (iv) $\mathbf{t}_{\mathbf{z}^{\prime}}$ are all the terms of $\mathbf{\bar{z}}$ appearing in $\phi_{\mathbf{z}^{\prime}}$ .

Proof.

First observe that by Definition 24 of ma-connected subsets of $\mathbf{z}$ the following hold: (i) if a variable $z$ belongs to a ma-connected subset $\mathbf{z}^{\prime}$ of $\mathbf{z}$ , then $\mathbf{z}^{\prime}$ contains all variables in $\tilde{z}$ plus all variables in the equivalence classes that are reachable from $\tilde{z}$ in the Gaifman graph of $q$ through nodes in $\tilde{\mathbf{z}}$ ; (ii) any two ma-connected subsets of $\mathbf{z}$ do not have any atom or equality in common. Combining the above observations with the fact that $q$ is rooted, we derive that the query appearing on the right-hand side of equation (3), name it $q^{\prime}$ , contains exactly query $q$ plus a number of equalities between terms connecting $\phi_{\mathbf{\bar{z}}}$ with the subconjunction corresponding to a ma-connected subset of $\mathbf{z}$ . Let $\mathbf{a}$ be a tuple of individuals and $\Lambda_{\mathbf{z}}$ be the set of homomorphisms for $[q,\mathbf{z}]^{\mathcal{C}({\cal K})}$ and $\mathbf{a}$ . First, suppose that $\Lambda_{\mathbf{z}}=\emptyset$ , that is, $[q,\mathbf{z}]^{\mathcal{C}({\cal K})}(\mathbf{a})=0$ . This means that there is no homomorphism $\lambda$ from $q$ to $\mathcal{C}({\cal K})$ with $\lambda(\mathbf{x})=\mathbf{a}$ that satisfies the equalities of $q$ . But then, the same is true for $q^{\prime}$ , hence $[q^{\prime},\mathbf{z}]^{\mathcal{C}({\cal K})}(\mathbf{a})=0$ as well. Suppose now that there is a homomorphism $\lambda$ satisfying the equalities of $q$ . This means that $\lambda$ satisfies the equalities of $q^{\prime}$ as well except possibly the extra ones. We next prove that $\lambda$ satisfies these extra equalities for each non-empty ma-connected subset $\mathbf{z}^{\prime}$ of $\mathbf{z}$ as well. For this, consider such a subset $\mathbf{z}^{\prime}$ . 
Because $q$ is rooted, $\phi_{\mathbf{z}^{\prime}}$ should be connected with $\phi_{\mathbf{\bar{z}}}$ , hence, $\phi_{\mathbf{z}^{\prime}}$ contains atoms $P_{i}(t_{i},z_{i})$ (resp., $P_{i}(z_{i},t_{i})$ ) such that $t_{i}\not\in\mathbf{z}^{\prime}$ , $z_{i}\in\mathbf{z}^{\prime}$ , and $i\in[1,n]$ (note that if $\phi_{\mathbf{\bar{z}}}$ is empty the above still holds, because if we assume the opposite, then $\bigwedge_{\text{ma-connected }\mathbf{z}^{\prime}\subseteq\mathbf{z}}\phi_{\mathbf{z}^{\prime}}$ , as a rooted query, should contain a distinguished variable or an equality mentioning an individual, but since for every $\mathbf{z}^{\prime}$ we have $\mathbf{z}^{\prime}\subseteq\mathbf{z}\subseteq\mathbf{y}$ and $\mathbf{z}$ is equality-consistent, we derive a contradiction in both cases). Suppose by contradiction that there is a pair $i,j$ such that $\lambda(t_{i})\not=\lambda(t_{j})$ . Because $\lambda(t_{i}),\lambda(t_{j})\in{\mathbf{I}}$ , by definition of canonical models, we have $\lambda(z_{i})=w_{\lambda(t_{i}),P_{i}}$ and $\lambda(z_{j})=w_{\lambda(t_{j}),P_{j}}$ . But then, because $\mathbf{z}^{\prime}$ is ma-connected, $\phi_{\mathbf{z}^{\prime}}$ contains an atom that is sent by $\lambda$ to a tuple $(w_{1},w_{2})$ such that $w_{1}$ is either $w_{\lambda(t_{i}),P_{i}}$ or an anonymous element generated by $w_{\lambda(t_{i}),P_{i}}$ and $w_{2}$ is either $w_{\lambda(t_{j}),P_{j}}$ or an anonymous element generated by $w_{\lambda(t_{j}),P_{j}}$ . Given that the anonymous elements of canonical models are characterised by the individual and the role that generated them and there can be no tuple having anonymous elements generated from different combinations, the above situation is impossible. Hence, not only do we have $\lambda(t_{i})=\lambda(t_{j})$ for every $i,j\in[1,n]$ , but also $P_{i}=P_{j}$ holds. From the former it follows that the extra equalities in $q^{\prime}$ are satisfied by all homomorphisms in $\Lambda_{\mathbf{z}}$ . ∎

Proposition 37.

Let $q(\mathbf{x})=\exists\mathbf{y}.\,\phi(\mathbf{x},\mathbf{y})$ be a rooted CQ, ${\cal K}=\langle{\cal T},{\cal A}\rangle$ a $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology, and $\mathbf{z}$ an equality-consistent subset of $\mathbf{y}$ . For all non-empty, ma-connected, and realisable by ${\cal T}$ subsets $\mathbf{z}^{\prime}$ of $\mathbf{z}$ , we have ${q^{a}_{\mathbf{z}^{\prime}}}^{\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}(\langle\rangle)=1$ where individual $a$ and ABox ${\cal A}^{\prime}$ are picked according to Section 5.

Proof.

Consider a non-empty, ma-connected, and realisable by ${\cal T}$ subset $\mathbf{z}^{\prime}$ of $\mathbf{z}$ . Let us first inspect query ${q^{a}_{\mathbf{z}^{\prime}}}$ . For $\mathbf{x}^{\prime}=\mathbf{t}_{\mathbf{z}^{\prime}}\cap{\mathbf{X}}$ , $a$ an individual in $\mathbf{t}_{\mathbf{z}^{\prime}}$ if it exists or a fresh individual otherwise, we have ${q^{a}_{\mathbf{z}^{\prime}}}()=\exists\mathbf{x}^{\prime}.\,\exists\mathbf{z}^{\prime}.\;\phi_{\mathbf{z}^{\prime}}\land\;\bigwedge_{t\in\mathbf{t}_{\mathbf{z}^{\prime}}}(t=a)\;\land\;\bigwedge_{z\in\mathbf{z}^{\prime}}(z\neq a)$ where, for $t\in\mathbf{t}_{\mathbf{z}^{\prime}}$ and $z\in\mathbf{z}^{\prime}$ , ${\cal A}^{\prime}$ is the bag ABox having either only assertion $P(a,b)$ (with multiplicity 1), when $\alpha_{\mathbf{z}^{\prime}}=P(t,z)$ , or only assertion $P(b,a)$ , when $\alpha_{\mathbf{z}^{\prime}}=P(z,t)$ . Since $\mathbf{z}^{\prime}$ is realisable by ${\cal T}$ , by Definition 25 we have $({q^{a}_{\mathbf{z}^{\prime}}})^{\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}(\langle\rangle)\geq 1$ . Suppose that $({q^{a}_{\mathbf{z}^{\prime}}})^{\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}(\langle\rangle)>1$ . Observe that every concept/role extension under ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ is a set, hence, the multiplicity of a tuple in any such extension is $1$ . This means that all homomorphisms from ${q^{a}_{\mathbf{z}^{\prime}}}$ to ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ that satisfy the equalities and inequalities of ${q^{a}_{\mathbf{z}^{\prime}}}$ contribute multiplicity $1$ for $\langle\rangle$ in bag $({q^{a}_{\mathbf{z}^{\prime}}})^{\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ , hence, there exist at least two such homomorphisms. 
Because $\mathbf{z}^{\prime}$ is ma-connected, the subgraph of the Gaifman graph of $q$ induced by the set of the equivalence classes of $\mathbf{z}^{\prime}$ is connected, and as a result, the Gaifman graph of $\phi_{\mathbf{z}^{\prime}}$ in ${q^{a}_{\mathbf{z}^{\prime}}}$ contains a single connected component. But then, because ${q^{a}_{\mathbf{z}^{\prime}}}$ contains equalities between all terms in $\mathbf{t}_{\mathbf{z}^{\prime}}$ and individual $a$ , all homomorphisms from ${q^{a}_{\mathbf{z}^{\prime}}}$ to ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ send all atoms containing a term in $\mathbf{t}_{\mathbf{z}^{\prime}}$ to the assertion of ${\cal A}^{\prime}$ . Hence, $\phi_{\mathbf{z}^{\prime}}$ is essentially rooted. Therefore, in order to have two homomorphisms from ${q^{a}_{\mathbf{z}^{\prime}}}$ to ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ , it is necessary that ${q^{a}_{\mathbf{z}^{\prime}}}$ contains an atom $P(x,y)$ such that it can be sent to two different tuples, say $(u,v_{1})$ and $(u,v_{2})$ in $P^{{\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}}$ . However, observe that for every element $u$ of $\Delta^{\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ and for every role $P$ , ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ may contain at most one $P$ -successor for $u$ , which implies that every atom $P(x,y)$ in ${q^{a}_{\mathbf{z}^{\prime}}}$ is sent to a tuple in $P^{\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ in a unique way. Combining all of the above, we conclude that there is only one homomorphism from ${q^{a}_{\mathbf{z}^{\prime}}}$ to ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ satisfying the equalities and inequalities of ${q^{a}_{\mathbf{z}^{\prime}}}$ , which proves the claim. ∎

See 27

Proof.

Let $\Lambda_{\mathbf{z}}$ be the set of valuations $\lambda:\mathbf{x}\cup\mathbf{y}\cup{\mathbf{I}}\to\Delta^{\mathcal{C}({\cal K})}$ corresponding to Definition 23 for $[q,\mathbf{z}]^{\mathcal{C}({\cal K})}$ .

1. Because variables $\mathbf{z}$ are realisable by ${\cal T}$ , by Definition 26 we have that $\mathbf{z}$ are equality-consistent and all non-empty ma-connected subsets $\mathbf{z}^{\prime}$ of $\mathbf{z}$ are realisable by ${\cal T}$ . The former implies that there is no equality $z=t$ with $z\in\mathbf{z}$ and $t\not\in\mathbf{z}$ in any $\phi_{\mathbf{z}^{\prime}}$ . The latter implies that for any non-empty ma-connected subset $\mathbf{z}^{\prime}$ with $\mathbf{x}^{\prime}=\mathbf{t}_{\mathbf{z}^{\prime}}\cap{\mathbf{X}}$ , $a$ an individual in $\mathbf{t}_{\mathbf{z}^{\prime}}$ if it exists or a fresh individual otherwise, and query ${q^{a}_{\mathbf{z}^{\prime}}}()=\exists\mathbf{x}^{\prime}.\,\exists\mathbf{z}^{\prime}.\;\phi_{\mathbf{z}^{\prime}}\land\;\bigwedge_{t\in\mathbf{t}_{\mathbf{z}^{\prime}}}(t=a)\;\land\;\bigwedge_{z\in\mathbf{z}^{\prime}}(z\neq a)$ , we have $({q^{a}_{\mathbf{z}^{\prime}}})^{{\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}}(\langle\rangle)\geq 1$ where, for $t_{\mathbf{z}^{\prime}}\in\mathbf{t}_{\mathbf{z}^{\prime}}$ and $z_{\mathbf{z}^{\prime}}\in\mathbf{z}^{\prime}$ , ${\cal A}^{\prime}$ is the bag ABox having either only assertion $P(a,b)$ (with multiplicity 1), when $\alpha_{\mathbf{z}^{\prime}}=P(t_{\mathbf{z}^{\prime}},z_{\mathbf{z}^{\prime}})$ , or only assertion $P(b,a)$ , when $\alpha_{\mathbf{z}^{\prime}}=P(z_{\mathbf{z}^{\prime}},t_{\mathbf{z}^{\prime}})$ (note that by the proof of Proposition 36, atom $\alpha_{\mathbf{z}^{\prime}}$ always exists). This means that there exists a homomorphism $\nu$ from $\phi_{\mathbf{z}^{\prime}}$ to ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ such that $\nu$ maps a term $t$ of ${q^{a}_{\mathbf{z}^{\prime}}}$ to $a$ if and only if $t\in\mathbf{t}_{\mathbf{z}^{\prime}}$ . 
Consider now a valuation $\lambda\in\Lambda_{\mathbf{z}}$ , the subconjunction $\phi_{\mathbf{z}^{\prime}}$ of $q$ , and atom $\alpha_{\mathbf{z}^{\prime}}=P(t_{\mathbf{z}^{\prime}},z_{\mathbf{z}^{\prime}})$ (resp., $\alpha_{\mathbf{z}^{\prime}}=P(z_{\mathbf{z}^{\prime}},t_{\mathbf{z}^{\prime}})$ ) of $\phi_{\mathbf{z}^{\prime}}$ . By Definition 23, $\lambda$ maps term $t_{\mathbf{z}^{\prime}}$ to an individual and variable $z_{\mathbf{z}^{\prime}}$ to an anonymous element. Therefore, whenever $\lambda$ is a homomorphism from $q$ to $\mathcal{C}({\cal K})$ , it means that $P^{\mathcal{C}({\cal K})}$ contains a tuple $(a^{\prime},w_{a^{\prime},P})$ for some individual $a^{\prime}$ . But then, this means that $\mathcal{C}({\cal K})$ contains an isomorphic copy of ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ modulo the individuals $a$ and $b$ in ${\cal A}^{\prime}$ . Hence, inequality $(q^{a^{\prime}}_{\mathbf{z}^{\prime}})^{\mathcal{C}({\cal K})}(\langle\rangle)\geq 1$ holds as well, which, due to the above observations, implies that either $a^{\prime}$ is the only individual contained in $\phi_{\mathbf{z}^{\prime}}$ and $a=a^{\prime}$ , or $\phi_{\mathbf{z}^{\prime}}$ does not mention any individual and the choice of $a$ above is irrelevant. By Proposition 37, we have $({q^{a}_{\mathbf{z}^{\prime}}})^{{\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}}(\langle\rangle)=1$ , hence, $(q^{a^{\prime}}_{\mathbf{z}^{\prime}})^{\mathcal{C}({\cal K})}(\langle\rangle)=1$ as well, which implies that the images of all atoms in $\phi_{\mathbf{z}^{\prime}}$ under $\lambda$ have multiplicity $1$ . Based on this fact, we derive the following equivalence for every ma-connected subset $\mathbf{z}^{\prime}$ of $\mathbf{z}$ .

[TABLE]

We now inspect the form of $q_{\mathbf{z}}$ . Let $\mathbf{\bar{z}}$ be all terms of $\phi(\mathbf{x},\mathbf{y})$ not appearing in $\mathbf{z}$ , $\mathbf{y}_{\mathbf{\bar{z}}}={\mathbf{X}}\cap\mathbf{\bar{z}}$ , $\phi_{\mathbf{\bar{z}}}$ the subconjunction of $\phi(\mathbf{x},\mathbf{y})$ consisting of atoms and equalities mentioning only terms in $\mathbf{\bar{z}}$ , and $\mathbf{z}_{\mathbf{z}}=\bigcup_{\text{ma-connected }\mathbf{z}^{\prime}\subseteq\mathbf{z}}\{z_{\mathbf{z}^{\prime}}\mid z_{\mathbf{z}^{\prime}}\text{ appears in }\alpha_{\mathbf{z}^{\prime}}\}$ (note that $\mathbf{z}_{\mathbf{z}}=\mathbf{z}\cap\mathbf{y}^{\prime}$ ). Then, $q_{\mathbf{z}}$ takes the following form:

[TABLE]

Notice that, for all non-empty ma-connected subsets $\mathbf{z}^{\prime}$ of $\mathbf{z}$ , the query on the left-hand side of equation (4) corresponds to a conjunction w.r.t. $\mathbf{z}^{\prime}$ in the query at the right-hand side of equation (3) of Proposition 36, while both map their common variables in $\mathbf{z}$ to anonymous elements when evaluated over $\mathcal{C}({\cal K})$ . By Definition 24 all ma-connected subsets of $\mathbf{z}$ are pairwise disjoint, while their union makes up $\mathbf{z}$ . Hence, considering equation (4) for all ma-connected subsets of $\mathbf{z}$ and combining it with (3) and (5), we immediately derive $[q,\mathbf{z}]^{\mathcal{C}({\cal K})}=[q_{\mathbf{z}},\mathbf{z}]^{\mathcal{C}({\cal K})}$ .

2. Since $\mathbf{z}$ is not realisable by ${\cal T}$ , by Definition 26 we have that either $\phi(\mathbf{x},\mathbf{y})$ contains an equality $z=t$ with $z\in\mathbf{z}$ and $t\not\in\mathbf{z}$ or that there is a non-empty ma-connected subset $\mathbf{z}^{\prime}\subseteq\mathbf{z}$ which is not realisable by ${\cal T}$ . For the former, by Definition 23 all $\lambda\in\Lambda_{\mathbf{z}}$ are such that $\lambda(z)\not=\lambda(t)$ , hence, equality $z=t$ is not satisfied by any of them, which results in $[q,\mathbf{z}]^{\mathcal{C}({\cal K})}=\emptyset$ . For the latter, this means that, for $\mathbf{x}^{\prime}=\mathbf{t}_{\mathbf{z}^{\prime}}\cap{\mathbf{X}}$ , $a$ an individual in $\mathbf{t}_{\mathbf{z}^{\prime}}$ if it exists or a fresh individual otherwise, and query ${q^{a}_{\mathbf{z}^{\prime}}}()=\exists\mathbf{x}^{\prime}.\,\exists\mathbf{z}^{\prime}.\;\phi_{\mathbf{z}^{\prime}}\land\;\bigwedge_{t\in\mathbf{t}_{\mathbf{z}^{\prime}}}(t=a)\;\land\;\bigwedge_{z\in\mathbf{z}^{\prime}}(z\neq a)$ , we have $({q^{a}_{\mathbf{z}^{\prime}}})^{{\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}}(\langle\rangle)=0$ where, for $t\in\mathbf{t}_{\mathbf{z}^{\prime}}$ and $z\in\mathbf{z}^{\prime}$ , ${\cal A}^{\prime}$ is the bag ABox having either only assertion $P(a,b)$ (with multiplicity 1), when $\alpha_{\mathbf{z}^{\prime}}=P(t,z)$ , or only assertion $P(b,a)$ , when $\alpha_{\mathbf{z}^{\prime}}=P(z,t)$ . We distinguish two cases: (i) either there is no homomorphism $\nu$ from $\phi_{\mathbf{z}^{\prime}}$ to ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ or (ii) there is one but it violates some equality or inequality of ${q^{a}_{\mathbf{z}^{\prime}}}$ . Assume there exists $\lambda\in\Lambda_{\mathbf{z}}$ that is a homomorphism from $\phi_{\mathbf{z}^{\prime}}$ to $\mathcal{C}({\cal K})$ (otherwise we trivially have $[q,\mathbf{z}]^{\mathcal{C}({\cal K})}=\emptyset$ ). 
From the existence of $\lambda$ and atom $\alpha_{\mathbf{z}^{\prime}}$ in $\phi_{\mathbf{z}^{\prime}}$, we have that $\mathcal{C}({\cal K})$ contains an isomorphic copy of ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$, modulo individuals $a$ and $b$ (see also the proof of statement 1). But then, considering case (i), if there is no homomorphism $\nu$ from $\phi_{\mathbf{z}^{\prime}}$ to ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$, there is no homomorphism $\lambda$ from $\phi_{\mathbf{z}^{\prime}}$ to $\mathcal{C}({\cal K})$ either, which is a contradiction (this follows from the fact that there is a homomorphism from the image of $\phi_{\mathbf{z}^{\prime}}$ under $\lambda$ to ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ that preserves individuals and the fact that the composition of homomorphisms is another homomorphism). In case (ii), either there are two different individuals in the query connected with variables in $\mathbf{z}^{\prime}$ or $\nu$ maps $z$ to $a$. The former means that $\phi_{\mathbf{z}^{\prime}}$ contains atoms $P_{1}(a,z_{1})$ and $P_{2}(a^{\prime},z_{2})$ such that $a\not=a^{\prime}$ (with $P_{1},P_{2}$ not necessarily distinct), which contradicts Proposition 36, which requires that all $t\in\mathbf{t}_{\mathbf{z}^{\prime}}$ be mapped to the same individual by $\lambda\in\Lambda_{\mathbf{z}}$. On the other hand, if $\nu(z)=a$, then $\lambda(z)=a$ if $a\in\mathbf{t}_{\mathbf{z}^{\prime}}$, otherwise $\lambda(z)=a^{\prime}$ for some individual $a^{\prime}$ in $\Delta^{\mathcal{C}({\cal K})}$. In either case, $\lambda(z)\in{\mathbf{I}}$, from which we conclude that $\lambda\not\in\Lambda_{\mathbf{z}}$, which is a contradiction. 
Therefore, there is a subquery $\phi_{\mathbf{z}^{\prime}}$ in $q$ for which there is no $\lambda\in\Lambda_{\mathbf{z}}$ that is a homomorphism from $\phi_{\mathbf{z}^{\prime}}$ to $\mathcal{C}({\cal K})$ and satisfies the equalities of $\phi_{\mathbf{z}^{\prime}}$ , from which it follows that $[q,\mathbf{z}]^{\mathcal{C}({\cal K})}=\emptyset$ . ∎

For a $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ TBox ${\cal T}$ , a concept $C$ , a role $R$ , and a term $t$ , where $\zeta_{C_{0}}(t)$ is defined in Section 5, we define the expressions $\eta_{C}(t)$ and $\theta_{\exists R}(t)$ as follows:

[TABLE]

Lemma 38.

For a $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology ${\cal K}=\langle{\cal T},{\cal A}\rangle$ the following hold:

1. For a query of the form $q_{C}(x)=\zeta_{C}(x)$ and an individual $a$, we have $q_{C}^{\mathcal{C}({\cal K})}(a)=\eta_{C}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(a)$.

2. For a role $R\in{\mathbf{R}}$, queries of the form $q_{R}(x,y)=R(x,y)$ and $q_{R^{-}}(x,y)=R(y,x)$, and a pair of individuals $\mathbf{a}$, we have $q_{R}^{\mathcal{C}({\cal K})}(\mathbf{a})=q_{R}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(\mathbf{a})$ and $q_{R^{-}}^{\mathcal{C}({\cal K})}(\mathbf{a})=q_{R^{-}}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(\mathbf{a})$.

3. For a role $R$, a query of the form $q(x)=R(x,x)$, and an individual $a$, we have $q^{\mathcal{C}({\cal K})}(a)=q^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(a)$.

Proof.

1. Recalling the definition of canonical models, it is immediate to verify that $C^{\mathcal{C}({\cal K})}(a)=C^{\mathcal{C}_{1}({\cal K})}(a)=\mathsf{ccl}_{{\cal T}}[a,\mathcal{C}_{0}({\cal K})](C)$ holds. Observe that the latter quantity is equal to $\bigcup_{{\cal T}\models C_{0}\sqsubseteq C}(C_{0}^{\mathcal{C}_{0}({\cal K})}(a))$. Last, notice that $C_{0}^{\mathcal{C}_{0}({\cal K})}(a)={\cal A}(C_{0}(a))$ holds whenever $C_{0}\in{\mathbf{C}}$, and $C_{0}^{\mathcal{C}_{0}({\cal K})}(a)=\sum_{c\in{\mathbf{I}}}{\cal A}(R(a,c))$ holds whenever $C_{0}=\exists R$ with $R\in{\mathbf{R}}$ (the case is analogous when $C_{0}=\exists R^{-}$ with $R\in{\mathbf{R}}$). Recalling the semantics of CQs, this means that $C_{0}^{\mathcal{C}_{0}({\cal K})}(a)=\zeta_{C_{0}}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(a)$ holds. Combining this with $C^{\mathcal{C}({\cal K})}(a)=\bigcup_{{\cal T}\models C_{0}\sqsubseteq C}(C_{0}^{\mathcal{C}_{0}({\cal K})}(a))$ and the definition of $\eta_{C}$ in equation (6), we derive $C^{\mathcal{C}({\cal K})}(a)=\eta_{C}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(a)$, which is equivalent to $q_{C}^{\mathcal{C}({\cal K})}(a)=\eta_{C}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(a)$.
2. Since ${\cal T}$ does not contain role inclusion axioms, for every $i\geq 0$ in the definition of the canonical model for ${\cal K}$, every role $P$, and every pair of individuals $\mathbf{a}$, we have $P^{\mathcal{C}_{i}({\cal K})}(\mathbf{a})=P^{\mathcal{C}_{0}({\cal K})}(\mathbf{a})$. Moreover, by definition of canonical models, we have $\mathcal{C}({\cal K})=\bigcup_{i\geq 0}\mathcal{C}_{i}({\cal K})$, from which, together with the definition of $\cup$ for bag interpretations, we get $P^{\mathcal{C}({\cal K})}=\bigcup_{i\geq 0}P^{\mathcal{C}_{i}({\cal K})}$. Combining this with the fact that $P^{\mathcal{C}_{i}({\cal K})}(\mathbf{a})=P^{\mathcal{C}_{0}({\cal K})}(\mathbf{a})$, we derive $P^{\mathcal{C}({\cal K})}(\mathbf{a})=P^{\mathcal{C}_{0}({\cal K})}(\mathbf{a})$. Last, because $P^{\mathcal{C}_{0}({\cal K})}(\mathbf{a})={\cal A}(P(\mathbf{a}))$, we get $q_{P}^{\mathcal{C}({\cal K})}(\mathbf{a})=q_{P}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(\mathbf{a})$, which proves the claim.
3. The claim follows from the facts that ${\cal T}$ does not contain any role inclusion axioms and that canonical interpretations do not add tuples with repeated elements to role extensions. ∎
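The identity in statement 1 can be illustrated with a small sketch. It assumes that $\cup$ on bags is max union (which is how the proof uses it), that the basic concepts $C_{0}$ with ${\cal T}\models C_{0}\sqsubseteq C$ have been precomputed, and that a bag ABox is stored as a dictionary from assertions to multiplicities. The function names, the encoding, and the example subsumptions are hypothetical and not taken from the paper.

```python
def basic_multiplicity(abox, concept, a):
    """Multiplicity of individual `a` in a basic concept, read off the ABox only.

    Encoding (an assumption of this sketch): concept assertions are keys
    (A, a), role assertions are keys (R, a, b); a basic concept is an atomic
    name A, or ("exists", R), or ("exists_inv", R) for the inverse role.
    """
    if isinstance(concept, str):
        return abox.get((concept, a), 0)
    kind, role = concept
    if kind not in ("exists", "exists_inv"):
        raise ValueError(f"unknown basic concept: {concept!r}")
    position = 1 if kind == "exists" else 2  # which argument must equal `a`
    return sum(
        mult
        for key, mult in abox.items()
        if len(key) == 3 and key[0] == role and key[position] == a
    )


def eta(abox, subsumees, a):
    """Max union over all basic concepts C0 with T |= C0 ⊑ C (given as `subsumees`)."""
    return max((basic_multiplicity(abox, c0, a) for c0 in subsumees), default=0)


abox = {
    ("SalEmp", "ann"): 1,
    ("hasMngr", "ann", "bob"): 2,
    ("hasMngr", "ann", "eve"): 1,
}
# Hypothetical TBox: suppose T entails SalEmp ⊑ C and ∃hasMngr ⊑ C.
print(eta(abox, ["SalEmp", ("exists", "hasMngr")], "ann"))
```

Here the existential concept contributes the sum of the multiplicities of the two role assertions, and the max union then selects the larger of the two candidate values.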

Lemma 39.

For a rooted CQ $q(\mathbf{x})=\exists\mathbf{y}.\,\phi(\mathbf{x},\mathbf{y})$ , a $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ ontology ${\cal K}=\langle{\cal T},{\cal A}\rangle$ , and variables $\mathbf{z}\subseteq\mathbf{y}$ that are realisable by ${\cal T}$ , we have $[q_{\mathbf{z}},\mathbf{z}]^{\mathcal{C}({\cal K})}=({\bar{q}}_{\mathbf{z}})^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}$ .

Proof.

Denote by $\mathbf{\bar{z}}$ all terms of $\phi(\mathbf{x},\mathbf{y})$ not appearing in $\mathbf{z}$ , by $\phi_{\mathbf{\bar{z}}}$ the subconjunction of $\phi(\mathbf{x},\mathbf{y})$ that consists of all atoms and equalities mentioning only terms in $\mathbf{\bar{z}}$ , and set $\mathbf{y}_{\mathbf{\bar{z}}}=\mathbf{y}\setminus\mathbf{z}$ . Then, recalling the definition of $\phi_{\mathbf{z}^{\prime}}$ in Section 5, $q$ is written as

[TABLE]

For a ma-connected subset $\mathbf{z}^{\prime}$ of $\mathbf{z}$ and for $P\in{\mathbf{R}}$ , let $R_{\mathbf{z}^{\prime}}=P$ , if $\alpha_{\mathbf{z}^{\prime}}=P(t,z)$ , and $R_{\mathbf{z}^{\prime}}=P^{-}$ , if $\alpha_{\mathbf{z}^{\prime}}=P(z,t)$ . Last, denote term $t$ and variable $z$ appearing in $\alpha_{\mathbf{z}^{\prime}}$ by $t_{\mathbf{z}^{\prime}}$ and $z_{\mathbf{z}^{\prime}}$ , respectively, and let $\mathbf{z}_{\mathbf{z}}=\bigcup_{\text{ma-connected }\mathbf{z}^{\prime}\subseteq\mathbf{z}}\{z_{\mathbf{z}^{\prime}}\mid z_{\mathbf{z}^{\prime}}\text{ appears in }\alpha_{\mathbf{z}^{\prime}}\}$ . Based on this, query $q_{\mathbf{z}}$ takes the form

[TABLE]

Last, denote by ${\sf apply}(\phi_{\mathbf{\bar{z}}},\eta)$ the formula obtained from $\phi_{\mathbf{\bar{z}}}$ such that each occurrence of an atom $A(t)$ in $\phi_{\mathbf{\bar{z}}}$ , where $A\in{\mathbf{C}}$ and $t\in{\mathbf{I}}\cup{\mathbf{X}}$ , is replaced with $\eta_{A}(t)$ (defined in equation (6)). Recalling equation (7) that defines $\theta_{\exists R}(t)$ for a role $R$ and a term $t$ , query $\bar{q}_{\mathbf{z}}$ takes the form

[TABLE]

Consider now Definition 23 and the set of valuations accounting for bag $[q_{\mathbf{z}},\mathbf{z}]^{\mathcal{C}({\cal K})}$. All such valuations map a variable in $\phi_{\mathbf{\bar{z}}}$ to an individual. Because ${\sf apply}(\phi_{\mathbf{\bar{z}}},\eta)$ replaces unary atoms $A(t)$ in $\phi_{\mathbf{\bar{z}}}$ with $\eta_{A}(t)$ leaving any binary atoms intact, by Lemma 38 (Case 1 for unary atoms and Case 2), it follows that the evaluation of $\phi_{\mathbf{\bar{z}}}$ over $\mathcal{C}({\cal K})$ coincides with the evaluation of ${\sf apply}(\phi_{\mathbf{\bar{z}}},\eta)$ over $\mathcal{C}(\langle\emptyset,{\cal A}\rangle)$. It remains to be shown that, for each ma-connected subset $\mathbf{z}^{\prime}$ of $\mathbf{z}$, equality $[\exists z_{\mathbf{z}^{\prime}}.\,\alpha_{\mathbf{z}^{\prime}},z_{\mathbf{z}^{\prime}}]^{\mathcal{C}({\cal K})}=(\theta_{\exists R_{\mathbf{z}^{\prime}}}(t_{\mathbf{z}^{\prime}}))^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}$ holds. Then, the claim will follow because $q_{\mathbf{z}}$ and $\bar{q}_{\mathbf{z}}$ share the same equalities. To see this last equality, observe that for an individual $a\in{\mathbf{I}}$ (or for $a=t_{\mathbf{z}^{\prime}}$ if $t_{\mathbf{z}^{\prime}}\in{\mathbf{I}}$), the multiplicity for $[\exists z_{\mathbf{z}^{\prime}}.\,\alpha_{\mathbf{z}^{\prime}},z_{\mathbf{z}^{\prime}}]^{\mathcal{C}({\cal K})}(a)$ corresponds to the number of anonymous elements associated with $a$ in extension $\exists R_{\mathbf{z}^{\prime}}$ under $\mathcal{C}({\cal K})$. Recalling the definition of canonical models and the proof of Lemma 38, number $[\exists z_{\mathbf{z}^{\prime}}.\,\alpha_{\mathbf{z}^{\prime}},z_{\mathbf{z}^{\prime}}]^{\mathcal{C}({\cal K})}(a)$ can be written successively as follows:

[TABLE]

∎

See Theorem 29.

Proof.

Let $q$ be of the form $q(\mathbf{x})=\exists\mathbf{y}.\phi(\mathbf{x},\mathbf{y})$ . From (1) and then by Lemmas 27 and 39, we have

[TABLE]

∎

See Corollary 30.

Proof.

The claim is an immediate consequence of Theorems 16 and 29, and the fact that the rewriting of a CQ $q$ depends only on $q$ and the TBox. ∎

See Theorem 31.

Proof.

For combined complexity, NP-hardness comes from reducing CQ answering in $\textit{DL-Lite}_{\textit{core}}$ to CQ answering for rooted queries in $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$. It is known that given a CQ $q$ in $\textit{DL-Lite}_{\textit{core}}$, an ontology ${\cal K}=\langle{\cal T},{\cal A}\rangle$, and a tuple of individuals $\mathbf{a}$, the problem of deciding whether $\mathbf{a}\in q^{{\cal K}}$ is NP-complete even when all variables in $q$ are free (i.e., $q$ is rooted). Then, NP-hardness follows from Proposition 10.

We now discuss membership in NP. $\textsc{BagCert}[\textup{rooted CQs},\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}]$ decides for any given ontology ${\cal K}=\langle{\cal T},{\cal A}\rangle$, rooted query $q(\mathbf{x})=\exists\mathbf{y}.\,\phi(\mathbf{x},\mathbf{y})$, tuple of individuals $\mathbf{a}$, and number $k\in\mathbb{N}^{\infty}_{0}$, whether $q^{\cal K}(\mathbf{a})\geq k$. Without loss of generality, in the following we assume that $k$ is a positive integer, since for the cases of $k$ being $0$ or $\infty$, $\textsc{BagCert}[\textup{rooted CQs},\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}]$ is always true or false, respectively. By Theorem 16 and the definition of universal models, we have $q^{\cal K}(\mathbf{a})=q^{\mathcal{C}({\cal K})}(\mathbf{a})$. Hence, in the following, we focus on the canonical model of ${\cal K}$. By the semantics of bag query answering, we might have a number of valuations from the terms of $q$ to $\Delta^{\mathcal{C}({\cal K})}$ that contribute to the multiplicity of $q^{\mathcal{C}({\cal K})}(\mathbf{a})$. In the worst case, each valuation may contribute $1$ to this multiplicity, which means that to verify whether $q^{\mathcal{C}({\cal K})}(\mathbf{a})\geq k$ holds, we need at most $k$ different valuations $\lambda_{1},\dots,\lambda_{k}$ that send the atoms of $q$ to tuples in $\mathcal{C}({\cal K})$, bind $\mathbf{x}$ to $\mathbf{a}$, and satisfy the equalities of $q$. By Theorem 18 the images of $q$ under any of these valuations fall into $\mathcal{C}_{n}({\cal K})$ where $n$ is the number of atoms of $q$. 
Based on these observations, to prove membership in NP, we describe how we can obtain a tuple $\langle{\cal J},\lambda_{1},\dots,\lambda_{k}\rangle$ where ${\cal J}$ is a subinterpretation of $\mathcal{C}_{n}({\cal K})$ and each $\lambda_{i}:\mathbf{x}\cup\mathbf{y}\cup{\mathbf{I}}\to\Delta^{\cal J}$ satisfies $\lambda_{i}(\mathbf{x})=\mathbf{a}$, and verify that the multiplicity of $q^{\cal J}(\mathbf{a})$ with respect to $\lambda_{1},\dots,\lambda_{k}$ is at least $k$. Then, we also prove that $\langle{\cal J},\lambda_{1},\dots,\lambda_{k}\rangle$ has size $N$ and that verification can be done in time $T$ such that both $N$ and $T$ are polynomial in the size of ${\cal K}$, $q$, and number $k$. To obtain $\langle{\cal J},\lambda_{1},\dots,\lambda_{k}\rangle$ we guess (g1) an interpretation ${\cal J}=\mathcal{C}_{1}({\cal K})\cup\bigcup_{i=1}^{k}{\cal J}_{i}$ where each ${\cal J}_{i}$ is a subinterpretation of $\bigcup_{j=2}^{n}\mathcal{C}_{j}({\cal K})$ and (g2) $k$ different valuations $\lambda_{1},\dots,\lambda_{k}$ such that $\lambda_{i}:\mathbf{x}\cup\mathbf{y}\cup\mathbf{I}\to\Delta^{\cal J}$ and $\lambda_{i}(\mathbf{x})=\mathbf{a}$. For the verification, we (v1) check which of the valuations $\lambda_{1},\dots,\lambda_{k}$ satisfy the equalities of $q$, letting $\Lambda_{=}$ be the corresponding subset, and (v2) compute quantity $m=\sum_{\lambda\in\Lambda_{=}}\prod_{S(\mathbf{t})\text{ in }\phi(\mathbf{x},\mathbf{y})}S^{{\cal J}}(\lambda(\mathbf{t}))$ for checking whether $m\geq k$.
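Steps (v1) and (v2) amount to a direct evaluation of the sum of products defining $m$. The following is a minimal sketch, assuming the guessed interpretation ${\cal J}$ is stored as nested dictionaries and each valuation as a dictionary on variables, with individuals mapped to themselves. All names and the data layout are hypothetical; the guessing steps (g1) and (g2) are not modelled.

```python
from math import prod


def certificate_multiplicity(atoms, equalities, interp, valuations):
    """Compute m: the sum, over the distinct valuations that satisfy the
    equalities, of the product of the multiplicities of the atom images.

    atoms:      list of (predicate, tuple_of_terms)
    equalities: list of (term, term)
    interp:     dict predicate -> dict tuple_of_elements -> multiplicity
    valuations: list of dicts variable -> element; a term missing from a
                valuation is treated as an individual mapped to itself
    """
    # The certificate asks for pairwise different valuations, so drop repeats.
    distinct = {frozenset(lam.items()) for lam in valuations}
    m = 0
    for items in distinct:
        lam = dict(items)

        def image(term, lam=lam):
            return lam.get(term, term)

        # (v1) keep only valuations that satisfy every equality of q
        if any(image(s) != image(t) for s, t in equalities):
            continue
        # (v2) multiply the multiplicities of the images of all atoms
        m += prod(
            interp.get(pred, {}).get(tuple(image(t) for t in terms), 0)
            for pred, terms in atoms
        )
    return m


atoms = [("R", ("x", "y")), ("A", ("y",))]
interp = {
    "R": {("a", "b"): 2, ("a", "c"): 1},
    "A": {("b",): 3, ("c",): 1},
}
valuations = [{"x": "a", "y": "b"}, {"x": "a", "y": "c"}]
k = 5
print(certificate_multiplicity(atoms, [], interp, valuations) >= k)
```

The loop touches each valuation once and performs one dictionary lookup per atom, which matches the polynomial bound on verification time argued in the text.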

We now elaborate on the guessing of ${\cal J}$ . The guessed ${\cal J}$ is such that $\Delta^{\cal J}$ is a finite set comprising all individuals appearing in ${\cal A}$ and a number of anonymous elements of the form $w^{j}_{u,R}$ where $j$ is a positive number, $u$ an element of $\Delta^{\cal J}$ , and $R$ a role in ${\cal T}$ . Further ${\cal J}$ contains finite bag extensions for every concept or role $S$ appearing in ${\cal T}$ or ${\cal A}$ . The part of ${\cal J}$ corresponding to $\mathcal{C}_{1}({\cal K})$ can be trivially computed from the assertions of ${\cal A}$ and the axioms of ${\cal T}$ . To avoid an exponential computation, however, we guess the remaining interpretations ${\cal J}_{1},\dots,{\cal J}_{k}$ using a non-deterministic algorithm having $n-1$ steps for each ${\cal J}_{i}$ . Initially, ${\cal J}_{i}$ is set to $\mathcal{C}_{1}({\cal K})$ . At each step, the algorithm picks a tuple $\mathbf{t}$ from an extension $S^{{\cal J}_{i}}$ with $S\in{\mathbf{C}}\cup{\mathbf{R}}$ and a concept $D$ appearing in ${\cal T}$ such that the following conditions are satisfied:

–

if $S\in{\mathbf{C}}$ , then ${\cal T}\models S\sqsubseteq D$ , $\mathbf{t}=w^{j}_{u,R}$ , and $D^{{\cal J}_{i}}(w^{j}_{u,R})=0$ where $w^{j}_{u,R}\in\Delta^{{\cal J}_{i}}$ ;

–

if $S\in{\mathbf{R}}$ , then ${\cal T}\models\exists S^{-}\sqsubseteq D$ , $\mathbf{t}=(u,w^{j}_{u,R})$ , and $D^{{\cal J}_{i}}(w^{j}_{u,R})=0$ where $u,w^{j}_{u,R}\in\Delta^{{\cal J}_{i}}$ (resp., ${\cal T}\models\exists S\sqsubseteq D$ , $\mathbf{t}=(w^{j}_{u,R^{-}},u)$ , $w^{j}_{u,R^{-}}\in\Delta^{{\cal J}_{i}}$ ).

Then, if $D\in{\mathbf{C}}$ , it sets $D^{{\cal J}_{i}}(w^{j}_{u,R})=1$ ; if $D=\exists P$ with $P\in{\mathbf{R}}$ , it sets $P^{{\cal J}_{i}}((w^{j}_{u,R},w^{1}_{w^{j}_{u,R},P}))=1$ and adds $w^{1}_{w^{j}_{u,R},P}$ to $\Delta^{{\cal J}_{i}}$ ; and if $D=\exists P^{-}$ with $P\in{\mathbf{R}}$ , it sets $P^{{\cal J}_{i}}((w^{1}_{w^{j}_{u,R},P^{-}},w^{j}_{u,R}))=1$ and adds $w^{1}_{w^{j}_{u,R},P^{-}}$ to $\Delta^{{\cal J}_{i}}$ . It can be readily verified that each ${\cal J}_{i}$ can be any subinterpretation of $\mathcal{C}_{n}({\cal K})$ that always includes $\mathcal{C}_{1}({\cal K})$ and potentially tuples ${\mathbf{t}}_{1},\dots,{\mathbf{t}}_{l}$ that would have been created respectively in $\mathcal{C}_{1}({\cal K}),\dots,\mathcal{C}_{l}({\cal K})$ such that ${\mathbf{t}}_{j}\cap{\mathbf{t}}_{j+1}\not=\emptyset$ with $1\leq j<l$ and $l\in[2,n]$ .

What remains to be shown is that the size, $N$ , of tuple $\langle{\cal J},\lambda_{1},\dots,\lambda_{k}\rangle$ and time, $T$ , needed to verify that this tuple is a certificate for $q^{\mathcal{C}({\cal K})}(\mathbf{a})\geq k$ are polynomials in the size of ${\cal K}$ , $q$ , and number $k$ . Consider first size $N$ . It is easy to see that $\mathcal{C}_{1}({\cal K})$ has a size that is polynomial in the size of ${\cal A}$ and ${\cal T}$ . The remaining parts of ${\cal J}_{1},\dots,{\cal J}_{k}$ are linear in the size of $q$ by construction. Therefore, ${\cal J}$ is of polynomial size with respect to ${\cal K}$ , $q$ , and number $k$ . As for the size of $\lambda_{1},\dots,\lambda_{k}$ , because $q$ has $n$ atoms, it contains at most $2n$ terms, hence each valuation can be represented by $2n$ pairs $(t,d)$ where $t$ is a term in $q$ and $d\in\Delta^{\cal J}$ . Therefore, overall, $N$ is polynomial in the size of ${\cal A}$ , $q$ , and number $k$ . Consider now quantity $T$ . Step (v1) takes time $\Theta(n)$ . Step (v2) takes polynomial time in the size of $q$ , ${\cal A}$ , and number $k$ . To see this, first observe that retrieval of the multiplicities involved in a product for a specific valuation takes $O(n\times|{\cal J}|)$ time where $|{\cal J}|$ is the sum of the cardinalities of all extensions in ${\cal J}$ . Each such number $l$ is determined by the maximum multiplicity in ${\cal A}$ and can be represented in binary using $\log l$ bits. Second, multiplication of $n$ such numbers can be done in polynomial time, while the result, $m$ , can be represented using $n\log l$ bits. Since $|{\cal J}|$ is a polynomial determined by the input, overall verification can be done in polynomial time. This proves that $\textsc{BagCert}[\textup{rooted CQs},\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}]\in\textsc{NP}$ . 
Since we have also shown that it is NP-hard, we conclude that $\textsc{BagCert}[\textup{rooted CQs},\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}]$ is NP-complete.

For data complexity, it suffices to prove that for any fixed $\textit{DL-Lite}_{\textit{core}}^{\textit{bag}}$ TBox ${\cal T}$ and any fixed rooted CQ $q$ , the problem of checking whether $q^{\langle{\cal T},{\cal A}\rangle}(\mathbf{a})\geq k$ is in LogSpace for an input ABox ${\cal A}$ , a tuple of individuals $\mathbf{a}$ , and a $k\in\mathbb{N}^{\infty}_{0}$ . The claim follows from Theorem 16 and Theorem 29, which in combination, allow us to decide whether $q^{\langle{\cal T},{\cal A}\rangle}(\mathbf{a})\geq k$ by computing the rewriting $\bar{q}$ of $q$ in constant time and then deciding whether ${\bar{q}}^{\mathcal{C}(\langle\emptyset,{\cal A}\rangle)}(\mathbf{a})\geq k$ holds. By Proposition 22, the latter problem is $\textsc{AC}^{0}$ reducible to $\textsc{BALG}^{1}_{\varepsilon}$ , which is known to be strictly included in LogSpace, hence, the claim follows immediately. Computation of $\bar{q}$ can be done in constant time because of the following: (i) the number of all possible subsets $\mathbf{z}$ of $\mathbf{y}$ participating in the realisability check by ${\cal T}$ is constant; (ii) computing all ma-connected subsets of $\mathbf{z}$ can be done in constant time; (iii) checking whether each ma-connected subset of $\mathbf{z}$ is realisable by ${\cal T}$ can be done in constant time by employing Theorem 18 for bounding the depth of ${\mathcal{C}(\langle{\cal T},{\cal A}^{\prime}\rangle)}$ to a constant number (in particular, to the number of atoms of $q$ ); and (iv) constructing the rewriting of the original query for each realisable ma-connected subset of $\mathbf{z}$ can be done in constant time. ∎
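Point (i) rests on the observation that the candidate sets $\mathbf{z}$ are exactly the subsets of the existential variables of the fixed query, so there are $2^{|\mathbf{y}|}$ of them regardless of the ABox. A small sketch of that enumeration follows; the function name is hypothetical, and the realisability check itself is not modelled.

```python
from itertools import combinations


def candidate_subsets(existential_vars):
    """All subsets z of the existential variables y of a fixed query."""
    ys = sorted(existential_vars)
    return [frozenset(c) for r in range(len(ys) + 1) for c in combinations(ys, r)]


subsets = candidate_subsets({"y1", "y2", "y3"})
print(len(subsets))
```

Since the query is fixed in data complexity, the length of this list is a constant that does not depend on the input ABox.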

See Theorem 32.

Proof.

We prove that there exists a $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ TBox ${\cal T}$ and a rooted CQ $q$ such that checking whether $q^{\langle{\cal T},{\cal A}\rangle}(\langle\rangle)\geq k$ for an input bag ABox ${\cal A}$ and $k\in\mathbb{N}^{\infty}_{0}$ is coNP-hard. To prove this claim, we give a reduction similar to the one in the proof of Theorem 13. We show that if $G=\langle V,E\rangle$ is an undirected and connected graph with no self-loops, then $G$ is not 3-colourable if and only if $q^{\langle{\cal T},{\cal A}_{G}\rangle}(r)\geq 3\times|V|+2$ where ${\cal T}$ is the TBox $\{Vertex\sqsubseteq\exists hasColour,hasColour\sqsubseteq Assign\}$, ${\cal A}_{G}$ is an ABox constructed based on $G$, and $q(w)$ is the rooted query

[TABLE]

Let ${\mathbf{I}}\supseteq V\cup\{a,r,g,b\}$ . ABox ${\cal A}_{G}$ is defined so that it contains the following assertions:

–

$Vertex(u)$ for each $u\in V$ ,

–

$Edge(u,v)$ , $Edge(v,u)$ for each $(u,v)\in E$ ,

–

$Assign(u,r)$ , $Assign(u,g)$ , $Assign(u,b)$ for each $u\in V$ ,

–

$Vertex(a)$ , $Edge(a,a)$ , $hasColour(a,r)$ , $Assign(a,r)$ for an auxiliary vertex $a\notin V$ ,

–

$Reachable(a,a)$ , $Reachable(a,u)$ and $Reachable(u,a)$ for every $u\in V$ , $Reachable(u,v)$ and $Reachable(v,u)$ for every $u,v\in V$ with $u\not=v$ .
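The construction of ${\cal A}_{G}$ listed above can be sketched as follows. The sketch assumes that every assertion has multiplicity 1 (the list does not state multiplicities) and stores the bag ABox as a dictionary; the function names are hypothetical. The second helper counts pairs $(k,l)$ with $Reachable(x,k)$ and $Assign(k,l)$, which is an assumed reading of how $Reachable$ and $Assign$ are combined for counting.

```python
def build_abox(vertices, edges, aux="a", colours=("r", "g", "b")):
    """Bag ABox A_G as a dict assertion -> multiplicity.

    All multiplicities are set to 1, which is an assumption of this sketch.
    """
    abox = {}

    def add(*assertion):
        abox[assertion] = 1

    for u in vertices:
        add("Vertex", u)
        for c in colours:
            add("Assign", u, c)
    for u, v in edges:
        add("Edge", u, v)
        add("Edge", v, u)
    # Auxiliary vertex a: a self-loop, coloured r, with r as its only listed colour.
    add("Vertex", aux)
    add("Edge", aux, aux)
    add("hasColour", aux, colours[0])
    add("Assign", aux, colours[0])
    # Reachable relates all distinct elements of V ∪ {a}, plus the pair (a, a).
    add("Reachable", aux, aux)
    for u in vertices:
        add("Reachable", aux, u)
        add("Reachable", u, aux)
        for v in vertices:
            if u != v:
                add("Reachable", u, v)
    return abox


def reachable_assignments(abox, x):
    """Number of pairs (k, l) with Reachable(x, k) and Assign(k, l) in the ABox."""
    targets = [key[2] for key in abox if key[0] == "Reachable" and key[1] == x]
    return sum(
        1 for k in targets for key in abox if key[0] == "Assign" and key[1] == k
    )


triangle = build_abox(["v1", "v2", "v3"], [("v1", "v2"), ("v2", "v3"), ("v1", "v3")])
print(len(triangle), reachable_assignments(triangle, "a"))
```

For a graph with $|V|$ vertices, the count for the auxiliary vertex is $3\times|V|+1$ over the ABox alone, and the count for a vertex of $V$ is $3\times|V|-2$.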

Role $hasColour$ represents a colour assignment to the vertices of $G$; this is also imposed by axiom $Vertex\sqsubseteq\exists hasColour$. Role $Assign$ provides a pre-defined list of colours for every vertex of $G$ that favours 3-colour assignments based on the colours $r$, $g$, and $b$. Any proper assignment of $G$ uses each of the colours at most $|V|$ times. However, if an assignment is not proper and exhausts the number of available colours (i.e., by assigning multiple colours to the same vertex) or uses an additional colour, these will have to be added to role $Assign$ due to the axiom $hasColour\sqsubseteq Assign$. Role $Reachable$ acts as an accessibility relation between individuals. This property is used for counting the total number of available colours among all vertices.

We next show that $G$ is not 3-colourable if and only if $q^{\langle{\cal T},{\cal A}_{G}\rangle}(r)\geq 3\times|V|+2$ .

“ $\Rightarrow$ ” Let $G$ be non-3-colourable. Consider a model ${\cal I}$ of $\langle{\cal T},{\cal A}_{G}\rangle$ (which exists since $\langle{\cal T},{\cal A}_{G}\rangle$ is satisfiable) such that, if $\gamma:V\to\{r,g,b\}$ is an assignment of colours to the vertices of $G$ , then for $u\not=a$ , $hasColour^{\cal I}((u^{\cal I},c^{\cal I}))=1$ if and only if $\gamma(u)=c$ with $c\in\{r,g,b\}$ . Since $G$ is not 3-colourable, then, for all assignments $\gamma$ , there exists at least an edge $(u,v)\in E$ with $\gamma(u)=\gamma(v)=c$ . Without loss of generality assume that $c=r$ . Consequently, for all models ${\cal I}$ , $hasColour^{\cal I}$ contains tuples $(u^{\cal I},c^{\cal I})$ and $(v^{\cal I},c^{\cal I})$ , and hence, subquery

[TABLE]

matches at least twice, each match contributing multiplicity $1$; one match corresponds to valuation $\nu_{1}=\{x/u^{\cal I},y/v^{\cal I},z/c^{\cal I},w/r^{\cal I}\}$ and one to $\nu_{2}=\{x/a^{\cal I},y/a^{\cal I},z/r^{\cal I},w/r^{\cal I}\}$ (note that we are considering only valuations $\nu$ with $\nu(w)=r^{\cal I}$). Extending the above query to $q(w)$, we observe that $\nu_{1}$ can be extended with variables $k$ and $l$ in $3\times|V|-2$ ways. To see this, observe that every node in $V$ is related to $|V|$ other nodes in $Reachable^{\cal I}$ of which $|V|-1$ are related to at least $3$ colours in $Assign^{\cal I}$ while the other one, namely $a^{\cal I}$, is related to at least $1$. Similarly, $\nu_{2}$ can be extended with variables $k$ and $l$ in $3\times|V|+1$ ways. Therefore, $q$ has at least $6\times|V|-1$ matches for every model ${\cal I}$ following a proper 3-colour assignment, and hence, $3\times|V|+2$ is a certain multiplicity for $r$, as required. Clearly, the same statement holds for all of the models that add additional elements in $Vertex$, $Edge$, or assign multiple colours to some vertices exceeding the number of available colours. What is left to consider is those models that assign additional colours to vertices and not just one among $r$, $g$, and $b$. For such colour assignments, $G$ might turn out to be colourable. Suppose $G$ is 4-colourable (if it is not, then the above discussion carries over) and let $p\in{\mathbf{I}}$. Then, there exists a model that follows a 4-colour assignment $\gamma:V\to\{r,g,b,p\}$ such that $\gamma(u)\not=\gamma(v)$ for every $(u,v)\in E$. Therefore, for that model we would get just one match for subquery $q_{1}(x,y,z,w)$ corresponding to valuation $\nu_{2}$. 
On the other hand, given the observations above, that model would have associated at least one vertex to colour $p$ in the extension of $hasColour^{\cal I}$, and hence in $Assign^{\cal I}$, effectively increasing by one the number of colours to which that vertex is associated. Therefore, extending the above subquery to $q$, we observe that $\nu_{2}$ can be extended with variables $k$ and $l$ in at least $3\times|V|+2$ ways. Clearly, the same holds for models that make use of further colours. Therefore, $q^{\langle{\cal T},{\cal A}_{G}\rangle}(r)\geq 3\times|V|+2$.
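The counts used in this direction can be spelled out as follows. Here $\mathrm{ext}(\nu)$ is shorthand introduced only for this summary and denotes the number of ways in which $\nu$ can be extended with $k$ and $l$; the bounds are the ones stated in the argument above.

$$
\begin{aligned}
\mathrm{ext}(\nu_{1}) &\geq 3\times(|V|-1)+1 = 3\times|V|-2, \\
\mathrm{ext}(\nu_{2}) &\geq 3\times|V|+1, \\
\mathrm{ext}(\nu_{1})+\mathrm{ext}(\nu_{2}) &\geq 6\times|V|-1 \;\geq\; 3\times|V|+2 \quad\text{whenever } |V|\geq 1, \\
\mathrm{ext}(\nu_{2}) &\geq (3\times|V|+1)+1 = 3\times|V|+2 \quad\text{when an additional colour } p \text{ is used.}
\end{aligned}
$$

The third line covers models based on assignments over $\{r,g,b\}$, where both $\nu_{1}$ and $\nu_{2}$ match; the fourth covers models that use a further colour, where only $\nu_{2}$ is guaranteed to match but $Assign^{\cal I}$ grows by at least one tuple.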

“ $\Leftarrow$ ” Let $G$ be 3-colourable. It suffices to show that there exists a model ${\cal I}$ for which $q^{\cal I}(r)=m$ with $m<3\times|V|+2$ . Since $G$ is 3-colourable, there is an assignment $\gamma:V\to\{r,g,b\}$ such that, for every $(u,v)\in E$ , $\gamma(u)\not=\gamma(v)$ . Consider an interpretation ${\cal I}_{\gamma}$ defined as follows:

[TABLE]

Interpretation ${\cal I}_{\gamma}$ is defined based on the contents of $V$ , $E$ , and the 3-colour assignment $\gamma$ . It is easy to verify that ${\cal I}_{\gamma}$ is a model of $\langle{\cal T},{\cal A}_{G}\rangle$ . Next, we show that $q^{{\cal I}_{\gamma}}(r)=3\times|V|+1$ . First, we observe that subquery $q_{1}(x,y,z,w)$ matches exactly once, i.e., under valuation $\nu=\{x/d_{a},y/d_{a},z/d_{r},w/d_{r}\}$ . This holds because $\gamma$ is a proper 3-colouring of $G$ and, for every $(u,v)\in E$ , $\gamma(u)\not=\gamma(v)$ . Note also that extending the above subquery to $q(w)$ , valuation $\nu$ can be extended with variables $k$ and $l$ in $3\times|V|+1$ ways. Consequently, $q^{{\cal I}_{\gamma}}(r)=3\times|V|+1$ . ∎

Remark 2.

When the UNA is dropped, we can use an argument similar to the one given in Remark 1 to reduce the problem of non-3-colourability of undirected graphs to that of query answering over $\textit{DL-Lite}_{\cal R}^{\textit{bag}}$ ontologies.


The Bag Semantics of Ontology-Based Data Access††thanks: This work was supported by the Royal Society under a University Research
