Subsumption of Weakly Well-Designed SPARQL Patterns is Undecidable
Mark Kaminski, Egor V. Kostylev

TL;DR
This paper proves that determining whether one weakly well-designed SPARQL pattern subsumes another is an undecidable problem, highlighting a significant computational complexity difference from well-designed patterns.
Contribution
It establishes the undecidability of subsumption for weakly well-designed SPARQL patterns, contrasting with known decidability results for well-designed patterns.
Findings
Subsumption is undecidable for weakly well-designed patterns.
Contrasts with decidability of equivalence and containment.
Highlights computational complexity challenges in SPARQL analysis.
Abstract
Weakly well-designed SPARQL patterns are a recent generalisation of well-designed patterns that preserves their good computational properties but also captures almost all patterns that appear in practice. Subsumption is one of the static analysis problems for SPARQL, along with equivalence and containment. In this paper we show that subsumption is undecidable for weakly well-designed patterns, which is in stark contrast to well-designed patterns, and to equivalence and containment.
Mark Kaminski, Egor V. Kostylev
Department of Computer Science, University of Oxford, Oxford, UK
The Resource Description Framework (RDF) [1, 4] is the W3C standard for representing linked data on the Web. SPARQL [11, 3] is the default query language for RDF graphs.
A distinctive feature of SPARQL is the OPTIONAL operator (abbreviated as OPT in this paper), which was introduced to “not reject (solutions) because some part of the query pattern does not match” [11]. This operator naturally accounts for the open world assumption and the fundamental incompleteness of the Web. However, evaluating queries that use OPT is computationally expensive: the corresponding decision problem is PSpace-complete [8, 12], even if only projection-free queries (i.e., patterns) are considered.
Pérez et al. [8] introduced the well-designed fragment of SPARQL queries by imposing a syntactic restriction on the use of variables in OPT-expressions. Well-designed patterns have a lower complexity of query evaluation (the problem is coNP-complete), and such queries also behave more intuitively than arbitrary SPARQL queries and enjoy specific monotonicity properties. However, far from all SPARQL queries are well-designed [9]. The weakly well-designed fragment of SPARQL was recently introduced to overcome this shortcoming: it has the same complexity of evaluation, but also includes almost all queries that appear in practice [5, 6].
Besides evaluation, every query language comes with associated static analysis problems, such as query equivalence and containment. For SPARQL there is also a specific static analysis problem, namely query subsumption [7]. It is known that equivalence and containment are both NP-complete for well-designed patterns, while subsumption is -complete for such queries [7, 10]. Moreover, all three problems are undecidable for well-designed queries with projection [7, 10]. From the results of Zhang et al. [13] it follows that all these problems are undecidable for arbitrary patterns. Finally, equivalence and containment for weakly well-designed patterns are both -complete [5, 6], and it has been claimed that subsumption is complete for the same class on such patterns [5]. In this paper, however, we show that this problem is much harder; in fact, it is undecidable.
1 SPARQL Patterns
We adopt a formalisation of SPARQL that mostly follows [8], but we concentrate on patterns constructed using only basic graph patterns and optional matching.
RDF Graphs. An RDF graph is a labelled graph where nodes can also serve as edge labels. Formally, let $\mathbf{I}$ be a set of IRIs. Then an RDF triple is a tuple $(s, p, o)$ from $\mathbf{I} \times \mathbf{I} \times \mathbf{I}$, where $s$ is called the subject, $p$ the predicate, and $o$ the object. An RDF graph is a finite set of RDF triples.
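SPARQL Syntax. Let $\mathbf{V}$ be an infinite set of variables, disjoint from $\mathbf{I}$. A basic (graph) pattern is a possibly empty set of triples from
$$(\mathbf{I} \cup \mathbf{V}) \times (\mathbf{I} \cup \mathbf{V}) \times (\mathbf{I} \cup \mathbf{V}).$$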
Optional SPARQL graph patterns (or simply patterns) $P$ are defined by the following grammar, where $B$ ranges over basic patterns:
$$P ::= B \mid (P \mathbin{\mathrm{OPT}} P).$$
We denote by $\mathrm{var}(P)$ the set of all variables that appear in a pattern $P$.
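To make the grammar and $\mathrm{var}(P)$ concrete, here is a minimal Python sketch. The encoding is an assumption made only for illustration, not taken from the paper: variables are strings with a leading `?`, a basic pattern is a `frozenset` of triples, and an OPT-pattern is the tuple `("OPT", left, right)`.

```python
def is_var(term: str) -> bool:
    # Assumed convention: variables carry a leading "?", IRIs do not.
    return term.startswith("?")


def var(p) -> set:
    """Return var(P), the set of variables occurring anywhere in pattern p."""
    if isinstance(p, frozenset):          # basic pattern: a set of triples
        return {t for triple in p for t in triple if is_var(t)}
    _, left, right = p                    # OPT-pattern: ("OPT", left, right)
    return var(left) | var(right)


# (?x knows ?y) OPT (?y email ?z)
P = ("OPT",
     frozenset({("?x", "knows", "?y")}),
     frozenset({("?y", "email", "?z")}))
print(sorted(var(P)))  # ['?x', '?y', '?z']
```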
Note that a given pattern can occur more than once within a larger pattern. In what follows we need to distinguish between a (sub-)pattern $P'$ as a possibly repeated building block of another pattern $P$ and its occurrences in $P$, that is, unique subtrees in the parse tree of $P$. The left (right) argument of an occurrence $P'$ is then the subtree rooted in the left (right) child of the root of $P'$ in the parse tree, and an occurrence $P''$ is inside an occurrence $P'$ if the root of $P''$ is a descendant of the root of $P'$.
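A pattern $P$ is well-designed (Pérez et al. [8]) if, for every occurrence of an OPT-pattern $P' = (P_1 \mathbin{\mathrm{OPT}} P_2)$ in $P$, the variables from $\mathrm{var}(P_2) \setminus \mathrm{var}(P_1)$ occur in $P$ only inside $P'$.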
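The well-designedness condition can be checked directly on the parse tree. The sketch below reuses the illustrative encoding (leading `?` for variables, `frozenset` for basic patterns, `("OPT", left, right)` for OPT-patterns) and identifies occurrences by their path from the root, so that repeated subpatterns stay distinct; all function names are hypothetical.

```python
def is_var(term):
    # Assumed convention: variables start with "?".
    return term.startswith("?")


def var(p):
    if isinstance(p, frozenset):                # basic pattern
        return {t for triple in p for t in triple if is_var(t)}
    _, left, right = p                          # ("OPT", left, right)
    return var(left) | var(right)


def occurrences(p, path=()):
    """Yield (path, subpattern) for every occurrence; 0 = left child, 1 = right child."""
    yield path, p
    if not isinstance(p, frozenset):
        yield from occurrences(p[1], path + (0,))
        yield from occurrences(p[2], path + (1,))


def inside(q, p):
    """The occurrence at path q is inside the occurrence at path p (or equal to it)."""
    return q[:len(p)] == p


def well_designed(p):
    leaves = [(q, s) for q, s in occurrences(p) if isinstance(s, frozenset)]
    for path, occ in occurrences(p):
        if isinstance(occ, frozenset):
            continue
        new_vars = var(occ[2]) - var(occ[1])    # var(P2) minus var(P1)
        for q, leaf in leaves:
            # Such a variable must not occur in any leaf outside P'.
            if not inside(q, path) and new_vars & var(leaf):
                return False
    return True


wd = ("OPT", frozenset({("?x", "p", "?y")}), frozenset({("?y", "q", "?z")}))
not_wd = ("OPT", wd, frozenset({("?x", "r", "?z")}))
print(well_designed(wd), well_designed(not_wd))  # True False
```

In `not_wd`, the variable `?z` is introduced by the right argument of the inner OPT but reappears outside it, which violates the condition.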
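Given a pattern $P$, an occurrence $P'$ in $P$ dominates an occurrence $P''$ if there exists an occurrence of an OPT-pattern $(P_1 \mathbin{\mathrm{OPT}} P_2)$ in $P$ such that $P'$ is inside the left argument $P_1$ and $P''$ is inside the right argument $P_2$. A pattern $P$ is weakly well-designed ([5, 6]) if, for each occurrence of an OPT-subpattern $P' = (P_1 \mathbin{\mathrm{OPT}} P_2)$, the variables in $\mathrm{var}(P_2) \setminus \mathrm{var}(P_1)$ appear outside $P'$ only in subpatterns whose occurrences are dominated by $P'$.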
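Weak well-designedness relaxes the previous check only by tolerating escaping variables in dominated subpatterns. A minimal sketch under the same assumed encoding and hypothetical names follows; it inspects only basic-pattern leaves, which suffices because every leaf of a dominated subpattern is itself dominated.

```python
def is_var(term):
    # Assumed convention: variables start with "?".
    return term.startswith("?")


def var(p):
    if isinstance(p, frozenset):                # basic pattern
        return {t for triple in p for t in triple if is_var(t)}
    _, left, right = p                          # ("OPT", left, right)
    return var(left) | var(right)


def occurrences(p, path=()):
    """Yield (path, subpattern) for every occurrence; 0 = left child, 1 = right child."""
    yield path, p
    if not isinstance(p, frozenset):
        yield from occurrences(p[1], path + (0,))
        yield from occurrences(p[2], path + (1,))


def inside(q, p):
    """The occurrence at path q is inside the occurrence at path p (or equal to it)."""
    return q[:len(p)] == p


def dominates(p, a, b):
    """In pattern p, the occurrence at path a dominates the one at path b."""
    return any(
        not isinstance(occ, frozenset)
        and inside(a, path + (0,))              # a is inside the left argument
        and inside(b, path + (1,))              # b is inside the right argument
        for path, occ in occurrences(p))


def weakly_well_designed(p):
    leaves = [(q, s) for q, s in occurrences(p) if isinstance(s, frozenset)]
    for path, occ in occurrences(p):
        if isinstance(occ, frozenset):
            continue
        new_vars = var(occ[2]) - var(occ[1])    # var(P2) minus var(P1)
        for q, leaf in leaves:
            if (not inside(q, path) and new_vars & var(leaf)
                    and not dominates(p, path, q)):
                return False
    return True


inner = ("OPT", frozenset({("?x", "p", "?y")}), frozenset({("?y", "q", "?z")}))
wwd = ("OPT", inner, frozenset({("?x", "r", "?z")}))
bad = ("OPT", frozenset({("?w", "s", "?x")}),
       ("OPT", frozenset({("?x", "q", "?y")}), frozenset({("?y", "r", "?w")})))
print(weakly_well_designed(wwd), weakly_well_designed(bad))  # True False
```

Here `wwd` is weakly well-designed but not well-designed: `?z` escapes the inner OPT only into the right argument of the outer OPT, which is dominated by it. In `bad`, `?w` escapes into the left argument of the outer OPT, which is not dominated.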
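SPARQL Semantics. The semantics of graph patterns is defined in terms of mappings, that is, partial functions $\mu: \mathbf{V} \to \mathbf{I}$ from variables to IRIs. The domain $\mathrm{dom}(\mu)$ of a mapping $\mu$ is the set of variables on which $\mu$ is defined. Two mappings $\mu_1$ and $\mu_2$ are compatible, written $\mu_1 \sim \mu_2$, if $\mu_1(x) = \mu_2(x)$ for all variables $x \in \mathrm{dom}(\mu_1) \cap \mathrm{dom}(\mu_2)$. Mapping $\mu_1$ is subsumed by mapping $\mu_2$, written $\mu_1 \sqsubseteq \mu_2$, if $\mu_1 \sim \mu_2$ and $\mathrm{dom}(\mu_1) \subseteq \mathrm{dom}(\mu_2)$. If $\mu_1 \sim \mu_2$, then $\mu_1 \cup \mu_2$ constitutes a mapping with domain $\mathrm{dom}(\mu_1) \cup \mathrm{dom}(\mu_2)$ that coincides with $\mu_1$ on $\mathrm{dom}(\mu_1)$ and with $\mu_2$ on $\mathrm{dom}(\mu_2)$.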
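The three operations on mappings translate almost literally into code. In this hedged Python sketch a mapping is a plain `dict` from variable names to IRIs; `compatible`, `subsumed` and `union` are hypothetical helper names for $\sim$, $\sqsubseteq$ and $\cup$.

```python
def compatible(m1: dict, m2: dict) -> bool:
    """mu1 ~ mu2: the mappings agree on every variable they share."""
    return all(m1[x] == m2[x] for x in m1.keys() & m2.keys())


def subsumed(m1: dict, m2: dict) -> bool:
    """mu1 ⊑ mu2: compatible, and dom(mu1) ⊆ dom(mu2)."""
    return compatible(m1, m2) and m1.keys() <= m2.keys()


def union(m1: dict, m2: dict) -> dict:
    """mu1 ∪ mu2, defined only for compatible mappings."""
    if not compatible(m1, m2):
        raise ValueError("union is only defined for compatible mappings")
    return {**m1, **m2}


mu1 = {"?x": "alice"}
mu2 = {"?x": "alice", "?y": "bob"}
print(subsumed(mu1, mu2), subsumed(mu2, mu1))  # True False
print(union(mu1, {"?y": "bob"}) == mu2)         # True
```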
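Given two sets of mappings $\Omega_1$ and $\Omega_2$, we define their left outer join operation as follows:
$$\Omega_1 \mathbin{⟕} \Omega_2 = \{\mu_1 \cup \mu_2 \mid \mu_1 \in \Omega_1,\ \mu_2 \in \Omega_2,\ \mu_1 \sim \mu_2\} \cup \{\mu_1 \in \Omega_1 \mid \text{there is no } \mu_2 \in \Omega_2 \text{ such that } \mu_1 \sim \mu_2\}.$$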
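A direct sketch of the left outer join, assuming (as before, purely for illustration) that mappings are `dict`s and that sets of mappings are represented as duplicate-free lists:

```python
def compatible(m1, m2):
    return all(m1[x] == m2[x] for x in m1.keys() & m2.keys())


def left_outer_join(omega1, omega2):
    """Omega1 ⟕ Omega2 over lists of dict-mappings, duplicates removed."""
    result, seen = [], set()
    for m1 in omega1:
        partners = [m2 for m2 in omega2 if compatible(m1, m2)]
        # Extend m1 by every compatible partner, or keep m1 unchanged if none exists.
        for m in ([{**m1, **m2} for m2 in partners] if partners else [dict(m1)]):
            key = frozenset(m.items())
            if key not in seen:
                seen.add(key)
                result.append(m)
    return result


omega1 = [{"?x": "alice"}, {"?x": "dave"}]
omega2 = [{"?x": "alice", "?y": "bob"}]
print(left_outer_join(omega1, omega2))
# [{'?x': 'alice', '?y': 'bob'}, {'?x': 'dave'}]
```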
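Given a graph $G$, the evaluation $[\![P]\!]_G$ of a pattern $P$ over $G$ is defined as follows:
1. if $B$ is a basic pattern, then $[\![B]\!]_G = \{\mu \mid \mathrm{dom}(\mu) = \mathrm{var}(B) \text{ and } \mu(B) \subseteq G\}$, where $\mu(B)$ is obtained from $B$ by replacing every variable $x$ with $\mu(x)$;
2. $[\![(P_1 \mathbin{\mathrm{OPT}} P_2)]\!]_G = [\![P_1]\!]_G \mathbin{⟕} [\![P_2]\!]_G$.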
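Putting the two cases together gives a naive evaluator. The sketch below is an illustration under the same assumed encoding (graphs are Python sets of string triples, function names hypothetical); it evaluates basic patterns by backtracking over their triples and makes no attempt to match the complexity bounds discussed in the introduction.

```python
def is_var(term):
    # Assumed convention: variables start with "?".
    return term.startswith("?")


def compatible(m1, m2):
    return all(m1[x] == m2[x] for x in m1.keys() & m2.keys())


def left_outer_join(omega1, omega2):
    result = []
    for m1 in omega1:
        partners = [{**m1, **m2} for m2 in omega2 if compatible(m1, m2)]
        for m in partners or [dict(m1)]:
            if m not in result:                 # keep set semantics
                result.append(m)
    return result


def match(triple, fact, mu):
    """Extend mu so that it maps the pattern triple onto the graph triple, or return None."""
    mu = dict(mu)
    for term, value in zip(triple, fact):
        if is_var(term):
            if mu.setdefault(term, value) != value:
                return None                     # variable already bound differently
        elif term != value:
            return None                         # IRI mismatch
    return mu


def eval_bgp(bgp, graph):
    """[[B]]_G: all mu with dom(mu) = var(B) and mu(B) a subset of G."""
    answers = [{}]
    for triple in bgp:                          # extend partial matches triple by triple
        extended = []
        for mu in answers:
            for fact in graph:
                m = match(triple, fact, mu)
                if m is not None:
                    extended.append(m)
        answers = extended
    return answers


def evaluate(p, graph):
    if isinstance(p, frozenset):                # basic pattern
        return eval_bgp(p, graph)
    _, left, right = p                          # ("OPT", left, right)
    return left_outer_join(evaluate(left, graph), evaluate(right, graph))


G = {("alice", "knows", "bob"), ("alice", "knows", "carol"),
     ("bob", "email", "bob@example.org")}
P = ("OPT", frozenset({("?x", "knows", "?y")}),
     frozenset({("?y", "email", "?z")}))
for mu in evaluate(P, G):
    print(mu)
```

Over this made-up graph the evaluator returns two mappings, in an order that depends on set iteration: one binding `?x`, `?y`, `?z` to `alice`, `bob`, `bob@example.org`, and one binding only `?x` and `?y` to `alice` and `carol`, because `carol` has no email triple and the OPT keeps the unextended mapping.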
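A pattern $P_1$ is contained in a pattern $P_2$ if $[\![P_1]\!]_G \subseteq [\![P_2]\!]_G$ for every graph $G$. Patterns $P_1$ and $P_2$ are equivalent if they contain each other. Pattern $P_1$ is subsumed by $P_2$, written $P_1 \sqsubseteq P_2$, if, for every graph $G$, each $\mu_1 \in [\![P_1]\!]_G$ has $\mu_2 \in [\![P_2]\!]_G$ such that $\mu_1 \sqsubseteq \mu_2$ (Letelier et al. [7]).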
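The difference between containment and subsumption is already visible on a single graph. The hedged sketch below compares two answer sets computed by hand for one fixed, made-up graph; the helper names are hypothetical. Such a check can refute $P_1 \sqsubseteq P_2$ by exhibiting a graph, but can never establish it, since subsumption quantifies over all graphs.

```python
def compatible(m1, m2):
    return all(m1[x] == m2[x] for x in m1.keys() & m2.keys())


def subsumed(m1, m2):
    return compatible(m1, m2) and m1.keys() <= m2.keys()


def contained_on(ans1, ans2):
    """[[P1]]_G ⊆ [[P2]]_G for one fixed graph G."""
    return all(m1 in ans2 for m1 in ans1)


def subsumed_on(ans1, ans2):
    """Every mu1 in [[P1]]_G has some mu2 in [[P2]]_G with mu1 ⊑ mu2 (one fixed G)."""
    return all(any(subsumed(m1, m2) for m2 in ans2) for m1 in ans1)


# Answers of P1 = {(?x knows ?y)} and P2 = (P1 OPT {(?y email ?z)}) over the graph
# {(alice knows bob), (alice knows carol), (bob email bob@example.org)}.
ans1 = [{"?x": "alice", "?y": "bob"}, {"?x": "alice", "?y": "carol"}]
ans2 = [{"?x": "alice", "?y": "bob", "?z": "bob@example.org"},
        {"?x": "alice", "?y": "carol"}]
print(contained_on(ans1, ans2), subsumed_on(ans1, ans2))  # False True
```

The answer binding `?y` to `bob` is not itself returned by $P_2$, because OPT extends it with `?z`; it is, however, subsumed by that extension.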
2 Pattern Subsumption
Theorem 1
The problem of checking whether $P_1 \sqsubseteq P_2$ for weakly well-designed patterns $P_1$ and $P_2$ is undecidable.
Proof. We prove undecidability by a reduction from a variant of the tiling problem, which is known to be undecidable (see, e.g., [2]). We start by introducing the notation used throughout the proof.
A tiling instance $\mathcal{T}$ consists of a finite set $T$ of tile types and edge compatibility relations $H$ and $V$ on $T$. Intuitively, $(t, t') \in H$ means that a tile of type $t'$ can be placed to the right of a tile of type $t$ in a row, while $(t, t') \in V$ means that $t'$ can be placed above $t$ in a column.
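A tiling of the positive plane with $\mathcal{T}$ is a function $f: \mathbb{N} \times \mathbb{N} \to T$, for the set of natural numbers $\mathbb{N}$, such that, for all $i, j \in \mathbb{N}$,
- $(f(i, j), f(i + 1, j)) \in H$, and
- $(f(i, j), f(i, j + 1)) \in V$.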
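A program can only inspect a finite part of the plane, so the following Python sketch checks the two conditions on a `width` × `height` window. The tile names and relations are made-up examples, and `is_tiling_window` is a hypothetical helper.

```python
def is_tiling_window(f, H, V, width, height):
    """Check the H and V conditions for all adjacent cells inside [0, width) x [0, height)."""
    for i in range(width):
        for j in range(height):
            if i + 1 < width and (f(i, j), f(i + 1, j)) not in H:
                return False                    # horizontal neighbours clash
            if j + 1 < height and (f(i, j), f(i, j + 1)) not in V:
                return False                    # vertical neighbours clash
    return True


H = {("a", "b"), ("b", "a")}   # a and b must alternate within a row
V = {("a", "a"), ("b", "b")}   # each column keeps its tile type


def stripes(i, j):
    return "a" if i % 2 == 0 else "b"


def all_a(i, j):
    return "a"


print(is_tiling_window(stripes, H, V, 10, 10))  # True
print(is_tiling_window(all_a, H, V, 10, 10))    # False
```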
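Tiling $f$ is periodic if there exist positive numbers $h$ and $v$, called horizontal and vertical periods, respectively, such that $f(i, j) = f(i + h, j) = f(i, j + v)$ for all $i, j \in \mathbb{N}$. A periodic tiling can be seen as a tiling of a torus: since columns $i$ and $i + h$ (and rows $j$ and $j + v$) coincide, the $h \times v$ rectangle in the bottom-left corner can have its right side “glued” to its left-most column and its top side to its bottom row.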
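Conversely, a periodic tiling is fully determined by its $h \times v$ corner rectangle: if the rectangle satisfies $H$ and $V$ also across the wrap-around edges of the torus, then $f(i, j) = \mathit{block}[i \bmod h][j \bmod v]$ is a tiling of the positive plane with periods $h$ and $v$. A minimal sketch, with hypothetical names, the made-up tiles from before, and the assumed indexing `block[i][j]` for column `i`, row `j`:

```python
def is_torus_tiling(block, H, V):
    """block[i][j] is the tile in column i, row j of an h x v rectangle.
    Check H and V also across the wrap-around edges of the torus."""
    h, v = len(block), len(block[0])
    return all(
        (block[i][j], block[(i + 1) % h][j]) in H
        and (block[i][j], block[i][(j + 1) % v]) in V
        for i in range(h) for j in range(v))


def periodic_extension(block):
    """The periodic tiling f(i, j) = block[i mod h][j mod v] of the positive plane."""
    h, v = len(block), len(block[0])
    return lambda i, j: block[i % h][j % v]


H = {("a", "b"), ("b", "a")}
V = {("a", "a"), ("b", "b")}
block = [["a"], ["b"]]                      # h = 2 columns, v = 1 row
f = periodic_extension(block)
print(is_torus_tiling(block, H, V))         # True
print(f(3, 5), f(3 + 2, 5), f(3, 5 + 1))    # b b b
```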
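Let $\mathbb{T}$ denote the set of all tiling instances that allow for tilings of the positive plane, and $\mathbb{T}_{\mathrm{per}}$ the set of all tiling instances that allow for periodic tilings. To prove undecidability we will use the following fact.
Fact 1 (Gurevich and Koryakov [2])
Sets $\mathbb{T}_{\mathrm{per}}$ and $\overline{\mathbb{T}}$ (the complement of $\mathbb{T}$) are recursively inseparable, that is, there is no recursive set $R$ with $\mathbb{T}_{\mathrm{per}} \subseteq R \subseteq \mathbb{T}$.
In what follows we first construct, for each tiling instance $\mathcal{T}$, weakly well-designed patterns $P_1$ and $P_2$, and then show that the set
$$R = \{\mathcal{T} \mid P_1 \not\sqsubseteq P_2 \text{ for the patterns } P_1, P_2 \text{ constructed for } \mathcal{T}\}$$
contains $\mathbb{T}_{\mathrm{per}}$ and is contained in $\mathbb{T}$. By Fact 1, this implies that $R$ (and, hence, the complement of $R$) cannot be recursive.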
Let $\mathcal{T}$ be a tiling instance with tile types $T = \{t_1, \dots, t_n\}$ and compatibility relations $H$ and $V$. Let $P_1$ be
[TABLE]
so $P_1$ is a basic pattern with 6 triples, only one of which mentions a variable. The other pattern, $P_2$, has a more complex structure: let $P_2$ be
[TABLE]
where , ,
[TABLE]
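With the construction complete, we next show that $P_1 \not\sqsubseteq P_2$ for any tiling instance $\mathcal{T}$ in $\mathbb{T}_{\mathrm{per}}$. In particular, on the basis of a witnessing periodic tiling we build a graph $G$ and a mapping $\mu$ such that $\mu \in [\![P_1]\!]_G$, but there is no $\mu' \in [\![P_2]\!]_G$ such that $\mu \sqsubseteq \mu'$. Assume that $\mathcal{T}$ has tile types $T = \{t_1, \dots, t_n\}$, compatibility relations $H$ and $V$, and a periodic tiling $f$ with horizontal and vertical periods $h$ and $v$, respectively. Let $G$ consist of the triples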
[TABLE]
as well as the triples
[TABLE]
Let also .
It is immediate to see that . Moreover, assuming that has form (1), consists of mappings sending to one of , to one of , also to one of , while , and to the IRIs accordingly connected to the value of (note that the values of , , and do not depend on each other).
Since the tiling agrees with and , none of basic patterns and has a match in , because each of them requires a pair of horizontally or vertically adjacent cells with incompatible tile types. So, none of the mappings in are extendable to any of and . However, each mapping extends to such that with . In particular, this extension sends to , which implies that . Therefore, and are a witness for the required .
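We continue by showing that $P_1 \not\sqsubseteq P_2$ implies $\mathcal{T} \in \mathbb{T}$ for any tiling instance $\mathcal{T}$. In particular, on the basis of a graph $G$ and a mapping $\mu$ witnessing $P_1 \not\sqsubseteq P_2$ we construct a tiling $f$ of the positive plane with $\mathcal{T}$. Assume that $\mathcal{T}$ has tile types $T = \{t_1, \dots, t_n\}$ as well as compatibility relations $H$ and $V$. Since $\mu \in [\![P_1]\!]_G$, graph $G$ contains triples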
[TABLE]
for the IRI such that . Therefore, assuming that has form (1), contains a mapping sending to . Mapping is extendable to for some ; indeed, if it is not the case, then contains an extension of sending to , because all , , and contain , while matches , which implies contradicting the fact that and are a witness for non-subsumption. Therefore, triples are matched in extending , that is, contains triples
[TABLE]
for some IRI . Just for uniformity, assume that . Therefore, contains a mapping sending to (and all other variables same as ). Reasoning in the same way as for , we obtain that has triples
[TABLE]
for some IRI . Continuing like this, we conclude that contains
[TABLE]
for all (note that many of these coincide, because is finite).
For each , contains a mapping sending to . As before, this mapping is extendable in to for some . In particular, it is extendable to the triples , , and —that is, contains triples
[TABLE]
for some IRI (again, if is 1 or 2, then we assume that is the same as in for uniformity). Similarly as before, contains a mapping sending to , from which we have that has triples
[TABLE]
for some and . Repeating this process, we conclude that contains, for any and ,
[TABLE]
for some and . Set for each and .
We need to show that $f$ is indeed a tiling with $\mathcal{T}$. To this end, we first note that contains the triple for all and : we already showed this fact for , and for all other it can be proved very similarly to the reasoning above, based on the fact that contains a mapping sending , , , and to , , , and , respectively. Now, to see that $f$ is a tiling with $\mathcal{T}$, we just note that if there exist horizontally or vertically adjacent tiles that do not agree with $H$ or $V$, then there exists or such that or is matched in $G$; since this basic pattern does not have any variables in common with , any mapping in is then extendable to this basic pattern, and hence contains a mapping sending to , contradicting the fact that graph $G$ and mapping $\mu$ are a witness for non-subsumption.
- [1] Richard Cyganiak, David Wood, and Markus Lanthaler. RDF 1.1 concepts and abstract syntax. W3C recommendation, W3C, February 2014. http://www.w3.org/TR/rdf11-concepts/.
- [2] Yuri Sh. Gurevich and I. O. Koryakov. Remarks on Berger’s paper on the domino problem. Siberian Mathematical Journal, 13(2):319–321, 1972.
- [3] Steve Harris and Andy Seaborne. SPARQL 1.1 query language. W3C recommendation, W3C, March 2013. http://www.w3.org/TR/sparql11-query/.
- [4] Patrick J. Hayes and Peter F. Patel-Schneider. RDF 1.1 semantics. W3C recommendation, W3C, February 2014. http://www.w3.org/TR/rdf11-mt/.
- [5] Mark Kaminski and Egor V. Kostylev. Beyond well-designed SPARQL. In Wim Martens and Thomas Zeume, editors, Proc. 19th International Conference on Database Theory, ICDT 2016, volume 48 of LIPIcs, pages 5:1–5:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
- [6] Mark Kaminski and Egor V. Kostylev. Complexity and expressive power of weakly well-designed SPARQL. Theory of Computing Systems (ToCS), 62(4):772–809, 2018.
- [7] Andrés Letelier, Jorge Pérez, Reinhard Pichler, and Sebastian Skritek. Static analysis and optimization of semantic web queries. ACM Trans. Database Syst., 38(4):25, 2013.
- [8] Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. Semantics and complexity of SPARQL. ACM Trans. Database Syst., 34(3):16:1–16:45, 2009.
