Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1
Sairam Gurajada, Martin Theobald

TL;DR
This paper presents a scalable distributed architecture for processing complex SPARQL 1.1 graph-pattern queries involving relational joins and graph reachability over large RDF datasets, using partitioning, indexing, and asynchronous communication.
Contribution
It introduces a unified approach to optimizing and processing, in a distributed setting, SPARQL 1.1 queries that combine relational joins with graph-reachability predicates over large RDF collections.
Findings
Efficient partitioning and indexing scheme for RDF data
Parallel processing of SPARQL queries across compute nodes
Scalable handling of complex graph-pattern queries
Abstract
We propose an efficient and scalable architecture for processing generalized graph-pattern queries as they are specified by the current W3C recommendation of the SPARQL 1.1 "Query Language" component. Specifically, the class of queries we consider consists of sets of SPARQL triple patterns with labeled property paths. From a relational perspective, this class resolves to conjunctive queries of relational joins with additional graph-reachability predicates. For the scalable, i.e., distributed, processing of this kind of query over very large RDF collections, we develop a suitable partitioning and indexing scheme, which allows us to shard the RDF triples over an entire cluster of compute nodes and to process an incoming SPARQL query over all of the relevant graph partitions (and thus compute nodes) in parallel. Unlike most prior works in this field, we specifically aim at the unified…
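To make the query class concrete, the sketch below evaluates one graph-pattern query that combines a relational join with a reachability predicate over a sharded triple set. It is an illustration only and not the paper's method: the toy data, the hash-by-subject partitioning in `shard`, the breadth-first search in `reachable`, and all function names are assumptions, and the partitions are scanned sequentially in one process rather than in parallel across compute nodes.

```python
from collections import defaultdict, deque

# Toy RDF graph as (subject, predicate, object) triples; all data is made up.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("alice", "livesIn", "berlin"),
    ("carol", "livesIn", "berlin"),
    ("dave", "livesIn", "paris"),
]


def shard(triples, num_parts):
    """Partition triples by a deterministic hash of the subject (assumed scheme)."""
    parts = [[] for _ in range(num_parts)]
    for s, p, o in triples:
        parts[sum(map(ord, s)) % num_parts].append((s, p, o))
    return parts


def match(parts, predicate):
    """Scan every partition for (subject, object) pairs with the given predicate."""
    return [(s, o) for part in parts for (s, p, o) in part if p == predicate]


def reachable(parts, predicate, source):
    """Nodes reachable from `source` over one or more `predicate` edges (BFS)."""
    adj = defaultdict(list)
    for s, o in match(parts, predicate):
        adj[s].append(o)
    seen, queue = set(), deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def query(parts):
    """SELECT ?x ?y ?c WHERE { ?x livesIn ?c . ?x knows+ ?y . ?y livesIn ?c }"""
    lives = set(match(parts, "livesIn"))
    results = []
    for x, c in lives:
        # Reachability predicate: the property path knows+ starting at ?x.
        for y in reachable(parts, "knows", x):
            # Relational join: ?y must share the binding of ?c with ?x.
            if (y, c) in lives:
                results.append((x, y, c))
    return sorted(results)


print(query(shard(TRIPLES, 3)))
```

The answer does not depend on the number of partitions, since every partition is scanned for each pattern. A real distributed engine would instead use its indexes to contact only the relevant partitions.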
Taxonomy
Topics: Graph Theory and Algorithms · Semantic Web and Ontologies · Advanced Database Systems and Queries
