S2RDF: RDF Querying with SPARQL on Spark

Alexander Schätzle; Martin Przyjaciel-Zablocki; Simon Skilevic; Georg Lausen

arXiv:1512.07021·cs.DB·January 28, 2016·41 cites

TL;DR

S2RDF is a scalable system that enables fast SPARQL querying over large RDF datasets on Spark by using a novel ExtVP partitioning schema, significantly improving performance over existing Hadoop-based solutions.

Contribution

The paper introduces ExtVP, a semi-join based RDF partitioning schema, and implements S2RDF on Spark, achieving high-performance SPARQL querying on billion-triple datasets.
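
To make the partitioning concrete, here is a minimal Python sketch of Vertical Partitioning (VP), which uses one two-column (s, o) table per predicate, plus ExtVP-style semi-join reductions of those tables. It is not the S2RDF implementation, which runs on Spark. The toy triples, the helper names `vertical_partitioning`, `semi_join` and `ext_vp`, and the `threshold` parameter are illustrative assumptions. Following the full paper, the sketch assumes three correlation types between two predicate tables: subject-subject (SS), object-subject (OS) and subject-object (SO).

```python
from collections import defaultdict

# Hypothetical toy RDF graph: (subject, predicate, object) triples.
TRIPLES = [
    ("A", "follows", "B"),
    ("B", "follows", "C"),
    ("B", "follows", "D"),
    ("C", "follows", "D"),
    ("A", "likes", "I1"),
    ("A", "likes", "I2"),
    ("C", "likes", "I2"),
]

def vertical_partitioning(triples):
    """VP: one (s, o) table per predicate."""
    vp = defaultdict(list)
    for s, p, o in triples:
        vp[p].append((s, o))
    return dict(vp)

def semi_join(left, right, left_col, right_col):
    """left ⋉ right: keep rows of `left` whose value in `left_col`
    occurs in column `right_col` of `right` (column 0 = s, 1 = o)."""
    keys = {row[right_col] for row in right}
    return [row for row in left if row[left_col] in keys]

# Correlation type -> (join column in the p1 table, join column in the p2 table).
CORRELATIONS = {"SS": (0, 0), "OS": (1, 0), "SO": (0, 1)}

def ext_vp(vp, threshold=1.0):
    """Precompute semi-join reductions of VP_p1 with respect to VP_p2 for
    every predicate pair and correlation type. A reduced table is kept only
    if its selectivity |reduced| / |VP_p1| is below `threshold` (assumed
    parameter), so tables identical to their VP source are dropped."""
    ext = {}
    for p1, t1 in vp.items():
        for p2, t2 in vp.items():
            for corr, (c1, c2) in CORRELATIONS.items():
                reduced = semi_join(t1, t2, c1, c2)
                if len(reduced) / len(t1) < threshold:
                    ext[(corr, p1, p2)] = reduced
    return ext
```

For example, `ext_vp(vertical_partitioning(TRIPLES))[("OS", "follows", "likes")]` keeps only `("B", "C")`, one of the four `follows` rows, because C is the only object of `follows` that also appears as a subject of `likes`. Dropping tables whose selectivity reaches the threshold trades a little input reduction for less storage.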

Findings

1. S2RDF achieves sub-second query runtimes on billion-triple RDF graphs.
2. ExtVP reduces query input size regardless of pattern shape.
3. S2RDF outperforms state-of-the-art Hadoop-based SPARQL systems.

Abstract

RDF has become very popular for semantic data publishing due to its flexible and universal graph-like data model. Yet, the ever-increasing size of RDF data collections makes it increasingly infeasible to store and process them on a single machine, raising the need for distributed approaches. Instead of building a standalone but closed distributed RDF store, we endorse the usage of existing infrastructures for Big Data processing, e.g. Hadoop. However, SPARQL query performance is a major challenge, as these platforms were not designed for RDF processing from the ground up. Thus, existing Hadoop-based approaches often favor certain query pattern shapes while performance drops significantly for other shapes. In this paper, we describe a novel relational partitioning schema for RDF data called ExtVP that uses a semi-join based preprocessing, akin to the concept of Join Indices in relational…
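
Semi-join preprocessing is safe because a semi-join only drops rows that cannot find a join partner, so evaluating a query over the reduced tables yields the same bindings from a smaller input. The sketch below checks this for the chain pattern `?x follows ?y . ?y likes ?z` on hypothetical toy VP tables. The data and the helper names `semi_join` and `join_chain` are assumptions. Here the reduced tables are computed on the fly, whereas ExtVP precomputes them during preprocessing.

```python
from collections import defaultdict

# Hypothetical VP tables: one (s, o) table per predicate.
VP = {
    "follows": [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")],
    "likes": [("A", "I1"), ("A", "I2"), ("C", "I2")],
}

def semi_join(left, right, left_col, right_col):
    """left ⋉ right on left[left_col] = right[right_col] (0 = s, 1 = o)."""
    keys = {row[right_col] for row in right}
    return [row for row in left if row[left_col] in keys]

def join_chain(t1, t2):
    """Evaluate ?x p1 ?y . ?y p2 ?z by hash-joining t1.o with t2.s."""
    index = defaultdict(list)
    for s, o in t2:
        index[s].append(o)
    return sorted((x, y, z) for x, y in t1 for z in index.get(y, []))

follows, likes = VP["follows"], VP["likes"]
# OS reduction: follows rows whose object is also a subject of likes.
follows_os = semi_join(follows, likes, 1, 0)
# SO reduction: likes rows whose subject is also an object of follows.
likes_so = semi_join(likes, follows, 0, 1)

full = join_chain(follows, likes)           # reads all 7 VP rows
reduced = join_chain(follows_os, likes_so)  # reads only the reduced rows
```

On this toy data, the reduced join reads 2 rows instead of 7 and returns the same single binding `("B", "C", "I2")`.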

Taxonomy

Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Data Quality and Management