A MapReduce Approach to NoSQL RDF Databases

Albert Haque

arXiv:1601.01770·cs.DB·January 11, 2016·1 cites

Open Access

TL;DR

This paper presents a MapReduce-based approach for NoSQL RDF databases, emphasizing the importance of join algorithms and query optimization to improve performance in distributed triplestores.

Contribution

It introduces a MapReduce framework for NoSQL RDF databases and evaluates its performance, highlighting the impact of join algorithms and query optimization strategies.

Findings

01. Join algorithms significantly affect query runtimes.

02. Optimizing queries before MapReduce planning reduces network traffic.

03. Distributed graph databases require careful query optimization.
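
To make the first two findings concrete, here is a minimal in-memory sketch of a reduce-side (repartition) join over RDF triples, run once without and once with a selection applied before the shuffle. It is not the thesis's implementation: the triples, the predicate names `worksAt` and `locatedIn`, and the functions `map_phase`, `shuffle`, `reduce_phase`, and `run_query` are all hypothetical and chosen only for illustration. The number of key/value pairs emitted by the map phase stands in, very roughly, for network traffic.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object). All data is invented.
TRIPLES = [
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
    ("acme", "locatedIn", "berlin"),
    ("initech", "locatedIn", "austin"),
]


def map_phase(triples, push_filter):
    """Emit (join_key, tagged_value) pairs for the query
    ?person worksAt ?company . ?company locatedIn "berlin"
    keyed on the shared variable ?company."""
    for s, p, o in triples:
        if p == "worksAt":
            yield o, ("L", s)  # left side: key is the company
        elif p == "locatedIn":
            if push_filter and o != "berlin":
                continue  # selection applied before the shuffle
            yield s, ("R", o)  # right side: key is the company


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Join left and right values that share a key, keeping Berlin only."""
    results = []
    for company, values in groups.items():
        people = [v for tag, v in values if tag == "L"]
        cities = [v for tag, v in values if tag == "R"]
        for person in people:
            for city in cities:
                if city == "berlin":
                    results.append((person, company, city))
    return sorted(results)


def run_query(triples, push_filter):
    """Return (result rows, number of pairs sent through the shuffle)."""
    pairs = list(map_phase(triples, push_filter))
    return reduce_phase(shuffle(pairs)), len(pairs)


if __name__ == "__main__":
    for flag in (False, True):
        rows, shuffled = run_query(TRIPLES, push_filter=flag)
        print(f"push_filter={flag}: rows={rows}, shuffled pairs={shuffled}")
```

Both runs return the same rows, but the run with the early filter sends fewer pairs through the shuffle. On real data the gap depends on how selective the filter is, so this toy count should be read as an illustration of the idea rather than a measurement.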

Abstract

In recent years, the growing need to house and process large volumes of data has driven demand for distributed storage and querying systems. The growth of machine-readable RDF triples has prompted both industry and academia to develop new database systems, called NoSQL, with characteristics that differ from classical databases. Many of these systems compromise ACID properties for increased horizontal scalability and data availability. This thesis concerns the development and evaluation of a NoSQL triplestore. Triplestores are database management systems central to emerging technologies such as the Semantic Web and linked data. The evaluation spans several benchmarks, including the two most commonly used in triplestore evaluation: the Berlin SPARQL Benchmark and the DBpedia benchmark, a query workload that operates on an RDF representation of Wikipedia. Results reveal that the join…

Taxonomy

Topics: Graph Theory and Algorithms · Cloud Computing and Resource Management · Advanced Database Systems and Queries