PAPyA: Performance Analysis of Large RDF Graphs Processing Made Easy
Mohamed Ragab, Adam Satria Adidarma, Riccardo Tommasini

TL;DR
PAPyA is an open-source library that simplifies performance analysis of large RDF graph processing on relational Big Data systems, enabling automatic ranking and flexible extensions for better deployment decisions.
Contribution
This paper introduces PAPyA, a library that automates and extends prescriptive performance analysis for large RDF graph processing in Big Data frameworks.
Findings
PAPyA effectively automates performance ranking in RDF graph processing.
PAPyA simplifies performance analysis workflows for large graph data.
Experimental results demonstrate PAPyA's utility with SparkSQL.
Abstract
Prescriptive Performance Analysis (PPA) has been shown to be more useful than traditional descriptive and diagnostic analyses for making sense of Big Data (BD) frameworks' performance. In practice, when processing large (RDF) graphs on top of relational BD systems, several design decisions emerge and cannot be decided automatically, e.g., the choice of the schema, the partitioning technique, and the storage formats. PPA, and in particular ranking functions, helps enable actionable insights on performance data, leading practitioners to an easier choice of the best way to deploy BD frameworks, especially for graph processing. However, the amount of experimental work required to implement PPA is still huge. In this paper, we present PAPyA, a library for implementing PPA that allows (1) preparing RDF graph data for a processing pipeline over relational BD systems, (2) enables automatic…
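To make the idea of a ranking function over design decisions concrete, here is a minimal Python sketch. It does not use PAPyA's API and is not necessarily the ranking criterion the paper uses. It assumes a simple rank-based score: for each query, configurations are ordered by runtime, and a configuration earns more credit the closer it is to the top. The function name `rank_scores`, the configuration labels, and all runtimes are invented for illustration.

```python
from collections import defaultdict


def rank_scores(runtimes):
    """Score each configuration in [0, 1] from per-query runtime ranks.

    runtimes: {config: {query: seconds}}.
    A score of 1.0 means the configuration was fastest on every query,
    0.0 means it was slowest on every query. Ties are broken by input
    order (an arbitrary choice made for this sketch).
    """
    configs = list(runtimes)
    d = len(configs)
    if d < 2:
        raise ValueError("need at least two configurations to rank")

    # Only compare on queries that every configuration has been run on.
    queries = sorted(set.intersection(*(set(runtimes[c]) for c in configs)))
    if not queries:
        raise ValueError("no query is shared by all configurations")

    totals = defaultdict(float)
    for q in queries:
        ordered = sorted(configs, key=lambda c: runtimes[c][q])
        for position, config in enumerate(ordered):  # position 0 = fastest
            totals[config] += (d - 1 - position) / (d - 1)

    return {c: totals[c] / len(queries) for c in configs}


# Hypothetical (schema, partitioning, storage format) configurations
# with made-up runtimes in seconds for three queries.
CONFIG_A = ("schema_A", "partitioning_X", "format_1")
CONFIG_B = ("schema_B", "partitioning_Y", "format_2")
CONFIG_C = ("schema_C", "partitioning_Z", "format_3")

example_runtimes = {
    CONFIG_A: {"Q1": 2.0, "Q2": 5.0, "Q3": 1.0},
    CONFIG_B: {"Q1": 3.0, "Q2": 4.0, "Q3": 2.0},
    CONFIG_C: {"Q1": 9.0, "Q2": 8.0, "Q3": 3.0},
}

example_scores = rank_scores(example_runtimes)
best_config = max(example_scores, key=example_scores.get)

if __name__ == "__main__":
    for config, score in sorted(example_scores.items(), key=lambda kv: -kv[1]):
        print(config, round(score, 3))
```

Collapsing many per-query measurements into one score per configuration is what turns raw performance data into a deployment recommendation. A real analysis would also need to handle ties, failed queries, and trade-offs between dimensions, which this sketch ignores.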
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Advanced Graph Neural Networks
