Experimental Analysis of Distributed Graph Systems

Khaled Ammar; Tamer Ozsu

arXiv:1806.08082·cs.DC·June 22, 2018


TL;DR

This paper conducts a comprehensive experimental comparison of eight distributed graph processing systems across multiple large datasets and workloads, analyzing their performance, scalability, and usability.

Contribution

It provides an independent, empirical evaluation of system performance and scalability, offering insights and tuning heuristics for better efficiency.

Findings

1. GraphLab (PowerGraph) outperforms others in scalability
2. Performance varies significantly across datasets and workloads
3. System tuning heuristics improve overall performance

Abstract

This paper evaluates eight parallel graph processing systems: Hadoop, HaLoop, Vertica, Giraph, GraphLab (PowerGraph), Blogel, Flink Gelly, and GraphX (Spark) over four very large datasets (Twitter, World Road Network, UK 2007-05, and ClueWeb) using four workloads (PageRank, WCC, SSSP, and K-hop). The main objective is to perform an independent scale-out study by experimentally analyzing the performance, usability, and scalability (using up to 128 machines) of these systems. In addition to performance results, we discuss our experiences in using these systems and suggest some system tuning heuristics that lead to better performance.
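
For readers unfamiliar with the four workloads named in the abstract, the following is a minimal single-machine sketch of what each one computes. It is an illustration only, not how any of the evaluated systems implements them: the toy graph `GRAPH`, the function names, the damping factor, the iteration count, and the reading of K-hop as "all vertices reachable within k hops of a source" are assumptions made for this example.

```python
from collections import deque
import heapq

# Hypothetical toy graph (not from the paper): vertex -> {neighbor: edge weight}.
GRAPH = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"c": 2.0},
    "c": {"a": 1.0},
    "d": {"e": 1.0},
    "e": {},
}

def pagerank(graph, damping=0.85, iterations=20):
    """PageRank by power iteration; rank of dangling vertices is spread uniformly."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in graph if not graph[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in graph}
        for v, nbrs in graph.items():
            for u in nbrs:
                nxt[u] += damping * rank[v] / len(nbrs)
        rank = nxt
    return rank

def wcc(graph):
    """Weakly connected components: BFS over the graph with edge direction ignored.
    Each vertex is labeled with the first vertex visited in its component."""
    undirected = {v: set() for v in graph}
    for v, nbrs in graph.items():
        for u in nbrs:
            undirected[v].add(u)
            undirected[u].add(v)
    label = {}
    for start in graph:
        if start in label:
            continue
        label[start] = start
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in undirected[v]:
                if u not in label:
                    label[u] = start
                    queue.append(u)
    return label

def sssp(graph, source):
    """Single-source shortest paths (Dijkstra, non-negative weights).
    Unreachable vertices are absent from the result."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for u, w in graph[v].items():
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

def k_hop(graph, source, k):
    """Set of vertices reachable from source by following at most k directed edges."""
    seen = {source}
    frontier = [source]
    for _ in range(k):
        nxt = []
        for v in frontier:
            for u in graph[v]:
                if u not in seen:
                    seen.add(u)
                    nxt.append(u)
        frontier = nxt
    return seen
```

PageRank and WCC touch every vertex in every run, while SSSP and K-hop start from a single source and may visit only part of the graph, which is one reason a benchmark would include both kinds of workload.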
