Distributed Data Processing Frameworks for Big Graph Data
Afsin Akdogan, Hien To

TL;DR
This paper surveys programming models and frameworks for processing large-scale graph data, compares their performance, and evaluates fundamental algorithms to assess their efficiency and scalability.
Contribution
It provides a comprehensive survey of existing frameworks for big graph data and evaluates their performance on fundamental algorithms like PageRank and Bipartite Matching.
Findings
The surveyed optimization techniques can yield speedups of up to 1340x on Hadoop.
Vertex-based models are effective for large graph processing.
Baseline single-node implementations help evaluate the benefits of partitioning.
Abstract
We now create so much data (2.5 quintillion bytes every day) that 90% of the data in the world today was created in the last two years alone [1]. This data comes from sensors that gather traffic or climate information, posts to social media sites, photos, videos, emails, purchase transaction records, call logs of cellular networks, and so on. This is big data. In this report, we first briefly discuss the programming models used for big data processing, then focus on graph data and survey the programming models and frameworks used to solve graph problems at very large scale. In Section 2, we introduce programming models that are not specifically designed to handle graph data; we include them in this survey because we believe they are important frameworks and/or because there have been studies that customize them for more efficient graph processing. In…
Taxonomy
Topics: Graph Theory and Algorithms · Cloud Computing and Resource Management · Advanced Database Systems and Queries
