GraphH: High Performance Big Graph Analytics in Small Clusters
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Xiaokui Xiao

TL;DR
GraphH is a system designed to enable high-performance big graph analytics on small clusters by combining efficient partitioning, a novel computation model, and caching strategies, significantly outperforming existing in-memory and out-of-core systems.
Contribution
GraphH introduces a two-stage graph partitioning scheme and a GAB (Gather-Apply-Broadcast) computation model to process large graphs efficiently on small clusters or a single server.
Findings
GraphH is up to 7.8x faster than the in-memory systems Pregel+ and PowerGraph.
GraphH is more than 100x faster than the out-of-core systems GraphD and Chaos on big graphs.
The system effectively reduces disk I/O and improves communication performance.
Abstract
It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core approaches have been proposed for processing big graphs on disk, the high disk I/O overhead could significantly reduce performance. In this paper, we propose GraphH to enable high-performance big graph analytics in small clusters. Specifically, we design a two-stage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (Gather-Apply-Broadcast) computation model to make each worker process a partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve the communication performance. GraphH can efficiently process big graphs in small clusters or even a single…
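To make the Gather-Apply-Broadcast idea concrete, here is a minimal single-process sketch in Python, using PageRank as the example workload. It is an illustration under stated assumptions, not GraphH's implementation: the function names `partition_edges` and `pagerank_gab` are hypothetical, the greedy edge-balancing step is a simple stand-in for the paper's two-stage partitioning scheme (whose details are not given here), and a plain loop over partitions stands in for distributed workers. The replicated `ranks` list plays the role of vertex state that every worker would receive in the broadcast step.

```python
from collections import defaultdict


def partition_edges(edges, num_partitions):
    """Group edges by destination vertex, then spread the groups so that
    each partition holds a similar number of edges.

    Assumption: this greedy balancing is only a stand-in for the paper's
    two-stage partitioning scheme.
    """
    in_edges = defaultdict(list)
    for src, dst in edges:
        in_edges[dst].append(src)

    partitions = [dict() for _ in range(num_partitions)]
    loads = [0] * num_partitions
    # Place the heaviest destination vertices first, always on the
    # partition that currently holds the fewest edges.
    for dst in sorted(in_edges, key=lambda v: (-len(in_edges[v]), v)):
        target = loads.index(min(loads))
        partitions[target][dst] = in_edges[dst]
        loads[target] += len(in_edges[dst])
    return partitions


def pagerank_gab(edges, num_vertices, num_partitions=2,
                 iterations=20, damping=0.85):
    """PageRank written in a Gather-Apply-Broadcast style (hypothetical sketch)."""
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    partitions = partition_edges(edges, num_partitions)
    base = (1.0 - damping) / num_vertices
    ranks = [1.0 / num_vertices] * num_vertices  # replicated vertex state

    for _ in range(iterations):
        updates = {}
        # Each pass of this loop models a worker holding one partition
        # in memory at a time.
        for part in partitions:
            for dst, sources in part.items():
                # Gather: combine contributions along in-edges.
                total = sum(ranks[src] / out_degree[src] for src in sources)
                # Apply: compute the new value of the destination vertex.
                updates[dst] = base + damping * total
        # Broadcast: publish the updated values so the next iteration
        # sees a consistent copy of all vertex states.
        new_ranks = [base] * num_vertices
        for vertex, value in updates.items():
            new_ranks[vertex] = value
        ranks = new_ranks
    return ranks
```

For example, `pagerank_gab([(0, 1), (1, 2), (2, 0)], num_vertices=3)` runs the sketch on a three-vertex cycle. Grouping edges by destination means each vertex is updated by exactly one partition, so the apply step needs no cross-partition coordination in this toy version; a real system would also have to handle dangling vertices, disk-resident partitions, and network transfer.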
Taxonomy
Topics: Graph Theory and Algorithms · Cloud Computing and Resource Management · Interconnection Networks and Systems
