TL;DR
The paper introduces Hybrid Edge Partitioner (HEP), a system that adaptively partitions large graphs under memory constraints by combining in-memory and streaming methods, improving both partition quality and downstream processing speed.
Contribution
HEP is a novel system that dynamically balances memory use and partitioning quality by combining a new in-memory algorithm with streaming partitioning.
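To make the hybrid idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not HEP's actual algorithm: the degree-based split, the `degree_threshold` parameter, the greedy placement rule, and the names `hybrid_edge_partition` and `replication_factor` are all hypothetical. Edges that touch a low-degree vertex are treated as the part held in memory (so they may be reordered before placement), while the remaining edges are placed one at a time in arrival order, as a streaming partitioner would.

```python
import math
from collections import defaultdict


def hybrid_edge_partition(edges, k, degree_threshold):
    """Toy hybrid edge partitioner: an illustration, not HEP's algorithm."""
    # Pass 1: count vertex degrees (state per vertex, not per edge).
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    capacity = math.ceil(len(edges) / k)  # hard balance limit per partition
    loads = [0] * k                       # number of edges in each partition
    replicas = defaultdict(set)           # vertex -> partitions holding a copy
    assignment = {}                       # edge -> partition id

    def place(edge):
        # Greedy rule (assumption): prefer the partition that already holds
        # the most endpoints of this edge, breaking ties by lower load.
        u, v = edge
        open_parts = [p for p in range(k) if loads[p] < capacity]
        best = max(
            open_parts,
            key=lambda p: ((p in replicas[u]) + (p in replicas[v]), -loads[p]),
        )
        assignment[edge] = best
        loads[best] += 1
        replicas[u].add(best)
        replicas[v].add(best)

    def is_low(edge):
        # Hypothetical split: an edge is kept in memory if either endpoint
        # has degree at most degree_threshold.
        return min(degree[edge[0]], degree[edge[1]]) <= degree_threshold

    # "In-memory" phase: these edges can be sorted so that edges sharing a
    # vertex are placed consecutively.
    for edge in sorted(e for e in edges if is_low(e)):
        place(edge)

    # "Streaming" phase: remaining edges are placed in arrival order,
    # reusing the replica state built by the first phase.
    for edge in edges:
        if not is_low(edge):
            place(edge)

    return assignment, replicas


def replication_factor(replicas):
    """Average number of partitions per vertex (lower is better)."""
    return sum(len(parts) for parts in replicas.values()) / len(replicas)
```

In this sketch, raising `degree_threshold` moves more edges into the in-memory phase, and lowering it moves more edges into the streaming phase; that is the memory-versus-quality trade-off the summary describes, expressed here with a single assumed knob. The replication factor is a common quality measure for edge partitioning and is used here only as an example metric.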
Findings
HEP outperforms traditional in-memory and streaming partitioners on large real-world graphs.
Using HEP significantly speeds up distributed graph processing on Spark/GraphX.
HEP effectively balances memory overhead and partition quality in large-scale graph partitioning.
Abstract
Distributed systems that manage and process graph-structured data internally solve a graph partitioning problem to minimize their communication overhead and query run-time. Besides computational complexity -- optimal graph partitioning is NP-hard -- another important consideration is the memory overhead. Real-world graphs often have an immense size, such that loading the complete graph into memory for partitioning is not economical or feasible. Currently, the common approach to reduce memory overhead is to rely on streaming partitioning algorithms. While the latest streaming algorithms lead to reasonable partitioning quality on some graphs, they are still not completely competitive with in-memory partitioners. In this paper, we propose a new system, Hybrid Edge Partitioner (HEP), that can partition graphs that fit partly into memory while yielding a high partitioning quality. HEP can…
