Fast and Efficient Parallel Breadth-First Search with Power-law Graph Transformation
Zite Jiang, Tao Liu, Shuai Zhang, Zhen Guan, Mengting Yuan, Haihang You

TL;DR
This paper presents a preprocessing method based on Reverse Cuthill-McKee (RCM) reordering that improves cache efficiency and load balancing for BFS on power-law graphs, evaluated on Graph500 Kronecker graphs and real-world graphs.
Contribution
It introduces an RCM-based preprocessing technique combined with SIMD acceleration to enhance BFS performance on large-scale power-law graphs.
Findings
Achieved 326.48 MTEPS/W on an ARMv8 system
Ranked 2nd on the Green Graph500 list in June 2020
Improved data locality and load balancing for BFS
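For readers unfamiliar with the unit, MTEPS/W is millions of traversed edges per second divided by average power draw in watts. The helper below is only a sketch of that arithmetic; the function name and the input values are hypothetical and are not taken from the paper.

```python
def mteps_per_watt(edges_traversed: int, seconds: float, watts: float) -> float:
    """Energy efficiency of a BFS run in millions of TEPS per watt.

    All three inputs are hypothetical measurements supplied by the caller.
    """
    teps = edges_traversed / seconds   # traversed edges per second
    return teps / 1e6 / watts          # scale to millions, then normalize by power


# Illustrative numbers only: 2 million edges in 1 second at 2 W.
example = mteps_per_watt(2_000_000, 1.0, 2.0)
```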
Abstract
In the big data era, graph computing is widely used to exploit the hidden value in real-world graphs in various scenarios such as social networks, knowledge graphs, web searching, and recommendation systems. However, random memory accesses result in inefficient use of cache, and the irregular degree distribution leads to substantial load imbalance. Breadth-First Search (BFS) is frequently utilized as a kernel for many important and complex graph algorithms. In this paper, we describe a preprocessing approach using the Reverse Cuthill-McKee (RCM) algorithm to improve data locality and demonstrate how to achieve efficient load balancing for BFS. Computations on RCM-reordered graph data are also accelerated with SIMD executions. We evaluate the performance of the graph preprocessing approach on Kronecker graphs of the Graph500 benchmark and real-world graphs. Our BFS implementation on…
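The idea of reordering vertices with RCM before running BFS can be illustrated with a minimal sketch. This is the textbook RCM heuristic in plain Python, not the authors' implementation: it omits their load-balancing scheme and SIMD acceleration, and the function names (rcm_order, relabel, bfs_levels, bandwidth) are hypothetical. The graph is assumed to be undirected and stored as a list of neighbor lists.

```python
from collections import deque


def rcm_order(adj):
    """Return vertices in Reverse Cuthill-McKee order (textbook variant)."""
    degree = [len(nbrs) for nbrs in adj]
    visited = [False] * len(adj)
    order = []
    # Start each component from its lowest-degree unvisited vertex.
    for start in sorted(range(len(adj)), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbors in increasing degree order.
            for w in sorted(adj[v], key=lambda u: degree[u]):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    order.reverse()  # the "reverse" step of RCM
    return order


def relabel(adj, order):
    """Renumber vertices so that order[i] becomes vertex i."""
    new_id = [0] * len(adj)
    for new, old in enumerate(order):
        new_id[old] = new
    new_adj = [[] for _ in adj]
    for old, nbrs in enumerate(adj):
        new_adj[new_id[old]] = sorted(new_id[w] for w in nbrs)
    return new_adj, new_id


def bfs_levels(adj, source):
    """Level-synchronous BFS; returns the level of each vertex (-1 if unreachable)."""
    level = [-1] * len(adj)
    level[source] = 0
    frontier = [source]
    while frontier:
        next_frontier = []
        for v in frontier:
            for w in adj[v]:
                if level[w] == -1:
                    level[w] = level[v] + 1
                    next_frontier.append(w)
        frontier = next_frontier
    return level


def bandwidth(adj):
    """Largest label distance between adjacent vertices (a rough locality proxy)."""
    return max((abs(v - w) for v, nbrs in enumerate(adj) for w in nbrs), default=0)


# A path graph 0-3-1-4-2 whose labels are deliberately scrambled.
adj = [[3], [3, 4], [4], [0, 1], [1, 2]]
order = rcm_order(adj)
new_adj, new_id = relabel(adj, order)
```

On this toy graph the relabeling places neighbors at adjacent indices, which is the locality effect the reordering is meant to produce; BFS levels are unchanged apart from the renumbering.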
Taxonomy
Topics: Graph Theory and Algorithms · Advanced Graph Neural Networks · Data Management and Algorithms
