Scalable Breadth-First Search on a GPU Cluster

Yuechao Pan; Roger Pearce; John D. Owens

arXiv:1803.03922 · cs.DC · April 6, 2018 · 1 cites

Open Access

TL;DR

This paper introduces a GPU-cluster implementation of BFS and direction-optimized BFS on scale-free graphs. It separates high and low out-degree vertices and uses a matching communication model to reach linear weak scaling and 259.8 GTEPS on 124 GPUs.

Contribution

It presents a degree-separation approach that treats high and low out-degree vertices differently, together with a communication model that uses global reduction for high-degree vertices and point-to-point transmission for low-degree vertices.

Findings

01. Achieves 259.8 GTEPS on a scale-33 Graph500 RMAT graph with 124 GPUs.

02. Reduces the graph size to one third of the conventional edge list representation.

03. Demonstrates linear weak scaling as the number of GPUs increases.
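To put the headline figures in proportion, the short Python sketch below works out the average throughput per GPU and the nominal size of a scale-33 graph. The edge factor of 16 is the Graph500 default and is an assumption here, since this page does not state it; the helper names are illustrative only.

```python
# Figures quoted in the findings above.
TOTAL_GTEPS = 259.8
NUM_GPUS = 124
SCALE = 33

# Assumption: Graph500 default edge factor (not stated on this page).
EDGE_FACTOR = 16


def per_gpu_gteps(total_gteps, num_gpus):
    """Average traversal rate per GPU, in GTEPS."""
    return total_gteps / num_gpus


def nominal_graph_size(scale, edge_factor):
    """Nominal vertex and generated-edge counts for a Graph500 graph."""
    vertices = 2 ** scale
    edges = edge_factor * vertices
    return vertices, edges


vertices, edges = nominal_graph_size(SCALE, EDGE_FACTOR)
print(f"per-GPU rate: {per_gpu_gteps(TOTAL_GTEPS, NUM_GPUS):.3f} GTEPS")
print(f"vertices: {vertices:,}  generated edges: {edges:,}")
```

Under these assumptions the average works out to roughly 2.1 GTEPS per GPU on a graph with about 8.6 billion vertices.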

Abstract

On a GPU cluster, the ratio of high computing power to communication bandwidth makes scaling breadth-first search (BFS) on a scale-free graph extremely challenging. By separating high and low out-degree vertices, we present an implementation with scalable computation and a model for scalable communication for BFS and direction-optimized BFS. Our communication model uses global reduction for high-degree vertices, and point-to-point transmission for low-degree vertices. Leveraging the characteristics of degree separation, we reduce the graph size to one third of the conventional edge list representation. With several other optimizations, we observe linear weak scaling as we increase the number of GPUs, and achieve 259.8 GTEPS on a scale-33 Graph500 RMAT graph with 124 GPUs on the latest CORAL early access system.
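The degree-separation idea in the abstract can be illustrated with a small single-process simulation. This is a minimal sketch, not the authors' implementation: the threshold value, the modulo ownership rule, the edge placement rule, and all function names are assumptions made for illustration. Partitions stand in for GPUs, a set union stands in for the global reduction over high-degree vertices, and per-partition inboxes stand in for point-to-point transmission of low-degree vertices.

```python
from collections import defaultdict


def separate_by_degree(edges, threshold):
    """Split vertices into high and low out-degree sets."""
    out_degree = defaultdict(int)
    for src, dst in edges:
        out_degree[src] += 1
        out_degree[dst] += 0  # register vertices that only appear as targets
    high = {v for v, d in out_degree.items() if d > threshold}
    low = set(out_degree) - high
    return high, low


def bfs_simulated(edges, source, num_parts, threshold):
    """Level-synchronous BFS over simulated partitions.

    Returns (depth per reached vertex, number of remote point-to-point updates).
    """
    high, low = separate_by_degree(edges, threshold)

    def owner(v):
        return v % num_parts  # assumed ownership rule for low-degree vertices

    # Assumed placement: an edge leaving a high-degree vertex lives with the
    # owner of its target; every other edge lives with the owner of its source.
    adj = [defaultdict(list) for _ in range(num_parts)]
    for src, dst in edges:
        part = owner(dst) if src in high else owner(src)
        adj[part][src].append(dst)

    depth = {source: 0}
    frontier = {source}
    level = 0
    p2p_sent = 0
    while frontier:
        level += 1
        high_found = [set() for _ in range(num_parts)]  # local high-degree hits
        inbox = [set() for _ in range(num_parts)]       # low-degree updates
        for part in range(num_parts):
            for v in frontier:
                # High-degree state is replicated; low-degree state is owned.
                if v in low and owner(v) != part:
                    continue
                for dst in adj[part].get(v, ()):
                    if dst in high:
                        high_found[part].add(dst)
                    else:
                        if owner(dst) != part:
                            p2p_sent += 1  # would be a remote message
                        inbox[owner(dst)].add(dst)
        # "Global reduction": merge the high-degree hits of all partitions.
        reduced_high = set().union(*high_found)
        candidates = reduced_high.union(*inbox)
        frontier = {v for v in candidates if v not in depth}
        for v in frontier:
            depth[v] = level
    return depth, p2p_sent


# Toy graph: hub 0 linked to vertices 1..6, plus a chain 6 - 7 - 8.
edges = []
for i in range(1, 7):
    edges += [(0, i), (i, 0)]
edges += [(6, 7), (7, 6), (7, 8), (8, 7)]

depth, p2p_sent = bfs_simulated(edges, source=1, num_parts=2, threshold=3)
print(sorted(depth.items()), p2p_sent)
```

In this toy setup the hub is the only high-degree vertex, so discovering it costs one reduction rather than a message from each neighbor, and only the chain vertices generate remote point-to-point updates.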

Taxonomy

Topics: Caching and Content Delivery · Graph Theory and Algorithms · Advanced Graph Neural Networks