Parallel Distributed Breadth First Search on the Kepler Architecture

Mauro Bisson; Massimo Bernaschi; Enrico Mastrostefano

arXiv:1408.1605 · cs.DC · December 24, 2014 · 1 citation

Open Access

TL;DR

This paper presents an optimized CUDA-based distributed BFS code that exploits features of the Kepler architecture to explore large graphs, visiting more than 800 billion edges per second on a cluster of 4096 Tesla K20X GPUs.

Contribution

It presents an evolution of the authors' CUDA-based BFS implementation that combines several techniques to reduce both the number of communications among GPUs and the amount of data exchanged during large-scale graph traversal.

Findings

01

Visited more than 800 billion edges per second on a cluster of 4096 Tesla K20X GPUs

02

Reduced both the number of communications among GPUs and the amount of data exchanged

03

Demonstrated scalability on large GPU clusters

Abstract

We present results obtained with an evolution of our CUDA-based solution for exploring large graphs via Breadth First Search. This latest version makes full use of the features of the Kepler architecture and relies on a combination of techniques to reduce both the number of communications among the GPUs and the amount of data exchanged. The final result is a code that can visit more than 800 billion edges per second on a cluster equipped with 4096 Tesla K20X GPUs.
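To make the idea of reducing exchanged data concrete, here is a minimal single-process sketch of a level-synchronous BFS over a partitioned graph. It is not the authors' CUDA code: the round-robin vertex ownership, the function name `distributed_bfs`, and the use of per-destination deduplication are all assumptions chosen for illustration, since the abstract does not say which techniques the paper combines.

```python
def distributed_bfs(adj, num_vertices, root, num_ranks):
    """Simulate a level-synchronous BFS over `num_ranks` partitions.

    Assumption (illustrative only): vertex v is owned by rank v % num_ranks.
    Returns (levels, edges_scanned, sent_naive, sent_dedup), where the last
    two count vertex ids crossing rank boundaries without and with
    per-destination deduplication.
    """
    def owner(v):
        return v % num_ranks

    level = [-1] * num_vertices
    level[root] = 0
    frontier = [[] for _ in range(num_ranks)]
    frontier[owner(root)].append(root)

    depth = 0
    edges_scanned = sent_naive = sent_dedup = 0

    while any(frontier):
        # Each rank scans its frontier and buckets neighbours by owning rank.
        outbox = [[[] for _ in range(num_ranks)] for _ in range(num_ranks)]
        for rank in range(num_ranks):
            for u in frontier[rank]:
                for v in adj.get(u, ()):
                    edges_scanned += 1
                    outbox[rank][owner(v)].append(v)

        # Simulated exchange: the owner marks unvisited vertices it receives.
        depth += 1
        next_frontier = [[] for _ in range(num_ranks)]
        for src in range(num_ranks):
            for dst in range(num_ranks):
                unique = set(outbox[src][dst])
                if src != dst:
                    sent_naive += len(outbox[src][dst])
                    sent_dedup += len(unique)
                for v in unique:
                    if level[v] == -1:
                        level[v] = depth
                        next_frontier[dst].append(v)
        frontier = next_frontier

    return level, edges_scanned, sent_naive, sent_dedup


# Hypothetical 4-vertex undirected graph: edges 1-0, 1-2, 0-3, 2-3.
example_adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
levels, scanned, naive, dedup = distributed_bfs(example_adj, 4, root=1, num_ranks=2)
```

On this toy graph, vertices 0 and 2 sit on the same rank and both point to vertices 1 and 3 on the other rank, so removing duplicates before the exchange sends fewer vertex ids than forwarding every discovered neighbour. For scale, the reported rate of more than 800 billion edges per second on 4096 GPUs averages out to more than roughly 195 million edges per second per GPU.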

Taxonomy

Topics: Graph Theory and Algorithms · Parallel Computing and Optimization Techniques · Data Management and Algorithms