Parallel Distributed Breadth First Search on the Kepler Architecture
Mauro Bisson, Massimo Bernaschi, Enrico Mastrostefano

TL;DR
This paper introduces an optimized CUDA-based parallel BFS algorithm that leverages Kepler architecture features to efficiently explore massive graphs, achieving over 800 billion edges per second on a large GPU cluster.
Contribution
It presents an evolved GPU-accelerated BFS implementation that combines several techniques to reduce both the number of communications among GPUs and the amount of exchanged data, enabling very fast large-scale graph traversal.
Findings
Visited more than 800 billion edges per second on a cluster of 4096 Tesla K20X GPUs
Achieved high efficiency by reducing both the number of inter-GPU communications and the amount of exchanged data
Demonstrated scalability on large GPU clusters
Abstract
We present the results obtained by using an evolution of our CUDA-based solution for the exploration, via a Breadth First Search, of large graphs. This latest version makes the best use of the features of the Kepler architecture and relies on a combination of techniques to reduce both the number of communications among the GPUs and the amount of exchanged data. The final result is a code that can visit more than 800 billion edges per second by using a cluster equipped with 4096 Tesla K20X GPUs.
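The abstract does not describe the traversal itself. As a point of reference only, here is a minimal single-process sketch of a level-synchronous, frontier-based BFS, the general scheme that distributed GPU BFS codes typically parallelize. The CSR layout and the names `CsrGraph`, `BfsResult` and `bfs` are hypothetical, and none of the paper's Kepler-specific or communication-reducing techniques are modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Compressed Sparse Row (CSR) adjacency (hypothetical layout for illustration):
// the neighbours of vertex v are col[row[v]] .. col[row[v + 1] - 1].
struct CsrGraph {
    std::vector<int64_t> row;  // size n + 1
    std::vector<int64_t> col;  // one entry per directed edge
};

struct BfsResult {
    std::vector<int64_t> parent;  // -1 for unreached vertices
    std::vector<int> level;       // -1 for unreached vertices
    int64_t edges_examined = 0;   // adjacency entries scanned
};

// Level-synchronous, top-down BFS: each iteration expands the whole current
// frontier and builds the next one. In a distributed code, building `next`
// is the step where discovered vertices owned by other ranks must be exchanged.
BfsResult bfs(const CsrGraph& g, int64_t root) {
    const int64_t n = static_cast<int64_t>(g.row.size()) - 1;
    assert(root >= 0 && root < n);

    BfsResult r;
    r.parent.assign(n, -1);
    r.level.assign(n, -1);
    r.parent[root] = root;
    r.level[root] = 0;

    std::vector<int64_t> frontier{root};
    std::vector<int64_t> next;
    for (int depth = 1; !frontier.empty(); ++depth) {
        next.clear();
        for (int64_t u : frontier) {
            for (int64_t e = g.row[u]; e < g.row[u + 1]; ++e) {
                ++r.edges_examined;
                const int64_t v = g.col[e];
                if (r.parent[v] == -1) {  // first visit claims v
                    r.parent[v] = u;
                    r.level[v] = depth;
                    next.push_back(v);
                }
            }
        }
        frontier.swap(next);  // the next level becomes the current frontier
    }
    return r;
}
```

A rate such as the 800 billion edges per second quoted above is obtained by dividing a count of traversed edges by the BFS time; the exact counting rule is fixed by the benchmark being reported, not by the `edges_examined` counter in this sketch.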
Taxonomy
Topics: Graph Theory and Algorithms · Parallel Computing and Optimization Techniques · Data Management and Algorithms
