ButterFly BFS -- An Efficient Communication Pattern for Multi Node Traversals
Oded Green

TL;DR
ButterFly BFS is a multi-GPU algorithm that accelerates large-scale graph traversals, running more than 10 times faster than CPU-based methods and scaling efficiently across multiple GPUs.
Contribution
The paper introduces ButterFly BFS, a novel multi-GPU traversal algorithm that enables fast, scalable BFS on large graphs, surpassing CPU performance and effectively utilizing multiple GPUs.
Findings
Achieves over 10X speedup compared to CPU implementations.
Scales close to linearly with the number of GPUs, reaching 70% scaling efficiency.
Traverses large graphs at over 300 GTEP/s (giga traversed edges per second) on a single server.
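The two metrics in the findings above can be made concrete with a little arithmetic. The sketch below assumes the usual definitions: traversal rate is edges traversed divided by wall-clock time, and scaling efficiency is the speedup over one GPU divided by the GPU count. The function names and all input numbers are hypothetical, chosen only to illustrate the calculation; they are not measurements from the paper.

```python
def gteps(edges_traversed: int, seconds: float) -> float:
    """Traversal rate in giga traversed edges per second."""
    return edges_traversed / seconds / 1e9


def scaling_efficiency(t_one_gpu: float, t_n_gpus: float, n_gpus: int) -> float:
    """Observed speedup over one GPU divided by the ideal speedup (n_gpus)."""
    speedup = t_one_gpu / t_n_gpus
    return speedup / n_gpus


# Hypothetical inputs (assumptions, not results from the paper).
rate = gteps(edges_traversed=150_000_000_000, seconds=0.5)
eff = scaling_efficiency(t_one_gpu=5.6, t_n_gpus=1.0, n_gpus=8)

print(f"rate = {rate:.1f} GTEP/s, efficiency = {eff:.0%}")
```

Under these definitions, 70% efficiency on 8 GPUs corresponds to a 5.6x speedup over a single GPU rather than the ideal 8x.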
Abstract
Breadth-First Search (BFS) is a building block used in a wide array of graph analytics and is used in various network analysis domains: social, road, transportation, communication, and many more. Over the last two decades, network sizes have continued to grow. The popularity of BFS has brought with it a need for significantly faster traversals. Thus, BFS algorithms have been designed to exploit shared-memory and shared-nothing systems -- this includes algorithms for accelerators such as the GPU. GPUs offer extremely fast traversals at the cost of processing smaller graphs due to their limited memory size. In contrast, CPU shared-memory systems can scale to graphs with several billion edges but lack the compute resources needed for fast traversals. This paper introduces ButterFly BFS, a multi-GPU traversal algorithm that allows analyzing significantly larger networks at high…
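The abstract excerpt does not describe the communication pattern itself. As a rough illustration of the general idea, the sketch below assumes the classic butterfly (recursive-doubling) exchange: with P = 2^k ranks, rank r swaps data with rank r XOR 2^i in round i, so every rank holds the combined result after log2(P) rounds. The ranks are simulated in a single process, the vertex-to-rank mapping (v % p) is an assumed 1D partition, and the names `butterfly_union` and `multi_rank_bfs` are hypothetical. The paper's actual algorithm may differ in all of these respects.

```python
def butterfly_union(local_sets):
    """Set-union all-reduce over P = 2^k simulated ranks in log2(P) rounds."""
    p = len(local_sets)
    assert p > 0 and p & (p - 1) == 0, "rank count must be a power of two"
    sets = [set(s) for s in local_sets]
    step, rounds = 1, 0
    while step < p:
        # Rank r exchanges with partner r XOR step; both keep the union.
        sets = [sets[r] | sets[r ^ step] for r in range(p)]
        step <<= 1
        rounds += 1
    return sets, rounds


def multi_rank_bfs(adj, source, p):
    """Level-synchronous BFS; vertex v is owned by rank v % p (assumption)."""
    # Simplification: one shared distance map instead of one per rank.
    dist = {source: 0}
    frontier = {source}
    level = 0
    while frontier:
        level += 1
        # Each rank expands only the frontier vertices it owns.
        local_next = [set() for _ in range(p)]
        for v in frontier:
            local_next[v % p].update(adj.get(v, ()))
        # After the butterfly exchange every rank holds all candidates.
        merged, _ = butterfly_union(local_next)
        frontier = {v for v in merged[0] if v not in dist}
        for v in frontier:
            dist[v] = level
    return dist


# Small hypothetical graph as an adjacency list.
adj = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
dist = multi_rank_bfs(adj, source=0, p=4)
print(dist)
```

The appeal of a butterfly exchange is that the number of communication rounds grows with log2(P) rather than with P, which is one plausible reason such a pattern suits frontier exchange across many GPUs.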
Taxonomy
Topics: Graph Theory and Algorithms · Caching and Content Delivery · Interconnection Networks and Systems
