How Fast Can Graph Computations Go on Fine-grained Parallel Architectures
Yuqing Wang, Charles Colley, Brian Wheatman, Jiya Su, David F. Gleich, Andrew A. Chien

TL;DR
This paper asks how fast graph computations can run on a specialized fine-grained parallel architecture. Using simulation and projection of optimized PageRank and BFS implementations, it reports large speedups over the best prior systems.
Contribution
It evaluates UpDown, an architecture designed for fine-grained parallelism, natural programming, and the irregularity and skew of real-world graphs, and reports PageRank and BFS throughput that exceeds the best prior published results.
Findings
UpDown achieves 637K GTEPS for PageRank on RMAT graphs
UpDown achieves 989K GTEPS for BFS on RMAT graphs
Performance exceeds the best prior results by 5x (PageRank) and 100x (BFS)
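GTEPS stands for giga (10^9) traversed edges per second. The sketch below is a minimal, sequential illustration of how such a figure is derived for the two benchmarks; it is not UpDown's fine-grained implementation. The helper names (bfs_edges_traversed, pagerank_edges_traversed, gteps, edges_per_second) are hypothetical, and the counting rules are simplifying assumptions: BFS counts every adjacency entry scanned from the frontier, and PageRank counts each edge once per iteration. The paper's exact accounting may differ.

```python
def bfs_edges_traversed(adj, source):
    """Level-synchronous BFS; returns (level of each reached vertex, edges scanned)."""
    level = {source: 0}
    frontier = [source]
    edges = 0
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                edges += 1  # assumption: every scanned edge counts toward TEPS
                if v not in level:
                    level[v] = level[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return level, edges


def pagerank_edges_traversed(adj, iterations=20, damping=0.85):
    """Push-style PageRank; returns (ranks, edges traversed over all iterations)."""
    n = len(adj)
    rank = [1.0 / n] * n
    edges = 0
    for _ in range(iterations):
        nxt = [(1.0 - damping) / n] * n
        for u, nbrs in enumerate(adj):
            if nbrs:
                share = damping * rank[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
                    edges += 1  # assumption: each edge counts once per iteration
            else:
                # Dangling vertex: spread its rank uniformly so ranks still sum to 1.
                for v in range(n):
                    nxt[v] += damping * rank[u] / n
        rank = nxt
    return rank, edges


def gteps(edges, seconds):
    """Throughput in giga traversed edges per second."""
    return edges / seconds / 1e9


def edges_per_second(gteps_value):
    """Convert a GTEPS figure back to raw edges per second."""
    return gteps_value * 1e9


if __name__ == "__main__":
    adj = [[1, 2], [2], [0, 3], [0]]  # tiny directed example graph with 6 edges
    levels, bfs_edges = bfs_edges_traversed(adj, 0)
    ranks, pr_edges = pagerank_edges_traversed(adj)
    print(levels, bfs_edges)  # {0: 0, 1: 1, 2: 1, 3: 2} 6
    print(pr_edges)  # 120
    print(edges_per_second(989_000))  # the BFS figure expressed in edges per second
```

At the reported scale, 989K GTEPS corresponds to roughly 9.89 x 10^14 edges per second, which is why sequential code like this serves only to define the metric, not to approximate the system.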
Abstract
Large-scale graph problems are of critical and growing importance, yet parallel architectures have historically provided little support for them. In the spirit of co-design, we explore the question: how fast can graph computing go on a fine-grained architecture? We consider an architecture optimized for fine-grained parallelism, natural programming, and the irregularity and skew found in real-world graphs. Using two graph benchmarks, PageRank (PR) and Breadth-First Search (BFS), we evaluate a Fine-Grained Graph architecture, UpDown, to explore what performance co-design can achieve. To demonstrate programmability, we wrote five variants of these algorithms. Simulations of up to 256 nodes (524,288 lanes) and projections to 16,384 nodes (33M lanes) show the UpDown system can achieve 637K GTEPS PR and 989K GTEPS BFS on RMAT, exceeding the best prior results by 5x and 100x…
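The node and lane counts in the abstract can be cross-checked with simple arithmetic. The sketch below assumes, as the two quoted figures imply, that the number of lanes per node stays constant between the simulated and projected configurations; the constant names are illustrative only.

```python
SIM_NODES = 256          # largest simulated configuration
SIM_LANES = 524_288      # lanes in that configuration
PROJECTED_NODES = 16_384 # largest projected configuration

# Assumption: each node contributes the same number of lanes at every scale.
assert SIM_LANES % SIM_NODES == 0
lanes_per_node = SIM_LANES // SIM_NODES              # 2,048 lanes per node
projected_lanes = PROJECTED_NODES * lanes_per_node   # 33,554,432, i.e. "33M"
scale_up = PROJECTED_NODES // SIM_NODES              # projection covers 64x more nodes

print(lanes_per_node, projected_lanes, scale_up)  # 2048 33554432 64
```

The result matches the abstract's "33M lanes", so the projection extends the simulated per-node design by a factor of 64 in node count rather than changing the node itself.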
Taxonomy
Topics: Graph Theory and Algorithms · VLSI and FPGA Design Techniques · Cloud Computing and Resource Management
