How Fast Can Graph Computations Go on Fine-grained Parallel Architectures

Yuqing Wang; Charles Colley; Brian Wheatman; Jiya Su; David F. Gleich; Andrew A. Chien

arXiv:2507.00949 · cs.DC · July 2, 2025

TL;DR

This paper asks how fast graph computations can run on UpDown, a fine-grained parallel architecture. Simulations of up to 256 nodes and projections to 16,384 nodes show 637K GTEPS for PageRank and 989K GTEPS for BFS on RMAT, exceeding the best prior results by 5x and 100x.

Contribution

It evaluates UpDown, an architecture optimized for fine-grained parallelism, natural programming, and the irregularity and skew of real-world graphs, using five variants of PageRank and BFS written to demonstrate programmability.

Findings

01

UpDown achieves 637K GTEPS for PageRank on RMAT

02

UpDown achieves 989K GTEPS for BFS on RMAT

03

Performance exceeds the best prior results by 5x and 100x

Abstract

Large-scale graph problems are of critical and growing importance, yet parallel architectures have historically provided little support for them. In the spirit of co-design, we ask: how fast can graph computing go on a fine-grained architecture? We explore the possibilities of an architecture optimized for fine-grained parallelism, natural programming, and the irregularity and skew found in real-world graphs. Using two graph benchmarks, PageRank (PR) and Breadth-First Search (BFS), we evaluate a Fine-Grained Graph architecture, UpDown, to explore what performance co-design can achieve. To demonstrate programmability, we wrote five variants of these algorithms. Simulations of up to 256 nodes (524,288 lanes) and projections to 16,384 nodes (33M lanes) show the UpDown system can achieve 637K GTEPS PR and 989K GTEPS BFS on RMAT, exceeding the best prior results by 5x and 100x…
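
The scale figures in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: it derives the lanes-per-node count implied by the simulated configuration and, assuming that count stays the same in the projected system (an assumption, not something the abstract states), reproduces the "33M lanes" figure. The `gteps` helper and its sample inputs are hypothetical and simply spell out the unit: billions of traversed edges per second.

```python
# Figures quoted in the abstract.
SIM_NODES = 256        # largest simulated system
SIM_LANES = 524_288    # lanes in that simulated system
PROJ_NODES = 16_384    # projected system size

# Lanes per node implied by the simulated configuration.
lanes_per_node = SIM_LANES // SIM_NODES

# Assumption: the projected system keeps the same lanes-per-node count.
proj_lanes = PROJ_NODES * lanes_per_node


def gteps(edges_traversed: int, seconds: float) -> float:
    """Return giga traversed edges per second (hypothetical helper)."""
    return edges_traversed / seconds / 1e9


if __name__ == "__main__":
    print(f"lanes per node:  {lanes_per_node:,}")
    print(f"projected lanes: {proj_lanes:,}")
    # Made-up inputs, only to show the unit: 2 billion edges in 0.5 s.
    print(f"example rate:    {gteps(2_000_000_000, 0.5)} GTEPS")
```

Under that assumption the projected lane count comes to 33,554,432, which matches the abstract's rounded "33M lanes".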

Taxonomy

Topics: Graph Theory and Algorithms · VLSI and FPGA Design Techniques · Cloud Computing and Resource Management