GraphAGILE: An FPGA-based Overlay Accelerator for Low-latency GNN Inference
Bingyi Zhang, Hanqing Zeng, Viktor Prasanna

TL;DR
GraphAGILE is an FPGA-based overlay accelerator with a unified architecture and compiler that enables low-latency GNN inference without FPGA reconfiguration, supporting various models and graph sizes.
Contribution
It introduces a novel instruction set architecture and compiler for FPGA-based GNN inference, eliminating reconfiguration and optimizing execution across diverse models.
Findings
Achieves low-latency inference on multiple GNN models.
Supports dynamic load balancing and computation optimization.
Operates efficiently on a state-of-the-art FPGA platform.
Abstract
This paper presents GraphAGILE, a domain-specific FPGA-based overlay accelerator for graph neural network (GNN) inference. GraphAGILE consists of (1) a novel unified architecture design with an instruction set, and (2) a compiler built upon the instruction set that can quickly generate optimized code. Because of the proposed instruction set architecture (ISA) and the compiler, GraphAGILE does not require any FPGA reconfiguration when performing inference on various GNN models and input graphs. For the architecture design, we propose a novel hardware module named Adaptive Computation Kernel (ACK) that can execute various computation kernels of GNNs, including general matrix multiplication (GEMM), sparse-dense matrix multiplication (SpDMM), and sampled dense-dense matrix multiplication (SDDMM). The compiler takes the specifications of a GNN model and the graph metadata…
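To make the three kernels named in the abstract concrete, here is a minimal software sketch of what GEMM, SpDMM, and SDDMM compute. It is only a reference illustration in plain Python: the function names, the `(row, col, weight)` edge-list layout, and the toy data are assumptions made for this example and do not reflect GraphAGILE's ISA, its ACK hardware, or its compiler output. In GNN workloads these kernels typically correspond to feature transformation (GEMM), neighbor aggregation (SpDMM), and per-edge score computation such as attention (SDDMM).

```python
# Reference (software-only) versions of the three kernels named in the abstract.
# All names and the edge-list layout are hypothetical, chosen for illustration.

def gemm(a, b):
    """Dense matrix product C = A @ B on lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def spdmm(edges, h, num_rows):
    """Sparse-dense product: out[i] += w * h[j] for every nonzero (i, j, w)."""
    width = len(h[0])
    out = [[0.0] * width for _ in range(num_rows)]
    for i, j, w in edges:
        for f in range(width):
            out[i][f] += w * h[j][f]
    return out

def sddmm(edges, x, y):
    """Sampled dense-dense product: w * dot(x[i], y[j]) only at nonzeros (i, j, w)."""
    return [(i, j, w * sum(xi * yj for xi, yj in zip(x[i], y[j])))
            for i, j, w in edges]

# Toy graph with 3 vertices; each edge is (destination, source, weight).
edges = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 0.5)]
h = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # one 2-dim feature vector per vertex
identity = [[1.0, 0.0], [0.0, 1.0]]         # stand-in for a layer weight matrix

transformed = gemm(h, identity)      # feature transformation
aggregated = spdmm(edges, h, 3)      # weighted sum over neighbors
edge_scores = sddmm(edges, h, h)     # one score per existing edge
```

The point of the sketch is that SpDMM and SDDMM both iterate over the nonzeros of the adjacency matrix while GEMM is fully dense, which is why a single hardware module able to switch between these access patterns is useful for running different GNN models without reconfiguration.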
Taxonomy
Topics: Advanced Graph Neural Networks · Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques
