AMPLE: Event-Driven Accelerator for Mixed-Precision Inference of Graph Neural Networks
Pedro Gimenes, Yiren Zhao, George Constantinides

TL;DR
AMPLE is an FPGA-based accelerator for GNN inference that uses an event-driven approach and mixed-precision quantization, achieving speedups of 243x over CPU and 7.2x over GPU implementations.
Contribution
It introduces an event-driven FPGA architecture with mixed-precision quantization and a novel prefetcher to improve GNN inference performance on irregular graph data.
Findings
Achieved 243x speedup over CPU
Achieved 7.2x speedup over GPU
Effective handling of irregular memory access patterns
Abstract
Graph Neural Networks (GNNs) have recently gained attention due to their performance on non-Euclidean data. The use of custom hardware architectures proves particularly beneficial for GNNs due to their irregular memory access patterns, resulting from the sparse structure of graphs. However, existing FPGA accelerators are limited by their double buffering mechanism, which doesn't account for the irregular node distribution in typical graph datasets. To address this, we introduce AMPLE (Accelerated Message Passing Logic Engine), an FPGA accelerator leveraging a new event-driven programming flow. We develop a mixed-arithmetic architecture, enabling GNN inference to be quantized at a node-level granularity. Finally, a prefetcher for data and instructions is implemented to optimize off-chip memory access and maximize node parallelism. Evaluation on citation and social media graph…
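To make "quantized at a node-level granularity" concrete, here is a minimal software sketch in Python. It is not AMPLE's hardware design. The precision labels ("float", "int8"), the degree-based policy in assign_precision, and all function names are assumptions made for illustration; the abstract does not say which number formats are used or how nodes are assigned to them.

```python
from typing import List, Tuple


def quantize_int8(x: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization of one node's feature vector.

    Returns the integer codes and the scale needed to map them back.
    """
    max_abs = max((abs(v) for v in x), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in x]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def assign_precision(degrees: List[int], threshold: int) -> List[str]:
    """Hypothetical policy (an assumption, not from the paper):
    high-degree nodes stay in float, the rest use int8."""
    return ["float" if d >= threshold else "int8" for d in degrees]


def apply_node_precision(
    features: List[List[float]], precisions: List[str]
) -> List[List[float]]:
    """Pass each node's features through the precision chosen for that node."""
    out = []
    for x, p in zip(features, precisions):
        if p == "int8":
            codes, scale = quantize_int8(x)
            out.append(dequantize(codes, scale))
        else:
            out.append(list(x))  # float nodes are left untouched
    return out


if __name__ == "__main__":
    feats = [[1.0, -0.5], [0.3, 0.2]]
    precs = assign_precision(degrees=[10, 2], threshold=5)
    print(precs)
    print(apply_node_precision(feats, precs))
```

The point of the sketch is only that the precision decision is made per node rather than per layer or per model, so nodes in the same graph can take different arithmetic paths.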
Taxonomy
Methods: Softmax · Attention Is All You Need
