Model-Architecture Co-Design for High Performance Temporal GNN Inference on FPGA
Hongkuan Zhou, Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, Carl Busart

TL;DR
This paper co-designs a simplified model and an FPGA hardware architecture for high-performance inference of temporal graph neural networks (TGNNs), gaining efficiency through lightweight attention computation, temporal neighbor pruning, and hardware-level optimizations.
Contribution
The paper presents a model-architecture co-design for efficient TGNN inference on FPGAs that combines a lightweight attention computation, temporal neighbor pruning, and hardware optimizations.
Findings
Achieved high throughput on real-world datasets.
Reduced computation and memory access through pruning and hardware design.
Maintained accuracy with knowledge distillation.
Abstract
Temporal Graph Neural Networks (TGNNs) are powerful models to capture temporal, structural, and contextual information on temporal graphs. The generated temporal node embeddings outperform other methods in many downstream tasks. Real-world applications require high performance inference on real-time streaming dynamic graphs. However, these models usually rely on complex attention mechanisms to capture relationships between temporal neighbors. In addition, maintaining vertex memory suffers from intrinsic temporal data dependency that hinders task-level parallelism, making it inefficient on general-purpose processors. In this work, we present a novel model-architecture co-design for inference in memory-based TGNNs on FPGAs. The key modeling optimizations we propose include a light-weight method to compute attention scores and a related temporal neighbor pruning strategy to further reduce…
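The idea of pruning temporal neighbors by attention score can be illustrated with a minimal sketch. It assumes the attention logits have already been produced by some cheap scoring step, and it uses a plain top-k rule; both the rule and the names `prune_neighbors` and `aggregate` are hypothetical and not taken from the paper, whose exact scoring and pruning method is not given in this summary.

```python
import math


def prune_neighbors(logits, k):
    """Return the indices of the k largest logits, in original order.

    Top-k selection is an illustrative assumption, not the paper's rule.
    """
    if k < 1 or not logits:
        raise ValueError("need at least one neighbor and k >= 1")
    if k >= len(logits):
        return list(range(len(logits)))
    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return sorted(ranked[:k])


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(neighbor_feats, logits, k):
    """Attention-weighted sum over the k highest-scoring neighbors only."""
    keep = prune_neighbors(logits, k)
    # Softmax is renormalized over the surviving neighbors.
    weights = softmax([logits[i] for i in keep])
    dim = len(neighbor_feats[0])
    out = [0.0] * dim
    for w, i in zip(weights, keep):
        # Only the features of kept neighbors are ever read.
        for d in range(dim):
            out[d] += w * neighbor_feats[i][d]
    return keep, out


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [4.0, 0.0]]
    logits = [0.1, 2.0, -1.0, 2.0]
    print(aggregate(feats, logits, k=2))
```

Because the selection depends only on the logits, the features of pruned neighbors never have to be fetched, which is one way pruning can cut both computation and memory access.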
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Data Quality and Management
Methods: Pruning · Knowledge Distillation
