Accelerating Generic Graph Neural Networks via Architecture, Compiler, Partition Method Co-Design
Shuwen Lu, Zhihui Zhang, Cong Guo, Jingwen Leng, Yangjie Zhou, Minyi Guo

TL;DR
This paper presents SwitchBlade, a hardware-software co-design framework that accelerates various GNN models by reducing bandwidth needs and improving hardware utilization, achieving significant speedups and energy savings.
Contribution
The paper introduces a novel form of partition-level operator fusion, together with multi-threading and fine-grained graph partitioning, to support diverse GNN models efficiently.
Findings
Achieves 1.85x speedup over NVIDIA V100 GPU
Reduces energy consumption by 19.03x
Performs comparably to specialized GNN accelerators
Abstract
Graph neural networks (GNNs) have shown significant accuracy improvements in a variety of graph learning domains, sparking considerable research interest. To translate these accuracy improvements into practical applications, it is essential to develop high-performance and efficient hardware acceleration for GNN models. However, designing GNN accelerators faces two fundamental challenges: the high bandwidth requirement of GNN models and the diversity of GNN models. Previous works have addressed the first challenge by using more expensive memory interfaces to achieve higher bandwidth. For the second challenge, existing works either support specific GNN models or have generic designs with poor hardware utilization. In this work, we tackle both challenges simultaneously. First, we identify a new type of partition-level operator fusion, which we utilize to internally reduce the high…
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Machine Learning in Materials Science
