Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training

Borui Wan; Juntao Zhao; Chuan Wu

arXiv:2306.01381·cs.LG·June 5, 2023·2 cites

TL;DR

This paper introduces AdaQP, a system that accelerates distributed full-graph GNN training through adaptive message quantization and communication-computation parallelization, improving training throughput by up to 3.01x with an accuracy drop of at most 0.30%.

Contribution

The paper proposes a novel adaptive quantization scheme and communication-computation parallelization for efficient distributed GNN training, with theoretical convergence guarantees.
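The communication-computation parallelization can be pictured with a toy, single-process sketch. Following the abstract's terminology, a node is marginal if it has a neighbor on another device and central otherwise; central nodes can be aggregated while remote messages are still in flight. Everything here is an assumption for illustration: the node-to-device map `owner`, the undirected edge list, mean aggregation, and the hypothetical `fetch_remote` callable standing in for the cross-device exchange. A thread pool plays the role of asynchronous communication; this is not code from the AdaQP repository.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def split_nodes(rank, owner, edges):
    """Return sorted (marginal, central) node IDs owned by device `rank`.

    owner maps node ID -> device rank; edges is an undirected edge list.
    A local node is marginal if any neighbor lives on another device.
    """
    local = {n for n, r in owner.items() if r == rank}
    marginal = set()
    for u, v in edges:
        if u in local and owner[v] != rank:
            marginal.add(u)
        if v in local and owner[u] != rank:
            marginal.add(v)
    return sorted(marginal), sorted(local - marginal)


def mean_aggregate(nodes, neighbors, feats):
    """Mean-aggregate neighbor features for every node in `nodes`."""
    return {n: np.mean([feats[m] for m in neighbors[n]], axis=0) for n in nodes}


def layer_forward(rank, owner, edges, local_feats, fetch_remote):
    """One mean-aggregation step that overlaps communication with computation.

    Assumes every local node has at least one neighbor. fetch_remote is a
    hypothetical stand-in for the cross-device exchange; it returns
    {remote node ID: feature vector}.
    """
    marginal, central = split_nodes(rank, owner, edges)
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_remote)  # start the "communication"
        # Central nodes only have local neighbors, so compute them right away.
        out = mean_aggregate(central, neighbors, local_feats)
        remote = pending.result()  # block only after central work is done
    # Marginal nodes need the remote messages before they can aggregate.
    out.update(mean_aggregate(marginal, neighbors, {**local_feats, **remote}))
    return out
```

The only blocking point is `pending.result()`, which delays the marginal-node part of the layer but not the central-node part; a real multi-device system would issue asynchronous collective calls instead of using a thread.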

Findings

01. Up to 3.01x improvement in training throughput.
02. Negligible accuracy drop of at most 0.30%.
03. Effective adaptive bit-width assignment for messages.
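Finding 03 refers to choosing bit-widths per message. This page does not describe how AdaQP makes that choice, so the following is a hypothetical stand-in rather than the paper's algorithm: a greedy rule that trades traffic against the standard variance bound for stochastic rounding. For a message of dimension d whose values span a range R, quantizing with b bits gives a step Δ = R / (2^b - 1) and a per-element variance of at most Δ²/4, so wide-range messages gain more from extra bits. The `(dim, value_range)` input format, the 2/4/8-bit `choices`, and the total `budget_bits` are all assumptions.

```python
def variance_bound(dim, value_range, bits):
    """Upper bound on the total variance of stochastic rounding to `bits` bits."""
    step = value_range / (2 ** bits - 1)
    return dim * step ** 2 / 4


def assign_bit_widths(messages, budget_bits, choices=(2, 4, 8)):
    """Hypothetical greedy bit-width assignment under a total traffic budget.

    messages: list of (dim, value_range) tuples. Every message starts at the
    lowest width; the loop then applies the upgrade with the largest variance
    reduction per extra bit sent until no upgrade fits in the budget.
    """
    choices = sorted(choices)
    level = [0] * len(messages)
    used = sum(dim * choices[0] for dim, _ in messages)
    if used > budget_bits:
        raise ValueError("budget too small for the minimum bit-width")
    while True:
        best, best_gain = None, 0.0
        for i, (dim, rng) in enumerate(messages):
            if level[i] + 1 >= len(choices):
                continue  # already at the widest choice
            b_old, b_new = choices[level[i]], choices[level[i] + 1]
            extra = dim * (b_new - b_old)
            if used + extra > budget_bits:
                continue  # upgrade would exceed the budget
            gain = (variance_bound(dim, rng, b_old) - variance_bound(dim, rng, b_new)) / extra
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        dim = messages[best][0]
        used += dim * (choices[level[best] + 1] - choices[level[best]])
        level[best] += 1
    return [choices[lv] for lv in level]
```

With `assign_bit_widths([(4, 10.0), (4, 0.1)], 24)` the single affordable upgrade goes to the wide-range message, giving `[4, 2]`.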

Abstract

Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming. Frequent exchanges of node features, embeddings and embedding gradients (all referred to as messages) across devices bring significant communication overhead for nodes with remote neighbors on other devices (marginal nodes) and unnecessary waiting time for nodes without remote neighbors (central nodes) in the training graph. This paper proposes an efficient GNN training system, AdaQP, to expedite distributed full-graph GNN training. We stochastically quantize messages transferred across devices to lower-precision integers for communication traffic reduction and advocate communication-computation parallelization between marginal nodes and central nodes. We provide theoretical analysis to prove fast training convergence (at the rate of O(T^{-1}) with T being the…
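The abstract's stochastic quantization of messages into lower-precision integers can be illustrated with a minimal NumPy sketch. It assumes per-row (per-node) min-max scaling, a common choice for this kind of scheme; the page does not spell out AdaQP's exact quantizer, and `stochastic_quantize` and `dequantize` are illustrative names, not functions from the raywan-110/adaqp repository. Rounding up with probability equal to the fractional part makes the dequantized value an unbiased estimate of the original.

```python
import numpy as np


def stochastic_quantize(x, bits, rng):
    """Quantize each row of x to unsigned `bits`-bit integers (bits <= 16).

    Uses per-row min-max scaling and stochastic rounding; returns the
    integer codes plus the per-row offset and scale needed to dequantize.
    """
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    levels = 2 ** bits - 1
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)  # avoid zero scale
    normalized = (x - lo) / scale  # values in [0, levels]
    floor = np.floor(normalized)
    # Round up with probability equal to the fractional part (unbiased).
    q = floor + (rng.random(x.shape) < (normalized - floor))
    q = np.clip(q, 0, levels)  # guard against floating-point overshoot
    dtype = np.uint8 if bits <= 8 else np.uint16
    return q.astype(dtype), lo, scale


def dequantize(q, lo, scale):
    """Map integer codes back to floating-point values."""
    return q.astype(np.float64) * scale + lo
```

Only `q` and one `(lo, scale)` pair per row would need to cross devices. With bit packing, which this sketch omits by storing codes in `uint8`, 4-bit codes would carry 4 bits per value instead of 32 for float32 messages, plus the small per-row overhead.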

Code & Models

Repositories

raywan-110/adaqp (PyTorch, official)

Taxonomy

Topics: Advanced Graph Neural Networks · Age of Information Optimization · IoT and Edge/Fog Computing