Communication-Free Distributed GNN Training with Vertex Cut

Kaidi Cao; Rui Deng; Shirley Wu; Edward W Huang; Karthik Subbian; Jure Leskovec

arXiv:2308.03209·cs.LG·August 8, 2023

TL;DR

The paper introduces CoFree-GNN, a distributed GNN training framework that uses Vertex Cut partitioning to make training communication-free. It reports up to 10x faster training without sacrificing accuracy.

Contribution

It proposes a novel communication-free distributed GNN training method utilizing Vertex Cut partitioning and a reweighting mechanism to maintain accuracy.

Findings

01. Speeds up GNN training by up to 10 times.

02. Maintains high model accuracy with reweighting.

03. Effective on real-world large-scale networks.

Abstract

Training Graph Neural Networks (GNNs) on real-world graphs consisting of billions of nodes and edges is quite challenging, primarily due to the substantial memory needed to store the graph and its intermediate node and edge features, and there is a pressing need to speed up the training process. A common approach to achieve speed up is to divide the graph into many smaller subgraphs, which are then distributed across multiple GPUs in one or more machines and processed in parallel. However, existing distributed methods require frequent and substantial cross-GPU communication, leading to significant time overhead and progressively diminishing scalability. Here, we introduce CoFree-GNN, a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training. The framework utilizes a Vertex Cut partitioning, i.e., rather than…
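As a rough illustration of the Vertex Cut idea named in the abstract, the sketch below assigns every edge to exactly one partition and copies a node into each partition that holds one of its edges, so that each partition is a self-contained subgraph. The round-robin edge assignment, the function names `vertex_cut_partition` and `replication_counts`, and the toy edge list are assumptions made for illustration only; they are not taken from the paper, which may use a different assignment rule.

```python
from collections import defaultdict


def vertex_cut_partition(edges, num_parts):
    """Place each edge in exactly one partition and replicate its endpoints there.

    Hypothetical helper: the round-robin rule below is a stand-in,
    not the assignment strategy used by CoFree-GNN.
    """
    part_edges = [[] for _ in range(num_parts)]
    part_nodes = [set() for _ in range(num_parts)]
    for idx, (u, v) in enumerate(edges):
        p = idx % num_parts  # assumed placeholder assignment rule
        part_edges[p].append((u, v))
        # Both endpoints are stored locally, so the partition never
        # needs to fetch a neighbor from another partition.
        part_nodes[p].update((u, v))
    return part_edges, part_nodes


def replication_counts(part_nodes):
    """Count how many partitions hold a copy of each node."""
    counts = defaultdict(int)
    for nodes in part_nodes:
        for node in nodes:
            counts[node] += 1
    return dict(counts)


# Toy graph: a 4-cycle with one chord (illustrative data only).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
part_edges, part_nodes = vertex_cut_partition(edges, num_parts=2)
counts = replication_counts(part_nodes)

for p, (es, ns) in enumerate(zip(part_edges, part_nodes)):
    print(f"partition {p}: edges={es} nodes={sorted(ns)}")
print("copies per node:", counts)
```

No edge is ever split between partitions; the cost is that some nodes exist in more than one partition. Per-node copy counts like the ones computed here are the kind of quantity a reweighting step could consult, though this summary does not specify how CoFree-GNN's reweighting is defined.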

Taxonomy

Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Brain Tumor Detection and Classification

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings