MixGCN: Scalable GCN Training by Mixture of Parallelism and Mixture of Accelerators

Cheng Wan; Runkai Tao; Zheng Du; Yang Katie Zhao; Yingyan Celine Lin

arXiv:2501.01951 · cs.LG · February 26, 2025

TL;DR

MixGCN is a training framework for graph convolutional networks (GCNs) that combines a mixture of parallelism with a mixture of accelerators to improve the scalability and efficiency of full-graph training.

Contribution

The paper proposes combining a mixture of parallelism with a mixture of accelerators to address the memory, communication, and computation challenges of scalable full-graph GCN training.

Findings

01. Achieves constant communication volume through a mixture of parallelism.

02. Improves workload balance, as supported by theoretical and empirical analysis.

03. Demonstrates improved training efficiency and scalability in experiments.

Abstract

Graph convolutional networks (GCNs) have demonstrated superiority in graph-based learning tasks. However, training GCNs on full graphs is particularly challenging for two reasons: (1) the associated feature tensors can easily exceed the memory capacity and saturate the communication bandwidth of modern accelerators, and (2) the computation workflow in training GCNs alternates between sparse and dense matrix operations, complicating the efficient utilization of computational resources. Existing solutions for scalable distributed full-graph GCN training mostly adopt partition parallelism, which is unsatisfactory because it only partially addresses the first challenge while incurring scaled-out communication volume. To this end, we propose MixGCN, which aims to address both challenges simultaneously. To tackle the first challenge, MixGCN integrates…
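
The alternation between sparse and dense matrix operations that the abstract describes can be seen in a single generic GCN layer: a sparse aggregation over the graph (adjacency times features) followed by a dense feature transform (times a weight matrix). The sketch below is a minimal pure-Python illustration of that pattern, not MixGCN's implementation. The function names, the CSR layout, the unnormalized adjacency, and the toy 3-node graph are all assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]


def spmm_csr(indptr: List[int], indices: List[int], values: List[float], H: Matrix) -> Matrix:
    """Sparse step: neighbor aggregation A @ H, with A stored in CSR form."""
    n_feat = len(H[0])
    out = [[0.0] * n_feat for _ in range(len(indptr) - 1)]
    for row in range(len(indptr) - 1):
        # Only the stored (nonzero) entries of this row are visited.
        for k in range(indptr[row], indptr[row + 1]):
            col, weight = indices[k], values[k]
            for f in range(n_feat):
                out[row][f] += weight * H[col][f]
    return out


def dense_matmul(X: Matrix, W: Matrix) -> Matrix:
    """Dense step: feature transform X @ W."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)] for row in X]


def relu(X: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in X]


def gcn_layer(indptr, indices, values, H: Matrix, W: Matrix) -> Matrix:
    """One layer: sparse aggregation, then dense update, then ReLU."""
    return relu(dense_matmul(spmm_csr(indptr, indices, values, H), W))


# Hypothetical toy input: path graph 0-1-2 with self-loops,
# unnormalized adjacency (all stored entries are 1.0) for simplicity.
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
values = [1.0] * 7
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 nodes x 2 features
W = [[1.0, -1.0], [0.5, 1.0]]             # 2 x 2 weight matrix

aggregated = spmm_csr(indptr, indices, values, H)
output = gcn_layer(indptr, indices, values, H, W)
```

The sparse step has irregular, graph-dependent memory access, while the dense step is a regular matrix product. Their different compute profiles are what make a single kind of accelerator hard to keep fully utilized across both.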

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Graph Convolutional Network · ADaptive gradient method with the OPTimal convergence rate