MixGCN: Scalable GCN Training by Mixture of Parallelism and Mixture of Accelerators
Cheng Wan, Runkai Tao, Zheng Du, Yang Katie Zhao, Yingyan Celine Lin

TL;DR
MixGCN introduces a novel training framework for GCNs that combines multiple parallelism strategies and heterogeneous accelerators, significantly improving scalability and efficiency in full-graph training.
Contribution
MixGCN combines a mixture of parallelism with a mixture of accelerators to address the memory, communication, and computation challenges of scalable full-graph GCN training.
Findings
Achieves a constant communication volume through its mixture of parallelism, in contrast to the scaled-out communication volume of partition parallelism.
Improves workload balance, as verified by theoretical and empirical analysis.
Demonstrates improved training efficiency and scalability in experiments.
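For context on the communication-volume finding: under partition parallelism, each device owns a block of nodes and must fetch the features of its remote neighbours in every layer, so the transferred volume grows as the graph is split across more devices. The following is a minimal sketch of that baseline cost only. It assumes a ring graph and equal contiguous blocks, both chosen purely for illustration (practical systems typically use a graph partitioner such as METIS). The helper names `ring_graph`, `contiguous_partition`, and `partition_comm_volume` are hypothetical. It counts forward-pass feature transfers for a single layer and does not model MixGCN's own scheme.

```python
from collections import defaultdict


def ring_graph(num_nodes):
    """Hypothetical toy graph: undirected ring where node i links to i + 1 (mod n)."""
    return [(i, (i + 1) % num_nodes) for i in range(num_nodes)]


def contiguous_partition(num_nodes, num_parts):
    """Hypothetical scheme: split node ids into equal contiguous blocks.

    Returns owner[i], the partition (device) that holds node i.
    """
    block = -(-num_nodes // num_parts)  # ceiling division
    return [i // block for i in range(num_nodes)]


def partition_comm_volume(edges, owner, feat_dim):
    """Feature elements received by all partitions for one layer's forward pass.

    Each partition needs the features of every distinct remote neighbour
    (a node owned by another partition) adjacent to one of its own nodes.
    Backward-pass gradient traffic is ignored in this sketch.
    """
    remote = defaultdict(set)
    for u, v in edges:
        if owner[u] != owner[v]:
            remote[owner[u]].add(v)  # u's partition must fetch v's features
            remote[owner[v]].add(u)  # v's partition must fetch u's features
    return sum(len(nodes) for nodes in remote.values()) * feat_dim


if __name__ == "__main__":
    edges = ring_graph(16)
    for k in (1, 2, 4, 8):
        volume = partition_comm_volume(edges, contiguous_partition(16, k), feat_dim=128)
        print(k, volume)  # prints "1 0", "2 512", "4 1024", "8 2048" on separate lines
```

On the 16-node ring with 128 features per node, the received volume doubles each time the number of partitions doubles (0, 512, 1024, and 2048 elements for 1, 2, 4, and 8 partitions). This is a toy illustration of the scaled-out communication volume that the abstract attributes to partition parallelism.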
Abstract
Graph convolutional networks (GCNs) have demonstrated superiority in graph-based learning tasks. However, training GCNs on full graphs is particularly difficult for two reasons: (1) the associated feature tensors can easily exhaust the memory and saturate the communication bandwidth of modern accelerators, and (2) the computation workflow in training GCNs alternates between sparse and dense matrix operations, which complicates the efficient utilization of computational resources. Existing solutions for scalable distributed full-graph GCN training mostly adopt partition parallelism, which is unsatisfactory because it only partially addresses the first challenge while incurring a communication volume that grows as training scales out. To this end, we propose MixGCN, which aims to address both challenges simultaneously. To tackle the first challenge, MixGCN integrates…
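The second challenge, alternating sparse and dense matrix operations, is visible in a single GCN layer, H' = σ(Â H W). Multiplying by the normalized adjacency Â is a sparse-dense product whose memory accesses follow the graph structure. Multiplying by the weight matrix W is a regular dense product. The plain-Python sketch that follows makes the two phases explicit. It is an illustration under stated assumptions only: the helper names are hypothetical, Â is row-normalized rather than symmetrically normalized, and the code does not reproduce MixGCN's kernels or its mixture of accelerators.

```python
def normalized_adjacency_csr(num_nodes, edges):
    """CSR arrays for D^-1 (A + I): row-normalised adjacency with self-loops.

    Row normalisation is a simplifying assumption; the original GCN
    formulation uses the symmetric D^-1/2 (A + I) D^-1/2.
    """
    neighbours = [{i} for i in range(num_nodes)]  # start with self-loops
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    indptr, indices, values = [0], [], []
    for nbrs in neighbours:
        for v in sorted(nbrs):
            indices.append(v)
            values.append(1.0 / len(nbrs))
        indptr.append(len(indices))
    return indptr, indices, values


def spmm(indptr, indices, values, dense):
    """Sparse (CSR) x dense product: access order is dictated by the graph."""
    n_cols = len(dense[0])
    out = []
    for row in range(len(indptr) - 1):
        acc = [0.0] * n_cols
        for k in range(indptr[row], indptr[row + 1]):
            a, src = values[k], dense[indices[k]]  # gather one neighbour's feature row
            for j in range(n_cols):
                acc[j] += a * src[j]
        out.append(acc)
    return out


def gemm(a, b):
    """Dense x dense product: regular access pattern, arithmetic-heavy."""
    inner, n_cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(n_cols)]
            for row in a]


def relu(m):
    """Element-wise ReLU on a list-of-lists matrix."""
    return [[max(0.0, x) for x in row] for row in m]


def gcn_layer(adj, h, w):
    """One GCN layer, H' = ReLU(A_hat H W), run as a sparse phase then a dense phase."""
    indptr, indices, values = adj
    aggregated = spmm(indptr, indices, values, h)  # sparse phase: A_hat @ H
    return relu(gemm(aggregated, w))               # dense phase: (A_hat H) @ W, then ReLU
```

For example, on the 3-node path graph 0-1-2 (`normalized_adjacency_csr(3, [(0, 1), (1, 2)])`) with a single input feature `[[1.0], [2.0], [3.0]]` and `W = [[2.0]]`, the layer returns approximately `[[3.0], [4.0], [5.0]]`. Keeping `spmm` and `gemm` as separate functions mirrors the abstract's point: the two phases have different access patterns, so hardware tuned for one is not necessarily efficient for the other.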
