Optimizing Task Placement and Online Scheduling for Distributed GNN Training Acceleration

Ziyue Luo; Yixin Bao; Chuan Wu

arXiv:2204.11224 · cs.DC · August 23, 2022

TL;DR

This paper presents a novel framework for task placement and online scheduling to accelerate distributed GNN training, addressing data transmission bottlenecks and resource utilization issues.

Contribution

It introduces an integrated algorithm framework with online scheduling and task placement schemes specifically designed for distributed GNN training.

Findings

01. Achieved up to 67% training speed-up over baselines.

02. Improved resource utilization and execution pipelining.

03. Effective handling of large graph data transmission challenges.

Abstract

Training Graph Neural Networks (GNNs) on large graphs is resource-intensive and time-consuming, mainly because the graph data is too large to fit into the memory of a single machine and has to be fetched from distributed graph storage and processed on the go. Unlike distributed deep neural network (DNN) training, the bottleneck in distributed GNN training lies largely in transmitting large graph data to construct mini-batches of training samples. Existing solutions often advocate data-computation colocation and do not work well with limited resources, where colocation is infeasible. The potential of strategic task placement and optimal scheduling of data transmission and task execution has not been well explored. This paper designs an efficient algorithm framework for task placement and execution scheduling of distributed GNN training, to better resource utilization,…
