Staleness-Alleviated Distributed GNN Training via Online Dynamic-Embedding Prediction

Guangji Bai; Ziyang Yu; Zheng Chai; Yue Cheng; Liang Zhao

arXiv:2308.13466·cs.LG·December 12, 2023

TL;DR

This paper introduces SAT, a scalable distributed GNN training framework that predicts future node embeddings to reduce staleness, improving convergence and performance on large-scale graphs.

Contribution

The paper proposes an online embedding prediction model that adaptively reduces staleness in distributed GNN training, enhancing scalability and convergence.

Findings

01. SAT reduces embedding staleness effectively.

02. SAT improves convergence speed on large-scale datasets.

03. SAT achieves better performance than traditional methods.

Abstract

Despite the recent success of Graph Neural Networks (GNNs), training GNNs on large-scale graphs remains challenging due to neighbor explosion. As a remedy, distributed computing is a promising solution that leverages abundant computing resources (e.g., GPUs). However, the node dependencies in graph data make it difficult to achieve high concurrency in distributed GNN training, which suffers from massive communication overhead. To address this, historical value approximation is regarded as a promising class of distributed training techniques. It uses an offline memory to cache historical information (e.g., node embeddings) as an affordable approximation of the exact values, thereby achieving high concurrency. However, these benefits come at the cost of using outdated training information, which leads to staleness, imprecision, and convergence issues. To overcome these challenges, this…
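The historical value approximation described in the abstract can be illustrated with a small sketch. This is not SAT's implementation and not the API of any library: the class `HistoricalEmbeddingCache`, the function `aggregate_mean`, and the step-count definition of staleness are all hypothetical names and assumptions chosen for illustration. The sketch only shows how a cached embedding can stand in for a remote neighbor's exact embedding, and how the age of that cached value can be tracked.

```python
class HistoricalEmbeddingCache:
    """Toy cache of node embeddings (hypothetical, for illustration only).

    Each entry records the training step at which it was written, so a
    reader can tell how stale the value is when it is pulled later.
    """

    def __init__(self):
        self._store = {}  # node_id -> (embedding, step_written)

    def push(self, node_id, embedding, step):
        # Overwrite the cached embedding with a fresher copy.
        self._store[node_id] = (list(embedding), step)

    def pull(self, node_id, current_step):
        # Return the cached embedding and its age in steps.
        embedding, step_written = self._store[node_id]
        return embedding, current_step - step_written


def aggregate_mean(local, cache, remote_ids, current_step):
    """Average exact local embeddings with cached remote embeddings.

    Returns the mean vector and the largest staleness among the cached
    values that were used (assumed measure: steps since last write).
    """
    vectors = list(local)
    max_staleness = 0
    for node_id in remote_ids:
        emb, staleness = cache.pull(node_id, current_step)
        vectors.append(emb)
        max_staleness = max(max_staleness, staleness)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return mean, max_staleness


cache = HistoricalEmbeddingCache()
cache.push("v7", [1.0, 3.0], step=2)  # remote neighbor, cached at step 2
mean, staleness = aggregate_mean([[3.0, 1.0]], cache, ["v7"], current_step=5)
print(mean, staleness)
```

In this toy setting the aggregation at step 5 uses a value written at step 2 and avoids a remote fetch. That gap is the kind of staleness the abstract refers to.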

Taxonomy

Topics: Advanced Graph Neural Networks · Machine Learning and ELM · Brain Tumor Detection and Classification

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings