DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training

Hongkuan Zhou; Da Zheng; Xiang Song; George Karypis; Viktor Prasanna

arXiv:2307.07649 · cs.LG · July 18, 2023 · 1 citation

TL;DR

DistTGL introduces a scalable distributed training framework for memory-based Temporal Graph Neural Networks, significantly improving training efficiency and accuracy on multi-GPU clusters.

Contribution

It presents a novel training algorithm, an enhanced TGNN model, and system optimizations enabling near-linear speedup and better accuracy in distributed environments.

Findings

01 Achieves a 10.17x increase in training throughput

02 Outperforms single-machine methods by 14.5% in accuracy

03 Attains near-linear convergence speedup

Abstract

Memory-based Temporal Graph Neural Networks are powerful tools in dynamic graph representation learning and have demonstrated superior performance in many real-world applications. However, their node memory favors smaller batch sizes to capture more dependencies in graph events and needs to be maintained synchronously across all trainers. As a result, existing frameworks suffer from accuracy loss when scaling to multiple GPUs. Even worse, the tremendous overhead of synchronizing the node memory makes it impractical to deploy them on distributed GPU clusters. In this work, we propose DistTGL, an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters. DistTGL has three improvements over existing solutions: an enhanced TGNN model, a novel training algorithm, and an optimized system. In experiments, DistTGL achieves near-linear convergence speedup,…
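The abstract's point that node memory favors smaller batch sizes can be illustrated with a toy count. The sketch below is not DistTGL's algorithm. It assumes a simplified setting in which every event in a batch reads node memory as it stood at the start of the batch, so an event that touches a node already touched by an earlier event in the same batch misses that dependency. The function name `count_missed_dependencies` and the event list are hypothetical.

```python
from typing import List, Tuple

# Hypothetical event format: (source node, destination node, timestamp).
Event = Tuple[int, int, float]


def count_missed_dependencies(events: List[Event], batch_size: int) -> int:
    """Count events that touch a node already touched earlier in the same batch.

    Assumption: all events in a batch read node memory from the start of the
    batch, so such an event cannot see the earlier event's memory update.
    """
    missed = 0
    for start in range(0, len(events), batch_size):
        touched = set()  # nodes involved in earlier events of this batch
        for src, dst, _ in events[start:start + batch_size]:
            if src in touched or dst in touched:
                missed += 1
            touched.update((src, dst))
    return missed


# Toy chronological event stream (made-up data for illustration only).
events: List[Event] = [
    (0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0),
    (0, 3, 4.0), (4, 5, 5.0), (1, 4, 6.0),
]

for size in (1, 2, 6):
    print(size, count_missed_dependencies(events, size))
```

On this toy stream, a batch size of 1 misses no dependencies, while putting all six events in one batch misses four of them. This is the tension the abstract describes: larger batches help throughput across GPUs, but they hide more event-to-event dependencies from the node memory.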

Taxonomy

Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Human Pose and Action Recognition