Efficient Scaling of Dynamic Graph Neural Networks
Venkatesan T. Chakaravarthy, Shivmaran S. Pandian, Saurabh Raje, Yogish Sabharwal, Toyotaro Suzumura, Shashanka Ubaru

TL;DR
This paper introduces distributed algorithms for training dynamic Graph Neural Networks on large-scale graphs across multi-node, multi-GPU systems, addressing three bottlenecks: GPU memory usage, CPU-GPU data transfer, and communication volume.
Contribution
It presents a graph difference-based strategy that reduces CPU-GPU transfer time, and a data distribution technique under which communication volume stays fixed and linear in the input size for any number of GPUs.
Findings
Achieved up to 30x speedup on 128 GPUs.
Reduced transfer time by up to 4.1x.
Decreased overall execution time by up to 40%.
Abstract
We present distributed algorithms for training dynamic Graph Neural Networks (GNNs) on large-scale graphs spanning multi-node, multi-GPU systems. To the best of our knowledge, this is the first scaling study on dynamic GNNs. We devise mechanisms for reducing GPU memory usage and identify two execution time bottlenecks: CPU-GPU data transfer and communication volume. Exploiting properties of dynamic graphs, we design a graph difference-based strategy to significantly reduce the transfer time. We develop a simple but effective data distribution technique under which the communication volume remains fixed and linear in the input size, for any number of GPUs. Our experiments using billion-size graphs on a system of 128 GPUs show that: (i) the distribution scheme achieves up to 30x speedup on 128 GPUs; (ii) the graph-difference technique reduces the transfer time by a factor of up to…
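The graph-difference idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each snapshot of the dynamic graph is a plain set of edge tuples, and the helper names (`snapshot_delta`, `apply_delta`, `transfer_cost`) are hypothetical. It only shows why sending the first snapshot plus per-step differences can move fewer edges than sending every snapshot in full when consecutive snapshots overlap heavily.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def snapshot_delta(prev: Set[Edge], curr: Set[Edge]) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (added, removed): the edges needed to turn prev into curr."""
    return curr - prev, prev - curr


def apply_delta(prev: Set[Edge], added: Set[Edge], removed: Set[Edge]) -> Set[Edge]:
    """Rebuild the current snapshot on the receiving side from prev and a delta."""
    return (prev - removed) | added


def transfer_cost(snapshots: List[Set[Edge]]) -> Tuple[int, int]:
    """Count edges sent when (a) every snapshot is sent in full and
    (b) only the first snapshot is sent in full, followed by deltas."""
    full = sum(len(s) for s in snapshots)
    diff = len(snapshots[0])
    for prev, curr in zip(snapshots, snapshots[1:]):
        added, removed = snapshot_delta(prev, curr)
        diff += len(added) + len(removed)
    return full, diff


# Toy data (made up for illustration): three snapshots that mostly overlap.
snapshots = [
    {(0, 1), (1, 2), (2, 3), (3, 4)},
    {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)},
    {(0, 1), (1, 2), (3, 4), (4, 5)},
]

full, diff = transfer_cost(snapshots)
print(f"edges sent in full: {full}, edges sent with deltas: {diff}")
```

On this toy input the delta scheme sends 6 edges instead of 13. The saving depends entirely on how much consecutive snapshots overlap; if they share few edges, the delta can approach or exceed the size of the full snapshot.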
