Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs
Hesham Mostafa

TL;DR
This paper introduces SAR, a distributed method for full-batch GNN training on large graphs. Its sequential rematerialization scheme improves memory efficiency and scalability, which makes full-batch training possible on larger graphs than previously reported.
Contribution
The paper presents SAR, a novel distributed training scheme that trains GNNs on entire large graphs, with per-worker memory that decreases linearly as workers are added. It also introduces optimization techniques for attention-based models.
Findings
Memory consumption per worker decreases linearly with more workers.
SAR enables training of the largest full-batch GNNs to date.
Optimized attention kernels significantly improve runtime and memory efficiency.
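A back-of-the-envelope count shows why per-worker memory shrinks as workers are added. The sketch below is an idealized model, not a measurement from the paper. It assumes N nodes split evenly across P workers. It then compares two cases: a worker that keeps at most one fetched remote partition alive at a time, as in the sequential scheme, and a hypothetical baseline that caches every remote neighbor's features, which for a dense graph means all N rows. The function name `peak_feature_rows` is illustrative.

```python
def peak_feature_rows(num_nodes: int, num_workers: int, cache_all_remote: bool) -> int:
    """Idealized peak number of node-feature rows held by one worker.

    Assumes num_nodes divides evenly by num_workers. With cache_all_remote=True,
    models a dense graph where a worker keeps every remote neighbor's features.
    """
    local = num_nodes // num_workers
    if num_workers == 1:
        return local  # a single worker only holds its own partition
    if cache_all_remote:
        return num_nodes  # own rows plus all remote rows (dense graph)
    return 2 * local  # own rows plus one fetched remote partition at a time


for p in (2, 4, 8):
    print(p, peak_feature_rows(1_200_000, p, False), peak_feature_rows(1_200_000, p, True))
# prints:
# 2 1200000 1200000
# 4 600000 1200000
# 8 300000 1200000
```

Under these assumptions the sequential scheme holds 2N/P rows, which halves each time P doubles. The caching baseline stays at N regardless of P.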
Abstract
We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large-scale training of GNNs has recently been dominated by sampling-based methods and methods based on non-learnable message passing. SAR, on the other hand, is a distributed technique that can train any GNN type directly on an entire large graph. The key innovation in SAR is the distributed sequential rematerialization scheme, which sequentially reconstructs and then frees pieces of the prohibitively large GNN computational graph during the backward pass. This results in excellent memory scaling behavior: the memory consumption per worker goes down linearly with the number of workers, even for densely connected graphs. Using SAR, we report the largest applications of full-batch GNN training to date, and demonstrate large memory…
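The following minimal, single-process sketch in Python makes the sequential aggregation and rematerialization idea concrete. It assumes a simple linear sum-aggregation layer, h_v = w * (sum of x_u over edges (u, v)), with a scalar weight w. A hypothetical `fetch` helper stands in for real inter-worker communication. The names `sar_forward` and `sar_backward` and the toy graph are illustrative only and are not the API of the SAR library. In the forward pass, a worker pulls one partition at a time, aggregates its messages, and frees it. In the backward pass, it fetches each partition again instead of having kept it.

```python
from typing import Dict, List, Tuple

Vec = List[float]
Partition = Dict[int, Vec]  # node id -> feature vector


def fetch(partitions: List[Partition], j: int) -> Partition:
    """Hypothetical stand-in for fetching partition j's features from its worker."""
    return {u: list(x) for u, x in partitions[j].items()}


def sar_forward(worker: int, partitions: List[Partition],
                edges: List[Tuple[int, int]], w: float) -> Dict[int, Vec]:
    """h_v = w * sum of x_u over edges (u, v), for every node v owned by `worker`."""
    dim = len(next(iter(partitions[worker].values())))
    h = {v: [0.0] * dim for v in partitions[worker]}
    for j in range(len(partitions)):
        remote = fetch(partitions, j)  # at most one fetched partition is alive
        for u, v in edges:  # scanning all edges per partition keeps the sketch short
            if u in remote and v in h:
                h[v] = [a + w * b for a, b in zip(h[v], remote[u])]
        del remote  # freed here; nothing is stored for the backward pass
    return h


def sar_backward(worker: int, partitions: List[Partition],
                 edges: List[Tuple[int, int]], w: float,
                 grad_h: Dict[int, Vec]) -> Tuple[Dict[int, Vec], float]:
    """Given dL/dh for `worker`'s nodes, return dL/dx (per source node) and dL/dw."""
    grad_x: Dict[int, Vec] = {}  # in a real system, sent back to each node's owner
    grad_w = 0.0
    for j in range(len(partitions)):
        remote = fetch(partitions, j)  # rematerialize: fetch partition j again
        for u, v in edges:
            if u in remote and v in grad_h:
                g = grad_h[v]
                acc = grad_x.get(u, [0.0] * len(g))
                grad_x[u] = [a + w * gi for a, gi in zip(acc, g)]  # dL/dx_u
                grad_w += sum(gi * xi for gi, xi in zip(g, remote[u]))  # needs x_u
        del remote
    return grad_x, grad_w


# Toy graph: worker 0 owns nodes {0, 1}, worker 1 owns nodes {2, 3}.
partitions = [{0: [1.0, 2.0], 1: [0.5, -1.0]}, {2: [3.0, 0.0], 3: [-2.0, 1.0]}]
edges = [(0, 2), (1, 2), (2, 0), (3, 1), (3, 2), (0, 1)]
h0 = sar_forward(0, partitions, edges, w=0.5)  # {0: [1.5, 0.0], 1: [-0.5, 1.5]}
```

For this linear layer, only the gradient with respect to w actually needs the rematerialized features. For nonlinear or attention-based message functions, the gradients with respect to the features also depend on the fetched values.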
Taxonomy
Topics: Advanced Graph Neural Networks · Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data
