Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs

Hesham Mostafa

arXiv:2111.06483 · cs.LG · April 18, 2022

TL;DR

This paper introduces SAR, a distributed method for full-batch GNN training on large graphs that improves memory efficiency and scalability through sequential rematerialization, enabling training of larger models than previously possible.

Contribution

The paper presents SAR, a distributed training scheme for GNNs that trains on entire large graphs with per-worker memory that falls linearly as workers are added, and introduces optimized kernels for attention-based models.

Findings

01 Memory consumption per worker decreases linearly with the number of workers.

02 SAR enables training of the largest full-batch GNNs to date.

03 Optimized attention kernels significantly improve runtime and memory efficiency.

Abstract

We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large-scale training of GNNs has recently been dominated by sampling-based methods and methods based on non-learnable message passing. SAR, on the other hand, is a distributed technique that can train any GNN type directly on an entire large graph. The key innovation in SAR is the distributed sequential rematerialization scheme, which sequentially reconstructs and then frees pieces of the prohibitively large GNN computational graph during the backward pass. This results in excellent memory scaling behavior, where the memory consumption per worker goes down linearly with the number of workers, even for densely connected graphs. Using SAR, we report the largest applications of full-batch GNN training to date, and demonstrate large memory…
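The memory claim in the abstract can be made concrete with a toy example. The sketch below is not the author's implementation and does not use the intellabs/sar API; it is a minimal pure-Python illustration, assuming sum aggregation, scalar node features, and contiguous node partitions. The helper names `partition`, `full_aggregate`, and `sequential_aggregate` are hypothetical. It covers only the sequential aggregation half of the idea in the forward pass: a worker pulls one partition's features at a time, folds them into its local sums, and releases them before pulling the next. Per the abstract, the rematerialization half repeats this fetch-and-free pattern during the backward pass rather than keeping the fetched pieces alive; that step is not modeled here.

```python
def partition(num_nodes, num_workers):
    """Split node ids into contiguous, near-equal blocks (hypothetical helper)."""
    size = -(-num_nodes // num_workers)  # ceiling division
    return [list(range(s, min(s + size, num_nodes)))
            for s in range(0, num_nodes, size)]


def full_aggregate(edges, feats):
    """Reference: sum source features into each destination, all data in one place."""
    out = [0] * len(feats)
    for src, dst in edges:
        out[dst] += feats[src]
    return out


def sequential_aggregate(edges, feats, parts, worker):
    """Aggregate for one worker's local nodes, one source partition at a time.

    Returns the local sums and the peak number of fetched feature rows
    that were held at the same moment.
    """
    local = set(parts[worker])
    acc = {v: 0 for v in parts[worker]}
    peak_fetched = 0
    for part in parts:
        # "Fetch" a single partition's features (stands in for a network transfer).
        fetched = {u: feats[u] for u in part}
        peak_fetched = max(peak_fetched, len(fetched))
        for src, dst in edges:
            if dst in local and src in fetched:
                acc[dst] += fetched[src]
        # Release the fetched block before moving on to the next partition.
        del fetched
    return acc, peak_fetched
```

In this toy setting the peak number of fetched rows is the size of one partition, so it shrinks as the node set is split across more workers, while the result matches the all-in-one aggregation. A real system would also have to account for edge storage, multi-dimensional features, and communication cost, none of which are modeled here.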

Code & Models

Repositories

intellabs/sar · pytorch · Official

Taxonomy

Topics: Advanced Graph Neural Networks · Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data