Activation Compression of Graph Neural Networks using Block-wise Quantization with Improved Variance Minimization
Sebastian Eliassen, Raghavendra Selvan

TL;DR
This paper improves activation compression in large-scale GNN training by introducing block-wise quantization and refined variance estimation, achieving greater memory savings and speedup with minimal performance loss.
Contribution
It proposes a block-wise quantization method and a correction to variance assumptions, enhancing memory efficiency and runtime in GNN training.
Findings
Memory consumption reduced by over 15%
Runtime per epoch reduced by about 5% (speedup)
Performance trade-offs similar to original EXACT
Abstract
Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing their memory consumption. Work by Liu et al. (2022) proposed extreme activation compression (EXACT) which demonstrated drastic reduction in memory consumption by performing quantization of the intermediate activation maps down to using INT2 precision. They showed little to no reduction in performance while achieving large reductions in GPU memory consumption. In this work, we present an improvement to the EXACT strategy by using block-wise quantization of the intermediate activation maps. We experimentally analyze different block sizes and show further reduction in memory consumption (>15%), and runtime speedup per epoch (about 5%) even when performing extreme extents of quantization with similar performance trade-offs as with the original EXACT. Further, we present a…
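To make the idea of block-wise quantization concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the default block size of 64, the min/max scaling, and the use of stochastic rounding are all assumptions chosen for illustration. The activation map is flattened, split into fixed-size blocks, and each block is mapped to 2-bit codes using that block's own minimum and range.

```python
import numpy as np

def blockwise_quantize(x, block_size=64, bits=2, rng=None):
    """Quantize a float array block by block (hypothetical helper, not from the paper)."""
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits - 1                      # 3 quantization steps for INT2
    flat = x.astype(np.float32).ravel()
    pad = (-flat.size) % block_size             # zero-pad so the length divides evenly
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    lo = blocks.min(axis=1, keepdims=True)      # per-block minimum (zero point)
    span = blocks.max(axis=1, keepdims=True) - lo
    span = np.where(span == 0, 1.0, span)       # constant blocks: avoid division by zero
    scaled = (blocks - lo) / span * levels      # values now lie in [0, levels]
    floor = np.floor(scaled)
    # Stochastic rounding (assumed here): round up with probability equal to
    # the fractional part, which keeps the quantizer unbiased in expectation.
    codes = floor + (rng.random(scaled.shape) < (scaled - floor))
    codes = np.clip(codes, 0, levels).astype(np.uint8)
    return codes, lo, span, x.shape

def blockwise_dequantize(codes, lo, span, shape, bits=2):
    """Map integer codes back to floats and restore the original shape."""
    levels = 2 ** bits - 1
    flat = (codes.astype(np.float32) / levels * span + lo).ravel()
    return flat[: int(np.prod(shape))].reshape(shape)

# Usage: quantize a small random "activation map" and reconstruct it.
rng = np.random.default_rng(0)
x = rng.standard_normal((10, 16)).astype(np.float32)
codes, lo, span, shape = blockwise_quantize(x, block_size=32, rng=rng)
x_hat = blockwise_dequantize(codes, lo, span, shape)
```

In this sketch the codes are stored one per byte for readability; a memory-efficient version would pack four 2-bit codes into each byte. The only floating-point metadata kept is one minimum and one range per block, so larger blocks mean less metadata at the cost of coarser scaling.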
Taxonomy
Topics: Advanced Graph Neural Networks · Machine Learning and ELM · Advanced Memory and Neural Computing
