AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei Zhang, Kailash Gopalakrishnan

TL;DR
AdaComp introduces an adaptive gradient compression method that significantly reduces communication overhead in distributed DNN training while maintaining high accuracy across various models and datasets.
Contribution
The paper presents AdaComp, a novel adaptive residual gradient compression technique that automatically adjusts compression rates based on local activity, applicable to diverse neural network architectures.
Findings
Achieves up to 200X compression for fully-connected and recurrent layers.
Achieves up to 40X compression for convolutional layers.
Maintains model accuracy despite high compression rates.
Abstract
Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained. To overcome this limitation, new gradient compression techniques are needed that are computationally friendly, applicable to a wide variety of layers seen in Deep Neural Networks, and adaptable to variations in network architectures as well as their hyper-parameters. In this paper we introduce a novel technique: the Adaptive Residual Gradient Compression (AdaComp) scheme. AdaComp is based on localized selection of gradient residues and automatically tunes the compression rate depending on local activity. We show excellent results on a wide spectrum of state-of-the-art Deep Learning models in multiple domains (vision, speech, language), datasets (MNIST, CIFAR10, ImageNet, BN50, Shakespeare),…
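
The idea stated in the abstract (localized selection of gradient residues, with a compression rate that follows local activity) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `adacomp_like_select`, the `bin_size` parameter, the rule of comparing against each bin's maximum magnitude, and the extra weight given to the newest gradient are all assumptions made for illustration. The quantization of transmitted values is omitted.

```python
import numpy as np

def adacomp_like_select(residue, grad, bin_size=50):
    """One illustrative compression step on a flat gradient vector.

    Hypothetical sketch: values are selected per bin (local selection),
    and everything not selected is kept as residue for later steps.
    """
    g = residue + grad          # accumulated residue including the new gradient
    h = g + grad                # assumption: emphasize the newest gradient

    n = g.size
    pad = (-n) % bin_size       # zero-pad so the vector splits into whole bins
    g_abs = np.abs(np.pad(g, (0, pad)))

    # Per-bin maximum magnitude serves as a local threshold, so the
    # number of selected entries varies with local activity.
    local_max = g_abs.reshape(-1, bin_size).max(axis=1)
    thresh = np.repeat(local_max, bin_size)[:n]

    mask = np.abs(h) >= thresh          # entries chosen for transmission
    sent = np.where(mask, g, 0.0)       # sparse update (unquantized here)
    new_residue = g - sent              # unsent values carry over
    return sent, new_residue, mask

# Small usage example with two bins of two entries each.
residue = np.zeros(4)
grad = np.array([1.0, 0.1, 0.6, 0.2])
sent, new_residue, mask = adacomp_like_select(residue, grad, bin_size=2)
```

In this sketch no information is discarded: `sent + new_residue` always equals `residue + grad`, so small values that are not transmitted keep accumulating in the residue until they become large enough, relative to their bin, to be selected.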
Taxonomy
Topics: Advanced Neural Network Applications · Speech Recognition and Synthesis · Medical Image Segmentation Techniques
