AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training

Chia-Yu Chen; Jungwook Choi; Daniel Brand; Ankur Agrawal; Wei Zhang; Kailash Gopalakrishnan

arXiv:1712.02679·cs.LG·December 8, 2017·74 cites

TL;DR

AdaComp is an adaptive gradient compression method for distributed DNN training. It cuts gradient communication by up to 200X for fully-connected and recurrent layers and up to 40X for convolutional layers while maintaining model accuracy across a range of models and datasets.

Contribution

The paper presents AdaComp, a novel adaptive residual gradient compression technique that automatically adjusts its compression rate based on local activity and applies to diverse neural network architectures.

Findings

01 Achieves up to 200X compression for fully-connected and recurrent layers.

02 Achieves up to 40X compression for convolutional layers.

03 Maintains model accuracy despite high compression rates.

Abstract

Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering hundreds of TeraOps/s of computational capacity) is expected to be severely communication constrained. To overcome this limitation, new gradient compression techniques are needed that are computationally friendly, applicable to the wide variety of layers seen in Deep Neural Networks, and adaptable to variations in network architectures as well as their hyper-parameters. In this paper we introduce a novel technique: the Adaptive Residual Gradient Compression (AdaComp) scheme. AdaComp is based on localized selection of gradient residues and automatically tunes the compression rate depending on local activity. We show excellent results on a wide spectrum of state-of-the-art Deep Learning models in multiple domains (vision, speech, language), datasets (MNIST, CIFAR10, ImageNet, BN50, Shakespeare),…
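
To make "localized selection of gradient residues" more concrete, here is a minimal sketch in plain Python of one way such a step could work. It is an illustration under stated assumptions, not the paper's implementation: the function name `local_select`, the `bin_size` parameter, the rule of counting the newest gradient twice when testing an entry against its bin's maximum, and the omission of any quantization of the transmitted values are all choices made for this example and are not specified in the abstract.

```python
from typing import List, Tuple


def local_select(
    residue: List[float],
    grad: List[float],
    bin_size: int,
) -> Tuple[List[int], List[float], List[float]]:
    """One hypothetical compression step over a flat gradient vector.

    Returns (sent_indices, sent_values, new_residue).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    if len(residue) != len(grad):
        raise ValueError("residue and grad must have the same length")

    # Fold the newest gradient into the residue carried over from earlier steps.
    acc = [r + g for r, g in zip(residue, grad)]
    # Assumption for this sketch: count the newest gradient a second time when
    # testing an entry, so entries that are currently active are favoured.
    boosted = [a + g for a, g in zip(acc, grad)]

    sent_idx: List[int] = []
    sent_val: List[float] = []
    new_residue = list(acc)

    # Selection is local: every bin is compared against its own maximum, so
    # the number of entries sent per bin varies with the activity in that bin.
    for start in range(0, len(acc), bin_size):
        stop = min(start + bin_size, len(acc))
        local_max = max(abs(a) for a in acc[start:stop])
        if local_max == 0.0:
            continue  # nothing accumulated in this bin, nothing to send
        for i in range(start, stop):
            if abs(boosted[i]) >= local_max:
                sent_idx.append(i)
                sent_val.append(acc[i])  # sent at full precision in this sketch
                new_residue[i] = 0.0     # sent entries leave no residue behind
    return sent_idx, sent_val, new_residue
```

Calling `local_select([0.0] * 8, [0.1, -0.9, 0.2, 0.05, 0.0, 0.0, 0.3, -0.3], bin_size=4)` sends one entry from the first bin and two from the second. No global threshold is set; the number of entries sent follows from how the values in each bin compare with that bin's own maximum. Entries that are not sent stay in the residue and can be selected in a later step.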

Taxonomy

Topics: Advanced Neural Network Applications · Speech Recognition and Synthesis · Medical Image Segmentation Techniques