Variance-based Gradient Compression for Efficient Distributed Deep Learning

Yusuke Tsuzuku; Hiroto Imachi; Takuya Akiba

arXiv:1802.06058 · cs.LG · February 21, 2018 · 50 citations

TL;DR

This paper introduces a variance-based gradient compression method that reduces the communication overhead of distributed deep learning by delaying gradient updates until they are unambiguous (high amplitude, low variance). It preserves model accuracy and enables efficient training over low-bandwidth connections.

Contribution

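The paper proposes a gradient compression technique that uses gradient variance to decide when to communicate: updates are delayed until an unambiguous (high-amplitude, low-variance) gradient has been computed. This yields high compression ratios without sacrificing model accuracy.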

Findings

01. Achieves high compression ratios

02. Maintains model accuracy

03. Enables efficient distributed training in low-bandwidth settings

Abstract

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently communicate gradients, causing severe bottlenecks, especially on lower bandwidth connections. A few methods have been proposed to compress gradients for efficient communication, but they either suffer from a low compression ratio or significantly harm the resulting model accuracy, particularly when applied to convolutional neural networks. To address these issues, we propose a method to reduce the communication overhead of distributed deep learning. Our key observation is that gradient updates can be delayed until an unambiguous (high amplitude, low variance) gradient has been calculated. We also present an efficient algorithm to compute the variance with…
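The key observation in the abstract (delay an update until the accumulated gradient is unambiguous) can be illustrated with a small sketch. This is not the authors' algorithm: the function name `variance_based_select`, the per-element threshold rule `alpha * r**2 > v`, the use of per-sample gradients to estimate variance, and the parameter `alpha` are all hypothetical choices made for illustration, using NumPy. Each worker accumulates its mean gradient and an estimate of that mean's variance, sends only the elements whose squared amplitude dominates the variance, and keeps the rest locally for later steps.

```python
import numpy as np

def variance_based_select(minibatch_grads, residual, var_acc, alpha=1.0):
    """Hypothetical sketch of variance-based gradient sparsification.

    minibatch_grads: shape (B, D), per-sample gradients on this worker (B >= 2).
    residual: shape (D,), gradient accumulated from earlier steps, not yet sent.
    var_acc: shape (D,), accumulated variance estimate for `residual`.
    alpha: hypothetical threshold; a larger alpha sends more elements.
    Returns (indices, values, new_residual, new_var_acc).
    """
    b = minibatch_grads.shape[0]
    mean = minibatch_grads.mean(axis=0)
    # Variance of the minibatch-mean estimate: sample variance / batch size.
    var_of_mean = minibatch_grads.var(axis=0, ddof=1) / b
    residual = residual + mean
    var_acc = var_acc + var_of_mean
    # An element is "unambiguous" when its squared amplitude dominates its variance.
    send = alpha * residual ** 2 > var_acc
    indices = np.nonzero(send)[0]
    values = residual[indices]
    # Sent elements are cleared; unsent elements keep accumulating locally.
    residual = np.where(send, 0.0, residual)
    var_acc = np.where(send, 0.0, var_acc)
    return indices, values, residual, var_acc

# Usage: element 0 is consistent across samples and is sent; element 1 averages
# to zero with nonzero variance, so it is held back for a later step.
grads = np.array([[1.0, 0.5], [1.0, -0.5], [1.0, 0.4], [1.0, -0.4]])
idx, vals, res, v = variance_based_select(grads, np.zeros(2), np.zeros(2))
```

Only `idx` and `vals` would be communicated, which is where the compression comes from; how the paper computes the variance efficiently is not covered by this sketch.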

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning