A Distributed SGD Algorithm with Global Sketching for Deep Learning Training Acceleration

LingFei Dai; Boyu Diao; Chao Li; Yongjun Xu

arXiv:2108.06004·cs.DC·August 16, 2021·1 cites

Open Access

TL;DR

This paper introduces gs-SGD, a global sketching-based gradient compression method for distributed deep learning that reduces communication overhead and improves convergence efficiency compared to existing methods.

Contribution

The paper proposes a novel global gradient sketching technique using Count-Sketch for distributed SGD, enhancing scalability and convergence in deep learning training.
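
As a rough illustration of the Count-Sketch structure named above, here is a minimal Python sketch. It is not the authors' gs-SGD implementation: the class name `CountSketch`, its parameters (`rows`, `cols`, `seed`), and the toy gradients are all hypothetical. It only shows the property that makes sketches convenient for distributed aggregation: sketches built with the same hash functions are linear, so merging per-worker sketches gives the same table as sketching the summed gradient.

```python
import random
from statistics import median


class CountSketch:
    """Count-Sketch of a length-d vector using rows x cols counters (illustrative)."""

    def __init__(self, d, rows=5, cols=64, seed=0):
        rng = random.Random(seed)  # same seed on every worker -> same hashes
        self.d, self.rows, self.cols = d, rows, cols
        # Precomputed bucket index and +/-1 sign for each (row, coordinate).
        self.bucket = [[rng.randrange(cols) for _ in range(d)] for _ in range(rows)]
        self.sign = [[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(rows)]
        self.table = [[0.0] * cols for _ in range(rows)]

    def insert(self, vec):
        """Add every coordinate of vec into one signed bucket per row."""
        for r in range(self.rows):
            for i, v in enumerate(vec):
                self.table[r][self.bucket[r][i]] += self.sign[r][i] * v

    def merge(self, other):
        """Add another sketch built with the same seed (sketches are linear)."""
        for r in range(self.rows):
            for c in range(self.cols):
                self.table[r][c] += other.table[r][c]

    def estimate(self, i):
        """Estimate coordinate i as the median of its signed counters."""
        return median(
            self.sign[r][i] * self.table[r][self.bucket[r][i]]
            for r in range(self.rows)
        )


# Two hypothetical workers, each holding a sparse toy gradient.
d = 1000
g1 = [0.0] * d
g2 = [0.0] * d
g1[7], g1[42] = 8.0, -3.0
g2[7], g2[500] = 4.0, 6.0

s1, s2 = CountSketch(d), CountSketch(d)
s1.insert(g1)
s2.insert(g2)
s1.merge(s2)  # aggregate the sketches instead of the full gradients

direct = CountSketch(d)
direct.insert([a + b for a, b in zip(g1, g2)])
merged_equals_direct = s1.table == direct.table
```

Each worker here sends `rows * cols` counters rather than `d` gradient values. How gs-SGD sizes the sketch and recovers the heavy coordinates is described in the paper, not in this sketch.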

Findings

01. gs-SGD achieves 1.3-3.1x higher throughput than gTop-k.

02. gs-SGD has better convergence efficiency than global Top-k and sketching-based methods.

03. The communication complexity of gs-SGD is O(log d * log P), where d is the number of model parameters and P is the number of workers.

Abstract

Distributed training is an effective way to accelerate the training of large-scale deep learning models. However, parameter exchange and synchronization in distributed stochastic gradient descent (SGD) introduce substantial communication overhead. Gradient compression is an effective way to reduce this overhead. For synchronous SGD, many gradient compression methods based on Top-$k$ sparsification have been proposed to reduce communication. However, centralized methods based on parameter servers suffer from a single point of failure and limited scalability, while decentralized methods with global parameter exchange may reduce the convergence rate of training. In contrast with Top-$k$ based methods, we propose a gradient compression method with global gradient vector sketching, which uses the Count-Sketch structure to store the…
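
For readers unfamiliar with the Top-$k$ sparsification that the abstract contrasts against, the following is a minimal Python sketch of the basic idea: keep only the k largest-magnitude gradient entries and transmit them as index/value pairs. The function name `top_k_sparsify` and the toy gradient are hypothetical. Returning the dropped entries as a residual is a common companion technique assumed here for illustration; the abstract does not describe it.

```python
import heapq


def top_k_sparsify(grad, k):
    """Return indices and values of the k largest-magnitude entries, plus the residual."""
    idx = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    idx.sort()
    values = [grad[i] for i in idx]
    # Assumption: dropped entries are kept locally as a residual, not discarded.
    residual = list(grad)
    for i in idx:
        residual[i] = 0.0
    return idx, values, residual


grad = [0.1, -2.0, 0.05, 3.5, -0.3, 0.0]
idx, vals, residual = top_k_sparsify(grad, 2)
# idx == [1, 3], vals == [-2.0, 3.5]
```

Only `idx` and `vals` would be communicated, so the message size depends on k instead of the full gradient length.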

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications