A Distributed SGD Algorithm with Global Sketching for Deep Learning Training Acceleration
LingFei Dai, Boyu Diao, Chao Li, Yongjun Xu

TL;DR
This paper introduces gs-SGD, a global sketching-based gradient compression method for distributed deep learning that reduces communication overhead and improves convergence efficiency compared to existing methods.
Contribution
The paper proposes a novel global gradient sketching technique using Count-Sketch for distributed SGD, enhancing scalability and convergence in deep learning training.
Findings
gs-SGD achieves 1.3-3.1x higher throughput than gTop-k.
gs-SGD converges more efficiently than global Top-k and sketching-based methods.
Its communication complexity is O(log d * log P) for large-scale models.
Abstract
Distributed training is an effective way to accelerate the training of large-scale deep learning models. However, the parameter exchange and synchronization of distributed stochastic gradient descent introduce a large amount of communication overhead. Gradient compression is an effective method to reduce this overhead. Among compression methods for synchronous SGD, many Top-k sparsification based methods have been proposed to reduce communication. However, the centralized approach based on parameter servers suffers from a single point of failure and limited scalability, while the decentralized approach with global parameter exchange may reduce the convergence rate of training. In contrast with Top-k based methods, we propose a gradient compression method with global gradient vector sketching, which uses the Count-Sketch structure to store the…
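As a rough illustration of the Count-Sketch idea mentioned in the abstract, the following is a minimal NumPy sketch, not the authors' gs-SGD implementation. The class name `CountSketch`, its parameters (`rows`, `cols`, `seed`), and the toy two-worker setup are all assumptions made for this example. It shows the property that makes sketches attractive for global aggregation: when all workers share the same hash functions, the sum of their sketches equals the sketch of the summed gradient, so large coordinates of the global gradient can be estimated from the merged table.

```python
import numpy as np


class CountSketch:
    """Illustrative Count-Sketch of a dense vector (hypothetical helper)."""

    def __init__(self, dim, rows=5, cols=256, seed=0):
        # Workers must use the same seed so that their hash functions match.
        rng = np.random.default_rng(seed)
        self.buckets = rng.integers(0, cols, size=(rows, dim))
        self.signs = rng.choice(np.array([-1.0, 1.0]), size=(rows, dim))
        self.table = np.zeros((rows, cols))

    def accumulate(self, vec):
        # Add each signed coordinate into its bucket, once per row.
        for r in range(self.table.shape[0]):
            np.add.at(self.table[r], self.buckets[r], self.signs[r] * vec)

    def merge(self, other):
        # Sketches are linear, so merging is element-wise addition.
        self.table += other.table

    def estimate(self):
        # Median over rows of the signed bucket value for every coordinate.
        rows = np.arange(self.table.shape[0])[:, None]
        return np.median(self.signs * self.table[rows, self.buckets], axis=0)


# Toy setup (assumed): two workers, two large coordinates, small noise elsewhere.
dim = 1000
rng = np.random.default_rng(42)
g1 = rng.normal(0.0, 0.01, dim)
g2 = rng.normal(0.0, 0.01, dim)
for g in (g1, g2):
    g[3] += 5.0
    g[500] -= 4.0

s1, s2 = CountSketch(dim), CountSketch(dim)
s1.accumulate(g1)
s2.accumulate(g2)
s1.merge(s2)  # stands in for the aggregation step across workers

direct = CountSketch(dim)
direct.accumulate(g1 + g2)

est = s1.estimate()
top2 = set(np.argsort(-np.abs(est))[:2].tolist())
print(sorted(top2), est[3], est[500])
```

Each worker here sends a `rows x cols` table instead of a length-`dim` vector; the table size trades estimation accuracy against communication volume. How gs-SGD chooses these sizes and exchanges the tables is not covered by this sketch.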
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications
