TL;DR
EventGraD is an event-triggered communication algorithm for parallel stochastic gradient descent. It reduces communication load in distributed machine learning by up to 60% while retaining the same level of accuracy.
Contribution
The paper proposes an event-triggered communication method for parallel SGD, supported by a theoretical convergence analysis and validated on data-parallel training of a residual neural network on CIFAR-10.
Findings
Reduces communication load by up to 60%
Maintains accuracy comparable to standard methods
Compatible with sparsification techniques
Abstract
Communication in parallel systems imposes significant overhead, which often turns out to be a bottleneck in parallel machine learning. To relieve some of this overhead, in this paper, we present EventGraD, an algorithm with event-triggered communication for stochastic gradient descent in parallel machine learning. The main idea of this algorithm is to relax the requirement of communicating at every iteration, as in standard implementations of stochastic gradient descent in parallel machine learning, to communicating only when necessary at certain iterations. We provide a theoretical analysis of the convergence of our proposed algorithm. We also implement the proposed algorithm for data-parallel training of a popular residual neural network trained on the CIFAR-10 dataset and show that EventGraD can reduce the communication load by up to 60% while retaining the same level of accuracy. In…
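The abstract's central idea, communicating only at iterations where it is necessary, can be illustrated with a small sketch. The abstract does not state the trigger condition, so the rule used here is an assumption: a worker transmits only when its parameters have drifted from the last transmitted copy by more than a fixed threshold. The ring topology, the quadratic objective, and all names (should_send, simulate, threshold) are hypothetical and chosen for illustration only; this is not the paper's implementation.

```python
import math
import random


def change_norm(current, last_sent):
    """Euclidean norm of the drift since the last transmitted copy."""
    return math.sqrt(sum((c - s) ** 2 for c, s in zip(current, last_sent)))


def should_send(current, last_sent, threshold):
    """Assumed trigger: send only if the drift exceeds the threshold."""
    return change_norm(current, last_sent) > threshold


def simulate(num_workers=4, steps=200, lr=0.1, threshold=0.05,
             target=2.0, seed=0):
    """Toy ring of workers minimizing 0.5 * (x - target)^2 with noisy gradients.

    Returns (messages_sent, messages_if_always_sending, final_params).
    """
    rng = random.Random(seed)
    params = [[rng.uniform(-1.0, 1.0)] for _ in range(num_workers)]
    # last_sent[i] is the copy of worker i's parameters held by its neighbors.
    last_sent = [list(p) for p in params]
    sent = 0

    for _ in range(steps):
        # Event check: each worker decides whether to broadcast this iteration.
        for i in range(num_workers):
            if should_send(params[i], last_sent[i], threshold):
                last_sent[i] = list(params[i])
                sent += 1

        # Local update: average with the (possibly stale) neighbor copies,
        # then take a noisy gradient step.
        new_params = []
        for i in range(num_workers):
            left = last_sent[(i - 1) % num_workers][0]
            right = last_sent[(i + 1) % num_workers][0]
            avg = (params[i][0] + left + right) / 3.0
            grad = (params[i][0] - target) + rng.gauss(0.0, 0.01)
            new_params.append([avg - lr * grad])
        params = new_params

    return sent, steps * num_workers, params


if __name__ == "__main__":
    sent, total, params = simulate()
    print(f"messages sent: {sent} of {total}")
    print("final parameters:", [round(p[0], 3) for p in params])
```

In this toy setting, a larger threshold trades fewer messages for staler neighbor copies; the savings it produces are not comparable to the 60% figure reported for CIFAR-10.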
