Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization
Yujia Wang, Lu Lin, Jinghui Chen

TL;DR
This paper introduces a communication-compressed variant of AMSGrad, a distributed adaptive gradient method with provable convergence guarantees for nonconvex optimization that reduces communication costs while maintaining performance.
Contribution
It presents a novel communication-compressed AMSGrad algorithm with theoretical convergence guarantees for distributed nonconvex optimization.
Findings
Converges to first-order stationary points at the same rate as uncompressed AMSGrad.
Effective gradient compression strategy reduces communication costs.
Experimental results validate theoretical claims.
Abstract
Due to the explosion in the size of training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers. While error feedback compression has proven successful in reducing communication costs with stochastic gradient descent (SGD), there have been far fewer attempts at building communication-efficient adaptive gradient methods with provable guarantees, even though adaptive gradient methods are widely used in training large-scale machine learning models. In this paper, we propose a new communication-compressed AMSGrad for distributed nonconvex optimization problems, which is provably efficient. Our proposed distributed learning framework features an effective gradient compression strategy and a worker-side model update design. We prove that the proposed communication-efficient…
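The abstract combines two ingredients, error feedback compression and AMSGrad, without spelling out either. The following is a minimal sketch of how they can fit together, not the authors' algorithm. It assumes top-k sparsification as the compressor (the abstract does not name one), two hypothetical workers with simple quadratic losses, and a single shared optimizer state in place of the paper's worker-side model update design. The names `top_k`, `ErrorFeedbackWorker`, `AMSGrad`, and `run_demo` are illustrative only.

```python
import math


def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


class ErrorFeedbackWorker:
    """Hypothetical worker that compresses gradients with error feedback."""

    def __init__(self, dim, k):
        self.k = k
        self.residual = [0.0] * dim  # compression error carried to the next round

    def compress(self, grad):
        # Add back what was dropped earlier, compress, then store the new error.
        corrected = [g + e for g, e in zip(grad, self.residual)]
        sent = top_k(corrected, self.k)
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return sent


class AMSGrad:
    """Plain AMSGrad step without bias correction."""

    def __init__(self, dim, lr=0.05, beta1=0.9, beta2=0.99, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * dim
        self.v = [0.0] * dim
        self.v_hat = [0.0] * dim  # running maximum of v, the AMSGrad modification

    def step(self, x, grad):
        for i, g in enumerate(grad):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            self.v_hat[i] = max(self.v_hat[i], self.v[i])
            x[i] -= self.lr * self.m[i] / (math.sqrt(self.v_hat[i]) + self.eps)
        return x


def run_demo(steps=500, k=2):
    # Assumed toy problem: worker i holds the loss 0.5 * ||x - targets[i]||^2.
    targets = [[1.0, -2.0, 3.0, 0.5], [3.0, 0.0, 1.0, -1.5]]
    dim = len(targets[0])
    workers = [ErrorFeedbackWorker(dim, k) for _ in targets]
    opt = AMSGrad(dim)
    x = [0.0] * dim

    def loss(point):
        per_worker = [0.5 * sum((p - c) ** 2 for p, c in zip(point, t)) for t in targets]
        return sum(per_worker) / len(per_worker)

    initial = loss(x)
    for _ in range(steps):
        # Each worker sends only k of its dim gradient coordinates per round.
        msgs = [w.compress([xi - ci for xi, ci in zip(x, t)])
                for w, t in zip(workers, targets)]
        avg = [sum(col) / len(msgs) for col in zip(*msgs)]
        x = opt.step(x, avg)
    return initial, loss(x)
```

Calling `run_demo()` returns the average loss before and after training. The point of the residual buffer is that coordinates dropped by the compressor are not discarded: they accumulate locally and are transmitted in a later round once they become large enough to be selected.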
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
Methods: AMSGrad
