Toward Communication Efficient Adaptive Gradient Method

Xiangyi Chen; Xiaoyun Li; Ping Li

arXiv:2109.05109 · cs.LG · September 14, 2021

Open Access

TL;DR

This paper introduces a new adaptive gradient method tailored for federated learning, aiming to enhance communication efficiency and ensure convergence in distributed training on low-bandwidth devices.

Contribution

The paper proposes a novel adaptive gradient algorithm specifically designed for federated learning, addressing communication bottlenecks and convergence guarantees.

Findings

01 Achieves improved communication efficiency in federated learning.

02 Guarantees convergence of the adaptive gradient method in distributed settings.

03 Demonstrates effectiveness on large-scale neural network training.

Abstract

In recent years, distributed optimization has proven to be an effective approach to accelerating the training of large-scale machine learning models such as deep neural networks. With the increasing computation power of GPUs, the bottleneck of training speed in distributed training is gradually shifting from computation to communication. Meanwhile, in the hope of training machine learning models on mobile devices, a new distributed training paradigm called "federated learning" has become popular. Communication time is especially important in federated learning because of the low bandwidth of mobile devices. Various approaches to improving communication efficiency have been proposed for federated learning, but most of them are designed with SGD as the prototype training algorithm. While adaptive gradient methods have been proven effective for training neural nets, the study of adaptive…
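To make the setting in the abstract concrete, here is a minimal sketch of local adaptive updates with periodic communication. It is not the paper's algorithm, which this page does not spell out. It assumes a toy scalar objective per worker, an Adam-style update with bias correction omitted, and a server that averages the parameter and both moment estimates. The function names, the hyperparameters, and the choice to average the moments are all illustrative assumptions.

```python
import math


def local_adam_steps(x, m, v, target, steps, lr=0.05, beta1=0.9, beta2=0.99, eps=1e-8):
    """Run `steps` Adam-style updates (bias correction omitted) on the toy
    objective f(x) = 0.5 * (x - target)**2 and return the updated state."""
    for _ in range(steps):
        g = x - target                         # exact gradient of the toy objective
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        x -= lr * m / (math.sqrt(v) + eps)     # adaptive step
    return x, m, v


def federated_adaptive(targets, x0, rounds, local_steps):
    """Hypothetical driver: in each round every worker takes `local_steps`
    local updates from the shared state, then the server averages x, m, and v.
    Returns the final x and the number of communication rounds used."""
    x, m, v = x0, 0.0, 0.0
    comm_rounds = 0
    for _ in range(rounds):
        states = [local_adam_steps(x, m, v, t, local_steps) for t in targets]
        n = len(states)
        # One communication round: average parameter and moment estimates.
        x = sum(s[0] for s in states) / n
        m = sum(s[1] for s in states) / n
        v = sum(s[2] for s in states) / n
        comm_rounds += 1
    return x, comm_rounds


# Three workers whose local optima differ; the global optimum is their mean (2.0).
targets = [1.0, 2.0, 3.0]

# Same total number of local updates (100), different communication budgets.
x_periodic, c_periodic = federated_adaptive(targets, x0=10.0, rounds=20, local_steps=5)
x_every, c_every = federated_adaptive(targets, x0=10.0, rounds=100, local_steps=1)

print(c_periodic, c_every)  # prints "20 100"
```

Both runs perform 100 local updates per worker, but the periodic variant communicates 20 times instead of 100. Whether such a scheme still converges with adaptive step sizes is the kind of question the paper addresses; this toy does not settle it.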

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Machine Learning and ELM

Methods: Stochastic Gradient Descent