CADA: Communication-Adaptive Distributed Adam

Tianyi Chen; Ziye Guo; Yuejiao Sun; Wotao Yin

arXiv:2012.15469 · cs.LG · January 1, 2021 · 1 citation

TL;DR

CADA is a communication-efficient distributed Adam variant that adaptively reuses stale gradients to reduce communication rounds while maintaining convergence rates comparable to Adam.

Contribution

This paper introduces CADA, a novel adaptive SGD method for distributed learning that reduces communication by reusing stale Adam gradients without sacrificing convergence.

Findings

01. CADA significantly reduces communication rounds in experiments.

02. CADA maintains convergence rates similar to Adam.

03. Empirical results show improved communication efficiency.

Abstract

Stochastic gradient descent (SGD) has taken the stage as the primary workhorse for large-scale machine learning. It is often used with its adaptive variants such as AdaGrad, Adam, and AMSGrad. This paper proposes an adaptive stochastic gradient descent method for distributed machine learning, which can be viewed as the communication-adaptive counterpart of the celebrated Adam method, hence the name CADA. The key components of CADA are a set of new rules, tailored for adaptive stochastic gradients, that can be implemented to save communication uploads. The new algorithms adaptively reuse stale Adam gradients, thus saving communication, while retaining convergence rates comparable to the original Adam. In numerical experiments, CADA achieves impressive empirical performance in terms of total communication round reduction.
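To make the idea of reusing stale gradients concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the abstract does not spell out the skipping rules, so the condition below (upload only when the fresh gradient differs enough from the last uploaded one, relative to the most recent parameter change) is an assumed stand-in. The function names, the `threshold` and `max_delay` parameters, and the toy least-squares problem are all hypothetical and chosen for illustration only.

```python
import math
import random


def make_data(n_workers=4, n_samples=32, seed=1):
    """Toy data (assumption): each worker holds noisy samples of y = x . true_theta."""
    rng = random.Random(seed)
    true_theta = [1.0, -2.0, 0.5]
    workers = []
    for _ in range(n_workers):
        samples = []
        for _ in range(n_samples):
            x = [rng.uniform(-1.0, 1.0) for _ in true_theta]
            y = sum(a * b for a, b in zip(x, true_theta)) + rng.gauss(0.0, 0.01)
            samples.append((x, y))
        workers.append(samples)
    return workers


def local_gradient(theta, batch):
    """Gradient of the mean of 0.5 * (x . theta - y)^2 over a mini-batch."""
    grad = [0.0] * len(theta)
    for x, y in batch:
        err = sum(a * b for a, b in zip(x, theta)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / len(batch)
    return grad


def sq_norm(v):
    return sum(a * a for a in v)


def run_lazy_adam(worker_data, steps=200, lr=0.05, beta1=0.9, beta2=0.999,
                  eps=1e-8, threshold=0.0, max_delay=10, batch_size=8, seed=0):
    """Adam on a server that may reuse each worker's last uploaded gradient.

    Returns (theta, uploads), where uploads counts worker-to-server messages.
    """
    rng = random.Random(seed)
    dim = len(worker_data[0][0][0])
    n_workers = len(worker_data)
    theta, m, v = [0.0] * dim, [0.0] * dim, [0.0] * dim
    stale = [[0.0] * dim for _ in range(n_workers)]  # server-side copies
    age = [max_delay] * n_workers  # forces an upload in the first round
    uploads = 0
    last_step_sq = 0.0  # squared norm of the most recent parameter change

    for k in range(1, steps + 1):
        for w, data in enumerate(worker_data):
            fresh = local_gradient(theta, rng.sample(data, batch_size))
            change = sq_norm([f - s for f, s in zip(fresh, stale[w])])
            # Assumed rule: upload if the copy is too old or the gradient
            # has moved enough relative to the last parameter change.
            if age[w] >= max_delay or change >= threshold * last_step_sq:
                stale[w], age[w] = fresh, 0
                uploads += 1
            else:
                age[w] += 1  # server keeps using the stale gradient

        # Aggregate the (possibly stale) gradients, then take a standard Adam step.
        g = [sum(s[j] for s in stale) / n_workers for j in range(dim)]
        new_theta = []
        for j in range(dim):
            m[j] = beta1 * m[j] + (1 - beta1) * g[j]
            v[j] = beta2 * v[j] + (1 - beta2) * g[j] ** 2
            m_hat = m[j] / (1 - beta1 ** k)
            v_hat = v[j] / (1 - beta2 ** k)
            new_theta.append(theta[j] - lr * m_hat / (math.sqrt(v_hat) + eps))
        last_step_sq = sq_norm([a - b for a, b in zip(new_theta, theta)])
        theta = new_theta
    return theta, uploads


if __name__ == "__main__":
    data = make_data()
    for thr in (0.0, 1.0, 100.0):
        theta, uploads = run_lazy_adam(data, threshold=thr)
        print(f"threshold={thr}: uploads={uploads}, theta={theta}")
```

With `threshold=0.0` every worker uploads in every round, which recovers ordinary distributed Adam in this sketch; larger thresholds skip more uploads and rely on `max_delay` to bound how old a reused gradient can get. How much accuracy is traded for fewer uploads depends on the problem and the rule, so the values printed here say nothing about the paper's reported results.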

Code & Models

Repositories

ChrisYZZ/CADA-master (PyTorch)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Statistical Methods and Inference

Methods: Adam · AMSGrad · AdaGrad